Compare commits

..

795 Commits

Author SHA1 Message Date
Fabiano Fidêncio
a001021721 Merge pull request #8292 from fidencio/topic/release-ensure-gh-is-used-from-a-git-repo
release: Always use actions/checkout to ensure we're in a git repo
2023-10-23 15:16:12 +02:00
Fabiano Fidêncio
c5cfad7023 actions: Move all the checkout actions to v4
It's been released for a while now, and we need to keep consistency
between what we used.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 14:01:53 +02:00
Fabiano Fidêncio
b32c6bf805 release: Always use actions/checkout to ensure we're in a git repo
Otherwise we'll face issues like:
```
Run tag=$(echo $GITHUB_REF | cut -d/ -f3-)
  tag=$(echo $GITHUB_REF | cut -d/ -f3-)
  tarball="kata-static-$tag-amd64.tar.xz"
  mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
  pushd $GITHUB_WORKSPACE
  echo "uploading asset '${tarball}' for tag: ${tag}"
  GITHUB_TOKEN=*** gh release upload "${tag}" "${tarball}"
  popd
  shell: /usr/bin/bash -e {0}
~/work/kata-containers/kata-containers ~/work/kata-containers/kata-containers
uploading asset 'kata-static-3.3.0-alpha0-amd64.tar.xz' for tag: 3.3.0-alpha0
failed to run git: fatal: not a git repository (or any of the parent directories): .git
```

Fixes: #8286 (or better, just a follow up of that)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 14:00:39 +02:00
Fabiano Fidêncio
8fe88696c0 Merge pull request #8287 from fidencio/topic/release-use-gh-cli-instead-of-hub
actions: release: Use GH cli instead of hub
2023-10-23 12:40:22 +02:00
Fabiano Fidêncio
710eb8ab9d actions: release: Use GH cli instead of hub
hub is now deprecated, which has been causing issues with our release
process.

Let's move to the GH cli (https://cli.github.com/manual), and unblock
this release.

**NOTE**: This commit is purposefully not touching anywhere else hub is
used, as that would require more time and investigation to do the
switch, and right now we just want to unblock the release.

Fixes: #8286

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 08:49:55 +02:00
Fabiano Fidêncio
74d4865189 Merge pull request #8275 from fidencio/topic/ci-adapt-kata-deploy-regex-on-repo-version-update
release: Adapt the CIs using the kata-deploy image
2023-10-23 00:37:19 +02:00
Dan Mihai
732fe163f3 Merge pull request #8229 from microsoft/danmihai1/no-config-toml-endpoints
agent: no endpoint blocking from agent-config.toml
2023-10-20 11:30:43 -07:00
Fabiano Fidêncio
026f6a1a4c release: Adapt the CIs using the kata-deploy image
This is needed in order to properly run the CIs in branches that are not
the main one, as the kata-deploy.yaml file on those branches do not have
the `latest` tag, but rather the latest stable release.

Fixes: #8274

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 18:59:14 +02:00
Fabiano Fidêncio
124f498830 Merge pull request #8266 from fidencio/3.3.0-alpha0-branch-bump
# Kata Containers 3.3.0-alpha0
2023-10-20 17:40:44 +02:00
GabyCT
8486283012 Merge pull request #8247 from GabyCT/topic/iperfudp
metrics: Add iperf udp benchmark
2023-10-20 09:21:37 -06:00
Fabiano Fidêncio
0fb69ddf6a release: Kata Containers 3.3.0-alpha0
- kata-deploy-stable: Switch to using the ubuntu based payload
- libs: protection: Fix typo in TDX output
- ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
- tests: Enable agent stability test
- docs: Fix paths to build kernel in SNP VMs documentation
- runtime-rs: ch: Add TDX CH features check
- runtime: Validate hypervisor section name in config file
- tests: query data from the OPA service
- release: tag_repos: Stop tagging the `tests` repo
- metrics: fixes common.sh function to always return true
- Memory footprint test removing trailing commas to make json results file valid
- policy: allow access to ReseedRandomDev
- runtime/kata-ctl: update dependencies
- runtime-rs : fix Nydus support for runtime-rs + Dragonball
- metrics: removal of reference in the documentation to the fio dax subtest.
- runtime-rs: ch: Detect Intel TDX version
- runitme-rs: use the same base64 as kata-runtime/direct-volume does
- tests: Enable scability test for stability CI
- runtime-rs: Add support for adding vfio device for cloud-hypervisor
- tests: Enable soak parallel stability test
- dragonball: vcpu metrics change to be recorded per vcpu
- ci: k8s: adapt gha-run.sh to run locally
- metrics: removes kata components and k8s deployment when test finishes
- GHA: fix up referenced yaml exceeding 20 limit problem
- gha: ci: Revert tracing test PR to unbreak CI
- runtime-rs: ch: Enable feature
- gha: ci: Port runk tests over
- ci: gha: Port tracing tests over
- Enable fio test using containerd client
- gha: Add stability tests workflow for gha
- gha: arm64: Ensure the builder is arm64-builder
- kata-deploy: Build kata-agent as we build all the other components
- versions: migrate out of k8s.gcr.io
- doc: Update crictl pod-config
- gha: Fix k0s deployment
- tests: Add stability test for kata CI
- docs: Update url in kata vra document
- gpu: Adding CDI support for cold and hot-plug of VFIO devices
- kata-deploy: build & ship the rust components from src/tools/
- metrics: Add latency value limits for kata CI
- runtime: fix reading cgroup stats of sandboxes
- Upgrade to Cloud Hypervisor v35.0
- ci: Port kata-monitor tests from Jenkins to GHA
- metrics: Fix latency yamls path
- metrics: Fix metrics README
- metrics: Fix C-Ray documentation
- runtime-rs: ch: Enable Intel TDX
- ci: k8s: crio: Follow up patches to have CRI-O also working as part of our CI
- metrics: Enable latency test in gha run script
- local-build: Fix .docker ownership before build-payload
- runtime-rs: Add network support for cloud-hypervisor
- osbuild: Reduce guest components binary size with strip
- gha: Add pandoc as a dependency for static checks
- ci: rootfs-image build-asset is failing
- feat(runtime-rs): introduce huge page mode to select VM RAM's backend
- clh: Direct IO support for block devices
- gha: Install hunspell for static checks
- ci: Trigger payload-after-push on workflow_dispatch
- ci: Actually enable the CRI-O tests
- protocol: remove gogoprotobuff tests
- ci: k8s: Also run tests with CRI-O
- runtime: support kernel params including spaces
- ci: kata-deploy: Fix runner name
- metrics: Enable parallel bandwidth iperf limit
- ci: kata-deploy: Enable all k8s flavours that we support
- ci: Create clusters in individual resource groups
- versions: Bump virtiofsd to v1.8.0
- clh: arm: Use static_sandbox_resource_mgmt=true
- Bump nydus versions and update nydus tests
- runtime/qemu: Rework QMP/HMP support
- clh:arm64: use arm AMBA UART for hypervisor debug
- ci: Use variable size of VMs depending on the tests running
- ci: Rework static checks
- runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
- ci: cache: Check the sha256sum of the components & fix ovmf-sev cache usage
- ci: cache: Use the artefacts stored in ghcr.io/kata-containers/cached-artefacts/${component}
- ci: Run some of the GARM tests in smaller instances
- ci: Reduce the size of the AKS VMs
- ci: cache: Allow pushing our artefacts to an OCI registry
- metrics: Add iperf value for cpu utilization
- ci: cache: Export env vars needed to use ORAS
- gha: vfio: Import test script
- tests: fix kernel and initrd annotations
- metrics: Add iperf bandwidth value for kata metrics
- metrics: Add Cassandra Metrics documentation
- metrics: Remove warning from metrics documentation
- ci: docker: nerdctl: Switch to tcp port 80 ping
- runtime: Naming conflict of network devices
- Remove gogoproto.nullable extension
- metrics: Ensure docker is running in init_env
- metrics: this PR skips the FIO test temprarily to fix issues
- ci: Add a very basic nerdctl sanity test
- runtime-rs: hypervisor: Remove debug kernel options
- versions: Bump rust version
- ci: Add a very basic docker sanity test
- dragonball: fix for non-deterministic builds
- runtime-rs: bring hybrid vsock devices in manager.
- ci: use github.ref_name instead of $GITHUB_REF_NAME
- ci: Add more target-branch related fixes
- ci: Fix target-branch usage
- agent: optimize the code of systemd cgroup manager
- gha: Manually rebase PR atop of the target branch before testing
- Update kernel to the latest LTS release (v6.1.52) and bring in erofs patches needed for the CC work
- kata-deploy: Fix aarch64 image build
- runtime: Fix more virtiofs args
- kata-deploy: Switch to an alpine image
- metrics: Use TensorFlow optimized image
- metrics: fix FIO test initialization
- ci: k8s: Add clean-up-garm argument for gha-run.sh
- ci: k8s: Second round of fix-ups with the devmapper CI
- metrics: re-enable memory-usage initialization step
- Dragonball: optimize the placement of dbs-upcall features
- ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
- ci: k8s: Add k8s devmapper tests (part 0)
- kata-deploy: Create kata-static.tar with correct ownership
- runtime: run prestart hooks before starting VM for FC
- metrics: Add write 95 percentile FIO value
- runtime: Allow virtio_fs_extra_args annotation
- packaging: do not install docker-compose-plugin for s390x|ppc64le
- runtime-rs: Fix volumes and rootfs cleanup issues
- metrics: Enable iperf benchmark on gha for kata metrics
- CI: switch static-checks-dragonball CI machines to Azure
- metrics: Add README for kata metrics report
- osbuilder: Remove chcon operation for guest SELinux
- kata-sys-util: protection: Update TDX checks
- Improve the way to clean up storage devices for sandbox
- agent: avoid possible leakage of storage device
- tests: add policy to existing tests
- gha: Rebase PR atop of the target branch before testing
- versions: Update alpine to its 3.18 version
- runtime: Fix data race in ioCopy
- metrics: Add grabdata script for metrics report
- Fixes tests on AMD machines
- metrics: Enable FIO limits for kata metrics
- metrics: Add metrics report script
- metrics: Fix memory inside limits for kata metrics
- metrics: fix parsing issue on memory-usage test
- dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
- tests: Add confidential test
- tdx: Update the components needed for using the 6.2 kernel stack
- tests: delete k8s deployment at the test's end
- tests: use unique test name
- runtime-rs: check peer close in log_forwarder
- gha: Avoid "fail-fast" in tests that are known to be flaky
- Refine storage device management for kata-agent
- metrics: Remove unused variable in tensorflow nhwc script
- kata-deploy: Don't try to remove /opt/kata
- metrics: Add TensorFlow ResNet50 FP32 benchmark
- gha: vfio: Run on Ubuntu 23.04 runner
- kata-agent: use default filemode for block device when it is set to 0
- kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
- libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
- local-build: Remove GID before creating group
- kata-deploy: Avoid failing on content removal
- runtime: fix image and initrd assets handling
- metrics: Add disk link to README
- metrics: Fix FIO path
- gha: capture additional kata-deploy output
- metrics: Use function from metrics common in pytorch script
- metrics: Enable kata runtime in K8s for FIO test.
- metrics: Fix README for pytorch
- metrics: Remove unused variable in tensorflow mobilenet script
- rootfs: agent: Policy support with AGENT_INIT=yes
- gha: k8s: kata-deploy: Move kata-deploy specific tests from integration/kubernetes to functional/kata-deploy
- metrics: Fix check results for tensorflow benchmark
- metrics: Add Tensorflow ResNet50 int8 benchmark
- kata-deploy: Properly create default runtime class
- agent: simplify error handling
- metrics: Fix MobileNet help me description
- gha: ci: Start running kata-deploy tests
- runk: Modify kill command's error message for containerd tests
- runtime-rs: add driver option
- gha: cri-containerd: Enable tests
- metrics: Rename tensorflow scripts
- gha: tests: Add kata-deploy functional tests -- Part 1
- agent: runtime: add Agent Policy feature
- runk: Support without pid ns
- metrics: Add Cassandra Kubernetes benchmark for kata metrics
- metrics: Add common functions to the common script
- metrics: fix the loop used to stop kata components
- docs: Remove installation step in virtcontainers doc
- Propogate secrets, config maps etc into guest if sharedFS not available
- kata-deploy: Preliminary k0s support
- gha: static-checks: Move to the Azure instances
- versions: Update firecracker version to 1.4.0
- agent: Allow clippy::redundant_clone in the unit tests
- agent: avoid creating new `Vec` instances when easily avoidable
- metrics: compute tensorflow statistics
- metrics: Add network nginx benchmark
- metrics: install kata once and run multiple checks
- ci: unencrypted-image: Fix build context
- ci: create-confidential-image: Add dependent actions
- Follow up fixes for https://github.com/kata-containers/kata-containers/pull/7596
- tests: Create image that will be used in the unencrypted confidential tests
- kata-deploy: Ensure we cover SHIMS / DEFAULT_SHIM as part of our tests
- tests: upgrade bats version
- Fix mimor bugs and improve coding stype of agent rpc/sandbox/mount
- deps: Bump dependent crate versions
- fix number of queues handling in dragonball share fs device
- runtime-rs: Introduce directly attachable network
- metrics: General improvements to mobilenet tensorflow test
- gha: Add iperf network metrics
- docs: Use control-plane term instead of master
- agent: avoid unnecessary calls to `Arc::clone`
- metrics: Add network latency test
- Image pulling on the host
- Use version 0.10.4 of `fuse-backend-rs`
- kata-deploy: Use host's systemctl
- release: Revert kata-deploy changes after 3.2.0-rc0 release
- metrics: stop kata components before start a metric test.
- runtime-rs: Add block device handling for cloud hypervisor

a93fdb014 kata-deploy-stable: Adapt to what we're using in the stable branch
36109da93 ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
d01daf749 tests: Adjust timeout for agent stability test
9b14dda14 libs: protection: Fix typo in TDX output
0e0867f15 runtime-rs: ch: Add TDX CH features check
409eadddb runtime-rs: ch: Improve readability of guest protection checks
82a0814fc tests: Enable agent stability test
32be8e3a8 tests: query data from the OPA service
b81c0a669 tests: encode policy file during test
4f9681b41 metrics: fixes common.sh function to always return true
2ef2b2a6d docs: Fix paths to build kernel in SNP VMs documentation
408b59c02 runtime-rs: fix bugs to support Nydus v5
157caea9f Revert "nydus: Temporarily skip tests on dragonball"
678fe3cd3 Dragonball: fix Nydus config serde problem
b6ec62138 policy: allow access to ReseedRandomDev
908519db9 metrics: skips docker restart when it is not installed or is masked.
c2763120a metrics: removing trailing comma characters from json file.
3e8cf6959 runtime: Validate hypervisor section name in config file
ef6388e81 tests: Remove unused function from scability test
fbc8f8f46 scripts: Use install_yq from the `kata-containers`  repo
65b1a2d27 release: tag_repos: Stop tagging / updating the `tests` repo
87b760f56 runtime-rs: ch: Detect Intel TDX version
73e81f5e3 runitme-rs: unify base64 encoding for direct-volume
c6463cb5a tests: Fix path for versions yaml for soak parallel test
89c9454fc metrics: removal of reference in the documentation to the dax test.
30ff58904 tests: Enable scability test for stability CI
8d6f7b909 runtime-rs: Add support for handling vfio device for cloud-hypervisor
e786b2b01 gha: Add install dependencies for stability tests
dbfe6512f dragonball: vcpu metrics change to be recorded per vcpu
fa60fbe02 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
500d1c5ce kata-ctl: update rustls-webpki/webpki dependency
d7660d82a runtime: unify gopkg.in/yaml.v3 to v3.0.1
fc9a107e8 runtime: unify swag and testify dependency
79ebb959c runtime: update runc dependency to v1.1.9
7f3e8bd65 runtime: unify golang.org/x/text to v0.7.0
df325ae37 runtime: update golang.org/x/net to v0.7.0
bba34910d metrics: stops kata components and k8s deployment when test finishes
84e3d884e gha: Add general dependencies to stability tests
dec3951ca tests: Add soak parallel stability test
0f04d527d tests: Enable soak parallel test
e669282c2 ci: k8s: set KUBERNETES default value
c30c3ff18 tests: run k8s-volume on a given node
666993da8 tests: run k8s-file-volume on a given node
3a00fc910 tests: exec_host() now gets the node name
61c9c17bf tests: add get_one_kata_node() to tests_common.sh
68f083c4d ci: k8s: set KATA_HYPERVISOR default value
6677a61fe ci: k8s: configurable deploy kata timeout
200e54292 ci: k8s: shellcheck fixes to gha-run.sh
4af78be13 kata-deploy: re-format kata-[deploy|cleanup].yaml
d54e6d9cd ci: k8s: run_tests() for kcli
c2ef1f0fb ci: k8s: add deploy-kata-kcli() to gh-run.sh
d2be8eef1 ci: k8s: add cleanup-kcli() to gha-run.sh
cbb9aa15b ci: k8s: set default image for deploy_kata()
89bef7d03 ci: k8s: create k8s clusters with kcli
954d40cce gha: combine coco jobs into a single yaml
b60e0a9b5 gha: combine basic amd64 jobs into a single yaml
e9bd85211 gha: ci: Revert tracing test PR to unbreak CI
b8a46a4b8 runtime-rs: ch: Enable feature
0f2dc8c67 gha: Add containerd stability tests to ci yaml
da91c9df8 ci: Port runk tests to this repo
7f2377276 ci: Add placeholder for runk tests
9205acc3d ci: Move tracing tests here
85d290a04 gha: Add stability gha run script
54f0c8f88 gha: Add stability tests workflow for gha
3bb2923e5 ci: Add placeholder for tracing tests
2c3bf406d ci: Create a function to install docker
119f03de2 gha: arm64: Ensure the builder is arm64-builder
8c498ef5e metrics: Use jq tool to pretty-print json metrics output
a2159a636 metrics: Enables FIO test for kata containers
70e7ec3e2 gha: Fix k0s deployment
560bbffb5 packaging: tools: Remove `set -x` leftover
18fa483d9 packaging: release: Mention newly added images
ca3b88837 packaging: tools: Fix container image env var name
5ca66795c packaging: Allow passing the TOOLS_CONTAINER_BUILDER
02acef957 gha: Build the kata-agent as part of our workflows
5208386ab packaging: Build the kata-agent
1727487ee agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
45c118883 packaging: Add get_agent_image_name()
0db8fb8f9 versions: migrate out of k8s.gcr.io
a1a054367 doc: Fix spelling
6339605a1 tests: Add general stability fixes
59ae24444 doc: Update crictl pod-config
fd19f4082 tests: Add agent stability test
215577032 tests: Add cassandra stress in stability tests
f2d3ea988 tests: Add stressng dockerfile for stability tests
6493aa309 tests: Add stressor CPU test for stability tests
ef68a3a36 metrics: Add stability test for kata CI
7c934dc7d gpu: Fix cold-plug of VFIO devices
8d66ef518 metrics: Increase qemu jitter value
5600e28b5 metrics: Increase jitter value for clh
a6b1f5e21 ci: Build src/tools components as part of our tests / releases
501a168a8 kata-deploy: Build components from src/tools
6ef42db5e static-build: Add scripts to build content from src/tools
4d08ec29b packaging: Add get_tools_image_name()
98097c96d packaging: Use git abbreviated hash
489caf1ad ci: kata-monitor: Move tests over
a3fb067f1 ci: Add placeholder for kata-monitor tests
57cb4ce20 ci: Make install_kata aware of container engines
de1eeee33 ci: Create a generic install_crio function
64a200085 ci: Add install_cni_plugins helper
8132fe15c ci: Modify containerd default config
8cb7df1be metrics: Add checkmetrics for latency test
e90440ae2 metrics: Add qemu latency value limit
a74a8f8a9 metrics: Add latency value limits for kata CI
d7def8317 metrics: Fix general check static warnings
928553d1b docs: Update url in kata vra document
b0a3293d5 runtime-rs: ch: Enable Intel TDX
523399c32 runtime-rs: ch: Add more consts
dea806581 runtime-rs: ch: Remove unused function
995f2c015 runtime-rs: ch: Only handle particular pending device types
b1b96a5c4 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
9ac29b8d3 metrics: Add init_env function to latency test
dfd0c9fa9 runtime: clh: Re-generate the client code
8f9f087e3 versions: Upgrade to Cloud Hypervisor v35.0
81c8babca metrics: Fix latency yamls path
481573682 metrics: Fix C-Ray documentation
ef63d67c4 ci: crio: Trail '\r' from exec_host() output
74c12b292 ci: crio: Enable default capabilities
358dc2f56 kata-deploy: Fix CRI-O detection
ebaa4fa4c ci: crio: Pass `-y` to apt
97e73b223 metrics: Fix spelling warnings
36c8cd6f1 metrics: Fix metrics README
15425a2b8 local-build: Fix .docker ownership before build-payload
13ca7d9f9 gha: Add pandoc as a dependency for static checks
08bc8e4db metrics: Add latency benchmark for gha
6776b55d7 metrics: Enable latency test in gha run script
94e2ccc2d runtime: fix reading cgroup stats of sandboxes
d507d189b fc: Add support for noflush cache option
2ca781518 clh: Direct IO support for block devices
0c95697cc ci: Trigger payload-after-push on workflow_dispatch
28cbc3b51 ci: rootfs-image build-asset is failing Fixes: #8027
87a861648 gha: Install hunspell for static checks
8c3c50ca8 ci: Actually enable the CRI-O tests
3a6510ad6 osbuild: Reduce guest components binary size with strip
07a6e63a6 ci: k8s: rke2: Use sudo to call systemd
03b82e848 ci: k8s: Add a CRI-O test
d7105cf7a ci: k8s: Add a method to install CRI-O
54c0a471b ci: k8s: k0s: Allow passing parameters to the k0s installer
730ef5169 deps: updating dependencies
3a2c83d69 ci: kata-deploy: Fix runner name
82ff2db46 runtime: support kernel params including spaces
604a9dd67 protocol: remove gogoprotobuff tests
f7fa7f602 ci: Enable kata-deploy tests for all the supported k8s flavours
2c908b598 ci: kata-deploy: Add the ability to deploy rke2
eaf616491 ci: kata-deploy: Add the ability to deploy k0s
001525763 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
bf2cb0228 ci: kata-deploy: Expland tests to run on k0s / rke2
b12b9e188 ci: kata-deploy: Add placeholder for tests on GARM
9e1fb8a96 ci: kata-deploy: Export KUBERNETES env var
09cc0ed43 ci: Move deploy_k8s() to gha-run-k8s-common.sh
486fe14c9 ci: Properly set K8S_TEST_UNION
d9ef1352a ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
68267a399 ci: Create clusters in individual resource groups
9aa8d1c91 metrics: Add parallel bandwidth limit for qemu
44c7c082d versions: Bump virtiofsd to v1.8.0
af59d4bf4 metrics: Enable parallel bandwidth iperf limit
aba36ab18 nydus: Temporarily skip tests on dragonball
b8a8dfcd1 nydus: Use `kata-${KATA_HYPERVISOR}` instead of `kata`
f6df3d6ef static-build: Fix arch error on nydus build
2f9c9e2e6 tests: nydus: Update nydus tests
c9a4e7e46 versions: Bump nydus and nydus-snapshotter to its latest release
b73bde320 gha: nydus: Populate run()
b3904a1a3 gha: nydus: Populate install_dependencies()
d2b3b67f5 gha: nydus: Actually install kata when `install-kata` is called
0ec00ad42 gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
568439c77 tests: nydus: Add timeout to the crictl calls
5ac3b76eb tests: nydus: Add uid / namespace to the nydus container / sandbox
376574a16 tests: nydus: Decorate some calls with `sudo`
4290fd4b6 tests: nydus: Adapt "source ..." to GHA
a84efa3e8 tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
56a14b395 tests: common: Add install_nydus_snapshotter()
b6563783e tests: common: Add install_nydus()
72599f191 clh: arm: Use static_sandbox_resource_mgmt=true
1f16b6627 runtime/qemu: Rework QMP/HMP support
8b1e9b0c7 ci: static-checks: Clean up static-checks job
2c5ca2eaf ci: static-checks: Run tests depending on KVM
509c309ab ci: static-checks: Move "sudo make test" to the new test matrix
4e963cedf ci: static-checks: Move "make test" to the new test matrix
08f2e5ae0 runtime-rs: Ensure static-checks-build is a dep of `make test`
2bc3a616a kata-ctl: Use `loop` instead of `kvm` module in tests
46daddc50 kata-ctl: Ensure GENERATED_CODE is a dep of `make test`
ec826f328 agent: Ensure GENERATED_CODE is a dep of `make test`
1d32410a8 ci: install_libseccomp: Do not depend on the tests repo
bf888b9a5 ci: static-checks: Move "make check" to the new test matrix
473ec8780 kata-ctl: Add `kata-types` to the Cargo.lock file
ea19549a9 kata-ctl: Ensure GENERATED_CODE is a dep of `make check`
e12577586 tests: install_rust: Also install clippy
e2c61a152 ci: static-checks: Move vendor check to its own job
6794d4c84 tests: Move install_rust.sh from the tests repo
e64508c30 tests: install_go: Remove tests repo dependency
11dff731b tests: Move functions from kata_arch script here
75c974c80 ci: static-checks: Move kernel config check to its own job
9c233bb9e test: Add test to verify try_from for clh Netconfig
c69a1e33b ci: Use variable size of VMs depending on the tests running
9049d311d runtime-rs: Add network support for cloud-hypervisor
eecd5bf2a ci: cache: Fix ovmf-sev cache
86c41074b ci: cache: Check the sha256sum of the component
460988c5f ci: cache: Remove the script used to cache artefacts on Jenkins
4533a7a41 ci: cache: Also store the ${component} sha256sum
eccc76df6 ci: cache: Use the cached artefacts from ORAS
7f5e77bcb kernel: enable Arm pl011 support
241c355e0 clh:arm64: use arm AMBA uart for hypervisor debug
094b6b2cf ci: k8s: Temporarily disable tests that require a bigger VM instance
d0c257b3a ci: cache: Push cached artefacts to ghcr.io
108f1b60d kata-deploy: Generate latest_{artefact,image_builder} files
be2eb7b37 ci: cache: Install ORAS in the kata-deploy binaries builder container
fb24fb0dc ci: k8s: devmapper: Use a smaller / cheaper VM instance
1daf02f5d ci: nydus: Use a smaller / cheaper VM instance
e60d81f55 ci: nerdctl: Use a smaller / cheaper VM instance
4db416997 ci: docker: Use a smaller / cheaper VM instance
32841827b ci: cri-containerd: Use a smaller / cheaper VM instance
92fff129f ci: k8s: Don't set cpu limit request for k8s-inotofy test
faf98c062 ci: Reduce the size of the AKS VMs
adc18ecdb ci: cache: For consistency, read all used env vars
c7a851efd ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
6bd15a85d ci: cache: Export env vars needed to use ORAS
cd4fd1292 metrics: Add iperf cpu utilization limit for qemu
df5cd10ea metrics: Add iperf value for cpu utilization
a96050a7a tests: Apply timeout to 'ctr t kill'
9d9303678 tests/vfio: Bump VM image to Fedora 38
faee59b52 tests/vfio: Accept single device in vfio group for CLH
df3dc1105 tests/vfio: Get rid of sync's
7211c3dcc gha: vfio: Set test timeout to 15m
1b02f89e4 packaging: kernel: Enable VIRTIO_IOMMU on x86_64
3a1db7a86 runtime: clh: Support enabling iommu
9f1a42c6c tests/vfio: Give commands 30s to execute
b46b0ecf8 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
bfc93927f runtime: Remove redundant check in checkPCIeConfig
7c4e73b60 runtime: Add test cases for checkPCIeConfig
fc51e4b9e runtime: Check config for supported CLH (cold|hot)_plug_vfio values
509771e6f runtime: clh: Add hot_plug_vfio entry to config
5f6475a28 tests/vfio: Gather debug info and disable tdp_mmu
8fffdc81c tests/vfio: Capture journal from vm
df815087e tests/vfio: Change to get the test working in GHA
a92ddeea1 tests/vfio: Move dependency installation to gha-run.sh
5a551a85b gha: vfio: Import jobs scripts from tests repo
49e2fa189 metrics: Increase jitter value for qemu
49234433a metrics: Increase value limit for jitter in clh
813bfdec0 ci: docker: nerdtl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
46bc0b1c0 ci: nerdctl: Create the containerd config
13968aa7f ci: nerdctl: Switch to tcp port 80 ping
e0c811678 ci: docker: Switch to tcp port 80 ping
1636abbe1 runtime: issue with non-empty []Endpoint in RemoveEndpoints
0aa073967 metrics: Add iperf bandwidth value for qemu
c0ad91476 tests: fix kernel and initrd annotations
615c1cbf1 metrics: Add iperf bandwidth value for kata metrics
d53eb73ee metrics: Ensure docker is running in init_env
ad08321b8 metrics: Add Cassandra Metrics documentation
a58ea6659 metrics: this PR skips the FIO test temprarily to fix issues
f536ef5ce ci: docker: Also run the smoke test with runc
c83f167c5 ci: docker: Run the tests after the kata-static is created
12d833d07 ci: Add a very basic nerdctl sanity test
348b8644d ci: Add a very basic docker sanity test
a75fd5eb8 runk: Fix rust unecessary mut error
a31c14517 kata-ctl: useless-vec warning
c8419fc3b kata-ctl: Resolve non-minimal-cfg warning
3eaf68d95 agent-ctl: Allow clippy lint
1d8b78959 runtime-rs: Fix useless-vec warning
99f3d69e9 runtime-rs: Remove mut
16fbc27b0 dragonball: Allow ambiguous-glob-reexports
bbf191951 dragonball: Resolve non-minimal-cfg warning
75cfdd5d5 agent: config: Allow clippy lint
f3a0fd590 agent: config: Fix useles-vec warning
9e423bd3d libs: Fix clippy unnecesary hashes error
444395050 versions: Bump rust version
a16b0962b chore(cargo): update cargo lock
ca4b6b051 runtime: Naming conflict of network devices
202049f35 feat(runtime-rs): introduce huge page type to select VM RAM's backend
f811b064c ci: use github.ref_name instead of $GITHUB_REF_NAME
6d795c089 ci: Add more target-branch related fixes
8509c3187 ci: Fix target-branch usage
060499dca metrics: Remove warning from metrics documentation
c0f697fcc runtime: Allow kernel_params annotation
b03e49794 dragonball: fix for non-deterministic builds
976d10150 runtime-rs: hypervisor: Remove debug kernel options
fde34610c kernel: Add erofs patches needed for CC related work
dc6a4588a versions: Bump kernel to the latest LTS release (6.1.52)
52f6449b7 kata-manager: Remove initcall_debug kernel option
8b4a0b368 kata-deploy: Remove curl after it's used
139c7f03a kata-deploy: Fix aarch64 image build
470d06541 agent: optimize the code of systemd cgroup manager
bd24afcf7 gha: Manually rebase PR atop of the target branch before testing
72c510d05 runtime/virtiofsd: Drop all references to "--cache=none"
ead724bec protocol: removing gogo.nullable feature
d8e4bb985 protocol: remove unused PROTO_FILE env
5e1106a77 protocol: remove unused import_path
87accaaec protocol: use workdir during build
711a7ed96 protocol: remove mapping definitions
8db84c1bd protocol: force GOPATH to be set
68156d77a protocol: breaking lines to improve readability
670a8e9c7 kata-deploy: Switch to an alpine image
9d74b7ccc k8s: ci: Skip "Pod quota" test with firecracker
f6cd3930c ci: k8s: Remove useless skip statement from tests
3cc20b47a ci: k8s: Also check for "fc" (for firecracker)
b5bad3cb0 ci: k8s: Add clean-up-garm argument for gha-run.sh
aaec5a09f ci: k8s: devmapper tests should be using ubuntu 20.04
27fa7d828 ci: k8s: Add a kata-deploy-garm target
fa62a4c01 ci: k8s: Export KUBERNETES env var
8c9380a79 ci: k8s: Install bats on GARM runners
3de23034f ci: k8s: Wait some time after restarting k3s
adfea55b8 metrics: fix FIO test initialization
2df183fd9 ci: k8s: Append, instead of overwrite, the devmapper config
369a8af8f ci: k8s: Decrease k3s sleep from 4 to 2 minutes
ada65b988 ci: k8s: Use vanilla kubectl with k3s
ad45ab5d3 ci: k8s: Ensure k3s is deploy with --write-kubeconfig-mode=644
028a97e0d ci: k8s: Use the proper command for sleep
3a427795e metrics: Use TensorFlow optimized image
8d99972a8 ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
deed1b927 Dragonball: optimize the placement of dbs-upcall features
0e8bd50cb ci: k8s: Add k8s devmapper tests (part 0)
b28b54df0 ci: k8s: Add a function to configure devmapper for containerd
54f711721 ci: k8s: Add a function to deploy k3s
81536f21a runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
b1dd09a4d runtime: Allow virtio_fs_extra_args annotation
2efda20c7 packaging: do not install docker-compose-plugin for s390x|ppc64le
438fbf966 metrics: Add write 95 percentile for FIO for qemu
024b4d2ff metrics: Add write 95 percentile FIO value
e98e5cdea metrics: Add checkmetrics to gha run script
c1edfe551 metrics: Add checkmetrics value for qemu for iperf
6a79ecedf metrics: Add jitter value for clh
f609a9a75 metrics: Add test selector to iperf metrics
5b8db3042 metrics: Enable iperf benchmark on gha for kata metrics
60f733d30 CI: switch static-checks-dragonball CI machines to Azure
7870b33a2 runtime-rs: bring hybridVsock devices in manager.
18c94ebbe kata-deploy: Create kata-static.tar with correct ownership
57e7bf14a agent: refine StorageDeviceGeneric::cleanup()
53edb1937 agent: implement StorageDeviceGeneric::cleanup()
0c63453e2 types: make StorageDevice::cleanup() return possible error code
3a3d77b3b agent: move StorageDeviceGeneric from kata-types into agent
b151cfd14 metrics: re-enable memory-usage initialization step
f3e1a6a94 osbuilder: alpine: Change mirror
ac612aef5 osbuilder: alpine: Match the version on versions.yaml
9cd706d1c agent: avoid possible leakage of storage device
bf21411e9 tests: add policy to k8s tests
d0e061067 runtime: config: use the SEV initrd for SNP
67fed26f1 runtime: Use TDX image with in the qemu-tdx config
ac939c458 gha: Rebase atop of the target branch
82cd14ba3 versions: Update alpine to its 3.18 version
666882575 metrics: Add grabdata script for metrics report
c290eaed8 kata-sys-util: protection: Update TDX checks
d7a996c68 gha: Update to checkout@v3 action
c2ba29c15 runtime: Fix data race in ioCopy
211de08d9 osbuilder: Remove chcon operation for guest SELinux
9f21fa9b3 metrics: Add report generator link to general documentation
c0ed5ea0a metrics: Add README for kata metrics report
a7b59a5bf metrics: Add limit for 90 percentile for qemu value
99db6568e metrics: Add limit for write 90 percentile value for clh
6e06392c5 metrics: Enable FIO limits for kata metrics
2e4c87472 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
21204caf2 runtime: fail early when starting docker container with FC
32fd01371 runtime: run prestart hooks before starting VM for FC
00e7ffd98 tests: check vmx only on Intel machines
c8dd3c073 metrics: Fix memory footprint qemu limit
8877ec62f metrics: Fix memory inside limits for kata metrics
80146f207 tests: Fixes cpuType check on AMD machines
7e364716d metrics: Add test setup details to metrics report
17dc1b976 metrics: Add boot lifecycle times to metrics report
3b0d6538f metrics: Add memory inside container to metrics report
79fbb9d24 metrics: Add scaling system footprint in metrics report
8e6d4e6f3 metrics: Add metrics reportgen
139ffd4f7 metrics: Add report file titles
878d1a2e7 metrics: Generate PNGs alongside the PDF report
fce248797 metrics: Add metrics report R files
08812074d metrics: Add report dockerfile
69781fc02 metrics: Add metrics report script
e286e842c tests: Expand confidential test to support TDX
e31f099be tests: Expand confidential test to support SNP
c3b9d4945 tests: Add confidential test for SEV
538c965c2 metrics: fix parsing issue on memory-usage test
3818bf331 local-build: Remove $HOME/.docker/buildx/activity/default
d1b54ede2 qemu: tdx: Workaround SMP issue with TDX 1.5
1e34220c4 qemu: tdx: Adapt to the TDX 1.5 stack
8115a0522 versions: tdx: Update Kernel to 6.2 + TDX
ec18180f3 versions: tdx: Update TDVF to the "edk2-stable202302"
9803b2428 versions: tdx: Update QEMU to v7.2 + TDX v1.10
dffc16e5b runtime-rs: check peer close in log_forwarder
aaa5ab126 agent: simplify storage device by removing StorageDeviceObject
fb49d5d7c gha: Avoid "fail-fast" in tests that are known to be flaky
183f51d6f tests: use unique test name
6a974679f tests: delete k8s deployment at the test's end
32a778b6d metrics: Remove unused variable in tensorflow nhwc script
d8f3ce649 kata-deploy: Don't try to remove /opt/kata
936e8091a gha: vfio: Run on Ubuntu 23.04 runner
0e7248264 agent: move storage device related code into dedicated files
268e84655 runtime-rs: Fix volumes and rootfs cleanup issues
8f49ee33b agent: refine storage related code a bit
60ca12ccb agent: switch to new storage subsystem
fcbda0b41 kata-types: introduce StorageDevice and StorageHandlerManager
b03b1f613 agent: simplify the way to manage storage object
8392c71bf sys-util: support more mount flags in parse_mount_options()
c00d8f3d4 agent: use create_mount_destination() from kata-sys-util
5e867f053 types: add more mount related constants
880e6c9a7 agent: use function from kata-sys-utils to reduce code
3b881fbc0 local-build: Remove GID before creating group
959ca4944 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
4b7d72c4a metrics: Add TensorFlow ResNet50 FP32 benchmark
5cba38c17 kata-deploy: Avoid failing on content removal
18d42da21 runtime/fc: fix image/initrd annotation handling
9fda7059a runtime/clh: fix image/initrd annotation handling
1a0092d63 runtime/qemu: fix image/initrd annotation handling
22d8f335d libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
8afd158ce metrics: Add disk link to README
40914b25d kata-agent: use default filemode for block device when it is set to 0
eee2ee6ee metrics: Fix FIO path
39bc3488f metrics: Use function from metrics common in pytorch script
400eb8874 gha: capture additional kata-deploy output
4aee3eade kata-types: implement serde methods for KataVirtualVolume
b875e3932 kata-types: validate KataVirtualVolume object
fa2fdc105 kata-types: implement two conversion helpers for KataVirtualVolume
6326af20e kata-types: introduce KataVirtualVolume
c8b43f8b3 metrics: Fix README for pytorch
fb571f8be metrics: Enable kata runtime in K8s for FIO test.
cb056f8cb rootfs: agent: Policy support with AGENT_INIT=yes
85c02828e metrics: Update tensorflow name in gha run script
e8a511934 metrics: Fix check results for tensorflow benchmark
2d896ad12 gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
4ffc2c86f gha: kata-deploy: Add the first kata-deploy test
8616c050a metrics: Remove unused variable in tensorflow mobilenet script
285e616b5 tests: common: Ensure test_type is used as part of the cluster's name
790bd3548 tests: commob: Don't fail if yq is not part of the cache
ce6adecd0 gha: kata-deploy: Add run-kata-deploy-tests.sh
cfc29c11a gha: k8s: Stop running kata-deploy tests as part of the k8s suite
f4dd15286 tests: k8s: Call ensure_yq() in setup.sh
339569b69 kata-deploy: Properly create default runtime class
2a491e9b1 metrics: Fix MobileNet help me description
d19a75e80 gha: ci: Start running kata-deploy tests
d90f7ac68 runtime-rs: add unit test for block driver
e44919f0d runtime-rs: add load_test_config for unit test
7f48a6937 runtime-rs: add driver option
bade6a5c3 docs: Fix TensorFlow word across the document
1a1b20776 docs: Add Tensorflow Resnet50 documentation
24baededc metrics: Add Dockerfile for ResNet50 int8
6d971ba8d metrics: Add Tensorflow ResNet50 int8 benchmark
25d151bd1 runk: Modify kill command's error message for containerd tests
b3592ab25 gha: cri-containerd: Enable tests
84dd02e0f gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
b29782984 gha: cri-containerd: Show pod before deleting it
ae0930824 gha: cri-containerd: Print kata logs in case of error
6c8b2ffa6 gha: cri-containerd: Group containerd logs
9e898701f gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
76dac8f22 agent: simplify error handling
18a7fd8e4 metrics: Rename tensorflow scripts
e55fa93db tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
d9ee17aae tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
ab829d103 agent: runtime: add the Agent Policy feature
831e73ff9 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
af1b46bbf tests: Add gha-run-k8s-common.sh
416445e7e docs: Remove installation step in virtcontainers doc
72cbcf040 kata-deploy: Add k0s support
767434d50 metrics: fix the loop used to stop kata components #7629
5d0f0d43c metrics: Add cassandra statefulset yaml
c1dcc1396 metrics: Add cassandra service yaml
2297a0d1c metrics: Add block loop pvc yaml for cassandra
e3d511946 metrics: Add block loop pv yaml for cassandra test
989027159 metrics: Add block loop pvc for cassandra test
349b89969 metrics: Add Cassandra Kubernetes benchmark for kata metrics
c52d09052 gha: static-checks: Move to the Azure instances
8815ed066 runtime: Remove config warnings
afe1a6ac5 agent: support copying of directories and symlinks
ab13ef87e runtime: propagate configmap/secrets etc changes for remote-hyp
c074ec4df runtime: Copy shared files recursively
fdcd52ff7 metrics: Add check containers are running in tensorflow mobilenet
36337ee14 metrics: Add check containers are up in tensorflow script
f700f9b0b metrics: Remove unused variable in tensorflow script
833cf7a68 metrics: Add check containers are running function
918c78308 metrics: Add check containers are up in tensorflow mobilenet script
9d57a1fab metrics: Use check containers are up in tensorflow script
1c84680d8 metrics: Add check containers are up in common script
d3e57cf45 metrics: Use collect_results function in tensorflow mobilenet test
286de046a metrics: Remove collect results function definition
9879709aa metrics: Add common functions to the common script
4746fa3da docs: Specify supported Firecracker version using `versions.yaml`
cc922be5e versions: Update firecracker version to 1.4.0
39e67b06e dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
473b0d3a3 metrics: compute tensorflow statistics
03d1fa67b ci: unencrypted-image: Fix build context
eb463b38e ci: unencrypted-image: Don't fail to build on s390x
a2d731ad2 ci: create-confidential-image: Add dependent actions
d1a629622 metrics: Add nginx documentation to network README
498f7c054 metrics: Add nginx kubernetes yaml
f8a5255cf metrics: Add network nginx benchmark
43fe5d1b9 ci: k8s: tees: Ensure PR_NUMBER is exported
54f6a7850 ci: {{ pr-number }} should be {{ inputs.pr-number }}
034d7aab8 tests: k8s: Ensure the runtime classes are properly created
fac8ccf5c ci: Add build-and-publish-tee-confidential-unencrypted-image
ab5f603ff ci: k8s: Add the image used for unencrypted confidential tests
1e8fe131b k8s: tests: Take advantage of `SHIMS` and `DEFAULT_SHIM` env vars
729b2dd61 agent: avoid creating new `Vec` instances when easily avoidable
aeaec9dae tests: upgrade bats version
e66496986 metrics: install kata once and run multiple checks
baabfa9f1 agent: refine implementation of mount related code
98ba211a3 agent: fix a bug in update_ephemeral_mounts()
5333618d7 agent: make add_storage() take &[Storage] instead of Vec<Storage>
37f34781d agent: simplify function online_cpu_memory()
d3c542237 agent: refine style of code related to sandbox
71a9f6778 agent: avoid unwrap() in function do_remove_container()
84badd89d agent: avoid clone objects when possible
b23c5ed15 deps: Bump dependent crate versions
863283716 metrics: General improvements to mobilenet tensorflow test
3c319d8d4 metrics: Add iperf to gha run script
5b5caf890 gha: Add iperf network metrics
66db5b535 metrics: Add latency test to network README
c36572418 agent: avoid unnecessary calls to `Arc::clone`
4fbe0a3a5 runtime: bind-mount mounted block device into container
7e1b1949d runtime: add support for kata overlays
6c867d9e8 agent: add io.katacontainers.fs-opt.overlay-rw option
6163c3565 agent: skip mount options that start with "io.katacontainers."
b2ff97aa0 dragonball: use version 0.10.4 of `fuse-backend-rs`
845eeb4d7 agent: Allow clippy::redundant_clone in the unit tests
1163fc9de release: Revert kata-deploy changes after 3.2.0-rc0 release
3958a39d0 runtime-rs: Introduce directly attachable network
1e15369e5 metrics: Improve naming testing containers in launch times test
5dbe88330 metrics: Clean kata components before start a metric test.
3b45060b6 metrics: Add latency server yaml
9bb8451df metrics: Add latency client yaml
64fdb9870 metrics: Add network latency test
a81ad3b58 runtime-rs: Add block device handling in cloud hypervisor
3230dec95 kata-deploy: Use host's systemctl
1b21a4624 docs: Use control-plane term instead of master
28e5e9c86 runtime-rs: fix number of queues handling in dragonball share fs device
f1d8de9be runk: Allow runk to launch a container without pid namespace

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 14:44:50 +02:00
Fabiano Fidêncio
f6e20ac230 Merge pull request #7195 from fidencio/topic/adapt-kata-deploy-stable-to-using-ubuntu
kata-deploy-stable: Switch to using the ubuntu based payload
2023-10-20 14:42:04 +02:00
Fabiano Fidêncio
a93fdb014b kata-deploy-stable: Adapt to what we're using in the stable branch
This is basically to make sure that folks trying to use the kata-deploy
script from the main branch, to deploy **stable** kata-deploy images, do
not have a hard time.

Fixes: #7194

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 12:58:42 +02:00
James O. D. Hunt
79ed501a20 Merge pull request #8258 from jodh-intel/protection-fix-tdx-typo
libs: protection: Fix typo in TDX output
2023-10-20 08:36:22 +01:00
Dan Mihai
52aaf10759 agent: no endpoint blocking from agent-config.toml
Remove the ability to block access to kata agent endpoints by using
agent-config.toml. That functionality is now implemented using the
Agent Policy feature (#7573).

The CCv0 branch relied on blocking endpoints using agent-config.toml
but will set-up an equivalent default policy file instead (#8219).

Fixes: #8228

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-20 02:26:54 +00:00
Fabiano Fidêncio
468a3e4b53 Merge pull request #8260 from gkurz/fix-8259
ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
2023-10-19 23:58:22 +02:00
GabyCT
5d6bdbd0a1 Merge pull request #8241 from GabyCT/topic/enableagenttest
tests: Enable agent stability test
2023-10-19 14:12:49 -06:00
Greg Kurz
36109da93f ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
Fixes #8259

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-10-19 21:53:23 +02:00
GabyCT
dc295600b8 Merge pull request #8157 from GabyCT/topic/fixsevdoc
docs: Fix paths to build kernel in SNP VMs documentation
2023-10-19 11:42:03 -06:00
Gabriela Cervantes
d01daf749b tests: Adjust timeout for agent stability test
This PR adjusts the timeout for the agent stability test
to run on the gha.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-19 16:55:23 +00:00
James O. D. Hunt
9b14dda147 libs: protection: Fix typo in TDX output
Add the missing closing bracket to the output of the TDX details,
so rather than:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0"
:                                                                    ^
:                                                           Missing ')' !
```

... we now have:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0)"
:                                                                    ^
:                                                                   Aha!
```

Added a unit test for this scenario.

Fixes: #8257.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-19 16:06:08 +01:00
James O. D. Hunt
9336e2e492 Merge pull request #8155 from jodh-intel/runtime-rs-check-ch-tdx-build-feature
runtime-rs: ch: Add TDX CH features check
2023-10-19 14:13:08 +01:00
James O. D. Hunt
048cc70654 Merge pull request #8213 from jodh-intel/validate-hypervisor-cfg-name
runtime: Validate hypervisor section name in config file
2023-10-19 07:40:58 +01:00
Dan Mihai
99db6dff24 Merge pull request #8230 from microsoft/danmihai1/opa-data
tests: query data from the OPA service
2023-10-18 15:32:23 -07:00
James O. D. Hunt
0e0867f15d runtime-rs: ch: Add TDX CH features check
If you attempt to create a container (a TD) on a TDX system using a
custom build of Cloud Hypervisor (CH) that was not built with the `tdx`
CH feature, Kata will report the following, somewhat cryptic, CH error:

```
ApiError(VmBoot(InvalidPayload))
```

Newer versions of CH now report their build-time features in the ping
API response message so we now use that, if available, to detect this
scenario and generate a user-friendly error message instead.

This changes improves the readability of `handle_guest_protection()` and
adds a couple of additional tests for that method.

Fixes: #8152.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:07:39 +01:00
James O. D. Hunt
409eadddb2 runtime-rs: ch: Improve readability of guest protection checks
Improve the way `handle_guest_protection()` is structured by inverting
the logic and checking the value of the `confidential_guest` setting
before checking the guest protection. This makes the code easier to
understand.

> **Notes:**
>
> - This change also unconditionally saves the available guest protection
>   (where previously it was only saved when `confidential_guest=true`).
>   This explains the minor unit test fix.
>
> - This changes also errors if the CH driver finds an unexpected
>   protection (since only Intel TDX is currently tested).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:06:02 +01:00
Greg Kurz
9863805752 Merge pull request #8201 from fidencio/topic/release-tag-repo-stop-tagging-the-tests-repo
release: tag_repos: Stop tagging the `tests` repo
2023-10-18 18:10:39 +02:00
Gabriela Cervantes
a58afe70b8 metrics: Add iperf udp benchmark
This PR adds the iperf udp benchmark for bandwdith measurement
for network metrics.

Fixes #8246

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-18 15:52:03 +00:00
Gabriela Cervantes
82a0814fc2 tests: Enable agent stability test
This PR enables the agent stability test for stability gha CI.

Fixes #8240

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-17 15:16:06 +00:00
Dan Mihai
32be8e3a87 tests: query data from the OPA service
Add example for querying json data from the OPA service.

Fixes: #8231

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-17 13:31:43 +00:00
David Esparza
d90d1c5c10 Merge pull request #8243 from dborquez/fix_systemctl_masked_query
metrics: fixes common.sh function to always return true
2023-10-16 20:17:24 -06:00
Dan Mihai
b81c0a6693 tests: encode policy file during test
Encode policy file during test - easier to understand than hard-coding
the encoded file contents.

Fixes: #8214

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-16 15:58:12 -07:00
David Esparza
4f9681b411 metrics: fixes common.sh function to always return true
This PR corrects the init env() helper function, to make that
systemctl always returns true when enumerating masked services,
and preventing the test from failing

Fixes: #8242

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-16 15:57:57 -06:00
David Esparza
59e8b1d5a7 Merge pull request #8206 from dborquez/memory_footprint_test_removing_trailing_commas_to_make_json_results_file_valid
Memory footprint test removing trailing commas to make json results file valid
2023-10-16 14:31:28 -06:00
Gabriela Cervantes
2ef2b2a6dc docs: Fix paths to build kernel in SNP VMs documentation
This PR fixes the correct path to setup, build and install properly
the kernel for snp.

Fixes #8156

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-16 20:09:02 +00:00
Fabiano Fidêncio
db37692f36 Merge pull request #8226 from microsoft/danmihai1/policy-typo
policy: allow access to ReseedRandomDev
2023-10-16 19:17:31 +02:00
Peng Tao
45e82b6581 Merge pull request #8192 from bergwolf/github/deps
runtime/kata-ctl: update dependencies
2023-10-16 16:39:17 +08:00
Chao Wu
44e602d69a Merge pull request #8014 from openanolis/chao/fix_nydus_break
runtime-rs : fix Nydus support for runtime-rs + Dragonball
2023-10-16 01:30:22 -05:00
Chao Wu
408b59c02c runtime-rs: fix bugs to support Nydus v5
1. enable virtio-fs-pro in Dragonball to have the ability to process nydus backend registry
2. change passthrough for rw layer's readonly config to false to have the accurate read write ability.

Fixes:#8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Chao Wu
157caea9fe Revert "nydus: Temporarily skip tests on dragonball"
This reverts commit aba36ab188.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Chao Wu
678fe3cd31 Dragonball: fix Nydus config serde problem
Since Nydus snapshotter has been updated in previous commits, there is a
problem that the config passthrough to Dragonball during mount_rafs is
RafsConfig instead of ConfigV2, but Dragonball could only serde ConfigV2
so it will panic.

We need to add the support for RafsConfig

Fixes:#8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Dan Mihai
b6ec621389 policy: allow access to ReseedRandomDev
Allow access to the ReseedRandomDev endpoint by default. Using false
for ReseedRandomDevRequest was unintended.

Fixes: #8225

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-13 21:18:27 +00:00
David Esparza
908519db9d metrics: skips docker restart when it is not installed or is masked.
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.

Fixes: #8218

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-13 18:02:00 +00:00
David Esparza
c2763120aa metrics: removing trailing comma characters from json file.
This PR removes trailing commas so that the json results
file is valid.

This PR also changes the way data results are collected by
terating through the array of memory values to calculate
their average.

Fixes: #8204

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-13 18:00:57 +00:00
GabyCT
1974d13122 Merge pull request #8188 from dborquez/metrics_add_fio_readme.md
metrics: removal of reference in the documentation to the fio dax subtest.
2023-10-12 10:53:55 -06:00
James O. D. Hunt
3e8cf6959c runtime: Validate hypervisor section name in config file
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gives a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```

The hypervisor name is now validated so that the behaviour becomes:

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```

Fixes: #8212.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-12 13:53:37 +01:00
James O. D. Hunt
45d28998d9 Merge pull request #8149 from jodh-intel/runtime-rs-ch-detect-tdx-version
runtime-rs: ch: Detect Intel TDX version
2023-10-12 10:09:42 +01:00
QuanweiZhou
f904e64155 Merge pull request #8179 from Apokleos/directvol-urlEncode
runitme-rs: use the same base64 as kata-runtime/direct-volume does
2023-10-12 09:04:11 +08:00
GabyCT
bc6eadf4f6 Merge pull request #8197 from GabyCT/topic/enablescability
tests: Enable scability test for stability CI
2023-10-11 16:41:46 -06:00
Archana Shinde
f814b1a0a2 Merge pull request #8073 from amshinde/runtime-rs-vfio-clh
runtime-rs: Add support for adding vfio device for cloud-hypervisor
2023-10-11 15:01:55 -07:00
Gabriela Cervantes
ef6388e815 tests: Remove unused function from scability test
This PR removes an unused function from scability test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-11 19:44:21 +00:00
Fabiano Fidêncio
fbc8f8f466 scripts: Use install_yq from the kata-containers repo
As the file is already part of the kata-containers repo, and the tests
repo is about to become read-only, we're good to drop the tests
references from here and use everything coming from the
`kata-containers` repo instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-11 12:52:55 +02:00
Fabiano Fidêncio
65b1a2d277 release: tag_repos: Stop tagging / updating the tests repo
As we've moved all the tests to the `kata-containers` repo, the `tests`
repo will become a read-only repo.

Fixes: #8200

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-11 11:45:27 +02:00
James O. D. Hunt
87b760f569 runtime-rs: ch: Detect Intel TDX version
Improve the `GuestProtection` handling to detect the version of
Intel TDX available.

The TDX version is now logged by the Cloud Hypervisor driver.

Fixes: #8147.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-11 09:38:00 +01:00
alex.lyn
73e81f5e39 runitme-rs: unify base64 encoding for direct-volume
Direct-volume needs to use the same base64 character set as
kata-runtime/direct-volume does.

Fixes: #8175

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-10-11 14:00:13 +08:00
Gabriela Cervantes
c6463cb5ae tests: Fix path for versions yaml for soak parallel test
This PR fixes the path for versions yaml for soak parallel test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 22:29:20 +00:00
David Esparza
89c9454fca metrics: removal of reference in the documentation to the dax test.
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.

Fixes: #8159

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-10 15:55:59 -06:00
Gabriela Cervantes
30ff58904e tests: Enable scability test for stability CI
This PR enables the scability test for stability CI gha.

Fixes #8196

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 19:59:57 +00:00
GabyCT
538131ab44 Merge pull request #8154 from GabyCT/topic/addstability
tests: Enable soak parallel stability test
2023-10-10 13:53:14 -06:00
Archana Shinde
8d6f7b9096 runtime-rs: Add support for handling vfio device for cloud-hypervisor
This change adds support for adding and removing vfio devices for
 cloud-hypervisor.

Fixes: #6691

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-10 12:25:44 -07:00
Gabriela Cervantes
e786b2b019 gha: Add install dependencies for stability tests
This PR adds the install dependencies for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 16:05:48 +00:00
Chao Wu
936553ae79 Merge pull request #7505 from lisongqian/feat/dragonball_metrics
dragonball: vcpu metrics change to be recorded per vcpu
2023-10-10 10:52:40 -05:00
Wainer Moschetta
d311c3dd04 Merge pull request #7621 from wainersm/gha-run-local
ci: k8s: adapt gha-run.sh to run locally
2023-10-10 11:19:19 -03:00
David Esparza
93fef543e0 Merge pull request #8127 from dborquez/fix_iperf_check_kata_processes_issue
metrics: removes kata components and k8s deployment when test finishes
2023-10-10 07:05:24 -06:00
lisongqian
dbfe6512fc dragonball: vcpu metrics change to be recorded per vcpu
In this commit, the vcpu metrics in Dragonball will be changed to record per-vcpu.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
lisongqian
fa60fbe023 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
In this commit, the METRICS is refactored to RwLock<DragonballMetrics>.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
Peng Tao
500d1c5cee kata-ctl: update rustls-webpki/webpki dependency
The old ones have security issues.
ref: https://github.com/briansmith/webpki/issues/69
https://github.com/briansmith/webpki/issues/69

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
d7660d82a0 runtime: unify gopkg.in/yaml.v3 to v3.0.1
The older versions have Denial of Service issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
fc9a107e8e runtime: unify swag and testify dependency
So that we don't need to depend on that many versions of them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
79ebb959c5 runtime: update runc dependency to v1.1.9
To pick up security fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
7f3e8bd65e runtime: unify golang.org/x/text to v0.7.0
The older versions contain security issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
df325ae371 runtime: update golang.org/x/net to v0.7.0
To pick up fix for the following issue:

A maliciously crafted HTTP/2 stream could cause excessive CPU
consumption in the HPACK decoder, sufficient to cause a denial of
service from a small number of small requests.

Fixes: #8190
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:39 +00:00
David Esparza
bba34910df metrics: stops kata components and k8s deployment when test finishes
This PR adds a trap whenever the scrip exits, it deletes the iperf
k8s deployment and k8s services, and deletes the kata components.

This way, when the script finishes, it verifies that there are
indeed no kata components still running.

Fixes: #8126

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-09 13:41:43 -06:00
Gabriela Cervantes
84e3d884e4 gha: Add general dependencies to stability tests
This PR adds the general dependencies to stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Gabriela Cervantes
dec3951ca5 tests: Add soak parallel stability test
This PR adds the soak parallel stability test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Gabriela Cervantes
0f04d527d9 tests: Enable soak parallel test
This PR enables the soak parallel test for stability test.

Fixes #8153

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Wainer dos Santos Moschetta
e669282c25 ci: k8s: set KUBERNETES default value
The KUBERNETES variable is mostly used by kata-deploy whether to apply
k3s specific deployments or not. It is used to select the type of
kubernetes to be installed (k3s, k0s, rancher...etc) and it is always
set on CI. Running the script locally we want to set a value by default
to avoid `KUBERNETES: unbound variable` errors.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
c30c3ff185 tests: run k8s-volume on a given node
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
666993da8d tests: run k8s-file-volume on a given node
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
3a00fc9101 tests: exec_host() now gets the node name
The exec_host() simply fails on cluster with multi-nodes because
`kubectl get node -o name" will return a list o names. Moreover, it will
return control nodes names which usually don't have kata installed.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
61c9c17bff tests: add get_one_kata_node() to tests_common.sh
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.

This is useful for tests that should run on a determined worker
node on a multi-nodes cluster.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
68f083c4d0 ci: k8s: set KATA_HYPERVISOR default value
Let KATA_HYPERVISOR be qemu by default in gh-run.sh as this variable
is required to tweak some configurations of kata-deploy.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
6677a61fe4 ci: k8s: configurable deploy kata timeout
The deploy-kata() of gha-run.sh will wait for 10 minutes for the kata
deploy installation finish. This allow users of the script to overwrite
that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
200e542921 ci: k8s: shellcheck fixes to gha-run.sh
Fixed a couple of warns shellcheck emitted and disabled others:
 * SC2154 (var is referenced but not assigned)
 * SC2086 (Double quote to prevent globbing and word splitting)

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
4af78be13a kata-deploy: re-format kata-[deploy|cleanup].yaml
The .tests/integration/kubernetes/gh-run.sh script run `yq write` a
couple of times to edit the kata-[deploy|cleanup].yaml, resulting
on the file being formatted again. This is annoying because leaves
the git tree dirty.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
d54e6d9cda ci: k8s: run_tests() for kcli
The only difference to the other platforms is that it needs to
export KUBECONFIG.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
c2ef1f0fb0 ci: k8s: add deploy-kata-kcli() to gh-run.sh
The cleanup-kcli() behaves like other deploy kata for
bare-metal (e.g. sev, tdx...etc) except that KUBECONFIG
should be exported.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
d2be8eef1a ci: k8s: add cleanup-kcli() to gha-run.sh
The cleanup-kcli() behaves like other clean up for bare-metal (e.g. sev,
tdx...etc) except that KUBECONFIG should be exported.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
cbb9aa15b6 ci: k8s: set default image for deploy_kata()
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of CI context, a developer might just use the latest
image which in this case will be
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
89bef7d036 ci: k8s: create k8s clusters with kcli
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.

Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Fabiano Fidêncio
1280f85343 Merge pull request #8171 from bergwolf/github/fix-up-gha
GHA: fix up referenced yaml exceeding 20 limit problem
2023-10-09 09:37:03 +02:00
Peng Tao
954d40cce5 gha: combine coco jobs into a single yaml
So that we don't risk exceeding the GHA 20 rerefenced yaml files limit
that easy.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-08 14:22:01 +00:00
Peng Tao
b60e0a9b57 gha: combine basic amd64 jobs into a single yaml
GHA has an undocumented limitation that there can be at most 20
referenced yamls in a single yaml file. We workaround it by combining
multiple jobs into a single yaml file.

Fixes: #8161
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-08 13:55:01 +00:00
Fabiano Fidêncio
108db0a721 Merge pull request #8162 from sprt/sprt/unbreak-ci
gha: ci: Revert tracing test PR to unbreak CI
2023-10-08 10:13:46 +02:00
Aurélien Bombo
e9bd852113 gha: ci: Revert tracing test PR to unbreak CI
Revert "Merge pull request #8115 from fidencio/topic/ci-add-tracing-tests"

This unbreaks CI as seen in https://github.com/kata-containers/kata-containers/actions/runs/6434757133

Fixes: #8161

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-10-06 14:13:17 -07:00
James O. D. Hunt
16fe81f27c Merge pull request #8124 from jodh-intel/ch-enable-feature
runtime-rs: ch: Enable feature
2023-10-06 13:02:08 +01:00
Fabiano Fidêncio
fa6786d1d7 Merge pull request #8117 from fidencio/topic/ci-add-runk-tests
gha: ci: Port runk tests over
2023-10-06 11:19:55 +02:00
Fabiano Fidêncio
8fec654716 Merge pull request #8115 from fidencio/topic/ci-add-tracing-tests
ci: gha: Port tracing tests over
2023-10-06 10:06:57 +02:00
GabyCT
265f53e594 Merge pull request #8082 from dborquez/enable_fio_on_ctr
Enable fio test using containerd client
2023-10-05 17:26:22 -06:00
GabyCT
c8b9ec1cb5 Merge pull request #8108 from GabyCT/topic/ghastability
gha: Add stability tests workflow for gha
2023-10-05 17:10:10 -06:00
James O. D. Hunt
b8a46a4b85 runtime-rs: ch: Enable feature
Enable the Cloud Hypervisor driver (the `cloud-hypervisor` build feature) for the rust runtime.

Fixes: #6264.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-05 17:58:39 +01:00
Gabriela Cervantes
0f2dc8c675 gha: Add containerd stability tests to ci yaml
This PR adds containerd stability tests to ci yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-05 15:21:24 +00:00
Fabiano Fidêncio
89f73e658d Merge pull request #8110 from fidencio/topic/gha-be-more-specific-about-the-arm-runners
gha: arm64: Ensure the builder is arm64-builder
2023-10-04 21:20:08 +02:00
Fabiano Fidêncio
da91c9df88 ci: Port runk tests to this repo
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor the tests.

Fixes: #8116

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 20:41:29 +02:00
Fabiano Fidêncio
7f23772763 ci: Add placeholder for runk tests
The runk test has been executed as part of the former "ubuntu" jenkins
CI.

We're porting it to GHA and running it against LTS containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 20:40:32 +02:00
Fabiano Fidêncio
9205acc3d2 ci: Move tracing tests here
I'm basically moving the tracing tests from the tests repo to this one,
and I'm adding the "Signed-off-by:" of every single contributor to the
tests.

Fixes: #8114

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-10-04 20:02:27 +02:00
Gabriela Cervantes
85d290a048 gha: Add stability gha run script
This PR adds the stability gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 17:45:45 +00:00
Gabriela Cervantes
54f0c8f88e gha: Add stability tests workflow for gha
This PR adds the stability test workflow for gha for the kata CI.

Fixes #8107

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 16:32:13 +00:00
Fabiano Fidêncio
3bb2923e5d ci: Add placeholder for tracing tests
The tracing tests are currently running as part of the Jenkins CI with
the following setups:
* Container Engines: containerd
* VMMs: QEMU | Cloud Hypervisor
* Snapshotters: overlayfs | devmapper

We'll be restricting those tests to be running on LTS version of
containerd, without devmapper.

As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next interations.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 18:02:02 +02:00
Fabiano Fidêncio
2c3bf406dc ci: Create a function to install docker
This will be re-used in other tests as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 15:01:51 +02:00
Fabiano Fidêncio
c2cce12de5 Merge pull request #8100 from fidencio/topic/kata-deploy-build-agent
kata-deploy: Build kata-agent as we build all the other components
2023-10-04 11:56:03 +02:00
Steve Horsman
c430cc3707 Merge pull request #8098 from stevenhorsman/k8s-registry-suite
versions: migrate out of k8s.gcr.io
2023-10-04 10:51:39 +01:00
Fabiano Fidêncio
119f03de26 gha: arm64: Ensure the builder is arm64-builder
Otherwise we'll use any arm64 machine that's added as a runner, and
whenever new machines are added those may end up being only used for
running some specific set of the tests.

Fixes: #8109

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 11:08:11 +02:00
Fabiano Fidêncio
59b9380d1c Merge pull request #8093 from stevenhorsman/crictl-pod-config-update
doc: Update crictl pod-config
2023-10-04 10:49:04 +02:00
David Esparza
8c498ef5ee metrics: Use jq tool to pretty-print json metrics output
This PR enables the use of jq pretty-print feature to
improve the formatting of metric results json files.

Fixes: #8081

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-03 23:33:19 -06:00
David Esparza
a2159a6361 metrics: Enables FIO test for kata containers
FIO benchmark is enabled to measure IO in Kata
at different latencies using containerd client,
in order to complement the CI metrics testing set.

This PR asl deprecated the previous Fio bench
based on k8s.

Fixes: #8080

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-03 23:32:38 -06:00
Fabiano Fidêncio
f337315952 Merge pull request #8106 from fidencio/topic/gha-fix-k0s-related-cis
gha: Fix k0s deployment
2023-10-03 21:47:40 +02:00
GabyCT
d1d9af5de2 Merge pull request #8085 from GabyCT/topic/stabilitytests
tests: Add stability test for kata CI
2023-10-03 11:28:49 -06:00
Fabiano Fidêncio
70e7ec3e23 gha: Fix k0s deployment
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```

With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading us to an empty stringm causing then
the error in the CI.

Fixes: #8105

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 17:21:40 +02:00
Fabiano Fidêncio
560bbffb57 packaging: tools: Remove set -x leftover
This was used for debugging, and ended up being merged with that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
18fa483d90 packaging: release: Mention newly added images
We've added two new containerd builder images recently, one for the
components under `src/tools` and another one for the Kata Containers
agent.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
ca3b888371 packaging: tools: Fix container image env var name
This should be TOOLS_CONTAINER_BUILDER instead of
VIRTIOFSD_CONTAINER_BUILDER.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
5ca66795c7 packaging: Allow passing the TOOLS_CONTAINER_BUILDER
This follows what we've been doing for all the components we're
building, but was missed as part of #8077.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
02acef9575 gha: Build the kata-agent as part of our workflows
The kata-agent binary won't be released, just built so it can be used,
later on,  as part of our tests and as part of the rootfs build.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
5208386ab1 packaging: Build the kata-agent
Let's add the needed functions to start building the kata-agent, with or
without the OPA support.

For now this build is not used as part of the rootfs build, but later on
this will (not as part of this series, though).

Fixes: #8099

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
1727487eef agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
This will help to build the agent binary as part of the kata-deploy
localbuild, as we need to pass the DESTDIR to where the agent will be
installed, and also whether we're building the agent with policy support
enabled or not.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 14:18:45 +02:00
Fabiano Fidêncio
45c1188839 packaging: Add get_agent_image_name()
This will be used for building the kata-agent.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 14:17:38 +02:00
Wainer dos Santos Moschetta
0db8fb8f98 versions: migrate out of k8s.gcr.io
The k8s.gcr.io is deprecated for a while now and has been redirected to
registry.k8s.io. However on some bare-metal machines in our testing
pools that redirection is not working, so let's just replace the
registries.

Fixes #8098
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit b2c3bca558c38deff2117d5909d9071c23c05590)
2023-10-03 11:52:59 +01:00
stevenhorsman
a1a0543671 doc: Fix spelling
Spell check failed with:
```
[kata-spell-check.sh:275] WARNING: Word 'overcommitment':
did you mean one of the following?: over commitment, over-commitment,
commitment
```
So update this to pass the static checks

Fixes: #
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-10-03 10:17:38 +01:00
Gabriela Cervantes
6339605a14 tests: Add general stability fixes
This PR adds general stability fixes.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-02 19:42:46 +00:00
stevenhorsman
59ae244442 doc: Update crictl pod-config
- Ensure that our documented crictl pod config file contents have
uid  and namespace fields for compatibility with crictl 1.24+

This avoids a user potentially hitting the error:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"

getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```

Fixes: #8092
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 8f8c2215)
2023-10-02 14:53:46 +01:00
Gabriela Cervantes
fd19f4082f tests: Add agent stability test
This PR adds the agent stability test to stability test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 22:37:02 +00:00
Gabriela Cervantes
215577032f tests: Add cassandra stress in stability tests
This PR adds the cassandra stress at the stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 22:34:45 +00:00
GabyCT
a890ad3a16 Merge pull request #8066 from GabyCT/topic/urlvra
docs: Update url in kata vra document
2023-09-28 14:59:34 -06:00
Zvonko Kaiser
79e33c211c Merge pull request #7325 from zvonkok/vfio-sandbox-id-debug
gpu: Adding CDI support for cold and hot-plug of VFIO devices
2023-09-28 21:31:12 +02:00
Gabriela Cervantes
f2d3ea988d tests: Add stressng dockerfile for stability tests
This PR adds the stressng dockerfile for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:35:22 +00:00
Gabriela Cervantes
6493aa309e tests: Add stressor CPU test for stability tests
This PR adds the stressor CPU test for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:33:08 +00:00
Gabriela Cervantes
ef68a3a36b metrics: Add stability test for kata CI
This PR adds the stability test for kata containers repository.

Fixes #8084

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:23:36 +00:00
David Esparza
f7ef45b167 Merge pull request #8077 from fidencio/topic/kata-deploy-ship-the-tools
kata-deploy: build & ship the rust components from src/tools/
2023-09-28 09:59:19 -06:00
Zvonko Kaiser
7c934dc7da gpu: Fix cold-plug of VFIO devices
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-09-28 09:49:13 +00:00
GabyCT
fcc755fc3b Merge pull request #8068 from GabyCT/topic/limitlatency
metrics: Add latency value limits for kata CI
2023-09-27 13:28:41 -06:00
Greg Kurz
defbb64ac8 Merge pull request #8036 from rye-stripe/bugfix/overhead-metrics
runtime: fix reading cgroup stats of sandboxes
2023-09-27 19:39:55 +02:00
Archana Shinde
95455e6fe8 Merge pull request #8058 from likebreath/0925/clh_v35.0
Upgrade to Cloud Hypervisor v35.0
2023-09-27 10:39:32 -07:00
Gabriela Cervantes
8d66ef5185 metrics: Increase qemu jitter value
This PR increases qemu jitter value.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-27 17:31:07 +00:00
Gabriela Cervantes
5600e28b54 metrics: Increase jitter value for clh
This PR increases jitter value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-27 17:30:19 +00:00
Fabiano Fidêncio
a6b1f5e21b ci: Build src/tools components as part of our tests / releases
Build those as part of our CI and release workflows.

Fixes #5520 #5348

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:50:25 +02:00
Fabiano Fidêncio
501a168a81 kata-deploy: Build components from src/tools
Let's add targets and actually enable users and oursevles to build those
components in the same way we build the rest of the project.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:49:02 +02:00
Fabiano Fidêncio
6ef42db5ec static-build: Add scripts to build content from src/tools
As we'd like to ship the content from src/tools, we need to build them
in the very same way we build the other components, and the first step
is providing scripts that can build those inside a container.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:56 +02:00
Fabiano Fidêncio
4d08ec29bc packaging: Add get_tools_image_name()
This will be used for building all the (rust) components from src/tools.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:35 +02:00
Fabiano Fidêncio
98097c96de packaging: Use git abbreviated hash
This will make it easier to build images that rely on several
directories hashes.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:30 +02:00
Fabiano Fidêncio
8b25e90027 Merge pull request #8075 from fidencio/topic/ci-add-kata-monitor-tests
ci: Port kata-monitor tests from Jenkins to GHA
2023-09-27 15:48:46 +02:00
Fabiano Fidêncio
489caf1ad0 ci: kata-monitor: Move tests over
Let's move, adapt, and use the kata-monitor tests from the tests repo.
In this PR I'm keeping the SoB from every single contributor from who
touched those tests in the past.

Fixes: #8074

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-27 11:40:31 +02:00
Fabiano Fidêncio
a3fb067f1b ci: Add placeholder for kata-monitor tests
The kata-monitor tests is currently running as part of the Jenkins CI
with the following setups:
* Container Engines: CRI-O | containerd
* VMMs: QEMU

When using containerd, we're testing it with:
* Snapshotter: overlayfs | devmapper

We will stop running those tests on devmapper / overlayfs as that hardly
would get us a functionality issue.

Also, we're restricting this to run with the LTS version of containerd,
when containerd is used.

As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next iterations.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:31:17 +02:00
Fabiano Fidêncio
57cb4ce204 ci: Make install_kata aware of container engines
This will help us when running tests using CRI-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:31:17 +02:00
Fabiano Fidêncio
de1eeee334 ci: Create a generic install_crio function
This will serve us quite will in the upcoming tests addition, which will
also have to be executed using CRi-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:26:13 +02:00
Fabiano Fidêncio
64a2000859 ci: Add install_cni_plugins helper
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:26:13 +02:00
Fabiano Fidêncio
8132fe15c9 ci: Modify containerd default config
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, woth LTS containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:16:12 +02:00
Gabriela Cervantes
8cb7df1bed metrics: Add checkmetrics for latency test
This PR adds the checkmetrics for latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 19:11:08 +00:00
Gabriela Cervantes
e90440ae24 metrics: Add qemu latency value limit
This PR adds the qemu latency value limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 17:30:09 +00:00
Gabriela Cervantes
a74a8f8a9d metrics: Add latency value limits for kata CI
This PR adds latency value limits for kata CI.

Fixes #8067

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 17:29:07 +00:00
Gabriela Cervantes
d7def8317a metrics: Fix general check static warnings
This PR fixes general check static warnings.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 16:30:59 +00:00
GabyCT
309103169d Merge pull request #8056 from GabyCT/topic/fixlatencypath
metrics: Fix latency yamls path
2023-09-26 10:16:55 -06:00
Gabriela Cervantes
928553d1ba docs: Update url in kata vra document
This PR updates the url in kata vra document.

Fixes #8065

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 16:13:12 +00:00
GabyCT
5c0afaacf4 Merge pull request #8018 from GabyCT/topic/fixreadme
metrics: Fix metrics README
2023-09-26 09:51:47 -06:00
David Esparza
83326f89b3 Merge pull request #8054 from GabyCT/topic/fixcrdoc
metrics: Fix C-Ray documentation
2023-09-26 09:50:19 -06:00
James O. D. Hunt
31478b9c33 Merge pull request #7944 from jodh-intel/runtime-rs-ch-enable-tdx
runtime-rs: ch: Enable Intel TDX
2023-09-26 14:11:12 +01:00
James O. D. Hunt
b0a3293d53 runtime-rs: ch: Enable Intel TDX
Allow Cloud Hypervisor to create a confidential guest (a TD or
"Trust Domain") rather than a VM (Virtual Machine) on Intel systems
that provide TDX functionality.

> **Notes:**
>
> - At least currently, when built with the `tdx` feature, Cloud Hypervisor
>   cannot create a standard VM on a TDX capable system: it can only create
>   a TD. This implies that on TDX capable systems, the Kata Configuration
>   option `confidential_guest=` must be set to `true`. If it is not, Kata
>   will detect this and display the following error:
>
>   ```
>   TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')
>   ```
>
> - This change expands the scope of the protection code, changing
>   Intel TDX specific booleans to more generic "available guest protection"
>   code that could be "none" or "TDX", or some other form of guest
>   protection.

Fixes: #6448.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 10:55:25 +01:00
James O. D. Hunt
523399c329 runtime-rs: ch: Add more consts
Introduce a few new constants (for PCI segment count and FS queues) and
move the disk queue constants to `convert.rs` to allow them to be used
there too.

> **Note:**
>
> This change gives the `ShareFs` code it's own set of values rather
> than relying on the disk queue constants.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
dea8065811 runtime-rs: ch: Remove unused function
Delete the `handle_pending_devices_after_boot()` function which is no
longer required.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
995f2c015f runtime-rs: ch: Only handle particular pending device types
Modify the Cloud Hypervisor `add_device()` method to add `ShareFs` and
`Network` devices to the list of pending devices since only these two
device types need to be cached before VM startup. Full details in the
comments.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
b1b96a5c49 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
Remove the `VIRTIO_BLK_MMIO` check which appears to have been added
erroneously in the first place.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
Gabriela Cervantes
9ac29b8d38 metrics: Add init_env function to latency test
This Pr adds the init_env function to latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 22:06:00 +00:00
Bo Chen
dfd0c9fa9a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:37 -07:00
Bo Chen
8f9f087e35 versions: Upgrade to Cloud Hypervisor v35.0
Details of this release can be found in ourroadmap project as iteration
v35.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:01 -07:00
Fabiano Fidêncio
a4daa86535 Merge pull request #8028 from fidencio/topic/ci-test-with-crio-part-2
ci: k8s: crio: Follow up patches to have CRI-O also working as part of our CI
2023-09-25 18:40:42 +02:00
Gabriela Cervantes
81c8babca9 metrics: Fix latency yamls path
This PR fixes the latency yamls path for the latency test for
kata metrics.

Fixes #8055

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 15:52:24 +00:00
Gabriela Cervantes
4815736820 metrics: Fix C-Ray documentation
This PR fixes the C-Ray documentation for kata metrics.

Fixes #8052

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 15:27:58 +00:00
Fabiano Fidêncio
ef63d67c41 ci: crio: Trail '\r' from exec_host() output
We've faced this as part of the CI, only happening with the CRI-O tests:
```
 not ok 1 Test readonly volume for pods
 # (from function `exec_host' in file tests_common.sh, line 51,
 #  in test file k8s-file-volume.bats, line 25)
 #   `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
 # [bats-exec-test:38] INFO: k8s configured to use runtimeclass
 # bash: line 1: $'\r': command not found
 #
 # Error from server (NotFound): pods "test-file-volume" not found
```

I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just trail the '\r', as long as all the tests keep
passing on containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 16:42:18 +02:00
Fabiano Fidêncio
74c12b2927 ci: crio: Enable default capabilities
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.

A huge thanks to Greg Kurz for spotting this and suggesting the fix.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-25 14:56:15 +02:00
Fabiano Fidêncio
358dc2f569 kata-deploy: Fix CRI-O detection
Some of the "k8s distros" allow using CRI-O in a non-official way, and
if that's done we cannot simply assume they're on containerd, otherwise
kata-deploy will simply not work.

In order to avoid such issue, let's check for `cri-o` as the container
engine as the first place and only proceed with the checks for the "k8s
distros" after we rule out that CRI-O is not being used.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 14:56:15 +02:00
Fabiano Fidêncio
ebaa4fa4c1 ci: crio: Pass -y to apt
That was something overlooked during my tests. :-/

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 14:56:15 +02:00
GabyCT
11cf0e2d28 Merge pull request #8038 from GabyCT/topic/latency
metrics: Enable latency test in gha run script
2023-09-22 16:57:53 -06:00
GabyCT
3ef57b335e Merge pull request #8045 from jepio/fix-docker-ownership
local-build: Fix .docker ownership before build-payload
2023-09-22 14:43:38 -06:00
Archana Shinde
9bb9a3e7a4 Merge pull request #7966 from amshinde/runtime-rs-network-clh
runtime-rs: Add network support for cloud-hypervisor
2023-09-22 13:08:09 -07:00
Gabriela Cervantes
97e73b2234 metrics: Fix spelling warnings
This PR fixes general spelling warnings detected by the spelling check.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-22 15:50:51 +00:00
Gabriela Cervantes
36c8cd6f1f metrics: Fix metrics README
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.

Fixes #8017

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-22 15:28:58 +00:00
Fabiano Fidêncio
c5a5a0c95e Merge pull request #8012 from arronwy/strip
osbuild: Reduce guest components binary size with strip
2023-09-22 15:45:38 +02:00
Fabiano Fidêncio
9d190f2390 Merge pull request #8042 from GabyCT/topic/pandoc
gha: Add pandoc as a dependency for static checks
2023-09-22 15:31:18 +02:00
Jeremi Piotrowski
15425a2b80 local-build: Fix .docker ownership before build-payload
The permissions on .docker/buildx/activity/default are regularly broken by us
passing docker.sock + $HOME/.docker to a container running as root and then
using buildx inside. Fixup ownership before executing docker commands.

Fixes: #8027
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-22 13:44:53 +02:00
Jeremi Piotrowski
a5338e885e Merge pull request #8030 from portersrc/8027-ci-rootfs-image-build-asset-is-failing-oras
ci: rootfs-image build-asset is failing
2023-09-22 11:07:50 +02:00
Chao Wu
6f98fbafde Merge pull request #6706 from guixiongwei/feat/thp
feat(runtime-rs): introduce huge page mode to select VM RAM's backend
2023-09-22 15:27:06 +08:00
Gabriela Cervantes
13ca7d9f97 gha: Add pandoc as a dependency for static checks
To avoid the failure of not finding pandoc command this PR adds that
package as a dependency for static checks.

Fixes #8041

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 20:14:41 +00:00
Jeremi Piotrowski
28dd5ae91e Merge pull request #7799 from UiPath/clh-directio-support
clh: Direct IO support for block devices
2023-09-21 19:16:08 +02:00
David Esparza
6de9f39895 Merge pull request #8020 from GabyCT/topic/fixhunspell
gha: Install hunspell for static checks
2023-09-21 10:58:40 -06:00
Gabriela Cervantes
08bc8e4db4 metrics: Add latency benchmark for gha
This PR adds the latency benchmark for gha for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 16:14:39 +00:00
Gabriela Cervantes
6776b55d7e metrics: Enable latency test in gha run script
This PR enables the latency test for gha run script for kata metrics.

Fixes #8037

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 16:11:58 +00:00
Peteris Rudzusiks
94e2ccc2d5 runtime: fix reading cgroup stats of sandboxes
The cgroup stats come from resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method incorrectly was expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.

Fixes #8035

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-09-21 17:00:53 +02:00
Alexandru Matei
d507d189bb fc: Add support for noflush cache option
Firecracker supports noflush semantic via Unsafe cache type.
There is no support for direct i/o, remove it from config file

Fixes: #7823

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Alexandru Matei
2ca781518a clh: Direct IO support for block devices
Clh suports direct i/o for disks. It doesn't
offer any support for noflush, removed passing
of option to cloud-hypervisor internal config

Fixes: #7798

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Fabiano Fidêncio
dd27912f31 Merge pull request #8032 from fidencio/topic/ci-make-push-after-build-be-trigger-by-workflow-dispatch
ci: Trigger payload-after-push on workflow_dispatch
2023-09-21 10:25:24 +02:00
Fabiano Fidêncio
0c95697cc4 ci: Trigger payload-after-push on workflow_dispatch
This will allow us to easily test failures and fixes on that workflows.

Fixes: #8031

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-21 09:24:13 +02:00
Chris Porter
28cbc3b51c ci: rootfs-image build-asset is failing
Fixes: #8027

Signed-off-by: Chris Porter <porter@ibm.com>
2023-09-21 00:58:42 -05:00
Fabiano Fidêncio
21f6f9a173 Merge pull request #8016 from fidencio/topic/ci-test-with-crio-part-1
ci: Actually enable the CRI-O tests
2023-09-21 07:42:27 +02:00
Wainer Moschetta
87e64a07ed Merge pull request #7979 from beraldoleal/gogo-removal
protocol: remove gogoprotobuff tests
2023-09-20 22:38:10 -03:00
Gabriela Cervantes
87a8616488 gha: Install hunspell for static checks
Seems like the static checks are failing due the missing of the hunspell
package this PR fixes that.

Fixes #8019

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-20 16:58:10 +00:00
Fabiano Fidêncio
8c3c50ca8a ci: Actually enable the CRI-O tests
The test has been added to the repo, but we have to also add it to the
list of jobs to be executed.

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 18:01:25 +02:00
David Esparza
03554c799a Merge pull request #8006 from fidencio/topic/ci-test-with-crio-part-0
ci: k8s: Also run tests with CRI-O
2023-09-20 07:45:17 -06:00
Fabiano Fidêncio
c6a9e50c37 Merge pull request #8004 from microsoft/danmihai1/quoted-spaces
runtime: support kernel params including spaces
2023-09-20 12:10:51 +02:00
Wang, Arron
3a6510ad61 osbuild: Reduce guest components binary size with strip
opa_linux_amd64_static 38M => 27M
kata-agent 30M => 23M

ls -alh opa_linux_amd64_static
-rw-rw-r-- 1 arron arron 38M Jul 28 01:59 opa_linux_amd64_static
➜ kata-containers git:(main) ✗ strip opa_linux_amd64_static
➜ kata-containers git:(main) ✗ ls -alh opa_linux_amd64_static
-rw-rw-r-- 1 arron arron 27M Sep 20 16:12 opa_linux_amd64_static

ls -alh ./usr/bin/kata-agent
-rwxr-xr-x. 1 root root 30M Jul 30 23:41 ./usr/bin/kata-agent
ls -alh ./usr/bin/kata-agent
-rwxr-xr-x. 1 root root 23M Sep 20 16:13 ./usr/bin/kata-agent

Fixes: #8011

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-09-20 16:23:17 +08:00
Fabiano Fidêncio
07a6e63a6b ci: k8s: rke2: Use sudo to call systemd
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 08:48:29 +02:00
Fabiano Fidêncio
03b82e8484 ci: k8s: Add a CRI-O test
Let's make sure we'll also be testing k8s using CRI-O.

For now, we'll only be running the CRI-O test with QEMU.  Once it
becomes stable we can expand this to other Hypervisors as well.

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
d7105cf7a4 ci: k8s: Add a method to install CRI-O
This is based on official CRI-O documentations[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.

We may want to expand this in the future, but we're good for now.

[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
54c0a471b1 ci: k8s: k0s: Allow passing parameters to the k0s installer
We'll need this in order to setup k0s with a different container engine.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
31ef64606c Merge pull request #8007 from fidencio/topic/ci-kata-deploy-fix-garm-runner-name
ci: kata-deploy: Fix runner name
2023-09-20 00:58:33 +02:00
Beraldo Leal
730ef51693 deps: updating dependencies
Updating dependencies after make check, make test.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 16:54:35 -04:00
GabyCT
6111ef6fb6 Merge pull request #7990 from GabyCT/topic/parallelbandwidth
metrics: Enable parallel bandwidth iperf limit
2023-09-19 14:52:21 -06:00
Fabiano Fidêncio
3a2c83d69b ci: kata-deploy: Fix runner name
It should be garm-ubuntu-2004-smaller instead of garm-ubuntu-2004-small.

Fixes: #7890

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 22:34:37 +02:00
Dan Mihai
82ff2db460 runtime: support kernel params including spaces
Support quoted kernel command line parameters that include space
characters. Example:

dm-mod.create="dm-verity,,,ro,0 736328 verity 1
/dev/vda1 /dev/vda2 4096 4096 92041 0 sha256
f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4
065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038"

Fixes: #8003

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-19 20:26:38 +00:00
Beraldo Leal
604a9dd673 protocol: remove gogoprotobuff tests
This is part of a bigger effort to drop gogoprotobuff from our code
base. IIUC, those options are basically used by *pb_test.go, and since
we are dropping gogoprotobuff and those are auto generated tests, let's
just remove it.

Fixes #7978.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 12:55:42 -04:00
Fabiano Fidêncio
5560e72024 Merge pull request #7896 from fidencio/topic/ground-work-for-testing-all-k8s-flavours-we-support
ci: kata-deploy: Enable all k8s flavours that we support
2023-09-19 17:44:34 +02:00
Fabiano Fidêncio
f7fa7f602a ci: Enable kata-deploy tests for all the supported k8s flavours
Let's ensure we test kata-deploy on RKE2 and k0s as well.

Fixes: #7890

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
2c908b598c ci: kata-deploy: Add the ability to deploy rke2
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
eaf6164916 ci: kata-deploy: Add the ability to deploy k0s
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
0015257636 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
bf2cb02283 ci: kata-deploy: Expland tests to run on k0s / rke2
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
6d5d844e5c Merge pull request #7983 from sprt/resource-group-naming
ci: Create clusters in individual resource groups
2023-09-19 12:54:21 +02:00
Fabiano Fidêncio
b12b9e1886 ci: kata-deploy: Add placeholder for tests on GARM
We'll be testing kata-deploy with different kubernetes flavours as part
of our GARM tests, and this is a place-holder for this.

Once enabled, we'll do nothing, just `return 0`, so we can then properly
add the tests after this commit gets merged.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:42:02 +02:00
Fabiano Fidêncio
9e1fb8a966 ci: kata-deploy: Export KUBERNETES env var
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.

This was also done as part of fa62a4c01b,
for the k8s tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:37:56 +02:00
Fabiano Fidêncio
09cc0ed438 ci: Move deploy_k8s() to gha-run-k8s-common.sh
This will allow us to re-use the function in the kata-deploy tests,
which will come soon.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:37:56 +02:00
Fabiano Fidêncio
1829f5c049 Merge pull request #7992 from skaegi/virtiofsd-1.8.0
versions: Bump virtiofsd to v1.8.0
2023-09-19 11:52:49 +02:00
Fabiano Fidêncio
486fe14c99 ci: Properly set K8S_TEST_UNION
Otherwise only the first test will be executed

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 10:23:58 +02:00
Aurélien Bombo
d9ef1352af ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characteres allowed for the cluster
name.  With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-09-19 10:23:58 +02:00
Aurélien Bombo
68267a3996 ci: Create clusters in individual resource groups
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.

This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.

The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.

Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.

Fixes: #7982

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-09-19 10:23:55 +02:00
Fabiano Fidêncio
84c0d59d23 Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm
clh: arm: Use static_sandbox_resource_mgmt=true
2023-09-19 09:25:34 +02:00
Gabriela Cervantes
9aa8d1c917 metrics: Add parallel bandwidth limit for qemu
This PR adds the parallel bandwidth limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-18 21:08:54 +00:00
Simon Kaegi
44c7c082d9 versions: Bump virtiofsd to v1.8.0
https://gitlab.com/virtio-fs/virtiofsd/-/releases/v1.8.0 was released two weeks ago. We have fully tested and are using this version.

Also bumps toolchain version to match what virtiofsd used.

Fixes: #7960

Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
2023-09-18 15:21:15 -04:00
Fabiano Fidêncio
5f8e210d3b Merge pull request #7961 from ChengyuZhu6/update_nydus
Bump nydus versions and update nydus tests
2023-09-18 21:02:20 +02:00
Fabiano Fidêncio
c3ee913bf6 Merge pull request #7953 from gkurz/extra-monitor-socket
runtime/qemu: Rework QMP/HMP support
2023-09-18 19:04:14 +02:00
Gabriela Cervantes
af59d4bf4a metrics: Enable parallel bandwidth iperf limit
This PR enables the parallel bandwidth iperf limit for kata metrics.

Fixes #7989

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-18 16:32:11 +00:00
Fabiano Fidêncio
aba36ab188 nydus: Temporarily skip tests on dragonball
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.

The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
   /rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
   error: Failed to Mount backend
...

Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b8a8dfcd15 nydus: Use kata-${KATA_HYPERVISOR} instead of kata
This will ensure we're testing with the correct runtime, instead of
using the `default` one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
ChengyuZhu6
f6df3d6efb static-build: Fix arch error on nydus build
Fix the arch error when downloading the nydus tarball.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Steven Horsman <steven@uk.ibm.com>
2023-09-18 17:40:06 +02:00
ChengyuZhu6
2f9c9e2e63 tests: nydus: Update nydus tests
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
c9a4e7e46d versions: Bump nydus and nydus-snapshotter to its latest release
As we need https://github.com/containerd/nydus-snapshotter/pull/530 in.

Fixes #7984

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b73bde320d gha: nydus: Populate run()
And with this we finally enable the nydus tests to run as part of our
GHA CI.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b3904a1a30 gha: nydus: Populate install_dependencies()
Let's have all the dependencies needed for running the nydus tests
installed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
d2b3b67f5d gha: nydus: Actually install kata when install-kata is called
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.

Now, let's install kata, as expected. :-)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
0ec00ad42e gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.

Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
568439c77b tests: nydus: Add timeout to the crictl calls
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
5ac3b76eb1 tests: nydus: Add uid / namespace to the nydus container / sandbox
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"

getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
376574a16c tests: nydus: Decorate some calls with sudo
Otherwise we canoot properly start the nydus snapshotter, nor properly
kill it after it's been started.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
4290fd4b67 tests: nydus: Adapt "source ..." to GHA
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
a84efa3e87 tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
As that's what we've been using as part of the GHA.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
56a14b3950 tests: common: Add install_nydus_snapshotter()
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing another dependencies from
GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b6563783e2 tests: common: Add install_nydus()
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing another dependencies from GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
72599f1911 clh: arm: Use static_sandbox_resource_mgmt=true
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarh64.

With this patch, when building for x86_64, I can see the this is the
resulting config:
```
$ ARCH=amd64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false

```

And when building for aarch64:
```
$ ARCH=arm64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```

Fixes: #7941

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 14:14:10 +02:00
Jeremi Piotrowski
dfa6af54df Merge pull request #7806 from jongwu/clh_serial
clh:arm64: use arm AMBA UART for hypervisor debug
2023-09-18 12:29:07 +02:00
Greg Kurz
1f16b6627b runtime/qemu: Rework QMP/HMP support
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.

The HMP monitor allows to temper with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.

The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.

A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".

An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.

While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.

This feature is only made visible in the base and GPU configurations
of QEMU for now.

Fixes #7952

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-18 12:13:01 +02:00
Greg Kurz
cab46c9e23 Merge pull request #7973 from fidencio/topic/ci-use-bigger-machine-sizes-for-the-needed-tests-part-0
ci: Use variable size of VMs depending on the tests running
2023-09-18 12:06:44 +02:00
Fabiano Fidêncio
0e3bfac3b3 Merge pull request #7976 from fidencio/topic/ci-static-checks-rework-part-0
ci: Rework static checks
2023-09-18 11:01:18 +02:00
Peng Tao
6eedd9b0b9 Merge pull request #7738 from Xuanqing-Shi/7732/handle-non-empty-endpoints-in-RemoveEndpoints
runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
2023-09-18 10:58:28 +08:00
Fabiano Fidêncio
8b1e9b0c75 ci: static-checks: Clean up static-checks job
Now that the static-checks job only takes care of running the
static-checks, let's clean it up, remove all the unneeded steps, make
sure that we're using the actions in their latest version, and have it
running in a cost free runner.

At some point I'd like to see those tests done in parallel, in the same
way that I've organised the build-checks, but that's something for
someone else, at some other time.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 14:23:02 +02:00
Fabiano Fidêncio
2c5ca2eaf8 ci: static-checks: Run tests depending on KVM
With this we're removing the dragonball static-checks CI, as the test is
running here now. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 14:22:38 +02:00
Fabiano Fidêncio
509c309ab2 ci: static-checks: Move "sudo make test" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:23 +02:00
Fabiano Fidêncio
4e963cedf4 ci: static-checks: Move "make test" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:17 +02:00
Fabiano Fidêncio
08f2e5ae0b runtime-rs: Ensure static-checks-build is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `config`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:13 +02:00
Fabiano Fidêncio
2bc3a616ae kata-ctl: Use loop instead of kvm module in tests
This makes it pssible to run the tests in the cost free runners, which
are not KVM capable.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:08 +02:00
Fabiano Fidêncio
46daddc500 kata-ctl: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:01 +02:00
Fabiano Fidêncio
ec826f328f agent: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:57 +02:00
Fabiano Fidêncio
1d32410a83 ci: install_libseccomp: Do not depend on the tests repo
It makes things way simpler, waaaaay simpler.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:49 +02:00
Fabiano Fidêncio
bf888b9a5e ci: static-checks: Move "make check" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:45 +02:00
Fabiano Fidêncio
473ec87806 kata-ctl: Add kata-types to the Cargo.lock file
Commit message covered everything. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:40 +02:00
Fabiano Fidêncio
ea19549a99 kata-ctl: Ensure GENERATED_CODE is a dep of make check
Otherwise `make check` would fail with:
```
Error writing files: failed to resolve mod `version`:
/home/runner/work/kata-containers/kata-containers/src/tools/kata-ctl/src/ops/version.rs
does not exist make: *** [../../../utils.mk:176: standard_rust_check] Error 1
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:36 +02:00
Fabiano Fidêncio
e125775863 tests: install_rust: Also install clippy
clippy is used as part our tests, so it's useful to have it installed
while we're already installing rust.

In case of developers, they also better be using it. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:31 +02:00
Fabiano Fidêncio
e2c61a152c ci: static-checks: Move vendor check to its own job
Similarly to the static-check jobs, those jobs can be run on the zero
cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:30 +02:00
Fabiano Fidêncio
6794d4c843 tests: Move install_rust.sh from the tests repo
We'll use it as part of the refactoring we're doing in the static check
tests.

I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:29 +02:00
Fabiano Fidêncio
e64508c308 tests: install_go: Remove tests repo dependency
We can rely on the functions that are now part of the common.bash.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:28 +02:00
Fabiano Fidêncio
11dff731b7 tests: Move functions from kata_arch script here
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:28 +02:00
Fabiano Fidêncio
75c974c802 ci: static-checks: Move kernel config check to its own job
It doesn't make sense to run this for all the bits of the matrix,
neither it's demanding enough to require running this in one of our
Azure sponsored runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:25 +02:00
Archana Shinde
9c233bb9e0 test: Add test to verify try_from for clh Netconfig
Add tests to verify conversion from runtime NetworkConfig
to clh specific config.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-16 00:24:14 -07:00
Fabiano Fidêncio
c69a1e33bd ci: Use variable size of VMs depending on the tests running
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.

Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that required the normal Azure instance (D4s_v5)

With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.

Fixes: #7972

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 09:13:54 +02:00
Archana Shinde
9049d311df runtime-rs: Add network support for cloud-hypervisor
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.

Support for adding and removing network devices is not really added to
the resource manager, so supporting this for cloud-hypervisor is not
scoped in this PR.

This also changes "pending_devices" for clh implementation from an
Option of vector to simply a vector. This simplifies the structure a bit
as we can simple iterate over the pending devices instead of having to
check for a "Some" value as this is not really required.

Fixes: #6333

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-15 23:25:20 -07:00
Greg Kurz
79c494eb4e Merge pull request #7969 from fidencio/topic/ci-cache-using-oras-part-3
ci: cache: Check the sha256sum of the components & fix ovmf-sev cache usage
2023-09-15 16:30:22 +02:00
Fabiano Fidêncio
eecd5bf2aa ci: cache: Fix ovmf-sev cache
The cached tarball is relying on the component name, thus it's important
to set it correctly, otherwise we'll end up always building it.

With this patch applied:
```
≡ ⨯ make ovmf-sev-tarball
make ovmf-sev-tarball-build
make[1]: Entering directory '/home/ffidenci/src/upstream/kata-containers/kata-containers'
/home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-binaries-in-docker.sh  --build=ovmf-sev
sha256:67cc94e393dc1d5bfc2b77a77e83c9b1c0833d0fbbebaa9e9e36f938bb841fcc
Build kata version 3.2.0-rc0: ovmf-sev
INFO: DESTDIR /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/destdir
Downloading a76f5522493f ovmf-sev-builder-image-version
Downloading 7e98c854bd94 kata-static-ovmf-sev.tar.xz
Downloading 559311973ff8 ovmf-sev-version
Downloaded  a76f5522493f ovmf-sev-builder-image-version
Downloading 353b655c2297 ovmf-sev-sha256sum
Downloaded  559311973ff8 ovmf-sev-version
Downloaded  353b655c2297 ovmf-sev-sha256sum
Downloaded  7e98c854bd94 kata-static-ovmf-sev.tar.xz
Pulled [registry] ghcr.io/kata-containers/cached-artefacts/ovmf-sev:latest-main-x86_64
Digest: sha256:933236c2c79e53be3ca7acc0b966d0ddac9c0335edcb1e8cad8b9bb3aaf508ce
kata-static-ovmf-sev.tar.xz: OK
INFO: Using cached tarball of ovmf-sev
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/share/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/share/ovmf/
-rwxr-xr-x runner/runner 4194304 2023-09-15 10:34 ./opt/kata/share/ovmf/AMDSEV.fd
~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build ~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/builddir
~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/builddir
make[1]: Leaving directory '/home/ffidenci/src/upstream/kata-containers/kata-containers'
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 12:39:22 +02:00
Fabiano Fidêncio
86c41074b4 ci: cache: Check the sha256sum of the component
We've removed this in the part 2 of this effort, as we were not caching
the sha256sum of the component.  Now that this part has been merged,
let's get back to checking it.

Fixes: #7834 -- part 3

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 12:34:30 +02:00
Fabiano Fidêncio
f5e52d02d3 Merge pull request #7964 from fidencio/topic/ci-cache-using-oras-part-2
ci: cache: Use the artefacts stored in ghcr.io/kata-containers/cached-artefacts/${component}
2023-09-15 12:29:28 +02:00
Fabiano Fidêncio
2fe0b494da Merge pull request #7959 from fidencio/topic/ci-run-on-smaller-garm-instances
ci: Run some of the GARM tests in smaller instances
2023-09-15 11:30:13 +02:00
Fabiano Fidêncio
460988c5f7 ci: cache: Remove the script used to cache artefacts on Jenkins
That's not needed anymore, as we've switched to using ORAS and an OCI
registry to cache the artefacts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:27:55 +02:00
Fabiano Fidêncio
4533a7a416 ci: cache: Also store the ${component} sha256sum
This is something that was done by our Jenkins jobs, but that I ended up
missing when writing d0c257b3a7.

Now, let's also add the sha256sum to the cached artefact, and in a
coming up PR (after this one is merged) we will also start checking for
that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:25:26 +02:00
Fabiano Fidêncio
eccc76df63 ci: cache: Use the cached artefacts from ORAS
In the previous series related to the artefacts we build, we've
switching from storing the artefacts on Jenkins, to storing those in the
ghcr.io/kata-containers/cached-artefacts/${artefact_name}.

Now, let's take advantage of that and actually use the artefacts coming
from that "package" (as GitHub calls it).

NOTE: One thing that I've noticed that we're missing, is storing and
checking the sha256sum of the artefact.  The storing part will be done
in a different commit, and the checking the sha256sum will be done in a
different PR, as we need to ensure those were pushed to the registry
before actually taking the bullet to check for them.

Fixes: #7834 -- part 2

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:13:47 +02:00
Jeremi Piotrowski
6f30d00ae7 Merge pull request #7956 from fidencio/topic/ci-reduce-the-machine-size-used
ci: Reduce the size of the AKS VMs
2023-09-15 08:49:08 +02:00
Steve Horsman
1b8f3fa9ae Merge pull request #7957 from fidencio/topic/ci-cache-using-oras-part-1
ci: cache: Allow pushing our artefacts to an OCI registry
2023-09-15 07:45:24 +01:00
Jianyong Wu
7f5e77bcb8 kernel: enable Arm pl011 support
Enable pl011 (ttyAMA0) support in kernel for aarch64.

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:45:16 +00:00
Jianyong Wu
241c355e07 clh:arm64: use arm AMBA uart for hypervisor debug
cloud hypervisor on arm64 only support arm AMBA UART(pl011) as
tty. So, the console should be set to "ttyAMA0" instead of "ttyS0"
when enable hypervisor debug mode.

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:44:23 +00:00
Fabiano Fidêncio
094b6b2cf8 ci: k8s: Temporarily disable tests that require a bigger VM instance
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI

We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 01:33:19 +02:00
GabyCT
6fe5cd3bd5 Merge pull request #7937 from GabyCT/topic/iperfbandwidth
metrics: Add iperf value for cpu utilization
2023-09-14 16:47:19 -06:00
Fabiano Fidêncio
d0c257b3a7 ci: cache: Push cached artefacts to ghcr.io
Let's push the artefacts to ghcr.io and stop relying on jenkins for
that.

Fixes: #7834 -- part 1

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
108f1b60dd kata-deploy: Generate latest_{artefact,image_builder} files
Right now this is not used, but it'll be used when we start caching the
artefacts using ORAS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
be2eb7b378 ci: cache: Install ORAS in the kata-deploy binaries builder container
ORAS is the tool which will help us to deal with our artefacts being
pushed to and pulled from a container registry.

As both the push to and the pull from will be done inside the
kata-deploy binaries builder container, we need it installed there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
fb24fb0dc1 ci: k8s: devmapper: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:27:05 +02:00
Fabiano Fidêncio
1daf02f5d4 ci: nydus: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
e60d81f554 ci: nerdctl: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
4db416997c ci: docker: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
32841827b8 ci: cri-containerd: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense.  With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:35 +02:00
Fabiano Fidêncio
92fff129fd ci: k8s: Don't set cpu limit request for k8s-inotofy test
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 22:03:16 +02:00
Fabiano Fidêncio
faf98c0623 ci: Reduce the size of the AKS VMs
We do **not** need a very powerful machine for our tests, as we're not
building anything there.

The instance we switched to (Standard_D2s_v5) still has nested virt
available, as shown here[0], but has half of the amount of vCPUs /
Memory, which should be fine only for running the tests, costing us
basically half of the price[1].

[0]:
https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series
[1]:
https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/#pricing

Fixes: #7955

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 22:03:16 +02:00
Fabiano Fidêncio
adc18ecdb1 ci: cache: For consistency, read all used env vars
Instead of having some of them only being considered if explicitly
passed to the script.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 20:24:48 +02:00
Fabiano Fidêncio
c7a851efd7 ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
As the environment variables are now being passed down from the GitHub
Actions, let's make sure they're exposed to the container used to build
the kata-deploy binaries, and during the build process we'll be able to
use those to log in and push the artefacts to the OCI registry, using
ORAS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 20:24:48 +02:00
Fabiano Fidêncio
2e8b41f39c Merge pull request #7954 from fidencio/topic/ci-cache-using-oras-part-0
ci: cache: Export env vars needed to use ORAS
2023-09-14 20:23:55 +02:00
Fabiano Fidêncio
6bd15a85d5 ci: cache: Export env vars needed to use ORAS
We do the build of our artefacts inside a container image, and we need
to expose some env vars to the container so ORAS can be used there to
push the artefacts we want to cache to ghcr.io.

The env vars we're exposing are:
* ARTEFACT_REGISTRY: The registry where we're going to save the
  artefacts.
* ARTEFACT_REGISTRY_USERNAME: The username to log in to the registry, as
  ORAS does not use the same json file used by docker.
* ARTEFACT_REGISTRY_PASSWORD: The pasword to log in to the the registry,
  as the ORAS does not use the same json file used by docker.
* TARGET_BRANCH: The target branch, which will be part of the tag of the
  artefact, as we may end up caching the artefacts for both main and
  stable branches.

Fixes: #7834 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 19:36:33 +02:00
Gabriela Cervantes
cd4fd1292a metrics: Add iperf cpu utilization limit for qemu
This PR adds the iperf cpu utilization limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-14 17:17:47 +00:00
Gabriela Cervantes
df5cd10ea0 metrics: Add iperf value for cpu utilization
This PR adds the iperf value for cpu utilization for kata metrics.

Fixes #7936

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-14 16:06:49 +00:00
Jeremi Piotrowski
b54dd8cdf4 Merge pull request #7704 from jepio/vfio-part-1
gha: vfio: Import test script
2023-09-14 16:45:31 +02:00
Jeremi Piotrowski
a96050a7ad tests: Apply timeout to 'ctr t kill'
This task has been observed to hang at times.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
9d93036783 tests/vfio: Bump VM image to Fedora 38
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
faee59b520 tests/vfio: Accept single device in vfio group for CLH
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
df3dc1105c tests/vfio: Get rid of sync's
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7211c3dccc gha: vfio: Set test timeout to 15m
Sometimes the test gets stuck running commands in the container - need to
investigate why later.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
1b02f89e4f packaging: kernel: Enable VIRTIO_IOMMU on x86_64
Cloud Hypervisor exposes a VIRTIO_IOMMU device to the VM when IOMMU support is
enabled. We need to add it to the whitelist because dragonball uses kernel
v5.10 which restricted VIRTIO_IOMMU to ARM64 only.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
3a1db7a86b runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
9f1a42c6cc tests/vfio: Give commands 30s to execute
This is a to catch the case of the guest getting stuck.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
b46b0ecf8b tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
This shouldn't be hiding behind only a qemu check, we need this for clh as
well.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
bfc93927fb runtime: Remove redundant check in checkPCIeConfig
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7c4e73b609 runtime: Add test cases for checkPCIeConfig
These test cases shows which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
fc51e4b9eb runtime: Check config for supported CLH (cold|hot)_plug_vfio values
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio not supported yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
509771e6f5 runtime: clh: Add hot_plug_vfio entry to config
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
5f6475a28a tests/vfio: Gather debug info and disable tdp_mmu
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly bad when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.

Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
8fffdc81c5 tests/vfio: Capture journal from vm
For debugging (though this doesn't get exposed yet).

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
df815087e7 tests/vfio: Change to get the test working in GHA
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
a92ddeea15 tests/vfio: Move dependency installation to gha-run.sh
To match the flow of other github actions workflows.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
5a551a85b1 gha: vfio: Import jobs scripts from tests repo
This imports the vfio test scripts github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Fabiano Fidêncio
a1e3fa7ac4 Merge pull request #7905 from microsoft/danmihai1/mariner-annotations
tests: fix kernel and initrd annotations
2023-09-14 10:37:42 +02:00
GabyCT
1d331124ad Merge pull request #7925 from GabyCT/topic/bandwidthlimit
metrics: Add iperf bandwidth value for kata metrics
2023-09-13 17:43:55 -06:00
Gabriela Cervantes
49e2fa189c metrics: Increase jitter value for qemu
This PR increases the jitter value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-13 22:36:09 +00:00
Gabriela Cervantes
49234433a7 metrics: Increase value limit for jitter in clh
This PR increases the value limit for jitter in clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-13 21:27:08 +00:00
David Esparza
0a24d3f718 Merge pull request #7923 from GabyCT/topic/addcassandradoc
metrics: Add Cassandra Metrics documentation
2023-09-13 10:17:00 -06:00
GabyCT
c565053bac Merge pull request #7895 from GabyCT/topic/removewarning
metrics: Remove warning from metrics documentation
2023-09-13 10:16:38 -06:00
Fabiano Fidêncio
8b9df1d32e Merge pull request #7929 from fidencio/topic/use-tcp-port-ping-on-docker-nerdctl-tests
ci: docker: nerdctl: Switch to tcp port 80 ping
2023-09-13 15:46:31 +02:00
Peng Tao
55ca7e8aec Merge pull request #7907 from Xuanqing-Shi/7876/network-devices-naming-conflict
runtime: Naming conflict of network devices
2023-09-13 19:29:41 +08:00
Fabiano Fidêncio
813bfdec01 ci: docker: nerdtl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
This will ensure that we're calling the correct binary for the
hypervisor.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:10:14 +02:00
Fabiano Fidêncio
46bc0b1c01 ci: nerdctl: Create the containerd config
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.

This is mostly needed because the nerdctl-full tarball doesn't provide a
contaienrd configuration, just the binary, as contaienrd does not
actually require a configuration file to run with the default config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
Fabiano Fidêncio
13968aa7f6 ci: nerdctl: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
Fabiano Fidêncio
e0c811678b ci: docker: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.

This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments, after all this is
internet. :-)

Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping.  With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
shixuanqing
1636abbe1c runtime: issue with non-empty []Endpoint in RemoveEndpoints
In the RemoveEndpoints(), when the endpoints paramete isn't empty,
using idx may result in wrong endpoint removals. To improve,
directly passing the endpoint parameter helps
locate the correct elements within n.eps.

Fixes: #7732

Signed-off-by: shixuanqing <1356292400@qq.com>

Fixes: #7732

Signed-off-by: shixuanqing <1356292400@qq.com>

Update src/runtime/virtcontainers/network_linux.go

Co-authored-by: Xuewei Niu <justxuewei@apache.org>
2023-09-13 09:47:18 +00:00
Peng Tao
9766f9090c Merge pull request #7719 from beraldoleal/nullable
Remove gogoproto.nullable extension
2023-09-13 15:11:56 +08:00
David Esparza
c2b2a00ad9 Merge pull request #7899 from GabyCT/topic/startdocker
metrics: Ensure docker is running in init_env
2023-09-12 23:01:26 -06:00
Gabriela Cervantes
0aa073967d metrics: Add iperf bandwidth value for qemu
This PR adds the iperf bandwidth value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 20:57:14 +00:00
Dan Mihai
c0ad914766 tests: fix kernel and initrd annotations
Fix kernel and initrd annotations in the k8s tests on Mariner. These
annotations must be applied to the spec.template for Deployment, Job
and ReplicationController resources.

Fixes: #7764

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-12 20:15:25 +00:00
Gabriela Cervantes
615c1cbf19 metrics: Add iperf bandwidth value for kata metrics
This PR adds the iperf bandwidth value for kata metrics.

Fixes #7924

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 19:30:24 +00:00
Gabriela Cervantes
d53eb73eec metrics: Ensure docker is running in init_env
This PR ensures that docker is running as part of the init_env function
in kata metrics to avoid failures like docker is not running and making
the kata metrics CI to fail.

Fixes #7898

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 19:13:09 +00:00
GabyCT
c0d502493e Merge pull request #7921 from dborquez/metrics_disable_fio_test
metrics: this PR skips the FIO test temprarily to fix issues
2023-09-12 12:08:48 -06:00
Gabriela Cervantes
ad08321b83 metrics: Add Cassandra Metrics documentation
This PR adds the Cassandra Metrics documentation for kata metrics.

Fixes #7922

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 16:30:35 +00:00
David Esparza
a58ea66592 metrics: this PR skips the FIO test temprarily to fix issues
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.

Fixes: #7920

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-12 10:23:57 -06:00
Fabiano Fidêncio
2d8447fc6b Merge pull request #7916 from fidencio/topic/add-functional-nerdctl-tests
ci: Add a very basic nerdctl sanity test
2023-09-12 17:47:08 +02:00
James O. D. Hunt
7feb8de9dc Merge pull request #7887 from jodh-intel/hypervisor-remove-debug-kernel-options
runtime-rs: hypervisor: Remove debug kernel options
2023-09-12 16:31:48 +01:00
Fabiano Fidêncio
f536ef5ce1 ci: docker: Also run the smoke test with runc
This will help us to make sure that the failure is actually related to
Kata Containers.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:54:02 +02:00
Fabiano Fidêncio
c83f167c59 ci: docker: Run the tests after the kata-static is created
There's no reason to wait till the payload is created to run the tests,
as we rely on the tarball, not on the kata-deploy payload.

That was a mistake on my side, and that's already fixed for the nerdctl
tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:53:47 +02:00
Fabiano Fidêncio
12d833d07d ci: Add a very basic nerdctl sanity test
Let's add a very basic sanity test to check that we can spawn a
containers using nerdctl + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7911

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:52:55 +02:00
Greg Kurz
be71a0ab4e Merge pull request #7811 from stevenhorsman/bump-rust-to-1.72
versions: Bump rust version
2023-09-12 15:30:35 +02:00
Fabiano Fidêncio
b020912629 Merge pull request #7913 from fidencio/topic/add-functional-docker-tests
ci: Add a very basic docker sanity test
2023-09-12 15:28:49 +02:00
Fabiano Fidêncio
348b8644d6 ci: Add a very basic docker sanity test
Let's add a very basic sanity test to check that we can spawn a
containers using docker + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 15:15:26 +02:00
stevenhorsman
a75fd5eb81 runk: Fix rust unecessary mut error
- Fix `error: variable does not need to be mutable`
in rust 1.72

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
a31c145172 kata-ctl: useless-vec warning
- Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
c8419fc3bb kata-ctl: Resolve non-minimal-cfg warning
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the any combinator.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
3eaf68d954 agent-ctl: Allow clippy lint
- Allow `clippy::redundant-closure-call`
which has issues with the guard function passed into
the `run_if_auto_values` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
1d8b78959d runtime-rs: Fix useless-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
99f3d69e94 runtime-rs: Remove mut
Fix `error: variable does not need to be mutable`

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
16fbc27b09 dragonball: Allow ambiguous-glob-reexports
The bindgen generated code is triggering lots of
ambiguous-glob-reexports warnings in rust 1.70+

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
bbf1919516 dragonball: Resolve non-minimal-cfg warning
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the all combinators.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
75cfdd5d59 agent: config: Allow clippy lint
- Allow `clippy::redundant-closure-call` in `from_cmdline`
which has issues with the guard function passed into
the `parse_cmdline_param` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
f3a0fd5907 agent: config: Fix useles-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
9e423bd3d6 libs: Fix clippy unnecesary hashes error
- Fix error: unnecessary hashes around raw string literal

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
444395050a versions: Bump rust version
Bump rust to 1.72.0 to test what extra warnings/issues we get

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
Yipeng Yin
a16b0962b5 chore(cargo): update cargo lock
Update cargo lock for runtime-rs, agent and kata-ctl.

Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 15:27:38 +08:00
Chao Wu
c800d0739f Merge pull request #7889 from UiPath/fix-dragonball-build
dragonball: fix for non-deterministic builds
2023-09-12 14:06:18 +08:00
shixuanqing
ca4b6b051d runtime: Naming conflict of network devices
When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness.

Fixes: #7876

Signed-off-by: shixuanqing <1356292400@qq.com>
2023-09-12 04:29:51 +00:00
Guixiong Wei
202049f35e feat(runtime-rs): introduce huge page type to select VM RAM's backend
This commit allows us to specify the huge page backend when enabling huge
page. Currently, we support two backends: thp and hugetlbfs, the default
is hugetlbfs.

To ensure backward compatibility, we introduce another configuration item
"hugepage_type" to select the memory backend, which is available only when
"enable_hugepages" is true. Besides, we add an annotation
"io.katacontainers.config.hypervisor.hugepage_type" to configure huge page
type per pod.

Fixes: #6703

Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 11:28:27 +08:00
Zhongtao Hu
e1f54f96d0 Merge pull request #7766 from Apokleos/wrap-vsock-virtiofs
runtime-rs: bring hybrid vsock devices in manager.
2023-09-12 09:27:34 +08:00
GabyCT
af29eeb8b1 Merge pull request #7901 from fidencio/topic/ci-target-branch-fixes-follow-up-3
ci: use github.ref_name instead of $GITHUB_REF_NAME
2023-09-11 15:31:29 -06:00
Fabiano Fidêncio
f811b064ca ci: use github.ref_name instead of $GITHUB_REF_NAME
As, regardless of what's mentioned in the documentation, it seems that
$GITHUB_REF_NAME is passed down as a literal string.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 22:14:55 +02:00
Fabiano Fidêncio
dc0b350e49 Merge pull request #7900 from fidencio/topic/ci-target-branch-fixes-follow-up-2
ci: Add more target-branch related fixes
2023-09-11 21:26:26 +02:00
Fabiano Fidêncio
6d795c089e ci: Add more target-branch related fixes
The ones for the payload-after-push.yamland ci-nightly.yaml are not that
much important right now, but they're needed for when we start running
those on stable branches as well.

The other ones were missed during
bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 20:42:57 +02:00
Fabiano Fidêncio
07d0ad0ad7 Merge pull request #7897 from fidencio/topic/ci-devmapper-do-the-rebase-as-well
ci: Fix target-branch usage
2023-09-11 20:30:53 +02:00
Fabiano Fidêncio
d7f991d139 Merge pull request #7151 from Yuan-Zhuo/fix-systemd-cgroup
agent: optimize the code of systemd cgroup manager
2023-09-11 20:15:51 +02:00
Fabiano Fidêncio
8509c31870 ci: Fix target-branch usage
We missed those one as part of bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 20:10:27 +02:00
Gabriela Cervantes
060499dcae metrics: Remove warning from metrics documentation
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.

Fixes #7894

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-11 16:41:48 +00:00
GabyCT
b384757ac7 Merge pull request #7874 from fidencio/topic/manually-rebase-branches-atop-of-the-target-one
gha: Manually rebase PR atop of the target branch before testing
2023-09-11 10:35:01 -06:00
Fabiano Fidêncio
46e73cf7a2 Merge pull request #7884 from fidencio/topic/update-kernel-to-the-latest-lts-plus-bring-in-erofs-patches
Update kernel to the latest LTS release (v6.1.52) and bring in erofs patches needed for the CC work
2023-09-11 13:58:43 +02:00
James O. D. Hunt
c0f697fcc5 runtime: Allow kernel_params annotation
To support the removal of the `initcall_debug` and `earlyprintk=`
options from the default guest kernel cmdline, add `kernel_params` to the list
of enabled annotations to allow those kernel options (or others) to be
set using `kata-deploy` for either runtime.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 12:12:12 +01:00
Alexandru Matei
b03e49794e dragonball: fix for non-deterministic builds
Fixes: #7888

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-11 14:07:10 +03:00
Fabiano Fidêncio
93bad13769 Merge pull request #7875 from fidencio/topic/kata-deploy-fix-arm64-image-build
kata-deploy: Fix aarch64 image build
2023-09-11 11:36:52 +02:00
James O. D. Hunt
976d10150c runtime-rs: hypervisor: Remove debug kernel options
Removed the following kernel command line options:

- `earlyprintk=ttyS0`
- `initcall_debug`

Both these options are only useful when debugging a guest kernel failure
which is not a common occurrence.

Further, the `earlyprintk=` option can have a large negative performance
impact (it can increase the VM boot time significantly).

If the user wishes to use either of these options, they can add them to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.

Fixes: #7886.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 09:43:39 +01:00
Fabiano Fidêncio
fde34610cd kernel: Add erofs patches needed for CC related work
All the patches have already been merged upstream and they've just been
cherry-picked to this branch.

Fixes: #7885

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 10:39:37 +02:00
Fabiano Fidêncio
dc6a4588a2 versions: Bump kernel to the latest LTS release (6.1.52)
We're bumping here in order to make our lives easier backporting EROFS
patches needed for the CC related work.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 10:32:16 +02:00
James O. D. Hunt
52f6449b70 kata-manager: Remove initcall_debug kernel option
Removed the addition of the `initcall_debug` kernel option when agent
debugging enabled. This option has nothing to do with the agent.

If the user wishes to use this option, they can add it to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 09:31:44 +01:00
Fabiano Fidêncio
6cd5d83a37 Merge pull request #7865 from gkurz/fix-more-virtiofs-args
runtime: Fix more virtiofs args
2023-09-09 21:30:16 +02:00
Fabiano Fidêncio
8b4a0b368f kata-deploy: Remove curl after it's used
There's no need to keep curl there after the kubectl binary has already
been downloaded.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-09 10:52:05 +02:00
Fabiano Fidêncio
139c7f03ab kata-deploy: Fix aarch64 image build
Similarly to what's been done for x86_64 -> amd64, we need to do a
aarch64 -> arm64 change in order to be able to download the kubectl
binary.

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-09 10:51:52 +02:00
Fabiano Fidêncio
94f5a69346 Merge pull request #7862 from fidencio/topic/kata-deploy-use-alpine-as-base-image
kata-deploy: Switch to an alpine image
2023-09-09 09:02:13 +02:00
Yuan-Zhuo
470d065415 agent: optimize the code of systemd cgroup manager
1. Directly support CgroupManager::freeze through systemd API.
2. Avoid always passing unit_name by storing it into DBusClient.
3. Realize CgroupManager::destroy more accurately by killing systemd unit rather than stop it.
4. Ignore no such unit error when destroying systemd unit.
5. Update zbus version and corresponding interface file.

Acknowledgement: error handling for no such systemd unit error refers to

Fixes: #7080, #7142, #7143, #7166

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
2023-09-09 13:56:43 +08:00
GabyCT
fa818bfad1 Merge pull request #7867 from GabyCT/topic/optimizedimage
metrics: Use TensorFlow optimized image
2023-09-08 11:34:21 -06:00
Fabiano Fidêncio
bd24afcf73 gha: Manually rebase PR atop of the target branch before testing
We're changing what's been done as part of ac939c458c, as we've
notcied issues using `github.event.pull_request.merge_commit_sha`.

Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/

In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 18:56:31 +02:00
GabyCT
dc7414f5c1 Merge pull request #7870 from dborquez/metrics_fio_fix_clean_env_order
metrics: fix FIO test initialization
2023-09-08 10:28:10 -06:00
Greg Kurz
72c510d057 runtime/virtiofsd: Drop all references to "--cache=none"
This syntax belongs to the legacy C virtiofsd implementation that
we don't support anymore since kata-containers 3.1.3 because
of other API breaking changes.

People have been warned to switch from "none" to "never" since
kata-containers 2.5.2. Let's officially do that.

The compat code that would convert "none" to "never" isn't
needed anymore. Just drop it.

Fixes #7864

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-08 17:57:30 +02:00
Beraldo Leal
ead724bec1 protocol: removing gogo.nullable feature
gogo.nullable is the main gogo.protobuf' feature used here. Since we are
trying to remove gogo.protobuf, the first reasonable step seems to be
remove this feature. This is a core update, and it will change how the
structs are defined. I could spot only a few places using those structs,
based on make check/build.

Fixes #7723.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
d8e4bb9859 protocol: remove unused PROTO_FILE env
There is no reference to PROTO_FILE and this is not working. Also we are
not inside a Makefile, so makes sense to adapt the usage to reflect the
script instead of a make command.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
5e1106a770 protocol: remove unused import_path
import_path is used as the default package when no input files specify
go_package. However, all the files we are currently building already
have a go_package definition, making this behavior both redundant and
error-prone.

Additionally, one of our files (types.pb.go) resides outside the grpc
directory, indicating that it's indeed ignored but also inconsistent.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
87accaaecb protocol: use workdir during build
Currently, the script searches for .proto files within $GOPATH/.
Consequently, modifications to a definition file in the current working
directory won't influence the output .pb.go if the directory is outside
of $GOPATH. For developers, it's more intuitive to alter the local
codebase than the version stored in $GOPATH.

With this modification, the generated .pb.go files will be relative to
the current working directory, removing the need to clone this project
under $GOPATH/src/github.com/kata-containers.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
711a7ed965 protocol: remove mapping definitions
The definitions are already specified in the .proto files using the
go_package option. Centralizing them in one location reduces the
potential for errors and simplifies the script.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
8db84c1bd2 protocol: force GOPATH to be set
Currently, if GOPATH is not set, errors will raise since protoc is using
GOPATH to find packages.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
68156d77ac protocol: breaking lines to improve readability
Just a small change to improve the readability of modules before the
actual changes.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Fabiano Fidêncio
670a8e9c73 kata-deploy: Switch to an alpine image
This will make our image smaller, and still ensure it's multi-arch
support.

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 17:39:51 +02:00
Fabiano Fidêncio
0b26a5d053 Merge pull request #7871 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-3
ci: k8s: Add clean-up-garm argument for gha-run.sh
2023-09-08 17:27:57 +02:00
Fabiano Fidêncio
9d74b7ccc9 k8s: ci: Skip "Pod quota" test with firecracker
The test is failing, and an issue has been opened to track it.
For now, let's skip it.

Issue:
https://github.com/kata-containers/kata-containers/issues/7873

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 15:51:46 +02:00
Fabiano Fidêncio
f6cd3930c5 ci: k8s: Remove useless skip statement from tests
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.

We're only touching the files here that were touched in the previous
commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:25:29 +02:00
Fabiano Fidêncio
3cc20b47a6 ci: k8s: Also check for "fc" (for firecracker)
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc for the runtime class)
instead of `firecracker`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:25:24 +02:00
Fabiano Fidêncio
b5bad3cb0f ci: k8s: Add clean-up-garm argument for gha-run.sh
The tests are failing to finish as the argument is invalid.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:04:50 +02:00
Fabiano Fidêncio
05e2e7636e Merge pull request #7868 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-2
ci: k8s: Second round of fix-ups with the devmapper CI
2023-09-08 11:02:20 +02:00
Fabiano Fidêncio
aaec5a09f3 ci: k8s: devmapper tests should be using ubuntu 20.04
That's what we've been using as part of Jenkins, so let's ensure things
will work as they did before, and only after that consider upgrading the
base OS used for the tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
27fa7d828d ci: k8s: Add a kata-deploy-garm target
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
fa62a4c01b ci: k8s: Export KUBERNETES env var
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
8c9380a798 ci: k8s: Install bats on GARM runners
GARM runners do not come with the whole set of tools we need, or are
used to when it comes to the GHA runners, so we need to manually install
bats on those.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
3de23034f8 ci: k8s: Wait some time after restarting k3s
Let's put a 1 minute sleep, just to make sure everything is back up
again.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:46:58 +02:00
David Esparza
adfea55b8f metrics: fix FIO test initialization
This PR changes the order in which the FIO test first
cleans the environment and then checks if the environment
is indeed clean.

Fixes: #7869

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-07 15:41:59 -06:00
Fabiano Fidêncio
2df183fd99 ci: k8s: Append, instead of overwrite, the devmapper config
As we were using `tee` without the `-a` (or `--apend`) aptton, the
containerd config would be overwritten, leading to a NotReady state of
the Node.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
369a8af8f7 ci: k8s: Decrease k3s sleep from 4 to 2 minutes
It should be plenty, and worked well in local tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
ada65b988a ci: k8s: Use vanilla kubectl with k3s
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```

The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.

Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
ad45ab5d33 ci: k8s: Ensure k3s is deploy with --write-kubeconfig-mode=644
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.

As --write-config-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
028a97e0d5 ci: k8s: Use the proper command for sleep
`wait` waits for a job to complete, not a number of seconds.  Not sure
how I got that wrong in the first place, but it's what it's.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
David Esparza
34f580901f Merge pull request #7824 from dborquez/fix_memory_usage_initialization
metrics: re-enable memory-usage initialization step
2023-09-07 14:24:27 -06:00
Gabriela Cervantes
3a427795ea metrics: Use TensorFlow optimized image
This PR replaces the ubuntu image for one which has TensorFlow optimized
for kata metrics.

Fixes #7866

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-07 15:38:51 +00:00
Chao Wu
cd8c217ee1 Merge pull request #6879 from openanolis/chao/update_upstream_upcall_feature
Dragonball: optimize the placement of dbs-upcall features
2023-09-07 18:07:53 +08:00
Fabiano Fidêncio
dfa1cce916 Merge pull request #7860 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-1
ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
2023-09-07 11:48:30 +02:00
Fabiano Fidêncio
8d99972a8a ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
integrations -> integration
integrtion -> integration

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 11:31:30 +02:00
Fabiano Fidêncio
0483d3d16d Merge pull request #7841 from fidencio/topic/ci-add-k8s-devmapper-tests
ci: k8s: Add k8s devmapper tests (part 0)
2023-09-07 10:53:09 +02:00
Jeremi Piotrowski
f6cc01d77c Merge pull request #7833 from jepio/kata-static-fix-ownership
kata-deploy: Create kata-static.tar with correct ownership
2023-09-07 10:16:23 +02:00
Peng Tao
435e890cd9 Merge pull request #7703 from bergwolf/github/nerdctl-fc
runtime: run prestart hooks before starting VM for FC
2023-09-07 10:55:31 +08:00
Chao Wu
deed1b927d Dragonball: optimize the placement of dbs-upcall features
Currently, the dbs-upcall features have 2 problems that are needed to be
fixed :

There are redundant dbs-upcall features that are needed to be removed.
Some place should be controlled by dbs-upcall but not being implemented.

This commit will fix those two problems.

fixes: #6878

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-07 10:27:29 +08:00
Fabiano Fidêncio
0e8bd50cbb ci: k8s: Add k8s devmapper tests (part 0)
Let's enable the devmapper kubernetes tests to match exactly what's been
tested as part of the Jenkins CI.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 23:08:38 +02:00
Fabiano Fidêncio
b28b54df04 ci: k8s: Add a function to configure devmapper for containerd
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.

This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers.  OTOH, this is exactly what we've always been doing as
part of the tests.

We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.

It's important to note that the devmapper setup doesn't take into
consideration a BM machine, and this is not suitable for that.  We're
really only targetting GHA runners which will be thrown away after the
run is over.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-06 23:08:17 +02:00
Fabiano Fidêncio
54f7117212 ci: k8s: Add a function to deploy k3s
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.

As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, and which has been proven to be really good on
getting things rotten.

With this in mind, I'm taking the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.

It's important to note that the k3s setup doesn't take into
consideration a BM machine, and this is not suitable for that.  We're
really only targetting GHA runners which will be thrown away after the
run is over.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 23:07:41 +02:00
David Esparza
cf258090aa Merge pull request #7843 from GabyCT/topic/ffiolimit
metrics: Add write 95 percentile FIO value
2023-09-06 14:52:00 -06:00
Fabiano Fidêncio
c5e1e7ddc3 Merge pull request #7854 from fidencio/topic/runtime-allow-virtio_fs_extra_args-annotation
runtime: Allow virtio_fs_extra_args annotation
2023-09-06 19:20:40 +02:00
Greg Kurz
81536f21af runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated
with the rust implementation.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-06 17:50:35 +02:00
Fabiano Fidêncio
b1dd09a4d3 runtime: Allow virtio_fs_extra_args annotation
Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.

With this in mind, let's allow users to pass whatever thet need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.

Fixes: #7853

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 17:11:16 +02:00
Hyounggyu Choi
d27fe18167 Merge pull request #7849 from BbolroC/hot-fix-dockerbuild
packaging: do not install docker-compose-plugin for s390x|ppc64le
2023-09-06 13:13:25 +02:00
Hyounggyu Choi
2efda20c77 packaging: do not install docker-compose-plugin for s390x|ppc64le
This PR is to skip installing docker-compose-plugin while buiding a `build-kata-deploy` image for s390x|ppc64le.
It is a temporary solution to fix current CI failures for s390x regarding `hash sum mismatch`.

Fixes: #7848
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-09-06 11:12:03 +02:00
Zhongtao Hu
aa85e0b3ec Merge pull request #7714 from justxuewei/volumes-cleanup
runtime-rs: Fix volumes and rootfs cleanup issues
2023-09-06 10:13:55 +08:00
Gabriela Cervantes
438fbf9669 metrics: Add write 95 percentile for FIO for qemu
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 22:50:31 +00:00
Gabriela Cervantes
024b4d2ffe metrics: Add write 95 percentile FIO value
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.

Fixes #7842

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 21:00:05 +00:00
GabyCT
3e3a91fd2c Merge pull request #7577 from GabyCT/topic/enableiperfm
metrics: Enable iperf benchmark on gha for kata metrics
2023-09-05 14:53:47 -06:00
Gabriela Cervantes
e98e5cdea2 metrics: Add checkmetrics to gha run script
This PR adds the checkmetrics to gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 17:05:03 +00:00
Gabriela Cervantes
c1edfe5511 metrics: Add checkmetrics value for qemu for iperf
This PR adds the checkmetrics value for qemu for iperf benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
6a79ecedf9 metrics: Add jitter value for clh
This PR adds jitter value for clh for iperf metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
f609a9a754 metrics: Add test selector to iperf metrics
This PR adds test selector to iperf metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
5b8db30422 metrics: Enable iperf benchmark on gha for kata metrics
This PR enables the iperf benchmark to run on the gha for kata metrics.

Fixes #7575

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Jeremi Piotrowski
cf46b056fd Merge pull request #7839 from openanolis/chao/switch_to_azure
CI: switch static-checks-dragonball CI machines to Azure
2023-09-05 10:59:02 +02:00
Chao Wu
60f733d301 CI: switch static-checks-dragonball CI machines to Azure
Previously, static-checks-dragonball is using machines from Alibaba
Cloud to run all the CI jobs.

Currently, we are going through an internal process to apply for the new
machines for Dragonball CI. Before the internal process is over, we will
temporarily use Azure VM to run static-checks-dragonball jobs.

fixes: #7838

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-05 15:19:07 +08:00
alex.lyn
7870b33a2d runtime-rs: bring hybridVsock devices in manager.
Currently, virtio_vsock are still outside of the device
manager. This causes some management issues,such as the
inability to unify PCI address management.

Just do some work for hybrid vsock.

Fixes: #7655

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-09-05 08:46:56 +08:00
Jeremi Piotrowski
18c94ebbe3 kata-deploy: Create kata-static.tar with correct ownership
Pass --owner and --group to the tar invokation to prevent gihtub runner user
from leaking into release artifacts.

Fixes: #7832
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-04 17:24:00 +02:00
Fabiano Fidêncio
b663ec21ac Merge pull request #7803 from GabyCT/topic/readmereportdoc
metrics: Add README for kata metrics report
2023-09-03 21:57:13 +02:00
Fabiano Fidêncio
e490b0bc76 Merge pull request #7808 from ManaSugi/fix/remove-manual-chcon
osbuilder: Remove chcon operation for guest SELinux
2023-09-03 21:55:02 +02:00
Fabiano Fidêncio
27dab249a0 Merge pull request #7800 from jodh-intel/kata-sys-util-update-tdx-protection-checks
kata-sys-util: protection: Update TDX checks
2023-09-02 14:47:51 +02:00
Jiang Liu
d5729e818c Merge pull request #7819 from jiangliu/storage-cleanup
Improve the way to clean up storage devices for sandbox
2023-09-02 17:02:51 +08:00
Jiang Liu
57e7bf14a6 agent: refine StorageDeviceGeneric::cleanup()
Refine StorageDeviceGeneric::cleanup() to improve safety.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:22:21 +08:00
Jiang Liu
53edb19374 agent: implement StorageDeviceGeneric::cleanup()
Refactor cleanup_sandbox_storage as StorageDeviceGeneric::cleanup().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:00:26 +08:00
Jiang Liu
0c63453e28 types: make StorageDevice::cleanup() return possible error code
Make StorageDevice::cleanup() return possible error code.

Fixes: #7818

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:27:06 +08:00
Jiang Liu
3a3d77b3b5 agent: move StorageDeviceGeneric from kata-types into agent
Move StorageDeviceGeneric from kata-types into agent, so we can
refactor code later.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:12:17 +08:00
Jiang Liu
d848126b61 Merge pull request #7821 from jiangliu/storage-leak
agent: avoid possible leakage of storage device
2023-09-02 12:40:40 +08:00
Fabiano Fidêncio
4f92e6df90 Merge pull request #7683 from microsoft/danmihai1/policy-tests
tests: add policy to existing tests
2023-09-01 23:52:15 +02:00
David Esparza
b151cfd140 metrics: re-enable memory-usage initialization step
This PR re-enables the initialization step disabled
on 538c965c2b.

Fixes: #7804

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-01 14:29:34 -06:00
Fabiano Fidêncio
f3e1a6a94f osbuilder: alpine: Change mirror
As we're hitting a lot of:
```
ERROR: https://dl-5.alpinelinux.org/alpine/v3.18/main: operation timed
out
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:42 +00:00
Fabiano Fidêncio
ac612aef5e osbuilder: alpine: Match the version on versions.yaml
We've switching to 3.18 as part of
82cd14ba39.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:33 +00:00
Jiang Liu
9cd706d1c9 agent: avoid possible leakage of storage device
When a storage device is used by more than one container, the second
and forth instances will cause storage device reference count leakage,
thus cause storage device leakage. The reason is:
add_storages() will increase reference count of existing storage device,
but forget to add the device to the `mount_list` array, thus leak the
reference count.

Fixes: #7820

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-01 22:52:42 +08:00
Dan Mihai
bf21411e90 tests: add policy to k8s tests
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX

Also, add an example of policy rejecting ExecProcessRequest.

Fixes: #7667

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Dan Mihai
d0e0610679 runtime: config: use the SEV initrd for SNP
Thanks Unmesh Deodhar!

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
67fed26f18 runtime: Use TDX image with in the qemu-tdx config
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
f65ffb23da Merge pull request #7814 from fidencio/topic/gha-rebase-prs-atop-of-main-for-the-tests
gha: Rebase PR atop of the target branch before testing
2023-09-01 16:26:32 +02:00
Fabiano Fidêncio
ef70aeb6b8 Merge pull request #7817 from fidencio/topic/update-alpine-to-its-latest-release
versions: Update alpine to its 3.18 version
2023-09-01 14:51:58 +02:00
Fabiano Fidêncio
ac939c458c gha: Rebase atop of the target branch
We have two scenarios we care about this, `pull_request` and
`pull_request_target` events triggered a job.

`pull_request` event:
When using the checkout action, it'll already provide a "rebased atop of
main" repo for us, nothing else is needed, and that's basically what we
already have as part of the jobs in our CI.

`pull_request_target` event:
This one is a little bit tricky, as the checkout action, unless passing
a spsecific repo, give us the PR checked out rebased atop of the HEAD of
the PR branch.  Jeremi Piotrowski nicely pointed out that we could use
github.event.pull_request.merge_commit_sha instead, which is the result
of the PR's branch with the official repo target branch.

Now, the only cases where the contributor's rebase would still be needed
is when the action itself has been changed.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 11:23:31 +02:00
Jeremi Piotrowski
bde06758b1 Merge pull request #7761 from jepio/iocopy-fix-race
runtime: Fix data race in ioCopy
2023-09-01 09:30:54 +02:00
Fabiano Fidêncio
82cd14ba39 versions: Update alpine to its 3.18 version
3.15 will be out of life in 2 months from now.

Fixes: #7816

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 23:02:54 +02:00
GabyCT
d75c7b5f9c Merge pull request #7813 from GabyCT/topic/genreport
metrics: Add grabdata script for metrics report
2023-08-31 13:33:38 -06:00
Gabriela Cervantes
6668825752 metrics: Add grabdata script for metrics report
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.

Fixes #7812

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-31 16:17:29 +00:00
James O. D. Hunt
c290eaed8c kata-sys-util: protection: Update TDX checks
Update the protection checking code to detect newer versions of Intel
TDX (whose userland interface has now stabilised).

> **Note:** that we don't need to retain the existing behaviour since:
>
> - We haven't yet landed the TDX feature (#6448).
> - Systems wishing to use TDX will need to use the latest available
>   system components (such as firmware and host kernel).

Also added an explicit TDX unit test.

Fixes: #7384.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-08-31 16:15:15 +01:00
Fabiano Fidêncio
d7a996c686 gha: Update to checkout@v3 action
At this point we should always be using the latest checkout action.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 16:02:31 +02:00
Jeremi Piotrowski
d7612440b8 Merge pull request #7789 from beraldoleal/tests/amd
Fixes tests on AMD machines
2023-08-31 11:23:51 +02:00
Jeremi Piotrowski
c2ba29c15b runtime: Fix data race in ioCopy
IoCopy is a tricky function (I don't claim to fully understand its contract),
but here is what I see: The goroutine that runs it spawns 3 goroutines - one
for each stream to handle (stdin/stdout/stderr). The goroutine then waits for
the stream goroutines to exit. The idea is that when the process exits and is
closed, the stdout goroutine will be unblocked and close stdin - this should
unblock the stdin goroutine. The stderr goroutine will exit at the same time as
the stdout goroutine. The iocopy routine then closes all tty.io streams.

The problem is that the stdout goroutine decrements the WaitGroup before
closing the stdin stream, which causes the iocopy goroutine to race to close
the streams. Move the wg.Done() of the stdout routine past the close so that
*this* race becomes impossible. I can't guarantee that this doesn't affect some
unspecified behavior.

Fixes: #5031
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-31 10:17:38 +02:00
Manabu Sugimoto
211de08d9e osbuilder: Remove chcon operation for guest SELinux
Remove the `chcon` operation which adds `container_runtime_exec_t` label to
the `kata-agent` binary because the container-selinux package including
the 39f83cc74d
commit has been released officially.
Ref. https://centos.pkgs.org/9-stream/centos-appstream-x86_64/container-selinux-2.221.0-1.el9.noarch.rpm.html

The container-selinux package is installed in a guest rootfs when we create it with `SELinux = yes`,
and `restorecon` sets `container_runtime_exec_t` to the `kata-agent`.

Fixes: #7807

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-31 16:44:32 +09:00
GabyCT
b467f2ef68 Merge pull request #7772 from GabyCT/topic/fiolimit
metrics: Enable FIO limits for kata metrics
2023-08-30 14:49:04 -06:00
Gabriela Cervantes
9f21fa9b39 metrics: Add report generator link to general documentation
This PR adds the report generator link to general documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:55:14 +00:00
Gabriela Cervantes
c0ed5ea0ad metrics: Add README for kata metrics report
This PR adds the README for kata metrics report.

Fixes #7802

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:36:08 +00:00
Fabiano Fidêncio
aa2b51a831 Merge pull request #7783 from GabyCT/topic/makereport
metrics: Add metrics report script
2023-08-30 17:11:39 +02:00
Gabriela Cervantes
a7b59a5bf9 metrics: Add limit for 90 percentile for qemu value
This PR adds the limit for 90 percentile for qemu value for
FIO kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
99db6568e9 metrics: Add limit for write 90 percentile value for clh
This PR adds the limit for write 90 percentile value for clh for
FIO metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
6e06392c55 metrics: Enable FIO limits for kata metrics
This PR enables the FIO limits for kata metrics.

Fixes #7771

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
David Esparza
924d06a7f5 Merge pull request #7787 from GabyCT/topic/fixmemoryinsidelimit
metrics: Fix memory inside limits for kata metrics
2023-08-30 07:45:17 -06:00
Peng Tao
2e4c874726 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
If we are running FC hypervisor, it is not started when prestart hooks
are executed. So we should just ignore such error and just go ahead and
run the hooks.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 03:06:11 +00:00
Peng Tao
21204caf20 runtime: fail early when starting docker container with FC
FC does not support network device hotplug. Let's add a check to fail
early when starting containers created by docker.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Peng Tao
32fd013716 runtime: run prestart hooks before starting VM for FC
Add a new hypervisor capability to tell if it supports device hotplug.
If not, we should run prestart hooks before starting new VMs as nerdctl
is using the prestart hooks to set up netns. To make nerdctl + FC
to work, we need to run the prestart hooks before starting new VMs.

Fixes: #6384
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Beraldo Leal
00e7ffd988 tests: check vmx only on Intel machines
When running on amd machines, those tests will fail because there is no
vmx flag. Following other tests that checks for cpuType, let's adapt
them to restrict vmx only on Intel machines.

Fixes #7788.
Related #5066

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 20:04:31 -04:00
Gabriela Cervantes
c8dd3c0737 metrics: Fix memory footprint qemu limit
This PR fixes the memory footprint qemu limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 22:51:21 +00:00
Gabriela Cervantes
8877ec62fb metrics: Fix memory inside limits for kata metrics
This PR fixes the memory inside limit for clh for kata metrics due
to the recent changes that we had in the script which impacted
in the performance measurement.

Fixes #7786

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 21:38:18 +00:00
Beraldo Leal
80146f2078 tests: Fixes cpuType check on AMD machines
cpuType is not initialized yet. gets 0 (Intel) by default, failing on
AMD machines.

Fixes #7785

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 17:04:07 -04:00
Gabriela Cervantes
7e364716dd metrics: Add test setup details to metrics report
This PR adds test setup details to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:56:53 +00:00
Gabriela Cervantes
17dc1b9760 metrics: Add boot lifecycle times to metrics report
This PR adds the boot lifecycle times to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:55:44 +00:00
Gabriela Cervantes
3b0d6538f2 metrics: Add memory inside container to metrics report
This PR adds memory inside container to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:53:17 +00:00
Gabriela Cervantes
79fbb9d243 metrics: Add scaling system footprint in metrics report
This PR adds scaling system footprint in metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:51:27 +00:00
Gabriela Cervantes
8e6d4e6f3d metrics: Add metrics reportgen
This PR adds metrics reportgen for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:45:36 +00:00
Gabriela Cervantes
139ffd4f75 metrics: Add report file titles
This PR adds report file titles for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:43:06 +00:00
GabyCT
8f2dae7b53 Merge pull request #7775 from dborquez/fix_memory_usage_parsing_results
metrics: fix parsing issue on memory-usage test
2023-08-29 11:26:13 -06:00
Gabriela Cervantes
878d1a2e7d metrics: Generate PNGs alongside the PDF report
This PR generates the PNGs for the kata metrics PDF report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:50:32 +00:00
Gabriela Cervantes
fce2487971 metrics: Add metrics report R files
This PR adds the metrics report R files.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:45:22 +00:00
Gabriela Cervantes
08812074d1 metrics: Add report dockerfile
This PR adds the report dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:28:32 +00:00
Gabriela Cervantes
69781fc027 metrics: Add metrics report script
This PR adds metrics report script for kata metrics.

Fixes #7782

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:25:14 +00:00
Chao Wu
e4fb20c74a Merge pull request #7585 from lifupan/main
dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
2023-08-29 23:39:21 +08:00
Fabiano Fidêncio
50e51bcafe Merge pull request #7185 from UnmeshDeodhar/add-cc-sev-test
tests: Add confidential test
2023-08-29 15:32:25 +02:00
Fabiano Fidêncio
e286e842c1 tests: Expand confidential test to support TDX
Let's expand the confidential test to also support TDX.

The main difference on the test, though, is that we're not grepping for
a string in the `dmesg` output, but rather relying on `cpuid` to detect
a TDX guest.

Fixes: #7184

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-29 14:10:47 +02:00
Unmesh Deodhar
e31f099be1 tests: Expand confidential test to support SNP
Let's expand the confidential test to also support SNP.

Fixes: #7184

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-08-29 14:10:47 +02:00
Unmesh Deodhar
c3b9d4945e tests: Add confidential test for SEV
Add a test case for the launch of unencrypted confidential
container, verifying that we are running inside a TEE.

Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.

Fixes: #7184

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-29 14:10:34 +02:00
David Esparza
538c965c2b metrics: fix parsing issue on memory-usage test
This PR fixes an issues in the parsing results stage,
by collecting just the n-results from the n-running
containers, discarding irrelevant data.

Fixes: #7774

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-28 23:39:46 -06:00
Fabiano Fidêncio
708b0a3052 Merge pull request #7768 from fidencio/topic/update-tdx-to-the-6.2-kernel-based-stack
tdx: Update the components needed for using the 6.2 kernel stack
2023-08-28 19:27:15 +02:00
Fabiano Fidêncio
3818bf3311 local-build: Remove $HOME/.docker/buildx/activity/default
The file can be removed between builds without causing any issue, and
leaving it around has been causing us some headache due to:
```
ERROR: open /home/runner/.docker/buildx/activity/default: permission denied
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Fabiano Fidêncio
d1b54ede29 qemu: tdx: Workaround SMP issue with TDX 1.5
`...,sockets=1,cores=numvcpus,threads=1,...` must be used.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Archana Shinde
1e34220c41 qemu: tdx: Adapt to the TDX 1.5 stack
QEMU for TDX 1.5 makes use of private memory map/unmap.
Make changes to govmm to support this. Support for private backing fd
for memory is added as knob to the qemu config.

Userspace's map/unmap operations are done by fallocate() ioctl on the
backing store fd.
Reference:
https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/

Fixes: #7770

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Fabiano Fidêncio
8115a0522d versions: tdx: Update Kernel to 6.2 + TDX
This is the version that's been used and tested inside Intel, and it
matches with https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:34 +02:00
Fabiano Fidêncio
ec18180f34 versions: tdx: Update TDVF to the "edk2-stable202302"
This is the version that's been used and tested inside Intel, and it
matches with https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:34 +02:00
Fabiano Fidêncio
9803b24286 versions: tdx: Update QEMU to v7.2 + TDX v1.10
This is the version that's been used and tested inside Intel, and it
matches with https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:27 +02:00
Fabiano Fidêncio
02a08c956b Merge pull request #7754 from microsoft/danmihai1/pod-quota-deployment
tests: delete k8s deployment at the test's end
2023-08-27 17:52:00 +02:00
Fabiano Fidêncio
98037ced52 Merge pull request #7755 from microsoft/danmihai1/unique-test-name
tests: use unique test name
2023-08-27 17:27:40 +02:00
Zhongtao Hu
f0440a9cfe Merge pull request #7742 from frezcirno/fix-log-forwarder-loop
runtime-rs: check peer close in log_forwarder
2023-08-26 10:44:09 +08:00
Fabiano Fidêncio
16a610d788 Merge pull request #7758 from fidencio/topic/gha-avoid-fail-fast-till-everything-is-ultra-stable
gha: Avoid "fail-fast" in tests that are known to be flaky
2023-08-25 16:49:26 +02:00
Jiang Liu
91db888d83 Merge pull request #7602 from jiangliu/agent-storage
Refine storage device management for kata-agent
2023-08-25 22:20:18 +08:00
Zixuan Tan
dffc16e5b3 runtime-rs: check peer close in log_forwarder
The log_forwarder task does not check if the peer has closed, causing a
meaningless loop during the period of “kata vm exit”, when the peer
closed, and “ShutdownContainer RPC received” that aborts the log forwarder.

This patch fixes the problem.

Fixes: #7741

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2023-08-25 19:00:07 +08:00
Jiang Liu
aaa5ab1264 agent: simplify storage device by removing StorageDeviceObject
Simplify storage device implementation by removing StorageDeviceObject.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-25 17:23:16 +08:00
Fabiano Fidêncio
fb49d5d7ce gha: Avoid "fail-fast" in tests that are known to be flaky
Otherwise we'll have to re-run all the tests due to a flaky behaviour in
one of the parts.

Fixes: #7757

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-25 10:00:17 +02:00
Dan Mihai
183f51d6f6 tests: use unique test name
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.

Fixes: #7753

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:41:06 +00:00
Dan Mihai
6a974679f2 tests: delete k8s deployment at the test's end
At the end of k8s-kill-all-process-in-container.bats, delete the
deployment it created.

Fixes: #7752

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:34:37 +00:00
David Esparza
686eb3878b Merge pull request #7751 from GabyCT/topic/unusednhwc
metrics: Remove unused variable in tensorflow nhwc script
2023-08-24 18:34:06 -06:00
Fabiano Fidêncio
f1d8e1f513 Merge pull request #7747 from fidencio/topic/kata-deploy-dont-try-to-remove-opt-kata
kata-deploy: Don't try to remove /opt/kata
2023-08-24 18:56:52 +02:00
Gabriela Cervantes
32a778b6da metrics: Remove unused variable in tensorflow nhwc script
This PR removes unused variable in tensorflow nhwc script.

Fixes #7750

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-24 15:54:27 +00:00
David Esparza
875a85ee14 Merge pull request #7736 from GabyCT/topic/tensorflowfp32
metrics: Add TensorFlow ResNet50 FP32 benchmark
2023-08-24 08:56:24 -06:00
Fabiano Fidêncio
d8f3ce6497 kata-deploy: Don't try to remove /opt/kata
The directory is a host path mount and cannot be removed from within the
container.  What we actually want to remove is whatever is inside that
directory.

This may raise errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```

Fixes: #7746

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-24 13:57:36 +02:00
Jeremi Piotrowski
71c90b994a Merge pull request #7745 from jepio/vfio-part-0
gha: vfio: Run on Ubuntu 23.04 runner
2023-08-24 12:15:19 +02:00
Greg Kurz
9991772b26 Merge pull request #7718 from littlejawa/fix_filemode_when_zero
kata-agent: use default filemode for block device when it is set to 0
2023-08-24 11:40:28 +02:00
Jeremi Piotrowski
936e8091a7 gha: vfio: Run on Ubuntu 23.04 runner
The vfio test requires nested-nested virtualization:

L0 Azure host
-> L1 Ubuntu VM
  -> L2 Fedora VM
    -> L3 Kata

This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-24 10:10:02 +02:00
Jiang Liu
0e7248264d agent: move storage device related code into dedicated files
Move storage device related code into dedicated files.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:48:51 +08:00
Xuewei Niu
268e846558 runtime-rs: Fix volumes and rootfs cleanup issues
There are several processes for container exit:

- Non-detach mode: `Wait` request is sent by containerd, then
  `wait_process()` will be called eventually.
- Detach mode: `Wait` request is not sent, the `wait_process()` won’t be
  called.
    - Killed by ctr: For example, a container runs `tail -f /dev/null`, and
      is killed by `sudo ctr t kill -a -s SIGTERM <CID>`. Kill request is
      sent, then `kill_process()` will be called. User executes `sudo ctr c
      rm <CID>`, `Delete` request is sent, then `delete_process()` will be
      called.
    - Exited on its own: For example, a container runs `sleep 1s`. The
      container’s state goes to `Stopped` after 1 second. User executes
      the delete command as below.

Where do we do container cleanup things?

- `wait_process()`: No, because it won’t be called in detach mode.
- `delete_process()`: No, because it depends on when the user executes the
  delete command.
- `run_io_wait()`: Yes. A container is considered exited once its IO ended.
  And this always be called once a container is launched.

Fixes: #7713

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-24 13:23:47 +08:00
Jiang Liu
8f49ee33b2 agent: refine storage related code a bit
Refine storage related code by:
- remove the STORAGE_HANDLER_LIST
- define type alias
- move code near to its caller

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:10 +08:00
Jiang Liu
60ca12ccb0 agent: switch to new storage subsystem
Switch to new storage subsystem to create a StorageDevice for each
storage object.

Fixes: #7614

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:09 +08:00
Jiang Liu
fcbda0b419 kata-types: introduce StorageDevice and StorageHandlerManager
Introduce StorageDevice and StorageHandlerManager, which will be used
to refine storage device management for kata-agent.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:08:55 +08:00
Jiang Liu
b03b1f6134 agent: simplify the way to manage storage object
Simplify the way to manage storage objects, and introduce
StorageStateCommon structures for coming extensions.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:58:24 +08:00
Jiang Liu
8392c71bf2 sys-util: support more mount flags in parse_mount_options()
Support more mount flags in parse_mount_options().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:39 +08:00
Jiang Liu
c00d8f3d48 agent: use create_mount_destination() from kata-sys-util
Use create_mount_destination() from kata-sys-util crate to reduce
redundant code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:38 +08:00
Jiang Liu
5e867f0538 types: add more mount related constants
Add more mount related constants.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:36 +08:00
Jiang Liu
880e6c9a76 agent: use function from kata-sys-utils to reduce code
Use function get_linux_mount_info() from kata-sys-util crate to share
common code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:34 +08:00
QuanweiZhou
a6921dd837 Merge pull request #7698 from jiangliu/virtual-volume
kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
2023-08-24 11:50:39 +08:00
Fabiano Fidêncio
7705c5962e Merge pull request #7728 from ManaSugi/fix/typo-test-toml
libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
2023-08-23 23:55:41 +02:00
GabyCT
c1712e1930 Merge pull request #7737 from jepio/fix-local-build
local-build: Remove GID before creating group
2023-08-23 12:26:39 -06:00
Jeremi Piotrowski
3b881fbc0e local-build: Remove GID before creating group
docker install now creates a group with gid 999 which happens to match what we
need to get docker-in-docker to work. Remove the group first as we don't need
it.

Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-23 18:58:38 +02:00
David Esparza
ebce5d25a9 Merge pull request #7734 from fidencio/topic/kata-deploy-fix-removal
kata-deploy: Avoid failing on content removal
2023-08-23 10:29:57 -06:00
Gabriela Cervantes
959ca49447 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
This PR adds the TensorFlow ResNet50 fp32 Dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:24:58 +00:00
Gabriela Cervantes
4b7d72c4a8 metrics: Add TensorFlow ResNet50 FP32 benchmark
This PR adds TensorFlow ResNet50 FP32 benchmark for kata metrics.

Fixes #7735

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:21:09 +00:00
Fabiano Fidêncio
e7e4cc2182 Merge pull request #7716 from bergwolf/github/image-initrd-assets
runtime: fix image and initrd assets handling
2023-08-23 18:02:15 +02:00
Fabiano Fidêncio
5cba38c175 kata-deploy: Avoid failing on content removal
We can simply use `rm -f` all over the place and avoid the container
returning any error.

Fixes: #7733

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-23 16:49:26 +02:00
Peng Tao
18d42da21e runtime/fc: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Make sure annotations are preferred over config options in image and initrd
path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
9fda7059a5 runtime/clh: fix image/initrd annotation handling
We should make sure annotations are preferred over
config options in image and initrd path handling.

Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
1a0092d631 runtime/qemu: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:27 +00:00
Manabu Sugimoto
22d8f335d6 libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
Change `pdisable_guest_seccomp` to `disable_guest_seccomp`

Fixes: #7727

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-23 12:08:18 +09:00
GabyCT
b8990c0490 Merge pull request #7722 from GabyCT/topic/adddiskreadme
metrics: Add disk link to README
2023-08-22 12:29:54 -06:00
GabyCT
514d3d42b8 Merge pull request #7712 from GabyCT/topic/fixfiopath
metrics: Fix FIO path
2023-08-22 12:28:28 -06:00
Gabriela Cervantes
8afd158cef metrics: Add disk link to README
This PR adds disk link to README documentation for kata metrics.

Fixes #7721

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-22 16:20:31 +00:00
Julien Ropé
40914b25d4 kata-agent: use default filemode for block device when it is set to 0
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This behaviour is seen from cri-o typically.

Note: this is what runc is doing, which is why regular containers don't have an
issue. This change makes sure kata behaves the same as runc.

Fixes: #7717

Signed-off-by: Julien Ropé <jrope@redhat.com>
2023-08-22 16:08:14 +02:00
Fabiano Fidêncio
8032797418 Merge pull request #7708 from microsoft/danmihai1/kata-deploy-log
gha: capture additional kata-deploy output
2023-08-21 23:43:51 +02:00
David Esparza
d2c130ea69 Merge pull request #7710 from GabyCT/topic/fixpytorch1
metrics: Use function from metrics common in pytorch script
2023-08-21 15:31:24 -06:00
Gabriela Cervantes
eee2ee6eeb metrics: Fix FIO path
This PR fixes the FIO path for the FIO files.

Fixes #7711

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 21:06:04 +00:00
David Esparza
9347051592 Merge pull request #7666 from dborquez/metrics_improve_fio_test
metrics: Enable kata runtime in K8s for FIO test.
2023-08-21 13:51:57 -06:00
Gabriela Cervantes
39bc3488f5 metrics: Use function from metrics common in pytorch script
This PR uses a common function into the pytorch script.

Fixes #7709

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 16:12:35 +00:00
Dan Mihai
400eb88743 gha: capture additional kata-deploy output
10 lines can be insufficient for diagnostics.

Fixes: #7707

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-21 15:58:57 +00:00
GabyCT
700759232f Merge pull request #7690 from GabyCT/topic/fixpytorch
metrics: Fix README for pytorch
2023-08-21 09:50:14 -06:00
Jiang Liu
6e038e66e4 Merge pull request #7680 from GabyCT/topic/removetime
metrics: Remove unused variable in tensorflow mobilenet script
2023-08-21 23:39:07 +08:00
Jiang Liu
4aee3eade0 kata-types: implement serde methods for KataVirtualVolume
Implement serilization/deserialization methods for KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:46:56 +08:00
Jiang Liu
b875e39323 kata-types: validate KataVirtualVolume object
Implement method validate() for KataVirtualVolume to validate message
format.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:42:07 +08:00
Jiang Liu
fa2fdc1057 kata-types: implement two conversion helpers for KataVirtualVolume
Enable conversions from NydusExtraOptions/DirectVolumeMountInfo to
KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:35:26 +08:00
Jiang Liu
6326af20e3 kata-types: introduce KataVirtualVolume
Introduce structure KataVirtualVolume to to encapsulate information
for extra mount options and direct volumes, so we could build a common
infrastructure to handle these cases.

Fixes: #7699

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:19:47 +08:00
Gabriela Cervantes
c8b43f8b3e metrics: Fix README for pytorch
This PR fixes the pytorch reference in the README file.

Fixes #7689

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-18 20:14:49 +00:00
Aurélien
fa34d61805 Merge pull request #7664 from microsoft/danmihai1/agent-init-policy
rootfs: agent: Policy support with AGENT_INIT=yes
2023-08-18 10:51:55 -07:00
Fabiano Fidêncio
7e66d1f6b5 Merge pull request #7649 from fidencio/topic/k8s-tests-remove-kata-deploy-tests
gha: k8s: kata-deploy: Move kata-deploy specific tests from integration/kubernetes to functional/kata-deploy
2023-08-18 07:47:26 +02:00
David Esparza
fb571f8be9 metrics: Enable kata runtime in K8s for FIO test.
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.

This PR also enables FIO metrics test in the kata metrics-ci.

Fixes: #7665

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-17 17:11:27 -06:00
Dan Mihai
cb056f8cb3 rootfs: agent: Policy support with AGENT_INIT=yes
When building with AGENT_POLICY=yes and AGENT_INIT=yes:
1. Include OPA and the Policy settings in rootfs.
2. Start OPA from the kata agent.

Before these changes, building with both AGENT_POLICY=yes and
AGENT_INIT=yes was unsupported.

Starting OPA from systemd (when AGENT_INIT=no) was already supported.

Fixes: #7615

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-17 22:37:58 +00:00
GabyCT
c358056a3f Merge pull request #7685 from GabyCT/topic/changename
metrics: Fix check results for tensorflow benchmark
2023-08-17 15:39:43 -06:00
Gabriela Cervantes
85c02828e1 metrics: Update tensorflow name in gha run script
This PR update tensorflow name in gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 20:17:48 +00:00
Gabriela Cervantes
e8a5119343 metrics: Fix check results for tensorflow benchmark
This PR fixes the check results for tensorflow benchmark now
that we change the name of the test.

Fixes #7684

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 19:52:45 +00:00
Fabiano Fidêncio
2d896ad12f gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
Fabiano Fidêncio
4ffc2c86f3 gha: kata-deploy: Add the first kata-deploy test
This test, at least for now, only checks whether the runtimeclasses
have been properly created.

This is just a migration from a test we had as part of the k8s suite.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
GabyCT
4ba684e6e4 Merge pull request #7653 from GabyCT/topic/tensorflowfp32
metrics: Add Tensorflow ResNet50 int8 benchmark
2023-08-17 10:44:25 -06:00
Gabriela Cervantes
8616c050ae metrics: Remove unused variable in tensorflow mobilenet script
This PR removes unused variable in tensorflow mobilenet script.

Fixes #7679

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 16:04:18 +00:00
Fabiano Fidêncio
285e616b5e tests: common: Ensure test_type is used as part of the cluster's name
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 14:22:16 +02:00
Fabiano Fidêncio
790bd3548d tests: commob: Don't fail if yq is not part of the cache
This may happen on external runners.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 14:22:14 +02:00
Fabiano Fidêncio
ce6adecd0a gha: kata-deploy: Add run-kata-deploy-tests.sh
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.

Right now it doesn't have any tests, and the command to actually run the
tests is commented out, but right now this is just a placeholder that
will be populated sooner than later.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 09:49:03 +02:00
Fabiano Fidêncio
cfc29c11a3 gha: k8s: Stop running kata-deploy tests as part of the k8s suite
In a follow-up series, we'll add a whole suite for the kata-deploy
tests.  With this in mind, let's already get rid of this one and avoid
more kata-deploy tests to land here.

Fixes: #7642

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 09:48:54 +02:00
Fabiano Fidêncio
e470a650e0 Merge pull request #7654 from sprt/ci-fixes
kata-deploy: Properly create default runtime class
2023-08-17 09:43:34 +02:00
Wedson Almeida Filho
962378606e Merge pull request #7627 from wedsonaf/error-conv
agent: simplify error handling
2023-08-16 21:02:38 -03:00
Aurélien Bombo
f4dd152863 tests: k8s: Call ensure_yq() in setup.sh
It wasn't the `common.bash` import in `run_kubernetes_tests.sh` causing
the yq error so let's try this instead.

Reference: https://github.com/kata-containers/kata-containers/actions/runs/5674941359/job/15379797568#step:10:341

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-08-16 14:13:56 -07:00
GabyCT
3d0cfc88c9 Merge pull request #7662 from GabyCT/topic/fixhelptensorflow
metrics: Fix MobileNet help me description
2023-08-16 14:13:39 -06:00
Aurélien Bombo
339569b69c kata-deploy: Properly create default runtime class
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.

Fixes: #7663

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-08-16 11:04:44 -07:00
Gabriela Cervantes
2a491e9b1f metrics: Fix MobileNet help me description
This PR fixes MobileNet help me description in the
tensorflow script.

Fixes #7661

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-16 15:25:39 +00:00
Fabiano Fidêncio
606e419fac Merge pull request #7660 from fidencio/topic/add-kata-deploy-tests-as-part-of-the-ci
gha: ci: Start running kata-deploy tests
2023-08-16 16:44:08 +02:00
Fabiano Fidêncio
d19a75e80c gha: ci: Start running kata-deploy tests
Let's add the tests as part of the ci.yaml, so they an be triggered as
part of each PR.

For this PR those tests won't be triggered, courtesy to the
`pull_request_target` event we rely on.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-16 16:08:05 +02:00
Fabiano Fidêncio
4adcf2192e Merge pull request #7651 from ManaSugi/runk/containerd-test
runk: Modify kill command's error message for containerd tests
2023-08-16 15:37:48 +02:00
Zhongtao Hu
5c8a61a4c8 Merge pull request #7558 from openanolis/fix/driver_option
runtime-rs: add driver option
2023-08-16 13:56:29 +08:00
Zhongtao Hu
d90f7ac689 runtime-rs: add unit test for block driver
add unit test for block driver

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:45:27 +08:00
Zhongtao Hu
e44919f0da runtime-rs: add load_test_config for unit test
add load_test_config for unit test

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:56 +08:00
Zhongtao Hu
7f48a69379 runtime-rs: add driver option
add driver option when handle linux devices

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:49 +08:00
Gabriela Cervantes
bade6a5c3b docs: Fix TensorFlow word across the document
This PR fixes the TensorFlow word across the document to have uniformity
across all the document.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 20:13:05 +00:00
Fabiano Fidêncio
0bc48eab60 Merge pull request #7640 from fidencio/topic/gha-cri-containerd-enable-tests
gha: cri-containerd: Enable tests
2023-08-15 21:18:28 +02:00
Gabriela Cervantes
1a1b207760 docs: Add Tensorflow Resnet50 documentation
This PR adds the Tensorflow Resnet50 documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:46:44 +00:00
Gabriela Cervantes
24baededc0 metrics: Add Dockerfile for ResNet50 int8
This PR adds the dockerfile for ResNet50 int8 benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:38:26 +00:00
Gabriela Cervantes
6d971ba8df metrics: Add Tensorflow ResNet50 int8 benchmark
This PR adds the Tensorflow ResNet50 int8 script for kata metrics.

Fixes #7652

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:30:22 +00:00
Manabu Sugimoto
25d151bd1b runk: Modify kill command's error message for containerd tests
The error message when the kill command is executed with the container's
state == Stopped should be "container not running" because the containerd
tests expect that OCI runtimes return the error message and compare it.
If the error message is different from the expected one, the tests fail.

Fixes: #7650

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-16 00:39:50 +09:00
GabyCT
0bbabeaaf8 Merge pull request #7644 from GabyCT/topic/renametensorflow
metrics: Rename tensorflow scripts
2023-08-15 09:23:24 -06:00
Fabiano Fidêncio
46d25d908d Merge pull request #7643 from fidencio/topic/add-functional-kata-deploy-tests
gha: tests: Add kata-deploy functional tests -- Part 1
2023-08-15 15:23:48 +02:00
Fabiano Fidêncio
b3592ab25c gha: cri-containerd: Enable tests
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:32:42 +02:00
Fabiano Fidêncio
84dd02e0f9 gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
As part of the runners, we're hitting a timeout that I cannot reproduce,
at all, when allocating the same instance and running the tests
manually.

The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.

It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
b29782984a gha: cri-containerd: Show pod before deleting it
It'll help us to debug failures with the pod stop / pod delete.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
ae0930824a gha: cri-containerd: Print kata logs in case of error
We need this to fully understand what are the issues we're facing.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
6c8b2ffa60 gha: cri-containerd: Group containerd logs
This improves readability in case of failures by a lot.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
9e898701f5 gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
Short commit log says it all.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Wedson Almeida Filho
76dac8f22c agent: simplify error handling
We extend the `Result` and `Option` types with associated types that
allows converting a `Result<T, E>` and `Option<T>` into
`ttrpc::Result<T>`.

This allows the elimination of many `match` statements in favor of
calling the map function plus the `?` operator. This transformation
simplifies the code.

Fixes: #7624

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-15 06:55:27 -03:00
Fabiano Fidêncio
e107d1d94e Merge pull request #7574 from microsoft/danmihai1/policy
agent: runtime: add Agent Policy feature
2023-08-15 11:29:13 +02:00
Bin Liu
ea81eb6c2e Merge pull request #7169 from chethanah/runk/support-no-pid-ns
runk: Support without pid ns
2023-08-15 13:00:40 +08:00
Gabriela Cervantes
18a7fd8e4e metrics: Rename tensorflow scripts
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.

Fixes #7645

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-14 20:40:35 +00:00
GabyCT
a740c80251 Merge pull request #7626 from GabyCT/topic/cassandrak
metrics: Add Cassandra Kubernetes benchmark for kata metrics
2023-08-14 14:22:52 -06:00
GabyCT
4e5e39e8b3 Merge pull request #7618 from GabyCT/topic/addfunctionscommon
metrics: Add common functions to the common script
2023-08-14 14:22:30 -06:00
GabyCT
a19d471c01 Merge pull request #7629 from dborquez/metrics_improve_stopping_kata_components
metrics: fix the loop used to stop kata components
2023-08-14 14:22:06 -06:00
Fabiano Fidêncio
e55fa93db9 tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.

Fixes: #7642

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 21:38:00 +02:00
Fabiano Fidêncio
d9ee17aaec tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 21:37:52 +02:00
Chelsea Mafrica
22465d22f0 Merge pull request #7638 from ManaSugi/fix/virtcontainers-doc
docs: Remove installation step in virtcontainers doc
2023-08-14 10:21:57 -07:00
Dan Mihai
ab829d1038 agent: runtime: add the Agent Policy feature
Fixes: #7573

To enable this feature, build your rootfs using AGENT_POLICY=yes. The
default is AGENT_POLICY=no.

Building rootfs using AGENT_POLICY=yes has the following effects:

1. The kata-opa service gets included in the Guest image.

2. The agent gets built using AGENT_POLICY=yes.

After this patch, the shim calls SetPolicy if and only if a Policy
annotation is attached to the sandbox/pod. When creating a sandbox/pod
that doesn't have an attached Policy annotation:

1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses
   the default agent settings, that might include a default Policy too.

2. If the agent was built using AGENT_POLICY=no, the new sandbox is
   executed the same way as before this patch.

Any SetPolicy calls from the shim to the agent fail if the agent was
built using AGENT_POLICY=no.

If the agent was built using AGENT_POLICY=yes:

1. The agent reads the contents of a default policy file during sandbox
   start-up.

2. The agent then connects to the OPA service on localhost and sends
   the default policy to OPA.

3. If the shim calls SetPolicy:

   a. The agent checks if SetPolicy is allowed by the current
      policy (the current policy is typically the default policy
      mentioned above).

   b. If SetPolicy is allowed, the agent deletes the current policy
      from OPA and replaces it with the new policy it received from
      the shim.

   A typical new policy from the shim doesn't allow any future SetPolicy
   calls.

4. For every agent rpc API call, the agent asks OPA if that call
   should be allowed. OPA allows or not a call based on the current
   policy, the name of the agent API, and the API call's inputs. The
   agent rejects any calls that are rejected by OPA.

When building using AGENT_POLICY_DEBUG=yes, additional Policy logging
gets enabled in the agent. In particular, information about the inputs
for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM.
These inputs can be useful for investigating API calls that might have
been rejected by the Policy. Examples:

1. Load a failing policy file test1.rego on a different machine:

opa run --server --addr 127.0.0.1:8181 test1.rego

2. Collect the API inputs from Guest's /tmp/policy.txt and test on the
   machine where the failing policy has been loaded:

curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \
--data-binary @test1-inputs.json

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-14 17:07:35 +00:00
Fabiano Fidêncio
831e73ff91 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 17:46:10 +02:00
Fabiano Fidêncio
af1b46bbf2 tests: Add gha-run-k8s-common.sh
Let's split a good portion of `tests/integration/kuberentes/gha-run.sh`
out, and put them in a place where they can be used to the soon-to-come
kata-deploy specific tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 17:45:58 +02:00
Jeremi Piotrowski
a57e7ffe14 Merge pull request #7211 from stevenhorsman/propogate-secrets
Propogate secrets, config maps etc into guest if sharedFS not available
2023-08-14 11:24:47 +02:00
Manabu Sugimoto
416445e7eb docs: Remove installation step in virtcontainers doc
Remove the installation step in the virtcontainers doc
because the virtcontainers install/uninstall targets have
been removed by 86723b51ae
and they are not used anymore.

Fixes: #7637

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-14 15:15:24 +09:00
Fabiano Fidêncio
b975c27793 Merge pull request #7547 from stevefan1999-personal/patch-k0s
kata-deploy: Preliminary k0s support
2023-08-12 14:28:13 +02:00
Fabiano Fidêncio
6ed57d1e9a Merge pull request #7447 from fidencio/topic/gha-move-static-jenkins-to-azure-instances
gha: static-checks: Move to the Azure instances
2023-08-12 13:31:54 +02:00
Steve Fan
72cbcf040b kata-deploy: Add k0s support
Add k0s support to kata-deploy, in the very same way kata-containers
already supports k3s, and rke2.

k0s support requires v1.27.1, which is noted as part of the kata-deploy
documentation, as it's the way to use dynamic configuration on
containerd CRI runtimes.

This support will only be part of the `main` branch, as it's not a bug
fix that can be backported to the `stable-3.2` branch, and this is also
noted as part of the documentation.

Fixes: #7548
Signed-off-by: Steve Fan <29133953+stevefan1999-personal@users.noreply.github.com>
2023-08-11 21:17:23 +02:00
David Esparza
767434d50a metrics: fix the loop used to stop kata components #7629
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.

Fixes: #7628

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-11 12:32:41 -06:00
Gabriela Cervantes
5d0f0d43c7 metrics: Add cassandra statefulset yaml
This PR adds cassandra statefulset yaml for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:39 +00:00
Gabriela Cervantes
c1dcc1396f metrics: Add cassandra service yaml
This PR adds the cassandra service yaml for the benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:36 +00:00
Gabriela Cervantes
2297a0d1c5 metrics: Add block loop pvc yaml for cassandra
This PR adds block loop pvc yaml for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:33 +00:00
Gabriela Cervantes
e3d511946f metrics: Add block loop pv yaml for cassandra test
This PR adds the block loop pv yaml for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:29 +00:00
Gabriela Cervantes
9890271594 metrics: Add block loop pvc for cassandra test
This PR adds the block loop pvc for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:19 +00:00
Gabriela Cervantes
349b89969a metrics: Add Cassandra Kubernetes benchmark for kata metrics
This PR adds Cassandra Kubernetes benchmark for kata metrics tests.

Fixes #7625

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:21:48 +00:00
Fabiano Fidêncio
c52d090522 gha: static-checks: Move to the Azure instances
The GHA runners are not exactly powerful, which makes the static-checks
take way too long (almost an hour).

Let's give a try and move those to the same size of Azure instances used
as part of our CI, and probably have this time reduced.

Fixes: #7446

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-11 18:47:47 +02:00
stevenhorsman
8815ed0665 runtime: Remove config warnings
Remove configuration file shared_fs = none warnings
now that there is a solution to updating configMaps, secrets etc

Fixes: #7210
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-08-11 16:31:08 +01:00
Yohei Ueda
afe1a6ac5a agent: support copying of directories and symlinks
This patch allows copying of directories and symlinks when
static file copying is used between host and guest. This change is
necessary to support recursive file copying between shim and agent.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(cherry picked from commit de232b8030)
2023-08-11 16:31:08 +01:00
Pradipta Banerjee
ab13ef87ee runtime: propagate configmap/secrets etc changes for remote-hyp
For remote hypervisor, the configmap, secrets, downward-api or project-volumes are
copied from host to guest. This patch watches for changes to the host files
and copies the changes to the guest.

Note that configmap updates takes significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.

Fixes: #7210

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
(cherry picked from commit 3081cd5f8e)
(cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)
2023-08-11 16:31:08 +01:00
Yohei Ueda
c074ec4df1 runtime: Copy shared files recursively
This patch enables recursive file copying
when filesystem sharing is not used.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 5422a056f2)
(cherry picked from commit 16055ce040bbd724be2916bc518d89b69c9e0ca5)

Fixes: #7210
2023-08-11 16:16:52 +01:00
Peng Tao
a39fd6c066 Merge pull request #7611 from ManaSugi/fix/fc-version
versions: Update firecracker version to 1.4.0
2023-08-11 16:43:37 +08:00
Chao Wu
7031b5db07 Merge pull request #7535 from ManaSugi/fix/allow-redundant-clone
agent: Allow clippy::redundant_clone in the unit tests
2023-08-11 14:17:56 +08:00
Gabriela Cervantes
fdcd52ff78 metrics: Add check containers are running in tensorflow mobilenet
This PR adds check containers are running in tensorflow mobilenet
that is being defined in common script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:17:20 +00:00
Gabriela Cervantes
36337ee146 metrics: Add check containers are up in tensorflow script
This PR adds the check containers are up function from common
in tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:15:18 +00:00
Gabriela Cervantes
f700f9b0ba metrics: Remove unused variable in tensorflow script
This PR removes an unused variable in tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:13:37 +00:00
Gabriela Cervantes
833cf7a684 metrics: Add check containers are running function
This PR adds the check containers are running function the common metrics
script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:12:22 +00:00
Gabriela Cervantes
918c783084 metrics: Add check containers are up in tensorflow mobilenet script
This PR adds the check containers are up in the common script
in the tensorflow mobilenet script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:06:40 +00:00
Gabriela Cervantes
9d57a1fab4 metrics: Use check containers are up in tensorflow script
This PR uses the check containers are up from the common script
in the tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:42:09 +00:00
Gabriela Cervantes
1c84680d8c metrics: Add check containers are up in common script
This PR adds check containers are up in common script for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:39:24 +00:00
Gabriela Cervantes
d3e57cf454 metrics: Use collect_results function in tensorflow mobilenet test
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:34:30 +00:00
Gabriela Cervantes
286de046af metrics: Remove collect results function definition
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:31:23 +00:00
Gabriela Cervantes
9879709aae metrics: Add common functions to the common script
This PR adds the collect results function to the common metrics
script.

Fixes #7617

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:27:11 +00:00
Fabiano Fidêncio
a89c9cd620 Merge pull request #7557 from wedsonaf/no-new-vecs
agent: avoid creating new `Vec` instances when easily avoidable
2023-08-10 18:43:46 +02:00
Manabu Sugimoto
4746fa3daa docs: Specify supported Firecracker version using versions.yaml
Specify the supported version of Firecracker using our `versions.yaml`
to improve the maintainability of the documentation.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:49:45 +09:00
Manabu Sugimoto
cc922be5ec versions: Update firecracker version to 1.4.0
This patch upgrades Firecracker version from v1.1.0 to v1.4.0.

* Generate swagger models for v1.4.0 (from `firecracker.yaml`)
  - The version of go-swagger used is v0.30.0
* The firecracker v1.4.0 includes the following changes.
  - Added
    * Added support for custom CPU templates allowing users to adjust vCPU features
    exposed to the guest via CPUID, MSRs and ARM registers.
    * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU
    as Neoverse N1.
    * Added support for the virtio-rng entropy device. The device is optional. A
    single device can be enabled per VM using the /entropy endpoint.
    * Added a cpu-template-helper tool for assisting with creating and managing
    custom CPU templates.
  - Changed
    * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit
    (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process.
  - Fixed
    * Fixed feature flags in T2S CPU template on Intel Ice Lake.
    * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host.
    * Fixed a performance regression in the jailer logic for closing open file
    descriptors.
    * A race condition that has been identified between the API thread and the VMM
    thread due to a misconfiguration of the api_event_fd.
    * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host.
    * Fixed passing through cache information from host in CPUID leaf 0x80000006.
    * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES
    MSR to 1 in accordance with an Intel microcode update.
    * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the
    IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode
    update.
    * Fixed passing through cache information from host in CPUID leaf 0x80000005.
    * Fixed the T2A CPU template to disable SVM (nested virtualization).
    * Fixed the T2A CPU template to set EferLmsleUnsupported bit
    (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:48:13 +09:00
Fupan Li
39e67b06e9 dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
Since the passed fd through unix socket would be any
stream fd such as pipe/fifo fd or any other socket
fd, thus we should deal with it as a normal hybrid
stream instead of a unix stream.

Fixes:#7584

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-08-10 11:07:10 +08:00
David Esparza
7bf994827d Merge pull request #7609 from dborquez/tensorflow_check_completion
metrics: compute tensorflow statistics
2023-08-09 18:47:47 -06:00
David Esparza
dcdb3b067f Merge pull request #7606 from GabyCT/topic/nginx
metrics: Add network nginx benchmark
2023-08-09 16:14:13 -06:00
David Esparza
2defdcc598 Merge pull request #7579 from dborquez/simplify_gha_metrics_workflow
metrics: install kata once and run multiple checks
2023-08-09 14:45:09 -06:00
David Esparza
473b0d3a31 metrics: compute tensorflow statistics
This PR computes average results for TF bench.
Additionally, it improves the data parsing from
all running containers.

Fixes: #7603

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-09 14:42:30 -06:00
Fabiano Fidêncio
0a8208c670 Merge pull request #7608 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-3
ci: unencrypted-image: Fix build context
2023-08-09 21:00:46 +02:00
Fabiano Fidêncio
03d1fa67b1 ci: unencrypted-image: Fix build context
The build context should be the folder where the Dockerfile is present,
otherwise the files copied into the image won't be found.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 20:32:36 +02:00
Fabiano Fidêncio
eb463b38ec ci: unencrypted-image: Don't fail to build on s390x
Let's make sure that we don't fail in case we're building non x86_64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 20:32:36 +02:00
Fabiano Fidêncio
ebc86091d1 Merge pull request #7607 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-2
ci: create-confidential-image: Add dependent actions
2023-08-09 19:53:49 +02:00
Fabiano Fidêncio
a2d731ad26 ci: create-confidential-image: Add dependent actions
Following the example on https://github.com/docker/build-push-action,
it's clear that the actions to "Set up QEMU" and "Set up Docker Buildx"
are missing.

Let's add them, and also take the advantage to bump the
build-push-action to its v4, which, by the way, had a typo on its name
(build-and-push-action does **NOT** exist, build-push-action does).

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 18:36:51 +02:00
Gabriela Cervantes
d1a6296221 metrics: Add nginx documentation to network README
This PR adds nginx documentation to network README for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:17:46 +00:00
Gabriela Cervantes
498f7c0549 metrics: Add nginx kubernetes yaml
This PR adds the nginx kubernetes yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:14:04 +00:00
Gabriela Cervantes
f8a5255cf7 metrics: Add network nginx benchmark
This PR adds the network nginx benchmark for kata metrics.

Fixes #7605

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:12:21 +00:00
Fabiano Fidêncio
86f705d98b Merge pull request #7604 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-1
Follow up fixes for https://github.com/kata-containers/kata-containers/pull/7596
2023-08-09 18:05:46 +02:00
Fabiano Fidêncio
43fe5d1b90 ci: k8s: tees: Ensure PR_NUMBER is exported
Right now this is not being used, but it'll as the image generated for
the confidential tests have that as part of their tag.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 17:45:42 +02:00
Fabiano Fidêncio
54f6a78500 ci: {{ pr-number }} should be {{ inputs.pr-number }}
One of the joys to rely on the `pull_request_target` is to only be able
to catch those after those are merged.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 17:41:07 +02:00
Fabiano Fidêncio
5cdf981a2b Merge pull request #7596 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests
tests: Create image that will be used in the unencrypted confidential tests
2023-08-09 17:06:07 +02:00
Fabiano Fidêncio
c932369f42 Merge pull request #7492 from fidencio/topic/adapt-tests-to-the-new-kata-deploy-env-vars
kata-deploy: Ensure we cover SHIMS / DEFAULT_SHIM as part of our tests
2023-08-09 12:55:03 +02:00
Fabiano Fidêncio
034d7aab87 tests: k8s: Ensure the runtime classes are properly created
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.

Fixes: #7491

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:46:04 +02:00
Fabiano Fidêncio
fac8ccf5cd ci: Add build-and-publish-tee-confidential-unencrypted-image
This will be done before running TEE tests, and it's a hard dependency
fr them.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:36:10 +02:00
Fabiano Fidêncio
ab5f603ffa ci: k8s: Add the image used for unencrypted confidential tests
Let's add here the image we'll be using for unencrypted confidential
tests.  Later on, we'll make sure to build and use this image as part of
our CI.

The image can easily be built as a multi-arch image, and has `cpuid`
installed in case of `x86_64` build, so it can be used to detect whether
we're running on a TEE guest without having to rely on `dmesg | grep
...`.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:33:18 +02:00
Fabiano Fidêncio
36d53dd2af Merge pull request #7598 from UnmeshDeodhar/upgrade-bats-version
tests: upgrade bats version
2023-08-09 11:18:56 +02:00
Fabiano Fidêncio
1e8fe131bd k8s: tests: Take advantage of SHIMS and DEFAULT_SHIM env vars
We don't have to do any sed to replace the runtimeclass being used by
the moment we start taking advantage of the `DEFAULT_SHIM` environment
variable exposed merged in the previous commits.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:15:34 +02:00
Wedson Almeida Filho
729b2dd611 agent: avoid creating new Vec instances when easily avoidable
There are many places where the code currently creates new `Vec`
instances when it's not really needed. The result is a perf hit because
it allocates memory, copies all elements, then frees the memory; in some
cases, copying elements also involves extra allocations (e.g., when
elements are strings, or structs containing strings).

This patch addresses a number of these cases.

Fixes: #7203

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-09 02:38:36 -03:00
Jiang Liu
311671abb5 Merge pull request #7552 from jiangliu/agent-r1
Fix mimor bugs and improve coding stype of agent rpc/sandbox/mount
2023-08-09 13:19:02 +08:00
Unmesh Deodhar
aeaec9dae9 tests: upgrade bats version
Instead of using package manager to install bats, building
this from source. This gives us the updated version of bats
which supports functions such as setup_file and
teardown_file.
We can use these functions into our current tests.

Fixes: #7597

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-08-08 18:16:39 -05:00
David Esparza
e664969862 metrics: install kata once and run multiple checks
This PR changes the metrics workflow in order to just install
kata once, and run the checks for multiple hypervisor variations.

In this way we save time avoiding installing kata for each
hypervisor to be tested.

Fixes: #7578

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-08 10:25:13 -06:00
Jiang Liu
baabfa9f1f agent: refine implementation of mount related code
Refine implementation of mount by:
- log message with `path.display()` instead of `{:?}`
- add prefix "_" to unused variables
- pass by reference instead of by value to avoid creating redundant
  array
- exactly matching prefix "fsgid=" instead of "fsgid"
- avoid redundant clone() operations

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:03 +08:00
Jiang Liu
98ba211a34 agent: fix a bug in update_ephemeral_mounts()
There's a bug in function update_ephemeral_mounts() which only handles
the first storage object and ignores all other storage objects.

Fixes: #7551

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:02 +08:00
Jiang Liu
5333618d70 agent: make add_storage() take &[Storage] instead of Vec<Storage>
Simplify add_storage() by taking &[Storage] instead of Vec<Storage>.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:01 +08:00
Jiang Liu
37f34781d1 agent: simplify function online_cpu_memory()
Simplify function online_cpu_memory() by on calling update_cpuset_path()
for containers with cpuset configured.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:00 +08:00
Jiang Liu
d3c5422379 agent: refine style of code related to sandbox
Refine style of code related to sandbox by:
- remove unnecessary comments for caller to take lock, we have already taken
  `&mut self`.
- change "*count < 1 " to "*count == 0", `count` is type of u32.
- make remove_sandbox_storage() to take `&mut self` instead of `&self`.
- group related function to each others
- avoid search the map twice in function find_process()
- avoid unwrap() in function run_oom_event_monitor()
- avoid unwrap() in online_resources()

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:59 +08:00
Jiang Liu
71a9f67781 agent: avoid unwrap() in function do_remove_container()
Avoid unwrap() in function do_remove_container(), and also make
implmementation symmetric for both timeout and non-timeout cases.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:58 +08:00
Jiang Liu
84badd89d7 agent: avoid clone objects when possible
Optimize agent rpc implementation by:
- avoid clone objects when possible
- avoid unwrap() when possible
- explictly drop object to ensure order

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:56 +08:00
Chao Wu
b098960442 Merge pull request #7581 from justxuewei/bump-versions
deps: Bump dependent crate versions
2023-08-08 15:16:57 +08:00
Chao Wu
24bf637835 Merge pull request #7500 from pmores/fix-queue-num-in-dragonball-share-fs
fix number of queues handling in dragonball share fs device
2023-08-08 12:07:25 +08:00
Xuewei Niu
b23c5ed155 deps: Bump dependent crate versions
This pull request is mainly for updating vm-memory and vmm-sys-util.

The affacted crates include:

- vm-memory: from 0.9.0 to 0.10.0
- vmm-sys-util: from 0.10.0 to 0.11.0
- virtio-queue: from 0.6.0 to 0.7.0
- fuse-backend-rs: from 0.10.4 to 0.10.5
- linux-loader: from 0.6.0 to 0.8.0
- nydus-api: from 0.3.0 to 0.3.1
- nydus-rafs: from 0.3.1 to 0.3.2
- nydus-storage: from 0.6.3 to 0.6.4

Fixes: #0000

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-08 11:54:09 +08:00
Fupan Li
5a20d8dcaf Merge pull request #7383 from justxuewei/dan
runtime-rs: Introduce directly attachable network
2023-08-08 09:54:28 +08:00
Chelsea Mafrica
553fd79ea9 Merge pull request #7572 from GabyCT/topic/resnet50fp32
metrics: General improvements to mobilenet tensorflow test
2023-08-07 13:33:28 -07:00
GabyCT
194120b679 Merge pull request #7540 from GabyCT/topic/enableiperf
gha: Add iperf network metrics
2023-08-07 13:40:02 -06:00
Gabriela Cervantes
863283716d metrics: General improvements to mobilenet tensorflow test
This PR renames the mobilenet tensorflow test to have a more specific
tensorflow name mainly because tensorflow has different configurations
and we will add more tensorflow tests so we want to distinguish each
tensorflow test.

Fixes #7571

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:50:00 +00:00
Gabriela Cervantes
3c319d8d4c metrics: Add iperf to gha run script
This PR adds iperf to gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Gabriela Cervantes
5b5caf8908 gha: Add iperf network metrics
This PR adds the iperf network metrics to the github actions
for kata metrics.

Fixes #7535

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Chelsea Mafrica
4559caf619 Merge pull request #7467 from ManaSugi/doc/use-k8-control-plane
docs: Use control-plane term instead of master
2023-08-06 23:40:51 -07:00
Fabiano Fidêncio
b365bef570 Merge pull request #7191 from wedsonaf/avoid-clones
agent: avoid unnecessary calls to `Arc::clone`
2023-08-06 15:34:07 +02:00
GabyCT
7144acb2a5 Merge pull request #7527 from GabyCT/topic/latency
metrics: Add network latency test
2023-08-04 15:54:07 -06:00
Gabriela Cervantes
66db5b5350 metrics: Add latency test to network README
This PR adds latency test to network README for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-04 20:27:27 +00:00
Wedson Almeida Filho
c36572418f agent: avoid unnecessary calls to Arc::clone
These calls cause two extra atomic instructions each time they're used,
one to increment and another one to decrement the refcount.

Since we don't need them because the referred value is guaranteed to
outlive the function, remove the calls.

Fixes: #7190

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 20:53:05 -03:00
Fabiano Fidêncio
8c03deac3a Merge pull request #7106 from wedsonaf/image-pulling
Image pulling on the host
2023-08-04 01:08:42 +02:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6c867d9e86 agent: add io.katacontainers.fs-opt.overlay-rw option
This causes the overlay-fs driver to add the `upperdir` and `workdir`
options to an overlay-fs mount so that the mount becomes writable using
a discardable directory under the container id.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6163c35657 agent: skip mount options that start with "io.katacontainers."
This is so that file systems don't fail when we pass kata-specific
options from the snapshotter to kata.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Fabiano Fidêncio
fa35afa982 Merge pull request #7542 from wedsonaf/ci-fix
Use version 0.10.4 of `fuse-backend-rs`
2023-08-03 22:50:11 +02:00
Wedson Almeida Filho
b2ff97aa01 dragonball: use version 0.10.4 of fuse-backend-rs
Version 0.10.5, which was just released, breaks `nydus-storage`.

This is a workaround to fix the CI which is blocking other PRs.

Fixes: #7541

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 14:15:17 -03:00
Fabiano Fidêncio
ebdae7cfdf Merge pull request #7520 from jepio/host-systemctl
kata-deploy: Use host's systemctl
2023-08-03 13:53:28 +02:00
Manabu Sugimoto
845eeb4d7b agent: Allow clippy::redundant_clone in the unit tests
Allow `clippy::redundant_clone` in the agent's unit tests
because rustc>=1.70 shows the errors as false-negatives.
These `clone()` are required because the following codes
refer to the variable, but the clippy analyzes them by mistake,
using the conservative and limited approach.
Ref. https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_clone

Fixes: #7534

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-03 19:07:40 +09:00
Fabiano Fidêncio
e2755a47b8 Merge pull request #7524 from fidencio/revert-kata-deploy-changes-after-3.2.0-rc0-release
release: Revert kata-deploy changes after 3.2.0-rc0 release
2023-08-03 11:28:43 +02:00
Fabiano Fidêncio
1163fc9de2 release: Revert kata-deploy changes after 3.2.0-rc0 release
As 3.2.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-03 10:08:20 +02:00
Xuewei Niu
3958a39d07 runtime-rs: Introduce directly attachable network
Kata containers as VM-based containers are allowed to run in the host
netns. That is, the network is able to isolate in the L2. The network
performance will benefit from this architecture, which eliminates as many
hops as possible. We called it a Directly Attachable Network (DAN for
short).

The network devices are placed at the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in the format of JSON,
including device name, type, and network info. At the very beginning stage,
the DAN only supports host tap devices. More devices, like the DPDK, will
be supported in later versions.

The format of file looks like as below:

```json
{
	"netns": "/path/to/netns",
	"devices": [{
		"name": "eth0",
		"guest_mac": "xx:xx:xx:xx:xx",
		"device": {
			"type": "vhost-user",
			"path": "/tmp/test",
			"queue_num": 1,
			"queue_size": 1
		},
		"network_info": {
			"interface": {
				"ip_addresses": ["192.168.0.1/24"],
				"mtu": 1500,
				"ntype": "tuntap",
				"flags": 0
			},
			"routes": [{
				"dest": "172.18.0.0/16",
				"source": "172.18.0.1",
				"gateway": "172.18.31.1",
				"scope": 0,
				"flags": 0
			}],
			"neighbors": [{
				"ip_address": "192.168.0.3/16",
				"device": "",
				"state": 0,
				"flags": 0,
				"hardware_addr": "xx:xx:xx:xx:xx"
			}]
		}
	}]
}
```

Fixes: #1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-03 15:33:34 +08:00
David Esparza
7d1c48c881 Merge pull request #7530 from dborquez/fix_check_running_processes
metrics: stop kata components before start a metric test.
2023-08-02 23:51:27 -06:00
Zhongtao Hu
e719423262 Merge pull request #7127 from cmaf/runtime-rs-ch-blk-2
runtime-rs: Add block device handling for cloud hypervisor
2023-08-03 09:46:32 +08:00
David Esparza
1e15369e59 metrics: Improve naming testing containers in launch times test
This commit provides a new way to name the containers used
in the launch-times-test in this form:
'kata_launch_times_RANDOM_NUMBER', where RANDOM_NUMBER is
in the 0-1000 range.

Fixes: #7529

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:55 -06:00
David Esparza
5dbe88330f metrics: Clean kata components before start a metric test.
This PR kills all kata components before start a new
metric test.

Fixes: #7528

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:51 -06:00
Fabiano Fidêncio
d424f3c595 Merge pull request #7523 from fidencio/3.2.0-rc0-branch-bump
# Kata Containers 3.2.0-rc0
2023-08-02 20:04:37 +02:00
Zvonko Kaiser
cf8899f260 Merge pull request #7494 from zvonkok/vfio-mode
vfio: Fix vfio device ordering
2023-08-02 19:45:22 +02:00
Gabriela Cervantes
3b45060b61 metrics: Add latency server yaml
This PR adds latency server yaml for kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:52:17 +00:00
Gabriela Cervantes
9bb8451df5 metrics: Add latency client yaml
This PR adds latency client yaml for the kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:50:51 +00:00
Gabriela Cervantes
64fdb98704 metrics: Add network latency test
This PR adds network latency test for kata metrics.

Fixes #7526

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:46:48 +00:00
Chelsea Mafrica
a81ad3b587 runtime-rs: Add block device handling in cloud hypervisor
Add functions for adding a block device to a container for CH.

Fixes #6690

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-08-02 09:18:48 -07:00
David Esparza
542012c8be Merge pull request #7503 from GabyCT/topic/ghafio
metrics: Add FIO test to gha for kata metrics CI
2023-08-02 10:05:09 -06:00
David Esparza
5979f3790b Merge pull request #7516 from GabyCT/topic/addiperf
metrics: Add iperf3 network test
2023-08-02 10:04:51 -06:00
Fabiano Fidêncio
006ecce49a release: Kata Containers 3.2.0-rc0
- ci-on-push: Make the CI also run for the stable-* branches
- ci: k8s: Do not fail when gathering info on AKS nodes
- kata-deploy: enable cross build for non-x86
- runtime-rs: add support for gather metrics in runtime-rs
- kata-ctl: add monitor subcommand for runtime-rs
- release: release-note.sh: Fix typos and reference to images
- metrics: Add sysbench performance test
- Simplify implementation of runtime-rs/service

6ad16d497 release: Adapt kata-deploy for 3.2.0-rc0
025596b28 ci-on-push: Make the CI also run for the stable-* branches
7ffc0c122 static-build: enable cross build for qemu
35d6d86ab static-build: enable cross-build for image build
2205fb9d0 static-build: enable cross build for virtiofsd
11631c681 static-build: enable cross build for shim-v2
7923de899 static-build: cross build kernel
e2c31fce2 kata-deploy: enable cross build for kata deploy script
2fc5f0e2e kata-depoly: prepare env for cross build in lib.sh
f5e9985af release: release-note.sh: Fix typos and reference to images
f910c66d6 ci: k8s: Do not fail when gathering info on AKS nodes
632818176 metrics: Add k8s sysbench documentation
b3901c46d runtime-rs: ignore errors during clean up sandbox resources
5a1b5d367 metrics: Add sysbench pod yaml
ad413d164 metrics: Add sysbench dockerfile
151256011 metrics: Add sysbench performance test
62e328ca5 runtime-rs: refine implementation of TaskService
458e1bc71 runtime-rs: make send_message() as an method of ServiceManager
1cc1c81c9 runtime-rs: fix possibe bug in ServiceManager::run()
1a5f90dc3 runtime-rs: simplify implementation of service crate
731e7c763 kata-ctl: add monitor subcommand for runtime-rs The previous kata-monitor in golang could not communicate with runtime-rs to gather metrics due to different sandbox addresses. This PR adds the subcommand monitor in kata-ctl to gather metrics from runtime-rs and monitor itself.
d74639d8c kata-ctl: provide the global TIMEOUT for creating MgmtClient
02cc4fe9d runtime-rs: add support for gather metrics in runtime-rs

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
6ad16d4977 release: Adapt kata-deploy for 3.2.0-rc0
kata-deploy files must be adapted to a new release.  The cases where it
happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
4e812009f5 Merge pull request #7519 from fidencio/topic/gha-ci-run-on-stable-branches
ci-on-push: Make the CI also run for the stable-* branches
2023-08-02 16:13:06 +02:00
Jeremi Piotrowski
3230dec950 kata-deploy: Use host's systemctl
when interacting with systemd. We have occasionally faced issues with
compatibility between the systemctl version used inside the kata-deploy
container and the systemd version on the host. Instead of using a containerized
systemctl with bind mounted sockets, nsenter the host and run systemctl from
there. This provides less coupling between the kata-deploy container and the
host.

Fixes: #7511
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-02 15:32:01 +02:00
Fabiano Fidêncio
29855ed0c6 Merge pull request #7510 from fidencio/topic/ci-k8s-aks-do-not-fail-gathering-info
ci: k8s: Do not fail when gathering info on AKS nodes
2023-08-02 09:44:19 +02:00
Fabiano Fidêncio
025596b289 ci-on-push: Make the CI also run for the stable-* branches
As we only support one stable branch, it'll be used as part of the
stable-3.2 and onwards.

Fixes: #7518

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 09:26:24 +02:00
Fabiano Fidêncio
e1a69c0c92 Merge pull request #6586 from jongwu/cross_build
kata-deploy: enable cross build for non-x86
2023-08-02 09:11:56 +02:00
Fupan Li
1a6b27bf6a Merge pull request #5797 from Yuan-Zhuo/add-metrics-for-runtime-rs
runtime-rs: add support for gather metrics in runtime-rs
2023-08-02 13:40:22 +08:00
Fupan Li
a536d4a7bf Merge pull request #6672 from Yuan-Zhuo/add-monitor-in-kata-ctl
kata-ctl: add monitor subcommand for runtime-rs
2023-08-02 13:39:02 +08:00
Gabriela Cervantes
ad6e53c399 metrics: Modify boot time values
This PR modifies boot time values limit.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 23:34:15 +00:00
Jianyong Wu
7ffc0c1225 static-build: enable cross build for qemu
Depends on mutiarch feature of ubuntu, we can set up cross build
environment easily and achive as good build performance as native
build.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 23:28:52 +02:00
Jianyong Wu
35d6d86ab5 static-build: enable cross-build for image build
It's too long a time to cross build agent based on docker buildx, thus
we cross build rootfs based on a container with cross compile toolchain
of gcc and rust with musl libc. Then we get fast build just like native
build.

rootfs initrd cross build is disabled as no cross compile tolchain for
rust with musl lib if found for alpine and based on docker buildx takes
too long a time.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 23:28:52 +02:00
Gabriela Cervantes
f764248095 gha: Add FIO test to run metrics yaml
This PR adds FIO test to run metrics yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 20:29:16 +00:00
Jianyong Wu
2205fb9d05 static-build: enable cross build for virtiofsd
Based on messense/rust-musl-cross which offer cross build musl lib
environment to cross compile virtiofsd.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
11631c681a static-build: enable cross build for shim-v2
shim-v2 has go and rust code. For rust code, we use messense/rust-musl-cross
to build for speed up as it doesn't depends on qemu emulation. Build go
code based on docker buildx as it doesn't support cross build now.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
7923de8999 static-build: cross build kernel
Prepare cross build environment based on current Dockerfile.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
e2c31fce23 kata-deploy: enable cross build for kata deploy script
kata-deploy-binaries-in-docker.sh is the entry to build kata components.
set some environment to facilitate the following cross build work.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
2fc5f0e2e0 kata-depoly: prepare env for cross build in lib.sh
We leverage three env, TARGET_ARCH means the buid target tuple;
ARCH nearly the same meaning with TARGET_ARCH but has been widely
used in kata; CROSS_BUILD means if you want to do cross compile.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Fabiano Fidêncio
c0171ea0a7 Merge pull request #7508 from fidencio/topic/fix-release-notes-typos-and-references
release: release-note.sh: Fix typos and reference to images
2023-08-01 22:05:32 +02:00
Gabriela Cervantes
58f9a57c20 metrics: Add network reference to general README metrics
This PR adds network reference to the general metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:54:00 +00:00
Gabriela Cervantes
07694ef3ae metrics: Add Kata Containers network metrics README
This PR adds the Kata Containers network metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:49:09 +00:00
Gabriela Cervantes
d8439dba89 metrics: Add iperf3 deployment yaml
This PR adds the iperf3 deployment yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:45:01 +00:00
Gabriela Cervantes
bda83cee5d metrics: Add iperf3 daemonset for k8s
This PR adds the iperf3 daemonset for k8s.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:42:15 +00:00
Gabriela Cervantes
badff23c71 metrics: Add iperf3 service yaml for k8s
This PR adds the iperf3 service yaml for k8s.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:37:19 +00:00
Gabriela Cervantes
27c02367f9 metrics: Add iperf3 network test
This PR adds the iperf3 benchmark test for kata metrics.

Fixes #7515

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:30:46 +00:00
GabyCT
a0a524efc2 Merge pull request #7486 from kata-containers/topic/addsysbench
metrics: Add sysbench performance test
2023-08-01 10:17:48 -06:00
Fabiano Fidêncio
f5e9985afe release: release-note.sh: Fix typos and reference to images
diferent -> different

And also let's make sure we escape the backticks around the kata-deploy
environment variables, otherwise bash will try to interpret those.

Fixes: #7497

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-01 12:42:03 +02:00
Fabiano Fidêncio
f910c66d6f ci: k8s: Do not fail when gathering info on AKS nodes
Otherwise the VM deletion may not delete, leaving us with several
machines behind.

Fixes: #7509

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-01 12:36:33 +02:00
Manabu Sugimoto
1b21a46246 docs: Use control-plane term instead of master
Replace `master` with `control-plane` in the context of K8s
because `master` is a legacy term and haven't been used any more.

Ref. https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint

Fixes: #7466

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-01 17:41:40 +09:00
Chao Wu
1a94aad44f Merge pull request #7480 from jiangliu/rt-service
Simplify implementation of runtime-rs/service
2023-08-01 16:05:33 +08:00
Gabriela Cervantes
6328181762 metrics: Add k8s sysbench documentation
This PR adds k8s sysbench documentation at general density documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 20:28:37 +00:00
Gabriela Cervantes
8933d54428 metrics: Add FIO to gha run script
This PR adds FIO to gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:51:11 +00:00
Gabriela Cervantes
8a584589ff metrics: Add DAX FIO README
This PR adds DAX FIO README information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:42:44 +00:00
Gabriela Cervantes
21f5b65233 metrics: Add FIO information in storage general README
This PR adds FIO information in storage general README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:33:39 +00:00
Gabriela Cervantes
69f05cf9e6 metrics: Add FIO general README
This PR adds FIO general README information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:30:05 +00:00
Gabriela Cervantes
87d41b3dfa metrics: Add FIO test to gha for kata metrics CI
This PR adds FIO test to gha for kata metrics CI.

Fixes #7502

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 16:50:16 +00:00
Pavel Mores
28e5e9c86e runtime-rs: fix number of queues handling in dragonball share fs device
Looks like a copy/paste error...

Fixes #7501

Signed-off-by: Pavel Mores <pmores@redhat.com>
2023-07-31 17:25:47 +02:00
Zvonko Kaiser
cddcde1d40 vfio: Fix vfio device ordering
If modeVFIO is enabled we need 1st to attach the VFIO control group
device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the
devices starting with device #1 being the VFIO control group device and
the next the actuall device(s)
/dev/vfio/<group>

Fixes: #7493

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-31 11:26:27 +00:00
Jiang Liu
b3901c46d6 runtime-rs: ignore errors during clean up sandbox resources
Ignore errors during clean up sandbox resources as much as we can.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-31 13:07:43 +08:00
Gabriela Cervantes
5a1b5d3672 metrics: Add sysbench pod yaml
This PR adds the sysbench pod yaml for the sysbench performance test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 20:03:15 +00:00
Gabriela Cervantes
ad413d1646 metrics: Add sysbench dockerfile
This PR adds sysbench dockerfile.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 19:58:10 +00:00
Gabriela Cervantes
1512560111 metrics: Add sysbench performance test
This PR adds the sysbench performance test for kata CI.

Fixes #7485

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 19:54:12 +00:00
Jiang Liu
62e328ca5c runtime-rs: refine implementation of TaskService
Refine implementation of TaskService, making handler_message() as a
method.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:33 +08:00
Jiang Liu
458e1bc712 runtime-rs: make send_message() as an method of ServiceManager
Simplify implementation by making send_message() as an method of
ServiceManager.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:31 +08:00
Jiang Liu
1cc1c81c9a runtime-rs: fix possibe bug in ServiceManager::run()
Multiple instances of task service may get registered by
ServiceManager::run(), fix it by making operation symmetric.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:30 +08:00
Jiang Liu
1a5f90dc3f runtime-rs: simplify implementation of service crate
Simplify implementation of service crate.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:28 +08:00
Yuan-Zhuo
731e7c763f kata-ctl: add monitor subcommand for runtime-rs
The previous kata-monitor in golang could not communicate with runtime-rs
to gather metrics due to different sandbox addresses.
This PR adds the subcommand monitor in kata-ctl to gather metrics from
runtime-rs and monitor itself.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:30:08 +08:00
Yuan-Zhuo
d74639d8c6 kata-ctl: provide the global TIMEOUT for creating MgmtClient
Several functions in kata-ctl need to establish a connection with runtime-rs through MgmtClient.
This PR provides a global TIMEOUT to avoid multiple definitions.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:23:37 +08:00
Yuan-Zhuo
02cc4fe9db runtime-rs: add support for gather metrics in runtime-rs
1. Implemented metrics collection for runtime-rs shim and dragonball hypervisor.
2. Described the current supported metrics in runtime-rs.(docs/design/kata-metrics-in-runtime-rs.md)

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:16:51 +08:00
Manabu Sugimoto
f1d8de9be6 runk: Allow runk to launch a container without pid namespace
Allow runk to launch a container even though users don't specify the
pid namespace in `config.json` because general container runtimes
such as runc also can launch a container without the namespace.
On the other hand, Kata Containers doesn't allow it due to security issue
so this feature should be enabled in only runk.

Fixes: #7168

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-16 23:31:14 +05:30
903 changed files with 64425 additions and 22906 deletions

View File

@@ -21,7 +21,7 @@ jobs:
steps:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install hub extension script
run: |

View File

@@ -39,7 +39,7 @@ jobs:
popd &>/dev/null
- name: Checkout code to allow hub to communicate with the project
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Add issue to issue backlog
env:

View File

@@ -21,7 +21,16 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v1
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Install PR sizing label script
run: |

200
.github/workflows/basic-ci-amd64.yaml vendored Normal file
View File

@@ -0,0 +1,200 @@
name: CI | Basic amd64 tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-cri-containerd:
strategy:
# We can set this to true whenever we're 100% sure that
# the all the tests are not flaky, otherwise we'll fail
# all the tests due to a single flaky instance.
fail-fast: false
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu']
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts
- name: Run cri-containerd tests
run: bash tests/integration/cri-containerd/gha-run.sh run
run-containerd-stability:
strategy:
fail-fast: false
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu']
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/stability/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/stability/gha-run.sh install-kata kata-artifacts
- name: Run containerd-stability tests
run: bash tests/stability/gha-run.sh run
run-nydus:
strategy:
# We can set this to true whenever we're 100% sure that
# the all the tests are not flaky, otherwise we'll fail
# all the tests due to a single flaky instance.
fail-fast: false
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu', 'dragonball']
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/nydus/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/nydus/gha-run.sh install-kata kata-artifacts
- name: Run nydus tests
run: bash tests/integration/nydus/gha-run.sh run
run-runk:
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: lts
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/runk/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
- name: Run tracing tests
run: bash tests/integration/runk/gha-run.sh run
run-vfio:
strategy:
fail-fast: false
matrix:
vmm: ['clh', 'qemu']
runs-on: garm-ubuntu-2304
env:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/functional/vfio/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Run vfio tests
timeout-minutes: 15
run: bash tests/functional/vfio/gha-run.sh run

View File

@@ -16,6 +16,10 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
@@ -23,9 +27,13 @@ jobs:
strategy:
matrix:
asset:
- agent
- agent-opa
- agent-ctl
- cloud-hypervisor
- cloud-hypervisor-glibc
- firecracker
- kata-ctl
- kernel
- kernel-sev
- kernel-dragonball-experimental
@@ -33,6 +41,7 @@ jobs:
- kernel-nvidia-gpu
- kernel-nvidia-gpu-snp
- kernel-nvidia-gpu-tdx-experimental
- log-parser-rs
- nydus
- ovmf
- ovmf-sev
@@ -44,12 +53,18 @@ jobs:
- rootfs-initrd
- rootfs-initrd-mariner
- rootfs-initrd-sev
- runk
- shim-v2
- tdvf
- trace-forwarder
- virtiofsd
stage:
- ${{ inputs.stage }}
exclude:
- asset: agent
stage: release
- asset: agent-opa
stage: release
- asset: cloud-hypervisor-glibc
stage: release
steps:
@@ -61,11 +76,17 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -76,6 +97,10 @@ jobs:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
@@ -89,9 +114,15 @@ jobs:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:

View File

@@ -16,10 +16,14 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
runs-on: arm64
runs-on: arm64-builder
strategy:
matrix:
asset:
@@ -48,10 +52,17 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -62,6 +73,10 @@ jobs:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
@@ -72,16 +87,22 @@ jobs:
if-no-files-found: error
create-kata-tarball:
runs-on: arm64
runs-on: arm64-builder
needs: build-asset
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:

View File

@@ -16,6 +16,10 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
@@ -44,10 +48,17 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -59,6 +70,10 @@ jobs:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
@@ -76,9 +91,15 @@ jobs:
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:

View File

@@ -19,7 +19,7 @@ jobs:
steps:
- name: Checkout Code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Generate Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: bash cargo-deny-generator.sh

View File

@@ -15,4 +15,5 @@ jobs:
commit-hash: ${{ github.sha }}
pr-number: "nightly"
tag: ${{ github.sha }}-nightly
target-branch: ${{ github.ref_name }}
secrets: inherit

View File

@@ -3,6 +3,7 @@ on:
pull_request_target:
branches:
- 'main'
- 'stable-*'
types:
# Adding 'labeled' to the list of activity types that trigger this event
# (default: opened, synchronize, reopened) so that we can run this
@@ -27,4 +28,5 @@ jobs:
commit-hash: ${{ github.event.pull_request.head.sha }}
pr-number: ${{ github.event.pull_request.number }}
tag: ${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}
target-branch: ${{ github.event.pull_request.base.ref }}
secrets: inherit

View File

@@ -11,6 +11,10 @@ on:
tag:
required: true
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-kata-static-tarball-amd64:
@@ -18,6 +22,7 @@ jobs:
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
publish-kata-deploy-payload-amd64:
needs: build-kata-static-tarball-amd64
@@ -28,8 +33,94 @@ jobs:
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
build-and-publish-tee-confidential-unencrypted-image:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Docker build and push
uses: docker/build-push-action@v4
with:
tags: ghcr.io/kata-containers/test-images:unencrypted-${{ inputs.pr-number }}
push: true
context: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/
platforms: linux/amd64, linux/s390x
file: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/Dockerfile
run-docker-tests-on-garm:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-docker-tests-on-garm.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-nerdctl-tests-on-garm:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-nerdctl-tests-on-garm.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-kata-deploy-tests-on-aks:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-kata-deploy-tests-on-aks.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-kata-deploy-tests-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-kata-deploy-tests-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-kata-monitor-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-kata-monitor-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-k8s-tests-on-aks:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-aks.yaml
@@ -39,34 +130,43 @@ jobs:
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-k8s-tests-on-sev:
run-k8s-tests-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-sev.yaml
uses: ./.github/workflows/run-k8s-tests-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-k8s-tests-on-snp:
run-k8s-tests-with-crio-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-snp.yaml
uses: ./.github/workflows/run-k8s-tests-with-crio-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-k8s-tests-on-tdx:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-tdx.yaml
run-kata-coco-tests:
needs: [publish-kata-deploy-payload-amd64, build-and-publish-tee-confidential-unencrypted-image]
uses: ./.github/workflows/run-kata-coco-tests.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
run-metrics-tests:
needs: build-kata-static-tarball-amd64
@@ -74,24 +174,12 @@ jobs:
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-cri-containerd-tests:
run-basic-amd64-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-cri-containerd-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
run-nydus-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-nydus-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
run-vfio-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-vfio-tests.yaml
uses: ./.github/workflows/basic-ci-amd64.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}

View File

@@ -21,6 +21,6 @@ jobs:
with:
go-version: 1.19.3
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Build utils
run: ./ci/darwin-test.sh

View File

@@ -22,7 +22,7 @@ jobs:
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}

View File

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Ensure the split out runtime classes match the all-in-one file
run: |
pushd tools/packaging/kata-deploy/runtimeclasses/

View File

@@ -38,7 +38,17 @@ jobs:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Move issue to "In progress"
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

View File

@@ -4,6 +4,7 @@ on:
branches:
- main
- stable-*
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -15,6 +16,7 @@ jobs:
with:
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
build-assets-arm64:
@@ -22,6 +24,7 @@ jobs:
with:
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
build-assets-s390x:
@@ -29,6 +32,7 @@ jobs:
with:
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-kata-deploy-payload-amd64:
@@ -39,6 +43,7 @@ jobs:
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-amd64
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-kata-deploy-payload-arm64:
@@ -49,6 +54,7 @@ jobs:
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-arm64
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-kata-deploy-payload-s390x:
@@ -59,6 +65,7 @@ jobs:
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-s390x
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-manifest:
@@ -66,7 +73,7 @@ jobs:
needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x]
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2

View File

@@ -17,14 +17,25 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3

View File

@@ -17,18 +17,29 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
runs-on: arm64
runs-on: arm64-builder
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3

View File

@@ -17,6 +17,10 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
@@ -26,9 +30,16 @@ jobs:
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3

View File

@@ -29,7 +29,7 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:

View File

@@ -14,7 +14,7 @@ jobs:
kata-deploy:
needs: build-kata-static-tarball-arm64
runs-on: arm64
runs-on: arm64-builder
steps:
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
@@ -29,7 +29,7 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:

View File

@@ -29,7 +29,7 @@ jobs:
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:

View File

@@ -32,7 +32,7 @@ jobs:
needs: [build-and-push-assets-amd64, build-and-push-assets-arm64, build-and-push-assets-s390x]
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
@@ -73,11 +73,7 @@ jobs:
needs: publish-multi-arch-images
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: install hub
run: |
wget -q -O- https://github.com/mislav/hub/releases/download/v2.14.2/hub-linux-amd64-2.14.2.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- uses: actions/checkout@v4
- name: download-artifacts-amd64
uses: actions/download-artifact@v3
@@ -90,7 +86,7 @@ jobs:
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
- name: download-artifacts-arm64
@@ -104,7 +100,7 @@ jobs:
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
- name: download-artifacts-s390x
@@ -118,13 +114,13 @@ jobs:
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
upload-versions-yaml:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: upload versions.yaml
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
@@ -133,28 +129,28 @@ jobs:
pushd $GITHUB_WORKSPACE
versions_file="kata-containers-$tag-versions.yaml"
cp versions.yaml ${versions_file}
hub release edit -m "" -a "${versions_file}" "${tag}"
gh release upload "${tag}" "${versions_file}"
popd
upload-cargo-vendored-tarball:
needs: upload-multi-arch-static-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: generate-and-upload-tarball
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-containers-$tag-vendor.tar.gz"
pushd $GITHUB_WORKSPACE
bash -c "tools/packaging/release/generate_vendor.sh ${tarball}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
upload-libseccomp-tarball:
needs: upload-cargo-vendored-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: download-and-upload-tarball
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
@@ -174,6 +170,6 @@ jobs:
# "-m" option should be empty to re-use the existing release title
# without opening a text editor.
# For the details, check https://hub.github.com/hub-release.1.html.
hub release edit -m "" -a "${tarball}" "${tag}"
hub release edit -m "" -a "${asc}" "${tag}"
gh release upload "${tag}" "${tarball}"
gh release upload "${tag}" "${asc}"
popd

View File

@@ -36,7 +36,17 @@ jobs:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Install porting checker script
run: |

View File

@@ -1,42 +0,0 @@
name: CI | Run cri-containerd tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
jobs:
run-cri-containerd:
strategy:
fail-fast: true
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu']
runs-on: garm-ubuntu-2204
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts
- name: Run cri-containerd tests
run: bash tests/integration/cri-containerd/gha-run.sh run

View File

@@ -0,0 +1,56 @@
name: CI | Run docker integration tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-docker-tests:
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- clh
- qemu
runs-on: garm-ubuntu-2304-smaller
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run

View File

@@ -17,6 +17,10 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
@@ -29,6 +33,9 @@ jobs:
- clh
- dragonball
- qemu
instance-type:
- small
- normal
include:
- host_os: cbl-mariner
vmm: clh
@@ -40,11 +47,20 @@ jobs:
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HOST_OS: ${{ matrix.host_os }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Download Azure CLI
run: bash tests/integration/kubernetes/gha-run.sh install-azure-cli

View File

@@ -0,0 +1,88 @@
name: CI | Run kubernetes tests on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- clh #cloud-hypervisor
- fc #firecracker
- qemu
snapshotter:
- devmapper
k8s:
- k3s
instance:
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
include:
- instance: garm-ubuntu-2004
instance-type: normal
- instance: garm-ubuntu-2004-smaller
instance-type: small
runs-on: ${{ matrix.instance }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy ${{ matrix.k8s }}
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Configure the ${{ matrix.snapshotter }} snapshotter
run: bash tests/integration/kubernetes/gha-run.sh configure-snapshotter
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

View File

@@ -1,48 +0,0 @@
name: CI | Run kubernetes tests on SEV
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-sev
runs-on: sev
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
USING_NFD: "false"
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-sev
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-sev

View File

@@ -1,48 +0,0 @@
name: CI | Run kubernetes tests on SEV-SNP
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-snp
runs-on: sev-snp
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
USING_NFD: "false"
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-snp
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-snp

View File

@@ -1,47 +0,0 @@
name: CI | Run kubernetes tests on TDX
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-tdx
runs-on: tdx
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
USING_NFD: "true"
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-tdx
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-tdx

View File

@@ -0,0 +1,86 @@
name: CI | Run kubernetes tests, using CRI-O, on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- qemu
k8s:
- k0s
instance:
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
include:
- instance: garm-ubuntu-2004
instance-type: normal
- instance: garm-ubuntu-2004-smaller
instance-type: small
- k8s: k0s
k8s-extra-params: '--cri-socket remote:unix:///var/run/crio/crio.sock --kubelet-extra-args --cgroup-driver="systemd"'
runs-on: ${{ matrix.instance }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
KUBERNETES_EXTRA_PARAMS: ${{ matrix.k8s-extra-params }}
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Configure CRI-O
run: bash tests/integration/kubernetes/gha-run.sh setup-crio
- name: Deploy ${{ matrix.k8s }}
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

View File

@@ -0,0 +1,176 @@
name: CI | Run kata coco tests
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests-on-tdx:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-tdx
runs-on: tdx
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "k3s"
USING_NFD: "true"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
run-k8s-tests-on-tdx:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-tdx
runs-on: tdx
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "k3s"
USING_NFD: "true"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-tdx
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-tdx
run-k8s-tests-on-sev:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-sev
runs-on: sev
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-sev
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-sev
run-k8s-tests-sev-snp:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-snp
runs-on: sev-snp
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-snp
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-snp

View File

@@ -0,0 +1,89 @@
name: CI | Run kata-deploy tests on AKS
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests:
strategy:
fail-fast: false
matrix:
host_os:
- ubuntu
vmm:
- clh
- dragonball
- qemu
include:
- host_os: cbl-mariner
vmm: clh
runs-on: ubuntu-latest
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HOST_OS: ${{ matrix.host_os }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
USING_NFD: "false"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Download Azure CLI
run: bash tests/functional/kata-deploy/gha-run.sh install-azure-cli
- name: Log into the Azure account
run: bash tests/functional/kata-deploy/gha-run.sh login-azure
env:
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: Create AKS cluster
timeout-minutes: 10
run: bash tests/functional/kata-deploy/gha-run.sh create-cluster
- name: Install `bats`
run: bash tests/functional/kata-deploy/gha-run.sh install-bats
- name: Install `kubectl`
run: bash tests/functional/kata-deploy/gha-run.sh install-kubectl
- name: Download credentials for the Kubernetes CLI to use them
run: bash tests/functional/kata-deploy/gha-run.sh get-cluster-credentials
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Delete AKS cluster
if: always()
run: bash tests/functional/kata-deploy/gha-run.sh delete-cluster

View File

@@ -0,0 +1,65 @@
name: CI | Run kata-deploy tests on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests:
strategy:
fail-fast: false
matrix:
vmm:
- clh
- qemu
k8s:
- k0s
- k3s
- rke2
runs-on: garm-ubuntu-2004-smaller
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
USING_NFD: "false"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy ${{ matrix.k8s }}
run: bash tests/functional/kata-deploy/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/functional/kata-deploy/gha-run.sh install-bats
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests

View File

@@ -0,0 +1,59 @@
name: CI | Run kata-monitor tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-monitor:
strategy:
fail-fast: false
matrix:
vmm:
- qemu
container_engine:
- crio
- containerd
include:
- container_engine: containerd
containerd_version: lts
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINER_ENGINE: ${{ matrix.container_engine }}
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/functional/kata-monitor/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/functional/kata-monitor/gha-run.sh install-kata kata-artifacts
- name: Run kata-monitor tests
run: bash tests/functional/kata-monitor/gha-run.sh run

View File

@@ -8,22 +8,28 @@ on:
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-metrics:
strategy:
fail-fast: true
matrix:
vmm: ['clh', 'qemu']
max-parallel: 1
setup-kata:
name: Kata Setup
runs-on: metrics
env:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3
@@ -34,6 +40,24 @@ jobs:
- name: Install kata
run: bash tests/metrics/gha-run.sh install-kata kata-artifacts
run-metrics:
needs: setup-kata
strategy:
# We can set this to true whenever we're 100% sure that
# the all the tests are not flaky, otherwise we'll fail
# all the tests due to a single flaky instance.
fail-fast: false
matrix:
vmm: ['clh', 'qemu']
max-parallel: 1
runs-on: metrics
env:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- name: enabling the hypervisor
run: bash tests/metrics/gha-run.sh enabling-hypervisor
- name: run launch times test
run: bash tests/metrics/gha-run.sh run-test-launchtimes
@@ -49,9 +73,18 @@ jobs:
- name: run tensorflow test
run: bash tests/metrics/gha-run.sh run-test-tensorflow
- name: run fio test
run: bash tests/metrics/gha-run.sh run-test-fio
- name: run iperf test
run: bash tests/metrics/gha-run.sh run-test-iperf
- name: run latency test
run: bash tests/metrics/gha-run.sh run-test-latency
- name: make metrics tarball ${{ matrix.vmm }}
run: bash tests/metrics/gha-run.sh make-tarball-results
- name: archive metrics results ${{ matrix.vmm }}
uses: actions/upload-artifact@v3
with:

View File

@@ -0,0 +1,57 @@
name: CI | Run nerdctl integration tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-nerdctl-tests:
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- clh
- dragonball
- qemu
runs-on: garm-ubuntu-2304-smaller
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/nerdctl/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/nerdctl/gha-run.sh install-kata kata-artifacts
- name: Run nerdctl smoke test
timeout-minutes: 5
run: bash tests/integration/nerdctl/gha-run.sh run

View File

@@ -1,42 +0,0 @@
name: CI | Run nydus tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
jobs:
run-nydus:
strategy:
fail-fast: true
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu', 'dragonball']
runs-on: garm-ubuntu-2204
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Install dependencies
run: bash tests/integration/nydus/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/nydus/gha-run.sh install-kata kata-artifacts
- name: Run nydus tests
run: bash tests/integration/nydus/gha-run.sh run

46
.github/workflows/run-runk-tests.yaml vendored Normal file
View File

@@ -0,0 +1,46 @@
name: CI | Run runk tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-runk:
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: lts
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/runk/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
- name: Run tracing tests
run: bash tests/integration/runk/gha-run.sh run

View File

@@ -1,37 +0,0 @@
name: CI | Run vfio tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
jobs:
run-vfio:
strategy:
fail-fast: false
matrix:
vmm: ['clh', 'qemu']
runs-on: garm-ubuntu-2204
env:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v3
with:
ref: ${{ inputs.commit-hash }}
- name: Install dependencies
run: bash tests/functional/vfio/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Run vfio tests
run: bash tests/functional/vfio/gha-run.sh run

View File

@@ -1,37 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
name: Static checks dragonball
jobs:
test-dragonball:
runs-on: dragonball
env:
RUST_BACKTRACE: "1"
steps:
- uses: actions/checkout@v3
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
- name: Install Rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./ci/install_rust.sh
echo PATH="$HOME/.cargo/bin:$PATH" >> $GITHUB_ENV
- name: Run Unit Test
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd src/dragonball
cargo version
rustc --version
sudo -E env PATH=$PATH LIBC=gnu SUPPORT_VIRTUALIZATION=true make test

View File

@@ -12,74 +12,183 @@ concurrency:
name: Static checks
jobs:
static-checks:
check-kernel-config-version:
runs-on: ubuntu-latest
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Ensure the kernel config version has been updated
run: |
kernel_dir="tools/packaging/kernel/"
kernel_version_file="${kernel_dir}kata_config_version"
modified_files=$(git diff --name-only origin/$GITHUB_BASE_REF..HEAD)
if git diff --name-only origin/$GITHUB_BASE_REF..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then
echo "Kernel directory has changed, checking if $kernel_version_file has been updated"
if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then
echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)
else
echo "Readme file changed, no need for kernel config version update."
fi
echo "Check passed"
fi
build-checks:
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
cmd:
component:
- agent
- dragonball
- runtime
- runtime-rs
- agent-ctl
- kata-ctl
- log-parser-rs
- runk
- trace-forwarder
command:
- "make vendor"
- "make static-checks"
- "make check"
- "make test"
- "sudo -E PATH=\"$PATH\" make test"
include:
- component: agent
component-path: src/agent
- component: dragonball
component-path: src/dragonball
- component: runtime
component-path: src/runtime
- component: runtime-rs
component-path: src/runtime-rs
- component: agent-ctl
component-path: src/tools/agent-ctl
- component: kata-ctl
component-path: src/tools/kata-ctl
- component: log-parser-rs
component-path: src/tools/log-parser-rs
- component: runk
component-path: src/tools/runk
- component: trace-forwarder
component-path: src/tools/trace-forwarder
- install-libseccomp: no
- component: agent
install-libseccomp: yes
- component: runk
install-libseccomp: yes
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
if: ${{ matrix.component == 'runtime' }}
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> $GITHUB_PATH
- name: Install rust
if: ${{ matrix.component != 'runtime' }}
run: |
./tests/install_rust.sh
echo "${HOME}/.cargo/bin" >> $GITHUB_PATH
- name: Install musl-tools
if: ${{ matrix.component != 'runtime' }}
run: sudo apt-get -y install musl-tools
- name: Install libseccomp
if: ${{ matrix.command != 'make vendor' && matrix.command != 'make check' && matrix.install-libseccomp == 'yes' }}
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Setup XDG_RUNTIME_DIR for the `runtime` tests
if: ${{ matrix.command != 'make vendor' && matrix.command != 'make check' && matrix.component == 'runtime' }}
run: |
XDG_RUNTIME_DIR=$(mktemp -d /tmp/kata-tests-$USER.XXX | tee >(xargs chmod 0700))
echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}" >> $GITHUB_ENV
- name: Running `${{ matrix.command }}` for ${{ matrix.component }}
run: |
cd ${{ matrix.component-path }}
${{ matrix.command }}
env:
RUST_BACKTRACE: "1"
build-checks-depending-on-kvm:
runs-on: garm-ubuntu-2004-smaller
strategy:
fail-fast: false
matrix:
component:
- runtime-rs
include:
- component: runtime-rs
command: "sudo -E env PATH=$PATH LIBC=gnu SUPPORT_VIRTUALIZATION=true make test"
- component: runtime-rs
component-path: src/dragonball
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install system deps
run: |
sudo apt-get install -y build-essential musl-tools
- name: Install yq
run: |
sudo -E ./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install rust
run: |
export PATH="$PATH:/usr/local/bin"
./tests/install_rust.sh
- name: Running `${{ matrix.command }}` for ${{ matrix.component }}
run: |
export PATH="$PATH:${HOME}/.cargo/bin"
cd ${{ matrix.component-path }}
${{ matrix.command }}
env:
RUST_BACKTRACE: "1"
static-checks:
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
cmd:
- "make static-checks"
env:
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
GOPATH: ${{ github.workspace }}
steps:
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Install Go
uses: actions/setup-go@v3
with:
go-version: 1.19.3
- name: Check kernel config version
run: |
cd "${{ github.workspace }}/src/github.com/${{ github.repository }}"
kernel_dir="tools/packaging/kernel/"
kernel_version_file="${kernel_dir}kata_config_version"
modified_files=$(git diff --name-only origin/main..HEAD)
if git diff --name-only origin/main..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then
echo "Kernel directory has changed, checking if $kernel_version_file has been updated"
if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then
echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)
else
echo "Readme file changed, no need for kernel config version update."
fi
echo "Check passed"
fi
- name: Set PATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Install yq
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }}
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }}
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> $GITHUB_PATH
- name: Install system dependencies
run: |
sudo apt-get -y install moreutils hunspell pandoc
- name: Run check
run: |
export PATH=${PATH}:${GOPATH}/bin
cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}

View File

@@ -1 +1 @@
3.2.0-alpha4
3.3.0-alpha0

View File

@@ -7,12 +7,10 @@
set -o errexit
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
script_name="$(basename "${BASH_SOURCE[0]}")"
clone_tests_repo
source "${tests_repo_dir}/.ci/lib.sh"
source "${script_dir}/../tests/common.bash"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
@@ -25,11 +23,11 @@ workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
libseccomp_version="${LIBSECCOMP_VERSION:-""}"
if [ -z "${libseccomp_version}" ]; then
libseccomp_version=$(get_version "externals.libseccomp.version")
libseccomp_version=$(get_from_kata_deps "externals.libseccomp.version")
fi
libseccomp_url="${LIBSECCOMP_URL:-""}"
if [ -z "${libseccomp_url}" ]; then
libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_url=$(get_from_kata_deps "externals.libseccomp.url")
fi
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
@@ -38,11 +36,11 @@ cflags="-O2"
# Variables for gperf
gperf_version="${GPERF_VERSION:-""}"
if [ -z "${gperf_version}" ]; then
gperf_version=$(get_version "externals.gperf.version")
gperf_version=$(get_from_kata_deps "externals.gperf.version")
fi
gperf_url="${GPERF_URL:-""}"
if [ -z "${gperf_url}" ]; then
gperf_url=$(get_version "externals.gperf.url")
gperf_url=$(get_from_kata_deps "externals.gperf.url")
fi
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
@@ -87,7 +85,8 @@ build_and_install_libseccomp() {
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
[ "${arch}" == $(uname -m) ] && cc_name="" || cc_name="${arch}-linux-gnu-gcc"
CC=${cc_name} ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
make
make install
popd

View File

@@ -14,6 +14,7 @@ Kata Containers design documents:
- [`Inotify` support](inotify.md)
- [`Hooks` support](hooks-handling.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Metrics in Rust Runtime(runtime-rs)](kata-metrics-in-runtime-rs.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
- [Design for core-scheduling](core-scheduling.md)

View File

@@ -12,7 +12,7 @@ only needs to run a container runtime and a container agent (called a
Kata Containers represents a Kubelet pod as a VM.
A Kubernetes cluster runs a control plane where a scheduler (typically
running on a dedicated master node) calls into a compute Kubelet. This
running on a dedicated control-plane node) calls into a compute Kubelet. This
Kubelet instance is responsible for managing the lifecycle of pods
within the nodes and eventually relies on a container runtime to
handle execution. The Kubelet architecture decouples lifecycle

View File

@@ -0,0 +1,50 @@
# Kata Metrics in Rust Runtime(runtime-rs)
Rust Runtime(runtime-rs) is responsible for:
- Gather metrics about `shim`.
- Gather metrics from `hypervisor` (through `channel`).
- Get metrics from `agent` (through `ttrpc`).
---
Here are listed all the metrics gathered by `runtime-rs`.
> * Current status of each entry is marked as:
> * ✅DONE
> * 🚧TODO
### Kata Shim
| STATUS | Metric name | Type | Units | Labels |
| ------ | ------------------------------------------------------------ | ----------- | -------------- | ------------------------------------------------------------ |
| 🚧 | `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_io_stat`: <br> Kata containerd shim v2 process IO statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/io`)<ul><li>`cancelledwritebytes`</li><li>`rchar`</li><li>`readbytes`</li><li>`syscr`</li><li>`syscw`</li><li>`wchar`</li><li>`writebytes`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_netdev`: <br> Kata containerd shim v2 network devices statistics. | `GAUGE` | | <ul><li>`interface` (network device name)</li><li>`item` (see `/proc/net/dev`)<ul><li>`recv_bytes`</li><li>`recv_compressed`</li><li>`recv_drop`</li><li>`recv_errs`</li><li>`recv_fifo`</li><li>`recv_frame`</li><li>`recv_multicast`</li><li>`recv_packets`</li><li>`sent_bytes`</li><li>`sent_carrier`</li><li>`sent_colls`</li><li>`sent_compressed`</li><li>`sent_drop`</li><li>`sent_errs`</li><li>`sent_fifo`</li><li>`sent_packets`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_cpu`: <br> Kata Pod overhead for CPU resources(percent). | `GAUGE` | percent | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_memory_in_bytes`: <br> Kata Pod overhead for memory resources(bytes). | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_stat`: <br> Kata containerd shim v2 process statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_status`: <br> Kata containerd shim v2 process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpmd`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_resident_memory_bytes`: <br> Resident memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_bytes`: <br> Virtual memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_max_bytes`: <br> Maximum amount of virtual memory available in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (Kata shim v2 actions)<ul><li>`checkpoint`</li><li>`close_io`</li><li>`connect`</li><li>`create`</li><li>`delete`</li><li>`exec`</li><li>`kill`</li><li>`pause`</li><li>`pids`</li><li>`resize_pty`</li><li>`resume`</li><li>`shutdown`</li><li>`start`</li><li>`state`</li><li>`stats`</li><li>`update`</li><li>`wait`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_threads`: <br> Kata containerd shim v2 process threads. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
### Kata Hypervisor
Different from golang runtime, hypervisor and shim in runtime-rs belong to the **same process**, so all previous metrics for hypervisor and shim only need to be gathered once. Thus, we currently only collect previous metrics in kata shim.
At the same time, we added the interface(`VmmAction::GetHypervisorMetrics`) to gather hypervisor metrics, in case we design tailor-made metrics for hypervisor in the future. Here're metrics exposed from [src/dragonball/src/metric.rs](https://github.com/kata-containers/kata-containers/blob/main/src/dragonball/src/metric.rs).
| Metric name | Type | Units | Labels |
| ------------------------------------------------------------ | ---------- | ----- | ------------------------------------------------------------ |
| `kata_hypervisor_scrape_count`: <br> Metrics scrape count | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_vcpu`: <br>Hypervisor metrics specific to VCPUs' mode of functioning. | `IntGauge` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_seccomp`: <br> Hypervisor metrics for the seccomp filtering. | `IntGauge` | | <ul><li>`item`<ul><li>`num_faults`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_seccomp`: <br> Hypervisor metrics for the seccomp filtering. | `IntGauge` | | <ul><li>`item`<ul><li>`sigbus`</li><li>`sigsegv`</li></ul></li><li>`sandbox_id`</li></ul> |

View File

@@ -43,7 +43,7 @@ and perform DMA transactions _anywhere_.
The second feature is ACS (Access Control Services), which controls which
devices are allowed to communicate with one another and thus avoids improper
routing of packets irrespectively of whether IOMMU is enabled or not.
routing of packets `irrespectively` of whether IOMMU is enabled or not.
When IOMMU is enabled, ACS is normally configured to force all PCI Express DMA
to go through the root complex so IOMMU can translate it, impacting performance
@@ -126,7 +126,7 @@ efficient P2P communication.
## PCI Express Virtual P2P Approval Capability
Most of the time, the PCI Express topology is flattened and obfuscated to ensure
easy migration of the VM image between different physical hardware topologies.
easy migration of the VM image between different physical hardware `topologies`.
In Kata, we can configure the hypervisor to use PCI Express root ports to
hotplug the VFIO  devices one is passing through. A user can select how many PCI
Express root ports to allocate depending on how many devices are passed through.
@@ -220,7 +220,7 @@ containers that he wants to run with Kata. The goal is to make such things as
transparent as possible, so we also introduced
[CDI](https://github.com/container-orchestrated-devices/container-device-interface)
(Container Device Interface) to Kata. CDI is a[
specification](https://github.com/container-orchestrated-devices/container-device-interface/blob/master/SPEC.md)
specification](https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md)
for container runtimes to support third-party devices.
As written before, we can provide a clique ID for the devices that belong
@@ -300,7 +300,7 @@ pcie_switch_port = 8
```
Each device that is passed through is attached to a PCI Express downstream port
as illustrated below. We can even replicate the hosts two DPUs topologies with
as illustrated below. We can even replicate the hosts two DPUs `topologies` with
added metadata through the CDI. Most of the time, a container only needs one
pair of GPU and NIC for GPUDirect RDMA. This is more of a showcase of what we
can do with the power of Kata and CDI. One could even think of adding groups of
@@ -328,7 +328,7 @@ $ lspci -tv
```
The configuration of using either the root port or switch port can be applied on
a per Container or Pod basis, meaning we can switch PCI Express topologies on
a per Container or Pod basis, meaning we can switch PCI Express `topologies` on
each run of an application.
## Hypervisor Resource Limits

View File

@@ -28,10 +28,10 @@ __Steps from the Developer Guide:__
__SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/kernel/
$ ./build-kernel.sh -a x86_64 -x snp setup
$ ./build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./build-kernel.sh -x snp install
$ pushd kata-containers/tools/packaging/
$ ./kernel/build-kernel.sh -a x86_64 -x snp setup
$ ./kernel/build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./kernel/build-kernel.sh -x snp install
$ popd
```
- Build a current OVMF capable of SEV-SNP:

View File

@@ -27,6 +27,8 @@ $ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
uid: $(uuidgen)
namespace: default
EOF
$ cat << EOF > "${container_yaml}"
metadata:

View File

@@ -139,12 +139,12 @@ By default the CNI plugin binaries is installed under `/opt/cni/bin` (in package
EOF
```
## Allow pods to run in the master node
## Allow pods to run in the control-plane node
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```
## Create runtime class for Kata Containers

View File

@@ -19,12 +19,14 @@ This document requires the presence of Kata Containers on your system. Install u
## Install AWS Firecracker
Kata Containers only support AWS Firecracker v0.23.4 ([yet](https://github.com/kata-containers/kata-containers/pull/1519)).
For information about the supported version of Firecracker, see the Kata Containers
[`versions.yaml`](../../versions.yaml).
To install Firecracker we need to get the `firecracker` and `jailer` binaries:
```bash
$ release_url="https://github.com/firecracker-microvm/firecracker/releases"
$ version="v0.23.1"
$ version=$(yq read <kata-repository>/versions.yaml assets.hypervisor.firecracker.version)
$ arch=`uname -m`
$ curl ${release_url}/download/${version}/firecracker-${version}-${arch} -o firecracker
$ curl ${release_url}/download/${version}/jailer-${version}-${arch} -o jailer

View File

@@ -32,6 +32,7 @@ The `nydus-sandbox.yaml` looks like below:
metadata:
attempt: 1
name: nydus-sandbox
uid: nydus-uid
namespace: default
log_directory: /tmp
linux:

View File

@@ -29,7 +29,7 @@ Then you can build and install the guest kernel image as shown [here](../../tool
## Run a Kata Container utilizing `virtio-mem`
Use following command to enable memory overcommitment of a Linux kernel. Because QEMU `virtio-mem` device need to allocate a lot of memory.
Use following command to enable memory over-commitment of a Linux kernel. Because QEMU `virtio-mem` device need to allocate a lot of memory.
```
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
```
@@ -42,6 +42,8 @@ $ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
uid: $(uuidgen)
namespace: default
EOF
$ cat << EOF > "${container_yaml}"
metadata:

View File

@@ -115,11 +115,11 @@ $ sudo kubeadm init --ignore-preflight-errors=all --config kubeadm-config.yaml
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```
### Allow pods to run in the master node
### Allow pods to run in the control-plane node
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```
### Create runtime class for Kata Containers

View File

@@ -91,7 +91,7 @@ Before you install Kata Containers, check that your Minikube is operating. On yo
$ kubectl get nodes
```
You should see your `master` node listed as being `Ready`.
You should see your `control-plane` node listed as being `Ready`.
Check you have virtualization enabled inside your Minikube. The following should return
a number larger than `0` if you have either of the `vmx` or `svm` nested virtualization features

1074
src/agent/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -8,7 +8,7 @@ license = "Apache-2.0"
[dependencies]
oci = { path = "../libs/oci" }
rustjail = { path = "rustjail" }
protocols = { path = "../libs/protocols", features = ["async"] }
protocols = { path = "../libs/protocols", features = ["async", "with-serde"] }
lazy_static = "1.3.0"
ttrpc = { version = "0.7.1", features = ["async"], default-features = false }
protobuf = "3.2.0"
@@ -67,6 +67,12 @@ serde = { version = "1.0.129", features = ["derive"] }
toml = "0.5.8"
clap = { version = "3.0.1", features = ["derive"] }
# Communication with the OPA service
http = { version = "0.2.8", optional = true }
reqwest = { version = "0.11.14", optional = true }
# The "vendored" feature for openssl is required for musl build
openssl = { version = "0.10.54", features = ["vendored"], optional = true }
[dev-dependencies]
tempfile = "3.1.0"
test-utils = { path = "../libs/test-utils" }
@@ -83,6 +89,7 @@ lto = true
[features]
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
agent-policy = ["http", "openssl", "reqwest"]
[[bin]]
name = "kata-agent"

View File

@@ -33,6 +33,14 @@ ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
##VAR AGENT_POLICY=yes|no define if agent enables the policy feature
AGENT_POLICY ?= no
# Enable the policy feature of rust build
ifeq ($(AGENT_POLICY),yes)
override EXTRA_RUSTFEATURES += agent-policy
endif
include ../../utils.mk
ifeq ($(ARCH), ppc64le)
@@ -54,7 +62,7 @@ endif
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
##VAR DESTDIR=<path> is a directory prepended to each installed target file
DESTDIR :=
DESTDIR ?=
##VAR BINDIR=<path> is a directory for installing executable programs
BINDIR := /usr/bin
@@ -140,7 +148,7 @@ vendor:
#TARGET test: run cargo tests
test:
test: $(GENERATED_FILES)
@cargo test --all --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture
##TARGET check: run test

View File

@@ -34,7 +34,7 @@ futures = "0.3.17"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.3.0", optional = true }
zbus = "2.3.0"
zbus = "3.12.0"
bit-vec= "0.6.3"
xattr = "0.2.3"

View File

@@ -6,7 +6,10 @@
pub const DEFAULT_SLICE: &str = "system.slice";
pub const SLICE_SUFFIX: &str = ".slice";
pub const SCOPE_SUFFIX: &str = ".scope";
pub const UNIT_MODE: &str = "replace";
pub const WHO_ENUM_ALL: &str = "all";
pub const SIGNAL_KILL: i32 = nix::sys::signal::SIGKILL as i32;
pub const UNIT_MODE_REPLACE: &str = "replace";
pub const NO_SUCH_UNIT_ERROR: &str = "org.freedesktop.systemd1.NoSuchUnit";
pub type Properties<'a> = Vec<(&'a str, zbus::zvariant::Value<'a>)>;

View File

@@ -1,56 +1,50 @@
// Copyright 2021-2022 Kata Contributors
// Copyright 2021-2023 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use std::vec;
use super::common::CgroupHierarchy;
use super::common::{Properties, SLICE_SUFFIX, UNIT_MODE};
use super::common::{
CgroupHierarchy, Properties, NO_SUCH_UNIT_ERROR, SIGNAL_KILL, SLICE_SUFFIX, UNIT_MODE_REPLACE,
WHO_ENUM_ALL,
};
use super::interface::system::ManagerProxyBlocking as SystemManager;
use anyhow::{Context, Result};
use anyhow::{anyhow, Context, Result};
use zbus::zvariant::Value;
pub trait SystemdInterface {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()>;
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()>;
fn stop_unit(&self, unit_name: &str) -> Result<()>;
fn start_unit(&self, pid: i32, parent: &str, cg_hierarchy: &CgroupHierarchy) -> Result<()>;
fn set_properties(&self, properties: &Properties) -> Result<()>;
fn kill_unit(&self) -> Result<()>;
fn freeze_unit(&self) -> Result<()>;
fn thaw_unit(&self) -> Result<()>;
fn add_process(&self, pid: i32) -> Result<()>;
fn get_version(&self) -> Result<String>;
fn unit_exists(&self, unit_name: &str) -> Result<bool>;
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()>;
fn unit_exists(&self) -> Result<bool>;
}
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct DBusClient {}
pub struct DBusClient {
unit_name: String,
}
impl DBusClient {
pub fn new(unit_name: String) -> Self {
Self { unit_name }
}
fn build_proxy(&self) -> Result<SystemManager<'static>> {
let connection =
zbus::blocking::Connection::system().context("Establishing a D-Bus connection")?;
let proxy = SystemManager::new(&connection).context("Building a D-Bus proxy manager")?;
Ok(proxy)
}
}
impl SystemdInterface for DBusClient {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()> {
fn start_unit(&self, pid: i32, parent: &str, cg_hierarchy: &CgroupHierarchy) -> Result<()> {
let proxy = self.build_proxy()?;
// enable CPUAccounting & MemoryAccounting & (Block)IOAccounting by default
@@ -68,7 +62,7 @@ impl SystemdInterface for DBusClient {
CgroupHierarchy::Unified => properties.push(("BlockIOAccounting", Value::Bool(true))),
}
if unit_name.ends_with(SLICE_SUFFIX) {
if self.unit_name.ends_with(SLICE_SUFFIX) {
properties.push(("Wants", Value::Str(parent.into())));
} else {
properties.push(("Slice", Value::Str(parent.into())));
@@ -76,27 +70,57 @@ impl SystemdInterface for DBusClient {
}
proxy
.start_transient_unit(unit_name, UNIT_MODE, &properties, &[])
.with_context(|| format!("failed to start transient unit {}", unit_name))?;
Ok(())
}
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.set_unit_properties(unit_name, true, properties)
.with_context(|| format!("failed to set unit properties {}", unit_name))?;
.start_transient_unit(&self.unit_name, UNIT_MODE_REPLACE, &properties, &[])
.context(format!("failed to start transient unit {}", self.unit_name))?;
Ok(())
}
fn stop_unit(&self, unit_name: &str) -> Result<()> {
fn set_properties(&self, properties: &Properties) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.stop_unit(unit_name, UNIT_MODE)
.with_context(|| format!("failed to stop unit {}", unit_name))?;
.set_unit_properties(&self.unit_name, true, properties)
.context(format!("failed to set unit {} properties", self.unit_name))?;
Ok(())
}
fn kill_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.kill_unit(&self.unit_name, WHO_ENUM_ALL, SIGNAL_KILL)
.or_else(|e| match e {
zbus::Error::MethodError(error_name, _, _)
if error_name.as_str() == NO_SUCH_UNIT_ERROR =>
{
Ok(())
}
_ => Err(e),
})
.context(format!("failed to kill unit {}", self.unit_name))?;
Ok(())
}
fn freeze_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.freeze_unit(&self.unit_name)
.context(format!("failed to freeze unit {}", self.unit_name))?;
Ok(())
}
fn thaw_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.thaw_unit(&self.unit_name)
.context(format!("failed to thaw unit {}", self.unit_name))?;
Ok(())
}
@@ -105,24 +129,37 @@ impl SystemdInterface for DBusClient {
let systemd_version = proxy
.version()
.with_context(|| "failed to get systemd version".to_string())?;
.context("failed to get systemd version".to_string())?;
Ok(systemd_version)
}
fn unit_exists(&self, unit_name: &str) -> Result<bool> {
let proxy = self
.build_proxy()
.with_context(|| format!("Checking if systemd unit {} exists", unit_name))?;
fn unit_exists(&self) -> Result<bool> {
let proxy = self.build_proxy()?;
Ok(proxy.get_unit(unit_name).is_ok())
match proxy.get_unit(&self.unit_name) {
Ok(_) => Ok(true),
Err(zbus::Error::MethodError(error_name, _, _))
if error_name.as_str() == NO_SUCH_UNIT_ERROR =>
{
Ok(false)
}
Err(e) => Err(anyhow!(format!(
"failed to check if unit {} exists: {:?}",
self.unit_name, e
))),
}
}
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()> {
fn add_process(&self, pid: i32) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.attach_processes_to_unit(unit_name, "/", &[pid as u32])
.with_context(|| format!("failed to add process {}", unit_name))?;
.attach_processes_to_unit(&self.unit_name, "/", &[pid as u32])
.context(format!(
"failed to add process into unit {}",
self.unit_name
))?;
Ok(())
}

View File

@@ -1,4 +1,4 @@
// Copyright 2021-2022 Kata Contributors
// Copyright 2021-2023 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
@@ -8,7 +8,7 @@
//! # DBus interface proxy for: `org.freedesktop.systemd1.Manager`
//!
//! This code was generated by `zbus-xmlgen` `2.0.1` from DBus introspection data.
//! This code was generated by `zbus-xmlgen` `3.1.1` from DBus introspection data.
//! Source: `Interface '/org/freedesktop/systemd1' from service 'org.freedesktop.systemd1' on system bus`.
//!
//! You may prefer to adapt it, instead of using it verbatim.
@@ -189,12 +189,14 @@ trait Manager {
) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitByInvocationID method
#[dbus_proxy(name = "GetUnitByInvocationID")]
fn get_unit_by_invocation_id(
&self,
invocation_id: &[u8],
) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitByPID method
#[dbus_proxy(name = "GetUnitByPID")]
fn get_unit_by_pid(&self, pid: u32) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitFileLinks method
@@ -210,6 +212,7 @@ trait Manager {
fn halt(&self) -> zbus::Result<()>;
/// KExec method
#[dbus_proxy(name = "KExec")]
fn kexec(&self) -> zbus::Result<()>;
/// KillUnit method
@@ -330,6 +333,7 @@ trait Manager {
fn lookup_dynamic_user_by_name(&self, name: &str) -> zbus::Result<u32>;
/// LookupDynamicUserByUID method
#[dbus_proxy(name = "LookupDynamicUserByUID")]
fn lookup_dynamic_user_by_uid(&self, uid: u32) -> zbus::Result<String>;
/// MaskUnitFiles method
@@ -571,139 +575,139 @@ trait Manager {
fn ctrl_alt_del_burst_action(&self) -> zbus::Result<String>;
/// DefaultBlockIOAccounting property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultBlockIOAccounting")]
fn default_block_ioaccounting(&self) -> zbus::Result<bool>;
/// DefaultCPUAccounting property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultCPUAccounting")]
fn default_cpuaccounting(&self) -> zbus::Result<bool>;
/// DefaultLimitAS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitAS")]
fn default_limit_as(&self) -> zbus::Result<u64>;
/// DefaultLimitASSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitASSoft")]
fn default_limit_assoft(&self) -> zbus::Result<u64>;
/// DefaultLimitCORE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCORE")]
fn default_limit_core(&self) -> zbus::Result<u64>;
/// DefaultLimitCORESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCORESoft")]
fn default_limit_coresoft(&self) -> zbus::Result<u64>;
/// DefaultLimitCPU property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCPU")]
fn default_limit_cpu(&self) -> zbus::Result<u64>;
/// DefaultLimitCPUSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCPUSoft")]
fn default_limit_cpusoft(&self) -> zbus::Result<u64>;
/// DefaultLimitDATA property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitDATA")]
fn default_limit_data(&self) -> zbus::Result<u64>;
/// DefaultLimitDATASoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitDATASoft")]
fn default_limit_datasoft(&self) -> zbus::Result<u64>;
/// DefaultLimitFSIZE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitFSIZE")]
fn default_limit_fsize(&self) -> zbus::Result<u64>;
/// DefaultLimitFSIZESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitFSIZESoft")]
fn default_limit_fsizesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitLOCKS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitLOCKS")]
fn default_limit_locks(&self) -> zbus::Result<u64>;
/// DefaultLimitLOCKSSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitLOCKSSoft")]
fn default_limit_lockssoft(&self) -> zbus::Result<u64>;
/// DefaultLimitMEMLOCK property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMEMLOCK")]
fn default_limit_memlock(&self) -> zbus::Result<u64>;
/// DefaultLimitMEMLOCKSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMEMLOCKSoft")]
fn default_limit_memlocksoft(&self) -> zbus::Result<u64>;
/// DefaultLimitMSGQUEUE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMSGQUEUE")]
fn default_limit_msgqueue(&self) -> zbus::Result<u64>;
/// DefaultLimitMSGQUEUESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMSGQUEUESoft")]
fn default_limit_msgqueuesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNICE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNICE")]
fn default_limit_nice(&self) -> zbus::Result<u64>;
/// DefaultLimitNICESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNICESoft")]
fn default_limit_nicesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNOFILE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNOFILE")]
fn default_limit_nofile(&self) -> zbus::Result<u64>;
/// DefaultLimitNOFILESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNOFILESoft")]
fn default_limit_nofilesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNPROC property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNPROC")]
fn default_limit_nproc(&self) -> zbus::Result<u64>;
/// DefaultLimitNPROCSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNPROCSoft")]
fn default_limit_nprocsoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRSS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRSS")]
fn default_limit_rss(&self) -> zbus::Result<u64>;
/// DefaultLimitRSSSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRSSSoft")]
fn default_limit_rsssoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRTPRIO property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTPRIO")]
fn default_limit_rtprio(&self) -> zbus::Result<u64>;
/// DefaultLimitRTPRIOSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTPRIOSoft")]
fn default_limit_rtpriosoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRTTIME property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTTIME")]
fn default_limit_rttime(&self) -> zbus::Result<u64>;
/// DefaultLimitRTTIMESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTTIMESoft")]
fn default_limit_rttimesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitSIGPENDING property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSIGPENDING")]
fn default_limit_sigpending(&self) -> zbus::Result<u64>;
/// DefaultLimitSIGPENDINGSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSIGPENDINGSoft")]
fn default_limit_sigpendingsoft(&self) -> zbus::Result<u64>;
/// DefaultLimitSTACK property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSTACK")]
fn default_limit_stack(&self) -> zbus::Result<u64>;
/// DefaultLimitSTACKSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSTACKSoft")]
fn default_limit_stacksoft(&self) -> zbus::Result<u64>;
/// DefaultMemoryAccounting property
@@ -711,11 +715,11 @@ trait Manager {
fn default_memory_accounting(&self) -> zbus::Result<bool>;
/// DefaultOOMPolicy property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultOOMPolicy")]
fn default_oompolicy(&self) -> zbus::Result<String>;
/// DefaultRestartUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultRestartUSec")]
fn default_restart_usec(&self) -> zbus::Result<u64>;
/// DefaultStandardError property
@@ -731,7 +735,7 @@ trait Manager {
fn default_start_limit_burst(&self) -> zbus::Result<u32>;
/// DefaultStartLimitIntervalUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultStartLimitIntervalUSec")]
fn default_start_limit_interval_usec(&self) -> zbus::Result<u64>;
/// DefaultTasksAccounting property
@@ -743,19 +747,19 @@ trait Manager {
fn default_tasks_max(&self) -> zbus::Result<u64>;
/// DefaultTimeoutAbortUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutAbortUSec")]
fn default_timeout_abort_usec(&self) -> zbus::Result<u64>;
/// DefaultTimeoutStartUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutStartUSec")]
fn default_timeout_start_usec(&self) -> zbus::Result<u64>;
/// DefaultTimeoutStopUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutStopUSec")]
fn default_timeout_stop_usec(&self) -> zbus::Result<u64>;
/// DefaultTimerAccuracyUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimerAccuracyUSec")]
fn default_timer_accuracy_usec(&self) -> zbus::Result<u64>;
/// Environment property
@@ -803,65 +807,64 @@ trait Manager {
fn generators_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsFinishTimestamp")]
fn init_rdgenerators_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsFinishTimestampMonotonic")]
fn init_rdgenerators_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsStartTimestamp")]
fn init_rdgenerators_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsStartTimestampMonotonic")]
fn init_rdgenerators_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDSecurityFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityFinishTimestamp")]
fn init_rdsecurity_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDSecurityFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityFinishTimestampMonotonic")]
fn init_rdsecurity_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDSecurityStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityStartTimestamp")]
fn init_rdsecurity_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDSecurityStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityStartTimestampMonotonic")]
fn init_rdsecurity_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDTimestamp")]
fn init_rdtimestamp(&self) -> zbus::Result<u64>;
/// InitRDTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDTimestampMonotonic")]
fn init_rdtimestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadFinishTimestamp")]
fn init_rdunits_load_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadFinishTimestampMonotonic")]
fn init_rdunits_load_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadStartTimestamp")]
fn init_rdunits_load_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadStartTimestampMonotonic")]
fn init_rdunits_load_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// KExecWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "KExecWatchdogUSec")]
fn kexec_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_kexec_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// KernelTimestamp property
@@ -883,33 +886,31 @@ trait Manager {
/// LogLevel property
#[dbus_proxy(property)]
fn log_level(&self) -> zbus::Result<String>;
#[dbus_proxy(property)]
fn set_log_level(&self, value: &str) -> zbus::Result<()>;
/// LogTarget property
#[dbus_proxy(property)]
fn log_target(&self) -> zbus::Result<String>;
#[dbus_proxy(property)]
fn set_log_target(&self, value: &str) -> zbus::Result<()>;
/// NFailedJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NFailedJobs")]
fn nfailed_jobs(&self) -> zbus::Result<u32>;
/// NFailedUnits property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NFailedUnits")]
fn nfailed_units(&self) -> zbus::Result<u32>;
/// NInstalledJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NInstalledJobs")]
fn ninstalled_jobs(&self) -> zbus::Result<u32>;
/// NJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NJobs")]
fn njobs(&self) -> zbus::Result<u32>;
/// NNames property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NNames")]
fn nnames(&self) -> zbus::Result<u32>;
/// Progress property
@@ -917,15 +918,13 @@ trait Manager {
fn progress(&self) -> zbus::Result<f64>;
/// RebootWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "RebootWatchdogUSec")]
fn reboot_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_reboot_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// RuntimeWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "RuntimeWatchdogUSec")]
fn runtime_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_runtime_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// SecurityFinishTimestamp property
@@ -947,7 +946,6 @@ trait Manager {
/// ServiceWatchdogs property
#[dbus_proxy(property)]
fn service_watchdogs(&self) -> zbus::Result<bool>;
#[dbus_proxy(property)]
fn set_service_watchdogs(&self, value: bool) -> zbus::Result<()>;
/// ShowStatus property
@@ -963,7 +961,7 @@ trait Manager {
fn tainted(&self) -> zbus::Result<String>;
/// TimerSlackNSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "TimerSlackNSec")]
fn timer_slack_nsec(&self) -> zbus::Result<u64>;
/// UnitPath property

View File

@@ -5,7 +5,7 @@
use crate::cgroups::Manager as CgroupManager;
use crate::protocols::agent::CgroupStats;
use anyhow::Result;
use anyhow::{anyhow, Result};
use cgroups::freezer::FreezerState;
use libc::{self, pid_t};
use oci::LinuxResources;
@@ -29,7 +29,6 @@ pub struct Manager {
pub mounts: HashMap<String, String>,
pub cgroups_path: CgroupsPath,
pub cpath: String,
pub unit_name: String,
// dbus client for set properties
dbus_client: DBusClient,
// fs manager for get properties
@@ -40,14 +39,12 @@ pub struct Manager {
impl CgroupManager for Manager {
fn apply(&self, pid: pid_t) -> Result<()> {
let unit_name = self.unit_name.as_str();
if self.dbus_client.unit_exists(unit_name)? {
self.dbus_client.add_process(pid, self.unit_name.as_str())?;
if self.dbus_client.unit_exists()? {
self.dbus_client.add_process(pid)?;
} else {
self.dbus_client.start_unit(
(pid as u32).try_into().unwrap(),
self.cgroups_path.slice.as_str(),
self.unit_name.as_str(),
&self.cg_hierarchy,
)?;
}
@@ -66,8 +63,7 @@ impl CgroupManager for Manager {
Pids::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
CpuSet::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
self.dbus_client
.set_properties(self.unit_name.as_str(), &properties)?;
self.dbus_client.set_properties(&properties)?;
Ok(())
}
@@ -77,11 +73,15 @@ impl CgroupManager for Manager {
}
fn freeze(&self, state: FreezerState) -> Result<()> {
self.fs_manager.freeze(state)
match state {
FreezerState::Thawed => self.dbus_client.thaw_unit(),
FreezerState::Frozen => self.dbus_client.freeze_unit(),
_ => Err(anyhow!("Invalid FreezerState")),
}
}
fn destroy(&mut self) -> Result<()> {
self.dbus_client.stop_unit(self.unit_name.as_str())?;
self.dbus_client.kill_unit()?;
self.fs_manager.destroy()
}
@@ -120,8 +120,7 @@ impl Manager {
mounts: fs_manager.mounts.clone(),
cgroups_path,
cpath,
unit_name,
dbus_client: DBusClient {},
dbus_client: DBusClient::new(unit_name),
fs_manager,
cg_hierarchy: if cgroups::hierarchies::is_cgroup2_unified_mode() {
CgroupHierarchy::Unified

View File

@@ -80,6 +80,7 @@ const CLOG_FD: &str = "CLOG_FD";
const FIFO_FD: &str = "FIFO_FD";
const HOME_ENV_KEY: &str = "HOME";
const PIDNS_FD: &str = "PIDNS_FD";
const PIDNS_ENABLED: &str = "PIDNS_ENABLED";
const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
#[derive(Debug)]
@@ -280,6 +281,17 @@ pub struct SyncPc {
pid: pid_t,
}
#[derive(Debug, Clone)]
pub struct PidNs {
enabled: bool,
fd: Option<i32>,
}
impl PidNs {
pub fn new(enabled: bool, fd: Option<i32>) -> Self {
Self { enabled, fd }
}
}
pub trait Container: BaseContainer {
fn pause(&mut self) -> Result<()>;
fn resume(&mut self) -> Result<()>;
@@ -339,16 +351,20 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let crfd = std::env::var(CRFD_FD)?.parse::<i32>().unwrap();
let cfd_log = std::env::var(CLOG_FD)?.parse::<i32>().unwrap();
// get the pidns fd from parent, if parent had passed the pidns fd,
// then get it and join in this pidns; otherwise, create a new pidns
// by unshare from the parent pidns.
match std::env::var(PIDNS_FD) {
Ok(fd) => {
let pidns_fd = fd.parse::<i32>().context("get parent pidns fd")?;
sched::setns(pidns_fd, CloneFlags::CLONE_NEWPID).context("failed to join pidns")?;
let _ = unistd::close(pidns_fd);
if std::env::var(PIDNS_ENABLED)?.eq(format!("{}", true).as_str()) {
// get the pidns fd from parent, if parent had passed the pidns fd,
// then get it and join in this pidns; otherwise, create a new pidns
// by unshare from the parent pidns.
match std::env::var(PIDNS_FD) {
Ok(fd) => {
let pidns_fd = fd.parse::<i32>().context("get parent pidns fd")?;
sched::setns(pidns_fd, CloneFlags::CLONE_NEWPID).context("failed to join pidns")?;
let _ = unistd::close(pidns_fd);
}
Err(_e) => {
sched::unshare(CloneFlags::CLONE_NEWPID)?;
}
}
Err(_e) => sched::unshare(CloneFlags::CLONE_NEWPID)?,
}
match unsafe { fork() } {
@@ -983,9 +999,13 @@ impl BaseContainer for LinuxContainer {
}
let pidns = get_pid_namespace(&self.logger, linux)?;
#[cfg(not(feature = "standard-oci-runtime"))]
if !pidns.enabled {
return Err(anyhow!("cannot find the pid ns"));
}
defer!(if let Some(pid) = pidns {
let _ = unistd::close(pid);
defer!(if let Some(fd) = pidns.fd {
let _ = unistd::close(fd);
});
let exec_path = std::env::current_exe()?;
@@ -1008,14 +1028,15 @@ impl BaseContainer for LinuxContainer {
.env(CRFD_FD, format!("{}", crfd))
.env(CWFD_FD, format!("{}", cwfd))
.env(CLOG_FD, format!("{}", cfd_log))
.env(CONSOLE_SOCKET_FD, console_name);
.env(CONSOLE_SOCKET_FD, console_name)
.env(PIDNS_ENABLED, format!("{}", pidns.enabled));
if p.init {
child = child.env(FIFO_FD, format!("{}", fifofd));
}
if pidns.is_some() {
child = child.env(PIDNS_FD, format!("{}", pidns.unwrap()));
if pidns.fd.is_some() {
child = child.env(PIDNS_FD, format!("{}", pidns.fd.unwrap()));
}
child.spawn()?;
@@ -1249,11 +1270,11 @@ pub fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> R
Ok(())
}
fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<Option<RawFd>> {
fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<PidNs> {
for ns in &linux.namespaces {
if ns.r#type == "pid" {
if ns.path.is_empty() {
return Ok(None);
return Ok(PidNs::new(true, None));
}
let fd =
@@ -1269,11 +1290,11 @@ fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<Option<RawFd>> {
e
})?;
return Ok(Some(fd));
return Ok(PidNs::new(true, Some(fd)));
}
}
Err(anyhow!("cannot find the pid ns"))
Ok(PidNs::new(false, None))
}
fn is_userns_enabled(linux: &Linux) -> bool {

View File

@@ -423,12 +423,18 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
let mut r = Vec::new();
for d in l.Devices.iter() {
// if the filemode for the device is 0 (unset), use a default value as runc does
let filemode = if d.FileMode != 0 {
Some(d.FileMode)
} else {
Some(0o666)
};
r.push(oci::LinuxDevice {
path: d.Path.clone(),
r#type: d.Type.clone(),
major: d.Major,
minor: d.Minor,
file_mode: Some(d.FileMode),
file_mode: filemode,
uid: Some(d.UID),
gid: Some(d.GID),
});

View File

@@ -5,7 +5,6 @@
use crate::rpc;
use anyhow::{bail, ensure, Context, Result};
use serde::Deserialize;
use std::collections::HashSet;
use std::env;
use std::fs;
use std::str::FromStr;
@@ -52,17 +51,6 @@ const ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM: &str = "unable to parse container p
const ERR_INVALID_CONTAINER_PIPE_SIZE_KEY: &str = "invalid container pipe size key name";
const ERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str = "container pipe size should not be negative";
#[derive(Debug, Default, Deserialize)]
pub struct EndpointsConfig {
pub allowed: Vec<String>,
}
#[derive(Debug, Default)]
pub struct AgentEndpoints {
pub allowed: HashSet<String>,
pub all_allowed: bool,
}
#[derive(Debug)]
pub struct AgentConfig {
pub debug_console: bool,
@@ -75,7 +63,6 @@ pub struct AgentConfig {
pub server_addr: String,
pub unified_cgroup_hierarchy: bool,
pub tracing: bool,
pub endpoints: AgentEndpoints,
pub supports_seccomp: bool,
}
@@ -91,7 +78,6 @@ pub struct AgentConfigBuilder {
pub server_addr: Option<String>,
pub unified_cgroup_hierarchy: Option<bool>,
pub tracing: Option<bool>,
pub endpoints: Option<EndpointsConfig>,
}
macro_rules! config_override {
@@ -151,7 +137,6 @@ impl Default for AgentConfig {
server_addr: format!("{}:{}", VSOCK_ADDR, DEFAULT_AGENT_VSOCK_PORT),
unified_cgroup_hierarchy: false,
tracing: false,
endpoints: Default::default(),
supports_seccomp: rpc::have_seccomp(),
}
}
@@ -182,19 +167,13 @@ impl FromStr for AgentConfig {
config_override!(agent_config_builder, agent_config, unified_cgroup_hierarchy);
config_override!(agent_config_builder, agent_config, tracing);
// Populate the allowed endpoints hash set, if we got any from the config file.
if let Some(endpoints) = agent_config_builder.endpoints {
for ep in endpoints.allowed {
agent_config.endpoints.allowed.insert(ep);
}
}
Ok(agent_config)
}
}
impl AgentConfig {
#[instrument]
#[allow(clippy::redundant_closure_call)]
pub fn from_cmdline(file: &str, args: Vec<String>) -> Result<AgentConfig> {
// If config file specified in the args, generate our config from it
let config_position = args.iter().position(|a| a == "--config" || a == "-c");
@@ -297,9 +276,6 @@ impl AgentConfig {
config.tracing = get_bool_value(&name_value)?;
}
// We did not get a configuration file: allow all endpoints.
config.endpoints.all_allowed = true;
Ok(config)
}
@@ -309,10 +285,6 @@ impl AgentConfig {
.with_context(|| format!("Failed to read config file {}", file))?;
AgentConfig::from_str(&config)
}
pub fn is_allowed_endpoint(&self, ep: &str) -> bool {
self.endpoints.all_allowed || self.endpoints.allowed.contains(ep)
}
}
#[instrument]
@@ -1377,26 +1349,13 @@ Caused by:
r#"
dev_mode = true
server_addr = 'vsock://8:2048'
[endpoints]
allowed = ["CreateContainer", "StartContainer"]
"#,
)
.unwrap();
// Verify that the all_allowed flag is false
assert!(!config.endpoints.all_allowed);
// Verify that the override worked
assert!(config.dev_mode);
assert_eq!(config.server_addr, "vsock://8:2048");
assert_eq!(
config.endpoints.allowed,
vec!["CreateContainer".to_string(), "StartContainer".to_string()]
.iter()
.cloned()
.collect()
);
// Verify that the default values are valid
assert_eq!(config.hotplug_timeout, DEFAULT_HOTPLUG_TIMEOUT);

View File

@@ -35,9 +35,9 @@ const VM_ROOTFS: &str = "/";
const BLOCK: &str = "block";
pub const DRIVER_9P_TYPE: &str = "9p";
pub const DRIVER_VIRTIOFS_TYPE: &str = "virtio-fs";
pub const DRIVER_BLK_TYPE: &str = "blk";
pub const DRIVER_BLK_PCI_TYPE: &str = "blk";
pub const DRIVER_BLK_CCW_TYPE: &str = "blk-ccw";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_BLK_MMIO_TYPE: &str = "mmioblk";
pub const DRIVER_SCSI_TYPE: &str = "scsi";
pub const DRIVER_NVDIMM_TYPE: &str = "nvdimm";
pub const DRIVER_EPHEMERAL_TYPE: &str = "ephemeral";
@@ -935,9 +935,9 @@ async fn add_device(device: &Device, sandbox: &Arc<Mutex<Sandbox>>) -> Result<Sp
}
match device.type_.as_str() {
DRIVER_BLK_TYPE => virtio_blk_device_handler(device, sandbox).await,
DRIVER_BLK_PCI_TYPE => virtio_blk_device_handler(device, sandbox).await,
DRIVER_BLK_CCW_TYPE => virtio_blk_ccw_device_handler(device, sandbox).await,
DRIVER_MMIO_BLK_TYPE => virtiommio_blk_device_handler(device, sandbox).await,
DRIVER_BLK_MMIO_TYPE => virtiommio_blk_device_handler(device, sandbox).await,
DRIVER_NVDIMM_TYPE => virtio_nvdimm_device_handler(device, sandbox).await,
DRIVER_SCSI_TYPE => virtio_scsi_device_handler(device, sandbox).await,
DRIVER_VFIO_PCI_GK_TYPE | DRIVER_VFIO_PCI_TYPE => {
@@ -1467,6 +1467,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_virtio_blk_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "vda";
@@ -1551,6 +1552,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_scsi_block_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "sda";
@@ -1581,6 +1583,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_vfio_matcher() {
let grpa = IommuGroup(1);
let grpb = IommuGroup(22);
@@ -1602,6 +1605,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_mmio_block_matcher() {
let devname_a = "vda";
let devname_b = "vdb";

View File

@@ -48,6 +48,7 @@ mod pci;
pub mod random;
mod sandbox;
mod signal;
mod storage;
mod uevent;
mod util;
mod version;
@@ -73,6 +74,9 @@ use tokio::{
mod rpc;
mod tracer;
#[cfg(feature = "agent-policy")]
mod policy;
cfg_if! {
if #[cfg(target_arch = "s390x")] {
mod ap;
@@ -90,6 +94,11 @@ lazy_static! {
AgentConfig::from_cmdline("/proc/cmdline", env::args().collect()).unwrap();
}
#[cfg(feature = "agent-policy")]
lazy_static! {
static ref AGENT_POLICY: Mutex<policy::AgentPolicy> = Mutex::new(AgentPolicy::new());
}
#[derive(Parser)]
// The default clap version info doesn't match our form, so we need to override it
#[clap(global_setting(AppSettings::DisableVersionFlag))]
@@ -326,6 +335,19 @@ async fn start_sandbox(
s.rtnl.handle_localhost().await?;
}
// - When init_mode is true, enabling the localhost link during the
// handle_localhost call above is required before starting OPA with the
// initialize_policy call below.
// - When init_mode is false, the Policy could be initialized earlier,
// because initialize_policy doesn't start OPA. OPA is started by
// systemd after localhost has been enabled.
#[cfg(feature = "agent-policy")]
if let Err(e) = initialize_policy(init_mode).await {
error!(logger, "Failed to initialize agent policy: {:?}", e);
// Continuing execution without a security policy could be dangerous.
std::process::abort();
}
let sandbox = Arc::new(Mutex::new(s));
let signal_handler_task = tokio::spawn(setup_signal_handler(
@@ -387,6 +409,18 @@ fn init_agent_as_init(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result
Ok(())
}
#[cfg(feature = "agent-policy")]
async fn initialize_policy(init_mode: bool) -> Result<()> {
let opa_addr = "localhost:8181";
let agent_policy_path = "/agent_policy";
let default_agent_policy = "/etc/kata-opa/default-policy.rego";
AGENT_POLICY
.lock()
.await
.initialize(init_mode, opa_addr, agent_policy_path, default_agent_policy)
.await
}
// The Rust standard library had suppressed the default SIGPIPE behavior,
// see https://github.com/rust-lang/rust/pull/13158.
// Since the parent's signal handler would be inherited by it's child process,
@@ -401,6 +435,9 @@ fn reset_sigpipe() {
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(feature = "agent-policy")]
use crate::policy::AgentPolicy;
#[cfg(test)]
mod tests {
use super::*;

File diff suppressed because it is too large Load Diff

View File

@@ -7,14 +7,14 @@ use anyhow::{anyhow, Result};
use nix::mount::MsFlags;
use nix::sched::{unshare, CloneFlags};
use nix::unistd::{getpid, gettid};
use slog::Logger;
use std::fmt;
use std::fs;
use std::fs::File;
use std::path::{Path, PathBuf};
use tracing::instrument;
use crate::mount::{baremount, FLAGS};
use slog::Logger;
use crate::mount::baremount;
const PERSISTENT_NS_DIR: &str = "/var/run/sandbox-ns";
pub const NSTYPEIPC: &str = "ipc";
@@ -116,15 +116,7 @@ impl Namespace {
// Bind mount the new namespace from the current thread onto the mount point to persist it.
let mut flags = MsFlags::empty();
if let Some(x) = FLAGS.get("rbind") {
let (clear, f) = *x;
if clear {
flags &= !f;
} else {
flags |= f;
}
};
flags |= MsFlags::MS_BIND | MsFlags::MS_REC;
baremount(source, destination, "none", flags, "", &logger).map_err(|e| {
anyhow!(

View File

@@ -29,7 +29,7 @@ impl Network {
}
}
pub fn setup_guest_dns(logger: Logger, dns_list: Vec<String>) -> Result<()> {
pub fn setup_guest_dns(logger: Logger, dns_list: &[String]) -> Result<()> {
do_setup_guest_dns(
logger,
dns_list,
@@ -38,7 +38,7 @@ pub fn setup_guest_dns(logger: Logger, dns_list: Vec<String>) -> Result<()> {
)
}
fn do_setup_guest_dns(logger: Logger, dns_list: Vec<String>, src: &str, dst: &str) -> Result<()> {
fn do_setup_guest_dns(logger: Logger, dns_list: &[String], src: &str, dst: &str) -> Result<()> {
let logger = logger.new(o!( "subsystem" => "network"));
if dns_list.is_empty() {
@@ -124,7 +124,7 @@ mod tests {
.expect("failed to write file contents");
// call do_setup_guest_dns
let result = do_setup_guest_dns(logger, dns.clone(), src_filename, dst_filename);
let result = do_setup_guest_dns(logger, &dns, src_filename, dst_filename);
assert!(result.is_ok(), "result should be ok, but {:?}", result);

267
src/agent/src/policy.rs Normal file
View File

@@ -0,0 +1,267 @@
// Copyright (c) 2023 Microsoft Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::{bail, Result};
use serde::{Deserialize, Serialize};
use slog::Drain;
use tokio::io::AsyncWriteExt;
use tokio::time::{sleep, Duration};
static EMPTY_JSON_INPUT: &str = "{\"input\":{}}";
static OPA_DATA_PATH: &str = "/data";
static OPA_POLICIES_PATH: &str = "/policies";
static POLICY_LOG_FILE: &str = "/tmp/policy.txt";
/// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger()
};
}
/// Example of HTTP response from OPA: {"result":true}
#[derive(Debug, Serialize, Deserialize)]
struct AllowResponse {
result: bool,
}
/// Singleton policy object.
#[derive(Debug, Default)]
pub struct AgentPolicy {
/// When true policy errors are ignored, for debug purposes.
allow_failures: bool,
/// OPA path used to query if an Agent gRPC request should be allowed.
/// The request name (e.g., CreateContainerRequest) must be added to
/// this path.
query_path: String,
/// OPA path used to add or delete a rego format Policy.
policy_path: String,
/// Client used to connect a single time to the OPA service and reused
/// for all the future communication with OPA.
opa_client: Option<reqwest::Client>,
/// "/tmp/policy.txt" log file for policy activity.
log_file: Option<tokio::fs::File>,
}
impl AgentPolicy {
/// Create AgentPolicy object.
pub fn new() -> Self {
Self {
allow_failures: false,
..Default::default()
}
}
/// Wait for OPA to start and connect to it.
pub async fn initialize(
&mut self,
launch_opa: bool,
opa_addr: &str,
policy_name: &str,
default_policy: &str,
) -> Result<()> {
if sl!().is_enabled(slog::Level::Debug) {
self.log_file = Some(
tokio::fs::OpenOptions::new()
.write(true)
.truncate(true)
.create(true)
.open(POLICY_LOG_FILE)
.await?,
);
debug!(sl!(), "policy: log file: {}", POLICY_LOG_FILE);
}
if launch_opa {
start_opa(opa_addr)?;
}
let opa_uri = format!("http://{opa_addr}/v1");
self.query_path = format!("{opa_uri}{OPA_DATA_PATH}{policy_name}/");
self.policy_path = format!("{opa_uri}{OPA_POLICIES_PATH}{policy_name}");
let opa_client = reqwest::Client::builder().http1_only().build()?;
let policy = tokio::fs::read_to_string(default_policy).await?;
// This loop is necessary to get the opa_client connected to the
// OPA service while that service is starting. Future requests to
// OPA are expected to work without retrying, after connecting
// successfully for the first time.
for i in 0..50 {
if i > 0 {
sleep(Duration::from_millis(100)).await;
debug!(sl!(), "policy initialize: PUT failed, retrying");
}
// Set-up the default policy.
if opa_client
.put(&self.policy_path)
.body(policy.clone())
.send()
.await
.is_ok()
{
self.opa_client = Some(opa_client);
// Check if requests causing policy errors should actually
// be allowed. That is an insecure configuration but is
// useful for allowing insecure pods to start, then connect to
// them and inspect Guest logs for the root cause of a failure.
//
// Note that post_query returns Ok(false) in case
// AllowRequestsFailingPolicy was not defined in the policy.
self.allow_failures = self
.post_query("AllowRequestsFailingPolicy", EMPTY_JSON_INPUT)
.await?;
return Ok(());
}
}
bail!("Failed to connect to OPA")
}
/// Ask OPA to check if an API call should be allowed or not.
pub async fn is_allowed_endpoint(&mut self, ep: &str, request: &str) -> bool {
let post_input = format!("{{\"input\":{request}}}");
self.log_opa_input(ep, &post_input).await;
match self.post_query(ep, &post_input).await {
Err(e) => {
debug!(
sl!(),
"policy: failed to query endpoint {}: {:?}. Returning false.", ep, e
);
false
}
Ok(allowed) => allowed,
}
}
/// Replace the Policy in OPA.
pub async fn set_policy(&mut self, policy: &str) -> Result<()> {
if let Some(opa_client) = &mut self.opa_client {
// Delete the old rules.
opa_client.delete(&self.policy_path).send().await?;
// Put the new rules.
opa_client
.put(&self.policy_path)
.body(policy.to_string())
.send()
.await?;
// Check if requests causing policy errors should actually be allowed.
// That is an insecure configuration but is useful for allowing insecure
// pods to start, then connect to them and inspect Guest logs for the
// root cause of a failure.
//
// Note that post_query returns Ok(false) in case
// AllowRequestsFailingPolicy was not defined in the policy.
self.allow_failures = self
.post_query("AllowRequestsFailingPolicy", EMPTY_JSON_INPUT)
.await?;
Ok(())
} else {
bail!("Agent Policy is not initialized")
}
}
// Post query to OPA.
async fn post_query(&mut self, ep: &str, post_input: &str) -> Result<bool> {
debug!(sl!(), "policy check: {ep}");
if let Some(opa_client) = &mut self.opa_client {
let uri = format!("{}{ep}", &self.query_path);
let response = opa_client
.post(uri)
.body(post_input.to_string())
.send()
.await?;
if response.status() != http::StatusCode::OK {
bail!("policy: POST {} response status {}", ep, response.status());
}
let http_response = response.text().await?;
let opa_response: serde_json::Result<AllowResponse> =
serde_json::from_str(&http_response);
match opa_response {
Ok(resp) => {
if !resp.result {
if self.allow_failures {
warn!(
sl!(),
"policy: POST {} response <{}>. Ignoring error!", ep, http_response
);
return Ok(true);
} else {
error!(sl!(), "policy: POST {} response <{}>", ep, http_response);
}
}
Ok(resp.result)
}
Err(_) => {
warn!(
sl!(),
"policy: endpoint {} not found in policy. Returning false.", ep,
);
Ok(false)
}
}
} else {
bail!("Agent Policy is not initialized")
}
}
async fn log_opa_input(&mut self, ep: &str, input: &str) {
if let Some(log_file) = &mut self.log_file {
match ep {
"StatsContainerRequest" | "ReadStreamRequest" | "SetPolicyRequest" => {
// - StatsContainerRequest and ReadStreamRequest are called
// relatively often, so we're not logging them, to avoid
// growing this log file too much.
// - Confidential Containers Policy documents are relatively
// large, so we're not logging them here, for SetPolicyRequest.
// The Policy text can be obtained directly from the pod YAML.
}
_ => {
let log_entry = format!("[\"ep\":\"{ep}\",{input}],\n\n");
if let Err(e) = log_file.write_all(log_entry.as_bytes()).await {
warn!(sl!(), "policy: log_opa_input: write_all failed: {}", e);
} else if let Err(e) = log_file.flush().await {
warn!(sl!(), "policy: log_opa_input: flush failed: {}", e);
}
}
}
}
}
}
fn start_opa(opa_addr: &str) -> Result<()> {
let bin_dirs = vec!["/bin", "/usr/bin", "/usr/local/bin"];
for bin_dir in &bin_dirs {
let opa_path = bin_dir.to_string() + "/opa";
if std::fs::metadata(&opa_path).is_ok() {
// args copied from kata-opa.service.in.
std::process::Command::new(&opa_path)
.arg("run")
.arg("--server")
.arg("--disable-telemetry")
.arg("--addr")
.arg(opa_addr)
.arg("--log-level")
.arg("info")
.spawn()?;
return Ok(());
}
}
bail!("OPA binary not found in {:?}", &bin_dirs);
}

File diff suppressed because it is too large Load Diff

View File

@@ -3,16 +3,20 @@
// SPDX-License-Identifier: Apache-2.0
//
use crate::linux_abi::*;
use crate::mount::{get_mount_fs_type, remove_mounts, TYPE_ROOTFS};
use crate::namespace::Namespace;
use crate::netlink::Handle;
use crate::network::Network;
use crate::pci;
use crate::uevent::{Uevent, UeventMatcher};
use crate::watcher::BindWatcher;
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt::{Debug, Formatter};
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::str::FromStr;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::{thread, time};
use anyhow::{anyhow, Context, Result};
use kata_types::cpu::CpuSet;
use kata_types::mount::StorageDevice;
use libc::pid_t;
use oci::{Hook, Hooks};
use protocols::agent::OnlineCPUMemRequest;
@@ -22,22 +26,69 @@ use rustjail::container::BaseContainer;
use rustjail::container::LinuxContainer;
use rustjail::process::Process;
use slog::Logger;
use std::collections::HashMap;
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use std::{thread, time};
use tokio::sync::mpsc::{channel, Receiver, Sender};
use tokio::sync::oneshot;
use tokio::sync::Mutex;
use tracing::instrument;
use crate::linux_abi::*;
use crate::mount::{get_mount_fs_type, TYPE_ROOTFS};
use crate::namespace::Namespace;
use crate::netlink::Handle;
use crate::network::Network;
use crate::pci;
use crate::storage::StorageDeviceGeneric;
use crate::uevent::{Uevent, UeventMatcher};
use crate::watcher::BindWatcher;
pub const ERR_INVALID_CONTAINER_ID: &str = "Invalid container id";
type UeventWatcher = (Box<dyn UeventMatcher>, oneshot::Sender<Uevent>);
#[derive(Clone)]
pub struct StorageState {
count: Arc<AtomicU32>,
device: Arc<dyn StorageDevice>,
}
impl Debug for StorageState {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
f.debug_struct("StorageState").finish()
}
}
impl StorageState {
fn new() -> Self {
StorageState {
count: Arc::new(AtomicU32::new(1)),
device: Arc::new(StorageDeviceGeneric::default()),
}
}
pub fn from_device(device: Arc<dyn StorageDevice>) -> Self {
Self {
count: Arc::new(AtomicU32::new(1)),
device,
}
}
pub fn path(&self) -> Option<&str> {
self.device.path()
}
pub async fn ref_count(&self) -> u32 {
self.count.load(Ordering::Relaxed)
}
async fn inc_ref_count(&self) {
self.count.fetch_add(1, Ordering::Acquire);
}
async fn dec_and_test_ref_count(&self) -> bool {
self.count.fetch_sub(1, Ordering::AcqRel) == 1
}
}
#[derive(Debug)]
pub struct Sandbox {
pub logger: Logger,
@@ -52,7 +103,7 @@ pub struct Sandbox {
pub shared_utsns: Namespace,
pub shared_ipcns: Namespace,
pub sandbox_pidns: Option<Namespace>,
pub storages: HashMap<String, u32>,
pub storages: HashMap<String, StorageState>,
pub running: bool,
pub no_pivot_root: bool,
pub sender: Option<tokio::sync::oneshot::Sender<i32>>,
@@ -98,85 +149,60 @@ impl Sandbox {
})
}
// set_sandbox_storage sets the sandbox level reference
// counter for the sandbox storage.
// This method also returns a boolean to let
// callers know if the storage already existed or not.
// It will return true if storage is new.
//
// It's assumed that caller is calling this method after
// acquiring a lock on sandbox.
/// Add a new storage object or increase reference count of existing one.
/// The caller may detect new storage object by checking `StorageState.refcount == 1`.
#[instrument]
pub fn set_sandbox_storage(&mut self, path: &str) -> bool {
match self.storages.get_mut(path) {
None => {
self.storages.insert(path.to_string(), 1);
true
pub async fn add_sandbox_storage(&mut self, path: &str) -> StorageState {
match self.storages.entry(path.to_string()) {
Entry::Occupied(e) => {
let state = e.get().clone();
state.inc_ref_count().await;
state
}
Some(count) => {
*count += 1;
false
Entry::Vacant(e) => {
let state = StorageState::new();
e.insert(state.clone());
state
}
}
}
// unset_sandbox_storage will decrement the sandbox storage
// reference counter. If there aren't any containers using
// that sandbox storage, this method will remove the
// storage reference from the sandbox and return 'true' to
// let the caller know that they can clean up the storage
// related directories by calling remove_sandbox_storage
//
// It's assumed that caller is calling this method after
// acquiring a lock on sandbox.
/// Update the storage device associated with a path.
pub fn update_sandbox_storage(
&mut self,
path: &str,
device: Arc<dyn StorageDevice>,
) -> std::result::Result<Arc<dyn StorageDevice>, Arc<dyn StorageDevice>> {
if !self.storages.contains_key(path) {
return Err(device);
}
let state = StorageState::from_device(device);
// Safe to unwrap() because we have just ensured existence of entry.
let state = self.storages.insert(path.to_string(), state).unwrap();
Ok(state.device)
}
/// Decrease reference count and destroy the storage object if reference count reaches zero.
/// Returns `Ok(true)` if the reference count has reached zero and the storage object has been
/// removed.
#[instrument]
pub fn unset_sandbox_storage(&mut self, path: &str) -> Result<bool> {
match self.storages.get_mut(path) {
pub async fn remove_sandbox_storage(&mut self, path: &str) -> Result<bool> {
match self.storages.get(path) {
None => Err(anyhow!("Sandbox storage with path {} not found", path)),
Some(count) => {
*count -= 1;
if *count < 1 {
self.storages.remove(path);
return Ok(true);
Some(state) => {
if state.dec_and_test_ref_count().await {
if let Some(storage) = self.storages.remove(path) {
storage.device.cleanup()?;
}
Ok(true)
} else {
Ok(false)
}
Ok(false)
}
}
}
// remove_sandbox_storage removes the sandbox storage if no
// containers are using that storage.
//
// It's assumed that caller is calling this method after
// acquiring a lock on sandbox.
#[instrument]
pub fn remove_sandbox_storage(&self, path: &str) -> Result<()> {
let mounts = vec![path.to_string()];
remove_mounts(&mounts)?;
// "remove_dir" will fail if the mount point is backed by a read-only filesystem.
// This is the case with the device mapper snapshotter, where we mount the block device directly
// at the underlying sandbox path which was provided from the base RO kataShared path from the host.
if let Err(err) = fs::remove_dir(path) {
warn!(self.logger, "failed to remove dir {}, {:?}", path, err);
}
Ok(())
}
// unset_and_remove_sandbox_storage unsets the storage from sandbox
// and if there are no containers using this storage it will
// remove it from the sandbox.
//
// It's assumed that caller is calling this method after
// acquiring a lock on sandbox.
#[instrument]
pub fn unset_and_remove_sandbox_storage(&mut self, path: &str) -> Result<()> {
if self.unset_sandbox_storage(path)? {
return self.remove_sandbox_storage(path);
}
Ok(())
}
#[instrument]
pub async fn setup_shared_namespaces(&mut self) -> Result<bool> {
// Set up shared IPC namespace
@@ -184,22 +210,18 @@ impl Sandbox {
.get_ipc()
.setup()
.await
.context("Failed to setup persistent IPC namespace")?;
.context("setup persistent IPC namespace")?;
// // Set up shared UTS namespace
self.shared_utsns = Namespace::new(&self.logger)
.get_uts(self.hostname.as_str())
.setup()
.await
.context("Failed to setup persistent UTS namespace")?;
.context("setup persistent UTS namespace")?;
Ok(true)
}
pub fn add_container(&mut self, c: LinuxContainer) {
self.containers.insert(c.id.clone(), c);
}
#[instrument]
pub fn update_shared_pidns(&mut self, c: &LinuxContainer) -> Result<()> {
// Populate the shared pid path only if this is an infra container and
@@ -224,14 +246,18 @@ impl Sandbox {
Ok(())
}
pub fn add_container(&mut self, c: LinuxContainer) {
self.containers.insert(c.id.clone(), c);
}
pub fn get_container(&mut self, id: &str) -> Option<&mut LinuxContainer> {
self.containers.get_mut(id)
}
pub fn find_process(&mut self, pid: pid_t) -> Option<&mut Process> {
for (_, c) in self.containers.iter_mut() {
if c.processes.get(&pid).is_some() {
return c.processes.get_mut(&pid);
if let Some(p) = c.processes.get_mut(&pid) {
return Some(p);
}
}
@@ -280,25 +306,17 @@ impl Sandbox {
let guest_cpuset = rustjail_cgroups::fs::get_guest_cpuset()?;
for (_, ctr) in self.containers.iter() {
let cpu = ctr
.config
.spec
.as_ref()
.unwrap()
.linux
.as_ref()
.unwrap()
.resources
.as_ref()
.unwrap()
.cpu
.as_ref();
let container_cpust = if let Some(c) = cpu { &c.cpus } else { "" };
info!(self.logger, "updating {}", ctr.id.as_str());
ctr.cgroup_manager
.as_ref()
.update_cpuset_path(guest_cpuset.as_str(), container_cpust)?;
if let Some(spec) = ctr.config.spec.as_ref() {
if let Some(linux) = spec.linux.as_ref() {
if let Some(resources) = linux.resources.as_ref() {
if let Some(cpus) = resources.cpu.as_ref() {
info!(self.logger, "updating {}", ctr.id.as_str());
ctr.cgroup_manager
.update_cpuset_path(guest_cpuset.as_str(), &cpus.cpus)?;
}
}
}
}
}
Ok(())
@@ -360,31 +378,28 @@ impl Sandbox {
#[instrument]
pub async fn run_oom_event_monitor(&self, mut rx: Receiver<String>, container_id: String) {
let logger = self.logger.clone();
if self.event_tx.is_none() {
error!(
logger,
"sandbox.event_tx not found in run_oom_event_monitor"
);
return;
}
let tx = self.event_tx.as_ref().unwrap().clone();
let tx = match self.event_tx.as_ref() {
Some(v) => v.clone(),
None => {
error!(
logger,
"sandbox.event_tx not found in run_oom_event_monitor"
);
return;
}
};
tokio::spawn(async move {
loop {
let event = rx.recv().await;
// None means the container has exited,
// and sender in OOM notifier is dropped.
// None means the container has exited, and sender in OOM notifier is dropped.
if event.is_none() {
return;
}
info!(logger, "got an OOM event {:?}", event);
let _ = tx
.send(container_id.clone())
.await
.map_err(|e| error!(logger, "failed to send message: {:?}", e));
if let Err(e) = tx.send(container_id.clone()).await {
error!(logger, "failed to send message: {:?}", e);
}
}
});
}
@@ -397,39 +412,36 @@ fn online_resources(logger: &Logger, path: &str, pattern: &str, num: i32) -> Res
for e in fs::read_dir(path)? {
let entry = e?;
let tmpname = entry.file_name();
let name = tmpname.to_str().unwrap();
let p = entry.path();
if re.is_match(name) {
let file = format!("{}/{}", p.to_str().unwrap(), SYSFS_ONLINE_FILE);
info!(logger, "{}", file.as_str());
let c = fs::read_to_string(file.as_str());
if c.is_err() {
continue;
}
let c = c.unwrap();
if c.trim().contains('0') {
let r = fs::write(file.as_str(), "1");
if r.is_err() {
// Skip direntry which doesn't match the pattern.
match entry.file_name().to_str() {
None => continue,
Some(v) => {
if !re.is_match(v) {
continue;
}
count += 1;
}
};
if num > 0 && count == num {
let p = entry.path().join(SYSFS_ONLINE_FILE);
if let Ok(c) = fs::read_to_string(&p) {
// Try to online the object in offline state.
if c.trim().contains('0') && fs::write(&p, "1").is_ok() && num > 0 {
count += 1;
if count == num {
break;
}
}
}
}
if num > 0 {
return Ok(count);
}
Ok(count)
}
Ok(0)
#[instrument]
fn online_memory(logger: &Logger) -> Result<()> {
online_resources(logger, SYSFS_MEMORY_ONLINE_PATH, r"memory[0-9]+", -1)
.context("online memory resource")?;
Ok(())
}
// max wait for all CPUs to online will use 50 * 100 = 5 seconds.
@@ -473,13 +485,6 @@ fn online_cpus(logger: &Logger, num: i32) -> Result<i32> {
))
}
#[instrument]
fn online_memory(logger: &Logger) -> Result<()> {
online_resources(logger, SYSFS_MEMORY_ONLINE_PATH, r"memory[0-9]+", -1)
.context("online memory resource")?;
Ok(())
}
fn onlined_cpus() -> Result<i32> {
let content =
fs::read_to_string(SYSFS_CPU_ONLINE_PATH).context("read sysfs cpu online file")?;
@@ -524,24 +529,22 @@ mod tests {
let tmpdir_path = tmpdir.path().to_str().unwrap();
// Add a new sandbox storage
let new_storage = s.set_sandbox_storage(tmpdir_path);
let new_storage = s.add_sandbox_storage(tmpdir_path).await;
// Check the reference counter
let ref_count = s.storages[tmpdir_path];
let ref_count = new_storage.ref_count().await;
assert_eq!(
ref_count, 1,
"Invalid refcount, got {} expected 1.",
ref_count
);
assert!(new_storage);
// Use the existing sandbox storage
let new_storage = s.set_sandbox_storage(tmpdir_path);
assert!(!new_storage, "Should be false as already exists.");
let new_storage = s.add_sandbox_storage(tmpdir_path).await;
// Since we are using existing storage, the reference counter
// should be 2 by now.
let ref_count = s.storages[tmpdir_path];
let ref_count = new_storage.ref_count().await;
assert_eq!(
ref_count, 2,
"Invalid refcount, got {} expected 2.",
@@ -549,52 +552,6 @@ mod tests {
);
}
#[tokio::test]
#[serial]
async fn remove_sandbox_storage() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
let s = Sandbox::new(&logger).unwrap();
let tmpdir = Builder::new().tempdir().unwrap();
let tmpdir_path = tmpdir.path().to_str().unwrap();
let srcdir = Builder::new()
.prefix("src")
.tempdir_in(tmpdir_path)
.unwrap();
let srcdir_path = srcdir.path().to_str().unwrap();
let destdir = Builder::new()
.prefix("dest")
.tempdir_in(tmpdir_path)
.unwrap();
let destdir_path = destdir.path().to_str().unwrap();
let emptydir = Builder::new()
.prefix("empty")
.tempdir_in(tmpdir_path)
.unwrap();
assert!(
s.remove_sandbox_storage(srcdir_path).is_err(),
"Expect Err as the directory is not a mountpoint"
);
assert!(s.remove_sandbox_storage("").is_err());
let invalid_dir = emptydir.path().join("invalid");
assert!(s
.remove_sandbox_storage(invalid_dir.to_str().unwrap())
.is_err());
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
assert!(s.remove_sandbox_storage(destdir_path).is_ok());
}
#[tokio::test]
#[serial]
async fn unset_and_remove_sandbox_storage() {
@@ -604,8 +561,7 @@ mod tests {
let mut s = Sandbox::new(&logger).unwrap();
assert!(
s.unset_and_remove_sandbox_storage("/tmp/testEphePath")
.is_err(),
s.remove_sandbox_storage("/tmp/testEphePath").await.is_err(),
"Should fail because sandbox storage doesn't exist"
);
@@ -626,8 +582,12 @@ mod tests {
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
assert!(s.set_sandbox_storage(destdir_path));
assert!(s.unset_and_remove_sandbox_storage(destdir_path).is_ok());
s.add_sandbox_storage(destdir_path).await;
let storage = StorageDeviceGeneric::new(destdir_path.to_string());
assert!(s
.update_sandbox_storage(destdir_path, Arc::new(storage))
.is_ok());
assert!(s.remove_sandbox_storage(destdir_path).await.is_ok());
let other_dir_str;
{
@@ -640,10 +600,14 @@ mod tests {
let other_dir_path = other_dir.path().to_str().unwrap();
other_dir_str = other_dir_path.to_string();
assert!(s.set_sandbox_storage(other_dir_path));
s.add_sandbox_storage(other_dir_path).await;
let storage = StorageDeviceGeneric::new(other_dir_path.to_string());
assert!(s
.update_sandbox_storage(other_dir_path, Arc::new(storage))
.is_ok());
}
assert!(s.unset_and_remove_sandbox_storage(&other_dir_str).is_err());
assert!(s.remove_sandbox_storage(&other_dir_str).await.is_ok());
}
#[tokio::test]
@@ -655,28 +619,30 @@ mod tests {
let storage_path = "/tmp/testEphe";
// Add a new sandbox storage
assert!(s.set_sandbox_storage(storage_path));
s.add_sandbox_storage(storage_path).await;
// Use the existing sandbox storage
let state = s.add_sandbox_storage(storage_path).await;
assert!(
!s.set_sandbox_storage(storage_path),
state.ref_count().await > 1,
"Expects false as the storage is not new."
);
assert!(
!s.unset_sandbox_storage(storage_path).unwrap(),
!s.remove_sandbox_storage(storage_path).await.unwrap(),
"Expects false as there is still a storage."
);
// Reference counter should decrement to 1.
let ref_count = s.storages[storage_path];
let storage = &s.storages[storage_path];
let refcount = storage.ref_count().await;
assert_eq!(
ref_count, 1,
refcount, 1,
"Invalid refcount, got {} expected 1.",
ref_count
refcount
);
assert!(
s.unset_sandbox_storage(storage_path).unwrap(),
s.remove_sandbox_storage(storage_path).await.unwrap(),
"Expects true as there is still a storage."
);
@@ -692,7 +658,7 @@ mod tests {
// If no container is using the sandbox storage, the reference
// counter for it should not exist.
assert!(
s.unset_sandbox_storage(storage_path).is_err(),
s.remove_sandbox_storage(storage_path).await.is_err(),
"Expects false as the reference counter should no exist."
);
}

View File

@@ -0,0 +1,37 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use anyhow::Result;
use kata_types::mount::StorageDevice;
use protocols::agent::Storage;
use std::iter;
use std::sync::Arc;
use tracing::instrument;
use crate::storage::{new_device, StorageContext, StorageHandler};
#[derive(Debug)]
pub struct BindWatcherHandler {}
#[async_trait::async_trait]
impl StorageHandler for BindWatcherHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
if let Some(cid) = ctx.cid {
ctx.sandbox
.lock()
.await
.bind_watcher
.add_container(cid.to_string(), iter::once(storage.clone()), ctx.logger)
.await?;
}
new_device("".to_string())
}
}

View File

@@ -0,0 +1,146 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use anyhow::{anyhow, Context, Result};
use kata_types::mount::StorageDevice;
use protocols::agent::Storage;
use tracing::instrument;
use crate::device::{
get_scsi_device_name, get_virtio_blk_pci_device_name, get_virtio_mmio_device_name,
wait_for_pmem_device,
};
use crate::pci;
use crate::storage::{common_storage_handler, new_device, StorageContext, StorageHandler};
#[cfg(target_arch = "s390x")]
use crate::{ccw, device::get_virtio_blk_ccw_device_name};
#[derive(Debug)]
pub struct VirtioBlkMmioHandler {}
#[async_trait::async_trait]
impl StorageHandler for VirtioBlkMmioHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
if !Path::new(&storage.source).exists() {
get_virtio_mmio_device_name(ctx.sandbox, &storage.source)
.await
.context("failed to get mmio device name")?;
}
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct VirtioBlkPciHandler {}
#[async_trait::async_trait]
impl StorageHandler for VirtioBlkPciHandler {
#[instrument]
async fn create_device(
&self,
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
// If hot-plugged, get the device node path based on the PCI path
// otherwise use the virt path provided in Storage Source
if storage.source.starts_with("/dev") {
let metadata = fs::metadata(&storage.source)
.context(format!("get metadata on file {:?}", &storage.source))?;
let mode = metadata.permissions().mode();
if mode & libc::S_IFBLK == 0 {
return Err(anyhow!("Invalid device {}", &storage.source));
}
} else {
let pcipath = pci::Path::from_str(&storage.source)?;
let dev_path = get_virtio_blk_pci_device_name(ctx.sandbox, &pcipath).await?;
storage.source = dev_path;
}
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct VirtioBlkCcwHandler {}
#[async_trait::async_trait]
impl StorageHandler for VirtioBlkCcwHandler {
#[cfg(target_arch = "s390x")]
#[instrument]
async fn create_device(
&self,
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
let ccw_device = ccw::Device::from_str(&storage.source)?;
let dev_path = get_virtio_blk_ccw_device_name(ctx.sandbox, &ccw_device).await?;
storage.source = dev_path;
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
#[cfg(not(target_arch = "s390x"))]
#[instrument]
async fn create_device(
&self,
_storage: Storage,
_ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
Err(anyhow!("CCW is only supported on s390x"))
}
}
#[derive(Debug)]
pub struct ScsiHandler {}
#[async_trait::async_trait]
impl StorageHandler for ScsiHandler {
#[instrument]
async fn create_device(
&self,
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
// Retrieve the device path from SCSI address.
let dev_path = get_scsi_device_name(ctx.sandbox, &storage.source).await?;
storage.source = dev_path;
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct PmemHandler {}
#[async_trait::async_trait]
impl StorageHandler for PmemHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
// Retrieve the device for pmem storage
wait_for_pmem_device(ctx.sandbox, &storage.source).await?;
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}

View File

@@ -0,0 +1,293 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::fs;
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;
use std::sync::Arc;
use anyhow::{anyhow, Context, Result};
use kata_sys_util::mount::parse_mount_options;
use kata_types::mount::{StorageDevice, KATA_MOUNT_OPTION_FS_GID};
use nix::unistd::Gid;
use protocols::agent::Storage;
use slog::Logger;
use tokio::sync::Mutex;
use tracing::instrument;
use crate::device::{DRIVER_EPHEMERAL_TYPE, FS_TYPE_HUGETLB};
use crate::mount::baremount;
use crate::sandbox::Sandbox;
use crate::storage::{
common_storage_handler, new_device, parse_options, StorageContext, StorageHandler, MODE_SETGID,
};
const FS_GID_EQ: &str = "fsgid=";
const SYS_FS_HUGEPAGES_PREFIX: &str = "/sys/kernel/mm/hugepages";
#[derive(Debug)]
pub struct EphemeralHandler {}
#[async_trait::async_trait]
impl StorageHandler for EphemeralHandler {
#[instrument]
async fn create_device(
&self,
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
// hugetlbfs
if storage.fstype == FS_TYPE_HUGETLB {
info!(ctx.logger, "handle hugetlbfs storage");
// Allocate hugepages before mount
// /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
// /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
// options eg "pagesize=2097152,size=524288000"(2M, 500M)
Self::allocate_hugepages(ctx.logger, &storage.options.to_vec())
.context("allocate hugepages")?;
common_storage_handler(ctx.logger, &storage)?;
} else if !storage.options.is_empty() {
// By now we only support one option field: "fsGroup" which
// isn't an valid mount option, thus we should remove it when
// do mount.
let opts = parse_options(&storage.options);
storage.options = Default::default();
common_storage_handler(ctx.logger, &storage)?;
// ephemeral_storage didn't support mount options except fsGroup.
if let Some(fsgid) = opts.get(KATA_MOUNT_OPTION_FS_GID) {
let gid = fsgid.parse::<u32>()?;
nix::unistd::chown(storage.mount_point.as_str(), None, Some(Gid::from_raw(gid)))?;
let meta = fs::metadata(&storage.mount_point)?;
let mut permission = meta.permissions();
let o_mode = meta.mode() | MODE_SETGID;
permission.set_mode(o_mode);
fs::set_permissions(&storage.mount_point, permission)?;
}
} else {
common_storage_handler(ctx.logger, &storage)?;
}
new_device("".to_string())
}
}
impl EphemeralHandler {
// Allocate hugepages by writing to sysfs
fn allocate_hugepages(logger: &Logger, options: &[String]) -> Result<()> {
info!(logger, "mounting hugePages storage options: {:?}", options);
let (pagesize, size) = Self::get_pagesize_and_size_from_option(options)
.context(format!("parse mount options: {:?}", &options))?;
info!(
logger,
"allocate hugepages. pageSize: {}, size: {}", pagesize, size
);
// sysfs entry is always of the form hugepages-${pagesize}kB
// Ref: https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt
let path = Path::new(SYS_FS_HUGEPAGES_PREFIX)
.join(format!("hugepages-{}kB", pagesize / 1024))
.join("nr_hugepages");
// write numpages to nr_hugepages file.
let numpages = format!("{}", size / pagesize);
info!(logger, "write {} pages to {:?}", &numpages, &path);
let mut file = OpenOptions::new()
.write(true)
.open(&path)
.context(format!("open nr_hugepages directory {:?}", &path))?;
file.write_all(numpages.as_bytes())
.context(format!("write nr_hugepages failed: {:?}", &path))?;
// Even if the write succeeds, the kernel isn't guaranteed to be
// able to allocate all the pages we requested. Verify that it
// did.
let verify = fs::read_to_string(&path).context(format!("reading {:?}", &path))?;
let allocated = verify
.trim_end()
.parse::<u64>()
.map_err(|_| anyhow!("Unexpected text {:?} in {:?}", &verify, &path))?;
if allocated != size / pagesize {
return Err(anyhow!(
"Only allocated {} of {} hugepages of size {}",
allocated,
numpages,
pagesize
));
}
Ok(())
}
// Parse filesystem options string to retrieve hugepage details
// options eg "pagesize=2048,size=107374182"
fn get_pagesize_and_size_from_option(options: &[String]) -> Result<(u64, u64)> {
let mut pagesize_str: Option<&str> = None;
let mut size_str: Option<&str> = None;
for option in options {
let vars: Vec<&str> = option.trim().split(',').collect();
for var in vars {
if let Some(stripped) = var.strip_prefix("pagesize=") {
pagesize_str = Some(stripped);
} else if let Some(stripped) = var.strip_prefix("size=") {
size_str = Some(stripped);
}
if pagesize_str.is_some() && size_str.is_some() {
break;
}
}
}
if pagesize_str.is_none() || size_str.is_none() {
return Err(anyhow!("no pagesize/size options found"));
}
let pagesize = pagesize_str
.unwrap()
.parse::<u64>()
.context(format!("parse pagesize: {:?}", &pagesize_str))?;
let size = size_str
.unwrap()
.parse::<u64>()
.context(format!("parse size: {:?}", &pagesize_str))?;
Ok((pagesize, size))
}
}
// update_ephemeral_mounts takes a list of ephemeral mounts and remounts them
// with mount options passed by the caller
#[instrument]
pub async fn update_ephemeral_mounts(
logger: Logger,
storages: &[Storage],
_sandbox: &Arc<Mutex<Sandbox>>,
) -> Result<()> {
for storage in storages {
let handler_name = &storage.driver;
let logger = logger.new(o!(
"msg" => "updating tmpfs storage",
"subsystem" => "storage",
"storage-type" => handler_name.to_owned()));
match handler_name.as_str() {
DRIVER_EPHEMERAL_TYPE => {
fs::create_dir_all(&storage.mount_point)?;
if storage.options.is_empty() {
continue;
} else {
// assume that fsGid has already been set
let mount_path = Path::new(&storage.mount_point);
let src_path = Path::new(&storage.source);
let opts: Vec<&String> = storage
.options
.iter()
.filter(|&opt| !opt.starts_with(FS_GID_EQ))
.collect();
let (flags, options) = parse_mount_options(&opts)?;
info!(logger, "mounting storage";
"mount-source" => src_path.display(),
"mount-destination" => mount_path.display(),
"mount-fstype" => storage.fstype.as_str(),
"mount-options" => options.as_str(),
);
baremount(
src_path,
mount_path,
storage.fstype.as_str(),
flags,
options.as_str(),
&logger,
)?;
}
}
_ => {
return Err(anyhow!(
"Unsupported storage type for syncing mounts {}. Only ephemeral storage update is supported",
storage.driver
));
}
};
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_get_pagesize_and_size_from_option() {
let expected_pagesize = 2048;
let expected_size = 107374182;
let expected = (expected_pagesize, expected_size);
let data = vec![
// (input, expected, is_ok)
("size-1=107374182,pagesize-1=2048", expected, false),
("size-1=107374182,pagesize=2048", expected, false),
("size=107374182,pagesize-1=2048", expected, false),
("size=107374182,pagesize=abc", expected, false),
("size=abc,pagesize=2048", expected, false),
("size=,pagesize=2048", expected, false),
("size=107374182,pagesize=", expected, false),
("size=107374182,pagesize=2048", expected, true),
("pagesize=2048,size=107374182", expected, true),
("foo=bar,pagesize=2048,size=107374182", expected, true),
(
"foo=bar,pagesize=2048,foo1=bar1,size=107374182",
expected,
true,
),
(
"pagesize=2048,foo1=bar1,foo=bar,size=107374182",
expected,
true,
),
(
"foo=bar,pagesize=2048,foo1=bar1,size=107374182,foo2=bar2",
expected,
true,
),
(
"foo=bar,size=107374182,foo1=bar1,pagesize=2048",
expected,
true,
),
];
for case in data {
let input = case.0;
let r = EphemeralHandler::get_pagesize_and_size_from_option(&[input.to_string()]);
let is_ok = case.2;
if is_ok {
let expected = case.1;
let (pagesize, size) = r.unwrap();
assert_eq!(expected.0, pagesize);
assert_eq!(expected.1, size);
} else {
assert!(r.is_err());
}
}
}
}

View File

@@ -0,0 +1,89 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::fs;
use std::path::Path;
use std::sync::Arc;
use anyhow::{anyhow, Context, Result};
use kata_types::mount::StorageDevice;
use protocols::agent::Storage;
use tracing::instrument;
use crate::storage::{common_storage_handler, new_device, StorageContext, StorageHandler};
#[derive(Debug)]
pub struct OverlayfsHandler {}
#[async_trait::async_trait]
impl StorageHandler for OverlayfsHandler {
#[instrument]
async fn create_device(
&self,
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
if storage
.options
.iter()
.any(|e| e == "io.katacontainers.fs-opt.overlay-rw")
{
let cid = ctx
.cid
.clone()
.ok_or_else(|| anyhow!("No container id in rw overlay"))?;
let cpath = Path::new(crate::rpc::CONTAINER_BASE).join(cid);
let work = cpath.join("work");
let upper = cpath.join("upper");
fs::create_dir_all(&work).context("Creating overlay work directory")?;
fs::create_dir_all(&upper).context("Creating overlay upper directory")?;
storage.fstype = "overlay".into();
storage
.options
.push(format!("upperdir={}", upper.to_string_lossy()));
storage
.options
.push(format!("workdir={}", work.to_string_lossy()));
}
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct Virtio9pHandler {}
#[async_trait::async_trait]
impl StorageHandler for Virtio9pHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct VirtioFsHandler {}
#[async_trait::async_trait]
impl StorageHandler for VirtioFsHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
}
}

View File

@@ -0,0 +1,61 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::sync::Arc;
use anyhow::{Context, Result};
use kata_types::mount::{StorageDevice, KATA_MOUNT_OPTION_FS_GID};
use nix::unistd::Gid;
use protocols::agent::Storage;
use tracing::instrument;
use crate::storage::{new_device, parse_options, StorageContext, StorageHandler, MODE_SETGID};
#[derive(Debug)]
pub struct LocalHandler {}
#[async_trait::async_trait]
impl StorageHandler for LocalHandler {
#[instrument]
async fn create_device(
&self,
storage: Storage,
_ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
fs::create_dir_all(&storage.mount_point).context(format!(
"failed to create dir all {:?}",
&storage.mount_point
))?;
let opts = parse_options(&storage.options);
let mut need_set_fsgid = false;
if let Some(fsgid) = opts.get(KATA_MOUNT_OPTION_FS_GID) {
let gid = fsgid.parse::<u32>()?;
nix::unistd::chown(storage.mount_point.as_str(), None, Some(Gid::from_raw(gid)))?;
need_set_fsgid = true;
}
if let Some(mode) = opts.get("mode") {
let mut permission = fs::metadata(&storage.mount_point)?.permissions();
let mut o_mode = u32::from_str_radix(mode, 8)?;
if need_set_fsgid {
// set SetGid mode mask.
o_mode |= MODE_SETGID;
}
permission.set_mode(o_mode);
fs::set_permissions(&storage.mount_point, permission)?;
}
new_device("".to_string())
}
}

View File

@@ -0,0 +1,789 @@
// Copyright (c) 2019 Ant Financial
// Copyright (c) 2023 Alibaba Cloud
//
// SPDX-License-Identifier: Apache-2.0
//
use std::collections::HashMap;
use std::fs;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;
use std::sync::Arc;
use anyhow::{anyhow, Context, Result};
use kata_sys_util::mount::{create_mount_destination, parse_mount_options};
use kata_types::mount::{StorageDevice, StorageHandlerManager, KATA_SHAREDFS_GUEST_PREMOUNT_TAG};
use nix::unistd::{Gid, Uid};
use protocols::agent::Storage;
use protocols::types::FSGroupChangePolicy;
use slog::Logger;
use tokio::sync::Mutex;
use tracing::instrument;
use self::bind_watcher_handler::BindWatcherHandler;
use self::block_handler::{PmemHandler, ScsiHandler, VirtioBlkMmioHandler, VirtioBlkPciHandler};
use self::ephemeral_handler::EphemeralHandler;
use self::fs_handler::{OverlayfsHandler, Virtio9pHandler, VirtioFsHandler};
use self::local_handler::LocalHandler;
use crate::device::{
DRIVER_9P_TYPE, DRIVER_BLK_MMIO_TYPE, DRIVER_BLK_PCI_TYPE, DRIVER_EPHEMERAL_TYPE,
DRIVER_LOCAL_TYPE, DRIVER_NVDIMM_TYPE, DRIVER_OVERLAYFS_TYPE, DRIVER_SCSI_TYPE,
DRIVER_VIRTIOFS_TYPE, DRIVER_WATCHABLE_BIND_TYPE,
};
use crate::mount::{baremount, is_mounted, remove_mounts};
use crate::sandbox::Sandbox;
pub use self::ephemeral_handler::update_ephemeral_mounts;
mod bind_watcher_handler;
mod block_handler;
mod ephemeral_handler;
mod fs_handler;
mod local_handler;
const RW_MASK: u32 = 0o660;
const RO_MASK: u32 = 0o440;
const EXEC_MASK: u32 = 0o110;
const MODE_SETGID: u32 = 0o2000;
#[derive(Debug)]
pub struct StorageContext<'a> {
cid: &'a Option<String>,
logger: &'a Logger,
sandbox: &'a Arc<Mutex<Sandbox>>,
}
/// An implementation of generic storage device.
#[derive(Default, Debug)]
pub struct StorageDeviceGeneric {
path: Option<String>,
}
impl StorageDeviceGeneric {
/// Create a new instance of `StorageStateCommon`.
pub fn new(path: String) -> Self {
StorageDeviceGeneric { path: Some(path) }
}
}
impl StorageDevice for StorageDeviceGeneric {
fn path(&self) -> Option<&str> {
self.path.as_deref()
}
fn cleanup(&self) -> Result<()> {
let path = match self.path() {
None => return Ok(()),
Some(v) => {
if v.is_empty() {
// TODO: Bind watch, local, ephemeral volume has empty path, which will get leaked.
return Ok(());
} else {
v
}
}
};
if !Path::new(path).exists() {
return Ok(());
}
if matches!(is_mounted(path), Ok(true)) {
let mounts = vec![path.to_string()];
remove_mounts(&mounts)?;
}
if matches!(is_mounted(path), Ok(true)) {
return Err(anyhow!("failed to umount mountpoint {}", path));
}
let p = Path::new(path);
if p.is_dir() {
let is_empty = p.read_dir()?.next().is_none();
if !is_empty {
return Err(anyhow!("directory is not empty when clean up storage"));
}
// "remove_dir" will fail if the mount point is backed by a read-only filesystem.
// This is the case with the device mapper snapshotter, where we mount the block device
// directly at the underlying sandbox path which was provided from the base RO kataShared
// path from the host.
let _ = fs::remove_dir(p);
} else if !p.is_file() {
// TODO: should we remove the file for bind mount?
return Err(anyhow!(
"storage path {} is neither directory nor file",
path
));
}
Ok(())
}
}
/// Trait object to handle storage device.
#[async_trait::async_trait]
pub trait StorageHandler: Send + Sync {
/// Create a new storage device.
async fn create_device(
&self,
storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>>;
}
#[rustfmt::skip]
lazy_static! {
pub static ref STORAGE_HANDLERS: StorageHandlerManager<Arc<dyn StorageHandler>> = {
let mut manager: StorageHandlerManager<Arc<dyn StorageHandler>> = StorageHandlerManager::new();
manager.add_handler(DRIVER_9P_TYPE, Arc::new(Virtio9pHandler{})).unwrap();
#[cfg(target_arch = "s390x")]
manager.add_handler(crate::device::DRIVER_BLK_CCW_TYPE, Arc::new(self::block_handler::VirtioBlkCcwHandler{})).unwrap();
manager.add_handler(DRIVER_BLK_MMIO_TYPE, Arc::new(VirtioBlkMmioHandler{})).unwrap();
manager.add_handler(DRIVER_BLK_PCI_TYPE, Arc::new(VirtioBlkPciHandler{})).unwrap();
manager.add_handler(DRIVER_EPHEMERAL_TYPE, Arc::new(EphemeralHandler{})).unwrap();
manager.add_handler(DRIVER_LOCAL_TYPE, Arc::new(LocalHandler{})).unwrap();
manager.add_handler(DRIVER_NVDIMM_TYPE, Arc::new(PmemHandler{})).unwrap();
manager.add_handler(DRIVER_OVERLAYFS_TYPE, Arc::new(OverlayfsHandler{})).unwrap();
manager.add_handler(DRIVER_SCSI_TYPE, Arc::new(ScsiHandler{})).unwrap();
manager.add_handler(DRIVER_VIRTIOFS_TYPE, Arc::new(VirtioFsHandler{})).unwrap();
manager.add_handler(DRIVER_WATCHABLE_BIND_TYPE, Arc::new(BindWatcherHandler{})).unwrap();
manager
};
}
// add_storages takes a list of storages passed by the caller, and perform the
// associated operations such as waiting for the device to show up, and mount
// it to a specific location, according to the type of handler chosen, and for
// each storage.
#[instrument]
pub async fn add_storages(
logger: Logger,
storages: Vec<Storage>,
sandbox: &Arc<Mutex<Sandbox>>,
cid: Option<String>,
) -> Result<Vec<String>> {
let mut mount_list = Vec::new();
for storage in storages {
let path = storage.mount_point.clone();
let state = sandbox.lock().await.add_sandbox_storage(&path).await;
if state.ref_count().await > 1 {
if let Some(path) = state.path() {
if !path.is_empty() {
mount_list.push(path.to_string());
}
}
// The device already exists.
continue;
}
if let Some(handler) = STORAGE_HANDLERS.handler(&storage.driver) {
let logger =
logger.new(o!( "subsystem" => "storage", "storage-type" => storage.driver.clone()));
let mut ctx = StorageContext {
cid: &cid,
logger: &logger,
sandbox,
};
match handler.create_device(storage, &mut ctx).await {
Ok(device) => {
match sandbox
.lock()
.await
.update_sandbox_storage(&path, device.clone())
{
Ok(d) => {
if let Some(path) = device.path() {
if !path.is_empty() {
mount_list.push(path.to_string());
}
}
drop(d);
}
Err(device) => {
error!(logger, "failed to update device for storage");
if let Err(e) = sandbox.lock().await.remove_sandbox_storage(&path).await
{
warn!(logger, "failed to remove dummy sandbox storage {:?}", e);
}
if let Err(e) = device.cleanup() {
error!(
logger,
"failed to clean state for storage device {}, {}", path, e
);
}
return Err(anyhow!("failed to update device for storage"));
}
}
}
Err(e) => {
error!(logger, "failed to create device for storage, error: {e:?}");
if let Err(e) = sandbox.lock().await.remove_sandbox_storage(&path).await {
warn!(logger, "failed to remove dummy sandbox storage {e:?}");
}
return Err(e);
}
}
} else {
return Err(anyhow!(
"Failed to find the storage handler {}",
storage.driver
));
}
}
Ok(mount_list)
}
pub(crate) fn new_device(path: String) -> Result<Arc<dyn StorageDevice>> {
let device = StorageDeviceGeneric::new(path);
Ok(Arc::new(device))
}
#[instrument]
pub(crate) fn common_storage_handler(logger: &Logger, storage: &Storage) -> Result<String> {
mount_storage(logger, storage)?;
set_ownership(logger, storage)?;
Ok(storage.mount_point.clone())
}
// mount_storage performs the mount described by the storage structure.
#[instrument]
fn mount_storage(logger: &Logger, storage: &Storage) -> Result<()> {
let logger = logger.new(o!("subsystem" => "mount"));
// There's a special mechanism to create mountpoint from a `sharedfs` instance before
// starting the kata-agent. Check for such cases.
if storage.source == KATA_SHAREDFS_GUEST_PREMOUNT_TAG && is_mounted(&storage.mount_point)? {
warn!(
logger,
"{} already mounted on {}, ignoring...",
KATA_SHAREDFS_GUEST_PREMOUNT_TAG,
&storage.mount_point
);
return Ok(());
}
let (flags, options) = parse_mount_options(&storage.options)?;
let mount_path = Path::new(&storage.mount_point);
let src_path = Path::new(&storage.source);
create_mount_destination(src_path, mount_path, "", &storage.fstype)
.context("Could not create mountpoint")?;
info!(logger, "mounting storage";
"mount-source" => src_path.display(),
"mount-destination" => mount_path.display(),
"mount-fstype" => storage.fstype.as_str(),
"mount-options" => options.as_str(),
);
baremount(
src_path,
mount_path,
storage.fstype.as_str(),
flags,
options.as_str(),
&logger,
)
}
#[instrument]
pub(crate) fn parse_options(option_list: &[String]) -> HashMap<String, String> {
let mut options = HashMap::new();
for opt in option_list {
let fields: Vec<&str> = opt.split('=').collect();
if fields.len() == 2 {
options.insert(fields[0].to_string(), fields[1].to_string());
}
}
options
}
#[instrument]
pub fn set_ownership(logger: &Logger, storage: &Storage) -> Result<()> {
let logger = logger.new(o!("subsystem" => "mount", "fn" => "set_ownership"));
// If fsGroup is not set, skip performing ownership change
if storage.fs_group.is_none() {
return Ok(());
}
let fs_group = storage.fs_group();
let read_only = storage.options.contains(&String::from("ro"));
let mount_path = Path::new(&storage.mount_point);
let metadata = mount_path.metadata().map_err(|err| {
error!(logger, "failed to obtain metadata for mount path";
"mount-path" => mount_path.to_str(),
"error" => err.to_string(),
);
err
})?;
if fs_group.group_change_policy == FSGroupChangePolicy::OnRootMismatch.into()
&& metadata.gid() == fs_group.group_id
{
let mut mask = if read_only { RO_MASK } else { RW_MASK };
mask |= EXEC_MASK;
// With fsGroup change policy to OnRootMismatch, if the current
// gid of the mount path root directory matches the desired gid
// and the current permission of mount path root directory is correct,
// then ownership change will be skipped.
let current_mode = metadata.permissions().mode();
if (mask & current_mode == mask) && (current_mode & MODE_SETGID != 0) {
info!(logger, "skipping ownership change for volume";
"mount-path" => mount_path.to_str(),
"fs-group" => fs_group.group_id.to_string(),
);
return Ok(());
}
}
info!(logger, "performing recursive ownership change";
"mount-path" => mount_path.to_str(),
"fs-group" => fs_group.group_id.to_string(),
);
recursive_ownership_change(
mount_path,
None,
Some(Gid::from_raw(fs_group.group_id)),
read_only,
)
}
#[instrument]
pub fn recursive_ownership_change(
path: &Path,
uid: Option<Uid>,
gid: Option<Gid>,
read_only: bool,
) -> Result<()> {
let mut mask = if read_only { RO_MASK } else { RW_MASK };
if path.is_dir() {
for entry in fs::read_dir(path)? {
recursive_ownership_change(entry?.path().as_path(), uid, gid, read_only)?;
}
mask |= EXEC_MASK;
mask |= MODE_SETGID;
}
// We do not want to change the permission of the underlying file
// using symlink. Hence we skip symlinks from recursive ownership
// and permission changes.
if path.is_symlink() {
return Ok(());
}
nix::unistd::chown(path, uid, gid)?;
if gid.is_some() {
let metadata = path.metadata()?;
let mut permission = metadata.permissions();
let target_mode = metadata.mode() | mask;
permission.set_mode(target_mode);
fs::set_permissions(path, permission)?;
}
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use anyhow::Error;
use nix::mount::MsFlags;
use protocols::agent::FSGroup;
use std::fs::File;
use tempfile::{tempdir, Builder};
use test_utils::{
skip_if_not_root, skip_loop_by_user, skip_loop_if_not_root, skip_loop_if_root, TestUserType,
};
#[test]
fn test_mount_storage() {
#[derive(Debug)]
struct TestData<'a> {
test_user: TestUserType,
storage: Storage,
error_contains: &'a str,
make_source_dir: bool,
make_mount_dir: bool,
deny_mount_permission: bool,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
test_user: TestUserType::Any,
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "tmpfs".to_string(),
..Default::default()
},
make_source_dir: true,
make_mount_dir: false,
deny_mount_permission: false,
error_contains: "",
}
}
}
let tests = &[
TestData {
test_user: TestUserType::NonRootOnly,
error_contains: "EPERM: Operation not permitted",
..Default::default()
},
TestData {
test_user: TestUserType::RootOnly,
..Default::default()
},
TestData {
storage: Storage {
mount_point: "mnt".to_string(),
source: "src".to_string(),
fstype: "bind".to_string(),
..Default::default()
},
make_source_dir: false,
make_mount_dir: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
TestData {
test_user: TestUserType::NonRootOnly,
deny_mount_permission: true,
error_contains: "Could not create mountpoint",
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
skip_loop_by_user!(msg, d.test_user);
let drain = slog::Discard;
let logger = slog::Logger::root(drain, o!());
let tempdir = tempdir().unwrap();
let source = tempdir.path().join(&d.storage.source);
let mount_point = tempdir.path().join(&d.storage.mount_point);
let storage = Storage {
source: source.to_str().unwrap().to_string(),
mount_point: mount_point.to_str().unwrap().to_string(),
..d.storage.clone()
};
if d.make_source_dir {
fs::create_dir_all(&storage.source).unwrap();
}
if d.make_mount_dir {
fs::create_dir_all(&storage.mount_point).unwrap();
}
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o000),
)
.unwrap();
}
let result = mount_storage(&logger, &storage);
// restore permissions so tempdir can be cleaned up
if d.deny_mount_permission {
fs::set_permissions(
mount_point.parent().unwrap(),
fs::Permissions::from_mode(0o755),
)
.unwrap();
}
if result.is_ok() {
nix::mount::umount(&mount_point).unwrap();
}
let msg = format!("{}: result: {:?}", msg, result);
if d.error_contains.is_empty() {
assert!(result.is_ok(), "{}", msg);
} else {
assert!(result.is_err(), "{}", msg);
let error_msg = format!("{}", result.unwrap_err());
assert!(error_msg.contains(d.error_contains), "{}", msg);
}
}
}
#[test]
fn test_set_ownership() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
#[derive(Debug)]
struct TestData<'a> {
mount_path: &'a str,
fs_group: Option<FSGroup>,
read_only: bool,
expected_group_id: u32,
expected_permission: u32,
}
let tests = &[
TestData {
mount_path: "foo",
fs_group: None,
read_only: false,
expected_group_id: 0,
expected_permission: 0,
},
TestData {
mount_path: "rw_mount",
fs_group: Some(FSGroup {
group_id: 3000,
group_change_policy: FSGroupChangePolicy::Always.into(),
..Default::default()
}),
read_only: false,
expected_group_id: 3000,
expected_permission: RW_MASK | EXEC_MASK | MODE_SETGID,
},
TestData {
mount_path: "ro_mount",
fs_group: Some(FSGroup {
group_id: 3000,
group_change_policy: FSGroupChangePolicy::OnRootMismatch.into(),
..Default::default()
}),
read_only: true,
expected_group_id: 3000,
expected_permission: RO_MASK | EXEC_MASK | MODE_SETGID,
},
];
let tempdir = tempdir().expect("failed to create tmpdir");
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let mount_dir = tempdir.path().join(d.mount_path);
fs::create_dir(&mount_dir)
.unwrap_or_else(|_| panic!("{}: failed to create root directory", msg));
let directory_mode = mount_dir.as_path().metadata().unwrap().permissions().mode();
let mut storage_data = Storage::new();
if d.read_only {
storage_data.set_options(vec!["foo".to_string(), "ro".to_string()]);
}
if let Some(fs_group) = d.fs_group.clone() {
storage_data.set_fs_group(fs_group);
}
storage_data.mount_point = mount_dir.clone().into_os_string().into_string().unwrap();
let result = set_ownership(&logger, &storage_data);
assert!(result.is_ok());
assert_eq!(
mount_dir.as_path().metadata().unwrap().gid(),
d.expected_group_id
);
assert_eq!(
mount_dir.as_path().metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission)
);
}
}
#[test]
fn test_recursive_ownership_change() {
skip_if_not_root!();
const COUNT: usize = 5;
#[derive(Debug)]
struct TestData<'a> {
// Directory where the recursive ownership change should be performed on
path: &'a str,
// User ID for ownership change
uid: u32,
// Group ID for ownership change
gid: u32,
// Set when the permission should be read-only
read_only: bool,
// The expected permission of all directories after ownership change
expected_permission_directory: u32,
// The expected permission of all files after ownership change
expected_permission_file: u32,
}
let tests = &[
TestData {
path: "no_gid_change",
uid: 0,
gid: 0,
read_only: false,
expected_permission_directory: 0,
expected_permission_file: 0,
},
TestData {
path: "rw_gid_change",
uid: 0,
gid: 3000,
read_only: false,
expected_permission_directory: RW_MASK | EXEC_MASK | MODE_SETGID,
expected_permission_file: RW_MASK,
},
TestData {
path: "ro_gid_change",
uid: 0,
gid: 3000,
read_only: true,
expected_permission_directory: RO_MASK | EXEC_MASK | MODE_SETGID,
expected_permission_file: RO_MASK,
},
];
let tempdir = tempdir().expect("failed to create tmpdir");
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let mount_dir = tempdir.path().join(d.path);
fs::create_dir(&mount_dir)
.unwrap_or_else(|_| panic!("{}: failed to create root directory", msg));
let directory_mode = mount_dir.as_path().metadata().unwrap().permissions().mode();
let mut file_mode: u32 = 0;
// create testing directories and files
for n in 1..COUNT {
let nest_dir = mount_dir.join(format!("nested{}", n));
fs::create_dir(&nest_dir)
.unwrap_or_else(|_| panic!("{}: failed to create nest directory", msg));
for f in 1..COUNT {
let filename = nest_dir.join(format!("file{}", f));
File::create(&filename)
.unwrap_or_else(|_| panic!("{}: failed to create file", msg));
file_mode = filename.as_path().metadata().unwrap().permissions().mode();
}
}
let uid = if d.uid > 0 {
Some(Uid::from_raw(d.uid))
} else {
None
};
let gid = if d.gid > 0 {
Some(Gid::from_raw(d.gid))
} else {
None
};
let result = recursive_ownership_change(&mount_dir, uid, gid, d.read_only);
assert!(result.is_ok());
assert_eq!(mount_dir.as_path().metadata().unwrap().gid(), d.gid);
assert_eq!(
mount_dir.as_path().metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission_directory)
);
for n in 1..COUNT {
let nest_dir = mount_dir.join(format!("nested{}", n));
for f in 1..COUNT {
let filename = nest_dir.join(format!("file{}", f));
let file = Path::new(&filename);
assert_eq!(file.metadata().unwrap().gid(), d.gid);
assert_eq!(
file.metadata().unwrap().permissions().mode(),
(file_mode | d.expected_permission_file)
);
}
let dir = Path::new(&nest_dir);
assert_eq!(dir.metadata().unwrap().gid(), d.gid);
assert_eq!(
dir.metadata().unwrap().permissions().mode(),
(directory_mode | d.expected_permission_directory)
);
}
}
}
#[tokio::test]
#[serial_test::serial]
async fn cleanup_storage() {
skip_if_not_root!();
let logger = slog::Logger::root(slog::Discard, o!());
let tmpdir = Builder::new().tempdir().unwrap();
let tmpdir_path = tmpdir.path().to_str().unwrap();
let srcdir = Builder::new()
.prefix("src")
.tempdir_in(tmpdir_path)
.unwrap();
let srcdir_path = srcdir.path().to_str().unwrap();
let empty_file = Path::new(srcdir_path).join("emptyfile");
fs::write(&empty_file, "test").unwrap();
let destdir = Builder::new()
.prefix("dest")
.tempdir_in(tmpdir_path)
.unwrap();
let destdir_path = destdir.path().to_str().unwrap();
let emptydir = Builder::new()
.prefix("empty")
.tempdir_in(tmpdir_path)
.unwrap();
let s = StorageDeviceGeneric::default();
assert!(s.cleanup().is_ok());
let s = StorageDeviceGeneric::new("".to_string());
assert!(s.cleanup().is_ok());
let invalid_dir = emptydir
.path()
.join("invalid")
.to_str()
.unwrap()
.to_string();
let s = StorageDeviceGeneric::new(invalid_dir);
assert!(s.cleanup().is_ok());
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
let s = StorageDeviceGeneric::new(destdir_path.to_string());
assert!(s.cleanup().is_ok());
// fail to remove non-empty directory
let s = StorageDeviceGeneric::new(srcdir_path.to_string());
s.cleanup().unwrap_err();
// remove a directory without umount
fs::remove_file(&empty_file).unwrap();
s.cleanup().unwrap();
}
fn bind_mount(src: &str, dst: &str, logger: &Logger) -> Result<(), Error> {
let src_path = Path::new(src);
let dst_path = Path::new(dst);
baremount(src_path, dst_path, "bind", MsFlags::MS_BIND, "", logger)
}
}

View File

@@ -1,3 +1,2 @@
target
Cargo.lock
.idea

1337
src/dragonball/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -10,6 +10,7 @@ license = "Apache-2.0"
edition = "2018"
[dependencies]
anyhow = "1.0.32"
arc-swap = "1.5.0"
bytes = "1.1.0"
dbs-address-space = { path = "./src/dbs_address_space" }
@@ -26,9 +27,11 @@ kvm-bindings = "0.6.0"
kvm-ioctls = "0.12.0"
lazy_static = "1.2"
libc = "0.2.39"
linux-loader = "0.6.0"
linux-loader = "0.8.0"
log = "0.4.14"
nix = "0.24.2"
procfs = "0.12.0"
prometheus = { version = "0.13.0", features = ["process"] }
seccompiler = "0.2.0"
serde = "1.0.27"
serde_derive = "1.0.27"
@@ -37,13 +40,14 @@ slog = "2.5.2"
slog-scope = "4.4.0"
thiserror = "1"
vmm-sys-util = "0.11.0"
virtio-queue = { version = "0.6.0", optional = true }
vm-memory = { version = "0.9.0", features = ["backend-mmap"] }
virtio-queue = { version = "0.7.0", optional = true }
vm-memory = { version = "0.10.0", features = ["backend-mmap"] }
crossbeam-channel = "0.5.6"
fuse-backend-rs = "0.10.5"
[dev-dependencies]
slog-term = "2.9.0"
slog-async = "2.7.0"
slog-term = "2.9.0"
test-utils = { path = "../libs/test-utils" }
[features]
@@ -54,6 +58,6 @@ virtio-vsock = ["dbs-virtio-devices/virtio-vsock", "virtio-queue"]
virtio-blk = ["dbs-virtio-devices/virtio-blk", "virtio-queue"]
virtio-net = ["dbs-virtio-devices/virtio-net", "virtio-queue"]
# virtio-fs only work on atomic-guest-memory
virtio-fs = ["dbs-virtio-devices/virtio-fs", "virtio-queue", "atomic-guest-memory"]
virtio-fs = ["dbs-virtio-devices/virtio-fs-pro", "virtio-queue", "atomic-guest-memory"]
virtio-mem = ["dbs-virtio-devices/virtio-mem", "virtio-queue", "atomic-guest-memory"]
virtio-balloon = ["dbs-virtio-devices/virtio-balloon", "virtio-queue"]

View File

@@ -16,6 +16,8 @@ use crate::event_manager::EventManager;
use crate::vm::{CpuTopology, KernelConfigInfo, VmConfigInfo};
use crate::vmm::Vmm;
use crate::hypervisor_metrics::get_hypervisor_metrics;
use self::VmConfigError::*;
use self::VmmActionError::MachineConfig;
@@ -58,6 +60,11 @@ pub enum VmmActionError {
#[error("Upcall not ready, can't hotplug device.")]
UpcallServerNotReady,
/// Error when get prometheus metrics.
/// Currently does not distinguish between error types for metrics.
#[error("failed to get hypervisor metrics")]
GetHypervisorMetrics,
/// The action `ConfigureBootSource` failed either because of bad user input or an internal
/// error.
#[error("failed to configure boot source for VM: {0}")]
@@ -135,6 +142,9 @@ pub enum VmmAction {
/// Get the configuration of the microVM.
GetVmConfiguration,
/// Get Prometheus Metrics.
GetHypervisorMetrics,
/// Set the microVM configuration (memory & vcpu) using `VmConfig` as input. This
/// action can only be called before the microVM has booted.
SetVmConfiguration(VmConfigInfo),
@@ -208,6 +218,8 @@ pub enum VmmData {
Empty,
/// The microVM configuration represented by `VmConfigInfo`.
MachineConfiguration(Box<VmConfigInfo>),
/// Prometheus Metrics represented by String.
HypervisorMetrics(String),
}
/// Request data type used to communicate between the API and the VMM.
@@ -262,6 +274,7 @@ impl VmmService {
VmmAction::GetVmConfiguration => Ok(VmmData::MachineConfiguration(Box::new(
self.machine_config.clone(),
))),
VmmAction::GetHypervisorMetrics => self.get_hypervisor_metrics(),
VmmAction::SetVmConfiguration(machine_config) => {
self.set_vm_configuration(vmm, machine_config)
}
@@ -345,7 +358,8 @@ impl VmmService {
Some(ref path) => Some(File::open(path).map_err(|e| BootSource(InvalidInitrdPath(e)))?),
};
let mut cmdline = linux_loader::cmdline::Cmdline::new(dbs_boot::layout::CMDLINE_MAX_SIZE);
let mut cmdline = linux_loader::cmdline::Cmdline::new(dbs_boot::layout::CMDLINE_MAX_SIZE)
.map_err(|err| BootSource(InvalidKernelCommandLine(err)))?;
let boot_args = boot_source_config
.boot_args
.unwrap_or_else(|| String::from(DEFAULT_KERNEL_CMDLINE));
@@ -381,6 +395,13 @@ impl VmmService {
Ok(VmmData::Empty)
}
/// Get prometheus metrics.
fn get_hypervisor_metrics(&self) -> VmmRequestResult {
get_hypervisor_metrics()
.map_err(|_| VmmActionError::GetHypervisorMetrics)
.map(VmmData::HypervisorMetrics)
}
/// Set virtual machine configuration.
pub fn set_vm_configuration(
&mut self,
@@ -671,6 +692,7 @@ impl VmmService {
));
}
#[cfg(feature = "dbs-upcall")]
vm.resize_vcpu(config, None).map_err(|e| {
if let VcpuResizeError::UpcallServerNotReady = e {
return VmmActionError::UpcallServerNotReady;

View File

@@ -17,4 +17,4 @@ nix = "0.23.1"
lazy_static = "1"
thiserror = "1"
vmm-sys-util = "0.11.0"
vm-memory = { version = "0.9", features = ["backend-mmap", "backend-atomic"] }
vm-memory = { version = "0.10", features = ["backend-mmap", "backend-atomic"] }

View File

@@ -15,12 +15,12 @@ memoffset = "0.6"
kvm-bindings = { version = "0.6.0", features = ["fam-wrappers"] }
kvm-ioctls = "0.12.0"
thiserror = "1"
vm-memory = { version = "0.9" }
vm-memory = { version = "0.10" }
vmm-sys-util = "0.11.0"
libc = ">=0.2.39"
[dev-dependencies]
vm-memory = { version = "0.9", features = ["backend-mmap"] }
vm-memory = { version = "0.10", features = ["backend-mmap"] }
[package.metadata.docs.rs]
all-features = true

View File

@@ -17,10 +17,10 @@ kvm-ioctls = "0.12.0"
lazy_static = "1"
libc = "0.2.39"
thiserror = "1"
vm-memory = "0.9.0"
vm-memory = "0.10.0"
vm-fdt = "0.2.0"
[dev-dependencies]
vm-memory = { version = "0.9.0", features = ["backend-mmap"] }
vm-memory = { version = "0.10.0", features = ["backend-mmap"] }
device_tree = ">=1.1.0"
dbs-device = { path = "../dbs_device" }

View File

@@ -10,6 +10,7 @@
#![allow(non_snake_case)]
#![allow(missing_docs)]
#![allow(deref_nullptr)]
#![allow(ambiguous_glob_reexports)]
// generated with bindgen /usr/include/linux/if.h --no-unstable-rust
// --constified-enum '*' --with-derive-default -- -D __UAPI_DEF_IF_IFNAMSIZ -D

View File

@@ -18,28 +18,28 @@ dbs-interrupt = { path = "../dbs_interrupt", features = ["kvm-legacy-irq", "kvm-
dbs-utils = { path = "../dbs_utils" }
epoll = ">=4.3.1, <4.3.2"
io-uring = "0.5.2"
fuse-backend-rs = { version = "0.10.0", optional = true }
fuse-backend-rs = { version = "0.10.5", optional = true }
kvm-bindings = "0.6.0"
kvm-ioctls = "0.12.0"
libc = "0.2.119"
log = "0.4.14"
nix = "0.24.3"
nydus-api = "0.3.0"
nydus-rafs = "0.3.1"
nydus-storage = "0.6.3"
nydus-api = "0.3.1"
nydus-rafs = "0.3.2"
nydus-storage = "0.6.4"
rlimit = "0.7.0"
serde = "1.0.27"
serde_json = "1.0.9"
thiserror = "1"
threadpool = "1"
virtio-bindings = "0.1.0"
virtio-queue = "0.6.0"
virtio-queue = "0.7.0"
vmm-sys-util = "0.11.0"
vm-memory = { version = "0.9.0", features = [ "backend-mmap" ] }
vm-memory = { version = "0.10.0", features = [ "backend-mmap" ] }
sendfd = "0.4.3"
[dev-dependencies]
vm-memory = { version = "0.9.0", features = [ "backend-mmap", "backend-atomic" ] }
vm-memory = { version = "0.10.0", features = [ "backend-mmap", "backend-atomic" ] }
[features]
virtio-mmio = []

View File

@@ -475,7 +475,7 @@ impl<AS: GuestAddressSpace> VirtioFs<AS> {
let (mut rafs, rafs_cfg) = match config.as_ref() {
Some(cfg) => {
let rafs_conf: Arc<ConfigV2> = Arc::new(
serde_json::from_str(cfg).map_err(|e| FsError::BackendFs(e.to_string()))?,
ConfigV2::from_str(cfg).map_err(|e| FsError::BackendFs(e.to_string()))?,
);
(

View File

@@ -0,0 +1,94 @@
// Copyright 2023 Ant Group. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0
use std::any::Any;
use std::io::{Error, Read, Write};
use std::os::unix::io::{AsRawFd, RawFd};
use std::time::Duration;
use log::error;
use nix::errno::Errno;
use super::{VsockBackendType, VsockStream};
pub struct HybridStream {
pub hybrid_stream: std::fs::File,
pub slave_stream: Option<Box<dyn VsockStream>>,
}
impl AsRawFd for HybridStream {
fn as_raw_fd(&self) -> RawFd {
self.hybrid_stream.as_raw_fd()
}
}
impl Read for HybridStream {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
self.hybrid_stream.read(buf)
}
}
impl Write for HybridStream {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
// The slave stream was only used to reply the connect result "ok <port>",
// thus it was only used once here, and the data would be replied by the
// main stream.
if let Some(mut stream) = self.slave_stream.take() {
stream.write(buf)
} else {
self.hybrid_stream.write(buf)
}
}
fn flush(&mut self) -> std::io::Result<()> {
self.hybrid_stream.flush()
}
}
impl VsockStream for HybridStream {
fn backend_type(&self) -> VsockBackendType {
VsockBackendType::HybridStream
}
fn set_nonblocking(&mut self, nonblocking: bool) -> std::io::Result<()> {
let fd = self.hybrid_stream.as_raw_fd();
let mut flag = unsafe { libc::fcntl(fd, libc::F_GETFL) };
if nonblocking {
flag = flag | libc::O_NONBLOCK;
} else {
flag = flag & !libc::O_NONBLOCK;
}
let ret = unsafe { libc::fcntl(fd, libc::F_SETFL, flag) };
if ret < 0 {
error!("failed to set fcntl for fd {} with ret {}", fd, ret);
return Err(Error::last_os_error());
}
Ok(())
}
fn set_read_timeout(&mut self, _dur: Option<Duration>) -> std::io::Result<()> {
error!("unsupported!");
Err(Errno::ENOPROTOOPT.into())
}
fn set_write_timeout(&mut self, _dur: Option<Duration>) -> std::io::Result<()> {
error!("unsupported!");
Err(Errno::ENOPROTOOPT.into())
}
fn as_any(&self) -> &dyn Any {
self
}
fn recv_data_fd(
&self,
_bytes: &mut [u8],
_fds: &mut [RawFd],
) -> std::io::Result<(usize, usize)> {
Err(Errno::ENOPROTOOPT.into())
}
}

View File

@@ -9,13 +9,14 @@ use std::io::{Read, Write};
use std::os::unix::io::{AsRawFd, RawFd};
use std::time::Duration;
mod hybrid_stream;
mod inner;
mod tcp;
mod unix_stream;
pub use self::hybrid_stream::HybridStream;
pub use self::inner::{VsockInnerBackend, VsockInnerConnector, VsockInnerStream};
pub use self::tcp::VsockTcpBackend;
pub use self::unix_stream::HybridUnixStreamBackend;
pub use self::unix_stream::VsockUnixStreamBackend;
/// The type of vsock backend.
@@ -27,6 +28,8 @@ pub enum VsockBackendType {
Tcp,
/// Inner backend
Inner,
/// Fd passed hybrid stream backend
HybridStream,
/// For test purpose
#[cfg(test)]
Test,

View File

@@ -2,7 +2,6 @@
// SPDX-License-Identifier: Apache-2.0
use std::any::Any;
use std::io::{Read, Write};
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::net::{UnixListener, UnixStream};
use std::time::Duration;
@@ -13,66 +12,6 @@ use sendfd::RecvWithFd;
use super::super::{Result, VsockError};
use super::{VsockBackend, VsockBackendType, VsockStream};
pub struct HybridUnixStreamBackend {
pub unix_stream: Box<dyn VsockStream>,
pub slave_stream: Option<Box<dyn VsockStream>>,
}
impl VsockStream for HybridUnixStreamBackend {
fn backend_type(&self) -> VsockBackendType {
self.unix_stream.backend_type()
}
fn set_nonblocking(&mut self, nonblocking: bool) -> std::io::Result<()> {
self.unix_stream.set_nonblocking(nonblocking)
}
fn set_read_timeout(&mut self, dur: Option<Duration>) -> std::io::Result<()> {
self.unix_stream.set_read_timeout(dur)
}
fn set_write_timeout(&mut self, dur: Option<Duration>) -> std::io::Result<()> {
self.unix_stream.set_write_timeout(dur)
}
fn as_any(&self) -> &dyn Any {
self.unix_stream.as_any()
}
fn recv_data_fd(&self, bytes: &mut [u8], fds: &mut [RawFd]) -> std::io::Result<(usize, usize)> {
self.unix_stream.recv_data_fd(bytes, fds)
}
}
impl AsRawFd for HybridUnixStreamBackend {
fn as_raw_fd(&self) -> RawFd {
self.unix_stream.as_raw_fd()
}
}
impl Read for HybridUnixStreamBackend {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
self.unix_stream.read(buf)
}
}
impl Write for HybridUnixStreamBackend {
fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
// The slave stream was only used to reply the connect result "ok <port>",
// thus it was only used once here, and the data would be replied by the
// main stream.
if let Some(mut stream) = self.slave_stream.take() {
stream.write(buf)
} else {
self.unix_stream.write(buf)
}
}
fn flush(&mut self) -> std::io::Result<()> {
self.unix_stream.flush()
}
}
impl VsockStream for UnixStream {
fn backend_type(&self) -> VsockBackendType {
VsockBackendType::UnixStream

View File

@@ -60,6 +60,10 @@ pub enum Error {
#[error("error connecting to a backend: {0}")]
BackendConnect(#[source] std::io::Error),
/// Error set nonblock to a backend stream.
#[error("error set nonblocking to a backend: {0}")]
BackendSetNonBlock(#[source] std::io::Error),
/// Error reading from backend.
#[error("error reading from backend: {0}")]
BackendRead(#[source] std::io::Error),

View File

@@ -36,14 +36,14 @@
/// route all these events to their handlers, the muxer uses another
/// `HashMap` object, mapping `RawFd`s to `EpollListener`s.
use std::collections::{HashMap, HashSet};
use std::fs::File;
use std::io::Read;
use std::os::fd::FromRawFd;
use std::os::unix::io::{AsRawFd, RawFd};
use std::os::unix::net::UnixStream;
use log::{debug, error, info, trace, warn};
use super::super::backend::{HybridUnixStreamBackend, VsockBackend, VsockBackendType, VsockStream};
use super::super::backend::{HybridStream, VsockBackend, VsockBackendType, VsockStream};
use super::super::csm::{ConnState, VsockConnection};
use super::super::defs::uapi;
@@ -480,13 +480,17 @@ impl VsockMuxer {
.and_then(|(nfd, local_port, peer_port)| {
// Here we should make sure the nfd the sole owner to convert it
// into an UnixStream object, otherwise, it could cause memory unsafety.
let nstream = unsafe { UnixStream::from_raw_fd(nfd) };
let nstream = unsafe { File::from_raw_fd(nfd) };
let hybridstream = HybridUnixStreamBackend {
unix_stream: Box::new(nstream),
let mut hybridstream = HybridStream {
hybrid_stream: nstream,
slave_stream: Some(stream),
};
hybridstream
.set_nonblocking(true)
.map_err(Error::BackendSetNonBlock)?;
self.add_connection(
ConnMapKey {
local_port,

View File

@@ -1195,7 +1195,7 @@ mod tests {
let mut cmdline = crate::vm::KernelConfigInfo::new(
kernel_file,
None,
linux_loader::cmdline::Cmdline::new(0x1000),
linux_loader::cmdline::Cmdline::new(0x1000).unwrap(),
);
let address_space = vm.vm_address_space().cloned();

View File

@@ -0,0 +1,110 @@
// Copyright 2021-2022 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//
extern crate procfs;
use std::sync::Mutex;
use anyhow::{anyhow, Result};
use dbs_utils::metric::IncMetric;
use prometheus::{Encoder, IntCounter, IntGaugeVec, Opts, Registry, TextEncoder};
use crate::metric::METRICS;
const NAMESPACE_KATA_HYPERVISOR: &str = "kata_hypervisor";
lazy_static! {
static ref REGISTERED: Mutex<bool> = Mutex::new(false);
// custom registry
static ref REGISTRY: Registry = Registry::new();
// hypervisor metrics
static ref HYPERVISOR_SCRAPE_COUNT: IntCounter =
IntCounter::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"scrape_count"), "Hypervisor metrics scrape count.").unwrap();
static ref HYPERVISOR_VCPU: IntGaugeVec =
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"vcpu"), "Hypervisor metrics specific to VCPUs' mode of functioning."), &["cpu_id", "item"]).unwrap();
static ref HYPERVISOR_SECCOMP: IntGaugeVec =
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"seccomp"), "Hypervisor metrics for the seccomp filtering."), &["item"]).unwrap();
static ref HYPERVISOR_SIGNALS: IntGaugeVec =
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"signals"), "Hypervisor metrics related to signals."), &["item"]).unwrap();
}
/// get prometheus metrics
pub fn get_hypervisor_metrics() -> Result<String> {
let mut registered = REGISTERED
.lock()
.map_err(|e| anyhow!("failed to check hypervisor metrics register status {:?}", e))?;
if !(*registered) {
register_hypervisor_metrics()?;
*registered = true;
}
update_hypervisor_metrics()?;
// gather all metrics and return as a String
let metric_families = REGISTRY.gather();
let mut buffer = Vec::new();
let encoder = TextEncoder::new();
encoder.encode(&metric_families, &mut buffer)?;
Ok(String::from_utf8(buffer)?)
}
fn register_hypervisor_metrics() -> Result<()> {
REGISTRY.register(Box::new(HYPERVISOR_SCRAPE_COUNT.clone()))?;
REGISTRY.register(Box::new(HYPERVISOR_VCPU.clone()))?;
REGISTRY.register(Box::new(HYPERVISOR_SECCOMP.clone()))?;
REGISTRY.register(Box::new(HYPERVISOR_SIGNALS.clone()))?;
Ok(())
}
fn update_hypervisor_metrics() -> Result<()> {
HYPERVISOR_SCRAPE_COUNT.inc();
set_intgauge_vec_vcpu(&HYPERVISOR_VCPU);
set_intgauge_vec_seccomp(&HYPERVISOR_SECCOMP);
set_intgauge_vec_signals(&HYPERVISOR_SIGNALS);
Ok(())
}
fn set_intgauge_vec_vcpu(icv: &prometheus::IntGaugeVec) {
let metric_guard = METRICS.read().unwrap();
for (cpu_id, metrics) in metric_guard.vcpu.iter() {
icv.with_label_values(&[cpu_id.to_string().as_str(), "exit_io_in"])
.set(metrics.exit_io_in.count() as i64);
icv.with_label_values(&[cpu_id.to_string().as_str(), "exit_io_out"])
.set(metrics.exit_io_out.count() as i64);
icv.with_label_values(&[cpu_id.to_string().as_str(), "exit_mmio_read"])
.set(metrics.exit_mmio_read.count() as i64);
icv.with_label_values(&[cpu_id.to_string().as_str(), "exit_mmio_write"])
.set(metrics.exit_mmio_write.count() as i64);
icv.with_label_values(&[cpu_id.to_string().as_str(), "failures"])
.set(metrics.failures.count() as i64);
icv.with_label_values(&[cpu_id.to_string().as_str(), "filter_cpuid"])
.set(metrics.filter_cpuid.count() as i64);
}
}
fn set_intgauge_vec_seccomp(icv: &prometheus::IntGaugeVec) {
let metric_guard = METRICS.read().unwrap();
icv.with_label_values(&["num_faults"])
.set(metric_guard.seccomp.num_faults.count() as i64);
}
fn set_intgauge_vec_signals(icv: &prometheus::IntGaugeVec) {
let metric_guard = METRICS.read().unwrap();
icv.with_label_values(&["sigbus"])
.set(metric_guard.signals.sigbus.count() as i64);
icv.with_label_values(&["sigsegv"])
.set(metric_guard.signals.sigsegv.count() as i64);
}

View File

@@ -9,6 +9,9 @@
//TODO: Remove this, after the rest of dragonball has been committed.
#![allow(dead_code)]
#[macro_use]
extern crate lazy_static;
/// Address space manager for virtual machines.
pub mod address_space_manager;
/// API to handle vmm requests.
@@ -19,6 +22,8 @@ pub mod config_manager;
pub mod device_manager;
/// Errors related to Virtual machine manager.
pub mod error;
/// Prometheus Metrics.
pub mod hypervisor_metrics;
/// KVM operation context for virtual machines.
pub mod kvm_context;
/// Metrics system.

Some files were not shown because too many files have changed in this diff Show More