Compare commits


1736 Commits

Author SHA1 Message Date
Fabiano Fidêncio
a001021721 Merge pull request #8292 from fidencio/topic/release-ensure-gh-is-used-from-a-git-repo
release: Always use actions/checkout to ensure we're in a git repo
2023-10-23 15:16:12 +02:00
Fabiano Fidêncio
c5cfad7023 actions: Move all the checkout actions to v4
It has been released for a while now, and we need to keep the version
we use consistent across all workflows.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 14:01:53 +02:00
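As an illustration only (the log does not say how this commit was produced), a bulk bump like this can be done with `sed`. The function name `bump_checkout` is hypothetical. It rewrites any `actions/checkout@vN` reference read from stdin to `@v4` and leaves every other line untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite every actions/checkout@vN reference on stdin
# to actions/checkout@v4, leaving all other text unchanged.
bump_checkout() {
  sed -E 's#actions/checkout@v[0-9]+#actions/checkout@v4#g'
}

# Example: a single workflow line piped through the helper.
echo "      - uses: actions/checkout@v3" | bump_checkout
# prints "      - uses: actions/checkout@v4"
```

To apply the same expression to real workflow files in place, GNU `sed -i -E` can be pointed at the files under `.github/workflows/`.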
Fabiano Fidêncio
b32c6bf805 release: Always use actions/checkout to ensure we're in a git repo
Otherwise we'll face issues like:
```
Run tag=$(echo $GITHUB_REF | cut -d/ -f3-)
  tag=$(echo $GITHUB_REF | cut -d/ -f3-)
  tarball="kata-static-$tag-amd64.tar.xz"
  mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
  pushd $GITHUB_WORKSPACE
  echo "uploading asset '${tarball}' for tag: ${tag}"
  GITHUB_TOKEN=*** gh release upload "${tag}" "${tarball}"
  popd
  shell: /usr/bin/bash -e {0}
~/work/kata-containers/kata-containers ~/work/kata-containers/kata-containers
uploading asset 'kata-static-3.3.0-alpha0-amd64.tar.xz' for tag: 3.3.0-alpha0
failed to run git: fatal: not a git repository (or any of the parent directories): .git
```

Fixes: #8286 (more precisely, a follow-up to that fix)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 14:00:39 +02:00
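The step above derives the tag and asset name from `GITHUB_REF`, then calls `gh`, which shells out to `git` and therefore needs a checkout. The sketch below reproduces that derivation and adds a defensive check. The function names `tag_from_ref`, `asset_name` and `ensure_git_repo` are hypothetical and do not come from the Kata workflow.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Derive the release tag from a ref such as "refs/tags/3.3.0-alpha0",
# mirroring the `cut -d/ -f3-` call in the workflow step above.
tag_from_ref() {
  echo "$1" | cut -d/ -f3-
}

# Build the asset name the same way the workflow does:
# kata-static-<tag>-<arch>.tar.xz
asset_name() {
  local tag
  tag=$(tag_from_ref "$1")
  echo "kata-static-${tag}-$2.tar.xz"
}

# Hypothetical guard: gh needs git, so fail early with a clear message
# when the current directory is not inside a git work tree.
ensure_git_repo() {
  if [ "$(git rev-parse --is-inside-work-tree 2>/dev/null)" != "true" ]; then
    echo "error: $(pwd) is not a git work tree; run actions/checkout first" >&2
    return 1
  fi
}

asset_name "refs/tags/3.3.0-alpha0" amd64
# prints "kata-static-3.3.0-alpha0-amd64.tar.xz"
```

The actual fix simply runs `actions/checkout` before the upload step. A guard like `ensure_git_repo` only turns the opaque `fatal: not a git repository` failure into an earlier, more explicit one.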
Fabiano Fidêncio
8fe88696c0 Merge pull request #8287 from fidencio/topic/release-use-gh-cli-instead-of-hub
actions: release: Use GH cli instead of hub
2023-10-23 12:40:22 +02:00
Fabiano Fidêncio
710eb8ab9d actions: release: Use GH cli instead of hub
hub is now deprecated, which has been causing issues with our release
process.

Let's move to the GH cli (https://cli.github.com/manual), and unblock
this release.

**NOTE**: This commit purposefully does not touch any other place where
hub is used, as switching those would require more time and
investigation, and right now we just want to unblock the release.

Fixes: #8286

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-23 08:49:55 +02:00
Fabiano Fidêncio
74d4865189 Merge pull request #8275 from fidencio/topic/ci-adapt-kata-deploy-regex-on-repo-version-update
release: Adapt the CIs using the kata-deploy image
2023-10-23 00:37:19 +02:00
Dan Mihai
732fe163f3 Merge pull request #8229 from microsoft/danmihai1/no-config-toml-endpoints
agent: no endpoint blocking from agent-config.toml
2023-10-20 11:30:43 -07:00
Fabiano Fidêncio
026f6a1a4c release: Adapt the CIs using the kata-deploy image
This is needed to properly run the CIs on branches other than main, as
the kata-deploy.yaml file on those branches does not use the `latest`
tag, but rather the latest stable release.

Fixes: #8274

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 18:59:14 +02:00
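A minimal sketch of the idea, assuming the CI rewrites the `image:` line of kata-deploy.yaml with `sed`: the pattern must match whatever tag the manifest carries (`latest` on main, a stable release on other branches), not just the literal `latest`. The function name `retag_kata_deploy`, the image `quay.io/kata-containers/kata-deploy` and the target reference are assumptions for illustration; this is not the regex from the actual commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the kata-deploy image reference on an
# "image:" line with a CI-built reference, whatever tag it currently has.
# The new reference must not contain '#' or '&' (sed delimiter/special).
retag_kata_deploy() {
  local new_ref="$1"
  sed -E "s#(image:[[:space:]]*)[^[:space:]]*kata-deploy:[^[:space:]]+#\1${new_ref}#"
}

# A stable-branch manifest (tag 3.2.0 used here only as an example).
echo "        image: quay.io/kata-containers/kata-deploy:3.2.0" \
  | retag_kata_deploy "registry.example.com/kata-deploy-ci:pr-8275"
# prints "        image: registry.example.com/kata-deploy-ci:pr-8275"
```

Matching any tag is the whole point. A pattern anchored on `:latest` would silently leave stable-branch manifests untouched, so the CI would test the released image instead of the one built from the PR.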
Fabiano Fidêncio
124f498830 Merge pull request #8266 from fidencio/3.3.0-alpha0-branch-bump
# Kata Containers 3.3.0-alpha0
2023-10-20 17:40:44 +02:00
GabyCT
8486283012 Merge pull request #8247 from GabyCT/topic/iperfudp
metrics: Add iperf udp benchmark
2023-10-20 09:21:37 -06:00
Fabiano Fidêncio
0fb69ddf6a release: Kata Containers 3.3.0-alpha0
- kata-deploy-stable: Switch to using the ubuntu based payload
- libs: protection: Fix typo in TDX output
- ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bats
- tests: Enable agent stability test
- docs: Fix paths to build kernel in SNP VMs documentation
- runtime-rs: ch: Add TDX CH features check
- runtime: Validate hypervisor section name in config file
- tests: query data from the OPA service
- release: tag_repos: Stop tagging the `tests` repo
- metrics: fixes common.sh function to always return true
- Memory footprint test: remove trailing commas to make the JSON results file valid
- policy: allow access to ReseedRandomDev
- runtime/kata-ctl: update dependencies
- runtime-rs: fix Nydus support for runtime-rs + Dragonball
- metrics: Remove the reference to the fio dax subtest from the documentation
- runtime-rs: ch: Detect Intel TDX version
- runtime-rs: use the same base64 as kata-runtime/direct-volume does
- tests: Enable scability test for stability CI
- runtime-rs: Add support for adding vfio device for cloud-hypervisor
- tests: Enable soak parallel stability test
- dragonball: vcpu metrics change to be recorded per vcpu
- ci: k8s: adapt gha-run.sh to run locally
- metrics: removes kata components and k8s deployment when test finishes
- GHA: fix referenced YAML files exceeding the limit of 20
- gha: ci: Revert tracing test PR to unbreak CI
- runtime-rs: ch: Enable feature
- gha: ci: Port runk tests over
- ci: gha: Port tracing tests over
- Enable fio test using containerd client
- gha: Add stability tests workflow for gha
- gha: arm64: Ensure the builder is arm64-builder
- kata-deploy: Build kata-agent as we build all the other components
- versions: migrate out of k8s.gcr.io
- doc: Update crictl pod-config
- gha: Fix k0s deployment
- tests: Add stability test for kata CI
- docs: Update url in kata vra document
- gpu: Adding CDI support for cold and hot-plug of VFIO devices
- kata-deploy: build & ship the rust components from src/tools/
- metrics: Add latency value limits for kata CI
- runtime: fix reading cgroup stats of sandboxes
- Upgrade to Cloud Hypervisor v35.0
- ci: Port kata-monitor tests from Jenkins to GHA
- metrics: Fix latency yamls path
- metrics: Fix metrics README
- metrics: Fix C-Ray documentation
- runtime-rs: ch: Enable Intel TDX
- ci: k8s: crio: Follow up patches to have CRI-O also working as part of our CI
- metrics: Enable latency test in gha run script
- local-build: Fix .docker ownership before build-payload
- runtime-rs: Add network support for cloud-hypervisor
- osbuild: Reduce guest components binary size with strip
- gha: Add pandoc as a dependency for static checks
- ci: rootfs-image build-asset is failing
- feat(runtime-rs): introduce huge page mode to select VM RAM's backend
- clh: Direct IO support for block devices
- gha: Install hunspell for static checks
- ci: Trigger payload-after-push on workflow_dispatch
- ci: Actually enable the CRI-O tests
- protocol: remove gogoprotobuf tests
- ci: k8s: Also run tests with CRI-O
- runtime: support kernel params including spaces
- ci: kata-deploy: Fix runner name
- metrics: Enable parallel bandwidth iperf limit
- ci: kata-deploy: Enable all k8s flavours that we support
- ci: Create clusters in individual resource groups
- versions: Bump virtiofsd to v1.8.0
- clh: arm: Use static_sandbox_resource_mgmt=true
- Bump nydus versions and update nydus tests
- runtime/qemu: Rework QMP/HMP support
- clh:arm64: use arm AMBA UART for hypervisor debug
- ci: Use variable size of VMs depending on the tests running
- ci: Rework static checks
- runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
- ci: cache: Check the sha256sum of the components & fix ovmf-sev cache usage
- ci: cache: Use the artefacts stored in ghcr.io/kata-containers/cached-artefacts/${component}
- ci: Run some of the GARM tests in smaller instances
- ci: Reduce the size of the AKS VMs
- ci: cache: Allow pushing our artefacts to an OCI registry
- metrics: Add iperf value for cpu utilization
- ci: cache: Export env vars needed to use ORAS
- gha: vfio: Import test script
- tests: fix kernel and initrd annotations
- metrics: Add iperf bandwidth value for kata metrics
- metrics: Add Cassandra Metrics documentation
- metrics: Remove warning from metrics documentation
- ci: docker: nerdctl: Switch to tcp port 80 ping
- runtime: Naming conflict of network devices
- Remove gogoproto.nullable extension
- metrics: Ensure docker is running in init_env
- metrics: Temporarily skip the FIO test to fix issues
- ci: Add a very basic nerdctl sanity test
- runtime-rs: hypervisor: Remove debug kernel options
- versions: Bump rust version
- ci: Add a very basic docker sanity test
- dragonball: fix for non-deterministic builds
- runtime-rs: bring hybrid vsock devices into the manager
- ci: use github.ref_name instead of $GITHUB_REF_NAME
- ci: Add more target-branch related fixes
- ci: Fix target-branch usage
- agent: optimize the code of systemd cgroup manager
- gha: Manually rebase PR atop of the target branch before testing
- Update kernel to the latest LTS release (v6.1.52) and bring in erofs patches needed for the CC work
- kata-deploy: Fix aarch64 image build
- runtime: Fix more virtiofs args
- kata-deploy: Switch to an alpine image
- metrics: Use TensorFlow optimized image
- metrics: fix FIO test initialization
- ci: k8s: Add clean-up-garm argument for gha-run.sh
- ci: k8s: Second round of fix-ups with the devmapper CI
- metrics: re-enable memory-usage initialization step
- Dragonball: optimize the placement of dbs-upcall features
- ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
- ci: k8s: Add k8s devmapper tests (part 0)
- kata-deploy: Create kata-static.tar with correct ownership
- runtime: run prestart hooks before starting VM for FC
- metrics: Add write 95 percentile FIO value
- runtime: Allow virtio_fs_extra_args annotation
- packaging: do not install docker-compose-plugin for s390x|ppc64le
- runtime-rs: Fix volumes and rootfs cleanup issues
- metrics: Enable iperf benchmark on gha for kata metrics
- CI: switch static-checks-dragonball CI machines to Azure
- metrics: Add README for kata metrics report
- osbuilder: Remove chcon operation for guest SELinux
- kata-sys-util: protection: Update TDX checks
- Improve the way to clean up storage devices for sandbox
- agent: avoid possible leakage of storage device
- tests: add policy to existing tests
- gha: Rebase PR atop of the target branch before testing
- versions: Update alpine to its 3.18 version
- runtime: Fix data race in ioCopy
- metrics: Add grabdata script for metrics report
- Fix tests on AMD machines
- metrics: Enable FIO limits for kata metrics
- metrics: Add metrics report script
- metrics: Fix memory inside limits for kata metrics
- metrics: fix parsing issue on memory-usage test
- dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
- tests: Add confidential test
- tdx: Update the components needed for using the 6.2 kernel stack
- tests: delete k8s deployment at the test's end
- tests: use unique test name
- runtime-rs: check peer close in log_forwarder
- gha: Avoid "fail-fast" in tests that are known to be flaky
- Refine storage device management for kata-agent
- metrics: Remove unused variable in tensorflow nhwc script
- kata-deploy: Don't try to remove /opt/kata
- metrics: Add TensorFlow ResNet50 FP32 benchmark
- gha: vfio: Run on Ubuntu 23.04 runner
- kata-agent: use default filemode for block device when it is set to 0
- kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
- libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
- local-build: Remove GID before creating group
- kata-deploy: Avoid failing on content removal
- runtime: fix image and initrd assets handling
- metrics: Add disk link to README
- metrics: Fix FIO path
- gha: capture additional kata-deploy output
- metrics: Use function from metrics common in pytorch script
- metrics: Enable kata runtime in K8s for FIO test.
- metrics: Fix README for pytorch
- metrics: Remove unused variable in tensorflow mobilenet script
- rootfs: agent: Policy support with AGENT_INIT=yes
- gha: k8s: kata-deploy: Move kata-deploy specific tests from integration/kubernetes to functional/kata-deploy
- metrics: Fix check results for tensorflow benchmark
- metrics: Add Tensorflow ResNet50 int8 benchmark
- kata-deploy: Properly create default runtime class
- agent: simplify error handling
- metrics: Fix MobileNet help me description
- gha: ci: Start running kata-deploy tests
- runk: Modify kill command's error message for containerd tests
- runtime-rs: add driver option
- gha: cri-containerd: Enable tests
- metrics: Rename tensorflow scripts
- gha: tests: Add kata-deploy functional tests -- Part 1
- agent: runtime: add Agent Policy feature
- runk: Support without pid ns
- metrics: Add Cassandra Kubernetes benchmark for kata metrics
- metrics: Add common functions to the common script
- metrics: fix the loop used to stop kata components
- docs: Remove installation step in virtcontainers doc
- Propagate secrets, config maps, etc. into the guest if sharedFS is not available
- kata-deploy: Preliminary k0s support
- gha: static-checks: Move to the Azure instances
- versions: Update firecracker version to 1.4.0
- agent: Allow clippy::redundant_clone in the unit tests
- agent: avoid creating new `Vec` instances when easily avoidable
- metrics: compute tensorflow statistics
- metrics: Add network nginx benchmark
- metrics: install kata once and run multiple checks
- ci: unencrypted-image: Fix build context
- ci: create-confidential-image: Add dependent actions
- Follow up fixes for https://github.com/kata-containers/kata-containers/pull/7596
- tests: Create image that will be used in the unencrypted confidential tests
- kata-deploy: Ensure we cover SHIMS / DEFAULT_SHIM as part of our tests
- tests: upgrade bats version
- Fix minor bugs and improve coding style of agent rpc/sandbox/mount
- deps: Bump dependent crate versions
- fix number of queues handling in dragonball share fs device
- runtime-rs: Introduce directly attachable network
- metrics: General improvements to mobilenet tensorflow test
- gha: Add iperf network metrics
- docs: Use control-plane term instead of master
- agent: avoid unnecessary calls to `Arc::clone`
- metrics: Add network latency test
- Image pulling on the host
- Use version 0.10.4 of `fuse-backend-rs`
- kata-deploy: Use host's systemctl
- release: Revert kata-deploy changes after 3.2.0-rc0 release
- metrics: stop kata components before starting a metric test
- runtime-rs: Add block device handling for cloud hypervisor

a93fdb014 kata-deploy-stable: Adapt to what we're using in the stable branch
36109da93 ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bats
d01daf749 tests: Adjust timeout for agent stability test
9b14dda14 libs: protection: Fix typo in TDX output
0e0867f15 runtime-rs: ch: Add TDX CH features check
409eadddb runtime-rs: ch: Improve readability of guest protection checks
82a0814fc tests: Enable agent stability test
32be8e3a8 tests: query data from the OPA service
b81c0a669 tests: encode policy file during test
4f9681b41 metrics: fixes common.sh function to always return true
2ef2b2a6d docs: Fix paths to build kernel in SNP VMs documentation
408b59c02 runtime-rs: fix bugs to support Nydus v5
157caea9f Revert "nydus: Temporarily skip tests on dragonball"
678fe3cd3 Dragonball: fix Nydus config serde problem
b6ec62138 policy: allow access to ReseedRandomDev
908519db9 metrics: skips docker restart when it is not installed or is masked.
c2763120a metrics: Remove trailing comma characters from the JSON file
3e8cf6959 runtime: Validate hypervisor section name in config file
ef6388e81 tests: Remove unused function from scability test
fbc8f8f46 scripts: Use install_yq from the `kata-containers` repo
65b1a2d27 release: tag_repos: Stop tagging / updating the `tests` repo
87b760f56 runtime-rs: ch: Detect Intel TDX version
73e81f5e3 runtime-rs: unify base64 encoding for direct-volume
c6463cb5a tests: Fix path for versions yaml for soak parallel test
89c9454fc metrics: Remove the reference to the dax test from the documentation
30ff58904 tests: Enable scability test for stability CI
8d6f7b909 runtime-rs: Add support for handling vfio device for cloud-hypervisor
e786b2b01 gha: Add install dependencies for stability tests
dbfe6512f dragonball: vcpu metrics change to be recorded per vcpu
fa60fbe02 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
500d1c5ce kata-ctl: update rustls-webpki/webpki dependency
d7660d82a runtime: unify gopkg.in/yaml.v3 to v3.0.1
fc9a107e8 runtime: unify swag and testify dependency
79ebb959c runtime: update runc dependency to v1.1.9
7f3e8bd65 runtime: unify golang.org/x/text to v0.7.0
df325ae37 runtime: update golang.org/x/net to v0.7.0
bba34910d metrics: stops kata components and k8s deployment when test finishes
84e3d884e gha: Add general dependencies to stability tests
dec3951ca tests: Add soak parallel stability test
0f04d527d tests: Enable soak parallel test
e669282c2 ci: k8s: set KUBERNETES default value
c30c3ff18 tests: run k8s-volume on a given node
666993da8 tests: run k8s-file-volume on a given node
3a00fc910 tests: exec_host() now gets the node name
61c9c17bf tests: add get_one_kata_node() to tests_common.sh
68f083c4d ci: k8s: set KATA_HYPERVISOR default value
6677a61fe ci: k8s: configurable deploy kata timeout
200e54292 ci: k8s: shellcheck fixes to gha-run.sh
4af78be13 kata-deploy: re-format kata-[deploy|cleanup].yaml
d54e6d9cd ci: k8s: run_tests() for kcli
c2ef1f0fb ci: k8s: add deploy-kata-kcli() to gha-run.sh
d2be8eef1 ci: k8s: add cleanup-kcli() to gha-run.sh
cbb9aa15b ci: k8s: set default image for deploy_kata()
89bef7d03 ci: k8s: create k8s clusters with kcli
954d40cce gha: combine coco jobs into a single yaml
b60e0a9b5 gha: combine basic amd64 jobs into a single yaml
e9bd85211 gha: ci: Revert tracing test PR to unbreak CI
b8a46a4b8 runtime-rs: ch: Enable feature
0f2dc8c67 gha: Add containerd stability tests to ci yaml
da91c9df8 ci: Port runk tests to this repo
7f2377276 ci: Add placeholder for runk tests
9205acc3d ci: Move tracing tests here
85d290a04 gha: Add stability gha run script
54f0c8f88 gha: Add stability tests workflow for gha
3bb2923e5 ci: Add placeholder for tracing tests
2c3bf406d ci: Create a function to install docker
119f03de2 gha: arm64: Ensure the builder is arm64-builder
8c498ef5e metrics: Use jq tool to pretty-print json metrics output
a2159a636 metrics: Enables FIO test for kata containers
70e7ec3e2 gha: Fix k0s deployment
560bbffb5 packaging: tools: Remove `set -x` leftover
18fa483d9 packaging: release: Mention newly added images
ca3b88837 packaging: tools: Fix container image env var name
5ca66795c packaging: Allow passing the TOOLS_CONTAINER_BUILDER
02acef957 gha: Build the kata-agent as part of our workflows
5208386ab packaging: Build the kata-agent
1727487ee agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
45c118883 packaging: Add get_agent_image_name()
0db8fb8f9 versions: migrate out of k8s.gcr.io
a1a054367 doc: Fix spelling
6339605a1 tests: Add general stability fixes
59ae24444 doc: Update crictl pod-config
fd19f4082 tests: Add agent stability test
215577032 tests: Add cassandra stress in stability tests
f2d3ea988 tests: Add stressng dockerfile for stability tests
6493aa309 tests: Add stressor CPU test for stability tests
ef68a3a36 metrics: Add stability test for kata CI
7c934dc7d gpu: Fix cold-plug of VFIO devices
8d66ef518 metrics: Increase qemu jitter value
5600e28b5 metrics: Increase jitter value for clh
a6b1f5e21 ci: Build src/tools components as part of our tests / releases
501a168a8 kata-deploy: Build components from src/tools
6ef42db5e static-build: Add scripts to build content from src/tools
4d08ec29b packaging: Add get_tools_image_name()
98097c96d packaging: Use git abbreviated hash
489caf1ad ci: kata-monitor: Move tests over
a3fb067f1 ci: Add placeholder for kata-monitor tests
57cb4ce20 ci: Make install_kata aware of container engines
de1eeee33 ci: Create a generic install_crio function
64a200085 ci: Add install_cni_plugins helper
8132fe15c ci: Modify containerd default config
8cb7df1be metrics: Add checkmetrics for latency test
e90440ae2 metrics: Add qemu latency value limit
a74a8f8a9 metrics: Add latency value limits for kata CI
d7def8317 metrics: Fix general check static warnings
928553d1b docs: Update url in kata vra document
b0a3293d5 runtime-rs: ch: Enable Intel TDX
523399c32 runtime-rs: ch: Add more consts
dea806581 runtime-rs: ch: Remove unused function
995f2c015 runtime-rs: ch: Only handle particular pending device types
b1b96a5c4 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
9ac29b8d3 metrics: Add init_env function to latency test
dfd0c9fa9 runtime: clh: Re-generate the client code
8f9f087e3 versions: Upgrade to Cloud Hypervisor v35.0
81c8babca metrics: Fix latency yamls path
481573682 metrics: Fix C-Ray documentation
ef63d67c4 ci: crio: Trim trailing '\r' from exec_host() output
74c12b292 ci: crio: Enable default capabilities
358dc2f56 kata-deploy: Fix CRI-O detection
ebaa4fa4c ci: crio: Pass `-y` to apt
97e73b223 metrics: Fix spelling warnings
36c8cd6f1 metrics: Fix metrics README
15425a2b8 local-build: Fix .docker ownership before build-payload
13ca7d9f9 gha: Add pandoc as a dependency for static checks
08bc8e4db metrics: Add latency benchmark for gha
6776b55d7 metrics: Enable latency test in gha run script
94e2ccc2d runtime: fix reading cgroup stats of sandboxes
d507d189b fc: Add support for noflush cache option
2ca781518 clh: Direct IO support for block devices
0c95697cc ci: Trigger payload-after-push on workflow_dispatch
28cbc3b51 ci: rootfs-image build-asset is failing (Fixes: #8027)
87a861648 gha: Install hunspell for static checks
8c3c50ca8 ci: Actually enable the CRI-O tests
3a6510ad6 osbuild: Reduce guest components binary size with strip
07a6e63a6 ci: k8s: rke2: Use sudo to call systemd
03b82e848 ci: k8s: Add a CRI-O test
d7105cf7a ci: k8s: Add a method to install CRI-O
54c0a471b ci: k8s: k0s: Allow passing parameters to the k0s installer
730ef5169 deps: updating dependencies
3a2c83d69 ci: kata-deploy: Fix runner name
82ff2db46 runtime: support kernel params including spaces
604a9dd67 protocol: remove gogoprotobuf tests
f7fa7f602 ci: Enable kata-deploy tests for all the supported k8s flavours
2c908b598 ci: kata-deploy: Add the ability to deploy rke2
eaf616491 ci: kata-deploy: Add the ability to deploy k0s
001525763 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
bf2cb0228 ci: kata-deploy: Expand tests to run on k0s / rke2
b12b9e188 ci: kata-deploy: Add placeholder for tests on GARM
9e1fb8a96 ci: kata-deploy: Export KUBERNETES env var
09cc0ed43 ci: Move deploy_k8s() to gha-run-k8s-common.sh
486fe14c9 ci: Properly set K8S_TEST_UNION
d9ef1352a ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
68267a399 ci: Create clusters in individual resource groups
9aa8d1c91 metrics: Add parallel bandwidth limit for qemu
44c7c082d versions: Bump virtiofsd to v1.8.0
af59d4bf4 metrics: Enable parallel bandwidth iperf limit
aba36ab18 nydus: Temporarily skip tests on dragonball
b8a8dfcd1 nydus: Use `kata-${KATA_HYPERVISOR}` instead of `kata`
f6df3d6ef static-build: Fix arch error on nydus build
2f9c9e2e6 tests: nydus: Update nydus tests
c9a4e7e46 versions: Bump nydus and nydus-snapshotter to its latest release
b73bde320 gha: nydus: Populate run()
b3904a1a3 gha: nydus: Populate install_dependencies()
d2b3b67f5 gha: nydus: Actually install kata when `install-kata` is called
0ec00ad42 gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
568439c77 tests: nydus: Add timeout to the crictl calls
5ac3b76eb tests: nydus: Add uid / namespace to the nydus container / sandbox
376574a16 tests: nydus: Decorate some calls with `sudo`
4290fd4b6 tests: nydus: Adapt "source ..." to GHA
a84efa3e8 tests: nydus: Adapt check to "clh" instead of "cloud-hypervisor"
56a14b395 tests: common: Add install_nydus_snapshotter()
b6563783e tests: common: Add install_nydus()
72599f191 clh: arm: Use static_sandbox_resource_mgmt=true
1f16b6627 runtime/qemu: Rework QMP/HMP support
8b1e9b0c7 ci: static-checks: Clean up static-checks job
2c5ca2eaf ci: static-checks: Run tests depending on KVM
509c309ab ci: static-checks: Move "sudo make test" to the new test matrix
4e963cedf ci: static-checks: Move "make test" to the new test matrix
08f2e5ae0 runtime-rs: Ensure static-checks-build is a dep of `make test`
2bc3a616a kata-ctl: Use `loop` instead of `kvm` module in tests
46daddc50 kata-ctl: Ensure GENERATED_CODE is a dep of `make test`
ec826f328 agent: Ensure GENERATED_CODE is a dep of `make test`
1d32410a8 ci: install_libseccomp: Do not depend on the tests repo
bf888b9a5 ci: static-checks: Move "make check" to the new test matrix
473ec8780 kata-ctl: Add `kata-types` to the Cargo.lock file
ea19549a9 kata-ctl: Ensure GENERATED_CODE is a dep of `make check`
e12577586 tests: install_rust: Also install clippy
e2c61a152 ci: static-checks: Move vendor check to its own job
6794d4c84 tests: Move install_rust.sh from the tests repo
e64508c30 tests: install_go: Remove tests repo dependency
11dff731b tests: Move functions from kata_arch script here
75c974c80 ci: static-checks: Move kernel config check to its own job
9c233bb9e test: Add test to verify try_from for clh Netconfig
c69a1e33b ci: Use variable size of VMs depending on the tests running
9049d311d runtime-rs: Add network support for cloud-hypervisor
eecd5bf2a ci: cache: Fix ovmf-sev cache
86c41074b ci: cache: Check the sha256sum of the component
460988c5f ci: cache: Remove the script used to cache artefacts on Jenkins
4533a7a41 ci: cache: Also store the ${component} sha256sum
eccc76df6 ci: cache: Use the cached artefacts from ORAS
7f5e77bcb kernel: enable Arm pl011 support
241c355e0 clh:arm64: use arm AMBA uart for hypervisor debug
094b6b2cf ci: k8s: Temporarily disable tests that require a bigger VM instance
d0c257b3a ci: cache: Push cached artefacts to ghcr.io
108f1b60d kata-deploy: Generate latest_{artefact,image_builder} files
be2eb7b37 ci: cache: Install ORAS in the kata-deploy binaries builder container
fb24fb0dc ci: k8s: devmapper: Use a smaller / cheaper VM instance
1daf02f5d ci: nydus: Use a smaller / cheaper VM instance
e60d81f55 ci: nerdctl: Use a smaller / cheaper VM instance
4db416997 ci: docker: Use a smaller / cheaper VM instance
32841827b ci: cri-containerd: Use a smaller / cheaper VM instance
92fff129f ci: k8s: Don't set cpu limit request for k8s-inotify test
faf98c062 ci: Reduce the size of the AKS VMs
adc18ecdb ci: cache: For consistency, read all used env vars
c7a851efd ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
6bd15a85d ci: cache: Export env vars needed to use ORAS
cd4fd1292 metrics: Add iperf cpu utilization limit for qemu
df5cd10ea metrics: Add iperf value for cpu utilization
a96050a7a tests: Apply timeout to 'ctr t kill'
9d9303678 tests/vfio: Bump VM image to Fedora 38
faee59b52 tests/vfio: Accept single device in vfio group for CLH
df3dc1105 tests/vfio: Get rid of sync's
7211c3dcc gha: vfio: Set test timeout to 15m
1b02f89e4 packaging: kernel: Enable VIRTIO_IOMMU on x86_64
3a1db7a86 runtime: clh: Support enabling iommu
9f1a42c6c tests/vfio: Give commands 30s to execute
b46b0ecf8 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
bfc93927f runtime: Remove redundant check in checkPCIeConfig
7c4e73b60 runtime: Add test cases for checkPCIeConfig
fc51e4b9e runtime: Check config for supported CLH (cold|hot)_plug_vfio values
509771e6f runtime: clh: Add hot_plug_vfio entry to config
5f6475a28 tests/vfio: Gather debug info and disable tdp_mmu
8fffdc81c tests/vfio: Capture journal from vm
df815087e tests/vfio: Change to get the test working in GHA
a92ddeea1 tests/vfio: Move dependency installation to gha-run.sh
5a551a85b gha: vfio: Import jobs scripts from tests repo
49e2fa189 metrics: Increase jitter value for qemu
49234433a metrics: Increase value limit for jitter in clh
813bfdec0 ci: docker: nerdctl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
46bc0b1c0 ci: nerdctl: Create the containerd config
13968aa7f ci: nerdctl: Switch to tcp port 80 ping
e0c811678 ci: docker: Switch to tcp port 80 ping
1636abbe1 runtime: issue with non-empty []Endpoint in RemoveEndpoints
0aa073967 metrics: Add iperf bandwidth value for qemu
c0ad91476 tests: fix kernel and initrd annotations
615c1cbf1 metrics: Add iperf bandwidth value for kata metrics
d53eb73ee metrics: Ensure docker is running in init_env
ad08321b8 metrics: Add Cassandra Metrics documentation
a58ea6659 metrics: Temporarily skip the FIO test to fix issues
f536ef5ce ci: docker: Also run the smoke test with runc
c83f167c5 ci: docker: Run the tests after the kata-static is created
12d833d07 ci: Add a very basic nerdctl sanity test
348b8644d ci: Add a very basic docker sanity test
a75fd5eb8 runk: Fix rust unnecessary mut error
a31c14517 kata-ctl: useless-vec warning
c8419fc3b kata-ctl: Resolve non-minimal-cfg warning
3eaf68d95 agent-ctl: Allow clippy lint
1d8b78959 runtime-rs: Fix useless-vec warning
99f3d69e9 runtime-rs: Remove mut
16fbc27b0 dragonball: Allow ambiguous-glob-reexports
bbf191951 dragonball: Resolve non-minimal-cfg warning
75cfdd5d5 agent: config: Allow clippy lint
f3a0fd590 agent: config: Fix useless-vec warning
9e423bd3d libs: Fix clippy unnecessary hashes error
444395050 versions: Bump rust version
a16b0962b chore(cargo): update cargo lock
ca4b6b051 runtime: Naming conflict of network devices
202049f35 feat(runtime-rs): introduce huge page type to select VM RAM's backend
f811b064c ci: use github.ref_name instead of $GITHUB_REF_NAME
6d795c089 ci: Add more target-branch related fixes
8509c3187 ci: Fix target-branch usage
060499dca metrics: Remove warning from metrics documentation
c0f697fcc runtime: Allow kernel_params annotation
b03e49794 dragonball: fix for non-deterministic builds
976d10150 runtime-rs: hypervisor: Remove debug kernel options
fde34610c kernel: Add erofs patches needed for CC related work
dc6a4588a versions: Bump kernel to the latest LTS release (6.1.52)
52f6449b7 kata-manager: Remove initcall_debug kernel option
8b4a0b368 kata-deploy: Remove curl after it's used
139c7f03a kata-deploy: Fix aarch64 image build
470d06541 agent: optimize the code of systemd cgroup manager
bd24afcf7 gha: Manually rebase PR atop of the target branch before testing
72c510d05 runtime/virtiofsd: Drop all references to "--cache=none"
ead724bec protocol: removing gogo.nullable feature
d8e4bb985 protocol: remove unused PROTO_FILE env
5e1106a77 protocol: remove unused import_path
87accaaec protocol: use workdir during build
711a7ed96 protocol: remove mapping definitions
8db84c1bd protocol: force GOPATH to be set
68156d77a protocol: break lines to improve readability
670a8e9c7 kata-deploy: Switch to an alpine image
9d74b7ccc k8s: ci: Skip "Pod quota" test with firecracker
f6cd3930c ci: k8s: Remove useless skip statement from tests
3cc20b47a ci: k8s: Also check for "fc" (for firecracker)
b5bad3cb0 ci: k8s: Add clean-up-garm argument for gha-run.sh
aaec5a09f ci: k8s: devmapper tests should be using ubuntu 20.04
27fa7d828 ci: k8s: Add a kata-deploy-garm target
fa62a4c01 ci: k8s: Export KUBERNETES env var
8c9380a79 ci: k8s: Install bats on GARM runners
3de23034f ci: k8s: Wait some time after restarting k3s
adfea55b8 metrics: fix FIO test initialization
2df183fd9 ci: k8s: Append, instead of overwrite, the devmapper config
369a8af8f ci: k8s: Decrease k3s sleep from 4 to 2 minutes
ada65b988 ci: k8s: Use vanilla kubectl with k3s
ad45ab5d3 ci: k8s: Ensure k3s is deployed with --write-kubeconfig-mode=644
028a97e0d ci: k8s: Use the proper command for sleep
3a427795e metrics: Use TensorFlow optimized image
8d99972a8 ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
deed1b927 Dragonball: optimize the placement of dbs-upcall features
0e8bd50cb ci: k8s: Add k8s devmapper tests (part 0)
b28b54df0 ci: k8s: Add a function to configure devmapper for containerd
54f711721 ci: k8s: Add a function to deploy k3s
81536f21a runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
b1dd09a4d runtime: Allow virtio_fs_extra_args annotation
2efda20c7 packaging: do not install docker-compose-plugin for s390x|ppc64le
438fbf966 metrics: Add write 95 percentile for FIO for qemu
024b4d2ff metrics: Add write 95 percentile FIO value
e98e5cdea metrics: Add checkmetrics to gha run script
c1edfe551 metrics: Add checkmetrics value for qemu for iperf
6a79ecedf metrics: Add jitter value for clh
f609a9a75 metrics: Add test selector to iperf metrics
5b8db3042 metrics: Enable iperf benchmark on gha for kata metrics
60f733d30 CI: switch static-checks-dragonball CI machines to Azure
7870b33a2 runtime-rs: bring hybridVsock devices in manager.
18c94ebbe kata-deploy: Create kata-static.tar with correct ownership
57e7bf14a agent: refine StorageDeviceGeneric::cleanup()
53edb1937 agent: implement StorageDeviceGeneric::cleanup()
0c63453e2 types: make StorageDevice::cleanup() return possible error code
3a3d77b3b agent: move StorageDeviceGeneric from kata-types into agent
b151cfd14 metrics: re-enable memory-usage initialization step
f3e1a6a94 osbuilder: alpine: Change mirror
ac612aef5 osbuilder: alpine: Match the version on versions.yaml
9cd706d1c agent: avoid possible leakage of storage device
bf21411e9 tests: add policy to k8s tests
d0e061067 runtime: config: use the SEV initrd for SNP
67fed26f1 runtime: Use TDX image within the qemu-tdx config
ac939c458 gha: Rebase atop of the target branch
82cd14ba3 versions: Update alpine to its 3.18 version
666882575 metrics: Add grabdata script for metrics report
c290eaed8 kata-sys-util: protection: Update TDX checks
d7a996c68 gha: Update to checkout@v3 action
c2ba29c15 runtime: Fix data race in ioCopy
211de08d9 osbuilder: Remove chcon operation for guest SELinux
9f21fa9b3 metrics: Add report generator link to general documentation
c0ed5ea0a metrics: Add README for kata metrics report
a7b59a5bf metrics: Add limit for 90 percentile for qemu value
99db6568e metrics: Add limit for write 90 percentile value for clh
6e06392c5 metrics: Enable FIO limits for kata metrics
2e4c87472 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
21204caf2 runtime: fail early when starting docker container with FC
32fd01371 runtime: run prestart hooks before starting VM for FC
00e7ffd98 tests: check vmx only on Intel machines
c8dd3c073 metrics: Fix memory footprint qemu limit
8877ec62f metrics: Fix memory inside limits for kata metrics
80146f207 tests: Fixes cpuType check on AMD machines
7e364716d metrics: Add test setup details to metrics report
17dc1b976 metrics: Add boot lifecycle times to metrics report
3b0d6538f metrics: Add memory inside container to metrics report
79fbb9d24 metrics: Add scaling system footprint in metrics report
8e6d4e6f3 metrics: Add metrics reportgen
139ffd4f7 metrics: Add report file titles
878d1a2e7 metrics: Generate PNGs alongside the PDF report
fce248797 metrics: Add metrics report R files
08812074d metrics: Add report dockerfile
69781fc02 metrics: Add metrics report script
e286e842c tests: Expand confidential test to support TDX
e31f099be tests: Expand confidential test to support SNP
c3b9d4945 tests: Add confidential test for SEV
538c965c2 metrics: fix parsing issue on memory-usage test
3818bf331 local-build: Remove $HOME/.docker/buildx/activity/default
d1b54ede2 qemu: tdx: Workaround SMP issue with TDX 1.5
1e34220c4 qemu: tdx: Adapt to the TDX 1.5 stack
8115a0522 versions: tdx: Update Kernel to 6.2 + TDX
ec18180f3 versions: tdx: Update TDVF to the "edk2-stable202302"
9803b2428 versions: tdx: Update QEMU to v7.2 + TDX v1.10
dffc16e5b runtime-rs: check peer close in log_forwarder
aaa5ab126 agent: simplify storage device by removing StorageDeviceObject
fb49d5d7c gha: Avoid "fail-fast" in tests that are known to be flaky
183f51d6f tests: use unique test name
6a974679f tests: delete k8s deployment at the test's end
32a778b6d metrics: Remove unused variable in tensorflow nhwc script
d8f3ce649 kata-deploy: Don't try to remove /opt/kata
936e8091a gha: vfio: Run on Ubuntu 23.04 runner
0e7248264 agent: move storage device related code into dedicated files
268e84655 runtime-rs: Fix volumes and rootfs cleanup issues
8f49ee33b agent: refine storage related code a bit
60ca12ccb agent: switch to new storage subsystem
fcbda0b41 kata-types: introduce StorageDevice and StorageHandlerManager
b03b1f613 agent: simplify the way to manage storage object
8392c71bf sys-util: support more mount flags in parse_mount_options()
c00d8f3d4 agent: use create_mount_destination() from kata-sys-util
5e867f053 types: add more mount related constants
880e6c9a7 agent: use function from kata-sys-utils to reduce code
3b881fbc0 local-build: Remove GID before creating group
959ca4944 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
4b7d72c4a metrics: Add TensorFlow ResNet50 FP32 benchmark
5cba38c17 kata-deploy: Avoid failing on content removal
18d42da21 runtime/fc: fix image/initrd annotation handling
9fda7059a runtime/clh: fix image/initrd annotation handling
1a0092d63 runtime/qemu: fix image/initrd annotation handling
22d8f335d libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
8afd158ce metrics: Add disk link to README
40914b25d kata-agent: use default filemode for block device when it is set to 0
eee2ee6ee metrics: Fix FIO path
39bc3488f metrics: Use function from metrics common in pytorch script
400eb8874 gha: capture additional kata-deploy output
4aee3eade kata-types: implement serde methods for KataVirtualVolume
b875e3932 kata-types: validate KataVirtualVolume object
fa2fdc105 kata-types: implement two conversion helpers for KataVirtualVolume
6326af20e kata-types: introduce KataVirtualVolume
c8b43f8b3 metrics: Fix README for pytorch
fb571f8be metrics: Enable kata runtime in K8s for FIO test.
cb056f8cb rootfs: agent: Policy support with AGENT_INIT=yes
85c02828e metrics: Update tensorflow name in gha run script
e8a511934 metrics: Fix check results for tensorflow benchmark
2d896ad12 gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
4ffc2c86f gha: kata-deploy: Add the first kata-deploy test
8616c050a metrics: Remove unused variable in tensorflow mobilenet script
285e616b5 tests: common: Ensure test_type is used as part of the cluster's name
790bd3548 tests: common: Don't fail if yq is not part of the cache
ce6adecd0 gha: kata-deploy: Add run-kata-deploy-tests.sh
cfc29c11a gha: k8s: Stop running kata-deploy tests as part of the k8s suite
f4dd15286 tests: k8s: Call ensure_yq() in setup.sh
339569b69 kata-deploy: Properly create default runtime class
2a491e9b1 metrics: Fix MobileNet help me description
d19a75e80 gha: ci: Start running kata-deploy tests
d90f7ac68 runtime-rs: add unit test for block driver
e44919f0d runtime-rs: add load_test_config for unit test
7f48a6937 runtime-rs: add driver option
bade6a5c3 docs: Fix TensorFlow word across the document
1a1b20776 docs: Add Tensorflow Resnet50 documentation
24baededc metrics: Add Dockerfile for ResNet50 int8
6d971ba8d metrics: Add Tensorflow ResNet50 int8 benchmark
25d151bd1 runk: Modify kill command's error message for containerd tests
b3592ab25 gha: cri-containerd: Enable tests
84dd02e0f gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
b29782984 gha: cri-containerd: Show pod before deleting it
ae0930824 gha: cri-containerd: Print kata logs in case of error
6c8b2ffa6 gha: cri-containerd: Group containerd logs
9e898701f gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
76dac8f22 agent: simplify error handling
18a7fd8e4 metrics: Rename tensorflow scripts
e55fa93db tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
d9ee17aae tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
ab829d103 agent: runtime: add the Agent Policy feature
831e73ff9 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
af1b46bbf tests: Add gha-run-k8s-common.sh
416445e7e docs: Remove installation step in virtcontainers doc
72cbcf040 kata-deploy: Add k0s support
767434d50 metrics: fix the loop used to stop kata components #7629
5d0f0d43c metrics: Add cassandra statefulset yaml
c1dcc1396 metrics: Add cassandra service yaml
2297a0d1c metrics: Add block loop pvc yaml for cassandra
e3d511946 metrics: Add block loop pv yaml for cassandra test
989027159 metrics: Add block loop pvc for cassandra test
349b89969 metrics: Add Cassandra Kubernetes benchmark for kata metrics
c52d09052 gha: static-checks: Move to the Azure instances
8815ed066 runtime: Remove config warnings
afe1a6ac5 agent: support copying of directories and symlinks
ab13ef87e runtime: propagate configmap/secrets etc changes for remote-hyp
c074ec4df runtime: Copy shared files recursively
fdcd52ff7 metrics: Add check containers are running in tensorflow mobilenet
36337ee14 metrics: Add check containers are up in tensorflow script
f700f9b0b metrics: Remove unused variable in tensorflow script
833cf7a68 metrics: Add check containers are running function
918c78308 metrics: Add check containers are up in tensorflow mobilenet script
9d57a1fab metrics: Use check containers are up in tensorflow script
1c84680d8 metrics: Add check containers are up in common script
d3e57cf45 metrics: Use collect_results function in tensorflow mobilenet test
286de046a metrics: Remove collect results function definition
9879709aa metrics: Add common functions to the common script
4746fa3da docs: Specify supported Firecracker version using `versions.yaml`
cc922be5e versions: Update firecracker version to 1.4.0
39e67b06e dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
473b0d3a3 metrics: compute tensorflow statistics
03d1fa67b ci: unencrypted-image: Fix build context
eb463b38e ci: unencrypted-image: Don't fail to build on s390x
a2d731ad2 ci: create-confidential-image: Add dependent actions
d1a629622 metrics: Add nginx documentation to network README
498f7c054 metrics: Add nginx kubernetes yaml
f8a5255cf metrics: Add network nginx benchmark
43fe5d1b9 ci: k8s: tees: Ensure PR_NUMBER is exported
54f6a7850 ci: {{ pr-number }} should be {{ inputs.pr-number }}
034d7aab8 tests: k8s: Ensure the runtime classes are properly created
fac8ccf5c ci: Add build-and-publish-tee-confidential-unencrypted-image
ab5f603ff ci: k8s: Add the image used for unencrypted confidential tests
1e8fe131b k8s: tests: Take advantage of `SHIMS` and `DEFAULT_SHIM` env vars
729b2dd61 agent: avoid creating new `Vec` instances when easily avoidable
aeaec9dae tests: upgrade bats version
e66496986 metrics: install kata once and run multiple checks
baabfa9f1 agent: refine implementation of mount related code
98ba211a3 agent: fix a bug in update_ephemeral_mounts()
5333618d7 agent: make add_storage() take &[Storage] instead of Vec<Storage>
37f34781d agent: simplify function online_cpu_memory()
d3c542237 agent: refine style of code related to sandbox
71a9f6778 agent: avoid unwrap() in function do_remove_container()
84badd89d agent: avoid clone objects when possible
b23c5ed15 deps: Bump dependent crate versions
863283716 metrics: General improvements to mobilenet tensorflow test
3c319d8d4 metrics: Add iperf to gha run script
5b5caf890 gha: Add iperf network metrics
66db5b535 metrics: Add latency test to network README
c36572418 agent: avoid unnecessary calls to `Arc::clone`
4fbe0a3a5 runtime: bind-mount mounted block device into container
7e1b1949d runtime: add support for kata overlays
6c867d9e8 agent: add io.katacontainers.fs-opt.overlay-rw option
6163c3565 agent: skip mount options that start with "io.katacontainers."
b2ff97aa0 dragonball: use version 0.10.4 of `fuse-backend-rs`
845eeb4d7 agent: Allow clippy::redundant_clone in the unit tests
1163fc9de release: Revert kata-deploy changes after 3.2.0-rc0 release
3958a39d0 runtime-rs: Introduce directly attachable network
1e15369e5 metrics: Improve naming testing containers in launch times test
5dbe88330 metrics: Clean kata components before starting a metric test.
3b45060b6 metrics: Add latency server yaml
9bb8451df metrics: Add latency client yaml
64fdb9870 metrics: Add network latency test
a81ad3b58 runtime-rs: Add block device handling in cloud hypervisor
3230dec95 kata-deploy: Use host's systemctl
1b21a4624 docs: Use control-plane term instead of master
28e5e9c86 runtime-rs: fix number of queues handling in dragonball share fs device
f1d8de9be runk: Allow runk to launch a container without pid namespace

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 14:44:50 +02:00
Fabiano Fidêncio
f6e20ac230 Merge pull request #7195 from fidencio/topic/adapt-kata-deploy-stable-to-using-ubuntu
kata-deploy-stable: Switch to using the ubuntu based payload
2023-10-20 14:42:04 +02:00
Fabiano Fidêncio
a93fdb014b kata-deploy-stable: Adapt to what we're using in the stable branch
This is basically to make sure that folks trying to use the kata-deploy
script from the main branch, to deploy **stable** kata-deploy images, do
not have a hard time.

Fixes: #7194

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-20 12:58:42 +02:00
James O. D. Hunt
79ed501a20 Merge pull request #8258 from jodh-intel/protection-fix-tdx-typo
libs: protection: Fix typo in TDX output
2023-10-20 08:36:22 +01:00
Dan Mihai
52aaf10759 agent: no endpoint blocking from agent-config.toml
Remove the ability to block access to kata agent endpoints by using
agent-config.toml. That functionality is now implemented using the
Agent Policy feature (#7573).

The CCv0 branch relied on blocking endpoints using agent-config.toml
but will set up an equivalent default policy file instead (#8219).

Fixes: #8228

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-20 02:26:54 +00:00
Fabiano Fidêncio
468a3e4b53 Merge pull request #8260 from gkurz/fix-8259
ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
2023-10-19 23:58:22 +02:00
GabyCT
5d6bdbd0a1 Merge pull request #8241 from GabyCT/topic/enableagenttest
tests: Enable agent stability test
2023-10-19 14:12:49 -06:00
Greg Kurz
36109da93f ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
Fixes #8259

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-10-19 21:53:23 +02:00
GabyCT
dc295600b8 Merge pull request #8157 from GabyCT/topic/fixsevdoc
docs: Fix paths to build kernel in SNP VMs documentation
2023-10-19 11:42:03 -06:00
Gabriela Cervantes
d01daf749b tests: Adjust timeout for agent stability test
This PR adjusts the timeout for the agent stability test
when it runs on GHA.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-19 16:55:23 +00:00
James O. D. Hunt
9b14dda147 libs: protection: Fix typo in TDX output
Add the missing closing bracket to the output of the TDX details,
so rather than:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0"
:                                                                    ^
:                                                           Missing ')' !
```

... we now have:

```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0)"
:                                                                    ^
:                                                                   Aha!
```

Added a unit test for this scenario.

Fixes: #8257.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-19 16:06:08 +01:00
James O. D. Hunt
9336e2e492 Merge pull request #8155 from jodh-intel/runtime-rs-check-ch-tdx-build-feature
runtime-rs: ch: Add TDX CH features check
2023-10-19 14:13:08 +01:00
James O. D. Hunt
048cc70654 Merge pull request #8213 from jodh-intel/validate-hypervisor-cfg-name
runtime: Validate hypervisor section name in config file
2023-10-19 07:40:58 +01:00
Dan Mihai
99db6dff24 Merge pull request #8230 from microsoft/danmihai1/opa-data
tests: query data from the OPA service
2023-10-18 15:32:23 -07:00
James O. D. Hunt
0e0867f15d runtime-rs: ch: Add TDX CH features check
If you attempt to create a container (a TD) on a TDX system using a
custom build of Cloud Hypervisor (CH) that was not built with the `tdx`
CH feature, Kata will report the following, somewhat cryptic, CH error:

```
ApiError(VmBoot(InvalidPayload))
```

Newer versions of CH now report their build-time features in the ping
API response message so we now use that, if available, to detect this
scenario and generate a user-friendly error message instead.

This change improves the readability of `handle_guest_protection()` and
adds a couple of additional tests for that method.

Fixes: #8152.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:07:39 +01:00
James O. D. Hunt
409eadddb2 runtime-rs: ch: Improve readability of guest protection checks
Improve the way `handle_guest_protection()` is structured by inverting
the logic and checking the value of the `confidential_guest` setting
before checking the guest protection. This makes the code easier to
understand.

> **Notes:**
>
> - This change also unconditionally saves the available guest protection
>   (where previously it was only saved when `confidential_guest=true`).
>   This explains the minor unit test fix.
>
> - This change also makes the CH driver return an error if it finds an
>   unexpected protection (since only Intel TDX is currently tested).

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-18 18:06:02 +01:00
Greg Kurz
9863805752 Merge pull request #8201 from fidencio/topic/release-tag-repo-stop-tagging-the-tests-repo
release: tag_repos: Stop tagging the `tests` repo
2023-10-18 18:10:39 +02:00
Gabriela Cervantes
a58afe70b8 metrics: Add iperf udp benchmark
This PR adds the iperf UDP benchmark for bandwidth measurement
to the network metrics.

Fixes #8246

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-18 15:52:03 +00:00
Gabriela Cervantes
82a0814fc2 tests: Enable agent stability test
This PR enables the agent stability test for stability gha CI.

Fixes #8240

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-17 15:16:06 +00:00
Dan Mihai
32be8e3a87 tests: query data from the OPA service
Add example for querying json data from the OPA service.

Fixes: #8231

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-17 13:31:43 +00:00
David Esparza
d90d1c5c10 Merge pull request #8243 from dborquez/fix_systemctl_masked_query
metrics: fixes common.sh function to always return true
2023-10-16 20:17:24 -06:00
Dan Mihai
b81c0a6693 tests: encode policy file during test
Encode policy file during test - easier to understand than hard-coding
the encoded file contents.

Fixes: #8214

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-16 15:58:12 -07:00
David Esparza
4f9681b411 metrics: fixes common.sh function to always return true
This PR corrects the init env() helper function so that
systemctl always returns true when enumerating masked services,
which prevents the test from failing.

Fixes: #8242

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-16 15:57:57 -06:00
David Esparza
59e8b1d5a7 Merge pull request #8206 from dborquez/memory_footprint_test_removing_trailing_commas_to_make_json_results_file_valid
Memory footprint test removing trailing commas to make json results file valid
2023-10-16 14:31:28 -06:00
Gabriela Cervantes
2ef2b2a6dc docs: Fix paths to build kernel in SNP VMs documentation
This PR fixes the paths used to set up, build and properly install
the kernel for SNP.

Fixes #8156

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-16 20:09:02 +00:00
Fabiano Fidêncio
db37692f36 Merge pull request #8226 from microsoft/danmihai1/policy-typo
policy: allow access to ReseedRandomDev
2023-10-16 19:17:31 +02:00
Peng Tao
45e82b6581 Merge pull request #8192 from bergwolf/github/deps
runtime/kata-ctl: update dependencies
2023-10-16 16:39:17 +08:00
Chao Wu
44e602d69a Merge pull request #8014 from openanolis/chao/fix_nydus_break
runtime-rs : fix Nydus support for runtime-rs + Dragonball
2023-10-16 01:30:22 -05:00
Chao Wu
408b59c02c runtime-rs: fix bugs to support Nydus v5
1. Enable virtio-fs-pro in Dragonball so that it can process the Nydus backend registry.
2. Change the readonly config of the passthrough rw layer to false so that the layer has proper read-write access.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Chao Wu
157caea9fe Revert "nydus: Temporarily skip tests on dragonball"
This reverts commit aba36ab188.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Chao Wu
678fe3cd31 Dragonball: fix Nydus config serde problem
Since the Nydus snapshotter was updated in previous commits, the config
passed through to Dragonball during mount_rafs is a RafsConfig instead
of a ConfigV2. Dragonball could only deserialize ConfigV2, so it
panics.

Add support for RafsConfig.

Fixes: #8013

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-10-16 10:22:21 +08:00
Dan Mihai
b6ec621389 policy: allow access to ReseedRandomDev
Allow access to the ReseedRandomDev endpoint by default. Using false
for ReseedRandomDevRequest was unintended.

Fixes: #8225

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-10-13 21:18:27 +00:00
David Esparza
908519db9d metrics: skips docker restart when it is not installed or is masked.
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.

Fixes: #8218

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-13 18:02:00 +00:00
David Esparza
c2763120aa metrics: removing trailing comma characters from json file.
This PR removes trailing commas so that the json results
file is valid.

This PR also changes the way data results are collected by
iterating through the array of memory values to calculate
their average.

Fixes: #8204

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-13 18:00:57 +00:00
GabyCT
1974d13122 Merge pull request #8188 from dborquez/metrics_add_fio_readme.md
metrics: removal of reference in the documentation to the fio dax subtest.
2023-10-12 10:53:55 -06:00
James O. D. Hunt
3e8cf6959c runtime: Validate hypervisor section name in config file
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gave a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```

The hypervisor name is now validated so that the behaviour becomes:

```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```

Fixes: #8212.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-12 13:53:37 +01:00
James O. D. Hunt
45d28998d9 Merge pull request #8149 from jodh-intel/runtime-rs-ch-detect-tdx-version
runtime-rs: ch: Detect Intel TDX version
2023-10-12 10:09:42 +01:00
QuanweiZhou
f904e64155 Merge pull request #8179 from Apokleos/directvol-urlEncode
runtime-rs: use the same base64 as kata-runtime/direct-volume does
2023-10-12 09:04:11 +08:00
GabyCT
bc6eadf4f6 Merge pull request #8197 from GabyCT/topic/enablescability
tests: Enable scability test for stability CI
2023-10-11 16:41:46 -06:00
Archana Shinde
f814b1a0a2 Merge pull request #8073 from amshinde/runtime-rs-vfio-clh
runtime-rs: Add support for adding vfio device for cloud-hypervisor
2023-10-11 15:01:55 -07:00
Gabriela Cervantes
ef6388e815 tests: Remove unused function from scability test
This PR removes an unused function from scability test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-11 19:44:21 +00:00
Fabiano Fidêncio
fbc8f8f466 scripts: Use install_yq from the kata-containers repo
As the file is already part of the kata-containers repo, and the tests
repo is about to become read-only, we're good to drop the tests
references from here and use everything coming from the
`kata-containers` repo instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-11 12:52:55 +02:00
Fabiano Fidêncio
65b1a2d277 release: tag_repos: Stop tagging / updating the tests repo
As we've moved all the tests to the `kata-containers` repo, the `tests`
repo will become a read-only repo.

Fixes: #8200

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-11 11:45:27 +02:00
James O. D. Hunt
87b760f569 runtime-rs: ch: Detect Intel TDX version
Improve the `GuestProtection` handling to detect the version of
Intel TDX available.

The TDX version is now logged by the Cloud Hypervisor driver.

Fixes: #8147.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-11 09:38:00 +01:00
alex.lyn
73e81f5e39 runtime-rs: unify base64 encoding for direct-volume
Direct-volume needs to use the same base64 character set as
kata-runtime/direct-volume does.

Fixes: #8175

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-10-11 14:00:13 +08:00
Gabriela Cervantes
c6463cb5ae tests: Fix path for versions yaml for soak parallel test
This PR fixes the path for versions yaml for soak parallel test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 22:29:20 +00:00
David Esparza
89c9454fca metrics: removal of reference in the documentation to the dax test.
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.

Fixes: #8159

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-10 15:55:59 -06:00
Gabriela Cervantes
30ff58904e tests: Enable scability test for stability CI
This PR enables the scability test for stability CI gha.

Fixes #8196

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 19:59:57 +00:00
GabyCT
538131ab44 Merge pull request #8154 from GabyCT/topic/addstability
tests: Enable soak parallel stability test
2023-10-10 13:53:14 -06:00
Archana Shinde
8d6f7b9096 runtime-rs: Add support for handling vfio device for cloud-hypervisor
This change adds support for adding and removing vfio devices for
cloud-hypervisor.

Fixes: #6691

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-10-10 12:25:44 -07:00
Gabriela Cervantes
e786b2b019 gha: Add install dependencies for stability tests
This PR adds the install dependencies for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-10 16:05:48 +00:00
Chao Wu
936553ae79 Merge pull request #7505 from lisongqian/feat/dragonball_metrics
dragonball: vcpu metrics change to be recorded per vcpu
2023-10-10 10:52:40 -05:00
Wainer Moschetta
d311c3dd04 Merge pull request #7621 from wainersm/gha-run-local
ci: k8s: adapt gha-run.sh to run locally
2023-10-10 11:19:19 -03:00
David Esparza
93fef543e0 Merge pull request #8127 from dborquez/fix_iperf_check_kata_processes_issue
metrics: removes kata components and k8s deployment when test finishes
2023-10-10 07:05:24 -06:00
lisongqian
dbfe6512fc dragonball: vcpu metrics change to be recorded per vcpu
This commit changes the vCPU metrics in Dragonball to be recorded per vCPU.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
lisongqian
fa60fbe023 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
This commit refactors METRICS to RwLock<DragonballMetrics>.

Fixes: #7248

Signed-off-by: lisongqian <mail@lisongqian.cn>
2023-10-10 16:22:40 +08:00
Peng Tao
500d1c5cee kata-ctl: update rustls-webpki/webpki dependency
The old ones have security issues.
ref: https://github.com/briansmith/webpki/issues/69

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
d7660d82a0 runtime: unify gopkg.in/yaml.v3 to v3.0.1
The older versions have Denial of Service issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
fc9a107e8e runtime: unify swag and testify dependency
So that we don't need to depend on that many versions of them.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
79ebb959c5 runtime: update runc dependency to v1.1.9
To pick up security fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
7f3e8bd65e runtime: unify golang.org/x/text to v0.7.0
The older versions contain security issues.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:45 +00:00
Peng Tao
df325ae371 runtime: update golang.org/x/net to v0.7.0
To pick up fix for the following issue:

A maliciously crafted HTTP/2 stream could cause excessive CPU
consumption in the HPACK decoder, sufficient to cause a denial of
service from a small number of small requests.

Fixes: #8190
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-10 03:56:39 +00:00
David Esparza
bba34910df metrics: stops kata components and k8s deployment when test finishes
This PR adds a trap that runs whenever the script exits: it deletes the
iperf k8s deployment and k8s services, and removes the kata components.

This way, when the script finishes, it verifies that there are
indeed no kata components still running.

Fixes: #8126

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-09 13:41:43 -06:00
Gabriela Cervantes
84e3d884e4 gha: Add general dependencies to stability tests
This PR adds the general dependencies to stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Gabriela Cervantes
dec3951ca5 tests: Add soak parallel stability test
This PR adds the soak parallel stability test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Gabriela Cervantes
0f04d527d9 tests: Enable soak parallel test
This PR enables the soak parallel test for stability test.

Fixes #8153

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-09 17:02:49 +00:00
Wainer dos Santos Moschetta
e669282c25 ci: k8s: set KUBERNETES default value
The KUBERNETES variable is mostly used by kata-deploy to decide whether
to apply k3s-specific deployments or not. It selects the type of
Kubernetes to be installed (k3s, k0s, rancher, etc.) and is always
set on CI. When running the script locally, we want a default value
to avoid `KUBERNETES: unbound variable` errors.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
c30c3ff185 tests: run k8s-volume on a given node
This test can give false positives on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
666993da8d tests: run k8s-file-volume on a given node
This test can give false positives on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:08:48 -03:00
Wainer dos Santos Moschetta
3a00fc9101 tests: exec_host() now gets the node name
The exec_host() simply fails on multi-node clusters because
`kubectl get node -o name` will return a list of names. Moreover, it will
return control-plane node names, which usually don't have kata installed.

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
61c9c17bff tests: add get_one_kata_node() to tests_common.sh
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.

This is useful for tests that should run on a specific worker
node of a multi-node cluster.
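A sketch of what such a helper could look like. It assumes kata-deploy labels nodes with `katacontainers.io/kata-runtime=true`; adjust the selector if your cluster uses a different label. This is an illustration, not the actual tests_common.sh code:

```shell
# Hypothetical get_one_kata_node(): print the first node labeled as having
# kata installed, or fail if there is none.
get_one_kata_node() {
	local nodes node
	# Space-separated list of all matching node names (may be empty).
	nodes=$(kubectl get nodes \
		--selector 'katacontainers.io/kata-runtime=true' \
		-o jsonpath='{.items[*].metadata.name}')
	# Keep only the first name.
	node="${nodes%% *}"
	if [ -z "${node}" ]; then
		echo "ERROR: no node labeled with kata-runtime=true" >&2
		return 1
	fi
	echo "${node}"
}
```

A test can then pin its pod to that node, e.g. via `nodeName` in the pod spec.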

Fixes #7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
68f083c4d0 ci: k8s: set KATA_HYPERVISOR default value
Let KATA_HYPERVISOR be qemu by default in gha-run.sh, as this variable
is required to tweak some configurations of kata-deploy.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
6677a61fe4 ci: k8s: configurable deploy kata timeout
deploy-kata() in gha-run.sh waits up to 10 minutes for the kata-deploy
installation to finish. This allows users of the script to override
that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.
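A minimal sketch of an overridable timeout feeding `kubectl wait`. The `kube-system` namespace and the `name=kata-deploy` pod label are assumptions about how kata-deploy is deployed, and `wait_for_kata_deploy` is a hypothetical name:

```shell
# Default to 10 minutes; users can export KATA_DEPLOY_WAIT_TIMEOUT to override.
KATA_DEPLOY_WAIT_TIMEOUT="${KATA_DEPLOY_WAIT_TIMEOUT:-10m}"

# Hypothetical helper: block until the kata-deploy pods report Ready,
# or fail once the timeout expires.
wait_for_kata_deploy() {
	kubectl -n kube-system wait pod \
		-l name=kata-deploy \
		--for=condition=Ready \
		--timeout="${KATA_DEPLOY_WAIT_TIMEOUT}"
}
```

For example, `KATA_DEPLOY_WAIT_TIMEOUT=20m ./gha-run.sh deploy-kata` would double the wait on a slow machine.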

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
200e542921 ci: k8s: shellcheck fixes to gha-run.sh
Fixed a couple of warnings emitted by shellcheck and disabled others:
 * SC2154 (var is referenced but not assigned)
 * SC2086 (Double quote to prevent globbing and word splitting)

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
4af78be13a kata-deploy: re-format kata-[deploy|cleanup].yaml
The tests/integration/kubernetes/gha-run.sh script runs `yq write` a
couple of times to edit kata-[deploy|cleanup].yaml, which results
in the file being re-formatted. This is annoying because it leaves
the git tree dirty.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
d54e6d9cda ci: k8s: run_tests() for kcli
The only difference to the other platforms is that it needs to
export KUBECONFIG.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
c2ef1f0fb0 ci: k8s: add deploy-kata-kcli() to gh-run.sh
deploy-kata-kcli() behaves like the other deploy-kata functions for
bare-metal (e.g. sev, tdx, etc.), except that KUBECONFIG
must be exported.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
d2be8eef1a ci: k8s: add cleanup-kcli() to gha-run.sh
cleanup-kcli() behaves like the other clean-up functions for bare-metal (e.g. sev,
tdx, etc.), except that KUBECONFIG must be exported.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
cbb9aa15b6 ci: k8s: set default image for deploy_kata()
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of a CI context, a developer may simply want the latest
image, which in this case is
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.
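A sketch of how the three variables could fall back to that image when unset. The split into registry, repo and tag is inferred from the image reference above; `KATA_DEPLOY_IMAGE` is a hypothetical name:

```shell
# CI exports these; outside CI, fall back to the latest kata-deploy-ci image.
DOCKER_REGISTRY="${DOCKER_REGISTRY:-quay.io}"
DOCKER_REPO="${DOCKER_REPO:-kata-containers/kata-deploy-ci}"
DOCKER_TAG="${DOCKER_TAG:-kata-containers-latest}"

# Compose the full image reference used by deploy_kata().
KATA_DEPLOY_IMAGE="${DOCKER_REGISTRY}/${DOCKER_REPO}:${DOCKER_TAG}"
echo "Using kata-deploy image: ${KATA_DEPLOY_IMAGE}"
```

Overriding any single variable (e.g. only `DOCKER_TAG`) still yields a complete reference.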

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Wainer dos Santos Moschetta
89bef7d036 ci: k8s: create k8s clusters with kcli
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.

Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.

Fixes #7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2023-10-09 11:05:40 -03:00
Fabiano Fidêncio
1280f85343 Merge pull request #8171 from bergwolf/github/fix-up-gha
GHA: fix up referenced yaml exceeding 20 limit problem
2023-10-09 09:37:03 +02:00
Peng Tao
954d40cce5 gha: combine coco jobs into a single yaml
So that we don't risk hitting the GHA limit of 20 referenced yaml files
so easily.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-08 14:22:01 +00:00
Peng Tao
b60e0a9b57 gha: combine basic amd64 jobs into a single yaml
GHA has an undocumented limitation that there can be at most 20
referenced yamls in a single yaml file. We work around it by combining
multiple jobs into a single yaml file.

Fixes: #8161
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-10-08 13:55:01 +00:00
Fabiano Fidêncio
108db0a721 Merge pull request #8162 from sprt/sprt/unbreak-ci
gha: ci: Revert tracing test PR to unbreak CI
2023-10-08 10:13:46 +02:00
Aurélien Bombo
e9bd852113 gha: ci: Revert tracing test PR to unbreak CI
Revert "Merge pull request #8115 from fidencio/topic/ci-add-tracing-tests"

This unbreaks CI as seen in https://github.com/kata-containers/kata-containers/actions/runs/6434757133

Fixes: #8161

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-10-06 14:13:17 -07:00
James O. D. Hunt
16fe81f27c Merge pull request #8124 from jodh-intel/ch-enable-feature
runtime-rs: ch: Enable feature
2023-10-06 13:02:08 +01:00
Fabiano Fidêncio
fa6786d1d7 Merge pull request #8117 from fidencio/topic/ci-add-runk-tests
gha: ci: Port runk tests over
2023-10-06 11:19:55 +02:00
Fabiano Fidêncio
8fec654716 Merge pull request #8115 from fidencio/topic/ci-add-tracing-tests
ci: gha: Port tracing tests over
2023-10-06 10:06:57 +02:00
GabyCT
265f53e594 Merge pull request #8082 from dborquez/enable_fio_on_ctr
Enable fio test using containerd client
2023-10-05 17:26:22 -06:00
GabyCT
c8b9ec1cb5 Merge pull request #8108 from GabyCT/topic/ghastability
gha: Add stability tests workflow for gha
2023-10-05 17:10:10 -06:00
James O. D. Hunt
b8a46a4b85 runtime-rs: ch: Enable feature
Enable the Cloud Hypervisor driver (the `cloud-hypervisor` build feature) for the rust runtime.

Fixes: #6264.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-10-05 17:58:39 +01:00
Gabriela Cervantes
0f2dc8c675 gha: Add containerd stability tests to ci yaml
This PR adds containerd stability tests to ci yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-05 15:21:24 +00:00
Fabiano Fidêncio
89f73e658d Merge pull request #8110 from fidencio/topic/gha-be-more-specific-about-the-arm-runners
gha: arm64: Ensure the builder is arm64-builder
2023-10-04 21:20:08 +02:00
Fabiano Fidêncio
da91c9df88 ci: Port runk tests to this repo
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor to the tests.

Fixes: #8116

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 20:41:29 +02:00
Fabiano Fidêncio
7f23772763 ci: Add placeholder for runk tests
The runk test has been executed as part of the former "ubuntu" jenkins
CI.

We're porting it to GHA and running it against LTS containerd.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 20:40:32 +02:00
Fabiano Fidêncio
9205acc3d2 ci: Move tracing tests here
I'm basically moving the tracing tests from the tests repo to this one,
and I'm adding the "Signed-off-by:" of every single contributor to the
tests.

Fixes: #8114

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2023-10-04 20:02:27 +02:00
Gabriela Cervantes
85d290a048 gha: Add stability gha run script
This PR adds the stability gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 17:45:45 +00:00
Gabriela Cervantes
54f0c8f88e gha: Add stability tests workflow for gha
This PR adds the stability test workflow for gha for the kata CI.

Fixes #8107

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-04 16:32:13 +00:00
Fabiano Fidêncio
3bb2923e5d ci: Add placeholder for tracing tests
The tracing tests are currently running as part of the Jenkins CI with
the following setups:
* Container Engines: containerd
* VMMs: QEMU | Cloud Hypervisor
* Snapshotters: overlayfs | devmapper

We'll be restricting those tests to run on the LTS version of
containerd, without devmapper.

As is known, due to our GHA limitation this is just a placeholder and
the tests will actually be added in the next iterations.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 18:02:02 +02:00
Fabiano Fidêncio
2c3bf406dc ci: Create a function to install docker
This will be re-used in other tests as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 15:01:51 +02:00
Fabiano Fidêncio
c2cce12de5 Merge pull request #8100 from fidencio/topic/kata-deploy-build-agent
kata-deploy: Build kata-agent as we build all the other components
2023-10-04 11:56:03 +02:00
Steve Horsman
c430cc3707 Merge pull request #8098 from stevenhorsman/k8s-registry-suite
versions: migrate out of k8s.gcr.io
2023-10-04 10:51:39 +01:00
Fabiano Fidêncio
119f03de26 gha: arm64: Ensure the builder is arm64-builder
Otherwise we'll use any arm64 machine that's added as a runner, and
whenever new machines are added, those may end up being used only for
running a specific set of tests.

Fixes: #8109

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-04 11:08:11 +02:00
Fabiano Fidêncio
59b9380d1c Merge pull request #8093 from stevenhorsman/crictl-pod-config-update
doc: Update crictl pod-config
2023-10-04 10:49:04 +02:00
David Esparza
8c498ef5ee metrics: Use jq tool to pretty-print json metrics output
This PR enables jq's pretty-print feature to
improve the formatting of the metrics results JSON files.

Fixes: #8081

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-03 23:33:19 -06:00
David Esparza
a2159a6361 metrics: Enables FIO test for kata containers
The FIO benchmark is enabled to measure I/O in Kata
at different latencies using the containerd client,
in order to complement the CI metrics testing set.

This PR also deprecates the previous FIO benchmark
based on k8s.

Fixes: #8080

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-10-03 23:32:38 -06:00
Fabiano Fidêncio
f337315952 Merge pull request #8106 from fidencio/topic/gha-fix-k0s-related-cis
gha: Fix k0s deployment
2023-10-03 21:47:40 +02:00
GabyCT
d1d9af5de2 Merge pull request #8085 from GabyCT/topic/stabilitytests
tests: Add stability test for kata CI
2023-10-03 11:28:49 -06:00
Fabiano Fidêncio
70e7ec3e23 gha: Fix k0s deployment
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```

With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading to an empty string, which then causes
the error in the CI.
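One possible alternative, sketched here as an assumption rather than the fix that was merged: derive the version from k0s itself, assuming `k0s version` prints something like `v1.28.2+k0s.0`. The helper name is hypothetical:

```shell
# Hypothetical helper: strip the "+k0s..." build suffix from a k0s version
# string, leaving the upstream Kubernetes version (e.g. "v1.28.2").
k0s_to_kubectl_version() {
	local k0s_version="$1"
	echo "${k0s_version%%+*}"
}

# Usage on a k0s host (not run here):
#   kubectl_version=$(k0s_to_kubectl_version "$(sudo k0s version)")
```

Unlike `--short`, this does not depend on which flags the bundled kubectl still supports.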

Fixes: #8105

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 17:21:40 +02:00
Fabiano Fidêncio
560bbffb57 packaging: tools: Remove set -x leftover
This was used for debugging, and was accidentally merged.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
18fa483d90 packaging: release: Mention newly added images
We've added two new container builder images recently, one for the
components under `src/tools` and another one for the Kata Containers
agent.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
ca3b888371 packaging: tools: Fix container image env var name
This should be TOOLS_CONTAINER_BUILDER instead of
VIRTIOFSD_CONTAINER_BUILDER.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
5ca66795c7 packaging: Allow passing the TOOLS_CONTAINER_BUILDER
This follows what we've been doing for all the components we're
building, but was missed as part of #8077.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
02acef9575 gha: Build the kata-agent as part of our workflows
The kata-agent binary won't be released, just built so it can be used,
later on, as part of our tests and as part of the rootfs build.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
5208386ab1 packaging: Build the kata-agent
Let's add the needed functions to start building the kata-agent, with or
without the OPA support.

For now this build is not used as part of the rootfs build, but later on
it will be (not as part of this series, though).

Fixes: #8099

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 15:33:55 +02:00
Fabiano Fidêncio
1727487eef agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
This will help to build the agent binary as part of the kata-deploy
localbuild, as we need to pass the DESTDIR to where the agent will be
installed, and also whether we're building the agent with policy support
enabled or not.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 14:18:45 +02:00
Fabiano Fidêncio
45c1188839 packaging: Add get_agent_image_name()
This will be used for building the kata-agent.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-10-03 14:17:38 +02:00
Wainer dos Santos Moschetta
0db8fb8f98 versions: migrate out of k8s.gcr.io
k8s.gcr.io has been deprecated for a while now and redirected to
registry.k8s.io. However, on some bare-metal machines in our testing
pools that redirection does not work, so let's just replace the
registries.

Fixes #8098
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit b2c3bca558c38deff2117d5909d9071c23c05590)
2023-10-03 11:52:59 +01:00
stevenhorsman
a1a0543671 doc: Fix spelling
Spell check failed with:
```
[kata-spell-check.sh:275] WARNING: Word 'overcommitment':
did you mean one of the following?: over commitment, over-commitment,
commitment
```
So update this to pass the static checks.

Fixes: #
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-10-03 10:17:38 +01:00
Gabriela Cervantes
6339605a14 tests: Add general stability fixes
This PR adds general stability fixes.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-10-02 19:42:46 +00:00
stevenhorsman
59ae244442 doc: Update crictl pod-config
- Ensure that our documented crictl pod config file contents have
uid and namespace fields for compatibility with crictl 1.24+

This avoids a user potentially hitting the error:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"

getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
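For illustration, a sketch that writes a pod sandbox config carrying all three metadata fields. The `uid` value is a made-up example; only the field names matter:

```shell
# Write a crictl pod sandbox config into a scratch directory.
pod_config_dir=$(mktemp -d)
pod_config="${pod_config_dir}/pod-config.json"

# crictl 1.24+ expects metadata.name, metadata.namespace and metadata.uid.
cat > "${pod_config}" <<'EOF'
{
  "metadata": {
    "name": "nydus-sandbox",
    "uid": "nydus-uid",
    "namespace": "default",
    "attempt": 1
  },
  "log_directory": "/tmp",
  "linux": {}
}
EOF
```

The file can then be passed to `crictl runp`, e.g. `sudo crictl runp --runtime kata "${pod_config}"`, where `kata` stands for whichever runtime handler your containerd configuration defines.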

Fixes: #8092
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 8f8c2215)
2023-10-02 14:53:46 +01:00
Gabriela Cervantes
fd19f4082f tests: Add agent stability test
This PR adds the agent stability test to the stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 22:37:02 +00:00
Gabriela Cervantes
215577032f tests: Add cassandra stress in stability tests
This PR adds the Cassandra stress test to the stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 22:34:45 +00:00
GabyCT
a890ad3a16 Merge pull request #8066 from GabyCT/topic/urlvra
docs: Update url in kata vra document
2023-09-28 14:59:34 -06:00
Zvonko Kaiser
79e33c211c Merge pull request #7325 from zvonkok/vfio-sandbox-id-debug
gpu: Adding CDI support for cold and hot-plug of VFIO devices
2023-09-28 21:31:12 +02:00
Gabriela Cervantes
f2d3ea988d tests: Add stressng dockerfile for stability tests
This PR adds the stressng dockerfile for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:35:22 +00:00
Gabriela Cervantes
6493aa309e tests: Add stressor CPU test for stability tests
This PR adds the stressor CPU test for stability tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:33:08 +00:00
Gabriela Cervantes
ef68a3a36b metrics: Add stability test for kata CI
This PR adds the stability test for kata containers repository.

Fixes #8084

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-28 16:23:36 +00:00
David Esparza
f7ef45b167 Merge pull request #8077 from fidencio/topic/kata-deploy-ship-the-tools
kata-deploy: build & ship the rust components from src/tools/
2023-09-28 09:59:19 -06:00
Zvonko Kaiser
7c934dc7da gpu: Fix cold-plug of VFIO devices
We need to do proper sandbox sizing when cold-plugging, so introduce CDI,
the de facto standard for enabling devices in containers. containerd
will pass through annotations for the accumulated CPU, memory and now CDI
devices. With that information, sandbox sizing can be derived correctly.

Fixes: #7331

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-09-28 09:49:13 +00:00
GabyCT
fcc755fc3b Merge pull request #8068 from GabyCT/topic/limitlatency
metrics: Add latency value limits for kata CI
2023-09-27 13:28:41 -06:00
Greg Kurz
defbb64ac8 Merge pull request #8036 from rye-stripe/bugfix/overhead-metrics
runtime: fix reading cgroup stats of sandboxes
2023-09-27 19:39:55 +02:00
Archana Shinde
95455e6fe8 Merge pull request #8058 from likebreath/0925/clh_v35.0
Upgrade to Cloud Hypervisor v35.0
2023-09-27 10:39:32 -07:00
Gabriela Cervantes
8d66ef5185 metrics: Increase qemu jitter value
This PR increases qemu jitter value.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-27 17:31:07 +00:00
Gabriela Cervantes
5600e28b54 metrics: Increase jitter value for clh
This PR increases jitter value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-27 17:30:19 +00:00
Fabiano Fidêncio
a6b1f5e21b ci: Build src/tools components as part of our tests / releases
Build those as part of our CI and release workflows.

Fixes #5520 #5348

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:50:25 +02:00
Fabiano Fidêncio
501a168a81 kata-deploy: Build components from src/tools
Let's add targets and actually enable users and ourselves to build those
components in the same way we build the rest of the project.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:49:02 +02:00
Fabiano Fidêncio
6ef42db5ec static-build: Add scripts to build content from src/tools
As we'd like to ship the content from src/tools, we need to build them
in the very same way we build the other components, and the first step
is providing scripts that can build those inside a container.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:56 +02:00
Fabiano Fidêncio
4d08ec29bc packaging: Add get_tools_image_name()
This will be used for building all the (rust) components from src/tools.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:35 +02:00
Fabiano Fidêncio
98097c96de packaging: Use git abbreviated hash
This will make it easier to build images that rely on the hashes of
several directories.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 18:48:30 +02:00
Fabiano Fidêncio
8b25e90027 Merge pull request #8075 from fidencio/topic/ci-add-kata-monitor-tests
ci: Port kata-monitor tests from Jenkins to GHA
2023-09-27 15:48:46 +02:00
Fabiano Fidêncio
489caf1ad0 ci: kata-monitor: Move tests over
Let's move, adapt, and use the kata-monitor tests from the tests repo.
In this PR I'm keeping the SoB of every single contributor who
touched those tests in the past.

Fixes: #8074

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-27 11:40:31 +02:00
Fabiano Fidêncio
a3fb067f1b ci: Add placeholder for kata-monitor tests
The kata-monitor tests are currently running as part of the Jenkins CI
with the following setups:
* Container Engines: CRI-O | containerd
* VMMs: QEMU

When using containerd, we're testing it with:
* Snapshotter: overlayfs | devmapper

We will stop running those tests on both devmapper and overlayfs, as that
would hardly expose a functionality issue.

Also, we're restricting this to run with the LTS version of containerd,
when containerd is used.

As is known, due to our GHA limitation this is just a placeholder and
the tests will actually be added in the next iterations.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:31:17 +02:00
Fabiano Fidêncio
57cb4ce204 ci: Make install_kata aware of container engines
This will help us when running tests using CRI-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:31:17 +02:00
Fabiano Fidêncio
de1eeee334 ci: Create a generic install_crio function
This will serve us quite well in the upcoming test additions, which will
also have to be executed using CRI-O.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:26:13 +02:00
Fabiano Fidêncio
64a2000859 ci: Add install_cni_plugins helper
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:26:13 +02:00
Fabiano Fidêncio
8132fe15c9 ci: Modify containerd default config
Let's ensure we have runc running with `SystemdCgroup = false`,
otherwise we'll face failures when running tests that depend on runc on
Ubuntu 22.04 with LTS containerd.
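A minimal sketch of forcing that setting in a containerd config file. The helper name is hypothetical, and it assumes GNU `sed -i` and the `SystemdCgroup = ...` key layout produced by `containerd config default`:

```shell
# Hypothetical helper: flip runc's systemd cgroup driver off in the given
# containerd config.toml (a no-op if it is already false).
disable_runc_systemd_cgroup() {
	local config_file="$1"
	sed -i 's/SystemdCgroup = true/SystemdCgroup = false/' "${config_file}"
}
```

In CI this would typically follow `containerd config default | sudo tee /etc/containerd/config.toml`, then run with root privileges against that file before restarting containerd.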

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-27 11:16:12 +02:00
Gabriela Cervantes
8cb7df1bed metrics: Add checkmetrics for latency test
This PR adds the checkmetrics for latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 19:11:08 +00:00
Gabriela Cervantes
e90440ae24 metrics: Add qemu latency value limit
This PR adds the qemu latency value limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 17:30:09 +00:00
Gabriela Cervantes
a74a8f8a9d metrics: Add latency value limits for kata CI
This PR adds latency value limits for kata CI.

Fixes #8067

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 17:29:07 +00:00
Gabriela Cervantes
d7def8317a metrics: Fix general check static warnings
This PR fixes general check static warnings.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 16:30:59 +00:00
GabyCT
309103169d Merge pull request #8056 from GabyCT/topic/fixlatencypath
metrics: Fix latency yamls path
2023-09-26 10:16:55 -06:00
Gabriela Cervantes
928553d1ba docs: Update url in kata vra document
This PR updates the url in kata vra document.

Fixes #8065

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-26 16:13:12 +00:00
GabyCT
5c0afaacf4 Merge pull request #8018 from GabyCT/topic/fixreadme
metrics: Fix metrics README
2023-09-26 09:51:47 -06:00
David Esparza
83326f89b3 Merge pull request #8054 from GabyCT/topic/fixcrdoc
metrics: Fix C-Ray documentation
2023-09-26 09:50:19 -06:00
James O. D. Hunt
31478b9c33 Merge pull request #7944 from jodh-intel/runtime-rs-ch-enable-tdx
runtime-rs: ch: Enable Intel TDX
2023-09-26 14:11:12 +01:00
James O. D. Hunt
b0a3293d53 runtime-rs: ch: Enable Intel TDX
Allow Cloud Hypervisor to create a confidential guest (a TD or
"Trust Domain") rather than a VM (Virtual Machine) on Intel systems
that provide TDX functionality.

> **Notes:**
>
> - At least currently, when built with the `tdx` feature, Cloud Hypervisor
>   cannot create a standard VM on a TDX capable system: it can only create
>   a TD. This implies that on TDX capable systems, the Kata Configuration
>   option `confidential_guest=` must be set to `true`. If it is not, Kata
>   will detect this and display the following error:
>
>   ```
>   TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')
>   ```
>
> - This change expands the scope of the protection code, changing
>   Intel TDX specific booleans to more generic "available guest protection"
>   code that could be "none" or "TDX", or some other form of guest
>   protection.

Fixes: #6448.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 10:55:25 +01:00
James O. D. Hunt
523399c329 runtime-rs: ch: Add more consts
Introduce a few new constants (for PCI segment count and FS queues) and
move the disk queue constants to `convert.rs` to allow them to be used
there too.

> **Note:**
>
> This change gives the `ShareFs` code its own set of values rather
> than relying on the disk queue constants.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
dea8065811 runtime-rs: ch: Remove unused function
Delete the `handle_pending_devices_after_boot()` function which is no
longer required.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
995f2c015f runtime-rs: ch: Only handle particular pending device types
Modify the Cloud Hypervisor `add_device()` method to add `ShareFs` and
`Network` devices to the list of pending devices since only these two
device types need to be cached before VM startup. Full details in the
comments.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
James O. D. Hunt
b1b96a5c49 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
Remove the `VIRTIO_BLK_MMIO` check which appears to have been added
erroneously in the first place.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-26 08:41:32 +01:00
Gabriela Cervantes
9ac29b8d38 metrics: Add init_env function to latency test
This PR adds the init_env function to the latency test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 22:06:00 +00:00
Bo Chen
dfd0c9fa9a runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:37 -07:00
Bo Chen
8f9f087e35 versions: Upgrade to Cloud Hypervisor v35.0
Details of this release can be found in our roadmap project as iteration
v35.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #8057

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-09-25 12:22:01 -07:00
Fabiano Fidêncio
a4daa86535 Merge pull request #8028 from fidencio/topic/ci-test-with-crio-part-2
ci: k8s: crio: Follow up patches to have CRI-O also working as part of our CI
2023-09-25 18:40:42 +02:00
Gabriela Cervantes
81c8babca9 metrics: Fix latency yamls path
This PR fixes the latency yamls path for the latency test for
kata metrics.

Fixes #8055

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 15:52:24 +00:00
Gabriela Cervantes
4815736820 metrics: Fix C-Ray documentation
This PR fixes the C-Ray documentation for kata metrics.

Fixes #8052

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-25 15:27:58 +00:00
Fabiano Fidêncio
ef63d67c41 ci: crio: Trail '\r' from exec_host() output
We've faced this as part of the CI, only happening with the CRI-O tests:
```
 not ok 1 Test readonly volume for pods
 # (from function `exec_host' in file tests_common.sh, line 51,
 #  in test file k8s-file-volume.bats, line 25)
 #   `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
 # [bats-exec-test:38] INFO: k8s configured to use runtimeclass
 # bash: line 1: $'\r': command not found
 #
 # Error from server (NotFound): pods "test-file-volume" not found
```

I must say I didn't dig into figuring out why this is happening, but it
may be safe enough to just trim the '\r', as long as all the tests keep
passing on containerd.
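A minimal sketch of trimming the stray carriage return with `tr`, assuming exec_host()'s output is piped through a small filter. `strip_cr` is a hypothetical name:

```shell
# Hypothetical filter: delete every carriage return from stdin.
strip_cr() {
	tr -d '\r'
}

# Example: "hello\r\n" comes back as plain "hello".
output=$(printf 'hello\r\n' | strip_cr)
echo "${output}"
```

Because `tr -d` only removes the `\r` bytes, output that never contained them (as on containerd) passes through unchanged.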

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 16:42:18 +02:00
Fabiano Fidêncio
74c12b2927 ci: crio: Enable default capabilities
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order for tests that access the host to pass.

A huge thanks to Greg Kurz for spotting this and suggesting the fix.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-25 14:56:15 +02:00
Fabiano Fidêncio
358dc2f569 kata-deploy: Fix CRI-O detection
Some of the "k8s distros" allow using CRI-O in a non-official way, and
if that's done we cannot simply assume they're on containerd, otherwise
kata-deploy will simply not work.

In order to avoid such an issue, let's check for `cri-o` as the container
engine first, and only proceed with the checks for the "k8s
distros" after we have ruled out that CRI-O is being used.
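A sketch of that ordering, assuming the engine is read from the node's `.status.nodeInfo.containerRuntimeVersion` field (values such as `cri-o://1.27.1` or `containerd://1.7.2`). `get_container_engine` is a hypothetical name, not the kata-deploy function:

```shell
# Hypothetical helper: map a containerRuntimeVersion string to an engine,
# checking for CRI-O before any distro-specific containerd handling.
get_container_engine() {
	local runtime_version="$1"
	local engine="${runtime_version%%:*}"   # "cri-o://1.27.1" -> "cri-o"
	if [ "${engine}" = "cri-o" ]; then
		echo "cri-o"
		return 0
	fi
	# Only now fall through to the "k8s distro" checks (k3s, k0s, rke2, ...),
	# which in this sketch all resolve to containerd.
	echo "containerd"
}

# Usage (not run here):
#   get_container_engine "$(kubectl get node "${NODE_NAME}" \
#     -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}')"
```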

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 14:56:15 +02:00
Fabiano Fidêncio
ebaa4fa4c1 ci: crio: Pass -y to apt
That was something overlooked during my tests. :-/

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-25 14:56:15 +02:00
GabyCT
11cf0e2d28 Merge pull request #8038 from GabyCT/topic/latency
metrics: Enable latency test in gha run script
2023-09-22 16:57:53 -06:00
GabyCT
3ef57b335e Merge pull request #8045 from jepio/fix-docker-ownership
local-build: Fix .docker ownership before build-payload
2023-09-22 14:43:38 -06:00
Archana Shinde
9bb9a3e7a4 Merge pull request #7966 from amshinde/runtime-rs-network-clh
runtime-rs: Add network support for cloud-hypervisor
2023-09-22 13:08:09 -07:00
Gabriela Cervantes
97e73b2234 metrics: Fix spelling warnings
This PR fixes general spelling warnings detected by the spelling check.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-22 15:50:51 +00:00
Gabriela Cervantes
36c8cd6f1f metrics: Fix metrics README
This PR fixes the network metrics section of the README by keeping only
the tests that we currently have in our kata metrics.

Fixes #8017

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-22 15:28:58 +00:00
Fabiano Fidêncio
c5a5a0c95e Merge pull request #8012 from arronwy/strip
osbuild: Reduce guest components binary size with strip
2023-09-22 15:45:38 +02:00
Fabiano Fidêncio
9d190f2390 Merge pull request #8042 from GabyCT/topic/pandoc
gha: Add pandoc as a dependency for static checks
2023-09-22 15:31:18 +02:00
Jeremi Piotrowski
15425a2b80 local-build: Fix .docker ownership before build-payload
The permissions on .docker/buildx/activity/default are regularly broken by us
passing docker.sock + $HOME/.docker to a container running as root and then
using buildx inside. Fix up ownership before executing docker commands.

Fixes: #8027
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-22 13:44:53 +02:00
Jeremi Piotrowski
a5338e885e Merge pull request #8030 from portersrc/8027-ci-rootfs-image-build-asset-is-failing-oras
ci: rootfs-image build-asset is failing
2023-09-22 11:07:50 +02:00
Chao Wu
6f98fbafde Merge pull request #6706 from guixiongwei/feat/thp
feat(runtime-rs): introduce huge page mode to select VM RAM's backend
2023-09-22 15:27:06 +08:00
Gabriela Cervantes
13ca7d9f97 gha: Add pandoc as a dependency for static checks
To avoid the failure of not finding pandoc command this PR adds that
package as a dependency for static checks.

Fixes #8041

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 20:14:41 +00:00
Jeremi Piotrowski
28dd5ae91e Merge pull request #7799 from UiPath/clh-directio-support
clh: Direct IO support for block devices
2023-09-21 19:16:08 +02:00
David Esparza
6de9f39895 Merge pull request #8020 from GabyCT/topic/fixhunspell
gha: Install hunspell for static checks
2023-09-21 10:58:40 -06:00
Gabriela Cervantes
08bc8e4db4 metrics: Add latency benchmark for gha
This PR adds the latency benchmark for gha for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 16:14:39 +00:00
Gabriela Cervantes
6776b55d7e metrics: Enable latency test in gha run script
This PR enables the latency test for gha run script for kata metrics.

Fixes #8037

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-21 16:11:58 +00:00
Peteris Rudzusiks
94e2ccc2d5 runtime: fix reading cgroup stats of sandboxes
The cgroup stats come from the resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method was incorrectly expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.

Fixes #8035

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-09-21 17:00:53 +02:00
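The bug above is a classic Go pitfall: when a value is stored as a pointer behind an interface, asserting the struct type fails silently and a zero fallback is returned. The sketch below only illustrates that class of bug; `CPUStats`, `getStats`, `buggyUsage` and `fixedUsage` are hypothetical names and are not taken from the resourcecontrol package or from the actual Stat() code.

```go
package main

import "fmt"

// CPUStats is a hypothetical stand-in for a cgroup stats type; the real
// resourcecontrol types are not reproduced here.
type CPUStats struct {
	UsageNs uint64
}

// getStats mimics a stats source that returns a *pointer* wrapped in an
// interface value, as the commit message describes.
func getStats() interface{} {
	return &CPUStats{UsageNs: 1234}
}

// buggyUsage asserts the struct type, which never matches a *CPUStats,
// so it silently falls back to 0.
func buggyUsage(v interface{}) uint64 {
	if s, ok := v.(CPUStats); ok {
		return s.UsageNs
	}
	return 0
}

// fixedUsage asserts the pointer type that is actually stored.
func fixedUsage(v interface{}) uint64 {
	if s, ok := v.(*CPUStats); ok && s != nil {
		return s.UsageNs
	}
	return 0
}

func main() {
	v := getStats()
	fmt.Println(buggyUsage(v), fixedUsage(v)) // 0 1234
}
```

Because the failed assertion is not an error, nothing crashes. The stats simply read as zero, which matches the symptom of always-zero CPU and memory stats described in the commit.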
Alexandru Matei
d507d189bb fc: Add support for noflush cache option
Firecracker supports noflush semantics via the Unsafe cache type.
It has no support for direct I/O, so remove that option from the config file.

Fixes: #7823

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Alexandru Matei
2ca781518a clh: Direct IO support for block devices
Clh supports direct I/O for disks. It doesn't
offer any support for noflush, so stop passing
that option to the cloud-hypervisor internal config.

Fixes: #7798

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-21 14:48:24 +03:00
Fabiano Fidêncio
dd27912f31 Merge pull request #8032 from fidencio/topic/ci-make-push-after-build-be-trigger-by-workflow-dispatch
ci: Trigger payload-after-push on workflow_dispatch
2023-09-21 10:25:24 +02:00
Fabiano Fidêncio
0c95697cc4 ci: Trigger payload-after-push on workflow_dispatch
This will allow us to easily test failures and fixes on that workflow.

Fixes: #8031

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-21 09:24:13 +02:00
Chris Porter
28cbc3b51c ci: rootfs-image build-asset is failing
Fixes: #8027

Signed-off-by: Chris Porter <porter@ibm.com>
2023-09-21 00:58:42 -05:00
Fabiano Fidêncio
21f6f9a173 Merge pull request #8016 from fidencio/topic/ci-test-with-crio-part-1
ci: Actually enable the CRI-O tests
2023-09-21 07:42:27 +02:00
Wainer Moschetta
87e64a07ed Merge pull request #7979 from beraldoleal/gogo-removal
protocol: remove gogoprotobuff tests
2023-09-20 22:38:10 -03:00
Gabriela Cervantes
87a8616488 gha: Install hunspell for static checks
It seems the static checks are failing due to the missing hunspell
package; this PR fixes that.

Fixes #8019

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-20 16:58:10 +00:00
Fabiano Fidêncio
8c3c50ca8a ci: Actually enable the CRI-O tests
The test has been added to the repo, but we have to also add it to the
list of jobs to be executed.

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 18:01:25 +02:00
David Esparza
03554c799a Merge pull request #8006 from fidencio/topic/ci-test-with-crio-part-0
ci: k8s: Also run tests with CRI-O
2023-09-20 07:45:17 -06:00
Fabiano Fidêncio
c6a9e50c37 Merge pull request #8004 from microsoft/danmihai1/quoted-spaces
runtime: support kernel params including spaces
2023-09-20 12:10:51 +02:00
Wang, Arron
3a6510ad61 osbuild: Reduce guest components binary size with strip
opa_linux_amd64_static 38M => 27M
kata-agent 30M => 23M

ls -alh opa_linux_amd64_static
-rw-rw-r-- 1 arron arron 38M Jul 28 01:59 opa_linux_amd64_static
➜ kata-containers git:(main) ✗ strip opa_linux_amd64_static
➜ kata-containers git:(main) ✗ ls -alh opa_linux_amd64_static
-rw-rw-r-- 1 arron arron 27M Sep 20 16:12 opa_linux_amd64_static

ls -alh ./usr/bin/kata-agent
-rwxr-xr-x. 1 root root 30M Jul 30 23:41 ./usr/bin/kata-agent
ls -alh ./usr/bin/kata-agent
-rwxr-xr-x. 1 root root 23M Sep 20 16:13 ./usr/bin/kata-agent

Fixes: #8011

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-09-20 16:23:17 +08:00
Fabiano Fidêncio
07a6e63a6b ci: k8s: rke2: Use sudo to call systemd
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 08:48:29 +02:00
Fabiano Fidêncio
03b82e8484 ci: k8s: Add a CRI-O test
Let's make sure we'll also be testing k8s using CRI-O.

For now, we'll only be running the CRI-O test with QEMU. Once it
becomes stable we can expand this to other hypervisors as well.

Fixes: #8005

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
d7105cf7a4 ci: k8s: Add a method to install CRI-O
This is based on the official CRI-O documentation[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.

We may want to expand this in the future, but we're good for now.

[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
54c0a471b1 ci: k8s: k0s: Allow passing parameters to the k0s installer
We'll need this in order to setup k0s with a different container engine.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-20 00:59:09 +02:00
Fabiano Fidêncio
31ef64606c Merge pull request #8007 from fidencio/topic/ci-kata-deploy-fix-garm-runner-name
ci: kata-deploy: Fix runner name
2023-09-20 00:58:33 +02:00
Beraldo Leal
730ef51693 deps: updating dependencies
Updating dependencies after make check, make test.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 16:54:35 -04:00
GabyCT
6111ef6fb6 Merge pull request #7990 from GabyCT/topic/parallelbandwidth
metrics: Enable parallel bandwidth iperf limit
2023-09-19 14:52:21 -06:00
Fabiano Fidêncio
3a2c83d69b ci: kata-deploy: Fix runner name
It should be garm-ubuntu-2004-smaller instead of garm-ubuntu-2004-small.

Fixes: #7890

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 22:34:37 +02:00
Dan Mihai
82ff2db460 runtime: support kernel params including spaces
Support quoted kernel command line parameters that include space
characters. Example:

dm-mod.create="dm-verity,,,ro,0 736328 verity 1
/dev/vda1 /dev/vda2 4096 4096 92041 0 sha256
f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4
065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038"

Fixes: #8003

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-19 20:26:38 +00:00
Beraldo Leal
604a9dd673 protocol: remove gogoprotobuff tests
This is part of a bigger effort to drop gogoprotobuf from our code
base. IIUC, those options are basically used by *pb_test.go, and since
we are dropping gogoprotobuf and those are auto-generated tests, let's
just remove them.

Fixes #7978.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-19 12:55:42 -04:00
Fabiano Fidêncio
5560e72024 Merge pull request #7896 from fidencio/topic/ground-work-for-testing-all-k8s-flavours-we-support
ci: kata-deploy: Enable all k8s flavours that we support
2023-09-19 17:44:34 +02:00
Fabiano Fidêncio
f7fa7f602a ci: Enable kata-deploy tests for all the supported k8s flavours
Let's ensure we test kata-deploy on RKE2 and k0s as well.

Fixes: #7890

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
2c908b598c ci: kata-deploy: Add the ability to deploy rke2
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
eaf6164916 ci: kata-deploy: Add the ability to deploy k0s
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
0015257636 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
bf2cb02283 ci: kata-deploy: Expland tests to run on k0s / rke2
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
6d5d844e5c Merge pull request #7983 from sprt/resource-group-naming
ci: Create clusters in individual resource groups
2023-09-19 12:54:21 +02:00
Fabiano Fidêncio
b12b9e1886 ci: kata-deploy: Add placeholder for tests on GARM
We'll be testing kata-deploy with different kubernetes flavours as part
of our GARM tests, and this is a placeholder for that.

Once enabled, we'll do nothing, just `return 0`, so we can then properly
add the tests after this commit gets merged.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:42:02 +02:00
Fabiano Fidêncio
9e1fb8a966 ci: kata-deploy: Export KUBERNETES env var
So we have better control over which flavour of Kubernetes kata-deploy
is expected to target.

This was also done as part of fa62a4c01b,
for the k8s tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:37:56 +02:00
Fabiano Fidêncio
09cc0ed438 ci: Move deploy_k8s() to gha-run-k8s-common.sh
This will allow us to re-use the function in the kata-deploy tests,
which will come soon.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:37:56 +02:00
Fabiano Fidêncio
1829f5c049 Merge pull request #7992 from skaegi/virtiofsd-1.8.0
versions: Bump virtiofsd to v1.8.0
2023-09-19 11:52:49 +02:00
Fabiano Fidêncio
486fe14c99 ci: Properly set K8S_TEST_UNION
Otherwise only the first test will be executed.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 10:23:58 +02:00
Aurélien Bombo
d9ef1352af ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum number of characters allowed for the cluster
name.  With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-09-19 10:23:58 +02:00
Aurélien Bombo
68267a3996 ci: Create clusters in individual resource groups
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.

This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.

The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.

Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.

Fixes: #7982

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-09-19 10:23:55 +02:00
Fabiano Fidêncio
84c0d59d23 Merge pull request #7985 from fidencio/topic/clh-use-static_sandbox_resource_mgmt-as-default-on-arm
clh: arm: Use static_sandbox_resource_mgmt=true
2023-09-19 09:25:34 +02:00
Gabriela Cervantes
9aa8d1c917 metrics: Add parallel bandwidth limit for qemu
This PR adds the parallel bandwidth limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-18 21:08:54 +00:00
Simon Kaegi
44c7c082d9 versions: Bump virtiofsd to v1.8.0
https://gitlab.com/virtio-fs/virtiofsd/-/releases/v1.8.0 was released two weeks ago. We have fully tested and are using this version.

Also bumps toolchain version to match what virtiofsd used.

Fixes: #7960

Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
2023-09-18 15:21:15 -04:00
Fabiano Fidêncio
5f8e210d3b Merge pull request #7961 from ChengyuZhu6/update_nydus
Bump nydus versions and update nydus tests
2023-09-18 21:02:20 +02:00
Fabiano Fidêncio
c3ee913bf6 Merge pull request #7953 from gkurz/extra-monitor-socket
runtime/qemu: Rework QMP/HMP support
2023-09-18 19:04:14 +02:00
Gabriela Cervantes
af59d4bf4a metrics: Enable parallel bandwidth iperf limit
This PR enables the parallel bandwidth iperf limit for kata metrics.

Fixes #7989

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-18 16:32:11 +00:00
Fabiano Fidêncio
aba36ab188 nydus: Temporarily skip tests on dragonball
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.

The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
   /rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
   error: Failed to Mount backend
...

Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b8a8dfcd15 nydus: Use kata-${KATA_HYPERVISOR} instead of kata
This will ensure we're testing with the correct runtime, instead of
using the `default` one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
ChengyuZhu6
f6df3d6efb static-build: Fix arch error on nydus build
Fix the arch error when downloading the nydus tarball.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Steven Horsman <steven@uk.ibm.com>
2023-09-18 17:40:06 +02:00
ChengyuZhu6
2f9c9e2e63 tests: nydus: Update nydus tests
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
c9a4e7e46d versions: Bump nydus and nydus-snapshotter to its latest release
As we need https://github.com/containerd/nydus-snapshotter/pull/530 in.

Fixes #7984

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b73bde320d gha: nydus: Populate run()
And with this we finally enable the nydus tests to run as part of our
GHA CI.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b3904a1a30 gha: nydus: Populate install_dependencies()
Let's have all the dependencies needed for running the nydus tests
installed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
d2b3b67f5d gha: nydus: Actually install kata when install-kata is called
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.

Now, let's install kata, as expected. :-)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
0ec00ad42e gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
As we've added install_nydus() and install_nydus_snapshotter(), which
conform to the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.

Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
568439c77b tests: nydus: Add timeout to the crictl calls
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
5ac3b76eb1 tests: nydus: Add uid / namespace to the nydus container / sandbox
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"

getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
376574a16c tests: nydus: Decorate some calls with sudo
Otherwise we cannot properly start the nydus snapshotter, nor properly
kill it after it's been started.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
4290fd4b67 tests: nydus: Adapt "source ..." to GHA
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
a84efa3e87 tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
As that's what we've been using as part of the GHA.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
56a14b3950 tests: common: Add install_nydus_snapshotter()
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing other dependencies from
GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
b6563783e2 tests: common: Add install_nydus()
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing other dependencies from GitHub.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 17:40:06 +02:00
Fabiano Fidêncio
72599f1911 clh: arm: Use static_sandbox_resource_mgmt=true
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarch64.

With this patch, when building for x86_64, I can see that this is the
resulting config:
```
$ ARCH=amd64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false

```

And when building for aarch64:
```
$ ARCH=arm64 make
...

$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```

Fixes: #7941

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-18 14:14:10 +02:00
Jeremi Piotrowski
dfa6af54df Merge pull request #7806 from jongwu/clh_serial
clh:arm64: use arm AMBA UART for hypervisor debug
2023-09-18 12:29:07 +02:00
Greg Kurz
1f16b6627b runtime/qemu: Rework QMP/HMP support
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.

The HMP monitor allows tampering with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.

The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.

A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine-grained control over whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".

An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.

While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.

This feature is only made visible in the base and GPU configurations
of QEMU for now.

Fixes #7952

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-18 12:13:01 +02:00
Greg Kurz
cab46c9e23 Merge pull request #7973 from fidencio/topic/ci-use-bigger-machine-sizes-for-the-needed-tests-part-0
ci: Use variable size of VMs depending on the tests running
2023-09-18 12:06:44 +02:00
Fabiano Fidêncio
0e3bfac3b3 Merge pull request #7976 from fidencio/topic/ci-static-checks-rework-part-0
ci: Rework static checks
2023-09-18 11:01:18 +02:00
Peng Tao
6eedd9b0b9 Merge pull request #7738 from Xuanqing-Shi/7732/handle-non-empty-endpoints-in-RemoveEndpoints
runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
2023-09-18 10:58:28 +08:00
Fabiano Fidêncio
8b1e9b0c75 ci: static-checks: Clean up static-checks job
Now that the static-checks job only takes care of running the
static-checks, let's clean it up, remove all the unneeded steps, make
sure that we're using the actions in their latest version, and have it
running in a cost free runner.

At some point I'd like to see those tests done in parallel, in the same
way that I've organised the build-checks, but that's something for
someone else, at some other time.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 14:23:02 +02:00
Fabiano Fidêncio
2c5ca2eaf8 ci: static-checks: Run tests depending on KVM
With this we're removing the dragonball static-checks CI, as the test is
running here now. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 14:22:38 +02:00
Fabiano Fidêncio
509c309ab2 ci: static-checks: Move "sudo make test" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:23 +02:00
Fabiano Fidêncio
4e963cedf4 ci: static-checks: Move "make test" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:17 +02:00
Fabiano Fidêncio
08f2e5ae0b runtime-rs: Ensure static-checks-build is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `config`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:13 +02:00
Fabiano Fidêncio
2bc3a616ae kata-ctl: Use loop instead of kvm module in tests
This makes it possible to run the tests in the cost-free runners, which
are not KVM capable.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:08 +02:00
Fabiano Fidêncio
46daddc500 kata-ctl: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:53:01 +02:00
Fabiano Fidêncio
ec826f328f agent: Ensure GENERATED_CODE is a dep of make test
Otherwise `make test` will fail with:
```
error[E0583]: file not found for module `version`
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:57 +02:00
Fabiano Fidêncio
1d32410a83 ci: install_libseccomp: Do not depend on the tests repo
It makes things way simpler, waaaaay simpler.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:49 +02:00
Fabiano Fidêncio
bf888b9a5e ci: static-checks: Move "make check" to the new test matrix
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` checks.

This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:45 +02:00
Fabiano Fidêncio
473ec87806 kata-ctl: Add kata-types to the Cargo.lock file
Commit message covered everything. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:40 +02:00
Fabiano Fidêncio
ea19549a99 kata-ctl: Ensure GENERATED_CODE is a dep of make check
Otherwise `make check` would fail with:
```
Error writing files: failed to resolve mod `version`:
/home/runner/work/kata-containers/kata-containers/src/tools/kata-ctl/src/ops/version.rs
does not exist make: *** [../../../utils.mk:176: standard_rust_check] Error 1
```

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:36 +02:00
Fabiano Fidêncio
e125775863 tests: install_rust: Also install clippy
clippy is used as part of our tests, so it's useful to have it installed
while we're already installing rust.

In case of developers, they also better be using it. :-)

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:31 +02:00
Fabiano Fidêncio
e2c61a152c ci: static-checks: Move vendor check to its own job
Similarly to the static-check jobs, those jobs can be run on the zero
cost runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:30 +02:00
Fabiano Fidêncio
6794d4c843 tests: Move install_rust.sh from the tests repo
We'll use it as part of the refactoring we're doing in the static check
tests.

I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:29 +02:00
Fabiano Fidêncio
e64508c308 tests: install_go: Remove tests repo dependency
We can rely on the functions that are now part of the common.bash.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:28 +02:00
Fabiano Fidêncio
11dff731b7 tests: Move functions from kata_arch script here
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:28 +02:00
Fabiano Fidêncio
75c974c802 ci: static-checks: Move kernel config check to its own job
It doesn't make sense to run this for all the bits of the matrix,
nor is it demanding enough to require running this in one of our
Azure sponsored runners.

Fixes: #7974 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 12:52:25 +02:00
Archana Shinde
9c233bb9e0 test: Add test to verify try_from for clh Netconfig
Add tests to verify conversion from runtime NetworkConfig
to clh specific config.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-16 00:24:14 -07:00
Fabiano Fidêncio
c69a1e33bd ci: Use variable size of VMs depending on the tests running
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.

Now, about the commit itself, as we're working to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that require the normal Azure instance (D4s_v5)

With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.

Fixes: #7972

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-16 09:13:54 +02:00
Archana Shinde
9049d311df runtime-rs: Add network support for cloud-hypervisor
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.

Support for adding and removing network devices is not really added to
the resource manager, so supporting this for cloud-hypervisor is not
scoped in this PR.

This also changes "pending_devices" for clh implementation from an
Option of vector to simply a vector. This simplifies the structure a bit
as we can simply iterate over the pending devices instead of having to
check for a "Some" value as this is not really required.

Fixes: #6333

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-09-15 23:25:20 -07:00
Greg Kurz
79c494eb4e Merge pull request #7969 from fidencio/topic/ci-cache-using-oras-part-3
ci: cache: Check the sha256sum of the components & fix ovmf-sev cache usage
2023-09-15 16:30:22 +02:00
Fabiano Fidêncio
eecd5bf2aa ci: cache: Fix ovmf-sev cache
The cached tarball is relying on the component name, thus it's important
to set it correctly, otherwise we'll end up always building it.

With this patch applied:
```
≡ ⨯ make ovmf-sev-tarball
make ovmf-sev-tarball-build
make[1]: Entering directory '/home/ffidenci/src/upstream/kata-containers/kata-containers'
/home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-binaries-in-docker.sh  --build=ovmf-sev
sha256:67cc94e393dc1d5bfc2b77a77e83c9b1c0833d0fbbebaa9e9e36f938bb841fcc
Build kata version 3.2.0-rc0: ovmf-sev
INFO: DESTDIR /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/destdir
Downloading a76f5522493f ovmf-sev-builder-image-version
Downloading 7e98c854bd94 kata-static-ovmf-sev.tar.xz
Downloading 559311973ff8 ovmf-sev-version
Downloaded  a76f5522493f ovmf-sev-builder-image-version
Downloading 353b655c2297 ovmf-sev-sha256sum
Downloaded  559311973ff8 ovmf-sev-version
Downloaded  353b655c2297 ovmf-sev-sha256sum
Downloaded  7e98c854bd94 kata-static-ovmf-sev.tar.xz
Pulled [registry] ghcr.io/kata-containers/cached-artefacts/ovmf-sev:latest-main-x86_64
Digest: sha256:933236c2c79e53be3ca7acc0b966d0ddac9c0335edcb1e8cad8b9bb3aaf508ce
kata-static-ovmf-sev.tar.xz: OK
INFO: Using cached tarball of ovmf-sev
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/share/
drwxr-xr-x runner/runner     0 2023-09-15 10:34 ./opt/kata/share/ovmf/
-rwxr-xr-x runner/runner 4194304 2023-09-15 10:34 ./opt/kata/share/ovmf/AMDSEV.fd
~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build ~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/builddir
~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/ovmf-sev/builddir
make[1]: Leaving directory '/home/ffidenci/src/upstream/kata-containers/kata-containers'
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 12:39:22 +02:00
Fabiano Fidêncio
86c41074b4 ci: cache: Check the sha256sum of the component
We removed this in part 2 of this effort, as we were not caching the
sha256sum of the component. Now that the caching part has been merged,
let's get back to checking it.
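A minimal Python sketch of what such a check amounts to. It assumes the cached `ovmf-sev-sha256sum` file (seen in the download output above) uses the usual `sha256sum` line format, `<hex digest>  <file name>`. The helper names are hypothetical; the real check lives in the kata-deploy shell scripts and may be implemented differently, for instance with `sha256sum -c`.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256sum_file(checksum_file: Path) -> dict:
    """Check each '<hex>  <name>' line, resolving names next to the checksum file.

    Returns a mapping {name: "OK" | "FAILED"}, mirroring `sha256sum -c` output.
    """
    results = {}
    for line in checksum_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in sha256sum output
        actual = sha256_of(checksum_file.parent / name)
        results[name] = "OK" if actual == expected.lower() else "FAILED"
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tarball = Path(tmp) / "kata-static-ovmf-sev.tar.xz"
        tarball.write_bytes(b"not a real tarball")
        sums = Path(tmp) / "ovmf-sev-sha256sum"
        sums.write_text(f"{sha256_of(tarball)}  {tarball.name}\n")
        for name, status in verify_sha256sum_file(sums).items():
            print(f"{name}: {status}")
```

Hashing in chunks keeps memory flat even for large tarballs.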

Fixes: #7834 -- part 3

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 12:34:30 +02:00
Fabiano Fidêncio
f5e52d02d3 Merge pull request #7964 from fidencio/topic/ci-cache-using-oras-part-2
ci: cache: Use the artefacts stored in ghcr.io/kata-containers/cached-artefacts/${component}
2023-09-15 12:29:28 +02:00
Fabiano Fidêncio
2fe0b494da Merge pull request #7959 from fidencio/topic/ci-run-on-smaller-garm-instances
ci: Run some of the GARM tests in smaller instances
2023-09-15 11:30:13 +02:00
Fabiano Fidêncio
460988c5f7 ci: cache: Remove the script used to cache artefacts on Jenkins
That's not needed anymore, as we've switched to using ORAS and an OCI
registry to cache the artefacts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:27:55 +02:00
Fabiano Fidêncio
4533a7a416 ci: cache: Also store the ${component} sha256sum
This is something that was done by our Jenkins jobs, but that I ended up
missing when writing d0c257b3a7.

Now, let's also add the sha256sum to the cached artefact, and in an
upcoming PR (after this one is merged) we will also start checking it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:25:26 +02:00
Fabiano Fidêncio
eccc76df63 ci: cache: Use the cached artefacts from ORAS
In the previous series related to the artefacts we build, we switched
from storing the artefacts on Jenkins to storing them in
ghcr.io/kata-containers/cached-artefacts/${artefact_name}.

Now, let's take advantage of that and actually use the artefacts coming
from that "package" (as GitHub calls it).

NOTE: One thing I've noticed we're missing is storing and checking the
sha256sum of the artefact. The storing part will be done in a different
commit, and the sha256sum check will be done in a different PR, as we
need to ensure the checksums have been pushed to the registry before
actually biting the bullet and checking for them.

Fixes: #7834 -- part 2

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 10:13:47 +02:00
Jeremi Piotrowski
6f30d00ae7 Merge pull request #7956 from fidencio/topic/ci-reduce-the-machine-size-used
ci: Reduce the size of the AKS VMs
2023-09-15 08:49:08 +02:00
Steve Horsman
1b8f3fa9ae Merge pull request #7957 from fidencio/topic/ci-cache-using-oras-part-1
ci: cache: Allow pushing our artefacts to an OCI registry
2023-09-15 07:45:24 +01:00
Jianyong Wu
7f5e77bcb8 kernel: enable Arm pl011 support
Enable pl011 (ttyAMA0) support in kernel for aarch64.

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:45:16 +00:00
Jianyong Wu
241c355e07 clh:arm64: use arm AMBA uart for hypervisor debug
Cloud Hypervisor on arm64 only supports the Arm AMBA UART (pl011) as a
tty, so the console should be set to "ttyAMA0" instead of "ttyS0" when
hypervisor debug mode is enabled.

Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-09-15 01:44:23 +00:00
Fabiano Fidêncio
094b6b2cf8 ci: k8s: Temporarily disable tests that require a bigger VM instance
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI

We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 01:33:19 +02:00
GabyCT
6fe5cd3bd5 Merge pull request #7937 from GabyCT/topic/iperfbandwidth
metrics: Add iperf value for cpu utilization
2023-09-14 16:47:19 -06:00
Fabiano Fidêncio
d0c257b3a7 ci: cache: Push cached artefacts to ghcr.io
Let's push the artefacts to ghcr.io and stop relying on jenkins for
that.

Fixes: #7834 -- part 1

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
108f1b60dd kata-deploy: Generate latest_{artefact,image_builder} files
Right now this is not used, but it'll be used when we start caching the
artefacts using ORAS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
be2eb7b378 ci: cache: Install ORAS in the kata-deploy binaries builder container
ORAS is the tool which will help us to deal with our artefacts being
pushed to and pulled from a container registry.

As both the push to and the pull from will be done inside the
kata-deploy binaries builder container, we need it installed there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:39:57 +02:00
Fabiano Fidêncio
fb24fb0dc1 ci: k8s: devmapper: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5, as those tests are not CPU / memory
intensive. With this in mind, let's use a smaller instance, the
D2s_v5.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:27:05 +02:00
Fabiano Fidêncio
1daf02f5d4 ci: nydus: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5, as those tests are not CPU / memory
intensive. With this in mind, let's use a smaller instance, the
D2s_v5.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
e60d81f554 ci: nerdctl: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5, as those tests are not CPU / memory
intensive. With this in mind, let's use a smaller instance, the
D2s_v5.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
4db416997c ci: docker: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5, as those tests are not CPU / memory
intensive. With this in mind, let's use a smaller instance, the
D2s_v5.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:41 +02:00
Fabiano Fidêncio
32841827b8 ci: cri-containerd: Use a smaller / cheaper VM instance
We don't need to run on a D4s_v5, as those tests are not CPU / memory
intensive. With this in mind, let's use a smaller instance, the
D2s_v5.

Fixes: #7958

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-15 00:25:35 +02:00
Fabiano Fidêncio
92fff129fd ci: k8s: Don't set cpu limit / request for k8s-inotify test
By not setting the cpu limit / request to 1, we can run this test in a
smaller VM instance without any issues.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 22:03:16 +02:00
Fabiano Fidêncio
faf98c0623 ci: Reduce the size of the AKS VMs
We do **not** need a very powerful machine for our tests, as we're not
building anything there.

The instance we switched to (Standard_D2s_v5) still has nested virt
available, as shown here[0], but has half the vCPUs / memory, which
should be fine for just running the tests, and costs us roughly half
the price[1].

[0]:
https://learn.microsoft.com/en-us/azure/virtual-machines/dv5-dsv5-series
[1]:
https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/#pricing

Fixes: #7955

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 22:03:16 +02:00
Fabiano Fidêncio
adc18ecdb1 ci: cache: For consistency, read all used env vars
Instead of only considering some of them when they're explicitly
passed to the script.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 20:24:48 +02:00
Fabiano Fidêncio
c7a851efd7 ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
As the environment variables are now being passed down from the GitHub
Actions, let's make sure they're exposed to the container used to build
the kata-deploy binaries, and during the build process we'll be able to
use those to log in and push the artefacts to the OCI registry, using
ORAS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 20:24:48 +02:00
Fabiano Fidêncio
2e8b41f39c Merge pull request #7954 from fidencio/topic/ci-cache-using-oras-part-0
ci: cache: Export env vars needed to use ORAS
2023-09-14 20:23:55 +02:00
Fabiano Fidêncio
6bd15a85d5 ci: cache: Export env vars needed to use ORAS
We do the build of our artefacts inside a container image, and we need
to expose some env vars to the container so ORAS can be used there to
push the artefacts we want to cache to ghcr.io.

The env vars we're exposing are:
* ARTEFACT_REGISTRY: The registry where we're going to save the
  artefacts.
* ARTEFACT_REGISTRY_USERNAME: The username to log in to the registry, as
  ORAS does not use the same JSON file used by docker.
* ARTEFACT_REGISTRY_PASSWORD: The password to log in to the registry,
  as ORAS does not use the same JSON file used by docker.
* TARGET_BRANCH: The target branch, which will be part of the tag of the
  artefact, as we may end up caching the artefacts for both main and
  stable branches.
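A sketch of how these variables could fit together, written in Python for illustration: it builds the cached-artefact reference and reads the ORAS credentials. The `kata-containers/cached-artefacts/<component>` path and the `latest-<branch>-<arch>` tag layout are assumptions inferred from the `ghcr.io/kata-containers/cached-artefacts/ovmf-sev:latest-main-x86_64` reference pulled in the `make ovmf-sev-tarball` output above. Both function names are hypothetical.

```python
import os
import platform
from typing import Mapping, Optional, Tuple


def cached_artefact_ref(component: str,
                        env: Optional[Mapping[str, str]] = None,
                        arch: Optional[str] = None) -> str:
    """Build e.g. ghcr.io/kata-containers/cached-artefacts/ovmf-sev:latest-main-x86_64.

    Assumed layout: <registry>/kata-containers/cached-artefacts/<component>,
    tagged latest-<TARGET_BRANCH>-<arch>. Putting the branch in the tag keeps
    main and stable-branch caches apart.
    """
    env = os.environ if env is None else env
    registry = env["ARTEFACT_REGISTRY"]
    branch = env["TARGET_BRANCH"]
    arch = arch or platform.machine()
    return f"{registry}/kata-containers/cached-artefacts/{component}:latest-{branch}-{arch}"


def registry_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Per the commit message, ORAS doesn't use docker's JSON file, so read env vars."""
    env = os.environ if env is None else env
    try:
        return env["ARTEFACT_REGISTRY_USERNAME"], env["ARTEFACT_REGISTRY_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"missing required env var: {missing}") from None


if __name__ == "__main__":
    demo_env = {"ARTEFACT_REGISTRY": "ghcr.io", "TARGET_BRANCH": "main"}
    print(cached_artefact_ref("ovmf-sev", env=demo_env, arch="x86_64"))
```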

Fixes: #7834 -- part 0

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-14 19:36:33 +02:00
Gabriela Cervantes
cd4fd1292a metrics: Add iperf cpu utilization limit for qemu
This PR adds the iperf cpu utilization limit for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-14 17:17:47 +00:00
Gabriela Cervantes
df5cd10ea0 metrics: Add iperf value for cpu utilization
This PR adds the iperf value for cpu utilization for kata metrics.

Fixes #7936

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-14 16:06:49 +00:00
Jeremi Piotrowski
b54dd8cdf4 Merge pull request #7704 from jepio/vfio-part-1
gha: vfio: Import test script
2023-09-14 16:45:31 +02:00
Jeremi Piotrowski
a96050a7ad tests: Apply timeout to 'ctr t kill'
This task has been observed to hang at times.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
9d93036783 tests/vfio: Bump VM image to Fedora 38
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
faee59b520 tests/vfio: Accept single device in vfio group for CLH
Cloud Hypervisor does not emulate PCIe switches or PCI bridges, so we need
to accept a single device in the vfio group.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
df3dc1105c tests/vfio: Get rid of sync's
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7211c3dccc gha: vfio: Set test timeout to 15m
Sometimes the test gets stuck running commands in the container - need to
investigate why later.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
1b02f89e4f packaging: kernel: Enable VIRTIO_IOMMU on x86_64
Cloud Hypervisor exposes a VIRTIO_IOMMU device to the VM when IOMMU support is
enabled. We need to add it to the whitelist because dragonball uses kernel
v5.10 which restricted VIRTIO_IOMMU to ARM64 only.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
3a1db7a86b runtime: clh: Support enabling iommu
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.

Enabling the IOMMU on the default PCI segment requires enabling the IOMMU
on every device that is attached to it, which is why it is sprinkled all
over the place.

CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
9f1a42c6cc tests/vfio: Give commands 30s to execute
This is to catch the case of the guest getting stuck.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
b46b0ecf8b tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
This shouldn't be hiding behind only a qemu check, we need this for clh as
well.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
bfc93927fb runtime: Remove redundant check in checkPCIeConfig
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
7c4e73b609 runtime: Add test cases for checkPCIeConfig
These test cases show which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
fc51e4b9eb runtime: Check config for supported CLH (cold|hot)_plug_vfio values
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio is not supported yet.
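The rule can be restated as a small validation function. This is a hypothetical Python sketch, not the Go `checkPCIeConfig` code from the runtime, and it assumes `no-port` is the value meaning "cold plug disabled".

```python
SUPPORTED_CLH_HOT_PLUG_VFIO = frozenset({"root-port", "no-port"})


def check_clh_vfio_config(hot_plug_vfio: str, cold_plug_vfio: str = "no-port") -> None:
    """Raise ValueError for (cold|hot)_plug_vfio values CLH cannot handle yet."""
    # Assumption: "no-port" means cold plugging is disabled.
    if cold_plug_vfio != "no-port":
        raise ValueError(f"cold_plug_vfio={cold_plug_vfio!r}: not supported by CLH yet")
    if hot_plug_vfio not in SUPPORTED_CLH_HOT_PLUG_VFIO:
        raise ValueError(
            f"hot_plug_vfio={hot_plug_vfio!r}: CLH only supports "
            f"{sorted(SUPPORTED_CLH_HOT_PLUG_VFIO)}"
        )


if __name__ == "__main__":
    check_clh_vfio_config("root-port")  # accepted
    for hot, cold in [("switch-port", "no-port"), ("root-port", "root-port")]:
        try:
            check_clh_vfio_config(hot, cold)
        except ValueError as err:
            print(err)
```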

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
509771e6f5 runtime: clh: Add hot_plug_vfio entry to config
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
5f6475a28a tests/vfio: Gather debug info and disable tdp_mmu
tdp_mmu had some issues up until around Linux v6.3 that made it work
particularly badly when running nested on Hyper-V. Reload the module at the
start of the test and disable the tdp_mmu param.

Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.
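GitHub Actions folds log lines printed between the `::group::<title>` and `::endgroup::` workflow commands into a collapsible section. The test script itself is shell; this Python sketch only shows the idea, and `gha_group` is a hypothetical helper.

```python
import contextlib
import io
import sys


@contextlib.contextmanager
def gha_group(title: str, out=None):
    """Wrap everything printed inside the block in a collapsible log group."""
    out = sys.stdout if out is None else out
    print(f"::group::{title}", file=out)
    try:
        yield out
    finally:
        # Close the group even if the debug command inside raised.
        print("::endgroup::", file=out)


if __name__ == "__main__":
    buf = io.StringIO()
    with gha_group("journal", out=buf) as log:
        print("...journal lines...", file=log)
    sys.stdout.write(buf.getvalue())
```

Closing the group in `finally` keeps later log output from being swallowed into a group that never ended.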

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
8fffdc81c5 tests/vfio: Capture journal from vm
For debugging (though this doesn't get exposed yet).

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
df815087e7 tests/vfio: Change to get the test working in GHA
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
a92ddeea15 tests/vfio: Move dependency installation to gha-run.sh
To match the flow of other github actions workflows.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Jeremi Piotrowski
5a551a85b1 gha: vfio: Import jobs scripts from tests repo
This imports the vfio test scripts from github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-14 14:23:28 +02:00
Fabiano Fidêncio
a1e3fa7ac4 Merge pull request #7905 from microsoft/danmihai1/mariner-annotations
tests: fix kernel and initrd annotations
2023-09-14 10:37:42 +02:00
GabyCT
1d331124ad Merge pull request #7925 from GabyCT/topic/bandwidthlimit
metrics: Add iperf bandwidth value for kata metrics
2023-09-13 17:43:55 -06:00
Gabriela Cervantes
49e2fa189c metrics: Increase jitter value for qemu
This PR increases the jitter value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-13 22:36:09 +00:00
Gabriela Cervantes
49234433a7 metrics: Increase value limit for jitter in clh
This PR increases the value limit for jitter in clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-13 21:27:08 +00:00
David Esparza
0a24d3f718 Merge pull request #7923 from GabyCT/topic/addcassandradoc
metrics: Add Cassandra Metrics documentation
2023-09-13 10:17:00 -06:00
GabyCT
c565053bac Merge pull request #7895 from GabyCT/topic/removewarning
metrics: Remove warning from metrics documentation
2023-09-13 10:16:38 -06:00
Fabiano Fidêncio
8b9df1d32e Merge pull request #7929 from fidencio/topic/use-tcp-port-ping-on-docker-nerdctl-tests
ci: docker: nerdctl: Switch to tcp port 80 ping
2023-09-13 15:46:31 +02:00
Peng Tao
55ca7e8aec Merge pull request #7907 from Xuanqing-Shi/7876/network-devices-naming-conflict
runtime: Naming conflict of network devices
2023-09-13 19:29:41 +08:00
Fabiano Fidêncio
813bfdec01 ci: docker: nerdctl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
This will ensure that we're calling the correct binary for the
hypervisor.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:10:14 +02:00
Fabiano Fidêncio
46bc0b1c01 ci: nerdctl: Create the containerd config
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.

This is mostly needed because the nerdctl-full tarball doesn't provide a
containerd configuration, just the binary, as containerd does not
actually require a configuration file to run with the default config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
Fabiano Fidêncio
13968aa7f6 ci: nerdctl: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without explicit outbound
connectivity defined.

This causes issues when using `ping ...` as part of our tests, and when
I consulted Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments; after all, this is the
internet. :-)

Anyway, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly use TCP port 80 for the ping.
With this in mind, I'm switching the image used by the test to one
that provides nping as a possible entry point, and from now on (this
part of) the tests should work.
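The idea behind the switch is to check reachability with a TCP handshake instead of an ICMP echo. The test itself uses `nping` from the nmap package; the Python below is only an illustrative sketch of a TCP "port ping", and `tcp_ping` is a hypothetical helper.

```python
import socket
import time


def tcp_ping(host: str, port: int = 80, timeout: float = 2.0) -> float:
    """Return the TCP connect time in seconds, or raise OSError if unreachable.

    Only needs outbound TCP to `port`, unlike ICMP echo requests.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


if __name__ == "__main__":
    try:
        print(f"example.com:80 reachable in {tcp_ping('example.com') * 1000:.1f} ms")
    except OSError as err:
        print(f"example.com:80 unreachable: {err}")
```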

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
Fabiano Fidêncio
e0c811678b ci: docker: Switch to tcp port 80 ping
TIL that the Azure VMs we use are created without explicit outbound
connectivity defined.

This causes issues when using `ping ...` as part of our tests, and when
I consulted Jeremi Piotrowski about the issue he pointed me to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity

For your own sanity, do not read the comments; after all, this is the
internet. :-)

Anyway, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly use TCP port 80 for the ping.
With this in mind, I'm switching the image used by the test to one
that provides nping as a possible entry point, and from now on (this
part of) the tests should work.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-13 13:00:57 +02:00
shixuanqing
1636abbe1c runtime: issue with non-empty []Endpoint in RemoveEndpoints
In RemoveEndpoints(), when the endpoints parameter isn't empty, using
idx may result in the wrong endpoints being removed. To fix this,
directly pass the endpoint itself, which helps locate the correct
elements within n.eps.
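To illustrate the pitfall, here is a Python sketch. It assumes the original loop deleted `n.eps[idx]` where `idx` was the position inside the `endpoints` argument rather than inside `n.eps`; `Endpoint` and both functions are hypothetical stand-ins for the Go code in network_linux.go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str


def remove_endpoints_by_index(eps, endpoints):
    """Buggy pattern: idx is a position in `endpoints`, not in `eps`."""
    eps = list(eps)
    for idx, _ep in enumerate(endpoints):
        del eps[idx]
    return eps


def remove_endpoints_by_value(eps, endpoints):
    """Fixed pattern: locate each endpoint inside `eps` itself."""
    eps = list(eps)
    for ep in endpoints:
        eps.remove(ep)  # raises ValueError if ep is not present
    return eps


if __name__ == "__main__":
    eth0, eth1, eth2 = Endpoint("eth0"), Endpoint("eth1"), Endpoint("eth2")
    print(remove_endpoints_by_index([eth0, eth1, eth2], [eth2]))  # drops eth0
    print(remove_endpoints_by_value([eth0, eth1, eth2], [eth2]))  # drops eth2
```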

Fixes: #7732

Signed-off-by: shixuanqing <1356292400@qq.com>

Fixes: #7732

Signed-off-by: shixuanqing <1356292400@qq.com>

Update src/runtime/virtcontainers/network_linux.go

Co-authored-by: Xuewei Niu <justxuewei@apache.org>
2023-09-13 09:47:18 +00:00
Peng Tao
9766f9090c Merge pull request #7719 from beraldoleal/nullable
Remove gogoproto.nullable extension
2023-09-13 15:11:56 +08:00
David Esparza
c2b2a00ad9 Merge pull request #7899 from GabyCT/topic/startdocker
metrics: Ensure docker is running in init_env
2023-09-12 23:01:26 -06:00
Gabriela Cervantes
0aa073967d metrics: Add iperf bandwidth value for qemu
This PR adds the iperf bandwidth value for qemu for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 20:57:14 +00:00
Dan Mihai
c0ad914766 tests: fix kernel and initrd annotations
Fix kernel and initrd annotations in the k8s tests on Mariner. These
annotations must be applied to the spec.template for Deployment, Job
and ReplicationController resources.

Fixes: #7764

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-12 20:15:25 +00:00
Gabriela Cervantes
615c1cbf19 metrics: Add iperf bandwidth value for kata metrics
This PR adds the iperf bandwidth value for kata metrics.

Fixes #7924

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 19:30:24 +00:00
Gabriela Cervantes
d53eb73eec metrics: Ensure docker is running in init_env
This PR ensures that docker is running as part of the init_env function
in kata metrics, to avoid failures such as docker not running, which
make the kata metrics CI fail.

Fixes #7898

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 19:13:09 +00:00
GabyCT
c0d502493e Merge pull request #7921 from dborquez/metrics_disable_fio_test
metrics: this PR skips the FIO test temporarily to fix issues
2023-09-12 12:08:48 -06:00
Gabriela Cervantes
ad08321b83 metrics: Add Cassandra Metrics documentation
This PR adds the Cassandra Metrics documentation for kata metrics.

Fixes #7922

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-12 16:30:35 +00:00
David Esparza
a58ea66592 metrics: this PR skips the FIO test temporarily to fix issues
The FIO test is showing ongoing issues when running in k8s. We're
working on running FIO with the ctr client, which has been shown to be
stable.

Fixes: #7920

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-12 10:23:57 -06:00
Fabiano Fidêncio
2d8447fc6b Merge pull request #7916 from fidencio/topic/add-functional-nerdctl-tests
ci: Add a very basic nerdctl sanity test
2023-09-12 17:47:08 +02:00
James O. D. Hunt
7feb8de9dc Merge pull request #7887 from jodh-intel/hypervisor-remove-debug-kernel-options
runtime-rs: hypervisor: Remove debug kernel options
2023-09-12 16:31:48 +01:00
Fabiano Fidêncio
f536ef5ce1 ci: docker: Also run the smoke test with runc
This will help us to make sure that the failure is actually related to
Kata Containers.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:54:02 +02:00
Fabiano Fidêncio
c83f167c59 ci: docker: Run the tests after the kata-static is created
There's no reason to wait till the payload is created to run the tests,
as we rely on the tarball, not on the kata-deploy payload.

That was a mistake on my side, and that's already fixed for the nerdctl
tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:53:47 +02:00
Fabiano Fidêncio
12d833d07d ci: Add a very basic nerdctl sanity test
Let's add a very basic sanity test to check that we can spawn
containers using nerdctl + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7911

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 16:52:55 +02:00
Greg Kurz
be71a0ab4e Merge pull request #7811 from stevenhorsman/bump-rust-to-1.72
versions: Bump rust version
2023-09-12 15:30:35 +02:00
Fabiano Fidêncio
b020912629 Merge pull request #7913 from fidencio/topic/add-functional-docker-tests
ci: Add a very basic docker sanity test
2023-09-12 15:28:49 +02:00
Fabiano Fidêncio
348b8644d6 ci: Add a very basic docker sanity test
Let's add a very basic sanity test to check that we can spawn
containers using docker + Kata Containers.

This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.

For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912

In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.

Fixes: #7910

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-12 15:15:26 +02:00
stevenhorsman
a75fd5eb81 runk: Fix rust unnecessary mut error
- Fix `error: variable does not need to be mutable`
in rust 1.72

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
a31c145172 kata-ctl: useless-vec warning
- Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
c8419fc3bb kata-ctl: Resolve non-minimal-cfg warning
- In Rust 1.72, clippy warns about clippy::non-minimal-cfg,
as the cfg has only one condition, so it doesn't
need to be wrapped in the `any` combinator.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
3eaf68d954 agent-ctl: Allow clippy lint
- Allow `clippy::redundant-closure-call`
which has issues with the guard function passed into
the `run_if_auto_values` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
1d8b78959d runtime-rs: Fix useless-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
99f3d69e94 runtime-rs: Remove mut
Fix `error: variable does not need to be mutable`

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
16fbc27b09 dragonball: Allow ambiguous-glob-reexports
The bindgen generated code is triggering lots of
ambiguous-glob-reexports warnings in rust 1.70+

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
bbf1919516 dragonball: Resolve non-minimal-cfg warning
- In Rust 1.72, clippy warns about clippy::non-minimal-cfg,
as the cfg has only one condition, so it doesn't
need to be wrapped in the `all` combinator.

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
75cfdd5d59 agent: config: Allow clippy lint
- Allow `clippy::redundant-closure-call` in `from_cmdline`
which has issues with the guard function passed into
the `parse_cmdline_param` macro

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
f3a0fd5907 agent: config: Fix useless-vec warning
Fix clippy::useless-vec warning

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
9e423bd3d6 libs: Fix clippy unnecessary hashes error
- Fix error: unnecessary hashes around raw string literal

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
stevenhorsman
444395050a versions: Bump rust version
Bump rust to 1.72.0 to test what extra warnings/issues we get

Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-09-12 11:31:49 +01:00
Yipeng Yin
a16b0962b5 chore(cargo): update cargo lock
Update cargo lock for runtime-rs, agent and kata-ctl.

Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 15:27:38 +08:00
Chao Wu
c800d0739f Merge pull request #7889 from UiPath/fix-dragonball-build
dragonball: fix for non-deterministic builds
2023-09-12 14:06:18 +08:00
shixuanqing
ca4b6b051d runtime: Naming conflict of network devices
When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness.
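One possible renaming scheme is to bump a trailing index until the name is free. This Python sketch is hypothetical; the rule actually used by the runtime may differ.

```python
import re


def unique_endpoint_name(wanted: str, existing: set) -> str:
    """Return `wanted` if unused, else bump its trailing number (eth0 -> eth1 ...)."""
    if wanted not in existing:
        return wanted
    m = re.fullmatch(r"(.*?)(\d+)", wanted)
    # Names without a trailing number start counting from 0 ("tap" -> "tap0").
    prefix, n = (m.group(1), int(m.group(2))) if m else (wanted, 0)
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"


if __name__ == "__main__":
    print(unique_endpoint_name("eth0", {"eth0", "eth1"}))  # eth2
```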

Fixes: #7876

Signed-off-by: shixuanqing <1356292400@qq.com>
2023-09-12 04:29:51 +00:00
Guixiong Wei
202049f35e feat(runtime-rs): introduce huge page type to select VM RAM's backend
This commit allows us to specify the huge page backend when enabling huge
pages. Currently, we support two backends, thp and hugetlbfs; the default
is hugetlbfs.

To ensure backward compatibility, we introduce another configuration item
"hugepage_type" to select the memory backend, which is available only when
"enable_hugepages" is true. Besides, we add an annotation
"io.katacontainers.config.hypervisor.hugepage_type" to configure huge page
type per pod.
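The selection rule can be sketched as follows. This is a hypothetical Python restatement, not the runtime-rs code, and it assumes the per-pod annotation takes precedence over the configuration file value.

```python
from typing import Mapping, Optional

HUGEPAGE_TYPE_ANNOTATION = "io.katacontainers.config.hypervisor.hugepage_type"
HUGEPAGE_TYPES = ("hugetlbfs", "thp")
DEFAULT_HUGEPAGE_TYPE = "hugetlbfs"


def resolve_hugepage_type(enable_hugepages: bool,
                          hugepage_type: Optional[str] = None,
                          annotations: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the VM RAM backend to use, or None when huge pages are disabled."""
    if not enable_hugepages:
        # hugepage_type is only honoured when enable_hugepages is true.
        return None
    annotations = annotations or {}
    # Assumption: the per-pod annotation overrides the configuration file.
    chosen = annotations.get(HUGEPAGE_TYPE_ANNOTATION) or hugepage_type or DEFAULT_HUGEPAGE_TYPE
    if chosen not in HUGEPAGE_TYPES:
        raise ValueError(f"unsupported hugepage_type {chosen!r}; expected one of {HUGEPAGE_TYPES}")
    return chosen


if __name__ == "__main__":
    print(resolve_hugepage_type(True))
    print(resolve_hugepage_type(True, "hugetlbfs", {HUGEPAGE_TYPE_ANNOTATION: "thp"}))
```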

Fixes: #6703

Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
2023-09-12 11:28:27 +08:00
Zhongtao Hu
e1f54f96d0 Merge pull request #7766 from Apokleos/wrap-vsock-virtiofs
runtime-rs: bring hybrid vsock devices in manager.
2023-09-12 09:27:34 +08:00
GabyCT
af29eeb8b1 Merge pull request #7901 from fidencio/topic/ci-target-branch-fixes-follow-up-3
ci: use github.ref_name instead of $GITHUB_REF_NAME
2023-09-11 15:31:29 -06:00
Fabiano Fidêncio
f811b064ca ci: use github.ref_name instead of $GITHUB_REF_NAME
Regardless of what's mentioned in the documentation, it seems that
$GITHUB_REF_NAME is passed down as a literal string.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 22:14:55 +02:00
Fabiano Fidêncio
dc0b350e49 Merge pull request #7900 from fidencio/topic/ci-target-branch-fixes-follow-up-2
ci: Add more target-branch related fixes
2023-09-11 21:26:26 +02:00
Fabiano Fidêncio
6d795c089e ci: Add more target-branch related fixes
The ones for payload-after-push.yaml and ci-nightly.yaml are not that
important right now, but they're needed for when we start running
those on stable branches as well.

The other ones were missed during
bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 20:42:57 +02:00
Fabiano Fidêncio
07d0ad0ad7 Merge pull request #7897 from fidencio/topic/ci-devmapper-do-the-rebase-as-well
ci: Fix target-branch usage
2023-09-11 20:30:53 +02:00
Fabiano Fidêncio
d7f991d139 Merge pull request #7151 from Yuan-Zhuo/fix-systemd-cgroup
agent: optimize the code of systemd cgroup manager
2023-09-11 20:15:51 +02:00
Fabiano Fidêncio
8509c31870 ci: Fix target-branch usage
We missed these as part of bd24afcf73.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 20:10:27 +02:00
Gabriela Cervantes
060499dcae metrics: Remove warning from metrics documentation
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.

Fixes #7894

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-11 16:41:48 +00:00
GabyCT
b384757ac7 Merge pull request #7874 from fidencio/topic/manually-rebase-branches-atop-of-the-target-one
gha: Manually rebase PR atop of the target branch before testing
2023-09-11 10:35:01 -06:00
Fabiano Fidêncio
46e73cf7a2 Merge pull request #7884 from fidencio/topic/update-kernel-to-the-latest-lts-plus-bring-in-erofs-patches
Update kernel to the latest LTS release (v6.1.52) and bring in erofs patches needed for the CC work
2023-09-11 13:58:43 +02:00
James O. D. Hunt
c0f697fcc5 runtime: Allow kernel_params annotation
To support the removal of the `initcall_debug` and `earlyprintk=`
options from the default guest kernel cmdline, add `kernel_params` to the list
of enabled annotations to allow those kernel options (or others) to be
set using `kata-deploy` for either runtime.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 12:12:12 +01:00
Alexandru Matei
b03e49794e dragonball: fix for non-deterministic builds
Fixes: #7888

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2023-09-11 14:07:10 +03:00
Fabiano Fidêncio
93bad13769 Merge pull request #7875 from fidencio/topic/kata-deploy-fix-arm64-image-build
kata-deploy: Fix aarch64 image build
2023-09-11 11:36:52 +02:00
James O. D. Hunt
976d10150c runtime-rs: hypervisor: Remove debug kernel options
Removed the following kernel command line options:

- `earlyprintk=ttyS0`
- `initcall_debug`

Both these options are only useful when debugging a guest kernel failure
which is not a common occurrence.

Further, the `earlyprintk=` option can have a large negative performance
impact (it can increase the VM boot time significantly).

If the user wishes to use either of these options, they can add them to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.
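
The effect of moving these options out of the defaults can be sketched in Go. The hypothetical `buildKernelCmdline` helper below joins a fixed set of default parameters with whatever the user put in `kernel_params=`, so `initcall_debug` and `earlyprintk=ttyS0` only appear when explicitly requested. The default parameter list is an assumption for illustration, not the actual runtime-rs defaults.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultParams is a hypothetical stand-in for the hypervisor's built-in
// kernel parameters; the debug-only options are deliberately absent.
var defaultParams = []string{"console=hvc0", "panic=1"}

// buildKernelCmdline appends the whitespace-separated, user-provided
// kernel_params to the defaults and returns the final command line.
func buildKernelCmdline(defaults []string, kernelParams string) string {
	params := append([]string{}, defaults...)
	params = append(params, strings.Fields(kernelParams)...)
	return strings.Join(params, " ")
}

func main() {
	// Normal boot: no debug options on the command line.
	fmt.Println(buildKernelCmdline(defaultParams, ""))
	// Debugging a guest kernel failure: opt in via kernel_params.
	fmt.Println(buildKernelCmdline(defaultParams, "initcall_debug earlyprintk=ttyS0"))
}
```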

Fixes: #7886.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 09:43:39 +01:00
Fabiano Fidêncio
fde34610cd kernel: Add erofs patches needed for CC related work
All the patches have already been merged upstream and they've just been
cherry-picked to this branch.

Fixes: #7885

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 10:39:37 +02:00
Fabiano Fidêncio
dc6a4588a2 versions: Bump kernel to the latest LTS release (6.1.52)
We're bumping here in order to make our lives easier backporting EROFS
patches needed for the CC related work.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-11 10:32:16 +02:00
James O. D. Hunt
52f6449b70 kata-manager: Remove initcall_debug kernel option
Removed the addition of the `initcall_debug` kernel option when agent
debugging is enabled. This option has nothing to do with the agent.

If the user wishes to use this option, they can add it to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-09-11 09:31:44 +01:00
Fabiano Fidêncio
6cd5d83a37 Merge pull request #7865 from gkurz/fix-more-virtiofs-args
runtime: Fix more virtiofs args
2023-09-09 21:30:16 +02:00
Fabiano Fidêncio
8b4a0b368f kata-deploy: Remove curl after it's used
There's no need to keep curl there after the kubectl binary has already
been downloaded.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-09 10:52:05 +02:00
Fabiano Fidêncio
139c7f03ab kata-deploy: Fix aarch64 image build
Similarly to what's been done for x86_64 -> amd64, we need to do an
aarch64 -> arm64 change in order to be able to download the kubectl
binary.
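
The architecture mapping can be sketched in Go as follows. The kata-deploy image does this in shell, so `kubectlArch` and `kubectlURL` are hypothetical helpers; the `https://dl.k8s.io/release/<version>/bin/linux/<arch>/kubectl` layout is the upstream kubectl download path, and treating `s390x`/`ppc64le` as already matching is an assumption.

```go
package main

import "fmt"

// kubectlArch maps a `uname -m` style machine name onto the architecture
// label used in the upstream kubectl download path.
func kubectlArch(machine string) (string, error) {
	switch machine {
	case "x86_64":
		return "amd64", nil
	case "aarch64":
		return "arm64", nil
	case "s390x", "ppc64le":
		// Assumed to already match the kubectl labels.
		return machine, nil
	default:
		return "", fmt.Errorf("unsupported architecture: %q", machine)
	}
}

// kubectlURL builds the download URL for a given kubectl version.
func kubectlURL(version, machine string) (string, error) {
	arch, err := kubectlArch(machine)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("https://dl.k8s.io/release/%s/bin/linux/%s/kubectl", version, arch), nil
}

func main() {
	url, err := kubectlURL("v1.28.0", "aarch64")
	if err != nil {
		panic(err)
	}
	fmt.Println(url)
}
```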

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-09 10:51:52 +02:00
Fabiano Fidêncio
94f5a69346 Merge pull request #7862 from fidencio/topic/kata-deploy-use-alpine-as-base-image
kata-deploy: Switch to an alpine image
2023-09-09 09:02:13 +02:00
Yuan-Zhuo
470d065415 agent: optimize the code of systemd cgroup manager
1. Directly support CgroupManager::freeze through systemd API.
2. Avoid always passing unit_name by storing it into DBusClient.
3. Realize CgroupManager::destroy more accurately by killing the systemd unit rather than stopping it.
4. Ignore the "no such unit" error when destroying a systemd unit.
5. Update zbus version and corresponding interface file.

Acknowledgement: error handling for no such systemd unit error refers to

Fixes: #7080, #7142, #7143, #7166

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
2023-09-09 13:56:43 +08:00
GabyCT
fa818bfad1 Merge pull request #7867 from GabyCT/topic/optimizedimage
metrics: Use TensorFlow optimized image
2023-09-08 11:34:21 -06:00
Fabiano Fidêncio
bd24afcf73 gha: Manually rebase PR atop of the target branch before testing
We're changing what's been done as part of ac939c458c, as we've
noticed issues using `github.event.pull_request.merge_commit_sha`.

Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/

In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of the checkout action, and ensure
fetch-depth is set to 0.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 18:56:31 +02:00
GabyCT
dc7414f5c1 Merge pull request #7870 from dborquez/metrics_fio_fix_clean_env_order
metrics: fix FIO test initialization
2023-09-08 10:28:10 -06:00
Greg Kurz
72c510d057 runtime/virtiofsd: Drop all references to "--cache=none"
This syntax belongs to the legacy C virtiofsd implementation that
we don't support anymore since kata-containers 3.1.3 because
of other API breaking changes.

People have been warned to switch from "none" to "never" since
kata-containers 2.5.2. Let's officially do that.

The compat code that would convert "none" to "never" isn't
needed anymore. Just drop it.

Fixes #7864

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-08 17:57:30 +02:00
Beraldo Leal
ead724bec1 protocol: removing gogo.nullable feature
gogo.nullable is the main gogo.protobuf feature used here. Since we are
trying to remove gogo.protobuf, the first reasonable step seems to be
removing this feature. This is a core update, and it will change how the
structs are defined. I could spot only a few places using those structs,
based on make check/build.
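
The change in struct shape can be illustrated with hypothetical types (`Inner`, `OuterValue`, and `OuterPointer` are not Kata's generated messages). With `(gogoproto.nullable) = false`, a nested message field is generated as a plain value; without it, the field becomes a pointer that callers must nil-check.

```go
package main

import "fmt"

// Inner is a hypothetical nested message.
type Inner struct {
	Value string
}

// OuterValue is the shape generated with (gogoproto.nullable) = false:
// the nested message is an embedded value and is never nil.
type OuterValue struct {
	Inner Inner
}

// OuterPointer is the default protobuf shape: a pointer that may be nil.
type OuterPointer struct {
	Inner *Inner
}

// innerValue shows the nil check callers need once the field is a pointer.
func innerValue(o *OuterPointer) string {
	if o == nil || o.Inner == nil {
		return ""
	}
	return o.Inner.Value
}

func main() {
	var v OuterValue
	fmt.Printf("value field: %q\n", v.Inner.Value) // zero value, no nil check needed
	fmt.Printf("pointer field (unset): %q\n", innerValue(&OuterPointer{}))
	fmt.Printf("pointer field (set): %q\n", innerValue(&OuterPointer{Inner: &Inner{Value: "x"}}))
}
```

In practice, protoc-gen-go also generates nil-safe `GetX()` getters, which is one way to absorb this change at call sites.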

Fixes #7723.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
d8e4bb9859 protocol: remove unused PROTO_FILE env
There is no reference to PROTO_FILE and this is not working. Also, we are
not inside a Makefile, so it makes sense to adapt the usage to reflect the
script instead of a make command.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
5e1106a770 protocol: remove unused import_path
import_path is used as the default package when no input files specify
go_package. However, all the files we are currently building already
have a go_package definition, making this behavior both redundant and
error-prone.

Additionally, one of our files (types.pb.go) resides outside the grpc
directory, indicating that it's indeed ignored but also inconsistent.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
87accaaecb protocol: use workdir during build
Currently, the script searches for .proto files within $GOPATH/.
Consequently, modifications to a definition file in the current working
directory won't influence the output .pb.go if the directory is outside
of $GOPATH. For developers, it's more intuitive to alter the local
codebase than the version stored in $GOPATH.

With this modification, the generated .pb.go files will be relative to
the current working directory, removing the need to clone this project
under $GOPATH/src/github.com/kata-containers.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
711a7ed965 protocol: remove mapping definitions
The definitions are already specified in the .proto files using the
go_package option. Centralizing them in one location reduces the
potential for errors and simplifies the script.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
8db84c1bd2 protocol: force GOPATH to be set
Currently, if GOPATH is not set, errors will be raised since protoc uses
GOPATH to find packages.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Beraldo Leal
68156d77ac protocol: breaking lines to improve readability
Just a small change to improve the readability of modules before the
actual changes.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-09-08 11:49:01 -04:00
Fabiano Fidêncio
670a8e9c73 kata-deploy: Switch to an alpine image
This will make our image smaller, and still ensure it has multi-arch
support.

Fixes: #7861

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 17:39:51 +02:00
Fabiano Fidêncio
0b26a5d053 Merge pull request #7871 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-3
ci: k8s: Add clean-up-garm argument for gha-run.sh
2023-09-08 17:27:57 +02:00
Fabiano Fidêncio
9d74b7ccc9 k8s: ci: Skip "Pod quota" test with firecracker
The test is failing, and an issue has been opened to track it.
For now, let's skip it.

Issue:
https://github.com/kata-containers/kata-containers/issues/7873

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 15:51:46 +02:00
Fabiano Fidêncio
f6cd3930c5 ci: k8s: Remove useless skip statement from tests
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.

We're only touching the files here that were touched in the previous
commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:25:29 +02:00
Fabiano Fidêncio
3cc20b47a6 ci: k8s: Also check for "fc" (for firecracker)
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc` for the runtime class)
instead of `firecracker`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:25:24 +02:00
Fabiano Fidêncio
b5bad3cb0f ci: k8s: Add clean-up-garm argument for gha-run.sh
The tests are failing to finish as the argument is invalid.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 14:04:50 +02:00
Fabiano Fidêncio
05e2e7636e Merge pull request #7868 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-2
ci: k8s: Second round of fix-ups with the devmapper CI
2023-09-08 11:02:20 +02:00
Fabiano Fidêncio
aaec5a09f3 ci: k8s: devmapper tests should be using ubuntu 20.04
That's what we've been using as part of Jenkins, so let's ensure things
will work as they did before, and only after that consider upgrading the
base OS used for the tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
27fa7d828d ci: k8s: Add a kata-deploy-garm target
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
fa62a4c01b ci: k8s: Export KUBERNETES env var
So we have better control over which flavour of kubernetes kata-deploy
is expected to be targeting.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
8c9380a798 ci: k8s: Install bats on GARM runners
GARM runners do not come with the whole set of tools we need, or are
used to when it comes to the GHA runners, so we need to manually install
bats on those.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-08 10:09:04 +02:00
Fabiano Fidêncio
3de23034f8 ci: k8s: Wait some time after restarting k3s
Let's put a 1 minute sleep, just to make sure everything is back up
again.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:46:58 +02:00
David Esparza
adfea55b8f metrics: fix FIO test initialization
This PR changes the order of the FIO test initialization so
that it first cleans the environment and then checks that
the environment is indeed clean.

Fixes: #7869

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-07 15:41:59 -06:00
Fabiano Fidêncio
2df183fd99 ci: k8s: Append, instead of overwrite, the devmapper config
As we were using `tee` without the `-a` (or `--append`) option, the
containerd config would be overwritten, leading to a NotReady state of
the Node.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
369a8af8f7 ci: k8s: Decrease k3s sleep from 4 to 2 minutes
It should be plenty, and worked well in local tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
ada65b988a ci: k8s: Use vanilla kubectl with k3s
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```

The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.

Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
ad45ab5d33 ci: k8s: Ensure k3s is deployed with --write-kubeconfig-mode=644
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.

As --write-kubeconfig-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
Fabiano Fidêncio
028a97e0d5 ci: k8s: Use the proper command for sleep
`wait` waits for a job to complete, not a number of seconds.  Not sure
how I got that wrong in the first place, but it is what it is.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 23:12:55 +02:00
David Esparza
34f580901f Merge pull request #7824 from dborquez/fix_memory_usage_initialization
metrics: re-enable memory-usage initialization step
2023-09-07 14:24:27 -06:00
Gabriela Cervantes
3a427795ea metrics: Use TensorFlow optimized image
This PR replaces the ubuntu image with one which has TensorFlow optimized
for kata metrics.

Fixes #7866

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-07 15:38:51 +00:00
Chao Wu
cd8c217ee1 Merge pull request #6879 from openanolis/chao/update_upstream_upcall_feature
Dragonball: optimize the placement of dbs-upcall features
2023-09-07 18:07:53 +08:00
Fabiano Fidêncio
dfa1cce916 Merge pull request #7860 from fidencio/topic/ci-add-k8s-devmapper-tests-follow-up-1
ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
2023-09-07 11:48:30 +02:00
Fabiano Fidêncio
8d99972a8a ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
integrations -> integration
integrtion -> integration

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-07 11:31:30 +02:00
Fabiano Fidêncio
0483d3d16d Merge pull request #7841 from fidencio/topic/ci-add-k8s-devmapper-tests
ci: k8s: Add k8s devmapper tests (part 0)
2023-09-07 10:53:09 +02:00
Jeremi Piotrowski
f6cc01d77c Merge pull request #7833 from jepio/kata-static-fix-ownership
kata-deploy: Create kata-static.tar with correct ownership
2023-09-07 10:16:23 +02:00
Peng Tao
435e890cd9 Merge pull request #7703 from bergwolf/github/nerdctl-fc
runtime: run prestart hooks before starting VM for FC
2023-09-07 10:55:31 +08:00
Chao Wu
deed1b927d Dragonball: optimize the placement of dbs-upcall features
Currently, the dbs-upcall features have 2 problems that need to be
fixed:

There are redundant dbs-upcall features that need to be removed.
Some places should be controlled by dbs-upcall but are not.

This commit will fix those two problems.

fixes: #6878

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-07 10:27:29 +08:00
Fabiano Fidêncio
0e8bd50cbb ci: k8s: Add k8s devmapper tests (part 0)
Let's enable the devmapper kubernetes tests to match exactly what's been
tested as part of the Jenkins CI.

Fixes: #6542

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 23:08:38 +02:00
Fabiano Fidêncio
b28b54df04 ci: k8s: Add a function to configure devmapper for containerd
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.

This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers.  OTOH, this is exactly what we've always been doing as
part of the tests.

We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.

It's important to note that the devmapper setup doesn't take a
bare-metal (BM) machine into consideration, and is not suitable for
that.  We're really only targeting GHA runners which will be thrown away
after the run is over.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-06 23:08:17 +02:00
Fabiano Fidêncio
54f7117212 ci: k8s: Add a function to deploy k3s
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.

As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, which has proven to be a really good way of letting
things rot.

With this in mind, I'm biting the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.

It's important to note that the k3s setup doesn't take a bare-metal (BM)
machine into consideration, and is not suitable for that.  We're
really only targeting GHA runners which will be thrown away after the
run is over.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 23:07:41 +02:00
David Esparza
cf258090aa Merge pull request #7843 from GabyCT/topic/ffiolimit
metrics: Add write 95 percentile FIO value
2023-09-06 14:52:00 -06:00
Fabiano Fidêncio
c5e1e7ddc3 Merge pull request #7854 from fidencio/topic/runtime-allow-virtio_fs_extra_args-annotation
runtime: Allow virtio_fs_extra_args annotation
2023-09-06 19:20:40 +02:00
Greg Kurz
81536f21af runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated
with the rust implementation.
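
A minimal Go sketch of building a command line for the Rust virtiofsd, with extended attributes requested via its dedicated `--xattr` flag. `virtiofsdArgs` and the socket and shared-directory paths are hypothetical. `--socket-path`, `--shared-dir`, `--cache`, and `--xattr` are options of the Rust implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// virtiofsdArgs is a hypothetical helper that builds a command line for the
// Rust virtiofsd. Extended attributes are enabled with the dedicated --xattr
// flag rather than the legacy C daemon's "-o xattr" option string.
func virtiofsdArgs(socketPath, sharedDir, cache string, xattr bool) []string {
	args := []string{
		"--socket-path=" + socketPath,
		"--shared-dir=" + sharedDir,
		"--cache=" + cache,
	}
	if xattr {
		args = append(args, "--xattr")
	}
	return args
}

func main() {
	args := virtiofsdArgs("/hypothetical/vhost-fs.sock", "/hypothetical/shared", "never", true)
	fmt.Println("virtiofsd " + strings.Join(args, " "))
}
```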

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-09-06 17:50:35 +02:00
Fabiano Fidêncio
b1dd09a4d3 runtime: Allow virtio_fs_extra_args annotation
Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.

With this in mind, let's allow users to pass whatever they need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.

Fixes: #7853

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-06 17:11:16 +02:00
Hyounggyu Choi
d27fe18167 Merge pull request #7849 from BbolroC/hot-fix-dockerbuild
packaging: do not install docker-compose-plugin for s390x|ppc64le
2023-09-06 13:13:25 +02:00
Hyounggyu Choi
2efda20c77 packaging: do not install docker-compose-plugin for s390x|ppc64le
This PR skips installing docker-compose-plugin while building a `build-kata-deploy` image for s390x|ppc64le.
It is a temporary solution to fix current CI failures for s390x regarding `hash sum mismatch`.

Fixes: #7848
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-09-06 11:12:03 +02:00
Zhongtao Hu
aa85e0b3ec Merge pull request #7714 from justxuewei/volumes-cleanup
runtime-rs: Fix volumes and rootfs cleanup issues
2023-09-06 10:13:55 +08:00
Gabriela Cervantes
438fbf9669 metrics: Add write 95 percentile for FIO for qemu
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 22:50:31 +00:00
Gabriela Cervantes
024b4d2ffe metrics: Add write 95 percentile FIO value
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.

Fixes #7842

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 21:00:05 +00:00
GabyCT
3e3a91fd2c Merge pull request #7577 from GabyCT/topic/enableiperfm
metrics: Enable iperf benchmark on gha for kata metrics
2023-09-05 14:53:47 -06:00
Gabriela Cervantes
e98e5cdea2 metrics: Add checkmetrics to gha run script
This PR adds the checkmetrics to gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 17:05:03 +00:00
Gabriela Cervantes
c1edfe5511 metrics: Add checkmetrics value for qemu for iperf
This PR adds the checkmetrics value for qemu for iperf benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
6a79ecedf9 metrics: Add jitter value for clh
This PR adds jitter value for clh for iperf metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
f609a9a754 metrics: Add test selector to iperf metrics
This PR adds test selector to iperf metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Gabriela Cervantes
5b8db30422 metrics: Enable iperf benchmark on gha for kata metrics
This PR enables the iperf benchmark to run on the gha for kata metrics.

Fixes #7575

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-09-05 16:04:52 +00:00
Jeremi Piotrowski
cf46b056fd Merge pull request #7839 from openanolis/chao/switch_to_azure
CI: switch static-checks-dragonball CI machines to Azure
2023-09-05 10:59:02 +02:00
Chao Wu
60f733d301 CI: switch static-checks-dragonball CI machines to Azure
Previously, static-checks-dragonball is using machines from Alibaba
Cloud to run all the CI jobs.

Currently, we are going through an internal process to apply for the new
machines for Dragonball CI. Before the internal process is over, we will
temporarily use Azure VM to run static-checks-dragonball jobs.

fixes: #7838

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-05 15:19:07 +08:00
alex.lyn
7870b33a2d runtime-rs: bring hybridVsock devices in manager.
Currently, virtio_vsock devices are still outside of the device
manager. This causes some management issues, such as the
inability to unify PCI address management.

For now, this is only done for hybrid vsock.

Fixes: #7655

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-09-05 08:46:56 +08:00
Jeremi Piotrowski
18c94ebbe3 kata-deploy: Create kata-static.tar with correct ownership
Pass --owner and --group to the tar invocation to prevent the GitHub runner user
from leaking into release artifacts.

Fixes: #7832
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-09-04 17:24:00 +02:00
Fabiano Fidêncio
b663ec21ac Merge pull request #7803 from GabyCT/topic/readmereportdoc
metrics: Add README for kata metrics report
2023-09-03 21:57:13 +02:00
Fabiano Fidêncio
e490b0bc76 Merge pull request #7808 from ManaSugi/fix/remove-manual-chcon
osbuilder: Remove chcon operation for guest SELinux
2023-09-03 21:55:02 +02:00
Fabiano Fidêncio
27dab249a0 Merge pull request #7800 from jodh-intel/kata-sys-util-update-tdx-protection-checks
kata-sys-util: protection: Update TDX checks
2023-09-02 14:47:51 +02:00
Jiang Liu
d5729e818c Merge pull request #7819 from jiangliu/storage-cleanup
Improve the way to clean up storage devices for sandbox
2023-09-02 17:02:51 +08:00
Jiang Liu
57e7bf14a6 agent: refine StorageDeviceGeneric::cleanup()
Refine StorageDeviceGeneric::cleanup() to improve safety.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:22:21 +08:00
Jiang Liu
53edb19374 agent: implement StorageDeviceGeneric::cleanup()
Refactor cleanup_sandbox_storage as StorageDeviceGeneric::cleanup().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:00:26 +08:00
Jiang Liu
0c63453e28 types: make StorageDevice::cleanup() return possible error code
Make StorageDevice::cleanup() return possible error code.

Fixes: #7818

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:27:06 +08:00
Jiang Liu
3a3d77b3b5 agent: move StorageDeviceGeneric from kata-types into agent
Move StorageDeviceGeneric from kata-types into agent, so we can
refactor code later.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:12:17 +08:00
Jiang Liu
d848126b61 Merge pull request #7821 from jiangliu/storage-leak
agent: avoid possible leakage of storage device
2023-09-02 12:40:40 +08:00
Fabiano Fidêncio
4f92e6df90 Merge pull request #7683 from microsoft/danmihai1/policy-tests
tests: add policy to existing tests
2023-09-01 23:52:15 +02:00
David Esparza
b151cfd140 metrics: re-enable memory-usage initialization step
This PR re-enables the initialization step disabled
on 538c965c2b.

Fixes: #7804

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-09-01 14:29:34 -06:00
Fabiano Fidêncio
f3e1a6a94f osbuilder: alpine: Change mirror
As we're hitting a lot of:
```
ERROR: https://dl-5.alpinelinux.org/alpine/v3.18/main: operation timed
out
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:42 +00:00
Fabiano Fidêncio
ac612aef5e osbuilder: alpine: Match the version on versions.yaml
We switched to 3.18 as part of
82cd14ba39.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:33 +00:00
Jiang Liu
9cd706d1c9 agent: avoid possible leakage of storage device
When a storage device is used by more than one container, the second
and subsequent instances leak a reference count on the storage device,
and thus leak the storage device itself. The reason is that
add_storages() increases the reference count of an existing storage
device, but forgets to add the device to the `mount_list` array, thus
leaking the reference count.
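
The leak and its fix can be modelled with a small, hypothetical Go registry (the agent is written in Rust, and `storageRegistry`, `addStorages`, and `release` are simplified stand-ins). The sketch assumes the returned mount list is what is later used to drop references. If a shared device is not recorded in that list, its count can never get back to zero.

```go
package main

import "fmt"

// storageRegistry is a simplified model of per-sandbox storage tracking:
// mount point -> number of containers using it.
type storageRegistry struct {
	refs map[string]int
}

// addStorages takes a reference on each mount point and returns the list of
// references this container now holds (its mount list).
func (r *storageRegistry) addStorages(mounts []string) []string {
	var mountList []string
	for _, m := range mounts {
		if _, ok := r.refs[m]; ok {
			// Existing device: bump the count AND record it, otherwise
			// this reference can never be released.
			r.refs[m]++
			mountList = append(mountList, m)
			continue
		}
		// New device: a real agent would mount it here before tracking it.
		r.refs[m] = 1
		mountList = append(mountList, m)
	}
	return mountList
}

// release drops one reference per entry and forgets unused devices.
func (r *storageRegistry) release(mountList []string) {
	for _, m := range mountList {
		r.refs[m]--
		if r.refs[m] == 0 {
			// A real agent would unmount and clean up the device here.
			delete(r.refs, m)
		}
	}
}

func main() {
	r := &storageRegistry{refs: map[string]int{}}
	shared := []string{"/hypothetical/shared-volume"}
	first := r.addStorages(shared)
	second := r.addStorages(shared)
	fmt.Println("refs while both containers run:", r.refs[shared[0]])
	r.release(first)
	r.release(second)
	fmt.Println("devices left after both exit:", len(r.refs))
}
```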

Fixes: #7820

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-01 22:52:42 +08:00
Dan Mihai
bf21411e90 tests: add policy to k8s tests
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX

Also, add an example of policy rejecting ExecProcessRequest.

Fixes: #7667

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Dan Mihai
d0e0610679 runtime: config: use the SEV initrd for SNP
Thanks Unmesh Deodhar!

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
67fed26f18 runtime: Use TDX image within the qemu-tdx config
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
f65ffb23da Merge pull request #7814 from fidencio/topic/gha-rebase-prs-atop-of-main-for-the-tests
gha: Rebase PR atop of the target branch before testing
2023-09-01 16:26:32 +02:00
Fabiano Fidêncio
ef70aeb6b8 Merge pull request #7817 from fidencio/topic/update-alpine-to-its-latest-release
versions: Update alpine to its 3.18 version
2023-09-01 14:51:58 +02:00
Fabiano Fidêncio
ac939c458c gha: Rebase atop of the target branch
There are two scenarios we care about here: jobs triggered by a
`pull_request` event and jobs triggered by a `pull_request_target` event.

`pull_request` event:
When using the checkout action, it'll already provide a "rebased atop of
main" repo for us, so nothing else is needed, and that's basically what
we already have as part of the jobs in our CI.

`pull_request_target` event:
This one is a little bit tricky, as the checkout action, unless passed
a specific repo, gives us the PR checked out rebased atop of the HEAD of
the PR branch.  Jeremi Piotrowski nicely pointed out that we could use
github.event.pull_request.merge_commit_sha instead, which is the result
of merging the PR's branch into the official repo's target branch.

Now, the only case where the contributor's rebase would still be needed
is when the action itself has been changed.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 11:23:31 +02:00
Jeremi Piotrowski
bde06758b1 Merge pull request #7761 from jepio/iocopy-fix-race
runtime: Fix data race in ioCopy
2023-09-01 09:30:54 +02:00
Fabiano Fidêncio
82cd14ba39 versions: Update alpine to its 3.18 version
3.15 will reach end of life 2 months from now.

Fixes: #7816

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 23:02:54 +02:00
GabyCT
d75c7b5f9c Merge pull request #7813 from GabyCT/topic/genreport
metrics: Add grabdata script for metrics report
2023-08-31 13:33:38 -06:00
Gabriela Cervantes
6668825752 metrics: Add grabdata script for metrics report
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.

Fixes #7812

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-31 16:17:29 +00:00
James O. D. Hunt
c290eaed8c kata-sys-util: protection: Update TDX checks
Update the protection checking code to detect newer versions of Intel
TDX (whose userland interface has now stabilised).

> **Note:** we don't need to retain the existing behaviour since:
>
> - We haven't yet landed the TDX feature (#6448).
> - Systems wishing to use TDX will need to use the latest available
>   system components (such as firmware and host kernel).

Also added an explicit TDX unit test.

Fixes: #7384.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-08-31 16:15:15 +01:00
Fabiano Fidêncio
d7a996c686 gha: Update to checkout@v3 action
At this point we should always be using the latest checkout action.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 16:02:31 +02:00
Jeremi Piotrowski
d7612440b8 Merge pull request #7789 from beraldoleal/tests/amd
Fixes tests on AMD machines
2023-08-31 11:23:51 +02:00
Jeremi Piotrowski
c2ba29c15b runtime: Fix data race in ioCopy
IoCopy is a tricky function (I don't claim to fully understand its contract),
but here is what I see: The goroutine that runs it spawns 3 goroutines - one
for each stream to handle (stdin/stdout/stderr). The goroutine then waits for
the stream goroutines to exit. The idea is that when the process exits and is
closed, the stdout goroutine will be unblocked and close stdin - this should
unblock the stdin goroutine. The stderr goroutine will exit at the same time as
the stdout goroutine. The iocopy routine then closes all tty.io streams.

The problem is that the stdout goroutine decrements the WaitGroup before
closing the stdin stream, which causes the iocopy goroutine to race to close
the streams. Move the wg.Done() of the stdout routine past the close so that
*this* race becomes impossible. I can't guarantee that this doesn't affect some
unspecified behavior.

Fixes: #5031
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-31 10:17:38 +02:00
Manabu Sugimoto
211de08d9e osbuilder: Remove chcon operation for guest SELinux
Remove the `chcon` operation which adds `container_runtime_exec_t` label to
the `kata-agent` binary because the container-selinux package including
the 39f83cc74d
commit has been released officially.
Ref. https://centos.pkgs.org/9-stream/centos-appstream-x86_64/container-selinux-2.221.0-1.el9.noarch.rpm.html

The container-selinux package is installed in a guest rootfs when we create it with `SELinux = yes`,
and `restorecon` sets `container_runtime_exec_t` to the `kata-agent`.

Fixes: #7807

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-31 16:44:32 +09:00
GabyCT
b467f2ef68 Merge pull request #7772 from GabyCT/topic/fiolimit
metrics: Enable FIO limits for kata metrics
2023-08-30 14:49:04 -06:00
Gabriela Cervantes
9f21fa9b39 metrics: Add report generator link to general documentation
This PR adds the report generator link to general documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:55:14 +00:00
Gabriela Cervantes
c0ed5ea0ad metrics: Add README for kata metrics report
This PR adds the README for kata metrics report.

Fixes #7802

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:36:08 +00:00
Fabiano Fidêncio
aa2b51a831 Merge pull request #7783 from GabyCT/topic/makereport
metrics: Add metrics report script
2023-08-30 17:11:39 +02:00
Gabriela Cervantes
a7b59a5bf9 metrics: Add limit for 90 percentile for qemu value
This PR adds the 90th-percentile limit for the QEMU value of the
FIO kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
99db6568e9 metrics: Add limit for write 90 percentile value for clh
This PR adds the write 90th-percentile limit for clh to the
FIO metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
6e06392c55 metrics: Enable FIO limits for kata metrics
This PR enables the FIO limits for kata metrics.

Fixes #7771

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
David Esparza
924d06a7f5 Merge pull request #7787 from GabyCT/topic/fixmemoryinsidelimit
metrics: Fix memory inside limits for kata metrics
2023-08-30 07:45:17 -06:00
Peng Tao
2e4c874726 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
When running the FC hypervisor, the VM has not been started yet when the
prestart hooks are executed, so GetHypervisorPid fails. We should ignore that
error and go ahead and run the hooks.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 03:06:11 +00:00
Peng Tao
21204caf20 runtime: fail early when starting docker container with FC
FC does not support network device hotplug. Let's add a check to fail
early when starting containers created by docker.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Peng Tao
32fd013716 runtime: run prestart hooks before starting VM for FC
Add a new hypervisor capability that tells whether it supports device hotplug.
If it does not, we should run the prestart hooks before starting new VMs, because
nerdctl uses the prestart hooks to set up the netns. To make nerdctl + FC
work, we need to run the prestart hooks before starting new VMs.

Fixes: #6384
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-30 02:52:01 +00:00
Beraldo Leal
00e7ffd988 tests: check vmx only on Intel machines
When running on AMD machines, these tests fail because there is no
vmx flag. Following other tests that check cpuType, let's adapt
them to check for vmx only on Intel machines.

Fixes #7788.
Related #5066

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 20:04:31 -04:00
Gabriela Cervantes
c8dd3c0737 metrics: Fix memory footprint qemu limit
This PR fixes the memory footprint qemu limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 22:51:21 +00:00
Gabriela Cervantes
8877ec62fb metrics: Fix memory inside limits for kata metrics
This PR fixes the memory-inside-container limit for clh in the kata metrics,
following recent changes to the script that affected the performance
measurement.

Fixes #7786

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 21:38:18 +00:00
Beraldo Leal
80146f2078 tests: Fixes cpuType check on AMD machines
cpuType is not initialized yet, so it gets 0 (Intel) by default, which
makes the check fail on AMD machines.
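
The pitfall is that Go zero-initializes variables. When the zero value of a CPU-vendor enum means "Intel", an uninitialized variable reads as Intel. Below is a minimal sketch under that assumption. `cpuVendor`, `cpuTypeIntel`, `cpuTypeAMD`, `detectVendor` and `needsVMX` are hypothetical names, and the vendor string is passed in rather than read from /proc/cpuinfo so the sketch stays testable.

```go
package main

import "fmt"

// cpuVendor is a hypothetical enum whose zero value is cpuTypeIntel.
type cpuVendor int

const (
	cpuTypeIntel cpuVendor = iota // zero value
	cpuTypeAMD
)

// cpuType stays at its zero value (Intel) until explicitly initialized.
var cpuType cpuVendor

// detectVendor maps a CPUID vendor string to cpuVendor.
func detectVendor(vendorID string) cpuVendor {
	if vendorID == "AuthenticAMD" {
		return cpuTypeAMD
	}
	return cpuTypeIntel
}

// needsVMX reports whether a test should expect the vmx flag. It is only
// correct once cpuType has been initialized.
func needsVMX() bool { return cpuType == cpuTypeIntel }

func main() {
	fmt.Println(needsVMX()) // true, even on AMD: cpuType still holds its zero value
	cpuType = detectVendor("AuthenticAMD")
	fmt.Println(needsVMX()) // false
}
```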

Fixes #7785

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 17:04:07 -04:00
Gabriela Cervantes
7e364716dd metrics: Add test setup details to metrics report
This PR adds test setup details to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:56:53 +00:00
Gabriela Cervantes
17dc1b9760 metrics: Add boot lifecycle times to metrics report
This PR adds the boot lifecycle times to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:55:44 +00:00
Gabriela Cervantes
3b0d6538f2 metrics: Add memory inside container to metrics report
This PR adds memory inside container to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:53:17 +00:00
Gabriela Cervantes
79fbb9d243 metrics: Add scaling system footprint in metrics report
This PR adds scaling system footprint in metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:51:27 +00:00
Gabriela Cervantes
8e6d4e6f3d metrics: Add metrics reportgen
This PR adds metrics reportgen for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:45:36 +00:00
Gabriela Cervantes
139ffd4f75 metrics: Add report file titles
This PR adds report file titles for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:43:06 +00:00
GabyCT
8f2dae7b53 Merge pull request #7775 from dborquez/fix_memory_usage_parsing_results
metrics: fix parsing issue on memory-usage test
2023-08-29 11:26:13 -06:00
Gabriela Cervantes
878d1a2e7d metrics: Generate PNGs alongside the PDF report
This PR generates the PNGs for the kata metrics PDF report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:50:32 +00:00
Gabriela Cervantes
fce2487971 metrics: Add metrics report R files
This PR adds the metrics report R files.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:45:22 +00:00
Gabriela Cervantes
08812074d1 metrics: Add report dockerfile
This PR adds the report dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:28:32 +00:00
Gabriela Cervantes
69781fc027 metrics: Add metrics report script
This PR adds metrics report script for kata metrics.

Fixes #7782

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:25:14 +00:00
Chao Wu
e4fb20c74a Merge pull request #7585 from lifupan/main
dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
2023-08-29 23:39:21 +08:00
Fabiano Fidêncio
50e51bcafe Merge pull request #7185 from UnmeshDeodhar/add-cc-sev-test
tests: Add confidential test
2023-08-29 15:32:25 +02:00
Fabiano Fidêncio
e286e842c1 tests: Expand confidential test to support TDX
Let's expand the confidential test to also support TDX.

The main difference in the test, though, is that we don't grep for
a string in the `dmesg` output, but instead rely on `cpuid` to detect
a TDX guest.

Fixes: #7184

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-29 14:10:47 +02:00
Unmesh Deodhar
e31f099be1 tests: Expand confidential test to support SNP
Let's expand the confidential test to also support SNP.

Fixes: #7184

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-08-29 14:10:47 +02:00
Unmesh Deodhar
c3b9d4945e tests: Add confidential test for SEV
Add a test case for launching an unencrypted confidential
container, verifying that we are running inside a TEE.

Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.

Fixes: #7184

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-29 14:10:34 +02:00
David Esparza
538c965c2b metrics: fix parsing issue on memory-usage test
This PR fixes an issue in the results-parsing stage
by collecting only the n results from the n running
containers and discarding irrelevant data.

Fixes: #7774

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-28 23:39:46 -06:00
Fabiano Fidêncio
708b0a3052 Merge pull request #7768 from fidencio/topic/update-tdx-to-the-6.2-kernel-based-stack
tdx: Update the components needed for using the 6.2 kernel stack
2023-08-28 19:27:15 +02:00
Fabiano Fidêncio
3818bf3311 local-build: Remove $HOME/.docker/buildx/activity/default
The file can be removed between builds without causing any issue, and
leaving it around has been causing us some headache due to:
```
ERROR: open /home/runner/.docker/buildx/activity/default: permission denied
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Fabiano Fidêncio
d1b54ede29 qemu: tdx: Workaround SMP issue with TDX 1.5
`...,sockets=1,cores=numvcpus,threads=1,...` must be used.
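
Below is a minimal Go sketch of the resulting `-smp` value, with every vCPU placed as a core on a single socket and one thread per core. `smpArg` is a hypothetical helper, not the govmm API, and the sketch assumes no other `-smp` options such as `maxcpus` are needed.

```go
package main

import "fmt"

// smpArg builds a QEMU -smp value with all vCPUs as cores on one socket
// and a single thread per core.
func smpArg(numVCPUs uint32) string {
	return fmt.Sprintf("%d,sockets=1,cores=%d,threads=1", numVCPUs, numVCPUs)
}

func main() {
	fmt.Println("-smp", smpArg(4)) // -smp 4,sockets=1,cores=4,threads=1
}
```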

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Archana Shinde
1e34220c41 qemu: tdx: Adapt to the TDX 1.5 stack
QEMU for TDX 1.5 makes use of private memory map/unmap.
Make changes to govmm to support this. Support for a private backing fd
for memory is added as a knob in the QEMU config.

Userspace map/unmap operations are done via fallocate() on the
backing store fd.
Reference:
https://lore.kernel.org/linux-mm/20220519153713.819591-1-chao.p.peng@linux.intel.com/

Fixes: #7770

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:41:36 +02:00
Fabiano Fidêncio
8115a0522d versions: tdx: Update Kernel to 6.2 + TDX
This is the version that's been used and tested inside Intel, and it
matches https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:34 +02:00
Fabiano Fidêncio
ec18180f34 versions: tdx: Update TDVF to the "edk2-stable202302"
This is the version that's been used and tested inside Intel, and it
matches https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:34 +02:00
Fabiano Fidêncio
9803b24286 versions: tdx: Update QEMU to v7.2 + TDX v1.10
This is the version that's been used and tested inside Intel, and it
matches https://github.com/intel/tdx-tools/releases/tag/2023ww15.

Fixes: #7770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-28 13:11:27 +02:00
Fabiano Fidêncio
02a08c956b Merge pull request #7754 from microsoft/danmihai1/pod-quota-deployment
tests: delete k8s deployment at the test's end
2023-08-27 17:52:00 +02:00
Fabiano Fidêncio
98037ced52 Merge pull request #7755 from microsoft/danmihai1/unique-test-name
tests: use unique test name
2023-08-27 17:27:40 +02:00
Zhongtao Hu
f0440a9cfe Merge pull request #7742 from frezcirno/fix-log-forwarder-loop
runtime-rs: check peer close in log_forwarder
2023-08-26 10:44:09 +08:00
Fabiano Fidêncio
16a610d788 Merge pull request #7758 from fidencio/topic/gha-avoid-fail-fast-till-everything-is-ultra-stable
gha: Avoid "fail-fast" in tests that are known to be flaky
2023-08-25 16:49:26 +02:00
Jiang Liu
91db888d83 Merge pull request #7602 from jiangliu/agent-storage
Refine storage device management for kata-agent
2023-08-25 22:20:18 +08:00
Zixuan Tan
dffc16e5b3 runtime-rs: check peer close in log_forwarder
The log_forwarder task does not check whether the peer has closed. This
causes a pointless loop between “kata vm exit”, when the peer closes, and
“ShutdownContainer RPC received”, which aborts the log forwarder.

This patch fixes the problem.

Fixes: #7741

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2023-08-25 19:00:07 +08:00
Jiang Liu
aaa5ab1264 agent: simplify storage device by removing StorageDeviceObject
Simplify storage device implementation by removing StorageDeviceObject.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-25 17:23:16 +08:00
Fabiano Fidêncio
fb49d5d7ce gha: Avoid "fail-fast" in tests that are known to be flaky
Otherwise we'll have to re-run all the tests due to flaky behaviour in
one of them.

Fixes: #7757

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-25 10:00:17 +02:00
Dan Mihai
183f51d6f6 tests: use unique test name
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.

Fixes: #7753

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:41:06 +00:00
Dan Mihai
6a974679f2 tests: delete k8s deployment at the test's end
At the end of k8s-kill-all-process-in-container.bats, delete the
deployment it created.

Fixes: #7752

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-25 03:34:37 +00:00
David Esparza
686eb3878b Merge pull request #7751 from GabyCT/topic/unusednhwc
metrics: Remove unused variable in tensorflow nhwc script
2023-08-24 18:34:06 -06:00
Fabiano Fidêncio
f1d8e1f513 Merge pull request #7747 from fidencio/topic/kata-deploy-dont-try-to-remove-opt-kata
kata-deploy: Don't try to remove /opt/kata
2023-08-24 18:56:52 +02:00
Gabriela Cervantes
32a778b6da metrics: Remove unused variable in tensorflow nhwc script
This PR removes unused variable in tensorflow nhwc script.

Fixes #7750

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-24 15:54:27 +00:00
David Esparza
875a85ee14 Merge pull request #7736 from GabyCT/topic/tensorflowfp32
metrics: Add TensorFlow ResNet50 FP32 benchmark
2023-08-24 08:56:24 -06:00
Fabiano Fidêncio
d8f3ce6497 kata-deploy: Don't try to remove /opt/kata
The directory is a host path mount and cannot be removed from within the
container.  What we actually want to remove is whatever is inside that
directory.

This may raise errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```

Fixes: #7746

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-24 13:57:36 +02:00
Jeremi Piotrowski
71c90b994a Merge pull request #7745 from jepio/vfio-part-0
gha: vfio: Run on Ubuntu 23.04 runner
2023-08-24 12:15:19 +02:00
Greg Kurz
9991772b26 Merge pull request #7718 from littlejawa/fix_filemode_when_zero
kata-agent: use default filemode for block device when it is set to 0
2023-08-24 11:40:28 +02:00
Jeremi Piotrowski
936e8091a7 gha: vfio: Run on Ubuntu 23.04 runner
The vfio test requires nested-nested virtualization:

L0 Azure host
-> L1 Ubuntu VM
  -> L2 Fedora VM
    -> L3 Kata

This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-24 10:10:02 +02:00
Jiang Liu
0e7248264d agent: move storage device related code into dedicated files
Move storage device related code into dedicated files.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:48:51 +08:00
Xuewei Niu
268e846558 runtime-rs: Fix volumes and rootfs cleanup issues
There are several paths by which a container exits:

- Non-detach mode: a `Wait` request is sent by containerd, and
  `wait_process()` is eventually called.
- Detach mode: no `Wait` request is sent, so `wait_process()` won’t be
  called.
    - Killed by ctr: for example, a container runs `tail -f /dev/null` and
      is killed by `sudo ctr t kill -a -s SIGTERM <CID>`. A Kill request is
      sent, then `kill_process()` is called. When the user executes
      `sudo ctr c rm <CID>`, a `Delete` request is sent, then
      `delete_process()` is called.
    - Exited on its own: for example, a container runs `sleep 1s`. The
      container’s state goes to `Stopped` after 1 second. The user then
      executes the delete command as above.

Where should the container cleanup happen?

- `wait_process()`: No, because it won’t be called in detach mode.
- `delete_process()`: No, because it depends on when the user executes the
  delete command.
- `run_io_wait()`: Yes. A container is considered exited once its IO has
  ended, and this is always called once a container is launched.

Fixes: #7713

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-24 13:23:47 +08:00
Jiang Liu
8f49ee33b2 agent: refine storage related code a bit
Refine storage related code by:
- remove the STORAGE_HANDLER_LIST
- define type alias
- move code near to its caller

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:10 +08:00
Jiang Liu
60ca12ccb0 agent: switch to new storage subsystem
Switch to new storage subsystem to create a StorageDevice for each
storage object.

Fixes: #7614

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:09:09 +08:00
Jiang Liu
fcbda0b419 kata-types: introduce StorageDevice and StorageHandlerManager
Introduce StorageDevice and StorageHandlerManager, which will be used
to refine storage device management for kata-agent.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 13:08:55 +08:00
Jiang Liu
b03b1f6134 agent: simplify the way to manage storage object
Simplify the way to manage storage objects, and introduce
StorageStateCommon structures for coming extensions.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:58:24 +08:00
Jiang Liu
8392c71bf2 sys-util: support more mount flags in parse_mount_options()
Support more mount flags in parse_mount_options().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:39 +08:00
Jiang Liu
c00d8f3d48 agent: use create_mount_destination() from kata-sys-util
Use create_mount_destination() from kata-sys-util crate to reduce
redundant code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:38 +08:00
Jiang Liu
5e867f0538 types: add more mount related constants
Add more mount related constants.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:36 +08:00
Jiang Liu
880e6c9a76 agent: use function from kata-sys-utils to reduce code
Use function get_linux_mount_info() from kata-sys-util crate to share
common code.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-24 12:17:34 +08:00
QuanweiZhou
a6921dd837 Merge pull request #7698 from jiangliu/virtual-volume
kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
2023-08-24 11:50:39 +08:00
Fabiano Fidêncio
7705c5962e Merge pull request #7728 from ManaSugi/fix/typo-test-toml
libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
2023-08-23 23:55:41 +02:00
GabyCT
c1712e1930 Merge pull request #7737 from jepio/fix-local-build
local-build: Remove GID before creating group
2023-08-23 12:26:39 -06:00
Jeremi Piotrowski
3b881fbc0e local-build: Remove GID before creating group
Installing docker now creates a group with GID 999, which happens to be the GID
we need to get docker-in-docker to work. Remove that group first, as we don't
need it.

Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-23 18:58:38 +02:00
David Esparza
ebce5d25a9 Merge pull request #7734 from fidencio/topic/kata-deploy-fix-removal
kata-deploy: Avoid failing on content removal
2023-08-23 10:29:57 -06:00
Gabriela Cervantes
959ca49447 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
This PR adds the TensorFlow ResNet50 fp32 Dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:24:58 +00:00
Gabriela Cervantes
4b7d72c4a8 metrics: Add TensorFlow ResNet50 FP32 benchmark
This PR adds TensorFlow ResNet50 FP32 benchmark for kata metrics.

Fixes #7735

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-23 16:21:09 +00:00
Fabiano Fidêncio
e7e4cc2182 Merge pull request #7716 from bergwolf/github/image-initrd-assets
runtime: fix image and initrd assets handling
2023-08-23 18:02:15 +02:00
Fabiano Fidêncio
5cba38c175 kata-deploy: Avoid failing on content removal
We can simply use `rm -f` everywhere and avoid having the container
return an error when some content is already gone.

Fixes: #7733

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-23 16:49:26 +02:00
Peng Tao
18d42da21e runtime/fc: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Make sure annotations are preferred over config options in image and initrd
path handling.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
9fda7059a5 runtime/clh: fix image/initrd annotation handling
We should make sure annotations are preferred over
config options in image and initrd path handling.

Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:28 +00:00
Peng Tao
1a0092d631 runtime/qemu: fix image/initrd annotation handling
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.

Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.
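
Below is a minimal Go sketch of the precedence rule: all annotations are checked before any config option. `bootAssets` and `pickImageOrInitrd` are hypothetical names, not the signature of the real ImageOrInitrdAssetPath helper. The sketch also does not model the case where the config file sets both an image and an initrd.

```go
package main

import "fmt"

// bootAssets is a hypothetical summary of where image and initrd paths can
// come from: pod annotations and the configuration file.
type bootAssets struct {
	annotationImage, annotationInitrd string
	configImage, configInitrd         string
}

// pickImageOrInitrd returns at most one non-empty path. Annotations are
// checked first, so an image annotation is not overridden by an initrd
// set in the config file.
func pickImageOrInitrd(a bootAssets) (image, initrd string) {
	switch {
	case a.annotationImage != "":
		return a.annotationImage, ""
	case a.annotationInitrd != "":
		return "", a.annotationInitrd
	case a.configImage != "":
		return a.configImage, ""
	default:
		return "", a.configInitrd
	}
}

func main() {
	image, initrd := pickImageOrInitrd(bootAssets{
		annotationImage: "/custom.img",
		configInitrd:    "/config-initrd.img",
	})
	fmt.Printf("image=%q initrd=%q\n", image, initrd) // image="/custom.img" initrd=""
}
```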

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-08-23 03:47:27 +00:00
Manabu Sugimoto
22d8f335d6 libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
Change `pdisable_guest_seccomp` to `disable_guest_seccomp`

Fixes: #7727

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-23 12:08:18 +09:00
GabyCT
b8990c0490 Merge pull request #7722 from GabyCT/topic/adddiskreadme
metrics: Add disk link to README
2023-08-22 12:29:54 -06:00
GabyCT
514d3d42b8 Merge pull request #7712 from GabyCT/topic/fixfiopath
metrics: Fix FIO path
2023-08-22 12:28:28 -06:00
Gabriela Cervantes
8afd158cef metrics: Add disk link to README
This PR adds disk link to README documentation for kata metrics.

Fixes #7721

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-22 16:20:31 +00:00
Julien Ropé
40914b25d4 kata-agent: use default filemode for block device when it is set to 0
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This is typically seen with cri-o.

Note: this is what runc does, which is why regular containers don't have this
issue. This change makes sure kata behaves the same as runc.

Fixes: #7717

Signed-off-by: Julien Ropé <jrope@redhat.com>
2023-08-22 16:08:14 +02:00
Fabiano Fidêncio
8032797418 Merge pull request #7708 from microsoft/danmihai1/kata-deploy-log
gha: capture additional kata-deploy output
2023-08-21 23:43:51 +02:00
David Esparza
d2c130ea69 Merge pull request #7710 from GabyCT/topic/fixpytorch1
metrics: Use function from metrics common in pytorch script
2023-08-21 15:31:24 -06:00
Gabriela Cervantes
eee2ee6eeb metrics: Fix FIO path
This PR fixes the FIO path for the FIO files.

Fixes #7711

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 21:06:04 +00:00
David Esparza
9347051592 Merge pull request #7666 from dborquez/metrics_improve_fio_test
metrics: Enable kata runtime in K8s for FIO test.
2023-08-21 13:51:57 -06:00
Gabriela Cervantes
39bc3488f5 metrics: Use function from metrics common in pytorch script
This PR uses a function from metrics common in the pytorch script.

Fixes #7709

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-21 16:12:35 +00:00
Dan Mihai
400eb88743 gha: capture additional kata-deploy output
Capturing only 10 lines of output can be insufficient for diagnostics.

Fixes: #7707

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-21 15:58:57 +00:00
GabyCT
700759232f Merge pull request #7690 from GabyCT/topic/fixpytorch
metrics: Fix README for pytorch
2023-08-21 09:50:14 -06:00
Jiang Liu
6e038e66e4 Merge pull request #7680 from GabyCT/topic/removetime
metrics: Remove unused variable in tensorflow mobilenet script
2023-08-21 23:39:07 +08:00
Jiang Liu
4aee3eade0 kata-types: implement serde methods for KataVirtualVolume
Implement serialization/deserialization methods for KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:46:56 +08:00
Jiang Liu
b875e39323 kata-types: validate KataVirtualVolume object
Implement method validate() for KataVirtualVolume to validate message
format.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:42:07 +08:00
Jiang Liu
fa2fdc1057 kata-types: implement two conversion helpers for KataVirtualVolume
Enable conversions from NydusExtraOptions/DirectVolumeMountInfo to
KataVirtualVolume.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:35:26 +08:00
Jiang Liu
6326af20e3 kata-types: introduce KataVirtualVolume
Introduce structure KataVirtualVolume to encapsulate information
for extra mount options and direct volumes, so we can build a common
infrastructure to handle these cases.

Fixes: #7699

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-21 16:19:47 +08:00
Gabriela Cervantes
c8b43f8b3e metrics: Fix README for pytorch
This PR fixes the pytorch reference in the README file.

Fixes #7689

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-18 20:14:49 +00:00
Aurélien
fa34d61805 Merge pull request #7664 from microsoft/danmihai1/agent-init-policy
rootfs: agent: Policy support with AGENT_INIT=yes
2023-08-18 10:51:55 -07:00
Fabiano Fidêncio
7e66d1f6b5 Merge pull request #7649 from fidencio/topic/k8s-tests-remove-kata-deploy-tests
gha: k8s: kata-deploy: Move kata-deploy specific tests from integration/kubernetes to functional/kata-deploy
2023-08-18 07:47:26 +02:00
David Esparza
fb571f8be9 metrics: Enable kata runtime in K8s for FIO test.
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.

This PR also enables FIO metrics test in the kata metrics-ci.

Fixes: #7665

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-17 17:11:27 -06:00
Dan Mihai
cb056f8cb3 rootfs: agent: Policy support with AGENT_INIT=yes
When building with AGENT_POLICY=yes and AGENT_INIT=yes:
1. Include OPA and the Policy settings in rootfs.
2. Start OPA from the kata agent.

Before these changes, building with both AGENT_POLICY=yes and
AGENT_INIT=yes was unsupported.

Starting OPA from systemd (when AGENT_INIT=no) was already supported.

Fixes: #7615

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-17 22:37:58 +00:00
GabyCT
c358056a3f Merge pull request #7685 from GabyCT/topic/changename
metrics: Fix check results for tensorflow benchmark
2023-08-17 15:39:43 -06:00
Gabriela Cervantes
85c02828e1 metrics: Update tensorflow name in gha run script
This PR updates the tensorflow name in the gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 20:17:48 +00:00
Gabriela Cervantes
e8a5119343 metrics: Fix check results for tensorflow benchmark
This PR fixes the check results for the tensorflow benchmark now
that we have changed the name of the test.

Fixes #7684

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 19:52:45 +00:00
Fabiano Fidêncio
2d896ad12f gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
Fabiano Fidêncio
4ffc2c86f3 gha: kata-deploy: Add the first kata-deploy test
This test, at least for now, only checks whether the runtimeclasses
have been properly created.

This is just a migration from a test we had as part of the k8s suite.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
GabyCT
4ba684e6e4 Merge pull request #7653 from GabyCT/topic/tensorflowfp32
metrics: Add Tensorflow ResNet50 int8 benchmark
2023-08-17 10:44:25 -06:00
Gabriela Cervantes
8616c050ae metrics: Remove unused variable in tensorflow mobilenet script
This PR removes unused variable in tensorflow mobilenet script.

Fixes #7679

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-17 16:04:18 +00:00
Fabiano Fidêncio
285e616b5e tests: common: Ensure test_type is used as part of the cluster's name
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 14:22:16 +02:00
Fabiano Fidêncio
790bd3548d tests: common: Don't fail if yq is not part of the cache
This may happen on external runners.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 14:22:14 +02:00
Fabiano Fidêncio
ce6adecd0a gha: kata-deploy: Add run-kata-deploy-tests.sh
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.

Right now it doesn't have any tests, and the command to actually run the
tests is commented out. For now this is just a placeholder that will be
populated sooner rather than later.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 09:49:03 +02:00
Fabiano Fidêncio
cfc29c11a3 gha: k8s: Stop running kata-deploy tests as part of the k8s suite
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's already get rid of this one and avoid
more kata-deploy tests landing here.

Fixes: #7642

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 09:48:54 +02:00
Fabiano Fidêncio
e470a650e0 Merge pull request #7654 from sprt/ci-fixes
kata-deploy: Properly create default runtime class
2023-08-17 09:43:34 +02:00
Wedson Almeida Filho
962378606e Merge pull request #7627 from wedsonaf/error-conv
agent: simplify error handling
2023-08-16 21:02:38 -03:00
Aurélien Bombo
f4dd152863 tests: k8s: Call ensure_yq() in setup.sh
It wasn't the `common.bash` import in `run_kubernetes_tests.sh` causing
the yq error, so let's try this instead.

Reference: https://github.com/kata-containers/kata-containers/actions/runs/5674941359/job/15379797568#step:10:341

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-08-16 14:13:56 -07:00
GabyCT
3d0cfc88c9 Merge pull request #7662 from GabyCT/topic/fixhelptensorflow
metrics: Fix MobileNet help me description
2023-08-16 14:13:39 -06:00
Aurélien Bombo
339569b69c kata-deploy: Properly create default runtime class
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
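
kata-deploy does this in shell. Below is a Go sketch of the mapping, shown only for illustration: the default `kata` class points at `kata-$KATA_HYPERVISOR`, and every other class keeps its own name as the handler. `runtimeClassHandler` is a hypothetical name.

```go
package main

import "fmt"

// runtimeClassHandler maps a RuntimeClass name to its handler. The default
// "kata" class must use the hypervisor-specific handler rather than a
// handler literally named "kata".
func runtimeClassHandler(className, hypervisor string) string {
	if className == "kata" {
		return "kata-" + hypervisor
	}
	return className
}

func main() {
	fmt.Println(runtimeClassHandler("kata", "clh")) // kata-clh
}
```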

Fixes: #7663

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-08-16 11:04:44 -07:00
Gabriela Cervantes
2a491e9b1f metrics: Fix MobileNet help me description
This PR fixes the MobileNet help description in the
tensorflow script.

Fixes #7661

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-16 15:25:39 +00:00
Fabiano Fidêncio
606e419fac Merge pull request #7660 from fidencio/topic/add-kata-deploy-tests-as-part-of-the-ci
gha: ci: Start running kata-deploy tests
2023-08-16 16:44:08 +02:00
Fabiano Fidêncio
d19a75e80c gha: ci: Start running kata-deploy tests
Let's add the tests as part of the ci.yaml, so they can be triggered as
part of each PR.

For this PR those tests won't be triggered, courtesy of the
`pull_request_target` event we rely on.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-16 16:08:05 +02:00
Fabiano Fidêncio
4adcf2192e Merge pull request #7651 from ManaSugi/runk/containerd-test
runk: Modify kill command's error message for containerd tests
2023-08-16 15:37:48 +02:00
Zhongtao Hu
5c8a61a4c8 Merge pull request #7558 from openanolis/fix/driver_option
runtime-rs: add driver option
2023-08-16 13:56:29 +08:00
Zhongtao Hu
d90f7ac689 runtime-rs: add unit test for block driver
Add a unit test for the block driver.

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:45:27 +08:00
Zhongtao Hu
e44919f0da runtime-rs: add load_test_config for unit test
Add load_test_config for unit tests.

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:56 +08:00
Zhongtao Hu
7f48a69379 runtime-rs: add driver option
add driver option when handle linux devices

Fixes:#7539
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-08-16 11:32:49 +08:00
Gabriela Cervantes
bade6a5c3b docs: Fix TensorFlow word across the document
This PR makes the spelling of the word TensorFlow uniform across the
whole document.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 20:13:05 +00:00
Fabiano Fidêncio
0bc48eab60 Merge pull request #7640 from fidencio/topic/gha-cri-containerd-enable-tests
gha: cri-containerd: Enable tests
2023-08-15 21:18:28 +02:00
Gabriela Cervantes
1a1b207760 docs: Add Tensorflow Resnet50 documentation
This PR adds the Tensorflow Resnet50 documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:46:44 +00:00
Gabriela Cervantes
24baededc0 metrics: Add Dockerfile for ResNet50 int8
This PR adds the dockerfile for ResNet50 int8 benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:38:26 +00:00
Gabriela Cervantes
6d971ba8df metrics: Add Tensorflow ResNet50 int8 benchmark
This PR adds the Tensorflow ResNet50 int8 script for kata metrics.

Fixes #7652

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-15 17:30:22 +00:00
Manabu Sugimoto
25d151bd1b runk: Modify kill command's error message for containerd tests
The error message when the kill command is executed with the container's
state == Stopped should be "container not running" because the containerd
tests expect that OCI runtimes return the error message and compare it.
If the error message is different from the expected one, the tests fail.
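
A hypothetical sketch of the check follows. It assumes a simplified `ContainerState` enum and a `check_killable` helper, neither of which are runk's real types. The only detail taken from the commit is the exact error string that the containerd tests compare against.

```rust
/// Hypothetical, simplified container state (not runk's real type).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum ContainerState {
    Created,
    Running,
    Stopped,
}

/// Hypothetical kill pre-check: a stopped container must yield exactly
/// "container not running", because containerd's tests compare the string.
pub fn check_killable(state: ContainerState) -> Result<(), String> {
    match state {
        ContainerState::Stopped => Err("container not running".to_string()),
        _ => Ok(()),
    }
}
```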

Fixes: #7650

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-16 00:39:50 +09:00
GabyCT
0bbabeaaf8 Merge pull request #7644 from GabyCT/topic/renametensorflow
metrics: Rename tensorflow scripts
2023-08-15 09:23:24 -06:00
Fabiano Fidêncio
46d25d908d Merge pull request #7643 from fidencio/topic/add-functional-kata-deploy-tests
gha: tests: Add kata-deploy functional tests -- Part 1
2023-08-15 15:23:48 +02:00
Fabiano Fidêncio
b3592ab25c gha: cri-containerd: Enable tests
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:32:42 +02:00
Fabiano Fidêncio
84dd02e0f9 gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
On the runners, we're hitting a timeout that I cannot reproduce at all
when allocating the same instance and running the tests manually.

The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.

It's worth mentioning that in my first tests I used a timeout of 10s,
which helped, but we still hit issues every now and then.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
b29782984a gha: cri-containerd: Show pod before deleting it
It'll help us to debug failures with the pod stop / pod delete.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
ae0930824a gha: cri-containerd: Print kata logs in case of error
We need this to fully understand what issues we're facing.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
6c8b2ffa60 gha: cri-containerd: Group containerd logs
This greatly improves readability in case of failures.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Fabiano Fidêncio
9e898701f5 gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
Short commit log says it all.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-15 14:31:54 +02:00
Wedson Almeida Filho
76dac8f22c agent: simplify error handling
We extend the `Result` and `Option` types with extension traits that
allow converting a `Result<T, E>` and an `Option<T>` into
`ttrpc::Result<T>`.

This allows the elimination of many `match` statements in favor of
calling the map function plus the `?` operator. This transformation
simplifies the code.
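
Below is a minimal sketch of this pattern. It uses local stand-in types (`RpcError`, `RpcResult`) in place of `ttrpc::Error` and `ttrpc::Result`, so it compiles without the ttrpc crate. The trait and function names are hypothetical, not the agent's actual ones.

```rust
use std::fmt::Display;

// Hypothetical stand-ins for ttrpc::Error / ttrpc::Result.
#[derive(Debug, PartialEq)]
pub enum RpcError {
    Internal(String),
}

pub type RpcResult<T> = Result<T, RpcError>;

/// Extension trait: convert any Result<T, E> with a displayable error.
pub trait ResultToRpc<T> {
    fn map_rpc_err(self) -> RpcResult<T>;
}

impl<T, E: Display> ResultToRpc<T> for Result<T, E> {
    fn map_rpc_err(self) -> RpcResult<T> {
        self.map_err(|e| RpcError::Internal(e.to_string()))
    }
}

/// Extension trait: turn None into an RPC error carrying `msg`.
pub trait OptionToRpc<T> {
    fn ok_or_rpc(self, msg: &str) -> RpcResult<T>;
}

impl<T> OptionToRpc<T> for Option<T> {
    fn ok_or_rpc(self, msg: &str) -> RpcResult<T> {
        self.ok_or_else(|| RpcError::Internal(msg.to_string()))
    }
}

/// Example handler: each fallible step is one line ending in `?`
/// instead of a nested `match`.
pub fn parse_pid(raw: &str, limit: Option<u32>) -> RpcResult<u32> {
    let pid = raw.trim().parse::<u32>().map_rpc_err()?;
    let limit = limit.ok_or_rpc("no pid limit configured")?;
    if pid > limit {
        return Err(RpcError::Internal(format!("pid {} exceeds limit {}", pid, limit)));
    }
    Ok(pid)
}
```

With the traits in scope, each conversion becomes a method call followed by `?`, which is the simplification the commit describes.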

Fixes: #7624

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-15 06:55:27 -03:00
Fabiano Fidêncio
e107d1d94e Merge pull request #7574 from microsoft/danmihai1/policy
agent: runtime: add Agent Policy feature
2023-08-15 11:29:13 +02:00
Bin Liu
ea81eb6c2e Merge pull request #7169 from chethanah/runk/support-no-pid-ns
runk: Support without pid ns
2023-08-15 13:00:40 +08:00
Gabriela Cervantes
18a7fd8e4e metrics: Rename tensorflow scripts
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.

Fixes #7645

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-14 20:40:35 +00:00
GabyCT
a740c80251 Merge pull request #7626 from GabyCT/topic/cassandrak
metrics: Add Cassandra Kubernetes benchmark for kata metrics
2023-08-14 14:22:52 -06:00
GabyCT
4e5e39e8b3 Merge pull request #7618 from GabyCT/topic/addfunctionscommon
metrics: Add common functions to the common script
2023-08-14 14:22:30 -06:00
GabyCT
a19d471c01 Merge pull request #7629 from dborquez/metrics_improve_stopping_kata_components
metrics: fix the loop used to stop kata components
2023-08-14 14:22:06 -06:00
Fabiano Fidêncio
e55fa93db9 tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it added so we can build atop
it in an upcoming series.

Fixes: #7642

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 21:38:00 +02:00
Fabiano Fidêncio
d9ee17aaec tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it added so we can build atop
it in an upcoming series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 21:37:52 +02:00
Chelsea Mafrica
22465d22f0 Merge pull request #7638 from ManaSugi/fix/virtcontainers-doc
docs: Remove installation step in virtcontainers doc
2023-08-14 10:21:57 -07:00
Dan Mihai
ab829d1038 agent: runtime: add the Agent Policy feature
Fixes: #7573

To enable this feature, build your rootfs using AGENT_POLICY=yes. The
default is AGENT_POLICY=no.

Building rootfs using AGENT_POLICY=yes has the following effects:

1. The kata-opa service gets included in the Guest image.

2. The agent gets built using AGENT_POLICY=yes.

After this patch, the shim calls SetPolicy if and only if a Policy
annotation is attached to the sandbox/pod. When creating a sandbox/pod
that doesn't have an attached Policy annotation:

1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses
   the default agent settings, that might include a default Policy too.

2. If the agent was built using AGENT_POLICY=no, the new sandbox is
   executed the same way as before this patch.

Any SetPolicy calls from the shim to the agent fail if the agent was
built using AGENT_POLICY=no.

If the agent was built using AGENT_POLICY=yes:

1. The agent reads the contents of a default policy file during sandbox
   start-up.

2. The agent then connects to the OPA service on localhost and sends
   the default policy to OPA.

3. If the shim calls SetPolicy:

   a. The agent checks if SetPolicy is allowed by the current
      policy (the current policy is typically the default policy
      mentioned above).

   b. If SetPolicy is allowed, the agent deletes the current policy
      from OPA and replaces it with the new policy it received from
      the shim.

   A typical new policy from the shim doesn't allow any future SetPolicy
   calls.

4. For every agent rpc API call, the agent asks OPA if that call
   should be allowed. OPA allows or denies a call based on the current
   policy, the name of the agent API, and the API call's inputs. The
   agent rejects any calls that are rejected by OPA.
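
The flow above can be sketched in Rust as follows. This is illustrative only. A policy is modelled as a plain set of allowed request names, standing in for a Rego policy evaluated by OPA. `Agent`, `Policy` and the `SetPolicyRequest` name are assumptions, not the agent's real types.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for an OPA policy: the set of allowed request names.
#[derive(Debug, Clone)]
pub struct Policy {
    allowed: HashSet<String>,
}

impl Policy {
    pub fn new(apis: &[&str]) -> Self {
        Policy {
            allowed: apis.iter().map(|s| s.to_string()).collect(),
        }
    }

    fn allows(&self, api: &str) -> bool {
        self.allowed.contains(api)
    }
}

pub struct Agent {
    current: Policy,
}

impl Agent {
    /// Steps 1-2: the agent starts with the default policy.
    pub fn start(default_policy: Policy) -> Self {
        Agent { current: default_policy }
    }

    /// Step 4: every RPC is checked against the current policy.
    pub fn check(&self, api: &str) -> Result<(), String> {
        if self.current.allows(api) {
            Ok(())
        } else {
            Err(format!("{} is blocked by policy", api))
        }
    }

    /// Step 3: SetPolicy must itself be allowed by the current policy;
    /// if it is, the new policy replaces the current one.
    pub fn set_policy(&mut self, new_policy: Policy) -> Result<(), String> {
        self.check("SetPolicyRequest")?;
        self.current = new_policy;
        Ok(())
    }
}
```

A default policy that allows `SetPolicyRequest` can be replaced once. A replacement that omits it blocks any further `SetPolicy` calls, matching the typical new policy described above.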

When building using AGENT_POLICY_DEBUG=yes, additional Policy logging
gets enabled in the agent. In particular, information about the inputs
for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM.
These inputs can be useful for investigating API calls that might have
been rejected by the Policy. Examples:

1. Load a failing policy file test1.rego on a different machine:

opa run --server --addr 127.0.0.1:8181 test1.rego

2. Collect the API inputs from Guest's /tmp/policy.txt and test on the
   machine where the failing policy has been loaded:

curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \
--data-binary @test1-inputs.json

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-08-14 17:07:35 +00:00
Fabiano Fidêncio
831e73ff91 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 17:46:10 +02:00
Fabiano Fidêncio
af1b46bbf2 tests: Add gha-run-k8s-common.sh
Let's split a good portion of `tests/integration/kubernetes/gha-run.sh`
out, and put it in a place where it can be used by the soon-to-come
kata-deploy specific tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 17:45:58 +02:00
Jeremi Piotrowski
a57e7ffe14 Merge pull request #7211 from stevenhorsman/propogate-secrets
Propagate secrets, config maps etc into guest if sharedFS not available
2023-08-14 11:24:47 +02:00
Manabu Sugimoto
416445e7eb docs: Remove installation step in virtcontainers doc
Remove the installation step in the virtcontainers doc
because the virtcontainers install/uninstall targets have
been removed by 86723b51ae
and they are not used anymore.

Fixes: #7637

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-14 15:15:24 +09:00
Fabiano Fidêncio
b975c27793 Merge pull request #7547 from stevefan1999-personal/patch-k0s
kata-deploy: Preliminary k0s support
2023-08-12 14:28:13 +02:00
Fabiano Fidêncio
6ed57d1e9a Merge pull request #7447 from fidencio/topic/gha-move-static-jenkins-to-azure-instances
gha: static-checks: Move to the Azure instances
2023-08-12 13:31:54 +02:00
Steve Fan
72cbcf040b kata-deploy: Add k0s support
Add k0s support to kata-deploy, in the very same way kata-containers
already supports k3s, and rke2.

k0s support requires v1.27.1, which is noted as part of the kata-deploy
documentation, as it's the way to use dynamic configuration on
containerd CRI runtimes.

This support will only be part of the `main` branch, as it's not a bug
fix that can be backported to the `stable-3.2` branch, and this is also
noted as part of the documentation.

Fixes: #7548
Signed-off-by: Steve Fan <29133953+stevefan1999-personal@users.noreply.github.com>
2023-08-11 21:17:23 +02:00
David Esparza
767434d50a metrics: fix the loop used to stop kata components #7629
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.

Fixes: #7628

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-11 12:32:41 -06:00
Gabriela Cervantes
5d0f0d43c7 metrics: Add cassandra statefulset yaml
This PR adds cassandra statefulset yaml for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:39 +00:00
Gabriela Cervantes
c1dcc1396f metrics: Add cassandra service yaml
This PR adds the cassandra service yaml for the benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:36 +00:00
Gabriela Cervantes
2297a0d1c5 metrics: Add block loop pvc yaml for cassandra
This PR adds block loop pvc yaml for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:33 +00:00
Gabriela Cervantes
e3d511946f metrics: Add block loop pv yaml for cassandra test
This PR adds the block loop pv yaml for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:29 +00:00
Gabriela Cervantes
9890271594 metrics: Add block loop pvc for cassandra test
This PR adds the block loop pvc for cassandra test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:22:19 +00:00
Gabriela Cervantes
349b89969a metrics: Add Cassandra Kubernetes benchmark for kata metrics
This PR adds Cassandra Kubernetes benchmark for kata metrics tests.

Fixes #7625

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-11 17:21:48 +00:00
Fabiano Fidêncio
c52d090522 gha: static-checks: Move to the Azure instances
The GHA runners are not exactly powerful, which makes the static-checks
take way too long (almost an hour).

Let's give it a try and move those to Azure instances of the same size
used as part of our CI, which will probably reduce this time.

Fixes: #7446

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-11 18:47:47 +02:00
stevenhorsman
8815ed0665 runtime: Remove config warnings
Remove the configuration file warnings about shared_fs = none,
now that there is a solution for updating configMaps, secrets, etc.

Fixes: #7210
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-08-11 16:31:08 +01:00
Yohei Ueda
afe1a6ac5a agent: support copying of directories and symlinks
This patch allows copying of directories and symlinks when
static file copying is used between host and guest. This change is
necessary to support recursive file copying between shim and agent.
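
Below is a minimal local-filesystem sketch of recursive copying that preserves directories and symlinks. It assumes symlinks should be recreated rather than followed. `copy_recursive` is a hypothetical name, and the actual feature copies files from the shim to the agent rather than within one filesystem.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical recursive copy: regular files are copied, directories are
/// recreated and walked, and symlinks are recreated (not followed).
pub fn copy_recursive(src: &Path, dst: &Path) -> io::Result<()> {
    // symlink_metadata does not follow links, so we can detect them.
    let file_type = fs::symlink_metadata(src)?.file_type();
    if file_type.is_symlink() {
        let target = fs::read_link(src)?;
        std::os::unix::fs::symlink(target, dst)?;
    } else if file_type.is_dir() {
        fs::create_dir_all(dst)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            copy_recursive(&entry.path(), &dst.join(entry.file_name()))?;
        }
    } else {
        fs::copy(src, dst)?;
    }
    Ok(())
}
```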

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(cherry picked from commit de232b8030)
2023-08-11 16:31:08 +01:00
Pradipta Banerjee
ab13ef87ee runtime: propagate configmap/secrets etc changes for remote-hyp
For the remote hypervisor, configmaps, secrets, downward-api and projected volumes are
copied from the host to the guest. This patch watches for changes to the host files
and copies the changes to the guest.

Note that configmap updates take significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.

Fixes: #7210

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
(cherry picked from commit 3081cd5f8e)
(cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)
2023-08-11 16:31:08 +01:00
Yohei Ueda
c074ec4df1 runtime: Copy shared files recursively
This patch enables recursive file copying
when filesystem sharing is not used.

Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 5422a056f2)
(cherry picked from commit 16055ce040bbd724be2916bc518d89b69c9e0ca5)

Fixes: #7210
2023-08-11 16:16:52 +01:00
Peng Tao
a39fd6c066 Merge pull request #7611 from ManaSugi/fix/fc-version
versions: Update firecracker version to 1.4.0
2023-08-11 16:43:37 +08:00
Chao Wu
7031b5db07 Merge pull request #7535 from ManaSugi/fix/allow-redundant-clone
agent: Allow clippy::redundant_clone in the unit tests
2023-08-11 14:17:56 +08:00
Gabriela Cervantes
fdcd52ff78 metrics: Add check containers are running in tensorflow mobilenet
This PR adds the check containers are running function, defined in the
common script, to the tensorflow mobilenet test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:17:20 +00:00
Gabriela Cervantes
36337ee146 metrics: Add check containers are up in tensorflow script
This PR adds the check containers are up function from the common
script to the tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:15:18 +00:00
Gabriela Cervantes
f700f9b0ba metrics: Remove unused variable in tensorflow script
This PR removes an unused variable in tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:13:37 +00:00
Gabriela Cervantes
833cf7a684 metrics: Add check containers are running function
This PR adds the check containers are running function to the common
metrics script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:12:22 +00:00
Gabriela Cervantes
918c783084 metrics: Add check containers are up in tensorflow mobilenet script
This PR adds the check containers are up function from the common
script to the tensorflow mobilenet script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 20:06:40 +00:00
Gabriela Cervantes
9d57a1fab4 metrics: Use check containers are up in tensorflow script
This PR uses the check containers are up function from the common
script in the tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:42:09 +00:00
Gabriela Cervantes
1c84680d8c metrics: Add check containers are up in common script
This PR adds the check containers are up function to the common script for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:39:24 +00:00
Gabriela Cervantes
d3e57cf454 metrics: Use collect_results function in tensorflow mobilenet test
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:34:30 +00:00
Gabriela Cervantes
286de046af metrics: Remove collect results function definition
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:31:23 +00:00
Gabriela Cervantes
9879709aae metrics: Add common functions to the common script
This PR adds the collect results function to the common metrics
script.

Fixes #7617

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-10 17:27:11 +00:00
Fabiano Fidêncio
a89c9cd620 Merge pull request #7557 from wedsonaf/no-new-vecs
agent: avoid creating new `Vec` instances when easily avoidable
2023-08-10 18:43:46 +02:00
Manabu Sugimoto
4746fa3daa docs: Specify supported Firecracker version using versions.yaml
Specify the supported version of Firecracker using our `versions.yaml`
to improve the maintainability of the documentation.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:49:45 +09:00
Manabu Sugimoto
cc922be5ec versions: Update firecracker version to 1.4.0
This patch upgrades Firecracker version from v1.1.0 to v1.4.0.

* Generate swagger models for v1.4.0 (from `firecracker.yaml`)
  - The version of go-swagger used is v0.30.0
* The firecracker v1.4.0 includes the following changes.
  - Added
    * Added support for custom CPU templates allowing users to adjust vCPU features
    exposed to the guest via CPUID, MSRs and ARM registers.
    * Introduced V1N1 static CPU template for ARM to represent Neoverse V1 CPU
    as Neoverse N1.
    * Added support for the virtio-rng entropy device. The device is optional. A
    single device can be enabled per VM using the /entropy endpoint.
    * Added a cpu-template-helper tool for assisting with creating and managing
    custom CPU templates.
  - Changed
    * Set FDP_EXCPTN_ONLY bit (CPUID.7h.0:EBX[6]) and ZERO_FCS_FDS bit
    (CPUID.7h.0:EBX[13]) in Intel's CPUID normalization process.
  - Fixed
    * Fixed feature flags in T2S CPU template on Intel Ice Lake.
    * Fixed CPUID leaf 0xb to be exposed to guests running on AMD host.
    * Fixed a performance regression in the jailer logic for closing open file
    descriptors.
    * A race condition that has been identified between the API thread and the VMM
    thread due to a misconfiguration of the api_event_fd.
    * Fixed CPUID leaf 0x1 to disable perfmon and debug feature on x86 host.
    * Fixed passing through cache information from host in CPUID leaf 0x80000006.
    * Fixed the T2S CPU template to set the RRSBA bit of the IA32_ARCH_CAPABILITIES
    MSR to 1 in accordance with an Intel microcode update.
    * Fixed the T2CL CPU template to pass through the RSBA and RRSBA bits of the
    IA32_ARCH_CAPABILITIES MSR from the host in accordance with an Intel microcode
    update.
    * Fixed passing through cache information from host in CPUID leaf 0x80000005.
    * Fixed the T2A CPU template to disable SVM (nested virtualization).
    * Fixed the T2A CPU template to set EferLmsleUnsupported bit
    (CPUID.80000008h:EBX[20]), which indicates that EFER[LMSLE] is not supported.

Fixes: #7610

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-10 16:48:13 +09:00
Fupan Li
39e67b06e9 dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
Since the fd passed through the unix socket could be any
stream fd, such as a pipe/FIFO fd or any other socket
fd, we should treat it as a normal hybrid stream
instead of a unix stream.

Fixes:#7584

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-08-10 11:07:10 +08:00
David Esparza
7bf994827d Merge pull request #7609 from dborquez/tensorflow_check_completion
metrics: compute tensorflow statistics
2023-08-09 18:47:47 -06:00
David Esparza
dcdb3b067f Merge pull request #7606 from GabyCT/topic/nginx
metrics: Add network nginx benchmark
2023-08-09 16:14:13 -06:00
David Esparza
2defdcc598 Merge pull request #7579 from dborquez/simplify_gha_metrics_workflow
metrics: install kata once and run multiple checks
2023-08-09 14:45:09 -06:00
David Esparza
473b0d3a31 metrics: compute tensorflow statistics
This PR computes average results for TF bench.
Additionally, it improves the data parsing from
all running containers.

Fixes: #7603

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-09 14:42:30 -06:00
Fabiano Fidêncio
0a8208c670 Merge pull request #7608 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-3
ci: unencrypted-image: Fix build context
2023-08-09 21:00:46 +02:00
Fabiano Fidêncio
03d1fa67b1 ci: unencrypted-image: Fix build context
The build context should be the folder where the Dockerfile is present,
otherwise the files copied into the image won't be found.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 20:32:36 +02:00
Fabiano Fidêncio
eb463b38ec ci: unencrypted-image: Don't fail to build on s390x
Let's make sure that we don't fail in case we're building non x86_64.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 20:32:36 +02:00
Fabiano Fidêncio
ebc86091d1 Merge pull request #7607 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-2
ci: create-confidential-image: Add dependent actions
2023-08-09 19:53:49 +02:00
Fabiano Fidêncio
a2d731ad26 ci: create-confidential-image: Add dependent actions
Following the example on https://github.com/docker/build-push-action,
it's clear that the actions to "Set up QEMU" and "Set up Docker Buildx"
are missing.

Let's add them, and also take the advantage to bump the
build-push-action to its v4, which, by the way, had a typo on its name
(build-and-push-action does **NOT** exist, build-push-action does).

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 18:36:51 +02:00
Gabriela Cervantes
d1a6296221 metrics: Add nginx documentation to network README
This PR adds nginx documentation to network README for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:17:46 +00:00
Gabriela Cervantes
498f7c0549 metrics: Add nginx kubernetes yaml
This PR adds the nginx kubernetes yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:14:04 +00:00
Gabriela Cervantes
f8a5255cf7 metrics: Add network nginx benchmark
This PR adds the network nginx benchmark for kata metrics.

Fixes #7605

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-09 16:12:21 +00:00
Fabiano Fidêncio
86f705d98b Merge pull request #7604 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests-follow-up-1
Follow up fixes for https://github.com/kata-containers/kata-containers/pull/7596
2023-08-09 18:05:46 +02:00
Fabiano Fidêncio
43fe5d1b90 ci: k8s: tees: Ensure PR_NUMBER is exported
Right now this is not being used, but it will be, as the image generated
for the confidential tests has it as part of its tag.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 17:45:42 +02:00
Fabiano Fidêncio
54f6a78500 ci: {{ pr-number }} should be {{ inputs.pr-number }}
One of the joys of relying on the `pull_request_target` event is only
being able to catch these issues after they are merged.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 17:41:07 +02:00
Fabiano Fidêncio
5cdf981a2b Merge pull request #7596 from fidencio/topic/create-image-to-be-used-by-the-confidential-tests
tests: Create image that will be used in the unencrypted confidential tests
2023-08-09 17:06:07 +02:00
Fabiano Fidêncio
c932369f42 Merge pull request #7492 from fidencio/topic/adapt-tests-to-the-new-kata-deploy-env-vars
kata-deploy: Ensure we cover SHIMS / DEFAULT_SHIM as part of our tests
2023-08-09 12:55:03 +02:00
Fabiano Fidêncio
034d7aab87 tests: k8s: Ensure the runtime classes are properly created
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.

Fixes: #7491

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:46:04 +02:00
Fabiano Fidêncio
fac8ccf5cd ci: Add build-and-publish-tee-confidential-unencrypted-image
This will be done before running TEE tests, and it's a hard dependency
for them.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:36:10 +02:00
Fabiano Fidêncio
ab5f603ffa ci: k8s: Add the image used for unencrypted confidential tests
Let's add here the image we'll be using for unencrypted confidential
tests.  Later on, we'll make sure to build and use this image as part of
our CI.

The image can easily be built as a multi-arch image, and has `cpuid`
installed in case of `x86_64` build, so it can be used to detect whether
we're running on a TEE guest without having to rely on `dmesg | grep
...`.

Fixes: #7595

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:33:18 +02:00
Fabiano Fidêncio
36d53dd2af Merge pull request #7598 from UnmeshDeodhar/upgrade-bats-version
tests: upgrade bats version
2023-08-09 11:18:56 +02:00
Fabiano Fidêncio
1e8fe131bd k8s: tests: Take advantage of SHIMS and DEFAULT_SHIM env vars
We no longer need to use sed to replace the runtimeclass being used,
now that we take advantage of the `DEFAULT_SHIM` environment variable
merged in the previous commits.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-09 11:15:34 +02:00
Wedson Almeida Filho
729b2dd611 agent: avoid creating new Vec instances when easily avoidable
There are many places where the code currently creates new `Vec`
instances when it's not really needed. The result is a perf hit because
it allocates memory, copies all elements, then frees the memory; in some
cases, copying elements also involves extra allocations (e.g., when
elements are strings, or structs containing strings).

This patch addresses a number of these cases.
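
The following illustrates the kind of change involved, using hypothetical functions rather than code from the patch. Taking a slice instead of an owned `Vec` lets callers pass a borrow, so no new `Vec` is allocated and no `String` is cloned.

```rust
/// Before: takes ownership, so callers holding a Vec they still need
/// must clone it (allocating and copying every String).
pub fn count_ro_owned(options: Vec<String>) -> usize {
    options.iter().filter(|o| o.as_str() == "ro").count()
}

/// After: borrows a slice; callers pass `&vec` with no allocation or copy.
pub fn count_ro(options: &[String]) -> usize {
    options.iter().filter(|o| o.as_str() == "ro").count()
}
```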

Fixes: #7203

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-09 02:38:36 -03:00
Jiang Liu
311671abb5 Merge pull request #7552 from jiangliu/agent-r1
Fix minor bugs and improve coding style of agent rpc/sandbox/mount
2023-08-09 13:19:02 +08:00
Unmesh Deodhar
aeaec9dae9 tests: upgrade bats version
Instead of using the package manager to install bats, build
it from source. This gives us an updated version of bats,
which supports functions such as setup_file and
teardown_file.
We can use these functions in our current tests.

Fixes: #7597

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-08-08 18:16:39 -05:00
David Esparza
e664969862 metrics: install kata once and run multiple checks
This PR changes the metrics workflow in order to just install
kata once, and run the checks for multiple hypervisor variations.

In this way we save time avoiding installing kata for each
hypervisor to be tested.

Fixes: #7578

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-08 10:25:13 -06:00
Jiang Liu
baabfa9f1f agent: refine implementation of mount related code
Refine implementation of mount by:
- log message with `path.display()` instead of `{:?}`
- add prefix "_" to unused variables
- pass by reference instead of by value to avoid creating redundant
  array
- exactly matching prefix "fsgid=" instead of "fsgid"
- avoid redundant clone() operations
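
The prefix fix is shown in isolation below, with `parse_fsgid` as a hypothetical helper rather than the agent's real function. Matching the full `fsgid=` prefix means an option such as `fsgidx=5` is not mistaken for an `fsgid` setting.

```rust
/// Hypothetical helper: return the value of an `fsgid=` mount option, if any.
pub fn parse_fsgid(options: &[&str]) -> Option<u32> {
    options
        .iter()
        .copied()
        // Match the whole "fsgid=" prefix, not just "fsgid".
        .find_map(|opt| opt.strip_prefix("fsgid="))
        .and_then(|value| value.parse::<u32>().ok())
}
```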

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:03 +08:00
Jiang Liu
98ba211a34 agent: fix a bug in update_ephemeral_mounts()
There's a bug in function update_ephemeral_mounts() which only handles
the first storage object and ignores all other storage objects.
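
The commit does not show the faulty code. The sketch below only illustrates one common way such a bug arises, an early `return` inside the loop, next to the corrected shape. `Storage` here is a hypothetical, heavily simplified struct.

```rust
/// Hypothetical, simplified storage record.
#[derive(Debug)]
pub struct Storage {
    pub mount_point: String,
}

/// Buggy shape: returning inside the loop handles only the first storage.
pub fn update_first_only(storages: &[Storage], updated: &mut Vec<String>) {
    for s in storages {
        updated.push(s.mount_point.clone());
        return;
    }
}

/// Fixed shape: every storage object is processed.
pub fn update_all(storages: &[Storage], updated: &mut Vec<String>) {
    for s in storages {
        updated.push(s.mount_point.clone());
    }
}
```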

Fixes: #7551

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:02 +08:00
Jiang Liu
5333618d70 agent: make add_storage() take &[Storage] instead of Vec<Storage>
Simplify add_storage() by taking &[Storage] instead of Vec<Storage>.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:01 +08:00
Jiang Liu
37f34781d1 agent: simplify function online_cpu_memory()
Simplify function online_cpu_memory() by only calling update_cpuset_path()
for containers with cpuset configured.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:03:00 +08:00
Jiang Liu
d3c5422379 agent: refine style of code related to sandbox
Refine style of code related to sandbox by:
- remove unnecessary comments asking the caller to take the lock, since we
  already take `&mut self`.
- change "*count < 1" to "*count == 0", since `count` is of type u32.
- make remove_sandbox_storage() take `&mut self` instead of `&self`.
- group related functions together
- avoid searching the map twice in function find_process()
- avoid unwrap() in function run_oom_event_monitor()
- avoid unwrap() in online_resources()
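
Two of the items above are sketched together below on a hypothetical reference-count map, not the agent's real sandbox type. `*count == 0` states the intent directly for an unsigned counter, and a single `get_mut` lookup replaces an existence check followed by a second lookup.

```rust
use std::collections::HashMap;

/// Hypothetical helper: drop one reference to `path`; returns true if the
/// entry reached zero and was removed.
pub fn release_storage(refcounts: &mut HashMap<String, u32>, path: &str) -> bool {
    // One lookup via get_mut, instead of contains_key() followed by get_mut().
    match refcounts.get_mut(path) {
        Some(count) => {
            *count = count.saturating_sub(1);
            // u32 can never be negative, so `== 0` says exactly what we mean.
            if *count == 0 {
                refcounts.remove(path);
                true
            } else {
                false
            }
        }
        None => false,
    }
}
```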

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:59 +08:00
Jiang Liu
71a9f67781 agent: avoid unwrap() in function do_remove_container()
Avoid unwrap() in function do_remove_container(), and also make the
implementation symmetric for both timeout and non-timeout cases.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:58 +08:00
Jiang Liu
84badd89d7 agent: avoid clone objects when possible
Optimize agent rpc implementation by:
- avoid cloning objects when possible
- avoid unwrap() when possible
- explicitly drop objects to ensure ordering

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-08-08 18:02:56 +08:00
Chao Wu
b098960442 Merge pull request #7581 from justxuewei/bump-versions
deps: Bump dependent crate versions
2023-08-08 15:16:57 +08:00
Chao Wu
24bf637835 Merge pull request #7500 from pmores/fix-queue-num-in-dragonball-share-fs
fix number of queues handling in dragonball share fs device
2023-08-08 12:07:25 +08:00
Xuewei Niu
b23c5ed155 deps: Bump dependent crate versions
This pull request is mainly for updating vm-memory and vmm-sys-util.

The affected crates include:

- vm-memory: from 0.9.0 to 0.10.0
- vmm-sys-util: from 0.10.0 to 0.11.0
- virtio-queue: from 0.6.0 to 0.7.0
- fuse-backend-rs: from 0.10.4 to 0.10.5
- linux-loader: from 0.6.0 to 0.8.0
- nydus-api: from 0.3.0 to 0.3.1
- nydus-rafs: from 0.3.1 to 0.3.2
- nydus-storage: from 0.6.3 to 0.6.4

Fixes: #0000

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-08 11:54:09 +08:00
Fupan Li
5a20d8dcaf Merge pull request #7383 from justxuewei/dan
runtime-rs: Introduce directly attachable network
2023-08-08 09:54:28 +08:00
Chelsea Mafrica
553fd79ea9 Merge pull request #7572 from GabyCT/topic/resnet50fp32
metrics: General improvements to mobilenet tensorflow test
2023-08-07 13:33:28 -07:00
GabyCT
194120b679 Merge pull request #7540 from GabyCT/topic/enableiperf
gha: Add iperf network metrics
2023-08-07 13:40:02 -06:00
Gabriela Cervantes
863283716d metrics: General improvements to mobilenet tensorflow test
This PR renames the mobilenet tensorflow test to a more specific
tensorflow name. Tensorflow has different configurations and more
tensorflow tests will be added, so each tensorflow test needs a
distinguishable name.

Fixes #7571

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:50:00 +00:00
Gabriela Cervantes
3c319d8d4c metrics: Add iperf to gha run script
This PR adds iperf to the gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Gabriela Cervantes
5b5caf8908 gha: Add iperf network metrics
This PR adds the iperf network metrics to the GitHub Actions workflows
for kata metrics.

Fixes #7535

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-07 16:20:00 +00:00
Chelsea Mafrica
4559caf619 Merge pull request #7467 from ManaSugi/doc/use-k8-control-plane
docs: Use control-plane term instead of master
2023-08-06 23:40:51 -07:00
Fabiano Fidêncio
b365bef570 Merge pull request #7191 from wedsonaf/avoid-clones
agent: avoid unnecessary calls to `Arc::clone`
2023-08-06 15:34:07 +02:00
GabyCT
7144acb2a5 Merge pull request #7527 from GabyCT/topic/latency
metrics: Add network latency test
2023-08-04 15:54:07 -06:00
Gabriela Cervantes
66db5b5350 metrics: Add latency test to network README
This PR adds the latency test to the network README for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-04 20:27:27 +00:00
Wedson Almeida Filho
c36572418f agent: avoid unnecessary calls to Arc::clone
These calls cause two extra atomic instructions each time they're used,
one to increment and another one to decrement the refcount.

The calls are not needed because the referenced value is guaranteed to
outlive the function, so remove them.

Fixes: #7190

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 20:53:05 -03:00
Fabiano Fidêncio
8c03deac3a Merge pull request #7106 from wedsonaf/image-pulling
Image pulling on the host
2023-08-04 01:08:42 +02:00
Wedson Almeida Filho
4fbe0a3a53 runtime: bind-mount mounted block device into container
When the mounted block device isn't a layer, we want to mount it into
containers, but since it's already mounted with the correct fs (e.g.,
tar, ext4, etc.) in the pod, we just bind-mount it into the container.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
7e1b1949d4 runtime: add support for kata overlays
When at least one `io.katacontainers.fs-opt.layer` option is added to
the rootfs, it gets inserted into the VM as a layer, and the file system
is mounted as an overlay of all layers using the overlayfs driver.

Additionally, if the `io.katacontainers.fs-opt.block_device=file` option
is present in a layer, it is mounted as a block device backed by a file
on the host.

Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6c867d9e86 agent: add io.katacontainers.fs-opt.overlay-rw option
This causes the overlay-fs driver to add the `upperdir` and `workdir`
options to an overlay-fs mount so that the mount becomes writable using
a discardable directory under the container id.

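For illustration, a minimal Go sketch of the resulting overlayfs option string. The helper `overlayOptions`, the scratch base directory and the `upper`/`work` subdirectory names are hypothetical (the actual agent change is in Rust, and its directory layout is not given here); only the `lowerdir`, `upperdir` and `workdir` option names come from overlayfs itself.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// overlayOptions builds an overlayfs option string. In overlayfs the
// leftmost lowerdir entry is the topmost layer. When writable is true,
// upperdir and workdir are placed under a per-container scratch directory
// (hypothetical layout) so the writes can be discarded with the container.
func overlayOptions(layers []string, scratchBase, containerID string, writable bool) string {
	opts := []string{"lowerdir=" + strings.Join(layers, ":")}
	if writable {
		dir := filepath.Join(scratchBase, containerID)
		opts = append(opts,
			"upperdir="+filepath.Join(dir, "upper"),
			"workdir="+filepath.Join(dir, "work"))
	}
	return strings.Join(opts, ",")
}

func main() {
	layers := []string{"/layers/top", "/layers/base"}
	fmt.Println(overlayOptions(layers, "/run/scratch", "ctr1", true))
}
```

Keeping `upper` and `work` as siblings under one per-container directory normally satisfies the overlayfs requirement that both live on the same file system, and lets the whole directory be removed together.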
Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Wedson Almeida Filho
6163c35657 agent: skip mount options that start with "io.katacontainers."
This is so that file systems don't fail when we pass kata-specific
options from the snapshotter to kata.

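A minimal Go sketch of the prefix check, for illustration only: the agent itself is written in Rust, and `filterMountOptions` is a hypothetical name. Options carrying the `io.katacontainers.` prefix are set aside instead of being handed to the file system.

```go
package main

import (
	"fmt"
	"strings"
)

// kataOptPrefix marks options meant for Kata rather than for the
// file system driver.
const kataOptPrefix = "io.katacontainers."

// filterMountOptions splits options into those passed on to the file
// system and the Kata-specific ones it would otherwise reject.
func filterMountOptions(opts []string) (fsOpts, kataOpts []string) {
	for _, o := range opts {
		if strings.HasPrefix(o, kataOptPrefix) {
			kataOpts = append(kataOpts, o)
		} else {
			fsOpts = append(fsOpts, o)
		}
	}
	return fsOpts, kataOpts
}

func main() {
	fs, kata := filterMountOptions([]string{"ro", "io.katacontainers.fs-opt.overlay-rw", "noatime"})
	fmt.Println(fs, kata)
}
```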
Fixes: #7536

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 17:58:39 -03:00
Fabiano Fidêncio
fa35afa982 Merge pull request #7542 from wedsonaf/ci-fix
Use version 0.10.4 of `fuse-backend-rs`
2023-08-03 22:50:11 +02:00
Wedson Almeida Filho
b2ff97aa01 dragonball: use version 0.10.4 of fuse-backend-rs
Version 0.10.5, which was just released, breaks `nydus-storage`.

This is a workaround to fix the CI which is blocking other PRs.

Fixes: #7541

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-08-03 14:15:17 -03:00
Fabiano Fidêncio
ebdae7cfdf Merge pull request #7520 from jepio/host-systemctl
kata-deploy: Use host's systemctl
2023-08-03 13:53:28 +02:00
Manabu Sugimoto
845eeb4d7b agent: Allow clippy::redundant_clone in the unit tests
Allow `clippy::redundant_clone` in the agent's unit tests
because clippy with rustc>=1.70 reports false positives there.
These `clone()` calls are required because subsequent code
refers to the variables, but clippy's conservative and limited
analysis flags them by mistake.
Ref. https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_clone

Fixes: #7534

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-03 19:07:40 +09:00
Fabiano Fidêncio
e2755a47b8 Merge pull request #7524 from fidencio/revert-kata-deploy-changes-after-3.2.0-rc0-release
release: Revert kata-deploy changes after 3.2.0-rc0 release
2023-08-03 11:28:43 +02:00
Fabiano Fidêncio
1163fc9de2 release: Revert kata-deploy changes after 3.2.0-rc0 release
As 3.2.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-03 10:08:20 +02:00
Xuewei Niu
3958a39d07 runtime-rs: Introduce directly attachable network
As VM-based containers, Kata containers are allowed to run in the host
netns. That is, the network can be isolated at L2. Network performance
benefits from this architecture, which eliminates as many hops as
possible. We call it a Directly Attachable Network (DAN for short).

The network devices are placed in the host netns by the CNI plugins. The
configs are saved at {dan_conf}/{sandbox_id}.json in JSON format,
including device name, type, and network info. At this initial stage,
the DAN only supports host tap devices. More devices, like DPDK, will
be supported in later versions.

The file format looks like this:

```json
{
	"netns": "/path/to/netns",
	"devices": [{
		"name": "eth0",
		"guest_mac": "xx:xx:xx:xx:xx",
		"device": {
			"type": "vhost-user",
			"path": "/tmp/test",
			"queue_num": 1,
			"queue_size": 1
		},
		"network_info": {
			"interface": {
				"ip_addresses": ["192.168.0.1/24"],
				"mtu": 1500,
				"ntype": "tuntap",
				"flags": 0
			},
			"routes": [{
				"dest": "172.18.0.0/16",
				"source": "172.18.0.1",
				"gateway": "172.18.31.1",
				"scope": 0,
				"flags": 0
			}],
			"neighbors": [{
				"ip_address": "192.168.0.3/16",
				"device": "",
				"state": 0,
				"flags": 0,
				"hardware_addr": "xx:xx:xx:xx:xx"
			}]
		}
	}]
}
```

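As a sketch, a subset of this file could be decoded with Go's standard `encoding/json`. The struct and function names are hypothetical (runtime-rs itself is written in Rust), and fields not declared in the structs, such as `routes` and `neighbors`, are simply ignored by `json.Unmarshal`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
)

// danConfig mirrors a subset of the DAN config; tags follow the example JSON.
type danConfig struct {
	Netns   string      `json:"netns"`
	Devices []danDevice `json:"devices"`
}

type danDevice struct {
	Name     string `json:"name"`
	GuestMAC string `json:"guest_mac"`
	Device   struct {
		Type      string `json:"type"`
		Path      string `json:"path"`
		QueueNum  int    `json:"queue_num"`
		QueueSize int    `json:"queue_size"`
	} `json:"device"`
	NetworkInfo struct {
		Interface struct {
			IPAddresses []string `json:"ip_addresses"`
			MTU         int      `json:"mtu"`
		} `json:"interface"`
	} `json:"network_info"`
}

// danConfigPath returns {dan_conf}/{sandbox_id}.json.
func danConfigPath(danConf, sandboxID string) string {
	return filepath.Join(danConf, sandboxID+".json")
}

// parseDanConfig decodes the JSON config; unknown fields are ignored.
func parseDanConfig(data []byte) (*danConfig, error) {
	var cfg danConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	data := []byte(`{"netns":"/path/to/netns","devices":[{"name":"eth0",
		"device":{"type":"vhost-user","path":"/tmp/test","queue_num":1,"queue_size":1},
		"network_info":{"interface":{"ip_addresses":["192.168.0.1/24"],"mtu":1500}}}]}`)
	cfg, err := parseDanConfig(data)
	if err != nil {
		panic(err)
	}
	// "/run/dan" is a hypothetical {dan_conf} directory.
	fmt.Println(danConfigPath("/run/dan", "sandbox1"), cfg.Devices[0].Device.Type)
}
```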
Fixes: #1922

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-08-03 15:33:34 +08:00
David Esparza
7d1c48c881 Merge pull request #7530 from dborquez/fix_check_running_processes
metrics: stop kata components before start a metric test.
2023-08-02 23:51:27 -06:00
Zhongtao Hu
e719423262 Merge pull request #7127 from cmaf/runtime-rs-ch-blk-2
runtime-rs: Add block device handling for cloud hypervisor
2023-08-03 09:46:32 +08:00
David Esparza
1e15369e59 metrics: Improve naming testing containers in launch times test
This commit provides a new way to name the containers used
in the launch-times-test in this form:
'kata_launch_times_RANDOM_NUMBER', where RANDOM_NUMBER is
in the 0-1000 range.

Fixes: #7529

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:55 -06:00
David Esparza
5dbe88330f metrics: Clean kata components before start a metric test.
This PR kills all kata components before starting a new
metric test.

Fixes: #7528

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-08-02 17:04:51 -06:00
Fabiano Fidêncio
d424f3c595 Merge pull request #7523 from fidencio/3.2.0-rc0-branch-bump
# Kata Containers 3.2.0-rc0
2023-08-02 20:04:37 +02:00
Zvonko Kaiser
cf8899f260 Merge pull request #7494 from zvonkok/vfio-mode
vfio: Fix vfio device ordering
2023-08-02 19:45:22 +02:00
Gabriela Cervantes
3b45060b61 metrics: Add latency server yaml
This PR adds the latency server yaml for the kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:52:17 +00:00
Gabriela Cervantes
9bb8451df5 metrics: Add latency client yaml
This PR adds the latency client yaml for the kubernetes test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:50:51 +00:00
Gabriela Cervantes
64fdb98704 metrics: Add network latency test
This PR adds a network latency test for kata metrics.

Fixes #7526

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-02 16:46:48 +00:00
Chelsea Mafrica
a81ad3b587 runtime-rs: Add block device handling in cloud hypervisor
Add functions for adding a block device to a container for CH.

Fixes #6690

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-08-02 09:18:48 -07:00
David Esparza
542012c8be Merge pull request #7503 from GabyCT/topic/ghafio
metrics: Add FIO test to gha for kata metrics CI
2023-08-02 10:05:09 -06:00
David Esparza
5979f3790b Merge pull request #7516 from GabyCT/topic/addiperf
metrics: Add iperf3 network test
2023-08-02 10:04:51 -06:00
Fabiano Fidêncio
006ecce49a release: Kata Containers 3.2.0-rc0
- ci-on-push: Make the CI also run for the stable-* branches
- ci: k8s: Do not fail when gathering info on AKS nodes
- kata-deploy: enable cross build for non-x86
- runtime-rs: add support for gather metrics in runtime-rs
- kata-ctl: add monitor subcommand for runtime-rs
- release: release-note.sh: Fix typos and reference to images
- metrics: Add sysbench performance test
- Simplify implementation of runtime-rs/service

6ad16d497 release: Adapt kata-deploy for 3.2.0-rc0
025596b28 ci-on-push: Make the CI also run for the stable-* branches
7ffc0c122 static-build: enable cross build for qemu
35d6d86ab static-build: enable cross-build for image build
2205fb9d0 static-build: enable cross build for virtiofsd
11631c681 static-build: enable cross build for shim-v2
7923de899 static-build: cross build kernel
e2c31fce2 kata-deploy: enable cross build for kata deploy script
2fc5f0e2e kata-depoly: prepare env for cross build in lib.sh
f5e9985af release: release-note.sh: Fix typos and reference to images
f910c66d6 ci: k8s: Do not fail when gathering info on AKS nodes
632818176 metrics: Add k8s sysbench documentation
b3901c46d runtime-rs: ignore errors during clean up sandbox resources
5a1b5d367 metrics: Add sysbench pod yaml
ad413d164 metrics: Add sysbench dockerfile
151256011 metrics: Add sysbench performance test
62e328ca5 runtime-rs: refine implementation of TaskService
458e1bc71 runtime-rs: make send_message() as an method of ServiceManager
1cc1c81c9 runtime-rs: fix possibe bug in ServiceManager::run()
1a5f90dc3 runtime-rs: simplify implementation of service crate
731e7c763 kata-ctl: add monitor subcommand for runtime-rs The previous kata-monitor in golang could not communicate with runtime-rs to gather metrics due to different sandbox addresses. This PR adds the subcommand monitor in kata-ctl to gather metrics from runtime-rs and monitor itself.
d74639d8c kata-ctl: provide the global TIMEOUT for creating MgmtClient
02cc4fe9d runtime-rs: add support for gather metrics in runtime-rs

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
6ad16d4977 release: Adapt kata-deploy for 3.2.0-rc0
kata-deploy files must be adapted to each new release.  This happens
when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 16:59:41 +02:00
Fabiano Fidêncio
4e812009f5 Merge pull request #7519 from fidencio/topic/gha-ci-run-on-stable-branches
ci-on-push: Make the CI also run for the stable-* branches
2023-08-02 16:13:06 +02:00
Jeremi Piotrowski
3230dec950 kata-deploy: Use host's systemctl
when interacting with systemd. We have occasionally faced issues with
compatibility between the systemctl version used inside the kata-deploy
container and the systemd version on the host. Instead of using a containerized
systemctl with bind mounted sockets, nsenter the host and run systemctl from
there. This reduces the coupling between the kata-deploy container and the
host.

Fixes: #7511
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-02 15:32:01 +02:00
Fabiano Fidêncio
29855ed0c6 Merge pull request #7510 from fidencio/topic/ci-k8s-aks-do-not-fail-gathering-info
ci: k8s: Do not fail when gathering info on AKS nodes
2023-08-02 09:44:19 +02:00
Fabiano Fidêncio
025596b289 ci-on-push: Make the CI also run for the stable-* branches
As we only support one stable branch at a time, this will be used
from stable-3.2 onwards.

Fixes: #7518

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-02 09:26:24 +02:00
Fabiano Fidêncio
e1a69c0c92 Merge pull request #6586 from jongwu/cross_build
kata-deploy: enable cross build for non-x86
2023-08-02 09:11:56 +02:00
Fupan Li
1a6b27bf6a Merge pull request #5797 from Yuan-Zhuo/add-metrics-for-runtime-rs
runtime-rs: add support for gather metrics in runtime-rs
2023-08-02 13:40:22 +08:00
Fupan Li
a536d4a7bf Merge pull request #6672 from Yuan-Zhuo/add-monitor-in-kata-ctl
kata-ctl: add monitor subcommand for runtime-rs
2023-08-02 13:39:02 +08:00
Gabriela Cervantes
ad6e53c399 metrics: Modify boot time values
This PR modifies the boot time value limits.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 23:34:15 +00:00
Jianyong Wu
7ffc0c1225 static-build: enable cross build for qemu
Relying on Ubuntu's multiarch feature, we can easily set up a cross
build environment and achieve build performance as good as a native
build.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 23:28:52 +02:00
Jianyong Wu
35d6d86ab5 static-build: enable cross-build for image build
Cross building the agent with docker buildx takes too long, so we
cross build the rootfs in a container with a gcc and Rust cross-compile
toolchain for musl libc. This makes the build as fast as a native
build.

Cross building the rootfs initrd is disabled, as no Rust cross-compile
toolchain for musl libc is found for alpine, and building it with docker
buildx takes too long.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 23:28:52 +02:00
Gabriela Cervantes
f764248095 gha: Add FIO test to run metrics yaml
This PR adds the FIO test to the run metrics yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 20:29:16 +00:00
Jianyong Wu
2205fb9d05 static-build: enable cross build for virtiofsd
Use messense/rust-musl-cross, which offers a musl libc cross build
environment, to cross compile virtiofsd.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
11631c681a static-build: enable cross build for shim-v2
shim-v2 has Go and Rust code. For the Rust code, we use
messense/rust-musl-cross to speed up the build, as it doesn't depend on
QEMU emulation. The Go code is built with docker buildx, as cross build
isn't supported for it yet.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
7923de8999 static-build: cross build kernel
Prepare the cross build environment based on the current Dockerfile.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
e2c31fce23 kata-deploy: enable cross build for kata deploy script
kata-deploy-binaries-in-docker.sh is the entry point for building kata
components. Set some environment variables there to facilitate the
cross build work that follows.

Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Jianyong Wu
2fc5f0e2e0 kata-depoly: prepare env for cross build in lib.sh
We leverage three environment variables: TARGET_ARCH is the build target
tuple; ARCH has nearly the same meaning as TARGET_ARCH but is already
widely used in kata; CROSS_BUILD indicates whether to cross compile.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-08-01 22:10:46 +02:00
Fabiano Fidêncio
c0171ea0a7 Merge pull request #7508 from fidencio/topic/fix-release-notes-typos-and-references
release: release-note.sh: Fix typos and reference to images
2023-08-01 22:05:32 +02:00
Gabriela Cervantes
58f9a57c20 metrics: Add network reference to general README metrics
This PR adds a network reference to the general metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:54:00 +00:00
Gabriela Cervantes
07694ef3ae metrics: Add Kata Containers network metrics README
This PR adds the Kata Containers network metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:49:09 +00:00
Gabriela Cervantes
d8439dba89 metrics: Add iperf3 deployment yaml
This PR adds the iperf3 deployment yaml.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:45:01 +00:00
Gabriela Cervantes
bda83cee5d metrics: Add iperf3 daemonset for k8s
This PR adds the iperf3 daemonset for k8s.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:42:15 +00:00
Gabriela Cervantes
badff23c71 metrics: Add iperf3 service yaml for k8s
This PR adds the iperf3 service yaml for k8s.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:37:19 +00:00
Gabriela Cervantes
27c02367f9 metrics: Add iperf3 network test
This PR adds the iperf3 benchmark test for kata metrics.

Fixes #7515

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-01 16:30:46 +00:00
GabyCT
a0a524efc2 Merge pull request #7486 from kata-containers/topic/addsysbench
metrics: Add sysbench performance test
2023-08-01 10:17:48 -06:00
Fabiano Fidêncio
f5e9985afe release: release-note.sh: Fix typos and reference to images
diferent -> different

And also let's make sure we escape the backticks around the kata-deploy
environment variables, otherwise bash will try to interpret those.

Fixes: #7497

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-01 12:42:03 +02:00
Fabiano Fidêncio
f910c66d6f ci: k8s: Do not fail when gathering info on AKS nodes
Otherwise the VMs may not get deleted, leaving several machines
behind.

Fixes: #7509

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-01 12:36:33 +02:00
Manabu Sugimoto
1b21a46246 docs: Use control-plane term instead of master
Replace `master` with `control-plane` in the context of K8s
because `master` is a legacy term and is no longer used.

Ref. https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint

Fixes: #7466

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-01 17:41:40 +09:00
Chao Wu
1a94aad44f Merge pull request #7480 from jiangliu/rt-service
Simplify implementation of runtime-rs/service
2023-08-01 16:05:33 +08:00
Chao Wu
2d13e2d71c Merge pull request #7504 from fidencio/topic/gha-release-fix-upload-versions-yaml
release: Fix upload-versions-yaml
2023-08-01 13:58:07 +08:00
GabyCT
b77d69aeee Merge pull request #7396 from GabyCT/topic/addghatensorflow
metrics: Enable Tensorflow metrics for kata CI
2023-07-31 17:13:24 -06:00
Fabiano Fidêncio
743291c6c4 release: Fix upload-versions-yaml
This requires the GITHUB_UPLOAD_TOKEN.  While we're here, let's also fix
the name of the action and remove the "-tarball" suffix, as it's not
really a tarball.

Fixes: #7497

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 23:57:33 +02:00
Fabiano Fidêncio
a71d35c764 Merge pull request #7499 from fidencio/topic/gha-release-ensure-stage-is-defined-for-amr64-s300x
gha: release: `stage` must be defined for arm64 / s390x yamls
2023-07-31 22:55:54 +02:00
Gabriela Cervantes
6328181762 metrics: Add k8s sysbench documentation
This PR adds k8s sysbench documentation to the general density documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 20:28:37 +00:00
Chelsea Mafrica
f74b7aba18 Merge pull request #7488 from cmaf/docs-k8s-links
docs: Update links for pods and kubelet
2023-07-31 12:44:24 -07:00
Gabriela Cervantes
8933d54428 metrics: Add FIO to gha run script
This PR adds FIO to the gha run script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:51:11 +00:00
Gabriela Cervantes
8a584589ff metrics: Add DAX FIO README
This PR adds DAX FIO README information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:42:44 +00:00
Gabriela Cervantes
21f5b65233 metrics: Add FIO information in storage general README
This PR adds FIO information to the general storage README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:33:39 +00:00
Gabriela Cervantes
69f05cf9e6 metrics: Add FIO general README
This PR adds FIO general README information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 17:30:05 +00:00
Gabriela Cervantes
87d41b3dfa metrics: Add FIO test to gha for kata metrics CI
This PR adds FIO test to gha for kata metrics CI.

Fixes #7502

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-31 16:50:16 +00:00
Pavel Mores
28e5e9c86e runtime-rs: fix number of queues handling in dragonball share fs device
Looks like a copy/paste error...

Fixes #7501

Signed-off-by: Pavel Mores <pmores@redhat.com>
2023-07-31 17:25:47 +02:00
Fabiano Fidêncio
ff8d7e7e41 Merge pull request #7496 from fidencio/topic/topic/kata-deploy-take-nfd-into-consideration-pre-work
k8s: Rely on the USING_NFD environment variable passed by the jobs
2023-07-31 14:56:15 +02:00
Fabiano Fidêncio
1b111a9aab gha: release: stage must be defined for arm64 / s390x yamls
`stage` has been added, but only hooked up to the amd64 logic, leaving
arm64 and s390x behind.

Let's fix this right now, and make sure no error occurs when passing
this down to the yaml files.

Fixes: #7497

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 14:41:35 +02:00
Fabiano Fidêncio
684a6e1a55 Revert "gha: release: stage must be a string"
This reverts commit 7c857d38c1.

I misunderstood the error given by the GitHub action; let's fix this in
the next commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 14:37:52 +02:00
Fabiano Fidêncio
99711f107f Merge pull request #7498 from fidencio/topic/gha-release-stage-must-be-a-string
gha: release: `stage` must be a string
2023-07-31 14:32:47 +02:00
Fabiano Fidêncio
7c857d38c1 gha: release: stage must be a string
Otherwise we'll face the following error as part of our GHA:
```
The workflow is not valid.
kata-containers/kata-containers/.github/workflows/release-$foo.yaml
(Line: 13, Col: 14): Invalid input, stage is not defined in the
referenced workflow.
```

Fixes: #7497

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 13:39:13 +02:00
Fabiano Fidêncio
28e171bf73 Merge pull request #7490 from fidencio/3.2.0-alpha4-branch-bump
# Kata Containers 3.2.0-alpha4
2023-07-31 13:34:15 +02:00
Fabiano Fidêncio
91e1e612c3 k8s: Rely on the USING_NFD environment variable passed by the jobs
Let's make sure we can rely on the tests passing down whether they want
to be tested using Node Feature Discovery or not.

Right now, only the TDX job has this option set to "true", all the other
jobs have this option set to "false".

We can and have to merge this one before merging the NFD related patches
as:
1) There is no harm in exporting this environment variable without
   using it
2) It will allow us to test the NFD after this one is merged, as changes
   in the yaml file, in the case of the pull_request_target event, are
   not taken into consideration before they're merged

Fixes: #7495

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 13:30:18 +02:00
Zvonko Kaiser
cddcde1d40 vfio: Fix vfio device ordering
If modeVFIO is enabled, we first need to attach the VFIO control group
device /dev/vfio/vfio and then the actual device(s). Sort the devices so
that device #1 is the VFIO control group device, followed by the actual
device(s), /dev/vfio/<group>.

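A minimal Go sketch of that ordering, assuming the devices are identified by their host paths; `orderVFIODevices` is a hypothetical helper, not the runtime's actual function. `sort.SliceStable` moves /dev/vfio/vfio to the front while keeping the relative order of the group devices.

```go
package main

import (
	"fmt"
	"sort"
)

// vfioControlDevice is the VFIO control device node.
const vfioControlDevice = "/dev/vfio/vfio"

// orderVFIODevices returns a copy of paths with /dev/vfio/vfio first and
// the /dev/vfio/<group> devices after it, in their original order.
func orderVFIODevices(paths []string) []string {
	out := append([]string(nil), paths...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i] == vfioControlDevice && out[j] != vfioControlDevice
	})
	return out
}

func main() {
	fmt.Println(orderVFIODevices([]string{"/dev/vfio/12", vfioControlDevice, "/dev/vfio/7"}))
}
```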
Fixes: #7493

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-31 11:26:27 +00:00
Fabiano Fidêncio
7edc7172c0 release: Kata Containers 3.2.0-alpha4
- tests: Add `k8s-volume` and `k8s-file-volume` tests to GHA CI
- metrics: Update boot time for kata metrics
- metrics: Add FIO report files for kata metrics
- kata-deploy: Allow runtimeclasses to be created by the daemonset
- runtime-rs: change block index to 0
- agent: fix typo in constant
- metrics: Add FIO benchmark for metrics tests
- gha: dragonball: Run only on the dragonball labeled machine
- tests: Fix `k8s-job` test
- agent,libs: Remove unused 'mut' keywords
- runtime-rs: remove unneeded 'mut' keywords
- tests: QoL improvements for running tests locally
- agent: exclude symlinks from recursive ownership change
- cache: kernel: Fix kernel caching
- runk: Add Docker guide to README
- metrics: General improvements to json.bash script
- kata-deploy: Allow shim creation based on what's passed to the daemonset
- gha: ci: Add skeleton of vfio job
- s390x: Fixing device.Bus assignment
- release: Mention the container images used to build the project
- kata-deploy-binaries: kernel_cache: Take module_dir into account
- ci: nydus: Fix typo in "source"
- gha: ci: Add no-op nydus tests to our CI
- Dragonball: migrate dragonball-sandbox crates to Kata
- ci: gha: Add cri-containerd tests (but still do not enable them)
- packaging/tools: Add kata-debug and use it as part of our CI
- cache: kernel: Consider changes in tools/packaging/kernel
- kata-deploy: Properly get the path of the versions.yaml file
- kata-deploy: Add VERSION and versions.yaml to the final tarball
- metrics: Add C-Ray performance test
- metrics: enable TensorFlow benchmark to be run on gha
- metrics: Add function to memory inside container script
- Revert "metrics: Replace backslashes used to escape double quoted key in jq expr"
- versions: Bump virtiofsd to v1.7.0
- metrics: stop hypervirsor and shim at init_env stage
- ci: k8s: Adapt "source ..." to the new location of gha-run.sh
- ci: Move `tests/integration/gha-run.sh`  to `tests/integration/kuberentes/` ... and also remove KUBECONFIG from the tdx envs
- versions: Update kernel to version v6.1.x
- agent: Fix exec hang issues with a backgroud process
- agent: Ignore already mounted dev/fs/pseudo-fs
- ci: k8s: Bring TDX tests back
- metrics: Update machine learning documentation
- gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
- tests: Add MobileNet Tensorflow performance benchmark
- metrics: replace backslashes used to escape double quoted jq key expr.
- runtime-rs: enhancement of Device Manager for network endpoints.
- feat(Tracing): tracing in Rust runtime
- runtime-rs: ignore unconfigured network interfaces
- metrics: Stop running kata-env before kata is properly installed.
- metrics: use rm -f to remove the oldest continerd config file.
- kernel: Update kernel config name
- kata-deploy: Add a debug option to kata-deploy (and also use it as part of our CI)
- runtime-rs: add parameter for propagation of (u)mount events
- kata-ctl: Move GuestProtection code to kata-sys-util
- tests: Add function before function name in common.bash for metrics
- tests: Add metrics storage documentation
- metrics: Fix metrics ts generator to treat numbers as decimals
- gha: ci: Add cri-containerd tests skeleton -- follow up 1
- dragonball/agent: Add some optimization for Makefile and bugfixes of unit tests on aarch64
- metrics: Enable blogbench test
- tests: Add machine learning performance tests
- tests: gha: ci: Add cri-containerd tests skeleton
- metrics: Enable memory inside container metrics
- tools: Use a consistent target name when building mariner initrd
- gha: ci: Gather info about the node / pods
- runtime-rs: Do not scan network if network model is "none"
- gha: k8s: tdx: Temporarily disable TDX tests
- metrics: Update memory usage script
- gha: Cancel previous jobs if a PR is updated
- gha: nightly: Fix long name of AKS clusters issue and make the CI easier to test
- README: Add badge for our Nightly CI
- gha: Do not run all the tests if only docs are updated
- bugfix: plus default_memory when calculating mem size
- gha: ci: Use github.sha to get the last commit reference
- dragonball: Don't fail if a request asks for more CPUs than allowed
- gha: ci: Fix refernce passed to checkout@v3
- gha: ci: Avoid using env also in the ci-nightly and payload-after-push
- gha: k8s: Ensure cluster doesn't exist before creating it
- gha: ci: More follow up fixes after adding a nightly CI
- tests: Enable running k8s tests on Mariner
- gha: ci: Avoid using env unless it's really needed
- gha: ci: Follow up fixes for the nightly jobs
- tests: Enable memory usage metrics tests
- gha: Add nightly jobs
- metrics: storing metrics workflow artifacts
- gha: k8s: Ensure tests are running on a specific namespace
- metrics: Adds blogbench and webtool metrics tests
- gha: dragonball: Correctly propagate PATH update
- versions: Upgrade to Cloud Hypervisor v33.0
- Convert `is_allowed`, `ttrpc_error` and `sl` to functions
- gha: release: Use a specific release of hub
- metrics: Add checkmetrics to gha-run.sh for metrics CI
- packaging: Fix indentation of build.sh script at ovmf
- doc: Add documentation for the virtualization reference architecture
- gpu: Update kernel building to the latest changes
- runtime: fix PCIe topology for GPUDirect use-case
- metrics: Add memory footprint tests
- runtime: Add "none" as a shared_fs option
- metrics: Uniformity across function names in gha-run.sh
- runtime-rs:  support physical endpoint using device manager
- runtime-rs: bugfix for direct volume path's validation.
- metrics: Fix retrieving hypervisor version on metrics
- runtime-rs: fix build error on AArch64
- checkmetrics: Add checkmetrics makefile and documentation
- docs: Add boot time metrics documentation
- runtime-rs: add support spdk/vhost-user based volume.
- static-build: Remove kata-version parameter
- dragonball: avoid obtaining lock twice in create_stdio_console
- metrics: Add checkmetrics for kata metrics CI
- metrics: enable launch-times test on gha-run metrics script
- docs: Add general metrics documentation
- add support vfio device manager
- gha: Don't automatically trigger CI
- kata-ctl: Check for vm capability
- docs: fix spelling of "crate"
- packaging: Fix indentation in init.sh script
- gha: Fix gha actions
- metrics: install kata and launch-times test
- tests: Move tests helper script to this repo
- tests: Add json script for metrics tests
- Cherry pick initramfs caching updates from CCv0
- gha: Fix format for run launchtimes metrics yaml
- tests: Add tests lib common script
- Fix deprecated virtiofsd args (go shim only)
- gha: Add base branch on SHA on pull requst
- gha: ci-on-push: Run metrics tests
- docs: Update Developer Guide
- runtime-rs: Enhance flexibility of virtio-fs config
- versions: Update firecracker version to 1.3.3
- tools: Fix no-op builds
- runtime-rs: update Cargo.lock
- gha: Fix `stage` definition in matrix
- feat(runtime): vcpu resize capability
- packaging: Remove snap package
- gha: Add new build targets for Mariner
- Dragonball: support resize memory
- Port Measured rootfs feature from CCv0 branch to main
- add support direct volume and refactor device manager
- gha: Fix gha-run.sh and unbreak CI
- kata-ctl: Switch to slog logging; add --log-level and --json-logging arguments
- log-parser: Update log parser link at README
- gha: aks: Extract `run` commands to a script
- runtime-rs: handle copy files when share_fs is not available
- agent-ctl: fix the compile error
- agent: fix the issue of exec hang with a backgroud process
- runtime-rs: bugfix: update Cargo.lock
- gha: aks: Use short SHA in cluster name
- README: Display badge for the "Publish Artefacts" job and update the Kata Containers logo
- kata-deploy: Change how we get the Ubuntu k8s key
- gha: aks: Ensure host_os is used everywhere needed
- kubernetes: add agnhost command in pod yaml
- main | release: Standardize kata static file name
- packaging: make BUILDER_REGISTRY configurable
- gha: aks: Add the host_os as part of the aks cluster's name
- kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
- gha: Fix Mariner cluster creation
- gha: Unbreak CI and fix cluster creation step
- Dragonball: support vcpu hotplug on aarch64
- runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
- runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
- gha: Create Mariner host as part of k8s tests
- netlink: Fix the issue of update_interface
- gha: Increase timeout for AKS jobs and give more time to start running the tests
- runtime: sending SIGKILL to qemu
- dragonball: convert BlockDeviceMgr and VirtioNetDeviceMgr functions to methods
- dragonball: Remove virtio-net and vsock devices gracefully
- kata-deploy: Improve shim backup / restore
- doc: Update git commands
- kata-deploy: Fix indentation on kata deploy merge script

8353aae41 ci: k8s: Rework get_nodes_and_pods_info()
6ad5d7112 ci: k8s: Do not gather node info before running the tests
5261e3a60 ci: k8s: Group messages to improve readability
9cc6b5f46 ci: k8s: Get logs from kata-deploy
9d285c622 ci: k8s: Let kata-deploy take care of the runtimeclasses
87568ed98 gha: Test split out runtimeclasses are in sync with all-in-one file
39192c608 kata-deploy: Print variables passed to the script
0e157be6f kata-deploy: Allow runtimeclasses to be created by the daemonset
a27433324 kata-deploy: Change default values of DEBUG
69535b808 kata-deploy: runtimeclass: Split out entries
9e1710674 kata-runtimeClasses: Alphabetically sort the enrties
6222bd910 tests: Add k8s-file-volume test
187a72d38 tests: Add k8s-volume test
0c8427035 metrics: Add boot time value for qemu
6520dfee3 metrics: Update boot time for kata metrics
ff2279061 metrics: Update runtime and configuration paths
a5d4e3388 metrics: Add compare virtiofsd dax script
5e937fa62 metrics: Update general FIO tests
b0bea47c5 metrics: Add makefile to report generator
73c57b9a1 metrics: Add FIO report files for kata metrics
c8fcd29d9 runtime-rs: use device manager to handle virtio-pmem
901c19225 runtime-rs: support configure vm_rootfs_driver
5d6199f9b runtime-rs: use device manager to handle vm rootfs
20f1f62a2 runtime-rs: change block index to 0
662f87539 metrics: Add general FIO makefile
c5a87eed2 tests: gha: Add timeout to cluster creation
6daeb08e6 tests: k8s: Clean up node debuggers after running
3aa6c77a0 gha: dragonball: Run only on the dragonball labeled machine
37641a543 metrics: Add example config for fio jobs
314aec73d agent: fix typo in constant
4703434b1 tests: k8s: Allow using custom resource group
350f3f70b tests: Import `common.bash` in `run_kubernetes_tests.sh`
d7f04a64a tests: k8s: Leave `runtimeclass_workloads/` alone
bdde6aa94 tests: k8s: Split deployment and testing commands
91a0b3b40 tests: aks: Simply delete cluster when cleaning up
3c1044d9d metrics: Update FIO paths for k8s runner
6177a0db3 metrics: Add env files for FIO
a45900324 metrics: Add fio exec
ea198fddc metrics: Add FIO runner k8s
8f7ef41c1 metrics: Add FIO vendor code
6293c17bd metrics: Add FIO benchmark for metrics tests
ff4cfcd8a runk: Add Docker guide to README
c8ac56569 cache: kernel: Harmonize commit with fetching side
81775ab1b cache: kernel: Fix SEV kernel caching
717f775f3 gha: ci: Add skeleton of vfio job
b9f100b39 agent,libs: Remove unused 'mut' keywords
a56f96bb2 kata-deploy: Allow shim creation based on what's passed to the daemonset
4a5ab38f1 metrics: General improvements to json.bash script
d4eba3698 kata-deploy-binaries: kernel_cache: Take module_dir into account
b7c9867d6 release: Mention the container images used to build the project
7c4b59781 ci: nydus: Fix typo in "source"
6a680e241 gha: ci: Add placeholder for the nydus tests as part of the CI
fb4f7a002 gha: nydus: Add a no-op GHA for nydus
4a207a16f gha: nydus: Bring tests as they are from the tests repo
2c8f83424 runtime-rs: remove unneeded 'mut' keywords
1fc715bc6 s390x: Add AP Attach/Detach test
e91f5edba ci: cri-containerd: Fix default typo for testContainerStart()
8b8aef09a ci: cri-containerd: Temporarily disable TestContainerSwap
56767001c ci: cri-containerd: Add namespace / uid to the pods
a84773652 ci: cri-containerd: Always use sudo to call crictl
99ba86a1b ci: cri-containerd: Add /usr/local/go/bin to the PATH
7f3b30999 ci: cri-containerd: Add `function` before each function
fde22d6bc ci: cri-containerd: Assume podman is always used
9465a0496 ci: cri-containerd: Adapt "source ..." to this repo
df8d14411 ci: cri-containerd: Remove CI variable
f90570aef ci: cri-containerd: Remove unused runc_runtime_bin
c3637039f ci: cri-containerd: Remove KILL_VMM_TEST env var
bc4919f9b ci: cri-containerd: Always run shim-v2 tests
f9e332c6d ci: cri-containerd: Stop cloning containerd
cfd662fee ci: cri-containerd: Remove ununsed SNAP_CI var
d36c3395c ci: cri-containerd: Update copyright
b5be8a4a8 ci: cri-containerd: Move integration-tests.sh as it was
f2e00c95c ci: cri-containerd: Populate install_dependencies()
897955252 versions: Add "latest" field for cri-tools
1bbcbafa6 ci: Add clone_cri_container()
f66c68a2b ci: Add install_cri_tools()
4dd828414 ci: Add install_cri_containerd()
ad47d1b9f ci: Add download_github_project_tarball()
788c562a9 ci: Add get_latest_patch_release_from_a_github_project()
6742f3a89 ci: Use `function` before each install_go.sh function
5eacecffc ci: Adjust paths for install_go.sh
8ed1595f9 ci: Update copyright for install_go.sh
6123d0db2 ci: Move install_go.sh as it was
8653be71b ci: Do not take cross-build into consideration for kata-arch.sh
6a76bf92c ci: Fix style / identation if kata-arch.sh
72743851c ci: Add `function` before each kata-arch.sh function
9f6d4892c ci: Update copyright for kata-arch.sh
6f73a7283 ci: Move kata-arch.sh as it was
3615d7343 ci: Add get_from_kata_deps()
34779491e gha: kubernetes: Avoid declaring repo_root_dir
f3738beac tests: Use $HOME/go as fallback for $GOPATH
b87ed2741 tests: Move `ensure_yq` to common.bash
124e39033 tests: common: Fix quoting when globbing
db77c9a43 tests: Make install_kata take care of the links
13715db1f tests: Do not call `install_check_metrics` when installing kata
630634c5d ci: k8s: Group logs to make them easier to read
228b30f31 ci: k8s: Gather node info during the cleanup
81f99543e ci: k8s: Cleanup cluster before deleting it
38a7b5325 packaging/tools: Add kata-debug
ae6e8d2b3 kata-deploy: Properly get the path of the versions.yaml file
309e23255 cache: kernel: Consider changes in tools/packaging/kernel
59fdd69b8 kata-deploy: Add VERSION and versions.yaml to the final tarball
5dddd7c5d release: Upload versions.yaml as part of the release
bad3ac84b metrics: Rename C-Ray to cpu performance tests
87d99a71e versions: Remove "kernel-experimental"
545de5042 vfio: Fix tests
62aa6750e vfio: Added better handling of VFIO Control Devices
dd422ccb6 vfio: Remove obsolete HotplugVFIOonRootBus
114542e2b s390x: Fixing device.Bus assignment
371a118ad agent: exclude symlinks from recursive ownership change
e64edf41e metrics: Add tensorflow function in gha-run script
67a6fff4f metrics: Enable tensorflow benchmark on gha
01450deb6 Revert "metrics: Replace backslashes used to escape double quoted key in jq expr."
843006805 metrics: Add function to memory inside container script
bbd3c1b6a Dragonball: migrate dragonball-sandbox crates to Kata
fad801d0f ci: k8s: Adapt "source ..." to the new location of gha-run.sh
55e2f0955 metrics: stop hypervirsor and shim at init_env stage
556e663fc metrics: Add disk link to general metrics README
98c121709 metrics: Add C-Ray README
8e7d9926e metrics: Add C-Ray Dockerfile
e2ee76978 metrics: Add C-Ray performance test
2ee2cd307 ci: k8s: Move gha-run.sh to the kubernetes dir
88eaff533 ci: tdx: Adjust KUBECONFIG
c09e268a1 versions: Downgrade SEV(-SNP) kernel back to v5.19.x
6a7a32365 versions: Bump virtiofsd to v1.7.0
ac5f5353b ci: k8s: Bring TDX tests back
950b89ffa versions: Update kernel to version v6.1.38
8ccc1e5c9 metrics: Update machine learning documentation
f50d2b066 gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
620b94597 metrics: Add Tensorflow Mobilenet documentation
6c91af0a2 agent: Fix exec hang issues with a backgroud process
59f4731bb metrics: Stop running kata-env before kata is properly installed.
468f017e2 metrics: Replace backslashes used to escape double quoted key in jq expr.
64f013f3b ci: k8s: Enable debug when running the tests
8f4b1df9c kata-deploy: Give users the ability to run it on DEBUG mode
2c8dfde16 kernel: Update kernel config name
150e54d02 runtime-rs: ignore unconfigured network interfaces
3ae02f920 metrics: use rm -f to remove older continerd config file.
a864d0e34 tests: Add tensorflow mobilenet dockerfile
788d2a254 tests: Add tensorflow mobilenet performance test
3fed61e7a tests: Add storage link to general metrics documentation
b34dda4ca tests: Add storage blogbench metrics documentation
6787c6390 runtime-rs: add parameter for propagation of (u)mount events
6e5679bc4 tests: Add function before function name in common.bash for metrics
62080f83c kata-sys-util: Fix compilation errors
02d99caf6 static-checks: Make cargo clippy pass.
982420682 agent: Make the static checks pass for agent
61e4032b0 kata-ctl: Remove all utility functions to get platform protection
a24dbdc78 kata-sys-util: Move utilities to get platform protection
dacdf7c28 kata-ctl: Remove cpu related functions from kata-ctl
f5d195717 kata-sys-util: Move additional functionality to cpu.rs
304b9d914 kata-sys-util: Move CPU info functions
7319cff77 ci: cri-containerd: Add LTS / Active versions for containerd
2a957d41c ci: cri-containerd: Export GOPATH
75a294b74 ci: cri-containerd: Ensure deps are installed
6924d14df metrics: Fix metrics ts generator to treat numbers as decimals
9e048c8ee checkmetrics: Add blogbench read value for qemu
2935aeb7d checkmetrics: Add blogbench write value for qemu
02031e29a checkmetrics: Add blogbench read value for clh
107fae033 checkmetrics: Add blogbench write value for clh
8c75c2f4b metrics: Update blogbench Dockerfile
49723a9ec metrics: Add double quotes to variables
dc67d902e metrics: Enable blogbench test
438fe3b82 gha: ci: Add cri-containerd tests skeleton
bd08d745f tests: metrics: Move metrics specific function to metrics gha-run.sh
3ffd48bc1 tests: common: Move a few utility functions to common.bash
7f961461b tests: Add machine learning README
bb2ef4ca3 tests: Add `function` before each function
063f7aa7c tests: Add Pytorch Dockerfile
1af03b9b3 tests: Add Pytorch performance test
4cecd6237 tests: Add tensorflow Dockerfile
c4094f62c tests: Add metrics machine learning performance tests
89b622dcb gha: k8s: tdx: Temporarily disable TDX tests
8c9d08e87 gha: ci: Gather info about the node / pods
283f809dd runtime-rs: Enhancing Device Manager for network endpoints.
a65291ad7 agent: rustjail: update test_mknod_dev
46b81dd7d agent: clippy: fix cargo clippy warnings
c4771d9e8 agent: Makefile: enable set SECCOMP dynamically
a88212e2c utils.mk: update BUILD_TYPE argument
883b4db38 dragonball: fix cargo test on aarch64
6822029c8 runtime-rs: Do not scan network if network model is "none"
ce54e43eb metrics: Update memory usage script
fbc2a91ab gha: Cancel previous jobs if a PR is updated
307cfc8f7 tools: Use a consistent target name when building mariner initrd
d780cc08f gha: nightly: Also use `workflow_dispatch` to trigger it
b99ff3026 gha: nightly: Fix name size limit for AKS
aedc586e1 dragonball: Makefile: add coverage target
310e069f7 checkmetrics: Enable checkmetrics for memory inside test
1363fbbf1 README: Add badge for our Nightly CI
1776b18fa gha: Do not run all the tests if only docs are updated
28c29b248 bugfix: plus default_memory when calculating mem size
0c1cbd01d gha: ci: after-push: Use github.sha to get the last commit reference
37a955678 gha: ci: nightly: Use github.sha to get the last commit reference
ed23b47c7 tracing: Add tracing to runtime-rs
96e9374d4 dragonball: Don't fail if a request asks for more CPUs than allowed
38f0aaa51 Revert "gha: k8s: dragonball: Skip k8s-number-cpus"
828a72183 gha: k8s: dragonball: Skip k8s-oom
a79505b66 gha: k8s: dragonball: Skip k8s-number-cpus
275c84e7b Revert "agent: fix the issue of exec hang with a backgroud process"
2be342023 checkmetrics: Add memory usage inside container value for qemu
6ca34f949 checkmetrics: Add memory inside container value for clh
6c6892423 metrics: Enable memory inside container metrics
0ad298895 gha: ci: Fix refernce passed to checkout@v3
86904909a gha: ci: Avoid using env also in the ci-nightly and payload-after-push
f72cb2fc1 agent: Remove shadowed function, add slog-term
1d05b9cc7 gha: ci: Pass down secrets to ci-on-push / ci-nightly
c5b4164cb gha: ci: Fix tarball-suffix passed to the metrics tests
07810bf71 agent: Ignore already mounted dev/fs/pseudo-fs
11e3ccfa4 gha: ci: Avoid using env unless it's really needed
c45f646b9 gha: k8s: Ensure cluster doesn't exist before creating it
1a7bbcd39 gha: ci: Fix typo pull_requesst -> pull_request
ddf4afb96 gha: ci: Fix set-fake-pr-number job
8a0a66655 gha: ci: schedule expects a list, not a map
5c0269dc5 gha: ci: Add pr-number input to the correct job
de83cd9de gha: ci: Use $VAR instead of ${{ env.VAR }}
6acce83e1 metrics: Fix the call to check_metrics function
e067d1833 gha: Add a nightly CI job
7c0de8703 gha: k8s: Ensure tests are running on a specific namespace
106e30571 gha: Create a re-usable `ci.yaml` file
cc3993d86 gha: Pass event specific info from the caller workflow
4e396e728 metrics: Add function keyword to to helper metrics functions
1ca17c2f7 metrics: storing metrics workflow artifacts
5a61065ab checkmetrics: Add checkmetrics value for memory usage in qemu
78086ed1f checkmetrics: Add memory usage value for clh
1c3dbafbf metrics: Fix function of how to retrieve multiple values
18968f428 metrics: Add function to have uniformity
35d096b60 metrics: Adds blogbench and webtool metrics tests
d8f90e89d metrics: Rename function at memory usage script
b9d66e0d5 metrics: Fix double quotes variables in memory usage script
476a11194 tests: Enable memory usage metrics tests
b568c7f7d tests/integration: Provide default value for KATA_HOST_OS
d6e96ea06 tests/integration: Use AzureLinux instead of Mariner
40c46c75e tests/integration: Perform yq install in run_tests()
d8b8f7e94 metrics: Enable launch tests time metrics
72fd562bd gha: release: Use a specific release of hub
0502354b4 checkmetrics: Add checkmetrics json for qemu
b481ef188 makefile: Add -buildvcs=false flag to go build
e94aaed3c ci_worker: Add checkmetrics ci worker for cloud hypervisor
917576e6f metrics: Add double quotes in all variables
cc8f0a24e metrics: Add checkmetrics to gha-run.sh for metrics CI
477856c1e gha: dragonball: Correctly propagate PATH update
1c211cd73 gha: Swap asset/release in build matrix
0152c9aba tools: Introduce `USE_CACHE` environment variable
2b5975689 tests: Build CLH with glibc for Mariner
80c78eadc tests: Use baked-in kernel with Mariner
532755ce3 tests: Build Mariner rootfs initrd
6a21e20c6 runtime: Add "none" as a shared_fs option
5681caad5 versions: Upgrade to Cloud Hypervisor v33.0
b2ce8b4d6 metrics: Add memory footprint tests to the CI
d035955ef doc: Add documentation for the virtualization reference architecture
0f454d0c0 gpu: Fixing typos for PCIe topology changes
6bb2ea819 packaging: Fix indentation of build.sh script at ovmf
0504bd725 agent: convert the `sl` macros to functions
0860fbd41 agent: convert the `ttrpc_error` macro to a function
0e5d6ce6d agent: convert the `is_allowed` macro to a function
f680fc52b agent: change `AGENT_CONFIG`'s lazy type to just `AgentConfig`
beb706368 metrics: Uniformity across function names
1f3e837e4 runtime-rs: fix build error on AArch64
6fd25968c runtime-rs: bugfix for direct volume path's validation.
415578cf3 docs: Add general README
bff4672f7 runtime-rs: support physical endpoint using device manager
32cba7e44 metrics: Fix retrieving hypervisor version on metrics
aa7946de4 checkmetrics: Add general checkmetrics documentation
2fac2b72f checkmetrics: Add checkmetrics makefile
e45899ae0 docs: Add time tests documentation reference
28130d3ce docs: Add boot time metrics documentation
0df2fc270 runtime-rs: add support spdk/vhost-user based volume.
17198089e vendor: Add vendor checkmetrics dependencies
f1dfea6e8 docs: Add metrics documentation reference
8330fb8ee gpu: Update unit tests
859359424 metrics: enable launch-times test on gha-run metrics script
c4ee601bf metrics: Add checkmetrics for kata metrics CI
e0d6475b4 gha: Don't automatically trigger CI
b535c7cbd tests: Enable running k8s tests on Mariner
71071bdb6 docs: Add general metrics documentation
610f7986e check: Relax the unrestricted_guest check when running in a VM
1b406b9d0 kata-ctl:Implement functionality to check host is capable of running VM
adf88eaa8 static-build: Remove kata-version parameter
09720babc docs: fix spelling of "crate"
7185afc50 gha: Fix gha actions
21294b868 packaging: Fix indentation in init.sh script
fad3ac9f5 metrics: install kata and launch-times test
4bbfcfaf1 tests: Move tests helper script to this repo
f152f0e8c metrics: Add launch-times to metrics tests
59510cfee runtime-rs: add support vfio device based volume
1e3b372bb runtime-rs: add support vfio device manager
6b0848930 gha: Fix format for run launchtimes metrics yaml
3cefa43e7 tests: Add json script for metrics tests
6a3710055 initramfs: Build dependencies as part of the Dockerfile
aa2380fdd packaging: Add infra to push the initramfs builder image
1c7fcc6cb packaging: Use existing image to build the initramfs
a43ea24df virtiofsd: Convert legacy `-o` sub-options to their `--` replacement
8e00dc694 virtiofsd: Drop `-o no_posix_lock`
2a15ad978 virtiofsd: Stop using deprecated `-f` option
c3043a6c6 tests: Add tests lib common script
b16e0de73 gha: Add base branch on SHA on pull requst
72f2cb84e gpu: Reset cold or hot plug after overriding
fbacc0964 gpu: PCIe topology, consider vhost-user-block in Virt
bc152b114 gha: ci-on-push: Run metrics tests
dad731d5c docs: Update Developer Guide
b11246c3a gpu: Various fixes for virt machine type
40101ea7d vfio: Added annotation for hot(cold) plug
8f0d4e261 vfio: Cleanup of Cold and Hot Plug
b5c4677e0 vfio: Rearrange the bus assignemnt
b1aa8c8a2 gpu: Moved the PCIe configs to drivers
55a66eb7f gpu: Add config to TOML
da42801c3 gpu: Add config settings tests for hot-plug
de39fb7d3 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
9318e022a gpu: Add CC relates configs
b7932be4b gpu: Add Arm64 Kernel Settings
211b0ab26 gpu: Update Kernel Config
5f103003d gpu: Update kernel building to the latest changes
35e4938e8 tools: Fix no-op builds
347385b4e runtime-rs: Enhance flexibility of virtio-fs config
21d227853 versions: Update firecracker version to 1.3.3
0e2379909 gha: Fix `stage` definition in matrix
ae2cfa826 doc: add vcpu handlint doc for runtime-rs
7b1e67819 fix(clippy): fix clippy error
67972ec48 feat(runtime-rs): calculate initial size
aaa96c749 feat(runtime-rs): modify onlineCpuMemRequest
d66f7572d feat(runtime-rs): clear cpuset in runtime side
a0385e138 feat(runtime-rs): update linux resource when stop_process
a39e1e6cd feat(runtime-rs): merge the update_cgroups in update_linux_resources
fa6dff9f7 feat(runtime-rs): support vcpu resizing on runtime side
8cb4238b4 packaging: Remove snap package
213773998 runtime-rs: update Cargo.lock
56d2ea9b7 kata-ctl: Refactor kernel module check
9f7a45996 gha: Add `rootfs-initrd-mariner` build target
f28a62164 gha: Add `cloud-hypervisor-glibc` build target
8fb7ab751 dragonball: introduce virtio-balloon device
7ed949497 dragonball: introduce virtio-mem device
776a15e09 runtime-rs: add support direct volume.
a8e0f51c5 dragonball: extend DeviceOpContext
abae11404 runtime-rs: refactor device manager implementation
210a15794 dragonball: avoid obtaining lock twice in create_stdio_console
69668ce87 tests: gha-run: Use correct env variable for repo
f487199ed gha: aks: Fix argument in call to gha-run.sh
f6afae9c7 packaging: Add rootfs-image-tdx-tarball target
f62b2670c config: Add root hash value and measure config to kernel params
008058807 kernel: Integrate initramfs into Guest kernel
28b264562 initramfs: Add build script to generate initramfs
5cb02a806 image-build: generate root hash as an separate partition for rootfs
31c0ad207 packaging: Add cryptsetup support in Guest kernel and rootfs
980d084f4 log-parser: Update log parser link at README
410bc1814 agent-ctl: fix the compile error
77519fd12 kata-ctl: Switch to slog logging; add --log-level, --json-logging args
aab603096 gha: aks: Extract `run` commands to a script
e4eb664d2 runtime-rs: update rust to 1.69.0
ed37715e0 runtime-rs: handle copy files when share_fs is not available
5f6fc3ed7 runtime-rs: bugfix: update Cargo.lock
1c6d22c80 gha: aks: Use short SHA in cluster name
3c1f6d36d readme: Update Kata Containers logo
388684113 readme: Add status badge for the "Publish Artefacts" job
26f752038 kata-deploy: Change how we get the Ubuntu k8s key
aebd3b47d gha: aks: Ensure host_os is used everywhere needed
0c8282c22 gha: aks: Add the host_os as part of the aks cluster's name
4b89a6bda release: Standardize kata static file name
9228815ad  kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
03027a739 gha: Fix Mariner cluster creation
43e73bdef packaging: make BUILDER_REGISTRY configurable
ffe3157a4 dragonball: add arm64 patches for upcall
560442e6e dragonball: add vcpu_boot_onlined vector
e31772cfe dragonball: add support resize_vcpu on aarch64
64c764c14 dragonball: update dbs-boot to v0.4.0
fd9b41464 dragonball: update comment for init_microvm
af16d3fca gha: Unbreak CI and fix cluster creation step
5ddc4f94c runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
25d2fb0fd agent: fix the issue of exec hang with a backgroud process
4af4ced1a gha: Create Mariner host as part of k8s tests
eee7aae71 runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
557b84081 gha: aks: Wait longer to start running the tests
c04c872c4 gha: aks: Increase the timeout time
428041624 kata-deploy: Improve shim backup / restore
14c3f1e9f kata-deploy: Fix indentation on kata deploy merge script
0e47cfc4c runtime: sending SIGKILL to qemu
6a0035e41 doc: Update git commands
433b5add4 kubernetes: add agnhost command in pod yaml
c477ac551 dragonball: Convert VirtioNetDeviceMgr function to method
4659facb7 dragonball: Convert BlockDeviceMgr function to method
ee6deef09 dragonball: Remove virtio-net and vsock devices gracefully
2bda92fac netlink: Fix the issue of update_interface

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-31 09:02:07 +02:00
Jiang Liu
b3901c46d6 runtime-rs: ignore errors during clean up sandbox resources
Ignore errors during sandbox resource cleanup as much as we can.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-31 13:07:43 +08:00
Chelsea Mafrica
8a2c201719 docs: Update links for pods and kubelet
The links for pods and kubelet no longer work, so update them to new links
with relevant info.

Fixes #7487

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-07-29 00:38:35 +00:00
Gabriela Cervantes
5a1b5d3672 metrics: Add sysbench pod yaml
This PR adds the sysbench pod yaml for the sysbench performance test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 20:03:15 +00:00
Gabriela Cervantes
ad413d1646 metrics: Add sysbench dockerfile
This PR adds the sysbench Dockerfile.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 19:58:10 +00:00
Gabriela Cervantes
1512560111 metrics: Add sysbench performance test
This PR adds the sysbench performance test for kata CI.

Fixes #7485

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 19:54:12 +00:00
Gabriela Cervantes
bee1a628bd metrics: Fix json result for tensorflow
This PR fixes the json result for tensorflow.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 17:02:16 +00:00
Jiang Liu
62e328ca5c runtime-rs: refine implementation of TaskService
Refine the implementation of TaskService by making handler_message() a
method.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:33 +08:00
Jiang Liu
458e1bc712 runtime-rs: make send_message() as an method of ServiceManager
Simplify the implementation by making send_message() a method of
ServiceManager.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:31 +08:00
Jiang Liu
1cc1c81c9a runtime-rs: fix possibe bug in ServiceManager::run()
Multiple instances of the task service may get registered by
ServiceManager::run(); fix this by making the operation symmetric.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:30 +08:00
Jiang Liu
1a5f90dc3f runtime-rs: simplify implementation of service crate
Simplify the implementation of the service crate.

Fixes: #7479

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-07-29 00:47:28 +08:00
Gabriela Cervantes
51cd99c927 metrics: Round axelnet and resnet results
This PR rounds the AlexNet and ResNet results so that the result can be
extracted properly.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
3b883bf5a7 metrics: Fix atoi invalid syntax
This PR avoids the strconv.Atoi parsing error when we are retrieving
the results from the json.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
f9dec11a8f checkmetrics: Move checkmetrics to gha-run script
This PR moves checkmetrics to the gha-run script in order to gather
the tensorflow information.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
53af71cfd0 checkmetrics: Add AlexNet value for qemu
This PR adds the AlexNet value for qemu to checkmetrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
a435d36fe1 checkmetrics: Add Resnet value for qemu
This PR adds the Resnet value for qemu to checkmetrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
a79a3a8e1d checkmetrics: Add alexnet value for clh
This PR adds the AlexNet value for clh to checkmetrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
3c32875046 checkmetrics: Add Resnet value for clh
This PR adds the checkmetrics Resnet value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
08dfaa97aa metrics: General improvements to the tensorflow script
This PR adds general improvements to the tensorflow script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Gabriela Cervantes
63b8534b41 metrics: Enable Tensorflow metrics for kata CI
This PR enables the Tensorflow benchmark metrics for kata CI.

Fixes #7395

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-28 16:15:22 +00:00
Aurélien
e8f8641988 Merge pull request #7132 from sprt/aks-volume-tests
tests: Add `k8s-volume` and `k8s-file-volume` tests to GHA CI
2023-07-28 08:58:03 -07:00
Fabiano Fidêncio
68b9acfd02 Merge pull request #7474 from GabyCT/topic/upboo
metrics: Update boot time for kata metrics
2023-07-28 17:55:43 +02:00
David Esparza
f89abcbad8 Merge pull request #7473 from GabyCT/topic/addfioreport
metrics: Add FIO report files for kata metrics
2023-07-28 09:37:21 -06:00
Fabiano Fidêncio
c9742d6fa9 Merge pull request #7411 from fidencio/topic/kata-deploy-create-runtime-classes
kata-deploy: Allow runtimeclasses to be created by the daemonset
2023-07-28 16:05:49 +02:00
Yuan-Zhuo
731e7c763f kata-ctl: add monitor subcommand for runtime-rs
The previous kata-monitor, written in Go, could not communicate with runtime-rs
to gather metrics because the sandbox addresses differ.
This PR adds a monitor subcommand to kata-ctl that gathers metrics from
runtime-rs and from the monitor itself.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:30:08 +08:00
Yuan-Zhuo
d74639d8c6 kata-ctl: provide the global TIMEOUT for creating MgmtClient
Several functions in kata-ctl need to establish a connection with runtime-rs through MgmtClient.
This PR provides a global TIMEOUT to avoid multiple definitions.

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:23:37 +08:00
Yuan-Zhuo
02cc4fe9db runtime-rs: add support for gather metrics in runtime-rs
1. Implemented metrics collection for the runtime-rs shim and the dragonball hypervisor.
2. Described the currently supported metrics in runtime-rs (docs/design/kata-metrics-in-runtime-rs.md).

Fixes: #5017

Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
2023-07-28 17:16:51 +08:00
Fabiano Fidêncio
8353aae41a ci: k8s: Rework get_nodes_and_pods_info()
The amount of info we've added seemed unnecessary, and ends up making
our lives even harder when trying to find errors.

Let's just rely on the kata-debug container to collect the needed info
for us.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
6ad5d7112e ci: k8s: Do not gather node info before running the tests
It has proven not to be useful, and it ends up making things more
confusing due to the amount of logs printed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
5261e3a60c ci: k8s: Group messages to improve readability
Right now it's getting way too easy to get lost in the logs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
9cc6b5f461 ci: k8s: Get logs from kata-deploy
Let's make sure we can debug kata-deploy in case something goes wrong
during its execution.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
9d285c6226 ci: k8s: Let kata-deploy take care of the runtimeclasses
By doing this we can test the change done for the daemonset. :-)

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
87568ed985 gha: Test split out runtimeclasses are in sync with all-in-one file
This is needed in order to not lose track of what's been created and
what's been added here and there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
39192c6084 kata-deploy: Print variables passed to the script
This will help folks to debug / understand what's been passed to the
kata-deploy.sh script.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
0e157be6f2 kata-deploy: Allow runtimeclasses to be created by the daemonset
Let's allow the daemonset to create the runtimeclasses. This removes one
manual step a kata-deploy user has to take, and it also helps us in the
Confidential Containers land, as the Operator can just delegate it to
this script.

Fixes: #7409

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 10:04:33 +02:00
Fabiano Fidêncio
a274333248 kata-deploy: Change default values of DEBUG
This can be easily done as there was no official release with the
previous values.

The reason we're doing so is that, when using `yq` to replace the
value, even when forcing `--tag '!!str' "yes"`, the content is placed
without quotes, causing errors in our CI.

While here, we're also removing the fallback value for DEBUG, as it is
**always** set in the kata-deploy.yaml file.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 09:50:39 +02:00
Fabiano Fidêncio
69535b8089 kata-deploy: runtimeclass: Split out entries
This makes it simpler to create only the handlers defined by the
kata-deploy user.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 09:43:45 +02:00
Fabiano Fidêncio
9e1710674a kata-runtimeClasses: Alphabetically sort the entries
This will come in handy in the near future, as we want to have separate
entries for each file, while still keeping this one.

Having the entries sorted will make it easier to test that those are
always in sync.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-28 09:43:45 +02:00
Zhongtao Hu
61a8eabf8e Merge pull request #7139 from openanolis/fix/devmanager
runtime-rs: change block index to 0
2023-07-28 14:04:19 +08:00
Aurélien Bombo
6222bd9103 tests: Add k8s-file-volume test
This imports the k8s-file-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-27 14:07:55 -07:00
Aurélien Bombo
187a72d381 tests: Add k8s-volume test
This imports the k8s-volume test from the tests repo and modifies it
slightly to set up the host volume on the AKS host.

Fixes: #6566

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-27 14:06:43 -07:00
Gabriela Cervantes
0c84270357 metrics: Add boot time value for qemu
This PR adds the boot time value and limit for qemu.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 20:06:24 +00:00
Gabriela Cervantes
6520dfee37 metrics: Update boot time for kata metrics
This PR updates the boot time limit for kata metrics.

Fixes #7475

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 19:14:19 +00:00
Gabriela Cervantes
ff22790617 metrics: Update runtime and configuration paths
This PR updates the runtime and configuration paths for kata containers.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 17:14:03 +00:00
Gabriela Cervantes
a5d4e33880 metrics: Add compare virtiofsd dax script
This PR adds the compare virtiofsd dax script for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 16:53:50 +00:00
Gabriela Cervantes
5e937fa622 metrics: Update general FIO tests
This PR updates general FIO tests by adding the recent date of a change.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 16:47:17 +00:00
Gabriela Cervantes
b0bea47c53 metrics: Add makefile to report generator
This PR adds a makefile for the report generator of the FIO test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 16:42:11 +00:00
Gabriela Cervantes
73c57b9a19 metrics: Add FIO report files for kata metrics
This PR adds FIO report files for kata metrics.

Fixes #7472

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-27 16:39:35 +00:00
Chelsea Mafrica
e941b3a094 Merge pull request #7456 from alakesh/agent-fix-typo
agent: fix typo in constant
2023-07-27 09:31:24 -07:00
David Esparza
ba8a8fcbf2 Merge pull request #7442 from GabyCT/topic/addgofilesfio
metrics: Add FIO benchmark for metrics tests
2023-07-27 10:20:43 -06:00
Zhongtao Hu
c8fcd29d9b runtime-rs: use device manager to handle virtio-pmem
Use the device manager to handle the virtio-pmem device.

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-07-27 20:18:49 +08:00
Zhongtao Hu
901c192251 runtime-rs: support configure vm_rootfs_driver
Support configuring vm_rootfs_driver in the TOML config.

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-07-27 20:12:53 +08:00
Zhongtao Hu
5d6199f9bc runtime-rs: use device manager to handle vm rootfs
Use the device manager to handle the VM rootfs. After attaching the
block device of the VM rootfs, we need to increment the index number.

Fixes: #7119
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-07-27 20:12:45 +08:00
James O. D. Hunt
20f1f62a2a runtime-rs: change block index to 0
Change block index in SharedInfo to 0 for vda.

Fixes #7119

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-07-27 20:11:44 +08:00
Chao Wu
ede1dae65d Merge pull request #7465 from fidencio/topic/fix-dragonball-static-check-runner-selector
gha: dragonball: Run only on the dragonball labeled machine
2023-07-27 10:19:26 +08:00
Gabriela Cervantes
662f87539e metrics: Add general FIO makefile
This PR adds a general FIO makefile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-26 20:46:02 +00:00
Fabiano Fidêncio
f28af98ac6 Merge pull request #7453 from sprt/fix-ci-node-debugger
tests: Fix `k8s-job` test
2023-07-26 22:27:21 +02:00
Fabiano Fidêncio
8a22b5f075 Merge pull request #7439 from ManaSugi/fix/remove-unused-mut
agent,libs: Remove unused 'mut' keywords
2023-07-26 21:25:41 +02:00
Fabiano Fidêncio
9792ac49fe Merge pull request #7425 from jongwu/remove_mut
runtime-rs: remove unneeded 'mut' keywords
2023-07-26 21:24:40 +02:00
Fabiano Fidêncio
24564a8499 Merge pull request #7455 from sprt/local-tests
tests: QoL improvements for running tests locally
2023-07-26 21:23:43 +02:00
Aurélien Bombo
c5a87eed29 tests: gha: Add timeout to cluster creation
This has been intermittently taking a while lately so let's add a
timeout.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-26 10:19:07 -07:00
Aurélien Bombo
6daeb08e69 tests: k8s: Clean up node debuggers after running
This deletes node debugger pods after execution since their presence may
affect tests that assume only test workload pods are present.

For example, in `k8s-job` we wait for *any* pod to be in the `Succeeded`
state before proceeding, which causes failures.

Fixes: #7452

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-26 10:19:07 -07:00
Fabiano Fidêncio
3aa6c77a01 gha: dragonball: Run only on the dragonball labeled machine
Static checks for dragonball are landing on any of the self-hosted
runners, because "self-hosted" was the label selector used.

Let's use "dragonball" instead, as the machine has that label as well.

Fixes: #7464

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-26 18:15:04 +02:00
Gabriela Cervantes
37641a5430 metrics: Add example config for fio jobs
This PR adds example config for fio jobs.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-26 16:03:12 +00:00
Alakesh Haloi
314aec73d4 agent: fix typo in constant
It fixes a constant name to have the right spelling

Fixes: #7457
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
2023-07-26 00:06:34 -05:00
Aurélien Bombo
4703434b12 tests: k8s: Allow using custom resource group
This simply allows setting a custom resource group when debugging
locally, so as to prevent name collisions and not pollute the namespace.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-25 15:45:44 -07:00
Aurélien Bombo
350f3f70b7 tests: Import common.bash in run_kubernetes_tests.sh
Not sure why this works in GHA, but the `info` call on line 65 would
fail locally.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-25 15:45:44 -07:00
Aurélien Bombo
d7f04a64a0 tests: k8s: Leave runtimeclass_workloads/ alone
Makes it so that `setup.sh` doesn't make changes in
`runtimeclass_workloads/` directly. Instead we treat that as a template
directory and we use the new directory `runtimeclass_workloads_work/` as
a work dir.

This has two advantages:

 * Allows rerunning tests without the assumption that `setup.sh` must be
   idempotent. E.g. the `set_runtime_class()` step would break.
 * Doesn't pollute your git environment with a bunch of changes when
   developing.
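The template / work-dir split can be sketched as below. The function names are hypothetical, and the `sed` call assumes GNU sed and a `runtimeClassName:` line in each workload file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: always start from a fresh copy of the template dir,
# so setup can be re-run without needing to be idempotent.
function prepare_workloads_dir() {
	local template_dir="$1"
	local work_dir="$2"

	# Throw away whatever a previous run left behind, then copy the templates.
	rm -rf "${work_dir}"
	cp -r "${template_dir}" "${work_dir}"
}

# Hypothetical helper: edits are applied only to the copy, so the tracked
# template files (and your git tree) stay untouched.
function set_runtime_class_in_dir() {
	local work_dir="$1"
	local runtime_class="$2"
	local f

	for f in "${work_dir}"/*.yaml; do
		sed -i "s/runtimeClassName: .*/runtimeClassName: ${runtime_class}/" "$f"
	done
}
```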

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-25 15:45:44 -07:00
Aurélien Bombo
bdde6aa948 tests: k8s: Split deployment and testing commands
This splits deploying Kata and running the tests into separate commands
to make it possible to rerun tests locally without having to redeploy
Kata each time.
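A minimal sketch of such a split. It assumes a single script that dispatches on its first argument, and the subcommand and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dispatcher: deploying Kata and running the tests are
# separate subcommands, so tests can be re-run without redeploying.
function deploy_kata() {
	echo "deploying kata"   # placeholder for the real deployment steps
}

function run_tests() {
	echo "running tests"    # placeholder for the real test invocation
}

function main() {
	local action="${1:-}"

	case "${action}" in
		deploy-kata) deploy_kata ;;
		run-tests) run_tests ;;
		*)
			echo "Usage: $0 {deploy-kata|run-tests}" >&2
			return 1
			;;
	esac
}
```

A real script would end with `main "$@"`, after which `run-tests` can be invoked repeatedly against an already deployed cluster.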

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-25 15:44:46 -07:00
Aurélien Bombo
91a0b3b406 tests: aks: Simply delete cluster when cleaning up
If we're going to delete the cluster anyway, no need to call
kata-cleanup.

Fixes: #7454

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-25 15:44:46 -07:00
Gabriela Cervantes
3c1044d9d5 metrics: Update FIO paths for k8s runner
This PR updates the FIO paths for k8s runner.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 20:50:03 +00:00
Eric Ernst
5385ddc560 Merge pull request #7365 from alakesh/symlink-fix
agent: exclude symlinks from recursive ownership change
2023-07-25 11:27:48 -07:00
Gabriela Cervantes
6177a0db3e metrics: Add env files for FIO
This PR adds the env files for FIO for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 17:48:45 +00:00
Gabriela Cervantes
a45900324d metrics: Add fio exec
This PR adds fio exec for the FIO benchmark.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 17:36:08 +00:00
Gabriela Cervantes
ea198fddcc metrics: Add FIO runner k8s
Add program to execute FIO workloads using k8s.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 17:34:29 +00:00
Gabriela Cervantes
8f7ef41c14 metrics: Add FIO vendor code
This PR adds the FIO vendor code.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 17:24:29 +00:00
Gabriela Cervantes
6293c17bde metrics: Add FIO benchmark for metrics tests
This PR adds the FIO benchmark scripts and resources for the metrics
tests section.

Fixes #7441

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-25 16:36:33 +00:00
Fabiano Fidêncio
cdf04e5018 Merge pull request #7437 from jepio/fix-sev-kernel-cache
cache: kernel: Fix kernel caching
2023-07-25 18:10:03 +02:00
GabyCT
7a3b55ce67 Merge pull request #7432 from ManaSugi/runk/doc-docker
runk: Add Docker guide to README
2023-07-25 09:56:02 -06:00
GabyCT
c1bd527163 Merge pull request #7430 from GabyCT/topic/fixjson
metrics: General improvements to json.bash script
2023-07-25 09:45:53 -06:00
Fabiano Fidêncio
6efd684a46 Merge pull request #7408 from fidencio/topic/kata-deploy-add-SHIMS-and-SHIM_DEFAULT-as-env
kata-deploy: Allow shim creation based on what's passed to the daemonset
2023-07-25 16:56:46 +02:00
Fabiano Fidêncio
5b82268d2c Merge pull request #7436 from jepio/vfio-gha
gha: ci: Add skeleton of vfio job
2023-07-25 14:44:04 +02:00
Manabu Sugimoto
ff4cfcd8a2 runk: Add Docker guide to README
`runk` can launch containers using Docker, so add the guide
to its README.

```sh
$ sudo dockerd --experimental --add-runtime="runk=/usr/local/bin/runk"
$ sudo docker run -it --rm --runtime runk busybox echo hello runk
hello runk
```

Fixes: #7431

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-25 20:10:49 +09:00
Jeremi Piotrowski
c8ac56569a cache: kernel: Harmonize commit with fetching side
kata-deploy-binaries.sh uses the last commit in
tools/packaging/static-build/kernel for its version check, while the cache
generation uses tools/packaging/kernel. Use tools/packaging/static-build/kernel
as $kata_config_version is already part of the version string and covers any
changes to tools/packaging/kernel.

Fixes: #7403
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-25 12:23:05 +02:00
Jeremi Piotrowski
81775ab1b3 cache: kernel: Fix SEV kernel caching
The SEV kernel cache calls create_cache_asset() twice, once for the kernel and
once for modules. Both calls need to use the same version string, otherwise the
second call overwrites the "latest" file of the first one and the cache is not
used.
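A hypothetical sketch of why the version string must be shared. Here `create_cache_asset` is a simplified stand-in, not the real helper, that records the version in a single `latest` file. If the kernel and modules calls disagreed, the last call would win and `latest` would no longer match the kernel tarball.

```shell
#!/usr/bin/env bash
# Simplified stand-in (not the real helper): store the asset and record the
# version it was built for in a single shared "latest" file.
function create_cache_asset() {
	local cache_dir="$1"
	local component="$2"
	local version="$3"

	touch "${cache_dir}/${component}-${version}.tar.xz"
	echo "${version}" > "${cache_dir}/latest"
}

# Hypothetical caller: compute the version string once and pass the same
# value to both calls, so the second one doesn't invalidate the first.
function cache_sev_kernel() {
	local cache_dir="$1"
	local version="$2"

	create_cache_asset "${cache_dir}" "kernel-sev" "${version}"
	create_cache_asset "${cache_dir}" "kernel-sev-modules" "${version}"
}
```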

Fixes: #7403
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-25 11:58:19 +02:00
Jeremi Piotrowski
717f775f30 gha: ci: Add skeleton of vfio job
This job will run on a nested virt capable Azure VM (improving test
concurrency). This is just a placeholder while we adapt the test to GHA.

Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-25 11:13:04 +02:00
Manabu Sugimoto
b9f100b391 agent,libs: Remove unused 'mut' keywords
Remove unused `mut` because the agent compilation fails
when the rust compiler is >= 1.71. This is related to #7425

Fixes: #7438

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-25 17:41:08 +09:00
Fabiano Fidêncio
a56f96bb2b kata-deploy: Allow shim creation based on what's passed to the daemonset
Instead of hardcoding shims as part of the script, let's ensure we can
allow them to be created based on environment variables passed to the
daemonset.

This brings no functional change, as the default values in the
daemonset are exactly what has been used as part of the scripts.
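A minimal sketch of how the script could consume those variables. Based on the PR name, it assumes a space-separated `SHIMS` list and a `SHIM_DEFAULT` entry that must be one of them. The real configuration steps are replaced by `echo`, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: configure one shim per entry in $SHIMS and make sure
# $SHIM_DEFAULT is one of them.
function configure_shims() {
	local shims="${SHIMS:-}"
	local default="${SHIM_DEFAULT:-}"
	local shim

	if [ -z "${shims}" ]; then
		echo "SHIMS is empty, nothing to configure" >&2
		return 1
	fi

	# Pad with spaces so we only match whole entries.
	case " ${shims} " in
		*" ${default} "*) ;;
		*)
			echo "SHIM_DEFAULT (${default}) is not in SHIMS (${shims})" >&2
			return 1
			;;
	esac

	for shim in ${shims}; do   # unquoted on purpose: split on spaces
		echo "configuring shim: kata-${shim}"
	done
	echo "default shim: kata-${default}"
}
```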

Fixes: #7407

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-25 08:30:00 +02:00
Fabiano Fidêncio
5ce0b4743f Merge pull request #7382 from zvonkok/vfio-ap-debug
s390x: Fixing device.Bus assignment
2023-07-25 08:26:25 +02:00
David Esparza
b11d618a3f Merge pull request #7413 from fidencio/topic/release-publish-builder-images
release: Mention the container images used to build the project
2023-07-24 15:46:31 -06:00
Fabiano Fidêncio
56fdeb1247 Merge pull request #7417 from fidencio/topic/kata-deploy-binaries-cached-kernel-fix
kata-deploy-binaries: kernel_cache: Take module_dir into account
2023-07-24 22:26:09 +02:00
Gabriela Cervantes
4a5ab38f16 metrics: General improvements to json.bash script
This PR adds general improvements, such as putting `function` before each
function name and declaring variables consistently, among other things,
to keep the metrics scripts uniform.

Fixes #7429

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-24 16:51:38 +00:00
Fabiano Fidêncio
d4eba36980 kata-deploy-binaries: kernel_cache: Take module_dir into account
`module_dir` has been passed to the function but was never assigned to a
var, leading to errors when trying to use it.
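A hypothetical illustration of the fix. The function name and arguments are made up for the example: the positional argument has to be bound to a local before `${module_dir}` can be used.

```shell
#!/usr/bin/env bash
# Hypothetical function: without the `local module_dir=...` line, every
# later "${module_dir}" would be empty (or an "unbound variable" error
# under `set -o nounset`).
function install_cached_kernel() {
	local kernel_version="$1"
	local module_dir="${2:-}"   # the fix: actually assign the argument

	if [ -n "${module_dir}" ]; then
		echo "kernel ${kernel_version}, modules in ${module_dir}"
	else
		echo "kernel ${kernel_version}, no modules"
	fi
}
```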

Fixes: #7416

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 18:19:13 +02:00
Fabiano Fidêncio
b7c9867d60 release: Mention the container images used to build the project
This is a small step towards build reproducibility.

Fixes: #7412

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 18:01:57 +02:00
Wainer Moschetta
2e9853c761 Merge pull request #7427 from fidencio/topic/gha-port-nydus-tests-follow-up-1
ci: nydus: Fix typo in "source"
2023-07-24 11:20:05 -03:00
Fabiano Fidêncio
7c4b597816 ci: nydus: Fix typo in "source"
We should source from `nydus_dir`, instead of `cri_containerd_dir`, and
that was a leftover from fb4f7a002c.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 14:55:09 +02:00
Fabiano Fidêncio
589672d510 Merge pull request #7426 from fidencio/topic/gha-port-nydus-tests
gha: ci: Add no-op nydus tests to our CI
2023-07-24 13:56:57 +02:00
Fabiano Fidêncio
6a680e241b gha: ci: Add placeholder for the nydus tests as part of the CI
This will trigger the nydus tests, but as they currently are, they'll
just return "okay" without actually executing.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 13:37:36 +02:00
Fabiano Fidêncio
fb4f7a002c gha: nydus: Add a no-op GHA for nydus
This newly added GHA does nothing, is not even triggered, and it's just
a placeholder that we'll grow in the next commits / PRs, so we can
actually start running the nydus tests as part of our CI.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 13:37:33 +02:00
Fupan Li
0ae987973b Merge pull request #7367 from openanolis/chao/migrate_dragonball_sandbox
Dragonball: migrate dragonball-sandbox crates to Kata
2023-07-24 17:52:11 +08:00
Fabiano Fidêncio
4a207a16f9 gha: nydus: Bring tests as they are from the tests repo
Let's bring the nydus tests, without any kind of modification, from the
tests repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-24 10:56:41 +02:00
Jianyong Wu
2c8f83424d runtime-rs: remove unneeded 'mut' keywords
These unneeded 'mut' keywords break the build with Rust 1.71.0. Remove them.

Fixes: #7424
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-07-24 08:47:15 +00:00
Zvonko Kaiser
1fc715bc65 s390x: Add AP Attach/Detach test
Now that we have proper AP device support, add a
unit test for testing the correct Attach/Detach of AP devices.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-23 13:44:19 +00:00
Fabiano Fidêncio
e1a4040a6c Merge pull request #7326 from fidencio/topic/gha-ci-add-cri-containerd-tests
ci: gha: Add cri-containerd tests (but still do not enable them)
2023-07-21 19:29:38 +02:00
Fabiano Fidêncio
6a59e227b6 Merge pull request #7399 from fidencio/topic/add-kata-debug
packaging/tools: Add kata-debug and use it as part of our CI
2023-07-21 17:05:27 +02:00
Fabiano Fidêncio
e91f5edba0 ci: cri-containerd: Fix default typo for testContainerStart()
It must be `${1:-0}`, instead of `${1-0}`.
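A minimal illustration of the difference between the two expansions. The function names are hypothetical: `${1-0}` only falls back to `0` when `$1` is unset, while `${1:-0}` also falls back when `$1` is set but empty.

```shell
#!/usr/bin/env bash
# Falls back to 0 only when $1 is unset.
function with_dash_only() {
	echo "${1-0}"
}

# Falls back to 0 when $1 is unset *or* empty.
function with_colon_dash() {
	echo "${1:-0}"
}

with_dash_only ""     # prints an empty line
with_colon_dash ""    # prints 0
```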

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
8b8aef09af ci: cri-containerd: Temporarily disable TestContainerSwap
The test is currently failing with GHA, and I don't think it makes sense
to block all the other tests to get merged while it's happening.

For now, let's disable it and re-enable it as soon as we have it
passing.

Reference: https://github.com/kata-containers/kata-containers/issues/7410

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
56767001cb ci: cri-containerd: Add namespace / uid to the pods
Otherwise crictl will fail to remove them with:
```
getting sandbox status of pod "$pod": metadata.Name, metadata.Namespace
or metadata.Uid is not in metadata "..."
```

A huge shout out to Steven Horsman for helping to debug this one.
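A sketch of a pod sandbox config carrying the metadata fields `crictl` needs to find the pod again. The helper name, uid scheme and log directory are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a pod sandbox config whose metadata includes
# name, namespace and uid, so crictl can later look the pod up to remove it.
function write_pod_config() {
	local path="$1"
	local name="$2"

	cat > "${path}" <<EOF
{
  "metadata": {
    "name": "${name}",
    "namespace": "default",
    "uid": "${name}-uid",
    "attempt": 1
  },
  "log_directory": "/tmp",
  "linux": {}
}
EOF
}
```

The resulting file could then be passed to `sudo crictl runp`.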

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
a84773652c ci: cri-containerd: Always use sudo to call crictl
Otherwise we may get the following error:
```
time="2023-07-15T21:12:13Z" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///run/containerd/containerd.sock\": rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: permission denied\""
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
99ba86a1b2 ci: cri-containerd: Add /usr/local/go/bin to the PATH
Otherwise go is not picked up.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
7f3b309997 ci: cri-containerd: Add function before each function
We've been doing this for all files moved to this repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
fde22d6bce ci: cri-containerd: Assume podman is always used
For this set of tests, we'll always be using podman in order to avoid
having containerd pulled in by docker.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
9465a04963 ci: cri-containerd: Adapt "source ..." to this repo
Let's adapt what we "source" to the kata-containers repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
df8d144119 ci: cri-containerd: Remove CI variable
We always want to run the tests using as much debug as possible.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
f90570aef0 ci: cri-containerd: Remove unused runc_runtime_bin
The variable is not used anywhere in our tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
c3637039f4 ci: cri-containerd: Remove KILL_VMM_TEST env var
We don't need the env var, we just need to restrict the test according
to the KATA_HYPERVISOR used, as right now it's very specific to QEMU.
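A minimal sketch of gating the test on the hypervisor rather than on an opt-in variable. The function name is hypothetical, and the sketch assumes the QEMU value of `KATA_HYPERVISOR` is exactly `qemu`.

```shell
#!/usr/bin/env bash
# Hypothetical gate: the kill-VMM test is QEMU-specific, so only run it
# when KATA_HYPERVISOR says we're on QEMU.
function should_run_kill_vmm_test() {
	[ "${KATA_HYPERVISOR:-}" = "qemu" ]
}

if should_run_kill_vmm_test; then
	echo "running kill-VMM test"
else
	echo "skipping kill-VMM test for hypervisor '${KATA_HYPERVISOR:-unset}'"
fi
```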

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
bc4919f9b2 ci: cri-containerd: Always run shim-v2 tests
We only have shim-v2 as the runtime type, so we always need to run tests
using it. :-)

We had to adjust the script in order to properly run the tests with the
current logic.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
f9e332c6db ci: cri-containerd: Stop cloning containerd
It's already done as part of install_dependencies().

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
cfd662fee9 ci: cri-containerd: Remove unused SNAP_CI var
We don't support SNAP anymore, thus we can remove the var.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
d36c3395c0 ci: cri-containerd: Update copyright
As we're touching the file already, let's update its Copyright info.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
b5be8a4a8f ci: cri-containerd: Move integration-tests.sh as it was
Let's move the `integration/containerd/cri/integration-tests.sh` file
from the tests repo to this one.

The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
f2e00c95c0 ci: cri-containerd: Populate install_dependencies()
Let's install all the dependencies needed for running the
`cri-containerd` tests.

The list of dependencies we have are:
* From the system
  - build-essential
  - jq
  - podman-docker
* From our own repo
  - yq
  - go
* From GitHub projects
  - containerd
  - cri-tools

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
8979552527 versions: Add "latest" field for cri-tools
As we don't want to disrupt what we have on the `tests` repo, let's
create a "latest" entry and use that for the GitHub actions tests.

Once we deprecate the `tests` repo we can decide whether we want to
stick to using "latest" or switch back to "version".

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
1bbcbafa67 ci: Add clone_cri_container()
This function will simply clone the containerd repo, specifically at the
tag we want to test.

This can be expanded to different projects, and it will be as soon as we
grow the tests. But, for now, let's keep it simple.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
f66c68a2bf ci: Add install_cri_tools()
This function will install cri-tools in the host, and soon enough (as
part of this PR) we'll be using it to install cri-tools as part of the
cri-containerd tests.

I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-tools to be installed
as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
4dd828414f ci: Add install_cri_containerd()
This function will install cri-containerd in the host, and soon enough
(as part of this PR) we'll be using it to install cri-containerd as part
of the cri-containerd tests.

I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-containerd to be
installed as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
ad47d1b9f8 ci: Add download_github_project_tarball()
This function will help us get the tarball, from a GitHub project,
that we're going to use as part of our tests.

Right now this is not used anywhere, but it'll soon enough (as part of
this series) be used to download the cri-containerd / cri-tools / cni
tarballs.
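A hypothetical sketch of the URL such a helper would download from, using GitHub's standard release-asset URL layout. Only the URL construction is shown, and the real helper's arguments may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the release-asset URL for a GitHub project.
function github_release_tarball_url() {
	local project="$1"   # "<owner>/<repo>", e.g. "containerd/containerd"
	local version="$2"   # release tag
	local tarball="$3"   # asset file name

	echo "https://github.com/${project}/releases/download/${version}/${tarball}"
}

# The download itself could then be done with, e.g.:
#   curl -fsSL -o "${tarball}" "$(github_release_tarball_url "${project}" "${version}" "${tarball}")"
```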

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
788c562a95 ci: Add get_latest_patch_release_from_a_github_project()
This function will help us to get the latest patch release from a
GitHub project.

The idea behind this function is that we don't have to keep updating
versions.yaml that frequently (or worse, have it outdated as it
currently is), and always test against the latest patch release of a
given project's version that we care about.
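A minimal sketch of the selection logic. It assumes the project's release tags are already available one per line (e.g. fetched from the GitHub API). The function name is hypothetical, and pre-release tags are deliberately ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read release tags on stdin and print the newest
# patch release of the given "major.minor" series (e.g. "v1.7").
function latest_patch_release() {
	local series="$1"
	local series_re

	# Escape the dots so they match literally in the regex.
	series_re="$(printf '%s' "${series}" | sed 's/\./\\./g')"

	# Keep only exact "<series>.<patch>" tags (no -rc/-beta suffixes),
	# then let version sort pick the highest patch number.
	grep -E "^${series_re}\.[0-9]+$" | sort -V | tail -n 1
}
```

Using `sort -V` rather than a plain `sort` matters here: a lexical sort would rank `v1.7.2` above `v1.7.10`.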

Although right now this is not used anywhere, this will be used with the
coming cri-containerd tests, which will be part of this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
6742f3a898 ci: Use function before each install_go.sh function
We've been doing this for all files moved to this repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
5eacecffc3 ci: Adjust paths for install_go.sh
Let's adjust paths for what we source and the scripts we call, after
moving from the tests repo to this one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
8ed1595f96 ci: Update copyright for install_go.sh
As we're touching the file already, let's update its Copyright info.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
6123d0db2c ci: Move install_go.sh as it was
Let's move `.ci/install_go.sh` file from the tests repo to this one.

The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
8653be71b2 ci: Do not take cross-build into consideration for kata-arch.sh
Right now we'd need to import lib.sh just in order to get cross-build
information for rust, and it seems a little bit premature to do so at
this stage and only for rust.

Let's skip it and keep this transition simple.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
6a76bf92cb ci: Fix style / indentation of kata-arch.sh
We've been using:
```
function foo() {
}
```

instead of
```
function foo()
{
}
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
72743851c1 ci: Add function before each kata-arch.sh function
We've been doing this for all files moved to this repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
9f6d4892c8 ci: Update copyright for kata-arch.sh
As we're touching the file already, let's update its Copyright info.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
6f73a72839 ci: Move kata-arch.sh as it was
Let's move `.ci/kata-arch.sh` file from the tests repo to this one.

The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
3615d73433 ci: Add get_from_kata_deps()
First of all, I'm 100% aware that I'm duplicating this function here as
I've copied it from the packaging stuff, and I'm not exactly proud of
that.

However, right now it seems a little bit premature to combine that set
of scripts with this set of scripts in a single one and make them used
by both pieces of our project.

Anyway, this function helps to get information from the
`versions.yaml` file, and it'll be used as part of the cri-containerd
tests and a few others in the future.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
34779491e0 gha: kubernetes: Avoid declaring repo_root_dir
This is already declared as part of the `common.bash` file, so let's
just make sure we use it from there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
f3738beaca tests: Use $HOME/go as fallback for $GOPATH
Considering that someone may want to run the tests locally, we shouldn't
rely on having GITHUB_WORKSPACE exported, and should fall back to
$HOME/go if needed.
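A minimal sketch of the fallback. The helper name is hypothetical, and the exact variable handling in the real script may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: GITHUB_WORKSPACE is exported by GitHub Actions;
# when running locally it usually isn't, so fall back to $HOME/go.
function default_gopath() {
	echo "${GITHUB_WORKSPACE:-${HOME}/go}"
}

GOPATH="$(default_gopath)"
export GOPATH
```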

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
b87ed27416 tests: Move ensure_yq to common.bash
As this function will be used by different scripts, let's move it to a
common place.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Jeremi Piotrowski
124e390333 tests: common: Fix quoting when globbing
When the glob star is inside quotes, there is only one iteration of the loop
and b holds all matches at once. Move the glob out of the quotes so that we
actually iterate over matched paths.
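A small demonstration of the two forms, with hypothetical function names, counting how many times the loop body runs for a directory with three entries.

```shell
#!/usr/bin/env bash
# Quoted glob: the loop doesn't expand it, so the body runs once with b set
# to the literal string ".../*".
function count_quoted() {
	local dir="$1" b n=0
	for b in "${dir}/*"; do
		n=$((n + 1))
	done
	echo "${n}"
}

# Fixed: only the variable is quoted and the glob stays outside, so the
# body runs once per matched path (paths with spaces stay intact).
function count_unquoted_glob() {
	local dir="$1" b n=0
	for b in "${dir}"/*; do
		n=$((n + 1))
	done
	echo "${n}"
}
```

Without `shopt -s nullglob`, the fixed form still runs once with the literal pattern when nothing matches, so a loop body may want to guard with `[ -e "$b" ]`.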

Fixes: #6543
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
db77c9a438 tests: Make install_kata take care of the links
It makes the kata-containers installation more complete.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
13715db1f8 tests: Do not call install_check_metrics when installing kata
The `install_kata` function was moved from the metrics' `gha-run.sh`
file to the `common.bash` in the commit 3ffd48bc16, but I didn't notice
that it brought with it a call to `install_check_metrics`, which is
totally unrelated to installing Kata Containers.

Let's remove the call so the function is a little bit less specific, and
move the call to install_check_metrics to the metrics `gha-run.sh` file.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 16:54:27 +02:00
Fabiano Fidêncio
e149a3c783 Merge pull request #7404 from fidencio/topic/cache-consider-changes-in-the-scripts-used-to-build-the-kernel
cache: kernel: Consider changes in tools/packaging/kernel
2023-07-21 15:05:01 +02:00
Fabiano Fidêncio
630634c5df ci: k8s: Group logs to make them easier to read
Otherwise it becomes really hard to find the info you're looking for.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 14:05:30 +02:00
Fabiano Fidêncio
228b30f31c ci: k8s: Gather node info during the cleanup
This will make our lives easier to debug issues with the CI.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 14:05:30 +02:00
Fabiano Fidêncio
81f99543ec ci: k8s: Cleanup cluster before deleting it
This will help us on two fronts:
* catching possible issues related to kata-deploy cleanup
* doing more (like, in the future, collecting logs) after the tests run

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 14:05:30 +02:00
Fabiano Fidêncio
38a7b5325f packaging/tools: Add kata-debug
kata-debug is a tool used as part of the Kata Containers CI to gather
information from the node, in order to help debug issues with Kata
Containers.

As one can imagine, this can be expanded and used outside of the CI context,
and any contribution back to the script is very much welcome.

The resulting container is stored at the [Kata Containers quay.io
space](https://quay.io/repository/kata-containers/kata-debug) and can
be used as shown below:
```sh
kubectl debug "node/${NODE_NAME}" -it --image=quay.io/kata-containers/kata-debug:latest
```

Fixes: #7397

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 14:05:30 +02:00
Fabiano Fidêncio
a0fd41fd37 Merge pull request #7406 from fidencio/topic/merge-tarball-fix-version-yaml-not-found
kata-deploy: Properly get the path of the versions.yaml file
2023-07-21 14:04:18 +02:00
Fabiano Fidêncio
ae6e8d2b38 kata-deploy: Properly get the path of the versions.yaml file
We need to correctly get the full path of the versions.yaml file as part
of the merge-builds.sh script, as we do a `pushd` there, which makes
merging the artefacts fail because the `versions.yaml` file does not
exist in that path.
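A minimal sketch of the failure mode and the fix. It uses a hypothetical directory layout and variable names, not the real merge-builds.sh. The point is that a relative path must be resolved to an absolute one before `pushd` changes the working directory.

```shell
#!/usr/bin/env bash
# Illustration only: hypothetical layout with versions.yaml at the root.
set -euo pipefail

root="$(mktemp -d)"
mkdir -p "${root}/builds"
echo "# placeholder" > "${root}/versions.yaml"
cd "${root}"

relative_path="versions.yaml"
# Resolve the absolute path *before* changing directory.
absolute_path="$(realpath "${relative_path}")"

pushd "${root}/builds" > /dev/null
# The relative path now points at builds/versions.yaml, which does not exist.
if [ -f "${relative_path}" ]; then relative_ok=yes; else relative_ok=no; fi
# The absolute path still refers to the original file.
if [ -f "${absolute_path}" ]; then absolute_ok=yes; else absolute_ok=no; fi
popd > /dev/null

echo "relative=${relative_ok} absolute=${absolute_ok}"
cd /
rm -rf "${root}"
```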

Fixes: #7405

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 12:02:11 +02:00
Fabiano Fidêncio
309e232553 cache: kernel: Consider changes in tools/packaging/kernel
Any change in the script used to build the kernel should invalidate the
cache.

Fixes: #7403

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-21 11:48:29 +02:00
GabyCT
f95a7896b1 Merge pull request #7394 from fidencio/topic/ship-VERSIOB-and-versions.yaml-as-part-of-release-tarball
kata-deploy: Add VERSION and versions.yaml to the final tarball
2023-07-20 14:38:21 -06:00
GabyCT
14025baafe Merge pull request #7376 from GabyCT/topic/addcray
metrics: Add C-Ray performance test
2023-07-20 14:37:53 -06:00
GabyCT
b629f6a822 Merge pull request #7363 from GabyCT/topic/enabletensorflow
metrics: enable TensorFlow benchmark to be run on gha
2023-07-20 13:36:55 -06:00
Fabiano Fidêncio
59fdd69b85 kata-deploy: Add VERSION and versions.yaml to the final tarball
Let's make things simpler to figure out which version of Kata
Containers has been deployed, and also which artefacts come with it.

This will help us immensely in the future, for the TEE use case, so we
can easily know whether we can deploy a specific guest kernel for a
specific host kernel.

Fixes: #7394

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-20 18:33:14 +02:00
Fabiano Fidêncio
5dddd7c5d1 release: Upload versions.yaml as part of the release
Although this file is far from being an SBOM, it'll help folks to
easily visualise which components are part of a release, and even have
SBOMs generated from that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-20 18:31:21 +02:00
Gabriela Cervantes
bad3ac84b0 metrics: Rename C-Ray to cpu performance tests
This PR renames the C-Ray tests to the cpu category.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-20 15:56:02 +00:00
Fabiano Fidêncio
87d99a71ec versions: Remove "kernel-experimental"
We've not been using nor shipping this kernel for a very long time.

Regardless, we're leaving behind the logic in the kernel scripts to
build it, in case it becomes necessary in the future.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-20 17:14:22 +02:00
Zvonko Kaiser
545de5042a vfio: Fix tests
Now that cold|hot plug ports are checked more thoroughly,
some of the tests needed to be updated.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:44 +00:00
Zvonko Kaiser
62aa6750ec vfio: Added better handling of VFIO Control Devices
Depending on the vfio_mode we need to mount the
VFIO control device additionally into the container.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 13:42:42 +00:00
Fabiano Fidêncio
fe07ac662d Merge pull request #7387 from GabyCT/topic/fixmemoryinsidec
metrics: Add function to memory inside container script
2023-07-20 10:06:15 +02:00
Zvonko Kaiser
dd422ccb69 vfio: Remove obsolete HotplugVFIOonRootBus
Remove HotplugVFIOonRootBus, which is obsolete after the latest PCI
topology changes. Users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:25:40 +00:00
Zvonko Kaiser
114542e2ba s390x: Fixing device.Bus assignment
The device.Bus was reset if a specific combination of
configuration parameters was not met. With the new
PCIe topology this should not happen anymore.

Fixes: #7381

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-20 07:24:26 +00:00
Alakesh Haloi
371a118ad0 agent: exclude symlinks from recursive ownership change
Currently, when fsGroup is used with direct-assign, the kata agent
recursively changes ownership and permissions for each file, including
symlinks. The problem with symlinks is that the permissions of the
symlink itself may not be the same as those of the underlying file. So
while doing recursive ownership and permission changes, we should skip
symlinks.
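The agent change itself is written in Rust. Purely to illustrate the filtering idea, the shell sketch below uses hypothetical paths and lists the entries a recursive ownership/permission change would touch once symlinks are skipped. On Linux, `chmod` follows a symlink and modifies its target, which is why links are excluded rather than processed.

```shell
#!/usr/bin/env bash
# Illustration only: hypothetical volume layout in a temporary directory.
set -euo pipefail

vol="$(mktemp -d)"
mkdir -p "${vol}/data"
touch "${vol}/data/file.txt"
# A symlink pointing outside the volume; following it would change the target.
ln -s /etc/hostname "${vol}/data/link"

# find does not follow symlinks by default, and "! -type l" drops the links
# themselves, leaving only the entries that should be chown'd/chmod'ed.
mapfile -t targets < <(find "${vol}" ! -type l | sort)
printf '%s\n' "${targets[@]}"

rm -rf "${vol}"
```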

Fixes: #7364
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
2023-07-19 20:42:55 -07:00
Gabriela Cervantes
e64edf41e5 metrics: Add tensorflow function in gha-run script
This PR adds the tensorflow function to the gha-run script so that it is
triggered in GHA.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-19 21:31:51 +00:00
Gabriela Cervantes
67a6fff4f7 metrics: Enable tensorflow benchmark on gha
This PR enables the TensorFlow benchmark on gha for the kata metrics CI.

Fixes #7362

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-19 21:31:51 +00:00
GabyCT
c3f21c36f3 Merge pull request #7388 from dborquez/revert-commit-broke-checkmetrics-baseline-values
Revert "metrics: Replace backslashes used to escape double quoted key in jq expr"
2023-07-19 14:36:16 -06:00
David Esparza
01450deb6a Revert "metrics: Replace backslashes used to escape double quoted key in jq expr."
This reverts commit 468f017e21.

Fixes: #7385

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-19 10:07:11 -06:00
Gabriela Cervantes
8430068058 metrics: Add function to memory inside container script
This PR adds the `function` keyword before each function name in the memory
inside container script in order to have uniformity across the script.

Fixes #7386

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-19 16:00:53 +00:00
Chao Wu
bbd3c1b6ab Dragonball: migrate dragonball-sandbox crates to Kata
In order to make it easier for developers to contribute to Dragonball,
we decided to migrate all dragonball-sandbox crates to Kata.

fixes: #7262

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-07-19 19:41:57 +08:00
Chao Wu
7153b51578 Merge pull request #7372 from fidencio/topic/bump-virtiofsd-to-v1.7.0
versions: Bump virtiofsd to v1.7.0
2023-07-19 10:51:49 +08:00
GabyCT
8c662916ab Merge pull request #7377 from dborquez/add_verbosity_to_blogbench
metrics: stop hypervisor and shim at init_env stage
2023-07-18 15:57:54 -06:00
Fabiano Fidêncio
5f7da301fd Merge pull request #7378 from fidencio/topic/ci-k8s-fix-source-path
ci: k8s: Adapt "source ..." to the new location of gha-run.sh
2023-07-18 22:30:55 +02:00
Fabiano Fidêncio
fad801d0fb ci: k8s: Adapt "source ..." to the new location of gha-run.sh
This is a follow up of 2ee2cd307b, which
changed the location of gha-run.sh

Fixes: #7373

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-18 21:26:41 +02:00
David Esparza
55e2f0955b metrics: stop hypervisor and shim at init_env stage
This PR kills the hypervisor and the kata shim in the
init_env stage prior to launching any metric test.
Additionally, this PR adds info messages in the main blocks
of the blogbench test to help with debugging.

Fixes: #7366

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-18 12:05:29 -06:00
Gabriela Cervantes
556e663fce metrics: Add disk link to general metrics README
This PR adds the disk link information to the general metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-18 16:42:35 +00:00
Gabriela Cervantes
98c1217093 metrics: Add C-Ray README
This PR adds the C-Ray documentation at the README file.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-18 16:35:54 +00:00
Gabriela Cervantes
8e7d9926e4 metrics: Add C-Ray Dockerfile
This PR adds the C-Ray Dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-18 16:33:55 +00:00
Gabriela Cervantes
e2ee769783 metrics: Add C-Ray performance test
This PR adds C-Ray performance test in order to be part of the kata
metrics CI.

Fixes #7375

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-18 16:32:23 +00:00
Fabiano Fidêncio
2011e3d72a Merge pull request #7374 from fidencio/topic/ci-tdx-adjust-kubeconfig-path
ci: Move `tests/integration/gha-run.sh` to `tests/integration/kubernetes/` ... and also remove KUBECONFIG from the tdx envs
2023-07-18 17:32:57 +02:00
Fabiano Fidêncio
8e09e04f48 Merge pull request #6788 from jepio/kernel-update-6.1-lts
versions: Update kernel to version v6.1.x
2023-07-18 17:29:21 +02:00
Chao Wu
935432c36d Merge pull request #7352 from justxuewei/exec-hang
agent: Fix exec hang issues with a background process
2023-07-18 23:02:18 +08:00
Fabiano Fidêncio
2ee2cd307b ci: k8s: Move gha-run.sh to the kubernetes dir
The file belongs there, as it's only used for k8s related tests.

Fixes: #7373

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-18 15:45:06 +02:00
Fabiano Fidêncio
88eaff5330 ci: tdx: Adjust KUBECONFIG
We don't need to export KUBECONFIG there. Let's just make sure we have
the server correctly set up and avoid doing that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-18 15:39:52 +02:00
Jeremi Piotrowski
c09e268a1b versions: Downgrade SEV(-SNP) kernel back to v5.19.x
CC-GPU seems to have issues with v6.1, so downgrade the kernels used for
SEV-SNP to a known-working version. It is worth mentioning that TDX is also
still on 5.19.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-18 15:29:46 +02:00
Fabiano Fidêncio
25d80fcec2 Merge pull request #6993 from zvonkok/kata-agent-init-mount
agent: Ignore already mounted dev/fs/pseudo-fs
2023-07-18 14:11:44 +02:00
Fabiano Fidêncio
4687f2bf9d Merge pull request #7369 from fidencio/topic/gha-ci-bring-tdx-back
ci: k8s: Bring TDX tests back
2023-07-18 13:28:33 +02:00
Fabiano Fidêncio
6a7a323656 versions: Bump virtiofsd to v1.7.0
https://gitlab.com/virtio-fs/virtiofsd/-/releases/v1.7.0 was released
today.

Fixes: #7371

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-18 12:33:13 +02:00
Fabiano Fidêncio
ac5f5353ba ci: k8s: Bring TDX tests back
Now that we have a new TDX machine plugged into our CI, let's re-enable
the TDX tests.

Fixes: #7368

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-18 10:33:43 +02:00
Jeremi Piotrowski
950b89ffac versions: Update kernel to version v6.1.38
Kernel v6.1.38 is the current latest LTS version, switch to it.  No
patches should be necessary. Some CONFIG options have been removed:

- CONFIG_MEMCG_SWAP is covered by CONFIG_SWAP and CONFIG_MEMCG
- CONFIG_ARCH_RANDOM is unconditionally compiled in
- CONFIG_ARM64_CRYPTO is covered by CONFIG_CRYPTO and ARCH=arm64

Fixes: #6086
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-18 10:04:21 +02:00
GabyCT
7729d82e6e Merge pull request #7360 from GabyCT/topic/updategraldoc
metrics: Update machine learning documentation
2023-07-17 15:30:13 -06:00
Fabiano Fidêncio
26d525fcf3 Merge pull request #7361 from fidencio/topic/gha-ci-add-cri-containerd-tests-skeleton-follow-up-2
gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
2023-07-17 22:38:50 +02:00
GabyCT
b4852c8544 Merge pull request #7335 from kata-containers/topic/addmobilenet
tests: Add MobileNet Tensorflow performance benchmark
2023-07-17 14:36:59 -06:00
Gabriela Cervantes
8ccc1e5c93 metrics: Update machine learning documentation
This PR updates the machine learning documentation related to the
TensorFlow and PyTorch benchmarks.

Fixes #7359

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-17 20:32:49 +00:00
Fabiano Fidêncio
f50d2b0664 gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
KATA_HYPERVSIOR should be KATA_HYPERVISOR

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-17 21:56:51 +02:00
David Esparza
687596ae41 Merge pull request #7320 from dborquez/fix_jq_checkmetrics_checkvar_expression
metrics: replace backslashes used to escape double quoted jq key expr.
2023-07-17 13:50:18 -06:00
Gabriela Cervantes
620b945975 metrics: Add Tensorflow Mobilenet documentation
This PR adds the TensorFlow MobileNet documentation to the machine
learning README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-17 17:39:05 +00:00
Zhongtao Hu
d50f3888af Merge pull request #7219 from Apokleos/network-refactor
runtime-rs: enhancement of Device Manager for network endpoints.
2023-07-17 14:13:51 +08:00
QuanweiZhou
ce14f26d82 Merge pull request #5450 from openanolis/trace_rs
feat(Tracing): tracing in Rust runtime
2023-07-17 09:27:13 +08:00
Manabu Sugimoto
f1d8de9be6 runk: Allow runk to launch a container without pid namespace
Allow runk to launch a container even though users don't specify the
pid namespace in `config.json` because general container runtimes
such as runc also can launch a container without the namespace.
On the other hand, Kata Containers doesn't allow it due to security concerns,
so this feature should be enabled only in runk.

Fixes: #7168

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-16 23:31:14 +05:30
Zhongtao Hu
419f8a5db7 Merge pull request #7021 from cheriL/7020/ignore-unconfigured-netinterface
runtime-rs: ignore unconfigured network interfaces
2023-07-16 10:11:15 +08:00
Xuewei Niu
6c91af0a26 agent: Fix exec hang issues with a background process
Issue #4747 and pull request #4748 fixed exec hang issues where the exec
command hangs when a process's stdout is not closed. However, that PR could
cause the exec command not to work as expected, leading to CI failures, so
it was reverted in #7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it does not cause any CI
failures.

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-07-16 08:32:45 +08:00
David Esparza
5a9829996c Merge pull request #7349 from dborquez/fix_extract_kata_env_for_metrics
metrics: Stop running kata-env before kata is properly installed.
2023-07-14 15:20:52 -06:00
David Esparza
59f4731bb2 metrics: Stop running kata-env before kata is properly installed.
This PR ensures kata-env is called only after some metrics have
completed their workload. This fixes a bug that occurred when
kata-env was called before Kata was installed on the
testing platform.

Fixes: #7348

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-14 13:40:48 -06:00
David Esparza
468f017e21 metrics: Replace backslashes used to escape double quoted key in jq expr.
This PR uses square brackets in a jq expression to access
key values corresponding to metric results in json format.

The values are the data inputs into the checkmetrics tool.

Fixes: #7319

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-14 18:41:41 +00:00
GabyCT
b9535fb187 Merge pull request #7337 from dborquez/fix_remove_old_metrics_config
metrics: use rm -f to remove the oldest containerd config file.
2023-07-14 09:19:41 -06:00
Fabiano Fidêncio
7a854507cc Merge pull request #7333 from zvonkok/main
kernel: Update kernel config name
2023-07-14 13:49:27 +02:00
Fabiano Fidêncio
cfc90fad84 Merge pull request #7344 from fidencio/topic/kata-deploy-add-a-debug-option
kata-deploy: Add a debug option to kata-deploy (and also use it as part of our CI)
2023-07-14 13:16:55 +02:00
Fabiano Fidêncio
64f013f3bf ci: k8s: Enable debug when running the tests
This will help us to gather more information about Kata Containers in
case of failure.

Fixes: #7343

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-14 12:18:11 +02:00
Fabiano Fidêncio
8f4b1df9cf kata-deploy: Give users the ability to run it on DEBUG mode
The DEBUG env var introduced to the kata-deploy / kata-cleanup yaml file
will be responsible for:
* Setting up the CRI Engine to run with the debug log level set to debug
  * The default is usually info
* Setting up Kata Containers to enable:
  * debug logs
  * debug console
  * agent logs

This will help a lot of folks trying to debug Kata Containers while using
kata-deploy, and also help us to always run with DEBUG=yes as part of
our CI.

Fixes: #7342

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-14 12:18:08 +02:00
Chao Wu
9b3dc572ae Merge pull request #7018 from nubificus/feat_bindmount_propagation
runtime-rs: add parameter for propagation of (u)mount events
2023-07-14 15:21:41 +08:00
Zvonko Kaiser
2c8dfde168 kernel: Update kernel config name
Fixes: #7294

When installing the kernel config, adjust its name the same way as
the vmlinuz and vmlinux files, so that any added suffixes
are also reflected in the kernel config name.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-14 06:50:35 +00:00
Archana Shinde
b9b8ccca0c Merge pull request #7236 from amshinde/move-guestprotection
kata-ctl: Move GuestProtection code to kata-sys-util
2023-07-13 23:50:17 -07:00
soup
150e54d02b runtime-rs: ignore unconfigured network interfaces
Fixes: #7020

Signed-off-by: soup <lqh348659137@outlook.com>
2023-07-14 14:16:03 +08:00
David Esparza
3ae02f9202 metrics: use rm -f to remove older containerd config file.
In order to run kata metrics we need to check that the containerd
config file is properly set. When this is not the case, we
need to remove that file, and generate a valid one.

This PR runs rm -f in order to ignore errors in case the
file to delete does not exist.
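A minimal sketch of the difference, using a hypothetical config path inside a fresh temporary directory so the file is guaranteed not to exist. Plain `rm` fails on a missing file, while `rm -f` ignores it and exits successfully.

```shell
#!/usr/bin/env bash
# No -e here: we want to observe the failing rm instead of aborting.
set -uo pipefail

tmpdir="$(mktemp -d)"
config="${tmpdir}/config.toml"   # hypothetical path that does not exist

rm "${config}" 2>/dev/null
plain_status=$?    # non-zero: the file is missing

rm -f "${config}"
force_status=$?    # zero: -f ignores nonexistent files

echo "rm=${plain_status} rm-f=${force_status}"
rmdir "${tmpdir}"
```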

Fixes: #7336

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-13 16:20:03 -06:00
David Esparza
22d4e4c5a6 Merge pull request #7328 from GabyCT/topic/updatecommon
tests: Add function before function name in common.bash for metrics
2023-07-13 16:11:30 -06:00
Gabriela Cervantes
a864d0e349 tests: Add tensorflow mobilenet dockerfile
This PR adds the tensorflow mobilenet dockerfile.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-13 21:24:40 +00:00
Gabriela Cervantes
788d2a254e tests: Add tensorflow mobilenet performance test
This PR adds tensorflow mobilenet performance test for
kata metrics.

Fixes #7334

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-13 21:18:25 +00:00
David Esparza
e8917d7321 Merge pull request #7330 from GabyCT/topic/storagedoc
tests: Add metrics storage documentation
2023-07-13 15:10:53 -06:00
GabyCT
8db43eae44 Merge pull request #7318 from dborquez/fix_timestamp_generator_on_metrics
metrics: Fix metrics ts generator to treat numbers as decimals
2023-07-13 11:21:09 -06:00
Gabriela Cervantes
3fed61e7a4 tests: Add storage link to general metrics documentation
This PR adds storage link to general metrics README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-13 16:03:49 +00:00
Gabriela Cervantes
b34dda4ca6 tests: Add storage blogbench metrics documentation
This PR adds the storage metrics documentation for blogbench for kata
metrics.

Fixes #7329

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-13 16:00:14 +00:00
Anastassios Nanos
6787c63900 runtime-rs: add parameter for propagation of (u)mount events
Add an extra parameter in `bind_mount_unchecked` to specify
the propagation type: "shared" or "slave".

Fixes: #7017

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2023-07-13 15:58:22 +00:00
Gabriela Cervantes
6e5679bc46 tests: Add function before function name in common.bash for metrics
This PR adds the `function` keyword before each function name in the
common.bash script in order to have uniformity across the whole script.

Fixes #7327

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-13 15:48:47 +00:00
Archana Shinde
62080f83cb kata-sys-util: Fix compilation errors
Fix compilation errors for aarch64 and s390x

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:09:43 +05:30
Archana Shinde
02d99caf6d static-checks: Make cargo clippy pass.
Get rid of cargo clippy warnings.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
9824206820 agent: Make the static checks pass for agent
The static checks for the agent require Cargo.lock to be updated.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
61e4032b08 kata-ctl: Remove all utility functions to get platform protection
Since these have been added to kata-sys-util, remove these from
kata-ctl. Change all invocations to get platform protection to make use
of kata-sys-util.

Fixes: #7144

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
a24dbdc781 kata-sys-util: Move utilities to get platform protection
Add utilities to get platform protection to kata-sys-util

Fixes: #7144

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
dacdf7c282 kata-ctl: Remove cpu related functions from kata-ctl
Remove cpu related functions which have been moved to kata-sys-util.
Change invocations in kata-ctl to make use of functions now moved to
kata-sys-util.

Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Archana Shinde
f5d1957174 kata-sys-util: Move additional functionality to cpu.rs
Make certain imports architecture specific as these are not used on all
architectures.
Move additional constants and functionality to cpu.rs.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Nathan Whyte
304b9d9146 kata-sys-util: Move CPU info functions
Move get_single_cpu_info and get_cpu_flags into kata-sys-util.
Add new functions that get a list of flags and check if a flag
exists in that list.

Fixes #6383

Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-07-13 20:08:13 +05:30
Fabiano Fidêncio
eed3c7c046 Merge pull request #7322 from fidencio/topic/gha-ci-add-cri-containerd-tests-skeleton-follow-up
gha: ci: Add cri-containerd tests skeleton -- follow up 1
2023-07-13 13:53:48 +02:00
Fabiano Fidêncio
7319cff77a ci: cri-containerd: Add LTS / Active versions for containerd
As we'll be testing against the LTS and the Active versions of
containerd, let's add those entries to the versions.yaml file and make
sure we export what we want to use for the tests as an env var.

The approach taken should not break the current way of getting the
containerd version.

LTS and Active versions of containerd can be found at:
https://containerd.io/releases/#support-horizon

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-13 12:05:47 +02:00
Fabiano Fidêncio
2a957d41c8 ci: cri-containerd: Export GOPATH
Let's make sure this is exported, as it'll be needed in order to install
`yq`, which will be used to get the versions of the dependencies to be
installed.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-13 12:05:47 +02:00
Fabiano Fidêncio
75a294b74b ci: cri-containerd: Ensure deps are installed
Let's make sure we install the needed dependencies for running the
`cri-containerd` tests.

Right now this commit is basically adding a placeholder, and later on,
when we'll actually be able to test the job, we'll add the logic of
installing the needed dependencies.

The obvious dependencies we've spotted so far are:
* From the OS
  * jq
  * curl (already present)
* From our repo
  * yq (using the install_yq script)
* From GitHub
  * cri-containerd
  * cri-tools
  * cni plugins

We may need a few more packages, but we will only figure this out as
part of the actual work.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-13 12:04:22 +02:00
Zhongtao Hu
b69cdb5c21 Merge pull request #7286 from xuejun-xj/xuejun/up-fix
dragonball/agent: Add some optimization for Makefile and bugfixes of unit tests on aarch64
2023-07-13 09:39:23 +08:00
GabyCT
ee17097e88 Merge pull request #7282 from GabyCT/topic/enableblogbench
metrics: Enable blogbench test
2023-07-12 16:35:52 -06:00
David Esparza
f63673838b Merge pull request #7315 from GabyCT/topic/machinelearning
tests: Add machine learning performance tests
2023-07-12 15:57:11 -06:00
David Esparza
6924d14df5 metrics: Fix metrics ts generator to treat numbers as decimals
Use bc tool to perform math operations even when variables contain
values with leading zero.
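A minimal sketch of the underlying pitfall. The value `08` is a hypothetical zero-padded input, such as the output of `date +%M`. Bash arithmetic reads a leading `0` as octal, so `08` is rejected. The commit fixes this by switching the math to `bc`. The sketch below instead shows bash's built-in `10#` base prefix, a different fix named here only for comparison.

```shell
#!/usr/bin/env bash
set -uo pipefail

minute="08"   # hypothetical zero-padded value

# Plain bash arithmetic fails ("value too great for base"); run it in a child
# shell so the error can be observed without aborting this script.
if bash -c 'echo $(( $1 + 1 ))' _ "${minute}" 2>/dev/null; then
  octal_failed=no
else
  octal_failed=yes
fi

# Forcing base 10 evaluates the same string as decimal.
# (The commit instead uses bc, e.g.: echo "${minute} + 1" | bc)
next=$(( 10#${minute} + 1 ))

echo "octal_failed=${octal_failed} next=${next}"
```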

Fixes: #7317

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-12 20:57:33 +00:00
Gabriela Cervantes
9e048c8ee0 checkmetrics: Add blogbench read value for qemu
This PR adds the blogbench read value for qemu.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:38:27 +00:00
Gabriela Cervantes
2935aeb7d7 checkmetrics: Add blogbench write value for qemu
This PR adds the blogbench write value for qemu limit.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:27 +00:00
Gabriela Cervantes
02031e29aa checkmetrics: Add blogbench read value for clh
This PR adds the blogbench read value for clh limit.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:27 +00:00
Gabriela Cervantes
107fae033b checkmetrics: Add blogbench write value for clh
This PR adds the blogbench write value limit for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:27 +00:00
Gabriela Cervantes
8c75c2f4bd metrics: Update blogbench Dockerfile
This PR updates the blogbench Dockerfile to use non-interactive mode.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:27 +00:00
Gabriela Cervantes
49723a9ecf metrics: Add double quotes to variables
This PR adds double quotes to variables in the blogbench script to
have uniformity across all the tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:27 +00:00
Gabriela Cervantes
dc67d902eb metrics: Enable blogbench test
This PR enables the blogbench performance test for the kata metrics CI.

Fixes #7281

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 20:37:24 +00:00
Fabiano Fidêncio
3f38f75918 Merge pull request #7314 from fidencio/topic/gha-ci-add-cri-containerd-tests-skeleton
tests: gha: ci: Add cri-containerd tests skeleton
2023-07-12 22:21:47 +02:00
Fabiano Fidêncio
438fe3b829 gha: ci: Add cri-containerd tests skeleton
This PR builds the foundation for us to start migrating the
cri-containerd tests from Jenkins to GitHub Actions.

Right now the test does nothing and should always finish successfully.
The coming PRs will actually introduce logic to the `gha-run.sh` script
where we'll be able to run the tests and make sure those pass before
having them actually merged.

Fixes: #6543

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 20:57:39 +02:00
Fabiano Fidêncio
bd08d745f4 tests: metrics: Move metrics specific function to metrics gha-run.sh
`compress_metrics_results_dir()` is only used by the metrics GHA.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 20:56:55 +02:00
Fabiano Fidêncio
3ffd48bc16 tests: common: Move a few utility functions to common.bash
Those functions were originally introduced as part of the
`metrics/gha-run.sh` file, but they will be very handy once we
start adding more tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 20:55:05 +02:00
Gabriela Cervantes
7f961461bd tests: Add machine learning README
This PR adds machine learning README.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 16:37:15 +00:00
Fabiano Fidêncio
bb2ef4ca34 tests: Add function before each function
Let's just keep this standardised.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 18:36:09 +02:00
Gabriela Cervantes
063f7aa7cb tests: Add Pytorch Dockerfile
This PR adds Pytorch Dockerfile for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 16:34:17 +00:00
Fabiano Fidêncio
b6282f7053 Merge pull request #7255 from GabyCT/topic/memoryinsideenabled
metrics: Enable memory inside container metrics
2023-07-12 18:33:36 +02:00
Gabriela Cervantes
1af03b9b32 tests: Add Pytorch performance test
This PR adds Pytorch performance test for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 16:33:02 +00:00
Gabriela Cervantes
4cecd62370 tests: Add tensorflow Dockerfile
This PR adds the tensorflow Dockerfile.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 16:31:32 +00:00
Gabriela Cervantes
c4094f62c9 tests: Add metrics machine learning performance tests
This PR adds metrics machine learning performance tests like
Tensorflow and Pytorch.

Fixes #7313

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-12 16:28:25 +00:00
Jeremi Piotrowski
b9a63d66a4 Merge pull request #7297 from jepio/fix-mariner-cache
tools: Use a consistent target name when building mariner initrd
2023-07-12 13:43:47 +02:00
Fabiano Fidêncio
1ab99bd6bb Merge pull request #7276 from fidencio/topic/gha-debug-gha-tests-start
gha: ci: Gather info about the node / pods
2023-07-12 12:35:10 +02:00
Chao Wu
f6a51a8a78 Merge pull request #7306 from justxuewei/none-network-model
runtime-rs: Do not scan network if network model is "none"
2023-07-12 14:53:52 +08:00
Zvonko Kaiser
4e352a73ee Merge pull request #7308 from fidencio/topic/gha-temporarily-disable-tdx-runs
gha: k8s: tdx: Temporarily disable TDX tests
2023-07-12 08:39:02 +02:00
Fabiano Fidêncio
89b622dcb8 gha: k8s: tdx: Temporarily disable TDX tests
TDX tests need to be temporarily disabled as the current machine
allocated for this will be off for some time, and a new machine will only
be added next week.

Fixes: #7307

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 08:26:10 +02:00
Fabiano Fidêncio
8c9d08e872 gha: ci: Gather info about the node / pods
This is a very simple addition, which should be expanded by
https://github.com/kata-containers/kata-containers/pull/7185, and it
targets gathering more info that will help us debug CI failures.

Fixes: #7296

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-12 08:04:37 +02:00
alex.lyn
283f809dda runtime-rs: Enhancing Device Manager for network endpoints.
Currently, network endpoints are separate from the device manager
and need to be included for proper management. In order to do so,
we need to refactor the implementation of the network endpoints.

The first step is to restructure the NetworkConfig and NetworkDevice
structures.
Next, we will implement the virtio-net driver and add the Network
device to the Device Manager.
Finally, we'll unify entries with do_handle_device for each endpoint.

Fixes: #7215

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-07-12 11:27:12 +08:00
xuejun-xj
a65291ad72 agent: rustjail: update test_mknod_dev
When running cargo test in a container, test_mknod_dev sometimes fails
with "Operation not permitted". Change the device path to
"/dev/fifo-test" to avoid this.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
46b81dd7d2 agent: clippy: fix cargo clippy warnings
Replace "if let Ok(_) = ..." with the ".is_ok()" method.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
c4771d9e89 agent: Makefile: enable set SECCOMP dynamically
Change ":=" to "?=" (conditional assignment).

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
a88212e2c5 utils.mk: update BUILD_TYPE argument
Allow the BUILD_TYPE argument to be set dynamically.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:32 +08:00
xuejun-xj
883b4db380 dragonball: fix cargo test on aarch64
1. Update the memory end assertion, because the address space layout
differs between x86 and Arm.
2. Set guest_addr for aarch64 in the test_handler_insert_region test case.

Fixes: #7284
TODO: #7290

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-12 11:22:31 +08:00
Xuewei Niu
6822029c81 runtime-rs: Do not scan network if network model is "none"
Skip scanning the network from the netns if the network model is set to
"none".

Fixes: #7305

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-07-12 10:00:50 +08:00
Fabiano Fidêncio
ae55893deb Merge pull request #7303 from GabyCT/topic/cleanupmemoryusage
metrics: Update memory usage script
2023-07-11 23:52:05 +02:00
Gabriela Cervantes
ce54e43ebe metrics: Update memory usage script
This PR updates the memory usage script to call clean_env_ctr from main,
so that leftover processes are removed and do not cause failures.

Fixes #7302

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-11 17:03:25 +00:00
Fabiano Fidêncio
ceb5c69ee8 Merge pull request #7299 from fidencio/topic/gha-stop-previous-workflows-if-a-pr-is-updated
gha: Cancel previous jobs if a PR is updated
2023-07-11 16:22:47 +02:00
Fabiano Fidêncio
fbc2a91ab5 gha: Cancel previous jobs if a PR is updated
Let's make sure we cancel previous runs whenever the PR is updated,
mainly because some of them take a long time to run.

This is based on the following stack overflow suggestion:
https://stackoverflow.com/questions/66335225/how-to-cancel-previous-runs-in-the-pr-when-you-push-new-commitsupdate-the-curre

This is very much needed, as we don't want to wait a long time for a
runner just because other runners are still busy performing a task that
the PR update has made meaningless.

Fixes: #7298

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-11 14:37:10 +02:00
Jeremi Piotrowski
307cfc8f7a tools: Use a consistent target name when building mariner initrd
Currently, a mixture of cbl-mariner and mariner is used when creating the
mariner initrd. The kata-static tarball has mariner in the name, but the
Jenkins URL uses cbl-mariner. This breaks cache usage.

Use mariner as the target name throughout the build, so that caching works.

Fixes: #7292
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-11 14:17:14 +02:00
Fabiano Fidêncio
aa484dc0e3 Merge pull request #7288 from fidencio/topic/add-nightly-jobs-follow-up-7
gha: nightly: Fix long name of AKS clusters issue and make the CI easier to test
2023-07-11 11:16:09 +02:00
Fabiano Fidêncio
d780cc08f4 gha: nightly: Also use workflow_dispatch to trigger it
This is a very nice suggestion from Steve Horsman: with it we can
manually trigger the workflow whenever we need to test it, instead of
waiting a full day for it to be retriggered via the `schedule` event.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-11 10:42:40 +02:00
Fabiano Fidêncio
b99ff30267 gha: nightly: Fix name size limit for AKS
Passing the commit hash as the "pr-number" has proven problematic, as it
makes the AKS cluster name longer than AKS accepts.

One easy way to solve this is to pass "nightly" as the PR number, as
that's only used to create the cluster.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-11 09:59:13 +02:00
xuejun-xj
aedc586e14 dragonball: Makefile: add coverage target
Add "coverage" target to compute code coverage for dragonball.

Fixes: #7284

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-07-11 14:36:25 +08:00
Fabiano Fidêncio
52100bb3dd Merge pull request #7280 from fidencio/topic/gha-add-badge-for-our-tests
README: Add badge for our Nightly CI
2023-07-10 19:35:33 +02:00
Gabriela Cervantes
310e069f73 checkmetrics: Enable checkmetrics for memory inside test
This PR enables checkmetrics for the memory inside container test.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-10 17:05:13 +00:00
Fabiano Fidêncio
b61b15aab6 Merge pull request #7259 from fidencio/topic/gha-restrict-job-run-according-to-files-touched
gha: Do not run all the tests if only docs are updated
2023-07-10 18:12:29 +02:00
Fabiano Fidêncio
1363fbbf12 README: Add badge for our Nightly CI
This will help folks monitor the history of failing tests, as we've
done in Jenkins with the "Green Effort CI".

Fixes: #7279

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-10 17:31:51 +02:00
Fabiano Fidêncio
9dc63fe338 Merge pull request #7273 from openanolis/runtime-rs-fix-mem-ci
bugfix: plus default_memory when calculating mem size
2023-07-10 15:12:05 +02:00
Zvonko Kaiser
fab2e6a93f Merge pull request #7277 from fidencio/topic/add-nightly-jobs-follow-up-6
gha: ci: Use github.sha to get the last commit reference
2023-07-10 13:36:31 +02:00
Fabiano Fidêncio
1776b18fa0 gha: Do not run all the tests if only docs are updated
We should not go through the trouble of running all our tests on AKS /
Azure / baremetal machines if a PR only changes our documentation.

Fixes: #7258

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-10 10:30:46 +02:00
Yushuo
28c29b248d bugfix: plus default_memory when calculating mem size
We've noticed this caused regressions with the k8s-oom tests, and then
decided to take a step back and do this in the same way it was done
before 67972ec48a.

Moreover, this step back is also more reasonable in terms of the
controlling logic.

And by doing this we can re-enable the k8s-oom.bats tests, which is done
as part of this PR.

Fixes: #7271
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-07-10 15:53:04 +08:00
Fabiano Fidêncio
0c1cbd01d8 gha: ci: after-push: Use github.sha to get the last commit reference
As we need to pass down the commit sha to the jobs that will be
triggered from the `push` event, we must be careful about what exactly
we're using there.

At first we were using ${{ github.ref }}, but this turns out to be the
**branch name**, rather than the commit hash.  In order to actually get
the commit hash, let's use ${{ github.sha }} instead.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-10 09:39:33 +02:00
Fabiano Fidêncio
37a9556789 gha: ci: nightly: Use github.sha to get the last commit reference
As we need to pass down the commit sha to the jobs that will be
triggered from the `schedule` event, we must be careful about what exactly
we're using there.

At first we were using ${{ github.ref }}, but this turns out to be the
**branch name**, rather than the commit hash.  In order to actually get
the commit hash, let's use ${{ github.sha }} instead, as described by
https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-10 09:39:26 +02:00
Fabiano Fidêncio
afbc1f94d7 Merge pull request #7272 from fidencio/topic/dragonball-k8s-number-cpus-fix
dragonball: Don't fail if a request asks for more CPUs than allowed
2023-07-10 08:25:06 +02:00
Ji-Xinyou
ed23b47c71 tracing: Add tracing to runtime-rs
Introduce tracing into runtime-rs; only some functions are instrumented.

Fixes: #5239

Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-07-09 22:09:43 +08:00
Fabiano Fidêncio
96e9374d4b dragonball: Don't fail if a request asks for more CPUs than allowed
Let's take the same approach as the Go runtime and allocate the maximum
allowed number of vCPUs instead.

Fixes: #7270

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 15:50:23 +02:00
Fabiano Fidêncio
38f0aaa516 Revert "gha: k8s: dragonball: Skip k8s-number-cpus"
This reverts commit a79505b667.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:43:49 +02:00
Fabiano Fidêncio
828a721838 gha: k8s: dragonball: Skip k8s-oom
Let's skip the k8s-oom test, as it is currently failing.

We have an issue open for that, and we'll be working on re-enabling it
as soon as possible.

Reference:
https://github.com/kata-containers/kata-containers/issues/7271

Fixes: #7253

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:27:49 +02:00
Fabiano Fidêncio
a79505b667 gha: k8s: dragonball: Skip k8s-number-cpus
Let's skip the k8s-number-cpus test, as it is currently failing.

We have an issue open for that, and we'll be working on re-enabling it
as soon as possible.

Reference:
https://github.com/kata-containers/kata-containers/issues/7270

Fixes: #7253

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:27:42 +02:00
Fabiano Fidêncio
275c84e7b5 Revert "agent: fix the issue of exec hang with a backgroud process"
This reverts commit 25d2fb0fde.

We're reverting the commit to check whether it's the cause of the
regression in the devmapper tests.

Fixes: #7253
Depends-on: github.com/kata-containers/tests#5705

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-08 14:27:40 +02:00
Gabriela Cervantes
2be342023b checkmetrics: Add memory usage inside container value for qemu
This PR adds the memory usage inside container value for qemu.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-07 16:28:28 +00:00
Gabriela Cervantes
6ca34f949e checkmetrics: Add memory inside container value for clh
Add memory inside container value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-07 16:28:28 +00:00
Gabriela Cervantes
6c68924230 metrics: Enable memory inside container metrics
This PR will enable the memory inside container metrics for the Kata CI.

Fixes #7254

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-07 16:28:28 +00:00
Fabiano Fidêncio
b7c58320a5 Merge pull request #7267 from fidencio/topic/add-nightly-jobs-follow-up-5
gha: ci: Fix reference passed to checkout@v3
2023-07-07 18:26:44 +02:00
Fabiano Fidêncio
0ad298895e gha: ci: Fix reference passed to checkout@v3
In cc3993d860 we introduced a regression: we started passing
inputs.commit-hash instead of github.event.pull_request.head.sha, but we
have been setting commit-hash to github.event.pull_request.sha, meaning
that we're missing a `.head.` there.

github.event.pull_request.sha is empty for the pull_request_target
event, leading the CI to pull the content from `main` instead of the
content from the PR.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-07 17:55:11 +02:00
Fabiano Fidêncio
48d9f8769e Merge pull request #7264 from fidencio/topic/add-nightly-jobs-follow-up-4
gha: ci: Avoid using env also in the ci-nightly and payload-after-push
2023-07-07 17:10:43 +02:00
Fabiano Fidêncio
86904909aa gha: ci: Avoid using env also in the ci-nightly and payload-after-push
The latter workflow is breaking as it doesn't recognise ${GITHUB_REF};
the former would most likely break as well, but it hasn't been triggered
yet.

The error we're facing is:
```
Determining the checkout info
  /usr/bin/git branch --list --remote origin/${GITHUB_REF}
  /usr/bin/git tag --list ${GITHUB_REF}
  Error: A branch or tag with the name '${GITHUB_REF}' could not be found
```

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-07 14:46:30 +02:00
Fabiano Fidêncio
48c3cec1f4 Merge pull request #7243 from sprt/ensure-cluster-no-exist
gha: k8s: Ensure cluster doesn't exist before creating it
2023-07-07 14:03:41 +02:00
Fabiano Fidêncio
3e2b723487 Merge pull request #7263 from fidencio/topic/add-nightly-jobs-follow-up-3
gha: ci: More follow up fixes after adding a nightly CI
2023-07-07 13:58:26 +02:00
Fabiano Fidêncio
18bd2d6e4a Merge pull request #6839 from sprt/sprt/mariner-ci-tests
tests: Enable running k8s tests on Mariner
2023-07-07 13:36:28 +02:00
Zvonko Kaiser
f72cb2fc12 agent: Remove shadowed function, add slog-term
Remove the shadowed get_mounts() and add slog-term as a new crate, so
that slog can log directly to stdout and the test cases can capture the
output produced by the function under test.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 11:28:14 +00:00
Fabiano Fidêncio
1d05b9cc71 gha: ci: Pass down secrets to ci-on-push / ci-nightly
We have to do this, otherwise we cannot log into Azure.

This is a regression introduced by
106e305717.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-07 12:00:33 +02:00
Fabiano Fidêncio
c5b4164cb1 gha: ci: Fix tarball-suffix passed to the metrics tests
Instead of passing "-${{ inputs.tag }}-amd64", we must only pass
"-${{ inputs.tag }}".

This is a regression introduced by
106e305717.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-07 12:00:24 +02:00
Fabiano Fidêncio
fa0f9954a1 Merge pull request #7261 from fidencio/topic/add-nightly-jobs-follow-up-2
gha: ci: Avoid using env unless it's really needed
2023-07-07 10:13:25 +02:00
Zvonko Kaiser
07810bf71f agent: Ignore already mounted dev/fs/pseudo-fs
When using an initrd with KATA_INIT=yes, meaning the kata-agent runs as
the init process, we need to make sure that the agent does not segfault
if the mounts have already happened. Some workloads need to configure
several things in the initrd before the kata-agent starts, which involves
having /proc or /sys already mounted.

Fixes: #6992

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-07-07 07:36:04 +00:00
Fabiano Fidêncio
11e3ccfa4d gha: ci: Avoid using env unless it's really needed
de83cd9de7 tried to solve an issue, but I was clearly using env wrongly,
as what ended up being passed as input was the literal "$VAR" instead of
the content of the VAR variable.

As we can simply avoid using those here, let's do so and save ourselves
a headache.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-07 07:31:10 +02:00
Aurélien Bombo
c45f646b9d gha: k8s: Ensure cluster doesn't exist before creating it
The cluster cleanup step sometimes fails to run, which makes the next
run fail at the cluster creation step. This PR addresses that.

Example: https://github.com/kata-containers/kata-containers/actions/runs/5349582743/jobs/9867845852
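
A minimal sketch of the idea with the Azure CLI, assuming the cluster is
identified by a resource group and a name; the `delete_cluster_if_exists`
helper and the placeholder values are hypothetical, not the actual
gha-run.sh code. `az aks show` fails when the cluster does not exist, so
a leftover cluster is only deleted when it is actually there.

```shell
#!/usr/bin/env bash
set -o nounset

# Hypothetical helper: remove a leftover AKS cluster (e.g. one whose cleanup
# step failed in a previous run) before a new one with the same name is created.
delete_cluster_if_exists() {
	local rg="$1"
	local name="$2"

	# `az aks show` exits non-zero when the cluster cannot be found.
	if az aks show --resource-group "${rg}" --name "${name}" >/dev/null 2>&1; then
		echo "Deleting leftover cluster ${name} in ${rg}"
		az aks delete --resource-group "${rg}" --name "${name}" --yes
	fi
}

# Usage (placeholder values):
#   delete_cluster_if_exists "${resource_group}" "${cluster_name}"
#   az aks create --resource-group "${resource_group}" --name "${cluster_name}" ...
```

Note that `az aks show` also fails for other reasons (e.g. not being
logged in), in which case the sketch simply skips the deletion.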

Fixes: #7242

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-07-06 15:06:30 -07:00
GabyCT
58e921eace Merge pull request #7260 from fidencio/topic/add-nightly-jobs-follow-up-1
gha: ci: Follow up fixes for the nightly jobs
2023-07-06 15:45:13 -06:00
GabyCT
54da0d7c91 Merge pull request #7230 from GabyCT/topic/enabmemory
tests: Enable memory usage metrics tests
2023-07-06 14:30:56 -06:00
Fabiano Fidêncio
1a7bbcd398 gha: ci: Fix typo pull_requesst -> pull_request
Thanks David Esparza for pointing this one out.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 22:29:00 +02:00
Fabiano Fidêncio
ddf4afb961 gha: ci: Fix set-fake-pr-number job
It has to have steps declared, and we need to make it a dependency for
the nightly kata-containers-ci-on-push job.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 22:02:08 +02:00
Fabiano Fidêncio
8a0a66655d gha: ci: schedule expects a list, not a map
Because of that, we need to declare '- cron' instead of 'cron'.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 22:02:08 +02:00
Fabiano Fidêncio
5c0269dc5a gha: ci: Add pr-number input to the correct job
It should have been an input for the AKS jobs, not the SNP one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 22:02:08 +02:00
Fabiano Fidêncio
de83cd9de7 gha: ci: Use $VAR instead of ${{ env.VAR }}
Otherwise we'll get the following error from the workflow:
```
The workflow is not valid. .github/workflows/ci-on-push.yaml (Line: 24,
Col: 20): Unrecognized named-value: 'env'. Located at position 1 within
expression: env.COMMIT_HASH .github/workflows/ci-on-push.yaml (Line: 25,
Col: 18): Unrecognized named-value: 'env'. Located at position 1 within
expression: env.PR_NUMBER
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 22:02:08 +02:00
Wainer Moschetta
1a4ae1ef47 Merge pull request #6953 from fidencio/topic/add-nightly-jobs
gha: Add nightly jobs
2023-07-06 14:50:10 -03:00
Gabriela Cervantes
6acce83e12 metrics: Fix the call to check_metrics function
This PR fixes the call to the check_metrics function, as KATA_HYPERVISOR
does not need to be passed.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-06 17:22:49 +00:00
David Esparza
0bd21c173a Merge pull request #7240 from dborquez/storing_metrics_artifacts
metrics: storing metrics workflow artifacts
2023-07-06 09:49:45 -06:00
Fabiano Fidêncio
152e2509ca Merge pull request #7238 from fidencio/topic/gha-run-tests-on-specific-namespace
gha: k8s: Ensure tests are running on a specific namespace
2023-07-06 17:25:00 +02:00
Fabiano Fidêncio
e067d18333 gha: Add a nightly CI job
The idea is to mimic what's been done with Jenkins and the "Green CI"
effort, but now using our GHA and the GHA infrastructure.

Fixes: #7247

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 14:39:49 +02:00
Fabiano Fidêncio
7c0de8703c gha: k8s: Ensure tests are running on a specific namespace
Let's make sure we run our tests in a specific namespace, as in case of
any kind of issue, we will just get rid of the namespace itself, which
will take care of cleaning up any leftover from failing tests.

One important thing to mention is why we can get rid of the `namespace:
${namespace}` on the tests that are already using it, so let's go
through it in parts:
* namespace: default
  We can easily get rid of this as that's the default namespace where
  pods are created, so it was a no-op so far.
* namespace: test-quota-ns
  My understanding is that we'd need this in order to get a clean
  namespace in which to set a quota.  Doing this in the namespace that's
  only used for tests should **not** cause any side-effect on the tests,
  as we're running those in serial and there are no other pods running
  in the `kata-containers-k8s-tests` namespace.

Last but not least, we're not dynamically creating namespaces, as the
tests **never** run in parallel: neither 2 tests running at the same
time, nor 2 jobs being scheduled to the same machine.

Fixes: #6864

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 14:14:50 +02:00
Fabiano Fidêncio
106e305717 gha: Create a re-usable ci.yaml file
This is based on the `ci-on-push.yaml` file, and it's called from there.
The reason for splitting it into a new file is that we can easily
introduce a `ci-nightly.yaml` file and re-use the `ci.yaml` file there
as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 13:07:59 +02:00
Fabiano Fidêncio
cc3993d860 gha: Pass event specific info from the caller workflow
Let's ensure that none of the called workflows relies on event-specific
information.

Right now, the two pieces of information we've been relying on are:
* PR number, coming from github.event.pull_request.number
* Commit hash, coming from github.event.pull_request.head.sha

As we want to add nightly jobs in the future, which will be triggered
by a different event (thus having different fields populated), we
should ensure that those fields are only used in the "top action"
that's triggered by the event.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-06 11:23:17 +02:00
David Esparza
4e396e7285 metrics: Add function keyword to helper metrics functions
Use the 'function' keyword to prevent bash aliases from colliding
with other functions' names.

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-05 20:59:21 -06:00
David Esparza
1ca17c2f70 metrics: storing metrics workflow artifacts
This PR enables storing metrics workflow artifacts in two
separate flavours: clh and qemu.

Fixes: #7239

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-05 20:57:10 -06:00
David Esparza
a3fc673121 Merge pull request #7181 from dborquez/add_blogbench_and_webtooling
metrics: Adds blogbench and webtool metrics tests
2023-07-05 20:37:33 -06:00
Gabriela Cervantes
5a61065ab7 checkmetrics: Add checkmetrics value for memory usage in qemu
This PR adds the checkmetrics value for memory usage in qemu.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-05 19:22:12 +00:00
Gabriela Cervantes
78086ed1fe checkmetrics: Add memory usage value for clh
This PR adds the memory usage value for clh.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-05 19:19:04 +00:00
Gabriela Cervantes
1c3dbafbf0 metrics: Fix function of how to retrieve multiple values
This PR fixes how the function adds up multiple PSS memory values.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-05 18:19:36 +00:00
Gabriela Cervantes
18968f428f metrics: Add function to have uniformity
This PR adds the function name before the function to have uniformity
across all the tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-05 18:15:31 +00:00
David Esparza
35d096b607 metrics: Adds blogbench and webtool metrics tests
This PR adds blogbench and webtooling metrics checks to this repo.
The function running the test intentionally returns zero; the test
will be enabled in a subsequent PR once the workflow is green.

Fixes: #7069

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-07-04 14:38:52 -06:00
Gabriela Cervantes
d8f90e89d5 metrics: Rename function at memory usage script
This PR renames the function name for the memory usage script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-04 19:58:09 +00:00
Gabriela Cervantes
b9d66e0d53 metrics: Fix double quotes variables in memory usage script
This PR uses double quotes around all the variables and applies general
fixes to the memory usage script in order to have uniformity.
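
A small illustration of why quoting matters (the `count_args` helper and
the value are hypothetical, not taken from the memory usage script): an
unquoted expansion is split on whitespace, so a value containing a space
reaches the command as several arguments.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print how many arguments it received.
count_args() {
	echo "$#"
}

test_name="memory footprint"

count_args $test_name     # unquoted: split on the space, prints 2
count_args "$test_name"   # quoted: passed as a single argument, prints 1
```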

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-04 19:51:36 +00:00
Gabriela Cervantes
476a11194a tests: Enable memory usage metrics tests
This PR enables the memory usage metrics tests for kata CI.

Fixes #7229

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-04 16:11:54 +00:00
Fabiano Fidêncio
a25d5b9807 Merge pull request #7222 from jepio/fix-dragonball-check
gha: dragonball: Correctly propagate PATH update
2023-07-04 15:59:13 +02:00
Jeremi Piotrowski
b568c7f7d8 tests/integration: Provide default value for KATA_HOST_OS
Non-AKS k8s tests (SEV/SNP/TDX) don't currently set KATA_HOST_OS, so
provide a default empty value for the variable so that those tests can
run.
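
A minimal sketch of the pattern, assuming the test scripts run with
`set -o nounset` (the commit does not say so explicitly), where an unset
variable is a fatal error; `${KATA_HOST_OS:-}` expands to an empty string
instead. The `get_host_os` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the integration scripts enable nounset, so referencing an
# unset variable would abort the script.
set -o nounset

# Hypothetical helper: print the host OS, or an empty string when
# KATA_HOST_OS is unset (e.g. on the SEV/SNP/TDX runners).
get_host_os() {
	echo "${KATA_HOST_OS:-}"
}

if [ -z "$(get_host_os)" ]; then
	echo "KATA_HOST_OS is empty, continuing with defaults"
else
	echo "KATA_HOST_OS=$(get_host_os)"
fi
```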

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-04 14:28:29 +02:00
Fabiano Fidêncio
6d2e6ed7b6 Merge pull request #7217 from likebreath/0630/clh_v33.0
versions: Upgrade to Cloud Hypervisor v33.0
2023-07-04 12:52:26 +02:00
Jeremi Piotrowski
d6e96ea06d tests/integration: Use AzureLinux instead of Mariner
as OSSKU value, to get rid of this warning when creating the AKS cluster:

WARNING: The osSKU "AzureLinux" should be used going forward instead of
"CBLMariner" or "Mariner". The osSKUs "CBLMariner" and "Mariner" will
eventually be deprecated.

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-04 12:49:07 +02:00
Jeremi Piotrowski
40c46c75ed tests/integration: Perform yq install in run_tests()
We only need to install in run_tests() so that the yq install is picked up by
kubernetes/setup.sh as well. We also need to either use (sudo &&
INSTALL_IN_GOPATH=false) || (INSTALL_IN_GOPATH=true).

Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-04 12:49:07 +02:00
Bin Liu
f214058b07 Merge pull request #7202 from wedsonaf/macros
Convert `is_allowed`, `ttrpc_error` and `sl` to functions
2023-07-04 14:23:08 +08:00
Peng Tao
f5658c7833 Merge pull request #7224 from fidencio/topic/gha-release-fix-hub-download
gha: release: Use a specific release of hub
2023-07-04 10:21:17 +08:00
GabyCT
5950df7d95 Merge pull request #7199 from GabyCT/topic/installchem
metrics: Add checkmetrics to gha-run.sh for metrics CI
2023-07-03 17:49:18 -06:00
Gabriela Cervantes
d8b8f7e94d metrics: Enable launch tests time metrics
This PR enables the launch tests metrics for kata CI.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 22:38:04 +00:00
Fabiano Fidêncio
72fd562bd6 gha: release: Use a specific release of hub
Ideally we should never use hub again, and should switch to a supported
release tool instead. However, in order to get v3.1.3 released, let's
just stick to the last released version of hub, as trying to get its
latest release is leading to:
```
curl -s "https://api.github.com/repos/github/hub/releases/latest"
{
  "message": "Moved Permanently",
  "url": "https://api.github.com/repositories/401025/releases/latest",
  "documentation_url": "https://docs.github.com/v3/#http-redirects"
}
```

And that breaks the release process. :-/
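
The pinning approach can be sketched as building the download URL from a
fixed version instead of querying `releases/latest`. This assumes v2.14.2
(hub's latest release at the time) and the upstream
`hub-<os>-<arch>-<version>.tgz` asset naming; `hub_release_url` is a
hypothetical helper, not the workflow's actual code. Passing `-L` to curl
would instead follow the `301 Moved Permanently` shown above, but pinning
keeps the release job independent of that API endpoint.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Assumption: pin the last published hub release rather than asking the
# GitHub API for "latest", whose endpoint now answers with a redirect.
HUB_VERSION="${HUB_VERSION:-2.14.2}"
HUB_ARCH="${HUB_ARCH:-linux-amd64}"

# Hypothetical helper: print the release tarball URL for a given version.
hub_release_url() {
	local version="$1"
	local arch="$2"
	echo "https://github.com/github/hub/releases/download/v${version}/hub-${arch}-${version}.tgz"
}

url="$(hub_release_url "${HUB_VERSION}" "${HUB_ARCH}")"
echo "Downloading hub from ${url}"
# The actual download step needs network access, e.g.:
#   curl -fsSL "${url}" | tar -xz
```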

Fixes: #7223

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-07-03 22:00:55 +02:00
Fabiano Fidêncio
a7340a63a4 Merge pull request #7209 from GabyCT/topic/fixbuildovmf
packaging: Fix indentation of build.sh script at ovmf
2023-07-03 20:06:29 +02:00
Gabriela Cervantes
0502354b42 checkmetrics: Add checkmetrics json for qemu
This PR adds checkmetrics json file for qemu metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 16:47:03 +00:00
Gabriela Cervantes
b481ef1883 makefile: Add -buildvcs=false flag to go build
This PR adds the -buildvcs=false flag to the go build of checkmetrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 16:42:51 +00:00
Gabriela Cervantes
e94aaed3c7 ci_worker: Add checkmetrics ci worker for cloud hypervisor
This PR adds the checkmetrics ci worker file for cloud hypervisor in
order to check the boot times limit.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 16:42:51 +00:00
Gabriela Cervantes
917576e6fb metrics: Add double quotes in all variables
This PR adds double quotes around all variables to have uniformity
across the whole gha-run.sh script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 16:42:50 +00:00
Gabriela Cervantes
cc8f0a24e4 metrics: Add checkmetrics to gha-run.sh for metrics CI
This PR adds checkmetrics installation to gha-run.sh in order to compare
results against limits as part of the metrics CI.

Fixes #7198

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-07-03 16:41:31 +00:00
Jeremi Piotrowski
477856c1e3 gha: dragonball: Correctly propagate PATH update
cargo/rust is installed in one step, so we need to write the PATH update
to GITHUB_ENV so that it becomes visible in the next steps.
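
Each step of a GitHub Actions job runs in a new shell, so an
`export PATH=...` issued while installing Rust is lost before the next
step. A minimal sketch of persisting it, assuming rustup's default
`~/.cargo/bin` location; `persist_cargo_path` is a hypothetical helper,
not the workflow's actual code. GitHub also provides `$GITHUB_PATH`:
appending a directory to that file prepends it to `PATH` for the
subsequent steps.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset

# Hypothetical helper: lines written to $GITHUB_ENV as NAME=value become
# environment variables for the following steps of the job.
persist_cargo_path() {
	local cargo_bin="${HOME}/.cargo/bin"
	export PATH="${cargo_bin}:${PATH}"       # visible in the current step
	echo "PATH=${PATH}" >> "${GITHUB_ENV}"   # visible in the next steps
}

# Outside of GitHub Actions, fall back to a scratch file so the sketch runs.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"
persist_cargo_path
```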

Fixes: #7221
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-07-03 17:05:12 +02:00
Fupan Li
b6307c2744 Merge pull request #5444 from zvonkok/vra
doc: Add documentation for the virtualization reference architecture
2023-07-03 10:14:20 +08:00
Peng Tao
c85aff7ef4 Merge pull request #6949 from zvonkok/kernel-fixes
gpu: Update kernel building to the latest changes
2023-07-03 09:53:08 +08:00
Peng Tao
581be92b25 Merge pull request #4492 from zvonkok/pcie-topology
runtime: fix PCIe topology for GPUDirect use-case
2023-07-03 09:17:12 +08:00
David Esparza
d01762dc35 Merge pull request #7174 from dborquez/add_memory_footprint_test
metrics: Add memory footprint tests
2023-06-30 16:32:10 -06:00
Fabiano Fidêncio
00b0755e3e Merge pull request #7200 from fidencio/topic/add-virtiofs-none-option
runtime: Add "none" as a shared_fs option
2023-06-30 22:45:39 +02:00
Aurélien Bombo
1c211cd730 gha: Swap asset/release in build matrix
This simply displays the asset name first in GH's UI, so that the
release name (always "test") is truncated rather than the asset name.
Makes things slightly easier to read.

e.g.

    build-asset (cloud-hypervisor-glibc, te...

instead of

    build-asset (test, cloud-hypervisor-gli...

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-30 12:51:40 -07:00
Aurélien Bombo
0152c9aba5 tools: Introduce USE_CACHE environment variable
This allows setting `USE_CACHE=no` to test building e2e during
development without having to comment out code blocks and so forth.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-30 12:51:40 -07:00
Aurélien Bombo
2b59756894 tests: Build CLH with glibc for Mariner
This enables building CLH with glibc and the mshv feature as required
for Mariner. At test time, it also configures Kata to use that CLH
flavor when running Mariner.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-30 12:51:40 -07:00
Aurélien Bombo
80c78eadce tests: Use baked-in kernel with Mariner
Mariner ships a bleeding-edge kernel that might be ahead of upstream, so
we use that to guarantee compatibility with the host.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-30 12:51:40 -07:00
Aurélien Bombo
532755ce31 tests: Build Mariner rootfs initrd
 * Adds a new `rootfs-initrd-mariner` build target.
 * Sets the custom initrd path via annotation in `setup.sh` at test
   time.
 * Adapts versions.yaml to specify a `cbl-mariner` initrd variant.
 * Introduces env variable `HOST_OS` at deploy time to enable using a
   custom initrd.
 * Refactors the image builder so that its caller specifies the desired
   guest OS.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-30 12:51:40 -07:00
Fabiano Fidêncio
6a21e20c63 runtime: Add "none" as a shared_fs option
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.

This is *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.

However, there are still use cases where users can live with just copying
those files into the guest at pod creation time, and for those there's
absolutely no need to have a shared filesystem process running with no
obvious benefit, consuming memory and even increasing the attack surface
of Kata Containers.

For the case mentioned above, we should allow users to run Kata
Containers with devmapper without actually having to use a shared file
system (making it very clear which limitations that brings), which is
already the approach taken when using Firecracker as the VMM.

Fixes: #7207

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-30 20:45:00 +02:00
Bo Chen
5681caad5c versions: Upgrade to Cloud Hypervisor v33.0
Details of this release can be found in our roadmap project as iteration
v33.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #7216

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-06-30 09:37:27 -07:00
David Esparza
b2ce8b4d61 metrics: Add memory footprint tests to the CI
This PR adds memory footprint metrics to the tests/metrics/density
folder.

Intentionally, each test exits with zero in all test cases to ensure
that the tests are green when added; they will be enabled in a
subsequent PR.

A workflow matrix was added to define hypervisor variation on
each job, in order to run them sequentially.

The launch-times test was updated to make use of the matrix
environment variables.

Fixes: #7066

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-30 09:52:27 -06:00
David Esparza
5e3f617cb6 Merge pull request #7197 from GabyCT/topic/fixfunctionname
metrics: Uniformity across function names in gha-run.sh
2023-06-30 09:37:15 -06:00
Zvonko Kaiser
d035955ef5 doc: Add documentation for the virtualization reference architecture
Fixes: #4041

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-30 12:30:37 +00:00
Zvonko Kaiser
0f454d0c04 gpu: Fixing typos for PCIe topology changes
Some comments and functions had typos and wrong capitalization.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-30 08:42:55 +00:00
Gabriela Cervantes
6bb2ea8195 packaging: Fix indentation of build.sh script at ovmf
This PR fixes the indentation of build.sh script at ovmf.

Fixes #7208

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-29 15:46:54 +00:00
Fupan Li
4288b935e1 Merge pull request #7104 from openanolis/physical/endpoint
runtime-rs: support physical endpoint using device manager
2023-06-29 14:43:44 +08:00
GabyCT
19890133e9 Merge pull request #7189 from Apokleos/direct-vol-bugfix
runtime-rs: bugfix for direct volume path's validation.
2023-06-28 12:26:22 -06:00
Wedson Almeida Filho
0504bd7254 agent: convert the sl macros to functions
There is nothing in them that requires them to be macros. Converting
them to functions allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0860fbd410 agent: convert the ttrpc_error macro to a function
There is nothing in it that requires it to be a macro. Converting it to
a function allows for better error messages.

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
0e5d6ce6d7 agent: convert the is_allowed macro to a function
Having a function allows for better error messages from the type checker
and it makes it clearer to callers what can happen. For example:

is_allowed!(req);

Gives no indication that it may result in an early return, and no simple
way for callers to modify the behaviour. It also makes it look like
ownership of `req` is being transferred.

On the other hand,

is_allowed(&req)?;

Indicates that `req` is being borrowed (immutably) and may fail. The
question mark indicates that the caller wants an early return on
failure.
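A minimal sketch of the function form, assuming a hypothetical `Request` type and a plain allowlist in place of the agent's real policy check (the names `Request`, `AgentError` and `handle` are illustrative only, not the agent's API):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a ttrpc request; only the method name matters here.
#[derive(Debug)]
struct Request {
    method: String,
}

#[derive(Debug, PartialEq)]
enum AgentError {
    Blocked(String),
}

// Function form: borrows `req` immutably and reports failure through a Result,
// so the caller decides whether to return early (with `?`) or handle the error.
fn is_allowed(req: &Request, allowed: &HashSet<&str>) -> Result<(), AgentError> {
    if allowed.contains(req.method.as_str()) {
        Ok(())
    } else {
        Err(AgentError::Blocked(req.method.clone()))
    }
}

// A handler that wants an early return on failure says so explicitly with `?`.
fn handle(req: &Request, allowed: &HashSet<&str>) -> Result<&'static str, AgentError> {
    is_allowed(req, allowed)?;
    Ok("handled")
}
```

Unlike `is_allowed!(req)`, nothing here hides a `return`: the `?` at the call site is the only place control flow can leave `handle` early.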

Fixes: #7201

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:32 -03:00
Wedson Almeida Filho
f680fc52be agent: change AGENT_CONFIG's lazy type to just AgentConfig
Since it is never modified, it doesn't really need a lock of any kind.
Removing the `RwLock` wrapper allows us to remove all `.read().await`
calls when accessing it.

Additionally, `AGENT_CONFIG` already has a static lifetime, so there is
no need to wrap it in a ref-counted heap allocation.
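A minimal sketch of a write-once, read-only global, assuming a hypothetical trimmed-down `AgentConfig` and using `std::sync::OnceLock` (Rust 1.70+) rather than whatever lazy-initialization macro the agent actually uses:

```rust
use std::sync::OnceLock;

// Hypothetical, trimmed-down stand-in for the agent configuration.
#[derive(Debug)]
struct AgentConfig {
    debug_console: bool,
    log_level: String,
}

// Initialized once and only read afterwards: no RwLock is needed, and because a
// `static` already lives for the whole program, no Arc is needed either.
static AGENT_CONFIG: OnceLock<AgentConfig> = OnceLock::new();

// Callers get a plain `&'static AgentConfig`; there is no `.read().await`.
fn agent_config() -> &'static AgentConfig {
    AGENT_CONFIG.get_or_init(|| AgentConfig {
        debug_console: false,
        log_level: "info".to_string(),
    })
}
```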

Fixes: #5409

Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
2023-06-28 14:05:27 -03:00
GabyCT
3f87d0fbfe Merge pull request #7180 from dborquez/run_ret_hypervisor_version_w_sudo
metrics: Fix retrieving hypervisor version on metrics
2023-06-28 10:54:23 -06:00
Gabriela Cervantes
beb7063683 metrics: Uniformity across function names
This PR adds the `function` keyword before the function names in order to
have uniformity across the script, as some definitions use it and some do not.

Fixes #7196

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-28 16:09:19 +00:00
Fabiano Fidêncio
c8d33da8a4 Merge pull request #7188 from jongwu/fix_vfio
runtime-rs: fix build error on AArch64
2023-06-28 15:43:14 +02:00
Jianyong Wu
1f3e837e4b runtime-rs: fix build error on AArch64
VFIO support introduced a build error on AArch64. Removing the
arch-related annotation avoids this error.

Fixes: #7187
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-06-28 07:10:43 +00:00
alex.lyn
6fd25968c6 runtime-rs: bugfix for direct volume path's validation.
The failure is mainly caused by the encoded volume path and the
mount source. The source is validated with stat, but it is encoded
and not a full path, which causes the stat of the mount source to
fail.

Fixes: #7186

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-28 10:07:07 +08:00
GabyCT
3885ba4910 Merge pull request #7173 from GabyCT/topic/addcheckm
checkmetrics: Add checkmetrics makefile and documentation
2023-06-27 16:30:44 -06:00
Gabriela Cervantes
415578cf3b docs: Add general README
This PR adds links to the unreferenced docs in the cmd path to make
them more discoverable.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-27 20:29:37 +00:00
Zhongtao Hu
c76583a08f Merge pull request #7171 from GabyCT/topic/enabletimedoc
docs: Add boot time metrics documentation
2023-06-27 10:28:56 +08:00
Zhongtao Hu
bff4672f7d runtime-rs: support physical endpoint using device manager
Use the device manager to attach the physical endpoint.

Fixes: #7103
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-06-27 10:25:51 +08:00
David Esparza
32cba7e44a metrics: Fix retrieving hypervisor version on metrics
This PR makes use of sudo to retrieve the hypervisor version.

Fixes: #7178

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-26 16:26:27 -06:00
Gabriela Cervantes
aa7946de47 checkmetrics: Add general checkmetrics documentation
This PR adds the general checkmetrics documentation for kata metrics tests.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-26 17:07:57 +00:00
Gabriela Cervantes
2fac2b72fe checkmetrics: Add checkmetrics makefile
This PR adds the checkmetrics makefile, which is used to process the
metrics JSON result files.

Fixes #7172

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-26 16:31:55 +00:00
Gabriela Cervantes
e45899ae0e docs: Add time tests documentation reference
This PR adds time tests documentation reference in the general README
for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-26 16:30:20 +00:00
Gabriela Cervantes
28130d3cef docs: Add boot time metrics documentation
This PR adds boot time metrics documentation for kata metrics tests.

Fixes #7170

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-26 16:19:28 +00:00
Zhongtao Hu
ce8e3cc091 Merge pull request #7073 from Apokleos/spdk-vol
runtime-rs: add support spdk/vhost-user based volume.
2023-06-26 11:34:44 +08:00
alex.lyn
0df2fc2702 runtime-rs: add support spdk/vhost-user based volume.
Unlike the previous approach, which required creating
/dev/xxx with mknod on the host, the new approach fully
uses the DirectVolume-related mechanism and passes the
SPDK controller to the VMM.

A user guide about using SPDK volumes when running Kata
Containers is also added; it can be found in docs/how-to.

Fixes: #6526

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-25 16:23:19 +08:00
GabyCT
4cf552c151 Merge pull request #7097 from stevenhorsman/remove-unecessary-kata-versions
static-build: Remove kata-version parameter
2023-06-23 16:53:57 -06:00
GabyCT
388b55175e Merge pull request #7056 from FuuuOverclocking/fuu/fix-console_manager
dragonball: avoid obtaining lock twice in create_stdio_console
2023-06-23 16:47:00 -06:00
GabyCT
1a80fd66a2 Merge pull request #7161 from GabyCT/topic/enablemetricslimits
metrics: Add checkmetrics for kata metrics CI
2023-06-23 16:45:16 -06:00
Gabriela Cervantes
17198089ee vendor: Add vendor checkmetrics dependencies
This PR adds the vendored dependencies for checkmetrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-23 20:55:30 +00:00
David Esparza
cfd6da9467 Merge pull request #7159 from dborquez/enable_launchtimes_test
metrics: enable launch-times test on gha-run metrics script
2023-06-23 12:59:46 -06:00
GabyCT
d6ff48f4e7 Merge pull request #7158 from GabyCT/topic/addmetricsreadme
docs: Add general metrics documentation
2023-06-23 11:28:00 -06:00
Gabriela Cervantes
f1dfea6e87 docs: Add metrics documentation reference
This PR adds the metrics documentation as a general reference in the
main README for kata containers.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-23 16:26:34 +00:00
Zvonko Kaiser
8330fb8ee7 gpu: Update unit tests
Some tests are now failing due to the changes in how PCIe is
handled. Update the tests accordingly.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-23 11:16:25 +00:00
David Esparza
8593594247 metrics: enable launch-times test on gha-run metrics script
This PR enables launch-times test on gha metrics workflow.

Fixes: #7049

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-22 18:05:46 -06:00
Fupan Li
469c678425 Merge pull request #7058 from Apokleos/vfio-dev
add support vfio device manager
2023-06-22 17:51:22 -06:00
Gabriela Cervantes
c4ee601bf4 metrics: Add checkmetrics for kata metrics CI
This PR adds the checkmetrics scripts that will be used for the kata metrics CI.

Fixes #7160

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-22 21:06:46 +00:00
Steve Horsman
267e97f9c0 Merge pull request #7162 from sprt/trusted-pr-authors
gha: Don't automatically trigger CI
2023-06-22 20:55:10 +01:00
Aurélien Bombo
e0d6475b49 gha: Don't automatically trigger CI
We have GH configured so that manual approval is required for CI runs
triggered by outside contributors. However, because CI is triggered by
the `pull_request_target` event, this setting isn't being honored
(see [1]). This means that an attacker could trivially extract secrets
by submitting a PR.

This change aims to mitigate this issue by preventing PRs from
triggering CI unless the `ok-to-test` label is set.

Note: For further context, we use the `pull_request_target` event and
manually check out the PR branch because it is the only way to both
access secrets and test incoming code changes.

Fixes: #7163

 [1]: https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-22 11:05:53 -07:00
Aurélien Bombo
b535c7cbd8 tests: Enable running k8s tests on Mariner
This removes the gate and lets CI run tests on Mariner.

Fixes: #6840

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-22 10:30:52 -07:00
Archana Shinde
2d329125fd Merge pull request #6800 from amshinde/check-vm-capability
kata-ctl: Check for vm capability
2023-06-21 23:52:46 -07:00
Zhongtao Hu
4b793222ab Merge pull request #7154 from cheriL/7153/fix_spellings
docs: fix spelling of "crate"
2023-06-22 10:48:58 +08:00
Gabriela Cervantes
71071bdb63 docs: Add general metrics documentation
This PR adds a general metrics introduction documentation for the kata CI.

Fixes #7157

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-21 17:19:36 +00:00
Archana Shinde
610f7986e4 check: Relax the unrestricted_guest check when running in a VM
When running on a VM, the kernel parameter "unrestricted_guest" for
kernel module "kvm_intel" is not required. So, return success when running
on a VM without checking value of this kernel parameter.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-21 07:30:35 -07:00
Archana Shinde
1b406b9d0c kata-ctl: Implement functionality to check host is capable of running VM
Implement functionality to report in the env output whether the host
is capable of running a VM.

Fixes: #6727

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-21 07:30:22 -07:00
David Esparza
90408d66c0 Merge pull request #7148 from GabyCT/topic/fixtabsinitscript
packaging: Fix indentation in init.sh script
2023-06-21 07:24:25 -06:00
stevenhorsman
adf88eaa89 static-build: Remove kata-version parameter
- Remove the unnecessary kata-version passed as a second parameter

Fixes: #7096
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-06-21 10:15:42 +01:00
soup
09720babc3 docs: fix spelling of "crate"
Fixes: #7153

Signed-off-by: soup <lqh348659137@outlook.com>
2023-06-21 16:10:54 +08:00
David Esparza
84b214d9d2 Merge pull request #7150 from GabyCT/topic/fixworkflows
gha: Fix gha actions
2023-06-20 18:08:23 -06:00
Gabriela Cervantes
7185afc50e gha: Fix gha actions
This PR removes an unrecognized value from one of the GHA YAML files
in order to make the CI work again.

Fixes #7149

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-20 23:13:25 +00:00
Gabriela Cervantes
21294b868d packaging: Fix indentation in init.sh script
This PR replaces single spaces with tabs in order to fix the indentation
in the init.sh script.

Fixes #7147

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-20 22:06:52 +00:00
GabyCT
90e36f43ff Merge pull request #7138 from dborquez/setup-kata-and-configure-launchtimes-test
metrics: install kata and launch-times test
2023-06-20 16:00:38 -06:00
David Esparza
fad3ac9f58 metrics: install kata and launch-times test
This PR installs the kata static tarball on the metrics runner
and runs the launch-times tests.

Fixes: #7049

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-20 13:58:09 -06:00
David Esparza
d071a87c7b Merge pull request #7109 from dborquez/add_common_libs_for_metrics
tests: Move tests helper script to this repo
2023-06-19 19:02:37 -06:00
David Esparza
4bbfcfaf15 tests: Move tests helper script to this repo
The common.sh script includes helper functions used in
our metrics tests; moving it here lets us gradually add
more of the metrics used in kata.

Fixes: #7108

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-19 12:14:25 -06:00
David Esparza
f152f0e8c3 metrics: Add launch-times to metrics tests
This test measures the duration of a workload that starts and then
immediately stops the container. It also measures the workload period,
the time-to-quit period, and the time-to-kernel period.

Fixes: #7049

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-19 10:40:16 -06:00
GabyCT
decbe77e28 Merge pull request #7129 from GabyCT/topic/metrlibjson
tests: Add json script for metrics tests
2023-06-19 09:59:41 -06:00
Fabiano Fidêncio
ef8b360711 Merge pull request #7085 from stevenhorsman/cherry-pick-initramfs
Cherry pick initramfs caching updates from CCv0
2023-06-19 11:59:00 +02:00
alex.lyn
59510cfee0 runtime-rs: add support vfio device based volume
A new option for using a VFIO device based volume with kata-containers.
With the help of kata-ctl direct-volume, users are able to add a
device specified by its BDF or IOMMU group ID.

To help users use it smoothly, a how-to doc is added in
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.

Fixes: #6525

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-18 14:07:05 +08:00
alex.lyn
1e3b372bbb runtime-rs: add support vfio device manager
Limitations:
As no Rust VMM VFIO manager is ready yet, runtime-rs only supports
part of VFIO. The remaining part is calling the VMM interfaces
related to VFIO add/remove.

When the VMM VFIO manager is ready, a new PR will be pushed to
close the gap.

Fixes: #6525

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-18 14:05:59 +08:00
David Esparza
61e819ea8e Merge pull request #7131 from GabyCT/topic/fixrunner
gha: Fix format for run launchtimes metrics yaml
2023-06-16 18:30:57 -06:00
Gabriela Cervantes
6b08489301 gha: Fix format for run launchtimes metrics yaml
This PR fixes the format of the run launchtimes metrics yaml, which
is causing the workflow to fail.

Fixes #7130

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-16 22:00:36 +00:00
Gabriela Cervantes
3cefa43e75 tests: Add json script for metrics tests
This PR adds the json script, which allows us to save the metrics results
into a JSON file that will be used in the kata containers metrics.

Fixes #7128

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-16 19:45:26 +00:00
GabyCT
7976a0ac72 Merge pull request #7114 from GabyCT/topic/libcommontests
tests: Add tests lib common script
2023-06-16 11:48:19 -06:00
Greg Kurz
27045798bf Merge pull request #7112 from gkurz/fix-virtiofsd-args
Fix deprecated virtiofsd args (go shim only)
2023-06-16 18:13:24 +02:00
Fabiano Fidêncio
6a3710055b initramfs: Build dependencies as part of the Dockerfile
This will help to not have to build those on every CI run, and rather
take advantage of the cached image.

Fixes: #7084

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c720869eef)
2023-06-16 10:58:12 +01:00
Fabiano Fidêncio
aa2380fdd6 packaging: Add infra to push the initramfs builder image
Let's add the needed infra for only building and pushing the initramfs
builder image to the Kata Containers' quay.io registry.

Fixes: #7084

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 111ad87828)
2023-06-16 10:58:12 +01:00
Fabiano Fidêncio
1c7fcc6cbb packaging: Use existing image to build the initramfs
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder for the initramfs.

This will save us some CI time.

Fixes: #7084

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ebf6c83839)
2023-06-16 10:58:12 +01:00
Greg Kurz
a43ea24dfc virtiofsd: Convert legacy -o sub-options to their -- replacement
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.

Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.

Fixes #7111

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:54 +02:00
Greg Kurz
8e00dc6944 virtiofsd: Drop -o no_posix_lock
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.

We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.

According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :

   The C implementation of the daemon has limited support for
   remote POSIX locks, restricted exclusively to non-blocking
   operations. We tried to implement the same level of
   functionality in #2, but we finally decided against it because,
   in practice most applications will fail if non-blocking
   operations aren't supported.

   Implementing support for non-blocking isn't trivial and will
   probably require extending the kernel interface before we can
   even start working on the daemon side.

There is thus no justification to pass `-o no_posix_lock` anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 11:42:39 +02:00
Greg Kurz
2a15ad9788 virtiofsd: Stop using deprecated -f option
The rust implementation of virtiofsd always runs in the foreground and
emits a deprecation warning when `-f` is passed.

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-06-16 10:30:40 +02:00
David Esparza
b9d92f4577 Merge pull request #7117 from dborquez/add_checkout_metrics_workflow
gha: Add base branch on SHA on pull request
2023-06-15 17:06:16 -06:00
Gabriela Cervantes
c3043a6c60 tests: Add tests lib common script
This PR adds the test lib common script that is going to be used
for kata containers metrics.

Fixes #7113

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-15 21:23:00 +00:00
David Esparza
b16e0de734 gha: Add base branch on SHA on pull request
The run-launchtimes-metrics workflow needs to get the commit ID
for the last commit to the head branch of the PR.

Fixes: #7116

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-15 13:11:33 -06:00
Zvonko Kaiser
72f2cb84e6 gpu: Reset cold or hot plug after overriding
If we override cold or hot plug with an annotation,
we need to reset the other plugging mechanism to NoPort;
otherwise both will be enabled.
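A minimal sketch of the override rule, with a hypothetical `PciePort` enum and `apply_override` helper standing in for the Go runtime's real types. Resetting the other mechanism only when the override actually enables a port is an assumption of this sketch:

```rust
// Hypothetical mirror of the runtime's PCIe port setting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PciePort {
    NoPort,
    BridgePort,
    RootPort,
    SwitchPort,
}

#[derive(Debug, PartialEq)]
struct PlugConfig {
    cold_plug: PciePort,
    hot_plug: PciePort,
}

// Which setting an annotation overrides.
enum Override {
    ColdPlug(PciePort),
    HotPlug(PciePort),
}

// Apply an annotation override; if it enables a port, reset the other
// mechanism to NoPort so cold plug and hot plug are never both enabled.
fn apply_override(cfg: &mut PlugConfig, ov: Override) {
    match ov {
        Override::ColdPlug(port) => {
            cfg.cold_plug = port;
            if port != PciePort::NoPort {
                cfg.hot_plug = PciePort::NoPort;
            }
        }
        Override::HotPlug(port) => {
            cfg.hot_plug = port;
            if port != PciePort::NoPort {
                cfg.cold_plug = PciePort::NoPort;
            }
        }
    }
}
```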

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:51:01 +00:00
Zvonko Kaiser
fbacc09646 gpu: PCIe topology, consider vhost-user-block in Virt
In virt, vhost-user-block is a PCIe device, so
we need to make sure to take it into account as well. We
keep track of vhost-user-block devices to deduce the correct
number of PCIe root ports.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-15 17:39:55 +00:00
GabyCT
0f24f427d7 Merge pull request #7101 from dborquez/add_initial_metrics_gh_workflow
gha: ci-on-push: Run metrics tests
2023-06-15 10:08:56 -06:00
David Esparza
bc152b1141 gha: ci-on-push: Run metrics tests
This gh-workflow prints a simple msg, but is the base for future
PRs that will gradually add the jobs corresponding to the kata
metrics test.

Fixes: #7100

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-06-14 15:15:08 -06:00
GabyCT
a3180d0cb8 Merge pull request #7095 from GabyCT/topic/updatedebugconse
docs: Update Developer Guide
2023-06-14 13:49:37 -06:00
Gabriela Cervantes
dad731d5c1 docs: Update Developer Guide
This PR updates the section of the developer guide about connecting
to the debug console.

Fixes #7094

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-14 15:36:51 +00:00
Zhongtao Hu
11692a76e1 Merge pull request #7092 from Apokleos/virtiofs-enhancement
runtime-rs: Enhance flexibility of virtio-fs config
2023-06-14 20:01:46 +08:00
Zvonko Kaiser
b11246c3aa gpu: Various fixes for virt machine type
The PCI QOM path was not deduced correctly; added a regex for correct
path walking.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:33:57 +00:00
Zvonko Kaiser
40101ea7db vfio: Added annotation for hot(cold) plug
Now it is possible to configure the PCIe topology via annotations;
also added a simple test checking for Invalid and RootPort.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
8f0d4e2612 vfio: Cleanup of Cold and Hot Plug
Removed the configuration of PCIeRootPort and PCIeSwitchPort; those
values can be deduced in createPCIeTopology.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b5c4677e0e vfio: Rearrange the bus assignment
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
b1aa8c8a24 gpu: Moved the PCIe configs to drivers
The hypervisor_state file was the wrong location for the PCIe port
settings; moved everything under the device umbrella, where it can be
consumed more easily and we do not get into circular deps.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
55a66eb7fb gpu: Add config to TOML
Update cold-plug and hot-plug settings to include bridge, root and
switch-port

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
da42801c38 gpu: Add config settings tests for hot-plug
Updated all references and config settings for hot-plug to match
cold-plug

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
de39fb7d38 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
Fixes: #4491

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 08:20:24 +00:00
Zvonko Kaiser
9318e022af gpu: Add CC related configs
For the GPU CC use case we need to set several crypto algorithms.
The driver relies on them in the CC case.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 07:56:53 +00:00
Zvonko Kaiser
b7932be4b6 gpu: Add Arm64 Kernel Settings
For different archs we need different settings; use ${ARCH} to choose
the right fragment.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 07:56:53 +00:00
Zvonko Kaiser
211b0ab268 gpu: Update Kernel Config
Newer drivers need more symbols, so let's enable them.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 07:56:53 +00:00
Zvonko Kaiser
5f103003d6 gpu: Update kernel building to the latest changes
Now use sev.conf rather than snp.conf.
Devices can be presented in two different ways in the
container: (1) as VFIO devices /dev/vfio/<num>, or
(2) managed by whatever driver in the VM kernel
claims them.

Fixes: #6844

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-06-14 07:56:53 +00:00
Fabiano Fidêncio
95bec479ca Merge pull request #7090 from GabyCT/topic/ufcversion
versions: Update firecracker version to 1.3.3
2023-06-14 01:24:02 +02:00
Fabiano Fidêncio
8aa4a87fae Merge pull request #7099 from sprt/fix-new-targets
tools: Fix no-op builds
2023-06-14 01:23:39 +02:00
Aurélien Bombo
35e4938e8c tools: Fix no-op builds
This fixes the builds of `cloud-hypervisor-glibc` and
`rootfs-initrd-mariner` to properly create the `build/` directory.

Fixes: #7098

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-13 10:56:49 -07:00
Zhongtao Hu
da8dde0c24 Merge pull request #7079 from HerlinCoder/herlincoder/vpa
runtime-rs: update Cargo.lock
2023-06-13 21:44:45 +08:00
Fabiano Fidêncio
ff38937246 Merge pull request #7087 from sprt/fix-gha-stage
gha: Fix `stage` definition in matrix
2023-06-13 12:17:25 +02:00
alex.lyn
347385b4ee runtime-rs: Enhance flexibility of virtio-fs config
Support more, and more flexible, options for inline virtio-fs.

Fixes: #7091

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-13 15:12:47 +08:00
Zhongtao Hu
355a24e0e1 Merge pull request #6289 from openanolis/runtime_vcpu_resize
feat(runtime): vcpu resize capability
2023-06-13 10:54:11 +08:00
Chelsea Mafrica
1763b1f69f Merge pull request #7082 from jodh-intel/remove-snap
packaging: Remove snap package
2023-06-12 17:05:00 -07:00
Gabriela Cervantes
21d2278539 versions: Update firecracker version to 1.3.3
This PR updates the firecracker version to 1.3.3, which includes the
following changes:
- Fixed passing through cache information from the host in CPUID leaf 0x80000006.
- Fixed a race condition identified between the API thread and the VMM
  thread due to a misconfiguration of the api_event_fd.

Fixes #7089

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-12 20:32:02 +00:00
Aurélien Bombo
0e2379909b gha: Fix stage definition in matrix
This defines `stage` as a list instead of a literal to fix the GHA CI.

Fixes: #7086

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-12 11:24:45 -07:00
Fabiano Fidêncio
977309a281 Merge pull request #7027 from sprt/sprt/mariner-build-targets
gha: Add new build targets for Mariner
2023-06-12 19:19:22 +02:00
Yushuo
ae2cfa8263 doc: add vcpu handling doc for runtime-rs
Kubernetes and Containerd will help calculate the Sandbox Size and pass it to
Kata Containers through annotations.

In order to accommodate this favorable change and remain compatible with the past,
we have implemented the handling of the number of vCPUs in runtime-rs. This is
slightly different from the original runtime-go design.

This doc introduces how we handle the vCPU size in runtime-rs.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 19:23:11 +08:00
Yushuo
7b1e67819c fix(clippy): fix clippy error
Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
67972ec48a feat(runtime-rs): calculate initial size
In this commit, we refactored the logic of static resource management.

We defined the sandbox size calculated from PodSandbox's annotation and
SingleContainer's spec as initial size, which will always be the sandbox
size when booting the VM.

The configuration static_sandbox_resource_mgmt controls whether we
modify the sandbox size in subsequent container operations.

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
aaa96c749b feat(runtime-rs): modify onlineCpuMemRequest
Some VMMs, such as Dragonball, actively perform the CPU online
operation for us when doing CPU hotplug. The old onlineCpuMem
interface makes it difficult to adapt to this situation.

So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be brought online. In the modified
semantics, nb_cpus is the number of CPUs that must be guaranteed
to be online in the guest.
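A minimal sketch contrasting the two meanings of nb_cpus, using hypothetical helper names; the receiving side is assumed to bring online only the shortfall:

```rust
// Old semantics: nb_cpus is the count of newly added CPUs to bring online,
// which is wrong if the VMM has already onlined them during hotplug.
fn cpus_to_online_old(nb_cpus: u32) -> u32 {
    nb_cpus
}

// New semantics: nb_cpus is the number of CPUs that must be online in total,
// so only the difference from what is already online needs any work.
fn cpus_to_online_new(nb_cpus: u32, currently_online: u32) -> u32 {
    nb_cpus.saturating_sub(currently_online)
}
```

With a VMM that onlines hotplugged vCPUs itself, the guest already reports the target count and the new request becomes a no-op.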

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
d66f7572dd feat(runtime-rs): clear cpuset in runtime side
When the CPU numbers declared in the cpuset exceed the actual
number of vCPUs, updating the cgroup in the guest fails.

This problem is difficult to solve, so we temporarily clear
the cpuset in the container spec before passing it to the agent.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
a0385e1383 feat(runtime-rs): update linux resource when stop_process
Update the resources when a container is deleted, which happens in
stop_process in runtime-rs.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Yushuo
a39e1e6cd1 feat(runtime-rs): merge the update_cgroups in update_linux_resources
Updating the vCPU and memory resources of the sandbox and
updating cgroups on the host always happen together, and
they are all updated based on the linux resources declarations of
all the containers.

So we merge update_cgroups into update_linux_resources, so that we
can better manage the resources allocated to one pod on the host.

Fixes: #5030

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
2023-06-12 17:53:16 +08:00
Ji-Xinyou
fa6dff9f70 feat(runtime-rs): support vcpu resizing on runtime side
Support vcpu resizing on runtime side:
1. Calculate vcpu numbers in resource_manager using all the containers'
   linux_resources in the spec.
2. Call the hypervisor(vmm) to do the vcpu resize.
3. Call the agent to online vcpus.
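A minimal sketch of step 1, assuming each container's CPU share is its CFS quota/period, that the sum is rounded up to whole vCPUs, and that the sandbox's default vCPUs are added on top; `CpuResources` and `calc_vcpus` are hypothetical names, not the resource_manager API:

```rust
// Hypothetical subset of a container's OCI linux.resources.cpu.
struct CpuResources {
    quota: Option<i64>,  // CFS quota in microseconds; None or <= 0 means unlimited
    period: Option<u64>, // CFS period in microseconds
}

// Sum every container's quota/period (in milli-CPUs, to avoid floating-point
// rounding), round up to whole vCPUs, and add the sandbox's default vCPUs.
fn calc_vcpus(containers: &[CpuResources], default_vcpus: u32) -> u32 {
    let mut milli: u64 = 0;
    for c in containers {
        if let (Some(q), Some(p)) = (c.quota, c.period) {
            if q > 0 && p > 0 {
                milli += (q as u64 * 1000) / p;
            }
        }
    }
    default_vcpus + ((milli + 999) / 1000) as u32
}
```

Unlimited containers contribute nothing here; whether they should is a policy choice this sketch does not settle.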

Fixes: #5030
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-06-12 17:53:16 +08:00
James O. D. Hunt
8cb4238b46 packaging: Remove snap package
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.

Fixes: #6769.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-06-12 09:24:09 +01:00
Helin Guo
2137739987 runtime-rs: update Cargo.lock
After we support memory resize in Dragonball, we need to update
Cargo.lock in runtime-rs.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-12 11:25:59 +08:00
Chao Wu
2988553305 Merge pull request #6998 from HerlinCoder/herlincoder/vpa
Dragonball: support resize memory
2023-06-11 17:21:12 +08:00
Archana Shinde
56d2ea9b78 kata-ctl: Refactor kernel module check
Add vhost and vhost-net to the kernel modules. These do not require
any kernel module parameters to be checked. Currently, kernel params is
a required field; make it optional. It could have been made an <Option>,
but a slice is used instead, as a module could have multiple kernel
params. Split the function that checks kernel modules into
two: one that checks whether the module is loaded and another that
checks the module parameters.
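The two-step check can be sketched as follows. This is an illustration, not the kata-ctl code: `module_loaded`, `params_ok` and `KernelParam` are hypothetical, and reading `/sys/module/<module>/parameters/<name>` is abstracted behind a closure so the logic stays testable.

```rust
/// Hypothetical required parameter: module parameter name and expected value.
pub struct KernelParam<'a> {
    pub name: &'a str,
    pub expected: &'a str,
}

/// Check 1: is `module` loaded? `proc_modules` is the text of /proc/modules,
/// where the first whitespace-separated field of each line is the module name.
/// The kernel reports dashes as underscores (vhost-net -> vhost_net).
pub fn module_loaded(proc_modules: &str, module: &str) -> bool {
    let wanted = module.replace('-', "_");
    proc_modules
        .lines()
        .filter_map(|line| line.split_whitespace().next())
        .any(|name| name == wanted)
}

/// Check 2: does every listed parameter have the expected value?
/// An empty slice passes, which is what modules such as vhost need.
pub fn params_ok<F>(module: &str, params: &[KernelParam<'_>], read_param: F) -> bool
where
    F: Fn(&str, &str) -> Option<String>,
{
    params.iter().all(|p| {
        read_param(module, p.name)
            .map(|value| value.trim() == p.expected)
            .unwrap_or(false)
    })
}
```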

Refactor some of the tests to take into account these changes.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-06-09 14:10:31 -07:00
Aurélien Bombo
9f7a45996c gha: Add rootfs-initrd-mariner build target
This adds the Mariner guest image build target to the list of assets
as preparation for #6839.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-09 11:36:42 -07:00
Aurélien Bombo
f28a62164a gha: Add cloud-hypervisor-glibc build target
This adds the glibc flavor of CLH to the list of assets as preparation
for #6839. Mariner Kata is only tested with glibc.

Fixes: #7026

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-09 11:35:50 -07:00
Fabiano Fidêncio
b50f62ce48 Merge pull request #6756 from arronwy/measured_rootfs
Port Measured rootfs feature from CCv0 branch to main
2023-06-09 12:35:05 +02:00
Helin Guo
8fb7ab7518 dragonball: introduce virtio-balloon device
We introduce the virtio-balloon device to support memory resize.
The virtio-balloon device can reclaim memory from the guest to the host.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-09 17:47:27 +08:00
Helin Guo
7ed9494973 dragonball: introduce virtio-mem device
We introduce the virtio-mem device to support memory resize. The virtio-mem
device can hot-plug more memory blocks into the guest and can also
hot-unplug them from the guest.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-09 17:47:21 +08:00
Chao Wu
c7c45626c9 Merge pull request #6973 from Apokleos/direct-vol
add support direct volume and refactor device manager
2023-06-09 11:29:00 +08:00
alex.lyn
776a15e092 runtime-rs: add support direct volume.
As block and direct volumes use similar steps for adding devices,
making full use of the block volume code is a better way to
handle direct volumes.

The only difference is that direct volumes use
DirectVolume and get_volume_mount_info to parse mountinfo.json
from the direct volume path. That is to say, direct volumes need
the help of `kata-ctl direct-volume ...`.

For details, see Advanced Topics:
[How to run Kata Containers with kinds of Block Volumes]
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.md

Fixes: #5656

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-09 08:16:26 +08:00
Helin Guo
a8e0f51c52 dragonball: extend DeviceOpContext
In order to support virtio-mem and virtio-balloon devices, we need to
extend DeviceOpContext with VmConfigInfo and InstanceInfo.

Fixes: #6719

Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
2023-06-08 22:04:31 +08:00
alex.lyn
abae114046 runtime-rs: refactor device manager implementation
The key aspects of the DM implementation refactoring are as follows:

1. Reduce duplicated code
 Many scenarios have similar steps when adding devices, so to reduce
 duplicated code, we abstract a common method and use it in various
 scenarios, do_handle_device:
 (1) new_device with DeviceConfig, returning the device_id;
 (2) try_add_device with the device_id, which actually adds the device;
 (3) return the device's info.

2. Return full info from the Device trait's get_device_info
 Replace the original type DeviceConfig with the full-info DeviceType.

3. Refactor the find_device method.
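The shape of do_handle_device can be sketched with deliberately tiny, hypothetical types; only the three-step flow (new_device, try_add_device, return the full device info) comes from the description above, and the real runtime-rs `DeviceManager` has a different, async, hypervisor-backed API.

```rust
use std::collections::HashMap;

/// Hypothetical input config (what callers hand to the device manager).
#[derive(Debug, Clone, PartialEq)]
pub enum DeviceConfig {
    Block { path: String },
}

/// Hypothetical full device info returned by get_device_info.
#[derive(Debug, Clone, PartialEq)]
pub enum DeviceType {
    Block { path: String, attached: bool },
}

#[derive(Default)]
pub struct DeviceManager {
    next_id: u64,
    devices: HashMap<String, DeviceType>,
}

impl DeviceManager {
    /// (1) Register a device from its config and return its id.
    pub fn new_device(&mut self, cfg: &DeviceConfig) -> String {
        self.next_id += 1;
        let id = format!("dev-{}", self.next_id);
        let device = match cfg {
            DeviceConfig::Block { path } => DeviceType::Block { path: path.clone(), attached: false },
        };
        self.devices.insert(id.clone(), device);
        id
    }

    /// (2) Actually attach the device (here: just flip a flag).
    pub fn try_add_device(&mut self, id: &str) -> Result<(), String> {
        match self.devices.get_mut(id) {
            Some(DeviceType::Block { attached, .. }) => {
                *attached = true;
                Ok(())
            }
            None => Err(format!("device {} not found", id)),
        }
    }

    pub fn get_device_info(&self, id: &str) -> Option<DeviceType> {
        self.devices.get(id).cloned()
    }
}

/// (3) The shared path used by every scenario that adds a device.
pub fn do_handle_device(dm: &mut DeviceManager, cfg: &DeviceConfig) -> Result<DeviceType, String> {
    let id = dm.new_device(cfg);
    dm.try_add_device(&id)?;
    dm.get_device_info(&id).ok_or_else(|| format!("device {} disappeared", id))
}
```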

Fixes: #5656

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-06-08 08:47:08 +08:00
Fabiano Fidêncio
08d10d38be Merge pull request #7048 from sprt/sprt/fix-gha
gha: Fix gha-run.sh and unbreak CI
2023-06-07 23:40:02 +02:00
James O. D. Hunt
452f286552 Merge pull request #6764 from byron-marohn/fix_5401
kata-ctl: Switch to slog logging; add --log-level and --json-logging arguments
2023-06-07 16:08:53 +01:00
Fuu
210a15794c dragonball: avoid obtaining lock twice in create_stdio_console
Fixes #7055

Signed-off-by: Fuu <fuu-open@linux.alibaba.com>
2023-06-07 16:12:22 +08:00
Aurélien Bombo
69668ce87f tests: gha-run: Use correct env variable for repo
s/DOCKER_IMAGE/DOCKER_REPO

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-06 11:54:43 -07:00
Aurélien Bombo
f487199edf gha: aks: Fix argument in call to gha-run.sh
Fixes: #7047

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-06 11:51:18 -07:00
GabyCT
5ad8aaf9df Merge pull request #7035 from GabyCT/topic/logparserdoc
log-parser: Update log parser link at README
2023-06-06 12:02:25 -06:00
Fabiano Fidêncio
de2e507483 Merge pull request #6972 from sprt/sprt/gha-run-script
gha: aks: Extract `run` commands to a script
2023-06-06 14:54:03 +02:00
Wang, Arron
f6afae9c73 packaging: Add rootfs-image-tdx-tarball target
Add rootfs-image-tdx target:
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries.sh --build=rootfs-image-tdx
./opt/kata/share/kata-containers/kata-containers-tdx.img
./opt/kata/share/kata-containers/kata-ubuntu-latest-tdx.image

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:34:20 +02:00
Wang, Arron
f62b2670c0 config: Add root hash value and measure config to kernel params
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:34:13 +02:00
Wang, Arron
0080588075 kernel: Integrate initramfs into Guest kernel
Integrate initramfs into guest kernel as one binary,
which will be measured by the firmware together.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:33:41 +02:00
Wang, Arron
28b2645624 initramfs: Add build script to generate initramfs
The init.sh in initramfs will parse the verity scheme,
roothash, root device and setup the root device accordingly.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:33:28 +02:00
Wang, Arron
5cb02a8067 image-build: generate root hash as an separate partition for rootfs
Generate rootfs hash data during creating the kata rootfs,
current kata image only have one partition, we add another
partition as hash device to save hash data of rootfs data blocks.

Fixes: #6674

Signed-off-by: Wang, Arron <arron.wang@intel.com>
2023-06-06 12:31:14 +02:00
Arron Wang
31c0ad2076 packaging: Add cryptsetup support in Guest kernel and rootfs
Add required kernel config for dm-crypt/dm-integrity/dm-verity
and related crypto config.

Add userspace command line tools for disk encryption support
and ext4 file system utilities.

Fixes: #6674

Signed-off-by: Arron Wang <arron.wang@intel.com>
2023-06-06 12:30:07 +02:00
Fabiano Fidêncio
eb1bfa922b Merge pull request #6980 from nubificus/feat_sharefs_files
runtime-rs: handle copy files when share_fs is not available
2023-06-06 12:26:55 +02:00
Chao Wu
b0c6cd05a2 Merge pull request #7033 from openanolis/fix-agent-ctl
agent-ctl: fix the compile error
2023-06-06 11:55:15 +08:00
Gabriela Cervantes
980d084f47 log-parser: Update log parser link at README
This PR updates the link to the corresponding Developer Guide section on
enabling full containerd debug in the kata 2.0 documentation.

Fixes #7034

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-06-05 15:59:52 +00:00
Yushuo
410bc18143 agent-ctl: fix the compile error
When libc is upgraded to 0.2.145, older versions of getrandom cannot adapt
to the new API, which makes agent-ctl fail to compile.

We upgrade the version of `rand`, so the old version of getrandom is no longer
needed.

Fixes: #7032

Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
2023-06-05 21:48:36 +08:00
Jayant Singh
77519fd120 kata-ctl: Switch to slog logging; add --log-level, --json-logging args
Fixes: #5401, #6654

- Switch kata-ctl from eprintln!()/println!() to structured logging via
  the logging library which uses slog.
- Adds a new create_term_logger() library call which enables printing
  log messages to the terminal via a less verbose / more human readable
  terminal format with colors.
- Adds --log-level argument to select the minimum log level of printed messages.
- Adds --json-logging argument to switch to logging in JSON format.

Co-authored-by: Byron Marohn <byron.marohn@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
2023-06-02 20:13:22 +00:00
Aurélien Bombo
aab6030962 gha: aks: Extract run commands to a script
GitHub Actions reads and runs workflow files from the main branch,
rather than from the PR branch. This means that PRs that modify workflow
files aren't being tested with the updated workflows coming from the PR,
but rather with the old workflows from the main branch. AFAIK, this
behavior isn't avoidable for workflow files (but is for other scripts).

This makes it very hard to reliably test workflow changes before they're
actually merged into main and leads to issues that we have to hotfix
(see #6983, #6995).

This PR aims to mitigate that by extracting the commands used in
workflows to a separate script file. The way our CI is set up, those
script files are read from the PR branch and thus changes would be
reflected in the CI checks.

Fixes: #6971

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-06-02 10:22:35 -07:00
Fupan Li
465f5a5ced Merge pull request #4748 from lifupan/main_fix
agent: fix the issue of exec hang with a backgroud process
2023-06-02 10:46:43 +08:00
Chao Wu
2128fa2b4e Merge pull request #7013 from xuejun-xj/xuejun/bugfix
runtime-rs: bugfix: update Cargo.lock
2023-06-02 10:08:27 +08:00
Anastassios Nanos
e4eb664d27 runtime-rs: update rust to 1.69.0
We are probably hitting this:
https://github.com/rust-lang/rust/issues/63033

It seems worth trying an upgrade to 1.69.0.

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2023-06-01 21:40:56 +00:00
Anastassios Nanos
ed37715e05 runtime-rs: handle copy files when share_fs is not available
With hypervisors that do not support virtiofs, we have to copy files into
the VM sandbox to properly set up the network (resolv.conf, hosts, and hostname).

To do that, we construct the volume as before, with the addition of an extra
variable that designates the path where the file will reside in the sandbox.

In this case, we issue a `copy_file` agent request *and* we patch the spec
to account for this change.

Fixes: #6978

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk>
2023-06-01 21:40:56 +00:00
Fabiano Fidêncio
18b1a019d4 Merge pull request #7011 from jepio/fix-aks-cluster-name
gha: aks: Use short SHA in cluster name
2023-06-01 15:56:20 +02:00
Fabiano Fidêncio
5ab42d87fb Merge pull request #7009 from fidencio/topic/display-badge-for-the-publish-artefacts-job
README: Display badge for the "Publish Artefacts" job and update the Kata Containers logo
2023-06-01 15:13:41 +02:00
Fabiano Fidêncio
eb1f44f111 Merge pull request #7007 from fidencio/topic/try-to-fix-ubuntu-k8s-key-not-available
kata-deploy: Change how we get the Ubuntu k8s key
2023-06-01 15:13:22 +02:00
xuejun-xj
5f6fc3ed76 runtime-rs: bugfix: update Cargo.lock
When dragonball update dbs-boot crate in commit
64c764c147, the Cargo.lock in runtime-rs
should also be updated.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-06-01 20:25:35 +08:00
Jeremi Piotrowski
1c6d22c803 gha: aks: Use short SHA in cluster name
The full SHA is 40 characters, while AKS cluster names have a limit of 63
characters. Trim the SHA to 12 characters, which is widely considered to be
unique enough and is short enough to be used in the cluster name.
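A small sketch of the trimming and length check. The prefix and suffix passed to `cluster_name` are hypothetical placeholders; the actual name is assembled in the GHA scripts.

```rust
/// AKS cluster names are limited to 63 characters.
pub const AKS_CLUSTER_NAME_MAX: usize = 63;

/// Keep only the first 12 characters of a commit SHA.
pub fn short_sha(sha: &str) -> &str {
    // `get` avoids panicking if the input is shorter than 12 bytes.
    sha.get(..12).unwrap_or(sha)
}

/// Build a cluster name from a hypothetical prefix/suffix and the short SHA,
/// rejecting names that would exceed the AKS limit.
pub fn cluster_name(prefix: &str, sha: &str, suffix: &str) -> Result<String, String> {
    let name = format!("{}-{}-{}", prefix, short_sha(sha), suffix);
    if name.len() > AKS_CLUSTER_NAME_MAX {
        Err(format!(
            "cluster name {:?} is {} characters, limit is {}",
            name,
            name.len(),
            AKS_CLUSTER_NAME_MAX
        ))
    } else {
        Ok(name)
    }
}
```

With a 40-character SHA, the same prefix and suffix leave 28 fewer characters of headroom, which is what made the full-SHA names overflow.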

Fixes: #7010
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-06-01 14:03:53 +02:00
Fabiano Fidêncio
3c1f6d36dc readme: Update Kata Containers logo
Let's use the horizontal logo, as it makes better use of the space we
have.

The logo comes from:
https://openinfra.dev/brand/logos

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-01 12:25:13 +02:00
Fabiano Fidêncio
3886841131 readme: Add status badge for the "Publish Artefacts" job
Let's start adding the status of our jobs as part of our main page, so
folks monitoring those can easily check whether they're okay, or if
someone has to be pinged about those.

Fixes: #7008

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-01 12:25:01 +02:00
Fabiano Fidêncio
26f7520387 kata-deploy: Change how we get the Ubuntu k8s key
The current method has been failing every now and then, and was reported
on https://github.com/kubernetes/release/issues/2862.

Ding poked me and suggested to do this change here, so here we go. :-)

Fixes: #7006

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-06-01 12:10:30 +02:00
Fabiano Fidêncio
9ec2bca101 Merge pull request #7002 from fidencio/topic/follow-up-on-7000
gha: aks: Ensure host_os is used everywhere needed
2023-06-01 08:51:27 +02:00
Fabiano Fidêncio
8cbb80da66 Merge pull request #6929 from LindaYu17/dev
kubernetes: add agnhost command in pod yaml
2023-06-01 08:39:58 +02:00
Fabiano Fidêncio
aebd3b47d9 gha: aks: Ensure host_os is used everywhere needed
We added that to create the cluster name, but I forgot to add it to
the part where we get the k8s config file, and to the part where we delete the
AKS cluster.

Fixes: #6999

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-31 20:50:55 +02:00
Fabiano Fidêncio
e01f75723a Merge pull request #6997 from singhwang/main
main | release: Standardize kata static file name
2023-05-31 15:22:30 +02:00
Fabiano Fidêncio
1ed917a079 Merge pull request #6989 from BbolroC/configurable-build-registry
packaging: make BUILDER_REGISTRY configurable
2023-05-31 15:18:51 +02:00
Fabiano Fidêncio
de22783124 Merge pull request #7000 from fidencio/topic/use-a-different-name-for-the-ubuntu-and-mariner-aks-clusters
gha: aks: Add the host_os as part of the aks cluster's name
2023-05-31 15:18:17 +02:00
Archana Shinde
141c26f307 Merge pull request #6985 from amshinde/kernel-tdx-build
kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
2023-05-31 01:57:20 -07:00
Fabiano Fidêncio
0c8282c224 gha: aks: Add the host_os as part of the aks cluster's name
We need to do so, otherwise we'll create two clusters for testing Cloud
Hypervisor with exactly the same name, one using Ubuntu, and one using
Mariner.

Fixes: #6999

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-31 05:20:04 +02:00
SinghWang
4b89a6bdac release: Standardize kata static file name
The strings representing the architectures aarch64 and x86_64 need to be changed to arm64 and amd64 for the release.
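The mapping can be sketched as a simple lookup. The `kata-static-<tag>-<arch>.tar.xz` pattern is an assumption based on the release asset names; other architectures are passed through unchanged.

```rust
/// Map `uname -m` style architecture names to the names used in release
/// artifacts; anything else (e.g. s390x) is passed through unchanged.
pub fn release_arch(arch: &str) -> &str {
    match arch {
        "x86_64" => "amd64",
        "aarch64" => "arm64",
        other => other,
    }
}

/// Hypothetical helper building a static tarball name for a release tag.
pub fn static_tarball_name(tag: &str, arch: &str) -> String {
    format!("kata-static-{}-{}.tar.xz", tag, release_arch(arch))
}
```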

Fixes: #6986
Signed-off-by: SinghWang <wangxin_0611@126.com>
2023-05-31 10:24:45 +08:00
Fabiano Fidêncio
51e42a9972 Merge pull request #6995 from sprt/sprt/fix-mariner-ci
gha: Fix Mariner cluster creation
2023-05-31 00:23:36 +02:00
Archana Shinde
9228815ad2 kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
There were recent changes for the tdx kernel in the version.yaml that are
not currently accounted for in the build-kernel.sh script.
Attempts to set up a tdx kernel to build local changes did not download
the tdx kernel. Instead, the mainline kernel, which has no tdx-related
changes, was downloaded.

The version.yaml has a new entry for tdx kernel. Use that instead for
setting up and downloading the tdx kernel.

Fixes: #6984

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-05-30 13:44:58 -07:00
Aurélien Bombo
03027a7399 gha: Fix Mariner cluster creation
While the Mariner Kata host is in preview, we need the `aks-preview`
extension to enable the `--workload-runtime KataMshvVmIsolation` flag.

Fixes: #6994

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-05-30 13:26:49 -07:00
Hyounggyu Choi
43e73bdef7 packaging: make BUILDER_REGISTRY configurable
This PR is to make an environment variable `BUILDER_REGISTRY` configurable
so that those who want to use their own registry for build can set up
the registry.

Fixes: #6988
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-05-30 14:40:02 +02:00
Fabiano Fidêncio
2e2d7243d2 Merge pull request #6983 from sprt/sprt/fix-gha-ci
gha: Unbreak CI and fix cluster creation step
2023-05-30 12:58:10 +02:00
Zhongtao Hu
8b6cb2cd75 Merge pull request #6806 from xuejun-xj/xuejun/vcpuhotplug
Dragonball: support vcpu hotplug on aarch64
2023-05-30 18:47:50 +08:00
xuejun-xj
ffe3157a46 dragonball: add arm64 patches for upcall
The vcpu hotplug/hotunplug feature is implemented with upcall. This commit
adds three patches to support the feature on aarch64. Patches:
> 0005: add support of upcall on aarch64
> 0006: skip activate offline cpus' MSI interrupt
> 0007: set the correct boot cpu number

Fixes: #6010

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
560442e6ed dragonball: add vcpu_boot_onlined vector
This commit implements the vcpu_boot_onlined vector in get_fdt_vm_info.

"boot_enabled" means whether this vcpu should be onlined at first boot.
It will be used by fdt, which writes an attribute called boot_enabled
that is handled by the guest kernel to pass the correct cpu number to
function "bringup_nonboot_cpus".
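A sketch of how such a vector can be built, assuming vCPU ids are contiguous from 0 and that the first `boot_vcpu_count` of them are the ones onlined at boot; the function name mirrors the commit but its signature here is hypothetical.

```rust
/// Per-vCPU "online at first boot" flags: ids below `boot_vcpu_count`
/// start online, the rest (up to `max_vcpu_count`) are created but left
/// offline for later hotplug.
pub fn vcpu_boot_onlined(boot_vcpu_count: u8, max_vcpu_count: u8) -> Vec<bool> {
    (0..max_vcpu_count).map(|id| id < boot_vcpu_count).collect()
}
```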

Fixes: #6010

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
e31772cfea dragonball: add support resize_vcpu on aarch64
This commit adds support for resize_vcpu on aarch64. As kvm checks
whether the vgic is initialized when calling the KVM_CREATE_VCPU ioctl, all the
vcpu fds should be created before the vm is booted.

To support the vcpu resizing scenario, we use max_vcpu_count for the
create_vcpus and setup_interrupt_controller interfaces. The
SetVmConfiguration API will ensure max_vcpu_count >= boot_vcpu_count.
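The validation and the resulting "create every vCPU fd up front" rule can be sketched as below. `VmConfig` and `vcpu_fds_to_create` are hypothetical stand-ins for the Dragonball configuration handling, and the non-zero boot count check is an added assumption.

```rust
/// Hypothetical subset of the VM configuration.
pub struct VmConfig {
    pub boot_vcpu_count: u8,
    pub max_vcpu_count: u8,
}

/// Validate the config and return how many vCPU fds to create before boot.
/// Because all vCPU fds must exist before the VM boots on aarch64, this is
/// `max_vcpu_count`, not just the boot count.
pub fn vcpu_fds_to_create(cfg: &VmConfig) -> Result<u8, String> {
    if cfg.boot_vcpu_count == 0 {
        return Err("boot_vcpu_count must be at least 1".to_string());
    }
    if cfg.max_vcpu_count < cfg.boot_vcpu_count {
        return Err(format!(
            "max_vcpu_count ({}) must be >= boot_vcpu_count ({})",
            cfg.max_vcpu_count, cfg.boot_vcpu_count
        ));
    }
    Ok(cfg.max_vcpu_count)
}
```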

Fixes: #6010

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
64c764c147 dragonball: update dbs-boot to v0.4.0
dbs-boot-v0.4.0 refactors the create_fdt interface. It simplifies the
parameters that need to be passed and abstracts them into three structs.

It also reserves some interfaces for future features: numa
passthrough and cache passthrough.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
xuejun-xj
fd9b414646 dragonball: update comment for init_microvm
Rewrite the comment of Vm::init_microvm method for aarch64.

Fixes cargo test warnings on aarch64.

Fixes: #6969

Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
2023-05-30 15:51:08 +08:00
Aurélien Bombo
af16d3fca4 gha: Unbreak CI and fix cluster creation step
This fixes the regression introduced by #6686 by properly injecting the
`--os-sku mariner --workload-runtime KataMshvVmIsolation` flags.

Error reference:
https://github.com/kata-containers/kata-containers/actions/runs/5111460297/jobs/9188819103

Fixes: #6982

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-05-29 13:32:47 -07:00
Zhongtao Hu
099b4b0d0e Merge pull request #6598 from Apokleos/sandbox_bind_mounts
runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
2023-05-28 12:00:39 +08:00
Zhongtao Hu
cb962b0dc9 Merge pull request #6702 from Apokleos/directvol-common
runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
2023-05-28 12:00:12 +08:00
Fabiano Fidêncio
44546a4a57 Merge pull request #6686 from sprt/sprt/mariner-ci
gha: Create Mariner host as part of k8s tests
2023-05-27 05:34:28 +02:00
alex.lyn
5ddc4f94c5 runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
Move get_volume_mount_info to kata-types/src/mount.rs.
This makes it a common method of DirectVolumeMountInfo
and reduces duplicated code.

Fixes: #6701

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-26 11:18:29 +08:00
Fupan Li
25d2fb0fde agent: fix the issue of exec hang with a backgroud process
When an exec process is run in the background without a tty, the
exec hangs and never terminates.

For example:

crictl exec -i <container id> sh -c 'nohup tail -f /dev/null &'

Fixes: #4747

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2023-05-26 10:56:46 +08:00
Tim Zhang
5231aff90f Merge pull request #6860 from lifupan/main
netlink: Fix the issue of update_interface
2023-05-26 10:54:07 +08:00
Aurélien Bombo
4af4ced1aa gha: Create Mariner host as part of k8s tests
The current testing setup only supports running Kata on top of an Ubuntu
host. This adds Mariner to the matrix of testable hosts for k8s
tests, with Cloud Hypervisor as a VMM.

As preparation for the upcoming PR that will change only the actual test
code (rather than workflow YAMLs), this also introduces a new file
`setup.sh` that will be used to set host-specific parameters at test
run-time.

Fixes: #6961

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2023-05-25 14:29:46 -07:00
Fabiano Fidêncio
59cefa719c Merge pull request #6965 from fidencio/topic/gha-increase-aks-creation-waiting-time
gha: Increase timeout for AKS jobs and give more time to start running the tests
2023-05-25 17:23:17 +02:00
Greg Kurz
837f7a2fe6 Merge pull request #6959 from beraldoleal/issues/6757
runtime: sending SIGKILL to qemu
2023-05-25 16:24:37 +02:00
alex.lyn
eee7aae71d runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
sandbox_bind_mounts supports several mount patterns, for example:

(1) "/path/to", default readonly mode.
(2) "/path/to:ro", same as (1).
(3) "/path/to:rw", readwrite mode.

Both configuration and annotation are supported:
(1) [runtime]
sandbox_bind_mounts=["/path/to", "/path/to:rw", "/mnt/to:ro"]
(2) annotation will also be supported, restricted as below:
io.katacontainers.config.runtime.sandbox_bind_mounts
                         = "/path/to /path/to:rw /mnt/to:ro"
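The patterns above can be parsed as sketched below, assuming the mode is whatever follows the last `:` and that the annotation value is whitespace-separated. `BindMode`, `parse_bind_mount` and `parse_bind_mounts_annotation` are hypothetical names, not the runtime-rs implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BindMode {
    ReadOnly,
    ReadWrite,
}

/// Parse one entry: "/path", "/path:ro" or "/path:rw". No suffix means read-only.
pub fn parse_bind_mount(spec: &str) -> Result<(String, BindMode), String> {
    let (path, mode) = match spec.rsplit_once(':') {
        Some((path, "ro")) => (path, BindMode::ReadOnly),
        Some((path, "rw")) => (path, BindMode::ReadWrite),
        Some((_, other)) => return Err(format!("unknown mode {:?} in {:?}", other, spec)),
        None => (spec, BindMode::ReadOnly),
    };
    if path.is_empty() {
        return Err(format!("empty path in {:?}", spec));
    }
    Ok((path.to_string(), mode))
}

/// The annotation value is a whitespace-separated list of entries.
pub fn parse_bind_mounts_annotation(value: &str) -> Result<Vec<(String, BindMode)>, String> {
    value.split_whitespace().map(parse_bind_mount).collect()
}
```

Note that in this sketch a path that itself contains `:` would be rejected, since its last segment is read as the mode.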

Fixes: #6597

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-25 20:00:25 +08:00
Fupan Li
62b2838962 Merge pull request #6846 from ZhangShuaiyi/DeviceMgrMethod
dragonball: convert BlockDeviceMgr and VirtioNetDeviceMgr functions to methods
2023-05-25 18:11:44 +08:00
QuanweiZhou
377b7735f5 Merge pull request #6872 from justxuewei/rm-virtio-devices
dragonball: Remove virtio-net and vsock devices gracefully
2023-05-25 17:08:36 +08:00
Fabiano Fidêncio
3d5d6eb361 Merge pull request #6958 from fidencio/topic/kata-deploy-improve-backup-restore
kata-deploy: Improve shim backup / restore
2023-05-25 10:54:06 +02:00
Fabiano Fidêncio
3f0735a7e8 Merge pull request #6952 from stevenhorsman/git-clone-doc-fix
doc: Update git commands
2023-05-25 10:36:08 +02:00
Fabiano Fidêncio
557b840814 gha: aks: Wait longer to start running the tests
We're still facing issues related to the time taken between deploying the
kata-deploy daemonset and starting to run the tests.

Ideally, we should solve this with a readiness probe, and that's the
approach we want to take in the future.  However, for now, let's just
make sure those tests don't get in the way of the community.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-25 10:13:19 +02:00
Fabiano Fidêncio
c04c872c42 gha: aks: Increase the timeout time
We've seen tests being aborted close to the end of the run due to the
timeout.  Let's increase it to avoid hitting such cases again.

Fixes: #6964

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-25 10:13:08 +02:00
GabyCT
8d98484230 Merge pull request #6926 from GabyCT/topic/fixtabsmerge
kata-deploy: Fix indentation on kata deploy merge script
2023-05-24 14:55:51 -06:00
Fabiano Fidêncio
428041624a kata-deploy: Improve shim backup / restore
We're currently backing up and restoring all the possible shim files
except the default one ("containerd-shim-kata-v2").

Let's ensure this is also backed up and restored.

Fixes: #6957

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-24 18:39:27 +02:00
Gabriela Cervantes
14c3f1e9f5 kata-deploy: Fix indentation on kata deploy merge script
This PR fixes the indentation in the kata deploy merge script,
which used a tab instead of spaces.

Fixes #6925

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-24 16:01:10 +00:00
Beraldo Leal
0e47cfc4c7 runtime: sending SIGKILL to qemu
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.

Sending a SIGTERM first before killing could introduce some latency
during the shutdown.

Fixes #6757.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-05-24 11:31:28 -04:00
stevenhorsman
6a0035e419 doc: Update git commands
Fix bad migrations from `go get` to `git clone` and update the cloned
directory path

Fixes: #6951
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-24 13:16:48 +01:00
Fabiano Fidêncio
7c9faab523 Merge pull request #6947 from fidencio/topic/gha-release-fix-payload-tagging
gha: release: Simplify the process for tagging the payload
2023-05-24 11:22:09 +02:00
Fabiano Fidêncio
f636c1f8a4 gha: release: Simplify the process for tagging the payload
We previously were doing:
* Create a new image on kata-deploy-ci using the commit hash of the
  latest tag
  * This was used to test on AKS, which is no longer needed as we test
    on AKS on every PR
* Create a new image on kata-deploy using the release tag and "latest"
  or "stable", by tagging the kata-deploy-ci image accordingly

As part of cfe63527c5, we broke the
workflow described above, as in the first step we would save the PKG_SHA
to be used in the second step, but that part ended up being removed.

Anyways, this back and forth is not needed anymore and we can simplify
the process by doing:
* Create a new image on kata-deploy, using:
  - The tag received as ref from the event that triggered this workflow
  - "latest" or "stable" tag, depending on whether it's a stable release
    or not
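The resulting tag selection can be sketched as follows. Whether a release counts as stable is left to the caller, since the rule for deciding that is not described here; `payload_tags` is a hypothetical helper.

```rust
/// Tags for the kata-deploy image: the tag from the triggering ref plus
/// "stable" or "latest".
pub fn payload_tags(git_ref: &str, is_stable: bool) -> Vec<String> {
    // For "refs/tags/<tag>" this keeps everything after the second '/',
    // like `cut -d/ -f3-`; refs without slashes are used as-is.
    let tag = git_ref.splitn(3, '/').nth(2).unwrap_or(git_ref);
    let moving = if is_stable { "stable" } else { "latest" };
    vec![tag.to_string(), moving.to_string()]
}
```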

Fixes: #6946

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-24 08:54:43 +02:00
Fabiano Fidêncio
01827911f4 Merge pull request #6943 from fidencio/topic/gha-login-dont-specify-the-registry-if-using-docker-io
gha: release: login-action: Don't specify docker.io registry
2023-05-24 07:33:12 +02:00
Fabiano Fidêncio
1c9ad4435a Merge pull request #6939 from GabyCT/topic/updatenydus
versions: Update nydus version to 2.2.1
2023-05-24 00:12:57 +02:00
Fabiano Fidêncio
d10c9be603 gha: release: login-action: Don't specify docker.io registry
For some bizarre reason, the login-action will simply fail to
authenticate to docker.io if it's specified as a registry.  The way to
proceed, instead, is to *not* specify any registry, as docker.io is used by
default.

Fixes: #6943

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-23 22:38:12 +02:00
Fabiano Fidêncio
9aae333343 Merge pull request #6871 from kmjohansen/bugfix/ptmx
runtime: make debug console work with sandbox_cgroup_only
2023-05-23 22:24:51 +02:00
Fabiano Fidêncio
df77fefce8 Merge pull request #6941 from fidencio/3.2.0-alpha3-branch-bump
# Kata Containers 3.2.0-alpha3
2023-05-23 22:21:03 +02:00
Fabiano Fidêncio
c54363114d release: Kata Containers 3.2.0-alpha3
- release: Fix `docker/login-action` version

f3702268d release: Fix `docker/login-action` version

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-23 18:39:16 +02:00
Fabiano Fidêncio
c7a77f980b Merge pull request #6935 from fidencio/topic/release-fix-docker-login-action-version
release: Fix `docker/login-action` version
2023-05-23 18:35:03 +02:00
Gabriela Cervantes
0b1c5ea5bb versions: Update nydus version to 2.2.1
This PR updates the nydus version to 2.2.1. This change includes:
- nydus-image: fix a underflow issue in get_compressed_size()
- backport fix/feature to stable 2.2
- [backport] contrib: upgrade runc to v1.1.5
- service: add README for nydus-service
- nydus: fix a possible panic caused by SubCmdArgs::is_present
- Backports two bugfixes from master into stable/v2.2
- [backport stable/v2.2] action: upgrade golangci-lint to v1.51.2
- [backport] action: fix smoke test for branch pattern

Fixes #6938

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-23 15:39:04 +00:00
Fabiano Fidêncio
f3702268d1 release: Fix docker/login-action version
`docker/login-action@v3` does *not* exist and `docker/login-action@v2`
should be used instead.

Fixes: #6934

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-23 14:11:03 +02:00
Fabiano Fidêncio
c82ac57e30 Merge pull request #6930 from fidencio/3.2.0-alpha2-branch-bump
# Kata Containers 3.2.0-alpha2
2023-05-23 13:50:58 +02:00
Linda Yu
433b5add4a kubernetes: add agnhost command in pod yaml
Fixes: #6928

Signed-off-by: Linda Yu <linda.yu@intel.com>
2023-05-23 18:11:45 +08:00
Fupan Li
170336517f Merge pull request #5441 from openanolis/device_manager_dev
runtime-rs: device manager for runtime-rs
2023-05-23 16:50:07 +08:00
Fabiano Fidêncio
fc09d0f5dd release: Kata Containers 3.2.0-alpha2
- Fix cache for OVMF and rootfs-initrd (both x86_64)
- Upgrade to Cloud Hypervisor v32.0
- osbuilder: Bump fedora image version
- local-build: Standardise what's set for the local build scripts
- gha: aks: Wait a little bit more before run the tests
- docs: Update container network model url
- gha: release: Fix s390x worklow
- cache: Fix OVMF caching
- gha: payload-after-push: Pass secrets down
- tools: Fix arch bug

22154e0a3 cache: Fix OVMF tarball name for different flavours
b7341cd96 cache: Use "initrd" as `initrd_type` to build rootfs-initrd
b8ffcd1b9 osbuilder: Bump fedora image version
636539bf0 kata-deploy: Use apt-key.gpg from k8s.io
ae24dc73c local-build: Standardise what's set for the local build scripts
35c3d7b4b runtime: clh: Re-generate the client code
cfee99c57 versions: Upgrade to Cloud Hypervisor v32.0
ad324adf1 gha: aks: Wait a little bit more before run the tests
191b6dd9d gha: release: Fix s390x worklow
cfd8f4ff7 gha: payload-after-push: Pass secrets down
75330ab3f cache: Fix OVMF caching
a89b44aab tools: Fix arch bug
11a34a72e docs: Update container network model url

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-23 09:06:44 +02:00
Fabiano Fidêncio
160d9aae4d Merge pull request #6918 from fidencio/topic/fix-cache-x86_64-ovmf-rootfs-initrd
Fix cache for OVMF and rootfs-initrd (both x86_64)
2023-05-22 21:34:56 +02:00
Zhongtao Hu
4719802c8d runtime-rs: add virtio-blk-mmio
add virtio-blk-mmio option for dragonball

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:58:10 +08:00
Zhongtao Hu
f9bded4484 runtime-rs: add devicetype enum
use device type to store the config information for different kinds of
devices

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:55:35 +08:00
Zhongtao Hu
6800d30fdb runtime-rs: remove device
Support removing devices after the container stops

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:22 +08:00
Zhongtao Hu
f16012a1eb runtime-rs: support linux device
support linux device in runtime-rs

Fixes:#5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:13 +08:00
Zhongtao Hu
fe9ec67644 runtime-rs: block volume
support block volume in runtime-rs

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:54:04 +08:00
Zhongtao Hu
a8bfac90b1 runtime-rs: support block rootfs
support devmapper for block rootfs

Fixes: #5375

Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:30 +08:00
Zhongtao Hu
b076d46db3 agent: handle hotplug virtio-mmio device
As Dragonball supports hotplugging virtio-mmio devices, the agent needs to handle them.

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:22 +08:00
Zhongtao Hu
6e273d6ccc runtime-rs: implement trait for vhost-user device
Add the trait implementation for vhost-user devices.

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-05-23 00:53:16 +08:00
Zhongtao Hu
cc9c915384 runtime-rs: implement trait for vfio device
Add the trait implementation for VFIO devices.

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:10 +08:00
Archana Shinde
2c9efbe04c Merge pull request #6907 from likebreath/0519/clh_v32.0
Upgrade to Cloud Hypervisor v32.0
2023-05-22 09:53:05 -07:00
Zhongtao Hu
e4c5c74a75 runtime-rs: device manager
Add a device manager to runtime-rs, along with a block device handler
for it.

Fixes: #5375
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-05-23 00:53:04 +08:00
Fabiano Fidêncio
22154e0a3b cache: Fix OVMF tarball name for different flavours
75330ab3f9 tried to fix OVMF caching, but
didn't consider that the "vanilla" OVMF tarball name is not
"kata-static-ovmf-x86_64.tar.xz", but rather "kata-static-ovmf.tar.xz".

Missing that caused the OVMF cache builds to fail, and forced the
component to be built on every single PR.
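
Below is a minimal Go sketch of the naming rules described above. It
combines this commit with 75330ab3f9. `ovmfCacheNames` is a hypothetical
helper; the real logic lives in the shell build scripts. The suffix rule
for other flavours such as "sev" is an assumption, and the TDVF special
case is deliberately left out.

```go
package main

import "fmt"

// ovmfCacheNames is a hypothetical helper, not the real Kata build script.
// It derives the versions.yaml key and the cached tarball name from the
// OVMF flavour, using the flavour exactly as passed.
func ovmfCacheNames(flavour string) (versionKey, tarball string) {
	// The version is looked up under the flavour as passed, e.g.
	// externals.ovmf.x86_64.version for the vanilla build.
	versionKey = fmt.Sprintf("externals.ovmf.%s.version", flavour)
	if flavour == "x86_64" {
		// The vanilla tarball carries no flavour suffix.
		tarball = "kata-static-ovmf.tar.xz"
	} else {
		// Assumption: other non-TDVF flavours (e.g. "sev") get a suffix.
		tarball = fmt.Sprintf("kata-static-ovmf-%s.tar.xz", flavour)
	}
	return versionKey, tarball
}

func main() {
	for _, f := range []string{"x86_64", "sev"} {
		key, tarball := ovmfCacheNames(f)
		fmt.Printf("%s: %s -> %s\n", f, key, tarball)
	}
}
```

Keeping the version key and the tarball name in one place makes the
vanilla/flavoured mismatch fixed here easy to spot.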

Fixes: #6917 (hopefully for good this time).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-22 18:12:30 +02:00
Fabiano Fidêncio
b7341cd968 cache: Use "initrd" as initrd_type to build rootfs-initrd
We've been defaulting to "", which led to a mismatch with the latest
version in the cache. That caused a cache miss and, in the end, forced
the rootfs-initrd to be built as part of the tests every single time.

Fixes: #6917

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-22 18:12:30 +02:00
Fabiano Fidêncio
a28cefd538 Merge pull request #6924 from stevenhorsman/fedora-bump
osbuilder: Bump fedora image version
2023-05-22 18:10:57 +02:00
Fabiano Fidêncio
7f350d3ec6 Merge pull request #6913 from fidencio/topic/gha-build-and-upload-payload-can-silently-fail
local-build: Standardise what's set for the local build scripts
2023-05-22 18:04:51 +02:00
stevenhorsman
b8ffcd1b9b osbuilder: Bump fedora image version
- Swap out an EoL Fedora image for the latest one

Fixes: #6923
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-22 13:48:00 +01:00
Fabiano Fidêncio
636539bf0c kata-deploy: Use apt-key.gpg from k8s.io
We're facing issues downloading / using the public key provided by
Google for installing Kubernetes as part of the kata-deploy image.
```
The following signatures couldn't be verified because the public key is
not available: NO_PUBKEY B53DC80D13EDEF05
Reading package lists... Done
W: GPG error: https://packages.cloud.google.com/apt kubernetes-xenial
   InRelease: The following signatures couldn't be verified because the
   public key is not available: NO_PUBKEY B53DC80D13EDEF05 E: The
   repository 'https://apt.kubernetes.io kubernetes-xenial InRelease' is
   not signed.
N: Updating from such a repository can't be done securely, and is
   therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user
   configuration details.
```

Let's work around this by following the suggestion made by @dims, at:
https://github.com/kubernetes/k8s.io/pull/4837#issuecomment-1446426585

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-22 11:06:01 +02:00
Fabiano Fidêncio
ae24dc73c1 local-build: Standardise what's set for the local build scripts
There's a discrepancy in what's set across the scripts used to build the
Kata Containers artefacts locally.

Some of them lacked an easy way to debug failures. One in particular
(build-and-upload-payload.sh) could actually fail silently.

All of them have been changed as part of this commit.

Fixes: #6908

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-22 08:36:01 +02:00
Steve Horsman
a2e69c5b66 Merge pull request #6906 from fidencio/topic/gh-aks-wait-a-little-more-before-start-the-tests
gha: aks: Wait a little bit more before run the tests
2023-05-20 08:01:20 +01:00
GabyCT
6796af511b Merge pull request #6890 from GabyCT/topic/fixurlvirt
docs: Update container network model url
2023-05-19 15:10:26 -06:00
Bo Chen
35c3d7b4bc runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v32.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #6632

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-05-19 12:49:45 -07:00
Bo Chen
cfee99c577 versions: Upgrade to Cloud Hypervisor v32.0
Details of this release can be found in our roadmap project as iteration
v32.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #6682

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-05-19 12:11:13 -07:00
Steve Horsman
98fa436627 Merge pull request #6904 from fidencio/topic/gha-fix-s390x-release-workflow
gha: release: Fix s390x workflow
2023-05-19 19:00:57 +01:00
Steve Horsman
d5355dee20 Merge pull request #6898 from fidencio/topic/fix-ovmf-caching
cache: Fix OVMF caching
2023-05-19 18:24:51 +01:00
Fabiano Fidêncio
dfa9301eac Merge pull request #6900 from fidencio/topic/gha-fix-payload-after-push
gha: payload-after-push: Pass secrets down
2023-05-19 17:23:00 +02:00
Fabiano Fidêncio
ad324adf1d gha: aks: Wait a little bit more before run the tests
fa832f4709 increased the timeout, which
helped a lot, mainly in the TEE machines.  However, we're still seeing
some failures here and there with the AKS tests.

Let's bump it yet again and, hopefully, the errors when starting the
tests will go away.

Fixes: #6905

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-19 16:40:35 +02:00
Fabiano Fidêncio
191b6dd9dd gha: release: Fix s390x workflow
GitHub is warning us that:
"""
The workflow is not valid. In .github/workflows/release.yaml (Line: 21,
Col: 11): Error from called workflow
kata-containers/kata-containers/.github/workflows/release-s390x.yaml@d2e92c9ec993f56537044950a4673e50707369b5
(Line: 14, Col: 12): Job 'kata-deploy' depends on unknown job
'create-kata-tarball'.
"""

This happens because we need to reference
"build-kata-static-tarball-s390x" instead of "create-kata-tarball".

Fixes: #6903

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-19 16:21:49 +02:00
Fabiano Fidêncio
cfd8f4ff76 gha: payload-after-push: Pass secrets down
The "build-assets-${arch}" jobs need to have access to the secrets in
order to log into the container registry in the cases where
"push-to-registry", which is used to push the builder containers to
quay.io, is set to "yes".

Now that "build-assets-${arch}" pass the secrets down, we need to log
into the container registry in the "build-kata-static-tarball-${arch}"
files, in case "push-to-registry" is set to "yes".

Fixes: #6899

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-19 15:00:06 +02:00
Fabiano Fidêncio
7abae8ee9c Merge pull request #6896 from stevenhorsman/firecracker-arch-case
tools: Fix arch bug
2023-05-19 14:26:14 +02:00
Fabiano Fidêncio
75330ab3f9 cache: Fix OVMF caching
OVMF has been cached, but the cached version has never been used, as the
`version` set in the cached builds has always been empty.

The reason is that we've been looking for
`externals.ovmf.ovmf.version`, while we should actually be looking for
`externals.ovmf.x86_64.version`.

Setting `x86_64` as the OVMF_FLAVOUR would cause another bug, as the
expected tarball name would then be `kata-static-x86_64.tar.xz`, instead
of `kata-static-ovmf-x86_64.tar.xz`.

With all that said, let's simplify the OVMF_FLAVOUR usage by using it
as passed, and only adapting the tarball name for the TDVF case,
which is the unusual one.

Fixes: #6897

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-19 14:00:39 +02:00
Fabiano Fidêncio
d2e92c9ec9 Merge pull request #6892 from fidencio/3.2.0-alpha1-branch-bump
# Kata Containers 3.2.0-alpha1
2023-05-19 12:31:33 +02:00
stevenhorsman
a89b44aabf tools: Fix arch bug
Fix mismatched case of `arch`

Fixes: #6895
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-19 09:28:22 +01:00
Fabiano Fidêncio
f527f614c1 release: Kata Containers 3.2.0-alpha1
- runtime: Use static_sandbox_resource_mgmt=true for TEEs
- update tokio dependency
- resource-control: fix setting CPU affinities on Linux
- runtime: use enable_vcpus_pinning from toml
- gha: k8s: Make the tests more reliable
- gha: Enable SEV-SNP tests on main
- gha: tdx: Use the k3s overlay for kata-cleanup
- runtime: Port sev package to main
- gpu: Rename the last bits from `gpu` to `nvidia-gpu`
- deploy: fix shell script error
- ppc64le: switch virtiofsd from C to rust version
- osbuilder: Fix indentation in rootfs.sh
- virtcontainers/qemu_test.go: Improve coverage
- agent: Add context to errors that may occur when AgentConfig file is …
- virtcontainers/pkg/compatoci/: Improved coverage for Kata 2.0
- kata-manager: Fix '-o' syntax and logic error
- kata-ctl:  Add the option to install kata-ctl to a user specified directory
- runtime-rs: fix building instructions to use correct required Rust ve…
- Dragonball: use LinuxBootConfigurator::write_bootparams
- kata-deploy: Add http_proxy as part of the docker build
- kata-deploy: Do not ship the kata tarball
- kata-deploy: Build improvements
- deploy: Fix arch in image tag
- Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
- virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
- main | release: Fix multi-arch publishing is not supported
- cache: More fixes to nvidia-gpu kernels caching
- runtime: remove overriding ARCH value by default for ppc64le
- gha: Fix Body Line Length action flagging empty body commit messages
- gha: Fix snap creation workflow
- cache: Fix nvidia-gpu version
- cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
- packaging: Add SEV-SNP artifacts to main
- docs: Mark snap installation method as unmaintained
- packaging: Add sev artifacts to main
- kata-ctl: add generic kvm check & unit test
- Log-parser-rs
- warning_fix: fix warnings when build with cargo-1.68.0
- cross-compile: Include documentation and configuration for cross-compile
- runtime: Fix virtiofs fd leak
- gpu: cold plug VFIO devices
- pkg/signals: Improved test coverage 60% to 100%
- virtcontainers/persist: Improved test coverage 65% to 87.5%
- virtcontainers/clh_test.go: improve unit test coverage
- virtcontainers/factory: Improved test coverage
- gha: Also run k8s tests on qemu-snp
- gha: sev: fix for kata-deploy error
- gha: Also run k8s tests on qemu-sev
- Implement the "kata-ctl env" command
- runtime-rs: support keep_abnormal in toml config
- gpu: Build and Ship a GPU enabled Kernel
- kata-ctl: checks for kvm, kvm_intel modules loaded
- osbuilder: Fix D-Bus enabling in the dracut case
- snap: fix docker start fail issue
- kata-manager: Fix containerd download
- agent: Fix ut issue caused by fd double closed
- Bump ttrpc to 0.7.2 and protobuf to 3.2.0
- gpu: Add GPU enabled configuration and runtime
- gpu: Do not pass-through PCI (Host) Bridges
- cache-components: Fix caching of TDVF and QEMU for TDX
- gha: tdx: Ensure kata-deploy is removed after the tests run
- versions: Upgrade to Cloud Hypervisor v31.0
- osbuilder: Enable dbus in the dracut case
- runtime: Don't create socket file in /run/kata
- nydus_rootfs/prefetch_files: add prefetch_files for RAFS
- runtime-rs/virtio-fs: add support extra handler for cache mode.
- runtime-rs: enable nerdctl to setup cni plugin
- tdx: Add artefacts from the latest TDX tools release into main
- runtime: support non-root for clh
- gha: ci-on-push: Run k8s tests with dragonball
- rustjail: Use CPUWeight with systemd and CgroupsV2
- gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
- docs: update the rust version from version.yaml
- gha: k8s-on-aks: Set {create,delete}_aks as steps
- gha: k8s-on-aks: Fix cluster name
- gha: Also run k8s tests on AKS with dragonball
- gha: Only push images to registry after merging a PR
- gha: aks: Use D4s_v5 instance
- tools: Avoid building the kernel twice
- rustjail: Fix panic when cgroup manager fails
- runtime: add filter metrics with specific names
- gha: Use ghcr.io for the k8s CI
- GHA | Switch "kubernetes tests" from Jenkins to GitHub Actions
- docs: Update CNM url in networking document
- kata-ctl: add function to get platform protection.

f6e1b1152 agent: update tokio dependency
4cb83dc21 kata-ctl: update tokio dependency
df615ff25 runk: update tokio dependency
ca6892ddb runtime-rs: update tokio dependency
ca1531fe9 runtime: Use static_sandbox_resource_mgmt=true for TEEs
fa832f470 gha: k8s: Make the tests more reliable
cbb9fe8b8 config: Use standard OVMF with SEV
724437efb kata-deploy: add kata-qemu-sev runtimeclass
521dad2a4 Tests: skip CPU constraints test on SEV and SNP
72308ddb0 gha: ci-on-push: Don't skip tests for SEV
da0f92cef gha: ci-on-push: Don't skip tests for SEV-SNP
12f43bea0 gha: tdx: Use the k3s overlay for kata-cleanup
1a3f8fc1a deploy: fix shell script error
87cb98c01 osbuilder: Fix indentation in rootfs.sh
c5a59caca ppc64le: switch virtiofsd from C to rust version
bfdf0144a versions: Bump virtiofsd to 1.6.1
dd7562522 runtime: pkg/sev: Add kbs utility package for SEV pre-attestation
05de7b260 runtime: Add sev package
3a9d3c72a gpu: Rename the last bits from `gpu` to `nvidia-gpu`
4cde844f7 local-build: Fix kernel-nvidia-gpu target name
593840e07 kata-ctl: Allow INSTALL_PATH= to be specified
bdb75fb21 runtime: use enable_vcpus_pinning from toml
20cb87508 virtcontainers/qemu_test.go: Improve test coverage
b9a1db260 kata-deploy: Add http_proxy as part of the docker build
3e85bf5b1 resource-control: fix setting CPU affinities on Linux
5f3f844a1 runtime-rs: fix building instructions with respect to required Rust version
777c3dc8d kata-deploy: Do not ship the kata tarball
50cc9c582 tests: Improve coverage for virtcontainers/pkg/compatoci/ for Kata 2.0
136e2415d static-build: Download firecracker instead of building it
3bf767cfc static-build: Adjust ARCH for nydus
ac88d34e0 static-build: Use released binary for CLH (aarch64)
73913c8eb kata-manager: Fix '-o' syntax and logic error
2856d3f23 deploy: Fix arch in image tag
e8f81ee93 Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
cfe63527c release: Fix multi-arch publishing is not supported
197c33651 Dragonball: use LinuxBootConfigurator::write_bootparams to writes the boot parameters into guest memory.
4d17ea4a0 cache: Fix nvidia-snp caching version
a133fadbf cache: Fix nvidia-gpu-tdx-experimental cache URL
b9990c201 cache: Fix nvidia-gpu version
c9bf7808b cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
3665b4204 gpu: Rename `gpu` targets to `nvidia-gpu`
2c90cac75 local-build: fixup alphabetization
4da6eb588 kata-deploy: Add qemu-snp shim
14dd05375 kata-deploy: add kata-qemu-snp runtimeclass
0bb37bff7 config: Add SNP configuration
af7f2519b versions: update SEV kernel description
dbcc3b5cc local-build: fix default values for OVMF build
b8bbe6325 gha: build OVMF for tests and release
cf0ca265f local-build: Add x86_64 OVMF target
db095ddeb cache: add SNP flavor to comments
f4ee00576 gha: Build and ship QEMU for SNP
7a58a91fa docs: update SNP guide
879333bfc versions: update SNP QEMU version
38ce4a32a local-build: add support to build QEMU for SEV-SNP
5f8008b69 kata-ctl: add unit test for kvm check
a085a6d7b kata-ctl: add generic kvm check
772d4db26 gha: Build and ship SEV initrd
45fa36692 gha: Build and ship SEV OVMF
4770d3064 gha: Build and ship SEV kernel.
fb9c1fc36 runtime: Add qemu-sev config
813e4c576 runtimeClasses: add sev runtime class
af18806a8 static-build: Add caching support to sev ovmf
76ae7a3ab packaging: adding caching capability for kernel
12c5ef902 packaging: add support to build OVMF for SEV
b87820ee8 packaging: add support to build initrd for sev
e1f3b871c docs: Mark snap installation method as unmaintained
022a33de9 agent: Add context to errors when AgentConfig file is missing
b0e6a094b packaging: Add sev kernel build capability
a4c0303d8 virtcontainers: Fixed static checks for improved test coverage for fc.go
8495f830b cross-compile: Include documentation and configuration for cross-compile
13d7f39c7 gpu: Check for VFIO port assignments
6594a9329 tools: made log-parser-rs
03a8cd69c virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
9e2b7ff17 gha: sev: fix for kata-deploy error
5c9246db1 gha: Also run k8s tests on qemu-snp
c57a44436 gha: Add the ability to test qemu-snp
406419289 env: Utilize arch specific functionality to get cpu details
fb40c71a2 env: Check for root privileges
1016bc17b config: Add api to fetch config from default config path
b908a780a kata-env: Pass cmd option for file path
b1920198b config: Workaround the way agent and hypervisor configs are fetched
f2b2621de kata-env: Implement the kata-env command.
c849bdb0a gha: Also run k8s tests on qemu-sev
6bf1fc605 virtcontainers/factory: Improved test coverage
0d49ceee0 gha: Fix snap creation workflow warnings
138ada049 gpu: Cold Plug VFIO toml setting
defb64334 runtime: remove overriding ARCH value by default for ppc64le
f7ad75cb1 gpu: Cold-plug extend the api.md
0fec2e698 gpu: Add cold-plug test
f2ebdd81c utils: Get rid of spurious print statement left behind.
9a94f1f14 make: Export VERSION and COMMIT
2f81f48da config: Add file under /opt as another location to look for the config
07f7d17db config: Make the pipe_size field optional
68f635773 config: Make function to get the default conf file public
7565b3356 kata-ctl: Implement Display trait for GuestProtection enum
94a00f934 utils: Make certain constants in utils.rs public
572b338b3 gitignore: Ignore .swp and .swo editor backup files
376884b8a cargo: Update version of clap to 4.1.13
17daeb9dd warning_fix: fix warnings when build with cargo-1.68.0
521519d74 gha: Add the ability to test qemu-sev
205909fbe runtime: Fix virtiofs fd leak
5226f15c8 gha: Fix Body Line Length action flagging empty body commit messages
0f45b0faa virtcontainers/clh_test.go: improve unit test coverage
dded731db gpu: Add OVMF setting for MMIO aperture
2a830177c gpu: Add fwcfg helper function
131f056a1 gpu: Extract VFIO Functions to drivers
c8cf7ed3b gpu: Add ColdPlug of VFIO devices with devManager
e2b5e7f73 gpu: Add Rawdevices to hypervisor
6107c32d7 gpu: Assign default value to cold-plug
377ebc2ad gpu: Add configuration option for cold-plug VFIO
c18ceae10 gpu: Add new struct PCIePort
9c38204f1 virtcontainers/persist: Improved test coverage 65% to 87.5%
1c1ee8057 pkg/signals: Improved test coverage 60% to 100%
cc8ea3232 runtime-rs: support keep_abnormal in toml config
96e8470db kata-manager: Fix containerd download
432d40744 kata-ctl: checks for kvm, kvm_intel modules loaded
b1730e4a6 gpu: Add new kernel build option to usage()
3e7b90226 osbuilder: Fix D-Bus enabling in the dracut case
53c749a9d agent: Fix ut issue caused by fd double closed
2e3f19af9 agent: fix clippy warnings caused by protobuf3
4849c56fa agent: Fix unit test issue caused by protobuf upgrade
0a582f781 trace-forwarder: remove unused crate protobuf
73253850e kata-ctl: remove unused crate ttrpc
76d2e3054 agent-ctl: Bump ttrpc from 0.6.0 to 0.7.1
eb3d20dcc protocols: Add ut for Serde
59568c79d protocols: add support for Serde
a6b4d92c8 runtime-rs: Bump ttrpc from 0.6.0 to 0.7.1
ac7c63bc6 gpu: Add containerd shim for qemu-gpu
a0cc8a75f gpu: Add a kube runtime class
a81fff706 gpu: Adding a GPU enabled configuration
8af6fc77c agent: Bump ttrpc from 0.6.0 to 0.7.1
009b42dbf protocols: Fix unit test
392732e21 protocols: Bump ttrpc from 0.6.0 to 0.7.1
f4f958d53 gpu: Do not pass-through PCI (Host) Bridges
825e76948 gpu: Add GPU support to default kernel without any TEE
e4ee07f7d gpu: Add GPU TDX experimental kernel
a1272bcf1 gha: tdx: Fix typo overlay -> overlays
3fa0890e5 cache-components: Fix TDVF caching
80e3a2d40 cache-components: Fix TDX QEMU caching
87ea43cd4 gpu: Add configuration fragment
aca6ff728 gpu: Build and Ship a GPU enabled Kernel
dc662333d runtime: Increase the dial_timeout
eb1762e81 osbuilder: Enable dbus in the dracut case
f478b9115 clh: tdx: Update timeouts for confidential guest
3b76abb36 kata-deploy: Ensure node is ready after CRI Engine restart
5ec9ae0f0 kata-deploy: Use readinessProbe to ensure everything is ready
ea386700f kata-deploy: Update podOverhead for TDX
e31efc861 gha: tdx: Use the k3s overlay
542bb0f3f gha: tdx: Set KUBECONFIG env at the job level
d7fdf19e9 gha: tdx: Delete kata-deploy after the tests finish
da35241a9 tests: k8s: Skip k8s-cpu-ns when testing TDX
db2cac34d runtime: Don't create socket file in /run/kata
6d315719f snap: fix docker start fail issue
e4b3b0887 gpu: Add proper CONFIG_LOCALVERSION depending on TEE
69ba2098f runtime-rs: remove network entities and netns
b31f103d1 runtime-rs: enable nerdctl cni plugin
69d7a959c gha: ci-on-push: Run tests on TDX
5a0727ecb kata-deploy: Ship kata-qemu-tdx runtimeClass
98682805b config: Add configuration for QEMU TDX
3e1580019 govmm: Directly pass the firmware using -bios with TDX
3c5ffb0c8 govmm: Set "sept-ve-disable=on"
ed145365e runtime/qemu: Drop "kvm-type=tdx"
25b3cdd38 virtcontainers: Drop check for the `tdx` CPU flag
01bdacb4e virtcontainers: Also check /sys/firmwares/tdx for TDX
9feec533c cache: Add ability to cache OVMF
ce8d98251 gha: Build and ship the OVMF for TDX
39c3fab7b local-build: Add support to build OVMF for TDX
054174d3e versions: Bump OVMF for TDX
800fb49da packaging: Add get_ovmf_image_name() helper
fbf03d7ac cache: Document kernel-tdx-experimental
5d79e9696 cache: Add a space to ease the reading of the kernel flavours
6e4726e45 cache: Fix typos
fc22ed0a8 gha: Build and ship the Kernel for TDX
502844ced local-build: Add support to build Kernel for TDX
b2585eecf local-build: Avoid code duplication building the kernel
f33345c31 versions: Update Kernel TDX version
20ab2c242 versions: Move Kernel TDX to its own experimental entry
3d9ce3982 cache: Allow specifying the QEMU_FLAVOUR
33dc6c65a gha: Build and ship QEMU for TDX
eceaae30a local-build: Add support to build QEMU for TDX
f7b7c187e static-build: Improve qemu-experimental build script
3018c9ad5 versions: Update QEMU TDX version
800ee5cd8 versions: Move QEMU TDX to its own experimental entry
1315bb45f local-build: Add dragonball kernel to the `all` target
73e108136 local-build: Rename non vanilla kernel build functions
1d851b4be local-build: Cosmetic changes in build targets
49ce685eb gha: k8s-on-aks: Always delete the AKS cluster
e2a770df5 gha: ci-on-push: Run k8s tests with dragonball
d1f550bd1 docs: update the rust version from versions.yaml
f3595e48b nydus_rootfs/prefetch_files: add prefetch_files for RAFS
3bfaafbf4 fix: oci hook
c1fbaae8d rustjail: Use CPUWeight with systemd and CgroupsV2
375187e04 versions: Upgrade to Cloud Hypervisor v31.0
79f3047f0 gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
2f35b4d4e gha: ci-on-push: Only run on `main` branch
e7bd2545e Revert "gha: ci-on-push: Depend on Commit Message Check"
0d96d4963 Revert "gha: ci-on-push: Adjust to using workflow_run"
c7ee45f7e Revert "gha: ci-on-push: Adapt chained jobs to workflow_run"
5d4d72064 Revert "gha: k8s-on-aks: Fix cluster name"
13d857a56 gha: k8s-on-aks: Set {create,delete}_aks as steps
dc6569dbb runtime-rs/virtio-fs: add support extra handler for cache mode.
85cc5bb53 gha: k8s-on-aks: Fix cluster name
1688e4f3f gha: aks: Use D4s_v5 instance
108d80a86 gha: Add the ability to also test Dragonball
2550d4462 gha: build-kata-static-tarball: Only push to registry after merge
e81b8b8ee local-build: build-and-upload-payload is not quay.io specific
13929fc61 gha: publish-kata-deploy-payload: Improve registry login
41026f003 gha: payload-after-push: Pass registry / repo as inputs
7855b4306 gha: ci-on-push: Adapt chained jobs to workflow_run
3a760a157 gha: ci-on-push: Adjust to using workflow_run
a159ffdba gha: ci-on-push: Depend on Commit Message Check
8086c75f6 gha: Also run k8s tests on AKS with dragonball
fe86c08a6 tools: Avoid building the kernel twice
3215860a4 gha: Set ci-on-push to run on `pull_request_target`
d17dfe4cd gha: Use ghcr.io for the k8s CI
b661e0cf3 rustjail: Add anyhow context for D-Bus connections
60c62c3b6 gha: Remove kata-deploy-test.yaml
43894e945 gha: Remove kata-deploy-push.yaml
cab9ca043 gha: Add a CI pipeline for Kata Containers
53b526b6b gha: k8s: Add snippet to run k8s tests on aks clusters
c444c24bc gha: aks: Add snippets to create / delete aks clusters
11e0099fb tests: Move k8s tests to this repo
73be4bd3f gha: Update actions for release.yaml
d38d7fbf1 gha: Remove code duplication from release.yaml
56331bd7b gha: Split payload-after-push-*.yaml
a552a1953 docs: Update CNM url in networking document
7796e6ccc rustjail: Fix minor grammatical error in function name
41fdda1d8 rustjail: Do not unwrap potential error with cgroup manager
a914283ce kata-ctl: add function to get platform protection.
0f7351556 runtime: add filter metrics with specific names
cbe6ad903 runtime: support non-root for clh
d3bb25418 utils: Add function to check vhost-vsock

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-19 09:26:36 +02:00
Fabiano Fidêncio
0364620844 Merge pull request #6819 from fidencio/topic/use-static-sandbox-resource-mgmt-for-TEEs
runtime: Use static_sandbox_resource_mgmt=true for TEEs
2023-05-18 22:38:31 +02:00
Fabiano Fidêncio
2ea8acaaa5 Merge pull request #6882 from bergwolf/github/tokio
update tokio dependency
2023-05-18 20:35:16 +02:00
Krister Johansen
eff6ed2d5f runtime: make debug console work with sandbox_cgroup_only
If a hypervisor debug console is enabled and sandbox_cgroup_only is set,
the hypervisor can fail to open /dev/ptmx, which prevents the sandbox
from launching.

This is caused by the absence of a device cgroup entry to allow access
to /dev/ptmx. When sandbox_cgroup_only is not set, the hypervisor
inherits the default unrestricted device cgroup. With it enabled, the
hypervisor runs into allow / deny list restrictions.

Fix by adding an allowlist entry for /dev/ptmx when debug is enabled,
sandbox_cgroup_only is true, and /dev/ptmx is not already in the list of
devices.
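
Below is a minimal Go sketch of that condition. `DeviceRule` and
`maybeAllowPtmx` are hypothetical names, not the runtime's actual types.
The "rwm" access string is also an assumption. The 5:2 major/minor pair
is the standard Linux number for /dev/ptmx.

```go
package main

import "fmt"

// DeviceRule is a hypothetical, simplified device cgroup allowlist entry.
type DeviceRule struct {
	Type   string // "c" for character devices
	Major  int64
	Minor  int64
	Access string // assumption: "rwm" (read, write, mknod)
}

// /dev/ptmx is character device 5:2 on Linux.
const (
	ptmxMajor = 5
	ptmxMinor = 2
)

// maybeAllowPtmx appends a /dev/ptmx rule only when the debug console is
// enabled, sandbox_cgroup_only is set, and no such rule is present yet.
func maybeAllowPtmx(rules []DeviceRule, debug, sandboxCgroupOnly bool) []DeviceRule {
	if !debug || !sandboxCgroupOnly {
		return rules
	}
	for _, r := range rules {
		if r.Type == "c" && r.Major == ptmxMajor && r.Minor == ptmxMinor {
			return rules // already allowed, nothing to add
		}
	}
	return append(rules, DeviceRule{Type: "c", Major: ptmxMajor, Minor: ptmxMinor, Access: "rwm"})
}

func main() {
	rules := maybeAllowPtmx(nil, true, true)
	rules = maybeAllowPtmx(rules, true, true) // second call is a no-op
	fmt.Println(len(rules), rules[0])
}
```

Checking for an existing entry first keeps the rule idempotent. This
matters if a user already passed /dev/ptmx through explicitly.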

Fixes: #6870

Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
2023-05-18 10:36:24 -07:00
Gabriela Cervantes
11a34a72e2 docs: Update container network model url
This PR updates the container network model url that is part of the
virtcontainers documentation.

Fixes #6889

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-18 15:08:08 +00:00
Peng Tao
f6e1b1152c agent: update tokio dependency
Update to 1.28.1 to bring in the latest fixes.

Fixes: #6881
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 09:36:06 +00:00
Shuaiyi Zhang
c477ac551f dragonball: Convert VirtioNetDeviceMgr function to method
Convert VirtioNetDeviceMgr::insert_device and
VirtioNetDeviceMgr::update_device_ratelimiters to methods.

Fixes: #6880

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-18 16:57:01 +08:00
Shuaiyi Zhang
4659facb74 dragonball: Convert BlockDeviceMgr function to method
Convert BlockDeviceMgr::insert_device, BlockDeviceMgr::remove_device
and BlockDeviceMgr::update_device_ratelimiters to methods.

Fixes: #6880

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-18 16:56:49 +08:00
Peng Tao
4cb83dc219 kata-ctl: update tokio dependency
Update to 1.28.1 to pick up the latest fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:25:13 +00:00
Peng Tao
df615ff252 runk: update tokio dependency
Update to 1.28.1 to pick up the latest fixes.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:24:41 +00:00
Peng Tao
ca6892ddb1 runtime-rs: update tokio dependency
Unify it to the latest 1.28.1 version.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2023-05-18 08:18:22 +00:00
Fabiano Fidêncio
3a4b924226 Merge pull request #6833 from rye-stripe/bugfix/vcpu-pinning
resource-control: fix setting CPU affinities on Linux
2023-05-18 08:12:39 +02:00
Xuewei Niu
ee6deef09d dragonball: Remove virtio-net and vsock devices gracefully
This MR implements graceful removal of virtio-net and virtio-vsock
devices when shutting down the VMM.

Fixes: #6684

Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-05-18 12:11:20 +08:00
Fabiano Fidêncio
e762f70920 Merge pull request #6838 from rye-stripe/bugfix/use-enable-vcpus-pinning-from-toml
runtime: use enable_vcpus_pinning from toml
2023-05-17 21:30:44 +02:00
Fabiano Fidêncio
ca1531fe9d runtime: Use static_sandbox_resource_mgmt=true for TEEs
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.

As TEEs do not support memory and CPU hotplug, this approach must be
used.

Fixes: #6818

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 19:21:52 +02:00
Fabiano Fidêncio
851b97fa51 Merge pull request #6866 from fidencio/topic/gha-improve-actions
gha: k8s: Make the tests more reliable
2023-05-17 19:19:18 +02:00
Fabiano Fidêncio
8ce14e709a Merge pull request #6810 from fitzthum/snp-enable
gha: Enable SEV-SNP tests on main
2023-05-17 15:29:54 +02:00
Greg Kurz
206df04b99 Merge pull request #6858 from fidencio/topic/gha-tdx-fix-cleanup
gha: tdx: Use the k3s overlay for kata-cleanup
2023-05-17 15:04:56 +02:00
Wainer Moschetta
259158f1c3 Merge pull request #6789 from dubek/add-sev-package
runtime: Port sev package to main
2023-05-17 10:02:19 -03:00
Fabiano Fidêncio
fa832f4709 gha: k8s: Make the tests more reliable
Whether we like it or not, every now and then we'll have to deal with
flaky tests, and our tests using GHA are not exempt from that fact.

With this simple commit, we're trying to improve the reliability of the
tests on a few different fronts:

* Giving enough time for the script used by kata-deploy to be executed
  * We've hit issues because the kata-deploy pod is considered "Ready"
    the moment it starts running, not when it finishes the needed setup.
    We should also look at how to solve this on the kata-deploy side
    but, for now, let's ensure our tests do not break with the current
    kata-deploy behavior.

* Merging the "Deploy kata-deploy" and "Run tests" steps
  * We've hit issues re-running tests and seeing even more failures than
    the ones we're trying to debug. On a re-run, a step that succeeded in
    the first run is simply considered successful. This causes issues
    with the kata-deploy deployment, as the tests would start running
    before the node was even set up for running Kata Containers.

Fixes: #6865 #6649

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 13:38:08 +02:00
Tobin Feldman-Fitzthum
cbb9fe8b81 config: Use standard OVMF with SEV
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.

Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.

We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:36:04 +02:00
Tobin Feldman-Fitzthum
724437efb3 kata-deploy: add kata-qemu-sev runtimeclass
In order to populate containerd config file with
support for SEV, we need to add the qemu-sev shim
to the kata-deploy script.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:36:02 +02:00
Tobin Feldman-Fitzthum
521dad2a47 Tests: skip CPU constraints test on SEV and SNP
Currently Kata does not support memory / CPU hotplug for SEV or
SEV-SNP so we need to skip tests that rely on it.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:35:13 +02:00
Tobin Feldman-Fitzthum
72308ddb07 gha: ci-on-push: Don't skip tests for SEV
Now that SEV artifacts are built by GHA, remove
conditional that skips tests when using qemu-sev.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:35:13 +02:00
Tobin Feldman-Fitzthum
da0f92cef8 gha: ci-on-push: Don't skip tests for SEV-SNP
Now that we have SNP artifacts in place and they are built via gha,
remove the condition that skips the tests for SNP.

Fixes: #6809

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-17 11:35:13 +02:00
fupan
2bda92face netlink: Fix the issue of update_interface
When updating an interface, another existing interface may already have
the requested name, in which case the update fails with an "interface
name exists" error. To handle this, first rename the existing interface
to a temporary name, and finally give it the previous name of the
interface being updated, swapping the two names.

Fixes: #6842

Signed-off-by: fupan <fupan.lfp@antgroup.com>
2023-05-17 16:45:49 +08:00
Fabiano Fidêncio
12f43bea0f gha: tdx: Use the k3s overlay for kata-cleanup
As the TDX CI runs on k3s, we must ensure that the cleanup, as already
done for the deploy, uses the k3s overlay.

Fixes: #6857

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-17 09:50:29 +02:00
Fabiano Fidêncio
9630c13ac0 Merge pull request #6845 from fidencio/topic/yet-more-nvidia-gpu-naming-fixes
gpu: Rename the last bits from `gpu` to `nvidia-gpu`
2023-05-17 09:05:12 +02:00
Steve Horsman
e4a458035c Merge pull request #6852 from stevenhorsman/container-image-arch-consistency
deploy: fix shell script error
2023-05-17 08:01:39 +01:00
Amulya Meka
3ccc29030d Merge pull request #6780 from Amulyam24/rust-virtfs
ppc64le: switch virtiofsd from C to rust version
2023-05-17 09:36:28 +05:30
GabyCT
e0e46de12d Merge pull request #6849 from GabyCT/topic/fixtabs
osbuilder: Fix indentation in rootfs.sh
2023-05-16 16:47:09 -06:00
stevenhorsman
1a3f8fc1a2 deploy: fix shell script error
- Remove local introduced by bad copy-paste

Fixes: #6814
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-16 19:30:32 +01:00
Salvador Fuentes
b76058c979 Merge pull request #6721 from nedsouza/virtcontainers-qemu-go-coverage
virtcontainers/qemu_test.go: Improve coverage
2023-05-16 11:11:43 -06:00
Feng Wang
ebc8e8e2fd Merge pull request #6773 from jepio/agent-config-error-context
agent: Add context to errors that may occur when AgentConfig file is …
2023-05-16 09:21:34 -07:00
Gabriela Cervantes
87cb98c01d osbuilder: Fix indentation in rootfs.sh
This PR replaces single spaces with tabs in order to fix the
indentation of the rootfs script.

Fixes #6848

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-16 15:30:50 +00:00
James O. D. Hunt
a96fcfd5be Merge pull request #6735 from nedsouza/258/tests-coverage-compatoci
virtcontainers/pkg/compatoci/: Improved coverage for Kata 2.0
2023-05-16 15:36:35 +01:00
Amulyam24
c5a59caca1 ppc64le: switch virtiofsd from C to rust version
We have been using the C version of virtiofsd on ppc64le. Now that the issues with
the Rust virtiofsd have been fixed, let's switch to it.

Fixes: #4259

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-05-16 14:46:19 +02:00
Amulyam24
bfdf0144aa versions: Bump virtiofsd to 1.6.1
virtiofsd v1.6.1 has been released with the fixes required for running
successfully on ppc64le.

Fixes: #4259

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-05-16 14:46:16 +02:00
Dov Murik
dd7562522a runtime: pkg/sev: Add kbs utility package for SEV pre-attestation
Supports both online and offline modes of interaction with simple-kbs
for SEV/SEV-ES confidential guests.

Fixes: #6795

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
2023-05-16 15:27:32 +03:00
Dov Murik
05de7b2607 runtime: Add sev package
The sev package provides utilities for launching AMD SEV and SEV-ES
confidential guests.

Fixes: #6795

Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
2023-05-16 15:27:32 +03:00
Fabiano Fidêncio
3a9d3c72aa gpu: Rename the last bits from gpu to nvidia-gpu
Let's specifically name the `gpu` runtime class as `nvidia-gpu`.  By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.

Fixes: #6553

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-16 13:47:52 +02:00
Fabiano Fidêncio
4cde844f70 local-build: Fix kernel-nvidia-gpu target name
It must have `-tarball` as part of its name.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-16 13:34:52 +02:00
Archana Shinde
8d10d157b3 Merge pull request #6823 from jodh-intel/utils-kata-manager-containerd-fix
kata-manager: Fix '-o' syntax and logic error
2023-05-15 21:44:35 -07:00
Bin Liu
47a02dcc7f Merge pull request #6767 from ngpatel6/Issue-5403
kata-ctl: Add the option to install kata-ctl to a user specified directory
2023-05-16 10:43:40 +08:00
Chao Wu
911d8a5a7f Merge pull request #6804 from pmores/fix-rust-version-in-docs
runtime-rs: fix building instructions to use correct required Rust ve…
2023-05-16 10:14:05 +08:00
Bin Liu
2cd2d02d1f Merge pull request #6812 from ZhangShuaiyi/dev/write_bootparams
Dragonball: use LinuxBootConfigurator::write_bootparams
2023-05-16 09:54:41 +08:00
GabyCT
3d8185863d Merge pull request #6835 from GabyCT/topic/buildkataproxy
kata-deploy: Add http_proxy as part of the docker build
2023-05-15 16:15:27 -06:00
Narendra Patel
593840e075 kata-ctl: Allow INSTALL_PATH= to be specified
Update the kata-ctl install rule to allow it to be installed to a given directory

The Makefile was updated to use an INSTALL_PATH variable to track where the
kata-ctl binary should be installed.  If the user doesn't specify anything,
then it uses the default path that cargo uses.  Otherwise, it will install it
in the directory that the user specified.  The README.md file was also updated
to show how to use the new option.
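The fallback described above can be illustrated in bash. This is only a sketch of the selection logic, not the Makefile rule itself: `resolve_install_dir` is a hypothetical name, and it assumes cargo's default is `$CARGO_HOME/bin` (with `CARGO_HOME` defaulting to `~/.cargo`), ignoring other cargo settings such as `CARGO_INSTALL_ROOT`.

```bash
#!/usr/bin/env bash
# resolve_install_dir (hypothetical)
# Prints $INSTALL_PATH if the user set it to a non-empty value, otherwise
# cargo's usual default binary directory, ${CARGO_HOME:-$HOME/.cargo}/bin.
resolve_install_dir() {
    echo "${INSTALL_PATH:-${CARGO_HOME:-$HOME/.cargo}/bin}"
}
```

Using `:-` means an empty `INSTALL_PATH=` behaves the same as not specifying it at all, matching "If the user doesn't specify anything".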

Fixes #5403

Co-authored-by: Cesar Tamayo <cesar.tamayo@intel.com>
Co-authored-by: Kevin Mora Jimenez <kevin.mora.jimenez@intel.com>
Co-authored-by: Narendra Patel <narendra.g.patel@intel.com>
Co-authored-by: Ray Karrenbauer <ray.karrenbauer@intel.com>
Co-authored-by: Srinath Duraisamy <srinath.duraisamy@intel.com>
Signed-off-by: Narendra Patel <narendra.g.patel@intel.com>
2023-05-15 17:21:49 -04:00
Peteris Rudzusiks
bdb75fb21e runtime: use enable_vcpus_pinning from toml
Set the default value of runtime's EnableVCPUsPinning to value read from .toml.

Fixes: #6836

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-05-15 21:41:20 +02:00
Tamas K Lengyel
20cb875087 virtcontainers/qemu_test.go: Improve test coverage
Rework TestQemuCreateVM routine to be a table driven test with
various config variations passed to it. After CreateVM a handful
of additional functions are exercised to improve code-coverage.
Also add partial coverage for StartVM routine.

This improves coverage from 19.7% to 35.7%.

Credit PR to Hackathon Team3

Fixes: #267

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
2023-05-15 15:26:35 -04:00
Fabiano Fidêncio
da877a603d Merge pull request #6829 from fidencio/topic/kata-deploy-remove-tarball-from-payload-image
kata-deploy: Do not ship the kata tarball
2023-05-15 19:01:14 +02:00
Gabriela Cervantes
b9a1db2601 kata-deploy: Add http_proxy as part of the docker build
Add http_proxy and https_proxy as part of the docker build arguments
in order to build properly when we are behind a proxy.

Fixes #6834

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-05-15 15:57:29 +00:00
Peteris Rudzusiks
3e85bf5b17 resource-control: fix setting CPU affinities on Linux
With this fix the vCPU pinning feature chooses the correct
physical cores to pin the vCPU threads on rather than always using core 0.

Fixes #6831

Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
2023-05-15 16:46:36 +02:00
Pavel Mores
5f3f844a1e runtime-rs: fix building instructions with respect to required Rust version
Fixes: #6803

Signed-off-by: Pavel Mores <pmores@redhat.com>
2023-05-15 16:30:41 +02:00
Fabiano Fidêncio
9e83795fca Merge pull request #6825 from fidencio/topic/kata-deploy-build-improvements
kata-deploy: Build improvements
2023-05-15 13:49:15 +02:00
Fabiano Fidêncio
802cd2f673 Merge pull request #6821 from stevenhorsman/container-image-arch-consistency
deploy: Fix arch in image tag
2023-05-15 11:16:01 +02:00
Fabiano Fidêncio
815b4e8dac Merge pull request #6816 from fidencio/topic/kata-deploy-fixes
Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
2023-05-15 10:24:58 +02:00
Fabiano Fidêncio
777c3dc8d2 kata-deploy: Do not ship the kata tarball
There's absolutely no reason to ship the kata-static tarball as part of
the payload image, as:
* The tarball is already part of the release process
* The payload image already has the uncompressed content of the tarball
* The tarball itself is not used anywhere by the kata-deploy scripts

Fixes: #6828

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-15 09:22:39 +02:00
LiuWeijie
50cc9c582f tests: Improve coverage for virtcontainers/pkg/compatoci/ for Kata 2.0
Add test cases for the ParseConfigJson and GetContainerSpec functions.

Fixes: #258

Signed-off-by: LiuWeijie <weijie.liu@intel.com>
2023-05-15 11:58:17 +08:00
Fabiano Fidêncio
136e2415da static-build: Download firecracker instead of building it
There's no reason for us to build firecracker instead of simply
downloading the official released tarball, as tarballs are provided for
the architectures we want to use them.

Fixes: #6770

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-12 22:05:33 +02:00
Fabiano Fidêncio
3bf767cfcd static-build: Adjust ARCH for nydus
When building from aarch64, just use "arm64" as that's what's used in
the name of the released nydus tarballs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-12 22:05:33 +02:00
Fabiano Fidêncio
ac88d34e0c static-build: Use released binary for CLH (aarch64)
There's no need to build Cloud Hypervisor aarch64 as, for a few releases
already, Cloud Hypervisor provides an official release binary for the
architecture.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-12 22:05:01 +02:00
Archana Shinde
32b39ee347 Merge pull request #6763 from nedsouza/266/tests_coverage_virtcontainers_fc
virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
2023-05-12 11:53:27 -07:00
James O. D. Hunt
73913c8eb7 kata-manager: Fix '-o' syntax and logic error
Fix the syntax and logic error that is only displayed if the user runs
the script with `-o`. This option requests that "only" Kata Containers
is installed and stops containerd from being installed.

Fixes: #6822.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-05-12 16:44:24 +01:00
stevenhorsman
2856d3f23d deploy: Fix arch in image tag
`uname -m` produces `x86_64`, but container image convention
is to use `amd64`, so update this in the tag

Fixes: #6820
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-05-12 16:14:19 +01:00
Fabiano Fidêncio
42dce15b1f Merge pull request #6450 from singhwang/main
main | release: Fix multi-arch publishing is not supported
2023-05-12 15:25:59 +02:00
Fabiano Fidêncio
e8f81ee93d Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
This reverts commit 5ec9ae0f04, for two
main reasons:
* The readinessProbe was misinterpreted by me when working on the
  original PR
* It's actually causing issues, as the pod ends up marked as not
  healthy.
2023-05-12 14:28:23 +02:00
SinghWang
cfe63527c5 release: Fix multi-arch publishing is not supported
Ensure that, when a release is published, the kata-deploy payload and the
kata-static package are published for multiple architectures.

Fixes: #6449

Signed-off-by: SinghWang <wangxin_0611@126.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-12 13:36:44 +02:00
Shuaiyi Zhang
197c336516 Dragonball: use LinuxBootConfigurator::write_bootparams to write
the boot parameters into guest memory.

Fixes: #6813

Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
2023-05-12 16:07:44 +08:00
Fabiano Fidêncio
181017d1d8 Merge pull request #6811 from fidencio/topic/yet-more-fixes-for-nvidia-gpu-kernels
cache: More fixes to nvidia-gpu kernels caching
2023-05-12 10:02:08 +02:00
Amulya Meka
76f975e5e6 Merge pull request #6742 from Amulyam24/agent-build
runtime: remove overriding ARCH value by default for ppc64le
2023-05-12 12:34:50 +05:30
Archana Shinde
20ac3917ad Merge pull request #6739 from byron-marohn/fix_5561
gha: Fix Body Line Length action flagging empty body commit messages
2023-05-11 15:17:07 -07:00
Archana Shinde
1ad442e656 Merge pull request #6748 from nedsouza/fix-snap
gha: Fix snap creation workflow
2023-05-11 15:09:22 -07:00
Fabiano Fidêncio
4d17ea4a01 cache: Fix nvidia-snp caching version
All the kernel-foo instances, such as "kernel-sev" or "kernel-snp",
should be transformed into "kernel.foo" when looking at the
versions.yaml file.

This was already done for SEV, but missed for the SNP case.
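The transformation described above can be sketched with bash parameter expansion. `versions_key` is a hypothetical helper, not the function used by the caching scripts; it only shows the "kernel-foo" to "kernel.foo" rewrite.

```bash
#!/usr/bin/env bash
# versions_key ASSET (hypothetical)
# Converts a kernel asset name such as "kernel-sev" into the dotted key
# ("kernel.sev") used to look the asset up in versions.yaml.
versions_key() {
    local asset="$1"
    # Replace only the first "-" with "."; names without "-" are unchanged.
    echo "${asset/-/.}"
}
```

Flavours with further dashes, such as the `-nvidia-gpu` kernels, need extra handling, since they take their version from the base kernel entry instead of a dedicated one.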

Fixes: #6777

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-11 21:26:58 +02:00
Fabiano Fidêncio
a133fadbfa cache: Fix nvidia-gpu-tdx-experimental cache URL
We were passing "kernel-nvidia-gpu-tdx", missing the "-experimental"
part, leading to a non-valid URL.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-11 21:20:06 +02:00
Fabiano Fidêncio
a7dd6cbadd Merge pull request #6807 from fidencio/topic/fix-nvidia-gpu-cache
cache: Fix nvidia-gpu version
2023-05-11 17:40:41 +02:00
Fabiano Fidêncio
b9990c2017 cache: Fix nvidia-gpu version
c9bf7808b6 introduced the logic to
properly get the version of nvidia-gpu kernels, but one important part,
actually fetching the correct version of the kernel, was dropped during
the rebase onto main.

Fixing this now, and using the old issue as reference.

Fixes: #6777

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-11 13:55:14 +02:00
Fabiano Fidêncio
14939d00ad Merge pull request #6778 from fidencio/topic/cache-gpu-related-kernels
cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
2023-05-11 13:14:45 +02:00
Fabiano Fidêncio
c9bf7808b6 cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
We need to make sure that, when caching a `-nvidia-gpu` kernel, we still
look at the version of the base kernel used to build the nvidia-gpu
drivers, as the ${vendor}-gpu kernels are based on already existing
entries in the versions.yaml file and do not require a new entry to be
added.

Fixes: #6777

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-11 10:56:13 +02:00
Fabiano Fidêncio
3665b42045 gpu: Rename gpu targets to nvidia-gpu
This will make it easier for other GPU vendors to add the needed bits in
the future.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-05-11 10:55:55 +02:00
Fabiano Fidêncio
edfaae85cb Merge pull request #6700 from fitzthum/snp-artifacts
packaging: Add SEV-SNP artifacts to main
2023-05-11 10:47:10 +02:00
James O. D. Hunt
fe33015075 Merge pull request #6794 from jodh-intel/docs-mark-snap-as-unmaintained
docs: Mark snap installation method as unmaintained
2023-05-11 09:14:25 +01:00
Fabiano Fidêncio
c937d0a5d4 Merge pull request #6591 from UnmeshDeodhar/add-sev-artifacts-to-main
packaging: Add sev artifacts to main
2023-05-11 09:09:36 +02:00
Tobin Feldman-Fitzthum
2c90cac751 local-build: fixup alphabetization
A few pieces of the local-build tooling are supposed to be
alphabetized. Fixup a couple minor issues that have accumulated.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 21:23:38 +00:00
Tobin Feldman-Fitzthum
4da6eb588d kata-deploy: Add qemu-snp shim
Now that we have the SNP components in place, make sure that
kata-deploy knows about the qemu-snp shim so that it will be
added to containerd config.

Fixes: #6575

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:55:36 +00:00
Tobin Feldman-Fitzthum
14dd053758 kata-deploy: add kata-qemu-snp runtimeclass
Since SEV-SNP has limited hotplug support, increase
the pod overhead to account for fixed resource usage.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:55:36 +00:00
Tobin Feldman-Fitzthum
0bb37bff78 config: Add SNP configuration
SNP requires many specific configurations, so let's make
a new SNP configuration file that we can use with the
kata-qemu-snp runtime class.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:55:36 +00:00
Chelsea Mafrica
13f9ba2298 Merge pull request #6379 from cmaf/kata-ctl-check-kvm-1
kata-ctl: add generic kvm check & unit test
2023-05-10 13:33:57 -07:00
Tobin Feldman-Fitzthum
af7f2519bf versions: update SEV kernel description
SNP and SEV will share a (guest) kernel. Update the description
in versions.yaml to mention this.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:27:12 +00:00
Tobin Feldman-Fitzthum
dbcc3b5cc8 local-build: fix default values for OVMF build
The existing value had the wrong name and compression type,
leading to installation failure.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:27:12 +00:00
Tobin Feldman-Fitzthum
b8bbe6325f gha: build OVMF for tests and release
The x86_64 package of OVMF is required for deployments
that don't use kernel hashes, which includes SEV-SNP
in the short term. We should keep this in the bundle
in the long term in case someone wants to disable
kernel hashes.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:27:12 +00:00
Tobin Feldman-Fitzthum
cf0ca265f9 local-build: Add x86_64 OVMF target
Add targets to build the "plain" x86_64 OVMF.

This will be used by anyone who is using SEV or SNP
without kernel hashes. The SNP QEMU does not yet
support kernel hashes so the OvmfPkg will be used
by default.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:24:51 +00:00
Tobin Feldman-Fitzthum
db095ddeb4 cache: add SNP flavor to comments
Update comments to include new SNP QEMU option

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:19:56 +00:00
Tobin Feldman-Fitzthum
f4ee00576a gha: Build and ship QEMU for SNP
Now that we can build SNP QEMU, let's do that for tests and release.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:19:56 +00:00
Tobin Feldman-Fitzthum
7a58a91fa6 docs: update SNP guide
Since we reshuffled versions.yaml, update the guide so that
we can find the SNP QEMU info.

Once runtime support is merged we should overhaul or remove
this guide, but let's keep it for now.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2023-05-10 20:19:56 +00:00
Tobin Feldman-Fitzthum
879333bfc7 versions: update SNP QEMU version
Refactor SNP QEMU entry in versions.yaml to match
qemu-experimental and qemu-tdx-experimental.

Also, update the version of QEMU to what we are using
in CCv0. This is the non-UPM QEMU and it does not
have kernel hashes support.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:19:56 +00:00
Tobin Feldman-Fitzthum
38ce4a32af local-build: add support to build QEMU for SEV-SNP
Add Make targets and helper functions to build the QEMU
needed for SEV-SNP.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-05-10 20:19:56 +00:00
Chelsea Mafrica
5f8008b69c kata-ctl: add unit test for kvm check
Check that the KVM check fails when run as non-root and when the device
specified is not /dev/kvm.

Fixes #5338

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-05-10 10:29:20 -07:00
Chelsea Mafrica
a085a6d7b4 kata-ctl: add generic kvm check
Add a KVM check that uses the ioctl macro to issue syscalls checking the
KVM API version and whether a VM can be created successfully.

Fixes #5338

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2023-05-10 10:29:20 -07:00
Unmesh Deodhar
772d4db262 gha: Build and ship SEV initrd
We have code that builds the initrd for SEV,
so add it to the test and release process.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Unmesh Deodhar
45fa366926 gha: Build and ship SEV OVMF
SEV requires a special OVMF to work, so build it for tests and release.

Fixes: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Unmesh Deodhar
4770d3064a gha: Build and ship SEV kernel.
SEV requires custom kernel arguments when building,
so add the SEV kernel to the test and release process.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Unmesh Deodhar
fb9c1fc36e runtime: Add qemu-sev config
Add a config file that can be used with the qemu-sev runtime class.
Since SEV has limited hotplug support, increase
the pod overhead to account for fixed resource usage.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Unmesh Deodhar
813e4c576f runtimeClasses: add sev runtime class
Add the kata-qemu-sev runtime class.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:56 -05:00
Unmesh Deodhar
af18806a8d static-build: Add caching support to sev ovmf
SEV requires a special OVMF.
Now that we have the ability to build this custom OVMF, cache it
so that we don't have to rebuild it on every run.

Fixes: sev: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:55 -05:00
Unmesh Deodhar
76ae7a3abe packaging: adding caching capability for kernel
The SEV initrd build requires kernel modules.
So, for the SEV case, we need to cache the kernel modules tarball in
addition to the kernel tarball.

Fixes: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:55 -05:00
Unmesh Deodhar
12c5ef9020 packaging: add support to build OVMF for SEV
SEV requires a special OVMF to work with kernel hashes,
so add support for building this custom OVMF for SEV.

Fixes: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:55 -05:00
Unmesh Deodhar
b87820ee8c packaging: add support to build initrd for sev
We need a special initrd for SEV. The SEV initrd is based on Ubuntu,
so add another entry in versions.yaml.
This binary will have a '-sev' suffix to distinguish it from the generic
binary.

Fixes: #6572

Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
2023-05-10 12:19:55 -05:00
James O. D. Hunt
e1f3b871cd docs: Mark snap installation method as unmaintained
The snap package is no longer being maintained so update the docs to
warn readers.

We'll remove the snap installation docs in a few weeks.

See: #6769.
Fixes: #6793.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-05-10 18:02:46 +01:00
Jeremi Piotrowski
022a33de92 agent: Add context to errors when AgentConfig file is missing
When the agent config file is missing, the panic message says "no such file or
directory" but doesn't inform the user about which file was missing. Add
context to the parsing (with filename) and to the from_config_file() calls
(with information where the path is coming from).

Fixes: #6771
Depends-on: github.com/kata-containers/tests#5627
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-05-10 08:43:16 +02:00
Fabiano Fidêncio
6881b9558b Merge pull request #6512 from gabevenberg/log-parser-rs
Log-parser-rs
2023-05-10 08:22:59 +02:00
Chao Wu
7218229af0 Merge pull request #6594 from Apokleos/warning_fix_1.68.0
warning_fix: fix warnings when build with cargo-1.68.0
2023-05-10 09:51:45 +08:00
Unmesh Deodhar
b0e6a094be packaging: Add sev kernel build capability
Add code that builds the SEV kernel.

Fixes: #6572

Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
2023-05-09 13:47:22 -05:00
Tim Zhang
b0b5d7082e Merge pull request #6753 from amshinde/add-cross-building-with-cross
cross-compile: Include documentation and configuration for cross-compile
2023-05-09 16:31:40 +08:00
Feng Wang
4e0dce6802 Merge pull request #6738 from fengwang666/oss-fix-fd-leak
runtime: Fix virtiofs fd leak
2023-05-08 10:52:36 -07:00
Eduardo Berrocal
a4c0303d89 virtcontainers: Fixed static checks for improved test coverage for fc.go
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.
Fixed a very simple static check failure on line 202.

Fixes: #266

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-05-07 00:17:36 -07:00
Peng Tao
65670e6b0a Merge pull request #6699 from zvonkok/cold-plug-vfio
gpu: cold plug VFIO devices
2023-05-05 10:04:29 +08:00
Archana Shinde
b86d32aba9 Merge pull request #6728 from nedsouza/256/tests_coverage_pkg_signals
pkg/signals: Improved test coverage 60% to 100%
2023-05-04 16:19:12 -07:00
Archana Shinde
9443c4aea7 Merge pull request #6729 from nedsouza/259/tests_coverage_virtcontainers_persist
virtcontainers/persist: Improved test coverage 65% to 87.5%
2023-05-04 16:18:55 -07:00
Archana Shinde
09134c30de Merge pull request #6737 from nedsouza/265/virtcontainers-clh-go-coverage
virtcontainers/clh_test.go: improve unit test coverage
2023-05-04 16:15:43 -07:00
Archana Shinde
8495f830b7 cross-compile: Include documentation and configuration for cross-compile
`cross` is an open source tool that provides zero-setup cross compilation
for Rust binaries. Add documentation on using this tool to compile the
kata-ctl tool, and a Cross.toml file that provides the configuration
required to install dependencies for various targets.
This is pretty useful for a developer to make sure code compiles and
passes checks for various architectures.

Fixes: #6765

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-05-04 14:13:00 -07:00
Bin Liu
e57ac2ae18 Merge pull request #6749 from nedsouza/260/tests_coverage_virtcontainers_factory
virtcontainers/factory: Improved test coverage
2023-05-04 10:54:40 +08:00
Zvonko Kaiser
13d7f39c71 gpu: Check for VFIO port assignments
Bail out early if the port is wrong; the allowed port settings are
no-port, root-port and switch-port.
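The real check lives in the Kata runtime; the bash function below only sketches the same early validation against the three allowed values. `check_vfio_port` is a hypothetical name and the error wording is illustrative.

```bash
#!/usr/bin/env bash
# check_vfio_port PORT (hypothetical)
# Succeeds only for the accepted VFIO port settings; otherwise prints an
# error and fails, so callers can bail out early.
check_vfio_port() {
    case "$1" in
        no-port|root-port|switch-port)
            return 0 ;;
        *)
            echo "invalid VFIO port '$1': expected no-port, root-port or switch-port" >&2
            return 1 ;;
    esac
}
```

Validating before any device is touched keeps a misconfiguration from surfacing later as a harder-to-diagnose hotplug or cold-plug failure.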

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-05-03 12:32:33 +00:00
Gabe Venberg
6594a9329d tools: made log-parser-rs
Eventual replacement of kata-log-parser, but for now replicates its
functionality for the new runtime-rs syntax. Takes in log files,
parses, sorts by timestamp, spits them out in json, csv, xml, toml, and
a few others.

Fixes #5350

Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
2023-05-02 13:16:54 -05:00
Wainer Moschetta
f5ff975560 Merge pull request #6723 from ryansavino/gha-k8s-also-test-snp
gha: Also run k8s tests on qemu-snp
2023-05-01 10:37:12 -03:00
Fabiano Fidêncio
b6e54676eb Merge pull request #6759 from ryansavino/gha-sev-kata-deploy-fix
gha: sev: fix for kata-deploy error
2023-05-01 11:42:16 +02:00
Eduardo Berrocal
03a8cd69c2 virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.

Fixes: #266

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-28 15:40:45 -07:00
Ryan Savino
9e2b7ff177 gha: sev: fix for kata-deploy error
kubectl commands need a '-f' instead of a '-k'

Fixes: #6758

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2023-04-28 14:54:36 -05:00
Ryan Savino
5c9246db19 gha: Also run k8s tests on qemu-snp
Added the k8s tests for qemu-snp

Fixes: #6722

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2023-04-28 14:43:53 -05:00
Ryan Savino
c57a44436c gha: Add the ability to test qemu-snp
With the changes proposed as part of this PR, a qemu-snp cluster
will be created but no tests will be performed.

GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of using the ones coming
from the PR. No way to work around this for now.

After this commit is merged, the tests (not the yaml files for the
actions) will be altered in order for the checkout action to help in
this case.

Fixes: #6722

Signed-off-by: Ryan Savino <ryan.savino@amd.com>
2023-04-28 13:07:13 -05:00
Wainer Moschetta
29785a43d7 Merge pull request #6712 from ryansavino/gha-k8s-also-test-sev
gha: Also run k8s tests on qemu-sev
2023-04-28 14:22:03 -03:00
Archana Shinde
65c61785fc Merge pull request #6660 from amshinde/kata-ctl-cmd
Implement the "kata-ctl env" command
2023-04-28 01:33:28 -07:00
Archana Shinde
4064192896 env: Utilize arch specific functionality to get cpu details
Have kata-env call architecture specific function to get cpu details
instead of generic function to get cpu details that works only for
certain architectures. The functionality for cpu details has been fully
implemented for x86_64 and arm architectures, but needs to be
implemented for s390 and powerpc.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Archana Shinde
fb40c71a21 env: Check for root privileges
Check for root privileges early on.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Archana Shinde
1016bc17b7 config: Add api to fetch config from default config path
Add api to fetch config from default config path and use that in
kata-ctl tool.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Archana Shinde
b908a780a0 kata-env: Pass cmd option for file path
Add the ability to write the environment information to a file,
or to stdout if no file path is given.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Archana Shinde
b1920198be config: Workaround the way agent and hypervisor configs are fetched
This is essentially a workaround for the issue:
https://github.com/kata-containers/kata-containers/issues/5954

runtime-rs changes the Kata config format, adding agent_name and
hypervisor_name which are then used as keys to fetch the agent and
hypervisor configs. This will not work for older configs.
So use the first entry in the hashmaps to fetch the configs as a
workaround while the config change issue is resolved.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Archana Shinde
f2b2621dec kata-env: Implement the kata-env command.
Command implements functionality to get user environment settings.

Fixes: #5339

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-27 16:45:41 -07:00
Ryan Savino
c849bdb0a5 gha: Also run k8s tests on qemu-sev
Added the k8s tests for qemu-sev

Fixes: #6711

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2023-04-27 15:24:08 -05:00
Eduardo Berrocal
6bf1fc6051 virtcontainers/factory: Improved test coverage
Expanded tests on factory_test.go to cover more lines of code. Coverage went from 34% to 41.5% in the case of user-mode run tests,
and from 77.7% to 84% in the case of privileged-mode run tests.

Fixes: #260

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-27 13:08:35 -07:00
Tamas K Lengyel
0d49ceee0b gha: Fix snap creation workflow warnings
Fix recurring issues of failing to install dependencies due to stale apt cache.
Uprev actions/checkout to v3 to resolve issue "Node.js 12 actions are deprecated."

Fixes: #5659
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
2023-04-27 18:40:02 +00:00
Zvonko Kaiser
138ada049c gpu: Cold Plug VFIO toml setting
Added the cold_plug_vfio setting to the qemu-toml.in with some
explanation.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 11:04:45 +00:00
Amulyam24
defb643346 runtime: remove overriding ARCH value by default for ppc64le
Currently, ARCH is set to powerpc64le by default.
powerpc64le is only correct in the context of Rust, so any operation
that uses this variable for a different purpose fails on ppc64le.

Fixes: #6741

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2023-04-27 16:17:48 +05:30
Zvonko Kaiser
f7ad75cb12 gpu: Cold-plug extend the api.md
Make the hypervisor config consistent between the code and api.md.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 09:35:05 +00:00
Zvonko Kaiser
0fec2e6986 gpu: Add cold-plug test
The cold-plug setting is now correctly decoded from the TOML.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-27 09:30:24 +00:00
Archana Shinde
f2ebdd81c2 utils: Get rid of spurious print statement left behind.
The print was used for debugging; get rid of it.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
9a94f1f149 make: Export VERSION and COMMIT
These will be consumed by kata-ctl, so export them so that
they can be used to set variables available to the Rust binary.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
2f81f48dae config: Add file under /opt as another location to look for the config
Most Kata installation tools use this path for installation, so
add it to the paths searched for the configuration.toml file.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
07f7d17db5 config: Make the pipe_size field optional
Add the serde default attribute to the field so that parsing
can continue if this field is not present.
The agent assumes a default value for this, so the user is not
required to provide a value here.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
68f6357731 config: Make function to get the default conf file public
This will be used by the kata-env command.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
7565b33568 kata-ctl: Implement Display trait for GuestProtection enum
Implement Display for the enum so it can be shown in the env output.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
94a00f9346 utils: Make certain constants in utils.rs public
These would be used outside of utils.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
572b338b3b gitignore: Ignore .swp and .swo editor backup files
Ignore temporary files created by the vim editor.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
Archana Shinde
376884b8a4 cargo: Update version of clap to 4.1.13
This version includes macros related to using command options.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-26 22:12:30 -07:00
alex.lyn
17daeb9dd7 warning_fix: fix warnings when build with cargo-1.68.0
Fixes: #6593

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-04-27 10:29:50 +08:00
Ryan Savino
521519d745 gha: Add the ability to test qemu-sev
With the changes proposed as part of this PR, a qemu-sev cluster will
be created but no tests will be performed.

GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of using the ones coming
from the PR. There is no way to work around this for now.

After this commit is merged, the tests (not the yaml files for the
actions) will be altered so that the checkout action helps in this
case.

Fixes: #6711

Signed-off-by: Ryan Savino <ryan.savino@amd.com>
2023-04-26 17:56:28 -05:00
Feng Wang
205909fbed runtime: Fix virtiofs fd leak
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.

Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
2023-04-26 15:53:39 -07:00
Byron Marohn
5226f15c84 gha: Fix Body Line Length action flagging empty body commit messages
Change the Body Line Length workflow to not trigger when the commit
message contains only a subject line without a body. Other workflows will
flag the missing body sections, and it was confusing to have an error
message that said 'Body line too long (max 150)' when this was not
actually the case.

Fixes: #5561

Co-authored-by: Jayant Singh <jayant.singh@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
2023-04-26 17:29:16 -04:00
Tamas K Lengyel
0f45b0faa9 virtcontainers/clh_test.go: improve unit test coverage
Credit PR to Hackathon Team3

Fixes: #265

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
2023-04-26 19:12:51 +00:00
Zvonko Kaiser
dded731db3 gpu: Add OVMF setting for MMIO aperture
The default size of OVMF's MMIO aperture is too small to
initialize PCIe devices with huge BARs.
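A minimal Go sketch of how such a setting can be expressed on the QEMU command line. It assumes the fw_cfg file name `opt/ovmf/X-PciMmio64Mb`, which OVMF reads to size its 64-bit PCI MMIO aperture; the helper name `ovmfMmioFwCfgArgs` and the 262144 MiB value are illustrative assumptions, not taken from this commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// ovmfMmioFwCfgArgs is a hypothetical helper: when at least one VFIO/PCIe
// device is present, it returns QEMU arguments asking OVMF to enlarge its
// 64-bit PCI MMIO aperture so that devices with huge BARs can be initialized.
func ovmfMmioFwCfgArgs(hasVFIODevice bool, apertureMiB uint64) []string {
	if !hasVFIODevice {
		return nil // keep OVMF's default aperture
	}
	// OVMF reads the aperture size (in MiB) from this fw_cfg file.
	return []string{
		"-fw_cfg",
		"name=opt/ovmf/X-PciMmio64Mb,string=" + strconv.FormatUint(apertureMiB, 10),
	}
}

func main() {
	// 262144 MiB (256 GiB) is an illustrative value only.
	fmt.Println(ovmfMmioFwCfgArgs(true, 262144))
}
```

Tying the option to the presence of a VFIO device keeps the default firmware layout for guests that do not need a large aperture.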

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
2a830177ca gpu: Add fwcfg helper function
Add a driver utility function for easier handling of VFIO
devices outside of the VFIO module. At the sandbox level
we may need to set options, such as the fwCfg for confidential
guests, depending on whether we have a VFIO/PCIe device.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
131f056a12 gpu: Extract VFIO Functions to drivers
Some functions may be used in modules other than the
VFIO module; extract them and make them available to
other layers such as the sandbox.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
c8cf7ed3bc gpu: Add ColdPlug of VFIO devices with devManager
If we have a VFIO device and cold-plug is enabled
we mark each device as ColdPlug=true and let the VFIO
module do the attaching.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
e2b5e7f73b gpu: Add Rawdevices to hypervisor
RawDevices are used to get PCIe device info early, before the sandbox
is started, to make better PCIe topology decisions.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
6107c32d70 gpu: Assign default value to cold-plug
Make sure the configuration is propagated to the right structs
and the default value is assigned.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
377ebc2ad1 gpu: Add configuration option for cold-plug VFIO
Users can set cold-plug="root-port" to cold plug a VFIO device in QEMU

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Zvonko Kaiser
c18ceae109 gpu: Add new struct PCIePort
Add a new enum so that the hypervisor can distinguish between PCIe
components; it can be used for both hot-plug and cold-plug of PCIe devices.
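A minimal Go sketch of such an enum-like type and of validating the cold-plug configuration value. Only the `root-port` value comes from these commit messages; the `no-port` value and the names `PCIePort`, `RootPort`, `NoPort` and `parsePCIePort` are hypothetical, used for illustration only.

```go
package main

import "fmt"

// PCIePort is a sketch of an enum-like string type describing where a
// PCIe device gets plugged. "no-port" is a hypothetical value meaning
// "cold-plug disabled".
type PCIePort string

const (
	RootPort PCIePort = "root-port"
	NoPort   PCIePort = "no-port"
)

// parsePCIePort validates a configuration value, treating an empty string
// as "no cold-plug" and rejecting anything unknown.
func parsePCIePort(value string) (PCIePort, error) {
	switch PCIePort(value) {
	case RootPort:
		return RootPort, nil
	case NoPort, "":
		return NoPort, nil
	default:
		return "", fmt.Errorf("invalid PCIe port %q", value)
	}
}

func main() {
	port, err := parsePCIePort("root-port")
	fmt.Println(port, err) // root-port <nil>
}
```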

Fixes: #6687

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-26 09:47:37 +00:00
Bin Liu
509bc8b6c8 Merge pull request #6718 from openanolis/mengze/keep_abnormal
runtime-rs: support keep_abnormal in toml config
2023-04-26 12:36:52 +08:00
Bin Liu
b6d880510a Merge pull request #6595 from zvonkok/gpu-snp-tdx-kernel
gpu: Build and Ship an GPU enabled Kernel
2023-04-26 12:33:51 +08:00
Eduardo Berrocal
9c38204f13 virtcontainers/persist: Improved test coverage 65% to 87.5%
Expanded tests on manager_test.go to cover more lines of code.

Fixes: #259

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-25 23:53:46 +00:00
Eduardo Berrocal
1c1ee8057c pkg/signals: Improved test coverage 60% to 100%
Expanded the tests in signals_test.go to cover more lines of code. 'go test' won't
show 100% coverage (only 66.7%), because one test needs to spawn a new
process (since it is testing a function that calls os.Exit(1)).

Fixes: #256

Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
2023-04-25 23:34:13 +00:00
mengze
cc8ea3232e runtime-rs: support keep_abnormal in toml config
This patch adds keep_abnormal in runtime config. If keep_abnormal =
true, it means that 1) if the runtime exits abnormally, the cleanup
process will be skipped, and 2) the runtime will not exit even if the
health check fails.

This option is typically used to retain abnormal information for
debugging and should NOT be enabled by default.

Fixes: #6717

Signed-off-by: mengze <mengze@linux.alibaba.com>
Signed-off-by: quanweiZhou <quanweiZhou@linux.alibaba.com>
2023-04-25 13:47:44 +08:00
David Esparza
7fdaab49bc Merge pull request #6295 from dborquez/add_kernel_module_checks_kvm
kata-ctl: checks for kvm, kvm_intel modules loaded
2023-04-24 13:33:18 -06:00
Greg Kurz
0ca6d3b726 Merge pull request #6681 from Vlad1mir-D/6677-fix-kata-agent-dbus-connection
osbuilder: Fix D-Bus enabling in the dracut case
2023-04-24 17:31:13 +02:00
Bin Liu
3d8688f92e Merge pull request #6620 from jongwu/docker_fail_start_snap
snap: fix docker start fail issue
2023-04-24 10:53:16 +08:00
Archana Shinde
97291d88e9 Merge pull request #6696 from amshinde/kata-manager-containerd-fix
kata-manager: Fix containerd download
2023-04-21 09:54:30 -07:00
Archana Shinde
96e8470dbe kata-manager: Fix containerd download
Newer containerd releases have an additional static package published.
Because of this, download_url contains two URLs, causing curl to fail.
To resolve this, pick the first URL from the containerd releases to
download containerd.
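kata-manager is a shell script; the following Go sketch only illustrates the selection logic under stated assumptions: `firstDownloadURL` is a hypothetical name, the URLs are placeholders, and picking the first entry assumes the regular (non-static) package is listed first.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// firstDownloadURL keeps only the first of several whitespace- or
// newline-separated URLs, so that curl receives a single URL.
func firstDownloadURL(downloadURL string) (string, error) {
	fields := strings.Fields(downloadURL)
	if len(fields) == 0 {
		return "", errors.New("no download URL found")
	}
	return fields[0], nil
}

func main() {
	// Placeholder URLs, for illustration only.
	urls := "https://example.com/containerd-1.7.0-linux-amd64.tar.gz\n" +
		"https://example.com/containerd-static-1.7.0-linux-amd64.tar.gz"
	url, err := firstDownloadURL(urls)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```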

Fixes: #6695

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-04-20 23:08:51 -07:00
David Esparza
432d407440 kata-ctl: checks for kvm, kvm_intel modules loaded
Ensure that the kvm and kvm_intel modules are loaded.
Rename the get_cpu_info() function to read_file_contents().

Fixes #5332

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2023-04-20 11:29:36 -06:00
Zvonko Kaiser
b1730e4a67 gpu: Add new kernel build option to usage()
With each release, make sure we ship a GPU-enabled kernel.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-20 07:48:30 +00:00
Fupan Li
ceefd50bd0 Merge pull request #6680 from Tim-Zhang/fix-ut-bad-fd
agent: Fix ut issue caused by fd double closed
2023-04-20 11:18:27 +08:00
Fupan Li
a7b4b69230 Merge pull request #6673 from Tim-Zhang/upgrade-ttrpc-protobuf
Bump ttrpc to 0.7.2 and protobuf to 3.2.0
2023-04-20 10:13:43 +08:00
Fupan Li
a1568cd2f5 Merge pull request #6676 from zvonkok/gpu-runtime
gpu: Add GPU enabled confguration and runtime
2023-04-19 13:01:49 +08:00
Vladimir
3e7b902265 osbuilder: Fix D-Bus enabling in the dracut case
- D-Bus enabling now occurs only in setup_rootfs (instead of
prepare_overlay and setup_rootfs)
- Adjust permissions of / so dbus-broker will be able to traverse FS

These changes enable kata-agent to successfully communicate with D-Bus.

Fixes #6677

Signed-off-by: Vladimir <amigo.elite@gmail.com>
2023-04-18 23:17:34 +03:00
Tim Zhang
53c749a9de agent: Fix ut issue caused by fd double closed
Never try to close the same fd twice, even in a unit test.

A file descriptor is a number that can be reused, so when you close
the same number twice, the second close may close another file
descriptor, and using that wrongly closed fd then fails with
'Bad file descriptor (os error 9)'.

Fixes: #6679

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-18 23:19:10 +08:00
Hyounggyu Choi
5c032c64ac Merge pull request #6664 from zvonkok/vfio-fix
gpu: Do not pass-through PCI (Host) Bridges
2023-04-18 19:50:15 +09:00
Tim Zhang
2e3f19af92 agent: fix clippy warnings caused by protobuf3
Fix warnings introduced by protobuf upgrade.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 20:15:49 +08:00
Tim Zhang
4849c56faa agent: Fix unit test issue cuased by protobuf upgrade
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
0a582f7815 trace-forwarder: remove unused crate protobuf
Remove unused crate protobuf.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
73253850e6 kata-ctl: remove unused crate ttrpc
Remove unused crate ttrpc.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
76d2e30547 agent-ctl: Bump ttrpc from 0.6.0 to 0.7.1
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
eb3d20dccb protocols: Add ut for Serde
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
59568c79dd protocols: add support for Serde
rust-protobuf@3 no longer supports Serde natively,
so we need to implement it ourselves.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:21 +08:00
Tim Zhang
a6b4d92c84 runtime-rs: Bump ttrpc from 0.6.0 to 0.7.1
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 19:49:20 +08:00
Zvonko Kaiser
ac7c63bc66 gpu: Add containerd shim for qemu-gpu
Last but not least, add the containerd shim configuration
pointing to the correct configuration-<shim>.toml.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:45:04 +00:00
Zvonko Kaiser
a0cc8a75f2 gpu: Add a kube runtime class
With the configuration added, add the corresponding kube
runtime class.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:42:04 +00:00
Zvonko Kaiser
a81fff706f gpu: Adding a GPU enabled configuration
We need to enable hotplug on the PCI root port and enable at least one
root port. Also set guest-hooks-dir to the correct path.

Fixes: #6675

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:40:09 +00:00
Tim Zhang
8af6fc77cd agent: Bump ttrpc from 0.6.0 to 0.7.1
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 18:31:41 +08:00
Tim Zhang
009b42dbff protocols: Fix unit test
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 18:31:41 +08:00
Tim Zhang
392732e213 protocols: Bump ttrpc from 0.6.0 to 0.7.1
Fixes: #6646

Signed-off-by: Tim Zhang <tim@hyper.sh>
2023-04-17 18:31:35 +08:00
Zvonko Kaiser
f4f958d53c gpu: Do not pass-through PCI (Host) Bridges
On some systems a GPU is in an IOMMU group with a PCI Bridge and
PCI Host Bridge. By default, no PCI Bridge needs to be passed through.
When scanning the IOMMU group, ignore devices with a 0x06 class ID prefix
(PCI base class 0x06, bridge devices).
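A minimal Go sketch of such a class check, assuming the class is read from the sysfs `class` attribute of each device in the IOMMU group (a 24-bit value such as `0x060400`: base class, subclass, programming interface). `isPCIBridge` is a hypothetical name, not the Kata implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pciBaseClassBridge is the PCI base class code shared by all bridge
// devices (for example 0x0600 host bridge, 0x0604 PCI-to-PCI bridge).
const pciBaseClassBridge = 0x06

// isPCIBridge reports whether a sysfs class value such as "0x060400"
// belongs to a bridge device.
func isPCIBridge(class string) (bool, error) {
	// sysfs values end with a newline, so trim before parsing;
	// base 0 lets ParseUint handle the "0x" prefix.
	v, err := strconv.ParseUint(strings.TrimSpace(class), 0, 32)
	if err != nil {
		return false, err
	}
	// The base class is the most significant byte of the 24-bit class code.
	return (v>>16)&0xff == pciBaseClassBridge, nil
}

func main() {
	for _, c := range []string{"0x030200", "0x060400", "0x060000"} {
		bridge, err := isPCIBridge(c)
		if err != nil {
			fmt.Println(c, "error:", err)
			continue
		}
		fmt.Println(c, bridge) // only the 0x06xxxx classes print true
	}
}
```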

Fixes: #6663

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 10:08:23 +00:00
Zvonko Kaiser
825e769483 gpu: Add GPU support to default kernel without any TEE
With each release, make sure we ship a GPU-enabled kernel.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 09:58:58 +00:00
Zvonko Kaiser
e4ee07f7d4 gpu: Add GPU TDX experimental kernel
With each release, make sure we ship a GPU- and TEE-enabled kernel.
This adds tdx-experimental kernel support.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-17 09:58:52 +00:00
Fabiano Fidêncio
243cb2e3af Merge pull request #6670 from fidencio/topic/fix-caching-of-tdvf-and-tdx-qemu
cache-components: Fix caching of TDVF and QEMU for TDX
2023-04-16 09:04:04 +02:00
Fabiano Fidêncio
a1272bcf1d gha: tdx: Fix typo overlay -> overlays
The beauty of GHA not allowing us to easily test changes in the yaml
files as part of the PR has hit us again. :-/

The correct path for the k3s deployment is
tools/packaging/kata-deploy/kata-deploy/overlays/k3s instead of
tools/packaging/kata-deploy/kata-deploy/overlay/k3s.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-15 15:00:06 +02:00
Fabiano Fidêncio
3fa0890e5e cache-components: Fix TDVF caching
TDVF caching is not working because the tarball name is incorrect. The
expected name is kata-static-tdvf.tar.xz, but the script is looking for
kata-static-tdx.tar.xz.

This happens because logic to convert tdx -> tdvf was added to the
build scripts, but I missed adding it to the caching scripts.

Fixes: #6669

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-15 14:12:29 +02:00
Fabiano Fidêncio
80e3a2d408 cache-components: Fix TDX QEMU caching
TDX QEMU caching is not working as expected, as we're checking its
version by looking at "assets.hypervisor.${QEMU_FLAVOUR}.version", which is
correct for standard QEMU. However, for TDX QEMU we should be checking
"assets.hypervisor.${QEMU_FLAVOUR}.tag".
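A minimal Go sketch of the key selection (the real logic lives in the shell caching scripts). `qemuVersionKey` is hypothetical, and recognising the TDX flavour by `tdx` in its name is an assumption; the flavour name `qemu-tdx-experimental` comes from the local-build target mentioned in this log.

```go
package main

import (
	"fmt"
	"strings"
)

// qemuVersionKey builds the versions.yaml key used to decide whether a
// cached QEMU build is still current: standard QEMU is pinned by
// "version", while the TDX QEMU flavour is pinned by "tag".
func qemuVersionKey(qemuFlavour string) string {
	field := "version"
	// Assumption: TDX flavours are recognisable by "tdx" in their name.
	if strings.Contains(qemuFlavour, "tdx") {
		field = "tag"
	}
	return "assets.hypervisor." + qemuFlavour + "." + field
}

func main() {
	fmt.Println(qemuVersionKey("qemu"))
	fmt.Println(qemuVersionKey("qemu-tdx-experimental"))
}
```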

Fixes: #6668

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-15 14:12:26 +02:00
Fabiano Fidêncio
fffe2c6082 Merge pull request #6648 from fidencio/topic/gha-tdx-improvements-and-fixes
gha: tdx: Ensure kata-deploy is removed after the tests run
2023-04-15 00:21:31 +02:00
Bo Chen
a819ce145f Merge pull request #6633 from likebreath/0406/clh_v31.0
versions: Upgrade to Cloud Hypervisor v31.0
2023-04-14 13:52:19 -07:00
Zvonko Kaiser
87ea43cd4e gpu: Add configuration fragment
Add a configuration fragment for the kernel and, depending on the
TEE, update the kernel LOCALVERSION.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-14 07:52:51 +00:00
Zvonko Kaiser
aca6ff7289 gpu: Build and Ship an GPU enabled Kernel
With each release, make sure we ship a GPU- and TEE-enabled kernel.

Fixes: #6553

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-14 07:52:42 +00:00
Fabiano Fidêncio
dc662333df runtime: Increase the dial_timeout
When testing on AKS, we've been hitting the dial_timeout every now and
then. Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconds in case of TEEs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 22:42:52 +02:00
Greg Kurz
897c0bc67e Merge pull request #6658 from gkurz/osbuilder-dracut-dbus
osbuilder: Enable dbus in the dracut case
2023-04-13 19:03:15 +02:00
Greg Kurz
eb1762e813 osbuilder: Enable dbus in the dracut case
The agent now offloads cgroup configuration to systemd when
possible. This requires enabling D-Bus in order to communicate
with systemd.

Fixes #6657

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-04-13 14:16:50 +02:00
Greg Kurz
f9a94f8fc5 Merge pull request #6623 from UiPath/fix-no-space-device
runtime: Don't create socket file in /run/kata
2023-04-13 10:36:20 +02:00
Fabiano Fidêncio
f478b9115e clh: tdx: Update timeouts for confidential guest
Booting a TDX guest takes more time than booting a normal VM. These
values are already used in the CCv0 branch, and we're just
bringing them to the `main` branch as well.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
3b76abb366 kata-deploy: Ensure node is ready after CRI Engine restart
Let's ensure the node is ready after the CRI Engine restart, otherwise
we may proceed and scripts may simply fail if they try to deploy a pod
while the CRI Engine is not yet restarted (and, consequently, the node
is not Ready).

Related: #6649

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
5ec9ae0f04 kata-deploy: Use readinessProbe to ensure everything is ready
A readinessProbe ensures the kata-deploy pod is only marked as
Ready once it has finished all the needed configuration on the node.

Related: #6649

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
ea386700fe kata-deploy: Update podOverhead for TDX
As TEEs cannot hotplug memory / CPU, we *must* consider the default
values for those as part of the podOverhead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
e31efc861c gha: tdx: Use the k3s overlay
As the TDX machine is using k3s, let's make sure we're deploying
kata-deploy using the k3s overlay.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
542bb0f3f3 gha: tdx: Set KUBECONFIG env at the job level
By doing this we avoid having to set it up on every step.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
d7fdf19e9b gha: tdx: Delete kata-deploy after the tests finish
We must ensure that no kata-deploy is left behind after the tests
finish, otherwise it may interfere with the next run.

Fixes: #6647

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Fabiano Fidêncio
da35241a91 tests: k8s: Skip k8s-cpu-ns when testing TDX
TEEs do not support CPU / memory hotplug, thus this test must be
skipped.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-13 10:18:07 +02:00
Alexandru Matei
db2cac34d8 runtime: Don't create socket file in /run/kata
The socket file for shim management is created in /run/kata
and it isn't deleted after the container is stopped. After
running and stopping thousands of containers, the /run folder
runs out of space.

Fixes #6622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Co-authored-by: Greg Kurz <groug@kaod.org>
2023-04-13 10:21:29 +03:00
Jianyong Wu
6d315719f0 snap: fix docker start fail issue
In the Arm baseline CI, docker fails to start with the error: "no sockets found via
socket activation: make sure the service was started by systemd". I found
a solution in [1] that fixes it.

[1] https://forums.docker.com/t/failed-to-load-listeners-no-sockets-found-via-socket-activation-make-sure-the-service-was-started-by-systemd/62505

Fixes: #6619
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2023-04-13 09:35:40 +08:00
Zhongtao Hu
328793bb27 Merge pull request #6585 from Apokleos/nydus_prefetch_files
nydus_rootfs/prefetch_files: add prefetch_files for RAFS
2023-04-12 19:58:36 +08:00
Zvonko Kaiser
e4b3b08871 gpu: Add proper CONFIG_LOCALVERSION depending on TEE
If conf_guest is set, we need to update CONFIG_LOCALVERSION
to match the suffix created in install_kata
(-nvidia-gpu-{snp|tdx}), so that the Linux headers built with
make deb-pkg for TDX or SNP are named the same.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-12 11:30:59 +00:00
Zhongtao Hu
fef531f565 Merge pull request #6618 from Apokleos/virtiofs_extra_cache_mode
runtime-rs/virtio-fs: add support extra handler for cache mode.
2023-04-12 14:40:05 +08:00
Bin Liu
9327bb0912 Merge pull request #6639 from openanolis/nerdctl
runtime-rs: enable nerdctl to setup cni plugin
2023-04-12 12:04:37 +08:00
Zhongtao Hu
69ba2098f8 runtime-rs: remove network entities and netns
remove network entities and netns

Fixes:#4693
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-04-12 10:21:06 +08:00
Zhongtao Hu
b31f103d12 runtime-rs: enable nerdctl cni plugin
1. When nerdctl is used to set up the network for Kata, nerdctl does
not create a netns, so Kata needs to create the netns itself.

2. After the VM starts, nerdctl calls the CNI plugin via an OCI hook, so
we need to rescan the netns after the interfaces have been created and
hotplug the network devices into the VM.

Fixes:#4693
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-04-12 10:21:04 +08:00
Fabiano Fidêncio
3b3656d96d Merge pull request #6522 from fidencio/topic/add-tdx-artefacts-from-2023ww01-to-main
tdx: Add artefacts from the latest TDX tools release into main
2023-04-11 20:43:02 +02:00
Fabiano Fidêncio
50ce33b02d Merge pull request #6205 from fengwang666/non-root-clh
runtime: support non-root for clh
2023-04-11 19:34:00 +02:00
Fabiano Fidêncio
4751adbea1 Merge pull request #6610 from fidencio/topic/gha-run-dragonball-k8s-tests
gha: ci-on-push: Run k8s tests with dragonball
2023-04-11 18:16:14 +02:00
Fabiano Fidêncio
69d7a959c8 gha: ci-on-push: Run tests on TDX
Now that we've added a TDX capable external runner, let's make sure we
also run the basic tests using TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Fabiano Fidêncio
5a0727ecb4 kata-deploy: Ship kata-qemu-tdx runtimeClass
Let's make sure we configure containerd for the kata-qemu-tdx handler
and ship the kata-qemu-tdx runtime class for kubernetes.

Fixes: #6537

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Fabiano Fidêncio
98682805be config: Add configuration for QEMU TDX
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 16:10:35 +02:00
Fabiano Fidêncio
3e15800199 govmm: Directly pass the firmware using -bios with TDX
Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a
pflash device and instead works as RAM, so the "-bios" option is used to
load TDVF.

OVMF is the open-source firmware that implements TDVF support. Thus
the command line to specify and load TDVF is ``-bios OVMF.fd``.
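A minimal Go sketch of the firmware selection described above. `firmwareArgs` is a hypothetical helper, not the govmm API, and the non-TDX branch (a read-only pflash drive) is an illustrative assumption about how firmware is attached when read-only memslots are available.

```go
package main

import "fmt"

// firmwareArgs returns the QEMU arguments used to load the firmware.
// TDX guests get -bios, because TDVF (implemented by OVMF) cannot be
// mapped as a read-only pflash device and runs from RAM instead.
func firmwareArgs(firmwarePath string, tdx bool) []string {
	if firmwarePath == "" {
		return nil // no firmware configured
	}
	if tdx {
		return []string{"-bios", firmwarePath}
	}
	// Assumption: non-TDX guests attach the firmware as read-only pflash.
	return []string{"-drive", "if=pflash,format=raw,readonly=on,file=" + firmwarePath}
}

func main() {
	// The path is a placeholder for illustration.
	fmt.Println(firmwareArgs("/usr/share/ovmf/OVMF.fd", true))
}
```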

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
3c5ffb0c85 govmm: Set "sept-ve-disable=on"
This is needed since 22ww49.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
ed145365ec runtime/qemu: Drop "kvm-type=tdx"
This is not supported since 22ww49.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
25b3cdd38c virtcontainers: Drop check for the tdx CPU flag
In the recent kernels provided by Intel the `tdx` CPU flag is not
present anymore.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
01bdacb4e4 virtcontainers: Also check /sys/firmwares/tdx for TDX
Let's make sure we also check /sys/firmwares/tdx for TDX guest
protection, as the location may depend on whether TDX Seam is being used
or not.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
9feec533ce cache: Add ability to cache OVMF
Let's add the ability to cache OVMF, which right now we're only building
and shipping for TDX.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
ce8d982512 gha: Build and ship the OVMF for TDX
Let's build the OVMF with TDX support as part of our tests, and let's
ship it as part of our releases.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
39c3fab7b1 local-build: Add support to build OVMF for TDX
Let's add the needed targets and modifications to be able to build
OVMF for TDX as part of the local-build scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
054174d3e6 versions: Bump OVMF for TDX
Let's update the OVMF for TDX version to the latest release of the
Intel TDX tools tested with Kata Containers.

This change requires a newer version of `nasm` than the one provided by
the container used to build the project. This change will also be
needed for SEV-SNP and was originally done by Alex Carter (thanks!).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
800fb49da1 packaging: Add get_ovmf_image_name() helper
As we'll be using this from different places in the near future, let's
create a helper function as part of the libs.sh.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
fbf03d7aca cache: Document kernel-tdx-experimental
Let's make users of cache_components_main.sh aware that they can
also cache the kernel-tdx-experimental builds.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
5d79e96966 cache: Add a space to ease the reading of the kernel flavours
Right now it's quite hard to read those, let's improve it a little bit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
6e4726e454 cache: Fix typos
Let's just fix a few simple typos:
* kernek -> kernel
* experimetnal -> experimental

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
fc22ed0a8a gha: Build and ship the Kernel for TDX
Let's build the kernel with TDX support as part of our tests, and let's
ship it as part of our releases.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
502844ced9 local-build: Add support to build Kernel for TDX
Let's add the needed targets and modifications to be able to build
kernel-tdx-experimental as part of the local-build scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
b2585eecff local-build: Avoid code duplication building the kernel
Let's create a `install_kernel_helper()` function, as it was already
done for QEMU, and rely on that when calling `install_kernel` and
`install_kernel_dragonball_experimental`.

This helps us to reduce the code duplication by a fair amount.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
f33345c311 versions: Update Kernel TDX version
Let's update the Kernel TDX version to the latest release of the
Intel TDX tools tested with Kata Containers.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
20ab2c2420 versions: Move Kernel TDX to its own experimental entry
Although we've been providing users a way to build the kernel with TDX
support, this must be moved to its own experimental entry instead of
where it currently is.

The reason is that the patches are not yet merged into the
kernel, and this is still an experimental build of the project.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
3d9ce3982b cache: Allow specifying the QEMU_FLAVOUR
Let's do what we already did when caching the kernel, and allow passing
a FLAVOUR of the project to build.

By doing this we can re-use the same function used to cache QEMU to also
cache any kind of experimental QEMU that we may happen to have.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
33dc6c65aa gha: Build and ship QEMU for TDX
Let's build QEMU TDX as part of our tests, and let's ship it as part of
our releases.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
eceaae30a5 local-build: Add support to build QEMU for TDX
Let's add the needed targets and modifications to be able to build
qemu-tdx-experimental as part of the local-build scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:23:42 +02:00
Fabiano Fidêncio
f7b7c187ec static-build: Improve qemu-experimental build script
Let's make sure the `qemu_suffix` and `qemu_tarball_name` can be
specified. With this, we make it really easy to reuse this script for
any additional flavour of an experimental QEMU that ends up having to be
built (specifically looking at the ones for Confidential Containers
here).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
3018c9ad51 versions: Update QEMU TDX version
Let's update the QEMU TDX version to the latest release of the Intel
TDX tools tested with Kata Containers.

To do this update, we had to relax the QEMU version checks for some
of the configuration options, as those were removed right after the
7.1.0 development window opened (hence the 7.0.50 check).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
800ee5cd88 versions: Move QEMU TDX to its own experimental entry
Although we've been providing users a way to build QEMU with TDX
support, this must be moved to its own experimental entry instead of
where it currently is.

The reason is that the patches are not yet merged into QEMU,
and this is still an experimental build of the project.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
1315bb45f9 local-build: Add dragonball kernel to the all target
As the dragonball kernel is shipped as part of our releases, it must be
added to the `all` target.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
73e108136a local-build: Rename non vanilla kernel build functions
In order to make it easier to read, let's just rename the
install_dragonball_experimental_kernel and install_experimental_kernel
to install_kernel_dragonball_experimental and
install_kernel_experimental, respectively.

This allows us to quickly get to those functions when looking for
`install_kernel`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
1d851b4be3 local-build: Cosmetic changes in build targets
This is a simple cosmetic change, adding a space between the function
call and the `;;`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 15:17:04 +02:00
Fabiano Fidêncio
49ce685ebf gha: k8s-on-aks: Always delete the AKS cluster
Regardless of the tests succeeding or failing, the AKS cluster **must be
deleted**.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 13:40:40 +02:00
Fabiano Fidêncio
e2a770df55 gha: ci-on-push: Run k8s tests with dragonball
Now that the infra for running dragonball tests has been enabled, let's
actually make sure to have them running on each PR.

The tests skipped are:
* `k8s-cpu-ns.bats`, as CPU resize doesn't seem to be yet properly
  supported on runtime-rs
  * https://github.com/kata-containers/kata-containers/issues/6621

Fixes: #6605

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-11 11:47:47 +02:00
Fabiano Fidêncio
aee6174a53 Merge pull request #6637 from gkurz/cpu-shares-to-weight
rustjail: Use CPUWeight with systemd and CgroupsV2
2023-04-11 10:55:48 +02:00
GabyCT
dc74133e74 Merge pull request #6631 from fidencio/topic/gha-create-delete-aks-cannot-be-workflows
gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
2023-04-10 14:05:24 -06:00
Zhongtao Hu
8cdec5707e Merge pull request #6540 from houstar/main
docs: update the rust version from version.yaml
2023-04-10 16:53:21 +08:00
Qingyuan Hou
d1f550bd1e docs: update the rust version from versions.yaml
Fixes: #6539
Signed-off-by: Qingyuan Hou <lenohou@gmail.com>
2023-04-10 03:34:15 +00:00
alex.lyn
f3595e48b0 nydus_rootfs/prefetch_files: add prefetch_files for RAFS
Add a sandbox annotation that specifies the path to the
prefetch_files.list of the container image being used. The runtime
passes it to the hypervisor, which uses it to find the corresponding
prefetch file. The format looks like:
"io.katacontainers.config.hypervisor.prefetch_files.list"
      = /path/to/<uid>/xyz.com/fedora:36/prefetch_file.list
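
A minimal sketch of how a runtime could read this annotation, assuming
the sandbox annotations are available as a `HashMap<String, String>`;
the helper name `prefetch_list_path` is hypothetical and not taken from
runtime-rs:

```rust
use std::collections::HashMap;
use std::path::PathBuf;

// Annotation key quoted in this commit message.
const PREFETCH_LIST_ANNOTATION: &str =
    "io.katacontainers.config.hypervisor.prefetch_files.list";

/// Hypothetical helper: return the prefetch list path from the sandbox
/// annotations, or `None` when the annotation is absent or empty.
fn prefetch_list_path(annotations: &HashMap<String, String>) -> Option<PathBuf> {
    annotations
        .get(PREFETCH_LIST_ANNOTATION)
        .map(|value| value.trim())
        .filter(|value| !value.is_empty())
        .map(PathBuf::from)
}
```

A missing or empty value yields `None`, so the hypervisor side can
simply skip prefetching in that case.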

Fixes: #6582

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-04-10 10:05:52 +08:00
Zhongtao Hu
3bfaafbf44 fix: oci hook
1. When deserializing the OCI hooks, use camel case for
`createRuntime`.

2. Pass the bundle directory instead of the path to config.json.
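
A sketch of both points, for illustration only: the real change lives in
the serde-based deserialization, whereas this dependency-free version
uses a plain `match`; `HookStage`, `hook_stage` and `bundle_dir` are
hypothetical names. The OCI runtime spec defines the `bundle` field of
the state passed to hooks as the bundle directory, which is why the
parent of config.json is used:

```rust
use std::path::Path;

/// OCI hook stages, keyed in config.json by their camelCase names.
#[derive(Debug, PartialEq)]
enum HookStage {
    Prestart,
    CreateRuntime,
    CreateContainer,
    StartContainer,
    Poststart,
    Poststop,
}

/// Hypothetical helper: map a `hooks` key from config.json to its stage.
/// A snake_case key such as "create_runtime" is not a valid OCI key.
fn hook_stage(key: &str) -> Option<HookStage> {
    match key {
        "prestart" => Some(HookStage::Prestart),
        "createRuntime" => Some(HookStage::CreateRuntime),
        "createContainer" => Some(HookStage::CreateContainer),
        "startContainer" => Some(HookStage::StartContainer),
        "poststart" => Some(HookStage::Poststart),
        "poststop" => Some(HookStage::Poststop),
        _ => None,
    }
}

/// Hooks expect the bundle *directory*, i.e. the parent of config.json.
fn bundle_dir(config_json: &Path) -> Option<&Path> {
    config_json.parent()
}
```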

Fixes: #4693
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
2023-04-10 09:53:43 +08:00
Greg Kurz
c1fbaae8d6 rustjail: Use CPUWeight with systemd and CgroupsV2
The CPU shares property belongs to CgroupsV1. CgroupsV2 uses CPU weight
instead. The correct value is computed in the latter case, but it is
passed to systemd using the legacy property. Systemd rejects the request
and the agent exits with the following error:

        Value specified in CPUShares is out of range: unknown

Replace the "shares" wording with "weight" in the CgroupsV2 code to
avoid confusion. Use the "CPUWeight" property since this is what
systemd expects in this case.
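
A minimal sketch of the shares-to-weight conversion and the property
selection, assuming the linear mapping from the crun documentation
referenced below (cpu.weight = 1 + ((cpu.shares - 2) * 9999) / 262142);
the names are hypothetical and this is not the rustjail code itself:

```rust
/// Cgroup hierarchy version the agent is driving through systemd.
#[derive(Debug, PartialEq)]
enum CgroupVersion {
    V1,
    V2,
}

/// Map a cgroup v1 `cpu.shares` value (range [2, 262144]) onto the
/// cgroup v2 `cpu.weight` range [1, 10000] (assumed crun formula).
fn shares_to_weight(shares: u64) -> u64 {
    let shares = shares.clamp(2, 262_144);
    1 + ((shares - 2) * 9_999) / 262_142
}

/// Pick the systemd property name and value for the given cgroup version.
fn systemd_cpu_property(version: CgroupVersion, shares: u64) -> (&'static str, u64) {
    match version {
        // CgroupsV1: systemd accepts CPUShares directly.
        CgroupVersion::V1 => ("CPUShares", shares),
        // CgroupsV2: systemd expects CPUWeight; passing the converted
        // value under CPUShares is the bug described above.
        CgroupVersion::V2 => ("CPUWeight", shares_to_weight(shares)),
    }
}
```

With this mapping, the default of 1024 shares becomes a weight of 39
under CgroupsV2.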

Fixes #6636

References:

https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#CPUWeight=weight
https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html#systemd%20252
https://github.com/containers/crun/blob/main/crun.1.md#cpu-controller

Signed-off-by: Greg Kurz <groug@kaod.org>
2023-04-07 17:57:26 +02:00
Bo Chen
375187e045 versions: Upgrade to Cloud Hypervisor v31.0
Details of this release can be found in our new roadmap project as
iteration v31.0: https://github.com/orgs/cloud-hypervisor/projects/6.

Fixes: #6632

Signed-off-by: Bo Chen <chen.bo@intel.com>
2023-04-06 14:35:26 -07:00
Fabiano Fidêncio
79f3047f06 gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
I should have seen this coming, but currently the "create" and "delete"
AKS workflows cannot be imported and used as a job's step, resulting in
an error when trying to find the corresponding action.yaml file for them.

Fixes: #6630

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 22:56:08 +02:00
Fabiano Fidêncio
ee5dda012b Merge pull request #6629 from fidencio/topic/gha-refactor-run-k8s-tests-on-aks
gha: k8s-on-aks: Set {create,delete}_aks as steps
2023-04-06 22:02:34 +02:00
Fabiano Fidêncio
2f35b4d4e5 gha: ci-on-push: Only run on main branch
Let's ensure we're only running this workflow when PRs are opened
against the main branch.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 19:11:24 +02:00
Fabiano Fidêncio
e7bd2545ef Revert "gha: ci-on-push: Depend on Commit Message Check"
This reverts commit a159ffdba7.

Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`.  The reason is
that we can only mark jobs as required if they target PRs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 19:11:14 +02:00
Fabiano Fidêncio
0d96d49633 Revert "gha: ci-on-push: Adjust to using workflow_run"
This reverts commit 3a760a157a.

Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`.  The reason is
that we can only mark jobs as required if they target PRs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 19:11:06 +02:00
Fabiano Fidêncio
c7ee45f7e5 Revert "gha: ci-on-push: Adapt chained jobs to workflow_run"
This reverts commit 7855b43062.

Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`.  The reason is
that we can only mark jobs as required if they target PRs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 19:09:54 +02:00
Fabiano Fidêncio
5d4d720647 Revert "gha: k8s-on-aks: Fix cluster name"
This reverts commit 85cc5bb534.

Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`.  The reason is
that we can only mark jobs as required if they target PRs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 19:07:04 +02:00
Fabiano Fidêncio
13d857a56d gha: k8s-on-aks: Set {create,delete}_aks as steps
We've been using {create,delete}_aks as jobs.  However, this means that
if the tests fail we'll end up deleting the AKS cluster (as expected),
with no way to recreate the cluster short of re-running all jobs, which
is a waste of resources.

Fixes: #6628

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 16:54:15 +02:00
Fabiano Fidêncio
abaf881f4a Merge pull request #6612 from fidencio/topic/gha-k8s-on-aks-fix-cluster-name
gha: k8s-on-aks: Fix cluster name
2023-04-06 10:48:38 +02:00
alex.lyn
dc6569dbbc runtime-rs/virtio-fs: add support extra handler for cache mode.
Add support for virtiofsd when users specify "-o cache auto, ..." in
virtio_fs_extra_args.

Fixes: #6615

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-04-06 16:31:02 +08:00
Fabiano Fidêncio
85cc5bb534 gha: k8s-on-aks: Fix cluster name
This was missed from the last series, as GHA will use the "target
branch" yaml file to start the workflow.

Basically we changed the name of the cluster created to stop relying on
the PR number, as that's not easily accessible on `workflow_run`.

Fixes: #6611

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-06 08:50:07 +02:00
Fabiano Fidêncio
68cb5689f5 Merge pull request #6584 from fidencio/topic/gha-k8s-also-test-dragonball
gha: Also run k8s tests on AKS with dragonball
2023-04-05 22:50:14 +02:00
Fabiano Fidêncio
ae488cc09f Merge pull request #6596 from fidencio/topic/gha-only-push-to-registry-when-merging-content
gha: Only push images to registry after merging a PR
2023-04-05 22:07:13 +02:00
Fabiano Fidêncio
2c38e17ef0 Merge pull request #6607 from fidencio/topic/gha-switch-to-using-a-D4_v5-instance
gha: aks: Use D4s_v5 instance
2023-04-05 22:06:40 +02:00
Archana Shinde
6af52cef3a Merge pull request #6590 from zvonkok/build-kernel-fix
tools: Avoid building the kernel twice
2023-04-05 11:45:59 -07:00
Greg Kurz
a3e3b0591f Merge pull request #6562 from c3d/issue/6561-unwrap-panic
rustjail: Fix panic when cgroup manager fails
2023-04-05 16:58:13 +02:00
James O. D. Hunt
cbe6f04194 Merge pull request #6501 from shippomx/dev_metrics
runtime: add filter metrics with specific names
2023-04-05 15:15:09 +01:00
Fabiano Fidêncio
1688e4f3f0 gha: aks: Use D4s_v5 instance
It's been pointed out that D4s_v5 instances are more powerful than the
D4s_v3 ones, and have the very same price.  With this in mind, let's
switch to the newer machines.

Fixes: #6606

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 16:02:17 +02:00
Fabiano Fidêncio
108d80a86d gha: Add the ability to also test Dragonball
With the changes proposed as part of this PR, an AKS cluster will be
created but no tests will be performed.

The reason we have to do this is because GitHub Actions will only run
the tests using the workflows that are part of the **target** branch,
instead of using the ones coming from the PR, and we haven't yet found
a way to work around this.

Once this commit is in, we'll actually change the tests themselves (not
the yaml files for the actions), as the checkout action ensures those
are taken from the PR.

Fixes: #6583

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 15:53:03 +02:00
Fabiano Fidêncio
2550d4462d gha: build-kata-static-tarball: Only push to registry after merge
56331bd7bc overlooked the fact that we
mistakenly tried to push the build containers to the registry for a PR,
rather than doing so only when the code is merged.

As the workflow is now shared between different actions, let's introduce
an input variable to specify in which cases we actually need to push to
the registry.

Fixes: #6592

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 13:57:26 +02:00
Fabiano Fidêncio
e81b8b8ee5 local-build: build-and-upload-payload is not quay.io specific
Let's just print "to the registry" instead of printing "to quay.io", as
the registry used is not tied to quay.io.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:44 +02:00
Fabiano Fidêncio
13929fc610 gha: publish-kata-deploy-payload: Improve registry login
Let's only try to login to the registry that's being passed as an input
argument.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:44 +02:00
Fabiano Fidêncio
41026f003e gha: payload-after-push: Pass registry / repo as inputs
We made registry / repo mandatory, but we only adapted that to the amd64
job.  Let's fix it now and make sure this is also passed to the arm64
and s390x jobs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:44 +02:00
Fabiano Fidêncio
7855b43062 gha: ci-on-push: Adapt chained jobs to workflow_run
As we're using the `workflow_run` event, the checkout action would
pull the **current target branch** instead of the PR one.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:44 +02:00
Fabiano Fidêncio
3a760a157a gha: ci-on-push: Adjust to using workflow_run
The way previously used to get the PR's commit sha can only be used with
`pull_request*` kind of events.

Let's adapt it to the `workflow_run` now that we're using it.

With this change we ended up dropping the PR number from the tarball
suffix, as that's not straightforward to get and, to be honest, not a
unique differentiator that would justify the effort.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:44 +02:00
Fabiano Fidêncio
a159ffdba7 gha: ci-on-push: Depend on Commit Message Check
Let's make this workflow dependent on the commit message check, and only
start it if the commit message check passes.

As a side effect, this allows us to run this specific workflow using
secrets, without having to rely on `pull_request_target`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-05 12:54:40 +02:00
Fabiano Fidêncio
8086c75f61 gha: Also run k8s tests on AKS with dragonball
As already done for Cloud Hypervisor and QEMU, let's make sure we can
run the AKS tests using dragonball.

Fixes: #6583

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-04 10:58:47 +02:00
Fabiano Fidêncio
1c6d7cb0f7 Merge pull request #6589 from fidencio/topic/gha-k8s-use-ghcr-instead-of-quay
gha: Use ghcr.io for the k8s CI
2023-04-04 10:48:16 +02:00
Zvonko Kaiser
fe86c08a63 tools: Avoid building the kernel twice
Two different kernel build targets (build, install) both had instructions
to build the kernel, hence it was built twice. Install should only
install and build should only build.

Fixes: #6588

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2023-04-04 05:44:44 +00:00
Fabiano Fidêncio
3215860a47 gha: Set ci-on-push to run on pull_request_target
This is less secure than running the PR on `pull_request`, and will
require using an additional `ok-to-test` label to make sure someone
deliberately ran the actions coming from a forked repo.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-03 20:50:36 +02:00
Fabiano Fidêncio
d17dfe4cdd gha: Use ghcr.io for the k8s CI
Let's switch to using the `ghcr.io` registry for the k8s CI, as this
will save us some troubles on running the CI with PRs coming from forked
repos.

Fixes: #6587

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-04-03 15:52:33 +02:00
Fabiano Fidêncio
e1f972fb1d Merge pull request #6568 from kata-containers/topic/add-k8s-tests-as-part-of-gha
GHA |Switch "kubernetes tests" from jenkins to GitHub actions
2023-04-03 14:25:35 +02:00
Christophe de Dinechin
b661e0cf3f rustjail: Add anyhow context for D-Bus connections
In cases where the D-Bus connection fails, add a little additional context about
the origin of the error.

Fixes: #6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Suggested-by: Archana Shinde <archana.m.shinde@intel.com>
Spell-checked-by: Greg Kurz <gkurz@redhat.com>
2023-04-03 14:09:34 +02:00
Fabiano Fidêncio
60c62c3b69 gha: Remove kata-deploy-test.yaml
This workflow becomes redundant as we're already testing kubernetes
using kata-deploy, and also testing it on AKS.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
Fabiano Fidêncio
43894e9459 gha: Remove kata-deploy-push.yaml
This becomes redundant now that its steps are covered as part of the
`ci-on-push.yaml`.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
Fabiano Fidêncio
cab9ca0436 gha: Add a CI pipeline for Kata Containers
This is the very first step to replacing the Jenkins CI, and I've
decided to start with an x86_64 approach only (although easily
extensible to other arches as soon as they're ready to switch), and to
start running our kubernetes tests (now running on AKS).

Fixes: #6541

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
Fabiano Fidêncio
53b526b6bd gha: k8s: Add snippet to run k8s tests on aks clusters
This will be shortly used as part of a newly created GitHub action which
will replace our Jenkins CI.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
Fabiano Fidêncio
c444c24bc5 gha: aks: Add snippets to create / delete aks clusters
Those will be shortly used as part of a newly added GitHub action for
testing k8s tests on Azure.

They've been created using the secrets we already have exposed as part
of our GitHub, and they follow a similar way to authenticate to Azure /
create an AKS cluster as done in the `/test-kata-deploy` action.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
Fabiano Fidêncio
11e0099fb5 tests: Move k8s tests to this repo
The first part of simplifying things to have all our tests using GitHub
actions is moving the k8s tests to this repo, as those will be the first
vict^W targets to be migrated to GitHub actions.

Those tests have been slightly adapted, mainly related to what they load
/ import, so they are more self-contained and do not require us to bring
a lot of scripts from the tests repo here.

A few scripts were also dropped along the way, as we no longer plan to
deploy kubernetes as part of every single run, but rather assume there
will always be k8s running whenever we land to run those tests.

It's important to mention that a few tests were not added here:

* k8s-block-volume:
* k8s-file-volume:
* k8s-volume:
* k8s-ro-volume:
  These tests depend on some sort of volume being created on the
  kubernetes node where the test will run, and this won't fly as the
  tests will run from a GitHub runner, targeting a different machine
  where kubernetes will be running.
  * https://github.com/kata-containers/kata-containers/issues/6566

* k8s-hugepages: This test depends a whole lot on the host where it
  lands and right now we cannot assume anything about that anymore, as
  the tests will run from a GitHub runner, targetting a different
  machine where kubernetes will be running.
  * https://github.com/kata-containers/kata-containers/issues/6567

* k8s-expose-ip: This is simply hanging when running on AKS and has to
  be debugged in order to figure out the root cause of that, and then
  adapted to also work on AKS.
  * https://github.com/kata-containers/kata-containers/issues/6578

Until those issues are solved, we'll keep running a jenkins job with
those tests to avoid any possible regression.

Last but not least, I've decided to **not** keep the history when
bringing those tests here, otherwise we'd end up polluting the history
of this repo a lot, without any clear benefit in doing so.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 21:55:41 +02:00
David Esparza
5d89d08fc4 Merge pull request #6564 from GabyCT/topic/updateneturl
docs: Update CNM url in networking document
2023-03-31 09:58:55 -06:00
Fabiano Fidêncio
73be4bd3f9 gha: Update actions for release.yaml
checkout@v2 should not be used anymore, please, see:
https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 13:24:26 +02:00
Fabiano Fidêncio
d38d7fbf1a gha: Remove code duplication from release.yaml
We can easily re-use the newly added build-kata-static-tarball-*.yaml as
part of the release.yaml file.

By doing this we consolidate how we build the components across our
actions.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 13:24:26 +02:00
Fabiano Fidêncio
56331bd7bc gha: Split payload-after-push-*.yaml
Let's split those actions into two different ones:
* Build the kata-static tarball
* Publish the kata-deploy payload

We're doing this as later in this series we'll start taking advantage
of both pieces.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-03-31 13:24:26 +02:00
Gabriela Cervantes
a552a1953a docs: Update CNM url in networking document
This PR updates the url for the Container Network Model
in the network document.

Fixes #6563

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-03-30 16:20:33 +00:00
Christophe de Dinechin
7796e6ccc6 rustjail: Fix minor grammatical error in function name
Rename the `unit_exist` function to `unit_exists` to follow English grammar.

Fixes: #6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2023-03-30 16:13:37 +02:00
Christophe de Dinechin
41fdda1d84 rustjail: Do not unwrap potential error with cgroup manager
There can be an error while connecting to the cgroups manager, for
example a `ENOENT` if a file is not found. Make sure that this is
reported through the proper channels instead of causing a `panic()`
that does not provide much information.
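
A minimal sketch of the pattern using only the standard library (the
rustjail code itself relies on `anyhow` contexts);
`connect_cgroup_manager` is a hypothetical stand-in that opens a path
instead of establishing a real D-Bus connection:

```rust
use std::fs::File;
use std::path::Path;

/// Hypothetical stand-in for connecting to the cgroup manager. Instead of
/// `File::open(socket).unwrap()`, which panics with little information,
/// return the error together with context about what was being attempted.
fn connect_cgroup_manager(socket: &Path) -> Result<File, String> {
    File::open(socket).map_err(|e| {
        format!(
            "failed to connect to cgroup manager at {}: {}",
            socket.display(),
            e
        )
    })
}
```

The caller can then propagate the error with `?` and report it through
the agent's normal error path.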

Fixes: #6561

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reported-by: Greg Kurz <gkurz@redhat.com>
2023-03-30 16:09:13 +02:00
Archana Shinde
07e49c63e1 Merge pull request #6257 from amshinde/kata-ctl-env
kata-ctl: add function to get platform protection.
2023-03-29 11:55:07 -07:00
Archana Shinde
a914283ce0 kata-ctl: add function to get platform protection.
This function checks for tdx, sev or snp protection on x86
platform.
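
A minimal sketch of such a check, assuming TDX is detected through
`/sys/firmware/tdx` and SEV/SNP through the `kvm_amd` module parameters
`sev` and `sev_snp`; these paths and the `Protection` enum are
assumptions for illustration and may not match what kata-ctl actually
probes:

```rust
use std::fs;
use std::path::Path;

/// Possible results of the host protection check.
#[derive(Debug, PartialEq)]
enum Protection {
    NoProtection,
    Tdx,
    Snp,
    Sev,
}

/// A kvm_amd module parameter reads "Y" (or "1" on some kernels) when enabled.
fn param_enabled(path: &Path) -> bool {
    fs::read_to_string(path)
        .map(|s| matches!(s.trim(), "Y" | "y" | "1"))
        .unwrap_or(false)
}

/// `root` is "/" on a real host; taking it as a parameter makes this testable.
fn platform_protection(root: &Path) -> Protection {
    if root.join("sys/firmware/tdx").exists() {
        return Protection::Tdx;
    }
    let params = root.join("sys/module/kvm_amd/parameters");
    // SNP is checked before SEV, since SNP-capable hosts also report SEV.
    if param_enabled(&params.join("sev_snp")) {
        return Protection::Snp;
    }
    if param_enabled(&params.join("sev")) {
        return Protection::Sev;
    }
    Protection::NoProtection
}
```

The real function only applies to x86; the root directory parameter
exists here purely so the logic can be exercised against a fake sysfs
tree.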

Fixes: #1000

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-03-28 15:40:25 -07:00
Miao Xia
0f73515561 runtime: add filter metrics with specific names
The kata-monitor metrics API returns a very large response when there
are many containers or sandboxes, which makes it harder to focus on the
metrics we actually need. Allow filtering the metrics by name.
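
A minimal sketch of name-based filtering over the Prometheus text format
that kata-monitor serves; `filter_metrics` is a hypothetical helper, not
the runtime's API, and the real change (in the Go runtime) may select
metrics differently:

```rust
use std::collections::HashSet;

/// Extract the metric name from a Prometheus text-format line. Samples look
/// like `name{labels} value`; metadata looks like `# HELP name ...`.
fn metric_name(line: &str) -> Option<&str> {
    let line = line.trim_start();
    if let Some(rest) = line
        .strip_prefix("# HELP ")
        .or_else(|| line.strip_prefix("# TYPE "))
    {
        return rest.split_whitespace().next();
    }
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    line.split(|c: char| c == '{' || c.is_whitespace()).next()
}

/// Keep only the lines (samples and HELP/TYPE metadata) whose metric
/// name is in `wanted`.
fn filter_metrics(text: &str, wanted: &HashSet<&str>) -> String {
    text.lines()
        .filter(|line| metric_name(line).map_or(false, |name| wanted.contains(name)))
        .map(|line| format!("{line}\n"))
        .collect()
}
```

Histogram series such as `<name>_bucket` or `<name>_count` do not match
`<name>` exactly, so a real filter would likely need to account for
those suffixes.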

Fixes: #6500

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2023-03-28 14:56:13 +08:00
Feng Wang
cbe6ad9034 runtime: support non-root for clh
This change makes it possible to run the cloud-hypervisor VMM as a
non-root user when the rootless flag is set to true in the configuration.

Fixes: #2567

Signed-off-by: Feng Wang <fwang@confluent.io>
2023-02-22 13:57:09 -08:00
Archana Shinde
d3bb254188 utils: Add function to check vhost-vsock
Add function to check if the host-system has the vhost-vsock
kernel module.
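
A minimal sketch, assuming the check looks for `vhost_vsock` in
`/proc/modules`; the helper names are hypothetical and the actual
utility may use a different source, such as `/sys/module`:

```rust
use std::fs;

/// Return true if `module` is listed in `/proc/modules`-formatted text,
/// where each line starts with the module name followed by its size.
fn module_loaded(proc_modules: &str, module: &str) -> bool {
    proc_modules
        .lines()
        .any(|line| line.split_whitespace().next() == Some(module))
}

/// Hypothetical host check: is the vhost_vsock module loaded right now?
/// A kernel with vhost-vsock built in (not as a module) would not appear
/// in /proc/modules, so a real check may also need to look elsewhere.
fn host_has_vhost_vsock() -> bool {
    fs::read_to_string("/proc/modules")
        .map(|text| module_loaded(&text, "vhost_vsock"))
        .unwrap_or(false)
}
```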

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2023-02-03 15:41:59 -08:00
1877 changed files with 396274 additions and 25671 deletions

@@ -9,6 +9,10 @@ on:
- labeled
- unlabeled
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
pr_wip_check:
runs-on: ubuntu-latest

@@ -10,6 +10,10 @@ on:
- labeled
- unlabeled
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-issues:
if: ${{ github.event.label.name != 'auto-backport' }}
@@ -17,7 +21,7 @@ jobs:
steps:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Install hub extension script
run: |

@@ -11,6 +11,10 @@ on:
- opened
- reopened
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
add-new-issues-to-backlog:
runs-on: ubuntu-latest
@@ -35,7 +39,7 @@ jobs:
popd &>/dev/null
- name: Checkout code to allow hub to communicate with the project
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Add issue to issue backlog
env:

@@ -12,12 +12,25 @@ on:
- reopened
- synchronize
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
add-pr-size-label:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v1
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Install PR sizing label script
run: |

@@ -2,6 +2,10 @@ on:
pull_request_target:
types: ["labeled", "closed"]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
backport:
name: Backport PR

.github/workflows/basic-ci-amd64.yaml

@@ -0,0 +1,200 @@
name: CI | Basic amd64 tests
on:
  workflow_call:
    inputs:
      tarball-suffix:
        required: false
        type: string
      commit-hash:
        required: false
        type: string
      target-branch:
        required: false
        type: string
        default: ""
jobs:
  run-cri-containerd:
    strategy:
      # We can set this to true whenever we're 100% sure that
      # the all the tests are not flaky, otherwise we'll fail
      # all the tests due to a single flaky instance.
      fail-fast: false
      matrix:
        containerd_version: ['lts', 'active']
        vmm: ['clh', 'qemu']
    runs-on: garm-ubuntu-2204-smaller
    env:
      CONTAINERD_VERSION: ${{ matrix.containerd_version }}
      GOPATH: ${{ github.workspace }}
      KATA_HYPERVISOR: ${{ matrix.vmm }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit-hash }}
          fetch-depth: 0
      - name: Rebase atop of the latest target branch
        run: |
          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
        env:
          TARGET_BRANCH: ${{ inputs.target-branch }}
      - name: Install dependencies
        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
      - name: get-kata-tarball
        uses: actions/download-artifact@v3
        with:
          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
          path: kata-artifacts
      - name: Install kata
        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts
      - name: Run cri-containerd tests
        run: bash tests/integration/cri-containerd/gha-run.sh run
  run-containerd-stability:
    strategy:
      fail-fast: false
      matrix:
        containerd_version: ['lts', 'active']
        vmm: ['clh', 'qemu']
    runs-on: garm-ubuntu-2204-smaller
    env:
      CONTAINERD_VERSION: ${{ matrix.containerd_version }}
      GOPATH: ${{ github.workspace }}
      KATA_HYPERVISOR: ${{ matrix.vmm }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit-hash }}
          fetch-depth: 0
      - name: Rebase atop of the latest target branch
        run: |
          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
        env:
          TARGET_BRANCH: ${{ inputs.target-branch }}
      - name: Install dependencies
        run: bash tests/stability/gha-run.sh install-dependencies
      - name: get-kata-tarball
        uses: actions/download-artifact@v3
        with:
          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
          path: kata-artifacts
      - name: Install kata
        run: bash tests/stability/gha-run.sh install-kata kata-artifacts
      - name: Run containerd-stability tests
        run: bash tests/stability/gha-run.sh run
  run-nydus:
    strategy:
      # We can set this to true whenever we're 100% sure that
      # the all the tests are not flaky, otherwise we'll fail
      # all the tests due to a single flaky instance.
      fail-fast: false
      matrix:
        containerd_version: ['lts', 'active']
        vmm: ['clh', 'qemu', 'dragonball']
    runs-on: garm-ubuntu-2204-smaller
    env:
      CONTAINERD_VERSION: ${{ matrix.containerd_version }}
      GOPATH: ${{ github.workspace }}
      KATA_HYPERVISOR: ${{ matrix.vmm }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit-hash }}
          fetch-depth: 0
      - name: Rebase atop of the latest target branch
        run: |
          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
        env:
          TARGET_BRANCH: ${{ inputs.target-branch }}
      - name: Install dependencies
        run: bash tests/integration/nydus/gha-run.sh install-dependencies
      - name: get-kata-tarball
        uses: actions/download-artifact@v3
        with:
          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
          path: kata-artifacts
      - name: Install kata
        run: bash tests/integration/nydus/gha-run.sh install-kata kata-artifacts
      - name: Run nydus tests
        run: bash tests/integration/nydus/gha-run.sh run
  run-runk:
    runs-on: garm-ubuntu-2204-smaller
    env:
      CONTAINERD_VERSION: lts
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit-hash }}
          fetch-depth: 0
      - name: Rebase atop of the latest target branch
        run: |
          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
        env:
          TARGET_BRANCH: ${{ inputs.target-branch }}
      - name: Install dependencies
        run: bash tests/integration/runk/gha-run.sh install-dependencies
      - name: get-kata-tarball
        uses: actions/download-artifact@v3
        with:
          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
          path: kata-artifacts
      - name: Install kata
        run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
      - name: Run tracing tests
        run: bash tests/integration/runk/gha-run.sh run
  run-vfio:
    strategy:
      fail-fast: false
      matrix:
        vmm: ['clh', 'qemu']
    runs-on: garm-ubuntu-2304
    env:
      GOPATH: ${{ github.workspace }}
      KATA_HYPERVISOR: ${{ matrix.vmm }}
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit-hash }}
          fetch-depth: 0
      - name: Rebase atop of the latest target branch
        run: |
          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
        env:
          TARGET_BRANCH: ${{ inputs.target-branch }}
      - name: Install dependencies
        run: bash tests/functional/vfio/gha-run.sh install-dependencies
      - name: get-kata-tarball
        uses: actions/download-artifact@v3
        with:
          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
          path: kata-artifacts
      - name: Run vfio tests
        timeout-minutes: 15
        run: bash tests/functional/vfio/gha-run.sh run

@@ -0,0 +1,140 @@
name: CI | Build kata-static tarball for amd64
on:
workflow_call:
inputs:
stage:
required: false
type: string
default: test
tarball-suffix:
required: false
type: string
push-to-registry:
required: false
type: string
default: no
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- agent
- agent-opa
- agent-ctl
- cloud-hypervisor
- cloud-hypervisor-glibc
- firecracker
- kata-ctl
- kernel
- kernel-sev
- kernel-dragonball-experimental
- kernel-tdx-experimental
- kernel-nvidia-gpu
- kernel-nvidia-gpu-snp
- kernel-nvidia-gpu-tdx-experimental
- log-parser-rs
- nydus
- ovmf
- ovmf-sev
- qemu
- qemu-snp-experimental
- qemu-tdx-experimental
- rootfs-image
- rootfs-image-tdx
- rootfs-initrd
- rootfs-initrd-mariner
- rootfs-initrd-sev
- runk
- shim-v2
- tdvf
- trace-forwarder
- virtiofsd
stage:
- ${{ inputs.stage }}
exclude:
- asset: agent
stage: release
- asset: agent-opa
stage: release
- asset: cloud-hypervisor-glibc
stage: release
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
with:
name: kata-artifacts-amd64${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
retention-days: 1
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:
name: kata-artifacts-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
- name: store-artifacts
uses: actions/upload-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-static.tar.xz
retention-days: 1
if-no-files-found: error
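The build jobs now check out `inputs.commit-hash` with `fetch-depth: 0`. They then run `./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"` with `TARGET_BRANCH` set. The helper script is not part of this diff. The bash function below is therefore only a hypothetical approximation of such a step, built on these assumptions:

- it fetches `TARGET_BRANCH` from `origin`;
- it rebases the checked-out commit onto that branch;
- as a design choice of this sketch, it skips the rebase when `TARGET_BRANCH` is empty, which is the default of the `target-branch` input.

The name `rebase_atop_target` is made up for illustration.

```shell
# Hypothetical approximation of a "rebase atop of the latest target branch"
# step; this is NOT the code of tests/git-helper.sh.
rebase_atop_target() {
    local target="${TARGET_BRANCH:-}"
    # target-branch defaults to "" in the reusable workflows; nothing to do then.
    if [ -z "$target" ]; then
        echo "TARGET_BRANCH is empty, skipping the rebase"
        return 0
    fi
    # A full history (fetch-depth: 0) lets git find the merge base.
    git fetch -q origin "$target" || return 1
    # Replay the checked-out commit(s) on top of the freshly fetched branch tip.
    if ! git rebase -q FETCH_HEAD; then
        git rebase --abort
        return 1
    fi
}
```

In a workflow step this would run from the repository root as `TARGET_BRANCH=main rebase_atop_target`.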

@@ -1,14 +1,29 @@
name: CI | Publish kata-deploy payload for arm64
name: CI | Build kata-static tarball for arm64
on:
workflow_call:
inputs:
target-arch:
required: true
stage:
required: false
type: string
default: test
tarball-suffix:
required: false
type: string
push-to-registry:
required: false
type: string
default: no
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
runs-on: arm64
runs-on: arm64-builder
strategy:
matrix:
asset:
@@ -22,21 +37,32 @@ jobs:
- rootfs-initrd
- shim-v2
- virtiofsd
stage:
- ${{ inputs.stage }}
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -46,65 +72,49 @@ jobs:
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: yes
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
with:
name: kata-artifacts-arm64
name: kata-artifacts-arm64${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
retention-days: 1
if-no-files-found: error
create-kata-tarball:
runs-on: arm64
runs-on: arm64-builder
needs: build-asset
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:
name: kata-artifacts-arm64
name: kata-artifacts-arm64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
- name: store-artifacts
uses: actions/upload-artifact@v3
with:
name: kata-static-tarball-arm64
name: kata-static-tarball-arm64${{ inputs.tarball-suffix }}
path: kata-static.tar.xz
retention-days: 1
if-no-files-found: error
kata-payload:
needs: create-kata-tarball
runs-on: arm64
steps:
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-arm64
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy-ci" \
"kata-containers-${{ inputs.target-arch }}"

@@ -1,10 +1,25 @@
name: CI | Publish kata-deploy payload for s390x
name: CI | Build kata-static tarball for s390x
on:
workflow_call:
inputs:
target-arch:
required: true
stage:
required: false
type: string
default: test
tarball-suffix:
required: false
type: string
push-to-registry:
required: false
type: string
default: no
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-asset:
@@ -18,21 +33,32 @@ jobs:
- rootfs-initrd
- shim-v2
- virtiofsd
stage:
- ${{ inputs.stage }}
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
@@ -43,12 +69,16 @@ jobs:
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: yes
PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
with:
name: kata-artifacts-s390x
name: kata-artifacts-s390x${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
retention-days: 1
if-no-files-found: error
@@ -61,47 +91,27 @@ jobs:
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-artifacts
uses: actions/download-artifact@v3
with:
name: kata-artifacts-s390x
name: kata-artifacts-s390x${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
- name: store-artifacts
uses: actions/upload-artifact@v3
with:
name: kata-static-tarball-s390x
name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}
path: kata-static.tar.xz
retention-days: 1
if-no-files-found: error
kata-payload:
needs: create-kata-tarball
runs-on: s390x
steps:
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v3
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-s390x
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy-ci" \
"kata-containers-${{ inputs.target-arch }}"

@@ -7,6 +7,11 @@ on:
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
cargo-deny-runner:
runs-on: ubuntu-latest
@@ -14,7 +19,7 @@ jobs:
steps:
- name: Checkout Code
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Generate Action
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: bash cargo-deny-generator.sh
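Several workflows gain the same `concurrency` block, whose group key is `${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`.

- For `pull_request` events the PR number is set. A new push to the same PR therefore lands in the same group, and `cancel-in-progress: true` cancels the run still in flight.
- For pushes, schedules and manual dispatches there is no PR number, so the `||` falls back to `github.ref`.

The bash sketch below mimics that fallback with `${var:-default}`. The function name and the example values are illustrative only.

```shell
# Mimics the concurrency group expression:
#   ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
# $1: workflow name, $2: PR number (empty when not a pull request), $3: git ref.
concurrency_group() {
    local workflow="$1" pr_number="$2" ref="$3"
    printf '%s-%s\n' "$workflow" "${pr_number:-$ref}"
}

concurrency_group "Darwin tests" 8292 refs/pull/8292/merge
concurrency_group "Kata Containers Nightly CI" "" refs/heads/main
```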

.github/workflows/ci-nightly.yaml

@@ -0,0 +1,19 @@
name: Kata Containers Nightly CI
on:
schedule:
- cron: '0 0 * * *'
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
kata-containers-ci-on-push:
uses: ./.github/workflows/ci.yaml
with:
commit-hash: ${{ github.sha }}
pr-number: "nightly"
tag: ${{ github.sha }}-nightly
target-branch: ${{ github.ref_name }}
secrets: inherit

.github/workflows/ci-on-push.yaml

@@ -0,0 +1,32 @@
name: Kata Containers CI
on:
pull_request_target:
branches:
- 'main'
- 'stable-*'
types:
# Adding 'labeled' to the list of activity types that trigger this event
# (default: opened, synchronize, reopened) so that we can run this
# workflow when the 'ok-to-test' label is added.
# Reference: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#pull_request_target
- opened
- synchronize
- reopened
- labeled
paths-ignore:
- 'docs/**'
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
kata-containers-ci-on-push:
if: ${{ contains(github.event.pull_request.labels.*.name, 'ok-to-test') }}
uses: ./.github/workflows/ci.yaml
with:
commit-hash: ${{ github.event.pull_request.head.sha }}
pr-number: ${{ github.event.pull_request.number }}
tag: ${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}
target-branch: ${{ github.event.pull_request.base.ref }}
secrets: inherit

.github/workflows/ci.yaml

@@ -0,0 +1,185 @@
name: Run the Kata Containers CI
on:
workflow_call:
inputs:
commit-hash:
required: true
type: string
pr-number:
required: true
type: string
tag:
required: true
type: string
target-branch:
required: false
type: string
default: ""
jobs:
build-kata-static-tarball-amd64:
uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
publish-kata-deploy-payload-amd64:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/publish-kata-deploy-payload-amd64.yaml
with:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
build-and-publish-tee-confidential-unencrypted-image:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Set up QEMU
uses: docker/setup-qemu-action@v2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Docker build and push
uses: docker/build-push-action@v4
with:
tags: ghcr.io/kata-containers/test-images:unencrypted-${{ inputs.pr-number }}
push: true
context: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/
platforms: linux/amd64, linux/s390x
file: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/Dockerfile
run-docker-tests-on-garm:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-docker-tests-on-garm.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-nerdctl-tests-on-garm:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-nerdctl-tests-on-garm.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-kata-deploy-tests-on-aks:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-kata-deploy-tests-on-aks.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-kata-deploy-tests-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-kata-deploy-tests-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-kata-monitor-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-kata-monitor-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-k8s-tests-on-aks:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-aks.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-k8s-tests-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-k8s-tests-with-crio-on-garm:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-with-crio-on-garm.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
secrets: inherit
run-kata-coco-tests:
needs: [publish-kata-deploy-payload-amd64, build-and-publish-tee-confidential-unencrypted-image]
uses: ./.github/workflows/run-kata-coco-tests.yaml
with:
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
run-metrics-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/run-metrics.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-basic-amd64-tests:
needs: build-kata-static-tarball-amd64
uses: ./.github/workflows/basic-ci-amd64.yaml
with:
tarball-suffix: -${{ inputs.tag }}
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
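`ci.yaml` threads a single `tag` input through every job:

- each call passes `tarball-suffix: -${{ inputs.tag }}`;
- the build and publish workflows append that suffix to their artifact names;
- the payload image is tagged `${{ inputs.tag }}-amd64`.

The bash sketch below spells out the resulting amd64 names for two cases. For a pull request, `ci-on-push.yaml` sets `tag` to `<pr-number>-<head-sha>`. For the nightly run, `ci-nightly.yaml` sets it to `<sha>-nightly`. The helper name and the shortened SHA are made up for illustration.

```shell
# Prints the names ci.yaml derives from its "tag" input for the amd64 jobs.
# Usage: ci_names <tag> <repository owner>
ci_names() {
    local tag="$1" owner="$2"
    local suffix="-${tag}"                      # tarball-suffix: -${{ inputs.tag }}
    echo "kata-artifacts-amd64${suffix}"        # per-asset build artifacts
    echo "kata-static-tarball-amd64${suffix}"   # merged kata-static.tar.xz
    # Registry/repo and image tag, passed as two separate arguments to
    # kata-deploy-build-and-upload-payload.sh.
    echo "ghcr.io/${owner}/kata-deploy-ci ${tag}-amd64"
}

ci_names "8292-0123abc" kata-containers     # pull request: <pr-number>-<head-sha>
ci_names "0123abc-nightly" kata-containers  # nightly: <sha>-nightly
```

Because the tag carries the commit SHA, images pushed to `ghcr.io` by different PRs, or by successive pushes to the same PR, get distinct tags.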

@@ -6,6 +6,10 @@ on:
- reopened
- synchronize
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
env:
error_msg: |+
See the document below for help on formatting commits for the project.
@@ -62,6 +66,9 @@ jobs:
# to be specified at the start of the regex as the action is passed
# the entire commit message.
#
# - This check will pass if the commit message only contains a subject
# line, as other body message properties are enforced elsewhere.
#
# - Body lines *can* be longer than the maximum if they start
# with a non-alphabetic character or if there is no whitespace in
# the line.
@@ -75,7 +82,7 @@ jobs:
#
# - A SoB comment can be any length (as it is unreasonable to penalise
# people with long names/email addresses :)
pattern: '^.+(\n([a-zA-Z].{0,150}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$'
pattern: '(^[^\n]+$|^.+(\n([a-zA-Z].{0,150}|[^a-zA-Z\n].*|[^\s\n]*|Signed-off-by:.*|))+$)'
error: 'Body line too long (max 150)'
post_error: ${{ env.error_msg }}

@@ -6,6 +6,11 @@ on:
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
name: Darwin tests
jobs:
test:
@@ -16,6 +21,6 @@ jobs:
with:
go-version: 1.19.3
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v4
- name: Build utils
run: ./ci/darwin-test.sh

@@ -22,7 +22,7 @@ jobs:
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Checkout code
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}

@@ -1,80 +0,0 @@
name: kata deploy build
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths:
- tools/**
- versions.yaml
jobs:
build-asset:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- kernel
- kernel-dragonball-experimental
- shim-v2
- qemu
- cloud-hypervisor
- firecracker
- rootfs-image
- rootfs-initrd
- virtiofsd
- nydus
steps:
- uses: actions/checkout@v2
- name: Build ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r --preserve=all "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
- name: store-artifact ${{ matrix.asset }}
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v2
- name: get-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: build
- name: merge-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make merge-builds
- name: store-artifacts
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
make-kata-tarball:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: make kata-tarball
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
make kata-tarball
sudo make install-tarball

@@ -1,164 +0,0 @@
on:
workflow_dispatch: # this is used to trigger the workflow on non-main branches
inputs:
pr:
description: 'PR number from the selected branch to test'
type: string
required: true
issue_comment:
types: [created, edited]
name: test-kata-deploy
jobs:
check-comment-and-membership:
runs-on: ubuntu-latest
if: |
github.event.issue.pull_request
&& github.event_name == 'issue_comment'
&& github.event.action == 'created'
&& startsWith(github.event.comment.body, '/test_kata_deploy')
|| github.event_name == 'workflow_dispatch'
steps:
- name: Check membership on comment or dispatch
uses: kata-containers/is-organization-member@1.0.1
id: is_organization_member
with:
organization: kata-containers
username: ${{ github.event.comment.user.login || github.event.sender.login }}
token: ${{ secrets.GITHUB_TOKEN }}
- name: Fail if not member
run: |
result=${{ steps.is_organization_member.outputs.result }}
if [ $result == false ]; then
user=${{ github.event.comment.user.login || github.event.sender.login }}
echo Either ${user} is not part of the kata-containers organization
echo or ${user} has its Organization Visibility set to Private at
echo https://github.com/orgs/kata-containers/people?query=${user}
echo
echo Ensure you change your Organization Visibility to Public and
echo trigger the test again.
exit 1
fi
build-asset:
runs-on: ubuntu-latest
needs: check-comment-and-membership
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- kernel-dragonball-experimental
- nydus
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
echo "pr-ref=${ref}" >> $GITHUB_OUTPUT
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
echo "pr-ref=${ref}" >> $GITHUB_OUTPUT
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
kata-deploy:
needs: create-kata-tarball
runs-on: ubuntu-latest
steps:
- name: get-PR-ref
id: get-PR-ref
run: |
if [ ${{ github.event_name }} == 'issue_comment' ]; then
ref=$(cat $GITHUB_EVENT_PATH | jq -r '.issue.pull_request.url' | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#')
else # workflow_dispatch
ref="refs/pull/${{ github.event.inputs.pr }}/merge"
fi
echo "reference for PR: " ${ref} "event:" ${{ github.event_name }}
echo "pr-ref=${ref}" >> $GITHUB_OUTPUT
- uses: actions/checkout@v2
with:
ref: ${{ steps.get-PR-ref.outputs.pr-ref }}
- name: get-kata-tarball
uses: actions/download-artifact@v2
with:
name: kata-static-tarball
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
PR_SHA=$(git log --format=format:%H -n1)
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t quay.io/kata-containers/kata-deploy-ci:$PR_SHA $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$PR_SHA
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "PKG_SHA=${PR_SHA}" >> $GITHUB_OUTPUT
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
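The removed `test-kata-deploy` workflow derived the ref to check out from the `issue_comment` payload. It read `.issue.pull_request.url`, an API URL ending in `/pulls/<number>`, and rewrote it into the PR's merge ref with two `sed` expressions. The bash sketch below isolates that transformation; the function name and the example URL are illustrative.

```shell
# Same sed expressions as the removed get-PR-ref step: replace everything up
# to "/pulls" with "refs/pull", then append "/merge".
pr_merge_ref() {
    printf '%s\n' "$1" | sed 's#^.*\/pulls#refs\/pull#' | sed 's#$#\/merge#'
}

pr_merge_ref "https://api.github.com/repos/kata-containers/kata-containers/pulls/8292"
# prints refs/pull/8292/merge
```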

@@ -0,0 +1,36 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
kata-deploy-runtime-classes-check:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Ensure the split out runtime classes match the all-in-one file
run: |
pushd tools/packaging/kata-deploy/runtimeclasses/
echo "::group::Combine runtime classes"
for runtimeClass in `find . -type f \( -name "*.yaml" -and -not -name "kata-runtimeClasses.yaml" \) | sort`; do
echo "Adding ${runtimeClass} to the resultingRuntimeClasses.yaml"
cat ${runtimeClass} >> resultingRuntimeClasses.yaml;
done
echo "::endgroup::"
echo "::group::Displaying the content of resultingRuntimeClasses.yaml"
cat resultingRuntimeClasses.yaml
echo "::endgroup::"
echo ""
echo "::group::Displaying the content of kata-runtimeClasses.yaml"
cat kata-runtimeClasses.yaml
echo "::endgroup::"
echo ""
diff resultingRuntimeClasses.yaml kata-runtimeClasses.yaml
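The new `kata-deploy-runtime-classes-check` job concatenates the split-out runtime class files in sorted order. It then diffs the result against the all-in-one `kata-runtimeClasses.yaml`, and a non-empty diff fails the step.

The bash function below sketches the same check for running locally before pushing. The workflow step appends to `resultingRuntimeClasses.yaml` inside the directory. This version instead writes the concatenation to a temporary file outside it, so repeated runs leave no stale result file behind, and it quotes the file names. The function name is hypothetical.

```shell
# Local sketch of the kata-deploy runtime class consistency check.
# Usage: check_runtime_classes <directory with the split-out runtime classes>
check_runtime_classes() {
    local dir="$1" combined status
    combined="$(mktemp)" || return 1
    # Same selection and order as the workflow step: every *.yaml file except
    # the all-in-one kata-runtimeClasses.yaml, sorted by path.
    (
        cd "$dir" || exit 1
        find . -type f -name '*.yaml' ! -name 'kata-runtimeClasses.yaml' | sort |
            while IFS= read -r runtimeClass; do
                cat "$runtimeClass"
            done
    ) > "$combined"
    # diff exits non-zero when the files differ, which is what fails the job.
    diff "$combined" "$dir/kata-runtimeClasses.yaml"
    status=$?
    rm -f "$combined"
    return "$status"
}
```

Run it as `check_runtime_classes tools/packaging/kata-deploy/runtimeclasses` from the repository root.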

@@ -38,7 +38,17 @@ jobs:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Move issue to "In progress"
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

@@ -1,98 +0,0 @@
name: CI | Publish kata-deploy payload for amd64
on:
workflow_call:
inputs:
target-arch:
required: true
type: string
jobs:
build-asset:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- kernel-dragonball-experimental
- nydus
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
with:
fetch-depth: 0 # This is needed in order to keep the commit ids history
- name: Build ${{ matrix.asset }}
run: |
make "${KATA_ASSET}-tarball"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
PUSH_TO_REGISTRY: yes
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v3
with:
name: kata-artifacts-amd64
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
retention-days: 1
if-no-files-found: error
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v3
- name: get-artifacts
uses: actions/download-artifact@v3
with:
name: kata-artifacts-amd64
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v3
with:
name: kata-static-tarball-amd64
path: kata-static.tar.xz
retention-days: 1
if-no-files-found: error
kata-payload:
needs: create-kata-tarball
runs-on: ubuntu-latest
steps:
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v3
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy-ci" \
"kata-containers-${{ inputs.target-arch }}"

@@ -4,32 +4,76 @@ on:
branches:
- main
- stable-*
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build-assets-amd64:
uses: ./.github/workflows/payload-after-push-amd64.yaml
uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml
with:
target-arch: amd64
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
build-assets-arm64:
uses: ./.github/workflows/payload-after-push-arm64.yaml
uses: ./.github/workflows/build-kata-static-tarball-arm64.yaml
with:
target-arch: arm64
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
build-assets-s390x:
uses: ./.github/workflows/payload-after-push-s390x.yaml
uses: ./.github/workflows/build-kata-static-tarball-s390x.yaml
with:
target-arch: s390x
commit-hash: ${{ github.sha }}
push-to-registry: yes
target-branch: ${{ github.ref_name }}
secrets: inherit
publish:
publish-kata-deploy-payload-amd64:
needs: build-assets-amd64
uses: ./.github/workflows/publish-kata-deploy-payload-amd64.yaml
with:
commit-hash: ${{ github.sha }}
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-amd64
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-kata-deploy-payload-arm64:
needs: build-assets-arm64
uses: ./.github/workflows/publish-kata-deploy-payload-arm64.yaml
with:
commit-hash: ${{ github.sha }}
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-arm64
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-kata-deploy-payload-s390x:
needs: build-assets-s390x
uses: ./.github/workflows/publish-kata-deploy-payload-s390x.yaml
with:
commit-hash: ${{ github.sha }}
registry: quay.io
repo: kata-containers/kata-deploy-ci
tag: kata-containers-s390x
target-branch: ${{ github.ref_name }}
secrets: inherit
publish-manifest:
runs-on: ubuntu-latest
needs: [build-assets-amd64, build-assets-arm64, build-assets-s390x]
needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x]
steps:
- name: Checkout repository
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2

@@ -0,0 +1,66 @@
name: CI | Publish kata-deploy payload for amd64
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
- name: Login to Kata Containers quay.io
if: ${{ inputs.registry == 'quay.io' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Login to Kata Containers ghcr.io
if: ${{ inputs.registry == 'ghcr.io' }}
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz \
${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

@@ -0,0 +1,71 @@
name: CI | Publish kata-deploy payload for arm64
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
runs-on: arm64-builder
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-arm64${{ inputs.tarball-suffix }}
- name: Login to Kata Containers quay.io
if: ${{ inputs.registry == 'quay.io' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Login to Kata Containers ghcr.io
if: ${{ inputs.registry == 'ghcr.io' }}
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz \
${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

@@ -0,0 +1,70 @@
name: CI | Publish kata-deploy payload for s390x
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
kata-payload:
runs-on: s390x
steps:
- name: Adjust a permission for repo
run: |
sudo chown -R $USER:$USER $GITHUB_WORKSPACE
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}
- name: Login to Kata Containers quay.io
if: ${{ inputs.registry == 'quay.io' }}
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Login to Kata Containers ghcr.io
if: ${{ inputs.registry == 'ghcr.io' }}
uses: docker/login-action@v2
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: build-and-push-kata-payload
id: build-and-push-kata-payload
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz \
${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

.github/workflows/release-amd64.yaml
@@ -0,0 +1,53 @@
name: Publish Kata release artifacts for amd64
on:
workflow_call:
inputs:
target-arch:
required: true
type: string
jobs:
build-kata-static-tarball-amd64:
uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml
with:
stage: release
kata-deploy:
needs: build-kata-static-tarball-amd64
runs-on: ubuntu-latest
steps:
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64
- name: build-and-push-kata-deploy-ci-amd64
id: build-and-push-kata-deploy-ci-amd64
run: |
# This trick is needed because $GITHUB_REF has the format
# "refs/tags/<tag>"
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tags=($tag)
tags+=($([[ "$tag" =~ "alpha"|"rc" ]] && echo "latest" || echo "stable"))
for tag in ${tags[@]}; do
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
done
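The tag handling in the `release-<arch>.yaml` jobs above does three things. It strips `refs/tags/` from `$GITHUB_REF`. It adds a floating alias: `latest` for alpha/rc pre-releases and `stable` otherwise. It then suffixes every tag with the target architecture. The block below is a minimal standalone sketch of that logic and is not part of the workflows. The function names `tag_from_ref`, `alias_for_tag` and `per_arch_tags` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the tag handling in release-<arch>.yaml.
set -euo pipefail

# Strip the "refs/tags/" prefix; -f3- keeps any later '/' in the tag name.
tag_from_ref() {
  echo "$1" | cut -d/ -f3-
}

# Floating alias: "latest" for pre-releases whose tag contains "alpha" or
# "rc", "stable" for everything else (same test as the workflow's =~ check).
alias_for_tag() {
  if [[ "$1" =~ alpha|rc ]]; then
    echo "latest"
  else
    echo "stable"
  fi
}

# Print every per-arch image tag the publishing loop pushes for one ref.
per_arch_tags() {
  local ref="$1" arch="$2" tag t
  tag=$(tag_from_ref "$ref")
  for t in "$tag" "$(alias_for_tag "$tag")"; do
    echo "${t}-${arch}"
  done
}

per_arch_tags "refs/tags/3.3.0-alpha0" amd64
```

Because the check is a substring match, any tag that contains "rc" anywhere is treated as a pre-release and published as `latest`.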

.github/workflows/release-arm64.yaml
@@ -0,0 +1,53 @@
name: Publish Kata release artifacts for arm64
on:
workflow_call:
inputs:
target-arch:
required: true
type: string
jobs:
build-kata-static-tarball-arm64:
uses: ./.github/workflows/build-kata-static-tarball-arm64.yaml
with:
stage: release
kata-deploy:
needs: build-kata-static-tarball-arm64
runs-on: arm64-builder
steps:
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-arm64
- name: build-and-push-kata-deploy-ci-arm64
id: build-and-push-kata-deploy-ci-arm64
run: |
# This trick is needed because $GITHUB_REF has the format
# "refs/tags/<tag>"
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tags=($tag)
tags+=($([[ "$tag" =~ "alpha"|"rc" ]] && echo "latest" || echo "stable"))
for tag in ${tags[@]}; do
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
done

.github/workflows/release-s390x.yaml
@@ -0,0 +1,53 @@
name: Publish Kata release artifacts for s390x
on:
workflow_call:
inputs:
target-arch:
required: true
type: string
jobs:
build-kata-static-tarball-s390x:
uses: ./.github/workflows/build-kata-static-tarball-s390x.yaml
with:
stage: release
kata-deploy:
needs: build-kata-static-tarball-s390x
runs-on: s390x
steps:
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@v4
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-s390x
- name: build-and-push-kata-deploy-ci-s390x
id: build-and-push-kata-deploy-ci-s390x
run: |
# This trick is needed because $GITHUB_REF has the format
# "refs/tags/<tag>"
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tags=($tag)
tags+=($([[ "$tag" =~ "alpha"|"rc" ]] && echo "latest" || echo "stable"))
for tag in ${tags[@]}; do
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \
$(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \
"${tag}-${{ inputs.target-arch }}"
done

@@ -4,153 +4,153 @@ on:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
build-asset:
runs-on: ubuntu-latest
strategy:
matrix:
asset:
- cloud-hypervisor
- firecracker
- kernel
- kernel-dragonball-experimental
- nydus
- qemu
- rootfs-image
- rootfs-initrd
- shim-v2
- virtiofsd
steps:
- uses: actions/checkout@v2
- name: Build ${{ matrix.asset }}
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-copy-yq-installer.sh
./tools/packaging/kata-deploy/local-build/kata-deploy-binaries-in-docker.sh --build="${KATA_ASSET}"
build_dir=$(readlink -f build)
# store-artifact does not work with symlink
sudo cp -r "${build_dir}" "kata-build"
env:
KATA_ASSET: ${{ matrix.asset }}
TAR_OUTPUT: ${{ matrix.asset }}.tar.gz
build-and-push-assets-amd64:
uses: ./.github/workflows/release-amd64.yaml
with:
target-arch: amd64
secrets: inherit
- name: store-artifact ${{ matrix.asset }}
uses: actions/upload-artifact@v2
with:
name: kata-artifacts
path: kata-build/kata-static-${{ matrix.asset }}.tar.xz
if-no-files-found: error
build-and-push-assets-arm64:
uses: ./.github/workflows/release-arm64.yaml
with:
target-arch: arm64
secrets: inherit
create-kata-tarball:
runs-on: ubuntu-latest
needs: build-asset
steps:
- uses: actions/checkout@v2
- name: get-artifacts
uses: actions/download-artifact@v2
with:
name: kata-artifacts
path: kata-artifacts
- name: merge-artifacts
run: |
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts
- name: store-artifacts
uses: actions/upload-artifact@v2
with:
name: kata-static-tarball
path: kata-static.tar.xz
build-and-push-assets-s390x:
uses: ./.github/workflows/release-s390x.yaml
with:
target-arch: s390x
secrets: inherit
kata-deploy:
needs: create-kata-tarball
publish-multi-arch-images:
runs-on: ubuntu-latest
needs: [build-and-push-assets-amd64, build-and-push-assets-arm64, build-and-push-assets-s390x]
steps:
- uses: actions/checkout@v2
- name: get-kata-tarball
uses: actions/download-artifact@v2
- name: Checkout repository
uses: actions/checkout@v4
- name: Login to Kata Containers docker.io
uses: docker/login-action@v2
with:
name: kata-static-tarball
- name: build-and-push-kata-deploy-ci
id: build-and-push-kata-deploy-ci
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
pushd $GITHUB_WORKSPACE
git checkout $tag
pkg_sha=$(git rev-parse HEAD)
popd
mv kata-static.tar.xz $GITHUB_WORKSPACE/tools/packaging/kata-deploy/kata-static.tar.xz
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz -t katadocker/kata-deploy-ci:$pkg_sha -t quay.io/kata-containers/kata-deploy-ci:$pkg_sha $GITHUB_WORKSPACE/tools/packaging/kata-deploy
docker login -u ${{ secrets.DOCKER_USERNAME }} -p ${{ secrets.DOCKER_PASSWORD }}
docker push katadocker/kata-deploy-ci:$pkg_sha
docker login -u ${{ secrets.QUAY_DEPLOYER_USERNAME }} -p ${{ secrets.QUAY_DEPLOYER_PASSWORD }} quay.io
docker push quay.io/kata-containers/kata-deploy-ci:$pkg_sha
mkdir -p packaging/kata-deploy
ln -s $GITHUB_WORKSPACE/tools/packaging/kata-deploy/action packaging/kata-deploy/action
echo "PKG_SHA=${pkg_sha}" >> $GITHUB_OUTPUT
- name: test-kata-deploy-ci-in-aks
uses: ./packaging/kata-deploy/action
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Login to Kata Containers quay.io
uses: docker/login-action@v2
with:
packaging-sha: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
env:
PKG_SHA: ${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}}
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: push-tarball
registry: quay.io
username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- name: Push multi-arch manifest
run: |
# tag the container image we created and push to DockerHub
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tags=($tag)
tags+=($([[ "$tag" =~ "alpha"|"rc" ]] && echo "latest" || echo "stable"))
for tag in ${tags[@]}; do \
docker tag katadocker/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} katadocker/kata-deploy:${tag} && \
docker tag quay.io/kata-containers/kata-deploy-ci:${{steps.build-and-push-kata-deploy-ci.outputs.PKG_SHA}} quay.io/kata-containers/kata-deploy:${tag} && \
docker push katadocker/kata-deploy:${tag} && \
docker push quay.io/kata-containers/kata-deploy:${tag}; \
# push to quay.io and docker.io
for tag in ${tags[@]}; do
docker manifest create quay.io/kata-containers/kata-deploy:${tag} \
--amend quay.io/kata-containers/kata-deploy:${tag}-amd64 \
--amend quay.io/kata-containers/kata-deploy:${tag}-arm64 \
--amend quay.io/kata-containers/kata-deploy:${tag}-s390x
docker manifest create docker.io/katadocker/kata-deploy:${tag} \
--amend docker.io/katadocker/kata-deploy:${tag}-amd64 \
--amend docker.io/katadocker/kata-deploy:${tag}-arm64 \
--amend docker.io/katadocker/kata-deploy:${tag}-s390x
docker manifest push quay.io/kata-containers/kata-deploy:${tag}
docker manifest push docker.io/katadocker/kata-deploy:${tag}
done
upload-static-tarball:
needs: kata-deploy
upload-multi-arch-static-tarball:
needs: publish-multi-arch-images
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- name: download-artifacts
uses: actions/download-artifact@v2
- uses: actions/checkout@v4
- name: download-artifacts-amd64
uses: actions/download-artifact@v3
with:
name: kata-static-tarball
- name: install hub
run: |
HUB_VER=$(curl -s "https://api.github.com/repos/github/hub/releases/latest" | jq -r .tag_name | sed 's/^v//')
wget -q -O- https://github.com/github/hub/releases/download/v$HUB_VER/hub-linux-amd64-$HUB_VER.tgz | \
tar xz --strip-components=2 --wildcards '*/bin/hub' && sudo mv hub /usr/local/bin/hub
- name: push static tarball to github
name: kata-static-tarball-amd64
- name: push amd64 static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-x86_64.tar.xz"
tarball="kata-static-$tag-amd64.tar.xz"
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
- name: download-artifacts-arm64
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-arm64
- name: push arm64 static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-arm64.tar.xz"
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
- name: download-artifacts-s390x
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-s390x
- name: push s390x static tarball to github
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-static-$tag-s390x.tar.xz"
mv kata-static.tar.xz "$GITHUB_WORKSPACE/${tarball}"
pushd $GITHUB_WORKSPACE
echo "uploading asset '${tarball}' for tag: ${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
upload-versions-yaml:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: upload versions.yaml
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
pushd $GITHUB_WORKSPACE
versions_file="kata-containers-$tag-versions.yaml"
cp versions.yaml ${versions_file}
gh release upload "${tag}" "${versions_file}"
popd
upload-cargo-vendored-tarball:
needs: upload-static-tarball
needs: upload-multi-arch-static-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: generate-and-upload-tarball
run: |
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-containers-$tag-vendor.tar.gz"
pushd $GITHUB_WORKSPACE
bash -c "tools/packaging/release/generate_vendor.sh ${tarball}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} gh release upload "${tag}" "${tarball}"
popd
upload-libseccomp-tarball:
needs: upload-cargo-vendored-tarball
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: download-and-upload-tarball
env:
GITHUB_TOKEN: ${{ secrets.GIT_UPLOAD_TOKEN }}
@@ -170,6 +170,6 @@ jobs:
# "-m" option should be empty to re-use the existing release title
# without opening a text editor.
# For the details, check https://hub.github.com/hub-release.1.html.
hub release edit -m "" -a "${tarball}" "${tag}"
hub release edit -m "" -a "${asc}" "${tag}"
gh release upload "${tag}" "${tarball}"
gh release upload "${tag}" "${asc}"
popd
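After the per-arch images are pushed, `publish-multi-arch-images` stitches them into one manifest per tag on quay.io and docker.io. The upload jobs then attach one static tarball per architecture with `gh release upload`. The dry-run sketch below prints the manifest commands instead of running them, and reproduces the asset names. It follows the new `amd64` naming rather than the older `x86_64`. The registries, repositories and arch list are copied from the workflow. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run of the multi-arch manifest and tarball naming in
# release.yaml; commands are printed, never executed.
set -euo pipefail

ARCHES=(amd64 arm64 s390x)
IMAGES=(quay.io/kata-containers/kata-deploy docker.io/katadocker/kata-deploy)

# Print one "manifest create" per image (amending every per-arch tag),
# followed by the matching "manifest push" commands.
manifest_commands() {
  local tag="$1" image arch cmd
  for image in "${IMAGES[@]}"; do
    cmd="docker manifest create ${image}:${tag}"
    for arch in "${ARCHES[@]}"; do
      cmd+=" --amend ${image}:${tag}-${arch}"
    done
    echo "$cmd"
  done
  for image in "${IMAGES[@]}"; do
    echo "docker manifest push ${image}:${tag}"
  done
}

# Release asset name used for each architecture's static tarball.
tarball_name() {
  echo "kata-static-$1-$2.tar.xz"
}

manifest_commands 3.2.0
for arch in "${ARCHES[@]}"; do
  tarball_name 3.2.0 "$arch"
done
```

A manifest can only reference images that already exist. This is why the workflow makes the manifest job wait on all three `build-and-push-assets-*` jobs through `needs:`.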

@@ -15,6 +15,10 @@ on:
branches:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-pr-porting-labels:
runs-on: ubuntu-latest
@@ -32,7 +36,17 @@ jobs:
- name: Checkout code to allow hub to communicate with the project
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
uses: actions/checkout@v4
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}
- name: Install porting checker script
run: |

@@ -0,0 +1,56 @@
name: CI | Run docker integration tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-docker-tests:
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- clh
- qemu
runs-on: garm-ubuntu-2304-smaller
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run

@@ -0,0 +1,98 @@
name: CI | Run kubernetes tests on AKS
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
host_os:
- ubuntu
vmm:
- clh
- dragonball
- qemu
instance-type:
- small
- normal
include:
- host_os: cbl-mariner
vmm: clh
runs-on: ubuntu-latest
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HOST_OS: ${{ matrix.host_os }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Download Azure CLI
run: bash tests/integration/kubernetes/gha-run.sh install-azure-cli
- name: Log into the Azure account
run: bash tests/integration/kubernetes/gha-run.sh login-azure
env:
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: Create AKS cluster
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh create-cluster
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Install `kubectl`
run: bash tests/integration/kubernetes/gha-run.sh install-kubectl
- name: Download credentials for the Kubernetes CLI to use them
run: bash tests/integration/kubernetes/gha-run.sh get-cluster-credentials
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks
- name: Run tests
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete AKS cluster
if: always()
run: bash tests/integration/kubernetes/gha-run.sh delete-cluster

@@ -0,0 +1,88 @@
name: CI | Run kubernetes tests on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- clh #cloud-hypervisor
- fc #firecracker
- qemu
snapshotter:
- devmapper
k8s:
- k3s
instance:
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
include:
- instance: garm-ubuntu-2004
instance-type: normal
- instance: garm-ubuntu-2004-smaller
instance-type: small
runs-on: ${{ matrix.instance }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy ${{ matrix.k8s }}
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Configure the ${{ matrix.snapshotter }} snapshotter
run: bash tests/integration/kubernetes/gha-run.sh configure-snapshotter
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

@@ -0,0 +1,86 @@
name: CI | Run kubernetes tests, using CRI-O, on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-k8s-tests:
strategy:
fail-fast: false
matrix:
vmm:
- qemu
k8s:
- k0s
instance:
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
include:
- instance: garm-ubuntu-2004
instance-type: normal
- instance: garm-ubuntu-2004-smaller
instance-type: small
- k8s: k0s
k8s-extra-params: '--cri-socket remote:unix:///var/run/crio/crio.sock --kubelet-extra-args --cgroup-driver="systemd"'
runs-on: ${{ matrix.instance }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
KUBERNETES_EXTRA_PARAMS: ${{ matrix.k8s-extra-params }}
USING_NFD: "false"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Configure CRI-O
run: bash tests/integration/kubernetes/gha-run.sh setup-crio
- name: Deploy ${{ matrix.k8s }}
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

@@ -0,0 +1,176 @@
name: CI | Run kata coco tests
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests-on-tdx:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-tdx
runs-on: tdx
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "k3s"
USING_NFD: "true"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
run-k8s-tests-on-tdx:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-tdx
runs-on: tdx
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "k3s"
USING_NFD: "true"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-tdx
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-tdx
run-k8s-tests-on-sev:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-sev
runs-on: sev
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-sev
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-sev
run-k8s-tests-sev-snp:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-snp
runs-on: sev-snp
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBECONFIG: /home/kata/.kube/config
KUBERNETES: "vanilla"
USING_NFD: "false"
K8S_TEST_HOST_TYPE: "baremetal"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy Kata
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-snp
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-snp

@@ -0,0 +1,89 @@
name: CI | Run kata-deploy tests on AKS
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests:
strategy:
fail-fast: false
matrix:
host_os:
- ubuntu
vmm:
- clh
- dragonball
- qemu
include:
- host_os: cbl-mariner
vmm: clh
runs-on: ubuntu-latest
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HOST_OS: ${{ matrix.host_os }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
USING_NFD: "false"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Download Azure CLI
run: bash tests/functional/kata-deploy/gha-run.sh install-azure-cli
- name: Log into the Azure account
run: bash tests/functional/kata-deploy/gha-run.sh login-azure
env:
AZ_APPID: ${{ secrets.AZ_APPID }}
AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
- name: Create AKS cluster
timeout-minutes: 10
run: bash tests/functional/kata-deploy/gha-run.sh create-cluster
- name: Install `bats`
run: bash tests/functional/kata-deploy/gha-run.sh install-bats
- name: Install `kubectl`
run: bash tests/functional/kata-deploy/gha-run.sh install-kubectl
- name: Download credentials for the Kubernetes CLI to use them
run: bash tests/functional/kata-deploy/gha-run.sh get-cluster-credentials
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Delete AKS cluster
if: always()
run: bash tests/functional/kata-deploy/gha-run.sh delete-cluster

@@ -0,0 +1,65 @@
name: CI | Run kata-deploy tests on GARM
on:
workflow_call:
inputs:
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-kata-deploy-tests:
strategy:
fail-fast: false
matrix:
vmm:
- clh
- qemu
k8s:
- k0s
- k3s
- rke2
runs-on: garm-ubuntu-2004-smaller
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
USING_NFD: "false"
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Deploy ${{ matrix.k8s }}
run: bash tests/functional/kata-deploy/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/functional/kata-deploy/gha-run.sh install-bats
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests

@@ -0,0 +1,59 @@
name: CI | Run kata-monitor tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-monitor:
strategy:
fail-fast: false
matrix:
vmm:
- qemu
container_engine:
- crio
- containerd
include:
- container_engine: containerd
containerd_version: lts
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINER_ENGINE: ${{ matrix.container_engine }}
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/functional/kata-monitor/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/functional/kata-monitor/gha-run.sh install-kata kata-artifacts
- name: Run kata-monitor tests
run: bash tests/functional/kata-monitor/gha-run.sh run

.github/workflows/run-metrics.yaml

@@ -0,0 +1,94 @@
name: CI | Run test metrics
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
setup-kata:
name: Kata Setup
runs-on: metrics
env:
GOPATH: ${{ github.workspace }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/metrics/gha-run.sh install-kata kata-artifacts
run-metrics:
needs: setup-kata
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail
# all the tests due to a single flaky instance.
fail-fast: false
matrix:
vmm: ['clh', 'qemu']
max-parallel: 1
runs-on: metrics
env:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- name: enabling the hypervisor
run: bash tests/metrics/gha-run.sh enabling-hypervisor
- name: run launch times test
run: bash tests/metrics/gha-run.sh run-test-launchtimes
- name: run memory foot print test
run: bash tests/metrics/gha-run.sh run-test-memory-usage
- name: run memory usage inside container test
run: bash tests/metrics/gha-run.sh run-test-memory-usage-inside-container
- name: run blogbench test
run: bash tests/metrics/gha-run.sh run-test-blogbench
- name: run tensorflow test
run: bash tests/metrics/gha-run.sh run-test-tensorflow
- name: run fio test
run: bash tests/metrics/gha-run.sh run-test-fio
- name: run iperf test
run: bash tests/metrics/gha-run.sh run-test-iperf
- name: run latency test
run: bash tests/metrics/gha-run.sh run-test-latency
- name: make metrics tarball ${{ matrix.vmm }}
run: bash tests/metrics/gha-run.sh make-tarball-results
- name: archive metrics results ${{ matrix.vmm }}
uses: actions/upload-artifact@v3
with:
name: metrics-artifacts-${{ matrix.vmm }}
path: results-${{ matrix.vmm }}.tar.gz
retention-days: 1
if-no-files-found: error

@@ -0,0 +1,57 @@
name: CI | Run nerdctl integration tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-nerdctl-tests:
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- clh
- dragonball
- qemu
runs-on: garm-ubuntu-2304-smaller
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/nerdctl/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/nerdctl/gha-run.sh install-kata kata-artifacts
- name: Run nerdctl smoke test
timeout-minutes: 5
run: bash tests/integration/nerdctl/gha-run.sh run

.github/workflows/run-runk-tests.yaml

@@ -0,0 +1,46 @@
name: CI | Run runk tests
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
jobs:
run-runk:
runs-on: garm-ubuntu-2204-smaller
env:
CONTAINERD_VERSION: lts
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/runk/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@v3
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts
- name: Run tracing tests
run: bash tests/integration/runk/gha-run.sh run

@@ -1,52 +0,0 @@
name: Release Kata in snapcraft store
on:
push:
tags:
- '[0-9]+.[0-9]+.[0-9]+*'
env:
SNAPCRAFT_STORE_CREDENTIALS: ${{ secrets.snapcraft_token }}
jobs:
release-snap:
runs-on: ubuntu-20.04
steps:
- name: Check out Git repository
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Install Snapcraft
run: |
# Required to avoid snapcraft install failure
sudo chown root:root /
# "--classic" is needed for the GitHub action runner
# environment.
sudo snap install snapcraft --classic
# Allow other parts to access snap binaries
echo /snap/bin >> "$GITHUB_PATH"
- name: Build snap
run: |
# Removing man-db, workflow kept failing, fixes: #4480
sudo apt -y remove --purge man-db
sudo apt-get install -y git git-extras
kata_url="https://github.com/kata-containers/kata-containers"
latest_version=$(git ls-remote --tags ${kata_url} | egrep -o "refs.*" | egrep -v "\-alpha|\-rc|{}" | egrep -o "[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+" | sort -V -r | head -1)
current_version="$(echo ${GITHUB_REF} | cut -d/ -f3)"
# Check semantic versioning format (x.y.z) and if the current tag is the latest tag
if echo "${current_version}" | grep -q "^[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" && echo -e "$latest_version\n$current_version" | sort -C -V; then
# Current version is the latest version, build it
snapcraft snap --debug --destructive-mode
fi
- name: Upload snap
run: |
snap_version="$(echo ${GITHUB_REF} | cut -d/ -f3)"
snap_file="kata-containers_${snap_version}_amd64.snap"
# Upload the snap if it exists
if [ -f ${snap_file} ]; then
snapcraft upload --release=stable ${snap_file}
fi

@@ -1,37 +0,0 @@
name: snap CI
on:
pull_request:
types:
- opened
- synchronize
- reopened
- edited
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
jobs:
test:
runs-on: ubuntu-20.04
steps:
- name: Check out
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
uses: actions/checkout@v2
with:
fetch-depth: 0
- name: Install Snapcraft
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
# Required to avoid snapcraft install failure
sudo chown root:root /
# "--classic" is needed for the GitHub action runner
# environment.
sudo snap install snapcraft --classic
# Allow other parts to access snap binaries
echo /snap/bin >> "$GITHUB_PATH"
- name: Build snap
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
snapcraft snap --debug --destructive-mode

@@ -1,33 +0,0 @@
on:
pull_request:
types:
- opened
- edited
- reopened
- synchronize
paths-ignore: [ '**.md', '**.png', '**.jpg', '**.jpeg', '**.svg', '/docs/**' ]
name: Static checks dragonball
jobs:
test-dragonball:
runs-on: self-hosted
env:
RUST_BACKTRACE: "1"
steps:
- uses: actions/checkout@v3
- name: Set env
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV
- name: Install Rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
- name: Run Unit Test
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd src/dragonball
cargo version
rustc --version
sudo -E env PATH=$PATH LIBC=gnu SUPPORT_VIRTUALIZATION=true make test

@@ -6,76 +6,189 @@ on:
- reopened
- synchronize
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
name: Static checks
jobs:
static-checks:
check-kernel-config-version:
runs-on: ubuntu-latest
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Ensure the kernel config version has been updated
run: |
kernel_dir="tools/packaging/kernel/"
kernel_version_file="${kernel_dir}kata_config_version"
modified_files=$(git diff --name-only origin/$GITHUB_BASE_REF..HEAD)
if git diff --name-only origin/$GITHUB_BASE_REF..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then
echo "Kernel directory has changed, checking if $kernel_version_file has been updated"
if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then
echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)
else
echo "Readme file changed, no need for kernel config version update."
fi
echo "Check passed"
fi
build-checks:
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
cmd:
component:
- agent
- dragonball
- runtime
- runtime-rs
- agent-ctl
- kata-ctl
- log-parser-rs
- runk
- trace-forwarder
command:
- "make vendor"
- "make static-checks"
- "make check"
- "make test"
- "sudo -E PATH=\"$PATH\" make test"
include:
- component: agent
component-path: src/agent
- component: dragonball
component-path: src/dragonball
- component: runtime
component-path: src/runtime
- component: runtime-rs
component-path: src/runtime-rs
- component: agent-ctl
component-path: src/tools/agent-ctl
- component: kata-ctl
component-path: src/tools/kata-ctl
- component: log-parser-rs
component-path: src/tools/log-parser-rs
- component: runk
component-path: src/tools/runk
- component: trace-forwarder
component-path: src/tools/trace-forwarder
- install-libseccomp: no
- component: agent
install-libseccomp: yes
- component: runk
install-libseccomp: yes
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
if: ${{ matrix.component == 'runtime' }}
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> $GITHUB_PATH
- name: Install rust
if: ${{ matrix.component != 'runtime' }}
run: |
./tests/install_rust.sh
echo "${HOME}/.cargo/bin" >> $GITHUB_PATH
- name: Install musl-tools
if: ${{ matrix.component != 'runtime' }}
run: sudo apt-get -y install musl-tools
- name: Install libseccomp
if: ${{ matrix.command != 'make vendor' && matrix.command != 'make check' && matrix.install-libseccomp == 'yes' }}
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Setup XDG_RUNTIME_DIR for the `runtime` tests
if: ${{ matrix.command != 'make vendor' && matrix.command != 'make check' && matrix.component == 'runtime' }}
run: |
XDG_RUNTIME_DIR=$(mktemp -d /tmp/kata-tests-$USER.XXX | tee >(xargs chmod 0700))
echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}" >> $GITHUB_ENV
- name: Running `${{ matrix.command }}` for ${{ matrix.component }}
run: |
cd ${{ matrix.component-path }}
${{ matrix.command }}
env:
RUST_BACKTRACE: "1"
build-checks-depending-on-kvm:
runs-on: garm-ubuntu-2004-smaller
strategy:
fail-fast: false
matrix:
component:
- runtime-rs
include:
- component: runtime-rs
command: "sudo -E env PATH=$PATH LIBC=gnu SUPPORT_VIRTUALIZATION=true make test"
- component: runtime-rs
component-path: src/dragonball
steps:
- name: Checkout the code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install system deps
run: |
sudo apt-get install -y build-essential musl-tools
- name: Install yq
run: |
sudo -E ./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install rust
run: |
export PATH="$PATH:/usr/local/bin"
./tests/install_rust.sh
- name: Running `${{ matrix.command }}` for ${{ matrix.component }}
run: |
export PATH="$PATH:${HOME}/.cargo/bin"
cd ${{ matrix.component-path }}
${{ matrix.command }}
env:
RUST_BACKTRACE: "1"
static-checks:
runs-on: ubuntu-20.04
strategy:
fail-fast: false
matrix:
cmd:
- "make static-checks"
env:
RUST_BACKTRACE: "1"
target_branch: ${{ github.base_ref }}
GOPATH: ${{ github.workspace }}
steps:
- name: Free disk space
run: |
sudo rm -rf /usr/share/dotnet
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Install Go
uses: actions/setup-go@v3
with:
go-version: 1.19.3
- name: Check kernel config version
run: |
cd "${{ github.workspace }}/src/github.com/${{ github.repository }}"
kernel_dir="tools/packaging/kernel/"
kernel_version_file="${kernel_dir}kata_config_version"
modified_files=$(git diff --name-only origin/main..HEAD)
if git diff --name-only origin/main..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then
echo "Kernel directory has changed, checking if $kernel_version_file has been updated"
if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then
echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)
else
echo "Readme file changed, no need for kernel config version update."
fi
echo "Check passed"
fi
- name: Set PATH
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
echo "${{ github.workspace }}/bin" >> $GITHUB_PATH
- name: Setup
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh
- name: Installing rust
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_rust.sh
PATH=$PATH:"$HOME/.cargo/bin"
rustup target add x86_64-unknown-linux-musl
rustup component add rustfmt clippy
- name: Setup seccomp
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)
gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)
cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"
echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"
echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV
echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV
- name: Run check
if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./src/github.com/${{ github.repository }}
- name: Install yq
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }}
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
run: |
cd ${GOPATH}/src/github.com/${{ github.repository }}
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> $GITHUB_PATH
- name: Install system dependencies
run: |
sudo apt-get -y install moreutils hunspell pandoc
- name: Run check
run: |
export PATH=${PATH}:${GOPATH}/bin
cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}
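The `check-kernel-config-version` job boils down to a predicate over the list of modified files: any change under `tools/packaging/kernel/` other than `README.md` must come with a bump of `kata_config_version`. A minimal sketch, assuming the paths arrive on stdin (one per line) instead of from `git diff`; the function name `kernel_config_version_bumped` is hypothetical and not part of the repository:

```shell
#!/usr/bin/env bash
# Hypothetical predicate mirroring the "Ensure the kernel config version has
# been updated" step. Reads modified file paths (one per line) on stdin and
# succeeds unless a relevant kernel change lacks a version bump.
kernel_config_version_bumped() {
	local kernel_dir="tools/packaging/kernel/"
	local version_file="${kernel_dir}kata_config_version"
	local modified_files
	modified_files="$(cat)"

	# Ignore README.md edits; only other files under the kernel directory count.
	if ! grep -v "README.md" <<<"${modified_files}" | grep -q "^${kernel_dir}"; then
		return 0
	fi

	# A relevant kernel change must be accompanied by a kata_config_version bump.
	grep -qxF "${version_file}" <<<"${modified_files}"
}
```

In the workflow the input would be the output of `git diff --name-only origin/$GITHUB_BASE_REF..HEAD`. Unlike the workflow's `grep "$kernel_version_file"`, the sketch matches the version file path exactly (`-x -F`) rather than as a substring.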

.gitignore

@@ -6,6 +6,8 @@
**/.vscode
**/.idea
**/.fleet
**/*.swp
**/*.swo
pkg/logging/Cargo.lock
src/agent/src/version.rs
src/agent/kata-agent.service

@@ -18,11 +18,16 @@ TOOLS =
TOOLS += agent-ctl
TOOLS += kata-ctl
TOOLS += log-parser
TOOLS += log-parser-rs
TOOLS += runk
TOOLS += trace-forwarder
STANDARD_TARGETS = build check clean install static-checks-build test vendor
# Variables for the build-and-publish-kata-debug target
KATA_DEBUG_REGISTRY ?= ""
KATA_DEBUG_TAG ?= ""
default: all
include utils.mk
@@ -43,6 +48,9 @@ static-checks: static-checks-build
docs-url-alive-check:
bash ci/docs-url-alive-check.sh
build-and-publish-kata-debug:
bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}
.PHONY: \
all \
kata-tarball \

@@ -1,4 +1,6 @@
<img src="https://www.openstack.org/assets/kata/kata-vertical-on-white.png" width="150">
<img src="https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-images-prod/openstack-logo/kata/SVG/kata-1.svg" width="900">
[![CI | Publish Kata Containers payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml/badge.svg)](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml) [![Kata Containers Nightly CI](https://github.com/kata-containers/kata-containers/actions/workflows/ci-nightly.yaml/badge.svg)](https://github.com/kata-containers/kata-containers/actions/workflows/ci-nightly.yaml)
# Kata Containers
@@ -132,8 +134,10 @@ The table below lists the remaining parts of the project:
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [kata-debug](tools/packaging/kata-debug/README.md) | infrastructure | Utility tool to gather Kata Containers debug information from Kubernetes clusters. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`kata-ctl`](src/tools/kata-ctl) | utility | Tool that provides advanced commands and debug facilities. |
| [`log-parser-rs`](src/tools/log-parser-rs) | utility | Tool that aids in analyzing logs from the Kata runtime. |
| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
@@ -143,8 +147,10 @@ The table below lists the remaining parts of the project:
Kata Containers is now
[available natively for most distributions](docs/install/README.md#packaged-installation-methods).
However, packaging scripts and metadata are still used to generate [snap](snap/local) and GitHub releases. See
the [components](#components) section for further details.
## Metrics tests
See the [metrics documentation](tests/metrics/README.md).
## Glossary of Terms

@@ -1 +1 @@
3.2.0-alpha0
3.3.0-alpha0

@@ -7,12 +7,10 @@
set -o errexit
cidir=$(dirname "$0")
source "${cidir}/lib.sh"
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
script_name="$(basename "${BASH_SOURCE[0]}")"
clone_tests_repo
source "${tests_repo_dir}/.ci/lib.sh"
source "${script_dir}/../tests/common.bash"
# The following variables if set on the environment will change the behavior
# of gperf and libseccomp configure scripts, that may lead this script to
@@ -25,11 +23,11 @@ workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"
# Variables for libseccomp
libseccomp_version="${LIBSECCOMP_VERSION:-""}"
if [ -z "${libseccomp_version}" ]; then
libseccomp_version=$(get_version "externals.libseccomp.version")
libseccomp_version=$(get_from_kata_deps "externals.libseccomp.version")
fi
libseccomp_url="${LIBSECCOMP_URL:-""}"
if [ -z "${libseccomp_url}" ]; then
libseccomp_url=$(get_version "externals.libseccomp.url")
libseccomp_url=$(get_from_kata_deps "externals.libseccomp.url")
fi
libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"
libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"
@@ -38,11 +36,11 @@ cflags="-O2"
# Variables for gperf
gperf_version="${GPERF_VERSION:-""}"
if [ -z "${gperf_version}" ]; then
gperf_version=$(get_version "externals.gperf.version")
gperf_version=$(get_from_kata_deps "externals.gperf.version")
fi
gperf_url="${GPERF_URL:-""}"
if [ -z "${gperf_url}" ]; then
gperf_url=$(get_version "externals.gperf.url")
gperf_url=$(get_from_kata_deps "externals.gperf.url")
fi
gperf_tarball="gperf-${gperf_version}.tar.gz"
gperf_tarball_url="${gperf_url}/${gperf_tarball}"
@@ -87,7 +85,8 @@ build_and_install_libseccomp() {
curl -sLO "${libseccomp_tarball_url}"
tar -xf "${libseccomp_tarball}"
pushd "libseccomp-${libseccomp_version}"
./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
[ "${arch}" == $(uname -m) ] && cc_name="" || cc_name="${arch}-linux-gnu-gcc"
CC=${cc_name} ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
make
make install
popd
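The new `CC` handling in `build_and_install_libseccomp()` switches to a cross compiler only when the target `arch` differs from the host. As a minimal sketch, the same decision can be factored into a helper; `select_cc` and its optional second argument (used here to fake the host architecture) are hypothetical, and the `<arch>-linux-gnu-gcc` naming is the assumption the script itself makes:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the cc_name logic in install_libseccomp.sh.
# $1: target architecture (e.g. aarch64); $2: host architecture, defaults to uname -m.
select_cc() {
	local arch="$1"
	local host_arch="${2:-$(uname -m)}"
	if [ "${arch}" == "${host_arch}" ]; then
		# Native build: an empty CC lets configure pick its default compiler.
		echo ""
	else
		# Cross build: assume a GNU cross toolchain named <arch>-linux-gnu-gcc.
		echo "${arch}-linux-gnu-gcc"
	fi
}

# Example use, in the style of the script:
# CC=$(select_cc "${arch}") ./configure --prefix="${libseccomp_install_dir}" --host="${arch}"
```

An explicit `if` is used instead of the `[ ... ] && a || b` one-liner so both branches are easy to read and test.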

@@ -587,10 +587,15 @@ $ sudo kata-monitor
#### Connect to debug console
The `kata-runtime exec` command is used to connect to the debug console.
First, start a container, for example:
```bash
$ sudo ctr run --runtime io.containerd.kata.v2 -d docker.io/library/ubuntu:latest testdebug
```
Then, you can use the command `kata-runtime exec <sandbox id>` to connect to the debug console.
```
$ kata-runtime exec 1a9ab65be63b8b03dfd0c75036d27f0ed09eab38abb45337fea83acd3cd7bacd
$ kata-runtime exec testdebug
bash-4.2# id
uid=0(root) gid=0(root) groups=0(root)
bash-4.2# pwd

@@ -147,7 +147,8 @@ these commands is potentially challenging.
See issue https://github.com/clearcontainers/runtime/issues/341 and [the constraints challenge](#the-constraints-challenge) for more information.
For CPUs resource management see
[CPU constraints](design/vcpu-handling.md).
[CPU constraints(in runtime-go)](design/vcpu-handling-runtime-go.md).
[CPU constraints(in runtime-rs)](design/vcpu-handling-runtime-rs.md).
# Architectural limitations

@@ -6,16 +6,19 @@ Kata Containers design documents:
- [API Design of Kata Containers](kata-api-design.md)
- [Design requirements for Kata Containers](kata-design-requirements.md)
- [VSocks](VSocks.md)
- [VCPU handling](vcpu-handling.md)
- [VCPU handling(in runtime-go)](vcpu-handling-runtime-go.md)
- [VCPU handling(in runtime-rs)](vcpu-handling-runtime-rs.md)
- [VCPU threads pinning](vcpu-threads-pinning.md)
- [Host cgroups](host-cgroups.md)
- [Agent systemd cgroup](agent-systemd-cgroup.md)
- [`Inotify` support](inotify.md)
- [`Hooks` support](hooks-handling.md)
- [Metrics(Kata 2.0)](kata-2-0-metrics.md)
- [Metrics in Rust Runtime(runtime-rs)](kata-metrics-in-runtime-rs.md)
- [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
- [Design for direct-assigned volume](direct-blk-device-assignment.md)
- [Design for core-scheduling](core-scheduling.md)
- [Virtualization Reference Architecture](kata-vra.md)
---
- [Design proposals](proposals)

@@ -78,4 +78,4 @@ with the containers is if the VM itself or the `containerd-shim-kata-v2` dies, i
the containers are removed automatically.
[1]: https://wiki.qemu.org/Features/VirtioVsock
[2]: ./vcpu-handling.md#virtual-cpus-and-kubernetes-pods
[2]: ./vcpu-handling-runtime-go.md#virtual-cpus-and-kubernetes-pods

@@ -3,16 +3,16 @@
[Kubernetes](https://github.com/kubernetes/kubernetes/), or K8s, is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
[pod](https://kubernetes.io/docs/user-guide/pods/).
[pod](https://kubernetes.io/docs/concepts/workloads/pods/).
A node can have multiple pods, but at a minimum, a node within a Kubernetes cluster
only needs to run a container runtime and a container agent (called a
[Kubelet](https://kubernetes.io/docs/admin/kubelet/)).
[Kubelet](https://kubernetes.io/docs/concepts/overview/components/#kubelet)).
Kata Containers represents a Kubelet pod as a VM.
A Kubernetes cluster runs a control plane where a scheduler (typically
running on a dedicated master node) calls into a compute Kubelet. This
running on a dedicated control-plane node) calls into a compute Kubelet. This
Kubelet instance is responsible for managing the lifecycle of pods
within the nodes and eventually relies on a container runtime to
handle execution. The Kubelet architecture decouples lifecycle

@@ -36,7 +36,7 @@ compatibility, and performance on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.
Kata Containers supports both
[CNM](https://github.com/docker/libnetwork/blob/master/docs/design.md#the-container-network-model)
[CNM](https://github.com/moby/libnetwork/blob/master/docs/design.md#the-container-network-model)
and [CNI](https://github.com/containernetworking/cni) for networking management.
## Network Hotplug

@@ -0,0 +1,50 @@
# Kata Metrics in Rust Runtime(runtime-rs)
The Rust runtime (runtime-rs) is responsible for:
- Gathering metrics about the `shim`.
- Gathering metrics from the `hypervisor` (through a `channel`).
- Getting metrics from the `agent` (through `ttrpc`).
---
All the metrics gathered by `runtime-rs` are listed below.
> * Current status of each entry is marked as:
> * ✅DONE
> * 🚧TODO
### Kata Shim
| STATUS | Metric name | Type | Units | Labels |
| ------ | ------------------------------------------------------------ | ----------- | -------------- | ------------------------------------------------------------ |
| 🚧 | `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_io_stat`: <br> Kata containerd shim v2 process IO statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/io`)<ul><li>`cancelledwritebytes`</li><li>`rchar`</li><li>`readbytes`</li><li>`syscr`</li><li>`syscw`</li><li>`wchar`</li><li>`writebytes`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_netdev`: <br> Kata containerd shim v2 network devices statistics. | `GAUGE` | | <ul><li>`interface` (network device name)</li><li>`item` (see `/proc/net/dev`)<ul><li>`recv_bytes`</li><li>`recv_compressed`</li><li>`recv_drop`</li><li>`recv_errs`</li><li>`recv_fifo`</li><li>`recv_frame`</li><li>`recv_multicast`</li><li>`recv_packets`</li><li>`sent_bytes`</li><li>`sent_carrier`</li><li>`sent_colls`</li><li>`sent_compressed`</li><li>`sent_drop`</li><li>`sent_errs`</li><li>`sent_fifo`</li><li>`sent_packets`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_cpu`: <br> Kata Pod overhead for CPU resources(percent). | `GAUGE` | percent | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_memory_in_bytes`: <br> Kata Pod overhead for memory resources(bytes). | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_stat`: <br> Kata containerd shim v2 process statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_status`: <br> Kata containerd shim v2 process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpmd`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_resident_memory_bytes`: <br> Resident memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_bytes`: <br> Virtual memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_max_bytes`: <br> Maximum amount of virtual memory available in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (Kata shim v2 actions)<ul><li>`checkpoint`</li><li>`close_io`</li><li>`connect`</li><li>`create`</li><li>`delete`</li><li>`exec`</li><li>`kill`</li><li>`pause`</li><li>`pids`</li><li>`resize_pty`</li><li>`resume`</li><li>`shutdown`</li><li>`start`</li><li>`state`</li><li>`stats`</li><li>`update`</li><li>`wait`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_threads`: <br> Kata containerd shim v2 process threads. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
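Several shim metrics are read directly from procfs. For example, the `item` labels of `kata_shim_io_stat` correspond to the keys of `/proc/<pid>/io` with the underscores dropped (`read_bytes` becomes `readbytes`). The following shell sketch renders such a file as Prometheus text-format samples. It is illustrative only (runtime-rs gathers these values in Rust), and the function name `io_stat_to_metrics` and the exact output layout are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print /proc/<pid>/io entries as kata_shim_io_stat
# samples in the Prometheus text format.
# $1: path to an io file (e.g. /proc/self/io); $2: sandbox ID label value.
io_stat_to_metrics() {
	local io_file="$1" sandbox_id="$2"
	local key value
	# Each line looks like "read_bytes: 4096"; split on the colon and spaces.
	while IFS=': ' read -r key value; do
		# The documented item labels drop underscores (read_bytes -> readbytes).
		key="${key//_/}"
		printf 'kata_shim_io_stat{item="%s",sandbox_id="%s"} %s\n' \
			"${key}" "${sandbox_id}" "${value}"
	done <"${io_file}"
}
```

For instance, `io_stat_to_metrics /proc/self/io my-sandbox` prints one sample per key of the current shell's IO statistics.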
### Kata Hypervisor
Unlike the Go runtime, in runtime-rs the hypervisor and the shim belong to the **same process**, so the process-level metrics previously reported separately for the hypervisor and the shim only need to be gathered once. Thus, we currently collect them only under the Kata shim.
We have also added an interface (`VmmAction::GetHypervisorMetrics`) to gather hypervisor metrics, in case tailor-made hypervisor metrics are designed in the future. The following metrics are exposed from [src/dragonball/src/metric.rs](https://github.com/kata-containers/kata-containers/blob/main/src/dragonball/src/metric.rs).
| Metric name | Type | Units | Labels |
| ------------------------------------------------------------ | ---------- | ----- | ------------------------------------------------------------ |
| `kata_hypervisor_scrape_count`: <br> Metrics scrape count | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_vcpu`: <br>Hypervisor metrics specific to VCPUs' mode of functioning. | `IntGauge` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_seccomp`: <br> Hypervisor metrics for the seccomp filtering. | `IntGauge` | | <ul><li>`item`<ul><li>`num_faults`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_signals`: <br> Hypervisor metrics related to signals. | `IntGauge` | | <ul><li>`item`<ul><li>`sigbus`</li><li>`sigsegv`</li></ul></li><li>`sandbox_id`</li></ul> |

# Virtualization Reference Architecture
_Subject to Change | © 2022 by NVIDIA Corporation. All rights reserved. | For test and development only_
Before digging deeper into the virtualization reference architecture, let's
first look at the various GPUDirect use cases in the following table. We're
distinguishing between two top-tier use cases where the devices are (1) passed
through or (2) virtualized, in which case a VM gets assigned a virtual function
(VF) and not the physical function (PF). A combination of PF and VF would also
be possible.
| Device #1 (passthrough) | Device #2 (passthrough) | P2P Compatibility and Mode |
| ------------------------- | ----------------------- | -------------------------------------------- |
| GPU PF | GPU PF | GPUDirect P2P |
| GPU PF | NIC PF | GPUDirect RDMA |
| MIG-slice | MIG-slice | _No GPUDirect P2P_ |
| MIG-slice | NIC PF | GPUDirect RDMA |
| **Device #1 (virtualized)** | **Device #2 (virtualized)** | **P2P Compatibility and Mode** |
| Time-slice vGPU VF | Time-slice vGPU VF | _No GPUDirect P2P, but NVLink P2P available_ |
| Time-slice vGPU VF | NIC VF | GPUDirect RDMA |
| MIG-slice vGPU | MIG-slice vGPU | _No GPUDirect P2P_ |
| MIG-slice vGPU | NIC VF | GPUDirect RDMA |
In a virtualized environment, several distinct features may prevent
peer-to-peer (P2P) communication between two endpoints in a PCI Express
topology. The IOMMU translates IO virtual addresses (IOVA) to physical addresses
(PA). Each device behind an IOMMU has its own IOVA memory space. Usually no two
devices share the same IOVA memory space, but it's up to the hypervisor or OS
how it chooses to map devices to IOVA spaces. Any PCI Express DMA transactions
use IOVAs, which the IOMMU must translate. By default, all the traffic is routed
to the root complex and not issued directly to the peer device.
An IOMMU can be used to isolate and protect devices even if virtualization is
not used; since devices can only access memory regions that are mapped for
them, a DMA from one device to another is not possible. DPDK uses the IOMMU for
better isolation between devices. Another benefit is that the IOVA space can be
represented as contiguous memory even if the PA space is heavily scattered.
In the case of virtualization, the IOMMU is responsible for isolating the device
and memory between VMs for safe device assignment without compromising the host
and other guest OSes. Without an IOMMU, any device can access the entire system
and perform DMA transactions _anywhere_.
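To make the isolation argument concrete, here is a toy Python model of per-device IOVA translation. It is a sketch for illustration only: `ToyIommu` and its methods are hypothetical names, the 4 KiB page size is an assumption, and real IOMMUs use multi-level page tables programmed by the kernel or hypervisor.

```python
PAGE_SIZE = 4096  # assumed page granularity for this toy model


class ToyIommu:
    """Hypothetical per-device IOVA -> PA translation tables (not a real API)."""

    def __init__(self):
        # Device BDF -> {IOVA page frame number: PA page frame number}
        self.domains = {}

    def map(self, bdf, iova, pa, size):
        """Map [iova, iova + size) to [pa, pa + size) for one device only."""
        table = self.domains.setdefault(bdf, {})
        for offset in range(0, size, PAGE_SIZE):
            table[(iova + offset) // PAGE_SIZE] = (pa + offset) // PAGE_SIZE

    def translate(self, bdf, iova):
        """Translate a DMA address issued by `bdf`; unmapped addresses fault."""
        pfn = self.domains.get(bdf, {}).get(iova // PAGE_SIZE)
        if pfn is None:
            raise PermissionError(f"DMA fault: {bdf} has no mapping for IOVA {iova:#x}")
        return pfn * PAGE_SIZE + iova % PAGE_SIZE


if __name__ == "__main__":
    iommu = ToyIommu()
    # Two scattered physical pages appear as one contiguous IOVA range to the GPU.
    iommu.map("0000:03:00.0", 0x0000, 0x7F000000, PAGE_SIZE)
    iommu.map("0000:03:00.0", 0x1000, 0x12345000, PAGE_SIZE)
    print(hex(iommu.translate("0000:03:00.0", 0x1010)))  # 0x12345010
    try:
        # The NIC has no mapping of its own, so the same IOVA faults.
        iommu.translate("0000:01:00.0", 0x1010)
    except PermissionError as err:
        print(err)
```

Each device only sees its own table. That is why a DMA into memory mapped only for another device faults, and why contiguous IOVAs can hide a scattered physical layout.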
The second feature is ACS (Access Control Services), which controls which
devices are allowed to communicate with one another and thus avoids improper
routing of packets, irrespective of whether the IOMMU is enabled or not.
When the IOMMU is enabled, ACS is normally configured to force all PCI Express
DMA to go through the root complex so that the IOMMU can translate it. This
adds latency and reduces bandwidth between peers.
A way to avoid the performance hit is to enable Address Translation Services
(ATS). ATS-capable endpoints can prefetch IOVA -> PA translations from the IOMMU
and then perform DMA transactions directly to another endpoint. Hypervisors
support this by enabling ATS on such endpoints, configuring ACS to allow Direct
Translated P2P, and configuring the IOMMU to allow Address Translation requests.
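The routing rules in the last three paragraphs can be summarized as a small decision function. This is a simplified sketch, not how hardware evaluates requests. The function name and boolean inputs are hypothetical, and the sketch assumes both endpoints sit below the same PCI Express switch, so that a direct path exists at all.

```python
def p2p_route(acs_redirect: bool, ats_enabled: bool,
              acs_direct_translated_p2p: bool, iommu_allows_ats: bool) -> str:
    """Return where a peer-to-peer DMA ends up under the rules described above."""
    if not acs_redirect:
        # Nothing forces the request upstream, so the switch can route it to the peer.
        return "peer"
    if ats_enabled and acs_direct_translated_p2p and iommu_allows_ats:
        # The endpoint already holds the IOVA -> PA translation, and ACS lets
        # translated requests go directly to the other endpoint.
        return "peer"
    # Otherwise the request detours through the root complex for IOMMU translation.
    return "root complex"


if __name__ == "__main__":
    print(p2p_route(True, False, False, False))  # typical IOMMU setup: root complex
    print(p2p_route(True, True, True, True))     # all three ATS steps applied: peer
```

All three ATS-related settings have to be in place. Missing any one of them sends traffic back through the root complex.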
Another important factor is that the NVIDIA driver stack uses the PCI Express
topology of the system it is running on to determine whether the hardware is
capable of supporting P2P. The driver stack qualifies specific chipsets and PCI
Express switches for use with GPUDirect P2P. In virtual environments, the PCI
Express topology is flattened and obfuscated to present a uniform environment
to the software inside the VM, which breaks the GPUDirect P2P use case.
On a bare-metal machine, the driver stack groups GPUs into cliques that can
perform GPUDirect P2P communication. It excludes peer mappings where P2P
communication is not possible, most prominently when GPUs are attached to
different CPU sockets.
CPUs and their local memory banks are referred to as NUMA nodes. In a
two-socket server, each CPU has a local memory bank, for a total of two NUMA
nodes. Some servers can be configured with additional NUMA nodes per CPU, which
means a CPU socket can have two NUMA nodes (some servers support four NUMA nodes
per socket) with local memory banks and L3 NUMA domains for improved
performance.
One current solution is for the hypervisor to provide additional topology
information that the driver stack can pick up to enable GPUDirect P2P between
GPUs, even if the virtualized environment does not directly expose it. The PCI
Express virtual P2P approval capability structure in the PCI configuration space
is entirely emulated by the hypervisor for passed-through GPU devices.
A clique ID is provided, where GPUs with the same clique ID belong to a group of
GPUs capable of P2P communication.
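The following is a minimal sketch of how clique IDs partition devices: P2P is only attempted between devices that share a clique ID. The function names are hypothetical. The BDFs and the clique assignment reuse the host topology used later in this document purely for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cliques(devices):
    """Group BDFs by clique ID; devices without a clique ID take part in no P2P."""
    groups = defaultdict(list)
    for bdf, clique_id in devices.items():
        if clique_id is not None:
            groups[clique_id].append(bdf)
    return {cid: sorted(bdfs) for cid, bdfs in groups.items()}


def p2p_pairs(devices):
    """Every unordered pair of devices that share a clique ID."""
    pairs = []
    for members in cliques(devices).values():
        pairs.extend(combinations(members, 2))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical assignment: each GPU shares a clique with the NIC on its DPU.
    devices = {"41:00.0": "0", "3d:00.0": "0", "de:00.0": "1", "da:00.0": "1"}
    print(cliques(devices))
    print(p2p_pairs(devices))  # no pair crosses the two cliques
```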
On vSphere, Azure, and other CSPs, the hypervisor lays down a `topologies.xml`
file, which NCCL can pick up to deduce the right P2P level[^1]. NCCL leverages
InfiniBand (IB) and/or Unified Communication X (UCX) for communication, and
GPUDirect P2P and GPUDirect RDMA should just work in this case. The only caveat
is that software or applications that do not use the XML file to deduce the
topology will fail to enable GPUDirect (see [`nccl-p2p-level`](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-p2p-level)).
## Hypervisor PCI Express Topology
To enable every part of the accelerator stack, we propose a virtualized
reference architecture that enables GPUDirect P2P and GPUDirect RDMA for any
hypervisor. The idea is split into two parts, both aimed at presenting the right
PCI Express topology. The first part builds upon extending the PCI Express
virtual P2P approval capability structure to every device that wants to do P2P
in some way and grouping devices by clique ID. The second part involves
replicating a subset of the host topology so that applications running in the VM
do not need to read additional information and can enable P2P as in the
bare-metal use case described above. The driver stack can then automatically
deduce whether the topology presented in the VM is capable of P2P communication.
The following sections use the host topology below: a system with two converged
DPUs, each having an `A100X` GPU and two `ConnectX-6` network ports connected to
the downstream ports of a PCI Express switch.
```sh
+-00.0-[d8-df]----00.0-[d9-df]--+-00.0-[da-db]--+-00.0 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx network
|                               |               +-00.1 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx network
|                               |               \-00.2 Mellanox Tech MT42822 BlueField-2 SoC Management Interface
|                               \-01.0-[dc-df]----00.0-[dd-df]----08.0-[de-df]----00.0 NVIDIA Corporation GA100 [A100X]
+-00.0-[3b-42]----00.0-[3c-42]--+-00.0-[3d-3e]--+-00.0 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx network
|                               |               +-00.1 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx network
|                               |               \-00.2 Mellanox Tech MT42822 BlueField-2 SoC Management Interface
|                               \-01.0-[3f-42]----00.0-[40-42]----08.0-[41-42]----00.0 NVIDIA Corporation GA100 [A100X]
```
On each DPU, the path between the `A100X` GPU and the `ConnectX-6` ports through
their shared PCI Express switch is the optimal and preferred path for efficient
P2P communication.
## PCI Express Virtual P2P Approval Capability
Most of the time, the PCI Express topology is flattened and obfuscated to ensure
easy migration of the VM image between different physical hardware topologies.
In Kata, we can configure the hypervisor to use PCI Express root ports to
hotplug the VFIO devices being passed through. A user can select how many PCI
Express root ports to allocate, depending on how many devices are passed
through. A recent addition to Kata detects the number of PCI Express devices
that need hotplugging and bails out if the number of root ports is insufficient.
Kata does not automatically increase the number of root ports, because we want
the user to be in full control of the topology.
```toml
# /etc/kata-containers/configuration.toml
# VFIO devices are hotplugged on a bridge by default.
# Enable hot-plugging on the root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hot-plugging on
# a bridge.
# Default "bridge-port"
hotplug_vfio = "root-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as NVIDIA GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
pcie_root_port = 8
```
VFIO devices are hotplugged on a PCIe-PCI bridge by default. Hotplug of PCI
Express devices is only supported on PCI Express root or downstream ports. With
this configuration set, if we start up a Kata container, we can inspect our
topology and see the allocated PCI Express root ports and the hotplugged
devices.
```sh
$ lspci -tv
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0 Red Hat, Inc. Virtio console
           +-02.0 Red Hat, Inc. Virtio SCSI
           +-03.0 Red Hat, Inc. Virtio RNG
           +-04.0-[01]----00.0 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6
           +-05.0-[02]----00.0 Mellanox Technologies MT42822 BlueField-2 integrated ConnectX-6
           +-06.0-[03]----00.0 NVIDIA Corporation Device 20b8
           +-07.0-[04]----00.0 NVIDIA Corporation Device 20b8
           +-08.0-[05]--
           +-09.0-[06]--
           +-0a.0-[07]--
           +-0b.0-[08]--
           +-0c.0 Red Hat, Inc. Virtio socket
           +-0d.0 Red Hat, Inc. Virtio file system
           +-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller
           \-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
```
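The bail-out behavior described above amounts to a simple capacity check. This Python sketch is not Kata's implementation; the function name and error text are hypothetical. It only illustrates the rule that Kata refuses to proceed rather than adding root ports on its own. With the four hotplugged devices and `pcie_root_port = 8` from this example, four root ports stay empty, which matches `08.0` through `0b.0` in the output above.

```python
def check_root_ports(pcie_devices, pcie_root_port):
    """Return the number of free root ports, or fail if there are too few.

    pcie_devices: devices that must be hotplugged as PCI Express devices.
    pcie_root_port: the value configured in configuration.toml.
    """
    needed = len(pcie_devices)
    if needed > pcie_root_port:
        # Mirror the described policy: never grow the topology behind the user's back.
        raise ValueError(
            f"{needed} PCI Express devices need a root port, "
            f"but pcie_root_port is {pcie_root_port}"
        )
    return pcie_root_port - needed


if __name__ == "__main__":
    # Two NICs and two GPUs, as in the lspci output above.
    print(check_root_ports(["nic0", "nic1", "gpu0", "gpu1"], 8))  # 4
```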
Devices with huge BARs (Base Address Registers), like the GPU, need a properly
configured PCI Express root port with enough memory allocated for mapping. We
have added a heuristic to Kata that deduces the right settings so that the BARs
can be mapped correctly. This functionality is added to
[`nvidia/go-nvlib`](https://gitlab.com/nvidia/cloud-native/go-nvlib), which is
now part of Kata.
```sh
$ sudo dmesg | grep BAR
[ 0.179960] pci 0000:00:04.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 0.179962] pci 0000:00:05.0: BAR 7: assigned [io 0x2000-0x2fff]
[ 0.179963] pci 0000:00:06.0: BAR 7: assigned [io 0x3000-0x3fff]
[ 0.179964] pci 0000:00:07.0: BAR 7: assigned [io 0x4000-0x4fff]
[ 0.179966] pci 0000:00:08.0: BAR 7: assigned [io 0x5000-0x5fff]
[ 0.179967] pci 0000:00:09.0: BAR 7: assigned [io 0x6000-0x6fff]
[ 0.179968] pci 0000:00:0a.0: BAR 7: assigned [io 0x7000-0x7fff]
[ 0.179969] pci 0000:00:0b.0: BAR 7: assigned [io 0x8000-0x8fff]
[ 2.115912] pci 0000:01:00.0: BAR 0: assigned [mem 0x13000000000-0x13001ffffff 64bit pref]
[ 2.116203] pci 0000:01:00.0: BAR 2: assigned [mem 0x13002000000-0x130027fffff 64bit pref]
[ 2.683132] pci 0000:02:00.0: BAR 0: assigned [mem 0x12000000000-0x12001ffffff 64bit pref]
[ 2.683419] pci 0000:02:00.0: BAR 2: assigned [mem 0x12002000000-0x120027fffff 64bit pref]
[ 2.959155] pci 0000:03:00.0: BAR 1: assigned [mem 0x11000000000-0x117ffffffff 64bit pref]
[ 2.959345] pci 0000:03:00.0: BAR 3: assigned [mem 0x11800000000-0x11801ffffff 64bit pref]
[ 2.959523] pci 0000:03:00.0: BAR 0: assigned [mem 0xf9000000-0xf9ffffff]
[ 2.966119] pci 0000:04:00.0: BAR 1: assigned [mem 0x10000000000-0x107ffffffff 64bit pref]
[ 2.966295] pci 0000:04:00.0: BAR 3: assigned [mem 0x10800000000-0x10801ffffff 64bit pref]
[ 2.966472] pci 0000:04:00.0: BAR 0: assigned [mem 0xf7000000-0xf7ffffff]
```
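To see why the GPU needs special treatment, one can compute each BAR's size from the `dmesg` ranges above (size = end - start + 1). The Python sketch below parses a subset of those lines. The regular expression is written for exactly this log format and is an assumption, not a stable kernel interface. The sketch shows that BAR 1 of each GPU (`03:00.0`, `04:00.0`) spans 32 GiB. By comparison, the NIC BARs are 32 MiB and 8 MiB, and each root port's IO window is 4 KiB.

```python
import re

# Matches kernel lines such as:
# [ 2.959155] pci 0000:03:00.0: BAR 1: assigned [mem 0x11000000000-0x117ffffffff 64bit pref]
BAR_RE = re.compile(
    r"pci (?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): assigned "
    r"\[(?P<kind>io|mem)\s+0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)"
)
UNITS = (("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10))


def bar_sizes(dmesg_text):
    """Map (BDF, BAR index) -> (resource kind, size in bytes) for each assignment."""
    sizes = {}
    for m in BAR_RE.finditer(dmesg_text):
        start, end = int(m["start"], 16), int(m["end"], 16)
        sizes[(m["bdf"], int(m["bar"]))] = (m["kind"], end - start + 1)  # inclusive range
    return sizes


def human(size):
    """Render a size using the largest binary unit that divides it exactly."""
    for name, factor in UNITS:
        if size >= factor and size % factor == 0:
            return f"{size // factor} {name}"
    return f"{size} B"


SAMPLE = """\
[ 0.179960] pci 0000:00:04.0: BAR 7: assigned [io 0x1000-0x1fff]
[ 2.115912] pci 0000:01:00.0: BAR 0: assigned [mem 0x13000000000-0x13001ffffff 64bit pref]
[ 2.116203] pci 0000:01:00.0: BAR 2: assigned [mem 0x13002000000-0x130027fffff 64bit pref]
[ 2.959155] pci 0000:03:00.0: BAR 1: assigned [mem 0x11000000000-0x117ffffffff 64bit pref]
[ 2.959345] pci 0000:03:00.0: BAR 3: assigned [mem 0x11800000000-0x11801ffffff 64bit pref]
[ 2.959523] pci 0000:03:00.0: BAR 0: assigned [mem 0xf9000000-0xf9ffffff]
"""

if __name__ == "__main__":
    for (bdf, bar), (kind, size) in sorted(bar_sizes(SAMPLE).items()):
        print(f"{bdf} BAR {bar} ({kind}): {human(size)}")
```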
The NVIDIA driver stack in this case would refuse to do P2P communication since
(1) the topology is not what it expects and (2) we do not have a qualified
chipset. Since our P2P devices are not connected to a PCI Express switch port,
we need to provide additional information to support the P2P functionality. One
way of providing such meta information would be to annotate the container. Most
of the settings in Kata's configuration file can be overridden via annotations,
but this limits flexibility, and a user would need to update every container
they want to run with Kata. The goal is to make such things as transparent as
possible, so we also introduced
[CDI](https://github.com/container-orchestrated-devices/container-device-interface)
(Container Device Interface) to Kata. CDI is a
[specification](https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md)
for container runtimes to support third-party devices.
As mentioned before, we can provide a clique ID for the devices that belong
together and are capable of doing P2P. This information is provided to the
hypervisor, which sets up the VM accordingly. Suppose the user wants to do
GPUDirect RDMA with the first GPU and the NIC that reside on the same DPU. One
could then provide a specification telling the hypervisor that they belong to
the same clique.
```yaml
# /etc/cdi/nvidia.yaml
cdiVersion: 0.4.0
kind: nvidia.com/gpu
devices:
  - name: gpu0
    annotations:
      bdf: "41:00.0"
      clique-id: "0"
    containerEdits:
      deviceNodes:
        - path: "/dev/vfio/71"
---
# /etc/cdi/mellanox.yaml
cdiVersion: 0.4.0
kind: mellanox.com/nic
devices:
  - name: nic0
    annotations:
      bdf: "3d:00.0"
      clique-id: "0"
      attach-pci: "true"
    containerEdits:
      deviceNodes:
        - path: "/dev/vfio/66"
```
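Because the clique ID lives in the CDI specifications rather than in the container, a runtime only needs to read the specs to know which devices belong together. The Python sketch below transcribes the two specs above into dictionaries. `qualified_devices` and `clique_members` are hypothetical helpers, not part of any CDI library. The sketch does rely on CDI's naming convention, in which a device is referenced as `<kind>=<name>`, for example `nvidia.com/gpu=gpu0`.

```python
NVIDIA_SPEC = {
    "cdiVersion": "0.4.0",
    "kind": "nvidia.com/gpu",
    "devices": [{
        "name": "gpu0",
        "annotations": {"bdf": "41:00.0", "clique-id": "0"},
        "containerEdits": {"deviceNodes": [{"path": "/dev/vfio/71"}]},
    }],
}
MELLANOX_SPEC = {
    "cdiVersion": "0.4.0",
    "kind": "mellanox.com/nic",
    "devices": [{
        "name": "nic0",
        "annotations": {"bdf": "3d:00.0", "clique-id": "0", "attach-pci": "true"},
        "containerEdits": {"deviceNodes": [{"path": "/dev/vfio/66"}]},
    }],
}


def qualified_devices(*specs):
    """Yield (fully qualified CDI name, BDF, clique ID) for every device."""
    for spec in specs:
        for device in spec["devices"]:
            annotations = device.get("annotations", {})
            yield (f'{spec["kind"]}={device["name"]}',
                   annotations.get("bdf"), annotations.get("clique-id"))


def clique_members(*specs):
    """Group fully qualified device names by their clique-id annotation."""
    members = {}
    for name, _bdf, clique_id in qualified_devices(*specs):
        if clique_id is not None:
            members.setdefault(clique_id, []).append(name)
    return members


if __name__ == "__main__":
    print(clique_members(NVIDIA_SPEC, MELLANOX_SPEC))
    # {'0': ['nvidia.com/gpu=gpu0', 'mellanox.com/nic=nic0']}
```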
Since this setting is bound to the device and not the container, we do not need
to alter the container. We just allocate the right resource, and GPUDirect RDMA
is set up correctly. Rather than exposing the devices separately, an idea would
be to expose a GPUDirect RDMA device via NFD (Node Feature Discovery) that
combines both of them. This way, we could make sure that the right pair is
allocated and used (more on Kubernetes deployment in the next section).
The GPU driver stack leverages the PCI Express virtual P2P approval capability,
but the NIC stack does not currently use it. One of the action items is to
enable MOFED to read the P2P approval capability and enable the ATS and ACS
settings as described above.
This way, we could enable GPUDirect P2P and GPUDirect RDMA on any topology
presented to the VM application. It is the responsibility of the administrator
or infrastructure engineer to provide the right information either via
annotations or a CDI specification.
## Host Topology Replication
The other way to represent the PCI Express topology in the VM is to replicate a
subset of the topology needed to support the P2P use case inside the VM. Similar
to the configuration for the root ports, we can easily configure the usage of
PCI Express switch ports to hotplug the devices.
```toml
# /etc/kata-containers/configuration.toml
# VFIO devices are hotplugged on a bridge by default.
# Enable hot plugging on the root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hot plugging on
# a bridge.
# Default "bridge-port"
hotplug_vfio = "switch-port"
# Before hot plugging a PCIe device, you need to add a pcie_switch_port device.
# Use this parameter when using some large PCI bar devices, such as NVIDIA GPU
# The value means the number of pcie_switch_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
pcie_switch_port = 8
```
Each device that is passed through is attached to a PCI Express downstream port,
as illustrated below. We can even replicate the host's two DPU topologies with
added metadata through CDI. Most of the time, a container only needs one pair of
GPU and NIC for GPUDirect RDMA; this is more of a showcase of what we can do
with the power of Kata and CDI. One could even think of adding groups of devices
that support P2P, even from different CPU sockets or NUMA nodes, into one
container. In the output below, the first group (buses `01-04`) is NUMA node 0,
and the second group (buses `05-08`) is NUMA node 1. Since they are grouped
correctly, P2P is enabled naturally inside a group, that is, within a clique ID.
```sh
$ lspci -tv
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0 Red Hat, Inc. Virtio console
           +-02.0 Red Hat, Inc. Virtio SCSI
           +-03.0 Red Hat, Inc. Virtio RNG
           +-04.0-[01-04]----00.0-[02-04]--+-00.0-[03]----00.0 NVIDIA Corporation Device 20b8
           |                               \-01.0-[04]----00.0 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx
           +-05.0-[05-08]----00.0-[06-08]--+-00.0-[07]----00.0 Mellanox Tech MT42822 BlueField-2 integrated ConnectX-6 Dx
           |                               \-01.0-[08]----00.0 NVIDIA Corporation Device 20b8
           +-06.0 Red Hat, Inc. Virtio socket
           +-07.0 Red Hat, Inc. Virtio file system
           +-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
           \-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
```
The choice between root ports and switch ports can be applied on a per-container
or per-pod basis, meaning we can switch PCI Express topologies on each run of an
application.
## Hypervisor Resource Limits
Every hypervisor has resource limits on how many PCI Express root ports, switch
ports, or bridge ports can be created, especially with devices that need to
reserve a 4K IO range per the PCI specification. Each root or switch port
instance consumes 4K of the very limited IO space, of which 64K is the maximum.
Simple math brings us to the conclusion that we can have a maximum of 16 PCI
Express root ports or 16 PCI Express switch ports in QEMU if devices with IO
BARs are used in the PCI Express hierarchy.
Additionally, one can have 32 slots on the PCI root bus and a maximum of 256
slots for the complete PCI(e) topology.
By default, QEMU attaches a multi-function device in the last slot on the PCI
root bus:
```sh
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
```
Kata additionally adds `virtio-xxx-pci` devices (consuming five slots), a
PCIe-PCI bridge (one slot), and a DRAM controller (one slot). Together with the
multi-function device above, eight slots are already used by default. This
leaves 24 slots for adding other devices to the root bus.
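The port and slot arithmetic above can be written out as a short Python check. The constants restate the figures given in this section. Counting the multi-function device in slot `1f` as the eighth default slot is our reading of the text, not a separately documented number.

```python
IO_SPACE = 64 * 1024        # total IO port space (64K)
IO_PER_PORT = 4 * 1024      # IO range reserved per root or switch port (4K)
ROOT_BUS_SLOTS = 32         # slots on the PCI root bus

# Upper bound on root or switch ports when IO BARs are used in the hierarchy.
max_io_ports = IO_SPACE // IO_PER_PORT

# Root bus slots that Kata and QEMU occupy by default.
default_slots = {
    "virtio-xxx-pci devices": 5,
    "PCIe-PCI bridge": 1,
    "DRAM controller": 1,
    "multi-function device in slot 1f": 1,
}
used_slots = sum(default_slots.values())
free_slots = ROOT_BUS_SLOTS - used_slots

if __name__ == "__main__":
    print(f"max root/switch ports: {max_io_ports}")                     # 16
    print(f"slots used by default: {used_slots}, free: {free_slots}")   # 8, 24
```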
A customer use case illustrates the problem. The customer uses recent RTX GPUs
with Kata, wanted to pass through eight of these GPUs into one container, and ran
into issues. The problem is that those cards often consist of four individual
device nodes: GPU, audio, and two USB controller devices (some cards have a
USB-C output).
These devices are grouped into one IOMMU group. Since one needs to pass through
the complete IOMMU group into the VM, we would need to allocate 32 (eight GPUs
times four devices) PCI Express root ports or 32 PCI Express switch ports, which
is technically impossible due to the resource limits outlined above. Since all
the devices appear as PCI Express devices, we need to hotplug them into a root
or switch port.
The solution to this problem is leveraging CDI. For each device, the CDI
specification records whether it is going to be hotplugged as a PCI Express or a
PCI device. This results in using either a PCI Express root/switch port or an
ordinary PCI bridge. PCI bridges are not affected by the limited IO range. This
way, the GPU is attached as a PCI Express device to a root/switch port and the
other three devices to a PCI bridge, leaving enough resources to create the
needed PCI Express root/switch ports. In the following example, we attach the
GPUs to PCI Express root ports and the NICs to a PCI bridge.
```yaml
# /etc/cdi/mellanox.yaml
cdiVersion: 0.4.0
kind: mellanox.com/nic
devices:
  - name: nic0
    annotations:
      bdf: "3d:00.0"
      clique-id: "0"
      attach-pci: "true"
    containerEdits:
      deviceNodes:
        - path: "/dev/vfio/66"
  - name: nic1
    annotations:
      bdf: "3d:00.1"
      clique-id: "1"
      attach-pci: "true"
    containerEdits:
      deviceNodes:
        - path: "/dev/vfio/67"
```
The configuration is set to use eight root ports for the GPUs and to attach the
NICs to a PCI bridge, which is connected to a PCI Express-PCI bridge. This is the
preferred way of introducing a PCI topology in a PCI Express machine.
```sh
$ lspci -tv
-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
           +-01.0 Red Hat, Inc. Virtio console
           +-02.0 Red Hat, Inc. Virtio SCSI
           +-03.0 Red Hat, Inc. Virtio RNG
           +-04.0-[01]----00.0 NVIDIA Corporation Device 20b8
           +-05.0-[02]----00.0 NVIDIA Corporation Device 20b8
           +-06.0-[03]--
           +-07.0-[04]--
           +-08.0-[05]--
           +-09.0-[06]--
           +-0a.0-[07]--
           +-0b.0-[08]--
           +-0c.0-[09-0a]----00.0-[0a]--+-00.0 Mellanox Tech MT42822 BlueField-2 ConnectX-6
           |                            \-01.0 Mellanox Tech MT42822 BlueField-2 ConnectX-6
           +-0d.0 Red Hat, Inc. Virtio socket
           +-0e.0 Red Hat, Inc. Virtio file system
           +-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
           +-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller
           \-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
```
Each PCI device consumes one of the 256 slots in the PCI(e) topology, but not
the scarce IO range, which is left for the PCI Express devices that need it.
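The RTX example in this section reduces to a capacity problem that the per-device choice solves. The Python sketch below models it. `Function`, `rtx_card`, `plan`, and the generated BDFs are hypothetical. The `attach_pci` flag is modeled on the `attach-pci` annotation in the CDI specs above, and the limit of 16 comes from the IO-range math in this section.

```python
from dataclasses import dataclass

MAX_PCIE_PORTS = 16  # 64K IO space / 4K per root or switch port


@dataclass
class Function:
    bdf: str
    kind: str          # "gpu", "audio", or "usb"
    attach_pci: bool   # True: hotplug behind a PCI bridge instead of a PCIe port


def rtx_card(bus, companions_on_pci):
    """One hypothetical card: the GPU plus the audio and USB functions in its IOMMU group."""
    kinds = ["gpu", "audio", "usb", "usb"]
    return [Function(f"{bus:02x}:00.{fn}", kind, companions_on_pci and kind != "gpu")
            for fn, kind in enumerate(kinds)]


def plan(functions):
    """Split functions between PCIe root/switch ports and a PCI bridge."""
    pcie = [f.bdf for f in functions if not f.attach_pci]
    pci = [f.bdf for f in functions if f.attach_pci]
    if len(pcie) > MAX_PCIE_PORTS:
        raise ValueError(f"{len(pcie)} PCIe ports needed, at most {MAX_PCIE_PORTS} possible")
    return {"pcie_ports": pcie, "pci_bridge": pci}


if __name__ == "__main__":
    for companions_on_pci in (False, True):
        cards = [f for bus in range(1, 9) for f in rtx_card(bus, companions_on_pci)]
        try:
            p = plan(cards)
            print(f"{len(p['pcie_ports'])} PCIe ports, {len(p['pci_bridge'])} on the PCI bridge")
        except ValueError as err:
            print(err)
```

Without the per-device choice, the eight cards need 32 ports and the plan fails. With it, only the eight GPUs use PCI Express ports, and the other 24 functions go behind the bridge.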

# Virtual machine vCPU sizing in Kata Containers 3.0
> Preview:
> [Kubernetes (since 1.23)][1] and [containerd (since 1.6.0-beta4)][2] help calculate the `Sandbox Size` and pass it to Kata Containers through annotations.
> To adapt to this beneficial change while remaining compatible with earlier versions, we have implemented a new way of handling vCPUs in `runtime-rs`, which is slightly different from the original `runtime-go` design.
## When do we need to handle the vCPU size?
vCPU sizing should be determined by the container workloads. Throughout the life cycle of Kata Containers, there are several points in time when we need to decide how many vCPUs the VM should have: mainly `CreateVM`, `CreateContainer`, `UpdateContainer`, and `DeleteContainer`.
* `CreateVM`: When creating a sandbox, we need to know how many vCPUs to start the VM with.
* `CreateContainer`: When creating a new container in the VM, we may need to hot-plug vCPUs according to the requirements in the container's spec.
* `UpdateContainer`: When receiving the `UpdateContainer` request, we may need to update the vCPU resources according to the new requirements of the container.
* `DeleteContainer`: When a container is removed from the VM, we may need to hot-unplug the vCPUs to reclaim the vCPU resources introduced by the container.
## On what basis do we calculate the number of vCPUs?
When Kata calculates the number of vCPUs, it has three data sources: the `default_vcpus` and `default_maxvcpus` specified in the configuration file (named `TomlConfig` later in the doc), the `io.kubernetes.cri.sandbox-cpu-quota` and `io.kubernetes.cri.sandbox-cpu-period` annotations passed by the upper-layer runtime, and the CPU resource section of the container's spec when `CreateContainer`/`UpdateContainer`/`DeleteContainer` is requested.
We interpret and prioritize these sources as follows, which affects how we calculate the number of vCPUs later.
* From `TomlConfig`:
  * `default_vcpus`: default number of vCPUs when starting a VM.
  * `default_maxvcpus`: maximum number of vCPUs.
* From `Annotation`:
  * `InitialSize`: we refer to the resource size passed through the annotations as `InitialSize`. Kubernetes calculates the sandbox size from the Pod's declaration, and that value is the `InitialSize` here. This is the size we want to prioritize.
* From `Container Spec`:
  * The amount of CPU resources that the container wants to use is declared through the spec. Similar to the annotations above, we mainly consider `cpu quota` and `cpuset` when calculating the number of vCPUs.
    * `cpu quota`: `cpu quota` is the most common way to declare the amount of CPU resources. The number of vCPUs introduced by the `cpu quota` declared in a container's spec is `vCPUs = ceiling(quota / period)`.
    * `cpuset`: `cpuset` is often used to bind the CPUs that tasks can run on. The number of vCPUs that may be introduced by the `cpuset` declared in a container's spec is the number of CPUs in the set that do not overlap with other containers.
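The two per-container rules can be sketched in Python as follows. The helper names are hypothetical. Treating a non-positive quota as "no limit" follows the cgroup convention where `-1` means unlimited, and the `cpuset` parser handles the usual `0-3,8` list syntax.

```python
def vcpus_from_quota(quota, period):
    """vCPUs = ceiling(quota / period); an unset or unlimited quota adds nothing."""
    if quota <= 0 or period <= 0:
        return 0
    return -(-quota // period)  # integer ceiling division, no floating point


def parse_cpuset(spec):
    """Parse a cpuset string such as "0-3,8" into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def vcpus_from_cpuset(spec, cpus_of_other_containers):
    """Count the CPUs in this cpuset that do not overlap with other containers."""
    return len(parse_cpuset(spec) - cpus_of_other_containers)


if __name__ == "__main__":
    print(vcpus_from_quota(150000, 100000))    # 2
    print(vcpus_from_cpuset("0-3,8", {2, 3}))  # 3 (CPUs 0, 1 and 8)
```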
## How to calculate and adjust the vCPU size
We need to consider two vCPU counts. The first is the number of vCPUs when starting the VM (named `Boot Size` in the doc). The second is the number of vCPUs when a `CreateContainer`/`UpdateContainer`/`DeleteContainer` request is received (`Real-time Size` in the doc).
### `Boot Size`
The main considerations are `InitialSize` and `default_vcpus`. The governing principle is that `InitialSize` has priority over the `default_vcpus` declared in `TomlConfig`:
1. When such an annotation is present, the original `default_vcpus` is replaced by the number of vCPUs in `InitialSize`, which becomes the `Boot Size`. (Because not all runtimes support this annotation yet, we still keep `default_vcpus` in `TomlConfig`.)
2. When the specs of all containers are aggregated for the sandbox size calculation, the method is consistent with the calculation method of `InitialSize` described here.
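The following is a minimal sketch of the `Boot Size` rule. Per point 2, it assumes that `InitialSize` is derived from the sandbox annotations with the same `ceiling(quota / period)` formula used for containers. The function names are hypothetical; only the annotation keys come from this document.

```python
QUOTA_KEY = "io.kubernetes.cri.sandbox-cpu-quota"
PERIOD_KEY = "io.kubernetes.cri.sandbox-cpu-period"


def initial_size(annotations):
    """vCPUs requested for the sandbox via annotations, or 0 if they are absent."""
    quota = int(annotations.get(QUOTA_KEY, 0))
    period = int(annotations.get(PERIOD_KEY, 0))
    if quota <= 0 or period <= 0:
        return 0
    return -(-quota // period)  # ceiling(quota / period)


def boot_size(default_vcpus, annotations):
    """InitialSize, when present, replaces default_vcpus from TomlConfig."""
    size = initial_size(annotations)
    return size if size > 0 else default_vcpus


if __name__ == "__main__":
    print(boot_size(1, {QUOTA_KEY: "250000", PERIOD_KEY: "100000"}))  # 3
    print(boot_size(1, {}))  # 1: the runtime did not pass the annotations
```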
### `Real-time Size`
When we receive an OCI request, it may concern a single container, but we have to consider the number of vCPUs for the entire VM. We therefore maintain a list of the containers' CPU requirements. Every time an adjustment is needed, the entire list is traversed to calculate the number of vCPUs. In addition, the following principles apply:
1. Do not cut computing power, and try to keep the number of vCPUs specified by `InitialSize`.
   * The number of vCPUs after an adjustment is never less than the `Boot Size`.
2. `cpu quota` takes precedence over `cpuset`, and the setting history is taken into account.
   * Quota describes the CPU time slice that a cgroup can use, while `cpuset` describes which CPUs a cgroup can run on. Quota better describes how much CPU time a cgroup actually wants: a cgroup may be allowed to run on the CPUs in its `cpuset` yet consume only a small time slice on them. So quota takes precedence over `cpuset`.
   * When both `cpu quota` and `cpuset` are specified, we calculate the number of vCPUs based on `cpu quota` and ignore `cpuset`. Likewise, if `cpu quota` was used to control the number of vCPUs in the past and only `cpuset` is updated during `UpdateContainer`, we do not adjust the number of vCPUs.
3. `StaticSandboxResourceMgmt` controls hotplug.
   * Some VMMs, and the kernels of some architectures, do not support hotplugging. We accommodate this situation through `StaticSandboxResourceMgmt`: when `StaticSandboxResourceMgmt = true` is set, we make no further attempts to update the number of vCPUs after booting.
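The three principles can be combined into one recomputation routine. This Python sketch is an illustration under assumptions, not the runtime-rs code. The class and function names are hypothetical, the "list" is modeled as a dictionary keyed by container ID, and `cpuset` overlaps are resolved in iteration order.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerCpu:
    quota: int = 0                            # <= 0 means no quota was ever set
    period: int = 0
    cpuset: set = field(default_factory=set)


def update_container(c, quota=None, period=None, cpuset=None):
    """Apply an UpdateContainer request; fields not in the request keep their history."""
    if cpuset is not None:
        c.cpuset = set(cpuset)
    if quota is not None and period is not None:
        c.quota, c.period = quota, period


def realtime_size(containers, boot_size, static_sandbox_resource_mgmt=False):
    """Recompute the VM's vCPU count by walking every container in the sandbox."""
    if static_sandbox_resource_mgmt:
        return boot_size  # no hotplug: keep the size the VM booted with
    total, claimed = 0, set()
    for c in containers.values():
        if c.quota > 0 and c.period > 0:
            total += -(-c.quota // c.period)  # quota wins; cpuset is ignored
        else:
            total += len(c.cpuset - claimed)  # only CPUs not used by earlier containers
        claimed |= c.cpuset
    return max(total, boot_size)              # never cut below the Boot Size


if __name__ == "__main__":
    pod = {"a": ContainerCpu(quota=150000, period=100000), "b": ContainerCpu(cpuset={0, 1})}
    print(realtime_size(pod, boot_size=1))  # 4: 2 from a's quota + 2 from b's cpuset
    update_container(pod["a"], cpuset={4, 5, 6, 7})
    print(realtime_size(pod, boot_size=1))  # still 4: a keeps its earlier quota
```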
[1]: https://github.com/kubernetes/kubernetes/pull/104886
[2]: https://github.com/containerd/containerd/pull/6155

- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)
- [How to run Kata Containers with AMD SEV-SNP](how-to-run-kata-containers-with-SNP-VMs.md)
- [How to use EROFS to build rootfs in Kata Containers](how-to-use-erofs-build-rootfs.md)
- [How to run Kata Containers with kinds of Block Volumes](how-to-run-kata-containers-with-kinds-of-Block-Volumes.md)

__SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/
$ ./kernel/build-kernel.sh -a x86_64 -x snp setup
$ ./kernel/build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./kernel/build-kernel.sh -x snp install
$ popd
```
- Build a current OVMF capable of SEV-SNP:
- Build a custom QEMU
```bash
$ source kata-containers/tools/packaging/scripts/lib.sh
$ qemu_url="$(get_from_kata_deps "assets.hypervisor.qemu-snp-experimental.url")"
$ qemu_tag="$(get_from_kata_deps "assets.hypervisor.qemu-snp-experimental.tag")"
$ git clone "${qemu_url}"
$ pushd qemu
$ git checkout "${qemu_tag}"
$ ./configure --enable-virtfs --target-list=x86_64-softmmu --enable-debug
$ make -j "$(nproc)"
$ popd
```

# A new way for Kata Containers to use different kinds of block volumes
> **Note:** This guide only applies to runtime-rs with the default hypervisor, Dragonball.
> Support for other hypervisors is still in progress, and this guide will be updated when it is ready.
## Background
Currently, there is no widely applicable and convenient way for users to use some kinds of backend storage with Kata Containers, such as host file based block volumes, SPDK based volumes, or VFIO device based volumes. We therefore adopt [Proposal: Direct Block Device Assignment](https://github.com/kata-containers/kata-containers/blob/main/docs/design/direct-blk-device-assignment.md) to address this.
## Solution
The proposal uses the `kata-ctl direct-volume` command to add a directly assigned block volume device to the Kata Containers runtime.
The runtime then uses the [get_volume_mount_info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L95) method to read the JSON file `mountinfo.json` and parse it into the [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70) structure, which holds the device-related information.
To describe a device, we only need to fill in the fields of `mountinfo.json` (`device`, `volume_type`, `fs_type`, `metadata`, and `options`), which correspond to the fields in [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70).
The JSON file `mountinfo.json` is placed in a sub-path, such as `/kubelet/kata-test-vol-001/volume001`, under the fixed path `/run/kata-containers/shared/direct-volumes/`.
The full path would therefore be `/run/kata-containers/shared/direct-volumes/kubelet/kata-test-vol-001/volume001`. For security reasons, however, the sub-path is base64-encoded, which gives
`/run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx`.
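The encoded component in the path above is the base64 encoding of the sub-path. The following minimal sketch reproduces it with Python's standard library. We assume the URL-safe alphabet with padding, which keeps `/` out of the directory name. The sample path contains no characters for which the URL-safe and standard alphabets differ, so the sample alone cannot confirm this. `volume_dir` is a hypothetical helper name.

```python
import base64
import posixpath

DIRECT_VOLUMES_ROOT = "/run/kata-containers/shared/direct-volumes"


def volume_dir(volume_path):
    """Directory that holds the mount info JSON for a direct-assigned volume."""
    # URL-safe base64 never emits "/", so the result is a single path component.
    encoded = base64.urlsafe_b64encode(volume_path.encode()).decode()
    return posixpath.join(DIRECT_VOLUMES_ROOT, encoded)


if __name__ == "__main__":
    print(volume_dir("/kubelet/kata-test-vol-001/volume001"))
```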
Finally, when running a Kata container with `ctr run --mount type=X,src=Y,dst=Z,options=rbind:rw`, `type=X` should be set to the dedicated type for the kind of volume in use.
Currently supported types:
- `directvol` for direct volume
- `vfiovol` for VFIO device based volume
- `spdkvol` for SPDK/vhost-user based volume
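The `--mount` format and the type mapping above can be illustrated with a small Rust sketch. It is hypothetical: `MountSpec`, `VolumeKind` and `parse_mount_spec` are illustrative names, not containerd or Kata APIs. The sketch assumes the comma-separated `key=value` format used in the `ctr run` examples in this document, with colon-separated `options`.

```rust
use std::collections::HashMap;

/// Volume kinds selected by the proprietary mount types listed above.
#[derive(Debug, PartialEq)]
enum VolumeKind {
    Direct, // type=directvol
    Vfio,   // type=vfiovol
    Spdk,   // type=spdkvol
}

impl VolumeKind {
    fn from_mount_type(t: &str) -> Option<Self> {
        match t {
            "directvol" => Some(VolumeKind::Direct),
            "vfiovol" => Some(VolumeKind::Vfio),
            "spdkvol" => Some(VolumeKind::Spdk),
            _ => None, // any other type is not a direct-assigned volume
        }
    }
}

#[derive(Debug, PartialEq)]
struct MountSpec {
    kind: VolumeKind,
    src: String, // volume path registered with `kata-ctl direct-volume add`
    dst: String, // mount point inside the container
    options: Vec<String>,
}

/// Parse a string such as "type=spdkvol,src=/a,dst=/b,options=rbind:rw".
fn parse_mount_spec(spec: &str) -> Result<MountSpec, String> {
    let mut fields: HashMap<&str, &str> = HashMap::new();
    for pair in spec.split(',') {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("malformed field {pair:?}"))?;
        fields.insert(key, value);
    }
    // Look up a required field and copy it out as an owned String.
    let get = |key: &str| {
        fields
            .get(key)
            .map(|v| v.to_string())
            .ok_or_else(|| format!("missing {key}"))
    };
    let mount_type = get("type")?;
    let kind = VolumeKind::from_mount_type(&mount_type)
        .ok_or_else(|| format!("unsupported type {mount_type:?}"))?;
    // Options are optional and colon-separated, e.g. "rbind:rw".
    let options: Vec<String> = fields
        .get("options")
        .map(|o| o.split(':').map(String::from).collect())
        .unwrap_or_default();
    Ok(MountSpec { kind, src: get("src")?, dst: get("dst")?, options })
}

fn main() {
    let spec = "type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001,options=rbind:rw";
    println!("{:?}", parse_mount_spec(spec));
}
```

Rejecting unknown types up front keeps ordinary bind mounts on their usual path, and only the three listed types are routed to direct-volume handling.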
## Set Up a Device and Run a Kata Container
### Direct Block Device Based Volume
#### Create raw block based backend storage
> **Tip:** Raw block based backend storage MUST be formatted with `mkfs`.
```bash
$ sudo mkdir -p /tmp/stor
$ sudo dd if=/dev/zero of=/tmp/stor/rawdisk01.20g bs=1M count=20480
$ sudo mkfs.ext4 /tmp/stor/rawdisk01.20g
```
#### Set up direct block device for Kata Containers
```json
{
  "device": "/tmp/stor/rawdisk01.20g",
  "volume_type": "directvol",
  "fs_type": "ext4",
  "metadata": {},
  "options": []
}
```
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-direct-vol-002/directvol002 '{"device": "/tmp/stor/rawdisk01.20g", "volume_type": "directvol", "fs_type": "ext4", "metadata": {}, "options": []}'
$ # /kubelet/kata-direct-vol-002/directvol002 <==> /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS1kaXJlY3Qtdm9sLTAwMi9kaXJlY3R2b2wwMDI=
$ sudo cat /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS1kaXJlY3Qtdm9sLTAwMi9kaXJlY3R2b2wwMDI=/mountInfo.json
{"volume_type":"directvol","device":"/tmp/stor/rawdisk01.20g","fs_type":"ext4","metadata":{},"options":[]}
```
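The `mountInfo.json` written by `kata-ctl` above is compact JSON containing the five fields described in the Solution section. The Rust sketch below renders the same shape without external crates. `MountInfo`, `to_json` and `json_str` are hypothetical names, and the sketch assumes `metadata` is a string-to-string map (a `BTreeMap` here, so keys come out sorted). In real code, a JSON library such as `serde_json` would be the natural choice.

```rust
use std::collections::BTreeMap;

/// Hypothetical mirror of the fields stored in mountInfo.json.
struct MountInfo {
    volume_type: String,
    device: String,
    fs_type: String,
    metadata: BTreeMap<String, String>,
    options: Vec<String>,
}

/// Quote and escape a string as a JSON string literal.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            // Control characters must be escaped as \uXXXX in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl MountInfo {
    /// Render compact JSON in the field order shown by `cat mountInfo.json`.
    fn to_json(&self) -> String {
        let metadata: Vec<String> = self
            .metadata
            .iter()
            .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
            .collect();
        let options: Vec<String> = self.options.iter().map(|o| json_str(o)).collect();
        format!(
            "{{\"volume_type\":{},\"device\":{},\"fs_type\":{},\"metadata\":{{{}}},\"options\":[{}]}}",
            json_str(&self.volume_type),
            json_str(&self.device),
            json_str(&self.fs_type),
            metadata.join(","),
            options.join(",")
        )
    }
}

fn main() {
    let info = MountInfo {
        volume_type: "directvol".to_string(),
        device: "/tmp/stor/rawdisk01.20g".to_string(),
        fs_type: "ext4".to_string(),
        metadata: BTreeMap::new(),
        options: Vec::new(),
    };
    println!("{}", info.to_json());
}
```

Running `main` prints the same line as the `cat` output above. Note that `metadata` is an empty JSON object (`{}`), not the string `"{}"`.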
#### Run a Kata container with direct block device volume
```bash
$ # type=directvol,src=/kubelet/kata-direct-vol-002/directvol002,dst=/disk002,options=rbind:rw
$ sudo ctr run -t --rm --runtime io.containerd.kata.v2 --mount type=directvol,src=/kubelet/kata-direct-vol-002/directvol002,dst=/disk002,options=rbind:rw "$image" kata-direct-vol-xx05302045 /bin/bash
```
### VFIO Device Based Block Volume
#### Create VFIO device based backend storage
> **Tip:** Only `vfio-pci` based PCI device passthrough mode is supported.

In this scenario, the device's host kernel driver is replaced by `vfio-pci`, and an IOMMU group ID is generated.
Either the device's BDF or its VFIO IOMMU group ID in `/dev/vfio/` can be used as the `device` field in `mountInfo.json`.
```bash
$ lspci -nn -k -s 45:00.1
45:00.1 SCSI storage controller
...
Kernel driver in use: vfio-pci
...
$ ls /dev/vfio/110
/dev/vfio/110
$ ls /sys/kernel/iommu_groups/110/devices/
0000:45:00.1
```
#### Set up VFIO device for Kata Containers
First, configure `mountInfo.json` using one of the following forms:
- (1) device with `BB:DD.F`
```json
{
  "device": "45:00.1",
  "volume_type": "vfiovol",
  "fs_type": "ext4",
  "metadata": {},
  "options": []
}
```
- (2) device with `DDDD:BB:DD.F`
```json
{
  "device": "0000:45:00.1",
  "volume_type": "vfiovol",
  "fs_type": "ext4",
  "metadata": {},
  "options": []
}
```
- (3) device with `/dev/vfio/X`
```json
{
  "device": "/dev/vfio/110",
  "volume_type": "vfiovol",
  "fs_type": "ext4",
  "metadata": {},
  "options": []
}
```
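The three accepted `device` forms can be told apart mechanically. The Rust sketch below is hypothetical: `VfioDeviceRef`, `parse_vfio_device` and `sysfs_path` are illustrative names, not Kata's parser. It treats a `/dev/vfio/<N>` path as an IOMMU group number and parses `[DDDD:]BB:DD.F` as a PCI address, defaulting the domain to `0000`. `sysfs_path` points at the standard Linux sysfs locations where the group can be checked, matching the `ls /sys/kernel/iommu_groups/110/devices/` step earlier.

```rust
/// A VFIO device reference as it may appear in the "device" field.
#[derive(Debug, PartialEq)]
enum VfioDeviceRef {
    /// PCI address: domain, bus, device (slot) and function.
    Bdf { domain: u16, bus: u8, device: u8, function: u8 },
    /// IOMMU group number from /dev/vfio/<group>.
    IommuGroup(u32),
}

/// Parse "BB:DD.F", "DDDD:BB:DD.F" or "/dev/vfio/<group>".
fn parse_vfio_device(s: &str) -> Result<VfioDeviceRef, String> {
    if let Some(group) = s.strip_prefix("/dev/vfio/") {
        return group
            .parse::<u32>()
            .map(VfioDeviceRef::IommuGroup)
            .map_err(|e| format!("bad IOMMU group {group:?}: {e}"));
    }
    // The domain is optional and defaults to 0000.
    let parts: Vec<&str> = s.split(':').collect();
    let (domain, bus, dev_fn) = match parts.as_slice() {
        [bus, dev_fn] => ("0000", *bus, *dev_fn),
        [domain, bus, dev_fn] => (*domain, *bus, *dev_fn),
        _ => return Err(format!("not a PCI address: {s:?}")),
    };
    let (device, function) = dev_fn
        .split_once('.')
        .ok_or_else(|| format!("missing function in {s:?}"))?;
    // All PCI address components are hexadecimal.
    let err = |e: std::num::ParseIntError| format!("bad PCI address {s:?}: {e}");
    let domain = u16::from_str_radix(domain, 16).map_err(err)?;
    let bus = u8::from_str_radix(bus, 16).map_err(err)?;
    let device = u8::from_str_radix(device, 16).map_err(err)?;
    let function = u8::from_str_radix(function, 16).map_err(err)?;
    // PCI allows 32 devices per bus and 8 functions per device.
    if device > 0x1f || function > 7 {
        return Err(format!("device/function out of range in {s:?}"));
    }
    Ok(VfioDeviceRef::Bdf { domain, bus, device, function })
}

impl VfioDeviceRef {
    /// sysfs location from which the device's IOMMU group can be checked.
    fn sysfs_path(&self) -> String {
        match self {
            VfioDeviceRef::Bdf { domain, bus, device, function } => format!(
                "/sys/bus/pci/devices/{domain:04x}:{bus:02x}:{device:02x}.{function:x}/iommu_group"
            ),
            VfioDeviceRef::IommuGroup(group) => format!("/sys/kernel/iommu_groups/{group}/devices"),
        }
    }
}

fn main() {
    for dev in ["45:00.1", "0000:45:00.1", "/dev/vfio/110"] {
        let parsed = parse_vfio_device(dev).expect("valid device");
        println!("{dev} -> {parsed:?} ({})", parsed.sysfs_path());
    }
}
```

Forms (1) and (2) parse to the same address, because the short form only omits the default domain `0000`.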
Second, add the volume with `kata-ctl direct-volume add`, using the device `/dev/vfio/110` as an example:
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-vfio-vol-003/vfiovol003 '{"device": "/dev/vfio/110", "volume_type": "vfiovol", "fs_type": "ext4", "metadata": {}, "options": []}'
$ # /kubelet/kata-vfio-vol-003/vfiovol003 <==> /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS12ZmlvLXZvbC0wMDMvdmZpb3ZvbDAwMw==
$ sudo cat /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS12ZmlvLXZvbC0wMDMvdmZpb3ZvbDAwMw==/mountInfo.json
{"volume_type":"vfiovol","device":"/dev/vfio/110","fs_type":"ext4","metadata":{},"options":[]}
```
#### Run a Kata container with VFIO block device based volume
```bash
$ # type=vfiovol,src=/kubelet/kata-vfio-vol-003/vfiovol003,dst=/disk003,options=rbind:rw
$ sudo ctr run -t --rm --runtime io.containerd.kata.v2 --mount type=vfiovol,src=/kubelet/kata-vfio-vol-003/vfiovol003,dst=/disk003,options=rbind:rw "$image" kata-vfio-vol-xx05302245 /bin/bash
```
### SPDK Device Based Block Volume
Unlike the Go runtime, runtime-rs does not require a device node to be created with `mknod` under `/dev/` for SPDK vhost-user devices.
Running `kata-ctl direct-volume add ...` to create the mount info config is enough.
#### Run an SPDK vhost target and expose a vhost block device
The following example runs an SPDK vhost target and creates a vhost-user block controller.
First, run the SPDK vhost target:
> **Tip:** If the `vfio-pci` driver is supported, you can run the SPDK setup script with `DRIVER_OVERRIDE=vfio-pci`.
> Otherwise, just run it without that variable: `sudo HUGEMEM=4096 ./scripts/setup.sh`.
```bash
$ SPDK_DEVEL=/xx/spdk
$ VHU_UDS_PATH=/tmp/vhu-targets
$ RAW_DISKS=/xx/rawdisks
$ sudo mkdir -p ${VHU_UDS_PATH} ${RAW_DISKS}
$ # Reset first
$ sudo ${SPDK_DEVEL}/scripts/setup.sh reset
$ sudo sysctl -w vm.nr_hugepages=2048
$ # 4 GiB of huge page memory for SPDK
$ sudo HUGEMEM=4096 DRIVER_OVERRIDE=vfio-pci ${SPDK_DEVEL}/scripts/setup.sh
$ sudo ${SPDK_DEVEL}/build/bin/spdk_tgt -S $VHU_UDS_PATH -s 1024 -m 0x3 &
```
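The numbers in the commands above are related. `setup.sh` takes `HUGEMEM` in megabytes, so with the common 2 MiB (2048 kB) huge page size, `HUGEMEM=4096` corresponds to the 2048 pages set through `vm.nr_hugepages`. Separately, `-m 0x3` is a hexadecimal core mask that selects cores 0 and 1. The small Rust sketch below does that arithmetic. The function names are hypothetical, and the 2 MiB page size is an assumption; check `Hugepagesize` in `/proc/meminfo` on your host.

```rust
/// Number of huge pages needed to back `hugemem_mb` MiB using pages of `page_kb` KiB.
fn hugepages_needed(hugemem_mb: u64, page_kb: u64) -> u64 {
    let total_kb = hugemem_mb * 1024;
    // Round up so the whole amount is covered.
    (total_kb + page_kb - 1) / page_kb
}

/// CPU cores selected by a hexadecimal core mask such as "0x3".
fn cores_from_mask(mask: &str) -> Result<Vec<u32>, std::num::ParseIntError> {
    let hex = mask.strip_prefix("0x").unwrap_or(mask);
    let bits = u64::from_str_radix(hex, 16)?;
    // Bit i set in the mask means core i is used.
    Ok((0..64u32).filter(|&i| (bits >> i) & 1 == 1).collect())
}

fn main() {
    // HUGEMEM=4096 with 2048 kB pages, as in the setup commands above.
    println!("nr_hugepages = {}", hugepages_needed(4096, 2048));
    println!("cores for -m 0x3 = {:?}", cores_from_mask("0x3").unwrap());
}
```

With those values, the program prints `nr_hugepages = 2048` and `cores for -m 0x3 = [0, 1]`.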
Second, create a vhost controller:
```bash
$ sudo dd if=/dev/zero of=${RAW_DISKS}/rawdisk01.20g bs=1M count=20480
$ sudo mkfs.ext4 ${RAW_DISKS}/rawdisk01.20g
$ sudo ${SPDK_DEVEL}/scripts/rpc.py bdev_aio_create ${RAW_DISKS}/rawdisk01.20g vhu-rawdisk01.20g 512
$ sudo ${SPDK_DEVEL}/scripts/rpc.py vhost_create_blk_controller vhost-blk-rawdisk01.sock vhu-rawdisk01.20g
```
This creates the vhost controller `vhost-blk-rawdisk01.sock`, which will
be passed to the hypervisor, such as Dragonball, Cloud Hypervisor, Firecracker or QEMU.
#### Set up vhost-user block device for Kata Containers
First, create the sub-path `kubelet/kata-test-vol-001/` under `/run/kata-containers/shared/direct-volumes/` with `mkdir`.
Second, fill in the fields of `mountInfo.json` as shown below:
```json
{
  "device": "/tmp/vhu-targets/vhost-blk-rawdisk01.sock",
  "volume_type": "spdkvol",
  "fs_type": "ext4",
  "metadata": {},
  "options": []
}
```
Third, use `kata-ctl direct-volume add` to add the block device, which generates `mountInfo.json`, and then run a Kata container with `--mount`.
```bash
$ # kata-ctl direct-volume add
$ sudo kata-ctl direct-volume add /kubelet/kata-test-vol-001/volume001 '{"device": "/tmp/vhu-targets/vhost-blk-rawdisk01.sock", "volume_type": "spdkvol", "fs_type": "ext4", "metadata": {}, "options": []}'
$ # /kubelet/kata-test-vol-001/volume001 <==> /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx
$ sudo cat /run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx/mountInfo.json
{"volume_type":"spdkvol","device":"/tmp/vhu-targets/vhost-blk-rawdisk01.sock","fs_type":"ext4","metadata":{},"options":[]}
```
Because `/run/kata-containers/shared/direct-volumes/` is a fixed path, we can run a Kata pod with `--mount` and set only the
`src` sub-path. The `--mount` argument looks like: `--mount type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001`.
#### Run a Kata container with SPDK vhost-user block device
In `ctr run --mount type=X,src=source,dst=dest`, `X` is set to `spdkvol`, a proprietary type designed specifically for SPDK volumes.
```bash
$ # ctr run with --mount type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001
$ sudo ctr run -t --rm --runtime io.containerd.kata.v2 --mount type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001,options=rbind:rw "$image" kata-spdk-vol-xx0530 /bin/bash
```

@@ -1,5 +1,5 @@
## Introduction
To improve security, Kata Container supports running the VMM process (currently only QEMU) as a non-`root` user.
To improve security, Kata Container supports running the VMM process (QEMU and cloud-hypervisor) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
@@ -27,7 +27,7 @@ Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs
## Limitations
1. Only the VMM process is running as a non-root user. Other processes such as Kata Container shimv2 and `virtiofsd` still run as the root user.
2. Currently, this feature is only supported in QEMU. Still need to bring it to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
2. Currently, this feature is only supported in QEMU and cloud-hypervisor. For firecracker, you can use jailer to run the VMM process with a non-root user.
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.

@@ -27,6 +27,8 @@ $ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
uid: $(uuidgen)
namespace: default
EOF
$ cat << EOF > "${container_yaml}"
metadata:

@@ -139,12 +139,12 @@ By default the CNI plugin binaries is installed under `/opt/cni/bin` (in package
EOF
```
## Allow pods to run in the master node
## Allow pods to run in the control-plane node
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```
## Create runtime class for Kata Containers

@@ -19,12 +19,14 @@ This document requires the presence of Kata Containers on your system. Install u
## Install AWS Firecracker
Kata Containers only support AWS Firecracker v0.23.4 ([yet](https://github.com/kata-containers/kata-containers/pull/1519)).
For information about the supported version of Firecracker, see the Kata Containers
[`versions.yaml`](../../versions.yaml).
To install Firecracker we need to get the `firecracker` and `jailer` binaries:
```bash
$ release_url="https://github.com/firecracker-microvm/firecracker/releases"
$ version="v0.23.1"
$ version=$(yq read <kata-repository>/versions.yaml assets.hypervisor.firecracker.version)
$ arch=`uname -m`
$ curl ${release_url}/download/${version}/firecracker-${version}-${arch} -o firecracker
$ curl ${release_url}/download/${version}/jailer-${version}-${arch} -o jailer

@@ -32,6 +32,7 @@ The `nydus-sandbox.yaml` looks like below:
metadata:
attempt: 1
name: nydus-sandbox
uid: nydus-uid
namespace: default
log_directory: /tmp
linux:

@@ -29,7 +29,7 @@ Then you can build and install the guest kernel image as shown [here](../../tool
## Run a Kata Container utilizing `virtio-mem`
Use following command to enable memory overcommitment of a Linux kernel. Because QEMU `virtio-mem` device need to allocate a lot of memory.
Use following command to enable memory over-commitment of a Linux kernel. Because QEMU `virtio-mem` device need to allocate a lot of memory.
```
$ echo 1 | sudo tee /proc/sys/vm/overcommit_memory
```
@@ -42,6 +42,8 @@ $ image="quay.io/prometheus/busybox:latest"
$ cat << EOF > "${pod_yaml}"
metadata:
name: busybox-sandbox1
uid: $(uuidgen)
namespace: default
EOF
$ cat << EOF > "${container_yaml}"
metadata:

@@ -115,11 +115,11 @@ $ sudo kubeadm init --ignore-preflight-errors=all --config kubeadm-config.yaml
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```
### Allow pods to run in the master node
### Allow pods to run in the control-plane node
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```
### Create runtime class for Kata Containers

@@ -19,7 +19,6 @@ Packaged installation methods uses your distribution's native package format (su
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
@@ -42,12 +41,6 @@ Kata packages are provided by official distribution repositories for:
| [CentOS](centos-installation-guide.md) | 8 |
| [Fedora](fedora-installation-guide.md) | 34 |
### Snap Installation
The snap installation is available for all distributions which support `snapd`.
[Use snap](snap-installation-guide.md) to install Kata Containers from https://snapcraft.io.
### Automatic Installation
[Use `kata-manager`](/utils/README.md) to automatically install a working Kata Containers system.

@@ -26,7 +26,6 @@ architectures:
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. | Yes |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. | No |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
@@ -36,8 +35,6 @@ architectures:
Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/README.md).
### Official packages
`ToDo`
### Snap Installation
`ToDo`
### Automatic Installation
`ToDo`
### Manual Installation
@@ -49,14 +46,14 @@ Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/README.md).
* Download `Rustup` and install `Rust`
> **Notes:**
> Rust version 1.62.0 is needed
> For Rust version, please set `RUST_VERSION` to the value of `languages.rust.meta.newest-version key` in [`versions.yaml`](../../versions.yaml) or, if `yq` is available on your system, run `export RUST_VERSION=$(yq read versions.yaml languages.rust.meta.newest-version)`.
Example for `x86_64`
```
$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ source $HOME/.cargo/env
$ rustup install 1.62.0
$ rustup default 1.62.0-x86_64-unknown-linux-gnu
$ rustup install ${RUST_VERSION}
$ rustup default ${RUST_VERSION}-x86_64-unknown-linux-gnu
```
* Musl support for fully static binary

@@ -91,7 +91,7 @@ Before you install Kata Containers, check that your Minikube is operating. On yo
$ kubectl get nodes
```
You should see your `master` node listed as being `Ready`.
You should see your `control-plane` node listed as being `Ready`.
Check you have virtualization enabled inside your Minikube. The following should return
a number larger than `0` if you have either of the `vmx` or `svm` nested virtualization features

@@ -1,52 +0,0 @@
# Kata Containers snap package
## Install Kata Containers
Kata Containers can be installed in any Linux distribution that supports
[snapd](https://docs.snapcraft.io/installing-snapd).
Run the following command to install **Kata Containers**:
```sh
$ sudo snap install kata-containers --stable --classic
```
## Configure Kata Containers
By default Kata Containers snap image is mounted at `/snap/kata-containers` as a
read-only file system, therefore default configuration file can not be edited.
Fortunately Kata Containers supports loading a configuration file from another
path than the default.
```sh
$ sudo mkdir -p /etc/kata-containers
$ sudo cp /snap/kata-containers/current/usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers/
$ $EDITOR /etc/kata-containers/configuration.toml
```
## Integration with shim v2 Container Engines
The Container engine daemon (`cri-o`, `containerd`, etc) needs to be able to find the
`containerd-shim-kata-v2` binary to allow Kata Containers to be created.
Run the following command to create a symbolic link to the shim v2 binary.
```sh
$ sudo ln -sf /snap/kata-containers/current/usr/bin/containerd-shim-kata-v2 /usr/local/bin/containerd-shim-kata-v2
```
Once the symbolic link has been created and the engine daemon configured, `io.containerd.kata.v2`
can be used as runtime.
Read the following documents to know how to run Kata Containers 2.x with `containerd`.
* [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
* [Install Kata Containers with containerd](./container-manager/containerd/containerd-install.md)
## Remove Kata Containers snap package
Run the following command to remove the Kata Containers snap:
```sh
$ sudo snap remove kata-containers
```

@@ -1,101 +0,0 @@
# Kata Containers snap image
This directory contains the resources needed to build the Kata Containers
[snap][1] image.
## Initial setup
Kata Containers can be installed in any Linux distribution that supports
[snapd](https://docs.snapcraft.io/installing-snapd). For this example, we
assume Ubuntu as your base distro.
```sh
$ sudo apt-get --no-install-recommends install -y apt-utils ca-certificates snapd snapcraft
```
## Install snap
You can install the Kata Containers snap from the [snapcraft store][8] or by running the following command:
```sh
$ sudo snap install kata-containers --classic
```
## Build and install snap image
Run the command below which will use the packaging Makefile to build the snap image:
```sh
$ make -C tools/packaging snap
```
> **Warning:**
>
> By default, `snapcraft` will create a clean virtual machine
> environment to build the snap in using the `multipass` tool.
>
> However, `multipass` is silently disabled when `--destructive-mode` is
> used.
>
> Since building the Kata Containers package currently requires
> `--destructive-mode`, the snap will be built using the host
> environment. To avoid parts of the build auto-detecting additional
> features to enable (for example for QEMU), we recommend that you
> only run the snap build in a minimal host environment.
To install the resulting snap image, snap must be put in [classic mode][3] and the
security confinement must be disabled (`--classic`). Also since the resulting snap
has not been signed the verification of signature must be omitted (`--dangerous`).
```sh
$ sudo snap install --classic --dangerous "kata-containers_${version}_${arch}.snap"
```
Replace `${version}` with the current version of Kata Containers and `${arch}` with
the system architecture.
## Configure Kata Containers
By default Kata Containers snap image is mounted at `/snap/kata-containers` as a
read-only file system, therefore default configuration file can not be edited.
Fortunately [`kata-runtime`][4] supports loading a configuration file from another
path than the default.
```sh
$ sudo mkdir -p /etc/kata-containers
$ sudo cp /snap/kata-containers/current/usr/share/defaults/kata-containers/configuration.toml /etc/kata-containers/
$ $EDITOR /etc/kata-containers/configuration.toml
```
## Integration with docker and Kubernetes
The path to the runtime provided by the Kata Containers snap image is
`/snap/kata-containers/current/usr/bin/kata-runtime`. You should use it to
run Kata Containers with [docker][9] and [Kubernetes][10].
## Remove snap
You can remove the Kata Containers snap by running the following command:
```sh
$ sudo snap remove kata-containers
```
## Limitations
The [miniOS image][2] is not included in the snap image as it is not possible for
QEMU to open a guest RAM backing store on a read-only filesystem. Fortunately,
you can start Kata Containers with a Linux initial RAM disk (initrd) that is
included in the snap image. If you want to use the miniOS image instead of initrd,
then a new configuration file can be [created](#configure-kata-containers)
and [configured][7].
[1]: https://docs.snapcraft.io/snaps/intro
[2]: ../../docs/design/architecture/README.md#root-filesystem-image
[3]: https://docs.snapcraft.io/reference/confinement#classic
[4]: https://github.com/kata-containers/kata-containers/tree/main/src/runtime#configuration
[5]: https://docs.docker.com/engine/reference/commandline/dockerd
[6]: ../../docs/install/docker/ubuntu-docker-install.md
[7]: ../../docs/Developer-Guide.md#configure-to-use-initrd-or-rootfs-image
[8]: https://snapcraft.io/kata-containers
[9]: ../../docs/Developer-Guide.md#run-kata-containers-with-docker
[10]: ../../docs/Developer-Guide.md#run-kata-containers-with-kubernetes

@@ -1,114 +0,0 @@
#!/usr/bin/env bash
#
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
# Description: Idempotent script to be sourced by all parts in a
# snapcraft config file.
set -o errexit
set -o nounset
set -o pipefail
# XXX: Bash-specific code. zsh doesn't support this option and that *does*
# matter if this script is run sourced... since it'll be using zsh! ;)
[ -n "$BASH_VERSION" ] && set -o errtrace
[ -n "${DEBUG:-}" ] && set -o xtrace
die()
{
echo >&2 "ERROR: $0: $*"
}
[ -n "${SNAPCRAFT_STAGE:-}" ] ||\
die "must be sourced from a snapcraft config file"
snap_yq_version=3.4.1
snap_common_install_yq()
{
export yq="${SNAPCRAFT_STAGE}/bin/yq"
local yq_pkg
yq_pkg="github.com/mikefarah/yq"
local yq_url
yq_url="https://${yq_pkg}/releases/download/${snap_yq_version}/yq_${goos}_${goarch}"
curl -o "${yq}" -L "${yq_url}"
chmod +x "${yq}"
}
# Function that should be called for each snap "part" in
# snapcraft.yaml.
snap_common_main()
{
# Architecture
arch="$(uname -m)"
case "${arch}" in
aarch64)
goarch="arm64"
qemu_arch="${arch}"
;;
ppc64le)
goarch="ppc64le"
qemu_arch="ppc64"
;;
s390x)
goarch="${arch}"
qemu_arch="${arch}"
;;
x86_64)
goarch="amd64"
qemu_arch="${arch}"
;;
*) die "unsupported architecture: ${arch}" ;;
esac
dpkg_arch=$(dpkg --print-architecture)
# golang
#
# We need the O/S name in golang format, but since we don't
# know if the godeps part has run, we don't know if golang is
# available yet, hence fall back to a standard system command.
goos="$(go env GOOS &>/dev/null || true)"
[ -z "$goos" ] && goos=$(uname -s|tr '[A-Z]' '[a-z]')
export GOROOT="${SNAPCRAFT_STAGE}"
export GOPATH="${GOROOT}/gopath"
export GO111MODULE="auto"
mkdir -p "${GOPATH}/bin"
export PATH="${GOPATH}/bin:${PATH}"
# Proxy
export http_proxy="${http_proxy:-}"
export https_proxy="${https_proxy:-}"
# Binaries
mkdir -p "${SNAPCRAFT_STAGE}/bin"
export PATH="$PATH:${SNAPCRAFT_STAGE}/bin"
# YAML query tool
export yq="${SNAPCRAFT_STAGE}/bin/yq"
# Kata paths
export kata_dir=$(printf "%s/src/github.com/%s/%s" \
"${GOPATH}" \
"${SNAPCRAFT_PROJECT_NAME}" \
"${SNAPCRAFT_PROJECT_NAME}")
export versions_file="${kata_dir}/versions.yaml"
[ -n "${yq:-}" ] && [ -x "${yq:-}" ] || snap_common_install_yq
}
snap_common_main

@@ -1,167 +0,0 @@
name: kata-containers
website: https://github.com/kata-containers/kata-containers
summary: Build lightweight VMs that seamlessly plug into the containers ecosystem
description: |
Kata Containers is an open source project and community working to build a
standard implementation of lightweight Virtual Machines (VMs) that feel and
perform like containers, but provide the workload isolation and security
advantages of VMs
confinement: classic
adopt-info: metadata
base: core20
parts:
metadata:
plugin: nil
prime:
- -*
build-packages:
- git
- git-extras
override-pull: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
version="9999"
if echo "${GITHUB_REF:-}" | grep -q -E "^refs/tags"; then
version=$(echo ${GITHUB_REF:-} | cut -d/ -f3)
git checkout ${version}
fi
snapcraftctl set-grade "stable"
snapcraftctl set-version "${version}"
mkdir -p $(dirname ${kata_dir})
ln -sf $(realpath "${SNAPCRAFT_STAGE}/..") ${kata_dir}
docker:
after: [metadata]
plugin: nil
prime:
- -*
build-packages:
- ca-certificates
- containerd
- curl
- gnupg
- lsb-release
- runc
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
curl -fsSL https://download.docker.com/linux/ubuntu/gpg |\
sudo gpg --batch --yes --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
distro_codename=$(lsb_release -cs)
echo "deb [arch=${dpkg_arch} signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu ${distro_codename} stable" |\
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get -y update
sudo apt-get -y install docker-ce docker-ce-cli containerd.io
echo "Unmasking docker service"
sudo -E systemctl unmask docker.service || true
sudo -E systemctl unmask docker.socket || true
echo "Adding $USER into docker group"
sudo -E gpasswd -a $USER docker
echo "Starting docker"
sudo -E systemctl start docker || true
image:
after: [docker]
plugin: nil
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make rootfs-image-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-rootfs-image.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
sudo -E NO_TTY=true make rootfs-initrd-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-rootfs-initrd.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
runtime:
after: [docker]
plugin: nil
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make shim-v2-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-shim-v2.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
mkdir -p "${SNAPCRAFT_PART_INSTALL}/usr/bin"
ln -sf "${SNAPCRAFT_PART_INSTALL}/opt/kata/bin/containerd-shim-kata-v2" "${SNAPCRAFT_PART_INSTALL}/usr/bin/containerd-shim-kata-v2"
ln -sf "${SNAPCRAFT_PART_INSTALL}/opt/kata/bin/kata-runtime" "${SNAPCRAFT_PART_INSTALL}/usr/bin/kata-runtime"
ln -sf "${SNAPCRAFT_PART_INSTALL}/opt/kata/bin/kata-collect-data.sh" "${SNAPCRAFT_PART_INSTALL}/usr/bin/kata-collect-data.sh"
kernel:
after: [docker]
plugin: nil
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make kernel-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-kernel.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
qemu:
plugin: make
after: [docker]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make qemu-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-qemu.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
virtiofsd:
plugin: nil
after: [docker]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make virtiofsd-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-virtiofsd.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
cloud-hypervisor:
plugin: nil
after: [docker]
override-build: |
source "${SNAPCRAFT_PROJECT_DIR}/snap/local/snap-common.sh"
if [ "${arch}" == "aarch64" ] || [ "${arch}" == "x86_64" ]; then
cd "${SNAPCRAFT_PROJECT_DIR}"
sudo -E NO_TTY=true make cloud-hypervisor-tarball
tarfile="${SNAPCRAFT_PROJECT_DIR}/tools/packaging/kata-deploy/local-build/build/kata-static-cloud-hypervisor.tar.xz"
tar -xvJpf "${tarfile}" -C "${SNAPCRAFT_PART_INSTALL}"
fi
apps:
runtime:
command: usr/bin/kata-runtime
shim:
command: usr/bin/containerd-shim-kata-v2
collect-data:
command: usr/bin/kata-collect-data.sh

src/agent/Cargo.lock (generated)

File diff suppressed because it is too large

@@ -8,10 +8,10 @@ license = "Apache-2.0"
[dependencies]
oci = { path = "../libs/oci" }
rustjail = { path = "rustjail" }
protocols = { path = "../libs/protocols", features = ["async"] }
protocols = { path = "../libs/protocols", features = ["async", "with-serde"] }
lazy_static = "1.3.0"
ttrpc = { version = "0.6.0", features = ["async"], default-features = false }
protobuf = "2.27.0"
ttrpc = { version = "0.7.1", features = ["async"], default-features = false }
protobuf = "3.2.0"
libc = "0.2.58"
nix = "0.24.2"
capctl = "0.2.0"
@@ -30,7 +30,7 @@ async-recursion = "0.3.2"
futures = "0.3.17"
# Async runtime
tokio = { version = "1.14.0", features = ["full"] }
tokio = { version = "1.28.1", features = ["full"] }
tokio-vsock = "0.3.1"
netlink-sys = { version = "0.7.0", features = ["tokio_socket",]}
@@ -43,6 +43,7 @@ ipnetwork = "0.17.0"
logging = { path = "../libs/logging" }
slog = "2.5.2"
slog-scope = "4.1.2"
slog-term = "2.9.0"
# Redirect ttrpc log calls
slog-stdlog = "4.0.0"
@@ -66,6 +67,12 @@ serde = { version = "1.0.129", features = ["derive"] }
toml = "0.5.8"
clap = { version = "3.0.1", features = ["derive"] }
# Communication with the OPA service
http = { version = "0.2.8", optional = true }
reqwest = { version = "0.11.14", optional = true }
# The "vendored" feature for openssl is required for musl build
openssl = { version = "0.10.54", features = ["vendored"], optional = true }
[dev-dependencies]
tempfile = "3.1.0"
test-utils = { path = "../libs/test-utils" }
@@ -82,6 +89,7 @@ lto = true
[features]
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
agent-policy = ["http", "openssl", "reqwest"]
[[bin]]
name = "kata-agent"

@@ -26,13 +26,27 @@ export VERSION_COMMIT := $(if $(COMMIT),$(VERSION)-$(COMMIT),$(VERSION))
EXTRA_RUSTFEATURES :=
##VAR SECCOMP=yes|no define if agent enables seccomp feature
SECCOMP := yes
SECCOMP ?= yes
# Enable seccomp feature of rust build
ifeq ($(SECCOMP),yes)
override EXTRA_RUSTFEATURES += seccomp
endif
##VAR AGENT_POLICY=yes|no define if agent enables the policy feature
AGENT_POLICY ?= no
# Enable the policy feature of rust build
ifeq ($(AGENT_POLICY),yes)
override EXTRA_RUSTFEATURES += agent-policy
endif
include ../../utils.mk
ifeq ($(ARCH), ppc64le)
override ARCH = powerpc64le
endif
##VAR STANDARD_OCI_RUNTIME=yes|no define if agent enables standard oci runtime feature
STANDARD_OCI_RUNTIME := no
@@ -45,12 +59,10 @@ ifneq ($(EXTRA_RUSTFEATURES),)
override EXTRA_RUSTFEATURES := --features "$(EXTRA_RUSTFEATURES)"
endif
include ../../utils.mk
TARGET_PATH = target/$(TRIPLE)/$(BUILD_TYPE)/$(TARGET)
##VAR DESTDIR=<path> is a directory prepended to each installed target file
DESTDIR :=
DESTDIR ?=
##VAR BINDIR=<path> is a directory for installing executable programs
BINDIR := /usr/bin
@@ -136,7 +148,7 @@ vendor:
#TARGET test: run cargo tests
test:
test: $(GENERATED_FILES)
@cargo test --all --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture
##TARGET check: run test

@@ -18,7 +18,7 @@ scopeguard = "1.0.0"
capctl = "0.2.0"
lazy_static = "1.3.0"
libc = "0.2.58"
protobuf = "2.27.0"
protobuf = "3.2.0"
slog = "2.5.2"
slog-scope = "4.1.2"
scan_fmt = "0.2.6"
@@ -29,12 +29,12 @@ cgroups = { package = "cgroups-rs", version = "0.3.2" }
rlimit = "0.5.3"
cfg-if = "0.1.0"
tokio = { version = "1.2.0", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
tokio = { version = "1.28.1", features = ["sync", "io-util", "process", "time", "macros", "rt"] }
futures = "0.3.17"
async-trait = "0.1.31"
inotify = "0.9.2"
libseccomp = { version = "0.3.0", optional = true }
zbus = "2.3.0"
zbus = "3.12.0"
bit-vec= "0.6.3"
xattr = "0.2.3"

@@ -27,7 +27,7 @@ use oci::{
LinuxNetwork, LinuxPids, LinuxResources,
};
use protobuf::{CachedSize, RepeatedField, SingularPtrField, UnknownFields};
use protobuf::MessageField;
use protocols::agent::{
BlkioStats, BlkioStatsEntry, CgroupStats, CpuStats, CpuUsage, HugetlbStats, MemoryData,
MemoryStats, PidsStats, ThrottlingData,
@@ -39,18 +39,16 @@ use std::path::Path;
const GUEST_CPUS_PATH: &str = "/sys/devices/system/cpu/online";
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger().new(o!("subsystem" => "cgroups"))
};
// Convenience function to obtain the scope logger.
fn sl() -> slog::Logger {
slog_scope::logger().new(o!("subsystem" => "cgroups"))
}
macro_rules! get_controller_or_return_singular_none {
($cg:ident) => {
match $cg.controller_of() {
Some(c) => c,
None => return SingularPtrField::none(),
None => return MessageField::none(),
}
};
}
@@ -82,7 +80,7 @@ impl CgroupManager for Manager {
fn set(&self, r: &LinuxResources, update: bool) -> Result<()> {
info!(
sl!(),
sl(),
"cgroup manager set resources for container. Resources input {:?}", r
);
@@ -120,7 +118,7 @@ impl CgroupManager for Manager {
// set devices resources
set_devices_resources(&self.cgroup, &r.devices, res);
info!(sl!(), "resources after processed {:?}", res);
info!(sl(), "resources after processed {:?}", res);
// apply resources
self.cgroup.apply(res)?;
@@ -134,11 +132,10 @@ impl CgroupManager for Manager {
let throttling_data = get_cpu_stats(&self.cgroup);
let cpu_stats = SingularPtrField::some(CpuStats {
let cpu_stats = MessageField::some(CpuStats {
cpu_usage,
throttling_data,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
// Memorystats
@@ -160,8 +157,7 @@ impl CgroupManager for Manager {
pids_stats,
blkio_stats,
hugetlb_stats,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}
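The hunks above replace protobuf 2.x `SingularPtrField` with protobuf 3.x `MessageField` and drop the explicit `unknown_fields`/`cached_size` initializers in favor of struct update syntax (`..Default::default()`). The following is a minimal std-only sketch of that pattern. It assumes a stand-in `MessageField<T>` modeled as `Option<Box<T>>` and hypothetical `ThrottlingData`/`CpuStats`/`SpecialFields` structs; these are illustrations, not the generated rust-protobuf types.

```rust
/// Stand-in for protobuf 3's `MessageField<T>`: an optional, boxed sub-message.
#[derive(Debug, Clone, PartialEq)]
pub struct MessageField<T>(pub Option<Box<T>>);

impl<T> Default for MessageField<T> {
    fn default() -> Self {
        MessageField(None)
    }
}

impl<T> MessageField<T> {
    pub fn some(value: T) -> Self {
        MessageField(Some(Box::new(value)))
    }
    pub fn none() -> Self {
        MessageField(None)
    }
    pub fn is_some(&self) -> bool {
        self.0.is_some()
    }
}

/// Hypothetical placeholder for the bookkeeping fields generated code
/// carries (unknown fields, cached size).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct SpecialFields {
    pub unknown: Vec<u8>,
    pub cached_size: u32,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct ThrottlingData {
    pub periods: u64,
    pub throttled_periods: u64,
    pub throttled_time: u64,
    pub special_fields: SpecialFields,
}

#[derive(Debug, Default, Clone, PartialEq)]
pub struct CpuStats {
    pub throttling_data: MessageField<ThrottlingData>,
    pub special_fields: SpecialFields,
}

/// Mirrors the shape of `get_cpu_stats`/`get_stats`: only meaningful
/// fields are named; `..Default::default()` fills in the rest.
pub fn cpu_stats_from(periods: u64, throttled: u64, time: u64) -> MessageField<CpuStats> {
    let throttling_data = MessageField::some(ThrottlingData {
        periods,
        throttled_periods: throttled,
        throttled_time: time,
        ..Default::default()
    });
    MessageField::some(CpuStats {
        throttling_data,
        ..Default::default()
    })
}
```

A side benefit of `..Default::default()` is that the construction sites no longer need to change if the generated structs gain or rename bookkeeping fields.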
@@ -199,7 +195,7 @@ impl CgroupManager for Manager {
if guest_cpuset.is_empty() {
return Ok(());
}
info!(sl!(), "update_cpuset_path to: {}", guest_cpuset);
info!(sl(), "update_cpuset_path to: {}", guest_cpuset);
let h = cgroups::hierarchies::auto();
let root_cg = h.root_control_group();
@@ -207,12 +203,12 @@ impl CgroupManager for Manager {
let root_cpuset_controller: &CpuSetController = root_cg.controller_of().unwrap();
let path = root_cpuset_controller.path();
let root_path = Path::new(path);
info!(sl!(), "root cpuset path: {:?}", &path);
info!(sl(), "root cpuset path: {:?}", &path);
let container_cpuset_controller: &CpuSetController = self.cgroup.controller_of().unwrap();
let path = container_cpuset_controller.path();
let container_path = Path::new(path);
info!(sl!(), "container cpuset path: {:?}", &path);
info!(sl(), "container cpuset path: {:?}", &path);
let mut paths = vec![];
for ancestor in container_path.ancestors() {
@@ -221,7 +217,7 @@ impl CgroupManager for Manager {
}
paths.push(ancestor);
}
info!(sl!(), "parent paths to update cpuset: {:?}", &paths);
info!(sl(), "parent paths to update cpuset: {:?}", &paths);
let mut i = paths.len();
loop {
@@ -235,7 +231,7 @@ impl CgroupManager for Manager {
.to_str()
.unwrap()
.trim_start_matches(root_path.to_str().unwrap());
info!(sl!(), "updating cpuset for parent path {:?}", &r_path);
info!(sl(), "updating cpuset for parent path {:?}", &r_path);
let cg = new_cgroup(cgroups::hierarchies::auto(), r_path)?;
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
cpuset_controller.set_cpus(guest_cpuset)?;
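The cpuset update above walks the container's cgroup path upwards, then writes `guest_cpuset` into each parent starting from the top. Part of the loop body is elided in this diff, so the helper below is a hypothetical sketch of that walk. It assumes the parents are collected strictly between the root cpuset and the container, then reported top-down relative to the root (matching the `trim_start_matches` call). Widening parents first is needed because, in cgroup v1, a child's `cpuset.cpus` must be a subset of its parent's.

```rust
use std::path::Path;

/// Hypothetical helper: list cgroup directories strictly between `root`
/// and `container`, ordered from the one nearest the root down to the
/// container's direct parent, each expressed relative to `root`.
pub fn parent_paths_top_down(root: &Path, container: &Path) -> Vec<String> {
    let mut paths = Vec::new();
    // `ancestors()` yields `container` itself first, so skip it.
    for ancestor in container.ancestors().skip(1) {
        if ancestor == root || !ancestor.starts_with(root) {
            break;
        }
        paths.push(ancestor);
    }
    // Update parents before children: iterate from the top of the tree down.
    paths
        .iter()
        .rev()
        .filter_map(|p| p.strip_prefix(root).ok())
        .map(|rel| format!("/{}", rel.display()))
        .collect()
}
```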
@@ -243,7 +239,7 @@ impl CgroupManager for Manager {
if !container_cpuset.is_empty() {
info!(
sl!(),
sl(),
"updating cpuset for container path: {:?} cpuset: {}",
&container_path,
container_cpuset
@@ -278,7 +274,7 @@ fn set_network_resources(
network: &LinuxNetwork,
res: &mut cgroups::Resources,
) {
info!(sl!(), "cgroup manager set network");
info!(sl(), "cgroup manager set network");
// set classid
// description can be found at https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v1/net_cls.html
@@ -305,7 +301,7 @@ fn set_devices_resources(
device_resources: &[LinuxDeviceCgroup],
res: &mut cgroups::Resources,
) {
info!(sl!(), "cgroup manager set devices");
info!(sl(), "cgroup manager set devices");
let mut devices = vec![];
for d in device_resources.iter() {
@@ -334,7 +330,7 @@ fn set_hugepages_resources(
hugepage_limits: &[LinuxHugepageLimit],
res: &mut cgroups::Resources,
) {
info!(sl!(), "cgroup manager set hugepage");
info!(sl(), "cgroup manager set hugepage");
let mut limits = vec![];
let hugetlb_controller = cg.controller_of::<HugeTlbController>();
@@ -348,7 +344,7 @@ fn set_hugepages_resources(
limits.push(hr);
} else {
warn!(
sl!(),
sl(),
"{} page size support cannot be verified, dropping requested limit", l.page_size
);
}
@@ -361,7 +357,7 @@ fn set_block_io_resources(
blkio: &LinuxBlockIo,
res: &mut cgroups::Resources,
) {
info!(sl!(), "cgroup manager set block io");
info!(sl(), "cgroup manager set block io");
res.blkio.weight = blkio.weight;
res.blkio.leaf_weight = blkio.leaf_weight;
@@ -389,13 +385,13 @@ fn set_block_io_resources(
}
fn set_cpu_resources(cg: &cgroups::Cgroup, cpu: &LinuxCpu) -> Result<()> {
info!(sl!(), "cgroup manager set cpu");
info!(sl(), "cgroup manager set cpu");
let cpuset_controller: &CpuSetController = cg.controller_of().unwrap();
if !cpu.cpus.is_empty() {
if let Err(e) = cpuset_controller.set_cpus(&cpu.cpus) {
warn!(sl!(), "write cpuset failed: {:?}", e);
warn!(sl(), "write cpuset failed: {:?}", e);
}
}
@@ -426,7 +422,7 @@ fn set_cpu_resources(cg: &cgroups::Cgroup, cpu: &LinuxCpu) -> Result<()> {
}
fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool) -> Result<()> {
info!(sl!(), "cgroup manager set memory");
info!(sl(), "cgroup manager set memory");
let mem_controller: &MemController = cg.controller_of().unwrap();
if !update {
@@ -446,14 +442,14 @@ fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool
let memstat = get_memory_stats(cg)
.into_option()
.ok_or_else(|| anyhow!("failed to get the cgroup memory stats"))?;
let memusage = memstat.get_usage();
let memusage = memstat.usage();
// When updating the memory limit, the kernel checks the current memory limit
// against the new swap setting. If the current memory limit is larger than
// the new swap, set the limit first; otherwise the kernel would complain and
// refuse the write. Conversely, if the current memory limit is smaller than
// the new swap, set the swap first and then set the memory limit.
if swap == -1 || memusage.get_limit() < swap as u64 {
if swap == -1 || memusage.limit() < swap as u64 {
mem_controller.set_memswap_limit(swap)?;
set_resource!(mem_controller, set_limit, memory, limit);
} else {
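The branch above orders two writes so that the cgroup v1 memory+swap limit never falls below the plain memory limit, which the kernel rejects. The `else` arm is truncated in this diff, so the helper below is a hypothetical sketch that assumes it writes the memory limit first and the memsw limit second, as the preceding comment describes. `MemWrite` and `update_order` are illustrative names, not agent code.

```rust
/// Which cgroup v1 memory limit file is written, and with what value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemWrite {
    MemswLimit(i64),
    Limit(i64),
}

/// Hypothetical helper mirroring the branch above: `swap == -1` means
/// unlimited swap, and `current_limit` is the limit currently in force.
pub fn update_order(current_limit: u64, new_limit: i64, swap: i64) -> [MemWrite; 2] {
    if swap == -1 || current_limit < swap as u64 {
        // Raising memsw (or lifting it) first keeps memsw >= limit.
        [MemWrite::MemswLimit(swap), MemWrite::Limit(new_limit)]
    } else {
        // Shrink the memory limit first, then memsw.
        [MemWrite::Limit(new_limit), MemWrite::MemswLimit(swap)]
    }
}
```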
@@ -495,7 +491,7 @@ fn set_memory_resources(cg: &cgroups::Cgroup, memory: &LinuxMemory, update: bool
}
fn set_pids_resources(cg: &cgroups::Cgroup, pids: &LinuxPids) -> Result<()> {
info!(sl!(), "cgroup manager set pids");
info!(sl(), "cgroup manager set pids");
let pid_controller: &PidController = cg.controller_of().unwrap();
let v = if pids.limit > 0 {
MaxValue::Value(pids.limit)
@@ -657,21 +653,20 @@ lazy_static! {
};
}
fn get_cpu_stats(cg: &cgroups::Cgroup) -> SingularPtrField<ThrottlingData> {
fn get_cpu_stats(cg: &cgroups::Cgroup) -> MessageField<ThrottlingData> {
let cpu_controller: &CpuController = get_controller_or_return_singular_none!(cg);
let stat = cpu_controller.cpu().stat;
let h = lines_to_map(&stat);
SingularPtrField::some(ThrottlingData {
MessageField::some(ThrottlingData {
periods: *h.get("nr_periods").unwrap_or(&0),
throttled_periods: *h.get("nr_throttled").unwrap_or(&0),
throttled_time: *h.get("throttled_time").unwrap_or(&0),
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}
fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> MessageField<CpuUsage> {
if let Some(cpuacct_controller) = cg.controller_of::<CpuAcctController>() {
let cpuacct = cpuacct_controller.cpuacct();
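`get_cpu_stats` above turns the cgroup `cpu.stat` text into a map with `lines_to_map` and reads `nr_periods`, `nr_throttled` and `throttled_time`, defaulting missing keys to 0. The body of `lines_to_map` is not part of this diff, so this is a hypothetical std-only stand-in that assumes whitespace-separated "key value" lines and skips lines whose value is not a `u64`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `lines_to_map`: parse "key value" lines,
/// as found in a cgroup `cpu.stat` file, into a key -> u64 map.
pub fn lines_to_map(content: &str) -> HashMap<String, u64> {
    content
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let key = fields.next()?;
            let value = fields.next()?.parse::<u64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}
```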
@@ -685,13 +680,12 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
let percpu_usage = line_to_vec(&cpuacct.usage_percpu);
return SingularPtrField::some(CpuUsage {
return MessageField::some(CpuUsage {
total_usage,
percpu_usage,
usage_in_kernelmode,
usage_in_usermode,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
}
@@ -704,17 +698,16 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> SingularPtrField<CpuUsage> {
let total_usage = *h.get("usage_usec").unwrap_or(&0);
let percpu_usage = vec![];
SingularPtrField::some(CpuUsage {
MessageField::some(CpuUsage {
total_usage,
percpu_usage,
usage_in_kernelmode,
usage_in_usermode,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}
fn get_memory_stats(cg: &cgroups::Cgroup) -> SingularPtrField<MemoryStats> {
fn get_memory_stats(cg: &cgroups::Cgroup) -> MessageField<MemoryStats> {
let memory_controller: &MemController = get_controller_or_return_singular_none!(cg);
// cache from memory stat
@@ -726,52 +719,48 @@ fn get_memory_stats(cg: &cgroups::Cgroup) -> SingularPtrField<MemoryStats> {
let use_hierarchy = value == 1;
// get memory data
let usage = SingularPtrField::some(MemoryData {
let usage = MessageField::some(MemoryData {
usage: memory.usage_in_bytes,
max_usage: memory.max_usage_in_bytes,
failcnt: memory.fail_cnt,
limit: memory.limit_in_bytes as u64,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
// get swap usage
let memswap = memory_controller.memswap();
let swap_usage = SingularPtrField::some(MemoryData {
let swap_usage = MessageField::some(MemoryData {
usage: memswap.usage_in_bytes,
max_usage: memswap.max_usage_in_bytes,
failcnt: memswap.fail_cnt,
limit: memswap.limit_in_bytes as u64,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
// get kernel usage
let kmem_stat = memory_controller.kmem_stat();
let kernel_usage = SingularPtrField::some(MemoryData {
let kernel_usage = MessageField::some(MemoryData {
usage: kmem_stat.usage_in_bytes,
max_usage: kmem_stat.max_usage_in_bytes,
failcnt: kmem_stat.fail_cnt,
limit: kmem_stat.limit_in_bytes as u64,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
SingularPtrField::some(MemoryStats {
MessageField::some(MemoryStats {
cache,
usage,
swap_usage,
kernel_usage,
use_hierarchy,
stats: memory.stat.raw,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}
fn get_pids_stats(cg: &cgroups::Cgroup) -> SingularPtrField<PidsStats> {
fn get_pids_stats(cg: &cgroups::Cgroup) -> MessageField<PidsStats> {
let pid_controller: &PidController = get_controller_or_return_singular_none!(cg);
let current = pid_controller.get_pid_current().unwrap_or(0);
@@ -785,11 +774,10 @@ fn get_pids_stats(cg: &cgroups::Cgroup) -> SingularPtrField<PidsStats> {
},
} as u64;
SingularPtrField::some(PidsStats {
MessageField::some(PidsStats {
current,
limit,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}
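The pids code converts between an OCI `limit` (where a non-positive value means "no limit") and the cgroup's `MaxValue` ("max" or a number), then reports the limit as `u64`. The match in `get_pids_stats` is only partly visible here, so the sketch below assumes that both "max" and a read error are reported as 0. `MaxValue` is a local stand-in mirroring the cgroups-rs enum, not an import of it.

```rust
/// Stand-in for cgroups-rs `MaxValue`: either the literal "max" or a number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MaxValue {
    Max,
    Value(i64),
}

/// Request side (cf. `set_pids_resources`): non-positive limits mean "max".
pub fn limit_to_max_value(limit: i64) -> MaxValue {
    if limit > 0 {
        MaxValue::Value(limit)
    } else {
        MaxValue::Max
    }
}

/// Report side (cf. `get_pids_stats`): hypothetical mapping of "max" and
/// read errors to 0 before the value is returned as u64.
pub fn max_value_to_u64(v: Result<MaxValue, String>) -> u64 {
    match v {
        Ok(MaxValue::Value(n)) => n as u64,
        Ok(MaxValue::Max) | Err(_) => 0,
    }
}
```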
@@ -825,8 +813,8 @@ https://github.com/opencontainers/runc/blob/a5847db387ae28c0ca4ebe4beee1a76900c8
Total 0
*/
fn get_blkio_stat_blkiodata(blkiodata: &[BlkIoData]) -> RepeatedField<BlkioStatsEntry> {
let mut m = RepeatedField::new();
fn get_blkio_stat_blkiodata(blkiodata: &[BlkIoData]) -> Vec<BlkioStatsEntry> {
let mut m = Vec::new();
if blkiodata.is_empty() {
return m;
}
@@ -839,16 +827,15 @@ fn get_blkio_stat_blkiodata(blkiodata: &[BlkIoData]) -> RepeatedField<BlkioStats
minor: d.minor as u64,
op: op.clone(),
value: d.data,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
});
}
m
}
fn get_blkio_stat_ioservice(services: &[IoService]) -> RepeatedField<BlkioStatsEntry> {
let mut m = RepeatedField::new();
fn get_blkio_stat_ioservice(services: &[IoService]) -> Vec<BlkioStatsEntry> {
let mut m = Vec::new();
if services.is_empty() {
return m;
@@ -872,17 +859,16 @@ fn build_blkio_stats_entry(major: i16, minor: i16, op: &str, value: u64) -> Blki
minor: minor as u64,
op: op.to_string(),
value,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
}
}
fn get_blkio_stats_v2(cg: &cgroups::Cgroup) -> SingularPtrField<BlkioStats> {
fn get_blkio_stats_v2(cg: &cgroups::Cgroup) -> MessageField<BlkioStats> {
let blkio_controller: &BlkIoController = get_controller_or_return_singular_none!(cg);
let blkio = blkio_controller.blkio();
let mut resp = BlkioStats::new();
let mut blkio_stats = RepeatedField::new();
let mut blkio_stats = Vec::new();
let stat = blkio.io_stat;
for s in stat {
@@ -898,10 +884,10 @@ fn get_blkio_stats_v2(cg: &cgroups::Cgroup) -> SingularPtrField<BlkioStats> {
resp.io_service_bytes_recursive = blkio_stats;
SingularPtrField::some(resp)
MessageField::some(resp)
}
fn get_blkio_stats(cg: &cgroups::Cgroup) -> SingularPtrField<BlkioStats> {
fn get_blkio_stats(cg: &cgroups::Cgroup) -> MessageField<BlkioStats> {
if cg.v2() {
return get_blkio_stats_v2(cg);
}
@@ -934,7 +920,7 @@ fn get_blkio_stats(cg: &cgroups::Cgroup) -> SingularPtrField<BlkioStats> {
m.sectors_recursive = get_blkio_stat_blkiodata(&blkio.sectors_recursive);
}
SingularPtrField::some(m)
MessageField::some(m)
}
fn get_hugetlb_stats(cg: &cgroups::Cgroup) -> HashMap<String, HugetlbStats> {
@@ -958,8 +944,7 @@ fn get_hugetlb_stats(cg: &cgroups::Cgroup) -> HashMap<String, HugetlbStats> {
usage,
max_usage,
failcnt,
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
},
);
}
@@ -975,7 +960,7 @@ pub fn get_paths() -> Result<HashMap<String, String>> {
for l in fs::read_to_string(PATHS)?.lines() {
let fl: Vec<&str> = l.split(':').collect();
if fl.len() != 3 {
info!(sl!(), "Corrupted cgroup data!");
info!(sl(), "Corrupted cgroup data!");
continue;
}
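`get_paths` reads `/proc/self/cgroup`, whose lines have the form `hierarchy-ID:controller-list:cgroup-path`, and skips any line that does not split into exactly three fields. The rest of the loop is not shown, so the following is a hypothetical file-free sketch that maps every controller in the list to the line's path. It assumes the unified (v2) line, which has an empty controller list, maps to the empty key.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: parse /proc/self/cgroup-style content into a
/// controller -> cgroup path map.
pub fn parse_cgroup_paths(content: &str) -> HashMap<String, String> {
    let mut m = HashMap::new();
    for l in content.lines() {
        let fl: Vec<&str> = l.split(':').collect();
        if fl.len() != 3 {
            // Same guard as in the hunk: skip malformed lines.
            continue;
        }
        // A v1 line may name several controllers ("cpu,cpuacct"); an empty
        // list (the v2 line) yields a single empty key.
        for key in fl[1].split(',') {
            m.insert(key.to_string(), fl[2].to_string());
        }
    }
    m
}
```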
@@ -996,7 +981,7 @@ pub fn get_mounts(paths: &HashMap<String, String>) -> Result<HashMap<String, Str
let post: Vec<&str> = p[1].split(' ').collect();
if post.len() != 3 {
warn!(sl!(), "can't parse {} line {:?}", MOUNTS, l);
warn!(sl(), "can't parse {} line {:?}", MOUNTS, l);
continue;
}

@@ -3,7 +3,7 @@
// SPDX-License-Identifier: Apache-2.0
//
use protobuf::{CachedSize, SingularPtrField, UnknownFields};
use protobuf::MessageField;
use crate::cgroups::Manager as CgroupManager;
use crate::protocols::agent::{BlkioStats, CgroupStats, CpuStats, MemoryStats, PidsStats};
@@ -33,13 +33,12 @@ impl CgroupManager for Manager {
fn get_stats(&self) -> Result<CgroupStats> {
Ok(CgroupStats {
cpu_stats: SingularPtrField::some(CpuStats::default()),
memory_stats: SingularPtrField::some(MemoryStats::new()),
pids_stats: SingularPtrField::some(PidsStats::new()),
blkio_stats: SingularPtrField::some(BlkioStats::new()),
cpu_stats: MessageField::some(CpuStats::default()),
memory_stats: MessageField::some(MemoryStats::new()),
pids_stats: MessageField::some(PidsStats::new()),
blkio_stats: MessageField::some(BlkioStats::new()),
hugetlb_stats: HashMap::new(),
unknown_fields: UnknownFields::default(),
cached_size: CachedSize::default(),
..Default::default()
})
}

@@ -16,11 +16,9 @@ use inotify::{Inotify, WatchMask};
use tokio::io::AsyncReadExt;
use tokio::sync::mpsc::{channel, Receiver};
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger().new(o!("subsystem" => "cgroups_notifier"))
};
// Convenience function to obtain the scope logger.
fn sl() -> slog::Logger {
slog_scope::logger().new(o!("subsystem" => "cgroups_notifier"))
}
pub async fn notify_oom(cid: &str, cg_dir: String) -> Result<Receiver<String>> {
@@ -38,7 +36,7 @@ pub async fn notify_oom(cid: &str, cg_dir: String) -> Result<Receiver<String>> {
fn get_value_from_cgroup(path: &Path, key: &str) -> Result<i64> {
let content = fs::read_to_string(path)?;
info!(
sl!(),
sl(),
"get_value_from_cgroup file: {:?}, content: {}", &path, &content
);
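The v2 OOM notifier calls `get_value_from_cgroup(&event_control_path, "oom_kill")` on the cgroup's `memory.events` file and treats `oom.unwrap_or(0) > 0` as an OOM kill. The lookup itself is elided here, so this is a hypothetical file-free sketch that assumes `memory.events` holds one "key value" pair per line. It matches the key exactly, so `oom` and `oom_kill` are not confused.

```rust
/// Hypothetical, file-free version of `get_value_from_cgroup`: look up
/// `key` in "key value" content such as cgroup v2 `memory.events`.
pub fn value_from_content(content: &str, key: &str) -> Result<i64, String> {
    for line in content.lines() {
        let mut fields = line.split_whitespace();
        if fields.next() == Some(key) {
            return fields
                .next()
                .ok_or_else(|| format!("no value for key {}", key))?
                .parse::<i64>()
                .map_err(|e| format!("bad value for key {}: {}", key, e));
        }
    }
    Err(format!("key {} not found", key))
}
```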
@@ -67,11 +65,11 @@ async fn register_memory_event_v2(
let event_control_path = Path::new(&cg_dir).join(memory_event_name);
let cgroup_event_control_path = Path::new(&cg_dir).join(cgroup_event_name);
info!(
sl!(),
sl(),
"register_memory_event_v2 event_control_path: {:?}", &event_control_path
);
info!(
sl!(),
sl(),
"register_memory_event_v2 cgroup_event_control_path: {:?}", &cgroup_event_control_path
);
@@ -82,8 +80,8 @@ async fn register_memory_event_v2(
// The cgroup file system emits no `unix.IN_DELETE|unix.IN_DELETE_SELF` events, so watch for all processes exiting instead
let cg_wd = inotify.add_watch(&cgroup_event_control_path, WatchMask::MODIFY)?;
info!(sl!(), "ev_wd: {:?}", ev_wd);
info!(sl!(), "cg_wd: {:?}", cg_wd);
info!(sl(), "ev_wd: {:?}", ev_wd);
info!(sl(), "cg_wd: {:?}", cg_wd);
let (sender, receiver) = channel(100);
let containere_id = containere_id.to_string();
@@ -97,17 +95,17 @@ async fn register_memory_event_v2(
while let Some(event_or_error) = stream.next().await {
let event = event_or_error.unwrap();
info!(
sl!(),
sl(),
"container[{}] get event for container: {:?}", &containere_id, &event
);
// info!("is1: {}", event.wd == wd1);
info!(sl!(), "event.wd: {:?}", event.wd);
info!(sl(), "event.wd: {:?}", event.wd);
if event.wd == ev_wd {
let oom = get_value_from_cgroup(&event_control_path, "oom_kill");
if oom.unwrap_or(0) > 0 {
let _ = sender.send(containere_id.clone()).await.map_err(|e| {
error!(sl!(), "send containere_id failed, error: {:?}", e);
error!(sl(), "send containere_id failed, error: {:?}", e);
});
return;
}
@@ -171,13 +169,13 @@ async fn register_memory_event(
let mut buf = [0u8; 8];
match eventfd_stream.read(&mut buf).await {
Err(err) => {
warn!(sl!(), "failed to read from eventfd: {:?}", err);
warn!(sl(), "failed to read from eventfd: {:?}", err);
return;
}
Ok(_) => {
let content = fs::read_to_string(path.clone());
info!(
sl!(),
sl(),
"cgroup event for container: {}, path: {:?}, content: {:?}",
&containere_id,
&path,
@@ -193,7 +191,7 @@ async fn register_memory_event(
}
let _ = sender.send(containere_id.clone()).await.map_err(|e| {
error!(sl!(), "send containere_id failed, error: {:?}", e);
error!(sl(), "send containere_id failed, error: {:?}", e);
});
}
});

@@ -6,7 +6,10 @@
pub const DEFAULT_SLICE: &str = "system.slice";
pub const SLICE_SUFFIX: &str = ".slice";
pub const SCOPE_SUFFIX: &str = ".scope";
pub const UNIT_MODE: &str = "replace";
pub const WHO_ENUM_ALL: &str = "all";
pub const SIGNAL_KILL: i32 = nix::sys::signal::SIGKILL as i32;
pub const UNIT_MODE_REPLACE: &str = "replace";
pub const NO_SUCH_UNIT_ERROR: &str = "org.freedesktop.systemd1.NoSuchUnit";
pub type Properties<'a> = Vec<(&'a str, zbus::zvariant::Value<'a>)>;

@@ -1,55 +1,50 @@
// Copyright 2021-2022 Kata Contributors
// Copyright 2021-2023 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
use std::vec;
use super::common::CgroupHierarchy;
use super::common::{Properties, SLICE_SUFFIX, UNIT_MODE};
use super::common::{
CgroupHierarchy, Properties, NO_SUCH_UNIT_ERROR, SIGNAL_KILL, SLICE_SUFFIX, UNIT_MODE_REPLACE,
WHO_ENUM_ALL,
};
use super::interface::system::ManagerProxyBlocking as SystemManager;
use anyhow::{Context, Result};
use anyhow::{anyhow, Context, Result};
use zbus::zvariant::Value;
pub trait SystemdInterface {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()>;
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()>;
fn stop_unit(&self, unit_name: &str) -> Result<()>;
fn start_unit(&self, pid: i32, parent: &str, cg_hierarchy: &CgroupHierarchy) -> Result<()>;
fn set_properties(&self, properties: &Properties) -> Result<()>;
fn kill_unit(&self) -> Result<()>;
fn freeze_unit(&self) -> Result<()>;
fn thaw_unit(&self) -> Result<()>;
fn add_process(&self, pid: i32) -> Result<()>;
fn get_version(&self) -> Result<String>;
fn unit_exist(&self, unit_name: &str) -> Result<bool>;
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()>;
fn unit_exists(&self) -> Result<bool>;
}
#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct DBusClient {}
pub struct DBusClient {
unit_name: String,
}
impl DBusClient {
pub fn new(unit_name: String) -> Self {
Self { unit_name }
}
fn build_proxy(&self) -> Result<SystemManager<'static>> {
let connection = zbus::blocking::Connection::system()?;
let proxy = SystemManager::new(&connection)?;
let connection =
zbus::blocking::Connection::system().context("Establishing a D-Bus connection")?;
let proxy = SystemManager::new(&connection).context("Building a D-Bus proxy manager")?;
Ok(proxy)
}
}
impl SystemdInterface for DBusClient {
fn start_unit(
&self,
pid: i32,
parent: &str,
unit_name: &str,
cg_hierarchy: &CgroupHierarchy,
) -> Result<()> {
fn start_unit(&self, pid: i32, parent: &str, cg_hierarchy: &CgroupHierarchy) -> Result<()> {
let proxy = self.build_proxy()?;
// enable CPUAccounting & MemoryAccounting & (Block)IOAccounting by default
@@ -67,7 +62,7 @@ impl SystemdInterface for DBusClient {
CgroupHierarchy::Unified => properties.push(("BlockIOAccounting", Value::Bool(true))),
}
if unit_name.ends_with(SLICE_SUFFIX) {
if self.unit_name.ends_with(SLICE_SUFFIX) {
properties.push(("Wants", Value::Str(parent.into())));
} else {
properties.push(("Slice", Value::Str(parent.into())));
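In `start_unit`, the parent is attached differently depending on the unit type: a transient slice gets a `Wants` dependency on `parent`, while any other unit (for example a scope) is placed inside it with `Slice`. Below is a std-only sketch of that branch. `PropValue` is a simplified stand-in for `zbus::zvariant::Value`, and `parent_property` is a hypothetical helper name.

```rust
const SLICE_SUFFIX: &str = ".slice";

/// Simplified stand-in for a zbus `Value`: only the string case is needed.
#[derive(Debug, Clone, PartialEq)]
pub enum PropValue {
    Str(String),
}

/// Hypothetical helper mirroring the branch in `start_unit`: a slice unit
/// "Wants" its parent, anything else is placed in it via "Slice".
pub fn parent_property(unit_name: &str, parent: &str) -> (&'static str, PropValue) {
    if unit_name.ends_with(SLICE_SUFFIX) {
        ("Wants", PropValue::Str(parent.to_string()))
    } else {
        ("Slice", PropValue::Str(parent.to_string()))
    }
}
```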
@@ -75,27 +70,57 @@ impl SystemdInterface for DBusClient {
}
proxy
.start_transient_unit(unit_name, UNIT_MODE, &properties, &[])
.with_context(|| format!("failed to start transient unit {}", unit_name))?;
Ok(())
}
fn set_properties(&self, unit_name: &str, properties: &Properties) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.set_unit_properties(unit_name, true, properties)
.with_context(|| format!("failed to set unit properties {}", unit_name))?;
.start_transient_unit(&self.unit_name, UNIT_MODE_REPLACE, &properties, &[])
.context(format!("failed to start transient unit {}", self.unit_name))?;
Ok(())
}
fn stop_unit(&self, unit_name: &str) -> Result<()> {
fn set_properties(&self, properties: &Properties) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.stop_unit(unit_name, UNIT_MODE)
.with_context(|| format!("failed to stop unit {}", unit_name))?;
.set_unit_properties(&self.unit_name, true, properties)
.context(format!("failed to set unit {} properties", self.unit_name))?;
Ok(())
}
fn kill_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.kill_unit(&self.unit_name, WHO_ENUM_ALL, SIGNAL_KILL)
.or_else(|e| match e {
zbus::Error::MethodError(error_name, _, _)
if error_name.as_str() == NO_SUCH_UNIT_ERROR =>
{
Ok(())
}
_ => Err(e),
})
.context(format!("failed to kill unit {}", self.unit_name))?;
Ok(())
}
fn freeze_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.freeze_unit(&self.unit_name)
.context(format!("failed to freeze unit {}", self.unit_name))?;
Ok(())
}
fn thaw_unit(&self) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.thaw_unit(&self.unit_name)
.context(format!("failed to thaw unit {}", self.unit_name))?;
Ok(())
}
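`kill_unit` and `unit_exists` both special-case the `org.freedesktop.systemd1.NoSuchUnit` D-Bus error: killing a missing unit counts as success, and a missing unit makes `unit_exists` return `Ok(false)`, while any other error is propagated. The sketch below reproduces that control flow with a simplified `BusError` stand-in for `zbus::Error` that carries only the error name; it is an illustration, not the zbus type.

```rust
const NO_SUCH_UNIT_ERROR: &str = "org.freedesktop.systemd1.NoSuchUnit";

/// Simplified stand-in for `zbus::Error`, keeping only the D-Bus error name.
#[derive(Debug, Clone, PartialEq)]
pub enum BusError {
    MethodError(String),
    Other(String),
}

/// Same shape as `kill_unit`: a missing unit is treated as already gone.
pub fn ignore_no_such_unit(res: Result<(), BusError>) -> Result<(), BusError> {
    res.or_else(|e| match e {
        BusError::MethodError(ref name) if name == NO_SUCH_UNIT_ERROR => Ok(()),
        _ => Err(e),
    })
}

/// Same shape as `unit_exists`: NoSuchUnit means "does not exist"; other
/// errors are reported instead of being mistaken for absence.
pub fn unit_exists_from<T>(res: Result<T, BusError>) -> Result<bool, String> {
    match res {
        Ok(_) => Ok(true),
        Err(BusError::MethodError(name)) if name == NO_SUCH_UNIT_ERROR => Ok(false),
        Err(e) => Err(format!("failed to check if unit exists: {:?}", e)),
    }
}
```

Compared with the previous `get_unit(..).is_ok()`, this distinguishes "unit absent" from "D-Bus call failed", so a broken bus connection no longer reads as a missing unit.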
@@ -104,22 +129,37 @@ impl SystemdInterface for DBusClient {
let systemd_version = proxy
.version()
.with_context(|| "failed to get systemd version".to_string())?;
.context("failed to get systemd version".to_string())?;
Ok(systemd_version)
}
fn unit_exist(&self, unit_name: &str) -> Result<bool> {
fn unit_exists(&self) -> Result<bool> {
let proxy = self.build_proxy()?;
Ok(proxy.get_unit(unit_name).is_ok())
match proxy.get_unit(&self.unit_name) {
Ok(_) => Ok(true),
Err(zbus::Error::MethodError(error_name, _, _))
if error_name.as_str() == NO_SUCH_UNIT_ERROR =>
{
Ok(false)
}
Err(e) => Err(anyhow!(format!(
"failed to check if unit {} exists: {:?}",
self.unit_name, e
))),
}
}
fn add_process(&self, pid: i32, unit_name: &str) -> Result<()> {
fn add_process(&self, pid: i32) -> Result<()> {
let proxy = self.build_proxy()?;
proxy
.attach_processes_to_unit(unit_name, "/", &[pid as u32])
.with_context(|| format!("failed to add process {}", unit_name))?;
.attach_processes_to_unit(&self.unit_name, "/", &[pid as u32])
.context(format!(
"failed to add process into unit {}",
self.unit_name
))?;
Ok(())
}

@@ -1,4 +1,4 @@
// Copyright 2021-2022 Kata Contributors
// Copyright 2021-2023 Kata Contributors
//
// SPDX-License-Identifier: Apache-2.0
//
@@ -8,7 +8,7 @@
//! # DBus interface proxy for: `org.freedesktop.systemd1.Manager`
//!
//! This code was generated by `zbus-xmlgen` `2.0.1` from DBus introspection data.
//! This code was generated by `zbus-xmlgen` `3.1.1` from DBus introspection data.
//! Source: `Interface '/org/freedesktop/systemd1' from service 'org.freedesktop.systemd1' on system bus`.
//!
//! You may prefer to adapt it, instead of using it verbatim.
@@ -189,12 +189,14 @@ trait Manager {
) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitByInvocationID method
#[dbus_proxy(name = "GetUnitByInvocationID")]
fn get_unit_by_invocation_id(
&self,
invocation_id: &[u8],
) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitByPID method
#[dbus_proxy(name = "GetUnitByPID")]
fn get_unit_by_pid(&self, pid: u32) -> zbus::Result<zbus::zvariant::OwnedObjectPath>;
/// GetUnitFileLinks method
@@ -210,6 +212,7 @@ trait Manager {
fn halt(&self) -> zbus::Result<()>;
/// KExec method
#[dbus_proxy(name = "KExec")]
fn kexec(&self) -> zbus::Result<()>;
/// KillUnit method
@@ -330,6 +333,7 @@ trait Manager {
fn lookup_dynamic_user_by_name(&self, name: &str) -> zbus::Result<u32>;
/// LookupDynamicUserByUID method
#[dbus_proxy(name = "LookupDynamicUserByUID")]
fn lookup_dynamic_user_by_uid(&self, uid: u32) -> zbus::Result<String>;
/// MaskUnitFiles method
@@ -571,139 +575,139 @@ trait Manager {
fn ctrl_alt_del_burst_action(&self) -> zbus::Result<String>;
/// DefaultBlockIOAccounting property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultBlockIOAccounting")]
fn default_block_ioaccounting(&self) -> zbus::Result<bool>;
/// DefaultCPUAccounting property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultCPUAccounting")]
fn default_cpuaccounting(&self) -> zbus::Result<bool>;
/// DefaultLimitAS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitAS")]
fn default_limit_as(&self) -> zbus::Result<u64>;
/// DefaultLimitASSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitASSoft")]
fn default_limit_assoft(&self) -> zbus::Result<u64>;
/// DefaultLimitCORE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCORE")]
fn default_limit_core(&self) -> zbus::Result<u64>;
/// DefaultLimitCORESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCORESoft")]
fn default_limit_coresoft(&self) -> zbus::Result<u64>;
/// DefaultLimitCPU property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCPU")]
fn default_limit_cpu(&self) -> zbus::Result<u64>;
/// DefaultLimitCPUSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitCPUSoft")]
fn default_limit_cpusoft(&self) -> zbus::Result<u64>;
/// DefaultLimitDATA property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitDATA")]
fn default_limit_data(&self) -> zbus::Result<u64>;
/// DefaultLimitDATASoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitDATASoft")]
fn default_limit_datasoft(&self) -> zbus::Result<u64>;
/// DefaultLimitFSIZE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitFSIZE")]
fn default_limit_fsize(&self) -> zbus::Result<u64>;
/// DefaultLimitFSIZESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitFSIZESoft")]
fn default_limit_fsizesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitLOCKS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitLOCKS")]
fn default_limit_locks(&self) -> zbus::Result<u64>;
/// DefaultLimitLOCKSSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitLOCKSSoft")]
fn default_limit_lockssoft(&self) -> zbus::Result<u64>;
/// DefaultLimitMEMLOCK property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMEMLOCK")]
fn default_limit_memlock(&self) -> zbus::Result<u64>;
/// DefaultLimitMEMLOCKSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMEMLOCKSoft")]
fn default_limit_memlocksoft(&self) -> zbus::Result<u64>;
/// DefaultLimitMSGQUEUE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMSGQUEUE")]
fn default_limit_msgqueue(&self) -> zbus::Result<u64>;
/// DefaultLimitMSGQUEUESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitMSGQUEUESoft")]
fn default_limit_msgqueuesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNICE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNICE")]
fn default_limit_nice(&self) -> zbus::Result<u64>;
/// DefaultLimitNICESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNICESoft")]
fn default_limit_nicesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNOFILE property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNOFILE")]
fn default_limit_nofile(&self) -> zbus::Result<u64>;
/// DefaultLimitNOFILESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNOFILESoft")]
fn default_limit_nofilesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitNPROC property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNPROC")]
fn default_limit_nproc(&self) -> zbus::Result<u64>;
/// DefaultLimitNPROCSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitNPROCSoft")]
fn default_limit_nprocsoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRSS property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRSS")]
fn default_limit_rss(&self) -> zbus::Result<u64>;
/// DefaultLimitRSSSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRSSSoft")]
fn default_limit_rsssoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRTPRIO property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTPRIO")]
fn default_limit_rtprio(&self) -> zbus::Result<u64>;
/// DefaultLimitRTPRIOSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTPRIOSoft")]
fn default_limit_rtpriosoft(&self) -> zbus::Result<u64>;
/// DefaultLimitRTTIME property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTTIME")]
fn default_limit_rttime(&self) -> zbus::Result<u64>;
/// DefaultLimitRTTIMESoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitRTTIMESoft")]
fn default_limit_rttimesoft(&self) -> zbus::Result<u64>;
/// DefaultLimitSIGPENDING property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSIGPENDING")]
fn default_limit_sigpending(&self) -> zbus::Result<u64>;
/// DefaultLimitSIGPENDINGSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSIGPENDINGSoft")]
fn default_limit_sigpendingsoft(&self) -> zbus::Result<u64>;
/// DefaultLimitSTACK property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSTACK")]
fn default_limit_stack(&self) -> zbus::Result<u64>;
/// DefaultLimitSTACKSoft property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultLimitSTACKSoft")]
fn default_limit_stacksoft(&self) -> zbus::Result<u64>;
/// DefaultMemoryAccounting property
@@ -711,11 +715,11 @@ trait Manager {
fn default_memory_accounting(&self) -> zbus::Result<bool>;
/// DefaultOOMPolicy property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultOOMPolicy")]
fn default_oompolicy(&self) -> zbus::Result<String>;
/// DefaultRestartUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultRestartUSec")]
fn default_restart_usec(&self) -> zbus::Result<u64>;
/// DefaultStandardError property
@@ -731,7 +735,7 @@ trait Manager {
fn default_start_limit_burst(&self) -> zbus::Result<u32>;
/// DefaultStartLimitIntervalUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultStartLimitIntervalUSec")]
fn default_start_limit_interval_usec(&self) -> zbus::Result<u64>;
/// DefaultTasksAccounting property
@@ -743,19 +747,19 @@ trait Manager {
fn default_tasks_max(&self) -> zbus::Result<u64>;
/// DefaultTimeoutAbortUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutAbortUSec")]
fn default_timeout_abort_usec(&self) -> zbus::Result<u64>;
/// DefaultTimeoutStartUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutStartUSec")]
fn default_timeout_start_usec(&self) -> zbus::Result<u64>;
/// DefaultTimeoutStopUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimeoutStopUSec")]
fn default_timeout_stop_usec(&self) -> zbus::Result<u64>;
/// DefaultTimerAccuracyUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "DefaultTimerAccuracyUSec")]
fn default_timer_accuracy_usec(&self) -> zbus::Result<u64>;
/// Environment property
@@ -803,65 +807,64 @@ trait Manager {
fn generators_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsFinishTimestamp")]
fn init_rdgenerators_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsFinishTimestampMonotonic")]
fn init_rdgenerators_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsStartTimestamp")]
fn init_rdgenerators_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDGeneratorsStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDGeneratorsStartTimestampMonotonic")]
fn init_rdgenerators_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDSecurityFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityFinishTimestamp")]
fn init_rdsecurity_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDSecurityFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityFinishTimestampMonotonic")]
fn init_rdsecurity_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDSecurityStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityStartTimestamp")]
fn init_rdsecurity_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDSecurityStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDSecurityStartTimestampMonotonic")]
fn init_rdsecurity_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDTimestamp")]
fn init_rdtimestamp(&self) -> zbus::Result<u64>;
/// InitRDTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDTimestampMonotonic")]
fn init_rdtimestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadFinishTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadFinishTimestamp")]
fn init_rdunits_load_finish_timestamp(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadFinishTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadFinishTimestampMonotonic")]
fn init_rdunits_load_finish_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadStartTimestamp property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadStartTimestamp")]
fn init_rdunits_load_start_timestamp(&self) -> zbus::Result<u64>;
/// InitRDUnitsLoadStartTimestampMonotonic property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "InitRDUnitsLoadStartTimestampMonotonic")]
fn init_rdunits_load_start_timestamp_monotonic(&self) -> zbus::Result<u64>;
/// KExecWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "KExecWatchdogUSec")]
fn kexec_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_kexec_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// KernelTimestamp property
@@ -883,33 +886,31 @@ trait Manager {
/// LogLevel property
#[dbus_proxy(property)]
fn log_level(&self) -> zbus::Result<String>;
#[dbus_proxy(property)]
fn set_log_level(&self, value: &str) -> zbus::Result<()>;
/// LogTarget property
#[dbus_proxy(property)]
fn log_target(&self) -> zbus::Result<String>;
#[dbus_proxy(property)]
fn set_log_target(&self, value: &str) -> zbus::Result<()>;
/// NFailedJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NFailedJobs")]
fn nfailed_jobs(&self) -> zbus::Result<u32>;
/// NFailedUnits property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NFailedUnits")]
fn nfailed_units(&self) -> zbus::Result<u32>;
/// NInstalledJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NInstalledJobs")]
fn ninstalled_jobs(&self) -> zbus::Result<u32>;
/// NJobs property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NJobs")]
fn njobs(&self) -> zbus::Result<u32>;
/// NNames property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "NNames")]
fn nnames(&self) -> zbus::Result<u32>;
/// Progress property
@@ -917,15 +918,13 @@ trait Manager {
fn progress(&self) -> zbus::Result<f64>;
/// RebootWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "RebootWatchdogUSec")]
fn reboot_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_reboot_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// RuntimeWatchdogUSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "RuntimeWatchdogUSec")]
fn runtime_watchdog_usec(&self) -> zbus::Result<u64>;
#[dbus_proxy(property)]
fn set_runtime_watchdog_usec(&self, value: u64) -> zbus::Result<()>;
/// SecurityFinishTimestamp property
@@ -947,7 +946,6 @@ trait Manager {
/// ServiceWatchdogs property
#[dbus_proxy(property)]
fn service_watchdogs(&self) -> zbus::Result<bool>;
#[dbus_proxy(property)]
fn set_service_watchdogs(&self, value: bool) -> zbus::Result<()>;
/// ShowStatus property
@@ -963,7 +961,7 @@ trait Manager {
fn tainted(&self) -> zbus::Result<String>;
/// TimerSlackNSec property
#[dbus_proxy(property)]
#[dbus_proxy(property, name = "TimerSlackNSec")]
fn timer_slack_nsec(&self) -> zbus::Result<u64>;
/// UnitPath property

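Most of the systemd `Manager` hunks above add an explicit `name = "..."` to `#[dbus_proxy(property)]`. One plausible reason, offered here as an assumption that the diff does not state, is that a PascalCase name derived mechanically from the Rust method name does not reproduce systemd's acronym-heavy spellings such as `NOFILE`, `InitRD` or `NFailed`. The sketch below approximates that kind of derivation with a hypothetical `snake_to_pascal` helper. It is not zbus's actual implementation.

```rust
/// Hypothetical helper approximating a mechanical snake_case -> PascalCase
/// conversion (an assumption about how a proxy macro could derive names).
fn snake_to_pascal(s: &str) -> String {
    s.split('_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first letter and keep the rest unchanged.
                Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

fn main() {
    // Compare the derived name with the spelling systemd actually uses.
    for (method, systemd_name) in [
        ("default_limit_nofilesoft", "DefaultLimitNOFILESoft"),
        ("init_rdgenerators_finish_timestamp", "InitRDGeneratorsFinishTimestamp"),
        ("nfailed_jobs", "NFailedJobs"),
    ] {
        let derived = snake_to_pascal(method);
        println!("{method}: derived {derived}, systemd expects {systemd_name}");
    }
}
```

In every row the derived name differs from the systemd property name, which is the mismatch an explicit `name` attribute avoids.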
@@ -5,7 +5,7 @@
use crate::cgroups::Manager as CgroupManager;
use crate::protocols::agent::CgroupStats;
use anyhow::Result;
use anyhow::{anyhow, Result};
use cgroups::freezer::FreezerState;
use libc::{self, pid_t};
use oci::LinuxResources;
@@ -29,7 +29,6 @@ pub struct Manager {
pub mounts: HashMap<String, String>,
pub cgroups_path: CgroupsPath,
pub cpath: String,
pub unit_name: String,
// dbus client for set properties
dbus_client: DBusClient,
// fs manager for get properties
@@ -40,14 +39,12 @@ pub struct Manager {
impl CgroupManager for Manager {
fn apply(&self, pid: pid_t) -> Result<()> {
let unit_name = self.unit_name.as_str();
if self.dbus_client.unit_exist(unit_name).unwrap() {
self.dbus_client.add_process(pid, self.unit_name.as_str())?;
if self.dbus_client.unit_exists()? {
self.dbus_client.add_process(pid)?;
} else {
self.dbus_client.start_unit(
(pid as u32).try_into().unwrap(),
self.cgroups_path.slice.as_str(),
self.unit_name.as_str(),
&self.cg_hierarchy,
)?;
}
@@ -66,8 +63,7 @@ impl CgroupManager for Manager {
Pids::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
CpuSet::apply(r, &mut properties, &self.cg_hierarchy, systemd_version_str)?;
self.dbus_client
.set_properties(self.unit_name.as_str(), &properties)?;
self.dbus_client.set_properties(&properties)?;
Ok(())
}
@@ -77,11 +73,15 @@ impl CgroupManager for Manager {
}
fn freeze(&self, state: FreezerState) -> Result<()> {
self.fs_manager.freeze(state)
match state {
FreezerState::Thawed => self.dbus_client.thaw_unit(),
FreezerState::Frozen => self.dbus_client.freeze_unit(),
_ => Err(anyhow!("Invalid FreezerState")),
}
}
fn destroy(&mut self) -> Result<()> {
self.dbus_client.stop_unit(self.unit_name.as_str())?;
self.dbus_client.kill_unit()?;
self.fs_manager.destroy()
}
@@ -120,8 +120,7 @@ impl Manager {
mounts: fs_manager.mounts.clone(),
cgroups_path,
cpath,
unit_name,
dbus_client: DBusClient {},
dbus_client: DBusClient::new(unit_name),
fs_manager,
cg_hierarchy: if cgroups::hierarchies::is_cgroup2_unified_mode() {
CgroupHierarchy::Unified

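In the systemd cgroup manager, `freeze` now dispatches through the D-Bus client instead of the filesystem manager. `DBusClient` also now holds the unit name itself (`DBusClient::new(unit_name)`), so calls such as `add_process(pid)` and `kill_unit()` no longer take it as an argument. The sketch below reproduces the `FreezerState` dispatch against a hypothetical `UnitFreezer` trait with a recording mock. `FreezerState` is a local stand-in for the `cgroups` crate enum, and the `"FreezeUnit"`/`"ThawUnit"` strings only label the recorded calls. They are an assumption about what the real client invokes.

```rust
use std::cell::RefCell;

/// Local stand-in for `cgroups::freezer::FreezerState`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FreezerState {
    Thawed,
    Freezing,
    Frozen,
}

/// Hypothetical trait for a D-Bus client that is already bound to one unit.
trait UnitFreezer {
    fn freeze_unit(&self) -> Result<(), String>;
    fn thaw_unit(&self) -> Result<(), String>;
}

/// Mirrors the new `freeze` match: only the two terminal states can be requested.
fn freeze(client: &impl UnitFreezer, state: FreezerState) -> Result<(), String> {
    match state {
        FreezerState::Thawed => client.thaw_unit(),
        FreezerState::Frozen => client.freeze_unit(),
        _ => Err("Invalid FreezerState".to_string()),
    }
}

/// Mock client that records which operation was requested.
#[derive(Default)]
struct RecordingClient {
    calls: RefCell<Vec<&'static str>>,
}

impl UnitFreezer for RecordingClient {
    fn freeze_unit(&self) -> Result<(), String> {
        self.calls.borrow_mut().push("FreezeUnit");
        Ok(())
    }
    fn thaw_unit(&self) -> Result<(), String> {
        self.calls.borrow_mut().push("ThawUnit");
        Ok(())
    }
}

fn main() {
    let client = RecordingClient::default();
    freeze(&client, FreezerState::Frozen).unwrap();
    freeze(&client, FreezerState::Thawed).unwrap();
    // The transitional state is rejected without touching the client.
    assert!(freeze(&client, FreezerState::Freezing).is_err());
    println!("{:?}", client.calls.borrow());
}
```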
@@ -71,7 +71,7 @@ impl Cpu {
}
// v2:
// cpu.shares <-> CPUShares
// cpu.shares <-> CPUWeight
// cpu.period <-> CPUQuotaPeriodUSec
// cpu.period & cpu.quota <-> CPUQuotaPerSecUSec
fn unified_apply(
@@ -80,8 +80,8 @@ impl Cpu {
systemd_version: &str,
) -> Result<()> {
if let Some(shares) = cpu_resources.shares {
let unified_shares = get_unified_cpushares(shares);
properties.push(("CPUShares", Value::U64(unified_shares)));
let weight = shares_to_weight(shares);
properties.push(("CPUWeight", Value::U64(weight)));
}
if let Some(period) = cpu_resources.period {
@@ -104,7 +104,7 @@ impl Cpu {
// ref: https://github.com/containers/crun/blob/main/crun.1.md#cgroup-v2
// [2-262144] to [1-10000]
fn get_unified_cpushares(shares: u64) -> u64 {
fn shares_to_weight(shares: u64) -> u64 {
if shares == 0 {
return 100;
}

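The cgroup v2 path now sets `CPUWeight` instead of `CPUShares` and converts the value with `shares_to_weight`. The hunk ends right after the zero check, so the rest of the function below is a sketch. It assumes the linear [2-262144] to [1-10000] mapping from the crun documentation cited in the code comment. Clamping out-of-range input is an added assumption.

```rust
/// Sketch of the cgroup v1 cpu.shares -> cgroup v2 cpu.weight conversion,
/// assuming crun's linear mapping from [2, 262144] onto [1, 10000].
fn shares_to_weight(shares: u64) -> u64 {
    if shares == 0 {
        // Unset shares: use the cgroup v2 default weight, as in the diff.
        return 100;
    }
    // Assumption: clamp into the valid cpu.shares range before converting.
    let shares = shares.clamp(2, 262_144);
    1 + ((shares - 2) * 9_999) / 262_142
}

fn main() {
    for shares in [0, 2, 1024, 262_144] {
        println!("cpu.shares={shares} -> CPUWeight={}", shares_to_weight(shares));
    }
}
```

Under this mapping the common cgroup v1 default of 1024 shares becomes a weight of 39. That is well below the cgroup v2 default of 100, so the two hierarchies do not line up one to one.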
@@ -48,7 +48,7 @@ use nix::unistd::{self, fork, ForkResult, Gid, Pid, Uid, User};
use std::os::unix::fs::MetadataExt;
use std::os::unix::io::AsRawFd;
use protobuf::SingularPtrField;
use protobuf::MessageField;
use oci::State as OCIState;
use regex::Regex;
@@ -80,6 +80,7 @@ const CLOG_FD: &str = "CLOG_FD";
const FIFO_FD: &str = "FIFO_FD";
const HOME_ENV_KEY: &str = "HOME";
const PIDNS_FD: &str = "PIDNS_FD";
const PIDNS_ENABLED: &str = "PIDNS_ENABLED";
const CONSOLE_SOCKET_FD: &str = "CONSOLE_SOCKET_FD";
#[derive(Debug)]
@@ -280,6 +281,17 @@ pub struct SyncPc {
pid: pid_t,
}
#[derive(Debug, Clone)]
pub struct PidNs {
enabled: bool,
fd: Option<i32>,
}
impl PidNs {
pub fn new(enabled: bool, fd: Option<i32>) -> Self {
Self { enabled, fd }
}
}
pub trait Container: BaseContainer {
fn pause(&mut self) -> Result<()>;
fn resume(&mut self) -> Result<()>;
@@ -339,16 +351,20 @@ fn do_init_child(cwfd: RawFd) -> Result<()> {
let crfd = std::env::var(CRFD_FD)?.parse::<i32>().unwrap();
let cfd_log = std::env::var(CLOG_FD)?.parse::<i32>().unwrap();
// get the pidns fd from parent, if parent had passed the pidns fd,
// then get it and join in this pidns; otherwise, create a new pidns
// by unshare from the parent pidns.
match std::env::var(PIDNS_FD) {
Ok(fd) => {
let pidns_fd = fd.parse::<i32>().context("get parent pidns fd")?;
sched::setns(pidns_fd, CloneFlags::CLONE_NEWPID).context("failed to join pidns")?;
let _ = unistd::close(pidns_fd);
if std::env::var(PIDNS_ENABLED)?.eq(format!("{}", true).as_str()) {
// get the pidns fd from parent, if parent had passed the pidns fd,
// then get it and join in this pidns; otherwise, create a new pidns
// by unshare from the parent pidns.
match std::env::var(PIDNS_FD) {
Ok(fd) => {
let pidns_fd = fd.parse::<i32>().context("get parent pidns fd")?;
sched::setns(pidns_fd, CloneFlags::CLONE_NEWPID).context("failed to join pidns")?;
let _ = unistd::close(pidns_fd);
}
Err(_e) => {
sched::unshare(CloneFlags::CLONE_NEWPID)?;
}
}
Err(_e) => sched::unshare(CloneFlags::CLONE_NEWPID)?,
}
match unsafe { fork() } {
@@ -875,7 +891,7 @@ impl BaseContainer for LinuxContainer {
// what about network interface stats?
Ok(StatsContainerResponse {
cgroup_stats: SingularPtrField::some(self.cgroup_manager.as_ref().get_stats()?),
cgroup_stats: MessageField::some(self.cgroup_manager.as_ref().get_stats()?),
..Default::default()
})
}
@@ -983,9 +999,13 @@ impl BaseContainer for LinuxContainer {
}
let pidns = get_pid_namespace(&self.logger, linux)?;
#[cfg(not(feature = "standard-oci-runtime"))]
if !pidns.enabled {
return Err(anyhow!("cannot find the pid ns"));
}
defer!(if let Some(pid) = pidns {
let _ = unistd::close(pid);
defer!(if let Some(fd) = pidns.fd {
let _ = unistd::close(fd);
});
let exec_path = std::env::current_exe()?;
@@ -1008,14 +1028,15 @@ impl BaseContainer for LinuxContainer {
.env(CRFD_FD, format!("{}", crfd))
.env(CWFD_FD, format!("{}", cwfd))
.env(CLOG_FD, format!("{}", cfd_log))
.env(CONSOLE_SOCKET_FD, console_name);
.env(CONSOLE_SOCKET_FD, console_name)
.env(PIDNS_ENABLED, format!("{}", pidns.enabled));
if p.init {
child = child.env(FIFO_FD, format!("{}", fifofd));
}
if pidns.is_some() {
child = child.env(PIDNS_FD, format!("{}", pidns.unwrap()));
if pidns.fd.is_some() {
child = child.env(PIDNS_FD, format!("{}", pidns.fd.unwrap()));
}
child.spawn()?;
@@ -1249,11 +1270,11 @@ pub fn update_namespaces(logger: &Logger, spec: &mut Spec, init_pid: RawFd) -> R
Ok(())
}
fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<Option<RawFd>> {
fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<PidNs> {
for ns in &linux.namespaces {
if ns.r#type == "pid" {
if ns.path.is_empty() {
return Ok(None);
return Ok(PidNs::new(true, None));
}
let fd =
@@ -1269,11 +1290,11 @@ fn get_pid_namespace(logger: &Logger, linux: &Linux) -> Result<Option<RawFd>> {
e
})?;
return Ok(Some(fd));
return Ok(PidNs::new(true, Some(fd)));
}
}
Err(anyhow!("cannot find the pid ns"))
Ok(PidNs::new(false, None))
}
fn is_userns_enabled(linux: &Linux) -> bool {
@@ -1596,10 +1617,8 @@ mod tests {
use tempfile::tempdir;
use test_utils::skip_if_not_root;
macro_rules! sl {
() => {
slog_scope::logger()
};
fn sl() -> slog::Logger {
slog_scope::logger()
}
#[test]
@@ -1854,7 +1873,7 @@ mod tests {
let _ = new_linux_container_and_then(|mut c: LinuxContainer| {
c.processes.insert(
1,
Process::new(&sl!(), &oci::Process::default(), "123", true, 1).unwrap(),
Process::new(&sl(), &oci::Process::default(), "123", true, 1).unwrap(),
);
let p = c.get_process("123");
assert!(p.is_ok(), "Expecting Ok, Got {:?}", p);
@@ -1881,7 +1900,7 @@ mod tests {
let (c, _dir) = new_linux_container();
let ret = c
.unwrap()
.start(Process::new(&sl!(), &oci::Process::default(), "123", true, 1).unwrap())
.start(Process::new(&sl(), &oci::Process::default(), "123", true, 1).unwrap())
.await;
assert!(ret.is_err(), "Expecting Err, Got {:?}", ret);
}
@@ -1891,7 +1910,7 @@ mod tests {
let (c, _dir) = new_linux_container();
let ret = c
.unwrap()
.run(Process::new(&sl!(), &oci::Process::default(), "123", true, 1).unwrap())
.run(Process::new(&sl(), &oci::Process::default(), "123", true, 1).unwrap())
.await;
assert!(ret.is_err(), "Expecting Err, Got {:?}", ret);
}

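With this change `get_pid_namespace` no longer errors when the spec has no pid namespace. It returns a `PidNs` whose `enabled` flag reaches the child through the `PIDNS_ENABLED` environment variable, and the child only joins or unshares a pid namespace when that flag is `"true"`. The sketch below mirrors that control flow with simplified, hypothetical types: `Namespace` stands in for the OCI namespace entry, and the `open_ns` closure replaces the real code's opening of the namespace path.

```rust
/// Same shape as the `PidNs` struct added in the diff.
#[derive(Debug, Clone, PartialEq)]
struct PidNs {
    enabled: bool,
    fd: Option<i32>,
}

/// Hypothetical, simplified stand-in for an OCI namespace entry.
struct Namespace {
    r#type: String,
    path: String,
}

/// Mirrors the new `get_pid_namespace` logic; `open_ns` is a hypothetical
/// callback that opens a namespace path and returns its fd.
fn get_pid_namespace<F>(namespaces: &[Namespace], open_ns: F) -> Result<PidNs, String>
where
    F: Fn(&str) -> Result<i32, String>,
{
    for ns in namespaces {
        if ns.r#type == "pid" {
            if ns.path.is_empty() {
                // A pid namespace is requested but there is none to join: create one.
                return Ok(PidNs { enabled: true, fd: None });
            }
            // Join an existing pid namespace through its fd.
            let fd = open_ns(&ns.path)?;
            return Ok(PidNs { enabled: true, fd: Some(fd) });
        }
    }
    // No pid namespace in the spec: previously an error, now "disabled".
    Ok(PidNs { enabled: false, fd: None })
}

/// The parent sets PIDNS_ENABLED with `format!("{}", bool)`; the child compares strings.
fn pidns_enabled_from_env(value: &str) -> bool {
    value == format!("{}", true)
}

fn main() {
    let fake_open = |_path: &str| -> Result<i32, String> { Ok(42) };
    let none: Vec<Namespace> = Vec::new();
    let new_ns = vec![Namespace { r#type: "pid".into(), path: String::new() }];
    let join_ns = vec![Namespace { r#type: "pid".into(), path: "/proc/1/ns/pid".into() }];

    println!("{:?}", get_pid_namespace(&none, fake_open));
    println!("{:?}", get_pid_namespace(&new_ns, fake_open));
    println!("{:?}", get_pid_namespace(&join_ns, fake_open));
    println!("PIDNS_ENABLED=true -> {}", pidns_enabled_from_env("true"));
}
```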
@@ -82,11 +82,11 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
let cap = p.Capabilities.as_ref().unwrap();
Some(oci::LinuxCapabilities {
bounding: cap.Bounding.clone().into_vec(),
effective: cap.Effective.clone().into_vec(),
inheritable: cap.Inheritable.clone().into_vec(),
permitted: cap.Permitted.clone().into_vec(),
ambient: cap.Ambient.clone().into_vec(),
bounding: cap.Bounding.clone(),
effective: cap.Effective.clone(),
inheritable: cap.Inheritable.clone(),
permitted: cap.Permitted.clone(),
ambient: cap.Ambient.clone(),
})
} else {
None
@@ -108,8 +108,8 @@ pub fn process_grpc_to_oci(p: &grpc::Process) -> oci::Process {
terminal: p.Terminal,
console_size,
user,
args: p.Args.clone().into_vec(),
env: p.Env.clone().into_vec(),
args: p.Args.clone(),
env: p.Env.clone(),
cwd: p.Cwd.clone(),
capabilities,
rlimits,
@@ -130,9 +130,9 @@ fn root_grpc_to_oci(root: &grpc::Root) -> oci::Root {
fn mount_grpc_to_oci(m: &grpc::Mount) -> oci::Mount {
oci::Mount {
destination: m.destination.clone(),
r#type: m.field_type.clone(),
r#type: m.type_.clone(),
source: m.source.clone(),
options: m.options.clone().into_vec(),
options: m.options.clone(),
}
}
@@ -143,8 +143,8 @@ fn hook_grpc_to_oci(h: &[grpcHook]) -> Vec<oci::Hook> {
for e in h.iter() {
r.push(oci::Hook {
path: e.Path.clone(),
args: e.Args.clone().into_vec(),
env: e.Env.clone().into_vec(),
args: e.Args.clone(),
env: e.Env.clone(),
timeout: Some(e.Timeout as i32),
});
}
@@ -359,7 +359,7 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
let mut args = Vec::new();
let errno_ret: u32 = if sys.has_errnoret() {
sys.get_errnoret()
sys.errnoret()
} else {
libc::EPERM as u32
};
@@ -374,7 +374,7 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
}
r.push(oci::LinuxSyscall {
names: sys.Names.clone().into_vec(),
names: sys.Names.clone(),
action: sys.Action.clone(),
errno_ret,
args,
@@ -385,8 +385,8 @@ fn seccomp_grpc_to_oci(sec: &grpc::LinuxSeccomp) -> oci::LinuxSeccomp {
oci::LinuxSeccomp {
default_action: sec.DefaultAction.clone(),
architectures: sec.Architectures.clone().into_vec(),
flags: sec.Flags.clone().into_vec(),
architectures: sec.Architectures.clone(),
flags: sec.Flags.clone(),
syscalls,
}
}
@@ -423,12 +423,18 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
let mut r = Vec::new();
for d in l.Devices.iter() {
// if the filemode for the device is 0 (unset), use a default value as runc does
let filemode = if d.FileMode != 0 {
Some(d.FileMode)
} else {
Some(0o666)
};
r.push(oci::LinuxDevice {
path: d.Path.clone(),
r#type: d.Type.clone(),
major: d.Major,
minor: d.Minor,
file_mode: Some(d.FileMode),
file_mode: filemode,
uid: Some(d.UID),
gid: Some(d.GID),
});
@@ -456,8 +462,8 @@ fn linux_grpc_to_oci(l: &grpc::Linux) -> oci::Linux {
devices,
seccomp,
rootfs_propagation: l.RootfsPropagation.clone(),
masked_paths: l.MaskedPaths.clone().into_vec(),
readonly_paths: l.ReadonlyPaths.clone().into_vec(),
masked_paths: l.MaskedPaths.clone(),
readonly_paths: l.ReadonlyPaths.clone(),
mount_label: l.MountLabel.clone(),
intel_rdt,
}
@@ -558,35 +564,30 @@ mod tests {
// All fields specified
grpcproc: grpc::Process {
Terminal: true,
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::some(grpc::Box {
ConsoleSize: protobuf::MessageField::<grpc::Box>::some(grpc::Box {
Height: 123,
Width: 456,
..Default::default()
}),
User: protobuf::SingularPtrField::<grpc::User>::some(grpc::User {
User: protobuf::MessageField::<grpc::User>::some(grpc::User {
UID: 1234,
GID: 5678,
AdditionalGids: Vec::from([910, 1112]),
Username: String::from("username"),
..Default::default()
}),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([String::from("env")])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env")]),
Cwd: String::from("cwd"),
Capabilities: protobuf::SingularPtrField::some(grpc::LinuxCapabilities {
Bounding: protobuf::RepeatedField::from(Vec::from([String::from("bnd")])),
Effective: protobuf::RepeatedField::from(Vec::from([String::from("eff")])),
Inheritable: protobuf::RepeatedField::from(Vec::from([String::from(
"inher",
)])),
Permitted: protobuf::RepeatedField::from(Vec::from([String::from("perm")])),
Ambient: protobuf::RepeatedField::from(Vec::from([String::from("amb")])),
Capabilities: protobuf::MessageField::some(grpc::LinuxCapabilities {
Bounding: Vec::from([String::from("bnd")]),
Effective: Vec::from([String::from("eff")]),
Inheritable: Vec::from([String::from("inher")]),
Permitted: Vec::from([String::from("perm")]),
Ambient: Vec::from([String::from("amb")]),
..Default::default()
}),
Rlimits: protobuf::RepeatedField::from(Vec::from([
Rlimits: Vec::from([
grpc::POSIXRlimit {
Type: String::from("r#type"),
Hard: 123,
@@ -599,7 +600,7 @@ mod tests {
Soft: 1011,
..Default::default()
},
])),
]),
NoNewPrivileges: true,
ApparmorProfile: String::from("apparmor profile"),
OOMScoreAdj: 123456,
@@ -649,7 +650,7 @@ mod tests {
TestData {
// None ConsoleSize
grpcproc: grpc::Process {
ConsoleSize: protobuf::SingularPtrField::<grpc::Box>::none(),
ConsoleSize: protobuf::MessageField::<grpc::Box>::none(),
OOMScoreAdj: 0,
..Default::default()
},
@@ -662,7 +663,7 @@ mod tests {
TestData {
// None User
grpcproc: grpc::Process {
User: protobuf::SingularPtrField::<grpc::User>::none(),
User: protobuf::MessageField::<grpc::User>::none(),
OOMScoreAdj: 0,
..Default::default()
},
@@ -680,7 +681,7 @@ mod tests {
TestData {
// None Capabilities
grpcproc: grpc::Process {
Capabilities: protobuf::SingularPtrField::none(),
Capabilities: protobuf::MessageField::none(),
OOMScoreAdj: 0,
..Default::default()
},
@@ -781,99 +782,57 @@ mod tests {
TestData {
// All specified
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([
Prestart: Vec::from([
grpc::Hook {
Path: String::from("prestartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("prestartpath2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Args: Vec::from([String::from("arg3"), String::from("arg4")]),
Env: Vec::from([String::from("env3"), String::from("env4")]),
Timeout: 25,
..Default::default()
},
])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
]),
Poststart: Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
Poststop: Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
CreateRuntime: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
CreateRuntime: Vec::from([grpc::Hook {
Path: String::from("createruntimepath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
CreateContainer: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
CreateContainer: Vec::from([grpc::Hook {
Path: String::from("createcontainerpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
StartContainer: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
StartContainer: Vec::from([grpc::Hook {
Path: String::from("startcontainerpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
}]),
..Default::default()
},
result: oci::Hooks {
@@ -926,72 +885,42 @@ mod tests {
TestData {
// Prestart empty
grpchooks: grpc::Hooks {
Prestart: protobuf::RepeatedField::from(Vec::from([])),
Poststart: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
Prestart: Vec::from([]),
Poststart: Vec::from([grpc::Hook {
Path: String::from("poststartpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
Poststop: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
Poststop: Vec::from([grpc::Hook {
Path: String::from("poststoppath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
CreateRuntime: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
CreateRuntime: Vec::from([grpc::Hook {
Path: String::from("createruntimepath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
CreateContainer: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
CreateContainer: Vec::from([grpc::Hook {
Path: String::from("createcontainerpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
StartContainer: protobuf::RepeatedField::from(Vec::from([grpc::Hook {
}]),
StartContainer: Vec::from([grpc::Hook {
Path: String::from("startcontainerpath"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
}])),
}]),
..Default::default()
},
result: oci::Hooks {
@@ -1063,11 +992,8 @@ mod tests {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([
String::from("option1"),
String::from("option2"),
])),
type_: String::from("fieldtype"),
options: Vec::from([String::from("option1"), String::from("option2")]),
..Default::default()
},
result: oci::Mount {
@@ -1081,8 +1007,8 @@ mod tests {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::new()),
type_: String::from("fieldtype"),
options: Vec::new(),
..Default::default()
},
result: oci::Mount {
@@ -1096,8 +1022,8 @@ mod tests {
grpcmount: grpc::Mount {
destination: String::new(),
source: String::from("source"),
field_type: String::from("fieldtype"),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
type_: String::from("fieldtype"),
options: Vec::from([String::from("option1")]),
..Default::default()
},
result: oci::Mount {
@@ -1111,8 +1037,8 @@ mod tests {
grpcmount: grpc::Mount {
destination: String::from("destination"),
source: String::from("source"),
field_type: String::new(),
options: protobuf::RepeatedField::from(Vec::from([String::from("option1")])),
type_: String::new(),
options: Vec::from([String::from("option1")]),
..Default::default()
},
result: oci::Mount {
@@ -1172,27 +1098,15 @@ mod tests {
grpchook: &[
grpc::Hook {
Path: String::from("path"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg1"),
String::from("arg2"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env1"),
String::from("env2"),
])),
Args: Vec::from([String::from("arg1"), String::from("arg2")]),
Env: Vec::from([String::from("env1"), String::from("env2")]),
Timeout: 10,
..Default::default()
},
grpc::Hook {
Path: String::from("path2"),
Args: protobuf::RepeatedField::from(Vec::from([
String::from("arg3"),
String::from("arg4"),
])),
Env: protobuf::RepeatedField::from(Vec::from([
String::from("env3"),
String::from("env4"),
])),
Args: Vec::from([String::from("arg3"), String::from("arg4")]),
Env: Vec::from([String::from("env3"), String::from("env4")]),
Timeout: 20,
..Default::default()
},

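The device conversion in `linux_grpc_to_oci` now substitutes `0o666` when the gRPC `FileMode` is 0 (unset), which the code comment says follows runc. Below is a minimal sketch of that defaulting. It assumes `FileMode` is a `u32`, because the field's type is not visible in the diff.

```rust
/// Default used when the incoming device file mode is 0 (unset).
const DEFAULT_DEVICE_FILE_MODE: u32 = 0o666;

/// Assumes the gRPC FileMode is a u32; returns the value for `file_mode`.
fn device_file_mode(grpc_file_mode: u32) -> Option<u32> {
    if grpc_file_mode != 0 {
        Some(grpc_file_mode)
    } else {
        Some(DEFAULT_DEVICE_FILE_MODE)
    }
}

fn main() {
    for mode in [0u32, 0o600] {
        println!("{mode:o} -> {:o}", device_file_mode(mode).unwrap());
    }
}
```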
@@ -35,7 +35,7 @@ use crate::log_child;
// struct is populated from the content in the /proc/<pid>/mountinfo file.
#[derive(std::fmt::Debug, PartialEq)]
pub struct Info {
mount_point: String,
pub mount_point: String,
optional: String,
fstype: String,
}
@@ -553,7 +553,7 @@ fn rootfs_parent_mount_private(path: &str) -> Result<()> {
// Parse /proc/self/mountinfo because comparing Dev and ino does not work from
// bind mounts
fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
pub fn parse_mount_table(mountinfo_path: &str) -> Result<Vec<Info>> {
let file = File::open(mountinfo_path)?;
let reader = BufReader::new(file);
let mut infos = Vec::new();
@@ -1118,6 +1118,7 @@ mod tests {
use std::fs::create_dir;
use std::fs::create_dir_all;
use std::fs::remove_dir_all;
use std::fs::remove_file;
use std::io;
use std::os::unix::fs;
use std::os::unix::io::AsRawFd;
@@ -1333,14 +1334,9 @@ mod tests {
fn test_mknod_dev() {
skip_if_not_root!();
let tempdir = tempdir().unwrap();
let olddir = unistd::getcwd().unwrap();
defer!(let _ = unistd::chdir(&olddir););
let _ = unistd::chdir(tempdir.path());
let path = "/dev/fifo-test";
let dev = oci::LinuxDevice {
path: "/fifo".to_string(),
path: path.to_string(),
r#type: "c".to_string(),
major: 0,
minor: 0,
@@ -1348,13 +1344,16 @@ mod tests {
uid: Some(unistd::getuid().as_raw()),
gid: Some(unistd::getgid().as_raw()),
};
let path = Path::new("fifo");
let ret = mknod_dev(&dev, path);
let ret = mknod_dev(&dev, Path::new(path));
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
let ret = stat::stat(path);
assert!(ret.is_ok(), "Should pass. Got: {:?}", ret);
// clear test device node
let ret = remove_file(path);
assert!(ret.is_ok(), "Should pass, Got: {:?}", ret);
}
#[test]

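This diff makes `parse_mount_table` and `Info::mount_point` public. The parser body is not shown, so the line parser below is a hypothetical sketch based on the `/proc/<pid>/mountinfo` layout documented in proc(5). The mount point is the fifth field, the optional fields run from the seventh field up to a lone `-`, and the filesystem type follows that separator. The function names are illustrative only.

```rust
/// Same fields as the `Info` struct in the diff.
#[derive(Debug, PartialEq)]
struct Info {
    mount_point: String,
    optional: String,
    fstype: String,
}

/// Hypothetical parser for one mountinfo line; returns None if the line is malformed.
fn parse_mountinfo_line(line: &str) -> Option<Info> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    // Optional fields start at index 6 and end at the "-" separator.
    let sep = fields.iter().skip(6).position(|f| *f == "-")? + 6;
    let mount_point = fields.get(4)?.to_string();
    let optional = fields[6..sep].join(" ");
    let fstype = fields.get(sep + 1)?.to_string();
    Some(Info { mount_point, optional, fstype })
}

/// Parses a whole mountinfo table, skipping lines that do not match the layout.
fn parse_mount_table_str(content: &str) -> Vec<Info> {
    content.lines().filter_map(parse_mountinfo_line).collect()
}

fn main() {
    // Example line taken from the proc(5) manual page.
    let table = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue\n\
                 22 1 8:1 / / rw - ext4 /dev/sda1 rw";
    for info in parse_mount_table_str(table) {
        println!("{info:?}");
    }
}
```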
@@ -161,7 +161,7 @@ impl Process {
pub fn notify_term_close(&mut self) {
let notify = self.term_exit_notifier.clone();
notify.notify_one();
notify.notify_waiters();
}
pub fn close_stdin(&mut self) {

@@ -5,7 +5,6 @@
use crate::rpc;
use anyhow::{bail, ensure, Context, Result};
use serde::Deserialize;
use std::collections::HashSet;
use std::env;
use std::fs;
use std::str::FromStr;
@@ -52,17 +51,6 @@ const ERR_INVALID_CONTAINER_PIPE_SIZE_PARAM: &str = "unable to parse container p
const ERR_INVALID_CONTAINER_PIPE_SIZE_KEY: &str = "invalid container pipe size key name";
const ERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str = "container pipe size should not be negative";
#[derive(Debug, Default, Deserialize)]
pub struct EndpointsConfig {
pub allowed: Vec<String>,
}
#[derive(Debug, Default)]
pub struct AgentEndpoints {
pub allowed: HashSet<String>,
pub all_allowed: bool,
}
#[derive(Debug)]
pub struct AgentConfig {
pub debug_console: bool,
@@ -75,7 +63,6 @@ pub struct AgentConfig {
pub server_addr: String,
pub unified_cgroup_hierarchy: bool,
pub tracing: bool,
pub endpoints: AgentEndpoints,
pub supports_seccomp: bool,
}
@@ -91,7 +78,6 @@ pub struct AgentConfigBuilder {
pub server_addr: Option<String>,
pub unified_cgroup_hierarchy: Option<bool>,
pub tracing: Option<bool>,
pub endpoints: Option<EndpointsConfig>,
}
macro_rules! config_override {
@@ -151,7 +137,6 @@ impl Default for AgentConfig {
server_addr: format!("{}:{}", VSOCK_ADDR, DEFAULT_AGENT_VSOCK_PORT),
unified_cgroup_hierarchy: false,
tracing: false,
endpoints: Default::default(),
supports_seccomp: rpc::have_seccomp(),
}
}
@@ -182,25 +167,19 @@ impl FromStr for AgentConfig {
config_override!(agent_config_builder, agent_config, unified_cgroup_hierarchy);
config_override!(agent_config_builder, agent_config, tracing);
// Populate the allowed endpoints hash set, if we got any from the config file.
if let Some(endpoints) = agent_config_builder.endpoints {
for ep in endpoints.allowed {
agent_config.endpoints.allowed.insert(ep);
}
}
Ok(agent_config)
}
}
impl AgentConfig {
#[instrument]
#[allow(clippy::redundant_closure_call)]
pub fn from_cmdline(file: &str, args: Vec<String>) -> Result<AgentConfig> {
// If config file specified in the args, generate our config from it
let config_position = args.iter().position(|a| a == "--config" || a == "-c");
if let Some(config_position) = config_position {
if let Some(config_file) = args.get(config_position + 1) {
return AgentConfig::from_config_file(config_file);
return AgentConfig::from_config_file(config_file).context("AgentConfig from args");
} else {
panic!("The config argument wasn't formed properly: {:?}", args);
}
@@ -216,7 +195,8 @@ impl AgentConfig {
// or if it can't be parsed properly.
if param.starts_with(format!("{}=", CONFIG_FILE).as_str()) {
let config_file = get_string_value(param)?;
return AgentConfig::from_config_file(&config_file);
return AgentConfig::from_config_file(&config_file)
.context("AgentConfig from kernel cmdline");
}
// parse cmdline flags
@@ -296,21 +276,15 @@ impl AgentConfig {
config.tracing = get_bool_value(&name_value)?;
}
// We did not get a configuration file: allow all endpoints.
config.endpoints.all_allowed = true;
Ok(config)
}
#[instrument]
pub fn from_config_file(file: &str) -> Result<AgentConfig> {
let config = fs::read_to_string(file)?;
let config = fs::read_to_string(file)
.with_context(|| format!("Failed to read config file {}", file))?;
AgentConfig::from_str(&config)
}
pub fn is_allowed_endpoint(&self, ep: &str) -> bool {
self.endpoints.all_allowed || self.endpoints.allowed.contains(ep)
}
}
#[instrument]
@@ -1375,26 +1349,13 @@ Caused by:
r#"
dev_mode = true
server_addr = 'vsock://8:2048'
[endpoints]
allowed = ["CreateContainer", "StartContainer"]
"#,
)
.unwrap();
// Verify that the all_allowed flag is false
assert!(!config.endpoints.all_allowed);
// Verify that the override worked
assert!(config.dev_mode);
assert_eq!(config.server_addr, "vsock://8:2048");
assert_eq!(
config.endpoints.allowed,
vec!["CreateContainer".to_string(), "StartContainer".to_string()]
.iter()
.cloned()
.collect()
);
// Verify that the default values are valid
assert_eq!(config.hotplug_timeout, DEFAULT_HOTPLUG_TIMEOUT);

@@ -26,20 +26,18 @@ use oci::{LinuxDeviceCgroup, LinuxResources, Spec};
use protocols::agent::Device;
use tracing::instrument;
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger().new(o!("subsystem" => "device"))
};
// Convenience function to obtain the scope logger.
fn sl() -> slog::Logger {
slog_scope::logger().new(o!("subsystem" => "device"))
}
const VM_ROOTFS: &str = "/";
const BLOCK: &str = "block";
pub const DRIVER_9P_TYPE: &str = "9p";
pub const DRIVER_VIRTIOFS_TYPE: &str = "virtio-fs";
pub const DRIVER_BLK_TYPE: &str = "blk";
pub const DRIVER_BLK_PCI_TYPE: &str = "blk";
pub const DRIVER_BLK_CCW_TYPE: &str = "blk-ccw";
pub const DRIVER_MMIO_BLK_TYPE: &str = "mmioblk";
pub const DRIVER_BLK_MMIO_TYPE: &str = "mmioblk";
pub const DRIVER_SCSI_TYPE: &str = "scsi";
pub const DRIVER_NVDIMM_TYPE: &str = "nvdimm";
pub const DRIVER_EPHEMERAL_TYPE: &str = "ephemeral";
@@ -78,7 +76,7 @@ where
{
let syspci = Path::new(&syspci);
let drv = drv.as_ref();
info!(sl!(), "rebind_pci_driver: {} => {:?}", dev, drv);
info!(sl(), "rebind_pci_driver: {} => {:?}", dev, drv);
let devpath = syspci.join("devices").join(dev.to_string());
let overridepath = &devpath.join("driver_override");
@@ -204,7 +202,7 @@ impl ScsiBlockMatcher {
impl UeventMatcher for ScsiBlockMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block" && uev.devpath.contains(&self.search) && !uev.devname.is_empty()
uev.subsystem == BLOCK && uev.devpath.contains(&self.search) && !uev.devname.is_empty()
}
}
@@ -238,7 +236,7 @@ impl VirtioBlkPciMatcher {
impl UeventMatcher for VirtioBlkPciMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block" && self.rex.is_match(&uev.devpath) && !uev.devname.is_empty()
uev.subsystem == BLOCK && self.rex.is_match(&uev.devpath) && !uev.devname.is_empty()
}
}
@@ -311,7 +309,7 @@ impl PmemBlockMatcher {
impl UeventMatcher for PmemBlockMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == "block"
uev.subsystem == BLOCK
&& uev.devpath.starts_with(ACPI_DEV_PATH)
&& uev.devpath.ends_with(&self.suffix)
&& !uev.devname.is_empty()
@@ -441,6 +439,48 @@ async fn wait_for_ap_device(sandbox: &Arc<Mutex<Sandbox>>, address: ap::Address)
Ok(())
}
#[derive(Debug)]
struct MmioBlockMatcher {
suffix: String,
}
impl MmioBlockMatcher {
fn new(devname: &str) -> MmioBlockMatcher {
MmioBlockMatcher {
suffix: format!(r"/block/{}", devname),
}
}
}
impl UeventMatcher for MmioBlockMatcher {
fn is_match(&self, uev: &Uevent) -> bool {
uev.subsystem == BLOCK && uev.devpath.ends_with(&self.suffix) && !uev.devname.is_empty()
}
}
#[instrument]
pub async fn get_virtio_mmio_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
devpath: &str,
) -> Result<()> {
let devname = devpath
.strip_prefix("/dev/")
.ok_or_else(|| anyhow!("Storage source '{}' must start with /dev/", devpath))?;
let matcher = MmioBlockMatcher::new(devname);
let uev = wait_for_uevent(sandbox, matcher)
.await
.context("failed to wait for uevent")?;
if uev.devname != devname {
return Err(anyhow!(
"Unexpected device name {} for mmio device (expected {})",
uev.devname,
devname
));
}
Ok(())
}
/// Scan SCSI bus for the given SCSI address(SCSI-Id and LUN)
#[instrument]
fn scan_scsi_bus(scsi_addr: &str) -> Result<()> {
@@ -564,7 +604,7 @@ fn update_spec_devices(spec: &mut Spec, mut updates: HashMap<&str, DevUpdate>) -
let host_minor = specdev.minor;
info!(
sl!(),
sl(),
"update_spec_devices() updating device";
"container_path" => &specdev.path,
"type" => &specdev.r#type,
@@ -615,7 +655,7 @@ fn update_spec_devices(spec: &mut Spec, mut updates: HashMap<&str, DevUpdate>) -
if let Some(update) = res_updates.get(&(r.r#type.as_str(), host_major, host_minor))
{
info!(
sl!(),
sl(),
"update_spec_devices() updating resource";
"type" => &r.r#type,
"host_major" => host_major,
@@ -676,12 +716,18 @@ pub fn update_env_pci(
#[instrument]
async fn virtiommio_blk_device_handler(
device: &Device,
_sandbox: &Arc<Mutex<Sandbox>>,
sandbox: &Arc<Mutex<Sandbox>>,
) -> Result<SpecUpdate> {
if device.vm_path.is_empty() {
return Err(anyhow!("Invalid path for virtio mmio blk device"));
}
if !Path::new(&device.vm_path).exists() {
get_virtio_mmio_device_name(sandbox, &device.vm_path.to_string())
.await
.context("failed to get mmio device name")?;
}
Ok(DevNumUpdate::from_vm_path(&device.vm_path)?.into())
}
@@ -759,7 +805,7 @@ async fn vfio_pci_device_handler(
device: &Device,
sandbox: &Arc<Mutex<Sandbox>>,
) -> Result<SpecUpdate> {
let vfio_in_guest = device.field_type != DRIVER_VFIO_PCI_GK_TYPE;
let vfio_in_guest = device.type_ != DRIVER_VFIO_PCI_GK_TYPE;
let mut pci_fixups = Vec::<(pci::Address, pci::Address)>::new();
let mut group = None;
@@ -873,10 +919,10 @@ pub async fn add_devices(
#[instrument]
async fn add_device(device: &Device, sandbox: &Arc<Mutex<Sandbox>>) -> Result<SpecUpdate> {
// log before validation to help with debugging gRPC protocol version differences.
info!(sl!(), "device-id: {}, device-type: {}, device-vm-path: {}, device-container-path: {}, device-options: {:?}",
device.id, device.field_type, device.vm_path, device.container_path, device.options);
info!(sl(), "device-id: {}, device-type: {}, device-vm-path: {}, device-container-path: {}, device-options: {:?}",
device.id, device.type_, device.vm_path, device.container_path, device.options);
if device.field_type.is_empty() {
if device.type_.is_empty() {
return Err(anyhow!("invalid type for device {:?}", device));
}
@@ -888,17 +934,17 @@ async fn add_device(device: &Device, sandbox: &Arc<Mutex<Sandbox>>) -> Result<Sp
return Err(anyhow!("invalid container path for device {:?}", device));
}
match device.field_type.as_str() {
DRIVER_BLK_TYPE => virtio_blk_device_handler(device, sandbox).await,
match device.type_.as_str() {
DRIVER_BLK_PCI_TYPE => virtio_blk_device_handler(device, sandbox).await,
DRIVER_BLK_CCW_TYPE => virtio_blk_ccw_device_handler(device, sandbox).await,
DRIVER_MMIO_BLK_TYPE => virtiommio_blk_device_handler(device, sandbox).await,
DRIVER_BLK_MMIO_TYPE => virtiommio_blk_device_handler(device, sandbox).await,
DRIVER_NVDIMM_TYPE => virtio_nvdimm_device_handler(device, sandbox).await,
DRIVER_SCSI_TYPE => virtio_scsi_device_handler(device, sandbox).await,
DRIVER_VFIO_PCI_GK_TYPE | DRIVER_VFIO_PCI_TYPE => {
vfio_pci_device_handler(device, sandbox).await
}
DRIVER_VFIO_AP_TYPE => vfio_ap_device_handler(device, sandbox).await,
_ => Err(anyhow!("Unknown device type {}", device.field_type)),
_ => Err(anyhow!("Unknown device type {}", device.type_)),
}
}
@@ -1394,7 +1440,7 @@ mod tests {
let mut uev = crate::uevent::Uevent::default();
uev.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev.subsystem = "block".to_string();
uev.subsystem = BLOCK.to_string();
uev.devpath = devpath.clone();
uev.devname = devname.to_string();
@@ -1421,6 +1467,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_virtio_blk_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "vda";
@@ -1428,7 +1475,7 @@ mod tests {
let mut uev_a = crate::uevent::Uevent::default();
let relpath_a = "/0000:00:0a.0";
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.subsystem = "block".to_string();
uev_a.subsystem = BLOCK.to_string();
uev_a.devname = devname.to_string();
uev_a.devpath = format!("{}{}/virtio4/block/{}", root_bus, relpath_a, devname);
let matcher_a = VirtioBlkPciMatcher::new(relpath_a);
@@ -1505,6 +1552,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_scsi_block_matcher() {
let root_bus = create_pci_root_bus_path();
let devname = "sda";
@@ -1512,7 +1560,7 @@ mod tests {
let mut uev_a = crate::uevent::Uevent::default();
let addr_a = "0:0";
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.subsystem = "block".to_string();
uev_a.subsystem = BLOCK.to_string();
uev_a.devname = devname.to_string();
uev_a.devpath = format!(
"{}/0000:00:00.0/virtio0/host0/target0:0:0/0:0:{}/block/sda",
@@ -1535,6 +1583,7 @@ mod tests {
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_vfio_matcher() {
let grpa = IommuGroup(1);
let grpb = IommuGroup(22);
@@ -1555,6 +1604,34 @@ mod tests {
assert!(!matcher_a.is_match(&uev_b));
}
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_mmio_block_matcher() {
let devname_a = "vda";
let devname_b = "vdb";
let mut uev_a = crate::uevent::Uevent::default();
uev_a.action = crate::linux_abi::U_EVENT_ACTION_ADD.to_string();
uev_a.subsystem = BLOCK.to_string();
uev_a.devname = devname_a.to_string();
uev_a.devpath = format!(
"/sys/devices/virtio-mmio-cmdline/virtio-mmio.0/virtio0/block/{}",
devname_a
);
let matcher_a = MmioBlockMatcher::new(devname_a);
let mut uev_b = uev_a.clone();
uev_b.devpath = format!(
"/sys/devices/virtio-mmio-cmdline/virtio-mmio.4/virtio4/block/{}",
devname_b
);
let matcher_b = MmioBlockMatcher::new(devname_b);
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));
assert!(!matcher_b.is_match(&uev_a));
assert!(!matcher_a.is_match(&uev_b));
}
#[test]
fn test_split_vfio_pci_option() {
assert_eq!(

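The new `MmioBlockMatcher` waits for a `block` uevent whose devpath ends in `/block/<devname>`. Before that, `get_virtio_mmio_device_name` strips the `/dev/` prefix from the storage source. The sketch below reproduces that matching logic using only the standard library. `UeventSketch`, `MmioBlockMatcherSketch` and `devname_from_path` are hypothetical stand-ins; the real code returns `anyhow` errors and waits asynchronously through `wait_for_uevent`.

```rust
/// Hypothetical stand-in for the agent's Uevent, with only the fields the matcher reads.
#[derive(Clone, Debug, Default)]
struct UeventSketch {
    subsystem: String,
    devpath: String,
    devname: String,
}

const BLOCK: &str = "block";

/// Mirrors the shape of MmioBlockMatcher: the uevent must be in the block
/// subsystem, its sysfs devpath must end in "/block/<devname>", and it must
/// carry a non-empty device name.
struct MmioBlockMatcherSketch {
    suffix: String,
}

impl MmioBlockMatcherSketch {
    fn new(devname: &str) -> Self {
        Self {
            suffix: format!("/block/{}", devname),
        }
    }

    fn is_match(&self, uev: &UeventSketch) -> bool {
        uev.subsystem == BLOCK && uev.devpath.ends_with(&self.suffix) && !uev.devname.is_empty()
    }
}

/// Derive the kernel device name from a storage source such as "/dev/vda",
/// rejecting sources outside /dev as the agent code does.
fn devname_from_path(devpath: &str) -> Result<&str, String> {
    devpath
        .strip_prefix("/dev/")
        .ok_or_else(|| format!("Storage source '{}' must start with /dev/", devpath))
}
```

Matching on the `/block/<devname>` suffix keeps the matcher independent of which `virtio-mmio.N` slot the device landed in. The slot number is exactly what the test above varies between `uev_a` and `uev_b`.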
@@ -33,7 +33,7 @@ pub fn create_pci_root_bus_path() -> String {
// check if there is pci bus path for acpi
acpi_sysfs_dir.push_str(&acpi_root_bus_path);
if let Ok(_) = fs::metadata(&acpi_sysfs_dir) {
if fs::metadata(&acpi_sysfs_dir).is_ok() {
return acpi_root_bus_path;
}
@@ -81,7 +81,8 @@ cfg_if! {
// sysfs as directories in the subtree under /sys/devices/LNXSYSTM:00
pub const ACPI_DEV_PATH: &str = "/devices/LNXSYSTM";
pub const SYSFS_CPU_ONLINE_PATH: &str = "/sys/devices/system/cpu";
pub const SYSFS_CPU_PATH: &str = "/sys/devices/system/cpu";
pub const SYSFS_CPU_ONLINE_PATH: &str = "/sys/devices/system/cpu/online";
pub const SYSFS_MEMORY_BLOCK_SIZE_PATH: &str = "/sys/devices/system/memory/block_size_bytes";
pub const SYSFS_MEMORY_HOTPLUG_PROBE_PATH: &str = "/sys/devices/system/memory/probe";

@@ -48,6 +48,7 @@ mod pci;
pub mod random;
mod sandbox;
mod signal;
mod storage;
mod uevent;
mod util;
mod version;
@@ -65,7 +66,7 @@ use tokio::{
io::AsyncWrite,
sync::{
watch::{channel, Receiver},
Mutex, RwLock,
Mutex,
},
task::JoinHandle,
};
@@ -73,6 +74,9 @@ use tokio::{
mod rpc;
mod tracer;
#[cfg(feature = "agent-policy")]
mod policy;
cfg_if! {
if #[cfg(target_arch = "s390x")] {
mod ap;
@@ -83,12 +87,16 @@ cfg_if! {
const NAME: &str = "kata-agent";
lazy_static! {
static ref AGENT_CONFIG: Arc<RwLock<AgentConfig>> = Arc::new(RwLock::new(
static ref AGENT_CONFIG: AgentConfig =
// Note: We can't do AgentOpts.parse() here to send through the processed arguments to AgentConfig
// clap::Parser::parse() greedily processes all command line input including cargo test parameters,
// so should only be used inside main.
AgentConfig::from_cmdline("/proc/cmdline", env::args().collect()).unwrap()
));
AgentConfig::from_cmdline("/proc/cmdline", env::args().collect()).unwrap();
}
#[cfg(feature = "agent-policy")]
lazy_static! {
static ref AGENT_POLICY: Mutex<policy::AgentPolicy> = Mutex::new(AgentPolicy::new());
}
#[derive(Parser)]
@@ -181,13 +189,13 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
lazy_static::initialize(&AGENT_CONFIG);
init_agent_as_init(&logger, AGENT_CONFIG.read().await.unified_cgroup_hierarchy)?;
init_agent_as_init(&logger, AGENT_CONFIG.unified_cgroup_hierarchy)?;
drop(logger_async_guard);
} else {
lazy_static::initialize(&AGENT_CONFIG);
}
let config = AGENT_CONFIG.read().await;
let config = &AGENT_CONFIG;
let log_vport = config.log_vport as u32;
let log_handle = tokio::spawn(create_logger_task(rfd, log_vport, shutdown_rx.clone()));
@@ -200,7 +208,7 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
let (logger, logger_async_guard) =
logging::create_logger(NAME, "agent", config.log_level, writer);
announce(&logger, &config);
announce(&logger, config);
// This variable is required as it enables the global (and crucially static) logger,
// which is required to satisfy the lifetime constraints of the auto-generated gRPC code.
@@ -228,7 +236,7 @@ async fn real_main() -> std::result::Result<(), Box<dyn std::error::Error>> {
let span_guard = root_span.enter();
// Start the sandbox and wait for its ttRPC server to end
start_sandbox(&logger, &config, init_mode, &mut tasks, shutdown_rx.clone()).await?;
start_sandbox(&logger, config, init_mode, &mut tasks, shutdown_rx.clone()).await?;
// Install a NOP logger for the remainder of the shutdown sequence
// to ensure any log calls made by local crates using the scope logger
@@ -327,6 +335,19 @@ async fn start_sandbox(
s.rtnl.handle_localhost().await?;
}
// - When init_mode is true, enabling the localhost link during the
// handle_localhost call above is required before starting OPA with the
// initialize_policy call below.
// - When init_mode is false, the Policy could be initialized earlier,
// because initialize_policy doesn't start OPA. OPA is started by
// systemd after localhost has been enabled.
#[cfg(feature = "agent-policy")]
if let Err(e) = initialize_policy(init_mode).await {
error!(logger, "Failed to initialize agent policy: {:?}", e);
// Continuing execution without a security policy could be dangerous.
std::process::abort();
}
let sandbox = Arc::new(Mutex::new(s));
let signal_handler_task = tokio::spawn(setup_signal_handler(
@@ -388,6 +409,18 @@ fn init_agent_as_init(logger: &Logger, unified_cgroup_hierarchy: bool) -> Result
Ok(())
}
#[cfg(feature = "agent-policy")]
async fn initialize_policy(init_mode: bool) -> Result<()> {
let opa_addr = "localhost:8181";
let agent_policy_path = "/agent_policy";
let default_agent_policy = "/etc/kata-opa/default-policy.rego";
AGENT_POLICY
.lock()
.await
.initialize(init_mode, opa_addr, agent_policy_path, default_agent_policy)
.await
}
// The Rust standard library had suppressed the default SIGPIPE behavior,
// see https://github.com/rust-lang/rust/pull/13158.
// Since the parent's signal handler would be inherited by its child process,
@@ -402,6 +435,9 @@ fn reset_sigpipe() {
use crate::config::AgentConfig;
use std::os::unix::io::{FromRawFd, RawFd};
#[cfg(feature = "agent-policy")]
use crate::policy::AgentPolicy;
#[cfg(test)]
mod tests {
use super::*;
@@ -442,9 +478,8 @@ mod tests {
let msg = format!("test[{}]: {:?}", i, d);
let (rfd, wfd) = unistd::pipe2(OFlag::O_CLOEXEC).unwrap();
defer!({
// rfd is closed by the use of PipeStream in the crate_logger_task function,
// but we will attempt to close in case of a failure
let _ = unistd::close(rfd);
// XXX: Never try to close rfd, because it will be closed by PipeStream in
// create_logger_task() and it's not safe to close the same fd twice.
unistd::close(wfd).unwrap();
});

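In `main.rs`, `AGENT_CONFIG` changes from `Arc<RwLock<AgentConfig>>` to a plain, lazily initialized `AgentConfig`. Callers now read fields directly instead of awaiting `.read()`. The sketch below shows the same read-only-global idea with `std::sync::OnceLock` (Rust 1.70+) swapped in for `lazy_static`. `ConfigSketch`, its two fields and its small command-line parser are hypothetical, and all are much simpler than `AgentConfig::from_cmdline`.

```rust
use std::sync::OnceLock;

/// Hypothetical, trimmed-down stand-in for AgentConfig.
#[derive(Debug)]
struct ConfigSketch {
    unified_cgroup_hierarchy: bool,
    log_vport: i32,
}

impl ConfigSketch {
    /// Hypothetical parser for "key=value" tokens on a kernel-command-line-like string.
    /// Unknown tokens are ignored; malformed numbers fall back to 0.
    fn from_cmdline(cmdline: &str) -> ConfigSketch {
        let mut cfg = ConfigSketch {
            unified_cgroup_hierarchy: false,
            log_vport: 0,
        };
        for param in cmdline.split_whitespace() {
            match param.split_once('=') {
                Some(("agent.unified_cgroup_hierarchy", v)) => {
                    cfg.unified_cgroup_hierarchy = v == "1" || v == "true"
                }
                Some(("agent.log_vport", v)) => cfg.log_vport = v.parse().unwrap_or(0),
                _ => {}
            }
        }
        cfg
    }
}

/// Written exactly once, then only read: no RwLock is needed.
static CONFIG: OnceLock<ConfigSketch> = OnceLock::new();

/// The first call parses and stores the config; later calls ignore their
/// argument and return the same &'static reference.
fn config_or_init(cmdline: &str) -> &'static ConfigSketch {
    CONFIG.get_or_init(|| ConfigSketch::from_cmdline(cmdline))
}
```

The value is never mutated after initialization, so a shared `&'static` reference is enough and reads take no lock. The trade-off is that changing the configuration at runtime would require reintroducing interior mutability.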
@@ -15,11 +15,9 @@ use tracing::instrument;
const NAMESPACE_KATA_AGENT: &str = "kata_agent";
const NAMESPACE_KATA_GUEST: &str = "kata_guest";
// Convenience macro to obtain the scope logger
macro_rules! sl {
() => {
slog_scope::logger().new(o!("subsystem" => "metrics"))
};
// Convenience function to obtain the scope logger.
fn sl() -> slog::Logger {
slog_scope::logger().new(o!("subsystem" => "metrics"))
}
lazy_static! {
@@ -139,7 +137,7 @@ fn update_agent_metrics() -> Result<()> {
Ok(p) => p,
Err(e) => {
// FIXME: return Ok for all errors?
warn!(sl!(), "failed to create process instance: {:?}", e);
warn!(sl(), "failed to create process instance: {:?}", e);
return Ok(());
}
@@ -160,7 +158,7 @@ fn update_agent_metrics() -> Result<()> {
// io
match me.io() {
Err(err) => {
info!(sl!(), "failed to get process io stat: {:?}", err);
info!(sl(), "failed to get process io stat: {:?}", err);
}
Ok(io) => {
set_gauge_vec_proc_io(&AGENT_IO_STAT, &io);
@@ -169,7 +167,7 @@ fn update_agent_metrics() -> Result<()> {
match me.stat() {
Err(err) => {
info!(sl!(), "failed to get process stat: {:?}", err);
info!(sl(), "failed to get process stat: {:?}", err);
}
Ok(stat) => {
set_gauge_vec_proc_stat(&AGENT_PROC_STAT, &stat);
@@ -177,7 +175,7 @@ fn update_agent_metrics() -> Result<()> {
}
match me.status() {
Err(err) => error!(sl!(), "failed to get process status: {:?}", err),
Err(err) => error!(sl(), "failed to get process status: {:?}", err),
Ok(status) => set_gauge_vec_proc_status(&AGENT_PROC_STATUS, &status),
}
@@ -189,7 +187,7 @@ fn update_guest_metrics() {
// try get load and task info
match procfs::LoadAverage::new() {
Err(err) => {
info!(sl!(), "failed to get guest LoadAverage: {:?}", err);
info!(sl(), "failed to get guest LoadAverage: {:?}", err);
}
Ok(load) => {
GUEST_LOAD
@@ -209,7 +207,7 @@ fn update_guest_metrics() {
// try to get disk stats
match procfs::diskstats() {
Err(err) => {
info!(sl!(), "failed to get guest diskstats: {:?}", err);
info!(sl(), "failed to get guest diskstats: {:?}", err);
}
Ok(diskstats) => {
for diskstat in diskstats {
@@ -221,7 +219,7 @@ fn update_guest_metrics() {
// try to get vm stats
match procfs::vmstat() {
Err(err) => {
info!(sl!(), "failed to get guest vmstat: {:?}", err);
info!(sl(), "failed to get guest vmstat: {:?}", err);
}
Ok(vmstat) => {
for (k, v) in vmstat {
@@ -233,7 +231,7 @@ fn update_guest_metrics() {
// cpu stat
match procfs::KernelStats::new() {
Err(err) => {
info!(sl!(), "failed to get guest KernelStats: {:?}", err);
info!(sl(), "failed to get guest KernelStats: {:?}", err);
}
Ok(kernel_stats) => {
set_gauge_vec_cpu_time(&GUEST_CPU_TIME, "total", &kernel_stats.total);
@@ -246,7 +244,7 @@ fn update_guest_metrics() {
// try to get net device stats
match procfs::net::dev_status() {
Err(err) => {
info!(sl!(), "failed to get guest net::dev_status: {:?}", err);
info!(sl(), "failed to get guest net::dev_status: {:?}", err);
}
Ok(devs) => {
// netdev: map[string]procfs::net::DeviceStatus
@@ -259,7 +257,7 @@ fn update_guest_metrics() {
// get statistics about memory from /proc/meminfo
match procfs::Meminfo::new() {
Err(err) => {
info!(sl!(), "failed to get guest Meminfo: {:?}", err);
info!(sl(), "failed to get guest Meminfo: {:?}", err);
}
Ok(meminfo) => {
set_gauge_vec_meminfo(&GUEST_MEMINFO, &meminfo);

File diff suppressed because it is too large.

@@ -7,14 +7,14 @@ use anyhow::{anyhow, Result};
use nix::mount::MsFlags;
use nix::sched::{unshare, CloneFlags};
use nix::unistd::{getpid, gettid};
use slog::Logger;
use std::fmt;
use std::fs;
use std::fs::File;
use std::path::{Path, PathBuf};
use tracing::instrument;
use crate::mount::{baremount, FLAGS};
use slog::Logger;
use crate::mount::baremount;
const PERSISTENT_NS_DIR: &str = "/var/run/sandbox-ns";
pub const NSTYPEIPC: &str = "ipc";
@@ -116,15 +116,7 @@ impl Namespace {
// Bind mount the new namespace from the current thread onto the mount point to persist it.
let mut flags = MsFlags::empty();
if let Some(x) = FLAGS.get("rbind") {
let (clear, f) = *x;
if clear {
flags &= !f;
} else {
flags |= f;
}
};
flags |= MsFlags::MS_BIND | MsFlags::MS_REC;
baremount(source, destination, "none", flags, "", &logger).map_err(|e| {
anyhow!(

Some files were not shown because too many files have changed in this diff.