hub is now deprecated, which has been causing issues with our release
process.
Let's move to the GH cli (https://cli.github.com/manual), and unblock
this release.
**NOTE**: This commit is purposefully not touching anywhere else hub is
used, as that would require more time and investigation to do the
switch, and right now we just want to unblock the release.
Fixes: #8286
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed in order to properly run the CIs in branches that are not
the main one, as the kata-deploy.yaml file on those branches do not have
the `latest` tag, but rather the latest stable release.
Fixes: #8274
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- kata-deploy-stable: Switch to using the ubuntu based payload
- libs: protection: Fix typo in TDX output
- ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
- tests: Enable agent stability test
- docs: Fix paths to build kernel in SNP VMs documentation
- runtime-rs: ch: Add TDX CH features check
- runtime: Validate hypervisor section name in config file
- tests: query data from the OPA service
- release: tag_repos: Stop tagging the `tests` repo
- metrics: fixes common.sh function to always return true
- Memory footprint test removing trailing commas to make json results file valid
- policy: allow access to ReseedRandomDev
- runtime/kata-ctl: update dependencies
- runtime-rs : fix Nydus support for runtime-rs + Dragonball
- metrics: removal of reference in the documentation to the fio dax subtest.
- runtime-rs: ch: Detect Intel TDX version
- runitme-rs: use the same base64 as kata-runtime/direct-volume does
- tests: Enable scability test for stability CI
- runtime-rs: Add support for adding vfio device for cloud-hypervisor
- tests: Enable soak parallel stability test
- dragonball: vcpu metrics change to be recorded per vcpu
- ci: k8s: adapt gha-run.sh to run locally
- metrics: removes kata components and k8s deployment when test finishes
- GHA: fix up referenced yaml exceeding 20 limit problem
- gha: ci: Revert tracing test PR to unbreak CI
- runtime-rs: ch: Enable feature
- gha: ci: Port runk tests over
- ci: gha: Port tracing tests over
- Enable fio test using containerd client
- gha: Add stability tests workflow for gha
- gha: arm64: Ensure the builder is arm64-builder
- kata-deploy: Build kata-agent as we build all the other components
- versions: migrate out of k8s.gcr.io
- doc: Update crictl pod-config
- gha: Fix k0s deployment
- tests: Add stability test for kata CI
- docs: Update url in kata vra document
- gpu: Adding CDI support for cold and hot-plug of VFIO devices
- kata-deploy: build & ship the rust components from src/tools/
- metrics: Add latency value limits for kata CI
- runtime: fix reading cgroup stats of sandboxes
- Upgrade to Cloud Hypervisor v35.0
- ci: Port kata-monitor tests from Jenkins to GHA
- metrics: Fix latency yamls path
- metrics: Fix metrics README
- metrics: Fix C-Ray documentation
- runtime-rs: ch: Enable Intel TDX
- ci: k8s: crio: Follow up patches to have CRI-O also working as part of our CI
- metrics: Enable latency test in gha run script
- local-build: Fix .docker ownership before build-payload
- runtime-rs: Add network support for cloud-hypervisor
- osbuild: Reduce guest components binary size with strip
- gha: Add pandoc as a dependency for static checks
- ci: rootfs-image build-asset is failing
- feat(runtime-rs): introduce huge page mode to select VM RAM's backend
- clh: Direct IO support for block devices
- gha: Install hunspell for static checks
- ci: Trigger payload-after-push on workflow_dispatch
- ci: Actually enable the CRI-O tests
- protocol: remove gogoprotobuff tests
- ci: k8s: Also run tests with CRI-O
- runtime: support kernel params including spaces
- ci: kata-deploy: Fix runner name
- metrics: Enable parallel bandwidth iperf limit
- ci: kata-deploy: Enable all k8s flavours that we support
- ci: Create clusters in individual resource groups
- versions: Bump virtiofsd to v1.8.0
- clh: arm: Use static_sandbox_resource_mgmt=true
- Bump nydus versions and update nydus tests
- runtime/qemu: Rework QMP/HMP support
- clh:arm64: use arm AMBA UART for hypervisor debug
- ci: Use variable size of VMs depending on the tests running
- ci: Rework static checks
- runtime: incorrect handling of non-empty []Endpoint parameter in Remo…
- ci: cache: Check the sha256sum of the components & fix ovmf-sev cache usage
- ci: cache: Use the artefacts stored in ghcr.io/kata-containers/cached-artefacts/${component}
- ci: Run some of the GARM tests in smaller instances
- ci: Reduce the size of the AKS VMs
- ci: cache: Allow pushing our artefacts to an OCI registry
- metrics: Add iperf value for cpu utilization
- ci: cache: Export env vars needed to use ORAS
- gha: vfio: Import test script
- tests: fix kernel and initrd annotations
- metrics: Add iperf bandwidth value for kata metrics
- metrics: Add Cassandra Metrics documentation
- metrics: Remove warning from metrics documentation
- ci: docker: nerdctl: Switch to tcp port 80 ping
- runtime: Naming conflict of network devices
- Remove gogoproto.nullable extension
- metrics: Ensure docker is running in init_env
- metrics: this PR skips the FIO test temprarily to fix issues
- ci: Add a very basic nerdctl sanity test
- runtime-rs: hypervisor: Remove debug kernel options
- versions: Bump rust version
- ci: Add a very basic docker sanity test
- dragonball: fix for non-deterministic builds
- runtime-rs: bring hybrid vsock devices in manager.
- ci: use github.ref_name instead of $GITHUB_REF_NAME
- ci: Add more target-branch related fixes
- ci: Fix target-branch usage
- agent: optimize the code of systemd cgroup manager
- gha: Manually rebase PR atop of the target branch before testing
- Update kernel to the latest LTS release (v6.1.52) and bring in erofs patches needed for the CC work
- kata-deploy: Fix aarch64 image build
- runtime: Fix more virtiofs args
- kata-deploy: Switch to an alpine image
- metrics: Use TensorFlow optimized image
- metrics: fix FIO test initialization
- ci: k8s: Add clean-up-garm argument for gha-run.sh
- ci: k8s: Second round of fix-ups with the devmapper CI
- metrics: re-enable memory-usage initialization step
- Dragonball: optimize the placement of dbs-upcall features
- ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
- ci: k8s: Add k8s devmapper tests (part 0)
- kata-deploy: Create kata-static.tar with correct ownership
- runtime: run prestart hooks before starting VM for FC
- metrics: Add write 95 percentile FIO value
- runtime: Allow virtio_fs_extra_args annotation
- packaging: do not install docker-compose-plugin for s390x|ppc64le
- runtime-rs: Fix volumes and rootfs cleanup issues
- metrics: Enable iperf benchmark on gha for kata metrics
- CI: switch static-checks-dragonball CI machines to Azure
- metrics: Add README for kata metrics report
- osbuilder: Remove chcon operation for guest SELinux
- kata-sys-util: protection: Update TDX checks
- Improve the way to clean up storage devices for sandbox
- agent: avoid possible leakage of storage device
- tests: add policy to existing tests
- gha: Rebase PR atop of the target branch before testing
- versions: Update alpine to its 3.18 version
- runtime: Fix data race in ioCopy
- metrics: Add grabdata script for metrics report
- Fixes tests on AMD machines
- metrics: Enable FIO limits for kata metrics
- metrics: Add metrics report script
- metrics: Fix memory inside limits for kata metrics
- metrics: fix parsing issue on memory-usage test
- dragonball: vsock add fifo/pipe stream support for passed fd hybridSt…
- tests: Add confidential test
- tdx: Update the components needed for using the 6.2 kernel stack
- tests: delete k8s deployment at the test's end
- tests: use unique test name
- runtime-rs: check peer close in log_forwarder
- gha: Avoid "fail-fast" in tests that are known to be flaky
- Refine storage device management for kata-agent
- metrics: Remove unused variable in tensorflow nhwc script
- kata-deploy: Don't try to remove /opt/kata
- metrics: Add TensorFlow ResNet50 FP32 benchmark
- gha: vfio: Run on Ubuntu 23.04 runner
- kata-agent: use default filemode for block device when it is set to 0
- kata-types: introduce KataVirtualVolume to support nydus, direct volume and image pull
- libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
- local-build: Remove GID before creating group
- kata-deploy: Avoid failing on content removal
- runtime: fix image and initrd assets handling
- metrics: Add disk link to README
- metrics: Fix FIO path
- gha: capture additional kata-deploy output
- metrics: Use function from metrics common in pytorch script
- metrics: Enable kata runtime in K8s for FIO test.
- metrics: Fix README for pytorch
- metrics: Remove unused variable in tensorflow mobilenet script
- rootfs: agent: Policy support with AGENT_INIT=yes
- gha: k8s: kata-deploy: Move kata-deploy specific tests from integration/kubernetes to functional/kata-deploy
- metrics: Fix check results for tensorflow benchmark
- metrics: Add Tensorflow ResNet50 int8 benchmark
- kata-deploy: Properly create default runtime class
- agent: simplify error handling
- metrics: Fix MobileNet help me description
- gha: ci: Start running kata-deploy tests
- runk: Modify kill command's error message for containerd tests
- runtime-rs: add driver option
- gha: cri-containerd: Enable tests
- metrics: Rename tensorflow scripts
- gha: tests: Add kata-deploy functional tests -- Part 1
- agent: runtime: add Agent Policy feature
- runk: Support without pid ns
- metrics: Add Cassandra Kubernetes benchmark for kata metrics
- metrics: Add common functions to the common script
- metrics: fix the loop used to stop kata components
- docs: Remove installation step in virtcontainers doc
- Propogate secrets, config maps etc into guest if sharedFS not available
- kata-deploy: Preliminary k0s support
- gha: static-checks: Move to the Azure instances
- versions: Update firecracker version to 1.4.0
- agent: Allow clippy::redundant_clone in the unit tests
- agent: avoid creating new `Vec` instances when easily avoidable
- metrics: compute tensorflow statistics
- metrics: Add network nginx benchmark
- metrics: install kata once and run multiple checks
- ci: unencrypted-image: Fix build context
- ci: create-confidential-image: Add dependent actions
- Follow up fixes for https://github.com/kata-containers/kata-containers/pull/7596
- tests: Create image that will be used in the unencrypted confidential tests
- kata-deploy: Ensure we cover SHIMS / DEFAULT_SHIM as part of our tests
- tests: upgrade bats version
- Fix mimor bugs and improve coding stype of agent rpc/sandbox/mount
- deps: Bump dependent crate versions
- fix number of queues handling in dragonball share fs device
- runtime-rs: Introduce directly attachable network
- metrics: General improvements to mobilenet tensorflow test
- gha: Add iperf network metrics
- docs: Use control-plane term instead of master
- agent: avoid unnecessary calls to `Arc::clone`
- metrics: Add network latency test
- Image pulling on the host
- Use version 0.10.4 of `fuse-backend-rs`
- kata-deploy: Use host's systemctl
- release: Revert kata-deploy changes after 3.2.0-rc0 release
- metrics: stop kata components before start a metric test.
- runtime-rs: Add block device handling for cloud hypervisor
a93fdb014 kata-deploy-stable: Adapt to what we're using in the stable branch
36109da93 ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
d01daf749 tests: Adjust timeout for agent stability test
9b14dda14 libs: protection: Fix typo in TDX output
0e0867f15 runtime-rs: ch: Add TDX CH features check
409eadddb runtime-rs: ch: Improve readability of guest protection checks
82a0814fc tests: Enable agent stability test
32be8e3a8 tests: query data from the OPA service
b81c0a669 tests: encode policy file during test
4f9681b41 metrics: fixes common.sh function to always return true
2ef2b2a6d docs: Fix paths to build kernel in SNP VMs documentation
408b59c02 runtime-rs: fix bugs to support Nydus v5
157caea9f Revert "nydus: Temporarily skip tests on dragonball"
678fe3cd3 Dragonball: fix Nydus config serde problem
b6ec62138 policy: allow access to ReseedRandomDev
908519db9 metrics: skips docker restart when it is not installed or is masked.
c2763120a metrics: removing trailing comma characters from json file.
3e8cf6959 runtime: Validate hypervisor section name in config file
ef6388e81 tests: Remove unused function from scability test
fbc8f8f46 scripts: Use install_yq from the `kata-containers` repo
65b1a2d27 release: tag_repos: Stop tagging / updating the `tests` repo
87b760f56 runtime-rs: ch: Detect Intel TDX version
73e81f5e3 runitme-rs: unify base64 encoding for direct-volume
c6463cb5a tests: Fix path for versions yaml for soak parallel test
89c9454fc metrics: removal of reference in the documentation to the dax test.
30ff58904 tests: Enable scability test for stability CI
8d6f7b909 runtime-rs: Add support for handling vfio device for cloud-hypervisor
e786b2b01 gha: Add install dependencies for stability tests
dbfe6512f dragonball: vcpu metrics change to be recorded per vcpu
fa60fbe02 dragonball: METRICS is refactored to RwLock<DragonballMetrics>
500d1c5ce kata-ctl: update rustls-webpki/webpki dependency
d7660d82a runtime: unify gopkg.in/yaml.v3 to v3.0.1
fc9a107e8 runtime: unify swag and testify dependency
79ebb959c runtime: update runc dependency to v1.1.9
7f3e8bd65 runtime: unify golang.org/x/text to v0.7.0
df325ae37 runtime: update golang.org/x/net to v0.7.0
bba34910d metrics: stops kata components and k8s deployment when test finishes
84e3d884e gha: Add general dependencies to stability tests
dec3951ca tests: Add soak parallel stability test
0f04d527d tests: Enable soak parallel test
e669282c2 ci: k8s: set KUBERNETES default value
c30c3ff18 tests: run k8s-volume on a given node
666993da8 tests: run k8s-file-volume on a given node
3a00fc910 tests: exec_host() now gets the node name
61c9c17bf tests: add get_one_kata_node() to tests_common.sh
68f083c4d ci: k8s: set KATA_HYPERVISOR default value
6677a61fe ci: k8s: configurable deploy kata timeout
200e54292 ci: k8s: shellcheck fixes to gha-run.sh
4af78be13 kata-deploy: re-format kata-[deploy|cleanup].yaml
d54e6d9cd ci: k8s: run_tests() for kcli
c2ef1f0fb ci: k8s: add deploy-kata-kcli() to gh-run.sh
d2be8eef1 ci: k8s: add cleanup-kcli() to gha-run.sh
cbb9aa15b ci: k8s: set default image for deploy_kata()
89bef7d03 ci: k8s: create k8s clusters with kcli
954d40cce gha: combine coco jobs into a single yaml
b60e0a9b5 gha: combine basic amd64 jobs into a single yaml
e9bd85211 gha: ci: Revert tracing test PR to unbreak CI
b8a46a4b8 runtime-rs: ch: Enable feature
0f2dc8c67 gha: Add containerd stability tests to ci yaml
da91c9df8 ci: Port runk tests to this repo
7f2377276 ci: Add placeholder for runk tests
9205acc3d ci: Move tracing tests here
85d290a04 gha: Add stability gha run script
54f0c8f88 gha: Add stability tests workflow for gha
3bb2923e5 ci: Add placeholder for tracing tests
2c3bf406d ci: Create a function to install docker
119f03de2 gha: arm64: Ensure the builder is arm64-builder
8c498ef5e metrics: Use jq tool to pretty-print json metrics output
a2159a636 metrics: Enables FIO test for kata containers
70e7ec3e2 gha: Fix k0s deployment
560bbffb5 packaging: tools: Remove `set -x` leftover
18fa483d9 packaging: release: Mention newly added images
ca3b88837 packaging: tools: Fix container image env var name
5ca66795c packaging: Allow passing the TOOLS_CONTAINER_BUILDER
02acef957 gha: Build the kata-agent as part of our workflows
5208386ab packaging: Build the kata-agent
1727487ee agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
45c118883 packaging: Add get_agent_image_name()
0db8fb8f9 versions: migrate out of k8s.gcr.io
a1a054367 doc: Fix spelling
6339605a1 tests: Add general stability fixes
59ae24444 doc: Update crictl pod-config
fd19f4082 tests: Add agent stability test
215577032 tests: Add cassandra stress in stability tests
f2d3ea988 tests: Add stressng dockerfile for stability tests
6493aa309 tests: Add stressor CPU test for stability tests
ef68a3a36 metrics: Add stability test for kata CI
7c934dc7d gpu: Fix cold-plug of VFIO devices
8d66ef518 metrics: Increase qemu jitter value
5600e28b5 metrics: Increase jitter value for clh
a6b1f5e21 ci: Build src/tools components as part of our tests / releases
501a168a8 kata-deploy: Build components from src/tools
6ef42db5e static-build: Add scripts to build content from src/tools
4d08ec29b packaging: Add get_tools_image_name()
98097c96d packaging: Use git abbreviated hash
489caf1ad ci: kata-monitor: Move tests over
a3fb067f1 ci: Add placeholder for kata-monitor tests
57cb4ce20 ci: Make install_kata aware of container engines
de1eeee33 ci: Create a generic install_crio function
64a200085 ci: Add install_cni_plugins helper
8132fe15c ci: Modify containerd default config
8cb7df1be metrics: Add checkmetrics for latency test
e90440ae2 metrics: Add qemu latency value limit
a74a8f8a9 metrics: Add latency value limits for kata CI
d7def8317 metrics: Fix general check static warnings
928553d1b docs: Update url in kata vra document
b0a3293d5 runtime-rs: ch: Enable Intel TDX
523399c32 runtime-rs: ch: Add more consts
dea806581 runtime-rs: ch: Remove unused function
995f2c015 runtime-rs: ch: Only handle particular pending device types
b1b96a5c4 runtime-rs: ch: Remove erroneous "virtio-blk-mmio" check
9ac29b8d3 metrics: Add init_env function to latency test
dfd0c9fa9 runtime: clh: Re-generate the client code
8f9f087e3 versions: Upgrade to Cloud Hypervisor v35.0
81c8babca metrics: Fix latency yamls path
481573682 metrics: Fix C-Ray documentation
ef63d67c4 ci: crio: Trail '\r' from exec_host() output
74c12b292 ci: crio: Enable default capabilities
358dc2f56 kata-deploy: Fix CRI-O detection
ebaa4fa4c ci: crio: Pass `-y` to apt
97e73b223 metrics: Fix spelling warnings
36c8cd6f1 metrics: Fix metrics README
15425a2b8 local-build: Fix .docker ownership before build-payload
13ca7d9f9 gha: Add pandoc as a dependency for static checks
08bc8e4db metrics: Add latency benchmark for gha
6776b55d7 metrics: Enable latency test in gha run script
94e2ccc2d runtime: fix reading cgroup stats of sandboxes
d507d189b fc: Add support for noflush cache option
2ca781518 clh: Direct IO support for block devices
0c95697cc ci: Trigger payload-after-push on workflow_dispatch
28cbc3b51 ci: rootfs-image build-asset is failing Fixes: #802787a861648 gha: Install hunspell for static checks
8c3c50ca8 ci: Actually enable the CRI-O tests
3a6510ad6 osbuild: Reduce guest components binary size with strip
07a6e63a6 ci: k8s: rke2: Use sudo to call systemd
03b82e848 ci: k8s: Add a CRI-O test
d7105cf7a ci: k8s: Add a method to install CRI-O
54c0a471b ci: k8s: k0s: Allow passing parameters to the k0s installer
730ef5169 deps: updating dependencies
3a2c83d69 ci: kata-deploy: Fix runner name
82ff2db46 runtime: support kernel params including spaces
604a9dd67 protocol: remove gogoprotobuff tests
f7fa7f602 ci: Enable kata-deploy tests for all the supported k8s flavours
2c908b598 ci: kata-deploy: Add the ability to deploy rke2
eaf616491 ci: kata-deploy: Add the ability to deploy k0s
001525763 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
bf2cb0228 ci: kata-deploy: Expland tests to run on k0s / rke2
b12b9e188 ci: kata-deploy: Add placeholder for tests on GARM
9e1fb8a96 ci: kata-deploy: Export KUBERNETES env var
09cc0ed43 ci: Move deploy_k8s() to gha-run-k8s-common.sh
486fe14c9 ci: Properly set K8S_TEST_UNION
d9ef1352a ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
68267a399 ci: Create clusters in individual resource groups
9aa8d1c91 metrics: Add parallel bandwidth limit for qemu
44c7c082d versions: Bump virtiofsd to v1.8.0
af59d4bf4 metrics: Enable parallel bandwidth iperf limit
aba36ab18 nydus: Temporarily skip tests on dragonball
b8a8dfcd1 nydus: Use `kata-${KATA_HYPERVISOR}` instead of `kata`
f6df3d6ef static-build: Fix arch error on nydus build
2f9c9e2e6 tests: nydus: Update nydus tests
c9a4e7e46 versions: Bump nydus and nydus-snapshotter to its latest release
b73bde320 gha: nydus: Populate run()
b3904a1a3 gha: nydus: Populate install_dependencies()
d2b3b67f5 gha: nydus: Actually install kata when `install-kata` is called
0ec00ad42 gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
568439c77 tests: nydus: Add timeout to the crictl calls
5ac3b76eb tests: nydus: Add uid / namespace to the nydus container / sandbox
376574a16 tests: nydus: Decorate some calls with `sudo`
4290fd4b6 tests: nydus: Adapt "source ..." to GHA
a84efa3e8 tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
56a14b395 tests: common: Add install_nydus_snapshotter()
b6563783e tests: common: Add install_nydus()
72599f191 clh: arm: Use static_sandbox_resource_mgmt=true
1f16b6627 runtime/qemu: Rework QMP/HMP support
8b1e9b0c7 ci: static-checks: Clean up static-checks job
2c5ca2eaf ci: static-checks: Run tests depending on KVM
509c309ab ci: static-checks: Move "sudo make test" to the new test matrix
4e963cedf ci: static-checks: Move "make test" to the new test matrix
08f2e5ae0 runtime-rs: Ensure static-checks-build is a dep of `make test`
2bc3a616a kata-ctl: Use `loop` instead of `kvm` module in tests
46daddc50 kata-ctl: Ensure GENERATED_CODE is a dep of `make test`
ec826f328 agent: Ensure GENERATED_CODE is a dep of `make test`
1d32410a8 ci: install_libseccomp: Do not depend on the tests repo
bf888b9a5 ci: static-checks: Move "make check" to the new test matrix
473ec8780 kata-ctl: Add `kata-types` to the Cargo.lock file
ea19549a9 kata-ctl: Ensure GENERATED_CODE is a dep of `make check`
e12577586 tests: install_rust: Also install clippy
e2c61a152 ci: static-checks: Move vendor check to its own job
6794d4c84 tests: Move install_rust.sh from the tests repo
e64508c30 tests: install_go: Remove tests repo dependency
11dff731b tests: Move functions from kata_arch script here
75c974c80 ci: static-checks: Move kernel config check to its own job
9c233bb9e test: Add test to verify try_from for clh Netconfig
c69a1e33b ci: Use variable size of VMs depending on the tests running
9049d311d runtime-rs: Add network support for cloud-hypervisor
eecd5bf2a ci: cache: Fix ovmf-sev cache
86c41074b ci: cache: Check the sha256sum of the component
460988c5f ci: cache: Remove the script used to cache artefacts on Jenkins
4533a7a41 ci: cache: Also store the ${component} sha256sum
eccc76df6 ci: cache: Use the cached artefacts from ORAS
7f5e77bcb kernel: enable Arm pl011 support
241c355e0 clh:arm64: use arm AMBA uart for hypervisor debug
094b6b2cf ci: k8s: Temporarily disable tests that require a bigger VM instance
d0c257b3a ci: cache: Push cached artefacts to ghcr.io
108f1b60d kata-deploy: Generate latest_{artefact,image_builder} files
be2eb7b37 ci: cache: Install ORAS in the kata-deploy binaries builder container
fb24fb0dc ci: k8s: devmapper: Use a smaller / cheaper VM instance
1daf02f5d ci: nydus: Use a smaller / cheaper VM instance
e60d81f55 ci: nerdctl: Use a smaller / cheaper VM instance
4db416997 ci: docker: Use a smaller / cheaper VM instance
32841827b ci: cri-containerd: Use a smaller / cheaper VM instance
92fff129f ci: k8s: Don't set cpu limit request for k8s-inotofy test
faf98c062 ci: Reduce the size of the AKS VMs
adc18ecdb ci: cache: For consistency, read all used env vars
c7a851efd ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
6bd15a85d ci: cache: Export env vars needed to use ORAS
cd4fd1292 metrics: Add iperf cpu utilization limit for qemu
df5cd10ea metrics: Add iperf value for cpu utilization
a96050a7a tests: Apply timeout to 'ctr t kill'
9d9303678 tests/vfio: Bump VM image to Fedora 38
faee59b52 tests/vfio: Accept single device in vfio group for CLH
df3dc1105 tests/vfio: Get rid of sync's
7211c3dcc gha: vfio: Set test timeout to 15m
1b02f89e4 packaging: kernel: Enable VIRTIO_IOMMU on x86_64
3a1db7a86 runtime: clh: Support enabling iommu
9f1a42c6c tests/vfio: Give commands 30s to execute
b46b0ecf8 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
bfc93927f runtime: Remove redundant check in checkPCIeConfig
7c4e73b60 runtime: Add test cases for checkPCIeConfig
fc51e4b9e runtime: Check config for supported CLH (cold|hot)_plug_vfio values
509771e6f runtime: clh: Add hot_plug_vfio entry to config
5f6475a28 tests/vfio: Gather debug info and disable tdp_mmu
8fffdc81c tests/vfio: Capture journal from vm
df815087e tests/vfio: Change to get the test working in GHA
a92ddeea1 tests/vfio: Move dependency installation to gha-run.sh
5a551a85b gha: vfio: Import jobs scripts from tests repo
49e2fa189 metrics: Increase jitter value for qemu
49234433a metrics: Increase value limit for jitter in clh
813bfdec0 ci: docker: nerdtl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
46bc0b1c0 ci: nerdctl: Create the containerd config
13968aa7f ci: nerdctl: Switch to tcp port 80 ping
e0c811678 ci: docker: Switch to tcp port 80 ping
1636abbe1 runtime: issue with non-empty []Endpoint in RemoveEndpoints
0aa073967 metrics: Add iperf bandwidth value for qemu
c0ad91476 tests: fix kernel and initrd annotations
615c1cbf1 metrics: Add iperf bandwidth value for kata metrics
d53eb73ee metrics: Ensure docker is running in init_env
ad08321b8 metrics: Add Cassandra Metrics documentation
a58ea6659 metrics: this PR skips the FIO test temprarily to fix issues
f536ef5ce ci: docker: Also run the smoke test with runc
c83f167c5 ci: docker: Run the tests after the kata-static is created
12d833d07 ci: Add a very basic nerdctl sanity test
348b8644d ci: Add a very basic docker sanity test
a75fd5eb8 runk: Fix rust unecessary mut error
a31c14517 kata-ctl: useless-vec warning
c8419fc3b kata-ctl: Resolve non-minimal-cfg warning
3eaf68d95 agent-ctl: Allow clippy lint
1d8b78959 runtime-rs: Fix useless-vec warning
99f3d69e9 runtime-rs: Remove mut
16fbc27b0 dragonball: Allow ambiguous-glob-reexports
bbf191951 dragonball: Resolve non-minimal-cfg warning
75cfdd5d5 agent: config: Allow clippy lint
f3a0fd590 agent: config: Fix useles-vec warning
9e423bd3d libs: Fix clippy unnecesary hashes error
444395050 versions: Bump rust version
a16b0962b chore(cargo): update cargo lock
ca4b6b051 runtime: Naming conflict of network devices
202049f35 feat(runtime-rs): introduce huge page type to select VM RAM's backend
f811b064c ci: use github.ref_name instead of $GITHUB_REF_NAME
6d795c089 ci: Add more target-branch related fixes
8509c3187 ci: Fix target-branch usage
060499dca metrics: Remove warning from metrics documentation
c0f697fcc runtime: Allow kernel_params annotation
b03e49794 dragonball: fix for non-deterministic builds
976d10150 runtime-rs: hypervisor: Remove debug kernel options
fde34610c kernel: Add erofs patches needed for CC related work
dc6a4588a versions: Bump kernel to the latest LTS release (6.1.52)
52f6449b7 kata-manager: Remove initcall_debug kernel option
8b4a0b368 kata-deploy: Remove curl after it's used
139c7f03a kata-deploy: Fix aarch64 image build
470d06541 agent: optimize the code of systemd cgroup manager
bd24afcf7 gha: Manually rebase PR atop of the target branch before testing
72c510d05 runtime/virtiofsd: Drop all references to "--cache=none"
ead724bec protocol: removing gogo.nullable feature
d8e4bb985 protocol: remove unused PROTO_FILE env
5e1106a77 protocol: remove unused import_path
87accaaec protocol: use workdir during build
711a7ed96 protocol: remove mapping definitions
8db84c1bd protocol: force GOPATH to be set
68156d77a protocol: breaking lines to improve readability
670a8e9c7 kata-deploy: Switch to an alpine image
9d74b7ccc k8s: ci: Skip "Pod quota" test with firecracker
f6cd3930c ci: k8s: Remove useless skip statement from tests
3cc20b47a ci: k8s: Also check for "fc" (for firecracker)
b5bad3cb0 ci: k8s: Add clean-up-garm argument for gha-run.sh
aaec5a09f ci: k8s: devmapper tests should be using ubuntu 20.04
27fa7d828 ci: k8s: Add a kata-deploy-garm target
fa62a4c01 ci: k8s: Export KUBERNETES env var
8c9380a79 ci: k8s: Install bats on GARM runners
3de23034f ci: k8s: Wait some time after restarting k3s
adfea55b8 metrics: fix FIO test initialization
2df183fd9 ci: k8s: Append, instead of overwrite, the devmapper config
369a8af8f ci: k8s: Decrease k3s sleep from 4 to 2 minutes
ada65b988 ci: k8s: Use vanilla kubectl with k3s
ad45ab5d3 ci: k8s: Ensure k3s is deploy with --write-kubeconfig-mode=644
028a97e0d ci: k8s: Use the proper command for sleep
3a427795e metrics: Use TensorFlow optimized image
8d99972a8 ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
deed1b927 Dragonball: optimize the placement of dbs-upcall features
0e8bd50cb ci: k8s: Add k8s devmapper tests (part 0)
b28b54df0 ci: k8s: Add a function to configure devmapper for containerd
54f711721 ci: k8s: Add a function to deploy k3s
81536f21a runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
b1dd09a4d runtime: Allow virtio_fs_extra_args annotation
2efda20c7 packaging: do not install docker-compose-plugin for s390x|ppc64le
438fbf966 metrics: Add write 95 percentile for FIO for qemu
024b4d2ff metrics: Add write 95 percentile FIO value
e98e5cdea metrics: Add checkmetrics to gha run script
c1edfe551 metrics: Add checkmetrics value for qemu for iperf
6a79ecedf metrics: Add jitter value for clh
f609a9a75 metrics: Add test selector to iperf metrics
5b8db3042 metrics: Enable iperf benchmark on gha for kata metrics
60f733d30 CI: switch static-checks-dragonball CI machines to Azure
7870b33a2 runtime-rs: bring hybridVsock devices in manager.
18c94ebbe kata-deploy: Create kata-static.tar with correct ownership
57e7bf14a agent: refine StorageDeviceGeneric::cleanup()
53edb1937 agent: implement StorageDeviceGeneric::cleanup()
0c63453e2 types: make StorageDevice::cleanup() return possible error code
3a3d77b3b agent: move StorageDeviceGeneric from kata-types into agent
b151cfd14 metrics: re-enable memory-usage initialization step
f3e1a6a94 osbuilder: alpine: Change mirror
ac612aef5 osbuilder: alpine: Match the version on versions.yaml
9cd706d1c agent: avoid possible leakage of storage device
bf21411e9 tests: add policy to k8s tests
d0e061067 runtime: config: use the SEV initrd for SNP
67fed26f1 runtime: Use TDX image with in the qemu-tdx config
ac939c458 gha: Rebase atop of the target branch
82cd14ba3 versions: Update alpine to its 3.18 version
666882575 metrics: Add grabdata script for metrics report
c290eaed8 kata-sys-util: protection: Update TDX checks
d7a996c68 gha: Update to checkout@v3 action
c2ba29c15 runtime: Fix data race in ioCopy
211de08d9 osbuilder: Remove chcon operation for guest SELinux
9f21fa9b3 metrics: Add report generator link to general documentation
c0ed5ea0a metrics: Add README for kata metrics report
a7b59a5bf metrics: Add limit for 90 percentile for qemu value
99db6568e metrics: Add limit for write 90 percentile value for clh
6e06392c5 metrics: Enable FIO limits for kata metrics
2e4c87472 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
21204caf2 runtime: fail early when starting docker container with FC
32fd01371 runtime: run prestart hooks before starting VM for FC
00e7ffd98 tests: check vmx only on Intel machines
c8dd3c073 metrics: Fix memory footprint qemu limit
8877ec62f metrics: Fix memory inside limits for kata metrics
80146f207 tests: Fixes cpuType check on AMD machines
7e364716d metrics: Add test setup details to metrics report
17dc1b976 metrics: Add boot lifecycle times to metrics report
3b0d6538f metrics: Add memory inside container to metrics report
79fbb9d24 metrics: Add scaling system footprint in metrics report
8e6d4e6f3 metrics: Add metrics reportgen
139ffd4f7 metrics: Add report file titles
878d1a2e7 metrics: Generate PNGs alongside the PDF report
fce248797 metrics: Add metrics report R files
08812074d metrics: Add report dockerfile
69781fc02 metrics: Add metrics report script
e286e842c tests: Expand confidential test to support TDX
e31f099be tests: Expand confidential test to support SNP
c3b9d4945 tests: Add confidential test for SEV
538c965c2 metrics: fix parsing issue on memory-usage test
3818bf331 local-build: Remove $HOME/.docker/buildx/activity/default
d1b54ede2 qemu: tdx: Workaround SMP issue with TDX 1.5
1e34220c4 qemu: tdx: Adapt to the TDX 1.5 stack
8115a0522 versions: tdx: Update Kernel to 6.2 + TDX
ec18180f3 versions: tdx: Update TDVF to the "edk2-stable202302"
9803b2428 versions: tdx: Update QEMU to v7.2 + TDX v1.10
dffc16e5b runtime-rs: check peer close in log_forwarder
aaa5ab126 agent: simplify storage device by removing StorageDeviceObject
fb49d5d7c gha: Avoid "fail-fast" in tests that are known to be flaky
183f51d6f tests: use unique test name
6a974679f tests: delete k8s deployment at the test's end
32a778b6d metrics: Remove unused variable in tensorflow nhwc script
d8f3ce649 kata-deploy: Don't try to remove /opt/kata
936e8091a gha: vfio: Run on Ubuntu 23.04 runner
0e7248264 agent: move storage device related code into dedicated files
268e84655 runtime-rs: Fix volumes and rootfs cleanup issues
8f49ee33b agent: refine storage related code a bit
60ca12ccb agent: switch to new storage subsystem
fcbda0b41 kata-types: introduce StorageDevice and StorageHandlerManager
b03b1f613 agent: simplify the way to manage storage object
8392c71bf sys-util: support more mount flags in parse_mount_options()
c00d8f3d4 agent: use create_mount_destination() from kata-sys-util
5e867f053 types: add more mount related constants
880e6c9a7 agent: use function from kata-sys-utils to reduce code
3b881fbc0 local-build: Remove GID before creating group
959ca4944 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
4b7d72c4a metrics: Add TensorFlow ResNet50 FP32 benchmark
5cba38c17 kata-deploy: Avoid failing on content removal
18d42da21 runtime/fc: fix image/initrd annotation handling
9fda7059a runtime/clh: fix image/initrd annotation handling
1a0092d63 runtime/qemu: fix image/initrd annotation handling
22d8f335d libs,tests: fix typo disable_guest_seccomp in configuration-anno-1.toml
8afd158ce metrics: Add disk link to README
40914b25d kata-agent: use default filemode for block device when it is set to 0
eee2ee6ee metrics: Fix FIO path
39bc3488f metrics: Use function from metrics common in pytorch script
400eb8874 gha: capture additional kata-deploy output
4aee3eade kata-types: implement serde methods for KataVirtualVolume
b875e3932 kata-types: validate KataVirtualVolume object
fa2fdc105 kata-types: implement two conversion helpers for KataVirtualVolume
6326af20e kata-types: introduce KataVirtualVolume
c8b43f8b3 metrics: Fix README for pytorch
fb571f8be metrics: Enable kata runtime in K8s for FIO test.
cb056f8cb rootfs: agent: Policy support with AGENT_INIT=yes
85c02828e metrics: Update tensorflow name in gha run script
e8a511934 metrics: Fix check results for tensorflow benchmark
2d896ad12 gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
4ffc2c86f gha: kata-deploy: Add the first kata-deploy test
8616c050a metrics: Remove unused variable in tensorflow mobilenet script
285e616b5 tests: common: Ensure test_type is used as part of the cluster's name
790bd3548 tests: commob: Don't fail if yq is not part of the cache
ce6adecd0 gha: kata-deploy: Add run-kata-deploy-tests.sh
cfc29c11a gha: k8s: Stop running kata-deploy tests as part of the k8s suite
f4dd15286 tests: k8s: Call ensure_yq() in setup.sh
339569b69 kata-deploy: Properly create default runtime class
2a491e9b1 metrics: Fix MobileNet help me description
d19a75e80 gha: ci: Start running kata-deploy tests
d90f7ac68 runtime-rs: add unit test for block driver
e44919f0d runtime-rs: add load_test_config for unit test
7f48a6937 runtime-rs: add driver option
bade6a5c3 docs: Fix TensorFlow word across the document
1a1b20776 docs: Add Tensorflow Resnet50 documentation
24baededc metrics: Add Dockerfile for ResNet50 int8
6d971ba8d metrics: Add Tensorflow ResNet50 int8 benchmark
25d151bd1 runk: Modify kill command's error message for containerd tests
b3592ab25 gha: cri-containerd: Enable tests
84dd02e0f gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
b29782984 gha: cri-containerd: Show pod before deleting it
ae0930824 gha: cri-containerd: Print kata logs in case of error
6c8b2ffa6 gha: cri-containerd: Group containerd logs
9e898701f gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
76dac8f22 agent: simplify error handling
18a7fd8e4 metrics: Rename tensorflow scripts
e55fa93db tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
d9ee17aae tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
ab829d103 agent: runtime: add the Agent Policy feature
831e73ff9 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
af1b46bbf tests: Add gha-run-k8s-common.sh
416445e7e docs: Remove installation step in virtcontainers doc
72cbcf040 kata-deploy: Add k0s support
767434d50 metrics: fix the loop used to stop kata components #76295d0f0d43c metrics: Add cassandra statefulset yaml
c1dcc1396 metrics: Add cassandra service yaml
2297a0d1c metrics: Add block loop pvc yaml for cassandra
e3d511946 metrics: Add block loop pv yaml for cassandra test
989027159 metrics: Add block loop pvc for cassandra test
349b89969 metrics: Add Cassandra Kubernetes benchmark for kata metrics
c52d09052 gha: static-checks: Move to the Azure instances
8815ed066 runtime: Remove config warnings
afe1a6ac5 agent: support copying of directories and symlinks
ab13ef87e runtime: propagate configmap/secrets etc changes for remote-hyp
c074ec4df runtime: Copy shared files recursively
fdcd52ff7 metrics: Add check containers are running in tensorflow mobilenet
36337ee14 metrics: Add check containers are up in tensorflow script
f700f9b0b metrics: Remove unused variable in tensorflow script
833cf7a68 metrics: Add check containers are running function
918c78308 metrics: Add check containers are up in tensorflow mobilenet script
9d57a1fab metrics: Use check containers are up in tensorflow script
1c84680d8 metrics: Add check containers are up in common script
d3e57cf45 metrics: Use collect_results function in tensorflow mobilenet test
286de046a metrics: Remove collect results function definition
9879709aa metrics: Add common functions to the common script
4746fa3da docs: Specify supported Firecracker version using `versions.yaml`
cc922be5e versions: Update firecracker version to 1.4.0
39e67b06e dragonball: vsock add fifo/pipe stream support for passed fd hybridStream
473b0d3a3 metrics: compute tensorflow statistics
03d1fa67b ci: unencrypted-image: Fix build context
eb463b38e ci: unencrypted-image: Don't fail to build on s390x
a2d731ad2 ci: create-confidential-image: Add dependent actions
d1a629622 metrics: Add nginx documentation to network README
498f7c054 metrics: Add nginx kubernetes yaml
f8a5255cf metrics: Add network nginx benchmark
43fe5d1b9 ci: k8s: tees: Ensure PR_NUMBER is exported
54f6a7850 ci: {{ pr-number }} should be {{ inputs.pr-number }}
034d7aab8 tests: k8s: Ensure the runtime classes are properly created
fac8ccf5c ci: Add build-and-publish-tee-confidential-unencrypted-image
ab5f603ff ci: k8s: Add the image used for unencrypted confidential tests
1e8fe131b k8s: tests: Take advantage of `SHIMS` and `DEFAULT_SHIM` env vars
729b2dd61 agent: avoid creating new `Vec` instances when easily avoidable
aeaec9dae tests: upgrade bats version
e66496986 metrics: install kata once and run multiple checks
baabfa9f1 agent: refine implementation of mount related code
98ba211a3 agent: fix a bug in update_ephemeral_mounts()
5333618d7 agent: make add_storage() take &[Storage] instead of Vec<Storage>
37f34781d agent: simplify function online_cpu_memory()
d3c542237 agent: refine style of code related to sandbox
71a9f6778 agent: avoid unwrap() in function do_remove_container()
84badd89d agent: avoid clone objects when possible
b23c5ed15 deps: Bump dependent crate versions
863283716 metrics: General improvements to mobilenet tensorflow test
3c319d8d4 metrics: Add iperf to gha run script
5b5caf890 gha: Add iperf network metrics
66db5b535 metrics: Add latency test to network README
c36572418 agent: avoid unnecessary calls to `Arc::clone`
4fbe0a3a5 runtime: bind-mount mounted block device into container
7e1b1949d runtime: add support for kata overlays
6c867d9e8 agent: add io.katacontainers.fs-opt.overlay-rw option
6163c3565 agent: skip mount options that start with "io.katacontainers."
b2ff97aa0 dragonball: use version 0.10.4 of `fuse-backend-rs`
845eeb4d7 agent: Allow clippy::redundant_clone in the unit tests
1163fc9de release: Revert kata-deploy changes after 3.2.0-rc0 release
3958a39d0 runtime-rs: Introduce directly attachable network
1e15369e5 metrics: Improve naming testing containers in launch times test
5dbe88330 metrics: Clean kata components before start a metric test.
3b45060b6 metrics: Add latency server yaml
9bb8451df metrics: Add latency client yaml
64fdb9870 metrics: Add network latency test
a81ad3b58 runtime-rs: Add block device handling in cloud hypervisor
3230dec95 kata-deploy: Use host's systemctl
1b21a4624 docs: Use control-plane term instead of master
28e5e9c86 runtime-rs: fix number of queues handling in dragonball share fs device
f1d8de9be runk: Allow runk to launch a container without pid namespace
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is basically to make sure that folks trying to use the kata-deploy
script from the main branch, to deploy **stable** kata-deploy images, do
not have a hard time.
Fixes: #7194
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Remove the ability to block access to kata agent endpoints by using
agent-config.toml. That functionality is now implemented using the
Agent Policy feature (#7573).
The CCv0 branch relied on blocking endpoints using agent-config.toml
but will set-up an equivalent default policy file instead (#8219).
Fixes: #8228
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add the missing closing bracket to the output of the TDX details,
so rather than:
```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0"
: ^
: Missing ')' !
```
... we now have:
```bash
$ sudo kata-ctl env 2>/dev/null | grep available_guest_protection
available_guest_protection = "tdx (major_version: 1, minor_version: 0)"
: ^
: Aha!
```
Added a unit test for this scenario.
Fixes: #8257.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
If you attempt to create a container (a TD) on a TDX system using a
custom build of Cloud Hypervisor (CH) that was not built with the `tdx`
CH feature, Kata will report the following, somewhat cryptic, CH error:
```
ApiError(VmBoot(InvalidPayload))
```
Newer versions of CH now report their build-time features in the ping
API response message so we now use that, if available, to detect this
scenario and generate a user-friendly error message instead.
This changes improves the readability of `handle_guest_protection()` and
adds a couple of additional tests for that method.
Fixes: #8152.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Improve the way `handle_guest_protection()` is structured by inverting
the logic and checking the value of the `confidential_guest` setting
before checking the guest protection. This makes the code easier to
understand.
> **Notes:**
>
> - This change also unconditionally saves the available guest protection
> (where previously it was only saved when `confidential_guest=true`).
> This explains the minor unit test fix.
>
> - This changes also errors if the CH driver finds an unexpected
> protection (since only Intel TDX is currently tested).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR adds the iperf udp benchmark for bandwdith measurement
for network metrics.
Fixes#8246
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Encode policy file during test - easier to understand than hard-coding
the encoded file contents.
Fixes: #8214
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR corrects the init env() helper function, to make that
systemctl always returns true when enumerating masked services,
and preventing the test from failing
Fixes: #8242
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR fixes the correct path to setup, build and install properly
the kernel for snp.
Fixes#8156
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
1. enable virtio-fs-pro in Dragonball to have the ability to process nydus backend registry
2. change passthrough for rw layer's readonly config to false to have the accurate read write ability.
Fixes:#8013
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Since Nydus snapshotter has been updated in previous commits, there is a
problem that the config passthrough to Dragonball during mount_rafs is
RafsConfig instead of ConfigV2, but Dragonball could only serde ConfigV2
so it will panic.
We need to add the support for RafsConfig
Fixes:#8013
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Allow access to the ReseedRandomDev endpoint by default. Using false
for ReseedRandomDevRequest was unintended.
Fixes: #8225
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.
Fixes: #8218
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes trailing commas so that the json results
file is valid.
This PR also changes the way data results are collected by
terating through the array of memory values to calculate
their average.
Fixes: #8204
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gives a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):
```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```
The hypervisor name is now validated so that the behaviour becomes:
```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```
Fixes: #8212.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
As the file is already part of the kata-containers repo, and the tests
repo is about to become read-only, we're good to drop the tests
references from here and use everything coming from the
`kata-containers` repo instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've moved all the tests to the `kata-containers` repo, the `tests`
repo will become a read-only repo.
Fixes: #8200
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Improve the `GuestProtection` handling to detect the version of
Intel TDX available.
The TDX version is now logged by the Cloud Hypervisor driver.
Fixes: #8147.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Direct-volume needs to use the same base64 character set as
kata-runtime/direct-volume does.
Fixes: #8175
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.
Fixes: #8159
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This change adds support for adding and removing vfio devices for
cloud-hypervisor.
Fixes: #6691
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
To pick up fix for the following issue:
A maliciously crafted HTTP/2 stream could cause excessive CPU
consumption in the HPACK decoder, sufficient to cause a denial of
service from a small number of small requests.
Fixes: #8190
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This PR adds a trap whenever the scrip exits, it deletes the iperf
k8s deployment and k8s services, and deletes the kata components.
This way, when the script finishes, it verifies that there are
indeed no kata components still running.
Fixes: #8126
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The KUBERNETES variable is mostly used by kata-deploy whether to apply
k3s specific deployments or not. It is used to select the type of
kubernetes to be installed (k3s, k0s, rancher...etc) and it is always
set on CI. Running the script locally we want to set a value by default
to avoid `KUBERNETES: unbound variable` errors.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The exec_host() simply fails on cluster with multi-nodes because
`kubectl get node -o name" will return a list o names. Moreover, it will
return control nodes names which usually don't have kata installed.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.
This is useful for tests that should run on a determined worker
node on a multi-nodes cluster.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Let KATA_HYPERVISOR be qemu by default in gh-run.sh as this variable
is required to tweak some configurations of kata-deploy.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The deploy-kata() of gha-run.sh will wait for 10 minutes for the kata
deploy installation finish. This allow users of the script to overwrite
that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fixed a couple of warns shellcheck emitted and disabled others:
* SC2154 (var is referenced but not assigned)
* SC2086 (Double quote to prevent globbing and word splitting)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The .tests/integration/kubernetes/gh-run.sh script run `yq write` a
couple of times to edit the kata-[deploy|cleanup].yaml, resulting
on the file being formatted again. This is annoying because leaves
the git tree dirty.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The only difference to the other platforms is that it needs to
export KUBECONFIG.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other deploy kata for
bare-metal (e.g. sev, tdx...etc) except that KUBECONFIG
should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other clean up for bare-metal (e.g. sev,
tdx...etc) except that KUBECONFIG should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of CI context, a developer might just use the latest
image which in this case will be
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.
Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
GHA has an undocumented limitation that there can be at most 20
referenced yamls in a single yaml file. We workaround it by combining
multiple jobs into a single yaml file.
Fixes: #8161
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Enable the Cloud Hypervisor driver (the `cloud-hypervisor` build feature) for the rust runtime.
Fixes: #6264.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor the tests.
Fixes: #8116
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The runk test has been executed as part of the former "ubuntu" jenkins
CI.
We're porting it to GHA and running it against LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The tracing tests are currently running as part of the Jenkins CI with
the following setups:
* Container Engines: containerd
* VMMs: QEMU | Cloud Hypervisor
* Snapshotters: overlayfs | devmapper
We'll be restricting those tests to be running on LTS version of
containerd, without devmapper.
As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next interations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we'll use any arm64 machine that's added as a runner, and
whenever new machines are added those may end up being only used for
running some specific set of the tests.
Fixes: #8109
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR enables the use of jq pretty-print feature to
improve the formatting of metric results json files.
Fixes: #8081
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
FIO benchmark is enabled to measure IO in Kata
at different latencies using containerd client,
in order to complement the CI metrics testing set.
This PR asl deprecated the previous Fio bench
based on k8s.
Fixes: #8080
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```
With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading us to an empty stringm causing then
the error in the CI.
Fixes: #8105
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've added two new containerd builder images recently, one for the
components under `src/tools` and another one for the Kata Containers
agent.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This follows what we've been doing for all the components we're
building, but was missed as part of #8077.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The kata-agent binary won't be released, just built so it can be used,
later on, as part of our tests and as part of the rootfs build.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed functions to start building the kata-agent, with or
without the OPA support.
For now this build is not used as part of the rootfs build, but later on
this will (not as part of this series, though).
Fixes: #8099
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will help to build the agent binary as part of the kata-deploy
localbuild, as we need to pass the DESTDIR to where the agent will be
installed, and also whether we're building the agent with policy support
enabled or not.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The k8s.gcr.io is deprecated for a while now and has been redirected to
registry.k8s.io. However on some bare-metal machines in our testing
pools that redirection is not working, so let's just replace the
registries.
Fixes#8098
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit b2c3bca558c38deff2117d5909d9071c23c05590)
Spell check failed with:
```
[kata-spell-check.sh:275] WARNING: Word 'overcommitment':
did you mean one of the following?: over commitment, over-commitment,
commitment
```
So update this to pass the static checks
Fixes: #
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Ensure that our documented crictl pod config file contents have
uid and namespace fields for compatibility with crictl 1.24+
This avoids a user potentially hitting the error:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"
getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
Fixes: #8092
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
(cherry picked from commit 8f8c2215)
We need to do proper sandbox sizing when we're doing cold-plug introduce CDI,
the de-facto standard for enabling devices in containers. containerd
will pass-through annotations for accumulated CPU,Memory and now CDI
devices. With that information sandbox sizing can be derived correctly.
Fixes: #7331
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Let's add targets and actually enable users and oursevles to build those
components in the same way we build the rest of the project.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we'd like to ship the content from src/tools, we need to build them
in the very same way we build the other components, and the first step
is providing scripts that can build those inside a container.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The kata-monitor tests is currently running as part of the Jenkins CI
with the following setups:
* Container Engines: CRI-O | containerd
* VMMs: QEMU
When using containerd, we're testing it with:
* Snapshotter: overlayfs | devmapper
We will stop running those tests on devmapper / overlayfs as that hardly
would get us a functionality issue.
Also, we're restricting this to run with the LTS version of containerd,
when containerd is used.
As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next iterations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will serve us quite will in the upcoming tests addition, which will
also have to be executed using CRi-O.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, woth LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Allow Cloud Hypervisor to create a confidential guest (a TD or
"Trust Domain") rather than a VM (Virtual Machine) on Intel systems
that provide TDX functionality.
> **Notes:**
>
> - At least currently, when built with the `tdx` feature, Cloud Hypervisor
> cannot create a standard VM on a TDX capable system: it can only create
> a TD. This implies that on TDX capable systems, the Kata Configuration
> option `confidential_guest=` must be set to `true`. If it is not, Kata
> will detect this and display the following error:
>
> ```
> TDX guest protection available and must be used with Cloud Hypervisor (set 'confidential_guest=true')
> ```
>
> - This change expands the scope of the protection code, changing
> Intel TDX specific booleans to more generic "available guest protection"
> code that could be "none" or "TDX", or some other form of guest
> protection.
Fixes: #6448.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Introduce a few new constants (for PCI segment count and FS queues) and
move the disk queue constants to `convert.rs` to allow them to be used
there too.
> **Note:**
>
> This change gives the `ShareFs` code it's own set of values rather
> than relying on the disk queue constants.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Modify the Cloud Hypervisor `add_device()` method to add `ShareFs` and
`Network` devices to the list of pending devices since only these two
device types need to be cached before VM startup. Full details in the
comments.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Remove the `VIRTIO_BLK_MMIO` check which appears to have been added
erroneously in the first place.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This patch re-generates the client code for Cloud Hypervisor v35.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #8057
Signed-off-by: Bo Chen <chen.bo@intel.com>
We've faced this as part of the CI, only happening with the CRI-O tests:
```
not ok 1 Test readonly volume for pods
# (from function `exec_host' in file tests_common.sh, line 51,
# in test file k8s-file-volume.bats, line 25)
# `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
# [bats-exec-test:38] INFO: k8s configured to use runtimeclass
# bash: line 1: $'\r': command not found
#
# Error from server (NotFound): pods "test-file-volume" not found
```
I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just trail the '\r', as long as all the tests keep
passing on containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.
A huge thanks to Greg Kurz for spotting this and suggesting the fix.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Some of the "k8s distros" allow using CRI-O in a non-official way, and
if that's done we cannot simply assume they're on containerd, otherwise
kata-deploy will simply not work.
In order to avoid such issue, let's check for `cri-o` as the container
engine as the first place and only proceed with the checks for the "k8s
distros" after we rule out that CRI-O is not being used.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.
Fixes#8017
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The permissions on .docker/buildx/activity/default are regularly broken by us
passing docker.sock + $HOME/.docker to a container running as root and then
using buildx inside. Fixup ownership before executing docker commands.
Fixes: #8027
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
To avoid the failure of not finding pandoc command this PR adds that
package as a dependency for static checks.
Fixes#8041
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The cgroup stats come from resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method incorrectly was expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.
Fixes#8035
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
Firecracker supports noflush semantic via Unsafe cache type.
There is no support for direct i/o, remove it from config file
Fixes: #7823
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Clh suports direct i/o for disks. It doesn't
offer any support for noflush, removed passing
of option to cloud-hypervisor internal config
Fixes: #7798
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Seems like the static checks are failing due the missing of the hunspell
package this PR fixes that.
Fixes#8019
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The test has been added to the repo, but we have to also add it to the
list of jobs to be executed.
Fixes: #8005
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we'll also be testing k8s using CRI-O.
For now, we'll only be running the CRI-O test with QEMU. Once it
becomes stable we can expand this to other Hypervisors as well.
Fixes: #8005
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Support quoted kernel command line parameters that include space
characters. Example:
dm-mod.create="dm-verity,,,ro,0 736328 verity 1
/dev/vda1 /dev/vda2 4096 4096 92041 0 sha256
f211b9f1921ef726d57a72bf82be23a510076639fa8549ade10f85e214e0ddb4
065c13dfb5b4e0af034685aa5442bddda47b17c182ee44ba55a373835d18a038"
Fixes: #8003
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This is part of a bigger effort to drop gogoprotobuff from our code
base. IIUC, those options are basically used by *pb_test.go, and since
we are dropping gogoprotobuff and those are auto generated tests, let's
just remove it.
Fixes#7978.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll be testing kata-deploy with different kubernetes flavours as part
of our GARM tests, and this is a place-holder for this.
Once enabled, we'll do nothing, just `return 0`, so we can then properly
add the tests after this commit gets merged.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
This was also done as part of fa62a4c01b,
for the k8s tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characteres allowed for the cluster
name. With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.
This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.
The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.
Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.
Fixes: #7982
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.
The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
/rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
error: Failed to Mount backend
...
Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will ensure we're testing with the correct runtime, instead of
using the `default` one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fix the arch error when downloading the nydus tarball.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Steven Horsman <steven@uk.ibm.com>
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
And with this we finally enable the nydus tests to run as part of our
GHA CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.
Now, let's install kata, as expected. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.
Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"
getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we canoot properly start the nydus snapshotter, nor properly
kill it after it's been started.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing another dependencies from
GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing another dependencies from GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarh64.
With this patch, when building for x86_64, I can see the this is the
resulting config:
```
$ ARCH=amd64 make
...
$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false
```
And when building for aarch64:
```
$ ARCH=arm64 make
...
$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```
Fixes: #7941
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.
The HMP monitor allows to temper with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.
The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.
A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".
An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.
While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.
This feature is only made visible in the base and GPU configurations
of QEMU for now.
Fixes#7952
Signed-off-by: Greg Kurz <groug@kaod.org>
Now that the static-checks job only takes care of running the
static-checks, let's clean it up, remove all the unneeded steps, make
sure that we're using the actions in their latest version, and have it
running in a cost free runner.
At some point I'd like to see those tests done in parallel, in the same
way that I've organised the build-checks, but that's something for
someone else, at some other time.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With this we're removing the dragonball static-checks CI, as the test is
running here now. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `config`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This makes it pssible to run the tests in the cost free runners, which
are not KVM capable.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `version`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise `make test` will fail with:
```
error[E0583]: file not found for module `version`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
clippy is used as part our tests, so it's useful to have it installed
while we're already installing rust.
In case of developers, they also better be using it. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to the static-check jobs, those jobs can be run on the zero
cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll use it as part of the refactoring we're doing in the static check
tests.
I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It doesn't make sense to run this for all the bits of the matrix,
neither it's demanding enough to require running this in one of our
Azure sponsored runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.
Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that required the normal Azure instance (D4s_v5)
With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.
Fixes: #7972
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.
Support for adding and removing network devices is not really added to
the resource manager, so supporting this for cloud-hypervisor is not
scoped in this PR.
This also changes "pending_devices" for clh implementation from an
Option of vector to simply a vector. This simplifies the structure a bit
as we can simple iterate over the pending devices instead of having to
check for a "Some" value as this is not really required.
Fixes: #6333
Signed-off-by: Shuaiyi Zhang <zhang_syi@qq.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
We've removed this in the part 2 of this effort, as we were not caching
the sha256sum of the component. Now that this part has been merged,
let's get back to checking it.
Fixes: #7834 -- part 3
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
That's not needed anymore, as we've switched to using ORAS and an OCI
registry to cache the artefacts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is something that was done by our Jenkins jobs, but that I ended up
missing when writing d0c257b3a7.
Now, let's also add the sha256sum to the cached artefact, and in a
coming up PR (after this one is merged) we will also start checking for
that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In the previous series related to the artefacts we build, we've
switching from storing the artefacts on Jenkins, to storing those in the
ghcr.io/kata-containers/cached-artefacts/${artefact_name}.
Now, let's take advantage of that and actually use the artefacts coming
from that "package" (as GitHub calls it).
NOTE: One thing that I've noticed that we're missing, is storing and
checking the sha256sum of the artefact. The storing part will be done
in a different commit, and the checking the sha256sum will be done in a
different PR, as we need to ensure those were pushed to the registry
before actually taking the bullet to check for them.
Fixes: #7834 -- part 2
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
cloud hypervisor on arm64 only support arm AMBA UART(pl011) as
tty. So, the console should be set to "ttyAMA0" instead of "ttyS0"
when enable hypervisor debug mode.
Fixes: #5080
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI
We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's push the artefacts to ghcr.io and stop relying on jenkins for
that.
Fixes: #7834 -- part 1
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now this is not used, but it'll be used when we start caching the
artefacts using ORAS.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ORAS is the tool which will help us to deal with our artefacts being
pushed to and pulled from a container registry.
As both the push to and the pull from will be done inside the
kata-deploy binaries builder container, we need it installed there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the environment variables are now being passed down from the GitHub
Actions, let's make sure they're exposed to the container used to build
the kata-deploy binaries, and during the build process we'll be able to
use those to log in and push the artefacts to the OCI registry, using
ORAS.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We do the build of our artefacts inside a container image, and we need
to expose some env vars to the container so ORAS can be used there to
push the artefacts we want to cache to ghcr.io.
The env vars we're exposing are:
* ARTEFACT_REGISTRY: The registry where we're going to save the
artefacts.
* ARTEFACT_REGISTRY_USERNAME: The username to log in to the registry, as
ORAS does not use the same json file used by docker.
* ARTEFACT_REGISTRY_PASSWORD: The pasword to log in to the the registry,
as the ORAS does not use the same json file used by docker.
* TARGET_BRANCH: The target branch, which will be part of the tag of the
artefact, as we may end up caching the artefacts for both main and
stable branches.
Fixes: #7834 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Sometimes the test gets stuck running commands in the container - need to
investigate why later.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Cloud Hypervisor exposes a VIRTIO_IOMMU device to the VM when IOMMU support is
enabled. We need to add it to the whitelist because dragonball uses kernel
v5.10 which restricted VIRTIO_IOMMU to ARM64 only.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.
Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.
CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
These test cases shows which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio not supported yet.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly bad when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.
Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This imports the vfio test scripts github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.
This is mostly needed because the nerdctl-full tarball doesn't provide a
contaienrd configuration, just the binary, as contaienrd does not
actually require a configuration file to run with the default config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In the RemoveEndpoints(), when the endpoints paramete isn't empty,
using idx may result in wrong endpoint removals. To improve,
directly passing the endpoint parameter helps
locate the correct elements within n.eps.
Fixes: #7732
Signed-off-by: shixuanqing <1356292400@qq.com>
Fixes: #7732
Signed-off-by: shixuanqing <1356292400@qq.com>
Update src/runtime/virtcontainers/network_linux.go
Co-authored-by: Xuewei Niu <justxuewei@apache.org>
Fix kernel and initrd annotations in the k8s tests on Mariner. These
annotations must be applied to the spec.template for Deployment, Job
and ReplicationController resources.
Fixes: #7764
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR ensures that docker is running as part of the init_env function
in kata metrics to avoid failures like docker is not running and making
the kata metrics CI to fail.
Fixes#7898
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.
Fixes: #7920
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
There's no reason to wait till the payload is created to run the tests,
as we rely on the tarball, not on the kata-deploy payload.
That was a mistake on my side, and that's already fixed for the nerdctl
tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a very basic sanity test to check that we can spawn a
containers using nerdctl + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7911
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a very basic sanity test to check that we can spawn a
containers using docker + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the any combinator.
Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Allow `clippy::redundant-closure-call`
which has issues with the guard function passed into
the `run_if_auto_values` macro
Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The bindgen generated code is triggering lots of
ambiguous-glob-reexports warnings in rust 1.70+
Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- In rust 1.72, clippy warned clippy::non-minimal-cfg
as the cfg has only one condition, so doesn't
need to be wrapped in the all combinators.
Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Allow `clippy::redundant-closure-call` in `from_cmdline`
which has issues with the guard function passed into
the `parse_cmdline_param` macro
Fixes: #7902
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When creating a new endpoint, we check existing endpoint names and automatically adjust the naming of the new endpoint to ensure uniqueness.
Fixes: #7876
Signed-off-by: shixuanqing <1356292400@qq.com>
This commit allows us to specify the huge page backend when enabling huge
page. Currently, we support two backends: thp and hugetlbfs, the default
is hugetlbfs.
To ensure backward compatibility, we introduce another configuration item
"hugepage_type" to select the memory backend, which is available only when
"enable_hugepages" is true. Besides, we add an annotation
"io.katacontainers.config.hypervisor.hugepage_type" to configure huge page
type per pod.
Fixes: #6703
Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
As, regardless of what's mentioned in the documentation, it seems that
$GITHUB_REF_NAME is passed down as a literal string.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The ones for the payload-after-push.yamland ci-nightly.yaml are not that
much important right now, but they're needed for when we start running
those on stable branches as well.
The other ones were missed during
bd24afcf73.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.
Fixes#7894
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
To support the removal of the `initcall_debug` and `earlyprintk=`
options from the default guest kernel cmdline, add `kernel_params` to the list
of enabled annotations to allow those kernel options (or others) to be
set using `kata-deploy` for either runtime.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Removed the following kernel command line options:
- `earlyprintk=ttyS0`
- `initcall_debug`
Both these options are only useful when debugging a guest kernel failure
which is not a common occurrence.
Further, the `earlyprintk=` option can have a large negative performance
impact (it can increase the VM boot time significantly).
If the user wishes to use either of these options, they can add them to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.
Fixes: #7886.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
All the patches have already been merged upstream and they've just been
cherry-picked to this branch.
Fixes: #7885
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're bumping here in order to make our lives easier backporting EROFS
patches needed for the CC related work.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Removed the addition of the `initcall_debug` kernel option when agent
debugging enabled. This option has nothing to do with the agent.
If the user wishes to use this option, they can add it to the
`kernel_params=` setting in the Kata configuration file's hypervisor
stanza.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Similarly to what's been done for x86_64 -> amd64, we need to do a
aarch64 -> arm64 change in order to be able to download the kubectl
binary.
Fixes: #7861
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
1. Directly support CgroupManager::freeze through systemd API.
2. Avoid always passing unit_name by storing it into DBusClient.
3. Realize CgroupManager::destroy more accurately by killing systemd unit rather than stop it.
4. Ignore no such unit error when destroying systemd unit.
5. Update zbus version and corresponding interface file.
Acknowledgement: error handling for no such systemd unit error refers to
Fixes: #7080, #7142, #7143, #7166
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
We're changing what's been done as part of ac939c458c, as we've
notcied issues using `github.event.pull_request.merge_commit_sha`.
Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/
In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This syntax belongs to the legacy C virtiofsd implementation that
we don't support anymore since kata-containers 3.1.3 because
of other API breaking changes.
People have been warned to switch from "none" to "never" since
kata-containers 2.5.2. Let's officially do that.
The compat code that would convert "none" to "never" isn't
needed anymore. Just drop it.
Fixes#7864
Signed-off-by: Greg Kurz <groug@kaod.org>
gogo.nullable is the main gogo.protobuf' feature used here. Since we are
trying to remove gogo.protobuf, the first reasonable step seems to be
remove this feature. This is a core update, and it will change how the
structs are defined. I could spot only a few places using those structs,
based on make check/build.
Fixes#7723.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
There is no reference to PROTO_FILE and this is not working. Also we are
not inside a Makefile, so makes sense to adapt the usage to reflect the
script instead of a make command.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
import_path is used as the default package when no input files specify
go_package. However, all the files we are currently building already
have a go_package definition, making this behavior both redundant and
error-prone.
Additionally, one of our files (types.pb.go) resides outside the grpc
directory, indicating that it's indeed ignored but also inconsistent.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Currently, the script searches for .proto files within $GOPATH/.
Consequently, modifications to a definition file in the current working
directory won't influence the output .pb.go if the directory is outside
of $GOPATH. For developers, it's more intuitive to alter the local
codebase than the version stored in $GOPATH.
With this modification, the generated .pb.go files will be relative to
the current working directory, removing the need to clone this project
under $GOPATH/src/github.com/kata-containers.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
The definitions are already specified in the .proto files using the
go_package option. Centralizing them in one location reduces the
potential for errors and simplifies the script.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.
We're only touching the files here that were touched in the previous
commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc for the runtime class)
instead of `firecracker`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
That's what we've been using as part of Jenkins, so let's ensure things
will work as they did before, and only after that consider upgrading the
base OS used for the tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
GARM runners do not come with the whole set of tools we need, or are
used to when it comes to the GHA runners, so we need to manually install
bats on those.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR changes the order in which the FIO test first
cleans the environment and then checks if the environment
is indeed clean.
Fixes: #7869
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
As we were using `tee` without the `-a` (or `--apend`) aptton, the
containerd config would be overwritten, leading to a NotReady state of
the Node.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```
The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.
Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.
As --write-config-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`wait` waits for a job to complete, not a number of seconds. Not sure
how I got that wrong in the first place, but it's what it's.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR replaces the ubuntu image for one which has TensorFlow optimized
for kata metrics.
Fixes#7866
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, the dbs-upcall features have 2 problems that are needed to be
fixed :
There are redundant dbs-upcall features that are needed to be removed.
Some place should be controlled by dbs-upcall but not being implemented.
This commit will fix those two problems.
fixes: #6878
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Let's enable the devmapper kubernetes tests to match exactly what's been
tested as part of the Jenkins CI.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.
This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers. OTOH, this is exactly what we've always been doing as
part of the tests.
We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.
It's important to note that the devmapper setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.
As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, and which has been proven to be really good on
getting things rotten.
With this in mind, I'm taking the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.
It's important to note that the k3s setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.
With this in mind, let's allow users to pass whatever thet need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.
Fixes: #7853
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR is to skip installing docker-compose-plugin while buiding a `build-kata-deploy` image for s390x|ppc64le.
It is a temporary solution to fix current CI failures for s390x regarding `hash sum mismatch`.
Fixes: #7848
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.
Fixes#7842
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Previously, static-checks-dragonball is using machines from Alibaba
Cloud to run all the CI jobs.
Currently, we are going through an internal process to apply for the new
machines for Dragonball CI. Before the internal process is over, we will
temporarily use Azure VM to run static-checks-dragonball jobs.
fixes: #7838
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Currently, virtio_vsock are still outside of the device
manager. This causes some management issues,such as the
inability to unify PCI address management.
Just do some work for hybrid vsock.
Fixes: #7655
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Pass --owner and --group to the tar invokation to prevent gihtub runner user
from leaking into release artifacts.
Fixes: #7832
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
When a storage device is used by more than one container, the second
and forth instances will cause storage device reference count leakage,
thus cause storage device leakage. The reason is:
add_storages() will increase reference count of existing storage device,
but forget to add the device to the `mount_list` array, thus leak the
reference count.
Fixes: #7820
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX
Also, add an example of policy rejecting ExecProcessRequest.
Fixes: #7667
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We have two scenarios we care about this, `pull_request` and
`pull_request_target` events triggered a job.
`pull_request` event:
When using the checkout action, it'll already provide a "rebased atop of
main" repo for us, nothing else is needed, and that's basically what we
already have as part of the jobs in our CI.
`pull_request_target` event:
This one is a little bit tricky, as the checkout action, unless passing
a spsecific repo, give us the PR checked out rebased atop of the HEAD of
the PR branch. Jeremi Piotrowski nicely pointed out that we could use
github.event.pull_request.merge_commit_sha instead, which is the result
of the PR's branch with the official repo target branch.
Now, the only cases where the contributor's rebase would still be needed
is when the action itself has been changed.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.
Fixes#7812
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Update the protection checking code to detect newer versions of Intel
TDX (whose userland interface has now stabilised).
> **Note:** that we don't need to retain the existing behaviour since:
>
> - We haven't yet landed the TDX feature (#6448).
> - Systems wishing to use TDX will need to use the latest available
> system components (such as firmware and host kernel).
Also added an explicit TDX unit test.
Fixes: #7384.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
IoCopy is a tricky function (I don't claim to fully understand its contract),
but here is what I see: The goroutine that runs it spawns 3 goroutines - one
for each stream to handle (stdin/stdout/stderr). The goroutine then waits for
the stream goroutines to exit. The idea is that when the process exits and is
closed, the stdout goroutine will be unblocked and close stdin - this should
unblock the stdin goroutine. The stderr goroutine will exit at the same time as
the stdout goroutine. The iocopy routine then closes all tty.io streams.
The problem is that the stdout goroutine decrements the WaitGroup before
closing the stdin stream, which causes the iocopy goroutine to race to close
the streams. Move the wg.Done() of the stdout routine past the close so that
*this* race becomes impossible. I can't guarantee that this doesn't affect some
unspecified behavior.
Fixes: #5031
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
If we are running FC hypervisor, it is not started when prestart hooks
are executed. So we should just ignore such error and just go ahead and
run the hooks.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
FC does not support network device hotplug. Let's add a check to fail
early when starting containers created by docker.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Add a new hypervisor capability to tell if it supports device hotplug.
If not, we should run prestart hooks before starting new VMs as nerdctl
is using the prestart hooks to set up netns. To make nerdctl + FC
to work, we need to run the prestart hooks before starting new VMs.
Fixes: #6384
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
When running on amd machines, those tests will fail because there is no
vmx flag. Following other tests that checks for cpuType, let's adapt
them to restrict vmx only on Intel machines.
Fixes#7788.
Related #5066
Signed-off-by: Beraldo Leal <bleal@redhat.com>
This PR fixes the memory inside limit for clh for kata metrics due
to the recent changes that we had in the script which impacted
in the performance measurement.
Fixes#7786
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's expand the confidential test to also support TDX.
The main difference on the test, though, is that we're not grepping for
a string in the `dmesg` output, but rather relying on `cpuid` to detect
a TDX guest.
Fixes: #7184
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a test case for the launch of unencrypted confidential
container, verifying that we are running inside a TEE.
Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.
Fixes: #7184
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes an issues in the parsing results stage,
by collecting just the n-results from the n-running
containers, discarding irrelevant data.
Fixes: #7774
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The file can be removed between builds without causing any issue, and
leaving it around has been causing us some headache due to:
```
ERROR: open /home/runner/.docker/buildx/activity/default: permission denied
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The log_forwarder task does not check if the peer has closed, causing a
meaningless loop during the period of “kata vm exit”, when the peer
closed, and “ShutdownContainer RPC received” that aborts the log forwarder.
This patch fixes the problem.
Fixes: #7741
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
Otherwise we'll have to re-run all the tests due to a flaky behaviour in
one of the parts.
Fixes: #7757
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.
Fixes: #7753
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The directory is a host path mount and cannot be removed from within the
container. What we actually want to remove is whatever is inside that
directory.
This may raise errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```
Fixes: #7746
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The vfio test requires nested-nested virtualization:
L0 Azure host
-> L1 Ubuntu VM
-> L2 Fedora VM
-> L3 Kata
This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
There are several processes for container exit:
- Non-detach mode: `Wait` request is sent by containerd, then
`wait_process()` will be called eventually.
- Detach mode: `Wait` request is not sent, the `wait_process()` won’t be
called.
- Killed by ctr: For example, a container runs `tail -f /dev/null`, and
is killed by `sudo ctr t kill -a -s SIGTERM <CID>`. Kill request is
sent, then `kill_process()` will be called. User executes `sudo ctr c
rm <CID>`, `Delete` request is sent, then `delete_process()` will be
called.
- Exited on its own: For example, a container runs `sleep 1s`. The
container’s state goes to `Stopped` after 1 second. User executes
the delete command as below.
Where do we do container cleanup things?
- `wait_process()`: No, because it won’t be called in detach mode.
- `delete_process()`: No, because it depends on when the user executes the
delete command.
- `run_io_wait()`: Yes. A container is considered exited once its IO ended.
And this always be called once a container is launched.
Fixes: #7713
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Refine storage related code by:
- remove the STORAGE_HANDLER_LIST
- define type alias
- move code near to its caller
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Introduce StorageDevice and StorageHandlerManager, which will be used
to refine storage device management for kata-agent.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
Simplify the way to manage storage objects, and introduce
StorageStateCommon structures for coming extensions.
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
docker install now creates a group with gid 999 which happens to match what we
need to get docker-in-docker to work. Remove the group first as we don't need
it.
Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We can simply use `rm -f` all over the place and avoid the container
returning any error.
Fixes: #7733
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.
Make sure annotations are preferred over config options in image and initrd
path handling.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We should make sure annotations are preferred over
config options in image and initrd path handling.
Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.
Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This behaviour is seen from cri-o typically.
Note: this is what runc is doing, which is why regular containers don't have an
issue. This change makes sure kata behaves the same as runc.
Fixes: #7717
Signed-off-by: Julien Ropé <jrope@redhat.com>
Introduce structure KataVirtualVolume to to encapsulate information
for extra mount options and direct volumes, so we could build a common
infrastructure to handle these cases.
Fixes: #7699
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.
This PR also enables FIO metrics test in the kata metrics-ci.
Fixes: #7665
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
When building with AGENT_POLICY=yes and AGENT_INIT=yes:
1. Include OPA and the Policy settings in rootfs.
2. Start OPA from the kata agent.
Before these changes, building with both AGENT_POLICY=yes and
AGENT_INIT=yes was unsupported.
Starting OPA from systemd (when AGENT_INIT=no) was already supported.
Fixes: #7615
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the check results for tensorflow benchmark now
that we change the name of the test.
Fixes#7684
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This test, at least for now, only checks whether the runtimeclasses
have been properly created.
This is just a migration from a test we had as part of the k8s suite.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.
Right now it doesn't have any tests, and the command to actually run the
tests is commented out, but right now this is just a placeholder that
will be populated sooner than later.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's already get rid of this one and avoid
more kata-deploy tests to land here.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
Fixes: #7663
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Let's add the tests as part of the ci.yaml, so they an be triggered as
part of each PR.
For this PR those tests won't be triggered, courtesy to the
`pull_request_target` event we rely on.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the TensorFlow word across the document to have uniformity
across all the document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The error message when the kill command is executed with the container's
state == Stopped should be "container not running" because the containerd
tests expect that OCI runtimes return the error message and compare it.
If the error message is different from the expected one, the tests fail.
Fixes: #7650
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As part of the runners, we're hitting a timeout that I cannot reproduce,
at all, when allocating the same instance and running the tests
manually.
The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.
It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We extend the `Result` and `Option` types with associated types that
allows converting a `Result<T, E>` and `Option<T>` into
`ttrpc::Result<T>`.
This allows the elimination of many `match` statements in favor of
calling the map function plus the `?` operator. This transformation
simplifies the code.
Fixes: #7624
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.
Fixes#7645
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fixes: #7573
To enable this feature, build your rootfs using AGENT_POLICY=yes. The
default is AGENT_POLICY=no.
Building rootfs using AGENT_POLICY=yes has the following effects:
1. The kata-opa service gets included in the Guest image.
2. The agent gets built using AGENT_POLICY=yes.
After this patch, the shim calls SetPolicy if and only if a Policy
annotation is attached to the sandbox/pod. When creating a sandbox/pod
that doesn't have an attached Policy annotation:
1. If the agent was built using AGENT_POLICY=yes, the new sandbox uses
the default agent settings, that might include a default Policy too.
2. If the agent was built using AGENT_POLICY=no, the new sandbox is
executed the same way as before this patch.
Any SetPolicy calls from the shim to the agent fail if the agent was
built using AGENT_POLICY=no.
If the agent was built using AGENT_POLICY=yes:
1. The agent reads the contents of a default policy file during sandbox
start-up.
2. The agent then connects to the OPA service on localhost and sends
the default policy to OPA.
3. If the shim calls SetPolicy:
a. The agent checks if SetPolicy is allowed by the current
policy (the current policy is typically the default policy
mentioned above).
b. If SetPolicy is allowed, the agent deletes the current policy
from OPA and replaces it with the new policy it received from
the shim.
A typical new policy from the shim doesn't allow any future SetPolicy
calls.
4. For every agent rpc API call, the agent asks OPA if that call
should be allowed. OPA allows or not a call based on the current
policy, the name of the agent API, and the API call's inputs. The
agent rejects any calls that are rejected by OPA.
When building using AGENT_POLICY_DEBUG=yes, additional Policy logging
gets enabled in the agent. In particular, information about the inputs
for agent rpc API calls is logged in /tmp/policy.txt, on the Guest VM.
These inputs can be useful for investigating API calls that might have
been rejected by the Policy. Examples:
1. Load a failing policy file test1.rego on a different machine:
opa run --server --addr 127.0.0.1:8181 test1.rego
2. Collect the API inputs from Guest's /tmp/policy.txt and test on the
machine where the failing policy has been loaded:
curl -X POST http://localhost:8181/v1/data/agent_policy/CreateContainerRequest \
--data-binary @test1-inputs.json
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's split a good portion of `tests/integration/kuberentes/gha-run.sh`
out, and put them in a place where they can be used to the soon-to-come
kata-deploy specific tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Remove the installation step in the virtcontainers doc
because the virtcontainers install/uninstall targets have
been removed by 86723b51ae
and they are not used anymore.
Fixes: #7637
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.
Fixes: #7628
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Remove configuration file shared_fs = none warnings
now that there is a solution to updating configMaps, secrets etc
Fixes: #7210
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This patch allows copying of directories and symlinks when
static file copying is used between host and guest. This change is
necessary to support recursive file copying between shim and agent.
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
(cherry picked from commit de232b8030)
For remote hypervisor, the configmap, secrets, downward-api or project-volumes are
copied from host to guest. This patch watches for changes to the host files
and copies the changes to the guest.
Note that configmap updates takes significantly longer than updates via downward-api.
This is similar across runc and Kata runtimes.
Fixes: #7210
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: Julien Ropé <jrope@redhat.com>
(cherry picked from commit 3081cd5f8e)
(cherry picked from commit 68ec673bc4d9cd853eee51b21a0e91fcec149aad)
This PR adds check containers are running in tensorflow mobilenet
that is being defined in common script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are up function from common
in tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are running function the common metrics
script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are up in the common script
in the tensorflow mobilenet script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the check containers are up from the common script
in the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Since the passed fd through unix socket would be any
stream fd such as pipe/fifo fd or any other socket
fd, thus we should deal with it as a normal hybrid
stream instead of a unix stream.
Fixes:#7584
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Allow runk to launch a container even though users don't specify the
pid namespace in `config.json` because general container runtimes
such as runc also can launch a container without the namespace.
On the other hand, Kata Containers doesn't allow it due to security issue
so this feature should be enabled in only runk.
Fixes: #7168
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-07-16 23:31:14 +05:30
685 changed files with 24953 additions and 35096 deletions
- [How to run Kata Containers with `nydus`](how-to-use-virtio-fs-nydus-with-kata.md)
- [How to run Kata Containers with AMD SEV-SNP](how-to-run-kata-containers-with-SNP-VMs.md)
- [How to use EROFS to build rootfs in Kata Containers](how-to-use-erofs-build-rootfs.md)
- [How to run Kata Containers with kinds of Block Volumes](how-to-run-kata-containers-with-kinds-of-Block-Volumes.md)
## Confidential Containers
- [How to use build and test the Confidential Containers `CCv0` proof of concept](how-to-build-and-test-ccv0.md)
- [How to generate a Kata Containers payload for the Confidential Containers Operator](how-to-generate-a-kata-containers-payload-for-the-confidential-containers-operator.md)
- [How to run Kata Containers with kinds of Block Volumes](how-to-run-kata-containers-with-kinds-of-Block-Volumes.md)
which should only show a single `rootfs` directory if the container image was pulled on the guest, not the host
- Looking that `rootfs` directory with
```bash
$ sudo ls -ltr $(sudo find /run/kata-containers/shared/sandboxes/${pod_id}/shared -name rootfs)
```
shows something similar to
```
total 668
-rwxr-xr-x 1 root root 682696 Aug 25 13:58 pause
drwxr-xr-x 2 root root 6 Jan 20 02:01 proc
drwxr-xr-x 2 root root 6 Jan 20 02:01 dev
drwxr-xr-x 2 root root 6 Jan 20 02:01 sys
drwxr-xr-x 2 root root 25 Jan 20 02:01 etc
```
which is clearly the pause container indicating that the `busybox` based container image is not exposed to the host.
### Using a Kata pod sandbox for testing with `agent-ctl` or `ctr shim`
Once you have a kata pod sandbox created as described above, either using
[`crictl`](#using-crictl-for-end-to-end-provisioning-of-a-kata-confidential-containers-pod-with-an-unencrypted-image), or [Kubernetes](#using-kubernetes-for-end-to-end-provisioning-of-a-kata-confidential-containers-pod-with-an-unencrypted-image)
, you can use this to test specific components of the Kata confidential
containers architecture. This can be useful for development and debugging to isolate and test features
that aren't broadly supported end-to-end. Here are some examples:
- In the first terminal run the pull image on guest command against the Kata agent, via the shim (`containerd-shim-kata-v2`).
This can be achieved using the [containerd](https://github.com/containerd/containerd) CLI tool, `ctr`, which can be used to
interact with the shim directly. The command takes the form
`ctr --namespace k8s.io shim --id <sandbox-id> pull-image <image> <new-container-id>` and can been run directly, or through
the `ccv0.sh` script to automatically fill in the variables:
- Optionally, set up some environment variables to set the image and credentials used:
- By default the shim pull image test in `ccv0.sh` will use the `busybox:1.33.1` based test image
`quay.io/kata-containers/confidential-containers:signed` which requires no authentication. To use a different
image, set the `PULL_IMAGE` environment variable e.g.
{"msg":"Command PullImage (1 of 1) returned (Ok(()), false)","level":"INFO","ts":"2021-09-15T08:40:43.828261708-07:00","subsystem":"rpc","pid":"830920","source":"kata-agent-ctl","version":"0.1.0","name":"kata-agent-ctl"}
```
> **Note**: The first time that `~/ccv0.sh agent_pull_image` is run, the `agent-ctl` tool will be built
which may take a few minutes.
- To validate that the image pull was successful, you can open a shell into the Kata guest with:
```bash
$ ~/ccv0.sh open_kata_shell
```
- Check the `/run/kata-containers/` directory to verify that the container image bundle has been created in a directory
named either `01234556789` (for the container id), or the container image name, e.g.
```bash
$ ls -ltr /run/kata-containers/confidential-containers_signed/
```
which should show something like
```
total 72
drwxr-xr-x 10 root root 200 Jan 1 1970 rootfs
-rw-r--r-- 1 root root 2977 Jan 20 16:45 config.json
```
- Leave the Kata shell by running:
```bash
$ exit
```
## Verifying signed images
For this sample demo, we use local attestation to pass through the required
configuration to do container image signature verification. Due to this, the ability to verify images is limited
to a pre-created selection of test images in our test
For pulling images not in this test repository (called an *unprotected* registry below), we fall back to the behaviour
of not enforcing signatures. More documentation on how to customise this to match your own containers through local,
or remote attestation will be available in future.
In our test repository there are three tagged images:
| Test Image | Base Image used | Signature status | GPG key status |
| --- | --- | --- | --- |
| `quay.io/kata-containers/confidential-containers:signed` | `busybox:1.33.1` | [signature](https://github.com/kata-containers/tests/tree/CCv0/integration/confidential/fixtures/quay_verification/x86_64/signatures.tar) embedded in kata rootfs | [public key](https://github.com/kata-containers/tests/tree/CCv0/integration/confidential/fixtures/quay_verification/x86_64/public.gpg) embedded in kata rootfs |
| `quay.io/kata-containers/confidential-containers:unsigned` | `busybox:1.33.1` | not signed | not signed |
| `quay.io/kata-containers/confidential-containers:other_signed` | `nginx:1.21.3` | [signature](https://github.com/kata-containers/tests/tree/CCv0/integration/confidential/fixtures/quay_verification/x86_64/signatures.tar) embedded in kata rootfs | GPG key not kept |
Using a standard unsigned `busybox` image that can be pulled from another, *unprotected*, `quay.io` repository we can
test a few scenarios.
In this sample, with local attestation, we pass in the the public GPG key and signature files, and the [`offline_fs_kbc`
- Again this results in an error message from `crictl`:
`"PullImage from image service failed" err="rpc error: code = Internal desc = Security validate failed: Validate image failed: The signatures do not satisfied! Reject reason: [signature verify failed! There is no pubkey can verify the signature!]" image="quay.io/kata-containers/confidential-containers:other_signed"`
### Using Kubernetes to create a Kata confidential containers pod from the encrypted ssh demo sample image
The [ssh-demo](https://github.com/confidential-containers/documentation/tree/main/demos/ssh-demo) explains how to
demonstrate creating a Kata confidential containers pod from an encrypted image with the runtime created by the
@@ -28,10 +28,10 @@ __Steps from the Developer Guide:__
__SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/kernel/
$ ./build-kernel.sh -a x86_64 -x snp setup
$ ./build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./build-kernel.sh -x snp install
$ pushd kata-containers/tools/packaging/
$ ./kernel/build-kernel.sh -a x86_64 -x snp setup
$ ./kernel/build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./kernel/build-kernel.sh -x snp install
@@ -27,8 +27,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.runtime.internetworking_model` | string| determines how the VM should be connected to the container network interface. Valid values are `macvtap`, `tcfilter` and `none` |
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
| `io.katacontainers.config.runtime.enable_pprof` | `boolean` | enables Golang `pprof` for `containerd-shim-kata-v2` process |
| `io.katacontainers.config.runtime.image_request_timeout` | `uint64` | the timeout for pulling an image within the guest in `seconds`, default is `60` |
| `io.katacontainers.config.runtime.sealed_secret_enabled` | `boolean` | enables the sealed secret feature, default is `false` |
## Agent Options
| Key | Value Type | Comments |
@@ -96,16 +94,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"vcpu"),"Hypervisor metrics specific to VCPUs' mode of functioning."),&["item"]).unwrap();
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"vcpu"),"Hypervisor metrics specific to VCPUs' mode of functioning."),&["cpu_id","item"]).unwrap();
staticrefHYPERVISOR_SECCOMP: IntGaugeVec=
IntGaugeVec::new(Opts::new(format!("{}_{}",NAMESPACE_KATA_HYPERVISOR,"seccomp"),"Hypervisor metrics for the seccomp filtering."),&["item"]).unwrap();
"Received KVM_SYSTEM_EVENT signal type: {}, flag: {}",
event_type,event_flags
@@ -504,7 +508,7 @@ impl Vcpu {
}
},
r=>{
METRICS.vcpu.failures.inc();
self.metrics.failures.inc();
// TODO: Are we sure we want to finish running a vcpu upon
// receiving a vm exit that is not necessarily an error?
error!("Unexpected exit reason on vcpu run: {:?}",r);
@@ -523,7 +527,7 @@ impl Vcpu {
Ok(VcpuEmulation::Interrupted)
}
_=>{
METRICS.vcpu.failures.inc();
self.metrics.failures.inc();
error!("Failure during vcpu run: {}",e);
#[cfg(target_arch = "x86_64")]
{
@@ -731,7 +735,7 @@ impl Vcpu {
fnwaiting_exit(&mutself)-> StateMachine<Self>{
// trigger vmm to stop machine
ifletErr(e)=self.exit_evt.write(1){
METRICS.vcpu.failures.inc();
self.metrics.failures.inc();
error!("Failed signaling vcpu exit event: {}",e);
}
@@ -765,11 +769,17 @@ impl Vcpu {
pubfnvcpu_fd(&self)-> &VcpuFd{
self.fd.as_ref()
}
pubfnmetrics(&self)-> Arc<VcpuMetrics>{
self.metrics.clone()
}
}
implDropforVcpu{
fndrop(&mutself){
let_=self.reset_thread_local_data();
letid: u32=self.idasu32;
METRICS.write().unwrap().vcpu.remove(&id);
}
}
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.