It's been released for a while now, and we need to keep consistency
between what we used.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c5cfad7023)
Signed-off-by: Greg Kurz <groug@kaod.org>
hub is now deprecated, which has been causing issues with our release
process.
Let's move to the GH cli (https://cli.github.com/manual), and unblock
this release.
**NOTE**: This commit is purposefully not touching anywhere else hub is
used, as that would require more time and investigation to do the
switch, and right now we just want to unblock the release.
Fixes: #8286
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 710eb8ab9d)
Signed-off-by: Greg Kurz <groug@kaod.org>
- 3.2 backport bunch
- stable-3.2: Backport everything related to the tests
- 3.2 backport | kata-deploy: Use host's systemctl
- 3.2 backport | dragonball: use version 0.10.4 of `fuse-backend-rs`
2cda69b284 release: Adapt kata-deploy for 3.2.0
93c7d165dc ci: k8s: Fix bogus firecracker check in k8s-credentials-secrets.bat
12b8cbb4f6 tests: Adjust timeout for agent stability test
37c99a46b1 tests: Enable agent stability test
92f283f062 runtime: Validate hypervisor section name in config file
8cf5506700 metrics: fixes common.sh function to always return true
544f261433 metrics: skips docker restart when it is not installed or is masked.
26c6ca93d3 metrics: removing trailing comma characters from json file.
0e0aabfd87 metrics: removal of reference in the documentation to the dax test.
5d911db5e2 tests: Remove unused function from scability test
a380437380 tests: Fix path for versions yaml for soak parallel test
4495a79721 tests: Enable scability test for stability CI
961daee983 scripts: Use install_yq from the `kata-containers` repo
9b48525af1 release: tag_repos: Stop tagging / updating the `tests` repo
668c8979f0 runtime: fix reading cgroup stats of sandboxes
11e2f2a458 versions: Bump virtiofsd to v1.8.0
9eb8723a5b clh: arm: Use static_sandbox_resource_mgmt=true
e7579d20f7 runtime/qemu: Rework QMP/HMP support
f0278f41d7 runtime/virtiofsd: Drop all references to "--cache=none"
4679aa7712 runtime/qemu: Pass "--xattr" to virtiofsd instead of "-o xattr"
03d712ab25 runtime: Allow virtio_fs_extra_args annotation
e0513094a0 runtime/vc: runPrestartHooks should ignore GetHypervisorPid failure
c17cbd30f0 runtime: fail early when starting docker container with FC
7e6f8010bd runtime: run prestart hooks before starting VM for FC
fa824af234 qemu: tdx: Workaround SMP issue with TDX 1.5
07471cd7a6 qemu: tdx: Adapt to the TDX 1.5 stack
2f28866f26 versions: tdx: Update Kernel to 6.2 + TDX
a36064c729 versions: tdx: Update TDVF to the "edk2-stable202302"
65e0b99eb4 versions: tdx: Update QEMU to v7.2 + TDX v1.10
9ce8ee6c0c runtime/fc: fix image/initrd annotation handling
f86bfe0da3 runtime/clh: fix image/initrd annotation handling
59fae423b5 runtime/qemu: fix image/initrd annotation handling
ef65c5767f kata-agent: use default filemode for block device when it is set to 0
93609aa0cd deps: Bump dependent crate versions
7ff98daecf gha: Add install dependencies for stability tests
ef49db59f7 gha: Add general dependencies to stability tests
a818f628d7 tests: Add soak parallel stability test
602c56c0d7 tests: Enable soak parallel test
a195539307 ci: k8s: set KUBERNETES default value
c4456c21d9 tests: run k8s-volume on a given node
58ad833300 tests: run k8s-file-volume on a given node
a54bdd00d5 tests: exec_host() now gets the node name
0eaf81c1a2 tests: add get_one_kata_node() to tests_common.sh
5f2c7c78ff ci: k8s: set KATA_HYPERVISOR default value
7fceb21362 ci: k8s: configurable deploy kata timeout
c4b0f1f31b ci: k8s: shellcheck fixes to gha-run.sh
6fb40ad47d kata-deploy: re-format kata-[deploy|cleanup].yaml
5cd2e947dc ci: k8s: run_tests() for kcli
56cebfb485 ci: k8s: add deploy-kata-kcli() to gh-run.sh
6b76d21568 ci: k8s: add cleanup-kcli() to gha-run.sh
308ce26438 ci: k8s: set default image for deploy_kata()
c3b91ed394 ci: k8s: create k8s clusters with kcli
33791f0944 metrics: stops kata components and k8s deployment when test finishes
621e6e6d8c gha: combine coco jobs into a single yaml
fe52c0900c gha: combine basic amd64 jobs into a single yaml
301a7d94e3 gha: ci: Revert tracing test PR to unbreak CI
c1da29b9b1 ci: Port runk tests to this repo
63be808730 ci: Add placeholder for runk tests
6541969a83 ci: Move tracing tests here
5d232c8143 ci: Add placeholder for tracing tests
619ef169fb ci: Create a function to install docker
16e31dd409 metrics: Use jq tool to pretty-print json metrics output
1f9a4e908f metrics: Enables FIO test for kata containers
fe4f72e0a1 gha: Add containerd stability tests to ci yaml
7963298ba2 gha: Add stability gha run script
a4e0929054 gha: Add stability tests workflow for gha
be3a3c221b gha: arm64: Ensure the builder is arm64-builder
f20164dc75 packaging: tools: Remove `set -x` leftover
1941d87b84 packaging: release: Mention newly added images
95da1c71ec packaging: tools: Fix container image env var name
508016fca1 packaging: Allow passing the TOOLS_CONTAINER_BUILDER
bb1efe0d46 packaging: stable-3.2: Remove everything related to agent policy
892c9f2f03 gha: Build the kata-agent as part of our workflows
a586b8c581 packaging: Build the kata-agent
766a5fa118 agent: Allow specifying DESTDIR and AGENT_POLICY via env vars
050a4260b9 packaging: Add get_agent_image_name()
3770b200a8 gha: Fix k0s deployment
cf254bc4ee tests: Add general stability fixes
1edf2d9bc1 tests: Add agent stability test
a8eec39559 tests: Add cassandra stress in stability tests
240c584ae2 tests: Add stressng dockerfile for stability tests
e95d3b1be5 tests: Add stressor CPU test for stability tests
4393f553e9 metrics: Add stability test for kata CI
362adea8cd metrics: Fix general check static warnings
16c349e76c docs: Update url in kata vra document
5800be5029 ci: Build src/tools components as part of our tests / releases
41b509e0a6 kata-deploy: Build components from src/tools
a5d7ba6662 static-build: Add scripts to build content from src/tools
d503daf75e packaging: Add get_tools_image_name()
b2e432c024 packaging: Use git abbreviated hash
c22fdb46e3 metrics: Increase qemu jitter value
8a1af8689b metrics: Increase jitter value for clh
f3fcf6cbf9 metrics: Add checkmetrics for latency test
ce03e9f97a metrics: Add qemu latency value limit
cd82a351bd metrics: Add latency value limits for kata CI
1709f99975 ci: kata-monitor: Move tests over
a50c7f1972 ci: Add placeholder for kata-monitor tests
c42d19619d ci: Make install_kata aware of container engines
5017435734 ci: Create a generic install_crio function
98e9434be4 ci: Add install_cni_plugins helper
c61b488b66 ci: Modify containerd default config
7c4617cfac metrics: Add init_env function to latency test
e106ecd1e4 metrics: Fix latency yamls path
665805c81c metrics: Fix spelling warnings
b0c9b4254b metrics: Fix metrics README
c28a0a03f0 metrics: Fix C-Ray documentation
48a9b4ab13 ci: crio: Trail '\r' from exec_host() output
2de1c8bac2 ci: crio: Enable default capabilities
d1d3c7cbda kata-deploy: Fix CRI-O detection
0de3216b08 kata-deploy: Add k0s support
468a3218f5 ci: crio: Pass `-y` to apt
3f2780fca6 metrics: Add latency benchmark for gha
73a084a7d4 metrics: Enable latency test in gha run script
cf3abd308f local-build: Fix .docker ownership before build-payload
8b607ff79a gha: Add pandoc as a dependency for static checks
6a9384ed40 gha: Install hunspell for static checks
a11e8867af ci: Trigger payload-after-push on workflow_dispatch
390bde3182 ci: Actually enable the CRI-O tests
f2953e6448 ci: k8s: rke2: Use sudo to call systemd
08bdb6b5da ci: k8s: Add a CRI-O test
b41fa6d946 ci: k8s: Add a method to install CRI-O
67fef9d5c6 ci: k8s: k0s: Allow passing parameters to the k0s installer
2c3f130c85 ci: kata-deploy: Fix runner name
7a8d848a92 ci: Enable kata-deploy tests for all the supported k8s flavours
7fc2f7d003 ci: kata-deploy: Add the ability to deploy rke2
59a4b00d29 ci: kata-deploy: Add the ability to deploy k0s
1a605c33ad ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
19ee6c9fd7 ci: kata-deploy: Expland tests to run on k0s / rke2
03a8bed32b ci: kata-deploy: Add placeholder for tests on GARM
f09c255766 ci: kata-deploy: Export KUBERNETES env var
abe9dc9904 ci: Move deploy_k8s() to gha-run-k8s-common.sh
ea6489653e ci: Properly set K8S_TEST_UNION
7892e04dd1 ci: Add first letter of the K8S_TEST_HOST_TYPE to resource group name
882d7d7d89 ci: Create clusters in individual resource groups
b09a3f8f8e metrics: Add parallel bandwidth limit for qemu
63e8c38a7a metrics: Enable parallel bandwidth iperf limit
f3c42ff5fe nydus: Temporarily skip tests on dragonball
49c1a37330 nydus: Use `kata-${KATA_HYPERVISOR}` instead of `kata`
ae55c0b510 static-build: Fix arch error on nydus build
65e5bfe9eb tests: nydus: Update nydus tests
079ab1e0ac versions: Bump nydus and nydus-snapshotter to its latest release
d9e910702b gha: nydus: Populate run()
33a4427845 gha: nydus: Populate install_dependencies()
70c1c7d868 gha: nydus: Actually install kata when `install-kata` is called
30efa3e563 gha: nydus: Get rid of nydus{,-snapshotter} install from nydus_test.sh
9ad6000676 tests: nydus: Add timeout to the crictl calls
6d9b8e2437 tests: nydus: Add uid / namespace to the nydus container / sandbox
fd5935da9d tests: nydus: Decorate some calls with `sudo`
4b58777eec tests: nydus: Adapt "source ..." to GHA
82c531978f tests: nydus: Adapt check to "clh" instead "cloud-hypervisor"
4915605b20 tests: common: Add install_nydus_snapshotter()
8e4180f697 tests: common: Add install_nydus()
625a05aa2a ci: static-checks: Clean up static-checks job
9784ded336 ci: static-checks: Run tests depending on KVM
668b7effb4 ci: static-checks: Move "sudo make test" to the new test matrix
4b660a4991 ci: static-checks: Move "make test" to the new test matrix
9e614ce466 runtime-rs: Ensure static-checks-build is a dep of `make test`
d5d21f4cb4 kata-ctl: Use `loop` instead of `kvm` module in tests
93577381a5 kata-ctl: Ensure GENERATED_CODE is a dep of `make test`
93440dc141 agent: Ensure GENERATED_CODE is a dep of `make test`
d269f09a66 ci: install_libseccomp: Do not depend on the tests repo
bb920178ad ci: static-checks: Move "make check" to the new test matrix
d6996d01c0 kata-ctl: Add `kata-types` to the Cargo.lock file
a62e18b27f kata-ctl: Ensure GENERATED_CODE is a dep of `make check`
cd6ab3cf07 tests: install_rust: Also install clippy
d288e1ab87 ci: static-checks: Move vendor check to its own job
755057c9ed tests: Move install_rust.sh from the tests repo
d3a04b7b8f tests: install_go: Remove tests repo dependency
c18c412db7 tests: Move functions from kata_arch script here
bb8d1be300 ci: static-checks: Move kernel config check to its own job
7c4a0f7fac ci: Use variable size of VMs depending on the tests running
7019a25f25 ci: cache: Fix ovmf-sev cache
dc9f2c24f1 ci: cache: Check the sha256sum of the component
a55c082fa1 ci: cache: Remove the script used to cache artefacts on Jenkins
e464bbfc93 ci: cache: Also store the ${component} sha256sum
b5da4ce0d8 ci: cache: Use the cached artefacts from ORAS
2f280659b1 ci: k8s: Temporarily disable tests that require a bigger VM instance
f160effaee ci: cache: Push cached artefacts to ghcr.io
6f8ded36b6 kata-deploy: Generate latest_{artefact,image_builder} files
0210db6e34 ci: cache: Install ORAS in the kata-deploy binaries builder container
27dd77469d ci: k8s: devmapper: Use a smaller / cheaper VM instance
3b64c8d687 ci: nydus: Use a smaller / cheaper VM instance
03857041e4 ci: nerdctl: Use a smaller / cheaper VM instance
301edcb92e ci: docker: Use a smaller / cheaper VM instance
594fcdce56 ci: cri-containerd: Use a smaller / cheaper VM instance
fa9dd46041 ci: k8s: Don't set cpu limit request for k8s-inotofy test
767ccb117f ci: Reduce the size of the AKS VMs
054895fcdd ci: cache: For consistency, read all used env vars
5e22a3085b ci: cache: Pass the exposed env vars to the kata-deploy binaries in docker
bda0354491 ci: cache: Export env vars needed to use ORAS
c78f740854 metrics: Add iperf cpu utilization limit for qemu
73e989c4b1 metrics: Add iperf value for cpu utilization
1c32b31589 tests: Apply timeout to 'ctr t kill'
1d78871713 tests/vfio: Bump VM image to Fedora 38
b40a42699d tests/vfio: Accept single device in vfio group for CLH
82a0225159 tests/vfio: Get rid of sync's
a1aed0c78e gha: vfio: Set test timeout to 15m
32be55aa8a packaging: kernel: Enable VIRTIO_IOMMU on x86_64
3b5c5bcfa4 runtime: clh: Support enabling iommu
a0f59829b2 tests/vfio: Give commands 30s to execute
65943d5b77 tests/vfio: Configure a value for 'hot_plug_vfio' for both vmms
18a8b8df03 runtime: Remove redundant check in checkPCIeConfig
d86af5923f runtime: Add test cases for checkPCIeConfig
0a918d0d20 runtime: Check config for supported CLH (cold|hot)_plug_vfio values
86201ace5a runtime: clh: Add hot_plug_vfio entry to config
01265fb217 tests/vfio: Gather debug info and disable tdp_mmu
44f37f689a tests/vfio: Capture journal from vm
a69d0d1772 tests/vfio: Change to get the test working in GHA
e90027f38c tests/vfio: Move dependency installation to gha-run.sh
62804d637c gha: vfio: Import jobs scripts from tests repo
97283b18b4 metrics: Increase jitter value for qemu
3c5bd8c44d metrics: Increase value limit for jitter in clh
6abf513f06 ci: docker: nerdtl: Use io.containerd.kata-${KATA_HYPERVISOR}.io
9a664ea8bb ci: nerdctl: Create the containerd config
5734c4cbca ci: nerdctl: Switch to tcp port 80 ping
55c8a47a40 ci: docker: Switch to tcp port 80 ping
31c3d9bd80 metrics: Add iperf bandwidth value for qemu
40ae855f0e metrics: Add iperf bandwidth value for kata metrics
deadacd58f metrics: Ensure docker is running in init_env
31c33f9c1c metrics: Add Cassandra Metrics documentation
0968bf1eb9 metrics: this PR skips the FIO test temprarily to fix issues
e5e3951398 ci: docker: Also run the smoke test with runc
c7147dabce ci: docker: Run the tests after the kata-static is created
33430ad60c ci: Add a very basic nerdctl sanity test
69dd11f459 ci: Add a very basic docker sanity test
fcfa6c6e1a ci: use github.ref_name instead of $GITHUB_REF_NAME
19d9fd9eb1 ci: Add more target-branch related fixes
fe4247a90c ci: Fix target-branch usage
9f510d059b metrics: Remove warning from metrics documentation
400418bce0 kata-deploy: Remove curl after it's used
1df997c38c kata-deploy: Fix aarch64 image build
61b1a99fca gha: Manually rebase PR atop of the target branch before testing
db563709e3 kata-deploy: Switch to an alpine image
bb5dbfbbce k8s: ci: Skip "Pod quota" test with firecracker
263ed4afd1 ci: k8s: Remove useless skip statement from tests
7e135294a7 ci: k8s: Also check for "fc" (for firecracker)
8892d9a7b2 ci: k8s: Add clean-up-garm argument for gha-run.sh
c723a7d9c8 ci: k8s: devmapper tests should be using ubuntu 20.04
aee6f36c86 ci: k8s: Add a kata-deploy-garm target
5bb77b628d ci: k8s: Export KUBERNETES env var
7ce5c8b3fa ci: k8s: Install bats on GARM runners
9fb291d88a ci: k8s: Wait some time after restarting k3s
053308eefc metrics: fix FIO test initialization
89345b6731 ci: k8s: Append, instead of overwrite, the devmapper config
bb675f8101 ci: k8s: Decrease k3s sleep from 4 to 2 minutes
695c7162ef ci: k8s: Use vanilla kubectl with k3s
7f865be398 ci: k8s: Ensure k3s is deploy with --write-kubeconfig-mode=644
7a96d0a589 ci: k8s: Use the proper command for sleep
92fdaf9719 metrics: Use TensorFlow optimized image
1b7ffeac53 ci: k8s: Fix typo in run-k8s-tests-on-garm.yaml
79de72592f ci: k8s: Add k8s devmapper tests (part 0)
a41a56e326 ci: k8s: Add a function to configure devmapper for containerd
315288a000 ci: k8s: Add a function to deploy k3s
899c823c0b packaging: do not install docker-compose-plugin for s390x|ppc64le
374e77d330 metrics: Add write 95 percentile for FIO for qemu
22ce1671a6 metrics: Add write 95 percentile FIO value
5e90c8e176 metrics: Add checkmetrics to gha run script
651b89ba41 metrics: Add checkmetrics value for qemu for iperf
907baa3464 metrics: Add jitter value for clh
d9408a7283 metrics: Add test selector to iperf metrics
3583f373f5 metrics: Enable iperf benchmark on gha for kata metrics
7fd7186780 CI: switch static-checks-dragonball CI machines to Azure
9b6c5eaff1 kata-deploy: Create kata-static.tar with correct ownership
4403af74ec metrics: re-enable memory-usage initialization step
d2d7c041f3 metrics: fix parsing issue on memory-usage test
8c7a4fd121 gha: Rebase atop of the target branch
75dcca5a53 metrics: Add grabdata script for metrics report
59e7c3a347 gha: Update to checkout@v3 action
8f1cc278ca metrics: Add report generator link to general documentation
05180b61a0 metrics: Add README for kata metrics report
17c88a1a7f metrics: Add limit for 90 percentile for qemu value
dbb4761c4b metrics: Add limit for write 90 percentile value for clh
aebf392e45 metrics: Enable FIO limits for kata metrics
41d05b8857 metrics: Fix memory footprint qemu limit
3491407581 metrics: Fix memory inside limits for kata metrics
08027f2282 metrics: Add test setup details to metrics report
99103db1fb metrics: Add boot lifecycle times to metrics report
75c92ba474 metrics: Add memory inside container to metrics report
1c1eb98107 metrics: Add scaling system footprint in metrics report
01f6e6a1a3 metrics: Add metrics reportgen
428eb6908d metrics: Add report file titles
a8fa3d99da metrics: Generate PNGs alongside the PDF report
80625ed573 metrics: Add metrics report R files
9f8e194e6f metrics: Add report dockerfile
03c206f87f metrics: Add metrics report script
2684b267f7 tests: Expand confidential test to support TDX
4976629aee tests: Expand confidential test to support SNP
019849071e tests: Add confidential test for SEV
1b7c7901d9 local-build: Remove $HOME/.docker/buildx/activity/default
6a34bae03d gha: Avoid "fail-fast" in tests that are known to be flaky
17d22cae34 tests: use unique test name
e8c24fa0b9 tests: delete k8s deployment at the test's end
3e07c89d39 metrics: Remove unused variable in tensorflow nhwc script
5b9a69433d kata-deploy: Don't try to remove /opt/kata
e99a13d26c gha: vfio: Run on Ubuntu 23.04 runner
394d146b89 local-build: Remove GID before creating group
7421737229 metrics: Add TensorFlow ResNet50 fp32 Dockerfile
9acbf2faf7 metrics: Add TensorFlow ResNet50 FP32 benchmark
4f2c9372c3 kata-deploy: Avoid failing on content removal
6ea1d3bffd metrics: Add disk link to README
ad2036927f metrics: Fix FIO path
abcb225ce3 metrics: Use function from metrics common in pytorch script
508f1bba15 gha: capture additional kata-deploy output
d46c300608 metrics: Enable kata runtime in K8s for FIO test.
3d3882a06a metrics: Update tensorflow name in gha run script
7d0a3dbf24 metrics: Fix check results for tensorflow benchmark
3e2a383b7d gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
2c5db14a1a gha: kata-deploy: Add the first kata-deploy test
0b4fb826de metrics: Remove unused variable in tensorflow mobilenet script
b38624e2b3 tests: common: Ensure test_type is used as part of the cluster's name
cdfcd9aba8 tests: commob: Don't fail if yq is not part of the cache
74edbaac96 gha: kata-deploy: Add run-kata-deploy-tests.sh
d7130f48b0 gha: k8s: Stop running kata-deploy tests as part of the k8s suite
810507e8a3 tests: k8s: Call ensure_yq() in setup.sh
915bace795 kata-deploy: Properly create default runtime class
870d8004a0 metrics: Fix MobileNet help me description
145450544d gha: ci: Start running kata-deploy tests
bd29413721 docs: Fix TensorFlow word across the document
a845e94139 docs: Add Tensorflow Resnet50 documentation
6e5a5b8249 metrics: Add Dockerfile for ResNet50 int8
5d85cac1d6 metrics: Add Tensorflow ResNet50 int8 benchmark
7474e50ae2 gha: cri-containerd: Enable tests
20be3d93d5 gha: cri-containerd: Add timeout to the crictl calls on testContainerStop
10058f718a gha: cri-containerd: Show pod before deleting it
585d5fba03 gha: cri-containerd: Print kata logs in case of error
2fea5a5f8b gha: cri-containerd: Group containerd logs
3c7597f4ba gha: cri-containerd: Ensure RUNTIME takes KATA_HYPERVISOR into account
738d808cac metrics: Rename tensorflow scripts
4bb8fcc0c0 tests: kata-deploy: Add placeholder for kata-deploy-tests-on-tdx
f5e14ef283 tests: kata-deploy: Add placeholder for kata-deploy-tests-on-aks
e812c437fe tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
c19cebfa80 tests: Add gha-run-k8s-common.sh
4e8c512346 metrics: fix the loop used to stop kata components #762947f32c4983 metrics: Add cassandra statefulset yaml
d5a14449fc metrics: Add cassandra service yaml
1292b51092 metrics: Add block loop pvc yaml for cassandra
105a556a30 metrics: Add block loop pv yaml for cassandra test
1b126eb4ce metrics: Add block loop pvc for cassandra test
671ad98451 metrics: Add Cassandra Kubernetes benchmark for kata metrics
058b304455 gha: static-checks: Move to the Azure instances
b600659df2 metrics: Add check containers are running in tensorflow mobilenet
1b30aa818e metrics: Add check containers are up in tensorflow script
3502bb4b20 metrics: Remove unused variable in tensorflow script
b07c19eb5f metrics: Add check containers are running function
fc89392745 metrics: Add check containers are up in tensorflow mobilenet script
73843b786d metrics: Use check containers are up in tensorflow script
7fffa7f9ce metrics: Add check containers are up in common script
1b68145b6a metrics: Use collect_results function in tensorflow mobilenet test
f29f811470 metrics: Remove collect results function definition
6b6a6ee724 metrics: Add common functions to the common script
a341c2f324 metrics: compute tensorflow statistics
b8b4ca10e9 ci: unencrypted-image: Fix build context
dcc35781f7 ci: unencrypted-image: Don't fail to build on s390x
babbd4186c ci: create-confidential-image: Add dependent actions
cecb30dbb2 metrics: Add nginx documentation to network README
1971fe4986 metrics: Add nginx kubernetes yaml
6c921ce3db metrics: Add network nginx benchmark
a5a3e4124f ci: k8s: tees: Ensure PR_NUMBER is exported
3a21c485bf ci: {{ pr-number }} should be {{ inputs.pr-number }}
218d83bd3f tests: k8s: Ensure the runtime classes are properly created
0625d8dfc1 ci: Add build-and-publish-tee-confidential-unencrypted-image
6ae591c618 ci: k8s: Add the image used for unencrypted confidential tests
8d4f9ef256 tests: upgrade bats version
a484666890 metrics: install kata once and run multiple checks
759b0fa385 metrics: General improvements to mobilenet tensorflow test
d6398ccf9e metrics: Add iperf to gha run script
a75db20167 gha: Add iperf network metrics
b33d4de013 metrics: Add latency test to network README
db23b95b53 metrics: Add latency server yaml
2b60fe0fe0 metrics: Add latency client yaml
aa71d6f931 metrics: Add network latency test
b2c627aac9 metrics: Improve naming testing containers in launch times test
ea1fdd2cb9 metrics: Clean kata components before start a metric test.
7d5f65be7c kata-deploy: Use host's systemctl
2881bad407 dragonball: use version 0.10.4 of `fuse-backend-rs`
Signed-off-by: Greg Kurz <groug@kaod.org>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Greg Kurz <groug@kaod.org>
Previously, if you accidentally modified the name of the hypervisor
section in the config file, the default golang runtime gives a cryptic
error message ("`VM memory cannot be zero`"). This can be demonstrated
using the `kata-runtime` utility program which uses the same golang
config package as the actual runtime (`containerd-shim-kata-v2`):
```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ kata-runtime env >/dev/null; echo $?
VM memory cannot be zero
1
```
The hypervisor name is now validated so that the behaviour becomes:
```bash
$ kata-runtime env >/dev/null; echo $?
0
$ sudo sed -i 's!^\[hypervisor\.qemu\]!\[hypervisor\.foo\]!g' /etc/kata-containers/configuration.toml
$ ./kata-runtime env >/dev/null; echo $?
/etc/kata-containers/configuration.toml: configuration file contains invalid hypervisor section: "foo"
1
```
Fixes: #8212.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
(cherry picked from commit 3e8cf6959c)
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR corrects the init env() helper function, to make that
systemctl always returns true when enumerating masked services,
and preventing the test from failing
Fixes: #8242
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 4f9681b411)
Signed-off-by: Greg Kurz <groug@kaod.org>
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.
Fixes: #8218
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 908519db9d)
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR removes trailing commas so that the json results
file is valid.
This PR also changes the way data results are collected by
terating through the array of memory values to calculate
their average.
Fixes: #8204
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit c2763120aa)
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.
Fixes: #8159
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 89c9454fca)
Signed-off-by: Greg Kurz <groug@kaod.org>
As the file is already part of the kata-containers repo, and the tests
repo is about to become read-only, we're good to drop the tests
references from here and use everything coming from the
`kata-containers` repo instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fbc8f8f466)
Signed-off-by: Greg Kurz <groug@kaod.org>
As we've moved all the tests to the `kata-containers` repo, the `tests`
repo will become a read-only repo.
Fixes: #8200
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 65b1a2d277)
Signed-off-by: Greg Kurz <groug@kaod.org>
The cgroup stats come from resourcecontrol package in the form of pointers
to structs. The sandbox Stat() method incorrectly was expecting structs.
This caused the cpu and memory stats to always be 0, which in turn caused
incorrect pod overhead metrics.
Fixes#8035
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
(cherry picked from commit 94e2ccc2d5)
Signed-off-by: Greg Kurz <groug@kaod.org>
Users have noticed that this is needed, as CLH does not yet implement a
way to hotplug resources on aarh64.
With this patch, when building for x86_64, I can see the this is the
resulting config:
```
$ ARCH=amd64 make
...
$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=false
```
And when building for aarch64:
```
$ ARCH=arm64 make
...
$ cat config/configuration-clh.toml | grep static_sandbox_resource_mgmt
static_sandbox_resource_mgmt=true
```
Fixes: #7941
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 72599f1911)
Signed-off-by: Greg Kurz <groug@kaod.org>
PR #6146 added the possibility to control QEMU with an extra HMP socket
as an aid for debugging. This is great for development or bug chasing
but this raises some concerns in production.
The HMP monitor allows to temper with the VM state in a variety of ways.
This could be intentionally or mistakenly used to inject subtle bugs in
the VM that would be extremely hard if not even impossible to debug. We
definitely don't want that to be enabled by default.
The feature is currently wired to the `enable_debug` setting in the
`[hypervisor.qemu]` section of the configuration file. This setting has
historically been used to control "debug output" and it is used as such
by some downstream users (e.g. Openshift). Forcing people to have the
extra HMP backdoor at the same time is abusive and dangerous.
A new `extra_monitor_socket` is added to `[hypervisor.qemu]` to give
fine control on whether the HMP socket is wanted or not. This setting
is still gated by `enable_debug = true` to make it clear it is for
debug only. The default is to not have the HMP socket though. This
isn't backward compatible with #6416 but it is for the sake of "better
safe than sorry".
An extra monitor socket makes the QEMU instance untrusted. A warning is
thus logged to the journal when one is requested.
While here, also allow the user to choose between HMP and QMP for the
extra monitor socket. Motivation is that QMP offers way more options to
control or introspect the VM than HMP does. Users can also ask for
pretty json formatting well suited for human reading. This will improve
the debugging experience.
This feature is only made visible in the base and GPU configurations
of QEMU for now.
Fixes#7952
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1f16b6627b)
Signed-off-by: Greg Kurz <groug@kaod.org>
This syntax belongs to the legacy C virtiofsd implementation that
we don't support anymore since kata-containers 3.1.3 because
of other API breaking changes.
People have been warned to switch from "none" to "never" since
kata-containers 2.5.2. Let's officially do that.
The compat code that would convert "none" to "never" isn't
needed anymore. Just drop it.
Fixes#7864
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 72c510d057)
Signed-off-by: Greg Kurz <groug@kaod.org>
The "-o" syntax belongs to the legacy C virtiofsd. It is deprecated
with the rust implementation.
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 81536f21af)
Signed-off-by: Greg Kurz <groug@kaod.org>
Some use cases may just require passing extra arguments to virtiofsd,
and having this disabled by default makes it impossible to set when
using kata-deploy, as changes in the configuration file would be
overwritten by the daemon-set.
With this in mind, let's allow users to pass whatever thet need (and
here I'm specifically looking at `--xattr`) as a virtio_fs_extra_arg.
Fixes: #7853
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b1dd09a4d3)
Signed-off-by: Greg Kurz <groug@kaod.org>
If we are running FC hypervisor, it is not started when prestart hooks
are executed. So we should just ignore such error and just go ahead and
run the hooks.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 2e4c874726)
Signed-off-by: Greg Kurz <groug@kaod.org>
FC does not support network device hotplug. Let's add a check to fail
early when starting containers created by docker.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 21204caf20)
Signed-off-by: Greg Kurz <groug@kaod.org>
Add a new hypervisor capability to tell if it supports device hotplug.
If not, we should run prestart hooks before starting new VMs as nerdctl
is using the prestart hooks to set up netns. To make nerdctl + FC
to work, we need to run the prestart hooks before starting new VMs.
Fixes: #6384
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 32fd013716)
Signed-off-by: Greg Kurz <groug@kaod.org>
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.
Make sure annotations are preferred over config options in image and initrd
path handling.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 18d42da21e)
Signed-off-by: Greg Kurz <groug@kaod.org>
We should make sure annotations are preferred over
config options in image and initrd path handling.
Fixes: #7705
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 9fda7059a5)
Signed-off-by: Greg Kurz <groug@kaod.org>
Right now if we configure an image annotation and have a config file
setting initrd, the initrd config would override the image annotation.
Add a helper function ImageOrInitrdAssetPath to make sure annotations
are preferred over config options in image and initrd path handling.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 1a0092d631)
Signed-off-by: Greg Kurz <groug@kaod.org>
When the FileMode field for the device is unset (0), use a default value instead
to allow the use of the device from the container.
This behaviour is seen from cri-o typically.
Note: this is what runc is doing, which is why regular containers don't have an
issue. This change makes sure kata behaves the same as runc.
Fixes: #7717
Signed-off-by: Julien Ropé <jrope@redhat.com>
(cherry picked from commit 40914b25d4)
Signed-off-by: Greg Kurz <groug@kaod.org>
This pull request is mainly for updating vm-memory and vmm-sys-util.
The affacted crates include:
- vm-memory: from 0.9.0 to 0.10.0
- vmm-sys-util: from 0.10.0 to 0.11.0
- virtio-queue: from 0.6.0 to 0.7.0
- fuse-backend-rs: from 0.10.4 to 0.10.5
- linux-loader: from 0.6.0 to 0.8.0
- nydus-api: from 0.3.0 to 0.3.1
- nydus-rafs: from 0.3.1 to 0.3.2
- nydus-storage: from 0.6.3 to 0.6.4
Fixes: #0000
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
(cherry picked from commit b23c5ed155)
Signed-off-by: Greg Kurz <groug@kaod.org>
The KUBERNETES variable is mostly used by kata-deploy whether to apply
k3s specific deployments or not. It is used to select the type of
kubernetes to be installed (k3s, k0s, rancher...etc) and it is always
set on CI. Running the script locally we want to set a value by default
to avoid `KUBERNETES: unbound variable` errors.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit e669282c25)
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit c30c3ff185)
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 666993da8d)
The exec_host() simply fails on cluster with multi-nodes because
`kubectl get node -o name" will return a list o names. Moreover, it will
return control nodes names which usually don't have kata installed.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 3a00fc9101)
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.
This is useful for tests that should run on a determined worker
node on a multi-nodes cluster.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 61c9c17bff)
Let KATA_HYPERVISOR be qemu by default in gh-run.sh as this variable
is required to tweak some configurations of kata-deploy.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 68f083c4d0)
The deploy-kata() of gha-run.sh will wait for 10 minutes for the kata
deploy installation finish. This allow users of the script to overwrite
that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 6677a61fe4)
Fixed a couple of warns shellcheck emitted and disabled others:
* SC2154 (var is referenced but not assigned)
* SC2086 (Double quote to prevent globbing and word splitting)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 200e542921)
The .tests/integration/kubernetes/gh-run.sh script run `yq write` a
couple of times to edit the kata-[deploy|cleanup].yaml, resulting
on the file being formatted again. This is annoying because leaves
the git tree dirty.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 4af78be13a)
The only difference to the other platforms is that it needs to
export KUBECONFIG.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit d54e6d9cda)
The cleanup-kcli() behaves like other deploy kata for
bare-metal (e.g. sev, tdx...etc) except that KUBECONFIG
should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit c2ef1f0fb0)
The cleanup-kcli() behaves like other clean up for bare-metal (e.g. sev,
tdx...etc) except that KUBECONFIG should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit d2be8eef1a)
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of CI context, a developer might just use the latest
image which in this case will be
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit cbb9aa15b6)
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.
Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit 89bef7d036)
This PR adds a trap whenever the scrip exits, it deletes the iperf
k8s deployment and k8s services, and deletes the kata components.
This way, when the script finishes, it verifies that there are
indeed no kata components still running.
Fixes: #8126
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit bba34910df)
So that we don't risk exceeding the GHA 20 rerefenced yaml files limit
that easy.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit 954d40cce5)
GHA has an undocumented limitation that there can be at most 20
referenced yamls in a single yaml file. We workaround it by combining
multiple jobs into a single yaml file.
Fixes: #8161
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
(cherry picked from commit b60e0a9b57)
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor the tests.
Fixes: #8116
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit da91c9df88)
The runk test has been executed as part of the former "ubuntu" jenkins
CI.
We're porting it to GHA and running it against LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 7f23772763)
The tracing tests are currently running as part of the Jenkins CI with
the following setups:
* Container Engines: containerd
* VMMs: QEMU | Cloud Hypervisor
* Snapshotters: overlayfs | devmapper
We'll be restricting those tests to be running on LTS version of
containerd, without devmapper.
As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next interations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3bb2923e5d)
This PR enables the use of jq pretty-print feature to
improve the formatting of metric results json files.
Fixes: #8081
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 8c498ef5ee)
FIO benchmark is enabled to measure IO in Kata
at different latencies using containerd client,
in order to complement the CI metrics testing set.
This PR asl deprecated the previous Fio bench
based on k8s.
Fixes: #8080
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit a2159a6361)
Otherwise we'll use any arm64 machine that's added as a runner, and
whenever new machines are added those may end up being only used for
running some specific set of the tests.
Fixes: #8109
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 119f03de26)
This was used for debugging, and ended up being merged with that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 560bbffb57)
We've added two new containerd builder images recently, one for the
components under `src/tools` and another one for the Kata Containers
agent.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 18fa483d90)
This should be TOOLS_CONTAINER_BUILDER instead of
VIRTIOFSD_CONTAINER_BUILDER.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ca3b888371)
This follows what we've been doing for all the components we're
building, but was missed as part of #8077.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5ca66795c7)
The kata-agent binary won't be released, just built so it can be used,
later on, as part of our tests and as part of the rootfs build.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 02acef9575)
Let's add the needed functions to start building the kata-agent, with or
without the OPA support.
For now this build is not used as part of the rootfs build, but later on
this will (not as part of this series, though).
Fixes: #8099
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5208386ab1)
This will help to build the agent binary as part of the kata-deploy
localbuild, as we need to pass the DESTDIR to where the agent will be
installed, and also whether we're building the agent with policy support
enabled or not.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 1727487eef)
Conflicts:
src/agent/Makefile
The conflict happened as AGENT_POLICY is not part of the stable-3.2,
thus I've manually removed it.
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```
With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading us to an empty stringm causing then
the error in the CI.
Fixes: #8105
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 70e7ec3e23)
Let's add targets and actually enable users and oursevles to build those
components in the same way we build the rest of the project.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 501a168a81)
As we'd like to ship the content from src/tools, we need to build them
in the very same way we build the other components, and the first step
is providing scripts that can build those inside a container.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6ef42db5ec)
This will be used for building all the (rust) components from src/tools.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4d08ec29bc)
This will make it easier to build images that rely on several
directories hashes.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 98097c96de)
The kata-monitor tests is currently running as part of the Jenkins CI
with the following setups:
* Container Engines: CRI-O | containerd
* VMMs: QEMU
When using containerd, we're testing it with:
* Snapshotter: overlayfs | devmapper
We will stop running those tests on devmapper / overlayfs as that hardly
would get us a functionality issue.
Also, we're restricting this to run with the LTS version of containerd,
when containerd is used.
As it's known due to our GHA limitation, this is just a placeholder and
the tests will actually be added in the next iterations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit a3fb067f1b)
This will serve us quite will in the upcoming tests addition, which will
also have to be executed using CRi-O.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit de1eeee334)
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 64a2000859)
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, woth LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8132fe15c9)
This PR fixes general spelling warnings detected by the spelling check.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 97e73b2234)
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.
Fixes#8017
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 36c8cd6f1f)
We've faced this as part of the CI, only happening with the CRI-O tests:
```
not ok 1 Test readonly volume for pods
# (from function `exec_host' in file tests_common.sh, line 51,
# in test file k8s-file-volume.bats, line 25)
# `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
# [bats-exec-test:38] INFO: k8s configured to use runtimeclass
# bash: line 1: $'\r': command not found
#
# Error from server (NotFound): pods "test-file-volume" not found
```
I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just trail the '\r', as long as all the tests keep
passing on containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ef63d67c41)
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.
A huge thanks to Greg Kurz for spotting this and suggesting the fix.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 74c12b2927)
Some of the "k8s distros" allow using CRI-O in a non-official way, and
if that's done we cannot simply assume they're on containerd, otherwise
kata-deploy will simply not work.
In order to avoid such issue, let's check for `cri-o` as the container
engine as the first place and only proceed with the checks for the "k8s
distros" after we rule out that CRI-O is not being used.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 358dc2f569)
Add k0s support to kata-deploy, in the very same way kata-containers
already supports k3s, and rke2.
k0s support requires v1.27.1, which is noted as part of the kata-deploy
documentation, as it's the way to use dynamic configuration on
containerd CRI runtimes.
This support will only be part of the `main` branch, as it's not a bug
fix that can be backported to the `stable-3.2` branch, and this is also
noted as part of the documentation.
Fixes: #7548
Signed-off-by: Steve Fan <29133953+stevefan1999-personal@users.noreply.github.com>
(cherry picked from commit 72cbcf040b)
The permissions on .docker/buildx/activity/default are regularly broken by us
passing docker.sock + $HOME/.docker to a container running as root and then
using buildx inside. Fixup ownership before executing docker commands.
Fixes: #8027
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 15425a2b80)
To avoid the failure of not finding pandoc command this PR adds that
package as a dependency for static checks.
Fixes#8041
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 13ca7d9f97)
Seems like the static checks are failing due the missing of the hunspell
package this PR fixes that.
Fixes#8019
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 87a8616488)
This will allow us to easily test failures and fixes on that workflows.
Fixes: #8031
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0c95697cc4)
The test has been added to the repo, but we have to also add it to the
list of jobs to be executed.
Fixes: #8005
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8c3c50ca8a)
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 07a6e63a6b)
Let's make sure we'll also be testing k8s using CRI-O.
For now, we'll only be running the CRI-O test with QEMU. Once it
becomes stable we can expand this to other Hypervisors as well.
Fixes: #8005
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 03b82e8484)
We'll need this in order to setup k0s with a different container engine.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 54c0a471b1)
It should be garm-ubuntu-2004-smaller instead of garm-ubuntu-2004-small.
Fixes: #7890
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3a2c83d69b)
Let's ensure we test kata-deploy on RKE2 and k0s as well.
Fixes: #7890
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f7fa7f602a)
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2c908b598c)
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit eaf6164916)
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0015257636)
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bf2cb02283)
We'll be testing kata-deploy with different kubernetes flavours as part
of our GARM tests, and this is a place-holder for this.
Once enabled, we'll do nothing, just `return 0`, so we can then properly
add the tests after this commit gets merged.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b12b9e1886)
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
This was also done as part of fa62a4c01b,
for the k8s tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 9e1fb8a966)
This will allow us to re-use the function in the kata-deploy tests,
which will come soon.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 09cc0ed438)
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characteres allowed for the cluster
name. With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit d9ef1352af)
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.
This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.
The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.
Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.
Fixes: #7982
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit 68267a3996)
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.
The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
/rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
error: Failed to Mount backend
...
Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit aba36ab188)
This will ensure we're testing with the correct runtime, instead of
using the `default` one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b8a8dfcd15)
Fix the arch error when downloading the nydus tarball.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Steven Horsman <steven@uk.ibm.com>
(cherry picked from commit f6df3d6efb)
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
(cherry picked from commit 2f9c9e2e63)
And with this we finally enable the nydus tests to run as part of our
GHA CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b73bde320d)
Let's have all the dependencies needed for running the nydus tests
installed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b3904a1a30)
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.
Now, let's install kata, as expected. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d2b3b67f5d)
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.
Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0ec00ad42e)
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 568439c77b)
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"
getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5ac3b76eb1)
Otherwise we canoot properly start the nydus snapshotter, nor properly
kill it after it's been started.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 376574a16c)
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4290fd4b67)
As that's what we've been using as part of the GHA.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit a84efa3e87)
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing another dependencies from
GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 56a14b3950)
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing another dependencies from GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b6563783e2)
Now that the static-checks job only takes care of running the
static-checks, let's clean it up, remove all the unneeded steps, make
sure that we're using the actions in their latest version, and have it
running in a cost free runner.
At some point I'd like to see those tests done in parallel, in the same
way that I've organised the build-checks, but that's something for
someone else, at some other time.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8b1e9b0c75)
With this we're removing the dragonball static-checks CI, as the test is
running here now. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2c5ca2eaf8)
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 509c309ab2)
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` and `make check` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4e963cedf4)
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `config`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 08f2e5ae0b)
This makes it pssible to run the tests in the cost free runners, which
are not KVM capable.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2bc3a616ae)
Otherwise `make test` will simply fail with:
```
error[E0583]: file not found for module `version`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 46daddc500)
Otherwise `make test` will fail with:
```
error[E0583]: file not found for module `version`
```
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ec826f328f)
It makes things way simpler, waaaaay simpler.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 1d32410a83)
We're moving it out of the previous "static-checks" confusing matrix,
and adding it to the matrix that was currently being used for the `make
vendor` checks.
This will allow us to have one job per component, and with that we can
easily run those in parallel and on the zero cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bf888b9a5e)
clippy is used as part our tests, so it's useful to have it installed
while we're already installing rust.
In case of developers, they also better be using it. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e125775863)
Similarly to the static-check jobs, those jobs can be run on the zero
cost runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e2c61a152c)
We'll use it as part of the refactoring we're doing in the static check
tests.
I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6794d4c843)
We can rely on the functions that are now part of the common.bash.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e64508c308)
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 11dff731b7)
It doesn't make sense to run this for all the bits of the matrix,
neither it's demanding enough to require running this in one of our
Azure sponsored runners.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 75c974c802)
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.
Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that required the normal Azure instance (D4s_v5)
With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.
Fixes: #7972
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c69a1e33bd)
We've removed this in the part 2 of this effort, as we were not caching
the sha256sum of the component. Now that this part has been merged,
let's get back to checking it.
Fixes: #7834 -- part 3
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 86c41074b4)
That's not needed anymore, as we've switched to using ORAS and an OCI
registry to cache the artefacts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 460988c5f7)
This is something that was done by our Jenkins jobs, but that I ended up
missing when writing d0c257b3a7.
Now, let's also add the sha256sum to the cached artefact, and in a
coming up PR (after this one is merged) we will also start checking for
that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4533a7a416)
In the previous series related to the artefacts we build, we've
switching from storing the artefacts on Jenkins, to storing those in the
ghcr.io/kata-containers/cached-artefacts/${artefact_name}.
Now, let's take advantage of that and actually use the artefacts coming
from that "package" (as GitHub calls it).
NOTE: One thing that I've noticed that we're missing, is storing and
checking the sha256sum of the artefact. The storing part will be done
in a different commit, and the checking the sha256sum will be done in a
different PR, as we need to ensure those were pushed to the registry
before actually taking the bullet to check for them.
Fixes: #7834 -- part 2
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit eccc76df63)
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI
We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 094b6b2cf8)
Let's push the artefacts to ghcr.io and stop relying on jenkins for
that.
Fixes: #7834 -- part 1
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d0c257b3a7)
Right now this is not used, but it'll be used when we start caching the
artefacts using ORAS.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 108f1b60dd)
ORAS is the tool which will help us to deal with our artefacts being
pushed to and pulled from a container registry.
As both the push to and the pull from will be done inside the
kata-deploy binaries builder container, we need it installed there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit be2eb7b378)
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fb24fb0dc1)
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 1daf02f5d4)
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e60d81f554)
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4db416997c)
We don't need to run on a D4s_v5. as those tests are not CPU / memory
intense. With this is mind, let's use a smaller version of the
instance, the D2s_v5 one.
Fixes: #7958
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 32841827b8)
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 92fff129fd)
Instead of having some of them only being considered if explicitly
passed to the script.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit adc18ecdb1)
As the environment variables are now being passed down from the GitHub
Actions, let's make sure they're exposed to the container used to build
the kata-deploy binaries, and during the build process we'll be able to
use those to log in and push the artefacts to the OCI registry, using
ORAS.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c7a851efd7)
We do the build of our artefacts inside a container image, and we need
to expose some env vars to the container so ORAS can be used there to
push the artefacts we want to cache to ghcr.io.
The env vars we're exposing are:
* ARTEFACT_REGISTRY: The registry where we're going to save the
artefacts.
* ARTEFACT_REGISTRY_USERNAME: The username to log in to the registry, as
ORAS does not use the same json file used by docker.
* ARTEFACT_REGISTRY_PASSWORD: The pasword to log in to the the registry,
as the ORAS does not use the same json file used by docker.
* TARGET_BRANCH: The target branch, which will be part of the tag of the
artefact, as we may end up caching the artefacts for both main and
stable branches.
Fixes: #7834 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6bd15a85d5)
This PR adds the iperf cpu utilization limit for qemu for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit cd4fd1292a)
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9d93036783)
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit faee59b520)
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df3dc1105c)
Sometimes the test gets stuck running commands in the container - need to
investigate why later.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 7211c3dccc)
Cloud Hypervisor exposes a VIRTIO_IOMMU device to the VM when IOMMU support is
enabled. We need to add it to the whitelist because dragonball uses kernel
v5.10 which restricted VIRTIO_IOMMU to ARM64 only.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 1b02f89e4f)
Conflicts:
tools/packaging/kernel/kata_config_version
by enabling IOMMU on the default PCI segment. For hotplug to work we need a
virtualized iommu and clh exposes one if there is some device or PCI segment
that requests it. I would have preferred to add a separate PCI segment for
hotplugging vfio devices but unfortunately kata assumes there is only one
segment all over the place. See create_pci_root_bus_path(),
split_vfio_pci_option() and grep for '0000'.
Enabling the IOMMU on the default PCI segment requires passing enabling IOMMU on
every device that is attached to it, which is why it is sprinkled all over the
place.
CLH does not support IOMMU for VirtioFs, so I've added a non IOMMU segment for
that device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 3a1db7a86b)
This is a to catch the case of the guest getting stuck.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 9f1a42c6cc)
This shouldn't be hiding behind only a qemu check, we need this for clh as
well.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit b46b0ecf8b)
There is no way for this branch to be hit, as port is only set when it is
different than config.NoPort.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit bfc93927fb)
These test cases shows which options are valid for CLH/Qemu, and test that we
correctly catch unsupported combinations.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 7c4e73b609)
The only supported options are hot_plug_vfio=root-port or no-port.
cold_plug_vfio not supported yet.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit fc51e4b9eb)
hot_plug_vfio needs to be set to root-port, otherwise attaching vfio devices to
CLH VMs fails. Either cold_plug_vfio or hot_plug_vfio is required, and we have
not implemented support for cold_plug_vfio in CLH yet.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 509771e6f5)
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly bad when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.
Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5f6475a28a)
For debugging (though this doesn't get exposed yet).
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 8fffdc81c5)
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit df815087e7)
To match the flow of other github actions workflows.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit a92ddeea15)
This imports the vfio test scripts github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 5a551a85b1)
This will ensure that we're calling the correct binary for the
hypervisor.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 813bfdec01)
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.
This is mostly needed because the nerdctl-full tarball doesn't provide a
contaienrd configuration, just the binary, as contaienrd does not
actually require a configuration file to run with the default config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 46bc0b1c01)
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 13968aa7f6)
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e0c811678b)
This PR ensures that docker is running as part of the init_env function
in kata metrics to avoid failures like docker is not running and making
the kata metrics CI to fail.
Fixes#7898
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit d53eb73eec)
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.
Fixes: #7920
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit a58ea66592)
This will help us to make sure that the failure is actually related to
Kata Containers.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f536ef5ce1)
There's no reason to wait till the payload is created to run the tests,
as we rely on the tarball, not on the kata-deploy payload.
That was a mistake on my side, and that's already fixed for the nerdctl
tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c83f167c59)
Let's add a very basic sanity test to check that we can spawn a
containers using nerdctl + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7911
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 12d833d07d)
Let's add a very basic sanity test to check that we can spawn a
containers using docker + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 348b8644d6)
As, regardless of what's mentioned in the documentation, it seems that
$GITHUB_REF_NAME is passed down as a literal string.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f811b064ca)
The ones for the payload-after-push.yamland ci-nightly.yaml are not that
much important right now, but they're needed for when we start running
those on stable branches as well.
The other ones were missed during
bd24afcf73.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6d795c089e)
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.
Fixes#7894
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 060499dcae)
There's no need to keep curl there after the kubectl binary has already
been downloaded.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8b4a0b368f)
Similarly to what's been done for x86_64 -> amd64, we need to do a
aarch64 -> arm64 change in order to be able to download the kubectl
binary.
Fixes: #7861
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 139c7f03ab)
We're changing what's been done as part of ac939c458c, as we've
notcied issues using `github.event.pull_request.merge_commit_sha`.
Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/
In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit bd24afcf73)
This will make our image smaller, and still ensure it's multi-arch
support.
Fixes: #7861
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 670a8e9c73)
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.
We're only touching the files here that were touched in the previous
commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit f6cd3930c5)
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc for the runtime class)
instead of `firecracker`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3cc20b47a6)
The tests are failing to finish as the argument is invalid.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b5bad3cb0f)
That's what we've been using as part of Jenkins, so let's ensure things
will work as they did before, and only after that consider upgrading the
base OS used for the tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit aaec5a09f3)
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 27fa7d828d)
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fa62a4c01b)
GARM runners do not come with the whole set of tools we need, or are
used to when it comes to the GHA runners, so we need to manually install
bats on those.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 8c9380a798)
Let's put a 1 minute sleep, just to make sure everything is back up
again.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3de23034f8)
This PR changes the order in which the FIO test first
cleans the environment and then checks if the environment
is indeed clean.
Fixes: #7869
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit adfea55b8f)
As we were using `tee` without the `-a` (or `--apend`) aptton, the
containerd config would be overwritten, leading to a NotReady state of
the Node.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2df183fd99)
It should be plenty, and worked well in local tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 369a8af8f7)
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```
The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.
Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ada65b988a)
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.
As --write-config-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ad45ab5d33)
`wait` waits for a job to complete, not a number of seconds. Not sure
how I got that wrong in the first place, but it's what it's.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 028a97e0d5)
This PR replaces the ubuntu image for one which has TensorFlow optimized
for kata metrics.
Fixes#7866
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 3a427795ea)
Let's enable the devmapper kubernetes tests to match exactly what's been
tested as part of the Jenkins CI.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 0e8bd50cbb)
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.
This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers. OTOH, this is exactly what we've always been doing as
part of the tests.
We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.
It's important to note that the devmapper setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit b28b54df04)
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.
As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, and which has been proven to be really good on
getting things rotten.
With this in mind, I'm taking the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.
It's important to note that the k3s setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 54f7117212)
This PR is to skip installing docker-compose-plugin while buiding a `build-kata-deploy` image for s390x|ppc64le.
It is a temporary solution to fix current CI failures for s390x regarding `hash sum mismatch`.
Fixes: #7848
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
(cherry picked from commit 2efda20c77)
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 438fbf9669)
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.
Fixes#7842
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 024b4d2ffe)
Previously, static-checks-dragonball is using machines from Alibaba
Cloud to run all the CI jobs.
Currently, we are going through an internal process to apply for the new
machines for Dragonball CI. Before the internal process is over, we will
temporarily use Azure VM to run static-checks-dragonball jobs.
fixes: #7838
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
(cherry picked from commit 60f733d301)
Pass --owner and --group to the tar invokation to prevent gihtub runner user
from leaking into release artifacts.
Fixes: #7832
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 18c94ebbe3)
This PR fixes an issues in the parsing results stage,
by collecting just the n-results from the n-running
containers, discarding irrelevant data.
Fixes: #7774
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 538c965c2b)
We have two scenarios we care about this, `pull_request` and
`pull_request_target` events triggered a job.
`pull_request` event:
When using the checkout action, it'll already provide a "rebased atop of
main" repo for us, nothing else is needed, and that's basically what we
already have as part of the jobs in our CI.
`pull_request_target` event:
This one is a little bit tricky, as the checkout action, unless passing
a spsecific repo, give us the PR checked out rebased atop of the HEAD of
the PR branch. Jeremi Piotrowski nicely pointed out that we could use
github.event.pull_request.merge_commit_sha instead, which is the result
of the PR's branch with the official repo target branch.
Now, the only cases where the contributor's rebase would still be needed
is when the action itself has been changed.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ac939c458c)
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.
Fixes#7812
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 6668825752)
At this point we should always be using the latest checkout action.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d7a996c686)
This PR adds the limit for 90 percentile for qemu value for
FIO kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit a7b59a5bf9)
This PR adds the limit for write 90 percentile value for clh for
FIO metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 99db6568e9)
This PR fixes the memory inside limit for clh for kata metrics due
to the recent changes that we had in the script which impacted
in the performance measurement.
Fixes#7786
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 8877ec62fb)
Let's expand the confidential test to also support TDX.
The main difference on the test, though, is that we're not grepping for
a string in the `dmesg` output, but rather relying on `cpuid` to detect
a TDX guest.
Fixes: #7184
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e286e842c1)
Let's expand the confidential test to also support SNP.
Fixes: #7184
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
(cherry picked from commit e31f099be1)
Add a test case for the launch of unencrypted confidential
container, verifying that we are running inside a TEE.
Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.
Fixes: #7184
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c3b9d4945e)
The file can be removed between builds without causing any issue, and
leaving it around has been causing us some headache due to:
```
ERROR: open /home/runner/.docker/buildx/activity/default: permission denied
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3818bf3311)
Otherwise we'll have to re-run all the tests due to a flaky behaviour in
one of the parts.
Fixes: #7757
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fb49d5d7ce)
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.
Fixes: #7753
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
(cherry picked from commit 183f51d6f6)
At the end of k8s-kill-all-process-in-container.bats, delete the
deployment it created.
Fixes: #7752
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
(cherry picked from commit 6a974679f2)
The directory is a host path mount and cannot be removed from within the
container. What we actually want to remove is whatever is inside that
directory.
This may raise errors like:
```
rm: cannot remove '/opt/kata/': Device or resource busy
```
Fixes: #7746
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d8f3ce6497)
The vfio test requires nested-nested virtualization:
L0 Azure host
-> L1 Ubuntu VM
-> L2 Fedora VM
-> L3 Kata
This hits a kernel bug on v5.15 but works quite nicely on the v6.2 kernel
included in Ubuntu 23.04. We can switch back to Ubuntu 22.04 when they roll out
v6.2.
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 936e8091a7)
docker install now creates a group with gid 999 which happens to match what we
need to get docker-in-docker to work. Remove the group first as we don't need
it.
Fixes: #7726
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
(cherry picked from commit 3b881fbc0e)
This PR adds the TensorFlow ResNet50 fp32 Dockerfile for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 959ca49447)
We can simply use `rm -f` all over the place and avoid the container
returning any error.
Fixes: #7733
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 5cba38c175)
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.
This PR also enables FIO metrics test in the kata metrics-ci.
Fixes: #7665
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit fb571f8be9)
This PR fixes the check results for tensorflow benchmark now
that we change the name of the test.
Fixes#7684
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit e8a5119343)
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2d896ad12f)
This test, at least for now, only checks whether the runtimeclasses
have been properly created.
This is just a migration from a test we had as part of the k8s suite.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4ffc2c86f3)
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 285e616b5e)
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.
Right now it doesn't have any tests, and the command to actually run the
tests is commented out, but right now this is just a placeholder that
will be populated sooner than later.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ce6adecd0a)
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's already get rid of this one and avoid
more kata-deploy tests to land here.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit cfc29c11a3)
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
Fixes: #7663
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
(cherry picked from commit 339569b69c)
Let's add the tests as part of the ci.yaml, so they an be triggered as
part of each PR.
For this PR those tests won't be triggered, courtesy to the
`pull_request_target` event we rely on.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d19a75e80c)
This PR fixes the TensorFlow word across the document to have uniformity
across all the document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit bade6a5c3b)
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b3592ab25c)
As part of the runners, we're hitting a timeout that I cannot reproduce,
at all, when allocating the same instance and running the tests
manually.
The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.
It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 84dd02e0f9)
It'll help us to debug failures with the pod stop / pod delete.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit b29782984a)
We need this to fully understand what are the issues we're facing.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ae0930824a)
This improves readability in case of failures by a lot.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 6c8b2ffa60)
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.
Fixes#7645
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 18a7fd8e4e)
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit e55fa93db9)
This will not be tested as part of the PR, thanks to the
`pull_request_target` event, but we want it to be added so we can build
atop of that in a coming up series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit d9ee17aaec)
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 831e73ff91)
Let's split a good portion of `tests/integration/kuberentes/gha-run.sh`
out, and put them in a place where they can be used to the soon-to-come
kata-deploy specific tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit af1b46bbf2)
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.
Fixes: #7628
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 767434d50a)
The GHA runners are not exactly powerful, which makes the static-checks
take way too long (almost an hour).
Let's give a try and move those to the same size of Azure instances used
as part of our CI, and probably have this time reduced.
Fixes: #7446
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c52d090522)
This PR adds check containers are running in tensorflow mobilenet
that is being defined in common script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit fdcd52ff78)
This PR adds the check containers are up function from common
in tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 36337ee146)
This PR adds the check containers are running function the common metrics
script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 833cf7a684)
This PR adds the check containers are up in the common script
in the tensorflow mobilenet script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 918c783084)
This PR uses the check containers are up from the common script
in the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 9d57a1fab4)
This PR adds check containers are up in common script for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 1c84680d8c)
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit d3e57cf454)
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 286de046af)
This PR computes average results for TF bench.
Additionally, it improves the data parsing from
all running containers.
Fixes: #7603
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 473b0d3a31)
The build context should be the folder where the Dockerfile is present,
otherwise the files copied into the image won't be found.
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 03d1fa67b1)
Let's make sure that we don't fail in case we're building non x86_64.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit eb463b38ec)
Following the example on https://github.com/docker/build-push-action,
it's clear that the actions to "Set up QEMU" and "Set up Docker Buildx"
are missing.
Let's add them, and also take the advantage to bump the
build-push-action to its v4, which, by the way, had a typo on its name
(build-and-push-action does **NOT** exist, build-push-action does).
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit a2d731ad26)
Right now this is not being used, but it'll as the image generated for
the confidential tests have that as part of their tag.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 43fe5d1b90)
One of the joys to rely on the `pull_request_target` is to only be able
to catch those after those are merged.
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 54f6a78500)
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.
Fixes: #7491
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 034d7aab87)
This will be done before running TEE tests, and it's a hard dependency
fr them.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit fac8ccf5cd)
Let's add here the image we'll be using for unencrypted confidential
tests. Later on, we'll make sure to build and use this image as part of
our CI.
The image can easily be built as a multi-arch image, and has `cpuid`
installed in case of `x86_64` build, so it can be used to detect whether
we're running on a TEE guest without having to rely on `dmesg | grep
...`.
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ab5f603ffa)
Instead of using package manager to install bats, building
this from source. This gives us the updated version of bats
which supports functions such as setup_file and
teardown_file.
We can use these functions into our current tests.
Fixes: #7597
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
(cherry picked from commit aeaec9dae9)
This PR changes the metrics workflow in order to just install
kata once, and run the checks for multiple hypervisor variations.
In this way we save time avoiding installing kata for each
hypervisor to be tested.
Fixes: #7578
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit e664969862)
This PR renames the mobilenet tensorflow test to have a more specific
tensorflow name mainly because tensorflow has different configurations
and we will add more tensorflow tests so we want to distinguish each
tensorflow test.
Fixes#7571
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 863283716d)
This PR adds the iperf network metrics to the github actions
for kata metrics.
Fixes#7535
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
(cherry picked from commit 5b5caf8908)
This commit provides a new way to name the containers used
in the launch-times-test in this form:
'kata_launch_times_RANDOM_NUMBER', where RANDOM_NUMBER is
in the 0-1000 range.
Fixes: #7529
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
(cherry picked from commit 1e15369e59)
when interacting with systemd. We have occasionally faced issues with
compatibility between the systemctl version used inside the kata-deploy
container and the systemd version on the host. Instead of using a containerized
systemctl with bind mounted sockets, nsenter the host and run systemctl from
there. This provides less coupling between the kata-deploy container and the
host.
Fixes: #7511
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Version 0.10.5, which was just released, breaks `nydus-storage`.
This is a workaround to fix the CI which is blocking other PRs.
Fixes: #7541
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
- ci-on-push: Make the CI also run for the stable-* branches
- ci: k8s: Do not fail when gathering info on AKS nodes
- kata-deploy: enable cross build for non-x86
- runtime-rs: add support for gather metrics in runtime-rs
- kata-ctl: add monitor subcommand for runtime-rs
- release: release-note.sh: Fix typos and reference to images
- metrics: Add sysbench performance test
- Simplify implementation of runtime-rs/service
6ad16d497 release: Adapt kata-deploy for 3.2.0-rc0
025596b28 ci-on-push: Make the CI also run for the stable-* branches
7ffc0c122 static-build: enable cross build for qemu
35d6d86ab static-build: enable cross-build for image build
2205fb9d0 static-build: enable cross build for virtiofsd
11631c681 static-build: enable cross build for shim-v2
7923de899 static-build: cross build kernel
e2c31fce2 kata-deploy: enable cross build for kata deploy script
2fc5f0e2e kata-depoly: prepare env for cross build in lib.sh
f5e9985af release: release-note.sh: Fix typos and reference to images
f910c66d6 ci: k8s: Do not fail when gathering info on AKS nodes
632818176 metrics: Add k8s sysbench documentation
b3901c46d runtime-rs: ignore errors during clean up sandbox resources
5a1b5d367 metrics: Add sysbench pod yaml
ad413d164 metrics: Add sysbench dockerfile
151256011 metrics: Add sysbench performance test
62e328ca5 runtime-rs: refine implementation of TaskService
458e1bc71 runtime-rs: make send_message() as an method of ServiceManager
1cc1c81c9 runtime-rs: fix possibe bug in ServiceManager::run()
1a5f90dc3 runtime-rs: simplify implementation of service crate
731e7c763 kata-ctl: add monitor subcommand for runtime-rs The previous kata-monitor in golang could not communicate with runtime-rs to gather metrics due to different sandbox addresses. This PR adds the subcommand monitor in kata-ctl to gather metrics from runtime-rs and monitor itself.
d74639d8c kata-ctl: provide the global TIMEOUT for creating MgmtClient
02cc4fe9d runtime-rs: add support for gather metrics in runtime-rs
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we only support one stable branch, it'll be used as part of the
stable-3.2 and onwards.
Fixes: #7518
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Depends on mutiarch feature of ubuntu, we can set up cross build
environment easily and achive as good build performance as native
build.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
It's too long a time to cross build agent based on docker buildx, thus
we cross build rootfs based on a container with cross compile toolchain
of gcc and rust with musl libc. Then we get fast build just like native
build.
rootfs initrd cross build is disabled as no cross compile tolchain for
rust with musl lib if found for alpine and based on docker buildx takes
too long a time.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Based on messense/rust-musl-cross which offer cross build musl lib
environment to cross compile virtiofsd.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
shim-v2 has go and rust code. For rust code, we use messense/rust-musl-cross
to build for speed up as it doesn't depends on qemu emulation. Build go
code based on docker buildx as it doesn't support cross build now.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
kata-deploy-binaries-in-docker.sh is the entry to build kata components.
set some environment to facilitate the following cross build work.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
We leverage three env, TARGET_ARCH means the buid target tuple;
ARCH nearly the same meaning with TARGET_ARCH but has been widely
used in kata; CROSS_BUILD means if you want to do cross compile.
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
diferent -> different
And also let's make sure we escape the backticks around the kata-deploy
environment variables, otherwise bash will try to interpret those.
Fixes: #7497
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise the VM deletion may not delete, leaving us with several
machines behind.
Fixes: #7509
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This requires the GITHUB_UPLOAD_TOKEN. While we're here, let's also fix
the name of the action and remove the "-tarball" suffix, as it's not
really a tarball.
Fixes: #7497
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`stage` has been added, but only hooked up to the amd64 logic, leaving
arm64 and s390x behind.
Let's fix this right now, and make sure no error occurs when passing
this down to the yaml files.
Fixes: #7497
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 7c857d38c1.
I've misunderstood the error given by github action, let's fix this in
the next commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we'll face the following error as part of our GHA:
```
The workflow is not valid.
kata-containers/kata-containers/.github/workflows/release-$foo.yaml
(Line: 13, Col: 14): Invalid input, stage is not defined in the
referenced workflow.
```
Fixes: #7497
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we can rely on the tests passing down whether they want
to be tested using Node Feataure Discovery or not.
Right now, only the TDX job has this option set to "true", all the other
jobs have this option set to "false".
We can and have to merge this one before merging the NFD related patches
as:
1) It causes no harm in exporting this environment variable, but not
having it used
2) It will allow us to test the NFD after this one is merged, as changes
in the yaml file, in the case of the pull_request_target event, are
not taken into consideration before they're merged
Fixes: #7495
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If modeVFIO is enabled we need 1st to attach the VFIO control group
device /dev/vfio/vfio an 2nd the actuall device(s) afterwards.Sort the
devices starting with device #1 being the VFIO control group device and
the next the actuall device(s)
/dev/vfio/<group>
Fixes: #7493
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
- tests: Add `k8s-volume` and `k8s-file-volume` tests to GHA CI
- metrics: Update boot time for kata metrics
- metrics: Add FIO report files for kata metrics
- kata-deploy: Allow runtimeclasses to be created by the daemonset
- runtime-rs: change block index to 0
- agent: fix typo in constant
- metrics: Add FIO benchmark for metrics tests
- gha: dragonball: Run only on the dragonball labeled machine
- tests: Fix `k8s-job` test
- agent,libs: Remove unused 'mut' keywords
- runtime-rs: remove unneeded 'mut' keywords
- tests: QoL improvements for running tests locally
- agent: exclude symlinks from recursive ownership change
- cache: kernel: Fix kernel caching
- runk: Add Docker guide to README
- metrics: General improvements to json.bash script
- kata-deploy: Allow shim creation based on what's passed to the daemonset
- gha: ci: Add skeleton of vfio job
- s390x: Fixing device.Bus assignment
- release: Mention the container images used to build the project
- kata-deploy-binaries: kernel_cache: Take module_dir into account
- ci: nydus: Fix typo in "source"
- gha: ci: Add no-op nydus tests to our CI
- Dragonball: migrate dragonball-sandbox crates to Kata
- ci: gha: Add cri-containerd tests (but still do not enable them)
- packaging/tools: Add kata-debug and use it as part of our CI
- cache: kernel: Consider changes in tools/packaging/kernel
- kata-deploy: Properly get the path of the versions.yaml file
- kata-deploy: Add VERSION and versions.yaml to the final tarball
- metrics: Add C-Ray performance test
- metrics: enable TensorFlow benchmark to be run on gha
- metrics: Add function to memory inside container script
- Revert "metrics: Replace backslashes used to escape double quoted key in jq expr"
- versions: Bump virtiofsd to v1.7.0
- metrics: stop hypervirsor and shim at init_env stage
- ci: k8s: Adapt "source ..." to the new location of gha-run.sh
- ci: Move `tests/integration/gha-run.sh` to `tests/integration/kuberentes/` ... and also remove KUBECONFIG from the tdx envs
- versions: Update kernel to version v6.1.x
- agent: Fix exec hang issues with a backgroud process
- agent: Ignore already mounted dev/fs/pseudo-fs
- ci: k8s: Bring TDX tests back
- metrics: Update machine learning documentation
- gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
- tests: Add MobileNet Tensorflow performance benchmark
- metrics: replace backslashes used to escape double quoted jq key expr.
- runtime-rs: enhancement of Device Manager for network endpoints.
- feat(Tracing): tracing in Rust runtime
- runtime-rs: ignore unconfigured network interfaces
- metrics: Stop running kata-env before kata is properly installed.
- metrics: use rm -f to remove the oldest continerd config file.
- kernel: Update kernel config name
- kata-deploy: Add a debug option to kata-deploy (and also use it as part of our CI)
- runtime-rs: add parameter for propagation of (u)mount events
- kata-ctl: Move GuestProtection code to kata-sys-util
- tests: Add function before function name in common.bash for metrics
- tests: Add metrics storage documentation
- metrics: Fix metrics ts generator to treat numbers as decimals
- gha: ci: Add cri-containerd tests skeleton -- follow up 1
- dragonball/agent: Add some optimization for Makefile and bugfixes of unit tests on aarch64
- metrics: Enable blogbench test
- tests: Add machine learning performance tests
- tests: gha: ci: Add cri-containerd tests skeleton
- metrics: Enable memory inside container metrics
- tools: Use a consistent target name when building mariner initrd
- gha: ci: Gather info about the node / pods
- runtime-rs: Do not scan network if network model is "none"
- gha: k8s: tdx: Temporarily disable TDX tests
- metrics: Update memory usage script
- gha: Cancel previous jobs if a PR is updated
- gha: nightly: Fix long name of AKS clusters issue and make the CI easier to test
- README: Add badge for our Nightly CI
- gha: Do not run all the tests if only docs are updated
- bugfix: plus default_memory when calculating mem size
- gha: ci: Use github.sha to get the last commit reference
- dragonball: Don't fail if a request asks for more CPUs than allowed
- gha: ci: Fix refernce passed to checkout@v3
- gha: ci: Avoid using env also in the ci-nightly and payload-after-push
- gha: k8s: Ensure cluster doesn't exist before creating it
- gha: ci: More follow up fixes after adding a nightly CI
- tests: Enable running k8s tests on Mariner
- gha: ci: Avoid using env unless it's really needed
- gha: ci: Follow up fixes for the nightly jobs
- tests: Enable memory usage metrics tests
- gha: Add nightly jobs
- metrics: storing metrics workflow artifacts
- gha: k8s: Ensure tests are running on a specific namespace
- metrics: Adds blogbench and webtool metrics tests
- gha: dragonball: Correctly propagate PATH update
- versions: Upgrade to Cloud Hypervisor v33.0
- Convert `is_allowed`, `ttrpc_error` and `sl` to functions
- gha: release: Use a specific release of hub
- metrics: Add checkmetrics to gha-run.sh for metrics CI
- packaging: Fix indentation of build.sh script at ovmf
- doc: Add documentation for the virtualization reference architecture
- gpu: Update kernel building to the latest changes
- runtime: fix PCIe topology for GPUDirect use-case
- metrics: Add memory footprint tests
- runtime: Add "none" as a shared_fs option
- metrics: Uniformity across function names in gha-run.sh
- runtime-rs: support physical endpoint using device manager
- runtime-rs: bugfix for direct volume path's validation.
- metrics: Fix retrieving hypervisor version on metrics
- runtime-rs: fix build error on AArch64
- checkmetrics: Add checkmetrics makefile and documentation
- docs: Add boot time metrics documentation
- runtime-rs: add support spdk/vhost-user based volume.
- static-build: Remove kata-version parameter
- dragonball: avoid obtaining lock twice in create_stdio_console
- metrics: Add checkmetrics for kata metrics CI
- metrics: enable launch-times test on gha-run metrics script
- docs: Add general metrics documentation
- add support vfio device manager
- gha: Don't automatically trigger CI
- kata-ctl: Check for vm capability
- docs: fix spelling of "crate"
- packaging: Fix indentation in init.sh script
- gha: Fix gha actions
- metrics: install kata and launch-times test
- tests: Move tests helper script to this repo
- tests: Add json script for metrics tests
- Cherry pick initramfs caching updates from CCv0
- gha: Fix format for run launchtimes metrics yaml
- tests: Add tests lib common script
- Fix deprecated virtiofsd args (go shim only)
- gha: Add base branch on SHA on pull requst
- gha: ci-on-push: Run metrics tests
- docs: Update Developer Guide
- runtime-rs: Enhance flexibility of virtio-fs config
- versions: Update firecracker version to 1.3.3
- tools: Fix no-op builds
- runtime-rs: update Cargo.lock
- gha: Fix `stage` definition in matrix
- feat(runtime): vcpu resize capability
- packaging: Remove snap package
- gha: Add new build targets for Mariner
- Dragonball: support resize memory
- Port Measured rootfs feature from CCv0 branch to main
- add support direct volume and refactor device manager
- gha: Fix gha-run.sh and unbreak CI
- kata-ctl: Switch to slog logging; add --log-level and --json-logging arguments
- log-parser: Update log parser link at README
- gha: aks: Extract `run` commands to a script
- runtime-rs: handle copy files when share_fs is not available
- agent-ctl: fix the compile error
- agent: fix the issue of exec hang with a backgroud process
- runtime-rs: bugfix: update Cargo.lock
- gha: aks: Use short SHA in cluster name
- README: Display badge for the "Publish Artefacts" job and update the Kata Containers logo
- kata-deploy: Change how we get the Ubuntu k8s key
- gha: aks: Ensure host_os is used everywhere needed
- kubernetes: add agnhost command in pod yaml
- main | release: Standardize kata static file name
- packaging: make BUILDER_REGISTRY configurable
- gha: aks: Add the host_os as part of the aks cluster's name
- kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
- gha: Fix Mariner cluster creation
- gha: Unbreak CI and fix cluster creation step
- Dragonball: support vcpu hotplug on aarch64
- runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
- runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
- gha: Create Mariner host as part of k8s tests
- netlink: Fix the issue of update_interface
- gha: Increase timeout for AKS jobs and give more time to start running the tests
- runtime: sending SIGKILL to qemu
- dragonball: convert BlockDeviceMgr and VirtioNetDeviceMgr functions to methods
- dragonball: Remove virtio-net and vsock devices gracefully
- kata-deploy: Improve shim backup / restore
- doc: Update git commands
- kata-deploy: Fix indentation on kata deploy merge script
8353aae41 ci: k8s: Rework get_nodes_and_pods_info()
6ad5d7112 ci: k8s: Do not gather node info before running the tests
5261e3a60 ci: k8s: Group messages to improve readability
9cc6b5f46 ci: k8s: Get logs from kata-deploy
9d285c622 ci: k8s: Let kata-deploy take care of the runtimeclasses
87568ed98 gha: Test split out runtimeclasses are in sync with all-in-one file
39192c608 kata-deploy: Print variables passed to the script
0e157be6f kata-deploy: Allow runtimeclasses to be created by the daemonset
a27433324 kata-deploy: Change default values of DEBUG
69535b808 kata-deploy: runtimeclass: Split out entries
9e1710674 kata-runtimeClasses: Alphabetically sort the enrties
6222bd910 tests: Add k8s-file-volume test
187a72d38 tests: Add k8s-volume test
0c8427035 metrics: Add boot time value for qemu
6520dfee3 metrics: Update boot time for kata metrics
ff2279061 metrics: Update runtime and configuration paths
a5d4e3388 metrics: Add compare virtiofsd dax script
5e937fa62 metrics: Update general FIO tests
b0bea47c5 metrics: Add makefile to report generator
73c57b9a1 metrics: Add FIO report files for kata metrics
c8fcd29d9 runtime-rs: use device manager to handle virtio-pmem
901c19225 runtime-rs: support configure vm_rootfs_driver
5d6199f9b runtime-rs: use device manager to handle vm rootfs
20f1f62a2 runtime-rs: change block index to 0
662f87539 metrics: Add general FIO makefile
c5a87eed2 tests: gha: Add timeout to cluster creation
6daeb08e6 tests: k8s: Clean up node debuggers after running
3aa6c77a0 gha: dragonball: Run only on the dragonball labeled machine
37641a543 metrics: Add example config for fio jobs
314aec73d agent: fix typo in constant
4703434b1 tests: k8s: Allow using custom resource group
350f3f70b tests: Import `common.bash` in `run_kubernetes_tests.sh`
d7f04a64a tests: k8s: Leave `runtimeclass_workloads/` alone
bdde6aa94 tests: k8s: Split deployment and testing commands
91a0b3b40 tests: aks: Simply delete cluster when cleaning up
3c1044d9d metrics: Update FIO paths for k8s runner
6177a0db3 metrics: Add env files for FIO
a45900324 metrics: Add fio exec
ea198fddc metrics: Add FIO runner k8s
8f7ef41c1 metrics: Add FIO vendor code
6293c17bd metrics: Add FIO benchmark for metrics tests
ff4cfcd8a runk: Add Docker guide to README
c8ac56569 cache: kernel: Harmonize commit with fetching side
81775ab1b cache: kernel: Fix SEV kernel caching
717f775f3 gha: ci: Add skeleton of vfio job
b9f100b39 agent,libs: Remove unused 'mut' keywords
a56f96bb2 kata-deploy: Allow shim creation based on what's passed to the daemonset
4a5ab38f1 metrics: General improvements to json.bash script
d4eba3698 kata-deploy-binaries: kernel_cache: Take module_dir into account
b7c9867d6 release: Mention the container images used to build the project
7c4b59781 ci: nydus: Fix typo in "source"
6a680e241 gha: ci: Add placeholder for the nydus tests as part of the CI
fb4f7a002 gha: nydus: Add a no-op GHA for nydus
4a207a16f gha: nydus: Bring tests as they are from the tests repo
2c8f83424 runtime-rs: remove unneeded 'mut' keywords
1fc715bc6 s390x: Add AP Attach/Detach test
e91f5edba ci: cri-containerd: Fix default typo for testContainerStart()
8b8aef09a ci: cri-containerd: Temporarily disable TestContainerSwap
56767001c ci: cri-containerd: Add namespace / uid to the pods
a84773652 ci: cri-containerd: Always use sudo to call crictl
99ba86a1b ci: cri-containerd: Add /usr/local/go/bin to the PATH
7f3b30999 ci: cri-containerd: Add `function` before each function
fde22d6bc ci: cri-containerd: Assume podman is always used
9465a0496 ci: cri-containerd: Adapt "source ..." to this repo
df8d14411 ci: cri-containerd: Remove CI variable
f90570aef ci: cri-containerd: Remove unused runc_runtime_bin
c3637039f ci: cri-containerd: Remove KILL_VMM_TEST env var
bc4919f9b ci: cri-containerd: Always run shim-v2 tests
f9e332c6d ci: cri-containerd: Stop cloning containerd
cfd662fee ci: cri-containerd: Remove ununsed SNAP_CI var
d36c3395c ci: cri-containerd: Update copyright
b5be8a4a8 ci: cri-containerd: Move integration-tests.sh as it was
f2e00c95c ci: cri-containerd: Populate install_dependencies()
897955252 versions: Add "latest" field for cri-tools
1bbcbafa6 ci: Add clone_cri_container()
f66c68a2b ci: Add install_cri_tools()
4dd828414 ci: Add install_cri_containerd()
ad47d1b9f ci: Add download_github_project_tarball()
788c562a9 ci: Add get_latest_patch_release_from_a_github_project()
6742f3a89 ci: Use `function` before each install_go.sh function
5eacecffc ci: Adjust paths for install_go.sh
8ed1595f9 ci: Update copyright for install_go.sh
6123d0db2 ci: Move install_go.sh as it was
8653be71b ci: Do not take cross-build into consideration for kata-arch.sh
6a76bf92c ci: Fix style / identation if kata-arch.sh
72743851c ci: Add `function` before each kata-arch.sh function
9f6d4892c ci: Update copyright for kata-arch.sh
6f73a7283 ci: Move kata-arch.sh as it was
3615d7343 ci: Add get_from_kata_deps()
34779491e gha: kubernetes: Avoid declaring repo_root_dir
f3738beac tests: Use $HOME/go as fallback for $GOPATH
b87ed2741 tests: Move `ensure_yq` to common.bash
124e39033 tests: common: Fix quoting when globbing
db77c9a43 tests: Make install_kata take care of the links
13715db1f tests: Do not call `install_check_metrics` when installing kata
630634c5d ci: k8s: Group logs to make them easier to read
228b30f31 ci: k8s: Gather node info during the cleanup
81f99543e ci: k8s: Cleanup cluster before deleting it
38a7b5325 packaging/tools: Add kata-debug
ae6e8d2b3 kata-deploy: Properly get the path of the versions.yaml file
309e23255 cache: kernel: Consider changes in tools/packaging/kernel
59fdd69b8 kata-deploy: Add VERSION and versions.yaml to the final tarball
5dddd7c5d release: Upload versions.yaml as part of the release
bad3ac84b metrics: Rename C-Ray to cpu performance tests
87d99a71e versions: Remove "kernel-experimental"
545de5042 vfio: Fix tests
62aa6750e vfio: Added better handling of VFIO Control Devices
dd422ccb6 vfio: Remove obsolete HotplugVFIOonRootBus
114542e2b s390x: Fixing device.Bus assignment
371a118ad agent: exclude symlinks from recursive ownership change
e64edf41e metrics: Add tensorflow function in gha-run script
67a6fff4f metrics: Enable tensorflow benchmark on gha
01450deb6 Revert "metrics: Replace backslashes used to escape double quoted key in jq expr."
843006805 metrics: Add function to memory inside container script
bbd3c1b6a Dragonball: migrate dragonball-sandbox crates to Kata
fad801d0f ci: k8s: Adapt "source ..." to the new location of gha-run.sh
55e2f0955 metrics: stop hypervirsor and shim at init_env stage
556e663fc metrics: Add disk link to general metrics README
98c121709 metrics: Add C-Ray README
8e7d9926e metrics: Add C-Ray Dockerfile
e2ee76978 metrics: Add C-Ray performance test
2ee2cd307 ci: k8s: Move gha-run.sh to the kubernetes dir
88eaff533 ci: tdx: Adjust KUBECONFIG
c09e268a1 versions: Downgrade SEV(-SNP) kernel back to v5.19.x
6a7a32365 versions: Bump virtiofsd to v1.7.0
ac5f5353b ci: k8s: Bring TDX tests back
950b89ffa versions: Update kernel to version v6.1.38
8ccc1e5c9 metrics: Update machine learning documentation
f50d2b066 gha: ci: cri-containerd: Fix KATA_HYPERVSIOR typo
620b94597 metrics: Add Tensorflow Mobilenet documentation
6c91af0a2 agent: Fix exec hang issues with a backgroud process
59f4731bb metrics: Stop running kata-env before kata is properly installed.
468f017e2 metrics: Replace backslashes used to escape double quoted key in jq expr.
64f013f3b ci: k8s: Enable debug when running the tests
8f4b1df9c kata-deploy: Give users the ability to run it on DEBUG mode
2c8dfde16 kernel: Update kernel config name
150e54d02 runtime-rs: ignore unconfigured network interfaces
3ae02f920 metrics: use rm -f to remove older continerd config file.
a864d0e34 tests: Add tensorflow mobilenet dockerfile
788d2a254 tests: Add tensorflow mobilenet performance test
3fed61e7a tests: Add storage link to general metrics documentation
b34dda4ca tests: Add storage blogbench metrics documentation
6787c6390 runtime-rs: add parameter for propagation of (u)mount events
6e5679bc4 tests: Add function before function name in common.bash for metrics
62080f83c kata-sys-util: Fix compilation errors
02d99caf6 static-checks: Make cargo clippy pass.
982420682 agent: Make the static checks pass for agent
61e4032b0 kata-ctl: Remove all utility functions to get platform protection
a24dbdc78 kata-sys-util: Move utilities to get platform protection
dacdf7c28 kata-ctl: Remove cpu related functions from kata-ctl
f5d195717 kata-sys-util: Move additional functionality to cpu.rs
304b9d914 kata-sys-util: Move CPU info functions
7319cff77 ci: cri-containerd: Add LTS / Active versions for containerd
2a957d41c ci: cri-containerd: Export GOPATH
75a294b74 ci: cri-containerd: Ensure deps are installed
6924d14df metrics: Fix metrics ts generator to treat numbers as decimals
9e048c8ee checkmetrics: Add blogbench read value for qemu
2935aeb7d checkmetrics: Add blogbench write value for qemu
02031e29a checkmetrics: Add blogbench read value for clh
107fae033 checkmetrics: Add blogbench write value for clh
8c75c2f4b metrics: Update blogbench Dockerfile
49723a9ec metrics: Add double quotes to variables
dc67d902e metrics: Enable blogbench test
438fe3b82 gha: ci: Add cri-containerd tests skeleton
bd08d745f tests: metrics: Move metrics specific function to metrics gha-run.sh
3ffd48bc1 tests: common: Move a few utility functions to common.bash
7f961461b tests: Add machine learning README
bb2ef4ca3 tests: Add `function` before each function
063f7aa7c tests: Add Pytorch Dockerfile
1af03b9b3 tests: Add Pytorch performance test
4cecd6237 tests: Add tensorflow Dockerfile
c4094f62c tests: Add metrics machine learning performance tests
89b622dcb gha: k8s: tdx: Temporarily disable TDX tests
8c9d08e87 gha: ci: Gather info about the node / pods
283f809dd runtime-rs: Enhancing Device Manager for network endpoints.
a65291ad7 agent: rustjail: update test_mknod_dev
46b81dd7d agent: clippy: fix cargo clippy warnings
c4771d9e8 agent: Makefile: enable set SECCOMP dynamically
a88212e2c utils.mk: update BUILD_TYPE argument
883b4db38 dragonball: fix cargo test on aarch64
6822029c8 runtime-rs: Do not scan network if network model is "none"
ce54e43eb metrics: Update memory usage script
fbc2a91ab gha: Cancel previous jobs if a PR is updated
307cfc8f7 tools: Use a consistent target name when building mariner initrd
d780cc08f gha: nightly: Also use `workflow_dispatch` to trigger it
b99ff3026 gha: nightly: Fix name size limit for AKS
aedc586e1 dragonball: Makefile: add coverage target
310e069f7 checkmetrics: Enable checkmetrics for memory inside test
1363fbbf1 README: Add badge for our Nightly CI
1776b18fa gha: Do not run all the tests if only docs are updated
28c29b248 bugfix: plus default_memory when calculating mem size
0c1cbd01d gha: ci: after-push: Use github.sha to get the last commit reference
37a955678 gha: ci: nightly: Use github.sha to get the last commit reference
ed23b47c7 tracing: Add tracing to runtime-rs
96e9374d4 dragonball: Don't fail if a request asks for more CPUs than allowed
38f0aaa51 Revert "gha: k8s: dragonball: Skip k8s-number-cpus"
828a72183 gha: k8s: dragonball: Skip k8s-oom
a79505b66 gha: k8s: dragonball: Skip k8s-number-cpus
275c84e7b Revert "agent: fix the issue of exec hang with a backgroud process"
2be342023 checkmetrics: Add memory usage inside container value for qemu
6ca34f949 checkmetrics: Add memory inside container value for clh
6c6892423 metrics: Enable memory inside container metrics
0ad298895 gha: ci: Fix refernce passed to checkout@v3
86904909a gha: ci: Avoid using env also in the ci-nightly and payload-after-push
f72cb2fc1 agent: Remove shadowed function, add slog-term
1d05b9cc7 gha: ci: Pass down secrets to ci-on-push / ci-nightly
c5b4164cb gha: ci: Fix tarball-suffix passed to the metrics tests
07810bf71 agent: Ignore already mounted dev/fs/pseudo-fs
11e3ccfa4 gha: ci: Avoid using env unless it's really needed
c45f646b9 gha: k8s: Ensure cluster doesn't exist before creating it
1a7bbcd39 gha: ci: Fix typo pull_requesst -> pull_request
ddf4afb96 gha: ci: Fix set-fake-pr-number job
8a0a66655 gha: ci: schedule expects a list, not a map
5c0269dc5 gha: ci: Add pr-number input to the correct job
de83cd9de gha: ci: Use $VAR instead of ${{ env.VAR }}
6acce83e1 metrics: Fix the call to check_metrics function
e067d1833 gha: Add a nightly CI job
7c0de8703 gha: k8s: Ensure tests are running on a specific namespace
106e30571 gha: Create a re-usable `ci.yaml` file
cc3993d86 gha: Pass event specific info from the caller workflow
4e396e728 metrics: Add function keyword to to helper metrics functions
1ca17c2f7 metrics: storing metrics workflow artifacts
5a61065ab checkmetrics: Add checkmetrics value for memory usage in qemu
78086ed1f checkmetrics: Add memory usage value for clh
1c3dbafbf metrics: Fix function of how to retrieve multiple values
18968f428 metrics: Add function to have uniformity
35d096b60 metrics: Adds blogbench and webtool metrics tests
d8f90e89d metrics: Rename function at memory usage script
b9d66e0d5 metrics: Fix double quotes variables in memory usage script
476a11194 tests: Enable memory usage metrics tests
b568c7f7d tests/integration: Provide default value for KATA_HOST_OS
d6e96ea06 tests/integration: Use AzureLinux instead of Mariner
40c46c75e tests/integration: Perform yq install in run_tests()
d8b8f7e94 metrics: Enable launch tests time metrics
72fd562bd gha: release: Use a specific release of hub
0502354b4 checkmetrics: Add checkmetrics json for qemu
b481ef188 makefile: Add -buildvcs=false flag to go build
e94aaed3c ci_worker: Add checkmetrics ci worker for cloud hypervisor
917576e6f metrics: Add double quotes in all variables
cc8f0a24e metrics: Add checkmetrics to gha-run.sh for metrics CI
477856c1e gha: dragonball: Correctly propagate PATH update
1c211cd73 gha: Swap asset/release in build matrix
0152c9aba tools: Introduce `USE_CACHE` environment variable
2b5975689 tests: Build CLH with glibc for Mariner
80c78eadc tests: Use baked-in kernel with Mariner
532755ce3 tests: Build Mariner rootfs initrd
6a21e20c6 runtime: Add "none" as a shared_fs option
5681caad5 versions: Upgrade to Cloud Hypervisor v33.0
b2ce8b4d6 metrics: Add memory footprint tests to the CI
d035955ef doc: Add documentation for the virtualization reference architecture
0f454d0c0 gpu: Fixing typos for PCIe topology changes
6bb2ea819 packaging: Fix indentation of build.sh script at ovmf
0504bd725 agent: convert the `sl` macros to functions
0860fbd41 agent: convert the `ttrpc_error` macro to a function
0e5d6ce6d agent: convert the `is_allowed` macro to a function
f680fc52b agent: change `AGENT_CONFIG`'s lazy type to just `AgentConfig`
beb706368 metrics: Uniformity across function names
1f3e837e4 runtime-rs: fix build error on AArch64
6fd25968c runtime-rs: bugfix for direct volume path's validation.
415578cf3 docs: Add general README
bff4672f7 runtime-rs: support physical endpoint using device manager
32cba7e44 metrics: Fix retrieving hypervisor version on metrics
aa7946de4 checkmetrics: Add general checkmetrics documentation
2fac2b72f checkmetrics: Add checkmetrics makefile
e45899ae0 docs: Add time tests documentation reference
28130d3ce docs: Add boot time metrics documentation
0df2fc270 runtime-rs: add support spdk/vhost-user based volume.
17198089e vendor: Add vendor checkmetrics dependencies
f1dfea6e8 docs: Add metrics documentation reference
8330fb8ee gpu: Update unit tests
859359424 metrics: enable launch-times test on gha-run metrics script
c4ee601bf metrics: Add checkmetrics for kata metrics CI
e0d6475b4 gha: Don't automatically trigger CI
b535c7cbd tests: Enable running k8s tests on Mariner
71071bdb6 docs: Add general metrics documentation
610f7986e check: Relax the unrestricted_guest check when running in a VM
1b406b9d0 kata-ctl:Implement functionality to check host is capable of running VM
adf88eaa8 static-build: Remove kata-version parameter
09720babc docs: fix spelling of "crate"
7185afc50 gha: Fix gha actions
21294b868 packaging: Fix indentation in init.sh script
fad3ac9f5 metrics: install kata and launch-times test
4bbfcfaf1 tests: Move tests helper script to this repo
f152f0e8c metrics: Add launch-times to metrics tests
59510cfee runtime-rs: add support vfio device based volume
1e3b372bb runtime-rs: add support vfio device manager
6b0848930 gha: Fix format for run launchtimes metrics yaml
3cefa43e7 tests: Add json script for metrics tests
6a3710055 initramfs: Build dependencies as part of the Dockerfile
aa2380fdd packaging: Add infra to push the initramfs builder image
1c7fcc6cb packaging: Use existing image to build the initramfs
a43ea24df virtiofsd: Convert legacy `-o` sub-options to their `--` replacement
8e00dc694 virtiofsd: Drop `-o no_posix_lock`
2a15ad978 virtiofsd: Stop using deprecated `-f` option
c3043a6c6 tests: Add tests lib common script
b16e0de73 gha: Add base branch on SHA on pull requst
72f2cb84e gpu: Reset cold or hot plug after overriding
fbacc0964 gpu: PCIe topology, consider vhost-user-block in Virt
bc152b114 gha: ci-on-push: Run metrics tests
dad731d5c docs: Update Developer Guide
b11246c3a gpu: Various fixes for virt machine type
40101ea7d vfio: Added annotation for hot(cold) plug
8f0d4e261 vfio: Cleanup of Cold and Hot Plug
b5c4677e0 vfio: Rearrange the bus assignemnt
b1aa8c8a2 gpu: Moved the PCIe configs to drivers
55a66eb7f gpu: Add config to TOML
da42801c3 gpu: Add config settings tests for hot-plug
de39fb7d3 runtime: Add support for GPUDirect and GPUDirect RDMA PCIe topology
9318e022a gpu: Add CC relates configs
b7932be4b gpu: Add Arm64 Kernel Settings
211b0ab26 gpu: Update Kernel Config
5f103003d gpu: Update kernel building to the latest changes
35e4938e8 tools: Fix no-op builds
347385b4e runtime-rs: Enhance flexibility of virtio-fs config
21d227853 versions: Update firecracker version to 1.3.3
0e2379909 gha: Fix `stage` definition in matrix
ae2cfa826 doc: add vcpu handlint doc for runtime-rs
7b1e67819 fix(clippy): fix clippy error
67972ec48 feat(runtime-rs): calculate initial size
aaa96c749 feat(runtime-rs): modify onlineCpuMemRequest
d66f7572d feat(runtime-rs): clear cpuset in runtime side
a0385e138 feat(runtime-rs): update linux resource when stop_process
a39e1e6cd feat(runtime-rs): merge the update_cgroups in update_linux_resources
fa6dff9f7 feat(runtime-rs): support vcpu resizing on runtime side
8cb4238b4 packaging: Remove snap package
213773998 runtime-rs: update Cargo.lock
56d2ea9b7 kata-ctl: Refactor kernel module check
9f7a45996 gha: Add `rootfs-initrd-mariner` build target
f28a62164 gha: Add `cloud-hypervisor-glibc` build target
8fb7ab751 dragonball: introduce virtio-balloon device
7ed949497 dragonball: introduce virtio-mem device
776a15e09 runtime-rs: add support direct volume.
a8e0f51c5 dragonball: extend DeviceOpContext
abae11404 runtime-rs: refactor device manager implementation
210a15794 dragonball: avoid obtaining lock twice in create_stdio_console
69668ce87 tests: gha-run: Use correct env variable for repo
f487199ed gha: aks: Fix argument in call to gha-run.sh
f6afae9c7 packaging: Add rootfs-image-tdx-tarball target
f62b2670c config: Add root hash value and measure config to kernel params
008058807 kernel: Integrate initramfs into Guest kernel
28b264562 initramfs: Add build script to generate initramfs
5cb02a806 image-build: generate root hash as an separate partition for rootfs
31c0ad207 packaging: Add cryptsetup support in Guest kernel and rootfs
980d084f4 log-parser: Update log parser link at README
410bc1814 agent-ctl: fix the compile error
77519fd12 kata-ctl: Switch to slog logging; add --log-level, --json-logging args
aab603096 gha: aks: Extract `run` commands to a script
e4eb664d2 runtime-rs: update rust to 1.69.0
ed37715e0 runtime-rs: handle copy files when share_fs is not available
5f6fc3ed7 runtime-rs: bugfix: update Cargo.lock
1c6d22c80 gha: aks: Use short SHA in cluster name
3c1f6d36d readme: Update Kata Containers logo
388684113 readme: Add status badge for the "Publish Artefacts" job
26f752038 kata-deploy: Change how we get the Ubuntu k8s key
aebd3b47d gha: aks: Ensure host_os is used everywhere needed
0c8282c22 gha: aks: Add the host_os as part of the aks cluster's name
4b89a6bda release: Standardize kata static file name
9228815ad kernel: Modify build-kernel.sh to accomodate for changes in version.yaml
03027a739 gha: Fix Mariner cluster creation
43e73bdef packaging: make BUILDER_REGISTRY configurable
ffe3157a4 dragonball: add arm64 patches for upcall
560442e6e dragonball: add vcpu_boot_onlined vector
e31772cfe dragonball: add support resize_vcpu on aarch64
64c764c14 dragonball: update dbs-boot to v0.4.0
fd9b41464 dragonball: update comment for init_microvm
af16d3fca gha: Unbreak CI and fix cluster creation step
5ddc4f94c runtime-rs/kata-ctl: Enhancement of DirectVolumeMount.
25d2fb0fd agent: fix the issue of exec hang with a backgroud process
4af4ced1a gha: Create Mariner host as part of k8s tests
eee7aae71 runtime-rs/sandbox_bindmounts: add support for sandbox bindmounts
557b84081 gha: aks: Wait longer to start running the tests
c04c872c4 gha: aks: Increase the timeout time
428041624 kata-deploy: Improve shim backup / restore
14c3f1e9f kata-deploy: Fix indentation on kata deploy merge script
0e47cfc4c runtime: sending SIGKILL to qemu
6a0035e41 doc: Update git commands
433b5add4 kubernetes: add agnhost command in pod yaml
c477ac551 dragonball: Convert VirtioNetDeviceMgr function to method
4659facb7 dragonball: Convert BlockDeviceMgr function to method
ee6deef09 dragonball: Remove virtio-net and vsock devices gracefully
2bda92fac netlink: Fix the issue of update_interface
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The links for pods and kubelets no longer work so update to new links
with relevant info.
Fixes#7487
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Multiple instances of task service may get registered by
ServiceManager::run(), fix it by making operation symmetric.
Fixes: #7479
Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
This PR rounds the axelnet and resnet results in order to extract
properly the result.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR will avoid to have the strconv.atoi parsing error when we
are retrieving the results from the json.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR moves the checkmetrics to gha-run script to gathered
tensorflow information.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The previous kata-monitor in golang could not communicate with runtime-rs
to gather metrics due to different sandbox addresses.
This PR adds the subcommand monitor in kata-ctl to gather metrics from
runtime-rs and monitor itself.
Fixes: #5017
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Several functions in kata-ctl need to establish a connection with runtime-rs through MgmtClient.
This PR provides a global TIMEOUT to avoid multiple definitions.
Fixes: #5017
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
1. Implemented metrics collection for runtime-rs shim and dragonball hypervisor.
2. Described the current supported metrics in runtime-rs.(docs/design/kata-metrics-in-runtime-rs.md)
Fixes: #5017
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
The amount of info we've added seemed unnecessary, and ends up making
our lives even harder when trying to find errors.
Let's just rely on the kata-debug container to collect the needed info
for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's been proven to not be useful, and ends up making things more
confusing due to the amount of logs printed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we can debug kata-deploy in case something goes wrong
during its execution.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed in order to not lose track of what's been created and
what's been added here and there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will help folks to debug / understand what's been passed to the
kata-deploy.sh script.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's allow the daemonset to create the runtimeclasses, which will
decrease one manual step a user of kata-deploy should take, and also
help us in the Confidential Containers land as the Operator can just
delegate it to this script.
Fixes: #7409
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This can be easily done as there was no official release with the
previous values.
The reason we're doing so is because when using `yq` to replace the
value, even when forcing `--tag '!!str' "yes"`, the content is placed
without quotes, causing errors in our CI.
While here, we're also removing the fallback value for DEBUG, as it is
**always** set in the kata-deploy.yaml file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will make things simpler to only create the handlers defined by the
kata-deploy user.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will become handy in the near future, as we want to have separate
enrties for each file, while still keeping this one.
Having the entries sorted will make our lives easier to test those are
always in sync.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This imports the k8s-file-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This imports the k8s-volume test from the tests repo and modifies it
slightly to set up the host volume on the AKS host.
Fixes: #6566
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This deletes node debugger pods after execution since their presence may
affect tests that assume only test workloads pods are present.
For example, in `k8s-job` we wait for *any* pod to be in the `Succeeded`
state before proceeding, which causes failures.
Fixes: #7452
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Static checks for dragonball are landing on any of the self-hosted
runners, and the reason for that is because "self-hosted" was the label
selector used.
Let's use "dragonball" instead, as the machine has that label as well.
Fixes: #7464
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This simply allows setting a custom resource group when debugging
locally, so as to prevent name collisions and not pollute the namespace.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Makes it so that `setup.sh` doesn't make changes in
`runtimeclass_workloads/` directly. Instead we treat that as a template
directory and we use the new directory `runtimeclass_workloads_work/` as
a work dir.
This has two advantages:
* Allows rerunning tests without the assumption that `setup.sh` must be
idempotent. E.g. the `set_runtime_class()` step would break.
* Doesn't pollute your git environment with a bunch of changes when
developing.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This splits deploying Kata and running the tests into separate commands
to make it possible to rerun tests locally without having to redeploy
Kata each time.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds the FIO benchmark scripts and resources for the metrics
tests section.
Fixes#7441
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
kata-deploy-binaries.sh uses the last commit in
tools/packaging/static-build/kernel for its version check, while the cache
generation uses tools/packaging/kernel. Use tools/packaging/static-build/kernel
as $kata_config_version is already part of the version string and covers any
changes to tools/packaging/kernel.
Fixes: #7403
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The SEV kernel cache calls create_cache_asset() twice, once for the kernel and
once for modules. Both calls need to use the same version string, otherwise the
second call overwrites the "latest" file of the first one and the cache is not
used.
Fixes: #7403
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This job will run on a nested virt capable Azure VM (improving test
concurrency). This is just a placeholder while we adapt the test to GHA.
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Remove unused `mut` because the agent compilation fails
when the rust compiler is >= 1.71. This is related to #7425Fixes: #7438
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Instead of hardcoding shims as part of the script, let's ensure we can
allow them to be created based on environment variables passed to the
daemonset.
This change brings no functionality change as the default values in the
daemonset are exactly what has been used as part of the scripts.
Fixes: #7407
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds general improvements like putting function before function
name and consistency in how we declare variables and so on to have
uniformity across the metrics scripts.
Fixes#7429
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
`module_dir` has been passed to the function but was never assigned to a
var, leading to errors when trying to use it.
Fixes: #7416
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We should source from `nydus_dir`, instead of `cri_containerd_dir`, and
that was a leftover from fb4f7a002c.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will triger the nydus tests, but as they currently are they'll just
return "okay" without actually executing.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This newly added GHA does nothing, is not even triggered, and it's just
a placeholder that we'll grow in the next commits / PRs, so we can
actually start running the nydus tests as part of our CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that we have propper AP device support add a
unit test for testing the correct Attach/Detach of AP devices.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Otherwise crictl will fail to remove them with:
```
getting sandbox status of pod "$pod": metadata.Name, metadata.Namespace
or metadata.Uid is not in metadata "..."
```
A huge shout out to Steven Horsman for helping to debug this one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For this set of tests, we'll always be using podman in order to avoid
having containerd pulled in by docker.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't need the env var, we just need to restrict the test according
to the KATA_HYPERVISOR used, as right now it's very specifict to QEMU.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We only have shim-v2 as the runtime type, so we always need to run tests
using it. :-)
We had to adjust the script in order to properly run the tests with the
current logic.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move the `integration/containerd/cri/integration-tests.sh` file
from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's install all the dependencies needed for running the
`cri-containerd` tests.
The list of dependencies we have are:
* From the system
- build-essential
- jq
- podman-docker
* From our own repo
- yq
- go
* From GitHub projects
- containerd
- cri-tools
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we don't want to disrupt what we have on the `tests` repo, let's
create a "latest" entry and use that for the GitHub actions tests.
Once we deprecate the `tests` repo we can decide whether we want to
stick to using "latest" or switch back to "version".
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will simply clone containerd repo, specifically on a tag
we want to use to test.
This can be expanded for different projects, and it will be the case as
soon as we grow the tests. But, for now, let's keep it simple.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will install cri-tools in the host, and soon enough (as
part of this PR) we'll be using it to install cri-tools as part of the
cri-containerd tests.
I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-tools to be installed
as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will install cri-containerd in the host, and soon enough
(as part of this PR) we'll be using it to install cri-containerd as part
of the cri-containerd tests.
I've decided to have this as part of the `common.bash` as other tests
that will be added in the future will require cri-containerd to be
installed as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will hel us to get the tarball, from a github project,
that we're going to use as part of our tests.
Right now this is not used anywhere, but it'll soon enough (as part of
this series) be used to download the cri-containerd / cri-tools / cni
tarballs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will help us to get the latest patch release from a
GitHub project.
The idea behind this function is that we don't have to keep updating
versions.yaml that frequently (or worse, have it outdated as it
currently is), and always test against the latest patch release of a
given project's version that we care about.
Although right now this is not used anywhere, this will be used with the
coming cri-containerd tests, which will be part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's adjust paths for what we source and the scripts we call, after
moving from the tests repo to this one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move `.ci/install_go.sh` file from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now we'd need to import lib.sh just in order to get cross-build
information for rust, and it seems a little bit premature to do so at
this stage and only for rust.
Let's skip it and keep this transition simple.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move `.ci/kata-arch.sh` file from the tests repo to this one.
The file has been moved as it is, it's not used, and in the following
commits we'll clean it up before actually using it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
First of all, I'm 100% aware that I'm duplicating this function here as
I've copied it from the packaging stuff, and I'm not exactly proud of
that.
However, right now it seems a little bit premature to combine that set
of scripts with this set of scripts in a single one and make them used
by both pieces of our project.
Anyways, this functions helps to get information from the
`versions.yaml` file, and it'll be used as part of the cri-containerd
tests and a few others in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is already declared as part of the `common.bash` file, so let's
just make sure we use it from there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Considering that someone may want to run the tests locally, we shouldn't
rely on having GITHUB_WORKSPACE exported, and fallback to $HOME/go if
needed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When the glob star is inside quotes, there is only one iteration of the loop
and b holds all matches at once. Move the glob out of the quotes so that we
actually iterate over matched paths.
Fixes: #6543
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The `install_kata` function was moved from the metrics' `gha-run.sh`
file to the `common.bash` in the commit 3ffd48bc16, but I didn't notice
that it brought with it a call to `install_check_metrics`, which is
totally unrelated to installing Kata Containers.
Let's remove the call so the function is a little bit less specific, and
move the call to install_check_metrics to the metrics `gha-run.sh` file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will help us to in two fronts:
* catching possible issues related to kata-deploy cleanup
* do more (like, in the future, collect logs) after the tests run
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-debug is a tool that is used as part of the Kata Containers CI to gather
information from the node, in order to help debugging issues with Kata
Containers.
As one can imagine, this can be expanded and used outside of the CI context,
and any contribution back to the script is very much welcome.
The resulting container is stored at the [Kata Containers quay.io
space](https://quay.io/repository/kata-containers/kata-debug) and can
be used as shown below:
```sh
kubectl debug $NODE_NAME -it --image=quay.io/kata-containers/kata-debug:latest
```
Fixes: #7397
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need to correctly get the full path of the versions.yaml file as part
of the merge-builds.sh script, as we do a `pushd` there and that leads
to a fail merging the artefacts as the `versions.yaml` file does not
exists in that path.
Fixes: #7405
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Any change in the script used to build the kernel should invalidate the
cache.
Fixes: #7403
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make things simpler to figure out which version of Kata
Containers has been deployed, and also which artefacts come with it.
This will help us immensely in the future, for the TEEs use case, so we
can easily know whether we can deploy a specific guest kernel for a
specific host kernel.
Fixes: #7394
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Although this file is far away from being a SBOM, it'll help folks to
easily visualise which components are part of a release, and even have
SBOMs generated from that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've not been using nor shipping this kernel for a very long time.
Regardless, we're leaving behind the logic in the kernel scripts to
build it, in case it becomes necessary in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Removing HotplugVFIOonRootBus which is obsolete with the latest PCI
topology changes, users can set cold_plug_vfio or hot_plug_vfio either
in the configuration.toml or via annotations.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The device.Bus was reset if a specific combination of
configuration parameters were not met. With the new
PCIe topology this should not happen anymore
Fixes: #7381
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
currently when fsGroup is used with direct-assign, kata agent
recursively changes ownership and permission for each file including
symlinks. However the problem with symlinks is, the permission of
the symlink itself may not be same as the underlying file. So while
doing recursive ownership and permission changes we should skip
symlinks.
Fixes: #7364
Signed-off-by: Alakesh Haloi <a_haloi@apple.com>
This PR adds the tensorflow function in gha-run script in order to
be triggered in the gha.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds function before function of the variables at the memory
inside container script in order to have uniformity across the script.
Fixes#7386
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In order to make it easier for developers to contribute to Dragonball,
we decide to migrate all dragonball-sandbox crates to Kata.
fixes: #7262
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
This PR kills the hypervisor and the kata shim in the
init_env stage prior to launch any metric test.
Additionally this PR adds info messages in the main blocks
of the blogbench test to help in debugging.
Fixes: #7366
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds C-Ray performance test in order to be part of the kata
metrics CI.
Fixes#7375
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We don't need to export KUBECONFIG there. Let's just make sure we have
the server correctly setup and avoid doing that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
CC-GPU seems to have issues with v6.1, so downgrade the kernels used for
SEV-SNP to a known-working version. It is worth mentioning that TDX is also
still on 5.19.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Now that we have a new TDX machine plugged into our CI, let's re-enable
the TDX tests.
Fixes: #7368
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Kernel v6.1.38 is the current latest LTS version, switch to it. No
patches should be necessary. Some CONFIG options have been removed:
- CONFIG_MEMCG_SWAP is covered by CONFIG_SWAP and CONFIG_MEMCG
- CONFIG_ARCH_RANDOM is unconditionally compiled in
- CONFIG_ARM64_CRYPTO is covered by CONFIG_CRYPTO and ARCH=arm64
Fixes: #6086
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This PR updates the machine learning documentation related with
Tensorflow and Pytorch benchmarks.
Fixes#7359
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the Tensorflow mobilinet documentation for the machine
learning README.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Issue #4747 and pull request #4748 fix exec hang issues where the exec
command hangs when a process's stdout is not closed. However, the PR might
cause the exec command not to work as expected, leading to CI failure. The
PR was reverted in #7042. This PR resolves the exec hang issues and has
undergone 1000 rounds of testing to verify that it would not cause any CI
failures.
Fixes: #4747
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
This PR makes kata-env is called only after some metrics have
completed his workload. This fixes a bug that occurs when
kata-env was being called before kata is already installed on the
testing platform.
Fixes: #7348
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR uses squared brackets in a jq expression to access
key values corresponding to metric results in json format.
The values are the data inputs into the checkmetrics tool.
Fixes: #7319
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This will help us to gather more information about Kata Containers in
case of failure.
Fixes: #7343
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The DEBUG env var introduced to the kata-deploy / kata-cleanup yaml file
will be responsible for:
* Setting up the CRI Engine to run with the debug log level set to debug
* The default is usually info
* Setting up Kata Containers to enable:
* debug logs
* debug console
* agent logs
This will help a lot folks trying to debug Kata Containers while using
kata-deploy, and also help us to always run with DEBUG=yes as part of
our CI.
Fixes: #7342
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fixes: #7294
When installing the kernel config adjust the name like
the vmlinuz and vmlinux files so that any added suffixes
are also reflected in the kernel config name.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In order to run kata metrics we need to check that the containerd
config file is properly set. When this is not the case, we
need to remove that file, and generate a valid one.
This PR runs rm -f in order to ignore errors in case the
file to delete does not exist.
Fixes: #7336
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the storage metrics documentation for blogbench for kata
metrics.
Fixes#7329
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Add an extra parameter in `bind_mount_unchecked` to specify
the propagation type: "shared" or "slave".
Fixes: #7017
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
This PR adds function before the function name in common.bash script
in order to have uniformity across all the script.
Fixes#7327
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Since these have been added to kata-sys-util, remove these from
kata-ctl. Change all invocations to get platform protection to make use
of kata-sys-util.
Fixes: #7144
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Remove cpu related functions which have been moved to kata-sys-util.
Change invocations in kata-ctl to make use of functions now moved to
kata-sys-util.
Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Make certain imports architecture specific as these are not used on all
architectures.
Move additional constants and functionality to cpu.rs.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Move get_single_cpu_info and get_cpu_flags into kata-sys-util.
Add new functions that get a list of flags and check if a flag
exists in that list.
Fixes#6383
Signed-off-by: Nathan Whyte <nathanwhyte35@gmail.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
As we'll be testing against the LTS and the Active versions of
containers, let's add those entries to the versions.yaml file and make
sure we export what we want to use for the tests as an env var.
The approach taken should not break the current way of getting the
containerd version.
LTS and Active versions of containerd can be found at:
https://containerd.io/releases/#support-horizonFixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure this is exported, as it'll be needed in order to install
`yq`, which will be used to get the versions of the dependencies to be
installed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we install the needed dependencies for running the
`cri-containerd` tests.
Right now this commit is basically adding a placeholder, and later on,
when we'll actually be able to test the job, we'll add the logic of
installing the needed dependencies.
The obvious dependencies we've spotted so far are:
* From the OS
* jq
* curl (already present)
* From our repo
* yq (using the install_yq script)
* From GitHub
* cri-containerd
* cri-tools
* cni plugins
We may need a few more packages, but we will only figure this out as
part of the actual work.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use bc tool to perform math operations even when variables contain
values with leading zero.
Fixes: #7317
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds double quotes to variables in the blogbench script to
have uniformity across all the tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR builds the foundation for us to start migrating the
cri-containerd tests from Jenkins to GitHub Actions.
Right now the test does nothing and should always finish successfully.
The coming PRs will actually introduce logic to the `gha-run.sh` script
where we'll be able to run the tests and make sure those pass before
having them actually merged.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Those functions were originally introduced as part of the
`metrics/gha-run.sh` file, but those will be very hand at the time we
start adding more tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TDX tests need to be temporarily disabled as the current machine
allocated for this will be off for some time, and a new machine only
will be added next week.
Fixes: #7307
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently, network endpoints are separate from the device manager
and need to be included for proper management. In order to do so,
we need to refactor the implementation of the network endpoints.
The first step is to restructure the NetworkConfig and NetworkDevice
structures.
Next, we will implement the virtio-net driver and add the Network
device to the Device Manager.
Finally, we'll unify entries with do_handle_device for each endpoint.
Fixes: #7215
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When running cargo test in container, test_mknod_dev may fail sometimes
because of "Operation not permitted". Change the device path to
"/dev/fifo-test" to avoid this case.
Fixes: #7284
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
1. Update memory end assert because address space layout differs between
x86 and arm.
2. Set guest_addr for aarch64 in test_handler_insert_region case.
Fixes: #7284
TODO: #7290
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
This PR updates memory usage script by applying the clean_env_ctr at the main
in order to avoid failures of leaving certain processes not removed.
Fixes#7302
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently a mixture of cbl-mariner and mariner is used when creating the
mariner initrd. The kata-static tarball has mariner in the name, but the
jenkins url uses cbl-mariner. This breaks cache usage.
Use mariner as the target name throughout the build, so that caching works.
Fixes: #7292
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This is a very nice suggestion from Steve Horsman, as with that we can
manually trigger the workflow anytime we need to test it, instead of
waiting for a full day for it to be retriggered via the `schedule`
event.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Passing the commit hash as the "pr-number" has shown problematic as it
would make the AKS cluster name longer than what's accepted by AKS.
One easy way to solve this is just passing "nightly" as the PR number,
as that's only used to create the cluster.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will help folks to monitor the history of the failing tests, as
we've done in Jenkins with the "Green Effort CI".
Fixes: #7279
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We should not go through the trouble of running all our tests on AKS /
Azure / baremetal machines in case a PR only changes our documentation.
Fixes: #7258
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've noticed this caused regressions with the k8s-oom tests, and then
decided to take a step back and do this in the same way it was done
before 67972ec48a.
Moreover, this step back is also more reasonable in terms of the
controlling logic.
And by doing this we can re-enable the k8s-oom.bats tests, which is done
as part of this PR.
Fixes: #7271
Depends-on: github.com/kata-containers/tests#5705
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
As we need to pass down the commit sha to the jobs that will be
triggered from the `push` event, we must be careful on what exactly
we're using there.
At first we were using ${{ github.ref }}, but this turns out to be the
**branch name**, rather than the commit hash. In order to actually get
the commit hash, Let's use ${{ github.sha }} instead.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we need to pass down the commit sha to the jobs that will be
triggered from the `schedule` event, we must be careful on what exactly
we're using there.
At first we were using ${{ github.ref }}, but this turns out to be the
**branch name**, rather than the commit hash. In order to actually get
the commit hash, Let's use ${{ github.sha }} instead, as described by
https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take the same approach of the go runtime, instead, and allocate
the maximum allowed number of vcpus instead.
Fixes: #7270
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 25d2fb0fde.
The reason we're reverting the commit is because it to check whether
it's the cause for the regression on devmapper tests.
Fixes: #7253
Depends-on: github.com/kata-containers/tests#5705
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On cc3993d860 we introduced a regression,
where we started passing inputs.commit-hash, instead of
github.event.pull_request.head.sha. However, we have been setting
commit-hash to github.event.pull_request.sha, meaning that we're mssing
a `.head.` there.
github.event.pull_request.sha is empty for the pull_request_target
event, leading the CI to pull the content from `main` instead of the
content from the PR.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The latter workflow is breaking as it doesn't recognise ${GITHUB_REF},
the former would most likely break as well, but it didn't get triggered
yet.
The error we're facing is:
```
Determining the checkout info
/usr/bin/git branch --list --remote origin/${GITHUB_REF}
/usr/bin/git tag --list ${GITHUB_REF}
Error: A branch or tag with the name '${GITHUB_REF}' could not be found
```
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Remove shadowed get_mounts(), added slog-term as a new crate,
slog can directly log to stdout and we can capture output
in the test-cases that are created in the function to be tested.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We have to do this, otherwise we cannot log into azure.
This is a regression introduced by
106e305717.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of passing "-${{ inputs.tag }}-amd64", we must only pass
"-${{ inputs.tag }}".
This is a regression introduced by
106e305717.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Using an initrd and setting KATA_INIT=yes meaning we're using the kata-agent
as the init process we need to make sure that the agent is not segfaulting
if mounts are already happened. Some workloads need to configure several
things in the initrd before the kata-agent starts which involves having
/proc or /sys already mounted.
Fixes: #6992
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
de83cd9de7 tried to solve an issue, but it
clearly seems that I'm using env wrongly, as what ended up being passed
as input was "$VAR", instead of the content of the VAR variable.
As we can simply avoid using those here, let's do it and save us a
headache.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It has to have steps declared, and we need to make it a dependency for
the nightly kata-containers-ci-on-push job.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we'll get the following error from the workflow:
```
The workflow is not valid. .github/workflows/ci-on-push.yaml (Line: 24,
Col: 20): Unrecognized named-value: 'env'. Located at position 1 within
expression: env.COMMIT_HASH .github/workflows/ci-on-push.yaml (Line: 25,
Col: 18): Unrecognized named-value: 'env'. Located at position 1 within
expression: env.PR_NUMBER
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the call to check_metrics function as KATA_HYPERVISOR
is not needed to be passed.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The idea is to mimic what's been done with Jenkins and the "Green CI"
effort, but now using our GHA and the GHA infrastructure.
Fixes: #7247
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we run our tests in a specific namespace, as in case of
any kind of issue, we will just get rid of the namespace itself, which
will take care of cleaning up any leftover from failing tests.
One important thing to mention is why we can get rid of the `namespace:
${namespace}` on the tests that are already using it, and let's do it in
parts:
* namespace: default
We can easily get rid of this as that's the default namespace where
pods are created, so it was a no-op so far.
* namespace: test-quota-ns
My understanding is that we'd need this in order to get a clean
namespace where we'd be setting a quota for. Doing this in the
namespace that's only used for tests should **not** cause any
side-effect on the tests, as we're running those in serial and there's
no other pods running on the `kata-containers-k8s-tests` namespace
Last but not least, we're not dynamically creating namespaces as the
tests are not running in parallel, **never**, not in the case of having
2 tests being ran at same time, neither in the case of having 2 jobs
being scheduled to the same machine.
Fixes: #6864
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is based on the `ci-on-push.yaml` file, and it's called from ther
The reason to split on a new file is that we can easily introduce a
`ci-nightly.yaml` file and re-use the `ci.yaml` file there as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we're not relying, on any of the called workflows, on event
specific information.
Right now, the two information we've been relying on are:
* PR number, coming from github.event.pull_request.number
* Commit hash, coming from github.event.pull_request.head.sha
As we want to, in the future, add nightly jobs, which will be triggered
by a different event (thus, having different fields populated), we
should ensure that those are not used unless it's in the "top action"
that's trigerred by the event.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use the 'function' keyword to prevent bash aliases from colliding
with other function's name.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR enables storing metrics workflow artifacts in two
separated flavours: clh and qemu.
Fixes: #7239
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds the function name before the function to have uniformity
across all the test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds blogbench and webtooling metrics checks to this repo.
The function running the test intentionally returns zero, so
the test will be enabled in another PR once the workflow is
green.
Fixes: #7069
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR usses double quotes in all the variables as well as general fixes
to the memory usage script in order to have uniformity.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Non AKS k8s tests (SEV/SNP/TDX) don't currently set KATA_HOST_OS, so provide a
default empty value for the variable so that those tests can run.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
as OSSKU value, to get rid of this warning when creating the AKS cluster:
WARNING: The osSKU "AzureLinux" should be used going forward instead of
"CBLMariner" or "Mariner". The osSKUs "CBLMariner" and "Mariner" will
eventually be deprecated.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
We only need to install in run_tests() so that the yq install is picked up by
kubernets/setup.sh as well. We also need to either use (sudo &&
INSTALL_IN_GOPATH=false) || (INSTALL_IN_GOPATH=true).
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This PR adds the checkmetrics ci worker file for cloud hypervisor in
order to check the boot times limit.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds double quotes in all variables to have uniformity across
all the gha-run.sh script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds checkmetrics installation for gha-run.sh in order to compare
results limits as part of the metrics CI.
Fixes#7198
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
cargo/rust is installed in one step, we need to write the PATH update to
GITHUBENV so that it becomes visible in the next steps.
Fixes: #7221
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This simply displays the asset name first in GH's UI, so that the
release name (always "test") is truncated rather than the asset name.
Makes things slightly easier to read.
e.g.
build-asset (cloud-hypervisor-glibc, te...
instead of
build-asset (test, cloud-hypervisor-gli...
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This allows setting `USE_CACHE=no` to test building e2e during
developmet without having to comment code blocks and so forth.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This enables building CLH with glibc and the mshv feature as required
for Mariner. At test time, it also configures Kata to use that CLH
flavor when running Mariner.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Mariner ships a bleeding-edge kernel that might be ahead of upstream, so
we use that to guarantee compatibility with the host.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
* Adds a new `rootfs-initrd-mariner` build target.
* Sets the custom initrd path via annotation in `setup.sh` at test
time.
* Adapts versions.yaml to specify a `cbl-mariner` initrd variant.
* Introduces env variable `HOST_OS` at deploy time to enable using a
custom initrd.
* Refactors the image builder so that its caller specifies the desired
guest OS.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Currently, even when using devmapper, if the VMM supports virtio-fs /
virtio-9p, that's used to share a few files between the host and the
guest.
This *needed*, as we need to share with the guest contents like secrets,
certificates, and configurations, via Kubernetes objects like configMaps
or secrets, and those are rotated and must be updated into the guest
whenever the rotation happens.
However, there are still use-cases users can live with just copying
those files into the guest at the pod creation time, and for those
there's absolutely no need to have a shared filesystem process running
with no extra obvious benefit, consuming memory and even increasing the
attack surface used by Kata Containers.
For the case mentioned above, we should allow users, making it very
clear which limitations it'll bring, to run Kata Containers with
devmapper without actually having to use a shared file system, which is
already the approach taken when using Firecracker as the VMM.
Fixes: #7207
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds memory foot print metrics to tests/metrics/density
folder.
Intentionally, each test exits w/ zero in all test cases to ensure
that tests would be green when added, and will be enabled in a
subsequent PR.
A workflow matrix was added to define hypervisor variation on
each job, in order to run them sequentially.
The launch-times test was updated to make use of the matrix
environment variables.
Fixes: #7066
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
There is nothing in them that requires them to be macros. Converting
them to functions allows for better error messages.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
There is nothing in it that requires it to be a macro. Converting it to
a function allows for better error messages.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Having a function allows for better error messages from the type checker
and it makes it clearer to callers what can happen. For example:
is_allowed!(req);
Gives no indication that it may result in an early return, and no simple
way for callers to modify the behaviour. It also makes it look like
ownership of `req` is being transferred.
On the other hand,
is_allowed(&req)?;
Indicates that `req` is being borrowed (immutably) and may fail. The
question mark indicates that the caller wants an early return on
failure.
Fixes: #7201
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
Since it is never modified, it doesn't really need a lock of any kind.
Removing the `RwLock` wrapper allows us to remove all `.read().await`
calls when accessing it.
Additionally, `AGENT_CONFIG` already has a static lifetime, so there is
no need to wrap it in a ref-counted heap allocation.
Fixes: #5409
Signed-off-by: Wedson Almeida Filho <walmeida@microsoft.com>
This PR adds the word function before the function names in order to have
uniformity across the script as some are using this and some are not.
Fixes#7196
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Vfio support introduce build error on AArch64. Remove arch related
annotation can avoid this error.
Fixes: #7187
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The failure mainly caused by the encoded volume path and
the mount/src. As the src will be validated with stat,but
it's not a full path and encoded, which causes the stat
mount source failed.
Fixes: #7186
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR adds link to the unreference docs in the cmd path to make
them more discoverable.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds checkmetrics makefile which is used to process the
metrics json results files.
Fixes#7172
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds time tests documentation reference in the general README
for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Unlike the previous usage which requires creating
/dev/xxx by mknod on the host, the new approach will
fully utilize the DirectVolume-related usage method,
and pass the spdk controller to vmm.
And a user guide about using the spdk volume when run
a kata-containers. it can be found in docs/how-to.
Fixes: #6526
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR adds the metrics documentation as a general reference in the
main README for kata containers.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the checkmetrics scripts that will be used for the kata metrics CI.
Fixes#7160
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We have GH configured so that manual approval is required for CI runs
triggered by outside contributors. However, because CI is triggered by
the `pull_request_target` event, this setting isn't being honored
(see [1]). This means that an attacker could trivially extracts secrets
by submitting a PR.
This change aims to mititgate this issue by preventing PRs from
triggering CI unless the `ok-to-test` label is set.
Note: For further context, we use the `pull_request_target` event and
manually check out the PR branch because it is the only way to both
access secrets and test incoming code changes.
Fixes: #7163
[1]: https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
When running on a VM, the kernel parameter "unrestricted_guest" for
kernel module "kvm_intel" is not required. So, return success when running
on a VM without checking value of this kernel parameter.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Implement functionality to add to the env output if the host is capable
of running a VM.
Fixes: #6727
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This PR removes an unrecognized value located in one of the yamls for the
gha in order to make it work the CI again.
Fixes#7149
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR replaces single spaces for tabs in order to fix the indentation
in the init.sh script.
Fixes#7147
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR installs kata static tarball on metrics runner
and run launch-times tests.
Fixes: #7049
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The common.sh script includes helper functions used in
our metrics tests, so we are gradually adding more
metrics used in kata.
Fixes: #7108
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This test measures the duration of a workload that starts, and then
immediately stops the contianer. Also measures the workload period,
the time to quit period, and the time to kernel period.
Fixes: #7049
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
A new choice of using vfio devic based volume for kata-containers.
With the help of kata-ctl direct-volume, users are able to add a
specified device which is BDF or IOMMU group ID.
To help users to use it smoothly, A doc about howto added in
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.
Fixes: #6525
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Limitations:
As no ready rust vmm's vfio manager is ready, it only supports
part of vfio in runtime-rs. And the left part is to call vmm
interfaces related to vfio add/remove.
So when vmm/vfio manager ready, a new PR will be pushed to
narrow the gap.
Fixes: #6525
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR fixes the format for the run launchtimes metrics yaml which
is causing to the workflow to fail.
Fixes#7130
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the json script which allow us to save the metrics results
into a json file which will be used in the kata containers metrics.
Fixes#7128
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will help to not have to build those on every CI run, and rather
take advantage of the cached image.
Fixes: #7084
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit c720869eef)
Let's add the needed infra for only building and pushing the initramfs
builder image to the Kata Containers' quay.io registry.
Fixes: #7084
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 111ad87828)
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder for the initramds.
This will save us some CI time.
Fixes: #7084
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit ebf6c83839)
The `-o` option is the legacy way to configure virtiofsd, inherited
from the C implementation. The rust implementation honours it for
compatibility but it logs deprecation warnings.
Let's use the replacement options in the go shim code. Also drop
references to `-o` from the configuration TOML file.
Fixes#7111
Signed-off-by: Greg Kurz <groug@kaod.org>
The C implementation of virtiofsd had some kind of limited support
for remote POSIX locks that was causing some workflows to fail with
kata. Commit 432f9bea6e hard coded `-o no_posix_lock` in order
to enforce guest local POSIX locks and avoid the issues.
We've switched to the rust implementation of virtiofsd since then,
but it emits a warning about `-o` being deprecated.
According to https://gitlab.com/virtio-fs/virtiofsd/-/issues/53 :
The C implementation of the daemon has limited support for
remote POSIX locks, restricted exclusively to non-blocking
operations. We tried to implement the same level of
functionality in #2, but we finally decided against it because,
in practice most applications will fail if non-blocking
operations aren't supported.
Implementing support for non-blocking isn't trivial and will
probably require extending the kernel interface before we can
even start working on the daemon side.
There is thus no justification to pass `-o no_posix_lock` anymore.
Signed-off-by: Greg Kurz <groug@kaod.org>
The rust implementation of virtiofsd always runs foreground and
spits a deprecation warning when `-f` is passed.
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR adds the test lib common script that is going to be used
for kata containers metrics.
Fixes#7113
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The run-launchtimes-metrics workflow needs to get the commit ID
for the last commit to the head branch of the PR.
Fixes: #7116
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
If we override the cold, hot plug with an annotation
we need to reset the other plugging mechanism to NoPort
otherwise both will be enabled.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Virt the vhost-user-block is an PCIe device so
we need to make sure to consider it as well. We're keeping
track of vhost-user-block devices and deduce the correct
amount of PCIe root ports.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This gh-workflow prints a simple msg, but is the base for future
PRs that will gradually add the jobs corresponding to the kata
metrics test.
Fixes: #7100
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR updates the developer guide at the connect to the debug console
section.
Fixes#7094
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Now it is possible to configure the PCIe topology via annotations
and addded a simple test, checking for Invalid and RootPort
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removed the configuration of PCIeRootPort and PCIeSwitchPort, those
values can be deduced in createPCIeTopology
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Refactor the bus assignment so that the call to GetAllVFIODevicesFromIOMMUGroup
can be used by any module without affecting the topology.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The hypervisor_state file was the wrong location for the PCIe Port
settings, moved everything under device umbrella, where it can be
consumed more easily and we do not get into circular deps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For the GPU CC use case we need to set several crypto algorithms.
The driver relies on them in the CC case.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Use now the sev.conf rather then the snp.conf.
Devices can be prestend in two different way in the
container (1) as vfio devices /dev/vfio/<num>
(2) the device is managed by whataever driver in
the VM kernel claims it.
Fixes: #6844
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This fixes the builds of `cloud-hypervisor-glibc` and
`rootfs-initrd-mariner` to properly create the `build/` directory.
Fixes: #7098
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR updates the firecracker version to 1.3.3 which includes the following
changes
Fixed passing through cache information from host in CPUID leaf 0x80000006.
A race condition that has been identified between the API thread and the VMM
thread due to a misconfiguration of the api_event_fd.
Fixes#7089
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Kubernetes and Containerd will help calculate the Sandbox Size and pass it to
Kata Containers through annotations.
In order to accommodate this favorable change and be compatible with the past,
we have implemented the handling of the number of vCPUs in runtime-rs. This is
This is slightly different from the original runtime-go design.
This doc introduce how we handle vCPU size in runtime-rs.
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
In this commit, we refactored the logic of static resource management.
We defined the sandbox size calculated from PodSandbox's annotation and
SingleContainer's spec as initial size, which will always be the sandbox
size when booting the VM.
The configuration static_sandbox_resource_mgmt controls whether we will
modify the sandbox size in the following container operation.
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Some vmms, such as dragonball, will actively help us
perform online cpu operations when doing cpu hotplug.
Under the old onlineCpuMem interface, it is difficult
to adapt to this situation.
So we modify the semantics of nb_cpus in onlineCpuMemRequest.
In the original semantics, nb_cpus represents the number of
newly added CPUs that need to be online. The modified
semantics become that the number of online CPUs in the guest
needs to be guaranteed.
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
The declaration of the cpu number in the cpuset is greater
than the actual number of vcpus, which will cause an error when
updating the cgroup in the guest.
This problem is difficult to solve, so we temporarily clean up
the cpuset in the container spec before passing in the agent.
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Updating vCPU resources and memory resources of the sandbox and
updating cgroups on the host will always happening together, and
they are all updated based on the linux resources declarations of
all the containers.
So we merge update_cgroups into the update_linux_resources, so we
can better manage the resources allocated to one pod in the host.
Fixes: #5030
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Support vcpu resizing on runtime side:
1. Calculate vcpu numbers in resource_manager using all the containers'
linux_resources in the spec.
2. Call the hypervisor(vmm) to do the vcpu resize.
3. Call the agent to online vcpus.
Fixes: #5030
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Nobody has volunteered to maintain the (currently broken) snap build, so
remove it.
Fixes: #6769.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
After we support memory resize in Dragonball, we need to update
Cargo.lock in runtime-rs.
Fixes: #6719
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Adding vhost and vhost-net to the kernel modules. These do not require
any kernel module parameters to be checked. Currently, kernel params is
a required field. Make this as optional. Could make this as <Option>,
but making this a slice instead, as a module could have multiple kernel
params. Refactor the function that checks are for kernel modules into
two with one specifically checking if the module is loaded and other
checking for module parameters.
Refactor some of the tests to take into account these changes.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This adds the glibc flavor of CLH to the list of assets as preparation
for #6839. Mariner Kata is only tested with glibc.
Fixes: #7026
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
We introduce virtio-balloon device to support memory resize.
virtio-balloon device could reclaim memory from guest to host.
Fixes: #6719
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
We introduce virtio-mem device to support memory resize. virtio-mem
device could hot-plug more memory blocks to guest and could also
hot-unplug them from guest.
Fixes: #6719
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
As block/direct volume use similar steps of device adding,
so making full use of block volume code is a better way to
handle direct volume.
the only different point is that direct volume will use
DirectVolume and get_volume_mount_info to parse mountinfo.json
from the direct volume path. That's to say, direct volume needs
the help of `kata-ctl direct-volume ...`.
Details seen at Advanced Topics:
[How to run Kata Containers with kinds of Block Volumes]
docs/how-to/how-to-run-kata-containers-with-kinds-of-Block-Volumes.md
Fixes: #5656
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In order to support virtio-mem and virtio-balloon devices, we need to
extend DeviceOpContext with VmConfigInfo and InstanceInfo.
Fixes: #6719
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
The key aspects of the DM implementation refactoring as below:
1. reduce duplicated code
Many scenarios have similar steps when adding devices. so to reduce
duplicated code, we should create a common method abstracted and use
it in various scenarios.
do_handle_device:
(1) new_device with DeviceConfig and return device_id;
(2) try_add_device with device_id and do really add device;
(3) return device info of device's info;
2. return full info of Device Trait get_device_info
replace the original type DeviceConfig with full info DeviceType.
3. refactor find_device method.
Fixes: #5656
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
After we have a guest kernel with builtin initramfs which
provide the rootfs measurement capability and Kata rootfs
image with hash device, we need set related root hash value
and measure config to the kernel params in kata configuration file.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Integrate initramfs into guest kernel as one binary,
which will be measured by the firmware together.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
The init.sh in initramfs will parse the verity scheme,
roothash, root device and setup the root device accordingly.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Generate rootfs hash data during creating the kata rootfs,
current kata image only have one partition, we add another
partition as hash device to save hash data of rootfs data blocks.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Add required kernel config for dm-crypt/dm-integrity/dm-verity
and related crypto config.
Add userspace command line tools for disk encryption support
and ext4 file system utilities.
Fixes: #6674
Signed-off-by: Arron Wang <arron.wang@intel.com>
This PR updates the link to the correspondent Developer Guide at the
enabling full containerd debug that we have for kata 2.0 documentation.
Fixes#7034
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
When the version of libc is upgraded to 0.2.145, older getrandom could not adapt
to new API, and this will make agent-ctl fail to compile.
We upgrade the version of `rand`, so the low version of getrandom will no longer
need.
Fixes: #7032
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Fixes: #5401, #6654
- Switch kata-ctl from eprintln!()/println!() to structured logging via
the logging library which uses slog.
- Adds a new create_term_logger() library call which enables printing
log messages to the terminal via a less verbose / more human readable
terminal format with colors.
- Adds --log-level argument to select the minimum log level of printed messages.
- Adds --json-logging argument to switch to logging in JSON format.
Co-authored-by: Byron Marohn <byron.marohn@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
Github Actions reads and runs workflow files from the main branch,
rather than from the PR branch. This means that PRs that modify workflow
files aren't being tested with the updated workflows coming from the PR,
but rather with the old workflows from the main branch. AFAIK, this
behavior isn't avoidable for workflow files (but is for other scripts).
This makes it very hard to reliably test workflow changes before they're
actually merged into main and leads to issues that we have to hotifx
(see #6983, #6995).
This PR aims to mitigate that by extracting the commands used in
workflows to a separate script file. The way our CI is set up, those
script files are read from the PR branch and thus changes would be
reflected in the CI checks.
Fixes: #6971
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
In hypervisors that do not support virtiofs we have to copy files in
the VM sandbox to properly setup the network (resolv.conf, hosts, and hostname).
To do that, we construct the volume as before, with the addition of an extra
variable that designates the path where the file will reside in the sandbox.
In this case, we issue a `copy_file` agent request *and* we patch the spec
to account for this change.
Fixes: #6978
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk>
When dragonball update dbs-boot crate in commit
64c764c147, the Cargo.lock in runtime-rs
should also be updated.
Fixes: #6969
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Full SHA is 40 characters, while AKS cluster name has a limit of 63. Trim the
SHA to 12 characters, which is widely considered to be unique enough and is
short enough to be used in the cluster name
Fixes: #7010
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Let's start adding the status of our jobs as part of our main page, so
folks monitoring those can easily check whether they're okay, or if
someone has to be pinged about those.
Fixes: #7008
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We added that to create the cluster name, but I forgot to add that to
the part we get the k8s config file, or to the part where we delete the
AKS cluster.
Fixes: #6999
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need to do so, otherwise we'll create two clusters for testing Cloud
Hypervisor with exactly the same name, one using Ubuntu, and one using
Mariner.
Fixes: #6999
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The string representing the architecture aarch64 and x86_64 need to be changed to arm64 and amd64 for the release.
Fixes: #6986
Signed-off-by: SinghWang <wangxin_0611@126.com>
There were recent changes for the tdx kernel in the version.yaml that are
not currently accounted for in the build-kernel.sh script.
Attempts to setup a tdx kernel to build local changes seemed to not download
the tdx kernel. Instead the mainline kernel is downloaded which has no
tdx-related changes.
The version.yaml has a new entry for tdx kernel. Use that instead for
setting up and downloading the tdx kernel.
Fixes: #6984
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
While the Mariner Kata host is in preview, we need the `aks-preview`
extension to enable the `--workload-runtime KataMshvVmIsolation` flag.
Fixes: #6994
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR is to make an environment variable `BUILDER_REGISTRY` configurable
so that those who want to use their own registry for build can set up
the registry.
Fixes: #6988
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The vcpu hotplug/hotunplug feature is implemented with upcall. This commit
add three patches to support the feature on aarch64. Patches:
> 0005: add support of upcall on aarch64
> 0006: skip activate offline cpus' MSI interrupt
> 0007: set the correct boot cpu number
Fixes: #6010
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
This commit implements the vcpu_boot_onlined vector in get_fdt_vm_info.
"boot_enabled" means whether this vcpu should be onlined at first boot.
It will be used by fdt, which write an attribute called boot_enabled,
and will be handled by guest kernel to pass the correct cpu number to
function "bringup_nonboot_cpus".
Fixes: #6010
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
This commit add support of resize_vcpu on aarch64. As kvm will check
whether vgic is initialized when calling KVM_CREATE_VCPU ioctl, all the
vcpu fds should be created before vm is booted.
To support resizing vcpu scenario, we use max_vcpu_count for
create_vcpus and setup_interrupt_controller interfaces. The
SetVmConfiguration API will ensure max_vcpu_count >= boot_vcpu_count.
Fixes: #6010
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
dbs-boot-v0.4.0 refectors the create_fdt interface. It simplifies the
parameters needed to be passed and abstracts them into three structs.
By the way, it also reserves some interfaces for future feature: numa
passthrough and cache passthrough.
Fixes: #6969
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Rewrite the comment of Vm::init_microvm method for aarch64.
Fixes cargo test warnings on aarch64.
Fixes: #6969
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Move the get_volume_mount_info to kata-types/src/mount.rs.
If so, it becomes a common method of DirectVolumeMountInfo
and reduces duplicated code.
Fixes: #6701
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When run a exec process in backgroud without tty, the
exec will hang and didn't terminated.
For example:
crictl -i <container id> sh -c 'nohup tail -f /dev/null &'
Fixes: #4747
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The current testing setup only supports running Kata on top of an Ubuntu
host. This adds Mariner to the matrix of testable hosts for k8s
tests, with Cloud Hypervisor as a VMM.
As preparation for the upcoming PR that will change only the actual test
code (rather than workflow YAMLs), this also introduces a new file
`setup.sh` that will be used to set host-specific parameters at test
run-time.
Fixes: #6961
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
sandbox_bind_mounts supports kinds of mount patterns, for example:
(1) "/path/to", default readonly mode.
(2) "/path/to:ro", same as (1).
(3) "/path/to:rw", readwrite mode.
Both support configuration and annotation:
(1)[runtime]
sandbox_bind_mounts=["/path/to", "/path/to:rw", "/mnt/to:ro"]
(2) annotation will alse be supported, restricted as below:
io.katacontainers.config.runtime.sandbox_bind_mounts
= "/path/to /path/to:rw /mnt/to:ro"
Fixes: #6597
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
We're still facing issues related to the time taken to deploy the
kata-deplot daemonset and starting to run the tests.
Ideally, we should solve this with a readiness probe, and that's the
approach we want to take in the future. However, for now, let's just
make sure those tests are not on the way of the community.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've seen tests being aborted close to the end of the run due to the
timeout. Let's increase it, avoiding to hit such cases again..
Fixes: #6964
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're currently backing up and restoring all the possible shim files,
but the default one ("containerd-shim-kata-v2").
Let's ensure this is also backed up and restored.
Fixes: #6957
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the indentation on the kata deploy merge script
that instead of single spaces uses a tap.
Fixes#6925
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
There is a race condition when virtiofsd is killed without finishing all
the clients. Because of that, when a pod is stopped, QEMU detects
virtiofsd is gone, which is legitimate.
Sending a SIGTERM first before killing could introduce some latency
during the shutdown.
Fixes#6757.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
We previously were doing:
* Create a new image on kata-deploy-ci using the commit hash of the
latest tag
* This was used to test on AKS, which is no longer needed as we test
on AKS on every PR
* Create a new image on kata-deploy using the release tag and "latest"
or "stable", by tagging the kata-deploy-ci image accordingly
As part of cfe63527c5, we broke the
workflow described above, as in the first step we would save the PKG_SHA
to be used in the second step, but that part ended up being removed.
Anyways, this back and forth is not needed anymore and we can simplify
the process by doing:
* Create a new image on kata-deploy, using:
- The tag received as ref from the event that triggered this worklow
- "latest" or "stable" tag, depending on whether it's a stable release
or not
Fixes: #6946
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For some bizarre reason, the login-action will simply fail to
authenticate to docker.io in it's specified as a registry. The way to
proceed, instead, is to *not* specify any registry as it'd be used by
default.
Fixes: #6943
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR updates the nydus version to 2.2.1. This change includes:
nydus-image: fix a underflow issue in get_compressed_size()
backport fix/feature to stable 2.2
[backport] contrib: upgrade runc to v1.1.5
service: add README for nydus-service
nydus: fix a possible panic caused by SubCmdArgs::is_present
Backports two bugfixes from master into stable/v2.2
[backport stable/v2.2] action: upgrade golangci-lint to v1.51.2
[backport] action: fix smoke test for branch pattern
Fixes#6938
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
`docker/login-action@v3` does *not* exist and `docker/login-action@v2`
should be used instead.
Fixes: #6934
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- Fix cache for OVMF and rootfs-initrd (both x86_64)
- Upgrade to Cloud Hypervisor v32.0
- osbuilder: Bump fedora image version
- local-build: Standardise what's set for the local build scripts
- gha: aks: Wait a little bit more before run the tests
- docs: Update container network model url
- gha: release: Fix s390x worklow
- cache: Fix OVMF caching
- gha: payload-after-push: Pass secrets down
- tools: Fix arch bug
22154e0a3 cache: Fix OVMF tarball name for different flavours
b7341cd96 cache: Use "initrd" as `initrd_type` to build rootfs-initrd
b8ffcd1b9 osbuilder: Bump fedora image version
636539bf0 kata-deploy: Use apt-key.gpg from k8s.io
ae24dc73c local-build: Standardise what's set for the local build scripts
35c3d7b4b runtime: clh: Re-generate the client code
cfee99c57 versions: Upgrade to Cloud Hypervisor v32.0
ad324adf1 gha: aks: Wait a little bit more before run the tests
191b6dd9d gha: release: Fix s390x worklow
cfd8f4ff7 gha: payload-after-push: Pass secrets down
75330ab3f cache: Fix OVMF caching
a89b44aab tools: Fix arch bug
11a34a72e docs: Update container network model url
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
75330ab3f9 tried to fix OVMF caching, but
didn't consider that the "vanilla" OVMF tarball name is not
"kata-static-ovmf-x86_64.tar.xz", but rather "kata-static-ovmf.tar.xz".
The fact we missed that, led to the cache builds of OVMF failing, and
the need to build the component on every single PR.
Fixes: #6917 (hopefully for good this time).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been defaulting to "", which would lead to a mismatch with the
latest version from the cache, causing a miss, and finally having to
build the rootfs-initrd as part of the tests, every single time.
Fixes: #6917
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're facing some issues to download / use the public key provided by
google for installing kubernetes as part of the kata-deploy image.
```
The following signatures couldn't be verified because the public key is
not available: NO_PUBKEY B53DC80D13EDEF05
Reading package lists... Done
W: GPG error: https://packages.cloud.google.com/apt kubernetes-xenial
InRelease: The following signatures couldn't be verified because the
public key is not available: NO_PUBKEY B53DC80D13EDEF05 E: The
repository 'https://apt.kubernetes.io kubernetes-xenial InRelease' is
not signed.
N: Updating from such a repository can't be done securely, and is
therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user
configuration details.
```
Let's work this around following the suggestion made by @dims, at:
https://github.com/kubernetes/k8s.io/pull/4837#issuecomment-1446426585
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've a discrepancy on what's set along the scripts used to build the
Kata Cotainers artefacts locally.
Some of those were missing a way to easily debug them in case of a
failure happens, but one specific one (build-and-upload-payload.sh)
could actually silently fail.
All of those have been changed as part of this commut.
Fixes: #6908
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This patch re-generates the client code for Cloud Hypervisor v32.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #6632
Signed-off-by: Bo Chen <chen.bo@intel.com>
fa832f4709 increased the timeout, which
helped a lot, mainly in the TEE machines. However, we're still seeing
some failures here and there with the AKS tests.
Let's bump it yet again and, hopefully, those errors to start the tests
will go away.
Fixes: #6905
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
GitHub is warning us that:
"""
The workflow is not valid. In .github/workflows/release.yaml (Line: 21,
Col: 11): Error from called workflow
kata-containers/kata-containers/.github/workflows/release-s390x.yaml@d2e92c9ec993f56537044950a4673e50707369b5
(Line: 14, Col: 12): Job 'kata-deploy' depends on unknown job
'create-kata-tarball'.
"""
This is happening as we need to reference
"build-kata-static-tarball-s390x" instead of "create-kata-tarball".
Fixes: #6903
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The "build-assets-${arch}" jobs need to have access to the secrets in
order to log into the container registry in the cases where
"push-to-registry", which is used to push the builder containers to
quay.io, is set to "yes".
Now that "build-assets-${arch}" pass the secrets down, we need to log
into the container registry in the "build-kata-static-tarball-${arch}"
files, in case "push-to-registry" is set to "yes".
Fixes: #6899
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
OVMF has been cached, but it's not been used from cache as the `version`
set in the cached builds has always been empty.
The reason for that is because we've been trying to look for
`externals.ovmf.ovmf.version`, while we should be actually looking for
`externals.ovmf.x86_64.version`.
Setting `x86_64` as the OVMF_FLAVOUR would cause another bug, as the
expected tarball name would then be `kata-static-x86_64.tar.xz`, instead
of `kata-static-ovmf-x86_64.tar.xz`.
With everything said, let's simplify the OVMF_FLAVOUR usage, by using it
as it's passed, and only adapting the tarball name for the TDVF case,
which is the abnormal one.
Fixes: #6897
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- runtime: Use static_sandbox_resource_mgmt=true for TEEs
- update tokio dependency
- resource-control: fix setting CPU affinities on Linux
- runtime: use enable_vcpus_pinning from toml
- gha: k8s: Make the tests more reliable
- gha: Enable SEV-SNP tests on main
- gha: tdx: Use the k3s overlay for kata-cleanup
- runtime: Port sev package to main
- gpu: Rename the last bits from `gpu` to `nvidia-gpu`
- deploy: fix shell script error
- ppc64le: switch virtiofsd from C to rust version
- osbuilder: Fix indentation in rootfs.sh
- virtcontainers/qemu_test.go: Improve coverage
- agent: Add context to errors that may occur when AgentConfig file is …
- virtcontainers/pkg/compatoci/: Improved coverage for for Kata 2.0
- kata-manager: Fix '-o' syntax and logic error
- kata-ctl: Add the option to install kata-ctl to a user specified directory
- runtime-rs: fix building instructions to use correct required Rust ve…
- Dragonball: use LinuxBootConfigurator::write_bootparams
- kata-deploy: Add http_proxy as part of the docker build
- kata-deploy: Do not ship the kata tarball
- kata-deploy: Build improvements
- deploy: Fix arch in image tag
- Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
- virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
- main | release: Fix multi-arch publishing is not supported
- cache: More fixes to nvidia-gpu kernels caching
- runtime: remove overriding ARCH value by default for ppc64le
- gha: Fix Body Line Length action flagging empty body commit messages
- gha: Fix snap creation workflow
- cache: Fix nvidia-gpu version
- cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
- packaging: Add SEV-SNP artifacts to main
- docs: Mark snap installation method as unmaintained
- packaging: Add sev artifacts to main
- kata-ctl: add generic kvm check & unit test
- Log-parser-rs
- warning_fix: fix warnings when build with cargo-1.68.0
- cross-compile: Include documentation and configuration for cross-compile
- runtime: Fix virtiofs fd leak
- gpu: cold plug VFIO devices
- pkg/signals: Improved test coverage 60% to 100%
- virtcontainers/persist: Improved test coverage 65% to 87.5%
- virtcontainers/clh_test.go: improve unit test coverage
- virtcontainers/factory: Improved test coverage
- gha: Also run k8s tests on qemu-snp
- gha: sev: fix for kata-deploy error
- gha: Also run k8s tests on qemu-sev
- Implement the "kata-ctl env" command
- runtime-rs: support keep_abnormal in toml config
- gpu: Build and Ship an GPU enabled Kernel
- kata-ctl: checks for kvm, kvm_intel modules loaded
- osbuilder: Fix D-Bus enabling in the dracut case
- snap: fix docker start fail issue
- kata-manager: Fix containerd download
- agent: Fix ut issue caused by fd double closed
- Bump ttrpc to 0.7.2 and protobuf to 3.2.0
- gpu: Add GPU enabled confguration and runtime
- gpu: Do not pass-through PCI (Host) Bridges
- cache-components: Fix caching of TDVF and QEMU for TDX
- gha: tdx: Ensure kata-deploy is removed after the tests run
- versions: Upgrade to Cloud Hypervisor v31.0
- osbuilder: Enable dbus in the dracut case
- runtime: Don't create socket file in /run/kata
- nydus_rootfs/prefetch_files: add prefetch_files for RAFS
- runtime-rs/virtio-fs: add support extra handler for cache mode.
- runtime-rs: enable nerdctl to setup cni plugin
- tdx: Add artefacts from the latest TDX tools release into main
- runtime: support non-root for clh
- gha: ci-on-push: Run k8s tests with dragonball
- rustjail: Use CPUWeight with systemd and CgroupsV2
- gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
- docs: update the rust version from version.yaml
- gha: k8s-on-aks: Set {create,delete}_aks as steps
- gha: k8s-on-aks: Fix cluster name
- gha: Also run k8s tests on AKS with dragonball
- gha: Only push images to registry after merging a PR
- gha: aks: Use D4s_v5 instance
- tools: Avoid building the kernel twice
- rustjail: Fix panic when cgroup manager fails
- runtime: add filter metrics with specific names
- gha: Use ghcr.io for the k8s CI
- GHA |Switch "kubernetes tests" from jenkins to GitHub actions
- docs: Update CNM url in networking document
- kata-ctl: add function to get platform protection.
f6e1b1152 agent: update tokio dependency
4cb83dc21 kata-ctl: update tokio dependency
df615ff25 runk: update tokio dependency
ca6892ddb runtime-rs: update tokio dependency
ca1531fe9 runtime: Use static_sandbox_resource_mgmt=true for TEEs
fa832f470 gha: k8s: Make the tests more reliable
cbb9fe8b8 config: Use standard OVMF with SEV
724437efb kata-deploy: add kata-qemu-sev runtimeclass
521dad2a4 Tests: skip CPU constraints test on SEV and SNP
72308ddb0 gha: ci-on-push: Don't skip tests for SEV
da0f92cef gha: ci-on-push: Don't skip tests for SEV-SNP
12f43bea0 gha: tdx: Use the k3s overlay for kata-cleanup
1a3f8fc1a deploy: fix shell script error
87cb98c01 osbuilder: Fix indentation in rootfs.sh
c5a59caca ppc64le: switch virtiofsd from C to rust version
bfdf0144a versions: Bump virtiofsd to 1.6.1
dd7562522 runtime: pkg/sev: Add kbs utility package for SEV pre-attestation
05de7b260 runtime: Add sev package
3a9d3c72a gpu: Rename the last bits from `gpu` to `nvidia-gpu`
4cde844f7 local-build: Fix kernel-nvidia-gpu target name
593840e07 kata-ctl: Allow INSTALL_PATH= to be specified
bdb75fb21 runtime: use enable_vcpus_pinning from toml
20cb87508 virtcontainers/qemu_test.go: Improve test coverage
b9a1db260 kata-deploy: Add http_proxy as part of the docker build
3e85bf5b1 resource-control: fix setting CPU affinities on Linux
5f3f844a1 runtime-rs: fix building instructions with respect to required Rust version
777c3dc8d kata-deploy: Do not ship the kata tarball
50cc9c582 tests: Improve coverage for virtcontainers/pkg/compatoci/ for Kata 2.0
136e2415d static-build: Download firecracker instead of building it
3bf767cfc static-build: Adjust ARCH for nydus
ac88d34e0 static-build: Use relased binary for CLH (aarch64)
73913c8eb kata-manager: Fix '-o' syntax and logic error
2856d3f23 deploy: Fix arch in image tag
e8f81ee93 Revert "kata-deploy: Use readinessProbe to ensure everything is ready"
cfe63527c release: Fix multi-arch publishing is not supported
197c33651 Dragonball: use LinuxBootConfigurator::write_bootparams to writes the boot parameters into guest memory.
4d17ea4a0 cache: Fix nvidia-snp caching version
a133fadbf cache: Fix nvidia-gpu-tdx-experimental cache URL
b9990c201 cache: Fix nvidia-gpu version
c9bf7808b cache: Update the KERNEL_FLAVOUR list to include nvidia-gpu
3665b4204 gpu: Rename `gpu` targets to `nvidia-gpu`
2c90cac75 local-build: fixup alphabetization
4da6eb588 kata-deploy: Add qemu-snp shim
14dd05375 kata-deploy: add kata-qemu-snp runtimeclass
0bb37bff7 config: Add SNP configuration
af7f2519b versions: update SEV kernel description
dbcc3b5cc local-build: fix default values for OVMF build
b8bbe6325 gha: build OVMF for tests and release
cf0ca265f local-build: Add x86_64 OVMF target
db095ddeb cache: add SNP flavor to comments
f4ee00576 gha: Build and ship QEMU for SNP
7a58a91fa docs: update SNP guide
879333bfc versions: update SNP QEMU version
38ce4a32a local-build: add support to build QEMU for SEV-SNP
5f8008b69 kata-ctl: add unit test for kvm check
a085a6d7b kata-ctl: add generic kvm check
772d4db26 gha: Build and ship SEV initrd
45fa36692 gha: Build and ship SEV OVMF
4770d3064 gha: Build and ship SEV kernel.
fb9c1fc36 runtime: Add qemu-sev config
813e4c576 runtimeClasses: add sev runtime class
af18806a8 static-build: Add caching support to sev ovmf
76ae7a3ab packaging: adding caching capability for kernel
12c5ef902 packaging: add support to build OVMF for SEV
b87820ee8 packaging: add support to build initrd for sev
e1f3b871c docs: Mark snap installation method as unmaintained
022a33de9 agent: Add context to errors when AgentConfig file is missing
b0e6a094b packaging: Add sev kernel build capability
a4c0303d8 virtcontainers: Fixed static checks for improved test coverage for fc.go
8495f830b cross-compile: Include documentation and configuration for cross-compile
13d7f39c7 gpu: Check for VFIO port assignments
6594a9329 tools: made log-parser-rs
03a8cd69c virtcontainers: Improved test coverage for fc.go from 4.6% to 18.5%
9e2b7ff17 gha: sev: fix for kata-deploy error
5c9246db1 gha: Also run k8s tests on qemu-snp
c57a44436 gha: Add the ability to test qemu-snp
406419289 env: Utilize arch specific functionality to get cpu details
fb40c71a2 env: Check for root privileges
1016bc17b config: Add api to fetch config from default config path
b908a780a kata-env: Pass cmd option for file path
b1920198b config: Workaround the way agent and hypervisor configs are fetched
f2b2621de kata-env: Implement the kata-env command.
c849bdb0a gha: Also run k8s tests on qemu-sev
6bf1fc605 virtcontainers/factory: Improved test coverage
0d49ceee0 gha: Fix snap creation workflow warnings
138ada049 gpu: Cold Plug VFIO toml setting
defb64334 runtime: remove overriding ARCH value by default for ppc64le
f7ad75cb1 gpu: Cold-plug extend the api.md
0fec2e698 gpu: Add cold-plug test
f2ebdd81c utils: Get rid of spurious print statement left behind.
9a94f1f14 make: Export VERSION and COMMIT
2f81f48da config: Add file under /opt as another location to look for the config
07f7d17db config: Make the pipe_size field optional
68f635773 config: Make function to get the default conf file public
7565b3356 kata-ctl: Implement Display trait for GuestProtection enum
94a00f934 utils: Make certain constants in utils.rs public
572b338b3 gitignore: Ignore .swp and .swo editor backup files
376884b8a cargo: Update version of clap to 4.1.13
17daeb9dd warning_fix: fix warnings when build with cargo-1.68.0
521519d74 gha: Add the ability to test qemu-sev
205909fbe runtime: Fix virtiofs fd leak
5226f15c8 gha: Fix Body Line Length action flagging empty body commit messages
0f45b0faa virtcontainers/clh_test.go: improve unit test coverage
dded731db gpu: Add OVMF setting for MMIO aperture
2a830177c gpu: Add fwcfg helper function
131f056a1 gpu: Extract VFIO Functions to drivers
c8cf7ed3b gpu: Add ColdPlug of VFIO devices with devManager
e2b5e7f73 gpu: Add Rawdevices to hypervisor
6107c32d7 gpu: Assign default value to cold-plug
377ebc2ad gpu: Add configuration option for cold-plug VFIO
c18ceae10 gpu: Add new struct PCIePort
9c38204f1 virtcontainers/persist: Improved test coverage 65% to 87.5%
1c1ee8057 pkg/signals: Improved test coverage 60% to 100%
cc8ea3232 runtime-rs: support keep_abnormal in toml config
96e8470db kata-manager: Fix containerd download
432d40744 kata-ctl: checks for kvm, kvm_intel modules loaded
b1730e4a6 gpu: Add new kernel build option to usage()
3e7b90226 osbuilder: Fix D-Bus enabling in the dracut case
53c749a9d agent: Fix ut issue caused by fd double closed
2e3f19af9 agent: fix clippy warnings caused by protobuf3
4849c56fa agent: Fix unit test issue cuased by protobuf upgrade
0a582f781 trace-forwarder: remove unused crate protobuf
73253850e kata-ctl: remove unused crate ttrpc
76d2e3054 agent-ctl: Bump ttrpc from 0.6.0 to 0.7.1
eb3d20dcc protocols: Add ut for Serde
59568c79d protocols: add support for Serde
a6b4d92c8 runtime-rs: Bump ttrpc from 0.6.0 to 0.7.1
ac7c63bc6 gpu: Add containerd shim for qemu-gpu
a0cc8a75f gpu: Add a kube runtime class
a81fff706 gpu: Adding a GPU enabled configuration
8af6fc77c agent: Bump ttrpc from 0.6.0 to 0.7.1
009b42dbf protocols: Fix unit test
392732e21 protocols: Bump ttrpc from 0.6.0 to 0.7.1
f4f958d53 gpu: Do not pass-through PCI (Host) Bridges
825e76948 gpu: Add GPU support to default kernel without any TEE
e4ee07f7d gpu: Add GPU TDX experimental kernel
a1272bcf1 gha: tdx: Fix typo overlay -> overlays
3fa0890e5 cache-components: Fix TDVF caching
80e3a2d40 cache-components: Fix TDX QEMU caching
87ea43cd4 gpu: Add configuration fragment
aca6ff728 gpu: Build and Ship an GPU enabled Kernel
dc662333d runtime: Increase the dial_timeout
eb1762e81 osbuilder: Enable dbus in the dracut case
f478b9115 clh: tdx: Update timeouts for confidential guest
3b76abb36 kata-deploy: Ensure node is ready after CRI Engine restart
5ec9ae0f0 kata-deploy: Use readinessProbe to ensure everything is ready
ea386700f kata-deploy: Update podOverhead for TDX
e31efc861 gha: tdx: Use the k3s overlay
542bb0f3f gha: tdx: Set KUBECONFIG env at the job level
d7fdf19e9 gha: tdx: Delete kata-deploy after the tests finish
da35241a9 tests: k8s: Skip k8s-cpu-ns when testing TDX
db2cac34d runtime: Don't create socket file in /run/kata
6d315719f snap: fix docker start fail issue
e4b3b0887 gpu: Add proper CONFIG_LOCALVERSION depending on TEE
69ba2098f runtime-rs: remove network entities and netns
b31f103d1 runtime-rs: enable nerdctl cni plugin
69d7a959c gha: ci-on-push: Run tests on TDX
5a0727ecb kata-deploy: Ship kata-qemu-tdx runtimeClass
98682805b config: Add configuration for QEMU TDX
3e1580019 govmm: Directly pass the firmware using -bios with TDX
3c5ffb0c8 govmm: Set "sept-ve-disable=on"
ed145365e runtime/qemu: Drop "kvm-type=tdx"
25b3cdd38 virtcontainers: Drop check for the `tdx` CPU flag
01bdacb4e virtcontainers: Also check /sys/firmwares/tdx for TDX
9feec533c cache: Add ability to cache OVMF
ce8d98251 gha: Build and ship the OVMF for TDX
39c3fab7b local-build: Add support to build OVMF for TDX
054174d3e versions: Bump OVMF for TDX
800fb49da packaging: Add get_ovmf_image_name() helper
fbf03d7ac cache: Document kernel-tdx-experimental
5d79e9696 cache: Add a space to ease the reading of the kernel flavours
6e4726e45 cache: Fix typos
fc22ed0a8 gha: Build and ship the Kernel for TDX
502844ced local-build: Add support to build Kernel for TDX
b2585eecf local-build: Avoid code duplication building the kernel
f33345c31 versions: Update Kernel TDX version
20ab2c242 versions: Move Kernel TDX to its own experimental entry
3d9ce3982 cache: Allow specifying the QEMU_FLAVOUR
33dc6c65a gha: Build and ship QEMU for TDX
eceaae30a local-build: Add support to build QEMU for TDX
f7b7c187e static-build: Improve qemu-experimental build script
3018c9ad5 versions: Update QEMU TDX version
800ee5cd8 versions: Move QEMU TDX to its own experimental entry
1315bb45f local-build: Add dragonball kernel to the `all` target
73e108136 local-build: Rename non vanilla kernel build functions
1d851b4be local-build: Cosmetic changes in build targets
49ce685eb gha: k8s-on-aks: Always delete the AKS cluster
e2a770df5 gha: ci-on-push: Run k8s tests with dragonball
d1f550bd1 docs: update the rust version from versions.yaml
f3595e48b nydus_rootfs/prefetch_files: add prefetch_files for RAFS
3bfaafbf4 fix: oci hook
c1fbaae8d rustjail: Use CPUWeight with systemd and CgroupsV2
375187e04 versions: Upgrade to Cloud Hypervisor v31.0
79f3047f0 gha: k8s-on-aks: {create,delete} AKS must be a coded-in step
2f35b4d4e gha: ci-on-push: Only run on `main` branch
e7bd2545e Revert "gha: ci-on-push: Depend on Commit Message Check"
0d96d4963 Revert "gha: ci-on-push: Adjust to using workflow_run"
c7ee45f7e Revert "gha: ci-on-push: Adapt chained jobs to workflow_run"
5d4d72064 Revert "gha: k8s-on-aks: Fix cluster name"
13d857a56 gha: k8s-on-aks: Set {create,delete}_aks as steps
dc6569dbb runtime-rs/virtio-fs: add support extra handler for cache mode.
85cc5bb53 gha: k8s-on-aks: Fix cluster name
1688e4f3f gha: aks: Use D4s_v5 instance
108d80a86 gha: Add the ability to also test Dragonball
2550d4462 gha: build-kata-static-tarball: Only push to registry after merge
e81b8b8ee local-build: build-and-upload-payload is not quay.io specific
13929fc61 gha: publish-kata-deploy-payload: Improve registry login
41026f003 gha: payload-after-push: Pass registry / repo as inputs
7855b4306 gha: ci-on-push: Adapt chained jobs to workflow_run
3a760a157 gha: ci-on-push: Adjust to using workflow_run
a159ffdba gha: ci-on-push: Depend on Commit Message Check
8086c75f6 gha: Also run k8s tests on AKS with dragonball
fe86c08a6 tools: Avoid building the kernel twice
3215860a4 gha: Set ci-on-push to run on `pull_request_target`
d17dfe4cd gha: Use ghcr.io for the k8s CI
b661e0cf3 rustjail: Add anyhow context for D-Bus connections
60c62c3b6 gha: Remove kata-deploy-test.yaml
43894e945 gha: Remove kata-deploy-push.yaml
cab9ca043 gha: Add a CI pipeline for Kata Containers
53b526b6b gha: k8s: Add snippet to run k8s tests on aks clusters
c444c24bc gha: aks: Add snippets to create / delete aks clusters
11e0099fb tests: Move k8s tests to this repo
73be4bd3f gha: Update actions for release.yaml
d38d7fbf1 gha: Remove code duplication from release.yaml
56331bd7b gha: Split payload-after-push-*.yaml
a552a1953 docs: Update CNM url in networking document
7796e6ccc rustjail: Fix minor grammatical error in function name
41fdda1d8 rustjail: Do not unwrap potential error with cgroup manager
a914283ce kata-ctl: add function to get platform protection.
0f7351556 runtime: add filter metrics with specific names
cbe6ad903 runtime: support non-root for clh
d3bb25418 utils: Add function to check vhost-vsock
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If a hypervisor debug console is enabled and sandbox_cgroup_only is set,
the hypervisor can fail to open /dev/ptmx, which prevents the sandbox
from launching.
This is caused by the absence of a device cgroup entry to allow access
to /dev/ptmx. When sandbox_cgroup_only is not set, the hypervisor
inherits the default unrestrcited device cgroup, but with it enabled it
runs into allow / deny list restrictions.
Fix by adding an allowlist entry for /dev/ptmx when debug is enabled,
sandbox_cgroup_only is true, and no /dev/ptmx is already in the list of
devices.
Fixes: #6870
Signed-off-by: Krister Johansen <kjlx@templeofstupid.com>
This PR updates the container network model url that is part of the
virtcontainers documentation.
Fixes#6889
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
When this option is enabled the runtime will attempt to determine the
appropriate sandbox size (memory, CPU) before booting the virtual
machine.
As TEEs do not support memory and CPU hotplug, this approach must be
used.
Fixes: #6818
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We like it or not, every now and then we'll have to deal with flaky
tests, and our tests using GHA are not exempt from that fact.
With this simple commit, we're trying to improve the reliability of the
tests in a few different fronts:
* Giving enough time for the script used by kata-deploy to be executed
* We've hit issues as the kata-deploy pod is considered "Ready" at the
moment it starts running, not when it finishes the needed setup. We
should also be looking on how to solve this on the kata-deploy side
but, for now, let's ensure our tests do not break with the current
kata-deploy behavior.
* Merging the "Deploy kata-deploy" and "Run tests" steps
* We've hit issues re-running tests and seeing even more failures than
the ones we're trying to debug, as a step will simply be taken as
succeeded as part of the re-run, in case it was successful executed
as part of the first run. This causes issues with the kata-deploy
deployment, as the tests would start running before even having the
node set up for running Kata Containers.
Fixes: #6865#6649
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The AmdSev firmware package should be used with
measured direct boot. If the expected hashes are not
injected into the firmware binary by the VMM, the
guest will not boot. This is required for security.
Currently the main branch does not have the extended
shim support for SEV, which tells the VMM to inject
the expected hashes.
We ship the standard OVMF package to use with SNP,
so let's switch SEV to that for now. This will need
to be changed back when shim support for SEV(-ES)
is added to main.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
In order to populate containerd config file with
support for SEV, we need to add the qemu-sev shim
to the kata-deploy script.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Currently Kata does not support memory / CPU hotplug for SEV or
SEV-SNP so we need to skip tests that rely on it.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Now that SEV artifacts are built by GHA, remove
conditional that skips tests when using qemu-sev.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Now that we have SNP artifacts in place and they are built via gha,
remove the condition that skips the tests for SNP.
Fixes: #6809
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
When updating an interface, there's maybe an existed
interface whose name would be the same with the updated
required name, thus it would update failed with interface
name existed error. Thus we should rename the existed interface
with an temporary name and swap it with the previouse interface
name last.
Fixes: #6842
Signed-off-by: fupan <fupan.lfp@antgroup.com>
As the TDX CI runs on k3s, we must ensure the cleanup, as already done
for the deploy, used the k3s overlay.
Fixes: #6857
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR replaces single spaces to tabs in order to fix the
indentation of the rootfs script.
Fixes#6848
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We have been using the C version of virtiofsd on ppc64le. Now that the issue with
rust virtiofsd have been fixed, let's switch to it.
Fixes: #4259
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
virtiofsd v1.6.1 has been released with the fixes required for running
successfully on ppc64le.
Fixes: #4259
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Supports both online and offline modes of interaction with simple-kbs
for SEV/SEV-ES confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
The sev package provides utilities for launching AMD SEV and SEV-ES
confidential guests.
Fixes: #6795
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Let's specifically name the `gpu` runtime class as `nvidia-gpu`. By
doing this we keep the door open and ease the life of the next vendor
adding GPU support for Kata Containers.
Fixes: #6553
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Update the kata-ctl install rule to allow it to be installed to a given directory
The Makefile was updated to use an INSTALL_PATH variable to track where the
kata-ctl binary should be installed. If the user doesn't specify anything,
then it uses the default path that cargo uses. Otherwise, it will install it
in the directory that the user specified. The README.md file was also updated
to show how to use the new option.
Fixes#5403
Co-authored-by: Cesar Tamayo <cesar.tamayo@intel.com>
Co-authored-by: Kevin Mora Jimenez <kevin.mora.jimenez@intel.com>
Co-authored-by: Narendra Patel <narendra.g.patel@intel.com>
Co-authored-by: Ray Karrenbauer <ray.karrenbauer@intel.com>
Co-authored-by: Srinath Duraisamy <srinath.duraisamy@intel.com>
Signed-off-by: Narendra Patel <narendra.g.patel@intel.com>
Rework TestQemuCreateVM routine to be a table driven test with
various config variations passed to it. After CreateVM a handful
of additional functions are exercised to improve code-coverage.
Also add partial coverage for StartVM routine.
Currently improving from 19.7% to 35.7%
Credit PR to Hackathon Team3
Fixes: #267
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Add http_proxy and https_proxy as part of the docker build arguments
in order to build properly when we are behind a proxy.
Fixes#6834
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
With this fix the vCPU pinning feature chooses the correct
physical cores to pin the vCPU threads on rather than always using core 0.
Fixes#6831
Signed-off-by: Peteris Rudzusiks <rye@stripe.com>
There's absolutely no reason to ship the kata-static tarball as part of
the payload image, as:
* The tarball is already part of the release process
* The payload image already has uncompressed content of the tarball
* The tarball itself is not used anywhere by the kata-deploy scripts
Fixes: #6828
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no reason for us to build firecracker instead of simply
downloading the official released tarball, as tarballs are provided for
the architectures we want to use them.
Fixes: #6770
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When building from aarch64, just use "arm64" as that's what's used in
the name of the released nydus tarballs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no need to build Cloud Hypervisor aarch64 as, for a few releases
already, Cloud Hypervisor provides an official release binary for the
architecture.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fix the syntax and logic error that is only displayed if the user runs
the script with `-o`. This option requests that "only" Kata Containers
is installed and stops containerd from being installed.
Fixes: #6822.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
`uname -m` produces `x86_64`, but container image convention
is to use `amd64`, so update this in the tag
Fixes: #6820
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This reverts commit 5ec9ae0f04, for two
main reasons:
* The readinessProbe was misintepreted by myself when working on the
original PR
* It's actually causing issues, as the pod ends up marked as not
healthy.
When release is published, kata-deploy payload and kata-static package
can support multi-arch publishing.
Fixes: #6449
Signed-off-by: SinghWang <wangxin_0611@126.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
All the kernel-foo instances, such as "kernel-sev" or "kernel-snp",
should be transformed into "kernel.foo" when looking at the
versions.yaml file.
This was already done for SEV, but missed on the SNP case.
Fixes: #6777
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We were passing "kernel-nvidia-gpu-tdx", missing the "-experimental"
part, leading to a non-valid URL.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
c9bf7808b6 introduced the logic to
properly get the version of nvidia-gpu kernels, but one important part
was dropped during the rebase into main, which is actually getting the
correct version of the kernel.
Fixing this now, and using the old issue as reference.
Fixes: #6777
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need to make sure that, when caching a `-nvidia-gpu` kernel, we still
look at the version of the base kernel used to build the nvidia-gpu
drivers, as the ${vendor}-gpu kernels are based on already existing
entries in the versions.yaml file and do not require a new entry to be
added.
Fixes: #6777
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
A few pieces of the local-build tooling are supposed to be
alphabetized. Fixup a couple minor issues that have accumulated.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Now that we have the SNP components in place, make sure that
kata-deploy knows about the qemu-snp shim so that it will be
added to containerd config.
Fixes: #6575
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Since SEV-SNP has limited hotplug support, increase
the pod overhead to account for fixed resource usage.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
SNP requires many specific configurations, so let's make
a new SNP configuration file that we can use with the
kata-qemu-snp runtime class.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
SNP and SEV will share a (guest) kernel. Update the description
in versions.yaml to mention this.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
The x86_64 package of OVMF is required for deployments
that don't use kernel hashes, which includes SEV-SNP
in the short term. We should keep this in the bundle
in the long term in case someone wants to disable
kernel hashes.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Add targets to build the "plain" x86_64 OVMF.
This will be used by anyone who is using SEV or SNP
without kernel hashes. The SNP QEMU does not yet
support kernel hashes so the OvmfPkg will be used
by default.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Since we reshuffled versions.yaml, update the guide so that
we can find the SNP QEMU info.
Once runtime support is merged we should overhaul or remove
this guide, but let's keep it for now.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Refactor SNP QEMU entry in versions.yaml to match
qemu-experimental and qemu-tdx-experimental.
Also, update the version of QEMU to what we are using
in CCv0. This is the non-UPM QEMU and it does not
have kernel hashes support.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Add Make targets and helper functions to build the QEMU
needed for SEV-SNP.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Check that kvm test fails when run as non-root and when device specified
is not /dev/kvm.
Fixes#5338
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Add kvm check using ioctl macro to create a syscall that checks the kvm
api version and if creation of a vm is successful.
Fixes#5338
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
We have code that builds initrd for SEV.
thus, adding that to the test and release process.
Fixes: #6572
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
SEV requires custom kernel arguments when building.
Thus, adding it to the test and release process.
Fixes: #6572
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
Adding config file that can be used with qemu-sev runtime class.
Since SEV has limited hotplug support, increase
the pod overhead to account for fixed resource usage.
Fixes: #6572
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
SEV requires special OVMF.
Now that we have ability to build this custom OVMF, let's optimize
it by caching so that we don't have to build it for every run.
Fixes: sev: #6572
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
The SEV initrd build requires kernel modules.
So, for SEV case, we need to cache kernel modules tarball in
addition to kernel tarball.
Fixes: #6572
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
SEV requires special OVMF to work with kernel hashes.
Thus, adding changes that builds this custom OVMF for SEV.
Fixes: #6572
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
We need special initrd for SEV. The work on SEV initrd is based on
Ubuntu. Thus, adding another entry in versions.yaml
This binary will have '-sev' suffix to distinguish it from the generic
binary.
Fixes: #6572
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
The snap package is no longer being maintained so update the docs to
warn readers.
We'll remove the snap installation docs in a few weeks.
See: #6769.
Fixes: #6793.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
When the agent config file is missing, the panic message says "no such file or
directory" but doesn't inform the user about which file was missing. Add
context to the parsing (with filename) and to the from_config_file() calls
(with information where the path is coming from).
Fixes: #6771
Depends-on: github.com/kata-containers/tests#5627
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.
Fixed very simple static check fail on line 202.
Fixes: #266
Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
`cross` is an open source tool that provides zero-setup cross compile
for rust binaries. Add documentation on this tool for compiling
kata-ctl tool and Cross.toml file that provides required configuration
for installing dependencies for various targets.
This is pretty useful for a developer to make sure code compiles and
passes checks for various architectures.
Fixes: #6765
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Eventual replacement of kata-log-parser, but for now replicates its
functionaility for the new runtime-rs syntax. Takes in log files,
parses, sorts by timestamp, spits them out in json, csv, xml, toml, and
a few others.
Fixes#5350
Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
Expanded tests on fc_test.go to cover more lines of code. Coverage went from 4.6% to 18.5%.
Fixes: #266
Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
With the changes proposed as part of this PR, a qemu-snp cluster
will be created but no tests will be performed.
GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of the using the ones coming
from the PR. No way to work around this for now.
After this commit is merged, the tests (not the yaml files for the
actions) will be altered in order for the checkout action to help in
this case.
Fixes: #6722
Signed-off-by: Ryan Savino <ryan.savino@amd.com>
Have kata-env call architecture specific function to get cpu details
instead of generic function to get cpu details that works only for
certain architectures. The functionality for cpu details has been fully
implemented for x86_64 and arm architectures, but needs to be
implemented for s390 and powerpc.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Add ability to write the environment information to a file
or stdout if file path is absent.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This is essentially a workaround for the issue:
https://github.com/kata-containers/kata-containers/issues/5954
runtime-rs chnages the Kata config format adding agent_name and
hypervisor_name which are then used as keys to fetch the agent and
hypervisor configs. This will not work for older configs.
So use the first entry in the hashmaps to fetch the configs as a
workaround while the config change issue is resolved.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Expanded tests on factory_test.go to cover more lines of code. Coverage went from 34% to 41.5% in the case of user-mode run tests,
and from 77.7% to 84% in the case of priviledge-mode run tests.
Fixes: #260
Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
Fix recurring issues of failing to install dependencies due to stale apt cache.
Uprev actions/checkout to v3 to resolve issue "Node.js 12 actions are deprecated."
Fixes: #5659
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Currently, ARCH value is being set to powerpc64le by default.
powerpc64le is only right in context of rust and any operation
which might use this variable for a different purpose would fail on ppc64le.
Fixes: #6741
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
These will be consumed by kata-ctl, so export these so that
they can be used to replace variables available to the rust binary.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Most of kata installation tools use this path for installation, so
add this to the paths to look for the configuration.toml file.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Add the serde default attribute to the field so that parsing
can continue if this field is not present.
The agent assumes a default value for this, so it is not required
by the user to provide a value here.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
With the changes proposed as part of this PR, a qemu-sev cluster will
be created but no tests will be performed.
GitHub Actions will only run the tests using the workflows that are
part of the **target** branch, instead of the using the ones coming
from the PR. No way to work around this for now.
After this commit is merged, the tests (not the yaml files for the
actions) will be altered in order for the checkout action to help in this
case.
Fixes: #6711
Signed-off-by: Ryan Savino <ryan.savino@amd.com>
The kata runtime invokes removeStaleVirtiofsShareMounts after
a container is stopped to clean up the stale virtiofs file caches.
Fixes: #6455
Signed-off-by: Feng Wang <fwang@confluent.io>
Change the Body Line Length workflow to not trigger when the commit
message contains only a message without a body. Other workflows will
flag the missing body sections, and it was confusing to have an error
message that said 'Body line too long (max 150)' when this was not
actually the case.
Fixes: #5561
Co-authored-by: Jayant Singh <jayant.singh@intel.com>
Co-authored-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Byron Marohn <byron.marohn@intel.com>
Signed-off-by: Jayant Singh <jayant.singh@intel.com>
Signed-off-by: Luke Phillips <lucas.phillips@intel.com>
Signed-off-by: Kelby Madal-Hellmuth <kelby.madal-hellmuth@intel.com>
Signed-off-by: Liz Lawrens <liz.lawrens@intel.com>
Added driver util function for easier handling of VFIO
devices outside of the VFIO module. At the sandbox level
we may need to set options depending if we have a VFIO/PCIe
device, like the fwCfg for confiential guests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Some functions may be used in other modules then only in
the VFIO module, extract them and make them available to
other layers like sandbox.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
If we have a VFIO device and cold-plug is enabled
we mark each device as ColdPlug=true and let the VFIO
module do the attaching.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
RawDevics are used to get PCIe device info early before the sandbox
is started to make better PCIe topology decisions
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For the hypervisor to distinguish between PCIe components, adding
a new enum that can be used for hot-plug and cold-plug of PCIe devices
Fixes: #6687
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Expanded tests on signals_test.go to cover more lines of code. 'go test' won't show 100% coverage (only 66.7%), because one test need to spawn a new
process (since it is testing a function that calls os.Exit(1)).
Fixes: #256
Signed-off-by: Eduardo Berrocal <eduardo.berrocal@intel.com>
This patch adds keep_abnormal in runtime config. If keep_abnormal =
true, it means that 1) if the runtime exits abnormally, the cleanup
process will be skipped, and 2) the runtime will not exit even if the
health check fails.
This option is typically used to retain abnormal information for
debugging and should NOT be enabled by default.
Fixes: #6717
Signed-off-by: mengze <mengze@linux.alibaba.com>
Signed-off-by: quanweiZhou <quanweiZhou@linux.alibaba.com>
Newer containerd releases have an additional static package published.
Because of this, download_url contains two urls causing curl to fail.
To resolve this, pick the first url from the containerd releases to
download containerd.
Fixes: #6695
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Ensure that kvm and kvm_intel modules are loaded.
Renames the get_cpu_info() function to read_file_contents()
Fixes#5332
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
- D-Bus enabling now occurs only in setup_rootfs (instead of
prepare_overlay and setup_rootfs)
- Adjust permissions of / so dbus-broker will be able to traverse FS
These changes enables kata-agent to successfully communicate with D-Bus.
Fixes#6677
Signed-off-by: Vladimir <amigo.elite@gmail.com>
Never ever try to close the same fd double times, even in a unit test.
A file descriptor is a number which will be reused, so when you close
the same number twice you may close another file descriptor in the second
time and then there will be an error 'Bad file descriptor (os error 9)'
while the wrongly closed fd is being used.
Fixes: #6679
Signed-off-by: Tim Zhang <tim@hyper.sh>
Last but not least add the continerd shim configuration
pointing to the correct configuration-<shim>.toml
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to set hotplug on pci root port and enable at least one
root port. Also set the guest-hooks-dir to the correct path
Fixes: #6675
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
On some systems a GPU is in a IOMMU group with a PCI Bridge and
PCI Host Bridge. Per default no PCI Bridge needs to be passed-through.
When scanning the IOMMU group, ignore devices with a 0x60 class ID prefix.
Fixes: #6663
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
With each release make sure we ship a GPU and TEE enabled kernel
This adds tdx-experimental kernel support
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The beauty of GHA not allowing us to easily test changes in the yaml
files as part of the PR has hit us again. :-/
The correct path for the k3s deployment is
tools/packaging/kata-deploy/kata-deploy/overlays/k3s instead of
tools/packaging/kata-deploy/kata-deploy/overlay/k3s.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TDVF caching is not working as the tarball name is incorrect. The result
expected is kata-static-tdvf.tar.xz, but it's looking for
kata-static-tdx.tar.xz.
This happens as a logic to convert tdx -> tdvf has been added as part of
the building scripts, but I missed doing this as part of the caching
scripts.
Fixes: #6669
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TDX QEMU caching is not working as expected, as we're checking for its
version looking at "assets.hypervisor.${QEMU_FLAVOUR}.version", which is
correct for standard QEMU. However, for TDX QEMU we should be checking
for "assets.hypervisor.${QEMU_FLAVOUR}.tag"
Fixes: #6668
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When testing on AKS, we've been hitting the dial_timeout every now and
then. Let's increase it to 45 seconds (instead of 30) for all the VMMs,
and to 60 seconfs in case of TEEs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The agent now offloads cgroup configuration to systemd when
possible. This requires to enable D-Bus in order to communicate
with systemd.
Fixes#6657
Signed-off-by: Greg Kurz <groug@kaod.org>
Booting up TDX takes more time than booting up a normal VM. Those
values are being already used as part of the CCv0 branch, and we're just
bringing them to the `main` branch as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure the node is ready after the CRI Engine restart, otherwise
we may proceed and scripts may simply fail if they try to deploy a pod
while the CRI Engine is not yet restarted (and, consequently, the node
is not Ready).
Related: #6649
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
readinessProbe will help us to only have the kata-deploy pod marked as
Ready when it finishes all the needed configurations in the node.
Related: #6649
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As TEEs cannot hotplug memory / CPU, we *must* consider the default
values for those as part of the podOverhead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the TDX machine is using k3s, let's make sure we're deploying
kat-deploy using the k3s overlay.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We must ensure that no kata-deploy is left behind after the tests
finish, otherwise it may interfere with the next run.
Fixes: #6647
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The socket file for shim management is created in /run/kata
and it isn't deleted after the container is stopped. After
running and stopping thousands of containers /run folder
will run out of space.
Fixes#6622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Co-authored-by: Greg Kurz <groug@kaod.org>
If conf_guest is set we need to update the CONFIG_LOCALVERSION
to match the suffix created in install_kata
-nvidia-gpu-{snp|tdx}, the linux headers will be named the very
same if build with make deb-pkg for TDX or SNP.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
1. when we use nerdctl to setup network for kata, no netns is created by
nerdctl, kata need to create netns by its own
2. after start VM, nerdctl will call cni plugin via oci hook, we need to
rescan the netns after the interfaces have been created, and hotplug
the network device into the VM
Fixes:#4693
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Now that we've added a TDX capable external runner, let's make sure we
also run the basic tests using TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we configure containerd for the kata-qemu-tdx handler
and ship the kata-qemu-tdx runtime class for kubernetes.
Fixes: #6537
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the QEMU configuration for TDX differs quite a lot from the normal
QEMU configuration, let's add a new configuration file for the QEMU TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since TDX doesn't support readonly memslot, TDVF cannot be mapped as
pflash device and it actually works as RAM. "-bios" option is chosen to
load TDVF.
OVMF is the opensource firmware that implements the TDVF support. Thus
the command line to specify and load TDVF is ``-bios OVMF.fd``
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we also check /sys/firmwares/tdx for TDX guest
protection, as the location may depend on whether TDX Seam is being used
or not.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the ability to cache OVMF, which right now we're only building
and shipping it for TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's build the OVMF with TDX support as part of our tests, and let's
ship it as part of our releases.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed targets and modifications to be able to build
OVMF for TDX as part of the local-build scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's update the OVMF for TDX version to what's the latest tested
release of the Intel TDX tools with Kata Containers.
This change requires a newer version of `nasm` than the one provided by
the container used to build the project. This change will also be
needed for SEV-SNP and was originally done by Alex Carter (thanks!).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
As we'll be using this from different places in the near future, let's
create a helper function as part of the libs.sh.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make users aware of the cache_components_main.sh that they can
also cache the kernel-tdx-experimental builds.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's build the kernel with TDX support as part of our tests, and let's
ship it as part of our releases.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed targets and modifications to be able to build
kernel-tdx-experimental as part of the local-build scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's create a `install_kernel_helper()` function, as it was already
done for QEMU, and rely on that when calling `install_kernel` and
`install_kernel_dragonball_experimental`.
This helps us to reduce the code duplication by a fair amount.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's update the Kernel TDX version to what's the latest tested release
of the Intel TDX tools with Kata Containers.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Although we've been providing users a way to build kernel with TDX
support, this must be moved to its own experimental entry instead of how
it currently is.
The reason for that is because the patches are not yet merged into
kernel, and this is still an experimental build of the project.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's do what we already did when caching the kernel, and allow passing
a FLAVOUR of the project to build.
By doing this we can re-use the same function used to cache QEMU to also
cache any kind of experimental QEMU that we may happen to have.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed targets and modifications to be able to build
qemu-tdx-experimental as part of the local-build scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure the `qemu_suffix` and `qemu_tarball_name` can be
specified. With this we make it really easy to reuse this script for
any addition flavour of an experimental QEMU that ends up having to be
built (specifically looking at the ones for Confidential Containers
here).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's update the QEMU TDX version to what's the latest tested release of
the Intel TDX tools with Kata Containers.
In order to do such update, we had to relax the checks on the QEMU
version for some of the configuration options, as those were removed
right after the window was open for the 7.1.0 development (thus the
7.0.50 check).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Although we've been providing users a way to build QEMU with TDX
support, this must be moved to its own experimental entry instead of how
it currently is.
The reason for that is because the patches are not yet merged into QEMU,
and this is still an experimental build of the project.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the dragonball kernel is shipped as part of our releases, it must be
added to the `all` target.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In order to make it easier to read, let's just rename the
install_dragonball_experimental_kernel and install_experimental_kernel
to install_kernel_dragonball_experimental and
install_kernel_experimental, respectively.
This allows us to quickly get to those functions when looking for
`install_kernel`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the infra for running dragonball tests has been enabled, let's
actually make sure to have them running on each PR.
The tests skipped are:
* `k8s-cpu-ns.bats`, as CPU resize doesn't seem to be yet properly
supported on runtime-rs
* https://github.com/kata-containers/kata-containers/issues/6621Fixes: #6605
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
A sandbox annotation used to specify prefetch_files.list
path the container image being used, and runtime will pass
it to Hypervisor to search for corresponding prefetch file:
format looks like:
"io.katacontainers.config.hypervisor.prefetch_files.list"
= /path/to/<uid>/xyz.com/fedora:36/prefetch_file.list
Fixes: #6582
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
1. when do the deserialization for the oci hook, we should use camel
case for createRuntime
2. we should pass the dir of bundle path instead of the path of
config.json
Fixes:#4693
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
I should have seen this coming, but currently the "create" and "delete"
AKS workflows cannot be imported and uses as a job's step, resulting on
an error trying to find the correspondent action.yaml file for those.
Fixes: #6630
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we're only running this workflow when PRs are opened
against the main branch.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit a159ffdba7.
Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`. The reason for
that being that we can only mark jobs as required if they are targetting
PRs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 3a760a157a.
Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`. The reason for
that being that we can only mark jobs as required if they are targetting
PRs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 7855b43062.
Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`. The reason for
that being that we can only mark jobs as required if they are targetting
PRs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 85cc5bb534.
Unfortunately we have to revert the PRs related to the switch done to
using `workflow_run` instead of `pull_request_target`. The reason for
that being that we can only mark jobs as required if they are targetting
PRs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been currently using {create,delete}_aks as jobs. However, it
means that if the tests fail we'll end up deleting the AKS cluster (as
expected), but not having a way to recreate the cluster without
re-running all jobs, which is a waste of resources.
Fixes: #6628
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add support for virtiofsd when virtio_fs_extra_args with
"-o cache auto, ..." users specified.
Fixes: #6615
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This was missed from the last series, as GHA will use the "target
branch" yaml file to start the workflow.
Basically we changed the name of the cluster created to stop relying on
the PR number, as that's not easily accessible on `workflow_run`.
Fixes: #6611
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's been pointed out that D4s_v5 instances are more powerful than the
D4s_v3 ones, and have the very same price. With this in mind, let's
switch to the newer machines.
Fixes: #6606
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With the changes proposed as part of this PR, an AKS cluster will be
created but no tests will be performed.
The reason we have to do this is because GitHub Actions will only run
the tests using the workflows that are part of the **target** branch,
instead of the using the ones coming from the PR, and we didn't find yet
a way to work this around.
Once this commit is in, we'll actually change the tests themselves (not
the yaml files for the actions), as those will be the ones we want as
the checkout action helps us on this case.
Fixes: #6583
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
56331bd7bc oversaw the fact that we
mistakenly tried to push the build containers to the registry for a PR,
rather than doing so only when the code is merged.
As the workflow is now shared between different actions, let's introduce
an input variable to specify which are the cases we actually need to
perform a push to the registry.
Fixes: #6592
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's just print "to the registry" instead of printing "to quay.io", as
the registry used is not tied to quay.io.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We made registry / repo mandatory, but we only adapted that to the amd64
job. Let's fix it now and make sure this is also passed to the arm64
and s390x jobs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we're using the `workflow_run` event, the checkout action would
pull the **current target branch** instead of the PR one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The way previously used to get the PR's commit sha can only be used with
`pull_request*` kind of events.
Let's adapt it to the `workflow_run` now that we're using it.
With this change we ended up dropping the PR number from the tarball
suffix, as that's not straightforward to get and, to be honest, not a
unique differentiator that would justify the effort.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make this workflow dependent of the commit message check, and only
start it if the commit message check one passes.
As a side effect, this allows us to run this specific workflow using
secrets, without having to rely on `pull_request_target`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As already done for Cloud Hypervisor and QEMU, let's make sure we can
run the AKS tests using dragonball.
Fixes: #6583
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Two different kernel build targets (build,install) have both instructions to
build the kernel, hence it was executed twice. Install should only do
install and build should only do build.
Fixes: #6588
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This is less secure than running the PR on `pull_request`, and will
require using an additional `ok-to-test` label to make sure someone
deliverately ran the actions coming from a forked repo.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's switch to using the `ghcr.io` registry for the k8s CI, as this
will save us some troubles on running the CI with PRs coming from forked
repos.
Fixes: #6587
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In cases where the D-Bus connection fails, add a little additional context about
the origin of the error.
Fixes: 6561
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Suggested-by: Archana Shinde <archana.m.shinde@intel.com>
Spell-checked-by: Greg Kurz <gkurz@redhat.com>
This workflow becomes redundant as we're already testing kubernetes
using kata-deploy, and also testing it on AKS.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the very first step to replacing the Jenkins CI, and I've
decided to start with an x86_64 approach only (although easily
expansible for other arches as soon as they're ready to switch), and to
start running our kubernetes tests (now running on AKS).
Fixes: #6541
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be shortly used as part of a newly created GitHub action which
will replace our Jenkins CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Those will be shortly used as part of a newly added GitHub action for
testing k8s tests on Azure.
They've been created using the secrets we already have exposed as part
of our GitHub, and they follow a similar way to authenticate to Azure /
create an AKS cluster as done in the `/test-kata-deploy` action.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The first part of simplifying things to have all our tests using GitHub
actions is moving the k8s tests to this repo, as those will be the first
vict^W targets to be migrated to GitHub actions.
Those tests have been slightly adapted, mainly related to what they load
/ import, so they are more self-contained and do not require us bringing
a lot of scripts from the tests repo here.
A few scripts were also dropped along the way, as we no longer plan to
deploy kubernetes as part of every single run, but rather assume there
will always be k8s running whenever we land to run those tests.
It's important to mention that a few tests were not added here:
* k8s-block-volume:
* k8s-file-volume:
* k8s-volume:
* k8s-ro-volume:
These tests depend on some sort of volume being created on the
kubernetes node where the test will run, and this won't fly as the
tests will run from a GitHub runner, targetting a different machine
where kubernetes will be running.
* https://github.com/kata-containers/kata-containers/issues/6566
* k8s-hugepages: This test depends a whole lot on the host where it
lands and right now we cannot assume anything about that anymore, as
the tests will run from a GitHub runner, targetting a different
machine where kubernetes will be running.
* https://github.com/kata-containers/kata-containers/issues/6567
* k8s-expose-ip: This is simply hanging when running on AKS and has to
be debugged in order to figure out the root cause of that, and then
adapted to also work on AKS.
* https://github.com/kata-containers/kata-containers/issues/6578
Till those issues are solved, we'll keep running a jenkins job with
hose tests to avoid any possible regression.
Last but not least, I've decided to **not** keep the history when
bringing those tests here, otherwise we'd end up polluting a lot the
history of this repo, without any clear benefit on doing so.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We can easily re-use the newly added build-kata-static-tarball-*.yaml as
part of the release.yaml file.
By doing this we consolidate on how we build the components accross our
actions.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's split those actions into two different ones:
* Build the kata-static tarball
* Publish the kata-deploy payload
We're doing this as, later in this series we'll start taking advantage
of both pieces.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR updates the url for the Container Network Model
in the network document.
Fixes#6563
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
There can be an error while connecting to the cgroups managager, for
example a `ENOENT` if a file is not found. Make sure that this is
reported through the proper channels instead of causing a `panic()`
that does not provide much information.
Fixes: #6561
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Reported-by: Greg Kurz <gkurz@redhat.com>
The kata monitor metrics API returns a huge size response,
if containers or sandboxs are a large number,
focus on what we need will be harder.
Fixes: #6500
Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
- nydus: upgrad to v2.2.0
- osbuilder: Add support for CBL-Mariner
- kata-deploy: Fix bash semantics error
- make only_kata work without -f
- runtime-rs: ch: Implement confidential guest handling
- qemu/arm64: disable image nvdimm once no firmware offered
- static checks workflow improvements
- A couple of kata-deploy fixes
- agent: Bring in VFIO-AP device handling again
- bugfix: set hostname in CreateSandboxRequest
- packaging / kata-deploy builds: Add the ability to cache and consume cached components
- versions: Update firecracker version
- dependency: update cgroups-rs
- Built-in Sandbox: add more unit tests for dragonball. Part 6
- runtime: add support for Hyper-V
- runtime-rs: update load_config comment
- Add support for ephemeral mounts to occupy entire sandbox's memory
- runtime-rs: fix default kernel location and add more default config paths
- Implement direct-volume commands handler for shim-mgmt
- bugfix: modify tty_win info in runtime when handling ResizePtyRequest
- bugfix: add get_ns_path API for Hypervisor
- runtime-rs: add the missing default trait
- packaging: Simplify get_last_modification()
- utils: Make kata-manager.sh runs checks
- dragonball: support pmu on aarch64
- docs: fix typo in key filename in AWS installation guide
- backport rustjail systemd cgroup fix#6331 to 3.1
- main | kata-deploy: Fix kata deploy arm64 image build error
- workflows: Yet more fixes for publishing the kata-deploy payload after every PR merged
- rustjail: fix cgroup handling in agent-init mode
- runtime/Makefile: Fix install-containerd-shim-v2 dependency
- fix wrong notes for func GetSandboxesStoragePathRust()
- fix(runtime-rs): add exited state to ensure cleanup
- runtime-rs: add oci hook support
- utils: Remove kata-manager.sh cgroups v2 check
- workflows: Fixes for the `payload-after-push` action
- Dragonball: update dependencies
- workflows: Do not install docker
- workflows: Publish kata-deploy payload after a merge
- src: Fixed typo mod.rs
- actions: Use `git-diff` to get changes in kernel dir
- agent: don't set permission of existing directory in copy_file
- runtime: use filepath.Clean() to clean the mount path
- Upgrade to Cloud Hypervisor v30.0
- feat(runtime): make static resource management consistent with 2.0
- osbuilder: Include minimal set of device nodes in ubuntu initrd
- kata-ctl/exec: add new command exec to enter guest VM.
- kernel: Add CONFIG_SEV_GUEST to SEV kernel config
- runtime-rs: Improve Cloud Hypervisor config handling
- virtiofsd: update to a valid path on ppc64le
- runtime-rs: cleanup kata host share path
- osbuilder: fix default build target in makefile
- devguide: Add link to the contribution guidelines
- kata-deploy: Ensure go binaries can run on Ubuntu 20.04
- dragonball: config_manager: preserve device when update
- Revert "workflows: Push the builder image to quay.io"
- Remove all remaining unsafe impl
- kata-deploy: Fix building the kata static firecracker arm64 package occurred an error
- shim-v2: Bump Ubuntu container image to 22.04
- packaging: Cache the container used to build the kata-deploy artefacts
- utils: always check some dependencies.
- versions: Use ubuntu as the default distro for the rootfs-image
- github-action: Replace deprecated command with environment file
- docs: Change the order of release step
- runtime-rs: remove unnecessary Send/Sync trait implement
- runtime-rs: Don't build on Power, don't break on Power.
- runtime-rs: handle sys_dir bind volume
- sandbox: set the dns for the sandbox
- packaging/shim-v2: Only change the config if the file exists
- runtime-rs: Add basic CH implementation
- release: Revert kata-deploy changes after 3.1.0-rc0 release
8b008fc743 kata-deploy: fix bash semantics error
74ec38cf02 osbuilder: Add support for CBL-Mariner
ac58588682 runtime-rs: ch: Generate Cloud Hypervisor config for confidential guests
96555186b3 runtime-rs: ch: Honour debug setting
e3c2d727ba runtime-rs: ch: clippy fix
ece5edc641 qemu/arm64: disable image nvdimm if no firmware offered
dd23f452ab utils: renamed only_kata to skip_containerd
59c81ed2bb utils: informed pre-check about only_kata
4f0887ce42 kata-deploy: fix install failing to chmod runtime-rs/bin/*
09c4828ac3 workflows: add missing artifacts on payload-after-push
fbf891fdff packaging: Adapt `get_last_modification()`
82a04dbce1 local-build: Use cached VirtioFS when possible
3b99004897 local-build: Use cached shim v2 when possible
1b8c5474da local-build: Use cached RootFS when possible
09ce4ab893 local-build: Use cached QEMU when possible
1e1c843b8b local-build: Use cached Nydus when possible
64832ab65b local-build: Use cached Kernel when possible
04fb52f6c9 local-build: Use cached Firecracker when possible
8a40f6f234 local-build: Use cached Cloud Hypervisor when possible
194d5dc8a6 tools: Add support for caching VirtioFS artefacts
a34272cf20 tools: Add support for caching shim v2 artefacts
7898db5f79 tools: Add support for caching RootFS artefacts
e90891059b tools: Add support for caching QEMU artefacts
7aed8f8c80 tools: Add support for caching Nydus artefacts
cb4cbe2958 tools: Add support for caching Kernel artefacts
762f9f4c3e tools: Add support for caching Firecracker artefacts
6b1b424fc7 tools: Add support for caching Cloud Hypervisor artefacts
08fe49f708 versions: Adjust kernel names to match kata-deploy build targets
99505c0f4f versions: Update firecracker version
f4938c0d90 bugfix: set hostname
96baa83895 agent: Bring in VFIO-AP device handling again
f666f8e2df agent: Add VFIO-AP device handling
b546eca26f runtime: Generalize VFIO devices
4c527d00c7 agent: Rename VFIO handling to VFIO PCI handling
db89c88f4f agent: Use cfg-if for s390x CCW
68a586e52c agent: Use a constant for CCW root bus path
a8b55bf874 dependency: update cgroups-rs
97cdba97ea runtime-rs: update load_config comment
974a5c22f0 runtime: add support for Hyper-V
40f4eef535 build: Use the correct kernel name
a6c67a161e runtime: add support for ephemeral mounts to occupy entire sandbox memory
844bf053b2 runtime-rs: add the missing default trait
e7bca62c32 bugfix: modify tty_win info in runtime when handling ResizePtyRequest
30e235f0a1 runtime-rs: impl volume-resize trait for sandbox
e029988bc2 bugfix: add get_ns_path API for Hypervisor
42b8867148 runtime-rs: impl volume-stats trait for sandbox
462d4a1af2 workflows: static-checks: Free disk space before running checks
e68186d9af workflows: static-checks: Set GOPATH only once
439ff9d4c4 tools/osbuilder/tests: Remove TRAVIS variable
43ce3f7588 packaging: Simplify get_last_modification()
33c5c49719 packaging: Move repo_root_dir to lib.sh
16e2c3cc55 agent: implement update_ephemeral_mounts api
3896c7a22b protocol: add updateEphemeralMounts proto
23488312f5 agent: always use cgroupfs when running as init
8546387348 agent: determine value of use_systemd_cgroup before LinuxContainer::new()
736aae47a4 rustjail: print type of cgroup manager
dbae281924 workflows: Properly set the kata-tarball architecture
76b4591e2b tools: Adjust the build-and-upload-payload.sh script
cd2aaeda2a kata-deploy: Switch to using an ubuntu image
2d43e13102 docs: fix typo in AWS installation guide
760f78137d dragonball: support pmu on aarch64
9bc7bef3d6 kata-deploy: Fix path to the Dockerfile
78ba363f8e kata-deploy: Use different images for s390x and aarch64
6267909501 kata-deploy: Allow passing BASE_IMAGE_{NAME,TAG}
3443f558a6 nydus: upgrad nydus to v2.2.0
395645e1ce runtime: hybrid-mode cause error in the latest nydusd
f8e44172f6 utils: Make kata-manager.sh runs checks
f31c79d210 workflows: static-checks: Remove TRAVIS_XXX variables
8030e469b2 fix(runtime-rs): add exited state to ensure cleanup
7d292d7fc3 workflows: Fix the path of imported workflows
e07162e79d workflows: Fix action name
dd2713521e Dragonball: update dependencies
bd1ed26c8d workflows: Publish kata-deploy payload after a merge
fea7e8816f runtime-rs: Fixed typo mod.rs
a9e2fc8678 runtime/Makefile: Fix install-containerd-shim-v2 dependency
b6880c60d3 logging: Correct the code notes
12cfad4858 runtime-rs: modify the transfer to oci::Hooks
828d467222 workflows: Do not install docker
4b8a5a1a3d utils: Remove kata-manager.sh cgroups v2 check
2c4428ee02 runtime-rs: move pre-start hooks to sandbox_start
e80c9f7b74 runtime-rs: add StartContainer hook
977f281c5c runtime-rs: add CreateContainer hook support
875f2db528 runtime-rs: add oci hook support
ecac3a9e10 docs: add design doc for Hooks
3ac6f29e95 runtime: clh: Re-generate the client code
262daaa2ef versions: Upgrade to Cloud Hypervisor v30.0
192df84588 agent: always use cgroupfs when running as init
b0691806f1 agent: determine value of use_systemd_cgroup before LinuxContainer::new()
dc86d6dac3 runtime: use filepath.Clean() to clean the mount path
c4ef5fd325 agent: don't set permission of existing directory
3483272bbd runtime-rs: ch: Enable initrd usage
fbee6c820e runtime-rs: Improve Cloud Hypervisor config handling
1bff1ca30a kernel: Add CONFIG_SEV_GUEST to SEV kernel config Adding kernel config to sev case since it is needed for SNP and SNP will use the SEV kernel. Incrementing kernel config version to reflect changes
ad8968c8d9 rustjail: print type of cgroup manager
b4a1527aa6 kata-deploy: Fix static shim-v2 build on arm64
2c4f8077fd Revert "shim-v2: Bump Ubuntu container image to 22.04"
afaccf924d Revert "workflows: Push the builder image to quay.io"
4c39c4ef9f devguide: Add link to the contribution guidelines
76e926453a osbuilder: Include minimal set of device nodes in ubuntu initrd
697ec8e578 kata-deploy: Fix kata static firecracker arm64 package build error
ced3c99895 dragonball: config_manager: preserve device when update
da8a6417aa runtime-rs: remove all remaining unsafe impl
0301194851 dragonball: use crossbeam_channel in VmmService instead of mpsc::channel
9d78bf9086 shim-v2: Bump Ubuntu container image to 22.04
3cfce5a709 utils: improved unsupported distro message.
919d19f415 feat(runtime): make static resource management consistent with 2.0
b835c40bbd workflows: Push the builder image to quay.io
781ed2986a packaging: Allow passing a container builder to the scripts
45668fae15 packaging: Use existing image to build td-shim
e8c6bfbdeb packaging: Use existing image to build td-shim
3fa24f7acc packaging: Add infra to push the OVMF builder image
f076fa4c77 packaging: Use existing image to build OVMF
c7f515172d packaging: Add infra to push the QEMU builder image
fb7b86b8e0 packaging: Use existing image to build QEMU
d0181bb262 packaging: Add infra to push the virtiofsd builder image
7c93428a18 packaging: Use existing image to build virtiofsd
8c227e2471 virtiofsd: Pass the expected toolchain to the build container
7ee00d8e57 packaging: Add infra to push the shim-v2 builder image
24767d82aa packaging: Use existing image to build the shim-v2
e84af6a620 virtiofsd: update to a valid path on ppc64le
6c3c771a52 packaging: Add infra to push the kernel builder image
b9b23112bf packaging: Use existing image to build the kernel
869827d77f packaging: Add push_to_registry()
e69a6f5749 packaging: Add get_last_modification()
6c05e5c67a packaging: Add and export BUILDER_REGISTRY
1047840cf8 utils: always check some dependencies.
95e3364493 runtime-rs: remove unnecessary Send/Sync trait implement
a96ba99239 actions: Use `git-diff` to get changes in kernel dir
619ef54452 docs: Change the order of release step
a161d11920 versions: Use ubuntu as the default distro for the rootfs-image
be40683bc5 runtime-rs: Add a generic powerpc64le-options.mk
47c058599a packaging/shim-v2: Install the target depending on the arch/libc
b582c0db86 kata-ctl/exec: add new command exec to enter guest VM.
07802a19dc runtime-rs: handle sys_dir bind volume
04e930073c sandbox: set the dns for the sandbox
32ebe1895b agent: fix the issue of creating the dns file
44aaec9020 github-action: Replace deprecated command with environment file
a68c5004f8 packaging/shim-v2: Only change the config if the file exists
ee76b398b3 release: Revert kata-deploy changes after 3.1.0-rc0 release
bbc733d6c8 docs: runtime-rs: Add CH status details
37b594c0d2 runtime-rs: Add basic CH implementation
545151829d kata-types: Add Cloud Hypervisor (CH) definitions
2dd2421ad0 runtime-rs: cleanup kata host share path
0a21ad78b1 osbuilder: fix default build target in makefile
9a01d4e446 dragonball: add more unit test for virtio-blk device.
Signed-off-by: Greg Kurz <groug@kaod.org>
Our CI and release process are currently taking advantage of the
kata-deploy local build scripts to build the artefacts.
Having snap doing the same is the next logical step, and it will also
help to reduce, by a lot, the CI time as we only build the components
that a PR is touching (otherwise we just pull the cached component).
Fixes: #6514
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add osbuilder support to build a rootfs and image
based on the CBL-Mariner Linux distro
Fixes: #6462
Signed-off-by: Dallas Delaney <dadelan@microsoft.com>
This change provides a preliminary implementation for the Cloud Hypervisor (CH) feature ([currently
disabled](https://github.com/kata-containers/kata-containers/pull/6201))
to allow it to generate the CH configuration for handling confidential guests.
This change also introduces concrete errors using the `thiserror` crate
(see `src/runtime-rs/crates/hypervisor/ch-config/src/errors.rs`) and a
lot of unit tests for the conversion code that generates the CH
configuration from the generic Hypervisor configuration.
Fixes: #6430.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Enable Cloud Hypervisor debug based on the specified configuration
rather than hard-coding debug to be disabled.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
For now, image nvdimm on qemu/arm64 depends on UEFI/ACPI, so if there
is no firmware offered, it should be disabled.
Fixes: #6468
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
passed the only_kata variable through to pre_check, only_kata does not
abort the install when containerd is already installed.
fixes#6385
Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
The kata-deploy install method tried to `chmod +x /opt/kata/runtime-rs/bin/*` but it isn't
always true that /opt/kata/runtime-rs/bin/ exists. For example, the
s390x payload does not build the kernel-dragonball-experimental
artifacts. So let's ensure the dir exist before issuing the command.
Fixes#6494
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The kata-deploy-ci payloads for amd64 and arm64 were missing the shim-v2
and kernel-dragonball-experimental artifacts.
Fixes#6493
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The function is returning "" when called from the script used to cache
the artefacts and one difference noted between this version and the
already working one from the CCv0 is that we make sure to `pushd
${repo_root_dir}` in the CCv0 version.
Let's give it a try here and see if it solves the issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add support for caching VirtioFS artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching shim v2 artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching RootFS artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching QEMU artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching Nydus artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching Kernel artefacts that are generated using
the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching Firecracker artefacts that are generated
using the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add support for caching Cloud Hypervisor artefacts that are
generated using the kata-deploy local-build scripts.
Right now those are not used, but we'll switch to using them very soon
as part of upcoming changes of how we build the components we test in
our CI.
Fixes: #6480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's adjust the kernel names in versions.yaml so those can match the
names used as part of the kata-deploy local build scripts.
Right now this doesn't bring any benefit nor drawback, but it'll make
our life easier later on in this same series.
Depends-on: github.com/kata-containers/tests#5534
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR updates the firecracker version being used in kata containers
versions.yaml
The changes in version 1.3.1 are
Added
Introduced T2CL (Intel) and T2A (AMD) CPU templates to provide
instruction set feature parity between Intel and AMD CPUs when using
these templates.
Added Graviton3 support (c7g instance type).
Changed
Improved error message when invalid network backend provided.
Improved TCP throughput by between 5% and 15% (depending on CPU) by using
scatter-gather I/O in the net device's TX path.
Upgraded Rust toolchain from 1.64.0 to 1.66.0.
Made seccompiler output bit-reproducible.
Fixed
Fixed feature flags in T2 CPU template on Intel Ice Lake.
Fixes#6482
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR is a continuing work for (kata-containers#3679).
This generalizes the previous VFIO device handling which only
focuses on PCI to include AP (IBM Z specific).
Fixes: kata-containers#3678
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Initial VFIO-AP support (#578) was simple, but somewhat hacky; a
different code path would be chosen for performing the hotplug, and
agent-side device handling was bound to knowing the assigned queue
numbers (APQNs) through some other means; plus the code for awaiting
them was written for the Go agent and never released. This code also
artificially increased the hotplug timeout to wait for the (relatively
expensive, thus limited to 5 seconds at the quickest) AP rescan, which
is impractical for e.g. common k8s timeouts.
Since then, the general handling logic was improved (#1190), but it
assumed PCI in several places.
In the runtime, introduce and parse AP devices. Annotate them as such
when passing to the agent, and include information about the associated
APQNs.
The agent awaits the passed APQNs through uevents and triggers a
rescan directly.
Fixes: #3678
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Generalize VFIO devices to allow for adding AP in the next patch.
The logic for VFIOPciDeviceMediatedType() has been changed and IsAPVFIOMediatedDevice() has been removed.
The rationale for the revomal is:
- VFIODeviceMediatedType is divided into 2 subtypes for AP and PCI
- Logic of checking a subtype of mediated device is included in GetVFIODeviceType()
- VFIOPciDeviceMediatedType() can simply fulfill the device addition based
on a type categorized by GetVFIODeviceType()
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
e.g., split_vfio_option is PCI-specific and should instead be named
split_vfio_pci_option. This mutually affects the runtime, most notably
how the labels are named for the agent.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Since shimv2 create task option is already implemented, we need to update the
corresponding comments.
Also, the ordering is also updated to fit with the code.
fixes: #3961
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
This adds /dev/mshv to the list of sandbox devices so that VMMs can
create Hyper-V VMs.
In our testing, this also doesn't error out in case /dev/mshv isn't
present.
Fixes#6454.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
When calling `MAKE_KERNEL_NAME` we're considering the default kernel
name will be `vmlinux.container` or `vmlinuz.container`, which is not
the case as the runtime-rs, when used with dragonball, relies on the
`vmlinu[zx]-dragonball-experimental.container` kernel.
Other hypervisors will have to introduce a similar
`MAKE_KERNEL_NAME_${HYPERVISOR}` to adapt this to the kernel they want
to use, similarly to what's already done for the go runtime.
By doing this we also ensure that no changes in the configuration file
will be required to run runtime-rs, with dragonball, as part of our CI
or as part of kata-deploy.
Fixes: #6290
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On hotplug of memory as containers are started, remount all ephemeral mounts with size option set to the total sandbox memory
Fixes: #6417
Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>
Some structs in the runtime-rs don't implement Default trait.
This commit adds the missing Default.
Fixes: #5463
Signed-off-by: Li Hongyu <lihongyu1999@bupt.edu.cn>
Currently, we only create the new exec process in runtime, this will cause error
when the following requests needing to be handled:
- Task: exec process
- Task: resize process pty
- ...
The agent do not do_exec_process when we handle ExecProcess, thus we can not find
any process information in the guest when we handle ResizeProcessPty. This will
report an error.
In this commit, the handling process is modified to the:
* Modify process tty_win information in runtime
* If the exec process is not running, we just return. And the truly pty_resize will
happen when start_process
Fixes: #6248
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Implements resize-volume handlers in shim-mgmt,
trait for sandbox and add RPC calls to agent.
Note the actual rpc handler for the resize request is currently not
implemented, refer to issue #3694.
Fixes#5369
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
For external hypervisors(qemu, cloud-hypervisor, ...), the ns they launch vm in
is different from internal hypervisor(dragonball). And when we doing CreateContainer
hook, we will rely on the netns path. So we add a get_ns_path API.
Fixes: #6442
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
Implements get-volume-stats trait for sandbox,
handler for shim-mgmt and add RPC calls to
agent. Also added type conversions in trans.rs
Fixes#5369
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
We've been seeing the 'sudo make test' job occasionally run out of space in
/tmp, which is part of the root filesystem. Removing dotnet and
`AGENT_TOOLSDIRECTORY` frees around 10GB of space and in my tests the job still
has 13GB of space left after running.
Fixes: #6401
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
{{ runner.workspace }}/kata-containers and {{ github.workspace }} resolve to
the same value, but they're being used multiple times in the workflow. Remove
multiple definitions and define the GOPATH var at job level once.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
The last remaining user of the TRAVIS variable in this repo is
tools/osbuilder/tests and it is only used to skip spinning up VMs. Travis
didn't support virtualization and the same is true for github actions hosted
runners. Replace the variable with KVM_MISSING and determine availability of
/dev/kvm at runtime.
TRAVIS is also used by '.ci/setup.sh' in kata-containers/tests to reduce the
set of dependencies that gets installed, but this is also in the process of
being removed.
Fixes: #3544
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
There's no need to pass repo_root_dir to get_last_modification() as the
variable used everywhere is exported from that very same file.
Fixes: #6431
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is used in several parts of the code, and can have a single
declaration as part of the `lib.sh` file, which is already imported by
all the places where it's used.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- implement update_ephemeral_mounts rpc
- for each mountpoint passed in, remount it with new options
Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>
- adds a new rpc call to the agent service named `updateEphemeralMounts`
- this call takes a list of grpc.Storage objects
Signed-off-by: Sidhartha Mani <sidhartha_mani@apple.com>
The logic to decide which cgroup driver is used is currently based on the
cgroup path that the host provides. This requires host and guest to use the
same cgroup driver. If the guest uses kata-agent as init, then systemd can't be
used as the cgroup driver. If the host requests a systemd cgroup, this
currently results in a rustjail panic:
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: I/O error: No such file or directory (os error 2)
Caused by:
No such file or directory (os error 2)', rustjail/src/cgroups/systemd/manager.rs:44:51
stack backtrace:
0: 0x7ff0fe77a793 - std::backtrace_rs::backtrace::libunwind::trace::h8c197fa9a679d134
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x7ff0fe77a793 - std::backtrace_rs::backtrace::trace_unsynchronized::h9ee19d58b6d5934a
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x7ff0fe77a793 - std::sys_common::backtrace::_print_fmt::h4badc450600fc417
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:65:5
3: 0x7ff0fe77a793 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::had334ddb529a2169
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:44:22
4: 0x7ff0fdce815e - core::fmt::write::h1aa7694f03e44db2
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/fmt/mod.rs:1209:17
5: 0x7ff0fe74e0c4 - std::io::Write::write_fmt::h61b2bdc565be41b5
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/io/mod.rs:1682:15
6: 0x7ff0fe77cd3f - std::sys_common::backtrace::_print::h4ec69798b72ff254
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:47:5
7: 0x7ff0fe77cd3f - std::sys_common::backtrace::print::h0e6c02048dec3c77
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:34:9
8: 0x7ff0fe77c93f - std::panicking::default_hook::{{closure}}::hcdb7e705dc37ea6e
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:267:22
9: 0x7ff0fe77d9b8 - std::panicking::default_hook::he03a933a0f01790f
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:286:9
10: 0x7ff0fe77d9b8 - std::panicking::rust_panic_with_hook::he26b680bfd953008
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:688:13
11: 0x7ff0fe77d482 - std::panicking::begin_panic_handler::{{closure}}::h559120d2dd1c6180
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:579:13
12: 0x7ff0fe77d3ec - std::sys_common::backtrace::__rust_end_short_backtrace::h36db621fc93b005a
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:137:18
13: 0x7ff0fe77d3c1 - rust_begin_unwind
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
14: 0x7ff0fda52ee2 - core::panicking::panic_fmt::he7679b415d25c5f4
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
15: 0x7ff0fda53182 - core::result::unwrap_failed::hb71caff146724b6b
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
16: 0x7ff0fe5bd738 - <rustjail::cgroups::systemd::manager::Manager as rustjail::cgroups::Manager>::apply::hd46958d9d807d2ca
17: 0x7ff0fe606d80 - <rustjail::container::LinuxContainer as rustjail::container::BaseContainer>::start::{{closure}}::h1de806d91fcb878f
18: 0x7ff0fe604a76 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h1749c148adcc235f
19: 0x7ff0fdc0c992 - kata_agent::rpc::AgentService::do_create_container::{{closure}}::{{closure}}::hc1b87a15dfdf2f64
20: 0x7ff0fdb80ae4 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h846a8c9e4fb67707
21: 0x7ff0fe3bb816 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h53de16ff66ed3972
22: 0x7ff0fdb519cb - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h1cbece980286c0f4
23: 0x7ff0fdf4019c - <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll::hc8e72d155feb8d1f
24: 0x7ff0fdfa5fd8 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h0a407ffe2559449a
25: 0x7ff0fdf033a1 - tokio::runtime::task::raw::poll::h1045d9f1db9742de
26: 0x7ff0fe7a8ce2 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h4924ae3464af7fbd
27: 0x7ff0fe7afb85 - tokio::runtime::task::raw::poll::h5c843be39646b833
28: 0x7ff0fe7a05ee - std::sys_common::backtrace::__rust_begin_short_backtrace::ha7777c55b98a9bd1
29: 0x7ff0fe7a9bdb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h27ec83c953360cdd
30: 0x7ff0fe7801d5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hed812350c5aef7a8
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
31: 0x7ff0fe7801d5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc7df8e435a658960
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
32: 0x7ff0fe7801d5 - std::sys::unix::thread::Thread::new::thread_start::h575491a8a17dbb33
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys/unix/thread.rs:108:17
Forward the value of "init_mode" to AgentService, so that we can force cgroupfs
when systemd is unavailable.
Fixes: #5779
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Right now LinuxContainer::new() gets passed a CreateOpts struct, but then
modifies the use_systemd_cgroup field inside that struct. Pull the cgroups path
parsing logic into do_create_container, so that CreateOpts can be immutable in
LinuxContainer::new. This is just moving things around, there should be no
functional changes.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Since the cgroup manager is wrapped in a dyn now, the print in
LinuxContainer::new has been useless and just says "CgroupManager". Extend the
Debug trait for 'dyn Manager' to print the type of the cgroup manager so that
it's easier to debug issues.
Fixes: #5779
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Let's make sure the kata-tarball architecture upload / downloaded / used
is exactly the same one that we need as part of the architecture we're
using to generate the image.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that we've switched the base container image to using Ubuntu instead
of CentOS, we don't need any kind of extra logic to correctly build the
image for different architectures, as Ubuntu is a multi-arch image that
supports all the architectures we're targetting.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure we use a multi-arch image for building kata-deploy.
A few changes were also added in order to get systemd working inside the
kata-deploy image, due to the switch from CentOS to Ubuntu.
Fixes: #6358
Signed-off-by: SinghWang <wangxin_0611@126.com>
This commit adds support for pmu virtualization on aarch64. The
initialization of pmu is in the following order:
1. Receive pmu parameter(vpmu_feature) from runtime-rs to determine the
VpmuFeatureLevel.
2. Judge whether to initialize pmu devices and add pmu device node into
fdt on aarch64, according to VpmuFeatureLevel.
Fixes: #6168
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
As part of bd1ed26c8d, we've pointed to
the Dockerfile that's used in the CC branch, which is wrong.
For what we're doing on main, we should be pointing to the one under the
`kata-deploy` folder, and not the one under the non-existent
`kata-deploy-cc` one.
Fixes: #6343
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the image provided as part of registry.centos.org is not a multi-arch
one, at least not for CentOS 7, we need to expand the script used to
build the image to pass images that are known to work for s390x (ClefOS)
and aarch64 (CentOS, but coming from dockerhub).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's break the IMAGE build parameter into BASE_IMAGE_NAME and
BASE_IMAGE_TAG, as it makes it easier to replace the default CentOS
image by something else.
Spoiler alert, the default CentOS image is **not** multi-arch, and we do
want to support at least aarch64 and s390x in the near term future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When update the nydusd to 2.2, the argument "--hybrid-mode" cause
the following error:
thread 'main' panicked at 'ArgAction::SetTrue / ArgAction::SetFalse is defaulted'
Maybe we should remove it to upgrad nydusd
Fixes: #6407
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Updated the `kata-manager.sh` script to make it run all the checks on
the host system before attempting to create a container. If any checks
fail, they will indicate to the user what the problem is in a clearer
manner than those reported by the container manager.
Fixes: #6281.
Signed-off-by: tg5788re <jfokugas@gmail.com>
These variables are unused since we don't use travis CI. This also allows to
remove two steps:
- 'Setup GOPATH' only printed variables
- 'Setup travis reference' modified some shell local variables that don't have
any influence on the rest of the steps
The TRAVIS var is still used by tools/osbuilder/tests to determine if
virtualization is available.
Fixes: #3544
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Set process status to exited at end of io wait, which indicate process
exited only, but stop process has not been finished. Otherwise, the
cleanup_container will be skipped.
Fixes: #6393
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
In `payload-after-push.yaml` we ended up mentioning cc-*.yaml workflows,
which are non existent in the main branch.
Let's adapt the name to the correct ones.
Fixes: #6343
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We have a few actions in the `payload-after-push.*.yaml` that are
referring to Confidential Containers, but they should be referring to
Kata Containers instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since rust-vmm and dragonball-sandbox has introduced several updates
such as vPMU support for aarch64, we also need to update Dragonball
dependencies to include those changes.
Update:
virtio-queue to v0.6.0
kvm-ioctls to v0.12.0
dbs-upcall to v0.2.0
dbs-virtio-devices to v0.2.0
kvm-bindings to v0.6.0
Also, several aarch64 features are updated because of dependencies
changes:
1. update vcpu hotplug API.
2. update vpmu related API.
3. adjust unit test cases for aarch64 Dragonball.
fixes: #6268
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
For the architectures we know that `make kata-tarball` works as
expected, let's start publishing the kata-deploy payload after each
merge.
This will help to:
* Easily test the content of current `main` or `stable-*` branch
* Easily bisect issues
* Start providing some sort of CI/CD content pipeline for those who
need that
This is a forward-port work from the `CCv0` and groups together patches
that I've worked on, with the work that Choi did in order to support
different architectures.
Fixes: #6343
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
$ make install
make: *** No rule to make target 'containerd-shim-kata-v2', needed by 'install-containerd-shim-v2'. Stop.
Spotted when building kata-runtime with a different name for
SHIMV2_OUTPUT. For instance, trying to keep different runtime binaries
installed at the same time, one from master and another from lets say,
the CCv0 branch, with the following small change applied.
diff --git a/src/runtime/Makefile b/src/runtime/Makefile
index 95efaff78..2bab9eb75 100644
--- a/src/runtime/Makefile
+++ b/src/runtime/Makefile
@@ -231,7 +231,7 @@ SED = sed
CLI_DIR = cmd
SHIMV2 = containerd-shim-kata-v2
-SHIMV2_OUTPUT = $(bCURDIR)/$(SHIMV2)
+SHIMV2_OUTPUT = $(CURDIR)/$(SHIMV2)-ccv0
SHIMV2_DIR = $(CLI_DIR)/$(SHIMV2)
MONITOR = kata-monitor
Fixes: #6398
Signed-off-by: Eduardo Lima (Etrunko) <etrunko@redhat.com>
In this commit, we have done:
* modify the tranfer process from grpc::Hooks to oci::Hooks, so the code
can be more clean
* add more tests for create_runtime, create_container, start_container hooks
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
The latest ubuntu runners already have docker installed and trying to
install it manually will cause the following issue:
```
Run curl -fsSL https://test.docker.com/ -o test-docker.sh
Warning: the "docker" command appears to already exist on this system.
If you already have Docker installed, this script can cause trouble, which is
why we're displaying this warning and provide the opportunity to cancel the
installation.
If you installed the current Docker package using this script and are using it
again to update Docker, you can safely ignore this message.
You may press Ctrl+C now to abort this script.
+ sleep 20
+ sudo -E sh -c apt-get update -qq >/dev/null
E: The repository 'https://packages.microsoft.com/ubuntu/22.04/prod jammy Release' is no longer signed.
```
Fixes: #6390
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Removed the part in the `kata-manager.sh` script that checks if the host system only runs cgroups v2.
Fixes: #6259.
Signed-off-by: Alec Pemberton <pembek1901@gmail.com>
In some cases, network endpoints will be configured through Prestart
Hook. So network endpoints may need to be added(hotpluged) after vm
is started and also Prestart Hook is executed.
We move pre-start hook functions' execution to sandbox_start to allow
hooks running between vm_start and netns_scan easily, so that the
lifecycle API can be cleaner.
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
StartContainer will be execute in guest container namespace in Kata.
The Hook Path of this kind of hook is also in guest container namespace.
StartContainer is executed after start operation is called, and it
should be executed before user-specific command is executed.
Fixes: #5787
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
CreateContainer hook is one kind of OCI hook. In kata, it will be
executed after VM is started, before container is created, and after
CreateRuntime is executed.
The hook path of CreateContainer hook is in host runtime namespace, but
it will be executed in host vmm namespace.
Fixes: #5787
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
According to the runtime OCI Spec, there can be some hook
operations in the lifecycle of the container. In these hook
operations, the runtime can execute some commands. There are different
points in time in the container lifecycle and different hook types
can be executed.
In this commit, we are now supporting 4 types of hooks(same in
runtime-go): Prestart hook, CreateRuntime hook, Poststart hook and
Poststop hook.
Fixes: #5787
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
This patch re-generates the client code for Cloud Hypervisor v30.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #6375
Signed-off-by: Bo Chen <chen.bo@intel.com>
The logic to decide which cgroup driver is used is currently based on the
cgroup path that the host provides. This requires host and guest to use the
same cgroup driver. If the guest uses kata-agent as init, then systemd can't be
used as the cgroup driver. If the host requests a systemd cgroup, this
currently results in a rustjail panic:
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: I/O error: No such file or directory (os error 2)
Caused by:
No such file or directory (os error 2)', rustjail/src/cgroups/systemd/manager.rs:44:51
stack backtrace:
0: 0x7ff0fe77a793 - std::backtrace_rs::backtrace::libunwind::trace::h8c197fa9a679d134
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x7ff0fe77a793 - std::backtrace_rs::backtrace::trace_unsynchronized::h9ee19d58b6d5934a
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x7ff0fe77a793 - std::sys_common::backtrace::_print_fmt::h4badc450600fc417
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:65:5
3: 0x7ff0fe77a793 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::had334ddb529a2169
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:44:22
4: 0x7ff0fdce815e - core::fmt::write::h1aa7694f03e44db2
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/fmt/mod.rs:1209:17
5: 0x7ff0fe74e0c4 - std::io::Write::write_fmt::h61b2bdc565be41b5
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/io/mod.rs:1682:15
6: 0x7ff0fe77cd3f - std::sys_common::backtrace::_print::h4ec69798b72ff254
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:47:5
7: 0x7ff0fe77cd3f - std::sys_common::backtrace::print::h0e6c02048dec3c77
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:34:9
8: 0x7ff0fe77c93f - std::panicking::default_hook::{{closure}}::hcdb7e705dc37ea6e
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:267:22
9: 0x7ff0fe77d9b8 - std::panicking::default_hook::he03a933a0f01790f
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:286:9
10: 0x7ff0fe77d9b8 - std::panicking::rust_panic_with_hook::he26b680bfd953008
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:688:13
11: 0x7ff0fe77d482 - std::panicking::begin_panic_handler::{{closure}}::h559120d2dd1c6180
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:579:13
12: 0x7ff0fe77d3ec - std::sys_common::backtrace::__rust_end_short_backtrace::h36db621fc93b005a
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys_common/backtrace.rs:137:18
13: 0x7ff0fe77d3c1 - rust_begin_unwind
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
14: 0x7ff0fda52ee2 - core::panicking::panic_fmt::he7679b415d25c5f4
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
15: 0x7ff0fda53182 - core::result::unwrap_failed::hb71caff146724b6b
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
16: 0x7ff0fe5bd738 - <rustjail::cgroups::systemd::manager::Manager as rustjail::cgroups::Manager>::apply::hd46958d9d807d2ca
17: 0x7ff0fe606d80 - <rustjail::container::LinuxContainer as rustjail::container::BaseContainer>::start::{{closure}}::h1de806d91fcb878f
18: 0x7ff0fe604a76 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h1749c148adcc235f
19: 0x7ff0fdc0c992 - kata_agent::rpc::AgentService::do_create_container::{{closure}}::{{closure}}::hc1b87a15dfdf2f64
20: 0x7ff0fdb80ae4 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h846a8c9e4fb67707
21: 0x7ff0fe3bb816 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h53de16ff66ed3972
22: 0x7ff0fdb519cb - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h1cbece980286c0f4
23: 0x7ff0fdf4019c - <tokio::future::poll_fn::PollFn<F> as core::future::future::Future>::poll::hc8e72d155feb8d1f
24: 0x7ff0fdfa5fd8 - tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h0a407ffe2559449a
25: 0x7ff0fdf033a1 - tokio::runtime::task::raw::poll::h1045d9f1db9742de
26: 0x7ff0fe7a8ce2 - tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h4924ae3464af7fbd
27: 0x7ff0fe7afb85 - tokio::runtime::task::raw::poll::h5c843be39646b833
28: 0x7ff0fe7a05ee - std::sys_common::backtrace::__rust_begin_short_backtrace::ha7777c55b98a9bd1
29: 0x7ff0fe7a9bdb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h27ec83c953360cdd
30: 0x7ff0fe7801d5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hed812350c5aef7a8
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
31: 0x7ff0fe7801d5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc7df8e435a658960
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/alloc/src/boxed.rs:1987:9
32: 0x7ff0fe7801d5 - std::sys::unix::thread::Thread::new::thread_start::h575491a8a17dbb33
at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/sys/unix/thread.rs:108:17
Forward the value of "init_mode" to AgentService, so that we can force cgroupfs
when systemd is unavailable.
Fixes: #5779
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Right now LinuxContainer::new() gets passed a CreateOpts struct, but then
modifies the use_systemd_cgroup field inside that struct. Pull the cgroups path
parsing logic into do_create_container, so that CreateOpts can be immutable in
LinuxContainer::new. This is just moving things around, there should be no
functional changes.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Fix path check bypassed issuse introduced by #6082,
use filepath.Clean() to clean path before check
Fixes: #6082
Signed-off-by: XDTG <click1799@163.com>
This patch fixes the issue that do_copy_file changes
the directory permission of the parent directory of
a target file, even when the parent directory already
exists.
Fixes#6367
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
This change enables to run cloud-hypervisor VMM using a non-root user
when rootless flag is set true in the configuration
Fixes: #2567
Signed-off-by: Feng Wang <fwang@confluent.io>
Allow an initrd/initramfs image to be used with Cloud Hypervisor, which
is handled differently to the default rootfs image type.
Fixes: #6335.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Replace `cloud_hypervisor_vm_create_cfg()` with a set of `TryFrom` trait
implementations in the new CH specific `convert.rs` to allow the generic
`Hypervisor` configuration to be converted into the CH specific
`VmConfig` type.
Note that device configuration is not currently handled in `convert.rs`
(it's handled in `inner_device.rs`).
This change removes the old hard-coded CH specific configuration.
Fixes: #6203.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Adding kernel config to sev case since it is needed for SNP and SNP will use the SEV kernel.
Incrementing kernel config version to reflect changes
Fixes: #6123
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Since the cgroup manager is wrapped in a dyn now, the print in
LinuxContainer::new has been useless and just says "CgroupManager". Extend the
Debug trait for 'dyn Manager' to print the type of the cgroup manager so that
it's easier to debug issues.
Fixes: #5779
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Following Jong Wu suggestion, let's link /usr/bin/musl-gcc to
/usr/bin/aarch64-linux-musl-gcc.
Fixes: #6320
Signed-off-by: SinghWang <wangxin_0611@126.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 9d78bf9086.
Golang binaries are built statically by default, unless linking against
CGO, which we do. In this case we dynamically link against glibc,
causing us troubles when running a binary built with Ubuntu 22.04 on
Ubuntu 20.04 (which will still be supported for the next few years ...)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
New developers are often confused by some of our requirements, notably porting
labels. While our CONTRIBUTING.md file points to the solution, the developer's
guide does not. Add a link there.
Fixes: #6329
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
When starting an initrd the kernel expects to find /dev/console in the initrd,
so that it can connect it as stdin/stdout/stderr to the /init process. If the
device node is missing the kernel will complain that it was unable to open an
initial console. If kata-agent is the initrd init process, it will also result
in log messages not being logged to console and thus not forwarded to host
syslog.
Add a set of standard device nodes for completeness, so that console logging
works. To do that we install the makedev packge which provides a MAKEDEV helper
that knows the major/minor numbers. Unfortunately the debian package tries to
create devnodes from postinst, which can be suppressed if systemd-detect-virt
is present. That's why we create a small dummy script that matches what
systemd-detect-virt would output (anything is enough to suppress mknod).
Fixes: #6261
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
DeviceConfigInfo contains config and device, so when we want to do
update we could simply update config part of the info, and device would
not be changed during update.
Fixes: #6324
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Because crossbeam_channel has more features and better performance than
mpsc::channel and finally rust replace its channel implementation with
crossbeam_channel on version 1.67
Signed-off-by: Tim Zhang <tim@hyper.sh>
Let's bump the base container image to use the 22.04 version of Ubuntu,
as it does bring up-to-date package dependencies that we need to
statically build the runtime-rs on aarch64.
Fixes: #6320
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
previously, if installing on unkown distro, script would tell user that
their distro was unsupported. Changed error message prompting user to
install dependecies manually, then retry.
Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
Let's push the builder images to a registry, so we can take advantage of
those on each step of our building process.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This, combined with the effort of caching builder images *and* only
performing the build itself inside the builder images, is the very first
step for reproducible builds for the project.
Reproducible builds are quite important when we talk about Confidential
Containers, as users may want to verify the content used / provided by
the CSPs, and this is the first step towards that direction.
Fixes: #5517
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder image for the td-shim.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder image for the td-shim.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed infra for building and pushing the OVMF builder
image to the Kata Containers' quay.io registry.
Fixes: #5477
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existing image, instead of buildinf our
own, to be used as a builder image for OVMF.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed infra for only building and pushing the QEMU
builder image to the Kata Containers' quay.io registry.
Fixes: #5481
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existsing image, instead of building our
own, to be used as a builder image for QEMU.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed infra for only building and pushing the virtiofsd
builder image to the Kata Containers' quay.io registry.
Fixes: #5480
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder image for the virtiofsd.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we're building virtiofsd with a specific toolchain that's
known to not cause any issues, instead of always using the latest one.
On each bump of the virtiofsd, we'll make sure to adjust this according
to what's been used by the virtiofsd community.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the needed infra for only building and pushing the shim-v2
builder image to the Kata Containers' quay.io registry.
Fixes: #5478
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's try to pull a pre-existing image, instead of building our own, to
be used as a builder for the shim-v2.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently the symbolic link for virtiofsd which is used as
a valid path is not updated on every CI run. Fix it by
using the actual path of installation.
Fixes: #6311
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Let's add the needed infra for only building and pushing the kernel
builder image to the Kata Containers' quay.io registry.
Fixes: #5476
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's first try to pull a pre-existing image, instead of building our
own, to be used as a builder image for the kernel.
This will save us some CI time.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will push a specific tag to a registry, whenever the
PUSH_TO_REGISTRY environment variable is set, otherwise it's a no-op.
This will be used in the future to avoid replicating that logic in every
builder used by the kata-deploy scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a function to get the hash of the last commit modifying a
specific file.
This will help to avoid writing `git rev-list ...` into every single
build script used by the kata-deploy.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
BUILD_REGISTRY, which points to quay.io/kata-containers/builder, will be
used for storing the builder images used to build the artefacts via the
kata-deploy scripts.
The plan is to tag, whenever it's possible and makes sense, images like:
* ${BUILDER_REGISTRY}:${component}-${unique_identifier}
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Every dependency in check_deps is used inside the script (apart from
git, which may be a historical artifact), and therefore should be
checked even when the -f option is passed to the script. Simply changed
at what point check_deps is called in order to always run it.
Fixes#6302.
Signed-off-by: Gabe Venberg <gabevenberg@gmail.com>
Send and Sync are automatically derived traits,
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
Almost all primitives are Send and Sync,
so we don't need to implement them manually most of the time.
Fixes: #6307
Signed-off-by: Tim Zhang <tim@hyper.sh>
Use `git-diff` instead of legacy `git-whatchanged` to get
differences in the packaging/kernel directory. This also fixes
a bug by grepping for the kernel directory in the output of the
git command.
Fixes: #6210
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
When a new stable branch is created, it is necessary to change the
references in the tests repo from main to the new stable branch.
However this step needs to be performed after the repos have been tagged
as the `tags_repos.sh` script is the one that creates the new branch.
Clarify this in the documentation and move the step to change branch
references in test repo after repos have been tagged.
Fixes: #1824
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Currently ubuntu is already the default distro for all the architectures
but x86_64, which uses clearlinux. However, our CI does *not* test the
clearlinux image we ship.
Taking a look at our CI code [0], we've been using ubuntu as base for
the tests for a few years already, if not forever.
The minimum we can do is to switch to distributing ubuntu, as the tested
rootfs-image, and then decide later on whether we should switch back to
clearlinux (once we switch our CI to using that, and make sure all tests
will be green), or if we move to slimmer distro, such as alpine.
[0]: 0a39dd1a01/.ci/install_kata_image.sh (L44)Fixes: #6303
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's a check in the runtime-rs Makefile that basically checks whether
the `arch/$arch-options.mk` exists or not and, if it doesn't, the build
is just aborted.
With this in mind, let's create a generic powerpc64le-options.mk file
and not bail when building for this architecture.
Fixes: #6142
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In the `install_go_rust.sh` file we're adding a
x86_64-unknown-linux-musl target unconditionally. That should be,
instead, based in the ARCH of the host and the appropriate LIBC to be
used with that host.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The patchset will help users to easily enter guest VM by debug
console sock.
In order to enter guest VM smoothly, users needs to do some
configuration, options as below:
(1) Set debug_console_enabled = true with default vport 1026.
(2) Or add agent.debug_console agent.debug_console_vport=<PORT>
into kernel_params, and the vport is <PORT> you set.
The detail of usage:
$ kata-ctl exec -h
kata-ctl-exec
Enter into guest VM by debug console
USAGE:
kata-ctl exec [OPTIONS] <SANDBOX_ID>
ARGS:
<SANDBOX_ID> pod sandbox ID
Fixes: #5340
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
For some cases, users will mount system directories as bind volume.
We should not bind mount these kind of directories in the host as it does
not make sense.
Fixes: #6299
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
The rust agent had supported to set the guest dns
server in start sandbox request, thus add the dns
in the runtime side.
Fixes:#6286
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We should make sure the dns's source file's parent
directory exist, otherwise, it would failed to create
the file directly.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In workflow, `set-output` command is deprecated and will be disabled soon.
This commit replaces the deprecated `set-output` command with putting a
value in the environment file `$GITHUB_OUTPUT`.
Fixes#6266
Signed-off-by: jongwooo <jongwooo.han@gmail.com>
Let's not try to sed a file that doesn't exist, which may be the case
depending on the architecture we're building the shim-v2 for.
This is a partial-forward port of
f24c47ea47.
Fixes: #6293
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Removed the `` around containerd, because when you execute this as a
script it runs the containerd command within the script, which it should
not do.
Fixes#4217
Signed-off-by: Willem Dendauw <willem.dendauw@hotmail.com>
As 3.1.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
- kata-deploy: Install protobuf-compiler explicitly in shim-v2 Dockerfile
- runtime: tracing: Fix missing ctx return
- runtime: add reconnect timeout for vhost user block
- SEV: Update ReducedPhysBits
- shim-v2/build.sh: Only build runtime-rs for the supported arches
- kata-ctl: Expand unit tests for CPU check
- runtime: support cgroup v2 metrics marshal guest metrics
- Typo: change tabs in comment to spaces
- rootfs: support EROFS filesystem
- versions: Update runc version
- runtime: Improve documentation of appendFDs
- Minor cleanups in make file
- main | docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
- Action check kernel config version
- clh: Enforce API timeout only for vm.boot request
- virtiofsd: change cache mod to const
- runtime-rs: ignor "no such process" error when delete cgroup for a thread to let it go
- kernel: Add console kernel config for s390
- runtime: remove not used shim configurations
- improvement: Fix naming conventions for span name and log subsystem
- Dragonball: add cpu resize ability
- arm64/CI: fix unit test failure on arm64
- CI: Make docker version stick to v20.10 in ubuntu:20.04 for s390x|ppc64le
- virtiofsd: fix the build on ppc64le
- runtime:fix stat uds path
- cni: Update cni plugins version to 1.2.0
- Built-in Sandbox: add more unit tests for dragonball. Part 5
- runtime: Drop QEMU log file support
- docs: Add documentation for building agent with seccomp support.
- Add kernel-dragonball-experimental to kata-deploy, kata-deploy-test, and the release
- runtime-rs: add missing config section for share-fs
- runtime: Add hmp for qemu
- upcall: add document for upcall
- runtime: Start QEMU undaemonized and get logs
- docs: Update url link in QAT documentation
- versions: update cni plugins version
- versions: Upgrade to Cloud Hypervisor v29.0
- runtime: Use consts in `kata-runtime check`
- versions: Bump QEMU to v7.2.0
- agent: Eliminate unnecessary metrics
- runtime:all APIs are hang in the service.mu
- Utility functions for kata-env
- versions: Update conmon version
- runtime: paas enablevhostuserstore annotation to hypervisor config
- runk: Upgrade liboci-cli to v0.0.4
- runtime: use system pagesize for hugepage test
- dependency: update cgroups-rs
- runtime: Use git rev-parse for the kata-monitor tag
- virtcontainers: split out linux-specific bits for mount, factory
- Add darwin skeletons
- vendor: revendor netlink to get latest
- Address issues with the initial vCPU pinning functionality
- virtcontainers: Fix misspelling in error message
- runtime: add test generated file to .gitignore
- runtime: fix up disable_netns handling
- docs: add hint of probing loop module
- tools: add --locked option for cargo install
- runtime-rs: add Single Container support
- virtcontainers: tests: Ensure Linux specific tests are just run on Linux
- Change cache mode from none to never
- tools: Fix indentation for setup aks script
- virtcontainers: fs_share: Add Darwin skeleton
- virtcontainers: Add a Virtualization.framework skeleton
- kata-ctl: remove get_kata_version_by_url function
- kata-ctl: fix build error on s390x
- virtcontainers: Introduce hypervisor_darwin
- runtime: Define Darwin handled signals list
- nydus: net-ns handling needs to be only executed on Linux hosts
- clh: Ensure it works with Docker / Moby
- agent: refactor guest hooks
- fix moby prestart hook handling
- schedcore: Make buildable on !linux
- Built-in Sandbox: add more unit tests for dragonball. Part 4
- runtime-rs: cleanup the run dir of hypervisor when shut down
- Feat: implementation of kata-ctl direct-volume operations
- Runtime: Clarify mutability of global var
- kata-runtime: add rust runtime path for kata-runtime exec
- versions: Upgrade to Cloud Hypervisor v28.1
- runtime-rs: add dbs-upcall feature
- runtime/Makefile: Get some bits happy on darwin
- docs: remove old and misleading instructions for minikube
- packaging: fix indents in build-kernel.sh
- kernel: adding kmod to do docker env
- versions: Update the rust toolchain to 1.66.0
- kata-ctl: skip test if access GitHub.com fail
- agent: unset `CC` for cross-build
- runtime-rs: enable hugepage
- runtime-rs: Clean up mount points shared to guest
- kata-ctl: fix checkcpu bug in non-x86 arches
d144ded12 release: Adapt kata-deploy for 3.1.0-rc0
8e3863cec kata-deploy: Install protobuf-compiler explicitly in shim-v2 Dockerfile
c45391991 runtime: tracing: Fix missing ctx return
4139d68d5 runtime-rs: Include target install in conditional branch
ca02c9f51 runtime: add reconnect timeout for vhost user block
2f5bc0f40 kata-ctl: Expand unit tests for CPU check
67b8f0773 SEV: Update ReducedPhysBits
bdf20b5d2 rootfs: support EROFS filesystem
fff0e50a7 versions: Update runc version
ed02c8a05 docs: add guide for building rootfs with EROFS
01765e173 runtime: support cgroup v2 metrics marshal guest metrics
49326fe4e fix(clippy): fix hypervisor clippy checks
94b1d9814 cargo: Update Cargo.lock files
f1855594a make: Get rid of verbose output while creating tar
c3836010a make: clean up obsolete targets
ac64b021a clh: Enforce API timeout only for vm.boot request
56071c6e7 virtiofsd: change cache mod to const
5d37d31ac cgroups: upgrade cgroupfs to 0.3.1
ab59a65c9 runtime-rs: neglect a certain error when delete cgroup
390916b33 runtime: remove not used shim configurations
9794c52c6 improvement: Fix naming conventions for span name and log subsystem
f49b89b63 CI: Set docker version to v20.10 in ubuntu:20.04 for s390x|ppc64le
3c24e2340 README: Update Readme under packaging/kernel
d73f3a8a2 github-action: Add step to verify kernel config version id updated
59f104c02 runtime: skip unit test that fail regularly on aarch64
b7dd97cac kata-ctl: fix permission deny issue in test_add_remove
57c5e5629 Dragonball: add cpu resize ability
3c48f2202 runtime: Improve documentation of appendFDs
856ab6687 virtiofsd: fix the build on ppc64le
f83115a83 docs: Fix missing critical steps in how-to-hotplug-memory-arm64.md
e071d9251 Typo: change tabs in comment to spaces
56f0a27fe kernel: Add console kernel config for s390
334c4b8bd runtime: Drop QEMU log file support
3a63e3c1f cni: Update cni plugins version to 1.2.0
510798155 dragonball: Improve test cases
dc90c6e30 dragonball: add more unit test for vm
c07135535 runtime-rs: Improve s390x error message
4e2db96ef runtime-rs: Don't try to build on Power
8e8c720d5 kata-deploy-push: Ensure we build Dragonball specific kernel
1e531b44d runtime:fix stat uds path
9092c23a2 runtime: Add hmp for qemu
b7f4e96ff kata-deploy-test: Ensure we build dragonball specific kernel
063dec37c release: Add the dragonball-experimental kernel
0b3c91d2a kata-deploy: Add kernel-dragonball-experimental target
00dcd900f docs: Add documentation for building agent with seccomp support.
2b779cba0 docs: Update url link in QAT documentation
39fe4a4b6 runtime: Collect QEMU's stderr
a5319c6be runtime: Start QEMU undaemonized
bf4e3a618 runtime: Launch QEMU with cmd.Start()
8a1723a5c runtime: Pre-establish the QMP connection
8a4f08cb0 govmm: Optionally pass QMP listener to QEMU
219bb8e7d govmm: Optionally start QMP with a pre-configured connection
a85d0e465 versions: update cni plugins version
676d02850 versions: Bump QEMU to v7.2.0
861c38b6a versions: Upgrade to Cloud Hypervisor v29.0
ba87e0afe runtime: Use consts in `kata-runtime check`
9f490d16f upcall: add document for upcall
596037e20 versions: Update conmon version
095e8fdef runk: Use the original Kill command instead of the customed it.
0f9e23a3d runk: Upgrade liboci-cli to v0.0.4
69fc8de71 runtime:all APIs are hang in the service.mu
8d4c2cf1b kata-ctl: Allow certain constants to go unused
64c11a66f kata-ctl: Have function to get cpu details to run on specific arch
923cd3fda virtcontainers: split out Linux parts from mount
cf1bae352 runtime: paas enablevhostuserstore annotation to hypervisor config
1592a385e dependency: update cgroups-rs
60ff230d8 virtcontainers: Split the factory package into Linux and Darwin bits
76437a972 runtime: Use git rev-parse for the kata-monitor tag
a9626682a virtcontainers: resourcecontrol: Add skeleton for Darwin
ea06fe3af virtcontainers: Add a Network API skeleton for Darwin
6ee550e9a runtime: vCPUs pinning is sandbox specific, not hypervisor
6199b6917 runtime-rs: change cache mode
a33a22ccd runtime-rs: add missing config section for share-fs
e3d3b72fa virtcontainers: use resource control for setting CPU affinity
f137048be resource-control: add helper function for setting CPU affinity
73216a810 vendor: revendor netlink to get latest
fc17d7cc4 virtcontainers: Fix misspelling in error message
12fd6ffc1 runtime: fix up disable_netns handling
64c9114a3 tools: add --locked option for cargo install
7eb43cec1 runtime: add test generated file to .gitignore
8551853cf runtime: use system pagesize for hugepage test
86a82cace runtime: change cache mode from none to never
82c59efd6 runtime-rs: change cache mode from none to never
7b309b578 kata-types: change cache mode from none to never
fee4e7c7c docs: change cache mode from none to never
594b57d08 utils: Add utility functions to get cpu and distro details.
d33e34361 check: Move PROC_CPUINFO from architecture specific files
f8a93a1de tools: Fix indentation for setup aks script
03de5f41b kata-ctl: remove get_kata_version_by_url function
464d4c94d runtime-rs: process single_container
5f9c892e4 kata-types: add single_container support
fa9ae9362 virtcontainers: Add a Virtualization.framework skeleton
d48b22bb1 virtcontainers: fs_share: add Darwin skeleton
fafc7a8b1 virtcontainers: tests: Ensure Linux specific tests are just run on Linux
efa4fc0b2 clh: Add hotplug support for network devices
1074d2c1d clh: Make vmAddNetPutRequest capable of doing hotplugs
9ec8a1398 virtcontainers: introduce hypervisor_darwin
8bb68a9f2 vc/network: skip existing endpoints when scanning for new ones
c21a8d5ff kata-ctl: fix build error on s390x
3b4420eb8 runtime: Define Darwin handled signals list
24b05a99b schedcore: Make buildable on !linux
3886aad19 nydus: net-ns handling needs to be only executed on Linux hosts
e256903af runtime-rs: cleanup the run dir of hypervisor when shut down
937a41346 kata-ctl: add unit tests for volume ops
8451db7c0 kata-ctl: direct-volume: add Add and Remove handlers
2d4b2cf72 runtime-rs: add POST method to shim-client
cae78a685 kata-ctl: add constants for direct-volume commands
652021ad9 versions: Upgrade to Cloud Hypervisor v28.1
d08538912 vc: fix up UT for CreateSandbox API change
578a9c25f vc: rescan network endpoints after running prestart hooks
cb84b0fb0 katautils: run prestart hooks after starting VM
079462d2e runk: Fix needless_borrow warning
2c24fcf34 runtime-rs: Fix clippy::bool-to-int-with-if warnings
025e78341 runtime-rs: Fix needless_borrow warnings
4fb163d57 runtime-rs: Allow clippy:box_default warnings
20121fcda runtime-rs: Fix unnecessary_cast warnings
b95364a14 dragonball: Allow question_mark warning in allocate_device_resources()
0b2f060bf dragonball: Fix unnecessary_cast warnings
a545a6593 agent: Allow clippy::question_mark warning in Namespace{}
9ced34dd2 agent: Fix explicit_auto_deref warnings
f77220490 agent: Fix needless_borrow warnings
7bcdc9049 rustjail: Fix unnecessary_cast warnings
41d7dbaae rustjail: Fix needless_borrow warnings
2a73e057d kata-types: Fix unnecessary_cast warnings
cf9ef1833 kata-types: Fix needless_borrow warnings
126187e81 safe-path: Fix needless_borrow warnings
bb78d35db kata-sys-util: Fix "match-like-matches-macro" warning
668e65240 kata-sys-util: Fix unnecessary_cast warnings
c1a8d89a7 kata-sys-util: Fix needless_borrow warnings
c9c38e6d0 logging: Allow clippy::type-complexity warning
ffd6fbb6b logging: Fix needless_borrow warnings
60df30015 protocols: Fix unnecessary_cast warnings
56e7b5d0f runtime/Makefile: Get some bits happy on darwin
0bbeb34b4 protocols: Fix needless_borrow warnings
dfea6c7d2 versions: Update the rust toolchain to 1.66.0
86ee24b33 Runtime: Clarify mutability of global var
dae667062 kata-runtime: add rust runtime path for kata-runtime exec
a2e3715e0 upcall: remove upcall client when stopping vm
31591d791 dragonball: fix unit test failure case about Kvm.
2b02e0a9b dragonball: add more unit test for vcpu manager
85f9094f1 agent: refactor guest hooks
360506225 runtime-rs: add dbs-upcall feature
03a0c9d78 kata-ctl: skip test if access GitHub.com fail
1dcbda3f0 kata-ctl: update Cargo.lock
b4b5d8150 docs: remove old and misleading instructions for minikube
0fe24e08b packaging: fix indents in build-kernel.sh
3480780bd kata-ctl: add check framework support for non-x86
1bd533f10 kata-ctl: let check framework arch-agnostic
fd77eebd4 runtime-rs: fix the issues mentioned in the code review
0e6920790 runtime-rs: Clean up mount points shared to guest
ecb28e2b1 kernel: adding kmod to do docker env
087515a46 agent: unset `CC` for cross-build
bf8848f92 agent: Eliminate unnecessary metrics
f8a48ab41 docs: add hint of probing loop module
afaf17f42 runtime-rs: enable container hugepage
fc4a67eec runtime-rs: enable vm hugepage
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Add a few details about the current state of the Cloud Hypervisor (CH)
runtime-rs external hypervisor implementation with pointers to the
appropriate issues.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add a basic runtime-rs `Hypervisor` trait implementation for Cloud
Hypervisor (CH).
> **Notes:**
>
> - This only supports a default Kata configuration for CH currently.
>
> - Since this feature is still under development, `cargo` features have
> been added to enable the feature optionally. The default is to not enable
> currently since the code is not ready for general use.
>
> To enable the feature for testing and development, enable the
> `cloud-hypervisor` feature in the `virt_container` crate and enable the
> `cloud-hypervisor` feature for its `hypervisor` dependency.
Fixes: #5242.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This is to install a missing binary protoc in shim-v2 Dockerfile.
Fixes: #6244
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
(cherry picked from commit 10603e3def)
Normally we return the context when creating a trace span so that the
ordering of spans w.r.t. calls is maintained in tracing output. Add
missing context for StartVM() for Cloud Hypervisor.
Fixes#6271
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Change unit tests for CPU check to table-driven tests and expand test
cases including temp files for cpuinfo.
Fixes#5919
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Updating this field, as `cpuid` provides host level data, which is not
what a guest would expect for Reduced Phsycial Bits. In almost all
cases, we should be using `1` for the value here.
Amend: Adding unit test change.
Fixes: #5006
Signed-off-by: Larry Dewey <larry.dewey@amd.com>
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.
On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.
Fixes: #6063
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
This PR updates the runc version. This new version include
changes in:
- Fix mounting via wrong proc fd. When the user and mount namespaces are
used, and the bind mount is followed by the cgroup mount in the spec,
the cgroup was mounted using the bind mount's mount fd.
- Switch kill() in libcontainer/nsenter to sane_kill().
- Fix "permission denied" error from runc run on noexec fs.
- Fix failed exec after systemctl daemon-reload. Due to a regression
in v1.1.3, the DeviceAllow=char-pts rwm rule was no longer added and
was causing an error open /dev/pts/0: operation not permitted: unknown when systemd was reloaded.
Fixes#6251
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We already have verbose output while merging the builds from various
build targets. Getting rid of verbose output to speed up.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Cleanup targets that have been removed in the past when the
makefile for kata-deploy was included.
Instead, add targets from the makefile under local-build kata-deploy.
Fixes: #6165
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
launchClh already has a timeout of 10seconds for launching clh, e.g.
if launchClh or setupVirtiofsDaemon takes a few seconds the context's
deadline will already be expired by the time it reaches bootVM
Fixes#6240
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Change cache mod from literal to const and place them in one place.
Also set default cache mode from `none` to `never` in
`pkg/katautils/config-settings.go.in`.
Fixes: #6151
Signed-off-by: Bin Liu <bin@hyper.sh>
The .dracut_rootfs.done file is accidentally being picked up as the default
target, regardless of BUILD_METHOD. Move the 'all' target definition up, so
that it's the default (=first) target in the makefile. Additionally make the
.dracut_rootfs.done target conditional on the right BUILD_METHOD being
selected, as building it doesn't make sense with BUILD_METHOD=distro.
Fixes: #6235
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Trait method cause for std::error::Error is deprecated thus need replace
it with source method for cgroups-fs::error::ErrorKind.
Fixes: #6192
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Delete cgroup for a thread which may exit can lead to panic. Just
neglect that error is harmless also avoid this failure.
Fixes: #6192
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Normally, the span name should be the same as the function name, and the log subsystem should not contain spaces.
Fixes#6153
Signed-off-by: joannejchen <chenjjoanne@gmail.com>
This is to make a docker version to v20.10 in docker upstream image ubuntu:20.04 for s390x and ppc64le.
Fixes: #6211
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Update Readme to instruct users to increment the kata config version
for any changes made to configs or patches under packaging/kernel.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
The version mentioned in the `kata_config_version` needs to be
updated for any kernel config change or changed to the patches applied.
Without this, CI would not test with the latest kernel changes.
We use to enforce this earlier as part of CI when `packaging` was
a standalone repo.
Add back this check as part of a github action so that the check is
performed early on instead of a CI job.
Fixes: #6210
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
There are lots of unit test cases fails regularly on aarch64, including
TestIOCopy, create_tmpfs. Temporarily skip it for now and enable it
after them get fixed.
Fixes: #6194
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
test_add_remove and test_get_sandbox_id_for_volume need root user, but
test_drop_privs can temporarily change the user to "nobody" that can
lead to the failure of these tests.
Serialise these three tests can fix it.
Fixes: #6055
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Add cpu resize ability upon upcall communication channel. Runtime could
use ResizeVcpu VmmAction and pass the desired vCPU number to the
Dragonball hypervisor.
Dragonball will trigger the device manager service in guest kernel's
upcall server to do cpu resize.
Fixes: #6008
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
The cmd.ExtraFiles feature that is used to implement appendFDs takes an
array of arbitray file descriptors and internally renumbers them to be
consecutive starting from 3, using dup2().
This isn't especially obvious : document it for the sake of clarity.
Fixes#6199
Signed-off-by: Greg Kurz <groug@kaod.org>
link-self-contained is not supported on ppc64le rust target.
Hence, do not pass it while building virtiofsd.
Fixes: #6195
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The key steps in how-to-hotplug-memory-arm64.md are missing, resulting in the kata qemu pod not being created successfully.
Fixes: #6105
Signed-off-by: SinghWang <wangxin_0611@126.com>
The QEMU log file is essentially about fine grain tracing of QEMU
internals and mostly useful for developpers, not production. Notably,
the log file isn't limited in size, nor rotated in any way. It means
that a container running in the VM could possibly flood the log file
with a guest triggerable trace. For example, on openshift, the log
file is supposed to reside on a per-VM 14 GiB tmpfs mount. This means
that each pod running with the kata runtime could potentially consume
this amount of host RAM which is not acceptable.
Error messages are best collected from QEMU's stderr as kata is doing
now since PR #5736 was merged. Drop support for the QEMU log file
because it doesn't bring any value but can certainly do harm.
Fixes#6173
Signed-off-by: Greg Kurz <groug@kaod.org>
As the dragonball specific kernel is now part of the release, let's make
sure we build it as part of the kata-deploy-push action.
Fixes: #5859
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
os.Stat("unix:///run/vc/sbs/sid/shim-monitor.sock") will fail,
should be os.Stat("/run/vc/sbs/sid/shim-monitor.sock")
Fixes:#6148
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
As the dragonball specific kernel is now part of the release, let's make
sure we build it as part of the kata-deploy-test action.
Fixes: #5859
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the dragonball specific kernel, which takes advantage of
upcall, as part of the release tarball, so it can be used from the
release tarball / kata-deploy.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As Chao Wu added the support for building the dragonball kernel as a new
experimental kernel, let's make sure we reflect that as part of the
kata-deploy build scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The default for the agent today is building with seccomp support.
However, additional steps need to be taken for building against
musl such as installing the static seccomp library for musl.
Add documentation to explain this.
Fixes#6136
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
LaunchQemu now connects a pipe to QEMU's stderr and makes it
usable by callers through a Go io.ReadCloser object. As
explained in [0], all messages should be read from the pipe
before calling cmd.Wait : introduce a LogAndWait helper to handle
that.
Fixes#5780
Signed-off-by: Greg Kurz <groug@kaod.org>
QEMU has always been started daemonized since the beginning. I
could not find any justification for that though, but it certainly
introduces a problem : QEMU stops logging errors when started this
way, which isn't accaptable from a support standpoint. The QEMU
community discourages the use of -daemonize ; mostly because
libvirt, QEMU's primary consummer, doesn't use this option and
prefers getting errors from QEMU's stderr through a pipe in order
to enforce rollover.
Now that virtcontainers knows how to start QEMU with a pre-
established QMP connection, let's start QEMU without -daemonize.
This requires to handle the reaping of QEMU when it terminates.
Since cmd.Wait() is blocking, call it from a goroutine.
Signed-off-by: Greg Kurz <groug@kaod.org>
LaunchCustomQemu() currently starts QEMU with cmd.Run() which is
supposed to block until the child process terminates. This assumes
that QEMU daemonizes itself, otherwise LaunchCustomQemu() would
block forever. The virtcontainers package indeed enables the
Daemonize knob in the configuration but having such an implicit
dependency on a supposedly configurable setting is ugly and fragile.
cmd.Run() is :
func (c *Cmd) Run() error {
if err := c.Start(); err != nil {
return err
}
return c.Wait()
}
Let's open-code this : govmm calls cmd.Start() and returns the
cmd to virtcontainers which calls cmd.Wait().
If QEMU doesn't start, e.g. missing binary, there won't be any
errors to collect from QEMU output. Just drop these lines in govmm.
Similarily there won't be any log file to read from in virtcontainers.
Drop that as well.
Signed-off-by: Greg Kurz <groug@kaod.org>
Running QEMU daemonized ensures that the QMP socket is ready to
accept connections when LaunchQemu() returns. In order to be
able to run QEMU undaemonized, let's handle that part upfront.
Create a listener socket and connect to it. Pass the listener
to QEMU and pass the connected socket to QMP : this ensures
that we cannot fail to establish QMP connection and that we
can detect if QEMU exits before accepting the connection.
This is basically what libvirt does.
Signed-off-by: Greg Kurz <groug@kaod.org>
QEMU's -qmp option can be passed the file descriptor of a socket that
is already in listening mode. This is done with by passing `fd=XXX`
to `-qmp` instead of a path. Note that these two options are mutually
exclusive : QEMU errors out if both are passed, so we check that as
well in the validation function.
While here add the `path=` stanza in the path based case for clarity.
Signed-off-by: Greg Kurz <groug@kaod.org>
When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.
Signed-off-by: Greg Kurz <groug@kaod.org>
As QEMU released its v7.2.0 version in December last year, last do the
bump on our side.
A few configuration options have been removed between the v6.2.0 (the
version we currently use) and v7.2.0, so those have also been dropped
from our configure-hypervison.sh script (for this specific version).
Also, we're explicitly setting --disable-virtiofsd for the platforms
that we're testing using the rust version.
See: a8d6abe129/docs/about/deprecated.rst (virtiofsd)Fixes: #6102
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fixes: #6095
We're already importing the virtcontainers package so might as well
use the constants for the hypervisor types we're checking against instead
of typing the names out in the switch cases.
Signed-off-by: Danny Canter <danny@dcantah.dev>
In order for users to get better understand of upcall features, we add
this document for upcall to illustrate what is upcall and how to enable
upcall.
fixes: #6054
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time
Fixes: #6059
Signed-off-by: ls <335814617@qq.com>
The generic constants for cpu vendor and model may be superseded
by architecture specific constants. Allow these to be marked as
dead code to ignore warnings on architectures where they are overrided.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This function relies on get_single_cpu function which has configured
to compile on amd64 and s390x.
Making the function get_generic_cpu_details to compile on these
architectures until we resolve the compilation for functions defined
in check.rs. This is a temporary solution until we cleanup check.rs to
make it build on all architectures.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Mount handling is often unique in Linux. Let's ensure that the common
parts remain in mount.go, while Linux speific parts are within a linux
file.
Fixes: #6049
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The .git-commit can be a multiple line file, potentially confusing
the Darwin linker for example.
Fixes: #6046
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Cgroups do not exist on Darwin, so use an empty implementation for
resourcecontrol for the time being. In the process, ensure that the
utilized cgroup handling (ie, isSystemdCgroup) is kept in general file,
since we use this to help assess/constrain the container spec we pass to
the guest.
Fixes: #6051
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This PR fixes a misspelling in the error message when it tries to run
a system without Confidential computing support.
Fixes#6042
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
With `disable_netns=true`, we should never scan the sandbox netns which
is the host netns in such case.
Fixes: #6021
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
There is a broken release of cgroup-rs, but cargo install will not use
the version in Cargo.lock, so add the `--locked` option to use the version
specified in the Cargo.toml
Fixes: #5376
Signed-off-by: Bin Liu <bin@hyper.sh>
In TestHandleHugepages it will do a mount operation with different pagesizes,
but some systems only support 2M pagesize, test for a 1g pagesize will fail.
This commit try to fix by only mount pagesizes under `/sys/kernel/mm/hugepages`, which are
supported to mount by the OS.
Fixes: #6029
Signed-off-by: Bin Liu <bin@hyper.sh>
Move PROC_CPUINFO into check.rs. This file is used accross
architectures and does not need to be in arch-specific files.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
In `src/tools/kata-ctl/src/check.rs`, there is a function
`get_kata_version_by_url` in the tests mod,
indeed we can use the `get_kata_all_releases_by_url` in the main mod
to replace it.
Fixes: #5981
Signed-off-by: Bin Liu <bin@hyper.sh>
Process single_container like pod_sandbox when create container but like
pod_container when get the size info of memory/cpu from oci/spec.
Fixes: #6006
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
For now, only pod_sandbox and pod_container are supported. It doesn't cover
the case that container started by ctr which is a single_container defined
in kata 2.0. port the single_container kata type from kata 2.0 to kata 3.0.
Fixes: #6006
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Fixes: #6004
A Virtualization.framework based Hypervisor implementation.
This is just stubs for now to eventually get this building.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #6002
As a first pass for testing, let's add a skeleton for filesystem
sharing support on Darwin..
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5993
Several tests utilize linux'isms like Mounts, bindmounts, vsock etc.
Let's ensure that these are still tested on Linux, but that we also skip
these tests when on other operating systems (Darwin). This commit just
moves tests; there shouldn't be any functional test changes. While the
tests still won't be runnable on Darwin/other hosts yet, this is a necessary
step forward.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
This is needed in order to have Moby / Docker working properly with
Cloud Hypervisor, as Moby / Docker relies on hotplugging a network
device to the VM as a preStartHook.
Fixes: #5997
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
THe only bit needed for having the vmAddNetPutRequest() capable of
dealing with hotplugs, instead of only coldplugs, is making sure it
doesn't error out in case a `200` response is returned.
The 200 response means:
"""
The new device was successfully added to the VM instance.
"""
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fixes: #5995
Placeholder skeleton at this point - implementation will be added after
basic build refactoring lands.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5990
Some signals may not be defined on non Linux host OSes, like
SIGSTKFLT for example. It's also not defined on certain architectures,
but irrelevant for this.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5983
sched-core only makes sense on Linux hosts. Let's add stub/error for
other platforms.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5985
With nydus not being its own pkg, it is challenging to implement cleanly
in a virtcontainers package that isn't necesarily Linux-only. The
existing code utilizes network namespace code in order to ensure nydus
is launched in the host netns. This is very Linux specific - so let's
make sure we only carry this out in a linux specific file.
In the Darwin case, to allow for compilation at least, let's add a stub
for doNetNS. Ideally the nydus and vc code can be refactored /
decoupled.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Added table driven unit tests and
funcitionality test for functions in volume_ops.
`join_path` relies on safe_path::scoped_join
to validate the unsafe part of the input.
Testcase also takes into account the possibility of specially
constructed string that would get b64-encoded into path-like string.
Fixes#5341
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
This commit adds direct-volume command handlers for kata-ctl,
including add, remove, stats and resize. Stats and resize
makes HTTP over UDS calls to runtime-rs while add and remove
runs locally on the host.
Fixes#5341
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
kata-ctl: direct-volume: add Add and Remove handlers
This commit adds direct-volume command handlers for kata-ctl,
including add, remove, stats and resize. Stats and resize
makes HTTP over UDS calls to runtime-rs while add and remove
runs locally on the host.
Fixes#5341
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
partly refactored shim-client to reuse code, added POST method
support, and made path string constants public for client imports.
Fixes#5341
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.
Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Substitution in the yq install script doesn't like zsh, and additionally
the version of yq we're using doesn't have a darwin/arm64 build so grab
the amd64 version and let rosetta work its magic.
Additionally swap to abspath from readlink -m for the printing of what binaries
to install, as the -m flag doesn't exist on the BSD variant, and this
should be the same behavior.
Fixes: #5970
Signed-off-by: Danny Canter <danny@dcantah.dev>
Was about to change `urandomdev` to a constant when I realized it's
intentionally mutable so it can be mocked in tests. There's other
comments to the same effect so clarify here as well.
Fixes: #5965
Signed-off-by: Danny Canter <danny@dcantah.dev>
In order to avoid resource leak, we need to remove upcall client in vm
and vcpu manager when stopping vm.
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
We have to execute some hooks both in host and guest. And in
/libs/kata-sys-util/src/hooks.rs, the coomon operations are implemented.
In this commit, we are going to refactor the code of guest hooks using
code in /libs/kata-sys-util/src/hooks.rs. At the same time, we move
function valid_env to kata-sys-util to make it usable by both agent and
runtime.
Fixes: #5857
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
This commit will call `error_for_status` after `send`, this call
will generate errors if status code between 400-499 and 500-599.
And sometime access github.com will fail, in this case we can
skip the test to prevent the CI failing.
Fixes: #5948
Signed-off-by: Bin Liu <bin@hyper.sh>
kata-ctl depends on runtime-rs, and this commit:
fbf294da3f
added a new dependency named shim-interface, this Cargo.lock should be updated too.
Signed-off-by: Bin Liu <bin@hyper.sh>
The current check framwork is specific for x86. Refactor the code
to let it arch-agnostic.
Fixes: #5923
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
In order to avoid cloning, changed the signature of
`ShareFsMount::share_rootfs`, `ShareFsMount::share_volume`, and
`ShareFsMount::umount_rootfs` to receive a reference to a config.
Fixes: #5898
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Fixed issues where shared volumes couldn't umount correctly.
The rootfs of each container is cleaned up after the container is killed, except
for `NydusRootfs`. `ShareFsRootfs::cleanup()` calls
`VirtiofsShareMount::umount_rootfs()` to umount mount points shared to the
guest, and umounts the bundle rootfs.
Fixes: #5898
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
- tools: Add some new gitignore items
- shim: return hypervisor's pid not shim's pid
- Dragonball: introduce upcall
- refactor(shim-mgmt): move client side to libs
- kata-ctl: Add --list option
- kata-ctl: check: only-list-releases and include-all-releases options
- basic framework for QEMU support in runtime-rs
- tools: Fix indentation on build kernel script
- runtime-rs: fix standalone share fs
- runtime-rs: fix sandbox_pidns calculation and oci spec amending
- runtime,agent: Add SELinux support for containers inside the guest
- kata-sys-util: fix issues where umount2 couldn't get the correct path
- agent: Drop the Option for LinuxContainer.cgroup_manager
- dragonball: enable kata3.0/dragonball CI on Arm
- fix kata deploy error after node reboot.
- tools: Fix indentation for ovmf script
- runtime: prevent waiting 50 ms minimum for a process exit
- runtime-rs: fix high cpu
- agent: remove `sysinfo` dependency
- runtime-rs: bind mount volumes in sandbox level
- docs: Update the rust version in the installation documentation
- runtime-rs: fix some variable names and typos
- kata-ctl: add host check for aarch64
- kata-ctl: fix dependency version conflict
- workflow: fix cargo-deny-runner.yaml syntax error
- runtime: Add identification in version for runtime-rs
- workflow: call cargo in user's $PATH
- runtime-rs: remove the version number from the commit display message
- runk: Re-implement start operation using the agent codes
- build: update golang version to 1.19.3
- snap: Fix snapcraft setup (unbreak snap releases)
- fix(agent): fix iptables binary path in guest
- runtime-rs: moving only vCPU threads into sandbox controller
- tools: Remove extra tab spaces from kata deploy binaries script
- ci: let static checks don't depend on build
- actions: use matrix to refactor static checks
- agent: support systemd cgroup for kata agent.
- actions: skip some jobs using "paths-ignore" filter
- runtime: go fix code for 1.19
- doc: update runtime-rs "Build and Install"
- runtime: don't fail mkdir if the folder is already created by another process
- kernel: add CONFIG_X86_SGX into whitelist
- runtime-rs: block on the current thread when setup the network to avoid be take over by other task
- Refactor(runtime-rs): add conditional compile for virt-sandbox persist
- runtime: add log record to the qemu config method `appendDevices` for…
- runtime: Use containerd v1.6.8
- tools: Fix indentation of build static firecracker script
- package: add nydus to release artifacts
- agent: check if command exist before do ip_tables test
- runtime: Support virtiofs queue size for qemu and make it configurable
- docs: change mount-info.json to mountInfo.json
- docs: update doc "NVIDIA GPU passthrough"
- runtime-rs: support vhost-vsock
- utils: Add utility function to fetch the kernel version.
- versions: update nydusd version
- runtime-rs: support nydus v5 and v6 rootfs
- Upgrade to Cloud Hypervisor v28.0
- docs: update doc "Setup swap device in guest kernel"
- Rust fixes + Golang bump
- clh: avoid race condition when stopping clh
- tools: Fix indentation of build static virtiofsd script
- docs: Fix configuration path
- runtime-rs : fix the shim source in the documentation test is ambiguous
- versions: update vmm-sys-util and related crates to v0.11.0
- runtime-rs: delete all cargo patches
- feat(shim-mgmt): iptables handler
- tools: Remove empty spaces from build kernel script
- Built-in Sandbox: add more unit tests for dragonball. Part 3
- Dragonball: enable mem_file_path config into hugetlbfs process
- runtime-rs:add hypervisor interface capabilities
- cloud-hypervisor: Fix GetThreadIDs function
- github: Parallelise static checks
- runtime-rs: blanks filled & fixes made to virtiofsd launch
- vCPUs pinning support for Kata Containers
- runtime-rs: fix shared volume permission issue
- runk: Ignore an error when calling kill cmd with --all option
- runk: Upgrade libseccomp crate to v0.3.0 in Cargo.lock
- snap: Unbreak docker install
- add EnterNetNS in virtcontainers
- tools: Fix indentation of build static clh script
- virtiofsd: Not use "link-self-contained=yes" on s390x
- Kata ctl drop privs
- versions: bump golangci-lint version
- runtime-rs: generate config files with the default target
- docs: Fix volumeMounts in SGX usage example
- versions: Update Cloud Hypervisor to b4e39427080
- docs: update rust runtime installation guide
- rustjail: Upgrade libseccomp crate to v0.3.0
- makefile: remove sudo when create symbolic link
- agent: remove redundant checks
- shim: Ensure pagesize is set when reporting hugetlb stats
- kata-ctl: Re-enable network tests on s390x (fixes 5438)
- agent: use NLM_F_REPLACE replace NLM_F_EXCL in rtnetlink
- fix readme content error at doc directory
- agent: validate hugepage size is supported
- Makefile: fix an typo in runtime-rs makefile
- qemu: Re-work static-build Dockerfile
- Modify agent-url return value in runtime-rs
- runtime-rs: regulate the comment in runtime-rs makefile
- doc: Update how-to-run-kata-containers-with-SNP-VMs.md
- kata-ctl: Disable network check on s390x
- virtiofsd: Build inside a container
- Dragonball: remove redundant comments in event manager
- versions: Update TDX QEMU
- runtime-rs: fix typo get_contaier_type to get_container_type
- kata-ctl: improve command descriptions for consistency
- runtime-rs: force shutdown shim process in it can't exit
- versions: Update TDX kernel
- ci: skip s390x for dragonball.
- Dragonball: delete redundant comments in blk_dev_mgr
- kata-ctl: Move development to main branch
- runtime-rs: support ephemeral storage for emptydir
- docs: fix a typo in rust-runtime-installation-guide
- Built-in Sandbox: add more unit tests for dragonball
- readme: remove libraries mentioning
b5cfd0958 kata-ctl: Fixed format for check release options
fbf294da3 refactor(shim-mgmt): move client side to libs
ae0dcacd4 tools: Add some new gitignore items
99485d871 shim: return hypervisor's pid not shim's pid
1f28ff683 runtime-rs: add binary to exercise shim proper w/o containerd dependencies
eb8c9d38f runtime-rs: add launch of a simple qemu process to start_vm()
2f6d0d408 runtime-rs: support qemu in VirtContainer
1413dfe91 runtime-rs: add basic empty boilerplate for qemu driver
a81ced0e3 upcall: add upcall into kernel build script
f5c34ed08 Dragonball: introduce upcall
8dbfc3dc8 kata-ctl: Fixed format for check release options
f3091a9da kata-ctl: Add kata-ctl check release options
a577df8b7 tools: Fix indentation on build kernel script
b087667ac kata-deploy: Fix the pod of kata deploy starts to occur an error
79cf38e6e runtime-rs: clear OCI spec namespace path
62f4603e8 runtime-rs: reset rdma cgroup
5b6596f54 runtime-rs: CreateContainerRequest has Default
e9e82ce28 runtime-rs: fix is_pid_namespace_enabled check
8079a9732 kata-sys-util: fix issues where umount2 couldn't get the correct path
4661ea8d3 runtime-rs: fix standalone share fs
c5abc5ed4 config: speed up rng init when kernel boot for arm64
3e6114b2e tools: Fix indentation for ovmf script
7fdbbcda8 agent: Drop the Option for LinuxContainer.cgroup_manager
d04d45ea0 runtime: use pidfd to wait for processes on Linux
e9ba0c11d runtime: use exponential backoff for process wait
748f22e7d agent: remove sysinfo dependency
0019d653d runtime-rs: fix high cpu
46b38458a docs: Update the rust version in the installation documentation
71491a69c runtime: move process wait logic to another function
92ebe61fe runtime: reap force killed processes
fdf0a7bb1 runtime-rs: fix the issues mentioned in the code review
1d823c4f6 runtime-rs: umount and permission controls in sandbox level
527b87141 runtime-rs: bind mount volumes in sandbox level
9ccf2ebe8 agent: add signal value to log
fb2c142f1 runtime-rs: fix some variable names and typos
737420469 kata-ctl: fix dependency version conflict
89574f03f workflow: call cargo in user's $PATH
d4321ab48 runtime: Add identification in version for runtime-rs
f7fc436be workflow: fix cargo-deny-runner.yaml syntax error
78532154d docs: Add description for guest SELinux support
c617bbe70 runtime: Pass SELinux policy for containers to the agent
935476928 agent: Add SELinux support for containers
a75f99d20 osbuilder: Create guest image for SELinux
a9c746f28 kernel: Add kernel configs for SELinux
86cb05883 snap: Fix snapcraft setup (unbreak snap releases)
f443b7853 build: update golang version to 1.19.3
e12db92e4 runk: Re-implement start operation using the agent codes
e723bad0a ci: let static checks don't depend on build
69aae0227 actions: use matrix to refactor static checks
a5e4cad4b kata-ctl: add host check for aarch64
2edbe389d runtime-rs: moving only vCPU threads into sandbox controller
340e24f17 actions: skip some job using "paths-ignore" filter
2426ea9bd doc: update runtime-rs "Build and Install"
67fe703ff runtime-rs: remove the version number from the commit display message
1d93a9346 fix(agent): fix iptables binary path in guest
1dfd845f5 runtime: go fix code for 1.19
cd85a44a0 tools: Remove extra tab spaces from kata deploy binaries script
cb199e0ec kernel: add CONFIG_X86_SGX into whitelist
4b45e1386 runtime: don't fail mkdir if the folder is already created
b987bbc57 runtime-rs: block on the current thread when setup the network
abb9ebeec package: add nydus to release artifacts
30a7ebf43 runtime: Log invalid devices in QEMU config
2539f3186 runtime: Use containerd v1.6.8
993d05a42 docs: change mount-info.json to mountInfo.json
d808adef9 runtime-rs: support vhost-vsock
6b2ef66f0 runtime-rs: add conditional compile for virt-sandbox persist
6c1e153a6 docs: update doc "NVIDIA GPU passthrough"
b53171b60 agent: check command before do test_ip_tables
a636d426d versions: update nydusd version
3bb145c63 runtime: Support virtiofs queue size for qemu and make it configurable
e80a9f09f utils: Add utility function to fetch the kernel version.
36545aa81 runtime: clh: Re-generate the client code
f4b02c224 versions: Upgrade to Cloud Hypervisor v28.0
e4a6fbadf docs: update doc "Setup swap device in guest kernel"
2f5f575a4 log-parser: Simplify check
d94718fb3 runtime: Fix gofmt issues
16b837509 golang: Stop using io/ioutils
66aa330d0 versions: Update golangci-lint
b3a4a1629 versions: bump containerd version
eab8d6be1 build: update golang version to 1.19.2
e80dbc15d runtime-rs: workaround Dragonball compilation problem
c3f1922df fix(fmt): fix cargo fmt to pass static check
a4099dab8 tools: Fix indentation of build static firecracker script
c46814b26 runtime-rs:support nydus v5 and v6
a04afab74 qemu: early exit from Check if the process was stopped
7e481f217 qemu: set stopped only if StopVM is successful
0e3ac66e7 clh: return faster with dead clh process from isClhRunning
9ef68e0c7 clh: fast exit from isClhRunning if the process was stopped
2631b08ff clh: don't try to stop clh multiple times
f45fe4f90 versions: update vmm-sys-util and related crates to v0.11.0
8be081730 tools: Fix indentation of build static virtiofsd script
f8f97c1e2 feat(shim-mgmt): iptables handler
29c75cf12 runtime-rs: delete all cargo patches
9f70a6949 tools: Remove empty spaces from build kernel script
57336835d dragonball: add more unit test for device manager
233370023 dragonball: add test utils.
3e9c3f12c docs: Fix configuration path
2adb1c182 Dragonball: enable mem_file_path config into hugetlbfs process
daeee26a1 cloud-hypervisor: Fix GetThreadIDs function
40d514aa2 github: Parallelise static checks
2508d39b7 runtime: added vcpus pinning logics Core VCPU threads pinning logics for issue 4476. Also provided docs.
fef8e92af runtime-rs:add hypervisor interface capabilities
27b191358 runtime-rs: blanks filled & fixes made to virtiofsd launch
990e6359b snap: Unbreak docker install
ca69a9ad6 snap: Use metadata for dependencies
df092185e runk: Upgrade libseccomp crate to v0.3.0 in Cargo.lock
16dca4ecd runk: Ignore an error when calling kill cmd with --all option
b74c18024 runtime-rs: fix shared volume permission issue
936fe35ac runtime-rs : fix shim source is ambiguous
0ed7da30d tools: Fix indentation of build static clh script
43fcb8fd0 virtiofsd: Not use "link-self-contained=yes" on s390x The compile option link-self-contained=yes asks rustc to use C library startup object files that come with the compiler, which are not available on the target s390x-unknown-linux-gnu. A build does not contain any startup files leading to a broken executable entry point (causing segmentation fault).
219919e9f docs: Fix volumeMounts in SGX usage example
c0f5bc81b cargo: Add Cargo.lock to version control
474927ec9 gitignore: Add gitignore file
699f821e1 utils: Add function to drop priveleges
a6fb4e2a6 versions: bump golangci-lint version
b015f34af runtime-rs: generate config files with the default target
d7bb4b551 agent: support systemd cgroup for kata agent
144efd1a7 docs: update rust runtime installation guide
abf4f9b29 docs: kata 3.0 Architecture fix readme content error
44d8de892 agent: remove redundant checks
9d286af7b versions: Update Cloud Hypervisor to b4e39427080
081ee4871 agent: use NLM_F_REPLACE replace NLM_F_EXCL in rtnetlink
e95089b71 kata-ctl: add basic cpu check for s390x
871d2cf2c kata-ctl: Limit running tests to x86 and use native-tls on s390x
cbd84c3f5 rustjail: Upgrade libseccomp crate to v0.3.0
748be0fe3 makefile: remove sudo when create symbolic link
227e717d2 qemu: Re-work static-build Dockerfile
72738dc11 agent: validate hugepage size is supported
f74e328ff Makefile: fix an typo in runtime-rs makefile
f205472b0 Makefile: regulate the comment style for the runtime-rs comments
9f2c7e47c Revert "kata-ctl: Disable network check on s390x"
ac403cfa5 doc: Update how-to-run-kata-containers-with-SNP-VMs.md
00981b3c0 kata-ctl: Disable network check on s390x
39363ffbf runtime: remove same function
c322d1d12 kata-ctl: arch: Improve check call
0bc5baafb snap: Build virtiofsd using the kata-deploy scripts
cb4ef4734 snap: Create a task for installing docker
7e5941c57 virtiofsd: Build inside a container
35d52d30f versions: Update TDX QEMU
4d9dd8790 runtime-rs: fix typo get_contaier_type to get_container_type
70676d4a9 kata-ctl: improve command descriptions for consistency
9eb73d543 versions: Update TDX kernel
00a42f69c kata-ctl: cargo: 2021 -> 2018
fb6327474 kata-ctl: rustfmt + clippy fixes
1f1901e05 dragonball: fix clippy warning for aarch64
a343c570e dragonball: enhance dragonball ci
6a64fb0eb ci: skip s390x for dragonball.
a743e37da Dragonball: delete redundant comments in blk_dev_mgr
2b345ba29 build: Add kata-ctl to tools list
f7010b806 kata-ctl: docs: Write basic documentation
862eaef86 docs: fix a typo in rust-runtime-installation-guide
26c043dee ci: Add dragonball test
781e604c3 docs: Reference kata-ctl README
15c343cbf kata-ctl: Don't rely on system ssl libs
c23584994 kata-ctl: clippy: Resolve warnings and reformat
133690434 kata-ctl: implement CLI argument --check-version-only
eb5423cb7 kata-ctl: switch to use clap derive for CLI handling
018aa899c kata-ctl: Add cpu check
7c9f9a5a1 kata-ctl: Make arch test run at compile time
b63ba66dc kata-ctl: Formatting tweaks
cca7e32b5 kata-ctl: Lint fixes to allow the branch to be built
8e7bb8521 kata-ctl: add code for framework for arch
303fc8b11 kata-ctl: Add unit tests cases
d0b33e9a3 versions: Add kata-ctl version entry
002b18054 kata-ctl: Add initial rust code for kata-ctl
b62b18bf1 dragonball: fix clippy warning
2ddc948d3 Makefile: add dragonball components.
3fe81fe4a dragonball-ut: use skip_if_not_root to skip root case
72259f101 dragonball: add more unit test for vmm actions
9717dc3f7 Dragonball: remove redundant comments in event manager
9c1ac3d45 runtime-rs: return port on agent-url req
89e62d4ed shim: Ensure pagesize is set when reporting hugetbl stats
8d4ced3c8 runtime-rs: support ephemeral storage for emptydir
046ddc646 readme: remove libraries mentioning
86ad832e3 runtime-rs: force shutdown shim process in it can't exit
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If the serial path is given, legacy_manager should create socket console
based on that path. Or the console should be created based on stdio.
Fixes: #5914
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
enable start container from bundle in this way
$ ls ./bundle
config.json rootfs
$ sudo ctr run -d --runtime io.containerd.kata.v2 --config bundle/config.json test_kata
Fixes:#5872
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
The client side is moved to libs. This is to solve the problem
that including clients will bring about messy dependencies.
Fixes: #5874
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Add some new ignore items to avoid local builds that cause git to track a lot of files
Fixes: #5900
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
When `HOST_ARCH` != `ARCH` unset `CC`
Specifying a foreign CC is incompatible with building libgit2. Thus after the RUSTFLAGS linker
has been set we can safely unset CC to avoid passing this value through the build.
Fixes: #5890
Signed-off-by: James Tumber <james.tumber@ibm.com>
After building the binary as usual with `cargo build` run it as follows.
It needs a configuration.toml in which only qemu keys `path`, `kernel`
and `initrd` will initially need to be set. Point them to respective
files e.g. from a kata distribution tarball.
It also needs to be launched from an exported container bundle
directory. One can be created by running
mkdir rootfs
podman export $(podman create busybox) | tar -C ./rootfs -xvf -
runc spec -b .
in a suitable directory.
Then launch the program like this:
KATA_CONF_FILE=/path/to/configuration-qemu.toml /path/to/shim-ctl
Fixes: #5817
Signed-off-by: Pavel Mores <pmores@redhat.com>
This does almost literally nothing so far apart from getting and setting
HypervisorConfig. It's mostly copied from/inspired by dragonball.
Signed-off-by: Pavel Mores <pmores@redhat.com>
DEFAULT_REGISTRY pre-registers many metrics that we don't need or have duplicated.
This PR uses a custom register for metrics without interference and ensures that
the registration process is executed only once when the program is running.
Fixes: #5255
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
In order to let upcall being used by Kata Container, we need to add
those patches into kernel build script.
Currently, only when experimental (-e) and hypervisor type dragonball
(-t dragonball) are both enabled, that the upcall patches will be
applied to build a 5.10 guest kernel.
example commands: sh ./build-kernel.sh -e -t dragonball -d setup
fixes: #5642
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Upcall is a direct communication tool between VMM and guest developed
upon vsock. The server side of the upcall is a driver in guest kernel
(kernel patches are needed for this feature) and it'll start to serve
the requests after the kernel starts. And the client side is in
Dragonball VMM , it'll be a thread that communicates with vsock through
uds.
We want to keep the lightweight of the VM through the implementation of
the upcall, through which we could achieve vCPU hotplug, virtio-mmio
hotplug without implementing complex and heavy virtualization features
such as ACPI virtualization.
fixes: #5642
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
If `loop` module is not probed, it causes error like "losetup: cannot find an unused loop device".
Fixes: #5887
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
If a pod of kata is deployed on a machine, after the machine restarts, the pod status of kata-deploy will be CrashLoopBackOff.
Fixes: #5868
Signed-off-by: SinghWang <wangxin_0611@126.com>
None of the host namespace paths make sense in the guest. Let's clear
them all before sending the spec to the agent.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We should test is_pid_namespace_enabled before amending the container
spec, where the pid namespace path is cleared and resulting
sandbox_pidns to always being false.
Fixes: #5881
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Strings in Rust don't have \0 at the end, but C does, which leads to `umount2`
in the libc can't get the correct path. Besides, calling `nix::mount::umount2`
to avoid using an unsafe block is a robust solution.
Fixes: #5871
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Standalone share fs should add virtiofs device in setup_device_before_start_vm
and return the storages to mount the directory in guest. And it uses
hypervisor's jailer root directly instead of jail config.
Besides, we tweaked the parameter, so it adapts to rust version virtiofsd
now. And its cache policy which forbids caching is "never" now, instead of
"none". Hence, we change the default cache mode.
Fixes: #5655
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
For now, rng init is too slow for kata3.0/dragonball. Enable
random_trust_cpu can speed up rng init when kernel boot.
Fixes: #5870
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Script to execute to build virtiofsd has been changed in #5426 but not in the doc. This commit update the developer guide.
Fixes: #5860
Signed-off-by: Mathias Flagey <mathiasflagey1201@gmail.com>
Cgroup manager for a container will always be created.
Thus, dropping the option for LinuxContainer.cgroup_manager
is feasible and could simplify the code.
Fixes: #5778
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Use pidfd_open and poll on newer versions of Linux to wait
for the process to exit. For older versions use existing wait logic
Fixes: #5617
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Fixed the issue when using nonblocking, the `tokio::io::copy()` needing
to handle EAGAIN, resulting in high CPU usage.
Fixes: #5740
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Removed the `Debug` trait for the `ShareFs` and etc. Renamed
`ShareFsMount::upgrade()` and `ShareFsMount::downgrade()` to
`upgrade_to_rw()` and `downgrade_to_ro()`. Protected `mounted_info_set`
with a mutex to avoid race conditions.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
This commit implemented umonut controls and permission controls. When a volume
is no longer referenced, it will be umounted immediately. When a volume mounted
with readonly permission and a new coming container needs readwrite permission,
the volume should be upgraded to readwrite permission. On the contrary, if a
volume with readwrite permission and no container needs readwrite, then the
volume should be downgraded.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
Implemented bind mount related managment on the sandbox side, involving bind
mount a volume if it's not mounted before, upgrade permission to readwrite if
there is a new container needs.
Fixes: #5588
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
Also added crate `runtime-rs/crates/runtimes` as dependency as it's
immediately depended upon by the `direct-volume` feature, see issue
5341 and PR 5467.
Fixes#5810
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
Call cargo in root's HOME may lead to permission error, should
call cargo installed in user's HOME/PATH.
Fixes: #5813
Signed-off-by: Bin Liu <bin@hyper.sh>
Now we are supporting two runtime/shim, the go version,
and the rust version, for debug purposes, we can
add an identification in the version info
to tell us which runtime/shim is used.
Fixes: #5806
Signed-off-by: Bin Liu <bin@hyper.sh>
Add the description about how to enable SELinux for containers
running inside the guest.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The kata-agent supports SELinux for containers inside the guest
to comply with the OCI runtime specification.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Create a guest image to support SELinux for containers inside the guest
if `SELINUX=yes` is specified. This works only if the guest rootfs is
CentOS and the init service is systemd, not the agent init. To enable
labeling the guest image on the host, selinuxfs must be mounted on the
host. The kata-agent will be labeled as `container_runtime_exec_t` type.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Add kernel configs related to SELinux in order to add the
support for containers running inside the guest.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Setup the snapcraft environment manually as the action we had been using
for this does not appear to be actively maintained currently.
Related to this, switch to specifying the snapcraft store credentials
using the `SNAPCRAFT_STORE_CREDENTIALS` secret. This unbreaks
`snapcraft upload`, which Canonical appear to have broken by removing
the previous facility.
Fixes: #5772.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This commit re-implements `start` operation by leveraging the agent codes.
Currently, `runk` has own `start` mechanism even if the agent already
has the feature to handle starting a container. This worsen the maintainability
and `runk` cannot keep up with the changes on the agent side easily.
Hence, `runk` replaces own implementations with agent's ones.
Fixes: #5648
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
For now, we can check if host support running kata by check if "/dev/kvm"
exist on aarch64.
Fixes: #5768
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
When using source code to compile runtime-rs,make the
documentation point out the detailed environment build
and compilation methods to avoid errors caused by related
dependent packages.
Fixes:#5757
Signed-off-by: Chen Taotao <chentt10@chinatelecom.cn>
The displayed commit message and version message are partially duplicated.
Remove the version number from the commit display message.
Fixes:#5735
Signed-off-by: Chen Taotao <chentt10@chinatelecom.cn>
Some rootfs put iptables-save and iptables-restore
under /usr/sbin instead of /sbin. This pr checks both
and returns the one exist.
Fixes: #5608
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.
Fixes: #5750
Signed-off-by: Bin Liu <bin@hyper.sh>
CONFIG_X86_SGX is introduced after kernel 5.11, and that config is a
default x86_64 config for Kata build-kernel.sh script.
But if we use -v to specify any kernel version below 5.11 will cause an
inevitable error because CONFIG_X86_SGX is not supported in older
kernels and that may cause problem for the situation if we need kernel
version below 5.11.
So I propose to put CONFIG_X86_SGX into whitelist.conf to avoid break
building guest kernel below 5.11.
fixes: #5741
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Use MkdirAll instead of Mkdir so it doesn't generate an
error when the folder is created by another process
Fixes#5713
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
As the increase of the I/O intensive tasks, two issues could be caused:
1. When the future is blocked, the current thread (which is in the network namespace)
might be take over by other tasks. After the future is finished, the thread take over
the current task might not be in the pod network namespace
2. When finish setting up the network, the current thread will be set back to the host namsapce.
But the task which be taken over would still stay in the pod network namespace
To avoid that, we need to block the future on the current thread.
Fixes:#5728
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
When the user tried to add new devices to the VM, there is no error info for the invalid
device. This PR adds a log record to the `appendDevices` for the invalid device of the
qemu config.
Fixes: #5719
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Let's follow the binary bump used in the CI and also bump the vendored
version of containerd to v1.6.8.
Fixes: #5722
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Rename old VsockConfig to HybridVsockConfig. And add VsockConfig to
support vhost-vsock. We follow kata's old way to try random vhost fd
for 50 times to generate uniqe fd.
Fixes: #5654
Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
test_ip_tables test depends on iptables tools. But we can't
ensure these tools are exist. it's better to skip the test
if there is no such tools.
Fixes: #5697
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.
Fixes: #5694
Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
Add functionality to get kernel version and related unit tests.
This is intended to be used in the kata-env command going forward.
Fixes: #5688
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #5683
Signed-off-by: Bo Chen <chen.bo@intel.com>
```
14:13:15 parse.go:306:5: S1009: should omit nil check; len() for github.com/kata-containers/kata-containers/src/tools/log-parser.kvPairs is defined as zero (gosimple)
14:13:15 if pairs == nil || len(pairs) == 0 {
14:13:15 ^
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems that bumping the version of golang and golangci-lint new format
changes are required.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's bump the golangci-lint in order to fix issues that popped up after
updating Golang to its 1.19.2 version.
Depends-on: github.com/kata-containers/tests#5257
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
v1.5.2 cannot be built from source by newer golang. Let's bump
containerd version to 1.6.8. The GO runtime dependency has
been moved to v1.6.6 for some time already.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
So that we get the latest language fixes.
There is little use to maitain compiler backward compatibility.
Let's just set the default golang version to the latest 1.19.2.
Fixes: #5494
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Since the upstream rust-vmm is changing its dependency style towards
caret requirements in these days (more information:
rust-vmm/vm-memory#199) and it breaks Dragonball compilation frequently.
rust-vmm is expected to finish the changes this week and in order to not
break Kata CI due to Dragonball's compilation error, we will add
Cargo.lock file into /src/dragonball first and remove it later when
rust-vmm is stable.
fixes: #5657
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning
Fixes: #5623
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.
Fixes: #5622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Since the upstream of vmm-sys-utils upgraded to 0.11.0, some crates
automatically upgrade to v0.11.0, and some stay at v0.10.0 ( depending
on how they write version dependency in Cargo toml` which causes the
compile error in runtime-rs.
In order to fix this problem, we need to upgrade all vmm-sys-util
dependencies in runtime-rs to v0.11.0.
fixes: #5636
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Support the handlers in runtime, which are used by kata-ctl iptables series of commands in runtime.
Fixes: #5370
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
The cargo patch in the cargo.toml seems to cause the whole runtime-rs
building time longer and also makes it harder to build runtime-rs in an
environment without the network
We should delete all patches from the cargo.toml file and publish all
the crates that was once patched.
fixes: #5614#5527#5526#5449
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
On install you generate a configuration-fc.toml
file when building the kata-runtime and
copy it to either /etc/kata-containers/configuration-fc.toml
or /usr/share/defaults/kata-containers/configuration-fc.toml.
To reflect that the path must be one of the above,
we can fix the path in doc.
Fixes: #5589
Signed-off-by: Mathis Joffre <mariusjoffre@gmail.com>
In the current Dragonball code, mem_file_path config is not used when
hugetlbfs is enabled.
In this commit we add mem_file_path into hugetlbfs enable process.
fixes: #5566
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Although introducing an awful amount of code duplication, let's
parallelise the static checks in order to reduce its time and the space
used in the VMs running those.
While I understand there may be ways to make the whole setup less
repetitive and error prone, I'm taking the approach of:
* Make it work
* Make it right
* Make it fast
So, it's clear that I'm only attempting to make it work, and I'd
appreciate community help in order to improve the situation here. But,
for now, this is a stopgap solution.
JFYI, the time needed for run the tests on the `main` branch went down
from ~110 minutes to ~60 minutes. Plus, we're not running those on a
single VM anymore, which decreases the change to hit the space limit.
Reference: https://github.com/kata-containers/kata-containers/actions/runs/3393468605/jobs/5640842041
Ideally, each one of the following tests should be also split into
smaller tests, each test for one component, for instance.
* static-checks
* compiler-checks
* unit-tests
* unit-tests-as-root
Fixes: #5585
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
1. be able to check does hypervisor support use block device, block
device hotplug, multi-queue, and share file
2. be able to set the hypervisor capability of using block device, block
device hotplug, multi-queue, and share file
Fixes: #5569
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
The 'config' argument to ShareVirtioFsStandalone::new() is now actually
used, taking care of an explicit TODO.
If a shared path doesn't exist in ShareVirtioFsStandalone::virtiofsd_args()
it is now created instead of returning an error, thus following
ShareVirtioFsInline's suit.
The '-o vhost_user_socket=...' command line argument doesn't seem to be
supported by newer versions of virtiofsd so we replace it with
'--socket-path' which should be functionally equivalent according to docs.
Fixes#5572
Signed-off-by: Pavel Mores <pmores@redhat.com>
It appears that _either_ the GitHub workflow runners have changed their
environment, or the Ubuntu archive has changed package dependencies,
resulting in the following error when building the snap:
```
Installing build dependencies: bc bison build-essential cpio curl docker.io ...
:
The following packages have unmet dependencies:
docker.io : Depends: containerd (>= 1.2.6-0ubuntu1~)
E: Unable to correct problems, you have held broken packages.
```
This PR uses the simplest solution: install the `containerd` and `runc`
packages. However, we might want to investigate alternative solutions in
the future given that the docker and containerd packages seem to have
gone wild in the Ubuntu GitHub workflow runner environment. If you
include the official docker repo (which the snap uses), a _subset_ of
the related packages is now:
- `containerd`
- `containerd.io`
- `docker-ce`
- `docker.io`
- `moby-containerd`
- `moby-engine`
- `moby-runc`
- `runc`
Fixes: #5545.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Rather than hard-coding the package manager into the docker part,
use the `build-packages` section to specify the parts package
dependencies in a distro agnostic manner.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The libseccomp crate was upgraded to v0.3.0 by 4696ead,
but `Cargo.lock` of runk wasn't updated by mistake.
So, this commit updates `Cargo.lock` of runk to the latest dependencies.
Fixes: #5487
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Ignore an error handling that is triggered when the kill command is called
with `--all option` to the stopped container.
High-level container runtimes such as containerd call the kill command with
`--all` option in order to terminate all processes inside the container
even if the container already is stopped. Hence, a low-level runtime
should allow `kill --all` regardless of the container state like runc.
This commit reverts to the previous behavior.
Fixes: #5555
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Fix the issue where share volumes always have readwrite permission even if
readonly permission is enough.
Fixes: #5549
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
In the documentation test, the name shim has multiple potential
sources of import, now give it a clear source.
Fixes: #5535
Signed-off-by: Chen TaoTao <chentt10@chinatelecom.cn>
The compile option link-self-contained=yes asks rustc to use
C library startup object files that come with the compiler,
which are not available on the target s390x-unknown-linux-gnu.
A build does not contain any startup files leading to a
broken executable entry point (causing segmentation fault).
Fixes: #5522
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The /dev/sgx is not mounted and the enclave is not available,
causing the demo job to report an error in the logs. Add volumeMounts to
container in order to have the device available in the container.
Fixes: #5514
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This function is meant to be used before operations
such as accessing network to make sure those operations
are not performed as a privilged user.
Fixes: #5331
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
There is little point to maintain backward compatiblity for
golangci-lint. Let's just use a unified version of it.
Fixes: #5512
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
1. Implemented a rust module for operating cgroups through systemd with the help of zbus (src/agent/rustjail/src/cgroups/systemd).
2. Add support for optional cgroup configuration through fs and systemd at agent (src/agent/rustjail/src/container.rs).
3. Described the usage and supported properties of the agent systemd cgroup (docs/design/agent-systemd-cgroup.md).
Fixes: #4336
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.
Fixes: #5492
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a basic s390x cpu check for the "sie" feature to be present.
Also re-enable cpu check testing.
Fixes: #5438
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
For s390x, use native-tls for reqwest because the rustls-tls/ring
dependency is not available for s390x.
Also exclude s390x, powerpc64le, and aarch64 from running the cpu
check due to the lack of the arch-specific implementation. In this
case, rust complains about unused functions in src/check.rs (both
normal and test context).
Fixes: #5438
Co-authored-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Differently than every single other bit that's part of our repo, QEMU
has been using a single Dockerfile that prepares an environment where
the project can be built, but *also* building the project as part of
that very same Dockerfile.
This is a problem, for several different reasons, including:
* It's very hard to have a reproducible build if you don't have an
archived image of the builder
* One cannot cache / ipload the image of the builder, as that contains
already a specific version of QEMU
* Every single CI run we end up building the builder image, which
includes building dependencies (such as liburing)
Let's split the logic into a new build script, and pass the build script
to be executed inside the builder image, which will be only responsible
for providing an environment where QEMU can be built.
Fixes: #5464
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
before setting a limit, otherwise paths may not be found.
guest supporting different hugepage size is more likely with peer-pods where
podvm may use different flavor.
Fixes: #5191
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
In runtime-rs makefile, we use
```
```
to let make help print out help information for variables and targets,
but later commits forgot this rule.
So we need to follow the previous rule and change the current comments.
fixes: #5413
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
If the needed libraries (for virtfs) are installed on the host,
QEMU will pick it up and enable it. If not installed and you
do not enable the flag, QEMU will just ignore it, and you end
up without 9p support. Enabling it explicitly will fail if the
needed libs are not installed so this way we can be sure that
it gets build.
Fixes: #5418
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
s390x apparently does not support rust-tls, which is required by the
network check (due to the `reqwest` crate dependency).
Disable the network check on s390x until we can find a solution to the
problem.
> **Note:**
>
> This fix is assumed to be a temporary one until we find a solution.
> Hence, I have not moved the network check code (which should be entirely
> generic) into an architecture specific module.
Fixes: #5435.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Rework the architecture-specific `check()` call by moving all the
conditional logic out of the function.
Fixes: #5402.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Let's build virtiofsd using the kata-deploy build scripts, which
simplifies and unifies the way we build our components.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's have the docker installation / configuration as part of its own
task, which can be set as a dependency of other tasks whcih may or may
not depend on docker.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When moving to building the CI artefacts using the kata-deploy scripts,
we've noticed that the build would fail on any machine where the tarball
wasn't officially provided.
This happens as rust is missing from the 1st layer container. However,
it's a very common practice to leave the 1st layer container with the
minimum possible dependencies and install whatever is needed for
building a specific component in a 2nd layer container, which virtiofsd
never had.
In this commit we introduce the second layer containers (yes,
comtainers), one for building virtiofsd using musl, and one for building
virtiofsd using glibc. The reason for taking this approach was to
actually simplify the scripts and avoid building the dependencies
(libseccomp, libcap-ng) using musl libc.
Fixes: #5425
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The previously used repo will be removed by Intel, as done with the one
used for TDX kernel. The TDX team has already worked on providing the
patches that were hosted atop of the QEMU commit with the following hash
4c127fdbe81d66e7cafed90908d0fd1f6f2a6cd0 as a tarball in the
https://github.com/intel/tdx-tools repo, see
https://github.com/intel/tdx-tools/pull/162.
On the Kata Containers side, in order to simplify the process and to
avoid adding hundreds of patches to our repo, we've revived the
https://github.com/kata-containers/qemu repo, and created a branch and a
tag with those hundreds of patches atop of the QEMU commit hash
4c127fdbe81d66e7cafed90908d0fd1f6f2a6cd0. The branch is called
4c127fdbe81d66e7cafed90908d0fd1f6f2a6cd0-plus-TDX-v3.1 and the tag is
called TDX-v3.1.
Knowing the whole background, let's switch the repo we're getting the
TDX QEMU from.
Fixes: #5419
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This change improves the command descriptions for kata-ctl and can avoid certain confusions in command functionality.
Fixes#5411
Signed-off-by: Tingzhou Yuan <tzyuan15@bu.edu>
The previously used repo has been removed by Intel. As this happened,
the TDX team worked on providing the patches that were hosted atop of
the v5.15 kernel as a tarball present in the
https://github.com/intel/tdx-tools repos, see
https://github.com/intel/tdx-tools/pull/161.
On the Kata Containers side, in order to simplify the process and to
avoid adding ~1400 kernel patches to our repo, we've revived the
https://github.com/kata-containers/linux repo, and created a branch and
a tag with those ~1400 patches atop of the v5.15. The branch is called
v5.15-plus-TDX, and the tag is called 5.15-plus-TDX (in order to avoid
having to change how the kernel builder script deals with versioning).
Knowing the whole background, let's switch the repo we're getting the
TDX kernel from.
Fixes: #5326
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Make this file conform to the standard rust layout conventions and
simplify the code as recommended by `clippy`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Build using the rust TLS implementation rather than the system ones.
This resolves the `reqwest` crate build failure: it doesn't appear to
build against the native libssl libraries due to Kata defaulting to
using the musl libc.
Fixes: #5387.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This kata-ctl argument returns the latest stable Kata
release by hitting github.com.
Adds check-version unit tests.
Fixes: #11
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Add architecture-specific code for x86_64 and generic calls handling
checks for CPU flags and attributes.
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Changed the `panic!()` call to a `compile_error!()` one to ensure it
fires at compile time rather than runtime.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add framework for different architectures for check. In the existing
kata-runtime check, the network checks do not appear to be
architecture-specific while the kernel module, cpu, and kvm checks do
have separate implementations for different architectures.
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
As we're switching to using the rust version of the kata-ctl, lets
provide with its own entry in the kata-ctl command line.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Commit-edited-by: James O. D. Hunt <james.o.hunt@intel.com>
handle_events for EventManager doesn't take max_events as arguments, so
we need to update the comments for it.
p.s. max_events is defined when initializing the EventManager.
fixes: #5382
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
- libs/kata-types: adjust default_vcpus correctly
- runtime-rs: delete duplicated PASSTHROUGH_FS_DIR const
- Enable ACRN hypervisor support for Kata 2.x release
- agent: reduce reference count for failed mount
- agent: don't exit early if signal fails due to ESRCH
- kata-sys-util: delete duplicated get_bundle_path
- packaging: Mount $HOME/.docker in the 1st layer container
- Upgrade to Cloud Hypervisor v27.0
- microvm: Remove kernel_irqchip=on option
- kata-sys-util: fix typo `unknow`
- dragonball: update ut for kernel config
- versions: Update gperf url to avoid libseccomp random failures
- versions: Update oci version
- dragonball: fix no "as_str" error on Arm
- tools: release: fix bogus version check
- runtime-rs: update Cargo.lock
- refactor(runtime-rs): Use RwLock in runtime-agent
- runtime-rs: fix shim close_io call to support kubectl cp
- runtime-rs: add comments for runtime-rs shared directory
- workflow: trigger test-kata-deploy with pull_request and fix workflow_dispatch
- Dragonball: update linux_loader to 0.6.0
- modify virtio_net_dev_mgr.rs wrong code comments
- docs: Update urls in runk documentation
- runtime-rs: support watchable mount
- runtime-rs: debug console support in runtime
- kata-deploy: ship the rustified runtime binary
- runtime-rs: define VFIO unbind path as a const
- runtime-rs: set agent timeout to 0 for stream RPCs
- Added SNP-Support for Kata-Containers
- packaging: fix typo in configure-hypervisor.sh
- runtime/runtime-rs: update dependency
- release: Revert kata-deploy changes after 3.0.0-rc0 release
- runtime-rs: add test for StaticResource
- runtime-rs: remove hardcoded string
- docs: add README for runtime-rs hypervisor crate
- runtime-rs: use Path.is_file to check regular files
- osbuilder: Export directory variables for libseccomp
- runtime-rs: add unit tests for network resource
- runtime-rs/resource: use macro to reduce duplicated code
- runtime-rs: fix incorrect comments
- kernel: Add crypto kernel config for s390
- Non-root hypervisor uid reuse bug
- Build-in Sandbox: update dragonball-sandbox dependencies
- docs: Update url in virtualization document
- dragonball: Fix problem that stdio console cannot connect to stdout
- runtime-rs: call TomlConfig's validate function after load
- feat(Shimmgmt): Shim management server and client
53f209af4 libs/kata-types: adjust default_vcpus correctly
ef5a2dc3b agent: don't exit early if signal fails due to ESRCH
435c8f181 acrn: Enable ACRN hypervisor support for Kata 2.x release
c31cf7269 agent: reduce reference count for failed mount
4da743f90 packaging: Mount $HOME/.docker in the 1st layer container
067e2b1e3 runtime: clh: Use the new API to boot with TDX firmware (td-shim)
5d63fcf34 runtime: clh: Re-generate the client code
fe6107042 versions: Upgrade to Cloud Hypervisor v27.0
17de94e11 microvm: Remove kernel_irqchip=on option
3aeaa6459 runtime-rs: delete duplicated PASSTHROUGH_FS_DIR const
43ae97233 kata-sys-util: delete duplicated get_bundle_path
ac0483122 kata-sys-util: fix typo `unknow`
a24127659 versions: Update gperf url to avoid libseccomp random failures
a617a6348 versions: Update oci version
6d585d591 dragonball: fix no "as_str" error on Arm
421729f99 tools: release: fix bogus version check
457b0beaf runtime-rs: update Cargo.lock
f89ada2de dragonball: update ut for kernel config
0e899669e runtime-rs: fix shim close_io call to support kubectl cp
96cf21fad runtime-rs: add comments for runtime-rs shared directory
9bd941098 docs: Update urls in runk documentation
90ecc015e Dragonball: update linux_loader to 0.6.0
4a763925e runtime-rs: support watchable mount
abc26b00b dragonball: modify wrong code comments modify virtio_net_dev_mgr.rs wrong code comments
20bcaf0e3 runtime-rs: set agent timeout to 0 for stream RPCs
274de024c docs: add README for runtime-rs hypervisor crate
a4a23457c osbuilder: Export directory variables for libseccomp
d663f110d kata-deploy: get the config path from cri options
c6b3dcb67 kata-deploy: support kata-deploy for runtime-rs
46965739a runtime-rs: remove hardcoded string
a394761a5 kata-deploy: add installation for runtime-rs
50299a329 refactor(runtime-rs): Use RwLock in runtime agent
9628c7df0 runtime: update runc dependency
7fbc88387 runtime-rs: drop dependency on rustc-serialize
bf2be0cf7 release: Revert kata-deploy changes after 3.0.0-rc0 release
e23bfd615 runtime-rs: make function name more understandable
426a43678 runtime-rs: add unit test and eliminate raw string
87959cb72 runtime-rs: debug console support in runtime
d55cf9ab7 docs: Update url in virtualization document
0399da677 runtime-rs: update dependencies
f6f19917a dragonball: update dragonball-sandbox dependencies
2caee1f38 runtime-rs: define VFIO unbind path as a const
3f65ff2d0 runtime-rs: fix incorrect comments
9670a3caa runtime-rs: use Path.is_file to check regular files
d9e6eb11a docs: Guide to use SNP-VMs with Kata-Containers
ded60173d runtime: Enable choice between AMD SEV and SNP
22bda0838 runtime: Support for AMD SEV-SNP VMs
a2bbd2942 kernel: Introduce SNP kernel
0e69405e1 docs: Developer-Guide updated
105eda5b9 runtime: Initrd path option added to config
a8a8a28a3 runtime-rs/resource: use macro to reduce duplicated code
7622452f4 Dragonball: Fix the problem about stdio console
208233288 runtime-rs: add test for StaticResource
adb33a412 packaging: fix typo in configure-hypervisor.sh
f91431987 runtime: store the user name in hypervisor config
86a02c5f6 kernel: Add crypto kernel config for s390
5cafe2177 runtime: make StopVM thread-safe
c3015927a runtime: add more debug logs for non-root user operation
5add50aea runtime-rs: timeout for shim management client
9f13496e1 runtime-rs: shim management client
aaf6d6908 runtime-rs: call TomlConfig's validate function after load
e891295e1 runtime-rs: shim management - agent-url
59aeb776b runtime-rs: shim management
a828292b4 runtime-rs: add unit tests for network resource
7676cde0c workflow: trigger test-kata-deploy with pull_request
f10827357 workflow: require PR num input on test-kata-deploy workflow_dispatch
428d6dc80 workflow: Revert "workflow: trigger test-kata-deploy with pull_request"
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This reverts commit 7676cde0c5.
It turns out that when triggerred from a PR, the docker login command is
failing with
```
Error: Cannot perform an interactive login from a non TTY device
```
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
With default_maxvcpus = 0 and default_vcpus = 1 settings, the
default_vcpus will be set to 0 and leads to starting fail.
The default_maxvcpus is not set correctly when it is set to 0,
and the default_vcpus is set to 0.
The correct action is setting default_maxvcpus to the max number
of CPUs or MAX_DRAGONBALL_VCPUS, and the default_vcpus should be
set to the desired value if the valuse is between 0 and
default_maxvcpus.
Fixes: #5110
Signed-off-by: Bin Liu <bin@hyper.sh>
ESRCH usually means the process has exited. In this case,
the execution should continue to kill remaining container processes.
Fixes: #5366
Signed-off-by: Feng Wang <feng.wang@databricks.com>
[Fix up cargo updates]
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.
Fixes#3027
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
The kata agent adds a reference for each storage object before mount
and skip mount again if the storage object is known. We need to
remove the object reference if mount fails.
Fixes: #5364
Signed-off-by: Feng Wang <feng.wang@databricks.com>
In order to ensure that the proxy configuration is passed to the 2nd
layer container, let's ensure the $HOME/.docker/config.json file is
exposed inside the 1st layer container.
For some reason which I still don't fully understand exporting
https_proxy / http_proxy / no_proxy was not enough to get those
variables exported to the 2nd layer container.
In this commit we're creating a "$HOME/.docker" directory, and removing
it after the build, in case it doesn't exist yet. The reason we do this
is to avoid docker not running in case "$HOME/.docker" doesn't exist.
This was not tested with podman, but if there's an issue with podman,
the issue was already there beforehand and should be treated as a
different problem than the one addressed in this commit.
Fixes: #5077
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The containerd stats method and metrics API are broken with Kata 2.5.x, the stats fail to load and the metrics API responds with status code 500
This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer` where the field `Pagesize` is not
completed by the `setHugetlbStats` method. In the case where multiple sized tables stats are reported, this causes containerd to register two metrics
with the same label set, rather than each being partitioned by the `page` label.
Fixes: #5316
Signed-off-by: Champ-Goblem <cameron@northflank.com>
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.
Fixes: #5309
Signed-off-by: Bo Chen <chen.bo@intel.com>
This release has been tracked in our new [roadmap project ](https://github.com/orgs/cloud-hypervisor/projects/6) as iteration v27.0.
**Community Engagement**
A new mailing list has been created to support broader community discussions.
Please consider [subscribing](https://lists.cloudhypervisor.org/g/dev/); an announcement of a regular meeting will be
announced via this list shortly.
**Prebuilt Packages**
Prebuilt packages are now available. Please see this [document](https://github.com/cloud-hypervisor/obs-packaging/blob/main/README.md)
on how to install. These packages also include packages for the different
firmware options available.
**Network Device MTU Exposed to Guest**
The MTU for the TAP device associated with a virtio-net device is now exposed
to the guest. If the user provides a MTU with --net mtu=.. then that MTU is
applied to created TAP interfaces. This functionality is also exposed for
vhost-user-net devices including those created with the reference backend.
**Boot Tracing**
Support for generating a trace report for the boot time has been added
including a script for generating an SVG from that trace.
**Simplified Build Feature Flags**
The set of feature flags, for e.g. experimental features, have been simplified:
* msvh and kvm features provide support for those specific hypervisors
(with kvm enabled by default),
* tdx provides support for Intel TDX; and although there is no MSHV support
now it is now possible to compile with the mshv feature,
* tracing adds support for boot tracing,
* guest_debug now covers both support for gdbing a guest (formerly gdb
feature) and dumping guest memory.
The following feature flags were removed as the functionality was enabled by
default: amx, fwdebug, cmos and common.
**Asynchronous Kernel Loading**
AArch64 has gained support for loading the guest kernel asynchronously like
x86-64.
**GDB Support for AArch64**
GDB stub support (accessed through --gdb under guest_debug feature) is now
available on AArch64 as well as as x86-64.
**Notable Bug Fixes**
* This version incorporates a version of virtio-queue that addresses an issue
where a rogue guest can potentially DoS the VMM,
* Improvements around PTY handling for virtio-console and serial devices,
* Improved error handling in virtio devices.
**Deprecations**
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives.
* Booting legacy firmware (compiled without a PVH header) has been deprecated.
All the firmware options (Cloud Hypervisor OVMF and Rust Hypervisor Firmware)
support booting with PVH so support for loading firmware in a legacy mode is no
longer needed. This functionality will be removed in the next release.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v27.0
Note: To have the new API of loading firmware for booting (e.g. boot
from td-shim), a specific commit revision after the v27.0 release is
used as the Cloud Hypervisor version from the 'versions.yaml'.
Fixes: #5309
Signed-off-by: Bo Chen <chen.bo@intel.com>
`kernel_irqchip` option doesn't seem to bring any benefits and, on the
contrary, its usage cause issues when using the microvm machine type.
With this in mind, let's remove it.
Fixes: #1984, #4386
Signed-off-by: norbjd <norbjd@users.noreply.github.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add support for ephemeral storage and k8s emptydir.
Depends-on:github.com/kata-containers/tests#5161
Fixes: #4730
Signed-off-by: Bin Liu <bin@hyper.sh>
This PR updates the gperf url to avoid random failures when installing
libseccomp as it seems that the mirrror url produces network random
failures in multiple CIs.
Fixes#5294
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Cmdline struct update in the latest linux-loader lib and its as_str
method is changed to as_cstring, thus we need fix it according whereas
the old as_str method is used.
Fixes: #5287
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Shell expands `*"rc"*` to the top-level `src` directory. This results
in comparing a version with a directory name. This doesn't make sense
and causes the script to choose the wrong branch of the `if`.
The intent of the check is actually to detect `rc` in the version.
Fixes: #5283
Signed-off-by: Greg Kurz <groug@kaod.org>
There are two duplicated mentioning of the rust libraries in README.md.
Let's just remove them all as the section is intended to list out core
Kata components rather than general libraries.
Fixes: #5275
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Since linux loader is updated in the Dragonball and the api for Cmdline
has been changed ( as_str() changed to as_cstring() ), we need to update
unit test in Dragonball.
fixes: #5277
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Add close_io to shim and call agent's close_stdin in close_io.
Depends-on:github.com/kata-containers/tests#5155
Fixes: #4729
Signed-off-by: Bin Liu <bin@hyper.sh>
Since linux-loader 0.4.0 and 0.5.0 is yanked due to null terminator bug,
we need to update linux-loader to 0.6.0.
And as_str() function should also be changed.
fixes: #5253
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
For stream RPCs:
- write_stdin
- read_stdout
- read_stderr
there should be no timeout (by setting it to 0).
Fixes: #5249
Signed-off-by: Bin Liu <bin@hyper.sh>
The qmp command of hotplug cpu failed error was hidden. It didn't friendly for
the user tracing the hotplug cpu error. The PR help us to improve the hotplug
cpu error log. Add real qemu command error log for `failed to hot add vCPUs`.
Through the error message, we can get the reason of the failed qmp command
for hotplug cpu operation.
Fixes: #5234
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
To avoid the random failures when we are building the rootfs as it seems
that it does not find the value for the libseccomp and gperf directory,
this PR export these variables.
Fixes#5232
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As 3.0.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This PR updates the url for the cloud hypervisor in the virtualization
document.
Fixes#5203
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In src/runtime-rs/crates/hypervisor/src/device/vfio.rs,
the path of new_id is defined as a const, but unbind is used
as a local variable, they should be unified to const.
Fixes: #5189
Signed-off-by: Bin Liu <bin@hyper.sh>
The guide describes how to set Kata-Containers up so that AMD SEV-SNP
encrypted VMs are used when deploying confidential containers.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
This commit adds AMD SEV-SNP as a confidential guest option to the
runtime. Information on required components such as OVMF, QEMU and
a kernel supporting SEV-SNP are defined in the versions file and
corresponding configs are added.
Note: The CPU model 'host' provided by the current SNP-QEMU does
not support all SNP capabilities yet, which is why this option is
changed to EPYC-v4.
Note: The guest's physical address space reduction specified with
ReducedPhysBits is 1. Details are can be found in Section 15.34.6
here https://www.amd.com/system/files/TechDocs/24593.pdfFixes#4437
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
Developer-Guide.md is updated to work using current golang versions.
Related Readmes are also updated.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
- runtime-rs: delete some allow(dead_code) attributes
- kata-types: don't check virtio_fs_daemon for inline-virtio-fs
- kata-types: change return type of getting CPU period/quota function
- runtime-rs: fix host device check pattern
- runtime-rs: remove meaningless comment
- runtime-rs: update rust runtime roadmap
- runk: Enable seccomp support by default
- config: add "inline-virtio-fs" as a "shared_fs" type
- runtime-rs: add README.md
- runk: Refactor container builder
- kernel: fix kernel tarball name for SEV
- libs/kata-types: replace tabs by spaces in comments
- gperf: point URL to mirror site
be242a3c3 release: Adapt kata-deploy for 3.0.0-rc0
156e1c324 runtime-rs: delete some allow(dead_code) attributes
62cf6e6fc runtime-rs: remove meaningless comment
bcf6bf843 runk: Enable seccomp support by default
2b1d05857 runtime-rs: fix host device check pattern
85b49cee0 runtime-rs: add README.md
36d805fab config: add "inline-virtio-fs" as a "shared_fs" type
b948a8ffe kernel: fix kernel tarball name for SEV
50f912615 libs/kata-types: replace tabs by spaces in comments
96c8be715 libs/kata-types: change return type of getting CPU period/quota
fc9c6f87a kata-types: don't check virtio_fs_daemon for inline-virtio-fs
968c2f6e8 runk: Refactor container builder
84268f871 runtime-rs: update rust runtime roadmap
566656b08 gperf: point URL to mirror site
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Some device types have the same definition, they can be implemented
by macro to reduce code.
And this commit also deleted the `peer_name` field of the structs that
is never been used.
Fixes: #5170
Signed-off-by: Bin Liu <bin@hyper.sh>
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.
Fixes: #5155
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Enable seccomp support in `runk` by default.
Due to this, `runk` is built with `gnu libc` by default
because the building `runk` with statically linked the `libseccomp`
and `musl` requires additional configurations.
Also, general container runtimes are built with `gnu libc` as
dynamically linked binaries by default.
The user can disable seccomp by `make SECCOMP=no`.
Fixes: #4896
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Let client side support timeout if the timeout value is set.
If timeout not set, execute directly.
Fixes: #5114
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
"inline-virtio-fs" is newly supported by kata 3.0 as a "shared_fs" type,
it should be described in configuration file.
"inline-virtio-fs" is the same as "virtio-fs", but it is running in
the same process of shim, does not need an external virtiofsd process.
Fixes: #5102
Signed-off-by: Bin Liu <bin@hyper.sh>
Add client side function(public), to establish http connections (PUT,
POST, GET) to the long standing shim mgmt server.
Fixes: #5114
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Add agent-url to its handler. The general framework of registering URL
handlers is done.
Fixes: #5114
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Add shim management http server and boot it as a light-weight thread
when the sandbox is created.
Fixes: #5114
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
period should have a type of u64, and quota should be i64, the
function of getting CPU period and quota from annotations should
use the same data type as function return type.
Fixes: #5100
Signed-off-by: Bin Liu <bin@hyper.sh>
Kata 3.0 introduced 3 new configurations under runtime section:
name="virt_container"
hypervisor_name="dragonball"
agent_name="kata"
Blank values will lead to starting to fail.
Adding default values will make user easy to migrate to kata 3.0.
Fixes: #5098
Signed-off-by: Bin Liu <bin@hyper.sh>
Refactor the container builder code (`InitContainer` and `ActivatedContainer`)
to make it easier to understand and to maintain.
The details:
1. Separate the existing `builder.rs` into an `init_builder.rs` and
`activated_builder.rs` to make them easy to read and maintain.
2. Move the `create_linux_container` function from the `builder.rs` to
`container.rs` because it is shared by the both files.
3. Some validation functions such as `validate_spec` from `builder.rs`
to `utils.rs` because they will be also used by other components as
utilities in the future.
Fixes: #5033
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
In the commit 54d6d01754 we ended up
removing the BUILD_SUFFIX argument passed to QEMU as it only seemed to
be used to generate the HYPERVISOR_NAME and PKGVERSION, which were added
as arguments to the dockerfile.
However, it turns out BUILD_SUFFIX is used by the `qemu-build-post.sh`
script, so it can rename the QEMU binary accordingly.
Let's just bring it back.
Fixes: #5078
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In some case the call of cleanup from shim to service manager will fail,
and the shim process will continue to running, that will make process leak.
This commit will force shutdown the shim process in case of any errors in
service crate.
Fixes: #5087
Signed-off-by: Bin Liu <bin@hyper.sh>
Following the instructions in guidance doc will result in the ECONNREFUSED,
thus we need to keep the unix socket address in the two commands consistent.
Fixes: #5085
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
Dockerfile cannot decipher multiple conditional statements in the main RUN call.
Cannot segregate statements in Dockerfile with '{}' braces without wrapping entire statement in 'bash -c' statement.
Dockerfile does not support setting variables by bash command.
Must set HYPERVISOR_NAME and PKGVERSION from parent script: build-base-qemu.sh
Fixes: #5078
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
amend_spec do two works:
- modify the spec
- check if the pid namespace is enabled
This make it confusable. So split it into two functions.
Fixes: #5062
Signed-off-by: Bin Liu <bin@hyper.sh>
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.
We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.
During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).
From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.
Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.
If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.
Fixes: #4847
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
gperf download fails intermittently.
Changing to mirror site will hopefully increase download reliability.
Fixes: #5057
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
00aadfe20a introduced a regression on
`make cc-tdx-kernel-tarball` as we stopped passing all the needed
information to the `build-kernel.sh` script, leading to requiring `yq`
installed in the container used to build the kernel.
This commit partially reverts the faulty one, rewritting it in a way the
old behaviour is brought back, without changing the behaviour that was
added by the faulty commit.
Fixes: #5043
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
- Initrd fixes for ubuntu systemd
- kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
- Fix kata-deploy to work on CI context
- github-actions: Auto-backporting
- runtime-rs: add support for core scheduling
- ci: Use versions.yaml for the libseccomp
- runk: Add cli message for init command
- agent: add some logs for mount operation
- Use iouring for qemu block devices
- logging: Replace nix::Error::EINVAL with more descriptive msgs
- kata-deploy: fix threading conflicts
- kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
- runtime-rs: support loading kernel modules in guest vm
- TDX: Get TDX working again with Cloud Hypervisor + a minor change on QEMU's code
- runk: Move delete logic to libcontainer
- runtime: cri-o annotations have been moved to podman
- Fix depbot reported rust crates dependency security issues
- UT: test_load_kernel_module needs root
- enable vmx for vm factory
- runk: add pause/resume commands
- kernel: upgrade guest kernel support to 5.19
- Drop-in cfg files support in runtime-rs
- agent: do some rollback works if case of do_create_container failed
- network: Fix error message for setting hardware address on TAP interface
- Upgrade to Cloud Hypervisor v26.0
- runtime: tracing: End root span at end of trace
- ci: Update libseccomp version
- dep: update nix dependency
- Updated the link target of CRI-O
- libs/test-utils: share test code by create a new crate
dc32c4622 osbuilder: fix ubuntu initrd /dev/ttyS0 hang
cc5f91dac osbuilder: add systemd symlinks for kata-agent
c08a8631e agent: add some logs for mount operation
0a6f0174f kernel: Ignore CONFIG_SPECULATION_MITIGATIONS for older kernels
6cf16c4f7 agent-ctl: fix clippy error
4b57c04c3 runtime-rs: support loading kernel modules in guest vm
dc90eae17 qemu: Drop unnecessary `tdx_guest` kernel parameter
d4b67613f clh: Use HVC console with TDX
c0cb3cd4d clh: Avoid crashing when memory hotplug is not allowed
9f0a57c0e clh: Increase API and SandboxStop timeouts for TDX
b535bac9c runk: Add cli message for init command
c142fa254 clh: Lift the sharedFS restriction used with TDX
bdf8a57bd runk: Move delete logic to libcontainer
a06d819b2 runtime: cri-o annotations have been moved to podman
ffd1c1ff4 agent-ctl/trace-forwarder: udpate thread_local dependency
69080d76d agent/runk: update regex dependency
e0ec09039 runtime-rs: update async-std dependency
763ceeb7b logging: Replace nix::Error::EINVAL with more descriptive msgs
4ee2b99e1 kata-deploy: fix threading conflicts
731d39df4 kernel: Add CONFIG_CGROUP_HUGETLB=y as part of the cgroup fragments
96d903734 github-actions: Auto-backporting
a6fbaac1b runk: add pause/resume commands
8e201501e kernel: fix for set_kmem_limit error
00aadfe20 kernel: SEV guest kernel upgrade to 5.19.2
0d9d8d63e kernel: upgrade guest kernel support to 5.19.2
57bd3f42d runtime-rs: plug drop-in decoding into config-loading code
87b97b699 runtime-rs: add filesystem-related part of drop-in handling
cf785a1a2 runtime-rs: add core toml::Value tree merging
92f7d6bf8 ci: Use versions.yaml for the libseccomp
f508c2909 runtime: constify splitIrqChipMachineOptions
2b0587db9 runtime: VMX is migratible in vm factory case
fa09f0ec8 runtime: remove qemuPaths
326f1cc77 agent: enrich some error code path
4f53e010b agent: skip test_load_kernel_module if non-root
3a597c274 runtime: clh: Use the new 'payload' interface
16baecc5b runtime: clh: Re-generate the client code
50ea07183 versions: Upgrade to Cloud Hypervisor v26.0
f7d41e98c kata-deploy: export CI in the build container
4f90e3c87 kata-deploy: add dockerbuild/install_yq.sh to gitignore
8ff5c10ac network: Fix error message for setting hardware address on TAP interface
338c28295 dep: update nix dependency
78231a36e ci: Update libseccomp version
34746496b libs/test-utils: share test code by create a new crate
3829ab809 docs: Update CRI-O target link
fcc1e0c61 runtime: tracing: End root span at end of trace
c1e3b8f40 govmm: Refactor qmp functions for adding block device
598884f37 govmm: Refactor code to get rid of redundant code
00860a7e4 qmp: Pass aio backend while adding block device
e1b49d758 config: Add block aio as a supported annotation
ed0f1d0b3 config: Add "block_device_aio" as a config option for qemu
b6cd2348f govmm: Add io_uring as AIO type
81cdaf077 govmm: Correct documentation for Linux aio.
a355812e0 runtime-rs: fixed bug on core-sched error handling
591dfa4fe runtime-rs: add support for core scheduling
09672eb2d agent: do some rollback works if case of do_create_container failed
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Updates versions of crossbeam-channel because 0.52.0 is a yanked package
(creators mark version as not for release except as a dependency for
another package)
Updates chrono to use >0.42.0 to avoid:
https://rustsec.org/advisories/RUSTSEC-2020-0159
Updates lz4-sys.
Signed-off-by: Derek Lee <derlee@redhat.com>
Adds oci under the src/libs workplace.
oci shares a Cargo.lock file with the rest of src/libs but was not
listed as a member of the workspace.
There is no clear reason why it is not included in the workspace, so
adding it so cargo-deny stop complaining
Signed-off-by: Derek Lee <derlee@redhat.com>
One of the checks done by cargo-deny is ensuring all crates have a valid
license. As the rust programs import each other, cargo.toml files
without licenses trigger the check. While I could disable this check
this would be bad practice.
This adds an Apache-2.0 license in the Cargo.toml files.
Some of these files already had a header comment saying it is an Apache
license. As the entire project itself is under an Apache-2.0 license, I
assumed all individual components would also be covered under that
license.
Signed-off-by: Derek Lee <derlee@redhat.com>
Adds cargo-deny to scan for vulnerabilities and license issues regarding
rust crates.
GitHub Actions does not have an obvious way to loop over each of the
Cargo.toml files. To avoid hardcoding it, I worked around the problem
using a composite action that first generates the cargo-deny action by
finding all Cargo.toml files before calling this new generated action in
the master workflow.
Uses recommended deny.toml from cargo-deny repo with the following
modifications:
ignore = ["RUSTSEC-2020-0071"]
because chrono is dependent on the version of time with the
vulnerability and there is no simple workaround
multiple-versions = "allow"
Because of the above error and other packages, there are instances
where some crates require different versions of a crate.
unknown-git = "allow"
I don't see a particular issue with allowing crates from other repos.
An alternative would be the manually set each repo we want in an
allow-git list, but I see this as more of a nuisance that its worth.
We could leave this as a warning (default), but to avoid clutter I'm
going to allow it.
If deny.toml needs to be edited in the future, here's the guide:
https://embarkstudios.github.io/cargo-deny/index.htmlFixes#3359
Signed-off-by: Derek Lee <derlee@redhat.com>
This PR updates the cni plugins version that is being used in the kata CI.
Fixes#5039
Depends-on: github.com/kata-containers/tests#5088
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This removes two options that are not needed (any longer). These
are not set for any kernel so they do not need to be ignored either.
Fixes#5035
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
Guest log is showing a hang on systemd getty start.
Adding symlink for /dev/ttyS0 resolves issue.
Fixes: #4932
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Somewhere is lack of log info, add more details about
the storage and log when error will help understand
what happened.
Fixes: #4962
Signed-off-by: Bin Liu <bin@hyper.sh>
TDX kernel is based on a kernel version which doesn't have the
CONFIG_SPECULATION_MITIGATIONS option.
Having this in the allow list for missing configs avoids a breakage in
the TDX CI.
Fixes: #4998
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Users can specify the kernel module to be loaded through the agent
configuration in kata configuration file or in pod anotation file.
And information of those modules will be sent to kata agent when
sandbox is created.
Fixes: #4894
Signed-off-by: Yushuo <y-shuo@linux.alibaba.com>
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.
With this in mind, let's just drop the kernel parameter.
Fixes: #4981
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.
Fixes: #4980
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The runtime will crash when trying to resize memory when memory hotplug
is not allowed.
This happens because we cannot simply set the hotplug amount to zero,
leading is to not set memory hotplug at all, and later then trying to
access the value of a nil pointer.
Fixes: #4979
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.
Till we find the root cause of the issue (which is *not* in the Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.
Fixes: #4978
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add cli message for init command to tell the user
not to run this command directly.
Fixes: #4367
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.
Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.
See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""
Fixes: #4977
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Move delete logic to `libcontainer` crate to make the code clean
like other commands.
Fixes: #4975
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Let's swith to depending on podman which also simplies indirect
dependency on kubernetes components. And it helps to avoid cri-o
security issues like CVE-2022-1708 as well.
Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
So that we bump several indirect dependencies like crossbeam-channel,
crossbeam-utils to bring in fixes to known security issues like CVE-2020-15254.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Replaces instances of anyhow!(nix::Error::EINVAL) with other messages to
make it easier to debug.
Fixes#954
Signed-off-by: Derek Lee <derlee@redhat.com>
Fix threading conflicts when kata-deploy 'make kata-tarball' is called.
Force the creation of rootfs tarballs to happen serially instead of in parallel.
Fixes: #4787
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Kata guest os cgroup is not work properly kata guest kernel config option
CONFIG_CGROUP_HUGETLB is not set, leading to:
root@clr-b08d402cc29d44719bb582392b7b3466 ls /sys/fs/cgroup/hugetlb/
ls: cannot access '/sys/fs/cgroup/hugetlb/': No such file or directory
Fixes: #4953
Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
An implementation of semi-automating the backporting
process.
This implementation has two steps:
1. Checking whether any associated issues are marked as bugs
If they do, mark with `auto-backport` label
2. On a successful merge, if there is a `auto-backport` label and there
are any tags of `backport-to-BRANCHNAME`, it calls an action that
cherry-picks the commits in the PR and automatically creates a PR to
those branches.
This action uses https://github.com/sqren/backport-github-actionFixes#3618
Signed-off-by: Derek Lee <derlee@redhat.com>
To make cgroup v1 and v2 works well, I use `cgroups::cgroup` in
`Container` to manager cgroup now. `CgroupManager` in rustjail has some
drawbacks. Frist, methods in Manager traits are not visiable. So we need
to modify rustjail and make them public. Second, CgrupManager.cgroup is
private too, and it can't be serialized. We can't load/save it in
status file. One solution is adding getter/setter in rustjail, then
create `cgroup` and set it when loading status. In order to keep the
modifications to a minimum in rustjail, I use `cgroups::cgroup`
directly. Now it can work on cgroup v1 or v2, since cgroup-rs do this
stuff.
Fixes: #4364#4821
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
kernel: Update SEV guest kernel to 5.19.2
Kernel 5.19.2 has all the needed patches for running SEV, thus let's update it and stop using the version coming from confidential-containers.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
To plug drop-in support into existing config-loading code in a robust
way, more specifically to create a single point where this needs to be
handled, load_from_file() and load_raw_from_file() were refactored.
Seeing as the original implemenations of both functions were identical
apart from adjust_config() calls in load_from_file(), load_from_file()
was reimplemented in terms of load_raw_from_file().
Fixes #4771
Signed-off-by: Pavel Mores <pmores@redhat.com>
The central function being added here is load() which takes a path to a
base config file and uses it to load the base config file itself, find
the corresponding drop-in directory (get_dropin_dir_path()), iterate
through its contents (update_from_dropins()) and load each drop-in in
turn and merge its contents with the base file (update_from_dropin()).
Also added is a test of load() which mirrors the corresponding test in
the golang runtime (TestLoadDropInConfiguration() in config_test.go).
Signed-off-by: Pavel Mores <pmores@redhat.com>
This is the core functionality of merging config file fragments into the
base config file. Our TOML parser crate doesn't seem to allow working
at the level of TomlConfig instances like BurntSushi, used in the Golang
runtime, does so we implement the required functionality at the level of
toml::Value trees.
Tests to verify basic requirements are included. Values set by a base
config file and not touched by a subsequent drop-in should be preserved.
Drop-in config file fragments should be able to change values set by the
base config file and add settings not present in the base. Conversion
of a merged tree into a mock TomlConfig-style structure is tested as
well.
Signed-off-by: Pavel Mores <pmores@redhat.com>
It would be nice to use `versions.yaml` for the maintainability.
Previously, we have been specified the `libseccomp` and the `gperf` version
directly in this script without using the `versions.yaml` because the current
snap workflow is incomplete and fails.
This is because snap CI environment does not have kata-cotnainers repository
under ${GOPATH}. To avoid the failure, the `rootfs.sh` extracts the libseccomp
version and url in advance and pass them to the `install_libseccomp.sh` as
environment variables.
Fixes: #4941
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Highlights from the Cloud Hypervisor release v26.0:
**SMBIOS Improvements via `--platform`**
`--platform` and the appropriate API structure has gained support for supplying
OEM strings (primarily used to communicate metadata to systemd in the guest)
**Unified Binary MSHV and KVM Support**
Support for both the MSHV and KVM hypervisors can be compiled into the same
binary with the detection of the hypervisor to use made at runtime.
**Notable Bug Fixes**
* The prefetchable flag is preserved on BARs for VFIO devices
* PCI Express capabilties for functionality we do not support are now filtered
out
* GDB breakpoint support is more reliable
* SIGINT and SIGTERM signals are now handled before the VM has booted
* Multiple API event loop handling bug fixes
* Incorrect assumptions in virtio queue numbering were addressed, allowing
thevirtio-fs driver in OVMF to be used
* VHDX file format header fix
* The same VFIO device cannot be added twice
* SMBIOS tables were being incorrectly generated
**Deprecations**
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives.
The top-level `kernel` and `initramfs` members on the `VmConfig` have been
moved inside a `PayloadConfig` as the `payload` member. The OpenAPI document
has been updated to reflect the change and the old API members continue to
function and are mapped to the new version. The expectation is that these old
versions will be removed in the v28.0 release.
**Removals**
The following functionality has been removed:
The unused poll_queue parameter has been removed from --disk and
equivalent. This was residual from the removal of the vhost-user-block
spawning feature.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v26.0Fixes: #4952
Signed-off-by: Bo Chen <chen.bo@intel.com>
The clone_tests_repo() in ci/lib.sh relies on CI variable to decide
whether to checkout the tests repository or not. So it is required to
pass that variable down to the build container of kata-deploy, otherwise
it can fail on some scenarios.
Fixes#4949
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The install_yq.sh is copied to tools/packaging/kata-deploy/local-build/dockerbuild
so that it is added in the kata-deploy build image. Let's tell git to
ignore that file.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
More and more Rust code is introduced, the test utils original in agent
should be made easy to share, move it into a new crate will make it
easy to share between different crates.
Fixes: #4925
Signed-off-by: Bin Liu <bin@hyper.sh>
vergen is a build dependency, but it is not being used.
we are processing ver/commit hash by make command, but not by vergen.
Fixes: #4920
Signed-off-by: Bin Liu <bin@hyper.sh>
- runtime-rs: fix design doc's typo
- docs: use curl as default downloader for runtime-rs
- runtime-rs: update Cargo.lock
- Fix some GitHub actions workflow issues
- versions: Update libseccomp version
- runtime-rs:merge runtime rs to main
- nydus: wait nydusd API server ready before mounting share fs
- versions: Update TD-shim due to build breakage
- agent-ctl: Add an empty [workspace]
- packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
- docs: Improve SGX documentation
- runtime: explicitly mark the source of the log is from qemu.log
- runtime: add unlock before return in sendReq
- docs: add back host network limitation
- runk: add ps sub-command
- Depends-on:github.com/kata-containers/tests#4986
- runtime-rs:update rtnetlink version
- runtime-rs:skip the build process when the arch is s390x
- docs: Improve SGX documentation
- agent: Use rtnetlink's neighbours API to add neighbors
- Bump TDX dependencies (QEMU and Kernel)
- OVMF / td-shim: Adjust final tarball location
- libs: fix CI error for protocols
- runtime-rs: merge main to runtime-rs
- packaging: Add support for building TDVF
- versions: Track and add support for building TD-shim
- versions: Upgrade rust version
- Merge Main into runtime-rs branch
- agent: log RPC calls for debugging
- runtime-rs: fix stop failed in azure
- Add support AmdSev build of OVMF
- runtime: Support for host cgroupv2
- versions: Update runc version
- qemu: Add liburing to qemu build
- runtime-rs: fix set share sandbox pid namespace
- Docs: fix tables format error
- versions: Update Firecracker version to v1.1.0
- agent: Fix stream fd's double close
- container: kill all of the processes in a container when it terminated
- fix network failed for kata ci
- runtime-rs: handle default_vcpus greator than default_maxvcpu
- agent: fix fd-double-close problem in ut test_do_write_stream
- runtime-rs: add functionalities support for macvlan and vlan endpoints
- Docs: add rust environment setup for kata 3.0
- rustjail: check result to let it return early
- upgrade nydus version
- support disable_guest_seccomp
- cgroups: remove unnecessary get_paths()
- versions: Update firecracker version
- kata-monitor: fix can't monitor /run/vc/sbs
- runtime-rs: fix sandbox_cgroup_only=false panic
- runtime-rs: fix ctr exit failed
- docs: add installation guide for kata 3.0
- runtime-rs: support functionalities of ipvlan endpoint
- runtime-rs: remove the value of hypervisor path in DB config
- kata-sys-util: upgrade nix version
- runtime-rs: fix some bugs to make runtime-rs on aarch64
- runk: Support `exec` sub-command
- runtime-rs: hypervisor part
- clh: Don't crash if no network device is set by the upper layer
- packaging: Rework how ${BUILD_SUFFIX} is used with the QEMU builder scripts
- versions: Update Cloud Hypervisor to v25.0
- Runtime-rs merge main
- kernel: Deduplicate code used for building TEE kernels
- runtime-rs: Dragonball-sandbox - add virtio device feature support for aarch64
- packaging: Simplify config path handling
- build: save lines for repository_owner check
- kata 3.0 Architecture
- Fix clh tarball build
- runtime-rs: built-in Dragonball sandbox part III - virtio-blk, virtio-fs, virtio-net and VMM API support
- runtime: Fix DisableSelinux config
- docs: Update URL links for containerd documentation
- docs: delete CRI containerd plugin statement
- release: Revert kata-deploy changes after 2.5.0-rc0 release
- tools/snap: simplify nproc
- action: revert commit message limit to 150 bytes
- runtime-rs: Dragonball sandbox - add Vcpu::configure() function for aarch64
- runtime-rs: makefile for dragonball
- runtime-rs:refactor network model with netlink
- runtime-rs: Merge Main into runtime-rs branch
- runtime-rs: built-in Dragonball sandbox part II - vCPU manager
- runtime-rs: runtime-rs merge main
- runtime-rs: built-in Dragonball sandbox part I - resource and device managers
caada34f1 runtime-rs: fix design doc's typo
b61dda40b docs: use curl as default downloader for runtime-rs
ca9d16e5e runtime-rs: update Cargo.lock
99a7b4f3e workflow: Revert "static-checks: Allow Merge commit to be >75 chars"
d14e80e9f workflow: Revert "docs: modify move-issues-to-in-progress.yaml"
1f4b6e646 versions: Update libseccomp version
8a4e69008 versions: Update TD-shim due to build breakage
065305f4a agent-ctl: Add an empty [workspace]
1444d7ce4 packaging: Create no_patches.txt for the SPR-BKC-PC-v9.6.x
2ae807fd2 nydus: wait nydusd API server ready before mounting share fs
c8d4ea84e docs: Improve SGX documentation
d8ad16a34 runtime: add unlock before return in sendReq
8bbffc42c runtime-rs:update rtnetlink version
c5452faec docs: Improve SGX documentation
389ae9702 runtime-rs:skip the test when the arch is s390x
945e02227 runtime-rs:skip the build process when the arch is s390x
8d1cb1d51 td-shim: Adjust final tarball location
62f05d4b4 ovmf: Adjust final tarball location
9972487f6 versions: Bump Kernel TDX version
c9358155a kernel: Sort the TDX configs alphabetically
dd397ff1b versions: Bump QEMU TDX version
230a22905 runk: add ps sub-command
889557ecb docs: add back host network limitation
c9b5bde30 versions: Track and build TDVF
e6a5a5106 packaging: Generate a tarball as OVMF build result
42eaf19b4 packaging: Simplify OVMF repo clone
4d33b0541 packaging: Don't hardcode "edk2" as the cloned repo's dir.
7247575fa runtime-rs:fix cargo clippy
b06bc8228 versions: Track and add support for building TD-shim
86ac653ba libs: fix CI error for protocols
81fe51ab0 agent: fix unittests for arp neighbors
845c1c03c agent: use rtnetlink's neighbours API to add neighbors
9b1940e93 versions: update rust version
638c2c416 static-build: Add AmdSev option for OVMF builder Introduces new build of firmware needed for SEV
f0b58e38d static-build: Add build script for OVMF
fa0b11fc5 runtime-rs: fix stdin hang in azure
5c3155f7e runtime: Support for host cgroup v2
4ab45e5c9 docs: Update support for host cgroupv2
326eb2f91 versions: Update runc version
f5aa6ae46 agent: Fix stream fd's double close problem
6e149b43f Docs: fix tables format error
85f4e7caf runtime: explicitly mark the source of the log is from qemu.log
56d49b507 versions: Update Firecracker version to v1.1.0
b3147411e runtime-rs:add unit test for set share pid ns
1ef3f8eac runtime-rs: set share sandbox pid namespace
57c556a80 runtime-rs: fix stop failed in azure
0e24f47a4 agent: log RPC calls for debugging
c825065b2 runtime-rs: fix tc filter setup failed
e0194dcb5 runtime-rs: update route destination with prefix
fa85fd584 docs: add rust environment setup for kata 3.0
896478c92 runtime-rs: add functionalities support for macvlan and vlan endpoints
df79c8fe1 versions: Update firecracker version
912641509 agent: fix fd-double-close problem in ut test_do_write_stream
43045be8d runtime-rs: handle default_vcpus greator than default_maxvcpu
0d7cb7eb1 agent: delete agent-type property in announce
eec9ac81e rustjail: check result to let it return early.
402bfa0ce nydus: upgrade nydus/nydus-snapshotter version
54f53d57e runtime-rs: support disable_guest_seccomp
4331ef80d Runtime-rs: add installation guide for rust-runtime
72dbd1fcb kata-monitor: fix can't monitor /run/vc/sbs.
e9988f0c6 runtime-rs: fix sandbox_cgroup_only=false panic
cebbebbe8 runtime-rs: fix ctr exit failed
62182db64 runtime-rs: add unit test for ipvlan endpoint
99654ce69 runtime-rs: update dbs-xxx dependencies
f4c3adf59 runtime-rs: Add compile option file
545ae3f0e runtime-rs: fix warning
19eca71cd runtime-rs: remove the value of hypervisor path in DB config
d8920b00c runtime-rs: support functionalities of ipvlan endpoint
2b01e9ba4 dragonball: fix warning
996a6b80b kata-sys-util: upgrade nix version
f690b0aad qemu: Add liburing to qemu build
d93e4b939 container: kill all of the processes in this container
3c989521b dragonball: update for review
274598ae5 kata-runtime: add dragonball config check support.
1befbe673 runtime-rs: Cargo lock for fix version problem
3d6156f6e runtime-rs: support dragonball and runtime-binary
3f6123b4d libs: update configuration and annotations
9ae2a45b3 cgroups: remove unnecessary get_paths()
be31207f6 clh: Don't crash if no network device is set by the upper layer
051181249 packaging: Add a "-" in the dir name if $BUILD_DIR is available
dc3b6f659 versions: Update Cloud Hypervisor to v25.0
201ff223f packaging: Use the $BUILD_SUFFIX when renaming the qemu binary
1a25afcdf kernel: Allow passing the URL to download the tarball
80c68b80a kernel: Deduplicate code used for building TEE kernels
d2584991e dragonball: fix dependency unused warning
458f6f42f dragonball: use const string for legacy device type
939959e72 docs: add Dragonball to hypervisors
f6f96b8fe dragonball: add legacy device support for aarch64
7a4183980 dragonball: add device info support for aarch64
f7ccf92dc kata-deploy: Rely on the configured config path
386a523a0 kata-deploy: Pass the config path to CRI-O
13df57c39 build: save lines for repository_owner check
57c2d8b74 docs: Update URL links for containerd documentation
e57a1c831 build: Mark git repos as safe for build
2551924bd docs: delete CRI containerd plugin statement
9cee52153 fmt: do cargo fmt and add a dependency for blk_dev
47a4142e0 fs: change vhostuser and virtio into const
e14e98bbe cpu_topo: add handle_cpu_topology function
5d3b53ee7 downtime: add downtime support
6a1fe85f1 vfio: add vfio as TODO
5ea35ddcd refractor: remove redundant by_id
b646d7cb3 config: remove ht_enabled
cb54ac6c6 memory: remove reserve_memory_bytes
bde6609b9 hotplug: add room for other hotplug solution
d88b1bf01 dragonball: update vsock dependency
dd003ebe0 Dragonball: change error name and fix compile error
38957fe00 UT: fix compile error in unit tests
11b3f9514 dragonball: add virtio-fs device support
948381bdb dragonball: add virtio-net device support
3d20387a2 dragonball: add virtio-blk device support
87d38ae49 Doc: add document for Dragonball API
2bb1eeaec docs: further questions related to upcall
026aaeecc docs: add FAQ to the report
fffcb8165 docs: update the content of the report
42ea854eb docs: kata 3.0 Architecture
efdb92366 build: Fix clh source build as normal user
0e40ecf38 tools/snap: simplify nproc
f59939a31 runk: Support `exec` sub-command
4d89476c9 runtime: Fix DisableSelinux config
090de2dae dragonball: fix the clippy errors.
a1593322b dragonball: add vsock api to api server
89b9ba860 dragonball: add set_vm_configuration api
95fa0c70c dragonball: add start microvm support
5c1ccc376 dragonball: add Vmm struct
4d234f574 dragonball: refactor code layout
cfd5dae47 dragonball: add vm struct
527b73a8e dragonball: remove unused feature in AddressSpaceMgr
3bafafec5 action: extend commit message line limit to 150 bytes
5010c643c release: Revert kata-deploy changes after 2.5.0-rc0 release
7120afe4e dragonball: add vcpu test function for aarch64
648d285a2 dragonball: add vcpu support for aarch64
7dad7c89f dragonball: update dbs-xxx dependency
07231b2f3 runtime-rs:refactor network model with netlink
c8a905206 build: format files
242992e3d build: put install methods in utils.mk
8a697268d build: makefile for dragonball config
9c526292e runtime-rs:refactor network model with netlink
71db2dd5b hotplug: add room for future acpi hotplug mechanism
8bb00a3dc dragonball: fix a bug when generating kernel boot args
2aedd4d12 doc: add document for vCPU, api and device
bec22ad01 dragonball: add api module
07f44c3e0 dragonball: add vcpu manager
78c971875 dragonball: add upcall support
7d1953b52 dragonball: add vcpu
468c73b3c dragonball: add kvm context
e89e6507a dragonball: add signal handler
b6cb2c4ae dragonball: add metrics system
e80e0c464 dragonball: add io manager wrapper
d5ee3fc85 safe-path: fix clippy warning
93c10dfd8 runtime-rs: add crosvm license in Dragonball
dfe6de771 dragonball: add dragonball into kata README
39ff85d61 dragonball: green ci
71f24d827 dragonball: add Makefile.
a1df6d096 Doc: Update Dragonball Readme and add document for device
8619f2b3d dragonball: add virtio vsock device manager.
52d42af63 dragonball: add device manager.
c1c1e5152 dragonball: add kernel config.
6850ef99a dragonball: add configuration manager.
0bcb422fc dragonball: add legacy devices manager
3c45c0715 dragonball: add console manager.
3d38bb300 dragonball: add address space manager.
aff604055 dragonball: add resource manager support.
8835db6b0 dragonball: initial commit
9cb15ab4c agent: add the FSGroup support
ff7874bc2 protobuf: upgrade the protobuf version to 2.27.0
06f398a34 runtime-rs: use withContext to evaluate lazily
fd4c26f9c runtime-rs: support network resource
4be7185aa runtime-rs: runtime part implement
10343b1f3 runtime-rs: enhance runtimes
9887272db libs: enhance kata-sys-util and kata-types
3ff0db05a runtime-rs: support rootfs volume for resource
234d7bca0 runtime-rs: support cgroup resource
75e282b4c runtime-rs: hypervisor base define
bdfee005f runtime-rs: service and runtime framework
4296e3069 runtime-rs: agent implements
d3da156ee runtime-rs: uint FsType for s390x
e705ee07c runtime-rs: update containerd-shim-protos to 0.2.0
8c0a60e19 runtime-rs: modify the review suggestion
278f843f9 runtime-rs: shim implements for runtime-rs
641b73610 libs: enhance kata-sys-util
69ba1ae9e trans: fix the issue of wrong swapness type
d2a9bc667 agent: agent-protocol support async
aee9633ce libs/sys-util: provide functions to execute hooks
8509de0ae libs/sys-util: add function to detect and update K8s emptyDir volume
6d59e8e19 libs/sys-util: introduce function to get device id
5300ea23a libs/sys-util: implement reflink_copy()
1d5c898d7 libs/sys-util: add utilities to parse NUMA information
87887026f libs/sys-util: add utilities to manipulate cgroup
ccd03e2ca libs/sys-util: add wrappers for mount and fs
45a00b4f0 libs/sys-util: add kata-sys-util crate under src/libs
48c201a1a libs/types: make the variable name easier to understand
b9b6d70aa libs/types: modify implementation details
05ad026fc libs/types: fix implementation details
d96716b4d libs/types:fix styles and implementation details
6cffd943b libs/types:return Result to handle parse error
6ae87d9d6 libs/types: use contains to make code more readable
45e5780e7 libs/types: fixed spelling and grammer error
2599a06a5 libs/types:use include_str! in test file
8ffff40af libs/types:Option type to handle empty tomlconfig
626828696 libs/types: add license for test-config.rs
97d8c6c0f docs: modify move-issues-to-in-progress.yaml
8cdd70f6c libs/types: change method to update config by annotation
e19d04719 libs/types: implement KataConfig to wrap TomlConfig
387ffa914 libs/types: support load Kata agent configuration from file
69f10afb7 libs/types: support load Kata hypervisor configuration from file
21cc02d72 libs/types: support load Kata runtime configuration from file
5b89c1df2 libs/types: add kata-types crate under src/libs
4f62a7618 libs/logging: fix clippy warnings
6f8acb94c libs: refine Makefile rules
7cdee4980 libs/logging: introduce a wrapper writer for logging
426f38de9 libs/logging: implement rotator for log files
392f1ecdf libs: convert to a cargo workspace
575df4dc4 static-checks: Allow Merge commit to be >75 chars
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Static resource management should be default to false. If default to be
true, later update sandbox operation, e.g. resize, will not work.
Fixes: #4742
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
The root span should exist the duration of the trace. Defer ending span
until the end of the trace instead of end of function. Add the span to
the service struct to do so.
Fixes#4902
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
As route model is used for specific internal scenario, and it's not for
the general requirement.
Fixes:#4838
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
This PR updates the libseccomp version at the versions.yaml that is
being used in the kata CI.
Fixes#4858
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
"We need a newer nightly 1.62 rust to deal with the change
rust-lang/libc@576f778 on crate libc which breaks the compilation."
This comes from the a pull-request raised on TD-shim repo,
https://github.com/confidential-containers/td-shim/pull/354, which fixes
the issues with the commit being used with Kata Containers.
Let's bump to a newer commit of TD-shim and to a newer version of the
nightly toolchain as part of our versions file.
Fixes: #4840
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
"An empty [workspace] can be used with a package to conveniently create a
workspace with the package and all of its path dependencies", according
to the https://doc.rust-lang.org/cargo/reference/workspaces.html
This is also matches with the suggestion provided by the Cargo itself,
due to the errors faced with the Cloud Hypervisor CI:
```
10:46:23 this may be fixable by adding `go/src/github.com/kata-containers/kata-containers/src/tools/agent-ctl` to the `workspace.members` array of the manifest located at: /tmp/jenkins/workspace/kata-containers-2-clh-PR/Cargo.toml
10:46:23 Alternatively, to keep it out of the workspace, add the package to the `workspace.exclude` array, or add an empty `[workspace]` table to the package's manifest.
```
Fixes: #4843
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The file was added as part of the commit that tested this changes in the
CCv0 branch, but forgotten when re-writing it to the `main` branch.
Fixes: #4841
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.
Fixes: #4710
Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Get rid of redundant return values from function.
args and blockdevArgs used to return different values to maintain
compatilibity between qemu versions. These are exactly the same now.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Remove line about annotations support in CRI-O and containerd since it
has been supported for a couple years.
Fixes#4819
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.
Fixes:kata-containers#4822
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Update documentation with details regarding
intel-device-plugins-for-kubernetes setup and dependencies.
Fixes#4819
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
github.com/kata-containers/tests#4986.To avoid returning an error when
running the ci, we just skip the test if the arch is s390x
Fixes: #4816
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
github.com/kata-containers/tests#4986.To avoid returning an error when running the ci, we just skip the build
process if the arch is s390x
Fixes: #4816
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
io_uring was introduced as a new kernel IO interface in kernel 5.1.
It is designed for higher performance than the older Linux AIO API.
This feature was added in qemu 5.0.
Fixes#4645
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Let's create the td-shim tarball in the directory where the script was
called from, instead of doing it in the $DESTDIR.
This aligns with the logic being used for creating / extracting the
tarball content, which is already in use by the kata-deploy local build
scripts.
Fixes: #4809
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's create the OVMF tarball in the directory where the script was
called from, instead of doing it in the $DESTDIR.
This aligns with the logic being used for creating / extracting the
tarball content, which is already in use by the kata-deploy local build
scripts.
Fixes: #4808
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The latest kernel with TDX support should be pulled from a different
repo (https://github.com/intel/linux-kernel-dcp, instead of
https://github.com/intel/tdx), and the latest version to be used is
SPR-BKC-PC-v9.6.
With the new version being used, let's make sure we enable the
INTEL_TDX_ATTESTATION config option, and all the dependencies needed to
do so.
Fixes: #4803
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's just re-order the TDX configs alphabetically. No new config has
been added or removed, thus no need to bump the kernel version.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ps command supprot two formats, `json` and `table`. `json` format just
outputs pids in the container. `table` format will use `ps` utilty in
the host, search and output all processes in the container. Add a struct
`container` to represent a spawned container. Move the `kill`
implemention from kill.rs as a method of `container`.
Fixes: #4361
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Linux 5.14 supports core scheduling to have better security control
for SMT siblings. This PR supports that.
Fixes: #4429
Signed-off-by: Ji-Xinyou <jerryji0414@outlook.com>
Kata Containers doesn't support host network namespace,
it's a common issue for new users. The limitation
is deleted, this commit will add them back.
Also, Docker has support to run containers using
Kata Containers, delete Docker from not support list.
This commit reverts parts of #3710Fixes: #4794
Signed-off-by: Bin Liu <bin@hyper.sh>
TDVF is the firmware used by QEMU to start TDX capable VMs. Let's start
tracking it as it'll become part of the Confidential Containers sooner
or later.
TDVF lives in the public https://github.com/tianocore/edk2-staging repo
and we're using as its version tags that are consumed internally at
Intel.
Fixes: #4624
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of having as a result the directory where OVMF artefacts where
installed, let's follow what we do with the other components and have a
tarball as a result of the OVMF build.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of cloning the repo, and then switching to a specific branch,
let's take advantage of `--branch` and directly clone the specific
branch / tag.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As TDVF comes from a different repo, the edk2-staging one, we cannot
simply hardcode the name. Instead, let's get the name of the directory
from name of the git repo.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.
Fixes: #4790
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
TD-shim is a simplified TDX virtual firmware, used by Cloud Hypervisor,
in order to create a TDX capable VM.
TD-shim is heavily under development, and is hosted as part of the
Confidential Containers project:
https://github.com/confidential-containers/td-shim
The version chosen for this commit, is a version that's being tested
inside Intel, but we, most likely, will need to change it before we have
it officially packaged as part of an official release.
Fixes: #4779
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Bump rtnetlink version from 0.8.0 to 0.11.0. Use rtnetlinks's API to
add neighbors and fix issues to adapt new verson of rtnetlink.
Fixes: #4607
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.
Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Fixes#4764
versions: update rust version to fix ccv0 attestation-agent build error
static-checks: kata tools, libs, and agent fixes
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
In some cases do_create_container may return an error, mostly due to
`container.start(process)` call. This commit will do some rollback
works if this function failed.
Fixes: #4749
Signed-off-by: Bin Liu <bin@hyper.sh>
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users don't know which logs are from qemu.log
and shim itself. Adding some additional messages will
help users to distinguish these logs.
Fixes: #4745
Signed-off-by: Bin Liu <bin@hyper.sh>
We can log all RPC calls to the agent for debugging purposes
to check which RPC is called, which can help us to understand
the container lifespan.
Fixes: #4738
Signed-off-by: liubin <liubin0329@gmail.com>
when the default_vcpus is greater than the default_maxvcpus, the default
vcpu number should be set equal to the default_maxvcpus.
Fixes: #4712
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
When run with configuration `sandbox_cgroup_only=false`, we will call
`gen_overhead_path()` as the overhead path. The `cgroup-rs` will push
the path with the subsystem prefix by `PathBuf::push()`. When the path
has prefix “/” it will act as root path, such as
```
let mut path = PathBuf::from("/tmp");
path.push("/etc");
assert_eq!(path, PathBuf::from("/etc"));
```
So we shoud not set overhead path with prefix "/".
Fixes: #4687
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
During use, there will be cases where the container is in the stop state
and get another stop. In this case, the second stop needs to be ignored.
Fixes: #4683
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Update dbs-xxx commit ID for aarch64 in runtime-rs/Cargo.toml file to add
dependencies for aarch64.
Fixes: #4676
Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Module anyhow::anyhow is only used on x86_64 architecture in
crates/hypervisor/src/device/vfio.rs file.
Fixes: #4676
Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
As a built in VMM, Path, jailer path, ctlpath are not needed for
Dragonball. So we don't generate those value in Makefile.
Fixes: #4677
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
New nix is supporting UMOUNT_NOFOLLOW, upgrade nix
version to use this flag instead of the self-defined flag.
Fixes: #4670
Signed-off-by: liubin <liubin0329@gmail.com>
io_uring is a Linux API for asynchronous I/O introduced in qemu 5.0.
It is designed to better performance than older aio API.
We could leverage this in order to get better storage performance.
We should be adding liburing-dev to qemu build to leverage this feature.
However liburing-dev package is not available in ubuntu 20.04,
it is avaiable in 22.04.
Upgrading the ubuntu version in the dockerfile to 22.04 is causing
issues in the static qemu build related to libpmem.
So instead we are building liburing from source until those build issues
are solved.
Fixes: #4645
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
When a container terminated, we should make sure there's no processes
left after destroying the container.
Before this commit, kata-agent depended on the kernel's pidns
to destroy all of the process in a container after the 1 process
exit in a container. This is true for those container using a
separated pidns, but for the case of shared pidns within the
sandbox, the container exit wouldn't trigger the pidns terminated,
and there would be some daemon process left in this container, this
wasn't expected.
Fixes: #4663
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
1. support annotation for runtime.name, hypervisor_name, agent_name.
2. fix parse memory from annotation
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Change get_mounts to get paths from a borrowed argument rather than
calling get_paths a second time.
Fixes#3768
Signed-off-by: Derek Lee <derlee@redhat.com>
Currently $BUILD_DIR will be used to create a directory as:
/opt/kata/share/kata-qemu${BUILD_DIR}
It means that when passing a BUILD_DIR, like "foo", a name would be
built like /opt/kata/share/kata-qemufoo
We should, instead, be building it as /opt/kata/share/kata-qemu-foo.
Fixes: #4638
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of always naming the binary as "-experimental", let's take
advantage of the $BUILD_SUFFIX that's already passed and correctly name
the binary according to it.
Fixes: #4638
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Passing the URL to be used to download the kernel tarball is useful in
various scenarios, mainly when doing a downstream build, thus let's add
this new option.
This new option also works around a known issue of the Dockerfile used
to build the kernel not having `yq` installed.
Fixes: #4629
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no need to have the entire function for building SEV / TDX
duplicated.
Let's remove those functions and create a `get_tee_kernel()` which takes
the TEE as the argument.
Fixes: #4627
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fix the warning "unused import: `dbs_arch::gic::Error as GICError`" and
"unused import: `dbs_arch::gic::GICDevice`" in file src/vm/mod.rs when
compiling.
Fixes: #4544
Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
As string "com1", "com2" and "rtc" are used in two files
(device_manager/mod.rs and device_manager/legacy.rs), we use public
const variables COM1, COM2 and RTC to replace them respectively.
Fixes: #4544
Signed-off-by: xuejun-xj <jiyunxue@alibaba.linux.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Implement generate_virtio_device_info() and
get_virtio_mmio_device_info() functions su support the mmio_device_info
member, which is used by FDT.
Fixes: #4544
Signed-off-by: xuejun-xj <jiyunxue@linux.alibaba.com>
Signed-off-by: jingshan <jingshan@linux.alibaba.com>
Instead of passing a `KATA_CONF_FILE` environament variable, let's rely
on the configured (in the container engine) config path, as both
containerd and CRI-O support it, and we're using this for both of them.
Fixes: #4608
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we're already doing for containerd, let's also pass the configuration
path to CRI-O, as all the supported CRI-O versions do support this
configuration option.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
repository_owner check in docs-url-alive-check.yaml now is specified for each step, it can be in job level to save lines.
Fixes: #4611
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
This is not an issue when the build is run as non-privilged user.
Marking these as safe in case where the build may be run as root
or some other user.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
There is no independent CRI containerd plugin for new containerd,
the related documentation should be updated too.
Fixes: #4605
Signed-off-by: liubin <liubin0329@gmail.com>
remove redundant by_id in get_vm_by_id_mut and get_vm_by_id. They are
optimized to get_vm_mut and get_vm.
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Since cpu topology could tell whether hyper thread is enabled or not, we
removed ht_enabled config from VmConfigInfo
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Change error name from `StartMicrovm` to `StartMicroVm`,
`StartMicrovmError` to `StartMicroVmError`.
Besides, we fix a compile error in config_manager.
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
1. Explain why the current situation is a problem.
2. We are beyond a simple introduction now, it's a real proposal.
3. Explain why you think it is solid, and fix a grammatical error.
4. The Rust rationale does not really belong to the initial paragraph.
Also, I rephrased it to highlight the contrast with Go and the Kata community's
past experience switching to Rust for the agent.
Fixes:#4193
Signed-off-by: Christophe de Dinechin <christophe@dinechin.org>
While running make as non-privileged user, the make errors out with
the following message:
"INFO: Build cloud-hypervisor enabling the following features: tdx
Got permission denied while trying to connect to the Docker daemon
socket at unix:///var/run/docker.sock: Post
"http://%2Fvar%2Frun%2Fdocker.sock/v1.24/images/create?fromImage=cloudhypervisor%2Fdev&tag=20220524-0":
dial unix /var/run/docker.sock: connect: permission denied"
Even though the user may be part of docker group, the clh build from
source does a docker in docker build. It is necessary for the user of
the nested container to be part of docker build for the build to
succeed.
Fixes#4594
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Replaces calls of nproc with nproc with
nproc ${CI:+--ignore 1}
to run nproc with one less processing unit than the maximum to prevent
DOS-ing the local machine.
If process is being run in a container (determined via whether $CI is
null), all processing units avaliable will be used.
Fixes#3967
Signed-off-by: Derek Lee <derlee@redhat.com>
`exec` will execute a command inside a container which exists and is not
frozon or stopped. *Inside* means that the new process share namespaces
and cgroup with the container init process. Command can be specified by
`--process` parameter to read from a file, or from other parameters such
as arg, env, etc. In order to be compatible with `create`/`run`
commands, I refactor libcontainer. `Container` in builder.rs is divided
into `InitContainer` and `ActivatedContainer`. `InitContainer` is used
for `create`/`run` command. It will load spec from given bundle path.
`ActivatedContainer` is used by `exec` command, and will read the
container's status file, which stores the spec and `CreateOpt` for
creating the rustjail::LinuxContainer. Adapt the spec by replacing the
process with given options and updating the namesapces with some paths
to join the container. I also rename the `ContainerContext` as
`ContainerLauncher`, which is only used to spawn process now. It uses
the `LinuxContaier` in rustjail as the runner. For `create`/`run`, the
`launch` method will create a new container and run the first process.
For `exec`, the `launch` method will spawn a process which joins a
container.
Fixes#4363
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.
Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
As 2.5.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
- Drop in cfg files support
- agent: enhance get handled signal
- oci: fix serde skip serializing condition
- agent: Run OCI poststart hooks after a container is launched
- agent: Replace some libc functions with nix ones
- runtime: overwrite mount type to bind for bind mounts
- build: Set safe.directory for runtime repo
- ci/cd: update check-commit-message
- Set safe.directory against tests repository
- runtime: delete Console from Cmd type
- Add `default_maxmemory` config option
- shim: set a non-zero return code if the wait process call failed.
- Refactor how hypervisor config validation is handled
- packaging: Remove unused kata docker configure script
- kata-with-k8s: Add cgroupDriver for containerd
- shim: support shim v2 logging plugin
- device package cleanup/refactor
- versions: Update kernel to latest LTS version 5.15.48
- agent: Allow BUILD_TYPE=debug
- Fix clippy warnings and update agent's vendored code
- block: Leverage multiqueue for virtio-block
- kernel: Add CONFIG_EFI=y as part of the TDX fragments
- runtime: Add heuristic to get the right value(s) for mem-reserve
- runtime: enable sandbox feature on qemu
- snap: fix snap build on ppc64le
- packaging: Remove unused publish kata image script
- rootfs: Fix chronyd.service failing on boot
- tracing: Remove whitespace from root span
- workflow: Removing man-db, workflow kept failing
- docs: Update outdated URLs and keep them available
- runtime: fix error when trying to parse sandbox sizing annotations
- snap: Fix debug cli option
- deps: Resolve dependabot bumps of containerd, crossbeam-utils, regex
- Allow Cloud Hypervisor to run under the `container_kvm_t`
- docs: Update containerd url link
- agent: refactor reading file timing for debugging
- safe-path: fix clippy warning
- kernel building: efi_secret module
- runtime: Switch to using the rust version of virtiofsd (all arches but powerpc)
- shim: change the log level for GetOOMEvent call failures
- docs: Add more kata monitor details
- Allow io.katacontainers.config.hypervisor.enable_iommu annotation by …
- versions: Bump virtiofsd to v1.3.0
- docs: Add storage limits to arch doc
- docs: Update source for cri-tools
- tools: Enable extra detail on error
- docs: Add agent-ctl examples section
f4eea832a release: Adapt kata-deploy for 2.5.0-rc0
0ddb34a38 oci: fix serde skip serializing condition
fbb2e9bce agent: Replace some libc functions with nix ones
acd3302be agent: Run OCI poststart hooks after a container is launched
1f363a386 runtime: overwrite mount type to bind for bind mounts
4e48509ed build: Set safe.directory for runtime repo
48ccd4233 ci: Set safe.directory against tests repository
2a4fbd6d8 agent: enhance get handled signal
433816cca ci/cd: update check-commit-message
a5a25ed13 runtime: delete Console from Cmd type
96553e8bd runtime: Add documentation of drop-in config file fragments
c656457e9 runtime: Add tests of drop-in config file decoding
99f5ca80f runtime: Plug drop-in decoding into decodeConfig()
0f9856c46 runtime: Scan drop-in directory, read files and decode them
2c1efcc69 runtime: Add helpers to copy fields between tomlConfig instances
20f11877b runtime: Add framework to manipulate config structs via reflection
ab5f1c956 shim: set a non-zero return code if the wait process call failed.
e5be5cb08 runtime: device: cleanup outdated comments
5f936f268 virtcontainers: config validation is host specific
323271403 virtcontainers: Remove unused function
0939f5181 config: Expose default_maxmemory
58ff2bd5c clh,qemu: Adapt to using default_maxmemory
1a78c3df2 packaging: Remove unused kata docker configure script
afdc96042 hypervisor: Add default_maxmemory configuration
4e30e11b3 shim: support shim v2 logging plugin
bdf5e5229 virtcontainers: validate hypervisor config outside of hypervisor itself
469e09854 katautils: don't do validation when loading hypervisor config
e32bf5331 device: deduplicate state structures
f97d9b45c runtime: device/persist: drop persist dependency from device pkgs
f9e96c650 runtime: device: move to top level package
3880e0c07 agent: refactor reading file timing for debugging
c70d3a2c3 agent: Update the dependencies
612fd79ba random: Fix "nonminimal-bool" clippy warning
d4417f210 netlink: Fix "or-fun-call" clippy warnings
93874cb3b packaging: Restrict kernel patches applied to top-level dir
07b1367c2 versions: Update kernel to latest LTS version 5.15.48
1b7d36fdb agent: Allow BUILD_TYPE=debug
9ff10c083 kernel: Add CONFIG_EFI=y as part of the TDX fragments
e227b4c40 block: Leverage multiqueue for virtio-block
e7e7dc9df runtime: Add heuristic to get the right value(s) for mem-reserve
c7dd10e5e packaging: Remove unused publish kata image script
0bbbe7068 snap: fix snap build on ppc64le
ef925d40c runtime: enable sandbox feature on qemu
28995301b tracing: Remove whitespace from root span
9941588c0 workflow: Removing man-db, workflow kept failing
90a7763ac snap: Fix debug cli option
a305bafee docs: Update outdated URLs and keep them available
bee770343 docs: Update containerd url link
ac5dbd859 clh: Improve logging related to the net dev addition
0b75522e1 network: Set queues to 1 to ensure we get the network fds
93b61e0f0 network: Add FFI_NO_PI to the netlink flags
bf3ddc125 clh: Pass the tuntap fds down to Cloud Hypervisor
55ed32e92 clh: Take care of the VmAdNetdPut request ourselves
01fe09a4e clh: Hotplug the network devices
2e0753833 clh: Expose VmAddNetPut
1ef0b7ded runtime: Switch to using the rust version of virtiofsd (all but power)
bb26bd73b safe-path: fix clippy warning
1a5ba31cb agent: refactor reading file timing for debugging
721ca72a6 runtime: fix error when trying to parse sandbox sizing annotations
9773838c0 virtiofsd: export env vars needed for building it
b0e090f40 versions: Bump virtiofsd to v1.3.0
db5048d52 kernel: build efi_secret module for SEV
1b845978f docs: Add storage limits to arch doc
412441308 docs: Add more kata monitor details
eff4e1017 shim: change the log level for GetOOMEvent call failures
5d7fb7b7b build(deps): bump github.com/containerd/containerd in /src/runtime
d0ca2fcbb build(deps): bump crossbeam-utils in /src/tools/trace-forwarder
a60dcff4d build(deps): bump regex from 1.5.4 to 1.5.6 in /src/tools/agent-ctl
dbf50672e build(deps): bump crossbeam-utils in /src/tools/agent-ctl
8e2847bd5 build(deps): bump crossbeam-utils from 0.8.6 to 0.8.8 in /src/libs
e9ada165f build(deps): bump regex from 1.5.4 to 1.5.5 in /src/agent
adad9cef1 build(deps): bump crossbeam-utils from 0.8.5 to 0.8.8 in /src/agent
34bcef884 docs: Add agent-ctl examples section
815157bf0 docs: Remove erroneous whitespace
f5099620f tools: Enable extra detail on error
8f10e13e0 config: Allow enable_iommu pod annotation by default
7ae11cad6 docs: Update source for cri-tools
0e2459d13 docs: Add cgroupDriver for containerd
1b7fd19ac rootfs: Fix chronyd.service failing on boot
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Replace `libc::setgroups()`, `libc::fchown()`, and `libc::sethostname()`
functions with nix crate ones for safety and maintainability.
Fixes: #4579
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Run the OCI `poststart` hooks must be called after the
user-specified process is executed but before the `start`
operation returns in accordance with OCI runtime spec.
Fixes: #4575
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Some clients like nerdctl may pass mount type of none for volumes/bind mounts,
this will lead to container start fails.
Referring to runc, it overwrites the mount type to bind and ignores the input value.
Fixes: #4548
Signed-off-by: liubin <liubin0329@gmail.com>
While doing a docker build for shim-v2, we see this:
```
fatal: unsafe repository
('/home/${user}/go/src/github.com/kata-containers/kata-containers' is
owned by someone else)
To add an exception for this directory, call:
git config --global --add safe.directory
/home/${user}/go/src/github.com/kata-containers/kata-containers
```
This is because the docker container build is run as root while the
runtime repo is checked out as normal user.
Unlike this error causing the rootfs build to error out, the error here
does not really cause `make shim-v2-tarball` to fail.
However its good to get rid of this error message showing during the
make process.
Fixes: #4572
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Set `safe.directory` against `kata-containers/tests` repository
before checkout because the user in the docker container is root,
but the `tests` repository on the host machine is usually owned
by the normal user.
This works when we already have the `tests` repository which is
not owned by root on the host machine and try to create a rootfs
using Docker (`USE_DOCKER=true`).
Fixes: #4561
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
For runC, send the signal to the init process directly.
For kata, we try to send `SIGKILL` instead of `SIGTERM` when the process
has not installed the handler for `SIGTERM`.
The `is_signal_handled` function determine which signal the container
process has been handled. But currently `is_signal_handled` is only
catching (SigCgt). While the container process is ignoring (SigIgn) or
blocking (SigBlk) also should not be converted from the `SIGTERM` to
`SIGKILL`. For example, when using terminationGracePeriodSeconds the k8s
will send SIGTERM first and then send `SIGKILL`, in this case, the
container ignores the `SIGTERM`, so we should send the `SIGTERM` not the
`SIGKILL` to the container.
Fixes: #4478
Signed-off-by: quanweiZhou <quanweiZhou@linux.alibaba.com>
Recently added check-commit-message to the tests repository. Minor
changes were also made to action. For consistency's sake, copied changes
over to here as well.
tests - https://github.com/kata-containers/tests/pull/4878
Minor Changes:
1. Body length check is now 75 and consistent with guidelines
2. Lines without spaces are not counted in body length check
Fixes#4559
Signed-off-by: Derek Lee <derlee@redhat.com>
The tests ensure that interactions between drop-ins and the base
configuration.toml and among drop-ins themselves work as intended,
basically that files are evaluated in the correct order (base file
first, then drop-ins in alphabetical order) and the last one to set
a specific key wins.
Signed-off-by: Pavel Mores <pmores@redhat.com>
updateFromDropIn() uses the infrastructure built by previous commits to
ensure no contents of 'tomlConfig' are lost during decoding. To do
this, we preserve the current contents of our tomlConfig in a clone and
decode a drop-in into the original. At this point, the original
instance is updated but its Agent and/or Hypervisor fields are
potentially damaged.
To merge, we update the clone's Agent/Hypervisor from the original
instance. Now the clone has the desired Agent/Hypervisor and the
original instance has the rest, so to finish, we just need to move the
clone's Agent/Hypervisor to the original.
Signed-off-by: Pavel Mores <pmores@redhat.com>
These functions take a TOML key - an array of individual components,
e.g. ["agent" "kata" "enable_tracing"], as returned by BurntSushi - and
two 'tomlConfig' instances. They copy the value of the struct field
identified by the key from the source instance to the target one if
necessary.
This is only done if the TOML key points to structures stored in
maps by 'tomlConfig', i.e. 'hypervisor' and 'agent'. Nothing needs to
be done in other cases.
Signed-off-by: Pavel Mores <pmores@redhat.com>
For 'tomlConfig' substructures stored in Golang maps - 'hypervisor' and
'agent' - BurntSushi doesn't preserve their previous contents as it does
for substructures stored directly (e.g. 'runtime'). We use reflection
to work around this.
This commit adds three primitive operations to work with struct fields
identified by their `toml:"..."` tags - one to get a field value, one to
set a field value and one to assign a source struct field value to the
corresponding field of a target.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.
Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.
Fixes: #4419
Signed-off-by: liubin <liubin0329@gmail.com>
Prior device config move didn't update the comments. Let's address this,
and make sure comments match the new path...
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Ideally this config validation would be in a seperate package
(katautils?), but that would introduce circular dependency since we'd
call it from vc, and it depends on vc types (which, shouldn't be vc, but
probably a hypervisor package instead).
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
While working on the previous commits, some of the functions become
non-used. Let's simply remove them.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the
newly added `default_maxmemory` config.
While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced as if a pod requests more memory
than the amount avaiable in the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, mimicing
better the runc behaviour.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR removes an unused kata configure docker script which was used
in packaging for kata 1.x but not longer being used in kata 2.x
Fixes#4546
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.
By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now kata shim only supports stdout/stderr of fifo from
containerd/CRI-O, but shim v2 supports logging plugins,
and nerdctl default will use the binary schema for logs.
This commit will add the others type of log plugins:
- file
- binary
In case of binary, kata shim will receive a stdout/stderr like:
binary:///nerdctl?_NERDCTL_INTERNAL_LOGGING=/var/lib/nerdctl/1935db59
That means the nerdctl process will handle the logs(stdout/stderr)
Fixes: #4420
Signed-off-by: Bin Liu <bin@hyper.sh>
Depending on the user of it, the hypervisor from hypervisor interface
could have differing view on what is valid or not. To help decouple,
let's instead check the hypervisor config validity as part of the
sandbox creation, rather than as part of the CreateVM call within the
hypervisor interface implementation.
Fixes: #4251
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.
Without this change, we're doing duplicate validation.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
In order to support ACPI hotplug in the future with the cooperative work
from the Kata community, we add ACPI feature and dbs-upcall feature to
add room for ACPI hotplug.
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Before, we maintained almost identical structures between our persist
API and what we keep for our devices, with the persist API being a
slight subset of device structures.
Let's deduplicate this, now that persist is importing device package.
Json unmarshal of prior persist structure will work fine, since it was
an exact subset of fields.
Fixes: #4468
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.
This'll help avoid unecessary dependencies within our core packages.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
In the original code, reads mountstats file and return
the content in the error, but at this time the file maybe
changed, we should return the file content that parsed
line by line to check why there is not a fstype option.
Fixes: #4246
Signed-off-by: Bin Liu <bin@hyper.sh>
Let's run a `cargo update` and ensure the deps are up-to-date before we
cut the "-rc0" release.
Fixes: #4525
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The error shown below was caught during a dependency bump in the CCv0
branch, but we better fix it here first.
```
error: this boolean expression can be simplified
--> src/random.rs:85:21
|
85 | assert!(!ret.is_ok());
| ^^^^^^^^^^^^ help: try: `ret.is_err()`
|
= note: `-D clippy::nonminimal-bool` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#nonminimal_bool
error: this boolean expression can be simplified
--> src/random.rs:93:17
|
93 | assert!(!ret.is_ok());
| ^^^^^^^^^^^^ help: try: `ret.is_err()`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#nonminimal_bool
```
Fixes: #4523
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The error shown below was caught during a dependency bump in the CCv0
branch, but we better fix it here first.
```
error: use of `ok_or` followed by a function call
--> src/netlink.rs:526:14
|
526 | .ok_or(anyhow!(nix::Error::EINVAL))?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try this: `ok_or_else(|| anyhow!(nix::Error::EINVAL))`
|
= note: `-D clippy::or-fun-call` implied by `-D warnings`
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#or_fun_call
error: use of `ok_or` followed by a function call
--> src/netlink.rs:615:49
|
615 | let v = u8::from_str_radix(split.next().ok_or(anyhow!(nix::Error::EINVAL))?, 16)?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try this: `ok_or_else(|| anyhow!(nix::Error::EINVAL))`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#or_fun_call
```
Fixes: #4523
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The apply_patches.sh script applies all patches in the patches
directory, as well as subdirectories. This means if there is a sub-dir
called "experimental" under a major kernel version directory,
experimental patches would be applied to the default kernel supported by
Kata.
We did not come accross this issue earlier as typically the experimental
kernel version was different from the default kernel.
With both the default kernel and the arm-experimental kernel having the
same major kernel version (5.15.x) at this time, trying to update the
kernel patch version revealed that arm-experimental patches were being
applied to the default kernel.
Restricting the patches to be applied to the top level directory will
solve the issue. The apply_patches script should ignore any
sub-directories meant for experimental patches.
Fixes#4520
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This brings in a few security fixes.
Removing arm patches related to virtio-mem that are no longer required
as they have been merged.
Fixes#4438
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
The cargo command creates debug build binaries, when the --release
option is not specified. Specifying --debug option causes an error.
This patch specifies --release option when BUILD_TYPE=release,
and does not specify any build type option when BUILD_TYPE=debug.
Fixes#4504
Signed-off-by: Yohei Ueda <yohei@jp.ibm.com>
Otherwise `./build-kernel.sh -x tdx setup` will fail with the following
error:
```
$ ./build-kernel.sh -x tdx setup
INFO: Config version: 92
INFO: Kernel version: tdx-guest-v5.15-4
INFO: kernel path does not exist, will download kernel
INFO: Apply patches from
/home/ffidenci/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/patches/tdx-guest-v5.15-4.x
INFO: Found 0 patches
INFO: Enabling config for 'tdx' confidential guest protection
INFO: Constructing config from fragments:
/home/ffidenci/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/x86_64/.config
WARNING: unmet direct dependencies detected for UNACCEPTED_MEMORY
Depends on [n]: EFI [=n] && EFI_STUB [=n]
Selected by [y]:
- INTEL_TDX_GUEST [=y] && HYPERVISOR_GUEST [=y] && X86_64 [=y] &&
CPU_SUP_INTEL [=y] && PARAVIRT [=y] && SECURITY [=y] &&
X86_X2APIC[=y]
INFO: Some CONFIG elements failed to make the final .config:
INFO: Value requested for CONFIG_EFI_STUB not in final .config
INFO: Generated config file can be found in
/home/ffidenci/go/src/github.com/kata-containers/kata-containers/tools/packaging/kernel/configs/fragments/x86_64/.config
ERROR: Failed to construct requested .config file
ERROR: failed to find default config
```
Fixes: #4510
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similar to network, we can use multiple queues for virtio-block
devices. This can help improve storage performance.
This commit changes the number of queues for block devices to
the number of cpus for cloud-hypervisor and qemu.
Today the default number of cpus a VM starts with is 1.
Hence the queues used will be 1. This change will help
improve performance when the default cold-plugged cpus is greater
than one by changing this in the config file. This may also help
when we use the sandboxing feature with k8s that passes down
the sum of the resources required down to Kata.
Fixes#4502
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This PR removes unused the publish kata image script which
was used on kata 1.x when we had OBS packages which are not
longer used on kata 2.x
Fixes#4496
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.
The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.
Fixes: #2266
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Remove space from root span name to follow camel casing of other tracing
span names in the runtime and to make parsing easier in testing.
Fixes#4483
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
`snap`/`snapcraft` seems to have changed recently. Since `snap`
auto-updates all `snap` packages and since we use the `snapcraft` `snap`
for building snaps, this is impacting all our CI jobs which now show:
```
Installing Snapcraft for Linux…
snapcraft 7.0.4 from Canonical* installed
Run snapcraft -d snap --destructive-mode
Usage: snapcraft [options] command [args]...
Try 'snapcraft pack -h' for help.
Error: unrecognized arguments: -d
Error: Process completed with exit code 1.
```
Move the debug option to make it a sub-command (long) option to resolve
this issue.
Fixes: #4457.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
By comparing the content of the old url and the new url,
ensure that their content is consistent and does not contain ambiguities
Fixes: #4454
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
Let's improve the log so we make it clear that we're only *actually*
adding the net device to the Cloud Hypervisor configuration when calling
our own version of VmAddNetPut().
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We want to have the file descriptors of the opened tuntap device to pass
them down to the VMMs, so the VMMs don't have to explicitly open a new
tuntap device themselves, as the `container_kvm_t` label does not allow
such a thing.
With this change we ensure that what's currently done when using QEMU as
the hypervisor, can be easily replicated with other VMMs, even if they
don't support multiqueue.
As a side effect of this, we need to close the received file descriptors
in the code of the VMMs which are not going to use them.
Fixes: #3533
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adding FFI_NO_PI to the netlink flags causes no harm to the supported
and tested hypervisors as when opening the device by its name Cloud
Hypervisor[0], Firecracker[1], and QEMU[2] do set the flag already.
However, when receiving the file descriptor of an opened tutap device
Cloud Hypervisor is not able to set the flag, leaving the guest without
connectivity.
To avoid such an issue, let's simply add the FFI_NO_PI flag to the
netlink flags and ensure, from our side, that the VMMs don't have to set
it on their side when dealing with an already opened tuntap device.
Note that there's a PR opened[3] just for testing that this change
doesn't cause any breakage.
[0]: e52175c2ab/net_util/src/tap.rs (L129)
[1]: b6d6f71213/src/devices/src/virtio/net/tap.rs (L126)
[2]: 3757b0d08b/net/tap-linux.c (L54)
[3]: https://github.com/kata-containers/kata-containers/pull/4292
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is basically a no-op right now, as:
* netPair.TapInterface.VMFds is nil
* the tap name is still passed to Cloud Hypervisor, which is the Cloud
Hypervisor's first choice when opening a tap device.
In the very near future we'll stop passing the tap name to Cloud
Hypervisor, and start passing the file descriptors of the opened tap
instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Knowing that VmAddNetPut works as expected, let's switch to manually
building the request and writing it to the appropriate socket.
By doing this it gives us more flexibility to, later on, pass the file
descriptor of the tuntap device to Cloud Hypervisor, as openAPI doesn't
support such operation (it has no notion of SCM Rights).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of creating the VM with the network device already plugged in,
let's actually add the network device *after* the VM is created, but
*before* the Vm is actually booted.
Although it looks like it doesn't make any functional difference between
what's done in the past and what this commit introduces, this will be
used to workaround a limitation on OpenAPI when it comes to passing down
the network device's file descriptor to Cloud Hypervisor, so Cloud
Hypervisor can use it instead of opening the device by its name on the
VMM side.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
VmAddNetPut is the API provided by the Cloud Hypervisor client (auto
generated) code to hotplug a new network device to the VM.
Let's expose it now as it'll be used as part this series, mostly to
guide the reviewer through the process of what we have to do, as later
on, spoiler alert, it'll end up being removed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So far this has been done for x86_64. Now that the support for building
and testing has been added for all arches, let's do the second part of
the switch.
We're still not done yet for powerpc, as some a virtifosd crash on the
rust version has been found by the maintainer.
Fixes: #4258, #4260
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In the original code, reads mountstats file and return
the content in the error, but at this time the file maybe
changed, we should return the file content that parsed
line by line to check why there is not a fstype option.
Fixes: #4246
Signed-off-by: Bin Liu <bin@hyper.sh>
Changed bitsize for parsing functions to 64-bit in order to avoid
parsing errors.
Fixes#4435
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Revert this patch, after dragonball-sandbox is ready. And all
subsequent implementations are submitted.
Fixes: #4257
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Update Dragonball Readme to fix style problem and add github issue for
TODOs.
Add document for devices in dragonball. This is the document for the
current dragonball device status and we'll keep updating it when we
introduce more devices in later pull requets.
Fixes: #4257
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Device manager to manage IO devices for a virtual machine. And added
DeviceManagerTx to provide operation transaction for device management,
added DeviceManagerContext to operation context for device management.
Fixes: #4257
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Console manager to manage frontend and backend console devcies.
A virtual console are composed up of two parts: frontend in virtual
machine and backend in host OS. A frontend may be serial port,
virtio-console etc, a backend may be stdio or Unix domain socket. The
manager connects the frontend with the backend.
Fixes: #4257
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
1. support oom event
2. use ContainerProcess to store container_id and exec_id
3. support stats
Fixes: #3785
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
1. service: Responsible for processing services, such as task service, image service
2. Responsible for implementing different runtimes, such as Virt-container,
Linux-container, Wasm-container
Fixes: #3785
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
1. support async.
2. update ttrpc and protobuf
update ttrpc to 0.6.0
update protobuf to 2.23.0
3. support trans from oci
Fixes: #3746
Signed-off-by: Quanwei Zhou <quanweiZhou@linux.alibaba.com>
Implement reflink_copy() to copy file by reflink, and fallback to normal
file copy.
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Changes since v1.2.0:
!123 Update rust-vmm dependencies (main) ← (update-deps)
!121 implement std::error::Error trait (main) ← (fix-impl-error)
!120 Show the nofile hard limit value in the warning me... (main) ← (fix-rlimit-warn)
!119 Do not create tmpdir and bind mount /proc/self/fd ... (main) ← (remove-tmp-dir-for-proc)
!116 Disable killpriv_v2 by default (main) ← (no-killpriv-default)
The one that affected Kata Containers the most was !119, as virtiofsd
would get denied when SELinux was set to run on enforcing mode.
Fixes: #4433
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The kata-sys-util crate is a collection of modules that provides helpers
and utilities used by multiple Kata Containers components.
Fixes: #3305
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
1. modify default values for hypervisor
2. change the variable name
3. check the min memory limit
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
1. Some Nit problems are fixed
2. Make the code more readable
3. Modify some implementation details
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
If there is a parse error when we are trying to get the annotations, we
will return Result<Option<type>> to handle that.
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
loading from empty string is only used to identity that the config is
not initialized yet, so Option<TomlConfig> is a better option
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Some annotations are used to override hypervisor configurations, and you
know it's dangerous. We must be careful when overriding hypervisor configuration
by annotations, to avoid security flaws.
There are two existing mechanisms to prevent attacks by annotations:
1) config.hypervisor.enable_annotations defines the allowed annotation
keys for config.hypervisor.
2) config.hyperisor.xxxx_paths defines allowd values for specific keys.
The access methods for config.hypervisor.xxx enforces the permisstion
checks for above rules.
To update conifg, traverse the annotation hashmap,check if the key is enabled in hypervisor or not.
If it is enabled. For path related annotation, check whether it is valid or not
before updating conifg. For cpu and memory related annotation, check whether it
is more than or less than the limitation for DB and qemu beforing updating config.
If it is not enabled, there will be three possibilities, agent related
annotation, runtime related annotation and hypervisor related annotation
but not enabled. The function will handle agent and runtime annotation
first, then the option left will be the invlaid hypervisor, err message
will be returned.
add more edge cases tests for updating config
clean up unused functions, delete unused files and fix warnings
Fixes: #3523
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
The TomlConfig structure is a parsed form of Kata configuration file,
but it's a little inconveneient to access those configuration
information directly. So introduce a wrapper KataConfig to easily
access those configuration information.
Two singletons of KataConfig is provided:
- KATA_DEFAULT_CONFIG: the original version directly loaded from Kata
configuration file.
- KATA_ACTIVE_CONFIG: the active version is the KATA_DEFAULT_CONFIG
patched by annotations.
So the recommended to way to use these two singletons:
- Load TomlConfig from configuration file and set it as the default one.
- Clone the default one and patch it with values from annotations.
- Use the default one for permission checks, such as to check for
allowed annotation keys/values.
- The patched version may be set as the active one or passed to clients.
- The clients directly accesses information from the active/passed one,
and do not need to check annotation for override.
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Add structures to load Kata agent configuration from configuration files.
Also define a mechanism for vendor to extend the Kata configuration
structure.
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Add structures to load Kata hypevisor configuration from configuration
files. Also define a mechanisms to:
1) for hypervisors to handle the configuration info.
2) for vendor to extend the Kata configuration structure.
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Add structures to load Kata runtime configuration from configuration
files. Also define a mechanism for vendor to extend the Kata
configuration structure.
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
Add FileRotator to rotate log files.
The FileRotator structure may be used as writer for create_logger()
and limits the storage space occupied by log files.
Fixes: #3304
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Wei Yang <wei.yang1@linux.alibaba.com>
Signed-off-by: yanlei <yl.on.the.way@gmail.com>
Convert libs into a Cargo workspace, so all libraries could share the
build infrastructure.
Fixes#3282
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Some generated merge commit messages are >75 chars
Allow these to not trigger the subject line length failure
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add kernel fork for sev to kernel builder with efi_secret. Additionally, install efi_secret module for sev.
Fixes: #4179
Signed-off-by: Alex Carter <alex.carter@ibm.com>
this will require to set a PR number when triggering the test-kata-deploy workflow manually
also make sure user variables are set correctly when workflow_dispatch is used
Fixes: #4349
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Updated the architecture document to explain that if you wish to
constrain the amount of disk space a container uses, you need to use an
existing facility such as `quota(1)`s or device mapper limits.
Fixes: #4430.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add more detail to the `kata-monitor` doc to allow an admin to make a
more informed decision about where and how to run the daemon.
Fixes: #4416.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
GetOOMEvent is a blocking call that will fail if
the container exit, in this case, it's not an error or warning.
Changing the log level for logs in case of GetOOMEvent call fails
will reduce log noise in a large cluster that has pods
creating/deleting frequently.
Fixes: #4376
Signed-off-by: Bin Liu <bin@hyper.sh>
- docs: Update storage documentation link
- rustjail: get home dir using nix crate
- runk: Support `list` sub-command
- docs: Update vGPU use-case
- runtime: ignore ESRCH error from stop container
- docs: Update configuration reference for snap documentation
- workflows: add workflow_dispatch triggering to test-kata-deploy
- snap: Use helper script and cleanup
- feature: add ability to interact with IPTables within the guest
- agent: return mount file content if parse mountinfo failed
- docs: Update Intel QAT documentation links
- osbuilder: add iptables package
- runk: Return error when tty is used without console socket
- runk: Add Podman guide in README
- agent: Pass standard I/O to container launched by runk
- agent, runk: Enable test for the agent built with standard-oci-runtime feature
- runk: Handle rootfs path in config.json properly
- Update containerd docs
- clh: Update to v24.0
- snap: Build and package rust version of virtiofsd
- runk: merge oci-kata-agent into runk
- virtiofsd: static build virtiofsd from rust code for non-x86
- Fix issues with direct-volume stats feature
- runtime: fix incorrect Action function for direct-volume stats
- runtime: Adding the correct detection of mediated PCIe devices
- runtime: remove duplicate 'types' import
- runtime: sync docstrings with function names
- qemu: allow using legacy serial device for the console
- docs: Remove clear containers reference in README
- runtime: do not check for EOF error in console watcher
- kernel: Remove nemu.conf from packaging
- tools: delete unused param from get_from_kata_deps callers
- agent: Fix is_signal_handled failing parsing str to u64
- Improve Go unit test script
- packaging: Add kernel config option for SGX in Gramine
- ci: Don't run Docs URL Alive Check workflow on forks
- tools: Add QEMU patches for SGX numa support
- docs: Update runc containerd runtime
- Build and distribute the rust version of virtiofsd
- doc: Update log parser link
- Move the kata-log-parser from the tests repo
- versions: Upgrade to Cloud Hypervisor v23.1
- agent: Add a macro to skip a loop easier
- runk: use custom Kill command to support --all option
- agent: add test coverage for functions find_process and online_resources
fe3c1d9cd docs: Update storage documentation link
9d27c1fce agent: ignore ESRCH error when destroying containers
9726f56fd runtime: force stop container after the container process exits
168f325c4 docs: Update configuration reference for snap documentation
38a318820 runk: Support `list` sub-command
b9fc24ff3 docs: update release process github token instructions
c1476a174 docs: update release process with latest workflow triggering
002f2cd10 snap: Use helper script and cleanup
2e04833fb docs: Update Intel QAT documentation links
8b57bf97a workflows: add workflow_dispatch triggering to test-kata-deploy
6d0ff901a docs: Update vGPU use-case
9b108d993 docs: Improve snap formatting
894f661cc docs: Add warning to snap build
d759f6c3e snap: Fix CH architecture check
590381574 agent: Pass standard I/O to container launched by runk
af2ef3f7a agent-ctl: introduce handle for iptables get/set
65f0cef16 kata-runtime: add iptables CLI to test http endpoint
3201ad083 shim-client: ensure we check resp status for Put/Post
0706fb28a kata-runtime: shmgmt: make url usage consistent
2a09378dd shim-client: add support for DoPut
640173cfc shim-mgmt: Add endpoint handler for interacting with iptables
0136be22c virtcontainers: plumb iptable set/get from sandbox to agent
bd50d463b agent: iptables: get/set handling for iptables
7c4049aab osbuilder: add iptables package
03176a9e0 proto: update generated code based on proto update
38ebbc705 proto: update to add set/get iptables
78d45b434 agent: return mount file content if parse mountinfo failed
c7b3941c9 runk: Enable test for the agent built with standard-oci-runtime feature
6dbce7c3d agent: Remove unused import in console test
6ecea84bc rustjail: get home dir using nix crate
648b8d0ae runk: Return error when tty is used without console socket
5205efd9b runk: Add Podman guide in README
d862ca059 runk: Handle rootfs path in config.json properly
56591804b docs: Improve snap build instructions
cb2b30970 snap: Build using destructive mode
60823abb9 docs: Move snap README
fff832874 clh: Update to v24.0
49361749e snap: Build and package rust version of virtiofsd
27d903b76 snap: Put the yq binary in the staging bin directory
d7b4ce049 snap: Remove unused variable
43de5440e snap: Fix unbound variable error
c9b291509 snap: Fix whitespace
122a85e22 agent: remove bin oci-kata-agent
35619b45a runk: merge oci-kata-agent into runk
10c13d719 qemu: remove virtiofsd option in qemu config
d20bc5a4d virtiofsd: build rust based virtiofsd from source for non-x86_64
c95ba63c0 docs: Remove information related to Kata 1.x
34b80382b docs: Get rid of note related to networking.
dfad5728a docs: Mention --cni flag while invoking ctr
8e7c5975c agent: fix direct-assigned volume stats
4428ceae1 runtime: direct-volume stats use correct name
ffdc065b4 runtime: direct-volume stats update to use GET parameter
f29595318 runtime: fix incorrect Action function for direct-volume stats
7a5ccd126 runtime: sync docstrings with function names
ce2e521a0 runtime: remove duplicate 'types' import
834f93ce8 docs: fix annotations example
f4994e486 runtime: allow annotation configuration to use_legacy_serial
24a2b0f6a docs: Remove clear containers reference in README
abad33eba kernel: Remove nemu.conf from packaging
e87eb13c4 tools: delete unused param from get_from_kata_deps callers
8052fe62f runtime: do not check for EOF error in console watcher
c67b9d297 qemu: allow using legacy serial device for the console
44814dce1 qemu: treat console kernel params within appendConsole
4f586d2a9 packaging: Add kernel config option for SGX in Gramine
4b437d91f agent: Fix is_signal_handled failing parsing str to u64
88fb9b72e docs: Update runc containerd runtime
d1f2852d8 tools: Stop building virtiofsd with qemu (for x86_64)
c39852e83 runtime: Use ${LIBEXEC}/virtiofsd as the default virtiofsd path
b4b9068cb tools: Add QEMU patches for SGX numa support
a475956ab workflows: Add support for building virtiofsd
71f59f3a7 local-build: Add support for building virtiofsd
c7ac55b6d dockerbuild: Install unzip
8e2042d05 tools: add script to pull virtiofsd
dbedea508 versions: Add virtiofsd entry
e73b70baf runtime: Don't run unit tests verbose by default
f24a6e761 runtime: Consolidate flags setting in unit tests script
cf465feb0 runtime: Don't change test behaviour based on $CI or $KATA_DEV_MODE
34c4ac599 runtime: Remove redundant subcommands from go-test.sh
0aff5aaa3 runtime: Simplify package listing in go-test.sh
557c4cfd0 runtime: Don't chmod coverage files in Go tests
04c8b52e0 runtime: Remove HTML coverage option from go-test.sh
7f7691442 runtime: Add coverage.txt.tmp to gitignore
13c257700 runtime: Move go testing script locally
421064680 doc: Update log parser link
271933fec log-parser: fix some of the documentation
c7dacb121 log-parser: move the kata-log-parser from the tests repo
82ea01828 versions: Upgrade to Cloud Hypervisor v23.1
2a1d39414 runtime: Adding the correct detection of mediated PCIe devices
7bc4ab68c ci: Don't run Docs URL Alive Check workflow on forks
475e3bf38 agent: add test coverage for functions find_process and online_resources
383be2203 agent: Add a macro to skip a loop easier
97d7b1845 runk: use custom Kill command to support --all option
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Add a new `Examples` section to the `agent-ctl` docs giving some
examples of how to use the tool with QEMU and stand-alone.
Fixes: #4414.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The `agent-ctl` and `trace-forwarder` tools make use of
`anyhow::Context` to provide additional call site information on error.
However, previously neither tool was using the "alternate debug" format
to display the error, meaning full error output was not displayed.
Fixes: #4411.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR updates the storage documentation link for the devicemapper
snapshotter.
Fixes#4398
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Since #902 the `io.katacontainers.config.hypervisor` pod annotations
have only been permitted if explicitly allowed in the global
configuration. The default global configuration allows no such
annotations. That's important because several of those annotations
would cause Kata to execute arbitrary binaries, and so were wildly
unsafe.
However, this is inconvenient for the
`io.katacontainers.config.hypervisor.enable_iommu` annotation
specifically, which controls whether the sandbox VM includes a vIOMMU.
A guest side vIOMMU is necessary to implement VFIO passthrough devices
with `vfio_mode = vfio`, so enabling that mode of operation currently
requires a global configuration change, and can't just be enabled
per-pod.
Unlike some of the other hypervisor annotations, the `enable_iommu`
annotation is quite safe. By default the vIOMMU is not present, so
allowing a user to override it for a pod only improves their
facilities for isolation. Even if the global default were changed to
enable the vIOMMU, that doesn't compel the guest kernel to use it, so
allowing a user to disable the vIOMMU doesn't materially affect
isolation either.
Therefore, allow the io.katacontainers.config.hypervisor.enable_iommu
annotation to work in the default configurations.
fixes#4330
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
destroy() method should ignore the ESRCH error from signal::kill
and continue the operation as ESRCH is often considered harmless.
Fixes: #4359
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Set thestop container force flag to true so that the container state is always set to
“StateStopped” after the container wait goroutine is finished. This is necessary for
the following delete container step to succeed.
Fixes: #4359
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR updates the url link for the kata containers configuration
for the general snap documentation.
Fixes#4341
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Support list sub-command. It will traverse the root directory, parse
status file and print basic information of containers. Behavior and
print format consistent with runc. To handle race with runk delete
or system user modify, the loop will continue to traverse when errors
are encountered.
Fixes: #4362
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Move the common shell code to a helper script that is sourced by all
parts.
Add extra quoting to some variables in the snap config file
and simplify.
Fixes: #4304.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Now that #4213 is merged we need updated documentation for vGPU time-sliced or vGPU MIG-backed.
Fixes: #4343
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Improve the snap docs by using more consistent formatting and proper
shell code in the shell example.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Since we must build with `--destructive-mode`, add a warning that the
host environment could change the behaviour of the build, depending on
the packages installed on the system.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The `kata-agent` passes its standard I/O file descriptors
through to the container process that will be launched
by `runk` without manipulation or modification in order to
allow the container process can handle its I/O operations.
Fixes: #4327
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.
Containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.
kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html
For Kata specifically, we will look for SCHED_CORE environment variable
to be set to indicate we shuold create a new schedule core domain.
This is equivalent to the containerd shim's PR: e48bbe8394Fixes: #4309
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
While end users can connect directly to the shim, let's provide a way to
easily get/set iptables from kata-runtime itself.
Fixes: #4080
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Without this, potential errors are silently dropped. Let's ensure we
return the error code as well as potenial data from the response.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Before, we had a mix of slash, etc. Unfortunately, when cleaning URL
paths, serve mux seems to mangle the request method, resulting in each
request being a GET (instead of PUT or POST).
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add two endpoints: ip6tables, iptables.
Each url handler supports GET and PUT operations. PUT expects
the requests' data to be []bytes, and to contain iptable information in
format to be consumed by iptables-restore.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Introduce get/set iptable handling. We add a sandbox API for getting and
setting the IPTables within the guest. This routes it from sandbox
interface, through kata-agent, ultimately making requests to the guest
agent.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Since we are introducing an agent API for interacting with guest
iptables, let's ensure that our example rootfs' have iptables-save/restore
installed.
Fixes: #4356
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Update the agent protocol definition to introduce support for setting
and getting iptables from the guest.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This enables tests for the kata-agent for runk that is built
with standard-oci-runtime feature in CI.
Fixes: #4351
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Get user's home dir using `nix::unistd` crate instead of `utils` crate,
and remove useless code from agent.
Fixes: #4209
Signed-off-by: Xuewei Niu <justxuewei@apache.org>
runk always launches containers with detached mode,
so users have to use a console socket with run or
create operation when a terminal is used.
If users set `terminal` to `true` in `config.json` and
try to launch a container without specifying a console
socket, runk returns an error with a message early.
Fixes: #4324
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
This commit enables runk to handle `root.path` in `config.json`
properly even if the path is specified by a relative path that
includes the single (`.`) or the double (`..`) dots.
For example, with a bundle at `/to/bundle` and a rootfs directly
under `/to/bundle` such as `/to/bundle/{bin,dev,etc,home,...}`,
the `root.path` value can be either `/to/bundle` or just `.`.
This behavior conforms to OCI runtime spec.
Accordingly, a bundle path managed by runk's status file
(`status.json`) always is statically stored as a canonical path.
Previously, a bundle path has been got by `oci_state()` of rustjail's
API that returns the path as the parent directory path of a rootfs
(`root.path`). In case of the kata-agent, this works properly because
the kata containers assume that the rootfs path is always
`/to/bundle/rootfs`. However in case of standard OCI runtimes,
a rootfs can be placed anywhere under a bundle, so the rootfs path
doesn't always have to be at a `/to/bundle/rootfs`.
Fixes: #4334
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Destructive mode is required to build the Kata Containers snap. See:
```
.github/workflows/snap-release.yaml
.github/workflows/snap.yaml
```
Hence, update the last file that we forgot to update with
`--destructive-mode`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the snap README to a subdirectory to resolve the warning given by
`snapcraft` (folded and reformatted slightly for clarity):
```
The 'snap' directory is meant specifically for snapcraft,
but it contains the following non-snapcraft-related paths,
which is unsupported and will cause unexpected behavior:
- README.md
If you must store these files within the 'snap' directory,
move them to 'snap/local', which is ignored by snapcraft.
```
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This release has been tracked through the v24.0 project.
virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still need to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.
Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device was hot plugged into the VM.
Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.
A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.
* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)
* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives
* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)
Fixes: #4317
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Update the snap config file to build the rust version of `virtiofsd` for
x86_64, but build QEMU's C version for other platforms.
Fixes: #4261.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Rather than putting the `yq` binary in the staging directory itself,
put it in the `bin/` sub-directory.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Remove the unused `kata_url` variable and use the value in the `website`
YAML metadata instead.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Based on @fidencio's opoinon,
On Arm: static build virtiofsd using musl lib;
on ppc64 & s390: static build virtiofsd using gnu lib;
Fixes: #4258
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Since Kata 2.x does not support runtime cli, remove information
related to it. Update the configuration snippet accordingly.
Fixes#3870
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
One may want to use standalone containerd without k8s
and still have network enabled for the container.
Getting rid of note due to inaccuracy.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Specify that the `--cni` flag needs to be passed to the `ctr` tool
while starting a container in order to have networking enabled for the
container. This flag allows containerd to call into the configured
network plugin which in turn creates a network interface for the
container.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
The current implementation of walking the
disks to match with the requested volume path
in agent doesn't work because the volume path
provided by the shim to the agent is the mount
path within the guest and not the device name.
The current logic is trying to match the
device name to the volume path which will never
match.
This change will simplify the
get_volume_capacity_stats and
get_volume_inode_stats to just call statfs and
get the bytes and inodes usage of the volume
path directly.
Fixes: #4297
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
Today the shim does a translation when doing
direct-volume stats where it takes the source and
returns the mount path within the guest.
The source for a direct-assigned volume is actually
the device path on the host and not the publish
volume path.
This change will perform a lookup of the mount info
during direct-volume stats to ensure that the
device path is provided to the shim for querying
the volume stats.
Fixes: #4297
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.
This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.
Fixes: #4297
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
The action function expects a function that returns error
but the current direct-volume stats Action returns
(string, error) which is invalid.
This change fixes the format and print out the stats from
the command instead.
Fixes: #4293
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
This PR removes the clear containers reference as this is not longer
being used and is deprecated at the rootfs builder README.
Fixes#4278
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the nemu.conf as we are not longer using NEMU from
the kernel configurations.
Fixes#4272
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The documentation of the bufio package explicitly says
"Err returns the first non-EOF error that was encountered by the
Scanner."
When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.
Fixes#4079
Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
For the Gramine Shielded Containers guest kernel, CONFIG_NUMA must be
enabled.
Fixes #4266
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
In the is_signal_handled function, when parsing the hex string returned
from `/proc/<pid>/status` the space/tab character after the colon
is not removed.
This patch trims the result of SigCgt so that
all whitespace characters are removed. It also extends the existing
test cases to check for this scenario.
Fixes: #4250
Signed-off-by: Champ-Goblem <cameron@northflank.com>
As we are using a containerd version > 1.4 we need to update
the runc containerd runtime.
Fixes#4263
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This commit updates the "Run Kata Containers with Kubernetes" to include
cgroupDriver configuration via "KubeletConfiguration". Without this
setting kubeadm defaults to systemd cgroupDriver. Containerd with Kata
cannot spawn conntainers with systemd cgroup driver.
Fixes: #4262
Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
As we finally can move to using the rust virtiofs daemon, let's stop
bulding and packaging the C version of the virtiofsd for x86_64.
Fixes: #4249
Depends-on: github.com/kata-containers/tests#4785
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As now we build and ship the rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match with where
we're installing this binary.
Fixes: #4249
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There are a few patches for SGX numa support in QEMU added after the
6.2.0 release. Add them for SGX support in Kata.
Fixes#4254
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
As already done for the other assets we rely on, let's build (well, pull
in this very specific case) the virtiofsd binary, as we're relying on
its standlone rust version from now on.
Fixes: #4234
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As done for the other binaries we release, let's add support for
"building" (or pulling down) the static binary we ship as part of the
kata-containers static tarball (the same one used by kata-deploy).
Right now the virtiofsd is installed in /opt/kata/libexec/virtiofsd, a
different path than the virtiofsd that comes with QEMU.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As virtiofsd comes in the `zip` format, let's install unzip in the
containers and then be able to access the virtiofsd binary.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now this is very much x86_64 specific, but I'd like to count on
the maintainers of the other architectures to expand it.
Also, the name as it's now may be misleading, as we're actually only
pulling the binary that's statically built using `musl` and released as
part of virtiofsd official releases. But we'll need to build it for the
other architectures, thus I'm following the naming of the scripts used
by the other components.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we're switching to using the rust version of the virtiofsd, let's
give it its own entry in the versions.yaml file, as it's no longer part
of QEMU.
It's important to mention that GitLab doesn't provide a well formed URL
for the releases. Instead, it adds there a hash, leading us to have to
add the specific link for the tarball.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
go-test.sh by default adds the -v option to 'go test' meaning that output
will be printed from all the passing tests as well as any failing ones.
This results in a lot of output in which it's often difficult to locate the
failing tests you're interested in.
So, remove -v from the default flags.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the responsibilities of the go-test.sh script is setting up the
default flags for 'go test'. This is constructed across several different
places in the script using several unneeded intermediate variables though.
Consolidate all the flag construction into one place.
fixes#4190
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE
variables, but not in a way that makes a lot of sense.
If either one is set it uses the test_coverage path, instead of the
test_local path. That collects coverage information, as the name
suggests, but it also means it runs the tests twice as root and
non-root, which is very non-obvious.
It's not clear what use case the test_local path is for at all.
Developer local builds will typically have $KATA_DEV_MODE set and CI
builds will have $CI set. There's essentially no downside to running
coverage all the time - it has little impact on the test runtime.
In addition, if *both* $CI and $KATA_DEV_MODE are set, the script
refuses to run things as root, considering it "unsafe". While having
both set might be unwise in a general sense, there's not really any
way running sudo can be any more unsafe than it is with either one
set.
So, simplify everything by just always running the test_coverage path.
This leaves the test_local path unused, so we can remove it entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh accepts subcommands, however invoking it in the usual way via
the Makefile doesn't use them. In fact the only remaining subcommand is
"help" and we already have another way of getting the usage information
(-h or --help). We don't need a second way, so just drop subcommand
handling.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh defaults to testing all the packages listed by go list, except
for a number filtered out. It turns out that none of those filters are
necessary any more:
* We've long required a Go newer than 1.9 which means the vendor filter
isn't needed
* The agent filter doesn't do anything now that we've moved to the Kata
2.x unified repo
* The tests filters don't hit anything on the list of modules in
src/runtime (which is the only user of the script)
But since we don't need to filter anything out any more, we don't even need
to iterate through a list ourselves. We can simply pass "./..." directly
to go test and it will iterate through all the sub-packages itself.
Interestingly this more than doubles the speed of "make test" for me - I
suspect because go test's internal paralellism works better over a larger
pool of tests.
This also lets us remove handling of non-existent coverage files from
test_go_package(), since with default options we will no longer test packages without tests
by default. If the user explicitly requests testing of a package with no
tests, then failing makes sense.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The go-test.sh script has an explicit chmod command, run as root, to
set the mode of the temporary coverage files to 0644. AFAICT the
point of this is specifically the 004 bit allowing world read access,
so that we can then merge the temporary coverage file into the main
coverage file.
That's a convoluted way of doing things. Instead we can just run the tail
command which reads the temporary file as the same user that generated it.
In addition, go-test.sh became root to remove that temporary coverage
file. This is not necessary, since deleting a regular file just requires
write access to the directory, not the file itself.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The html-coverage option to this script doesn't really alter behaviour
it just does the same thing as normal coverage, then converts the
report to HTML. That conversion is a single command, plus a chmod to
make the final output mode 0644. That overrides any umask the user
has set, which doesn't seem like a policy decision this script should
be making.
Nothing in the kata-containers or tests repository uses this, so it doesn't
really make sense to keep this logic inside this script.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In addition to coverage.txt, the go-test.sh script creates
coverage.txt.tmp files while running. These are temporary and
certainly shouldn't be committed, so add them to the gitignore file.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The go unit tests for the runtime are invoked by the helper script
ci/go-test.sh. Which calls the run_go_test() function in ci/lib.sh. Which
calls into .ci/go-test.sh from the tests repository.
But.. the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.
So, just move the actual script into src/runtime. A change to remove it
from the tests repo will follow.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
to the kata-containers repo under the src/tools/log-parser folder
and vendor the modules
Fixes: #4100
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
The following issues have been addressed from the latest bug fix release
v23.1 of Cloud Hypervisor: 1) Add some missing seccomp rules; 2) Remove
virtio-fs filesystem entries from config on removal; 3) Do not delete
API socket on API server start; 4) Reject virtio-mem resize if the guest
doesn't activate the device; 5) Fix OpenAPI naming of I/O throttling
knobs;
Fixes: #4222
Signed-off-by: Bo Chen <chen.bo@intel.com>
This workflow is a scheduled job that runs at 23:00
every Sunday, it should only run the main repo
but not the forked ones.
Fixes: #4219
Signed-off-by: Bin Liu <bin@hyper.sh>
- agent watchers: ensure uid/gid is preserved on copy/mkdir
- clh: Rely on Cloud Hypervisor for generating the device ID
- agent: add tests for create_logger_task function
- runk: set BinaryName for runk for containerd
- tools: Add a Rust-based standard OCI container runtime based on Kata agent
- rustjail: add tests for parse_mount_table
- Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
- docs: repropose direct-assigned volume
- versions: change qemu tdx url and tag
- doc: Update for NVIDIA GPUs
- agent-ctl: Fix abstract socket connections
- Implement network and disk rate limiter for Cloud Hypervisor
- kata-deploy: Add support to RKE2
- docs: Update containerd link to installation guide
- docs: remove pc machine type supports
- Agent: Unit tests for random.rs
- rustjail: Add tests for mount_grpc_to_oci
- packaging: Fix broken path in `build-static-clh.sh`
- Fix Go unit tests to clean up /tmp after themselves
- rustjail: add tests for mount_from function
- rustjail: Add tests for hooks_grpc_to_oci
- agent: modify the type of swappiness to u64
- libs/safe-path: add crate to safely resolve fs paths
- agent: move assert_result macro to test_utils file
- rustjail: Add tests for root_grpc_to_oci
- agent: add tests for mount_to_rootfs function
- agent: add tests for update_container_namespaces
- agent: add tests for is_signal_handled function
- Upgrade to Cloud Hypervisor v23.0
- agent: best-effort removing mount point
- test: Fix golangci-lint error for s390x
- fsGroup support for direct-assigned volume
- kata-monitor: add the README file
- kata-monitor: update the hrefs in the debug/pprof index page
- runtime: Base64 encode the direct volume mountInfo path
- runtime: no need to write virtiofsd error to log
- kata-monitor: add some links when generating pages for browsers
- agent: Avoid agent panic when reading empty stats
- docs: Update link to contributions guide
- agent: add tests for mount_storage
- agent: add test coverage for parse_mount_flags_and_options function
- agent: add tests for do_write_stream function
- runtime: delete debug option in virtiofsd
- rustjail: add test coverage for process_grpc_to_oci function
- agent: Allow the agent to be rebuilt with the change of Cargo features
- protocols: add src/csi.rs to .gitignore
- kata-runtime enable hugepage support
- docs: Add a firecracker installation guide
- runtime: Allow and require no initrd for SE
- test: use `T.TempDir` to create temporary test directory
- clh: Expose service offload configuration
33a8b705 clh: Rely on Cloud Hypervisor for generating the device ID
70eda2fa agent: watchers: ensure uid/gid is preserved on copy/mkdir
7772f7dd runk: set BinaryName for runk for containerd
7ffe5a16 docs: Direct-assigned volume design
081f6de8 versions: change qemu tdx url and tag
666aee54 docs: Add VSOCK localhost example for agent-ctl
86d348e0 docs: Use VM term in agent-ctl doc
4b9b62bb agent-ctl: Fix abstract socket connections
b6467ddd clh: Expose disk rate limiter config
7580bb5a clh: Expose net rate limiter config
a88adaba clh: Cloud Hypervisor has a built-in Rate Limiter
63c4da03 clh: Implement the Disk RateLimiter logic
511f7f82 config: Add DiskRateLimiter* to Cloud Hypervisor
5b18575d hypervisor: Add disk bandwidth and operations rate limiters
1cf94692 clh: Implement the Network RateLimiter logic
00a5b1bd utils: Define DefaultRateLimiterRefillTimeMilliSecs
be1bb7e3 utils: Move FC's function to revert bytes to utils
c9f6496d config: Add NetRateLimiter* to Cloud Hypervisor
2d35e606 hypervisor: Add network bandwidth and operations rate limiters
b0e439cb rustjail: add tests for parse_mount_table
ccb01839 kata-deploy: Add support to RKE2
9d39362e kata-deploy: Reestructure the installing section
18d27f79 kata-deploy: Add a missing `$` prefix in the README
6948b4b3 docs: Update containerd link to installation guide
b221a259 tools: Add runk
2c218a07 agent: Modify Kata agent for runk
dd4bd7f4 doc: Added initial doc update for NV GPUs
832c33d5 docs: remove pc machine type supports
b658dccc tools: fix typo in clh directory name
afbd60da packaging: Fix clh build from source fall-back
4b9e78b8 rustjail: Add tests for mount_grpc_to_oci
81f6b486 agent: add tests for create_logger_task function
96bc3ec2 rustjail: Add tests for hooks_grpc_to_oci
02395027 agent: modify the type of swappiness to u64
1b931f42 runtime: Allock mockfs storage to be placed in any directory
ef6d54a7 runtime: Let MockFSInit create a mock fs driver at any path
5d8438e9 runtime: Move mockfs control global into mockfs.go
963d03ea runtime: Export StoragePathSuffix
1719a8b4 runtime: Don't abuse MockStorageRootPath() for factory tests
bec59f9e runtime: Make bind mount tests better clean up after themselves
f7ba21c8 runtime: Clean up mock hook logs in tests
90b2f5b7 runtime: Make SetupOCIConfigFile clean up after itself
2eeb5dc2 runtime: Don't use fixed /tmp/mountPoint path
0ad89ebd safe-path: add more unit test cases
b63774ec libs/safe-path: add crate to safely resolve fs paths
f385b21b rustjail: add tests for mount_from function
0e7f1a5e agent: move assert_result macro to test_utils file
2256bcb6 rustjail: Add tests for root_grpc_to_oci
7b2ff026 kata-monitor: add a README file
29e569aa virtcontainers: clh: Re-generate the client code
6012c197 versions: Upgrade to Cloud Hypervisor v23.0
aabcebbf agent: best-effort removing mount point
d136c9c2 test: Fix golangci-lint error for s390x
86977ff7 kata-monitor: update the hrefs in the debug/pprof index page
78f30c33 agent: Avoid agent panic when reading empty stats
6e79042a runtime: no need to write virtiofsd error to log
9b6f24b2 agent: add tests for mount_to_rootfs function
c3776b17 agent: add tests for is_signal_handled function
9c22d955 agent: add tests for update_container_namespaces
92c00c7e agent: fsGroup support for direct-assigned volume
6e9e4e8c docs: Update link to contributions guide
532d5397 runtime: fsGroup support for direct-assigned volume
6a47b82c proto: fsGroup support for direct-assigned volume
9d5e7ee0 agent: add tests for mount_storage
f8cc5d1a kata-monitor: add some links when generating pages for browsers
c31cd0e8 rustjail: add test coverage for process_grpc_to_oci function
1118a3d2 agent: add test coverage for parse_mount_flags_and_options function
9d5b03a1 runtime: delete debug option in virtiofsd
eff7c7e0 agent: Allow the agent to be rebuilt with the change of Cargo features
b975f2e8 Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
962d05ec protocols: add src/csi.rs to .gitignore
354cd3b9 runtime: Base64 encode the direct volume mountInfo path
485aeabb agent: add tests for do_write_stream function
4405b188 docs: Add a firecracker installation guide
98750d79 clh: Expose service offload configuration
59c7165e test: use `T.TempDir` to create temporary test directory
ff17c756 runtime: Allow and require no initrd for SE
1cad3a46 agent/random: Ensure data.len > 0
33c953ac agent: Add test_ressed_rng_not_root
39a35b69 agent: Add test to random::reseed_rng()
d8f39fb2 agent/random: Rename RNDRESEEDRNG to RNDRESEEDCRNG
a2f5c176 runtime/virtcontainers: Pass the hugepages resources to agent
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.
This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed. Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.
This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.
The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts. With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.
This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that. Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.
Fixes: #4176
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add test coverage for the functions find_process and online_resources in src/sandbox.rs.
Fixes#4085Fixes#4136
Signed-off-by: Jack Hance <jack.hance@ndsu.edu>
Today in agent watchers, when we copy files/symlinks
or create directories, the ownership of the source path
is not preserved which can lead to permission issues.
In copy, ensure that we do a chown of the source path
uid/gid to the destination file/symlink after copy to
ensure that ownership matches the source ownership.
fs::copy() takes care of setting the permissions.
For directory creation, ensure that we set the
permissions of the created directory to the source
directory permissions and also perform a chown of the
source path uid/gid to ensure directory ownership
and permissions matches to the source.
Fixes: #4188
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
runk uses liboci-cli crate to parse command line options,
but liboci-cli does not support --all option for kill command,
though this is the runtime spec behavior.
But crictl will issue kill --all command when stopping containers,
as a workaround, we use a custom kill command instead of the one
provided by liboci-cli.
Fixes: #4182
Signed-off-by: Bin Liu <bin@hyper.sh>
In at least kata versions 2.3.3 and 2.4.0 it was noticed that the guest
operating system's clock would drift out of sync slowly over time
whilst the pod was running.
This had previously been raised and fixed in the old reposity via [1].
In essence kvm_ptp and chrony were paired together in order to
keep the system clock up to date with the host.
In the recent versions of kata metioned above,
the chronyd.service fails upon boot with status `266/NAMESPACE`
which seems to be due to the fact that the `/var/lib/chrony`
directory no longer exists.
This change sets the `/var/lib/chrony` directory for the `ReadWritePaths`
to be ignored when the directory does not exist, as per [2].
[1] https://github.com/kata-containers/runtime/issues/1279
[2] https://www.freedesktop.org/software/systemd
/man/systemd.exec.html#ReadWritePaths=
Fixes: #4167
Signed-off-by: Champ-Goblem <cameron_mcdermott@yahoo.co.uk>
The default runtime for io.containerd.runc.v2 is runc,
to use runk, the containerd configuration should set the
default runtime to runk or add BinaryName options for the
runtime.
Fixes: #4177
Signed-off-by: Bin Liu <bin@hyper.sh>
Update the `agent-ctl` docs to show how to use a VSOCK local address
when running the agent and the tool in the same environment. This is an
alternative to using a Unix socket.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Unbreak the `agent-ctl` tool connecting to the agent with a Unix domain
socket.
It appears that [1] changed the behaviour of connecting to the agent
using a local Unix socket (which is not used by Kata under normal
operation).
The change can be seen by reverting to commit
72b8144b56 (the one before [1]) and
running the agent manually as:
```bash
$ sudo KATA_AGENT_SERVER_ADDR=unix:///tmp/foo.socket target/x86_64-unknown-linux-musl/release/kata-agent
```
Before [1], in another terminal we see this:
```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket@
```
But now, we see the following:
```bash
$ sudo lsof -U 2>/dev/null |grep foo|awk '{print $9}'
@/tmp/foo.socket
```
Note the last byte which represents a nul (`\0`) value.
The `agent-ctl` tool used to add that trailing nul but now it seems to not
be needed, so this change removes it, restoring functionality. No
external changes are necessary so the `agent-ctl` tool can connect to
the agent as below like this:
```bash
$ cargo run -- -l debug connect --server-address "unix://@/tmp/foo.socket" --bundle-dir "$bundle_dir" -c Check -c GetGuestDetails
```
[1] - https://github.com/kata-containers/kata-containers/issues/3124Fixes: #4164.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4139
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4017
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).
Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.
The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.
The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the DiskRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.
The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user. Instead, it uses a contant value of 1000
miliseconds (1 second).
As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.
The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.
The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.
The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the NetRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add tests for parse_mount_table function in rustjail/src/mount.rs.
Includes some minor refactoring improve the testability of the
function and improve its error values.
Fixes: #4082
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
"RKE2 - Rancher's Next Generation Kuberentes Distribution" can easily be
supported by kata-deploy with some simple adjustments to what we've been
relying on for "k3s".
The main differences between k3s and RKE2 are, basically:
1. The location where the containerd configuration is stored
- k3s: /var/lib/rancher/k3s/agent/etc/containerd/
- rke2: /var/lib/rancher/rke2/agent/etc/containerd/
2. The name of the systemd services used:
- k3s: k3s.service or k3s-agent.service
- rke2: rke2-server.service or rke2-agent.service
Knowing this, let's add a new overlay for RKE2, adapt the kata-deploy
and the kata-cleanup scripts, and that's it.
Fixes: #4160
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's move the specific installation instructions, such as for k3s,
upper in the document.
This helps reading (and also skipping) according to what the user
is looking for.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Add a Rust-based standard OCI container runtime based on
Kata agent.
You can build and install runk as follows:
```sh
$ cd src/tools/runk
$ make
$ sudo make install
$ runk --help
```
Fixes: #2784
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Generate an oci-kata-agent which is a customized agent to be
called from runk which is a Rust-based standard OCI container
runtime based on Kata agent.
Fixes: #2784
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Currently the 'pc' machine type is no longer supported in kata configuration,
so remove it in the design docs.
Fixes: #4155
Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
If we fail to download the clh binary, we fall-back to build from source.
Unfortunately, `pull_clh_released_binary()` leaves a `cloud_hypervisor`
directory behind, which causes `build_clh_from_source()` not to clone
the git repo:
[ -d "${repo_dir}" ] || git clone "${cloud_hypervisor_repo}"
When building from a kata-containers git repo, the subsequent calls
to `git` in this function thus apply to the kata-containers repo and
eventually fail, e.g.:
+ git checkout v23.0
error: pathspec 'v23.0' did not match any file(s) known to git
It doesn't quite make sense actually to keep an existing directory the
content of which is arbitrary when we want to it to contain a specific
version of clh. Just remove it instead.
Fixes: #4151
Signed-off-by: Greg Kurz <groug@kaod.org>
The type of MemorySwappiness in runtime is uint64, and the type of swappiness in agent is int64,
if we set max uint64 in runtime and pass it to agent, the value will be equal to -1. We should
modify the type of swappiness to u64
Fixes: #4123
Signed-off-by: holyfei <yangfeiyu20092010@163.com>
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs. This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens). If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.
So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting(). In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.
fixes#4140
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs. This change allows it to be initialized at any path
given as a parameter. This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).
For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing. A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.
This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go. This is slightly cleaner to begin with, and will
allow some further enhancements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant. We
duplicate this information in fc.go which also needs it.
Export it from fs.go instead, so it can be used in fc.go.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory. This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.
Instead use t.TempDir() which is for exactly this purpose. As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are several tests in mount_test.go which perform a sample bind
mount. These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint. Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.
In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer. Change them all to use the t.Cleanup mechanism.
For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test. That
stops the test being messed up by a previous run, but messily. Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log. Currently we don't clean up those logs
when the tests are done. Use a test cleanup function to do this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp(). This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.
Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount. This is not cleaned up after the test. Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.
Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are always path(symlink) based attacks, so the `safe-path` crate
tries to provde some mechanisms to harden path resolution related code.
Fixes: #3451
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Move the assert_result macro to the shared test_utils file
so that it is not duplicated in individual files.
Fixes: #4093
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
During container exit, the agent tries to remove all the mount point directories,
which can fail if it's a readonly filesytem (e.g. device mapper). This commit ignores
the removal failure and logs a warning message.
Fixes: #4043
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This is to fix a test failure for the
kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job
Fixes: #4088
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.
The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.
Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.
Fixes: #4054
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log
Fixes: #4063
Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
Add test coverage for mount_to_rootfs function in src/mount.rs.
Includes minor refactoring to make function more easily testable.
Fixes#4073
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
Add test coverage for is_signal_handled function in rpc.rs. Includes
refactors to make the function testable and handle additional cases.
Fixes#3939
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
Add test coverage for update_container_namespaces function
in src/rpc.rs. Includes minor refactor to make function easier
to test.
Fixes#4034
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
Adding two functions set_ownership and
recursive_ownership_change to support changing group id
ownership for a mounted volume.
The set_ownership will be called in common_storage_handler
after mount_storage performs the mount for the volume.
set_ownership will be a noop if the FSGroup field in the
Storage struct is not set which indicates no chown will be
performed. If FSGroup field is specified, then it will
perform the recursive walk of the mounted volume path to
change ownership of all files and directories to the
desired group id. It will also configure the SetGid bit
so that files created the directory will have group
following parent directory group.
If the fsGroupChangePolicy is on root mismatch,
then the group ownership will be skipped if the root
directory group id alreasy matches the desired group
id and if the SetGid bit is also set on the root directory.
This is the same behavior as what
Kubelet does today when performing the recursive walk
to change ownership.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
This PR updates the url link to the contributions guide
at the Limitations document.
Fixes#4070
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.
Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.
And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
This change adds two fields to the Storage pb
FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.
FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.
These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.
Fixes: #4061
Signed-off-by: bin <bin@hyper.sh>
Add test coverage for the parse_mount_flags_and_options function
in src/mount.rs.
Fixes#4056
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.
Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:
virtio_fs_extra_args = ["-o", "log_level=debug"]
Fixes: #3303
Signed-off-by: bin <bin@hyper.sh>
This allows the kata-agent to be rebuilt when Cargo "features" is
changed. The Makefile for the agent do not need to specify the
sources for prerequisites by having Cargo check for the sources
changes.
Fixes: #4052
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The vhost-user-blk can be hotplugged on the PCI bridge successfully on
X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe
device can work well on ARM.
Open the "pcie_root_port" in configuration.toml is needed.
Fixes: #4019
Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
After running make in src/agent, the git working area will be changed:
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/libs/protocols/src/csi.rs
The generated file by `build.rs` should be ignored in git.
Fixes: #3959
Signed-off-by: bin <bin@hyper.sh>
Add test coverage for do_write_stream function of AgentService
in src/rpc.rs. Includes minor refactoring to make function more
easily testable.
Fixes#3984
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
This configuration option is valid for all the hypervisor that are going
to be used with the confidential containers effort, thus exposing the
configuration option for Cloud Hypervisor as well.
Fixes: #4022
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- agent: fix container stop error with signal SIGRTMIN+3
- doc: Improve kata-deploy README.md by changing sh blocks to bash blocks
- docs: Remove kata-proxy reference
- kata-monitor: fix duplicated output when printing usage
- Stop getting OOM events from agent for "ttrpc closed" error
- tools/packaging: Fix error path in `kata-deploy-binaries.sh -s`
- kata-deploy: fix version bump from -rc to stable
- release: Include all the rust vendored code into the vendored tarball
- docs: Remove VPP documentation
- runtime: Remove the explicit VirtioMem set and fix the comment
- tools/packaging/kata-deploy: Copy install_yq.sh before starting parallel builds
- docs: Remove kata-proxy references in documentation
- agent: Signal the whole process group
- osbuilder/qat: don't pull kata sources if exist
- docs: fix markdown issues in how-to-run-docker-with-kata.md
- osbuilder/qat: use centos as base OS
- docs: Update vcpu handling document
- Agent: fix unneeded late initialization lint
- static-build,clh: Add the ability to build from a PR
- Don't use a globally installed mock hook for hook tests
- ci: Weekly check whether the docs url is alive
- Multistrap Ubuntu & enable cross-building guest
- device: using const strings for block-driver option instead of hard coding
- doc: update Intel SGX use cases document
- tools: update QEMU to 6.2
- action: Update link for format patch documentation
- runtime: properly handle ESRCH error when signaling container
- docs: Update k8s documentation
- rustjail: optimization, merged several writelns into one
- doc: fix kata-deploy README typo
- versions: Upgrade to Cloud Hypervisor v22.1
- Add debug and self-test control options to Kata Manager
- scripts: Change here document delimiters
- agent: add tests for get_memory_info function
- CI: Update GHA secret name
- tools: release: Do not consider release candidates as stable releases
- kernel: fix cve-2022-0847
- docs: Update contact link in runtime README
- Improve error checking of hugepage allocation
- CI: Create GHA to add PR sizing label
- release: Revert kata-deploy changes after 2.4.0-rc0 release
2b91dcfe docs: Remove kata-proxy reference
0d765bd0 agent: fix container stop error with signal SIGRTMIN+3
a63bbf97 kata-monitor: fix duplicated output when printing usage
9e4ca0c4 doc: Improve kata-deploy README.md by changing sh blocks to bash blocks
a779e19b tools/packaging: Fix error path in 'kata-deploy-binaries.sh -s'
0baebd2b tools/packaging: Fix usage of kata-deploy-binaries.sh
3606923a workflows,release: Ship *all* the rust vendored code
2eb07455 tools: Add a generate_vendor.sh script
5e1c30d4 runtime: add logs around sandbox monitor
fb8be961 runtime: stop getting OOM events when ttrpc: closed error
93d03cc0 kata-deploy: fix version bump from -rc to stable
a9314023 docs: Remove kata-proxy references in documentation
66f05c5b runtime: Remove the explicit VirtioMem set and fix the comment
0928eb9f agent: Kill the all the container processes of the same cgroup
c2796327 osbuilder/qat: don't pull kata sources if exist
154c8b03 tools/packaging/kata-deploy: Copy install_yq.sh in a dedicated script
1ed7da8f packaging: Eliminate TTY_OPT and NO_TTY variables in kata-deploy
bad859d2 tools/packaging/kata-deploy/local-build: Add build to gitignore
19f372b5 runtime: Add more debug logs for container io stream copy
459f4bfe osbuilder/qat: use centos as base OS
9a5b4770 docs: Update vcpu handling document
ecf71d6d docs: Remove VPP documentation
c77e34de runtime: Move mock hook source
86723b51 virtcontainers: Remove unused install/uninstall targets
0e83c95f virtcontainers: Run mock hook from build tree rather than system bin dir
77434864 docs: fix markdown issues in how-to-run-docker-with-kata.md
32131cb8 Agent: fix unneeded late initialization lint
e65db838 virtcontainers: Remove VC_BIN_DIR
c20ad283 virtcontainers: Remove unused Makefile defines
c776bdf4 virtcontainers: Remove unused parameter from go-test.sh
ebec6903 static-build,clh: Add the ability to build from a PR
24b29310 doc: update Intel SGX use cases document
18d4d7fb tools: update QEMU to 6.2
62351637 action: Update link for format patch documentation
aa5ae6b1 runtime: Properly handle ESRCH error when signaling container
efa19c41 device: use const strings for block-driver option instead of hard coding
dacf6e39 doc: fix filename typo
92ce5e2d rustjail: optimization, merged several writelns into one
7a18e32f versions: Upgrade to Cloud Hypervisor v22.1
5c434270 docs: Update k8s documentation
5d6d39be scripts: Change here document delimiters
be12baf3 manager: Change here documents to use standard delimiter
9576a7da manager: Add options to change self test behaviour
d4d65bed manager: Add option to enable component debug
019da91d manager: Whitespace fix
d234cb76 manager: Create containerd link
c088a3f3 agent: add tests for get_memory_info function
4b1e2f52 CI: Update GHA secret name
ffdf961a docs: Update contact link in runtime README
5ec7592d kernel: fix cve-2022-0847
6a850899 CI: Create GHA to add PR sizing label
2b41d275 release: Revert kata-deploy changes after 2.4.0-rc0 release
4adf93ef tools: release: Do not consider release candidates as stable releases
72f7e9e3 osbuilder: Multistrap Ubuntu
df511bf1 packaging: Enable cross-building agent
0a313eda osbuilder: Fix use of LIBC in rootfs.sh
2c86b956 osbuilder: Simplify Rust installation
0072cc2b osbuilder: Remove musl installations
5c3e5536 osbuilder: apk add --no-cache
42e35505 agent: Verify that we allocated as many hugepages as we need
608e003a agent: Don't attempt to create directories for hugepage configuration
168fadf1 ci: Weekly check whether the docs url is alive
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
This PR removes the kata-proxy reference from this document as it is
not longer a component in kata 2.0
Fixes#4013
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This release changes Docker images repository from DockerHub to Amazon
ECR. This resolves the `You have reached your pull rate limit` error
when building the firecracker tarball.
Fixes#4001
Signed-off-by: Greg Kurz <groug@kaod.org>
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.
Fixes: #3990
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.
Fixes: #3924
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:
[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
-listen-address string
The address to listen on for HTTP requests. (default ":8090")
-log-level string
Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
-runtime-endpoint string
Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")
the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.
Fixes: #3998
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The idea is to pass this README file to kata-doc-to-script.sh script and
then execute the result.
Added comments with a file name on top of each YAML snippet.
This helps in assigning a file name when we cat the YAML to a file.
Fixes: #3943
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
`make kata-tarball` relies on `kata-deploy-binaries.sh -s` which
silently ignores errors, and you may end up with an incomplete
tarball without noticing it because `make`'s exit status is 0.
`kata-deploy-binaries.sh` does set the `errexit` option and all the
code in the script seems to assume that since it doesn't do error
checking. Unfortunately, bash automatically disables `errexit` when
calling a function from a conditional pipeline, like done in the `-s`
case:
if [ "${silent}" == true ]; then
if ! handle_build "${t}" &>"$log_file"; then
^^^^^^
this disables `errexit`
and `handle_build` ends with a `tar tvf` that always succeeds.
Adding error checking all over the place isn't really an option
as it would seriously obfuscate the code. Drop the conditional
pipeline instead and print the final error message from a `trap`
handler on the special ERR signal. This requires the `errtrace`
option as `trap`s aren't propagated to functions by default.
Since all outputs of `handle_build` are redirected to the build
log file, some file descriptor duplication magic is needed for
the handler to be able to write to the orignal stdout and stderr.
Fixes#3757
Signed-off-by: Greg Kurz <groug@kaod.org>
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced enerate_vendor.sh script.
Fixes: #3973
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers on a
disconnected environment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.
When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".
Fixes: #3815
Signed-off-by: bin <bin@hyper.sh>
This PR removes the kata-proxy references in VSocks documentation,
as this is not a component in kata 2.0 and all the examples that
were used belonged to kata 1.x.
Fixes#3980
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Modify the 2Mib in the comment to 4Mib.
VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true.
Fixes: #3750
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
For the library `procfs`, the unit of values in `CpuTime` is ticks,
and we do not know how many ticks per second from metrics because the
`tps` in `CpuTime` is private.
But there are some implements in `CpuTime` for getting these values,
e.g., `user_ms()` for `user`, and `nice_ms()` for `nice`. With these
values, accurate time can be obtained.
Fixes: #3979
Acked-by: zhaojizhuang <571130360@qq.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.
Also
- remove redundant code for image/initrd checking -- no need to check in
`newQemuHypervisorConfig` (calling) when it is also checked in
`getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible
Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
'make kata-tarball' sometimes fails early with:
cp: cannot create regular file '[...]/tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh': File exists
This happens because all assets are built in parallel using the same
`kata-deploy-binaries-in-docker.sh` script, and thus all try to copy
the `install_yq.sh` script to the same location with the `cp` command.
This is a well known race condition that cannot be avoided without
serialization of `cp` invocations.
Move the copying of `install_yq.sh` to a separate script and ensure
it is called *before* parallel builds. Make the presence of the copy
a prerequisite for each sub-build so that they still can be triggered
individually. Update the GH release workflow to also call this script
before calling `kata-deploy-binaries-in-docker.sh`.
Fixes#3756
Signed-off-by: Greg Kurz <groug@kaod.org>
NO_TTY configured whether to add the -t option to docker run. It makes no
sense for the caller to configure this, since whether you need it depends
on the commands you're running. Since the point here is to run
non-interactive build scripts, we don't need -t, or -i either.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
This directory consists entirely of files built during a make kata-tarball,
so it should not be committed to the tree. A symbolic link to this directory
might be created during 'make tarball', ignore it as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[greg: - rearranged the subject to make the subsystem checker happy
- also ignore the symbolic link created by
`kata-deploy-binaries-in-docker.sh`]
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR updates the vcpu handling document by removing docker information
which is not longer being used in kata 2.x and leaving only k8s information.
Fixes#3950
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR is removing VPP documentation as it is not longer valid with
kata 2.x, all the instructions were used for kata 1.x
Fixes#3946
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go. It doesn't really have anything
to do with the rest of the virtcontainers package.
So, move it next to the test code that uses it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now removed the need to install the mock hook binary for unit tests.
However, it turns out that managing that was the *only* thing that the
install and uninstall targets in the virtcontainers Makefile handled.
So, remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running unit tests should generally have minimal dependencies on
things outside the build tree. It *definitely* shouldn't modify
system wide things outside the build tree. Currently the runtime
"make test" target does so, though.
Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary. They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.
Go tests automatically run within the package's directory though, so
there's no need to use a system wide path. We can use a relative path to
the binary build within the tree just as easily.
fixes#3941
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused.
It's used to generate TEST_BIN_DIR, and it's created in the install target.
However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR
with mkdir -p, so it will necessarily create VC_BIN_DIR along the way.
So we can drop the unnecessary mkdir and expand the definition of
VC_BIN_DIR in the definition of TEST_BIN_DIR.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers
Makefile (unlike those from the runtime Makefile in the parent directory)
are entirely unused. Remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The check-go-test target passes the path to the mock hook test binary to
go-test.sh when it invokes it. But go-test.sh just calls run_go_test from
ci/lib.sh, which invokes a script from the tests repo *without* any
parameters.
That is, this parameter is ignored anyway, so remove it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Right now it doesn't do much for us, as we're always building from a
specific version. However, this opens the possibility for us to add a
CI, similar to the one we have for CRI-O, for testing against each
cloud-hypervisor PR, on the cloud-hypervisor branch.
Fixes: #3908
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Installation section is not longer needed because of the latest
default kata kernel supports Intel SGX.
Include QEMU to the list of supported hypervisors.
fixes#3911
Signed-off-by: Julio Montes <julio.montes@intel.com>
bring Intel SGX support
Changes tha may impact in Kata Containers
Arm:
The 'virt' machine now supports an emulated ITS
The 'virt' machine now supports more than 123 CPUs in TCG emulation mode
The pl031 real-time clock device now supports sending RTC_CHANGE QMP events
PowerPC:
Improved POWER10 support for the 'powernv' machine
Initial support for POWER10 DD2.0 CPU added
Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine
type
s390x:
Improved storage key emulation (e.g. fixed address handling, lazy
storage key enablement for TCG, ...)
New gen16 CPU features are now enabled automatically in the latest
machine type
KVM:
Support for SGX in the virtual machine, using the /dev/sgx_vepc device
on the host and the "memory-backend-epc" backend in QEMU.
New "hv-apicv" CPU property (aliased to "hv-avic") sets the
HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.
virtio-mem:
QEMU now fully supports guest memory dumps with virtio-mem.
QEMU now cleanly supports precopy migration, postcopy migration and
background snapshots with virtio-mem.
fixes#3902
Signed-off-by: Julio Montes <julio.montes@intel.com>
This PR updates the link for the format patch documentation for the
commit message check.
Fixes#3900
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.
Fixes: #3874
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.
Fixes: #3321
Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
Update documentation with missing step to untaint node to enable
scheduling and update the example to run a pod using the kata runtime
class instead of untrusted workloads, which applies to versions of CRI-O
prior to v1.12.
Fixes#3863
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
All scripts should use `EOF` as the shell here document delimiter as
this is checked by the static checker.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Added new `kata-manager` options to control the self-test behaviour. By
default, after installation the manager will run a test to ensure a Kata
Containers container can be created. New options allow:
- The self test to be disabled.
- Only the self test to be run (no installation).
These features allow changes to be made to the installed system before
the self test is run.
Fixes: #3851.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make the `kata-manager` create a `containerd` link to ensure the
downloaded containerd systemd service file can find the daemon when
using the GitHub packaged version of containerd.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add test coverage for get_memory_info function in src/rpc.rs. Includes
some minor refactoring of the function.
Fixes#3837
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
Change the secret used by the GitHub Action that adds the PR size
label to one with the correct set of privileges.
Fixes: #3856.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
As 2.4.0-rc0 has been released, let's switch the kata-deploy / kata-cleanup
tags back to "latest", and re-add the kata-deploy-stable and the
kata-cleanup-stable files.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
During the release of 2.4.0-rc0 @egernst noticed an incositency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider this as "latest".
Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside of
the scope of this quick fix.
For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and kata-deploy
test, and tag "-rc" as latest, regardless of which branch it's coming
from.
Fixes: #3847
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use `multistrap` for building Ubuntu rootfs. Adds support for building
for foreign architectures using the `ARCH` environment variable.
In the process, the Ubuntu rootfs workflow is vastly simplified.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Requires setting ARCH and CC.
- Add CC linker option for building agent.
- Set host for building libseccomp.
Fixes: #3681
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
- Add a doc comment
- Pass to build container, e.g. to build x86_64 with glibc (would
always use musl)
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Remove a lot of cruft of musl installations -- we needed those for the
Go agent, but Rustup just takes care of everything. aarch64 on
Debian-based & Alpine is an exception -- create a symlink
`aarch64-linux-musl-gcc` to `musl-tools`'s `musl-gcc` or `gcc` on
Alpine. This is unified -- arch-specific Dockerfiles are removed.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Hadolint DL3019. If you're wondering why this is in this PR, that's
because I touch the file later, and we're only triggering the lints for
changed files.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
allocate_hugepages() writes to the kernel sysfs file to allocate hugepages
in the Kata VM. However, even if the write succeeds, it's not certain that
the kernel will actually be able to allocate as many hugepages as we
requested.
This patch reads back the file after writing it to check if we were able to
allocate all the required hugepages.
fixes#3816
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
allocate_hugepages() constructs the path for the sysfs directory containing
hugepage configuration, then attempts to create this directory if it does
not exist.
This doesn't make sense: sysfs is a view into kernel configuration, if the
kernel has support for the hugepage size, the directory will already be
there, if it doesn't, trying to create it won't help.
For the same reason, attempting to create the "nr_hugepages" file
itself is pointless, so there's no reason to call
OpenOptions::create(true).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Weekly check(at 23:00 every Sunday) whether the docs url is ALIVE, so that
we can find the failed url in time
Fixes#815
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
2022-01-20 19:56:15 +08:00
3517 changed files with 811278 additions and 41721 deletions
@@ -116,7 +119,9 @@ The table below lists the core parts of the project:
| Component | Type | Description |
|-|-|-|
| [runtime](src/runtime) | core | Main component run by a container manager and providing a containerd shimv2 runtime implementation. |
| [runtime-rs](src/runtime-rs) | core | The Rust version runtime. |
| [agent](src/agent) | core | Management process running inside the virtual machine / POD that sets up the container environment. |
| [`dragonball`](src/dragonball) | core | An optional built-in VMM brings out-of-the-box Kata Containers experience with optimizations on container workloads |
| [documentation](docs) | documentation | Documentation common to all components (such as design and install documentation). |
| [tests](https://github.com/kata-containers/tests) | tests | Excludes unit tests which live with the main code. |
@@ -129,8 +134,12 @@ The table below lists the remaining parts of the project:
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [kata-debug](tools/packaging/kata-debug/README.md) | infrastructure | Utility tool to gather Kata Containers debug information from Kubernetes clusters. |
| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`kata-ctl`](src/tools/kata-ctl) | utility | Tool that provides advanced commands and debug facilities. |
| [`log-parser-rs`](src/tools/log-parser-rs) | utility | Tool that aid in analyzing logs from the kata runtime. |
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
@@ -138,8 +147,10 @@ The table below lists the remaining parts of the project:
Kata Containers is now
[available natively for most distributions](docs/install/README.md#packaged-installation-methods).
However, packaging scripts and metadata are still used to generate snap and GitHub releases. See
the [components](#components) section for further details.
## Metrics tests
See the [metrics documentation](tests/metrics/README.md).
This document is written **specifically for developers**: it is not intended for end users.
If you want to contribute changes that you have made, please read the [community guidelines](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md) for information about our processes.
# Assumptions
- You are working on a non-critical test or development system.
@@ -33,51 +35,41 @@ You need to install the following to build Kata Containers components:
-`make`.
-`gcc` (required for building the shim and runtime).
# Build and install the Kata Containers runtime
# Build and install Kata Containers
## Build and install the Kata Containers runtime
```
$ go get -d -u github.com/kata-containers/kata-containers
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/src/runtime
The agent is built with a statically linked `musl.` The default `libc` used is `musl`, but on `ppc64le` and `s390x`, `gnu` should be used. To configure this:
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
> - If `loop` module is not probed, you will likely see errors such as `losetup: cannot find an unused loop device`. Execute `modprobe loop` could resolve it.
If you do not want to install the respective QEMU version, the configuration file can be modified to point to the correct binary. In `/etc/kata-containers/configuration.toml`, change `path = "/path/to/qemu/build/qemu-system-x86_64"` to point to the correct QEMU binary.
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
@@ -439,11 +494,33 @@ See the [static-build script for QEMU](../tools/packaging/static-build/qemu/buil
> under upstream review for supporting NVDIMM on aarch64.
>
You could build the custom `qemu-system-aarch64` as required with the following command:
Modify `/etc/kata-containers/configuration.toml` and update value `virtio_fs_daemon = "/path/to/kata-containers/tools/packaging/static-build/virtiofsd/virtiofsd/virtiofsd"` to point to the binary.
# Check hardware requirements
You can check if your system is capable of creating a Kata Container by running the following:
```bash
$ sudo kata-runtime check
```
If your system is *not* able to run Kata Containers, the previous command will error out and explain why.
# Run Kata Containers with Containerd
Refer to the [How to use Kata Containers and Containerd](how-to/containerd-kata.md) how-to guide.
@@ -465,7 +542,7 @@ script and paste its output directly into a
@@ -491,7 +568,7 @@ contain either `/bin/sh` or `/bin/bash`.
Enable debug_console_enabled in the `configuration.toml` configuration file:
```
```toml
[agent.kata]
debug_console_enabled=true
```
@@ -502,7 +579,7 @@ This will pass `agent.debug_console agent.debug_console_vport=1026` to agent as
For Kata Containers `2.0.x` releases, the `kata-runtime exec` command depends on the`kata-monitor` running, in order to get the sandbox's `vsock` address to connect to. Thus, first start the `kata-monitor` process.
```
```bash
$ sudo kata-monitor
```
@@ -510,10 +587,15 @@ $ sudo kata-monitor
#### Connect to debug console
Command `kata-runtime exec` is used to connect to the debug console.
You need to start a container for example:
```bash
$ sudo ctr run --runtime io.containerd.kata.v2 -d docker.io/library/ubuntu:latest testdebug
```
Then, you can use the command `kata-runtime exec <sandbox id>` to connect to the debug console.
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/master/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
`kata-runtime exec` has a command-line option `runtime-namespace`, which is used to specify under which [runtime namespace](https://github.com/containerd/containerd/blob/main/docs/namespaces.md) the particular pod was created. By default, it is set to `k8s.io` and works for containerd when configured
with Kubernetes. For CRI-O, the namespace should set to `default` explicitly. This should not be confused with [Kubernetes namespaces](https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/).
For other CRI-runtimes and configurations, you may need to set the namespace utilizing the `runtime-namespace` option.
@@ -564,10 +646,10 @@ an additional `coreutils` package.
For example using CentOS:
```
$ cd $GOPATH/src/github.com/kata-containers/kata-containers/tools/osbuilder/rootfs-builder
Note that the OCI standard does not specify `checkpoint` and `restore`
@@ -93,6 +102,42 @@ All other configurations are supported and are working properly.
## Networking
### Host network
Host network (`nerdctl/docker run --net=host`or [Kubernetes `HostNetwork`](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#hosts-namespaces)) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### Support for joining an existing VM network
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
* Using an [application token](https://github.com/settings/tokens) is required for hub.
* Using an [application token](https://github.com/settings/tokens) is required for hub (set to a GITHUB_TOKEN environment variable).
- GitHub permissions to push tags and create releases in Kata repositories.
- GPG configured to sign git tags. https://help.github.com/articles/generating-a-new-gpg-key/
- GPG configured to sign git tags. https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key
- You should configure your GitHub to use your ssh keys (to push to branches). See https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/.
* As an alternative, configure hub to push and fork with HTTPS, `git config --global hub.protocol https` (Not tested yet) *
If you create a new stable branch, i.e. if your release changes a major or minor version number (not a patch release), then
you should modify the `tests` repository to point to that newly created stable branch and not the `main` branch.
The objective is that changes in the CI on the main branch will not impact the stable branch.
In the test directory, change references the main branch in:
* `README.md`
* `versions.yaml`
* `cmd/github-labels/labels.yaml.in`
* `cmd/pmemctl/pmemctl.sh`
* `.ci/lib.sh`
* `.ci/static-checks.sh`
See the commits in [the corresponding PR for stable-2.1](https://github.com/kata-containers/tests/pull/3504) for an example of the changes.
### Merge all bump version Pull requests
- The above step will create a GitHub pull request in the Kata projects. Trigger the CI using `/test` command on each bump Pull request.
- Trigger the test-kata-deploy workflow on the kata-containers repository bump Pull request using `/test_kata_deploy` (monitor under the "action" tab).
- Trigger the `test-kata-deploy` workflow which is under the `Actions` tab on the repository GitHub page (make sure to select the correct branch and validate it passes).
- Check any failures and fix if needed.
- Work with the Kata approvers to verify that the CI works and the pull requests are merged.
@@ -63,6 +46,24 @@
$ ./tag_repos.sh -p -b "$BRANCH" tag
```
### Point tests repository to stable branch
If your release changes a major or minor version number(not a patch release), then the above
`./tag_repos.sh` script will create a new stable branch in all the repositories in addition to tagging them.
This happens when you are making the first `rc` release for a new major or minor version in Kata.
In this case, you should modify the `tests` repository to point to the newly created stable branch and not the `main` branch.
The objective is that changes in the CI on the main branch will not impact the stable branch.
In the test directory, change references of the `main` branch to the new stable branch in:
* `README.md`
* `versions.yaml`
* `cmd/github-labels/labels.yaml.in`
* `cmd/pmemctl/pmemctl.sh`
* `.ci/lib.sh`
* `.ci/static-checks.sh`
See the commits in [the corresponding PR for stable-2.1](https://github.com/kata-containers/tests/pull/3504) for an example of the changes.
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](../.github/workflows/release.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
As we know, we can interact with cgroups in two ways, **`cgroupfs`** and **`systemd`**. The former is achieved by reading and writing cgroup `tmpfs` files under `/sys/fs/cgroup` while the latter is done by configuring a transient unit by requesting systemd. Kata agent uses **`cgroupfs`** by default, unless you pass the parameter `--systemd-cgroup`.
## usage
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
> Here slice is a systemd slice under which the container is placed. If empty, it defaults to system.slice, except when cgroup v2 is used and rootless container is created, in which case it defaults to user.slice.
>
> Note that slice can contain dashes to denote a sub-slice (e.g. user-1000.slice is a correct notation, meaning a `subslice` of user.slice), but it must not contain slashes (e.g. user.slice/user-1000.slice is invalid).
>
> A slice of `-` represents a root slice.
>
> Next, prefix and name are used to compose the unit name, which is `<prefix>-<name>.scope`, unless name has `.slice` suffix, in which case prefix is ignored and the name is used as is.
## supported properties
The kata agent will translate the parameters in the `linux.resources` of `config.json` into systemd unit properties, and send it to systemd for configuration. Since systemd supports limited properties, only the following parameters in `linux.resources` will be applied. We will simply treat hybrid mode as legacy mode by the way.
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
In cloud-native scenarios, there is an increased demand for container startup speed, resource consumption, stability, and security, areas where the present Kata Containers runtime is challenged relative to other runtimes. To achieve this, we propose a solid, field-tested and secure Rust version of the kata-runtime.
Also, we provide the following designs:
- Turn key solution with builtin `Dragonball` Sandbox
- Async I/O to reduce resource consumption
- Extensible framework for multiple services, runtimes and hypervisors
- Lifecycle management for sandbox and container associated resources
### Rationale for choosing Rust
We chose Rust because it is designed as a system language with a focus on efficiency.
In contrast to Go, Rust makes a variety of design trade-offs in order to obtain
good execution performance, with innovative techniques that, in contrast to C or
C++, provide reasonable protection against common memory errors (buffer
overflow, invalid pointers, range errors), error checking (ensuring errors are
dealt with), thread safety, ownership of resources, and more.
These benefits were verified in our project when the Kata Containers guest agent
was rewritten in Rust. We notably saw a significant reduction in memory usage
with the Rust-based implementation.
## Design
### Architecture

### Built-in VMM
#### Current Kata 2.x architecture

As shown in the figure, runtime and VMM are separate processes. The runtime process forks the VMM process and interacts through the inter-process RPC. Typically, process interaction consumes more resources than peers within the process, and it will result in relatively low efficiency. At the same time, the cost of resource operation and maintenance should be considered. For example, when performing resource recovery under abnormal conditions, the exception of any process must be detected by others and activate the appropriate resource recovery process. If there are additional processes, the recovery becomes even more difficult.
#### How To Support Built-in VMM
We provide `Dragonball` Sandbox to enable built-in VMM by integrating VMM's function into the Rust library. We could perform VMM-related functionalities by using the library. Because runtime and VMM are in the same process, there is a benefit in terms of message processing speed and API synchronization. It can also guarantee the consistency of the runtime and the VMM life cycle, reducing resource recovery and exception handling maintenance, as shown in the figure:

### Async Support
#### Why Need Async
**Async is already in stable Rust and allows us to write async code**
- Async provides significantly reduced CPU and memory overhead, especially for workloads with a large amount of IO-bound tasks
- Async is zero-cost in Rust, which means that you only pay for what you use. Specifically, you can use async without heap allocations and dynamic dispatch, which greatly improves efficiency
- For more (see [Why Async?](https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html) and [The State of Asynchronous Rust](https://rust-lang.github.io/async-book/01_getting_started/03_state_of_async_rust.html)).
**There may be several problems if implementing kata-runtime with Sync Rust**
- In Sync mode, implementing a timeout mechanism is challenging. For example, in TTRPC API interaction, the timeout mechanism is difficult to align with Golang
#### How To Support Async
The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thread, which is 2 threads by default. For TTRPC and container-related threads run in the `tokio` thread in a unified manner, and related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support no-block I/O and timer. Currently, we only utilize Async for kata-runtime. The built-in VMM keeps the OS thread because it can ensure that the threads are controllable.
**For N tokio worker threads and M containers**
- Sync runtime(both OS thread and `tokio` task are OS thread but without `tokio` worker thread) OS thread number: 4 + 12*M
- Async runtime(only OS thread is OS thread) OS thread number: 2 + N
```shell
├─ main(OS thread)
├─ async-logger(OS thread)
└─ tokio worker(N * OS thread)
├─ agent log forwarder(1 * tokio task)
├─ health check thread(1 * tokio task)
├─ TTRPC reaper thread(M * tokio task)
├─ TTRPC listener thread(M * tokio task)
├─ TTRPC client handler thread(7 * M * tokio task)
├─ container stdin io thread(M * tokio task)
├─ container stdout io thread(M * tokio task)
└─ container stderr io thread(M * tokio task)
```
### Extensible Framework
The Kata 3.x runtime is designed with the extension of service, runtime, and hypervisor, combined with configuration to meet the needs of different scenarios. At present, the service provides a register mechanism to support multiple services. Services could interact with runtime through messages. In addition, the runtime handler handles messages from services. To meet the needs of a binary that supports multiple runtimes and hypervisors, the startup must obtain the runtime handler type and hypervisor type through configuration.

### Resource Manager
In our case, there will be a variety of resources, and every resource has several subtypes. Especially for `Virt-Container`, every subtype of resource has different operations. And there may be dependencies, such as the share-fs rootfs and the share-fs volume will use share-fs resources to share files to the VM. Currently, network and share-fs are regarded as sandbox resources, while rootfs, volume, and cgroup are regarded as container resources. Also, we abstract a common interface for each resource and use subclass operations to evaluate the differences between different subtypes.

## Roadmap
- Stage 1 (June): provide basic features (current delivered)
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. Service is an interface, which is responsible for handling multiple services like task service, image service and etc.
2. Message dispatcher, it is used to match multiple requests from the service module.
3. Runtime handler is used to deal with the operation for sandbox and container.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcomed.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of go version kata. And it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before the Kata 3.x is merging into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
The `Dragonball` could work as an external hypervisor. However, stability and performance is challenging in this case. Built in VMM could optimise the container overhead, and it's easy to maintain stability.
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how that would work. Are they working in separate process?
Yes. They are unable to work as built in VMM.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI based CPU/memory/device hotplug. And we may cooperate with the community to add support for ACPI based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock-based` direct communication tool between VMM and guests. The server side of the `upcall` is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests once the kernel has started. And the client side is in VMM , it'll be a thread that communicates with VSOCK through `uds`. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid virtualization of ACPI to minimize virtual machine's overhead. And there could be many other usage through this direct communication channel. It's already open source.
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable, we have ported it to 5.10 based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
It's almost platform independent, but some message related to CPU hotplug are platform dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1.`TDX` doesn't support CPU/memory hotplug yet.
2. For ACPI based device hotplug, it depends on ACPI `DSDT` table, and the guest kernel will execute `ASL` code to handle during handling those hotplug event. And it should be easier to audit VSOCK based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.
Core scheduling is a Linux kernel feature that allows only trusted tasks to run concurrently on
CPUs sharing compute resources (for example, hyper-threads on a core).
Containerd versions >= 1.6.4 leverage this to treat all of the processes associated with a
given pod or container to be a single group of trusted tasks. To indicate this should be carried
out, containerd sets the `SCHED_CORE` environment variable for each shim it spawns. When this is
set, the Kata Containers shim implementation uses the `prctl` syscall to create a new core scheduling
domain for the shim process itself as well as future VMM processes it will start.
For more details on the core scheduling feature, see the [Linux documentation](https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html).
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, it’s cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, it’s difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. It’s also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
typeMountInfostruct{
// The type of the volume (ie. block)
VolumeTypestring`json:"volume-type"`
// The device backing the volume.
Devicestring`json:"device"`
// The filesystem type to be mounted on the volume.
FsTypestring`json:"fstype"`
// Additional metadata to pass to the agent regarding this volume.
Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime, it shouldn't container any secrets (such as SMB mount password).
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mountInfo.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
`containerd-shim-kata-v2` provides an API for sandbox management through a Unix domain socket. Two new handlers are proposed: `/direct-volume/stats` and `/direct-volume/resize`:
Example:
```bash
$ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volume/stats/[urlSafeVolumePath]'
Let’s assume that changes have been made in the `aws-ebs-csi-driver` node driver.
**Node publish volume**
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Use the volume in sandbox**
1. Upon the request to start a container, the `containerd-shim-kata-v2` examines the container spec,
and iterates through the mounts. For each mount, if there is a `mountInfo.json` file under `/run/kata-containers/shared/direct-volumes/[mount source path]`,
it generates a `storage` GRPC object after overwriting the mount spec with the information in `mountInfo.json`.
2. The shim sends the storage objects to kata-agent through TTRPC.
3. The shim writes a file with the sandbox id as the name under `/run/kata-containers/shared/direct-volumes/[mount source path]`.
4. The kata-agent mounts the storage objects for the container.
**Node expand volume**
1. In the node CSI driver, the `NodeExpandVolume` API invokes: `kata-runtime direct-volume resize –-volume-path "/kubelet/a/b/c/d/sdf" –-size 8Gi`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to resize the volume.
4. The shim handles the request, asks the hypervisor to resize the block device and sends a GRPC request to Kata agent to resize the filesystem.
5. Kata agent receives the request and resizes the filesystem.
**Node get volume stats**
1. In the node CSI driver, the `NodeGetVolumeStats` API invokes: `kata-runtime direct-volume stats –-volume-path "/kubelet/a/b/c/d/sdf"`.
2. The Kata runtime checks whether there is a sandbox id file under the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
3. The Kata runtime identifies the shim instance through the sandbox id, and sends a GRPC request to get the volume stats.
4. The shim handles the request and forwards it to the Kata agent.
5. Kata agent receives the request and returns the filesystem stats.
During container's lifecycle, different Hooks can be executed to do custom actions. In Kata Containers, we support two types of Hooks, `OCI Hooks` and `Kata Hooks`.
### OCI Hooks
The OCI Spec stipulates six hooks that can be executed at different time points and namespaces, including `Prestart Hooks`, `CreateRuntime Hooks`, `CreateContainer Hooks`, `StartContainer Hooks`, `Poststart Hooks` and `Poststop Hooks`. We support these types of Hooks as compatible as possible in Kata Containers.
The path and arguments of these hooks will be passed to Kata for execution via `bundle/config.json`. For example:
```
...
"hooks": {
"prestart": [
{
"path": "/usr/bin/prestart-hook",
"args": ["prestart-hook", "arg1", "arg2"],
"env": [ "key1=value1"]
}
],
"createRuntime": [
{
"path": "/usr/bin/createRuntime-hook",
"args": ["createRuntime-hook", "arg1", "arg2"],
"env": [ "key1=value1"]
}
]
}
...
```
### Kata Hooks
In Kata, we support another three kinds of hooks executed in guest VM, including `Guest Prestart Hook`, `Guest Poststart Hook`, `Guest Poststop Hook`.
The executable files for Kata Hooks must be packaged in the *guest rootfs*. The file path to those guest hooks should be specified in the configuration file, and guest hooks must be stored in a subdirectory of `guest_hook_path` according to their hook type. For example:
+ In configuration file:
```
guest_hook_path="/usr/share/hooks"
```
+ In guest rootfs, prestart-hook is stored in `/usr/share/hooks/prestart/prestart-hook`.
## Execution
The table below summarized when and where those different hooks will be executed in Kata Containers:
| Hook Name | Hook Type | Hook Path | Exec Place | Exec Time |
|---|---|---|---|---|
| `Prestart(deprecated)` | OCI hook | host runtime namespace | host runtime namespace | After VM is started, before container is created. |
| `CreateRuntime` | OCI hook | host runtime namespace | host runtime namespace | After VM is started, before container is created, after `Prestart` hooks. |
| `CreateContainer` | OCI hook | host runtime namespace | host vmm namespace* | After VM is started, before container is created, after `CreateRuntime` hooks. |
| `StartContainer` | OCI hook | guest container namespace | guest container namespace | After container is created, before container is started. |
| `Poststart` | OCI hook | host runtime namespace | host runtime namespace | After container is started, before start operation returns. |
| `Poststop` | OCI hook | host runtime namespace | host runtime namespace | After container is deleted, before delete operation returns. |
| `Guest Prestart` | Kata hook | guest agent namespace | guest agent namespace | During start operation, before container command is executed. |
| `Guest Poststart` | Kata hook | guest agent namespace | guest agent namespace | During start operation, after container command is executed, before start operation returns. |
| `Guest Poststop` | Kata hook | guest agent namespace | guest agent namespace | During delete operation, after container is deleted, before delete operation returns. |
+ `Hook Path` specifies where hook's path be resolved.
+ `Exec Place` specifies in which namespace those hooks can be executed.
+ For `CreateContainer` Hooks, OCI requires to run them inside the container namespace while the hook executable path is in the host runtime, which is a non-starter for VM-based containers. So we design to keep them running in the *host vmm namespace.*
+ `Exec Time` specifies at which time point those hooks can be executed.
@@ -51,6 +51,7 @@ The `kata-monitor` management agent should be started on each node where the Kat
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
- Aggregate sandbox metrics running on the node, adding the `sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This simplifies the targets count in Prometheus and avoids exposing shim's metrics by `ip:port`.
Only one `kata-monitor` process runs in each node.
| 🚧 | `kata_shim_pod_overhead_cpu`: <br> Kata Pod overhead for CPU resources(percent). | `GAUGE` | percent | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_memory_in_bytes`: <br> Kata Pod overhead for memory resources(bytes). | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_stat`: <br> Kata containerd shim v2 process statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_status`: <br> Kata containerd shim v2 process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpmd`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_max_bytes`: <br> Maximum amount of virtual memory available in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_threads`: <br> Kata containerd shim v2 process threads. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
### Kata Hypervisor
Different from golang runtime, hypervisor and shim in runtime-rs belong to the **same process**, so all previous metrics for hypervisor and shim only need to be gathered once. Thus, we currently only collect previous metrics in kata shim.
At the same time, we added the interface(`VmmAction::GetHypervisorMetrics`) to gather hypervisor metrics, in case we design tailor-made metrics for hypervisor in the future. Here're metrics exposed from [src/dragonball/src/metric.rs](https://github.com/kata-containers/kata-containers/blob/main/src/dragonball/src/metric.rs).
# Virtual machine vCPU sizing in Kata Containers 3.0
> Preview:
> [Kubernetes(since 1.23)][1] and [Containerd(since 1.6.0-beta4)][2] will help calculate `Sandbox Size` info and pass it to Kata Containers through annotations.
> In order to adapt to this beneficial change and be compatible with the past, we have implemented the new vCPUs handling way in `runtime-rs`, which is slightly different from the original `runtime-go`'s design.
## When do we need to handle vCPUs size?
vCPUs sizing should be determined by the container workloads. So throughout the life cycle of Kata Containers, there are several points in time when we need to think about how many vCPUs should be at the time. Mainly including the time points of `CreateVM`, `CreateContainer`, `UpdateContainer`, and `DeleteContainer`.
*`CreateVM`: When creating a sandbox, we need to know how many vCPUs to start the VM with.
*`CreateContainer`: When creating a new container in the VM, we may need to hot-plug the vCPUs according to the requirements in container's spec.
*`UpdateContainer`: When receiving the `UpdateContainer` request, we may need to update the vCPU resources according to the new requirements of the container.
*`DeleteContainer`: When a container is removed from the VM, we may need to hot-unplug the vCPUs to reclaim the vCPU resources introduced by the container.
## On what basis do we calculate the number of vCPUs?
When Kata calculate the number of vCPUs, We have three data sources, the `default_vcpus` and `default_maxvcpus` specified in the configuration file (named `TomlConfig` later in the doc), the `io.kubernetes.cri.sandbox-cpu-quota` and `io.kubernetes.cri.sandbox-cpu-period` annotations passed by the upper layer runtime, and the corresponding CPU resource part in the container's spec for the container when `CreateContainer`/`UpdateContainer`/`DeleteContainer` is requested.
Our understanding and priority of these resources are as follows, which will affect how we calculate the number of vCPUs later.
* From `TomlConfig`:
*`default_vcpus`: default number of vCPUs when starting a VM.
*`default_maxvcpus`: maximum number of vCPUs.
* From `Annotation`:
*`InitialSize`: we call the size of the resource passed from the annotations as `InitialSize`. Kubernetes will calculate the sandbox size according to the Pod's statement, which is the `InitialSize` here. This size should be the size we want to prioritize.
* From `Container Spec`:
* The amount of CPU resources that the Container wants to use will be declared through the spec. Including the aforementioned annotations, we mainly consider `cpu quota` and `cpuset` when calculating the number of vCPUs.
*`cpu quota`: `cpu quota` is the most common way to declare the amount of CPU resources. The number of vCPUs introduced by `cpu quota` declared in a container's spec is: `vCPUs = ceiling( quota / period )`.
*`cpuset`: `cpuset` is often used to bind the CPUs that tasks can run on. The number of vCPUs may introduced by `cpuset` declared in a container's spec is the number of CPUs specified in the set that do not overlap with other containers.
## How to calculate and adjust the vCPUs size:
There are two types of vCPUs that we need to consider, one is the number of vCPUs when starting the VM (named `Boot Size` in the doc). The second is the number of vCPUs when `CreateContainer`/`UpdateContainer`/`DeleteContainer` request is received (`Real-time Size` in the doc).
### `Boot Size`
The main considerations are `InitialSize` and `default_vcpus`. There are the following principles:
`InitialSize` has priority over `default_vcpus` declared in `TomlConfig`.
1. When there is such an annotation statement, the originally `default_vcpus` will be modified to the number of vCPUs in the `InitialSize` as the `Boot Size`. (Because not all runtimes support this annotation for the time being, we still keep the `default_cpus` in `TomlConfig`.)
2. When the specs of all containers are aggregated for sandbox size calculation, the method is consistent with the calculation method of `InitialSize` here.
### `Real-time Size`
When we receive an OCI request, it may be for a single container. But what we have to consider is the number of vCPUs for the entire VM. So we will maintain a list. Every time there is a demand for adjustment, the entire list will be traversed to calculate a value for the number of vCPUs. In addition, there are the following principles:
1. Do not cut computing power and try to keep the number of vCPUs specified by `InitialSize`.
* So the number of vCPUs after will not be less than the `Boot Size`.
2.`cpu quota` takes precedence over `cpuset` and the setting history are took into account.
* We think quota describes the CPU time slice that a cgroup can use, and `cpuset` describes the actual CPU number that a cgroup can use. Quota can better describe the size of the CPU time slice that a cgroup actually wants to use. The `cpuset` only describes which CPUs the cgroup can use, but the cgroup can use the specified CPU but consumes a smaller time slice, so the quota takes precedence over the `cpuset`.
* On the one hand, when both `cpu quota` and `cpuset` are specified, we will calculate the number of vCPUs based on `cpu quota` and ignore `cpuset`. On the other hand, if `cpu quota` was used to control the number of vCPUs in the past, and only `cpuset` was updated during `UpdateContainer`, we will not adjust the number of vCPUs at this time.
3.`StaticSandboxResourceMgmt` controls hotplug.
* Some VMMs and kernels of some architectures do not support hotplugging. We can accommodate this situation through `StaticSandboxResourceMgmt`. When `StaticSandboxResourceMgmt = true` is set, we don't make any further attempts to update the number of vCPUs after booting.
# Design Doc for Kata Containers' VCPUs Pinning Feature
## Background
By now, vCPU threads of Kata Containers are scheduled randomly to CPUs. And each pod would request a specific set of CPUs which we call it CPU set (just the CPU set meaning in Linux cgroups).
If the number of vCPU threads are equal to that of CPUs claimed in CPU set, we can then pin each vCPU thread to one specified CPU, to reduce the cost of random scheduling.
## Detailed Design
### Passing Config Parameters
Two ways are provided to use this vCPU thread pinning feature: through `QEMU` configuration file and through annotations. Finally the pinning parameter is passed to `HypervisorConfig`.
| Official Doc Page | https://pkg.go.dev/golang.org/x/sys/unix#SchedSetaffinity |
### When is VCPUs Pinning Checked?
As shown in Section 1, when `num(vCPU threads) == num(CPUs in CPU set)`, we shall pin each vCPU thread to a specified CPU. And when this condition is broken, we should restore to the original random scheduling pattern.
So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
The following configuration includes three runtime classes:
The following configuration includes two runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/master/runtime/v2#binary-naming))
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/runtime/v2#binary-naming))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2)).
- `plugins.cri.containerd.runtimes.katacli`: the `containerd-shim-runc-v1` calls `kata-runtime`, which is the legacy process.
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2)).
From Containerd v1.2.4 and Kata v1.6.0, there is a new runtime option supported, which allows you to specify a specific Kata configuration file as follows:
`privileged_without_host_devices` tells containerd that a privileged Kata container should not have direct access to all host devices. If unset, containerd will pass all host devices to Kata container, which may cause security issues.
`pod_annotations` is the list of pod annotations passed to both the pod sandbox as well as container through the OCI config.
`container_annotations` is the list of container annotations passed through to the OCI config of the containers.
This `ConfigPath` option is optional. If you do not specify it, shimv2 first tries to get the configuration file from the environment variable `KATA_CONF_FILE`. If neither are set, shimv2 will use the default Kata configuration file paths (`/etc/kata-containers/configuration.toml` and `/usr/share/defaults/kata-containers/configuration.toml`).
If you use Containerd older than v1.2.4 or a version of Kata older than v1.6.0 and also want to specify a configuration file, you can use the following workaround, since the shimv2 accepts an environment variable, `KATA_CONF_FILE` for the configuration file path. Then, you can create a
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
@@ -218,28 +185,8 @@ and then, run an untrusted workload with Kata Containers:
runtime_type = "io.containerd.kata.v2"
```
For the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
# "plugins.cri.containerd.default_runtime" is the runtime to use in containerd.
[plugins.cri.containerd.default_runtime]
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# "plugins.cri.containerd.untrusted_workload_runtime" is a runtime to run untrusted workloads on it.
# runtime_type is the runtime type to use in containerd e.g. io.containerd.runtime.v1.linux
runtime_type = "io.containerd.runtime.v1.linux"
# runtime_engine is the name of the runtime engine used by containerd.
runtime_engine = "/usr/bin/kata-runtime"
```
You can find more information on the [Containerd config documentation](https://github.com/containerd/cri/blob/master/docs/config.md)
#### Kata Containers as the default runtime
If you want to set Kata Containers as the only runtime in the deployment, you can simply configure as follows:
@@ -250,15 +197,6 @@ If you want to set Kata Containers as the only runtime in the deployment, you ca
runtime_type = "io.containerd.kata.v2"
```
Alternatively, for the earlier versions of Kata Containers and containerd that do not support Runtime V2 (Shim API), you can use the following alternative configuration:
```toml
[plugins.cri.containerd]
[plugins.cri.containerd.default_runtime]
runtime_type = "io.containerd.runtime.v1.linux"
runtime_engine = "/usr/bin/kata-runtime"
```
### Configuration for `cri-tools`
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
@@ -312,11 +250,55 @@ To run a container with Kata Containers through the containerd command line, you
After executing the above script, two files will be generated under the directory `/usr/share/kata-containers/` by default, namely `kata-flash0.img` and `kata-flash1.img`. Next we need to change the configuration file of `kata qemu`, which is in `/opt/kata/share/defaults/kata-containers/configuration-qemu.toml` by default, specify in the configuration file to use the UEFI ROM installed above. The above is an example of `kata deploy` installation. For package management installation, please use `kata-runtime env` to find the location of the configuration file. Please refer to the following configuration.
```
[hypervisor.qemu]
# -pflash can add image file to VM. The arguments of it should be in format
# of ["/path/to/flash0.img", "/path/to/flash1.img"]
@@ -48,9 +48,9 @@ Running Docker containers Kata Containers requires care because `VOLUME`s specif
kataShared on / type virtiofs (rw,relatime,dax)
```
`kataShared` mount types are powered by [`virtio-fs`][virtio-fs], a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
`kataShared` mount types are powered by [`virtio-fs`](https://virtio-fs.gitlab.io/), a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker`fill fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker`will fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
## Workarounds/Solutions
@@ -58,7 +58,7 @@ Thanks to various community contributions (see [issue references below](#referen
### Use a memory backed volume
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume][k8s-emptydir], which allows for memdisk-backed storage -- the [the `medium: Memory` ][k8s-memory-volume-type]. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir), which allows for memdisk-backed storage -- the the `medium: Memory`. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
This guide is designed for developers and is - same as the Developer Guide - not intended for production systems or end users. It is advisable to only follow this guide on non-critical development systems.
## Prerequisites
To run Kata Containers in SNP-VMs, the following software stack is used.

The host BIOS and kernel must be capable of supporting AMD SEV-SNP and configured accordingly. For Kata Containers, the host kernel with branch [`sev-snp-iommu-avic_5.19-rc6_v3`](https://github.com/AMDESE/linux/tree/sev-snp-iommu-avic_5.19-rc6_v3) and commit [`3a88547`](https://github.com/AMDESE/linux/commit/3a885471cf89156ea555341f3b737ad2a8d9d3d0) is known to work in conjunction with SEV Firmware version 1.51.3 (0xh\_1.33.03) available on AMD's [SEV developer website](https://developer.amd.com/sev/). See [AMD's guide](https://github.com/AMDESE/AMDSEV/tree/sev-snp-devel) to configure the host accordingly. Verify that you are able to run SEV-SNP encrypted VMs first. The guest components required for Kata Containers are built as described below.
**Tip**: It is easiest to first have Kata Containers running on your system and then modify it to run containers in SNP-VMs. Follow the [Developer guide](../Developer-Guide.md#warning) and then follow the below steps. Nonetheless, you can just follow this guide from the start.
## How to build
Follow all of the below steps to install Kata Containers with SNP-support from scratch. These steps mostly follow the developer guide with modifications to support SNP
__Steps from the Developer Guide:__
- Get all the [required components](../Developer-Guide.md#requirements-to-build-individual-components) for building the kata-runtime
- [Build the and install kata-runtime](../Developer-Guide.md#build-and-install-the-kata-containers-runtime)
- [Build a custom agent](../Developer-Guide.md#build-a-custom-kata-agent---optional)
- [Create an initrd image](../Developer-Guide.md#create-an-initrd-image---optional) by first building a rootfs, then building the initrd based on the rootfs, use a custom agent and install. `ubuntu` works as the distribution of choice.
- Get the [required components](../../tools/packaging/kernel/README.md#requirements) to build a custom kernel
__SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/kernel/
$ ./build-kernel.sh -a x86_64 -x snp setup
$ ./build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./build-kernel.sh -x snp install
- Use the custom QEMU capable of SNP (change path)
```toml
path="/path/to/qemu/build/qemu-system-x86_64"
```
- Use `virtio-9p` device since `virtio-fs` is unsupported due to bugs / shortcomings in QEMU version [`snp-v3`](https://github.com/AMDESE/qemu/tree/snp-v3) for SEV and SEV-SNP (change value)
```toml
shared_fs="virtio-9p"
```
- Disable `virtiofsd` since it is no longer required (comment out)
With Kata Containers configured to support SNP-VMs, we use containerd to test and deploy containers in these VMs.
### Install Containerd
If not already present, follow [this guide](./containerd-kata.md#install) to install containerd and its related components including `CNI` and the `cri-tools` (skip Kata Containers since we already installed it)
### Containerd Configuration
Follow [this guide](./containerd-kata.md#configuration) to configure containerd to use Kata Containers
## Run Kata Containers in SNP-VMs
Run the below commands to start a container. See [this guide](./containerd-kata.md#run) for more information
To obtain an attestation report inside the container, the `/dev/sev-guest` must first be configured. As of now, the VM does not perform this step, however it can be performed inside the container, either in the terminal or in code.
/ # mknod -m 600 /dev/sev-guest c "${SNP_MAJOR}" "${SNP_MINOR}"
```
## Known Issues
- Support for cgroups v2 is still [work in progress](https://github.com/kata-containers/kata-containers/issues/927). If issues occur due to cgroups v2 becoming the default in newer systems, one possible solution is to downgrade cgroups to v1:
```bash
sudo sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"/\1 systemd.unified_cgroup_hierarchy=0"/' /etc/default/grub
sudo update-grub
sudo reboot
```
- If both SEV and SEV-SNP are supported by the host, Kata Containers uses SEV-SNP by default. You can verify what features are enabled by checking `/sys/module/kvm_amd/parameters/sev` and `sev_snp`. This means that Kata Containers can not run both SEV-SNP-VMs and SEV-VMs at the same time. If SEV is to be used by Kata Containers instead, reload the `kvm_amd` kernel module without SNP-support, this will disable SNP-support for the entire platform.
# A new way for Kata Containers to use Kinds of Block Volumes
> **Note:** This guide is only available for runtime-rs with default Hypervisor Dragonball.
> Now, other hypervisors are still ongoing, and it'll be updated when they're ready.
## Background
Currently, there is no widely applicable and convenient method available for users to use some kinds of backend storages, such as File on host based block volume, SPDK based volume or VFIO device based volume for Kata Containers, so we adopt [Proposal: Direct Block Device Assignment](https://github.com/kata-containers/kata-containers/blob/main/docs/design/direct-blk-device-assignment.md) to address it.
## Solution
According to the proposal, it requires to use the `kata-ctl direct-volume` command to add a direct assigned block volume device to the Kata Containers runtime.
And then with the help of method [get_volume_mount_info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L95), get information from JSON file: `(mountinfo.json)` and parse them into structure [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70) which is used to save device-related information.
We only fill the `mountinfo.json`, such as `device` ,`volume_type`, `fs_type`, `metadata` and `options`, which correspond to the fields in [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70), to describe a device.
The JSON file `mountinfo.json` placed in a sub-path `/kubelet/kata-test-vol-001/volume001` which under fixed path `/run/kata-containers/shared/direct-volumes/`.
And the full path looks like: `/run/kata-containers/shared/direct-volumes/kubelet/kata-test-vol-001/volume001`, But for some security reasons. it is
encoded as `/run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx`.
Finally, when running a Kata Containers with `ctr run --mount type=X, src=Y, dst=Z,,options=rbind:rw`, the `type=X` should be specified a proprietary type specifically designed for some kind of volume.
Now, supported types:
-`directvol` for direct volume
-`vfiovol` for VFIO device based volume
-`spdkvol` for SPDK/vhost-user based volume
## Setup Device and Run a Kata-Containers
### Direct Block Device Based Volume
#### create raw block based backend storage
> **Tips:** raw block based backend storage MUST be formatted with `mkfs`.
As `/run/kata-containers/shared/direct-volumes/` is a fixed path , we will be able to run a kata pod with `--mount` and set
`src` sub-path. And the `--mount` argument looks like: `--mount type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001`.
#### Run a Kata container with SPDK vhost-user block device
In the case, `ctr run --mount type=X, src=source, dst=dest`, the X will be set `spdkvol` which is a proprietary type specifically designed for SPDK volumes.
```bash
$ # ctr run with --mount type=spdkvol,src=/kubelet/kata-test-vol-001/volume001,dst=/disk001
To improve security, Kata Container supports running the VMM process (currently only QEMU) as a non-`root` user.
To improve security, Kata Container supports running the VMM process (QEMU and cloud-hypervisor) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
@@ -27,7 +27,7 @@ Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs
## Limitations
1. Only the VMM process is running as a non-root user. Other processes such as Kata Container shimv2 and `virtiofsd` still run as the root user.
2. Currently, this feature is only supported in QEMU. Still need to bring it to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
2. Currently, this feature is only supported in QEMU and cloud-hypervisor. For firecracker, you can use jailer to run the VMM process with a non-root user.
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.
@@ -57,6 +57,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
@@ -87,10 +88,11 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.use_vsock` | `boolean` | specify use of `vsock` for agent communication |
| `io.katacontainers.config.hypervisor.vhost_user_store_path` (R) | `string` | specify the directory path where vhost-user devices related folders, sockets and device nodes should be (QEMU) |
# Configure Kata Containers to use EROFS build rootfs
## Introduction
For kata containers, rootfs is used in the read-only way. EROFS can noticeably decrease metadata overhead.
`mkfs.erofs` can generate compressed and uncompressed EROFS images.
For uncompressed images, no files are compressed. However, it is optional to inline the data blocks at the end of the file with the metadata.
For compressed images, each file will be compressed using the lz4 or lz4hc algorithm, and it will be confirmed whether it can save space. Use No compression of the file if compression does not save space.
On newer `Ubuntu/Debian` systems, it can be installed directly using the `apt` command, and on `Fedora` it can be installed directly using the `dnf` command.
If you need to enable the `Lz4` compression feature, `Lz4 1.8.0+` is required, and `Lz4 1.9.3+` is strongly recommended.
##### Compilation process
For some old lz4 versions (lz4-1.8.0~1.8.3), if lz4-static is not installed, the lz4hc algorithm will not be supported. lz4-static can be installed with apt install lz4-static.x86_64. However, these versions have some bugs in compression, and it is not recommended to use these versions directly.
If you use `lz4 1.9.0+`, you can directly use the following command to compile.
```shell
$ ./autogen.sh
$ ./configure
$ make
```
The compiled `mkfs.erofs` program will be saved in the `mkfs` directory. Afterwards, the generated tools can be installed to a system directory using make install (requires root privileges).
$ sudo sed -i -e 's/^# *\(rootfs_type\).*=.*$/\1 = erofs/g' /etc/kata-containers/configuration.toml
```
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.