kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-02-22 14:54:23 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	855f4dc7fa	release: Bump version to 3.27.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-19 14:01:26 +01:00
Markus Rudy	0621e65e74	genpolicy: allow RO and RW for sysfs with privileged container After containerd 2.0.4, privileged containers handle sysfs mounts a bit differently, so we can end up with the policy expecting RO and the input having RW. The sandbox needs to get privileged mounts when any container in the pod is privileged, not only when the pause container itself is marked privileged. So we now compute that and pass it into get_mounts. One downside: we’re relaxing policy checks (accepting RO/RW mismatch for sysfs) and giving the pause container privileged mounts whenever the pod has any privileged workload. For Kata, that means a slightly broader attack surface for privileged pods—the pause container sees more than it strictly needs, and we’re being more permissive on sysfs. It’s a trade-off for compatibility with newer containerd; if you need maximum isolation, you may want to avoid privileged pods or tighten policy elsewhere. Fixes: #12532 Signed-off-by: Markus Rudy <mr@edgeless.systems> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-19 11:16:50 +01:00
Amulyam24	a22c59a204	kata-deploy: enable kata-remote for ppc64le When kata-deploy is deployed with cloud-api-adaptor, it defaults to qemu instead of configuring the remote shim. Support ppc64le to enable it correctly when shims.remote.enabled=true Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-02-19 11:14:27 +01:00
Steve Horsman	6a67250397	Merge commit from fork runtime-go/rs: Disable virtio-pmem for Cloud Hypervisor	2026-02-19 09:00:56 +00:00
Chiranjeevi Uddanti	88203cbf8d	tests: Add regression test for sandbox_cgroup_only=false Add unit test for get_ch_vcpu_tids() and integration test that creates a pod with sandbox_cgroup_only=false to verify it starts successfully. Signed-off-by: Chiranjeevi Uddanti <244287281+chiranjeevi-max@users.noreply.github.com> Co-authored-by: Antigravity <antigravityagent@google.com>	2026-02-18 20:20:14 +01:00
Chiranjeevi Uddanti	9c52f0caa7	runtime-rs/ch: Fix inverted vcpu/tid mapping in get_ch_vcpu_tids The VcpuThreadIds struct expects a mapping from vcpu_id to thread_id, but get_ch_vcpu_tids() was inserting (tid, vcpu_id) instead of (vcpu_id, tid). This caused move_vcpus_to_sandbox_cgroup() to interpret vcpu IDs (0, 1, 2...) as process IDs when sandbox_cgroup_only=false, leading to failed attempts to read /proc/0/status. Fixes: #12479 Signed-off-by: Chiranjeevi Uddanti <244287281+chiranjeevi-max@users.noreply.github.com>	2026-02-18 20:20:14 +01:00
Aurélien Bombo	8ff9cd1f12	Merge pull request #12455 from ajaypvictor/secret-cm-without-sharedfs ci: Add integration tests for secret & configmap propagation	2026-02-18 12:06:48 -06:00
Aurélien Bombo	336b922d4f	tests/cbl-mariner: Stop disabling NVDIMM explicitly This is not needed anymore since now disable_image_nvdimm=true for Cloud Hypervisor. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:52:51 -06:00
Aurélien Bombo	48aa077e8c	runtime{,-rs}/qemu/arm64: Disable DAX Enabling full-featured QEMU NVDIMM support on ARM with DAX enabled causes a kernel panic in caches_clean_inval_pou (see below, different issue from `33b1f07`), so we disable DAX in that environment. [ 1.222529] EXT4-fs (pmem0p1): mounted filesystem e5a4892c-dac8-42ee-ba55-27d4ff2f38c3 ro with ordered data mode. Quota mode: disabled. [ 1.222695] VFS: Mounted root (ext4 filesystem) readonly on device 259:1. [ 1.224890] devtmpfs: mounted [ 1.225175] Freeing unused kernel memory: 1920K [ 1.226102] Run /sbin/init as init process [ 1.226164] with arguments: [ 1.226204] /sbin/init [ 1.226235] with environment: [ 1.226268] HOME=/ [ 1.226295] TERM=linux [ 1.230974] Internal error: synchronous external abort: 0000000096000010 [#1] SMP [ 1.231963] CPU: 0 UID: 0 PID: 1 Comm: init Tainted: G M 6.18.5 #1 NONE [ 1.232965] Tainted: [M]=MACHINE_CHECK [ 1.233428] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 1.234273] pc : caches_clean_inval_pou+0x68/0x84 [ 1.234862] lr : sync_icache_aliases+0x30/0x38 [ 1.235412] sp : ffff80008000b9a0 [ 1.235842] x29: ffff80008000b9a0 x28: 0000000000000000 x27: 00000000019a00e1 [ 1.236912] x26: ffff80008000bc08 x25: ffff80008000baf0 x24: fffffdffc0000000 [ 1.238064] x23: ffff000001671ab0 x22: ffff000001663480 x21: fffffdffc23401c0 [ 1.239356] x20: fffffdffc23401c0 x19: fffffdffc23401c0 x18: 0000000000000000 [ 1.240626] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 1.241762] x14: ffffaae8f021b3b0 x13: 0000000000000000 x12: ffffaae8f021b3b0 [ 1.242874] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000ffffbb53c000 [ 1.244022] x8 : 0000000000000000 x7 : 0000000000000012 x6 : ffff55178f5e5000 [ 1.245157] x5 : ffff80008000b970 x4 : ffff00007fa4f680 x3 : ffff00008d007000 [ 1.246257] x2 : 0000000000000040 x1 : ffff00008d008000 x0 : ffff00008d007000 [ 1.247387] Call trace: [ 1.248056] caches_clean_inval_pou+0x68/0x84 (P) [ 1.248923] __sync_icache_dcache+0x7c/0x9c [ 1.249578] insert_page_into_pte_locked+0x1e4/0x284 [ 1.250432] insert_page+0xa8/0xc0 [ 1.251080] vmf_insert_page_mkwrite+0x40/0x7c [ 1.251832] dax_iomap_pte_fault+0x598/0x804 [ 1.252646] dax_iomap_fault+0x28/0x30 [ 1.253293] ext4_dax_huge_fault+0x80/0x2dc [ 1.253988] ext4_dax_fault+0x10/0x3c [ 1.254679] __do_fault+0x38/0x12c [ 1.255293] __handle_mm_fault+0x530/0xcf0 [ 1.255990] handle_mm_fault+0xe4/0x230 [ 1.256697] do_page_fault+0x17c/0x4dc [ 1.257487] do_translation_fault+0x30/0x38 [ 1.258184] do_mem_abort+0x40/0x8c [ 1.258895] el0_ia+0x4c/0x170 [ 1.259420] el0t_64_sync_handler+0xd8/0xdc [ 1.260154] el0t_64_sync+0x168/0x16c [ 1.260795] Code: d2800082 9ac32042 d1000443 8a230003 (d50b7523) [ 1.261756] ---[ end trace 0000000000000000 ]--- Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:52:43 -06:00
Aurélien Bombo	c727332b0e	runtime/qemu/arm64: Align NVDIMM usage on amd64 Nowadays on arm64 we use a modern QEMU version which supports the features we require for NVDIMM, so we remove the arm64-specific code and use the generic implementation. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:47:53 -06:00
Aurélien Bombo	e17f96251d	runtime{,-rs}/clh: Disable virtio-pmem This disables virtio-pmem support for Cloud Hypervisor by changing Kata config defaults and removing the relevant code paths. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-18 11:47:53 -06:00
Zvonko Kaiser	1d09e70233	Merge pull request #12538 from fidencio/topic/kata-deploy-fix-regression-on-hardcopying-symlinks kata-deploy: preserve symlinks when installing artifacts	2026-02-18 12:44:46 -05:00
Mikko Ylinen	5622ab644b	versions: bump QEMU to v10.2.1 v10.2.1 is the latest patch release in v10.2 series. Changes: https://github.com/qemu/qemu/compare/v10.2.0...v10.2.1 Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Mikko Ylinen	d68adc54da	versions: bump to Linux v6.18.12 (LTS) Latest changelog in https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.18.12 Also other changes for 6..11 updates are available. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-18 18:18:52 +01:00
Fabiano Fidêncio	34336f87c7	kata-deploy: convert install.rs get_hypervisor_name tests to rstest Use rstest parameterized tests for QEMU variants, other hypervisors, and unknown/empty shim cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:41:55 +01:00
Fabiano Fidêncio	bb11bf0403	kata-deploy: preserve symlinks when installing artifacts When copying artifacts from the container to the host, detect source entries that are symlinks and recreate them as symlinks at the destination instead of copying the target file. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-18 12:29:14 +01:00
Dan Mihai	eee25095b5	tests: mariner annotations for k8s-openvpn This test uses YAML files from a different directory than the other k8s CI tests, so annotations have to be added into these separate files. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-18 07:17:04 +01:00
Markus Rudy	8365afa336	qemu: log exit code after failure When qemu exits prematurely, we usually see a message like msg="Cannot start VM" error="exiting QMP loop, command cancelled" This is an indirect hint, caused by the QMP server shutting down. It takes experience to understand what it even means, and it still does not show what's actually the problem. With this commit, we're taking the error return from the qemu subprocess and surface it in the logs, if it's not nil. This means we automatically capture any non-zero exit codes in the logs. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2026-02-17 21:03:13 +01:00
Fabiano Fidêncio	f0a0425617	kata-deploy: convert a few toml.rs tests to rstest Turn test_toml_value_types into a parameterized test with one case per type (string, bool, int). Merge the two invalid-TOML tests (get and set) into one rstest with two cases, and the two "not an array" tests into one rstest with two cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	899005859c	kata-deploy: avoid leading/blank lines in written TOML config When writing containerd drop-in or other TOML (e.g. initially empty file), the serialized document could start with many newlines. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cfa8188cad	kata-deploy: convert containerd version support tests to rstest Replace multiple #[test] functions for snapshotter and erofs version checks with parameterized #[rstest] #[case] tests for consistency and easier extension. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Fabiano Fidêncio	cadac7a960	kata-deploy: runtime_platform -> runtime_platforms Fix runtime_platforms typo. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-17 09:33:39 +01:00
Hyounggyu Choi	8bc60a0761	Merge pull request #12521 from fidencio/topic/kata-deploy-auto-add-nfd-tee-labels-to-the-runtime-class kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected	2026-02-16 18:06:18 +01:00
Jacek Tomasiak	8025fa0457	agent: Don't pass empty options to mount With some older kernels some fs implementations don't handle empty options strings well. This leads to failures in "setup rootfs" step. E.g. `cgroup: cgroup2: unknown option ""`. This is fixed by mapping empty string to `None` before passing to `nix::mount`. Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2026-02-16 14:55:59 +01:00
Fabiano Fidêncio	a04df4f4cb	kata-deploy: disable provenance/SBOM for quay.io compatibility Disable provenance and SBOM when building per-arch kata-deploy images so each tag is a single image manifest. quay.io rejects pushing multi-arch manifest lists that include attestation manifests (400 manifest invalid). Add a note in the release script documenting this. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:32:25 +01:00
Fabiano Fidêncio	0e8e30d6b5	kata-deploy: fix default RuntimeClass + nodeSelectors The default RuntimeClass (e.g. kata) is meant to point at the default shim handler (e.g. kata-qemu-$tee). We were building it in a separate block and only sometimes adding the same TEE nodeSelectors as the shim-specific RuntimeClass, leading to kata ending up without the SE/SNP/TDX nodeSelector while kata-qemu-$tee had it. The fix is to stop duplicating the RuntimeClass definition, having a single template that renders one RuntimeClass (name, handler, overhead, nodeSelectors). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 13:09:03 +01:00
Fabiano Fidêncio	80a175d09b	kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected When NFD is detected (deployed by the chart or existing in the cluster), apply shim-specific nodeSelectors only for TEE runtime classes (snp, tdx, and se). Non-TEE shims keep existing behavior (e.g. runtimeClass.nodeSelector for nvidia GPU from `f3bba0885` is unchanged). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-16 12:07:51 +01:00
Fabiano Fidêncio	d000acfe08	infra: fix multi-arch manifest publish Per-arch images were failing publish-multiarch-manifest with 'X is a manifest list' because Buildx now enables attestations by default, so each arch tag became an image index. Use 'docker buildx imagetools create' instead of 'docker manifest create' so we can merge those indexes into the final multi-arch manifest while keeping provenance and SBOM on per-arch images. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-14 19:49:00 +01:00
Fabiano Fidêncio	02c9a4b23c	kata-deploy: Temporarily comment GPU specific labels We depend on GPU Operator v26.3 release, which is not out yet. Although we have been testing with it, it's not yet publicly available, which would break anyone actually trying to use the GPU runtime classes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 09:25:14 +01:00
Ajay Victor	83935e005c	ci: Add integration tests for secret & configmap propagation Enhance k8s-configmap.bats and k8s-credentials-secrets.bats to test that ConfigMap and Secret updates propagate to volume-mounted pods. - Enhanced k8s-configmap.bats to test ConfigMap propagation * Added volume mount test for ConfigMap consumption * Added verification that ConfigMap updates propagate to volume-mounted pods - Enhanced k8s-credentials-secrets.bats to test Secret propagation * Added verification that Secret updates propagate to volume-mounted pods Fixes #8015 Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>	2026-02-14 08:56:21 +05:30
Fabiano Fidêncio	5106e7b341	build: Add gnupg to the agent's builder container Otherwise we'll fail to check gperf's GPG signing key when needed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	79b5022a5a	kata-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	30ebc4241e	genpolicy: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	87d1979c84	agent-ctl: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	90dbd3f562	agent: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
stevenhorsman	7f77948658	versions: Bump rkyv version to 0.7.46 Bump to remediate RUSTSEC-2026-0001 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-14 00:33:45 +01:00
Aurélien Bombo	981f693a88	Merge pull request #11140 from balintTobik/hyperv_warning runtime: refactor hypervisor devices cgroup creation	2026-02-13 15:16:09 -06:00
Fabiano Fidêncio	d8acc403c8	kata-deploy: set CRI images runtime_platform snapshotter for containerd v3 In containerd config v3 the CRI plugin is split into runtime and images, and setting the snapshotter only on the runtime plugin is not enough for image pull/prepare. The images plugin must have runtime_platform.<runtime>.snapshotter so it uses the correct snapshotter per runtime (e.g. nydus, erofs). A PR on the containerd side is open so we can rely on the runtime plugin snapshotter alone: https://github.com/containerd/containerd/pull/12836 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 22:15:02 +01:00
Fabiano Fidêncio	2930c68c0b	ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats This is a follow-up for `25962e9325` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 20:56:08 +01:00
Fabiano Fidêncio	f6e0a7c33c	scripts: use temporary GPG home when verifying cached gperf tarball In CI the default GPG keyring is often read-only or missing, so 'gpg --import' of the cached keyring fails and verification cannot succeed. Use a temporary GNUPGHOME for import and verify so cached gperf can be verified without writing to the system keyring. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 19:39:55 +01:00
stevenhorsman	55a89f6836	runtime: doc: Remove usage of golang.org/x/net/context This package is deprecated and we aren't using it any more Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	06246ea18b	csi-kata-directvolume: Remove usage of golang.org/x/net/context This package is deprecated, so use the standard library context package instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	f2fae93785	csi-kata-directvolume: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
stevenhorsman	74d4469dab	ci/openshift-ci: Bump x/net to v0.50 Remediates CVEs: - GO-2026-4440 - GO-2026-4441 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-13 17:55:23 +01:00
Steve Horsman	bb867149bb	Merge pull request #12514 from fidencio/topic/nvidia-try-to-improve-genpolicy-failures tests: nvidia: Fix genpolicy error when pulling nvcr.io images	2026-02-13 16:34:00 +00:00
Joji Mekkattuparamban	f3bba08851	kata-deploy: add node selector to nvidia runtime classes The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the gpu cc mode setting. In order to properly support a cluster that has both CC and non-CC nodes, we use a node selector so the scheduling is consistent with the GPU mode. The GPU operator sets a label nvidia.com/cc.ready.state=[true, false] to indicate the gpu mode setting Fixes #12431 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-02-13 15:58:06 +01:00
Fabiano Fidêncio	8cb7d0be9d	tests: nvidia: Fix genpolicy error when pulling nvcr.io images genpolicy pulls image manifests from nvcr.io to generate policy and was failing with 'UnauthorizedError' because it had no registry credentials. Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential() in registry.rs, which reads from DOCKER_CONFIG/config.json. Add setup_genpolicy_registry_auth() to create a Docker config with nvcr.io auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it can authenticate when pulling manifests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 13:12:55 +01:00
Fabiano Fidêncio	f4dcb66a3c	ci: add workflow to push ORAS tarball cache Add push-oras-tarball-cache workflow that runs on push to main when versions.yaml changes (and on workflow_dispatch). It populates the ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml. Remove the push_to_cache call from download-with-oras-cache.sh since it was never triggered in CI. Cache population is now done solely by the new workflow and by populate-oras-tarball-cache.sh when run manually. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-13 12:57:48 +01:00
Balint Tobik	295a6a81d0	runtime: refactor hypervisor devices cgroup creation Add hypervisor devices to the cgroup separately to avoid irrelevant warnings, and fail if none of them are available. Also fix a test case so that it reloads removed kernel modules for later test cases, and skip some tests on ARM because of the lack of virtualization support. Fixes #6656 Signed-off-by: Balint Tobik <btobik@redhat.com>	2026-02-13 09:23:08 +01:00
Aurélien Bombo	14be9504e7	Merge pull request #12506 from kata-containers/sprt/gperf-mirror versions: Switch gperf mirror again	2026-02-12 17:00:17 -06:00
Fabiano Fidêncio	a01e95b988	kata-deploy: test k3s/rke2 template handling / version checks Add tests for the split_non_toml_header helper that strips Go template directives before TOML parsing, and for every TOML operation (set, get, append, remove, set_array) on files that start with {{ template "base" . }}. Also converts the containerd version detection tests in manager.rs from individual #[test] functions with helper wrappers to parametrized #[rstest] cases, which is more readable and easier to extend. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Fabiano Fidêncio	2e7633674f	kata-deploy: use k3s/rke2 base template K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the right way to customize containerd is to extend the base template with {{ template "base" . }} and append your own TOML blocks, rather than copying a prerendered config.toml into the template file. We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which meant we were replacing the K3s defaults with a snapshot that gets stale as soon as K3s is upgraded. Now we create the template files with just the base directive and let our regular set_toml_value code path append the Kata runtime configuration on top. To make that work, the TOML utils learned to handle files that start with a Go template line ({{ ... }}): strip it before parsing, put it back when writing. This keeps the K3s/RKE2 path identical to every other runtime -- no special append logic needed. refs: * k3s: https://docs.k3s.io/advanced#configuring-containerd * rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 22:30:08 +01:00
Aurélien Bombo	199e1ab16c	versions: Switch gperf mirror again The mirror introduced by #11178 still breaks quite often so apply this as a quick fix. A proper solution would probably be to load balance like in #12453. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-02-12 13:41:19 -06:00
Fabiano Fidêncio	6a3bbb1856	tests: Retry k8s deployment We've seen a lot of spurious issues when deploying the infra needed for the tests. Let's give it a few tries before actually failing. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-02-12 20:13:59 +01:00
Manuel Huber	ed7de905b5	build: Tighten upstream download path for ORAS The gperf-3.3 tarball frequently fails to download on my end with cryptic error messages such as: "tar: This does not look like a tar archive". This change tightens the download logic a bit: We fail at the point in time when we're supposed to fail. This way we detect rate limiting issues right away, and this way, the actual hashsum and signature checks are effective, not only printouts. This change also updates the key reference and allows for an array, for instance, when a different signer was used for a cache vs upstream version. The change also makes it clear, that signature verification is only implemented for the gperf tarball. Improvements can be made in a subsequent change. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-12 19:20:35 +01:00
Fabiano Fidêncio	9fc5be47d0	kata-deploy: fix custom runtime config path for runtime-rs shims Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball, cloud-hypervisor) were not found because the path was always built under share/defaults/kata-containers/. Use get_kata_containers_original_config_path for the handler so rust shim configs are read from .../runtime-rs/. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-12 18:08:47 +01:00
Fabiano Fidêncio	50923b6d62	kata-deploy: run cleanup on uninstall via DaemonSet preStop On helm uninstall let's rely on a preStop hook to run kata-deploy cleanup so each pod cleans its node before exiting. We must keep RBAC (resource-policy: keep) so pods retain API access during termination, and can then properly delete the NodeFeatureRules and remove the labels from the nodes. The post-delete hook Job, which runs on a single node, is now only responsible for cleaning the kept RBAC (cluster-wide resource) after uninstall, not leaving any resource or artefact behind. The changes in this commit lead to a "resources were kept" message when running `helm uninstall`, which we document as being normal, as the post-delete job will remove those. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	6e0cbc28a3	kata-deploy: fix node label removal When removing a node label, JSON merge patch semantics require setting the key to null; omitting the key leaves it unchanged. Fix label_node to send a patch with the label key set to null so the API server actually removes katacontainers.io/kata-runtime. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Fabiano Fidêncio	510d2a69ae	kata-deploy: exit with 0 on SIGTERM in install mode Wait for SIGTERM after install and exit(0) so the container terminates cleanly. If registering the SIGTERM handler fails, log a warning and sleep forever instead of exiting with an error (fallback to the old behaviour). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-11 22:05:10 +01:00
Mikko Ylinen	25962e9325	tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX After the move to Linux 6.17 and QEMU 10.2 from Kata, k8s-sandbox-vcpus-allocation.bats started failing on TDX. 2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created 2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created 2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created 2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met 2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits 2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits 2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating In a manual test without agent policies added, it seems to work OK, but disable the test for now to get CI stable. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-11 22:02:59 +01:00
stevenhorsman	006a5d5141	versions: Tidy up versions file - We don't use containerd.latest as the comment on it suggests - We also don't have any references to `sriov-network-device` so remove that and the plugins section. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-11 20:49:53 +01:00
Dan Mihai	9d763e9d5a	Merge pull request #12439 from sespiros/genpolicy-suppress-yaml-stdout genpolicy: suppress YAML output when --{base64/raw}-out are used	2026-02-11 10:27:01 -08:00
Spyros Seimenis	282bfc9f14	genpolicy: suppress YAML output when --{base64/raw}-out are used This suppresses the YAML output only if the input is passed via stdin. If {base64/raw}-out is passed in alongside a YAML file, the encoded annotation or the policy data, respectively, is printed to stdout as before. Fixes #12438 Signed-off-by: Spyros Seimenis <sse@edgeless.systems>	2026-02-11 14:08:30 +02:00
Hyounggyu Choi	c84e37f6ac	Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P	2026-02-11 09:40:21 +01:00
Hyounggyu Choi	67f54bdcb5	tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns This commit removes the skip condition for qemu-runtime-rs on s390x in k8s-cpu-ns.bats. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-02-11 05:52:13 +01:00
Hyounggyu Choi	eab77a26ab	runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P As s390x and ppc64 use a flat CPU topology without sockets and threads, this commit skips the socket_id and thread_id properties for vCPU hotplug on these architectures instead of aborting the operation. This change is in line with the equivalent checks in the Go runtime: - isSocketIDSupported() - isThreadIDSupported() Fixes: #12155 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-02-11 05:52:13 +01:00
Alex Lyn	c53910eb1b	Merge pull request #12408 from Apokleos/netdev-multiq runtime-rs: Add support configurable network_queues via configuration and annotation	2026-02-11 09:34:58 +08:00
stevenhorsman	a115d6d858	ci: Add copyright and license to shellcheckrc Make the static-checks happy Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
stevenhorsman	15d6a681ed	doc: Fix spelling issues Put things in backticks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
stevenhorsman	e84d234721	doc: Update broken/slow URLs Update the URLs to better/existing links Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	5c0269881e	tests: Make editorconfig-checker happy - Trim trailing whitespace and ensure final newline in non-vendor files - Add .editorconfig-checker.json excluding vendor dirs, .patch, .img, .dtb, .drawio, *.svg, and pkg/cloud-hypervisor/client so CI only checks project code - Leave generated and binary assets unchanged (excluded from checker) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-10 21:58:28 +01:00
Fabiano Fidêncio	34199b09eb	runtime-rs: Properly parse containerd runtime options to extract ConfigPath The runtime-rs shim was failing to load its configuration when deployed via kata-deploy because it couldn't correctly parse the ConfigPath passed by containerd. The previous implementation naively skipped the first 2 bytes of the options and interpreted the rest as a UTF-8 string, which doesn't work since containerd passes a properly serialized protobuf message of type runtimeoptions.v1.Options. This change adds the runtimeoptions.proto definition to the protocols crate and updates the load_config function to correctly deserialize the protobuf message and extract the config_path field, matching how the Go runtime handles this via typeurl.UnmarshalAny. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cb652e0da1	tests: Update NVRC trace to use drop-in config mechanism Update the enable_nvrc_trace() function to use the new drop-in configuration mechanism instead of directly modifying the base configuration file. The function now creates a 90-nvrc-trace.toml drop-in file that properly combines existing kernel parameters with the nvrc.log=trace setting. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	4cb2aea9dd	kata-deploy: Document drop-in configuration and add warning to config files When kata-deploy installs Kata Containers, the base configuration files should not be modified directly. This change adds documentation explaining how to use drop-in configuration files for customization, and prepends a warning comment to all deployed configuration files reminding users to use drop-in files instead. The warning is added to both standard shim configurations and custom runtime configurations. It includes a brief explanation of how drop-in files work and points users to the documentation for more details. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	d5d561abe5	kata-deploy: Add detailed logging for drop-in configuration Add clear INFO-level messages when creating drop-in configuration files, making it easy to understand what kata-deploy is doing during installation: - "Setting up runtime directory for shim: X" - "Generating drop-in configuration files for shim: X" - "Created drop-in file: <path>" When DEBUG mode is enabled (via DEBUG=true environment variable), also log the full content of each drop-in file to aid troubleshooting. The log level is now automatically set to Debug when the DEBUG environment variable is set, ensuring debug messages are visible. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	eddd1b507e	kata-deploy: Extract common drop-in generation into shared helper Deduplicate the drop-in file generation logic between configure_shim_config and install_custom_runtime_configs by extracting it into a shared write_common_drop_ins helper function. This ensures both standard and custom runtimes use the same code path for generating drop-in configuration files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	577aa6b319	kata-deploy: Propagate drop-in configs to custom runtime classes Ensure custom runtime classes receive the same drop-in configuration files as standard runtimes: - 10-installation-prefix.toml (if custom dest_dir) - 20-debug.toml (if debug enabled) - 30-kernel-params.toml (proxy + debug kernel params) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	8c60a88bda	kata-deploy: Add combined kernel_params drop-in Add a combined drop-in file (30-kernel-params.toml) that handles all kernel_params modifications. This approach reads the base kernel_params from the original untouched config file and combines them with: - Proxy settings (agent.https_proxy, agent.no_proxy) - Debug settings (agent.log=debug, initcall_debug) Using a single drop-in file for kernel_params avoids the TOML merge behavior where scalar values are replaced rather than appended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
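Note on 8c60a88bda: the "scalar values are replaced rather than appended" behavior can be illustrated with a small sketch. This is not kata-deploy's code; `merge`, `combined_kernel_params`, the `hypervisor.qemu` section, and all sample values are assumptions made for illustration, and plain dicts stand in for parsed TOML.

```python
def merge(base: dict, override: dict) -> dict:
    """Recursively merge two parsed-TOML-like dicts.

    Tables are merged key by key; a scalar in `override` replaces
    the scalar in `base` (it is never appended).
    """
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out


def combined_kernel_params(base_params: str, extra: list) -> str:
    """Join the base kernel_params with extra parameters into one string."""
    return " ".join(p for p in [base_params, *extra] if p)


# Hypothetical base config and two separate drop-ins touching kernel_params.
base = {"hypervisor": {"qemu": {"kernel_params": "cgroup_no_v1=all"}}}
proxy = {"hypervisor": {"qemu": {"kernel_params": "agent.https_proxy=http://proxy.example:3128"}}}
debug = {"hypervisor": {"qemu": {"kernel_params": "agent.log=debug initcall_debug"}}}

# Two drop-ins: only the value from the last one survives.
naive_params = merge(merge(base, proxy), debug)["hypervisor"]["qemu"]["kernel_params"]

# One combined drop-in: read the base value, then append everything once.
combined = combined_kernel_params(
    base["hypervisor"]["qemu"]["kernel_params"],
    ["agent.https_proxy=http://proxy.example:3128", "agent.log=debug", "initcall_debug"],
)
single = {"hypervisor": {"qemu": {"kernel_params": combined}}}
combined_params = merge(base, single)["hypervisor"]["qemu"]["kernel_params"]
```

With separate drop-ins the proxy setting and the base parameters are lost; the single combined drop-in keeps all of them.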
Fabiano Fidêncio	fae96f1f82	kata-deploy: Add drop-in file for debug configuration When debug mode is enabled, generate a drop-in configuration file (20-debug.toml) with the boolean debug flags for hypervisor, runtime, and agent sections. Note: kernel_params for debug (agent.log=debug, initcall_debug) will be handled by a separate combined kernel_params drop-in file to avoid the TOML merge replacement behavior. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	bb65e516e5	kata-deploy: Add drop-in file for installation prefix When the installation prefix differs from the default /opt/kata, generate a drop-in configuration file (10-installation-prefix.toml) with the adjusted paths instead of modifying the original config file. This removes the need for adjust_installation_prefix and adjust_qemu_cmdline functions which are now deleted along with their tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cd76d61a3d	kata-deploy: Add infrastructure for per-shim drop-in configuration Instead of modifying original config files directly, set up a per-shim directory structure that uses symlinks to the original configs and config.d/ directories for drop-in overrides. This enables cleaner configuration management where the original files remain untouched and all kata-deploy customizations are in separate drop-in files that can be easily inspected and removed. Directory structure: {config_path}/runtimes/{shim}/ {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink {config_path}/runtimes/{shim}/config.d/ Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
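Note on cd76d61a3d: the per-shim layout described above can be sketched as follows. This is an illustration only, not the kata-deploy implementation; `setup_shim_dir` is a hypothetical helper, and the location of the original config file (directly under `config_path`) is an assumption.

```python
import tempfile
from pathlib import Path


def setup_shim_dir(config_path: Path, shim: str) -> Path:
    """Create {config_path}/runtimes/{shim}/ with a symlink and a config.d/ dir."""
    # Assumption: the untouched original lives directly under config_path.
    original = config_path / f"configuration-{shim}.toml"
    runtime_dir = config_path / "runtimes" / shim
    (runtime_dir / "config.d").mkdir(parents=True, exist_ok=True)
    link = runtime_dir / original.name
    if not link.is_symlink():
        # The original file is only referenced, never modified.
        link.symlink_to(original)
    return runtime_dir


# Demo in a throwaway directory with a hypothetical "qemu" shim.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp)
    (config_path / "configuration-qemu.toml").write_text("# base config\n")
    runtime_dir = setup_shim_dir(config_path, "qemu")
    link = runtime_dir / "configuration-qemu.toml"
    link_is_symlink = link.is_symlink()
    link_target_ok = link.resolve() == (config_path / "configuration-qemu.toml").resolve()
    drop_in_dir_exists = (runtime_dir / "config.d").is_dir()
```

Removing the customizations then amounts to deleting files under `config.d/`; the base file stays as installed.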
Paul Meyer	c5ad3f9b26	Merge pull request #12472 from katexochen/p/disable-nvdimm-cc runtime: disable nvdimm for confidential guest	2026-02-10 14:54:40 +01:00
Steve Horsman	44c86f881b	Merge pull request #12466 from ldoktor/gk-pagination tools.gatekeeper: Add support to paginate workflows	2026-02-10 11:59:57 +00:00
Steve Horsman	a8debc9841	Merge pull request #12476 from stevenhorsman/bump-rust-to-1.91 versions: Bump rust to 1.91	2026-02-10 10:03:01 +00:00
Paul Meyer	76525b97a6	runtime-rs: disable nvdimm for confidential guest nvdimm isn't supported by confidential guests, so disable it in the configuration. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2026-02-10 08:38:41 +01:00
Paul Meyer	a5f554922c	runtime: disable nvdimm for confidential guest There is code to disable this at runtime when confidential_guest is enabled anyway[^1], but it will emit a warning every time. All the touched configuration files set confidential_guest to true, so we already know nvdimm isn't supported. [^1]: `16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)` Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2026-02-10 08:38:18 +01:00
Lukáš Doktor	f7baa394d4	tools.gatekeeper: Add support to paginate workflows The number of workflows increased over 30 so we need to paginate them as well as jobs. This commit extracts the existing pagination from jobs and uses it for both jobs and workflows. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-02-10 06:53:47 +00:00
stevenhorsman	120fde28e1	versions: Bump rust to 1.91 Following the agreed toolchain policy - bump rust to the current (1.93)-2 releases. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 06:52:42 +00:00
Alex Lyn	362a4c5714	runtime-rs: Fix multiqueue config propagation and tap initialization The previous implementation failed to correctly propagate the network multiqueue configuration, causing the effective queue number to remain 0. It also mixed up "queue pairs" with "queue number", so tap devices were opened without proper multiqueue initialization, which caused Clh netconfig validation to fail. This commit fixes the configuration mapping and initializes tap devices with the correct multiqueue semantics, ensuring Cloud Hypervisor receives a valid netconfig. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	79f81dae50	runtime-rs: Add network_queues for setting network device multiqueues To make network_queues configurable, a new method is introduced via the configuration TOML. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	6723ff5c46	runtime-rs: Add configurable DEFNETQUEUES in Makefile To make the number of network queues configurable at build time, a dedicated DEFNETQUEUES variable is added. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	cfc479ef1d	kata-types: Add Network device specific annotation for network queues This commit introduces a new annotation for users to easily set network queues via "io.katacontainers.config.hypervisor.network_queues". And the annotation will be mapped into `NetworkInfo.network_queues` within the configuration. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	61e7875267	kata-types: Adjust the network_queues when load from configuration Adjusts the network queues after loading from a configuration file. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Manuel Huber	a6ca5c6628	ci: add editorconfig checker This adds a basic configuration for editorconfig checker. The supplied configuration checks against trailing whitespaces and issues with newlines. Example: \| tools/packaging/kernel/configs/fragments/x86_64/numa.conf: \| Wrong line endings or no final newline \| tools/packaging/release/generate_vendor.sh: \| 44: Trailing whitespace Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 15:03:26 -08:00
stevenhorsman	e6d291cf0a	trace-forwarder: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	79dc892e18	kata-ctl: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	9e1ddcdde9	agent-ctl: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	f840f9ad54	rust: Bump time to 0.3.47 To remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	ffcb10b6a3	agent: Bump time crate to 0.3.47 Update time to resolve CVE-2026-25727. Note: this involved bumping the versions of slog-term and slog-json and bumping the MSRV to 1.88.0 which time 0.3.47 requires. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	33d494b07e	kata-deploy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	2ea29df99a	genpolicy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	fa3b419965	kata-ctl: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	e49a61eea2	agent: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	bc45788356	versions: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	51d35f9261	agent-ctl: Bump bytes to 1.11.1 Remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
Park.Jiyeon	082e25b297	genpolicy: skip serializing VFIO generation-only settings Skip serializing anno/value regexes and the NVIDIA VFIO device type since they are generation-time only. Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Park.Jiyeon	9231144b99	genpolicy: refactor VFIO settings and support multiple NVIDIA GPU keys - Moved VFIO-related config from "device_annotations" to a new "devices" section. - Introduced structured "nvidia" subfield for NVIDIA-specific VFIO settings. - Replaced hardcoded "nvidia.com/pgpu" with configurable "pgpu_resource_keys". - Adjusted Rego rules and code to match new config schema. Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Park.Jiyeon	5fa5d1934b	fix(genpolicy): make NVIDIA GPU resource keys configurable Allow specifying multiple NVIDIA GPU resource keys via an explicit allowlist. Keys are now configured under `device_annotations.vfio.nvidia_pgpu_resource_keys` in genpolicy-settings.json. This removes the previous hardcoded reliance on `nvidia.com/pgpu` and supports model-specific resource names. Fixes #12322 Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Manuel Huber	525192832f	tests: Clean up superfluous GPU annotation This annotation was required for GPU cold-plug before using a newer device plugin and before querying the pod resources API. As this annotation is no longer required, cleaning it up. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 11:28:24 -08:00
Konstantin Khlebnikov	5d99a141d9	runtime: add hypervisor options for NUMA topology With enable_numa=true, the hypervisor exposes the host NUMA topology as is: VM NUMA nodes are mapped to host nodes 1:1 and vCPUs are bound to the related CPUs. The "numa_mapping" option allows redefining the NUMA node mapping: - map each VM node to a particular host node or to several NUMA nodes - emulate NUMA on a host without NUMA (useful for tests) Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 20:09:25 +01:00
Fabiano Fidêncio	ab515712d4	kernel: Unify kernel and kernel-confidential Build a single kernel for both kernel and kernel-confidential on x86_64 and s390x. The kernel is built with TEE support (-x) on those arches only. This helps to simplify and maintain the code, and having a single kernel has been the plan all along. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Fabiano Fidêncio	c5b5433866	kernel: Unify nvidia-gpu and nvidia-gpu-confidential Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential, simplifying and reducing code maintenance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Steve Horsman	f02fa79758	Merge pull request #12470 from jirimoravcik/docs/add-os-version docs: add `OS_VERSION` to rootfs script	2026-02-09 15:06:14 +00:00
Alex Lyn	3fda59e27d	tests: rename pod_exec_with_retries to pod_exec and update callers This commit does the following: (1) Rename pod_exec_with_retries() to pod_exec(). (2) Update implementation to call container_exec(). (3) Replace all usages of pod_exec_with_retries across tests with pod_exec. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	861d39305c	tests: drop kubectl exec retries in container_exec This commit aims to drop retries when kubectl exec a container: (1) Rename container_exec_with_retries() to container_exec(). (2) Remove the retry loop and sleep backoff around kubectl exec. Keep the same logging and container-selection logic and return kubectl exec exit status directly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	41e8acbc5e	runtime: Map empty ReadStdout/ReadStderr response to io.EOF After the kata-agent "drain-after-exit" change, stdout/stderr EOF is signaled by a successful ReadStdout/ReadStderr reply with empty Data (len==0), instead of an RPC error. However, runtime-go currently returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which violates Go io.Reader semantics and can cause `kubectl exec` to hang after the command output is already printed. To avoid exec hang: In readProcessStream(), map an empty response (len(resp.Data)==0) into (0, io.EOF). This allows the stdout/stderr copy goroutines to terminate, closes exitIOch, and unblocks the wait path so exec can complete normally. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	ffb8a6a9c3	agent: fix misleading tokio::select! biased comment in do_read_stream The previous comment incorrectly implied that `biased` prevents data loss and the exit notifier would never be polled before all buffered data is read. Details can be found in the documentation: https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67 Tokio's `biased` only makes polling order deterministic (top-to-bottom) when multiple branches are ready in the same poll, and it makes fairness the caller's responsibility. Output can still be truncated if the exit notification becomes ready while `read_stream` is pending. This change updates the comment to reflect the actual semantics and caveats. No functional behavior change. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	1080f6d87e	agent: Introduce drain after exit mechanism to address truncation race Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode occasionally lose the last segments of their output. The root cause is a race condition where the `term_exit_notifier` triggers before the pipe buffers are fully drained. In the previous implementation, once the exit notification was received, the agent immediately returned an EOF, causing the runtime's `run_io_copy` to terminate and drop any residual data in the pipe. This patch introduces a "drain after exit" mechanism: - Upon receiving an exit notification, the agent enters a 500ms window for polling `read_stream` to flush remaining data from the buffer. - A true EOF is only returned if the stream is confirmed empty or the timeout is reached. This ensures reliable output delivery for transient exec tasks under high concurrency. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
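Note on 1080f6d87e: the "drain after exit" idea can be sketched with a plain pipe. The real agent is asynchronous Rust; this synchronous sketch only models the logic, and `drain_after_exit` and its parameters are hypothetical names. It assumes a Unix-like system where `select` works on pipe file descriptors.

```python
import os
import select
import time

DRAIN_WINDOW_S = 0.5  # the 500ms window from the commit message


def drain_after_exit(fd: int, window_s: float = DRAIN_WINDOW_S) -> bytes:
    """After an exit notification, keep reading `fd` until it is empty,
    the writer has closed it, or the drain window has elapsed."""
    deadline = time.monotonic() + window_s
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout reached: report EOF now
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            break  # nothing more arrived within the window
        data = os.read(fd, 4096)
        if not data:
            break  # writer closed the pipe: stream confirmed empty
        chunks.append(data)
    return b"".join(chunks)


# Demo: a short-lived "process" writes and exits before the reader runs.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello\n")
os.close(write_fd)  # models the process exiting
drained = drain_after_exit(read_fd)
os.close(read_fd)
```

Returning EOF immediately on the exit notification would have dropped `hello\n`; draining first delivers it and then ends the stream.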
Alex Lyn	700bddeecc	agent: treat EOF as normal for read_stdout/stderr stream Legacy IO uses shim polling via read_stdout/read_stderr. The agent previously mapped pipe EOF (read() == 0) and term_exit_notifier to errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures. This caused runtime IO copy to abort early, leading to lost stdout/stderr for short-lived exec (e.g."echo") and spurious failures. Normalize EOF semantics: read_stream now returns Ok(empty) on EOF instead of Err("read meet eof"). This makes legacy IO behave like a proper stream: data until EOF, no INTERNAL errors for normal termination. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
stevenhorsman	b909c41128	runtime: Bump x/net to v0.49.0 Bump x/net to resolve CVEs: - GO-2026-4441 - GO-2026-4440 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
stevenhorsman	b29312289f	versions: Bump go to 1.24.13 Bump go to 1.24.13 to fix CVE GO-2026-4337 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
Zvonko Kaiser	7af306de13	agent: Update aarch64 create_pci_root_bus_path aarch64 is also a supported architecture for NUMA. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 10:19:41 +01:00
Zvonko Kaiser	8185c015ad	gpu: Add Agent NUMA Support 1 of N We're introducing a root_complex to assign each and every device to a NUMA node or to the default root_complex="00" aka pcie.0. This patch introduces proper handling of the qom path: it is currently bus/device == "00/02", and with NUMA we need to extend it to root_complex/bus/device == "10/00/02". We're defaulting to root_complex="00". Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 10:19:41 +01:00
Alex Lyn	16a7ed6e14	Merge pull request #12464 from mythi/runtime-rs-tdvf runtime-rs: use FIRMWARETDVFPATH like Go runtime	2026-02-09 09:12:52 +08:00
Mikko Ylinen	4088881662	runtime-rs: use FIRMWARETDVFPATH like Go runtime Use OVMF path configuration for Intel TDX consistently: $ git grep FIRMWARETD src/runtime-rs/Makefile:FIRMWARETDXPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd src/runtime-rs/Makefile:USER_VARS += FIRMWARETDXPATH src/runtime-rs/config/configuration-qemu-tdx-runtime-rs.toml.in:firmware = "@FIRMWARETDXPATH@" src/runtime/Makefile:FIRMWARETDVFPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd The Go runtime has used TDVF, so just make runtime-rs follow. This keeps the behavior consistent when downstreams switch from the Go runtime to runtime-rs. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-08 21:38:06 +01:00
Jiri Moravcik	d5840149d2	docs: add `OS_VERSION` to rootfs script The OS_VERSION is required when trying to build RootFS with ubuntu distro. Fixes #12469 Signed-off-by: Jiri Moravcik <jiri.moravcik@gmail.com>	2026-02-08 21:21:59 +01:00
Manuel Huber	d9d1073cf1	gpu: Install packages for devkit Introduce a new function to install additional packages into the devkit flavor. With modprobe, we avoid errors on pod startup related to loading nvidia kernel modules in the NVRC phase. Note, the production flavor gets modprobe from busybox, see its configuration file containing CONFIG_MODPROBE=y. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-06 09:58:32 +01:00
Manuel Huber	a786582d0b	rootfs: deprecate initramfs dm-verity mode Remove the initramfs folder, its build steps, and use the kernel based dm-verity enforcement for the handlers which used the initramfs mode. Also, remove the initramfs verity mode capability from the shims and their configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	cf7f340b39	tests: Read and overwrite kernel_verity_parameters Read the kernel_verity_params from the shim config and adjust the root hash for the negative test. Further, improve some of the test logic by using shared functions. This especially ensures we don't read the full journalctl logs on a node but only the portion of the logs we are actually supposed to look at. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	7958be8634	runtime: Make kernel_verity_params overwritable Similar to the kernel_params annotation, add a kernel_verity_params annotation and add logic to make these parameters overwritable. For instance, this can be used in test logic to provide bogus dm-verity hashes for negative tests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	7700095ea8	runtime-rs: Make kernel_verity_params overwritable Similar to the kernel_params annotation, add a kernel_verity_params annotation and add logic to make these parameters overwritable. For instance, this can be used in test logic to provide bogus dm-verity hashes for negative tests. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	472b50fa42	runtime-rs: Enable kernelinit dm-verity variant This change introduces the kernel_verity_parameters knob to the rust based shim, picking up dm-verity information in a new config field (the corresponding build variable is already produced by the shim build). The change extends the shim to parse dm-verity information from this parameter and to construct the kernel command line appropriately, based on the indicated initramfs or kernelinit build variant. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f639c3fa17	runtime: Enable kernelinit dm-verity variant This change introduces the kernel_verity_parameters knob to the Go based shim, picking up dm-verity information in a new config field (the corresponding build variable is already produced by the shim build). The change extends the shim to parse dm-verity information from this parameter and to construct the kernel command line appropriately, based on the indicated initramfs or kernelinit build variant. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	e120dd4cc6	tests: cc: Remove quotes from kernel command line With dm-mod.create parameters using quotes, we remove the backslashes used to escape these quotes from the output we retrieve. This will enable attestation tests to work with the kernelinit dm-verity mode. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	976df22119	rootfs: Change condition for cryptsetup-bin Measured rootfs mode and CDH secure storage feature require the cryptsetup-bin and e2fsprogs components in the guest. This change makes this more explicit: confidential guests are users of the CDH secure container image layer storage feature. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	a3c4e0b64f	rootfs: Introduce kernelinit dm-verity mode This change introduces the kernelinit dm-verity mode, allowing initramfs-less dm-verity enforcement against the rootfs image. For this, the change introduces a new variable with dm-verity information. This variable will be picked up by shim configurations in subsequent commits. This will allow the shims to build the kernel command line with dm-verity information based on the existing kernel_parameters configuration knob and a new kernel_verity_params configuration knob. The latter specifically provides the relevant dm-verity information. This new configuration knob avoids merging the verity parameters into the kernel_params field. Avoiding this, no cumbersome escape logic is required as we do not need to pass the dm-mod.create="..." parameter directly in the kernel_parameters, but only relevant dm-verity parameters in semi-structured manner (see above). The only place where the final command line is assembled is in the shims. Further, this is a line easy to comment out for developers to disable dm-verity enforcement (or for CI tasks). This change produces the new kernelinit dm-verity parameters for the NVIDIA runtime handlers, and modifies the format of how these parameters are prepared for all handlers. With this, the parameters are currently no longer provided to the kernel_params configuration knob for any runtime handler. This change alone should thus not be used as dm-verity information will no longer be picked up by the shims. systemd-analyze on the coco-dev handler shows that using the kernelinit mode on a local machine, less time is spent in the kernel phase, slightly speeding up pod start-up. On that machine, the average of 172.5ms was reduced to 141ms (4 measurements, each with a basic pod manifest), i.e., the kernel phase duration is improved by about 18 percent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	83a0bd1360	gpu: use dm-verity for the non-TEE GPU handler Use a dm-verity protected rootfs image for the non-TEE NVIDIA GPU handler as well. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	02ed4c99bc	rootfs: Use maxdepth=1 to search for kata tarballs These tarballs are in the top layer of the build directory, no need to traverse all sub-directories. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	d37db5f068	rootfs: Restore "gpu: Handle root_hash.txt ..." This reverts commit `923f97bc66` in order to reinstate the logic from commit `e4a13b9a4a`. The latter commit was previously reverted due to the NVIDIA GPU TEE handler using an initrd, not an image. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	f1ca547d66	initramfs: introduce log function Log to /dev/kmsg, this way logs will show up and not get lost. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	6d0bb49716	runtime: nvidia: Use img and sanitize whitespaces Shift NVIDIA shim configurations to use an image instead of an initrd, and remove trailing whitespaces from the configs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Manuel Huber	282014000f	tests: cc: support initrd, image for attestation Allow using an image instead of an initrd. For confidential guests using images, the assumption is that the guest kernel uses dm-verity protection, implicitly measuring the rootfs image via the kernel command line's dm-verity information. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-05 23:04:35 +01:00
Greg Kurz	e430b2641c	Merge pull request #12435 from bpradipt/crio-annotation shim: Add CRI-O annotation support for device cold plug	2026-02-05 09:29:19 +01:00
Alex Lyn	e257430976	Merge pull request #12433 from manuelh-dev/mahuber/cfg-sanitize-whitespaces runtimes: Sanitize trailing whitespaces	2026-02-05 09:31:21 +08:00
Fabiano Fidêncio	dda1b30c34	tests: nvidia-nim: Use sealed secrets for NGC_API_KEY Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed secret for the CC GPU tests. This ensures the API key is only accessible within the confidential enclave after successful attestation. The sealed secret uses the "vault" type which points to a resource stored in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside the guest will unseal this secret by fetching it from KBS after attestation. The initdata file is created AFTER create_tmp_policy_settings_dir() copies the empty default file, and BEFORE auto_generate_policy() runs. This allows genpolicy to add the generated policy.rego to our custom CDH configuration. The sealed secret format follows the CoCo specification: sealed.<JWS header>.<JWS payload>.<signature> Where the payload contains: - version: "0.1.0" - type: "vault" (pointer to KBS resource) - provider: "kbs" - resource_uri: KBS path to the actual secret Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:34:44 +01:00
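Note on dda1b30c34: the sealed secret layout described above can be sketched as follows. This is only an illustration built from the four payload fields listed in the commit message; the header and signature strings are placeholders, the `resource_uri` value is hypothetical, and the unpadded base64url encoding follows general JWS convention rather than anything stated in the commit. Real secrets may carry additional fields.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS segments conventionally are."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_sealed_secret(resource_uri: str) -> str:
    """Build sealed.<JWS header>.<JWS payload>.<signature> for a 'vault' secret."""
    payload = {
        "version": "0.1.0",
        "type": "vault",        # pointer to a KBS resource, not the secret itself
        "provider": "kbs",
        "resource_uri": resource_uri,
    }
    # Placeholder header and signature: assumptions for this sketch only.
    return ".".join(
        ["sealed", "fakejwsheader", b64url(json.dumps(payload).encode()), "fakesignature"]
    )


def decode_payload(secret: str) -> dict:
    """Recover the JSON payload from a sealed secret string."""
    _prefix, _header, payload, _signature = secret.split(".")
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# Hypothetical KBS path used purely for the demo.
secret = make_sealed_secret("kbs:///default/ngc-api-key/instance")
decoded = decode_payload(secret)
```

The value stored in the Kubernetes secret is therefore only a pointer; the actual key is fetched from KBS by the CDH inside the guest after attestation.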
Fabiano Fidêncio	c9061f9e36	tests: kata-deploy: Increase post-deployment wait time Increase the sleep time after kata-deploy deployment from 10s to 60s to give more time for runtimes to be configured. This helps avoid race conditions on slower K8s distributions like k3s where the RuntimeClass may not be immediately available after the DaemonSet rollout completes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	0fb2c500fd	tests: kata-deploy: Merge E2E tests to avoid timing issues Merge the two E2E tests ("Custom RuntimeClass exists with correct properties" and "Custom runtime can run a pod") into a single test, as the two are very much dependent on each other. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	fef93f1e08	tests: kata-deploy: Use die() instead of fail() for error handling Replace fail() calls with die() which is already provided by common.bash. The fail() function doesn't exist in the test infrastructure, causing "command not found" errors when tests fail. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 12:13:53 +01:00
Fabiano Fidêncio	f90c12d4df	kata-deploy: Avoid text file busy error with nydus-snapshotter We cannot overwrite a binary that's currently in use, which is why elsewhere we remove / unlink the binary (the running process keeps its file descriptor, so we're good doing that) and only then copy the new binary. However, we missed doing this for the nydus-snapshotter deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-04 10:24:49 +01:00
Manuel Huber	30c7325e75	runtimes: Sanitize trailing whitespaces Clean up trailing whitespaces, making life easier for those who have configured their IDE to clean these up. Suggest to not add new code with trailing whitespaces etc. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-03 11:46:30 -08:00
Steve Horsman	30494abe48	Merge pull request #12426 from kata-containers/dependabot/github_actions/zizmorcore/zizmor-action-0.4.1 build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1	2026-02-03 14:38:54 +00:00
Pradipta Banerjee	8a449d358f	shim: Add CRI-O annotation support for device cold plug Add support for CRI-O annotations when fetching pod identifiers for device cold plug. The code now checks containerd CRI annotations first, then falls back to CRI-O annotations if they are empty. This enables device cold plug to work with both containerd and CRI-O container runtimes. Annotations supported: - containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace - CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2026-02-03 04:51:15 +00:00
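Note on 8a449d358f: the lookup order described above can be sketched as follows. The shim itself is not written in Python; `pod_identifiers` is a hypothetical helper, and whether the real code falls back per field or for the name/namespace pair as a whole is not stated in the commit message (this sketch falls back per field). The annotation keys are the ones listed in the message.

```python
# containerd CRI annotations (checked first)
CONTAINERD_NAME = "io.kubernetes.cri.sandbox-name"
CONTAINERD_NAMESPACE = "io.kubernetes.cri.sandbox-namespace"
# CRI-O annotations (fallback)
CRIO_NAME = "io.kubernetes.cri-o.KubeName"
CRIO_NAMESPACE = "io.kubernetes.cri-o.Namespace"


def pod_identifiers(annotations: dict) -> tuple:
    """Return (pod name, namespace), preferring containerd annotations
    and falling back to CRI-O ones when the former are missing or empty."""
    name = annotations.get(CONTAINERD_NAME, "") or annotations.get(CRIO_NAME, "")
    namespace = annotations.get(CONTAINERD_NAMESPACE, "") or annotations.get(CRIO_NAMESPACE, "")
    return name, namespace


# Hypothetical sandbox annotations as each runtime might provide them.
from_containerd = pod_identifiers(
    {CONTAINERD_NAME: "gpu-pod", CONTAINERD_NAMESPACE: "default"}
)
from_crio = pod_identifiers({CRIO_NAME: "gpu-pod", CRIO_NAMESPACE: "kube-system"})
from_neither = pod_identifiers({})
```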
Steve Horsman	6bb77a2f13	Merge pull request #12390 from mythi/tdx-updates-2026-2 runtime: tdx QEMU configuration changes	2026-02-02 16:58:44 +00:00
Zvonko Kaiser	6702b48858	Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state kata-deploy: nydus: Always start from a clean state	2026-02-02 11:21:26 -05:00
Steve Horsman	0530a3494f	Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable kata-deploy: Make update strategy configurable for kata-deploy DaemonSet	2026-02-02 10:29:01 +00:00
Steve Horsman	93dcaee965	Merge pull request #12423 from manuelh-dev/mahuber/pause-build-fix packaging: Delete pause_bundle dir before unpack	2026-02-02 10:26:30 +00:00
Fabiano Fidêncio	62ad0814c5	kata-deploy: nydus: Always start from a clean state Clean up existing nydus-snapshotter state to ensure fresh start with new version. This is safe across all K8s distributions (k3s, rke2, k0s, microk8s, etc.) because we only touch the nydus data directory, not containerd's internals. When containerd tries to use non-existent snapshots, it will re-pull/re-unpack. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-02 11:06:37 +01:00
Mikko Ylinen	870630c421	kata-deploy: drop custom TDX installation steps As we have moved to use QEMU (and OVMF already earlier) from kata-deploy, the custom tdx configurations and distro checks are no longer needed. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-02 11:11:26 +02:00
Mikko Ylinen	927be7b8ad	runtime: tdx: move to use QEMU from kata-deploy Currently, a working TDX setup expects users to install special TDX support builds from Canonical/CentOS virt-sig for TDX to work. kata-deploy configured TDX runtime handler to use QEMU from the distro's paths. With TDX support now being available in upstream Linux and Ubuntu 24.04 having an install candidate (linux-image-generic-6.17) for a new enough kernel, move TDX configuration to use QEMU from kata-deploy. While this is the new default, going back to the original setup is possible by making manual changes to TDX runtime handlers. Note: runtime-rs is already using QEMUPATH for TDX. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-02-02 11:10:52 +02:00
Nikolaj Lindberg Lerche	6e98df2bac	kata-deploy: Make update strategy configurable for kata-deploy DaemonSet This allows the updateStrategy to be configured for the kata-deploy Helm chart, enabling administrators to control the aggressiveness of updates. For a less aggressive approach, the strategy can be set to `OnDelete`. Alternatively, the update process can be made more aggressive by adjusting the `maxUnavailable` parameter. Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>	2026-02-01 20:14:29 +01:00
Dan Mihai	d7ff54769c	tests: policy: remove the need for using sudo Modify the copy of root user's settings file, instead of modifying the original file. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-01 20:09:50 +01:00
Dan Mihai	4d860dcaf5	tests: policy: avoid redundant debug output Avoid redundant and confusing teardown_common() debug output for k8s-policy-pod.bats and k8s-policy-pvc.bats. The Policy tests skip the Message field when printing information about their pods, because unfortunately that field might contain a truncated Policy log - for the test cases that intentionally cause Policy failures. The non-truncated Policy log is already available from other "kubectl describe" fields. So, avoid the redundant pod information from teardown_common(), that also included the confusing Message field. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2026-02-01 20:09:50 +01:00
dependabot[bot]	dc8d9e056d	build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1 Bumps [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) from 0.2.0 to 0.4.1. - [Release notes](https://github.com/zizmorcore/zizmor-action/releases) - [Commits](`e673c3917a...135698455d`) --- updated-dependencies: - dependency-name: zizmorcore/zizmor-action dependency-version: 0.4.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2026-02-01 15:08:10 +00:00
Manuel Huber	8b0c199f43	packaging: Delete pause_bundle dir before unpack Delete the pause_bundle directory before running the umoci unpack operation. This will make builds idempotent and not fail with errors like "create runtime bundle: config.json already exists in .../build/pause-image/destdir/pause_bundle". This will make life better when building locally. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-31 19:43:11 +01:00
Steve Horsman	4d1095e653	Merge pull request #12350 from manuelh-dev/mahuber/term-grace-period tests: Remove terminationGracePeriod in manifests	2026-01-29 15:17:17 +00:00
Fabiano Fidêncio	b85393e70b	release: Bump version to 3.26.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Fabiano Fidêncio	500146bfee	versions: Bump Go to 1.24.12 Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities in the standard library: - GO-2026-4342: Excessive CPU consumption in archive/zip - GO-2026-4341: Memory exhaustion in net/url query parsing - GO-2026-4340: TLS handshake encryption level issue in crypto/tls Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-29 00:23:26 +01:00
Dan Mihai	20ca4d2d79	runtime: DEFDISABLEBLOCK := true 1. Add disable_block_device_use to CLH settings file, for parity with the already existing QEMU settings. 2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After this change, Kata Guests will use by default virtio-fs to access container rootfs directories from their Hosts. Hosts that were designed to use Host block devices attached to the Guests can re-enable these rootfs block devices by changing the value of disable_block_device_use back to false in their settings files. 3. Add test using container image without any rootfs layers. Depending on the container runtime and image snapshotter being used, the empty container rootfs image might get stored on a host block device that cannot be safely hotplugged to a guest VM, because the host is using the same block device. 4. Add block device hotplug safety warning into the Kata Shim configuration files. Signed-off-by: Dan Mihai <dmihai@microsoft.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Cameron McDermott <cameron@northflank.com>	2026-01-28 19:47:49 +01:00
Manuel Huber	5e60d384a2	kata-deploy: Update for mariner in all target Remove the initrd function and add the image function to align with the actually existing functions in this file. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-28 08:58:45 -08:00
Greg Kurz	ea627166b9	Merge pull request #12389 from ldoktor/ci-helm ci.ocp: Use 0.0.0-dev tagged helm chart	2026-01-28 17:20:07 +01:00
Manuel Huber	0d8fbdef07	kernel: Readjust kernel version after decrement Readjust the kata_config_version counter after it was accidentally decremented in commit `c7f5ff4`. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-28 10:48:12 +01:00
Joji Mekkattuparamban	1440dd7468	shim: enforce iommufd for confidential guest vfio Confidential guests cannot use traditional IOMMU Group based VFIO. Instead, they need to use IOMMUFD. This is mainly because the group abstraction is incompatible with a confidential device model. If traditional VFIO is specified for a confidential guest, detect the error and bail out early. Fixes #12393 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2026-01-28 00:11:38 +01:00
stevenhorsman	c7bc428e59	versions: Bump guest-components Bump guest-components to 9aae2eae to pick up the latest security fixes and toolchain bump Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-28 00:05:58 +01:00
Aurélien Bombo	932920cb86	Merge pull request #11959 from houstar/main agent: remove redundant func comment	2026-01-27 12:01:04 -06:00
Lukáš Doktor	5250d4bacd	ci.ocp: Use 0.0.0-dev tagged helm chart In CI we are testing the latest kata-deploy, which requires the latest helm chart. The previous query doesn't work anymore, but these days we should be able to rely on the "0.0.0-dev" tag and on helm to print the to-be-installed version into the console. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-01-27 14:58:46 +01:00
Steve Horsman	eb3d204ff3	Merge pull request #12274 from ldoktor/pp-images ci.ocp: Two little fixes regarding the openshift-ci	2026-01-27 11:31:51 +00:00
Lukáš Doktor	971b096a1f	ci.ocp: Update cleanup.sh to cope with helm deployment replaces the old kata-deploy and uses "helm uninstall" instead. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-01-27 07:59:13 +01:00
Lukáš Doktor	272ff9c568	ci.ocp: Add notes about where to get other podvm images I keep struggling to find the debug images, so let's include them in the peer-pods-azure.sh script so people can find them more easily. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-01-27 07:59:12 +01:00
Qingyuan Hou	ca43a8cbb8	agent: remove redundant func comment This comment was first introduced in `e111093` with secure_join() but then we forgot to remove it when we switched to the safe-path lib in `c0ceaf6` Signed-off-by: Qingyuan Hou <lenohou@gmail.com>	2026-01-27 03:07:57 +00:00
Alex Lyn	6c0ae4eb04	Merge pull request #11585 from Apokleos/enhance-qmp runtime-rs: Make QMP init robust by retrying handshake with deadline	2026-01-27 09:11:19 +08:00
Zvonko Kaiser	a59f791bf5	gpu: Move CUDA repo selection to versions.yaml We want to enable local and remote CUDA repository builds. Moving the cuda and tools repo to versions.yaml with a unified build for both types. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-26 22:19:40 +01:00
Fabiano Fidêncio	d0fe60e784	tests: Fix empty string handling for helm Fix empty string handling in format conversion When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or HELM_AGENT_NO_PROXY are empty, the pattern matching condition `!= :` or `!= =` evaluates to true, causing the conversion loop to create invalid entries like "qemu-tdx: qemu-snp:". Add -n checks to ensure conversion only runs when variables are non-empty. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	4b2d4e96ae	tests: Add qemu-{tdx,snp}-runtime-rs to the list of tee shims We missed doing this as part of `b5a986eacf`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	26c534d610	tests: Use shims.disableAll in test helpers Update the CI and functional test helpers to use the new shims.disableAll option instead of iterating over every shim to disable them individually. Also adds helm repo for node-feature-discovery before building dependencies to fix CI failures on some distributions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	04f45a379c	kata-deploy: docs: Document shims.disableAll option Update the Helm chart README to document the new shims.disableAll option and simplify the examples that previously required listing every shim to disable. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	c9e9a682ab	kata-deploy: Use disableAll in example values files Simplify the example values files by using the new shims.disableAll option instead of listing every shim to disable. Before (try-kata-nvidia-gpu.values.yaml): shims: clh: enabled: false cloud-hypervisor: enabled: false # ... 15 more lines ... After: shims: disableAll: true Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	cfe9bcbaf1	kata-deploy: Add shims.disableAll option to Helm chart Add a new `shims.disableAll` option that disables all standard shims at once. This is useful when: - Enabling only specific shims without listing every other shim - Using custom runtimes only mode (no standard Kata shims) Usage: shims: disableAll: true qemu: enabled: true # Only qemu is enabled All helper templates are updated to check for this flag before iterating over shims. One thing that's super important to note here is that helm recursively merges user values with chart defaults, making a simple `disableAll` flag problematic: if defaults have `enabled: true`, user's `disableAll: true` gets merged with those defaults, resulting in all shims still being enabled. The workaround found is to use null (`~`) as the default for `enabled` field. The template logic interprets null differently based on disableAll: \| enabled value \| disableAll: false \| disableAll: true \| \|---------------\|-------------------\|------------------\| \| ~ (null) \| Enabled \| Disabled \| \| true \| Enabled \| Enabled \| \| false \| Disabled \| Disabled \| This is backward compatible: - Default behavior unchanged: all shims enabled when disableAll: false - Users can set `disableAll: true` to disable all, then explicitly enable specific shims with `enabled: true` - Explicit `enabled: false` always disables, regardless of disableAll Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
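The tri-state table in the `shims.disableAll` commit above can be read as a small resolution rule. The following is a minimal Python sketch of that rule only, not the chart's actual template code; the function name `shim_is_enabled` is hypothetical, and `None` stands in for the YAML null (`~`) default.

```python
from typing import Optional


def shim_is_enabled(enabled: Optional[bool], disable_all: bool) -> bool:
    """Resolve a shim's effective state from its tri-state `enabled` value."""
    if enabled is None:
        # Unset (null / `~`): follow the chart-wide disableAll switch.
        return not disable_all
    # An explicit true or false always wins over disableAll.
    return enabled


# Reproduce the truth table from the commit message.
for value in (None, True, False):
    for disable_all in (False, True):
        state = "Enabled" if shim_is_enabled(value, disable_all) else "Disabled"
        print(f"enabled={value!s:<5} disableAll={disable_all!s:<5} -> {state}")
```

Using null as the default is what keeps the flag usable: a default of `true` would be indistinguishable from a user's explicit `true` after Helm merges values.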
Fabiano Fidêncio	d8a3272f85	kata-deploy: Add tests for custom runtimes Helm templates Add Bats tests to verify the custom runtimes Helm template rendering, and that we can start a pod with the custom runtime. Tests were written with Cursor's help. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	3be57bb501	kata-deploy: Add Helm chart support for custom runtimes Add Helm chart configuration for defining custom RuntimeClasses with base configuration and drop-in overrides. Usage: helm install kata-deploy ./kata-deploy \ -f custom-runtimes.values.yaml Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	a76cdb5814	kata-deploy: Add custom runtime config installation/removal Add functions to install and remove custom runtime configuration files. Each custom runtime gets an isolated directory structure: custom-runtimes/{handler}/ configuration-{baseConfig}.toml # Copied from base config config.d/ 50-overrides.toml # User's drop-in overrides The base config is copied AFTER kata-deploy has applied its modifications (debug settings, proxy configuration, annotations), so custom runtimes inherit these settings. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	4c3989c3e4	kata-deploy: Add custom runtime configuration for containerd/CRI-O Add functions to configure custom runtimes in containerd and CRI-O. Custom runtimes use an isolated config directory under: custom-runtimes/{handler}/ Custom runtimes automatically derive the shim binary path from the baseConfig field using the existing is_rust_shim() logic. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Fabiano Fidêncio	678b560e6d	kata-deploy: Add CustomRuntime struct and parsing Add support for parsing custom runtime configurations from a mounted ConfigMap. This allows users to define their own RuntimeClasses with custom Kata configurations. The ConfigMap format uses a custom-runtimes.list file with entries: handler:baseConfig:containerd_snapshotter:crio_pulltype Drop-in files are read from dropin-{handler}.toml, if present. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
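The `custom-runtimes.list` entry format described in the commit above (`handler:baseConfig:containerd_snapshotter:crio_pulltype`) could be parsed roughly as follows. This is a Python sketch under stated assumptions, not the Rust code from the commit: the class and function names are hypothetical, blank lines are assumed to be ignored, every entry is assumed to carry exactly four colon-separated fields, and the sample values are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CustomRuntime:
    handler: str
    base_config: str
    containerd_snapshotter: str
    crio_pulltype: str


def parse_custom_runtimes(text: str) -> List[CustomRuntime]:
    """Parse one runtime per line; fields are separated by ':'."""
    runtimes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped
        fields = line.split(":")
        if len(fields) != 4:
            raise ValueError(
                f"expected 4 colon-separated fields, got {len(fields)}: {line!r}"
            )
        runtimes.append(CustomRuntime(*fields))
    return runtimes


# Hypothetical entry; the handler and field values are illustrative only.
sample = "my-runtime:qemu:nydus:guest-pull\n"
for runtime in parse_custom_runtimes(sample):
    print(runtime)
```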
Fabiano Fidêncio	609a25e643	kata-deploy: Refactor runtime configuration with helper functions Let's extract the common logic from configure_containerd_runtime and configure_crio_runtime into reusable helper functions. This reduces code duplication and prepares for adding custom runtime support. For containerd: - Add ContainerdRuntimeParams struct to encapsulate common parameters - Add get_containerd_pluginid() to extract version detection logic - Add get_containerd_output_path() to extract file path resolution - Add write_containerd_runtime_config() to write common TOML values For CRI-O: - Add CrioRuntimeParams struct to encapsulate common parameters - Add write_crio_runtime_config() to write common configuration While here, let's also simplify pod_annotations to always use "[\"io.katacontainers.*\"]" for all runtimes, as the NVIDIA specific case has been removed from the shell script, but we forgot to do so here. No functional changes intended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-26 20:50:01 +01:00
Steve Horsman	aa94038355	Merge pull request #12388 from Apokleos/fix-shimio runtime-rs: Use File instead of UnixStream for FIFO to fix ENOTSOCK	2026-01-26 13:22:57 +00:00
tak-ka3	5471fa133c	runtime-rs: Add -info flag support for containerd v2.0+ Add -info flag handling to containerd-shim-kata-v2 (Rust version). This outputs RuntimeInfo protobuf (name, version, revision) to stdout, providing compatibility with containerd v2.0+ which queries runtime information via this flag. This is the runtime-rs counterpart to the Go implementation. Fixes #12133 Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>	2026-01-26 13:38:07 +01:00
Alex Lyn	68d671af0f	runtime-rs: Make QMP init robust by retrying handshake with deadline It aims to make QMP initialization robust by retrying the QMP handshake with a global deadline to handle slow QEMU bring-up. Qmp::new() used DEFAULT_QMP_READ_TIMEOUT as the effective deadline for the QMP handshake read. When QEMU initialization is slow (e.g. heavy host load, large memory/device init, slow storage, confidential guests, etc.), the QMP greeting may not become readable within a small per-read timeout (e.g. 250ms). This caused QMP init to fail with "Resource temporarily unavailable (os error 11)" and spam "couldn't initialise QMP", while subsequent retries might eventually succeed once QEMU became ready. To address this issue, keep a short per-read timeout to avoid indefinite blocking, but add a global "wait for QMP ready" deadline that retries the handshake with a small backoff. This improves startup reliability under load and avoids unnecessary reconnect failures. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-26 16:47:32 +08:00
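The pattern the QMP commit above describes (a short per-attempt timeout wrapped in a longer global deadline with a small backoff) can be sketched generically. This Python sketch is not the runtime-rs code: `wait_until_ready`, `fake_handshake`, and the timing values are all hypothetical, and the handshake is simulated rather than talking to QEMU.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_ready(handshake: Callable[[], T], deadline_s: float, backoff_s: float) -> T:
    """Retry `handshake` until it succeeds or the global deadline passes."""
    give_up_at = time.monotonic() + deadline_s
    while True:
        try:
            # Each attempt is expected to enforce its own short read timeout.
            return handshake()
        except (TimeoutError, BlockingIOError):
            # Out of time for another attempt: surface the last error.
            if time.monotonic() + backoff_s > give_up_at:
                raise
            time.sleep(backoff_s)


attempts = 0


def fake_handshake() -> str:
    """Simulated peer that is not ready for the first two attempts."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise BlockingIOError(11, "Resource temporarily unavailable")
    return "greeting"


result = wait_until_ready(fake_handshake, deadline_s=1.0, backoff_s=0.01)
print(result, "after", attempts, "attempts")
```

Keeping the per-attempt timeout short avoids blocking indefinitely on a single read, while the global deadline decides how long slow start-up is tolerated overall.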
Bo Liu	c7f5ff45a2	arm64: Update ptp.conf to correct time sync Given the patch has been merged in linux upstream, it's safe to enable these two options. Signed-off-by: Bo Liu <152475812+liubocflt@users.noreply.github.com>	2026-01-24 21:08:21 +01:00
Hui Zhu	37a0c81b6a	libs: Change kv of get_agent_kernel_params to BTreeMap HashMap cannot guarantee iteration order, so the generated command line keeps changing. This commit changes kv of get_agent_kernel_params to a BTreeMap to make sure the command line stays stable. Fixes: #10977 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2026-01-24 21:07:41 +01:00
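The ordering issue in the commit above can be illustrated outside Rust. A `BTreeMap` iterates in sorted key order; the closest Python analogue is to sort the keys explicitly when building the string. This is a sketch under assumptions: `build_kernel_params`, the `key=value` output format, and the sample keys are hypothetical and not taken from the commit.

```python
from typing import Mapping


def build_kernel_params(params: Mapping[str, str]) -> str:
    """Join parameters in sorted key order so the result is deterministic."""
    # Sorting here plays the role that BTreeMap's ordered iteration plays in Rust.
    return " ".join(f"{key}={value}" for key, value in sorted(params.items()))


# Same content, different insertion order (hypothetical keys).
first = {"agent.b": "2", "agent.a": "1"}
second = {"agent.a": "1", "agent.b": "2"}

print(build_kernel_params(first))
print(build_kernel_params(first) == build_kernel_params(second))
```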
Alex Lyn	e7b8b302ac	runtime-rs: Use File instead of UnixStream for FIFO to fix ENOTSOCK It aims to address the issue: "run_io_copy[Stdout]: failed to copy stream: Not a socket (os error 88)" The `Not a socket (os error 88)` error was caused by incorrectly wrapping a FIFO file descriptor in a `UnixStream`. This commit makes the following changes: (1) Refactor `open_fifo_write` to return `tokio::fs::File` (or a generic async reader/writer) instead of `AsyncUnixStream`. (2) Ensure IO copying logic treats stdout/stderr streams as file-like objects rather than sockets. This fix eliminates the "failed to copy stream" errors in the IO loop and ensures reliable log forwarding for legacy-io. Fixes: #12387 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-24 10:41:27 +00:00
Alex Lyn	8a0fad4b95	runtime-rs: Move the set_flag_with_blocking out as a public method Move the private closure out and make it a public method that is responsible for clearing O_NONBLOCK on an fd and turning it into blocking mode. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-24 10:41:27 +00:00
Manuel Huber	6438fe7f2d	tests: Remove terminationGracePeriod in manifests Do not kill containers immediately, instead use Kubernetes' default termination grace period. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-23 16:18:44 -08:00
Manuel Huber	0d35b36652	Revert "ci: Ensure the KBS resources are created" This reverts commit `c0d7222194`. Soon, guest components will switch to using a DB instead of storing resources in the filesystem. Further, I don't see any more indicators why kbs-client would struggle to set simple resources. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-23 16:18:10 -08:00
Fabiano Fidêncio	5b82b160e2	runtime-rs: Add arm64 QEMU support Add the necessary configuration and code changes to support QEMU on arm64 architecture in runtime-rs. Changes: - Set MACHINETYPE to "virt" for arm64 - Add machine accelerators "usb=off,gic-version=host" required for proper arm64 virtualization - Add arm64-specific kernel parameter "iommu.passthrough=0" - Guard vIOMMU (Intel IOMMU) to skip on arm64 since it's not supported These changes align runtime-rs with the Go runtime's arm64 QEMU support. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2026-01-23 19:48:31 +01:00
tak-ka3	29e7dd27f1	runtime: Add -info flag support for containerd v2.0+ Add support for the -info flag that containerd v2.0+ passes to shims. The flag outputs RuntimeInfo protobuf to stdout containing the shim name and version information. Fixes #12133 Signed-off-by: tak-ka3 <takumi.hiraoka@acompany-ac.com>	2026-01-22 19:26:44 +01:00
Steve Horsman	d0bfb27857	Merge pull request #12384 from Apokleos/fix-full-debug doc: update enabling full debug method	2026-01-22 14:25:11 +00:00
Fabiano Fidêncio	ac8436e326	kata-deploy: Update debian in the container image to 13 (trixie) Just a bump to the latest version, as requested by Mikko. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-22 12:32:59 +01:00
Steve Horsman	2cd76796bd	Merge pull request #12305 from stevenhorsman/fix-stalebot-permissions ci: Fix stalebot permissions	2026-01-22 10:02:43 +00:00
Alex Lyn	fb7390ce3c	doc: update enabling full debug method The enable_debug parameter was explicitly set to false rather than being commented out (e.g., # enable_debug = true). As the previous enabling method failed to account for this explicit setting, it was rendered invalid. This commit updates the matching logic to correctly handle and toggle the explicit false value. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-22 17:44:57 +08:00
Hyounggyu Choi	bc131a84b9	GHA: Set timeout for kata-deploy and kbs cleanup It was observed that some kata-deploy cleanup steps could hang, causing the workflow to never finish properly. In these cases, a QEMU process was not cleaned up and kept printing debug logs to the journal. Over time, this maxed out the runner’s disk usage and caused the runner service to stop. Set timeouts for the relevant cleanup steps to avoid this. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-01-22 10:32:24 +01:00
Fabiano Fidêncio	dacb14619d	kata-deploy: Make verification ConfigMap a regular resource The verification job mounts a ConfigMap containing the pod spec for the Kata runtime test. Previously, both the ConfigMap and the Job were Helm hooks with different weights (-5 and 0 respectively). On k3s, a race condition was observed where the Job pod would be scheduled before the kubelet's informer cache had registered the ConfigMap, causing a FailedMount error: MountVolume.SetUp failed for volume "pod-spec": object "kube-system"/"kata-deploy-verification-spec" not registered This happened because k3s's lightweight architecture schedules pods very quickly, and the hook weight difference only controls Helm's ordering, not actual timing between resource creation and cache sync. By making the ConfigMap a regular chart resource (removing hook annotations), it is created during the main chart installation phase, well before any post-install hooks run. This guarantees the ConfigMap is fully propagated to all kubelets before the verification Job starts. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	89e287c3b2	kata-deploy: Add more permissions to verification job's RBAC The verification job needs to list nodes to check for the katacontainers.io/kata-runtime label and list events to detect FailedCreatePodSandBox errors during pod creation. This was discovered when testing with k0s, where the service account lacked the required cluster-scope permissions to list nodes. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	869dd5ac65	kata-deploy: Enable dynamic drop-in support for k0s Remove k0s-worker and k0s-controller from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT and always return true for k0s in is_containerd_capable_of_using_drop_in_files since k0s auto-loads from containerd.d/ directory regardless of containerd version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	d4ea02e339	kata-deploy: Add microk8s support with dynamic version detection Add microk8s case to get_containerd_paths() method and remove microk8s from RUNTIMES_WITHOUT_CONTAINERD_DROP_IN_SUPPORT to enable dynamic containerd version checking. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	69dd9679c2	kata-deploy: Centralize containerd path management Introduce ContainerdPaths struct and get_containerd_paths() method to centralize the complex logic for determining containerd configuration file paths across different Kubernetes distributions. The new ContainerdPaths struct includes: - config_file: File to read containerd version from and write to - backup_file: Backup file path before modification - imports_file: File to add/remove drop-in imports from (Option<String>) - drop_in_file: Path to the drop-in configuration file - use_drop_in: Whether drop-in files can be used Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	606c12df6d	kata-deploy: fix JSONPath parsing for labels with dots The JSONPath parser was incorrectly splitting on escaped dots (\.) causing microk8s detection to fail. Labels like "microk8s.io/cluster" were being split into ["microk8s\", "io/cluster"] instead of being treated as a single key. This adds a split_jsonpath() helper that properly handles escaped dots, allowing the automatic microk8s detection via the node label to work correctly. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
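The escaped-dot handling described in the JSONPath commit above can be sketched as a small splitter. This Python version illustrates the idea only and is not the Rust helper from the commit: it assumes an escaped dot (`\.`) is unescaped to a plain `.` inside the resulting key, and the `metadata.labels` prefix in the sample path is hypothetical.

```python
from typing import List


def split_jsonpath(path: str) -> List[str]:
    """Split on '.', treating a backslash-escaped dot as part of the key."""
    parts: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path) and path[i + 1] == ".":
            # Escaped dot: keep a literal '.' in the current key.
            current.append(".")
            i += 2
        elif ch == ".":
            # Unescaped dot: the current key ends here.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


print(split_jsonpath(r"metadata.labels.microk8s\.io/cluster"))
# → ['metadata', 'labels', 'microk8s.io/cluster']
```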
Fabiano Fidêncio	ec18dd79ba	tests: Simplify kata-deploy test to use helm directly The kata-deploy test was using helm_helper which made it hard to debug failures (die() calls would cause "Executed 0 tests" errors) and added unnecessary complexity. The test now calls helm directly like a user would, making it simpler and more representative of real-world usage. The verification job status is explicitly checked with proper failure detection instead of relying on helm --wait. Timeouts are configurable via environment variables to account for different network speeds and image sizes: - KATA_DEPLOY_TIMEOUT (default: 600s) - KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s) - KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s) Documentation has been added to explain what each timeout controls and how to customize them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	86e0b08b13	kata-deploy: Improve verification job timing and failure detection The verification job now supports configurable timeouts to accommodate different environments and network conditions. The daemonset timeout defaults to 1200 seconds (20 minutes) to allow for large image downloads, while the verification pod timeout defaults to 180 seconds. The job now waits for the DaemonSet to exist, pods to be scheduled, rollout to complete, and nodes to be labeled before creating the verification pod. A 15-second delay is added after node labeling to allow kubelet time to refresh runtime information. Retry logic with 3 attempts and a 10-second delay handles transient FailedCreatePodSandBox errors that can occur during runtime initialization. The job only fails on pod errors after a 30-second grace period to avoid false positives from timing issues. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
Fabiano Fidêncio	2369cf585d	tests: Fix retry loop bugs in helm_helper The retry loop in helm_helper had two bugs: 1. Counter initialized to 10 instead of 0, causing immediate failure 2. Exit condition used -eq instead of -ge, incorrect for loop logic These bugs would cause helm_helper to fail immediately on the first retry attempt instead of properly retrying up to max_tries times. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-21 20:14:33 +01:00
stevenhorsman	19efeae12e	workflow: Fix stalebot permissions When looking into stale bot more for issues, I realised that our existing stale job would need permissions to work. Unfortunately the behaviour of the actions without these permissions is to log, but still finish as successful. This means it was hard to spot we had an issue. Add the required permissions to get this working again and improve the message. Also add a concurrency rule to make zizmor happy. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 17:28:59 +00:00
Steve Horsman	70f6543333	Merge pull request #12371 from stevenhorsman/cargo-check build: Add cargo check	2026-01-21 14:50:07 +00:00
Steve Horsman	4eb50d7b59	Merge pull request #12334 from stevenhorsman/rust-linting-improvements Rust linting improvements	2026-01-21 14:01:37 +00:00
Steve Horsman	ba47bb6583	Merge pull request #11421 from kata-containers/dependabot/go_modules/src/runtime/github.com/urfave/cli-1.22.17 build(deps): bump github.com/urfave/cli from 1.22.14 to 1.22.17 in /src/runtime	2026-01-21 11:46:02 +00:00
stevenhorsman	62847e1efb	kata-ctl: Remove unnecessary unwrap Switch `is_err()` and then `unwrap_err()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:53:40 +00:00
stevenhorsman	78824e0181	agent: Remove unnecessary unwrap Switch `is_some()` and then `unwrap()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:53:40 +00:00
stevenhorsman	d135a186e1	libs: Remove unnecessary unwrap Switch `is_err()` and then `unwrap_err()` for `if let` which is "more idiomatic" Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	949e0c2ca0	libs: Remove unused imports Tidy up the imports Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	83b0c44986	dragonball: Remove unused imports Clean up the imports Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	7a02c54b6c	kata-ctl: Allow unused assigned in clap parsing command isn't ever read, but leave it in for now, so we don't disrupt the parsing option Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:48 +00:00
stevenhorsman	bf1539b802	libs: Replace manual default HugePageType has a manual default that can be derived more concisely Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-21 08:52:47 +00:00
stevenhorsman	0fd9eebf0f	kata-ctl: Update Cargo.lock The cargo check identified that the lock file is out of date, so bump this to fix the issue Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-20 16:07:34 +00:00
stevenhorsman	3f1533ae8a	build: Add cargo check We've had a couple of occasions that Cargo.lock has been out of sync with Cargo.toml, so try and extend our rust check to pick this up in the CI. There is probably a more elegant way than doing `cargo check` and checking for changes, but I'll start with this approach Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-20 16:07:34 +00:00
Greg Kurz	cf3441bd2c	agent: Refresh `Cargo.lock` Downstream builders at Red Hat complain that `Cargo.lock` doesn't match `Cargo.toml`. Run `cargo check` to refresh `Cargo.lock`. `git bisect` shows that `7cfb97d41b` is the first commit where `cargo check` has an effect in `src/agent`. Signed-off-by: Greg Kurz <groug@kaod.org>	2026-01-20 14:44:47 +01:00
Fabiano Fidêncio	e0158869b1	tests: Add common bats test runner function Add run_bats_tests() function to common.bash that provides consistent test execution and reporting across all test suites (k8s, nvidia, kata-deploy). This removes duplicated test runner code from run_kubernetes_tests.sh, run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-20 12:31:55 +01:00
Fabiano Fidêncio	5aff81198f	helm-chart: Fix warnings on README nydus -> `nydus` erofs -> `erofs` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	b5a986eacf	kata-deploy: Add runtime-rs TDX / SNP runtimeclasses https://github.com/kata-containers/kata-containers/pull/11534 has been merged and it added all the needed bits to deploy the QEMU SNP / TDX runtime-rs variants, apart from the kata-deploy additions, which is done by this PR. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 22:41:50 +01:00
Fabiano Fidêncio	c7570427d2	tests: Add report generation to NVIDIA tests The NVIDIA GPU test runner script was not generating test reports, causing the report_tests() function in gha-run.sh to have nothing to display. This aligns the script with run_kubernetes_tests.sh by: - Adding set -o pipefail for proper pipeline error handling - Creating a reports directory with timestamped subdirectory - Capturing test output to files with ok-/not_ok- prefixes - Adding --timing flag to bats for timing information Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 18:21:43 +01:00
Fabiano Fidêncio	c1216598e8	static-checks: Fix kata-deploy reference Let's just point to the official documentation rather than explaining exactly how to deploy (and the current text was very outdated). Removing fluentd / minikube examples is out of context of this commit. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 15:09:20 +01:00
Fabiano Fidêncio	96e1fb4ca6	tools: Remove runk The runk tool hasn't been supported for a few years, with no maintainers since ManaSugi stopped being involved in the project and the CI was disabled in 2024. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:43:53 +01:00
Fabiano Fidêncio	f68c25de6a	kata-deploy: Switch to the rust version Let's remove the script and rely only on the rust version from now on. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:07:49 +01:00
Fabiano Fidêncio	d7aa793dde	Revert "ci: Run a nightly job using the kata-deploy rust" This reverts commit `6130d7330f`, as we're officially switching to the rust version of kata-deploy. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 14:07:49 +01:00
Fabiano Fidêncio	17472f3f10	release: scripts: Accept KATA_TOOLS_STATIC_TARBALL env var `a2534e7bc8` introduced the logic to also release a kata-tools tarball, but it missed allowing KATA_TOOLS_STATIC_TARBALL env var to be passed to the release script, leading to the following error during the release process: ``` ERROR: Invalid environment variable "KATA_TOOLS_STATIC_TARBALL" ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 13:03:23 +01:00
Fabiano Fidêncio	882862d711	release: Bump version to 3.25.0 Bump VERSION and helm-charts versions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-19 11:33:45 +01:00
XanderC	93beb58c5d	runtime: fix network initialization for non-hotplug VMMs In startVM(), for VMMs without hotplug support (e.g., Firecracker or QEMU microvm), the runtime runs prestart hooks but misses rescanning the network namespace. This causes VMs to boot with uninitialized network configs, as updates from CNI plugins are not captured. This patch adds a network rescan via AddEndpoints after prestart hooks for the non-hotplug path, ensuring correct network info is passed to the VMM configuration before the VM starts. Fixes #11500 Signed-off-by: XanderC <xanderc@qq.com>	2026-01-17 23:56:59 +01:00
Zvonko Kaiser	428cc5d586	gpu: Chroot Cleanup With the newest NVRC we do not need the supported GPUs anymore. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-17 19:27:24 +01:00
Fabiano Fidêncio	1c154b4c15	kernel: Add DAX fix for arm64 The patch has been provided upstream by Seunguk Shin and is already approved. We'll drop it once it becomes available in the LTS tree. Reference: https://lore.kernel.org/all/18af3213-6c46-4611-ba75-da5be5a1c9b0@arm.coum Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-17 19:15:53 +01:00
Fabiano Fidêncio	33b1f0786e	Revert "arm64: Do not use DAX with the rootfs image" This reverts commit `2acb94ef2d`, as we have a kernel patch approved fixing the issue. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-17 19:15:53 +01:00
Alex Lyn	fe15f2fa47	runtime-rs: Remove deprecated virtio-9p virtio-9p has not been supported for a long time, and runtime-rs in particular has no plan to support it. Removing the related items is reasonable. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	b7cfc6fd72	runtime-rs: Remove mem-agent section from TDX/SNP configurations As Memory Agent feature is not used within CoCo(TDX/SNP) scenarios, with this fact, it's better to just remove the related sections. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	634ec2b56d	runtime-rs: Add configurable SNP items in Makefile when make build This introduces Makefile items that enable AMD SEV-SNP settings in the configuration when running make build, and makes it possible to generate the rendered qemu-snp-runtime-rs configuration based on the *.in template. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	0abdb8e016	runtime-rs: Introduce a qemu-runtime-rs/SEV-SNP dedicated configuration To make it work well on the SEV-SNP platforms for qemu-runtime-rs with coco, a dedicated SEV-SNP configuration should be introduced to help prepare related CVM resources. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	b0a82f7bb8	runtime-rs: Enable measured rootfs within configuration when make build Enable measured rootfs within configuration when make build. And add some other important items to make the configuration work well. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	3799855040	runtime-rs: Add configurable TDX items in Makefile when make build This introduces Makefile items that enable Intel TDX settings in the configuration when running make build, and makes it possible to generate the rendered qemu-tdx-runtime-rs configuration based on the *.in template. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Alex Lyn	4d55e2c8c8	runtime-rs: Introduce a dedicated configuration for qemu-runtime-rs/TDX To make it work well on the TDX platforms for qemu-runtime-rs with coco, a dedicated TDX configuration should be introduced to help prepare related CVM resources. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-17 18:52:57 +01:00
Manuel Huber	956f43c6c6	runtime: skip MoveTo for systemd cgroups Systemd-managed cgroups use the slice:prefix:name format, which is not a filesystem path. Calling MoveTo() on such paths fails with "invalid group path" and can abort cleanup before Delete() runs. In some cases, this causes pod teardown delays. Skip MoveTo for systemd-formatted sandbox/overhead cgroup paths when sandbox_cgroup_only is true; systemd moves tasks on unit deletion. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-16 16:41:38 +01:00
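The decision described in `956f43c6c6` can be sketched as below. This is a minimal Python illustration, not the runtime's Go code: the function names `is_systemd_cgroup_path` and `should_move_to` are hypothetical, and the detection rule (three non-empty colon-separated fields, not an absolute path) is an assumption derived only from the slice:prefix:name format named in the commit message.

```python
def is_systemd_cgroup_path(path: str) -> bool:
    """Return True if path looks like systemd's slice:prefix:name form.

    Assumption: exactly three non-empty fields separated by ':' and
    no leading '/', since a filesystem cgroup path is absolute.
    """
    parts = path.split(":")
    return len(parts) == 3 and all(parts) and not path.startswith("/")


def should_move_to(path: str, sandbox_cgroup_only: bool) -> bool:
    """Decide whether the (hypothetical) MoveTo step should run.

    Skipped only for systemd-formatted paths when sandbox_cgroup_only
    is true, mirroring the condition stated in the commit message.
    """
    return not (sandbox_cgroup_only and is_systemd_cgroup_path(path))
```

With this rule, a path such as `system.slice:kata:sandbox` is skipped when `sandbox_cgroup_only` is true, while `/kubepods/pod1` still goes through the move step.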
Manuel Huber	6b70923e55	docs: Update NVIDIA GPU passthrough QEMU scenario With the update of NVRC to v0.1.1, cold-plug becomes by design the only supported mode, so resolve the references to hot-plug. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-16 13:50:10 +01:00
Steve Horsman	610a8bdfd5	Merge pull request #12346 from Amulyam24/ppc64le-payload ci: move the job publish kata payload after push to an alternate runner for ppc64le	2026-01-16 11:41:53 +00:00
Fabiano Fidêncio	ea18f543b4	tests: kata-deploy: Enable verification during helm install Enable post-install verification in kata-deploy CI tests. When HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created that runs with the Kata runtime to confirm deployment succeeded. The verification pod prints kernel info and exits - success indicates the Kata runtime is properly configured and functional. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-16 10:52:43 +01:00
Fabiano Fidêncio	a188f04d75	kata-deploy: helm: Add optional post-install verification Add optional verification that runs after kata-deploy installation. When a pod spec is provided via --set-file verification.pod=<file>, a verification job runs after install/upgrade to validate deployment. The user is fully responsible for the verification pod content: - Pod name, runtimeClassName, annotations, and verification logic - Pod must exit 0 on success, non-zero on failure The verification job simply: 1. Waits for kata-deploy DaemonSet to be ready 2. Applies the user-provided pod spec 3. Waits for the pod to complete 4. Shows logs and cleans up Usage: helm install kata-deploy ... \ --set-file verification.pod=/path/to/your-pod.yaml Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-16 10:52:43 +01:00
Amulyam24	859313d904	ci: move the job payload after push to an alternate runner for ppc64le To unlock the release, move the job to publish kata payload after push to an alternate runner(IBM owned) for ppc64le. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2026-01-16 11:14:42 +05:30
Alex Lyn	c0cca81993	runtime-rs: Set default_bridges with 0 for dragonball vmm As Dragonball VMM does not support PCI hotplug options, it should be set 0. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-15 20:32:15 +01:00
Alex Lyn	1a76d44e16	kata-types: Change the default bridges to 1 It aims to align it with the Makefile and configuration's setting. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-15 20:32:15 +01:00
Alex Lyn	6375b3881d	runtime-rs: Set the default bridges with default 1 As runtime-go use the default bridges with 1, it should be kept as 1 to avoid alignment issues. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-15 20:32:15 +01:00
Alex Lyn	8728b262fb	Merge pull request #12338 from zvonkok/nvrc-update gpu: Bump NVRC Version	2026-01-15 19:36:07 +08:00
Zvonko Kaiser	adce41c432	gpu: Bump NVRC Version The new NVRC version works for CC and non-CC use cases, no --feature confidential needed anymore. Bump versions.yaml and adjust deployment instructions. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-15 01:51:10 +00:00
Manuel Huber	6753c3ac08	runtime: nvidia: Disable NVDIMM Disable NVDIMM. When using GPU passthrough, using NVDIMM would create a r/o file-backed memory region. When using a GPU, QEMU tries to DMA- map guest memory for the device, resulting in a mapping error: memory listener initialization failed: Region mem0: vfio_container_dma_map ... -22 (Invalid argument). For the CC configs, NVDIMM is disabled by default in qemu_amd64.go with a warning, but we also explicitly disable the setting in the shim configuration file. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-14 22:51:07 +01:00
Fabiano Fidêncio	a9dda0e52b	versions: nvidia: Bump kernel to the latest LTS As now that we have the decoupled rootfs / kernel, doing the bump becomes trivial. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 20:45:54 +01:00
Fabiano Fidêncio	4e99860fd2	workflows: nvidia: Adjust to kernel / rootfs build decouple We don't need to store the kernel headers anymore. We do need to store the kernel modules, instead. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	02d2b6bdf2	kernel: bump kata_config_version We have kernel build changes, so bump the config version. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	a075c3740a	gpu: build_image.sh use versions.yaml We've done some bad file based driver determination, now with versions.yaml there is a single source of truth. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	ffc8725164	gpu: rootfs update decoupling Remove all the driver build instructions, since those are now done in the kernel target. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	cca973772d	gpu: deploy modules for kernel build We need to package the build modules for the rootfs to be able to consume it. We package the whole /lib/modules/$(uname -r) directory strip=2. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	13ed3cdff9	gpu: Add NVIDIA modules to build-kernel.sh Checkout and build the kernel modules along with the kernel to avoid the kernel rootfs dependency. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	2a11910acb	gpu: Remove building of Headers Since we build along the kernel we do not need to carry over the headers to the rootfs build. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	b1870fef07	gpu: versions.yaml nvidia driver pinning We want to have deterministic behaviour and only one valid driver version acceptable via versions.yaml Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Zvonko Kaiser	229481b348	kernel: bugfix install yq We actually never installed yq to the kernel build, there are some path that use yq but were never hit, for the GPU use-case we need to read values from versions.yaml Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-01-14 20:45:54 +01:00
Steve Horsman	6db3a4cf8d	Merge pull request #12333 from fitzthum/bump-v0180 Update Trustee and guest-components for upcoming releases	2026-01-14 19:44:55 +00:00
Tobin Feldman-Fitzthum	ca29e68acb	agent-ctl: bump image-rs version In preparation for coco v0.18.0, bump the version of image-rs we use in agent-ctl to match what we have in versions.yaml. Drop the snapshotter-overlayfs feature. This was dropped from image-rs when we removed enclave-cc support. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-01-14 06:54:29 -08:00
Tobin Feldman-Fitzthum	25a08ef739	versions: bump Trustee and guest-components Before cutting the Kata release that will be used with CoCo v0.18.0, let's bump the versions of Trustee and guest-components to latest. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-01-14 06:43:30 -08:00
Steve Horsman	0f5f914a04	Merge pull request #12330 from LandonTClipp/docs_improvement docs: Navigation improvements and bug fixes to Pages	2026-01-14 14:13:29 +00:00
stevenhorsman	70e3e2b0c9	genpolicy: Bump openssl-src This is a vulnerability (CVE-2025-9230) in openssl, so move to 3.5.4 which has a fix for this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-14 14:05:48 +01:00
stevenhorsman	aace7a7336	versions: Bump openssl-src This is a vulnerability (CVE-2025-9230) in openssl, so move to 3.5.4 which has a fix for this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-14 14:05:48 +01:00
Fabiano Fidêncio	2acb94ef2d	arm64: Do not use DAX with the rootfs image Kernel 6.18.x has an issue with DAX, which is not yet fixed upstream: ``` [ 0.737679] EXT4-fs (pmem0p1): mounted filesystem 79676804-7c8b-491a-b2a6-9bae3c72af70 ro with ordered data mode. Quota mode: disabled. [ 0.737891] VFS: Mounted root (ext4 filesystem) readonly on device 259:1. [ 0.739119] devtmpfs: mounted [ 0.739476] Freeing unused kernel memory: 1920K [ 0.740156] Run /sbin/init as init process [ 0.740229] with arguments: [ 0.740286] /sbin/init [ 0.740321] with environment: [ 0.740369] HOME=/ [ 0.740400] TERM=linux [ 0.743162] Unable to handle kernel paging request at virtual address fffffdffbf000008 [ 0.743285] Mem abort info: [ 0.743316] ESR = 0x0000000096000006 [ 0.743371] EC = 0x25: DABT (current EL), IL = 32 bits [ 0.743444] SET = 0, FnV = 0 [ 0.743489] EA = 0, S1PTW = 0 [ 0.743545] FSC = 0x06: level 2 translation fault [ 0.743610] Data abort info: [ 0.743656] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000 [ 0.743720] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 0.743785] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 0.743848] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000b9d17000 [ 0.743931] [fffffdffbf000008] pgd=10000000bfa3d403, p4d=10000000bfa3d403, pud=1000000040bfe403, pmd=0000000000000000 [ 0.744070] Internal error: Oops: 0000000096000006 [#1] SMP [ 0.748888] CPU: 0 UID: 0 PID: 1 Comm: init Not tainted 6.18.4 #1 NONE [ 0.749421] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.749969] pc : dax_disassociate_entry.constprop.0+0x20/0x50 [ 0.750444] lr : dax_insert_entry+0xcc/0x408 [ 0.750802] sp : ffff80008000b9e0 [ 0.751083] x29: ffff80008000b9e0 x28: 0000000000000000 x27: 0000000000000000 [ 0.751682] x26: 0000000001963d01 x25: ffff0000004f7d90 x24: 0000000000000000 [ 0.752264] x23: 0000000000000000 x22: ffff80008000bcc8 x21: 0000000000000011 [ 0.752836] x20: ffff80008000ba90 x19: 0000000001963d01 x18: 0000000000000000 [ 0.753407] x17: 0000000000000000 x16: 
0000000000000000 x15: 0000000000000000 [ 0.753970] x14: ffffbf3154b9ae70 x13: 0000000000000000 x12: ffffbf3154b9ae70 [ 0.754548] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000000000000000 [ 0.755122] x8 : 000000000000000d x7 : 000000000000001f x6 : 0000000000000000 [ 0.755707] x5 : 0000000000000000 x4 : 0000000000000000 x3 : fffffdffc0000000 [ 0.756287] x2 : 0000000000000008 x1 : 0000000040000000 x0 : fffffdffbf000000 [ 0.756871] Call trace: [ 0.757107] dax_disassociate_entry.constprop.0+0x20/0x50 (P) [ 0.757592] dax_iomap_pte_fault+0x4fc/0x808 [ 0.757951] dax_iomap_fault+0x28/0x30 [ 0.758258] ext4_dax_huge_fault+0x80/0x2dc [ 0.758594] ext4_dax_fault+0x10/0x3c [ 0.758892] __do_fault+0x38/0x12c [ 0.759175] __handle_mm_fault+0x530/0xcf0 [ 0.759518] handle_mm_fault+0xe4/0x230 [ 0.759833] do_page_fault+0x17c/0x4dc [ 0.760144] do_translation_fault+0x30/0x38 [ 0.760483] do_mem_abort+0x40/0x8c [ 0.760771] el0_ia+0x4c/0x170 [ 0.761032] el0t_64_sync_handler+0xd8/0xdc [ 0.761371] el0t_64_sync+0x168/0x16c [ 0.761677] Code: f9453021 f2dfbfe3 cb813080 8b001860 (f9400401) [ 0.762168] ---[ end trace 0000000000000000 ]--- [ 0.762550] note: init[1] exited with irqs disabled [ 0.762631] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ``` For now, we limit the rootfs that we ship to ARM64 to not use DAX, in the future we'll re-enable it as soon as the patch lands on mainstream kernel. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
Fabiano Fidêncio	3ef99f4ee3	versions: Add specific nvidia kernel version This is needed as the 580 driver doesn't build against 6.18.x, and the 590 driver is not yet fully working for our case, thus we stick to the previous version that worked before. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
Fabiano Fidêncio	cce5d4abf6	kernel: bump to v6.18.x (LTS) Bump both the kernel and kernel-confidential versions from v6.12.x and v6.16.x to v6.18.4, aligning with the new LTS release. Kernel 6.18 introduced several configuration changes that required updates to our kernel config fragments: * CRYPTO_FIPS dependencies changed: - In 6.12: depended on !CRYPTO_MANAGER_DISABLE_TESTS - In 6.18: now depends on CRYPTO_SELFTESTS (which requires EXPERT) Added CONFIG_EXPERT=y and CONFIG_CRYPTO_SELFTESTS=y to crypto.conf to satisfy the new dependency chain. * CONFIG_EXPERT is a naughty one, as it disables / enables a bunch of things behind one's back, probably just to prove a point that it is for experts ;-) ... regardless, a reasonable amount of options had to be re-added in order to make sure nothing ends up broken. * Legacy iptables support: Kernel 6.18 requires explicit legacy xtables/iptables configs for IP_NF_* options. Added CONFIG_NETFILTER_XTABLES_LEGACY, CONFIG_IP_NF_IPTABLES_LEGACY, and CONFIG_IP6_NF_IPTABLES_LEGACY to netfilter.conf. * Module signing dependencies: Added CONFIG_MODULES=y and other required dependencies to module_signing.conf to ensure MODULE_SIG can be properly enabled. * Whitelist updates: - Added CONFIG_NF_CT_PROTO_DCCP (removed in 6.18+) - Added CONFIG_CRYPTO_SELFTESTS, CONFIG_NETFILTER_XTABLES_LEGACY, CONFIG_IP_NF_IPTABLES_LEGACY, CONFIG_IP6_NF_IPTABLES_LEGACY (added in 6.18+, not present in older kernels like 6.12) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-14 11:46:40 +01:00
LandonTClipp	197231456f	docs: Navigation improvements and bug fixes to Pages A few minor changes to the Zensical config that makes navigation easier. Also fixed a couple of bugs with local serving and added some quality of life features to Zensical. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-01-13 11:17:58 -06:00
LandonTClipp	94fde1356c	docs: Add Zensical Doc Site Generation This commit adds a Github workflow for building a Github Pages site for the markdown files in the docs/ directory. Zensical is a new markdown-based static site generation framework built by the creators of Material for Mkdocs. https://zensical.org/ This commit does not clean the doc structure, so site navigation is initially going to be messy. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2026-01-13 12:42:02 +01:00
dependabot[bot]	2edb161c53	build(deps): bump github.com/urfave/cli in /src/runtime Bumps [github.com/urfave/cli](https://github.com/urfave/cli) from 1.22.14 to 1.22.17. - [Release notes](https://github.com/urfave/cli/releases) - [Changelog](https://github.com/urfave/cli/blob/main/docs/CHANGELOG.md) - [Commits](https://github.com/urfave/cli/compare/v1.22.14...v1.22.17) --- updated-dependencies: - dependency-name: github.com/urfave/cli dependency-version: 1.22.17 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2026-01-13 09:04:41 +00:00
dependabot[bot]	3377d729ea	build(deps): bump rsa from 0.9.6 to 0.9.9 in /src/tools/agent-ctl Bumps [rsa](https://github.com/RustCrypto/RSA) from 0.9.6 to 0.9.9. - [Changelog](https://github.com/RustCrypto/RSA/blob/v0.9.9/CHANGELOG.md) - [Commits](https://github.com/RustCrypto/RSA/compare/v0.9.6...v0.9.9) --- updated-dependencies: - dependency-name: rsa dependency-version: 0.9.9 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2026-01-13 04:08:40 +01:00
Fupan Li	1f1a000608	Merge pull request #12291 from Apokleos/bump-qapi runtime-rs: Bump qapi-rs from 0.14 to 0.15	2026-01-13 10:39:41 +08:00
Manuel Huber	9e30283952	runtime: nvidia: change kernel parameters Remove the agent hotplug timeout parameter from the kernel command line. Having shifted to VFIO cold-plug, this parameter is no longer needed. Remove the no longer required parameter for TDX and thus align the SNP and TDX configurations. Add a parameter to prevent the kernel from mounting the /dev tmpfs. NVRC and later on kata-agent attempt this. While kata-agent does not panic when mounting /dev fails, NVRC makes mounting /dev a hard requirement. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-12 16:11:28 -08:00
dependabot[bot]	bcadb9b231	build(deps): bump sequoia-openpgp in /src/tools/agent-ctl Bumps [sequoia-openpgp](https://gitlab.com/sequoia-pgp/sequoia) from 2.0.0 to 2.1.0. - [Commits](https://gitlab.com/sequoia-pgp/sequoia/compare/openpgp/v2.0.0...openpgp/v2.1.0) --- updated-dependencies: - dependency-name: sequoia-openpgp dependency-version: 2.1.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2026-01-12 22:16:51 +01:00
Alex Lyn	fba92880c9	tests: make set_container_command idempotent and add debug output set_container_command() previously appended command arguments one-by-one with '.command += [...]'. This makes the helper non-idempotent and can lead to unexpected command arrays when invoked multiple times. Update the helper to set the full command array in a single yq v4 expression and print the target YAML path plus the command being applied to simplify debugging when tests fail. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
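The difference between the two behaviours described in `fba92880c9` can be illustrated as below. This is a hedged Python sketch, not the bash/yq helper itself: a plain dict stands in for the pod YAML, and `append_command` and `set_command` are hypothetical names for the old `.command += [...]` style and the new single-assignment style.

```python
def append_command(container: dict, args: list) -> None:
    """Old style: extend the existing list, so repeated calls accumulate."""
    container.setdefault("command", []).extend(args)


def set_command(container: dict, args: list) -> None:
    """New style: replace the whole list, so repeated calls are idempotent."""
    container["command"] = list(args)


old = {}
append_command(old, ["sh", "-c", "sleep 30"])
append_command(old, ["sh", "-c", "sleep 30"])  # second call duplicates the arguments

new = {}
set_command(new, ["sh", "-c", "sleep 30"])
set_command(new, ["sh", "-c", "sleep 30"])  # second call leaves the result unchanged
```

After both calls, `old["command"]` holds six entries while `new["command"]` still holds three, which is the property the commit relies on.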
Alex Lyn	38296a41b2	tests: Generate pod config with stable .yaml suffix The pod config file created by new_pod_config() was generated via mktemp using the template "pod-config.yaml.in.XXX", which produces filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC). If the random suffix happens to look like another extension, such as ".Csv" or ".Xml", the subsequent yq operations will fail. Some helpers and tooling assume the config path ends with ".yaml". Switch the mktemp template to place the random suffix before the extension so the returned path always ends with ".yaml". Fixes: #12268, #12319 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 17:56:28 +01:00
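The idea in `38296a41b2` of keeping the random part before the extension can be sketched as below. This is a Python illustration using the standard `tempfile` module rather than the test suite's actual mktemp call; the function name `new_pod_config_path` and the `pod-config.` prefix are assumptions chosen for the example.

```python
import os
import tempfile


def new_pod_config_path(directory=None) -> str:
    """Create an empty temp file whose name always ends with '.yaml'.

    The random component is generated between the fixed prefix and the
    fixed suffix, so tools that dispatch on the extension see '.yaml'.
    """
    fd, path = tempfile.mkstemp(prefix="pod-config.", suffix=".yaml", dir=directory)
    os.close(fd)  # mkstemp returns an open descriptor; close it for later writers
    return path
```

The caller is responsible for removing the file once the test is done.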
Fabiano Fidêncio	9fec31f400	tools: kubectl: Add kubectl version as a tag This is a suggestion from Choi, so we can easily test with a specific kubectl version and also easily understand which kubectl version is being used in case of failure. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-12 15:48:44 +01:00
Fabiano Fidêncio	26dfcb627b	tools: Build kubectl image This image will be used by our helm charts to verify that a kata-containers deployment is correct. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-12 15:48:44 +01:00
Alex Lyn	d03eccf567	runtime-rs: Improve wait_for_migration to avoid fixed sleep Enhance the wait_for_migration implementation to reliably wait for QEMU migration completion and avoid the previous `sleep(280ms)` delay. (1) Add an initial fast-path query to return immediately if migration is already completed/failed/cancelled. (2) Use a hard deadline to enforce timeouts deterministically. (3) Implement adaptive polling with backoff and a maximum interval to reduce QMP load while keeping responsiveness. (4) Unify migration status handling and return clear errors on failed/cancelled states. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 20:06:55 +08:00
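The four points listed in `d03eccf567` can be sketched as a polling loop. This is a simplified, synchronous Python sketch and not the runtime-rs implementation: `query_status` is a hypothetical callable standing in for the QMP query, the default intervals are illustrative, and the status strings follow the states named in the commit message.

```python
import time


def wait_for_migration(query_status, timeout_s=10.0, initial_interval_s=0.01,
                       max_interval_s=0.5, sleep=time.sleep, clock=time.monotonic):
    """Poll until migration completes, fails, is cancelled, or times out."""
    deadline = clock() + timeout_s          # (2) hard deadline
    interval = initial_interval_s
    while True:
        # The first pass through the loop is the fast path (1):
        # no sleep happens before the initial query.
        status = query_status()
        if status == "completed":
            return
        if status in ("failed", "cancelled"):
            # (4) terminal error states produce a clear error
            raise RuntimeError(f"migration {status}")
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"migration still '{status}' after {timeout_s}s")
        # (3) adaptive polling: never sleep past the deadline, double the
        # interval after each poll, and cap it at max_interval_s.
        sleep(min(interval, remaining))
        interval = min(interval * 2, max_interval_s)
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch only; it lets the backoff schedule be checked without real delays.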
Alex Lyn	5026b33455	runtime-rs: Introduce a method to detect current migrate info Return information about current migration process. And the input and output as below: { 'command': 'query-migrate', 'returns': 'MigrationInfo' } But note that the Qemu API is valid within qapi-rs(v0.15+) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 20:06:55 +08:00
Alex Lyn	c472b5db54	runtime-rs: Bump qapi-rs from 0.14 to 0.15 The updated versions are as follows: ``` qapi = { version = "0.15", features = ["qmp", "async-tokio-all"] } qapi-spec = "0.3.2" qapi-qmp = "0.15.0" ``` and it will correct some corresponding structures. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-12 20:06:55 +08:00
Manuel Huber	183507beeb	agent: change secure_storage_integrity default Change the secure_storage_integrity option's default value to true. With this, integrity protection for encrypted block device contents will be requested from the confidential data hub by default, see the agent's cdh_handler_trusted_storage function in rpc.rs. This behavior can be disabled by explicitly setting the agent.secure_storage_integrity parameter to 0 or false via kernel command line parameters. This will affect the trusted storage implementation for the guest-pull mechanism, and it will affect future implementations using this code path, such as implementations for ephemeral secure storage. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-10 16:54:03 +01:00
stevenhorsman	a0d96256f5	packaging: Fix tools permissions issue In some builds we are seeing: ``` error: could not create temp file /opt/rustup/tmp/r2xu46kwuyc7k2kr_file: Permission denied (os error 13) ``` in the agent-ctl build, so try and port a fix from #12313 to the tools build to try and resolve this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-09 21:45:26 +01:00
Federico A. Corazza	787768fe9b	kata-deploy: Fix extraction of the containerd major version Fixes deploying kata-containers using k3s. The deploy script fails with /opt/kata-artifacts/scripts/kata-deploy.sh: line 397: [: too many arguments Signed-off-by: Federico A. Corazza <git@facorazza.com>	2026-01-09 19:52:18 +01:00
stevenhorsman	5067ed7d9a	versions.yaml: Fix formatting errors yamllint complains that there is only one space before the comment, so add a second to prevent this annoying message showing up. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-09 19:36:31 +01:00
stevenhorsman	a850f66fc4	versions: Bump rust to 1.89 Following the agreed toolchain policy - bump rust to the current (1.91)-2 releases. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-09 19:36:31 +01:00
Manuel Huber	df2896c298	docs: Create NVIDIA GPU passthrough QEMU scenario Create a new page for a reference implementation for Kubernetes using QEMU, the go shim and an NVIDIA rootfs. The new page contains information on: - components involved in the NVIDIA (TEE) GPU scenario - orchestration flow for GPU passthrough scenarios - deployment guidance Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-09 19:02:56 +01:00
Manuel Huber	43627805f4	docs: Improve structure and flow of NVIDIA guide - Apply a few structural/grouping changes and improve flow - Group build sections together - Move usage examples to last section Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-09 19:02:56 +01:00
Steve Horsman	489deaad17	Merge pull request #12297 from manuelh-dev/mahuber/fix-doc docs: Fix trusted-image-storage reference	2026-01-09 15:22:25 +00:00
Hyounggyu Choi	2962e14c10	virtiofsd: fix RUSTUP_HOME and CARGO_HOME permissions for non-root builds The following error was observed during virtiofsd static build: ``` error: could not create temp file /opt/rustup/tmp/p44enysfaxwdbvw4_file: Permission denied (os error 13) ``` This occurs because RUSTUP_HOME and CARGO_HOME were initialized by the root user during `docker build`, but `cargo build` is executed as a non-root user via 'docker run --user'. Ensure these directories are writable by adjusting the permission after the toolchain installation is complete. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2026-01-09 14:01:20 +01:00
Manuel Huber	65aa99f291	docs: Fix trusted-image-storage reference The sample uses a volume device name which does not exist, hence fix. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-01-09 11:41:18 +00:00
Saul Paredes	02979a13e3	Merge pull request #12208 from romoh/patch-1 ci: Update AKS setup post Pod Sandboxing GA	2026-01-08 11:02:05 -08:00
Fabiano Fidêncio	f8318c0542	kata-deploy: Remove unused dependency We're depending solely on toml_edit, thus we can safely remove the toml dependency. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-08 18:58:11 +01:00
Fupan Li	b3546f3a68	Merge pull request #12282 from kata-containers/set-required-ci Set several tests as required ci	2026-01-08 20:34:39 +08:00
Mikko Ylinen	cc6277b735	Revert "tdx: Update GPU config for the latest TDX stack" Prefer the "full feature TDVF" instead of the generic OVMF build. See Option-B in https://github.com/tianocore/edk2/tree/master/OvmfPkg/IntelTdx#configurations-and-features for the extra hardening supported. FIRMWAREPATH_NV also seems to be TDX specific unlike the Makefile suggests. Therefore, it can be dropped completely. This reverts commit `66ccc25724`.	2026-01-08 10:21:47 +01:00
Mikko Ylinen	e02e226431	packaging: build OVMF for Intel TDX again OVMF build for Intel TDX (aka "TDVF") was disabled in favor of Ubuntu/ CentOS pre-upstream releases of Intel TDX. See `4292c4c3b1`. It's time to re-enable the build and move runtime configurations to use it (the latter will be done in a later commit). This is a partial revert of `4292c4c3b` with the following changes: - Stop calling OVMF for Intel TDX "TDVF" and follow the naming distros use for TDX enabled build: OVMF.inteltdx.fd. - Single binary OVMF.inteltdx.fd is supported using -bios QEMU param. - Secure Boot infrastructure is disabled since Kata does not support it. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-01-08 10:21:47 +01:00
Alex Lyn	f3d92a8b4a	dragonball: Fix UT failed in test_fs_manipulate_backend_fs Improve the checking logic for source path existing. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 12:42:00 +08:00
Alex Lyn	7de968b416	dragonball: Fix warning of unused method Actually this method is indeed called, just add attribute of `#[allow(dead_code)]` to allow UT pass. And the warning looks like: warning: method `send_message_with_payload` is never used \| 224 \| impl<R: Req> Endpoint<R> { \| ------------------------ method in this implementation ... 522 \| pub fn send_message_with_payload<T: Sized, P: Sized>( \| ^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: `#[warn(dead_code)]` on by default Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 11:01:34 +08:00
Alex Lyn	36d3d7c3bf	dragonball: Fix warnings of result to be handled warning: unused `std::result::Result` that must be used --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:679:9 \| 679 \| / VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config( 680 \| \| &mut dev, 0, &config, 681 \| \| ); \| \|_________^ \| = note: this `Result` may be an `Err` variant, which should be handled = note: `#[warn(unused_must_use)]` on by default help: use `let _ = ...` to ignore the resulting value \| 679 \| let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config( \| +++++++ warning: unused `std::result::Result` that must be used --> src/dragonball/dbs_virtio_devices/src/vhost/vhost_user/net.rs:683:9 \| 683 \| / VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config( 684 \| \| &mut dev, 0, &mut data, 685 \| \| ); \| \|_________^ \| = note: this `Result` may be an `Err` variant, which should be handled help: use `let _ = ...` to ignore the resulting value \| 683 \| let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config( \| +++++++ Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 10:52:19 +08:00
Alex Lyn	6a1b25a4b0	dragonball: Fix warning of variable does not need to be mutable The warning looks like: ... warning: variable does not need to be mutable --> src/dragonball/dbs_virtio_devices/src/vsock/csm/txbuf.rs:217:13 \| 217 \| let mut tmp: Vec<u8> = vec![0; TxBuf::SIZE - 2]; \| ----^^^ \| \| \| help: remove this `mut` \| = note: `#[warn(unused_mut)]` on by default ... Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 10:44:25 +08:00
Alex Lyn	064271b9cb	dragonball: Fix unexpected `cfg` condition of test-resources Fix the warnings about unexpected cfg of test-resources, and the detailed warning message looks like as below: ... warning: unexpected `cfg` condition value: `test-resources` --> src/dragonball/dbs_virtio_devices/src/fs/device.rs:973:11 \| 973 \| #[cfg(feature = "test-resources")] \| ^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: expected values for `feature` are: `fuse-backend-rs`, `vhost`, `vhost-net`, `vhost-rs`, `vhost-user`, `vhost-user-blk`, `vhost-user-fs`, `vhost-user-net`, `virtio-balloon`, `virtio-blk`, `virtio-fs`, `virtio-fs-pro`, `virtio-mem`, `virtio-mmio`, `virtio-net`, and `virtio-vsock` = help: consider adding `test-resources` as a feature in `Cargo.toml` ... Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 10:39:33 +08:00
Alex Lyn	ef36c47ca4	runtime-rs: Fix deprecated method in UT Remove into_path() and replace it with keep(). Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 10:32:31 +08:00
Alex Lyn	e4451baa84	tests: Set run-nerdctl-tests with qemu-runtime-rs required run-nerdctl-tests (qemu-runtime-rs) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Alex Lyn	56a21c33a3	tests: Set stability tests with qemu-runtime-rs required run-containerd-stability (active, qemu-runtime-rs) run-containerd-stability (lts, qemu-runtime-rs) Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Alex Lyn	679e31d884	tests: Set run-nydus CIs as required run-basic-amd64-tests / run-nydus Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-01-08 09:56:50 +08:00
Fabiano Fidêncio	6b3953dd51	tests: k8s: liveness-probes: Adjust events grep Till k8s 1.34 we could grep by "Started containerd". From k8s 1.35 onwards the event message changed and we should, instead, grep by "Container started". Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-07 23:01:59 +01:00
Fabiano Fidêncio	c4194538e2	versions: Bump QEMU to v10.2.0 QEMU v10.2.0 was released on December 24th, 2025. The experimental GPU SNP / TDX are also pointing to v10.2.0 release with their gpu-{snp,tdx}-20260107 branch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-01-07 12:30:55 +01:00
Steve Horsman	93ad6fde75	Merge pull request #12294 from stevenhorsman/remediate-RUSTSEC-2021-0064 versions: Bump sha2 crate version	2026-01-07 09:53:26 +00:00
stevenhorsman	c456b84537	versions: Bump sha2 crate version sha2 0.9.3 includes the use of cpuid-bool, which was renamed to cpufeatures around 5 years ago. Try moving to a workspace dependency of sha2 and bumping to the latest version to remediate RUSTSEC-2021-0064 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-01-06 15:41:34 +00:00
Roaa Sakr	44c79cf14a	ci: Update AKS setup post Pod Sandboxing GA Update workload-runtime value to align with current AKS Pod Sandboxing documentation post GA. Signed-off-by: Roaa Sakr <romoh@microsoft.com>	2026-01-05 13:47:33 -08:00
Steve Horsman	9463dd970e	Merge pull request #12287 from mythi/drop-qat use-cases: drop Intel QuickAssist instructions	2026-01-05 13:28:16 +00:00
Mikko Ylinen	99bc0f49cc	use-cases: drop Intel QuickAssist instructions While the use-case of Intel QuickAssist (QAT) accelerated crypto and/or compression with k8s and Kata Containers is still valid, the setup instructions are outdated: Starting with Intel Xeon Gen4 (Sapphire Rapids), the QAT driver stack moved to in-tree drivers without a separate SR-IOV VF driver. Drop all the setup instructions but keep the use-cases doc for reference. Users wanting to enable the use-case should consult with Intel QAT Device plugins or Intel QAT DRA driver authors. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-01-02 12:14:04 +02:00
Fupan Li	b27a80b800	Merge pull request #12156 from Apokleos/required-coco-dev-rs tests: Make the tests coco-dev job with coco-dev-runtime-rs required	2025-12-25 17:30:40 +08:00
Steve Horsman	bdc5f7d4be	Merge pull request #12271 from stevenhorsman/bump-rust-to-1.88 Bump rust to 1.88	2025-12-23 21:38:42 +00:00
Alex Lyn	0b1a5c6e93	tests: Make the tests coco-dev job with coco-dev-runtime-rs required The nontee job (run-k8s-tests-coco-nontee) for qemu-coco-dev-runtime-rs is running well and it's time to make it required when the CI runs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-23 09:54:52 +08:00
stevenhorsman	b6108a7c4a	dragonball: Fix manual implementation of .is_multiple_of Use this new method to avoid the clippy warning and increase readability Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	55be31ef0f	runtime-rs: Fix manual implementation of .is_multiple_of Use this new method to avoid the clippy warning and increase readability Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	1d139a7c92	versions: Bump rust to 1.88 In prep for the bump to rust 1.90, try bumping to 1.88 first to see if the CI is successful here Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	c6053e976f	dragonball: Improve vector initialisation Directly initialise a zero-filled vector, rather than resizing later Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	18a51dad98	dragonball: Fix manual slice size calculation Using the built in size_of_val is easier to read and less error-prone than doing this calculation manually Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	188c9e6eb7	dragonball: Prefer from over into `From` gives `Into` for free, so prefer this method Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	c7daa12fe6	dragonball: Remove unnecessary cast Don't cast usize to usize Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	6c19bd01c8	dragonball: Fix redundant pattern matching Convert `matches!(desc, None)` to desc.is_none() which is simpler Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	15c6ef5988	dragonball: Fix `deprecated cargo-clippy` cfg #[cfg(feature = "cargo-clippy")] has been deprecated for years, so should be replaced with `#[cfg(clippy)]` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	e0d09dd787	dragonball: Fix useless use of `vec!` `vec![...]` is the same as `[...]`, so remove it to clean up code Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	4fb90d61aa	dragonball: Temporarily skip kvm bindgen tests There are many, many null pointer dereferences in the bindgen code when moving between rust 1.85.1 and 1.86 and no docs of the source that it was generated from, so try and skip these tests from running until an SME can look at them @lifupan Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:19 +00:00
stevenhorsman	04306c162b	genpolicy: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:11 +00:00
stevenhorsman	b9ce0bbdf8	trace-forwarder: Fix uninlined_format_args in examples Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:11 +00:00
stevenhorsman	c5f0acef23	kata-ctl: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:02 +00:00
stevenhorsman	aff3524420	kata-ctl: Refresh runtime-rs crates runtime-rs crates are pulled into kata-ctl and some of these have bumped recently, so update these in kata-ctl as well Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:50:01 +00:00
stevenhorsman	2caa62f753	agent-ctl: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:52 +00:00
stevenhorsman	6006b8350d	libs: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:45 +00:00
stevenhorsman	2fde31547a	runtime-rs: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:36 +00:00
stevenhorsman	a299338b6c	dragonball: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:27 +00:00
stevenhorsman	e44c4d901f	doc: Fix uninlined_format_args in examples Clippy is recommending that format args are inlined for better clarity, so ensure our docs include this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:27 +00:00
stevenhorsman	b07899f8dc	agent: Fix uninlined_format_args Clippy is recommending that format args are inlined for better clarity, so update our code to remove these warnings Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-22 19:49:17 +00:00
stevenhorsman	2af88dbb48	agent: bump cdi-rs In #12151 the version was bumped in cargo.toml, but the update was not done, so run `cargo update -p container-device-interface` to apply it Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-20 10:08:45 +00:00
Steve Horsman	97603608ac	Merge pull request #12259 from RuoqingHe/filter-tests-requires-kvm dragonball: Skip tests require kvm while kvm is absent	2025-12-19 16:05:33 +00:00
Steve Horsman	81d74346f3	Merge pull request #12255 from stevenhorsman/bump-to-rust-1.90-prep Preparations for the rust 1.90 bump	2025-12-19 14:41:32 +00:00
Steve Horsman	b75cc16bad	Merge pull request #12272 from shwetha-s-poojary/revert_cleanup workflows: payload: do not remove AGENT_TOOLSDIRECTORY	2025-12-19 14:22:36 +00:00
shwetha-s-poojary	1929ca8879	workflows: payload: do not remove AGENT_TOOLSDIRECTORY Remove line that deletes $AGENT_TOOLSDIRECTORY Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>	2025-12-19 05:24:36 -08:00
Alex Lyn	b85084f046	Merge pull request #12266 from BbolroC/fix-selective-skip-for-empty-dir-test tests: remove re-declared local variable in k8s-empty-dirs.bats	2025-12-19 17:30:07 +08:00
Hyounggyu Choi	3fa1d93f85	tests: remove re-declared local variable in k8s-empty-dirs.bats Since #12204 was merged, the following error has been observed: ``` bats warning: Executed 1 instead of expected 2 tests [run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats ``` The cause is that `pod_logs_file` is re-declared as a local variable in the second test before skipping, which makes it inaccessible in `teardown()` and leads to an error. This commit removes the re-declaration of the variable. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-18 18:57:16 +01:00
Fabiano Fidêncio	51e9b7e9d1	nydus-snapshotter: Bump to v0.15.10 As it brings a fix that most likely can workaround the containerd / nydus-snapshotter databases desynchronization. Reference: https://github.com/containerd/nydus-snapshotter/pull/700 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 18:41:09 +01:00
Fabiano Fidêncio	03297edd3a	kata-deploy: rust: Add list verb for runtimeclasses RBAC The Rust kata-deploy binary calls list_runtimeclasses() during NFD setup, but the ClusterRole only granted get and patch permissions. Add the list verb to the runtimeclasses resource permissions to fix the RBAC error: runtimeclasses.node.k8s.io is forbidden: User \"system:serviceaccount:kube-system:kata-deploy-sa\" cannot list resource \"runtimeclasses\" in API group \"node.k8s.io\" at the cluster scope Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 18:31:52 +01:00
Ruoqing He	5fa663b1e3	dragonball: Skip tests requires KVM when KVM is absent KVM is not available in our ARM runners, let's skip those tests accordingly, while making the rest test cases remain tested on machines with KVM present and access to KVM device. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-18 14:17:46 +00:00
Ruoqing He	7cfb97d41b	libs: Introduce skip_if_kvm_unaccessable macro Some test cases require interaction with the KVM device; introduce the skip_if_kvm_unaccessable macro to skip them. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-18 12:43:20 +00:00
Manuel Huber	78c41b61f4	tests: nvidia: Update images, probes and timeouts Changes in NIM/RAG samples: - update image references - update memory requirements, timeouts, model name - sanitize some of the probes and print-out Further refinements can be made in the future. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00
Manuel Huber	0373428de4	tests: nvidia: Use secret for NGC API key This is a slight change in the manifest to at least use a secret for the environment variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-18 10:57:14 +01:00
Hyounggyu Choi	56ec8d7788	Merge pull request #12204 from kata-containers/runtime-rs-stability-debug CI: Upgrade log details for improved error analysis	2025-12-18 10:54:54 +01:00
Alex Lyn	c7dfdf71f5	Merge pull request #11935 from burgerdev/fsgroup genpolicy: support fsGroup setting in pod security context	2025-12-18 16:47:48 +08:00
stevenhorsman	e5568e65a1	lib: Fix missing copyright and license Add the copyright date from when the file was first submitted to github Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	175c2c70b1	dragonball: Fix pointer equality check Use `ptr::eq` to compare references by address rather than the values that they point to Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	a221eaa81d	dragonball: Fix length comparison to zero Replace .len() == 0 with .is_empty() for more clarity Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	e73a7c3717	dragonball: Replace manual div_ceil Use the more clear built-in method Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	048000654c	runtime-rs: Prevent doc test issue cargo test was trying to evaluate the documentation comment and failing, so try and make the comment explicitly text to avoid this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	4384b6ad9f	dragonball: Avoid manual implementation of ok Refactor to use `.ok()` rather than implementing it ourselves Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	f4dd69a835	dragonball: Remove unnecessary unwrap Given that we call `is_some` earlier, we don't then need to unwrap, so refactor to avoid this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	20192f819f	agent-ctl: Remove unnecessary unwrap Given that we call `is_some` earlier, we don't then need to unwrap, so refactor to avoid this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	9bf5f113f9	genpolicy: Allow dead_code A few structs in genpolicy are never constructed, so add `#[allow(dead_code)]` to prevent this clippy warning Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	ca1c0c853f	libs: Remove doc overindentation The doc comment had one space too many in its list, so the format was wrong Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	501b41cf8f	dragonball: Remove doc overindentation The doc comment had one space too many in its list, so the format was wrong Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	6a45ee0874	runtime-rs: Improve map iteration The key was never used, just the value, so just iterate over `.values()` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	2f49dffcd7	runtime-rs: Remove dead code `VmmPingResponse` and `NetInterworkingModel` are never constructed, so remove them Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	35557745b1	runtime-rs: Fix char_indices_as_byte_indices In unicode you can have multi-byte characters, so it's better to use char_indices than enumerate the bytes Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	69ca6c0de0	runtime-rs: Fix manual_contains Use contains to be more concise and efficient rather than manually implementing this check Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	0027f6cae0	agent: Fix dead_code warning VirtioBlkCcwDeviceHandler and VirtioBlkCcwHandler are only constructed on s390x, so add #[cfg(target_arch = "s390x")] to all the code Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:27 +00:00
stevenhorsman	3b2c83f9d2	trace-forwarder: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	b1cfa98524	runtime-rs: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	dc8f628dd1	libs: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` and drop our own macro that did this mapping Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	5f1d3481af	dragonball: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	9ec7109712	agent: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	34d299ae44	vsock-exporter: Fix clippy::io_other_error issue We can use the new `Error::other` option rather than `Error::new(ErrorKind::Other, ...)` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	b2f9f23504	dragonball: Fix `mismatched_lifetime_syntaxes` issue Fix for `warning: hiding a lifetime that's elided elsewhere is confusing` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
stevenhorsman	8bbbc3a58b	lib: Fix `mismatched_lifetime_syntaxes` issue Fix the warning thrown up: ``` warning: hiding a lifetime that's elided elsewhere is confusing --> /root/go/src/github.com/kata-containers/kata-containers/src/libs/kata-types/src/utils/u32_set.rs:50:17 \| 50 \| pub fn iter(&self) -> Iter<u32> { \| ^^^^^ --------- the same lifetime is hidden here \| \| \| the lifetime is elided here \| = help: the same lifetime is referred to in inconsistent ways, making the signature confusing = note: `#[warn(mismatched_lifetime_syntaxes)]` on by default help: use `'_` for type paths \| 50 \| pub fn iter(&self) -> Iter<'_, u32> { \| +++ ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-18 07:45:26 +00:00
Xuewei Niu	a65c2b06b8	Merge pull request #12169 from zhangls-0524/new-fix-issue-11996 runtime-rs: Block Device Rootfs Mount Options Lost During Storage Object Creation	2025-12-18 10:09:38 +08:00
Fabiano Fidêncio	0e534fa7fe	versions: Update virtiofsd to v1.13.3 Update virtiofsd to its latest release. Here we also need to update the alpine version used by the builder as we need a version of musl-dev new enough to have wrappers for pread2 and pwrite2. As we are bumping anyway, bump to the latest. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	1d2e19b07c	versions: Update pause image to 3.10.1 Update pause image to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	6211c10904	versions: Update libseccomp to 2.6.0 Update libseccomp to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	0e0a92533c	versions: update lvm2 to v2_03_38 Update lvm2 to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	142c7d6522	versions: Update gperf to 3.3 Update gperf to its latest release. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	e757485853	versions: Update cryptsetup to v2.8.1 Update cryptsetup to its latest release Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Fabiano Fidêncio	35cd5fb1d4	versions: Update helm to v4.0.4 Update helm to its latest release Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-18 00:51:08 +01:00
Tobin Feldman-Fitzthum	decc09e975	tests: cc: add test with SNP reference values Add two attestation tests. The first one sets a resource policy that requires CPU0 to have an affirming trust level. This is a negative test which can run on any platform. Setting this policy without setting any reference values should result in an attestation failure. Next, a second test will set the same policy, but this time it will use the journal log to find the QEMU command line from the previous test and calculate the expected reference values. Currently this is only supported on SNP using the sev-snp-measure tool, but the same flow should work on other platforms. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2025-12-18 00:12:11 +01:00
Ruoqing He	8b0d650081	dragonball: Use unique name for vhost path The five tests are set to the same vhost socket path, which could lead to racing with one another. Use unique name to avoid this. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-17 22:25:55 +01:00
Fabiano Fidêncio	320f1ce2a3	versions: Bump experimental {tdx,snp} QEMU Let's bump experimental {tdx,snp} QEMU to the tags created today in the Confidential Containers repo, which match with QEMU 10.2.0-rc3. This bump is mostly for early testing what will become 10.2.0, which will be bumped everywhere then. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 17:42:04 +01:00
Alex Lyn	3696d9143a	tests: Correct the teardown_common in cpu-ns.bats It will address the issue: "# bats warning: Executed 0 instead of expected 1 tests" Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	a28f24ef8c	tests: move the get_pod_config_dir into setup_common As each case needs the get_pod_config_dir preparation, a better approach is to move it directly into the setup_common method. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	5778b0a001	tests: Introduce measure_node_time to get test case end time To measure the duration for the journal, we need to clearly print the journal start time and end time for each case, which helps to ensure the journal log covers the specified period for the case. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	648f0913ca	tests: Load lib.rs in bats to ensure related function available lib.rs should be loaded first, before calling some functions. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	0929c84480	runtime-rs: Reduce output log and increase log level For failure cases within CI, we need to dump the kata log to help address issues, but currently large log messages mean we can only see a partial log. We remove initdata log output and increase log level to reduce log output. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	bbec15d695	tests: delete policy_settings_dir only for first test case Currently policy_settings_dir is created only when BATS_TEST_NUMBER == "1", but delete_tmp_policy_settings_dir "${policy_settings_dir}" is called in teardown() for every test. This means that for tests after the first one teardown() may attempt to delete a directory that was already removed by a previous test, or rely on a value that does not belong to the current test execution. Adjust teardown logic so that policy_settings_dir is only deleted for the first test case (BATS_TEST_NUMBER == "1") and ignored for subsequent tests. This keeps the original optimization of running genpolicy only once, while avoiding unnecessary or confusing cleanup attempts in later test cases. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	24e68b246f	tests: Add missing bin env at the head of bats Add the missing part of `#!/bin/bash/env` in bats. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	93ba6a8e76	tests: Make pod_name a global variable The previous pod_name is set as local, which cannot be captured within the teardown() function, causing failure. This commit just removes the `local pod_name` to make it a global variable. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Alex Lyn	89dce4eff6	tests: Enhance debug log output Introduce setup_common in setup() and teardown_common() in teardown() to get enough log to help debug Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-17 16:14:10 +00:00
Fabiano Fidêncio	88cdfab604	runtime: nvidia: Align static_sandbox_resource_mgmt Let's ensure we have those aligned for both CC and non-CC use-case. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 17:04:51 +01:00
Fabiano Fidêncio	995770dbeb	runtime: nvidia: Use cold-plug by default Now that we have the way to do cold-plug, let's ensure we also use it for the non-CC use case. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 17:04:51 +01:00
Hyounggyu Choi	7f72acc266	Merge pull request #12180 from BbolroC/enable-vfio-ap-passthrough-runtime-rs runtime-rs: Enable VFIO-AP passthrough (hotplug only) on s390x	2025-12-17 15:50:10 +01:00
Hyounggyu Choi	f1b4327dba	Merge pull request #12247 from fidencio/topic/ci-store-the-tarballs-we-rely-on-on-gchr-follow-up build: Fix GPG key for gperf & Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds	2025-12-17 13:53:58 +01:00
Fabiano Fidêncio	5415cf4e0f	workflows: payload: Remove unneeded stuff from the runner Otherwise we may hit a `no space left on device` when building the rust kata-deploy binary. This happens mostly because of the multi-stage build used to generate a distroless final container. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	98c5276546	helm: runtimeclasses: Match the kata-deploy rust deployment There we ensure labels are added to better deal with ownership of the runtimeclasses. It's not strictly needed here as helm does take care of the ownership, but also doesn't hurt to follow what seems to be a common practice. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	6130d7330f	ci: Run a nightly job using the kata-deploy rust Let's shamelessly duplicate the nightly job to have at least nightly runs using the rust implementation of kata-deploy. The reason for doing that is to be pragmatic, as pragmatic as possible, and avoid switching away of the scripts before 3.24.0 release, while still testing both ways till the switch happens. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	fbc29f3f5e	kata-deploy: helm: Adapt to the rust binary Differently than the scripts, which are called as `bash -c ...`, the kata-deploy rust binary must be invoked directly, as we do not even have a shell in its container. For now, the image using the rust version has the "-rust" suffix, which will help us to have both ways being used / tested for a little while. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	9d88c6b1d7	kata-deploy: Oxidize the script kata-deploy shell script is not THAT bad and, to be honest, it's quite handy for quick hacks and quick changes. However, it's been increasingly becoming harder to maintain as it's grown its scope from a testing tool to the proper project's front door, lacking unit tests, and with an abundance of complex regular expressions and bashisms to be able to properly parse the environment variables it consumes. Moreover, the fact it is a Frankenstein's monster glued together using python packages, golang binaries, and a distro dependent container makes the situation VERY HARD to use it from a distroless container (thus, avoiding security issues), preventing further integration with components that require a higher standard of security than we've been requiring. With everything said, with the help of Cursor (mostly on generating the test cases), here comes the oxidized version of the script, which runs from a distroless container image. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-17 09:57:02 +01:00
Fabiano Fidêncio	c9cd79655d	build: Pass PUSH_TO_REGISTRY and GH_TOKEN to Docker builds The ORAS cache helper needs PUSH_TO_REGISTRY to be set to 'yes' to push new artifacts to the cache. However, this environment variable was not being passed to the Docker container during agent, tools, and busybox builds. Moreover, for ghcr.io authentication, add support for using GH_TOKEN and GITHUB_ACTOR as fallbacks when explicit credentials (ARTEFACT_REGISTRY_USERNAME/PASSWORD) are not provided. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:58:16 +01:00
Fabiano Fidêncio	b11cea3113	build: Fix GPG key for gperf The GPG key used for gperf was incorrectly set to the busybox maintainer's key (Denis Vlasenko) instead of the gperf maintainer's key (Marcel Schaible). Wrong key (busybox): C9E9416F76E610DBD09D040F47B70C55ACC9965B Denis Vlasenko <vda.linux@googlemail.com> Correct key (gperf): EDEB87A500CC0A211677FBFD93C08C88471097CD Marcel Schaible <marcel.schaible@studium.fernuni-hagen.de> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:58:16 +01:00
Fabiano Fidêncio	6e01ee6d47	helm: Provide kata-remote runtime class kata-remote is a runtime class that cloud-api-adaptor relies on to work. kata-remote by itself does nothing, and that's the reason it's disabled by default. We're only adding it here so cloud-api-adaptor charts can simply do something like `--set shims.remote.enabled=true`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 21:57:49 +01:00
Fabiano Fidêncio	0a0fcbae4a	gatekeeper: Adjust to kata-tools A few jobs have been renamed as part of the kata-tools split. Let's add them all here. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 18:22:40 +01:00
Fabiano Fidêncio	fb326b53df	agent: Ensure MS_REMOUNT is respected When updating ephemeral storages, MS_REMOUNT is explicitly passed as, for instance, `/dev/shm` should be remounted after memory is hotplugged. Till now Kata Containers has been explicitly ignoring such updates, leading to the containers' `/dev/shm` having the size of "half of the memory allocated, during the startup time", which goes against the expected behaviour. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-12-16 15:11:34 +01:00
Fabiano Fidêncio	830d15d4c8	tests: Adapt to using kata-tools Instead of relying on the fully bloated kata tarball. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Fabiano Fidêncio	a2534e7bc8	kata-tools: Release as its own tarball We're only releasing those for amd64 as that's the only architecture we've been building the packages for. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Fabiano Fidêncio	6d2f393be4	build: Split tools build from the other artefacts build Let's ensure we can create a specific "tools" tarball, which will help those who only need to pull those either for testing or production usage. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-16 12:55:07 +01:00
Ruoqing He	6d2c66c7eb	runtime-rs: Refactor feature propagation After the runtime-rs workspace was merged into the root workspace, features passed when building runtime-rs need to be refactored to be correctly propagated. Taking dragonball for example, runtime-rs requires runtimes to depend on the virt_containers feature, and virt_containers needs to handle hypervisor features specifically. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	1872af7c5a	ci: Install cmake before building runtime-rs cmake is required for libz-sys to compile (which is required by nydus). Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	9551f97e87	runtime-rs: Change TARGET_PATH to root workspace After the workspace integration of runtime-rs, the output of runtime-rs is now under the repo root instead of src/runtime-rs. Change TARGET_PATH accordingly to tell the Makefile where to look up the output. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	c7c02ac513	dragonball: Skip tests needs kvm under non-root Some cases in the dragonball crates require interaction with the KVM module to complete, which requires root privileges. Skip those tests under a non-root user. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	889c3b6012	dragonball: Fix false use statement on aarch64 gic::create_gic is actually gated behind dbs_arch crate, instead of arch::aarch64. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	1c1f3a2416	dragonball: Allow missing_docs for dummy MMIODeviceInfo MMIODeviceInfo inside the test module of dbs_boot on aarch64 is used for testing purpose, but `pub` attribute requires it to have documentation. Since this is used only for testing purpose, let's allow missing_docs for it. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	6d0cb18c07	dragonball: Add missing test module attribute Test set of dbs_utils's tap module is missing test attribute, which makes dev-dependencies unusable. Marking tests of tap as test module. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	15fe7ecda1	runtime-rs: Remove lockfile Remove Cargo.lock since it now shares lockfile workspace-wise. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	beb0cac0d1	build: Move runtime-rs to root workspace This is a follow-up of `3fbe693`. Remove runtime-rs from exclude list, and make it as a member of root workspace. Specify shim and shim-ctl as the binary of runtime-rs package, make runtime-rs and all its members into root workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
Ruoqing He	ae4b3e9ac0	runtime-rs: Make runtime-rs a package Make runtime-rs a package that produces shim and shim-ctl as its binary products, which enables the Makefile to work after it's incorporated into the root workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-12-16 11:26:07 +01:00
shezhang.lau	9744e9f26d	runtime-rs: Block Rootfs Mount Options During Storage Object Creation Init the storage options with the original rootfs options. Additionally, for XFS, append nouuid to the mount options if it is not already present. Signed-off-by: shezhang.lau <shezhang.lau@antgroup.com>	2025-12-16 13:57:02 +08:00
Xuewei Niu	c8b5f8efad	Merge pull request #12167 from M-Phansa/main runtime-rs: handle container missing during kill_process gracefully	2025-12-16 10:31:50 +08:00
Fabiano Fidêncio	1388a3acda	packaging: Add ORAS cache for gperf and busybox tarballs To protect against upstream download failures for gperf and busybox, implement ORAS-based caching to GHCR. This adds: - download-with-oras-cache.sh: Core helper for downloading with cache - populate-oras-tarball-cache.sh: Script to manually populate cache - warn() function to lib.sh for consistency Modified build scripts to: - Try ORAS cache first (from ghcr.io/kata-containers/kata-containers) - Fall back to upstream download on cache miss - Automatically push to cache when PUSH_TO_REGISTRY=yes The cache is automatically populated during CI builds, and parallel architecture builds check for existing versions before pushing to avoid race conditions. Forks benefit from upstream cache but can override with their own: ARTEFACT_REPOSITORY=myorg/kata make agent-tarball Generated-By: Cursor IDE with Claude Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-15 22:04:21 +01:00
Markus Rudy	661e851445	genpolicy: support fsGroup setting in pod security context The runtime handles the fsGroup field of the pod security context by adding a mount option to the generated storage object [1]. This commit changes genpolicy to expect this option. Instead of passing another side input to yaml::get_container_mounts_and_storages, we pass the entire PodSpec. This reduces the necessary changes in the pod-generating resources and allows for possible future use of other PodSpec fields. [1]: https://github.com/kata-containers/kata-containers/blob/0c6fcde1/src/runtime/virtcontainers/kata_agent.go#L1620-L1625 Fixes: #11934 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-12-15 15:22:33 +01:00
Fabiano Fidêncio	a25a53c860	kata-deploy: sa: Fix permissions for patching nodefeaturerules I've seen this happening with the GPU SNP CI every now and then, but I don't really understand how this was not caught by the TDX / SNP CI themselves before. In any case, the error seen is: ``` Error from server (Forbidden): error when applying patch: {"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"nfd.k8s-sigs.io/v1alpha1\",\"kind\":\"NodeFeatureRule\",\"metadata\":{\"annotations\":{},\"name\":\"amd64-tee-keys\"},\"spec\":{\"rules\":[{\"extendedResources\":{\"sev-snp.amd.com/esids\":\"@cpu.security.sev.encrypted_state_ids\"},\"labels\":{\"amd.feature.node.kubernetes.io/snp\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"sev.snp.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"amd.sev-snp\"},{\"extendedResources\":{\"tdx.intel.com/keys\":\"@cpu.security.tdx.total_keys\"},\"labels\":{\"intel.feature.node.kubernetes.io/tdx\":\"true\"},\"matchFeatures\":[{\"feature\":\"cpu.security\",\"matchExpressions\":{\"tdx.enabled\":{\"op\":\"Exists\"}}}],\"name\":\"intel.tdx\"}]}}\n"}}} to: Resource: "nfd.k8s-sigs.io/v1alpha1, Resource=nodefeaturerules", GroupVersionKind: "nfd.k8s-sigs.io/v1alpha1, Kind=NodeFeatureRule" Name: "amd64-tee-keys", Namespace: "" for: "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": error when patching "/opt/kata-artifacts/node-feature-rules/x86_64-tee-keys.yaml": nodefeaturerules.nfd.k8s-sigs.io "amd64-tee-keys" is forbidden: User "system:serviceaccount:kube-system:kata-deploy-sa" cannot patch resource "nodefeaturerules" in API group "nfd.k8s-sigs.io" at the cluster scope ``` And the fix is as simple as allowing patching and updating a nodefeaturerule in our service account RBAC. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-15 12:01:20 +01:00
Alex Lyn	f4f61d5666	Merge pull request #12229 from fidencio/topic/kata-deploy-do-deprecations kata-deploy: Remove deprecated features from 3.23.0	2025-12-15 19:00:07 +08:00
Hyounggyu Choi	b69da5f3ba	gatekeeper: Make s390x e2e tests required again Since the CI issue for s390x was resolved on Dec 5th, the nightly test result has gone green for 10 consecutive days. This commit puts the e2e tests for s390x again into the required job list. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-15 11:12:25 +01:00
Fabiano Fidêncio	ded6d1636f	kata-deploy: Remove deprecated features from 3.23.0 Let's remove the deprecated features that were marked for removal after Kata Containers 3.23.0: kata-deploy.sh: - Remove non-arch-specific variable fallbacks (SHIMS, DEFAULT_SHIM, SNAPSHOTTER_HANDLER_MAPPING, ALLOWED_HYPERVISOR_ANNOTATIONS, PULL_TYPE_MAPPING, EXPERIMENTAL_FORCE_GUEST_PULL). Each arch now has its own default value. - Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS variables and associated functions (create_runtimeclasses, delete_runtimeclasses, adjust_shim_for_nfd). RuntimeClasses are now managed by Helm chart, not the daemonset script. - Unsupported architectures now fail with an error instead of falling back to non-arch-specific defaults. Helm chart: - Remove all deprecated env values (createRuntimeClasses, createDefaultRuntimeClass, debug, shims, shims_, defaultShim, defaultShim_, allowedHypervisorAnnotations, snapshotterHandlerMapping, snapshotterHandlerMapping_, agentHttpsProxy, agentNoProxy, pullTypeMapping, pullTypeMapping_, _experimentalSetupSnapshotter, _experimentalForceGuestPull, _experimentalForceGuestPull_*). - Remove backward compatibility code from _helpers.tpl that checked for legacy env values. - Remove legacy env.shims check from runtimeclasses.yaml. - Remove CREATE_RUNTIMECLASSES and CREATE_DEFAULT_RUNTIMECLASS env vars from kata-deploy.yaml and post-delete-job.yaml. - Update RBAC to only include runtimeclasses get/patch permissions (needed for NFD patching), removing create/delete/list/update/watch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-13 16:32:00 +01:00
Adeet Phanse	db09912808	agent: add SandboxError enum for typed error handling - Replace generic errors in sandbox operations with typed SandboxError variants (InvalidContainerId, InitProcessNotFound, InvalidExecId). - This enables the kata shim to handle specific failure cases differently. Fixes #12120 Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>	2025-12-12 12:33:18 -05:00
Adeet Phanse	5b7e1cdaad	runtime-rs: handle container missing during kill_process gracefully Add better error handling to runtime rs to handle when the sandbox itself is killed and recreated. - Update the kill_process function to skip sending a signal when the process is stopped. - Always set ProcessStatus::Stopped even when wait_process fails - In state_process return synthetic state for sandbox container when using Sandbox API Fixes #12120 Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>	2025-12-12 12:33:17 -05:00
Fabiano Fidêncio	c7d0c270ee	release: Bump version to 3.24.0 Bump VERSION and helm-chart versions Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-12 18:15:41 +01:00
Fabiano Fidêncio	50b853eb93	tests: nvidia: Always rely on the "kata" default runtime class This is a pattern already followed by all the other tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	ff2396aeec	tests: nvidia: Declare KATA_HYPERVISOR variable Align with other test logic - declare the KATA_HYPERVISOR in the run bash script, then declare the RUNTIME_CLASS_NAME variable in the bats files. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	6e31cf2156	tests: nvidia: cc: USE is_confidential_gpu_hw This function has recently been introduced, so we align patterns. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	cd1f55b41c	tests: nvidia: cc: Set GPU0 policy for NIM tests Now that we have a more restrictive resource policy for KBS, let us start adopting it across all NVIDIA test cases. This policy was previously introduced by the NVIDIA attestation test. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	edbac264cb	tests: nvidia: cc: Remove KBS variable The variable is now set in the CI YAML file, thus removing the assignment. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	9665b74653	tests: nvidia: cc: address shellcheck warnings Address shellcheck warnings for run_kubernetes_nv_tests.sh Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Manuel Huber	5f9e7a03a8	tests: nvidia: do not use teardown_common Clean up in each NVIDIA bats file according to our needs. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 16:31:42 +01:00
Alex Lyn	c3fd4c1621	version: Bump rtnetlink and netlink-packet-route It aims to upgrade rtnetlink to mitigate netlink log noise. This commit upgrades the `rtnetlink` dependency (and corresponding libraries like `netlink-packet-route`) to address excessive and unnecessary netlink-related logging during sandbox startup. Problem: The previously used `rtnetlink v0.16` (depending on `netlink-proto v0.11.3`) generates a high volume of DEBUG/INFO level netlink messages during sandbox initialization. This noise: 1. Overloads the logging system, often leading to warnings like "slog-async: logger dropped messages due to channel overflow." 2. Interferes with effective troubleshooting by distracting developers from legitimate Kata errors. Solution: We upgrade to `rtnetlink v0.19` (and `netlink-proto v0.12`), as testing confirms that the latest versions have correctly elevated the verbosity of these netlink internal events to the TRACE level. This change significantly enhances the log analysis experience by suppressing unnecessary network-related logs during startup. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-12 14:27:33 +01:00
Manuel Huber	1781fb8b06	tests: nvidia: cc: Use CUDA image from NVCR Pull from nvcr.io to avoid hitting unauthenticated pull rate limits. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	f63f95f315	tests: nvidia: cc: generate pod security policies With these changes, we create pod security policies when running against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set. For the non-TEE GPU tests, the added functions bail out by design. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	bf26ad9532	nvidia: tests: remove outer CDI annotations With the new device plugin being used by CI runners, these annotations are no longer necessary. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	37b4f6ae8b	tests: Adapt NVIDIA common policy settings Following existing patterns, we adapt the common policy settings for NVIDIA GPU CI platforms. For instance, for our CI runners, we use containerd 2.x. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	f4c0c8546e	tests: Enable AUTO_GENERATE_POLICY for NVIDIA TEEs Enable auto-generate policy for qemu-nvidia-gpu-* if the user didn't specify an AUTO_GENERATE_POLICY value. Setting this in run_kubernetes_nv_tests.sh is too late as gha-run.sh calls into run_tests, setup.sh, and then into create_common_genpolicy_settings() where the rules.rego and genpolicy-settings file are being copied to the right locations. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	b9774e44b6	genpolicy: tests: Add VFIO passthrough test cases Add one valid test case with 2 GPUs with proper VFIO device entries and CDI annotations. Add seven test cases with invalid combinations of VFIO device entries and CDI annotations. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Manuel Huber	d3e6936820	genpolicy: validation of vfio passthrough GPUs Add rules for vfio passthrough GPUs. When creating the security policy document, parse GPU resource limits and derive CDI annotation patterns and VFIO device entries. With various values for CDI annotations and device paths being runtime-dependent, use regular expressions. For now, this enables passthrough of NVIDIA GPUs, but the changes are designed to allow for other VFIO device types. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-12 12:52:33 +01:00
Alex Lyn	82e8e9fbe0	doc: add block device's settings to the doc page Add the block device specific annotations for num_queues and queue_size, which are dedicated to runtime-rs, to the document to help users set the two parameters. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-11 21:10:22 +01:00
Alex Lyn	a8a458664d	kata-types: Allow dynamic queue config via Pod annotations This commit introduces the capability to dynamically configure `queue_size` and `num_queues` parameters via Pod annotations. Currently, `kata-runtime` allows for static configuration of `queue_size` and `num_queues` for block devices through its config file. However, a critical issue arises when a Pod is allocated fewer CPU cores than the statically configured `num_queues` value. In such scenarios, the Pod fails to start, leading to operational instability and limiting flexibility in resource allocation. To address this, this feature enables users to override the default queue_size and num_queues parameters by specifying them in Pod annotations. This allows for fine-grained control and dynamic adjustment of these parameters based on the specific resource allocation of a Pod. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-11 21:10:22 +01:00
Steve Horsman	51459b9b15	Merge pull request #12220 from fidencio/topic/ci-arm64-temporarily-disable-arm64-non-k8s-tests ci: arm64-non-k8s: temporarily skip the tests	2025-12-11 11:35:39 +00:00
Fabiano Fidêncio	46c7d6c9f8	ci: arm64-non-k8s: temporarily skip the tests The runner is down for a few weeks. I may end up bringing in my personal runner, but I'm not confident I can easily do this before the holidays, thus I'm skipping the tests for now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-11 12:14:32 +01:00
Manuel Huber	560f6f6c74	tests: nvidia: cc: Affirming attestation policy Set the attestation policy for GPU0 to affirming. This requires the GPU, for instance, to have production properties, such as properly signed VBIOS firmware. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-11 10:16:58 +01:00
Alex Lyn	751b6875f9	tests: Temporarily skip the cpu-ns test for the s390x platform Because this CI test is continuously failing, we'd like to temporarily skip it for the s390x platform. It will be re-enabled once the related issues are addressed. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	d495b77135	runtime-rs: Align the default annotations with runtime-go As the default enable_annotations in runtime-rs differs from runtime-go, we should align it with the configuration in runtime-go. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	c8dd5fbacf	runtime-rs: Migrate vCPU tracking to fractional float This commit refactors the vCPU resource management within runtime's `CpuResource` structure and related calculation logic to use floating-point numbers (`f32`) instead of integers (`u32`). This migration is necessary to fully support the fractional vCPU allocation introduced in the `kata-types` library, ensuring better precision in: 1. Allocation Tracking: `current_vcpu` now tracks the precise fractional value (e.g., 1.5 vCPUs). 2. Resource Calculation: `calc_cpu_resources` now returns a precise `f32` sum of container vCPU requests, including normalization logic based on the maximum period, removing the previous integer rounding steps in the calculation. 3. Hypervisor Interaction: The integer vCPU requirement for the hypervisor remains, so `ceil()` is now explicitly applied only when interacting with the hypervisor or agent APIs (`do_update_cpu_resources`, `current_vcpu`, `online_cpu_mem`). Key changes: 1. `CpuResource::current_vcpu` updated from `u32` to `f32`. 2. `calc_cpu_resources` return type changed from `u32` to `f32`. 3. CPU hotplug logic now uses `f32` for the target vCPU count and applies `ceil()` before calling `hypervisor.resize_vcpu()`. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	84fd33c3bc	kata-types: Use fractional float for vCPU resource tracking Refactors `LinuxContainerCpuResources` and `LinuxSandboxCpuResources` to track calculated vCPU allocation using `f64` (fractional float) instead of `u64` (milliseconds). This ensures more precise resource calculation (`quota / period`) and aggregation by avoiding rounding errors inherent in millisecond-based integer tracking. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	0f04363ea8	tests: Disable CPU elasticity tests for nontee scenarios This commit updates the non-TEE tests to disable two specific test cases: `k8s-number-cpus.bats` and `k8s-sandbox-vcpus-allocation.bats`. These tests are designed to cover CPU elasticity/dynamic scaling capabilities. In the non-TEE scenario, we are enforcing the disabling of this capability by setting the default configuration to `static_sandbox_resource_mgmt=true`. Although the tests currently pass, allowing them to run is logically inconsistent with the intended non-TEE configuration. Therefore, we are disabling them for all non-TEE runtimes, specifically targeting: - `qemu-coco-dev` - `qemu-coco-dev-runtime-rs` This change ensures that our non-TEE CI accurately reflects the static resource management policy and prevents misleading test results. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	beaf44dd2e	tests: disable block volume test for s390 arch As runtime-rs doesn't support block device hotplug on the s390 arch, we skip the test when running on s390. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	535ba589f4	runtime-rs: Enable elastic resource feature To support this feature, the corresponding item in the Makefile should be enabled, and it can be set at build time, like this: `DEFSTATICRESOURCEMGMT_QEMU := false` Users who don't want this feature can set it to true via configuration.toml. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	28371dbec5	tests: Enable cloud-hypervisor and qemu-runtime-rs within the CI Enable the cpu hotplug tests within the k8s-number-cpus.bats for both cloud-hypervisor and qemu-runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	82a72b4564	tests: Enable cpu hotplug for dragonball and clh in vcpus allocation We now support the CPU hotplug feature in dragonball and clh; this commit enables the test within the CI. Fixes: #8660 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	6196d3d646	tests: Enable cpu hotplug tests in k8s-cpu-ns.bats This case was skipped because of previous failures, but now that CPU hotplug has been corrected, it's time to re-enable it. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
Alex Lyn	96bd13e85d	tests: Add support for qemu-runtime-rs We now support the virtio-scsi driver, so the CI should be enabled. Fixes: #10373 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-12-10 22:11:56 +01:00
dependabot[bot]	2137b1fa3a	build(deps): bump github.com/containernetworking/plugins in /src/runtime Bumps [github.com/containernetworking/plugins](https://github.com/containernetworking/plugins) from 1.7.1 to 1.9.0. - [Release notes](https://github.com/containernetworking/plugins/releases) - [Commits](https://github.com/containernetworking/plugins/compare/v1.7.1...v1.9.0) --- updated-dependencies: - dependency-name: github.com/containernetworking/plugins dependency-version: 1.9.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-12-10 16:10:24 +01:00
LandonTClipp	b50a73912d	runtime: Config test extension for IOMMUFDID Adding additional cases for the IOMMUFDID method to check the case where non-IOMMUFD paths are passed. The method should do the right thing. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	d5e4cf6b4d	runtime: Add test for ExecuteVFIODeviceAdd Copilot made a good point that we should have a test for this. Thus, this commit. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	137866f793	runtime: Allow QMP commands to be logged in debug level Logging the QMP commands gives us a lot of flexibility to troubleshoot issues with what is being sent to QEMU. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	a3b5764f67	runtime: Fix import cycle and add unit test for IOMMUFDID() An import cycle was introduced because of a mutual need for the constant that describes the prefix of IOMMUFD files. We need to extract this out into a higher-level package. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
LandonTClipp	09438fd54f	runtime: Add IOMMUFD Object Creation for QEMU QMP Commands The QMP commands sent to QEMU did not properly set up IOMMUFD objects in the codepath that handles VFIO device hot-plugging. This is mainly relevant in the Kubernetes use-case where the VFIO devices are not available when QEMU is first launched. Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>	2025-12-10 15:46:28 +01:00
Manuel Huber	cb8fd2e3b1	runtime: gpu: Skip CDI annos for pause container The pause container does not need CDI annotations, these are only intended for workload containers. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-10 13:26:04 +01:00
Fabiano Fidêncio	69a0ac979c	tests: Adjust install_bats() The function assumes that the runner is an Ubuntu machine, which so far has been true as part of our CI. However, the new ARM runner is running on Debian, and those mirror additions would simply break. With this in mind, for any distro that's not Ubuntu, let's just make sure to inform the owner of the system to have bats already installed as part of the environment provided. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-10 12:05:04 +01:00
Fabiano Fidêncio	406f6b1d15	Revert "tests: Add workaround to override CDI files" This reverts commit `5a81b010f2`, as we now have all the infrastructure properly set up as part of our CI node. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-09 23:18:11 +01:00
Fabiano Fidêncio	3db7b88eff	tests: remove containerd guest pull stability tests Remove the existing containerd guest pull stability tests workflow as we're going to rebuild all the VMs used for testing and introduce new, more focused stability tests for nydus-snapshotter. The new tests will be added soon, as part of another PR. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-08 16:29:11 +01:00
Fabiano Fidêncio	5b6a2d25bc	podOverhead: Reduce memory overhead for GPU runtime classes Now that we've bumped to QEMU 10.2.0-rc1, we can take advantage of a fix that's present there, which fixes the double memory allocation for the cases where GPUs are being cold-plugged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-06 00:16:43 +01:00
Fabiano Fidêndio	71f78cc87e	tests: cc: gpu: Lower the amount of memory required by the pods We've made the pods require a ridiculous amount of memory, just for the sake of getting them running. Now that those are running, tests are passing, and CI is required, let's work to lower the amount of memory needed, as everything else is working as expected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-06 00:16:43 +01:00
Dan Mihai	965ad10cf2	tests: k8s: tests_common.sh local modification Clean-up shellcheck warnings: SC2030 (info): Modification of cmd_out is local (to subshell caused by (..) group). SC2031 (info): cmd_out was modified in a subshell. That change might be lost. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-12-06 00:16:23 +01:00
Dan Mihai	8199171cc4	tests: k8s: tests_common.sh braces around variables Clean-up shellcheck warnings: SC2250 (style): Prefer putting braces around variable references even when not strictly required. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-12-06 00:16:23 +01:00
Fabiano Fidêncio	5a81b010f2	tests: Add workaround to override CDI files Let's add a simple backup and restore logic for the CDI configuration file nvidia.com-pgpu.yaml in the k8s-nvidia-*.bats and k8s-confidential-attestation.bats test files. Although not optimal, this is a temporary workaround needed until NVIDIA releases what's needed for the GPU Operator to properly deal with cold plugged devices for the Confidential Containers cases, which is work in progress right now. After that's released, we can revert/drop this patch. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-05 18:58:35 +01:00
Fabiano Fidêncio	aaa67df4dd	versions: Bump experimental {tdx,snp} QEMU Let's bump experimental {tdx,snp} QEMU to the tags created today in the Confidential Containers repo, which match QEMU 10.2.0-rc1. This bump is especially beneficial for us, as we can get rid of QEMU's double memory allocation when cold plugging a GPU. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-05 18:58:35 +01:00
Zvonko Kaiser	f8ad17499d	gpu: VFIO handling container vs sandbox If the sandbox has cold-plugged an IOMMUFD device but the device plugin sends us a /dev/vfio/<NUM> device, we need to check if the IOMMUFD device and the VFIO device are the same. We have the sibling.BDF; we now need to extract the BDF of the devPath, which is either /dev/vfio/<NUM> or /dev/vfio/devices/vfio<NUM>. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-05 16:53:31 +01:00
Zvonko Kaiser	147e9f188e	Merge pull request #12080 from manuelh-dev/mahuber/cc-gpu-ci-attestation tests: nvidia: cc: Add attestation test	2025-12-05 09:31:57 -05:00
Steve Horsman	2f1b98c232	Merge pull request #12197 from stevenhorsman/logrus-1.9.3-bump version: Bump sirupsen/logrus	2025-12-05 14:18:50 +00:00
Manuel Huber	e5861cde20	tests: use Authorization when GH_TOKEN is set Same as for other uses of GH_TOKEN, use it when set in order to avoid rate limiting issues. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 14:08:43 +01:00
stevenhorsman	9eba559bd6	version: Bump sirupsen/logrus Bump the github.com/sirupsen/logrus version to 1.9.3 across our components where it is back-level to bring us up-to-date and resolve high severity CVE-2025-65637 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-05 11:12:04 +00:00
Manuel Huber	34efa83afc	tests: nvidia: cc: Add attestation test Add the attestation bats test case to the NVIDIA CI and provide a second pod manifest for the attestation test with a GPU. This will enable composite attestation in a subsequent step. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	e31d592a0c	versions: Bump coco-trustee Bump to pull in a fix for composite attestation with GPUs. The new commit ID corresponds to the fix (change for default GPU policy), currently being the top commit of the main branch. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	73dfa9b9d5	versions: Bump coco-guest-components Bump to pull in a fix for NVIDIA CC GPU attestation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	116a72ad0d	tests: cc: Fix command evaluation This brings two fixes: - use the test_key variable to check against the aatest value. - properly check the run command invocation (run w/o bash does not seem to like the pipe, which leads to ALWAYS evaluating the status result to 1). With this, the deny-all test would ALWAYS succeed regardless of whether aatest was actually returned or not. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	23675c784b	tests: cc: Reset default policy When running these tests repeatedly locally, the default policy is not being reset after the test completes, then subsequent runs fail. Similar to k8s-sealed-secrets.bats, we set the default policy in an if condition. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	f70c3adaf1	tests: cc: Add kbs_set_gpu0_resource_policy This allows setting a GPU0 resource policy, enabling GPU attestation tests to not use the default resource policy. For now, the policy requires attestation's ear status to not be contraindicated. In a future change we will require this to be affirming once our CI runners' vBIOS version is properly configured. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	c2d1e2dcc9	tests: cc: Add is_confidential_gpu_hardware This enables attestation tests to figure out whether composite attestation with a GPU can be executed. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Manuel Huber	53e94df203	tests: nvidia: cc: add SUPPORTED_TEE_HYPERVISORS Add the NVIDIA TEE hypervisors. With this, attestation tests can be run against the NVIDIA handlers, for instance. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-05 11:48:55 +01:00
Fabiano Fidêncio	923f97bc66	rootfs: Temporarily revert "gpu: Handle root_hash.txt correctly" This reverts commit `e4a13b9a4a`, as it caused some issues with the GPU workflows. Reverting it is better, as it unblocks other PRs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-05 11:47:37 +01:00
Steve Horsman	d27af53902	Merge pull request #12185 from stevenhorsman/runtime-rs-required-checks ci: Add qemu-runtime-rs AKS tests to required	2025-12-05 10:43:25 +00:00
stevenhorsman	403de2161f	version: Update golang to 1.24.11 Needed to fix: ``` Vulnerability #1: GO-2025-4155 Excessive resource consumption when printing error string for host certificate validation in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4155 Standard library Found in: crypto/x509@go1.24.9 Fixed in: crypto/x509@go1.24.11 Vulnerable symbols found: #1: x509.HostnameError.Error ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-04 22:50:07 +01:00
Steve Horsman	425f4ffc8d	Merge pull request #12124 from zvonkok/nvidia-measured-rootfs gpu: Measured rootfs	2025-12-04 14:54:11 +00:00
Hyounggyu Choi	1dd3426adc	tests: Extend vfio-ap test for runtime-rs vfio-ap passthrough has been introduced for runtime-rs, requiring that the existing test verify this new functionality. This commit adds: - containerd config specific to runtime-rs - extensions to the existing test functions to cover vfio-ap Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-04 15:05:23 +01:00
Hyounggyu Choi	aa326fb9b8	tests: Remove usage of crictl for vfio-ap `crictl` is not used any more after #10767. Let's clean up all places where the tool is used. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-04 15:05:23 +01:00
Hyounggyu Choi	41d61f4b16	runtime-rs: Enable VFIO-AP passthrough The following have been made for the enablement: 1. Make `MediatedPci` and `MediatedAp` in `VfioDeviceType` 2. Make HostDevice without BDF for `MediatedAp` 3. Add `CCW` to VFioBusMode and set it to VfioConfig as `bus_type` 4. Return `vfio-ap` driver type for `CCW` bus type 5. Set `bus_mode` for `VfioDevice` based on `bus_type` 6. Set `vfio-ap` to the agent device's `field_type` 7. Prepare a different argument for `vfio-ap` for QMP command 8. Set None to all PCI relevant fields Please keep in mind that `vfio-ap` does not belong to any types of port topologies like PCI (e.g., root or switch) because devices on s390x are controlled by CCW. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-04 15:05:23 +01:00
Hyounggyu Choi	cb5b1384ca	runtime-rs: Introduce `uses_native_ccw_bus()` Until now, we relied on `VMROOTFSDRIVER` to determine whether a system uses a native CCW bus. However, this method is not canonical and can be error-prone depending on the configuration. This commit introduces a new function that checks for the presence of CCW bus infrastructure in sysfs and verifies that native mainframe drivers are available. It replaces all previous uses of the old detection method. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-04 15:05:23 +01:00
Steve Horsman	f673f33e72	Merge pull request #12172 from fidencio/topic/gatekeeper-mark-nvidia-jobs-as-required gatekeeper: Mark NVIDIA CC GPU test as required	2025-12-04 12:48:57 +00:00
stevenhorsman	112810c796	ci: Add qemu-runtime-rs AKS tests to required Add the small and normal variants of the qemu-runtime-rs tests to the required-tests list now that they are stable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-04 11:15:43 +00:00
Fabiano Fidêncio	c505afb67c	gatekeeper: Mark NVIDIA CC GPU test as required It's been stable for the past 10 nightlies, no retries. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-12-04 11:14:25 +00:00
Steve Horsman	635f7892d5	Merge pull request #12190 from BbolroC/mark-s390x-jobs-as-nonrequired gatekeeper: Drop all s390x e2e tests temporarily	2025-12-04 11:10:46 +00:00
Steve Horsman	2a6ebc556f	Merge pull request #12175 from kata-containers/mahuber/gpu-ci-genpolicy ci: nvidia: Install kata-artifacts	2025-12-04 09:23:32 +00:00
Hyounggyu Choi	b6ef7eb9c3	gatekeeper: Drop all s390x e2e tests temporarily This commit marks three s390x CI jobs as non-required. Please check out the details at #12189. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-04 08:05:14 +01:00
Steve Horsman	10b0717cae	Merge pull request #12179 from stevenhorsman/nginx-test-image-by-digest tests: Switch nginx test image ref to digest	2025-12-03 13:39:07 +00:00
Hyounggyu Choi	22778547b2	runtime-rs: Fix panic when OCI spec annotations are missing An oci-spec can be passed to the runtime without annotations (e.g., `ctr run`). In this case, runtime panics with: ``` src/runtime-rs/crates/runtimes/src/manager.rs:391: called `Option::unwrap()` on a `None` value ``` This commit checks if the annotation is None, and instantiates the hashmap as an empty map if it is missing. It also adds a None check for `netns`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-03 13:07:39 +01:00
Hyounggyu Choi	ba78fb46fb	runtime-rs: Configure protection devices when confidential_guest is set Currently, the protection device configuration is constructed automatically even if `confidential_guest` is not set. This commit puts a condition to check the flag and allows the construction accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-12-03 13:07:39 +01:00
Zvonko Kaiser	e4a13b9a4a	gpu: Handle root_hash.txt correctly Updates to the shim-v2 build and the binaries.sh script. Making sure that both variants "confidential" AND "nvidia-gpu-confidential" are handled. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-12-02 19:56:19 +01:00
Steve Horsman	d8405cb7fb	Merge pull request #11983 from stevenhorsman/toolchain-guidance doc: Document our Toolchain policy	2025-12-02 15:47:54 +00:00
stevenhorsman	b9cb667687	doc: Document our Toolchain policy Create an initial version of our toolchain policy as agreed in Architecture Committee meetings and the PTG Fixes: #9841 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-02 14:28:29 +00:00
stevenhorsman	79a75b63bf	tests: Switch nginx test image ref to digest As tags are mutable and digests are not, let's pin our image by digest to give our CI a better chance of stability Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-02 13:02:50 +00:00
stevenhorsman	5c618dc8e2	tests: Switch nginx images to use version.yaml details - Swap out the hard-coded nginx registry and versions for reading the test image details from version.yaml, which can also ensure that the quay.io mirror is used rather than the docker hub versions which can hit pull limits - Try setting imagePullPolicy Always to fix issues with the arm CI Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-12-02 10:04:09 +01:00
Manuel Huber	3427b5c00e	ci: nvidia: Install kata-artifacts In preparation for Kata agent security policy testing, installing Kata tools to provide genpolicy. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-01 17:59:19 +00:00
Manuel Huber	4355af7972	kata-deploy: Fix binary find install_tools_helper Using make tarball targets for tools locally, binaries may exist for both debug and release builds. In this case, cryptic errors are shown as we try to install multiple binaries. This change requires exactly one binary to be found and errors out in other cases. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-12-01 09:29:24 -08:00
Manuel Huber	5a5c43429e	ci: nvidia: remove kubectl_retry calls When tests regress, the CI wait time can increase significantly with the current kubectl_retry attempt logic. Thus, align with other tests and remove kubectl_retry invocations. Instead, rely on proper timeouts. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-28 19:00:57 +01:00
Fabiano Fidêncio	e3646adedf	gatekeeper: Drop SEV-SNP from required SEV-SNP machine is failing due to nydus not being deployed in the machine. We cannot easily contact the maintainers due to the US Holidays, and I think this should become a criterion for a machine not to be added as required again (different regions coverage). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-28 12:46:07 +01:00
Steve Horsman	8534afb9e8	Merge pull request #12150 from stevenhorsman/add-gatekeeper-triggers ci: Add two extra gatekeeper triggers	2025-11-28 09:34:41 +00:00
Zvonko Kaiser	9dfa6df2cb	agent: Bump CDI-rs to latest Latest version of container-device-interface is v0.1.1 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-27 22:57:50 +01:00
Fabiano Fidêncio	776e08dbba	build: Add nvidia image rootfs builds So far we've only been building the initrd for the nvidia rootfs. However, we're also interested in having the image being used for a few use-cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-27 22:46:07 +01:00
stevenhorsman	531311090c	ci: Add two extra gatekeeper triggers We hit a case that gatekeeper was failing due to thinking the WIP check had failed, but since it ran the PR had been edited to remove that from the title. We should listen to edits and unlabels of the PR to ensure that gatekeeper doesn't get outdated in situations like this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-27 16:45:04 +00:00
Zvonko Kaiser	bfc9e446e1	kernel: Add NUMA config Add per arch specific NUMA enablement kernel settings Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-27 12:45:27 +01:00
Steve Horsman	c5ae8c4ba0	Merge pull request #12144 from BbolroC/use-runs-on-to-choose-runners GHA: Use `runs-on` only for choosing proper runners	2025-11-27 09:54:39 +00:00
Fabiano Fidêncio	2e1ca580a6	runtime-rs: Only QEMU supports templating We can remove the checks and default values attribution from all other shims. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-27 10:31:28 +01:00
Alex Lyn	df8315c865	Merge pull request #12130 from Apokleos/stability-rs tests: Enable stability tests for runtime-rs	2025-11-27 14:27:58 +08:00
Fupan Li	50dce0cc89	Merge pull request #12141 from Apokleos/fix-nydus-sn tests: Properly handle containerd config based on version	2025-11-27 11:59:59 +08:00
Fabiano Fidêncio	fa42641692	kata-deploy: Cover all flavours of QEMU shims with multiInstallSuffix We were missing all the runtime-rs variants. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-26 17:44:16 +01:00
Fabiano Fidêncio	96d1e0fe97	kata-deploy: Fix multiInstallSuffix for NV shims When using the multiInstallSuffix we must be cautious about using the shim name, as qemu-nvidia-gpu* doesn't actually have a matching QEMU itself, but should rather be mapped to: qemu-nvidia-gpu -> qemu; qemu-nvidia-gpu-snp -> qemu-snp-experimental; qemu-nvidia-gpu-tdx -> qemu-tdx-experimental Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-26 17:44:16 +01:00
Markus Rudy	d8f347d397	Merge pull request #12112 from shwetha-s-poojary/fix_list_routes agent: fix the list_routes failure	2025-11-26 17:32:10 +01:00
Steve Horsman	3573408f6b	Merge pull request #11586 from zvonkok/numa-qemu qemu: Enable NUMA	2025-11-26 16:28:16 +00:00
Steve Horsman	aae483bf1d	Merge pull request #12096 from Amulyam24/enable-ibm-runners ci: re-enable IBM runners for ppc64le and s390x	2025-11-26 13:51:21 +00:00
Steve Horsman	5c09849fe6	Merge pull request #12143 from kata-containers/topic/add-report-tests-to-workflows workflows: Add Report tests to all workflows	2025-11-26 13:18:21 +00:00
Steve Horsman	ed7108e61a	Merge pull request #12138 from arvindskumar99/SNPrequired CI: readding SNP as required	2025-11-26 11:33:07 +00:00
Amulyam24	43a004444a	ci: re-enable IBM runners for ppc64le and s390x This PR re-enables the IBM runners for ppc64le/s390x build jobs and s390x static checks. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2025-11-26 16:20:01 +05:30
Hyounggyu Choi	6f761149a7	GHA: Use `runs-on` only for choosing proper runners Fixes: #12123 `include` in #12069, introduced to choose a different runner based on component, leads to another set of redundant jobs where `matrix.command` is empty. This commit gets back to the `runs-on` solution, but makes the condition human-readable. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-26 11:35:30 +01:00
Alex Lyn	4e450691f4	tests: Unify nydus configuration to containerd v3 schema Containerd configuration syntax (`config.toml`) varies across versions, requiring per-version logic for fields like `runtime`. However, testing confirms that containerd LTS (1.7.x) and newer versions fully support the v3 schema for the nydus remote snapshotter. This commit changes the previous containerd v1 settings in `config.toml`. Instead, it introduces a unified v3-style configuration for nydus, which is valid for LTS and active containerd versions. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-26 17:58:16 +08:00
stevenhorsman	4c59cf1a5d	workflows: Add Report tests to all workflows In the CoCo tests jobs @wainersm created a report tests step that summarises the jobs, so they are easier to understand and get results for. This is very useful, so let's roll it out to all the bats tests. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-26 09:28:36 +00:00
shwetha-s-poojary	4510e6b49e	agent: fix the list_routes failure relax list_routes tests so not every route requires a device Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>	2025-11-25 20:25:46 -08:00
Xuewei Niu	04e1cf06ed	Merge pull request #12137 from Apokleos/fix-netdev-mq runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean	2025-11-26 11:49:33 +08:00
Alex Lyn	ebe084e093	Merge pull request #12122 from fidencio/topic/configs-do-no-have-commented-out-options runtimes: config: Do NOT have commented fields	2025-11-26 10:33:32 +08:00
Alex Lyn	e9f50f6e71	Merge pull request #12116 from manuelh-dev/mahuber/ci-openvpn-policy-v2 policy: ci: enable security policy for openvpn test case	2025-11-26 09:35:43 +08:00
Fabiano Fidêncio	e859537c74	runtimes: config: Do NOT have commented fields In order to have a better way to set things up using a toml editor, we should take the containerd approach and actually have everything uncommented. This will help us to unify how we deal with such values in the future from the kata-deploy POV. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-25 19:26:56 +01:00
Arvind Kumar	c085011a0a	CI: readding SNP as required Reenabling the SNP CI node as a required test. Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-11-25 17:05:01 +00:00
Fabiano Fidêncio	5ca4f2b9ff	runtimes: annotations: Fix kernel param handling We need to ensure that we do not blindly append nor blindly override the kernel parameters set by default, but rather modify the values in case they exist, and append in case they do not. Now we're actually making golang and rust runtime behave the same, as so far they were behaving differently, each version wrong in its own way. :-p. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-25 16:04:52 +01:00
Zvonko Kaiser	45cce49b72	shellcheck: Fix [] [[]] SC2166 This file is a beast so doing one shellcheck fix after the other. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-25 15:46:16 +01:00
Zvonko Kaiser	b2c9439314	qemu: Update tools/packaging/static-build/qemu/build-qemu.sh This nit was introduced by `227e717` during the v3.1.0 era. The + sign from the bash substitution ${CI:+...} was copied by mistake. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-25 15:46:09 +01:00
Zvonko Kaiser	2f3d42c0e4	shellcheck: build-qemu.sh is clean Make shellcheck happy Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-25 15:46:07 +01:00
Zvonko Kaiser	f55de74ac5	shellcheck: build-base-qemu.sh is clean Make shellcheck happy Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-25 15:45:49 +01:00
Zvonko Kaiser	040f920de1	qemu: Enable NUMA support Enable NUMA support with QEMU. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-25 15:45:00 +01:00
Alex Lyn	de9308419b	Merge pull request #12135 from microsoft/danmihai1/init-data agent: allow disabling detect_initdata_device	2025-11-25 21:07:57 +08:00
Alex Lyn	34d3bd18bc	Merge pull request #12132 from fidencio/topic/runtime-classes-fix-nvidia-gpu-podOverhead runtimeclasses: Fix nvidia-gpu podOverhead	2025-11-25 20:23:07 +08:00
Alex Lyn	7f4d856e38	tests: Enable nydus tests for qemu-runtime-rs We need to enable nydus tests for qemu-runtime-rs, and this commit aims to do it. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-25 17:45:57 +08:00
Alex Lyn	98df3e760c	runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean QEMU netdev_add QMP command requires the 'mq' (multi-queue) argument to be of boolean type (`true` / `false`). In runtime-rs the virtio-net device hotplug logic currently passes a string value (e.g. "on"/"off"), which causes QEMU to reject the command: ``` Invalid parameter type for 'mq', expected: boolean ``` This patch modifies `hotplug_network_device` to insert 'mq' as a proper boolean value of `true`. This fixes sandbox startup failures when multi-queue is enabled. Fixes #12136 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-25 17:34:36 +08:00
Alex Lyn	23393d47f6	tests: Enable stability tests for qemu-runtime-rs on nontee Enable the stability tests for qemu-runtime-rs CoCo on non-TEE environments Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-25 16:18:37 +08:00
Alex Lyn	f1d971040d	tests: Enable run-nerdctl-tests for qemu-runtime-rs Enable nerdctl tests for qemu-runtime-rs Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-25 16:14:50 +08:00
Alex Lyn	c7842aed16	tests: Enable stability tests for runtime-rs As previous set without qemu-runtime-rs, we enable it in this commit. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-25 16:12:12 +08:00
Alex Lyn	aadf1d6f71	Merge pull request #11932 from Apokleos/enhance-blk-params runtime-rs: Allow configuration of virtio block queue parameters	2025-11-25 15:24:12 +08:00
Dan Mihai	22d60a36c0	agent: allow disabling detect_initdata_device Allow users to build the Kata Agent using INIT_DATA=no to disable the detect_initdata_device() code loop and associated debug log output. Future additional improvements related to Init Data are tracked by #11532. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-25 02:44:28 +00:00
Fabiano Fidêncio	bb56a2e4d9	runtimeclasses: Fix nvidia-gpu podOverhead On `69c4fc4e76`, I've mistakenly changed the nvidia-gpu podOverhead while I should only have changed the TEE nvidia-gpu ones. Let's move it back to its original value. Reported-by: Joji Mekkattuparamban <jojim@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-24 21:43:29 +01:00
Zvonko Kaiser	55489818d6	gpu: TDX kernel param cleanup This setting is not needed anymore with Ubuntu 25.10 and the newest QEMU releases for TDX by Ubuntu. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-24 15:49:16 +01:00
Steve Horsman	e1e370091c	Merge pull request #12128 from fidencio/topic/kata-deploy-nfd-adjust-runtime-classe kata-deploy: nfd: Patch TEE runtimeclasses when needed	2025-11-24 14:05:43 +00:00
Steve Horsman	d437f875aa	Merge pull request #12126 from zvonkok/cold-plug-cleanup gpu: Cleanup Makefile	2025-11-24 14:01:49 +00:00
Zvonko Kaiser	77089fe5b3	Merge pull request #12115 from nheinemans-asml/main Kata-deploy: Add tolerations to daemonset and cleanup job	2025-11-24 09:00:42 -05:00
Manuel Huber	331515e1b8	ci: enable security policy for openvpn test With issue 11777 being resolved, this commit enables openvpn policy testing. The remaining work on the security policy required to successfully run this test case was to enable UDP ports for Service kinds and to use the mount path's last component instead of the volume name to construct the expected storage source path. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-23 17:23:43 +00:00
Manuel Huber	4f32816ea3	policy: Use mount path instead of volume name Use the mount path's last component instead of the volume name to construct the expected storage source path. Example: Name of a volumeMount is 'openvpn-config' and its mountPath is '/etc/openvpn/'. Without this change, we use 'openvpn-config' to calculate the expected storage source path. However, we need to use 'openvpn', because the shim uses the basename of the destination path as the source suffix and not the volume name. For reference, see the 'ShareFile' function in 'fs_share_linux.go', where the filename variable uses 'filepath.Base(m.Destination)'. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-23 17:23:43 +00:00
Manuel Huber	e4123a9848	policy: support UDP based Service types For Service kinds using the UDP protocol as port. An example is the openvpn-server-service.yaml file part of the openvpn CI test. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-23 17:23:43 +00:00
Fabiano Fidêncio	d0f3eb935e	kata-deploy: nfd: Patch TEE runtimeclasses when needed We've added logic to properly do the book keeping of the TEE keys when using NFD AND creating the runtime classes. However, we need to also take into consideration the case where the runtimeclasses are being created by the helm template, and in that case we just update what helm has deployed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-23 10:27:52 +01:00
Zvonko Kaiser	dce207397c	gpu: Cleanup Makefile Some VARS were introduced but not cleaned up with the recent cold-plug PR, doing this now Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-11-21 22:03:34 +00:00
Zvonko Kaiser	8afcdae31f	Merge pull request #12092 from manuelh-dev/mahuber/cc-gpu-ci-smi-srs tests: nvidia: cc: Remove nvrc.smi.srs=1 parameter	2025-11-21 08:26:13 -05:00
Steve Horsman	37dd055283	Merge pull request #12090 from stevenhorsman/required-tests-update-14-nov-2025 Required tests update 14 nov 2025	2025-11-21 12:05:05 +00:00
nheinemans-asml	ef9d4e8b0d	kata-deploy: Add tolerations value to kata-deploy This allows the daemonset and cleanup job to run on tainted nodes. fixes #12114 Signed-off-by: nheinemans-asml <nick.heinemans@asml.com> Signed-off-by: nheinemans-asml <97238218+nheinemans-asml@users.noreply.github.com>	2025-11-21 09:49:47 +01:00
Manuel Huber	dfc229f51e	tests: nvidia: cc: Remove nvrc.smi.srs=1 parameter Remove the nvrc.smi.srs=1 parameter from the kernel command line. In CC use cases, the attestation agent is expected to set the GPU ready state. For the CUDA vectorAdd case where attestation agent is not being used, we set the ready state by adding the kernel command line parameter through an annotation. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:35:05 +01:00
Manuel Huber	6c6fc50aa5	tests: nvidia: cc: allow-all policy and init-data Add an allow-all policy for the CC GPU tests and ensure the init-data device is being created (hypervisor annotations). Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	7e20118c8e	tests: nvidia: move secret definitions to bottom The add_allow_all_policy_to_yaml in tests_common.sh needs some improvements so that this function can support pod manifests with different resource kinds. For now, moving the Secret definition to the bottom so that we can create a default policy for the Pod. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	ffd5443637	tests: nvidia: adapt is_aks_cluster The qemu-nvidia-gpu handlers should not cause is_aks_cluster to return 1. Otherwise, CI logic will assume these hypervisors run on AKS hosts, see the following message in CI w/o this change: INFO: Adapting common policy settings for AKS Hosts Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Manuel Huber	f2bdd12e5e	tests: nvidia: Check KATA_HYPERVISOR var Fail explicitly when a wrong KATA_HYPERVISOR variable is provided. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-21 09:24:15 +01:00
Xuewei Niu	bf967b81cc	runtime-rs: Bump cgroups-rs to v0.5.0 The new version fixes some issues with systemd version, path verification. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-11-21 09:06:26 +01:00
Fabiano Fidêncio	6b40b59861	tests: Reduce KBS deployment check flakiness We currently start a pod that does a `wget` to the KBS address, and fails after 5 seconds. By the time it fails and reports back, we can see that KBS is actually running, but the workflow failed as the checker failed. :-/ Let's give it more time for the KBS to show up, and the flakiness should go away. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-20 19:29:26 +01:00
Fabiano Fidêncio	35672ec5ee	tests: cc: Test authenticated images with force guest pull As this should simply work. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-20 19:02:15 +01:00
Fupan Li	b86e7ff42b	Merge pull request #12087 from jojimt/device_cold_plug shim: Support device cold plug with Kubernetes	2025-11-20 19:17:13 +08:00
Joji Mekkattuparamban	7dc292094c	shim: go vendor changes for cold plug support Vendor in the kubelet pod resources API. Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2025-11-20 10:58:55 +01:00
Joji Mekkattuparamban	5aa184925a	shim: Support device cold plug with Kubernetes Utilize Kubelet's Pod Resource API to determine device allocations for the Pod during sandbox creation. Use CDI files to translate the device IDs to corresponding device paths and perform device injection. Fixes #12009 Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>	2025-11-20 10:58:55 +01:00
Manuel Huber	477ca3980b	tests: nvidia: cc: Re-enable multi GPU test case Use the pod name variable so that kubectl wait finds the pod. Currently, kubectl waits for nvidia-nim-llama-3-2-nv-embedqa-1b-v2, not for nvidia-nim-llama-3-2-nv-embedqa-1b-v2-tee Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-20 10:05:46 +01:00
Zvonko Kaiser	89cd561340	Merge pull request #12059 from manuelh-dev/mahuber/bb-debug-v2 gpu: introduce a new devkit build flag to produce a rootfs for developers	2025-11-19 13:03:46 -05:00
Steve Horsman	8c6c31555a	Merge pull request #12111 from fidencio/topic/ci-fix-erofs-ci tests: k8s: Fix typo in authenticated tests	2025-11-19 16:08:48 +00:00
Manuel Huber	3966864376	gpu: introduce devkit build flag Introduce a new devkit parameter which will produce a rootfs without chiselling. This results in a larger rootfs with various packages and binaries being included, for instance, enabling the use of the debug console. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-19 15:50:03 +01:00
Manuel Huber	2c9e0f9f4f	gpu: add signed-by to package sources Pin to specific key. CUDA package sources in /etc/apt/sources.list.d already use a specific key. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-19 15:50:03 +01:00
Ruoqing He	54bfbf5687	build: Exclude tools from root workspace There are rust packages being cloned and built inside tools/packaging/kata-deploy/local-build/build folder, which may mislead those packages to think they are part of the kata root workspace. Exclude the directory to avoid that. Reported-by: Fabiano Fidêncio <ffidencio@nvidia.com> Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-11-19 15:49:25 +01:00
Fabiano Fidêncio	ae463642ed	tests: k8s: Fix typo in authenticated tests The person who introduced the check, someone named Fabiano Fidêncio, forgot a `$` in a variable assignment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-19 11:59:59 +01:00
Steve Horsman	87b180383e	Merge pull request #11802 from kata-containers/dependabot/github_actions/oras-project/setup-oras-1.2.4 build(deps): bump oras-project/setup-oras from 1.2.2 to 1.2.4	2025-11-19 09:58:37 +00:00
dependabot[bot]	ede5ac9c2d	build(deps): bump the bit-vec group across 2 directories with 1 update Bumps the bit-vec group with 1 update in the /src/agent directory: [bit-vec](https://github.com/contain-rs/bit-vec). Bumps the bit-vec group with 1 update in the /src/tools/agent-ctl directory: [bit-vec](https://github.com/contain-rs/bit-vec). Updates `bit-vec` from 0.6.3 to 0.8.0 - [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md) - [Commits](https://github.com/contain-rs/bit-vec/commits) Updates `bit-vec` from 0.6.3 to 0.8.0 - [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md) - [Commits](https://github.com/contain-rs/bit-vec/commits) --- updated-dependencies: - dependency-name: bit-vec dependency-version: 0.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bit-vec - dependency-name: bit-vec dependency-version: 0.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bit-vec ... Signed-off-by: dependabot[bot] <support@github.com>	2025-11-19 10:43:25 +01:00
stevenhorsman	b75d90b483	ci: Comment out snp ci from required-tests The snp CI has not been required for a while and has recently been broken, so comment it out from the list of required jobs. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-19 09:39:36 +00:00
stevenhorsman	ae71921be2	ci: Update build-checks name in required-tests to update the required-tests to match. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-19 09:39:36 +00:00
stevenhorsman	112ed9bb46	ci: Comment out run-nydus from required-tests The run-nydus tests are not stable and blocking PRs, so make them non-required temporarily until they can be looked at Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-19 09:38:38 +00:00
Fupan Li	478a5ff693	Merge pull request #12109 from Apokleos/enable-cocodev-rs tests: Enable AUTO_GENERATE_POLICY for qemu-coco-dev-runtime-rs	2025-11-19 12:05:22 +08:00
Alex Lyn	1da225efc5	tests: Enable AUTO_GENERATE_POLICY for qemu-coco-dev-runtime-rs Enable auto-generate policy on cbl-mariner Hosts for qemu-coco-dev-runtime-rs if the user didn't specify an AUTO_GENERATE_POLICY value. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-19 10:44:03 +08:00
Alex Lyn	8d85548711	Merge pull request #12102 from Apokleos/rs-copyfile-devcgrp runtime-rs: Clear Linux.Resources.Devices completely and correct the guest path for container mount binding	2025-11-19 09:05:59 +08:00
Fabiano Fidêncio	8c02b5b913	tests: nvidia: cc: Temporarily skip multi GPU for nim tests We will re-enable this one later on once the changes to properly cold plug multi GPUs are merged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	69c4fc4e76	kata-deploy: Adjust podOverhead for GPU TEEs Let's just move the podOverhead to a gigantic value, as we do need pod sandboxes as big as that, and we've noticed QEMU being OOM killed with smaller overheads. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	94ed4051b0	tests: nvidia: cc: Increase RAM for NIM pods Those need to pull the models inside the guest, and the guest has 50% of its memory "allowed" to be used as tmpfs, so, we gotta use the RAM that we have. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	e5062a056e	tests: nvidia: cc: Adjust timeouts on NIM pods Timeout increases for confidential computing slowness: * livenessProbe: * initialDelaySeconds: 15 → 120 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 3 → 10 * readinessProbe: * initialDelaySeconds: 15 → 120 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 3 → 10 * startupProbe: * initialDelaySeconds: 40 → 180 seconds * timeoutSeconds: 1 → 10 seconds * failureThreshold: 180 → 300 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	dee6f2666b	runtime: nvidia: Increase the guest pull timeout to 20 minutes Yes, we're dealing with a combination of large images and image-rs concurrent image layers being not optimal. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	6be43b2308	tests: nvidia: Retry kubectl commands As with CoCo some of the commands may take longer, way longer than expected. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	bb5bf6b864	tests: nvidia: nims: Use the current auths format for KBS We cannot use the same format used for docker, as it includes username and password, while what's expected when using Trustee does not. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Fabiano Fidêncio	92da54c088	tests: nvidia: cc: Enable NIM tests Now that we've bumped Trustee to a version that supports the NVIDIA remote verifier, let's re-enable the tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 22:29:42 +01:00
Steve Horsman	74254cba8f	Merge pull request #12106 from stevenhorsman/gatekeeper-paging-reduction ci: Adjust gatekeeper's job fetch	2025-11-18 14:08:26 +00:00
Fabiano Fidêncio	8eca0814bd	tests: Run authenticated tests with experimental_force_guest_pull As it should be supported. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 14:46:48 +01:00
Fabiano Fidêncio	5beb1af202	tests: Pass EXPERIMENTAL_FORCE_GUEST_PULL to the test Right now we have only been passing the env var to the deployment script, but we really need to pass it to the tests script as well. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-18 14:46:48 +01:00
Markus Rudy	638cad18ef	Merge pull request #11978 from burgerdev/genpolicy-test-refactor genpolicy: prepare integration tests for programmatic modification	2025-11-18 09:54:40 +01:00
stevenhorsman	9f0fea1e34	ci: Adjust gatekeeper's job fetch Try and reduce the page limit of each job request to avoid the chances of us tripping over github's 10s api limit. All credit to @burgerdev for the investigation and suggestion! Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-18 08:22:36 +00:00
Alex Lyn	6ceacee0b9	runtime-rs: Add queue_size and num_queues for block volumes Add the related block queue_size and num_queues to volumes based on block devices. This is very important for IO performance. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Alex Lyn	30a9a8b4ec	runtime-rs: Add queue_size and num_queues for block device Add the queue_size and num_queues in block device config when the block device is handled. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Alex Lyn	9b0204a2de	runtime-rs: Set Clh's disk queue_size and num_queues Previously, Clh's disk queue_size and num_queues settings were hardcoded; they should be configurable with user-defined values. This commit addresses the issue by passing these settings through. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Alex Lyn	f19c48505c	runtime-rs: Introduce queue_size and num_queues in BlockConfig We usually pass the related block config via BlockConfig, so queue_size and num_queues are introduced in BlockConfig to let users set them easily. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Alex Lyn	e958993348	kata-types: Introduce queue_size and num_queues within BlockDeviceInfo Add two fields of queue_size and num_queues in BlockDeviceInfo to allow users to set the related items via configurations Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Alex Lyn	780c45de23	runtime-rs: Add support queue_size and num_queues within configurations Add related items for block device queue size and num queues in configurations. And users can set the related items by configurations. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 14:53:43 +08:00
Steve Horsman	ac021e2ab9	Merge pull request #11563 from RuoqingHe/single-workspace build: Introduce root workspace for rust components	2025-11-18 06:36:18 +00:00
Alex Lyn	d071384bba	runtime-rs: Clear Linux.Resources.Devices completely The current implementation causes issues with the Agent Policy nontee CI tests, as Kata-Agent does not allow any configuration for `count(Linux.Resources.Devices) == 0`. This commit ensures that Linux.Resources.Devices, including all its values, is completely cleared from the OCI Runtime Specification before being passed to the Kata-Agent. This addresses the CI failure by enforcing the required empty state for the Devices cgroup configuration. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 13:40:09 +08:00
Xuewei Niu	ca8b3300d3	Merge pull request #11620 from zhangckid/indep_iothreads_upstream Runtime/QEMU: Introduce virtio-blk with iothreads and enable Indep iothreads framework	2025-11-18 11:08:51 +08:00
Alex Lyn	5982e66503	runtime-rs: Ensure unique guest path for container mount binding Previously, CopyFile implementation attempted to reuse existing guest paths for subsequent containers within the same Pod. This prevented correct bind mounting of shared configurations (e.g., ConfigMaps, Service Accounts) into the later containers within a multi-containers pod, as they lacked their own allocated guest path. This commit modifies the logic to create a unique guest path for every container that requires file propagation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-18 11:03:26 +08:00
Fupan Li	f791be1abb	Merge pull request #12064 from Apokleos/policy-optional-path genpolicy: Make cpath compatible with both runtime-rs and runtime-go	2025-11-18 10:19:26 +08:00
Ruoqing He	e6b24cd789	build: Exclude crates with no workspace setup Crates with no workspace setup would assume they are in the root workspace, which is not ready for them yet. Exclude them for now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-11-18 01:39:48 +00:00
Ruoqing He	6068242bf1	build: Move dragonball to root workspace Move dragonball and all the members of its workspace into the root workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-11-18 01:39:48 +00:00
Ruoqing He	3fbe693658	build: Introduce root workspace for rust components Add Cargo.toml at repo root, and use this root workspace for as many Rust components of Kata Containers as possible. This would enable us to share a common Cargo.lock file, and reduce the noise from dependabot. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-11-18 01:39:48 +00:00
Steve Horsman	650ada7bcc	Merge pull request #12101 from stevenhorsman/release/3.23.0 release: Bump version to 3.23.0	2025-11-17 21:09:45 +00:00
stevenhorsman	70f1f4a3ac	release: Bump version to 3.23.0 Bump VERSION and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 19:27:25 +00:00
stevenhorsman	c47e8d0ab8	kata-ctl: update backtrace and local references Similar to #12075, bump backtrace to 0.3.76 to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 As a side effect this brought in loads of other crate changes, which I think are due to it bumping the local dependencies that this package builds on. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	d16620bae1	runk: update backtrace to 0.3.76 Similar to #12075, bump backtrace to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	0b259e4fcf	agent-ctl: update backtrace to 0.3.76 Similar to #12075, bump backtrace to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	4abf79f16f	genpolicy: update backtrace to 0.3.76 Similar to #12075, bump backtrace to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	4158d9a94a	runtime-rs: update flate2 & backtrace Similar to #12075, bump flate2 and backtrace to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	fe10db233c	runtime-rs: Remove libbacktrace feature from backtrace This feature was removed in https://github.com/rust-lang/backtrace-rs/pull/615 which shows that the implementation was removed over two years ago, so get rid of this feature, so we can move to newer versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
stevenhorsman	398e7987cd	dragonball: update flate2 & backtrace Similar to #12075, bump flate2 and backtrace to remove the dependency on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 20:13:04 +01:00
Steve Horsman	04c7d11689	Merge pull request #12044 from lifupan/fix_update_interface runtime: fix the issue of update interface error	2025-11-17 14:45:36 +00:00
Fupan Li	763a0d8675	runtime: fix the issue of update interface error Since network device hotplug is an asynchronous operation, the hotplug operation may have returned before the network device is ready in the guest, so it's better to retry this operation until the device is ready in the guest. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-17 13:58:36 +01:00
Steve Horsman	b3eb794662	Merge pull request #12098 from stevenhorsman/csi-kata-direct-volume-xz-0.5.15-bump csi-kata-directvolume: Bump xz module	2025-11-17 12:47:28 +00:00
Fabiano Fidêncio	75996945aa	kata-deploy: try-kata-values.yaml -> values.yaml This makes the user experience better, as the admin can deploy Kata Containers without having to download / set up any additional file. Of course, if the admin wants something more specific, examples are provided. Tests and documentation are updated to reflect this change. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-17 12:16:17 +01:00
Alex Lyn	71a9ecf9f8	Merge pull request #12095 from lifupan/fix_vcpu_number runtime-rs: fix the issue of wrong vcpu number	2025-11-17 19:11:48 +08:00
stevenhorsman	502a3ce3b6	csi-kata-directvolume: Bump xz module Bump github.com/ulikunitz/xz to v0.5.15, to remediate vulnerability GO-2025-3922 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-17 10:20:50 +00:00
Markus Rudy	b771bb6ed3	genpolicy: log requests as jsonlines The current format of genpolicy request logs looks a bit like JSON, but it does not parse out of the box and needs post-processing with sed, for example. This commit changes the log format to jsonlines[1], which is basically newline-delimited compact JSON values. Compared to standard JSON, this allows streaming output. The resulting file can be converted and processed programmatically, for example with `jq -s`. The fields are also adjusted to match the field names of TestRequest, so that the logged requests can be used immediately in tests. [1]: https://jsonlines.org/ Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-11-17 09:01:00 +01:00
Markus Rudy	eb6cf025b3	genpolicy: format testcases.json and sort by key This should allow keeping future diffs minimal. The files were formatted with `jq -S`, which should be used after future updates to the test case files. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-11-17 09:01:00 +01:00
Markus Rudy	851f8258af	genpolicy: move testcase request type out of struct Storing the request type outside the request object has two benefits: * The request JSON passed to the Rego engine matches more closely what would be passed by the agent (no `type` field). * If we want to update the requests, it's easier to insert them into a dedicated field, rather than inserting them and amending the type field. This is a first step towards programmatic updates of testcase files. This commit also adds the 'Request' suffix to the test case enum, such that we can use the 'ep' input for allow_request directly. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-11-17 09:01:00 +01:00
zhangchen.kidd	914063bcdd	runtime: documentation: Add virtio-blk support iothread comments in docs Add comments describing the "EnableIOThreads" flag as a switch for the virtio-blk (based on IndepIOThreads) driver. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	9128112e3d	runtime: qemu: Add Independent IOThread support for virtio-blk Make hotplugged virtio-blk devices attach to Independent IOThread 0 by default when EnableIOThreads and IndepIOThreads are enabled. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	fea954df7a	runtime: qemu: qmp: Add iothread args for QMP ExecutePCIDeviceAdd QEMU already supports device_add with iothread args. Give Kata the ability to hotplug PCI devices with IOThreads. Currently, only QEMU is supported as the hypervisor; not sure it works for stratovirt. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	af203b7dee	runtime: qemu: introduce setup iothread function Move the setup of the original virtio-scsi iothread and the new independent iothread into a dedicated method for handling the related logic. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	d20712aa9e	runtime: qemu: Add comments for virtio-scsi iothread args In the current implementation, only virtio-scsi uses this iothread path. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	f9d4829e77	runtime: qemu: Add indep_iothreads for QEMU hypervisor toml Add the indep_iothreads arg to the QEMU-related configuration toml. The default value is 0. Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:03 +08:00
zhangchen.kidd	c3d3684f81	runtime: Introduce independent IOThreads framework Introduce an independent IOThread framework for Kata Containers. What indep_iothreads is: This new feature introduces a way to pre-allocate IOThreads for the QEMU hypervisor (other hypervisors may be able to support it too). Independent IOThreads enable IO to be processed in a separate thread, which generally improves the performance of each module by keeping it out of the QEMU main loop. Why indep_iothreads is needed: In the Kata Containers implementation, many devices are based on the hotplug mechanism. The real workload container may not share the same lifecycle as the VM, and it may need to hotplug/unplug new disks or other devices without destroying the VM. So we can keep the IOThreads with the VM as an IOThread pool (some devices need multiple iothreads for performance, like virtio-blk vq-mapping), and hotplugged devices can attach to/detach from an IOThread according to business needs. At the same time, QEMU also supports "x-blockdev-set-iothread" to change iothreads (but it needs to stop the VM for data safety). Current QEMU has many devices that support iothreads: virtio-blk, virtio-scsi, virtio-balloon, monitor, colo-compare, etc. How it works: Add a new item named "indep_iothreads" to the hypervisor struct in the toml. The default value is 0, and it reuses the original "enable_iothreads" as the switch. If "indep_iothreads" != 0 and "enable_iothreads" = true, it adds the qmp object -iothread indepIOThreadsPrefix_No at VM startup. The first user is virtio-blk, which attaches to indep_iothread_0 by default when iothreads are enabled for virtio-blk. Thanks Chen Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>	2025-11-17 15:55:01 +08:00
Fupan Li	c74a2650e9	runtime-rs: fix the issue of wrong vcpu number Commit `1f95d9401b` (runtime-rs: change representation of default_vcpus from i32 to f32) changed default_vcpus to a floating-point value. When the vCPU number is less than 1.0, directly converting the floating-point number to an integer truncates it to 0. Therefore, it needs to be rounded up before converting it back to an integer. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-17 10:09:51 +08:00
Alex Lyn	daca7b268b	genpolicy: Make cpath compatible with both runtime-rs and runtime-go Update the `cpath` variable in the policy template to support the optional `/passthrough` subpath used by runtime-rs. This ensures that mount source path validation works correctly for both runtime implementations. By changing `cpath` to include the `(?:/passthrough)?` regular expression fragment, we make the `/passthrough` segment optional. The updated `cpath`: `/run/kata-containers/shared/containers(?:/passthrough)?` This single regex pattern now correctly matches both: 1.`/run/kata-containers/shared/containers/<sandbox-id>/...` (runtime-go) 2.`/run/kata-containers/shared/containers/passthrough/<sandbox-id>/...` (runtime-rs) This elegantly resolves the compatibility issue without needing to add separate or conditional logic to the policy rules, making the policy more robust and maintainable. Fixes: #12063 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-17 09:36:19 +08:00
Fabiano Fidêncio	2e000129a9	kata-deploy: tests: Add example values files for easy Kata deployment Add three example values files to make it easier for users to try out different Kata Containers configurations: - try-kata.values.yaml: Enables all available shims - try-kata-tee.values.yaml: Enables only TEE/confidential computing shims - try-kata-nvidia-gpu.values.yaml: Enables only NVIDIA GPU shims These files use the new structured configuration format and serve as ready-to-use examples for common deployment scenarios. Also update the README.md to document these example files and how to use them. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	8717312599	tests: Migrate helm_helper to use new structured configuration Update the helm_helper function in gha-run-k8s-common.sh to use the new structured configuration format instead of the legacy env.* format. All possible settings have been migrated to the structured format: - HELM_DEBUG now sets root-level 'debug' boolean - HELM_SHIMS now enables shims in structured format with automatic architecture detection based on shim name - HELM_DEFAULT_SHIM now sets per-architecture defaultShim mapping - HELM_EXPERIMENTAL_SETUP_SNAPSHOTTER now sets snapshotter.setup array - HELM_ALLOWED_HYPERVISOR_ANNOTATIONS now sets per-shim allowedHypervisorAnnotations - HELM_SNAPSHOTTER_HANDLER_MAPPING now sets per-shim containerd.snapshotter - HELM_AGENT_HTTPS_PROXY and HELM_AGENT_NO_PROXY now set per-shim agent proxy settings - HELM_PULL_TYPE_MAPPING now sets per-shim forceGuestPull/guestPull settings - HELM_EXPERIMENTAL_FORCE_GUEST_PULL now sets per-shim forceGuestPull/guestPull The test helper automatically determines supported architectures for each shim (e.g., qemu-se supports s390x, qemu-cca supports arm64, qemu-snp/qemu-tdx support amd64, etc.) and applies per-shim settings to the appropriate shims based on HELM_SHIMS. Only HELM_HOST_OS remains in legacy env.* format as it doesn't have a structured equivalent yet. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	aa89fda7fc	kata-deploy: Document new structured configuration and deprecation Add comprehensive documentation for the new structured configuration format, including: - Migration guide from legacy env.* format - List of deprecated fields with removal timeline (2 releases) - Examples of the new structured format - Explanation of key benefits - Backward compatibility notes The documentation makes it clear that the legacy format is deprecated but will continue to work during the transition period. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	119893b8e8	kata-deploy: Add backward compatibility for legacy env.* configuration This commit adds backward compatibility support to ensure existing configurations using the legacy env.* format continue to work. The helper functions now check for legacy env.* values first, and only fall back to the new structured format if legacy values are not set. This allows for gradual migration without breaking existing deployments. Backward compatibility is maintained for: - env.shims, env.shims_* (per architecture) - env.defaultShim, env.defaultShim_* (per architecture) - env.allowedHypervisorAnnotations - env.snapshotterHandlerMapping_* (per architecture) - env.pullTypeMapping_* (per architecture) - env.agentHttpsProxy, env.agentNoProxy - env._experimentalSetupSnapshotter - env._experimentalForceGuestPull_* (per architecture) - env.debug Legacy env vars (SHIMS, DEFAULT_SHIM, etc.) are still set in the DaemonSet when using the old format to maintain full compatibility. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	ae3fb45814	kata-deploy: Introduce structured configuration format for shims This commit introduces a new structured configuration format for configuring Kata Containers shims in the Helm chart. The new format provides: - Per-shim configuration with enabled/supportedArches - Per-shim snapshotter, guest pull, and agent proxy settings - Architecture-aware default shim configuration - Root-level debug and snapshotter setup configuration All shims are disabled by default and must be explicitly enabled. This provides better type safety and clearer organization compared to the legacy env.* string-based format. The templates are updated to use the new structure exclusively. Backward compatibility will be added in a follow-up commit. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	e85d584e1c	kata-deploy: script: Fix FOR_ARCH handling As some of the global vars can be empty, we should actually check their _FOR_ARCH version instead. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	397289c67c	kata-deploy: script: Handle {https,no}_proxy per shim As we're making the values.yaml more user friendly, we actually have to handle the https_proxy and no_proxy entries per shim, instead of having this globally available, as this will only affect images being pulled inside the guest (as in, when using TEE variations of the shims). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
Fabiano Fidêncio	f62d9435a2	runtimeclasses: firecracker is not a valid one At least not for now, and it was mistakenly added to the list. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-15 09:36:14 +01:00
nheinemans-asml	3380458269	kata-deploy: Add daemonsets to the RBAC Add missing rules which are necessary for dealing with daemonsets as kata-deploy now checks for the NFD daemonset as part of its script. fixes #12083 Signed-off-by: nheinemans-asml <97238218+nheinemans-asml@users.noreply.github.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-14 17:16:58 +01:00
Simon Kaegi	716c55abdd	kernel: adds nft bridging and filtering support for IPv4 and IPv6 Adds a practical set of kernel config used by docker-in-docker and kind for network bridging and filtering. It also includes the matching IPv6 support to allow tools like kind that require IPv6 network policies to work out of the box. This support includes: - nftables reject and filtering support for inet/ipv4/ipv6 - Bridge filtering for container-to-container traffic - IPv6 NAT, filtering, and packet matching rules for network policies - VXLAN and IPsec crypto support for network tunneling - TMPFS POSIX ACL support for filesystem permissions The configs are organized across fragment files: - common/fs.conf: TMPFS ACL support - common/crypto.conf: IPsec/VXLAN crypto algorithms - common/network.conf: VXLAN, IPsec ESP, nftables bridge/ARP/netdev - common/netfilter.conf: IPv6 netfilter stack and nftables advanced features Fixes: #11886 Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>	2025-11-14 15:57:47 +01:00
Dan Mihai	5cc1024936	ci: k8s: AUTO_GENERATE_POLICY for coco-dev Re-enable AUTO_GENERATE_POLICY for coco-dev Hosts, unless PULL_TYPE is "experimental-force-guest-pull", or the caller specified a different value for AUTO_GENERATE_POLICY. Auto-generated Policy has been disabled accidentally and recently for these Hosts, by a GHA workflow change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-14 15:53:34 +01:00
Dan Mihai	73ad83e1cc	genpolicy: update workaround for guest pull No longer skip parsing the pause container image when using the recently updated AKS pause container handling - i.e. when pause_container_id_policy == "v2". This was the easiest CI fix for guest pull + new AKS given the current tests. When adding new UID/GID/AdditionalGids tests in the future, these workarounds might need additional updates. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-14 15:53:34 +01:00
Steve Horsman	7bcb971398	Merge pull request #12075 from burgerdev/genpolicy-archived-deps retire `adler` dependency	2025-11-14 14:51:47 +00:00
Steve Horsman	1d0d066869	Merge pull request #12069 from Amulyam24/static-checks-ppc github: run agent checks for Power on ppc64le instead of ubuntu-24.04-ppc64le	2025-11-14 10:18:37 +00:00
Markus Rudy	dd59131924	runtime-rs: update flate2 to 1.1.5 The update removes the deprecated adler crate from our dependencies. In addition, we're switching to the default backend (miniz_oxide), which is a pure Rust implementation and thus much more portable. The performance impact is negligible, because flate2 is only used for initdata decompression, which is limited to a couple of MiB anyway. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-11-14 11:11:44 +01:00
Markus Rudy	3949492f19	genpolicy: update flate2 to 1.1.5 The update removes the deprecated adler crate from our dependencies. In addition, we're switching to the default backend (miniz_oxide), which is a pure Rust implementation and thus much more portable. The performance impact is acceptable for a developer tool. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-11-14 11:10:29 +01:00
Steve Horsman	0ab71771ab	Merge pull request #11447 from kata-containers/runtime-rs-qemu-coco-dev-config Runtime rs qemu coco dev config	2025-11-13 19:12:57 +00:00
stevenhorsman	1ef3e3b929	ci: Switch gatekeeper auth header The GitHub API suggests that `Authorization: Bearer <YOUR-TOKEN>` is the way to set the auth token, but it also mentions that `token` should work, so it's unclear if this will help much, but it shouldn't harm. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 19:01:21 +01:00
stevenhorsman	b7abcc4c37	tests: Fix wildcard skip in k8s-cpu-ns The formatting wasn't quite right, so the `qemu-coco-dev-runtime-rs` hypervisor wasn't skipping this test Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:21:05 +00:00
Alex Lyn	bda6bbcad3	runtime-rs: Set `static_sandbox_resource_mgmt` to true within nontee Introduce a flag `DEFSTATICRESOURCEMGMT_COCO` for setting static sandbox resource management with default true. And then set it to the item of `static_sandbox_resource_mgmt` in configuration. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	b51af53bc7	tests/k8s: call teardown_common in some policy tests The teardown_common will print the description of the running pods, kill them all and print the system's syslogs afterwards. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
Alex Lyn	efc6aee4f6	runtime-rs: Support agent policy Support agent policy within runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	79082171ca	workflows: Add Delete AKS cluster timeout When testing this branch, on several occasions the Delete AKS cluster step has hung for multiple hours, so add a timeout to prevent this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	0335012824	tests/k8s: Enable tests for qemu-coco-dev-runtime-rs Add the runtime class to the non-tee tests and enable it to run in the test code Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	a1ddd2c3dd	kata-deploy: Add kata-qemu-coco-dev-runtime-rs runtime class Add the runtime class and shim references for the new non-tee runtime-rs class Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
Alex Lyn	64da581f6e	kata-types: Support create_container_timeout set within configuration Since it aligns with the create_container_timeout definition in runtime-go, we need to set the value in configuration.toml in seconds, not milliseconds. We must also convert it to milliseconds when the configuration is loaded for request_timeout_ms. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-13 14:18:43 +00:00
stevenhorsman	af2c2d9d00	runtime-rs: Add qemu-coco-dev-runtime-rs Create non-tee runtime class for runtime-rs qemu CoCo development without requiring TEE hardware. Based on the qemu-runtime-rs config, but with updated guest image, kernel and shared_fs Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-13 14:18:43 +00:00
Amulyam24	b32b54c4af	github: do not run agent checks for Power on ubuntu-24.04-ppc64le The new environment of Power runners for agent checks is causing two test case failures w.r.t. selinux and inode, which need further understanding and are most likely due to the environment change rather than the agent. Fall back to running agent checks on the original ppc64le self-hosted runners. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2025-11-13 15:56:43 +05:30
Gao Xiang	657c4406cd	runtime: Add preliminary support for EROFS native rwlayers So that the writable data will be written to a separate storage instead of tmpfs in the guest. Note that a cleaner way should use new containerd custom mount type but I don't have time on this for now. More details, see: https://github.com/containerd/containerd/blob/v2.2.0/docs/snapshotters/erofs.md#quota-support Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-11-13 09:55:06 +01:00
Steve Horsman	92758a17fe	Merge pull request #12078 from kata-containers/switch-to-ubuntu-24.04-arm-runner workflows: Switch to ubuntu-24.04-arm runner	2025-11-12 16:35:52 +00:00
stevenhorsman	ba56a2c372	workflows: Switch to ubuntu-24.04-arm runner As the arm 22.04 runner isn't working at the moment, let's test the 24.04 version to see if that is better. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-12 15:37:09 +00:00
Fabiano Fidêncio	a04cdbc40f	tests: Enforce qemu-coco-dev for experimental_force_guest_pull The fact that we were not explicitly setting the VMM was leading to us testing with the default runtime class (qemu). :-/ Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-12 16:07:05 +01:00
Wainer Moschetta	e31313ce9e	Merge pull request #11030 from ldoktor/webhook2 tools.kata-webhook: Add support for only-filter	2025-11-12 11:21:23 -03:00
Hyounggyu Choi	2dec247a54	Merge pull request #12038 from lifupan/fix_smaller-memeory runtime-rs: fix the issue of hot-unplug memory smaller	2025-11-12 11:22:04 +01:00
dependabot[bot]	c715d8648c	build(deps): bump oras-project/setup-oras from 1.2.2 to 1.2.4 Bumps [oras-project/setup-oras](https://github.com/oras-project/setup-oras) from 1.2.2 to 1.2.4. - [Release notes](https://github.com/oras-project/setup-oras/releases) - [Commits](`5c0b487ce3...22ce207df3`) --- updated-dependencies: - dependency-name: oras-project/setup-oras dependency-version: 1.2.4 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com>	2025-11-12 09:45:27 +00:00
Markus Rudy	2c8d0688f2	Merge pull request #12068 from katexochen/p/full-controllers genpolicy: support full DeploymentSpec, JobSpec; cleanup CronJobSpec	2025-11-12 10:35:38 +01:00
Fabiano Fidêncio	6d3c20bc45	riscv: Introduce its own nightly tests By doing this, the ones interested in RISC-V support can still have good visibility of its state, without the extra noise in our CI. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-12 09:46:17 +01:00
Zvonko Kaiser	d783e59b42	Merge pull request #12055 from fidencio/topic/coco-bump-trustee versions: Bump Trustee	2025-11-12 02:48:16 -05:00
dependabot[bot]	edacdcb0bc	build(deps): bump github.com/opencontainers/selinux in /src/runtime Bumps [github.com/opencontainers/selinux](https://github.com/opencontainers/selinux) from 1.12.0 to 1.13.0. - [Release notes](https://github.com/opencontainers/selinux/releases) - [Commits](https://github.com/opencontainers/selinux/compare/v1.12.0...v1.13.0) --- updated-dependencies: - dependency-name: github.com/opencontainers/selinux dependency-version: 1.13.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-11-11 23:15:40 +01:00
Steve Horsman	1954dfe349	Merge pull request #12071 from stevenhorsman/update-required-test-docker-and-stratovirt ci: Remove stratovirt & docker tests from required	2025-11-11 21:19:25 +00:00
Zvonko Kaiser	76e4e6bc24	Merge pull request #12061 from Apokleos/correct-unexpected-cap tests: Correct unexpected capability for policy failure test	2025-11-11 12:20:33 -05:00
Fabiano Fidêncio	d82eb8d0f1	ci: Drop docker tests We have had those tests broken for months. It's time to get rid of those. NOTE that we could easily revert this commit and re-add those tests as soon as we find someone to maintain and be responsible for such integration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 17:02:02 +01:00
stevenhorsman	8b5df4d360	ci: Remove stratovirt & docker tests from required As stratovirt CI was removed in #12006 we should remove the jobs from required. Also the docker tests have been commented out for months, and we are considering removing them, so clean this file up. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-11-11 15:38:51 +00:00
Steve Horsman	4b33000c56	Merge pull request #12067 from Apokleos/fix-guest-emptydir runtime-rs: Fix several incorrect settings with guest empty dir.	2025-11-11 15:21:31 +00:00
Lukáš Doktor	ca91073d83	tools.kata-webhook: Add support for only-filter Sometimes it's hard to enumerate all blacklisted namespaces, so let's add a regular-expression-based only-filter to allow specifying namespaces that should be mutated. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-11-11 15:21:15 +01:00
dependabot[bot]	281f69a540	build(deps): bump github.com/containerd/containerd in /src/runtime Bumps [github.com/containerd/containerd](https://github.com/containerd/containerd) from 1.7.27 to 1.7.29. - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.27...v1.7.29) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-version: 1.7.29 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-11-11 14:23:47 +01:00
Paul Meyer	ec6896e96b	genpolicy: remove non-existing field from CronJobSpec There is no backoffLimit on CronJobSpec, also no additional fields. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-11-11 11:12:48 +01:00
Paul Meyer	258aed3cd3	genpolicy: support full JobSpec Based on https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#job-v1-batch The JOB_COMPLETION_INDEX env will be set if completionMode is "indexed". Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-11-11 11:12:48 +01:00
Paul Meyer	f0ffaa9a6b	genpolicy: support full DeploymentSpec The added fields are relevant only to the controller, so they should not impact security and therefore aren't of interest for policies. Adding according to https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.33/#deployment-v1-apps Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-11-11 11:07:18 +01:00
Alex Lyn	79d1a6ed8f	runtime-rs: Correct the mount type for emptydir with local storage The previous Mount.type setting of `bind` is wrong; for local storage, the type of Mount should be `local`. This commit corrects the type to "local". Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 17:09:33 +08:00
Alex Lyn	935ecf2765	runtime-rs: Fix disable_guest_empty_dir parameters order The disable_guest_empty_dir parameter order is wrong, which makes the bool value incorrect and produces a wrong result. This commit corrects the parameter order. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 16:59:00 +08:00
Fabiano Fidêncio	9d6f6bac37	agent-ctl: Bump image-rs version Bump to the same version of CoCo Guest Components. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:24 +01:00
Fabiano Fidêncio	a5629a5a6f	versions: Bump coco-guest-components Usual bump before a release that will be consumed by Confidential Containers. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:24 +01:00
Fabiano Fidêncio	2d2b0de160	tests: kbs: Try to get the pod logs on deployment failure As this helps immensely to figure out what went wrong with the deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:24 +01:00
Fabiano Fidêncio	58df06d90e	versions: Bump Trustee This is a bump pre-release, which brings several fixes and some improvements related to initData, and NVIDIA's remote verifier. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:05 +01:00
Alex Lyn	c225cba0e6	tests: Correct unexpected capability for policy failure test The test case designed to verify policy failures due to an "unexpected capability" was misconfigured. It was using "CAP_SYS_CHROOT" as the unexpected capability to be added. This configuration was flawed for two main reasons: 1. Incorrect Syntax: Kubernetes Pod specs expect capability names without the "CAP_" prefix (e.g., "SYS_CHROOT", not "CAP_SYS_CHROOT"). This made the test case's premise incorrect from a K8s API perspective. 2. Part of Default Set: "SYS_CHROOT" is already included in the `default_caps` list for a standard container. Therefore, adding it would not trigger a policy violation, defeating the purpose of the "unexpected capability" test. Furthermore, a related issue was observed where a malformed capability like "CAP_CAP_SYS_CHROOT" was being generated, causing parsing failures in the `oci-spec-rs` library. This was a symptom of incorrect string manipulation when handling capabilities. This commit corrects the test by selecting "SYS_NICE" as the unexpected capability. "SYS_NICE" is a more suitable choice because: - It is a valid Linux capability. - It is relatively harmless. - It is not part of the default capability set defined in `genpolicy-settings.json`. By using "SYS_NICE", the test now accurately simulates a scenario where a Pod requests a legitimate but non-default capability, which the policy (generated from a baseline Pod without this capability) should correctly reject. This change fixes the test's logic and also resolves the downstream `oci-spec-rs` parsing error by ensuring only valid capability names are processed. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 14:06:30 +08:00
Alex Lyn	9aaf41a71b	Merge pull request #11985 from Apokleos/policy-caps-rs genpolicy: Correct caps matcher for runtime-rs	2025-11-11 11:08:11 +08:00
Alex Lyn	29fe46bc06	genpolicy: Correct caps matcher for runtime-rs Detected a format mismatch in OCI Spec Capabilities fields between `runtime-rs` (no `CAP_` prefix) and `runtime-go` (with `CAP_` prefix). This introduces a normalization of caps in match_caps(p_caps, i_caps). This ensures robust and consistent processing of Capabilities regardless of whether the OCI Spec originates from `runtime-rs` or `runtime-go`. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 10:03:54 +08:00
Dan Mihai	f78584e868	Merge pull request #12048 from manuelh-dev/mahuber/bb-build deploy: Improve busybox build	2025-11-10 11:32:07 -08:00
Alex Lyn	7423eb7a30	agent: Support both virtio-blk and virtio-scsi devices for initdata Currently, the initdata module only detects virtio-blk devices (/dev/vd) when searching for the initdata block device. However, when using virtio-scsi, the devices appear as /dev/sd in the guest, causing the initdata detection to fail. This commit extends the device detection logic to support both device types: - virtio-blk devices: /dev/vda, /dev/vdb, etc. - virtio-scsi devices: /dev/sda, /dev/sdb, etc. This commit aims to address the issue of the initdata device not being found when using virtio-scsi. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-10 18:03:23 +01:00
dependabot[bot]	f699f097f3	build(deps): bump github.com/opencontainers/runc in /src/runtime Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.2.6 to 1.2.8. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/v1.2.8/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.2.6...v1.2.8) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-version: 1.2.8 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-11-10 15:43:48 +01:00
Fabiano Fidêncio	92226d0a19	tests: nvidia: Be prepared for TDX Thankfully there's only one piece that's still SNP specific (for the supported TEEs). Let's adjust it so we can have an easy and smooth execution when adding a TDX CI machine. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	4d314e8676	tests: nvidia: nims: Adjust to CC There are several changes needed in order to get this test working with CC, and yet we still are skipping it. Basically, we need to: * Pull an authenticated image inside the guest, which requires: * Using Trustee to release the credential * We still depend on a PR to be merged on Trustee side * https://github.com/confidential-containers/trustee/pull/1035 * We still depend on a Trustee bump (including the PR above) on our side Apart from those changes, I ended up "duplicating" the tests by adding a "-tee" version of those, which already have: * The proper kbs annotations set up * Dropped host mounts * Increases the memory needed Last but not least, as "bats" probably means "being a terrible script", I had to re-arrange a few things otherwise the tests would not even run due to bats-isms that I am sincerely not able to pin-point. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	8cedd96d54	tests: nvidia: k8s: Enforce experimental_force_guest_pull We added the tests using virtio-9p as we knew it'd require incremental changes to be able to use any kind of guest-pull method. Now, as in the coming commits we'll be actually ensuring that guest-pull works and is in use, we can enforce the experimental_force_guest_pull usage for the nvidia cases. Note: We're using experimental_force_guest_pull instead of nydus-snapshotter due to stability concerns with the snapshotter. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	464764c7e0	tests: nvidia: kbs: Ensure KBS_INGRESS=nodeport I missed doing this during the KBS deployment setup. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Manuel Huber	a5cd7235cb	runtime: Align nvidia TEEs enable_annotations with TEEs It was just missed when adding those configurations. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	e85cf83573	k8s: tests: Fix default for EXPERIMENTAL_FORCE_GUEST_PULL It takes either a shim name or "", but we were treating this (thankfully only in this specific file) as a boolean. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Manuel Huber	8b39468b36	tests: nvidia: Logging for NIM Adjust output to the setup_file and teardown_file behavior. With this, we will be able to observe relevant logging rather than adding to the output variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	812191c1f3	tests: nvidia: Do not deploy NFD on nvidia-gpu cases As it'll come from the GPU Operator for now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Pavel Mores	74f9fdb11f	runtime-rs: remove hardcoding of SEV physical address reduction Previous commit enabled getting the physical address reduction from processor but just stored it for later use. This commit adds handling of the value to ProtectionDevice and enables the QEMU driver to use it. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-10 13:01:03 +01:00
Pavel Mores	6f9178d290	runtime-rs: get SEV params using CPUID and store them in SevSnpDetails An implementation of cbitpos acquisition is supplied that was missing so far. We also get the physical address reduction value from the same source (CPUID Fn8000_001f function). This has been hardcoded at 1 so far, following the Go runtime example, but it's better to get it from the processor. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-10 13:01:03 +01:00
Greg Kurz	5810279edf	Merge pull request #12008 from microsoft/saulparedes/allow_priv webhook: allow privileged containers	2025-11-10 11:13:41 +01:00
Zvonko Kaiser	df58972d41	Merge pull request #12051 from microsoft/danmihai1/agent-version agent: update version.rs when VERSION file changed	2025-11-09 20:34:58 -05:00
Fabiano Fidêncio	37d4eb0b77	ci: nvidia: Ensure K8S_TEST_HOST_TYPE=baremetal So the proper cleanups are performed in case something goes awry in a previous run. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-09 10:51:33 +01:00
Dan Mihai	7b10f4c72a	agent: update version.rs when VERSION file changed - version.rs gets generated from version.rs.in - version.rs.in contains values read from VERSION - so version.rs (and maybe other Agent files too) must be re-generated when the VERSION file changes Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 17:53:09 +00:00
Alex Lyn	83b0a59215	Merge pull request #12046 from Apokleos/disable-guest-emptydir Disable guest emptydir	2025-11-08 11:54:15 +08:00
Dan Mihai	df7ee2dd38	ci: k8s: AUTO_GENERATE_POLICY for cbl-mariner Auto-generate policy on cbl-mariner Hosts if the user didn't specify an AUTO_GENERATE_POLICY value. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	53acb74f26	genpolicy: adapt to new AKS pause container behavior The new image reference has changed to mcr.microsoft.com/oss/v2/kubernetes/pause:3.6 from mcr.microsoft.com/oss/kubernetes/pause:3.6. The new image uses by default UID=0, GID=0, while the older image had UID=65535, GID=65535. There is a new pause_container_id_policy field in genpolicy-settings.json, informing genpolicy about the way AdditionalGids gets updated - "v1" for the older behavior and "v2" for the newer AKS version: - When using v1, the default value of AdditionalGids is {65535}. - When using v2, the default value of AdditionalGids is {}. UID=65535 and GID=65535 are still hard-coded by default in genpolicy-settings.json. We might be able to remove/ignore these fields in the future, if we stop relying on policy::KataSpec::get_process_fields to use these fields. A new CI function adapt_common_policy_settings_for_aks() changes the pause container UID, GID, pause_container_id_policy, and image ref settings values when testing on AKS Hosts - i.e., when testing coco-dev or mariner Hosts. The genpolicy workarounds for the unexpected behavior with guest pull enabled have been improved to use the current container's GID instead of hard-coding GID=0 as the guest pull default. Also, AdditionalGids gets updated when the current container's GID is changing, instead of always changing the AdditionalGids at the very end of policy::AgentPolicy::get_container_process(), when the relevant evolution of the GID value was no longer available. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	1f784bb770	genpolicy: improve policy generation comments Make it easier to understand the source of the UID/GID/AdditionalGids values from the container in the auto-generated policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	969b8e0fb8	genpolicy: more detailed UID/GID debug logs Add more details to code paths handling UID/GID values, for easier debugging. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	cacd37ee6e	tests: genpolicy: restore test settings for non-Coco configMap These settings got broken recently because the non-CoCo tests were disabled for unrelated reasons. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Manuel Huber	caff6df827	deploy: Improve busybox build Parallelize busybox builds to build a bit faster and create the build directory prior to Docker execution, which on my environment, helps with permission issues when building busybox without the kata-containers/build directory existing beforehand. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-07 10:09:57 -08:00
Alex Lyn	23024876b2	runtime-rs: Use the configurable disable_guest_empty_dir Correct the hardcoded value of disable_guest_empty_dir, instead, we use the real value of it which comes from the configuration. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-07 19:52:11 +08:00
Alex Lyn	382924bdf3	kata-sys-util: Introduce a sandbox annotation for disable guest emptydir A sandbox annotation that determines if it should create Kubernetes emptyDir mounts on the guest filesystem. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-07 19:48:42 +08:00
Alex Lyn	720a229579	kata-types: Introduce disable guest emptydir flag It determines whether Kubernetes emptyDir mounts are created on the guest filesystem. If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will be created on the host and shared via virtio-fs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-07 19:45:55 +08:00
Fabiano Fidêncio	03e06fdf4d	tests: nvidia: Deploy Trustee Let's ensure Trustee is deployed as some of the tests rely on images that live behind authentication. /o\ The approach taken here to deploy Trustee is exactly the same one taken on the other CoCo tests, apart from an env var passed to ensure we're using the NVIDIA remote verifier (which will come in handy very very soon). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-07 12:32:11 +01:00
Pavel Mores	841fee28da	runtime-rs: add a helper to run external command and capture its output This isn't really related to remote hypervisor though it was useful for its debugging. It's a small helper I've been using regularly during development for quite some time that I think might be useful more broadly. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-07 10:49:14 +01:00
Pavel Mores	72c704b287	runtime-rs: make error reporting for CreateVM a bit more explicit A naked ttrpc error with no context turns out to be rather hard to understand or even notice in log. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-07 10:49:14 +01:00
Pavel Mores	45d8141edc	runtime-rs: remote hv needs neither image nor initrd specified in config The remote hypervisor launches no VM, it just instructs the Cloud API Adaptor to do so, therefore it has no need for an image or initrd to boot from and should be exempt from the mandate for one or the other to be specified. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-07 10:49:14 +01:00
Pavel Mores	80ef102a00	runtime-rs: fix scoping of the remote hv Hypervisor service The go runtime's .proto file - which is also used by the Cloud API Adaptor - puts the Hypervisor service into the "hypervisor" package. runtime-rs has to do the same to avoid an "unimplemented" error. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-11-07 10:49:14 +01:00
Alex Lyn	d5e2071869	Merge pull request #11921 from Apokleos/enhance-copyfile2 runtime-rs: Add support LocalStorage for emptyDir within nontee cases	2025-11-07 16:58:39 +08:00
Fabiano Fidêncio	a591cda466	gatekeeper: Adjust the nvidia gpu test name With the change made to the matrix when the CC GPU runner was added, there was a change in the job name (@sprt saw that coming, but I didn't). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	c6dc176a03	tests: nvidia: cc: Enable NIMs tests Same deal as the previous commit, just enabling the tests here, with the same list of improvements that we will need to go through in order to get it working in a perfect way. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	8ca77f2655	tests: nvidia: cc: Run CUDA vectorAdd tests on CC mode While the primary goal of this change is to detect regressions to the NVIDIA SNP GPU scenario, various improvements to reflect a more realistic CC setting are planned in subsequent changes, such as: * moving away from the overlayfs snapshotter * disabling filesystem sharing * applying a pod security policy * activating the GPUs only after attestation * using a refined approach for GPU cold-plugging without requiring annotations * revisiting pod timeout and overhead parameters (the podOverhead value was increased due to CUDA vectorAdd requiring about 6Gi of podOverhead, as well as the inference and embedqa requiring at least 12Gi, respectively, 14Gi of podOverhead to run without invoking the host's oom-killer. We will revisit this aspect after addressing points 1. and 2.) Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	25ce0afd52	kata-deploy: Allow the CDI annotation for CC GPU cases For the nvidia-gpu-snp and nvidia-gpu-tdx we must set containerd to allow the CDI annotation to be passed down. This solution may become obsolete soon enough, but the cleanest way to have it properly working is by adding it here (even if we remove it before the next release). Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	c91edf884b	runtimeclasses: nvidia: Bump TEE podOverhead It's been noticed that as more RAM is needed to run the CC tests, we also need to update the podOverhead of the NVIDIA CC runtime classes to avoid getting OOM Killed. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Fupan Li	bfe8da6c8a	tests: disable the qemu-runtime-rs cpu hotplug test Since there's something wrong with the cpu hotplug on qemu-runtime-rs, disable this test temporarily. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-06 21:37:01 +08:00
Fupan Li	3b1bfea609	runtime-rs: fix the issue of hot-unplug memory smaller It should do nothing instead of returning an error when hot-unplugging the memory to a size smaller than the statically plugged memory size. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-06 18:19:55 +08:00
Fupan Li	aac2a37ff5	runtime-rs: enable pselect6 syscall for dragonball seccomp Since nerdctl's network hook calls the pselect6 syscall via xtables-nft-multi, we'd better add it to the seccomp whitelist. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-06 11:17:57 +01:00
Hyounggyu Choi	ff429072b6	Merge pull request #11924 from BbolroC/fix-static-checks-actionspz ci: Fix failing static checks to enable IBM actionspz - Z specific	2025-11-06 09:04:04 +01:00
Zvonko Kaiser	fce6a75899	Merge pull request #12027 from fidencio/topic/kata-deploy-make-ALLOWED_HYPERVISOR_ANNOTATIONS-per-arch kata-deploy: Add per arch ALLOWED_HYPERVISOR_ANNOTATIONS	2025-11-05 18:20:14 -05:00
Manuel Huber	d8953f67c5	ci: Onboard another NVIDIA machine Let's add a new NVIDIA machine, which later on will be used for CC related tests. For now the current tests are skipped in the CC capable machine. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 23:23:08 +01:00
Fabiano Fidêncio	b2ee64a2d6	kata-deploy: scripts: Ensure we don't add duplicated values Let's now make sure that we don't add duplicated values to any of our entries, making the script as sane as possible for sequential runs. Vibed with Cursor's help! Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 19:48:24 +01:00
Fabiano Fidêncio	78ae79d153	kata-deploy: scripts: Add helper functions to avoid duplicated items Let's add some helper functions, not yet used, to avoid adding duplicated items. This idea is an expansion of Choi's idea to avoid setting duplicated items, and it'll help on making the whole script idempotent on sequential runs. Vibed with Cursor's help! Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 19:48:24 +01:00
Fabiano Fidêncio	f773368d93	kata-deploy: Add per arch ALLOWED_HYPERVISOR_ANNOTATIONS I know, this is not simplifying much things for now, but it has a good intent in the background and will serve as base for making the kata-deploy helm chart more user friendly. With that said, let's add ALLOWED_HYPERVISOR_ANNOTATIONS per arch, while adding support to set something like "qemu:foo,bar clh:bar foobar barfoo". Why? Because in the future we'll have a better way to set this per shim (and the shim is per arch ...). More details of what we'll do in the future are being discussed here: https://github.com/kata-containers/kata-containers/issues/12024 Anyways, the variables are DELIBERATELY not exposed to the chart for now, as those will be later on when addressing the issue mentioned above. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 19:45:34 +01:00
Fabiano Fidêncio	66e133e096	kata-deploy: Add missing runtimeClasses When the runtimeClasses were added, as part of `7cfa826804`, the firecracker runtimeClass ended up missing from the dictionary. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 19:07:28 +01:00
Anton Ippolitov	23c46b8a00	docs: Update devmapper containerd plugin name The Firecracker installation docs had an outdated containerd configuration for the devmapper plugin. This commit updates the instructions so that they are compatible with more recent versions of containerd. Signed-off-by: Anton Ippolitov <anton.ippolitov@datadoghq.com>	2025-11-05 18:42:29 +01:00
Fabiano Fidêncio	ace9cf942d	tests: guest-pull: Fix names When added, I've mistakenly used the wrong test-type name, which is now fixed and should be enough to trigger the tests correctly. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 18:21:48 +01:00
Hyounggyu Choi	4ee2037974	GHA: Run runtime tests on self-hosted runners for P/Z On IBM actionspz P/Z runners, the following error was observed during runtime tests: ``` host system doesn't support vsock: stat /dev/vhost-vsock: no such file or directory ``` Since loading the vsock module on the fly is not permitted, this commit moves the runtime tests back to self-hosted runners for P/Z. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Hyounggyu Choi	32da38273a	agent/tests: Skip if kernel module is not found On IBM actionspz Z runners, the following error occurs when running `modprobe`: ``` modprobe: FATAL: Module bridge not found in directory /lib/modules/6.8.0-85-generic ``` Additionally, there are no files under `/lib/modules`, for example: ``` total 0 drwxr-xr-x 1 root root 0 Aug 5 13:09 . drwxr-xr-x 1 root root 2.0K Oct 1 22:59 .. ``` This commit skips the `test_load_kernel_module` test if the module is not found or if running `modprobe` is not permitted. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Hyounggyu Choi	075de4dc62	agent/tests: Skip test if error is EACCES (permission denied) On IBM actionspz Z runners, write operations on network interfaces are not allowed, even for the root user. This commit skips the `add_update_addresses` test if the operation fails with EACCES (-13, permission denied). Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Hyounggyu Choi	3f84b623a3	agent/tests: Skip RNG reseeding test on restricted environments On IBM actionspz Z runners, the ioctl system call is not allowed even for the root user. There is likely an additional security mechanism (such as AppArmor or seccomp) in place on Ubuntu runners. This commit introduces a new helper, `is_permission_error()`, which skips the test if ioctl operations in `reseed_rng()` are not permitted. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Hyounggyu Choi	c2abc4da34	agent/tests: Use detected filesystem for baremounted points The IBM actionspz Z runners mount /dev as tmpfs, while other systems use devtmpfs. This difference causes an assertion failure for test_already_baremounted. This commit sets the detected filesystem for bare-mounted points as the expected value. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Hyounggyu Choi	faa048893d	agent/tests: Handle error messages differently based on root filesystem The root filesystem for IBM actionspz Z runners is `btrfs` instead of `ext4`. The error message differs when an unprivileged user tries to perform a bind mount. This commit adjusts the handling of error messages based on the detected root filesystem type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-11-05 16:35:04 +00:00
Fupan Li	0df6c795d8	runtime-rs: disable the default static resource management Since qemu and cloud-hypervisor support cpu and memory hotplug now, disable the static resource management for qemu and cloud-hypervisor by default. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-05 16:59:13 +01:00
Fupan Li	02ecab40e4	tests: disable the cpu hotplug test for coco dev runtime Since qemu-coco-dev-runtime-rs and qemu-coco-dev have disabled cpu and memory hotplug by enabling static_sandbox_resource_mgmt, we should disable the cpu hotplug test for those two runtimes. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-05 16:59:13 +01:00
Fupan Li	1fc05491a2	tests: enable the cpu hotplug test for dragonball etc Since qemu, cloud-hypervisor and dragonball now support cpu hotplug on runtime-rs, enable the cpu hotplug test in CI. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-05 16:59:13 +01:00
Fabiano Fidêncio	0a0de4e6e3	Revert "tests: Do not enable NFD on s390x" This reverts commit `c75a46d17f`, as NFD now publishes an s390x image (and also a ppc64le one). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 16:06:33 +01:00
Alex Lyn	8f0dd4c44b	runtime-rs: Introduce disable_guest_empty_dir flag This commit introduces the configuration flag `disable_guest_empty_dir` to control the placement of Kubernetes emptyDir volumes. By default, the value is set to `false`, maintaining the current behavior of creating emptyDirs within the guest VM. When set to `true`, emptyDirs will be created on the host filesystem. This is essential for scenarios where users need to share data between the host and the guest VM via an emptyDir. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 15:05:45 +08:00
Alex Lyn	205c3dac44	runtime-rs: Add rprivate and rw options for memory emptyDir mounts When handling a memory-based emptyDir, the runtime creates a tmpfs mount inside the guest VM. The previous implementation supported only the "rbind" mount option, which does not explicitly guarantee the desired mount propagation behavior. This commit hardens the mounting process by explicitly adding the `rprivate` and `rw` mount flags. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 15:05:45 +08:00
Alex Lyn	fac9c795c6	runtime-rs: Add 'local' volume to support k8s emptyDir This commit introduces the 'local' volume, which is specifically designed to create and manage Kubernetes emptyDir volumes directly within the VM's sandbox directory. The core functionality ensures that local volume can be handled correctly in handle volume procedure. This capability is essential for allowing containers to leverage the storage backend for shared volumes. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 15:05:45 +08:00
Alex Lyn	1696968eb1	runtime-rs: Implement 'local' storage type for k8s emptyDir volumes This commit implements the new 'local' storage type, enabling Kubernetes emptyDir volumes to be created and managed directly inside the Kata VM (in the sandbox directory). The 'local' type instructs the kata-agent to provision the empty directory within the VM. This approach allows containers to share storage inside the VM, which is especially useful in CoCo emptyDir scenarios. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 15:05:22 +08:00
Alex Lyn	b58a53bfa4	kata-sys-util: Improve handling of Kubernetes emptyDir volumes Separated the checks for tmpfs and disk-based emptyDirs from an `if-else if` block into two distinct `if` statements. This clarifies the logic by treating each volume type detection as an independent task. Additionally, updated the type for disk-based emptyDirs to the more semantically accurate `KATA_K8S_LOCAL_STORAGE_TYPE`. This allows for more specific handling downstream, distinguishing them from generic host path mounts. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 14:59:21 +08:00
Alex Lyn	c39c6f1ae4	kata-sys-utils: Correct the judgement of logic of host emptyDir In fact, emptyDir is not usually found in the proc mounts with the previous logic and then it failed with the previous implementation. Based on the related implementation within runtime-go,related implementation within Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 14:59:21 +08:00
Alex Lyn	f278616bf7	kata-types: Introduce a new storage type of "local" This introduces a new storage type: local. The local storage type tells kata-agent to create an empty directory with the LocalStorage handler in the sandbox directory within the VM. It also aligns with runtime-go `KataLocalDevType = "local"`. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-05 14:59:21 +08:00
Manuel Huber	1561d7fbba	runtime: Clear outer CDI annotations Pod annotations from the outer runtime are being used for cold-plugging CDI devices. We need to ensure that these annotations don't leak into the inner runtime for which specific container (sibling) annotations are being created. Without this change, the inner runtime receives both annotations, leading to failing CDI injection as an outer runtime annotation observed in the guest translates to an unresolvable CDI device, for example, cdi.k8s.io/gpu: "nvidia.com/pgpu=0". Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-04 23:18:00 +01:00
Fabiano Fidêncio	1dfbb14093	tests: Stop testing on stratovirt Stratovirt has been failing for a considerable amount of time, with no sign of anyone watching it or actively working on a fix. With this we also stop building and shipping stratovirt as part of our release as we cannot test it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-04 10:22:46 +01:00
Fabiano Fidêncio	02f47d3f18	helm: uninstall: Take nodeSelector into consideration As we're already doing for the install part, but this bit was missed during review. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-04 09:29:35 +01:00
Fabiano Fidêncio	5b01eaf929	tests: Align kata-deploy helm's uninstall Let's use the same method both on the kata-deploy and k8s tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-04 09:29:35 +01:00
Fabiano Fidêncio	4293cdf846	tests: Add stability tests for experimental-force-guest-pull A few weeks ago we've tested nydus-snapshotter with this approach, and we DID find issues with it. Now, let's also test this with `experimental_force_guest_pull`. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-04 09:02:19 +01:00
Dan Mihai	6a4c336ca0	Merge pull request #12016 from microsoft/danmihai1/early-wait-abort tests: k8s: reduce test time for unexpected CreateContainerRequest errors	2025-11-03 12:04:56 -08:00
Fabiano Fidêncio	3107533953	tests: Adjust to runtimeClass creation by the chart It's just a follow-up on the previous commit where we move away from the runtimeClass creation inside the script, and instead we do it using the chart itself. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 17:32:18 +01:00
Fabiano Fidêncio	12f3b206eb	Revert "kata-deploy: Allow setting the default runtime class name" This reverts commit `be05e1370c`, which is not a problem as we never released such option. Conflicts: tools/packaging/kata-deploy/helm-chart/README.md Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 17:32:18 +01:00
Fabiano Fidêncio	7cfa826804	kata-deploy: Let helm deal with runtimeClass creation We had this logic inside the script when we didn't use the helm chart. However, this only makes the shim script more convoluted for no reason. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 17:32:18 +01:00
Fabiano Fidêncio	14039c9089	golang: Update to 1.24.9 In order to fix: ``` === Running govulncheck on containerd-shim-kata-v2 === Vulnerabilities found in containerd-shim-kata-v2: === Symbol Results === Vulnerability #1: GO-2025-4015 Excessive CPU consumption in Reader.ReadResponse in net/textproto More info: https://pkg.go.dev/vuln/GO-2025-4015 Standard library Found in: net/textproto@go1.24.6 Fixed in: net/textproto@go1.24.8 Vulnerable symbols found: #1: textproto.Reader.ReadResponse Vulnerability #2: GO-2025-4014 Unbounded allocation when parsing GNU sparse map in archive/tar More info: https://pkg.go.dev/vuln/GO-2025-4014 Standard library Found in: archive/tar@go1.24.6 Fixed in: archive/tar@go1.24.8 Vulnerable symbols found: #1: tar.Reader.Next Vulnerability #3: GO-2025-4013 Panic when validating certificates with DSA public keys in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4013 Standard library Found in: crypto/x509@go1.24.6 Fixed in: crypto/x509@go1.24.8 Vulnerable symbols found: #1: x509.Certificate.Verify #2: x509.Certificate.Verify Vulnerability #4: GO-2025-4012 Lack of limit when parsing cookies can cause memory exhaustion in net/http More info: https://pkg.go.dev/vuln/GO-2025-4012 Standard library Found in: net/http@go1.24.6 Fixed in: net/http@go1.24.8 Vulnerable symbols found: #1: http.Client.Do #2: http.Client.Get #3: http.Client.Head #4: http.Client.Post #5: http.Client.PostForm Use '-show traces' to see the other 9 found symbols Vulnerability #5: GO-2025-4011 Parsing DER payload can cause memory exhaustion in encoding/asn1 More info: https://pkg.go.dev/vuln/GO-2025-4011 Standard library Found in: encoding/asn1@go1.24.6 Fixed in: encoding/asn1@go1.24.8 Vulnerable symbols found: #1: asn1.Unmarshal #2: asn1.UnmarshalWithParams Vulnerability #6: GO-2025-4010 Insufficient validation of bracketed IPv6 hostnames in net/url More info: https://pkg.go.dev/vuln/GO-2025-4010 Standard library Found in: net/url@go1.24.6 Fixed in: net/url@go1.24.8 
Vulnerable symbols found: #1: url.JoinPath #2: url.Parse #3: url.ParseRequestURI #4: url.URL.Parse #5: url.URL.UnmarshalBinary Vulnerability #7: GO-2025-4009 Quadratic complexity when parsing some invalid inputs in encoding/pem More info: https://pkg.go.dev/vuln/GO-2025-4009 Standard library Found in: encoding/pem@go1.24.6 Fixed in: encoding/pem@go1.24.8 Vulnerable symbols found: #1: pem.Decode Vulnerability #8: GO-2025-4008 ALPN negotiation error contains attacker controlled information in crypto/tls More info: https://pkg.go.dev/vuln/GO-2025-4008 Standard library Found in: crypto/tls@go1.24.6 Fixed in: crypto/tls@go1.24.8 Vulnerable symbols found: #1: tls.Conn.Handshake #2: tls.Conn.HandshakeContext #3: tls.Conn.Read #4: tls.Conn.Write #5: tls.Dial Use '-show traces' to see the other 4 found symbols Vulnerability #9: GO-2025-4007 Quadratic complexity when checking name constraints in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4007 Standard library Found in: crypto/x509@go1.24.6 Fixed in: crypto/x509@go1.24.9 Vulnerable symbols found: #1: x509.CertPool.AppendCertsFromPEM #2: x509.Certificate.CheckCRLSignature #3: x509.Certificate.CheckSignature #4: x509.Certificate.CheckSignatureFrom #5: x509.Certificate.CreateCRL Use '-show traces' to see the other 27 found symbols Vulnerability #10: GO-2025-4006 Excessive CPU consumption in ParseAddress in net/mail More info: https://pkg.go.dev/vuln/GO-2025-4006 Standard library Found in: net/mail@go1.24.6 Fixed in: net/mail@go1.24.8 Vulnerable symbols found: #1: mail.AddressParser.Parse #2: mail.AddressParser.ParseList #3: mail.Header.AddressList #4: mail.ParseAddress #5: mail.ParseAddressList ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 16:57:22 +01:00
Dan Mihai	c563ee99fa	tests: policy-rc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful replication controller with auto-generated policy in 123335ms ok 2 Policy failure: unexpected container command in 14601ms ok 3 Policy failure: unexpected volume mountPath in 14443ms ok 4 Policy failure: unexpected host device mapping in 14515ms ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14485ms ok 6 Policy failure: unexpected capability in 14382ms ok 7 Policy failure: unexpected UID = 1000 in 14578ms After this change: not ok 1 Successful replication controller with auto-generated policy in 17108ms ok 2 Policy failure: unexpected container command in 14427ms ok 3 Policy failure: unexpected volume mountPath in 14636ms ok 4 Policy failure: unexpected host device mapping in 14493ms ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14554ms ok 6 Policy failure: unexpected capability in 15087ms ok 7 Policy failure: unexpected UID = 1000 in 14371ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	319400dc0d	tests: policy-pvc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful pod with auto-generated policy in 94852ms ok 2 Policy failure: unexpected device mount in 17807ms After this change: not ok 1 Successful pod with auto-generated policy in 35194ms ok 2 Policy failure: unexpected device mount in 21355ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	1914fcb812	tests: policy-log: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Logs empty when ReadStreamRequest is blocked in 102257ms After this change: not ok 1 Logs empty when ReadStreamRequest is blocked in 17339ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	a0bd9e02ca	tests: policy-job: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful job with auto-generated policy in 107111ms ok 2 Policy failure: unexpected environment variable in 7920ms ok 3 Policy failure: unexpected command line argument in 7874ms ok 4 Policy failure: unexpected emptyDir volume in 7823ms ok 5 Policy failure: unexpected projected volume in 7812ms ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7903ms ok 7 Policy failure: unexpected UID = 222 in 7720ms After this change: not ok 1 Successful job with auto-generated policy in 10271ms ok 2 Policy failure: unexpected environment variable in 8018ms ok 3 Policy failure: unexpected command line argument in 7886ms ok 4 Policy failure: unexpected emptyDir volume in 7621ms ok 5 Policy failure: unexpected projected volume in 7843ms ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7632ms ok 7 Policy failure: unexpected UID = 222 in 7619ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	992c91371c	tests: policy-deployment-sc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: ok 1 Successful sc deployment with auto-generated policy and container image volumes in 14769ms ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 8384ms not ok 3 Successful sc deployment with security context choosing another valid user in 136149ms ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 8862ms ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7941ms ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11612ms After: ok 1 Successful sc deployment with auto-generated policy and container image volumes in 15230ms ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 9364ms not ok 3 Successful sc deployment with security context choosing another valid user in 11060ms ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 9124ms ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7919ms ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11666ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	704ee76f1e	tests: policy-deployment-sc: reduced redundancy Call common function instead of copy/paste of three commands. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	2cafb10a6a	tests: policy-pod: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful pod with auto-generated policy in 110801ms not ok 2 Able to read env variables sourced from configmap using envFrom in 94104ms not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 95838ms not ok 4 Successful pod with auto-generated policy and custom layers cache path in 110712ms ok 5 Policy failure: unexpected container image in 8113ms ok 6 Policy failure: unexpected privileged security context in 7943ms ok 7 Policy failure: unexpected terminationMessagePath in 11530ms ok 8 Policy failure: unexpected hostPath volume mount in 7970ms ok 9 Policy failure: unexpected config map in 7933ms not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 112677ms ok 11 RuntimeClassName filter: no policy in 2302ms not ok 12 ExecProcessRequest tests in 93946ms not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 94003ms ok 14 Policy failure: unexpected UID = 0 in 8016ms ok 15 Policy failure: unexpected UID = 1234 in 7850ms After: not ok 1 Successful pod with auto-generated policy in 12182ms not ok 2 Able to read env variables sourced from configmap using envFrom in 10121ms not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 11738ms not ok 4 Successful pod with auto-generated policy and custom layers cache path in 26592ms ok 5 Policy failure: unexpected container image in 7742ms ok 6 Policy failure: unexpected privileged security context in 7949ms ok 7 Policy failure: unexpected terminationMessagePath in 7789ms ok 8 Policy failure: unexpected hostPath volume mount in 7887ms ok 9 Policy failure: unexpected config map in 7818ms not ok 10 Policy failure: unexpected 
lifecycle.postStart.exec.command in 9120ms ok 11 RuntimeClassName filter: no policy in 2081ms not ok 12 ExecProcessRequest tests in 9883ms not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 9870ms ok 14 Policy failure: unexpected UID = 0 in 11161ms ok 15 Policy failure: unexpected UID = 1234 in 7814ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
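The "detect create container errors early" commits above all describe the same idea: while waiting up to ${wait_time} for an expected condition, also check for a failure signal and abort as soon as it appears. A minimal Python sketch of that polling pattern follows. It is not the code from the test suite (which is written in bats/shell), and the names `wait_for`, `condition` and `failed` are hypothetical.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    failed: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until `condition` holds, `failed` reports an error, or time runs out.

    Returns "ok", "aborted" or "timeout". `failed` stands in for a check such
    as "did CreateContainerRequest fail?" (hypothetical; the real tests inspect
    the cluster state themselves).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return "ok"
        # Early abort: there is no point in waiting out the full timeout
        # once an unexpected failure has been observed.
        if failed():
            return "aborted"
        if time.monotonic() >= deadline:
            return "timeout"
        sleep(interval_s)
```

With this shape, a test that is going to fail anyway returns after one polling interval instead of the full timeout, which matches the drop from roughly 100 s to roughly 10-20 s in the "not ok" timings quoted in the commit messages.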
Alex Lyn	897ecfb503	Merge pull request #12014 from fidencio/topic/release-ensure-helm-dependencies-update scripts: release: Run helm dependencies update	2025-11-03 16:34:17 +08:00
Fabiano Fidêncio	c539a9e90e	tests: k8s: parallel: Increase timeout We've seen a few cases where we fail the test due to timeout and when we print the pods we just see that they've been created. With that in mind, let's just increase the timeout a little bit. Example: ``` not ok 1 Parallel jobs in 6250ms (in test file k8s-parallel.bats, line 41) `kubectl wait --for=condition=Ready --timeout=$timeout pod -l jobgroup=${job_name}' failed No resources found in kata-containers-k8s-tests namespace. [bats-exec-test:71] INFO: k8s configured to use runtimeclass job.batch/process-item-test1 created job.batch/process-item-test2 created job.batch/process-item-test3 created NAME STATUS COMPLETIONS DURATION AGE process-item-test1 Running 0/1 0s process-item-test2 Running 0/1 0s process-item-test3 Running 0/1 0s error: no matching resources found No resources found in kata-containers-k8s-tests namespace. No resources found in kata-containers-k8s-tests namespace. DEBUG: system logs of node 'aks-nodepool1-25989463-vmss000000' since test start time (2025-11-01 16:39:03) -- No entries -- job.batch "process-item-test1" deleted job.batch "process-item-test2" deleted job.batch "process-item-test3" deleted ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-01 18:09:37 +01:00
Fabiano Fidêncio	8a5ebd5d16	tests: k8s: run QoS tests on a bigger instance It's been failing to start quite regularly on the smaller instance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-01 17:54:58 +01:00
Fabiano Fidêncio	157b2c32ce	scripts: release: Run helm dependencies update Otherwise we'll face issues like: ``` Error: found in Chart.yaml, but missing in charts/ directory: node-feature-discovery ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-01 17:54:58 +01:00
Fabiano Fidêncio	c75a46d17f	tests: Do not enable NFD on s390x As we're failing on the uninstall, which seems related to a bug on NFD itself, but I don't have access to a s390x machine to debug, let's skip the enablement for now and enable it back once we've experimented it better on s390x. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	67e38e0f92	tests: Do not enable NFD on cbl-mariner As we're failing to install NFD on CBL Mariner, let's skip the enablement there, and enable it once we've experimented it better there. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	1bc873397b	tests: Use NFD as part of the tests As we have the ability to deploy NFD as a sub-chart of our chart, let's make sure we test it during our CI. We had to increase the timeout values, where we had timeouts set, to deploy / undeploy kata, as now NFD is also deployed / undeployed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	ebe15d154e	kata-deploy: Add NFD as a dependency Let's ensure that we add NFD as a weak dependency of the kata-deploy helm chart. What we're doing for now is leaving it up to the user / admin to enable it, and if enabled then we do an explicit check for virtualization support (x86_64 only for now). In case NFD is already deployed, we fail the installation (in case it's enabled on the kata-deploy helm chart) with a clear error message to the user. While I know that kata-remote DOES NOT require virtualization, I've left this out (with a comment for when we add a peer-pods dependency on kata-deploy) in order to simplify things for now, as kata-remote is not a deployed shim by default. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	be05e1370c	kata-deploy: Allow setting the default runtime class name As Kata Containers can be consumed by other helm-charts, hard coding the default runtime class name to `kata` is not optimal. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:14:53 +01:00
Fabiano Fidêncio	820e6d6351	kata-deploy: Add more per-arch options All the options that take a specific shim as an argument MUST have specific per arch settings, as not all the shims are available for all the arches, leading to issues when setting up multi-arch deployments. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:14:53 +01:00
Zvonko Kaiser	94abe4fc00	osbuilder: nvrc: Consume NVRC release instead of building it Let's ensure that we consume NVRC releases straight from GitHub instead of building the binaries ourselves. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 12:10:20 +01:00
Zvonko Kaiser	69c76971f3	gpu: Handle VFIO and IOMMUFD We have here either /dev/vfio/<num> or /dev/vfio/devices/vfio<num>, for IOMMUFD format /dev/vfio/devices/vfio<num>, strip "vfio" prefix /dev/vfio/123 - basename "123" - vfioNum = "123" - cdi.k8s.io/vfio123 /dev/vfio/devices/vfio123 - basename "vfio123" - strip - vfioNum = "123" - cdi.k8s.io/vfio123 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-31 09:46:07 +01:00
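The two device-path layouts in the commit above map to the same annotation key. As an illustration only, here is a small Python sketch of that mapping; the actual change lives in the Kata runtime and is not reproduced here, and the helper names `vfio_number` and `cdi_annotation_key` are hypothetical.

```python
import posixpath

# Annotation key prefix as quoted in the commit message.
CDI_VFIO_PREFIX = "cdi.k8s.io/vfio"


def vfio_number(dev_path: str) -> str:
    """Return the numeric VFIO index for either device-path layout."""
    base = posixpath.basename(dev_path)
    # IOMMUFD layout: /dev/vfio/devices/vfio<num>, so strip the "vfio" prefix.
    if base.startswith("vfio"):
        base = base[len("vfio"):]
    # Legacy layout: /dev/vfio/<num>, where the basename is already the number.
    if not base.isdigit():
        raise ValueError(f"not a VFIO device path: {dev_path!r}")
    return base


def cdi_annotation_key(dev_path: str) -> str:
    """Build the CDI annotation key, e.g. cdi.k8s.io/vfio123."""
    return CDI_VFIO_PREFIX + vfio_number(dev_path)
```

Both `/dev/vfio/123` and `/dev/vfio/devices/vfio123` yield `cdi.k8s.io/vfio123`. Paths that have no numeric index are rejected by this sketch, which is an assumption about error handling, not something the commit states.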
Saul Paredes	26396881cf	webhook: allow privileged containers This allows us to test privileged containers when using the webhook. We can do this because kata-deploy sets privileged_without_host_devices = true for kata runtime by default. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-30 14:59:26 -07:00
Fabiano Fidêncio	e30e2b5f45	tests: k8s: Remove tests running on GitHub provided runner We have 2 tests running on GitHub provided runners: * devmapper * CRI-O - devmapper situation For devmapper, we're currently testing devmapper with s390x as part of one of its jobs. More than that, this test has been failing here due to a lack of space in the machine for quite some time, and no action was taken to bring it back either via GARM or some other way. With that said, let's rely on the s390x CI to test devmapper and avoid one extra failure on our CI by removing this one. - cri-o situation CRI-O is being tested with a fixed version of kubernetes that's already reached its EOL, and a CRI-O version that matches that k8s version. There have been attempts to raise issues, and also to provide a PR that does at least part of the work ... leaving the debugging part for the maintainers of the CI. However, there was no action on those from the maintainers. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-30 11:46:59 +01:00
Alex Lyn	fa521220a9	Merge pull request #11816 from jiuyi123/rs-vm-template-kata-ctl-merge kata-ctl: add factory subcommands for VM template management	2025-10-30 18:21:12 +08:00
ssc	551caad4b1	docs: add guide on VM templating usage in runtime-rs - Explained the concept and benefits of VM templating - Provided step-by-step instructions for enabling VM templating - Detailed the setup for using snapshotter in place of VirtioFS for template-based VM creation - Added performance test results comparing template-based and direct VM creation Signed-off-by: ssc <741026400@qq.com>	2025-10-30 15:18:31 +08:00
ssc	5a586e13a1	kata-ctl: add factory subcommands for VM template management - init: initialize the VM template factory - status: check the current factory status - destroy: clean up and remove factory resources These commands provide basic lifecycle management for VM templates. Signed-off-by: ssc <741026400@qq.com>	2025-10-30 10:27:17 +08:00
RuoqingHe	8878c46e8f	Merge pull request #11867 from spectator333/update-rust-vmm-deps dragonball: Bump kvm-ioctls to fix security issue	2025-10-30 00:17:29 +08:00
Siyu Tao	dd444d23b3	dragonball: Bump kvm-ioctls to fix security issue Use `ioctl_with_mut_ref` instead of `ioctl_with_ref` in the `create_device` method as it needs to write to the `kvm_create_device` struct passed to it, which was released in v0.12.1. Signed-off-by: Siyu Tao <taosiyu2024@163.com>	2025-10-29 14:03:29 +00:00
Steve Horsman	0e19a2bf91	Merge pull request #11993 from zvonkok/vectorAdd gpu: Add libs for CC	2025-10-29 13:42:34 +00:00
stevenhorsman	555926ea1a	libs: Fix formatting issue Fix the cargo fmt issues and then we can make the libs tests required again to avoid this regression happening again. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-29 13:13:50 +01:00
Steve Horsman	dbdd1009af	Merge pull request #11933 from kata-containers/topic/kata-deploy-nfd-dependency-part-I kata-deploy: Automatically deploy NodeFeatureRules for TEEs	2025-10-29 09:50:38 +00:00
Fabiano Fidêncio	103f80c7f5	readme: install: Drop outdated documentation kata-deploy helm chart is THE way to deploy kata-containers on kubernetes environments, and kubernetes environments is basically the only reliably tested deployment we have. For now, let's just drop documentation that is outdated / incorrect, and in the future let's ensure we update the linked docs, as we work on update / upgrade for the helm chart. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-29 09:41:57 +01:00
Zvonko Kaiser	5ff218823c	gpu: Remove unneeded libraries The libs in question were added when moving to developer.nvidia.com but switching back to ubuntu only based builds they are not needed. Remove them to keep the rootfs as minimal as possible. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-29 08:03:36 +01:00
Zvonko Kaiser	6d9b4059f5	gpu: Add libs for CC In the case of CC we need additional libraries in the rootfs. Add them conditionally if type == confidential. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-29 08:03:36 +01:00
Xuewei Niu	55d181beb1	Merge pull request #11828 from jiuyi123/rs-vm-template-runtime-rs runtime-rs: introduce VM template lifecycle and integration	2025-10-29 14:03:46 +08:00
Xuewei Niu	8aca32dfa9	Merge pull request #11862 from StevenFryto/rootless_clh runtime-rs: supporting the CLH VMM process running in non-root mode	2025-10-29 13:31:53 +08:00
ssc	16e8cf1a09	runtime-rs: boot vm from template Add build_vm_from_template() that flips boot_from_template flag, wires factory.template_path/{memory,state} into the hypervisor config, and returns ready-to-use hypervisor & agent instances. When factory.template is enabled, VirtContainer bypasses normal creation and directly boots the VM by restoring the template through incoming migration, completing the "create → save → clone" loop. Fixes: #11413 Signed-off-by: ssc <741026400@qq.com>	2025-10-29 12:38:28 +08:00
ssc	550615285c	runtime-rs: add factory, template and vm modules for VM template lifecycle Introduced factory::FactoryConfig with init/destroy/status commands to manage template pools. Added template::Template to fetch, create and persist base VMs. Introduced vm::{VM, VMConfig} exposing create, pause, save, resume, stop, disconnect and migration helpers for sandbox integration. Extended QemuInner to execute QMP incoming migration, pause/resume and status tracking. Fixes: #11413 Signed-off-by: ssc <741026400@qq.com>	2025-10-29 12:38:28 +08:00
ssc	135c84b6cb	kata-types: add VM template and factory configuration Added new fields in Hypervisor struct to support VM template creation, template boot, memory and device state paths, shared path, and store paths. Introduced a Factory struct in config to manage template path, cache endpoint, cache number, and template enable flag. Integrated Factory into TomlConfig for runtime configuration parsing. Fixes: #11413 Signed-off-by: ssc <741026400@qq.com>	2025-10-29 11:49:08 +08:00
stevenfryto	2ceadc5fa3	runtime-rs: supporting the CLH VMM process running in non-root mode This change enables running the Cloud Hypervisor VMM as a non-root user when the rootless flag is set to true in the configuration. Fixes: #11414 Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>	2025-10-29 01:55:10 +00:00
stevenfryto	2ddbae3aa6	runtime-rs: pass the tuntap fds down to Cloud Hypervisor Pass the file descriptors of the tuntap device to the Cloud Hypervisor VMM process so that the process could open the device without cap_net_admin Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>	2025-10-29 01:55:10 +00:00
Fabiano Fidêncio	59883a2d99	actions: Remove unused USING_NFD There's no reason to keep the env var / input as it's never been used and now kata-deploy detects automatically whether NFD is deployed or not. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-28 21:24:27 +01:00
Fabiano Fidêncio	f9825b4e6e	kata-deploy: Automatically deploy NodeFeatureRules for TEEs When the NodeFeatureRule CRD is detected kata-deploy will: * Create the specific NodeFeatureRules for the x86_64 TEEs * Adapt the TEEs runtime classes to take into account the amount of keys available in the system when spawning the podsandbox. Note, we still do not have NFD as sub-dependency of the helm chart, and I'm not even sure if we will have. However, it's important to integrate better with the scenarios where the NFD is already present. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-28 21:24:27 +01:00
Manuel Huber	8dc78057d6	ci: Refactor NVIDIA NIM test Change NIM bats file logic to allow skipping test cases which require multiple GPUs. This can be helpful for test clusters where there is only one node with a single GPU, or for local test environments with a single-node cluster with a single GPU. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-28 19:12:16 +01:00
Manuel Huber	be32b77baf	ci: Add NVIDIA CUDA vectoradd test This change adds a CUDA vectoradd test case and makes enabling NVRC tracing optional and idempotent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-28 19:12:16 +01:00
Fabiano Fidêncio	a164693e1a	release: Bump version to 3.22.0 Bump VERSION and helm-chart versions Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-28 16:28:18 +01:00
Steve Horsman	1b46cf43c4	Merge pull request #11989 from Amulyam24/actionpz-ppc64le revert: Enable new ibm runners for ppc64le	2025-10-28 12:09:03 +00:00
Amulyam24	c603094584	revert: Enable new ibm runners for ppc64le Temporarily disables the new runners for building artifacts jobs. Will be re-enabled once they are stable. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2025-10-28 17:09:26 +05:30
Hyounggyu Choi	7d2fe5e187	revert: Enable new ibm runners for s390x This partially reverts `8dcd91c` for the s390x because the CI jobs are currently blocking the release. The new runners will be re-introduced once they are stable and no longer impact critical paths. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-28 11:11:51 +01:00
Fabiano Fidêncio	754e832cfa	kata-deploy: Allow passing shims / defaultShim per arch This allows us to do a full multi-arch deployment, as the user can easily select which shims are deployed per arch. Some of the VMMs are not supported on all architectures, and deploying them there would lead to a broken installation. Now, by passing shims per arch, we can easily have a heterogeneous deployment where, for instance, we can set qemu-se-runtime-rs for s390x, qemu-cca for aarch64, and qemu-snp / qemu-tdx for x86_64 and call all of those a default kata-confidential ... and have everything working with the same deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-27 22:42:37 +01:00
Greg Kurz	ffdc80733a	Merge pull request #11966 from zvonkok/gpu-cc-fix gpu: rootfs fixes	2025-10-27 10:18:13 +01:00
Alex Lyn	418d5f724e	Merge pull request #11971 from lifupan/fupan_blk_ratelimit runtime-rs: Support disk rate limiter for dragonball	2025-10-27 17:12:47 +08:00
Alex Lyn	f86ac595a8	Merge pull request #11973 from Apokleos/enhance-oci-spec runtime-rs: Enhancements for items within OCI Spec	2025-10-27 16:15:00 +08:00
Alex Lyn	690dad5528	runtime-rs: Ensure complete cleanup of stale Device Cgroups The previous procedure failed to reliably ensure that all unused Device Cgroups were completely removed, a failure consistently verified by CI tests. This change introduces a more robust and thorough cleanup mechanism. The goal is to prevent previous issues—likely stemming from improper use of Rust mutable references—that caused the modifications to be ineffective or incomplete. This ensures a clean environment and reliable CI test execution. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-27 12:47:48 +08:00
Alex Lyn	25ab615da5	Merge pull request #11913 from Apokleos/dedicated-error-rs CI: Add dedicated expected error message for runtime-rs	2025-10-27 10:47:07 +08:00
Zvonko Kaiser	39848e0983	gpu: rootfs fixes Build only from Ubuntu repositories; do not mix with developer.nvidia.com. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Update tools/osbuilder/rootfs-builder/nvidia/nvidia_chroot.sh Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-26 19:36:55 +01:00
stevenhorsman	aec0ceb860	gatekeeper: Update mariner tests name In https://github.com/kata-containers/kata-containers/pull/11972 the auto-generate-policy: yes matrix parameter was removed, which changes the name of the tests, so sync this change in required-tests.yaml Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-25 17:51:31 +02:00
Kevin Zhao	e2dbe87a99	tests: Fix cca test failure on arm64 and other architectures Fix the wrong test with appendProtectionDevice on arm64 Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-25 13:54:35 +02:00
dependabot[bot]	99ae3607dc	build(deps): bump astral-tokio-tar in /src/tools/agent-ctl Bumps [astral-tokio-tar](https://github.com/astral-sh/tokio-tar) from 0.5.5 to 0.5.6. - [Release notes](https://github.com/astral-sh/tokio-tar/releases) - [Changelog](https://github.com/astral-sh/tokio-tar/blob/main/CHANGELOG.md) - [Commits](https://github.com/astral-sh/tokio-tar/compare/v0.5.5...v0.5.6) --- updated-dependencies: - dependency-name: astral-tokio-tar dependency-version: 0.5.6 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-10-25 13:53:24 +02:00
Dan Mihai	61ee4d7f8b	Merge pull request #11951 from burgerdev/watchable genpolicy: allow non-watchable ConfigMaps	2025-10-24 08:38:55 -07:00
Steve Horsman	ac601ecd45	Merge pull request #11964 from Amulyam24/k8s-ppc64le github: migrate k8s job to a different runner on ppc64le	2025-10-24 15:55:59 +01:00
Dan Mihai	ac3ea973ee	Merge pull request #11958 from microsoft/danmihai1/policy-tests-upstream5 tests: k8s: auto-generate policy for additional tests	2025-10-24 07:18:00 -07:00
Amulyam24	9876cbffd6	github: migrate k8s job to a different runner on ppc64le Migrate the k8s job to a different runner and use a long running cluster instead of creating the cluster on every run. Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2025-10-24 18:20:11 +05:30
Steve Horsman	5713072385	Merge pull request #11974 from fidencio/topic/payload-after-build-upload-latest-charts actions: Push a `0.0.0-dev` chart package to the registries	2025-10-24 13:13:02 +01:00
Alex Lyn	e539432a91	CI: Add dedicated expected error message for runtime-rs Runtime-rs has its own dedicated error message, so we need to handle it separately. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-24 20:08:59 +08:00
Steve Horsman	60022c9556	Merge pull request #11972 from microsoft/danmihai1/no-mariner-policy gha: no policy for cbl-mariner during ci	2025-10-24 12:03:52 +01:00
Fabiano Fidêncio	ebc1d64096	actions: Push a `0.0.0-dev` chart package to the registries This will help immensely projects consuming the kata-deploy helm chart to use configuration options added during the development cycle that are waiting for a release to be out ... allowing very early tests of the stack. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-24 11:44:27 +02:00
Alex Lyn	91db25ef02	runtime-rs: Reset capabilities for exec processes By default, `kubectl exec` inherits some capabilities from the container, which could pose a security risk in a confidential environment. This change modifies the agent policy to strictly enforce that any process started via `ExecProcessRequest` has no Linux capabilities. This prevents potential privilege escalation within an exec session, adhering to the principle of least privilege. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-24 15:42:17 +08:00
Alex Lyn	2de6fa520d	runtime-rs: Reset ApparmorProfile with None value In CoCo cases, the ApparmorProfile setting within runtime-go is set to None, so we should align with runtime-go. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-24 15:40:45 +08:00
Dan Mihai	b8c1215d99	gha: no policy for cbl-mariner during ci Temporarily disable the auto-generated Agent Policy on Mariner hosts, to work around the new test failures on these hosts. When re-enabling auto-generated policy in the future, that would be better achieved with a tests/integration/kubernetes/gha-run.sh change. Those changes are easier to test compared with GHA YAML changes. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-24 04:00:36 +00:00
Fupan Li	9fda9905a7	runtime-rs: Support disk rate limiter for dragonball This PR adds code that passes disk rate limiter parameters to the dragonball VMM. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-10-24 10:39:53 +08:00
Markus Rudy	acc7974602	genpolicy: allow non-watchable ConfigMaps If a ConfigMap has more than 8 files it will not be mounted watchable [1]. However, genpolicy assumes that ConfigMaps are always mounted at a watchable path, so containers with large ConfigMap mounts fail verification. This commit allows mounting ConfigMaps from watchable and non-watchable directories. ConfigMap mounts can't be meaningfully verified anyway, so the exact location of the data does not matter, except that we stay in the sandbox data dirs. [1]: `0ce3f5fc6f/docs/design/inotify.md (L11-L21)` Fixes: #11777 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-23 15:45:17 +02:00
Fabiano Fidêncio	94adc58342	tests: Ensure helm secret for kata-deploy installation is cleaned up Every now and then, when a failure happens, helm leaves the secret behind without cleaning it up, leading to issues in subsequent runs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-23 11:15:13 +02:00
Fabiano Fidêncio	12a515826d	tools: Install Golang from a reliable mirror (follow-up) Aurélien has moved to a reliable mirror for our tests, but we missed that our tools Dockerfiles could benefit from the same change, which is added now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-23 11:15:13 +02:00
Fabiano Fidêncio	560425f31f	build: kernel: Bump version to trigger signed builds for arm64 GPU Although we saw this happening, we expected it to NOT happen ... As the kernel is not signed, but we expect it to be (the cached version), then we're bailing. :-/ Let's ensure a full rebuild of kernels happens and we'll be good from that point onwards. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-23 11:15:13 +02:00
Zvonko Kaiser	0b11190fcf	gpu: Add Arm64 kernel signing Adopt working amd64 workflow to arm64 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-22 21:05:32 +02:00
Mikko Ylinen	1beda258b8	qemu: nvidia: tdx: add quote-generation-socket for attestation to work Add TDX QGS quote-generation-socket TDX QEMU object params for attestation to work in NVGPU+TDX environment. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-10-22 21:01:35 +02:00
Hyounggyu Choi	2c805900a4	Merge pull request #11891 from stevenhorsman/signature-tests-with-initdata tests/k8s: Add initdata variants of signature verification and registry authentication tests	2025-10-22 20:27:26 +02:00
Fabiano Fidêncio	ba912e6a84	kata-deploy: Adapt nydus installation to MULTI_INSTALL_SUFFIX By doing this we can ensure that more than one instance of nydus-snapshotter can be running inside the cluster, which is super useful for doing A-B "upgrades" (where we install a new version of kata-containers + nydus on B, while A is still running, and then only uninstall A after making sure that B is working as expected). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-22 20:25:03 +02:00
Fupan Li	5615c9af84	Merge pull request #11722 from RuoqingHe/2025-08-25-move-mem-agent-to-libs libs: Move mem-agent into libs workspace	2025-10-22 11:23:33 +02:00
Fabiano Fidêncio	ded336405f	kata-deploy: All qemu variants use .hypervisors.qemu.* We've been wrongly trying to set up the `${shim}` (as the qemu-snp, for instance) as the hypervisor name in the kata-containers configuration file, causing `tomlq` to break, as all the .hypervisors.qemu* shims are tied to the `qemu` hypervisor, and it happens regardless of the shim having a different name, or the hypervisor being experimental or not. ```sh $ grep "hypervisor.qemu" src/runtime/config/configuration- src/runtime/config/configuration-qemu-cca.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-coco-dev.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-nvidia-gpu-snp.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-nvidia-gpu-tdx.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-nvidia-gpu.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-se.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-snp.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu-tdx.toml.in:[hypervisor.qemu] src/runtime/config/configuration-qemu.toml.in:[hypervisor.qemu] $ grep "hypervisor.qemu" src/runtime-rs/config/configuration- src/runtime-rs/config/configuration-qemu-runtime-rs.toml.in:[hypervisor.qemu] src/runtime-rs/config/configuration-qemu-se-runtime-rs.toml.in:[hypervisor.qemu] ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-22 10:23:12 +02:00
Ruoqing He	000f707205	libs: mem-agent: Add missing #[cfg(test)] The `tests` module inside the `memcg` module should be gated behind `test`; add `#[cfg(test)]` to make those tests work properly. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	831f3ab616	libs: mem-agent: Skip tests that require root Some tests from mem-agent require root privilege; use `skip_if_not_root` to skip those tests when they are not executed as the root user. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	ac539baeaa	libs: Ignore clippy `precedence` and `identity_op` Ignoring `precedence` and `identity_op` clippy warning suggested by rust 1.85.1 for now. ```console error: operator precedence can trip the unwary --> mem-agent/src/compact.rs:273:61 \| 273 \| ... total_free_movable_pages += count * 1 << order; \| ^^^^^^^^^^^^^^^^^^ help: consider parenthesizing your expression: `(count * 1) << order` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence = note: `-D clippy::precedence` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::precedence)]` Checking kata-types v0.1.0 (/root/riscv/kata-containers/src/libs/kata-types) error: this operation has no effect --> mem-agent/src/compact.rs:273:61 \| 273 \| ... total_free_movable_pages += count * 1 << order; \| ^^^^^^^^^ help: consider reducing it to: `count` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#identity_op = note: `-D clippy::identity-op` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::identity_op)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	4dec1a32eb	libs: Allow clippy `type_complexity` Prefixing with `#[allow(clippy::type_complexity)]` to silence this warning, the return type is documented in comments. ```console error: very complex type used. Consider factoring parts into `type` definitions --> mem-agent/src/mglru.rs:184:6 \| 184 \| ) -> Result<HashMap<String, (usize, HashMap<usize, MGenLRU>)>> { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#type_complexity = note: `-D clippy::type-complexity` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::type_complexity)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	241e6db237	libs: Fix clippy `absurd_extreme_comparisons` Manually fix `absurd_extreme_comparisons` clippy warning by testing equality against 0 as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this comparison involving the minimum or maximum element for this type contains a case that is always true or always false --> mem-agent/src/psi.rs:62:8 \| 62 \| if reader \| ________^ 63 \| \| .read_line(&mut first_line) 64 \| \| .map_err(\|e\| anyhow!("reader.read_line failed: {}", e))? 65 \| \| <= 0 \| \|____________^ \| = help: because `0` is the minimum value for this type, the case where the two sides are not equal never occurs, consider using `reader .read_line(&mut first_line) .map_err(\|e\| anyhow!("reader.read_line failed: {}", e))? == 0` instead = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#absurd_extreme_comparisons = note: `#[deny(clippy::absurd_extreme_comparisons)]` on by default ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	495e012160	libs: Fix clippy `redundant_field_names` Manually fix `redundant_field_names` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: redundant field names in struct initialization --> mem-agent/src/memcg.rs:441:13 \| 441 \| numa_id: numa_id, \| ^^^^^^^^^^^^^^^^ help: replace it with: `numa_id` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_field_names = note: `-D clippy::redundant-field-names` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::redundant_field_names)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	96c1175580	libs: Fix clippy `manual_strip` Manually fix `manual_strip` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: stripping a prefix manually --> mem-agent/src/mglru.rs:284:29 \| 284 \| u32::from_str_radix(&content[2..], 16) \| ^^^^^^^^^^^^^ \| note: the prefix was tested here --> mem-agent/src/mglru.rs:283:13 \| 283 \| let r = if content.starts_with("0x") { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_strip = note: `-D clippy::manual-strip` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_strip)]` help: try using the `strip_prefix` method \| 283 ~ let r = if let Some(<stripped>) = content.strip_prefix("0x") { 284 ~ u32::from_str_radix(<stripped>, 16) \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	2dc0b14512	libs: Fix clippy `field_reassign_with_default` Manually fix `field_reassign_with_default` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: field assignment outside of initializer for an instance created with Default::default() --> mem-agent/src/memcg.rs:874:21 \| 874 \| numa_cg.numa_id = numa; \| ^^^^^^^^^^^^^^^^^^^^^^^ \| note: consider initializing the variable with `memcg::CgroupConfig { numa_id: numa, ..Default::default() }` and removing relevant reassignments --> mem-agent/src/memcg.rs:873:21 \| 873 \| let mut numa_cg = CgroupConfig::default(); \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#field_reassign_with_default = note: `-D clippy::field-reassign-with-default` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::field_reassign_with_default)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	b399ac7f6d	libs: Fix clippy `derivable_impls` Fix `derivable_impls` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this `impl` can be derived --> mem-agent/src/memcg.rs:123:1 \| 123 \| / impl Default for CgroupConfig { 124 \| \| fn default() -> Self { 125 \| \| Self { 126 \| \| no_subdir: false, ... \| 132 \| \| } \| \|_^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#derivable_impls = note: `-D clippy::derivable-impls` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::derivable_impls)]` help: replace the manual implementation with a derive attribute \| 117 + #[derive(Default)] 118 ~ pub struct CgroupConfig { \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	55bafa257d	libs: Fix clippy `redundant_pattern_matching` Fix `redundant_pattern_matching` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: redundant pattern matching, consider using `is_some()` --> mem-agent/src/memcg.rs:595:40 \| 595 \| ... if let Some(_) = config_map.get_mut(path) { \| -------^^^^^^^--------------------------- help: try: `if config_map.get_mut(path).is_some()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_pattern_matching = note: `-D clippy::redundant-pattern-matching` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::redundant_pattern_matching)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	a9f415ade5	libs: Fix clippy `needless_bool` Fix `needless_bool` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this if-then-else expression returns a bool literal --> mem-agent/src/memcg.rs:855:17 \| 855 \| / if configs.is_empty() { 856 \| \| true 857 \| \| } else { 858 \| \| false 859 \| \| } \| \|_________________^ help: you can reduce it to: `configs.is_empty()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_bool = note: `-D clippy::needless-bool` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::needless_bool)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	6959bc1b3c	libs: Fix clippy `for_kv_map` Fix `for_kv_map` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: you seem to want to iterate on a map's keys --> mem-agent/src/memcg.rs:822:43 \| 822 \| for (single_config, _) in &secs_map.cgs { \| ^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#for_kv_map help: use the corresponding method \| 822 \| for single_config in secs_map.cgs.keys() { \| ~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	702665ee8b	libs: Fix clippy `manual_map` Fix `manual_map` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: manual implementation of `Option::map` --> mem-agent/src/memcg.rs:375:21 \| 375 \| / if let Some(hmg) = hmg.get(&(numa_id as usize)) { 376 \| \| Some((numa_id, Numa::new(hmg, path, psi_path))) 377 \| \| } else { 378 \| \| None 379 \| \| } \| \|_____________________^ help: try: `hmg.get(&(numa_id as usize)).map(\|hmg\| (numa_id, Numa::new(hmg, path, psi_path)))` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_map = note: `-D clippy::manual-map` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_map)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	b47a382d00	libs: Fix clippy `into_iter_on_ref` Fix `into_iter_on_ref` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this `.into_iter()` call is equivalent to `.iter_mut()` and will not consume the `Vec` --> mem-agent/src/memcg.rs:1122:27 \| 1122 \| for info in infov.into_iter() { \| ^^^^^^^^^ help: call directly: `iter_mut` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#into_iter_on_ref = note: `-D clippy::into-iter-on-ref` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::into_iter_on_ref)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	2986eb3a78	libs: Fix clippy `legacy_numeric_constants` Fix `legacy_numeric_constants` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: usage of a legacy numeric constant --> mem-agent/src/compact.rs:132:47 \| 132 \| if self.config.compact_force_times == std::u64::MAX { \| ^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#legacy_numeric_constants help: use the associated constant instead \| 132 \| if self.config.compact_force_times == u64::MAX { \| ~~~~~~~~ ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	3d146a525c	libs: Fix clippy `single_component_path_imports` Fix `single_component_path_imports` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this import is redundant --> mem-agent/src/mglru.rs:345:5 \| 345 \| use slog_term; \| ^^^^^^^^^^^^^^ help: remove it entirely \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_component_path_imports ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	b84a03e434	libs: Fix clippy `from_str_radix_10` Fix `from_str_radix_10` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this call to `from_str_radix` can be replaced with a call to `str::parse` --> mem-agent/src/mglru.rs:29:14 \| 29 \| let id = usize::from_str_radix(words[1], 10) \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `words[1].parse::<usize>()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#from_str_radix_10 = note: `-D clippy::from-str-radix-10` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::from_str_radix_10)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	ded6f2d116	libs: Fix clippy `needless_borrow` Fix `needless_borrow` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this expression creates a reference which is immediately dereferenced by the compiler --> mem-agent/src/memcg.rs:1100:52 \| 1100 \| self.run_eviction_single_config(infov, &config)?; \| ^^^^^^^ help: change this to: `config` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	541436c82c	libs: Fix clippy `ptr_arg` Fix `ptr_arg` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: writing `&PathBuf` instead of `&Path` involves a new object where a slice will do --> mem-agent/src/memcg.rs:367:19 \| 367 \| psi_path: &PathBuf, \| ^^^^^^^^ help: change this to: `&Path` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg = note: requested on the command line with `-D clippy::ptr-arg` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	cdd94060f1	libs: Fix clippy `crate_in_macro_def` Fix `crate_in_macro_def` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: `crate` references the macro call's crate --> mem-agent/src/misc.rs:12:22 \| 12 \| slog::error!(crate::misc::sl(), "{}", format_args!($($arg)*)) \| ^^^^^ help: to reference the macro definition's crate, use: `$crate` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#crate_in_macro_def = note: `-D clippy::crate-in-macro-def` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::crate_in_macro_def)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	150aee088d	libs: Fix clippy `len_zero` Fix `len_zero` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: length comparison to zero --> mem-agent/src/memcg.rs:225:61 \| 225 \| let (keep, moved) = vec.drain(..).partition(\|c\| c.numa_id.len() > 0); \| ^^^^^^^^^^^^^^^^^^^ help: using `!is_empty` is clearer and more explicit: `!c.numa_id.is_empty()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#len_zero ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	1a0935d35c	libs: Fix clippy `bool_assert_comparison` Fix `bool_assert_comparison` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: used `assert_eq!` with a literal bool --> mem-agent/src/memcg.rs:1378:9 \| 1378 \| assert_eq!(m.get_timeout_list().len() > 0, true); \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bool_assert_comparison = note: `-D clippy::bool-assert-comparison` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::bool_assert_comparison)]` help: replace it with `assert!(..)` \| 1378 - assert_eq!(m.get_timeout_list().len() > 0, true); 1378 + assert!(m.get_timeout_list().len() > 0); \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	75171b0cb7	libs: Fix clippy `collapsible_else_if` Fix `collapsible_else_if` clippy warning as suggested by rust 1.85.1, since `mem-agent` is now a member of `libs` workspace. ```console error: this `else { if .. }` block can be collapsed --> mem-agent/src/agent.rs:205:16 \| 205 \| } else { \| ________________^ 206 \| \| if mas.refresh() { 207 \| \| continue; 208 \| \| } 209 \| \| } \| \|_________^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#collapsible_else_if = note: `-D clippy::collapsible-else-if` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::collapsible_else_if)]` help: collapse nested if block \| 205 ~ } else if mas.refresh() { 206 + continue; 207 + } \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	f605097daa	libs: Make `mem-agent` a member of `libs` workspace Add `mem-agent` to `libs` workspace and sort the members list. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	7bb28d8da7	libs: Move `mem-agent` into `src/libs` `mem-agent` now does not ship example binaries and serves as a library for `agent` to reference, so we move it into `libs` to better manage it. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Ruoqing He	f0e223c535	mem-agent: Rename `mem-agent-lib` to `mem-agent` Rename `mem-agent-lib` to `mem-agent` before we move it into `src/libs`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-10-22 03:26:35 +00:00
Dan Mihai	d7176ffcc8	tests: k8s-sandbox-vcpus-allocation generated policy Auto-generate policy for k8s-sandbox-vcpus-allocation.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 21:36:49 +00:00
Dan Mihai	25299bc2a9	tests: k8s-block-volume.bats generated policy Auto-generate policy for k8s-block-volume.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 21:36:40 +00:00
Dan Mihai	02a8ec0f63	tests: k8s-measured-rootfs auto generated policy Generate Agent Policy for the pod from k8s-measured-rootfs.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 21:36:27 +00:00
Zvonko Kaiser	1ff8b066c6	Merge pull request #11941 from fidencio/topic/kata-deploy-add-missing-helm-docs helm: Add missing documentation	2025-10-21 16:04:55 -04:00
Dan Mihai	ebaecbd3d6	Merge pull request #11949 from microsoft/danmihai1/optional-secret-volume genpolicy: allow optional secret volumes	2025-10-21 12:27:13 -07:00
Aurélien Bombo	d01fa478ad	Merge pull request #11948 from kata-containers/sprt/fix-go-download tests: Install Go from reliable mirror	2025-10-21 14:00:09 -05:00
Aurélien Bombo	89e976e413	Merge pull request #11955 from kata-containers/sprt/refresh-oidc-before-delete ci: Always refresh OIDC token before cluster deletion	2025-10-21 13:52:24 -05:00
Dan Mihai	f11853ab33	tests: k8s-optional-empty-secret.bats policy Auto-generate policy in k8s-optional-empty-secret.bats, now that genpolicy supports optional secret-based volumes. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 15:27:31 +00:00
Dan Mihai	346e1c1db6	genpolicy: allow optional secret volumes Don't reject Secret volumes defined as optional during policy generation. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 15:27:31 +00:00
Aurélien Bombo	785afb1dec	Merge pull request #11885 from kata-containers/sprt/block-dev-hostpath docs: Document behavior of `BlockDevice` hostPath, procfs, and sysfs mounts	2025-10-21 09:38:27 -05:00
Aurélien Bombo	b7f542443e	ci: Always refresh OIDC token before cluster deletion This forces OIDC token refresh even if the tests step failed, so that we also have proper credentials to delete the cluster in that case. I first noticed the original issue here: https://github.com/kata-containers/kata-containers/actions/runs/18659064688/job/53215379040?pr=11950 Fixes: #11953 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-21 09:35:52 -05:00
Fabiano Fidêncio	552378cf1e	helm: Add missing documentation We've recently added support for: * deploying and setting up a snapshotter, via _experimentalSetupSnapshotter * enabling experimental_force_guest_pull, via _experimentalForceGuestPull However, we never updated the documentation for those, thus let's do it now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-21 16:20:21 +02:00
Greg Kurz	43455774ce	Merge pull request #11939 from ldoktor/ocp-helm-sudo ci.ocp: Install helm in local dir	2025-10-21 16:12:41 +02:00
Aurélien Bombo	93eef5b253	docs: Document behavior of procfs and sysfs mounts The claims in the doc come from #808 and #886. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-21 08:50:06 -05:00
Aurélien Bombo	033299e46d	docs: Document behavior of BlockDevice hostPath volumes This is a follow-up to #11832. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-21 08:50:06 -05:00
Aurélien Bombo	22aa27ff5e	tests: Install Go from reliable mirror Downloading Go from storage.googleapis.com fails intermittently with a 403 (see error below) so we switch to go.dev as referenced at https://go.dev/dl/. /tmp/install-go-tmp.Rw5Q4thEWr ~/work/kata-containers/kata-containers /usr/bin/go [install_go.sh:85] INFO: removing go version go1.24.9 linux/amd64 [install_go.sh:94] INFO: Download go version 1.24.6 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 100 298 100 298 0 0 2610 0 --:--:-- --:--:-- --:--:-- 2614 [install_go.sh:97] INFO: Install go gzip: stdin: not in gzip format tar: Child returned status 1 tar: Error is not recoverable: exiting now [install_go.sh:99] ERROR: sudo tar -C /usr/local/ -xzf go1.24.6.linux-amd64.tar.gz https://github.com/kata-containers/kata-containers/actions/runs/18602801597/job/53045072109?pr=11947#step:5:17 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-21 08:47:41 -05:00
Manuel Huber	af34308c83	gpu: remove version suffixes for imex and nscq This change ensures that the NVIDIA package repository for nvidia-imex and libnvidia-nscq is being used as source. The NVIDIA repository does not publish these packages with a -580 version suffix, which made us fall back to the packages from the Ubuntu repository. These two packages were recently updated by Ubuntu to depend on nvidia-kernel-common-580-server (this happened from version 580.82.07-0ubuntu1 to version 580.95.05-0ubuntu1). This conflicts with nvidia-kernel-common-580 which gets installed by nvidia-headless-no-dkms-580-open, thus causing a build failure. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-21 15:42:51 +02:00
Lukáš Doktor	5038578fba	ci.ocp: Install helm in local dir In CI, helm is not yet installed and we don't have root access. Let's use the current dir, which should be writable, and the --no-sudo option to install it. Note that when helm is already installed, this should not change anything and simply use the system-wide installation. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-10-21 06:28:36 +02:00
Steve Horsman	947862f804	Merge pull request #11904 from manuelh-dev/mahuber/conf-rootfs-nv-guest-pull gpu: nvidia rootfs build with guest pull support	2025-10-17 16:08:05 +01:00
Steve Horsman	94b6a1d43e	Merge pull request #10664 from kevinzs2048/add-cca runtime-go \| kata-deploy: Add Arm CCA confidential Guest Support	2025-10-17 14:38:34 +01:00
Manuel Huber	4ad8c31b5a	gpu: build nv rootfs with guest pull support While the local-build folder's Makefile dependencies for the confidential nvidia rootfs targets already declare the pause image and coco-guest-components dependencies, the actual rootfs composition does not contain the pause image bundle and relevant certificates for guest pull. This change ensures the rootfs gets composed with the relevant files. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-16 09:20:49 -07:00
Aurélien Bombo	edbb4b633c	Merge pull request #11890 from microsoft/saulparedes/optional_initdata genpolicy: take path to initdata from command line if provided	2025-10-16 11:04:57 -05:00
Markus Rudy	d5cb9764fd	kata-types: use pretty TOML encoder for initdata TOML was chosen for initdata particularly for the ability to include policy docs and other configuration files without mangling them. The default TOML encoding renders string values as single-line, double-quoted strings, effectively depriving us of this feature. This commit changes the encoding to use `to_string_pretty`, and includes a test that verifies the desirable aspect of encoding: newlines are kept verbatim. Fixes: #11943 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-16 12:08:18 +02:00
Kevin Zhao	141070b388	Kata-deploy: Add kata-deploy set up for qemu-cca Support launch qemu-cca in Kata-deploy. Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:24:52 +08:00
Kevin Zhao	af919686ab	Kata-deploy: Add CCA firmware build support runtime: pass firmware to CCA Realm Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:24:45 +08:00
Kevin Zhao	16e91bfb21	kata-deploy: Add support for Arm CCA Qemu build The Qemu support is picked up from: https://git.codelinaro.org/linaro/dcap/qemu.git, branch: cca/2025-04-16 More info regarding the CCA software stack dev and test, please refer to link: https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/Building+an+RME+stack+for+QEMU Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:24:08 +08:00
Seunguk Shin	c7d5f207f1	kata-deploy: support build confidential rootfs and initrd for CCA Also add cca-attester for coco-guest-component Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org> Co-authored-by: Seunguk Shin <seunguk.shin@arm.com>	2025-10-16 17:24:03 +08:00
Seunguk Shin	40dac78412	kata-deploy: support build confidential kernel and shim-v2 for CCA With Arm CCA support, building the runtime relies on the kernel's kvm.h headers. The required kernel headers are currently much newer than the ones traditionally available, so we build the kernel headers first and then inject them into the shim-v2 build container. Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org> Co-authored-by: Seunguk Shin <seunguk.shin@arm.com>	2025-10-16 17:23:58 +08:00
Kevin Zhao	bfa7f2486d	runtime: Add Arm64 CCA confidential Guest Support This commit adds support for Arm CCA/RME in the golang runtime. The guest kernel has been supported since Linux 6.13. The host kernel on which Kata runs is picked from: https://gitlab.arm.com/linux-arm/linux-cca branch: cca-host/v8, which is currently very stable, has been under review for a while, and is expected to be merged this year. The Qemu support is picked up from: https://git.codelinaro.org/linaro/dcap/qemu.git, branch: cca/2025-05-28. The Qemu support will be merged upstream once CCA host support is officially available in the Linux kernel. For more info regarding the CCA software stack dev and test, please refer to: https://linaro.atlassian.net/wiki/spaces/QEMU/pages/29051027459/Building+an+RME+stack+for+QEMU Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-10-16 17:23:54 +08:00
stevenhorsman	9b086376a4	tests/k8s: Skip initdata tests on tdx The new initdata variants of the tests are failing on the tdx runner, so as discussed, skip them for now: Issue #11945 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-15 14:52:08 +01:00
stevenhorsman	09149407fd	tests/k8s: Delete k8s-initdata.bats Now we have wider coverage of initdata testing in k8s-guest-pull-image-signature.bats then remove the old testing. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-15 14:52:08 +01:00
stevenhorsman	bdc0a3cf19	tests/k8s: Add initdata variant of registry creds tests Our current set of authenticated registry tests involve setting kernel_params to config the image pull process, but as of kata-containers#11197 this approach is not the main way to set this configuration and the agent config has been removed. Instead we should set the configuration in the `cdh.toml` part of the initdata, so add new test cases for this. In future, when we have been through the deprecation process, we should remove the old tests Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-15 14:52:08 +01:00
stevenhorsman	7fbbd170ee	tests/k8s: Add initdata variants of oci signature tests Our current set of signature tests involve setting kernel_parameters to config the image pull process, but as of https://github.com/kata-containers/kata-containers/pull/11197 this approach is not the main way to set this configuration and the agent config has been removed. Instead we should set the configuration in the `cdh.toml` part of the initdata, so add new test cases for this. In future, when we have been through the deprecation process, we should remove the old tests Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-15 14:52:08 +01:00
stevenhorsman	90ad5cd884	tests/k8s: Refactor initdata annotation Create a shared get_initdata method that injects a cdh image section, so we don't duplicate the initdata structure everywhere Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-15 14:52:08 +01:00
Fabiano Fidêncio	aa7e46b5ed	tests: Check the multi-snapshotter situation on containerd One problem that we've been having for a reasonable amount of time, is containerd not behaving very well when we have multiple snapshotters. Although I'm adding this test with my "CoCo" hat in mind, the issue can happen easily with any other case that requires a different snapshotter (such as, for instance, firecracker + devmapper). With this in mind, let's do some stability tests, checking every hour a simple case of running a few pre-defined containers with runc, and then running the same containers with kata. This should be enough to put us in the situation where containerd gets confused about which snapshotter owns the image layers, and break on us (or not break and show us that this has been solved ...). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-15 13:35:43 +02:00
Manuel Huber	8221361915	gpu: Use variable to differentiate rootfs variants With this change we namespace the stage one rootfs tarball name and use the same name across all uses. This will help overcome several subtle local build problems. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-15 12:39:44 +02:00
Hyounggyu Choi	88c333f2a6	agent: Fix race in tests calling LinuxContainer::new() We fix the following error: ``` thread 'sandbox::tests::add_and_get_container' panicked at src/sandbox.rs:901:10: called `Result::unwrap()` on an `Err` value: Create cgroupfs manager Caused by: 0: fs error caused by: Os { code: 17, kind: AlreadyExists, message: "File exists" } 1: File exists (os error 17) note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` by ensuring that the cgroup path is unique for tests run in the same millisecond. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-15 11:32:22 +02:00
Hyounggyu Choi	8412af919d	agent/netlink: Attempt to fix ARP and routes tests test_add_one_arp_neighbor ========================= We attempt to fix the following error: ``` thread 'netlink::tests::test_add_one_arp_neighbor' panicked at src/netlink.rs:1163:9: assertion `left == right` failed left: "" right: "192.0.2.127 lladdr 6a:92:3a:59:70:aa PERMANENT" ``` by adding a sleep to prepare_env_for_test_add_one_arp_neighbor() to wait for the kernel interfaces to settle. list_routes =========== We attempt to fix the following error (notice that the available devices contain "dummy_for_arp"): ``` thread 'netlink::tests::list_routes' panicked at src/netlink.rs:986:14: Failed to list routes: available devices: [Interface { device: "", name: "lo", IPAddresses: [IPAddress { family: v6, address: "127.0.0.1", mask: "8", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v6, address: "169.254.1.1", mask: "31", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "2001:db8:85a3::8a2e:370:7334", mask: "128", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "::1", mask: "128", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 65536, hwAddr: "00:00:00:00:00:00", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "enc0", IPAddresses: [IPAddress { family: v6, address: "10.249.65.4", mask: "24", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::4ff:fe57:b3e4", mask: "64", 
special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "02:00:04:57:B3:E4", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "docker0", IPAddresses: [IPAddress { family: v6, address: "172.17.0.1", mask: "16", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::42:56ff:fe5c:d9f9", mask: "64", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "02:42:56:5C:D9:F9", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, Interface { device: "", name: "dummy_for_arp", IPAddresses: [IPAddress { family: v6, address: "192.0.2.2", mask: "24", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }, IPAddress { family: v4, address: "fe80::f4f2:64ff:fe46:2b01", mask: "64", special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }], mtu: 1500, hwAddr: "4A:73:DE:A3:07:64", devicePath: "", type_: "", raw_flags: 0, special_fields: SpecialFields { unknown_fields: UnknownFields { fields: None }, cached_size: CachedSize { size: 0 } } }] Caused by: 0: error looking up device 19888 1: Received a netlink error message No such device (os error 19) ``` by calling clean_env_for_test_add_one_arp_neighbor() at the start of the test. However this fix is uncertain: the original assumption for the fix was that the "dummy_for_arp" interface left over from test_add_one_arp_neighbor was the cause of the error. 
But (3) below shows that running list_routes in isolation while that interface is present is NOT enough to repro the error: 1. Running all tests + no clean_env in list_routes => list_routes FAILS (before this PR) 2. Running all tests + clean_env in list_routes => list_routes PASSES (after this PR) 3. Running only list_routes + dummy_for_arp present => list_routes PASSES (manual test, see below) ``` $ ip a l 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet 169.254.1.1/31 brd 169.254.1.1 scope global lo valid_lft forever preferred_lft forever inet6 2001:db8:85a3::8a2e:370:7334/128 scope global valid_lft forever preferred_lft forever inet6 ::1/128 scope host noprefixroute valid_lft forever preferred_lft forever 2: enc0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 02:00:01:02:e2:47 brd ff:ff:ff:ff:ff:ff inet 10.240.64.4/24 metric 100 brd 10.240.64.255 scope global dynamic enc0 valid_lft 159sec preferred_lft 159sec inet6 fe80::1ff:fe02:e247/64 scope link valid_lft forever preferred_lft forever 311: dummy_for_arp: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000 link/ether ee:79:66:3a:dc:bc brd ff:ff:ff:ff:ff:ff inet 192.0.2.2/24 scope global dummy_for_arp valid_lft forever preferred_lft forever inet6 fe80::4c2e:83ff:fe7d:ef00/64 scope link valid_lft forever preferred_lft forever $ sudo -E PATH=$PATH make test ../../utils.mk:162: "WARNING: s390x-unknown-linux-musl target is unavailable" Finished `test` profile [unoptimized + debuginfo] target(s) in 0.25s Running unittests src/main.rs (target/s390x-unknown-linux-gnu/debug/deps/kata_agent-b2b5b200deca712e) running 1 test test netlink::tests::list_routes ... ok test result: ok. 
1 passed; 0 failed; 0 ignored; 0 measured; 224 filtered out; finished in 0.00s ``` Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-15 11:32:22 +02:00
Paul Meyer	06ed957a45	virtcontainers: fix nydus cleanup on rootfs unmount This was discovered by @sprt in https://github.com/kata-containers/kata-containers/pull/10243#discussion_r2373709407. Checking for state.Fstype makes no sense as we know it is empty. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-10-15 09:22:51 +02:00
Zvonko Kaiser	10f8ec0c20	cdi: Add Crate remove Github Hash Use CDI exclusively from crates.io and not from a GH repository. Cargo can easily check if a new version is available, and we can bump it far more easily if needed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-15 09:22:20 +02:00
Greg Kurz	3507b2038e	Merge pull request #11936 from ldoktor/ocp-helm ci.ocp: Use helm to install kata	2025-10-14 18:22:28 +02:00
Lukáš Doktor	bdb0afc4e0	ci.ocp: Fix incorrectly quoted argument With the shellcheck fixes we accidentally quoted the "-n NAMESPACE" argument where we should have used an array instead, which led to oc considering this as a pod name and returning an error. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-10-14 17:59:33 +02:00
Lukáš Doktor	f891f340bc	ci.ocp: Use helm to install kata which is the current supported way to deploy kata-containers directly. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-10-14 17:59:33 +02:00
Aurélien Bombo	0c6fcde198	Merge pull request #11918 from fidencio/topic/builds-qemu-use-liburing-newer-than-2.2 builds: qemu: Use a liburing newer than 2.2	2025-10-14 10:17:16 -05:00
Steve Horsman	363701d767	Merge pull request #11915 from stevenhorsman/ibm-runner-followups-part-i ci: Add protobuf-compiler dependencies	2025-10-14 13:28:45 +01:00
Fabiano Fidêncio	2ad81c4797	build: qemu: Fix cache logic We need to ensure that any change on the Dockerfile (and its dir) leads to the build being retriggered, rather than using the cached version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-14 12:17:43 +02:00
Fabiano Fidêncio	2f73e34e33	builds: qemu: Use a liburing newer than 2.2 Due to a potential regression introduced by: `984a32f17e (565f3835aaed6321caab4f7c4f8560a687f6000b_379_386)` Reported-by: Aurélien Bombo <abombo@microsoft.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-14 12:17:28 +02:00
stevenhorsman	8ce714cf97	ci: Add protobuf-compiler dependencies We are seeing more protoc related failures on the new runners, so try adding the protobuf-compiler dependency to these steps to see if it helps. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-10-14 10:58:58 +01:00
Fabiano Fidêncio	b0b0038689	versions: Bump QEMU to 10.1.1 QEMU 10.1.1 was released on October 8th, 2025, let's bump it on our side. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 23:52:01 +02:00
Fabiano Fidêncio	d46474cfc0	tests: Run apt-get update before installing a package Otherwise it'll just break. :-) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 23:33:46 +02:00
Saul Paredes	ba7a5953c8	tests: k8s-policy-pod.bats: test unspecified initdata path use auto_generate_policy_no_added_flags, so we don't pass --initdata-path to genpolicy Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Saul Paredes	395f237fc2	tests: k8s: use default-initdata.toml when auto-generating policy - copy default-initdata.toml in create_tmp_policy_settings_dir, so it can be modified by other tests if needed - make auto_generate_policy use default-initdata.toml by default - add auto_generate_policy_no_added_flags, so it may be used by tests that don't want to use default-initdata.toml by default Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Saul Paredes	dfd269eb87	genpolicy: take path to initdata from command line if provided Otherwise use default initdata. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-10-13 10:47:53 -07:00
Fabiano Fidêncio	fb43d3419f	build: Fix nvidia kernel breakage On commit `9602ba6ccc`, from February this year, we've introduced a check to ensure that the files needed for signing the kernel build are present. However, we've noticed last week that there were a reasonable amount of wrong assumptions with the workflow. :-) Zvonko fixed the majority of those, but this bit was left and it'd cause breakages when using kernel that was cached ... although passing when building new kernels. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-13 19:28:40 +02:00
Fupan Li	8b06f3d95d	Merge pull request #11905 from Apokleos/coldplug-scsidev runtime-rs: Support virtio-scsi for initdata within non-TEE	2025-10-11 16:11:39 +08:00
Xuewei Niu	5acb6d8e13	Merge pull request #11863 from lifupan/fupan_blk_remove runtime-rs: ad the block device hot unplug for clh	2025-10-11 10:31:48 +08:00
Aurélien Bombo	ff973a95c8	Merge pull request #11916 from zvonkok/fix-kernel-module-signing gpu: Fix kernel module signing	2025-10-10 17:17:08 -05:00
Zvonko Kaiser	b00013c717	kernel: Add KBUILD_SIGN_PIN pass through This is needed so that the kernel setup picks up the correct config values from our fragments directories. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-10 15:45:34 -04:00
Zvonko Kaiser	37bd5e3c9d	gpu: Add kernel CONFIG check We need to make sure that the kernel we're using has the correct configs set, otherwise the module signing will not work. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-10 15:45:34 -04:00
Fabiano Fidêncio	e782d1ad50	ci: k8s: Test experimental_force_guest_pull Now that we have added the ability to deploy kata-containers with experimental_force_guest_pull configured, let's make sure we test it to avoid any kind of regressions. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 20:08:10 +02:00
Fabiano Fidêncio	1bc89d09ae	tests: Consider SNAPSHOTTER in the cluster name Otherwise we have no way to differentiate running tests on qemu-coco-dev with different snapshotters. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 20:08:10 +02:00
Fabiano Fidêncio	496e255ea2	build: Fix KBUILD_SIGN_PIN usage What was done in the past, trying to set the env var on the same step it'd be used, simply does not work. Instead, we need to properly set it through the `env` set up, as done now. We're also bumping the kata_config_version to ensure we retrigger the kernel builds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 15:25:10 +02:00
Paul Meyer	5ae891ab46	versions: bump opa 1.6.0 -> 1.9.0 Bumping opa to latest release. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-10-10 10:58:51 +02:00
Steve Horsman	a570fdc0fd	Merge pull request #11909 from kata-containers/ibm-runners-test ci: Enable new ibm runners	2025-10-10 09:42:53 +01:00
stevenhorsman	8dcd91cf5f	ci: Enable new ibm runners We have some scalable s390x and ppc runners, so start to use them for build and test, to improve the throughput of our CI Signed-off-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-10-10 09:42:06 +01:00
Fabiano Fidêncio	06a3bbdd44	ci: k8s: coco: Add "Report tests" step For some reason we didn't have the "Report tests" step as part of the TEE jobs. This step immensely helps to check which tests are failing and why, so let's add it while touching the workflow. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 09:51:59 +02:00
Fabiano Fidêncio	a1f90fe350	tests: k8s: Unify k8s TEE tests There's no reason to have the code duplication between the SNP / TDX tests for CoCo, as those are basically using the same configuration nowadays. Note that for the TEEs case, as the nydus-snapshotter is deployed by the admin, once, instead of deploying it on every run ... I'm actually removing the nydus-snapshotter steps so we make it clear that those steps are not performed by the CI. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-10 09:51:59 +02:00
Alex Lyn	4c386b51d9	runtime-rs: Add support for handling virtio-scsi devices As virtio-scsi has been set as the default block device driver, the runtime also needs to correctly handle the virtio-scsi info, especially the SCSI address required by the kata-agent handling logic. Getting the scsi_addr and assigning it to the kata agent device id is enough, and this commit does just that. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-10 11:31:04 +08:00
Fupan Li	4002a91452	runtime-rs: ad the block device hot unplug for clh Since runtime-rs supports block device hotplug when creating new containers, and the device should also be removed when the container is stopped, add block device unplug for clh. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-10-10 10:02:12 +08:00
Zvonko Kaiser	afbec780a9	Merge pull request #11903 from zvonkok/ppcie gpu: PPCIE support DGX like systems	2025-10-09 21:06:41 -04:00
Aurélien Bombo	a3a45429f6	Merge pull request #11865 from microsoft/danmihai1/nested-configmap-secret tests: k8s-nested-configmap-secret policy	2025-10-09 11:33:50 -05:00
Alex Lyn	b42ef09ffb	Merge pull request #11888 from spuzirev/main runtime: fix "num-queues expects uint64" error with virtio-blk	2025-10-09 20:21:32 +08:00
Xuewei Niu	2a43bf37ed	Merge pull request #11894 from M-Phansa/main runtime: fix device typo	2025-10-09 16:53:40 +08:00
Alex Lyn	a54d95966b	runtime-rs: Support virtio-scsi for initdata within non-TEE This commit introduces support for selecting `virtio-scsi` as the block device driver for QEMU during initial setup. The primary goal is to resolve a conflict in non-TEE environments: 1. The global block device configuration defaults to `virtio-scsi`. 2. The `initdata` device driver was previously designed and hardcoded to `virtio-blk-pci`. 3. This conflict prevented unified block device usage. By allowing `virtio-scsi` to be configured at cold boot, the `initdata` device can now correctly adhere to the global setting, eliminating the need for a hardcoded driver and ensuring consistent block device configuration across all supported devices (excluding rootfs). Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-09 15:52:33 +08:00
Xuewei Niu	5208ee4ec0	Merge pull request #11674 from was-saw/dragonball_seccomp runtime-rs: add seccomp support for dragonball	2025-10-09 15:01:15 +08:00
wangxinge	8e1b33cc14	docs: add document for seccomp This commit adds a document to use seccomp in runtime-rs Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
wangxinge	2abf6965ff	dragonball: add seccomp support for dragonball This commit modifies seccomp framework to support different restrictions for different threads. Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
wangxinge	bb6fb8ff39	runtime-rs: add seccomp support for dragonball The implementation of the seccomp feature in Dragonball currently has a basic framework, but the actual restriction rules are empty. This pull request includes the following changes: - Modify configuration files to relevant configuration files. - Modify seccomp framework to support different restrictions for different threads. - Add new seccomp rules for the modified framework. This commit primarily implements changes 1 and 3 for runtime-rs. Fixes: #11673 Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-10-09 13:25:17 +08:00
Zvonko Kaiser	91739d4425	gpu: PPCIE support DGX like systems For DGX like systems we need additional binaries and libraries, enable the Kata AND CoCo use-case. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Update tools/osbuilder/rootfs-builder/nvidia/nvidia_rootfs.sh Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-09 00:00:12 +00:00
Dan Mihai	364d3cded0	tests: k8s-nested-configmap-secret policy Add auto-generated agent policy in k8s-nested-configmap-secret.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-08 23:37:54 +00:00
Sergei Puzyrev	62b12953c7	runtime: fix "num-queues expects uint64" error with virtio-blk Unneeded type-conversion was removed. Fixes #11887 Signed-off-by: Sergei Puzyrev <spuzirev@gmail.com>	2025-10-08 17:09:22 -05:00
Adeet Phanse	4e4f9c44ae	runtime: fix device typo Fix device typo in dragonball / runtime-rs / runtime. Signed-off-by: Adeet Phanse <adeet.phanse@mongodb.com>	2025-10-08 17:08:27 -05:00
Aurélien Bombo	d954932876	Merge pull request #11883 from kata-containers/sprt/zizmor-fixes3 ci: zizmor: Address all issues	2025-10-08 17:01:48 -05:00
Aurélien Bombo	07645cf58b	ci: actionlint: Address issues and set as required Address issues just introduced and set actionlint as a required check by removing the path filter. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 16:55:27 -05:00
Aurélien Bombo	b3a551d438	ci: zizmor: Reestablish as required test We can re-require this now that we've addressed all the issues. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 16:55:27 -05:00
Aurélien Bombo	5a4ddb8c71	ci: zizmor: Fix all `template-injection` alerts Fix all instances of template injection by using environment variables as recommended by Zizmor, instead of directly injecting values into the commands. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 16:55:26 -05:00
Aurélien Bombo	7b203d1b43	ci: zizmor: Ignore `dangerous-triggers` audit for known safe usage The two ignored cases are strictly necessary for the CI to work today, and we have various security mitigations in place. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 16:55:08 -05:00
Aurélien Bombo	7afdfc7388	ci: zizmor: Disable `undocumented-permissions` audit There are 62 such warnings and addressing them would take quite a bit of time so just disable them for now. help[undocumented-permissions]: permissions without explanatory comments --> ./.github/workflows/release.yaml:71:7 \| 71 \| packages: write \| ^^^^^^^^^^^^^^^ needs an explanatory comment 72 \| id-token: write \| ^^^^^^^^^^^^^^^ needs an explanatory comment 73 \| attestations: write \| ^^^^^^^^^^^^^^^^^^^ needs an explanatory comment \| = note: audit confidence → High Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 16:55:08 -05:00
Aurélien Bombo	889ba0d5db	Merge pull request #11901 from kata-containers/sprt/remove-docs-url-check gha: Fix `docs-url-alive-check` workflow	2025-10-08 14:42:58 -05:00
Aurélien Bombo	ec81ea95df	gha: Add `workflow_dispatch` trigger to `docs-url-alive-check` We can't test this PR because the workflow needs this trigger, so adding this will allow testing future PRs. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 14:39:34 -05:00
Aurélien Bombo	4d760e64ae	gha: Fix docs-url-alive-check workflow The Go installation step was broken because the checkout action was checking out the code in a subdirectory: https://github.com/kata-containers/kata-containers/actions/runs/18265538456/job/51999316919 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-08 14:39:34 -05:00
Aurélien Bombo	476c827fca	Merge pull request #11878 from kata-containers/sprt/privileged-docs docs: Document `privileged_without_host_devices=false` as unsupported	2025-10-08 11:12:45 -05:00
Fabiano Fidêncio	dbb1eb959c	kata-deploy: Allow users to set experimental_force_guest_pull For those who are not willing to use the nydus-snapshotter for pulling the image inside the guest, let's allow them to set experimental_force_guest_pull, introduced by Edgeless, as part of our helm-chart. This option can be set as: _experimentalForceGuestPull: "qemu-tdx,qemu-coco-dev" Which would then ensure that the configuration for `qemu-tdx` and `qemu-coco-dev` would have the option enabled. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 17:43:09 +02:00
Fabiano Fidêncio	8c4bad68a8	kata-deploy: Remove kustomize yamls, rely on helm-chart only As the kata-deploy helm chart has been the only way we've been testing kata-containers deployment as part of our CI, it's time to finally get rid of the kustomize yamls and avoid us having to maintain two different methods (with one of those not being tested). Here I removed: * kata-deploy yamls and kustomize yamls * kata-cleanup yamls and kustomize yamls * kata-rbac yamls and kustomize yamls * README.md for the kustomize yamls was removed Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 16:54:19 +02:00
Fabiano Fidêncio	3418cedacc	ci: Add tests for erofs-snapshotter (for coco-qemu-dev) erofs-snapshotter can be used to leverage sharing the image from the host to the guest without the need of a shared filesystem (such as virtio-fs or virtio-9p). This case is ideal for Confidential Computing enabled on Kata Containers, and we can immensely benefit from this snapshotter, thus let's test it as soon as possible so we can find issues, report bugs, and ask for enhancement requests. There are at least a few things that we know for sure to be problematic now: * Policy has to be adjusted to the erofs-snapshotter * There is no support for signed nor encrypted images * Tests that use the KBS are disabled for now Even with the limitations, I do believe we should be testing the snapshotter, so we can team up and get those limitations addressed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 10:34:09 +02:00
Fabiano Fidêncio	544f688104	tests: Add ability to deploy vanilla k8s with erofs As done in the previous commit, let's expand the vanilla k8s deployment to also allow the erofs host side configuration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 10:34:09 +02:00
Fabiano Fidêncio	3ac6579ca6	tests: Add support for deploying vanilla k8s We already have support for deploying a few flavours of k8s that are required for different tests we perform. Let's also add the ability to deploy vanilla k8s, as that will be very useful in the next commits in this series. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 10:34:09 +02:00
Fabiano Fidêncio	aa9e3fc3d5	versions: Update containerd active / latest versions The active version is 2.1.x, and the latest is 2.2.0-beta.0. The latest is what we'll be using to test if the "to be released" version of containerd works well for our use-cases. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 10:34:09 +02:00
Fabiano Fidêncio	287db1865f	tests: Relax regex used to install containerd Let's make sure that we can get non-official releases as well, otherwise we won't be able to test a coming release of containerd, to know whether it solves issues that we face or not, before it's actually released. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-08 10:34:09 +02:00
Zvonko Kaiser	59b4e3d3f8	gpu: Add CONFIG_FW_LOADER to the kernel We need it for the newer CC kernel Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-08 10:01:27 +02:00
Zvonko Kaiser	7061f64db5	gpu: Fix confidential build NVRC introduced the confidential feature flag and we haven't updated the rootfs build to accommodate it. If rootfs_type==confidential, use --feature=confidential Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-08 10:01:27 +02:00
Zvonko Kaiser	2260f66339	gpu: Some fixes regarding the rootfs v580 With the 580 driver version we need new dependencies in the rootfs. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-08 10:01:27 +02:00
Dan Mihai	08272ab673	Merge pull request #11884 from kata-containers/sprt/priv-test tests/k8s: Add test for privileged containers	2025-10-07 19:18:06 -07:00
Szymon Klimek	8dc6b24e7d	kata-deploy: accept 25.10 as supported distro for TDX Canonical TDX release is not needed for vanilla Ubuntu 25.10 but GRUB_CMDLINE_LINUX_DEFAULT needs to contain `nohibernate` and `kvm_intel.tdx=1` Signed-off-by: Szymon Klimek <szymon.klimek@intel.com>	2025-10-07 23:41:52 +02:00
Dan Mihai	650863039b	tests: k8s-volume: auto-generate policy Auto-generate the agent policy, instead of using the insecure "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-07 23:35:06 +02:00
Dan Mihai	5ed76b3c91	tests: k8s-volume: retry failed exec Use grep_pod_exec_output to retry possible failing "kubectl exec" commands. Other tests have been hitting such errors during CI in the past. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-07 23:35:06 +02:00
Dan Mihai	6ab59453ff	genpolicy: better parsing of mount path Mount paths ending in '/' were not parsed correctly. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-07 23:35:06 +02:00
Dan Mihai	ba792945ef	genpolicy: additional mount_source_allows logging Make policy errors related to storage mount sources easier to debug. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-07 23:35:06 +02:00
Aurélien Bombo	6e451e3da0	tests/k8s: Add test for privileged containers This adds an integration test to verify that privileged containers work properly when deploying Kata with kata-deploy. This is a follow-up to #11878. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-07 09:59:05 -05:00
Fabiano Fidêncio	f994bacf6c	tests: coco: Use the new way to set up nydus snapshotter Let's rely on kata-deploy setting up the nydus snapshotter for us, instead of doing this with external code. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	6f17125ea4	tests: Allow using the new way to deploy nydus-snapshotter This allows us to stop setting up the snapshotter ourselves, and just rely on kata-deploy to do so. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	000c9cce23	kata-deploy: chart: Add `_experimentalSetupSnapshotter` Let's expose the EXPERIMENTAL_SETUP_SNAPSHOTTER script environment variable to our chart, allowing users of our helm chart to take advantage of this experimental feature. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	d6a1881b8b	kata-deploy: scripts: Allow setting up multiple snapshotters We may deploy in scenarios where we want to have both snapshotters set up, sometimes even for a simple test of which one behaves better. With this in mind, let's allow EXPERIMENTAL_SETUP_SNAPSHOTTER to receive a comma-separated list of snapshotters, such as: ``` EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs,nydus" ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	445af6c09b	kata-deploy: scripts: Allow deploying erofs-snapshotters Similarly to what's been done for the nydus-snapshotter, let's allow users to have erofs-snapshotter set up by simply passing: ``` EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs". ``` Mind that erofs, although a built-in containerd snapshotter, has system dependencies that we will NOT install and it's up to the admin to do so. These dependencies are: * erofs-utils * fsverity * erofs module loaded Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	4359c7b15d	tests: Ensure the nydus-snapshotter versions are aligned In the previous commit we added the assumption that the nydus-snapshotter version should be the same in two different places. Now, with this test, we ensure those will always be in sync. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	2e0ce2f39f	kata-deploy: scripts: Allow deploying nydus-snapshotter Let's introduce a new EXPERIMENTAL_SETUP_SNAPSHOTTER environment variable that, when set, allows kata-deploy to put the nydus snapshotter in the correct place, and configure containerd accordingly. Mind, this is a stop gap till the nydus-snapshotter helm chart is ready to be used and behaving well enough to become a weak dependency of our helm chart. When that happens this code can be deleted entirely. Users can have nydus-snapshotter deployed and configured for the guest-pull use case by simply passing: ``` EXPERIMENTAL_SETUP_SNAPSHOTTER="nydus" ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	1e2c86c068	kata-deploy: scripts: Only add conf file to the imports once Otherwise we'd end up adding the file several times, which could lead to problems when removing the entry, leading to containerd not being able to start due to an import file not being present. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Fabiano Fidêncio	e1269afe8a	tests: Only use Authorization when GH_TOKEN is available The code, as it was, would lead to the following broken command: `--header "Authorization: Bearer: "` Let's only expand that part of the command if ${GH_TOKEN} is passed, otherwise we don't even bother adding it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-07 10:32:46 +02:00
Dan Mihai	5e46f814dd	Merge pull request #11832 from kata-containers/sprt/dev-hostpath runtime: Simplify mounting guest devices when using hostPath volumes	2025-10-06 12:36:36 -07:00
Steve Horsman	0d58bad0fd	Merge pull request #11840 from kata-containers/dependabot/cargo/src/tools/agent-ctl/astral-tokio-tar-0.5.5 build(deps): bump astral-tokio-tar from 0.5.2 to 0.5.5 in /src/tools/agent-ctl	2025-10-06 09:35:56 +01:00
Aurélien Bombo	6ff78373cf	docs: Document `privileged_without_host_devices=false` as unsupported Document that privileged containers with privileged_without_host_devices=false are not generally supported. When you try the above, the runtime will pass all the host devices to Kata in the OCI spec, and Kata will fail to create the container for various reasons depending on the setup, e.g.: - Attempting to hotplug uninitialized loop devices. - Attempting to remount /dev devices on themselves when the agent had already created them as default devices (e.g. /dev/full). - "Conflicting device updates" errors. - And more... privileged_without_host_devices was originally created to support Kata [1][2] and lots of people are having issues when it's set to false [3]. [1] https://github.com/kata-containers/runtime/issues/1568 [2] https://github.com/containerd/cri/pull/1225 [3] https://github.com/kata-containers/kata-containers/issues?q=is%3Aissue%20%20in%3Atitle%20privileged Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-02 15:21:19 -05:00
Fabiano Fidêncio	300f7e686e	build: Fix initramfs build We have noticed in the CI that the `gen_init_cpio ...` was returning 255 and breaking the build. Why? I am not sure. When chatting with Steve, he suggested to split the command, so it'd be easier to see what's actually breaking. But guess what? There's no breakage when we split the command. So, let's try it out and see whether the CI passes after it. If someone is willing to educate us on this one, please, that would be helpful! :-) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-02 20:58:22 +02:00
Zvonko Kaiser	2693daf503	gpu: Install dcgm export from the CUDA repo Do not use the repo to install the exporter, we rely on the version tested with Ubuntu <version> Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-02 18:05:13 +02:00
Zvonko Kaiser	56c6512781	gpu: Bump to noble and rearrange repos Moving the CUDA repo to the top for all essential packages and adding a repo priority favouring NVIDIA based repos. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-10-02 18:05:13 +02:00
Aurélien Bombo	eeecd6d72b	Merge pull request #11872 from kata-containers/sprt/rust-use-uninit agent/rustjail: Fix potentially uninitialized memory read in unsafe code	2025-10-02 10:39:25 -05:00
Manuel Huber	4b7c1db064	ci: Add test case for openvpn Introduce new test case which verifies that openvpn clients and servers can run as Kata pods and can successfully establish a connection. Volatile certificates and keys are generated by an initialization container and injected into the client and server containers. This scenario requires TUN/TAP support for the UVM kernel. Signed-off-by: Manuel Huber <mahuber@microsoft.com> Co-authored-by: Manuel Huber <manuelh@nvidia.com>	2025-10-02 11:40:49 +02:00
Manuel Huber	34ecb11b35	tests: ease add_allow_all_policy_to_yaml if case No need to die when a Kind that does not require a policy annotation is found in a pod manifest. Print an informational message instead. Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2025-10-02 11:40:49 +02:00
Manuel Huber	e36f788570	kernel: add required configs for openvpn support Currently, use of openvpn clients/servers is not possible in Kata UVMs. The following error message can be expected: ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such device (errno=19) To support openvpn scenarios using bridging and TAP, we enable various kernel networking config options. Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2025-10-02 11:40:49 +02:00
Aurélien Bombo	a9fc501c08	check-spelling: Add hostPath to dictionary Manually added "hostPath" to main.txt then regenerated the dictionary with `./kata-spell-check.sh make-dict`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-01 15:32:21 -05:00
Aurélien Bombo	c7a478662f	check-spelling: Run `make-dict` This simply ran `./kata-spell-check.sh make-dict` as documented in [1]. Unclear why it leads to changes - maybe it hadn't been run in a while. [1] https://github.com/kata-containers/kata-containers/tree/main/tests/cmd/check-spelling#create-the-master-dictionary-files Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-01 15:32:21 -05:00
Aurélien Bombo	5c21b1faf3	runtime: Simplify mounting guest devices when using hostPath volumes This change crystallizes and simplifies the current handling of /dev hostPath mounts with virtually no functional change. Before this change: - If a mount DESTINATION is in /dev and it is a non-regular file on the HOST, the shim passes the OCI bind mount as is to the guest (e.g. /dev/kmsg:/dev/kmsg). The container rightfully sees the GUEST device. - If the mount DESTINATION does not exist on the host, the shim relies on k8s/containerd to automatically create a directory (i.e. non-regular file) on the HOST. The shim then also passes the OCI bind mount as is to the guest. The container rightfully sees the GUEST device. - For other /dev mounts, the shim passes the device major/minor to the guest over virtio-fs. The container rightfully sees the GUEST device. After this change: - If a mount SOURCE is in /dev and it is a non-regular file on the HOST, the shim passes the OCI bind mount as is to the guest. The container rightfully sees the GUEST device. - The shim no longer relies on k8s/containerd to create missing mount directories. Instead it explicitly handles missing mount SOURCES, and treats them like the previous bullet point. - The shim no longer uses virtio-fs to pass /dev device major/minor to the guest, instead it passes the OCI bind mount as is. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-10-01 15:32:21 -05:00
Markus Rudy	285aaad13e	Merge pull request #11868 from burgerdev/serial-tests kata-sys-util: use a tempdir per test case	2025-10-01 14:34:18 +02:00
Markus Rudy	507a0e09f3	agent: use TEST-NET-1 addresses for netlink tests test_add_one_arp_neighbor modifies the root network namespace, so we should ensure that it does not interfere with normal network setup. Adding an IP to a device results in automatic routes, which may affect routing to non-test endpoints. Thus, we change the addresses used in the test to come from TEST-NET-1, which is designated for tests and usually not routable. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-01 09:00:52 +02:00
Markus Rudy	bbc006ab7c	agent: add debug info to netlink tests list_routes and test_add_one_arp_neighbor have been flaky in the past (#10856), but it's been hard to tell what exactly is going wrong. This commit adds debug information for the most likely problem in list_routes: devices being added/removed/modified concurrently. Furthermore, it adds the exit code and stderr of the ip command, in case it failed to list the ARP neighborhood. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-01 09:00:52 +02:00
Markus Rudy	43f6a70897	kata-sys-util: use a tempdir per test case Rust unit tests are executed concurrently [1], so sharing a directory of test files between test cases is prone to race conditions. This commit changes the pci_manager tests such that each test uses its own tempfile::tempdir, which provides nice isolation and obsoletes the need to manually clean up. [1]: https://doc.rust-lang.org/book/ch11-02-running-tests.html#running-tests-in-parallel-or-consecutively Fixes: #11852 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-01 09:00:52 +02:00
Aurélien Bombo	a3669d499a	agent/rustjail: Fix potentially uninitialized memory read in unsafe code The previous code only checked the result of with_nix_path(), not statfs(), thus leading to an uninitialized memory read if statfs() failed. No functional change otherwise. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-30 15:48:07 -05:00
Aurélien Bombo	20c60b21bd	Merge pull request #11839 from Sumynwa/sumsharma/agent-ctl-vm-container agent-ctl: Add fs sharing using virtio-fs when booting a pod vm.	2025-09-30 15:45:10 -05:00
Aurélien Bombo	7b2a7ca4d8	Merge pull request #11869 from burgerdev/cargo-fmt kata-sys-util: format mount.rs	2025-09-30 10:27:08 -05:00
Markus Rudy	a21a94a2e8	kata-sys-util: format mount.rs PR #11849 was merged before fixing a formatting issue. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-09-30 13:02:30 +02:00
Mikko Ylinen	6f45a7f937	runtime: config: allow TDX QGS port=0 `85f3391bc` added the support for TDX QGS port=0 but missed defaultQgsPort in the default config. defaultQgsPort overrides user provided tdx_quote_generation_service_socket_port=0. After this change, defaultQgsPort is not needed anymore since there's no default: any positive integer is OK and negative or unset value becomes a parse error. QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT in the Makefile is used to provide a sane default when tdx_quote_generation_service_socket_port gets set in the configuration. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-30 09:47:05 +02:00
Xuewei Niu	ca11a7387d	Merge pull request #11636 from burgerdev/darwin-ci ci: add genpolicy build for Darwin	2025-09-30 13:52:39 +08:00
Aurélien Bombo	575381cb7e	Merge pull request #11846 from kata-containers/sprt/reinstate-mariner Revert "ci: temporarily avoid using the Mariner Host image"	2025-09-29 15:49:53 -05:00
Dan Mihai	4b308817bc	Merge pull request #11858 from microsoft/danmihai/policy-tests-upstream2 tests: k8s: auto-generate policy for additional tests	2025-09-29 13:39:22 -07:00
Aurélien Bombo	693a1461d2	tests: policy: Set oci_version to 1.2.0 for Mariner Mariner recently upgraded to containerd 2.0. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-29 12:14:51 -05:00
Aurélien Bombo	756f3a73df	Revert "ci: temporarily avoid using the Mariner Host image" This reverts commit `e8405590c1`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-29 12:14:51 -05:00
Aurélien Bombo	c8fdb0e971	Merge pull request #11849 from shwetha-s-poojary/fix_ppc_mount_ut libs: Fix the test_parse_mount_options failure on ppc64le	2025-09-29 11:08:21 -05:00
Markus Rudy	369124b180	ci: build genpolicy on darwin genpolicy is a developer tool that should be usable on MacOS. Adding it to the darwin CI job ensures that it can still be built after changes. On an Apple M2, the output of `uname -m` is `arm64`, which is why a new case is needed in the arch_to_* functions. We're not going to cross-compile binaries on darwin, so don't install any additional Rust targets. Fixes: #11635 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-09-29 09:48:32 +02:00
Markus Rudy	369aed0203	kata-types: conditionally include safe-path Most of the kata-types code is reusable across platforms. However, some functions in the mount module require safe-path, which is Linux-specific and can't be used on other platforms, notably darwin. This commit adds a new feature `safe-path` to kata-types, which enables the functions that use safe-path. The Linux-only callers kata-ctl and runtime-rs enable this feature, whereas genpolicy only needs initdata and does not need the functions from the mount module. Using a feature instead of a target_os restriction ensures that the developer experience for genpolicy remains the same. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-09-29 09:48:32 +02:00
Sumedh Alok Sharma	c94e65e982	agent-ctl: Add fs sharing using virtio-fs when booting a pod vm. This commit adds changes to enable fs sharing between host/guest using virtio-fs when booting a pod VM for testing. This primarily enables sharing container rootfs for testing container lifecycle commands. Summary of changes is as below: - adds minimal virtiofsd code to start userspace daemon (based on `runtime-rs/crates/resource/src/share_fs`) - adds the virtiofs device to the test vm - prepares and mounts the container rootfs on host - modifies container storage & oci specs Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2025-09-29 07:20:42 +00:00
Markus Rudy	63515242c5	tests: fix shellcheck findings in install_rust.sh Fixing the shellcheck issues first so that they are not coupled to the subsequent commit introducing Darwin support to the script. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-09-28 12:01:23 +02:00
Zvonko Kaiser	c4e352f7ff	Merge pull request #11856 from zvonkok/gpu_guest_components gpu: Add libgcc for RUST libc=gnu builds	2025-09-26 18:27:16 -04:00
Dan Mihai	ef0f8723cf	tests: k8s-nginx-connectivity: auto-generated policy Auto-generate policy for nginx-deployment pods, instead of hard-coding the "allow all" policy. Note that the `busybox_pod` - created using `kubectl run` - still doesn't have an Init Data annotation, so it is using the default policy built into the Kata Guest rootfs image file. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-26 20:24:13 +00:00
Dan Mihai	8943f0d9b2	tests: k8s-liveness-probes: auto-generate policy Auto-generate agent policy in k8s-liveness-probes.bats, instead of using the non-confidential "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-26 20:23:12 +00:00
Dan Mihai	d9bc7e2b76	tests: k8s-credentials-secrets: auto-generate policy Auto-generate the agent policy for pod-secret-env.yaml, using "genpolicy -c inject_secret.yaml". Support for passing Secret specification files as "-c" arguments of genpolicy has been added when fixing #10033 with PR #10986. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-26 20:23:12 +00:00
Zvonko Kaiser	3743eb4cea	gpu: Add libgcc for RUST libc=gnu builds Since we cannot build all components with libc=musl and static RUSTFLAG we still need to ship libgcc for AA or other guest components. Without this change the guest components do not work and we see /usr/local/bin/attestation-agent: error while loading shared libraries: libgcc_s.so.1: cannot open shared object file: No such file or directory Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-09-26 15:08:58 -04:00
Dan Mihai	32453a576f	Merge pull request #11845 from microsoft/danmihai/policy-tests-upstream tests: k8s: auto-generate policy for additional tests	2025-09-26 11:32:23 -07:00
Aurélien Bombo	f3293ed404	Merge pull request #11855 from kata-containers/sprt/zizmor-fixes2 gha: zizmor: fix "workflow or action definition without a name" error	2025-09-26 12:09:52 -05:00
Hyounggyu Choi	077aaa6480	Merge pull request #11854 from kata-containers/sprt/pipefail-lib tests/k8s: Add set -euo pipefail to lib.sh	2025-09-26 12:49:59 +02:00
Aurélien Bombo	433e59de1f	gha: zizmor: fix "workflow or action definition without a name" error This fixes that error everywhere by adding a `name:` field to all jobs that were missing it. We keep the same name as the job ID to ensure no disturbance to the required job names. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-25 23:34:40 -05:00
Aurélien Bombo	282e20bc37	tests/k8s: Add set -euo pipefail to lib.sh -o pipefail in particular ensures that exec_host() returns the right exit code. -u is also added for good measure. Note that $BATS_TEST_DIRNAME is set by bats so we move its usage inside the function. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-25 23:05:05 -05:00
Aurélien Bombo	d1f52728cc	Merge pull request #11853 from kata-containers/sprt/zizmor-fix gha: Run Zizmor without Advanced Security	2025-09-25 14:06:53 -05:00
Aurélien Bombo	0b40ad066a	gha: Set Zizmor check as non-required As a consequence of moving away from Advanced Security for Zizmor, it now checks the entire codebase and will error out on this PR and future. To be reverted once we address all Zizmor findings in a future PR. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-25 10:50:49 -05:00
Aurélien Bombo	2e033d0079	gha: Run Zizmor without Advanced Security This does not change the security of the analysis, this is just to work around zizmorcore/zizmor-action#43. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-25 10:50:41 -05:00
shwetha-s-poojary	c28ffac060	libs: Fix the test_parse_mount_options failure on ppc64le This PR fixes a test that failed on platforms like ppc64le due to a hardcoded mount option length. * Test was failing on ppc64le due to larger system page size (e.g., 65536 bytes) * Original test used a hardcoded 4097-byte string assuming 4KB page size * Replaced with MAX_MOUNT_PARAM_SIZE + 1 to reflect actual system limit Ensures test fails correctly across all architectures Fixes: #11852 Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>	2025-09-25 19:56:51 +05:30
Greg Kurz	f6d352d088	Merge pull request #11835 from ldoktor/ocp-pp-revision ci.ocp: Avoid unsupported "git --revision"	2025-09-25 16:10:48 +02:00
Xuewei Niu	98446e7338	Merge pull request #11678 from StevenFryto/rootless_vmm runtime-rs: Add support for running the VMM in non-root mode	2025-09-25 22:03:25 +08:00
Aurélien Bombo	3ce7693a2d	Merge pull request #11851 from BbolroC/remove-comment-for-hadolint-dl3007 ci: Remove DL3007 ignore comment for base image	2025-09-25 09:03:07 -05:00
Xuewei Niu	46cbb2fb98	Merge pull request #11719 from whyeinstein/csi-kata-spdkvolume csi-kata-directvolume: Add basic SPDK volume support	2025-09-25 21:53:46 +08:00
Hyounggyu Choi	c961f70b7e	ci: Remove DL3007 ignore comment for base image The Hadolint warning DL3007 (pin the version explicitly) is no longer applicable. We have updated the base image to use a specific version digest, which satisfies the linter's requirement for reproducible builds. This commit removes the corresponding inline ignore comment. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-25 15:46:39 +02:00
Dan Mihai	fe5ee803a8	tests: k8s-sysctls.bats auto-generated policy Auto-generate policy in k8s-sysctls.bats, instead of hard-coding the "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-25 13:03:19 +00:00
Dan Mihai	9d3d3c9b0f	tests: k8s-pod-quota.bats auto-generated policy Auto-generate policy in k8s-pod-quota.bats, instead of hard-coding the "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-25 13:03:19 +00:00
Dan Mihai	0008ecd18b	tests: k8s-inotify.bats auto-generated policy Auto-generate policy for k8s-inotify.bats, instead of hard-coding the "allow all" policy. Fixes: #8889 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-25 13:03:19 +00:00
Dan Mihai	711e7b8014	tests: k8s-hostname.bats auto-generated policy Auto-generate policy for k8s-hostname.bats, instead of hard-coding the "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-25 13:03:19 +00:00
Dan Mihai	566e1abb09	tests: k8s-empty-dirs.bats generated policy Auto-generate policy for k8s-empty-dirs.bats, instead of hard-coding the "allow all" policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-25 13:03:19 +00:00
stevenfryto	9e33888f06	runtime-rs: supporting the QEMU VMM process running in non-root mode This change enables running the QEMU VMM as a non-root user when the rootless flag is set to true in the configuration. Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>	2025-09-25 19:30:29 +08:00
stevenfryto	bde6eb7c3a	runtime-rs: add generic support for running the VMM in non-root mode This commit introduces generic support for running the VMM in rootless mode in runtime-rs: 1. Detect whether the VMM is running in rootless mode. 2. Before starting the VMM process, create a non-root user and launch the VMM with that user's UID and GID; also add the KVM user's group ID to the VMM process's supplementary groups so the VMM process can access /dev/kvm. 3. Add the setup of the rootless directory located under /run/user/<uid>, and change some path variables into functions that return the path with the rootless directory prefix when running in rootless mode. Fixes: #11414 Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>	2025-09-25 19:30:29 +08:00
why	5d76811c8a	csi-kata-directvolume: Add basic SPDK volume support Introduce initial implementation for SPDK-backed CSI volumes, allowing basic create and delete operations with vhost-user-blk integration. Signed-off-by: why <1206176262@qq.com>	2025-09-25 19:29:50 +08:00
Xuewei Niu	319237e447	Merge pull request #11848 from BbolroC/pin-alpine-to-stable-digest GHA: Pin Alpine to 3.20 for tee-unencrypted image	2025-09-25 19:29:22 +08:00
Hyounggyu Choi	e9653eae6e	GHA: Pin Alpine to 3.20 for tee-unencrypted image We recently hit the following error during build: ``` RUN ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -P "" OpenSSL version mismatch. Built against 3050003f, you have 30500010 ``` This happened because `alpine:latest` moved forward and the `ssh-keygen` binary in the base image was compiled against a newer OpenSSL version that is not available at runtime. Pinning the base image to the stable release (3.20) avoids the mismatch and ensures consistent builds. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-25 11:49:04 +02:00
Steve Horsman	0a9e730f54	Merge pull request #11847 from Sumynwa/sumsharma/agent-ctl-ci-fix tests: agent-ctl: Fix cleanup for testing with qemu	2025-09-25 10:37:45 +01:00
Sumedh Alok Sharma	1be3785fa0	tests: agent-ctl: Fix cleanup for testing with qemu This change fixes the cleanup logic for the qmp.sock and console.sock files when running tests in a VM booted with qemu, and no longer assumes any path for them. Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2025-09-25 07:30:17 +00:00
Fupan Li	7c58ec7daa	Merge pull request #11833 from kata-containers/sprt/rust-io-bug agent/rustjail: Fix double free in TTY handling	2025-09-25 10:03:45 +08:00
Fupan Li	79f51ab237	runtime-rs: set the default block driver as virtio-scsi for qemu Change the default block driver to virtio-scsi. A recent qemu commit, https://gitlab.com/qemu-project/qemu/-/commit/984a32f17e8dab0dc3d2328c46cb3e0c0a472a73 introduced a bug for virtio-blk-pci with io_uring mode at line: https://gitlab.com/qemu-project/qemu/-/commit/984a32f17e8dab0dc3d2328c46cb3e0c0a472a73#ce8eeb01f8b84f8cb8d3c35684d473fe1ee670f9_345_352 In order to avoid this issue, change the default block driver to virtio-scsi. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-24 14:49:53 +02:00
Wainer Moschetta	0bdc462bed	Merge pull request #11841 from microsoft/danmihai1/test-timing-info tests: k8s: add test duration information	2025-09-24 08:17:54 -03:00
Fupan Li	362c177b3d	Merge pull request #11843 from Apokleos/remove-initdata-anno runtime-rs: Remove InitData annotation from OCI Spec	2025-09-24 18:25:37 +08:00
Alex Lyn	62c936b916	runtime-rs: Use the updated OCI Spec annotation as the argument As the OCI Spec annotations have been updated by adding or removing items, we should pass the updated annotations as the argument. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-24 13:04:51 +08:00
Alex Lyn	9eca015d73	runtime-rs: Remove InitData annotation from OCI Spec This commit removes the InitData annotation from the OCI Spec's annotations. Similar to the Policy annotation, InitData is now exclusively handled and transmitted to the guest via the sandbox's init data mechanism. Removing this redundant and potentially large annotation simplifies the OCI Spec and streamlines the guest initialization process. This change aligns the handling of InitData with existing practices within runtime-go. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-24 09:32:13 +08:00
Aurélien Bombo	dedd833cdd	agent: Add note about future breaking change in nix Tracked in #11842. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-23 16:23:54 -05:00
Aurélien Bombo	ecb22cb3e3	agent/rustjail: Fix double free in TTY handling The repro below would show this error in the logs (in debug mode only): fatal runtime error: IO Safety violation: owned file descriptor already closed The issue was that the `pseudo.slave` file descriptor was being owned by multiple variables simultaneously. When any of those variables went out of scope, it would close the same file descriptor, which is undefined behavior. To fix this, we clone: we create a new file descriptOR that refers to the same file descriptION as the original. When the cloned descriptor is closed, this affects neither the original descriptor nor the description. Only when the last descriptor is closed does the kernel clean up the description. Note that we purposely consume (not clone) the original descriptor with `child_stdin` as `pseudo` is NOT dropped automatically. Repro ----- Prerequisites: - Use Rust 1.80+. - Build the agent in debug mode. $ cat busybox.yaml apiVersion: v1 kind: Pod metadata: name: busybox spec: containers: - image: busybox:latest name: busybox runtimeClassName: kata $ kubectl apply -f busybox.yaml pod/busybox created $ kubectl exec -it busybox -- sh error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "e6c602352849647201860c1e1888d99ea3166512f1cc548b9d7f2533129508a9": cannot enter container 76a499cbf747b9806689e51f6ba35e46d735064a3f176f9be034777e93a242d5, with err ttrpc: closed Fixes: #11054 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-23 16:23:50 -05:00
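The descriptor versus description distinction in the commit above can be illustrated outside Rust. The following is a minimal Python sketch, not the agent's code: it assumes a POSIX system, uses `os.dup` (a wrapper around `dup(2)`) on a pipe instead of a pseudo-terminal, and all variable names are made up for illustration.

```python
import errno
import os

# A pipe's write end stands in for the shared `pseudo.slave` descriptor.
read_fd, write_fd = os.pipe()

# dup() returns a NEW descriptor that refers to the SAME open file description.
cloned_fd = os.dup(write_fd)

# Closing the clone affects neither the original descriptor nor the description.
os.close(cloned_fd)
os.write(write_fd, b"still open")
data = os.read(read_fd, 64)

# Closing the same descriptor number twice is the bug class being fixed.
# No descriptor was opened in between, so the kernel reports EBADF here.
try:
    os.close(cloned_fd)
    double_close_errno = None
except OSError as err:
    double_close_errno = err.errno

os.close(write_fd)
os.close(read_fd)
```

Each owner closes only its own descriptor, which is what cloning achieves; the double close at the end shows what happens when two owners share one descriptor number.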
Dan Mihai	38a28b273a	Merge pull request #11814 from charludo/main genpolicy: match sandbox name by regex	2025-09-23 14:14:11 -07:00
Dan Mihai	e9f69ce321	tests: k8s: add test duration information Log how much time "kubectl get pods" and each test case are taking, just in case that will reveal unusually slow test clusters, and/or opportunities to improve tests. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-23 19:24:38 +00:00
stevenhorsman	c2b0650491	release: Bump version to 3.21.0 Bump VERSION and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-23 20:59:00 +02:00
dependabot[bot]	e24e564eb7	build(deps): bump astral-tokio-tar in /src/tools/agent-ctl Bumps [astral-tokio-tar](https://github.com/astral-sh/tokio-tar) from 0.5.2 to 0.5.5. - [Release notes](https://github.com/astral-sh/tokio-tar/releases) - [Changelog](https://github.com/astral-sh/tokio-tar/blob/main/CHANGELOG.md) - [Commits](https://github.com/astral-sh/tokio-tar/compare/v0.5.2...v0.5.5) --- updated-dependencies: - dependency-name: astral-tokio-tar dependency-version: 0.5.5 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-09-23 17:46:48 +00:00
Fabiano Fidêncio	bfc54d904a	agent: Fix format issues In the previous commit we've added some code that broke `cargo fmt -- --check` without even noticing, as the code didn't go through the CI process (due to it being a security advisory). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-23 16:47:39 +02:00
Steve Horsman	3e67f92e34	Merge commit from fork Fix malicious host can circumvent initdata verification on TDX	2025-09-23 13:31:29 +01:00
Alex Lyn	a9ec8ef21f	kata-types: remove trailing slash from DEFAULT_KATA_GUEST_SANDBOX_DIR Trailing slash in DEFAULT_KATA_GUEST_SANDBOX_DIR caused double slashes in mount_point (e.g. "/run/kata-containers/sandbox//shm"), which failed OPA strict equality checks against policy mount_point. Removing it aligns generated paths with policy and fixes CreateSandboxRequest denial. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-23 14:01:22 +02:00
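To see why the trailing slash matters under strict equality, here is a small Python sketch. The `mount_point` helper is a hypothetical stand-in for however the runtime joins the path; only the two path strings come from the commit message.

```python
def mount_point(sandbox_dir: str, name: str) -> str:
    # Naive join: hypothetical stand-in for the runtime's path construction.
    return f"{sandbox_dir}/{name}"

# The value the policy expects, compared with strict string equality.
policy_mount_point = "/run/kata-containers/sandbox/shm"

with_slash = mount_point("/run/kata-containers/sandbox/", "shm")
without_slash = mount_point("/run/kata-containers/sandbox", "shm")

matches_with_slash = with_slash == policy_mount_point
matches_without_slash = without_slash == policy_mount_point
```

A filesystem treats both spellings as the same directory, but a string comparison does not, so only the constant without the trailing slash matches.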
Steve Horsman	bcd0c0085c	Merge pull request #11821 from mythi/coco-guest-update Confidential containers version updates	2025-09-23 12:45:38 +01:00
Mikko Ylinen	5cb1332348	build: enable nvidia-attester for coco-guest-components coco-guest-components tarball is used as is for both vanilla coco rootfs and the nvidia enabled rootfs. nvidia-attester can be built without nvml so make it globally enabled for coco-guest-components. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-23 12:38:32 +03:00
Mikko Ylinen	e878d4a90a	versions: bump guest-components and trustee for CoCo v0.16.0 Pick the latest CoCo components targeted for the next release. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-23 12:38:32 +03:00
Charlotte Hartmann Paludo	2cea32cc23	genpolicy: match sandbox name by regex `allow_interactive_exec` requires a sandbox-name annotation, however this is only added for pods by genpolicy. Other pod-generating resources have unpredictable sandbox names. This patch instead uses a regex for the sandbox name in genpolicy, based on the specified metadata and following Kubernetes' naming logic. The generated regex is then used in the policy to correctly match the sandbox name. Fixes: #11823 Signed-off-by: Charlotte Hartmann Paludo <git@charlotteharludo.com> Co-authored-by: Paul Meyer <katexochen0@gmail.com> Co-authored-by: Markus Rudy <mr@edgeless.systems>	2025-09-23 10:31:58 +02:00
Lukáš Doktor	5c14d2956a	ci.ocp: Avoid unsupported "git --revision" the git version in CI doesn't support "git clone --revision", workaround it by using fetch directly. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-23 09:29:06 +02:00
Fupan Li	a27009012c	Merge pull request #11834 from Apokleos/fix-initdata-whitespace CI: Keep base64 output of initdata annotation is a single line	2025-09-23 15:16:35 +08:00
Alex Lyn	4e793d635e	Merge pull request #11736 from kata-containers/enhance-copyfile runtime-rs: Enhance copyfile when sharedfs is disabled	2025-09-23 14:15:44 +08:00
Alex Lyn	f254eeb0e9	CI: Keep base64 output as a single line This commit addresses an issue where base64, when run with its default settings, would introduce newlines into its output, causing decoding to fail in the runtime. The fix ensures base64 output is a single, continuous line using the -w0 flag. This guarantees the encoded string is a valid Base64 sequence, preventing potential runtime errors caused by invalid characters. Note that when you use the base64 command without any parameters, it typically adds newlines to the output automatically, usually every 76 chars. In contrast, base64 -w0 explicitly tells the command not to add any newlines (-w for wrap, and 0 for a width of zero), which results in a continuous string with no whitespace. This is a critical distinction because if you pass a Base64 string with newlines to a runtime, it may be treated as an invalid string, causing the decoding process to fail. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-23 11:58:53 +08:00
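The wrapping behaviour described above can be reproduced with Python's standard library. This is only an illustration, not the CI script: `base64.encodebytes` wraps at 76 characters like the default `base64` command, `base64.b64encode` produces a single line like `base64 -w0`, and the `strict_decode` helper is an assumed model of a decoder that rejects non-alphabet characters.

```python
import base64
import binascii

payload = bytes(range(100))  # arbitrary sample data, long enough to wrap

wrapped = base64.encodebytes(payload)     # newline after every 76 output chars
single_line = base64.b64encode(payload)   # no line breaks at all

def strict_decode(data: bytes):
    # validate=True rejects any character outside the Base64 alphabet,
    # which includes the newlines inserted by wrapping.
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return None

wrapped_result = strict_decode(wrapped)
single_result = strict_decode(single_line)
```

A lenient decoder would accept both inputs; the failure only appears when the consumer validates strictly, which is why the unwrapped form is the safer one to emit.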
Fupan Li	72a0f5daec	Merge pull request #11794 from Sumynwa/sumsharma/clh_netdev_hotplug_pciinfo runtime: clh: Add pci path for hotplugged network endpoints	2025-09-23 09:57:57 +08:00
Dan Mihai	02ace265d9	Merge pull request #11827 from microsoft/danmihai1/exec-retries tests: k8s: retry kubectl exec	2025-09-22 17:14:50 -07:00
Hyounggyu Choi	16c2dd7c96	Merge pull request #11769 from Apokleos/enhance-blockdev Enhance block device AIO mode	2025-09-22 14:01:38 +02:00
Alex Lyn	5dd36c6c0f	runtime-rs: Correctly set permission and mode for dir when copy files Correctly set dir's permissions and mode. This update ensures: The dir_mode field of CopyFileRequest is set to DIR_MODE_PERMS (equivalent to Go's 0o750 | os.ModeDir), which is primarily used for the top-level directory creation permissions. The file_mode field now directly uses metadata.mode() (equivalent to Go's st.Mode) for the target entry. This change aims to resolve potential permission issues or inconsistencies during directory and file creation within the guest environment by precisely matching the expected mode propagation of the Kata agent. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 17:59:57 +08:00
Greg Kurz	0f5511962c	Merge pull request #11638 from ldoktor/ocp-peer-pods ci.ocp: More debug output and tweaks	2025-09-22 11:57:46 +02:00
Alex Lyn	429133cedb	runtime-rs: Introduce shared FS volume management in VolumeResource The core purpose of introducing volume_manager to VolumeResource is to centralize the management of shared file system volumes. By creating a single VolumeManager instance within VolumeResource, all shared file volumes are managed by one central entity. This single volume_manager can accurately track the references of all ShareFsVolume instances to the shared volumes, ensuring correct reference counting, proper volume lifecycle management, and preventing issues like volumes being overwritten. This new design ensures that all shared volumes are managed by a central entity, which: (1) Guarantees correct reference counting. (2) Manages the volume lifecycle correctly, avoiding issues like volumes being overwritten. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 15:03:41 +08:00
Alex Lyn	90c99541da	runtime-rs: Integrate VolumeManager into ShareFsVolume lifecycle This commit integrates the new `VolumeManager` into the `ShareFsVolume` lifecycle. Instead of directly copying files, `ShareFsVolume::new` now uses the `VolumeManager` to get a guest path and determine if the volume needs to be copied. It also updates the `cleanup` function to release the volume's reference count, allowing the `VolumeManager` to manage its state and clean up resources when no longer in use. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 15:03:27 +08:00
Alex Lyn	e73daa2f14	runtime-rs: Add sandbox level volume manager within non-sharedfs This commit introduces a new `VolumeManager` to track the state of shared volumes, including each volume's reference count and its corresponding container IDs. The manager's goal is to handle the lifecycle of shared filesystem volumes, including: (1) Volume State Tracking: Tracks the mapping from host source paths to guest destination paths. (2) Reference Counting: Manages reference counts for each volume, preventing premature cleanup when multiple containers share the same source. (3) Deterministic guest paths: Generates unique guest paths using a random string to avoid naming conflicts. (4) Improved Management: Provides a centralized way to handle volume creation, copying, and release, including aborting file watchers when volumes are no longer in use. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 14:45:16 +08:00
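The reference-counting idea listed above can be sketched as follows. This is a hypothetical Python model, not the runtime-rs implementation: the class layout, method names, the guest root path, and the use of `uuid` for the random path component are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class VolumeState:
    guest_path: str
    container_ids: set = field(default_factory=set)  # one entry per reference

class VolumeManager:
    """Hypothetical sketch: tracks shared volumes keyed by host source path."""

    def __init__(self, guest_root: str = "/run/kata-containers/sandbox"):
        self._guest_root = guest_root  # assumed prefix, illustration only
        self._volumes = {}

    def get_or_create(self, host_source: str, container_id: str):
        """Return (guest_path, needs_copy); a copy is needed only on first use."""
        state = self._volumes.get(host_source)
        needs_copy = state is None
        if state is None:
            # Random suffix avoids naming conflicts between volumes.
            state = VolumeState(f"{self._guest_root}/{uuid.uuid4().hex}")
            self._volumes[host_source] = state
        state.container_ids.add(container_id)
        return state.guest_path, needs_copy

    def release(self, host_source: str, container_id: str) -> bool:
        """Drop one reference; return True once the volume can be cleaned up."""
        state = self._volumes.get(host_source)
        if state is None:
            return False
        state.container_ids.discard(container_id)
        if state.container_ids:
            return False
        del self._volumes[host_source]
        return True

manager = VolumeManager()
path_a, copy_a = manager.get_or_create("/host/data", "container-a")
path_b, copy_b = manager.get_or_create("/host/data", "container-b")
first_release = manager.release("/host/data", "container-a")
second_release = manager.release("/host/data", "container-b")
```

Two containers sharing one host source get the same guest path, the copy happens once, and cleanup is signalled only when the last container releases the volume.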
Mikko Ylinen	28ab972b3f	agent-ctl: bump image-rs pull image-rs from CoCo guest-components that is targeted for CoCo v0.16.0. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-22 08:31:58 +03:00
Alex Lyn	313c7313f0	runtime-rs: Refactor code to improve copyfile logic and readability This commit refactors the `CopyFile` related code to streamline the logic for creating guest directories and make the code structure clearer. Its main goal is to improve the overall maintainability and facilitate future feature extensions. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 11:30:47 +08:00
Alex Lyn	f36377070a	runtime-rs: Enhance Copyfile to ensure existing contents synchronized This commit is designed to perform a full sync before starting monitoring to ensure that files which exist before monitoring starts are also synced. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 11:30:35 +08:00
Alex Lyn	2f5319675a	runtime-rs: Set native aio mode for initdata block device This commit updates the configuration for the initdata block device to use the BlockDeviceAio::Native mode. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 10:13:44 +08:00
Alex Lyn	5ca403b5d9	runtime-rs: Allow per-device AIO mode configuration for block devices This commit enhances control over block device AIO modes via hotplug. Previously, hotplugging block devices was set with default AIO mode (io_uring). Even if users reset the AIO mode in the configuration file, the changes would not be correctly applied to individual block devices. With this update, users can now explicitly configure the AIO mode for hot-plugging block devices via the configuration, and those settings will be correctly applied. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 10:13:44 +08:00
Alex Lyn	425e93a9b8	runtime-rs: Get more block device info within Device Manager We need more information about the block device, so replace the original method get_block_driver with get_block_device_info, which returns its BlockDeviceInfo. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-22 10:13:44 +08:00
Xuewei Niu	50ffa0fbfd	Merge pull request #11495 from Caspian443/temp-selinux runtime-rs: align SELinux feature with runtime-go (#9866)	2025-09-21 17:12:37 +08:00
Caspian443	2221b76b67	runtime-rs: Add selinux support for hypervisor - read selinux_label from OCI spec in sandbox - set selinux_label in preparevm and startvm in hypervisor Fixes: [#9866](https://github.com/Caspian443/kata-containers/issues/9866) Signed-off-by: Caspian443 <scrisis843@gmail.com>	2025-09-21 13:59:17 +08:00
Caspian443	a658db8746	runtime-rs: hypervisor: add SELinux support functions - Add disable_selinux and selinux_label fields to hypervisor for SELinux support. - Implement related SELinux support functions. Fixes: #9866 Signed-off-by: Caspian443 <scrisis843@gmail.com>	2025-09-21 13:59:17 +08:00
Xuewei Niu	04948c616e	Merge pull request #11830 from zvonkok/gpu-lts gpu: Add correct latest driver per default	2025-09-21 13:58:34 +08:00
Zvonko Kaiser	e6f12d8f86	gpu: Add latest driver per default Let's make sure that we use the latest driver for CI and release. There was a sort step missing. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-09-20 23:50:35 +00:00
Fabiano Fidêncio	54e8081222	qemu: Fix submodules location change The submodule change led to a breakage on our build of QEMU. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-20 22:12:27 +02:00
Lukáš Doktor	346ebd0ff9	ci.ocp: Allow to set CAA_IMAGE we might want to provide different CAA_IMAGE (repo) to reproduce issues. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:54 +02:00
Lukáš Doktor	bf90ccaf75	ci.ocp: Allow to set/provide PP_IMAGE_ID to be able to test with older or custom peer-pod image. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:54 +02:00
Lukáš Doktor	b7143488d9	ci.ocp: Allow to set CAA TAG to allow re-running with older CAA tag for bisection/reproduction. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:54 +02:00
Lukáš Doktor	12c5e0f33f	ci.ocp: Log more details on failure recently we got ErrImagePull, having more details should help analyzing issues. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:54 +02:00
Lukáš Doktor	7565c881e6	ci.ocp: Log variables in bash-friendly format this should simplify copy&paste of the values from logs. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:54 +02:00
Lukáš Doktor	a300b6b9a9	ci.ocp: Allow to set operator/caa commits this can help reproducing or bisecting issues related to operator/caa versions. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-09-20 06:57:53 +02:00
Dan Mihai	524bf66cbc	tests: k8s-credentials-secrets: retry on exec error Retry after "kubectl exec" failure, instead of aborting the test immediately. Example of recent error: https://github.com/kata-containers/kata-containers/actions/runs/17828061309/job/50693999052?pr=11822 not ok 1 Credentials using secrets (in test file k8s-credentials-secrets.bats, line 59) `kubectl exec $pod_name -- "${pod_exec_command[@]}" | grep -w "username"' failed Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-19 17:45:05 +00:00
Dan Mihai	01c7949bfd	tests: k8s-number-cpus: retry on kubectl exec error Retry after "kubectl exec" failure, instead of aborting the test immediately. Example of recent error: https://github.com/kata-containers/kata-containers/actions/runs/17813996758/job/50644372056 not ok 1 Check number of cpus ... error: Internal error occurred: error sending request: Post "https://10.224.0.4:10250/exec/kata-containers-k8s-tests/cpu-test/c1?command=sh&command=-c&command= cat+%!F(MISSING)proc%!F(MISSING)cpuinfo+%!C(MISSING)grep+processor%!C(MISSING)wc+-l&error=1&output=1": EOF Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-19 17:44:55 +00:00
Dan Mihai	91c3804959	tests: k8s: add container_exec_with_retries() Add container_exec_with_retries(), useful for retrying if needed commands similar to: kubectl exec <pod_name> -c <container_name> -- <command> Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-19 17:42:59 +00:00
Dan Mihai	eec6c8b0c4	tests: k8s: retry after kubectl exec error Some of the k8s tests were already retrying if `kubectl exec` succeeded but produced empty output. Perform the same retries on `kubectl exec` error exit code too, instead of aborting the test immediately. Example of recent exec error: https://github.com/kata-containers/kata-containers/actions/runs/17813996758/job/50644372056 not ok 1 Check number of cpus ... error: Internal error occurred: error sending request: Post "https://10.224.0.4:10250/exec/kata-containers-k8s-tests/cpu-test/c1?command=sh&command=-c&command= cat+%!F(MISSING)proc%!F(MISSING)cpuinfo+%!C(MISSING)grep+processor%!C(MISSING)wc+-l&error=1&output=1": EOF Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-19 15:43:39 +00:00
Hyounggyu Choi	0fb40eda12	Merge pull request #11822 from BbolroC/runtime-no-hotplug-ibm-sel-s390x runtime: Set maxmem to initialmem on s390x when memory hotplug is disabled	2025-09-18 17:31:01 +02:00
Hyounggyu Choi	d90e785901	runtime: Set maxmem to initialmem on s390x when memory hotplug is disabled On s390x, QEMU fails if maxmem is set to 0: ``` invalid value of maxmem: maximum memory size (0x0) must be at least the initial memory size ``` This commit sets maxmem to the initial memory size for s390x when hotplug is disabled, resolving the error while still ensuring that memory hotplug remains off. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-18 14:05:33 +02:00
Mikko Ylinen	49fbd6e7af	runtime: qemu: disable memory hotplug for ConfidentialGuests The setting '-m xM,slots=y,maxmem=zM' where maxmem is from the host's memory capacity is failing with confidential VMs on hosts having 1T+ of RAM. slots/maxmem are necessary for setups where the container memory is hotplugged to the VM during container creation based on createContainer info. This is not the case with CoCo since StaticResourceManagement is enabled and memory hotplug flows have not been checked. To avoid unexpected errors with maxmem, disable slots/maxmem in case ConfidentialGuest is requested. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-17 23:43:36 +02:00
Dan Mihai	ca244c7265	Merge pull request #11753 from Apokleos/fix-anno runtime-rs: Fix annotations within runtime-rs to pass the agent policy check	2025-09-16 16:42:26 -07:00
Dan Mihai	e2992b51ad	tests: k8s-job debug information Log the output of "kubectl logs", to hopefully help understand test failures similar to: https://github.com/kata-containers/kata-containers/actions/runs/17709473340/job/50326984605?pr=11753 not ok 1 Run a job to completion (in test file k8s-job.bats, line 37) `kubectl logs "$pod_name" | grep "$pi_number"' failed Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-16 22:36:31 +02:00
Dan Mihai	8854e69e28	tests: k8s-empty-dirs debug information Log the output of "kubectl logs", to hopefully help understand test failures similar to: https://github.com/kata-containers/kata-containers/actions/runs/17709473340/job/50326984613?pr=11753 not ok 2 Empty dir volume when FSGroup is specified with non-root container (from function `assert_equal' in file k8s-empty-dirs.bats, line 16, in test file k8s-empty-dirs.bats, line 65) `assert_equal "1001" "$uid"' failed Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-16 22:36:31 +02:00
Fabiano Fidêncio	96108006f2	agent: Panic on errors accessing the attestation agent binary Let's make sure that whenever we try to access the attestation agent binary, we only proceed with the startup in case: * the binary is found (CoCo case) * the binary is not present (non-CoCo case) In case of any error that's not `NotFound`, we should simply abort as that could mean a potential tampering with the binary (which would be reported as an EIO). Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-09-16 21:35:00 +02:00
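The three-way decision described above (found, not found, any other error) can be sketched in Python. This is an illustrative model only: the function name is hypothetical, the real agent is written in Rust and panics rather than raising `SystemExit`, and the demonstration uses an ENOTDIR error as a stand-in for the EIO case because it is easy to trigger.

```python
import os
import tempfile

def attestation_agent_present(path: str) -> bool:
    """Hypothetical helper mirroring the check described in the commit."""
    try:
        os.stat(path)
        return True    # binary found: CoCo case
    except FileNotFoundError:
        return False   # binary not present: non-CoCo case
    except OSError as err:
        # Any other error may indicate tampering, so abort instead of continuing.
        raise SystemExit(f"cannot access {path}: {err}")

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "attestation-agent")
    open(existing, "wb").close()

    found = attestation_agent_present(existing)
    missing = attestation_agent_present(os.path.join(tmp, "absent"))

    # Using a regular file as a directory yields ENOTDIR, not ENOENT.
    try:
        attestation_agent_present(os.path.join(existing, "child"))
        aborted = False
    except SystemExit:
        aborted = True
```

The point of the pattern is that "absent" is an expected state while every other failure is treated as fatal, rather than being folded into the non-CoCo path.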
Fabiano Fidêncio	d056fb20fe	initramfs: Enforce --panic-on-corruption for veritysetup Let's enforce an error on veritysetup in case there's any tampering with the rootfs. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-09-16 21:35:00 +02:00
Alex Lyn	bc1170ba0c	runtime-rs: Add bundle_path annotation within oci spec Add the OCI bundle path annotation to store the bundle's path. As it will be checked by the agent policy, we need to add it to pass agent policy validation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 21:31:02 +02:00
Alex Lyn	71ddbac56d	runtime-rs: Correctly set CONTAINER_TYPE_KEY within OCI Spec annotation With the help of `update_ocispec_annotations`, we'll add the container type key "io.katacontainers.pkg.oci.container_type" with the value "pod_sandbox" for the pause container and "pod_container" for any other container. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 21:31:02 +02:00
Alex Lyn	a47c0cdf66	kata-types: Introduce a helper to update oci spec annotations It updates OCI annotations by removing specified keys and adding new ones. This function creates a new `HashMap` containing the updated annotations, ensuring that the original map remains unchanged. It is optimized for performance by pre-allocating the necessary capacity and handling removals and additions efficiently. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 21:31:02 +02:00
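The remove-then-add behaviour of such a helper can be sketched in Python with a plain dict. This is not the kata-types function: the name `update_annotations`, its signature, and the `example.com/...` keys are assumptions for illustration; only the container type key and value are taken from the neighbouring commit message.

```python
def update_annotations(annotations, keys_to_remove, new_annotations):
    # Build a new dict so the caller's mapping is left untouched.
    removed = set(keys_to_remove)
    updated = {k: v for k, v in annotations.items() if k not in removed}
    updated.update(new_annotations)
    return updated

original = {
    "example.com/keep": "1",    # made-up key that should survive
    "example.com/drop": "big",  # made-up key that should be removed
}

result = update_annotations(
    original,
    keys_to_remove=["example.com/drop"],
    new_annotations={"io.katacontainers.pkg.oci.container_type": "pod_sandbox"},
)
```

Returning a fresh mapping keeps the original available to callers that still need the unmodified annotations.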
Alex Lyn	9992e1c416	kata-types: Export `POD_CONTAINER` and `POD_SANDBOX` constants as public To enable access to the constants `POD_CONTAINER` and `POD_SANDBOX` from other crates, their visibility has been updated to public. This change addresses the previous limitation of restricted access and ensures these values can be utilized across the codebase. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 21:31:02 +02:00
Alex Lyn	95585d818f	runtime-rs: Add sandbox annotation of nerdctl network namespace Add the nerdctl network namespace annotation, "nerdctl/network-namespace", to let nerdctl know which namespace to use when calling the selected CNI plugin. As it will be checked by the agent policy, we need to add it to pass agent policy validation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 21:31:00 +02:00
Dan Mihai	bc75f6a158	Merge pull request #11783 from billionairiam/agenttypo kata-agent: Rename misleading variable in config parsing	2025-09-16 11:07:17 -07:00
Fabiano Fidêncio	e31a06d51d	kata-manager: Handle zst unpacking On `63f6dcdeb9` we added the support to download either a .xz or a .zst tarball file. However, we missed adding the code to properly unpack a .zst tarball file. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-16 19:16:14 +02:00
Fabiano Fidêncio	4265beb081	tools: agent-ctl: Fix unresolved ch import agent-ctl's make check has been failing with: ``` Checking kata-agent-ctl v0.0.1 (/home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/tools/agent-ctl) error[E0432]: unresolved import `hypervisor::ch` --> src/vm/vm_ops.rs:10:5 | 10 | ch::CloudHypervisor, | ^^ could not find `ch` in `hypervisor` | note: found an item that was configured out --> /home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/runtime-rs/crates/hypervisor/src/lib.rs:30:9 | 30 | pub mod ch; | ^^ note: the item is gated here --> /home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/src/runtime-rs/crates/hypervisor/src/lib.rs:26:1 | 26 | / #[cfg(all( 27 | | feature = "cloud-hypervisor", 28 | | any(target_arch = "x86_64", target_arch = "aarch64") 29 | | ))] | |___^ ``` Let's just make sure that we include ch conditionally as well. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-16 18:44:33 +02:00
Fupan Li	4a92fc1129	runtime-rs: add the sandbox's shm volume support Docker containers support specifying the shm size using the --shm-size option and support sandbox-level shm volumes, so we've added support for shm volumes. Since Kubernetes doesn't support specifying the shm size, it typically uses a memory-based emptydir as the container's shm, and its size can be specified. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-16 16:32:41 +02:00
Fupan Li	d48c542a52	runtime-rs: Support Firecracker disk rate limiter This PR adds code that passes disk limiter parameters from KC configuration to Firecracker. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-16 16:27:58 +02:00
Fupan Li	e0caeb32fc	runtime-rs: move the rate limiter to hypervisor config Since the rate limiter will be shared by cloud-hypervisor, firecracker, and other VMMs, move it from clh's config to the hypervisor config crate, which is shared by all VMMs. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-16 16:27:58 +02:00
Fupan Li	73e31ea19a	runtime-rs: add the block devices io limit support Given that Rust-based VMMs like cloud-hypervisor, Firecracker, and Dragonball naturally offer user-level block I/O rate limiting, I/O throttling has been implemented to leverage this capability for these VMMs. This PR specifically introduces support for cloud-hypervisor. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-16 16:27:58 +02:00
Steve Horsman	ac74ef4505	Merge pull request #11801 from Apokleos/blk-sharerw runtime-rs: Enable share-rw=true when hotplug block device within qemu	2025-09-16 14:55:57 +01:00
Sumedh Alok Sharma	3443ddf24d	runtime: clh: Add pci path for hotplugged network endpoints This commit introduces changes to parse the PciDeviceInfo received in response payload when adding a network device to the VM with cloud hypervisor. When hotplugging a network device for a given endpoint, it rightly sets the PciPath of the plugged-in device in the endpoint. In calls like virtcontainers/sandbox.go:AddInterface, the later call to agent sends the pci info for uevents (instead of empty value) to rightly update the interfaces instead of failing with `Link not found` Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2025-09-16 12:45:57 +00:00
Alex Lyn	e9a5de35e8	runtime-rs: Enable share-rw=true when hotplug block device within qemu Support for the share-rw=true parameter has been added. While this parameter is essential for maintaining data consistency across multiple QEMU instances sharing a backend disk image, its implementation also serves to standardize parameters with the block device hotplug functionality in kata-runtime/qemu. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-16 10:55:29 +01:00
Fupan Li	df852b77b5	Merge pull request #11799 from Apokleos/fix-virtual-volume-type runtime-rs: Bugfix for kata virtual volume overlay fstype	2025-09-16 09:38:07 +08:00
Dan Mihai	489b677927	Merge pull request #11732 from microsoft/saulparedes/init_data_policy_support genpolicy: add init data support	2025-09-15 15:45:57 -07:00
Fabiano Fidêncio	8abfef358a	tests: Only run docker tests with one VMM Docker tests have been broken for a while and should be removed if we cannot maintain those. For now, though, let's limit it to run only with one hypervisor and avoid wasting resources for no reason. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 23:03:04 +02:00
Fabiano Fidêncio	dce6f13da8	tests: Only run devmapper tests with QEMU devmapper tests have been failing for a while. It's been breaking on the kata-deploy deployment, which is most likely related to Disk Pressure. Removing files was not enough to get the tests to run, so we'll just run those with QEMU as a way to test fixes. Once we get the test working, we can re-enable the other VMMs, but for now let's just not waste resources for no reason. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 23:02:33 +02:00
Saul Paredes	e3e406ff26	tests: remove add_allow_all_policy_to_yaml call from helper func add_allow_all_policy_to_yaml now also sets the initdata annotation. So don't overwrite the initdata annotation that was previously set by create_coco_pod_yaml_with_annotations. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:40:29 -07:00
Saul Paredes	cc73b14e26	docs: update policy docs Update policy docs to use initdata annotation and encoding Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:40:29 -07:00
Saul Paredes	b5352af1ee	tests: update tests that manually set policy Use new initdata annotation instead Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:40:29 -07:00
Saul Paredes	2d8c3206c7	gha: allow cbl-mariner to test using initdata annotation Allow "cc_init_data" hypervisor annotation. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:40:29 -07:00
Saul Paredes	5d124523f8	runtime: add initdata support in clh Prepare the initdata image and mount it as a block device. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:40:21 -07:00
Saul Paredes	252d4486f1	runtime: delete initdata annotation Delete annotation from OCI spec and sandbox config. This is done after the optional initdata annotation value has been read. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:34:26 -07:00
Saul Paredes	af41f5018f	runtime: share initdata setup code Move setup code such that it can be used by other hypervisors. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:34:26 -07:00
Saul Paredes	a427537914	genpolicy: add initdata support Encode policy inside initdata and encode as annotation (base64(gzip(toml))). Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:34:26 -07:00
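The `base64(gzip(toml))` encoding named above can be sketched with Python's standard library. This is a minimal illustration, not genpolicy's code: the function names are hypothetical and the TOML text is a made-up placeholder rather than a real initdata document.

```python
import base64
import gzip

def encode_initdata(toml_text: str) -> str:
    # gzip first, then Base64, so the result is safe to place in an annotation.
    compressed = gzip.compress(toml_text.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")

def decode_initdata(annotation_value: str) -> str:
    # Reverse order: Base64-decode, then gunzip.
    compressed = base64.b64decode(annotation_value, validate=True)
    return gzip.decompress(compressed).decode("utf-8")

# Placeholder TOML; the keys are illustrative and not the real initdata schema.
sample_toml = 'title = "example"\n[data]\nkey = "value"\n'

annotation = encode_initdata(sample_toml)
round_trip = decode_initdata(annotation)
```

Because `b64encode` emits no line breaks, the annotation value is a single line, which also sidesteps the wrapping problem fixed elsewhere in this log.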
Saul Paredes	10de56a749	kata-types: expose encode and decode initdata helper methods These methods can be used by other components, such as genpolicy. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-09-15 11:34:26 -07:00
Mikko Ylinen	86fe419774	versions: update kernel-confidential to Linux v6.16.7 update to the latest available v6.16 stable series kernel for CoCo. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-09-15 20:29:22 +02:00
Steve Horsman	fab828586b	Merge pull request #11771 from stevenhorsman/attempt-crio-1.34.0-bump runtime: Bump cri-o to latest	2025-09-15 17:31:13 +01:00
Alex Tibbles	fa6e4981a1	versions: bump ovmf edk2 version Update ovmf to latest release. Includes CVE-2024-38805 fix. EDK2 changelogs for releases since edk2-stable202411: https://github.com/tianocore/edk2/releases/tag/edk2-stable202508 https://github.com/tianocore/edk2/releases/tag/edk2-stable202505 https://github.com/tianocore/edk2/releases/tag/edk2-stable202502 Signed-off-by: Alex Tibbles <alex@bleg.org>	2025-09-15 15:38:33 +02:00
stevenhorsman	dc64d256bf	runtime: Bump cri-o to latest Bump cri-o to 1.34.0 to try and remediate security advisories CVE-2025-0750 and CVE-2025-4437. Note: Running ``` go get github.com/cri-o/cri-o@v1.34.0 ``` seems to bump a lot of other go modules, hence the size of the vendor diff Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	16dd1de0ab	kata-monitor: Update deprecated use of grpc functions In google.golang.org/grpc v1.72.0, `DialContext` is deprecated, so switch to use `NewClient` instead. `grpc.WithBlock()` is deprecated and not recommended, so remove this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	b9ff5ffc21	kata-monitor: Replace use of deprecated expfmt.FmtText In `github.com/prometheus/common v0.62.0` expfmt.FmtText is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	7f86b967d1	runtime: Replace use of deprecated expfmt.FmtText In `github.com/prometheus/common v0.62.0` expfmt.FmtText is deprecated, so replace with `expfmt.NewFormat(expfmt.TypeTextPlain)`. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	62ed86d1aa	runtime: Update deprecated use of grpc.Dial In google.golang.org/grpc v1.72.0, `Dial`, is deprecated, so switch to use `NewClient` instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
stevenhorsman	334340aa18	runtime: Update remove methods In selinux v1.12.0, `label.SetProcessLabel`, was removed to be replaced by `selinux.SetExecLabel` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 14:29:06 +01:00
Fabiano Fidêncio	ad7e60030a	tests: k8s: kata-deploy: Remove unnecessary dirs to free up space This is following Steve's suggestion, based on what's been done on cloud-api-adaptor. The reason we're doing it here is because we've seen pods being evicted due to disk pressure. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 15:27:54 +02:00
Fabiano Fidêncio	60ba121a0d	kata-deploy: nit: Fix test name Just add a "is" there as it was missing. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 15:27:54 +02:00
Fabiano Fidêncio	d741544fa6	kata-deploy: Don't fail if the runtimeclass is already deleted I've hit this when using a machine with slow internet connection, which took ages to download the kata-cleanup image, and then helm timed out in the middle of the cleanup, leading to the cleanup job being restarted and then bailing with an error as the runtimeclasses that kata-deploy tries to delete were already deleted. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 15:27:54 +02:00
Fupan Li	679cdeadc8	runtime: fix the issue clh resize vcpu failed Since the cloud hypervisor's resize vCPU is an asynchronous operation, it's possible that the previous resize operation hasn't completed when the request is sent, causing the current call to return an error. Therefore, several retries can be performed to avoid this error. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-15 14:29:25 +02:00
Alex Tibbles	66a3d4b4a2	versions: bump kernel to 6.12.47 Update LTS kernel to latest. Signed-off-by: Alex Tibbles <alex@bleg.org>	2025-09-15 14:19:48 +02:00
Alex Tibbles	710c117a24	version: Bump QEMU to v10.1.0 A minor release of QEMU is out, so update to it for fixes and features. QEMU changelog: https://wiki.qemu.org/ChangeLog/10.1 Notes: * AVX support is not an option to be enabled / disabled anymore. * Passt requires Glibc 2.40.+, which means a dependency on Ubuntu 25.04 or newer, thus we're disabling it. Signed-off-by: Alex Tibbles <alex@bleg.org>	2025-09-15 14:19:25 +02:00
stevenhorsman	e3aa973995	versions(deps): Bump slab versions prior to 0.4.10 Although versions of slab prior to 0.4.10 don't have a security vulnerability, we can bump them all to keep things in sync Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 09:48:03 +02:00
stevenhorsman	9c0fcd30c5	ci: Add slab to dependabot groups Add slab, so that in future the different component bumps are all done together Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 09:48:03 +02:00
stevenhorsman	924051c652	genpolicy: Bump slab crate to 0.4.11 Bump versions to remediate CVE-2025-55159 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 09:48:03 +02:00
stevenhorsman	8fb4332d42	agent-ctl: Bump slab crate to 0.4.11 Bump versions to remediate CVE-2025-55159 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 09:48:03 +02:00
dependabot[bot]	84bcf34c75	build(deps): bump slab from 0.4.10 to 0.4.11 in /src/runtime-rs Bumps [slab](https://github.com/tokio-rs/slab) from 0.4.10 to 0.4.11. - [Release notes](https://github.com/tokio-rs/slab/releases) - [Changelog](https://github.com/tokio-rs/slab/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/slab/compare/v0.4.10...v0.4.11) --- updated-dependencies: - dependency-name: slab dependency-version: 0.4.11 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-15 09:48:03 +02:00
Fabiano Fidêncio	60790907ef	clh: Update to v48.0 release ``` Experimental fw_cfg Device Support This feature enables passing configuration data and files, such as VM boot configurations (kernel, kernel cmdline, e820 memory map, and ACPI tables), from the host to the guest. (#7117) Experimental ivshmem Device Support Support for inter-VM shared memory has been added. For more information, please refer to the ivshmem documentation. (#6703) Firmware Boot Support on riscv64 In addition to direct kernel boot, firmware boot support has been added on riscv64 hosts. (#7249) Increased vCPU Limit on x86_64/kvm The maximum number of supported vCPUs on x86_64 hosts using KVM has been raised from 254 to 8192. (#7299) Improved Block Performance with Small Block Sizes Performance for virtio-blk with small block sizes (16KB and below) is enhanced via submitting async IO requests in batches. (#7146) Faster VM Pause Operation The VM pause operation now is significantly faster particularly for VMs with a large number of vCPUs. (#7290) Updated Documentation on Windows Guest Support Our Windows documentation now includes instructions to run Windows 11 guests, in addition to Windows Server guests. (#7218) Policy on AI Generated Code We will decline any contributions known to contain contents generated or derived from using Large Language Models (LLMs). Details can be found in our contributing documentation. (#7162) Removed SGX Support The SGX support has been removed, as announced in the deprecation notice two release cycles ago. (#7093) Notable Bug Fixes Seccomp filter fixes with glibc v2.42 (#7327) Various fixes related to (#7331, #7334, #7335) ``` From https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v48.0 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-09-15 08:30:18 +02:00
Fupan Li	4dc21aa966	Merge pull request #11766 from Apokleos/fix-create_container_timeout kata-types: Support create_container_timeout set within configuration	2025-09-15 10:19:58 +08:00
Alex Lyn	7874505249	Merge pull request #11782 from Apokleos/enhance-policy-rs genpolicy: Enhance policy rule for runtime-rs scenarios	2025-09-15 10:07:14 +08:00
Alex Lyn	e3d6cb8547	Merge pull request #11716 from lifupan/fupan_main runtime-rs: make the virtio-blk use the pci bus as default	2025-09-15 09:49:40 +08:00
Alex Lyn	7062a769b7	genpolicy: Exclude cgroup namespace from namespace validation Exclude 'cgroup' namespace from namespace checks during `allow_linux` validation. This complements the existing exclusion of the 'network' namespace. runtime-rs has specific cgroup namespace configurations, and excluding the namespace from policy validation ensures parity between the runtime-rs and runtime-go implementations. This allows focusing validation on critical namespaces like PID, IPC, and MNT, while avoiding potential policy mismatches due to the different cgroup namespace management in runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-14 17:24:06 +08:00
Alex Lyn	12a9ad56b4	genpolicy: Normalize namespace type for mount/mnt compatibility Add `normalize_namespace_type()` function to map "mount" (case-insensitive) to "mnt" while keeping other values unchanged. This ensures namespace comparisons treat "mount" and "mnt" as equivalent. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-14 17:24:06 +08:00
Alex Lyn	ebdfbd3120	genpolicy: Make comparison order-independent and accept CAP_X/X - Use set comparison to ignore ordering differences when matching capabilities. - Add normalization to strip "CAP_" prefix to support both CAP_XXX and XXX formats. This makes capability matching more robust against different ordering and naming formats. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-14 17:23:58 +08:00
Alex Lyn	04dedda6ed	runtime-rs: Bugfix for kata virtual volume overlay fstype The previous configuration with overlayfs is incorrect, which causes agent policy validation to fail. It also differs from runtime-go's configuration. In this patch, we correct the fstype to overlay and align with runtime on this matter. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-14 16:38:09 +08:00
Fupan Li	d073af4e64	dragonball: fix the issue of missing unregister doorbell It should unregister the doorbell resources once the device was reset. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	2844a6f938	runtime-rs: sync hotunplug the block devices for dragonball When hot-removing a block device, the kernel must first unmount the device and then destroy it on the VM. Therefore, a prepare_remove_block_device procedure must be added to wait for the kernel to unmount the device before destroying it on the VM. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	6e5fe96ed1	dragonball: sync remove the block devices When hot-removing a block device, the kernel must first remove the device and then destroy it on the VM. Therefore, a prepare_remove_block_device procedure must be added to wait for the kernel to unmount the device before destroying it on the VM. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	c80ddd3fd9	runtime-rs: make virtio-blk use the pci bus as default Since Dragonball's MMIO bus only supports legacy interrupts, while the PCI bus supports MSIX interrupts, to improve performance for block devices, virtio-blk devices are set to PCI bus mode by default. We tested the virtio-blk's performance using fio with the following command: fio -filename=./test -direct=1 -iodepth 32 -thread -rw=randrw -rwmixread=50 -ioengine=libaio -bs=4k -size=10G -numjobs=4 -group_reporting -name=mytest When using the legacy interrupt, the io test is as below: read : io=20485MB, bw=195162KB/s, iops=48790, runt=107485msec write: io=20475MB, bw=195061KB/s, iops=48765, runt=107485msec Once switched to msix interrupt, the io test is as below: read : io=20485MB, bw=260862KB/s, iops=65215, runt= 80414msec write: io=20475MB, bw=260727KB/s, iops=65181, runt= 80414msec We can get 34% performance improvement. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	2dd172c5b6	dragonball: Add the pci bus support for virtio-blk Added support for PCI buses for virtio-blk devices. This commit adds support for PCI buses for both cold-plugged and hot-plugged virtio-blk devices. Furthermore, during hot-plugging, support is added for synchronous waiting for hot-plug completion. This ensures that multiple devices can be hot-plugged successfully without causing upcall busy errors. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	3c3823f2e4	dragonball: refactoring the pci system manager In order to support the pci bus for virtio devices, move the pci system manager from vfio manager to device manager, thus it can be shared by both of vfio and virtio pci devices. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	59273e8b2d	dragonball: add the msix interrupt support Add the msix notify support for virtio queues. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Fupan Li	7de6455742	dragonball: add the pci bus support for virtio Add the pci bus support for virtio devices. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-14 09:08:26 +08:00
Dan Mihai	34925ae740	Merge pull request #11795 from microsoft/danmihai1/snp-annotations runtime: snp: enable CoCo annotations	2025-09-12 14:23:54 -07:00
Dan Mihai	60beb5236d	runtime: snp: enable CoCo annotations Use @DEFENABLEANNOTATIONS_COCO@ in configuration-qemu-snp.toml, for consistency with the tdx and coco-dev configuration files. k8s-initdata.bats was failing during CI on SNP without this change, because the cc_init_data annotation was disabled. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-12 15:38:33 +00:00
RuoqingHe	a011d2132f	Merge pull request #11775 from RuoqingHe/fix-test_execute_hook libs: Fix unit tests under non-root user	2025-09-12 08:03:05 +08:00
Aurélien Bombo	760b465bb0	Merge pull request #11788 from kata-containers/sprt/zizmor-branch ci: Run Zizmor on pushes to any branch	2025-09-11 11:52:06 -05:00
Aurélien Bombo	11655ef029	ci: Run Zizmor on pushes to any branch This runs Zizmor on pushes to any branch, not just main. This is useful for: 1. Testing changes in feature branches with the manually-triggered CI. 2. Forked repos that may use a different name than "main" for their default branch. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-11 09:33:25 -05:00
Ruoqing He	f6e93c2094	libs: Fix test_get_uds_with_sid_with_zero Test case for `get_uds_with_sid` with an empty run directory would not hit the 0 match arm, i.e. "sandbox with the provided prefix {short_id:?} is not found", because `get_uds_with_sid` will try to create the directory with provided short id before detecting `target_id`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-09-11 02:04:54 +00:00
Ruoqing He	b10e5a2250	libs: Fix test_get_uds_with_sid_ok Preset directory `kata98654sandboxpath1` will produce more than one `target_id` in `get_uds_with_sid`, which causes test to fail. Remove that directory to make this test work. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-09-11 02:04:54 +00:00
Ruoqing He	efeba0b8ed	libs: Detect guest protection before testing `test_arch_guest_protection_*` test cases get triggered simultaneously, which is impossible for a single machine to pass. Modify tests to detect protection file before proceeding. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-09-11 02:04:54 +00:00
Ruoqing He	a9ba18d48c	libs: Fix test_execute_hook test Case 4 of `test_execute_hook` would fail because `args` could not be empty, while by providing `build_oci_hook` with `vec![]` would result in empty args at execution stage. Modify `build_oci_hook` to set args as `None` when empty vector is provided. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-09-11 02:04:54 +00:00
Dan Mihai	5d59341f7f	Merge pull request #11780 from ryansavino/snp-guest-kernel-upgrade-issue packaging: add required modules for confidential guest kernel	2025-09-10 18:21:26 -07:00
Liang, Ma	a989686cf6	kata-agent: Rename misleading variable in config parsing The variable `addr` was used to store the log level string read from the `LOG_LEVEL_ENV_VAR` environment variable. This name is misleading as it implies a network address rather than a log level value. This commit renames the variable to `level` to more accurately reflect its purpose, improving the overall readability of the configuration code. A minor whitespace formatting fix in a macro is also included. Signed-off-by: Liang, Ma <liang3.ma@intel.com>	2025-09-11 07:54:48 +08:00
Steve Horsman	58259aa5f4	Merge pull request #11754 from stevenhorsman/go.mod-1.24.6-bump versions: Tidy up go.mod versions	2025-09-10 14:11:33 +01:00
Hyounggyu Choi	1737777d28	Merge pull request #11743 from BbolroC/enable-ci-qemu-se-runtime-rs runtime-rs: Enable s390x nightly test for IBM SEL	2025-09-10 15:00:16 +02:00
Alex Lyn	1d26d07110	Merge pull request #11781 from lifupan/fupan_main_qemu runtime-rs: log out the qemu console when debug enabled	2025-09-10 16:59:30 +08:00
Hyounggyu Choi	1060a94b08	GHA: Add s390x nightly test for runtime-rs on IBM SEL A new internal nightly test has been established for runtime-rs. This commit adds a new entry `cc-se-e2e-tests-rs` to the existing matrix and renames the existing entry `cc-se-e2e-tests` to `cc-se-e2e-tests-go`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-10 10:57:40 +02:00
Hyounggyu Choi	37764d18d4	tests: Skip k8s tests for qemu-se-runtime-rs Tests skipped because tests for `qemu-se` are skipped: - k8s-empty-dirs.bats - k8s-inotify.bats - k8s-shared-volume.bats Tests skipped because tests for `qemu-runtime-rs` are skipped: - k8s-block-volume.bats - k8s-cpu-ns.bats - k8s-number-cpus.bats Let's skip the tests above to run the nightly test for runtime-rs on IBM SEL. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-10 10:57:40 +02:00
Steve Horsman	e502fa2feb	Merge pull request #11731 from kata-containers/dependabot/go_modules/src/tools/csi-kata-directvolume/github.com/ulikunitz/xz-0.5.14 build(deps): bump github.com/ulikunitz/xz from 0.5.11 to 0.5.14 in /src/tools/csi-kata-directvolume	2025-09-10 09:47:28 +01:00
Steve Horsman	3f25b88f89	Merge pull request #11737 from kata-containers/dependabot/cargo/src/runtime-rs/tracing-subscriber-0.3.20 build(deps): bump tracing-subscriber from 0.3.17 to 0.3.20 in /src/runtime-rs	2025-09-10 09:47:07 +01:00
Steve Horsman	22bc29cb4a	Merge pull request #11746 from stevenhorsman/bump-tests-go-mod-yaml-3.0.1 versions: Bump gopkg.in/yaml.v3	2025-09-10 09:46:18 +01:00
RuoqingHe	106c6cea59	Merge pull request #11774 from RuoqingHe/2025-09-09-disable-make-test-libs-temporarily ci: gatekeeper: Mark `make test libs` not required	2025-09-10 14:52:33 +08:00
Fupan Li	16be168062	runtime-rs: log out the qemu console when debug enabled When hypervisor's debug enabled, log out the qemu's console messages for kernel boot debugging. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-10 14:19:15 +08:00
Fupan Li	5715408d61	runtime-rs: add the console device to kernel boot for qemu Add the console device to kernel boot, thus we can log out the kernel's boot message for debug. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-09-10 14:10:45 +08:00
Ruoqing He	6a2d813196	ci: gatekeeper: Mark `make test libs` not required There are still some issues to be addressed before we can mark `make test` for `libs` as required. Mark this case as not required temporarily. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-09-10 03:52:20 +00:00
Ryan Savino	85779a6f1a	packaging: add required modules for confidential guest kernel SNP launch was failing after the confidential guest kernel was upgraded to 6.16.1. Added required module CONFIG_MTRR enabled. Added required module CONFIG_X86_PAT enabled. Fixes: #11779 Signed-off-by: Ryan Savino <ryan.savino@amd.com>	2025-09-09 21:58:15 -05:00
Xuewei Niu	c1ee0985ed	Merge pull request #11770 from stevenhorsman/agent-ctl-bump-hypervisor agent-ctl: version: bump hypervisor	2025-09-09 11:59:25 +08:00
Aurélien Bombo	ceab55a871	Merge pull request #11772 from kata-containers/sprt/zizmor-hash ci: security: Fix "commit hash does not point to a Git tag"	2025-09-08 13:56:25 -05:00
Aurélien Bombo	b640fe5a6a	Merge pull request #11756 from kata-containers/sprt/curl-logging ci: cri-containerd-amd64: add logging for curl failures	2025-09-08 11:55:29 -05:00
Aurélien Bombo	c0030c271c	ci: security: Fix "commit hash does not point to a Git tag" This fixes all such issues, ie.: https://github.com/kata-containers/kata-containers/security/code-scanning/459 https://github.com/kata-containers/kata-containers/security/code-scanning/508 https://github.com/kata-containers/kata-containers/security/code-scanning/510 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-08 11:17:54 -05:00
Aurélien Bombo	cbcc7af6f3	Merge pull request #11615 from kata-containers/sprt/zizmor-pedantic security: gha: Run Zizmor in auditor mode	2025-09-08 10:28:19 -05:00
stevenhorsman	87356269d8	versions: Tidy up go.mod versions Update go 1.23 references to go 1.24.6 to match versions.yaml Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-08 14:03:47 +01:00
stevenhorsman	2d28f3d267	agent-ctl: version: bump hypervisor Bump the version of runtime-rs' hypervisor crate to upgrade (indirectly) protobuf and remediate vulnerability RUSTSEC-2024-0437 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-08 13:23:03 +01:00
dependabot[bot]	5ae34ab240	build(deps): bump github.com/ulikunitz/xz Bumps [github.com/ulikunitz/xz](https://github.com/ulikunitz/xz) from 0.5.11 to 0.5.14. - [Commits](https://github.com/ulikunitz/xz/compare/v0.5.11...v0.5.14) --- updated-dependencies: - dependency-name: github.com/ulikunitz/xz dependency-version: 0.5.14 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-09-08 11:30:49 +01:00
Alex Lyn	8eeea7d1fc	runtime-rs: Correct the default create_container_timeout with 30s The previously documented default of create_container_timeout is 30,000 milliseconds, which is not aligned with runtime-go. In this commit, we change it to 30 seconds. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-07 21:59:37 +08:00
Alex Lyn	3e53f2814a	kata-types: Support create_container_timeout set within configuration Since it aligns with the create_container_timeout definition in runtime-go, we need to set the value in configuration.toml in seconds, not milliseconds. We must also convert it to milliseconds when the configuration is loaded for request_timeout_ms. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-07 21:59:32 +08:00
Alex Lyn	4644a02871	Merge pull request #11752 from Apokleos/fix-hooks-devcgrp runtime-rs: Remove default value of Linux.Resources.Devices and correctly set Hooks in OCI Spec to meet with Agent Policy requirements	2025-09-07 18:01:02 +08:00
stevenhorsman	66dc24566f	versions: Bump gopkg.in/yaml.v3 Bump gopkg.in/yaml.v3 from 3.0.0 to 3.0.1 to remediate CVE-2022-28948 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-09-05 16:36:48 +01:00
Aurélien Bombo	c480737ebd	ci: cri-containerd-amd64: add logging for curl failures This is to investigate #11755. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-05 10:35:45 -05:00
Aurélien Bombo	efbc69a2ec	Merge pull request #11760 from kata-containers/sprt/oidc-fix ci: aks: Refresh OIDC token in case access token expired	2025-09-05 10:29:35 -05:00
Dan Mihai	1f68f15995	Merge pull request #11759 from microsoft/danmihai1/policy-storages genpolicy: print Input and Policy storages	2025-09-04 15:07:45 -07:00
Aurélien Bombo	f39517a18a	ci: aks: Refresh OIDC token in case access token expired It's possible that tests take a long time to run and hence that the access token expires before we delete the cluster. In this case `az cli` will try to refresh the access token using the OIDC token (which will have definitely also expired because its lifetime is ~5 minutes). To address this we refresh the OIDC token manually instead. Automatic refresh isn't supported per Azure/azure-cli#28708. Fixes: #11758 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-04 12:44:02 -05:00
Dan Mihai	9b0b7fc795	genpolicy: print Input and Policy storages Print the Storage data structures, to help with debugging. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-09-04 16:03:03 +00:00
Cameron Baird	bdd98ec623	ci: Add test case for iptables, exercised via istio init container Introduce new test case in k8s-iptables.bats which verifies that workloads can configure iptables in the UVM. Users discovered that they weren't able to do this for common usecases such as istio. Proper support for this should be built into UVM kernels. This test ensures that current and future kernel configurations don't regress this functionality. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-09-04 07:18:45 +02:00
Cameron Baird	d16026f7b9	kernel: add required configs for ip6tables support Currently, the UVM kernel fails for istio deployments (at least with the version we tested, 1.27.0). This is because the istio sidecar container uses ip6tables and the required kernel configs are not built-in: ``` iptables binary ip6tables has no loaded kernel support and cannot be used, err: exit status 3 out: ip6tables v1.8.10 (legacy): can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?) Perhaps ip6tables or your kernel needs to be upgraded. ``` Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-09-04 07:18:45 +02:00
Aurélien Bombo	1dcc67c241	security: gha: Use Zizmor's auditor mode This is the strictest possible setting for Zizmor. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-09-03 12:30:09 -05:00
Hyounggyu Choi	49ca96561b	Merge pull request #11750 from BbolroC/use-pattern-working-for-both-runtimes tests: Use "Failed" consistently for both runtimes	2025-09-03 13:06:05 +02:00
Alex Lyn	e235fc1efb	runtime-rs: Remove default value of Linux.Resources.Devices in OCI Spec In certain scenarios, particularly under CoCo/Agent Policy enforcement, the default initial value of `Linux.Resources.Devices` is considered non-compliant, leading to container creation failures. To address this issue and ensure consistency with the behavior in `runtime-go`, this commit removes the default value of `Linux.Resources.Devices` from the OCI Spec. This cleanup ensures that the OCI Spec aligns with runtime expectations and prevents policy violations during container creation. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-03 18:42:34 +08:00
Alex Lyn	203f7090a6	runtime-rs: Ensure the setting of hooks when OCI Hooks is existing. Only the StartContainer hook needs to be reserved for execution in the guest, but we also make sure that the setting happens only when the OCI Hooks does exist, otherwise we do nothing. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-09-03 17:38:40 +08:00
Hyounggyu Choi	6d6202bbe3	tests: Use "Failed" consistently for both runtimes In k8s-guest-pull-image.bats, `failed to pull image` is not caught by assert_logs_contain() for runtime-rs. To ensure consistency, this commit changes `failed` to `Failed`, which works for both runtimes. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-09-03 09:09:13 +02:00
Hyounggyu Choi	150c90e32a	Merge pull request #11728 from BbolroC/fix-sealed-secret-volume runtime-rs: Adjust path for sealed secret mount check	2025-09-02 16:57:33 +02:00
Fupan Li	9cc1c76ade	Merge pull request #11729 from kata-containers/dependabot/go_modules/src/tools/log-parser/gopkg.in/yaml.v3-3.0.1 build(deps): bump gopkg.in/yaml.v3 from 3.0.0 to 3.0.1 in /src/tools/log-parser	2025-09-02 17:05:51 +08:00
dependabot[bot]	8330dd059f	build(deps): bump tracing-subscriber in /src/runtime-rs Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.3.17 to 0.3.20. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.3.17...tracing-subscriber-0.3.20) --- updated-dependencies: - dependency-name: tracing-subscriber dependency-version: 0.3.20 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-08-29 20:44:35 +00:00
Xuewei Niu	f6ff9cf717	Merge pull request #11689 from Caspian443/fix-devmapper-selinux-mount-issue runtime-rs: Empty block-rootfs Storage.options and align with Go runtime	2025-08-29 15:29:46 +08:00
Aurélien Bombo	754f07cff2	Merge pull request #11614 from kata-containers/workflow-permissions-tightening Workflow permissions tightening	2025-08-28 10:56:03 -05:00
dependabot[bot]	3a0416c99f	build(deps): bump gopkg.in/yaml.v3 in /src/tools/log-parser Bumps gopkg.in/yaml.v3 from 3.0.0 to 3.0.1. --- updated-dependencies: - dependency-name: gopkg.in/yaml.v3 dependency-version: 3.0.1 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-08-28 14:03:22 +00:00
Hyounggyu Choi	65fdb18c96	runtime-rs: Adjust path for sealed secret mount check Mount validation for sealed secret requires the base path to start with `/run/kata-containers/shared/containers`. Previously, it used `/run/kata-containers/sandbox/passthrough`, which caused test failures where volume mounts are used. This commit renames the path to satisfy the validation check. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-28 15:38:07 +02:00
Fabiano Fidêncio	08d2ba1969	cgroups: Fix "." parent cgroup special case `ef642fe890` added a special case to avoid moving cgroups that are on the "default" slice in case of deletion. However, this special check should be done in the Parent() method instead, which ensures that the default resource controller ID is returned, instead of ".". Fixes: #11599 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-27 08:15:15 +02:00
Caspian443	617af4cb3b	runtime-rs: Empty block-rootfs Storage.options and align with Go runtime - Set guest Storage.options for block rootfs to empty (do not propagate host mount options). - Align behavior with Go runtime: only add xfs nouuid when needed. Signed-off-by: Caspian443 <scrisis843@gmail.com>	2025-08-26 01:27:21 +00:00
Caspian443	9a7aadaaca	libs: Introduce rootfs fs types - Add new kata-types::fs module with: - VM_ROOTFS_FILESYSTEM_EXT4 - VM_ROOTFS_FILESYSTEM_XFS - VM_ROOTFS_FILESYSTEM_EROFS - Export fs module in src/libs/kata-types/src/lib.rs - Remove duplicated filesystem constants from src/runtime-rs/crates/hypervisor/src/lib.rs - Update src/runtime-rs/crates/hypervisor/src/kernel_param.rs (and tests) to import from kata_types::fs Signed-off-by: Caspian443 <scrisis843@gmail.com>	2025-08-26 01:26:53 +00:00
Fabiano Fidêncio	63f6dcdeb9	kata-manager: Support xz and zst suffixes for the kata tarball We moved to `.zst`, but users still use the upstream kata-manager to download older versions of the project, thus we need to support both suffixes. Fixes: #11714 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-25 21:15:06 +02:00
Fupan Li	687d0bf94a	Merge pull request #11715 from fidencio/topic/backport-qemu-reclaim-guest-freed-memory runtime: qemu: Add reclaim_guest_freed_memory [BACKPORT]	2025-08-25 16:59:29 +08:00
Fabiano Fidêncio	fd1b8ceed1	runtime: qemu: Add reclaim_guest_freed_memory [BACKPORT] Similar to what we've done for Cloud Hypervisor in the commit `9f76467cb7`, we're backporting a runtime-rs feature that would be beneficial to have as part of the go runtime. This allows users to use virtio-balloon for the hypervisor to reclaim memory freed by the guest. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-22 23:56:47 +02:00
stevenhorsman	b4545da15d	workflows: Set top-level permissions to empty The default suggestion for top-level permissions was `contents: read`, but scorecard notes anything other than empty, so try updating it and see if there are any issues. I think it's only needed if we run workflows from other repos. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 14:13:21 +01:00
stevenhorsman	f79e453313	workflows: Tighten up workflow permissions Since the previous tightening a few workflow updates have gone in and the zizmor job isn't flagging them as issues, so address this to remove potential attack vectors Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 14:13:21 +01:00
Fabiano Fidêncio	e396a460bc	Revert "local-build: Enforce USE_CACHE=no" This reverts commit `cb5f143b1b`, as the cached packages have been regenerated after the switch to using zstd. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-22 14:03:36 +02:00
Steve Horsman	23d2dfaedc	Merge pull request #11707 from fidencio/topic/switch-to-use-zstd-when-possible kata-deploy: local-build: Use zstd instead of xz	2025-08-22 10:06:00 +01:00
stevenhorsman	8cbb1a4357	runtime: Fix non constant Errorf formatting As part of the go 1.24.6 bump there are errors about the incorrect use of a errorf, so switch to the non-formatting version, or add the format string as appropriate Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
stevenhorsman	381da9e603	versions: Bump golang to 1.24.6 golang 1.25 has been released, so 1.23 is EoL, so we should update to ensure we don't end up with security issues Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
stevenhorsman	0ccf429a3d	workflows: Switch workflows to use install_go.sh Update the two workflows that used setup-go to instead call `install_go.sh` script, which handles installing the correct version of golang Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
stevenhorsman	5f7525f099	build: Add darwin support to arch_to_golang Avoid the error `ERROR: unsupported architecture: arm64` in install_go.sh on darwin Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
stevenhorsman	3391c6f1c5	ci: Make install_go.sh more portable `${kernel_name,,}` is bash 4.0 and not posix compliant, so doesn't work on macos, so switch to `tr` which is more widely supported Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-22 10:44:15 +02:00
Alex Lyn	91913f9e82	Merge pull request #11711 from stevenhorsman/remote-allow-cc_init_data-annotation runtime: Enable init_data annotation	2025-08-22 14:41:53 +08:00
Fupan Li	1a0fbbfa32	Merge pull request #11699 from Apokleos/support-nonprotection runtime-rs: Support initdata within NonProtection scenarios	2025-08-22 10:24:47 +08:00
Hyounggyu Choi	41dcfb4a9f	Merge pull request #11321 from BbolroC/reconnect-timeout-qemu-se runtime-rs: Adjust VSOCK timeouts for IBM SEL	2025-08-22 00:34:05 +02:00
Fabiano Fidêncio	cb5f143b1b	local-build: Enforce USE_CACHE=no We need that to regenerate the tarballs that are already cached in the zstd format. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-21 21:00:20 +02:00
stevenhorsman	081823b388	runtime: Enable init_data annotation In #11693 the cc_init_data annotation was changed to be hypervisor scoped, so each hypervisor now needs to explicitly allow it in order to use it. Add this to both the go and rust runtime's remote configurations Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-21 19:26:10 +01:00
Fabiano Fidêncio	f8d7ff40b4	local-build: Fix shim-v2 no cache build with measured rootfs We need to get the root_hash.txt file from the image build, otherwise there's no way to build the shim using those values for the configuration files. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-21 19:56:01 +02:00
Fabiano Fidêncio	ad240a39e6	kata-deploy: tools: tests: Use zstd instead of xz Although the compress ratio is not as optimal as using xz, it's way faster to compress / uncompress, and it's "good enough". This change is not small, but it's still self-contained, and has to get in at once, in order to help bisects in the future. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-08-21 19:53:55 +02:00
Fabiano Fidêncio	9cc97ad35c	kata-deploy: Bump image to use alpine 3.22 As 3.18 is already EOL. We need to add `--break-system-packages` to force the installation of the yq version that we rely on. The tests have shown that no breakage actually happens, fortunately. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-08-21 19:53:55 +02:00
Fabiano Fidêncio	1329ce355e	versions: image / initrd: Bump to alpine 3.22 As the 3.18 is EOL'ed. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-21 19:53:55 +02:00
Fabiano Fidêncio	c32fc409ec	rootfs-builder: Bump alpine to 3.22 As we were using a very old non-supported version. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-21 19:53:55 +02:00
Zvonko Kaiser	60d87b7785	gpu: Add more debugging to CI/CD Capture NVRC logs via journalctl Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-21 18:09:20 +02:00
Alex Lyn	e430727cb6	runtime-rs: Change the initdata device driver with block_device_driver Currently, we change vm_rootfs_driver as the initdata device driver with block_device_driver. Fixes #11697 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-21 18:56:26 +08:00
Alex Lyn	5cc028a8b1	runtime-rs: Support initdata within NonProtection scenarios We also need to support initdata when the platform is detected as NonProtection, which is usually called a non-TEE host. In these cases, there's no need to validate the item `confidential_guest=true`; we trust the result of the method `available_guest_protection()?`. Fixes #11697 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-21 18:56:23 +08:00
Hyounggyu Choi	faf5aed965	runtime-rs: Adjust VSOCK timeouts for IBM SEL The default `reconnect_timeout` (3 seconds) was found to be insufficient for IBM SEL when using VSOCK. This commit updates the timeouts as follows: - `dial_timeout_ms`: Set to 90ms to match the value used in go-runtime for IBM SEL - `reconnect_timeout_ms`: Increased to 5000ms based on empirical testing Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-21 12:35:44 +02:00
Hyounggyu Choi	b7d2973ce5	Merge pull request #11696 from BbolroC/enable-initdata-ibm-sel-runtime-rs runtime-rs Enable initdata IBM SEL	2025-08-21 09:23:46 +02:00
Hyounggyu Choi	c4b4a3d8bb	tests: Add hypervisor qemu-se-runtime-rs for initdata This commit adds a new hypervisor `qemu-se-runtime-rs` to test initdata for IBM SEL (s390x). Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 18:57:50 +02:00
Hyounggyu Choi	2ec70bc8e2	runtime-rs: Enable initdata spec for IBM SEL Add support for the `InitData` resource config on IBM SEL, so that a corresponding block device is created and the initdata is passed to the guest through this device. Note that we skip passing the initdata hash via QEMU’s object, since the hypervisor does not yet support this mechanism for IBM SEL. It will be introduced separately once QEMU adds the feature. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 18:57:50 +02:00
Zvonko Kaiser	c980b6e191	release: Bump version to 3.20.0 Bump VERSION and helm-chart versions Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-20 18:18:05 +02:00
Markus Rudy	30aff429df	Merge pull request #11647 from Park-Jiyeonn/opt/sealed-secret-prefix-check Optimize sealed secret scanning to avoid full file reads	2025-08-20 17:18:20 +02:00
Alex Lyn	014ab2fce6	Merge pull request #11693 from BbolroC/revert-initdata-annotation runtime-rs: Fix issues for initdata	2025-08-20 21:17:52 +08:00
Fabiano Fidêncio	dd1752ac1c	Merge pull request #11634 from mythi/coco-kernel-v6.16 versions: update kernel-confidential to Linux v6.16.1	2025-08-20 13:01:05 +02:00
Fupan Li	29ab8df881	Merge pull request #11514 from Apokleos/ci-for-libs CI: Introduce CI for libs to Improve code quality and reduce noises	2025-08-20 18:59:27 +08:00
Hyounggyu Choi	0ac8f1f70e	Merge pull request #11705 from Apokleos/remove-default-guesthookpath kata-types: remove default setting of guest_hook_path	2025-08-20 11:15:25 +02:00
Mikko Ylinen	a0ae1b6608	packaging: kernel: libdw-dev and python3 to builder image These new dependencies are needed by Linux 6.16+. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-08-20 11:34:09 +03:00
Mikko Ylinen	412a384aad	versions: update kernel-confidential to Linux v6.16.1 Linux v6.16 brings some useful features for the confidential guests. Most importantly, it adds an ABI to extend runtime measurement registers (RTMR) for the TEE platforms supporting it. This is currently enabled on Intel TDX only. The kernel version bump from v6.12.x to v6.16 forces some CONFIG_* changes too: MEMORY_HOTPLUG_DEFAULT_ONLINE was dropped in favor of more config choices. The equivalent option is MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO. X86_5LEVEL was made unconditional. Since this was only a TDX configuration, dropping it completely as part of v6.16 is fine. CRYPTO_NULL2 was merged with CRYPTO_NULL. This was only added in confidential guest fragments (cryptsetup) so we can drop it in this update. CRYPTO_FIPS now depends on CRYPTO_SELFTESTS which further depends on EXPERT which we don't have. Enable both in a separate config fragment for confidential guests. This can be moved to a common setting once other targets bump to post v6.16. CRYPTO_SHA256_SSE3 arch optimizations were reworked and are now enabled by default. Instead of adding it to whitelist.conf, just drop it completely since it was only enabled as part of "measured boot" feature for confidential guests. CONFIG_CRYPTO_CRC32_S390 was reworked the same way. In this case, whitelist.conf is needed. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-08-20 11:32:48 +03:00
Hyounggyu Choi	0daafecef2	Revert "runtime-rs: Correct the coresponding initdata annotation const" This reverts commit `37685c41c7`. This renames the relevant constant for initdata. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 10:15:23 +02:00
Hyounggyu Choi	f0db4032f2	Revert "kata-types: Align the initdata annotation with kata-runtime's definition" This reverts commit `ede773db17`. `cc_init_data` should be under a hypervisor category because it is a hypervisor-specific feature. The annotation including `runtime` also breaks a logic for `is_annotation_enabled()`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 10:15:23 +02:00
Hyounggyu Choi	208cec429a	runtime-rs: Introduce CoCo-specific enable_annotations We need to include `cc_init_data` in the enable_annotations array to pass the data. Since initdata is a CoCo-specific feature, this commit introduces a new array, `DEFENABLEANNOTATIONS_COCO`, which contains the required string and applies it to the relevant CoCo configuration. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 10:15:23 +02:00
Hyounggyu Choi	1f978ecc31	runtime-rs: Fix issues for empty initdata annotation test Currently, there are 2 issues for the empty initdata annotation test: - Empty string handling - "\[CDH\] \[ERROR\]: Get Resource failed" not appearing `add_hypervisor_initdata_overrides()` does not handle an empty string, which might lead to panic like: ``` called `Result::unwrap()` on an `Err` value: gz decoder failed Caused by: failed to fill whole buffer ``` This commit makes the function return an empty string for a given empty input and updates the assertion string to one that appears in both go-runtime and runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com> Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-20 10:15:23 +02:00
alex.lyn	b23d094928	CI: Introduce CI for libs to Improve code quality and reduce noises Currently, runtime-rs related code within the libs directory lacks sufficient CI protection. We frequently observe the following issues: - Inconsistent Code Formatting: Code that has not been properly formatted is merged. - Failing Tests: Code with failing unit or integration tests is merged. To address these issues, we need to introduce stricter CI checks for the libs directory. This may specifically include: - Code Formatting Checks - Mandatory Test Runs Fixes #11512 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	0f19465b3a	shim-interface: Do cargo check and reduce warnings Reduce shim-interface's warnings caused by non-formatted or unchecked operations. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	e05197e81c	safe-path: Do cargo check and reduce warnings Reduce warnings caused by non-formatted or unchecked operations. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	683d673f4f	protocols: Do cargo format to make codes clean Fix protocols' warnings by correctly running cargo check/format. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	38242d3a61	kata-types: Do cargo check and reduce warnings Reduce noise caused by non-formatted code. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	283fd45045	kata-sys-utils: fix warnings for s390x The warnings are reported as below: ``` --> kata-sys-util/src/protection.rs:145:9 \| 145 \| return Err(ProtectionError::NoPerms)?; \| ^^^^^^^ help: remove it \| ... error: `to_string` applied to a type that implements `Display` in `format!` args --> kata-sys-util/src/protection.rs:151:16 \| 151 \| err.to_string() \| ^^^^^^^^^^^^ help: remove this ``` Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:36:09 +08:00
alex.lyn	730b0f1769	kata-sys-utils: Do cargo check codes and reduce warnings Fix kata-sys-utils warnings by correctly running cargo check and testing it well. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-08-20 15:35:42 +08:00
Fabiano Fidêncio	585d0be342	Merge pull request #11691 from alextibbles/update-lts-kernel versions: update to latest LTS kernel 6.12.42	2025-08-20 08:55:06 +02:00
Fupan Li	b748688e69	Merge pull request #11698 from Apokleos/filter-arpneibhors runtime-rs: Add only static ARP entries with handle_neighours	2025-08-20 14:05:20 +08:00
Alex Lyn	c4af9be411	kata-types: remove default setting of guest_hook_path To align it with the setting of runtime-go, we should keep it empty when users don't enable it and set a specific path. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-20 13:56:42 +08:00
Zvonko Kaiser	bce8efca67	gpu: Rebuild initrd and image for kernel bump We need to make sure that we use the latest kernel and rebuild the initrd and image for the nvidia-gpu use-cases; otherwise the tests will fail since the modules are not built against the new kernel and they simply fail to load. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-19 17:32:42 -04:00
Alex Tibbles	e20f6b2f9d	versions: update to latest LTS kernel 6.12.42 Fixes #11690 Signed-off-by: Alex Tibbles <alex@bleg.org>	2025-08-19 17:32:42 -04:00
Fabiano Fidêncio	3503bcdb50	Merge pull request #11701 from alextibbles/go-stdlib-#11700 versions: sync go.mod with versions.yaml for go 1.23.12	2025-08-19 22:14:57 +02:00
Alex Tibbles	a03dc3129d	versions: sync go.mod with versions.yaml for go 1.23.12 OSV-Scanner highlights go.mod references to go stdlib 1.23.0 contrary to intention in versions.yaml, so synchronize them. Make a converse comment for versions.yaml. Fixes: #11700 Signed-off-by: Alex Tibbles <alex@bleg.org>	2025-08-19 11:30:19 -04:00
Hyounggyu Choi	93ec470928	runtime/tests: Update annotation for initdata Let's rename the runtime-rs initdata annotation from `io.katacontainers.config.runtime.cc_init_data` to `io.katacontainers.config.hypervisor.cc_init_data`. Rationale: - initdata itself is a hypervisor-specific feature - the new name aligns with the annotation handling logic: `c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)` This commit updates the annotation for go-runtime and tests accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-08-19 15:17:01 +02:00
Alex Lyn	903e608c23	runtime-rs: Add only static ARP entries with handle_neighours To align it with runtime-go, we need to add only static ARP entries to the targets. Fixes #11697 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-19 20:09:20 +08:00
Steve Horsman	c92bb1aa88	Merge pull request #11684 from zvonkok/gpu-required gatekeeper: GPU test required	2025-08-15 10:30:19 +01:00
Hyounggyu Choi	28bd0cf405	Merge pull request #11640 from rafsal-rahim/bm-initdata-s390x Feat \| Implement initdata for bare-metal/qemu for s390x	2025-08-15 10:42:32 +02:00
Zvonko Kaiser	3a4e1917d2	gatekeeper: Make GPU test required We now run a simple RAG pipeline with each PR to make sure we do not break GPU support. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-14 18:35:39 -04:00
Aurélien Bombo	3a5e2060aa	Merge pull request #11683 from kata-containers/sprt/static-checks-default-branch ci: static-checks: Don't hardcode default repo branch	2025-08-14 17:01:18 -05:00
Zvonko Kaiser	55ee8abf0b	Merge pull request #11658 from kata-containers/amd64-nvidia-gpu-cicd-step2 gpu: AMD64 NVIDIA GPU CI/CD Part 2	2025-08-14 17:51:26 -04:00
Aurélien Bombo	0fa7d5b293	ci: static-checks: Don't hardcode default repo branch This would cause weird issues for downstreams whose default branch is not "main". Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-08-14 13:22:20 -05:00
Zvonko Kaiser	dcb62a7f91	Merge pull request #11525 from was-saw/qemu-seccomp runtime-rs: add seccomp support for qemu	2025-08-14 12:35:32 -04:00
Zvonko Kaiser	8be41a4e80	gpu: Add embedding service For a simple RAG pipeline, add an embedding service Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-14 16:34:21 +00:00
RuoqingHe	65a9fe0063	Merge pull request #11670 from kevinzs2048/add-aavmf CI: change the directory for Arm64 firmware	2025-08-14 21:30:21 +08:00
stevenhorsman	43cdde4c5d	test/k8s: Extend initdata tests to run on s390x Enable testing of initdata on the qemu-coco-dev and qemu-se runtime classes, so we can validate the function on s390x Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-14 17:10:58 +05:30
rafsalrahim	9891b111d1	runtime: Add initdata support to s390x - Added support for initdata device on s390x. - Generalized devno generation for QEMU CCW devices. Signed-off-by: rafsalrahim <rafsal.rahim@ibm.com>	2025-08-14 17:10:58 +05:30
wangxinge	d147e2491b	runtime-rs: add seccomp support for qemu This commit supports the seccomp_sandbox option from the configuration.toml file and adds the logic for appending command-line arguments based on this new configuration parameter. Fixes: #11524 Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-08-14 18:45:03 +08:00
Xuewei Niu	479cce8406	Merge pull request #11536 from was-saw/clh/fc-seccomp runtime-rs: add seccomp support for cloud hypervisor and firecracker	2025-08-14 18:23:14 +08:00
Dan Mihai	ea74024b93	Merge pull request #11663 from burgerdev/arp genpolicy: support AddARPNeighbors	2025-08-13 14:54:36 -07:00
Kevin Zhao	aadad0c9b6	CI: change the directory for Arm64 firmware Previously it reused the ovmf directory, which caused issues with path checking, so move to aavmf as it should be. Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-08-13 23:39:44 +02:00
Fabiano Fidêncio	cfd0ebe85f	Merge pull request #11675 from katexochen/snp-guest-policy runtime: make SNP guest policy configurable	2025-08-13 22:20:51 +02:00
Steve Horsman	c7f4c9a3bb	Merge pull request #11676 from stevenhorsman/golang-1.23.12-bump versions: Bump golang to 1.23.12	2025-08-13 15:24:17 +01:00
Park.Jiyeon	2f50c85b12	agent: avoid full file reads when scanning sealed secrets. Read only the sealed secret prefix instead of the whole file. Improves performance and reduces memory usage in I/O-heavy environments. Fixes: #11643 Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2025-08-13 20:32:03 +08:00
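The prefix-only scan described in the commit above can be sketched in Rust as follows. This is a minimal illustration, not the agent's code: the function name `has_sealed_prefix` and the marker value `sealed.` are assumptions, since the commit message does not spell out the prefix or the API.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Marker that identifies a sealed secret.
/// ASSUMPTION: the value is illustrative; the commit message does not state it.
const SEALED_SECRET_PREFIX: &[u8] = b"sealed.";

/// Returns true if the file at `path` starts with the sealed-secret prefix.
/// Hypothetical helper: only the first few bytes are read, never the whole file.
pub fn has_sealed_prefix(path: &Path) -> io::Result<bool> {
    let file = File::open(path)?;
    let mut head = Vec::with_capacity(SEALED_SECRET_PREFIX.len());
    // `take` caps the reader at the prefix length, so large files cost
    // the same as small ones.
    file.take(SEALED_SECRET_PREFIX.len() as u64)
        .read_to_end(&mut head)?;
    Ok(head.as_slice() == SEALED_SECRET_PREFIX)
}
```

Files shorter than the prefix simply yield fewer bytes and compare unequal, so no separate length check is needed.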
Paul Meyer	5635410dd3	runtime: make SNP guest policy configurable Depending on the platform configuration, users might want to set a more secure policy than the QEMU default. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-08-13 09:06:36 +02:00
stevenhorsman	1a6f1fc3ac	versions: Bump golang to 1.23.12 Bump go version to remediate vuln GO-2025-3849 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-08-12 14:46:29 +01:00
Dan Mihai	9379a18c8a	Merge pull request #11565 from Sumynwa/sumsharma/agent_ctl_vm_boot_support agent-ctl: Add option "--vm" to boot pod VM for testing.	2025-08-11 09:36:23 -07:00
Sumedh Alok Sharma	c7c811071a	agent-ctl: Add option --vm to boot pod VM for testing. This change introduces a new command line option `--vm` to boot up a pod VM for testing. The tool connects with kata agent running inside the VM to send the test commands. The tool uses `hypervisor` crates from runtime-rs for VM lifecycle management. Current implementation supports Qemu & Cloud Hypervisor as VMMs. In summary: - tool parses the VMM specific runtime-rs kata config file in /opt/kata/share/defaults/kata-containers/runtime-rs/* - prepares and starts a VM using runtime-rs::hypervisor vm APIs - retrieves agent's server address to setup connection - tests the requested commands & shutdown the VM Fixes #11566 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2025-08-11 11:03:18 +00:00
wangxinge	f3a669ee2d	runtime-rs: add seccomp support for cloud hypervisor and firecracker The seccomp feature for Cloud Hypervisor and Firecracker is enabled by default. This commit introduces an option to disable seccomp for both and updates the built-in configuration.toml file accordingly. Fixes: #11535 Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>	2025-08-11 17:59:30 +08:00
Hyounggyu Choi	407252a863	Merge pull request #11641 from Apokleos/kata-log runtime-rs: Label system journal log with kata	2025-08-11 08:44:31 +02:00
Alex Lyn	196d7d674d	runtime-rs: Label system journal log with kata Route kata-shim logs directly to systemd-journald under 'kata' identifier. This refactoring enables `kata-shim` logs to be properly attributed to 'kata' in systemd-journald, instead of inheriting the 'containerd' identifier. Previously, `kata-shim` logs were challenging to filter and debug as they appeared under the `containerd.service` unit. This commit resolves this by: 1. Introducing a `LogDestination` enum to explicitly define logging targets (File or Journal). 2. Modifying logger creation to set `SYSLOG_IDENTIFIER=kata` when logging to Journald. 3. Ensuring type safety and correct ownership handling for different logging backends. This significantly enhances the observability and debuggability of Kata Containers, making it easier to monitor and troubleshoot Kata-specific events. Fixes: #11590 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com> Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-10 16:00:36 +08:00
Aurélien Bombo	be148c7f72	Merge pull request #11666 from kata-containers/sprt/static-check-exclude-security-md ci: static-checks: add SECURITY.md to exclude list	2025-08-08 12:50:29 -05:00
Fabiano Fidêncio	dcbdf56281	Merge pull request #11660 from zvonkok/remove-stable ci: Remove stable	2025-08-08 14:18:25 +02:00
Xuewei Niu	1d2f2d6350	Merge pull request #11219 from fidencio/topic/version-qemu-bump-to-10.0.0 version: Bump QEMU to v10.0.0	2025-08-08 19:04:45 +08:00
RuoqingHe	aaf8de3dbf	Merge pull request #11669 from kevinzs2048/add-timeout ci: cri-containerd: add 5s timeout for creating sandbox with crictl	2025-08-08 18:25:58 +08:00
Alex Lyn	9816ffdac7	Merge pull request #11653 from Apokleos/align-initdata-annoation Align initdata annoation with kata-runtime	2025-08-08 16:24:09 +08:00
Kevin Zhao	1aa65167d7	CI: cri-containerd: add 5s timeout for creating sandbox with crictl After moving the Arm64 CI nodes to new ones, we faced a timeout issue when executing the command `crictl runp`. The error is usually: code = DeadlineExceeded Fixes: #11662 Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-08-08 15:41:39 +08:00
Fupan Li	b50777a174	Merge pull request #10580 from pmores/make-vcpu-allocation-more-accurate runtime-rs: make vcpu allocation more accurate	2025-08-08 14:14:40 +08:00
Xuewei Niu	beea0c34c5	Merge pull request #11060 from kata-containers/sprt/vfsd-metadata runtime: virtio-fs: Support "metadata" cache mode	2025-08-08 11:13:57 +08:00
Fabiano Fidêncio	f9e16431c1	version: Bump QEMU to v10.0.3 As the new release of QEMU is out, let's switch to it and take advantage of bug fixes and improvements. QEMU changelog: https://wiki.qemu.org/ChangeLog/10.0 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-08-07 22:31:30 +02:00
Greg Kurz	f9a6359674	Merge pull request #11667 from c3d/bug/11633-qmp qemu: Respect the JSON schema for hot plug	2025-08-07 16:04:12 +02:00
Aurélien Bombo	6d96875d04	runtime: virtio-fs: Support "metadata" cache mode The Rust virtiofsd supports a "metadata" cache mode [1] that wasn't present in the C version [2], so this PR adds support for that. [1] https://gitlab.com/virtio-fs/virtiofsd [2] https://qemu.weilnetz.de/doc/5.1/tools/virtiofsd.html#cmdoption-virtiofsd-cache Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-08-07 21:24:40 +08:00
Pavel Mores	69f21692ed	runtime-rs: enable vcpu allocation tests in CI This series should make runtime-rs's vcpu allocation behaviour match the behaviour of runtime-go so we can now enable pertinent tests which were skipped so far due the difference between both shims. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-08-07 10:32:44 +02:00
Pavel Mores	00bfa3fa02	runtime-rs: re-adjust config after modifying it with annotations Configuration information is adjusted after loading from file but so far, there has been no similar check for configuration coming from annotations. This commit introduces re-adjusting config after annotations have been processed. A small refactor was necessary as a prerequisite which introduces function TomlConfig::adjust_config() to make it easier to invoke the adjustment for a whole TomlConfig instance. This function is analogous to the existing validate() function. The immediate motivation for this change is to make sure that 0 in "default_vcpus" annotation will be properly adjusted to 1 as is the case if 0 is loaded from a config file. This is required to match the golang runtime behaviour. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-08-07 10:32:44 +02:00
Pavel Mores	e2156721fd	runtime-rs: add tests to exercise floating-point 'default_vcpus' Also included (as commented out) is a test that does not pass although it should. See source code comment for explanation why fixing this seems beyond the scope of this PR. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-08-07 10:32:44 +02:00
Pavel Mores	1f95d9401b	runtime-rs: change representation of default_vcpus from i32 to f32 This commit focuses purely on the formal change of type. If any subsequent changes in semantics are needed they are purposely avoided here so that the commit can be reviewed as a 100% formal and 0% semantic change. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-08-07 10:32:44 +02:00
Pavel Mores	cdc0eab8e4	runtime-rs: make sandbox vcpu allocation more accurate This commit addresses a part of the same problem as PR #7623 did for the golang runtime. So far we've been rounding up individual containers' vCPU requests and then summing them up which can lead to allocation of excess vCPUs as described in the mentioned PR's cover letter. We address this by reversing the order of operations, we sum the (possibly fractional) container requests and only then round up the total. We also align runtime-rs's behaviour with runtime-go in that we now include the default vcpu request from the config file ('default_vcpu') in the total. We diverge from PR #7623 in that `default_vcpu` is still treated as an integer (this will be a topic of a separate commit), and that this implementation avoids relying on 32-bit floating point arithmetic as there are some potential problems with using f32. For instance, some numbers commonly used in decimal, notably all of single-decimal-digit numbers 0.1, 0.2 .. 0.9 except 0.5, are periodic in binary and thus fundamentally not representable exactly. Arithmetics performed on such numbers can lead to surprising results, e.g. adding 0.1 ten times gives 1.0000001, not 1, and taking a ceil() results in 2, clearly a wrong answer in vcpu allocation. So instead, we take advantage of the fact that container requests happen to be expressed as a quota/period fraction so we can sum up quotas, fundamentally integral numbers (possibly fractional only due to the need to rewrite them with a common denominator) with much less danger of precision loss. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-08-07 10:32:44 +02:00
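The rounding problem and the integer-based fix described in the commit above can be illustrated with a small Rust sketch. It assumes each container request is a quota/period pair; the names `CpuRequest`, `vcpus_f32`, and `vcpus_exact` are hypothetical and do not come from runtime-rs, and the sketch leaves out the `default_vcpu` contribution mentioned in the message.

```rust
/// CPU request of one container as a quota/period pair.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy)]
pub struct CpuRequest {
    pub quota: u64,
    pub period: u64,
}

fn gcd(a: u64, b: u64) -> u64 {
    if b == 0 { a } else { gcd(b, a % b) }
}

fn lcm(a: u64, b: u64) -> u64 {
    a / gcd(a, b) * b
}

/// Naive approach: accumulate fractional vCPUs in f32, then round up.
/// Accumulated rounding error can push the sum just above an integer.
pub fn vcpus_f32(requests: &[CpuRequest]) -> u32 {
    let mut total = 0.0f32;
    for r in requests {
        if r.period == 0 {
            continue; // skip requests without a valid period
        }
        total += r.quota as f32 / r.period as f32;
    }
    total.ceil() as u32
}

/// Integer approach: rewrite all quotas over a common period,
/// sum them, and round up the total exactly once.
pub fn vcpus_exact(requests: &[CpuRequest]) -> u64 {
    let mut common = 1u64;
    for r in requests {
        if r.period != 0 {
            common = lcm(common, r.period);
        }
    }
    let mut total = 0u64;
    for r in requests {
        if r.period != 0 {
            total += r.quota * (common / r.period);
        }
    }
    // Ceiling division on integers: no floating-point error involved.
    (total + common - 1) / common
}
```

With ten requests of quota 10000 and period 100000 (0.1 vCPU each), `vcpus_f32` returns 2 while `vcpus_exact` returns 1, matching the discrepancy the commit message describes.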
Christophe de Dinechin	ec480dc438	qemu: Respect the JSON schema for hot plug When hot-plugging CPUs on QEMU, we send a QMP command with JSON arguments. QEMU 9.2 recently became more strict[1] enforcing the JSON schema for QMP parameters. As a result, running Kata Containers with QEMU 9.2 results in a message complaining that the core-id parameter is expected to be an integer: ``` qmp hotplug cpu, cpuID=cpu-0 socketID=1, error: QMP command failed: Invalid parameter type for 'core-id', expected: integer ``` Fix that by changing the core-id, socket-id and thread-id to be integer values. [1]: `be93fd5372` Fixes: #11633 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2025-08-07 09:13:57 +02:00
Alex Lyn	37685c41c7	runtime-rs: Correct the coresponding initdata annotation const As we have changed the initdata annotation definition, we also need to correct its const definition with KATA_ANNO_CFG_RUNTIME_INIT_DATA. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-07 10:45:28 +08:00
Alex Lyn	163f04a918	Merge pull request #11651 from microsoft/danmihai1/debug-kubectl-logs tests: k8s-sandbox-vcpus-allocation debug info	2025-08-07 10:27:29 +08:00
Aurélien Bombo	e3b4d87b6d	ci: static-checks: add SECURITY.md to exclude list This adds SECURITY.md to the list of GH-native files that should be excluded by the reference checker. Today this is useful for downstreams who already have a SECURITY.md file for compliance reasons. When Kata onboards that file, this commit will also be required. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-08-06 11:24:52 -05:00
Markus Rudy	3eb0641431	genpolicy: add rule for AddARPNeighbors When the network interface provisioned by the CNI has static ARP table entries, the runtime calls AddARPNeighbor to propagate these to the agent. As of today, these calls are simply rejected. In order to allow the calls, we do some sanity checks on the arguments: We must ensure that we don't unexpectedly route traffic to the host that was not intended to leave the VM. In a first approximation, this applies to loopback IPs and devices. However, there may be other sensitive ranges (for example, VPNs between VMs), so there should be some flexibility for users to restrict this further. This is why we introduce a setting, similar to UpdateRoutes, that allows restricting the neighbor IPs further. The only valid state of an ARP neighbor entry is NUD_PERMANENT, which has a value of 128 [1]. This is already enforced by the runtime. According to rtnetlink(7), valid flag values are 8 and 128, respectively [2], thus we allow any combination of these. [1]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L72 [2]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L49C20-L53 Fixes: #11664 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-08-06 17:24:36 +02:00
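The sanity checks listed in the commit above can be sketched in Rust as follows. This is an illustration of the described logic only, not the policy rule itself: the `Neighbor` struct, the `neighbor_allowed` function, and the check against a device named `lo` are assumptions made for the example, and the user-configurable IP restriction is omitted.

```rust
use std::net::IpAddr;

/// Only valid state for a static ARP entry, per the commit message.
const NUD_PERMANENT: u32 = 128;
/// Flag values 8 and 128 are accepted in any combination.
const ALLOWED_FLAGS: u32 = 8 | 128;

/// Hypothetical, simplified view of an AddARPNeighbors entry.
pub struct Neighbor {
    pub ip: IpAddr,
    pub device: String,
    pub state: u32,
    pub flags: u32,
}

/// Returns true if the entry passes the basic sanity checks.
pub fn neighbor_allowed(n: &Neighbor) -> bool {
    // Reject loopback IPs and the loopback device so traffic is not
    // unexpectedly routed somewhere it was never meant to go.
    // ASSUMPTION: the loopback device is identified by the name "lo".
    if n.ip.is_loopback() || n.device == "lo" {
        return false;
    }
    // The state must be exactly NUD_PERMANENT.
    if n.state != NUD_PERMANENT {
        return false;
    }
    // No flag bits outside the allowed set may be present.
    n.flags & !ALLOWED_FLAGS == 0
}
```

Masking with `!ALLOWED_FLAGS` accepts 0, 8, 128, and 136 while rejecting any value that carries another bit.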
Zvonko Kaiser	1b1b3af9ab	ci: Remove trigger for stable branch We do not support stable branches anymore, remove the trigger for it. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-08-06 09:22:24 +08:00
Hyounggyu Choi	af01434226	Merge pull request #11646 from kata-containers/sprt/param-static-checks ci: static-checks: Auto-detect repo by default	2025-08-05 22:13:20 +02:00
Alex Lyn	ede773db17	kata-types: Align the initdata annotation with kata-runtime's definition To make it work within CI, we do alignment with kata-runtime's definition with "io.katacontainers.config.runtime.cc_init_data". Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-08-03 22:51:39 +08:00
Dan Mihai	05eca5ca25	tests: k8s-sandbox-vcpus-allocation debug info Print more details about the behavior of "kubectl logs", trying to understand errors like: https://github.com/kata-containers/kata-containers/actions/runs/16662887973/job/47164791712 not ok 1 Check the number vcpus are correctly allocated to the sandbox (in test file k8s-sandbox-vcpus-allocation.bats, line 37) `[ `kubectl logs ${pods[$i]}` -eq ${expected_vcpus[$i]} ]' failed with status 2 No resources found in kata-containers-k8s-tests namespace. ... k8s-sandbox-vcpus-allocation.bats: line 37: [: -eq: unary operator expected Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-08-01 20:09:17 +00:00
Aurélien Bombo	c47bff6d6a	Merge pull request #11637 from kata-containers/sprt/remove-install-az-cli gha: Remove unnecessary install-azure-cli step	2025-08-01 09:34:46 -05:00
Fabiano Fidêncio	82f141a02e	Merge pull request #11632 from burgerdev/codegen runtime: reproducible generation of Golang proto bindings	2025-07-31 23:49:18 +02:00
Fabiano Fidêncio	7198c8789e	Merge pull request #11639 from zvonkok/gpu_guest_components gpu: guest components	2025-07-31 21:42:31 +02:00
Aurélien Bombo	9585e608e5	ci: static-checks: Auto-detect repo by default This auto-detects the repo by default (instead of having to specify KATA_DEV_MODE=true) so that forked repos can leverage the static-checks.yaml CI check without modification. An alternative would have been to pass the repo in static-checks.yaml. However, because of the matrix, this would've changed the check name, which is a pain to handle in either the gatekeeper/GH UI. Example fork failure: https://github.com/microsoft/kata-containers/actions/runs/16656407213/job/47142421739#step:8:75 I've tested this change to work in a fork. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-07-31 14:33:24 -05:00
Zvonko Kaiser	8422411d91	gpu: Add coco guest components The second stage needs to consider the coco guest components Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-31 17:11:21 +00:00
Markus Rudy	3fd354b991	ci: add codegen to static-checks Signed-off-by: Markus Rudy <mr@edgeless.systems> Fixes: #11631 Co-authored-by: Steve Horsman <steven@uk.ibm.com>	2025-07-31 17:58:25 +01:00
Markus Rudy	9e38fd2562	tools: add image for Go proto bindings In order to have a reproducible code generation process, we need to pin the versions of the tools used. This is accomplished easiest by generating inside a container. This commit adds a container image definition with fixed dependencies for Golang proto/ttrpc code generation, and changes the agent Makefile to invoke the update-generated-proto.sh script from within that container. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-07-31 17:58:25 +01:00
Markus Rudy	f7a36df290	runtime: generate proto files The generated Go bindings for the agent are out of date. This commit was produced by running src/agent/src/libs/protocols/hack/update-generated-proto.sh with protobuf compiler versions matching those of the last run, according to the generated code comments. Since there are new RPC methods, those needed to be added to the HybridVSockTTRPCMockImp. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-07-31 17:58:25 +01:00
Fabiano Fidêncio	d077ed4c1e	Merge pull request #11645 from kata-containers/topic/fix-kbuild-sign-pin-issue build: nvidia: Fix KBUILD_SIGN_PIN breakage	2025-07-31 18:31:34 +02:00
Fabiano Fidêncio	8d30b84abd	build: nvidia: Fix KBUILD_SIGN_PIN breakage We only need KBUILD_SIGN_PIN exported when building nvidia related artefacts. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-31 16:39:20 +02:00
Fabiano Fidêncio	20bef41347	Merge pull request #11236 from kata-containers/amd64-nvidia-gpu-cicd gpu: AMD64 NVIDIA GPU CI/CD	2025-07-31 14:52:01 +02:00
Aurélien Bombo	96f1d95de5	gha: Remove unnecessary install-azure-cli step az cli is already installed by the azure/login action. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-07-30 10:42:56 -05:00
Zvonko Kaiser	fbb0e7f2f2	gpu: Add secrets passthrough to the workflow We need to pass-through the secrets in all the needed workflows ci, ci-on-push, ci-nightly, ci-devel Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:51:01 +00:00
Zvonko Kaiser	30778594d0	gpu: Add arm64-nvidia-a100 to actionlint.yaml Make zizmor happy about our custom runner label Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	8768e08258	gpu: Add embedding service For a simple RAG pipeline add an embedding service Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	254dbd9b45	gpu: Add Pod spec for NIM llama Pod spec for the NIM inferencing service Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	568b13400a	gpu: Add NIM bats test We're running a simple NIM container to test if the GPUs are working properly Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	6188b7f79f	gpu: Add run_kubernetes_nv_tests.sh Replicate what we have for run_tests and run .bats files Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	9a829107ba	gpu: Add selector for k8s tests We want to reuse the current run_tests with GPUs, introduce a var that will define what to run. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	7669f1fbd1	gpu: Add NVIDIA GPU test block for amd64 Once we have the amd64 artifacts we can run some arm64 k8s tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:59 +00:00
Zvonko Kaiser	97d7575d41	gpu: Disable metrics tests We are not running the metrics tests anyway for now, so let's make room to run the GPU tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-30 13:45:58 +00:00
Anastassios Nanos	00e0db99a3	Merge pull request #11627 from itsmohitnarayan/FirecrackerVersionUpdate	2025-07-30 13:59:55 +03:00
Kumar Mohit	5cccbb9f41	versions: Upgrade Firecracker Version to 1.12.1 Updated versions.yaml to use Firecracker v1.12.1. Replaced firecracker and jailer binaries under /opt/kata/bin. Tested with kata-fc runtime on Kubernetes: - Deployed pods using gitpod/openvscode-server - Verified microVM startup, container access, and Firecracker usage - Confirmed Firecracker and jailer versions via CLI Signed-off-by: Kumar Mohit <68772712+itsmohitnarayan@users.noreply.github.com>	2025-07-30 12:51:08 +05:30
Saul Paredes	1aaaef2134	Merge pull request #11553 from microsoft/danmihai1/genpolicy-cleanup genpolicy: reduce complexity	2025-07-28 14:32:59 -07:00
Dan Mihai	c11c972465	genpolicy: config layer logging clean-up Use a simple debug!() for logging the config_layer string, instead of transcoding, etc. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-07-28 18:30:13 +00:00
Dan Mihai	30bfa2dfcc	genpolicy: use CoCo settings by default - "confidential_emptyDir" becomes "emptyDir" in the settings file. - "confidential_configMap" becomes "configMap" in settings. - "mount_source_cpath" becomes "cpath". - The new "root_path" gets used instead of the old "cpath" to point to the container root path. - "confidential_guest" is no longer used. By default it gets replaced by "enable_configmap_secret_storages"=false, because CoCo is using CopyFileRequest instead of the Storage data structures for ConfigMap and/or Secret volume mounts during CreateContainerRequest. - The value of "guest_pull" becomes true by default. - "image_layer_verification" is no longer used - just CoCo's guest pull is supported. - The Request input files from unit tests are changing to reflect the new default settings values described above. - tests/integration/kubernetes/tests_common.sh adjusts the settings for platforms that are not set-up for CoCo during CI (i.e., platforms other than SNP, TDX, and CoCo Dev). Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-07-28 18:30:13 +00:00
Dan Mihai	94995d7102	genpolicy: skip pulling layers for guest-pull Skip pulling container image layers when guest-pull=true. The contents of these layers were ignored due to: - #11162, and - tarfs snapshotter support having been removed from genpolicy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-07-28 18:30:13 +00:00
Dan Mihai	f6016f4f36	genpolicy: remove tarfs snapshotter support AKS Confidential Containers are using the tarfs snapshotter. CoCo upstream doesn't use this snapshotter, so remove this Policy complexity from upstream. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-07-28 18:30:10 +00:00
Steve Horsman	077c59dd1f	Merge pull request #11385 from wainersm/ci_make_coco_nontee_required ci/gatekeeper: make run-k8s-tests-coco-nontee job required	2025-07-28 14:16:23 +01:00
Steve Horsman	74fba9c736	Merge pull request #11619 from kata-containers/install-dependencies-gh-cli ci: Try passing api token into github api call	2025-07-28 13:35:12 +01:00
Xuewei Niu	2a3c8b04df	Merge pull request #11613 from RuoqingHe/clippy-fix-for-libs-20250721 mem-agent: Ignore Cargo.lock	2025-07-28 17:45:29 +08:00
RuoqingHe	3f46347dc5	Merge pull request #11618 from RuoqingHe/fix-dragonball-default-build dragonball: Fix warnings in default build	2025-07-28 11:24:46 +08:00
Xuewei Niu	e5d5768c75	Merge pull request #11626 from RuoqingHe/bump-cloud-hypervisor-v47 versions: Upgrade to Cloud Hypervisor v47.0	2025-07-28 10:34:45 +08:00
Ruoqing He	4ca6c2d917	mem-agent: Ignore Cargo.lock `mem-agent` here is now a library and does not contain examples, so ignore Cargo.lock to get rid of untracked file noise produced by `cargo run` or `cargo test`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-28 10:32:46 +08:00
Ruoqing He	3ec10b3721	runtime: clh: Re-generate client code against v47.0 Re-generates the client code against Cloud Hypervisor v47.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 20:44:14 +02:00
Ruoqing He	14e9d2c815	versions: Upgrade to Cloud Hypervisor v47.0 Details of v47.0 release can be found in our roadmap project as iteration v47.0: https://github.com/orgs/cloud-hypervisor/projects/6. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 20:42:24 +02:00
Xuewei Niu	6f6d64604f	Merge pull request #11598 from justxuewei/cgroups	2025-07-25 17:53:03 +08:00
Hyounggyu Choi	860779c4d9	Merge pull request #11621 from Apokleos/enhance-copyfile runtime-rs: Some extra work to enhance copyfile with sharedfs disabled	2025-07-25 11:27:03 +02:00
Ruoqing He	639273366a	dragonball: Gate `MmapRegion` behind `virtio-fs` `MmapRegion` is only used while `virtio-fs` is enabled during testing dragonball, gate the import behind `virtio-fs` feature. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 09:09:35 +00:00
Ruoqing He	2e81ac463a	dragonball: Allow unused to suppress warnings Some variables go unused if certain features are not enabled, so use `#[allow(unused)]` to suppress those warnings for the time being. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 09:07:19 +00:00
Ruoqing He	5f7da1ccaa	dragonball: Silence never read fields Some fields in structures used for testing purpose are never read, rename to send out the message. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 09:07:19 +00:00
Ruoqing He	225e6fffbc	dragonball: Gate `VcpuManagerError` behind `host-device` `VcpuManagerError` is only needed when `host-device` feature is enabled, gate the import behind that feature. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 09:07:19 +00:00
Ruoqing He	0502b05718	dragonball: Remove `with-serde` feature assertion Code inside `test_mac_addr_serialization_and_deserialization` test does not actually require this `with-serde` feature to test, removing the assertion here to enable this test. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-25 09:05:55 +00:00
Xuewei Niu	60e3679eb7	runtime-rs: Add full cgroups support on host Add full cgroups support on host. Cgroups are managed by `FsManager` and `SystemdManager`. As the names imply, the `FsManager` manages cgroups through cgroupfs, while the `SystemdManager` manages cgroups through systemd. The two managers support cgroup v1 and cgroup v2. Two types of cgroups path are supported: 1. For colon paths, for example "foo.slice:bar:baz", the runtime manages cgroups by `SystemdManager`; 2. For relative/absolute paths, the runtime manages cgroups by `FsManager`. vCPU threads are added into the sandbox cgroups with cgroup v1 + cgroupfs; with the other combinations (cgroup v1 + systemd, cgroup v2 + cgroupfs, cgroup v2 + systemd), the VMM process is added into the cgroups. Systemd doesn't provide a way to add a thread to a unit, so `add_thread()` in `SystemdManager` is equivalent to `add_process()`. Cgroup v2 supports threaded mode. However, we should enable threaded mode from leaf node to the root node (`/`) iteratively [1]. This means the runtime needs to modify the cgroups created by container runtime (e.g. containerd). Considering cgroupfs + cgroup v2 is not a common combination, its behavior is aligned with systemd + cgroup v2, which is not allowed to manage process at the thread level. 1: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#threads Fixes: #11356 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-25 14:52:55 +08:00
alex.lyn	613dba6f1f	runtime-rs: Some extra work to enhance copyfile with sharedfs disabled For several reasons, copyfile should first be aligned with runtime-go; this commit does that work. Fixes #11543 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-25 11:39:20 +08:00
Xuewei Niu	6aa3517393	tests: Prevent the shim from being killed in k8s-oom test The actual memory usage on the host is equal to the hypervisor memory usage plus the user memory usage. An OOM killer might kill the shim when the memory limit on host is same with that of container and the container consumes all available memory. In this case, the containerd will never receive OOM event, but get "task exit" event. That makes the `k8s-oom.bats` test fail. The fix is to add a new container to increase the sandbox memory limit. When the container "oom-test" is killed by OOM killer, there is still available memory for the shim, so it will not be killed. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-24 23:44:21 +08:00
Steve Horsman	c762a3dd4f	Merge pull request #11372 from kata-containers/dependabot/cargo/src/dragonball/openssl-af8515b6e0 build(deps): bump the openssl group across 4 directories with 1 update	2025-07-24 13:27:24 +01:00
Fupan Li	fdbe549368	Merge pull request #11547 from Apokleos/virtio-scsi runtime-rs: support block device driver virtio-scsi within qemu-rs	2025-07-24 18:02:11 +08:00
Xuewei Niu	635272f3e8	runtime-rs: Ignore SIGTERM signal in shim When enabling systemd cgroup driver and sandbox cgroup only, the shim is under a systemd unit. When the unit is stopping, systemd sends SIGTERM to the shim. The shim can't exit immediately, as there are some cleanups to do. Therefore, ignoring SIGTERM is required here. The shim should complete the work within a period (Kata sets it to 300s by default). Once a timeout occurs, systemd will send SIGKILL. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-24 17:15:15 +08:00
Xuewei Niu	79f29bc523	runtime-rs: QEMU get_thread_ids() returns real vCPU's tids The information is obtained through QMP query_cpus_fast. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-24 17:15:15 +08:00
stevenhorsman	475baf95ad	ci: Try passing api token into github api call Our CI keeps on getting ``` jq: error (at <stdin>:1): Cannot index string with string "tag_name" ``` during the install dependencies phase, which I suspect might be due to github rate limits being reduced, so try to pass through the `GH_TOKEN` env and use it in the auth header. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-24 08:49:32 +01:00
alex.lyn	b40d65bc1b	runtime-rs: support block device driver virtio-scsi within qemu-rs It is important that we continue to support VirtIO-SCSI. While VirtIO-BLK is a common choice, virtio-scsi offers significant performance advantages in specific scenarios, particularly when utilizing iothreads and with NVMe Fabrics. By supporting both virtio-blk and virtio-scsi, we maintain flexibility and choice: users can pick the optimal storage interface (virtio-blk or virtio-scsi) based on their specific workload requirements and hardware configurations. The virtio-scsi controller has already been created when the qemu vm starts with the block device driver set to `virtio-scsi`. This commit uses blockdev_add for the backend block device and device_add for the frontend virtio-scsi device via qmp. Fixes #11516 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 14:00:02 +08:00
alex.lyn	e683a7fd37	runtime-rs: Change the device_id with block device index The block device index is a very important unique id of a block device and identifies it just as device_id does. Because the index is required to calculate the scsi LUN, and to reduce useless arguments when reusing `hotplug_block_device`, we'd better replace the device_id with the block device index. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	4521cae0c0	runtime-rs: Support AIO for hotplugging block device within qemu In this commit, block device aio is introduced within hotplug_block_device within qemu via qmp, and "iouring" is set as the default. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	b4d276bc2b	runtime-rs: Handle virtio-scsi within device manager The device manager should handle this correctly in create_block_device when the driver_option is virtio-scsi. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	fbd84fd3f4	runtime-rs: Support virtio-scsi device within handle_block_volume It supports handling scsi device when block device driver is `scsi`. And it will ensure a correct storage source with LUN. Fixes #11516 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	57645c0786	runtime-rs: Add support for block device AIO In this commit, three block device aio modes are introduced and "iouring" is set as the default. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	40e6aacc34	runtime-rs: Introduce scsi_addr within BlockConfig for SCSI devices It's used to help discover scsi devices inside guest and also add a new const value `KATA_SCSI_DEV_TYPE` to help pass information. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:57:00 +08:00
alex.lyn	125383e53c	runtime-rs: Add support for configurable block device aio AIO is the I/O mechanism used by qemu with options: - threads Pthread based disk I/O. - native Native Linux I/O. - io_uring (default mode) Linux io_uring API. This provides the fastest I/O operations on Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-24 11:56:52 +08:00
dependabot[bot]	ef9d960763	build(deps): bump the openssl group across 4 directories with 1 update Bumps the openssl group with 1 update in the /src/dragonball directory: [openssl](https://github.com/sfackler/rust-openssl). Bumps the openssl group with 1 update in the /src/runtime-rs directory: [openssl](https://github.com/sfackler/rust-openssl). Bumps the openssl group with 1 update in the /src/tools/genpolicy directory: [openssl](https://github.com/sfackler/rust-openssl). Bumps the openssl group with 1 update in the /src/tools/kata-ctl directory: [openssl](https://github.com/sfackler/rust-openssl). Updates `openssl` from 0.10.72 to 0.10.73 - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.72...openssl-v0.10.73) Updates `openssl` from 0.10.72 to 0.10.73 - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.72...openssl-v0.10.73) Updates `openssl` from 0.10.72 to 0.10.73 - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.72...openssl-v0.10.73) Updates `openssl` from 0.10.72 to 0.10.73 - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.72...openssl-v0.10.73) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.73 dependency-type: indirect update-type: version-update:semver-patch dependency-group: openssl - dependency-name: openssl dependency-version: 0.10.73 dependency-type: indirect update-type: version-update:semver-patch dependency-group: openssl - dependency-name: openssl dependency-version: 0.10.73 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: openssl - dependency-name: openssl dependency-version: 0.10.73 dependency-type: 
indirect update-type: version-update:semver-patch dependency-group: openssl ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-23 15:17:12 +00:00
Fabiano Fidêncio	58925714d2	Merge pull request #11579 from Apokleos/fix-hotplug-blk runtime-rs: Support hotplugging host block devices within qemu-rs	2025-07-23 11:10:04 +02:00
alex.lyn	a12ae58431	runtime-rs: Support hotplugging host block devices within qemu-rs Although the previous implementation of hotplugging block devices via QMP can successfully hot-plug a regular file based block device, it fails when the backend is /dev/xxx (e.g. /dev/loop0). Analysis shows that it lacks the ability to hotplug host block devices. This commit fills the gap and makes it work for host block devices. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-22 15:40:03 +08:00
Fabiano Fidêncio	acae4480ac	Merge pull request #11604 from fidencio/release/3.19.1 release: Bump version to 3.19.1	2025-07-22 09:00:15 +02:00
Fabiano Fidêncio	0220b4d661	release: Bump version to 3.19.1 As there were a few moderate security vulnerability fixes missed as part of the 3.19.0 release. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-21 20:09:21 +02:00
Steve Horsman	09efcfbd86	Merge pull request #11606 from kata-containers/dependabot/cargo/src/tools/genpolicy/zerocopy-0.6.6 build(deps): bump zerocopy from 0.6.1 to 0.6.6 in /src/tools/genpolicy	2025-07-21 18:58:56 +01:00
Steve Horsman	9f04d8e121	Merge pull request #11605 from kata-containers/dependabot/cargo/src/tools/kata-ctl/unsafe-libyaml-0.2.11 build(deps): bump unsafe-libyaml from 0.2.9 to 0.2.11 in /src/tools/kata-ctl	2025-07-21 18:50:01 +01:00
dependabot[bot]	a9c8377073	build(deps): bump zerocopy from 0.6.1 to 0.6.6 in /src/tools/genpolicy --- updated-dependencies: - dependency-name: zerocopy dependency-version: 0.6.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-21 12:50:38 +00:00
dependabot[bot]	0b4c434ece	build(deps): bump unsafe-libyaml in /src/tools/kata-ctl Bumps [unsafe-libyaml](https://github.com/dtolnay/unsafe-libyaml) from 0.2.9 to 0.2.11. - [Release notes](https://github.com/dtolnay/unsafe-libyaml/releases) - [Commits](https://github.com/dtolnay/unsafe-libyaml/compare/0.2.9...0.2.11) --- updated-dependencies: - dependency-name: unsafe-libyaml dependency-version: 0.2.11 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-21 12:46:27 +00:00
Fabiano Fidêncio	35629d0690	Merge pull request #11603 from stevenhorsman/security-updates-21-jul dependencies: More crate bumps to resolve security issues	2025-07-21 14:33:07 +02:00
stevenhorsman	162ba19b85	agent-ctl: Bump rustls Bump rustls to >=0.23.18 to remediate RUSTSEC-2024-0399 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-21 10:41:59 +01:00
stevenhorsman	42339e9cdf	dragonball: Update url crate Update url to 2.5.4 to bump idna to 1.0.3 and remediate RUSTSEC-2024-0421 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-21 10:35:05 +01:00
stevenhorsman	1795361589	runk: Update rustjail Update the rustjail crate to pull in the latest security fixes Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-21 10:31:18 +01:00
stevenhorsman	28929f5b3e	runtime: Bump prometheus Bump this crate to remove the old version of protobuf and remediate RUSTSEC-2024-0437 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-21 10:29:57 +01:00
stevenhorsman	e66aa1ef8c	runtime: Bump prometheus and ttrpc-codegen Bump these crates to remove the old version of protobuf and remediate RUSTSEC-2024-0437 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-21 10:29:39 +01:00
Fabiano Fidêncio	d60513ece9	Merge pull request #11597 from kata-containers/topic/fix-release-static-tarball-content release: Copy the VERSION file to the tarball	2025-07-20 21:06:40 +02:00
Fabiano Fidêncio	55aae75ed7	shellcheck: Fix issues on kata-deploy-merge-builds.sh As we're already touching the file, let's get those fixed. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-20 09:33:50 +02:00
Fabiano Fidêncio	aaeb3b3221	release: Copy the VERSION file to the tarball For the release itself, let's simply copy the VERSION file to the tarball. To do so, we had to change the logic that merges the build, as at that point the tag is not yet pushed to the repo. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-20 00:06:14 +02:00
Fabiano Fidêncio	21ccaf4a80	Merge pull request #11596 from fidencio/release/v3.19.0 release: Bump version to 3.19.0	2025-07-19 18:27:36 +02:00
Fabiano Fidêncio	60f312b4ae	release: Bump version to 3.19.0 Bump VERSION and helm-chart versions Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-19 09:11:30 +02:00
Fabiano Fidêncio	1351ccb2de	Merge pull request #11576 from Tim-Zhang/update-protobuf-to-fix-CVE-2025-53605 chore: Update protobuf to fix CVE-2025-53605	2025-07-19 07:43:13 +02:00
Fabiano Fidêncio	7f5f032aca	runtime-rs: Update containerd-shim / containerd-shim-protos Let's bump those to their 0.10.0 releases, which contain fixes for the CVE-2025-53605. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-19 00:18:01 +02:00
Fabiano Fidêncio	6dc4c0faae	Merge pull request #11589 from fidencio/topic/fix-tdx-qemu-path-for-non-gpu qemu: tdx: Fix binary path for non-gpu TDX	2025-07-18 17:24:00 +02:00
Tim Zhang	2fe9df16cc	agent-ctl: update Cargo.lock to fix CVE-2025-53605 Fixes: https://github.com/kata-containers/kata-containers/security/dependabot/392 Fixes: #11570 Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 16:13:25 +02:00
Tim Zhang	45b44742de	genpolicy: update Cargo.lock to fix CVE-2025-53605 Fixes: https://github.com/kata-containers/kata-containers/security/dependabot/394 Fixes: #11570 Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 16:10:52 +02:00
Tim Zhang	fa9ff1b299	kata-ctl: update prometheus/protobuf to fix CVE-2025-53605 Fixes: https://github.com/kata-containers/kata-containers/security/dependabot/395 Fixes: #11570 Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 16:05:13 +02:00
Tim Zhang	d0e7a51f7b	dragonball: update prometheus/protobuf to fix CVE-2025-53605 Fixes: https://github.com/kata-containers/kata-containers/security/dependabot/396 Fixes: #11570 Signed-off-by: Tim Zhang <tim@hyper.sh>	2025-07-18 16:02:29 +02:00
Tim Zhang	222393375a	agent: update ttrpc-codegen to remove dependency on protobuf v2 To fix CVE-2025-53605. Fixes: https://github.com/kata-containers/kata-containers/security/dependabot/397 Fixes: #11570 Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 16:02:07 +02:00
Fabiano Fidêncio	60c3d89767	Merge pull request #11558 from gmintoco/feature/helm-nodeSelector helm: add nodeSelector support to kata-deploy chart	2025-07-18 15:52:19 +02:00
Fabiano Fidêncio	3143787f69	qemu: tdx: Fix binary path for non-gpu TDX On commit `90bc749a19`, we've changed the QEMUTDXPATH in order to get it to work with GPUs, but the change broke the non-GPU TDX use-case, which depends on the distro binary. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 15:26:27 +02:00
Fabiano Fidêncio	497a3620c2	tests: Remove references to qemu-sev As it's been removed from our codebase. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 12:49:54 +02:00
Fabiano Fidêncio	17ce44083c	runtime: Remove reference to sev package Otherwise it'll just break static checks. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-18 12:49:54 +02:00
Gus Minto-Cowcher	3b5cd2aad6	helm: remove qemu-sev references qemu-sev support has been removed, but those bits were left behind by mistake. Signed-off-by: Gus Minto-Cowcher <gus@basecamp-research.com>	2025-07-18 12:49:54 +02:00
Gus Minto-Cowcher	41d41d51f7	helm: add nodeSelector support to kata-deploy chart - Add nodeSelector configuration to values.yaml with empty default - Update DaemonSet template to conditionally include nodeSelector - Add documentation and examples for nodeSelector usage in README - Allows users to restrict kata-containers deployment to specific nodes by labeling them Signed-off-by: Gus Minto-Cowcher <gus@basecamp-research.com>	2025-07-18 12:49:54 +02:00
Fabiano Fidêncio	7d709a0759	Merge pull request #11493 from stevenhorsman/agent-ctl-tag-cache ci: cache: Tag agent-ctl cache	2025-07-18 12:12:46 +02:00
Fabiano Fidêncio	4a6c718f23	Merge pull request #11584 from zvonkok/fix-kernel-debug-enabled kernel: fix enable kernel debug	2025-07-18 11:38:36 +02:00
Sumedh Alok Sharma	47184e82f5	Merge pull request #11313 from Ankita13-code/ankitapareek/exec-id-agent-fix agent: update the processes hashmap to use exec_id as primary key	2025-07-18 14:07:15 +05:30
Fabiano Fidêncio	d9daddce28	Merge pull request #11578 from justxuewei/vsock-async runtime-rs: Fix the issue of blocking socket with Tokio	2025-07-18 10:13:03 +02:00
Xuewei Niu	629c942d4b	runtime-rs: Fix the issue of blocking socket with Tokio According to the issue [1], Tokio will panic when we are giving a blocking socket to Tokio's `from_std()` method, the information is as follows: ``` A panic occurred at crates/agent/src/sock/vsock.rs:59: Registering a blocking socket with the tokio runtime is unsupported. If you wish to do anyways, please add `--cfg tokio_allow_from_blocking_fd` to your RUSTFLAGS. ``` A workaround is to set the socket to non-blocking. 1: https://github.com/tokio-rs/tokio/issues/7172 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-18 10:55:48 +08:00
Xuewei Niu	1508e6f0f5	agent: Bump Tokio to v1.46.1 Tokio now has a newer version, let us bump it. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-18 10:55:48 +08:00
Xuewei Niu	5a4050660a	runtime-rs: Bump Tokio to v1.46.1 Tokio now has a newer version, let us bump it. Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-07-18 10:55:48 +08:00
Zvonko Kaiser	a786dc48b0	kernel: fix enable kernel debug The KERNEL_DEBUG_ENABLED was missing in the outer shell script so overrides via make were not possible. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-18 02:24:19 +00:00
Fabiano Fidêncio	eb2bfbf7ac	Merge pull request #11572 from stevenhorsman/RUSTSEC-2024-0384-remediate More crate bumps for security remediations	2025-07-17 22:35:05 +02:00
Zvonko Kaiser	cef9485634	Merge pull request #11450 from kata-containers/dependabot/cargo/src/agent/nix-0.27.1 build(deps): bump nix to 0.26.4 in agent, libs, runtime-rs	2025-07-17 14:22:40 -04:00
stevenhorsman	41a608e5ce	tools: Bump borsh, liboci-cli and oci-spec Bump these crates to remove the unmaintained dependency proc-macro-error and remediate RUSTSEC-2024-0370 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 18:23:19 +01:00
stevenhorsman	e56f493191	deps: Bump zbus, serial_test & async-std Bump these crates across various components to remove the dependency on unmaintained instant crate and remediate RUSTSEC-2024-0384 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 18:23:19 +01:00
stevenhorsman	bb820714cb	agent-ctl: Update borsh - Update borsh to remove the unmaintained dependency proc-macro-error and remediate RUSTSEC-2024-0370 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 18:23:19 +01:00
Steve Horsman	549fd2a196	Merge pull request #11581 from stevenhorsman/osv-scanner-action-permissions-fix workflow: Fix osv-scanner action	2025-07-17 18:18:16 +01:00
stevenhorsman	a7e27b9b68	workflow: Fix osv-scanner action - The github generated template had an old version which isn't valid for the pr-scan, so update to the latest - The action needs also `actions: read` and `contents:read` to run in kata-containers Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 17:29:35 +01:00
Steve Horsman	8741f2ab3d	Merge pull request #11580 from kata-containers/osv-scanner-action workflow: Add osv-scanner action	2025-07-17 17:00:34 +01:00
stevenhorsman	1a75c12651	workflow: Add osv-scanner action Add action to check for vulnerabilities in the project and on each PR Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 16:41:56 +01:00
stevenhorsman	4c776167e5	trace-forwarder: Add nix features Some of the nix apis we are using are now enabled by features, so add these to resolve the compilation issues Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 15:09:21 +01:00
dependabot[bot]	cd79108c77	build(deps): bump nix in /src/tools/trace-forwarder Bumps [nix](https://github.com/nix-rust/nix) from 0.23.1 to 0.30.1. - [Changelog](https://github.com/nix-rust/nix/blob/master/CHANGELOG.md) - [Commits](https://github.com/nix-rust/nix/compare/v0.23.1...v0.30.1) --- updated-dependencies: - dependency-name: nix dependency-version: 0.30.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-17 15:09:06 +01:00
stevenhorsman	9185ef1a67	runtime-rs: Bump nix to matching version runtime-rs needs the same version as libs, so sync this up as well. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 15:08:46 +01:00
dependabot[bot]	219ad505c2	build(deps): bump nix from 0.24.3 to 0.26.4 in /src/agent Nix needs to be in sync between libs and agent, so bump the agent to the libs version Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-17 15:01:06 +01:00
dependabot[bot]	a4d22fe330	build(deps): bump nix from 0.24.2 to 0.26.4 in /src/libs --- updated-dependencies: - dependency-name: nix dependency-version: 0.26.4 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>	2025-07-17 15:01:06 +01:00
Fabiano Fidêncio	6dabb3683f	Merge pull request #10961 from zvonkok/shellcheck-zero shellcheck: fix kernel/build.sh	2025-07-17 12:59:00 +02:00
Steve Horsman	405f5283f0	Merge pull request #11573 from arvindskumar99/versions_comment OVMF: Making comment in versions.yaml for SEV-SNP	2025-07-17 10:11:58 +01:00
Fabiano Fidêncio	32d40849fa	Merge pull request #11577 from Xynnn007/bump-gc deps(chore): bump guest-components to candidate v0.14.0	2025-07-17 11:08:36 +02:00
Zvonko Kaiser	ca4f96ed00	shellcheck: fix kernel/build.sh Refactor code to make shellcheck happy Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-17 10:15:41 +02:00
Xynnn007	82b890349d	deps(chore): bump guest-components to candidate v0.14.0 This new version of gc fixes s390x attestation, also introduces registry configuration setting directly via initdata. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-07-17 10:19:02 +08:00
stevenhorsman	51f41b1669	ci: cache: Tag agent-ctl cache The peer pods project is using the agent-ctl tool in some tests, so tagging our cache will let them more easily identify development versions of kata for testing between releases. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-16 11:32:33 +01:00
Fupan Li	75d23b8884	Merge pull request #11504 from lifupan/fix_fd_leak agent: fix the issue of parent writer pipe fd leak	2025-07-16 18:29:24 +08:00
Fupan Li	83f54eec52	agent: fix the issue of parent writer pipe fd leak Sometimes, containers or execs do not use stdin, so there is no chance to add parent stdin to the process's writer hashmap, resulting in the parent stdin's fd not being closed when the process is cleaned up later. Therefore, when creating a process, first explicitly add parent stdin to the writer hashmap. Make sure that the parent stdin's fd can be closed when the process is cleaned up later. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-16 16:15:31 +08:00
Fupan Li	752c8b611e	Merge pull request #11575 from Tim-Zhang/fix-runk-build runk: Fix build errors	2025-07-16 15:23:58 +08:00
Arvind Kumar	2a52351822	OVMF: Making comment in versions.yaml for SEV-SNP Adding comment to versions.yaml to indicate that the ovmf-sev is also used by AMD SEV-SNP, as per the discussion in https://github.com/kata-containers/kata-containers/pull/11561. Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-16 06:35:21 +02:00
Tim Zhang	c8183a2c14	runk: rename imported crate from users to uzers To adapt to the new crate name and fix build errors introduced in commit `39f51b4c6d` Fixes: #11574 Signed-off-by: Tim Zhang <tim@hyper.sh>	2025-07-16 11:35:39 +08:00
Fabiano Fidêncio	9cebbab29d	Merge pull request #11335 from zvonkok/fix-kata-deploy.sh gpu: Fix kata deploy.sh	2025-07-15 19:50:44 +02:00
Fabiano Fidêncio	c8b7a51d72	Merge pull request #11082 from zvonkok/debug-kernel kernel: debug config	2025-07-15 19:04:15 +02:00
Zvonko Kaiser	c56c896fc6	qemu: remove the experimental suffix for qemu-snp We switched to vanilla QEMU for the CPU SNP use-case. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-15 16:49:58 +02:00
Zvonko Kaiser	a282fa6865	gpu: Add TDX related runtime adjustments We have the QEMU adjustments for SNP but are missing those for TDX Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-15 16:49:56 +02:00
Zvonko Kaiser	0d2993dcfd	kernel: bump kernel version Obligatory kernel version bump Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-15 16:48:23 +02:00
Zvonko Kaiser	a4597672c0	kernel: Add KERNEL_DEBUG_ENABLED to build scripts We want to be able to build a debug version of the kernel for various use-cases like debugging, tracing and others. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-15 16:48:03 +02:00
Fabiano Fidêncio	b7af7f344b	Merge pull request #11569 from Xynnn007/bump-coco deps(chore): update guest-components and trustee	2025-07-15 16:34:23 +02:00
Fabiano Fidêncio	aac555eeff	Merge pull request #11571 from fidencio/topic/fix-nvidia-gpu-initrd-cache build: Fix cache for nvidia-gpu-initrd builds	2025-07-15 16:28:03 +02:00
Fabiano Fidêncio	4415a47fff	Merge pull request #11557 from Apokleos/fix-initdata runtime-rs: Fix initdata length field missing when create block	2025-07-15 16:22:45 +02:00
Fabiano Fidêncio	11c744c5c3	Merge pull request #11567 from zvonkok/remove-gpu-admin-tools Remove gpu admin tools	2025-07-15 15:11:56 +02:00
Fabiano Fidêncio	fa7598f6ec	Merge pull request #11568 from zvonkok/tdx-qemu-path gpu: Add proper TDX config path	2025-07-15 14:54:13 +02:00
Fabiano Fidêncio	3e86f3a95c	build: Rename rootfs-nvidia-* to fix cache issues The convention for rootfs-* names is: * rootfs-${image_type}-${special_build} If this is not followed, cache will never work as expected, leading to building the initrd / image on every single build, which is especially costly when building the nvidia specific targets. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-15 14:48:45 +02:00
alex.lyn	56c0c172fa	runtime-rs: Fix initdata length field missing when create block The init data could not be read properly within kata-agent because the data length field was omitted, a consequence of a mismatch in the data write format. Fixes #11556 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-15 19:22:17 +08:00
Fabiano Fidêncio	b76efa2a25	Merge pull request #11564 from BbolroC/make-qemu-coco-dev-s390x-required ci: Make qemu-coco-dev for s390x (zVSI) required again	2025-07-15 12:04:18 +02:00
Xynnn007	4da31bf2f9	agent: deliver initdata toml to attestation agent AA now supports receiving the initdata toml plaintext and delivering it in the attestation. This patch creates a file under '/run/confidential-containers/initdata' to store the initdata toml and gives it to the AA process. When we have a separate component to handle initdata, we will move the logic to that component. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-07-15 17:10:56 +08:00
Steve Horsman	d219fc20e1	Merge pull request #11555 from stevenhorsman/rust-advisory-fixes-pre-3.19.0 Rust advisory fixes pre 3.19.0	2025-07-15 09:11:33 +01:00
Hui Zhu	3577e4bb43	Merge pull request #11480 from teawater/update_ma mem-agent: Update to https://github.com/teawater/mem-agent/tree/kata-20250627	2025-07-15 15:22:10 +08:00
Xynnn007	19001af1e2	deps(chore): update guest-components and trustee to the version of pre v0.14.0 Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-07-15 09:12:47 +08:00
teawater	028f25ac84	mem-agent: Update to kata-20250627 Update to https://github.com/teawater/mem-agent/tree/kata-20250627. The commit list: 3854b3a Update nix version from 0.23.2 to 0.30.1 d9a4ced Update tokio version from 1.33 to 1.45.1 9115c4d run_eviction_single_config: Simplify check evicted pages after eviction 68b48d2 get_swappiness: Use a rounding method to obtain the swappiness value 14c4508 run_eviction_single_config: Add max_seq and min_seq check with each info 8a3a642 run_eviction_single_config: Move infov update to main loop b6d30cf memcg.rs: run_aging_single_config: Fix error of last_inc_time check 54fce7e memcg.rs: Update anon eviction code 41c31bf cgroup.rs: Fix build issue with musl 0d6aa77 Remove lazy_static from dependencies a66711d memcg.rs: update_and_add: Fix memcg not work after set memcg issue cb932b1 Add logs and change some level of some logs 93c7ad8 Add per-cgroup and per-numa config support 092a75b Remove all Cargo.lock to support different versions of rust 540bf04 Update mem-agent-srv, mem-agent-ctl and mem-agent-lib to v0.2.0 81f39b2 compact.rs: Change default value of compact_sec_max to 300 c455d47 compact.rs: Fix psi_path error with cgroup v2 issue 6016e86 misc.rs: Fix log error ded90e9 Set mem-agent-srv and mem-agent-ctl as bin Fixes: #11478 Signed-off-by: teawater <zhuhui@kylinos.cn>	2025-07-15 08:57:41 +08:00
Zvonko Kaiser	90bc749a19	gpu: Add proper TDX config path This was missed during the GPU TDX experimental enablement Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-14 23:26:28 +00:00
Zvonko Kaiser	da17b06d28	gpu: Pin toolkit version New versions have incompatibilities, pin toolkit to a working version Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-14 22:07:21 +00:00
Zvonko Kaiser	97a4a1574e	gpu: Remove gpu-admin-tools NVRC got a new feature reading the CC mode directly from register Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-14 21:59:31 +00:00
stevenhorsman	18597588c0	agent: Bump cdi version Bump cdi version to pick up the fixes for: - RUSTSEC-2025-0024 - RUSTSEC-2025-0023 - RUSTSEC-2024-0370 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-14 16:54:30 +01:00
stevenhorsman	661d88b11f	versions: Bump oci-spec Try bumping oci-spec to 0.8.1 as it included fixes for vulnerabilities including RUSTSEC-2024-0370 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-14 16:54:30 +01:00
Fabiano Fidêncio	579d373623	Merge pull request #11521 from stevenhorsman/idna-1.0.4-bump versions: Bump idna crate to >= 1.0.3	2025-07-14 17:39:30 +02:00
Fabiano Fidêncio	f5decea13e	Merge pull request #11550 from stevenhorsman/runtime-rs-bump-chrono-0.4.41 runtime-rs \| trace-forwarder: Bump chrono crate version	2025-07-14 16:45:58 +02:00
Steve Horsman	0fa2cd8202	Merge pull request #11519 from wainersm/tests_teardown_common tests/k8s: instrument some tests for debugging	2025-07-14 13:20:01 +01:00
Hyounggyu Choi	a224b4f9e4	ci: Make qemu-coco-dev for s390x (zVSI) required again As the following job has passed 10 days in a row for the nightly test: ``` kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (nydus, qemu-coco-dev, kubeadm) ``` this commit makes the job required again. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-07-14 11:03:54 +02:00
Wainer dos Santos Moschetta	f0f1974e14	tests/k8s: call teardown_common in k8s-parallel.bats The teardown_common will print the description of the running pods, kill them all and print the system's syslogs afterwards. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-12 10:13:51 +01:00
Wainer dos Santos Moschetta	8dfeed77cd	tests/k8s: add handler for Job in set_node() Set the node in the spec template of a Job manifest, allowing set_node() to be used in tests like k8s-parallel.bats Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-12 10:13:51 +01:00
Wainer dos Santos Moschetta	806d63d1d8	tests/k8s: call teardown_common in k8s-credentials-secrets.bats The teardown_common will print the description of the running pods, kill them all and print the system's syslogs afterwards. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-12 10:13:51 +01:00
Wainer dos Santos Moschetta	c8f40fe12c	tests/k8s: call teardown_common in k8s-sandbox-vcpus-allocation.bats The teardown_common will print the description of the running pods, kill them all and print the system's syslogs afterwards. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-12 10:13:51 +01:00
Fabiano Fidêncio	4a79c2520d	Merge pull request #11491 from Apokleos/default-blk-driver runtime-rs: Change default block device driver from virtio-scsi to virtio-blk-*	2025-07-11 23:14:13 +02:00
alex.lyn	9cc14e4908	runtime-rs: Update block device driver docs within configuration The previous description for the `block_device_driver` was inaccurate or outdated. This commit updates the documentation to provide a more precise explanation of its function. Fixes #11488 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-11 17:40:58 +02:00
alex.lyn	92160c82ff	runtime-rs: Change block device driver default with virtio-blk-* When we run a kata pod with runtime-rs/qemu and with a default configuration toml, it will fail with error "unsupported driver type virtio-scsi". As virtio-scsi within runtime-rs is not so popular, we set the default block device driver to `virtio-blk-*`. Fixes #11488 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-11 17:40:58 +02:00
Ankita Pareek	5f08cc75b3	agent: update the processes hashmap to use exec_id as primary key This patch changes the container process HashMap to use exec_id as the primary key instead of PID, preventing exec_id collisions that could be exploited in Confidential Computing scenarios where the host is less trusted than the guest. Key changes: - Changed `processes: HashMap<pid_t, Process>` to `HashMap<String, Process>` - Added exec_id collision detection in `start()` method - Updated process lookup operations to use exec_id directly - Simplified `get_process()` with direct HashMap access This prevents multiple exec operations from reusing the same exec_id, which could be problematic in CoCo use cases where process isolation and unique identification are critical for security. Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>	2025-07-11 10:10:23 +00:00
Steve Horsman	878e50f978	Merge pull request #11554 from fidencio/topic/fix-version-file-on-release gh: Fix released VERSION file	2025-07-11 09:20:06 +01:00
Fabiano Fidêncio	fb22e873cd	gh: Fix released VERSION file The `/opt/kata/VERSION` file, which is created using `git describe --tags`, requires the newly released tag to be updated in order to be accurate. To do so, let's add a `fetch-tags: true` to the checkout action used during the `create-kata-tarball` job. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-11 09:47:11 +02:00
Alex Lyn	87e41e2a09	Merge pull request #11549 from stevenhorsman/bump-remove_dir_all runtime-rs: Switch tempdir to tempfile	2025-07-11 13:46:12 +08:00
Alex Lyn	f22272b8f7	Merge pull request #11540 from Apokleos/coldplug-vfio-clh runtime-rs: Add vfio support with coldplug for cloud-hypervisor	2025-07-11 10:33:59 +08:00
RuoqingHe	7cd4e3278a	Merge pull request #11545 from RuoqingHe/remove-lockfile-for-libs libs: Remove lockfile for libs	2025-07-10 21:56:10 +08:00
stevenhorsman	c740896b1c	trace-forwarder: Bump chrono crate version Bump chrono version to drop time@0.1.43 and remediate vulnerability CVE-2020-26235 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-10 14:55:20 +01:00
stevenhorsman	3916507553	runtime-rs: Bump chrono crate version Bump chrono version to drop time@0.1.45 and remediate vulnerability CVE-2020-26235 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-10 13:47:05 +01:00
Wainer dos Santos Moschetta	3ab6a8462d	ci/gatekeeper: make run-k8s-tests-coco-nontee job required The CoCo non-TEE job (run-k8s-tests-coco-nontee) used to be required but we had to withdraw it to fix a problem (#11156). Now the job is back running and stable, so time to make it required again. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-10 12:19:19 +01:00
stevenhorsman	c5ceae887b	runtime-rs: Switch tempdir to tempfile tempdir hasn't been updated for seven years and pulls in remove_dir_all@0.5.3 which has security advisory GHSA-mc8h-8q98-g5hr, so replace this with using tempfile, which the crate got merged into and we use elsewhere in the project Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-10 12:16:35 +01:00
Ruoqing He	4039506740	libs: Ignore Cargo.lock in libs workspace Ignore Cargo.lock in `libs` to prevent developers from accidentally tracking lock files in `libs` workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-10 09:31:45 +00:00
alex.lyn	3fbe493edc	runtime-rs: Convert host devices within VmConfig for cloud-hypervisor This PR adds support for adding a network device before starting the cloud-hypervisor VM. This commit will get the host devices from NamedHypervisorConfig and assign them to VmConfig's devices, which are used for vfio devices when clh starts launching. With this, it successfully finishes the vfio devices conversion from a generic Hypervisor config to a clh specific VmConfig. Signed-off-by: alex.lyn <alex.lyn@antgroup.com> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-10 16:33:43 +08:00
alex.lyn	0b5b8f549d	runtime-rs: Introduce a field host_devices within NamedHypervisorConfig This commit introduces `host_devices` to help convert vfio devices from a generic hypervisor config to a cloud-hypervisor specific VmConfig. Signed-off-by: alex.lyn <alex.lyn@antgroup.com> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-10 16:33:41 +08:00
alex.lyn	d37183d754	runtime-rs: Add vfio support with coldplug for cloud-hypervisor This PR adds support for adding a vfio device before starting the cloud-hypervisor VM (or cold-plug vfio device). This commit changes "pending_devices" for the clh implementation by adding DeviceType::Vfio() into pending_devices. It will get shared host devices after correctly handling vfio devices (especially for the primary device). Signed-off-by: alex.lyn <alex.lyn@antgroup.com> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-10 16:32:21 +08:00
Ruoqing He	ffa3a5a15e	libs: Remove Cargo.lock crates in `libs` workspace do not ship binaries, they are just libraries for other workspace to reference, the `Cargo.lock` file hence would not take effect. Removing Cargo.lock for `libs` workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-07-10 03:14:55 +00:00
Fabiano Fidêncio	c68eb58f3f	Merge pull request #11529 from fidencio/topic/only-use-fixed-version-of-k0s-for-crio tests: k0s: Always use latest version, apart from CRI-O tests	2025-07-09 18:47:18 +02:00
Hyounggyu Choi	09297b7955	Merge pull request #11537 from BbolroC/set-sharedfs-to-none-for-ibm-sel runtime/runtime-rs: Set shared_fs to none for IBM SEL in config file	2025-07-09 18:30:08 +02:00
Hyounggyu Choi	bca31d5a4d	runtime/runtime-rs: Set shared_fs to none for IBM SEL in config file In line with configuration for other TEEs, shared_fs should be set to none for IBM SEL. This commit updates the value for runtime/runtime-rs. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-07-09 14:22:28 +02:00
Fabiano Fidêncio	5f17e61d11	tests: kata-deploy: Remove --wait from helm uninstall As we're using a `kubectl wait --timeout ...` to check whether the kata-deploy pod's been deleted or not, let's remove the `--wait` from the `helm uninstall ...` call as k0s tests were failing because the `kubectl wait --timeout...` was starting after the pod was deleted, making the test fail. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-09 14:01:30 +02:00
Fabiano Fidêncio	842e17b756	tests: k0s: Always use latest version, apart from CRI-O tests We've been pinning a specific version of k0s for CRI-O tests, which may make sense for CRI-O, but doesn't make sense at all when it comes to testing that we can install kata-deploy on latest k0s (and currently our test for that is broken). Let's bump to the latest, and from this point we start debugging, instead of debugging on an ancient version of the project. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-09 13:27:18 +02:00
Steve Horsman	7bc25b0259	Merge pull request #11494 from katexochen/p/opa-1.6 versions: bump opa 1.5.1 -> 1.6.0	2025-07-09 11:45:54 +01:00
Steve Horsman	967f66f677	Merge pull request #11380 from arvindskumar99/sev-deprecation Sev deprecation	2025-07-09 11:38:13 +01:00
stevenhorsman	f96b8fb690	kata-ctl: Update expected test failure message Update expected error after url crate bump Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-09 11:34:27 +01:00
stevenhorsman	b7bf46fdfa	versions: Bump idna crate to >= 1.0.4 Bump url, reqwest and idna crates in order to move away from idna <1.0.3 and remediate CVE-2024-12224. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-09 11:34:27 +01:00
Xuewei Niu	b8838140d0	Merge pull request #11527 from StevenFryto/fix-runtime-rootless-bugs runtime: Fix rootlessDir not correctly set in rootless VMM mode	2025-07-09 16:40:11 +08:00
Steve Horsman	990c4e68ee	Merge pull request #11523 from wainersm/ci_setup_kubectl workflows: adopting azure/setup-kubectl	2025-07-09 09:09:38 +01:00
stevenfryto	3c7a670129	runtime: Fix rootlessDir not correctly set in rootless VMM mode Previously, the rootlessDir variable in `src/runtime/virtcontainers/pkg/rootless.go` was initialized at package load time using `os.Getenv("XDG_RUNTIME_DIR")`. However, in rootless VMM mode, the correct value of $XDG_RUNTIME_DIR is set later during runtime using os.Setenv(), so rootlessDir remained empty. This patch defers the initialization of rootlessDir until the first call to `GetRootlessDir()`, ensuring it always reflects the current environment value of $XDG_RUNTIME_DIR. Fixes: #11526 Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>	2025-07-09 09:51:48 +08:00
Wainer dos Santos Moschetta	e4da3b84a3	workflows: adopting azure/setup-kubectl There are workflows that rely on `az aks install-cli` to get kubectl installed. There is a well-known problem with install-cli, related to API usage rate limits, that has recently caused the command to fail quite often. This replaces install-cli with the azure/setup-kubectl GitHub action, which has no such rate limit problem. While here, remove the install_cli() function from gha-run-k8s-common.sh to keep developers from using it by mistake in the future. Fixes #11463 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-07-08 15:15:54 -03:00
Alex Lyn	294b2c1c10	Merge pull request #11528 from Apokleos/remote-initdata runtime-rs: add initdata annotation for remote hypervisor	2025-07-08 09:13:13 +08:00
Arvind Kumar	afedad0965	kernel: Removing SEV kernel packages Removing kernel config files relating to SEV as part of the SEV deprecation efforts. Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-07 11:21:11 -05:00
Arvind Kumar	ecac3d2d28	runtime: Removing runtime logic for SEV Removing runtime SEV functionality, such as the kbs, ovmf, VMSA handling, and SEV configs as part of deprecating SEV from kata. Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-07 11:17:32 -05:00
Arvind Kumar	8eebcef8fb	tests: Removing testing framework for SEV Removing files pertaining to SEV from the CI framework. Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-07 11:17:32 -05:00
Arvind Kumar	675ea86aba	kata-deploy: Removing SEV from kata-deploy Removing files related to SEV, responsible for installing and configuring Kata containers. Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-07-07 11:17:32 -05:00
Paul Meyer	ff7ac58579	versions: bump opa 1.5.1 -> 1.6.0 Bumping opa to latest release. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-07-07 14:19:08 +02:00
alex.lyn	fcaade24f4	runtime-rs: add initdata annotation for remote hypervisor Add the initdata annotation when preparing the remote hypervisor annotations during VM preparation, so that it can be passed within CreateVMRequest. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-07 12:46:05 +01:00
Fabiano Fidêncio	110f68a0f1	Merge pull request #11530 from fidencio/topic/tests-fix-runtime-class-check tests: runtimeclasses: Adjust gpu runtimeclasses	2025-07-07 13:42:46 +02:00
Fabiano Fidêncio	2c2995b7b0	tests: runtimeclasses: Adjust gpu runtimeclasses `679cc9d47c` was merged and bumped the podoverhead for the gpu related runtimeclasses. However, the bump on the `kata-runtimeClasses.yaml` was overlooked, making our tests fail due to that discrepancy. Let's just adjust the values here and move on. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-07 11:43:40 +02:00
Fabiano Fidêncio	ef545eed86	Merge pull request #11513 from lifupan/dragonball_6.12.x tools: port the dragonball kernel patch to 6.12.x	2025-07-07 10:31:49 +02:00
Steve Horsman	d291e9bda0	Merge pull request #11336 from zvonkok/fix-podoverhead gpu: Update runtimeClasses for correct podoverhead	2025-07-07 09:20:07 +01:00
Fabiano Fidêncio	a2faf93211	kernel: Bump to v6.12.36 As that's the latest released LTS. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-07-06 23:48:20 +02:00
Fupan Li	fd21c9df59	tools: port the dragonball kernel patch to 6.12.x Backport the dragonball's kernel patches to 6.12.x kernel version. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-06 23:48:20 +02:00
Zvonko Kaiser	679cc9d47c	gpu: Update runtimeClasses for correct podoverhead We cannot rely only on default_cpu and default_memory in the config, default is 1 and 2Gi but we need some overhead for QEMU and the other related binaries running as the pod overhead. Especially when QEMU is hot-plugging GPUs, CPUs, and memory it can consume more memory. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-04 12:20:15 -04:00
Steve Horsman	1c718dbcdd	Merge pull request #11506 from stevenhorsman/remove-atty-dependency Remove atty dependency	2025-07-04 10:46:28 +01:00
Alex Lyn	362ea54763	Merge pull request #11517 from zvonkok/fix-nvrc-build gpu: NVRC static build	2025-07-04 13:51:03 +08:00
Alex Lyn	2e35a8067d	Merge pull request #11482 from Apokleos/fix-force-guestpull runtime-rs: refactor and fix the implementation of guest-pull	2025-07-04 11:29:33 +08:00
stevenhorsman	6f23608e96	ci: Remove atty group atty is unmaintained, with the last release almost 3 years ago, so we don't need to check for updates, but instead will remove it from our dependency tree. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-04 09:43:34 +08:00
stevenhorsman	7ffbdf7b3a	mem-agent: Remove structopts crate structopt features were integrated into clap v3, so structopt is not actively updated and pulls in the atty crate, which has a security advisory. Update clap, remove structopt, and update the code that used it to drop the outdated dependencies. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-04 09:43:34 +08:00
stevenhorsman	7845129bdc	versions: Bump slog-term to 2.9.1 slog-term 2.9.0 included atty, which is unmaintained and has a security advisory GHSA-g98v-hv3f-hcfr, so bump the version across our components to remove this dependency. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-07-04 09:43:34 +08:00
Aurélien Bombo	fe532f9d04	Merge pull request #11475 from kata-containers/sprt/zizmor-fixes security: ci: Fixes for Zizmor GHA security scanning	2025-07-03 13:29:47 -05:00
Zvonko Kaiser	c3b2d69452	gpu: NVRC static build We had the proper config.toml configuration for static builds but were building the glibc target and not the musl target. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-07-03 15:31:00 +00:00
Aurélien Bombo	8723eedad2	gha: Remove path restriction for Zizmor workflow The way GH works, we can only require Zizmor results on ALL PR runs, or none, so remove the path filter. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-07-03 08:18:34 -05:00
Alex Lyn	c857f59a1a	Merge pull request #11510 from lifupan/sync_resize_vcpu runtime-rs: make the resize_vcpu api support sync	2025-07-03 17:35:08 +08:00
alex.lyn	2b95facc6f	kata-type: Relax Mandatory source Field Check in Guest-Pull Mode Previously, the source field was subject to mandatory checks. However, in guest-pull mode, this field doesn't consistently provide useful information, and our practical experience has shown that relying on it for critical data isn't always necessary. In addition, not all cases need a mandatory check for KataVirtualVolume. For that reason, from_base64 now does only one thing and no longer calls validate(). The previous behavior is kept for callers that need it, under the clearer name from_base64_and_validate. This commit relaxes the mandatory checks on the KataVirtualVolume specifically for guest-pull mode, acknowledging its diminished utility in this context. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-03 17:07:20 +08:00
alex.lyn	8f8b196705	runtime-rs: refactor merging metadata within image_pull refactor implementation for merging metadata. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-03 17:07:08 +08:00
Fupan Li	fb1c35335a	runtime-rs: make the resize_vcpu sync When hot plugging vcpu in dragonball hypervisor, use the synchronization interface and wait until the hot plug cpu is executed in the guest before returning. This ensures that the subsequent device hot plug will not conflict with the previous call. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-03 15:11:36 +08:00
Fupan Li	72a38457f0	dragonball: make the resize_vcpu api support sync Let dragonball's resize_vcpu api support synchronization, and only return after the hot-plug of the CPU is successfully executed in the guest kernel. This ensures that the subsequent device hot-plug operation can also proceed smoothly. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-07-03 15:11:36 +08:00
Alex Lyn	210844ce6b	Merge pull request #11509 from teawater/agent_test kata-agent: mount.rs: Fix warning of test	2025-07-03 15:05:04 +08:00
Alex Lyn	95d513b379	Merge pull request #11423 from zhaodiaoer/test test: fix broken testing code in libs	2025-07-03 11:15:39 +08:00
teawater	0347698c59	kata-agent: mount.rs: Fix warning of test Got the following warning with make test of kata-agent: Compiling rustjail v0.1.0 (/data/teawater/kata-containers/src/agent/rustjail) Compiling kata-agent v0.1.0 (/data/teawater/kata-containers/src/agent) warning: unused import: `std::os::unix::fs` --> rustjail/src/mount.rs:1147:9 \| 1147 \| use std::os::unix::fs; \| ^^^^^^^^^^^^^^^^^ \| = note: `#[warn(unused_imports)]` on by default This commit fixes it. Fixes: #11508 Signed-off-by: teawater <zhuhui@kylinos.cn>	2025-07-03 10:01:19 +08:00
alex.lyn	7a59d7f937	runtime-rs: Import the public const value from libs The const value `KATA_VIRTUAL_VOLUME_PREFIX` is defined in libs/kata-types, so it is better to import it from there. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-03 09:42:17 +08:00
Aurélien Bombo	8d86bcea4b	Merge pull request #11499 from kata-containers/sprt/fix-commit-check gha: Eliminate use of force-skip-ci label	2025-07-02 10:53:55 -05:00
Aurélien Bombo	8d7d859e30	gha: Eliminate use of force-skip-ci label This was originally implemented as a Jenkins skip and is only used in a few workflows. Nowadays this would be better implemented via the gatekeeper. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-07-02 10:29:50 -05:00
Saul Paredes	e7b9eddced	Merge pull request #11248 from microsoft/archana1/storages genpolicy: add validation for storages	2025-07-01 10:02:10 -07:00
Fabiano Fidêncio	07b41c88de	Merge pull request #11490 from Apokleos/fix-noise runtime-rs: Fix noise with frequently appearing in unstaged changes	2025-07-01 17:43:41 +02:00
Archana Choudhary	6932beb01f	policy: fix parse errors in rules.rego This patch fixes the rules.rego file to ensure that the policy is correctly parsed and applied by opa. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 12:43:41 +00:00
Archana Choudhary	abbe1be69f	tests: enable confidential_guest setting for coco This commit updates the `tests_common.sh` script to enable the `confidential_guest` setting for the coco tests in the Kubernetes integration tests. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	9dd365fdb5	genpolicy: fix mount source check in rules.rego This commit fixes the mount source check in rules.rego. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	1cbea890f1	genpolicy: tests: update testcases for execprocess This patch removes storages from the testcases.json file for execprocess. This is because input storage objects are invalid for two reasons: 1. The "io.katacontainers.fs-opt.layer=" option is missing in annotations. 2. By default, we don't have host-tarfs-dm-verity enabled, so the storage objects are not created in policy. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	6adec0737c	genpolicy: add rules for image_guest_pull storage This patch introduces some basic checks for the `image_guest_pull` storage type in the genpolicy tool. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	bd2dc1422e	genpolicy: add test for container images having volumes This patch adds a test case to genpolicy for container images that have volumes. Examples of such container images include: - quay.io/opstree/redis - https://github.com/kubernetes/examples/blob/master/cassandra/image/Dockerfile Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	d7f998fbd5	genpolicy: tests: update test for emptydir volumes This patch - updates testcases.json for emptydir volumes/storages Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	68c8c31718	genpolicy: tests: add test for config_map volumes This patch adds test for config_map volumes. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	9ebbc08d70	genpolicy: enable storage checks This patch - adds condition to add container image layers as storages - enable storage checks - fix CI policy test cases - update genpolicy-settings.json to enable storage checks - remove storage object addition in container image parsing Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Archana Choudhary	5b1459e623	genpolicy: test framework: enable config map usage This patch improves the test framework for the genpolicy tool by enabling the use of config maps. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-07-01 10:35:20 +00:00
Alex Lyn	8784cebb84	Merge pull request #10693 from Apokleos/guest-pullimage-timeout runtime-rs: support setting create_container timeout with request_timeout_ms for image pulling in guest	2025-07-01 11:40:19 +08:00
alex.lyn	b7c1d04a47	runtime-rs: Fix noise with frequently appearing in unstaged changes Fixes #11489 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-07-01 10:19:02 +08:00
alex.lyn	9839c17cad	build: add Makefile variable for create_container_timeout Add the definition of variable DEFCREATECONTAINERTIMEOUT into Makefile target with default timeout 30s. Fixes: #485 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-30 20:04:56 +08:00
alex.lyn	1a06bd1f08	kata-types: Introduce annotation *_RUNTIME_CREATE_CONTAINTER_TIMEOUT It's used to indicate timeout value set for image pulling in guest during creating container. This allows users to set this timeout with annotation according to the size of image to be pulled. Fixes #10692 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-30 20:04:56 +08:00
alex.lyn	f886e82f03	runtime-rs: support setting create_container_timeout It allows users to set this create container timeout within configuration.toml according to the size of image to be pulled inside guest. Fixes #10692 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-30 20:04:56 +08:00
alex.lyn	ce524a3958	kata-types: Give a more comprehensive definition of request_timeout_ms To better understand the impact of different timeout values on system behavior, this section provides a more comprehensive explanation of the request_timeout_ms: This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest. It's also used to ensure that workloads, especially those involving large image pulls within the guest, have sufficient time to complete. Based on the explanation above, it's renamed to `create_container_timeout`, specifically as exposed in 'configuration.toml'. Fixes #10692 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-30 20:04:56 +08:00
Steve Horsman	f04bb3f34c	Merge pull request #11479 from stevenhorsman/skip-weekly-coco-stability-tests workflows: Skip weekly coco stability tests	2025-06-30 09:05:14 +01:00
Fabiano Fidêncio	b024d8737c	Merge pull request #11481 from fidencio/topic/fix-passing-image-size-alignment build: Allow passing IMAGE_SIZE_ALIGNMENT_MB as an env var	2025-06-30 09:04:39 +02:00
Alex Lyn	69d2c078d1	Merge pull request #11484 from stevenhorsman/bump-nydus-snapshotter-0.15.2 version: Bump nydus-snapshotter	2025-06-30 14:44:01 +08:00
Alex Lyn	e66baf503b	Merge pull request #11474 from Apokleos/remote-annotation runtime-rs: Add GPU annotations for remote hypervisor	2025-06-30 14:05:15 +08:00
Fabiano Fidêncio	8d4e3b47b1	Merge pull request #11470 from fidencio/topic/runtime-rs-fix-odd-memory-size-calculation runtime-rs: Fix calculation of odd memory sizes	2025-06-30 07:26:30 +02:00
Champ-Goblem	91cadb7bfe	runtime-rs: Fix calculation of odd memory sizes An odd memory size leads to the runtime breaking during its startup, as shown below: ``` Warning FailedCreatePodSandBox 34s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox "708c81910f4e67e53b4170b6615083339b220154cb9a0c521b3232cdb40d50f9": failed to create containerd task: failed to create shim task: Others("failed to handle message start sandbox in task handler\n\nCaused by:\n 0: start vm\n 1: set vm base config\n 2: set vm configuration\n 3: Failed to set vm configuration VmConfigInfo { vcpu_count: 2, max_vcpu_count: 16, cpu_pm: \"on\", cpu_topology: CpuTopology { threads_per_core: 1, cores_per_die: 1, dies_per_socket: 1, sockets: 1 }, vpmu_feature: 0, mem_type: \"shmem\", mem_file_path: \"\", mem_size_mib: 4513, serial_path: Some(\"/run/kata/708c81910f4e67e53b4170b6615083339b220154cb9a0c521b3232cdb40d50f9/console.sock\"), pci_hotplug_enabled: true }\n 4: vmm action error: MachineConfig(InvalidMemorySize(4513))\n\nStack backtrace:\n 0: anyhow::error::<impl anyhow::Error>::msg\n 1: hypervisor::dragonball::vmm_instance::VmmInstance::handle_request\n 2: hypervisor::dragonball::vmm_instance::VmmInstance::set_vm_configuration\n 3: hypervisor::dragonball::inner::DragonballInner::set_vm_base_config\n 4: <hypervisor::dragonball::Dragonball as hypervisor::Hypervisor>::start_vm::{{closure}}::{{closure}}\n 5: <hypervisor::dragonball::Dragonball as hypervisor::Hypervisor>::start_vm::{{closure}}\n 6: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::start::{{closure}}::{{closure}}\n 7: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::start::{{closure}}\n 8: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}::{{closure}}\n 9: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}\n 10: <service::task_service::TaskService as 
containerd_shim_protos::shim::shim_ttrpc_async::Task>::create::{{closure}}\n 11: <containerd_shim_protos::shim::shim_ttrpc_async::CreateMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}\n 12: <tokio::time::timeout::Timeout<T> as core::future::future::Future>::poll\n 13: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}\n 14: <core::future::poll_fn::PollFn<F> as core::future::future::Future>::poll\n 15: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}\n 16: tokio::runtime::task::core::Core<T,S>::poll\n 17: tokio::runtime::task::harness::Harness<T,S>::poll\n 18: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\n 19: tokio::runtime::scheduler::multi_thread::worker::Context::run\n 20: tokio::runtime::context::runtime::enter_runtime\n 21: tokio::runtime::scheduler::multi_thread::worker::run\n 22: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll\n 23: tokio::runtime::task::core::Core<T,S>::poll\n 24: tokio::runtime::task::harness::Harness<T,S>::poll\n 25: tokio::runtime::blocking::pool::Inner::run\n 26: std::sys::backtrace::__rust_begin_short_backtrace\n 27: core::ops::function::FnOnce::call_once{{vtable.shim}}\n 28: std::sys::pal::unix::thread::Thread::new::thread_start") ``` As we cannot control what the users will set, let's just round it up to the next acceptable value. Signed-off-by: Champ-Goblem <cameron@northflank.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-06-28 14:29:18 +02:00
Fabiano Fidêncio	e2b93fff3f	build: Allow passing IMAGE_SIZE_ALIGNMENT_MB as an env var This helps considerably to avoid patching the code, and just adjusting the build environment to use a smaller alignment than the default one. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-06-28 00:05:20 +02:00
stevenhorsman	fe5d43b4bd	workflows: Skip weekly coco stability tests These tests are not passing, or being maintained, so as discussed at the AC meeting, we will skip them from automatically running until they can be reviewed and re-worked, to avoid wasting CI cycles. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-27 16:51:53 +01:00
stevenhorsman	61b12d4e1b	version: Bump nydus-snapshotter Bump to version v0.15.2 to pick up fix to mount source in https://github.com/containerd/nydus-snapshotter/pull/636 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-27 14:04:00 +01:00
RuoqingHe	a43e06e0eb	Merge pull request #11461 from stevenhorsman/bump-guest-components-4cd62c3 versions: Bump guest-components	2025-06-27 10:45:06 +08:00
Aurélien Bombo	d94085916e	ci: set Zizmor as required test This adds Zizmor GHA security scanning as a PR gate. Note that this does NOT require that Zizmor returns 0 alerts, but rather that Zizmor's invocation completes successfully (regardless of how many alerts it raises). I will set up the former after this commit is merged (through the GH UI). Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-26 12:36:41 -05:00
Aurélien Bombo	820c1389db	security: ci: remove overly broad permission This removes the permission from the workflow since it's already present at the job level. https://github.com/kata-containers/kata-containers/security/code-scanning/111 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-26 12:29:23 -05:00
Aurélien Bombo	bb2a427a8a	security: ci: fix template injection This fixes a Zizmor error where some variables are vulnerable to template injection. https://github.com/kata-containers/kata-containers/security/code-scanning/67 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-26 12:29:11 -05:00
Saul Paredes	8c57beb943	Merge pull request #11471 from microsoft/saulparedes/fix_kata_monitor_dockerfile tools: kata-monitor: update go version used to build in Dockerfile	2025-06-26 08:37:08 -07:00
Chao Wu	ac928218f3	Merge pull request #11434 from hsiangkao/erofs runtime: improve EROFS snapshotter support	2025-06-26 22:40:48 +08:00
Cameron McDermott	b6cd6e6914	Merge pull request #11469 from fidencio/topic/dragonball-set-default_maxvcpus-to-zero runtime-rs: Set default_maxvcpus to 0	2025-06-26 15:20:21 +01:00
Aurélien Bombo	a1aa3e79d4	Merge pull request #11392 from kata-containers/sprt/zizmor ci: Run zizmor for GHA security analysis	2025-06-26 08:55:22 -05:00
Fupan Li	1ff54a95d2	Merge pull request #11422 from lifupan/memory_hotplug runtime-rs: Add the memory and vcpu hotplug for cloud-hypervisor	2025-06-26 17:56:49 +08:00
Aurélien Bombo	34c8cd810d	ci: Run zizmor for GHA security analysis This runs the zizmor security lint [1] on our GH Actions. The initial workflow uses [2] as a base. [1] https://docs.zizmor.sh/ [2] https://docs.zizmor.sh/usage/#use-in-github-actions Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-26 10:52:28 +01:00
alex.lyn	e6e4cd91b8	runtime-rs: Enable GPU annotations in remote hypervisor configuration Enable GPU annotations by adding `default_gpus` and `default_gpu_model` into the list of valid annotations `enable_annotations`. Fixes #10484 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-26 17:29:36 +08:00
alex.lyn	e5f44fae30	runtime-rs: Add GPU annotations during remote hypervisor preparation Add GPU specific annotations used by remote hypervisor for instance selection during `prepare_vm`. Fixes #10484 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-26 17:27:41 +08:00
alex.lyn	866d3facba	kata-types: Introduce two GPU annotations for remote hypervisor Two annotations: `default_gpus` and `default_gpu_model` as GPU annotations are introduced for Kata VM configurations to improve instance selection on remote hypervisors. By adding these annotations: (1) `default_gpus`: Allows users to specify the minimum number of GPUs a VM requires. This ensures that the remote hypervisor selects an instance with at least that many GPUs, preventing resource under-provisioning. (2) `default_gpu_model`: Lets users define the specific GPU model needed for the VM. This is crucial for workloads that depend on particular GPU archs or features, ensuring compatibility and optimal performance. Fixes #10484 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-26 17:27:41 +08:00
alex.lyn	ed0c0b2367	kata-types: Introduce GPU related fields in RemoteInfo To provide the remote hypervisor with the necessary intelligence to select the most appropriate instance for a given GPU instance, leading to better resource allocation, two fields `default_gpus` and `default_gpu_model` are introduced in `RemoteInfo`. Fixes #10484 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-26 17:27:28 +08:00
Alex Lyn	9a1d4fc5d6	Merge pull request #11468 from Apokleos/fix-sharefs-none runtime-rs: Support shared fs with "none" on non-tee platforms	2025-06-26 15:37:44 +08:00
Gao Xiang	9079c8e598	runtime: improve EROFS snapshotter support To better support containerd 2.1 and later versions, remove the hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the real mount source (and `/sys/block/loopX/loop/backing_file` if needed). If the mount source doesn't end with `layer.erofs`, it should be marked as unsupported, as it may be a filesystem meta file generated by later containerd versions for the EROFS flattened filesystem feature. Also check whether the filesystem type is `overlay` or not, since the containerd mount manager [1] may change it after being introduced. [1] https://github.com/containerd/containerd/issues/11303 Fixes: `f63ec50ba3` ("runtime: Add EROFS snapshotter with block device support") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-06-26 10:12:12 +08:00
Saul Paredes	d53c720ac1	tools: kata-monitor: update go version used to build in Dockerfile Current Dockerfile fails when trying to build from the root of the repo docker build -t kata-monitor -f tools/packaging/kata-monitor/Dockerfile . with "invalid go version '1.23.0': must match format 1.23" Using go 1.23 in the Dockerfile fixes the build error Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-06-25 15:32:41 -07:00
stevenhorsman	290fda9b97	agent-ctl: Bump image-rs version I noticed that agent-ctl is including a 9 month old version of image-rs and the libs crates haven't been updated for potentially many years, so bump all of these. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-25 16:30:58 +01:00
stevenhorsman	c7da62dd1e	versions: Bump guest-components Bump to pick up the new guest-components and matching trustee which use rust 1.85.1 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-25 15:05:07 +01:00
Fabiano Fidêncio	bebe377f0d	runtime-rs: Set default_maxvcpus to 0 Otherwise we just cannot start a container that requests more than 1 vcpu. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-06-25 14:36:46 +02:00
Steve Horsman	9ff30c6aeb	Merge pull request #11462 from kata-containers/add-scorecard-action ci: Add scorecard action	2025-06-25 12:48:11 +01:00
Fabiano Fidêncio	69c706b570	Merge pull request #11441 from stevenhorsman/protobuf-3.7.2-bump versions: Bump protobuf to 3.7.2	2025-06-25 13:47:28 +02:00
alex.lyn	eae62ca9ac	runtime-rs: Support shared fs with "none" on non-tee platforms This commit introduces the ability to run Pods without a shared fs mechanism in Kata. The default shared fs can lead to unnecessary resource consumption and security risks for certain use cases. Specifically, scenarios where files only need to be copied into the VM once at Pod creation (e.g., non-tee envs) and don't require dynamic updates make the shared fs redundant and inefficient. By explicitly disabling shared fs functionality, we reduce resource overhead and shrink the attack surface. Users will need to employ alternative methods (e.g. guest-pull) to ensure container images are shared into the guest VM for these specific scenarios. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-25 17:36:57 +08:00
Fabiano Fidêncio	4719c08184	Merge pull request #11467 from lifupan/fixblockfile runtime-rs: fix the issue return the wrong volume	2025-06-25 09:56:28 +02:00
Fupan Li	48c8e0f296	runtime-rs: fix the issue return the wrong volume The previous commit, 74eccc54e7b31cc4c9abd8b6e4007c3a4c1d4dd4, failed to return the right rootfs volume. In the is_block_rootfs fn, if the rootfs is based on a block device such as devicemapper, it should clear the volume's source and let the device_manager use the dev_id to get the device's host path. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-25 10:02:52 +08:00
Alex Lyn	648fef4f52	Merge pull request #11466 from lifupan/blockfile runtime-rs: add the blockfile based rootfs support	2025-06-25 09:46:54 +08:00
Dan Mihai	2d43b3f9fc	Merge pull request #11424 from katexochen/p/regorus-oras-cache ci/static-checks: use oras cache for regorus	2025-06-24 14:49:00 -07:00
Fupan Li	74eccc54e7	runtime-rs: add the blockfile based rootfs support For containerd's Blockfile Snapshotter, it passes a rootfs mount with a raw file as the mount source and mount options with "loop" embedded. To support this type of rootfs, it is necessary to identify this as a blockfile rootfs through the "loop" flag, and then use the volume source of the rootfs as the source of the block device to hot-insert it into the guest. Fixes: #11464 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 22:31:54 +08:00
Paul Meyer	43739cefdf	ci/static-checks: use oras cache for regorus Instead of building it every time, we can store the regorus binary in OCI registry using oras and download it from there. This reduces the install time from ~1m40s to ~15s. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-24 13:14:18 +02:00
Fupan Li	9bdbd82690	Merge pull request #11181 from Apokleos/initdata-runtime-rs runtime-rs: Implement Initdata Spec Support in runtime-rs for CoCo	2025-06-24 18:59:34 +08:00
Fupan Li	1c59516d72	runtime-rs: add support resize_vcpu for cloud-hypervisor This commit adds support for resize_vcpu for cloud-hypervisor using its vm resize api. It supports both vcpu hotplug and hot unplug. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 17:15:05 +08:00
Fupan Li	a3671b7a5c	runtime-rs: Add the memory hotplug for cloud-hypervisor For cloud-hypervisor, currently only hot plugging of memory is supported, but hot unplugging of memory is not supported. In addition, by default, cloud-hypervisor uses ACPI-based memory hot-plugging instead of virtio-mem based memory hot-plugging. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 17:15:05 +08:00
Fupan Li	7df29605a4	runtime-rs: add the vm resize and get vminfo api for clh Add API interfaces for get vminfo and resize. get vminfo can obtain the memory size and number of vCPUs from the cloud hypervisor vmm in real time. This interface provides information for the subsequent resize memory and vCPU. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 17:15:05 +08:00
Fupan Li	9a51ade4e2	runtime-rs: impl the Deserialize trait for MacAddr The system's own Deserialize cannot implement parsing from string to MacAddr, so we need to implement this trait ourselves. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 17:15:05 +08:00
Fupan Li	ceaae3049c	runtime-rs: move the bytes_to_megs and megs_to_bytes to utils Since these two functions will be used by other hypervisors, move them into the utils crate. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-24 17:15:05 +08:00
alex.lyn	871465f5d3	kata-agent: Allow unrecognized fields in InitData To improve flexibility and extensibility, this change modifies the Kata Agent's handling of `InitData` to allow for unrecognized key-value pairs. The `InitData` field now directly utilizes `HashMap<String, String>`, enabling it to carry arbitrary metadata and information that may be consumed by other components. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	afcb042c28	runtime-rs: Specify the initdata to mrconfigid correctly During sandbox preparation, initdata should be specified to TdxConfig, specifically mrconfigid, which is used to pass to tdx guest report for measurement. Fixes #11180 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	d6d8497b56	runtime-rs: Add host-data property to sev-snp-guest object SEV-SNP guest configuration utilizes a different set of properties compared to the existing 'sev-guest' object. This change introduces the `host-data` property within the sev-snp-guest object. This property allows for configuring an SEV-SNP guest with host-provided data, which is crucial for data integrity verification during attestation. The `host-data` property is specifically valid for SEV-SNP guests running on a capable platform. It is configured as a base64-encoded string when using the sev-snp-guest object. The example command line looks like: ```shell -object sev-snp-guest,id=sev-snp0,host-data=CGNkCHoBC5CcdGXir... ``` Fixes #11180 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	4a4361393c	runtime-rs: Introduce host-data in SevSnpConfig for validation To facilitate the transfer of initdata generated during `prepare_initdata_device_config`, a new parameter has been introduced into the `prepare_protection_device_config` function. Furthermore, to specifically pass initdata to SEV-SNP Guests, a `host_data` field has been added to the `SevSnpConfig` structure. However, this field is exclusively applicable to the SEV-SNP platform. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	5c8170dbb9	runtime-rs: Handle initdata block device config during sandbox start Retrieve the Initdata string content from the security_info of the Configuration. Based on the Protection Platform type, calculate the digest of the Initdata. Write the Initdata content to the block device. Subsequently, construct the BlockConfig based on this block device information. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	6ea1494701	runtime-rs: Add InitData Resource type for block device management To correctly manage initdata as a block device, a new InitData Resource type, inherently a block device, has been introduced within the ResourceManager. As a component of the Sandbox's resources, this InitData Resource needs to be appropriately handled by the Device Manager's handler. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	8c1482a221	runtime-rs: Introduce coco_data dir and initdata block Implement resource storage infrastructure with initial initdata support: 1. Create dedicated `coco_data` directory for: - Centralized management of CoCo resources; - Future expansion of CoCo artifacts; 2. Atomic initdata block as foundational component in `coco_data`, it will implement creation of compressed initdata blocks with: - Gzip compression with level customization (0-9) - Sector-aligned (512B) image format with magic header - Adaptive buffering (4KB-128KB) based on payload size - Temp-file atomic writes with 0o600 permissions Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	9b21d062c9	kata-types: Implement InitData retrieval from Pod annotation This commit implements the retrieval and processing of InitData provided via a Pod annotation. Specifically, it enables runtime-rs to: (1) Parse the "io.katacontainers.config.hypervisor.cc_init_data" annotation from the Pod YAML. (2) Perform reverse operations on the annotation value: base64 decoding followed by gzip decompression. (3) Deserialize the decompressed data into the internal InitData structure. (4) Serialize the resulting InitData into a string and store it in the Configuration. This allows users to inject configuration data into the TEE Guest by encoding and compressing it and passing it as an annotation in the Pod configuration. This mechanism supports scenarios where dynamic config is required for Confidential Containers. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	4ca394f4fc	kata-types: Implement Initdata Spec and Digest Calculation Logic This commit introduces the Initdata Spec and the logic for calculating its digest. It includes: (1) Define a `ProtectedPlatform` enum to represent major TEE platform types. (2) Create an `InitData` struct to support building and serializing initialization data in TOML format. (3) Implement adaptation for SHA-256, SHA-384, and SHA-512 digest algorithms. (4) Provide a platform-specific mechanism for adjusting digest lengths (zero-padding). (5) Support the decoding and verification of base64+gzip encoded Initdata. The core functionality ensures the integrity of data injected by the host through trusted algorithms, while also accommodating the measurement requirements of different TEE platforms. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
alex.lyn	2603ee66b8	kata-types: Introduce initdata to SecurityInfo for data injection This commit introduces a new `initdata` field of type String to hypervisor `SecurityInfo`. In accordance with the Initdata Specification, this field will facilitate the injection of well-defined data from an untrusted host into the TEE. To ensure the integrity of this injected data, the TEE evidence's hostdata capability or the (v)TPM dynamic measurement capability will be leveraged, as outlined in the specification. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-24 10:25:57 +08:00
Dan Mihai	89dcc8fb27	Merge pull request #11444 from microsoft/danmihai1/k8s-policy-rc tests: k8s-policy-rc: print pod descriptions	2025-06-23 16:14:56 -07:00
Dan Mihai	0a57e09259	Merge pull request #11426 from charludo/fix/genpolicy-corruption-of-layer-cache-file genpolicy: prevent corruption of the layer cache file	2025-06-23 14:00:45 -07:00
Dan Mihai	8aecf14b34	Merge pull request #11405 from kata-containers/dependabot/cargo/src/agent/clap-77d1155c52 build(deps): bump the clap group across 6 directories with 1 update	2025-06-23 13:05:59 -07:00
Dan Mihai	62c9845623	tests: k8s-policy-rc: print pod descriptions Don't use local launched_pods variable in test_rc_policy(), because teardown() needs to use this variable to print a description of the pods, for debugging purposes. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-23 16:23:26 +00:00
stevenhorsman	649e31340b	doc: Add scorecard badge Add our scorecard badge to our readme for transparency and to help motivate us to update our score Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-23 16:22:59 +01:00
stevenhorsman	6dd025d0ed	workflows: Add scorecard workflow Add a workflow to update our scorecard score on each change Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-23 16:09:14 +01:00
Steve Horsman	4f245df4a0	Merge pull request #11420 from kata-containers/pin-gha-actions workflows: Pin action hashes	2025-06-23 15:26:03 +01:00
charludo	4e57cc0ed2	genpolicy: keep layers cache in-memory to prevent corruption The locking mechanism around the layers cache file was insufficient to prevent corruption of the file. This commit moves the layers cache's management in-memory, only reading the cache file once at the beginning of `genpolicy`, and only writing to it once, at the end of `genpolicy`. In the case that obtaining a lock on the cache file fails, reading/writing to it is skipped, and the cache is not used/persisted. Signed-off-by: charludo <git@charlotteharludo.com>	2025-06-23 16:16:42 +02:00
RuoqingHe	8c1f6e827d	Merge pull request #11448 from RuoqingHe/remove-dup-ignore ci: Remove duplicated `rust-vmm` dependencies	2025-06-23 10:34:30 +08:00
Ruoqing He	1d2d2cc3d5	ci: Remove duplicated `rust-vmm` dependencies `vmm-sys-util` was duplicated while updating the `ignore` list of `rust-vmm` crates in #11431, remove duplicated one and sort the list. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-21 21:02:59 +00:00
stevenhorsman	9685e2aeca	trace-forwarder: Replace removed clap functions When moving from clap v2 to v4 a bunch of functions have been removed, so update the code to handle these replacements Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-21 17:15:12 +01:00
stevenhorsman	e204847df5	agent-ctl: Replace removed clap functions When moving from clap v2 to v4 a bunch of functions have been removed, so update the code to handle these replacements Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-21 17:15:12 +01:00
stevenhorsman	e11fc3334e	agent: Clap v4 updates AppSettings was removed, so refactor based on new documentation Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-21 17:15:12 +01:00
dependabot[bot]	0aa80313eb	build(deps): bump the clap group across 6 directories with 1 update Bumps the clap group with 1 update in the /src/agent directory: [clap](https://github.com/clap-rs/clap). Bumps the clap group with 1 update in the /src/tools/agent-ctl directory: [clap](https://github.com/clap-rs/clap). Bumps the clap group with 1 update in the /src/tools/genpolicy directory: [clap](https://github.com/clap-rs/clap). Bumps the clap group with 1 update in the /src/tools/kata-ctl directory: [clap](https://github.com/clap-rs/clap). Bumps the clap group with 1 update in the /src/tools/runk directory: [clap](https://github.com/clap-rs/clap). Bumps the clap group with 1 update in the /src/tools/trace-forwarder directory: [clap](https://github.com/clap-rs/clap). Updates `clap` from 3.2.25 to 4.5.37 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) Updates `clap` from 2.34.0 to 4.5.40 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) Updates `clap` from 4.1.8 to 4.5.40 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) Updates `clap` from 4.4.10 to 4.5.13 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) Updates `clap` from 3.2.25 to 4.5.40 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) Updates `clap` from 2.34.0 to 4.5.40 - [Release notes](https://github.com/clap-rs/clap/releases) - [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md) - [Commits](https://github.com/clap-rs/clap/compare/v3.2.25...clap_complete-v4.5.37) --- updated-dependencies: - dependency-name: clap dependency-version: 4.5.37 dependency-type: direct:production update-type: version-update:semver-major dependency-group: clap - dependency-name: clap dependency-version: 4.5.40 dependency-type: direct:production update-type: version-update:semver-major dependency-group: clap - dependency-name: clap dependency-version: 4.5.40 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: clap - dependency-name: clap dependency-version: 4.5.13 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: clap - dependency-name: clap dependency-version: 4.5.40 dependency-type: direct:production update-type: version-update:semver-major dependency-group: clap - dependency-name: clap dependency-version: 4.5.40 dependency-type: direct:production update-type: version-update:semver-major dependency-group: clap ... Signed-off-by: dependabot[bot] <support@github.com>	2025-06-21 17:15:12 +01:00
RuoqingHe	b22135f4e5	Merge pull request #11431 from RuoqingHe/udpate-rust-vmm-ignore-list ci: Update dependabot ignore list	2025-06-21 18:20:41 +08:00
Ruoqing He	6628ba3208	ci: Update dependabot ignore list Update dependabot ignore list in cargo ecosystem to ignore upgrades from rust-vmm crates, since those crates need to be managed carefully and manually. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-21 08:18:20 +01:00
stevenhorsman	9d3b9fb438	workflows: Pin action hashes Pin GitHub-owned actions to specific hashes, as recommended, because tags are mutable; see https://pin-gh-actions.kammel.dev/. This is one of the recommendations that scorecard gives us. Note this was generated with `frizbee actions` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-21 08:14:13 +01:00
Steve Horsman	4bfa74c2a5	Merge pull request #11331 from stevenhorsman/helm-ghcr-login-update workflow: Remove code injection in helm login	2025-06-21 08:13:40 +01:00
Steve Horsman	353b4bc853	Merge pull request #11440 from stevenhorsman/osbuilder-fedora-42-update osbuilder: Update image-builder base to f42	2025-06-21 08:11:12 +01:00
Steve Horsman	cac1cb75ce	Merge pull request #11378 from kata-containers/dependabot/cargo/src/tools/agent-ctl/rustix-0.37.28 build(deps): bump rustix in various components	2025-06-21 08:05:21 +01:00
stevenhorsman	900d9be55e	build(deps): bump rustix in various components Bumps of rustix 0.36, 0.37 and 0.38 to resolve CVE-2024-43806 Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-20 14:52:43 -05:00
stevenhorsman	d9defd5102	osbuilder: Update image-builder base to f42 Fedora 40 is EoL, and I've seen the registry pull fail a few times recently, so let's bump to fedora 42 which has 10 months of support left. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-20 20:52:30 +01:00
stevenhorsman	0f1c326ca0	versions: Bump protobuf to 3.7.2 Now we are decoupled from the image-rs crate, we can bump the protobuf version across our project to resolve the GHSA-2gh3-rmm4-6rq5 advisory Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-20 20:52:04 +01:00
Saul Paredes	cc27966aa1	Merge pull request #11443 from microsoft/saulparedes/update_image tests: update container image for ci and unit test	2025-06-20 12:50:42 -07:00
Archana Choudhary	e093919b42	tests: update container image for ci and unit test This patch updates the container image for the CI test workloads: - `k8s-layered-sc-deployment.yaml` - `k8s-pod-sc-deployment.yaml` - `k8s-pod-sc-nobodyupdate-deployment.yaml` - `k8s-pod-sc-supplementalgroups-deployment.yaml` - `k8s-policy-deployment.yaml` Also updates unit tests: - `test_create_container_security_context` - `test_create_container_security_context_supplemental_groups` This fixes tests failing due to an image pull error as the previous image is no longer available in the container registry. Signed-off-by: Archana Choudhary <archana1@microsoft.com> Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-06-20 10:46:56 -07:00
stevenhorsman	776c89453c	workflow: Remove code injection in helm login In theory `github.actor` could be used for code injection, so swap it out. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-20 16:27:52 +01:00
Fabiano Fidêncio	6722ea2fd9	Merge pull request #11439 from stevenhorsman/multi-arch-manifest-permissions-fix release: Add more permissions	2025-06-19 12:45:37 +02:00
stevenhorsman	8da75bf55d	release: Add more permissions Add package: write to the multi-arch manifest upload to ghcr.io Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-19 11:04:29 +01:00
Fabiano Fidêncio	d0c1ce1367	Merge pull request #11438 from stevenhorsman/helm-upload-fix release: Fix helm push typo	2025-06-19 12:01:04 +02:00
stevenhorsman	eaf42b3e0f	release: Fix helm push typo Switch the hyphen for an underscore, so the ghcr helm publish can work properly. Co-authored-by: Fabiano Fidêncio <fidencio@northflank.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-19 10:56:50 +01:00
Fabiano Fidêncio	f7d3ea0c55	Merge pull request #11437 from kata-containers/release-flow-permissions-fixes-iii workflows: Release permissions	2025-06-19 11:23:46 +02:00
stevenhorsman	19597b8950	workflows: Release permissions Add more permissions to the release workflow in order to enable `gh release` commands to run Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-19 10:05:23 +01:00
Fabiano Fidêncio	254ada2f6a	Merge pull request #11436 from kata-containers/release-flow-permission-fix-ii workflows: Add extra permissions	2025-06-19 10:45:26 +02:00
stevenhorsman	7c6c6f3c15	workflows: Add extra permissions Add permissions to the ppc release Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-19 09:39:01 +01:00
Steve Horsman	00c9e61b60	Merge pull request #11435 from kata-containers/release-flow-permissions-fix(es) workflows: Fix permissions	2025-06-19 09:35:23 +01:00
stevenhorsman	9adf989555	workflows: Fix permissions Add extra permissions for reusable workflow calls that need them later on Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-19 08:44:18 +01:00
Fabiano Fidêncio	e82de65d5d	Merge pull request #11425 from stevenhorsman/release-3.18.0-bump release: Bump version to 3.18.0	2025-06-18 21:39:51 +02:00
stevenhorsman	6fc622ef0f	release: Bump version to 3.18.0 Bump VERSION and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-18 19:09:42 +01:00
Steve Horsman	060faa3d1a	Merge pull request #11433 from kata-containers/cri-containerd-test-fast-fail-false workflows: Add fail-fast: false to cri-containerd tests	2025-06-18 19:08:59 +01:00
Steve Horsman	e0084a958c	Merge pull request #11432 from stevenhorsman/golang-1.23.10 versions: Bump golang to 1.23.10	2025-06-18 17:25:07 +01:00
Steve Horsman	4e3238b9dc	Merge pull request #11337 from zvonkok/fix-module-signing gpu: Fix module signing	2025-06-18 17:23:51 +01:00
Steve Horsman	547b6c5781	Merge pull request #11429 from stevenhorsman/cri-containerd-required-test-rename Cri containerd required test rename	2025-06-18 15:45:14 +01:00
Zvonko Kaiser	e2f18057a4	kernel: Add config option for signing Only sign the kernel if the user has provided the KBUILD_SIGN_PIN; otherwise, skip signing. While here, let's move the functionality to the common fragments, as it's not GPU-specific functionality. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-06-18 15:32:26 +02:00
stevenhorsman	73d7b4f258	workflows: Add fail-fast: false to cri-containerd tests At the moment, if any of the tests in the matrix fails, the rest of the jobs are cancelled, so we have to re-run everything. Add `fail-fast: false` to stop this behaviour. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-18 14:20:16 +01:00
stevenhorsman	aedbaa1545	versions: Bump golang to 1.23.10 Bump golang to fix CVEs GO-2025-3751 and GO-2025-3563 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-18 11:11:32 +01:00
stevenhorsman	b20f89b775	ci: required-tests: Remove test skip Remove the rule that causes gatekeeper to skip tests if we've only updated the required-tests.yaml list. Although an update to just the required-tests.yaml doesn't change the outcome of any of the CI tests, it does change whether gatekeeper will still pass with the new rules. Although it's a bit of a hit to run the CI, it's probably worth it to keep gatekeeper validated. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-18 10:52:03 +01:00
stevenhorsman	d68b09a4f0	ci: required-tests: cri-containerd rename Update the names of the required jobs based on the changes done in #11019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-18 10:52:03 +01:00
Steve Horsman	0aca20986b	Merge pull request #11400 from miz060/mitchzhu/add-govulncheck ci: Add optional govulncheck security scanning to static checks	2025-06-18 10:34:56 +01:00
Steve Horsman	d754e3939b	Merge pull request #11427 from BbolroC/bump-rootfs-confidential-s390x rootfs: Bump rootfs-{image,initrd} to 24.04	2025-06-18 09:06:58 +01:00
Mitch Zhu	292c27130d	ci: Add optional govulncheck security scanning to static checks This adds govulncheck vulnerability scanning as a non-blocking check in the static checks workflow. The check scans Go runtime binaries for known vulnerabilities while filtering out verified false positives. Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>	2025-06-17 20:43:00 -07:00
Alex Lyn	b61b20eef3	Merge pull request #11394 from mythi/tdx-kata-deploy-bump kata-deploy: accept 25.04 as supported distro for TDX	2025-06-18 08:52:46 +08:00
Hyounggyu Choi	4be261f248	rootfs: Bump rootfs-{image,initrd} to 24.04 Since #11197 was merged, all confidential k8s e2e tests for s390x have been failing with the following errors: ``` attestation-agent: error while loading shared libraries: libcurl.so.4: cannot open shared object file libnghttp2.so.14: cannot open shared object file ``` In line with the update on x86_64, we need to upgrade the OS used in rootfs-{image,initrd} on s390x. This commit also bumps all 22.04 to 24.04 for all architectures. For s390x, this ensures the missing packages listed above are installed. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-06-17 22:03:26 +02:00
Steve Horsman	fd93e83a4f	Merge pull request #11019 from seungukshin/cri-containerd-tests-for-arm64 Enable cri-containerd-tests for arm64	2025-06-17 11:53:49 +01:00
Fupan Li	15b24b5be1	Merge pull request #10698 from Apokleos/kata-volume-rs runtime-rs: Support Pull Image in Guest with Kata Volume for CoCo	2025-06-17 15:00:02 +08:00
Lei Liu	71d1cdf40a	test: fix broken testing code in libs After commit `a3f973db3b` was merged, protection::GuestProtection::[Snp,Sev] became tuple variants and can no longer be used in the assert_eq macro without tuple values; otherwise errors like the following are raised: ``` assert_eq!(actual.unwrap(), GuestProtection::Snp); \| ^^^^^^^^^^^^^^^^^^^^ expected \ `GuestProtection`, found enum constructor ``` Signed-off-by: Lei Liu <liulei.pt@bytedance.com>	2025-06-17 12:38:39 +08:00
Steve Horsman	a00f39e272	Merge pull request #11419 from katexochen/p/gitignore-direnv gitignore: ignore direnv	2025-06-16 17:26:10 +01:00
Seunguk Shin	4f9b7e4d4f	ci: Enable cri-containerd-tests for arm64 This change enables cri-containerd-test for arm64. Signed-off-by: Seunguk Shin <seunguk.shin@arm.com> Reviewed-by: Nick Connolly <nick.connolly@arm.com>	2025-06-16 15:12:17 +01:00
Paul Meyer	822f54c800	ci/static-checks: add dispatch trigger This simplifies executing the workflow on a fork during testing. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-16 16:12:10 +02:00
Seunguk Shin	203e3af94b	ci: Disable run-containerd-sandboxapi containerd-sandboxapi fails with `containerd v2.0.x` and passes with `containerd v1.7.x` regardless of kata-containers. It was also not tested with `containerd v2.0.x` because `containerd v2.0.x` could not recognize `[plugins.cri.containerd]` in `config.toml`. Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>	2025-06-16 15:02:07 +01:00
Mikko Ylinen	825b1cd233	kata-deploy: accept 25.04 as supported distro for TDX the latest Canonical TDX release supports 25.04 / Plucky as well. Users experimenting with the latest goodies in the 25.04 TDX enablement won't get Kata deployed properly. This change accepts 25.04 as supported distro for TDX. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-06-16 13:42:08 +01:00
Xuewei Niu	9b4518f742	Merge pull request #11359 from pawelbeza/fix-logs-on-virtiofs-shutdown Fix logging on virtiofs shutdown	2025-06-16 17:06:29 +08:00
Paul Meyer	b629b11ba0	gitignore: ignore direnv This allows contributors to set up direnv without having it detected by git. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-16 11:02:00 +02:00
Steve Horsman	64c95cb996	Merge pull request #11389 from kata-containers/checkout-persist-credentials-false workflows: Set persist-credentials: false on checkout	2025-06-16 09:58:22 +01:00
alex.lyn	cebb259e51	runtime-rs: Introduce force guest pulling image Container image integrity protection is a critical practice involving a multi-layered defense mechanism. While container images inherently offer basic integrity verification through Content-Addressable Storage (CAS) (ensuring pulled content matches stored hashes), a combination of other measures is crucial for production environments. These layers include: Encrypted Transport (HTTPS/TLS) to prevent tampering during transfer; Image Signing to confirm the image originates from a trusted source; Vulnerability Scanning to ensure the image content is "healthy"; and Trusted Registries with stringent access controls. In certain scenarios, such as when container image confidentiality requirements are not stringent, and integrity is already ensured via the aforementioned mechanisms (especially CAS and HTTPS/TLS), adopting "force guest pull" can be a viable option. This implies that even when pulling images from a container registry, their integrity remains guaranteed through content hashes and other built-in mechanisms, without relying on additional host-side verification or specialized transfer methods. Since this feature is already available in runtime-go and offers synergistic benefits with guest pull, we have chosen to support force guest pull. Fixes #10690 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-16 16:49:17 +08:00
alex.lyn	2157075140	kata-types: Introduce a helper method to adjust rootfs mounts This commit introduces the `adjust_rootfs_mounts` function to manage root filesystem mounts for guest-pull scenarios. When the force guest-pull mechanism is active, this function ensures that the rootfs is exclusively configured via a dedicated `KataVirtualVolume`. It disregards any provided input mounts, instead generating a single, default `KataVirtualVolume`. This volume is then base64-encoded and set as the sole mount option for a new, singular `Mount` entry, which is returned as the only item in the `Vec<Mount>`. This change guarantees consistent and exclusive rootfs configuration when utilizing guest-pull for container images. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-16 16:49:17 +08:00
alex.lyn	c9ffbaf30d	runtime-rs: Support handling Kata Virtual Volume in handle_rootfs In CoCo scenarios, images are not pulled on the host side and such operations are disabled; that is, no files are shared between host and guest, in particular for the container rootfs. We introduce Kata Virtual Volume to help handle such cases: (1) Introduce is_kata_virtual_volume to check that the volume is a Kata virtual volume. (2) Introduce VirtualVolume handling logic in handle_rootfs for when the mount is a Kata virtual volume. Fixes #10690 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-16 16:49:17 +08:00
alex.lyn	2600fc6f43	runtime-rs: Add Spec annotation to help pass image information We need to get the relevant image ref from the OCI runtime Spec, specifically from its annotations. Fixes #10690 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-16 16:49:17 +08:00
alex.lyn	d4e9369d3d	runtime-rs: Implement guest-pull rootfs via virtual volumes This commit introduces comprehensive support for rootfs mount management through Kata Virtual Volumes, specifically enabling the guest-pull mechanism. It enhances the runtime's ability to: (1) Extract image references from container annotations (CRI/CRI-O). (2) Process `KataVirtualVolume` objects, configuring them for guest-pull operations. (3) Set up the agent's storage for guest-pulled images. This functionality streamlines the process of pulling container images directly within the guest for rootfs, aligning with guest-side image management strategies. Fixes #10690 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-16 16:49:17 +08:00
Alex Lyn	a966d1be50	Merge pull request #11197 from Xynnn007/move-image-pull Move image pull abilities to CDH	2025-06-16 16:43:59 +08:00
Xynnn007	e0b4cd2dba	initrd/image: update x86_64 base to ubuntu 24.04 The Multistrap issue has been fixed in noble thus we can use the LTS. Also, this will fix the error reported by CDH ``` /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found ``` Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:54:15 +08:00
Xynnn007	0b3a8c0355	initdata: delete coco_as token section in initdata The new version of AA allows the config to omit the coco_as token section. If it is not provided, it is treated as None. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:54:15 +08:00
Xynnn007	5bab460224	chore(deps): update guest-components This patch updates guest-components to a new version with better error logging for CDH. It also allows the AA config to omit the coco_as token section. The new version of CDH needs to build aws-lc-sys, so cmake must be installed for the build. See https://github.com/kata-containers/kata-containers/actions/runs/15327923347/job/43127108813?pr=11197#step:6:1609 for details. In addition, the new version of guest-components has some fixes for the SNP stack, which require updates on the trustee side. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:54:15 +08:00
Xynnn007	aae64fa3d6	agent: add agent.image_pull_timeout parameter This new parameter for kata-agent is used to control the timeout for a guest pull request. Note that sometimes an image can be really big, so we set default timeout to 1200 seconds (20 minutes). Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:54:15 +08:00
Xynnn007	93826ff90c	tests: update negative test log assertions After moving image pulling from kata-agent to CDH, the error messages for failed image pulls have changed slightly. This commit updates the assertions accordingly. Note that in both the original and the current image-rs implementation, a missing key and a wrong key result in the same error message. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:54:15 +08:00
Xynnn007	7420194ea8	build: abandon PULL_TYPE build env Now kata-agent by default supports both guest pull and host pull abilities, thus we do not need to specify the PULL_TYPE env when building kata-agent. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 13:53:55 +08:00
Xynnn007	44a6d1a6f7	docs: update guest pull document After moving guest pull abilities to CDH, the guest pull document should be updated to reflect the new workflow. Also, replace the PNG diagram with a mermaid one for better maintainability. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	105cb47991	agent: always try to override oci process spec In previous versions, the OCI process override was attempted only when the `guest-pull` feature was enabled at build time, the storage had a guest pull volume, and the container was a sandbox. After getting rid of the feature, whether guest pull is used is determined at runtime, so we can always attempt the override by checking whether there is a kata guest pull volume in storages and the container is a sandbox. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	6b1249186f	agent: embed ocicrypt config in rootfs by default The ocicrypt configuration used by CDH is now always the same, and it's not good practice for kata-agent to write it into the rootfs at runtime. Thus we now move it to the coco-guest-components build script. The config will be embedded into the guest image/initrd together with the CDH binary. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	22e65024ce	agent: get rid of pull-type option The features `guest-pull` and `default-pull` are both removed, because both guest pull and host pull are now supported at build time without pulling in extra dependencies such as image-rs, which was previously required. Guest pull will depend on the CDH process, not on a build-time feature. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	0e15b49369	agent: get rid of init_image_service We no longer need to initialize the image service in kata-agent, as it's initialized in CDH. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	22c50cae7b	agent: let image_pull_handler call cdh to pull image This is a higher-level call to pull an image inside the guest. It should now call confidential_data_hub's API. The previous pull_image API (1) checked whether the container is a sandbox and (2) generated the bundle_path as part of its logic; the new API does neither, to keep its semantics clean, so we do both explicitly before calling it. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	39cd430994	agent: add ocicrypt_config envs for CDH process Now that image pull ability has moved to CDH, the CDH process needs the ocicrypt environment variables to help find the keyprovider (cdh) to decrypt images. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	f67f5c2b69	agent: remove image pull configs As image pull ability has moved to CDH, kata-agent does not need the image pulling configurations anymore. Reading these configurations from the kernel cmdline is now implemented by CDH. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:13:20 +08:00
Xynnn007	4436fe6d99	agent: move guest pull abilities to Confidential Data Hub Image pull abilities are all moved to the separate component Confidential Data Hub (CDH); only the auxiliary functions other than pull_image are left in confidential_data_hub/image.rs. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:10:09 +08:00
Xynnn007	5067aafd56	agent: move cdh.rs and image.rs to a separate module confidential_data_hub This is a small refactoring commit that moves the modules `cdh.rs` and `image.rs` to a directory module `confidential_data_hub`. The image pull ability will be moved into the confidential data hub, so it is better to handle image pulling in the confidential data hub submodule. This commit also makes some changes to the original code. It gets rid of a static variable for the CDH timeout config and directly uses the global config variable's member. It also changes the `is_cdh_client_initialized` function to a sync version, as it does not need to be async. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:10:09 +08:00
Xynnn007	997a1f35ab	agent: add PullImage to CDH proto file CDH provides the image pull api. This commit adds the declaration of the API in the CDH proto file. This will be used in following commits. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-16 11:10:09 +08:00
Xuewei Niu	c27116fa8e	Merge pull request #11416 from lifupan/prealloc runtime-rs: add the memory prealloc support for qemu/ch	2025-06-15 11:01:05 +08:00
Xuewei Niu	b43a61e2c8	Merge pull request #11418 from microsoft/saulparedes/flag_secure_mount agent: add feature flag to secure_mount method	2025-06-15 10:59:20 +08:00
Saul Paredes	cdfc9fd2d9	agent: add feature flag to secure_mount method This method is not used when guest-pull is not used. Add a flag that prevents a compile error when building with rust version > 1.84.0 and not using guest-pull Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-06-13 11:25:58 -07:00
Fabiano Fidêncio	6f0ea595b7	Merge pull request #11402 from microsoft/danmihai1/disable-nvdimm runtime: build variable for disable_image_nvdimm=true	2025-06-13 16:35:57 +02:00
Dan Mihai	0f8e453518	Merge pull request #11412 from katexochen/rego-v1 genpolicy: fix rules syntax issues, rego v1 compatibility; ci: checks for rego parsing	2025-06-13 07:30:34 -07:00
Paweł Bęza	91db41227f	runtime: Fix logging on virtiofs shutdown Fixes a confusing log message shown when Virtio-FS is disabled. Previously we logged “The virtiofsd had stopped” regardless of whether Virtio-FS was actually enabled or not. Signed-off-by: Paweł Bęza <pawel.beza99@gmail.com>	2025-06-13 15:59:52 +02:00
Fupan Li	5163156676	runtime-rs: add the memory prealloc support for cloud-hypervisor Add the memory prealloc support for cloud hypervisor too. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-13 16:26:11 +08:00
Fupan Li	fb7cfcd2fb	runtime-rs: add the memory prealloc support for qemu Add memory prealloc support for the qemu hypervisor. When it is enabled, all of the memory is allocated and locked. This is useful when you want to reserve all the memory upfront or in cases where you want memory latencies to be very predictable. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-06-13 16:26:03 +08:00
Steve Horsman	707b8b8a98	Merge pull request #11374 from kata-containers/dependabot/cargo/src/dragonball/tracing-1900da1d01 build(deps): bump the tracing group across 7 directories with 1 update	2025-06-13 08:30:37 +01:00
dependabot[bot]	1e6962e4a8	build(deps): bump the tracing group across 7 directories with 1 update Bumps the tracing group with 1 update in the /src/dragonball directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/libs directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/tools/agent-ctl directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/tools/genpolicy directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/tools/kata-ctl directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/tools/runk directory: [tracing](https://github.com/tokio-rs/tracing). Bumps the tracing group with 1 update in the /src/tools/trace-forwarder directory: [tracing](https://github.com/tokio-rs/tracing). Updates `tracing` from 0.1.37 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.34 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.37 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.37 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.40 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.40 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) Updates `tracing` from 0.1.29 to 0.1.41 - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-0.1.37...tracing-0.1.41) --- updated-dependencies: - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: indirect update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing - dependency-name: tracing dependency-version: 0.1.41 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: tracing ... Signed-off-by: dependabot[bot] <support@github.com>	2025-06-12 15:45:35 +00:00
Steve Horsman	6bdc0cf495	Merge pull request #11417 from kata-containers/sprt/revert-validate-ok-to-test Revert "ci: gha: Remove ok-to-test label on every push"	2025-06-12 15:04:44 +01:00
Aurélien Bombo	5200034642	Revert "ci: gha: Remove ok-to-test label on every push" This reverts commit `2ee3470627`. This is mostly redundant given we already have workflow approval for external contributors. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-12 08:40:06 -05:00
Paul Meyer	64906e6973	tests/static-checks: parse rego with opa and regorus Ensure rego policies in tree can be parsed using opa and regorus. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-12 14:59:39 +02:00
Paul Meyer	107e7dfdf6	ci/static-checks: install regorus Make regorus available for static checks as prerequisite for rego checks. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-12 14:59:39 +02:00
Steve Horsman	843655c352	Merge pull request #11411 from stevenhorsman/runk-users-crate-switch runk: Switch users crate	2025-06-12 10:35:31 +01:00
Paul Meyer	71796f7b12	ci/static-checks: install opa Make open-policy-agent available for static checks as prerequisite for rego checks. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-12 10:46:43 +02:00
Paul Meyer	5baea34fff	genpolicy/rules: rego v1 compatibility Migrate policy to rego v1. See https://www.openpolicyagent.org/docs/v0-upgrade#changes-to-rego-in-opa-v10 Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-12 10:46:43 +02:00
Fupan Li	7c1f8c9009	Merge pull request #10697 from Apokleos/no-sharefs runtime-rs: Support shared_fs = "none" for CoCo	2025-06-12 11:48:00 +08:00
Fupan Li	a495dec9f4	Merge pull request #11305 from RuoqingHe/bump-rust-1.85.1 versions: Bump Rust from 1.80.0 to 1.85.1	2025-06-12 10:21:38 +08:00
Ruoqing He	26c7f941aa	versions: Bump rust to 1.85.1 As discussed in the 2025-05-22 AC call, bump the rust toolchain to 1.85.1. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	5011253818	agent-ctl: Bump `ttrpc-codegen` related dependencies Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen` bump in `libs/protocol`. Relates: #11376 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	ba75b3299f	dragonball: Fix clippy `elided_named_lifetimes` Manually fix `elided_named_lifetimes` clippy warning reported by rust 1.85.1. ```console error: elided lifetime has a name --> src/vm/aarch64.rs:113:10 \| 107 \| fn get_fdt_vm_info<'a>( \| -- lifetime `'a` declared here ... 113 \| ) -> FdtVmInfo { \| ^^^^^^^^^ this elided lifetime gets resolved as `'a` \| = note: `-D elided-named-lifetimes` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(elided_named_lifetimes)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	1bbedb8def	dragonball: Fix clippy `repr_packed_without_abi` Fix `repr_packed_without_abi` clippy warning as suggested by rust 1.85.1. ```console error: item uses `packed` representation without ABI-qualification --> dbs_pci/src/msi.rs:468:1 \| 466 \| #[repr(packed)] \| ------ `packed` representation set here 467 \| #[derive(Clone, Copy, Default, PartialEq)] 468 \| / pub struct MsiState { 469 \| \| msg_ctl: u16, 470 \| \| msg_addr_lo: u32, 471 \| \| msg_addr_hi: u32, 472 \| \| msg_data: u16, 473 \| \| mask_bits: u32, 474 \| \| } \| \|_^ \| = warning: unqualified `#[repr(packed)]` defaults to `#[repr(Rust, packed)]`, which has no stable ABI = help: qualify the desired ABI explicity via `#[repr(C, packed)]` or `#[repr(Rust, packed)]` = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#repr_packed_without_abi = note: `-D clippy::repr-packed-without-abi` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::repr_packed_without_abi)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	e8be3c13fb	dragonball: Fix clippy `missing_docs` Fix `missing_docs` clippy warning as suggested by rust 1.85.1. ```console error: missing documentation for an associated function --> src/device_manager/mod.rs:1299:9 \| 1299 \| pub fn new_test_mgr() -> Self { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: `-D missing-docs` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(missing_docs)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	ceff1ed98d	dragonball: Fix clippy `needless_lifetimes` Fix `needless_lifetimes` clippy warning as suggested by rust 1.85.1. ```console error: the following explicit lifetimes could be elided: 'a --> dbs_virtio_devices/src/vhost/vhost_user/connection.rs:137:6 \| 137 \| impl<'a, AS: GuestAddressSpace, Q: QueueT, R: GuestMemoryRegion> EndpointParam<'a, AS, Q, R> { \| ^^ ^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_lifetimes = note: `-D clippy::needless-lifetimes` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::needless_lifetimes)]` help: elide the lifetimes \| 137 - impl<'a, AS: GuestAddressSpace, Q: QueueT, R: GuestMemoryRegion> EndpointParam<'a, AS, Q, R> { 137 + impl<AS: GuestAddressSpace, Q: QueueT, R: GuestMemoryRegion> EndpointParam<'_, AS, Q, R> { \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	c04f1048d5	dragonball: Fix clippy `unnecessary_lazy_evaluations` Fix `unnecessary_lazy_evaluations` clippy warning as suggested by rust 1.85.1. ```console error: unnecessary closure used to substitute value for `Option::None` --> dbs_virtio_devices/src/vhost/vhost_user/block.rs:225:28 \| 225 \| let vhost_socket = config_path \| ____________________________^ 226 \| \| .strip_prefix("spdk://") 227 \| \| .ok_or_else(\|\| VirtIoError::InvalidInput)? \| \|_____________________________________________________^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_lazy_evaluations = note: `-D clippy::unnecessary-lazy-evaluations` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::unnecessary_lazy_evaluations)]` help: use `ok_or` instead \| 227 \| .ok_or(VirtIoError::InvalidInput)? \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	16b45462a1	dragonball: Fix clippy `manual_inspect` Manually fix `manual_inspect` clippy warning reported by rust 1.85.1. ```console error: using `map_err` over `inspect_err` --> dbs_virtio_devices/src/net.rs:753:52 \| 753 \| self.device_info.read_config(offset, data).map_err(\|e\| { \| ^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_inspect = note: `-D clippy::manual-inspect` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_inspect)]` help: try \| 753 ~ self.device_info.read_config(offset, data).inspect_err(\|e\| { 754 ~ self.metrics.cfg_fails.inc(); \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	5e80293bfc	dragonball: Fix clippy `empty_line_after_doc_comments` Fix `empty_line_after_doc_comments` clippy warning as suggested by rust 1.85.1. ```console error: empty line after doc comment --> dbs_boot/src/x86_64/layout.rs:11:1 \| 11 \| / /// Magic addresses externally used to lay out x86_64 VMs. 12 \| \| \| \|_^ 13 \| /// Global Descriptor Table Offset 14 \| pub const BOOT_GDT_OFFSET: u64 = 0x500; \| ------------------------------ the comment documents this constant \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments = note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]` = help: if the empty line is unintentional remove it help: if the documentation should include the empty line include it in the comment \| 12 \| /// \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	bb13b6696e	dragonball: Fix clippy `manual_div_ceil` Fix `manual_div_ceil` clippy warning as suggested by rust 1.85.1. ```console error: manually reimplementing `div_ceil` --> dbs_interrupt/src/kvm/mod.rs:202:24 \| 202 \| let elem_cnt = (total_sz + elem_sz - 1) / elem_sz; \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using `.div_ceil()`: `total_sz.div_ceil(elem_sz)` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_div_ceil = note: `-D clippy::manual-div-ceil` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_div_ceil)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	e58bd52dd8	dragonball: Fix clippy `precedence` Fix `precedence` clippy warning as suggested by rust 1.85.1. ```console error: operator precedence can trip the unwary --> dbs_interrupt/src/kvm/mod.rs:169:6 \| 169 \| (u64::from(type1) << 48 \| u64::from(entry.type_) << 32) \| u64::from(entry.gsi) \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider parenthesizing your expression: `(u64::from(type1) << 48) \| (u64::from(entry.type_) << 32)` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence = note: `-D clippy::precedence` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::precedence)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	44142b13d3	genpolicy: Fix clippy `unstable_name_collisions` Manually fix `unstable_name_collisions` clippy warning reported by rust 1.85.1. ```console error: a method with this name may be added to the standard library in the future --> src/registry.rs:646:10 \| 646 \| file.unlock()?; \| ^^^^^^ \| = warning: once this associated item is added to the standard library, the ambiguity may cause an error or change in behavior! = note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919> = help: call with fully qualified syntax `fs2::FileExt::unlock(...)` to keep using the current method = note: `-D unstable-name-collisions` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unstable_name_collisions)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	366d293141	genpolicy: Fix clippy `manual_unwrap_or_default` Manually fix `manual_unwrap_or_default` clippy warning reported by rust 1.85.1. ```console error: if let can be simplified with `.unwrap_or_default()` --> src/registry.rs:619:37 \| 619 \| let mut data: Vec<ImageLayer> = if let Ok(vec) = serde_json::from_reader(read_file) { \| _____________________________________^ 620 \| \| vec 621 \| \| } else { ... \| 624 \| \| }; \| \|_____^ help: replace it with: `serde_json::from_reader(read_file).unwrap_or_default()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_unwrap_or_default = note: `-D clippy::manual-unwrap-or-default` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_unwrap_or_default)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	a71a77bfa3	genpolicy: Fix clippy `manual_div_ceil` Manually fix `manual_div_ceil` clippy warning reported by rust 1.85.1. ```console error: manually reimplementing `div_ceil` --> src/verity.rs:73:25 \| 73 \| let count = (data_size + entry_size - 1) / entry_size; \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using `.div_ceil()`: `data_size.div_ceil(entry_size)` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_div_ceil = note: `-D clippy::manual-div-ceil` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_div_ceil)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	5d491bd4f4	genpolicy: Bump `ttrpc-codegen` related dependencies Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen` bump in `libs/protocol`. Relates: #11376 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	965f1d799c	kata-ctl: Fix clippy `empty_line_after_outer_attr` Manually fix `empty_line_after_outer_attr` clippy warning reported by rust 1.85.1. ```console error: empty line after outer attribute --> src/check.rs:515:9 \| 515 \| / #[allow(dead_code)] 516 \| \| \| \|_^ 517 \| struct TestData<'a> { \| ------------------- the attribute applies to this struct \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_outer_attr = note: `-D clippy::empty-line-after-outer-attr` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::empty_line_after_outer_attr)]` = help: if the empty line is unintentional remove it ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	3d64b11454	kata-ctl: Fix clippy `question_mark` Manually fix `question_mark` clippy warning reported by rust 1.85.1. ```console error: this `match` expression can be replaced with `?` --> src/ops/check_ops.rs:49:13 \| 49 \| let f = match get_builtin_check_func(check) { \| _____________^ 50 \| \| Ok(fp) => fp, 51 \| \| Err(e) => return Err(e), 52 \| \| }; \| \|_____^ help: try instead: `get_builtin_check_func(check)?` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#question_mark = note: `-D clippy::question-mark` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::question_mark)]` ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	702ba4033e	kata-ctl: Bump `ttrpc-codegen` related dependencies Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen` bump in `libs/protocol`. Relates: #11376 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	f70c17660a	runtime-rs: Fix clippy `unnecessary_map_or` Fix `unnecessary_map_or` clippy warning as suggested by rust 1.85.1. error: this `map_or` can be simplified --> crates/hypervisor/src/ch/inner_hypervisor.rs:1054:24 \| 1054 \| let have_tdx = fs::read(TDX_KVM_PARAMETER_PATH) \| ________________________^ 1055 \| \| .map_or(false, \|content\| !content.is_empty() && content[0] == b'Y'); \| \|_______________________________________________________________________________^ help: use is_ok_and instead: `fs::read(TDX_KVM_PARAMETER_PATH).is_ok_and(\|content\| !content.is_empty() && content[0] == b'Y')` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_map_or = note: `-D clippy::unnecessary-map-or` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::unnecessary_map_or)]` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	d7dfab92be	runtime-rs: Fix clippy `manual_inspect` Manually fix `manual_inspect` clippy warning reported by rust 1.85.1. ```console error: using `map` over `inspect` --> crates/resource/src/cdi_devices/container_device.rs:50:10 \| 50 \| .map(\|device\| { \| ^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_inspect = note: `-D clippy::manual-inspect` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::manual_inspect)]` help: try \| 50 ~ .inspect(\|device\| { 51 \| // push every device's Device to agent_devices 52 ~ devices_agent.push(device.device.clone()); \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	4c467f57de	runtime-rs: Fix clippy `needless_return` Fix `needless_return` clippy warning as suggested by rust 1.85.1. ```console error: unneeded `return` statement --> crates/resource/src/rootfs/nydus_rootfs.rs:199:5 \| 199 \| return Some(prefetch_list_path.display().to_string()); \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return = note: `-D clippy::needless-return` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::needless_return)]` help: remove `return` \| 199 - return Some(prefetch_list_path.display().to_string()); 199 + Some(prefetch_list_path.display().to_string()) \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	23365fc7e2	runtime-rs: Bump `ttrpc-codegen` related dependencies Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen` bump in `libs/protocol`. Relates: #11376 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Ruoqing He	bd4d9cf67c	agent: Fix clippy `empty_line_after_doc_comments` Manually fix `empty_line_after_doc_comments` clippy warning reported by rust 1.85.1. ```console error: empty line after doc comment --> src/linux_abi.rs:8:1 \| 8 \| / /// Linux ABI related constants. 9 \| \| \| \|_^ 10 \| #[cfg(target_arch = "aarch64")] 11 \| use std::fs; \| ------- the comment documents this import \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments = note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]` = help: if the empty line is unintentional remove it ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 13:50:10 +00:00
Paul Meyer	d488c998c7	genpolicy/rules: fix syntax issue Policy wasn't parsable with OPA due to surplus whitespace. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-06-11 14:48:36 +02:00
Steve Horsman	c8fcda0d73	Merge pull request #11407 from Champ-Goblem/fix/nvidia-rootfs-only-copy-opa-when-agent-policy-enabled nvidia-rootfs: only copy `kata-opa` if `AGENT_POLICY` is enabled	2025-06-11 13:39:07 +01:00
stevenhorsman	39f51b4c6d	runk: Switch users crate The users@0.11.0 has a high severity CVE-2025-5791 and doesn't seem to be maintained, so switch to uzers which forked it. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-11 12:03:28 +01:00
Champ-Goblem	d6c45027f5	nvidia-rootfs: only copy `kata-opa` if `AGENT_POLICY` is enabled In the nvidia rootfs build, only copy in `kata-opa` if `AGENT_POLICY` is enabled. This fixes builds when `AGENT_POLICY` is disabled and opa is not built. Signed-off-by: Champ-Goblem <cameron@northflank.com>	2025-06-11 11:25:10 +02:00
Ruoqing He	2ccb306c0b	agent: Fix clippy `precedence` Fix `precedence` clippy warning as suggested by rust 1.85.1. ```console warning: operator precedence can trip the unwary --> src/pci.rs:54:19 \| 54 \| Ok(SlotFn(ss8 << FUNCTION_BITS \| f8)) \| ^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider parenthesizing your expression: `(ss8 << FUNCTION_BITS) \| f8` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#precedence ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	048178bc5e	agent: Fix clippy `unnecessary_get_then_check` Manually fix `unnecessary_get_then_check` clippy warning as suggested by rust 1.85.1. ```console warning: unnecessary use of `get(&shared_mount.src_ctr).is_none()` --> src/sandbox.rs:431:25 \| 431 \| if src_ctrs.get(&shared_mount.src_ctr).is_none() { \| ---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \| \| \| help: replace it with: `!src_ctrs.contains_key(&shared_mount.src_ctr)` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_get_then_check ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	54ec432178	agent: Fix clippy `partialeq_to_none` Fix `partialeq_to_none` clippy warning as suggested by rust 1.85.1. ```console warning: binary comparison to literal `Option::None` --> src/sandbox.rs:431:16 \| 431 \| if src_ctrs.get(&shared_mount.src_ctr) == None { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `Option::is_none()` instead: `src_ctrs.get(&shared_mount.src_ctr).is_none()` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#partialeq_to_none ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	95dca31ecc	agent: Fix clippy `question_mark` Fix `question_mark` clippy warning as suggested by rust 1.85.1. ```console warning: this `match` expression can be replaced with `?` --> rustjail/src/cgroups/fs/mod.rs:1327:20 \| 1327 \| let dev_type = match DeviceType::from_char(d.typ().as_str().chars().next()) { \| ____________________^ 1328 \| \| Some(t) => t, 1329 \| \| None => return None, 1330 \| \| }; \| \|_____^ help: try instead: `DeviceType::from_char(d.typ().as_str().chars().next())?` \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#question_mark ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	5a95a65604	agent: Fix clippy `unnecessary_map_or` Fix `unnecessary_map_or` clippy warning as suggested by rust 1.85.1. ```console warning: this `map_or` can be simplified --> rustjail/src/container.rs:1424:20 \| 1424 \| if namespace \| ____________________^ 1425 \| \| .path() 1426 \| \| .as_ref() 1427 \| \| .map_or(true, \|p\| p.as_os_str().is_empty()) \| \|_______________________________________________________________^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_map_or help: use is_none_or instead \| 1424 ~ if namespace 1425 + .path() 1426 + .as_ref().is_none_or(\|p\| p.as_os_str().is_empty()) \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	f9c76edd23	agent: Fix clippy `manual_inspect` Manually fix `manual_inspect` clippy warning reported by rust 1.85.1. ```console warning: using `map_err` over `inspect_err` --> rustjail/src/mount.rs:881:6 \| 881 \| .map_err(\|e\| { \| ^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_inspect help: try \| 881 ~ .inspect_err(\|&e\| { 882 ~ log_child!(cfd_log, "mount error: {:?}", e); \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Ruoqing He	7ff34f00c2	agent: Fix clippy `single_match` Fix `single_match` clippy warning as suggested by rust 1.85.1. ```console warning: you seem to be trying to use `match` for destructuring a single pattern. Consider using `if let` --> src/image.rs:241:9 \| 241 \| / match oci.annotations() { 242 \| \| Some(a) => { 243 \| \| if ImageService::is_sandbox(a) { 244 \| \| return ImageService::get_pause_image_process(); ... \| 247 \| \| None => {} 248 \| \| } \| \|_________^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#single_match help: try \| 241 ~ if let Some(a) = oci.annotations() { 242 + if ImageService::is_sandbox(a) { 243 + return ImageService::get_pause_image_process(); 244 + } 245 + } \| ``` Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-11 07:18:09 +00:00
Alex Lyn	e99070afb4	Merge pull request #11343 from Apokleos/cc-blk-sharefs Enables block device and disable virtio-fs	2025-06-11 11:52:52 +08:00
Alex Lyn	2d570db08b	Merge pull request #11179 from Apokleos/tdx-qemu-rs runtime-rs: Add TDX Support to runtime-rs for Confidential Containers (CoCo)	2025-06-11 10:27:36 +08:00
alex.lyn	2e9d27c500	runtime-rs: Enables block device and disable virtio-fs via capabilities Kata runtime employs a CapabilityBits mechanism for VMM capability governance. Fundamentally, this mechanism utilizes predefined feature flags to manage the VMM's operational boundaries. To meet demands for storage performance and security, it's necessary to explicitly enable capability flags such as `BlockDeviceSupport` (basic block device support) and `BlockDeviceHotplugSupport` (block device hotplug) which ensures the VMM provides the expected caps. In CoCo scenarios, due to the potential risks of sensitive data leaks or side-channel attacks introduced by virtio-fs through shared file systems, the `FsSharingSupport` flag must be forcibly disabled. This disables the virtio-fs feature at the capability set level, blocking insecure data channels. Fixes #11341 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-11 10:19:13 +08:00
alex.lyn	23340b6b5f	runtime-rs: Support cold plug of block devices via virtio-blk for Qemu Two key important scenarios: (1) Support `virtio-blk-pci` cold plug capability for confidential guests instead of nvdimm device in CVM due to security constraints in CoCo cases. (2) Push initdata payload into compressed raw block device and insert it in CVM through `virtio-blk-pci` cold plug mechanism. Fixes #11341 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-11 10:19:13 +08:00
RuoqingHe	7916db9613	Merge pull request #11345 from Apokleos/fix-noise protocols: Fix the noise caused by non-formatted codes in protocols	2025-06-11 09:50:02 +08:00
Aurélien Bombo	66ae9473cb	Merge pull request #11397 from kata-containers/sprt/validate-ok-to-test ci: gha: Remove ok-to-test label on every push	2025-06-10 16:42:54 -05:00
Aurélien Bombo	31288ea7fc	Merge pull request #11398 from kata-containers/sprt/undo-mariner-hotfix Revert "ci: Fix Mariner rootfs build failure"	2025-06-10 16:09:08 -05:00
Aurélien Bombo	f34010cc94	Merge pull request #11388 from kata-containers/sprt/azure-oidc ci: Use OIDC to log into Azure	2025-06-10 13:08:44 -05:00
Steve Horsman	6424055eeb	Merge pull request #11393 from stevenhorsman/bump-chrono-0.4.41 libs: Bump chrono package	2025-06-10 16:47:18 +01:00
stevenhorsman	99e70100c7	workflows: Set persist-credentials: false on checkout By default the checkout action leaves the credentials in the checked-out repo's `.git/config`, which means they could get exposed. Use persist-credentials: false to prevent this happening. Note: static-checks.yaml does use git diff after the checkout, but the git docs state that git diff is just local, so doesn't need authentication. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-10 10:33:41 +01:00
RuoqingHe	5b8f7b2e3c	Merge pull request #11391 from RuoqingHe/disable-runtime-rs-test-on-riscv runtime-rs: Skip test on RISC-V architecture	2025-06-10 17:28:12 +08:00
Xuewei Niu	ac6779428f	Merge pull request #11377 from justxuewei/hvsock-logging	2025-06-10 16:45:59 +08:00
alex.lyn	c8433c6b70	kata-sys-util: Update TDX platform detection for newer TDX platforms On newer TDX platforms, checking `/sys/firmware/tdx` for `major_version` and `minor_version` is no longer necessary. Instead, we only need to verify that `/sys/module/kvm_intel/parameters/tdx` is set to `'Y'`. This commit addresses the following: (1) Removes the outdated check and corrects related code, primarily impacting `cloud-hypervisor`. (2) Refines the TDX platform detection logic within `arch_guest_protection`. Fixes #11177 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
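The detection rule described above (the parameter file must read `Y`) can be sketched as follows. This is an illustration under stated assumptions, not the code in `kata-sys-util`: `tdx_enabled` is a hypothetical helper, and the path is passed in so the check can be exercised against a scratch file.

```rust
use std::fs;
use std::path::Path;

/// Returns true when the parameter file exists and its trimmed content is "Y".
/// A missing or unreadable file is treated as "TDX not available".
fn tdx_enabled(param_file: &Path) -> bool {
    fs::read_to_string(param_file)
        .map(|s| s.trim() == "Y")
        .unwrap_or(false)
}
```

On a real host the call would be `tdx_enabled(Path::new("/sys/module/kvm_intel/parameters/tdx"))`, the path named in the commit message.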
alex.lyn	8652aa7417	kata-types: Enable QGS port via configuration Currently, the TDX Quote Generation Service (QGS) connection in QEMU uses the default vsock port 4050 for TD attestation. To let users change the QGS port, this commit builds on the newly introduced qgs_port field and allows the port to be set via configuration. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	f8d1ee8b1c	kata-types: Introduce QGS port for TD attestation in Hypervisor config Currently, the TDX Quote Generation Service (QGS) connection in QEMU is hardcoded to vsock port 4050, which limits flexibility for TD attestation; users should be able to modify the QGS port. To address this inflexibility, this commit introduces a new qgs_port field within security info and makes it default to 4050. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	49ced4d43c	runtime-rs: Prepare Tdx protection device in start sandbox During preparation for the `start sandbox` phase, this commit ensures the correct `ProtectionDeviceConfig` is prepared based on the `GuestProtection` type in a TEE platform. Specifically, for the TDX platform, this commit sets the essential parameters within the ProtectionDeviceConfig, including the TDX ID, firmware path, and the default QGS port (4050). This information is then passed to the underlying VMM for further processing using the existing ResourceManager and DeviceManager infrastructure. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	bab77e2d65	runtime-rs: Introduce Tdx Protection Device and add it into cmdline This patch introduces TdxConfig with key fields such as firmware, qgs_port, and mrconfigid. With this config, a new ProtectionDeviceConfig type `Tdx(TdxConfig)` is added. With this new type supported, we finally add the tdx protection device into the cmdline to launch a TDX-based CVM. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	09fddac2c4	runtime-rs: Introduce 'tdx-guest' object and its builder for TDX CVMs This commit introduces the `tdx-guest` object, designed to facilitate the launch of CVMs leveraging Intel's TDX. Launching a TDX-based CVM requires various properties, including `quote-generation-socket`, `mrconfigid`, `sept-ve-disable`, etc. (1) The `quote-generation-socket` property is added to the `tdx-guest` object; it is of type `SocketAddress` and specifies the address of the Quote Generation Service (QGS). (2) The `mrconfigid` property, representing the SHA384 hash for non-owner-defined configurations of the guest TD, is introduced as a runtime or OS configuration parameter. (3) The `sept-ve-disable` property allows control over whether EPT violation conversions to #VE exceptions are disabled when the guest TD accesses PENDING pages. With the introduction of the `tdx-guest` object and its associated properties, launching TDX-based CVMs is now supported. For example, a TDX guest can be configured via the command line as follows: ```shell -object {"qom-type":"tdx-guest", "id":"tdx", "sept-ve-disable":true,\ "mrconfigid":"vHswGkzG4B3Kikg96sLQ5vPCYx4AtuB4Ubfzz9UOXvZtCGat8b8ok7Ubz4AxDDHh",\ "quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}} \ -machine q35,accel=kvm,confidential-guest-support=tdx ``` Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	1d4ffe6af3	runtime-rs: Implement serializable SocketAddress with Serde This enables consistent JSON representation of socket addresses across system components: (1) Add serde serialization/deserialization with standardized field naming convention. (2) Enforce string-based port/cid and unix/path representation for protocol compatibility. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:31:25 +08:00
alex.lyn	65931fb75f	protocols: Fix the noise caused by non-formatted codes in protocols ``` - decoded.strip_prefix("CAP_").unwrap_or(decoded) + decoded + .strip_prefix("CAP_") + .unwrap_or(decoded) .parse::<oci::Capability>() .unwrap_or_else(\|_\| panic!("Failed to parse {:?} to Enum Capability", cap)) }) @@ -1318,8 +1320,6 @@ mod tests { #[test] #[should_panic] fn test_cap_vec2hashset_bad() { - cap_vec2hashset(vec![ - "CAP_DOES_NOT_EXIST".to_string(), - ]); + cap_vec2hashset(vec!["CAP_DOES_NOT_EXIST".to_string()]); ``` Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:30:33 +08:00
alex.lyn	f3c8ef9200	kata-types: Support disabled sharefs with config of shared_fs = "none" For CoCo, shared_fs is prohibited as we cannot guarantee the security of guest/host sharing. Therefore, this PR enables administrators to configure shared_fs = "none" via the configuration.toml file, thereby enforcing the disablement of sharing. Fixes #10677 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-06-10 11:30:01 +08:00
Dan Mihai	d37feac679	tests: test mariner with disable_image_nvdimm=true Run the k8s tests on mariner with annotation disable_image_nvdimm=true, to use virtio-blk instead of nvdimm for the guest rootfs block device. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-10 02:03:31 +00:00
Dan Mihai	1aeef52bae	clh: runtime: add disable_image_nvdimm support Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to set disable_image_nvdimm=true in configuration-clh.toml. disable_image_nvdimm=false is the default config value. Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in configuration-clh.toml. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-10 02:00:52 +00:00
Dan Mihai	0dd9325264	qemu: runtime: build variable for disable_image_nvdimm=true Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to set disable_image_nvdimm=true in configuration-qemu*.toml. disable_image_nvdimm=false is the default configuration value. Note that the value of disable_image_nvdimm gets ignored for platforms using "confidential_guest = true". Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-10 01:57:42 +00:00
Dan Mihai	d51e0c9875	snp: gpu: comment out disable_image_nvdimm config Comment out "disable_image_nvdimm = true" in: - configuration-qemu-snp.toml - configuration-qemu-nvidia-gpu-snp.toml for consistency with the other configuration-qemu*.toml files. Those two platforms are using "confidential_guest = true", and therefore the value of disable_image_nvdimm gets ignored. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-10 01:44:51 +00:00
stevenhorsman	ac9d3eb7be	libs: Bump chrono package Bump chrono package to 0.4.41 and thereby remove the time 0.1.43 dependency and remediate CVE-2020-26235 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-09 21:01:27 +01:00
Aurélien Bombo	004c1a4595	Revert "ci: Fix Mariner rootfs build failure" This reverts commit `dfa25a42ff`. The original issue was fixed: https://github.com/microsoft/azurelinux/issues/13971#issuecomment-2956384627	2025-06-09 14:06:07 -05:00
Aurélien Bombo	2ee3470627	ci: gha: Remove ok-to-test label on every push This removes the ok-to-test label on every push, except if the PR author has write access to the repo (ie. permission to modify labels). This protects against attackers who would initially open a genuine PR, then push malicious code after the initial review. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-09 12:37:06 -05:00
Aurélien Bombo	9488ce822d	Merge pull request #11396 from kata-containers/sprt/fix-mariner-image ci: Fix Mariner rootfs build failure	2025-06-09 12:32:14 -05:00
Aurélien Bombo	dfa25a42ff	ci: Fix Mariner rootfs build failure This implements a workaround for microsoft/azurelinux#13971 to unblock the CI. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-09 10:56:10 -05:00
Alex Lyn	2979312f7b	Merge pull request #11381 from RuoqingHe/log-instead-of-format runtime-rs: Log error instead of format	2025-06-09 11:54:13 +08:00
Ruoqing He	e290587f9c	runtime-rs: Skip test on RISC-V architecture Full set test on RISC-V architecture is not yet supported, skip it for now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-09 01:49:47 +00:00
Ruoqing He	781510202a	runtime-rs: Log error instead of format Log on error condition when `umount` operation fail instead of `format!` error message. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-08 08:28:22 +00:00
Xuewei Niu	17b2daf0a7	Merge pull request #11357 from justxuewei/nxw/remove-dcode dragonball: Remove a useless dead_code attribute	2025-06-08 16:07:03 +08:00
Dan Mihai	e067a1be64	Merge pull request #11358 from burgerdev/gid-warning genpolicy: improvements to /etc/passwd checks	2025-06-06 17:04:27 -07:00
Aurélien Bombo	9dd3807467	ci: Use OIDC to log into Azure This completely eliminates the Azure secret from the repo, following the below guidance: https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-azure The federated identity is scoped to the `ci` environment, meaning: * I had to specify this environment in some YAMLs. I don't believe there's any downside to this. * As previously, the CI works seamlessly both from PRs and in the manual workflow. I also deleted the tools/packaging/kata-deploy/action folder as it doesn't seem to be used anymore, and it contains a reference to the secret. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-06-06 15:26:10 -05:00
Steve Horsman	31a8944da1	Merge pull request #11334 from kata-containers/remove-inherit-secrets workflows: Replace secrets: inherit	2025-06-06 16:41:13 +01:00
Steve Horsman	9555f2ce08	Merge pull request #11387 from burgerdev/riscv-artifact-name ci: fix artifact name of RISC-V tarball	2025-06-06 15:50:21 +01:00
stevenhorsman	66ef1c1198	workflows: Replace secrets: inherit Having secrets unconditionally being inherited is bad practice, so update the workflows to only pass through the minimal secrets that are needed Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-06 09:56:46 +01:00
stevenhorsman	89d038d2b4	workflows: Switch QUAY_DEPLOYER_USERNAME to var QUAY_DEPLOYER_USERNAME isn't sensitive, so update the secret for a var to simplify the workflows Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-06 09:49:14 +01:00
stevenhorsman	2eda21180a	workflows: Switch AUTHENTICATED_IMAGE_USER to var AUTHENTICATED_IMAGE_USER isn't sensitive, so update the secret for a var to simplify the workflows Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-06 09:49:14 +01:00
Markus Rudy	9ffed463a1	ci: fix artifact name of RISC-V tarball The artifact name accidentally referred to ARM64, which caused a clash in CI runs. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-06 08:29:48 +02:00
RuoqingHe	567296119d	Merge pull request #11317 from kimullaa/remove-obsolete-parameters runtime: remove hotplug_vfio_on_root_bus from config.toml	2025-06-06 04:03:03 +02:00
Steve Horsman	9ff650b641	Merge pull request #11383 from stevenhorsman/remove-docker-hub-publish Switch docker hub mirroring to ghcr.io	2025-06-05 17:16:18 +01:00
Shunsuke Kimura	5193cfedca	runtime: remove hotplug_vfio_on_root_bus from toml In this commit, the hotplug_vfio_on_root_bus parameter is removed. <`dd422ccb69`> The pcie_root_port parameter description (`This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"`) no longer has any value and is not completely valid, since virt and DB also support root ports, as does CLH, so it is removed as well. Fixes: #11316 Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com> Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-06-05 21:53:06 +09:00
Steve Horsman	0f8104a2df	Merge pull request #11376 from RuoqingHe/upgrade-ttrpc-0.5.0 Upgrade `ttrpc-codegen` and `protobuf` to kill `#![allow(box_pointers)]`	2025-06-05 13:02:13 +01:00
stevenhorsman	6c6e16eef3	workflows: Remove docker hub registry publishing As docker hub has rate limiting issues, mirror quay.io to ghcr.io instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-05 11:46:51 +01:00
Markus Rudy	1c240de58d	genpolicy: don't parse /etc/passwd in a loop Instead of looping over the users per group and parsing passwd for each user, we can do the reverse lookup uid->user up front and then compare the names directly. This has the nice side-effect of silencing warnings about non-existent users mentioned in /etc/group, which is not relevant for policy decisions. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-04 17:54:57 +02:00
Markus Rudy	a1baaf6fe2	genpolicy: ignore groups with same name as user containerd does not automatically add groups to the list of additional GIDs when the groups have the same name as the user: https://github.com/containerd/containerd/blob/f482992/pkg/oci/spec_opts.go#L852-L854 This is a bug and should be corrected, but it has been present since at least 1.6.0 and thus affects almost all containerd deployments in existence. Thus, we adopt the same behavior and ignore groups with the same name as the user when calculating additional GIDs. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-04 10:29:49 +02:00
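The rule adopted in the commit above (skip any group whose name equals the user name when collecting additional GIDs) can be illustrated with a small sketch. `additional_gids` is a hypothetical helper that works on `/etc/group`-style text; it is not the genpolicy implementation, which also handles the passwd lookup.

```rust
use std::collections::BTreeSet;

/// Collects additional GIDs for `user` from `/etc/group`-style text
/// (`name:password:gid:member1,member2`). Groups named exactly like the
/// user are skipped, matching the containerd behavior described above.
fn additional_gids(group_file: &str, user: &str) -> BTreeSet<u32> {
    let mut gids = BTreeSet::new();
    for line in group_file.lines() {
        let fields: Vec<&str> = line.split(':').collect();
        if fields.len() != 4 {
            continue; // malformed or empty line
        }
        let (name, gid, members) = (fields[0], fields[2], fields[3]);
        if name == user {
            continue; // same-name group is ignored
        }
        if members.split(',').any(|m| m == user) {
            if let Ok(gid) = gid.parse::<u32>() {
                gids.insert(gid);
            }
        }
    }
    gids
}
```

With a group file that lists `alice` in both an `alice` group and a `wheel` group, only the GID of `wheel` ends up in the result.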
Xuewei Niu	77ca2fe88b	runtime-rs: Reduce the number of duplicate log entries being printed When connecting to guest through vsock, a log is printed for each failure. The failure comes from two main reasons: (1) the guest is not ready or (2) some real errors happen. Printing logs for the first case leads to log clutter, and your logs will look like this: ``` Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/... Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/... Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/... Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/... Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/... ``` To avoid this, the sock implementations save the last error and return it after all retries are exhausted. Users are able to check all errors by setting the log level to trace. Reorganize the log format to "{sock type}: {message}" to make it clearer. Apart from that, errors returned by the socks use `self`, instead of `ConnectConfig`, since the `ConnectConfig` doesn't provide any useful information. Disable infinite loop for the log forwarder. There is retry logic in the sock implementations. We can consider the agent-log unavailable if `sock.connect()` encounters an error. Fixes: #10847 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-06-04 12:25:32 +08:00
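The "save the last error and return it after all retries are exhausted" pattern from the commit above might look roughly like this. `connect_with_retries` is a hypothetical generic helper, not the runtime-rs sock code; a real implementation would also wait between attempts and could emit each failure at trace level.

```rust
/// Calls `attempt` up to `retries` times. Intermediate failures are only
/// remembered, not logged; the last error is returned once all retries fail.
fn connect_with_retries<T, E>(
    retries: usize,
    mut attempt: impl FnMut(usize) -> Result<T, E>,
) -> Result<T, E> {
    assert!(retries > 0, "at least one attempt is required");
    let mut last_err = None;
    for i in 0..retries {
        match attempt(i) {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e), // remember, do not log
        }
    }
    Err(last_err.expect("loop ran at least once"))
}
```

The caller then sees a single error for the whole connection attempt instead of one log line per retry.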
Xuewei Niu	3f8dd821e6	dragonball: Remove a useless dead_code attribute The vhost-user-fs has been added to Dragonball, so we can remove `update_memory`'s dead_code attribute. Fixes: #8691 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2025-06-04 11:34:16 +08:00
Ruoqing He	77e68b164e	agent: Upgrade `ttrpc-codegen` to 0.5.0 Propagate `ttrpc-codegen` upgrade from `libs/protocols` to `agent`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-04 01:16:46 +00:00
Ryan Savino	1e686dbca7	agent: Remove casting and fix Arc declaration Removed unnecessary dynamic dispatch for services. Properly dereferenced service Box values and stored in Arc. Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn> Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn> Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-06-04 01:16:46 +00:00
Ruoqing He	0471f01074	libs: Bump `ttrpc-codegen` and `protobuf` Previous version of `ttrpc-codegen` is generating outdated `#![allow(box_pointers)]` which was deprecated. Bump `ttrpc-codegen` from v0.4.2 to v0.5.0 and `protobuf` from vx to v3.7.1 to get rid of this. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-06-04 01:16:18 +00:00
Markus Rudy	eeb3d1384b	genpolicy: compare additionalGIDs as sets The additional GIDs are handled by genpolicy as a BTreeSet. This set is then serialized to an ordered JSON array. On the containerd side, the GIDs are added to a list in the order they are discovered in /etc/group, and the main GID of the user is prepended to that list. This means that we don't have any guarantees that the input GIDs will be sorted. Since the order does not matter here, comparing the list of GIDs as sets is close enough. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-03 20:18:35 +02:00
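Comparing GID lists as sets, as the commit above describes, removes any dependence on discovery order. A minimal sketch, with `same_gids` as a hypothetical helper (the actual comparison lives in the policy rules, not in a Rust function like this one):

```rust
use std::collections::BTreeSet;

/// Compares two GID lists while ignoring order and duplicates.
fn same_gids(policy: &[u32], input: &[u32]) -> bool {
    let a: BTreeSet<u32> = policy.iter().copied().collect();
    let b: BTreeSet<u32> = input.iter().copied().collect();
    a == b
}
```

A sorted policy list such as `[10, 1000]` therefore matches an input of `[1000, 10]`, where the primary GID was prepended by the container runtime.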
Aurélien Bombo	8c3f8f8e21	Merge pull request #11339 from kata-containers/sprt/require-agent-ctl ci: Require agent-ctl tests	2025-06-03 11:58:33 -04:00
Steve Horsman	74e47382f8	Merge pull request #11016 from stevenhorsman/dependabot-configuration workflows: Add dependabot config	2025-06-03 15:12:32 +01:00
Steve Horsman	8176eefdac	Merge pull request #10748 from zvonkok/helm-doc doc: Add Helm Chart entry	2025-06-03 14:48:19 +01:00
Markus Rudy	02ad39ddf1	genpolicy: push down warning about missing passwd file The warning used to trigger even if the passwd file was not needed. This commit moves it down to where it actually matters. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-03 11:19:29 +02:00
Markus Rudy	ec969e4dcd	genpolicy: remove redundant group check https://github.com/kata-containers/kata-containers/pull/11077 established that the GID from the image config is never used for deriving the primary group of the container process. This commit removes the associated logic that derived a GID from a named group. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-06-03 10:59:10 +02:00
Zvonko Kaiser	985e965adb	doc: Added Helm Chart README.md We need more and accurate documentation. Let's start by providing a Helm Chart install doc and as a second step remove the kustomize steps. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Co-authored-by: Steve Horsman <steven@uk.ibm.com>	2025-06-02 23:26:16 +00:00
Dan Mihai	dc0da567cd	Merge pull request #11340 from microsoft/danmihai1/image-size-alignment image: custom guest rootfs image file size alignment	2025-06-02 14:33:21 -07:00
Dan Mihai	c2c194d860	kata-deploy: smaller guest image file for mariner Align up the mariner Guest image file size to 2M instead of the default 128M alignment. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-02 16:15:17 +00:00
Dan Mihai	65385a5bf9	image: custom guest rootfs image file size alignment The Guest rootfs image file size is aligned up to 128M boundary, since commit `2b0d5b2`. This change allows users to use a custom alignment value - e.g., to align up to 2M, users will be able to specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-06-02 16:15:17 +00:00
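The effect of the alignment value can be shown with a little arithmetic. The sketch below assumes that "MB" means MiB (1024 * 1024 bytes) and uses a hypothetical `align_up` helper; `image_builder.sh` itself is a shell script and may compute the size differently.

```rust
const MIB: u64 = 1024 * 1024;

/// Rounds `size_bytes` up to the next multiple of `alignment_mb` MiB.
fn align_up(size_bytes: u64, alignment_mb: u64) -> u64 {
    assert!(alignment_mb > 0, "alignment must be non-zero");
    let align = alignment_mb * MIB;
    (size_bytes + align - 1) / align * align
}
```

Under these assumptions a 300 MiB rootfs grows to 384 MiB with the default 128M alignment, but stays at 300 MiB with a 2M alignment, which is why a smaller alignment yields a smaller image file.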
Steve Horsman	c575048aa7	Merge pull request #11329 from Xynnn007/fix-initdata-snp Fix \| Support initdata for SNP	2025-06-02 15:24:12 +01:00
stevenhorsman	ae352e7e34	ci: Add dependabot groups - Create groups for commonly seen cargo packages so that rather than getting up to 9 PRs for each rust component, bumps to the same package are grouped together. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-02 14:45:31 +01:00
stevenhorsman	a94388cf61	ci: Add dependabot config - Create a dependabot configuration to check for updates to our rust and golang packages each day and our github actions each month Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-06-02 14:45:31 +01:00
Xynnn007	8750eadff2	test: turn SNP on for initdata tests After the last commit, the initdata test on SNP should be ok. Thus we turn on this flag for CI. Fixes #11300 Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-02 20:33:19 +08:00
Xynnn007	39aa481da1	runtime: fix initdata support for SNP The QEMU command line for SNP should start with `sev-snp-guest`, followed by the other parameters separated by ','. This patch fixes the parameter order. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-06-02 20:33:19 +08:00
Fabiano Fidêncio	57f3cb8b3b	Merge pull request #11344 from fidencio/topic/kernel-add-tuntap-move-memagent-stuff kernel: Add CONFIG_TUN (needed for VPNs) and move mem-agent related configs to common	2025-06-01 21:32:07 +02:00
RuoqingHe	51cc960cdd	Merge pull request #11346 from fidencio/topic/bump-cgroups-rs rust: Update cgroups-rs to its v0.3.5 release	2025-05-31 04:13:05 +02:00
Fabiano Fidêncio	48f8496209	Merge pull request #11327 from Champ-Goblem/agent/increase-limit-nofile agent: increase LimitNOFILE in the systemd service	2025-05-30 21:56:01 +02:00
Fabiano Fidêncio	02c46471fd	rust: Update cgroups-rs to its v0.3.5 release We're switching to using a rev as it may take some time for the package to be updated on crates.io. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-30 21:49:50 +02:00
Fabiano Fidêncio	dadbfd42c8	kernel: Move mem-agent configs to the common kernel build There's no benefit in keeping those restricted to the dragonball build, when they can be used with other VMMs as well (as long as they support the mem-agent). Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-30 21:49:22 +02:00
Champ-Goblem	a37080917d	kernel: Add CONFIG_TUN for VPN services TUN/TAP is a must for VPN related services. Signed-off-by: Champ-Goblem <cameron@northflank.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-30 21:49:22 +02:00
Fabiano Fidêncio	b8a7350a3d	Merge pull request #11324 from Champ-Goblem/runtime/fix-cgroup-deletion runtime: fix cgroupv2 deletion when sandbox_cgroup_only=false	2025-05-30 21:23:07 +02:00
Champ-Goblem	ef642fe890	runtime: fix cgroupv2 deletion when sandbox_cgroup_only=false Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled, the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:` and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`. This patch updates the deletion logic to take into account the sandbox_cgroup_only=false option and in this case uses the cgroupfs delete. Fixes: #11036 Signed-off-by: Champ-Goblem <cameron@northflank.com>	2025-05-30 17:51:31 +02:00
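The decision described in the commit above can be reduced to a small truth table. The sketch below is a simplification under stated assumptions: `DeleteVia` and `delete_via` are hypothetical names, the `:` check stands in for `IsSystemdCgroup`, and the real fix is in the Go runtime and also depends on the cgroup version.

```rust
/// How a sandbox cgroup should be removed.
#[derive(Debug, PartialEq)]
enum DeleteVia {
    Systemd,
    Cgroupfs,
}

/// Picks the deletion path for a sandbox cgroup.
fn delete_via(cgroup_path: &str, sandbox_cgroup_only: bool) -> DeleteVia {
    // Assumption taken from the message: a ':' in the path marks a systemd-style cgroup.
    let looks_like_systemd = cgroup_path.contains(':');
    if sandbox_cgroup_only && looks_like_systemd {
        DeleteVia::Systemd
    } else {
        // With sandbox_cgroup_only=false the cgroup was created via cgroupfs,
        // so it has to be deleted via cgroupfs as well.
        DeleteVia::Cgroupfs
    }
}
```

The key point is that the path format alone is not enough to choose the deletion method; the configuration that governed creation has to be consulted too.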
Champ-Goblem	f4007e5dc1	agent: increase LimitNOFILE in the systemd service Increase the NOFILE limit in the systemd service, this helps with running databases in the Kata runtime. Signed-off-by: Champ-Goblem <cameron@northflank.com>	2025-05-30 17:49:29 +02:00
Fabiano Fidêncio	3f5dc87284	Merge pull request #11333 from stevenhorsman/csi-driver-permissions-fix workflow: add packages: write to csi-driver publish	2025-05-30 17:45:47 +02:00
Zvonko Kaiser	4586511c01	doc: Add Helm Chart entry Since 3.12 we're shipping the helm-chart by default with each release. Update the documentation to use helm rather than the kata-deploy manifests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-05-30 14:45:01 +00:00
Aurélien Bombo	c03b38c7e3	ci: Require agent-ctl tests This adds `run-kata-agent-apis` to the list of required tests. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-05-29 14:09:42 -05:00
stevenhorsman	586d9adfe5	workflow: add packages: write to csi-driver publish This one was missed in the earlier PR Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-29 15:57:07 +01:00
Steve Horsman	3da213a8c8	Merge pull request #11326 from kata-containers/top-level-workflow-permissions Top level workflow permissions	2025-05-29 10:03:06 +01:00
stevenhorsman	c34416f53a	workflows: Add explicit permissions where needed We have a number of jobs that either need, or nest workflows that need, gh permissions, such as for pushing to ghcr, or doing attest build provenance. This means they need write permissions on things like `packages`, `id-token` and `attestations`, so we need to set these permissions at the job-level (along with `contents: read`), so they are not restricted by our safe defaults. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-28 19:34:28 +01:00
stevenhorsman	088e97075c	workflow: Add top-level permissions Set: ``` permissions: contents: read ``` as the default top-level permissions explicitly to conform to recommended security practices e.g. https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions	2025-05-28 19:34:28 +01:00
Dan Mihai	353d0822fd	Merge pull request #11314 from katexochen/p/svc-name-regex genpolicy: fix svc_name regex	2025-05-28 10:08:38 -07:00
Steve Horsman	7a9d919e3e	Merge pull request #11322 from kata-containers/workflow-permissions workflows: Add explicit permissions for attestation	2025-05-28 17:28:22 +01:00
Steve Horsman	2667d4a345	Merge pull request #11323 from stevenhorsman/gatekeeper-workflow-permissions-ii workflow: Update gatekeeper permissions	2025-05-28 17:05:24 +01:00
stevenhorsman	4d4fb86d34	workflow: Update gatekeeper permissions I shortsightedly forgot that gatekeeper would need to read more than just the commit content in its python scripts, so add read permissions to actions issues which it uses in its processing Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-28 15:58:27 +01:00
Steve Horsman	fed63e0801	Merge pull request #11319 from stevenhorsman/remove-old-workflows workflows: Delete workflows	2025-05-28 15:38:19 +01:00
Steve Horsman	49f86aaa0d	Merge pull request #11320 from stevenhorsman/gatekeeper-workflow-permissions workflows: gatekeeper: Update permissions	2025-05-28 15:38:06 +01:00
stevenhorsman	3ff602c1e8	workflows: Add explicit permissions for attestation We have a number of jobs that nest the build-static-tarball workflows later on. Due to these doing attest build provenance, and pushing to ghcr.io, they need write permissions on `packages`, `id-token` and `attestations`, so we need to set these permissions on the top-level jobs (along with `contents: read`), so they are not blocked. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-28 12:56:52 +01:00
stevenhorsman	2f0dc2ae24	workflows: gatekeeper: Update permissions Restrict the permissions of gatekeeper flow to read contents only for better security Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-28 09:57:19 +01:00
stevenhorsman	f900b0b776	workflows: Delete workflows Some legacy workflows require write access to github, which is a security weakness, and don't provide much value, so let's remove them. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-28 09:45:42 +01:00
Alex Lyn	aab6caa141	Merge pull request #10362 from Apokleos/vfio-hotplug-runtime-rs runtime-rs: add support hotplugging vfio device for qemu-rs	2025-05-28 13:21:58 +08:00
Fabiano Fidêncio	ac934e001e	Merge pull request #11244 from katexochen/p/guest-pull-config runtime: add option to force guest pull	2025-05-27 16:00:09 +02:00
alex.lyn	e69a4d203a	runtime-rs: Increase QMP read timeout to mitigate failures The original 250ms read timeout frequently causes "Resource Temporarily Unavailable (OS Error 11)" when passing through devices via VFIO in QEMU. The root cause lies in synchronization timeout windows failing to accommodate inherent delays during critical hardware init phases in kernel space. This commit increases the timeout to 5000ms, a value determined through testing. While not guaranteeing complete resolution for all hardware combinations, this change significantly reduces timeout failures. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-27 21:06:57 +08:00
Paul Meyer	c4815eb3ad	runtime: add option to force guest pull This enables guest pull via config, without the need of any external snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of relying on annotations to select the way to share the root FS, we always use guest pull. Co-authored-by: Markus Rudy <mr@edgeless.systems> Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-05-27 12:42:00 +02:00
Fabiano Fidêncio	d3f81ec337	Merge pull request #11240 from Apokleos/copydir runtime-rs: Propagate k8s configs correctly when sharedfs is disabled	2025-05-27 12:41:21 +02:00
Paul Meyer	8de8b8185e	genpolicy: rename svc_name to svc_name_downward_env Just to be more explicit what this matches. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-05-27 10:13:43 +02:00
Paul Meyer	78eb65bb0b	genpolicy: fix svc_name regex The service name is specified as RFC 1035 label name [1]. The svc_name regex in the genpolicy settings is applied to the downward API env variables created based on the service name. So it tries to match RFC 1035 labels after they are transformed to downward API variable names [2]. So the set of lower case alphanumerics and dashes is transformed to upper case alphanumerics and underscores. The previous regex wrongly permitted use of numbers, but did allow dot and dash, which shouldn't be allowed (dot because it doesn't conform to RFC 1035, dash because it is transformed to underscore). We have to take care not to also try to use the regex in places where we actually want to check for RFC 1035 label instead of the downward API transformed version of it. Further, we should consider using a format like JSON5/JSONC for the policy settings, as these are far from trivial and would highly benefit from proper documentation through comments. [1]: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service [2]: `b2dfba4151/pkg/kubelet/envvars/envvars.go (L29-L70)` Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-05-27 08:43:25 +02:00
RuoqingHe	139dc13bdc	Merge pull request #11301 from lifupan/fix_cgroup runtime-rs: fix the issue of delete cgroup failed	2025-05-27 05:05:32 +02:00
Wainer Moschetta	d77e33babf	Merge pull request #11266 from ldoktor/ci-pp-retry ci.ocp: A couple of peer-pods setup improvements	2025-05-26 14:22:11 -03:00
Wainer Moschetta	c249769bb8	Merge pull request #11270 from ldoktor/gk tools.testing: Add methods to simplify gatekeeper development	2025-05-26 12:04:07 -03:00
Fabiano Fidêncio	20d3bc6f37	Merge pull request #10964 from hsiangkao/drop_outdated_patches Drop outdated erofs patches for 6.1.y kernels & fix a dragonball vsock issue	2025-05-26 13:00:25 +02:00
Gao Xiang	b441890749	kernel: drop outdated erofs patches for 6.1.y kernels Patches 0001..0004 have been included upstream as dependencies since Linux 6.1.113. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-05-26 15:48:24 +08:00
Xingru Li	71b6acfd7e	dragonball: vsock: support single descriptor Since kernel v6.3 the vsock packet is not split over two descriptors and is instead included in a single one. Therefore, we currently decide the specific method of obtaining BufWrapper based on the length of descriptor. Refer: `a2752fe04f` https://git.kernel.org/torvalds/c/71dc9ec9ac7d Signed-off-by: Xingru Li <lixingru.lxr@linux.alibaba.com> [ Gao Xiang: port this patch from the internal branch to address Linux 6.1.63+. ] Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-05-26 15:48:19 +08:00
RuoqingHe	b6cafba5f6	Merge pull request #11308 from hsiangkao/enable_tmpfs_xattr kernel: support `CONFIG_TMPFS_XATTR=y`	2025-05-26 05:00:26 +02:00
Gao Xiang	b681dfb594	kernel: support `CONFIG_TMPFS_XATTR=y` Currently, Kata EROFS support needs it, otherwise it will: [ 0.564610] erofs: (device sda): mounted with root inode @ nid 36. [ 0.564858] overlayfs: failed to set xattr on upper [ 0.564859] overlayfs: ...falling back to index=off,metacopy=off. [ 0.564860] overlayfs: ...falling back to xino=off. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2025-05-24 20:43:35 +08:00
RuoqingHe	a9ffdfc2ae	Merge pull request #11294 from wainersm/delint_confidential_kbs tests/k8s: delint confidential_kbs.sh	2025-05-23 17:00:28 +02:00
Fupan Li	e9b45126fc	Merge pull request #11254 from sampleyang/main runtime-rs: fix vfio pci address domain 0001 problem	2025-05-23 18:13:10 +08:00
yangsong	06c7c5bccb	runtime-rs: fix vfio pci address domain 0001 problem Some NVIDIA GPUs have a PCI address in domain 0001, but the runtime currently assumes 0000:bdf by default, which causes address errors during device initialization and address conflicts during device registration. Fixes #11252 Signed-off-by: yangsong <yunya.ys@antgroup.com>	2025-05-23 14:33:06 +08:00
Wainer dos Santos Moschetta	ddf333feaf	tests/k8s: fix shellcheck SC1091 in confidential_kbs.sh Fixed "note: Not following: ./../../../tools/packaging/guest-image/lib_se.sh: openBinaryFile: does not exist (No such file or directory) [SC1091]" Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-22 15:38:27 -03:00
Wainer dos Santos Moschetta	c9fb0b9c85	tests/k8s: fix shellcheck SC2154 in confidential_kbs.sh Fixed "warning: HKD_PATH is referenced but not assigned. [SC2154]" Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-22 15:02:20 -03:00
Wainer dos Santos Moschetta	68d91d759a	tests/k8s: add `set -e` to confidential_kbs.sh Although the script will inherit that setting from the caller scripts, making it explicit in the file silences the shellcheck "warning: Use 'pushd ... || exit' or 'pushd ... || return' in case pushd fails. [SC2164]" Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-22 14:55:24 -03:00
Wainer dos Santos Moschetta	b4adfcb3cb	tests/k8s: apply shellcheck tips to confidential_kbs.sh Addressed the following shellcheck advice: SC2046 (warning): Quote this to prevent word splitting. SC2248 (style): Prefer double quoting even when variables don't contain special characters SC2250 (style): Prefer putting braces around variable references even when not strictly required. SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-22 14:52:38 -03:00
alex.lyn	043bab3d3e	runtime-rs: Handle port allocation in PCIe topology for vfio devices It's important to handle port allocation in a PCIe topology before vfio device hotplug via QMP. The code ensures that VFIO devices are properly allocated to available ports (either root ports or switch ports) and updates the device's bus and port information accordingly. It first retrieves the PCIe port type from the topology using pcie_topo.get_pcie_port(). It then searches for an available node in the PCIe topology with RootPort or SwitchPort type and allocates the VFIO device to the found available port. Finally, it updates the device's bus with the allocated port's ID and type. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-22 18:58:41 +08:00
alex.lyn	01b822de16	runtime-rs: Get available port node in the PCIe topology This commit implements the `find_available_node` function, which searches the PCIe topology for the first available `TopologyPortDevice` or `SwitchDownPort`. If no available node is found in either the `pcie_port_devices` or the connected switches' downstream ports, the function returns `None`. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-22 18:58:41 +08:00
alex.lyn	533d07a2c3	runtime-rs: Introduce qemu-rs vfio device hotplug handler This commit notes a restriction of the current implementation: 'multifunction=on' is temporarily unsupported. While the feature isn't available in the present version, we explicitly acknowledge this limitation and commit to addressing it in future iterations to enhance functional completeness. Tracking issue #11292 has been created to monitor progress towards full multifunction support. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-22 18:58:06 +08:00
Steve Horsman	91f2e97aae	Merge pull request #11267 from Rtoax/p001-fix-osbuilder-lib.sh-indent osbuilder: lib.sh: Fix indent	2025-05-22 09:54:18 +01:00
alex.lyn	f1796fe9ba	runtime-rs: Add more fields in VfioDevice to express vfio devices To support port devices for vfio devices, more fields need to be introduced to help pass port type, bus and other information. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-22 16:00:40 +08:00
Fupan Li	15cbc545ca	runtime-rs: fix the issue of delete cgroup failed When trying to delete a cgroup, all of the tasks/procs in the cgroup must first be moved into the root cgroup, and only then can it be deleted. Since cgroup v2 doesn't support moving threads into the root cgroup, moving the processes instead of the tasks fixes this issue. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-05-22 12:15:02 +08:00
Steve Horsman	9356ed59d5	Merge pull request #11130 from wainersm/tests-better-report tests/k8s: better tests reporting for CI	2025-05-21 17:21:35 +01:00
Steve Horsman	b519e9fdff	Merge pull request #11293 from wainersm/tests_increase_kbs_timeout tests/k8s: increase wait time of KBS service ingress	2025-05-21 17:14:52 +01:00
Steve Horsman	a897bce29f	Merge pull request #11298 from stevenhorsman/release-3.17.0-bump release: Bump version to 3.17.0	2025-05-21 12:06:24 +01:00
stevenhorsman	7b90ff3c01	release: Bump version to 3.17.0 Bump VERSION and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-21 12:04:39 +01:00
Fabiano Fidêncio	5378e581d8	Merge pull request #11144 from Apokleos/hotplug-block-qemu-rs Support hot-plug block device in qemu-rs with QMP	2025-05-21 11:31:48 +02:00
Lukáš Doktor	67ee9f3425	ci.ocp: Improve logging of extra new resources This script relies on temporary subscriptions and won't clean up any resources. Let's improve the logging to better describe what resources were created and how to clean them, if the user needs to do so. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-21 11:02:36 +02:00
Lukáš Doktor	32dbc5d2a9	ci.ocp: Use SCRIPT_DIR to allow execution from any folder We used hardcoded "ci/openshift-ci/cluster" location which expects this script to be only executed from the root. Let's use SCRIPT_DIR instead to allow execution from elsewhere eg. by user bisecting a failed CI run. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-21 10:30:03 +02:00
Lukáš Doktor	0e4fb62bb4	ci.ocp: Retry first az command as login takes time to propagate In CI we hit problem where just after `az login` the first `az network vnet list` command fails due to permission. We see "insufficient permissions" or "pending permissions", suggesting we should retry later. Manual tests and successful runs indicate we do have the permissions, but not immediately after login. Azure docs suggest using extra `az account set` but still the propagation might take some time. Add a loop retrying the first command a few times before declaring failure. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-21 10:28:01 +02:00
Fabiano Fidêncio	6c9b199ef1	Merge pull request #11289 from BbolroC/fix-vfio-coldplug runtime: Preserve hotplug devices for vfio-coldplug mode	2025-05-21 09:48:25 +02:00
Wainer dos Santos Moschetta	fdcf11d090	tests/k8s: increase wait time of KBS service ingress kbs_k8s_svc_host() returns the ingress IP when the KBS service is exposed via an ingress. In Azure AKS the ingress can take a while to be fully ready and recently we have noticed on CI that kbs_k8s_svc_host() has returned an empty value. Maybe the problem is the current timeout being too low, so let's increase it to 50 seconds to see if the situation improves. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-20 15:20:08 -03:00
Wainer dos Santos Moschetta	80a816db9d	workflows/run-k8s-tests-coco-nontee: add step to report tests Run `gha-run.sh report-tests` to generate the report of the tests. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-20 14:43:38 -03:00
Wainer dos Santos Moschetta	8c4637d629	tests/k8s: print tests report Added 'report-tests' command to gha-run.sh to print to stdout a report of the tests executed. For example: ``` SUMMARY (2025-02-17-14:43:53): Pass: 0 Fail: 1 STATUSES: not_ok foo.bats OUTPUTS: ::group::foo.bats 1..3 not ok 1 test 1 not ok 2 test 2 ok 3 test 3 1..2 not ok 1 test 1 not ok 2 test 2 ::endgroup:: ``` Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-20 14:43:38 -03:00
Wainer dos Santos Moschetta	5e3b8a019a	tests/k8s: split and save bats outputs in files Currently run_kubernetes_tests.sh sends all the bats outputs to stdout which can be very difficult to browse to find a problem, mainly on CI. With this change, each bats execution has its output sent to 'reports/yyy-mm-dd-hh:mm:ss/<status>-<bats file>.log' where <status> is either 'ok' (tests passed) or 'not_ok' (some tests failed). Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-05-20 14:43:38 -03:00
Steve Horsman	f8c5aa6df6	Merge pull request #11259 from fitzthum/bump-gc-0140 Update Trustee and Guest Components for CoCo v0.14.0	2025-05-20 18:05:17 +01:00
Lukáš Doktor	c203d7eba6	ci.ocp: Set peer-pods-azure license We forgot to add the license header when introducing this test. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-20 17:03:48 +02:00
Steve Horsman	b4aa1e3fbd	Merge pull request #11279 from skazi0/repo-components osbuilder: ubuntu: Add REPO_COMPONENTS setting	2025-05-20 16:03:48 +01:00
Lukáš Doktor	b97b20295b	ci.ocp: Make peer-pods setup executable set permissions of the peer-pods-azure.sh script to executable Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-20 17:03:48 +02:00
Sumedh Alok Sharma	9a4432d197	Merge pull request #11233 from Ankita13-code/ankitapareek/execprocess-additional-input-validation genpolicy: validate input process fields for ExecProcessRequest	2025-05-20 20:11:41 +05:30
Jacek Tomasiak	91fb4353f6	osbuilder: ubuntu: Add REPO_COMPONENTS setting Added variable REPO_COMPONENTS (default: "main") which sets components used by mmdebstrap for rootfs building. This is useful for custom image builders who want to include EXTRA_PKGS from components other than the default "main" (e.g. "universe"). Fixes: #11278 Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2025-05-20 14:01:48 +02:00
Fabiano Fidêncio	29099d139b	Merge pull request #11280 from kata-containers/dependabot/cargo/src/tools/kata-ctl/ring-0.17.14 build(deps): bump ring from 0.17.5 to 0.17.14 in /src/tools/kata-ctl	2025-05-20 13:47:22 +02:00
Fabiano Fidêncio	0bc0623037	Merge pull request #11277 from skazi0/repo-url osbuilder: ubuntu: Expose REPO_URL variables	2025-05-20 13:46:01 +02:00
Ankita Pareek	ad75595dc8	genpolicy: Add tests for various input validations for ExecProcessRequest These additional tests cover edge cases specific to- - Terminal validation - Capabilities validation - Working directory (Cwd) validation - NoNewPrivileges validation - User validation - Environment variables validation Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>	2025-05-20 11:19:55 +00:00
Saul Paredes	1e466bf39c	genpolicy: fix validation of env variables sourced from metadata.namespace Use $(sandbox-namespace) wildcard in case none is specified in yaml. If wildcard is present, compare input against annotation value. Fixes regression introduced in https://github.com/microsoft/kata-containers/pull/273 where samples that use metadata.namespace env var were no longer working. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-05-20 11:19:46 +00:00
Dan Mihai	a113b9eefd	genpolicy: validate probe process fields Validate more process fields for k8s probe commands - e.g., livenessProbe, readinessProbe, etc. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-05-20 11:15:30 +00:00
Dan Mihai	c0b8c6ed5e	genpolicy: validate process for commands from settings Validate more process fields for commands enabled using the ExecProcessRequest "commands" and/or "regex" fields from the settings file. Add function to get the container from state based on container_id matching instead of matching it against every policy container data Signed-off-by: Dan Mihai <dmihai@microsoft.com> Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>	2025-05-20 11:15:30 +00:00
Dan Mihai	6f78aaa411	genpolicy: use process inputs for allow_process() Using process data inputs for allow_process() is easier to read/understand compared with the older OCI data inputs. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-05-20 11:15:30 +00:00
Steve Horsman	2871c31162	Merge pull request #11273 from mythi/tdx-qemu-params config: update QEMU TDX configuration	2025-05-20 10:22:59 +01:00
Steve Horsman	4b317dddfa	Merge pull request #11271 from stevenhorsman/gatekeeper-truncate-names ci: gatekeeper: Require names update	2025-05-20 10:20:05 +01:00
alex.lyn	4b27ca9233	runtime-rs: Implement volume copy allowlist check For security reasons, we have restricted directory copying. This commit introduces the `is_allowlisted_copy_volume` function to verify whether a given volume path is present in an allowed copy directory. This enhances security by ensuring only permitted volumes are copied. Currently, only directories under the path `/var/lib/kubelet/pods/<uid>/volumes/{kubernetes.io~configmap, kubernetes.io~secret, kubernetes.io~downward-api, kubernetes.io~projected}` are allowed to be copied into the guest. Copying of other directories will be prohibited. Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:57:10 +08:00
alex.lyn	8910bddce8	kata-types: Introduce k8s special volumes for projected and downward-api Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:55:49 +08:00
alex.lyn	6fa409df1a	kata-agent: Improve file sync handling and address symlink issues When synchronizing file changes on the host, a "symlink AlreadyExists" issue occurs, primarily due to improper handling of symbolic links (symlinks). Additionally, there are other related problems. This patch will try to address these problems. (1) Handle symlink target existence (files, dirs, symlinks) during host file sync. Use appropriate removal methods (unlink, remove_file, remove_dir_all). (2) Enhance temporary file handling for safer operations and implement truncate only at offset 0 for resume support. (3) Set permissions and ownership for parent directories. (4) Check and clean target path for regular files before rename. Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:55:49 +08:00
alex.lyn	654e6db91f	runtime-rs: Add inotify-based real-time directory synchronization Introduce event-driven file sync mechanism between host and guest when sharedfs is disabled, which helps monitor the host path in real time and sync file changes: 1. Introduce FsWatcher to monitor directory changes via inotify; 2. Support recursive watching with configurable filters; 3. Add debounce logic (default 500ms cooldown) to handle burst events; 4. Trigger `copy_dir_recursively` on stable state; 5. Handle CREATE/MODIFY/DELETE/MOVED/CLOSE_WRITE events; Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:55:49 +08:00
alex.lyn	79b832b2f5	runtime-rs: Propagate k8s configs correctly when sharedfs is disabled In Kubernetes (k8s), while Kata Pods often use virtiofs for injecting Service Accounts, Secrets, and ConfigMaps, security-sensitive environments like CoCo disable host-guest sharing. Consequently, when SharedFs is disabled, we propagate these configurations into the guest via file copy and bind mount for correct container access. Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:55:49 +08:00
alex.lyn	8da7cd1611	runtime-rs: Impl recursive directory copy with metadata preservation Add async directory traversal using BFS algorithm: (1) Support file type handling: Regular files (S_IFREG) with content streaming; Directories (S_IFDIR) with mode preservation; Symbolic links (S_IFLNK) with target recreation; (2) Maintain POSIX metadata: UID/GID preservation,File mode bits, and Directory permissions (3) Implement async I/O operations for: Directory enumeration, file reading, symlink target resolution Fixes #11237 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:55:49 +08:00
alex.lyn	378d04bdf0	runtime-rs: Add hotplug block device type with QMP There are several cases where block devices play very important roles: 1. Direct Volume: In Kata cases, to achieve high-performance I/O, raw files on the host are typically passed directly to the Guest via virtio-blk, and then bond/mounted within the Guest for container usage. 2. Trusted Storage: In CoCo scenarios, particularly in Guest image pull mode, images are typically pulled directly from the registry within the Guest. However, due to constrained memory resources (prioritized for containers), CoCo leverages externally attached encrypted storage to store images, requiring hot-plug capability for block devices. As other VMMs, like dragonball and cloud-hypervisor in runtime-rs or qemu in kata-runtime, already support such capabilities, we need to support block device hot-plug (QMP) in qemu-rs. Let's do it. Fixes #11143 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:46:54 +08:00
alex.lyn	2405301e2e	runtime-rs: Support hotplugging block device via QMP This commit introduces block device hotplugging capability using QMP commands. The implementation enables attaching raw block devices to a running VM through the following steps: 1. Block Device Configuration: Uses `blockdev-add` QMP command to define a raw block backend with (1) Direct I/O mode (2) Configurable read-only flag (3) Host file/block device path (`/path/to/block`) 2. PCI Device Attachment: Attaches the block device via `device_add` QMP command as a `virtio-blk-pci` device: (1) Dynamically allocates PCI slots using `find_free_slot()` (2) Binds to user-specified PCIe bus (e.g., `pcie.1`) (3) Returns PCI path for further management Fixes #11143 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:46:54 +08:00
alex.lyn	80bd71bfcc	runtime-rs: Iterates through PCI devices to find a match with qdev_id The get_pci_path_by_qdev_id function is designed to search for a PCI device within a given list of devices based on a specified qdev_id. It tracks the device's path in the PCI topology by recording the slot values of the devices traversed during the search. If the device is located behind a PCI bridge, the function recursively explores the bridge's device list to find the target device. The function returns the matching device along with its updated path if found, otherwise, it returns None. Fixes #11143 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-20 16:46:54 +08:00
Fupan Li	9a03815f18	Merge pull request #11095 from lifupan/ephemeral_volume runtime-rs: add the ephemeral memory based volume support	2025-05-20 09:18:34 +08:00
RuoqingHe	5b5c71510e	Merge pull request #11093 from kimullaa/fix-err-when-containerd-conf-does-not-exist kata-deploy: fix bug when config does not exist	2025-05-19 18:12:50 +02:00
Steve Horsman	cfdccaacb3	Merge pull request #11283 from Rtoax/p002-fix-typo config: Fix typos	2025-05-19 14:59:37 +01:00
RuoqingHe	93b44f920c	Merge pull request #11287 from bpradipt/remote-hyp-logging runtime: Fix logging for remote hypervisor	2025-05-19 15:49:15 +02:00
Shunsuke Kimura	9a8d64d6b1	kata-deploy: execute in the host environment `containerd` command should be executed in the host environment. (To generate the config that matches the host's containerd version.) Fixes: #11092 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-19 21:42:21 +09:00
Shunsuke Kimura	d3edc90d80	kata-deploy: Fix condition always true If config.toml does not exist, `[ -x $(command -v containerd) ]` is always true (because the command substitution is not enclosed in ""). ``` // current code $ [ -x $(command -v containerd_notfound) ] $ echo $? 0 // maybe expected code $ [ -x "$(command -v containerd_notfound)" ] $ echo $? 1 ``` Fixes: #11092 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-19 21:42:21 +09:00
Hyounggyu Choi	2fd2cd4a9b	runtime: Preserve hotplug devices for vfio-coldplug mode Fixes: #11288 This commit appends hotplug devices (e.g., persistent volume) to deviceInfos when `vfio_mod` is `vfio` and `cold_plug_vfio` is set to any value other than `no-port`. For details, please visit the issue. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-05-19 13:46:49 +02:00
Pradipta Banerjee	9f9841492e	runtime: Fix logging for remote hypervisor Need to use hvLogger Fixes: #11286 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2025-05-19 07:01:59 -04:00
Jacek Tomasiak	da6860a632	osbuilder: ubuntu: Expose REPO_URL variables This exposes REPO_URL and adds REPO_URL_X86_64 which can be set to use custom Ubuntu repo for building rootfs. If only one architecture is built, REPO_URL can be set. Otherwise, REPO_URL_X86_64 is used for x86_64 arch and REPO_URL for others. Fixes: #11276 Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2025-05-19 12:41:49 +02:00
Rong Tao	914730d948	config: Fix typos devie should be device Signed-off-by: Rong Tao <rongtao@cestc.cn>	2025-05-19 14:19:22 +08:00
Alex Lyn	305a5f5e41	Merge pull request #10578 from Apokleos/pcie-port-devices runtime-rs: Introduce PCIe Port devices in runtime-rs for qemu-rs	2025-05-18 21:10:25 +08:00
Dan Mihai	b9651eadab	Merge pull request #11214 from microsoft/cameronbaird/address-gid-mismatch-additionalgids genpolicy: Enable AdditionalGids checks in rules.rego	2025-05-16 10:15:53 -07:00
dependabot[bot]	a2c7e48e0e	build(deps): bump ring from 0.17.5 to 0.17.14 in /src/tools/kata-ctl Bumps [ring](https://github.com/briansmith/ring) from 0.17.5 to 0.17.14. - [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md) - [Commits](https://github.com/briansmith/ring/commits) --- updated-dependencies: - dependency-name: ring dependency-version: 0.17.14 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-05-16 14:51:20 +00:00
Fabiano Fidêncio	9e11b2e577	Merge pull request #11274 from fidencio/topic/arm-ci-k8s-enable-hotplug-tests ci: k8s: arm: Enable skipped tests	2025-05-16 13:19:18 +02:00
Fabiano Fidêncio	219d6e8ea6	Merge pull request #11257 from mythi/coco-guest-hardening confidential guest kernel hardening changes	2025-05-16 08:52:36 +02:00
Fabiano Fidêncio	86d2d96d4a	ci: k8s: arm: Enable skipped tests Now that memory hotplug should work, as we're using a firmware that supports that, let's re-enable the tests that rely on hotplug. Fixes: #10926, #10927 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-16 03:02:32 +02:00
Fabiano Fidêncio	02ce395a69	Merge pull request #11272 from seungukshin/enable-edk2-for-arm64 Enable edk2 for arm64	2025-05-15 20:59:56 +02:00
Cameron Baird	7bba7374ec	genpolicy: Add retries to policy generation As the genpolicy from_files call makes network requests to container registries, it has a chance to fail. Harden us against flakes due to network by introducing a 6x retry loop in genpolicy tests. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-05-15 18:12:50 +00:00
Steve Horsman	d21d2a0657	Merge pull request #11265 from chathuryaadapa/bumpalo-crate-bump Bump: libz-sys crate to address CVE	2025-05-15 16:18:00 +01:00
Mikko Ylinen	ff851202e6	config: update QEMU TDX configuration Drop '-vmx-rdseed-exit' from '-cpu host' QEMU options. The history of it is unknown but it's likely related to early TDX enablement. TD pods start up fine without it (tested by manually editing the configuration file) and it's also not used elsewhere. Keep TDXCPUFEATURES for now in case a need for it shows up later. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-15 15:43:24 +03:00
Fabiano Fidêncio	676e66ae49	Merge pull request #11246 from skazi0/mmdebstrap osbuilder: ubuntu: Switch from multistrap to mmdebstrap	2025-05-15 14:15:37 +02:00
alex.lyn	07533522b8	runtime-rs: Handle PortDevice devices when invoke start_vm with Qemu Extract PortDevice relevant information, and then invoke different processing methods based on the device type. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	c109328097	runtime-rs: Introduce pcie root port and switch port in qemu-rs cmdline. Some data structures and methods are introduced to help handle vfio devices. The methods add_pcie_root_ports and add_pcie_switch_ports follow runtime's related implementations of vfio devices. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	47c7ba8672	runtime-rs: Prepare pcie port devices before start sandbox Prepare pcie port devices before starting VM with the help of device manager and PCIe Topology. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	d435712ccb	runtime-rs: Introduce PortDevice in resource manager in sandbox A new resource type `PortDevice` is introduced which is dedicated for handling root ports/switch ports during sandbox creation(VM). Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	1d670bb46c	runtime-rs: handle useless Device match arms in dragonball vmm case Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	f08fdd25d8	runtime-rs: Introduce device type of PortDevice in device manager PortDevice is for handling root ports or switch ports in PCIe Topology. It makes it easy to pass the root ports/switch ports information when creating a VM that requires PCIe devices. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	694a849eaa	runtime-rs: Add PCIe topology mgmt for Root Port and Switch Port This commit introduces an implementation for managing PCIe topologies, focusing on the relationship between Root Ports and Switch Ports. The design supports two strategies for generating Switch Ports. Let's take the requirement of 4 switch ports as an example. There are two possible solutions, as below: (1) Single Root Port + Single PCIe Switch: Uses 1 Root Port and 1 Switch with 4 Downstream Ports. (2) Multiple Root Ports + Multiple PCIe Switches: Uses 2 Root Ports and 2 Switches, each with 2 Downstream Ports. The recommended strategy is Option 1 due to its simplicity, efficiency, and scalability. The implementation includes data structures (PcieTopology, RootPort, PcieSwitch, SwitchPort) and operations (add_pcie_root_port, add_switch_to_root_port, add_switch_port_to_switch) to manage the topology effectively. Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	2f5ee0ec6d	kata-types: Support switch port config via annotation and configuration Support setting switch ports with annotation or configuration.toml Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
alex.lyn	a42d16a6a4	kata-types: Introduce pcie_switch_port in configuration (1) Introduce new field `pcie_switch_port` for switch ports. (2) Add related checking logics in vmms(dragonball, qemu) Fixes #10361 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-05-15 20:10:49 +08:00
Fabiano Fidêncio	af3c601a92	Merge pull request #11258 from fidencio/topic/second-try-fix-multi-install-prefix kata-deploy: Avoid changing any component path in case of restart	2025-05-15 11:21:15 +02:00
Seunguk Shin	560e718979	runtime: Add edk2 to configuration-qemu.toml for arm64 The edk2 is required for memory hot plug on qemu for arm64. This adds the edk2 to configuration-qemu.toml for arm64. Signed-off-by: Seunguk Shin <seunguk.shin@arm.com> Reviewed-by: Nick Connolly <nick.connolly@arm.com>	2025-05-15 10:12:31 +01:00
Seunguk Shin	5cabce1a25	packaging: Build edk2 for arm64 The edk2 is required for memory hot plug on qemu for arm64. This adds the edk2 to static tarball for arm64. Signed-off-by: Seunguk Shin <seunguk.shin@arm.com> Reviewed-by: Nick Connolly <nick.connolly@arm.com>	2025-05-15 10:12:24 +01:00
stevenhorsman	c09291a9c7	ci: gatekeeper: Require names update The github rest api truncated job names that are >100 characters (which doesn't seem to be documented). There doesn't seem to be a way to easily make gatekeeper handle this automatically, so lets update the required-tests to expect the truncated job names Fixes: #11176 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-15 10:07:41 +01:00
Steve Horsman	95e5e0ec49	Merge pull request #11264 from fidencio/topic/helm-to-ci helm: release: Publish our helm charts to the OCI registries	2025-05-15 09:47:33 +01:00
Lukáš Doktor	9f8c8ea851	tools.testing: Add way to re-play recorded queries in gatekeeper to simplify gatekeeper development add support for DEBUG_INPUT which can be used to report content from files gathered in DEBUG run. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-15 10:32:10 +02:00
Lukáš Doktor	1a15990ee1	tools.testing: Add DEBUG support for gatekeeper to avoid manual curling to analyze GK issues let's add a way to dump all GK requests in a directory when the user specifies "DEBUG" env variable. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-05-15 10:32:10 +02:00
Fabiano Fidêncio	71e8c1b4f0	helm: release: Publish our helm charts to the OCI registries Let's take advantage of the fact that helm can take an OCI registry as a chart source, and upload our charts to the OCI registries we've been using so far. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-14 20:20:35 +02:00
RuoqingHe	393cc61153	Merge pull request #11241 from kata-containers/dependabot/cargo/src/tools/agent-ctl/ring-0.17.14 build(deps): bump ring from 0.17.8 to 0.17.14 in /src/tools/agent-ctl	2025-05-14 16:20:33 +02:00
Adapa Chathurya	3d284d3b4e	versions: Bump libz-sys version Bump libz-sys version to update and remediate CVE-2025-1744. Signed-off-by: Adapa Chathurya <adapa.chathurya1@ibm.com>	2025-05-14 19:48:10 +05:30
Fabiano Fidêncio	82928d1480	kata-deploy: Avoid changing any component path in case of restart The previous attempt to fix this issue only took into consideration the QEMU binary, as I completely forgot that there were other pieces of the config that we also adjusted. Now, let's just check one of the configs before trying to adjust anything else, and only do the changes if the multi-install suffix is not yet added. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-14 15:41:13 +02:00
Jacek Tomasiak	e20fb377fc	osbuilder: ubuntu: Switch from multistrap to mmdebstrap Multistrap requires usrmerge package which was dropped in Ubuntu 24.04 (Noble). Based on details from [0], the rootfs build process was switched to mmdebstrap. Some additional minor tweaks were needed around chrony as the version from Noble has very strict systemd sandboxing configured and it doesn't work with readonly root by default. [0] https://lists.debian.org/debian-dpkg/2023/05/msg00080.html Fixes: #11245 Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com> Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>	2025-05-14 11:46:19 +02:00
Steve Horsman	711fcd8f51	Merge pull request #11251 from stevenhorsman/rust-vulns-9th-may-2025 Rust vulns 9th may 2025	2025-05-14 09:58:12 +01:00
Tobin Feldman-Fitzthum	be708f410e	tests: fixup error assert in pull image test Guest components is now less verbose with its error messages. This will be fixed after the release but for now switch to a more generic error message that is still found in the logs. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-13 20:17:02 -05:00
Tobin Feldman-Fitzthum	806abeefb9	tests: fixup error asserts in init-data test Guest components is less verbose with its error message now. This will be fixed after the release, but for now, update the tests with the new more general message. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-13 20:16:50 -05:00
Tobin Feldman-Fitzthum	e2e503eb33	tests: fixup error string for signature tests Guest components is less verbose with its error messages. This will be fixed after the release, but for now let's replace this with a more generic message. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-13 16:54:06 -05:00
Cameron Baird	090497f520	genpolicy: Add test cases for fsGroup and supplementalGroup fields Fix up genpolicy test inputs to include required additionalGids Include a test for the pod_container container in security_context tests as these containers follow slightly different paths in containerd. Introduce a test for fsGroup/supplementalGroups fields in the security context. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-05-13 21:48:58 +00:00
Cameron Baird	19d502de76	ci: Add test cases for fsGroup and supplementalGroup fields Introduce new test case to the security context bats file which verifies that policy works properly for a deployment yaml containing fsGroup and supplementalGroup configuration. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-05-13 21:48:58 +00:00
Cameron Baird	d3cd1af593	genpolicy: Enable AdditionalGids checks in rules.rego With added support for parsing these fields in genpolicy, we can now enable policy verification of AdditionalGids. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-05-13 21:48:58 +00:00
Tobin Feldman-Fitzthum	ef98f39b6d	tests: update error message for authenticated guest pull Some changes in guest components have obscured the error message that we show when we fail to get the credentials for an authenticated image. The new error message is a little bit misleading since it references decrypting an image. This will be updated in a future release, but for now look for this message. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-13 16:46:32 -05:00
Cameron Baird	29ee46c186	genpolicy: Handle PodSecurityContext.fsGroup\|supplementalGroups Policy enforcement for additionalGids, A list of groups applied to the first process run in each container. Manifests in OCI struct as additionalGids: Consists of container's GID, fsGroup, and supplementalGroups. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.28/#PodSecurityContext-v1-core Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-05-13 21:44:51 +00:00
Tobin Feldman-Fitzthum	e10aa4e49c	tests: update error message for encrypted image test Guest components prints out a different error when failing to decrypt an image. Update the test to look for this new error. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-13 12:33:37 -05:00
RuoqingHe	cd4c3e89e1	Merge pull request #11243 from kata-containers/dependabot/go_modules/src/runtime/github.com/opencontainers/runc-1.2.0 build(deps): bump github.com/opencontainers/runc from 1.1.12 to 1.2.0 in /src/runtime	2025-05-13 17:02:35 +02:00
RuoqingHe	268197957d	Merge pull request #11253 from stevenhorsman/golang.org/x/oauth2v0.27.0-bump versions: Bump golang.org/x/oauth2	2025-05-13 15:03:24 +02:00
stevenhorsman	b3825829d8	versions: Bump golang.org/x/oauth2 Update module to remediate [CVE-2025-22868](https://www.cve.org/CVERecord?id=CVE-2025-22868) Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-13 11:00:35 +01:00
Rong Tao	37a16c19d1	osbuilder: lib.sh: Fix indent Replace 4 spaces to [tab]. Signed-off-by: Rong Tao <rongtao@cestc.cn>	2025-05-13 16:56:54 +08:00
Steve Horsman	299fb3b77b	Merge pull request #11255 from stevenhorsman/skip-docker-tests ci: gatekeeper: skip docker tests	2025-05-13 09:18:09 +01:00
Zvonko Kaiser	842ec6a32e	Merge pull request #11262 from BbolroC/add-vfio-config-for-sel-runtime runtime/config: Add VFIO config for IBM SEL	2025-05-12 10:59:09 -04:00
Zvonko Kaiser	5cc098ae43	Merge pull request #11242 from houstar/qing/safe-path agent: use safe-path to replace secure_join	2025-05-12 10:58:19 -04:00
Mikko Ylinen	ab29c8c979	runtime: do not add virtio-rng-pci device for confidential guests Adding: "-object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0" for confidential guests is not necessary as the RNG source cannot be trusted and the guest kernel has the driver already disabled as well. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-12 17:14:51 +03:00
Mikko Ylinen	a44dfb8d37	versions: bump LTS kernel 6.12.28 has been released, let's bump to it. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-12 17:14:51 +03:00
Mikko Ylinen	eb326477fc	kernel: disable virtio RNG for confidential guests Linux CoCo x86 guest is hardened to ensure RDRAND provides enough entropy to initialize Linux RNG. A failure will panic the guest. For confidential guests any other RNG source is untrusted so disable them. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-12 17:12:44 +03:00
Hyounggyu Choi	4fac1293bd	runtime/config: Add VFIO config for IBM SEL With #11076 merged, a VFIO configuration is needed in the runtime when IBM SEL is involved (e.g., qemu-se or qemu-se-runtime-rs). For the Go runtime, we already have a nightly test (e.g., https://github.com/kata-containers/kata-containers/actions/runs/14964175872/job/42031097043) in which this change has been applied. For the Rust runtime, the feature has not yet been migrated. Thus, this change serves as a placeholder and a reminder for future implementation. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-05-12 14:58:29 +02:00
Qingyuan Hou	c0ceaf661a	agent: use safe-path to replace secure_join This patch uses the safe-path library to safely handle filesystem paths. Signed-off-by: Qingyuan Hou <qingyuan.hou@linux.alibaba.com>	2025-05-12 09:06:55 +00:00
Tobin Feldman-Fitzthum	de6f4ae99c	versions: update Trustee version for CoCo v0.14.0 This hash will be tagged as Trustee v0.13.0 after the CoCo release is finished. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-09 13:40:28 -05:00
Tobin Feldman-Fitzthum	f9a9967e21	versions: update guest-components for CoCo v0.14.0 Pick up changes to guest components. This hash is right before the changes to GC to support image pull via the CDH. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-09 13:40:28 -05:00
Tobin Feldman-Fitzthum	d714eb2472	agent: update image-rs for CoCo v0.14.0 We might be able to eliminate this dependency soon, but for now let's update image-rs. I massaged the dependencies with: cargo update idna_adapter@1.2.1 --precise 1.2.0 cargo update litemap@0.7.5 --precise 0.7.4 cargo update zerofrom@0.1.6 --precise 0.1.5 cargo update astral-tokio-tar@0.5.2 --precise 0.5.1 cargo update base64ct@1.7.3 --precise 1.6.0 cargo update generic-array@1.2.0 --precise 1.1.1 Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-05-09 13:39:52 -05:00
stevenhorsman	35ed3a2a3a	versions: Bump bumpalo version Bump bumpalo version to remediate RUSTSEC-2022-0078 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-09 16:09:22 +01:00
stevenhorsman	fcc60b514b	versions: Bump hyper version Bump hyper version to update and remediate CVE-2023-26964 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-09 16:09:22 +01:00
stevenhorsman	7807e6c29a	versions: Bump byte-unit and rust_decimal Bump the crates to update them and pull in a newer version of borsh to remediate RUSTSEC-2023-0033 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-09 16:09:22 +01:00
Mikko Ylinen	96d922fc27	kernel: disable virtio MMIO for confidential guests As the comment in the fragment suggests, this is for the firecracker builds and not relevant for confidential guests, for example. Exclude mmio.conf fragment by adding the new !confidential tag to drop virtio MMIO transport for the confidential guest kernel (as virtio PCI is enough for the use cases today). Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-09 17:53:22 +03:00
Mikko Ylinen	31d6839eb5	tools: let confidential guest kernel builds to exclude fragments build-kernel.sh supports excluding fragments from the common base set based on the kernel target architecture. However, there are also cases where the base set must be stripped down for other reasons. For example, confidential guest builds want to exclude some drivers the untrusted host may try to add devices for (e.g., virtio-rng). Make build-kernel.sh skip fragments tagged using '!confidential' when confidential guest kernels are built. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-05-09 17:53:22 +03:00
Zvonko Kaiser	78ff72a386	Merge pull request #11199 from fidencio/topic/kata-deploy-fix-multiInstallSufix-behaviour-during-restarts helm: Avoid appending the multiInstallSuffix several times	2025-05-09 10:32:23 -04:00
Zvonko Kaiser	26a3cb4fd1	Merge pull request #11250 from stevenhorsman/tempfile-3.19.1-bump versions: Update tempfile crate	2025-05-09 09:51:49 -04:00
stevenhorsman	a09a76a4f5	ci: gatekeeper: skip docker tests It looks like the 22.04 image got updated and broke the docker tests (see #11247), so make these un-required until we can get a resolution Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-09 13:57:23 +01:00
Markus Rudy	835f59df2f	Merge pull request #10986 from 3u13r/euler/feat/genpolicy/env-from-secret genpolicy: support secrets to be referenced for pod envs	2025-05-09 13:29:35 +02:00
stevenhorsman	787198f8bb	versions: Update tempfile crate Update the tempfile crate to resolve security issue [WS-2023-0045](`7247a8b6ee`) that came with the remove_dir_all dependency in prior versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-09 09:57:28 +01:00
Leonard Cohnen	b23ff6fc68	genpolicy: refactor policy test workdir setup This aligns the workdir preparation more closely with the workdir preparation for the generate integration test. Most notably, we clean up the temporary directory before we execute the tests in it. This way we better isolate different runs. Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-05-09 09:45:28 +02:00
Leonard Cohnen	bad0cd0003	genpolicy: add cli integration tests Add a new type of integration test to genpolicy. Now we can test flag handling and how the CLI behaves with certain yaml inputs. The first tests cover the case when a Pod references a Kubernetes secret or config map in another file. Those need to be explicitly added via the --config-files flag. In the future we can easily add test suites that cover that all yaml fields of all resources are understood by genpolicy. Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>	2025-05-09 09:45:28 +02:00
Leonard Cohnen	61ee330029	genpolicy: move policy enforcement integration test to separate folder In preparation for adding more types of integration tests, moving the policy enforcements test into a separate folder. Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-05-09 09:45:28 +02:00
Leonard Cohnen	2ea57aefbc	genpolicy: remove unused function Remove function that became unused in the last commit. Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>	2025-05-09 09:41:43 +02:00
Aurélien Bombo	4bb441965f	genpolicy: support arbitrary resources with -c This allows passing config maps and secrets (as well as any other resource kinds relevant in the future) using the -c flag. Fixes: #10033 Co-authored-by: Leonard Cohnen <leonard.cohnen@gmail.com> Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com> Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-05-09 09:41:43 +02:00
Hyounggyu Choi	a286a5aee8	Merge pull request #11076 from Jakob-Naucke/ap-bind-assoc Bind/associate for VFIO-AP	2025-05-09 09:32:46 +02:00
Saul Paredes	1e09dfb0df	Merge pull request #11127 from microsoft/archana1/mount-tc genpolicy: improve validation for mounts	2025-05-08 15:41:23 -07:00
stevenhorsman	17843e50bb	runtime: Switch userns packages Switch imports to resolve: ``` SA1019: "github.com/opencontainers/runc/libcontainer/userns" is deprecated: use github.com/moby/sys/userns ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-08 11:04:11 +01:00
dependabot[bot]	2c80a3edce	build(deps): bump github.com/opencontainers/runc in /src/runtime Bumps [github.com/opencontainers/runc](https://github.com/opencontainers/runc) from 1.1.12 to 1.2.0. - [Release notes](https://github.com/opencontainers/runc/releases) - [Changelog](https://github.com/opencontainers/runc/blob/main/CHANGELOG.md) - [Commits](https://github.com/opencontainers/runc/compare/v1.1.12...v1.2.0) --- updated-dependencies: - dependency-name: github.com/opencontainers/runc dependency-version: 1.2.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-08 11:02:16 +01:00
Steve Horsman	e3e0007bf7	Merge pull request #11141 from stevenhorsman/k8s-cpu-ns-exec-retry tests: k8s: Retry output of kubectl exec in k8s-cpu-ns	2025-05-07 17:11:25 +01:00
Fabiano Fidêncio	f981e8a904	Merge pull request #10833 from stevenhorsman/crio-annotations-update Crio annotations update	2025-05-07 16:05:24 +02:00
dependabot[bot]	96885a8449	build(deps): bump ring from 0.17.8 to 0.17.14 in /src/tools/agent-ctl Bumps [ring](https://github.com/briansmith/ring) from 0.17.8 to 0.17.14. - [Changelog](https://github.com/briansmith/ring/blob/main/RELEASES.md) - [Commits](https://github.com/briansmith/ring/commits) --- updated-dependencies: - dependency-name: ring dependency-version: 0.17.14 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-05-07 12:18:56 +00:00
RuoqingHe	be75391953	Merge pull request #11235 from kata-containers/dependabot/cargo/src/tools/kata-ctl/openssl-0.10.72 build(deps): bump openssl from 0.10.60 to 0.10.72 in /src/tools/kata-ctl	2025-05-07 20:17:42 +08:00
RuoqingHe	d4d737a73e	Merge pull request #10512 from ncppd/riscv64-agent agent: Support RISC-V 64-bit architecture	2025-05-07 10:56:10 +08:00
RuoqingHe	7bdfea0041	Merge pull request #11123 from kimullaa/add-path-for-kata-deploy runtime: Add Path for kata-deploy	2025-05-07 00:25:12 +08:00
RuoqingHe	b5e45601f6	Merge pull request #11116 from kimullaa/more-robust-script-path-resolution kata-debug: Make path resolution more robust	2025-05-07 00:19:04 +08:00
stevenhorsman	5472662b33	runtime: Fix Incorrect conversion between integer types Fix the high severity codeql issue by checking the value is in bounds before converting Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
stevenhorsman	4de79b9821	runtime: Ignoring deprecated warning. In the latest oci-spec, the prestart hook is deprecated. However, the docker & nerdctl tests failed when I switched to one of the newer hooks which don't run at quite the same time, so ignore the deprecation warnings for now to unblock the security fix Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
stevenhorsman	37dda6060c	runtime: Re-vendor Re-run `make vendor` after the podman -> crio annotations change Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
stevenhorsman	3740ce6e7b	runtime: Update crio annotations We've been using the github.com/containers/podman/v4/pkg/annotations module to get cri-o annotations, which has some major CVEs in, but in v5 most of the annotations were moved into crio (from 1.30) (see https://github.com/cri-o/cri-o/pull/7867). Let's switch to use the cri-o annotations module instead and remediate CVE-2024-3056. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 15:18:37 +01:00
dependabot[bot]	70b481e1ee	build(deps): bump openssl from 0.10.60 to 0.10.72 in /src/tools/kata-ctl Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.60 to 0.10.72. - [Release notes](https://github.com/sfackler/rust-openssl/releases) - [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.60...openssl-v0.10.72) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.72 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-05-06 13:56:33 +00:00
RuoqingHe	4f97e5fed3	Merge pull request #11226 from kata-containers/dependabot/cargo/src/agent/tokio-1.44.2 build(deps): bump tokio from 1.44.0 to 1.44.2	2025-05-06 21:55:18 +08:00
Fabiano Fidêncio	78bf9d7500	Merge pull request #11232 from lifupan/mtu runtime: add the mtu support for updating routes	2025-05-06 15:55:04 +02:00
Shunsuke Kimura	7177ab3827	runtime: execute using abs path Fixes: #11123 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-06 21:17:06 +09:00
Shunsuke Kimura	ddccbd4764	runtime: Add Path for kata-deploy When installing with kata-deploy, usually `/opt/kata/bin` is not in the PATH. Therefore, it will fail to execute. so add it to the PATH. Fixes: #11122 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-05-06 21:17:06 +09:00
Shunsuke Kimura	5c156a24e8	kata-debug: Make path resolution more robust Enabled to run from other scripts as source, etc. Fixes: #11115 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-06 21:16:25 +09:00
stevenhorsman	6030a64f0c	build(deps): bump tokio to 1.44.2 Bumps [tokio](https://github.com/tokio-rs/tokio) from to 1.44.2 in all components to resolve the security vuln throughout our repo Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-06 11:38:52 +01:00
RuoqingHe	89685c0cd0	Merge pull request #11225 from kata-containers/dependabot/cargo/src/dragonball/openssl-0.10.72 build(deps): bump openssl from 0.10.57 to 0.10.72	2025-05-06 18:27:45 +08:00
Fabiano Fidêncio	fb5f3eae3b	Merge pull request #11172 from ChengyuZhu6/erofs EROFS Snapshotter Support in Kata	2025-05-06 11:14:19 +02:00
Ruoqing He	384d335419	ci: Enable build-check for agent on riscv64 Enable build-check for `agent` component for riscv64 platforms. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-05-06 01:48:37 +00:00
Ruoqing He	7f9b2c0af1	ci: Enable `install_libseccomp.sh` for riscv64 `musl` target is not yet available for riscv64 as of 1.80.0 rust toolchain, set `FORTIFY_SOURCE` to 1 on riscv64 platforms. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-05-06 01:48:37 +00:00
Nikos Ch. Papadopoulos	0f2c0d38f5	agent: Create pci_root_bus_path for riscv64 `create_pci_root_bus_path` needs to be enabled on riscv64 for agent to compile and work on those platforms. Signed-off-by: Nikos Ch. Papadopoulos <ncpapad@cslab.ece.ntua.gr>	2025-05-06 01:48:37 +00:00
Fupan Li	29f9015caf	runtime-rs: rm the obsoleted ephemeral volume processing Since the ephemeral volume already has a separate volume type for processing, the processing in the virtiofs share volume can be deleted. Moreover, it is not appropriate to process the ephemeral in the share fs. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-05-06 09:45:35 +08:00
Fupan Li	6e5f3cbbeb	runtime-rs: add the ephemeral memory based volume support For k8s, there are two types of volumes based on ephemeral memory: one is the emptydir volume based on ephemeral memory, and the other is used for shm devices such as /dev/shm. Thus add a new volume type, ephemeral volume, to support those two types of volumes and remove the legacy shm volume. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-05-06 09:45:24 +08:00
ChengyuZhu6	d07b279bf1	agent:storage: Add directory creation support Implementing directory creation logic in the OverlayfsHandler to process driver options with the KATA_VOLUME_OVERLAYFS_CREATE_DIR prefix Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>	2025-05-05 23:51:44 +02:00
ChengyuZhu6	f63ec50ba3	runtime: Add EROFS snapshotter with block device support - Detection of EROFS options in container rootfs - Creation of necessary EROFS devices - Sharing of rootfs with EROFS via overlayfs Fixes: #11163 Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>	2025-05-05 23:51:13 +02:00
Archana Choudhary	fb815b77c1	genpolicy: add test for volumeMounts This patch: - adds a count check on mounts - adds various test scenarios for mounts with emptyDir volume source Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2025-05-05 15:17:50 +00:00
RuoqingHe	1cb34c4d0a	Merge pull request #11202 from RuoqingHe/2025-04-28-upgrade-rtnetlink runtime-rs: Upgrade `rust-netlink` crates	2025-05-05 21:35:45 +08:00
Fupan Li	492329fc02	runtime: add the mtu support for updating routes Some cni plugins will set the MTU of some routes, such as cilium will modify the MTU of the default route. If the mtu of the route is not set correctly, it may cause excessive fragmentation or even packet loss of network packets. Therefore, this PR adds the setting of the MTU of the route. First, when obtaining the route, if the MTU is set, the MTU will also be obtained and set to the route in the guest. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-05-04 23:12:57 +02:00
Ruoqing He	2d0f32ff96	runtime-rs: Upgrade crates from `rust-netlink` Bump `netlink-sys` to v0.8, `netlink-packet-route` to v0.22 and `rtnetlink` to v0.16 to reach a consistent state of `rust-netlink` dependencies. `bitflags` is bumped to v2.9.0 since those crates requires it. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-05-03 02:31:02 +00:00
Ruoqing He	09700478eb	runtime-rs: Group Dependencies from `rust-netlink` `rtnetlink`, `netlink-sys` and `netlink-packet-route` are from the same organization, and some of them are depending on the others, which implies the version of those crates should be chosen and dealt with carefully, group them to provide better management. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-05-03 02:29:43 +00:00
Fabiano Fidêncio	fbf7faa9f4	Merge pull request #11227 from fidencio/topic/agent-only-try-ipv6-if-stack-is-supported agent: netlink: Only add an ipv6 address if ipv6 is enabled	2025-05-02 12:31:40 +02:00
Xuewei Niu	a9b3c6a5a5	Merge pull request #11209 from lifupan/fix_slog shimv2: fix the issue logger write failed	2025-05-02 17:25:44 +08:00
Fabiano Fidêncio	79ad68cce5	Merge pull request #11230 from kimullaa/remove-wrong-qemu-option runtime: remove wrong qemu-system-x86_64 option	2025-05-02 11:18:45 +02:00
stevenhorsman	21498d401f	build(deps): bump openssl from to 0.10.72 Bumps [openssl](https://github.com/sfackler/rust-openssl) to 0.10.72. - [Release notes](https://github.com/sfackler/rust-openssl/releases) --- updated-dependencies: - dependency-name: openssl dependency-version: 0.10.72 dependency-type: indirect ... Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-05-02 09:36:50 +01:00
Fabiano Fidêncio	4ce00ea434	agent: netlink: Only add an ipv6 address if ipv6 is enabled When running Kata Containers on CSPs, the CSPs may enforce their clusters to be IPv4-only. Checking the OCI spec passed down to container, on a GKE cluster, we can see: ``` "sysctl": { ... "net.ipv6.conf.all.disable_ipv6": "1", "net.ipv6.conf.default.disable_ipv6": "1", ... }, ``` Even with ipv6 being explicitly disabled (behind our back ;-)), we've noticed that IPv6 addresses would be received, but then as IPv6 was disabled we'd break on CreatePodSandbox with the following error: ``` Warning FailedCreatePodSandBox 4s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: "update interface: Failed to add address fe80::c44c:1cff:fe84:f6b7: NetlinkError(ErrorMessage { code: Some(-13), header: [64, 0, 0, 0, 20, 0, 5, 5, 19, 0, 0, 0, 0, 0, 0, 0, 10, 64, 0, 0, 2, 0, 0, 0, 20, 0, 1, 0, 254, 128, 0, 0, 0, 0, 0, 0, 196, 76, 28, 255, 254, 132, 246, 183, 20, 0, 2, 0, 254, 128, 0, 0, 0, 0, 0, 0, 196, 76, 28, 255, 254, 132, 246, 183] })\n\nStack backtrace:\n 0: <unknown>\n 1: <unknown>\n 2: <unknown>\n 3: <unknown>\n 4: <unknown>\n 5: <unknown>\n 6: <unknown>\n 7: <unknown>\n 8: <unknown>\n 9: <unknown>\n 10: <unknown>": unknown ``` A huge shoutout to Fupan Li for helping with the debug on this one! Fixes: #11200 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-05-02 09:10:45 +02:00
Shunsuke Kimura	3dba8ddd98	runtime: remove wrong qemu-system-x86_64 option qemu-system-x86_64 does not support "-machine virt". (this is only supported by arm,aarch64) <https://people.redhat.com/~cohuck/2022/01/05/qemu-machine-types.html> Fixes: #11229 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-02 04:37:12 +09:00
Fabiano Fidêncio	7e404dd13f	Merge pull request #11228 from zvonkok/fix-kernel-modules-build gpu: Set the ARCH explicitly for driver builds	2025-05-01 21:07:20 +02:00
Zvonko Kaiser	445cad7754	gpu: Set the ARCH explicitly for driver builds Kernel Makefiles changed how they deduce the right arch, so let's set it explicitly to enable arm and amd builds. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-05-01 17:13:20 +00:00
RuoqingHe	049a4ef3a8	Merge pull request #11146 from RuoqingHe/2025-04-14-dragonball-centralize-dbs dragonball: Put local dependencies into workspace	2025-05-01 22:06:51 +08:00
RuoqingHe	bd1071aff8	Merge pull request #11174 from kata-containers/dependabot/cargo/src/mem-agent/crossbeam-channel-0.5.15 build(deps): bump crossbeam-channel from 0.5.13 to 0.5.15 in /src/mem-agent	2025-05-01 16:53:42 +08:00
Ruoqing He	61f2b6a733	dragonball: Put local dependencies into workspace Put local dependencies (mostly `dbs` crates) into workspace to avoid complex path dependencies all over the workspace. Simplify path dependency referencing. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-05-01 08:40:22 +00:00
RuoqingHe	33c69fc8bf	Merge pull request #11204 from stevenhorsman/go-security-bump-april-25 versions: Bump golang.org/x/net	2025-05-01 16:36:24 +08:00
Fabiano Fidêncio	bc66d75fe9	Merge pull request #11217 from stevenhorsman/runtime-rs-centralise-workspace-config Runtime rs centralise workspace config	2025-05-01 10:36:07 +02:00
Fupan Li	9924fbbc70	shimv2: fix the issue logger write failed It's better to open the log pipe file with the read & write option; otherwise, once containerd restarts and closes the read endpoint, the kata shim fails to write to the log pipe with a broken pipe error. Fixes: #11207 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-05-01 16:15:18 +08:00
Fabiano Fidêncio	3dfabd42c2	Merge pull request #11206 from kimullaa/fix-xfs-rootfs-type runtime: remove wrong xfs options	2025-05-01 09:05:17 +02:00
Fabiano Fidêncio	a2fbc598b8	Merge pull request #11223 from microsoft/cameronbaird/revert-aks-extension-pin ci: revert temp: ci: Fix AKS cluster creation	2025-05-01 08:33:12 +02:00
Shunsuke Kimura	62639c861e	runtime: remove wrong xfs options "data=ordered" and "errors=remount-ro" are wrong options in xfs. (they are ext4 options) <https://manpages.ubuntu.com/manpages/focal/man5/xfs.5.html> Fixes: #11205 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-05-01 07:56:39 +09:00
Cameron Baird	6e21d14334	Revert "temp: ci: Fix AKS cluster creation" This reverts commit `1de466fe84`. The latest release of the az aks extension fixes the issue https://github.com/Azure/azure-cli-extensions/blob/main/src/aks-preview/HISTORY.rst#1400b5 Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-30 21:24:42 +00:00
stevenhorsman	a126884953	runtime-rs: Share workspace config Update the runtime-rs workspace packages to use workspace package versions where applicable to centralise the config and reduce maintenance when updating these Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 19:40:47 +01:00
stevenhorsman	f8fcd032ef	workflow: Set RUST_LIB_BACKTRACE=0 As discussed in #9538, with anyhow >=1.0.77 we have test failures due to backtrace behaviour changing, so set RUST_LIB_BACKTRACE=0, so that we only have backtrace on panics Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 19:38:13 +01:00
stevenhorsman	ffbaa793a3	versions: Update crossbeam-channel Update all crossbeam-channel for all non-agent packages (it was done separately in #11175) to 0.5.15 to get them on latest version and remove the versions with a vulnerability Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 19:36:40 +01:00
Steve Horsman	b97bc03ecb	Merge pull request #11211 from stevenhorsman/dragonball-lockfiles dragonball: Remove package lockfiles	2025-04-30 19:34:58 +01:00
stevenhorsman	f910c7535a	ci: Workaround cargo deny issue When a PR has no new files the cargo deny runner fails with: ``` [cargo-deny-generator.sh:17] ERROR: changed_files_status= ``` so add `\|\| true` to try and help this Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 16:27:25 +01:00
stevenhorsman	f2a2117252	tests: k8s: Retry output of kubectl exec in k8s-cpu-ns We are seeing failures in this test, where the output of the kubectl exec command seems to be blank, so try retrying the exec like #11024 Fixes: #11133 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 15:01:08 +01:00
stevenhorsman	97f7d49e8e	dragonball: Remove package lockfiles Since #10780 the dbs crates are managed as members of the dragonball workspace, so we can remove the lockfile as it's now workspace managed Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-30 09:14:07 +01:00
Steve Horsman	8045cb982c	Merge pull request #11208 from kata-containers/dependabot/cargo/src/runtime-rs/tokio-1.38.2 build(deps): bump tokio from 1.38.0 to 1.38.2 in /src/runtime-rs	2025-04-30 08:44:51 +01:00
Aurélien Bombo	46af7cf817	Merge pull request #11077 from microsoft/cameronbaird/address-gid-mismatch genpolicy: Align GID behavior with CRI and enable GID policy checks.	2025-04-29 22:23:23 +01:00
Aurélien Bombo	19371e2d3b	Merge pull request #11164 from wainersm/fix_kbs_on_aks tests/k8s: fix kbs installation on Azure AKS	2025-04-29 18:25:14 +01:00
Steve Horsman	6c1fafb651	Merge pull request #11210 from kata-containers/dependabot/cargo/src/tools/runk/tokio-1.44.2 build(deps): bump tokio from 1.38.0 to 1.44.2 in /src/tools/runk	2025-04-29 16:43:58 +01:00
Steve Horsman	3c8cc0cdbf	Merge pull request #11212 from BbolroC/add-cc-vfio-ap-test-s390x GHA: Add VFIO-AP to s390x nightly tests for CoCo	2025-04-29 16:15:00 +01:00
Steve Horsman	a6d1dc7df3	Merge pull request #10940 from ldoktor/peer-pods ci.ocp: Add peer-pods setup script	2025-04-29 15:57:30 +01:00
Hyounggyu Choi	63b9ae3ed0	GHA: Add VFIO-AP to s390x nightly tests for CoCo As #11076 introduces VFIO-AP bind/associate functions for IBM Secure Execution (SEL), a new internal nightly test has been established. This PR adds a new entry `cc-vfio-ap-e2e-tests` to the existing matrix to share the test result. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-04-29 16:06:12 +02:00
Steve Horsman	8b32846519	Merge pull request #10882 from stevenhorsman/kbs-logging-on-failure tests: confidential: Add KBS logging	2025-04-29 13:29:21 +01:00
dependabot[bot]	7163d7d89b	build(deps): bump tokio from 1.38.0 to 1.38.2 in /src/runtime-rs Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.38.0 to 1.38.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.38.0...tokio-1.38.2) --- updated-dependencies: - dependency-name: tokio dependency-version: 1.38.2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-04-29 12:21:58 +00:00
dependabot[bot]	2992a279ab	build(deps): bump tokio from 1.38.0 to 1.44.2 in /src/tools/runk Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.38.0 to 1.44.2. - [Release notes](https://github.com/tokio-rs/tokio/releases) - [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.38.0...tokio-1.44.2) --- updated-dependencies: - dependency-name: tokio dependency-version: 1.44.2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2025-04-29 12:14:41 +00:00
Fabiano Fidêncio	e5cc9acab8	Merge pull request #11175 from kata-containers/dependabot/cargo/src/agent/crossbeam-channel-0.5.15 build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 in /src/agent	2025-04-29 14:13:25 +02:00
Fabiano Fidêncio	a9893e83b8	Merge pull request #11203 from stevenhorsman/high-severity-security-bumps-april-25 rust: High severity security bumps april 25	2025-04-29 14:10:05 +02:00
stevenhorsman	52b2662b75	tests: confidential: Add KBS logging To help with debugging, add logging of the KBS, like the container system logs, if the confidential test fails Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-29 09:48:18 +01:00
stevenhorsman	bcffe938ca	versions: Bump golang.org/x/net Bump golang.org/x/net to 0.38.0 as dependabot isn't doing it for these packages to remediate CVE-2025-22872 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-29 09:46:48 +01:00
Steve Horsman	57527c1ce4	Merge pull request #11161 from kata-containers/dependabot/go_modules/src/runtime/golang.org/x/net-0.38.0 build(deps): bump golang.org/x/net from 0.33.0 to 0.38.0 in /src/runtime	2025-04-29 09:39:30 +01:00
Cameron Baird	70ef0376fb	genpolicy: Introduce special handling for clusters using nydus Nydus+guest_pull has specific behavior where it improperly handles image layers on the host, causing the CRI to not find /etc/passwd and /etc/group files on container images which have them. This unfortunately causes different outcomes w.r.t. GID used which we are trying to enforce with policy. This behavior is observed/explained in https://github.com/kata-containers/kata-containers/issues/11162 Handle this exception with a config.settings.cluster_config.guest_pull field. When this is true, simply ignore the /etc/* files in the container image as they will not be parsed by the CRI. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 20:18:42 +00:00
Cameron Baird	d3b652014a	genpolicy: Introduce genpolicy tests for security contexts Add security context testcases for genpolicy, verifying that UID and GID configurations controlled by the kubernetes security context are enforced. Also, fix the other CreateContainerRequest tests' expected contents to reflect our new genpolicy parsing/enforcement of GIDs. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:31 +00:00
Cameron Baird	fc75aee13a	ci: Add CI tests for runAsGroup, GID policy Introduce tests to check for policy correctness on a redis deployment with 1. a pod-level securityContext 2. a container-level securityContext which shadows the pod-level securityContext 3. a pod-level securityContext which selects an existing user (nobody), causing a new GID to be selected. Redis is an interesting container image to test with because it includes a /etc/passwd file with existing user/group configuration of 1000:1000 baked in. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:31 +00:00
Cameron Baird	938ddeaf1e	genpolicy: Enable GID checks in rules.rego With fixes to align policy GID parsing with the CRI behavior, we can now enable policy verification of GIDs. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:31 +00:00
Cameron Baird	eb2c7f4150	genpolicy: Integrate /etc/passwd from OCI container when setting GIDs The GID used for the running process in an OCI container is a function of 1. The securityContext.runAsGroup specified in a pod yaml, 2. The UID:GID mapping in /etc/passwd, if present in the container image layers, 3. Zero, even if the userstr specifies a GID. Make our policy engine align with this behavior by: 1. At the registry level, always obtain the GID from the /etc/passwd file if present. Ignore GIDs specified in the userstr encoded in the OCI container. 2. After an update to UID due to securityContexts, perform one final check against the /etc/passwd file if present. The GID used for the running process is the mapping in this file from UID->GID. 3. Override everything above with the GID of the securityContext configuration if provided Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:31 +00:00
Cameron Baird	c13d7796ee	genpolicy: Parse secContext runAsGroup and allowPrivilegeEscalation Our policy should cover these fields for securityContexts at the pod or container level of granularity. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:31 +00:00
Cameron Baird	349ce8c339	genpolicy: Refactor registry user/group parsing to account for all cases The get_process logic in registry.rs did not account for all cases (username:groupname), did not defer to contents of /etc/group, /etc/passwd when it should, and was difficult to read. Clean this implementation up, factoring the string parsing for user/group strings into their own functions. Enable the registry::Container class to query /etc/passwd and /etc/group, if they exist. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-04-28 16:28:29 +00:00
Wainer dos Santos Moschetta	460c3394dd	gha: run CoCo non-TEE tests on "all" host type By running on "all" host type there are two consequences: 1) run the "normal" tests too (until now, it's only "small" tests), so increasing the coverage 2) create AKS cluster with larger VMs. This is a new requirement due to the current ingress controller for the KBS service eating too many vCPUs and leaving only a few for the tests (resulting in failures) Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-04-28 12:08:31 -03:00
Wainer dos Santos Moschetta	945482ff6e	tests: make _print_instance_type() to handle "all" host type _print_instance_type() returns the instance type of the AKS nodes, based on the host type. Tests are grouped per host type in "small" and "normal" sets based on the CPU requirements: "small" tests require few CPUs and "normal" more. There is a third case: "all" host type maps to the union of "small" and "normal" tests, which should be handled by _print_instance_type() properly. In this case, it should return the largest instance type possible because "normal" tests will be executed too. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-04-28 12:08:31 -03:00
Wainer dos Santos Moschetta	a66aac0d77	tests/k8s: optimize nginx ingress for AKS small VM An AKS managed ingress controller is used, which keeps two nginx pod replicas that each request 500m of CPU. On small VMs like the ones we've used on CI for running the CoCo non-TEE tests, this left only a small amount of CPU for the tests. Actually, one of these pod replicas won't even get started. So let's patch the ingress controller to have only one replica of nginx. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-04-28 12:08:31 -03:00
Wainer dos Santos Moschetta	14e74b8fc9	tests/k8s: fix kbs installation on Azure AKS The Azure AKS addon-http-application-routing add-on is deprecated and cannot be enabled on new clusters which has caused some CI jobs to fail. Migrated our code to use approuting instead. Unlike addon-http-application-routing, this add-on doesn't configure a managed cluster DNS zone, but the created ingress has a public IP. To avoid having to deal with DNS setup, we will be using that address from now on. Thus, some functions no longer used are deleted. Fixes #11156 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-04-28 12:08:31 -03:00
Fabiano Fidêncio	03ab774ed5	helm: Avoid appending the multiInstallSuffix several times Once the multiInstallSuffix has been taken into account, we should not keep appending it on every re-run/restart, as that would lead to a path that does not exist. Fixes: #11187 Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-28 16:36:38 +02:00
stevenhorsman	c938c75af0	versions: kata-ctl: Bump rustls Bump rustls version to > 0.21.11 to remediate high severity CVE-2024-32650 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-28 14:55:59 +01:00
stevenhorsman	2ee7ef6aa3	versions: agent-ctl: Bump hashbrown Bump hashbrown to >= 0.15.1 to remediate the high severity security alert that was in v0.15.0 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-28 14:55:46 +01:00
stevenhorsman	e3d3a2843f	versions: Bump mio to at least 0.8.11 Ensure that all the versions of mio we use are at least 0.8.11 to remediate CVE-2024-27308 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-28 14:55:46 +01:00
stevenhorsman	973bd7c2b6	build(deps): bump golang.org/x/net from 0.33.0 to 0.38.0 in /src/runtime Bumps [golang.org/x/net](https://github.com/golang/net) from 0.33.0 to 0.38.0. - [Commits](https://github.com/golang/net/compare/v0.33.0...v0.38.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-version: 0.38.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-28 14:09:54 +01:00
Steve Horsman	9248634baa	Merge pull request #11098 from stevenhorsman/golang-1.23.7 versions: Bump golang version	2025-04-28 13:46:11 +01:00
Fabiano Fidêncio	ee344aa4e9	Merge pull request #11185 from fidencio/topic/reclaim-guest-freed-memory-backport-from-runtime-rs runtime: clh: Add reclaim_guest_freed_memory [BACKPORT]	2025-04-28 12:32:33 +02:00
Steve Horsman	4f703e376b	Merge pull request #11201 from BbolroC/remove-non-tee-from-required-tests ci: Remove run-k8s-tests-coco-nontee from required tests	2025-04-28 10:05:07 +01:00
Hyounggyu Choi	9fe70151f7	ci: Remove run-k8s-tests-coco-nontee from required tests In #11044, `run-k8s-tests-coco-nontee` was set as required by mistake. This PR disables the test again. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-04-28 10:48:08 +02:00
Steve Horsman	83d31b142b	Merge pull request #11044 from Jakob-Naucke/basic-s390x-ci ci: Extend basic s390x tests	2025-04-28 09:14:00 +01:00
Fupan Li	3457572130	Merge pull request #10579 from Apokleos/pcilibs-rs kata-sys-utils: Introduce pcilibs for getting pci devices info	2025-04-27 16:39:40 +08:00
Alex Lyn	43b5a616f6	Merge pull request #11166 from Apokleos/memcfg-adjust kata-types: Optimize memory adjuesting by only gathering memory info	2025-04-27 15:57:45 +08:00
Fabiano Fidêncio	b747f8380e	clh: Rework CreateVM to reduce the amount of cycles Otherwise the static checks will whip us as hard as possible. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-25 21:30:47 +02:00
Champ-Goblem	9f76467cb7	runtime: clh: Add reclaim_guest_freed_memory [BACKPORT] We're bringing to Cloud Hypervisor only the reclaim_guest_freed_memory option already present in the runtime-rs. This allows us to use virtio-balloon for the hypervisor to reclaim memory freed by the guest. The reason we're not touching other hypervisors is because we're very much aware of avoiding to clutter the go code at this point, so we'll leave it for whoever really needs this on other hypervisor (and trust me, we really do need it for Cloud Hypervisor right now ;-)). Signed-off-by: Champ-Goblem <cameron@northflank.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-25 21:05:53 +02:00
Fabiano Fidêncio	1c72d22212	Merge pull request #11186 from fidencio/topic/kernel-add-taskstats-to-the-config kernel: Add CONFIG_TASKSTATS (and related) configs	2025-04-25 15:28:04 +02:00
Steve Horsman	213f9ddd30	Merge pull request #11191 from fidencio/topic/release-3.16.0-bump release: Bump version to 3.16.0	2025-04-25 09:04:31 +01:00
Fabiano Fidêncio	fc4e10b08d	release: Bump version to 3.16.0 Bump VERSION and helm-chart versions Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-25 08:17:15 +02:00
Fabiano Fidêncio	b96685bf7a	Merge pull request #11153 from fidencio/topic/build-allow-choosing-which-runtime-will-be-built build: Allow users to build the go, rust, or both runtimes	2025-04-25 08:13:07 +02:00
Fabiano Fidêncio	800c05fffe	Merge pull request #11189 from kata-containers/sprt/fix-create-cluster temp: ci: Fix AKS cluster creation	2025-04-24 23:01:12 +02:00
Aurélien Bombo	1de466fe84	temp: ci: Fix AKS cluster creation The AKS CLI recently introduced a regression that prevents using aks-preview extensions (Azure/azure-cli#31345), and hence create CI clusters. To address this, we temporarily hardcode the last known good version of aks-preview. Note that I removed the comment about this being a Mariner requirement, as aks-preview is also a requirement of AKS App Routing, which will be introduced soon in #11164. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-04-24 15:06:14 -05:00
Dan Mihai	706c2e2d68	Merge pull request #11184 from microsoft/danmihai1/retry-genpolicy ci: retry genpolicy execution	2025-04-24 08:01:22 -07:00
Champ-Goblem	cf4325b535	kernel: Add CONFIG_TASKSTATS (and related) configs Knowing that the upstream project provides a "ready to use" version of the kernel, it's good to include an easy way for users to monitor performance, and that's what we're doing by enabling the TASKSTATS (and related) kernel configs. This has been present as part of older kernels, but I couldn't reasonably find the reason why it's been dropped. Signed-off-by: Champ-Goblem <cameron@northflank.com> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-24 11:51:21 +02:00
Fabiano Fidêncio	7e9e9263d1	build: Allow users to build the go, rust, or both runtimes Let's add a RUNTIME_CHOICE env var that can be passed to the build scripts, which allows the user to select whether they build the go runtime, the rust runtime, or both. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-24 10:36:26 +02:00
Alex Lyn	8b49564c01	Merge pull request #10610 from Xynnn007/faet-initdata-rbd Feat | Implement initdata for bare-metal/qemu hypervisor	2025-04-24 09:59:14 +08:00
Alex Lyn	e8f19609b9	Merge pull request #11150 from zvonkok/cdi-annotations gpu: Fix CDI annotations	2025-04-24 09:58:16 +08:00
Dan Mihai	517d6201f5	ci: retry genpolicy execution genpolicy is sending more HTTPS requests than other components during CI so it's more likely to be affected by transient network errors similar to: ConnectError( "dns error", Custom { kind: Uncategorized, error: "failed to lookup address information: Try again", }, ) Note that genpolicy is not the only component hitting network errors during CI. Recent example from a different component: "Message: failed to create containerd task: failed to create shim task: failed to async pull blob stream HTTP status server error (502 Bad Gateway)" This CI change might help just with the genpolicy errors. Fixes: #11182 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-04-23 21:38:12 +00:00
Zvonko Kaiser	3946435291	gpu: Handle VFIO devices with DevicePlugin and CDI We can provide devices during cold-plug with CDI annotation on a Pod level and add per container device information with the device plugin. Since the sandbox has already attached the VFIO device remove them from consideration and just apply the inner runtime CDI annotation. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Zvonko Kaiser	486244b292	gpu: Remove unneeded parsing of CDI devices The addition of CDI devices is now done for single_container and pod_sandbox and pod_container before the devmanager creates the deviceinfos, so there is no need for extra parsing. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Zvonko Kaiser	6713db8990	gpu: Add CDI parsing for Sandbox as well Extend the CDI parsing for pod_sandbox as well, only single_container was covered properly. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Zvonko Kaiser	97f4bcb456	gpu: Remove CDI annotations for outer runtime After the outer runtime has processed the CDI annotation from the spec we can delete them since they were converted into Linux devices in the OCI spec. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-23 21:02:06 +00:00
Steve Horsman	6102976d2d	Merge pull request #11178 from stevenhorsman/gperf-mirror versions: Switch gperf mirror	2025-04-23 20:21:42 +01:00
stevenhorsman	09052faaa0	versions: Switch gperf mirror Every so often the main gnu site has an outage, so we can't download gperf. GNU provides the generic URL https://ftpmirror.gnu.org to automatically choose a nearby and up-to-date mirror, so switch to this to help avoid this problem Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-23 15:29:54 +01:00
stevenhorsman	ed56050a99	versions: Bump golangci-lint version v1.60.0+ is needed for go 1.23 support, so bump to the current latest 1.x version Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-23 12:37:48 +01:00
stevenhorsman	1c9d7ce0eb	ci: cri-containerd: Remove source from install_go.sh If the correct version of go is already installed then install_go.sh runs `exit`. When calling this as source from cri-containerd/gha-run.sh it means all dependencies after are skipped, so remove this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-23 12:37:48 +01:00
stevenhorsman	c37840ce80	versions: Bump golang version Bump golang version to the latest minor 1.23.x release now that 1.24 has been released and 1.22.x is no longer stable and receiving security fixes Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-23 12:37:48 +01:00
dependabot[bot]	463fd4eda4	build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 in /src/agent Bumps [crossbeam-channel](https://github.com/crossbeam-rs/crossbeam) from 0.5.14 to 0.5.15. - [Release notes](https://github.com/crossbeam-rs/crossbeam/releases) - [Changelog](https://github.com/crossbeam-rs/crossbeam/blob/master/CHANGELOG.md) - [Commits](https://github.com/crossbeam-rs/crossbeam/compare/crossbeam-channel-0.5.14...crossbeam-channel-0.5.15) --- updated-dependencies: - dependency-name: crossbeam-channel dependency-version: 0.5.15 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com>	2025-04-23 11:34:14 +00:00
Steve Horsman	1ffce3ff70	Merge pull request #11173 from stevenhorsman/update-before-install workflows: Add apt update before install	2025-04-23 12:32:54 +01:00
stevenhorsman	ccfdf59607	workflows: Add apt update before install Add apt/apt-get updates before we do apt/apt-get installs to try and help with issues where we fail to fetch packages Co-authored-by: Fabiano Fidêncio <fidencio@northflank.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-23 09:06:08 +01:00
Xynnn007	b1c72c7094	test: add integration test for initdata This test exercises initdata with the following logic: 1. Enable image signature verification via kernel commandline 2. Set Trustee address via initdata 3. Pull an image from a banned registry 4. Check that the pull fails with the log `image security validation failed`, which shows that initdata works. Note that if initdata does not work, the pod still fails to launch. But the error information is `[CDH] [ERROR]: Get Resource failed` which internally means that the KBS URL has not been set correctly. This test now only runs on qemu-coco-dev+x86_64 and qemu-tdx Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-04-23 15:55:04 +08:00
RuoqingHe	ef12dcd7da	Merge pull request #11158 from RuoqingHe/2025-04-15-fix-flag-calc runtime-rs: Use bitwise or assign for bitflags	2025-04-23 15:20:33 +08:00
alex.lyn	9eb3fcb84b	kata-types: Clean up noise caused by unformatted code For a long time, there has been unformatted code in the kata-types codebase, for example: ``` if qemu.memory_info.enable_guest_swap { - return Err(eother!( - "Qemu hypervisor doesn't support enable_guest_swap" - )); + return Err(eother!("Qemu hypervisor doesn't support enable_guest_swap")); } ... - }, device::DRIVER_NVDIMM_TYPE, eother, resolve_path + }, + device::DRIVER_NVDIMM_TYPE, + eother, resolve_path, -use std::collections::HashMap; -use anyhow::{Result, anyhow}; +use anyhow::{anyhow, Result}; use std::collections::hash_map::Entry; +use std::collections::HashMap; -/// DRIVER_VFIO_PCI_GK_TYPE is the device driver for vfio-pci +/// DRIVER_VFIO_PCI_GK_TYPE is the device driver for vfio-pci ``` This has brought unnecessary difficulties in version maintenance and commit difficulties. This commit will address this issue. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:40:07 +08:00
alex.lyn	97a1942f86	kata-types: Optimize memory adjuesting by only gathering memory info The Configuration initialization was observed to be significantly slow due to the extensive system information gathering performed by `sysinfo::System::new_all()`. This function collects data on CPU, memory, disks, and network, most of which is unnecessary for Kata's memory adjusting config phase, where only the total system memory is required. This commit optimizes the initialization process by implementing a more targeted approach to retrieve only the total system memory. This avoids the overhead of collecting a large amount of irrelevant data, resulting in a noticeable performance improvement. Fixes #11165 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:40:07 +08:00
alex.lyn	3e77377be0	kata-sys-utils: Add test cases for devices In this, the crate mockall is introduced to help mock get_all_devices. Fixes #10556 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:32:04 +08:00
alex.lyn	f714b6c049	kata-sys-utils: Add test cases for pci manager Fixes #10556 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:32:04 +08:00
alex.lyn	0cdc05ce0a	kata-sys-utils: Introduce method to help handle proper BAR memory We need more information (BAR memory and other future features...) for PCI devices when vfio devices are passed through. So the method get_bars_max_addressable_memory is introduced for vfio devices to deduce the memory_reserve and pref64_reserve for NVIDIA devices. But it will be extended for other devices. Fixes #10556 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:32:04 +08:00
alex.lyn	f5eaaa41d5	kata-sys-utils: Introduce pcilibs to help get pci device info It's the basic framework for getting information of pci devices. Currently, we focus on the PCI Max bar memory size, but it'll be extended in the future. Fixes #10556 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-23 09:32:04 +08:00
Ruoqing He	d7f4b6cbef	runtime-rs: Use bitwise or assign for bitflags Use `|=` instead of `+=` while calculating and iterating through a vector of flags, which makes more sense and prevents situations like duplicated flags in vector, which would cause problems. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-04-22 23:55:11 +00:00
Jakob Naucke	1c3b1f5adb	ci: Extend basic s390x tests Currently, s390x only tests cri-containerd. Partially converge to the feature set of basic-ci-amd64: - containerd-sandboxapi - containerd-stability - docker with the appropriate hypervisors. Do not run tests currently skipped on amd64, as well as - agent-ctl, which we don't package for s390x - nerdctl, does not package the `full` image for s390x - nydus, does not package for s390x Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-04-22 21:34:02 +02:00
Aurélien Bombo	bf93b5daf1	Merge pull request #11113 from Sumynwa/sumsharma/policy_execprocess_container_id genpolicy: Add container_id & related policy container data to state.	2025-04-22 18:37:58 +01:00
Aurélien Bombo	318c409ed6	Merge pull request #11126 from gkurz/rootfs-systemd-files rootfs: Don't remove files from the rootfs by default	2025-04-22 18:17:14 +01:00
Aurélien Bombo	12594a9f9e	Merge pull request #11157 from wainersm/make_nontee_job_not_required ci: demote CoCo non-TEE to non-required from gatekeeper	2025-04-22 18:15:28 +01:00
Greg Kurz	734e7e8c54	rootfs: Don't remove files from the rootfs by default Recent PR #10732 moved the deletion of systemd files and units that were deemed unnecessary by `02b3b3b977` from `image_builder.sh` to `rootfs.sh`. This unfortunately broke `rootfs.sh centos` and `rootfs.sh -r` as used by some other downstream users like fedora and RHEL, with the following error : Warning FailedCreatePodSandBox 1s (x5 over 63s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = CreateContainer failed: Establishing a D-Bus connection Caused by: 0: I/O error: Connection reset by peer (os error 104) 1: Connection reset by peer (os error 104) This is because the aforementioned distros use dbus-broker [1] that requires systemd-journald to be present. It is questionable that systemd units or files should be deemed unnecessary for _all_ distros but this has been around since 2019. There's now also a long-standing expectation from CI that `make rootfs && make image` does remove these files. In order to accommodate all the expectations, add a `-d` flag to `rootfs.sh` to delete the systemd files and have `make rootfs` use it. [1] https://github.com/bus1/dbus-broker Reported-by: Niteesh Dubey <niteesh@us.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org>	2025-04-17 16:53:05 +02:00
Zvonko Kaiser	497ab9faaf	Merge pull request #10999 from zvonkok/rootfs-updates gpu: Update creation permissions	2025-04-16 10:15:38 -04:00
Wainer dos Santos Moschetta	90397ca4fe	ci: demote CoCo non-TEE to non-required from gatekeeper The CoCo non-TEE job has failed due to the removal of an add-on from AKS, causing KBS to not get installed (see #11156). The fix should be done in this repo as well as in trustee, which can take some time. We don't want to hold kata-containers PRs from getting merged any longer, so removing the job from required list. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-04-15 19:00:30 -03:00
Wainer Moschetta	ff9fb19f11	Merge pull request #11026 from ldoktor/e2e-resources ci.ocp: Override default runtimeclass CPU resources	2025-04-15 10:33:35 -03:00
Lukáš Doktor	bfdf4e7a6a	ci.ocp: Add peer-pods setup script this script will be used in a new OCP integration pipeline to monitor basic workflows of OCP+peer-pods. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-04-15 12:13:22 +02:00
Xynnn007	91bb6b7c34	runtime: add support for io.katacontainers.config.runtime.cc_init_data io.katacontainers.config.runtime.cc_init_data specifies initdata used by the pod in base64(gzip(initdata toml)) format. The initdata will be encapsulated into an initdata image and mount it as a raw block device to the guest. The initdata image will be aligned with 512 bytes, which is chosen as a usual sector size supported by different hypervisors like qemu, clh and dragonball. Note that this patch only adds support for qemu hypervisor. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-04-15 16:35:59 +08:00
Sumedh Sharma	2a17628591	genpolicy: Add container_id & related policy container data to state. This commit adds changes to add input container_id and related container data to state after a CreateContainerRequest is allowed. This helps constrain reference container data for evaluating request inputs to one instead of matching against every policy container data, Ex: in ExecProcessRequest inputs. Fixes #11109 Signed-off-by: Sumedh Sharma <sumsharma@microsoft.com>	2025-04-15 14:02:59 +05:30
Zvonko Kaiser	2f28be3ad9	gpu: Update creation permissions We need to make sure the device files are created correctly in the rootfs otherwise kata-agent will apply permission 0o000. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-04-14 21:02:34 +00:00
Fabiano Fidêncio	bfd4b98355	Merge pull request #11142 from fidencio/topic/build-scripts-improvements-for-users build: User-facing improvements for the build scripts	2025-04-14 19:28:12 +02:00
Fabiano Fidêncio	5e363dc277	virtiofsd: Update to v1.13.1 It's been released for some time already ... and although we did have the necessary patches in, we'd better stick to a released version of the project. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-14 13:23:31 +02:00
Fabiano Fidêncio	2fef594f14	build: Allow users to define AGENT_POLICY Although this is mostly used for Kata Containers backing Confidential Computing use cases, it also has benefits for the normal Kata Containers use cases, thus it's left enabled by default. However, let's allow users to specify whether or not they want to have it enabled, as depending on their use-case, it just does not make sense. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-14 10:02:22 +02:00
Fabiano Fidêncio	5d0688079a	build: Allow users to specificy EXTRA_PKGS Right now we've had some logic to add EXTRA_PKGS, but those were restricted to the nvidia builds, and would require changing the file manually. Let's make sure a user can add this just by specifying an env var. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-14 10:02:22 +02:00
Fabiano Fidêncio	40a15ac760	build: Allow adding a guest-hook to the rootfs Kata Containers provides, since forever, a way to run OCI guest-hooks from the rootfs, as long as the files are dropped in a specific location defined in the configuration.toml. However, so far, it's been up to the ones using it to hack the generated image in order to add those guest hooks, which is far from handy. Let's add a way for the ones interested in this feature to just drop a tarball file under the same known build directory, specify an env var, and let the guest hooks be installed during the rootfs build. Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-14 10:02:16 +02:00
RuoqingHe	0b4fea9382	Merge pull request #11134 from stevenhorsman/rust-toolchain rust: Add rust-toolchain.toml	2025-04-12 15:03:29 +08:00
Steve Horsman	792180a740	Merge pull request #11105 from stevenhorsman/required-tests-process-update doc: Update required job process	2025-04-11 14:53:27 +01:00
stevenhorsman	93830cbf4d	rust: Add rust-toolchain.toml Add a top-level rust-toolchain.toml with the version that matches version.yaml to ensure that we stay in sync Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-11 09:24:04 +01:00
Steve Horsman	ad68cb9afa	Merge pull request #11106 from stevenhorsman/rust-workspace-settings agent: Inherit rust workspace settings	2025-04-10 09:47:53 +01:00
Xynnn007	17d0db9865	agent: add initdata parse logic Kata-agent now will check if a device /dev/vd* with 'initdata' magic number exists. If it exists, kata-agent will try to read it. Bytes 9~16 are the length of the compressed initdata toml in little endian. Bytes starting from 17 are the compressed initdata. The initdata image device layout looks like 0 8 16 16+length ... EOF 'initdata' length gzip(initdata toml) paddings The initdata will be parsed and put as aa.toml, cdh.toml and policy.rego to /run/confidential-containers/initdata. When AgentPolicy is initialized, the default policy will be overwritten by that. When AA is to be launched, if initdata is once processed, the launch arg will include --initdata parameter. Also, if /run/confidential-containers/initdata/aa.toml exists, the launch args will include -c /run/confidential-containers/initdata/aa.toml. When CDH is to be launched, if initdata is once processed, the launch args will include -c /run/confidential-containers/initdata/cdh.toml Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2025-04-10 13:09:51 +08:00
stevenhorsman	75dc4ce3bf	doc: Update required job process Add information about using required-tests.yaml as a way to track jobs that are required. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 18:13:45 +01:00
Steve Horsman	0dbf4ec39f	Merge pull request #10678 from stevenhorsman/update-gatekeeper-rules-for-md-only-PRs ci: Update gatekeeper tests for md files	2025-04-09 18:10:05 +01:00
stevenhorsman	d1d60cfe89	ci: Update gatekeeper tests for md files Update the required-tests.yaml so that .md files only trigger the static tests, not the build, or CI Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 17:55:27 +01:00
Steve Horsman	9b401cd250	Merge pull request #11090 from stevenhorsman/required-test-updates ci: required-tests fixes/updates	2025-04-09 14:41:57 +01:00
stevenhorsman	576747b060	ci: Skip tests if we only update the required list When making new tests required, or removing existing tests from required, this doesn't impact the CI jobs, so we don't need to run all the tests. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 14:22:47 +01:00
stevenhorsman	9a7c5b914e	ci: required-tests fixes/updates - Remove metrics setup job - Update some truncation typos of job names - Add shellcheck-required - Remove the ok-to-test as a required label on the build test as it isn't needed as a trigger Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 14:22:37 +01:00
Xuewei Niu	5774f131ec	Merge pull request #10938 from Apokleos/fix-iommugrp-symlink runtime-rs: Simplify iommu group base name extraction from symlink	2025-04-09 19:23:48 +08:00
Xuewei Niu	fd9a4548ab	Merge pull request #11129 from RuoqingHe/entend-runtime-rs-workspace runtime-rs: Extend runtime-rs workspace and centralize local dependencies	2025-04-09 19:23:15 +08:00
stevenhorsman	6603cf7872	agent: Update vsock-exporter to use workspace settings To reduce duplication, we could update the vsock-exporter crate to use settings and versions from the agent, where applicable. > [!NOTE] > In order to use the workspace, this has bumped some crate versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 12:02:43 +01:00
stevenhorsman	2cb9fd3c69	agent: Update rustjail to use workspace settings - To reduce duplication, we could update the rustjail crate to use settings and versions from the agent, where applicable. - Also switch to using the derive feature in serde crate rather than the separate serde_derive to avoid keeping both versions in sync > [!NOTE] > In order to use the workspace, this has bumped some crate versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 12:02:43 +01:00
stevenhorsman	655255b50c	agent: Update policy to use workspace settings To reduce duplication, we could update the policy crate to use settings and versions from the agent, where applicable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 11:42:05 +01:00
stevenhorsman	1bec432ffa	agent: Create workspace package and dependencies - Create agent workspace dependencies and package info so that the packages in the workspace can use them - Group the local dependencies together for clarity (like in #11129) Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-09 11:42:00 +01:00
Ruoqing He	28c09ae645	runtime-rs: Put local dependencies into workspace Put local dependencies into workspace to avoid complex path dependencies all over the workspace. This gives an overview of local dependencies this workspace uses, where those crates are located, and simplifies the local dependencies referencing process. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-04-09 07:30:29 +00:00
Ruoqing He	3769ad9c0d	runtime-rs: Group local dependencies Judging by the layout of the `Cargo.toml` files, local dependencies are intentionally separated from other dependencies, let's enforce it workspace-wise. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-04-09 03:52:16 +00:00
Ruoqing He	abb5fb127b	runtime-rs: Extend workspace to cover all crates Only `shim` and `shim-ctl` are incorporated in `runtime-rs`'s workspace, let's extend it to cover all crates in `runtime-rs/crates`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-04-09 03:51:48 +00:00
alex.lyn	58bebe332a	runtime-rs: Simplify iommu group base name extraction from symlink Getting the base name from the iommu group symlink is enough, as the validation will be handled in subsequent steps when constructing the full path /sys/kernel/iommu_groups/$iommu_group. This PR removes the duplicated validation of iommu_group. Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-04-09 09:28:00 +08:00
Steve Horsman	8df271358e	Merge pull request #11128 from stevenhorsman/disable-metrics-jobs ci: Remove metric jobs	2025-04-08 18:16:35 +01:00
stevenhorsman	e6cca9da6d	ci: Remove metric jobs The metrics runner is broken, so skip the metrics jobs to stop the CI being stuck waiting. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-04-08 17:55:07 +01:00
RuoqingHe	713cbb0c62	Merge pull request #11121 from fidencio/topic/bump-kernel-lts versions: Bump LTS kernel	2025-04-08 17:28:31 +08:00
Xuewei Niu	d3c9cc4e36	Merge pull request #11014 from teawater/mem-agent-doc docs: Add how-to-use-memory-agent.md to howto	2025-04-08 17:20:25 +08:00
Fabiano Fidêncio	a40b919afe	Merge pull request #10724 from likebreath/0109/upgrade_clh_v43.0 versions: Upgrade to Cloud Hypervisor v45.0	2025-04-08 08:11:30 +02:00
Fabiano Fidêncio	bc04c390bd	versions: Bump LTS kernel 6.12.22 was released yesterday, let's bump to it. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-04-07 21:46:29 +02:00
Bo Chen	ee84068aed	versions: Upgrade to Cloud Hypervisor v45.0 Details of this release can be found in our roadmap project as iteration v45.0: https://github.com/orgs/cloud-hypervisor/projects/6. Fixes: #10723 Signed-off-by: Bo Chen <bchen@crusoe.ai> Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>	2025-04-07 20:33:34 +02:00
Dan Mihai	8779abd0a1	Merge pull request #11057 from mythi/tdx-qgs-uds runtime: qemu: add support to use TDX QGS via Unix Domain Sockets	2025-04-07 07:27:48 -07:00
Dan Mihai	e606a8deb5	Merge pull request #11103 from Ankita13-code/ankitapareek/policy-input-validation policy: Add missing input validations for ExecProcessRequest	2025-04-07 07:26:24 -07:00
Steve Horsman	ba92639481	Merge pull request #11094 from RuoqingHe/2025-03-28-enable-riscv-assets-build ci: Enable `build-kata-static-tarball-riscv64.yaml`	2025-04-07 11:26:15 +01:00
Fabiano Fidêncio	c75ea2582e	Merge pull request #11114 from fidencio/topic/allow-building-the-agent-without-enabling-guest-pull agent: Allow users to build without guest-pull	2025-04-06 12:17:27 +01:00
Fabiano Fidêncio	e3c98a5ac7	agent: Allow users to build without guest-pull For those not interested in CoCo, let's at least allow them to easily build the agent without the guest-pull feature. This reduces the binary size (already stripped) from 25M to 18M. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-04-04 22:58:43 +01:00
Ankita Pareek	7e450bc1c2	policy: Add missing input validations for ExecProcessRequest This commit introduces missing validations for input fields in ExecProcessRequest to harden the security policy. The changes include: - Update rules.rego to add null/empty field enforcements for String_user, SelinuxLabel and ApparmorProfile - Add unit test cases for ExecProcessRequest for each of the validations Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>	2025-04-03 12:53:59 +00:00
Hui Zhu	17af28acad	docs: Add how-to-use-memory-agent.md to howto Add how-to-use-memory-agent.md (How to use mem-agent to decrease the memory usage of Kata container) to docs to show how to use mem-agent. Fixes: #11013 Signed-off-by: Hui Zhu <teawater@gmail.com>	2025-04-02 17:45:59 +08:00
Lukáš Doktor	009aa6257b	ci.ocp: Override default runtimeclass CPU resources some of the e2e tests spawn a lot of workers which are mainly idle, but the scheduler fails to schedule them due to cpu resource overcommit. For our testing we are more focused on having actual pods running than the speed of the scheduled pods so let's increase the amount of schedulable pods by decreasing the default cpu requests. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-04-02 10:30:40 +02:00
RuoqingHe	2f134514b0	Merge pull request #11097 from kimullaa/robust-user-input kata-deploy: add INSTALLATION_PREFIX validation	2025-04-02 10:05:03 +08:00
Ruoqing He	96e43fbee5	ci: Enable `build-kata-static-tarball-riscv64.yaml` Previously we introduced `build-kata-static-tarball-riscv64.yaml`, enable that workflow in `ci.yaml`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-04-01 16:35:14 +08:00
RuoqingHe	10ceeb0930	Merge pull request #11104 from fidencio/topic/kata-deploy-create-runtimeclasses-by-default kata-deploy: Create runtimeclasses by default	2025-04-01 10:55:44 +08:00
RuoqingHe	b19a8c7b1c	Merge pull request #11066 from kimullaa/update-command-sample kernel: Update the usage in readme	2025-04-01 09:12:43 +08:00
RuoqingHe	b046f79d06	Merge pull request #11100 from kimullaa/remove-double-slash kata-deploy: remove the double "/"	2025-04-01 08:17:00 +08:00
Shunsuke Kimura	a05f5f1827	kata-deploy: add INSTALLATION_PREFIX validation INSTALLATION_PREFIX must begin with a "/" because it is being concatenated with /host. If it does not, display a message and exit with an error. Fixes: #11096 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-04-01 06:47:30 +09:00
Shunsuke Kimura	a49b6f8634	kata-deploy: Moves the function to the top Move functions that may be used in validation to the top. Fixes: #11097 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-04-01 06:47:30 +09:00
Zvonko Kaiser	d81a1747bd	Merge pull request #11085 from kevinzs2048/fix-virtiomem runtime-go: qemu: Fix sandbox start failing with virtio-mem enable on arm64	2025-03-31 17:09:43 -04:00
Zvonko Kaiser	e5c4cfb8a1	Merge pull request #11081 from BbolroC/unsealed-secret-fix tests: Enable sealed secrets for all TEEs	2025-03-31 11:19:52 -04:00
Shunsuke Kimura	c0af0b43e0	kernel: Update the outdated usage in the readme Since it is difficult to keep the README updated when modifying the options of ./build-kernel.sh, instead of updating the README, we encourage users to run the -h command. Fixes: #11065 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-03-31 23:29:58 +09:00
Shunsuke Kimura	902cb5f205	kata-deploy: remove the double "/" Currently, ConfigPath in containerd.toml is a double "/" as follows. ``` [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-clh.options] ConfigPath = "/opt/kata/share/defaults/kata-containers//configuration-clh.toml" ... [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-cloud-hypervisor.options] ConfigPath = "/opt/kata/share/defaults/kata-containers//runtime-rs/configuration-cloud-hypervisor.toml" ... ``` So, removed the double "/". Fixes: #11099 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-03-31 22:31:36 +09:00
Fabiano Fidêncio	28be53ac92	kata-deploy: Create runtimeclasses by default Let's make the life of the users easier and create the runtimeclasses for them by default. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-31 11:29:44 +01:00
Xuewei Niu	abbc9c6b50	Merge pull request #11101 from RuoqingHe/runtime-rs-fix-fmt-check runtime-rs: Remove redundant empty line	2025-03-31 16:28:55 +08:00
Ruoqing He	3c78c42ea5	runtime-rs: Remove redundant empty line While running `cargo fmt -- --check` in the `src/runtime-rs` directory, it errors out, suggesting there is a redundant empty line, which prevents `make check` of the `runtime-rs` component from passing. Remove the redundant empty line to fix this. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-31 00:39:04 +08:00
Steve Horsman	44bab5afc4	Merge pull request #11091 from fidencio/topic/ci-add-kata-deploy-tests-as-required gatekeeper: Add kata-deploy tests as required	2025-03-28 11:05:03 +00:00
Fabiano Fidêncio	5a08d748b9	Merge pull request #11088 from kimullaa/fix-cleanup-failure kata-deploy: Fix kata-cleanup's CrashLoopBackOff	2025-03-27 20:33:52 +01:00
Fabiano Fidêncio	700944c420	gatekeeper: Add kata-deploy tests as required kata-deploy tests have been quite stable, working for more than 10 days without any nightly failure (or any failure reported at all), and I'll be the one maintaining those. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-27 19:47:38 +01:00
Steve Horsman	97bd311a66	Merge pull request #11058 from stevenhorsman/required-static-checks-rename ci: Update static-checks strings	2025-03-27 12:56:28 +00:00
Xuewei Niu	54dcf0d342	Merge pull request #11056 from RuoqingHe/runtime-qemu-riscv runtime: Support and enable build on riscv64	2025-03-27 17:02:21 +08:00
Fabiano Fidêncio	047b7e1fb7	Merge pull request #11063 from lifupan/fix_compile runtime-rs: update the protobuf to 3.7.1	2025-03-27 09:52:20 +01:00
Fabiano Fidêncio	41b536d487	Merge pull request #11059 from microsoft/danmihai1/tests-common tests: k8s: clean-up shellcheck warnings in tests_common.sh	2025-03-27 09:51:49 +01:00
Shunsuke Kimura	9ab6ab9897	kata-deploy: Fix kata-cleanup's CrashLoopBackOff Since kata-deploy.sh references an undefined variable, kata-cleanup.yaml enters a CrashLoopBackOff state. ``` $ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-cleanup/base/kata-cleanup.yaml daemonset.apps/kubelet-kata-cleanup created $ kubectl get pods -n kube-system kubelet-kata-cleanup-zzbd2 0/1 CrashLoopBackOff 3 (33s ago) 80s $ kubectl logs -n kube-system daemonsets/kubelet-kata-cleanup /opt/kata-artifacts/scripts/kata-deploy.sh: line 19: SHIMS: unbound variable ``` Therefore, set an initial value for the environment variables. Fixes: #11083 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-03-27 15:00:19 +09:00
Hyounggyu Choi	0432d2fcdf	Merge pull request #11086 from BbolroC/fix-overwrite-containerd-config tests: Make sure /etc/containerd before writing config	2025-03-27 05:57:31 +01:00
Ruoqing He	46caa986bb	ci: Skip tests depend on virtualization on riscv64 `VMContainerCapable` requires a present `kvm` device, which is not yet available in our RISC-V runners. Skip the related tests when running on `riscv-builder`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:47:49 +08:00
Ruoqing He	7f0b1946c5	ci: Enable build-check for runtime on riscv64 `runtime` support for riscv64 is now ready, let's enable building and testing on that component. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:38:30 +08:00
Yuting Nie	1f52f83309	runtime: Enable kata-check test on riscv64 Provide according tests to cover `kata-runtime` package, test `kata-runtime`'s `check` functionality on riscv64 platforms. Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>	2025-03-27 10:36:55 +08:00
Yuting Nie	b6924ef5e5	runtime: Add getExpectedHostDetails for riscv64 Add `getExpectedHostDetails` with expected value according to template defined in `kata-check_data_riscv64_test.go`. This provides necessary `HostInfo` for tests to cover `kata-check_riscv64.go`. Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>	2025-03-27 10:34:34 +08:00
Yuting Nie	594c5e36a6	runtime: Add mock data for kata-check Add definition of `testCPUInfoTemplate` which is retrieved from `/proc/cpuinfo` of a QEMU emulated virtual machine on virt board. Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>	2025-03-27 10:33:42 +08:00
Yuting Nie	0ff5cb1e66	runtime: Enable testSetCPUTypeGeneric for riscv64 `testSetCPUTypeGeneric` will be used for writing `kata-check` in `kata-runtime` on riscv64 platforms, enable building for later testing. Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>	2025-03-27 10:32:29 +08:00
Ruoqing He	2329aeec38	runtime: Disable race flag for riscv64 `-race` flag used for `go test` is not yet supported on riscv64 platforms, disable it for now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:28:53 +08:00
Ruoqing He	1b4dbebb1b	runtime: Enable runtime to build on riscv64 Convert Rust arch to Go arch in Makefile, and add `riscv64-options.mk` to provide definitions required for runtime to build on riscv64. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:22:55 +08:00
Ruoqing He	805da14634	runtime: Enable runtime check for riscv64 Enable `kata-runtime check` command to work on riscv64 platforms to make sure required features/devices are present. Co-authored-by: Yuting Nie <nieyuting@iscas.ac.cn> Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:07:09 +08:00
Ruoqing He	96b2d25508	runtime: Define default values for QEMU riscv Provide default values while invoking QEMU as the hypervisor for Go runtime on riscv64 platform. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 10:05:36 +08:00
Ruoqing He	1662595146	runtime: Introduce riscv64 to govmm pkg Define `vmm` for riscv64, set `MaxVCPUs` to 512 as the QEMU RISC-V virt Generic Virtual Platform [1] defines. [1] https://www.qemu.org/docs/master/system/riscv/virt.html Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 09:57:49 +08:00
Ruoqing He	1e4963a3b2	runtime: Define availableGuestProtection for riscv64 `GuestProtection` feature is not made available yet, return `noneProtection` for now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 09:34:53 +08:00
Ruoqing He	4947938ce8	runtime: Introduce riscv64 template for vm factory Set `templateDeviceStateSize` to 8 as other architectures did. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-27 09:28:32 +08:00
Zvonko Kaiser	b7cf4fd2e6	Merge pull request #11053 from ldoktor/ci ci: shellcheck fixes	2025-03-26 13:22:56 -04:00
Hyounggyu Choi	1e187482d4	tests: Make sure /etc/containerd before writing config We get the following error while writing the containerd config if the base dir `/etc/containerd` does not exist: ``` sudo tee /etc/containerd/config.toml << EOF ... EOF tee: /etc/containerd/config.toml: No such file or directory ``` This commit makes sure the base directory for containerd exists before writing the config, and drops the config file deletion because the default behaviour of `tee` is to overwrite. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-26 18:19:45 +01:00
Hyounggyu Choi	0aa76f7206	tests: Enable sealed secrets for TEEs Fixes: #11011 This commit allows all TEEs to run the sealed secret test. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-26 17:50:41 +01:00
Hyounggyu Choi	423ad8341d	agent: Call cdh_handler for sealed secrets after add_storage() As reported in #11011, mounted secrets are available after a container image is pulled by add_storage() for IBM SE. But secure mount should be handled before the `add_storage()`. Therefore, this commit divides cdh_handler() into: - cdh_handler_trusted_storage() - cdh_handler_sealed_secrets() and calls cdh_handler_sealed_secrets() after add_storage() while keeping cdh_handler_trusted_storage() unchanged. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-26 17:50:41 +01:00
Fabiano Fidêncio	7a0ac55f22	Merge pull request #10984 from fidencio/topic/tests-kata-deploy-ground-work-to-rewrite-the-tests tests: kata-deploy: The rest of the ground work to rewrite the kata-deploy tests	2025-03-26 17:47:48 +01:00
Hyounggyu Choi	8088064b8b	tests: Set default policy before running sealed secrets tests The test `Cannot get CDH resource when deny-all policy is set` completes with a KBS policy set to deny-all. This affects the future TEE test (e.g. k8s-sealed-secrets.bats) which makes a request against KBS. This commit introduces kbs_set_default_policy() and puts it to the setup() in k8s-sealed-secrets.bats. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-26 17:44:38 +01:00
Jakob Naucke	d808cef2fb	agent: AP bind-associate for Secure Execution Kata Containers has support for both the IBM Secure Execution trusted execution environment and the IBM Crypto Express hardware security module (used via the Adjunct Processor bus), but using them together requires specific steps. In Secure Execution, the Acceleration and Enterprise PKCS11 modes of Crypto Express are supported. Both modes require the domain to be _bound_ in the guest, and the latter also requires the domain to be _associated_ with a _guest secret_. Guest secrets must be submitted to the ultravisor from within the guest. Each EP11 domain has a master key verification pattern (MKVP) that can be established at HSM setup time. The guest secret and its ID are to be provided at `/vfio_ap/{mkvp}/secret` and `/vfio_ap/{mkvp}/secret_id` via a key broker service respectively. Bind each domain, and for each EP11 domain, - get the secret and secret ID from the addresses above, - submit the secret to the ultravisor, - find the index of the secret corresponding to the ID, and - associate the domain to the index of this secret. To bind, add the secret, parse the info about the domain, and associate, the s390_pv_core crate is used. The code from this crate also does the AP online check, which can be removed from here. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-03-26 16:37:23 +01:00
Kevin Zhao	211a36559c	runtime-go: qemu: Fix sandbox start failing with virtio-mem enable on arm64 Also add CONFIG_VIRTIO_MEM to arm64 platform Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-03-26 22:31:00 +08:00
Fabiano Fidêncio	404e212102	tests: kata-deploy: Use helm_helper() With this we switch to fully testing with helm, instead of testing with the kustomizations (which will soon be removed). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-26 13:30:15 +01:00
Fabiano Fidêncio	f7976a40e4	tests: Create a helm_helper() common function Let's use what we have in the k8s functional tests to create a common function to deploy kata containers using our helm charts. This will help us immensely in the kata-deploy testing side in the near future. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-26 13:30:11 +01:00
Fabiano Fidêncio	eb884d33a8	tests: k8s: Export all the default env vars on gha-run.sh This is not strictly needed, but it does help a lot when setting up a cluster manually, while still relying on those scripts. While here, let's also ensure the assignment is between quotes, to make shellchecker happier. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-26 13:23:16 +01:00
Saul Paredes	ae5c587efc	Merge pull request #11074 from Sumynwa/sumsharma/genpolicy_test genpolicy: Refactor tests to allow different request types in a testcases json.	2025-03-25 12:38:19 -07:00
Sumedh Sharma	3406df9133	genpolicy: Refactor tests to add different request types in testcases json This commit introduces changes to add test data for multiple request types in a single testcases.json file. This allows for stateful testing, e.g. testing ExecProcessRequest using the policy state set after testing a CreateContainerRequest. Fixes #11073. Signed-off-by: Sumedh Sharma <sumsharma@microsoft.com>	2025-03-25 13:52:17 +05:30
Mikko Ylinen	85f3391bcf	runtime: qemu: add support to use TDX QGS via Unix Domain Sockets TDX Quote Generation Service (QGS) signs TDREPORT sent to it from Qemu (GetQuote hypercall). Qemu needs quote-generation-socket address configured for IPC. Currently, Kata govmm only enables vsock based IPC for QGS but QGS supports Unix Domain Sockets too, which works well for host process to process IPC (Qemu <-> QGS). The QGS configuration to enable UDS is to run the service with "-port=0" parameter. The same works well here too: setting "tdx_quote_generation_service_socket_port=0" lets users enable UDS based IPC. The socket path is fixed in QGS and cannot be configured: when "-port=0" is used, the socket appears in /var/run/tdx-qgs/qgs.socket. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-03-25 10:18:40 +02:00
RuoqingHe	7a704453b6	Merge pull request #11075 from microsoft/danmihai1/genpolicy-debug-build genpolicy: add support for BUILD_TYPE=debug	2025-03-25 14:59:15 +08:00
RuoqingHe	5d68600c06	Merge pull request #11010 from stevenhorsman/metrics-containerd-debugging metrics: Test improvements	2025-03-25 11:38:28 +08:00
Dan Mihai	15c9035254	genpolicy: add support for BUILD_TYPE=debug Use "cargo build --release" when BUILD_TYPE was not specified, or when BUILD_TYPE=release. The default "cargo build" behavior is to build in debug mode. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-24 16:10:20 +00:00
Jakob Naucke	683a482d64	protos: Add CDH GetResourceService Add service to get arbitrary data from Confidential Data Hub. Taken from https://github.com/confidential-containers/guest-components/tree/main/api-server-rest. Marked as `#[allow(dead_code)]` because planned use is architecture-specific at this time. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-03-24 15:46:40 +00:00
RuoqingHe	f6a1c6d0e0	Merge pull request #11069 from kimullaa/exit-if-action-is-invalid kata-deploy: return exit code for invalid argument	2025-03-24 09:40:39 +08:00
Shunsuke Kimura	e5d7414c33	kata-deploy: Return exit code for invalid argument It hangs when invalid arguments are specified. ```bash kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh xxx Action: * xxx ... Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset] ERROR: invalid arguments ... ^C <- hang ``` I changed it to behave the same as when there are no arguments. ```bash kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset] ERROR: invalid arguments kata-deploy-6sr2p:/# echo $? 1 ``` Fixes: #11068 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2025-03-22 21:32:38 +09:00
Aurélien Bombo	17baa6199b	Merge pull request #11061 from RuoqingHe/2025-03-21-generalize-non-kvm ci: Generalize `GITHUB_RUNNER_CI_ARM64`	2025-03-21 15:23:51 -05:00
Fupan Li	4b93176225	runtime-rs: update the protobuf to 3.7.1 Since some files generated by protobuf are shared between runtime-rs and the kata agent, and the kata agent's dependency image-rs depends on protobuf@3.7.1, we had better keep the protobuf version aligned between runtime-rs and the agent; otherwise, we couldn't compile runtime-rs and the agent at the same time. Fixes: https://github.com/kata-containers/kata-containers/issues/10650 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-21 17:46:12 +08:00
Ruoqing He	5e81f67ceb	ci: Generalize GITHUB_RUNNER_CI_ARM64 `GITHUB_RUNNER_CI_ARM64` is turned on for self hosted runners without virtualization to skip the tests that depend on virtualization. This may happen to other archs/runners as well, let's generalize it to `GITHUB_RUNNER_CI_NON_VIRT` so we can reuse it on other archs. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-21 09:49:44 +08:00
RuoqingHe	e84f7c2c4b	Merge pull request #11046 from mythi/drop-dcap-libs build: drop libtdx-attest	2025-03-21 09:23:33 +08:00
Dan Mihai	835c6814d7	tests: k8s/tests_common: avoid using regex More straightforward implementation of hard_coded_policy_tests_enabled, that avoids ShellCheck warning: warning: Remove quotes from right-hand side of =~ to match as a regex rather than literally. [SC2076] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 22:23:19 +00:00
Dan Mihai	d83b8349a2	tests: policy: avoid using caller's variable Fix unintended use of caller's variable. Use the corresponding function parameter instead. ShellCheck: warning: policy_settings_dir is referenced but not assigned. [SC2154] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:29 +00:00
Dan Mihai	59a70a2b28	tests: k8s/tests_common: avoid masking return values Avoid masking command return values by declaring and only then assigning. ShellCheck: warning: Declare and assign separately to avoid masking return values. [SC2155] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:29 +00:00
Dan Mihai	b895e3b3e5	tests: k8s/tests_common.sh: add variable assignments Pick up the values exported by other scripts. ShellCheck: warning: AUTO_GENERATE_POLICY is referenced but not assigned. [SC2154] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:29 +00:00
Dan Mihai	0f4de1c94a	tests: tests_common: remove useless assignment ShellCheck: warning: This assignment is only seen by the forked process. [SC2097] warning: This expansion will not see the mentioned assignment. [SC2098] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:29 +00:00
Dan Mihai	9c0d069ac7	tests: tests_common: prevent globbing and word splitting ShellCheck: note: Double quote to prevent globbing and word splitting. [SC2086] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	15961b03f7	tests: k8s/tests_common.sh: -n instead of ! -z ShellCheck: note: Use -n instead of ! -z. [SC2236] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	4589dc96ef	tests: k8s/tests_common.sh: add double quoting ShellCheck: note: Prefer double quoting even when variables don't contain special characters. [SC2248] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	cc5f8d31d2	tests: k8s/tests_common.sh: add braces ShellCheck: add braces around variable references: note: Prefer putting braces around variable references even when not strictly required. [SC2250] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	0d3f9fcee1	tests: tests_common: export variables used externally ShellCheck: export variables used outside of tests_common.sh - e.g., warning: timeout appears unused. Verify use (or export if used externally). [SC2034] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	5df43ffc7c	tests: k8s/tests_common.sh: Prefer [[ ]] over [ ] Replace [ ] with [[ ]] as advised by shellcheck: note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-20 19:02:28 +00:00
Dan Mihai	f79fabab24	Merge pull request #11024 from microsoft/danmihai1/empty-exec-output tests: k8s: retry "kubectl exec" on empty output	2025-03-20 11:03:08 -07:00
stevenhorsman	70d32afbb7	ci: Remove metrics tests from required list The metrics tests haven't been stable, or required through github, for many weeks now, so update the required-tests.yaml list to re-sync Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-20 16:03:03 +00:00
stevenhorsman	607b27fd7f	ci: Update static-checks strings With the refactor in #10948 the names of the static checks have changed, so update these. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-20 13:45:57 +00:00
Mikko Ylinen	f52a565834	build: drop libtdx-attest With the latest CoCo guest-components, tdx-attester no longer depends on libtdx-attest. Stop installing it to the rootfs. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-03-20 10:45:30 +02:00
Steve Horsman	c0632f847f	Merge pull request #11043 from stevenhorsman/3.15.0-release release: Bump version to 3.15.0	2025-03-20 07:38:20 +00:00
Greg Kurz	e19b81225c	Merge pull request #11045 from kata-containers/sprt/fix-gha-tag security: ci: Pin third-party actions to commit hashes	2025-03-20 08:14:06 +01:00
Aurélien Bombo	a678046d13	gha: Pin third-party actions to commit hashes A popular third-party action has recently been compromised [1][2] and the attacker managed to point multiple git version tags to a malicious commit containing code to exfiltrate secrets. This PR follows GitHub's recommendation [3] to pin third-party actions to a full-length commit hash, to mitigate such attacks. Hopefully actionlint starts warning about this soon [4]. [1] https://www.cve.org/CVERecord?id=CVE-2025-30066 [2] https://www.stepsecurity.io/blog/harden-runner-detection-tj-actions-changed-files-action-is-compromised [3] https://docs.github.com/en/actions/security-for-github-actions/security-guides/security-hardening-for-github-actions#using-third-party-actions [4] https://github.com/rhysd/actionlint/pull/436 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-19 13:52:49 -05:00
stevenhorsman	fad248ef09	release: Bump version to 3.15.0 Bump VERSION and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-19 17:28:06 +00:00
Fabiano Fidêncio	a6e5d28a15	Merge pull request #11055 from stevenhorsman/bump-github.com/containerd/containerd/v1.7.27 runtime: Update github.com/containerd/containerd	2025-03-19 18:19:10 +01:00
stevenhorsman	cb7c599180	runtime: Switch from deprecated tracer `go.opentelemetry.io/otel/trace.NewNoopTracerProvider` is deprecated now, so switch to `go.opentelemetry.io/otel/trace/noop.NewTracerProvider` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-19 14:22:06 +00:00
stevenhorsman	8f22b07aba	runtime: Update github.com/containerd/containerd Update to 1.7.27 to resolve CVE-2024-40635 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-19 13:48:04 +00:00
Lukáš Doktor	d708866b2a	ci.ocp: shellcheck various fixes various manual fixes. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:28 +01:00
Lukáš Doktor	7e11489daf	ci: shellcheck - collection of fixes manual fixes of various issues. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:23 +01:00
Lukáš Doktor	f62e08998c	ci: shellcheck - remove unused argument the "-a" argument was introduced with this tool but never was actually used. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:19 +01:00
Lukáš Doktor	02deb1d782	ci: shellcheck SC2248 SC2248 (style): Prefer double quoting even when variables don't contain special characters, might result in arguments difference, shouldn't in our cases. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:16 +01:00
Lukáš Doktor	d80e7c7644	ci: shellcheck SC2155 SC2155 (warning): Declare and assign separately to avoid masking return values, should be harmless. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:12 +01:00
Lukáš Doktor	6552ac41e0	ci: shellcheck SC2086 SC2086 Double quote to prevent globbing and word splitting, might break places where we deliberately use word splitting, but we are not using it here. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:08 +01:00
Lukáš Doktor	154a4ddc00	ci: shellcheck SC2292 SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh. This might result in different handling of globs and some ops which we don't use. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:26:03 +01:00
Lukáš Doktor	667e26036c	ci: shellcheck SC2250 Treat the SC2250 require-variable-braces in CI. There are no functional changes. Related to: #10951 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-03-19 12:25:44 +01:00
Zvonko Kaiser	d37d9feee9	Merge pull request #11035 from kata-containers/sprt/fix-dependabot security: ci: Remove `replace` directives in go.mod files	2025-03-18 12:43:46 -04:00
Steve Horsman	ba5b0777b5	Merge pull request #11002 from fitzthum/bump-gc-0130 Bump Trustee and Guest Components for coco v0.13.0	2025-03-17 16:31:23 +00:00
RuoqingHe	36d2dee3a4	Merge pull request #11042 from RuoqingHe/runtime-rs-riscv runtime-rs: Support and enable build on riscv64	2025-03-17 21:42:15 +08:00
Ruoqing He	cb7508ffdc	ci: Enable runtime-rs component build-check on riscv64 `runtime-rs` is now buildable and testable on riscv64 platforms, enable `build-check` on `runtime-rs`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-17 17:38:59 +08:00
Steve Horsman	f308cbba93	Merge pull request #11015 from AdithyaKrishnan/main CI: Mark SNP as a Required test	2025-03-17 09:27:28 +00:00
Ruoqing He	084fb2d780	runtime-rs: Enable RISC-V build Define `riscv64gc-options.mk` to enable `runtime-rs` to be built on RISC-V platforms. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-17 17:22:48 +08:00
Ruoqing He	fd6c16e209	kata-sys-util: Set NoProtection for riscv64 `available_guest_protection` is required for `runtime-rs` to infer while building it on riscv64 platforms. Set it to `NoProtection` as riscv64 does not support guest protection for now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-17 17:22:48 +08:00
Aurélien Bombo	26bd7989b3	csi-kata-directvolume: Remove `replace` in go.mod Running `go mod tidy` and `go mod vendor` after this resulted in no-ops. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Aurélien Bombo	b965fe8239	tests: Run `go mod vendor` `go mod tidy` was a no-op. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Aurélien Bombo	e9f88757ba	tests: Remove `replace` directives in go.mod Same rationale as for runtime. With tests, the blackfriday replacement was actually meaningful, so I refactored some imports. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Aurélien Bombo	35c92aa6ad	runtime: Run `go mod vendor` Regenerating go module files. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Aurélien Bombo	fa0f85e8b0	runtime: Run `go mod tidy` Tidying up go.mod. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Aurélien Bombo	c3a9c70d45	runtime: Remove `replace` directives in go.mod These replace directives aren't understood by dependabot, hence dependabot can claim to upgrade a dependency, while a replace directive still makes the dependency point to an old version. Fixes: #11020 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-03-14 18:00:36 +00:00
Adithya Krishnan Kannan	32dbee8d7e	CI: Mark SNP as a Required test The SNP CI has been consistently passing and we request the @kata-containers/architecture-committee to mark this test as a required test. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2025-03-14 12:48:55 -05:00
Dan Mihai	dab981b0bc	tests: k8s: retry "kubectl exec" on empty output Retry "kubectl exec" a few times if it unexpectedly produced an empty output string. This is an attempt to work around test failures similar to: https://github.com/kata-containers/kata-containers/actions/runs/13840930994/job/38730153687?pr=10983 not ok 1 Environment variables (from function `grep_pod_exec_output' in file tests_common.sh, line 394, in test file k8s-env.bats, line 36) `grep_pod_exec_output "${pod_name}" "HOST_IP=$[0-9]\+\(\.\\|$$\)\{4\}" "${exec_command[@]}"' failed That test obtained correct output from "sh -c printenv" one time, but the second execution of the same command returned an empty output string. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-14 17:03:03 +00:00
Tobin Feldman-Fitzthum	b7786fbcf0	agent: update image-rs for coco v0.13.0 image-rs has gotten a number of significant updates, eliminating corner cases with obscure containers, improving support for local certs, and more. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-03-14 10:44:10 -05:00
Tobin Feldman-Fitzthum	63ec1609bc	versions: update guest-components for coco v0.13.0 Update to the latest hash of guest-components. This will pick up some nice new features including using ec key for the rcar handshake. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-03-14 10:44:10 -05:00
Tobin Feldman-Fitzthum	c352905998	versions: bump trustee for coco v0.13.0 Update to new hashes for Trustee. The MSRV for Trustee is now 1.80.0 so bump the rust toolchain as well. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-03-14 10:44:04 -05:00
Steve Horsman	7968a3c09d	Merge pull request #11028 from Amulyam24/hooks gha: use runner hooks instead of pre/post scripts for ppc64le runners	2025-03-14 15:43:27 +00:00
stevenhorsman	1022d8d260	metrics: Update range for clh tests In `ef0e8669fb` we had been seeing some significantly lower minvalues in the jitter.Result test, so I lowered the mid-value rather than having a very high minpercent, but it appears that the variability of this result is very high, so we are still getting the occasional high value, so reset the midval and just have a bigger ranges on both sides, to try and keep the test stable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-14 14:54:30 +00:00
stevenhorsman	d77008b817	metrics: Further reduce repeats for boot time tests on qemu I've seen failures on the third run, so reduce it further to just run twice on qemu Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-14 14:53:26 +00:00
stevenhorsman	97151cce4e	metrics: Improve iperf timeout The kubectl wait has a built in timeout of 30s, so wrapping it in waitForProcess, means we have 180/2 * 30 delay, which is much longer than intended, so just set the timeout directly. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-14 14:53:26 +00:00
Amulyam24	becb760e32	gha: use runner hooks instead of pre/post scripts for ppc64le runners This PR makes changes to remove steps to run scripts for preparing and cleaning the runner and instead use runner hooks env variables to manage them. Fixes: #9934 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2025-03-14 17:12:54 +05:30
RuoqingHe	af4058fa82	Merge pull request #10889 from katexochen/p/config-idblock-qemu runtime: make SNP IDBlock configurable	2025-03-14 16:23:05 +08:00
Paul Meyer	a994f142d0	runtime: make SNP IDBlock configurable For a use case, we want to set the SNP IDBlock, which allows configuring the AMD ASP to enforce parameters like expected launch digest at launch. The struct with the config that should be enforced (IDBlock) is signed. The public key is placed in the auth block and the signature is verified by the ASP before launch. The digest of the public key is also part of the attestation report (ID_KEY_DIGESTS). Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-03-14 07:50:54 +01:00
RuoqingHe	810a6dafad	Merge pull request #10939 from mchtech/fix-unbound-var tools: initialize unbound variables in rootfs.sh	2025-03-14 08:22:05 +08:00
Saul Paredes	b7087eb0ea	Merge pull request #10983 from microsoft/cameronbaird/updateinterfacerequest-hardening-upstream genpolicy: Introduce UpdateInterfaceRequest rules in genpolicy-settings	2025-03-13 16:12:03 -07:00
Dan Mihai	b910daf625	Merge pull request #11012 from microsoft/saulparedes/validate_generated_name_upstr policy: validate pod generated name	2025-03-13 14:09:57 -07:00
Steve Horsman	199b16f053	Merge pull request #11022 from microsoft/danmihai1/polist-test-volume-path tests: k8s-policy-pod: safer host path volume source	2025-03-13 20:26:06 +00:00
Dan Mihai	0e26dd4ce8	tests: k8s-policy-pod: safer host path volume source Test using the host path /tmp/k8s-policy-pod-test instead of /var/lib/kubelet/pods. /var/lib/kubelet/pods might happen to contain files that CopyFileRequest would try to send to the Guest before CreateContainerRequest. Such CopyFileRequest was an unintended side effect of this test. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-13 18:56:57 +00:00
Cameron Baird	bceffd5ff6	genpolicy: Introduce UpdateInterfaceRequest rules in genpolicy-settings Introduce rules for UpdateInterfaceRequest and genpolicy tests for them. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-03-13 17:30:01 +00:00
Saul Paredes	1c406e9c1d	Merge pull request #11004 from microsoft/cameronbaird/updateroutesrequest-hardening-upstream genpolicy: Introduce UpdateRoutesRequest rules in genpolicy-settings	2025-03-13 10:11:39 -07:00
Saul Paredes	7a5db51c80	policy: validate pod generated name Validate sandbox name using a regex. If the YAML specifies metadata.name, use a regex that exact matches. If the YAML specifies metadata.generateName, use a regex that matches the prefix of the generated name. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2025-03-13 09:49:57 -07:00
Steve Horsman	e6a78e64e6	Merge pull request #10967 from stevenhorsman/coco-tests-required ci: Add coco required tests	2025-03-13 15:10:22 +00:00
mchtech	0e61eb215d	tools: initialize unbound variables in rootfs.sh Initialize unbound variables in rootfs.sh for RHEL series OS. Signed-off-by: mchtech <michu_an@126.com>	2025-03-13 22:57:43 +08:00
Fupan Li	592d58ca52	Merge pull request #11001 from RuoqingHe/enable-riscv-kernel-build kernel: Support and enable riscv kernel build	2025-03-13 19:28:00 +08:00
Ruoqing He	e0fb8f08d8	ci: Add riscv-builder to actionlint.yaml We have three SG2042 connected and labeled as `riscv-builder`, add that entry to `actionlint.yaml` to help linting while setting up workflows. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	a7e953c7a7	ci: Enable static-tarball build for riscv64 Enable `kernel` and `virtiofsd` static-tarball build for riscv64. Since `virtiofsd` was previously supported and `kernel` is supported now. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	3c8a8ca9c2	kernel: Enable riscv kernel build Modify `build-kernel.sh` to enable building of riscv64 kernel. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	e316f633d8	kernel: Bump kata_config_version Bump kata_config_version since riscv kernel build is introduced. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	31446b8be8	kernel: Skip ACPI common fragment for riscv ACPI is not yet ratified and is still frequently evolving, disable acpi.conf for riscv architecture. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	ebd1214b2e	kernel: Introduce riscv mmu fragment conf Memory hotplug and related features are required, enable them in `mmu.conf`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	734f5d07a9	kernel: Introduce riscv pci fragment conf AIA (Advanced Interrupt Architecture) is available and enabled by default after v6.10 kernel, provide pci.conf to make proper use of IMSIC of AIA. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Ruoqing He	19d78ca844	kernel: Introduce riscv base fragment conf Create `riscv` folder for riscv64 architecture to be inferred while constructing kernel configuration, and introduce `base.conf` which builds 64-bit kernel and with KVM built-in to kernel. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-13 13:43:29 +08:00
Cameron Baird	cf129f3744	genpolicy: Introduce UpdateRoutesRequest rules in genpolicy-settings Introduce rule to block routes from source addresses which are the loopback. Block routes added to the lo device. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-03-12 19:03:57 +00:00
Dan Mihai	71d4ad5fca	Merge pull request #11003 from microsoft/mahuber/grpc-1-58-3 runtime: upgrade grpc vendor dependency	2025-03-12 09:23:07 -07:00
Wainer Moschetta	8c2d1b374c	Merge pull request #10892 from ldoktor/webhook ci: Change the way we modify runtimeclass in webhook	2025-03-12 12:32:45 -03:00
RuoqingHe	386fed342c	Merge pull request #10990 from kata-containers/shell-check-vendor-skip workflows: shellcheck: Expand vendor ignore	2025-03-12 21:34:26 +08:00
Alex Lyn	fdc0d81198	Merge pull request #10994 from teawater/swap7 runtime-rs: Add guest swap support	2025-03-12 17:59:00 +08:00
Hui Zhu	796eab3bef	runtime-rs: Update swap option of configuration file Remove swap configuration from qemu config file because runtime-rs qemu support code doesn't support hotplug block device. Add swap configuration to dragonball and cloud-hypervisor config file. Fixes: #10988 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-12 13:51:35 +08:00
Dan Mihai	4f41989a6a	Merge pull request #11009 from mythi/e2e-skip-flaky-tests tests: k8s: skip trusted storage tests for qemu-tdx	2025-03-11 12:13:35 -07:00
Dan Mihai	e40251d9f8	Merge pull request #11006 from ryansavino/fix-confidential-ssh-dockerfile tests: fix confidential ssh Dockerfile	2025-03-11 11:22:23 -07:00
Aurélien Bombo	33f3a8cf5f	Merge pull request #10973 from microsoft/danmihai1/main ci: temporarily avoid using the Mariner Host image	2025-03-11 10:24:00 -05:00
Steve Horsman	420b282279	Merge pull request #10948 from RuoqingHe/better-matrix ci: Refactor matrix for `build-checks`	2025-03-11 14:13:10 +00:00
Mikko Ylinen	71531a82f4	tests: k8s: skip trusted storage tests for qemu-tdx follow other TEEs to skip trusted storage tests due to #10838. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-03-11 15:14:03 +02:00
Hui Zhu	93cd30862d	libs: Add AddSwapPath to service AgentService AddSwap sends the PCI path to the guest kernel so that it can add a swap device, but some MMIO devices don't have a PCI path. To support them, add AddSwapPath, which sends virt_path to the guest kernel as the swap device. Fixes: #10988 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-11 16:02:48 +08:00
Hui Zhu	7787340ab6	runtime-rs: Add guest swap support This commit adds guest swap support. When the configuration option enable_guest_swap is enabled, runtime-rs starts a swap task. When the VM starts or the guest memory is updated, the swap task is woken up to create and insert a swap file. Before doing so, the swap task sleeps for some seconds (set by the configuration option guest_swap_create_threshold_secs) to reduce the impact on guest kernel boot performance and to prevent the insertion of multiple swap files due to frequent memory elasticity within a short period. The size of the swap file is set by the configuration option guest_swap_size_percent, the percentage of the total memory to be used as a swap device. Fixes: #10988 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-11 16:02:31 +08:00
Hui Zhu	4cd9d70c4d	runtime-rs: Add is_direct to struct BlockConfig Add is_direct to struct BlockConfig. This option specifies cache-related options for block devices. Denotes whether use of O_DIRECT (bypass the host page cache) is enabled. If not set, use configuration block_device_cache_direct. Fixes: #10988 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-11 15:44:40 +08:00
Ryan Savino	1dbe3fb8bc	tests: fix confidential ssh Dockerfile Need to set correct permissions for ssh directories and files Fixes: #11005 Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-03-10 18:31:05 -05:00
Dan Mihai	e8405590c1	ci: temporarily avoid using the Mariner Host image Disable the Mariner host during CI, while investigating test failures with new Cloud Hypervisor v43.0. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-10 20:15:09 +00:00
Steve Horsman	730e007abd	Merge pull request #11000 from microsoft/danmihai1/print-exec-output2 tests: k8s: log kubectl exec output	2025-03-10 09:31:41 +00:00
Fupan Li	df9c6ae9d7	Merge pull request #10998 from teawater/ma_config runtime-rs: Add mem-agent config to clh and qemu config file	2025-03-10 16:23:20 +08:00
Dan Mihai	509e6da965	tests: k8s-env.bats: log exec output Log the "kubectl exec" output, just in case it helps investigate sporadic test errors like: https://github.com/kata-containers/kata-containers/actions/runs/13724022494/job/38387329321?pr=10973 not ok 1 Environment variables (in test file k8s-env.bats, line 37) `grep "HOST_IP=$[0-9]\+\(\.\\|$$\)\{4\}"' failed It appears that the first exec from this test case produced the expected output: MY_POD_NAME=test-env but the second exec produced something else - that will be logged after this change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-07 19:37:20 +00:00
Dan Mihai	95d47e4d05	tests: k8s-configmap.bats: log exec output Log the "kubectl exec" output, just in case it helps investigate sporadic test errors like: https://github.com/kata-containers/kata-containers/actions/runs/13724022494/job/38387329268?pr=10973 not ok 1 ConfigMap for a pod (in test file k8s-configmap.bats, line 44) `kubectl exec $pod_name -- "${exec_command[@]}" \| grep "KUBE_CONFIG_2=value-2"' failed It appears that the first exec from this test case produced the expected output: KUBE_CONFIG_1=value-1 but the second exec produced something else - that will be logged after this change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-07 19:35:45 +00:00
Dan Mihai	caee12c796	tests: k8s: add function to log exec output grep_pod_exec_output invokes "kubectl exec", logs its output, and checks that a grep pattern is present in the output. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-07 19:34:57 +00:00
Steve Horsman	014ff8476a	Merge pull request #10992 from microsoft/danmihai1/git-helper gha: always delete workspace on rebase error	2025-03-07 14:26:00 +00:00
Steve Horsman	cb682ef3c8	Merge pull request #10987 from RuoqingHe/enable-docker-on-riscv kata-deploy: Use docker.io for all architectures	2025-03-07 11:14:19 +00:00
Xuewei Niu	0671252466	Merge pull request #10760 from lifupan/route_flags_suport	2025-03-07 18:18:01 +08:00
Hui Zhu	691430ca95	runtime-rs: Add mem-agent config to clh and qemu config file Add mem-agent config to clh and qemu config file. Fixes: #10996 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-07 15:54:59 +08:00
Fupan Li	9a4c0a5c5c	agent: add the route flags support when adding routes Get the route entry's flags passed from host and set it in the add route request. Fixes: #7934 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Fupan Li	d929bc0224	agent: refactor the code of update routes/interfaces We can use the netlink update method to add a route or an interface address. There is no need to delete it first and then add it. This can save two system calls. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Fupan Li	aad915a7a1	agent: upgrade the netlink related crates Upgrade rtnetlink and related crates to support route flags. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Fupan Li	0995c6528e	runtime-rs: add the route flags support Get the route entry's flags from the host and pass it into kata-agent to add route entries with flags support. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Fupan Li	cda6d0e36c	runtime-rs: upgrade the netlink related crates Upgrade netlink-packet-route and rtnetlink to support route flags. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Fupan Li	1ade2a874f	runtime: add the flags support to the route setting We should support the flags when adding routes from host to guest. Otherwise, setting some routes would fail. Fixes: #7934 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-07 09:56:08 +08:00
Dan Mihai	7b63f256e5	gha: fix git-helper issues reported by shellcheck ./tests/git-helper.sh:20:5: note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292] ./tests/git-helper.sh:22:26: note: Double quote to prevent globbing and word splitting. [SC2086] ./tests/git-helper.sh:23:7: note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292] Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-06 20:28:41 +00:00
Dan Mihai	04adcdace6	gha: always delete workspace on rebase error The workspace was already being deleted on non-x86_64 platforms, but x86_64 can be affected by the same problem too. That might have been the case with the SNP and TDX test runs from: https://github.com/kata-containers/kata-containers/actions/runs/13687511270/job/38313758751?pr=10973 https://github.com/kata-containers/kata-containers/actions/runs/13687511270/job/38313760086?pr=10973 Rebase worked fine for the same patch/PR on other platforms. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-03-06 20:24:09 +00:00
Ruoqing He	3a8131349e	kata-deploy: Use docker.io for all architectures Switch to `docker.io` provided by Ubuntu sources. It is not necessary for us to install docker through `get-docker.sh`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-07 02:22:31 +08:00
RuoqingHe	8ef8109b2f	Merge pull request #10985 from RuoqingHe/remove-s390x-conditional-compilation runtime-rs: Remove s390x conditional compilation	2025-03-06 23:13:11 +08:00
Pavel Mores	133528a63c	runtime-rs: remove snp_certs_path support SNP certs were apparently obsoleted by AMD. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-03-06 15:53:24 +01:00
stevenhorsman	a40d5d3daa	ci: Add arm64 K8s tests as required This is based on the request from @fidencio, who is one of the maintainers	2025-03-06 14:39:04 +00:00
stevenhorsman	f45b398170	ci: Add coco required tests Add the zvsi and nontee coco tests to the required jobs list Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-06 14:38:52 +00:00
stevenhorsman	ee0f0b7bfe	workflows: shellcheck: Expand vendor ignore - In the previous PR I only skipped the runtime/vendor directory, but errors are showing up in other vendor packages, so try a wildcard skip - Also update the job step so we can distinguish between the required and non-required versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-06 14:35:12 +00:00
Manuel Huber	c05b976ebe	runtime: upgrade grpc vendor dependency - remove hard link to v.1.47.0 in go.mod - run go mod tidy, go mod vendor to actually update to v1.58.3 - addresses CVE-2023-44487 Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2025-03-06 10:00:49 +00:00
Xuewei Niu	644af52968	Merge pull request #10876 from lifupan/fupan_containerd ci: cri-containerd: upgrade the LTS / Active versions for containerd	2025-03-06 17:08:40 +08:00
Hyounggyu Choi	bf41618a84	Merge pull request #10862 from BbolroC/enable-ibm-se-for-qemu-runtime-rs runtime-rs: Enable IBM SE for QEMU	2025-03-06 05:38:13 +01:00
Ruoqing He	ed6f57f8f6	runtime-rs: Restrict cloud-hypervisor feature Cloud-Hypervisor currently only supports `x86_64` and `aarch64`, so this feature should not be available even if other architectures explicitly require it. Restrict `cloud-hypervisor` feature to only `x86_64` and `aarch64`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-06 11:21:57 +08:00
Ruoqing He	6f894450fe	runtime-rs: Drop s390x target predicates Drop `target_arch = "s390x"` all over `runtime-rs`, it is strange to have such predicates on features and code while we do not support it. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-06 11:20:28 +08:00
Xuewei Niu	a54eed6bab	Merge pull request #10975 from teawater/fix_log_level runtime-rs: Fix log_level's comments in configuration-dragonball.toml.in	2025-03-06 10:05:09 +08:00
Alex Lyn	2619b57411	Merge pull request #10937 from Apokleos/bugfix-useless-annotation kata-types: Fix bugs related to annotations in kata-types	2025-03-06 09:37:29 +08:00
Hyounggyu Choi	c3e3ef7b25	Merge pull request #10981 from BbolroC/remove-sclp-console-s390x runtime: Remove console=ttysclp0 for s390x	2025-03-05 21:43:57 +01:00
Fabiano Fidêncio	80e95bd264	Merge pull request #10966 from kata-containers/topic/tests-bring-back-kata-deploy-tests tests: Bring back kata-deploy tests	2025-03-05 21:11:21 +01:00
Zvonko Kaiser	ae63bbb824	Merge pull request #10982 from zvonkok/fix-zvonkos-fix agent: fix permission according to runc	2025-03-05 15:08:48 -05:00
Fabiano Fidêncio	545780a83a	shellcheck: tests: k8s: Fix gha-run.sh warnings As we'll touch this file during this series, let's already make sure we solve all the needed warnings. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Fabiano Fidêncio	50f765b19c	shellcheck: tests: Fix gha-run-k8s-common.sh warnings Let's fix all the warnings caught in this file, as we're already touching it. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Fabiano Fidêncio	219db60071	tests: kata-deploy: microk8s: Re-work installation So we can ensure that the user has enough permissions to access microk8s. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Fabiano Fidêncio	c337a21a4e	shellcheck: kata-deploy: Fix warnings Here we're fixing the few warnings we found in the files present in the functional tests for kata-deploy. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Fabiano Fidêncio	fd832d0feb	tests: kata-deploy: Run installation with only one VMM It doesn't make much sense to test different VMMs as that wouldn't trigger a different code path. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Fabiano Fidêncio	14bf653c35	tests: kata-deploy: Re-add tests, now using github runners As GitHub runners now support nested virt, we don't depend on garm for those anymore. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-03-05 19:44:27 +01:00
Zvonko Kaiser	3cea080185	agent: fix permission according to runc The previous PR mistakenly set all perms to 0o666; we should follow what runc does and fetch the permission from the guest aka host if the file_mode == 0. If we do not find the device on the guest aka host, fall back to 0. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-03-05 17:33:40 +00:00
Fupan Li	7024d3c600	CI: cri-containerd: upgrade the LTS / Active versions for containerd As we're testing against the LTS and the Active versions of containers, let's upgrade the lts version from 1.6 to 1.7 and active version from 1.7 to 2.0 to cover the sandboxapi tests. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-03-05 23:09:24 +08:00
Hyounggyu Choi	624f7bfe0b	runtime: Remove console=ttysclp0 for s390x After the introduction of the following kernel parameters (see #6163): ``` CONFIG_SCLP_VT220_TTY=y CONFIG_SCLP_VT220_CONSOLE=y ``` the system log for Kata components (e.g., the agent) no longer appeared on the SCLP console (i.e., /dev/ttysclp0). Let's switch to the default fallback console (likely /dev/console) for logging. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-05 15:06:08 +01:00
Zvonko Kaiser	a5629f9bfa	Merge pull request #10971 from zvonkok/host-guest-mapping agent: Enable VFIO and initContainers	2025-03-05 08:58:45 -05:00
Fabiano Fidêncio	504d9e2b66	Merge pull request #10976 from zvonkok/fix-dev-permissions agent: Fix default linux device permissions	2025-03-05 13:54:06 +01:00
Hyounggyu Choi	4ea7d274c4	runtime-rs: Add new runtimeClass qemu-se-runtime-rs When `KATA_HYPERVISOR` is set to `qemu-se-runtime-rs`, a configuration file is properly referenced and a runtime class should be created via kata-deploy. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-05 13:50:38 +01:00
Hyounggyu Choi	2c72cf5891	runtime-rs: Add SE configuration A configuration file, `configuration-qemu-se-runtime-rs.toml`, is referenced when the `qemu-se-runtime-rs` runtime is configured. This commit adds a template file and updates the Makefile configuration accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-05 13:50:38 +01:00
Hyounggyu Choi	65021caca6	Merge pull request #10963 from RuoqingHe/remove-arch-predicates-in-runtime-rs runtime-rs: Enable Dragonball only for x86_64 & aarch64	2025-03-05 09:10:33 +01:00
Zvonko Kaiser	c73ff7518e	agent: Fix default linux device permissions We had the default permissions set to 0o000 if the file_mode was not present; for most container devices this is the wrong default. Since those devices are also meant to be accessed by users and others, add a sane default of 0o666 to devices that do not have any permissions set. Otherwise only root can access those and we cannot run containers as a user. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-03-05 02:22:24 +00:00
Ruoqing He	186c88b1d5	ci: Move musl-tools installation into Setup rust `musl-tools` is only needed when a component needs `rust`, and the `instance` running is of `x86_64` or `aarch64`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-05 09:43:19 +08:00
Zvonko Kaiser	4bb0eb4590	Merge pull request #10954 from kata-containers/topic/metrics-kata-deploy Rework and fix metrics issues	2025-03-04 20:22:53 -05:00
Hui Zhu	c3c3f23b33	runtime-rs: Fix log_level's comments in configuration-dragonball.toml.in Add double quotes to fix log_level's comments in configuration-dragonball.toml.in. Fixes: #10974 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-03-05 09:21:08 +08:00
Dan Mihai	edf6af2a43	Merge pull request #10955 from microsoft/cameronbaird/hyp-loglevel-default-upstream runtime: Properly set default hyp loglevel to 1	2025-03-04 16:44:08 -08:00
Cameron Baird	d48116114e	runtime: Properly set default hyp loglevel to 1 Tweak default HypervisorLoglevel config option for clh to 1. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-03-04 20:36:40 +00:00
Zvonko Kaiser	248d04c20c	agent: Enable VFIO and initContainers We had a static mapping of host guest PCI addresses, which prevented the use of VFIO devices in initContainers. We're now tracking the host-guest mapping per container and removing this mapping if a container is removed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-03-04 19:53:52 +00:00
Fabiano Fidêncio	874129a11f	Merge pull request #10958 from stevenhorsman/shell-check-errors-fix Shell check errors fix	2025-03-04 17:37:36 +01:00
stevenhorsman	02a2f6a9c1	tests: Sanitize `K8S_TEST_ENTRY` Now we've added the double quotes around `${K8S_TEST_UNION[@]}`, so platforms are failing with: ``` Error: Test file "/home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/tests/integration/kubernetes/k8s-nginx-connectivity.bats " does not exist ``` due to the line continuation, so sanitise the value to try and fix this. Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	e33ad56cf4	kernel: bump kata_config_version Bump kernel version as the build-kernel script was updated (even if there was no functional change). Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	2df3e5937a	ci/openshift-ci: Fix script error The space was missing before `]`, so fix this and also switch to double square brackets and variable braces. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	9a9e88a38d	test: vfio: Attempt to fix logic This was checking that a literal string was non-zero. I assume it instead wanted to check if the file exists. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	b220cca253	shellcheck: Fix shellcheck SC2066 > Since you double-quoted this, it will not word split, and the loop will only run once. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	b8cfdd06fb	shellcheck: Fix shellcheck SC2071 > > is for string comparisons. Use -gt instead. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	eb90b93e3f	shellcheck: Fix shellcheck SC2104 > In functions, use return instead of break. > rationale: break or continue are used to abort or continue a loop, and are not the right way to exit a function. Use return instead. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:10 +00:00
stevenhorsman	67bfd4793e	shellcheck: Fix shellcheck SC2242 > Can only exit with status 0-255. Other data should be written to stdout/stderr. Switch exit -1 to exit 1 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:39:01 +00:00
stevenhorsman	ed8347c868	shellcheck: Fix shellcheck SC2070 > -n doesn't work with unquoted arguments. Quote or use [[ ]] Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	dbba6b056b	shellcheck: Fix shellcheck SC2148 > Tips depend on target shell and yours is unknown. Add a shebang. Add ``` #!/usr/bin/env bash ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	c5ff513e0b	shellcheck: Fix shellcheck SC2068 > Double quote array expansions to avoid re-splitting elements Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	58672068ff	shellcheck: Fix shellcheck SC2145 > Argument mixes string and array. Use * or separate argument. - Swap echos for printfs and improve formatting - Replace $@ with $* - Split arrays into separate arguments Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	bc2d7d9e1e	osbuilder: Skip shellcheck on test_images.sh I'm not sure if we use test_images anywhere, so before we invest the time to fix the 120 shellcheck errors and warnings we should decide if we want to keep it. See #10957 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	fb1d4b571f	workflows: Add required shellcheck workflow Start with a required smaller set of shellchecks to try and prevent regressions whilst we fix the current problems Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
stevenhorsman	b3972df3ca	workflows: Shellcheck - ignore vendor Ignore the vendor directories in our shellcheck workflow as we can't fix them. If there is a way to set this in shellcheckrc that would be better, but it doesn't seem to be implemented yet. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-04 09:35:46 +00:00
Zvonko Kaiser	4df406f03c	Merge pull request #10965 from zvonkok/fix-init gpu: fix init symlinks	2025-03-03 14:46:41 -05:00
Zvonko Kaiser	eb2f75ee61	gpu: fix init symlinks With the recent changes we need to make sure NVRC is symlinked for init and sbin/init Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-03-03 17:21:59 +00:00
Greg Kurz	545022f295	Merge pull request #10817 from Jakob-Naucke/virtio-net-ccw Fix virtio-net-ccw	2025-03-03 17:37:46 +01:00
Hyounggyu Choi	e8aa5a5ab7	runtime-rs: Enable virtio-net-ccw for s390x When using `virtio-net-pci` for IBM SE, the following error occurs: ``` update interface: Link not found (Address: f2:21:48:25:f4:10) ``` On s390x, it is more appropriate to use the CCW type of virtio network device. This commit ensures that a subchannel is configured accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-03 16:34:03 +01:00
Hyounggyu Choi	59c1f0b59b	runtime-rs: Suppress kernel parameters for IBM SE For IBM SE, the following kernel parameters are not required: - Basic parameters (reboot and systemd-related) - Rootfs parameters This commit suppresses these parameters when IBM SE is configured. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-03 16:34:03 +01:00
Hyounggyu Choi	4c8e881a84	runtime-rs: Enable IBM SE support for QEMU This commit configures the command line for IBM Secure Execution (SE) and other TEEs. The following changes are made: - Add a new item `Se` to ProtectionDeviceConfig and handle it at sandbox - Introduce `add_se_protection_device()` for SE cmdline config - Bypass rootfs image/initrd validity checks when SE is configured. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-03-03 16:32:18 +01:00
Ruoqing He	2ecb2fe519	runtime-rs: Enable Dragonball for x86_64 & aarch64 `USE_BUILDIN_DB` is turned on by default even for architectures that do not support `Dragonball`, which leads to `s390x` building `runtime-rs` with `--features dragonball` present. Let's restrict `USE_BUILDIN_DB` to be enabled only for architectures supported by `Dragonball` (namely x86_64 and aarch64 as of now). Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-03-03 12:10:58 +08:00
stevenhorsman	c69509be1c	metrics: Reduce repeats for boot time tests on qemu On qemu the run seems to error after ~4-7 runs, so try a cut down version of repetitions to see if this helps us get results in a stable way. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-02 08:42:00 +00:00
stevenhorsman	0962cd95bc	metrics: Increase minpercent range for qemu iperf test We have a new metrics machine and environment and the iperf jitter result failed as it finished too quickly, so increase the minpercent to try and get it stable Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-02 08:32:26 +00:00
stevenhorsman	ef0e8669fb	metrics: Increase minpercent range for clh tests We have a new metrics machine and environment and the fio write.bw and iperf3 parallel.Results tests failed for clh, as below the minimum range, so increase the minpercent to try and get it stable Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-02 08:32:26 +00:00
stevenhorsman	f81c85e73d	metrics: Increase maxpercent range for clh boot times We have a new metrics machine and environment and the boot time test failed for clh, so increase the maxpercent to try and get it stable Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	435ee86fdd	metrics: Update iperf affinity The iperf deployment is quite a lot out of date and uses `master` for its affinity and toleration, so update this to control-plane, so it can run on newer Kubernetes clusters. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	85bbc0e969	metrics: Increase wait time The new metrics runner seems slower, so the iperf3 tests are failing with: ``` pod rejected: RuntimeClass "kata" not found ``` so give more time for it to succeed. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	4ce94c2d1b	Revert "metrics: Add init_env function to latency test" This reverts commit `9ac29b8d38`. to remove the duplicate `init_env` call Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	658a5e032b	metrics: Increase containerd start timeout - Move `kill_kata_components` from common.bash into the metrics code base as the only user of it - Increase the timeout on the start of containerd as the last 10 nightlies metric tests have failed with: ``` 223478 Killed sudo timeout -s SIGKILL "${TIMEOUT}" systemctl start containerd ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	3fab7944a3	workflows: Improve metrics jobs - As the metrics tests are largely independent, allow subsequent tests to run even if previous ones failed. The results might not be perfect if clean-up is required, but we can work on that later. - Move the test results check out of the latency test, where it seems arbitrary, and into its own job step - Add timeouts to steps that might fail/hang if there are containerd/K8s issues Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
stevenhorsman	6f918d71f5	workflows: Update metrics jobs Currently the run-metrics job runs a manual install and does this in a separate job before the metrics tests run. This doesn't make sense as if we have multiple CI runs in parallel (like we often do), there is a high chance that the setup for another PR runs between the metrics setup and the runs, meaning it's not testing the correct version of code. We want to remove this from happening, so install (and delete to cleanup) kata as part of the metrics test jobs. Also switch to kata-deploy rather than manual install for simplicity and in order to test what we recommend to users. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-03-01 17:50:05 +00:00
Zvonko Kaiser	3f13023f5f	Merge pull request #10870 from zvonkok/module-signing gpu: add module signing	2025-03-01 09:51:24 -05:00
Zvonko Kaiser	d971e13446	gpu: Update rootfs.sh Only source NV scripts if variant starts with "nvidia-gpu" Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-03-01 02:08:29 +00:00
Fabiano Fidêncio	4018079b55	Merge pull request #10960 from fidencio/topic/kata-deploy-fix-k0s-deployment kata-deploy: k0s: Fix drop-in path	2025-02-28 18:49:46 +01:00
Zvonko Kaiser	94579517d4	shellcheck: Update nvidia_rootfs.sh With the new rules we need more updates. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 16:36:05 +00:00
Zvonko Kaiser	af1d6c2407	shellcheck: Update nvidia_chroot.sh Make shellcheck happy; with the new rules, new updates are needed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 16:27:51 +00:00
Fabiano Fidêncio	c95f9885ea	kata-deploy: k0s: Fix drop-in path The drop-in path should be /etc/containerd (from the containers' perspective), which mounts to the host path /etc/k0s/containerd.d. With what we had we ended up dropping the file under the /etc/k0s/containerd.d/containerd.d/, which is wrong. This is a regression introduced by: `94b3348d3c` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-28 16:32:00 +01:00
Zvonko Kaiser	c4e4e14b32	kernel: bump kata_config_version Mandatory update to have a unique kernel version name Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 15:18:15 +00:00
Fabiano Fidêncio	d13be49f9b	Merge pull request #10846 from stalb/feature/microk8s-support kata-deploy: Update kata-deploy to support microk8s	2025-02-28 13:57:44 +01:00
Stephane Talbot	f80e7370d5	test: Verify deployment of kata-deploy on microk8s Enable functional test to verify deployment of kata-deploy on a Microk8s cluster Signed-off-by: Stephane Talbot <Stephane.Talbot@univ-savoie.fr>	2025-02-28 10:10:29 +01:00
Stéphane Talbot	f2ba224e6c	kata-deploy: Update kata-deploy to support microk8s Change kata-deploy script and Helm chart in order to be able to use kata-deploy on a microk8s cluster deployed with snap. Fixes: #10830 Signed-off-by: Stephane Talbot <Stephane.Talbot@univ-savoie.fr>	2025-02-28 10:10:29 +01:00
Ruoqing He	09030ee96e	ci: Refactor build-checks workflow Refactor the matrix setup and the corresponding dependency installation logic in `build-checks.yaml` and `build-checks-preview-riscv64.yaml` to provide better readability and maintainability. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-28 09:47:25 +08:00
Ruoqing He	eb94700590	ci: Drop install-libseccomp matrix variant `install-libseccomp` is applied only for `agent` component, and we are already combining matrix with `if`s in steps, drop `install-libseccomp` in matrix to reduce complexity. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-28 09:44:53 +08:00
Zvonko Kaiser	4dadd07699	gpu: Update rootfs.sh Pass-through KBUILD_SIGN_PIN to the rootfs build Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	5ab3192c51	gpu: Update nvidia_rootfs.sh We need to handle KBUILD_SIGN_PIN so that the kbuild can decrypt the signing key Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	493ba63c77	gpu: Provide KBUILD_SIGN_PIN to the build.sh At the proper step pass-through the var KBUILD_SIGN_PIN so that the kernel_headers step has the PIN for encrypting the signing key. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	0309b70522	gpu: Pass-through KBUILD_SIGN_PIN In kata-deploy-binaries.sh we need to pass-through the var KBUILD_SIGN_PIN to the other static builder scripts. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	9602ba6ccc	gpu: Add proper KBUILD_SIGN_PIN to entry script Update kata-deploy-binaries-in-docker.sh to read the env variable KBUILD_SIGN_PIN that either can be set via GHA or other means. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	39d3b7fb90	gpu: Update NVIDIA chroot script We need to place the signing key and cert at the right place and hide the KBUILD_SIGN_PIN from echo'ing or xtrace Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	d815fb6f46	gpu: Update kernel-headers Use the kernel-headers as the extra_tarball to move the encrypted key and cert from stage to stage Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	c2cb89532b	gpu: Add the proper handling in build-kernel.sh If KBUILD_SIGN_PIN is provided we can encrypt the signing key for out-of-tree builds and second round jobs in GHA Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:35 +00:00
Zvonko Kaiser	bc8360e8a9	gpu: Add proper config for module signing We want to enable module signing in Kata and Coco Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-28 01:31:34 +00:00
Zvonko Kaiser	f485e52f75	Merge pull request #10953 from zvonkok/shellcheckrc ci: Add shellcheckrc	2025-02-27 13:35:23 -05:00
Fabiano Fidêncio	96ed706d20	Merge pull request #10950 from fidencio/topic/skip-arm-check-tests-that-depend-on-virt ci: arm64: Skip tests that depend on virt on non-virt capable runners	2025-02-27 18:26:32 +01:00
Zvonko Kaiser	abfbc0ab60	ci: Add shellcheckrc Let's have common rules over all shell files. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-27 17:11:24 +00:00
Zvonko Kaiser	33460386b9	Merge pull request #10803 from ryansavino/update-confidential-initrd-22.04 versions: update confidential initrd to 22.04	2025-02-27 09:29:36 -05:00
Fabiano Fidêncio	e18e1ec3a8	ci: arm64: Skip tests that depend on virt on non-virt capable runners The GitHub hosted runners for ARM64 do not provide virtualisation support, thus we're just skipping the tests as those would check whether or not the system is "VMContainerCapable". Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-27 14:43:21 +01:00
Wainer Moschetta	5fda6b69e8	Merge pull request #10883 from stevenhorsman/k0s-version-pinning ci: k8s: Pin k0s version to get cri-o tests back working	2025-02-27 10:11:59 -03:00
Steve Horsman	f3c22411fc	Merge pull request #10930 from stevenhorsman/codeql-config workflows: Add codeql config	2025-02-27 12:43:41 +00:00
stevenhorsman	d08787774f	ci: k8s: Use pinned k0s version Update the code to install the version of k0s that we have in our versions.yaml, rather than just installing the latest, to help our CI be more stable and less prone to breaking due to things we don't control. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-27 11:33:23 +00:00
stevenhorsman	3fe35c1594	version: Add k0s version Add external versions support for k0s and initially pin it at v1.31.5 as our cri-o tests started failing when v1.32 became the latest Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-27 11:33:23 +00:00
Fabiano Fidêncio	6e236fd44c	Merge pull request #10652 from burgerdev/sysctls genpolicy: support sysctls from PodSpec and environment defaults	2025-02-27 08:25:14 +01:00
Dan Mihai	cb382e1367	Merge pull request #10925 from katexochen/p/fail-on-layer-pull genpolicy: fail when layer can't be processed	2025-02-26 13:28:38 -08:00
Ryan Savino	ceafa82f2e	tests: skip trusted storage tests for qemu-snp skip tests for trusted storage until #10838 is resolved. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-02-26 14:23:57 -06:00
Ryan Savino	a00a7c500a	build: initrd rootfs init symlink directly to systemd when no AGENT_INIT In some cases, /init is not following two levels of symlinks i.e. /init to /sbin/init to /lib/systemd/systemd Setting /init directly to /lib/systemd/systemd when AGENT_INIT is not mandated Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-02-26 14:23:56 -06:00
Markus Rudy	70709455ef	genpolicy: support sysctl settings Sysctls may be added to a container by the Kubernetes pod definition or by containerd configuration. This commit adds support for the corresponding PodSecurityContext field and an option to specify environment-dependent sysctls in the settings file. The sysctls requested in a CreateContainerRequest are checked against the sysctls in the pod definition, or if not defined there in the defaults in genpolicy-settings.json. There is no check for the presence of expected sysctls, though, because Kubernetes might legitimately omit unsafe sysctls itself and because default sysctls might not apply to all containers. Fixes: #10064 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-02-26 18:56:17 +01:00
Steve Horsman	5aa89bc1d7	Merge pull request #10831 from RuoqingHe/ci-riscv64 ci: Enable partial components build-check on riscv	2025-02-26 17:50:47 +00:00
Fabiano Fidêncio	9d8026b4e5	Merge pull request #10654 from burgerdev/cronjob genpolicy: add get_process_fields to CronJob	2025-02-26 15:13:40 +01:00
Fabiano Fidêncio	7b16df64c9	Merge pull request #10935 from burgerdev/error-messages runtime: add cause to CDI errors	2025-02-26 14:01:22 +01:00
Jakob Naucke	c146980bcd	agent: Handle virtio-net-ccw devices separately On s390x, a virtio-net device will use the CCW bus instead of PCI, which impacts how its uevent should be handled. Take the respective path accordingly. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:43 +01:00
Jakob Naucke	a084b99324	virtcontainers: Separate PCI/CCW for net devices On s390x, virtio-net devices should use CCW, alongside a different device path. Use accordingly. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:43 +01:00
Jakob Naucke	2aa523f08a	virtcontainers: Fix virtio-net-ccw address format Hex device number was formatted as hex twice, thus encoding the string as hex. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:43 +01:00
Jakob Naucke	2a992c4080	virtcontainers: Add CCW device to endpoint To support virtio-net-ccw for s390x, add CCW devices to the Endpoint interface. Add respective fields and functions to implementing structs. Device paths may be empty. PciPath resolves this by being a list that may be empty, but this design does not map to CcwDevice. Use a pointer instead. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:42 +01:00
Jakob Naucke	b325069d72	agent: Update QEMU URL Readthedocs URL was outdated. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:42 +01:00
Jakob Naucke	9935f9ea7e	proto: Rename Interface.pciPath to devicePath Field is being used for both PCI and CCW devices. Name it devicePath to avoid confusion when the device isn't a PCI device. Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>	2025-02-26 11:36:42 +01:00
alex.lyn	a338af3f18	kata-types: Fix bugs related to annotations in kata-types It addresses two issues: (1) expected `,`: --> /root/kata-containers/src/libs/kata-types/tests/test_config.rs:15:9 \| 14 \| KATA_ANNO_CFG_HYPERVISOR_ENABLE_IO_THREADS \| - \| \| \| expected one of `,`, `::`, `as`, or `}` \| help: missing `,` 15 \| KATA_ANNO_CFG_HYPERVISOR_FILE_BACKED_MEM_ROOT_DIR, \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unexpected token (2) remove useless annotation `KATA_ANNO_CFG_HYPERVISOR_CTLPATH`. Fixes #10936 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2025-02-26 17:48:11 +08:00
Fabiano Fidêncio	47a5439a20	Merge pull request #10934 from fidencio/topic/agent-unbreak-non-guest-pull-build agent: Fix non-guest-pull build	2025-02-26 09:45:22 +01:00
Pavel Mores	c5e560e2d1	runtime-rs: handle ProtectionDevice in resource manager and sandbox As part of device preparation in Sandbox we check available protection and create a corresponding ProtectionDeviceConfig if appropriate. The resource-side handling is trivial. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-02-26 09:11:35 +01:00
Pavel Mores	eb47f15b10	runtime-rs: support ProtectionDevice in qemu-rs As an example, or a test case, we add some implementation of SEV/SEV-SNP. Within the QEMU command line generation, the 'Cpu' object is extended to accommodate the EPYC-v4 CPU type for SEV-SNP. 'Machine' is extended to support the confidential-guest-support parameter which is useful for other TEEs as well. Support for emitting the -bios command line switch is added as that seems to be the preferred way of supplying a path to firmware for SEV/SEV-SNP. Support for emitting '-object sev-guest' and '-object sev-snp-guest' with an appropriate set of parameters is added as well. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-02-26 09:11:35 +01:00
Pavel Mores	87deb68ab7	runtime-rs: add implementation of ProtectionDevice ProtectionDevice is a new device type whose implementation structure matches the one of other devices in the device module. It is split into an inner "config" part which contains device details (we implement SEV/SEV-SNP for now) and the customary outer "device" part which just adds a device instance ID and the customary Device trait implementation. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-02-26 09:11:35 +01:00
Pavel Mores	a3f973db3b	runtime-rs: extend SEV/SEV-SNP detection by including a details struct This matches the existing TDX handling where additional details are retrieved right away after TDX is detected. Note that the actual details (cbitpos) acquisition is NOT included at this time. This change might seem bigger than it is. The change itself is just in protection.rs, the rest are corresponding adjustments. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-02-26 09:11:35 +01:00
Pavel Mores	c549d12da7	runtime-rs: parse SEV-SNP related config file settings The 'sev_snp_guest' default value of 'false' is in compliance with the golang runtime behaviour. Signed-off-by: Pavel Mores <pmores@redhat.com>	2025-02-26 09:11:35 +01:00
Markus Rudy	d58f38dfab	genpolicy: add get_process_fields to CronJob This function was accidentally left unimplemented for CronJob, resulting in runAsUser not being supported there. Fixes: #10653 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-02-26 09:00:04 +01:00
Ruoqing He	ec020399b9	ci: Enable partial components build-check on riscv Since we have RISC-V builders available now, let's start with `agent-ctl`, `trace-forwarder` and `genpolicy` components to run build-checks on these `riscv-builder`s, and gradually add the remaining components when they are ready, to catch up with other architectures eventually. This workflow can be manually triggered; `riscv-builder` will be the default instance when that is the case. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-26 15:38:39 +08:00
Markus Rudy	1f6833bd0d	runtime: add cause to CDI errors Adding devices by CDI annotation can fail for a variety of reasons. If that happens, it's helpful to know the root cause of the issue (CDI spec missing, malformatted, requested device not present, etc.). This commit adds the root cause of the CDI device addition to the errors reported back to the caller. Since this error is bubbled up all the way back to the shimv2 task.Create handler, it will be visible in Kubernetes logs and enable fixing the root cause. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-02-26 08:36:15 +01:00
Paul Meyer	9981cdd8a8	genpolicy: fail when layer can't be processed Currently, if a layer can't be processed, we log this as a warning and continue execution, finally exiting with a zero exit code. This can lead to the generation of invalid policies. One reason a layer might not be processed is that the pull of that layer fails. We need all layers to be processed successfully to generate a valid policy, as otherwise we will miss the verity hash for that layer or we might miss the USER information from a passwd stored in that layer. This will cause our VM to not get through the agent's policy validation. Returning an error instead of printing a warning will cause genpolicy to fail in such cases. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-02-26 08:30:59 +01:00
Fabiano Fidêncio	b3b570e4c4	agent: Fix non-guest-pull build As the guest-pull is a very Confidential Containers specific feature, let's make sure we, at least, don't break folks who decide to build Kata Containers' agent without having this feature enabled (for instance, for the sake of the agent size). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-25 21:48:41 +01:00
Zvonko Kaiser	04c56a0aaf	Merge pull request #10931 from zvonkok/iommufd-fix gpu: IOMMUFD fix	2025-02-25 12:50:24 -05:00
Ruoqing He	ed50e31625	build: Reorganize target selection Architectures here with `musl` available are a minority, so they are better suited for enumeration. With this change, we are implicitly choosing the gnu target for `ppc64le`, `riscv64` and `s390x`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-26 00:56:54 +08:00
Ruoqing He	562911e170	build: Add riscv mapping for common.bash While installing Rust and Golang in our CI workflow, `arch_to_golang` and `arch_to_rust` are needed for inferring the correct arch string for riscv64 architecture. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-26 00:56:54 +08:00
Ruoqing He	62e2473c32	build: Add riscv64 to utils.mk Since `ARCH` for `riscv64` is `riscv64gc`, we'll need to override it in `utils.mk`, and forcing `gnu` target for `riscv64` because `musl` target is not yet made ready. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-26 00:56:54 +08:00
Zvonko Kaiser	804e5cd332	gpu: IOMMUFD provide proper ID We need a proper ID otherwise QEMU sometimes fails with invalid ID. Use the same pattern as with the old VFIO implementation. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-25 16:24:17 +00:00
stevenhorsman	c97e9e1592	workflows: Add codeql config I noticed that CodeQl using the default config hasn't scanned since May 2024, so figured it would be worth trying an explicit configuration to see if that gets better results. It's mostly the template, but updated to be more relevant: - Only scan PRs and pushes to the `main` branch - Set a pinned runner version rather than latest (with mac support) - Edit the list of languages to be scanned to be more relevant for kata-containers Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-25 15:05:43 +00:00
Fabiano Fidêncio	e09ae2cc0b	Merge pull request #10921 from RuoqingHe/drop-redundant-override build: Drop redundant ARCH override	2025-02-25 14:54:36 +01:00
Fabiano Fidêncio	c01e7f1ed5	Merge pull request #10932 from kata-containers/topic/consolidate-publish-workflow workflows: Refactor publish workflows	2025-02-25 14:50:40 +01:00
stevenhorsman	5000fca664	workflows: Add build-checks to manual CI Currently the ci-on-push workflow that runs on PRs runs two jobs: gatekeeper-skipper.yaml and ci.yaml. In order to test things like for the error ``` too many workflows are referenced, total: 21, limit: 20 ``` on topic branches, we need ci-devel.yaml to have an extra workflow to match ci-on-push, so add the build-checks as this is helpful to run on topic branches anyway. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-25 11:38:49 +00:00
stevenhorsman	23434791f2	workflows: Refactor publish workflows Replace the four different publish workflows with a single one that takes input parameters of the arch and runner, to reduce the amount of duplicated code and try to avoid the ``` too many workflows are referenced, total: 21, limit: 20 ``` error	2025-02-25 10:49:09 +00:00
Fabiano Fidêncio	e3eb9e4f28	Merge pull request #10929 from kata-containers/topic/enable-arm-tests arm: ci: k8s: Enable CI	2025-02-24 19:34:28 +01:00
Fabiano Fidêncio	a6186b6244	ci: k8s: arm: Skip "Check the number vcpus are ..." test See https://github.com/kata-containers/kata-containers/issues/10928 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-24 18:43:24 +01:00
Fabiano Fidêncio	1798804c32	ci: k8s: arm: Skip "Pod quota" test See https://github.com/kata-containers/kata-containers/issues/10927 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-24 18:43:24 +01:00
Fabiano Fidêncio	053827cacc	ci: k8s: arm: Skip "Running within memory constraints" test See https://github.com/kata-containers/kata-containers/issues/10926 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-24 18:43:24 +01:00
Fabiano Fidêncio	7bd444fa52	ci: Run k8s tests on arm64 Let's take advantage of the current arm64 runners, and make sure we have those tests running there as well. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>	2025-02-24 18:43:20 +01:00
Aurélien Bombo	16aa6b9b4b	Merge pull request #10911 from kata-containers/sprt/fix-cgroup-race agent: Fix race condition with cgroup watchers	2025-02-24 10:28:58 -06:00
Ruoqing He	265a751837	build: Drop redundant ARCH override There are many `override ARCH = powerpc64le` after where `utils.mk` is included, which are redundant. Drop those redundant `override`s. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-24 22:04:28 +08:00
Fabiano Fidêncio	aa30f9ab1f	versions: Use jammy for x86_64 confidential initrd Set confidential initrd to use jammy rootfs Signed-off-by: Ryan Savino <ryan.savino@amd.com>	2025-02-22 23:57:16 -06:00
Aurélien Bombo	adca339c3c	ci: Fix GH throttling in run-nerdctl-tests Specify a GH API token to avoid the below throttling error: https://github.com/kata-containers/kata-containers/actions/runs/13450787436/job/37585810679?pr=10911#step:4:96 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-21 17:52:17 -06:00
Aurélien Bombo	111803e168	runtime: cgroups: Remove commented out code Doesn't seem like we're going to use this and it's confusing when inspecting code. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-21 17:52:17 -06:00
Aurélien Bombo	1f8c15fa48	Revert "tests: Skip k8s job test on qemu-coco-dev" This reverts commit `a8ccd9a2ac`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-21 17:52:17 -06:00
Aurélien Bombo	7542dbffb8	Revert "tests: disable k8s-policy-job.bats on coco-dev" This reverts commit `47ce5dad9d`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-21 17:52:17 -06:00
Aurélien Bombo	a1ed923740	agent: Fix race condition with cgroup watchers In the CI, test containers intermittently fail to start after creation, with an error like below (see #10872 for more details): # State: Terminated # Reason: StartError # Message: failed to start containerd task "afd43e77fae0815afbc7205eac78f94859e247968a6a4e8bcbb987690fcf10a6": No such file or directory (os error 2) I've observed this error to repro with the following containers, which have in common that they're all very short-lived by design (more tests might be affected): * k8s-job.bats * k8s-seccomp.bats * k8s-hostname.bats * k8s-policy-job.bats * k8s-policy-logs.bats Furthermore, appending a `; sleep 1` to the command line for those containers seemed to consistently get rid of the error. Investigating further, I've uncovered a race between the end of the container process and the setting up of the cgroup watchers (to report OOMs). If the process terminates first, the agent will try to watch cgroup paths that don't exist anymore, and it will fail to start the container. The added error context in notifier.rs confirms that the error comes from the missing cgroup: https://github.com/kata-containers/kata-containers/actions/runs/13450787436/job/37585901466#step:17:6536 The fix simply consists in creating the watchers before we start the container but still after we create it -- this is non-blocking, and IIUC the cgroup is guaranteed to already be present then. Fixes: #10872 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-21 17:52:11 -06:00
Fabiano Fidêncio	aaa7008cad	versions: Add a comment about "jammy" being 22.04 I missed that when I added the other comments, so, for the sake of consistency, let's just add it there as well. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-21 16:02:38 -06:00
Fabiano Fidêncio	a7d33cc0cb	build: Ensure MEASURED_ROOTFS is only used for images We never ever tested MEASURED_ROOTFS with initrd, and I sincerely do not know why we've been setting that to "yes" in the initrd cases. Let's drop it, as it may be causing issues with the jobs that rely on the rootfs-initrd-confidential. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-21 15:32:20 -06:00
Dan Mihai	b90c537f79	Merge pull request #10881 from mythi/build-fixes minor build fixes	2025-02-21 09:54:55 -08:00
Jeremi Piotrowski	304978ad47	Merge pull request #10784 from arvindskumar99/disable_nesting_checks Disabling Nesting Check for SNP upstream	2025-02-21 12:39:18 +01:00
Xuewei Niu	cdb29a4fd1	Merge pull request #10780 from RuoqingHe/setup-dragonball-workspace dragonball: Appease clippy, setup workspace and centralize RustVMM	2025-02-21 14:04:19 +08:00
Hyounggyu Choi	58647bb654	Merge pull request #10743 from zvonkok/iommufd-gpu-fix IOMMUFD GPU enhancement	2025-02-20 23:43:00 +01:00
Zvonko Kaiser	7cca2c4925	gpu: Use a dedicated VFIO group vs iommufd entry We do not want to abuse the sysfsentry; let's use a dedicated devfsentry. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-20 18:27:52 +00:00
Zvonko Kaiser	9add633258	qemu: Add command line for IOMMUFD For each IOMMUFD device, create an object and assign it to the device. We need additional information, which is now populated correctly, to decide whether we run the old VFIO or the new VFIO backend. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-20 18:27:50 +00:00
Fabiano Fidêncio	19a7f27736	Merge pull request #10906 from BbolroC/remove-measured-rootfs-check-for-shimv2-on-s390x shim-v2: Remove MEASURED_ROOTFS assignment for s390x	2025-02-20 15:53:50 +01:00
arvindskumar99	c0a3ecb27b	config: Disabling nesting check for SNP Adding disable_nesting_checks to accommodate SNP on Azure Signed-off-by: arvindskumar99 <arvinkum@amd.com>	2025-02-20 12:24:08 +01:00
Hyounggyu Choi	1a9dabd433	shim-v2: Remove MEASURED_ROOTFS assignment for s390x As a follow-up for #10904, we do not need to set MEASURED_ROOTFS to no on s390x explicitly. The GHA workflow already exports this variable. This commit removes the redundant assignment. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-20 10:43:36 +01:00
Greg Kurz	f51d84b466	Merge pull request #10904 from BbolroC/turn-off-measured-rootfs-s390x-gha-workflows GHA: Turn off MEASURED_ROOTFS in build-kata-static-tarball-s390x	2025-02-20 10:24:23 +01:00
Aurélien Bombo	601c403603	Merge pull request #10818 from burgerdev/plumbing agent: clear log pipes if denied by policy	2025-02-19 16:28:58 -06:00
Aurélien Bombo	cb3467535c	tests: Add policy test for ReadStreamRequest This test verifies that, when ReadStreamRequest is blocked by the policy, the logs are empty and the container does not deadlock. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-02-19 14:03:41 -06:00
Hyounggyu Choi	ca40462a1c	Merge pull request #10903 from BbolroC/fixes-for-cri-containerd-on-ubuntu24 tests: Support systemd unit files in /usr/lib as well as /lib	2025-02-19 19:45:55 +01:00
Hyounggyu Choi	d973d41efb	GHA: Turn off MEASURED_ROOTFS in build-kata-static-tarball-s390x This is the first attempt to remove the following code: ``` if [ "${ARCH}" == "s390x" ]; then export MEASURED_ROOTFS=no fi ``` from install_shimv2() in kata-deploy-binaries.sh. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-19 18:19:19 +01:00
Zvonko Kaiser	238db32126	Merge pull request #10868 from zvonkok/qemu-tdx-experimental-workflow QEMU TDX experimental workflow	2025-02-19 10:09:27 -05:00
Zvonko Kaiser	f0eef73a89	gpu: Add no_patches.txt for TDX flavour As always, if we do not have any patches, create the no_patches.txt for the specific tag gpu_tdx_... Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-19 14:59:04 +00:00
Zvonko Kaiser	ca4d227562	gpu: Add qemu-tdx-experimental build We need to introduce again the qemu-tdx build for the GPU Depends-on: #10867 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-19 14:48:56 +00:00
Hyounggyu Choi	a8363c28ca	tests: Support systemd unit files in /usr/lib as well as /lib On Ubuntu 24.04, due to the /usr merge, system-provided unit files now reside in `/usr/lib/systemd/system/` instead of `/lib/systemd/system/`. For example, the command below now returns a different path: ``` $ systemctl show containerd.service -p FragmentPath /usr/lib/systemd/system/containerd.service ``` Previously, on Ubuntu 22.04 and earlier, it returned: ``` /lib/systemd/system/containerd.service ``` The current pattern `if [[ $unit_file == /lib* ]]` fails to match the new path. To ensure compatibility across versions, we update the pattern to match both `/lib` and `/usr/lib` like: ``` if [[ $unit_file =~ ^/(usr/)?lib/ ]] ``` Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-19 14:34:59 +01:00
Zvonko Kaiser	0d786577c6	Merge pull request #10867 from zvonkok/qemu-snp-tdx-experimental gpu: QEMU SNP+TDX experimental updates	2025-02-19 08:26:37 -05:00
Ruoqing He	a8a096b20c	dragonball: Centralize RustVMM crates Centralize all RustVMM crates to workspace.dependencies to prevent having multiple versions of each RustVMM crate, which is error-prone and inconsistent. With this setup, updates on RustVMM crates would be much easier. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-19 21:20:30 +08:00
Ruoqing He	b129972e12	dragonball: Setup workspace Setup workspace in dragonball, move `dbs` crates one level up to be managed as members of dragonball workspace. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-19 21:20:30 +08:00
Ruoqing He	a174e2be03	dragonball: Appease clippy introduced by 1.80.0 New clippy warnings show up after the Rust toolchain was bumped from 1.75.0 to 1.80.0; fix accordingly. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-19 21:20:30 +08:00
Ruoqing He	6bb193bbc0	spell: Update dictionary for dbs crates Add entries for dbs_* crates' README.md to pass `kata-spell-check.sh` spell checking. Changed British terms to American terms in README of `dbs_pci` to pass `hunspell` check. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-19 21:20:30 +08:00
Zvonko Kaiser	73b7a3478c	Merge pull request #10893 from RuoqingHe/fix-static-check ci: Fix spell_check and improve header_check	2025-02-19 08:08:40 -05:00
Mikko Ylinen	926119040c	packaging: make install_oras.sh to run curl without sudo sudo hides the environment variables that are sometimes useful with the builds (for example: proxy settings). While install_oras.sh could run completely without sudo in the container it's COPY'd to, make minimal changes to it to keep it functional outside the container too while still addressing the problem of 'sudo curl' not working with proxy env variables. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-02-19 09:34:13 +02:00
Mikko Ylinen	0d8242aee4	agent: rename cargo config To mitigate: warning: `.../kata-containers/src/agent/.cargo/config` is deprecated in favor of `config.toml` note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml` Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2025-02-19 09:34:13 +02:00
Fabiano Fidêncio	c8db24468c	Merge pull request #10894 from BbolroC/use-multi-arch-for-qemu-sample example: Use multi-arch image for test-deploy-kata-qemu.yaml	2025-02-18 23:43:52 +01:00
Dan Mihai	672462e6b8	Merge pull request #10895 from katexochen/p/agent-deps agent: make policy feature optional again	2025-02-18 13:27:23 -08:00
Dan Mihai	6b389fdd4f	Merge pull request #10896 from katexochen/p/oci-client-genplicy genpolicy: bump oci-distribution to v0.12.0	2025-02-18 12:42:23 -08:00
Markus Rudy	67fbad5f37	genpolicy: bump oci-distribution to v0.12.0 This picks up a security fix for confidential pulling of unsigned images. The crate moved permanently to oci-client, which required a few import changes. Co-authored-by: Paul Meyer <katexochen0@gmail.com> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-02-18 16:32:00 +01:00
Ruoqing He	d23284a0dc	header_check: Check header for changed text files We are running `header_check` for non-text files such as binary files, symbolic links, image files (pictures), etc., which does not make sense. Filter out non-text files and run `header_check` only for changed text files. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-18 22:39:53 +08:00
Paul Meyer	80af09aae9	agent: make policy feature optional again This was messed up a little when factoring out the policy crate. Removing the dependencies no longer used by the agent and making the import of kata-agent-policy optional again. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2025-02-18 15:28:06 +01:00
Hyounggyu Choi	4646058c0c	example: Use multi-arch image for test-deploy-kata-qemu.yaml An image `registry.k8s.io/hpa-example` only supports amd64. Let's use a multi-arch image `quay.io/prometheus/prometheus` for the QEMU example instead. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-18 14:23:09 +01:00
Ruoqing He	7e49e83779	spell: Add missing entries for kata-spell-check `kata-dictionary.dic` changes after running `kata-spell-check.sh make-dict`. This is because someone forgot to first update entries in data and run `make-dict`, and directly updated `kata-dictionary.dic` instead. Add missing entries to data and re-run `make-dict` to generate the correct `kata-dictionary.dic`. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-18 19:06:34 +08:00
Lukáš Doktor	d0ef78d3a4	ci: Change the way we modify runtimeclass in webhook Previously we deployed the webhook and then modified the cm from our ci/openshift-ci/ script to the desired value, but sometimes the webhook pod starts before we modify the cm and keeps using the default value. Let's change the approach and modify the deployments in-place. The only downside is that it leaves the git tree dirty, but since this script is only supposed to be used in CI it should be safe. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2025-02-18 11:39:22 +01:00
Anastassios Nanos	1e6cea24c8	Merge pull request #10890 from zvonkok/arm64-fix-release release: Remove artifacts for release	2025-02-17 22:29:23 +02:00
Zvonko Kaiser	1d9915147d	release: Remove artifacts for release We need to make sure the release does not have any residual binaries left for the release payload Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-17 20:16:48 +00:00
Anastassios Nanos	ae1be28ddd	Merge pull request #10880 from nubificus/3.14.0-release release: Bump version to 3.14.0	2025-02-17 20:25:30 +02:00
Zvonko Kaiser	72833cb00b	Merge pull request #10878 from zvonkok/agent_cdi_timeout gpu: agent cdi timeout	2025-02-17 12:49:51 -05:00
Zvonko Kaiser	fda095a4c9	Merge pull request #10786 from zvonkok/gpu-config-update gpu: Update config files	2025-02-17 12:45:54 -05:00
Anastassios Nanos	c7347cb76d	release: Bump version to 3.14.0 Bump VERSION and helm-chart versions Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2025-02-17 16:47:24 +00:00
Fabiano Fidêncio	639bc84329	Merge pull request #10787 from fidencio/topic/bump-kernel-to-6.12.11 version: Bump kernel to 6.12.13	2025-02-17 17:39:14 +01:00
Fabiano Fidêncio	7ae5fa463e	versions: Bump coco-guest-components So attestation-agent and others have a version including the ttrpc bump to v0.8.4, allowing us to use the latest LTS kernel. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-17 15:16:54 +01:00
Fabiano Fidêncio	1381cab6f0	build: Fix rootfs cache logic We've been appending to the wrong variable for quite some time, it seems, leading to not actually regenerating the rootfs when needed. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-17 13:55:36 +01:00
Fabiano Fidêncio	7fc7328bbc	versions: Bump kernel to 6.12.13 Let's try to keep up with the LTS patch releases. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-17 13:47:35 +01:00
Simon Kaegi	f5edbfd696	kernel: support loop device in v6.8+ kernels Set CONFIG_BLK_DEV_WRITE_MOUNTED=y to restore previous kernel behaviour. Kernel v6.8+ will by default block buffer writes to block devices mounted by filesystems. This unfortunately is what we need to use mounted loop devices needed by some teams to build OSIs and as an overlay backing store. More info on this config item [here](https://cateee.net/lkddb/web-lkddb/BLK_DEV_WRITE_MOUNTED.html) Fixes: #10808 Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>	2025-02-17 13:47:35 +01:00
Fabiano Fidêncio	d96e8375c4	Merge pull request #10885 from stevenhorsman/bump-agent-crates-to-resolve-CVEs agent: Bump agent crates to resolve CVEs	2025-02-17 12:11:43 +01:00
stevenhorsman	e5a284474d	deps: Update cookie-store & publicsuffix Run: ``` cargo update -p cookie-store cargo update -p publicsuffix ``` to update the version of idna and resolve CVE-2024-12224 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-14 17:30:03 +00:00
stevenhorsman	5656fc6139	deps: Bump reqwest Bump reqwest to 0.12.12 to pick up fixes Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-14 17:30:03 +00:00
stevenhorsman	3a3849efff	deps: Update quinn-proto Update quinn-proto to fix CVE-2024-45311 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-14 17:30:03 +00:00
Fabiano Fidêncio	64ceb0832a	Merge pull request #10851 from fidencio/topic/bump-image-rs-to-bring-in-ttrpc-0.8.4 agent: Bump image-rs to 514c561d93	2025-02-14 18:21:56 +01:00
Fabiano Fidêncio	d5878437a4	Merge pull request #10845 from DataDog/dind-subcgroup-fix Add process to init subcgroup when we're using dind with cgroups v2	2025-02-14 18:12:24 +01:00
Steve Horsman	469c651fc0	Merge pull request #10879 from nubificus/fix_version packaging(release): Properly handle version tag for the release bundle	2025-02-14 14:40:37 +00:00
Zvonko Kaiser	908aacfa78	gpu: Update the logging around CDI Removed a rogue printf and updated the logging to say that we're waiting for CDI spec(s) to be generated rather than saying there is an error. It is not an error until the timeout expires. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-14 14:32:00 +00:00
Zvonko Kaiser	4bda16565b	gpu: Update timeouts With the create_container_timeout, the dial_timeout is less important. Add the custom timeout for GPUs in create_container_timeout Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-14 14:29:18 +00:00
Zvonko Kaiser	66ccc25724	tdx: Update GPU config for the latest TDX stack We need extra kernel_params for TDX Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-14 14:29:18 +00:00
Zvonko Kaiser	d4dd87a974	gpu: Update config files With the recent changes to cgroupsv1 and AGENT_INIT=no, we need to update the config files. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-14 14:29:18 +00:00
Anastassios Nanos	b13db29aaa	packaging(release): Properly handle version tag for the release bundle The tags created automatically for published Github releases are probably not annotated, so by simply running `git describe` we are not getting the correct tag. Use a `git describe --tags` to allow git to look at all tags, not just annotated ones. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2025-02-14 12:41:08 +00:00
Zvonko Kaiser	2499d013bd	gpu: Update handle_cdi_devices AgentConfig now has the cdi_timeout from the kernel cmdline, update the proper function signature and use it in the for loop. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-13 20:11:48 +00:00
Zvonko Kaiser	d28410ed75	Merge pull request #10877 from AdithyaKrishnan/main CI: Deprecate SEV	2025-02-13 14:55:11 -05:00
Zvonko Kaiser	95aa21f018	gpu: Add CDI timeout via kernel config Some systems like a DGX where we have 8 H100 or 8 H800 GPUs need some extended time to be initialized. We need to make sure we can configure CDI timeout, to enable even systems with 16 GPUs. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-13 19:23:19 +00:00
Adithya Krishnan Kannan	6cc5b79507	CI: Deprecate SEV Phase 1 of Issue #10840 AMD has deprecated SEV support on Kata Containers, and going forward, SNP will be the only AMD feature supported. As a first step in this deprecation process, we are removing the SEV CI workflow from the test suite to unblock the CI. Will be adding future commits to remove redundant SEV code paths. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2025-02-13 12:20:21 -06:00
Steve Horsman	0a39f59a9b	Merge pull request #10874 from stevenhorsman/skip-consistently-failing-block-volume-test tests: Skip block volume test on fc, stratovirt	2025-02-13 15:39:45 +00:00
Zvonko Kaiser	a0766986e7	Merge pull request #10832 from RuoqingHe/update-yq ci: Update yq to v4.44.5 to support riscv64	2025-02-13 08:33:02 -05:00
stevenhorsman	56fb2a9482	tests: Skip block volume test on fc, stratovirt The block volume test has failed on 10/10 nightlies and all the PRs I've seen, so skip it until it can be assessed. See #10873 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-13 11:50:35 +00:00
stevenhorsman	2d266df846	test: Update expected error in signed image tests We are seeing a different error in the new version of image-rs, so update our tests to match. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-13 11:44:51 +00:00
stevenhorsman	d28a512d29	agent: Wait for network before init_image_service Based on the guidance from @Xynnn007 in #10851 > The new version of image-rs will do attestation once ClientBuilder.build().await() is called, while the old version will do so lazily the first image pull request comes. Looks like it's called in rpc::start() in kata-agent, when I'm afraid the network hasn't been initialized yet. > I am not sure if the guest network is prepared after the DNS is configured (in create_sandbox), if so we can move (the init_image_service) right after that. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-13 11:44:51 +00:00
Tobin Feldman-Fitzthum	a13d5a3f04	agent: Bump image-rs to 514c561d93 As this brings in the commit bumping ttrpc to 0.8.4, which fixes connection issues with kernel 6.12.9+. As image-rs has a new builder pattern and several of the values in the image client config have been renamed, let's change the agent to account for this. Signed-off-by: Tobin Feldman-Fitzthum <tobin@linux.ibm.com> Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-13 11:44:51 +00:00
Steve Horsman	8614e5efc4	Merge pull request #10869 from stevenhorsman/bump-kcli-ubuntu-version ci: k8s: Bump kcli image version	2025-02-13 09:59:20 +00:00
Antoine Gaillard	4b5b788918	agent: Use init subcgroup for process attachment in DinD cgroups v2 enforces stricter delegation rules, preventing operations on cgroups outside our ownership boundary. When running Docker-in-Docker (DinD), processes must be attached to an "init" subcgroup within the systemd unit. This fix detects and uses the init subcgroup when proxying process attachment. Fixes #10733 Signed-off-by: Antoine Gaillard <antoine.gaillard@datadoghq.com>	2025-02-13 10:44:51 +01:00
Dan Mihai	958cd8dd9f	Merge pull request #10613 from 3u13r/feat/policy/refactor-out-policy-crate-and-network-namespace policy: add policy crate and add network namespace check to policy	2025-02-12 18:28:09 -08:00
Alex Lyn	e1b780492f	Merge pull request #10839 from RuoqingHe/appease-clippy dragonball: Appease clippy	2025-02-13 09:12:15 +08:00
Zvonko Kaiser	acd2a933da	Merge pull request #10864 from fidencio/topic/packaging-move-to-ubuntu-22-04 packaging: Move builds to Ubuntu 22.04	2025-02-12 14:29:41 -05:00
Wainer Moschetta	62e239ceaa	Merge pull request #10810 from arvindskumar99/nydus_perm_install Skipping SNP and SEV from deploying and deleting Snapshotter	2025-02-12 14:38:56 -03:00
stevenhorsman	fd7bcd88d0	ci: k8s: Bump kcli image version When trying to deploy nydus on kcli locally we get the following failure: ``` root@sh-kata-ci1:~# kubectl get pods -n nydus-system NAMESPACE NAME READY STATUS RESTARTS AGE nydus-system nydus-snapshotter-5kdqs 0/1 CrashLoopBackOff 4 (84s ago) 7m29s ``` Digging into this I found that the nydus-snapshotter service is failing with: ``` ubuntu@kata-k8s-worker-0:~$ journalctl -u nydus-snapshotter.service -- Logs begin at Wed 2025-02-12 15:06:08 UTC, end at Wed 2025-02-12 15:20:27 UTC. -- Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: Started nydus snapshotter. Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required b> Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required b> Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: nydus-snapshotter.service: Main process exited, code=exited, status=1/FAILURE ``` I think this is because 20.04 has version: ``` ubuntu@kata-k8s-worker-0:~$ ldd --version ldd (Ubuntu GLIBC 2.31-0ubuntu9.16) 2.31 ``` so it's too old for the nydus snapshotter. Also 20.04 is EoL soon, so bumping is better. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-12 15:38:18 +00:00
Zvonko Kaiser	fbc8454d3d	Merge pull request #10866 from zvonkok/enable-cc-gpu-build gpu: enable confidential initrd build	2025-02-12 09:26:08 -05:00
Ruoqing He	897e2e2b6e	dragonball: Appease clippy Some problems hidden in `dbs` crates are revealed after making these crates workspace components; fix them according to `cargo clippy` suggestions. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-12 19:44:34 +08:00
Leonard Cohnen	ec0af6fbda	policy: check the linux network namespace Peer pods have a linux namespace of type network. We want to make sure that all containers in the same pod use the same namespace. Therefore, we add the first namespace path to the state and check all other requests against that. This commit also adds the corresponding integration test in the policy crate showcasing the benefit of having rust integration tests for the policy. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2025-02-12 10:41:15 +01:00
Leonard Cohnen	7aca7a6671	policy: use agent policy crate in genpolicy test The generated rego policies for `CreateContainerRequest` are stateful and that state is handled in the policy crate. We use this policy crate in the genpolicy integration test to be able to test if those state changes are handled correctly without spinning up an agent or even a cluster. This also allows to easily test on a e.g., CreateContainerRequest level instead of relying on changing the yaml that is applied to a cluster. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2025-02-12 10:41:15 +01:00
Leonard Cohnen	d03738a757	genpolicy: expose create as library This commit allows to programmatically invoke genpolicy. This allows for other rust tools that don't want to consume genpolicy as binary to generate policies. One such use-case is the policy integration test implemented in the following commits. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2025-02-12 10:41:15 +01:00
Leonard Cohnen	cf54a1b0e1	agent: move policy module into separate crate The policy module augments the policy generated with genpolicy by keeping and providing state to each invocation. Therefore, it is not sufficient anymore to test the passing of requests in the genpolicy crate. Since in Rust, integration tests cannot call functions that are not exposed publicly, this commit factors out the policy module of the agent into its own crate and exposes the necessary functions to be consumed by the agent and an integration test. The integration test itself is implemented in the following commits. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2025-02-12 10:41:15 +01:00
Fupan Li	ec7b2aa441	Merge pull request #10850 from teawater/direct Clean the config block_device_cache_direct of runtime-rs	2025-02-12 09:45:37 +08:00
Zvonko Kaiser	5431841a80	Merge pull request #10814 from kata-containers/shellcheck-gha gha: Add shellcheck	2025-02-11 18:30:41 -05:00
Zvonko Kaiser	2d8531cd20	gpu: Add TDX experimental target for GPUs We have custom branches on coco/qemu to support GPUs in TDX and SNP; add an experimental target. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 17:32:31 +00:00
Zvonko Kaiser	7ded74c068	gpu: Add version for QEMU+TDX+SNP SNP and TDX patches for GPU are not compatible hence we need an own build for TDX. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 17:32:31 +00:00
Zvonko Kaiser	e4679055c6	gpu: qemu-snp-experimental no patches The branch has all the needed cherry-picks Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 17:32:31 +00:00
Zvonko Kaiser	7a219b3f03	gpu: Add GPU+SNP QEMU build Since the CPU SNP is upstreamed and available via our default QEMU target we're repurposing the SNP-experimental for the GPU+SNP enablement. First step is to update the version we're basing it off. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 17:32:31 +00:00
Zvonko Kaiser	b231a795d7	gha: Add shellcheck We need to start fixing our scripts. Let's run shellcheck and see what needs to be reworked. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 16:00:34 +00:00
Zvonko Kaiser	befb2a7c33	gpu: Confidential Initrd Start building the confidential initrd Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-11 15:41:36 +00:00
Fupan Li	5b809ca440	CI: a workaround for containerd v2.x e2e test The latest containerd has an issue in its e2e test, so we apply the following fix to work around it. For more information about this issue, please see: https://github.com/containerd/containerd/pull/11240 Once this PR is merged and a new version is released, we can remove this workaround. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	a3fd3d90bc	ci: Add the sandbox api testcases A test case is added based on the integrated cri-containerd case. The difference between the cri-containerd integrated test case and the sandbox api test case is the "sandboxer" setting in the sandbox runtime handler. If the "sandboxer" is set to "" or "podsandbox", then containerd will use the legacy shimv2 api, and if the "sandboxer" is set to "shim", then it will use the sandbox api to launch the pod. In addition, add a containerd v2.0.0 version, because containerd officially supports the sandbox api from version 2.0.0. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	36bf080c1e	runtime-rs: register the sandbox api service Add and register the sandbox api service, so that runtime-rs can deal with the sandbox api rpc calls from containerd. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	8332f427d2	runtime-rs: add the wait and status method for sandbox api Add the sandbox wait and sandbox status method for sandbox api. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	2d6b1e6b13	runtime-rs: add the sandbox api support For Kata-Containers, we add SandboxService for these new calls alongside the existing TaskService, including processing requests and replies, and properly calling VirtSandbox's interfaces. By splitting the start logic of the sandbox, virt_container is compatible with calls from the SandboxService and TaskService. In addition, we modify the processing of resource configuration to solve the problem that SandboxService does not have a spec file when creating a pod. The sandbox api is supported from containerd 1.7, but it is enabled differently in containerd 2.0. To enable it on 2.0, you can support the sandbox api for a specific runtime by adding: sandboxer = "shim", taking the kata runtime as an example: [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata] runtime_type = "io.containerd.kata.v2" sandboxer = "shim" privileged_without_host_devices = true pod_annotations = ["io.katacontainers.*"] For containerd version 1.7, you can enable it by: 1: add env ENABLE_CRI_SANDBOXES=true 2: add sandbox_mode = "shim" to runtime config. Acknowledgement This work was based on @wllenyj's POC code: (`f5b62a2d7c`) Signed-off-by: Fupan Li <fupan.lfp@antgroup.com> Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>	2025-02-11 15:21:53 +01:00
Fupan Li	65e908a584	runtime-rs: add the sandbox init for sandbox api For the processing of init sandbox, the init of task api has some more special processing procedures than the init of sandbox api, so these two types of init are separated here. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	be40646d04	runtime-rs: move the sandbox start from sandbox init function Split the sandbox start from the sandbox init process, and call them separately. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	438f81b108	runtime-rs: only get the containerd id when start container When starting the sandbox, the sandbox id is passed from the shim command line, so it only needs to get the containerd id from the oci spec when starting the pod container instead of the pod sandbox. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	9492c45d06	runtime-rs: load the cgroup path correctly When the sandbox api is enabled, the pause container is removed and the sandbox start api only passes an empty bundle directory, which means there's no oci spec file under it, so the cgroup config can't get the cgroup path from the pause container's oci spec. So we should set a default cgroup path for the sandbox api case. In the future, we can promote containerd to pass the cgroup path during the sandbox start phase. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	78b96a6e2e	runtime-rs: fix the issue of missing create sandbox dir We need to make sure the sandbox storage path exists before returning it. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	97785b1f3f	runtime-rs: rustfmt against lib.rs It seems some files were missed when running rustfmt. This commit runs rustfmt on them. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Fupan Li	33555037c0	protocols: Add the cri api protos Add the cri api protos to support the sandbox api. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-02-11 15:21:53 +01:00
Hui Zhu	27cff15015	runtime-rs: Remove block_device_cache_direct from config of fc Remove block_device_cache_direct from config of fc in runtime-rs because fc doesn't support this config. Fixes: #10849 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-02-11 14:04:11 +08:00
Hui Zhu	70d9afbd1f	runtime-rs: Add block_device_cache_direct to config of ch and dragonball Add block_device_cache_direct to config of ch and dragonball in runtime-rs because they support this config. Fixes: #10849 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-02-11 14:04:11 +08:00
Hui Zhu	db04c7ec93	runtime-rs: Add block_device_cache_direct config to ch and qemu Add block_device_cache_direct config to ch and qemu in runtime-rs. Fixes: #10849 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-02-11 14:04:11 +08:00
Hui Zhu	e4cbc6abce	runtime-rs: CloudHypervisorInner: Change config type This commit changes config in CloudHypervisorInner to normal HypervisorConfig to decrease the change of its type. Fixes: #10849 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-02-11 14:04:11 +08:00
Fabiano Fidêncio	75ac09baba	packaging: Move builds to Ubuntu 22.04 As Ubuntu 20.04 will reach its EOL in April. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-10 21:25:43 +01:00
Fabiano Fidêncio	c9f5966f56	Merge pull request #10860 from kata-containers/topic/debug-ci workflows: build: Do not store unnecessary content on the tarball	2025-02-10 20:01:37 +01:00
Fabiano Fidêncio	ec290853e9	workflows: build: Do not store unnecessary content on the tarball Otherwise we may end up simply unpacking kata-containers specific binaries into the same location that system ones are needed, leading to a broken system (most likely what happened with the metrics CI, and also what's happening with the GHA runners). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-10 18:57:29 +01:00
Steve Horsman	fb341f8ebb	Merge pull request #10857 from fidencio/topic/ci-tdx-only-use-one-machine-for-testing ci: Only use the Ubuntu TDX machine in the CI	2025-02-10 15:25:06 +00:00
Fabiano Fidêncio	23cb5bb6c2	ci: Only use the Ubuntu TDX machine in the CI We've been hitting issues with the CentOS 9 Stream machine, which Intel doesn't have cycles to debug. After raising this up in the Confidential Containers community meeting we got the green light from Red Hat (Ariel Adam) to just disable the CI based on CentOS 9 Stream for now. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-02-10 12:50:16 +01:00
Zvonko Kaiser	eb1cf792de	Merge pull request #10791 from kata-containers/gpu_ci_cd gpu: Add first target and fix extratarballs	2025-02-06 15:47:27 -05:00
Zvonko Kaiser	62a975603e	Merge pull request #10806 from stevenhorsman/rust-1.80.0-bump Rust 1.80.0 bump	2025-02-06 14:49:23 -05:00
Dan Mihai	fdf3088be0	Merge pull request #10842 from microsoft/danmihai1/disable-job-policy-test tests: disable k8s-policy-job.bats on coco-dev	2025-02-06 09:09:49 -08:00
Hyounggyu Choi	48c5b1fb55	Merge pull request #10841 from BbolroC/make-measured-rootfs-configurable local-build: Do not build measured rootfs on s390x	2025-02-06 16:07:15 +01:00
Hyounggyu Choi	1bdb34e880	tests: Skip trusted storage tests for IBM SE Let's skip all tests for trusted storage until #10838 is resolved. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-06 12:09:14 +01:00
Hyounggyu Choi	27ce3eef12	local-build: Do not use measured rootfs on s390x IBM SE ensures that the initrd is measured by genprotimg and verified by the ultravisor. Let's not build the measured rootfs on s390x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-06 10:12:55 +01:00
stevenhorsman	fce49d4206	dragonball: Skip unsafe tests Skip tests that make unsafe use of file descriptors, which causes ``` fatal runtime error: IO Safety violation: owned file descriptor already closed ``` See #10821 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-06 08:54:17 +00:00
Fabiano Fidêncio	2ceb7a35fc	versions: Bump rust to 1.80.0 (matching coco-guest-components) This is needed in order to avoid agent build issues, such as: ``` error[E0658]: use of unstable library feature 'lazy_cell' --> /home/ansible/.cargo/git/checkouts/guest-components-1e54b222ad8d9630/514c561/ocicrypt-rs/src/lib.rs:10:5 \| 10 \| use std::sync::LazyLock; \| ^^^^^^^^^^^^^^^^^^^ \| = note: see issue #109736 <https://github.com/rust-lang/rust/issues/109736> for more information ``` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-06 08:53:51 +00:00
Fabiano Fidêncio	76df852f33	packaging: agent: Add rust version to the builder image name As we want to make sure a new builder image is generated if the rust version is bumped. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-06 08:53:51 +00:00
stevenhorsman	d3e0ecc394	kata-ctl: Allow empty const Due to the way that multi-arch support is done, on various platforms we will get a clippy error: ``` error: this expression always evaluates to false ``` which might not be true on those other platforms, so allow this code pattern to suppress the clippy error Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-06 08:53:51 +00:00
Fabiano Fidêncio	6de8e59109	Merge pull request #10824 from stevenhorsman/updates-in-prep-of-rust-1.80-bump Updates in prep of rust 1.80 bump	2025-02-06 09:05:23 +01:00
Dan Mihai	47ce5dad9d	tests: disable k8s-policy-job.bats on coco-dev k8s-policy-job is modeled after the older k8s-job, and it appears that both of them fail occasionally on coco-dev. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-02-05 23:06:16 +00:00
Arvind Kumar	47534c1c3e	nydus: Skipping SNP and SEV from deploying and deleting Snapshotter Preparing to install nydus permanently on the AMD node, so disabling deploy and delete command for SNP and SEV. Signed-off-by: Arvind Kumar <arvinkum@amd.com>	2025-02-05 12:26:53 -06:00
Zvonko Kaiser	45bd451fa0	ci: add arm64 attestation Do the very same thing that we do on amd64 and add attestation Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-05 16:30:20 +00:00
Zvonko Kaiser	9a7dff9c40	gpu: Add arm64 targets We want to make sure we deliver arm64 GPU targets as well Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-05 16:30:20 +00:00
Zvonko Kaiser	968318180d	ci: Add extratarballs steps We introduced extratarballs with a make target. The CI currently only uploads tarballs that are listed in the matrix. The NV kernel builds a headers package which needs to be uploaded as well. The get-artifacts has a glob to download all artifacts hence we should be good. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-05 16:30:20 +00:00
Zvonko Kaiser	b04bdf54a5	gpu: Add rootfs target amd64/arm64 Adding the initrd build first to get the rootfs on amd64. With that we can start to add tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-05 16:30:20 +00:00
stevenhorsman	7831caf1e7	libs/safe-path: Fix doc formatting Clippy fails with ``` error: doc list item missing indentation ``` so indent further to avoid this.	2025-02-05 15:16:47 +00:00
stevenhorsman	17b1e94f1a	cargo: Update time crate So it avoids us hitting ``` error[E0282]: type annotations needed for `Box<_>` --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/time-0.3.31/src/format_description/parse/mod.rs:83:9 \| 83 \| let items = format_items \| ^^^^^ ... 86 \| Ok(items.into()) \| ---- type must be known at this point \| help: consider giving `items` an explicit type, where the placeholders `_` are specified \| 83 \| let items: Box<_> = format_items \| ++++++++ ``` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 15:16:47 +00:00
stevenhorsman	e9393827e8	agent: Workaround ppc formatting On powerpc64le platform the ip neigh command has a trailing space after the state, so the test is failing e.g. ``` assertion `left == right` failed left: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT \n" right: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT\n" ``` Trim the whitespace to make the test pass on all platforms Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 15:16:47 +00:00
stevenhorsman	1ac0e67245	kata-ctl: Add stub of missing method for ppc `host_is_vmcontainer_capable` is required, but wasn't implemented for powerpc64, so copy the aarch64 approach @Amulyam24 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 15:16:47 +00:00
stevenhorsman	bd3c93713f	kata-sys-util: Complete code move In #7236 the guest protection code was moved to kata-sys-utils, but some of it was left behind, and the adjustment to the new location wasn't completed, so the powerpc64 code doesn't build now we've fixed the cfg to test it. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 15:16:47 +00:00
stevenhorsman	9f865f5bad	kata-ctl: Allow dead_code Some of the Kernel structs have `#[allow(dead_code)]` but not all and this results in the clippy error: ``` error: fields `name` and `value` are never read ``` so complete the job started before to remove the error. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	61a252094e	dragonball: Fix feature typo Replace `legacy_irq` with `legacy-irq` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	add785f677	dragonball: Remove unused fields `metrics` is never used, so remove this code Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	dde34bb7b8	runtime-rs: Remove un-used code The `r#type` method is never used, so neither are the log type constants Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	71fffb8736	runtime-rs: Allow dead code Clippy errors with: ``` error: field `driver` is never read --> crates/resource/src/network/utils/link/driver_info.rs:77:9 \| 76 \| pub struct DriverInfo { \| ---------- field in this struct 77 \| pub driver: String, \| ^^^^^^ ``` We set this, but never read it, so clippy is correct, but I'm not sure if it's useful for logging, or other purposes, so I'll allow it for now. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	d75a0ccbd1	dragonball: Allow test-mock feature Clippy fails with: ``` warning: unexpected `cfg` condition value: `test-mock` --> /root/go/src/github.com/kata-containers/kata-containers/src/dragonball/src/dbs_pci/src/vfio.rs:1929:17 \| 1929 \| #[cfg(all(test, feature = "test-mock"))] \| ^^^^^^^^^^^^^^^^^^^^^ help: remove the condition \| = note: no expected values for `feature` = help: consider adding `test-mock` as a feature in `Cargo.toml` ``` So add it as an expected cfg in the linter to skip this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	bddaea6df1	runtime-rs: Allow enable-vendor feature Clippy fails with: ``` error: unexpected `cfg` condition value: `enable-vendor` --> crates/hypervisor/src/device/driver/vfio.rs:180:11 \| 180 \| #[cfg(feature = "enable-vendor")] \| ^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: expected values for `feature` are: `ch-config`, `cloud-hypervisor`, `default`, and `dragonball` = help: consider adding `enable-vendor` as a feature in `Cargo.toml` ``` So add it as an expected cfg in the linter to skip this Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	bed128164a	runtime-rs: Allow unexpected config Clippy fails with: ``` error: unexpected `cfg` condition value: `enable-vendor` --> crates/hypervisor/src/device/driver/vfio.rs:180:11 \| 180 \| #[cfg(feature = "enable-vendor")] \| ^^^^^^^^^^^^^^^^^^^^^^^^^ \| = note: expected values for `feature` are: `ch-config`, `cloud-hypervisor`, `default`, and `dragonball` = help: consider adding `enable-vendor` as a feature in `Cargo.toml` ``` allow this until we can check this behaviour with @Apokleos Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	53bcb0b108	runtime-rs: Fix for-loops-over-fallibles Clippy complains about: ``` error: for loop over a `&Result`. This is more readably written as an `if let` statement --> crates/hypervisor/src/firecracker/fc_api.rs:99:22 \| 99 \| for param in &kernel_params.to_string() { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	c332a91ef8	runtime-rs: Fix doc list item missing indentation Add the extra space to format the list correctly Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	fe98d49a29	runtime-rs: Remove direct implementation of ToString Fix clippy error: ``` direct implementation of `ToString` ``` by switching to implement Display instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:02 +00:00
stevenhorsman	730c56af2a	runtime-rs: Fix clippy::unnecessary-get-then-check Clippy errors with: ``` error: unnecessary use of `get(&id).is_none()` --> crates/hypervisor/src/device/device_manager.rs:494:29 \| 494 \| if self.devices.get(&id).is_none() { \| -------------^^^^^^^^^^^^^^^^^^ \| \| \| help: replace it with: `!self.devices.contains_key(&id)` ``` so fix this as suggested Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	a9358b59b7	runtime-rs: Allow unused enum field Clippy errors with: ``` error: field `0` is never read --> crates/hypervisor/src/qemu/cmdline_generator.rs:375:25 \| 375 \| DeviceAlreadyExists(String), // Error when trying to add an existing device \| ------------------- ^^^^^^ ``` but this is used when creating the error later, so add an allow to ignore this warning Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	1d9efeb92b	runtime-rs: Remove use of legacy constants Fix clippy error ``` error: usage of a legacy numeric constant ``` by swapping `std::u8::MAX` for `u8::MAX` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	225c7fc026	kata-ctl: Allow unused enum field Clippy errors with: ``` error: field `0` is never read ``` but the field is required for the `map_err`, so ignore this error for now to avoid too much disruption Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	f1d3450d1f	runtime-rs: Remove unused config `gdb` is only activated by a feature `guest_debug` that doesn't exist, so remove this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	1e90fc38de	dragonball: Fix incorrect reference There were references to `config_manager::DeviceInfoGroup` which doesn't exist, so I guess it means `DeviceConfigInfo` instead, so update them Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	f389b05f20	dragonball: Fix doc formatting issue Clippy errors with: ``` error: doc list item missing indentation ``` which I think is because the Return is between two list items, so add a blank line to separate this into a separate paragraph Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	8bea57326a	dragonball: Fix thread_local initializer error Clippy errors with: ``` error: initializer for `thread_local` value can be made `const` ``` so update as suggested Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	7257ee0397	agent: Remove implementation of ToString Fix clippy error: ``` direct implementation of `ToString` ``` by switching to implement Display instead Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	ca87aca1a6	agent: Remove use of legacy constants Fix clippy error ``` error: usage of a legacy numeric constant ``` by swapping `std::i32::<MIN/MAX>` for `i32::<MIN/MAX>` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	6008fd56a1	agent: Fix clippy error ``` error: file opened with `create`, but `truncate` behavior not defined ``` `truncate(true)` ensures the file is entirely overwritten with new data which I believe is the behaviour we want Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	a640bb86ec	agent: cdh: Remove unnecessary borrows Fix clippy error: ``` error: the borrowed expression implements the required traits ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	a131eec5c1	agent: config: Remove supports_seccomp supports_seccomp is never used, so throws a clippy error Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	0bd36a63d9	agent: Fix clippy error ``` error: bound is defined in more than one place ``` Move Sized into the later definition of `R` & `W` rather than defining them in two places Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	7709198c3b	rustjail: Fix clippy error ``` error: file opened with `create`, but `truncate` behavior not defined ``` `truncate(true)` ensures the file is entirely overwritten with new data which I believe is the behaviour we want Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
Fabiano Fidêncio	b4de302cb2	genpolicy: Adjust to build with rust 1.80.0 ``` error: field `image` is never read --> src/registry.rs:35:9 \| 34 \| pub struct Container { \| --------- field in this struct 35 \| pub image: String, \| ^^^^^ \| = note: `Container` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis = note: `-D dead-code` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(dead_code)]` error: field `use_cache` is never read --> src/utils.rs:106:9 \| 105 \| pub struct Config { \| ------ field in this struct 106 \| pub use_cache: bool, \| ^^^^^^^^^ \| = note: `Config` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis error: could not compile `genpolicy` (bin "genpolicy") due to 2 previous errors ``` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	099b241702	powerpc64: Add target_endian = "little" Based on comments from @Amulyam24 we need to use the `target_endian = "little"` as well as target_arch = "powerpc64" to ensure we are working on powerpc64le. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:45:01 +00:00
stevenhorsman	4c006c707a	build: Fix powerpc64le target_arch Starting with version 1.80, the Rust linter does not accept an invalid value for `target_arch` in configuration checks: ``` Compiling kata-sys-util v0.1.0 (/home/ddd/Work/kata/kata-containers/src/libs/kata-sys-util) error: unexpected `cfg` condition value: `powerpc64le` --> /home/ddd/Work/kata/kata-containers/src/libs/kata-sys-util/src/protection.rs:17:34 \| 17 \| #[cfg(any(target_arch = "s390x", target_arch = "powerpc64le"))] \| ^^^^^^^^^^^^^^------------- \| \| \| help: there is a expected value with a similar name: `"powerpc64"` \| = note: expected values for `target_arch` are: `aarch64`, `arm`, `arm64ec`, `avr`, `bpf`, `csky`, `hexagon`, `loongarch64`, `m68k`, `mips`, `mips32r6`, `mips64`, `mips64r6`, `msp430`, `nvptx64`, `powerpc`, `powerpc64`, `riscv32`, `riscv64`, `s390x`, `sparc`, `sparc64`, `wasm32`, `wasm64`, `x86`, and `x86_64` = note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration = note: `-D unexpected-cfgs` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(unexpected_cfgs)]` ``` According [to GitHub user @Urgau][explain], this is a new warning introduced in Rust 1.80, but the problem existed before. The correct architecture name should be `powerpc64`, and the differentiation between `powerpc64le` and `powerpc64` should use the `target_endian = "little"` check. [explain]: #10072 (comment) Fixes: #10067 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com> [emlima: fix some more occurrences and typos] Signed-off-by: Emanuel Lima <emlima@redhat.com> [stevenhorsman: fix some more occurrences and typos] Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-05 14:20:47 +00:00
Zvonko Kaiser	429b2654f4	Merge pull request #10812 from zvonkok/fix-arch-build-gpu gpu: Fix arm64 build	2025-02-04 17:03:37 -05:00
Dan Mihai	3fc170788d	Merge pull request #10811 from microsoft/cameronbaird/hyp-loglevel-upstream CLH: config: add hypervisor_loglevel	2025-02-04 11:59:21 -08:00
Zvonko Kaiser	eeacd8fd74	gpu: Adapt rootfs build for multi-arch Add aarch64 and x86_64 handling. Especially build the Rust dependency with the correct rust musl target. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-02-04 16:44:21 +00:00
Steve Horsman	9060904c4f	Merge pull request #10826 from kata-containers/topic/crio-test-timeouts workflows: Add delete kata-deploy timeouts for crio tests	2025-02-04 13:09:49 +00:00
Markus Rudy	937fd90779	agent: clear log pipes if denied by policy Container logs are forwarded to the agent through a unix pipe. These pipes have limited capacity and block the writer when full. If reading logs is blocked by policy, a common setup for confidential containers, the pipes fill up and eventually block the container. This commit changes the implementation of ReadStream such that it returns empty log messages instead of a policy failure (in case reading log messages is forbidden by policy). As long as the runtime does not encounter a failure, it keeps pulling logs periodically. In turn, this triggers the agent to flush the pipes. Fixes: #10680 Co-Authored-By: Aurélien Bombo <abombo@microsoft.com> Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-02-04 13:17:29 +01:00
Ruoqing He	8e073a6715	ci: Update yq to v4.44.5 to support riscv64 In v4.44.5 of `yq`, artifacts for riscv64 are released. Update the version used for `yq` and enable `install_yq.sh` to work on riscv64. Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-02-04 19:36:34 +08:00
Zvonko Kaiser	95c63f4982	Merge pull request #10827 from stevenhorsman/bump-golang-1.22.11 versions: Bump golang version	2025-02-03 16:06:56 -05:00
Zvonko Kaiser	7dc8060051	Merge pull request #10828 from stevenhorsman/fix-versions-comments versions: Fix formatting	2025-02-03 16:06:37 -05:00
stevenhorsman	546e3ae9ea	versions: Fix formatting The static_checks_versions test uses yamllint which fails with: ``` [comments] too few spaces before comment ``` many times and so makes code reviews more annoying with all these extra messages. Although it's probably not the worst issue, I checked the [yaml spec](https://yaml.org/spec/1.2.2/#66-comments) and it does say > Comments must be separated from other tokens by white space characters so it's easiest to fix it and move on. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-03 17:08:25 +00:00
Zvonko Kaiser	122ad95da6	Merge pull request #10751 from ryansavino/snp-upstream-host-kernel-support snp: update kata to use latest upstream packages for snp	2025-02-03 11:20:59 -05:00
stevenhorsman	d9eb1b0e06	versions: Bump golang version Bump golang versions so we are more up-to-date and have the extra security fixes Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-03 15:28:53 +00:00
stevenhorsman	5203158195	workflows: Add delete kata-deploy timeouts for crio tests I've also seen cases (the qemu, crio, k0s tests) where Delete kata-deploy is still running for this test after 2 hours, and had to be manually cancelled, so let's try adding a 5m timeout to the kata-deploy delete to stop CI jobs hanging. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-02-03 11:45:43 +00:00
Greg Kurz	a806d74ce3	Merge pull request #10807 from kata-containers/dependabot/go_modules/src/tools/csi-kata-directvolume/go_modules-8d4d0c168c build(deps): bump github.com/golang/glog from 1.2.0 to 1.2.4 in /src/tools/csi-kata-directvolume in the go_modules group across 1 directory	2025-02-01 08:29:44 +01:00
Cameron Baird	b6b0addd5e	config: add hypervisor_loglevel Implement HypervisorLoglevel config option for clh. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2025-01-31 18:37:03 +00:00
Steve Horsman	41f23f1d2a	Merge pull request #10823 from stevenhorsman/fix-virtiofsd-build-error packaging: virtiofsd: Allow building a specific commit	2025-01-31 16:18:02 +00:00
stevenhorsman	1cf1a332a5	packaging: virtiofsd: Allow building a specific commit #10714 added support for building a specific commit, but due to the clone only having `--depth=1`, we can only reset to a commit if it's the latest on the `main` branch, otherwise we will get: ``` + git clone --depth 1 --branch main https://gitlab.com/virtio-fs/virtiofsd virtiofsd Cloning into 'virtiofsd'... warning: redirecting to https://gitlab.com/virtio-fs/virtiofsd.git/ + pushd virtiofsd + git reset --hard cecc61bca981ab42aae6ec490dfd59965e79025e ... fatal: Could not parse object 'cecc61bca981ab42aae6ec490dfd59965e79025e'. ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-31 11:24:23 +00:00
Greg Kurz	0215d958da	Merge pull request #10805 from balintTobik/egrep_removal egrep/fgrep removal	2025-01-30 18:26:59 +01:00
Hyounggyu Choi	530fedd188	Merge pull request #10767 from BbolroC/enable-coldplug-vfio-ap-s390x Enable VFIO-AP coldplug for s390x	2025-01-30 12:11:00 +01:00
Balint Tobik	1943a1c96d	tests: replace egrep with grep -E to avoid deprecation warning https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-29 11:26:27 +01:00
Balint Tobik	47140357c4	docs: replace egrep/fgrep with grep -E/-F to avoid deprecation warning https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-29 11:25:54 +01:00
Ryan Savino	90e2b7d1bc	docs: updated build and host setup instructions for SNP Referenced AMD developer page for latest SEV firmware. Instructions to point to upstream 6.11 kernel or later. Referenced sev-utils and AMDESE fork for kernel setup. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-01-28 18:09:40 -06:00
Ryan Savino	c1ca49a66c	snp: set snp to use upstream qemu in config use upstream qemu in snp and nvidia snp configs. load ovmf with bios flag on qemu cmdline instead of file. Fixes: #10750 Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-01-28 18:09:40 -06:00
Ryan Savino	af235fc576	Revert "builds: ovmf: Workaround Zeex repo becoming private" This reverts commit `aff3d98ddd`.	2025-01-28 18:09:40 -06:00
Ryan Savino	bb7ca954c7	ovmf: upgrade standard and sev ovmf ovmf upgraded to latest tag for standard and sev. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-01-28 18:09:40 -06:00
Ryan Savino	e87231edc7	snp: remove snp certs on qemu cmdline snp standard attestation with the upstream kernel and qemu does not support extended attestation with certs. Fixes: #10750 Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2025-01-28 18:09:40 -06:00
Zvonko Kaiser	f9bbe4e439	Merge pull request #10785 from zvonkok/agent-cgv2-activate agent: Add proper activation param handling to activate cgroupV2	2025-01-28 14:21:15 -05:00
dependabot[bot]	df5eafd2a1	build(deps): bump github.com/golang/glog Bumps the go_modules group with 1 update in the /src/tools/csi-kata-directvolume directory: [github.com/golang/glog](https://github.com/golang/glog). Updates `github.com/golang/glog` from 1.2.0 to 1.2.4 - [Release notes](https://github.com/golang/glog/releases) - [Commits](https://github.com/golang/glog/compare/v1.2.0...v1.2.4) --- updated-dependencies: - dependency-name: github.com/golang/glog dependency-type: direct:production dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2025-01-28 17:38:14 +00:00
Fabiano Fidêncio	5e00a24145	Merge pull request #10749 from zvonkok/pass-through-stack gpu: Add driver version selection	2025-01-28 16:24:16 +01:00
Hyounggyu Choi	dde627cef4	test: Run full set of zcrypttest for VFIO-AP coldplug Previously, the test for VFIO-AP coldplug only checked whether a passthrough device was attached to the VM guest. This commit expands the test to include a full set of zcrypttest to verify that the device functions properly within a container. Additionally, since containerd has been upgraded to v1.7.25 on the test machine, it is no longer necessary to run the test via crictl. The commit removes all related code and files. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Hyounggyu Choi	47db9b3773	agent: Run check_ap_device() for VFIO-AP coldplug This commit updates the device handler to call check_ap_device() instead of wait_for_ap_device() for VFIO-AP coldplug. The handler now returns a SpecUpdate for passthrough devices if the device is online (e.g., `/sys/devices/ap/card05/05.001f/online` is set to 1). Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Hyounggyu Choi	200cbfd0b0	kata-types: Introduce new type `vfio-ap-cold` for VFIO-AP coldplug This newly introduced type will be used by the VFIO-AP device handler on the agent. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Hyounggyu Choi	4a6ba534f1	runtime: Introduce new gRPC device type for VFIO-AP coldplug This commit introduces a new gRPC device type, `vfio-ap-cold`, to support VFIO-AP coldplug. This enables the VM guest to handle passthrough devices differently from VFIO-AP hotplug. With this new type, the guest no longer needs to wait for events (e.g., device addition) because the device already exists at the time the device type is checked. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Hyounggyu Choi	419b5ed715	runtime: Add DeviceInfo to Container for VFIO coldplug configuration Even though ociSpec.Linux.Devices is preserved when vfio_mode is VFIO, it has not been updated correctly for coldplug scenarios. This happens because the device info passed to the agent via CreateContainerRequest is dropped by the Kata runtime. This commit ensures that the device info is added to the sandbox's device manager when vfio_mode is VFIO and coldPlugVFIO is true (e.g., vfio-ap-cold), allowing ociSpec.Linux.Devices to be properly updated with the device information before the container is created on the guest. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Balint Tobik	233d15452b	runtime: replace egrep with grep -E to avoid deprecation warning https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-28 10:46:44 +01:00
Balint Tobik	e657f58cf9	ci: replace egrep with grep -E to avoid deprecation warning https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-28 10:46:44 +01:00
Zvonko Kaiser	9f2799ba4f	Merge pull request #10790 from JakubLedworowski/add-xattr-to-confidential-kernel kernel: Add CONFIG_TMPFS_XATTR to tdx.conf	2025-01-27 13:47:08 -05:00
Zvonko Kaiser	d2528ef84f	gpu: Initialize unbound variables rootfs.sh Since we're importing some build scripts for nvidia and we're setting set -u, we have some unbound variables in rootfs.sh. Add initialization for those. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 18:37:21 +00:00
Zvonko Kaiser	9162103f85	agent: Update macro for e.g. String type Stack-only types are handled properly with the parse_cmdline_param macro, but advanced types like String couldn't be guarded by a guard function since it passed the variable by value rather than by reference. Now we can have guard functions for the String type: parse_cmdline_param!( param, CGROUP_NO_V1, config.cgroup_no_v1, get_string_value, |no_v1| no_v1 == "all" ); Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:43 +00:00
Zvonko Kaiser	aab9d36e47	agent: Add tests for cgroup_no_v1 The only valid value is "all"; ignore all others. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:43 +00:00
Zvonko Kaiser	e1596f7abf	agent: Add option to parse cgroup_no_v1 For AGENT_INIT=yes we do not run systemd and hence systemd.unified_... does not mean anything to other init systems. Providing cgroup_no_v1=all is enough to signal other init systems to use cgroupV2. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:43 +00:00
Zvonko Kaiser	cd7001612a	gpu: rootfs adjust for AGENT_INIT=no Since we're defaulting to AGENT_INIT=no for all the initrd/images adapt the NV build to properly get kata-agent installed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:21 +00:00
Zvonko Kaiser	10974b7bec	gpu: AGENT_INIT=no We're setting globally for each initrd and image AGENT_INIT=no Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:21 +00:00
Zvonko Kaiser	98e0dc1676	gpu: Add set -u to scripts Make the scripts more robust by failing on unset variables. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:21 +00:00
Zvonko Kaiser	f153229865	gpu: Add driver version selection Besides latest and lts options add an option to specify the exact driver version. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-27 17:56:21 +00:00
Steve Horsman	311c3638c6	Merge pull request #10794 from fidencio/topic/bump-ubuntu-version-for-the-confidential-rootfs-and-initrd versions: Bump Ubuntu base image & initrd	2025-01-27 15:55:16 +00:00
Fabiano Fidêncio	84b0ca1b18	versions: Bump Ubuntu rootfs / initrd versions While I wish we could be bumping to the very same version everywhere, it's not possible and it's been quite a ride to get a combination of things that work. Let me try to describe my approach here: * Do NOT stay on 20.04 * This version will be EOL'ed by April * This version has a very old version of systemd that causes a bug when trying to online the cpusets for guests using systemd as init, which then breaks the qemu-coco-non-tee and TDX non-attestation set of tests * Bump to 22.04 when possible * This was possible for the majority of the cases, but not for the confidential initrd & confidential images for x86_64, the reason being failures on AMD SEV CI (which I didn't debug), and a kernel panic on the CentOS 9 Stream TDX machine * 22.04 is being used instead of 24.04 as multistrap is simply broken on Ubuntu 24.04, and I'd prefer to stay on an LTS release whenever it's possible * Bump to 24.10 for x86_64 image confidential * This was done as we got everything working with 24.10 in the CI. * This requires using libtdx-attest from noble (Ubuntu 24.04), as Intel only releases their sgx stuff for LTS releases. * Stick to 20.04 for x86_64 initrd confidential * 24.10 caused a panic on their CI * This is only being used by AMD so far, so they can decide when to bump, after doing the proper testing & debug that the bump will work as expected for them Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 15:08:20 +01:00
Carlos Segarra	b6e0effc06	tdx: bump version of libtdx-attest in rootfs-builder Bump libtdx-attest to its 1.22 release. Signed-off-by: Carlos Segarra <carlos@carlossegarra.com>	2025-01-27 15:08:20 +01:00
Fabiano Fidêncio	2b5dbfacb8	osbuilder: ubuntu: Try to install pyinstaller using --break-system-packages We first try without passing the `--break-system-packages` argument, as that's not supported on Ubuntu 22.04 or older, but that's required on Ubuntu 24.04 or newer. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 15:08:20 +01:00
Fabiano Fidêncio	c54f78bc6b	local-build: cache: Consider os name & version for image/initrd Otherwise a bump in the os name and / or os version would lead to the CI using a cached artefact. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 15:08:20 +01:00
Fabiano Fidêncio	4a66acc6f5	osbuilder: ubuntu: Abort if multistrap fails (but not on 20.04) We have gotten Ubuntu 20.04 working pretty much "by luck", as multistrap fails the deployment, and then a hacky function was introduced to add the proper dbus links. However, this does not scale at all, and we should: * Fail if multistrap fails * I won't do this for Ubuntu 20.04 as it's working for now and soon enough it'll be EOL * Add better logging to ensure someone can know when multistrap fails Below you can find the failure that we're hitting on Ubuntu 20.04: ```sh Errors were encountered while processing: dbus ERR: dpkg configure reported an error. Native mode configuration reported an error! I: Tidying up apt cache and list data. Multistrap system reported 1 error in /rootfs/. I: Tidying up apt cache and list data. ``` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 15:08:16 +01:00
Fabiano Fidêncio	585f82f730	osbuilder: ubuntu: Ensure OS_VERSION is passed & used Right now we're hitting an interesting situation with osbuilder, where regardless of what's being passed Ubuntu 20.04 (focal) is being used when building the rootfs-image, as shown in the snippets of the logs below: ``` ffidenci@tatu:~/src/upstream/kata-containers/kata-containers$ make rootfs-image-confidential-tarball /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-copy-libseccomp-installer.sh "agent" make agent-tarball-build ... make pause-image-tarball-build ... make coco-guest-components-tarball-build ... make kernel-confidential-tarball-build ... make rootfs-image-confidential-tarball-build make[1]: Entering directory '/home/ffidenci/src/upstream/kata-containers/kata-containers' /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-binaries-in-docker.sh --build=rootfs-image-confidential sha256:f16c57890b0e85f6e1bbe1957926822495063bc6082a83e6ab7f7f13cabeeb93 Build kata version 3.13.0: rootfs-image-confidential INFO: DESTDIR /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/destdir INFO: Create image build image ~/src/upstream/kata-containers/kata-containers/tools/osbuilder ~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/builddir INFO: Build image INFO: image os: ubuntu INFO: image os version: latest Creating rootfs for ubuntu /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs.sh -o 3.13.0-13f0807e9f5687d8e5e9a0f4a0a8bb57ca50d00c-dirty -r /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/builddir/rootfs-image/ubuntu_rootfs ubuntu INFO: rootfs_lib.sh file found. 
Loading content ~/src/upstream/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/ubuntu ~/src/upstream/kata-containers/kata-containers/tools/osbuilder ~/src/upstream/kata-containers/kata-containers/tools/osbuilder INFO: rootfs_lib.sh file found. Loading content INFO: build directly WARNING: apt does not have a stable CLI interface. Use with caution in scripts. Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [128 kB] Get:2 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB] Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [128 kB] Get:4 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [4276 kB] Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [128 kB] Get:6 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB] Get:7 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1297 kB] Get:8 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [30.9 kB] Get:9 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [4187 kB] Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB] Get:11 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB] Get:12 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB] Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [4663 kB] Get:14 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1589 kB] Get:15 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [34.6 kB] Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [4463 kB] Get:17 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB] Get:18 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [28.6 kB] Fetched 34.1 MB in 5s (6284 kB/s) ... ``` The reason this is happening is due to a few issues in different places: 1. 
IMG_OS_VERSION, passed to osbuilder, is not used anywhere and OS_VERSION should be used instead. And we should break if OS_VERSION is not properly passed down 2. Using UBUNTU_CODENAME is simply wrong, as it'll use whatever comes as the base container from kata-deploy's local-build scripts, and it has just been working by luck Note that while this commit fixes the wrong behaviour, it would break the rootfs builds as they are, thus we need to set versions.yaml to use 20.04 where it was already using 20.04 even without us knowing. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 14:19:42 +01:00
Fabiano Fidêncio	02a18c1359	versions: Clarify which release matches a codename It'll make the life of the developers not so familiar with Ubuntu easier. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 14:19:42 +01:00
Fabiano Fidêncio	ca96a6ac76	versions: Use Ubuntu codename instead of versions As this is required as part of the osbuilder tool to be able to properly set the repositories used when building the rootfs. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 14:19:39 +01:00
Fabiano Fidêncio	353ceb948e	versions: Don't use the yaml variable definitions While having variables is nice, those are more verbose to write down, and actually confusing for tired developer eyes to read, plus we're mixing the use of the yaml variables here and there together with not using them for some architectures. With the best "all or nothing" spirit, let's just make it easier for our developers to read the versions.yaml and easily understand what's being used. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-27 14:19:08 +01:00
Jakub Ledworowski	42531cf6c4	kernel: Add CONFIG_TMPFS_XATTR to confidential kernel During pull inside the guest, overlayfs expects xattrs. Fixes: [guest-components#876](https://github.com/confidential-containers/guest-components/issues/876) Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>	2025-01-27 07:07:54 +01:00
Zvonko Kaiser	b4c710576e	Merge pull request #10782 from stevenhorsman/clh-metrics-write-update metrics: Increase minval range for blogbench test	2025-01-24 10:21:20 -05:00
Steve Horsman	54e7e1fdc3	Merge pull request #10768 from kata-containers/dependabot/go_modules/src/runtime/go_modules-28d0d344dd build(deps): bump the go_modules group across 3 directories with 1 update	2025-01-24 12:04:56 +00:00
Greg Kurz	17f3eb0579	Merge pull request #10766 from balintTobik/remove_shebang Remove shebang in non-executable completion script	2025-01-24 12:29:03 +01:00
Alex Lyn	ee635293c6	Merge pull request #10740 from RuoqingHe/virtiofsd-riscv64 virtiofsd: Enable build for RISC-V	2025-01-24 15:43:56 +08:00
Zvonko Kaiser	f5c509d58e	Merge pull request #10779 from kata-containers/topic/arm64-static-build-runner workflows: Move arm static checks runner	2025-01-23 22:29:16 -05:00
Fabiano Fidêncio	4bc978416c	Merge pull request #10720 from fidencio/topic/test-cgroupsv2-on-guest kernel: Ensure no cgroupsv1 is used	2025-01-23 21:26:49 +01:00
Aurélien Bombo	66d292bdb4	Merge pull request #10732 from microsoft/danmihai/minor-systemd-cleanup rootfs: minor systemd file deletion cleanup	2025-01-23 11:29:25 -06:00
Fabiano Fidêncio	b47cc6fffe	cri-containerd: Skip TestDeviceCgroup till it's adapted to cgroupsv2 As the devices controller works in a different way in cgroupsv2, the "/sys/fs/cgroup/devices/devices.list" file simply doesn't exist. For now, let's skip the test till the test maintainer decides to re-enable it for cgroupsv2. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
Fabiano Fidêncio	0626d7182a	tests: k8s-cpu-ns: Adapt to cgroupsv2 The changes done are: * cpu/cpu.shares was replaced by cpu.weight * The weight, according to our reference[0], is calculated by: weight = (1 + ((request - 2) * 9999) / 262142) * cpu/cpu.cfs_quota_us & cpu/cpu.cfs_period_us were replaced by cpu.max, where quota and period are written together (in this order) [0]: https://github.com/containers/crun/blob/main/crun.1.md#cgroup-v2 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
Fabiano Fidêncio	4307f0c998	Revert "ci: mariner: Ensure kernel_params can be set" This reverts commit `091ad2a1b2`, in order to ensure tests would be running with cgroupsv2 on the guest. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
Fabiano Fidêncio	c653719270	kernel: Ensure no cgroupsv1 is used Let's ensure that we're fully running the guest on cgroupsv2. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
stevenhorsman	d031e479ab	metrics: Increase minval range for blogbench test In the last couple of days I've seen the blogbench metrics write latency test on clh fail a few times because the latency was too low, so adjust the minimum range to tolerate quicker finishes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-23 15:58:31 +00:00
Fabiano Fidêncio	66d881a5da	Merge pull request #10755 from fidencio/topic/ensure-systemd-is-used-as-init-for-coco-cases rootfs-confidential: Ensure systemd is used as init	2025-01-23 15:25:24 +01:00
stevenhorsman	3acce82c91	ci: Update gatekeeper tests for static workflow The static-checks targets are `pull_request`, so they can run the PR workflow version, so we want to update the required-tests.yaml so that static-check workflow changes do trigger static checks in order to test them properly. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-23 14:23:09 +00:00
stevenhorsman	d625f20d18	workflows: Move arm static checks runner Now we have the build-assets running on the gh-hosted runners, try the same approach for the static-checks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-23 14:23:09 +00:00
Zvonko Kaiser	a23d6a1241	Merge pull request #10777 from zvonkok/arm64-nvidia-gpu-kernel gpu: Fix arm64 kernel build	2025-01-23 07:14:30 -05:00
Christophe de Dinechin	9a92a4bacf	cli: Remove shebang in non-executable completion script Raised during package review [1] by rpmlint [1] https://bugzilla.redhat.com/show_bug.cgi?id=1590425#c8 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com> Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-23 13:11:25 +01:00
Fabiano Fidêncio	734ef71cf7	tests: k8s: confidential: Cleanup $HOME/.ssh/known_hosts I've noticed the following error when running the tests with SEV: ``` 2025-01-21T17:10:28.7999896Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2025-01-21T17:10:28.8000614Z # @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ 2025-01-21T17:10:28.8001217Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2025-01-21T17:10:28.8001857Z # IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! 2025-01-21T17:10:28.8003009Z # Someone could be eavesdropping on you right now (man-in-the-middle attack)! 2025-01-21T17:10:28.8003348Z # It is also possible that a host key has just been changed. 2025-01-21T17:10:28.8004422Z # The fingerprint for the ED25519 key sent by the remote host is 2025-01-21T17:10:28.8005019Z # SHA256:x7wF8zI+LLyiwphzmUhqY12lrGY4gs5qNCD81f1Cn1E. 2025-01-21T17:10:28.8005459Z # Please contact your system administrator. 2025-01-21T17:10:28.8006734Z # Add correct host key in /home/kata/.ssh/known_hosts to get rid of this message. 2025-01-21T17:10:28.8007031Z # Offending ED25519 key in /home/kata/.ssh/known_hosts:178 2025-01-21T17:10:28.8007254Z # remove with: 2025-01-21T17:10:28.8008172Z # ssh-keygen -f "/home/kata/.ssh/known_hosts" -R "10.244.0.71" ``` And this was causing a failure to ssh into the confidential pod. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 12:04:13 +01:00
Fabiano Fidêncio	18137b1583	tests: k8s: confidential: Increase log_buf_len to 4M Relying on dmesg is really not ideal, as we may lose important info, mainly those which happen very early in the boot, depending on the size of kernel ring buffer. So, for this specific test, let's increase the kernel ring buffer, by default, to 4M. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 12:04:13 +01:00
Fabiano Fidêncio	d5f907dcf1	rootfs-confidential: Ensure systemd is used as init Let's make sure that we don't use Kata Containers' agent as init for the Confidential related rootfses, as we don't want to increase the agent's complexity for no reason ... mainly when we can rely on a proper init system. : - images already used systemd as init - initrds are now using systemd as init Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 12:04:13 +01:00
dependabot[bot]	d2cb14cdbc	build(deps): bump the go_modules group across 3 directories with 1 update Bumps the go_modules group with 1 update in the /src/runtime directory: [golang.org/x/net](https://github.com/golang/net). Bumps the go_modules group with 1 update in the /src/tools/csi-kata-directvolume directory: [golang.org/x/net](https://github.com/golang/net). Bumps the go_modules group with 1 update in the /tools/testing/kata-webhook directory: [golang.org/x/net](https://github.com/golang/net). Updates `golang.org/x/net` from 0.25.0 to 0.33.0 - [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0) Updates `golang.org/x/net` from 0.23.0 to 0.33.0 - [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0) Updates `golang.org/x/net` from 0.23.0 to 0.33.0 - [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: direct:production dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2025-01-23 10:18:22 +00:00
Fupan Li	944eb2cf3f	Merge pull request #10762 from teawater/remove_enable_swap libs/kata-types: Remove config enable_swap	2025-01-23 14:03:42 +08:00
Fupan Li	ebd8ec227b	Merge pull request #10778 from zvonkok/kata-agent-cgroupsV2 agent: Ensure proper cgroupsV2 handling with init_mode=true	2025-01-23 14:00:13 +08:00
Zvonko Kaiser	afd286f6d6	agent: Ensure proper cgroupsV2 with init_mode=yes When the agent is run as the init process, cgroupfs is being set up. In the case of cgroupsV1 we needed to enable the memory hierarchy; this is now enabled by default in cgroupsV2. Additionally, the file /sys/fs/cgroup/memory/memory.use_hierarchy isn't even available with V2. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-23 03:54:51 +00:00
Fabiano Fidêncio	3f8abb4da7	Merge pull request #10776 from kata-containers/topic/arm64-runners workflows: Switch to github-hosted arm runners	2025-01-22 23:14:28 +01:00
Zvonko Kaiser	91c6d524f8	gpu: Fix arm64 kernel build CONFIG_IOASID (not configurable) in newer kernels. Removing it. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-22 18:15:57 +00:00
Fabiano Fidêncio	6baa60d77d	Merge pull request #10775 from fidencio/topic/update-ttrpc-crate agent: Update ttrpc to include the fix for connectivity issues	2025-01-22 17:45:38 +01:00
stevenhorsman	ab27e11d31	workflows: Switch to github-hosted arm runner Now that GitHub has hosted arm runners https://github.blog/changelog/2025-01-16-linux-arm64-hosted-runners-now-available-for-free-in-public-repositories-public-preview/ we should try and switch our arm64 builder jobs to run on these. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-22 16:27:17 +00:00
Greg Kurz	90b6d5725b	Merge pull request #10773 from RuoqingHe/retry-on-aks-throttle ci: Retry on failure of Create AKS cluster	2025-01-22 15:30:57 +01:00
Ruoqing He	373a388844	ci: Retry on failure of Create AKS cluster The `Create AKS cluster` step in `run-k8s-tests-on-aks.yaml` is likely to fail since we are trying to issue `PUT` to `aks` at a relatively high frequency, while the `aks` end has its limit on `bucket-size` and `refill-rate`, documented here [1]. Use `nick-fields/retry@v3` to retry 10 seconds after a request fails, based on observations that AKS requested 7 or 8 second delays before retry as part of its 429 response [1] https://learn.microsoft.com/en-us/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apis Fixes: #10772 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-22 13:24:51 +00:00
Fabiano Fidêncio	a8678a7794	deps: Update ttrpc to v0.8.4 Update the ttrpc crate to include the fix from Moritz Sanft, which solves the connectivity issues with 6.12.x kernels* *: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v6.12.9&id=3257813a3ae7462ac5cde04e120806f0c0776850 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-22 13:05:43 +01:00
Fabiano Fidêncio	e71bc1f068	Merge pull request #10770 from zvonkok/gpu_kernel_dep gpu: Add kernel dep for the non coco use-case	2025-01-22 12:53:39 +01:00
Greg Kurz	17d053f4bb	Merge pull request #10711 from teawater/balloon Add reclaim_guest_freed_memory config to qemu and cloud-hypervisor	2025-01-22 10:57:13 +01:00
Hui Zhu	c148b70da7	libs/kata-types: Remove config enable_swap Remove config enable_swap because no code uses it. Fixes: #10761 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-01-22 11:08:45 +08:00
Aurélien Bombo	4e9d1363b3	Merge pull request #10754 from sprt/sprt/ci-gh-pr-number-coco ci: Unify on `$GH_PR_NUMBER` environment variable	2025-01-21 15:07:24 -06:00
Zvonko Kaiser	4621f53e4a	gpu: Add kernel dep for the non coco use-case Add the kernel dependency to the non coco use-case so that a rootfs build can be executed via GHA. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-21 16:18:14 +00:00
Zvonko Kaiser	61c282c725	Merge pull request #10769 from kata-containers/revert-10764-gpu_ci_cd Revert "gpu: Add rootfs target amd64/arm64"	2025-01-21 11:09:52 -05:00
Zvonko Kaiser	9fd430e46b	Revert "gpu: Add rootfs target amd64/arm64" Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-21 16:08:30 +00:00
Zvonko Kaiser	ef1639b6bf	Merge pull request #10764 from zvonkok/gpu_ci_cd gpu: Add rootfs target amd64/arm64	2025-01-21 09:51:20 -05:00
Ruoqing He	7e76ef587a	virtiofsd: Enable build for RISC-V With this change, `virtiofsd` (gnu target) can be built and then used with other components. Depends: #10741 Fixes: #10739 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-01-21 18:05:37 +08:00
Hui Zhu	185b94b7fa	runtime-rs: Add reclaim_guest_freed_memory cloud-hypervisor support Add reclaim_guest_freed_memory config to cloud-hypervisor in runtime-rs. Fixes: #10710 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-01-21 10:34:21 +08:00
Hui Zhu	487171d992	runtime-rs: Add reclaim_guest_freed_memory qemu support Add reclaim_guest_freed_memory config to qemu in runtime-rs. Fixes: #10710 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-01-21 10:34:18 +08:00
Hui Zhu	8f550de88a	runtime-rs: db: Change config enable_balloon_f_reporting Change config enable_balloon_f_reporting of db to reclaim_guest_freed_memory. Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-01-21 10:34:08 +08:00
Hui Zhu	42f5ef9ff1	kernel: config: Add CONFIG_VIRTIO_BALLOON to virtio.conf Add CONFIG_VIRTIO_BALLOON to virtio.conf to enable virtio-balloon. Fixes: #10710 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2025-01-21 10:34:04 +08:00
Zvonko Kaiser	8b097244e7	gpu: Add rootfs initrd build for arm64 We need the arm64 builds as well for GH and GB systems. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-20 19:03:52 +00:00
Zvonko Kaiser	f525631522	gpu: Add rootfs target amd64 Adding the initrd build first to get the rootfs on amd64. With that we can start to add tests. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-20 19:01:42 +00:00
Zvonko Kaiser	d7059e9024	Merge pull request #10736 from zvonkok/gpu-rootfs-fix gpu: Fix rootfs build	2025-01-17 14:44:41 -05:00
Aurélien Bombo	0d70dc31c1	ci: Unify on $GH_PR_NUMBER environment variable While working on #10559, I realized that some parts of the codebase use $GH_PR_NUMBER, while other parts use $PR_NUMBER. Notably, in that PR, since I used $GH_PR_NUMBER for CoCo non-TEE tests without realizing that TEE tests use $PR_NUMBER, the tests on that PR fail on TEEs: https://github.com/kata-containers/kata-containers/actions/runs/12818127344/job/35744760351?pr=10559#step:10:45 ... 44 error: error parsing STDIN: error converting YAML to JSON: yaml: line 90: mapping values are not allowed in this context ... 135 image: ghcr.io/kata-containers/csi-kata-directvolume: ... So let's unify on $GH_PR_NUMBER so that this issue doesn't repro in the future: I replaced all instances of PR_NUMBER with GH_PR_NUMBER. Note that since some test scripts also refer to that variable, the CI for this PR will fail (would have also happened with the converse substitution), hence I'm not adding the ok-to-test label and we should force-merge this after review. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-01-17 10:53:08 -06:00
Fabiano Fidêncio	c018a1cc61	Merge pull request #10741 from RuoqingHe/update-virtiofsd-build-image virtiofsd: Update ubuntu to 22.04 for gnu target	2025-01-16 20:51:10 +01:00
Zvonko Kaiser	2777b13db7	Merge pull request #10742 from zvonkok/3.13.0-release release: Bump version to 3.13.0	2025-01-16 10:05:48 -05:00
Ruoqing He	c70195d629	virtiofsd: Update ubuntu to 22.04 for gnu target With ubuntu 20.04 image, virtiofsd gnu target couldn't be built due to "unsupported ISA subset z" reported by "cc". Updating to ubuntu 22.04 image addresses this problem. Relates: #10739 Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>	2025-01-16 17:27:38 +08:00
Zvonko Kaiser	e82fdee20f	runtime: Add proper IOMMUFD parsing With newer kernels we have a new backend for VFIO called IOMMUFD. This is a departure from VFIO IOMMU Groups since it has only one device associated with an IOMMUFD entry. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-15 23:39:33 +00:00
Zvonko Kaiser	f0bd83b073	gpu: Fix rootfs build pyinstaller is located by default under /usr/local/bin; some prior versions were installing it to ${HOME}. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-15 20:37:51 +00:00
Aurélien Bombo	0d93f59f5b	Merge pull request #10738 from microsoft/danmihai1/empty-pty-lines runtime: skip empty Guest console output lines	2025-01-15 10:33:24 -06:00
Zvonko Kaiser	0b04f43ac6	release: Bump version to 3.13.0 Bump VERSION and helm-chart versions Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2025-01-15 16:13:22 +00:00
Zvonko Kaiser	365def9b4a	Merge pull request #10735 from BbolroC/kubectl-create-retry-trusted-storage tests: Introduce retry_kubectl_apply() for trusted storage	2025-01-14 21:59:45 -05:00
Dan Mihai	2e21f51375	runtime: skip empty Guest console output lines Skip logging empty lines of text from the Guest console output, if there are any such lines. Without this change, the Guest console log from CLH + /dev/pts/0 has twice as many lines of text. Half of these lines are empty. Fixes: #10737 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-15 00:28:26 +00:00
Hyounggyu Choi	f7816e9206	tests: Introduce retry_kubectl_apply() for trusted storage On s390x, some tests for trusted storage occasionally failed due to: ```bash etcdserver: request timed out ``` or ```bash Internal error occurred: resource quota evaluation timed out ``` These timeouts were not observed previously on k3s but occur sporadically on kubeadm. Importantly, they appear to be temporary and transient, which means they can be ignored in most cases. To address this, we introduced a new wrapper function, `retry_kubectl_apply()`, for `kubectl create`. This function retries applying a given manifest up to 5 times if it fails due to a timeout. However, it will still catch and handle any other errors during pod creation. Fixes: #10651 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-14 21:15:44 +01:00
Fabiano Fidêncio	121ac0c5c0	Merge pull request #10727 from microsoft/danmihai1/mariner3-guest image: bump mariner guest version to 3.0	2025-01-14 19:06:28 +01:00
Fabiano Fidêncio	3658ea2320	Merge pull request #10731 from microsoft/danmihai1/quiet-rootfs-build rootfs: reduced console output by default	2025-01-14 19:02:42 +01:00
Chengyu Zhu	7d34ca4420	Merge pull request #10674 from bpradipt/fix-10398 agent: alternative implementation for sealed_secret as volume	2025-01-14 18:55:45 +08:00
Fabiano Fidêncio	4578969c5d	Merge pull request #10730 from BbolroC/bump-coco-trustee versions: Bump trustee to latest	2025-01-14 08:56:11 +01:00
Dan Mihai	c4da296326	rootfs: delete links to deleted files Delete symbolic links to files being deleted. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-13 21:28:44 +00:00
Dan Mihai	5b8471ffce	rootfs: print the path to files being deleted Show the list of files being deleted. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-13 21:28:34 +00:00
Dan Mihai	a49d0fb343	rootfs: delete systemd units/files from rootfs.sh Move the deletion of unnecessary systemd units and files from image_builder.sh into rootfs.sh. The files being deleted can be applicable to other image file formats too, not just to the rootfs-image format created by image_builder.sh. Also, image_builder.sh was deleting these files after it calculated the size of the rootfs files, thus missing out on the opportunity to possibly create a smaller image file. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-13 21:28:23 +00:00
Dan Mihai	0f522c09d9	rootfs: reduced console output by default Use "set -x" only when the user specified DEBUG=1. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-13 19:34:05 +00:00
Pradipta Banerjee	36580bb642	tests: Update sealed secret CI value to base64url The existing encoding was base64 and it fails due to `874948638a` Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2025-01-13 09:37:05 -05:00
Hyounggyu Choi	2cdb549a75	versions: Bump trustee to latest This update addresses an issue with token verification for SE and SNP introduced in the last update by #10541. Bumping the project to the latest commit resolves the issue. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-13 15:07:33 +01:00
Pradipta Banerjee	5218345e34	agent: alternative implementation for sealed_secret as volume The earlier implementation relied on using a specific mount-path prefix - `/sealed` to determine that the referenced secret is a sealed secret. However that was restrictive for certain use cases as it forced the user to always use a specific mountpath naming convention. This commit introduces an alternative implementation to relax the restriction. A sealed secret can be mounted in any mount-path. However it comes with a potential performance penalty. The implementation loops through all volume mounts and reads the file to determine if it's a sealed secret or not. Fixes: #10398 Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2025-01-11 12:36:44 -05:00
Dan Mihai	4707883b40	image: bump mariner guest version to 3.0 Use Mariner 3.0 (a.k.a., Azure Linux 3.0) as the Guest CI image. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-01-11 17:36:19 +00:00
Fabiano Fidêncio	2d9baf899a	Merge pull request #10719 from msanft/msanft/runtime/fix-boolean-opts runtime: use actual booleans for QMP `device_add` boolean options	2025-01-11 16:38:06 +01:00
Zvonko Kaiser	f08a9eac11	Merge pull request #10721 from stevenhorsman/more-metrics-latency-minimum-range-fixes metrics: Increase latency test range	2025-01-10 21:59:39 -05:00
Moritz Sanft	e5735b221c	runtime: use actual booleans for QMP `device_add` boolean options Since `be93fd5372`, which is included in QEMU since version 9.2.0, the options for the `device_add` QMP command need to be typed correctly. This makes it so that instead of `"on"`, the value is set to `true`, matching QEMU's expectations. This has been tested on QEMU 9.2.0 and QEMU 9.1.2, so before and after the change. The compatibility with incorrectly typed options for the `device_add` command is deprecated since version 6.2.0 [^1]. [^1]: https://qemu-project.gitlab.io/qemu/about/deprecated.html#incorrectly-typed-device-add-arguments-since-6-2 Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>	2025-01-10 11:53:56 +01:00
Wainer Moschetta	5fae2a9f91	Merge pull request #9871 from wainersm/fix-print_cluster_name tests/gha-run-k8s-common: shorten AKS cluster name	2025-01-09 14:35:02 -03:00
stevenhorsman	aaae5b6d0f	metrics: clh: Increase network-iperf3 range We hit a failure with: ``` time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]" ``` The range is very big, but in the last 3 test runs I reviewed we have got a minimum value of 0.015s and a max value of 0.052, so there is a ~350% difference possible so I think we need to have a wide range to make this stable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-09 11:25:57 +00:00
stevenhorsman	e946d9d5d3	metrics: qemu: Increase latency test range After the kernel version bump, in the latest nightly run https://github.com/kata-containers/kata-containers/actions/runs/12681309963/job/35345228400 The sequential read throughput result was 79.7% of the expected (so failed) and the sequential write was 84% of the expected, so was fairly close, so increase their minimum ranges to make them more robust. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-09 11:25:50 +00:00
Wainer dos Santos Moschetta	badc208e9a	tests/gha-run-k8s-common: shorten AKS cluster name The az client restricts the name to fewer than 64 characters, and in some cases (e.g. KATA_HYPERVISOR=qemu-runtime-rs) the generated name exceeds the limit. This changes the function to shorten the name: * a SHA1 is computed from the metadata and used to compose the cluster's name * the metadata is passed as plain text via --tags Fixes: #9850 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-01-08 16:39:07 -03:00
Fabiano Fidêncio	8f8988fcd1	Merge pull request #10714 from fidencio/topic/update-virtiofsd virtiofsd: Update to its v1.13.0 ( + one patch) release :-)	2025-01-08 17:59:29 +01:00
Fabiano Fidêncio	7e5e109255	Merge pull request #10541 from fitzthum/bump-trustee-010 Update Trustee and Guest Components	2025-01-08 17:44:13 +01:00
Fabiano Fidêncio	eb3fe0d27c	Merge pull request #10717 from fidencio/topic/re-enable-oom-test-for-mariner tests: Re-enable oom tests for mariner	2025-01-08 17:43:56 +01:00
Fabiano Fidêncio	65e267294b	Merge pull request #10718 from stevenhorsman/metrics-blogbench-latency-minimal-range-increase metrics: Increase latency minimum range	2025-01-08 17:09:36 +01:00
stevenhorsman	dc069d83b5	metrics: Increase latency test range The bump to kernel 6.12 seems to have reduced the latency in the metrics test, so increase the ranges for the minimal value, to account for this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-08 15:11:49 +00:00
Fabiano Fidêncio	967d5afb42	Revert "tests: k8s: Skip one of the empty-dir tests" This reverts commit `9aea7456fb`. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-08 14:07:34 +01:00
Fabiano Fidêncio	7ae2ca4c31	virtiofsd: Update to its v1.13.0 + one patch release Together with the bump, let's also bump the rust version needed to build the package, with the caveat that virtiofsd doesn't actually use a pinned version as part of their CI, so we're bumping to whatever is the version on `alpine:rust` (which is used in their CI). It's important to note that we're using a version which brings in one extra patch apart from the release, as the next virtiofsd release will happen at the end of February, 2025. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-08 14:07:34 +01:00
Fabiano Fidêncio	0af3536328	packaging: virtiofsd: Allow building a specific commit Right now we've been only building releases from virtiofsd, but we'll need to pin a specific commit till v1.14.0 is out, thus let's add the needed machinery to do so. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-08 14:07:34 +01:00
Tobin Feldman-Fitzthum	41c7f076fa	packaging: updating guest components build script The guest-components directory has been re-arranged slightly. Adjust the installation path of the LUKS helper script to account for this. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-01-07 16:59:10 -06:00
Tobin Feldman-Fitzthum	cafc7d6819	versions: update trustee and guest components Trustee has some new features including a plugin backend, support for PKCS11 resources, improvements to token verification, and adjustments to logging, and more. Also update guest-components to pickup improvements and keep the KBS protocol in sync. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2025-01-07 16:59:10 -06:00
Fabiano Fidêncio	53ac0f00c5	tests: Re-enable oom tests for mariner Since we bumped to the 6.12.x LTS kernel, we've also adjusted the aggressivity of the OOM test, which may be enough to allow us to re-enable it for mariner. Fixes: #8821 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-07 18:33:17 +01:00
Fabiano Fidêncio	f4a39e8c40	Merge pull request #10468 from fidencio/topic/early-tests-on-next-lts-kernel versions: Move kernel to the latest 6.12 release (the current LTS)	2025-01-07 18:02:04 +01:00
Fupan Li	bd56891f84	Merge pull request #10702 from lifupan/fix_containerdname CI: change the containerd tarball name from cri-containerd-cni to containerd	2025-01-07 18:56:15 +08:00
Fupan Li	b19db40343	CI: change the containerd tarball name to containerd Since https://github.com/containerd/containerd/pull/9096, containerd no longer ships the cri-containerd-*.tar.gz release bundles, so we should change the tarball name to "containerd". The containerd tarball contains the following files: bin/ bin/containerd-shim bin/ctr bin/containerd-shim-runc-v1 bin/containerd-stress bin/containerd bin/containerd-shim-runc-v2 so we should untar containerd into the /usr/local directory instead of "/" to stay aligned with cri-containerd. In addition, no containerd.service file, runc binary, or cni-plugin is included, so we should add a specific containerd.service file and install the runc binary and cni-plugin explicitly. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-01-07 17:39:05 +08:00
Fabiano Fidêncio	9aea7456fb	tests: k8s: Skip one of the empty-dir tests An issue has been created for this, and we should fix the issue before the next release. However, for now, let's unblock the kernel bump and have the test skipped. Reference: https://github.com/kata-containers/kata-containers/issues/10706 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-06 21:48:20 +01:00
Fabiano Fidêncio	44ff602c64	tests: k8s: Be more aggressive to get OOM Let's increase the amount of bytes allocated per VM worker, so we can hit the OOM sooner. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-06 21:48:20 +01:00
Fabiano Fidêncio	f563f0d3fc	versions: Update kernel to v6.12.8 There are lots of configs removed from latest kernel. Update them here for convenience of next kernel upgrade. Remove CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE [1] Remove CONFIG_IP_NF_TARGET_CLUSTERIP [2] Remove CONFIG_NET_SCH_CBQ [3] Remove CONFIG_AUTOFS4_FS [4] Remove CONFIG_EMBEDDED [5] Remove CONFIG_ARCH_RANDOM & CONFIG_RANDOM_TRUST_CPU [6] [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6&id=a7e4676e8e2cb158a4d24123de778087955e1b36 [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6&id=9db5d918e2c07fa09fab18bc7addf3408da0c76f [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6&id=051d442098421c28c7951625652f61b1e15c4bd5 [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6&id=1f2190d6b7112d22d3f8dfeca16a2f6a2f51444e [5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.6&id=ef815d2cba782e96b9aad9483523d474ed41c62a [6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2&id=b9b01a5625b5a9e9d96d14d4a813a54e8a124f4b Apart from the removals, CONFIG_CPU_MITIGATIONS is now a dependency for CONFIG_RETPOLINE (which has been renamed to CONFIG_MITIGATION_RETPOLINE) and CONFIG_PAGE_TABLE_ISOLATION (which has been renamed to CONFIG_MITIGATION_PAGE_TABLE_ISOLATION). I've added that to the whitelist because we still build older versions of the kernel that do not have that dependency. Fixes: #8408 Signed-off-by: Jianyong Wu <jianyong.wu@arm.com> Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-06 21:48:20 +01:00
Xuewei Niu	71b14d40f2	Merge pull request #10696 from teawater/kt kata-ctl: direct-volume: Auto create KATA_DIRECT_VOLUME_ROOT_PATH	2025-01-02 14:04:37 +08:00
Hui Zhu	d15a7baedd	kata-ctl: direct-volume: Auto create KATA_DIRECT_VOLUME_ROOT_PATH Got the following issue: kata-ctl direct-volume add /kubelet/kata-direct-vol-002/directvol002 "{\"device\": \"/home/t4/teawater/coco/t.img\", \"volume-type\": \"directvol\", \"fstype\": \"\", \"metadata\":"{}", \"options\": []}" subsystem: kata-ctl_main Dec 30 09:43:41.150 ERRO Os { code: 2, kind: NotFound, message: "No such file or directory", } The reason is that KATA_DIRECT_VOLUME_ROOT_PATH does not exist. This commit calls create_dir_all on KATA_DIRECT_VOLUME_ROOT_PATH before join_path to handle this issue. Fixes: #10695 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-30 17:55:49 +08:00
Xuewei Niu	6400295940	Merge pull request #10683 from justxuewei/nxw/remove-mut	2024-12-29 00:49:38 +08:00
Fupan Li	2068801b80	Merge pull request #10626 from teawater/ma Add mem-agent to kata	2024-12-24 14:11:36 +08:00
Steve Horsman	2322f6df94	Merge pull request #10686 from stevenhorsman/ppc64le-all-prepare-steps-timeout workflows: Add more ppc64le timeouts	2024-12-20 19:08:48 +00:00
stevenhorsman	9b6fce9e96	workflows: Add more ppc64le timeouts Unsurprisingly, now that we've got past the containerd test hangs on ppc64le, we are hitting others in the "Prepare the self-hosted runner" stage, so add timeouts to all of them to avoid CI blockages. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-20 17:31:24 +00:00
Steve Horsman	162e2af4f5	Merge pull request #10685 from stevenhorsman/ppc64le-containerd-test-timeout workflows: Add timeout to some ppc64le steps	2024-12-20 16:55:40 +00:00
stevenhorsman	d9d8d53bea	workflows: Add timeout to some ppc64le steps In some runs e.g. https://github.com/kata-containers/kata-containers/actions/runs/12426384186/job/34697095588 and https://github.com/kata-containers/kata-containers/actions/runs/12422958889/job/34697016842 we've seen the Prepare the self-hosted runner and Install dependencies steps get stuck for 5+ hours. If they are working then they should take a few minutes, so let's add timeouts and not hold up the whole CI if they are stuck. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-20 16:37:36 +00:00
Steve Horsman	99f239bc44	Merge pull request #10380 from stevenhorsman/required-tests-guidance doc: Add required jobs info	2024-12-20 16:24:42 +00:00
stevenhorsman	d1d4bc43a4	static-checks: Add words to dictionary devmapper and snapshotters are being marked as spelling errors, so add them to the kata dictionary Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-20 14:16:52 +00:00
stevenhorsman	7612839640	doc: Add required jobs info Add information about what required jobs are and our initial guidelines for how jobs are eligible for being made required, or non-required Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-20 14:12:13 +00:00
Xuewei Niu	ecf98e4db8	runtime-rs: Remove unneeded `mut` from `new_hypervisor()` `set_hypervisor_config()` and `set_passfd_listener_port()` acquire inner lock, so that `mut` for `hypervisor` is unneeded. Fixes: #10682 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-12-20 17:08:10 +08:00
Steve Horsman	2c6126d3ab	Merge pull request #10676 from stevenhorsman/fix-qemu-coco-dev-skip tests: Fix qemu-coc-dev skip	2024-12-20 08:56:54 +00:00
Xuewei Niu	ea60613be9	Merge pull request #9387 from deagon/fix-broken-usage packaging: fix the broken usage help	2024-12-20 15:20:37 +08:00
Guoqiang Ding	75baf75726	packaging: fix the broken usage help Using the plain usage text instead of the bad variable reference. Fixes: #9386 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-12-20 13:58:40 +08:00
stevenhorsman	dd02b6699e	tests: Fix qemu-coc-dev skip Fix the logic to make the test skipped on qemu-coco-dev, rather than the opposite, and update the syntax to make it clearer, as it was incorrectly written and reviewed by three different people in its prior form. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-19 19:50:46 +00:00
Steve Horsman	79495379e2	Merge pull request #10668 from stevenhorsman/update-release-process-post-3.12 doc: Update the release process	2024-12-19 14:16:30 +00:00
Steve Horsman	99b9ef4e5a	Merge pull request #10675 from stevenhorsman/release-repeat-abort release: Abort if release version exists	2024-12-19 11:55:44 +00:00
stevenhorsman	c3f13265e4	doc: Update the release process Add a step to wait for the payload publish to complete before running the release action. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-19 09:52:39 +00:00
Zvonko Kaiser	f2d72874a1	Merge pull request #10620 from kata-containers/topic/fix-remove-artifact-ordering workflows: Remove potential timing issues with artifacts	2024-12-18 13:22:12 -05:00
Zvonko Kaiser	fc2c77f3b6	Merge pull request #10669 from zvonkok/qemu-aarch64-fix qemu: Fix aarch64 build	2024-12-18 08:26:55 -05:00
stevenhorsman	e2669d4acc	release: Abort if release version exists In order to check that we don't accidentally overwrite release artifacts, we should add a check if the release name already exists and bail if it does. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-18 11:04:19 +00:00
Zvonko Kaiser	07d2b00863	qemu: Fix aarch64 build Building static binaries for aarch64 requires disabling PIE. We get a GOT overflow, and the OS libraries are only built with fpic and not with fPIC, which enables unlimited sized GOT tables. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-18 03:26:14 +00:00
Zvonko Kaiser	39bf10875b	Merge pull request #10663 from zvonkok/3.12.0-relase release: Bump version to 3.12.0	2024-12-17 10:00:42 -05:00
Zvonko Kaiser	28b57627bd	release: Bump version to 3.12.0 Bump VERSION and helm-chart versions Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-16 18:41:51 +00:00
Xuewei Niu	02b5fa15ac	Merge pull request #10655 from liubogithub/patch-1 kata-ctl: fix outdated comments	2024-12-16 13:11:25 +08:00
Hyounggyu Choi	cfbc425041	Merge pull request #10660 from BbolroC/fix-leading-zero-issue-for-vfio-ap vfio-ap: Assign default string "0" for empty APID and APQI	2024-12-13 17:40:29 +01:00
Hyounggyu Choi	341e5ca58e	vfio-ap: Assign default string "0" for empty APID and APQI The current script logic assigns an empty string to APID and APQI when APQN consists entirely of zeros (e.g., "00.0000"). However, this behavior is incorrect, as "00" and "0000" are valid values and should be represented as "0". This commit ensures that the script assigns the default string “0” to APID and APQI if their computed values are empty. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-12-13 14:39:03 +01:00
Liu Bo	95fc585103	kata-ctl: fix outdated comments MgmnClient can also tolerate short sandbox id. Signed-off-by: Liu Bo <liub.liubo@gmail.com>	2024-12-12 21:59:54 -08:00
stevenhorsman	cf8b82794a	workflows: Only remove artifacts in release builds Because the agent-api tests require the agent to be deployed in the CI by the tarball, in the short term let's only do this on the release stage, so that kata-manager works with the release and the agent-api tests work with the other CI builds. In the longer term we need to re-evaluate what is in our tarballs (issue #10619), but want to unblock the tests in the short term. Fixes: #10630 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-12 17:38:27 +00:00
stevenhorsman	e1f6aca9de	workflows: Remove potential timing issues with artifacts With the code I originally did I think there is potentially a case where we can get a failure due to timing of steps. Before this change the `build-asset-shim-v2` job could start the `get-artifacts` step and concurrently `remove-rootfs-binary-artifacts` could run and delete the artifact during the download and result in the error. In this commit, I try to resolve this by making sure that the shim build waits for the artifact deletes to complete before starting. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-12 16:52:54 +00:00
Fabiano Fidêncio	7b0c1d0a8c	Merge pull request #10492 from zvonkok/upgrade-qemu-9.1.0 qemu: Upgrade qemu 9.1.2	2024-12-12 08:15:39 +01:00
Fupan Li	07fe7325c2	Merge pull request #10643 from justxuewei/fix-bind-vol runtime-rs & agent: Fix the issues with bind volumes	2024-12-12 11:34:52 +08:00
Fupan Li	372346baed	Merge pull request #10641 from justxuewei/fix-build-type runtime-rs: Ignore BUILD_TYPE if it is not release	2024-12-12 11:32:49 +08:00
Xuewei Niu	5f1b1d8932	Merge pull request #10638 from justxuewei/fix-stderr-fifo runtime-rs: Fix the issues with stderr fifo	2024-12-12 10:03:46 +08:00
Fabiano Fidêncio	a5c863a907	Merge pull request #10581 from ryansavino/snp-enable-skipped Revert "ci: Skip the failing tests in SNP"	2024-12-11 18:22:17 +01:00
Zvonko Kaiser	cc9ecedaea	qemu: Bump version, new options, add no_patches We want to have the latest QEMU version available which is as of this writing v9.1.2 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> qemu: Add new options for 9.1.2 We need to fence specific options depending on the version and disable ones that are not needed anymore Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> qemu: Add no_patches.txt Since we do not have any patches for this version let's create the appropriate files. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:32:39 +00:00
Zvonko Kaiser	69ed4bc3b7	qemu: Add depedency The new QEMU build needs python-tomli, now that we bumped Ubuntu we can include the needed tomli package Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:32:20 +00:00
Zvonko Kaiser	c82db45eaa	qemu: Disable pmem We're disabling pmem support; it is heavily broken with Ubuntu's static build of QEMU and not needed. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:32:19 +00:00
Zvonko Kaiser	a88174e977	qemu: Replace from source build with package In jammy we have the liburing package available, hence remove the source build and include the package. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:22:54 +00:00
Zvonko Kaiser	c15f77737a	qemu: Bump Ubuntu version in Dockerfile We need jammy for a new package that is not available in focal Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:22:54 +00:00
Zvonko Kaiser	eef2795226	qemu: Use proper QEMU builder Do not use hardcoded abs path. Use the deduced rel path. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:22:54 +00:00
Zvonko Kaiser	e604e51b3d	qemu: Build as user We moved all other artifacts to be built as a user; QEMU should not be the exception. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:22:54 +00:00
Zvonko Kaiser	1d56fd0308	qemu: Remove abs path We want to stick with the other build scripts and only use relative paths. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-11 16:22:54 +00:00
Ryan Savino	7d45382f54	Revert "ci: Skip the failing tests in SNP" This reverts commit `2242aee099`.	2024-12-10 16:20:31 -06:00
Xuewei Niu	3fb91dd631	agent: Fix the issues with bind volumes The mount type should be considered as empty if the value is `Some("none")`. Fixes: #10642 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-12-11 00:51:32 +08:00
Xuewei Niu	59ed19e8b2	runtime-rs: Fix the issues with bind volumes This patch fixes the logic of getting the type of volume: when the type of OCI mount is Some("none") and the options have "bind" or "rbind", the type will be considered as "bind". Fixes: #10642 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-12-11 00:50:36 +08:00
Xuewei Niu	2424c1a562	runtime-rs: Ignore BUILD_TYPE if it is not release This patch fixes that by adding `--release` only if `BUILD_TYPE=release`. Fixes: #10640 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-12-11 00:27:28 +08:00
Xuewei Niu	b4695f6303	runtime-rs: Fix the issues with stderr fifo When tty is enabled, stderr fifo should never be opened. Fixes: #10637 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-12-10 21:48:52 +08:00
Aurélien Bombo	037281d699	Merge pull request #10593 from microsoft/saulparedes/improve_namespace_validation policy: improve pod namespace validation	2024-12-09 11:55:09 -06:00
Steve Horsman	9b7fb31ce6	Merge pull request #10631 from stevenhorsman/action-lint-workflow Action lint workflow	2024-12-09 09:33:07 +00:00
Fabiano Fidêncio	bec1de7bd7	Merge pull request #10548 from Sumynwa/sumsharma/clh_tweak_vm_configs runtime: Set memory config shared=false when shared_fs=None in CLH.	2024-12-06 23:15:29 +01:00
Sumedh Alok Sharma	ac4f986e3e	runtime: Set memory config shared=false when shared_fs=None in CLH. This commit sets memory config `shared` to false in cloud hypervisor when creating a VM with shared_fs=None && hugePages = false. Currently in runtime/virtcontainers/clh.go, the memory config shared is set to true by default. As per the CLH memory document, (a) shared=true is needed in cases such as when using virtio_fs, since the virtiofs daemon runs as a separate process from clh. (b) for shared_fs=none + hugepages=false, shared=false can be set to use private anonymous memory for the guest (with no file backing). (c) Another memory config, thp (use transparent huge pages), is always enabled by default. As per the documentation, (b) + (c) can be used in combination. However, with the current CLH implementation, the above combination cannot be used since shared=true is always set. Fixes #10547 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-12-06 21:22:51 +05:30
stevenhorsman	b4b3471bcb	workflows: linting: Fix shellcheck SC1001 > This \/ will be a regular '/' in this context Remove ignored escape Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	491210ed22	workflows: linting: Fix shellcheck SC2006 > Use $(...) notation instead of legacy backticks `...` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	5d7c5bdfa4	workflows: linting: Fix shellcheck SC2015 > A && B \|\| C is not if-then-else. C may run when A is true Refactor the echo so that we can't get into a situation where the retry of workspace delete happens if the original one was successful Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	c2ba15c111	workflows: linting: Fix shellcheck SC2206 > Quote to prevent word splitting/globbing Double quote variables expanded in an array Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	007514154c	workflows: linting: Fix shellcheck SC2068 > Double quote array expansions to avoid re-splitting elements Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	4ef05c6176	workflows: linting: Fix shellcheck SC2116 > Useless echo? Instead of 'cmd $(echo foo)', just use 'cmd foo' Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	f02d540799	workflows: Bump outdated action versions Bump some actions that are significantly out-of-date and out of sync with the versions used in other workflows Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	935327b5aa	workflows: linting: Fix shellcheck SC2046 > Quote this to prevent word splitting. Quote around subshell Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	e93ed6c20e	workflows: linting: Add tdx labels The tdx runners got split into two different runners, so we need to update the known self-hosted runner labels Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	d4bd314d52	workflows: linting: Fix incorrect properties These properties are currently invalid, so either fix, or remove them Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	9113606d45	workflows: linting: Fix shellcheck SC2086 > Double quote to prevent globbing and word splitting. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 13:50:12 +00:00
stevenhorsman	42cd2ce6e4	workflows: Add actionlint workflows On PRs that update anything in the workflows directory, add an actionlint run to validate our workflow files for errors and hopefully catch issues earlier. Fixes: #9646 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-06 11:36:08 +00:00
Fabiano Fidêncio	a93ff57c7d	Merge pull request #10627 from kata-containers/topic/release-helm-charm-tarball release: helm: Add the chart as part of the release	2024-12-06 11:22:43 +01:00
Fabiano Fidêncio	300a827d03	release: helm: Add the chart as part of the release So users can simply download the chart and use it accordingly without the need to download the full repo. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-12-06 11:19:34 +01:00
Fabiano Fidêncio	652662ae09	Merge pull request #10551 from fidencio/topic/kata-deploy-allow-multi-deployment kata-deploy: Add support to multi-installation	2024-12-06 11:16:20 +01:00
Hui Zhu	d3a6bcdaa5	runtime-rs: configuration-dragonball.toml.in: Add config for mem-agent Add config for mem-agent to configuration-dragonball.toml.in. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:28 +08:00
Hui Zhu	2b6caf26e0	agent-ctl: Add mem-agent API support Add sub commands MemAgentMemcgSet and MemAgentCompactSet to agent-ctl to configure the mem-agent inside the running kata-containers. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:24 +08:00
Hui Zhu	cb86d700a6	config: Add config of mem-agent Add config of mem-agent to configure the mem-agent. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:20 +08:00
Hui Zhu	692ded8f96	agent: add support for MemAgentMemcgSet and MemAgentCompactSet Add MemAgentMemcgSet and MemAgentCompactSet to agent API to set the config of mem-agent memcg and compact. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:16 +08:00
Hui Zhu	f84ad54d97	agent: Start mem-agent in start_sandbox mem-agent will run with kata-agent. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:13 +08:00
Hui Zhu	74a17f96f4	protocols/protos/agent.proto: Add mem-agent support Add MemAgentMemcgConfig and MemAgentCompactConfig to AgentService. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:09 +08:00
Hui Zhu	ffc8390a60	agent: Add mem-agent to Cargo.toml Add mem-agent to Cargo.toml of agent. mem-agent will be integrated into kata-agent. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:05 +08:00
Hui Zhu	4407f6e098	mem-agent: Add to src mem-agent is a component designed for managing memory in Linux environments. Sub-feature memcg: Utilizes the MgLRU feature to monitor each cgroup's memory usage and periodically reclaim cold memory. Sub-feature compact: Periodically compacts memory to facilitate the kernel's free page reporting feature, enabling the release of more idle memory from guests. During memory reclamation and compaction, mem-agent monitors system pressure using Pressure Stall Information (PSI). If the system pressure becomes too high, memory reclamation or compaction will automatically stop. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:02 +08:00
Hui Zhu	f9c63d20a4	kernel/configs: Add mglru, debugfs and psi to dragonball-experimental Add mglru, debugfs and psi to dragonball-experimental/mem_agent.conf to support mem_agent function. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 09:59:59 +08:00
Fabiano Fidêncio	111082db07	kata-deploy: Add support to multi-installation This is super useful for development / debugging scenarios, mainly when dealing with limited hardware availability, as this change allows multiple people to develop on one single machine, while still using kata-deploy. Fixes: #10546 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-12-05 17:42:53 +01:00
Fabiano Fidêncio	0033a0c23a	kata-deploy: Adjust paths for qemu-coco-dev as well I missed that when working on the INSTALL_PREFIX feature, so adding it now. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-12-05 17:42:53 +01:00
Fabiano Fidêncio	62b3a07e2f	kata-deploy: helm: Add overlooked INSTALLATION_PREFIX env var At the same time that INSTALLATION_PREFIX was added, I was working on the helm changes to properly do the cleanup / deletion when it's removed. However, I missed adding the INSTALLATION_PREFIX env var there, which I'm doing now. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-12-05 17:42:53 +01:00
Steve Horsman	5d96734831	Merge pull request #10572 from ldoktor/gk-stalled-results ci.gatekeeper: Update existing results	2024-12-04 19:02:14 +00:00
Wainer Moschetta	a94982d8b8	Merge pull request #10617 from stevenhorsman/skip-k8s-job-test-on-non-tee tests: Skip k8s job test on qemu-coco-dev	2024-12-04 15:47:33 -03:00
Saul Paredes	84a411dac4	policy: improve pod namespace validation - Remove default_namespace from settings - Ensure container namespaces in a pod match each other in case no namespace is specified in the YAML Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-12-04 10:17:54 -08:00
Steve Horsman	c86f76d324	Merge pull request #10588 from stevenhorsman/metrics-clh-min-range-relaxation metrics: Increase minval range for failing tests	2024-12-04 16:10:26 +00:00
stevenhorsman	a8ccd9a2ac	tests: Skip k8s job test on qemu-coco-dev The test is unstable on this platform, so skip it for now to prevent the regular known failures covering up other issues. See #10616 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-04 16:00:05 +00:00
Steve Horsman	9e609dd34f	Merge pull request #10615 from kata-containers/topic/update-remove-artifact-filter workflows: Fix remove artifact name filter	2024-12-04 15:02:35 +00:00
Fabiano Fidêncio	531a29137e	Merge pull request #10607 from microsoft/danmihai1/less-logging runtime: skip logging some of the dial errors	2024-12-04 15:01:45 +01:00
stevenhorsman	14a3adf4d6	workflows: Fix remove artifact name filter - Fix copy-paste errors in artifact filters for arm64 and ppc64le - Remove the trailing wildcard filter that falsely ends up removing agent-ctl and replace with the tarball-suffix, which should exactly match the artifacts Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-04 13:34:42 +00:00
Alex Lyn	5f9cc86b5a	Merge pull request #10604 from 3u13r/euler/fix/genpolicy-rego-state-getter genpolicy: align state path getter and setter	2024-12-04 13:57:34 +08:00
Alex Lyn	c7064027f4	Merge pull request #10574 from BbolroC/add-ccw-subchannel-qemu-runtime-rs Add subchannel support to qemu-runtime-rs for s390x	2024-12-04 09:17:45 +08:00
Aurélien Bombo	57d893b5dc	Merge pull request #10563 from sprt/csi-deploy coco: ci: Fully implement compilation of CSI driver and require it for CoCo tests [2/x]	2024-12-03 18:58:14 -06:00
Aurélien Bombo	4aa7d4e358	ci: Require CSI driver for CoCo tests With the building/publishing step for the CSI driver validated, we can set that as a requirement for the CoCo tests. Depends on: #10561 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-12-03 14:43:36 -06:00
Aurélien Bombo	fe55b29ef0	csi-kata-directvolume: Remove go version check The driver build recipe has a script to check the current Go version against the go.mod version. However, the script is broken ($expected is unbound) and I don't believe we do this for other components. On top of this, Go should be backward-compatible. Let's keep things simple for now and we can evaluate restoring this script in the future if need be. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-12-03 14:43:36 -06:00
Aurélien Bombo	fb87bf221f	ci: Implement build step for CSI driver This fully implements the compilation step for csi-kata-directvolume. This component can now be built by the CI running: $ cd tools/packaging/kata-deploy/local-build $ make csi-kata-directvolume-tarball A couple notes: * When installing the binary, we rename it from directvolplugin to csi-kata-directvolume on the fly to make it more readable. * We add go to the tools builder Dockerfile to support building this tool. * I've noticed the file install_libseccomp.sh gets created by the build process so I've added it to a .gitignore. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-12-03 14:43:36 -06:00
Aurélien Bombo	0f6113a743	Merge pull request #10612 from kata-containers/sprt/fix-csi-publish2 ci: Fix Docker publishing for CSI driver, 2nd try	2024-12-03 14:43:28 -06:00
Aurélien Bombo	a23ceac913	ci: Fix Docker publishing for CSI driver, 2nd try Follow-up to #10609 as it seems GHA doesn't allow hard links: https://github.com/kata-containers/kata-containers/actions/runs/12144941404/job/33868901896?pr=10563#step:6:8 Note that I also updated the `needs` directive as we don't need the Kata payload container, just the tarball artifact. Part of: #10560 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-12-03 13:04:46 -06:00
Dan Mihai	2a67038836	Merge pull request #10608 from microsoft/saulparedes/policy_metadatata_uid policy: ignore optional metadata uid field	2024-12-03 10:19:12 -08:00
Dan Mihai	25e6f4b2a5	Merge pull request #10592 from microsoft/saulparedes/add_constants_to_rules policy: add constants to rules.rego	2024-12-03 10:17:10 -08:00
Aurélien Bombo	5e1fc5a63f	Merge pull request #10609 from kata-containers/sprt/fix-publish-csi ci: Fix Docker publishing for CSI driver	2024-12-03 11:21:55 -06:00
Hyounggyu Choi	8b998e5f0c	runtime-rs: Introduce get_devno_ccw() for deduplication The devno assignment logic is repeated in 5 different places during device addition. To improve code maintainability and readability, this commit introduces a standalone function, `get_devno_ccw()`, to handle the deduplication. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-12-03 15:35:03 +01:00
Leonard Cohnen	9b614a4615	genpolicy: align state path getter and setter Before this patch there was a mismatch between the JSON path under which the state of the rule evaluation is set in comparison to under which it is retrieved. This resulted in the behavior that each time the policy was evaluated, it thought it was the _first_ time the policy was evaluated. This also means that the consistency check for the `sandbox_name` was ineffective. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2024-12-03 13:25:24 +01:00
Aurélien Bombo	85d3bcd713	ci: Fix Docker publishing for CSI driver The compilation succeeds, however Docker can't find the binary because we specify an absolute path. In Docker world, an absolute path is absolute to the Docker build context (here: src/tools/csi-kata-directvolume). To fix this, we link the binary into the build context, where the Dockerfile expects it. Failure mode: https://github.com/kata-containers/kata-containers/actions/runs/12068202642/job/33693101962?pr=10563#step:8:213 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-12-02 15:50:01 -06:00
Saul Paredes	711d12e5db	policy: support optional metadata uid field This prevents a deserialization error when uid is specified Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-12-02 11:24:58 -08:00
Dan Mihai	efd492d562	runtime: skip logging some of the dial errors With full debug logging enabled there might be around 1,500 redials so log just ~15 of these redials to avoid flooding the log. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-12-02 19:11:32 +00:00
Hyounggyu Choi	9c19d7674a	Merge pull request #10590 from zvonkok/fix-ci ci: Fix variant for confidential targets	2024-12-02 18:39:52 +01:00
Saul Paredes	9105c1fa0c	policy: add constants to rules.rego Reuse constants where applicable Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-12-02 08:28:58 -08:00
Hyounggyu Choi	6f4f94a9f0	Merge pull request #10595 from BbolroC/add-zvsi-devmapper-to-gatekeeper-required-jobs gatekeeper: add run-k8s-tests-on-zvsi(devmapper) to required jobs	2024-12-02 15:28:14 +01:00
Zvonko Kaiser	20442c0eae	ci: Fix variant for confidential targets The default initrd confidential target will have a variant=confidential, so we need to accommodate this and make sure we also accommodate aaa-xxx-confidential targets. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-12-02 14:21:03 +00:00
stevenhorsman	b87b4b6756	metrics: Increase ranges range for qemu failing tests We've also seen the qemu metrics tests are failing due to the results being slightly outside the max range for network-iperf3 parallel and minimum for network-iperf3 jitter tests on PRs that have no code changes, so we've increased the bounds to not see false negatives. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-29 10:52:16 +00:00
stevenhorsman	4011071526	metrics: Increase minval range for failing tests We've seen a couple of instances recently where the metrics tests are failing due to the results being below the minimum value by ~2%. For tests like latency I'm not sure why values being too low would be an issue, but I've updated the minpercent range of the failing tests to try and get them passing. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-29 10:50:02 +00:00
Hyounggyu Choi	de3452f8e1	gatekeeper: add run-k8s-tests-on-zvsi(devmapper) to required jobs As the following CI job has been marked as required: - kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (devmapper, qemu, kubeadm) we need to add it to the gatekeeper's required job list. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-28 12:46:47 +01:00
Fabiano Fidêncio	bdf10e651a	Merge pull request #10597 from kata-containers/topic/unbreak-ci-3rd-time-s-a-charm Unbreak the CI, 3rd attempt	2024-11-28 12:36:09 +01:00
Fabiano Fidêncio	92b8091f62	Revert "ci: unbreak: Reallow no-op builds" This reverts commit `559018554b`. As we've noticed that this is causing issues with initrd builds in the CI. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-28 12:02:40 +01:00
Fabiano Fidêncio	ca2098f828	build: Allow dummy builds (for when adding a new target) This will help us to simply allow a new dummy build whenever a new component is added. As long as the format `$(call DUMMY,$@)` is followed, we should be good to go without taking the risk of breaking the CI. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-28 11:13:24 +01:00
Fabiano Fidêncio	f9930971a2	Merge pull request #10594 from sprt/sprt/unbreak-ci-noop-build ci: unbreak: Reallow no-op builds	2024-11-28 07:38:25 +01:00
Aurélien Bombo	559018554b	ci: unbreak: Reallow no-op builds #9838 previously modified the static build so as not to repeatedly copy the same assets on each matrix iteration: https://github.com/kata-containers/kata-containers/pull/9838#issuecomment-2169299202 However, that implementation breaks specifying no-op/WIP build targets such as done in `e43c59a`. Such no-op builds have been a historical requirement of the project because of a GHA limitation. The breakage is due to no-op builds not generating a tar file corresponding to the asset: https://github.com/kata-containers/kata-containers/actions/runs/12059743390/job/33628926474?pr=10592 To address this breakage, we revert to the `cp -r` implementation and add the `--no-clobber` flag to still preserve the current behavior. Note that `-r` will also create the destination directory if it doesn't exist. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-27 18:40:29 -06:00
Fabiano Fidêncio	9699c7ed06	Merge pull request #10589 from kata-containers/sprt/fix-csi-publish gha: Unbreak CI and work around workflow limit	2024-11-27 23:52:55 +01:00
Aurélien Bombo	eac197d3b7	Merge pull request #10564 from microsoft/danmihai1/clh-endpoint-type runtime: clh: addNet() logging clean-up	2024-11-27 14:44:14 -06:00
Aurélien Bombo	7f659f3d63	gha: Unbreak CI and work around workflow limit #10561 inadvertently broke the CI by going over the limit of 20 reusable workflows: https://github.com/kata-containers/kata-containers/actions/runs/12054648658/workflow This commit fixes that by inlining the job. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-27 12:23:15 -06:00
Aurélien Bombo	16a91fccbe	Merge pull request #10561 from sprt/csi-driver-ci coco: ci: Lay groundwork for compiling and publishing CSI driver image [1/x]	2024-11-27 10:26:45 -06:00
Fabiano Fidêncio	175fe8bc66	Merge pull request #10585 from fidencio/topic/kata-deploy-use-drop-in-containerd-config-whenever-it-is-possible kata-deploy: Use drop-in files whenever it's possible	2024-11-27 16:36:18 +01:00
Steve Horsman	6bb00d9a1d	Merge pull request #10583 from squarti/agent-startup-cdh-client agent: fix startup when guest_components_procs is set to none	2024-11-27 11:43:07 +00:00
Fabiano Fidêncio	500508a592	kata-deploy: Use drop-in files whenever it's possible This will make our lives considerably easier when it comes to cleaning up content added, while it's also a groundwork needed for having multiple installations running in parallel. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-27 12:27:08 +01:00
Steve Horsman	3240f8a4b8	Merge pull request #10586 from stevenhorsman/delete-rootfs-binary-assets-after-rootfs-build workflows: Remove rootfs binary artifacts	2024-11-27 10:03:20 +00:00
Fabiano Fidêncio	c472fe1924	Merge pull request #10584 from fidencio/topic/kata-deploy-prepare-for-containerd-config-version-3 kata-deploy: Support containerd configuration version 3	2024-11-26 18:44:56 +01:00
stevenhorsman	3e5d360185	workflows: Remove rootfs binary artifacts We need to publish certain artefacts for the rootfs, like the agent, guest-components, pause bundle etc as they are consumed in the `build-asset-rootfs` step. However after this point they aren't needed and probably shouldn't be included in the overall kata tarball, so delete them once they aren't needed any more to avoid them being included. Fixes: #10575 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-26 15:24:20 +00:00
Fabiano Fidêncio	6f70ab9169	kata-deploy: Adapt how the containerd version is checked for k0s Let's actually mount the whole /etc/k0s as /etc/containerd, so we can easily access the containerd configuration file which has the version in it, allowing us to parse it instead of just making a guess based on kubernetes distro being used. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-26 16:15:11 +01:00
Silenio Quarti	1230bc77f2	agent: fix startup when guest_components_procs is set to none This PR ensures that OCICRYPT_CONFIG_PATH file is initialized only when CDH socket exists. This prevents startup error if attestation binaries are not installed in PodVM. Fixes: https://github.com/kata-containers/kata-containers/issues/10568 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-11-26 09:57:04 -05:00
Fabiano Fidêncio	f5a9aaa100	kata-deploy: Support containerd config version 3 On Ubuntu 24.04, with the distro default containerd, we're already getting: ``` $ containerd config default \| grep "version = " version = 3 ``` With that in mind, let's make sure that we're ready to support this from the next release. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-26 14:01:50 +01:00
Fupan Li	28166c8a32	Merge pull request #10577 from Apokleos/fix-vfiodev-name runtime-rs: fix vfio device name combination issue	2024-11-26 09:35:45 +08:00
Dan Mihai	d93900c128	Merge pull request #10543 from microsoft/danmihai1/regorus-warning genpolicy: avoid regorus warning	2024-11-25 16:47:33 -08:00
Zvonko Kaiser	1b10e82559	Merge pull request #10516 from zvonkok/kata-agent-cdi ci: Fix error on self-hosted machines	2024-11-25 18:49:37 -05:00
Ryan Savino	e46d24184a	Merge pull request #10386 from kimullaa/fix-build-error-when-using-sev-snp docs: Fix several build failures when I tried the procedures in "Kata Containers with AMD SEV-SNP VMs"	2024-11-25 16:58:52 -06:00
Dan Mihai	f340b31c41	genpolicy: avoid regorus warning Avoid adding to the Guest console warnings about "agent_policy:10:8". "import input" is unnecessary. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-11-25 21:19:01 +00:00
Zvonko Kaiser	c3d1b3c5e3	Merge pull request #10464 from zvonkok/nvidia-gpu-rootfs gpu: NVIDIA GPU initrd/image build	2024-11-25 16:16:42 -05:00
Fabiano Fidêncio	8763a9bc90	Merge pull request #10520 from fidencio/topic/drop-clear-linux-rootfs osbuilder: Drop Clear Linux	2024-11-25 21:16:03 +01:00
Dan Mihai	78cbf33f1d	runtime: clh: addNet() logging clean-up Avoid logging the same endpoint fields twice from addNet(). Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-11-25 19:58:54 +00:00
alex.lyn	5dba680afb	runtime-rs: fix vfio device name combination issue Fixes #10576 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2024-11-25 14:01:43 +08:00
Hyounggyu Choi	48e2df53f7	runtime-rs: Add devno to DeviceVirtioScsi A new attribute named `devno` is added to DeviceVirtioScsi. It will be used to specify a device number for a CCW bus type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Hyounggyu Choi	2cc48f7822	runtime-rs: Add devno to DeviceVhostUserFs A new attribute named `devno` is added to DeviceVhostUserFs. It will be used to specify a device number for a CCW bus type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Hyounggyu Choi	920484918c	runtime-rs: Add devno to VhostVsock A new attribute named `devno` is added to VhostVsock. It will be used to specify a device number for a CCW bus type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Hyounggyu Choi	9486790089	runtime-rs: Add devno to DeviceVirtioSerial A new attribute named `devno` is added to DeviceVirtioSerial. It will be used to specify a device number for a CCW bus type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Hyounggyu Choi	516daecc50	runtime-rs: Add devno to DeviceVirtioBlk A new attribute named `devno` is added to DeviceVirtioBlk. It will be used to specify a device number for a CCW bus type. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Hyounggyu Choi	30a64092a7	runtime-rs: Add CcwSubChannel to provide devno for CCW devices To explicitly specify a device number on the QEMU command line for the following devices using the CCW transport on s390x: - SerialDevice - BlockDevice - VhostUserDevice - SCSIController - VSOCKDevice this commit introduces a new structure CcwSubChannel and implements the following methods: - add_device() - remove_device() - address_format_ccw() - set_addr() You can see the detailed explanation for each method in the comment. This resolves the 1st part of #10573. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-11-23 13:45:36 +01:00
Steve Horsman	322073bea1	Merge pull request #10447 from ldoktor/required-jobs ci: Required jobs	2024-11-22 09:15:11 +00:00
Lukáš Doktor	e69635b376	ci.gatekeeper: Remove unused variable this is a left-over from previous way of iterating over jobs. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-22 09:27:11 +01:00
Lukáš Doktor	fa7bca4179	ci.gatekeeper: Print the older job id let's also print the existing result's id when printing the information about ignoring older result id to simplify debugging. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-22 09:27:11 +01:00
Lukáš Doktor	6c19a067a0	ci.gatekeeper: Update existing results the matching run_id means we're dealing with the same job but with updated results and not with an older job. Update the results in such case. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-22 09:27:09 +01:00
Aurélien Bombo	5e4990bcf5	coco: ci: Add no-op steps to deploy CSI driver This adds no-op steps that'll be used to deploy and clean up the CSI driver used for testing. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-21 16:08:06 -06:00
Aurélien Bombo	893f6a4ca0	ci: Introduce job to publish CSI driver image This adds a new job to build and publish the CSI driver Docker image. Of course this job will fail after we merge this PR because the CSI driver compilation job hasn't been implemented yet. However that will be implemented directly after in #10561. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-21 16:07:59 -06:00
Aurélien Bombo	e43c59a2c6	ci: Add no-op step to compile CSI driver This adds a no-op build step to compile the CSI driver. The actual compilation will be implemented in a later PR, so as to ensure we don't break the CI. Addresses: #10560 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-21 16:06:55 -06:00
Zvonko Kaiser	0debf77770	gpu: NVIDIA gpu initrd/image build With each release make sure we ship a GPU enabled rootfs/initrd Fixes: #6554 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-21 18:57:23 +00:00
Steve Horsman	b4da4b5e3b	Merge pull request #10377 from coolljt0725/fix_build osbuilder: Fix build dependency of ubuntu rootfs with Docker	2024-11-21 08:45:59 +00:00
Jitang Lei	ed4c727c12	osbuilder: Fix build dependency of ubuntu rootfs with Docker Build ubuntu rootfs with Docker failed with error: `Unable to find libclang` Fix this error by adding libclang-dev to the dependency. Signed-off-by: Jitang Lei <leijitang@outlook.com>	2024-11-21 10:49:27 +08:00
Zvonko Kaiser	e9f36f8187	ci: Fixing simple typo change evn to env Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-20 18:40:14 +00:00
Zvonko Kaiser	a5733877a4	ci: Fix error on self-hosted machines We need to clean-up any created files/dirs otherwise we cause problems on self-hosted runners. Using tempdir which will be removed automatically. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-20 18:40:13 +00:00
Lukáš Doktor	62e8815a5a	ci: Add documentation to cover mapping format to help people with adding new entries. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-20 17:25:59 +01:00
Lukáš Doktor	64306dc888	ci: Set required-tests according to GH required tests this should record the current list of required tests from GH. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-20 17:25:57 +01:00
Steve Horsman	358ebf5134	Merge pull request #10558 from AdithyaKrishnan/main ci: Re-enable SNP CI	2024-11-20 10:27:41 +00:00
Steve Horsman	30bad4ee43	Merge pull request #10562 from stevenhorsman/remove-release-artifactor-skips workflows: Remove skipping of artifact uploads	2024-11-20 08:45:37 +00:00
Adithya Krishnan Kannan	2242aee099	ci: Skip the failing tests in SNP Per [Issue#10549](https://github.com/kata-containers/kata-containers/issues/10549), the following tests are failing on SNP. 1. k8s-guest-pull-image-encrypted.bats 2. k8s-guest-pull-image-authenticated.bats 3. k8s-guest-pull-image-signature.bats 4. k8s-confidential-attestation.bats Per @fidencio 's comment on [PR#10558](https://github.com/kata-containers/kata-containers/pull/10558), I am skipping the same. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2024-11-19 10:41:43 -06:00
stevenhorsman	da5f6b77c7	workflows: Remove skipping of artifact uploads Now we are downloading artifacts to create the rootfs we need to ensure they are uploaded always, even on releases Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-19 13:28:02 +00:00
Steve Horsman	817438d1f6	Merge pull request #10552 from stevenhorsman/3.11.0-release release: Bump version to 3.11.0	2024-11-19 09:44:35 +00:00
Saul Paredes	eab48c9884	Merge pull request #10545 from microsoft/cameronbaird/sync-clh-logging runtime: fix comment to accurately reflect clh behavior	2024-11-18 11:25:58 -08:00
Adithya Krishnan Kannan	ef367d81f2	ci: Re-enable SNP CI We've debugged the SNP Node and we wish to test the fixes on GHA. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2024-11-18 11:11:27 -06:00
stevenhorsman	7a8ba14959	release: Bump version to 3.11.0 Bump `VERSION` and helm-chart versions Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-18 11:13:15 +00:00
Steve Horsman	0ce3f5fc6f	Merge pull request #10514 from squarti/pause_command agent: overwrite OCI process spec when overwriting pause image	2024-11-15 18:03:58 +00:00
Fabiano Fidêncio	92f7526550	Merge pull request #10542 from Crypt0s/topic/enable-CONFIG_KEYS kernel: add CONFIG_KEYS=y to enable kernel keyring	2024-11-15 12:15:25 +01:00
Crypt0s	563a6887e2	kernel: add CONFIG_KEYS=y to enable kernel keyring KinD checks for the presence of this (and other) kernel configuration via scripts like https://blog.hypriot.com/post/verify-kernel-container-compatibility/ or attempts to directly use /proc/sys/kernel/keys/ without checking to see if it exists, causing an exit when it does not see it. Docker/its consumers apparently expect to be able to use the kernel keyring and its associated syscalls from/for containers. There aren't any known downsides to enabling this except that it would by definition enable additional syscalls defined in https://man7.org/linux/man-pages/man7/keyrings.7.html which are reachable from userspace. This minimally increases the attack surface of the Kata Kernel, but this attack surface is minimal (especially since the kernel is most likely being executed by some kind of hypervisor) and highly restricted compared to the utility of enabling this feature to get further containerization compatibility. Signed-off-by: Crypt0s <BryanHalf@gmail.com>	2024-11-15 09:30:06 +01:00
Shunsuke Kimura	706e8bce89	docs: change from OVMF.fd to AmdSev.fd change the build method to generate OVMF for AmdSev. This commit adds `ovmf_build=sev` env parameter. <`638c2c4164`> Fixes #10378 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2024-11-15 11:24:45 +09:00
Shunsuke Kimura	d7f6fabe65	docs: fix build-kernel.sh option `build-kernel.sh` no longer takes an argument for the -x option. <`6c3338271b`> Fixes #10378 Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>	2024-11-15 11:24:45 +09:00
Cameron Baird	65881ceb8a	runtime: fix comment to accurately reflect clh behavior Fix the CLH log levels description Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2024-11-14 23:16:11 +00:00
Silenio Quarti	42b6203493	agent: overwrite OCI process spec when overwriting pause image The PR replaces the OCI process spec of the pause container with the spec of the guest provided pause bundle. Fixes: https://github.com/kata-containers/kata-containers/issues/10537 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-11-14 13:05:16 -05:00
Fabiano Fidêncio	6a9266124b	Merge pull request #10501 from kata-containers/topic/ci-split-tests ci: tdx: Split jobs to run in 2 different machines	2024-11-14 17:24:50 +01:00
Fabiano Fidêncio	9b3fe0c747	ci: tdx: Adjust workflows to use different machines This will be helpful in order to increase the OS coverage (we'll be using both Ubuntu 24.04 and CentOS 9 Stream), while also reducing the amount spent on the tests (as one machine will only run attestation related tests, and the other the tests that do not require attestation). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-14 15:52:00 +01:00
Fabiano Fidêncio	9b1a5f2ac2	tests: Add a way to run only tests which rely on attestation We're doing this as, at Intel, we have two different kinds of machines we can plug into our CI. Without going much into details, only one of those two kinds of machines will work for the attestation tests we perform with ITA, thus in order to speed up the CI and improve test coverage (OS wise), we're going to run different tests in different machines. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-14 15:51:57 +01:00
Steve Horsman	915695f5ef	Merge pull request #9407 from mrIncompetent/root-fs-clang rootfs: Install missing clang in Ubuntu docker image	2024-11-14 10:35:06 +00:00
Henrik Schmidt	57a4dbedeb	rootfs: Install missing libclang-dev in Ubuntu docker image Fixes #9444 Signed-off-by: Henrik Schmidt <mrIncompetent@users.noreply.github.com>	2024-11-14 08:48:24 +00:00
Hyounggyu Choi	5869046d04	Merge pull request #9195 from UiPath/fix/vcpus-for-static-mgmt runtime: Set maxvcpus equal to vcpus for the static resources case	2024-11-14 09:38:20 +01:00
Dan Mihai	d9977b3e75	Merge pull request #10431 from microsoft/saulparedes/add-policy-state genpolicy: add state to policy	2024-11-13 11:48:46 -08:00
Aurélien Bombo	7bc2fe90f9	Merge pull request #10521 from ncppd/osbuilder-cleanup osbuilder: remove redundant env variable	2024-11-13 12:17:09 -06:00
Steve Horsman	a947d2bc40	Merge pull request #10539 from AdithyaKrishnan/main ci: Temporarily skip SNP CI	2024-11-13 17:58:32 +00:00
Adithya Krishnan Kannan	439a1336b5	ci: Temporarily skip SNP CI As discussed in the CI working group, we are temporarily skipping the SNP CI to unblock the remaining workflow. Will revert after fixing the SNP runner. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2024-11-13 11:44:16 -06:00
Fabiano Fidêncio	02d4c3efbf	Merge pull request #10519 from fidencio/topic/relax-restriction-for-qemu-tdx Reapply "runtime: confidential: Do not set the max_vcpu to cpu"	2024-11-13 16:09:06 +01:00
Saul Paredes	c207312260	genpolicy: validate container sandbox names Make sure all container sandbox names match the sandbox name of the first container. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-11-12 15:17:01 -08:00
Saul Paredes	52d1aea1f7	genpolicy: Add state Use regorus engine's add_data method to add state to the policy. This data can later be accessed inside rego context through the data namespace. Support state modifications (json-patches) that may be returned as a result from policy evaluation. Also initialize a policy engine data slice "pstate" dedicated for storing state. Fixes #10087 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-11-12 15:16:53 -08:00
Alexandru Matei	e83f8f8a04	runtime: Set maxvcpus equal to vcpus for the static resources case Fixes: #9194 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-11-12 16:36:42 +02:00
GabyCT	06fe459e52	Merge pull request #10508 from GabyCT/topic/installartsta gha: Get artifacts when installing kata tools in stability workflow	2024-11-11 15:59:06 -06:00
Nikos Ch. Papadopoulos	ab80cf8f48	osbuilder: remove redundant env variable Remove second declaration of GO_HOME in rootfs-build ubuntu script. Signed-off-by: Nikos Ch. Papadopoulos <ncpapad@cslab.ece.ntua.gr>	2024-11-11 19:49:28 +02:00
Fabiano Fidêncio	780b36f477	osbuilder: Drop Clear Linux The Clear Linux rootfs is not being tested anywhere, and it seems Intel doesn't have the capacity to review the PRs related to this (combined with the lack of interest from the rest of the community on reviewing PRs that are specific to this untested rootfs). With this in mind, I'm suggesting we drop Clear Linux support and focus on what we can actually maintain. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-11 15:22:55 +01:00
Fabiano Fidêncio	5618180e63	Merge pull request #10515 from kata-containers/sprt/ubuntu-latest-fix gha: Hardcode ubuntu-22.04 instead of latest	2024-11-10 09:54:39 +01:00
Fabiano Fidêncio	2281342fb8	Merge pull request #10513 from fidencio/topic/ci-adjust-proxy-nightmare-for-tdx ci: tdx: kbs: Ensure https_proxy is taken in consideration	2024-11-10 00:17:10 +01:00
Fabiano Fidêncio	0d8c4ce251	Merge pull request #10517 from microsoft/saulparedes/remove_manifest_v1_test tests: remove manifest v1 test	2024-11-09 23:40:51 +01:00
Fabiano Fidêncio	56812c852f	Reapply "runtime: confidential: Do not set the max_vcpu to cpu" This reverts commit `f15e16b692`, as we don't have to do this since we're relying on the `static_sandbox_resource_mgmt` feature, which gives us the correct amount of memory and CPUs to be allocated. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-09 23:20:17 +01:00
Saul Paredes	461efc0dd5	tests: remove manifest v1 test This test was meant to show support for pulling images with v1 manifest schema versions. The nginxhttps image has been modified in https://hub.docker.com/r/ymqytw/nginxhttps/tags such that we are no longer able to pull it: $ docker pull ymqytw/nginxhttps:1.5 Error response from daemon: missing signature key We may remove this test since schema version 1 manifests are deprecated per https://docs.docker.com/engine/deprecated/#pushing-and-pulling-with-image-manifest-v2-schema-1 : "These legacy formats should no longer be used, and users are recommended to update images to use current formats, or to upgrade to more current images". This schema version was used by old docker versions. Further OCI spec https://github.com/opencontainers/image-spec/blob/main/manifest.md#image-manifest-property-descriptions only supports schema version 2. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-11-08 13:38:51 -08:00
Aurélien Bombo	19e972151f	gha: Hardcode ubuntu-22.04 instead of latest GHA is migrating ubuntu-latest to Ubuntu 24 so let's hardcode the current 22.04 LTS. https://github.blog/changelog/2024-11-05-notice-of-breaking-changes-for-github-actions/ Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-08 11:00:15 -06:00
Greg Kurz	2bd8fde44a	Merge pull request #10511 from ldoktor/fedora-python ci.ocp: Use the official python:3 container for sanity	2024-11-08 16:31:40 +01:00
Fabiano Fidêncio	baf88bb72d	ci: tdx: kbs: Ensure https_proxy is taken in consideration Trustee's deployment must set the correct https_proxy as env var on the container that will talk to the ITA / ITTS server, otherwise the kbs service won't be able to start, which then causes issues in our CI. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: Krzysztof Sandowicz <krzysztof.sandowicz@intel.com>	2024-11-08 16:06:16 +01:00
Steve Horsman	1f728eb906	Merge pull request #10498 from stevenhorsman/update-create-container-timeout-log tests: k8s: Update image pull timeout error	2024-11-08 10:47:39 +00:00
Steve Horsman	6112bf85c3	Merge pull request #10506 from stevenhorsman/skip-runk-ci workflow: Remove/skip runk CI	2024-11-08 09:54:06 +00:00
Steve Horsman	a5acbc9e80	Merge pull request #10505 from stevenhorsman/remove-stratovirt-metrics-tests metrics: Skip metrics on stratovirt	2024-11-08 08:53:05 +00:00
Lukáš Doktor	2f7d34417a	ci.ocp: Use the official python:3 container for sanity Fedora F40 removed python3 from the base container; to avoid such issues let's rely on the latest and greatest official python container. Fixes: #10497 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-11-08 07:16:30 +01:00
Zvonko Kaiser	183bd2aeed	Merge pull request #9584 from zvonkok/kata-agent-cdi kata-agent: Add CDI support	2024-11-07 14:18:32 -05:00
Zvonko Kaiser	aa2e1a57bd	agent: Added test-case for handle_cdi_devices We are generating a simple CDI spec with device and global containerEdits to test the CDI crate. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-07 17:03:18 +00:00
Gabriela Cervantes	4274198664	gha: Get artifacts when installing kata tools in stability workflow This PR adds the get artifacts step, which is needed when installing kata tools in the stability workflow, to avoid failures saying that artifacts are missing. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-11-07 16:20:41 +00:00
stevenhorsman	a5f1a5a0ee	workflow: Remove/skip runk CI As discussed in the AC meeting, we don't have a maintainer (or users?) of runk, and the CI is unstable, so given we can't support it, we shouldn't waste CI cycles on it. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-07 14:16:30 +00:00
stevenhorsman	0efe9f4e76	metrics: Skip metrics on stratovirt As discussed on the AC call, we are lacking maintainers for the metrics tests. As a starting point for potentially phasing them out, we discussed starting with removing the test for stratovirt as a non-core hypervisor and a job that is problematic in leaving behind resources that need cleaning up. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-07 14:06:57 +00:00
Fabiano Fidêncio	c332e953f9	Merge pull request #10500 from squarti/fix-10499 runtime: Files are not synced between host and guest VMs	2024-11-07 08:28:53 +01:00
Silenio Quarti	be3ea2675c	runtime: Files are not synced between host and guest VMs This PR makes the root dir absolute after resolving the default root dir symlink. Fixes: https://github.com/kata-containers/kata-containers/issues/10499 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-11-06 17:31:12 -05:00
GabyCT	47cea6f3c6	Merge pull request #10493 from GabyCT/topic/katatoolsta gha: Add install kata tools as part of the stability workflow	2024-11-06 14:16:48 -06:00
Gabriela Cervantes	13e27331ef	gha: Add install kata tools as part of the stability workflow This PR adds the install kata tools step as part of the k8s stability workflow, to avoid failures saying that certain kata components are not installed. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-11-06 20:07:06 +00:00
Fabiano Fidêncio	71c4c2a514	Merge pull request #10486 from kata-containers/topic/enable-AUTO_GENERATE_POLICY-for-qemu-coco-dev workflows: Use AUTO_GENERATE_POLICY for qemu-coco-dev	2024-11-06 21:04:45 +01:00
Zvonko Kaiser	3995fe71f9	kata-agent: Add CDI support For proper device handling add CDI support Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-06 17:50:20 +00:00
stevenhorsman	85554257f8	tests: k8s: Update image pull timeout error Currently the error we are checking for is `CreateContainerRequest timed out`, but this message doesn't always seem to be printed to our pod log. Try using a more general message that should be present more reliably. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-06 17:00:26 +00:00
Fabiano Fidêncio	a3c72e59b1	Merge pull request #10495 from littlejawa/ci/skip_nginx_connectivity_for_crio ci: skip nginx connectivity test with qemu/crio	2024-11-06 13:43:19 +01:00
Julien Ropé	da5e0c3f53	ci: skip nginx connectivity test with crio We have an error with service name resolution with this test when using crio. This error could not be reproduced outside of the CI for now. Skipping it to keep the CI job running until we find a solution. See: #10414 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-11-06 12:07:02 +01:00
Greg Kurz	5af614b1a4	Merge pull request #10496 from littlejawa/ci/expose_container_runtime ci: export CONTAINER_RUNTIME to the test scripts	2024-11-06 12:05:36 +01:00
Julien Ropé	6d0cb1e9a8	ci: export CONTAINER_RUNTIME to the test scripts This variable will allow tests to adapt their behaviour to the runtime (containerd/crio). Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-11-06 11:29:11 +01:00
Fabiano Fidêncio	72979d7f30	workflows: Use AUTO_GENERATE_POLICY for qemu-coco-dev By the moment we're testing it also with qemu-coco-dev, it becomes easier for a developer without access to TEE to also test it locally. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-06 10:47:08 +01:00
Fabiano Fidêncio	7d3f2f7200	runtime: Match TEEs for the static_sandbox_resource_mgmt option The qemu-coco-dev runtime class should be as close as possible to what the TEEs runtime classes are doing, and this was one of the options that ended up overlooked till now. Shout out to Dan Mihai for noticing that! Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-06 10:47:08 +01:00
Fabiano Fidêncio	ea8114833c	Merge pull request #10491 from fidencio/topic/fix-typo-in-the-ephemeral-handler agent: fix typo on getting EphemeralHandler size option	2024-11-06 10:31:48 +01:00
Fabiano Fidêncio	7e6779f3ad	Merge pull request #10488 from fidencio/topic/teach-our-machinery-to-deal-with-rc-kernels build: kernel: Teach our machinery to deal with -rc kernels	2024-11-05 16:19:57 +01:00
Zvonko Kaiser	a4725034b2	Merge pull request #9480 from zvonkok/build-image-suffix image: Add suffix to image or initrd depending on the NVIDIA driver version	2024-11-05 09:43:56 -05:00
Fabiano Fidêncio	77c87a0990	agent: fix typo on getting EphemeralHandler size option Most likely this was overlooked during the development / review, but we're actually interested in the size rather than in the pagesize of the hugepages. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 15:15:17 +01:00
Fabiano Fidêncio	2b16160ff1	versions: kernel-dragonball: Fix URL SSIA Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:55:34 +01:00
Fabiano Fidêncio	f7b31ccd6c	kernel: bump kata_config_version Due to the changes done in the previous commits. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:57 +01:00
Fabiano Fidêncio	a52ea32b05	build: kernel: Learn how to deal with release candidates So far we were not prepared to deal with release candidates as those: * Do not have a sha256sum in the sha256sums provided by the kernel cdn * Come from a different URL (directly from Linus) * Have a different suffix (.tar.gz, instead of .tar.xz) Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	9f2d4b2956	build: kernel: Always pass the url to the builder This doesn't change much on how we're doing things today, but it simplifies a lot of cases that may be added later on (and will be) like building -rc kernels. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	ee1a17cffc	build: kernel: Take kernel_url into consideration Let's make sure the kernel_url is actually used whenever it's passed to the function. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	9a0b501042	build: kernel: Remove tee specific function As, thankfully, we're relying on upstream kernels for TEEs. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	cc4006297a	build: kernel: Pass the yaml base path instead of the version path By doing this we can ensure this can be re-used, if needed (and it'll be needed), for also getting the URL. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	7057ff1cd5	build: kernel: Always pass -f to the kernel builder -f forces the (re)generation of the config when doing the setup, which helps a lot on local development whilst not causing any harm in the CI builds. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 12:26:02 +01:00
Fabiano Fidêncio	910defc4cf	Merge pull request #10490 from fidencio/topic/fix-ovmf-build builds: ovmf: Workaround Zeex repo becoming private	2024-11-05 12:25:00 +01:00
Fabiano Fidêncio	aff3d98ddd	builds: ovmf: Workaround Zeex repo becoming private Let's just do a simple `sed` and not use the repo that became private. This is not a backport of https://github.com/tianocore/edk2/pull/6402, but it's a similar approach that allows us to proceed without the need to pick up a newer version of edk2. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-05 11:25:54 +01:00
Dan Mihai	03bf4433d7	Merge pull request #10459 from stevenhorsman/update-bats tests: k8s: Update bats	2024-11-04 12:26:58 -08:00
Aurélien Bombo	f639d3e87c	Merge pull request #10395 from Sumynwa/sumsharma/create_container agent-ctl: Add support to test kata-agent's container creation APIs.	2024-11-04 14:09:12 -06:00
GabyCT	7f066be04e	Merge pull request #10485 from GabyCT/topic/fixghast gha: Fix source for gha stability run script	2024-11-04 12:09:28 -06:00
Steve Horsman	a2b9527be3	Merge pull request #10481 from mkulke/mkulke/init-cdh-client-on-gcprocs-none agent: perform attestation init w/o process launch	2024-11-04 17:27:45 +00:00
Gabriela Cervantes	fd4d0dd1ce	gha: Fix source for gha stability run script This PR fixes the source to avoid duplication, especially in the common.sh script, and to avoid failures saying that a certain script is not in the directory. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-11-04 16:16:13 +00:00
Magnus Kulke	bf769851f8	agent: perform attestation init w/o process launch This change is motivated by a problem in peerpod's podvms. In this setup the lifecycle of guest components is managed by systemd. The current code skips over init steps like setting the ocicrypt-rs env and initialization of a CDH client in this case. To address this the launch of the processes has been isolated into its own fn. Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>	2024-11-04 13:31:07 +01:00
Steve Horsman	4fd9df84e4	Merge pull request #10482 from GabyCT/topic/fixvirtdoc docs: Update virtualization document	2024-11-04 11:51:09 +00:00
stevenhorsman	175ebfec7c	Revert "k8s:kbs: Add trap statement to clean up tmp files" This reverts commit `973b8a1d8f`. As @danmihai1 points out https://github.com/bats-core/bats-core/issues/364 states that using traps in bats is error prone, so this could be the cause of the confidential test instability we've been seeing, like it was in the static checks, so let's try and revert this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-04 09:59:37 +00:00
stevenhorsman	75cb1f46b8	tests/k8s: Add skip if setup_common fails At @danmihai1's suggestion add a die message in case the call to setup_common fails, so we can see it in the test output. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-04 09:59:33 +00:00
stevenhorsman	3f5bf9828b	tests: k8s: Update bats We've seen some issues with tests not being run in some of the Coco CI jobs (Issue #10451) and in the environments that are more stable we noticed that they had a newer version of bats installed. Try updating the version to 1.10+ and print out the version for debug purposes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-04 09:59:33 +00:00
Steve Horsman	06d2cc7239	Merge pull request #10453 from bpradipt/remote-annotation runtime: Add GPU annotations for remote hypervisor	2024-11-04 09:10:06 +00:00
Zvonko Kaiser	3781526c94	gpu: Add VARIANT to the initrd and image build We need to know if we're building an NVIDIA initrd or image. Additionally, we need to know if we build a regular or confidential VARIANT. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-01 18:34:13 +00:00
Zvonko Kaiser	95b69c5732	build: initrd make it coherent to the image build Add -f for moving the initrd to the correct file path Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-01 18:34:13 +00:00
Zvonko Kaiser	3c29c1707d	image: Add suffix to image or initrd depending on the NVIDIA driver version Fixes: #9478 We want to keep track of the driver versions built during initrd/image build, so update the artifact_name after the fact. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-11-01 18:34:13 +00:00
Sumedh Alok Sharma	4b7aba5c57	agent-ctl: Add support to test kata-agent's container creation APIs. This commit introduces changes to enable testing kata-agent's container APIs of CreateContainer/StartContainer/RemoveContainer. The changeset includes: - using confidential-containers image-rs crate to pull/unpack/mount a container image. Currently supports only unauthenticated registry pull - re-factor api handlers to reduce cmdline complexity and handle request generation logic in tool - introduce an OCI config template for container creation - add test case Fixes #9707 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-11-01 22:18:54 +05:30
Fabiano Fidêncio	2efcb442f4	Merge pull request #10442 from Sumynwa/sumsharma/tools_use_ubuntu_static_build ci: Use ubuntu for static building of kata tools.	2024-11-01 16:04:31 +01:00
Gabriela Cervantes	1ca83f9d41	docs: Update virtualization document This PR updates the virtualization document by removing a url link which is no longer valid. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-31 17:28:02 +00:00
GabyCT	a3d594d526	Merge pull request #10480 from GabyCT/topic/fixstabilityrun gha: Add missing steps in Kata stability workflow	2024-10-31 09:57:33 -06:00
Fabiano Fidêncio	e058b92350	Merge pull request #10425 from burgerdev/darwin genpolicy: support darwin target	2024-10-31 12:16:44 +01:00
Markus Rudy	df5e6e65b5	protocols: only build RLimit impls on Linux The current version of the oci-spec crate compiles RLimit structs only for Linux and Solaris. Until this is fixed upstream, add compilation conditions to the type converters for the affected structs. Fixes: #10071 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-10-31 09:50:36 +01:00
Markus Rudy	091a410b96	kata-sys-util: move json parsing to protocols crate The parse_json_string function is specific to parsing capability strings out of ttRPC proto definitions and does not benefit from being available to other crates. Moving it into the protocols crate allows removing kata-sys-util as a dependency, which in turn enables compiling the library on darwin. Fixes: #10071 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-10-31 09:41:07 +01:00
Markus Rudy	8ab4bd2bfc	kata-sys-util: remove obsolete cgroups dependency The cgroups.rs source file was removed in `234d7bca04`. With cgroups support handled in runtime-rs, the cgroups dependency on kata-sys-util can be removed. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-10-31 09:41:07 +01:00
Sumedh Alok Sharma	0adf7a66c3	ci: Use ubuntu for static building of kata tools. This commit introduces changes to use ubuntu for statically building kata tools. In the existing CI setup, the tools currently build only for x86_64 architecture. It also fixes the build error seen for agent-ctl PR#10395. Fixes #10441 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-10-31 13:19:18 +05:30
Gabriela Cervantes	c4089df9d2	gha: Add missing steps in Kata stability workflow This PR adds missing steps in the gha run script for the kata stability workflow. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-30 19:13:15 +00:00
Xuewei Niu	1a216fecdf	Merge pull request #10225 from Chasing1020/main runtime-rs: Add basic boilerplate for remote hypervisor	2024-10-30 17:02:50 +08:00
Hyounggyu Choi	dca69296ae	Merge pull request #10476 from BbolroC/switch-to-kubeadm-s390x gha: Switch KUBERNETES from k3s to kubeadm on s390x	2024-10-30 09:52:06 +01:00
GabyCT	9293931414	Merge pull request #10474 from GabyCT/topic/removeunvarb packaging: Remove kernel config repo variable as it is unused	2024-10-29 12:52:07 -06:00
Gabriela Cervantes	69ee287e50	packaging: Remove kernel config repo variable as it is unused This PR removes the kernel config repo variable at the build kernel script as it is not used. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-29 17:09:52 +00:00
GabyCT	8539cd361a	Merge pull request #10462 from GabyCT/topic/increstress tests: Increase time to run stressng k8s tests	2024-10-29 11:08:47 -06:00
Chasing1020	425f6ad4e6	runtime-rs: add oci spec for prepare_vm method The cloud-api-adaptor needs to support different types of pod VM instances. We need to pass some annotations like machine_type, default_vcpus and default_memory to prepare the VMs. Signed-off-by: Chasing1020 <643601464@qq.com>	2024-10-30 01:01:28 +08:00
Chasing1020	f1167645f3	runtime-rs: support for remote hypervisors type This patch adds support for the remote hypervisor type in runtime-rs. The cloud-api-adaptor needs the annotations and network namespace path to create the VMs. The remote hypervisor opens a UNIX domain socket specified in the config file, and sends ttrpc requests to an external process to control sandbox VMs. Fixes: #10350 Signed-off-by: Chasing1020 <643601464@qq.com>	2024-10-30 00:54:17 +08:00
Pradipta Banerjee	6f1ba007ed	runtime: Add GPU annotations for remote hypervisor Add GPU annotations for remote hypervisor to help with the right instance selection based on number of GPUs and model Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2024-10-29 10:28:21 -04:00
Steve Horsman	68225b53ca	Merge pull request #10475 from stevenhorsman/revert-10452 Revert "tests: Add trap statement in kata doc script"	2024-10-29 13:58:00 +00:00
Hyounggyu Choi	aeef28eec2	gha: Switch to kubeadm for run-k8s-tests-on-zvsi Last November, SUSE discontinued support for s390x, leaving k3s on this platform stuck at k8s version 1.28, while upstream k8s has since reached 1.31. Fortunately, kubeadm allows us to create a 1.30 Kubernetes cluster on s390x. This commit switches the KUBERNETES option from k3s to kubeadm for s390x and removes a dedicated cluster creation step. Now, cluster setup and teardown occur in ACTIONS_RUNNER_HOOK_JOB_{STARTED,COMPLETED}. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-10-29 14:27:32 +01:00
Hyounggyu Choi	238f67005f	tests: Add `kubeadm` option for KUBERNETES in gha-run.sh When creating a k8s cluster via kubeadm, the devmapper setup for containerd requires a different configuration. This commit introduces a new `kubeadm` option for the KUBERNETES variable and adjusts the path to the containerd config file for devmapper setup. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-10-29 14:19:42 +01:00
stevenhorsman	b1cffb4b09	Revert "tests: Add trap statement in kata doc script" This reverts commit `093a6fd542`, as it is breaking the static checks. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-10-29 09:57:18 +00:00
Aurélien Bombo	eb04caaf8f	Merge pull request #10074 from koct9i/log-vm-start-error runtime: log vm start error before cleanup	2024-10-28 14:39:00 -05:00
Fabiano Fidêncio	e675e233be	Merge pull request #10473 from fidencio/topic/build-cache-fix-shim-v2-root_hash.txt-location build: cache: Ensure shim-v2-root_hash.txt is in "${workdir}"	2024-10-28 16:53:06 +01:00
Fabiano Fidêncio	f19c8cbd02	build: cache: Ensure shim-v2-root_hash.txt is in "${workdir}" All the oras push logic happens from inside `${workdir}`, while the root_hash.txt extraction and renaming was not taking this into consideration. This was not caught during the manually triggered runs as those do not perform the oras push. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 15:17:16 +01:00
Steve Horsman	51bc71b8d9	Merge pull request #10466 from kata-containers/topic/ensure-shim-v2-sets-the-measured-rootfs-parameters-to-the-config re-enable measured rootfs build & tests	2024-10-28 13:11:50 +00:00
Fabiano Fidêncio	b70d7c1aac	tests: Enable measured rootfs tests for qemu-coco-dev Then it's on par with what's being tested with TEEs using a rootfs image. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:54 +01:00
Fabiano Fidêncio	d23d057ac7	runtime: Enable measured rootfs for qemu-coco-dev Let's make sure we are prepared to test this with non-TEE environments as well. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	7d202fc173	tests: Re-enable measured_rootfs test for TDX As we're now building everything needed to test TDX with measured rootfs support, let's bring this test back in (for TDX only, at least for now). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	d537932e66	build: shim-v2: Ensure MEASURED_ROOTFS is exported The approach taken for now is to export MEASURED_ROOTFS=yes on the workflow files for the architectures using confidential stuff, and leave the "normal" build without having it set (to avoid any change of expectation on the current behaviour). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	9c8b20b2bf	build: shim-v2: Rebuild if root_hashes do not match Let's make sure we take the root_hashes into consideration to decide whether the shim-v2 should or should not be used from the cached artefacts. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	9c84998de9	build: cache: Cache root_hash.txt used by the shim-v2 Let's cache the root_hash.txt from the confidential image so we can use them later on to decide whether there was a rootfs change that would require shim-v2 to be rebuilt. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	d2d9792720	build: Don't leave cached component behind if it can't be used Let's ensure we remove the component and any extra tarball provided by ORAS in case the cached component cannot be used. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	ef29824db9	runtime: Don't do measured rootfs for "vanilla" kernel We may decide to add this later on, but for now this is only targeting TEEs and the confidential image / initrd. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	a65946bcb0	workflows: build: Ensure rootfs is present for shim-v2 build Let's ensure that we get the already built rootfs tarball from previous steps of the action at the time we're building the shim-v2. The reason we do that is because the rootfs binary tarball has a root_hash.txt file that contains the information needed by the shim-v2 build scripts to add the measured rootfs arguments to the shim-v2 configuration files. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	6ea0369878	workflows: build: Ensure rootfs is built before shim-v2 As the rootfs will have what we need to add as part of the shim-v2 configuration files for measured rootfs, we must ensure this is built before shim-v2. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	13ea082531	workflows: Build rootfs after its deps are built By doing this we can just re-use the dependencies already built, saving us a reasonable amount of time. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:53 +01:00
Fabiano Fidêncio	eb07a809ce	tests: Add a helper script to use prebuild components This is a helper script that does basically what's already being done by the s390x CI, which is: * Move a folder with the components that were stored / downloaded during the GHA execution to the expected `build` location * Get rid of the dependencies for a specific asset, as the dependencies are already pulled in from previous GHA steps For now this script is only being added but not yet executed anywhere, and that will come as the next step in this series. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:52 +01:00
Fabiano Fidêncio	c2b18f9660	workflows: Store rootfs dependencies So far we haven't been storing the rootfs dependencies as part of our workflows, but we better do it to re-use them as part of the rootfs build. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 12:43:52 +01:00
Steve Horsman	b5f503b0b5	Merge pull request #10471 from fidencio/topic/possibly-fix-release-workflow workflows: Possibly fix the release workflow	2024-10-28 11:38:33 +00:00
Konstantin Khlebnikov	ee50582848	runtime: log vm start error before cleanup Return of proper error to the initiator is not guaranteed. Method StopVM could kill shim process together with VM pieces. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>	2024-10-28 11:21:21 +01:00
Fabiano Fidêncio	a8fad6893a	workflows: Possibly fix the release workflow The only reason we had this one passing for amd64 is because the check was done using the wrong variable (`matrix.stage`, while in the other workflows the variable used is `inputs.stage`). The commit that broke the release process is `67a8665f51`, which blindly copy & pasted the logic from the matrix assets. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-28 11:15:53 +01:00
Steve Horsman	ad5749fd6b	Merge pull request #10467 from stevenhorsman/release-3.10.1 release: Bump version to 3.10.1	2024-10-25 20:19:23 +01:00
stevenhorsman	b22d4429fb	release: Bump version to 3.10.1 Fix release to pick up #10463 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-10-25 17:16:09 +01:00
Steve Horsman	19ac0b24f1	Merge pull request #10463 from skaegi/rustjail_filemode_perm_fix agent: Correct rustjail device filemode permission typo	2024-10-25 14:27:50 +01:00
Fabiano Fidêncio	cc815957c0	Merge pull request #10461 from kata-containers/topic/workflows-follow-up-on-manually-triggered-job workflows: devel: Follow-up on the manually triggered jobs	2024-10-25 08:31:14 +02:00
Simon Kaegi	322846b36f	agent: Correct rustjail device filemode permission typo Corrects device filemode permissions typo/regression in rustjail to `666` instead of `066`. `666` is the standard and expected value for these devices in containers. Fixes: #10454 Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>	2024-10-24 16:46:40 -04:00
GabyCT	a9af46ccd2	Merge pull request #10452 from GabyCT/topic/katadoctemp tests: Add trap statement in kata doc script	2024-10-24 13:21:11 -06:00
Gabriela Cervantes	a3ef8c0a16	tests: Increase time to run stressng k8s tests This PR increases the time to run the stressng k8s tests for the CoCo stability CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-24 16:34:17 +00:00
Fabiano Fidêncio	475ad3e06b	workflows: devel: Allow running more than one at once More than one developer can and should be able to run this workflow at the same time, without cancelling the job started by another developer. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-24 15:38:35 +02:00
Fabiano Fidêncio	8f634ceb6b	workflows: devel: Adjust the pr-number Let's use "dev" instead of "manually-triggered" as it avoids the name being too long, which results in failures to create AKS clusters. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-24 15:38:31 +02:00
GabyCT	41d1178e4a	Merge pull request #10438 from GabyCT/topic/fixspellreadme docs: Fix misspelling in CI documentation	2024-10-23 13:34:52 -06:00
Steve Horsman	c5c389f473	Merge pull request #10449 from kata-containers/topic/add-workflows-specifically-for-testing Add a specific workflow for testing the CI, without messing up with the "nightly" weather	2024-10-23 19:03:49 +01:00
Gabriela Cervantes	093a6fd542	tests: Add trap statement in kata doc script This PR adds the trap statement into the kata doc script to clean up properly the temporary files. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-23 15:56:58 +00:00
Gabriela Cervantes	701891312e	docs: Fix misspelling in CI documentation This PR fixes a misspelling in CI documentation readme. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-23 15:42:08 +00:00
Fabiano Fidêncio	829415dfda	workflows: Remove the possibility to manually trigger the nightly CI As a new workflow was added for the cases where developers want to test their changes in the workflow itself, let's make sure we stop allowing manual triggers on this workflow, which can lead to a polluted / misleading weather of the CI. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-23 13:19:45 +02:00
Fabiano Fidêncio	cc093cdfdb	workflows: Add a manually trigger "devel" workflow for the CI This workflow is intended to replace the `workflow_dispatch` trigger currently present as part of the `ci-nightly.yaml`. The reasoning behind having this done in this way is because of our good and old GHA behaviour for `pull_request_target`, which requires a PR to be merged in order to check the changes in the workflow itself, which leads to: * when a change in a workflow is done, developers (should) do: * push their branch to the kata-containers repo * manually trigger the "nightly" CI in order to ensure the changes don't break anything * this can result in the "nightly" CI weather being polluted * we don't have the guarantee / assurance about the last n nightly runs anymore Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-23 13:14:50 +02:00
Greg Kurz	378f454fb9	Merge pull request #10208 from wtootw/main runtime: Failed to clean up resources when QEMU is terminated	2024-10-23 12:11:57 +02:00
Fabiano Fidêncio	ca416d8837	Merge pull request #10446 from kata-containers/topic/re-work-shim-v2-build-as-part-of-the-ci-and-release workflows: Ensure shim-v2 is built as the last asset	2024-10-23 09:27:29 +02:00
Fabiano Fidêncio	c082b99652	Merge pull request #10439 from microsoft/mahuber/azl-cfg-var tools: Change PACKAGES var for cbl-mariner	2024-10-23 08:39:49 +02:00
Manuel Huber	a730cef9cf	tools: Change PACKAGES var for cbl-mariner Change the PACKAGES variable for the cbl-mariner rootfs-builder to use the kata-packages-uvm meta package from packages.microsoft.com to define the set of packages to be contained in the UVM. This aligns the UVM build for the Azure Linux distribution with the UVM build done for the Kata Containers offering on Azure Kubernetes Services (AKS). Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2024-10-22 23:11:42 +00:00
Fabiano Fidêncio	67a8665f51	workflows: Ensure shim-v2 is built as the last asset By doing this we can ensure that whenever the rootfs changes, we'll be able to get the new root_hash.txt and use it. This is the very first step to bring the measured rootfs tests back. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-22 14:56:37 +02:00
Greg Kurz	3de6d09a86	Merge pull request #10443 from gkurz/release-3.10.0 release: Bump VERSION to 3.10.0	2024-10-22 14:46:30 +02:00
Greg Kurz	3037303e09	release: Bump VERSION to 3.10.0 Let's start the 3.10.0 release. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-10-22 11:28:15 +02:00
wangyaqi54	cf4b81344d	runtime: Failed to clean up resources when QEMU is terminated by signal 15 When QEMU is terminated by signal 15, it deletes the PidFile. Upon detecting that QEMU has exited, the shim executes the stopVM function. If the PidFile is not found, the PID is set to 0. Subsequently, the shim executes `kill -9 0`, which terminates the current process group. This prevents any further logic from being executed, resulting in resources not being cleaned up. Signed-off-by: wangyaqi54 <wangyaqi54@jd.com>	2024-10-22 17:04:46 +08:00
Fabiano Fidêncio	4c34cfb0ab	Merge pull request #10420 from pmores/add-support-for-virtio-scsi runtime-rs: support virtio-scsi device in qemu-rs	2024-10-22 11:00:33 +02:00
Pavel Mores	8cdd968092	runtime-rs: support virtio-scsi device in qemu-rs Semantics are lifted straight out of the go runtime for compatibility. We introduce DeviceVirtioScsi to represent a virtio-scsi device and instantiate it if block device driver in the configuration file is set to virtio-scsi. We also introduce ObjectIoThread which is instantiated if the configuration file additionally enables iothreads. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-22 08:55:54 +02:00
Greg Kurz	91b874f18c	Merge pull request #10421 from Apokleos/hostname-bugfix kata-agent: fixing bug of unable setting hostname correctly.	2024-10-22 00:26:51 +02:00
alex.lyn	b25538f670	ci: Introduce CI to validate pod hostname Fixes #10422 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2024-10-21 16:32:56 +01:00
alex.lyn	3dabe0f5f0	kata-agent: fixing bug of unable setting hostname correctly. When update_container_namespaces updated the namespaces, it set all UTS (and IPC) namespace paths to None, which made hostnames set prior to the update ineffective. This was primarily due to an error made while aligning with the OCI spec: in an attempt to match empty strings with None values in oci-spec-rs, all paths were incorrectly set to None. Fixes #10325 Signed-off-by: alex.lyn <alex.lyn@antgroup.com>	2024-10-21 16:32:56 +01:00
Steve Horsman	98886a7571	Merge pull request #10437 from mkulke/mkulke/dont-parse-oci-image-for-cached-artifacts ci: don't parse oci image for cached artifacts	2024-10-21 16:31:23 +01:00
Magnus Kulke	e27d70d47e	ci: don't parse oci image for cached artifacts Moved the parsing of the oci image marker into its own step, since we only need to perform that for attestation purposes and some cached images might not have that file in the tarball. Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>	2024-10-21 14:50:00 +02:00
Magnus Kulke	9a33a3413b	Merge pull request #10433 from mkulke/mkulke/add-provenance-attestation-for-agent-builds ci: add provenance attestation for agent artifact	2024-10-18 15:00:18 +02:00
Anastassios Nanos	68d539f5c5	Merge pull request #10435 from nubificus/fix_fc_machineconfig runtime-rs: Use vCPU and memory values from config	2024-10-18 13:41:20 +01:00
Magnus Kulke	b93f5390ce	ci: add provenance attestation for agent artifact This adds provenance attestation logic for agent binaries that are published to an oci registry via ORAS. As a downstream consumer of the kata-agent binary the Peerpod project needs to verify that the artifact has been built on kata's CI. To create an attestation we need to know the exact digest of the oci artifact, at the point when the artifact was pushed. Therefore we record the full oci image as returned by oras push. The pushing and tagging logic has been slightly reworked to make this task less repetitive. The oras cli accepts multiple tags separated by comma on pushes, so a push can be performed atomically instead of iterating through tags and pushing each individually. This removes the risk of partially successful push operations (think: rate limits on the oci registry). So far the provenance creation has been only enabled for agent builds on amd64 and s390x. Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>	2024-10-18 10:24:00 +02:00
Anastassios Nanos	23f5786cca	runtime-rs: Use vCPU and memory values from config Use values from the config for the setup of the microVM. Fixes: #10434 Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2024-10-17 23:17:02 +01:00
GabyCT	4ae9317675	Merge pull request #10430 from GabyCT/topic/ciaz docs: Update CI documentation	2024-10-17 15:09:24 -06:00
GabyCT	b00203ba9b	Merge pull request #10428 from GabyCT/topic/archk8sc gha: Use a arch_to_golang variable to have uniformity	2024-10-17 11:00:59 -06:00
Chengyu Zhu	cca77f0911	Merge pull request #10412 from stevenhorsman/agent-config-rstest agent: config: Use rstest for unit tests	2024-10-17 23:01:21 +08:00
Gabriela Cervantes	e3efad8ed2	docs: Update CI documentation This PR updates the CI documentation to describe the various tests and the kinds of instances they run on. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-16 19:23:19 +00:00
stevenhorsman	4adb454ed0	agent: config: Use rstest for unit tests Use rstest for unit test rather than TestData arrays where possible to make the code more compact, easier to read and open the possibility to enhance test cases with a description more easily. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-10-16 16:55:44 +01:00
Gabriela Cervantes	f0e0c74fd4	gha: Use a arch_to_golang variable to have uniformity This PR replaces the use of `uname -m` with the arch_to_golang variable for better uniformity across the script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-15 20:03:09 +00:00
Dan Mihai	69509eff33	Merge pull request #10417 from microsoft/danmihai1/k8s-inotify.bats tests: k8s-inotify.bats improvements	2024-10-15 11:22:53 -07:00
Dan Mihai	ece0f9690e	tests: k8s-inotify: longer pod termination timeout inotify-configmap-pod.yaml is using: "inotifywait --timeout 120", so wait for up to 180 seconds for the pod termination to be reported. Hopefully, some of the sporadic errors from #10413 will be avoided this way: not ok 1 configmap update works, and preserves symlinks waitForProcess "${wait_time}" "$sleep_time" "${command}" failed Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-15 16:01:25 +00:00
Dan Mihai	ccfb7faa1b	tests: k8s-inotify.bats: don't leak configmap Delete the configmap if the test failed, not just on the successful path. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-15 16:01:25 +00:00
Aurélien Bombo	f13d13c8fa	Merge pull request #10416 from microsoft/danmihai1/mariner_static_sandbox_resource_mgmt ci: static_sandbox_resource_mgmt for cbl-mariner	2024-10-15 10:40:17 -05:00
Aurélien Bombo	c371b4e1ce	Merge pull request #10426 from 3u13r/fix/genpolicy/handle-config-map-binary-data genpolicy: read binaryData value as String	2024-10-14 21:31:23 -05:00
Leonard Cohnen	c06bf2e3bb	genpolicy: read binaryData value as String While Kubernetes defines `binaryData` as `[]byte`, when defined in a YAML file the raw bytes are base64 encoded. Therefore, we need to read the YAML value as `String` and not as `Vec<u8>`. Fixes: #10410 Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2024-10-14 20:03:11 +02:00
Aurélien Bombo	f9b7a8a23c	Merge pull request #10402 from Sumynwa/sumsharma/agent-ctl-dependencies ci: Install build dependencies for building agent-ctl with image pull.	2024-10-14 10:28:32 -05:00
Sumedh Alok Sharma	bc195d758a	ci: Install build dependencies for building agent-ctl with image pull. Adds dependencies of 'clang' & 'protobuf' to be installed in runners when building agent-ctl sources having image pull support. Fixes #10400 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-10-14 10:36:04 +05:30
Aurélien Bombo	614e21ccfb	Merge pull request #10415 from GabyCT/topic/egreptim tools/osbuilder/tests: Remove egrep in test images script	2024-10-11 13:47:30 -05:00
Gabriela Cervantes	aae654be80	tools/osbuilder/tests: Remove egrep in test images script This PR removes the egrep command, which has been deprecated, and replaces it with grep in the test images script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-11 17:23:35 +00:00
Dan Mihai	3622b5e8b4	ci: static_sandbox_resource_mgmt for cbl-mariner Use the configuration used by AKS (static_sandbox_resource_mgmt=true) for CI testing on Mariner hosts. Hopefully pod startup will become more predictable on these hosts - e.g., by avoiding the occasional hotplug timeouts described by #10413. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-10 22:17:39 +00:00
Fabiano Fidêncio	02f5fd94bd	Merge pull request #10409 from fidencio/topic/ci-add-ita_image-and-ita_image_tag kbs: ita: Ensure the proper image / image_tag is used for ITA	2024-10-10 11:46:26 +02:00
Fabiano Fidêncio	cf5d3ed0d4	kbs: ita: Ensure the proper image / image_tag is used for ITA When dealing with a specific release, it was easier to just do some adjustments on the image that has to be used for ITA without actually adding a new entry in the versions.yaml. However, it's been proven to be more complicated than that when it comes to dealing with staged images, and we better explicitly add (and update) those versions altogether to avoid CI issues. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-10 10:01:33 +02:00
Steve Horsman	0c4a7c8771	Merge pull request #10406 from ChengyuZhu6/fix-unit agent:cdh: fix unit tests about sealed secret	2024-10-10 08:57:28 +01:00
Fabiano Fidêncio	3f7ce1d620	Merge pull request #10401 from stevenhorsman/kbs-deploy-overlays-update Kbs deploy overlays update	2024-10-10 09:50:19 +02:00
Fabiano Fidêncio	036b04094e	Merge pull request #10397 from fidencio/topic/build-remove-initrd-mariner-target build: mariner: Remove the ability to build the marine initrd	2024-10-10 09:44:36 +02:00
ChengyuZhu6	65ecac5777	agent:cdh: fix unit tests about sealed secret The root cause is that the CDH client is a global variable, and unit tests `test_unseal_env` and `test_unseal_file` share this lock-free global variable, leading to resource contention and destruction. Merging the two unit tests into one test_sealed_secret will resolve this issue. Fixes: #10403 Signed-off-by: ChengyuZhu6 <zhucy0405@gmail.com>	2024-10-10 08:38:06 +08:00
ChengyuZhu6	a992feb7f3	Revert "Revert "agent:cdh: unittest for sealed secret as file"" This reverts commit `b5142c94b9`. Signed-off-by: ChengyuZhu6 <zhucy0405@gmail.com>	2024-10-10 08:37:06 +08:00
GabyCT	0cda92c6d8	Merge pull request #10407 from GabyCT/topic/fixbuildk packaging: Remove unused variable in build kernel script	2024-10-09 16:53:45 -06:00
Gabriela Cervantes	616eb8b19b	packaging: Remove unused variable in build kernel script This PR removes an unused variable in the build kernel script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-09 20:02:56 +00:00
Fabiano Fidêncio	652ba30d4a	build: mariner: Remove the ability to build the marine initrd As mariner has switched to using an image instead of an initrd, let's just drop the ability to build the initrd and avoid keeping something in the tree that won't be used. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-09 21:42:55 +02:00
Fabiano Fidêncio	59e3ab07e4	Merge pull request #10396 from fidencio/topic/ci-mariner-test-using-mariner-image-instead-of-initrd ci: mariner: Use the image instead of the initrd	2024-10-09 21:39:44 +02:00
stevenhorsman	b2fb19f8f8	versions: Bump KBS version Bump to the commit that had the overlays changes we want to adapt to. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-10-09 17:49:21 +01:00
Fabiano Fidêncio	01a957f7e1	ci: mariner: Stop building mariner initrd As the mariner image is already in place, and the tests were modified to use them (as part of this series), let's just stop building it as part of the CI. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-09 18:23:35 +02:00
Fabiano Fidêncio	091ad2a1b2	ci: mariner: Ensure kernel_params can be set The reason we're doing this is because mariner image uses, by default, cgroups default-hierarchy as `unified` (aka, cgroupsv2). In order to keep the same initrd behaviour for mariner, let's enforce that `SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 systemd.legacy_systemd_cgroup_controller=yes systemd.unified_cgroup_hierarchy=0` is passed to the kernel cmdline, at least for now. Other tests that are setting `kernel_params` are not running on mariner, then we're safe taking this path as it's done as part of this PR. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-09 18:23:35 +02:00
Fabiano Fidêncio	3bbf3c81c2	ci: mariner: Use the image instead of the initrd As an image has been added for mariner as part of the commit `63c1f81c2`, let's start using it in the CI, instead of using the initrd. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-09 18:23:32 +02:00
Fabiano Fidêncio	9c0c159b25	Merge pull request #10404 from fidencio/topic/rever-sealed-secrets-tests Revert "agent:cdh: unittest for sealed secret as file"	2024-10-09 18:09:09 +02:00
GabyCT	2035d638df	Merge pull request #10388 from GabyCT/topic/testimtemp tools/osbuilder/tests: Add trap statement in test images script	2024-10-09 09:49:45 -06:00
Fabiano Fidêncio	b5142c94b9	Revert "agent:cdh: unittest for sealed secret as file" This reverts commit `31e09058af`, as it's breaking the agent unit tests CI. This is a stop gap till Chengyu Zhu finds the time to properly address the issue, avoiding the CI to be blocked for now. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-09 16:06:09 +02:00
stevenhorsman	8763880e93	tests/k8s: kbs: Update overlays logic In https://github.com/confidential-containers/trustee/pull/521 the overlays logic was modified to add non-SE s390x support and simplify non-ibm-se platforms. We need to update the logic in `kbs_k8s_deploy` to match and can remove the dummying of `IBM_SE_CREDS_DIR` for non-SE now Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-10-09 09:39:41 +01:00
Gabriela Cervantes	e08749ce58	tools/osbuilder/tests: Add trap statement in test images script This PR adds the trap statement in the test images script to clean up tmp files. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-08 19:54:23 +00:00
Fabiano Fidêncio	80196c06ad	Merge pull request #10390 from microsoft/danmihai1/new-rootfs-image-mariner local-build: add ability to build rootfs-image-mariner	2024-10-08 21:40:43 +02:00
Fabiano Fidêncio	083b2f24d8	Merge pull request #10363 from ChengyuZhu6/secret-as-volume Support Confidential Sealed Secrets (as volume)	2024-10-08 19:23:40 +02:00
Dan Mihai	63c1f81c23	local-build: add rootfs-image-mariner Kata CI will start testing the new rootfs-image-mariner instead of the older rootfs-initrd-mariner image. The "official" AKS images are moving from a rootfs-initrd-mariner format to the rootfs-image-mariner format. Making the same change in Kata CI is useful to keep this testing in sync with the AKS settings. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-08 17:15:56 +00:00
GabyCT	7a38cce73c	Merge pull request #10383 from kata-containers/topic/imagevar image-builder: Remove unused variable	2024-10-08 10:27:03 -06:00
Aurélien Bombo	e56af7a370	Merge pull request #10389 from emanuellima1/fix-agent-policy build: Fix RPM build fail due to AGENT_POLICY	2024-10-08 09:59:21 -05:00
ChengyuZhu6	a94024aedc	tests: add test for sealed file secrets add a test for sealed file secrets. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-10-08 16:01:48 +08:00
ChengyuZhu6	fe307303c8	agent:rpc: Refactor CDH-related operations Refactor CDH-related operations into the cdh_handler function to make the `create_container` code clearer. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-10-08 16:01:48 +08:00
ChengyuZhu6	31e09058af	agent:cdh: unittest for sealed secret as file add unittest for sealed secret as file. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-10-08 16:01:48 +08:00
ChengyuZhu6	974d6b0736	agent:cdh: initialize cdhclient with the input cdh socket uri Refactor cdh code to initialize cdhclient with the input cdh socket uri. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-10-08 14:58:07 +08:00
ChengyuZhu6	1f33fd4cd4	agent:rpc: handle the sealed secret in createcontainer Users must set the mount path to `/sealed/<path>` for kata agent to detect the sealed secret mount and handle it in createcontainer stage. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-10-08 14:58:07 +08:00
ChengyuZhu6	da281b4444	agent:cdh: support to unseal secret as file Introduced `unseal_file` function to unseal secrets as files: - Implemented logic to handle symlinks and regular files within the sealed secret directory. - For each entry, CDH is called to unseal the secret, the unsealed contents are written to a new file, and a symlink is created to replace the sealed symlink. Fixes: #8123 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-10-08 14:58:07 +08:00
Fabiano Fidêncio	71d0c46e0a	Merge pull request #10384 from microsoft/danmihai1/virtio-fs-policy tests: k8s: AUTO_GENERATE_POLICY=yes for local testing	2024-10-07 21:25:52 +02:00
Emanuel Lima	e989e7ee4e	build: Fix RPM build fail due to AGENT_POLICY By checking for AGENT_POLICY we ensure we only try to read allow-all.rego if AGENT_POLICY is set to "yes" Signed-off-by: Emanuel Lima <emlima@redhat.com>	2024-10-07 15:43:23 -03:00
Dan Mihai	6d5fc898b8	tests: k8s: AUTO_GENERATE_POLICY=yes for local testing The behavior of Kata CI doesn't change. For local testing using kubernetes/gha-run.sh and AUTO_GENERATE_POLICY=yes: 1. Before these changes users were forced to use: - SEV, SNP, or TDX guests, or - KATA_HOST_OS=cbl-mariner 2. After these changes users can also use other platforms that are configured with "shared_fs = virtio-fs" - e.g., - KATA_HOST_OS=ubuntu + KATA_HYPERVISOR=qemu Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-04 18:26:00 +00:00
Dan Mihai	5aaef8e6eb	Merge pull request #10376 from microsoft/danmihai1/auto-generate-just-for-ci gha: enable AUTO_GENERATE_POLICY where needed	2024-10-04 10:52:31 -07:00
Gabriela Cervantes	4cd737d9fd	image-builder: Remove unused variable This PR removes an unused variable in the image builder script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-04 15:56:28 +00:00
Greg Kurz	77c5db6267	Merge pull request #9637 from ldoktor/selective-ci CI: Select jobs by touched code	2024-10-04 11:29:05 +02:00
GabyCT	2d089d9695	Merge pull request #10381 from GabyCT/topic/archrootfs osbuilder: Remove duplicated arch variable definition	2024-10-03 14:48:08 -06:00
Wainer Moschetta	b9025462fb	Merge pull request #10134 from ldoktor/ci-sort-range ci.ocp: Sort images according to git	2024-10-03 15:08:41 -03:00
Chelsea Mafrica	9138f55757	Merge pull request #10375 from GabyCT/topic/mktempkbs k8s:kbs: Add trap statement to clean up tmp files	2024-10-03 12:32:30 -04:00
Gabriela Cervantes	d7c2b7d13c	osbuilder: Remove duplicated arch variable definition This PR removes duplicated arch variable definition in the rootfs script as this variable and its value is already defined at the top of the script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-03 16:22:27 +00:00
Greg Kurz	96336d141b	Merge pull request #10165 from pmores/add-network-device-hotplugging runtime-rs: add network device hotplugging to qemu-rs	2024-10-03 17:44:50 +02:00
Pavel Mores	23927d8a94	runtime-rs: plug in netdev hotplugging functionality and actually call it add_device() now checks if QEMU is running already by checking if we have a QMP connection. If we do a new function hotplug_device() is called which hotplugs the device if it's a network one. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:23:10 +02:00
Pavel Mores	ac393f6316	runtime-rs: implement netdev hotplugging for qemu-rs With the helpers from previous commit, the actual hotplugging implementation, though lengthy, is mostly just assembling a QMP command to hotplug the network device backend and then doing the same for the corresponding frontend. Note that hotplug_network_device() takes cmdline_generator types Netdev and DeviceVirtioNet. This is intentional and aims to take advantage of the similarity between parameter sets needed to coldplug and hotplug devices to reuse and simplify our code. To enable using the types from qmp, accessors were added as needed. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:20:02 +02:00
Pavel Mores	4eb7e2966c	runtime-rs: add netdev hotplugging helpers to qemu-rs Before adding network device hotplugging functionality itself we add a couple of helpers in a separate commit since their functionality is non-trivial. To hotplug a device we need a free PCI slot. We add find_free_slot() which can be called to obtain one. It looks for PCI bridges connected to the root bridge and looks for an unoccupied slot on each of them. The first found is returned to the caller. The algorithm explicitly doesn't support any more complex bridge hierarchies since those are never produced when coldplugging PCI bridges. Sending netdev queue and vhost file descriptors to QEMU is slightly involved and implemented in pass_fd(). The actual socket has to be passed in an SCM_RIGHTS socket control message (also called ancillary data, see man 3 cmsg) so we have to use the msghdr structure and sendmsg() call (see man 2 sendmsg) to send the message. Since qapi-rs doesn't support sending messages with ancillary data we have to do the sending sort of "under it", manually, by retrieving qapi-rs's socket and using it directly. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:15:31 +02:00
Pavel Mores	3f46dfcf2f	runtime-rs: don't treat NetworkConfig::index as unique in qemu-rs NetworkConfig::index has been used to generate an id for a network device backend. However, it turns out that it's not unique (it's always zero as confirmed by a comment at its definition) so it's not suitable to generate an id that needs to be unique. Use the host device name instead. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:12:37 +02:00
Pavel Mores	cda04fa539	runtime-rs: factor setup of network device out of QemuCmdLine Network device hotplugging will use the same infrastructure (Netdev, DeviceVirtioNet) as coldplugging, i.e. QemuCmdLine. To make the code of network device setup visible outside of QemuCmdLine we factor it out to a non-member function `get_network_device()` and make QemuCmdLine just delegate to it. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:03:32 +02:00
Pavel Mores	efc8e93bfe	runtime-rs: factor bus_type() out of QemuCmdLine The function takes a whole QemuCmdLine but only actually uses HypervisorConfig. We increase callability of the function by limiting its interface to what it needs. This will come in handy shortly. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:03:32 +02:00
Pavel Mores	720265c2d8	runtime-rs: support adding PCI bridges to qemu VM At least one PCI bridge is necessary to hotplug PCI devices. We only support PCI (at this point at least) since that's what the go runtime does (note that looking at the code in virtcontainers it might seem that other bus types are supported, however when the bridge objects are passed to govmm, all but PCI bridges are actually ignored). The entire logic of bridge setup is lifted from runtime-go for compatibility's sake. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-10-03 11:03:32 +02:00
Lukáš Doktor	63b6e8a215	ci: Ensure we check the latest workflow run in gatekeeper with multiple iterations/reruns we need to use the latest run of each workflow. For that we can use the "run_id" and only update results of the same or newer run_ids. To do that we need to store the "run_id". To avoid adding individual attributes this commit stores the full job object that contains the status, conclusion as well as other attributes of the individual jobs, which might come in handy in the future in exchange for slightly bigger memory overhead (still we only store the latest run of required jobs only). Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:10:45 +02:00
Lukáš Doktor	2ae090b44b	ci: Add extra gatekeeper debug output to stderr which might be useful to assess the amount of queries. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:08:35 +02:00
Lukáš Doktor	2440a39c50	ci: Check required lables before checking tests in gatekeeper some tests require certain labels before they are executed. When our PR is not labeled appropriately the gatekeeper detects skipped required tests and reports a failure. With this change we add "required-labeles" to the tests mapping and check the expected labels first informing the user about the missing labels before even checking the test statuses. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:08:35 +02:00
Lukáš Doktor	dd2878a9c8	ci: Unify character for separating items the test names are using `;` and regexps were designed to use `,` but during development simply joined the expressions by `\|`. This should work but might be confusing so let's go with the semi-colon separator everywhere. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:08:35 +02:00
Wainer dos Santos Moschetta	fdcfac0641	workflows/gatekeeper: export COMMIT_HASH variable The Github SHA of triggering PR should be exported in the environment so that gatekeeper can fetch the right workflows/jobs. Note: by default github will export GITHUB_SHA in the job's environment but that value cannot be used if the gatekeeper was triggered from a pull_request_target event, because the SHA corresponds to the push branch. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-10-03 09:08:35 +02:00
Wainer dos Santos Moschetta	4abfc11b4f	workflows/gatekeeper: configure concurrency properly This allows in-progress gatekeeper jobs to be cancelled. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com> Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:08:35 +02:00
Lukáš Doktor	5c1cea1601	ci: Select jobs by touched code to allow selective testing as well as selective list of required tests let's add a mapping of required jobs/tests in "skips.py" and a "gatekeeper" workflow that will ensure the expected required jobs were successful. Then we can only mark the "gatekeeper" as the required job and modify the logic to suit our needs. Fixes: #9237 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-03 09:08:33 +02:00
Dan Mihai	1a4928e710	gha: enable AUTO_GENERATE_POLICY where needed The behavior of Kata CI doesn't change. For local testing using kubernetes/gha-run.sh: 1. Before these changes: - AUTO_GENERATE_POLICY=yes was always used by the users of SEV, SNP, TDX, or KATA_HOST_OS=cbl-mariner. 2. After these changes: - Users of SEV, SNP, TDX, or KATA_HOST_OS=cbl-mariner must specify AUTO_GENERATE_POLICY=yes if they want to auto-generate policy. - These users have the option to test just using hard-coded policies (e.g., using the default policy built into the Guest rootfs) by using AUTO_GENERATE_POLICY=no. AUTO_GENERATE_POLICY=no is the default value of this env variable. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-02 23:20:33 +00:00
Gabriela Cervantes	973b8a1d8f	k8s:kbs: Add trap statement to clean up tmp files This PR adds the trap statement in the confidential kbs script to clean up temporary files and ensure we are not leaving them behind. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-10-02 19:59:08 +00:00
Steve Horsman	8412c09143	Merge pull request #10371 from fidencio/topic/k8s-tdx-re-enable-empty-dir-tests k8s: tests: Re-enable empty-dirs tests for TDX / coco-qemu-dev	2024-10-02 18:41:19 +01:00
Dan Mihai	9a8341f431	Merge pull request #10370 from microsoft/danmihai1/k8s-policy-rc tests: k8s-policy-rc: remove default UID from YAML	2024-10-02 09:32:17 -07:00
GabyCT	a1d380305c	Merge pull request #10369 from GabyCT/topic/egrepfastf metrics: Update fast footprint script to use grep	2024-10-02 10:10:12 -06:00
Fabiano Fidêncio	b3ed7830e4	k8s: tests: Re-enable empty-dirs tests for TDX / coco-qemu-dev The test is disabled for qemu-coco-dev / qemu-tdx, but it doesn't seem to actually be failing on those. Plus, it's passing on SEV / SNP, which means that we most likely missed re-enabling this one in the past. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-10-01 20:51:01 +02:00
Hyounggyu Choi	b179598fed	Merge pull request #10374 from BbolroC/skip-block-volume-qemu-runtime-rs tests: Skip k8s-block-volume.bats for qemu-runtime-rs	2024-10-01 19:45:10 +02:00
Lukáš Doktor	820e000f1c	ci.ocp: Sort images according to git The quay.io registry returns the tags sorted alphabetically and doesn't seem to provide a way to sort it by age. Let's use "git log" to get all changes between the commits and print all tags that were actually pushed. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-10-01 16:08:00 +02:00
Hyounggyu Choi	4ccf1f29f9	tests: Skip k8s-block-volume.bats for qemu-runtime-rs Currently, `qemu-runtime-rs` does not support `virtio-scsi`, which causes the `k8s-block-volume.bats` test to fail. We should skip this test until `virtio-scsi` is supported by the runtime. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-10-01 09:09:47 +02:00
Dan Mihai	3b24219310	tests: k8s-policy-rc: remove default UID from YAML The nginx container seems to error out when using UID=123. Depending on the timing between container initialization and "kubectl wait", the test might have gotten lucky and found the pod briefly in Ready state before nginx errored out. But on some of the nodes, the pod never got reported as Ready. Also, don't block in "kubectl wait --for=condition=Ready" when wrapping that command in a waitForProcess call, because waitForProcess is designed for short-lived commands. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-10-01 00:10:30 +00:00
Saul Paredes	94bc54f4d2	Merge pull request #10340 from microsoft/saulparedes/validate_create_sandbox_storages genpolicy: validate create sandbox storages	2024-09-30 14:24:56 -07:00
Aurélien Bombo	b49800633d	Merge pull request #7165 from sprt/k8s-block-volume-test tests: Add `k8s-block-volume` test to GHA CI	2024-09-30 13:26:18 -07:00
Dan Mihai	7fe44d3a3d	genpolicy: validate create sandbox storages Reject any unexpected values from the CreateSandboxRequest storages field. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-30 11:31:12 -07:00
Gabriela Cervantes	52ef092489	metrics: Update fast footprint script to use grep This PR updates the fast footprint script to remove the use of egrep as this command has been deprecated and change it to use grep command. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-30 17:43:08 +00:00
Aurélien Bombo	c037ac0e82	tests: Add k8s-block-volume test This imports the k8s-block-volume test from the tests repo and modifies it slightly to set up the host volume on the AKS host. This is a follow-up to #7132. Fixes: #7164 Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com> Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-09-30 10:58:30 -05:00
Alex Lyn	dfd0ca9bfe	Merge pull request #10312 from sidneychang/configurable-build-dragonball runtime-rs: Add Configurable Compilation for Dragonball in Runtime-rs	2024-09-29 22:33:54 +08:00
GabyCT	6a9e3ccddf	Merge pull request #10305 from GabyCT/topic/ita ci:tdx: Use an ITA key for TDX	2024-09-27 16:44:53 -06:00
Fabiano Fidêncio	66bcfe7369	k8s: kbs: Properly delete ita kustomization The ita kustomization for Trustee, as well as previously used one (DCAP), doesn't have a $(uname -m) directory after the deployment directory name. Let's follow the same logic used for the deploy-kbs script and clean those up accordingly. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-27 21:47:29 +02:00
Gabriela Cervantes	bafa527be0	ci: tdx: Test attestation with ITTS Intel Tiber Trust Services (formerly known as Intel Trust Authority) is Intel's own attestation service, and we want to take advantage of the TDX CI in order to ensure ITTS works as expected. In order to do so, let's replace the former method used (DCAP) to use ITTS instead. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-27 21:47:25 +02:00
GabyCT	36750b56f1	Merge pull request #10342 from GabyCT/topic/updevguide docs: Remove qemu information not longer valid	2024-09-27 11:15:11 -06:00
Fabiano Fidêncio	86b8c53d27	Merge pull request #10357 from fidencio/topic/add-ita-secret gha: Add ita_key as a github secret	2024-09-27 17:40:41 +02:00
Gabriela Cervantes	d91979d7fa	gha: Add ita_key as a github secret This PR adds ita_key as a github secret at the kata coco tests yaml workflow. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-27 17:15:22 +02:00
Xuewei Niu	ad0f2b2a55	Merge pull request #10219 from sidneychang/decouple-runtime-rs-from-dragonball runtime-rs: Port TAP implementation from dragonball	2024-09-27 11:17:55 +08:00
Xuewei Niu	11b1a72442	Merge pull request #10349 from lifupan/main_nsandboxapi sandbox: refactor the sandbox init process	2024-09-27 11:10:45 +08:00
Xuewei Niu	3911bd3108	Merge pull request #10351 from lifupan/main_agent agent: fix the issue of setup sandbox pidns	2024-09-27 10:49:47 +08:00
Fupan Li	f7bc627a86	sandbox: refactor the sandbox init process In order to support the sandbox API, introduce the sandbox_config struct and split the sandbox start stage from the init process. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-26 23:50:24 +08:00
Hyounggyu Choi	b1275bed1b	Merge pull request #10346 from BbolroC/minor-improvement-k8s-tests tests: Minor improvement k8s tests	2024-09-26 17:01:32 +02:00
Hyounggyu Choi	01d460ac63	tests: Add teardown_common() to tests_common.sh There are many similar or duplicated code patterns in `teardown()`. This commit consolidates them into a new function, `teardown_common()`, which is now called within `teardown()`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-26 13:56:36 +02:00
Hyounggyu Choi	e8d1feb25f	tests: Validate node name for exec_host() The current `exec_host()` accepts a given node name and creates a node debugger pod, even if the name is invalid. This could result in the creation of an unnecessary pending pod (since we are using nodeAffinity; if the given name does not match any actual node names, the pod won’t be scheduled), which wastes resources. This commit introduces validation for the node name to prevent this situation. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-26 13:20:50 +02:00
Xuewei Niu	3a7f9595b6	Merge pull request #10318 from lsc2001/ci-add-docker ci: Enable basic docker tests for runtime-rs	2024-09-26 17:41:09 +08:00
Xuewei Niu	cb5a2b30e9	Merge pull request #10293 from lsc2001/solve-docker-compatibility runtime-rs: Notify containerd when process exits	2024-09-26 14:51:20 +08:00
Sicheng Liu	e4733748aa	ci: Enable basic docker tests for runtime-rs This commit enables basic amd64 tests of docker for runtime-rs by adding vmm types "dragonball" and "cloud-hypervisor". Signed-off-by: Sicheng Liu <lsc2001@outlook.com>	2024-09-26 06:27:05 +00:00
Sicheng Liu	08eb5fc7ff	runtime-rs: Notify containerd when process exits Docker cannot exit normally after the container process exits when used with runtime-rs since it doesn't receive the exit event. This commit enables runtime-rs to send TaskExit to containerd after the process exits. Also, it moves "system_time_into" and "option_system_time_into" from crates/runtimes/common/src/types/trans_into_shim.rs to a new utility mod. Signed-off-by: Sicheng Liu <lsc2001@outlook.com>	2024-09-26 02:52:50 +00:00
Fupan Li	71afeccdf1	agent: fix the issue of setup sandbox pidns When the sandbox API is enabled, the pause container is not created, so the shared sandbox pidns should fall back to the first container's init process instead of returning an error here. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-26 10:21:25 +08:00
Xuewei Niu	857222af02	Merge pull request #10330 from lifupan/main_sandboxapi Some prepared work for sandbox api support	2024-09-26 09:47:47 +08:00
Hyounggyu Choi	caf3b19505	Merge pull request #10348 from BbolroC/delete-node-debugger-by-trap tests: Delete custom node debugger pod on EXIT	2024-09-25 23:39:43 +02:00
Hyounggyu Choi	57e8cbff6f	tests: Delete custom node debugger pod on EXIT It was observed that the custom node debugger pod is not cleaned up when a test times out. This commit ensures the pod is cleaned up by triggering the cleanup on EXIT, preventing any debugger pods from being left behind. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-25 20:36:05 +02:00
Fabiano Fidêncio	edf4ca4738	Merge pull request #10345 from ldoktor/kata-webhook ci: Reorder webhook deployment	2024-09-25 18:16:46 +02:00
Fabiano Fidêncio	09ed9c5c50	Merge pull request #10328 from BbolroC/improve-negative-tests tests: Improve k8s negative tests	2024-09-25 18:16:28 +02:00
Xuewei Niu	e1825c2ef3	Merge pull request #9977 from l8huang/dan-2-vfio runtime: add DAN support for VFIO network device in Go kata-runtime	2024-09-25 10:11:38 +08:00
Lei Huang	39b0e9aa8f	runtime: add DAN support for VFIO network device in Go kata-runtime When using network adapters that support SR-IOV, a VFIO device can be plugged into a guest VM and claimed as a network interface. This can significantly enhance network performance. Fixes: #9758 Signed-off-by: Lei Huang <leih@nvidia.com>	2024-09-24 09:53:28 -07:00
Hyounggyu Choi	c70588fafe	tests: Use custom-node-debugger pod With #10232 merged, we now have a persistent node debugger pod throughout the test. As a result, there’s no need to spawn another debugger pod using `kubectl debug`, which could lead to false negatives due to premature pod termination, as reported in #10081. This commit removes the `print_node_journal()` call that uses `kubectl debug` and instead uses `exec_host()` to capture the host journal. The `exec_host()` function is relocated to `tests/integration/kubernetes/lib.sh` to prevent cyclical dependencies between `tests_common.sh` and `lib.sh`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-24 17:25:24 +02:00
Lukáš Doktor	8355eee9f5	ci: Reorder webhook deployment in `b9d88f74ed` the `runtime_class` CM was added which overrides the one we previously set. Let's reorder our logic to first deploy webhook and then override the default CM in order to use the one we really want. Since we need to change dirs we also have to use realpath to ensure the files are located well. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-09-24 17:01:28 +02:00
Hyounggyu Choi	2c2941122c	tests: Fail fast in assert_pod_fail() `assert_pod_fail()` currently calls `k8s_create_pod()` to ensure that a pod does not become ready within the default 120s. However, this delays the test's completion even if an error message is detected earlier in the journal. This commit removes the use of `k8s_create_pod()` and modifies `assert_pod_fail()` to fail as soon as the pod enters a failed state. All failing pods end up in one of the following states: - CrashLoopBackOff - ImagePullBackOff The function now polls the pod's state every 5 seconds to check for these conditions. If the pod enters a failed state, the function immediately returns 0. If the pod does not reach a failed state within 120 seconds, it returns 1. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-24 16:09:20 +02:00
Gabriela Cervantes	6a8b137965	docs: Remove qemu information not longer valid This PR removes some qemu information which is no longer valid, as it refers to the tests repository and to kata 1.x. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-23 16:58:24 +00:00
Aurélien Bombo	e738054ddb	Merge pull request #10311 from pawelpros/pproskur/fixyq ci: don't require sudo for yq if already installed	2024-09-23 08:57:11 -07:00
Alex Lyn	6b94cc47a8	Merge pull request #10146 from Apokleos/intro-cdi Introduce cdi in runtime-rs	2024-09-23 21:45:42 +08:00
Alex Lyn	b8ba346e98	runtime-rs: Add test for container devices with CDI. Fixes #10145 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-09-23 17:20:22 +08:00
Steve Horsman	0e0cb24387	Merge pull request #10329 from Bickor/webhook-check tools.kata-webhook: Specify runtime class using configMap	2024-09-23 09:59:12 +01:00
Steve Horsman	6f0b3eb2f9	Merge pull request #10337 from stevenhorsman/update-release-process-post-3.9.0 doc: Update the release process	2024-09-23 09:55:57 +01:00
Hyounggyu Choi	8a893cd4ee	Merge pull request #10232 from BbolroC/fix-loop-device-for-exec_host tests: Fix loop device handling for exec_host()	2024-09-23 08:15:03 +02:00
Fupan Li	f1f5bef9ef	Merge pull request #10339 from lifupan/main_fix runtime-rs: fix the issue of using block_on	2024-09-23 09:28:40 +08:00
Fupan Li	52397ca2c1	sandbox: rename the task_service to service Rename the task_service to service so that it can work alongside the sandbox services added in the following commits. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-22 14:44:19 +08:00
Fupan Li	20b4be0225	runtime-rs: rename the Request/Response to TaskRequest/TaskResponse To distinguish them from the sandbox request/response, this commit renames the task request/response to TaskRequest/TaskResponse. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-22 14:44:11 +08:00
Fupan Li	ba94eed891	sandbox: fix the issue of hypervisor's wait_vm wait_vm is called before stop_vm and takes the reader lock, which blocks stop_vm from getting the writer lock and triggers a deadlock. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-22 14:44:03 +08:00
Fupan Li	fb27de3561	runtime-rs: fix the issue of using block_on block_on blocks the current thread, which prevents other async tasks from running on this worker thread, so change it to run the work as an async task instead. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-09-22 14:40:44 +08:00
Aurélien Bombo	79a3b4e2e5	Merge pull request #10335 from kata-containers/sprt/fix-kata-deploy-docs kata-deploy: clean up and fix docs for k0s	2024-09-20 13:33:14 -07:00
stevenhorsman	4f745f77cb	doc: Update the release process - Reflect the need to update the versions in the Helm Chart - Add the lock branch instruction - Add clarity about the permissions needed to complete tasks Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-20 19:04:33 +01:00
Aurélien Bombo	78c63c7951	kata-deploy: clean up and fix docs for k0s * Clarifies instructions for k0s. * Adds kata-deploy step for each cluster type. * Removes the old kata-deploy-stable step for vanilla k8s. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-09-20 11:59:40 -05:00
sidney chang	456e13db98	runtime-rs: Add Configurable Compilation for Dragonball in Runtime-rs rename DEFAULT_HYPERVISOR to HYPERVISOR in Makefile Fixes #10310 Signed-off-by: sidney chang <2190206983@qq.com>	2024-09-20 05:41:34 -07:00
sidneychang	b85a886694	runtime-rs: Add Configurable Compilation for Dragonball in Runtime-rs This PR introduces support for selectively compiling Dragonball in runtime-rs. By default, Dragonball will continue to be compiled into the containerd-shim-kata-v2 executable, but users now have the option to disable Dragonball compilation. Fixes #10310 Signed-off-by: sidney chang <2190206983@qq.com>	2024-09-20 05:38:59 -07:00
Hyounggyu Choi	2d6ac3d85d	tests: Re-enable guest-pull-image tests for qemu-coco-dev Now that the issue with handling loop devices has been resolved, this commit re-enables the guest-pull-image tests for `qemu-coco-dev`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-20 14:37:43 +02:00
Hyounggyu Choi	c6b86e88e4	tests: Increase timeouts for qemu-coco-dev in trusted image storage tests Timeouts occur (e.g. `create_container_timeout` and `wait_time`) when using qemu-coco-dev. This commit increases these timeouts for the trusted image storage test cases Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-20 14:37:43 +02:00
Hyounggyu Choi	9cff9271bc	tests: Run all commands in _loop_device() using exec_host() If the host running the tests is different from the host where the cluster is running, the _loop_device() functions do not work as expected because the device is created on the test host, while the cluster expects the device to be local. This commit ensures that all commands for the relevant functions are executed via exec_host() so that a device should be handled on a cluster node. Additionally, it modifies exec_host() to return the exit code of the last executed command because the existing logic with `kubectl debug` sometimes includes unexpected characters that are difficult to handle. `kubectl exec` appears to properly return the exit code for a given command to it. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-20 14:37:43 +02:00
Hyounggyu Choi	374b8d2534	tests: Create and delete node debugger pod only once Creating and deleting a node debugger pod for every `exec_host()` call is inefficient. This commit changes the test suite to create and delete the pod only once, globally. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-20 14:37:43 +02:00
Hyounggyu Choi	aedf14b244	tests: Mimic node debugger with full privileges This commit addresses an issue with handling loop devices via a node debugger due to restricted privileges. It runs a pod with full privileges, allowing it to mount the host root to `/host`, similar to the node debugger. This change enables us to run tests for trusted image storage using the `qemu-coco-dev` runtime class. Fixes: #10133 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-20 14:37:43 +02:00
Alex Lyn	63b25e8cb0	runtime-rs: Introduce cdi devices in container creation Fixes #10145 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-09-20 09:28:51 +08:00
Alex Lyn	03735d78ec	runtime-rs: add cdi devices definition and related methods Add cdi devices including ContainerDevice definition and annotation_container_device method to annotate vfio device in OCI Spec annotations which is inserted into Guest with its mapping of vendor-class and guest pci path. Fixes #10145 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-09-20 09:28:51 +08:00
Alex Lyn	020e3da9b9	runtime-rs: extend DeviceVendor with device class We need the vfio device's device, vendor and class properties, but we can only get the device and vendor properties, so extend DeviceVendor with the class. Fixes #10145 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-09-20 09:28:51 +08:00
Fabiano Fidêncio	77c844da12	Merge pull request #10239 from fidencio/topic/remove-acrn acrn: Drop support	2024-09-19 23:10:29 +02:00
GabyCT	6eef58dc3e	Merge pull request #10336 from GabyCT/topic/extendtimeout gha: Increase timeout to run k8s tests on TDX	2024-09-19 13:12:55 -06:00
Martin	b9d88f74ed	tools.kata-webhook: Specify runtime class using configMap The kata webhook requires a configmap to define what runtime class it should set for the newly created pods. Additionally, the configmap allows others to modify the default runtime class name we wish to set (in case the handler is kata but the name of the runtimeclass is different). Finally, this PR changes the webhook-check to compare the runtime of the newly created pod against the specific runtime class in the configmap; if said configmap doesn't exist, it will default to "kata". Signed-off-by: Martin <mheberling@microsoft.com>	2024-09-19 11:51:38 -07:00
Fabiano Fidêncio	51dade3382	docs: Fix spell checker tokio is not a valid word, it seems, so let's use `tokio`. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-19 20:25:21 +02:00
Gabriela Cervantes	49b3a0faa3	gha: Increase timeout to run k8s tests on TDX This PR increases the timeout to run k8s tests for Kata CoCo TDX to avoid the random failures of timeout. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-19 17:15:47 +00:00
Fabiano Fidêncio	31438dba79	docs: Fix qemu link Otherwise static checks will fail, as we woke up the dogs with changes on the same file. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-19 16:05:43 +02:00
Fabiano Fidêncio	fefcf7cfa4	acrn: Drop support As we don't have any CI, nor maintainer to keep ACRN code around, we better have it removed than give users the expectation that it should or would work at some point. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-19 16:05:43 +02:00
Fabiano Fidêncio	cdaaf708a1	Merge pull request #10334 from emanuellima1/bump-version release: Bump version to 3.9.0	2024-09-19 15:27:50 +02:00
Emanuel Lima	a6ee15c5c7	release: Bump VERSION to 3.9.0 Starting the v3.9.0 release Signed-off-by: Emanuel Lima <emlima@redhat.com>	2024-09-19 10:14:55 -03:00
Fabiano Fidêncio	e9593b53a4	Merge pull request #10234 from pmores/add-support-for-disabled-guest-selinux runtime-rs: add support for disabled guest selinux	2024-09-19 15:03:24 +02:00
Fabiano Fidêncio	4d11fecc2d	Merge pull request #10274 from ajaypvictor/remote_image-os_types runtime: Enable Image annotation for remote hypervisor	2024-09-19 13:39:20 +02:00
Fabiano Fidêncio	3d5f48e02e	Merge pull request #10283 from alexman-stripe/alexman-stripe/fix-kata-shim-not-reporting-inactive-file-cgroup-v2 shim: Fix memory usage reporting for cgroup v2	2024-09-19 12:50:36 +02:00
Pavel Mores	5e5eb9759f	runtime-rs: handle disabled guest selinux in virtiofsd This is just a port of functionality existing in the golang runtime. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-09-19 12:47:10 +02:00
Pavel Mores	8c92f3bfec	runtime-rs: enable/disable selinux in guest based on disable_guest_selinux This change technically affects the path for enabled guest selinux as well; however, since this is not implemented in runtime-rs anyway, nothing should break. When guest selinux support is added, this change will come in handy. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-09-19 12:47:10 +02:00
Pavel Mores	204ee21bc8	runtime-rs: handle disabled guest selinux in OCI spec If guest selinux is off the runtime has to ensure that container OCI spec contains no selinux labels for the container rootfs and process. Failure to do so causes kata agent to try and apply the labels which fails since selinux is not enabled in guest, which in turn causes container launch to fail. This is largely inspired by golang runtime() with a slight deviation in ordering of checks. This change simply checks the disable_guest_selinux config setting and if it's true it clears both rootfs and process label if necessary. Golang runtime, on the other hand, seems to first check if process label is non-empty and only then it checks the config setting, meaning that if process label is empty the rootfs label is not reset even if it's non-empty. Frankly, this looks like a potential bug though probably unlikely to manifest since it can be assumed that the labels are either both empty, or both non-empty. () `4fd4b02f2e/src/runtime/virtcontainers/kata_agent.go (L1005)` Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-09-19 12:47:10 +02:00
Pavel Mores	eb1227f47d	runtime-rs: parse the disable_guest_selinux config key In order to handle the setting we have to first parse it and make its value available to the rest of the program. The yes() function is added to comply with serde which seems to insist on default values being returned from functions. Long term, this is surely not the best place for this function to live, however given that this is currently the first and only place where it's used it seems appropriate to put it near its use. If it ends up being reused elsewhere a better place will surely emerge. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-09-19 12:47:10 +02:00
Steve Horsman	8789551fe6	Merge pull request #10333 from fidencio/topic/ci-bump-ubuntu-20.04-runners-to-22.04 ci: Bump ubuntu 20.04 runners to 22.04	2024-09-19 11:44:33 +01:00
Fabiano Fidêncio	35c7f8d1ba	ci: Bump ubuntu 20.04 runners to 22.04 Azure internal mirrors for Ubuntu 20.04 have gone awry, leading to a situation where dependencies cannot be installed (such as libdevmapper-dev), blocking then our CI. Let's bump the runners to 22.04 regardless, even knowing it'll cause an issue with the runk tests, as the agent check tests are considered more crucial to the project at this point. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-19 12:29:20 +02:00
Fabiano Fidêncio	eccdffebf7	Merge pull request #10243 from katexochen/nydus-overlayfs-path virtcontainers: allow specifying nydus-overlayfs binary by path	2024-09-19 11:35:45 +02:00
Ajay Victor	a19f2eacec	runtime: Enable ImageName annotation for remote hypervisor Enables ImageName to support multiple VM images in remote hypervisor scenario Fixes https://github.com/kata-containers/kata-containers/issues/10240 Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>	2024-09-19 14:48:46 +05:30
Alex Man	27f8f69195	shim: Fix memory usage reporting for cgroup v2 kata-shim was not reporting `inactive_file` in memory stat. This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage. [Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd deducts `inactive_file` from memory usage. Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2. This commit: - Adds code in kata-shim to report "inactive_file" memory for cgroup v2 - Implements reporting of all available cgroup v2 memory stats to containerd - Uses defensive coding to avoid assuming existence of any memory.stat fields The list of available cgroup v2 memory stats defined by containerd can be found [here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat). Fixes #10280 Signed-off-by: Alex Man <alexman@stripe.com>	2024-09-18 14:04:24 -07:00
Fabiano Fidêncio	1597f8ba00	Merge pull request #10279 from alexman-stripe/alexman-stripe/fix-cgroup-v2-wrong-cpu-usage-unit agent: Fix CPU usage reporting for cgroup v2 in kata-agent	2024-09-18 21:36:52 +02:00
Fabiano Fidêncio	593cbb8710	Merge pull request #10306 from microsoft/danmihai1/more-security-contexts genpolicy: get UID from PodSecurityContext	2024-09-18 21:33:39 +02:00
Aurélien Bombo	5402f2c637	Merge pull request #10308 from Sumynwa/sumsharma/add_setpolicy_agent_ctl agent-ctl: Add SetPolicy support	2024-09-18 10:09:07 -07:00
Pawel Proskurnicki	b63d49b34a	ci: don't require sudo for yq if already installed The yq installation shouldn't force the use of sudo when the correct version of yq is already installed. Signed-off-by: Pawel Proskurnicki <pawel.proskurnicki@intel.com>	2024-09-18 11:01:07 +02:00
Sumedh Alok Sharma	18c887f055	agent-ctl: Add SetPolicy support This patch adds support for calling the kata agent's SetPolicy API. It also adds tests for the SetPolicy API using agent-ctl. Fixes #9711 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-18 10:53:49 +05:30
GabyCT	28d430ec42	Merge pull request #10324 from GabyCT/topic/fixinlib ci: Fix indentation of install libseccomp script	2024-09-17 14:21:24 -06:00
Fabiano Fidêncio	da2377346d	Merge pull request #10323 from stevenhorsman/update-kubectl-release-url kata-deploy: Switch Kubernetes URL	2024-09-17 20:47:17 +02:00
Gabriela Cervantes	096f32cc52	ci: Fix indentation of install libseccomp script This PR fixes the indentation of the install libseccomp script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-17 16:38:53 +00:00
Aurélien Bombo	9d29ce460d	Merge pull request #10303 from Sumynwa/sumsharma/agent_policy_set_env agent: add support to provide default agent policy via env	2024-09-17 09:04:11 -07:00
stevenhorsman	c0d35a66aa	ci: kata-deploy: Update kubectil install URL The `deploy_k0s` and `deploy_k3s` kubectl installs aren't failing yet, but let's get ahead of this and bump them as well. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-17 15:35:42 +01:00
stevenhorsman	1abeffdac6	kata-deploy: Switch Kubernetes URL The payload build is failing with: ``` ERROR: failed to solve: process "/bin/sh -c apk --no-cache add bash curl && ARCH=$(uname -m) && if [ \"${ARCH}\" = \"x86_64\" ]; then ARCH=amd64; fi && if [ \"${ARCH}\" = \"aarch64\" ]; then ARCH=arm64; fi && DEBIAN_ARCH=${ARCH} && if [ \"${DEBIAN_ARCH}\" = \"ppc64le\" ]; then DEBIAN_ARCH=ppc64el; fi && curl -fL --progress-bar -o /usr/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/ \ $(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/${ARCH}/kubectl && chmod +x /usr/bin/kubectl && curl -fL --progress-bar -o /usr/bin/jq https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-${DEBIAN_ARCH} && chmod +x /usr/bin/jq && mkdir -p ${DESTINATION} && tar xvf ${WORKDIR}/${KATA_ARTIFACTS} -C ${DESTINATION} && rm -f ${WORKDIR}/${KATA_ARTIFACTS} && apk del curl && apk --no-cache add py3-pip && pip install --no-cache-dir yq==3.2.3" did not complete successfully: exit code: 22 ``` Looking into this, the problem is that https://storage.googleapis.com/kubernetes-release/release/v1.31.1/bin/linux/amd64/kubectl doesn't exist. The [kubectl install doc](https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-on-linux) recommends the `dl.k8s.io` site, so let's switch to this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-17 15:35:42 +01:00
Steve Horsman	5448f7fbbf	Merge pull request #10321 from BbolroC/fix-build-boot-image-se local-build: Fix unbound variable issue for lib_se.sh	2024-09-17 15:35:04 +01:00
Hyounggyu Choi	72471d1a18	local-build: Fix unbound variable for lib_se.sh As #10315 introduced an `unbound variable` error, this is a hot-fix for it. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-17 10:01:14 +02:00
Hyounggyu Choi	72df3004e8	gha: Rebase build-secure-image-se atop of latest target branch This commit adds a step called `Rebase atop of the latest target branch` to the job named `build-asset-boot-image-se` which can test the PR properly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-17 09:54:51 +02:00
Hyounggyu Choi	03cd02a006	Merge pull request #10315 from BbolroC/update-ibm-se-doc doc: Update how-to-run-kata-containers-with-SE-VMs.md	2024-09-16 15:12:18 +02:00
Sumedh Alok Sharma	cefba08903	agent: add support to provide default agent policy via env An agent built with the policy feature initializes the policy engine using a policy document from a default path, which is installed & linked during the UVM rootfs build. This commit adds support for providing a default agent policy as an environment variable. This targets development/testing scenarios where kata-agent is started as a local process. Fixes #10301 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-16 18:05:21 +05:30
Hyounggyu Choi	8d609e47fb	doc: Update how-to-run-kata-containers-with-SE-VMs.md The following changes have been made: - Remove unnecessary `sudo` - Add an error message where an incorrect host key document is used - Add a missing artifact `kernel-confidential-modules` - Make a variable `kernel_version` and replace it with relevant hits Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-16 12:53:30 +02:00
Fabiano Fidêncio	fc5a631791	Merge pull request #10009 from Xynnn007/feat-cosign Merge to main: supporting pull cosign signed images	2024-09-16 12:08:26 +02:00
stevenhorsman	aa9f21bd19	test: Add support for s390x in cosign testing We've added s390x test container images, so add support for using them based on the arch the test is running on Fixes: #10302 Signed-off-by: stevenhorsman <steven@uk.ibm.com> fixuop	2024-09-16 09:20:57 +01:00
stevenhorsman	3087ce17a6	tests: combined pod yaml creation for CoCo tests This commit brings some public parts of image pulling test series like encrypted image pulling, pulling images from authenticated registry and image verification. This would help to reduce the cost of maintenance. Co-authored-by: stevenhorsman <steven@uk.ibm.com> Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-16 09:20:57 +01:00
Xynnn007	c80c8d84c3	test: add cosign signature verificaton tests Close #8120 Case 1 Create a pod from an unsigned image, on an insecureAcceptAnything registry works. Image: quay.io/prometheus/busybox:latest Policy rule: ``` "default": [ { "type": "insecureAcceptAnything" } ] ``` Case 2 Create a pod from an unsigned image, on a 'restricted registry' is rejected. Image: ghcr.io/confidential-containers/test-container-image-rs:unsigned Policy rule: ``` "quay.io/confidential-containers/test-container-image-rs": [ { "type": "sigstoreSigned", "keyPath": "kbs:///default/cosign-public-key/test" } ] ``` Case 3 Create a pod from a signed image, on a 'restricted registry' is successful. Image: ghcr.io/confidential-containers/test-container-image-rs:cosign-signed Policy rule: ``` "ghcr.io/confidential-containers/test-container-image-rs": [ { "type": "sigstoreSigned", "keyPath": "kbs:///default/cosign-public-key/test" } ] ``` Case 4 Create a pod from a signed image, on a 'restricted registry', but with the wrong key is rejected Image: ghcr.io/confidential-containers/test-container-image-rs:cosign-signed-key2 Policy: ``` "ghcr.io/confidential-containers/test-container-image-rs": [ { "type": "sigstoreSigned", "keyPath": "kbs:///default/cosign-public-key/test" } ] ``` Case 5 Create a pod from an unsigned image, on a 'restricted registry' works if enable_signature_verification is false Image: ghcr.io/kata-containers/confidential-containers:unsigned image security enable: false Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-16 09:20:57 +01:00
Xynnn007	9606e7ac8b	agent: Set image-rs image security policy Add two parameters for enabling cosign signature image verification. - `enable_signature_verification`: to activate signature verification - `image_policy`: URI of the image policy config Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-09-16 09:20:57 +01:00
Xynnn007	653bc3973f	agent: fix make test for kata-agent of dependency anyhow A new version of the anyhow crate has changed how backtraces are captured, so kata-agent unit tests that compare a raised error with an expected one would fail. To fix this, we need only panics to have backtraces, so set `RUST_BACKTRACE=1` and `RUST_LIB_BACKTRACE=0` for tests, as documented at https://docs.rs/anyhow/latest/anyhow/ Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-09-16 09:20:57 +01:00
Fabiano Fidêncio	dfcb41b5cc	Merge pull request #10313 from stevenhorsman/coco-components-0.10-bump CoCo: Bump Coco components to 0.10 releases	2024-09-14 21:43:28 +02:00
stevenhorsman	705e469696	rootf: Change initrd alpine mirror The rootfs-initrd build is failing with: ``` fetch https://mirror.math.princeton.edu/pub/alpinelinux//v3.18/main/aarch64/APKINDEX.tar.gz 6684368:error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1914: ERROR: https://mirror.math.princeton.edu/pub/alpinelinux//v3.18/main: Permission denied ``` so try bumping to a newer version of alpine to see if that helps the issue Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-14 18:47:45 +02:00
Dan Mihai	5777869cf4	tests: k8s-policy-rc: add unexpected UID test Change pod runAsUser value of a Replication Controller after generating the RC's policy, and verify that the RC pods get rejected due to this change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-13 22:05:31 +00:00
Dan Mihai	6773f14667	tests: k8s-policy-job: add unexpected UID test Change pod runAsUser value of a Job after generating the Job's policy, and verify that the Job gets rejected due to this change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-13 22:05:31 +00:00
Dan Mihai	124f01beb3	tests: k8s-policy-deployment: add bad UID test Change pod runAsUser value of a Deployment after generating the Deployment's policy, and verify that the Deployment fails due to this change. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-13 22:05:31 +00:00
Dan Mihai	16f5ebf5f9	genpolicy: get UID from PodSecurityContext Get UID from PodSecurityContext for other k8s resource types too, not just for Pods. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-13 22:05:31 +00:00
Dan Mihai	5badc30a69	Merge pull request #10316 from microsoft/danmihai1/k8s-inotify tests: k8s-inotify: pod termination polling	2024-09-13 15:02:38 -07:00
GabyCT	6f363bba18	Merge pull request #10304 from GabyCT/topic/fixcricont tests: Fix indentation in the cri containerd tests	2024-09-13 14:49:12 -06:00
Dan Mihai	d3127af9c5	tests: k8s-inotify: pod termination polling Poll/wait for pod termination instead of sleeping 2 minutes. This change typically saves ~90 seconds in my test cluster. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-13 17:12:55 +00:00
sidney chang	5a7d0ed3ad	runtime-rs: introduce tap in hypervisor by extracting it from dragonball It's a prerequisite PR to make built-in vmm dragonball compilation options configurable. Extract TAP device-related code from dragonball's dbs_utils into a separate library within the runtime-rs hypervisor module. To enhance functionality and reduce dependencies, the extracted code has been reimplemented using the libc crate and the ifreq structure. Fixes #10182 Signed-off-by: sidney chang <2190206983@qq.com>	2024-09-13 07:32:14 -07:00
Fabiano Fidêncio	b09eba8c46	Merge pull request #10309 from BbolroC/helm-install-with-retry tests: Introduce retry mechanism for helm install	2024-09-13 15:08:46 +02:00
stevenhorsman	00e657cdb7	agent: image-rs: Update to v0.10.0 release Update image-rs to use the latest release of guest-components Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-09-13 13:29:54 +01:00
stevenhorsman	5e03890562	versions: Bump trustee and guest-components Bump to the v0.10.1 release of trustee and v0.10.0 release of guest-components Signed-off-by: stevenhorsman <steven@uk.ibm.com> fixup	2024-09-13 13:28:54 +01:00
Hyounggyu Choi	0aae847ae5	tests: Update secure boot image verification for IBM SE In the latest `s390-tools`, there has been an update on how to verify a secure boot image. A host key revocation list (CRL), which was optional, now becomes mandatory for verification. This commit updates the relevant scripts and documentation accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-13 14:14:02 +02:00
Hyounggyu Choi	4c933a5611	tests: Introduce retry mechanism for helm install Kata-deploy often fails due to a transiently unreachable k8s cluster for the qemu-coco-dev test on s390x. (e.g. https://github.com/kata-containers/kata-containers/actions/runs/10831142906/job/30058527098?pr=10009) This commit introduces a retry mechanism to mitigate these failures by retrying the command two more times with a 10-second interval as a workaround. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-13 14:03:44 +02:00
Dan Mihai	e937cb1ded	Merge pull request #10291 from microsoft/danmihai1/user-name-to-uid genpolicy: fix and re-enable create container UID verification	2024-09-12 15:47:59 -07:00
Dan Mihai	0c5ac042e7	tests: k8s-policy-pod: add workaround for #10297 If the CI platform being tested doesn't yet support the prometheus container image: - Use busybox instead of prometheus. - Skip the test cases that depend on the prometheus image. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-12 18:26:38 +00:00
Gabriela Cervantes	0346b32a90	tests: Fix indentation in the cri containerd tests This PR fixes the indentation in the cri containerd tests as we have in several places a misalignment in the script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-12 16:18:34 +00:00
Dan Mihai	94d95fc055	tests: k8s-policy-pod: test container UID changes Add test cases for changing container UID after generating the policy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-11 22:38:20 +00:00
Dan Mihai	db1ca4b665	tests: k8s-policy-pod: remove UID workaround Remove the workaround for #9928, now that genpolicy is able to convert user names from container images into the corresponding UIDs from these images. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-11 22:38:20 +00:00
Dan Mihai	d2d8d2e519	genpolicy: remove default UID/GID values Remove the recently added default UID/GID values, because the genpolicy design is to initialize those fields before this new code path gets executed. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-11 22:38:20 +00:00
Hernan Gatta	871476c3cb	genpolicy: pull UID:GID values from /etc/passwd Some container images are configured such that the user (and group) under which their entrypoint should run is not a number (or pair of numbers), but a user name. For example, in a Dockerfile, one might write: > USER 185 indicating that the entrypoint should run under UID=185. Some images, however, might have: > RUN groupadd --system --gid=185 spark > RUN useradd --system --uid=185 --gid=spark spark > ... > USER spark indicating that the UID:GID pair should be resolved at runtime via /etc/passwd. To handle such images correctly, read through all /etc/passwd files in all layers, find the latest version of it (i.e., the top-most layer with such a file), and, in so doing, ensure that whiteouts of this file are respected (i.e., if one layer adds the file and some subsequent layer removes it, don't use it). Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>	2024-09-11 22:38:20 +00:00
Hernan Gatta	f9249b4476	genpolicy: add tar dependency Used to read /etc/passwd from tar files. Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>	2024-09-11 22:38:20 +00:00
Dan Mihai	eb7f747df1	genpolicy: enable create container UID verification Disabling the UID Policy rule was a workaround for #9928. Re-enable that rule here and add a new test/CI temporary workaround for this issue. This new test workaround will be removed after fixing #9928. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-11 22:38:20 +00:00
Dan Mihai	71ede4ea3f	tests: k8s-policy-pod: use prometheus container Change quay.io/prometheus/busybox to quay.io/prometheus/prometheus in this test. The prometheus image will be helpful for testing the future fix for #9928 because it specifies user = "nobody". Also, change: sh -c "ls -l /" to: echo -n "readinessProbe with space characters" as the test readinessProbe command line. Both include a command line argument containing space characters, but "sh -c" behaves differently when using the prometheus container image (causes the readinessProbe to time out, etc.). Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-09-11 22:38:20 +00:00
GabyCT	614328f342	Merge pull request #10295 from GabyCT/topic/removeimgvar metrics: Remove unused remove img var in common script	2024-09-11 15:02:39 -07:00
GabyCT	095c5ed961	Merge pull request #10289 from GabyCT/topic/enablestresst tests: Enable stressng k8s stability test for Kata CoCo CI	2024-09-11 10:47:33 -07:00
Fabiano Fidêncio	97ecdabde9	Merge pull request #10294 from fidencio/topic/bring-ita-support Bump guest-components / trustee to a version that supports ITA	2024-09-11 19:45:48 +02:00
Gabriela Cervantes	fdaf12d16c	metrics: Remove unused remove img var in common script This PR removes the remove_img variable in the metrics common script as it is not being used. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-11 17:45:18 +00:00
Gabriela Cervantes	04d1122a46	tests: Decrease iterations in soak test This PR decreases the number of iterations in the kubernetes soak test as this is already taking more than 2 hours for the kata coco ci stability. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-11 17:39:06 +00:00
Gabriela Cervantes	c48c6f974e	tests: Enable stressng k8s stability test for Kata CoCo CI This PR enables the stressng k8s stability test for Kata CoCo CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-11 17:38:13 +00:00
Alex Man	7e400f7bb2	agent: Fix CPU usage reporting for cgroup v2 in kata-agent kata-agent incorrectly reports CPU time for cgroup v2, causing 1000x underreporting. For cgroup v2, kata-agent reads the cpu.stat file, which reports the time consumed by the processes in the cgroup in µs. However, there was a bug in kata-agent where it returned this value in µs without converting it to ns. This commit adds the necessary µs to ns conversion for cgroup v2, aligning it with v1 behavior and kata-shim's expectations. This fixes #10278 Signed-off-by: Alex Man <alexman@stripe.com>	2024-09-11 10:29:03 -07:00
Fabiano Fidêncio	1178fe20e9	tests: Adapt error parser for failed image decryption With an older version of image-rs, we were getting the following error: ``` Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key no suitable key found for decrypting layer key: ``` However, with the version of image-rs we are bumping to, the error comes as: ``` Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key Caused by: no suitable key found for decrypting layer key: keyprovider: failed to unwrap key by ttrpc ``` Due to this change, I'm splitting the check in two different ones. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-11 17:07:56 +02:00
Dan Mihai	66dda37877	Merge pull request #10271 from Sumynwa/sumsharma/agent_ctl_issue_9689_local agent-ctl: Refactor CopyFile Handler	2024-09-11 07:35:09 -07:00
Fabiano Fidêncio	f6cfc33314	Merge pull request #10292 from fidencio/topic/ci-tdx-adapt-how-we-get-the-host-ip ci: tdx: Adapt how we get the host IP	2024-09-11 14:42:22 +02:00
Fabiano Fidêncio	e2200f0690	versions: trustee: Update to a version that supports ITA ITA stands for Intel Trust Authority, which is in the process of being renamed to ITTS (Intel Tiber Trust Services). Proper ITA / ITTS support on Trustee was finished as part of: * `6f767fa15f` Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-11 13:39:35 +02:00
Fabiano Fidêncio	d3e3ee7755	versions: guest-components: Update to a version that supports ITA ITA stands for Intel Trust Authority, which is in the process of being renamed to ITTS (Intel Tiber Trust Services). As we've bumped guest-components on trustee, let's make sure we also bump image-rs to the commit that brings ITA support in: * https://github.com/confidential-containers/guest-components/commit/1db6c3a87665dde58d0efa56f4e4af5fc Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-11 13:36:56 +02:00
Fabiano Fidêncio	f94d80783d	agent: image-rs: Update to a version that supports ITA ITA stands for Intel Trust Authority, which is in the process of being renamed to ITTS (Intel Tiber Trust Services). As we've bumped guest-components on trustee, let's make sure we also bump image-rs to the commit that brings ITA support in: * `1db6c3a876` The reason we need to bump the dependency here is to avoid kbs_protocol mismatch between the version used by the agent and the trustee one. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-11 13:36:46 +02:00
Fabiano Fidêncio	3946aa7283	ci: tdx: Adapt how we get the host IP In the process of switching the TDX CI machine we've noticed that `hostname -i` in one of the machines returns one and only one IP address, while in another machine it returns a full list of IPs. As we're only interested in the first one, let's adapt the code to always return the first one. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-11 09:31:43 +02:00
Sumedh Alok Sharma	b4bbbf65c6	ci: Do not start CDH/attestation procs with kata-agent as local process. Since CDH/attestation related processes and its dependencies are not fully available, the setup fails to start kata-agent as local process. This fix removes these procs to prevent kata-agent from trying to start them. Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-11 11:53:59 +05:30
Sumedh Alok Sharma	8045a7a2ba	ci: Install policy document on host to run kata-agent as local process. The test setup starts kata-agent as a local process without the UVM. The agent policy initialization fails due to missing policy document at `/etc/kata-opa/default-policy.rego`. The fix - installs a relaxed `allow-all.rego` policy document - cleans up the install during exit Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-11 11:25:08 +05:30
Sumedh Alok Sharma	822f898433	ci: Install bats as dependencies Install bats as part of dependencies for running the tests. Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-11 10:57:15 +05:30
Sumedh Alok Sharma	2c774fb207	ci: Add tests for CopyFile api. This commit introduces test cases for testing CopyFile API using kata-agent-ctl with improved command semantics and handling. - copy a file to /run/kata-containers - copy symlink to /run/kata-containers - copy directory to /run/kata-containers - copy file to /tmp - copy large file to /run/kata-containers Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-11 10:54:01 +05:30
Sumedh Alok Sharma	2af1113426	agent-ctl: Refactor CopyFile handler In the existing implementation for the CopyFile subcommand, - cmd line argument list is too long, including various metadata information. - in case of a regular file, passing the actual data as bytes stream adds to the size and complexity of the input. - the copy request will fail when the file size exceeds that of the allowed ttrpc max data length limit of 4Mb. This change refactors the CopyFile handler and modifies the input to a known 'source' 'destination' syntax. Fixes #9708 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-11 10:54:01 +05:30
Alex Lyn	d0968032f7	Merge pull request #10276 from Apokleos/fix-runtime-cdi runtime: Fix runtime/cdi panic with assignment to entry in nil map	2024-09-11 09:00:11 +08:00
Alex Lyn	3f541aff4a	Merge pull request #10282 from teawater/dup runtime-rs: configuration-dragonball.toml.in: Remove duplication	2024-09-10 11:46:40 +08:00
Hui Zhu	dfea12bc53	runtime-rs: configuration-dragonball.toml.in: Remove duplication Remove duplicated description of enable_balloon_f_reporting from configuration-dragonball.toml.in. Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-09-10 07:34:29 +08:00
David Esparza	6f8897249b	Merge pull request #10277 from GabyCT/topic/fixsk tests: Increase timeout to wait for soak stability test deployment	2024-09-09 14:07:10 -06:00
Gabriela Cervantes	5a52fe1a75	tests: Increase timeout to wait for soak stability test deployment This PR increases the timeout for waiting until the soak stability test deployment is ready, in order to avoid random failures saying that the deployment is not ready yet. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-09 16:13:40 +00:00
Alex Lyn	1684c1962c	runtime: Fix runtime/cdi panic with assignment to entry in nil map It will panic when users do GPU vfio passthrough with cdi in runtime. The root cause is that CustomSpec.Annotations is nil when a new element is added. To address this issue, it is now initialized when it's nil. Fixes #10266 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-09-09 20:15:10 +08:00
Alex Lyn	f31839af63	Merge pull request #10253 from teawater/enable_balloon_f_reporting Add support of dragonball virtio-balloon free page reporting	2024-09-09 17:37:52 +08:00
Fabiano Fidêncio	026a4d92a9	Merge pull request #10272 from fidencio/topic/add-tdx-mrconfigid-mrowner-mrownerconfig-support runtime: qemu: tdx: Add support for setting mrconfigid / mrowner / mrownerconfig	2024-09-08 14:11:30 +02:00
Fabiano Fidêncio	51ee4c381a	Merge pull request #10257 from fidencio/topic/kata-deploy-remove-unused-vars-for-cleanup kata-deploy: Remove kata-cleanup unneeded vars	2024-09-07 11:27:14 +02:00
Chengyu Zhu	3a37652d01	Merge pull request #10213 from ChengyuZhu6/device Refine device management for kata-agent	2024-09-07 12:02:32 +08:00
ChengyuZhu6	75816d17f1	agent: switch to new device subsystem Switch to new device subsystem to handle various devices in kata-agent. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	df55f37dfe	agent: Move unit tests about vfio device to vfio_device_handler Move unit tests about vfio device to vfio_device_handler. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	41c2d81fd3	agent: Move unit tests about scsi device to scsi_device_handler Move unit tests about scsi device to scsi_device_handler. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	f45129cb44	agent: Move unit tests about network device to network_device_handler Move unit tests about network device to network_device_handler. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	52203db760	agent: Move unit tests about block device to block_device_handler Move unit tests about block device to block_device_handler. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	e1afb92a28	agent: Move common unit tests about device Move common unit tests about device to mod.rs Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:43 +08:00
ChengyuZhu6	25bd04c02a	agent: Use DeviceHandlerManager to handle various devices Use DeviceHandlerManager to handle various devices. Fixes: #10218 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:45:42 +08:00
ChengyuZhu6	5fc645c869	agent: Move network device code to network_device_handler Move network device code to network_device_handler to simplify the code. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	07f104085a	agent: Move vfio device code to vfio_device_handler Move vfio device code to vfio_device_handler to simplify the code. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	0cb87767ae	agent: Move device code with virtio scsi driver to scsi_device_handler Move scsi device code to scsi_device_handler to simplify the code. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	0738d75a92	agent: Move device code with nvdimm driver to nvdimm_device_handler Move device code with nvdimm driver to nvdimm_device_handler, including nvdimm device and pmem device. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	bbf934161b	agent: Move virtio-block device handlers to block_device_handler Move virtio-block device handlers to block_device_handler to simplify the code. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	4e33665be8	kata-types: Move device driver constants to kata-types Move device driver constants and add DeviceHandlerManager type alias. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 09:40:30 +08:00
ChengyuZhu6	0b3ad2f830	kata-types: Replace StorageHandlerManager with type alias Removed the `StorageHandlerManager` struct and its associated implementations and introduced a type alias `StorageHandlerManager` for `HandlerManager` to simplify the code. The new type alias maintains the same functionality while reducing redundancy. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 07:53:31 +08:00
ChengyuZhu6	281f0d7f29	kata-types: Add HandlerManager to manage registered handlers Introduced `HandlerManager` struct to manage registered handlers, which will be used for storage and device management in kata-agent. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-07 07:51:48 +08:00
GabyCT	b05811587e	Merge pull request #10245 from ChengyuZhu6/handler-manager agent: Refactor storage handler registration	2024-09-06 09:45:39 -06:00
GabyCT	37ddb837c4	Merge pull request #10267 from GabyCT/topic/updatemlcomments metrics: Update openVINO and oneDNN tests references	2024-09-06 09:42:21 -06:00
Fabiano Fidêncio	65a4562050	runtime: qemu: tdx: Add `omitempty` to QuoteGenerationSocket I know right now we're always passing a value for that, but this doesn't really have to be set unless attestation is used. Thus, let's also omit it in case it's empty. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-06 15:05:55 +02:00
Fabiano Fidêncio	7818484120	runtime: qemu: tdx: Support mrconfigid / mrowner/ mrownerconfig This is a quick and simple pre-req for supporting initData, which will take advantage of the mrconfigid in the TDX case. While already adding mrconfigid, which is hardcoded empty right now, let's do the same for mrowner and mrownerconfig, and leave it prepared for future expansions. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-06 15:05:54 +02:00
Fabiano Fidêncio	8285957678	runtime: qemu: Rename prepareObjectWithTDXQgs to prepareTDXObject The reason we're relying on yet another function to do so is because the TDX object will be used in its qom / qapi json format. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-06 14:36:09 +02:00
Fabiano Fidêncio	29ce2205a1	Merge pull request #10268 from microsoft/saulparedes/pdb-support genpolicy: add support for PodDisruptionBudget yaml	2024-09-06 09:53:36 +02:00
Dan Mihai	1885478e2e	Merge pull request #10270 from Sumynwa/sumsharma/enable_agent_tests_in_ci ci: Enable kata agent API tests	2024-09-05 14:24:49 -07:00
Archana Choudhary	f2625b0014	genpolicy: add support for PodDisruptionBudget yaml Prevent panic for PDB specs Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-09-05 11:33:47 -07:00
Sumedh Alok Sharma	e1ac2f4416	ci: Enable kata agent api tests This commit enables running tests for kata agent apis. The 'api-tests' directory will contain bats test files for individual APIs. Fixes #10269 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-06 00:02:55 +05:30
GabyCT	4b257bcbb6	Merge pull request #10255 from Sumynwa/sumsharma/metrics_ci_kill_kata_components ci: send SIGKILL to kill kata components	2024-09-05 12:04:57 -06:00
Aurélien Bombo	cc9aeee81a	Merge pull request #10263 from Sumynwa/sumsharma/add_ci_workflow ci: Add workflow to run kata-agent api tests using kata-agent-ctl	2024-09-05 09:32:34 -07:00
Dan Mihai	7ab95b56f1	Merge pull request #10251 from microsoft/saulparedes/support_readonly_hostpath genpolicy: support readonly hostpath	2024-09-05 09:27:15 -07:00
GabyCT	deb6d12ff6	Merge pull request #10237 from GabyCT/topic/k8soakcoco tests: Enable k8s soak stability test for Kata CoCo CI	2024-09-05 09:56:48 -06:00
Gabriela Cervantes	fcc35dd3a7	metrics: Update openVINO and oneDNN tests references This PR updates the machine learning tests references or urls for the openVINO and oneDNN scripts as currently they are referring to a different performance benchmark. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-05 15:39:21 +00:00
GabyCT	bb5d8bbcb5	Merge pull request #10229 from GabyCT/topic/ufcv versions: Update firecracker version to 1.8.0	2024-09-05 09:19:36 -06:00
Fabiano Fidêncio	70491ff29f	Merge pull request #10244 from BbolroC/turn-on-kbs-qemu-coco-dev-s390x gha: Turn on KBS for qemu-coco-dev on s390x	2024-09-05 13:02:42 +02:00
Sumedh Alok Sharma	ad66f4dfc9	ci: Add workflow to run kata-agent api tests using kata-agent-ctl enable CI to add test cases for testing kata-agent APIs. This commit introduces: - a workflow to run tests - setup scripts to prepare the test environment Fixes #10262 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-05 14:38:29 +05:30
Saul Paredes	24c2d13fd3	genpolicy: support readonly emptyDir mount Set emptyDir access based on volume mount readOnly value Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-09-04 15:05:44 -07:00
Saul Paredes	36a4104753	genpolicy: support readonly hostpath Set hostpath access based on volume mount readOnly value Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-09-04 14:55:22 -07:00
Fabiano Fidêncio	7d048f5963	Merge pull request #10254 from fidencio/topic/remove-amd-specific-warning-from-non-amd-systems runtime: Don't error out about SNP cert path on non SNP platforms	2024-09-04 23:42:32 +02:00
Fabiano Fidêncio	d44d66ddf6	kata-deploy: Remove kata-cleanup unneeded vars As kata-cleanup will only call `reset_runtime()`, there's absolutely no need to export the other set of environment variables in its yaml file. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-04 19:09:02 +02:00
Steve Horsman	f66e8c41a1	Merge pull request #10250 from squarti/remote-machine-type-default runtime: fix bad default machine_type for remote hypervisor	2024-09-04 17:34:04 +01:00
Sumedh Alok Sharma	4025468e27	ci: send SIGKILL to kill kata components metrics tests sometimes fail with kata components still running. sending SIGKILL and waiting for the processes to reap. Fixes #8651 Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>	2024-09-04 18:58:17 +05:30
Fabiano Fidêncio	b10256a7ca	runtime: Don't error out about SNP cert path on non SNP platforms This error is specific to SNP platforms, so let's make sure we only error this out when an SNP platform is used. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-04 11:54:52 +02:00
Hui Zhu	447a7feccf	runtime-rs: configuration-dragonball.toml.in: Add config for balloon Add enable_balloon_f_reporting config to configuration-dragonball.toml.in. Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-09-04 17:25:38 +08:00
Hui Zhu	9c1b5238b3	kernel/configs: Add balloon and f_reporting to dragonball-experimental Add CONFIG_PAGE_REPORTING, CONFIG_BALLOON_COMPACTION and CONFIG_VIRTIO_BALLOON to dragonball-experimental configs to open dragonball function and free page reporting function. Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-09-04 17:25:30 +08:00
Hui Zhu	ad9968ce2d	runtime-rs: Add enable_balloon_f_reporting for dragonball Under normal circumstances, the virtual machine only requests memory from the host and does not actively release it back to host when it is no longer needed, leading to a waste of memory resources. Free page reporting is a sub-feature of virtio-balloon. When this feature is enabled, the Linux guest kernel will send information about released pages to dragonball via virtio-balloon, and dragonball will then release these pages. This commit adds an option enable_balloon_f_reporting to runtime-rs. When this option is enabled, runtime-rs will insert a virtio-balloon device with the f_reporting option enabled during the Dragonball virtual machine startup. Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-09-04 16:38:13 +08:00
Fabiano Fidêncio	13517cf9c1	Merge pull request #10192 from fidencio/topic/helm-add-post-delete-job helm: Several fixes, including some reasonable re-work on kata-deploy.sh script	2024-09-04 09:34:57 +02:00
Paul Meyer	3be719c805	virtcontainers: allow specifying nydus-overlayfs binary by path ...or by using a binary with additional suffix. This allows having multiple versions of nydus-overlayfs installed on the host, telling nydus-snapshotter which one to use while still detecting Nydus is used. Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>	2024-09-04 08:29:40 +02:00
Chengyu Zhu	f0066568eb	Merge pull request #10233 from ChengyuZhu6/cdh-instance agent:cdh: Refactor CDHClient usage and initialization	2024-09-04 13:34:36 +08:00
Silenio Quarti	9e1388728e	runtime: fix bad default machine_type for remote hypervisor Fixes: #10249 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-09-03 20:53:19 -04:00
GabyCT	c2774b09dd	Merge pull request #10247 from GabyCT/topic/removereportm metrics: Remove metrics report for Kata Containers	2024-09-03 15:10:04 -06:00
Fabiano Fidêncio	bb9bcd886a	kata-deploy: Add reset_cri_runtime() This will help to avoid code duplication on what's needed on the helm and non-helm cases. The reason it's not been added as part of the commit which adds the post-delete hook is simply for helping the reviewer (as the diff would be less readable with this change). Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 23:08:22 +02:00
Fabiano Fidêncio	a773797594	ci: Pass --debug to helm Just to make our lives a little bit easier. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 23:08:22 +02:00
Fabiano Fidêncio	64ccb1645d	helm: Add a post-delete hook Instead of using a lifecycle.preStop hook, as done when we're using the helm chart, let's add a post-delete hook to take care of properly cleaning up the node when uninstalling kata-deploy. The reason why the lifecycle.preStop hook would never work in our case is simply because each helm chart operation follows the Kubernetes "declarative" approach, meaning that an operation won't wait for its previous operation to successfully finish before being called, leading to us trying to access content that's defined by our RBAC, in an operation that was started before our RBAC was deleted, but having the RBAC being deleted before the operation actually started. Unfortunately this hook brings in some code duplication, mainly related to the RBAC parts, but that's not new as the same happens with our daemonset. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-09-03 23:08:22 +02:00
Wainer dos Santos Moschetta	3b23d62635	tests/k8s: fix wait for pods on deploy-kata action On commit `51690bc157` we switched the installation from kubectl to helm and used its `--wait` expecting the execution would continue when all kata-deploy Pods were Ready. It turns out that there is a limitation on helm install that won't wait properly when the daemonset is made of a single replica and maxUnavailable=1. In order to fix that issue, let's revert the changes partially to keep using kubectl and waitForProcess to the execution while Pods aren't Running. Fixes #10168 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-09-03 23:08:22 +02:00
Fabiano Fidêncio	40f8aae6db	Reapply "ci: make cleanup_kata_deploy really simple" This reverts commit `21f9f01e1d`, as the patches for helm are coming as part of this series. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 23:08:22 +02:00
Fabiano Fidêncio	cfe6e4ae71	Reapply "ci: Use helm to deploy kata-deploy" (partially) This reverts commit `36f4038a89`, as the patches for helm are coming as part of this series. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 23:08:22 +02:00
Fabiano Fidêncio	424347bf0e	Reapply "kata-deploy: Add Helm Chart" (partially) This reverts commit `b18c3dfce3`, as the patches for helm are coming as part of this series. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 23:08:22 +02:00
ChengyuZhu6	77521cc8d2	agent:cdh: introduce a function to check initialization of cdh client introduce a function to check initialization of cdh client. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-04 04:52:50 +08:00
ChengyuZhu6	07e0e843e8	agent:cdh: switch to the new method for initializing cdh client Decouple the cdh client from AgentService and refactor cdh client usage and initialization. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-04 04:51:55 +08:00
ChengyuZhu6	bc8156c3ae	agent:cdh: Refactor cdh client methods for better integration Move `unseal_env` and `secure_mount` functions on the global `CDH_CLIENT` instance to access the CDH client. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-04 04:51:54 +08:00
ChengyuZhu6	0ad35dc91b	agent:cdh: Initialize CDH client as a global asynchronous instance Introduced a global `CDH_CLIENT` instance to hold the cdh client and implemented `init_cdh_client` function to initialize the cdh client if not already set. Fixes: #10231 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-04 04:49:54 +08:00
Gabriela Cervantes	5b0ab7f17c	metrics: Remove metrics report for Kata Containers This PR removes the metrics report which is no longer being used in Kata Containers. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-03 16:11:07 +00:00
Hyounggyu Choi	1cefa48047	gha: Add necessary steps for KBS enablement The following steps are required for enabling KBS: - Set environment variables `KBS` and `KBS_INGRESS` - Uninstall and install `kbs-client` - Deploy KBS This commit adds the above steps to the existing workflow for `qemu-coco-dev`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-03 16:26:12 +02:00
Hyounggyu Choi	b0a912b8b4	tests: Enable KBS deployment for qemu-coco-dev on s390x To deploy KBS on s390x, the environment variable `IBM_SE_CREDS_DIR` must be exported, and the corresponding directory must be created. This commit enables KBS deployment for `qemu-coco-dev`, in addition to the existing `qemu-se` support on the platform. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-09-03 15:51:18 +02:00
Fabiano Fidêncio	057612f18f	Merge pull request #10238 from fidencio/topic/remove-stdio-test ci: Remove stdio tests	2024-09-03 14:50:46 +02:00
ChengyuZhu6	0d519162b5	agent:storage: Refactor storage handler registration - Added `driver_types` method to `StorageHandler` trait to return driver types managed by each handler. - Implemented driver_types method for all storage handlers. - Updated `STORAGE_HANDLERS` initialization to use `driver_types` for handler registration. Fixes: #10242 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-03 18:38:52 +08:00
ChengyuZhu6	e47eb0d7d4	kata-types:mount: support registering multiple IDs to a single handler - Updated the `add_handler` function in `StorageHandlerManager` to accept a slice of IDs (`&[&str]`) instead of a single ID (`&str`). This change allows a single handler to be registered for multiple storage device types. - Refactored calls to `add_handler` in `Storage` of kata-agent to use the new function, passing arrays of storage drivers instead of single driver. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-09-03 18:38:36 +08:00
Fabiano Fidêncio	e8657c502d	Revert "CI: Add tests for stdio" This reverts commit `704da86e9b`, as the tests never became stable to run. This was discussed and agreed with the maintainer. Conflicts: .github/workflows/basic-ci-amd64.yaml tests/integration/stdio/gha-run.sh Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-03 11:52:30 +02:00
Greg Kurz	4698235e59	Merge pull request #10204 from fidencio/topic/kata-deploy-add-installation-prefix kata-deploy: helm: Add INSTALLATION_PREFIX	2024-09-03 09:26:51 +02:00
Fabiano Fidêncio	e1d3fb8c00	Merge pull request #10236 from fidencio/topic/bump-image-rs-to-properly-handle-gzip-whiteouts agent: Update image-rs to 02af65abc	2024-09-02 21:43:19 +02:00
Fabiano Fidêncio	0cb93ed1bb	kata-deploy: helm: Add INSTALLATION_PREFIX option This will allow users to properly set the INSTALLATION_PREFIX when deploying Kata Containers. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-02 20:25:22 +02:00
Gabriela Cervantes	c2aa288498	gha: Increase time to run Kata CoCo stability tests This PR increases the time to run the Kata CoCo stability tests, as these tests are designed to run for more than 2 hours. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-02 16:40:47 +00:00
Gabriela Cervantes	825cb2d22e	tests: Enable k8s soak stability test for Kata CoCo CI This PR enables the k8s soak stability test to run on the weekly Kata CoCo stability CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-09-02 16:30:44 +00:00
Fabiano Fidêncio	1309c49c09	agent: Update image-rs to 02af65abc As this brings in proper support to handle gzip whiteouts. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-02 14:15:04 +02:00
Fabiano Fidêncio	7be77ebee5	kata-deploy: helm: Stop mounting /opt/kata It's simply easier if we just use /host/opt/kata instead in our scripts, which will greatly simplify the logic of adding an INSTALLATION_PREFIX later on. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-02 09:38:51 +02:00
Fabiano Fidêncio	6ce5e62c48	kata-deploy: Add a $dest_dir var As we build our binaries with the `/opt/kata` prefix, that's the value of $dest_dir. Later in this series it'll become handy, as we'll introduce a way to install the Kata Containers artefacts in a different location. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-09-02 09:36:33 +02:00
Fabiano Fidêncio	ef5a5ea26e	Merge pull request #10038 from sprt/move-free-runner-iii ci: Transition GARM tests to free runners, pt. III	2024-08-31 01:29:08 +02:00
Gabriela Cervantes	19d8f11345	versions: Update firecracker version to 1.8.0 This PR updates the firecracker version to 1.8.0 which includes the following changes: - Added ACPI support to Firecracker for x86_64 microVMs. Currently, we pass ACPI tables with information about the available vCPUs, interrupt controllers, VirtIO and legacy x86 devices to the guest. This allows booting kernels without MPTable support. Please see our kernel policy documentation for more information regarding relevant kernel configurations. - Added support for the Virtual Machine Generation Identifier (VMGenID) device on x86_64 platforms. VMGenID is a virtual device that allows VMMs to notify guests when they are resumed from a snapshot. Linux includes VMGenID support since version 5.18. It uses notifications from the device to reseed its internal CSPRNG. Please refer to snapshot support and random for clones documentation for more info on VMGenID. VMGenID state is part of the snapshot format of Firecracker. As a result, Firecracker snapshot version is now 2.0.0. - Changed T2CL template to pass through bit 27 and 28 of MSR_IA32_ARCH_CAPABILITIES (RFDS_NO and RFDS_CLEAR) since KVM considers they are able to be passed through and T2CL isn't designed for secure snapshot migration between different processors. - Avoid setting kvm_immediate_exit to 1 if we are already handling an exit, or if the vCPU is stopped. This avoids a spurious KVM exit upon restoring snapshots. - Changed T2S template to set bit 27 of MSR_IA32_ARCH_CAPABILITIES (RFDS_NO) to 1 since it assumes that the fleet only consists of processors that are not affected by RFDS. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-30 20:49:29 +00:00
Aurélien Bombo	886b3047ac	Merge pull request #10222 from microsoft/danmihai1/log-level-false-positives agent: avoid policy.txt log without debug enabled	2024-08-30 10:09:04 -07:00
Alex Lyn	4fd4b02f2e	Merge pull request #10228 from GabyCT/topic/removeionednn metrics: Remove unused variable in oneDNN benchmark	2024-08-30 09:31:14 +08:00
Gabriela Cervantes	aa8635727d	metrics: Remove unused variable in oneDNN benchmark This PR removes an unused variable in oneDNN metrics benchmark. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-29 15:52:47 +00:00
Alex Lyn	8241423ba5	Merge pull request #10224 from amshinde/update-image-rs-xattr agent: image-rs: check xattrs for image unpacking	2024-08-29 09:33:22 +08:00
GabyCT	dd9f41547c	Merge pull request #10160 from microsoft/saulparedes/support_priority_class genpolicy: add priorityClassName as a field in PodSpec interface	2024-08-28 14:36:20 -06:00
GabyCT	394480e7ff	Merge pull request #10221 from GabyCT/topic/addopendmmread docs: Add oneDNN benchmark information to metrics README	2024-08-28 14:22:22 -06:00
GabyCT	83b031ca7a	Merge pull request #10214 from GabyCT/topic/ciweekly gha: Add GHA workflow to run Kata CoCo stability tests	2024-08-28 11:46:29 -06:00
Archana Shinde	c747852bce	agent: image-rs: check xattrs for image unpacking This commit includes a fix for pulling an image on platforms that do not support xattr. Some platforms/file-systems do not support xattrs, this would make the image pull fail because of failing to set xattr. This commit will check whether the target path supports xattr. If yes, the unpacking will maintain xattrs; if not, it will not set xattrs. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-08-28 00:02:46 -07:00
Archana Choudhary	ae2cdedba8	genpolicy: add priorityClassName as a field in PodSpec interface This allows generation of policy for pods specifying priority classes. Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-08-27 19:54:02 -07:00
Dan Mihai	aa8bdbde5a	agent: avoid policy.txt log without debug enabled slog's is_enabled() is documented as: - "best effort", and - Sometimes resulting in false positives. Use AGENT_CONFIG.log_level.as_usize() instead, to avoid those false positives. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-08-28 02:33:56 +00:00
Aurélien Bombo	de98e467b4	ci: Use `ubuntu-22.04` instead of `ubuntu-latest` 22.04 is the default today: `23da668261/README.md` Being more specific will avoid unexpected errors when GitHub updates the default. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-08-27 16:44:39 +00:00
Aurélien Bombo	ceab66b1ce	ci: Run `build-checks-depending-on-kvm` for free Also keeps the Rust installation step even though it's preinstalled, so that we use the version specified in versions.yaml. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-08-27 16:43:59 +00:00
Aurélien Bombo	b4ce84b9d2	ci: Move `run-runk` to free runner No change other than switching the runner - no dependency issue expected. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-08-27 16:43:33 +00:00
Aurélien Bombo	645aaa6f7f	ci: Move `run-monitor` to free runner No change other than switching the runner - no dependency issue expected. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-08-27 16:43:33 +00:00
Gabriela Cervantes	3affde5b28	docs: Add oneDNN benchmark information to metrics README This PR adds the oneDNN benchmark information to the machine learning metrics README. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-27 16:32:50 +00:00
Dan Mihai	9f6f5dac4b	Merge pull request #10037 from sprt/reinstate-mariner-host ci: reinstate Mariner host and guest kernel	2024-08-27 08:24:51 -07:00
Alex Lyn	f24983b3cf	Merge pull request #10210 from l8huang/cold-vf runtime: check if cold_plug_vfio is enabled before create PhysicalEndpoint	2024-08-27 15:23:55 +08:00
Alex Lyn	3a749cfb44	Merge pull request #10212 from squarti/remote-machine-type runtime: Allow machine_type in kata config for remote hypervisors	2024-08-27 14:05:36 +08:00
Aurélien Bombo	a3dba3e82b	ci: reinstate Mariner host GH-9592 addressed a bug in a previous version of the AKS Mariner host kernel that blocked the CH v39 upgrade. This bug has now been fixed so we undo that PR. Note we also specify a different OCI version for Mariner as it differs from Ubuntu's. Fixes: #9594 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-08-26 21:07:25 +00:00
Gabriela Cervantes	3a14b04621	gha: Fix entry for ci coco stability yaml This PR fixes the entry for the ci weekly GHA workflow so that it properly runs the weekly k8s tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-26 17:14:35 +00:00
Gabriela Cervantes	95f6246858	gha: Add GHA workflow to run Kata CoCo stability tests This PR adds a GHA workflow to run Kata CoCo weekly stability tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-26 17:05:21 +00:00
Silenio Quarti	11ba8f05ca	runtime: Allow machine_type in kata config for remote hypervisors Fixes: #10211 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-08-26 10:17:40 -04:00
Lei Huang	70168a467d	runtime: check if cold_plug_vfio is enabled before create PhysicalEndpoint PhysicalEndpoint unbinds its VF interface and rebinds it as a VFIO device, then cold-plugs the VFIO device into the guest kernel. When `cold_plug_vfio` is set to "no-port", cold-plugging the VFIO device will fail. This change checks if `cold_plug_vfio` is enabled before creating PhysicalEndpoint to avoid unnecessary VFIO rebind operations. Fixes: #10162 Signed-off-by: Lei Huang <leih@nvidia.com>	2024-08-23 15:42:17 -07:00
GabyCT	6b0272d6bf	Merge pull request #10193 from GabyCT/topic/k8ssoak stability: Add kubernetes parallel test	2024-08-23 15:51:01 -06:00
GabyCT	83177efb9b	Merge pull request #10201 from GabyCT/topic/readmeopenvino metrics: Add OpenVINO general information into README	2024-08-23 14:11:26 -06:00
Bo Chen	a0bd78b358	Merge pull request #10205 from likebreath/0819/upgrade_clh_v41.0 Upgrade to Cloud Hypervisor v41.0	2024-08-23 10:01:41 -07:00
Hyounggyu Choi	169b4490d2	Merge pull request #10209 from fidencio/topic/kata-manager-avoid-rate-pull-limit kata-manager: Avoid docker rate-limit	2024-08-23 12:52:14 +02:00
Fabiano Fidêncio	7f0289de60	kata-manager: Avoid docker rate-limit To do so, use a test image from quay.io instead of docker.io. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-23 11:56:09 +02:00
Fabiano Fidêncio	45f69373a6	Merge pull request #10199 from BbolroC/make-cdh-api-timeout-configurable agent/config: Make CDH_API_TIMEOUT configurable	2024-08-23 11:04:10 +02:00
Hyounggyu Choi	4cd83d2b98	Merge pull request #10202 from BbolroC/fix-k8s-tests-s390x tests: Fix k8s test issues on s390x	2024-08-23 09:51:11 +02:00
Fabiano Fidêncio	11bb9231c2	Merge pull request #10207 from amshinde/remove-image-check-cc Revert "tests: add image check before running coco tests"	2024-08-23 09:33:39 +02:00
Alex Lyn	44bf7ccb46	Merge pull request #10141 from soulfy/fix-delete-failed agent: kill child process when console socket closed	2024-08-23 14:00:53 +08:00
Archana Shinde	b0be03a93f	Revert "tests: add image check before running coco tests" This reverts commit `41b7577f08`. We were seeing a lot of issues in the TDX CI of the nature: "Error: failed to create containerd container: create instance 470: object with key "470" already exists: unknown" With the TDX CI, we moved to having the nydus snapshotter pre-installed. Essentially the `deploy-snapshotter` step was performed once before any actual CI runs. We were seeing failures related to the error message above. On reverting this change, we are no longer seeing errors related to "key exists" with the TDX CI passing now. The change reverted here is related to downloading incomplete images, but this seems to be messing up TDX CI. It is possible to pass --snapshotter to `ctr image check` but that does not seem to have any effect on the data set returned. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org> Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-08-22 18:05:42 -07:00
Bo Chen	254f8bca74	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v41.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #10203 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-08-22 11:05:54 -07:00
Bo Chen	e69535326d	versions: Upgrade to Cloud Hypervisor v41.0 Details of this release can be found in our roadmap project as iteration v41.0: https://github.com/orgs/cloud-hypervisor/projects/6. Fixes: #10203 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-08-22 11:02:26 -07:00
Gabriela Cervantes	2fa8e85439	metrics: Add OpenVINO general information into README This PR adds the OpenVINO benchmark general information into the machine learning README metrics information. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-22 16:08:06 +00:00
Hyounggyu Choi	274de8c6af	tests: Introduce wait_time to k8s_create_pod() In certain environments (e.g., those with lower performance), `k8s_create_pod()` may require additional wait time, especially when dealing with large images. Since `k8s_wait_pod_be_ready()` — which is called by `k8s_create_pod()` — already accepts `wait_time` as a second argument, it makes sense to introduce `wait_time` to `k8s_create_pod()` and propagate it to the callee. This commit adds `wait_time` to `k8s_create_pod()` as the 2nd (optional) argument. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 17:46:53 +02:00
Hyounggyu Choi	5d7397cc69	tests: Load confidential_kbs.sh in k8s-guest-pull-image.bats Some of the tests call set_metadata_annotation() for updating the kernel parameters. For `kata-qemu-se`, repack_secure_image() is called which is defined in `lib_se.sh` and sourced by `confidential_kbs.sh`. This commit ensures that the function call chain for the relevant `KATA_HYPERVISOR` is properly handled. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 17:33:38 +02:00
Fabiano Fidêncio	890fa26767	Merge pull request #10196 from fidencio/topic/ci-commit-message-take-reapply-into-consideration ci: commit-message-check: Take re-revert into consideration	2024-08-22 17:31:27 +02:00
Fabiano Fidêncio	2f6edc4b9b	Merge pull request #10194 from fidencio/topic/kata-deploy-re-work-logic kata-deploy: Rework the logic a little bit	2024-08-22 16:46:36 +02:00
Hyounggyu Choi	baa8af3f8e	doc: Update how-to-set-sandbox-config-kata.md This commit adds a row for `cdh_api_timeout` to the agent options in how-to-set-sandbox-config-kata.md. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 14:50:51 +02:00
Hyounggyu Choi	7d0aba1a24	runtime: Enable to get cdh_api_timeout from configuration file This commit allows `cdh_api_timeout` to be configured from the configuration file. The configuration is commented out with specifying a default value (50s) because the default value is configured in the agent. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 14:47:37 +02:00
Hyounggyu Choi	8615516823	agent: Add agent.cdh_api_timeout to README This commit adds an explanation for `cdh_api_timeout` to the README file. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 14:47:37 +02:00
Fabiano Fidêncio	a9a1345a31	kata-deploy: Print the action the script was invoked with This increases debuggability. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-22 14:32:33 +02:00
Fabiano Fidêncio	ab493b6028	kata-deploy: Move general logic to the correct actions Otherwise we may end up running into unexpected issues when calling the cleanup option, as the same checks would be done, and files could end up being copied again, overwriting the original content which was backed up by the install option. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-22 14:32:29 +02:00
Fabiano Fidêncio	6596012956	kata-deploy: Simplify check for runtime Let's write the runtime check in a shorter and simpler to read form. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-22 14:32:02 +02:00
Hyounggyu Choi	2512ddeab2	agent/cdh: Use AGENT_CONFIG.cdh_api_timeout for CDH_API_TIMEOUT This commit updates CDH_API_TIMEOUT to use AGENT_CONFIG.cdh_api_timeout and changes it from a `const` to `lazy_static` to accommodate runtime-determined values. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 10:09:16 +02:00
Hyounggyu Choi	6139e253a0	agent/config: Add cdh_api_timeout to AgentConfig To make the `cdh_api_timeout` variable configurable, it has been added to the `AgentConfig` structure. This change includes storing the variable as a `time::Duration` type and generalizing the existing `hotplug_timeout` code to handle both timeouts. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-22 10:09:16 +02:00
GabyCT	3fd108b09a	Merge pull request #10198 from GabyCT/topic/remvaropenvino metrics: Remove unused variable in openvino script	2024-08-21 15:48:56 -06:00
Dan Mihai	8ccc8a8d0b	Merge pull request #9911 from microsoft/saulparedes/mounts genpolicy: deny UpdateEphemeralMountsRequest	2024-08-21 10:12:28 -07:00
Gabriela Cervantes	59e31baaee	metrics: Remove unused variable in openvino script This PR removes an unused variable in the openvino script for kata metrics. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-21 16:05:55 +00:00
Greg Kurz	09a13da8ec	Merge pull request #10197 from beraldoleal/release-3.8 release: Bump VERSION to 3.8.0	2024-08-21 17:50:10 +02:00
Beraldo Leal	55bdb380fb	release: Bump VERSION to 3.8.0 Let's start the 3.8.0 release. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-08-21 10:24:07 -04:00
Gabriela Cervantes	27d5539954	stability: Add pod deployment yaml for soak test This PR adds the pod deployment yaml for soak test which is part of the stability k8s tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-21 14:23:22 +00:00
Fabiano Fidêncio	3fd021a9b3	ci: commit-message-check: Take re-revert into consideration `Reapply "` should be taken into consideration as well. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-21 14:19:16 +02:00
Fabiano Fidêncio	f071c8cada	Merge pull request #10191 from fidencio/topic/ci-temporarily-revert-helm-usage ci: Let's temporarily revert the helm charts usage in our CI	2024-08-21 10:52:23 +02:00
Dan Mihai	6654491cc3	genpolicy: deny UpdateEphemeralMountsRequest Deny UpdateEphemeralMountsRequest by default, because paths to critical Guest components can be redirected using such request. Signed-off-by: Dan Mihai <Daniel.Mihai@microsoft.com>	2024-08-20 18:28:17 -07:00
Gabriela Cervantes	c04a805215	stability: Add kubernetes parallel test This PR adds a kubernetes parallel test that will launch multiple replicas from a kubernetes deployment and we will iterate this multiple times to verify that we are able to do this using CoCo Kata. This test will be part of the CoCo Kata stability CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-20 23:24:22 +00:00
Fabiano Fidêncio	b18c3dfce3	Revert "kata-deploy: Add Helm Chart" (partially) This partially reverts commit `94b3348d3c`, as there's more work needed in order to have this one done in a robust way, and we are taking the safer path of reverting for now, and adding it back as soon as the release is cut out. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-21 00:09:11 +02:00
Fabiano Fidêncio	36f4038a89	Revert "ci: Use helm to deploy kata-deploy" (partially) This partially reverts commit `51690bc157`, as there's more work needed in order to have this one done in a robust way, and we are taking the safer path of reverting for now, and adding it back as soon as the release is cut out. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-21 00:09:11 +02:00
Fabiano Fidêncio	21f9f01e1d	Revert "ci: make cleanup_kata_deploy really simple" This reverts commit `1221ab73f9`, as there's more work needed in order to have this one done in a robust way, and we are taking the safer path of reverting for now, and adding it back as soon as the release is cut out. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-08-21 00:09:11 +02:00
GabyCT	e0bff7ed14	Merge pull request #10177 from GabyCT/topic/cocoghas gha: Add k8s stability Kata CoCo GHA workflow	2024-08-20 15:12:29 -06:00
Gabriela Cervantes	ca3d778479	gha: Add Kata CoCo Stability workflow This PR adds the Kata CoCo Stability workflow that will setup the environment to run the k8s tests on a non-tee environment. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-20 16:34:33 +00:00
Gabriela Cervantes	3ebaa5d215	gha: Add Kata CoCo stability weekly yaml This PR adds the Kata CoCo stability weekly yaml that will trigger the k8s stability tests weekly. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-20 16:32:03 +00:00
Fabiano Fidêncio	aeb6f54979	Merge pull request #10180 from fidencio/topic/ci-ensure-the-key-was-created-on-kbs ci: Ensure the KBS resources are created	2024-08-20 09:07:56 +02:00
Fabiano Fidêncio	40d385d401	Merge pull request #10188 from wainersm/kbs_key tests/k8s: check and save kbs.key	2024-08-19 23:29:10 +02:00
Fabiano Fidêncio	c0d7222194	ci: Ensure the KBS resources are created Otherwise we may have tests failing due to the resource not being created yet. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-19 23:27:06 +02:00
Wainer dos Santos Moschetta	e014eee4e8	tests/k8s: check and save kbs.key The deploy-kbs.sh script generates the kbs.key that's used to install KBS. This same file is later used by kbs-client to authenticate. This change ensures that the file was created, and fails otherwise. Another problem solved here is that on bare-metal machines the key doesn't survive a reboot as it is created in a temporary directory (/tmp/trustee). So let's save the file to a non-temporary location. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-19 16:03:03 -03:00
Wainer Moschetta	6a982930e2	Merge pull request #10183 from fidencio/topic/kata-deploy-use-runtime_path kata-deploy: Stop symlinking into /usr/local/bin	2024-08-19 13:17:21 -03:00
Fabiano Fidêncio	42d48efcc2	Merge pull request #10181 from fidencio/topic/ci-fix-stdio-typo ci: stdio: Fix typo on getting the containerd version	2024-08-18 16:05:42 +02:00
Fabiano Fidêncio	e0ae398a2e	Merge pull request #10151 from squarti/rootdir2 runtime: Files are not synced between host and guest VMs	2024-08-18 12:32:52 +02:00
Fabiano Fidêncio	d03b72f19b	kata-deploy: Stop linking binaries to /usr/local/bin Neither CRI-O nor containerd requires that, and removing such symlinks makes everything less intrusive from our side. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-18 01:25:12 +02:00
Fabiano Fidêncio	c2393dc467	kata-deploy: Use shim's absolute path for crio's runtime_path This will allow us, in the future, not have to do symlinks here and there. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-18 01:25:12 +02:00
Fabiano Fidêncio	58623723b1	kata-deploy: Use runtime_path for containerd It's already being used with CRI-O, let's simplify what we do and also use this for containerd, which will allow us to do further cleanups in the coming patches. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-18 01:25:12 +02:00
Fabiano Fidêncio	e75c149dec	ci: stdio: Properly start running the test "gha-run.sh" requires a `run` argument in order to run the tests, which seems to be forgotten when the test was added. This PR needs to get merged before the test can successfully run. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-17 14:41:44 +02:00
Fabiano Fidêncio	dd2d9e5524	ci: stdio: Fix typo on getting the containerd version I assume the PR that introduced this was based on an older version of yq, and as the test couldn't run before it got merged we never noticed the error. However, this test has been failing for a reasonable amount of time, which makes me think that we either need a maintainer for it, or just remove it completely, but that's a discussion for another day. For now, let's make it, at least, run. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-17 14:06:24 +02:00
Fabiano Fidêncio	7113490cb1	Merge pull request #10179 from fidencio/topic/switch-nginx-image ci: k8s: Replace nginx alpine images	2024-08-17 13:07:31 +02:00
Fabiano Fidêncio	0831081399	ci: k8s: Replace nginx alpine images The previous ones are gone, so let's switch to our own multi-arch image for the tests. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-17 12:19:33 +02:00
Fabiano Fidêncio	a78d82f4f1	Merge pull request #10159 from squarti/main agent: Handle EINVAL error when umounting container rootfs	2024-08-16 22:07:50 +02:00
Dan Mihai	79c1d0a806	Merge pull request #10136 from microsoft/danmihai1/docker-image-volume2 genpolicy: add bind mounts for image volumes	2024-08-16 13:07:01 -07:00
Fabiano Fidêncio	28aa4314ba	Merge pull request #10175 from ChengyuZhu6/error_message runtime: Add specific error message for gRPC request timeouts	2024-08-16 22:06:49 +02:00
Fabiano Fidêncio	720edbe3fc	Merge pull request #10174 from ChengyuZhu6/install_script tools: install luks-encrypt-storage script by guest-components	2024-08-16 22:04:56 +02:00
Fabiano Fidêncio	7b5da45059	Merge pull request #10178 from fidencio/topic/revert-trustee-bump Revert "version: bump trustee version"	2024-08-16 21:48:30 +02:00
Gabriela Cervantes	6ea34f13e1	gha: Add k8s stability Kata CoCo GHA workflow This PR adds the k8s stability Kata CoCo GHA workflow to run the k8s stability tests weekly. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-16 16:14:15 +00:00
Fabiano Fidêncio	45f43e2a6a	Revert "version: bump trustee version" This reverts commit `d35320472c`. Although the commit in question does solve an issue related to the usage of busybox from docker.io, as it's reasonably easy to hit the rate limit, the commit also brings in functionalities that are causing issues in, at least, the TDX CI, such as: ```sh [2024-08-16T16:03:52Z INFO actix_web::middleware::logger] 10.244.0.1 "POST /kbs/v0/attest HTTP/1.1" 401 259 "-" "attestation-agent-kbs-client/0.1.0" 0.065266 [2024-08-16T16:03:53Z INFO kbs::http::attest] Auth API called. [2024-08-16T16:03:53Z INFO actix_web::middleware::logger] 10.244.0.1 "POST /kbs/v0/auth HTTP/1.1" 200 74 "-" "attestation-agent-kbs-client/0.1.0" 0.000169 [2024-08-16T16:03:54Z INFO kbs::http::attest] Attest API called. [2024-08-16T16:03:54Z INFO verifier::tdx] Quote DCAP check succeeded. [2024-08-16T16:03:54Z INFO verifier::tdx] MRCONFIGID check succeeded. [2024-08-16T16:03:54Z INFO verifier::tdx] CCEL integrity check succeeded. [2024-08-16T16:03:54Z ERROR kbs::http::error] Attestation failed: Verifier evaluate failed: TDX Verifier: failed to parse AA Eventlog from evidence Caused by: at least one line should be included in AAEL ``` Let's revert this for now, and then once we get this one fixed on trustee side we'll update again. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-16 18:10:38 +02:00
Dan Mihai	c22ac4f72c	genpolicy: add bind mounts for image volumes Add bind mounts for volumes defined by docker container images, unless those mounts have been defined in the input K8s YAML file too. For example, quay.io/opstree/redis defines two mounts: /data /node-conf Before these changes, if these mounts were not defined in the YAML file too, the auto-generated policy did not allow this container image to start. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-08-16 15:11:05 +00:00
Fabiano Fidêncio	b203f715e5	Merge pull request #10170 from beraldoleal/deploy-reset-fix kata-deploy: fix kata-deploy reset	2024-08-16 16:51:14 +02:00
Fabiano Fidêncio	8d63723910	Merge pull request #10161 from microsoft/saulparedes/ignore_role_resource genpolicy: ignore Role resource	2024-08-16 16:50:16 +02:00
Fabiano Fidêncio	6c58ae5b95	Merge pull request #10171 from fidencio/topic/ci-treat-nydus-snapshotter-as-a-dep ci: nydus: Treat the snapshotter as a dependency	2024-08-16 16:39:48 +02:00
ChengyuZhu6	1eda6b7237	tests: update error message with guest pulling image timeout update error message with guest pulling image timeout. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-16 20:26:33 +08:00
ChengyuZhu6	ca05aca548	runtime: Add specific error message for gRPC request timeouts Improved error handling to provide clearer feedback on request failures. For example: Improve createcontainer request timeout error message from "Error: failed to create containerd task: failed to create shim task:context deadline exceed" to "Error: failed to create containerd task: failed to create shim task: CreateContainerRequest timed out: context deadline exceed". Fixes: #10173 -- part II Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-16 20:24:48 +08:00
Beraldo Leal	b3a4cd1a06	Merge pull request #10172 from deagon/fix-typo osbuilder: fix typo in ubuntu rootfs depends	2024-08-16 08:01:59 -04:00
Beraldo Leal	b843b236e4	kata-deploy: improve kata-deploy script For the rare cases where containerd_conf_file does not exist, cp could fail and leave the pod in Error state. Let's make it a little bit more robust. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-08-16 07:52:38 -04:00
ChengyuZhu6	aa31a9d3c4	tools: install luks-encrypt-storage script by guest-components Install luks-encrypt-storage script by guest-components. So that we can maintain a single source and prevent synchronization issues. Fixes: #10173 -- part I Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-16 16:28:20 +08:00
Chengyu Zhu	ba3c484d12	Merge pull request #9999 from ChengyuZhu6/trusted-storage Trusted image storage	2024-08-16 15:39:50 +08:00
Fabiano Fidêncio	0f3eb2451e	Merge pull request #10169 from fidencio/topic/revert-reset_runtime-to-cleanup Revert "ci: add reset_runtime to cleanup"	2024-08-16 07:29:58 +02:00
Aurélien Bombo	e1775e4719	Merge pull request #10164 from BbolroC/make-exec_host-stable tests: Ensure exec_host() consistently captures command output	2024-08-15 21:43:32 -07:00
Guoqiang Ding	1d21ff9864	osbuilder: fix typo in ubuntu rootfs depends Remove the duplicate package "xz-utils". Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-08-16 11:33:55 +08:00
Silenio Quarti	5d815ffde1	runtime: Files are not synced between host and guest VMs This PR resolves the default kubelet root dir symbolic link and uses it as the absolute path for the fs watcher regexs Fixes: https://github.com/kata-containers/kata-containers/issues/9986 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-08-15 23:19:08 -04:00
Silenio Quarti	0dd16e6b25	agent: Handle EINVAL error when umounting container rootfs Container/Sandbox clean up should not fail if root FS is not mounted. This PR handles EINVAL errors when umount2 is called. Fixes: #10166 Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-08-15 19:41:46 -04:00
Fabiano Fidêncio	3733266a60	ci: nydus: Treat the snapshotter as a dependency Instead of deploying and removing the snapshotter on every single run, let's make sure the snapshotter is always deployed in the TDX case. We're doing this as an experiment, in order to see if we'll be able to reduce the failures we've been facing with the nydus snapshotter. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-15 22:44:30 +02:00
Hyounggyu Choi	ba3e5f6b4a	Revert "tests: Disable k8s file volume test" This reverts commit `e580e29246`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-15 21:10:39 +02:00
Hyounggyu Choi	758e650a28	tests: Ensure exec_host() consistently captures command output The `exec_host()` function often fails to capture the output of a given command because the node debugger pod is prematurely terminated. To address this issue, the function has been refactored to ensure consistent output capture by adjusting the `kubectl debug` process as follows: - Keep the node debugger pod running - Wait until the pod is fully ready - Execute the command using `kubectl exec` - Capture the output and terminate the pod This commit refactors `exec_host()` to implement the above steps, improving its reliability. Fixes: #10081 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-08-15 21:10:39 +02:00
Beraldo Leal	74662a0721	Merge pull request #10137 from hex2dec/fix-image-warning tools: Fix container image build warning	2024-08-15 14:45:41 -04:00
Dan Mihai	905c76bd47	Merge pull request #10153 from microsoft/saulparedes/support_cron_job genpolicy: Add support for cron jobs	2024-08-15 11:11:00 -07:00
Aurélien Bombo	0223eedda5	Merge pull request #10050 from burgerdev/request-hardening genpolicy: hardening some agent requests	2024-08-15 08:31:21 -07:00
Fabiano Fidêncio	1f6a8baaf1	Revert "ci: add reset_runtime to cleanup" This reverts commit `8d9bec2e01`, as it causes issues in the operator and kata-deploy itself, leading to the node to be NotReady. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-15 16:09:34 +02:00
ChengyuZhu6	5f4209e008	agent:README: add secure_image_storage_integrity to agent's README add secure_image_storage_integrity to agent's README. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 20:32:44 +08:00
ChengyuZhu6	6ecb2b8870	tests: skip test trusted storage in qemu-coco-dev I can't set up a loop device with `exec_host`, which is necessary for qemu-coco-dev. See issue #10133. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 20:32:44 +08:00
ChengyuZhu6	51b9d20d55	tests: update error message in pulling image encrypted tests Update error message in pulling image encrypted to "failed to get decrypt key no suitable key found for decrypting layer key". Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 20:32:44 +08:00
ChengyuZhu6	b4d10e7655	version: update the version of coco-guest-components update the version of coco-guest-components. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 20:32:43 +08:00
Fupan Li	365df81d5e	Merge pull request #10148 from lifupan/main_sandboxapi runtime-rs: Add the wait_vm support for hypervisors	2024-08-15 17:08:38 +08:00
ChengyuZhu6	a9b436f788	agent:cdh: Introduces secure_mount API in cdh Introduces `secure_mount` API in the cdh. It includes: - Adding the `SecureMountServiceClient`. - Implementing the `secure_mount` function to handle secure mounting requests. - Updating the confidential_data_hub.proto file to define SecureMountRequest and SecureMountResponse messages and adding the SecureMountService service. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 13:55:23 +08:00
ChengyuZhu6	1528d543b2	agent:cdh: Rename sealed_secret API namespace to confidential_data_hub renames the sealed_secret.proto file to confidential_data_hub.proto and updates the corresponding API namespace from sealed_secret to confidential_data_hub. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 13:55:23 +08:00
ChengyuZhu6	37bd2406e0	docs: add content about how to pull large image Add content about how to pull large image in the guest with trust storage. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 13:55:22 +08:00
ChengyuZhu6	c5a973e68c	tests:k8s: add tests for guest pull with configured timeout add tests for guest pull with configured timeout: 1) failed case: Test we cannot pull a large image whose pull time exceeds a short createcontainer timeout (10s) inside the guest 2) successful case: Test we can pull a large image inside the guest with an increased createcontainer timeout (120s) Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 13:55:22 +08:00
ChengyuZhu6	6c506cde86	tests:k8s: add tests for pull images in the guest using trusted storage add tests for pull images in the guest using trusted storage: 1) failed case: Test we cannot pull an image that exceeds the memory limit inside the guest 2) successful case: Test we can pull an image inside the guest using trusted ephemeral storage. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-15 13:55:22 +08:00
GabyCT	ecfbc9515a	Merge pull request #10158 from GabyCT/topic/k8sstabil tests: Add kubernetes stability test	2024-08-14 14:44:49 -06:00
Saul Paredes	5ad47b8372	genpolicy: ignore Role resource Ignore Role resources because they don't need a Policy. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-08-14 12:57:06 -07:00
Gabriela Cervantes	d48ad94825	tests: Add kubernetes stability test This PR adds a k8s stability test that will be part of the CoCo Kata stability tests that will run weekly. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-14 15:30:49 +00:00
Fupan Li	cadcf5f92d	runtime-rs: Add the wait_vm support for hypervisors Add the wait_vm method for hypervisors. This is a prerequisite for sandbox api support. Fixes: #7043 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-08-14 12:01:34 +08:00
Fupan Li	506977b102	Merge pull request #10156 from GabyCT/topic/disablevolume tests: Disable k8s file volume test	2024-08-14 12:00:47 +08:00
GabyCT	b0b6a1baea	Merge pull request #10154 from GabyCT/topic/stressk8s tests: Add kubernetes stress-ng tests	2024-08-13 15:09:59 -06:00
Gabriela Cervantes	e580e29246	tests: Disable k8s file volume test This PR disables the k8s file volume test as we are having random failures in multiple GHA CIs mainly because the exec_host function sometimes does not work properly. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-13 20:50:18 +00:00
Saul Paredes	af598a232b	tests: add test for cron job support Add simple test for cron job support Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-08-13 10:47:42 -07:00
Saul Paredes	88451d26d0	genpolicy: add support for cron jobs Add support for cron jobs Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-08-13 10:47:42 -07:00
Gabriela Cervantes	bdca5ca145	tests: Add kubernetes stress-ng tests This PR adds kubernetes stress-ng tests as part of the stability testing for kata. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-13 16:23:52 +00:00
Fabiano Fidêncio	99730256a2	Merge pull request #10149 from fidencio/topic/kata-manager-relax-opt-check kata-manager: Only check files when tarball is not passed	2024-08-13 16:26:16 +02:00
Markus Rudy	bce5cb2ce5	genpolicy: harden CreateSandboxRequest checks Hooks are executed on the host, so we don't expect to run hooks and thus require that no hook paths are set. Additional Kernel modules expand the attack surface, so require that none are set. If a use case arises, modules should be allowlisted via settings. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-08-13 09:01:58 +02:00
Markus Rudy	aee23409da	genpolicy: harden CopyFileRequest checks CopyFile is invoked by the host's FileSystemShare.ShareFile function, which puts all files into directories with a common pattern. Copying files anywhere else is dangerous and must be prevented. Thus, we check that the target path prefix matches the expected directory pattern of ShareFile, and that this directory is not escaped by .. traversal. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-08-13 09:01:58 +02:00
soulfy	722b576eb3	agent: kill child process when console socket closed When using the debug console, the shell running in the child process may not exit in some scenarios, e.g. when pressing Ctrl-C on the host to terminate the kata-runtime process. That blocks the task handling the console connection while it waits for the child to exit. Signed-off-by: soulfy <liukai254@jd.com>	2024-08-13 10:18:03 +08:00
Steve Horsman	91084058ae	Merge pull request #10007 from wainersm/run_k8s_on_free_runners ci: Transition GARM tests to free runners, pt. II	2024-08-12 18:12:18 +01:00
Fabiano Fidêncio	5fe65e9fc2	kata-manager: Only check files when tarball is not passed Only do the checking in case the tarball was not explicitly passed by the user. We have no control of what's passed and we cannot expect that all the files are going to be under /opt. Fixes: #10147 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-12 13:54:24 +02:00
ChengyuZhu6	c3a0ab4b93	tests:k8s: Re-enable and refactor the tests with guest pull Currently, setting `io.containerd.cri.runtime-handler` annotation in the yaml is not necessary for pulling images in the guest. All TEE hypervisors are already running tests with guest-pulling enabled. Therefore, we can remove some duplicate tests and re-enable the guest-pull test for running different runtime pods at the same time. To support different containerd versions, I recommend keeping the "io.containerd.cri.runtime-handler" setting. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-12 16:36:54 +08:00
ChengyuZhu6	47be9c7c01	osbuilder:rootfs: install init_trusted_storage script Install init_trusted_storage script if enable MEASURED_ROOTFS. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: Anand Krishnamoorthi <anakrish@microsoft.com>	2024-08-12 16:36:54 +08:00
ChengyuZhu6	df993b0f88	agent:rpc: initialize trusted storage device Initialize the trusted storage when the device is defined as "/dev/trusted_store" with shell script as first step. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Wang, Arron <arron.wang@intel.com>	2024-08-12 16:36:54 +08:00
ChengyuZhu6	94347e2537	agent:config: Support secure_storage_integrity option for trusted storage Enabling secure storage integrity for trusted storage makes initialization take longer, so it is NOT enabled by default; this config option lets users enable it if they need stricter security. Fixes: #8142 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Wang, Arron <arron.wang@intel.com>	2024-08-12 16:36:54 +08:00
GabyCT	775f6bdc5c	Merge pull request #10142 from GabyCT/topic/updatestress tests: Update ubuntu image for stress Dockerfile	2024-08-09 16:11:35 -06:00
Gabriela Cervantes	5e5fc145cd	tests: Update ubuntu image for stress Dockerfile This PR updates the ubuntu image for stress Dockerfile. The main purpose is to have a more updated image compared with the one that is in libpod which has not been updated in a while. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-09 15:29:10 +00:00
Steve Horsman	e4c023a9fa	Merge pull request #10140 from stevenhorsman/kata-version-in-artefact-version ci: cache: Include kata version in artefact versions	2024-08-09 11:37:09 +01:00
Fabiano Fidêncio	44b08b84b0	Merge pull request #10113 from Freax13/fix/no-scsi-off qemu: don't emit scsi parameter	2024-08-08 16:23:36 +02:00
stevenhorsman	b6a3a3f8fe	ci: cache: Include kata version in artefact versions - At the moment we aren't factoring in the kata version on our caches, so it means that when we bump this just before release, we don't rebuild components that pull in the VERSION content, so the release build ends up with incorrect versions in its binaries Fixes: #10092 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-08-08 14:58:58 +01:00
GabyCT	584d7a265e	Merge pull request #10127 from GabyCT/topic/execimage tests:k8s: Update image in kubectl debug for the exec host function	2024-08-07 17:00:52 -06:00
Archana Shinde	1012449141	Merge pull request #10129 from hex2dec/qemu-aio-native tools: Support for building qemu with linux aio	2024-08-07 14:32:52 -07:00
Archana Shinde	a6a736eeaf	Merge pull request #10089 from amshinde/enable-nerdctl-clh ci: Enable nerdctl tests for clh	2024-08-07 12:13:00 -07:00
Wainer dos Santos Moschetta	374405aed1	workflows/run-k8s-tests-on-amd64: remove 'instance' from matrix The jobs are all executed on ubuntu-22.04 so it's invariant and can be removed from the matrix (this will shrink the jobs names). Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-07 16:00:39 -03:00
Wainer dos Santos Moschetta	d11ce129ac	workflows: merge run-k8s-tests-on-garm and run-k8s-tests-with-crio-on-garm Created the run-k8s-tests-on-amd64.yaml which is a merge of run-k8s-tests-on-garm.yaml and run-k8s-tests-with-crio-on-garm.yaml ps: renamed the job from 'run-k8s-tests' to 'run-k8s-tests-on-amd64' so it is easier to find on Github UI and be distinguished from s390x, ppc64le, etc... Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-07 15:50:43 -03:00
Wainer dos Santos Moschetta	ed0732c75d	workflows: migrate run-k8s-tests-with-crio-on-garm to free runners Switch to Github managed runners just like the run-k8s-tests-on-garm workflow. See: #9940 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-07 15:20:42 -03:00
Wainer dos Santos Moschetta	3d053a70ab	workflows: migrate run-k8s-tests-on-garm to free runners Switched to Github managed runners. The instance_type parameter was removed and K8S_TEST_HOST_TYPE is set to "all" which combines the tests of "small" and "normal". This way it will reduce the number of jobs by half. See: #9940 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-07 15:20:42 -03:00
Wainer dos Santos Moschetta	dfb92e403e	tests/k8s: add "deploy-kata"/"cleanup" actions to gh-run.sh These new "kata-deploy" and "cleanup" actions are equivalent to "kata-deploy-garm" "cleanup-garm", respectively, and should be used on the workflows being migrated from GARM to Github's managed runners. Eventually "kata-deploy-garm" and "cleanup-garm" won't be used anymore then we will be able to remove them. See: #9940 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-08-07 15:20:23 -03:00
Zhiwei Huang	7270a7ba48	tools: Fix container image build warning All commands within the Dockerfile should use the same casing (either upper or lower).[1] [1]: https://docs.docker.com/reference/build-checks/consistent-instruction-casing/ Signed-off-by: Zhiwei Huang <ai.william@outlook.com>	2024-08-07 15:49:01 +08:00
Dan Mihai	2da77c6979	Merge pull request #10068 from burgerdev/genpolicy-test genpolicy: add crate-scoped integration test	2024-08-06 16:10:46 -07:00
GabyCT	fb166956ab	Merge pull request #10132 from fidencio/topic/support-image-pull-with-nerdctl runtime: image-pull: Make it work with nerdctl	2024-08-06 15:33:40 -06:00
Gabriela Cervantes	d0ca43162d	tests:k8s: Update image in kubectl debug for the exec host function This PR updates the image that we are using in the kubectl debug command as part of the exec host function, as the current alpine image does not allow creating a temporary file, for example, and causes random kubernetes failures. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-06 21:13:46 +00:00
Fabiano Fidêncio	63802ecdd9	Merge pull request #9880 from zvonkok/helm-chart kata-deploy: Add Helm Chart	2024-08-06 22:55:31 +02:00
Archana Shinde	ba884aac13	ci: Enable nerdctl tests for clh A recent fix should resolve some of the issues seen earlier with clh with the go runtime. Enabling this test to check if the issue is still seen. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-08-06 10:41:42 -07:00
Fabiano Fidêncio	f33f2d09f7	runtime: image-pull: Make it work with nerdctl Our code for handling images being pulled inside the guest relies on a containerType ("sandbox" or "container") being set as part of the container annotations, which is done by the CRI Engine being used, and depending on the used CRI Engine we check for a specific annotation related to the image-name, which is then passed to the agent. However, when running kata-containers without kubernetes, specifically when using `nerdctl`, none of those annotations are set at all. One thing that we can do to allow folks to use `nerdctl`, however, is to take advantage of the `--label` flag, and document on our side that users must pass `io.kubernetes.cri.image-name=$image_name` as part of the label. By doing this, and changing our "fallback" so we can always look for such annotation, we ensure that nerdctl will work when using the nydus snapshotter, with kata-containers, to perform image pulling inside the pod sandbox / guest. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-06 17:07:45 +02:00
Zvonko Kaiser	8d9bec2e01	ci: add reset_runtime to cleanup Adding reset_cleanup to cleanup action so that it is done automatically without the need to run yet another DS just to reset the runtime. This is now part of the lifecycle hook when issuing kata-deploy.sh cleanup Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-08-06 11:57:04 +02:00
Zvonko Kaiser	1221ab73f9	ci: make cleanup_kata_deploy really simple Remove the unneeded logic for cleanup; the values are encapsulated in the deployed helm release Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-08-06 11:57:04 +02:00
Zvonko Kaiser	51690bc157	ci: Use helm to deploy kata-deploy Rather than modifying the kata-deploy scripts, let's use Helm and create a values.yaml that can be used to render the final templates Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-08-06 11:57:04 +02:00
Zvonko Kaiser	94b3348d3c	kata-deploy: Add Helm Chart For easier handling of kata-deploy we can leverage a Helm chart to get rid of all the base and overlays for the various components Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-08-06 11:57:04 +02:00
Zhiwei Huang	d455883b46	tools: Support for building qemu with linux aio The kata containers hypervisor qemu configuration supports setting block_device_aio="native", but the kata static build of qemu does not add the linux aio feature. The libaio-dev library is a necessary dependency for building qemu with linux aio. Fixes: #10130 Signed-off-by: Zhiwei Huang <ai.william@outlook.com>	2024-08-06 14:30:45 +08:00
Markus Rudy	69535e5458	genpolicy: add crate-scoped integration test Provides a test runner that generates a policy and validates it with canned requests. The initial set of test cases is mostly for illustration and will be expanded incrementally. In order to enable both cross-compilation on Ubuntu test runners as well as native compilation on the Alpine tools builder, it is easiest to switch to the vendored openssl-src variant. This builds OpenSSL from source, which depends on Perl at build time. Adding the test to the Makefile makes it execute in CI, on a variety of architectures. Building on ppc64le requires a newer version of the libz-ng-sys crate. Fixes: #10061 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-08-05 11:52:01 +02:00
Markus Rudy	4d1416529d	genpolicy: fix clippy v1.78.0 warnings cargo clippy has two new warnings that need addressing: - assigning_clones These were fixed by clippy itself. - suspicious_open_options I added truncate(false) because we're opening the file for reading. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-08-05 11:48:30 +02:00
Fabiano Fidêncio	43dca8deb4	Merge pull request #10121 from microsoft/saulparedes/add_version_flag genpolicy: add --version flag	2024-08-03 21:22:10 +02:00
Fabiano Fidêncio	3b2173c87a	Merge pull request #10124 from fidencio/topic/ci-enable-encrypted-image-tests-for-tees ci: Enable encrypted image tests for TEEs	2024-08-03 11:39:51 +02:00
Fabiano Fidêncio	89f1581e54	ci: Enable encrypted image tests for TEEs After experimenting a little bit with those tests, they seem to be passing on all the available TEE machines. With this in mind, let's just enable them for those machines. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-03 09:27:32 +02:00
Fabiano Fidêncio	3b896cf3ef	Merge pull request #10125 from fidencio/topic/un-break-ci ci: Remove jobs that are not running	2024-08-03 09:27:04 +02:00
Fabiano Fidêncio	62a086937e	ci: Remove jobs that are not running When re-enabling those we'll need a smart way to do so, as this limit of 20 workflows referenced is just ... weird. However, for now, it's more important to add the jobs related to the new platforms than keep the ones that are actively disabled. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-03 09:24:05 +02:00
GabyCT	76af5a444b	Merge pull request #10075 from microsoft/saulparedes/hooks genpolicy: reject create custom hook settings	2024-08-02 15:36:34 -06:00
GabyCT	aadde2c25b	Merge pull request #10120 from kata-containers/fix_metrics_json_results_file Fix metrics json results file	2024-08-02 11:29:02 -06:00
Fabiano Fidêncio	b93a0642e0	Merge pull request #10123 from fidencio/topic/re-enable-arm-ci ci: re-enable arm CI	2024-08-02 17:48:35 +02:00
Dan Mihai	2628b34435	Merge pull request #10098 from microsoft/danmihai1/allow-failing agent: fix the AllowRequestsFailingPolicy functionality	2024-08-02 08:42:47 -07:00
GabyCT	8da5f7a72f	Merge pull request #10102 from ChengyuZhu6/fix-debug tests: Fix error with `kubectl debug`	2024-08-02 09:25:13 -06:00
Fabiano Fidêncio	551e0a6287	Merge pull request #10116 from GabyCT/topic/kbsdependencies tests: kbs: Add missing dependencies to install kbs cli	2024-08-02 14:22:28 +02:00
Fabiano Fidêncio	ed57ef0297	ci: aarch64: Enable builders as part of the CI As we have new runners added, let's enable the builders so we can prevent build failures happening after something gets merged. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-02 14:13:53 +02:00
Fabiano Fidêncio	388b5b0e58	Revert "ci: Temporarily remove arm64 builds" This reverts commit `e9710332e7`, as there are now 2 arm64-builders (to be expanded to 4 really soon). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-02 13:53:50 +02:00
Fabiano Fidêncio	08be9c3601	Revert "ci: Temporarily remove arm64 builds -- part II" This reverts commit `c5dad991ce`, as there are now 2 arm64-builders (to be expanded to 4 really soon). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-02 13:52:53 +02:00
Tom Dohrmann	322c80e7c8	qemu: don't emit scsi parameter This parameter has been deprecated for a long time and QEMU 9.1.0 finally removes it. Fixes: kata-containers#10112 Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>	2024-08-02 07:30:39 +02:00
Tom Dohrmann	b7999ac765	runtime-rs: don't emit scsi parameter for block devices This parameter has been deprecated for a long time and QEMU 9.1.0 finally removes it. Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>	2024-08-02 07:30:23 +02:00
Fabiano Fidêncio	4183680bc3	Merge pull request #10107 from fidencio/topic/rotate-journal-logs-every-run tests: k8s: Rotate & cleanup journal for every run	2024-08-02 07:27:10 +02:00
Fabiano Fidêncio	302e02aed8	Merge pull request #10114 from fidencio/topic/kata-manager-configure-qemu-and-ovmf-for-tdx kata-manager: Ensure distro specific TDX config is set	2024-08-02 07:24:57 +02:00
Saul Paredes	194cc7ca81	genpolicy: add --version flag - Add --version flag to the genpolicy tool that prints the current version - Add version.rs.in template to store the version information - Update makefile to autogenerate version.rs from version.rs.in - Add license to Cargo.toml Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-08-01 17:18:17 -07:00
David Esparza	dcd0c0b269	metrics: Remove duplicated headers from results file. This PR removes duplicated entries (vcpus count, and available memory), from onednn and openvino results files. Fixes: #10119 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-08-01 18:11:06 -06:00
Dan Mihai	9e99329bef	genpolicy: reject create sandbox hooks Reject CreateSandboxRequest hooks, because these hooks may be used by an attacker. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-08-01 16:58:35 -07:00
ChengyuZhu6	2eac8fa452	tests: Fix error with `kubectl debug` The issue is similar to #10011. The root cause is that tty and stderr are set to true at same time in containerd: #10031. Fixes: #10081 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-02 07:32:30 +08:00
David Esparza	1e640ec3a6	metrics: fix parsing json results file. This PR encloses the search string for 'default_vcpus =' and 'default_memory =' with double quotes in order to parse the precise values, which are included in the kata configuration file. Fixes: #10118 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-08-01 17:05:03 -06:00
Dan Mihai	c2a55552b2	agent: fix the AllowRequestsFailingPolicy functionality 1. Use the new value of AllowRequestsFailingPolicy after setting up a new Policy. Before this change, the only way to enable AllowRequestsFailingPolicy was to change the default Policy file, built into the Guest rootfs image. 2. Ignore errors returned by regorus while evaluating Policy rules, if AllowRequestsFailingPolicy was enabled. For example, trying to evaluate the UpdateInterfaceRequest rules using a policy that didn't define any UpdateInterfaceRequest rules results in a "not found" error from regorus. Allow AllowRequestsFailingPolicy := true to bypass that error. 3. Add simple CI test for AllowRequestsFailingPolicy. These changes are restoring functionality that was broken recently by commit `df23eb09a6`. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-08-01 22:37:18 +00:00
Fabiano Fidêncio	66b0305eed	Merge pull request #10117 from fidencio/topic/temporarily-remove-arm-nightly-jobs-part-2 ci: Temporarily remove arm64 builds -- part II	2024-08-01 23:06:46 +02:00
GabyCT	20a88b6470	Merge pull request #10099 from GabyCT/topic/fixmemo metrics: Update memory tests to use grep -F	2024-08-01 13:48:36 -06:00
Fabiano Fidêncio	aef7da7bc9	tests: k8s: Rotate & cleanup journal for every run This will help to avoid huge logs, and allow us to debug issues in a better way. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-01 21:36:57 +02:00
Fabiano Fidêncio	c5dad991ce	ci: Temporarily remove arm64 builds -- part II Let's remove what we commented out, as publish manifest complains: ``` Created manifest list quay.io/kata-containers/kata-deploy-ci:kata-containers-latest ./tools/packaging/release/release.sh: line 146: --amend: command not found ``` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-01 20:43:28 +02:00
Fabiano Fidêncio	5ec11afc21	Merge pull request #10111 from fidencio/topic/temporarily-remove-arm-nightly-jobs ci: Temporarily remove arm64 builds	2024-08-01 19:50:07 +02:00
Gabriela Cervantes	7454908690	metrics: Update memory tests to use grep -F This PR updates the memory tests like fast footprint to use grep -F instead of fgrep as this command has been deprecated. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-01 17:20:57 +00:00
Gabriela Cervantes	d72cb8ccfc	tests: kbs: Add missing dependencies to install kbs cli This PR adds missing package dependencies to install kbs cli in a fresh new baremetal environment. This will avoid a failure when trying to run install-kbs-client. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-08-01 17:09:50 +00:00
Fabiano Fidêncio	bfd014871a	kata-manager: Ensure distro specific TDX config is set We've done something quite similar for kata-deploy, but I've noticed we forgot about the kata-manager counterpart. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-01 17:27:01 +02:00
Fabiano Fidêncio	e9710332e7	ci: Temporarily remove arm64 builds It's been a reasonable time that we're not able to even build arm64 artefacts. For now I am removing the builds as it doesn't make sense to keep running failing builds, and those can be re-enabled once we have arm64 machines plugged in that can be used for building the stuff, and maintainers for those machines. The `arm-jetson-xavier-nx-01` is also being removed from the runners. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-08-01 13:30:47 +02:00
Fabiano Fidêncio	c784fb6508	Merge pull request #10110 from ChengyuZhu6/bump-trustee version: bump trustee version	2024-08-01 07:34:38 +02:00
ChengyuZhu6	d35320472c	version: bump trustee version Bump trustee to the latest version to fix error with pulling busybox from dockerhub. Fixes: #10109 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-08-01 08:59:58 +08:00
Fupan Li	230aefc0da	Merge pull request #10070 from BbolroC/qemu-runtime-rs-k8s-s390x GHA: Run k8s e2e tests for qemu-runtime-rs on s390x	2024-07-31 18:41:11 +08:00
Chengyu Zhu	8e9f140ee0	Merge pull request #10080 from ChengyuZhu6/fix-coco-ci tests: add image check before running coco tests	2024-07-31 17:08:00 +08:00
Peng Tao	11e10647f9	Merge pull request #10104 from BbolroC/fix-zvsi-cleanup-s390x gha: Restore cleanup-zvsi for s390x	2024-07-31 16:23:26 +08:00
Chengyu Zhu	fc0f635098	Merge pull request #10101 from AdithyaKrishnan/main ci: Fix rate limit error by migrating busybox_image	2024-07-31 14:48:12 +08:00
ChengyuZhu6	2cfb32ac4d	version: bump nydus snapshotter to v0.13.14 bump nydus snapshotter to v0.13.14 to stabilize CIs. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-31 14:47:33 +08:00
ChengyuZhu6	41b7577f08	tests: add image check before running coco tests Currently, there are some issues with pulling images in CI, such as : https://github.com/kata-containers/kata-containers/actions/runs/10109747602/job/27959198585 This issue is caused by switching between different snapshotters for the same image in some scenarios. To resolve it, we can check existing images to ensure all content is available locally before running tests. Fixes: #10029 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-31 14:47:33 +08:00
Hyounggyu Choi	e135d536c5	gha: Restore cleanup-zvsi for s390x In #10096, a cleanup step for kata-deploy is removed by mistake. This leads to a cleanup error in the following `Complete job` step. This commit restores the removed step to resolve the current CI failure on s390x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-31 06:42:16 +02:00
Adithya Krishnan Kannan	fdf7036d5e	ci: Fix rate limit error by migrating busybox_image Changing the busybox_image from docker to quay to fix rate limit errors. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2024-07-30 22:32:22 -05:00
Hyounggyu Choi	c8a160d14a	Merge pull request #10096 from BbolroC/remove-pre-post-action-s390x gha: Eradicate {pre,post}-action steps for s390x runners	2024-07-30 22:30:05 +02:00
Hyounggyu Choi	8d529b960a	gha: Eradicate {pre,post}-action steps for s390x runners As suggested in #9934, the following hooks have been introduced for s390x runners: - ACTIONS_RUNNER_HOOK_JOB_STARTED - ACTIONS_RUNNER_HOOK_JOB_COMPLETED These hooks will perfectly replace the existing {pre,post}-action scripts. This commit wipes out all GHA steps for s390x where the actions are triggered. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-30 17:10:19 +02:00
Wainer Moschetta	528745fc88	Merge pull request #10052 from nubificus/feat_fix_qemu_after_8070 runtime-rs: Fix QEMU backend for runtime-rs	2024-07-30 11:00:14 -03:00
Fupan Li	de22b3c4bf	Merge pull request #10024 from lifupan/main runtime-rs: enable dragonball hypervisor support initrd	2024-07-30 16:00:42 +08:00
Fupan Li	e3f0d2a751	runtime-rs: enable dragonball hypervisor support initrd enable the dragonball support initrd. Fixes: #10023 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-07-30 14:50:24 +08:00
Fupan Li	4fbf9d67a5	Merge pull request #10043 from lifupan/fix_sandbox runtime-rs : fix the issue of stop sandbox	2024-07-29 09:22:26 +08:00
Fabiano Fidêncio	949ffd146a	Merge pull request #10083 from microsoft/danmihai1/policy-tests tests: k8s: minor policy tests clean-up	2024-07-28 11:04:24 +02:00
Dan Mihai	3e348e9768	tests: k8s: rename hard-coded policy test script Rename k8s-exec-rejected.bats to k8s-policy-hard-coded.bats, getting ready to test additional hard-coded policies using the same script. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-26 20:14:05 +00:00
Dan Mihai	7b691455c2	tests: k8s: hard-coded policy for any platform Users of AUTO_GENERATE_POLICY=yes: - Already tested auto-generated policy on any platform. - Will be able to test hard-coded policy too on any platform, after this change. CI continues to test hard-coded policies just on the platforms listed here, but testing those policies locally (outside of CI) on other platforms can be useful too. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-26 19:30:03 +00:00
Dan Mihai	83056457d6	tests: k8s-policy-pod: avoid word splitting Avoid potential word splitting when using an array of command args. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-26 18:55:52 +00:00
Dan Mihai	5546ce4031	Merge pull request #10069 from microsoft/danmihai1/exec-args genpolicy: validate each exec command line arg	2024-07-26 11:39:44 -07:00
Fabiano Fidêncio	b0b04bd2f3	Merge pull request #10078 from fidencio/topic/increase-rootfs-confidential-slash-run-to-50-percent tee: osbuilder: Set /run to use 50% of the image with systemd	2024-07-26 18:37:41 +02:00
Anastassios Nanos	d11657a581	runtime-rs: Remove unused env vars from build Since we can't find a homogeneous value for the resource/cgroup management of multiple hypervisors, and we have decoupled the env vars in the Makefile, we don't need the generic ones. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2024-07-26 14:03:50 +00:00
Anastassios Nanos	3f58ea9258	runtime-rs: Decouple Makefile env VARS To avoid overriding env vars when multiple hypervisors are available, we add per-hypervisor vars for static resource management and cgroups handling. We reflect that in the relevant config files as well. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2024-07-26 14:02:35 +00:00
Fabiano Fidêncio	5f146e10a1	osbuilder: Add logs for setting up systemd based stuff This helps us to debug any kind of changes. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-07-26 14:22:45 +02:00
Alex Carter	4a8fb475be	tee: osbuilder: Set /run to use 50% of the image with systemd Let's ensure at least 50% of the memory is used for /run, as systemd by default forces it to be 10%, which is way too small even for very small workloads. This is only done for the rootfs-confidential image. Fixes: kata-containers#6775 Signed-off-by: Alex Carter <Alex.Carter@ibm.com> Signed-off-by: Wang, Arron <arron.wang@intel.com> Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.co Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-07-26 14:22:38 +02:00
Chengyu Zhu	2a9ed19512	Merge pull request #9988 from huoqifeng/annotation initdata: add initdata annotation in hypervisor config	2024-07-26 19:59:45 +08:00
Fupan Li	c51ba73199	container: fix the issue of send signal to process It's better to check the container's status before trying to send a signal to it, since there's no need to send a signal when the container is stopped. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-07-26 19:23:43 +08:00
Fupan Li	e156516bde	sandbox: fix the issue of stop sandbox Since stop sandbox can be called from multiple paths, it's better to set and check the sandbox's state. Fixes: #10042 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-07-26 19:23:34 +08:00
Qi Feng Huo	a113fc93c8	initdata: fix unit test code for initdata annotation Added ut code for initdata annotation Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>	2024-07-26 18:24:05 +08:00
Qi Feng Huo	8d61029676	initdata: add unit test code for initdata annotation Added ut code for initdata annotation Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>	2024-07-26 14:20:57 +08:00
Qi Feng Huo	b80057dfb5	initdata: Merge branch 'main' into annotation - Merge branch 'main' into feature branch annotation	2024-07-26 14:01:04 +08:00
Archana Shinde	d7637f93f9	Merge pull request #9899 from amshinde/multiple-networks-fix Fix issue while adding multiple networks with nerdctl	2024-07-25 11:56:27 -07:00
Dan Mihai	a37f10fc87	genpolicy: validate each exec command line arg Generate policy that validates each exec command line argument, instead of joining those args and validating the resulting string. Joining the args ignored the fact that some of the args might include space characters. The older format from genpolicy-settings.json was similar to: "ExecProcessRequest": { "commands": [ "sh -c cat /proc/self/status" ], "regex": [] }, That format will not be supported anymore. genpolicy will detect if its users are trying to use the older "commands" field and will exit with a relevant error message in that case. The new settings format is: "ExecProcessRequest": { "allowed_commands": [ [ "sh", "-c", "cat /proc/self/status" ] ], "regex": [] }, Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-25 16:57:17 +00:00
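The difference between the two settings formats described in a37f10fc87 can be illustrated with a minimal Python sketch. The function names `matches_joined` and `matches_per_arg` are hypothetical and are not part of genpolicy; they only show why comparing a space-joined string cannot tell apart argument lists that differ in where the spaces fall.

```python
# Hypothetical illustration only; genpolicy itself is not implemented this way.

def matches_joined(allowed: list[str], requested: list[str]) -> bool:
    """Old-style check: compare the space-joined command lines."""
    return " ".join(allowed) == " ".join(requested)


def matches_per_arg(allowed: list[str], requested: list[str]) -> bool:
    """New-style check: compare the argument lists element by element."""
    return allowed == requested


allowed = ["sh", "-c", "cat /proc/self/status"]
# Same text once joined, but a different argv: "cat" and the path are separate args.
requested = ["sh", "-c", "cat", "/proc/self/status"]

print(matches_joined(allowed, requested))   # True: the join hides the difference
print(matches_per_arg(allowed, requested))  # False: the argv lists differ
```

Keeping each argument as its own list element is what the `allowed_commands` format quoted above does.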
Dan Mihai	0f11384ede	tests: k8s-policy-pod: exec_command clean-up Use "${exec_command[@]}" for calling both: - add_exec_to_policy_settings - kubectl exec Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-25 16:55:03 +00:00
Dan Mihai	95b78ecaa9	tests: k8s-exec: reuse sh_command variable Reuse sh_command variable instead of repeating "sh". Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-25 16:50:34 +00:00
Alex Lyn	abb0a2659a	Merge pull request #9944 from Apokleos/align-ocispec-rs Align kata oci spec with oci-spec-rs	2024-07-25 19:36:52 +08:00
Alex Lyn	bb2b60dcfc	oci: Delete the kata oci spec It's time to delete the kata oci spec implemented just for kata, as we have already aligned the OCI Spec with oci-spec-rs. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	b56313472b	agent: Align agent OCI spec with oci-spec-rs Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	882385858d	runtime-rs: Align oci spec in runtime-rs with oci-spec-rs This commit aligns the OCI Spec implementation in runtime-rs with the OCI Spec definitions and related operations provided by oci-spec-rs. Key changes: (1) Leveraged oci-spec-rs to align Kata Runtime OCI Spec with the official OCI Spec. (2) Introduced runtime-spec to separate OCI Spec definitions from Kata-specific State data structures. (3) Preserved the original code logic and implementation as much as possible. (4) Made minor code adjustments to adhere to Rust programming conventions. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	bf813f85f2	runk: Align oci spec with oci-spec-rs Utilized oci-spec-rs to align OCI Spec structures and data representations in runk with the OCI Spec. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	b3eab5ffea	genpolicy: Align agent-ctl OCI Spec with oci-spec-rs Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	c500fd5761	agent-ctl: Align agent-ctl OCI Spec with oci-spec-rs This commit aligns the OCI Spec used within agent-ctl with the oci-spec-rs definition and operations. This enhancement ensures that agent-ctl adheres to the latest OCI standards and provides a more consistent and reliable experience for managing container images and configurations. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	faffee8909	libs: update Cargo config and lock file update Cargo.toml and Cargo.lock for adding runtime-spec Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:47:01 +08:00
Alex Lyn	8b5499204d	protocols: Reimplement OCI Spec to TTRPC Data Translation This commit transitions the data implementation for OCI Spec from kata-oci-spec to oci-spec-rs. While both libraries adhere to the OCI Spec standard, significant implementation details differ. To ensure data exchange through TTRPC services, this commit reimplements necessary data conversion logic. This conversion bridges the gap between oci-spec-rs data and TTRPC data formats, guaranteeing consistent and reliable data transfer across the system. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 17:46:07 +08:00
Anastassios Nanos	cda00ed176	runtime-rs: Add FC specific KERNELPARAMS To avoid overriding KERNELPARAMS for other hypervisors, add FC-specific KERNELPARAMS. Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>	2024-07-25 08:53:57 +00:00
Hyounggyu Choi	d8cac9f60b	GHA: Run k8s e2e tests for qemu-runtime-rs on s390x This commit adds a new CI job for qemu-runtime-rs to the existing zvsi Kubernetes test matrix. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-25 08:11:49 +02:00
Alex Lyn	4e003a2125	Merge pull request #10058 from Apokleos/enhance-vsock-connect runtime-rs: enhance debug info for agent connect.	2024-07-25 11:29:04 +08:00
Alex Lyn	36385a114d	runtime-rs: enhance debug info for agent connect. We need friendlier logs for debugging agent connection cases when kata pods fail. Fixes #10057 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-25 08:51:57 +08:00
Dan Mihai	c3adeda3cc	Merge pull request #10051 from microsoft/danmihai1/exec-variable-reuse tests: k8s: reuse policy exec variable	2024-07-24 14:58:40 -07:00
Aurélien Bombo	f08b594733	Merge pull request #9576 from microsoft/saulparedes/support_env_from genpolicy: Add support for envFrom	2024-07-24 13:39:54 -07:00
GabyCT	79edf2ca7d	Merge pull request #10054 from GabyCT/topic/docnydus docs: Update url links in kata nydus document	2024-07-24 14:08:44 -06:00
Archana Shinde	64d6293bb0	tests: Add nerdctl test for testing with multiple networks Add integration test that creates two bridge networks with nerdctl and verifies that Kata container is brought up while passing the networks created. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-24 10:45:56 -07:00
Archana Shinde	49fbae4fb1	agent: Wait for interface in update_interface For nerdctl and docker runtimes, network is hot-plugged instead of cold-plugged. While this change was made in the runtime, we did not have the agent waiting for the device to be ready. On some systems, the device hotplug could take some time causing the update_interface rpc call to fail as the interface is not available. Add a watcher for the network interface based on the pci-path of the network interface. Note, waiting on the device based on name is really not reliable especially in case multiple networks are hotplugged. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-24 10:45:56 -07:00
Dan Mihai	fecb70b85e	tests: k8s: reuse policy exec variable Share a single test script variable for both: - Allowing a command to be executed using Policy settings. - Executing that command using "kubectl exec". Fixes: #10014 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-24 17:42:04 +00:00
Fabiano Fidêncio	162a6b44f6	Merge pull request #10063 from ChengyuZhu6/fix-ci-timeout gha: Increase timeout to run CoCo tests	2024-07-24 15:14:35 +02:00
Pavel Mores	dd1e09bd9d	runtime-rs: add experimental support for memory hotunplugging to qemu-rs Hotunplugging memory is not guaranteed or even likely to work. Nevertheless I'd really like to have this code in for tests and observation. It shouldn't hurt, from experience so far. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-07-24 13:22:41 +02:00
Pavel Mores	3095b65ac3	runtime-rs: support hotplugging memory in QemuInner The bulk of this implementation is simple though tedious sanity checks, alignment computations and logging. Note that before any hotplugging, we query qemu directly for the current size of hotplugged memory. This ensures that any request to resize memory will be properly compared to the actual already available amount and only the necessary amount will be added. Note also that we borrow checked_next_multiple_of() from CH implementation. While this might look unclean it's just a rather temporary solution since an equivalent function will apparently be part of std soon, likely the upcoming 1.75. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-07-24 13:22:41 +02:00
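The alignment computation mentioned in 3095b65ac3 amounts to rounding a requested size up to the next multiple of the guest memory block size. A minimal Python sketch follows; the function name `next_multiple_of` and the 128 MiB block size are assumptions chosen for illustration and are not taken from the runtime-rs code. Python integers do not overflow, so only the zero-multiple case returns `None` here, whereas a checked Rust helper would also report overflow.

```python
from typing import Optional


def next_multiple_of(value: int, multiple: int) -> Optional[int]:
    """Round value up to the nearest multiple; return None if multiple is zero."""
    if multiple == 0:
        return None
    remainder = value % multiple
    if remainder == 0:
        return value  # already aligned
    return value + (multiple - remainder)


MIB = 1024 * 1024
memblock_size = 128 * MIB  # assumed guest memory block size, illustration only

aligned = next_multiple_of(300 * MIB, memblock_size)
print(aligned // MIB)  # 384
```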
Pavel Mores	4a1c828bf8	runtime-rs: support hotplugging memory in Qmp The algorithm is rather simple - we query qemu for existing memory devices to figure out the index of the one we're about to add. Then we add a backend object and a corresponding frontend device. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-07-24 13:22:41 +02:00
Pavel Mores	0e0b146b87	runtime-rs: support storage & retrieval of guest memblock size in qemu-rs This will be used for ensuring that hotplugged memory block sizes are properly aligned. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-07-24 13:22:41 +02:00
Alex Lyn	efb7390357	kata-sys-utils: align OCI Spec with oci-spec-rs Align the OCI spec and fix warnings to make clippy happy. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-24 14:38:48 +08:00
Alex Lyn	012029063c	runtime-spec: Introduce runtime-spec for Container State As part of aligning the Kata OCI Spec with oci-spec-rs, the concept of "State" falls outside the scope of the OCI Spec itself. While we'll retain the existing code for State management for now, to improve code organization and clarity, we propose moving the State-related code from the oci/ dir to a dedicated directory named runtime-spec/. This separation will be completed in subsequent commits with the removal of the oci/ directory. Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-24 14:38:30 +08:00
Zvonko Kaiser	a388d2b8d4	Merge pull request #9919 from zvonkok/ubuntu-dockerfile gpu: rootfs ubuntu build expansion	2024-07-24 08:05:54 +02:00
ChengyuZhu6	2b44e9427c	gha: Increase timeout to run CoCo tests This PR increases the timeout for running the CoCo tests to avoid random failures. These failures occur when the action `Run tests` times out after 30 minutes, causing the CI to fail. Fixes: #10062 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-24 12:31:38 +08:00
GabyCT	b408cc1694	Merge pull request #10060 from GabyCT/topic/fgreptest metrics: Update launch times to use grep -F	2024-07-23 17:23:14 -06:00
Gabriela Cervantes	0e5489797d	docs: Update url links in kata nydus document This PR updates the url links in the kata nydus document. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-23 17:49:12 +00:00
Gabriela Cervantes	3d17a7038a	metrics: Update launch times to use grep -F This PR updates the metrics launch times to use grep -F instead of fgrep as this command has been deprecated. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-23 17:13:52 +00:00
Zvonko Kaiser	941577ab3b	gpu: rootfs ubuntu build expansion For the GPU build we need go/rust and some other helpers to build the rootfs. Always use versions.yaml for the correct and working Rust and golang version Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-23 14:31:35 +00:00
Steve Horsman	d69950e5c6	Merge pull request #10053 from stevenhorsman/release-env-var ci: cache: Pass through RELEASE env	2024-07-22 21:53:20 +01:00
Dan Mihai	f26d595e5d	Merge pull request #9910 from microsoft/saulparedes/set_policy_rego_via_env tools: Allow setting policy rego file via	2024-07-22 11:00:30 -07:00
stevenhorsman	66f6ec2919	ci: cache: Pass through RELEASE env In kata-deploy-binaries.sh we want to understand if we are running as part of a release, so we need to pass through the RELEASE env from the workflow, which I missed in https://github.com/kata-containers/kata-containers/pull/9550 Fixes: #9921 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-22 16:39:35 +01:00
Zvonko Kaiser	5765b6e062	Merge pull request #9920 from zvonkok/initrd-builer gpu: rootfs/initrd build init	2024-07-22 15:06:49 +02:00
Zvonko Kaiser	73bcb09232	Merge pull request #9968 from zvonkok/kernel-gpu-dragonball-6.1.x dragonball: kernel gpu dragonball 6.1.x	2024-07-22 13:03:14 +02:00
Zvonko Kaiser	3029e6e849	gpu: rootfs/initrd build init Initramfs expects /init, create symlink only if ${ROOTFS}/init does not exist Init may be provided by other packages, e.g. systemd or GPU initrd/rootfs Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-22 10:19:05 +00:00
Saul Paredes	b7a184a0d8	rootfs: Allow AGENT_POLICY_FILE to be an absolute path Don't set AGENT_POLICY_FILE as $script_dir may change Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-21 14:57:41 -07:00
Alex Lyn	67466aa27f	kata-types: do alignment of oci-spec for kata-types Fixes #9766 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-21 22:54:43 +08:00
Hyounggyu Choi	c774cd6bb0	Merge pull request #10031 from ChengyuZhu6/fix-log-contain-tdx tests: Fix missing log on TDX	2024-07-20 07:26:08 +02:00
ChengyuZhu6	6ea6e85f77	tests: Re-enable authenticated image tests on tdx Try to re-enable authenticated image tests on tdx. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-20 12:10:02 +08:00
ChengyuZhu6	3476fb481e	tests: Fix missing log on TDX Currently, we have found that `assert_logs_contain` does not work on TDX. We manually located the specific log, but it fails to get the log using `kubectl debug`. The error found in CI is: ``` warning: couldn't attach to pod/node-debugger-984fee00bd70.jf.intel.com-pdgsj, falling back to streaming logs: error stream protocol error: unknown error ``` Upon debugging the TDX CI machine, we found an error in containerd: ``` Attach container from runtime service failed" err="rpc error: code = InvalidArgument desc = tty and stderr cannot both be true" containerID="abc8c7a546c5fede4aae53a6ff2f4382ff35da331bfc5fd3843b0c8b231728bf" ``` We believe this is the root cause of the test failures in TDX CI. Therefore, we need to ensure that tty and stderr are not set to true at the same time. Fixes: #10011 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Signed-off-by: Wang, Arron <arron.wang@intel.com>	2024-07-20 12:10:01 +08:00
Steve Horsman	7dd560f07f	Merge pull request #9620 from l8huang/kernel Add kernel config for NVIDIA DPU/ConnectX adapter	2024-07-19 23:16:51 +01:00
Dan Mihai	3127dbb3df	Merge pull request #10035 from microsoft/danmihai1/k8s-credentials-secrets tests: k8s-credentials-secrets: policy for second pod	2024-07-19 12:44:21 -07:00
Saul Paredes	2681fc7eb0	genpolicy: Add support for envFrom This change adds support for the `envFrom` field in the `Pod` resource Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-19 09:53:58 -07:00
GabyCT	be2d4719c2	Merge pull request #10040 from kata-containers/fix_blogbench_midvalues metrics: update avg reference values for blogbench.	2024-07-19 09:51:29 -06:00
Zvonko Kaiser	8eaa2f0dc8	dragonball: Add GPU support Build a GPU flavoured dragonball kernel Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-19 14:48:05 +00:00
Dan Mihai	44e443678d	Merge pull request #9835 from microsoft/saulparedes/test_policy_on_sev gha: enable autogenerated policy testing on SEV and SEV-SNP	2024-07-19 07:46:01 -07:00
Greg Kurz	dc97f3f540	Merge pull request #10045 from lifupan/cleanup_container runtime-rs: container: fix the issue of missing cleanup container	2024-07-19 16:36:04 +02:00
Alex Lyn	d0dc67bb96	Merge pull request #8597 from amshinde/vfio-hotplug-support Implement hotplug support for physical endpoints	2024-07-19 13:41:11 +08:00
Lei Huang	20f6979d8f	build: add kernel config for Nvidia DPU/ConnectX adapter With Nvidia DPU or ConnectX network adapter, VF can do VFIO passthrough to guest VM in `guest-kernel` mode. In the guest kernel, the adapter's driver is required to claim the VFIO device and create network interface. Signed-off-by: Lei Huang <leih@nvidia.com>	2024-07-18 22:29:16 -07:00
Fupan Li	8a2f7b7a8c	container: fix the issue of missing cleanup container When creating a container fails, the container should be cleaned up so that no device/resource is left behind. Fixes: #10044 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-07-19 11:02:55 +08:00
ms-mahuber	ddff762782	tools: Allow setting policy rego file via environment variable * Set policy file via env var * Add restrictive policy file to kata-opa folder * Change restrictive policy file name * Change relative default path location * Add license headers Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-18 15:05:45 -07:00
David Esparza	60f52a4b93	metrics: update avg reference values for blogbench. This PR updates the Blogbench reference values for read and write operations used in the CI check metrics job. This is due to the update to version 1.2 of Blogbench. Fixes: #10039 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-07-18 15:47:14 -06:00
Greg Kurz	fc4357f642	Merge pull request #10034 from BbolroC/hide-repack_secure_image-from-test tests: Call repack_secure_image() in set_metadata_annotation()	2024-07-18 23:03:41 +02:00
Aurélien Bombo	ab6f37aa52	Merge pull request #10022 from microsoft/danmihai1/probes-and-lifecycle genpolicy: container.exec_commands args validation	2024-07-18 12:21:31 -07:00
Steve Horsman	256ab50f1a	Merge pull request #9959 from sprt/fix-ci-cleanup ci: cleanup: Ignore nonexisting resources	2024-07-18 19:23:48 +01:00
David Esparza	1fdc5c1183	Merge pull request #10028 from amshinde/upgrade-blogbench-1.2 metric: Upgrade blogbench to 1.2	2024-07-18 11:30:17 -06:00
Hyounggyu Choi	a7e4d3b738	tests: Call repack_secure_image() in set_metadata_annotation() It is not good practice to call repack_secure_image() from a bats file because the test code might not consider cases where `qemu-se` is used as `KATA_HYPERVISOR`. This commit moves the function call to set_metadata_annotation() if a key includes `kernel_params` and `KATA_HYPERVISOR` is set to `qemu-se`, allowing developers to focus on the test scenario itself. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-18 18:09:45 +02:00
Dan Mihai	035a42baa4	tests: k8s-credentials-secrets: policy for second pod Add policy to pod-secret-env.yaml from k8s-credentials-secrets.bats. Policy was already auto-generated for the other pod used by the same test (pod-secret.yaml). pod-secret-env.yaml was inconsistent, because it was taking advantage of the "allow all" policy built into the Guest image. Sooner or later, CI Guests for CoCo will not get the "allow all" policy built in anymore and pod-secret-env.yaml would have stopped working then. Note that pod-secret-env.yaml continues to use an "allow all" policy after these changes. #10033 must be solved before a more restrictive policy will be generated for pod-secret-env.yaml. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-18 15:03:57 +00:00
Hyounggyu Choi	d2ac01c862	Merge pull request #10032 from BbolroC/fix-image-authenticated-for-s390x tests: Rebuild secure boot image for guest-pull-image-authenticated for IBM SE	2024-07-18 17:00:18 +02:00
Hyounggyu Choi	6e7ee4bdab	tests: Rebuild secure image for guest-pull-image-authenticated on SE Since #9904 was merged, newly introduced tests for `k8s-guest-pull-image-authenticated.bats` have been failing on IBM SE (s390x). The agent fails to start because a kernel parameter cannot pass to the guest VM via annotation. To fix this, the boot image must be rebuilt with updated parameters. This commit adds the rebuilding step in create_pod_yaml_with_private_image() for `qemu-se`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-18 14:56:12 +02:00
Archana Shinde	1636c201f4	network: Implement network hotunplug for physical endpoints Similar to HotAttach, the HotDetach method signature for network endpoints needs to be changed as well to allow for the method to make use of device manager to manage the hot unplug of physical network devices. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-17 16:42:41 -07:00
Archana Shinde	c6390f2a2a	vfio: Introduce function to get vfio dev path This function will be later used to get the vfio dev path. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-17 16:42:41 -07:00
Archana Shinde	1e304e6307	network: Implement hotplug for physical endpoints Enable physical network interfaces to be hotplugged. For this, we need to change the signature of the HotAttach method to make use of Sandbox instead of Hypervisor. Similar approach was followed for Attach method, but this change was overlooked for HotAttach. The signature change is required in order to make use of device manager and receiver for physical network endpoints. Fixes: #8405 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-17 16:42:40 -07:00
Archana Shinde	2fef4bc844	vfio: use driver_override field for device binding. The current implementation for device binding using driver bind/unbind and new_id fails in the scenario when the physical device is not bound to a driver before assigning it to vfio. There exists an updated mechanism to accomplish the same that does not have the same issue as above. The driver_override field for a device allows us to specify the driver for a device rather than relying on the bound driver to provide a positive match of the device. It also has other advantages referenced here: https://patchwork.kernel.org/project/linux-pci/patch/1396372540.476.160.camel@ul30vt.home/ So use the updated driver_override mechanism for binding/unbinding a physical device/virtual function to vfio-pci. Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn> Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-17 16:42:40 -07:00
GabyCT	6aff5f300a	Merge pull request #10021 from GabyCT/topic/fixarchdoc docs: Update devmapper docs	2024-07-17 14:56:40 -06:00
Saul Paredes	57d2ded3e2	gha: enable autogenerated policy testing on SEV-SNP Enable autogenerated policy testing on SEV-SNP Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-17 13:32:06 -07:00
Archana Shinde	30e5e88ff1	metric: Upgrade blogbench to 1.2 Move to blogbench 1.2 version from 1.1. This version includes an important fix for the read_score test which was reported to be broken in the previous version. It essentially fixes this issue here: https://github.com/jedisct1/Blogbench/issues/4 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-07-17 11:32:09 -07:00
Steve Horsman	e5d5284761	Merge pull request #10026 from wainersm/release_370 release: Bump VERSION to 3.7.0	2024-07-17 18:43:51 +01:00
Wainer dos Santos Moschetta	6f7ab31860	release: Bump VERSION to 3.7.0 In preparation for the 3.7.0 release, bumped the version in VERSION file. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-07-17 14:19:44 -03:00
Saul Paredes	b3cc8b200f	gha: enable autogenerated policy testing on SEV Enable autogenerated policy testing on SEV Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-17 09:55:13 -07:00
Dan Mihai	f31c1b121e	Merge pull request #9812 from microsoft/saulparedes/test_policy_on_tdx gha: enable policy testing on TDX	2024-07-17 08:47:44 -07:00
Dan Mihai	449103c7bf	Merge pull request #10020 from microsoft/danmihai1/pod-security-context tests: fix ps command in k8s-security-context	2024-07-17 08:12:57 -07:00
Fabiano Fidêncio	b7051890af	Merge pull request #9722 from zvonkok/busybox-build deploy: Add busybox target	2024-07-17 13:47:15 +02:00
Steve Horsman	5ce2c1010a	Merge pull request #9904 from stevenhorsman/registry-authentication Support for registry authentication in guest pull	2024-07-17 10:48:38 +01:00
Fupan Li	65f2bfb8c4	Merge pull request #9967 from zvonkok/kernel-dragonball-6.1.x dragonball: kernel dragonball 6.1.x	2024-07-17 14:38:06 +08:00
Dan Mihai	0e86a96157	tests: fix ps command in k8s-security-context 1. Use a container image that supports "ps --user 1000 -f". 2. Execute that command using: sh -c "ps --user 1000 -f" instead of passing additional arguments to sh: sh -c ps --user 1000 -f Fixes: #10019 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-17 01:33:31 +00:00
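The two invocations contrasted in 0e86a96157 differ in how many arguments `sh` receives. A minimal sketch using Python's standard `shlex` module to mimic shell tokenisation shows this; the variable names are illustrative only and do not come from the test script.

```python
import shlex

# Tokenise both command lines the way a POSIX shell would split them.
quoted = shlex.split('sh -c "ps --user 1000 -f"')
unquoted = shlex.split("sh -c ps --user 1000 -f")

print(quoted)    # ['sh', '-c', 'ps --user 1000 -f']
print(unquoted)  # ['sh', '-c', 'ps', '--user', '1000', '-f']

# With -c, sh treats only the argument right after it as the command string.
# Any further arguments become $0, $1, ... and are not passed to ps.
command_string_quoted = quoted[2]      # the full ps command line
command_string_unquoted = unquoted[2]  # just "ps"
```

In the unquoted form, `sh` runs plain `ps`, and `--user`, `1000` and `-f` never reach it.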
Dan Mihai	9f4d1ffd43	genpolicy: container.exec_commands args validation Keep track of individual exec args instead of joining them in the policy text. Verifying each arg results in a more precise policy, because some of the args might include space characters. This improved validation applies to commands specified in K8s YAML files using: - livenessProbe - readinessProbe - startupProbe - lifecycle.postStart - lifecycle.preStop Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-17 01:19:23 +00:00
Dan Mihai	b23ea508d5	tests: k8s: container.exec_commands policy tests Add tests for genpolicy's handling of container.exec_commands. These are commands allowed by the policy and originating from these input K8s YAML fields: - livenessProbe - readinessProbe - startupProbe - lifecycle.postStart - lifecycle.preStop Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-17 01:19:00 +00:00
stevenhorsman	567b4d5788	test/k8s: Fix up node logging typo We had a typo in the attestation tests that we've copied around a lot and Wainer spotted it in the authenticated registry tests, so let's fix it up now Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-16 21:39:31 -03:00
stevenhorsman	0015c8ef51	tests: Add guest-pull auth registry tests Add three new test cases for guest pull from an authenticated registry for the following scenarios: _Scenario: Creating a container from an authenticated image, with correct credentials via KBC works_ Given An authenticated container registry quay.io/kata-containers/confidential-containers-auth And a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for [guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml) And a KBS set up to have the correct auth.json for registry quay.io/kata-containers/confidential-containers-auth embedded in the `"Credential"` section of `its resources file` When I create a pod from the container image quay.io/kata-containers/confidential-containers-auth:test Then The pull image works and the pod can start _Scenario: Creating a container from an authenticated image, with incorrect credentials via KBC fails_ Given An authenticated container registry quay.io/kata-containers/confidential-containers-auth And a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for [guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml) And An installed kata CC with the sample_kbs set up to have the auth.json for registry quay.io/kata-containers/confidential-containers-auth embedded in the `"Credential"` resource, but with a dummy user name and password When I create a pod from the container image quay.io/kata-containers/confidential-containers-auth:test Then The pull image fails with a message that reflects that the authorisation failed _Scenario: Creating a container from an authenticated image, with no credentials fails_ Given An authenticated container registry quay.io/kata-containers/confidential-containers-auth And a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for [guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml) And An installed kata CC with no credentials section When I create a pod from the container image quay.io/kata-containers/confidential-containers-auth:test Then The pull image fails with a message that reflects that the authorisation failed Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-16 21:39:31 -03:00
stevenhorsman	eb07f5ef5e	agent: doc: Fix ordering of options - Fix the config options to be back in alphabetical order to be easier to find Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-16 21:39:31 -03:00
stevenhorsman	7cc81ce867	agent: image: Set image-rs auth config If the agent-config has a value for `image_registry_auth`, Then pass this to the image-rs client and enable auth mode too Fixes: #8122 Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-16 21:39:31 -03:00
stevenhorsman	265322990a	agent: config: Add config option to provide auth for guest-pull Add optional config for agent.image_registry_auth, to specify the uri of credentials to be used when pulling images in the guest from an authenticated registry Fixes: #8122 Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com> Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-16 21:39:31 -03:00
Steve Horsman	064b45a2fa	Merge pull request #10016 from wainersm/ibm-se-auth-reg workflows: setup environment to run auth registry tests on s390x	2024-07-16 22:24:39 +01:00
Gabriela Cervantes	d2866081d2	docs: Update devmapper docs This PR updates the devmapper docs by updating the url link for the current containerd devmapper information. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-16 21:07:51 +00:00
GabyCT	2206e2dd5c	Merge pull request #10013 from GabyCT/topic/updatecontdoc docs: Update cri installation guide url in containerd documentation	2024-07-16 14:32:59 -06:00
Wainer dos Santos Moschetta	66c600f8d8	gha: delint the s390x workflow Made run-k8s-tests-on-zvsi.yaml free of warnings by removing: SC2086:info:1:1: Double quote to prevent globbing and word splitting ... SC2086:info:2:1: Double quote to prevent globbing and word splitting ... Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-07-16 15:20:46 -03:00
Wainer dos Santos Moschetta	a98985fab8	gha: export user/password for auth registry tests on s390x Counterpart of commit `d8961cbd4a` for run-k8s-tests-on-zvsi workflow Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-07-16 15:18:40 -03:00
Saul Paredes	af49252c69	gha: enable policy testing on TDX Enable policy testing on TDX Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-15 14:09:49 -07:00
Saul Paredes	0b3d193730	genpolicy: Support cpath for mount sources Add setting to allow specifying the cpath for a mount source. cpath is the root path for most files used by a container. For example, the container rootfs and various files copied from the Host to the Guest when shared_fs=none are hosted under cpath. mount_source_cpath is the root of the paths used as storage mount sources. Depending on Kata settings, mount_source_cpath might have the same value as cpath - but on TDX for example these two paths are different: TDX uses "/run/kata-containers" as cpath, but "/run/kata-containers/shared/containers" as mount_source_cpath. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-15 14:09:49 -07:00
Gabriela Cervantes	e4045ff29a	docs: Update runtime v2 containerd url information This PR updates the runtime v2 containerd url information at containerd documentation. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-15 20:36:17 +00:00
Dan Mihai	bcaf7fc3b4	Merge pull request #10008 from microsoft/danmihai1/runAsUser genpolicy: add support for runAsUser fields	2024-07-15 12:08:50 -07:00
Gabriela Cervantes	9f738f0d05	docs: Update cri installation guide url in containerd documentation This PR updates the cri installation guide url link in the containerd documentation guide as the previous url link does not exist. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-15 16:58:18 +00:00
Dan Mihai	648265d80e	Merge pull request #9998 from microsoft/danmihai1/GENPOLICY_PULL_METHOD tests: k8s: GENPOLICY_PULL_METHOD clean-up	2024-07-15 09:32:29 -07:00
Steve Horsman	02b9fd6e95	Merge pull request #9382 from Xynnn007/feat-encrypt-image Merge to main: supporting pull encrypted images	2024-07-15 15:58:42 +01:00
stevenhorsman	b060fb5b31	tests/k8s: Skip measured rootfs test The only kernel built for measured rootfs was the kernel-tdx-experimental, so this test only ran in the qemu-tdx job. In commit `6cbdba7` we switched all TEE configurations to use the same kernel-confidential, so measured rootfs is disabled for qemu-tdx too now. The VM still fails to boot (because of a different reason...) but the bug in assert_logs_contain, fixed in this PR, was masking the checks on the logs. We still have a few open issues related to measured rootfs and generating the root hash, so let's skip this test that doesn't work until they are looked at Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-15 12:00:50 +01:00
stevenhorsman	2cf94ae717	tests: Add guest-pull encrypted image tests Add three new tests cases for guest-pull of an encrypted image for the following scenarios: _Scenario: Pull encrypted image on guest with correct key works_ Given I have a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for guest-pulling And A public encrypted container image i with a decryption key k that is configured as a resource the KBS, so that image-rs on the guest can connect to it When I try and create a pod from i Then The pod is successfully created and runs _Scenario: Cannot pull encrypted image with no decryption key_ Given I have a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for guest-pulling And A public encrypted container image i with a decryption key k, that is not configured in a KBS that image-rs on the guest can connect to When I try and create a pod from i Then The pod is not created with an error message that reflects why _Scenario: Cannot pull encrypted image with wrong decryption key_ Given I have a version of kata deployed with a guest image that has an agent with `guest_pull` feature enabled and nydus-snapshotter installed and configured for guest-pulling And A public encrypted container image i with a decryption key k and a different key k' that is set as a resource in a KBS, that image-rs on the guest can connect to When I try and create a pod from i Then The pod is not created with an error message that reflects why Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-15 12:00:50 +01:00
Xynnn007	a56b15112a	agent: add ocicrypt config ocicrypt config is for kata-agent to connect to CDH to request the image decryption key. This value is specified by an env. We use the same workaround as the CCv0 branch. In future, we will consider better ways instead of writing files and setting envs inside the inner logic of kata-agent. Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-07-15 12:00:50 +01:00
Xynnn007	1072658219	agent: Enable kata-cc-rustls-tls in image-rs - Enable the kata-cc-rustls-tls feature in image-rs, so that it can get resources from the KBS in order to retrieve the registry credentials. - Also bump to the latest image-rs to pick up protobuf fixes - Add libprotobuf-dev dependency to the agent packaging as it is needed by the new image-rs feature - Add extra env in the agent make test as the new version of the anyhow crate has changed the backtrace capture, thus unit tests of kata-agent that compare a raised error with an expected one would fail. To fix this, we need only panics to have backtraces, thus set RUST_BACKTRACE=0 for tests, as documented at https://docs.rs/anyhow/latest/anyhow/ Fixes #9538 Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-15 12:00:50 +01:00
stevenhorsman	3b72e9ffab	tests/k8s: Fix assert_logs_contain The pipe needs adding to the grep, otherwise the grep gets consumed as an argument to `print_node_journal` and run in the debug pod. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-15 12:00:50 +01:00
Hyounggyu Choi	83b3a681f4	Merge pull request #10010 from BbolroC/osbuilder-bump-fedora-to-40 osbuilder: Bump Fedora to 40	2024-07-15 13:00:28 +02:00
Greg Kurz	203d9e7803	Merge pull request #10000 from littlejawa/kata_deploy_add_storage_config_for_crio kata-deploy: add storage configuration for cri-o	2024-07-15 12:29:21 +02:00
Hyounggyu Choi	08d2f6bfe4	osbuilder: Bump Fedora to 40 As Fedora 38 has reached EOL, we are encountering 404 errors for s390x, such as: ``` Status code: 404 for https://dl.fedoraproject.org/pub/fedora-secondary/updates/38/Everything/s390x/repodata/repomd.xml ``` Let's bump the OS to the latest version. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-15 09:58:54 +02:00
Fupan Li	a7179be31d	Merge pull request #9534 from Tim-Zhang/fix-stdin-stuck Fix ctr exec stuck problem	2024-07-15 13:19:19 +08:00
Dan Mihai	dded329d26	tests: k8s: SecurityContext.runAsUser policy test Add test for auto-generating policy for a pod spec that includes the SecurityContext.runAsUser field. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-13 01:37:58 +00:00
Dan Mihai	7040fb8c50	tests: k8s-security-context auto-generated policy Auto-generate the policy in k8s-security-context.bats - previously blocked by lacking support for PodSecurityContext.runAsUser. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-13 01:23:54 +00:00
Dan Mihai	f087044ecb	genpolicy: add support for runAsUser Add ability to auto-generate policy for SecurityContext.runAsUser and PodSecurityContext.runAsUser. Fixes: #8879 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-13 01:10:43 +00:00
Dan Mihai	5282701b5b	genpolicy: add link to allow_user() active issue Improve comment to workaround in rules.rego, to explain better the reason for that workaround. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-13 01:05:58 +00:00
GabyCT	3c0171df3d	Merge pull request #10005 from GabyCT/topic/katadragonball common: Add share fs information for dragonball	2024-07-12 16:10:29 -06:00
Wainer Moschetta	646d7ea4fb	Merge pull request #9951 from BbolroC/enable-attestation-for-ibm-se tests: Enable attestation e2e tests for IBM SE	2024-07-11 16:02:59 -03:00
Hyounggyu Choi	ca80301b4b	Merge pull request #10003 from BbolroC/skip-pod-shared-volume-for-ibm-se k8s: Skip shared-volume relevant tests for IBM SE	2024-07-11 19:29:13 +02:00
Gabriela Cervantes	4477b4c9dc	common: Add share fs information for dragonball This PR adds the share fs information for dragonball using kata-ctl to avoid the failures in runk tests saying that shared_fs is an unbound variable. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-11 17:09:35 +00:00
Dan Mihai	09c5ca8032	tests: k8s: clarify the need to use containerd.sock Modify the permissions of containerd.sock just when genpolicy needs access to this socket, when testing GENPOLICY_PULL_METHOD=containerd. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-11 16:49:58 +00:00
Dan Mihai	c1247cc254	tests: k8s: explain the default containerd settings Explain why the containerd settings on the local machine get set to containerd's defaults when testing GENPOLICY_PULL_METHOD=containerd. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-11 16:49:39 +00:00
Dan Mihai	3b62eb4695	tests: k8s: add comment for GENPOLICY_PULL_METHOD Explain why there are two different methods for pulling container images in genpolicy. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-11 16:40:01 +00:00
Dan Mihai	eaedd21277	tests: k8s: use oci-distribution as default value oci-distribution is the value used by run-k8s-tests-on-aks.yaml, so use the same value as default for GENPOLICY_PULL_METHOD in gha-run.sh. The value of GENPOLICY_PULL_METHOD is currently compared just with "containerd", but avoid possible future problems due to using a different default value in gha-run.sh. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-11 16:40:01 +00:00
GabyCT	2056eda5f0	Merge pull request #9922 from GabyCT/topic/updateblogname metrics: Update container name in blogbench test	2024-07-11 10:05:35 -06:00
Hyounggyu Choi	32c3e55cde	k8s: Skip shared-volume relevant tests for IBM SE Currently, it is not viable to share a writable volume (e.g., emptyDir) between containers in a single pod for IBM SE. The following tests are relevant: - pod-shared-volume.bats - k8s-empty-dirs.bats (See: https://github.com/kata-containers/kata-containers/issues/10002) This commit skips the tests until the issue is resolved. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-11 14:09:19 +02:00
Julien Ropé	b83d4e1528	kata-deploy: add storage configuration for cri-o Make sure that the "skip_mount_home" flag is set in cri-o config. Fixes: #9878 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-07-11 10:11:30 +02:00
Qi Feng Huo	4d66ee1935	initdata: add initdata annotation in hypervisor config - Add Initdata annotation for hypervisor config, so that it can be passed when CreateVM Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>	2024-07-11 10:56:18 +08:00
GabyCT	dac07239f5	Merge pull request #9974 from squarti/sharedfs runtime: Initialize SharedFS for remote hypervisor	2024-07-10 17:03:00 -06:00
GabyCT	3827b5f9f2	Merge pull request #9982 from ChengyuZhu6/fix-ci tests: Delete test scripts forcibly	2024-07-10 17:00:41 -06:00
Wainer Moschetta	deb4627558	Merge pull request #9975 from niteeshkd/nd_snp_attestation gha: enable SNP attestation	2024-07-10 18:59:05 -03:00
GabyCT	c40b3b4ce7	Merge pull request #9992 from sprt/fix-nydus ci: fix run-nydus tests	2024-07-10 13:56:16 -06:00
David Esparza	be9385342e	Merge pull request #9990 from GabyCT/topic/tdxtimeout gha: Increase timeout to run CoCo TDX tests	2024-07-10 13:21:23 -06:00
Silenio Quarti	8260ce8d15	runtime: Initialize SharedFS for remote hypervisor Sets SharedFS config to NoSharedFS for remote hypervisor in order to start the file watcher which syncs files from the host to the guest VMs. Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>	2024-07-10 14:31:25 -03:00
Aurélien Bombo	25e0e2fb35	ci: fix run-nydus tests GH-9973 introduced: * New function get_kata_memory_and_vcpus() in tests/metrics/lib/common.bash. * A call to get_kata_memory_and_vcpus() from extract_kata_env(), which is defined in tests/common.bash. Because the nydus test only sources tests/common.bash, it can't find get_kata_memory_and_vcpus() and errors out. We fix this by moving the get_kata_memory_and_vcpus() call from tests/common.bash to tests/metrics/lib/json.bash so that it doesn't impact the nydus test. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-10 17:19:08 +00:00
Gabriela Cervantes	b6b8524ab7	gha: Increase timeout to run CoCo TDX tests This PR increases the timeout to run the CoCo TDX tests in order to avoid the random failures on TDX saying that The action 'Run tests' has timed out after 30 minutes and making the GHA job fail. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-10 16:06:07 +00:00
Niteesh Dubey	e8a3f8571e	docs: update for SNP attestation This updates how-to document for SNP attestation. Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-07-10 15:06:55 +00:00
Niteesh Dubey	ff04154fdb	gha: enable SNP attestation This removes the code to skip the SNP attestation. Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-07-10 15:06:55 +00:00
Hyounggyu Choi	d94b285189	tests: Enable k8s-confidential-attestation.bats for s390x For running a KBS with `se-verifier` in service, specific credentials need to be configured. (See https://github.com/confidential-containers/trustee/tree/main/attestation-service/verifier/src/se for details.) This commit introduces two procedures to support IBM SE attestation: - Prepare required files and directory structure - Set necessary environment variables for KBS deployment - Repackage a secure image once the KBS service address is determined These changes enable `k8s-confidential-attestation.bats` for s390x. Fixes: #9933 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-10 16:18:37 +02:00
Hyounggyu Choi	5d0f74cd70	local-build: Extract build_secure_image() as a separate library Currently, all functions in `build_se_image.sh` are dedicated to publishing a payload image. However, `build_secure_image()` is now also used for repackaging a secure image when a kernel parameter is reconfigured. This reconfiguration is necessary because the KBS service address is determined after the initial secure image build. This commit extracts `build_secure_image()` from `build_se_image.sh` and creates a separate library, which can be loaded by bats-core. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-10 16:18:37 +02:00
Hyounggyu Choi	bf2f0ea2ca	tests: Change a location for creating key.bin The current KBS deployment creates a file `key.bin` assuming that `kustomization.yaml` is located in `overlays/`. However, this does not hold true when the kustomize config is enabled for multiple architectures. In such cases, the configuration file should be located in `overlays/$(uname -m)`. This commit changes the location for file creation. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-10 16:18:37 +02:00
Hyounggyu Choi	4025ef7193	versions: Bump trustee to multi-arch deployment for KBS As part of the enablement for s390x, KBS should support multi-arch deployment. This commit updates the version of coco-trustee to a commit where the support is implemented. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-10 16:18:37 +02:00
Hyounggyu Choi	856a1f72c6	packaging: Set ATTESTER to se-attester for guest components on s390x This commit allows the guest-components builder to only build se-attester on s390x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-10 16:18:37 +02:00
Xuewei Niu	7f71eac6de	Merge pull request #9868 from l8huang/dan runtime: implement DAN in Go kata-runtime	2024-07-10 19:09:46 +08:00
Alex Lyn	dafff26f01	Merge pull request #9814 from Apokleos/bugfix-pcipath runtime-rs: bugfix for root bus slot allocation	2024-07-10 16:19:06 +08:00
Steve Horsman	aa487307e8	Merge pull request #9962 from GabyCT/topic/removecif scripts: Eliminate CI variable as it is no longer used	2024-07-10 09:02:33 +01:00
Steve Horsman	78bbc51ff0	Merge pull request #9806 from niteeshkd/nd_snp_certs runtime: pass certificates to get extended attestation report for SNP coco	2024-07-10 08:57:45 +01:00
Steve Horsman	29413021e5	Merge pull request #9981 from stevenhorsman/run-k8s-tests-on-zvsi-inherit-secrets gha: make run-k8s-tests-on-zvsi inherit secrets	2024-07-10 08:49:11 +01:00
Lei Huang	171d298dea	runtime: implement DAN in Go kata-runtime The DAN feature has already been implemented in kata-runtime-rs, and this commit brings the same capability to the Go kata-runtime. Fixes: #9758 Signed-off-by: Lei Huang <leih@nvidia.com>	2024-07-10 00:22:30 -07:00
ChengyuZhu6	489afffd8c	tests:gha: delete namespace before resetting namespace Delete the kata-containers-k8s-tests namespace before resetting the namespace to ensure that no deployments or services are restarting and creating pods in the default namespace. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Signed-off-by: Wang, Arron <arron.wang@intel.com>	2024-07-10 12:08:28 +08:00
ChengyuZhu6	e874c8fa2e	tests: Delete test scripts forcibly Delete test scripts forcibly in `Delete kata-deploy` step before deleting all kata pods. Fixes: #9980 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-10 12:08:28 +08:00
Alex Lyn	806e959b01	runtime-rs: bugfix for device slot allocation failed in dragonball In dragonball Vfio device passthrough scenarios, the first passthrough device will be allocated slot 0, which is occupied by the root device. This causes an error like the one below: ``` ... 6: failed to add VFIO passthrough device: NoResource\n 7: no resource available for VFIO device"): unknown ... ``` To address this problem, we adopt another method with no pre-allocated guest device id and just let dragonball auto allocate the guest device id and return it to the runtime. With this idea, add_device will return Result<DeviceType>, and the change is applied to related code. Fixes #9813 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-10 10:59:57 +08:00
Alex Lyn	27947cbb0b	dragonball: make add vfio device return guest device id Fixes #9813 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-07-10 10:59:51 +08:00
Alex Lyn	fa4af09658	Merge pull request #9985 from GabyCT/topic/fixcrites cri-containerd: Remove use_devmapper variable for cri-containerd tests	2024-07-10 10:13:27 +08:00
Alex Lyn	e4997760f1	Merge pull request #9987 from kata-containers/remove_double_process_check_from_memory_usage_test metrics: Remove duplicate check of processes from memory test.	2024-07-10 10:12:18 +08:00
David Esparza	09f523c815	Merge pull request #9973 from kata-containers/add_memory_and_vcpus_info_to_results Add memory and vcpus info to metrics results	2024-07-09 18:05:07 -06:00
David Esparza	e77d44614b	metrics: Remove duplicate check of processes from memory test. This PR removes the common_init function call from the memory usage script to eliminate duplicate checking that is also done from the init_env function. It also eliminates duplication of nested conditionals. Fixes: #9984 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-07-09 12:34:51 -06:00
Gabriela Cervantes	7061272b4e	kernel: bump kata config version This PR bumps the kata config version as the kernel scripts were modified. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 20:04:24 +02:00
Gabriela Cervantes	de848c1458	packaging: Remove CI variable from build kernel script This PR removes the CI variable from build kernel script, which is no longer supported as it was part of the jenkins environment. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 20:04:24 +02:00
Gabriela Cervantes	28601b51d2	tools: Remove CI variable in kata deploy in docker script This PR removes the CI variable in kata deploy in docker script, which was supported in the jenkins environment that is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 20:04:24 +02:00
Gabriela Cervantes	f2b8c6619d	makefile: Remove CI variable from local build makefile This PR removes the CI variable from the local build makefile as this was part of the jenkins environment, which is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 20:04:24 +02:00
Gabriela Cervantes	4161fa3792	tools: Remove CI variable in test images script for osbuilder This PR removes the CI variable in test images script for osbuilder as this was part of the jenkins environment, which is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 20:04:24 +02:00
Greg Kurz	7506d1ec29	tools: Remove CI variable in test config osbuilder script This PR removes the CI variable in test config osbuilder script, which was supported in the jenkins environment that is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com> [greg: squash all fixes into a single patch] Signed-off-by: Greg Kurz <groug@kaod.org>	2024-07-09 20:03:08 +02:00
Niteesh Dubey	647dad2a00	gha: skip SNP attestation test Skip the SNP attestation test for now. Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-07-09 17:16:07 +00:00
Niteesh Dubey	e7b4e5e386	gha: add SNP attestation test This tests the attestation of SNP guest. Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-07-09 17:14:26 +00:00
Gabriela Cervantes	1a1e62b968	cri-containerd: Remove use_devmapper variable for cri-containerd tests This PR removes the use_devmapper variable, which was part of the jenkins environment flags and is no longer supported or available for the cri-containerd tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-09 17:09:55 +00:00
GabyCT	eb0bc5007c	Merge pull request #9976 from sprt/fix-cri-containerd tests: cri-containerd: Ensure Docker isn't present	2024-07-09 11:02:20 -06:00
David Esparza	04df85a44f	metrics: Add num_vcpus and free_mem to metrics results template. This PR retrieves the free memory and the vcpus count from a kata container and includes them in the json results file of any metric. Additionally this PR parses the requested vcpus quantity and the requested amount of memory from the kata configuration file and includes this pair of values in the json results file of any metric. Finally, the file system defined in the kata configuration file is included in the results template. Fixes: #9972 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-07-09 10:29:29 -06:00
David Esparza	a554541495	metrics: Improvement to the description of certain functions. This PR rephrases the description and usage of certain functions such as: - set_kata_configuration_performance - set_kata_config_file - get_current_kata_config_file - check_if_root - check_ctr_images Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-07-09 10:29:29 -06:00
stevenhorsman	c7cf26fa32	gha: make run-k8s-tests-on-zvsi inherit secrets run-k8s-tests-on-zvsi runs the coco tests and we've added new secrets to provide credentials for the authenticated image testing, so we need to let the zvsi job inherit these from the caller workflow like the rest of the coco tests Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-09 15:29:48 +01:00
Hyounggyu Choi	37b907dfbc	Merge pull request #9859 from BbolroC/set-ocispec-for-vfio-ap tests: Extend vfio-ap hotplug test to use a zcrypttest tool	2024-07-09 14:03:45 +02:00
Steve Horsman	ff498c55d1	Merge pull request #9719 from fitzthum/sealed-secret Support Confidential Sealed Secrets (as env vars)	2024-07-09 09:43:51 +01:00
Niteesh Dubey	529660fafb	runtime: pass certificates for SNP coco This will be used to get extended attestation report. Fixes: #9805 Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-07-09 03:46:00 +00:00
Tim Zhang	704da86e9b	CI: Add tests for stdio Add tests for stdio Signed-off-by: Tim Zhang <tim@hyper.sh>	2024-07-09 11:44:40 +08:00
Tim Zhang	8801554889	runtime-rs: Fix ctr exec stuck problem Fixes: #9532 Instead of calling agent.close_stdin in close_io, we call agent.write_stdin with 0 len data when the stdin pipe ends. Signed-off-by: Tim Zhang <tim@hyper.sh>	2024-07-09 11:44:36 +08:00
Tobin Feldman-Fitzthum	1c2d69ded7	tests: add test for sealed env secrets The sealed secret test depends on the KBS to provide the unsealed value of a vault secret. This secret is provisioned to an environment variable. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-07-08 17:41:20 -05:00
Linda Yu	b4d61f887b	agent: unittest for sealed secret as env in kata To test unsealing secrets stored in environment variables, we create a simple test server that takes the place of the CDH. We start this server and then use it to unseal a test secret. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com> Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-07-08 17:32:45 -05:00
Linda Yu	6003608fe6	agent: support sealed secret as env in kata When sealed-secret is enabled, the Kata Agent intercepts environment variables containing sealed secrets and uses the CDH to unseal the value. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com> Signed-off-by: Linda Yu <linda.yu@intel.com>	2024-07-08 17:31:33 -05:00
Gabriela Cervantes	cf2d5ff4c1	scripts: Fix indentation in QAT run script This PR fixes the indentation of the QAT run script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:23:50 +00:00
Gabriela Cervantes	d53eb61856	QAT: Remove CI variable from QAT run script This PR removes the CI variable from QAT run script, which was used in the jenkins environment and is no longer used. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:16:00 +00:00
Gabriela Cervantes	8a79b1449e	tests: Remove CI variable in tracing test This PR removes the CI variable as well as the instructions related to it, as this was part of the jenkins environment, which is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:12:41 +00:00
Gabriela Cervantes	9d44abb406	tests: Remove CI variable in test agent shutdown This PR removes the CI variable as well as the instructions related to this variable, which was used in the jenkins environment and is no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:10:24 +00:00
Gabriela Cervantes	f2ed8dc568	docs: Remove CI variable from Intel QAT documentation This PR updates the Intel QAT documentation by removing the CI variable, which is no longer supported as it was part of the jenkins CI environment. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:05:47 +00:00
Gabriela Cervantes	ff06ef0bbc	scripts: Eliminate CI variable as it is no longer used This PR removes the CI variable, which is no longer used or valid in the kata containers repository. The CI variable was used when we were using jenkins and script setups, which are no longer supported. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-08 20:00:30 +00:00
GabyCT	cb0fb91bdd	Merge pull request #9966 from GabyCT/topic/fixstability tests: Use variable already defined in metrics common script for stability tests	2024-07-08 13:55:55 -06:00
Aurélien Bombo	e9d6179b28	tests: cri-containerd: Ensure Docker isn't present Following #9960 that transitioned this test to a free runner, we need to ensure Docker isn't installed on the system as that will conflict with the installation of Podman. Example error: https://github.com/kata-containers/kata-containers/actions/runs/9818218975/job/27177785716 Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-08 18:50:57 +00:00
Steve Horsman	e8836fafaa	Merge pull request #9828 from stevenhorsman/image-rs-bump-bad84c7 Image rs bump to latest main	2024-07-08 17:07:59 +01:00
Fabiano Fidêncio	67ba0ad0ad	Merge pull request #9971 from GabyCT/topic/fixnerdctldep gha: Fix pip installation for nerdctl GHA	2024-07-06 21:37:55 +02:00
Gabriela Cervantes	724b2c612c	gha: Fix pip installation for nerdctl GHA This PR fixes the pip installation for nerdctl by removing a flag which is no longer supported, avoiding the failure of no such option: --break-system-packages. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-05 17:31:52 +00:00
stevenhorsman	1d6c1d1621	test: Add journal logging for debug - Due to the error we hit with pulling the agnhost image used in the liveness-probe tests, we want to leave the console printing to help with debug when we next try to bump the image-rs version Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-05 10:25:28 +01:00
stevenhorsman	d511820974	agent: Bump image-rs - Bump the commit of image-rs we are pulling in to 413295415 Note: This is the last commit before a change to whiteout handling was introduced that led to the error `failed to unpack: convert whiteout` when pulling the agnhost:2.21 image Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-05 10:25:28 +01:00
Fabiano Fidêncio	543c90f145	Merge pull request #9695 from ChengyuZhu6/fix-init Fix issues on CI about guest-pull	2024-07-05 11:21:08 +02:00
ChengyuZhu6	65dc12d791	tests: Re-enable k8s-kill-all-process-in-container.bats This test was fixed by previous patches in this PR: kata-containers#9695 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	2ea521db5e	tests:tdx: Re-enable k8s-liveness-probes.bats This test was fixed by previous patches in this PR: kata-containers#9695 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	93453c37d6	tests: Re-enable k8s-sysctls.bats This test was fixed by previous patches in this PR: kata-containers#9695 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	6c5e053dd5	tests: Re-enable k8s-shared-volume.bats This test was fixed by previous patches in this PR: kata-containers#9695 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	85979021b3	tests: Re-enable k8s-file-volume.bats This test was fixed by previous patches in this PR: kata-containers#9695 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	e71c7ab932	agent/image: Remove functions about merging container spec for guest pull Let me explain why: In our previous approach, we implemented guest pull by passing PullImageRequest to the guest. However, this method resulted in the loss of specifications essential for running the container, such as commands specified in YAML, during the CreateContainer stage. To address this, it is necessary to integrate the OCI specifications and process information from the image’s configuration with the container in guest pull. The snapshotter method is not affected by this issue. Nevertheless, a problem arises when two containers in the same pod, such as an InitContainer and the application container, attempt to pull the same image. This is because the image service searches for the existing configuration, which resides in the guest. The configuration, associated with <image name, cid>, is stored in the directory /run/kata-containers/<cid>. Consequently, when the InitContainer finishes its task and terminates, the directory ceases to exist. As a result, during the creation of the application container, the OCI spec and process information cannot be merged due to the absence of the expected configuration file. Fixes: kata-containers#9665 Fixes: kata-containers#9666 Fixes: kata-containers#9667 Fixes: kata-containers#9668 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
ChengyuZhu6	c9d1a758cd	agent/image: Reuse the mountpoint in image-rs Currently, the image is pulled by image-rs in the guest and mounted at `/run/kata-containers/image/cid/rootfs`. Finally, the agent rebinds `/run/kata-containers/image/cid/rootfs` to `/run/kata-containers/cid/rootfs` in CreateContainer. However, this process requires specific cleanup steps for these mount points. To simplify, we reuse the mount point `/run/kata-containers/cid/rootfs` and allow image-rs to directly mount the image there, eliminating the need for rebinding. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-07-05 08:10:04 +08:00
stevenhorsman	05cd1cc7a0	agent: Add CreateContainer support for pre-pulled bundle - Add a check in setup_bundle to see if the bundle already exists and if it does then skip the setup. This commit is cherry-picked from `44ed3ab80e`. The reason that k8s-kill-all-process-in-container.bats failed is that deletion of the directory `/root/kata-containers/cid/rootfs` failed during container removal because it was mounted twice (once in image-rs and once in setup_bundle) and only unmounted once when removing the container. Fixes: #9664 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Dave Hay <david_hay@uk.ibm.com> Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-05 08:10:00 +08:00
Zvonko Kaiser	7990d3a154	dragonball: Update kata config version Mandatory update Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-04 17:24:16 +00:00
Zvonko Kaiser	cfbca4fe0d	dragonball: Update versions Use the latest guest kernel that we use for all other VMMs Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-04 17:24:16 +00:00
Zvonko Kaiser	26446d1edb	dragonball: Update patches After v5.14 there is no cpu_hotplug_begin function; it is now cpus_write_lock. Likewise, cpu_hotplug_done is now cpus_write_unlock. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-04 17:23:24 +00:00
Zvonko Kaiser	ad574b7e10	dragonball: Add patches for 6.1.x Ported the 5.10 patches to 6.1.x Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-07-04 17:06:39 +00:00
Gabriela Cervantes	757f37d956	stability: General improvements for soak parallel test This PR has better variable definitions as well as the use of a variable which is already defined in the metrics common script for soak parallel test. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-04 16:32:46 +00:00
Gabriela Cervantes	6d56abbdad	stability: General improvements to agent stability test This PR is for better variable definitions as well as the use of the CTR_EXE variable which is already defined in the metrics common script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-04 16:24:27 +00:00
Gabriela Cervantes	3e6c32c3c8	tests: Use variable already defined in stability tests This PR uses the CTR_EXE which is already defined in the metrics common script to have uniformity across the multiple stability tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-04 16:21:24 +00:00
Steve Horsman	ddb8a94677	Merge pull request #9960 from sprt/fix-garm ci: Transition GARM tests to free runners, pt. I	2024-07-04 09:04:58 +01:00
Biao Lu	6c1a2f01f8	protocols: add support for sealed_secret service To unseal a secret, the Kata agent will contact the CDH using ttRPC. Add the proto that describes the sealed secret service and messages that will be used. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com> Signed-off-by: Biao Lu <biao.lu@intel.com>	2024-07-04 01:03:41 -05:00
Fabiano Fidêncio	49696bbdf2	Merge pull request #9943 from AdithyaKrishnan/nydus-cleanup-timeout tests: Fixes TEE timeout issue	2024-07-03 22:57:17 +02:00
Anastassios Nanos	db75b5f3c4	Merge pull request #8070 from nubificus/feat_add-fc-runtime-rs runtime-rs: firecracker hypervisor backend	2024-07-03 22:29:30 +03:00
Adithya Krishnan Kannan	9250858c3e	tests: Stop trying to patch finalize We have not seen instances of the nydus snapshotter hanging on its deletion that would require us to patch its finalize. Let's just drop this line for now. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-07-03 12:19:26 -05:00
Dan Mihai	ada53744ea	Merge pull request #9907 from microsoft/saulparedes/allow_empty_env_vars genpolicy: allow some empty env vars	2024-07-03 08:07:23 -07:00
Aurélien Bombo	f18e35014f	ci: Move `run-nerdctl-tests` to free runner See #9940. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:58:11 +00:00
Aurélien Bombo	c0919d6f45	ci: Move `run-docker-tests` to free runner Removed the Docker installation step as that's preinstalled in free runners. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:57:59 +00:00
Aurélien Bombo	743a765525	ci: Move `run-runk` to free runner See #9940. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:57:48 +00:00
Aurélien Bombo	09cce86cc7	ci: Move `run-nydus` to free runner See #9940. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:57:42 +00:00
Aurélien Bombo	9e1b6064dc	ci: Move `run-containerd-stability` to free runner Removes the Docker installation step as that's preinstalled on the free runner: https://github.com/actions/runner-images/blob/main/images/ubuntu/Ubuntu2204-Readme.md#tools Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:57:37 +00:00
Aurélien Bombo	6a0e403acf	ci: Move `run-cri-containerd` to free runner See #9940. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-03 14:57:29 +00:00
George Pyrros	2d19f3fbd7	runtime-rs: firecracker hypervisor backend Add a basic runtime-rs `Hypervisor` trait implementation for AWS Firecracker - Add basic hypervisor operations (setup / start / stop / add_device) - Implement AWS Firecracker API on a separate file `fc_api.rs` - Add support for running jailed (include all sandbox-related content) - Add initial device support (limited as hotplug is not supported) - Add separate config for runtime-rs (FC) Notes: - devmapper is the only snapshotter supported - to account for no sharefs support, we copy files in the sandbox (as in the GO runtime) - nerdctl spawn is broken (TODO: #7703) Fixes: #5268 Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk> Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk> Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk> Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>	2024-07-03 08:30:30 +00:00
GabyCT	e3e3873857	Merge pull request #9954 from GabyCT/topic/sysbenchci metrics: Remove variable in sysbench that is not being used	2024-07-02 16:58:46 -06:00
Aurélien Bombo	eda5d2c623	ci: cleanup: Run every 24 hours instead of 6 hours Resources don't fail to get deleted often enough to need a run every 6 hours. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-02 22:27:58 +00:00
Aurélien Bombo	f20924db24	ci: cleanup: Ignore nonexisting resources Some resource names seem to be lingering in Azure limbo but do not map to any actual resources, so we ignore those. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-07-02 22:23:54 +00:00
GabyCT	0590aab3e6	Merge pull request #9952 from GabyCT/topic/unitjenkins docs: Remove jenkins reference from unit testing presentation	2024-07-02 15:34:25 -06:00
Aurélien Bombo	33d08a8417	Merge pull request #9825 from microsoft/mahuber/main osbuilder: allow rootfs builds w/o git or version file deps	2024-07-02 09:38:13 -07:00
Steve Horsman	078a1147a6	Merge pull request #9909 from kata-containers/sprt/gha-cleanup-pt2 ci: Add scheduled job to cleanup resources, pt. II	2024-07-02 17:12:03 +01:00
Gabriela Cervantes	b7da1291ea	metrics: Remove variable in sysbench that is not being used This PR removes the CI_JOB variable, which was previously used but is no longer supported, from the metrics sysbench test. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-02 15:29:50 +00:00
Wainer Moschetta	ec695f67e1	Merge pull request #9577 from microsoft/saulparedes/topology genpolicy: add topologySpreadConstraints support	2024-07-02 11:24:26 -03:00
Fabiano Fidêncio	ef3f6515cf	Merge pull request #9941 from sprt/temp-disable-test ci: Temporarily disable kata-deploy and GARM tests	2024-07-02 14:13:46 +02:00
Amulya Meka	dd12089e0d	Merge pull request #9914 from Amulyam24/qemu-fix kata-deploy: fix qemu static build on ppc64le	2024-07-02 10:45:03 +05:30
Saul Paredes	f3f3caa80a	genpolicy: update sample Update pod-one-container.yaml sample Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-07-01 13:49:08 -07:00
Dan Mihai	75aee526a9	genpolicy: add topologySpreadConstraints support Allow genpolicy to process Pod YAML files including topologySpreadConstraints. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-07-01 13:32:49 -07:00
Gabriela Cervantes	c270df7a9c	docs: Remove jenkins reference from unit testing presentation This PR removes the jenkins reference from unit testing presentation as this is no longer supported on the kata containers project. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-01 20:26:35 +00:00
GabyCT	e94490232e	Merge pull request #9949 from cmaf/tests-fix-openvino-help tests: Update help section in openvino test	2024-07-01 13:31:51 -06:00
Gabriela Cervantes	e3318a04f7	metrics: Update container name in blogbench test This PR updates the container name to use a random name instead of a hard-coded one. This is a general improvement to avoid random failures, especially when we are running on baremetal environments. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-07-01 19:28:16 +00:00
Fabiano Fidêncio	05848d0c34	Merge pull request #9930 from likebreath/0627/clh_v40.0 Upgrade to Cloud Hypervisor v40.0	2024-07-01 20:04:47 +02:00
Steve Horsman	4fd820abd2	Merge pull request #9947 from stevenhorsman/fix-cleanups-workflow-secret gha: ci: Remove incorrect secrets line	2024-07-01 16:30:37 +01:00
Chelsea Mafrica	0b83c8549a	tests: Update help section in openvino test Test reports that it is a onednn test when it is openvino; update description. Fixes: #9948 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-07-01 14:24:50 +00:00
Hyounggyu Choi	795c5dc0ff	tests: Extend vfio-ap hotplug test to use zcrypttest This commit extends the vfio-ap hotplug test to include the use of `zcrypttest`. A newly introduced test by the tool consists of several test rounds as follows: - ioctl_test - simple_test - simple_one_thread_test - simple_multi_threads_test - multi_thread_stress_test - hang_after_offline_online_test A writable root filesystem is required for testing because the reference count needs to be reset after each test round. The current containerd kata containers support does not include `--privileged_without_host_devices`, which is necessary to configure a writable filesystem along with `--privileged`. (Please check out https://github.com/kata-containers/kata-containers/issues/9791 for details) So `crictl` is chosen to extend the test. The commit also includes the removal of old commands previously used for the tests repository but no longer in use. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-01 11:41:59 +02:00
Hyounggyu Choi	5bda197e9d	tests: Add zcrypttest tool to test image Dockerfile This commit copies an internal testing tool `zcrypttest` to the test image. A base image is changed to `ubuntu:22.04` due to a library dependency issue. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-01 11:40:49 +02:00
Hyounggyu Choi	99690ab202	runtime: Instantiate/pass vfio-ap device to ociSpec This commit adds the missing step of passing an attached vfio-ap device to a container via ociSpec. It instantiates and passes a vfio-ap device (e.g. a Z crypto device). A device at `/dev/z90crypt` covers all use cases at the time of writing. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-07-01 11:40:49 +02:00
Amulyam24	259ec408b5	kata-deploy: fix qemu static build for v8.2.1 on ppc64le Do not install the packages librados-dev and librbd-dev as they are not needed for building static qemu. Add machine option cap-ail-mode-3=off while creating the VM to qemu cmdline. Fixes: #9893 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2024-07-01 14:56:43 +05:30
stevenhorsman	16130e473c	gha: ci: Remove incorrect secrets line The CI is failing with: ``` Invalid workflow file: .github/workflows/cleanup-resources.yaml#L10 The workflow is not valid. .github/workflows/cleanup-resources.yaml (Line: 10, Col: 5): Unexpected value 'secrets' ``` I think this is because `secrets: inherit` is only applicable when re-using a workflow, not for a standalone job like we have here. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-07-01 09:32:58 +01:00
Hyounggyu Choi	f0187ff969	Merge pull request #9932 from BbolroC/drop-ci-install-go CI: Eliminate dependency on tests repo	2024-07-01 08:24:28 +02:00
Hyounggyu Choi	f2bfc306a2	Merge pull request #9936 from BbolroC/use-quay-lpine-bash-curl CI: Use multi-arch image for alpine-bash-curl	2024-07-01 08:02:01 +02:00
Manuel Huber	4b2e725d03	rootfs: Install Rust only when necessary For docker-based builds only install Rust when necessary. Further, execute the detect Rust version check only when intending to install Rust. As of today, this is the case when we intend to build the agent during rootfs build. Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2024-06-28 22:19:46 +00:00
Aurélien Bombo	c605fff4c1	ci: Temporarily disable kata-deploy and GARM tests Per the decision taken in the 6/27 AC meeting, this PR temporarily disables kata-deploy and GARM tests until we secure further Azure CI funding. In the meantime, I'll transition the GARM tests to free runners and reenable them to regain that coverage without affecting spending (see #9940). If it turns out the free runners are too slow, we'll switch back to GARM. After funding is secured, we'll reenable the kata-deploy tests (see #9939). Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-06-28 20:23:07 +00:00
Hyounggyu Choi	dd23beeb05	CI: Eliminating dependency on clone_tests_repo() As part of archiving the tests repo, we are eliminating the dependency on `clone_tests_repo()`. The scripts using the function are as follows: - `ci/install_rust.sh` - `ci/setup.sh` - `ci/lib.sh` This commit removes or replaces the files, and makes an adjustment accordingly. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-28 14:52:02 +02:00
Hyounggyu Choi	f2c5f18952	CI: Use multi-arch image for alpine-bash-curl A multi-arch image for `alpine-bash-curl` has been pushed to and is available at `quay.io/kata-containers`. This commit switches the test image to `quay.io/kata-containers/alpine-bash-curl`. Fixes: #9935 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-28 12:01:53 +02:00
Hyounggyu Choi	0e20f60534	CI: Drop unused scripts The following scripts are not used by the repository any more: - ci/install_go.sh - ci/run.sh - ci/install_vc.sh Additionally, they rely on the tests repo, which is soon to be archived. This commit drops the unused scripts. Fixes: #8507 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-28 07:55:21 +02:00
Archana Shinde	82a1892d34	agent: Add additional info while returning errors for update_interface This should provide additional context for errors while updating network interface. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-06-27 12:56:53 -07:00
Archana Shinde	2127288437	agent: Bring interface down before renaming it. In case we are dealing with multiple interfaces and there exists a network interface with a conflicting name, we temporarily rename it to avoid name conflicts. Before doing this, we need to bring the interface down. Failure to do so results in netlink returning Resource busy errors. The resource needs to be down for subsequent operation when the name is swapped back as well. This solves the issue of passing multiple networks in case of nerdctl as: nerdctl run --rm --net foo --net bar docker.io/library/busybox:latest ip a Fixes: #9900 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-06-27 12:56:53 -07:00
Zvonko Kaiser	a32b21bd32	Merge pull request #9918 from zvonkok/build-error rootfs: Fix spurious error	2024-06-27 19:46:51 +02:00
Bo Chen	25e3cab028	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v40.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #9929 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-06-27 09:59:00 -07:00
Bo Chen	ad92d73e43	versions: Upgrade to Cloud Hypervisor v40.0 Details of this release can be found in our roadmap project as iteration v40.0: https://github.com/orgs/cloud-hypervisor/projects/6. Fixes: #9929 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-06-27 09:40:13 -07:00
Alex Lyn	d66c214ae7	Merge pull request #9849 from markyangcc/main runtime: fix missing of VhostUserDeviceReconnect parameter assignment	2024-06-27 21:48:37 +08:00
Wainer Moschetta	afc1c1a782	Merge pull request #9896 from fitzthum/bump-gc-090 versions: bump coco guest components and trustee	2024-06-27 09:46:06 -03:00
Zvonko Kaiser	29bb9de864	Merge pull request #9923 from BbolroC/increase-interval-max-tries-kubectl tests: Increase interval and max_tries for kubectl_retry	2024-06-27 09:49:24 +02:00
Hyounggyu Choi	4ec355fb78	tests: Increase interval and max_tries for kubectl_retry Observed instability in the API server after deploying kata-deploy caused test failures. (see: https://github.com/kata-containers/kata-containers/actions/runs/9681494440/job/26743286861) Specifically, `kubectl_retry logs` failed before the API server could respond properly. This commit increases the interval and max_tries for kubectl_retry(), allowing sufficient time to handle this situation. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-27 08:39:22 +02:00
Aurélien Bombo	2c89828749	ci: Add scheduled job to cleanup resources, pt. II Follow-up to #9898 and final PR of this set. This implements the actual deletion logic. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-06-26 17:36:47 +00:00
Zvonko Kaiser	893fd2b59c	Merge pull request #9916 from zvonkok/config-fix gpu: Missing separator	2024-06-26 14:46:47 +02:00
Greg Kurz	fe7ef878d2	Merge pull request #9913 from gkurz/update-kata-ctl-deps kata-ctl: Update Cargo.lock	2024-06-26 14:31:03 +02:00
Zvonko Kaiser	30ec78b19a	rootfs: Fix spurious error In some DMZ'ed or CI systems the repos are not up to date and multistrap fails to find the ubuntu-keyring package. Update the repos to fix this. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-26 11:10:58 +00:00
Zvonko Kaiser	e0aa54301f	gpu: Missing separator Add the correct separator for replacement Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-26 10:40:35 +00:00
Greg Kurz	ac33a389c0	Merge pull request #9879 from pmores/remove-dependency-on-containerd-bundle-dir-tree runtime-rs: remove attempt to access sandbox bundle from container bu…	2024-06-26 10:57:50 +02:00
Greg Kurz	db7b2f7aaa	kata-ctl: Update Cargo.lock A previous change missed to refresh Cargo.lock. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-26 08:27:52 +02:00
Tobin Feldman-Fitzthum	dd8605917b	versions: bump coco guest components and trustee Pick up the changes from the newest version of guest-components and trustee. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-06-25 23:56:18 +00:00
GabyCT	81d23a1865	Merge pull request #9897 from GabyCT/topic/montime tests: Increase timeout to crictl calls on kata monitor tests	2024-06-25 17:27:15 -06:00
Gabriela Cervantes	a8432880f8	tests: Increase timeout to crictl calls on kata monitor tests This PR increases the timeout for crictl calls on kata monitor tests to avoid hitting issues every now and then and to avoid random failures. This PR is very similar to PR #7640. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-25 22:32:47 +00:00
Wainer Moschetta	c4fb6fbda2	Merge pull request #9887 from ldoktor/ci-kata-runtime ci.ocp: Ensure we smoke-test with the right runtime class	2024-06-25 15:27:27 -03:00
Fabiano Fidêncio	fb44edc22f	Merge pull request #9906 from stevenhorsman/TEE-sample-kbs-policy-guards tests: attestation: Restrict sample policy use	2024-06-25 20:27:13 +02:00
Steve Horsman	c9df743dab	Merge pull request #9898 from sprt/gha-cleanup-job ci: Add scheduled job to cleanup resources, pt. I	2024-06-25 19:11:30 +01:00
Saul Paredes	ce19419d72	genpolicy: allow some empty env vars Updated genpolicy settings to allow 2 empty environment variables that users may forget to specify (AZURE_CLIENT_ID and AZURE_TENANT_ID) Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-06-25 10:53:05 -07:00
Aurélien Bombo	0582a9c75b	Merge pull request #9864 from 3u13r/feat/genpolicy/layers-cache-file-path genpolicy: allow specifying layer cache file	2024-06-25 10:42:22 -07:00
Aurélien Bombo	d60b548d61	ci: Add scheduled job to cleanup resources This is the first part of adding a job to clean up potentially dangling Azure resources. This will be based on Jeremi's tool from https://github.com/jepio/kata-azure-automation. At first, we'll only clean up AKS clusters, as this is what has been causing us problems lately, but this could very well be extended to cleaning up entire resource groups, which is why I left the different names pretty generic (i.e. "resources" instead of "clusters"). Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-06-25 16:33:03 +00:00
stevenhorsman	7610b34426	tests: attestation: Restrict sample policy use - We only want to enable the sample verifier in the KBS for non-TEE tests, so prevent an edge case where the TEE platform isn't set up correctly and we might fall back to the sample and get false positives. To prevent this we add guards around the sample policy enablement and only run it for non confidential hardware Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-25 16:59:40 +01:00
Steve Horsman	d574d37c4b	Merge pull request #9903 from stevenhorsman/authenticated-regsitry-workflow-secrets workflow: coco: Add auth registry secret	2024-06-25 16:40:46 +01:00
stevenhorsman	d8961cbd4a	workflow: coco: Add auth registry secret - Add the `AUTHENTICATED_IMAGE_USER` and `AUTHENTICATED_IMAGE_PASSWORD` repository secrets as env vars to the coco tests, so we can use them to pull images from an authenticated registry for testing Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-25 11:11:02 +01:00
Alex Lyn	2c5b3a5c20	Merge pull request #9830 from gaohuatao-1/ght/count-rs runtime-rs: fix the bug of func count_files	2024-06-25 15:00:46 +08:00
GabyCT	27d75f93e2	Merge pull request #9872 from GabyCT/topic/varmemin metrics: Improve variable definition in memory inside containers script	2024-06-24 15:30:05 -06:00
Aurélien Bombo	b0cdf4eb0d	Merge pull request #9579 from microsoft/saulparedes/add_seccomp_support genpolicy: ignore SeccompProfile in PodSpec	2024-06-24 08:58:01 -07:00
Wainer Moschetta	bcdc4fde10	Merge pull request #9857 from wainersm/disable_failing_jobs-part2 CI: disable jobs that failed >= 50% on nightly CI recently - part 2	2024-06-24 10:11:05 -03:00
Leonard Cohnen	6a3ed38140	genpolicy: allow specifying layer cache file Add --layers-cache-file-path flag to allow the user to specify where the cache file for the container layers is saved. This allows e.g. to have one cache file independent of the user's working directory. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2024-06-24 14:53:27 +02:00
Fabiano Fidêncio	3adf9e250f	Merge pull request #9875 from zvonkok/gha-no-sudo-arm64 ci: gha no sudo arm64	2024-06-21 15:28:54 +02:00
Wainer Moschetta	f7e0d6313b	Merge pull request #9865 from wainersm/qemu-coco-dev_updates runtime: updates to qemu-coco-dev configuration	2024-06-21 10:14:30 -03:00
Fabiano Fidêncio	2d552800f2	Merge pull request #9876 from zvonkok/gha-no-sudo-s390x ci: remove sudo from s390x build	2024-06-21 15:00:31 +02:00
Saul Paredes	44afb4aa5f	genpolicy: ignore SeccompProfile in PodSpec Ignore SeccompProfile in PodSpec Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-06-20 09:42:17 -07:00
Dan Mihai	7aeaf2502a	Merge pull request #9856 from microsoft/danmihai1/new-policy-rules genpolicy: reject untested CreateContainer field values	2024-06-20 09:34:53 -07:00
GabyCT	9320c2e484	Merge pull request #9845 from GabyCT/topic/fixartifacts gha: Do not fail when collecting artifacts	2024-06-20 10:15:53 -06:00
Hyounggyu Choi	959a277dc5	Merge pull request #9886 from BbolroC/kernel-config-uv-uapi-s390x kernel: Add CONFIG_S390_UV_UAPI for s390x	2024-06-20 16:05:15 +02:00
Steve Horsman	d5b4da7331	Merge pull request #9881 from stevenhorsman/remote-hypervisor-policy runtime: Support policy in remote hypervisor	2024-06-20 14:01:29 +01:00
Hyounggyu Choi	9cb12dfa88	kernel: Add CONFIG_S390_UV_UAPI for s390x While enabling the attestation for IBM SE, it was observed that a kernel config `CONFIG_S390_UV_UAPI` is missing. This config is required to present an ultravisor in the guest VM. This commit adds the missing config. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-20 13:15:33 +02:00
Lukáš Doktor	b08c019003	ci.ocp: Ensure we smoke-test with the right runtime class we do encourage people to set the KATA_RUNTIME, but it is only used by the webhook. Let's define it in the main `test.sh` and use it in the smoke test to ensure the user-defined runtime is smoke-tested rather than hard-coded kata-qemu one. Related to: #9804 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-06-20 11:15:02 +02:00
Fabiano Fidêncio	0f2a4d202e	Merge pull request #9884 from fidencio/topic/re-enable-tdx-ci ci: tdx: Re-enable TDX CI	2024-06-20 06:39:06 +02:00
GabyCT	02075f73e9	Merge pull request #9874 from GabyCT/topic/fixvarnerdctl tests: nerdctl: Fix variables names and remove network	2024-06-19 13:43:25 -06:00
Fabiano Fidêncio	2bab0f31d7	ci: tdx: Re-enable TDX CI Now, using vanilla kubernetes, let's re-enable the TDX CI and hope it becomes more stable than it used to be. The cleanup-snapshotter is now taking ~4 minutes, and that matches with the other platforms, mainly considering there's a sum of 210 seconds sleep in the process. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-19 20:08:28 +02:00
Greg Kurz	81972f6ffc	Merge pull request #9149 from ryansavino/upgrade-to-qemu-8.2.1 qemu: upgrade to 8.2.4	2024-06-19 19:10:02 +02:00
stevenhorsman	779754dcf6	runtime: Support policy in remote hypervisor Move the `sandbox.agent.setPolicy` call out of the remoteHypervisor if block, so we can use the policy implementation on peer pods Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-19 16:43:53 +01:00
Fabiano Fidêncio	f9862e054c	Merge pull request #9882 from fidencio/topic/ci-tdx-use-vanilla-k8s ci: tdx: Use vanilla k8s instead of k3s	2024-06-19 17:33:00 +02:00
Pavel Mores	6a4919eeb9	runtime-rs: fix misleading log message get_vmm_master_tid() currently returns an error with the message "cannot get qemu pid (though it seems running)" when it finds a valid QemuInner::qemu_process instance but fails to extract the PID out of it. This condition however in fact means that a qemu child process was running (otherwise QemuInner::qemu_process would be None) but isn't anymore (id() returns None). Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-19 17:15:24 +02:00
Pavel Mores	af5492e773	runtime-rs: made Qemu::stop_vm() idempotent Since Hypervisor::stop_vm() is called from the WaitProcess request handling which appears to be per-container, it can be called multiple times during kata pod shutdown. Currently the function errors out on any subsequent call after the initial one since there's no VM to stop anymore. This commit makes the function tolerate that condition. While it seems conceivable that sandbox shouldn't be stopped by WaitProcess handling, and the right fix would then have to happen elsewhere, this commit at least makes qemu driver's behaviour consistent with other hypervisor drivers in runtime-rs. We also slightly improve the error message in case there's no QemuInner::qemu_process instance. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-19 17:15:24 +02:00
Pavel Mores	5fbbff9e5e	runtime-rs: remove attempt to access sandbox bundle from container bundle Since no objections were raised in the linked issue (#9847) this commit removes the attempt to derive sandbox bundle path from container bundle path. As described in more detail in the linked issue, this is container runtime specific and doesn't seem to serve any purpose. As for implementation, we hoist the only part of get_shim_info_from_sandbox() that's still useful (getting the socket address) directly into the caller and remove the function altogether. Fixes #9847 Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-19 17:09:15 +02:00
Fabiano Fidêncio	7127178acc	ci: tdx: Use vanilla k8s instead of k3s We've noticed a bunch of issues related to deploying and deleting the nydus-snapshotter. As we don't see the same issues on other machines using vanilla kubernetes, let's avoid using k3s for now and follow the flow. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-19 16:56:15 +02:00
Zvonko Kaiser	beab17f765	Merge pull request #9877 from zvonkok/gha-no-sudo-ppc64 ci: gha no sudo ppc64	2024-06-19 14:02:05 +02:00
Zvonko Kaiser	d783ddaf03	ci: Remove not needed chown for ppc64 Now that all artifacts are owned by $USER no extra step needed to adjust ownership Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-19 07:56:45 +00:00
Zvonko Kaiser	5bc37e39d5	ci: remove sudo from ppc64 build We can now do the same for ppc64 that we did for amd64 and remove the sudo cp. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-19 07:55:45 +00:00
Zvonko Kaiser	c341234c0b	ci: remove sudo from s390x build We can now do the same for s390x that we did for amd64 and remove the sudo cp. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-19 07:53:33 +00:00
Zvonko Kaiser	3beb460a97	ci: Remove not needed chown for arm64 Now that all artifacts are owned by $USER no extra step needed to adjust ownership Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-19 07:48:00 +00:00
Zvonko Kaiser	445b389b16	ci: remove sudo from arm64 build We can now do the same for arm64 that we did for amd64 and remove the sudo cp. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-19 07:46:51 +00:00
Gabriela Cervantes	6ec7971f7a	tests: nerdctl: Fix variables names and remove network This PR fixes the variable names for the network that was created and also removes the networks that were created for the tests, to ensure a clean environment when running all the tests and to avoid failures, especially on bare-metal environments where the network already exists. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-18 23:00:49 +00:00
Dan Mihai	4df66568cf	genpolicy: reject untested CreateContainer field values Reject CreateContainerRequest field values that are not tested by Kata CI and that might impact the confidentiality of CoCo Guests. This change uses a "better safe than sorry" approach to untested fields. It is very possible that in the future we'll encounter reasonable use cases that will either: - Show that some of these fields are benign and don't have to be verified by Policy, or - Show that Policy should verify legitimate values of these fields These are the new CreateContainerRequest Policy rules: count(input.shared_mounts) == 0 is_null(input.string_user) i_oci := input.OCI is_null(i_oci.Hooks) is_null(i_oci.Linux.Seccomp) is_null(i_oci.Solaris) is_null(i_oci.Windows) i_linux := i_oci.Linux count(i_linux.GIDMappings) == 0 count(i_linux.MountLabel) == 0 count(i_linux.Resources.Devices) == 0 count(i_linux.RootfsPropagation) == 0 count(i_linux.UIDMappings) == 0 is_null(i_linux.IntelRdt) is_null(i_linux.Resources.BlockIO) is_null(i_linux.Resources.Network) is_null(i_linux.Resources.Pids) is_null(i_linux.Seccomp) i_linux.Sysctl == {} i_process := i_oci.Process count(i_process.SelinuxLabel) == 0 count(i_process.User.Username) == 0 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-06-18 18:09:31 +00:00
Wainer Moschetta	cf372f41bf	Merge pull request #9869 from fidencio/topic/disable-tdx-ci ci: tdx: Disable TDX CI	2024-06-18 14:47:38 -03:00
Gabriela Cervantes	671d9af456	metrics: Improve variable definition in memory inside containers script This PR improves the variable definition in memory inside the container script for metrics. This change declares and assigns the variables separately to avoid masking return values. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-18 16:56:12 +00:00
Gabriela Cervantes	eeb467bdc2	gha: Do not fail when collecting artifacts This PR will avoid the failures when collecting artifacts for the gha. This will ensure that we collect and archive system's data for the purpose of debugging. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-18 16:05:23 +00:00
Zvonko Kaiser	b1909e940e	deploy: Add busybox target For a minimal initrd/image build we may want to leverage busybox. This is part number two of the NVIDIA initrd/image build Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-18 15:31:00 +00:00
Wainer Moschetta	36093e86e0	Merge pull request #9863 from wainersm/kata-deploy_yq kata-deploy: always copy ci/install_yq.sh	2024-06-18 10:05:41 -03:00
Fabiano Fidêncio	587f4d45de	ci: tdx: Disable TDX CI TDX CI has been having some issues with the Nydus snapshotter cleanup, which has been getting stuck for hours every now and then. With this in mind, let's disable the TDX CI, so we avoid it blocking the progress of the Kata Containers project, and re-enable it as soon as we have it solved on Intel's side. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-18 10:30:40 +02:00
markyangcc	a28bf266f9	runtime: fix missing of VhostUserDeviceReconnect parameter assignment Commit 'ca02c9f5124e' implements the vhost-user-blk reconnection functionality. However, it missed assigning VhostUserDeviceReconnect when creating the QEMU HypervisorConfig, resulting in VhostUserDeviceReconnect always being set to the default value 0. The real change is this line; most of the other changes are caused by go format: return vc.HypervisorConfig{ // ... VhostUserDeviceReconnect: h.VhostUserDeviceReconnect, }, nil Fixes: #9848 Signed-off-by: markyangcc <mmdou3@163.com>	2024-06-18 12:15:10 +08:00
Alex Lyn	388cd7dde4	Merge pull request #9772 from pmores/add-base-qmp-framework runtime-rs: add base qmp framework	2024-06-18 09:53:28 +08:00
Alex Lyn	275c498dc9	Merge pull request #9834 from lifupan/main sandbox: fix the issue of failed to get the vmm master tid	2024-06-18 08:57:21 +08:00
Alex Lyn	d3fb6bfd35	Merge pull request #9860 from stevenhorsman/tokio-vulnerability-bump Tokio vulnerability bump	2024-06-18 08:35:34 +08:00
Wainer dos Santos Moschetta	bdbee78517	runtime: allow default_{vcpus,memory} annotations to qemu-coco-dev This is a counterpart of commit `abf52420a4` for the qemu-coco-dev configuration. By allowing the default_vcpus and default_memory annotations, users can fine-tune the VM based on the size of the container image to avoid issues related to pulling large images in the guest. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 18:59:52 -03:00
Wainer dos Santos Moschetta	baa8d9d99c	runtime: set shared_fs=none to qemu-coco-dev configuration Just like the TEE configurations (sev, snp, tdx) we want to have the qemu-coco-dev using shared_fs=none. Fixes: #9676 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 18:42:46 -03:00
Wainer Moschetta	b8d7a8c546	Merge pull request #9862 from BbolroC/improve-kubectl-retry tests: Use selector rather than pod name for kubectl logs/describe	2024-06-17 18:33:24 -03:00
Hyounggyu Choi	6b065f5609	tests: Use selector rather than pod name for kubectl logs/describe The following error was observed during the deployment of nydus snapshotter: ``` Error from server (NotFound): the server could not find the requested resource ( pods/log nydus-snapshotter-5v82v) 'kubectl logs nydus-snapshotter-5v82v -n nydus-system' failed after 3 tries Error: Process completed with exit code 1. ``` This error can occur when a pod is re-created by a daemonset during the retry interval. This commit addresses the issue by using `--selector` rather than the pod name for `kubectl logs/describe`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-17 22:27:50 +02:00
Wainer Moschetta	7df221a8f9	Merge pull request #9833 from wainersm/qemu-rs_tests tests/k8s: run for qemu-runtime-rs on AKS	2024-06-17 16:59:46 -03:00
Zvonko Kaiser	5f11c0f144	Merge pull request #9861 from zvonkok/release-3.6.0 release: Bump VERSIONS file to 3.6.0	2024-06-17 20:35:29 +02:00
Wainer Moschetta	b6a28bd932	Merge pull request #9786 from microsoft/saulparedes/add_back_insecure_registry_pull genpolicy: add back support for insecure	2024-06-17 15:21:25 -03:00
Wainer Moschetta	68415dabcd	Merge pull request #9815 from msanft/fix/genpolicy/flag-name genpolicy: fix settings path flag name	2024-06-17 15:13:25 -03:00
Wainer dos Santos Moschetta	08eaa60b59	CI: disable all run-kata-deploy-tests-on-garm jobs The following jobs have failed more than 50% on nightly CI. run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s) run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2) run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s) Instead of removing only those jobs, let's skip the kata-deploy-tests on GARM completely so we can try to fix all the issues (or maybe drop the jobs altogether). Issue: #9854 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 14:39:38 -03:00
Steve Horsman	4a41cee534	Merge pull request #9838 from zvonkok/gha-no-sudo CI: remove sudo from GHA	2024-06-17 16:23:39 +01:00
Wainer dos Santos Moschetta	e517167825	kata-deploy: always copy ci/install_yq.sh To build the build-kata-deploy image, ci/install_yq.sh should be copied to tools/packaging/kata-deploy/local-build/dockerbuild, as this script installs yq within the image. Currently, if tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh exists then make won't copy it again. This can cause problems as, for example, the current update of the yq version (commit `c99ba42d`) in ci/install_yq.sh won't force a rebuild of the build-kata-deploy image. Note: this isn't a problem on a fresh dev or CI environment. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-17 12:18:22 -03:00
Zvonko Kaiser	618121a654	release: Bump VERSIONS file to 3.6.0 Let's bump the VERSIONS file and start preparing for a new release of the project. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-17 12:06:46 +00:00
stevenhorsman	53659f1ede	libs: Update tokio dependencies - Bump tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:03:01 +01:00
stevenhorsman	35f6be97df	runtime-rs: Update tokio dependency - Bump tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 If possible it would be good to add the many runtime-rs crates into the runtime-rs workspace and provide a centralised version to avoid the updates in many places. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:03:01 +01:00
stevenhorsman	3bb1a67d80	agent-ctl: Update rustjail dependencies - Run `cargo update -p rustjail` to pick up rustjail's bump of tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:03:01 +01:00
stevenhorsman	d2d35d2dcc	runk: Update tokio dependencies - Bump tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:03:01 +01:00
stevenhorsman	adda401a8c	genpolicy: Update tokio dependencies - Bump tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:03:01 +01:00
stevenhorsman	b7928f465e	agent: Update tokio dependencies - Bump tokio to 1.38.0 to fix the security vulnerability https://rustsec.org/advisories/RUSTSEC-2024-0019 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-17 13:02:47 +01:00
Zvonko Kaiser	5c2f3f34a8	CI: remove sudo from GHA Now that all artifacts are owned by $USER we can start to remove sudo from our GHA Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-17 11:06:56 +00:00
Steve Horsman	cce735a09e	Merge pull request #9840 from stevenhorsman/bump-agent-rust-1.75.0 versions: Bump rust toolchain	2024-06-17 11:28:07 +01:00
Fupan Li	b218c4bc10	Merge pull request #9836 from lifupan/main_fix sandbox: fix the issue of double initial_size_manager config	2024-06-17 09:15:51 +08:00
Fabiano Fidêncio	9b5dd854db	Merge pull request #9726 from GabyCT/topic/unodeport tests: kbs: Use nodeport deployment from upstream trustee	2024-06-16 22:31:27 +02:00
Wainer dos Santos Moschetta	d4f664b73b	CI: disable run-kata-monitor-tests / run-monitor (containerd, lts) job The job has failed more than 50% on nightly CI. Remove it from the list of execution until we have a fix. Issue: #9853 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-14 16:27:04 -03:00
Wainer dos Santos Moschetta	cbf0b7ca7b	CI: disable run-basic-amd64-tests / run-nerdctl-tests (clh) job The job has failed more than 50% on nightly CI. Remove it from the list of execution until we have a fix. Issue: #9852 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-14 16:17:26 -03:00
Wainer dos Santos Moschetta	562820449e	CI: disable run-basic-amd64-tests / run-vfio (qemu) job The job has failed more than 50% on nightly CI. Remove it from the list of execution until we have a fix. The clh variation was disabled in commit `5f5274e699`, so this change will actually result in all the VFIO jobs being disabled. Instead of deleting the entire entry from this workflow yaml (or commenting out the entry), I preferred to use `if: false`, which will make the jobs appear in the UI as skipped. Issue: 9851 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-14 16:09:59 -03:00
GabyCT	4800e242a4	Merge pull request #9832 from GabyCT/topic/fixsets tests: setup: Improve setup script for kubernetes tests	2024-06-14 11:14:05 -06:00
Bo Chen	a68aeca356	Merge pull request #9575 from likebreath/0430/clh_v39.0 versions: Upgrade to Cloud Hypervisor v39.0	2024-06-14 09:10:19 -07:00
stevenhorsman	e23b929ba0	versions: Bump rust toolchain - Bump the rust version used to build the agent to 1.75.0 as agreed on in the AC meeting Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-14 16:45:16 +01:00
stevenhorsman	3fb176970f	dragonball: Fix device manager warning - Fix the lint error: ``` error: you seem to use `.enumerate()` and immediately discard the index --> src/device_manager/mod.rs:427:33 \| 427 \| for (_index, device) in self.virtio_devices.iter().enumerate() { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` by removing the unnecessary enumerate Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-14 16:45:16 +01:00
stevenhorsman	1ea2671f2f	dragonball: Fix lint with rust 1.75.0 The ci failed with: ``` error: use of `or_insert_with` to construct default value --> src/address_space_manager.rs:650:14 \| 650 \| .or_insert_with(NumaNode::new); \| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `or_default()` \| ``` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-14 16:45:16 +01:00
Steve Horsman	ab8a9882c1	Merge pull request #9818 from EmmEff/fix-spelling runtime: fix minor spelling issues	2024-06-14 13:12:56 +01:00
Steve Horsman	99bf95f773	Merge pull request #9827 from littlejawa/fix_panic_on_metrics_gathering runtime: avoid panic on metrics gathering	2024-06-14 11:12:43 +01:00
Steve Horsman	3eba4211f3	Merge pull request #9843 from microsoft/danmihai1/install_yq ci: fix the expected yq version string	2024-06-14 10:26:21 +01:00
Pavel Mores	380f8ad03f	runtime-rs: add base vCPU hotplugging support We take advantage of the Inner pattern to enable QemuInner::resize_vcpu() take `&mut self` which we need to call non-const functions on Qmp. This runs on Intel architecture but will need to be verified and ported (if necessary) to other architectures in the future. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-14 10:13:32 +02:00
Pavel Mores	8231c6c4a3	runtime-rs: instantiate Qmp as (optional) member of QemuInner The QMP_SOCKET_FILE constant in cmdline_generator.rs is made public to make it accessible from QemuInner. This is fine for now however if the constant needs to be accessed from additional places in the future we could consider moving it to somewhere more visible. The Debug impl for Qmp is empty since first, we don't actually want it, it's only forced by Hypervisor trait bounds, and second, it doesn't have anything to display anyway. If Qmp gets any members in the future that can be meaningfully displayed they should be handled by Qmp's Debug::fmt(). Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-14 10:13:32 +02:00
Pavel Mores	6fdb262dca	runtime-rs: add Qmp object to encapsulate QMP functionality The constructor handles QMP connection initialisation, too, so there can be no non-functional Qmp instance. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-06-14 10:13:32 +02:00
Manuel Huber	62fd84dfd8	build: allow rootfs builds w/o git or VERSION file deps We set the VERSION variable consistently across Makefiles to 'unknown' if the file is empty or not present. We also use git commands consistently for calculating the COMMIT, COMMIT_NO variables, not erroring out when building outside of a git repository. In create_summary_file we also account for a missing/empty VERSION file. This makes e.g. the UVM build process in an environment where we build outside of git with a minimal/reduced set of files smoother. Signed-off-by: Manuel Huber <mahuber@microsoft.com>	2024-06-13 22:46:52 +00:00
Dan Mihai	824287d64a	Merge pull request #9844 from microsoft/danmihai1/k8s-policy-pvc tests: fix yq command line in k8s-policy-pvc	2024-06-13 15:07:15 -07:00
Wainer dos Santos Moschetta	73ab5942fb	tests/k8s: run for qemu-runtime-rs on AKS The following tests are disabled because they fail (as they do with dragonball): - k8s-cpu-ns.bats - k8s-number-cpus.bats - k8s-sandbox-vcpus-allocation.bats Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-13 16:20:59 -03:00
Mike Frisch	c2f61b0fe3	runtime: spelling fixes Minor spelling fixes in runtime log messages. Signed-off-by: Mike Frisch <mikef17@gmail.com>	2024-06-13 12:11:34 -04:00
Dan Mihai	56f9e23710	tests: fix yq command line in k8s-policy-pvc Fix the collision between: - https://github.com/kata-containers/kata-containers/pull/9377 - https://github.com/kata-containers/kata-containers/pull/9706 One enabled a newer yq command line format and the other used the older format. Both passed CI because they were not tested together. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-06-13 16:06:15 +00:00
Dan Mihai	23e99e264c	ci: fix the expected yq version string I get: ~/gopath/bin/yq --version yq (https://github.com/mikefarah/yq/) version v4.40.7 Also add support for set -o xtrace to install_yq.sh. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-06-13 15:52:26 +00:00
Ryan Savino	0430794952	qemu: upgrade to 8.2.4 There is a known issue in qemu 7.2.0 that causes kernel-hashes to fail the verification of the launch binaries for the SEV legacy use case. Upgraded to qemu 8.2.4. Newly available features are disabled. Fixes: #9148 Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-13 10:19:42 -05:00
Greg Kurz	b85b1c1058	Merge pull request #9790 from gkurz/kill-some-dead-runtime-code Kill some dead runtime code	2024-06-13 15:45:51 +02:00
gaohuatao	4cb4e44234	runtime-rs: fix the bug of func count_files When the total number of files observed is greater than limit, return -1 directly. runtime has fixed this bug; the fix should be ported to runtime-rs. Fixes: #9829 Signed-off-by: gaohuatao <gaohuatao@bytedance.com>	2024-06-13 16:02:33 +08:00
Fupan Li	cd68ef372f	sandbox: fix the issue of double initial_size_manager config It shouldn't call the initial_size_manager's setup_config in the load_config since it had been called in the sandbox's try_init function. Fixes: #9778 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-06-13 15:44:51 +08:00
Fupan Li	61687992f4	sandbox: fix the issue of failed to get the vmm master tid For kata containers, the container's pid is meaningless to containerd/crio since the container's pid belongs to the VM and containerd/crio can't use it. Thus we just return any tid of the kata shim or hypervisor. But since the hypervisor has been stopped before the container is deleted, the hypervisor's tid can't be obtained for some supported hypervisors, so we'd better return the kata shim's pid instead of the hypervisor's tid. Fixes: #9777 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-06-13 10:27:04 +08:00
Fabiano Fidêncio	56423cbbfe	Merge pull request #9706 from burgerdev/burgerdev/genpolicy-devices genpolicy: add support for devices	2024-06-12 23:03:41 +02:00
Wainer Moschetta	d971e5ae68	Merge pull request #9537 from wainersm/kata-deploy-crio kata-deploy: configuring CRI-O for guest-pull image pulling	2024-06-12 17:27:00 -03:00
Gabriela Cervantes	c36c300fd6	tests: kbs: Use nodeport deployment from upstream trustee This PR uses the nodeport deployment from upstream trustee. To keep our deployment as close as possible to upstream trustee, replace the custom nodeport handling with the nodeport kustomized flavour from the trustee project. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-12 20:01:59 +00:00
Gabriela Cervantes	0066aebd84	tests: setup: Improve setup script for kubernetes tests This PR makes general improvements like definition of variables and the use of them to improve the general setup script for kubernetes tests. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-12 19:39:54 +00:00
GabyCT	461b6e7c93	Merge pull request #9821 from GabyCT/topic/fixts metrics: Use function definition to have uniformity	2024-06-12 10:04:28 -06:00
Fabiano Fidêncio	3a0247ed43	Merge pull request #9819 from stevenhorsman/config-envvar-precedence agent: config: Ensure envs take precedence	2024-06-12 11:26:02 +02:00
Julien Ropé	9c86eb1d35	runtime: avoid panic on metrics gathering While running with a remote hypervisor, whenever kata-monitor tries to access metrics from the shim, the shim does a "panic" and no metric can be gathered. The function GetVirtioFsPid() is called on metrics gathering, and had a call to "panic()". Since there is no virtiofs process for remote hypervisor, the right implementation is to return nil. The caller expects that, and will skip metrics gathering for virtiofs. Fixes: #9826 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-06-12 10:02:44 +02:00
Xuewei Niu	92cc5e0adb	Merge pull request #9781 from gaohuatao-1/ght/shm	2024-06-12 12:39:28 +08:00
Moritz Sanft	84903c898c	genpolicy: fix settings path flag name This corrects the warning to point to the `-j` flag, which is the correct flag for the JSON settings file. Previously, the warning was confusing, as it pointed to the `-p` flag, which specifies the path for the Rego ruleset. Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>	2024-06-11 21:17:18 +02:00
Greg Kurz	1acf8d0c35	govmm: Drop QEMU's `NoShutdown` knob Code is not used. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Greg Kurz	cb5b548ad7	govmm: Drop QEMU's `Daemonize` knob Code isn't used anymore. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Greg Kurz	33eaf69d5f	virtcontainers: Drop QEMU's `Daemonize` knob QEMU isn't started as daemon anymore and this won't change (see #5736 for details). Drop the related code. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-06-11 19:55:54 +02:00
Wainer Moschetta	f66a5b6287	Merge pull request #9807 from wainersm/qemu-rs_kata-deploy kata-deploy: add qemu-runtime-rs runtimeClass	2024-06-11 14:50:01 -03:00
Dan Mihai	d47f40210a	Merge pull request #9808 from microsoft/saulparedes/oci_from_settings genpolicy: load OCI version from settings	2024-06-11 10:42:04 -07:00
Gabriela Cervantes	a96ff49060	metrics: Use function definition to have uniformity This PR uses the function definition to have uniformity across all the launch times script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-11 17:36:08 +00:00
Saul Paredes	3e9d6c11a1	genpolicy: add back support for insecure registries Adding back changes from `77540503f9`. Fixes: #9008 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-06-11 09:42:23 -07:00
Bo Chen	2398442c58	runtime: clh: Re-generate the client code This patch re-generates the client code for Cloud Hypervisor v39.0. Note: The client code of cloud-hypervisor's OpenAPI is automatically generated by openapi-generator. Fixes: #8694, #9574 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-06-11 09:42:17 -07:00
Bo Chen	7a82894502	versions: Upgrade to Cloud Hypervisor v39.0 This patch upgrades Cloud Hypervisor to v39.0 from v36.0, which contains fixes of several security advisories from dependencies. Details can be found from #9574. Fixes: #8694, #9574 Signed-off-by: Bo Chen <chen.bo@intel.com>	2024-06-11 09:42:16 -07:00
Wainer dos Santos Moschetta	be9990144a	workflow: run kata-deploy tests to qemu-runtime-rs on AKS Start testing the ability of kata-deploy to install and configure the qemu-runtime-rs runtimeClass. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-11 12:58:47 -03:00
Wainer dos Santos Moschetta	4f398cc969	kata-deploy: add qemu-runtime-rs runtimeClass Allow kata-deploy to install and configure the qemu-runtime-rs runtimeClass which ties to qemu hypervisor implementation in rust for the runtime-rs. Fixes: #9804 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-11 12:58:47 -03:00
stevenhorsman	40e02b34cb	agent: config: Ensure envs take precedence - Update the config parsing logic so that when reading from the agent-config.toml file any envs are still processed - Add unit tests to formalise that the envs take precedence over values from the command line and the config file Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-06-11 16:31:10 +01:00
Steve Horsman	59ff40f054	Merge pull request #9811 from mkulke/mkulke/use-kebabcase-for-enum-values-in-config-file-parsing agent: convert enum vals to kebab-case in cfg file	2024-06-11 14:49:30 +01:00
gaohuatao	638e9acf89	runtime: fix the bug of func countFiles When the total number of files observed is greater than limit, return (-1, err). When the returned err is not nil, the func countFiles should return -1. Fixes:#9780 Signed-off-by: gaohuatao <gaohuatao@bytedance.com>	2024-06-11 18:17:18 +08:00
Alex Lyn	1c8db85d54	Merge pull request #9784 from Apokleos/bufix-testcases kata-types: fix bug in kata-types several test cases	2024-06-11 10:01:45 +08:00
Saul Paredes	6a84562c16	genpolicy: load OCI version from settings Load OCI version from genpolicy-settings.json and validate it in rules.rego Fixes: #9593 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-06-10 15:30:39 -07:00
GabyCT	0c5849b68b	Merge pull request #9809 from microsoft/danmihai1/yq-breaking-change tests: k8s: use newer yq command line format	2024-06-10 16:29:59 -06:00
Wainer Moschetta	ade69e44f9	Merge pull request #9785 from BbolroC/kubectl-retry CI: Introduce retry mechanism for kubectl in gha-run.sh	2024-06-10 18:33:34 -03:00
Magnus Kulke	abc704a720	agent: convert enum vals to kebab-case in cfg file fixes #9810 Add an annotation to the enum values in the agent config that will deserialize them using a kebab-case conversion, aligning the behaviour to parsing of params specified via kernel cmdline. drive-by fix: add config override for guest_component_procs variable Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>	2024-06-10 21:55:05 +02:00
Dan Mihai	32198620a9	tests: k8s: use newer yq command line format Fix the recent collision between: - https://github.com/kata-containers/kata-containers/pull/9377 - https://github.com/kata-containers/kata-containers/pull/9725 One enabled a newer yq command line format and the other used the older format. Both passed CI because they were not tested together. Fixes: #9789 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-06-10 18:48:25 +00:00
Dan Mihai	079a0a017c	Merge pull request #9557 from portersrc/ci-debug-output-nydus-pod CI: describe pod on k8s-create-pod wait failure	2024-06-10 08:17:54 -07:00
Ryan Savino	84280115f6	Merge pull request #9151 from niteeshkd/nd_snp_kernel_hashes runtime: enable kernel-hashes for SNP confidential container	2024-06-07 18:19:51 -05:00
GabyCT	03bcc167a4	Merge pull request #9779 from GabyCT/topic/fixcoscript tests: Fix indentation in common script	2024-06-07 15:37:10 -06:00
Wainer Moschetta	7a28535277	Merge pull request #9800 from fidencio/topic/ci-tdx-re-enable-some-of-the-tests ci: tdx: Re-enable a bunch of volume related tests	2024-06-07 16:17:19 -03:00
Hyounggyu Choi	8ff128dda8	CI: Introduce retry mechanism for kubectl in gha-run.sh Frequent errors have been observed during k8s e2e tests: - The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port? - Error from server (ServiceUnavailable): the server is currently unable to handle the request - Error from server (NotFound): the server could not find the requested resource These errors can be resolved by retrying the kubectl command. This commit introduces a wrapper function in common.sh that runs kubectl up to 3 times with a 5-second interval. Initially, this change only covers gha-run.sh for Kubernetes. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-07 18:24:19 +02:00
Fabiano Fidêncio	81c221c1b4	ci: k8s: tdx: Re-enable volume tests It seems I was very loose in disabling some of the tests, and the issues I faced could be related to other instabilities in the CI. Let's re-enable this one, following what was done for the SEV, SNP, and coco-qemu-dev. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 18:13:36 +02:00
Fabiano Fidêncio	9db9d35198	ci: k8s: tdx: Re-enable projected-volume tests It seems I was very loose in disabling some of the tests, and the issues I faced could be related to other instabilities in the CI. Let's re-enable this one, following what was done for the SEV, SNP, and coco-qemu-dev. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 18:12:36 +02:00
Fabiano Fidêncio	f6a6cba8ca	ci: k8s: tdx: Re-enable nested-configmap-secret tests It seems I was very loose in disabling some of the tests, and the issues I faced could be related to other instabilities in the CI. Let's re-enable this one, following what was done for the SEV, SNP, and coco-qemu-dev. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 18:12:06 +02:00
Fabiano Fidêncio	957d0cccf6	ci: k8s: tdx: Re-enable inotify tests It seems I was very loose in disabling some of the tests, and the issues I faced could be related to other instabilities in the CI. Let's re-enable this one, following what was done for the SEV, SNP, and coco-qemu-dev. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 18:10:39 +02:00
Fabiano Fidêncio	fc6f662ae0	ci: k8s: tdx: Re-enable credentials-secrets tests It seems I was very loose in disabling some of the tests, and the issues I faced could be related to other instabilities in the CI. Let's re-enable this one, following what was done for the SEV, SNP, and coco-qemu-dev. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 18:08:29 +02:00
Fabiano Fidêncio	5741c6d3e6	Merge pull request #9768 from fidencio/topic/ci-tdx-enable-cdh-test ci: kbs: Enable CDH tests for TDX	2024-06-07 17:59:12 +02:00
Greg Kurz	afeb98d73f	Merge pull request #9782 from ldoktor/ci-centos-9 ci.ocp: Switch base to centos-9	2024-06-07 13:15:02 +02:00
Fabiano Fidêncio	fde457589e	ci: kbs: tdx: Enable basic attestation tests Let's stop skipping the CDH tests for TDX, as now we should have an environment where they can run and should pass. :-) Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 12:18:50 +02:00
Fabiano Fidêncio	cac525059e	ci: kbs: tdx: Use the hostname ip instead of localhost for the PCCS We must ensure we use the host ip to connect to the PCCS running on the host side, instead of using localhost (which has a different meaning from inside the KBS pod). The reason we're using `hostname -i` instead of the helper functions is that the helper functions need the coco-kbs deployed for them to work, and what we do here happens before the deployment. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-06-07 12:18:07 +02:00
Alex Lyn	27685c91e5	kata-types: fix bug in kata-types several test cases (1) Misuse of cap.set caused the previous Caps to be lost, which made assert! fail; replace cap.set with cap.add. (2) An error is returned if no such name is set when running update_config_by_annotation { ... if config.runtime.name.is_empty() { return Err(io::Error::new( io::ErrorKind::InvalidData, "Runtime name is missing in the configuration", )); } ... } Fixes #9783 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-06-07 09:16:23 +08:00
David Esparza	822c641b58	Merge pull request #9760 from amshinde/kata-manager-link-runc kata-manager: Add symlinks for runc and slirp4netns	2024-06-06 12:55:57 -06:00
Lukáš Doktor	699376c535	ci.ocp: Switch base to centos-9 Centos8 is EOL and repos are not available anymore. Centos9 contains the same packages and should do well as a base for testing. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-06-06 09:03:17 +02:00
Chris Porter	4172ccb3a0	CI: describe pod on k8s-create-pod wait failure This is generally useful debug output on test failures, and specifically this has been useful for nydus-related issues recently. Signed-off-by: Chris Porter <porter@ibm.com>	2024-06-05 12:37:53 -04:00
Gabriela Cervantes	264c7e9473	tests: Fix indentation in common script This PR fixes the indentation in common script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-05 15:52:40 +00:00
Niteesh Dubey	1dbf5208ac	versions: Upgrade ovmf This is required to support SEV-SNP confidential container with kernel-hashes. Since this ovmf is the latest stable version, it is good to upgrade for tdx and vanilla builds too. Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-06-05 15:02:02 +00:00
Niteesh Dubey	62d3d7c58f	runtime: enable kernel-hashes for SNP confidential container This is required to provide the hashes of kernel, initrd and cmdline needed during the attestation of the coco. Fixes: #9150 Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>	2024-06-05 15:02:02 +00:00
Steve Horsman	b30d085271	Merge pull request #9702 from ildikov/blog-submission-guide docs: Adding blog submission guidelines	2024-06-05 09:03:19 +01:00
Amulya Meka	b323afeda9	Merge pull request #9214 from Amulyam24/oras kata-deploy: install oras using release artefacts on ppc64le	2024-06-05 11:40:55 +05:30
Fabiano Fidêncio	138ef2c55f	Merge pull request #9678 from AdithyaKrishnan/main TEEs: Skip a few CI tests for SEV/SNP	2024-06-04 23:42:51 +02:00
GabyCT	ba30f0804a	Merge pull request #9770 from GabyCT/topic/fixvad tests: Use variable definition for better uniformity	2024-06-04 15:23:34 -06:00
Wainer dos Santos Moschetta	af4f9afb71	kata-deploy: add PULL_TYPE handler for CRI-O A new PULL_TYPE environment variable is recognized by the kata-deploy's install script to allow it to configure CRI-O for guest-pull image pulling type. The tests/integration/kubernetes/gha-run.sh change allows for testing it: ``` export PULL_TYPE=guest-pull cd tests/integration/kubernetes ./gha-run.sh deploy-k8s ``` Fixes #9474 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-06-04 14:02:01 -03:00
GabyCT	6c2e8bed77	Merge pull request #9725 from 3u13r/feat/genpolicy/filter-by-runtime genpolicy: add ability to filter for runtimeClassName	2024-06-04 10:06:14 -06:00
Hyounggyu Choi	869f89c338	Merge pull request #9773 from BbolroC/use-qemu-coco-dev-s390x GHA: Use qemu-coco-dev for k8s nydus test on s390x	2024-06-04 17:49:38 +02:00
Gabriela Cervantes	cafba23f3e	tests: Use variable definition for better uniformity This PR replaces the name to use a variable that is already defined to have a better uniformity across the general script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-06-04 15:49:27 +00:00
Wainer Moschetta	2b8cdd9ff2	Merge pull request #9765 from wainersm/disable_failing_jobs CI: disable jobs that failed > 50% on nightly CI recently - part 1	2024-06-04 12:05:36 -03:00
Hyounggyu Choi	246ee83768	GHA: Use qemu-coco-dev for k8s nydus test on s390x In line with the changes for x86_64, the k8s nydus test for s390x should also use `qemu-coco-dev` for `KATA_HYPERVISOR`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-04 15:49:23 +02:00
Hyounggyu Choi	3aff6c5bd8	CI: Retry fetching node_start_time when it is empty It was observed that the `node_start_time` value is sometimes empty, leading to a test failure. This commit retries fetching the value up to 3 times. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-06-04 15:41:15 +02:00
Zvonko Kaiser	647560539f	Merge pull request #9769 from zvonkok/initrd-image-no-sudo ci: remove sudo and make sure artifacts is owned by user	2024-06-04 07:16:51 +02:00
Wainer Moschetta	b5561074c3	Merge pull request #9377 from beraldoleal/yqbump deps: bumping yq to v4.40.7	2024-06-03 14:34:58 -03:00
Ildiko Vancsa	5e03bec26b	docs: Adding blog submission guidelines The Kata blog was recently moved to the project's website. The content of the blog is stored together with the rest of the website source on GitHub. This patch adds a short guide that describes how to submit a new blog post as a PR, to appear on the project's website. Signed-off-by: Ildiko Vancsa <ildiko.vancsa@gmail.com>	2024-06-03 08:58:05 -07:00
GabyCT	6c7affbd85	Merge pull request #9741 from GabyCT/topic/staticcheck tests: Fix indentation in static checks script	2024-06-03 09:43:23 -06:00
Zvonko Kaiser	a48c084e13	ci: remove sudo and make sure image is owned by user The image build needs special handling since we're doing a lot of privileged operations. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-06-03 15:29:06 +00:00
Fabiano Fidêncio	34d45f0868	Merge pull request #9749 from mkulke/mkulke/configure-guest-components-spawning CoCo: introduce config for guest-components procs	2024-06-03 15:50:36 +02:00
Ryan Savino	72dc823059	tests: k8s: sev: snp: skip "setting sysctl" test This test fails when using `shared_fs=none` with the nydus snapshotter. Issue tracked here: #9666 Skipping for now. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:17 -05:00
Ryan Savino	3f3be54893	tests: k8s: sev: snp: skip initContainers shared vol test This test is failing due to the initContainers not being properly handled with the guest image pulling. Issue tracked here: #9668 Skipping for now. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:17 -05:00
Ryan Savino	35dfb730ce	tests: k8s: sev: snp: skip "kill all processes in container" test This test fails when using `shared_fs=none` with the nydus snapshotter. Issue tracked here: #9664 Skipping for now. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:16 -05:00
Ryan Savino	62cc1dec4c	tests: replace docker debug alpine image with ghcr docker alpine latest image is rate limited. Need to use ghcr.io image. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:16 -05:00
ChengyuZhu6	1820b02993	tests: replace busybox from docker with quay in guest pull To prevent download failures caused by high traffic to the Docker image, opt for quay.io/prometheus/busybox:latest over docker.io/library/busybox:latest . Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-06-03 01:14:16 -05:00
Ryan Savino	6c646dc96d	tests: k8s: sev: snp: add runtime annotation for sev and snp sev and snp cases added to the KATA_HYPERVISOR switch. Signed-off-by: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:16 -05:00
Ryan Savino	6db08ed620	runtime: sev: snp: Use shared_fs=none Disabling 9p for SEV and SNP TEEs. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:16 -05:00
Ryan Savino	668959408d	tests: ensure kata_deploy cleanup even if namespace deletion fails the test cluster namespace deletion failing causes kata_deploy to not get cleaned up. Signed-Off-By: Ryan Savino <ryan.savino@amd.com>	2024-06-03 01:14:15 -05:00
Wainer dos Santos Moschetta	c9f93fc507	github: add actionlint configuration file Added configuration file with rules to exclude some self-hosted runners from the linter warnings. Related-with: #9646 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-31 19:46:09 -03:00
Wainer dos Santos Moschetta	5f5274e699	CI: disable run-basic-amd64-tests / run-vfio (clh) job The job has failed more than 50% on nightly CI. Remove it from the list of execution until we have a fix. Issue: 9764 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-31 19:34:45 -03:00
Wainer dos Santos Moschetta	9154ce9051	CI: disable run-basic-amd64-tests / run-tracing jobs These jobs have failed more than 50% on nightly CI. Remove them from the list of execution until we have a fix. Issue: 9763 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-31 19:26:58 -03:00
Wainer dos Santos Moschetta	ac4d48ad17	CI: disable run-kata-monitor-tests / run-monitor (qemu, containerd) job This job has failed more than 50% on nightly CI. Remove it from the list of execution until we have a fix. Issue: 9761 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-31 19:21:21 -03:00
Archana Shinde	7a3e13fae8	kata-manager: Add symlinks for runc and slirp4netns For nerdctl install, add symlinks for runc and slirp4netns in the binary install path. runc link comes in handy for running runc containers with nerdctl for quick tests. slirp4netns allows for running containers with user mode networking useful in case of rootless containers. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-05-31 13:53:42 -07:00
Markus Rudy	13310587ed	genpolicy: check requested devices CreateContainerRequest objects can specify devices to be created inside the guest VM. This change ensures that requested devices have a corresponding entry in the PodSpec. Devices that are added to the pod dynamically, for example via the Device Plugin architecture, can be allowlisted globally by adding their definition to the settings file. Fixes: #9651 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-05-31 22:05:49 +02:00
Wainer Moschetta	f093c4c190	Merge pull request #9754 from wainersm/qemu_coco_dev-enable_policy_tests tests/k8s: enable policy tests for qemu-coco-dev	2024-05-31 15:09:25 -03:00
Markus Rudy	ea578f0a80	genpolicy: add support for VolumeDevices This adds structs and fields required to parse PodSpecs with VolumeDevices and PVCs with non-default VolumeModes. Signed-off-by: Markus Rudy <mr@edgeless.systems>	2024-05-31 19:34:14 +02:00
Beraldo Leal	d3a5eb299a	tools: bumping kernel config version Lets make ci happy. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	53b8158a81	tests: adding debug and skip to kata-deploy If a test is failing during setup, it makes little sense to run the suite. Let's skip and add some debug messages. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	9171821d57	tests: add debug message to check return code Lets add this message to make sure sh is starting properly. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	f91fbef184	tests: increase time after sh execution Increased sleep duration to ensure the shell process starts. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	ba5d2e54c2	tests: remove object separation mark from eof End of file should not end with --- mark. This will confuse tools like yq and kubectl that might think this is another object. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	3e8b4806b8	tests: increase debug messages for kata-deploy When the timeout happens we can't tell much information about the nodes. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	c99ba42d62	deps: bumping yq to v4.40.7 Since yq frequently updates, let's upgrade to a version from February to bypass potential issues with versions 4.41-4.43 for now. We can always upgrade to the newest version if necessary. Fixes #9354 Depends-on:github.com/kata-containers/tests#5818 Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Beraldo Leal	4f6732595d	ci: skip go version check golang.mk is not ready to deal with non GOPATH installs. This is breaking test on s390x. Since previous steps here are installing go and yq our way, we could skip this additional check. A full refactor to golang.mk would be needed to work with different paths. Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-05-31 13:28:34 -04:00
Greg Kurz	7886ed6670	Merge pull request #9751 from wainersm/k8s_print_logs_on_fail tests/k8s: print logs on fail only (k8s-confidential-attestation.bats)	2024-05-31 14:47:27 +02:00
Fabiano Fidêncio	44df674232	Merge pull request #9757 from fidencio/topic/ci-tdx-skip-empty-dir-tests ci: k8s: Skip empty dir tests also for TDX	2024-05-31 13:18:35 +02:00
Magnus Kulke	9f04dc4c8b	agent: introduce config for coco attestation procs fixes #9748 A configuration option `guest_component_procs` has been introduced that indicates which guest component processes are supposed to be spawned by the agent. The default behaviour remains that all of those processes are actively spawned by the agent. At the moment this is based on presence of binaries in the rootfs and the guest_component_api_rest option. The new option is incremental: none -> attestation-agent -> confidential-data-hub -> api-server-rest e.g. api-server-rest implies attestation-agent and confidential-data-hub the `none` option has been removed from guest_component_api_rest, since this is addressed by the introduced option. To not change expected behaviour for non-coco guests we will still only attempt to spawn the processes if the requested attestation binaries are present on the rootfs, and issue a warning in those cases. Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>	2024-05-31 12:15:41 +02:00
Amulyam24	eadcb868f4	kata-deploy: install oras using release artefacts on ppc64le We are currently building Oras from source on ppc64le. Now that they officially release the artefacts for power, consume them to install Oras. Fixes: #9213 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2024-05-31 14:16:14 +05:30
Zvonko Kaiser	0321a3adcc	Merge pull request #8944 from zvonkok/update-threat-model threat-model: Add VFIO, ACPI and KVM/VMM threat-model descriptions	2024-05-31 10:38:27 +02:00
Fabiano Fidêncio	03a7cf4b02	ci: k8s: Skip empty dir tests also for TDX Wainer noticed this is failing for the coco-qemu-dev case, and decided to skip it, notifying me that he didn't fully understand why it was not failing on TDX. Turns out, though, this is also failing on TDX, and we need to skip it there as well. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-31 09:59:46 +02:00
Fabiano Fidêncio	72a71ff2bf	Merge pull request #9737 from zvonkok/kata-deploy-no-sudo ci: kata-deploy no sudo	2024-05-31 09:55:24 +02:00
Zvonko Kaiser	dd89d35b75	Merge pull request #9747 from zvonkok/remove-git-config ci: Remove all git config safe.directory	2024-05-31 07:25:28 +02:00
Leonard Cohnen	1d1690e2a4	genpolicy: add ability to filter for runtimeClassName Add the CLI flag --runtime-class-names, which is used during policy generation. For resources that can define a runtimeClassName (e.g., Pods, Deployments, ReplicaSets,...) the value must have any of the --runtime-class-names as prefix, otherwise the resource is ignored. This allows to run genpolicy on larger yaml files defining many different resources and only generating a policy for resources which will be deployed in a confidential context. Signed-off-by: Leonard Cohnen <lc@edgeless.systems>	2024-05-31 03:17:02 +02:00
Wainer dos Santos Moschetta	3333f8ddfd	tests/k8s: enable policy tests for qemu-coco-dev So qemu-coco-dev is on par with the TEE configurations. Fixes: #9753 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-30 21:51:15 -03:00
Wainer Moschetta	83fa813700	Merge pull request #9694 from wainersm/qemu_coco_dev-k8s-guest-pull tests: enable guest-pull on all k8s tests for the qemu-coco-dev configuration	2024-05-30 21:48:11 -03:00
Wainer dos Santos Moschetta	55ae98eb28	tests/k8s: print logs on fail only (k8s-confidential-attestation.bats) Use the variable BATS_TEST_COMPLETED which is defined by the bats framework when the test finishes. `BATS_TEST_COMPLETED=` (empty) means the test failed, so the node syslogs will be printed only at that condition. Fixes: #9750 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-30 17:19:33 -03:00
Wainer Moschetta	66e3b88694	Merge pull request #9746 from wainersm/nydus_snapshotter_pin ci: pin the nydus-snapshotter image version	2024-05-30 16:49:10 -03:00
Wainer dos Santos Moschetta	3e18fe7805	tests/k8s: skip file volume tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Issue: #9667 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-30 14:50:59 -03:00
Zvonko Kaiser	063db516f2	ci: Remove all git config safe.directory Now with the sudo less build we should be good to remove those hacks. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-30 15:12:28 +00:00
Zvonko Kaiser	d8889684f0	ci: kata-deploy no sudo Build/push/manage artifacts without sudo Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-30 15:07:27 +00:00
Wainer dos Santos Moschetta	5faf9ca344	ci: pin the nydus-snapshotter image version It's cloning the nydus-snapshotter repo from the version specified in versions.yaml, however, the deployment files are set to pull in the latest version of the snapshotter image. With this version we are pinning the image version too. This is a temporary fix as it should be better worked out at nydus-snapshotter project side. Fixes: #9742 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-30 11:21:16 -03:00
Greg Kurz	b3cb19b6a7	Merge pull request #9639 from emanuellima1/rng-impl runtime-rs: Add RNG to QEMU cmdline	2024-05-30 12:00:11 +02:00
Zvonko Kaiser	7cc0ebe75e	Merge pull request #9743 from zvonkok/tools-fix ci: Fix tools builder images	2024-05-30 11:53:34 +02:00
Zvonko Kaiser	02a7f8c852	ci: Fix tools builder images We weren't considering changes of the tools script dir, so we are adding a fourth hash to accommodate this Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-30 08:10:42 +00:00
Fabiano Fidêncio	97806dbdaa	Merge pull request #9732 from zvonkok/shim-v2-no-sudo ci: shim-v2 no sudo	2024-05-30 07:01:04 +02:00
Wainer dos Santos Moschetta	37894923c1	tests/k8s: skip empty dir volumes tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	79a8b31ec5	tests/k8s: skip shared volume tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Issue: #9668 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	aa1a37081e	tests/k8s: skip sysctls tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Issue: #9666 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	0e81ced9f1	tests/k8s: skip kill-all-process tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Issue: #9664 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	18896efa3c	tests/k8s: skip seccomp tests for qemu-coco-dev This test fails with qemu-coco-dev configuration and guest-pull image pull. Unlike other tests that I've seen failing on this scenario, k8s-seccomp.bats fails after a couple of consecutive executions, so it's that kind of failure that happens once in a while. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	b62ad71c43	tests/k8s: add runtime handler annotation for qemu-coco-dev This will enable the k8s tests to leverage guest pulling when PULL_TYPE=guest-pull for qemu-coco-dev runtimeclass. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
Wainer dos Santos Moschetta	089c7ad84a	tests/k8s: add runtime handler annotation only for guest-pull The runtime handler annotation is required for Kubernetes <= 1.28 and guest-pull pull type. So leverage $PULL_TYPE (which is exported by CI jobs) to conditionally apply the annotation. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-29 18:37:24 -03:00
GabyCT	0eddfdc74f	Merge pull request #9731 from zvonkok/pause-no-sudo ci: pause-image no sudo	2024-05-29 11:48:41 -06:00
Zvonko Kaiser	7354c427f9	Merge pull request #9734 from zvonkok/virtiofsd-no-sudo ci: virtiofsd no sudo	2024-05-29 19:31:25 +02:00
GabyCT	3c91aa0475	Merge pull request #9739 from zvonkok/initramfs-no-sudo ci: initramfs no sudo	2024-05-29 11:28:59 -06:00
Hyounggyu Choi	40d2306f95	Merge pull request #9729 from zvonkok/agent-no-sudo-build ci: build agent without sudo	2024-05-29 19:27:56 +02:00
GabyCT	03be220482	Merge pull request #9730 from zvonkok/kernel-no-sudo ci: kernel no sudo	2024-05-29 10:23:31 -06:00
GabyCT	a32058913a	Merge pull request #9679 from amshinde/kata-manager-install-cni kata-manager: Copy cni files under /opt/cni	2024-05-29 10:20:34 -06:00
GabyCT	a5808a556d	Merge pull request #9733 from zvonkok/tools-no-sudo ci: tools no sudo	2024-05-29 10:19:17 -06:00
GabyCT	e94b09839d	Merge pull request #9736 from zvonkok/qemu-no-sudo ci: qemu no sudo	2024-05-29 10:18:34 -06:00
GabyCT	6d58fce4a9	Merge pull request #9677 from GabyCT/topic/memoryusags metrics: Improve variable definition in memory usage script	2024-05-29 10:16:56 -06:00
Emanuel Lima	138d985c64	runtime-rs: Add RNG to QEMU cmdline It creates this line, as the Golang runtime does: -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 Signed-off-by: Emanuel Lima <emlima@redhat.com>	2024-05-29 13:11:00 -03:00
Hyounggyu Choi	6ba2461404	Merge pull request #9728 from zvonkok/coco-guest-comp-no-sudo ci: guest-components without sudo	2024-05-29 17:55:43 +02:00
Gabriela Cervantes	09c3e08f6a	tests: Fix indentation in static checks script This PR fixes the indentation in the static checks script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-29 15:43:44 +00:00
Xuewei Niu	c297a7891c	Merge pull request #9723 from zvonkok/hotunplug-fix vfio: Fix hot-unplug	2024-05-29 22:02:05 +08:00
Zvonko Kaiser	25c784c568	ci: shim-v2 no sudo Build shim-v2 without sudo docker this is not needed. This is part 6 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-29 09:24:54 +00:00
Zvonko Kaiser	84a9773cec	ci: initramfs no sudo Build initramfs without sudo docker this is not needed. This is part 10 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-29 09:20:39 +00:00
Zvonko Kaiser	7dc47c8150	ci: qemu no sudo Build qemu without sudo docker this is not needed. This is part 9 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 16:12:06 +00:00
Zvonko Kaiser	4a455bf24a	ci: virtiofsd no sudo build virtiofsd without sudo docker this is not needed. This is part 8 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 14:19:58 +00:00
Wainer Moschetta	9896f69827	Merge pull request #9414 from ldoktor/ci-bisection ci.ocp: Document openshift pipeline and manual bisection	2024-05-28 11:17:09 -03:00
Zvonko Kaiser	dd04d26cb0	ci: tools no sudo Build tools without sudo docker this is not needed. This is part 7 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 13:57:20 +00:00
Zvonko Kaiser	6c9c0306ac	ci: pause-image no sudo Build pause-image without sudo docker this is not needed. This is part 5 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 11:31:59 +00:00
Hyounggyu Choi	e8c06301d7	Merge pull request #9727 from zvonkok/ovmf-no-sudo ci: ovmf without sudo	2024-05-28 13:29:00 +02:00
Zvonko Kaiser	c95ae5a502	ci: kernel no sudo Build kernel without sudo docker this is not needed. This is part 4 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 11:19:08 +00:00
Zvonko Kaiser	8fab5dd584	ci: build agent without sudo Build agent without sudo docker this is not needed. This is part 3 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 09:55:32 +00:00
Zvonko Kaiser	1e4cbc4fcd	ci: guest-components without sudo Build guest-components without sudo docker this is not needed. This is part 2 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 09:03:14 +00:00
Zvonko Kaiser	b76938b922	ci: ovmf without sudo Build ovmf without sudo docker this is not needed. This is part 1 of N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 08:25:27 +00:00
Zvonko Kaiser	c6c20ac253	docs: Format the threat-model to 80 chars Truncate long lines to reasonable 80 characters Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 07:39:26 +00:00
Zvonko Kaiser	d4832b3b74	vfio: Fix hot-unplug We need to remove the device from the tracking map; otherwise a container restart will increment the bus index and we will run out of root-ports and crash the machine. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-28 07:37:30 +00:00
Zvonko Kaiser	a7931115a0	Merge pull request #8861 from zvonkok/config-pcie-root-switch-port gpu: reintroduce pcie_root_port and add pcie_switch_port	2024-05-27 13:17:57 +02:00
Fabiano Fidêncio	3276bb52b6	Merge pull request #9721 from fidencio/topic/ci-kata-deploy-improvements-and-fixes kata-deploy / kata-cleanup / ci: Fixes and improvements to kata-deploy / kata-cleanup and its usage in the CI	2024-05-27 12:29:40 +02:00
Zvonko Kaiser	4c93bb2d61	qemu: Add CDI device handling for any container type We need special handling for pod_sandbox, pod_container and single_container how and when to inject CDI devices Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-27 10:13:01 +00:00
Zvonko Kaiser	c7b41361b2	gpu: reintroduce pcie_root_port and add pcie_switch_port In Kubernetes we still do not have proper VM sizing at sandbox creation level. This KEP tries to mitigate that: kubernetes/enhancements#4113 but this can take some time until Kube and containerd or other runtimes have those changes rolled out. Before we used a static config of VFIO ports, and we introduced CDI support which needs a patched containerd. We want to eliminate the patched containerd in the GPU case as well. Fixes: #8860 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-27 10:13:01 +00:00
Fupan Li	6f6a164451	Merge pull request #9268 from zvonkok/kata-agent-createcontainer kata-agent: CreateContainer Hook	2024-05-27 16:36:22 +08:00
Fabiano Fidêncio	e81e8a4527	tests: kata-deploy: Adjust timeout 10 minutes is waay too long. Let's give it 4 minutes only. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 06:23:00 +02:00
Fabiano Fidêncio	fba5793c0d	tests: kata-deploy: Run the tests from "${repo_root_dir}" Let's see if it helps with issues like: ``` error: must build at directory: not a valid directory: evalsymlink failure on '"/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/../../..//tools/packaging/kata-deploy/kata-cleanup/overlays/k0s"' : lstat /home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/": no such file or directory ``` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 06:23:00 +02:00
Fabiano Fidêncio	8a8a7ea0e5	tests: kata-deploy: Show more logs in the setup() This will also help us to better understand possible failures with the CI. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	47d9589e9b	tests: kata-deploy: Show output of passing tests This will help us to debug failures and compare passing and failures outputs. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	dbd0d4a090	gha: Only do preventive cleanups for baremetal This takes a few minutes that could be saved, so let's avoid doing this on all the platforms, but simply do this when it's needed (the baremetal use case). Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	ee2ef0641c	tests: k8s: Allow passing "all" to run all the tests Currently only "baremetal" runs all the tests, but we could easily run "all" locally or using the github provided runners, even when not using a "baremetal" system. The reason I'd like to have a differentiation between "all" and "baremetal" is because "baremetal" may require some cleanup, which "all" can simply skip if testing against a fresh created VM. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	556227cb51	tests: Add the possibility to deploy k0s / rke2 For now we've only exposed the option to deploy kata-deploy for k3s and vanilla kubernetes when using containerd. However, I do need to also deploy k0s and rke2 for an internal CI, and having those exposed here does not hurt, and allows us to easily expand the CI at any time in the future. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	e3c2f0b0f1	kata-cleanup: Add k0s kustomization k0s was added to kata-deploy, but its kata-cleanup counterpart was never added. Let's fix it. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Fabiano Fidêncio	f15d40f8fb	kata-deploy: Fix k0s deployment k0s deployment has been broken since we moved to using `tomlq` in our scripts. The reason is that before using `tomlq` our script would, involuntarily, end up creating the file. Now, in order to fix the situation, we need to explicitly create the file and let `tomlq` add the needed content. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-27 05:05:06 +02:00
Alex Lyn	713c929a64	Merge pull request #9656 from pmores/document-qemu-rs-conventions runtime-rs: document architecture & implementation conventions in qem…	2024-05-27 10:38:58 +08:00
Xuewei Niu	bb7a1c56e9	Merge pull request #9693 from sidneychang/9690/Adjust-indentation	2024-05-27 00:20:34 +08:00
Alex Lyn	55dbf6121a	Merge pull request #9604 from Apokleos/qmp-cmdline01 runtime-rs: add QMP support for Qemu(part I)	2024-05-26 20:22:59 +08:00
Alex Lyn	028b10ce7a	Merge pull request #9687 from l8huang/vfio-pci-gk agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device	2024-05-26 17:48:25 +08:00
Steve Horsman	b89c3e35dd	Merge pull request #9583 from cncal/update_check_error_message runtime: make kata-runtime check error more understandable when /dev/kvm doesn't exist	2024-05-24 17:49:43 +01:00
Alex Lyn	41fb7aeb89	runtime-rs: add QMP params support in cmdline Fixes: #9603 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-05-24 22:16:24 +08:00
Alex Lyn	7ed6c6896b	runtime-rs: add an option dbg_monitor_socket for HMP support This option allows adding a debug monitor socket when `enable_debug = true` to control QEMU when debugging. Fixes: #9603 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-05-24 22:16:17 +08:00
Lei Huang	3624573b12	agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device The `update_env_pci()` function needs the PCI address mapping to translate the host PCI address to the guest PCI address in the environment variables below: - PCIDEVICE_<prefix>_<resource-name>_INFO - PCIDEVICE_<prefix>_<resource-name> So collect PCI address mapping for both vfio-pci-gk and vfio-pci devices. Fixes #9614 Signed-off-by: Lei Huang <leih@nvidia.com>	2024-05-23 21:20:01 -07:00
Fupan Li	d73876252e	Merge pull request #9690 from justxuewei/agent-timeout runtime-rs: Remove obsoleted dial_timeout config	2024-05-24 10:31:12 +08:00
Zvonko Kaiser	3affd83e14	Merge pull request #9605 from l8huang/skip-env kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO	2024-05-23 18:45:00 +02:00
Fabiano Fidêncio	44d6cb7791	Merge pull request #9698 from wainersm/k8s_tests_disable_fail_fast tests/k8s: disable "fail-fast" behavior by default	2024-05-23 18:28:00 +02:00
Fabiano Fidêncio	d83cf39ba1	Merge pull request #9680 from kata-containers/dependabot/go_modules/src/runtime/go_modules-5e29427af7 build(deps): bump golang.org/x/net from 0.24.0 to 0.25.0 in /src/runtime in the go_modules group across 1 directory	2024-05-23 12:55:29 +02:00
Fabiano Fidêncio	d9ee950d8f	Merge pull request #9696 from wainersm/skip_custom_dns_test tests/k8s: skip custom DNS tests on confidential jobs	2024-05-22 23:57:21 +02:00
GabyCT	e08ad8d1b7	Merge pull request #9686 from GabyCT/topic/fixbootclh metrics: Fix minvalue for boot time	2024-05-22 15:46:50 -06:00
Wainer dos Santos Moschetta	76735df427	tests/k8s: disable "fail-fast" behavior by default The k8s test suite halts on the first failure, i.e., failing-fast. This isn't the behavior that we used to see when running tests on Jenkins and it seems that running the entire test suite is still the most productive way. So this disables fail-fast by default. However, if you still wish to run in fail-fast mode then just export K8S_TEST_FAIL_FAST=yes in your environment. Fixes: #9697 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-22 18:27:44 -03:00
Fabiano Fidêncio	8eb061cd5b	Merge pull request #9681 from GabyCT/topic/etdx gha: Enable install kbs and coco components for TDX, but still skip the CDH test	2024-05-22 23:18:42 +02:00
Wainer dos Santos Moschetta	43766cdb96	tests/k8s: skip custom DNS tests on confidential jobs This test has failed in confidential runtime jobs. Skip it until we have a fix. Fixes: #9663 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-05-22 17:08:22 -03:00
Fabiano Fidêncio	904370ecd6	tests: attestation: tdx: Skip test for now Skipping the test will allow us to have the TDX CI running while we debug the test. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:04:13 +02:00
Fabiano Fidêncio	414d716eef	tests: kbs: Enable cli installation also on CentOS One of our machines is running CentOS 9 Stream, and we could easily verify that we can build and install the kbs client there, thus we're expanding the installation script to also support CentOS 9 Stream. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:01:57 +02:00
Fabiano Fidêncio	27d7f4c5b8	tests: kbs: Fix rust installation `externals.coco-kbs.toolchain` is not defined, get the rust_version from `externals.coco-trustee.toolchain` instead. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:01:57 +02:00
Fabiano Fidêncio	fa8b5c76b8	tests: kbs: Add more info for the TDX deployment Ditto in the commit shortlog. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:01:57 +02:00
Fabiano Fidêncio	6ffd7b8425	versions: trustee: Bump version to 6adb8383309cbb7 We're bumping the version in order to bring in the customisation needed for setting up a custom pccs, which is needed for the KBS integration tests with Kata Containers + TDX. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:01:57 +02:00
Fabiano Fidêncio	dbd1fa51cd	tests: kbs: Don't assume /tmp/trustee exists in the machine Instead, check if the directory exists before pushd'ing into it. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-22 20:01:57 +02:00
Gabriela Cervantes	f698caccc0	gha: Enable install kbs and coco components for TDX This PR enables the installation and uninstallation of the kbs client as well as general coco components needed for the TDX GHA CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-22 20:01:57 +02:00
GabyCT	eaaab19763	Merge pull request #9685 from GabyCT/topic/fixic tests: Fix indentation in confidential common script	2024-05-22 11:53:33 -06:00
Gabriela Cervantes	29a10f1373	metrics: Fix minvalue for boot time This PR fixes the minvalue for boot time to avoid the random failures of the GHA CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-22 17:52:51 +00:00
GabyCT	0b32360ab4	Merge pull request #9684 from stevenhorsman/add-arch-to-component-cache-tags ci: cache: Add arch suffix to all cache tags	2024-05-22 09:24:28 -06:00
Fabiano Fidêncio	0e33ecf7fc	Merge pull request #9653 from JakubLedworowski/fixes-9497-ensure-quote-generation-service-is-added-to-qemu-cmd-2 runtime: Enable connection to Quote Generation Service (QGS)	2024-05-22 15:49:23 +02:00
sidneychang	8938f35627	runtime-rs: Adjust indentation in ifneq statements within Makefile. Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code. Fixes: #9692 Signed-off-by: sidneychang <2190206983@qq.com>	2024-05-22 20:24:35 +08:00
Fabiano Fidêncio	94f7bbf253	Merge pull request #9682 from fidencio/topic/allow-increasing-cpus-and-memory-via-annotation-for-tdx runtime: tdx: Allow default_{cpu,memory} annotations	2024-05-22 12:07:28 +02:00
Xuewei Niu	d31616cec3	runtime-rs: Remove obsoleted dial_timeout config The `dial_timeout` works fine for Runtime-go, but is obsoleted in Runtime-rs. When the pod cannot connect to the Agent upon starting, we need to adjust the `reconnect_timeout_ms` to increase the number of connection attempts to the Agent. Fixes: #9688 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-05-22 17:57:05 +08:00
Jakub Ledworowski	fc680139e5	runtime: Enable connection to Quote Generation Service (QGS) For the TD attestation to work the connection to QGS on the host is needed. By default QGS runs on vsock port 4050, but can be modified by the host owner. Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below: -object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}' Fixes: #9497 Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>	2024-05-22 11:16:24 +02:00
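The JSON form of the `tdx-guest` object quoted in commit fc680139e5 can be reproduced with a short Python sketch. This is only an illustration of the object layout given in the commit message, not the runtime's Go code; the helper name `tdx_guest_object` and its parameters are hypothetical.

```python
import json


def tdx_guest_object(cid: str = "2", port: str = "4050") -> str:
    """Build the value passed to QEMU's -object for a TDX guest.

    The keys mirror the example in the commit message; `cid` and `port`
    default to the values shown there (QGS on vsock port 4050).
    """
    obj = {
        "qom-type": "tdx-guest",
        "id": "tdx",
        # Follows the SocketAddress layout: a type plus its fields.
        "quote-generation-socket": {"type": "vsock", "cid": cid, "port": port},
    }
    # Compact separators keep the string free of spaces.
    return json.dumps(obj, separators=(",", ":"))
```

A host owner running QGS on another port would pass it explicitly, for example `tdx_guest_object(port="4051")`.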
Alex Lyn	0331859740	Merge pull request #9642 from gkurz/drop-unused-knobs-qemu-rs runtime-rs: Drop some useless QEMU arguments	2024-05-22 16:13:14 +08:00
Alex Lyn	ce030d1804	Merge pull request #9641 from cmaf/runtime-resize-mem-1 runtime: Add missing check in ResizeMemory for CH	2024-05-22 14:05:30 +08:00
Alex Lyn	b7af00be2a	Merge pull request #9624 from cncal/bugfix_duplicated_devices runtime: fix duplicated devices requested to the agent	2024-05-22 12:45:46 +08:00
Steve Horsman	f41f642b90	Merge pull request #9635 from kata-containers/dependabot/go_modules/src/runtime/go_modules-f0df977846 build(deps): bump github.com/containerd/containerd from 1.7.11 to 1.7.16 in /src/runtime in the go_modules group across 1 directory	2024-05-21 21:19:32 +01:00
Steve Horsman	9b0ed3dfa7	Merge pull request #9657 from ajaypvictor/remote-hyp-annotations runtime: Disable number of cpu comparison on remote hypervisor scenario	2024-05-21 21:19:12 +01:00
Hyounggyu Choi	92101fc61f	Merge pull request #9658 from BbolroC/migrate-vfio-ap-test CI: Migrate vfio-ap test files from tests repo	2024-05-21 20:21:09 +02:00
Lei Huang	b0a91b0d13	kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO The new version of sriov-network-device-plugin adds an env `PCIDEVICE_<prefix>_<resource-name>_INFO`, which has a json value; kata-agent can't parse it as env `PCIDEVICE_<prefix>_<resource-name>` which has value in format "DDDD:BB:SS.F". This change updates env `PCIDEVICE_<prefix>_<resource-name>_INFO`. Signed-off-by: Lei Huang <leih@nvidia.com>	2024-05-21 10:46:41 -07:00
stevenhorsman	db4818fe1d	ci: cache: Enforce tag length limit Container tags can be a maximum of 128 characters long so calculate the length of the arch suffix and then restrict the tag to this length subtracted from 128 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 18:03:45 +01:00
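The length rule from commit db4818fe1d can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CI shell script: the function name `limit_tag` is hypothetical, and the suffix is assumed to be appended after truncation.

```python
MAX_TAG_LENGTH = 128  # maximum length of a container image tag


def limit_tag(tag: str, arch_suffix: str) -> str:
    """Truncate `tag` so that `tag + arch_suffix` fits in MAX_TAG_LENGTH."""
    budget = MAX_TAG_LENGTH - len(arch_suffix)
    if budget < 0:
        raise ValueError("arch suffix alone exceeds the tag length limit")
    # Only the leading part is cut; the suffix always survives intact.
    return tag[:budget] + arch_suffix
```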
Gabriela Cervantes	c9e91db16f	tests: Fix indentation in confidential common script This PR fixes the indentation in the confidential common script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-21 16:33:46 +00:00
stevenhorsman	d6afd77eae	ci: cache: Update agent cache to use the full commit hash - Previously I copied the logic that abbreviated the commit hash from the versioning, but looking at our versions.yaml the clear pattern is that when pointing at commits of dependencies we use the full commit hash, not the abbreviated one, so for consistency I think we should do the same with the components that we make available Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 16:51:16 +01:00
stevenhorsman	d46b6a3879	ci: cache: Add arch suffix to all cache tags As we have multi-arch builds for nearly all components, we want to ensure that all the cache tags we set have the architecture suffix, not just the `TARGET_BRANCH` one. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 11:25:07 +01:00
stevenhorsman	865fa9da15	runtime: Resolve go static-checks failure Remove `rand.Seed` call to resolve the following failure: ``` rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value. ``` The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed back this up and states: > If Seed is not called, the generator is seeded randomly at program startup. so I believe we can just delete the call. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 11:08:59 +01:00
Fabiano Fidêncio	abf52420a4	runtime: tdx: Allow default_{cpu,memory} annotations For now, let's allow the users to set the default_cpu and default_memory when using TDX, as they may hit issues related to the size of the container image that must be pulled and unpacked inside the guest. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-21 10:26:39 +02:00
stevenhorsman	75a201389d	runtime: update go version in go.mod - We bumped the golang version used in our CI, but `make vendor` fails unless the go version in the runtime go.mod is increased as well, so update it and run go mod tidy Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-21 09:11:46 +01:00
dependabot[bot]	735185b15c	build(deps): bump github.com/containerd/containerd Bumps the go_modules group with 1 update in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd). Updates `github.com/containerd/containerd` from 1.7.11 to 1.7.16 - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.11...v1.7.16) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: direct:production dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-21 09:11:46 +01:00
Ajay Victor	abe607b0c7	runtime: Disable number of cpu comparison on remote hypervisor scenario Fixes https://github.com/kata-containers/kata-containers/issues/9238 Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>	2024-05-21 13:34:21 +05:30
dependabot[bot]	01868b2849	--- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-05-20 22:06:41 +00:00
Fabiano Fidêncio	8879e3bc45	Merge pull request #9452 from GabyCT/topic/tdxcoco gha: Add support to install KBS to k8s TDX GHA workflow	2024-05-20 23:28:52 +02:00
Fabiano Fidêncio	072b929b6f	Merge pull request #9660 from malt3/fix/genpolicy/namespace_empty_string genpolicy: detect empty string in ns as default	2024-05-20 21:34:13 +02:00
Gabriela Cervantes	cfdef7ed5f	tests/k8s: Use custom intel DCAP configuration This PR adds the use of custom Intel DCAP configuration when deploying the KBS. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-20 18:44:57 +00:00
Gabriela Cervantes	cace2fd340	metrics: Improve variable definition in memory usage script This PR improves general format like variable definition to have uniformity across the memory usage script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-20 16:14:59 +00:00
Fabiano Fidêncio	97056b017d	Merge pull request #9675 from stevenhorsman/release-build-tarballs-inherit-secrets gha: release: Set inherit secrets on tarball builds	2024-05-20 18:06:38 +02:00
Fabiano Fidêncio	b8b3bcc492	Merge pull request #9671 from bikesheddev/fix/kata-deploy-unbound-variable fix: kata-deploy.sh VERSION_ID unbound-variable	2024-05-20 17:22:55 +02:00
Fabiano Fidêncio	94cff3f74e	Merge pull request #9315 from fidencio/topic/adapt-TEEs-for-shared_fs-none TEEs: Use `shared_fs=none` for TDX	2024-05-20 17:17:36 +02:00
Fabiano Fidêncio	cffeb0ffb8	Merge pull request #9673 from fidencio/topic/revert-aks-workaround Revert "ci: azure: Workaround azure cli installation script"	2024-05-20 16:16:55 +02:00
stevenhorsman	f271983aeb	gha: release: Set inherit secrets on tarball builds Now we have updated the release builds to push artefacts to our registry for the release, so we can cache the images, we need to set `secrets: inherit` for all architecture's tarball builds so that we can log into quay.io and ghcr in those steps Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-20 14:19:17 +01:00
Fabiano Fidêncio	25c9cf32ff	Revert "ci: azure: Workaround azure cli installation script" This reverts commit `5ff53e4d1c`, as the script was fixed by MSFT, at least according to: https://github.com/Azure/azure-cli/issues/28984 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-20 14:38:46 +02:00
vac (Brendan)	d812007b99	kata-deploy: Fix unbound VERSION_ID VERSION_ID is not guaranteed to be specified in os-release, which makes kata-deploy break on rolling distros like Arch Linux and Void Linux. Note that operating system vendors may choose not to provide version information, for example to accommodate for rolling releases. In this case, VERSION and VERSION_ID may be unset. Applications should not rely on these fields to be set. Signed-off-by: vac <dot.fun@protonmail.com>	2024-05-20 19:48:31 +08:00
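The defensive handling that commit d812007b99 calls for can be illustrated with a small Python sketch. It is not the kata-deploy shell fix itself; `parse_os_release` and `version_id` are hypothetical helpers, and the parser handles only simple `KEY=value` lines.

```python
def parse_os_release(text: str) -> dict:
    """Parse simple KEY=value lines from os-release content."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


def version_id(text: str, default: str = "") -> str:
    """Return VERSION_ID, or `default` when the field is absent."""
    return parse_os_release(text).get("VERSION_ID", default)
```

On a rolling distribution whose os-release omits the field, `version_id` returns the default instead of failing.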
Tim Zhang	857d2bbc8e	agent: Fix ctr exec stuck problem Fixes: #9532 Close stdin when write_stdin receives data of length 0. Stop call notify_term_close() in close_stdin, because it could discard stdout unexpectedly. Signed-off-by: Tim Zhang <tim@hyper.sh>	2024-05-20 14:52:14 +08:00
Fabiano Fidêncio	e8ebe18868	tests: k8s: tdx: Skip liveness probe test This test doesn't fail with the guest image pulling, but it for sure should. :-) We can see in the bats logs, something like: ``` Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 31s default-scheduler Successfully assigned kata-containers-k8s-tests/liveness-exec to 984fee00bd70.jf.intel.com Normal Pulled 23s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 345ms (345ms including waiting) Normal Started 21s kubelet Started container liveness Warning Unhealthy 7s (x3 over 13s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory Normal Killing 7s kubelet Container liveness failed liveness probe, will be restarted Normal Pulled 7s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 389ms (389ms including waiting) Warning Failed 5s kubelet Error: failed to create containerd task: failed to create shim task: the file /bin/sh was not found: unknown Normal Pulling 5s (x3 over 23s) kubelet Pulling image "quay.io/prometheus/busybox:latest" Normal Pulled 4s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 342ms (342ms including waiting) Normal Created 4s (x3 over 23s) kubelet Created container liveness Warning Failed 3s kubelet Error: failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/f0ec86fb156a578964007f7773a3ccbdaf60023106634fe030f039e2e154cd11/rootfs to /run/kata-containers/liveness/rootfs, with error: ENOENT: No such file or directory: unknown Warning BackOff 1s (x3 over 3s) kubelet Back-off restarting failed container liveness in pod liveness-exec_kata-containers-k8s-tests(b1a980bf-a5b3-479d-97c2-ebdb45773eff) ``` Let's skip it for now as we have an issue opened to track it down: https://github.com/kata-containers/kata-containers/issues/9665 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 21:59:29 +02:00
Fabiano Fidêncio	a2c70222a8	tests: k8s: tdx: Skip initContainerd shared vol test This is another one that is related to initContainers not being properly handled with the guest image pulling. Let's skip it for now as we have https://github.com/kata-containers/kata-containers/issues/9668 to track it down. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 20:58:45 +02:00
Fabiano Fidêncio	9d56145499	tests: k8s: tdx: Skip volume related tests Similarly to firecracker, which doesn't have support for virtio-fs / virtio-9p, TDX used with `shared_fs=none` will face the very same limitations. The tests affected are: * k8s-credentials-secrets.bats * k8s-file-volume.bats * k8s-inotify.bats * k8s-nested-configmap-secret.bats * k8s-projected-volume.bats * k8s-volume.bats Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 19:38:49 +02:00
Fabiano Fidêncio	606a62a0a7	tests: k8s: tdx: Skip "Setting sysctl" test This test fails when using `shared_fs=none` with the nydus-snapshotter, and we're tracking the issue here: https://github.com/kata-containers/kata-containers/issues/9666 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 19:38:38 +02:00
Fabiano Fidêncio	937b2d5806	tests: k8s: tdx: Skip "Kill all processes in container" test This test fails when using `shared_fs=none` with the nydus snapshotter, and we're tracking the issue here: https://github.com/kata-containers/kata-containers/issues/9664 For now, let's have it skipped. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:51:14 +02:00
Fabiano Fidêncio	03ce41b743	tests: k8s: tdx: Skip "Check custom dns" test The test has been failing on TDX for a while, and an issue has been created to track it down, see: https://github.com/kata-containers/kata-containers/issues/9663 For now, let's have it skipped. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:51:14 +02:00
Fabiano Fidêncio	1a8a4d046d	tests: k8s: setup: Improve / Fix logs Let's make sure the logs will print the correct annotation and its value, instead of always mentioning "kernel" and "initrd". Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:51:14 +02:00
Fabiano Fidêncio	3f38309c39	tests: k8s: tdx: Stop running `k8s-guest-pull-image.bats` We're doing that as all tests are going to be running with `shared_fs=none`, meaning that we don't need any specific test for this case anymore. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:51:00 +02:00
Fabiano Fidêncio	e84619d54b	tests: k8s: tdx: Add `add_runtime_handler_annotations` function This function will set the needed annotation for enforcing that the image pull will be handled by the snapshotter set for the runtime handler, instead of using the default one. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:49:07 +02:00
Fabiano Fidêncio	f2de259387	runtime: tdx: Use shared_fs=none We shouldn't be using 9p, at all, with TEEs, as of right now we have no way to ensure the channels are encrypted. The way to work around this for now is using guest pull, either with containerd + nydus snapshotter or with CRI-O; or even tardev snapshotter for pulling on the host (which is the approach used by MSFT). This is only done for TDX for now, leaving the generic, AMD, and IBM related stuff for the folks working on those to switch and debug possible issues on their environment. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-19 18:47:09 +02:00
Fabiano Fidêncio	5b257685d9	Merge pull request #9662 from dborquez/fix_launchtimes_timestamp_generation Fix launch times timestamp generation.	2024-05-18 21:11:09 +02:00
Fabiano Fidêncio	94786dc939	Merge pull request #9659 from stevenhorsman/remove-non-printable-tag-characters ci: cache: Filter out non-printable characters from tag	2024-05-18 14:47:07 +02:00
Fabiano Fidêncio	874cda0e51	Merge pull request #9655 from BbolroC/add-arch-to-initramfs CI: Append arch type to initramfs-cryptsetup image	2024-05-18 14:31:57 +02:00
Malte Poll	babdab9078	genpolicy: detect empty string in ns as default In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace: - ` ` (namespace field missing) - `namespace: ""` (namespace field is the empty string) - `namespace: "default"`(namespace field has the explicit value `default`) Genpolicy currently does not handle the empty string case correctly. Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>	2024-05-18 12:44:59 +02:00
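The equivalence described in commit babdab9078 amounts to a small normalization step. The Python sketch below only illustrates the rule; genpolicy itself is not written in Python, and `effective_namespace` is a hypothetical name.

```python
from typing import Optional

DEFAULT_NAMESPACE = "default"


def effective_namespace(namespace: Optional[str]) -> str:
    """Map a missing or empty namespace to the default namespace."""
    # Both None (field missing) and "" (empty string) are falsy.
    if not namespace:
        return DEFAULT_NAMESPACE
    return namespace
```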
Fabiano Fidêncio	cbfdc70a55	Merge pull request #9613 from fidencio/topic/skip-pull-image-tests-on-tees-part-II tests: pull-image: Only skip tests for TEEs	2024-05-18 03:31:38 +02:00
Archana Shinde	0e28e904e0	kata-manager: Install cni for containerd When just containerd is installed without installing nerdctl, cni plugins are missing from the installation. containerd tarball does not include cni plugin files. Hence install cni plugins separately for containerd. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-05-18 00:19:57 +00:00
Archana Shinde	d23d58a484	kata-manager: Copy cni files under /opt/cni nerdctl requires cni plugins to be installed in /opt/cni/bin Without bridge plugin installed, it is not possible to run a container with nerdctl. The downloaded nerdctl tarball contains cni plugin files, but are extracted under /usr/local/libexec. Copy extracted tarball cni files under /usr/local/libexec to /opt/cni/bin Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-05-18 00:16:48 +00:00
David Esparza	938d3dc430	metrics: fix timestamps generation from launch times test. Use `eval` to process the `date` command along with its parameters, thus avoiding misinterpreting the parameters as commands. Fixes: #9661 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-05-17 14:44:41 -06:00
David Esparza	bae377b42a	metrics: determine the realpath of kata-shim component. Determine the realpath of kata-shim so that the check does not fail when kata-shim is not a symlink, as was happening prior to this commit. Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-05-17 14:40:02 -06:00
Fabiano Fidêncio	5ff53e4d1c	ci: azure: Workaround azure cli installation script This is done in order to work around https://github.com/Azure/azure-cli/issues/28984, following a suggestion on the very same issue. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 20:28:24 +02:00
stevenhorsman	42fddb5530	ci: cache: Filter out non-printable characters from tag - The tags have a trailing non-printable character, which results in our cache tags having a trailing underscore e.g. `ghcr.io/kata-containers/cached-artefacts/agent:ce24e9835_` For ease of use of these cached components, we should strip off the trailing underscore. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-17 14:16:40 +01:00
Hyounggyu Choi	961735a181	CI: Migrate vfio-ap test files from tests repo An e2e test for `vfio-ap` has been conducted internally in IBM due to the lack of publicly available test machines equipped with a required crypto device. The test is performed by the `tests` repository: (i.e. `772105b560/Makefile (L144)`) The community is working to integrate all tests into the `kata-containers` repository, so the `vfio-ap` test should be part of that effort. This commit moves a test script and Dockerfile for a test image from the `tests` repository. We do not rename the script to `gha-run.sh` because it is not executed by Github Actions' workflow. You can check the test results from the s390x nightly test with the migrated files here: https://github.com/kata-containers/kata-containers/actions/runs/9123170010/job/25100026025 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-05-17 14:59:16 +02:00
stevenhorsman	a92defdffe	tests: pull-image: Remove skips Given that we think the containerd -> snapshotter image cache problems have been resolved by bumping to nydus-snapshotter v0.13.13 we can try removing the skips to test this out Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-17 12:39:57 +02:00
stevenhorsman	7ac302e2d8	tests: Slacken guest pull rootfs count assert - We previously had an expectation for the pause rootfs to be pulled on the host when we did a guest pull. We weren't really clear why, but it is plausibly related to the issues we had with containerd and nydus caching. Now that is fixed we can begin to address this with setting shared_fs=none, but let's start with updating the rootfs host check to be not higher than expected Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-17 12:39:56 +02:00
Fabiano Fidêncio	67ff58251d	tests: confidential_common: Remove unneeded `ensure_yq` call This test is called from `tests/integration/run_kuberentes_tests.sh`, which already ensures that yq is installed. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 12:39:56 +02:00
Fabiano Fidêncio	cc874ad5e1	tests: confidential: Ensure those only run on TEEs Running those with the non-TEE runtime classes will simply fail. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 12:39:56 +02:00
Fabiano Fidêncio	2bc5b1bba2	tests: pull-image: Only skip tests for TEEs On `1423420`, I've mistakenly disabled the tests entirely, for both non-TEEs and TEEs. This happened as I didn't realise that `confidential_setup` would take non-TEEs into consideration. :-/ Now, let me follow-up on that and make sure that the tests will be running on non-TEEs. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 12:39:56 +02:00
Fabiano Fidêncio	d875f89fa2	tests: Add is_confidential_hardware() This function is a helper to check whether the KATA_HYPERVISOR being used is a confidential hardware (TEE) or not, and we can use it to skip or only run tests on those platforms when needed. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 12:39:56 +02:00
Fabiano Fidêncio	4a04a1f2ae	tests: Re-work confidential_setup() Let's rename it to `is_confidential_runtime_class`, and adapt all the places where it's called. The new name provides a better description, leading to a better understanding of what the function really does. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-17 12:39:56 +02:00
Pavel Mores	b9febc4458	runtime-rs: document architecture & implementation conventions in qemu-rs Implementation of QemuCmdLine has a fairly uniform and repetitive structure that's guided by a set of conventions. These conventions have however been mostly implicit so far, leading to a superfluous and annoying request/force-push churn during qemu-rs PR reviews. This commit aims to make things explicit so that contributors can take them into account before an initial PR submission. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-05-17 12:21:44 +02:00
Hyounggyu Choi	3917930a76	CI: Append arch type to initramfs-cryptsetup image This commit is to append an arch type to the initramfs-cryptsetup image to prevent a wrong arch image from being pulled on a different arch host. Fixes: #9654 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-05-17 11:42:49 +02:00
Steve Horsman	9a6d8d8330	Merge pull request #9650 from stevenhorsman/caching-tagging-update-partIII Caching tagging update part iii	2024-05-17 09:09:15 +01:00
stevenhorsman	ce24e98358	ci: cache: Add tag character filtering - Container image tags can only contain alphanumeric, period, hyphen and underscore characters, so convert characters outside of these to be underscores, to avoid having invalid tag failures Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-16 21:38:07 +01:00
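The character rule from commit ce24e98358 can be expressed as a single regular-expression substitution. This Python sketch is an assumption-labelled illustration, not the CI script: `sanitize_tag` is a hypothetical name, and only the replacement rule stated in the commit message is implemented.

```python
import re

# Anything other than alphanumerics, period, hyphen and underscore.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_tag(raw: str) -> str:
    """Replace every character that is not allowed in a tag with '_'."""
    return _INVALID_TAG_CHARS.sub("_", raw)
```

Under this rule a trailing newline also becomes an underscore, which is consistent with the trailing `_` symptom described in commit 42fddb5530.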
stevenhorsman	a98b1e3afb	ci: cache: Integrate tagging updates with recent changes Recently the extra gpu caching was added, unfortunately when I rebased I ended up with both the new tagging logic and old logic. Let's try and integrate them properly to avoid doing the push twice. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-16 21:38:07 +01:00
Lukáš Doktor	f994f79078	ci.ocp: Add steps to reproduce/bisect CI runs in case the upstream CI fails it's useful to pin-point the PR that caused the regression. Currently openshift-ci does not allow doing that from their setup but we can mimic the setup on our infrastructure and use the available kata-deploy-ci images to find the first failing one. To help with that add a few helper scripts and a howto. Fixes: #9228 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-05-16 20:20:05 +02:00
Lukáš Doktor	a556ad7e01	ci.ocp: Document how to run openshift-tests with kata document the ocp pipeline. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-05-16 20:15:32 +02:00
Lukáš Doktor	ea081bd882	ci.ocp: Add webhook cleanup cleanup the webhook resources as well. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-05-16 20:15:31 +02:00
David Esparza	029a6de52b	Merge pull request #9615 from GabyCT/topic/fixlaunchtime metrics: Update launch times script	2024-05-16 11:28:44 -06:00
Steve Horsman	33e6b241ba	Merge pull request #9647 from stevenhorsman/fix-artefact-tags-unbound-variable ci: cache: Fix unbound variable	2024-05-16 16:22:47 +01:00
stevenhorsman	9d9487b17f	ci: cache: Fix unbound variable Now we have the workflow updated and can test the changes in caching we've hit an error: ``` line 1180: artefact_tag: unbound variable ``` so we need to fix that up. Sorry for missing this before. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-16 14:30:32 +01:00
Steve Horsman	03c08583c3	Merge pull request #9644 from stevenhorsman/fix-broken-workflow workflow: Remove if from env conditional	2024-05-16 14:13:25 +01:00
stevenhorsman	f7fd2f9a5d	workflow: Fix problems with build-asset workflows - It appears like the `if` isn't required when setting env as a conditional - `inputs.stage` over input.stage - Swap matrix.component to matrix.asset Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-16 11:51:46 +01:00
Steve Horsman	d8468cb178	Merge pull request #9550 from stevenhorsman/tag-component-caches Tag component caches	2024-05-16 11:05:18 +01:00
Steve Horsman	b31ff09b8d	Merge pull request #9617 from zvonkok/artefact-repository deploy: Add artefact repository	2024-05-16 10:41:23 +01:00
Fabiano Fidêncio	4d073c837d	Merge pull request #9636 from ChengyuZhu6/snapshotter version: Bump nydus snapshotter to v0.13.13	2024-05-16 02:54:53 +02:00
GabyCT	05cc8fae5e	Merge pull request #9610 from GabyCT/topic/fixrwfio metrics: Fix random write value for FIO	2024-05-15 17:44:41 -06:00
Gabriela Cervantes	793a02600a	metrics: Fix random write value for clh for FIO This PR decreases the random write value for clh for FIO. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-15 22:13:10 +00:00
Chelsea Mafrica	5d2af555da	runtime: Add missing check in ResizeMemory for CH ResizeMemory for Cloud Hypervisor is missing a check for the new requested memory being greater than the max hotplug size after alignment. Add the check, and since an earlier check for this sets requested memory to the max hotplug size, do the same in the post-alignment check. Fixes #9640 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-05-15 11:29:18 -07:00
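The two checks described in commit 5d2af555da can be sketched as follows. This is a Python illustration, not the runtime's Go code: the function and parameter names are hypothetical, and rounding up to the block size is an assumption about how the alignment works.

```python
def clamp_hotplug_request(requested_mib: int, block_size_mib: int,
                          max_hotplug_mib: int) -> int:
    """Align a memory request and keep it within the max hotplug size."""
    if block_size_mib <= 0:
        raise ValueError("block size must be positive")
    # Earlier check: cap the raw request at the max hotplug size.
    if requested_mib > max_hotplug_mib:
        requested_mib = max_hotplug_mib
    # Assumed alignment: round up to a multiple of the block size.
    aligned = -(-requested_mib // block_size_mib) * block_size_mib
    # Post-alignment check: rounding up may exceed the cap again.
    if aligned > max_hotplug_mib:
        aligned = max_hotplug_mib
    return aligned
```

Without the second check, a request of 1000 MiB with a 128 MiB block size and a 1010 MiB cap would be aligned to 1024 MiB and exceed the cap.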
GabyCT	d752f0aa4f	Merge pull request #9627 from GabyCT/topic/ghacomk8s gha: Fix indentation in gha run k8s common	2024-05-15 11:55:14 -06:00
Greg Kurz	bd6420e0cc	runtime-rs: Drop some useless QEMU arguments All these settings are hardcoded as `false` and result in no extra options on the QEMU command line, like the go runtime does. They're actually not needed: - we're never going to ask QEMU to survive a guest shutdown - we're never going to run QEMU daemonized since it prevents log collection - we're never going to ask QEMU to start with the guest stopped No need to keep this code around then. Signed-off-by: Greg Kurz <groug@kaod.org>	2024-05-15 18:33:43 +02:00
stevenhorsman	7f41329010	ci: cache: Optional tag components with tags - CoCo wants to use the agent and coco-guest-components cached artifacts so tag them with a helpful version, so make these easier to get Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-15 16:56:40 +01:00
stevenhorsman	9999971656	release: Move component's don't ship logic - We don't want to ship certain components (agent, coco-guest-components) as part of the release, but for other consumers it's useful to be able to pull in the components from oras, so rather than not building them, just don't upload it as part of the release. - Also make the archs all consistent on not shipping the agent Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-15 16:55:55 +01:00
stevenhorsman	040e6cdf12	gha: release: Set RELEASE env - Set RELEASE env to 'yes', or 'no', based on if the stage passed in was 'release', so we can use it in the build scripts Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-15 16:55:55 +01:00
stevenhorsman	d93156d84d	gha: release: Push artifacts to registry on release For other projects (e.g. CoCo projects) being able to access the released versions of components is helpful, so push these during the release process Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-05-15 16:55:55 +01:00
Steve Horsman	19ca1a6656	Merge pull request #9638 from BbolroC/use-fixed-len-git-hash-explicitly CI: Use `--abbrev=9` explicitly for abbreviated commit hash	2024-05-15 16:55:07 +01:00
GabyCT	64b915b86e	Merge pull request #9438 from GabyCT/topic/addnegativetest tests: Add k8s negative policy test	2024-05-15 08:52:57 -06:00
Hyounggyu Choi	e075150fbe	CI: Use `--abbrev=9` explicitly for abbreviated commit hash A length of the result of `git log -1 --pretty=format:%h` could vary over different CI systems, highly likely messing up their caching mechanisms. This commit is to use an option `--abbrev=9` to standardize the length to 9 characters for CI. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-05-15 14:22:07 +02:00
Zvonko Kaiser	117e2f2ecc	Merge pull request #9618 from zvonkok/nvidia-rootfs-#1 gpu: Add build targets for GPU rootfs initrd/image	2024-05-15 13:30:42 +02:00
Hyounggyu Choi	6a4ff08156	Merge pull request #9632 from BbolroC/do-not-build-agent-policy-for-s390x local-build: Ensure the default rootfs is built with AGENT_POLICY=yes	2024-05-15 06:56:22 +02:00
ChengyuZhu6	d48c7ec979	version: Bump nydus snapshotter to v0.13.13 Bump nydus snapshotter to v0.13.13 to fix the gap when switching different snapshotters in guest pull. Fixes: #8407 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-05-15 12:21:01 +08:00
Fabiano Fidêncio	92bb235723	osbuilder: Log when the default policy is installed This will help us to debug issues in the future (and would have helped in the past as well). :-) Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-14 20:45:49 +02:00
Fabiano Fidêncio	75bd97e8df	build: Ensure the default rootfs is built with AGENT_POLICY=yes This is needed, as `b1710ee2c0` made the default agent shipped the one with policy support. However, we simply didn't update the rootfs to reflect that, which caused the agent to fail to start, as shown by the strace below: ``` open("/etc/kata-opa/default-policy.rego", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory) futex(0x7f401eba0c28, FUTEX_WAKE_PRIVATE, 1) = 1 rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0 tkill(553681, SIGABRT) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=553681, si_uid=1000} --- +++ killed by SIGABRT (core dumped) +++ ``` This happens as the default policy must be set when the agent is built with policy support, but the code path that copies that into the rootfs is only triggered if the rootfs itself is built with AGENT_POLICY=yes, which we're now doing for both confidential and non-confidential cases. Sadly this was not caught by CI until the cache was not used for rootfs, which should be solved by the previous commit. Fixes: #9630, #9631 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-14 20:39:15 +02:00
Hyounggyu Choi	37060a7d2e	local-build: Stop using cached artifacts when local-build/* is updated This adds information about the files under `tools/packaging/kata-deploy/local-build/*` to the version of the components, ensuring that the cached artefacts are not used when the files of interest are updated. Fixes: #9630 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-05-14 19:47:33 +02:00
Fabiano Fidêncio	9a3392993d	Merge pull request #9629 from ldoktor/tdx_not_supported_warning kata-deploy: Fix tdx_not_supported call	2024-05-14 17:27:56 +02:00
Greg Kurz	f14a1330d4	Merge pull request #9585 from littlejawa/debugging_the_runtime debugging: adding a script and instructions for debugging the GO shim	2024-05-14 15:31:07 +02:00
Lukáš Doktor	d9ae130031	kata-deploy: Fix tdx_not_supported call The `tdx_not_supported_warning` function does not exist; `tdx_not_supported` should be called instead. Fixes: #9628 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-05-14 13:26:07 +02:00
Julien Ropé	e7cfc0865a	debugging: adding a script and instructions for debugging the GO shim Using a debugger with the kata runtime is complicated, but it can be done and can be very useful. This commit provides a helper script that simplifies it, and updates the developer's documentation to explain how to use it. Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-05-14 11:12:31 +02:00
Greg Kurz	e2117d3b71	Merge pull request #9571 from emanuellima1/fix-impl-rtc runtime-rs: Fix constructing the RTC struct	2024-05-14 09:17:27 +02:00
Gabriela Cervantes	f20a44bba3	gha: Fix indentation in gha run k8s common This PR fixes the indentation in gha run k8s common script to have uniformity across the script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-13 20:07:47 +00:00
Fabiano Fidêncio	4d5e90038c	Merge pull request #9626 from fidencio/topic/prepare-for-3.5.0-release release: Bump VERSIONS file to 3.5.0	2024-05-13 12:52:12 +02:00
Fabiano Fidêncio	0e385452e5	release: Bump VERSIONS file to 3.5.0 Let's bump the VERSIONS file and start preparing for a new release of the project. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-13 10:49:09 +02:00
Fabiano Fidêncio	c64b07f981	Merge pull request #9622 from fidencio/topic/unbreak-nvidia-gpu-build build: nvidia-gpu: Fix cache usage of the headers tarball	2024-05-12 14:40:22 +02:00
cncal	232db2d906	runtime: fix duplicated devices requested to the agent By default, when a container is created with the `--privileged` flag, all devices in `/dev` from the host are mounted into the guest. If there is a block device (e.g. `/dev/dm`) followed by a generic device (e.g. `/dev/null`), two identical block devices (`/dev/dm`) would be requested to the kata agent causing the agent to exit with error: > Conflicting device updates for /dev/dm-2 As the generic device type does not hit any cases defined in `switch`, the variable `kataDevice` which is defined outside of the loop is still the value of the previous block device rather than `nil`. Defining `kataDevice` in the loop fixes this bug. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-12 16:38:37 +08:00
Fabiano Fidêncio	9713558477	k0s: Use a different port for kube-router's metrics kube-router decided to use :8080 for its metrics, and this seems to be a change that affected k0s 1.30.0+, leading to the kube-router pod crashing all the time, so that nothing can actually be started after that. Due to this issue, let's simply use a different port (:9999) and move on with our tests. Fixes: #9623 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-11 23:18:20 +02:00
Fabiano Fidêncio	4cd048444d	build: nvidia-gpu: Fix cache usage of the headers tarball Whenever we count on having the headers tarball, we must unpack the cached content into the expected directory, otherwise we'd simply fail, as we've been failing in our CI, at the end of the process where we generate the tarball from the cached components. It's weird to me, sincerely, that the headers tarball ends up in such a weird place (build/kernel-nvidia-gpu/builddir/), but I'll leave that to Zvonko to figure out whether something better can be done, as the intent of this PR is simply to unblock Kata Containers CI. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-11 17:59:53 +02:00
Zvonko Kaiser	693e307f72	deploy: Add artefact repository New env var so everyone can test the PUSH_TO_REGISTRY feature export PUSH_TO_REGISTRY=yes export ARTEFACT_REGISTRY=quay.io export ARTEFACT_REPOSITORY=my-fancy-kata-containers export ARTEFACT_REGISTRY_USERNAME=zvonkok export ARTEFACT_REGISTRY_PASSWORD=<super-secret> make ...-tarball Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-10 16:41:52 +00:00
Zvonko Kaiser	4dea73b433	Merge pull request #9616 from zvonkok/nv-kernel-hotfix deploy: Fix wrong pushing of artifacts	2024-05-10 18:38:09 +02:00
Zvonko Kaiser	4d0f42a145	deploy: Fix wrong pushing of artifacts Added explicit case statements for nvidia-gpu and nvidia-gpu-confidential Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-10 14:08:32 +00:00
Zvonko Kaiser	85374f55d2	gpu: Add build targets for GPU rootfs initrd/image Preparation for complete GPU rootfs build step #1/#N Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-10 09:47:21 +00:00
Zvonko Kaiser	8ec2cc9c0d	threat-model: Add VFIO, ACPI and KVM/VMM threat-model descriptions We're missing several topics in the current threat model, so let's update it. Fixes: #8943 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-10 07:18:44 +00:00
Fabiano Fidêncio	20515fed70	Merge pull request #9484 from zvonkok/nvidia-runtimeclasses deploy: Add runtimeClasses relating to the NVIDIA GPU	2024-05-10 03:52:12 +02:00
Gabriela Cervantes	80e551ea74	metrics: Update launch times script This PR updates the launch times scripts by improving the variable definition as well as trying to use the same format across all the script. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-09 21:29:32 +00:00
Emanuel Lima	59c1567f80	runtime-rs: Fix constructing the RTC struct RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517 RTC was being constructed inside the QemuCmdLine struct, but it should've been built inside the devices vector. Signed-off-by: Emanuel Lima <emlima@redhat.com>	2024-05-09 15:00:47 -03:00
Fabiano Fidêncio	2f686b1179	Merge pull request #9608 from fidencio/topic/tdx-depend-on-distro-host-stack-part-II tdx: Adapt kata-deploy to use QEMU / OVMF from the distros	2024-05-09 10:25:19 +02:00
Zvonko Kaiser	da7e6a0f07	deploy: Add runtimeClasses relating to the NVIDIA GPU Fixes: #9483 For the added configurations we need to provide runtimeClasses. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 10:00:59 +02:00
Fabiano Fidêncio	96a100f910	Merge pull request #9482 from zvonkok/kernel-headers-tarball kernel: Add caching of kernel-headers	2024-05-09 09:58:30 +02:00
Fabiano Fidêncio	aba56a8adb	tests: measured-rootfs: Skip policy addition Let's skip the policy addition for now, in order to get the TDX CI back up and running, and then we can re-enable it as soon as we get https://github.com/kata-containers/kata-containers/issues/9612 fixed. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	77f457c0e1	runtime: tdx: Drop sept-ve-disable=on This was needed when we were using an old (and not maintained anymore) host stack. Considering what we have as part of the distros today, this can simply be dropped, as I cannot find any reference of this one being needed in any up-to-date documentation. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	416d00228c	Revert "qemu: tdx: Adapt command line" (partially) This reverts commit `b7cccfa019`. The `private=on` bit has never made its way upstream, and was removed from the latest iteration that we're using. With that in mind, let's revert its usage in the code. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	1c3037fd25	Revert "govmm: tdx: Expose the private=on|off knob" This reverts commit `582b5b6b19`. The `private=on` bit has never made its way upstream, and was removed from the latest iteration that we're using. With that in mind, let's revert its addition, and later on its usage in the code. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	a9720495de	kata-deploy: Ensure the distro QEMU and OVMF are used for TDX Here we're checking the distro's `/etc/os-release` or `/usr/lib/os-release` in order to get which distro we're deploying the Kata Containers artefacts to, and then to properly adjust the QEMU and OVMF with TDX support that's been shipped with the distros. Together with that, we're also printing the instructions provided by the distro on how to enable and use TDX. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	f48450b360	runtime: config: tdx: Add QEMU / OVMF placeholder var Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT variables instead of actually setting a path, so we can easily replace those as part of our deployment scripts. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	84b94dc2b1	kata-deploy: Expose /host to the daemon-set We'll need to have access to the host os-release file (either under `/etc/os-release` or under `/usr/lib/os-release`), and the simplest approach that comes to my mind is doing what a debug pod would do: mounting `/` as `/host`, which gives us access to those files and lets us correctly set the TDX specific QEMU and OVMF (TDVF) paths for the tdx available configurations. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	f2d40da8e4	versions: build: Remove unused td-shim entry We haven't been using nor testing with td-shim, as Cloud Hypervisor does not officially support TDX yet, and TDVF is supposed to be used with QEMU, instead of td-shim. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	ea82740b19	versions: build: Remove TDX specific QEMU Let's remove everything related to the TDX specific QEMU building / shipping from our repo, as we'll be relying on the one coming from the distros. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Fabiano Fidêncio	4292c4c3b1	versions: build: Remove TDX specific OVMF (TDVF) Let's remove everything related to the TDVF building / shipping from our repo, as we'll be relying on the one coming from the distro. Later on, we may need to re-add TDVF logic, as we're already using upstream edk2 repo / content, but when that's needed we'll simply revert this commit. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-09 07:59:12 +02:00
Alex Lyn	946f0bdfff	Merge pull request #9609 from fidencio/topic/skip-pull-image-tests-on-tees tests: pull-image: Don't run on TEEs	2024-05-09 08:22:55 +08:00
GabyCT	3b8a910393	Merge pull request #9596 from lifupan/main db: fix the issue of failed to init pci root bus	2024-05-08 13:14:20 -06:00
Gabriela Cervantes	2fb406ed3a	metrics: Fix random write value for FIO This PR fixes the random write value for FIO for qemu by decreasing it to avoid the random failures of the GHA CI. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-08 18:54:41 +00:00
Fabiano Fidêncio	142342012c	tests: pull-image: Don't run on TEEs Let's skip those tests on TEEs as we've been facing a reasonable amount of issues, most likely on the containerd side, related to pulling the image on the guest. Once we're able to fix the issues on containerd, we can get back and re-enable those by reverting this commit. The decision of disabling the tests for TEEs is because the machines may end up in a state where human intervention is necessary to get them back to a functional state, and that's really not optimal for our CI. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-08 18:40:22 +02:00
Fabiano Fidêncio	c0bf9e9bc6	Merge pull request #9607 from fidencio/topic/tdx-depend-on-distro-host-stack-part-I ci: Stop building TDX specific QEMU and OVMF	2024-05-08 15:53:15 +02:00
Zvonko Kaiser	fb0b821771	kernel: Add caching of kernel-headers Fixes: #9481 We need to cache the kernel-headers for the NVIDIA GPU initrd/image build. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-05-08 11:30:39 +00:00
Fabiano Fidêncio	12dc9f83df	ci: Stop building TDX specific QEMU and OVMF This is the first step of the work to start relying on the artefacts coming from the distros (CentOS 9 Stream, and Ubuntu) themselves. Let's have this first one merged, as this will not run the CI due to the changes being on the yaml itself, and then follow-up with the changes needed on other parts of the project (kata-deploy, runtime, etc). Fixes: #9590 -- part I Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-05-08 11:39:32 +02:00
Alex Lyn	875e6e3815	Merge pull request #9601 from cncal/fix_redundant_log qemu: the error is logged only when it occurs	2024-05-08 08:59:01 +08:00
GabyCT	22087f9db9	Merge pull request #9598 from lifupan/main_shim runtime-rs: fix the issue of the leak of dead shim	2024-05-07 10:14:11 -06:00
GabyCT	a564422b7b	Merge pull request #9582 from cncal/main build: fix the confusing build message if yq doesn't exist in GOPATH/bin	2024-05-07 09:34:27 -06:00
Fabiano Fidêncio	cd84414c63	Merge pull request #9600 from GabyCT/topic/deleteoci versions: Remove oci information from versions file	2024-05-07 13:15:35 +02:00
Fabiano Fidêncio	ddf6b367c7	Merge pull request #9568 from kata-containers/dependabot/go_modules/src/runtime/go_modules-22ef55fa20 build(deps): bump the go_modules group across 5 directories with 8 updates	2024-05-07 13:14:48 +02:00
Steve Horsman	e967db60ab	Merge pull request #9592 from sprt/mariner-before-ch39 tests: adapt Mariner CI to unblock CH v39 upgrade	2024-05-07 11:52:55 +01:00
cncal	15d511af97	qemu: the error is logged only when it occurs Every time I create a container on an arm64 machine, containerd/kata logs a redundant warning as follows: ``` shell time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2 pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor ``` I added an error statement so that the error would be logged when it occurs. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-07 14:28:04 +08:00
Gabriela Cervantes	aecede11fc	versions: Remove oci information from versions file This PR removes oci information from versions file as this is no longer being used in kata containers repository. Fixes #9599 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-06 20:14:00 +00:00
Gabriela Cervantes	b54dc26073	gha: Enable uninstall kbs client function for coco gha workflow This PR enables the uninstall kbs client function for coco gha tdx workflow. Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-06 15:55:24 +00:00
Gabriela Cervantes	aaf9b54d97	gha: Add support to install KBS to k8s TDX GHA workflow This PR adds support to install KBS to k8s TDX GHA workflow in order to run confidential attestation tests. Fixes #9451 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-06 15:42:17 +00:00
Gabriela Cervantes	506e17a60d	tests: Add k8s negative policy test This PR adds a k8s negative policy test to the confidential attestation bats test. Fixes #9437 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-05-06 15:28:54 +00:00
Fupan Li	3694f3d9fe	runtime-rs: fix the issue of the leak of dead shim We should init and assign the runtime instance to runtime handler, otherwise, if the pause container failed to start, which means the runtime instance failed to start, then the following delete & shutdown request wouldn't be run, thus the dead shim would be left. Fixes: #9597 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-05-06 17:31:31 +08:00
Fupan Li	26bee78e8d	db: fix the issue of failed to init pci root bus dragonball reserves 2048G of mmio space for the pci root bus by default on physical addresses greater than 4G. However, for some machines with smaller physical address widths, such as 39-bit wide physical addresses, dragonball reserves the mmio space when initializing the memory. It is less than 2048G, so this commit dynamically calculates and allocates the mmio size of each pci root bus. Fixes: #9509 Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-05-06 11:34:18 +08:00
Aurélien Bombo	0cc2b07a8c	tests: adapt Mariner CI to unblock CH v39 upgrade The CH v39 upgrade in #9575 is currently blocked because of a bug in the Mariner host kernel. To address this, we temporarily tweak the Mariner CI to use an Ubuntu host and the Kata guest kernel, while retaining the Mariner initrd. This is tracked in #9594. Importantly, this allows us to preserve CI for genpolicy. We had to tweak the default rules.rego however, as the OCI version is now different in the Ubuntu host. This is tracked in #9593. This change has been tested together with CH v39 in #9588. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-05-03 16:29:12 +00:00
cncal	48d873b52b	build: fix the confusing build message if yq doesn't exist in GOPATH/bin The build message shows that yq was not found when I tried to build runtime binaries, but I've actually installed yq by yum install. Signed-off-by: cncal <flycalvin@qq.com>	2024-05-03 08:34:45 +08:00
cncal	9caa7beb1f	runtime: make kata-runtime check error more understandable If device /dev/kvm does not exist, kata-runtime check would fail with an ambiguous error message 'no such file or directory'. I added a little more detail to make it understandable and it will be like: ``` ERRO[0000] cannot open kvm device: no such file or directory arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime ERRO[0000] no such file or directory arch=arm64 name=kata-runtime pid=2849085 source=runtime no such file or directory ``` Signed-off-by: cncal <flycalvin@qq.com>	2024-05-03 08:29:08 +08:00
Zvonko Kaiser	e5e0983b56	Merge pull request #9476 from zvonkok/nvidia-config-tomls config: Add NVIDIA GPU SNP, TDX configuration files	2024-05-02 10:27:10 +02:00
Fabiano Fidêncio	f04a7a55ed	Merge pull request #9563 from fidencio/topic/agent-use-policy-by-default build: Build the shipped agent with policy enabled	2024-05-01 12:22:05 +02:00
Fabiano Fidêncio	33a8701904	Merge pull request #9573 from littlejawa/kata_deploy_crio_conf kata-deploy: configure debugging for crio	2024-05-01 12:19:10 +02:00
Julien Ropé	c2aed995b7	kata-deploy: configure debugging for crio Fix the configuration for crio's log_level Fixes: #9556 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-04-30 17:48:43 +02:00
stevenhorsman	3c2232d898	runtime: fix testVersionString logic - The testVersionString logic uses a regex to check that the ociVersion is displayed correctly, but with the new go module that version has a `+` in it, so we need to quote this to escape special characters Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-30 10:54:49 +01:00
dependabot[bot]	391bc35805	build(deps): bump the go_modules group across 5 directories with 8 updates Bumps the go_modules group with 2 updates in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd) and [github.com/containers/podman/v4](https://github.com/containers/podman). Bumps the go_modules group with 4 updates in the /src/tools/csi-kata-directvolume directory: [golang.org/x/sys](https://github.com/golang/sys), google.golang.org/protobuf, [golang.org/x/net](https://github.com/golang/net) and [google.golang.org/grpc](https://github.com/grpc/grpc-go). Bumps the go_modules group with 2 updates in the /src/tools/log-parser directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3. Bumps the go_modules group with 2 updates in the /tests directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3. Bumps the go_modules group with 2 updates in the /tools/testing/kata-webhook directory: [golang.org/x/sys](https://github.com/golang/sys) and [golang.org/x/net](https://github.com/golang/net). 
Updates `github.com/containerd/containerd` from 1.7.2 to 1.7.11 - [Release notes](https://github.com/containerd/containerd/releases) - [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md) - [Commits](https://github.com/containerd/containerd/compare/v1.7.2...v1.7.11) Updates `github.com/containers/podman/v4` from 4.2.0 to 4.9.4 - [Release notes](https://github.com/containers/podman/releases) - [Changelog](https://github.com/containers/podman/blob/v4.9.4/RELEASE_NOTES.md) - [Commits](https://github.com/containers/podman/compare/v4.2.0...v4.9.4) Updates `google.golang.org/protobuf` from 1.29.1 to 1.33.0 Updates `github.com/cyphar/filepath-securejoin` from 0.2.3 to 0.2.4 - [Release notes](https://github.com/cyphar/filepath-securejoin/releases) - [Commits](https://github.com/cyphar/filepath-securejoin/compare/v0.2.3...v0.2.4) Updates `golang.org/x/sys` from 0.15.0 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `google.golang.org/protobuf` from 1.31.0 to 1.33.0 Updates `golang.org/x/net` from 0.19.0 to 0.23.0 - [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0) Updates `google.golang.org/grpc` from 1.59.0 to 1.63.2 - [Release notes](https://github.com/grpc/grpc-go/releases) - [Commits](https://github.com/grpc/grpc-go/compare/v1.59.0...v1.63.2) Updates `golang.org/x/sys` from 0.0.0-20191026070338-33540a1f6037 to 0.1.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `gopkg.in/yaml.v3` from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0 Updates `golang.org/x/sys` from 0.0.0-20220429233432-b5fbb4746d32 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `gopkg.in/yaml.v3` from 3.0.0-20210107192922-496545a6307b to 3.0.0 Updates `golang.org/x/sys` from 0.15.0 to 0.19.0 - [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0) Updates `golang.org/x/net` from 0.19.0 to 0.23.0 - 
[Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0) --- updated-dependencies: - dependency-name: github.com/containerd/containerd dependency-type: direct:production dependency-group: go_modules - dependency-name: github.com/containers/podman/v4 dependency-type: direct:production dependency-group: go_modules - dependency-name: google.golang.org/protobuf dependency-type: direct:production dependency-group: go_modules - dependency-name: github.com/cyphar/filepath-securejoin dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: google.golang.org/protobuf dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: direct:production dependency-group: go_modules - dependency-name: google.golang.org/grpc dependency-type: direct:production dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: gopkg.in/yaml.v3 dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: gopkg.in/yaml.v3 dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/sys dependency-type: indirect dependency-group: go_modules - dependency-name: golang.org/x/net dependency-type: indirect dependency-group: go_modules ... Signed-off-by: dependabot[bot] <support@github.com>	2024-04-30 09:46:13 +01:00
Wainer Moschetta	eae429a39b	Merge pull request #9552 from wainersm/kata_cc_dev runtime: new qemu-coco-dev configuration	2024-04-30 05:21:49 -03:00
Zvonko Kaiser	28078ded84	Merge pull request #9570 from stevenhorsman/dependabot-commit-check-skip workflow: static-checks: Skip commit checks for dependabot	2024-04-29 23:00:35 +02:00
Pavel Mores	1dd06cf40d	Merge pull request #9551 from pmores/support-iommu runtime-rs: support IOMMU in qemu VMs	2024-04-29 15:26:11 +02:00
stevenhorsman	0bec8721cc	workflow: Skip commit checks for dependabot Dependabot doesn't follow all our commit format guidelines, so add a check and skip these if the author is `dependabot[bot]` Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-29 13:45:51 +01:00
Wainer dos Santos Moschetta	631f6f6ed6	gha: switch CoCo tests on non-TEE to use qemu-coco-dev With the addition of the 'qemu-coco-dev' runtimeClass we no longer need to run CoCo tests on non-TEE environments with 'qemu'. As a result the tests also no longer need to set the "io.katacontainers.config.hypervisor.image" annotation to pods. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-29 05:45:11 -03:00
Wainer dos Santos Moschetta	c6708726ff	kata-deploy: install the new kata-qemu-coco-dev runtimeclass Created the runtimeclasses/kata-qemu-coco-dev.yaml file and updated the list of SHIMS. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-29 05:45:11 -03:00
Wainer dos Santos Moschetta	42fb5d7760	runtime: new qemu-coco-dev configuration Created a new configuration to configure Kata for CoCo without requiring TEE hardware, so as to allow developers to implement/test/debug platform agnostic code on their workstations. It will also ease testing of CoCo features on CI with non-TEE supported VMs. This is based off the qemu configuration. The following differences applied: - switched to confidential guest image/initrd - switched to confidential kernel - switched to 9p shared_fs Fixes #9487 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-29 05:45:10 -03:00
Fabiano Fidêncio	d3b300ff95	build: tests: Remove agent-opa Now that the `kata-agent` is being built with policy support, let's stop building the `kata-opa-agent`, reducing the amount of things we need to test and maintain. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-28 12:52:54 +02:00
Fabiano Fidêncio	b1710ee2c0	build: Build the shipped agent with policy enabled Now that the OPA binary is not required anymore, let's start shipping the agent with the policy enabled by default. The agent without policy enabled has 30MB, while it's 34MB with the policy enabled. This 4MB (~10%) increase is, IMHO, worth it in order to reduce the amount of components we have to maintain and test, including the possibility to also reduce the amount of possible rootfs / initrd images. Whoever wants to use the agent without policy enabled can simply do that by building their own agent. :-) Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-28 12:52:54 +02:00
Fabiano Fidêncio	7b039eb1b9	Merge pull request #9559 from fidencio/topic/remove-opa-stuff rootfs: Stop building and shipping OPA	2024-04-28 12:52:07 +02:00
Fabiano Fidêncio	fe21d7a58b	rootfs: Stop building and shipping OPA Since OPA binary was replaced by the regorus crate, we can finally stop building and shipping the binary. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-26 18:51:28 +02:00
Fabiano Fidêncio	7dd2fde22d	Revert "rootfs: Make OPA build working in docker for s390x and ppc64le" This reverts commit `d523e865c0`, as we will not depend on the OPA binary anymore. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-26 18:51:27 +02:00
Hyounggyu Choi	62bad976e0	Merge pull request #9562 from BbolroC/bump-golang build: Update golang version to 1.22.2	2024-04-26 17:58:04 +02:00
Steve Horsman	34a1cdc5c7	Merge pull request #9528 from cncal/patch-1 doc: fix missing document link	2024-04-26 15:22:15 +01:00
Hyounggyu Choi	80cb4a6c18	build: Update golang version to 1.22.2 As we have an issue with a golang version for `run-cri-containerd`, it is required to bump the language. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-26 15:50:29 +02:00
Pavel Mores	908ec31d9b	runtime-rs: fix iommu_platform support for qemu vhost-user-fs device iommu_platform support was already added on initial DeviceVhostUserFs introduction, however it incorrectly enabled iommu_platform also on non-CCW (e.g. PCI) systems. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:48:00 +02:00
Pavel Mores	174fc8f44b	runtime-rs: support iommu_platform for qemu virtio-net device Note that it's only supported on CCW systems. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:48:00 +02:00
Pavel Mores	0d038f20cc	runtime-rs: support iommu_platform for qemu virtio-serial device iommu_platform is only turned on for CCW systems. PartialEq is added to VirtioBusType to enable the '==' operator. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:48:00 +02:00
Pavel Mores	66a2dc48ae	runtime-rs: support iommu_platform for qemu vhost-vsock device iommu_platform addition is controlled solely by the configuration file. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:48:00 +02:00
Pavel Mores	d1e6f9cc4e	runtime-rs: add IOMMU to qemu VM if configured The adding itself is done by a new function add_iommu() that conforms with the add_() convention. Note though that this function is called internally, by the QemuCmdLine constructor, simply because there's nothing to trigger its invocation from QemuInner (unlike the other add_() functions so far). Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:48:00 +02:00
Pavel Mores	0859f47a17	runtime-rs: add representation of '-device intel-iommu' to qemu-rs Following the golang shim example, the values are hardcoded. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:47:51 +02:00
Pavel Mores	702bf0d35e	runtime-rs: support qemu machine's 'kernel_irqchip' param We will want to set kernel_irqchip when enabling IOMMU and this commit adds the requisite support. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-04-26 14:42:54 +02:00
Alex Lyn	f72c6ba814	Merge pull request #9519 from emanuellima1/impl-rtc runtime-rs: Add RTC to QEMU cmdline	2024-04-26 17:44:47 +08:00
Dan Mihai	b42ddaf15f	Merge pull request #9530 from microsoft/saulparedes/improve_caching genpolicy: changing caching so the tool can run concurrently with itself	2024-04-25 13:06:23 -07:00
David Esparza	ae317a319f	Merge pull request #9549 from JakubLedworowski/fix-tarball-dockerfile build: Fix tarball not building correctly in docker	2024-04-25 09:40:20 -06:00
James O. D. Hunt	5bd614530f	Merge pull request #9525 from jodh-intel/gha-k8s-ch-dm gha: Enable k8s tests for cloud hypervisor with devicemapper	2024-04-25 09:28:09 +01:00
Fabiano Fidêncio	b4360e7e37	Merge pull request #9510 from microsoft/danmihai1/regorus-policy2 agent: use regorus instead of opa	2024-04-24 21:40:29 +02:00
James O. D. Hunt	ff7349b6f0	gha: Enable k8s tests for cloud hypervisor with devicemapper Enable the k8s tests for cloud hypervisor with devicemapper. Fixes: #9221. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com> Co-authored-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-24 16:32:51 +01:00
Dan Mihai	2400a4d249	Merge pull request #9428 from arc9693/archana1/genplicyfixes genpolicy: implement default methods for K8sResource trait	2024-04-24 08:04:19 -07:00
Dan Mihai	ff385eac41	agent: remove unnecessary comment Remove reminder to initialize Policy earlier, because currently there are no plans to initialize earlier. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-24 14:53:51 +00:00
Jakub Ledworowski	73366da9f9	build: Fix tarball not building correctly in docker When docker is installed on the host system using script from https://get.docker.com/ it automatically creates a docker group with gid=999. Then during docker build process of tarball, eg. make qemu-tdx-experimental-tarball docker is also installed inside the image with the same script, which also automatically adds docker group with gid=999. Then, the build tries to add a new group docker_on_host with gid=999, which already exists, which breaks the build. Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>	2024-04-24 15:35:36 +02:00
Calvin Liu	56a73ee704	doc: fix missing document link The hardware-requirements document section is located in /README.md for now. Signed-off-by: Calvin Liu <flycalvin@qq.com>	2024-04-24 17:34:30 +08:00
Fabiano Fidêncio	4e35f11a3d	Merge pull request #9535 from fidencio/topic/fix-crio-debug-drop-in kata-deploy: Stop append `log_level = "debug"` for CRI-O	2024-04-24 10:03:36 +02:00
Dan Mihai	89c85dfe84	Merge pull request #9432 from UiPath/fix-clh-wait clh: isClhRunning waits for full timeout when clh exits	2024-04-23 13:02:45 -07:00
Hyounggyu Choi	608df9b7df	Merge pull request #9494 from BbolroC/guest-pull-gha-s390x CC: Enable guest-pull tests on non-TEE for s390x	2024-04-23 21:22:37 +02:00
Dan Mihai	e5c3f5fa9b	tests: no generated policy for untested platforms Avoid auto-generating Policy on platforms that haven't been tested yet with auto-generated Policy. Support for auto-generated Policy on these additional platforms is coming up in future PRs, so the tests being fixed here were prematurely enabled. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-23 16:07:03 +00:00
Emanuel Lima	2bc5e3c6e2	runtime-rs: Add RTC to QEMU cmdline Add RTC by hardcoding the options base=utc,driftfix=slew,clock=host Signed-off-by: Emanuel Lima <emlima@redhat.com>	2024-04-23 10:46:30 -03:00
Fabiano Fidêncio	d190c9d4d9	kata-deploy: Stop append `log_level = "debug"` for CRI-O This should only be done once, and if CRI-O restarts, there's a big chance kata-deploy will also restart and the user would end up with a file that looks like: ``` [crio] log_level = "debug" [crio] log_level = "debug" [crio] log_level = "debug" ... ``` And that would simply cause CRI-O to not start. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-23 14:51:35 +02:00
Greg Kurz	42a79801f3	Merge pull request #9524 from littlejawa/fix_createruntime_hook_not_called runtime: Call CreateRuntime hooks at container creation time	2024-04-23 13:43:36 +02:00
Fupan Li	469c4e4f44	Merge pull request #9335 from Tim-Zhang/fix-passfd-fifo-open passfd-io: fix FIFO opening and vsock handling	2024-04-23 09:04:45 +08:00
Alex Lyn	bc2cf95e7a	Merge pull request #9517 from amshinde/update-storage-source-pciblock runtime-rs: Update storage source for pci block devices	2024-04-23 07:32:36 +08:00
Dan Mihai	5d31eb4847	agent: use regorus 0.1.4 Use regorus 0.1.4 from crates.io, instead of its source code repository. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-22 23:21:17 +00:00
Dan Mihai	ed6412b63c	tests: k8s: reduce the policy tests output noise Hide some of the kubectl output, to reduce the size and redundancy of this output. Fixes: #9388 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-22 19:59:33 +00:00
Dan Mihai	df23eb09a6	agent: use regorus instead of opa Implement Agent Policy using the regorus crate instead of the OPA daemon. The OPA daemon will be removed from the Guest rootfs in a future PR. Fixes: #9388 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-22 19:58:30 +00:00
Dan Mihai	58e608d61a	tests: remove k8s-policy-set-keys.bats Remove k8s-policy-set-keys.bats in preparation for using the regorus crate instead of the OPA daemon for evaluating the Agent Policy. This test depended on sending HTTP requests to OPA. Fixes: #9388 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-22 19:49:38 +00:00
Dan Mihai	b509c1beee	agent: lock anyhow version to 1.0.58 Lock anyhow version to 1.0.58 because: - Versions between 1.0.59 - 1.0.76 have not been tested yet using Kata CI. However, those versions pass "make test" for the Kata Agent. - Versions 1.0.77 or newer fail during "make test" - see https://github.com/kata-containers/kata-containers/issues/9538. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-22 19:49:15 +00:00
Archana Shinde	cc6b671101	runtime-rs: Update storage source for pci block devices In case of block devices using virtio-block, we need to pass the pci-path as the storage source field to the agent. Currently the virt-path is being passed, which works only for mmio block devices. In the future when support is added for scsi, block-ccw and pmem devices, the storage source would need to be handled accordingly. Fixes: #9034 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com> Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2024-04-22 11:36:58 -07:00
Hyounggyu Choi	f10744df99	CC: Enable guest-pull tests on non-TEE for s390x This commit is to add a new CI job to run-k8s-tests-on-zvsi.yaml. Why the job is not configured in run-kata-coco-tests.yaml by having it integrated with `run-k8s-tests-coco-nontee` is: - It uses k3s instead of AKS - It runs on a self-hosted runner These differences make the integrated job not easy to read and maintain when it comes to incorporating other platforms in the near future. Fixes: #9467 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-22 17:15:20 +02:00
Greg Kurz	6ca0f09710	Merge pull request #9518 from microsoft/danmihai1/agent-cargo-lock agent: update cargo.lock	2024-04-22 13:36:06 +02:00
Tim Zhang	aeba483ec8	agent: avoid fd leakage of passfd-io In do_create_container and do_exec_process, we should create the proc_io first, so that if an error occurs later we can make sure the io stream is closed. Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-04-22 17:39:33 +08:00
Tim Zhang	8441187d5e	runtime-rs: fix FIFO handling Fixes: #9334 In Linux, when a FIFO is opened and there are no writers, the reader will continuously receive the HUP event. This can be problematic. To avoid this problem, we open stdin in write mode and keep the stdin writer. We also need to open stdout/stderr in read mode and keep the open endpoint until the process is deleted. Otherwise, the process could exit before the containerd side opens and reads the stdout fifo, in which case runD would write all of the stdout contents into the stdout fifo and then close the write endpoint. Containerd would then open the stdout fifo and try to read, but since the write side had already been closed, containerd would block on the read forever. Here we keep the stdout/stderr read endpoint File in the common_process, which is destroyed when containerd sends the delete rpc call; by that time containerd has waited for the stdout read to return, which ensures that the contents of the stdout/stderr fifo are not lost. Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-04-22 17:39:33 +08:00
Tim Zhang	d68eb7f0ad	agent: Fix close_stdin for passfd-io In the passfd-io scenario, we should wait for stdin to close itself instead of manually intervening in it. Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-04-22 17:39:32 +08:00
Steve Horsman	ff9985fc50	Merge pull request #9490 from wainersm/port_attestation_nontee_job gha: move attestation tests to run-k8s-tests-coco-nontee	2024-04-22 10:23:11 +01:00
Archana Choudhary	4a010cf71b	genpolicy: add default implementations for K8sResource trait This commit adds default implementations for following methods of K8sResource trait: - generate_policy - serialize Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:59:02 +00:00
Archana Choudhary	6edc3b6b0a	genpolicy: add default implementation for use_sandbox_pidns This patch adds a default implementation for the use_sandbox_pidns and updates the structs that implement the K8sResource trait to use the default. Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:59:02 +00:00
Archana Choudhary	d5d3f9cda7	genpolicy: add default implementation for use_host_network - Provide default implementation for use_host_network - Remove default implementation from structs implementing the trait K8sResource Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:59:02 +00:00
Archana Choudhary	9a3eac5306	genpolicy: add default impl for get_containers - Provide default impl for get_containers - Remove default impl from structs implementing the trait K8sResource Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:59:02 +00:00
Archana Choudhary	2db3470602	genpolicy: add default impl for get_container_mounts_and_storages - Provide default impl for get_container_mounts_and_storages - Remove default impl from structs implementing the trait K8sResource Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:59:02 +00:00
Archana Choudhary	09b0b4c11d	genpolicy: add default implementation for get_sandbox_name - Provide default implementation for get_sandbox_name in K8sResource trait - Remove default implementation from structs implementing the trait K8sResource Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:55:32 +00:00
Archana Choudhary	43e9de8125	genpolicy: add default implementation for get_annotations - Provide default implementation for get_annotations. - Remove default implementation from structs implementing the trait K8sResource Fixes: #8960 Signed-off-by: Archana Choudhary <archana1@microsoft.com>	2024-04-21 12:55:32 +00:00
Saul Paredes	2149cb6502	genpolicy: changing caching so the tool can run concurrently with itself Based on 3a1461b0a5186a92afedaaea33ff2bd120d1cea0 Previously the tool would use the layers_cache folder for all instances and hence delete the cache when it was done, interfering with other instances. This change makes it so that each instance of the tool will have its own temp folder to use. Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-19 15:46:30 -07:00
Wainer dos Santos Moschetta	1e35291fd5	gha: move attestation tests to run-k8s-tests-coco-nontee The new run-k8s-tests-coco-nontee job should be the home of attestation tests. Changed run-k8s-tests-coco-nontee to get KBS installed and by the time the KBS variable is exported in the environment then the attestation tests will kick in (likewise they will skip in run-k8s-tests-on-aks). Fixes #9455 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-19 14:51:30 -03:00
Steve Horsman	7e12d588c0	Merge pull request #9485 from sparky005/update_golang.org/x/net update golang.org/x/net	2024-04-19 11:26:13 +01:00
Amulya Meka	12964256a4	Merge pull request #9521 from Amulyam24/gha gha: tag k8s tests on ppc64le to ppc64le-runner-01	2024-04-19 15:08:08 +05:30
Julien Ropé	70e798ed35	runtime: Call CreateRuntime hooks at container creation time CreateRuntime hooks are called at the CreateSandbox time, but not after CreateContainer. Fixes: #9523 Signed-off-by: Julien Ropé <jrope@redhat.com>	2024-04-19 10:25:02 +02:00
Alex Lyn	3456483df9	Merge pull request #9513 from stevenhorsman/bump-stale-version gha: stale: Bump stalebot version	2024-04-19 15:15:10 +08:00
Alex Lyn	c147f0f4ed	Merge pull request #9516 from sprt/rlz-340 release: bump version for 3.4.0 release	2024-04-19 15:12:26 +08:00
Amulyam24	8255ed248a	gha: tag k8s tests on ppc64le to ppc64le-runner-01 This PR aims at running the k8s tests on one runner on ppc64le. Fixes: #9520 Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>	2024-04-19 12:04:25 +05:30
Hyounggyu Choi	304dc1e4da	doc: Update how-to-run-kata-containers-with-SE-VMs.md This is to update a document `how-to-run-kata-containers-with-SE-VMs` on using confidential artifacts to build a secure image. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-19 08:31:12 +02:00
Hyounggyu Choi	8fbed9f6a4	local-build: Use confidential kernel and initrd for boot-image-se This is to make `boot-image-se-tarball` use confidential kernel and initrd instead of vanilla version of artifacts. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-19 07:09:04 +02:00
Dan Mihai	4242801b1c	agent: update cargo.lock Update Kata Agent's Cargo.lock after the recent changes to Cargo.toml. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-18 17:12:48 +00:00
Aurélien Bombo	95971e4a42	release: bump version for 3.4.0 release Release v3.4.0. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-04-18 17:08:06 +00:00
Steve Horsman	6dd038fd58	Merge pull request #9501 from zvonkok/check-fixes kata: Remove check for "Fixes" in PR	2024-04-18 17:48:50 +01:00
Hyounggyu Choi	2b9c439fcf	Merge pull request #9508 from BbolroC/gha-s390x-k8s-label gha: Make integration tests for s390x run on s390x-large runners	2024-04-18 18:05:01 +02:00
Adil Sadik	1c5ca0c915	runtime: update golang.org/x/net Updates golang.org/x/net to a newer version that closes some reported vulnerabilities and security issues. Fixes #9486 Signed-off-by: Adil Sadik <sparky.005@gmail.com>	2024-04-18 10:55:02 -04:00
Tim Zhang	221c5b51fe	dragonball: fix EPOLLHUP/EPOLLERR events handling in vsock 1. EPOLLHUP events also need to be read, and the read will return len 0. 2. We should kill the connection when EPOLLERR events are received. Signed-off-by: Tim Zhang <tim@hyper.sh> Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2024-04-18 20:47:02 +08:00
Hyounggyu Choi	49a0d57f66	gha: Make integration tests for s390x run on s390x-large runners This is to make a workflow `run-k8s-tests` and `run-cri-containerd` (s390x and zvsi) run only on the runners labeled by `s390x-large`. Fixes: #9507 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-18 14:35:24 +02:00
stevenhorsman	cf5c3dc155	gha: stale: Bump stalebot version - Bump the stalebot action version to v9 as that fixes the ``` Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/stale@v8. ``` warning. Fixes: #9512 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-18 11:41:09 +01:00
Steve Horsman	bf16b18180	Merge pull request #9503 from stevenhorsman/stale-pr-remove-date gha: stale: Remove the start-date	2024-04-18 09:36:27 +01:00
Hyounggyu Choi	566a6de594	Merge pull request #9505 from BbolroC/remove-crio-nightly-test-s390x gha: Remove k8s-cri-containerd-rhel9-e2e-tests for s390x	2024-04-18 09:31:07 +02:00
Hyounggyu Choi	cc22dc33f2	Merge pull request #9489 from BbolroC/install-opa-in-docker rootfs: Make OPA build working in docker for s390x and pp…	2024-04-18 00:26:11 +02:00
Dan Mihai	5ceed689eb	Merge pull request #9492 from microsoft/danmihai1/pod-tests tests: k8s: inject agent policy failures (part 3)	2024-04-17 14:01:11 -07:00
Hyounggyu Choi	e046f5e652	gha: Remove k8s-cri-containerd-rhel9-e2e-tests for s390x This commit is simply to remove a CI workflow `k8s-cri-containerd-rhel9-e2e-tests`. Fixes: #9504 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-17 15:36:42 +02:00
Zvonko Kaiser	eda3bfe2ef	config: Add NVIDIA GPU SNP, TDX configuration files Fixes: #9475 For TDX and SNP add NVIDIA specific configuration files Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-04-17 12:49:13 +00:00
Wainer Moschetta	2d8e7933c5	Merge pull request #9461 from GabyCT/topic/uninstallkbs tests/k8s: Add uninstall kbs client command function	2024-04-17 09:36:37 -03:00
Zvonko Kaiser	d7b24c04e5	Merge pull request #9473 from zvonkok/gpu-image-initrd-versions version: add initrd, image NVIDIA sections	2024-04-17 13:22:05 +02:00
stevenhorsman	7235988605	gha: stale: Remove the start-date As documented in https://github.com/actions/stale?tab=readme-ov-file#start-date > The start date is used to ignore the issues and pull requests created before the start date. > Particularly useful when you wish to add this stale workflow on an existing repository > and only wish to stale the new issues and pull requests. As we don't need to treat PRs older than May 2023 as a special case, remove this option. Fixes: #9502 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-17 11:19:56 +01:00
Zvonko Kaiser	395e93acd5	kata: Remove Issue - PR dependency We've discussed this over and over. Let's try to get to an agreement here. I will use this issue to remove the mandatory Issue - PR dependency. Fixes: #9500 Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-04-17 09:53:08 +00:00
Archana Shinde	af3b19ed18	Merge pull request #9084 from amshinde/document-intel-gpu-vfio docs: Document Intel Discrete GPUs usage with Kata	2024-04-16 16:17:03 -07:00
Archana Shinde	973a15332a	spell-check: Add missing words to spell-check Add missing words to spell-check dictionaries Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-04-16 11:50:02 -07:00
Archana Shinde	6f97dc1f60	static-checks: Rename file in doc to make static checks happy Configuration file for qemu with runtime-rs was recently renamed. Doc contains name for old file. This was somehow not caught in the CI earlier. Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-04-16 11:50:02 -07:00
Archana Shinde	87f0097b18	docs: Document Intel Discrete GPUs usage with Kata Document describes the steps needed to pass an entire Intel Discrete GPU as well as a GPU SR-IOV interface to a Kata Container. Fixes: #9083 Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>	2024-04-16 11:50:02 -07:00
Dan Mihai	2c4d1ef76b	tests: k8s: inject agent policy failures (part 3) Auto-generate the policy and then simulate attacks from the K8s control plane by modifying the test yaml files. The policy then detects and blocks those changes. These test cases are using K8s Pods. Additional policy failures are injected during CI using other types of K8s resources - e.g., using Jobs and Replication Controllers - from separate PRs. Fixes: #9491 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-16 18:15:12 +00:00
Dan Mihai	c26dad8fe5	Merge pull request #9294 from burgerdev/burgerdev/genpolicy-configurable-pause genpolicy: support insecure registries and custom pause containers	2024-04-16 09:39:33 -07:00
GabyCT	9238daf729	Merge pull request #9464 from microsoft/danmihai1/rc-tests tests: k8s: inject agent policy failures (part2)	2024-04-16 10:01:39 -06:00
Hyounggyu Choi	d523e865c0	rootfs: Make OPA build working in docker for s390x and ppc64le The commit is to make the OPA build from source working in `ubuntu-rootfs-osbuilder`. To achieve the goal, the configuration is changed as follows: - Switch the make target to `ci-build-linux-static` not triggering docker-in-docker build - Install go in the builder image for s390x and ppc64le Fixes: #9466 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-16 16:49:12 +02:00
Greg Kurz	aca6a1bcb5	Merge pull request #9353 from pmores/pr-8866-follow-up runtime-rs: refactor qemu driver	2024-04-16 16:07:36 +02:00
Fabiano Fidêncio	7bb5490676	Merge pull request #9479 from wainersm/fix_coco_nontee_jobs gha: make run-kata-coco-tests inherit secrets	2024-04-16 13:46:52 +02:00
Hyounggyu Choi	7b11fd2546	Merge pull request #9471 from BbolroC/coco-kernel-version-s390x version: Add coco name and version for {image,initrd} for s390x	2024-04-15 16:03:20 +02:00
Wainer dos Santos Moschetta	77541008fc	gha: make run-kata-coco-tests inherit secrets The new CoCo non-tee job introduced on commit `0d5399ba92` needs to read secrets like AZ_TENANT_ID, so the run-kata-coco-tests workflow should inherit the secrets from the caller workflow. Fixes #9477 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-15 10:53:44 -03:00
Zvonko Kaiser	78e3ebb011	version: add initrd, image NVIDIA sections Fixes: #9472 For initrd and image, the related NVIDIA will not use the default targets and we will pin them to a specific release. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-04-15 13:31:35 +00:00
Wainer Moschetta	c85e1ca674	Merge pull request #9404 from ldoktor/ci-mcp-timeout ci.ocp: Increase the MCP update time	2024-04-15 09:42:14 -03:00
Hyounggyu Choi	3ec209dcf1	Merge pull request #9469 from BbolroC/coco-kernel-config-s390x kernel: Adjust s390x config for confidential containers	2024-04-15 13:55:28 +02:00
Hyounggyu Choi	8fce600493	version: Add coco name and version for {image,initrd} for s390x In order to build a coco {image,initrd}, it is required to specify its name and version in versions.yaml. This commit is to add the configuration for them, respectively. Fixes: #9470 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-15 12:53:00 +02:00
Hyounggyu Choi	a792dc3e2b	kernel: Adjust s390x config for confidential containers `CONFIG_TN3270_TTY` and `CONFIG_S390_AP_IOMMU` are dropped for s390x in 6.7.x which is used for a confidential kernel. But they are still used for a vanilla kernel. So we need to add them to the whitelist. Fixes: #9465 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-15 10:28:59 +02:00
Hyounggyu Choi	32f58abfde	Merge pull request #9403 from BbolroC/runtime-rs-ci-qemu CI: Enable GHA cri-containerd workflow for runtime-rs with QEMU	2024-04-15 09:31:25 +02:00
Xuewei Niu	402d8a968e	Merge pull request #9430 from UiPath/fix-agent-shutdown agent: shutdown vm on exit when agent is used as init process	2024-04-15 10:47:07 +08:00
Wainer Moschetta	0a04f54a8e	Merge pull request #9454 from GabyCT/topic/pulltype gha: Define unbound PULL TYPE variable	2024-04-12 14:48:56 -03:00
Wainer Moschetta	a0b21d0e14	Merge pull request #9424 from wainersm/cc_guest_pull-encrypted CC: run guest-pull tests on non-TEE jobs	2024-04-12 09:34:35 -03:00
Hyounggyu Choi	cf20a6a4ae	gha: Add qemu-runtime-rs to VMM matrix for run-cri-containerd This commit expands the VMM matrix for run-cri-containerd, adding a new item `qemu-runtime-rs` for a test scenario where the VMM is QEMU and runtime-rs is employed. This expansion affects the workflows for both x86_64 and s390x platforms. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-12 12:25:53 +02:00
Hyounggyu Choi	606f8e1ab2	runtime-rs: Adjust configuration for qemu-runtime-rs To make `qemu-runtime-rs` work for CI, we have to rename a configuration template file and `CONFIG_FILE_QEMU` in Makefile. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-12 12:25:53 +02:00
Hyounggyu Choi	3c217c6c15	ci\|cri-containerd: Introduce qemu-runtime-rs for KATA_HYPERVISOR `qemu-runtime-rs` will be utilized to handle a test scenario where the VMM is QEMU and runtime-rs is employed. Note: Some of the tests are skipped. They are going to be reintegrated in the follow-up PR (Check out #9375). Fixes: #9371 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-12 12:25:53 +02:00
Alexandru Matei	9e01732f7a	agent: shutdown vm on exit when agent is used as init process Linux kernel generates a panic when the init process exits. The kernel is booted with panic=1, hence this leads to a vm reboot. When used as a service the kata-agent service has an ExecStop option which does a full sync and shuts down the vm. This patch mimics this behavior when kata-agent is used as the init process. Fixes: #9429 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-04-12 11:32:31 +03:00
Alexandru Matei	54923164b5	clh: isClhRunning waits for full timeout when clh exits isClhRunning uses signal 0 to test whether the process is still alive or not. This doesn't work because the process is a direct child of the shim. Once it is dead the process becomes a zombie. Since no one waits for it, the process lingers until its parent dies and init reaps it. Hence sending signal 0 in isClhRunning will always return success whether the process is dead or not. This patch calls wait to reap the process; if it succeeds, that means it is our child process; if not, we send the signal. Fixes: #9431 Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>	2024-04-12 11:31:53 +03:00
Dan Mihai	e51cbdcff9	tests: k8s: inject agent policy failures (part2) Auto-generate the policy and then simulate attacks from the K8s control plane by modifying the test yaml files. The policy then detects and blocks those changes. These test cases are using K8s Replication Controllers. Additional policy failures will be injected using other types of K8s resources - e.g., using Pods and/or Jobs - in separate PRs. Fixes: #9463 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-11 21:08:53 +00:00
Markus Rudy	77540503f9	genpolicy: add support for insecure registries genpolicy is a handy tool to use in CI systems, to prepare workloads before applying them to the Kubernetes API server. However, many modern build systems like Bazel or Nix restrict network access, and rightfully so, so any registry interaction must take place on localhost. Configuring certificates for localhost is tricky at best, and since there are no privacy concerns for localhost traffic, genpolicy should allow contacting some registries insecurely. As this is a runtime environment detail, not a target environment detail, configuring insecure registries does not belong in the JSON settings, so it's implemented as command line flags. Fixes: #9008 Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 22:29:03 +02:00
Wainer dos Santos Moschetta	4f74617897	tests: pass --overwrite-existing to aks get-credentials By passing --overwrite-existing to `aks get-credentials` it will stop asking if I want to overwrite the existing credentials. This is handy for running the scripts locally. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-11 15:31:40 -03:00
Wainer dos Santos Moschetta	3508f3a43a	tests/k8s: use CoCo image on guest-pull when non-TEE When running on non-TEE environments (e.g. KATA_HYPERVISOR=qemu) the tests should be stressing the CoCo image (/opt/kata/share/kata-containers/kata-containers-confidential.img) although currently the default image/initrd is built to be able to do guest-pull as well. Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-11 15:31:40 -03:00
Wainer dos Santos Moschetta	c24f13431d	tests/k8s: enable guest-pull tests on non-TEE Enabled guest-pull tests on non-TEE environment. It now requires the SNAPSHOTTER environment variable to avoid running on jobs where nydus-snapshotter is not installed. Fixes: #9410 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-11 15:31:40 -03:00
Wainer dos Santos Moschetta	0d5399ba92	gha: Create CoCo tests jobs on non-TEE Created the new run-k8s-tests-coco-nontee jobs for running CoCo tests on non-TEE. It currently generates the run-k8s-tests-coco-nontee(qemu, nydus, guest-pull) job only to run the guest-pull tests. Fixes: #9410 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-11 15:31:40 -03:00
Gabriela Cervantes	5420595d03	tests/k8s: Add uninstall kbs client command function This PR adds a function to uninstall the kbs client command, especially for when we are running on baremetal devices. Fixes #9460 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-11 17:06:11 +00:00
Steve Horsman	6b2d655857	Merge pull request #9457 from justxuewei/fs_manager_tests agent: Fix the issue with the "test_new_fs_manager" test	2024-04-11 17:02:58 +01:00
Fabiano Fidêncio	5611233ed8	Merge pull request #9439 from microsoft/danmihai1/job-tests tests: k8s: inject agent policy failures	2024-04-11 17:21:54 +02:00
Markus Rudy	bc2292bc27	genpolicy: make pause container image configurable CRIs don't always use a pause container, but even if they do the concrete container choice is not specified. Even if the CRI config can be tweaked, it's not guaranteed that registries in the public internet can be reached. To be portable across CRI implementations and configurations, the genpolicy user needs to be able to configure the container the tool should append to the policy. Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 16:26:35 +02:00
Markus Rudy	8b30fa103f	genpolicy: parse json settings during config init Decouple initialization of the Settings struct from creating the AgentPolicy struct, so that the settings are available for evaluating, extending or overriding command line arguments. Signed-off-by: Markus Rudy <webmaster@burgerdev.de>	2024-04-11 16:17:33 +02:00
Xuewei Niu	50f78ec52c	agent: Fix the issue with the "test_new_fs_manager" test This patch introduces a one-time cpath to mitigate the cgroup residuals. It might break the device cgroup merging rules when the cgroup has children. Fixes: #9456 Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>	2024-04-11 18:06:05 +08:00
GabyCT	08dcdc62de	Merge pull request #9423 from GabyCT/topic/improvecleanup tests: Improve the kbs_k8s_delete function	2024-04-10 14:28:21 -06:00
Gabriela Cervantes	4a2ee3670f	gha: Define unbound PULL TYPE variable This PR defines the PULL_TYPE variable to avoid unbound variable failures when this is being tested locally. Fixes #9453 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-10 17:16:19 +00:00
GabyCT	dab837d71d	Merge pull request #9450 from GabyCT/topic/fixinnydus gha: Fix indentation in gha run script	2024-04-10 11:07:56 -06:00
David Esparza	9e1368dbc5	Merge pull request #9391 from dborquez/add-onednn-openvino-ml-benchs add onednn and openvino ml-benchmarks	2024-04-09 19:03:00 -06:00
Dan Mihai	ea31df8bff	Merge pull request #9185 from microsoft/saulparedes/genpolicy_add_containerd_pull genpolicy: Add optional toggle to pull images using containerd	2024-04-09 12:29:19 -07:00
Gabriela Cervantes	6ebdcf8974	gha: Fix indentation in gha run script This PR fixes an indentation issue in the gha run script. Fixes #9449 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-09 16:37:17 +00:00
Greg Kurz	89353249fc	Merge pull request #8988 from beraldoleal/ci-docs docs: adding an initial CI documentation	2024-04-09 18:26:15 +02:00
Dan Mihai	2252490a96	tests: k8s: inject agent policy failures Auto-generate the policy and then simulate attacks from the K8s control plane by modifying the test yaml files. The policy then detects and blocks those changes. These test cases are using K8s Jobs. Additional policy failures will be injected using other types of K8s resources - e.g., using Pods and/or Replication Controllers - in future PRs. Fixes: #9406 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-09 15:36:57 +00:00
David Esparza	facf3c9364	metrics: Add onednn benchmark. This PR adds onednn test to exercise additional ML benchmarks. Onednn is an Intel-optimized library for Deep Neural Networks. Fixes: #9390 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
David Esparza	3bde511d0d	metrics: Add openvino benchmark. This PR adds openvino test in order to exercise additional ML benchmarks. OpenVino is used to optimize and deploy deep learning models. Fixes: #9389 Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
David Esparza	b37c5f8ba1	metrics:libs: Add HTTPS and HTTP vars to docker build. Include HTTP and HTTPS env variables when building docker images because they are required to download packages such as Phoronix. Added a check that verifies that building docker images is performed as root. Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
David Esparza	3355dd9e2b	metrics:libs: Adds a function to set new kata configuration. Adds a function that receives as a single parameter the name of a valid Kata configuration file which will be established as the default kata configuration to start kata containers. Adds a second function that returns the path to the current kata configuration file. Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
David Esparza	cb4380d1c9	metrics: common: Add function to clean the cache. The function clears the Page Cache only. Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
David Esparza	3a419ba3b1	metrics: common: Add function to update kata config. Add an extra function that updates kata config to use the max num. of vcpus available and to use the available memory in the system. Signed-off-by: David Esparza <david.esparza.borquez@intel.com>	2024-04-09 09:05:51 -06:00
Beraldo Leal	959e56525c	docs: adding an initial CI documentation This is actually a first attempt to document our CI, and all this content was based on the document created by Fabiano Fidencio (kudos to him). We are just moving the content and discussion from Google Docs to here. I used the "poetic license" to add some notes on what I believe our CI will look like in the future. Fixes #9006 Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Beraldo Leal <bleal@redhat.com>	2024-04-09 09:21:47 -04:00
Saul Paredes	51498ba99a	genpolicy: toggle containerd pull in tests - Add v1 image test case - Install protobuf-compiler in build check - Reset containerd config to default in kubernetes test if we are testing genpolicy - Update docker_credential crate - Add test that uses default pull method - Use GENPOLICY_PULL_METHOD in test Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-08 19:28:29 -07:00
Dan Mihai	f60c9eaec3	Merge pull request #9398 from microsoft/danmihai1/policy-test-cleanup tests: k8s: improve the Agent Policy tests	2024-04-08 15:37:07 -07:00
Gabriela Cervantes	fb4c359cc2	tests: Improve the kbs_k8s_delete function This PR improves the kbs_k8s_delete function to verify that the resources were properly deleted for baremetal environments. Fixes #9379 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-08 18:03:07 +00:00
Saul Paredes	c96ebf237c	genpolicy: add containerd pull method Add optional toggle to use existing containerd installation to pull and manage container images. This adds support to a wider set of images that are currently not supported by standard pull method, such as those that use v1 manifest. Fixes: #9144 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-08 09:56:59 -07:00
Greg Kurz	8b996b9307	Merge pull request #9331 from egernst/foobar katautils: check number of cores on the system instead of go runtime	2024-04-08 18:38:49 +02:00
Greg Kurz	934beb5ae4	Merge pull request #9421 from gkurz/bump-node-js-20 gha: Bump various actions to use Node.js 20	2024-04-08 18:22:28 +02:00
Wainer Moschetta	fba1d394d7	Merge pull request #9369 from ChengyuZhu6/sandbox-image agent:image: Support different pause image in the guest for guest pull	2024-04-08 11:06:21 -03:00
Steve Horsman	3242f55691	Merge pull request #8870 from LindaYu17/aa2main port attestation agent from CCv0 branch to main branch	2024-04-08 15:01:07 +01:00
James O. D. Hunt	42936cb92c	Merge pull request #9372 from jodh-intel/docs-kata-manager-update docs: kata-manager: Update with latest details	2024-04-08 13:23:23 +01:00
stevenhorsman	864e9c22ba	agent: doc: Add new config doc Document the new guest_components_rest_api config parameter Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
stevenhorsman	29a5652e31	packaging: guest-components, set new environment variables - Set KBC_PROVIDER and ATTESTER rather than TEE_PLATFORM to avoid tss build issues for vTPM attester(s) - There are future plans to make a matching TEE_PLATFORM, so this can be simplified once that is available Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
stevenhorsman	a284a20a14	tests: Filter CoCo tests on ppc64le/arm - At the moment we aren't supporting ppc64le or aarch64 for CoCo, so filter out these tests from running Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
stevenhorsman	a0c03966c2	versions: Bump guest-components - Bump guest-components to try and test compatibility with the latest version Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
stevenhorsman	101a5bf273	packaging: Update guest-components Dockerfile - Switch to Ubuntu 20.04 for building guest-components as the rootfs is based on 20.04, so we need matching GLIBC versions. See #8955 - Add dependencies needed by TDX verifier as we want to build for all platforms Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-04-08 11:38:53 +01:00
Gabriela Cervantes	6d85025e59	test/k8s: Add basic attestation test - Add basic test case to check that a running pod can use the api-server-rest (and attestation-agent and confidential-data-hub indirectly) to get a resource from a remote KBS Fixes #9057 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com> Co-authored-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-04-08 11:38:53 +01:00
Biao Lu	f0edec84f6	agent: Launch api-server-rest If 'rest_api' is configured, let's start the api-server-rest after the attestation-agent and the confidential-data-hub have been started. Fixes: #7555 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:38:53 +01:00
Biao lu	4d752e6350	agent: Add config for api-server-rest Add configuration for 'rest api server'. Optional configurations are 'agent.rest_api=attestation' will enable attestation api 'agent.rest_api=resource' will enable resource api 'agent.rest_api=all' will enable all (attestation and resource) api Fixes: #7555 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:06:14 +01:00
Biao Lu	f476d671ed	agent: Launch the confidential data hub Let's introduce a new method to start the confidential data hub and the attestation agent. The former depends on the latter, and it needs to be started before the RPC server. Starting the attestation components is based on whether the confidential containers guest components binaries are found in the rootfs. Fixes: #7544 Signed-off-by: Biao Lu <biao.lu@intel.com> Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com> Signed-off-by: Linda Yu <linda.yu@intel.com> Co-authored-by: stevenhorsman <steven@uk.ibm.com> Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com> Co-authored-by: Wang, Arron <arron.wang@intel.com> Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com> Co-authored-by: Alex Carter <alex.carter@ibm.com> Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com> Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>	2024-04-08 11:06:14 +01:00
Greg Kurz	be8f0cb520	Merge pull request #9402 from deagon/feat/debug-threads qemu: show the thread name when enable the hypervisor.debug option	2024-04-08 11:04:36 +02:00
Hyounggyu Choi	e39be7a45e	Merge pull request #9415 from BbolroC/fix-dir-removal-error GHA: Implement secondary GITHUB_WORKSPACE cleanup on 1st failure	2024-04-08 10:44:44 +02:00
ChengyuZhu6	8c897f822c	agent:image: Support different pause image in the guest for guest pull Support different pause images in the guest for guest-pull, such as k8s pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause). Fixes: #9225 -- part III Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-04-07 09:00:10 +08:00
GabyCT	9d2c5b180e	Merge pull request #9419 from GabyCT/topic/fxlatency metrics: Improve latency test cleanup	2024-04-05 16:31:00 -06:00
Wainer Moschetta	aae7048d4f	Merge pull request #9273 from ldoktor/kcli-coco-kbs tests: Support for kbs setup on kcli	2024-04-05 18:55:58 -03:00
Fabiano Fidêncio	f09bb98f51	Merge pull request #8840 from fidencio/topic/update-tdx-artefacts-to-the-new-host-os tdx: Update TDX artefacts to be used with the Ubuntu 23.10 / CentOS 9 stream OSVs.	2024-04-05 22:36:03 +02:00
Fabiano Fidêncio	cdb8531302	hypervisor: Simplify TDX protection detection Let's rely on the kvm module 'tdx' parameter to do so. This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX adoption (https://github.com/intel/tdx-linux) stacks. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	2ee03b5dc3	tdvf: Adapt the build command This is done in order to match the example from: https://github.com/intel/tdx-linux/wiki/Instruction-to-set-up-TDX-host-and-guest#build-tdvf-image Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Fabiano Fidêncio	b7cccfa019	qemu: tdx: Adapt command line This commit is a mess, but I'm not exactly sure what's the best way to make it less messy, as we're getting QEMU TDX to work while partially reverting `1e34220c41`. With that said, let me cover the content of this commit. Firstly, we're reverting all the changes related to "memory-backend-memfd-private", as that's what was used with the previous host stack, but it seems it didn't fly upstream. Secondly, in order to get QEMU to properly work with TDX, we need to enforce the 'private=on' knob and use the "memory-backend-ram", and we're doing so, and also making sure to test the `private=on` newly added knob. I'm sorry for the confusion, I understand this is not optimal, I just don't see an easy path to do changes without leaving the code broken during those changes. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 19:51:27 +02:00
Greg Kurz	424a5e243f	gha: Bump to `actions/[down\|up]load-artifact@v4` (all the rest) `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. This fixes all remaining sites. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:51 +02:00
Greg Kurz	dbc5dc7806	gha: Bump to `actions/[down\|up]load-artifact@v4` (k8s tests on garm) `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. As explained at [1] : > The contents of an Artifact are uploaded together into an immutable > archive. They cannot be altered by subsequent jobs. Both of these > factors help reduce the possibility of accidentally corrupting > Artifact files. This means that artifacts cannot have the same name. Adapt the `run-k8s-tests-on-garm` workflow accordingly by embedding all the other `${{ vmm.* }}` fields and `${{ inputs.tag }}` in the artifact names that would otherwise collide. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:51 +02:00
Greg Kurz	62a54ffa70	gha: Bump to `actions/[down\|up]load-artifact@v4` (kata static tarball) `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. As explained at [1] : > The contents of an Artifact are uploaded together into an immutable > archive. They cannot be altered by subsequent jobs. Both of these > factors help reduce the possibility of accidentally corrupting > Artifact files. This means that artifacts cannot have the same name. Adapt all `build-kata-static-tarball` workflows accordingly by embedding `${{ matrix.asset }}` in the artifact names that would otherwise collide. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:51 +02:00
Greg Kurz	7f2ce914a1	gha: Bump to `actions/checkout@v4` `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:50 +02:00
Greg Kurz	0a43d26c94	gha: Bump to `docker/login-action@v3` `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:50 +02:00
Greg Kurz	06c9c0d7db	gha: Bump to `docker/build-push-action@v5` `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:50 +02:00
Greg Kurz	8c21844aef	gha: Bump to `docker/setup-buildx-action@v3` `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:50 +02:00
Greg Kurz	03cbe6a011	gha: Bump to `docker/setup-qemu-action@v3` `Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`. Fixes #9245 Signed-off-by: Greg Kurz <groug@kaod.org>	2024-04-05 18:36:50 +02:00
Hyounggyu Choi	4493459937	GHA: Implement secondary GITHUB_WORKSPACE cleanup on 1st failure Occasionally, the removal of GITHUB_WORKSPACE fails for self-hosted runners because one of the subdirectories is not empty. This is likely due to another process occupying the directory at the time. Implementing a secondary cleanup resolves this issue. This commit focuses on the implementation for the secondary cleanup. Fixes: #9317 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-04-05 11:41:51 +02:00
Fabiano Fidêncio	6b4cc5ea6a	Revert "qemu: tdx: Workaround SMP issue with TDX 1.5" This reverts commit `d1b54ede29`. Conflicts: src/runtime/virtcontainers/qemu.go This commit was a hack that was needed in order to get QEMU + TDX to work atop of the stack our CI was running on. As we're moving to "the officially supported by distros" host OS, we need to get rid of this. Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Fabiano Fidêncio	582b5b6b19	govmm: tdx: Expose the private=on\|off knob The private=on\|off knob is required in order to properly launch a TDX guest VM. This is a brand new property that is part of the still in-flight patches adding TDX support on QEMU. Please, see: `3fdd8072da` Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:52 +02:00
Fabiano Fidêncio	fe5adae5d9	qemu-tdx: Update to v8.1.0 + TDX patches Let's update the QEMU to the one that's officially maintained by Intel till all the TDX patches make their way upstream. We've had to also update python to explicitly use python3 and add python3-venv as part of the dependencies. Fixes: #8810 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-04-05 10:23:51 +02:00
Alex Lyn	0e0a361f0e	Merge pull request #8782 from Apokleos/device-increate-count bugfix and refactor device increate count	2024-04-05 13:43:49 +08:00
Dan Mihai	6f9f8ae285	Merge pull request #9413 from microsoft/saulparedes/ensure_unique_rg_in_gha gha: ensure unique resource group name	2024-04-04 17:13:09 -07:00
GabyCT	80d926c357	Merge pull request #9411 from microsoft/danmihai1/k8s-job tests: k8s-job: wait for job successful create	2024-04-04 15:14:56 -06:00
Gabriela Cervantes	8e5d401be0	metrics: Improve latency test cleanup This PR improves the latency test cleanup in order to avoid random failures caused by pods being left behind. Fixes #9418 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-04 20:43:53 +00:00
Saul Paredes	f20caac1c0	gha: ensure unique resource group name There's an rg name duplication situation that got introduced by #9385 where 2 different test runs might have same rg name. Add back uniqueness by including the first letter of GENPOLICY_PULL_METHOD to cluster name. Fixes: #9412 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-04 13:13:32 -07:00
GabyCT	aae2679f09	Merge pull request #9409 from GabyCT/topic/ghrunset gha: Define GH_PR_NUMBER variable in gha run k8s common script	2024-04-04 09:46:48 -06:00
Eric Ernst	da01bccd36	katautils: check number of cores on the system instead of go runtime We used to utilize go runtime's "NumCPU()", which will give the number of cores available to the Go runtime, which may be a subset of physical cores if the shim is started from within a cpuset. From the function's description: "NumCPU returns the number of logical CPUs usable by the current process." As an example, if containerd is run from within a smaller CPUset, the maximum size of a pod will be dictated by this CPUset, instead of what will be available on the rest of the system. Since the shim will be moved into its own cgroup that may have a different CPUset, let's stick with checking physical cores. This also aligns with what we have documented for maxVCPU handling. In the event we fail to read /proc/cpuinfo, let's use the goruntime. Fixes: #9327 Signed-off-by: Eric Ernst <eric_ernst@apple.com>	2024-04-03 16:09:16 -07:00
Dan Mihai	3e72b3f360	tests: k8s-job: wait for job successful create Don't just verify SuccessfulCreate - wait for it if needed. Fixes: #9138 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 22:11:15 +00:00
Gabriela Cervantes	73f27e28d1	gha: Define GH_PR_NUMBER variable in gha run k8s common script This PR defines the GH_PR_NUMBER variable in gha run k8s common script to avoid failures like unbound variable when running the scripts locally, just like the GHA CI does. Fixes #9408 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-03 18:25:00 +00:00
GabyCT	c5c229b330	Merge pull request #9397 from GabyCT/topic/removeconmon versions: Remove conmon information from versions.yaml	2024-04-03 11:14:43 -06:00
GabyCT	12947b1ba6	Merge pull request #9344 from GabyCT/topic/kerneldoc docs: Remove stale kernel information	2024-04-03 11:13:54 -06:00
Dan Mihai	07c23a05f2	Merge pull request #9385 from microsoft/saulparedes/add_genpolicy_yaml_params gha: add GENPOLICY_PULL_METHOD	2024-04-03 09:20:16 -07:00
Lukáš Doktor	b8382cea88	ci.ocp: Increase the MCP update time Updating the machine config takes even longer than 1200s; use 60m to be sure everything is updated. Fixes: #9338 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-04-03 15:01:29 +02:00
Alex Lyn	935a1a3b40	runtime-rs: refactor decrease_attach_count with do_decrease_count Try to reduce duplicated code in decrease_attach_count with the new public function do_decrease_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:19 +08:00
Alex Lyn	4f0fab938d	runtime-rs: refactor increase_attach_count with do_increase_count Try to reduce duplicated code in increase_attach_count with the new public function do_increase_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:19 +08:00
Alex Lyn	fff64f1c3e	runtime-rs: introduce dedicated function do_decrease_count Introduce a dedicated public function do_decrease_count to reduce duplicated code in drivers' decrease_attach_count. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:19:08 +08:00
Alex Lyn	5750faaf31	runtime-rs: introduce dedicated function do_increase_count Since there are many implementations of reference counting in the drivers, all of which have the same implementation, we should try to reduce such duplicated code as much as possible. Therefore, a new function is introduced to solve the problem of duplicated code. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-03 17:09:17 +08:00
Dan Mihai	f800bd86f6	tests: k8s-sandbox-vcpus-allocation.bats policy Use the "allow all" policy for k8s-sandbox-vcpus-allocation.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:01:33 +00:00
Dan Mihai	4211d93b87	tests: k8s-nginx-connectivity.bats policy Use the "allow all" policy for k8s-nginx-connectivity.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:01:26 +00:00
Dan Mihai	5dcf64ef34	tests: k8s-volume.bats allow all policy Use the "allow all" policy for k8s-volume.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:01:18 +00:00
Dan Mihai	04085d8442	tests: k8s-sysctls.bats allow all policy Use the "allow all" policy for k8s-sysctls.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:01:10 +00:00
Dan Mihai	839993f245	tests: k8s-security-context.bats allow all policy Use the "allow all" policy for k8s-security-context.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:01:03 +00:00
Dan Mihai	02a050b47e	tests: k8s-seccomp.bats allow all policy Use the "allow all" policy for k8s-seccomp.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:56 +00:00
Dan Mihai	543e40b80c	tests: k8s-projected-volume.bats allow all policy Use the "allow all" policy for k8s-projected-volume.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:47 +00:00
Dan Mihai	3f94e2ee1b	tests: k8s-pod-quota.bats allow all policy Use the "allow all" policy for k8s-pod-quota.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:37 +00:00
Dan Mihai	ba23758a42	tests: k8s-optional-empty-secret.bats policy Use the "allow all" policy for k8s-optional-empty-secret.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:30 +00:00
Dan Mihai	e4ff6b1d91	tests: k8s-measured-rootfs.bats allow all policy Use the "allow all" policy for k8s-measured-rootfs.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:23 +00:00
Dan Mihai	2821326a7e	tests: k8s-liveness-probes.bats allow all policy Use the "allow all" policy for k8s-liveness-probes.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:15 +00:00
Dan Mihai	9af3e4cc4a	tests: k8s-inotify.bats allow all policy Use the "allow all" policy for k8s-inotify.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:08 +00:00
Dan Mihai	bd45e948cc	tests: k8s-guest-pull-image.bats policy Use the "allow all" policy for k8s-guest-pull-image.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 03:00:00 +00:00
Dan Mihai	be3797ef7c	tests: k8s-footloose.bats allow all policy Use the "allow all" policy for k8s-footloose.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 02:59:50 +00:00
Dan Mihai	18f5e55667	tests: k8s-empty-dirs.bats allow all policy Use the "allow all" policy for k8s-empty-dirs.bats, instead of relying on the Kata Guest image to use the same policy as its default. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 02:59:44 +00:00
Dan Mihai	ef22bd8a2b	tests: k8s: replace run_policy_specific_tests Check from: - k8s-exec-rejected.bats - k8s-policy-set-keys.bats if policy testing is enabled or not, to reduce the complexity of run_kubernetes_tests.sh. After these changes, there are no policy specific commands left in run_kubernetes_tests.sh. add_allow_all_policy_to_yaml() is moving out of run_kubernetes_tests.sh too, but it is not used yet. It will be used in future commits. Fixes: #9395 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-03 02:59:28 +00:00
Guoqiang Ding	cd0c31e185	qemu: show the thread name when enable the hypervisor.debug option Add debug-threads=on in the name argument if debug is enabled. Fixes: #9400 Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>	2024-04-03 10:36:52 +08:00
Saul Paredes	8a92e81f98	gha: add GENPOLICY_PULL_METHOD Add GENPOLICY_PULL_METHOD that will be used to test pulling container images in genpolicy using the oci-distribution crate and/or the containerd interface. GENPOLICY_PULL_METHOD will start being used in a future PR. Fixes: #9384 Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-04-02 19:03:28 -07:00
Gabriela Cervantes	f3957352f0	versions: Remove conmon information from versions.yaml This PR removes conmon information from versions.yaml as this is no longer being used in kata containers repository. Fixes #9396 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-04-02 16:25:45 +00:00
Dan Mihai	39805822fc	tests: k8s: reduce policy testing complexity Don't add the "allow all" policy to all the test YAML files anymore. After this change, the k8s tests assume that all the Kata CI Guest rootfs image files either: - Don't support Agent Policy at all, or - Include an "allow all" default policy. This reliance/assumption will be addressed in a future commit. Fixes: #9395 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-04-02 16:18:31 +00:00
Alex Lyn	7795f9c016	Merge pull request #9365 from GabyCT/topic/removerunc versions: Remove runc version information	2024-04-02 09:21:56 +08:00
Alex Lyn	fa8049af6c	Merge pull request #9383 from Apokleos/unified-cgrp-cmdline kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy	2024-04-02 09:08:04 +08:00
Alex Lyn	07bfdf4a22	Merge pull request #9275 from Apokleos/swap-hooks-bindmnt kata-agent: Change order of guest hook and bind mount processing	2024-04-02 07:40:10 +08:00
Alex Lyn	c88014834b	kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy To have systemd mount cgroups-v2 by default during system boot, we must add the systemd.unified_cgroup_hierarchy=1 parameter to the kernel cmdline, which will be passed by kernel_params in configuration.toml. To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1] to kernel_params. Fixes: #9336 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-01 18:45:12 +08:00
alex.lyn	548f252bc4	runtime-rs: bugfix incorrect use of refcount before vfio attach When there's a pod with multiple containers, there may be a case where there are more than 2 attach points. We should not return Err in that case when we are doing attach ops, but just return Ok. Fixes: #8738 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-04-01 11:28:57 +08:00
Alex Lyn	aa9cd232cd	Merge pull request #9358 from GabyCT/topic/nerdrandom gha: Update journal log names for nerdctl artifacts	2024-04-01 09:50:16 +08:00
Alex Lyn	dfa8832406	Merge pull request #9345 from c3d/bug/9342-agent-test-errors agent: Fix errors in `make check`	2024-04-01 09:48:44 +08:00
Dan Mihai	3a7dbcfc17	Merge pull request #9367 from microsoft/danmihai1/infinite-io-stream-copy-loop runtime: remove stream copy infinite loop	2024-03-29 09:37:44 -07:00
Dan Mihai	600f9266f3	runtime: remove stream copy infinite loop This reverts commit `1c5693be86`. Avoid apparent infinite loop when ReadStreamRequest is blocked by policy - for some of the pods. When running the k8s-limit-range.bats test with Policy enabled, the Shim + VMM never get terminated on my cluster. Not sure why the sandbox clean-up works better for other tests, but the k8s-limit-range test pod gets stuck in an infinite loop: stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... stdout io stream copy error happens: error = %wrpc error: code = PermissionDenied desc = \"ReadStreamRequest is blocked by policy ... policy check: ReadStreamRequest ... Fixes: #9380 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-03-28 22:43:28 +00:00
James O. D. Hunt	13966f4d1d	docs: kata-manager: Add help for permissions issue The 3.3.0 release installs the `kata-manager` script with overly restrictive permissions (see #9373), so add details to help users handle the situation. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2024-03-28 16:22:10 +00:00
James O. D. Hunt	5589e4e291	docs: kata-manager: Update with latest details Now that v3.3.0 has been released, simplify the `kata-manager` documentation. Fixes: #9227. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2024-03-28 16:22:10 +00:00
James O. D. Hunt	52fe60c94b	docs: kata-manager: Fix heading levels Add an extra heading indent so that there is only a single top-level heading. Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>	2024-03-28 16:21:31 +00:00
Dan Mihai	ebb26edf42	Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints genpolicy: reduce policy debug prints	2024-03-27 15:12:10 -07:00
Gabriela Cervantes	a32418bf32	versions: Remove runc version information This PR removes the runc version information as this is no longer being used in the kata containers scripts. Fixes #9364 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-03-27 20:32:38 +00:00
Steve Horsman	b3acbe0b7f	Merge pull request #8046 from fitzthum/clean-config runtime: remove unimplemented CoCo configurations	2024-03-27 19:39:48 +00:00
Tobin Feldman-Fitzthum	04d021bd12	packaging: remove SERVICEOFFLOAD option Since we're removing the unused service_offload parameter, don't set it in any of the packaging scripts. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum	9856fe5bea	runtime: remove ServiceOffload parameter Since we no longer use the service_offload configuration, remove the ServiceOffload field from the image struct. Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum	a18c7ca307	runtime: remove unimplemented CoCo configurations These experimental options were added 2 years ago in anticipation of features that would be added in CoCo. These do not match the features that were eventually added and will soon be ported to main. Fixes: #8047 Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>	2024-03-27 12:21:06 -05:00
Steve Horsman	53fa1fd82d	Merge pull request #9349 from fidencio/topic/ci-k8s-update-cpuid k8s: confidential: Update cpuid to its latest release	2024-03-27 16:57:36 +00:00
Chengyu Zhu	e66a5cb54d	Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout Support to set timeout to pull large image in guest	2024-03-28 00:34:08 +08:00
Christophe de Dinechin	82c4079fd0	agent: Remove useless loop This is the report from `make check`: ``` error: this loop never actually loops --> src/signal.rs:147:9 \| 147 \| / loop { 148 \| \| select! { 149 \| \| _ = handle => { 150 \| \| println!("INFO: task completed"); ... \| 156 \| \| } 157 \| \| } \| \|_________^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop = note: `#[deny(clippy::never_loop)]` on by default ``` There is only one option: you get something or a timeout. You never retry, so the report is correct. Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Christophe de Dinechin	df5c88cdf0	agent: Remove lint error about `.flatten` running forever The lint report is the following: ``` error: `flatten()` will run forever if the iterator repeatedly produces an `Err` --> src/rpc.rs:1754:10 \| 1754 \| .flatten() \| ^^^^^^^^^ help: replace with: `map_while(Result::ok)` \| note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error --> src/rpc.rs:1752:5 \| 1752 \| / reader 1753 \| \| .lines() \| \|________________^ = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]` ``` This commit simply applies the suggestion. Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Christophe de Dinechin	bfb55312be	agent: Fix `.enumerate` errors during `make check` Running `make check` in the `src/agent` directory gives: ``` error: you seem to use `.enumerate()` and immediately discard the index --> rustjail/src/mount.rs:572:27 \| 572 \| for (_index, line) in reader.lines().enumerate() { \| ^^^^^^^^^^^^^^^^^^^^^^^^^^ \| = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index = note: `-D clippy::unused-enumerate-index` implied by `-D warnings` = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]` help: remove the `.enumerate()` call \| 572 \| for line in reader.lines() { \| ~~~~ ~~~~~~~~~~~~~~ Checking tokio-native-tls v0.3.1 Checking hyper-tls v0.5.0 Checking reqwest v0.11.18 error: could not compile `rustjail` (lib) due to 1 previous error warning: build failed, waiting for other jobs to finish... make: *** [../../utils.mk:177: standard_rust_check] Error 101 ``` Fixes: #9342 Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>	2024-03-27 17:03:44 +01:00
Greg Kurz	e1068da1a0	Merge pull request #9326 from gkurz/draft-release Only tag and publish the release when it is fully ready	2024-03-27 15:59:59 +01:00
ChengyuZhu6	c50d3ebacc	tests:k8s: Add a test to pull large images in the guest Add a test to pull large images in the guest. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 21:58:44 +08:00
ChengyuZhu6	8551ee9533	how-to: add createcontainer timeout to sandbox config documentation add createcontainer timeout annotation to sandbox config documentation. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 21:58:44 +08:00
ChengyuZhu6	c2dc13ebaa	runtime: support to configure CreateContainer Timeout in configurations support to configure CreateContainerRequestTimeout in the configurations. e.g.: [runtime] ... create_container_timeout = 300 Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 21:58:41 +08:00
Chengyu Zhu	87fc17d4d2	Merge pull request #9341 from ChengyuZhu6/guest-pull-doc docs: Add documents for kata guest image management	2024-03-27 21:20:22 +08:00
ChengyuZhu6	95b2f7f129	how-to: Add a document for kata guest image management usage Add a document for kata guest image management usage. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 20:09:37 +08:00
Greg Kurz	693c9487d4	docs: Adjust release documentation Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto deprecated by the re-design of the release process described in #9064. Remove this file and all its references in the repo. The `## Versioning` section has some useful information though. It is moved to `docs/Release-Process.md`. The documentation of the `PATCH` field is adapted according to new workflow. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-27 12:41:48 +01:00
Steve Horsman	45aba769c0	Merge pull request #9346 from cmaf/ci-remove-repo-docs Remove additional links to tests directory	2024-03-27 11:13:32 +00:00
Steve Horsman	a1a615a7c8	Merge pull request #9356 from stevenhorsman/agent-opa-ppc64le-s390x workflows: Build agent-opa for more archs	2024-03-27 08:53:28 +00:00
ChengyuZhu6	2224f6d63f	runtime: support to configure CreateContainer timeout in annotation Support to configure CreateContainerRequestTimeout in the annotations. e.g.: annotations: "io.katacontainers.config.runtime.create_container_timeout": "300" Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout. In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
ChengyuZhu6	39bd462431	runtime: support to set timeout for CreateContainerRequest In the situation to pull images in the guest #8484, it’s important to account for pulling large images. Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout. However, this duration may prove insufficient for pulling larger images, such as those containing AI models. Consequently, we must devise a method to extend the timeout period for large image pull. Fixes: #8141 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-27 15:44:29 +08:00
Gabriela Cervantes	a997e282be	gha: Update journal log names for nerdctl artifacts This PR updates the journal log name for nerdctl artifacts to make sure that we have different names in case we add a parallel GHA job. Fixes #9357 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-03-26 20:03:54 +00:00
GabyCT	c163d9f114	Merge pull request #9329 from GabyCT/topic/seun scripts: Fix unbound variables in k8s setup script	2024-03-26 11:19:33 -06:00
stevenhorsman	9aa675abb9	workflows: Build agent-opa for more archs Since https://github.com/kata-containers/kata-containers/pull/7769, we support building the OPA binary into the ppc64le and s390x arch versions of the rootfs, so build the policy enabled agent to match for those architectures too. Fixes: #9355 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-03-26 17:02:14 +00:00
Lukáš Doktor	a671b3fc6e	tests: Use full svc address to check kbs service the service might not listen on the default port, use the full service address to ensure we are talking to the right resource. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-03-26 16:59:02 +01:00
Lukáš Doktor	6b0eaca4d4	tests: Add support for nodeport ingress for the kbs setup this can be used on kcli or other systems where cluster nodes are accessible from all places where the tests are running. Fixes: #9272 Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2024-03-26 16:59:00 +01:00
Greg Kurz	5009fabde4	release: Keep it draft until all artifacts have been published The automated release workflow starts with the creation of the release in GitHub. This is followed by the build and upload of the various artifacts, which can be very long (like hours). During this period, the release appears to be fully available in https://github.com/kata-containers/kata-containers/ even though it lacks all the artifacts. This might be confusing for users or automation consuming the release. Create the release as draft and clear the draft flag when all jobs are done. This ensures that the release will only be tagged and made public when it is fully usable. If some job fails because of network timeout or any other transient error, the correct action is to restart the failed jobs until they eventually all succeed. This is by far the quicker path to complete the release process. If the workflow is canceled for some reason, the draft release is left behind. A new run of the workflow will create a brand new draft release with the same name (not an issue with GitHub). The draft release from the previous run should be manually deleted. This step won't be automated as it looks safer to leave the decision to a human. [1] https://github.com/kata-containers/kata-containers/releases Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-26 14:48:05 +01:00
Pavel Mores	4c72b02e53	runtime-rs: remove the now-unused code of NetDevice The remaining code in network.rs was mostly moved to utils.rs which seems better home for these utility functions anyway (and a closely related function open_named_tuntap() has already lived there). ToString implementation for Address was removed after some consideration. Address should probably ideally implement Display (as per RFC 565) which would also supply a ToString implementation, however it implements Debug instead, probably to enable automatic implementation of Debug for anything that Address is a member of, if for no other reason. Rather than having two identical functions this commit simply switches to using the Debug implementation for printing Address on qemu command line. Fixes #9352 Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:52:40 +01:00
Pavel Mores	c94e55d45a	runtime-rs: make QemuCmdLine own vsock file descriptor Make file descriptors to be passed to qemu owned by QemuCmdLine. See commit 52958f17cd for more explanation. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	0cf0e923fc	runtime-rs: refactor QemuCmdLine::add_network_device() signature add_network_device() doesn't need to be passed NetworkInfo since it already has access to the full HypervisorConfig. Also, one of the goals of QemuCmdLine interface's design is to avoid coupling between QemuCmdLine and the hypervisor crate's device module, if at all possible. That's why add_network_device() shouldn't take device module's NetworkConfig but just parts that are useful in add_network_device()'s implementation. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	a4f033f864	runtime-rs: add should_disable_modern() utility function is_running_in_vm() is enough to figure out whether to disable_modern but it's clumsy and verbose to use. should_disable_modern() streamlines the usage by encapsulating the verbosity. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	12e40ede97	runtime-rs: reimplement add_network_device() using Netdev & DeviceVirtioNet This commit replaces the existing NetDevice-based implementation with one using Netdev and DeviceVirtioNet. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	0a57e2bb32	runtime-rs: refactor NetDevice in qemu driver In keeping with architecture of QemuCmdLine implementation we split the functionality into two objects: Netdev to represent and generate the -netdev part and DeviceVirtioNet for the -device virtio-net-<transport> part. This change is a pure refactor, existing functionality does not change. However, we do remove some stub generalizations and govmm-isms, notably: - we remove the NetDev enum since the only network interface types that kata seems to use with qemu are tuntap and macvtap, both of which are implemented by the same -netdev tap - enum DeviceDriver is also left out since it doesn't seem reasonable to try to represent VFIO NICs (which are completely different from virtio-net ones) with the same struct as virtio-net - we also remove VirtioTransport because there's no use for it so far, but with the expectation that it will be added soon. We also make struct Netdev the owner of any vhost-net and queue file descriptors so that their lifetime is tied ultimately to the lifetime of QemuCmdLine automatically, instead of returning the fds to the caller and forcing it to achieve the equivalent functionality but manually. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	7f23734172	runtime-rs: reduce generate_netdev_fds() dependencies generate_netdev_fds() takes NetworkConfig from which it however only needs a host-side network device name. This commit makes it take the device name directly, making the function useful to callers who don't have the whole NetworkConfig but do have the requisite device name. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:41 +01:00
Pavel Mores	d4ac45d840	runtime-rs: refactor clear_fd_flags() The idea of this function is to make sure O_CLOEXEC is not set on file descriptors that should be inherited by a child (=hypervisor) process. The approach so far is however rather heavy-handed - clearing all flags is unjustifiably aggressive for a low-level function with no knowledge of context whatsoever. This commit refactors the function so that it only does what's expected and renames it accordingly. It also clarifies some of its call sites. Signed-off-by: Pavel Mores <pmores@redhat.com>	2024-03-26 12:50:14 +01:00
Fabiano Fidêncio	cfe75f9422	k8s: confidential: Update cpuid to its latest release Since v2.2.6 it can detect TDX guests on Azure, so let's bump it even if Azure peer-pods are not currently used as part of our CI. Fixes: #9348 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-03-26 10:21:12 +01:00
Chengyu Zhu	d16971e37e	Merge pull request #9325 from ChengyuZhu6/image_service agent:image: Refactor code to improve memory efficiency of image service	2024-03-26 10:38:37 +08:00
Dan Mihai	6c72c29535	genpolicy: reduce policy debug prints Kata CI has full debug output enabled for the cbl-mariner k8s tests, and the test AKS node is relatively slow. So debug prints from policy are expensive during CI. Fixes: #9296 Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2024-03-26 02:21:26 +00:00
Alex Lyn	cec943fc26	Merge pull request #9244 from Apokleos/dgb-gpu runtime-rs/dragonball: add support building kernel with upcall and GPU hotplug	2024-03-26 08:53:54 +08:00
Chelsea Mafrica	4e3deb5a3b	tools: Fix path for installing yq in packaging script The lib.sh script uses the right directory but the wrong path for the script that installs yq; fix it. Fixes #9165 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-03-25 15:09:52 -07:00
Chelsea Mafrica	cfb977625e	docs: Remove links to tests repo Remove links to tests repo and update with corresponding location in the current repo. Fixes #9165 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-03-25 15:09:52 -07:00
Chelsea Mafrica	d69514766e	src: Remove references to files in tests repo Change scripts and source that uses files in the tests repo to use the corresponding file in the current repo. Fixes #9165 Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>	2024-03-25 15:09:52 -07:00
Gabriela Cervantes	ddef2be4f1	docs: Remove stale kernel information This PR removes stale kernel information from the README document. Fixes #9343 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-03-25 15:57:00 +00:00
Greg Kurz	e9e94d2dbd	release: Give a pretty name to all steps For a prettier rendering in the web UI. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-25 15:50:35 +01:00
Greg Kurz	dce6ea57b2	release: Simplify the `create-new-release` action of `release.sh` Now that the version is an invariant for the entire workflow, it isn't required to obtain it with an environment variable. Just rely on the content of the `VERSION` file like other actions. Fixes #9064 - part VI Signed-off-by: Greg Kurz <groug@kaod.org>	2024-03-25 15:50:35 +01:00
Alex Lyn	5c54315a87	dragonball: fix CI failure due to poor UT adaptation. Fixes: #9144 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-03-25 20:25:27 +08:00
Alex Lyn	079d894496	kernel: bump version in kata config version Fixes: #9140 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-03-25 20:25:27 +08:00
Alex Lyn	070c3fa657	docs: add doc about building kernel with upcall and GPU hotplug We need docs on how to build a guest kernel that supports both Upcall and Nvidia GPU passthrough (hotplug) at the same time. This patch adds them to help users build a guest kernel that supports both Upcall and Nvidia GPU hotplug/unplug. Fixes: #9140 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-03-25 20:25:17 +08:00
ChengyuZhu6	06b9935402	docs: Add a document for kata guest image management design Add a document for kata guest image management design. Related feature: #8484 Fixes: #9225 -- part I Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com> Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2024-03-25 18:17:23 +08:00
Chengyu Zhu	4029d154ba	Merge pull request #9313 from ChengyuZhu6/rtest agent: Refactor unit tests to leverage rstest for parameterization	2024-03-25 10:31:45 +08:00
Alex Lyn	bc309b9865	kernel: add CONFIG_CRYPTO_ECDSA into whitelist CONFIG_CRYPTO_ECDSA is not supported in older kernels such as 5.10.x, which may break the build when building such a kernel with NVIDIA GPU support. This patch adds CONFIG_CRYPTO_ECDSA to whitelist.conf to avoid breaking the guest kernel build with NVIDIA GPU. Fixes: #9140 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-03-25 08:05:31 +08:00
ChengyuZhu6	f47408fdf4	agent:image: Refactor code to improve memory efficiency of image service Currently, `.lock().await.clone()` results in `Option<ImageService>` being duplicated in memory with each call to `singleton()`. Consequently, if kata-agent receives numerous image pulling requests simultaneously, it will lead to the allocation of multiple `Option<ImageService>` instances in memory, thereby consuming additional memory resources. In image.rs, we introduce two public functions: `merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate the operations on `IMAGE_SERVICE`, ensuring that its internal details remain hidden from external modules such as `rpc.rs`. Fixes: #9225 -- part II Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com> Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-25 07:46:50 +08:00
ChengyuZhu6	7a49ec1c80	agent:util: Refactor the unit tests to leverage rstest Refactor the unit tests in util.rs to leverage rstest for parameterization. Fixes: #9314 Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-23 10:49:53 +08:00
ChengyuZhu6	2df2b4d30d	agent:namespace: Refactor unit tests to leverage rstest Refactor the unit tests in `namespace.rs` to leverage rstest for parameterization. Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>	2024-03-23 10:49:48 +08:00
Hyounggyu Choi	d915a79e2d	Merge pull request #9280 from BbolroC/enable-qemu-on-s390x runtime-rs: Enable qemu on s390x	2024-03-22 23:58:42 +01:00
Fabiano Fidêncio	25cd28a32b	Merge pull request #9337 from fidencio/topic/bump-nydus-snapshotter versions: Update nydus-snapshotter to v0.13.11	2024-03-22 22:18:18 +01:00
Hyounggyu Choi	81aaa34bd6	runtime-rs: Add DeviceVirtioSerial and DeviceVirtconsole It is observed that virtiofsd exits immediately on s390x if there is no attached console devices. This commit resolves the issue by migrating `appendConsole()` from runtime and being triggered in `start_vm()`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-22 19:27:13 +01:00
Hyounggyu Choi	2cfe745efb	runtime-rs: Enable memory backend option for Machine for s390x For s390x, it requires an additional option `memory-backend` for `-machine`. Otherwise, virtiofsd exits with HandleRequest(InvalidParam). This commit is to add a field `memory_backend` to `struct Machine` and turn it on for s390x. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-22 19:27:13 +01:00
Hyounggyu Choi	9bcfaad625	runtime-rs: Add ccw block device for rootfs Like nvdimm for x86_64, a block device for s390x should be treated differently with `virtio-blk-ccw`. This is to generate a QEMU command line parameter for a block device by using `-blockdev` and `-device` if the `vm_rootfs_driver` is set to `virtio-blk-ccw`. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-22 19:27:13 +01:00
Fabiano Fidêncio	d0949759ec	versions: Update nydus-snapshotter to v0.13.11 This version brings in a fix for cleaning up k3s/rke2 environments, which directly impacts the TDX machine that's part of our CI. Fixes: #9318 Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>	2024-03-22 14:56:18 +01:00
Gabriela Cervantes	d54cdd3f0c	scripts: Fix unbound variables in k8s setup script This PR fixes the unbound variables error when trying to run the setup script locally in order to avoid errors. Fixes #9328 Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>	2024-03-21 19:10:16 +00:00
Hyounggyu Choi	9b2c08935b	runtime-rs: Pass different device argument based on bus type Currently, `*-pci` is used as an argument for the device config. It is not true for a case where a different type of bus is used. s390x uses `ccw`. This commit is to make it flexible to generate the device argument based on the bus type. A structure `DeviceVhostUserFsPci` and `VhostVsockPci` is renamed to `DeviceVhostUserFs` and `VhostVsock` because the structure name is not bound to a certain bus type any more. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-21 09:25:37 +01:00
Hyounggyu Choi	7b3d1adb8c	libs: Bump sysinfo to v0.30.5 It has been observed that the runtime stops running around `sysinfo::total_memory()` while adjusting a config on s390x. This is to update the crate to the latest version which happened to resolve the issue. (No explicit release note for this) Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-20 09:27:13 +01:00
Hyounggyu Choi	1dac6b1357	runtime-rs: Configure s390x specific flags for Makefile s390x supports a different machine type `s390-ccw-virtio` and it is not required to configure cpu features by default for the platform. A hypervisor `dragonball` is not supported on s390x so that `DBCMD` is not necessary. `vm-rootfs_driver` should be set to `virtio-blk-ccw`. This commit is to set the architecture-specific flags for Makefile. Fixes: #9158 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-03-14 13:05:35 +01:00
Alex Lyn	2aa3519520	kata-agent: Change order of guest hook and bind mount processing The guest_hook_path item in configuration.toml allows OCI hook scripts to be executed within Kata's guest environment. Traditionally, these guest hook programs are pre-built and included in Kata's guest rootfs image at a fixed location. While setting guest_hook_path = "/usr/share/oci/hooks" in configuration.toml works, it lacks flexibility. Not all guest hooks reside in the path /usr/share/oci/hooks, and users might have custom locations. To address this, a more flexible and configurable approach is proposed that allows users to specify their desired path. This could include using a sandbox bind mount path for hooks specific to that particular container. However, the current implementation of guest hooks and bind mounts in kata-agent has a reversed order of execution compared to the desired behavior. To achieve the intended functionality, we simply need to swap the order of their implementation. Fixes: #9274 Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2024-03-13 20:30:32 +08:00
Zvonko Kaiser	63dff9a9f2	kata-agent: CreateContainer Hook Fixes: #9267 The doc states we have support for all lifecycle hooks. There are still some missing. This is the first issue regarding the CreateContainer hook which is run before pivot_root but after prestart and createruntime Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2024-03-13 09:24:25 +00:00
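
Commits c2dc13ebaa and 2224f6d63f above note that the timeout used for guest pull is the lesser of the kubelet `runtime-request-timeout` and `create_container_timeout`. A minimal sketch of that rule follows. The function name and the use of `Duration` are assumptions made for illustration; this is not the runtime's actual code.

```rust
use std::time::Duration;

/// Hypothetical helper (not part of the Kata runtime): returns the timeout
/// that effectively bounds a guest image pull, which the commit messages
/// describe as the smaller of the two configured values.
fn effective_create_container_timeout(
    runtime_request_timeout: Duration,
    create_container_timeout: Duration,
) -> Duration {
    // Equivalent to the ternary in the commit message:
    // runtime-request-timeout < create_container_timeout
    //     ? runtime-request-timeout : create_container_timeout
    runtime_request_timeout.min(create_container_timeout)
}
```

Under this reading, raising `create_container_timeout` to 300 seconds has no effect unless the kubelet's `runtime-request-timeout` is raised to at least the same value.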

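Commit df5c88cdf0 above applies the Clippy suggestion to replace `.flatten()` with `map_while(Result::ok)` on a `std::io::Lines` iterator. The sketch below illustrates that pattern in isolation. The function name is hypothetical and the code is not taken from `src/rpc.rs`.

```rust
use std::io::BufRead;

/// Hypothetical helper: collects lines from `reader`, stopping at the first
/// read error instead of skipping it. With `.flatten()`, an iterator that
/// keeps yielding `Err` would never terminate; `map_while(Result::ok)` ends
/// the iteration as soon as an `Err` is produced.
fn lines_until_error<R: BufRead>(reader: R) -> Vec<String> {
    reader.lines().map_while(Result::ok).collect()
}
```

For example, given the bytes `ok\n\xff\nlater\n`, the second line is not valid UTF-8, so `lines()` yields an `Err` for it and only `"ok"` is collected.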
5314 changed files with 734611 additions and 301730 deletions

									
										7

.editorconfig
									
										Normal file
									
				@@ -0,0 +1,7 @@

				root = true

				[*]

				charset = utf-8

				end_of_line = lf

				insert_final_newline = true

				trim_trailing_whitespace = true

									
										30

.editorconfig-checker.json
									
										Normal file
									
				@@ -0,0 +1,30 @@

				{

				  "Verbose": false,

				  "Debug": false,

				  "IgnoreDefaults": false,

				  "SpacesAfterTabs": false,

				  "NoColor": false,

				  "Exclude": [

				    "src/runtime/vendor",

				    "src/tools/log-parser/vendor",

				    "tests/metrics/cmd/checkmetrics/vendor",

				    "tests/vendor",

				    "src/runtime/virtcontainers/pkg/cloud-hypervisor/client",

				    "\\.img$",

				    "\\.dtb$",

				    "\\.drawio$",

				    "\\.svg$",

				    "\\.patch$"

				  ],

				  "AllowedContentTypes": [],

				  "PassedFiles": [],

				  "Disable": {

				    "EndOfLine": false,

				    "Indentation": false,

				    "IndentSize": false,

				    "InsertFinalNewline": false,

				    "TrimTrailingWhitespace": false,

				    "MaxLineLength": false,

				    "Charset": false

				  }

				}

									
										30

.github/actionlint.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,30 @@

				# Copyright (c) 2024 Red Hat

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				# Configuration file with rules for the actionlint tool.

				#

				self-hosted-runner:

				  # Labels of self-hosted runner that linter should ignore

				  labels:

				    - amd64-nvidia-a100

				    - amd64-nvidia-h100-snp

				    - arm64-k8s

				    - garm-ubuntu-2004

				    - garm-ubuntu-2004-smaller

				    - garm-ubuntu-2204

				    - garm-ubuntu-2304

				    - garm-ubuntu-2304-smaller

				    - garm-ubuntu-2204-smaller

				    - ppc64le

				    - ppc64le-k8s

				    - ppc64le-small

				    - ubuntu-24.04-ppc64le

				    - ubuntu-24.04-s390x

				    - metrics

				    - riscv-builder

				    - sev-snp

				    - s390x

				    - s390x-large

				    - tdx

				    - ubuntu-24.04-arm

									
										2

.github/cargo-deny-composite-action/cargo-deny-generator.sh
									
										vendored
									
				@@ -8,7 +8,7 @@

				script_dir=$(dirname "$(readlink -f "$0")")

				parent_dir=$(realpath "${script_dir}/../..")

				cidir="${parent_dir}/ci"

				source "${cidir}/lib.sh"

				source "${cidir}/../tests/common.bash"

				cargo_deny_file="${script_dir}/action.yaml"

									
										4

.github/cargo-deny-composite-action/cargo-deny-skeleton.yaml.in
									
										vendored
									
				@@ -17,11 +17,11 @@ runs:

				      uses: actions-rs/toolchain@v1

				      with:

				        profile: minimal

				-        toolchain: nightly 

				+        toolchain: nightly

				        override: true

				    - name: Cache

				-      uses: Swatinem/rust-cache@v2

				+      uses: Swatinem/rust-cache@f0deed1e0edfc6a9be95417288c0e1099b1eeec3 # v2.7.7

				    - name: Install Cargo deny

				      shell: bash

									
										92

.github/dependabot.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,92 @@

				---

				version: 2

				updates:

				  - package-ecosystem: "cargo"

				    directories:

				      - "/src/agent"

				      - "/src/dragonball"

				      - "/src/libs"

				      - "/src/mem-agent"

				      - "/src/mem-agent/example"

				      - "/src/runtime-rs"

				      - "/src/tools/agent-ctl"

				      - "/src/tools/genpolicy"

				      - "/src/tools/kata-ctl"

				      - "/src/tools/trace-forwarder"

				    schedule:

				      interval: "daily"

				    ignore:

				    # rust-vmm repos might cause incompatibilities on patch versions, so

				    # lets handle them manually for now.

				      - dependency-name: "event-manager"

				      - dependency-name: "kvm-bindings"

				      - dependency-name: "kvm-ioctls"

				      - dependency-name: "linux-loader"

				      - dependency-name: "seccompiler"

				      - dependency-name: "vfio-bindings"

				      - dependency-name: "vfio-ioctls"

				      - dependency-name: "virtio-bindings"

				      - dependency-name: "virtio-queue"

				      - dependency-name: "vm-fdt"

				      - dependency-name: "vm-memory"

				      - dependency-name: "vm-superio"

				      - dependency-name: "vmm-sys-util"

				    # As we often have up to 8/9 components that need the same version bumps,

				    # create groups for common dependencies so they can all go in a single PR.

				    # We can extend this as we see more frequent groups.

				    groups:

				      bit-vec:

				        patterns:

				          - bit-vec

				      bumpalo:

				        patterns:

				          - bumpalo

				      clap:

				        patterns:

				          - clap

				      crossbeam:

				        patterns:

				          - crossbeam

				      h2:

				        patterns:

				          - h2

				      idna:

				        patterns:

				          - idna

				      openssl:

				        patterns:

				          - openssl

				      protobuf:

				        patterns:

				          - protobuf

				      rsa:

				        patterns:

				          - rsa

				      rustix:

				        patterns:

				          - rustix

				      slab:

				        patterns:

				          - slab

				      time:

				        patterns:

				          - time

				      tokio:

				        patterns:

				          - tokio

				      tracing:

				        patterns:

				          - tracing

				  - package-ecosystem: "gomod"

				    directories:

				      - "src/runtime"

				      - "tools/testing/kata-webhook"

				      - "src/tools/csi-kata-directvolume"

				    schedule:

				      interval: "daily"

				  - package-ecosystem: "github-actions"

				    directory: "/"

				    schedule:

				      interval: "monthly"

									
										7

.github/workflows/PR-wip-checks.yaml
									
												
				@@ -9,18 +9,19 @@ on:

				      - labeled

				      - unlabeled

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  pr_wip_check:

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    name: WIP Check

				    steps:

				    - name: WIP Check

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				      uses: tim-actions/wip-check@1c2a1ca6c110026b3e2297bb2ef39e1747b5a755

				      uses: tim-actions/wip-check@1c2a1ca6c110026b3e2297bb2ef39e1747b5a755 # master (2021-06-10)

				      with:

				        labels: '["do-not-merge", "wip", "rfc"]'

				        keywords: '["WIP", "wip", "RFC", "rfc", "dnm", "DNM", "do-not-merge"]'
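
The `concurrency.group` expression used by this workflow falls back from the pull request number to `github.ref` when the event carries no pull request. A minimal shell sketch of that fallback, assuming a hypothetical helper `concurrency_group` (not part of the repository) and using `${2:-$3}` to stand in for the `||` operator:

```shell
#!/bin/sh
# Hypothetical helper: builds the key "<workflow>-<PR number or ref>".
# $1 = workflow name, $2 = pull request number (may be empty), $3 = git ref
concurrency_group() {
  # ${2:-$3} selects the ref when the PR number is empty or unset.
  printf '%s-%s\n' "$1" "${2:-$3}"
}

concurrency_group "Example workflow" "123" "refs/pull/123/merge"
concurrency_group "Example workflow" "" "refs/heads/main"
```

Runs triggered for the same pull request share one key, so with `cancel-in-progress: true` a newer run cancels the older one.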

									
										30

.github/workflows/actionlint.yaml
									
										Normal file
												
				@@ -0,0 +1,30 @@

				name: Lint GHA workflows

				on:

				  workflow_dispatch:

				  pull_request:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  run-actionlint:

				    name: run-actionlint

				    env:

				      GH_TOKEN: ${{ github.token }}

				    runs-on: ubuntu-24.04

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install actionlint gh extension

				        run: gh extension install https://github.com/cschleiden/gh-actionlint

				      - name: Run actionlint

				        run: gh actionlint

									
										59

.github/workflows/add-issues-to-project.yaml
									
											
				@@ -1,59 +0,0 @@

				# Copyright (c) 2020 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				name: Add newly created issues to the backlog project

				on:

				  issues:

				    types:

				      - opened

				      - reopened

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  add-new-issues-to-backlog:

				    runs-on: ubuntu-latest

				    steps:

				      - name: Install hub

				        run: |

				          HUB_ARCH="amd64"

				          HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\

				            jq -r .tag_name | sed 's/^v//')

				          curl -sL \

				            "https://github.com/github/hub/releases/download/v${HUB_VER}/hub-linux-${HUB_ARCH}-${HUB_VER}.tgz" |\

				          tar xz --strip-components=2 --wildcards '*/bin/hub' && \

				          sudo install hub /usr/local/bin

				      - name: Install hub extension script

				        run: |

				          # Clone into a temporary directory to avoid overwriting

				          # any existing github directory.

				          pushd $(mktemp -d) &>/dev/null

				          git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts

				          sudo install hub-util.sh /usr/local/bin

				          popd &>/dev/null

				      - name: Checkout code to allow hub to communicate with the project

				        uses: actions/checkout@v4

				      - name: Add issue to issue backlog

				        env:

				          GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}

				        run: |

				          issue=${{ github.event.issue.number }}

				          project_name="Issue backlog"

				          project_type="org"

				          project_column="To do"

				          hub-util.sh \

				            add-issue \

				            "$issue" \

				            "$project_name" \

				            "$project_type" \

				            "$project_column"

									
										53

.github/workflows/add-pr-sizing-label.yaml
									
											
				@@ -1,53 +0,0 @@

				# Copyright (c) 2022 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				name: Add PR sizing label

				on:

				  pull_request_target:

				    types:

				      - opened

				      - reopened

				      - synchronize

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  add-pr-size-label:

				    runs-on: ubuntu-latest

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@v4

				        with:

				          ref: ${{ github.event.pull_request.head.sha }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}

				      - name: Install PR sizing label script

				        run: |

				          # Clone into a temporary directory to avoid overwriting

				          # any existing github directory.

				          pushd $(mktemp -d) &>/dev/null

				          git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts

				          sudo install pr-add-size-label.sh /usr/local/bin

				          popd &>/dev/null

				      - name: Add PR sizing label

				        env:

				          GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_PR_SIZE_TOKEN }}

				        run: |

				          pr=${{ github.event.number }}

				          # Removing man-db, workflow kept failing, fixes: #4480

				          sudo apt -y remove --purge man-db

				          sudo apt -y install diffstat patchutils

				          pr-add-size-label.sh -p "$pr"

									
										215

.github/workflows/basic-ci-amd64.yaml
									
												
				@@ -13,26 +13,33 @@ on:

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-cri-containerd:

				  run-containerd-sandboxapi:

				    name: run-containerd-sandboxapi

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance.

				      fail-fast: false

				      matrix:

				        containerd_version: ['lts', 'active']

				        vmm: ['clh', 'dragonball', 'qemu', 'stratovirt', 'cloud-hypervisor']

				    runs-on: garm-ubuntu-2204-smaller

				        containerd_version: ['active']

				        vmm: ['dragonball', 'cloud-hypervisor', 'qemu-runtime-rs']

				    # TODO: enable me when https://github.com/containerd/containerd/issues/11640 is fixed

				    if: false

				    runs-on: ubuntu-22.04

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      SANDBOXER: "shim"

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -42,9 +49,11 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				@@ -52,27 +61,29 @@ jobs:

				      - name: Install kata

				        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts

				      - name: Run cri-containerd tests

				      - name: Run containerd-sandboxapi tests

				        timeout-minutes: 10

				        run: bash tests/integration/cri-containerd/gha-run.sh run

				  run-containerd-stability:

				    name: run-containerd-stability

				    strategy:

				      fail-fast: false

				      matrix:

				        containerd_version: ['lts', 'active']

				        vmm: ['clh', 'cloud-hypervisor', 'dragonball', 'qemu', 'stratovirt']

				    runs-on: garm-ubuntu-2204-smaller

				        vmm: ['clh', 'cloud-hypervisor', 'dragonball', 'qemu', 'qemu-runtime-rs']

				    runs-on: ubuntu-22.04

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      SANDBOXER: "podsandbox"

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				@@ -81,9 +92,11 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/stability/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				@@ -96,6 +109,7 @@ jobs:

				        run: bash tests/stability/gha-run.sh run

				  run-nydus:

				    name: run-nydus

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail

				@@ -103,17 +117,18 @@ jobs:

				      fail-fast: false

				      matrix:

				        containerd_version: ['lts', 'active']

				        vmm: ['clh', 'qemu', 'dragonball', 'stratovirt']

				    runs-on: garm-ubuntu-2204-smaller

				        vmm: ['clh', 'qemu', 'dragonball', 'qemu-runtime-rs']

				    runs-on: ubuntu-22.04

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -123,67 +138,51 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/integration/nydus/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata

				        run: bash tests/integration/nydus/gha-run.sh install-kata kata-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/nydus/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Run nydus tests

				        timeout-minutes: 10

				        run: bash tests/integration/nydus/gha-run.sh run

				  run-runk:

				    runs-on: garm-ubuntu-2204-smaller

				    env:

				      CONTAINERD_VERSION: lts

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/runk/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts

				      - name: Run runk tests

				        timeout-minutes: 10

				        run: bash tests/integration/runk/gha-run.sh run

				  run-tracing:

				    name: run-tracing

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - clh # cloud-hypervisor

				          - qemu

				    # TODO: enable me when https://github.com/kata-containers/kata-containers/issues/9763 is fixed

				    # TODO: Transition to free runner (see #9940).

				    if: false

				    runs-on: garm-ubuntu-2204-smaller

				    env:

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -193,9 +192,11 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/functional/tracing/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				@@ -208,19 +209,27 @@ jobs:

				        run: bash tests/functional/tracing/gha-run.sh run

				  run-vfio:

				    name: run-vfio

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm: ['clh', 'qemu']

				        vmm:

				          - clh

				          - qemu

				    # TODO: enable with clh when https://github.com/kata-containers/kata-containers/issues/9764 is fixed

				    # TODO: enable with qemu when https://github.com/kata-containers/kata-containers/issues/9851 is fixed

				    # TODO: Transition to free runner (see #9940).

				    if: false

				    runs-on: garm-ubuntu-2304

				    env:

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -230,9 +239,11 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/functional/vfio/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				@@ -241,48 +252,8 @@ jobs:

				        timeout-minutes: 15

				        run: bash tests/functional/vfio/gha-run.sh run

				  run-docker-tests:

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail them

				      # all due to a single flaky instance.

				      fail-fast: false

				      matrix:

				        vmm:

				          - clh

				          - qemu

				    runs-on: garm-ubuntu-2304-smaller

				    env:

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/docker/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts

				      - name: Run docker smoke test

				        timeout-minutes: 5

				        run: bash tests/integration/docker/gha-run.sh run

				  run-nerdctl-tests:

				    name: run-nerdctl-tests

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail them

				@@ -294,14 +265,16 @@ jobs:

				          - dragonball

				          - qemu

				          - cloud-hypervisor

				    runs-on: garm-ubuntu-2304-smaller

				          - qemu-runtime-rs

				    runs-on: ubuntu-22.04

				    env:

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -310,10 +283,13 @@ jobs:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        env:

				          GITHUB_API_TOKEN: ${{ github.token }}

				          GH_TOKEN: ${{ github.token }}

				        run: bash tests/integration/nerdctl/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				@@ -326,11 +302,54 @@ jobs:

				        run: bash tests/integration/nerdctl/gha-run.sh run

				      - name: Collect artifacts ${{ matrix.vmm }}

				        if: always()

				        run: bash tests/integration/nerdctl/gha-run.sh collect-artifacts

				        continue-on-error: true

				      - name: Archive artifacts ${{ matrix.vmm }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: nerdctl-tests-garm-${{ matrix.vmm }}

				          path: /tmp/artifacts

				          retention-days: 1

				  run-kata-agent-apis:

				    name: run-kata-agent-apis

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/functional/kata-agent-apis/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata & kata-tools

				        run: |

				          bash tests/functional/kata-agent-apis/gha-run.sh install-kata kata-artifacts

				          bash tests/functional/kata-agent-apis/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Run kata agent api tests with agent-ctl

				        run: bash tests/functional/kata-agent-apis/gha-run.sh run
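
The `strategy.matrix` blocks in this workflow expand into one job per combination of values. A rough shell sketch of that expansion for the updated `run-containerd-stability` matrix, assuming a plain cross product with no `include` or `exclude` entries (the function name `expand_matrix` is illustrative only):

```shell
#!/bin/sh
# Illustrative only: enumerate the cross product of the two matrix axes.
expand_matrix() {
  for version in lts active; do
    for vmm in clh cloud-hypervisor dragonball qemu qemu-runtime-rs; do
      # Each line stands for one job and the env values it would receive.
      echo "CONTAINERD_VERSION=${version} KATA_HYPERVISOR=${vmm}"
    done
  done
}

expand_matrix
```

With `fail-fast: false`, a failure in one of these combinations does not cancel the remaining ones.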

									
										108

.github/workflows/basic-ci-s390x.yaml
									
										Normal file
												
				@@ -0,0 +1,108 @@

				name: CI | Basic s390x tests

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-containerd-sandboxapi:

				    name: run-containerd-sandboxapi

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance.

				      fail-fast: false

				      matrix:

				        containerd_version: ['active']

				        vmm: ['qemu-runtime-rs']

				    # TODO: enable me when https://github.com/containerd/containerd/issues/11640 is fixed

				    if: false

				    runs-on: s390x-large

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      SANDBOXER: "shim"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts

				      - name: Run containerd-sandboxapi tests

				        timeout-minutes: 10

				        run: bash tests/integration/cri-containerd/gha-run.sh run

				  run-containerd-stability:

				    name: run-containerd-stability

				    strategy:

				      fail-fast: false

				      matrix:

				        containerd_version: ['lts', 'active']

				        vmm: ['qemu']

				    runs-on: s390x-large

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      SANDBOXER: "podsandbox"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/stability/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/stability/gha-run.sh install-kata kata-artifacts

				      - name: Run containerd-stability tests

				        timeout-minutes: 15

				        run: bash tests/stability/gha-run.sh run

									
										134

.github/workflows/build-checks-preview-riscv64.yaml
									
										Normal file
												
				@@ -0,0 +1,134 @@

				# This yaml is designed to be used until all components listed in

				# `build-checks.yaml` are supported

				on:

				  workflow_dispatch:

				    inputs:

				      instance:

				        default: "riscv-builder"

				        description: "Default instance when manually triggering"

				  workflow_call:

				    inputs:

				      instance:

				        required: true

				        type: string

				permissions: {}

				name: Build checks preview riscv64

				jobs:

				  check:

				    name: check

				    runs-on: ${{ inputs.instance }}

				    strategy:

				      fail-fast: false

				      matrix:

				        command:

				          - "make vendor"

				          - "make check"

				          - "make test"

				          - "sudo -E PATH=\"$PATH\" make test"

				        component:

				          - name: agent

				            path: src/agent

				            needs:

				              - rust

				              - libdevmapper

				              - libseccomp

				              - protobuf-compiler

				              - clang

				          - name: agent-ctl

				            path: src/tools/agent-ctl

				            needs:

				              - rust

				              - musl-tools

				              - protobuf-compiler

				              - clang

				          - name: trace-forwarder

				            path: src/tools/trace-forwarder

				            needs:

				              - rust

				              - musl-tools

				          - name: genpolicy

				            path: src/tools/genpolicy

				            needs:

				              - rust

				              - musl-tools

				              - protobuf-compiler

				          - name: runtime

				            path: src/runtime

				            needs:

				              - golang

				              - XDG_RUNTIME_DIR

				          - name: runtime-rs

				            path: src/runtime-rs

				            needs:

				              - rust

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE" "$HOME"

				          sudo rm -rf "$GITHUB_WORKSPACE"/* || { sleep 10 && sudo rm -rf "$GITHUB_WORKSPACE"/*; }

				          sudo rm -f /tmp/kata_hybrid*  # Sometimes leftovers remain from test_setup_hvsock_failed()

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install yq

				        run: |

				          ./ci/install_yq.sh

				        env:

				          INSTALL_IN_GOPATH: false

				      - name: Install golang

				        if: contains(matrix.component.needs, 'golang')

				        run: |

				          ./tests/install_go.sh -f -p

				          echo "/usr/local/go/bin" >> "$GITHUB_PATH"

				      - name: Setup rust

				        if: contains(matrix.component.needs, 'rust')

				        run: |

				          ./tests/install_rust.sh

				          echo "${HOME}/.cargo/bin" >> "$GITHUB_PATH"

				          if [ "$(uname -m)" == "x86_64" ] || [ "$(uname -m)" == "aarch64" ]; then

				            sudo apt-get update && sudo apt-get -y install musl-tools

				          fi

				      - name: Install devicemapper

				        if: contains(matrix.component.needs, 'libdevmapper') && matrix.command == 'make check'

				        run: sudo apt-get update && sudo apt-get -y install libdevmapper-dev

				      - name: Install libseccomp

				        if: contains(matrix.component.needs, 'libseccomp') && matrix.command != 'make vendor' && matrix.command != 'make check'

				        run: |

				          libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)

				          gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)

				          ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"

				          echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"

				          echo "LIBSECCOMP_LINK_TYPE=static" >> "$GITHUB_ENV"

				          echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> "$GITHUB_ENV"

				      - name: Install protobuf-compiler

				        if: contains(matrix.component.needs, 'protobuf-compiler') && matrix.command != 'make vendor'

				        run: sudo apt-get update && sudo apt-get -y install protobuf-compiler

				      - name: Install clang

				        if: contains(matrix.component.needs, 'clang') && matrix.command == 'make check'

				        run: sudo apt-get update && sudo apt-get -y install clang

				      - name: Setup XDG_RUNTIME_DIR

				        if: contains(matrix.component.needs, 'XDG_RUNTIME_DIR') && matrix.command != 'make check'

				        run: |

				          XDG_RUNTIME_DIR=$(mktemp -d "/tmp/kata-tests-$USER.XXX" | tee >(xargs chmod 0700))

				          echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}" >> "$GITHUB_ENV"

				      - name: Skip tests that depend on virtualization capable runners when needed

				        if: inputs.instance == 'riscv-builder'

				        run: |

				          echo "GITHUB_RUNNER_CI_NON_VIRT=true" >> "$GITHUB_ENV"

				      - name: Running `${{ matrix.command }}` for ${{ matrix.component.name }}

				        run: |

				          cd "${COMPONENT_PATH}"

				          ${COMMAND}

				        env:

				          COMMAND: ${{ matrix.command }}

				          COMPONENT_PATH: ${{ matrix.component.path }}

				          RUST_BACKTRACE: "1"

				          RUST_LIB_BACKTRACE: "0"

				          SKIP_GO_VERSION_CHECK: "1"

									

.github/workflows/build-checks.yaml
									
												
				@@ -5,64 +5,89 @@ on:

				        required: true

				        type: string

				permissions: {}

				name: Build checks

				jobs:

				  check:

				    runs-on: ${{ inputs.instance }}

				    name: check

				    runs-on: >-

				      ${{

				        ( contains(inputs.instance, 's390x') && matrix.component.name == 'runtime' ) && 's390x' ||

				        ( contains(inputs.instance, 'ppc64le') && (matrix.component.name == 'runtime' || matrix.component.name == 'agent') ) && 'ppc64le' ||

				        inputs.instance

				      }}

				    strategy:

				      fail-fast: false

				      matrix:

				        component:

				          - agent

				          - dragonball

				          - runtime

				          - runtime-rs

				          - agent-ctl

				          - kata-ctl

				          - runk

				          - trace-forwarder

				          - genpolicy

				        command:

				          - "make vendor"

				          - "make check"

				          - "make test"

				          - "sudo -E PATH=\"$PATH\" make test"

				        include:

				          - component: agent

				            component-path: src/agent

				          - component: dragonball

				            component-path: src/dragonball

				          - component: runtime

				            component-path: src/runtime

				          - component: runtime-rs

				            component-path: src/runtime-rs

				          - component: agent-ctl

				            component-path: src/tools/agent-ctl

				          - component: kata-ctl

				            component-path: src/tools/kata-ctl

				          - component: runk

				            component-path: src/tools/runk

				          - component: trace-forwarder

				            component-path: src/tools/trace-forwarder

				          - install-libseccomp: no

				          - component: agent

				            install-libseccomp: yes

				          - component: runk

				            install-libseccomp: yes

				          - component: genpolicy

				            component-path: src/tools/genpolicy

				        component:

				          - name: agent

				            path: src/agent

				            needs:

				              - rust

				              - libdevmapper

				              - libseccomp

				              - protobuf-compiler

				              - clang

				          - name: dragonball

				            path: src/dragonball

				            needs:

				              - rust

				          - name: runtime

				            path: src/runtime

				            needs:

				              - golang

				              - XDG_RUNTIME_DIR

				          - name: runtime-rs

				            path: src/runtime-rs

				            needs:

				              - rust

				          - name: libs

				            path: src/libs

				            needs:

				              - rust

				              - protobuf-compiler

				          - name: agent-ctl

				            path: src/tools/agent-ctl

				            needs:

				              - rust

				              - protobuf-compiler

				              - clang

				          - name: kata-ctl

				            path: src/tools/kata-ctl

				            needs:

				              - rust

				              - protobuf-compiler

				          - name: trace-forwarder

				            path: src/tools/trace-forwarder

				            needs:

				              - rust

				          - name: genpolicy

				            path: src/tools/genpolicy

				            needs:

				              - rust

				              - protobuf-compiler

				        instance:

				          - ${{ inputs.instance }}

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE $HOME

				          sudo rm -rf $GITHUB_WORKSPACE/*

				          sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE" "$HOME"

				          sudo rm -rf "$GITHUB_WORKSPACE"/* || { sleep 10 && sudo rm -rf "$GITHUB_WORKSPACE"/*; }

				          sudo rm -f /tmp/kata_hybrid*  # Sometime we got leftover from test_setup_hvsock_failed()

				        if: ${{ inputs.instance != 'ubuntu-20.04' }}

				      - name: Checkout the code

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install yq

				        run: |

				@@ -70,44 +95,52 @@ jobs:

				        env:

				          INSTALL_IN_GOPATH: false

				      - name: Install golang

				        if: ${{ matrix.component == 'runtime' }}

				        if: contains(matrix.component.needs, 'golang')

				        run: |

				          ./tests/install_go.sh -f -p

				          echo "/usr/local/go/bin" >> $GITHUB_PATH

				      - name: Install rust

				        if: ${{ matrix.component != 'runtime' }}

				          echo "/usr/local/go/bin" >> "$GITHUB_PATH"

				      - name: Setup rust

				        if: contains(matrix.component.needs, 'rust')

				        run: |

				          ./tests/install_rust.sh

				          echo "${HOME}/.cargo/bin" >> $GITHUB_PATH

				      - name: Install musl-tools

				        if: ${{ matrix.component != 'runtime' }}

				        run: sudo apt-get -y install musl-tools

				          echo "${HOME}/.cargo/bin" >> "$GITHUB_PATH"

				          if [ "$(uname -m)" == "x86_64" ] || [ "$(uname -m)" == "aarch64" ]; then

				            sudo apt-get update && sudo apt-get -y install musl-tools

				          fi

				      - name: Install devicemapper

				        if: ${{ matrix.command == 'make check' && matrix.component == 'agent' }}

				        run: sudo apt-get -y install libdevmapper-dev

				        if: contains(matrix.component.needs, 'libdevmapper') && matrix.command == 'make check'

				        run: sudo apt-get update && sudo apt-get -y install libdevmapper-dev

				      - name: Install libseccomp

				        if: ${{ matrix.command != 'make vendor'  &&  matrix.command != 'make check' &&  matrix.install-libseccomp == 'yes' }}

				        if: contains(matrix.component.needs, 'libseccomp') && matrix.command != 'make vendor' && matrix.command != 'make check'

				        run: |

				          libseccomp_install_dir=$(mktemp -d -t libseccomp.XXXXXXXXXX)

				          gperf_install_dir=$(mktemp -d -t gperf.XXXXXXXXXX)

				          ./ci/install_libseccomp.sh "${libseccomp_install_dir}" "${gperf_install_dir}"

				          echo "Set environment variables for the libseccomp crate to link the libseccomp library statically"

				          echo "LIBSECCOMP_LINK_TYPE=static" >> $GITHUB_ENV

				          echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> $GITHUB_ENV

				          echo "LIBSECCOMP_LINK_TYPE=static" >> "$GITHUB_ENV"

				          echo "LIBSECCOMP_LIB_PATH=${libseccomp_install_dir}/lib" >> "$GITHUB_ENV"

				      - name: Install protobuf-compiler

				        if: ${{ matrix.command != 'make vendor' && (matrix.component == 'agent' || matrix.component == 'runk') }}

				        run: sudo apt-get -y install protobuf-compiler

				        if: contains(matrix.component.needs, 'protobuf-compiler') && matrix.command != 'make vendor'

				        run: sudo apt-get update && sudo apt-get -y install protobuf-compiler

				      - name: Install clang

				        if: ${{ matrix.command == 'make check' && matrix.component == 'agent' }}

				        run: sudo apt-get -y install clang

				      - name: Setup XDG_RUNTIME_DIR for the `runtime` tests

				        if: ${{ matrix.command != 'make vendor' && matrix.command != 'make check' && matrix.component == 'runtime' }}

				        if: contains(matrix.component.needs, 'clang') && matrix.command == 'make check'

				        run: sudo apt-get update && sudo apt-get -y install clang

				      - name: Setup XDG_RUNTIME_DIR

				        if: contains(matrix.component.needs, 'XDG_RUNTIME_DIR') && matrix.command != 'make check'

				        run: |

				          XDG_RUNTIME_DIR=$(mktemp -d /tmp/kata-tests-$USER.XXX | tee >(xargs chmod 0700))

				          echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}" >> $GITHUB_ENV

				      - name: Running `${{ matrix.command }}` for ${{ matrix.component }}

				          XDG_RUNTIME_DIR=$(mktemp -d "/tmp/kata-tests-$USER.XXX" | tee >(xargs chmod 0700))

				          echo "XDG_RUNTIME_DIR=${XDG_RUNTIME_DIR}" >> "$GITHUB_ENV"

				      - name: Skip tests that depend on virtualization capable runners when needed

				        if: ${{ endsWith(inputs.instance, '-arm') }}

				        run: |

				          cd ${{ matrix.component-path }}

				          ${{ matrix.command }}

				          echo "GITHUB_RUNNER_CI_NON_VIRT=true" >> "$GITHUB_ENV"

				      - name: Running `${{ matrix.command }}` for ${{ matrix.component.name }}

				        run: |

				          cd "${COMPONENT_PATH}"

				          eval "${COMMAND}"

				        env:

				          COMMAND: ${{ matrix.command }}

				          COMPONENT_PATH: ${{ matrix.component.path }}

				          RUST_BACKTRACE: "1"

				          RUST_LIB_BACKTRACE: "0"

				          SKIP_GO_VERSION_CHECK: "1"

									

.github/workflows/build-kata-static-tarball-amd64.yaml
									
												
				@@ -20,72 +20,65 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: false

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-asset:

				    runs-on: ubuntu-latest

				    name: build-asset

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    strategy:

				      matrix:

				        asset:

				          - agent

				          - agent-opa

				          - agent-ctl

				          - busybox

				          - cloud-hypervisor

				          - cloud-hypervisor-glibc

				          - coco-guest-components

				          - firecracker

				          - genpolicy

				          - kata-ctl

				          - kata-manager

				          - kernel

				          - kernel-confidential

				          - kernel-dragonball-experimental

				          - kernel-nvidia-gpu

				          - kernel-nvidia-gpu-confidential

				          - nydus

				          - ovmf

				          - ovmf-sev

				          - ovmf-tdx

				          - pause-image

				          - qemu

				          - qemu-snp-experimental

				          - qemu-tdx-experimental

				          - stratovirt

				          - rootfs-image

				          - rootfs-image-confidential

				          - rootfs-initrd

				          - rootfs-initrd-confidential

				          - rootfs-initrd-mariner

				          - runk

				          - shim-v2

				          - tdvf

				          - trace-forwarder

				          - virtiofsd

				        stage:

				          - ${{ inputs.stage }}

				        exclude:

				          - asset: agent

				            stage: release

				          - asset: agent-opa

				            stage: release

				          - asset: cloud-hypervisor-glibc

				            stage: release

				          - asset: pause-image

				            stage: release

				          - asset: coco-guest-components

				            stage: release

				    env:

				      PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -94,11 +87,12 @@ jobs:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Build ${{ matrix.asset }}

				        id: build

				        run: |

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          sudo cp -r "${build_dir}" "kata-build"

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				@@ -107,40 +101,360 @@ jobs:

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}

				      - name: Parse OCI image name and digest

				        id: parse-oci-segments

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				        run: |

				          oci_image="$(<"build/${KATA_ASSET}-oci-image")"

				          echo "oci-name=${oci_image%@*}" >> "$GITHUB_OUTPUT"

				          echo "oci-digest=${oci_image#*@}" >> "$GITHUB_OUTPUT"

				      - uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          version: "1.2.0"

				      # for pushing attestations to the registry

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - uses: actions/attest-build-provenance@ef244123eb79f2f7a7e75d99086184180e6d0018 # v1.4.4

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          subject-name: ${{ steps.parse-oci-segments.outputs.oci-name }}

				          subject-digest: ${{ steps.parse-oci-segments.outputs.oci-digest }}

				          push-to-registry: true

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-amd64${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.xz

				          name: kata-artifacts-amd64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				      - name: store-extratarballs-artifact ${{ matrix.asset }}

				        if: ${{ matrix.asset == 'kernel' || startsWith(matrix.asset, 'kernel-nvidia-gpu') }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-amd64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  build-asset-rootfs:

				    name: build-asset-rootfs

				    runs-on: ubuntu-22.04

				    needs: build-asset

				    permissions:

				      contents: read

				      packages: write

				    strategy:

				      matrix:

				        asset:

				          - rootfs-image

				          - rootfs-image-confidential

				          - rootfs-image-mariner

				          - rootfs-image-nvidia-gpu

				          - rootfs-image-nvidia-gpu-confidential

				          - rootfs-initrd

				          - rootfs-initrd-confidential

				          - rootfs-initrd-nvidia-gpu

				          - rootfs-initrd-nvidia-gpu-confidential

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-amd64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build ${{ matrix.asset }}

				        id: build

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-amd64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts:

				    name: remove-rootfs-binary-artifacts

				    runs-on: ubuntu-22.04

				    needs: build-asset-rootfs

				    strategy:

				      matrix:

				        asset:

				          - busybox

				          - coco-guest-components

				          - kernel-modules

				          - kernel-nvidia-gpu-modules

				          - pause-image

				    steps:

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        with:

				          name: kata-artifacts-amd64-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts-for-release:

				    name: remove-rootfs-binary-artifacts-for-release

				    runs-on: ubuntu-22.04

				    needs: build-asset-rootfs

				    strategy:

				      matrix:

				        asset:

				          - agent

				    steps:

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        if: ${{ inputs.stage == 'release' }}

				        with:

				          name: kata-artifacts-amd64-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				  build-asset-shim-v2:

				    name: build-asset-shim-v2

				    runs-on: ubuntu-22.04

				    needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts, remove-rootfs-binary-artifacts-for-release]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-amd64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build shim-v2

				        id: build

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: shim-v2

				          TAR_OUTPUT: shim-v2.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          MEASURED_ROOTFS: yes

				      - name: store-artifact shim-v2

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-amd64-shim-v2${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-shim-v2.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  create-kata-tarball:

				    runs-on: ubuntu-latest

				    needs: build-asset

				    name: create-kata-tarball

				    runs-on: ubuntu-22.04

				    needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          fetch-tags: true

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-artifacts-amd64${{ inputs.tarball-suffix }}

				          pattern: kata-artifacts-amd64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: merge-artifacts

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml

				        env:

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifacts

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-static.tar.xz

				          path: kata-static.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  build-tools-asset:

				    name: build-tools-asset

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				      packages: write

				    strategy:

				      matrix:

				        asset:

				          - agent-ctl

				          - csi-kata-directvolume

				          - genpolicy

				          - kata-ctl

				          - kata-manager

				          - trace-forwarder

				        stage:

				          - ${{ inputs.stage }}

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Build ${{ matrix.asset }}

				        id: build

				        run: |

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-tools-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-tools-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-tools-artifacts-amd64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-tools-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  create-kata-tools-tarball:

				    name: create-kata-tools-tarball

				    runs-on: ubuntu-22.04

				    needs: [build-tools-asset]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          fetch-tags: true

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-tools-artifacts-amd64-*${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				          merge-multiple: true

				      - name: merge-artifacts

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-tools-artifacts versions.yaml kata-tools-static.tar.zst

				        env:

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifacts

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-static.tar.zst

				          retention-days: 15

				          if-no-files-found: error

									

.github/workflows/build-kata-static-tarball-arm64.yaml
									
												
				@@ -20,44 +20,54 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: false

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-asset:

				    runs-on: arm64-builder

				    name: build-asset

				    runs-on: ubuntu-24.04-arm

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    strategy:

				      matrix:

				        asset:

				          - agent

				          - busybox

				          - cloud-hypervisor

				          - firecracker

				          - kernel

				          - kernel-dragonball-experimental

				          - kernel-nvidia-gpu

				          - kernel-cca-confidential

				          - nydus

				          - ovmf

				          - qemu

				          - stratovirt

				          - rootfs-image

				          - rootfs-initrd

				          - shim-v2

				          - virtiofsd

				        stage:

				          - ${{ inputs.stage }}

				    env:

				      PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -70,7 +80,7 @@ jobs:

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          sudo cp -r "${build_dir}" "kata-build"

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				@@ -79,44 +89,248 @@ jobs:

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}

				      - name: Parse OCI image name and digest

				        id: parse-oci-segments

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				        run: |

				          oci_image="$(<"build/${KATA_ASSET}-oci-image")"

				          echo "oci-name=${oci_image%@*}" >> "$GITHUB_OUTPUT"

				          echo "oci-digest=${oci_image#*@}" >> "$GITHUB_OUTPUT"

				      - uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          version: "1.2.0"

				      # for pushing attestations to the registry

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - uses: actions/attest-build-provenance@ef244123eb79f2f7a7e75d99086184180e6d0018 # v1.4.4

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          subject-name: ${{ steps.parse-oci-segments.outputs.oci-name }}

				          subject-digest: ${{ steps.parse-oci-segments.outputs.oci-digest }}

				          push-to-registry: true

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-arm64${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.xz

				          name: kata-artifacts-arm64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				      - name: store-extratarballs-artifact ${{ matrix.asset }}

				        if: ${{ startsWith(matrix.asset, 'kernel-nvidia-gpu') }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-arm64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  build-asset-rootfs:

				    name: build-asset-rootfs

				    runs-on: ubuntu-24.04-arm

				    needs: build-asset

				    permissions:

				      contents: read

				      packages: write

				    strategy:

				      matrix:

				        asset:

				          - rootfs-image

				          - rootfs-image-nvidia-gpu

				          - rootfs-initrd

				          - rootfs-initrd-nvidia-gpu

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-arm64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build ${{ matrix.asset }}

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          KBUILD_SIGN_PIN: ${{ contains(matrix.asset, 'nvidia') && secrets.KBUILD_SIGN_PIN || '' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-arm64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts:

				    name: remove-rootfs-binary-artifacts

				    runs-on: ubuntu-24.04-arm

				    needs: build-asset-rootfs

				    strategy:

				      matrix:

				        asset:

				          - busybox

				          - kernel-nvidia-gpu-modules

				    steps:

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        with:

				          name: kata-artifacts-arm64-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts-for-release:

				    name: remove-rootfs-binary-artifacts-for-release

				    runs-on: ubuntu-24.04-arm

				    needs: build-asset-rootfs

				    strategy:

				      matrix:

				        asset:

				          - agent

				    steps:

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        if: ${{ inputs.stage == 'release' }}

				        with:

				          name: kata-artifacts-arm64-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				  build-asset-shim-v2:

				    name: build-asset-shim-v2

				    runs-on: ubuntu-24.04-arm

				    needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts, remove-rootfs-binary-artifacts-for-release]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-arm64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build shim-v2

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: shim-v2

				          TAR_OUTPUT: shim-v2.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact shim-v2

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-arm64-shim-v2${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-shim-v2.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  create-kata-tarball:

				    runs-on: arm64-builder

				    needs: build-asset

				    name: create-kata-tarball

				    runs-on: ubuntu-24.04-arm

				    needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          fetch-tags: true

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-artifacts-arm64${{ inputs.tarball-suffix }}

				          pattern: kata-artifacts-arm64-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: merge-artifacts

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml

				        env:

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifacts

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-static-tarball-arm64${{ inputs.tarball-suffix }}

				          path: kata-static.tar.xz

				          path: kata-static.tar.zst

				          retention-days: 15

				          if-no-files-found: error

									
.github/workflows/build-kata-static-tarball-ppc64le.yaml (206 lines changed, vendored)
				@@ -20,44 +20,42 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  build-asset:

				    runs-on: ppc64le

				    name: build-asset

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-24.04-ppc64le

				    strategy:

				      matrix:

				        asset:

				          - agent

				          - kernel

				          - qemu

				          - rootfs-initrd

				          - shim-v2

				          - virtiofsd

				        stage:

				          - ${{ inputs.stage }}

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - name: Prepare the self-hosted runner

				        run: |

				            ${HOME}/scripts/prepare_runner.sh

				            sudo rm -rf $GITHUB_WORKSPACE/*

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -70,8 +68,7 @@ jobs:

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          sudo cp -r "${build_dir}" "kata-build"

				          sudo chown -R $(id -u):$(id -g) "kata-build"

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				@@ -80,44 +77,195 @@ jobs:

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-ppc64le${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.xz

				          name: kata-artifacts-ppc64le-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 1

				          if-no-files-found: error

				  build-asset-rootfs:

				    name: build-asset-rootfs

				    runs-on: ubuntu-24.04-ppc64le

				    needs: build-asset

				    permissions:

				      contents: read

				      packages: write

				    strategy:

				      matrix:

				        asset:

				          - rootfs-initrd

				        stage:

				          - ${{ inputs.stage }}

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-ppc64le-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build ${{ matrix.asset }}

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-ppc64le-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 1

				          if-no-files-found: error

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts:

				    name: remove-rootfs-binary-artifacts

				    runs-on: ubuntu-22.04

				    needs: build-asset-rootfs

				    strategy:

				      matrix:

				        asset:

				          - agent

				    steps:

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        if: ${{ inputs.stage == 'release' }}

				        with:

				          name: kata-artifacts-ppc64le-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				  build-asset-shim-v2:

				    name: build-asset-shim-v2

				    runs-on: ubuntu-24.04-ppc64le

				    needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-ppc64le-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build shim-v2

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: shim-v2

				          TAR_OUTPUT: shim-v2.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact shim-v2

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-ppc64le-shim-v2${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-shim-v2.tar.zst

				          retention-days: 1

				          if-no-files-found: error

				  create-kata-tarball:

				    runs-on: ppc64le

				    needs: build-asset

				    name: create-kata-tarball

				    runs-on: ubuntu-24.04-ppc64le

				    needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				          sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE"

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          fetch-tags: true

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-artifacts-ppc64le${{ inputs.tarball-suffix }}

				          pattern: kata-artifacts-ppc64le-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: merge-artifacts

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml

				        env:

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifacts

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-static-tarball-ppc64le${{ inputs.tarball-suffix }}

				          path: kata-static.tar.xz

				          path: kata-static.tar.zst

				          retention-days: 1

				          if-no-files-found: error

									
.github/workflows/build-kata-static-tarball-riscv64.yaml (new file, 75 lines changed, vendored)
				@@ -0,0 +1,75 @@

				name: CI | Build kata-static tarball for riscv64

				on:

				  workflow_call:

				    inputs:

				      stage:

				        required: false

				        type: string

				        default: test

				      tarball-suffix:

				        required: false

				        type: string

				      push-to-registry:

				        required: false

				        type: string

				        default: no

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  build-asset:

				    name: build-asset

				    runs-on: riscv-builder

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    strategy:

				      matrix:

				        asset:

				          - kernel

				          - virtiofsd

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Build ${{ matrix.asset }}

				        run: |

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-riscv64-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 3

				          if-no-files-found: error

									
.github/workflows/build-kata-static-tarball-s390x.yaml (295 lines changed, vendored)
				@@ -20,10 +20,24 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      CI_HKD_PATH:

				        required: true

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  build-asset:

				    runs-on: s390x

				    name: build-asset

				    runs-on: ubuntu-24.04-s390x

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    strategy:

				      matrix:

				        asset:

				@@ -32,33 +46,23 @@ jobs:

				          - kernel

				          - pause-image

				          - qemu

				          - rootfs-image

				          - rootfs-initrd

				          - shim-v2

				          - virtiofsd

				        stage:

				          - ${{ inputs.stage }}

				        exclude:

				          - asset: pause-image

				            stage: release

				          - asset: coco-guest-components

				            stage: release

				    env:

				      PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -67,12 +71,12 @@ jobs:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Build ${{ matrix.asset }}

				        id: build

				        run: |

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          sudo cp -r "${build_dir}" "kata-build"

				          sudo chown -R $(id -u):$(id -g) "kata-build"

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				@@ -81,29 +85,141 @@ jobs:

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: Parse OCI image name and digest

				        id: parse-oci-segments

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        env:

				          ASSET: ${{ matrix.asset }}

				        run: |

				          oci_image="$(<"build/${ASSET}-oci-image")"

				          echo "oci-name=${oci_image%@*}" >> "$GITHUB_OUTPUT"

				          echo "oci-digest=${oci_image#*@}" >> "$GITHUB_OUTPUT"

				      # for pushing attestations to the registry

				      - uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - uses: actions/attest-build-provenance@ef244123eb79f2f7a7e75d99086184180e6d0018 # v1.4.4

				        if: ${{ env.PERFORM_ATTESTATION == 'yes' }}

				        with:

				          subject-name: ${{ steps.parse-oci-segments.outputs.oci-name }}

				          subject-digest: ${{ steps.parse-oci-segments.outputs.oci-digest }}

				          push-to-registry: true

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-s390x${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.xz

				          name: kata-artifacts-s390x-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				      - name: store-extratarballs-artifact ${{ matrix.asset }}

				        if: ${{ matrix.asset == 'kernel' }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-s390x-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  build-asset-rootfs:

				    name: build-asset-rootfs

				    runs-on: s390x

				    needs: build-asset

				    permissions:

				      contents: read

				      packages: write

				    strategy:

				      matrix:

				        asset:

				          - rootfs-image

				          - rootfs-image-confidential

				          - rootfs-initrd

				          - rootfs-initrd-confidential

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-s390x-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build ${{ matrix.asset }}

				        id: build

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: ${{ matrix.asset }}

				          TAR_OUTPUT: ${{ matrix.asset }}.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifact ${{ matrix.asset }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-s390x-${{ matrix.asset }}${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-${{ matrix.asset }}.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  build-asset-boot-image-se:

				    name: build-asset-boot-image-se

				    runs-on: s390x

				    needs: build-asset

				    needs: [build-asset, build-asset-rootfs]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - uses: actions/checkout@v3

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-artifacts-s390x${{ inputs.tarball-suffix }}

				          pattern: kata-artifacts-s390x-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Place a host key document

				        run: |

				@@ -114,54 +230,139 @@ jobs:

				      - name: Build boot-image-se

				        run: |

				          base_dir=tools/packaging/kata-deploy/local-build/

				          cp -r kata-artifacts ${base_dir}/build

				          # Skip building dependant artifacts of boot-image-se-tarball

				          # because we already have them from the previous build

				          sed -i 's/\(^boot-image-se-tarball:\).*/\1/g' ${base_dir}/Makefile

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "boot-image-se"

				          make boot-image-se-tarball

				          build_dir=$(readlink -f build)

				          sudo cp -r "${build_dir}" "kata-build"

				          sudo chown -R $(id -u):$(id -g) "kata-build"

				          sudo chown -R "$(id -u)":"$(id -g)" "kata-build"

				        env:

				          HKD_PATH: "host-key-document"

				      - name: store-artifact boot-image-se

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-s390x${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-boot-image-se.tar.xz

				          path: kata-build/kata-static-boot-image-se.tar.zst

				          retention-days: 1

				          if-no-files-found: error

				  create-kata-tarball:

				    runs-on: s390x

				    needs: [build-asset, build-asset-boot-image-se]

				  # We don't need the binaries installed in the rootfs as part of the release tarball, so can delete them now we've built the rootfs

				  remove-rootfs-binary-artifacts:

				    name: remove-rootfs-binary-artifacts

				    runs-on: ubuntu-22.04

				    needs: [build-asset-rootfs, build-asset-boot-image-se]

				    strategy:

				      matrix:

				        asset:

				          - agent

				          - coco-guest-components

				          - pause-image

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

				        if: ${{ inputs.stage == 'release' }}

				        with:

				          name: kata-artifacts-s390x-${{ matrix.asset}}${{ inputs.tarball-suffix }}

				      - uses: actions/checkout@v4

				  build-asset-shim-v2:

				    name: build-asset-shim-v2

				    runs-on: ubuntu-24.04-s390x

				    needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.push-to-registry == 'yes' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0 # This is needed in order to keep the commit ids history

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          pattern: kata-artifacts-s390x-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: Build shim-v2

				        id: build

				        run: |

				          ./tests/gha-adjust-to-use-prebuilt-components.sh kata-artifacts "${KATA_ASSET}"

				          make "${KATA_ASSET}-tarball"

				          build_dir=$(readlink -f build)

				          # store-artifact does not work with symlink

				          mkdir -p kata-build && cp "${build_dir}"/kata-static-"${KATA_ASSET}"*.tar.* kata-build/.

				        env:

				          KATA_ASSET: shim-v2

				          TAR_OUTPUT: shim-v2.tar.gz

				          PUSH_TO_REGISTRY: ${{ inputs.push-to-registry }}

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				          MEASURED_ROOTFS: no

				      - name: store-artifact shim-v2

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-artifacts-s390x-shim-v2${{ inputs.tarball-suffix }}

				          path: kata-build/kata-static-shim-v2.tar.zst

				          retention-days: 15

				          if-no-files-found: error

				  create-kata-tarball:

				    name: create-kata-tarball

				    runs-on: ubuntu-24.04-s390x

				    needs:

				      - build-asset

				      - build-asset-rootfs

				      - build-asset-boot-image-se

				      - build-asset-shim-v2

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          fetch-tags: true

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-artifacts

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-artifacts-s390x${{ inputs.tarball-suffix }}

				          pattern: kata-artifacts-s390x-*${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          merge-multiple: true

				      - name: merge-artifacts

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml

				        env:

				          RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}

				      - name: store-artifacts

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}

				          path: kata-static.tar.xz

				          path: kata-static.tar.zst

				          retention-days: 15

				          if-no-files-found: error
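
Note on the `Parse OCI image name and digest` step in the workflow above: it splits a `name@digest` reference using two Bash parameter expansions. The following is a minimal sketch of that splitting, not code from the repository. The function name `split_oci_ref` and the sample reference are hypothetical, and it assumes the reference contains exactly one `@`.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the two expansions used in the workflow step.
# Assumes the input has the form <name>@<digest> with a single '@'.
split_oci_ref() {
  local ref="$1"
  # ${ref%@*} strips the shortest suffix matching '@*', leaving the image name.
  oci_name="${ref%@*}"
  # ${ref#*@} strips the shortest prefix matching '*@', leaving the digest.
  oci_digest="${ref#*@}"
}

# Made-up reference, used only for illustration.
split_oci_ref "ghcr.io/example/cached-artefacts/agent@sha256:0123abcd"
echo "oci-name=${oci_name}"
echo "oci-digest=${oci_digest}"
```

In the workflow the two `echo` lines are appended to `$GITHUB_OUTPUT`, which is what makes the values available to the later attestation step as `steps.parse-oci-segments.outputs.oci-name` and `oci-digest`.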

									
.github/workflows/build-kubectl-image.yaml
				@@ -0,0 +1,75 @@

				name: Build kubectl multi-arch image

				on:

				  schedule:

				    # Run every Sunday at 00:00 UTC

				    - cron: '0 0 * * 0'

				  workflow_dispatch:

				    # Allow manual triggering

				  push:

				    branches:

				      - main

				    paths:

				      - 'tools/packaging/kubectl/Dockerfile'

				      - '.github/workflows/build-kubectl-image.yaml'

				permissions: {}

				env:

				  REGISTRY: quay.io

				  IMAGE_NAME: kata-containers/kubectl

				jobs:

				  build-and-push:

				    name: Build and push multi-arch image

				    runs-on: ubuntu-24.04

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Set up QEMU

				        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0

				      - name: Set up Docker Buildx

				        uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0

				      - name: Login to Quay.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ${{ env.REGISTRY }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Get kubectl version

				        id: kubectl-version

				        run: |

				          KUBECTL_VERSION=$(curl -L -s https://dl.k8s.io/release/stable.txt)

				          echo "version=${KUBECTL_VERSION}" >> "$GITHUB_OUTPUT"

				      - name: Generate image metadata

				        id: meta

				        uses: docker/metadata-action@902fa8ec7d6ecbf8d84d538b9b233a880e428804 # v5.7.0

				        with:

				          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

				          tags: |

				            type=raw,value=latest

				            type=raw,value={{date 'YYYYMMDD'}}

				            type=raw,value=${{ steps.kubectl-version.outputs.version }}

				            type=sha,prefix=

				      - name: Build and push multi-arch image

				        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0

				        with:

				          context: tools/packaging/kubectl/

				          file: tools/packaging/kubectl/Dockerfile

				          platforms: linux/amd64,linux/arm64,linux/s390x,linux/ppc64le

				          push: true

				          tags: ${{ steps.meta.outputs.tags }}

				          labels: ${{ steps.meta.outputs.labels }}

				          cache-from: type=gha

				          cache-to: type=gha,mode=max

									
.github/workflows/cargo-deny-runner.yaml
				@@ -11,20 +11,22 @@ concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				permissions: {}

				jobs:

				  cargo-deny-runner:

				    runs-on: ubuntu-latest

				    name: cargo-deny-runner

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout Code

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Generate Action

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        run: bash cargo-deny-generator.sh

				        working-directory: ./.github/cargo-deny-composite-action/

				        env:

				          GOPATH: ${{ runner.workspace }}/kata-containers

				          GOPATH: ${{ github.workspace }}/kata-containers

				      - name: Run Action

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        uses: ./.github/cargo-deny-composite-action

									
.github/workflows/ci-coco-stability.yaml
				@@ -0,0 +1,33 @@

				name: Kata Containers CoCo Stability Tests Weekly

				on:

				  # Note: This workload is not currently maintained, so we are skipping its scheduled runs

				  # schedule:

				  #   - cron: '0 0 * * 0'

				  workflow_dispatch:

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				permissions: {}

				jobs:

				  kata-containers-ci-on-push:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/ci-weekly.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      pr-number: "weekly"

				      tag: ${{ github.sha }}-weekly

				      target-branch: ${{ github.ref_name }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

									
.github/workflows/ci-devel.yaml
				@@ -0,0 +1,35 @@

				name: Kata Containers CI (manually triggered)

				on:

				  workflow_dispatch:

				permissions: {}

				jobs:

				  kata-containers-ci-on-push:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/ci.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      pr-number: "dev"

				      tag: ${{ github.sha }}-dev

				      target-branch: ${{ github.ref_name }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      ITA_KEY: ${{ secrets.ITA_KEY }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  build-checks:

				    uses: ./.github/workflows/build-checks.yaml

				    with:

				      instance: ubuntu-22.04

									
.github/workflows/ci-nightly-riscv.yaml
				@@ -0,0 +1,34 @@

				on:

				  schedule:

				    - cron: '0 5 * * *'

				name: Nightly CI for RISC-V

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-riscv:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-riscv64.yaml

				    with:

				      tarball-suffix: -${{ github.sha }}

				      commit-hash: ${{ github.sha }}

				      target-branch: ${{ github.ref_name }}

				  build-checks-preview:

				    strategy:

				      fail-fast: false

				      matrix:

				        instance:

				          - "riscv-builder"

				    uses: ./.github/workflows/build-checks-preview-riscv64.yaml

				    with:

				      instance: ${{ matrix.instance }}

									
.github/workflows/ci-nightly-s390x.yaml
				@@ -3,41 +3,25 @@ on:

				    - cron: '0 5 * * *'

				name: Nightly CI for s390x

				permissions: {}

				jobs:

				  check-internal-test-result:

				    name: check-internal-test-result

				    runs-on: s390x

				    strategy:

				      fail-fast: false

				      matrix:

				        test_title:

				          - kata-vfio-ap-e2e-tests

				          - cc-se-e2e-tests

				          - cc-vfio-ap-e2e-tests

				          - cc-se-e2e-tests-go

				          - cc-se-e2e-tests-rs

				    steps:

				    - name: Fetch a test result for ${{ matrix.test_title }}

				      run: |

				        file_name="${TEST_TITLE}-$(date +%Y-%m-%d).log"

				        /home/${USER}/script/handle_test_log.sh download $file_name

				        "/home/${USER}/script/handle_test_log.sh" download "$file_name"

				      env:

				        TEST_TITLE: ${{ matrix.test_title }}

				  k8s-cri-containerd-rhel9-e2e-tests:

				    runs-on: s390x-rhel9

				    steps:

				    - name: Take a pre-action for self-hosted runner

				      run: |

				        ${HOME}/script/pre_action.sh rhel9-nightly

				    - name: Run k8s/cri-containerd e2e tests on RHEL9

				      run: |

				        export WORKSPACE=$GITHUB_WORKSPACE

				        export GITHUB_ACTION=""

				        bash ci_crio_entry_point.sh

				      env:

				        BAREMETAL: "true"

				        REPO_OWNER: "cri-o"

				        REPO_NAME: "cri-o"

				    - name: Take a post-action for self-hosted runner

				      if: always()

				      run: |

				        ${HOME}/script/post_action.sh rhel9-nightly

									
.github/workflows/ci-nightly.yaml
				@@ -2,18 +2,33 @@ name: Kata Containers Nightly CI

				on:

				  schedule:

				    - cron: '0 0 * * *'

				  workflow_dispatch:

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				permissions: {}

				jobs:

				  kata-containers-ci-on-push:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/ci.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      pr-number: "nightly"

				      tag: ${{ github.sha }}-nightly

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      ITA_KEY: ${{ secrets.ITA_KEY }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

									
.github/workflows/ci-on-push.yaml
				@@ -1,9 +1,8 @@

				name: Kata Containers CI

				on:

				  pull_request_target:

				  pull_request_target: # zizmor: ignore[dangerous-triggers] See #11332.

				    branches:

				      - 'main'

				      - 'stable-*'

				    types:

				      # Adding 'labeled' to the list of activity types that trigger this event

				      # (default: opened, synchronize, reopened) so that we can run this

				@@ -14,17 +13,42 @@ on:

				      - reopened

				      - labeled

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  kata-containers-ci-on-push:

				  skipper:

				    if: ${{ contains(github.event.pull_request.labels.*.name, 'ok-to-test') }}

				    uses: ./.github/workflows/gatekeeper-skipper.yaml

				    with:

				      commit-hash: ${{ github.event.pull_request.head.sha }}

				      target-branch: ${{ github.event.pull_request.base.ref }}

				  kata-containers-ci-on-push:

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_build != 'yes' }}

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/ci.yaml

				    with:

				      commit-hash: ${{ github.event.pull_request.head.sha }}

				      pr-number: ${{ github.event.pull_request.number }}

				      tag: ${{ github.event.pull_request.number }}-${{ github.event.pull_request.head.sha }}

				      target-branch: ${{ github.event.pull_request.base.ref }}

				    secrets: inherit

				      skip-test: ${{ needs.skipper.outputs.skip_test }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      ITA_KEY: ${{ secrets.ITA_KEY }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

									
.github/workflows/ci-weekly.yaml
				@@ -0,0 +1,128 @@

				name: Run the CoCo Kata Containers Stability CI

				on:

				  workflow_call:

				    inputs:

				      commit-hash:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD:

				        required: true

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				        required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-amd64:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  publish-kata-deploy-payload-amd64:

				    needs: build-kata-static-tarball-amd64

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: ubuntu-22.04

				      arch: amd64

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-and-publish-tee-confidential-unencrypted-image:

				    name: build-and-publish-tee-confidential-unencrypted-image

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Set up QEMU

				        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0

				      - name: Set up Docker Buildx

				        uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Docker build and push

				        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0

				        with:

				          tags: ghcr.io/kata-containers/test-images:unencrypted-${{ inputs.pr-number }}

				          push: true

				          context: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/

				          platforms: linux/amd64

				          file: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/Dockerfile

				  run-kata-coco-stability-tests:

				    needs: [publish-kata-deploy-payload-amd64, build-and-publish-tee-confidential-unencrypted-image]

				    uses: ./.github/workflows/run-kata-coco-stability-tests.yaml

				    with:

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				      tarball-suffix: -${{ inputs.tag }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				    permissions:

				      contents: read

				      id-token: write

									
.github/workflows/ci.yaml
				@@ -15,18 +15,54 @@ on:

				        required: false

				        type: string

				        default: ""

				      skip-test:

				        required: false

				        type: string

				        default: no

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD:

				        required: true

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				        required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				      CI_HKD_PATH:

				        required: true

				      ITA_KEY:

				        required: true

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				      NGC_API_KEY:

				        required: true

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-amd64:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  publish-kata-deploy-payload-amd64:

				    needs: build-kata-static-tarball-amd64

				    uses: ./.github/workflows/publish-kata-deploy-payload-amd64.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				@@ -34,26 +70,76 @@ jobs:

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				      runner: ubuntu-22.04

				      arch: amd64

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-kata-static-tarball-arm64:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-arm64.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  publish-kata-deploy-payload-arm64:

				    needs: build-kata-static-tarball-arm64

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-arm64

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: ubuntu-24.04-arm

				      arch: arm64

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-kata-static-tarball-s390x:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-s390x.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				    secrets:

				      CI_HKD_PATH: ${{ secrets.ci_hkd_path }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-kata-static-tarball-ppc64le:

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/build-kata-static-tarball-ppc64le.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-s390x:

				    needs: build-kata-static-tarball-s390x

				    uses: ./.github/workflows/publish-kata-deploy-payload-s390x.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				@@ -61,11 +147,17 @@ jobs:

				      tag: ${{ inputs.tag }}-s390x

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				      runner: ubuntu-24.04-s390x

				      arch: s390x

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-ppc64le:

				    needs: build-kata-static-tarball-ppc64le

				    uses: ./.github/workflows/publish-kata-deploy-payload-ppc64le.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				@@ -73,16 +165,24 @@ jobs:

				      tag: ${{ inputs.tag }}-ppc64le

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				      runner: ubuntu-24.04-ppc64le

				      arch: ppc64le

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-and-publish-tee-confidential-unencrypted-image:

				    runs-on: ubuntu-latest

				    name: build-and-publish-tee-confidential-unencrypted-image

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -91,20 +191,20 @@ jobs:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Set up QEMU

				        uses: docker/setup-qemu-action@v2

				        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392 # v3.6.0

				      - name: Set up Docker Buildx

				        uses: docker/setup-buildx-action@v2

				        uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Docker build and push

				        uses: docker/build-push-action@v4

				        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0

				        with:

				          tags: ghcr.io/kata-containers/test-images:unencrypted-${{ inputs.pr-number }}

				          push: true

				@@ -112,31 +212,63 @@ jobs:

				          platforms: linux/amd64, linux/s390x

				          file: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/Dockerfile

				  run-kata-deploy-tests-on-aks:

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-kata-deploy-tests-on-aks.yaml

				    with:

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				  publish-csi-driver-amd64:

				    name: publish-csi-driver-amd64

				    needs: build-kata-static-tarball-amd64

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				  run-kata-deploy-tests-on-garm:

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-kata-deploy-tests-on-garm.yaml

				    with:

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64-${{ inputs.tag }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Copy binary into Docker context

				        run: |

				          # Copy to the location where the Dockerfile expects the binary.

				          mkdir -p src/tools/csi-kata-directvolume/bin/

				          cp /opt/kata/bin/csi-kata-directvolume src/tools/csi-kata-directvolume/bin/directvolplugin

				      - name: Set up Docker Buildx

				        uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Docker build and push

				        uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0

				        with:

				          tags: ghcr.io/kata-containers/csi-kata-directvolume:${{ inputs.pr-number }}

				          push: true

				          context: src/tools/csi-kata-directvolume/

				          platforms: linux/amd64

				          file: src/tools/csi-kata-directvolume/Dockerfile

				  run-kata-monitor-tests:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-amd64

				    uses: ./.github/workflows/run-kata-monitor-tests.yaml

				    with:

				@@ -145,8 +277,13 @@ jobs:

				      target-branch: ${{ inputs.target-branch }}

				  run-k8s-tests-on-aks:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-k8s-tests-on-aks.yaml

				    permissions:

				      contents: read

				      id-token: write # Used for OIDC access to log into Azure

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				@@ -155,44 +292,66 @@ jobs:

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				    secrets:

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				  run-k8s-tests-on-garm:

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-k8s-tests-on-garm.yaml

				  run-k8s-tests-on-arm64:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: publish-kata-deploy-payload-arm64

				    uses: ./.github/workflows/run-k8s-tests-on-arm64.yaml

				    with:

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-arm64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				  run-k8s-tests-on-nvidia-gpu:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				    secrets:

				      NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

				  run-k8s-tests-with-crio-on-garm:

				    needs: publish-kata-deploy-payload-amd64

				    uses: ./.github/workflows/run-k8s-tests-with-crio-on-garm.yaml

				    with:

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets: inherit

				  run-kata-coco-tests:

				    needs: [publish-kata-deploy-payload-amd64, build-and-publish-tee-confidential-unencrypted-image]

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs:

				     - publish-kata-deploy-payload-amd64

				     - build-and-publish-tee-confidential-unencrypted-image

				     - publish-csi-driver-amd64

				    uses: ./.github/workflows/run-kata-coco-tests.yaml

				    permissions:

				      contents: read

				      id-token: write # Used for OIDC access to log into Azure

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      AZ_APPID: ${{ secrets.AZ_APPID }}

				      AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				      AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      ITA_KEY: ${{ secrets.ITA_KEY }}

				  run-k8s-tests-on-zvsi:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: [publish-kata-deploy-payload-s390x, build-and-publish-tee-confidential-unencrypted-image]

				    uses: ./.github/workflows/run-k8s-tests-on-zvsi.yaml

				    with:

				@@ -202,8 +361,11 @@ jobs:

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				  run-k8s-tests-on-ppc64le:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: publish-kata-deploy-payload-ppc64le

				    uses: ./.github/workflows/run-k8s-tests-on-ppc64le.yaml

				    with:

				@@ -214,15 +376,20 @@ jobs:

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				  run-metrics-tests:

				    needs: build-kata-static-tarball-amd64

				    uses: ./.github/workflows/run-metrics.yaml

				  run-kata-deploy-tests:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: [publish-kata-deploy-payload-amd64]

				    uses: ./.github/workflows/run-kata-deploy-tests.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      registry: ghcr.io

				      repo: ${{ github.repository_owner }}/kata-deploy-ci

				      tag: ${{ inputs.tag }}-amd64

				      commit-hash: ${{ inputs.commit-hash }}

				      pr-number: ${{ inputs.pr-number }}

				      target-branch: ${{ inputs.target-branch }}

				  run-basic-amd64-tests:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-amd64

				    uses: ./.github/workflows/basic-ci-amd64.yaml

				    with:

				@@ -230,18 +397,97 @@ jobs:

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				  run-cri-containerd-tests-s390x:

				  run-basic-s390x-tests:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-s390x

				    uses: ./.github/workflows/run-cri-containerd-tests-s390x.yaml

				    uses: ./.github/workflows/basic-ci-s390x.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				  run-cri-containerd-tests-ppc64le:

				    needs: build-kata-static-tarball-ppc64le

				    uses: ./.github/workflows/run-cri-containerd-tests-ppc64le.yaml

				  run-cri-containerd-amd64:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-amd64

				    strategy:

				      fail-fast: false

				      matrix:

				        params: [

				          { containerd_version: lts,    vmm: clh              },

				          { containerd_version: lts,    vmm: dragonball       },

				          { containerd_version: lts,    vmm: qemu             },

				          { containerd_version: lts,    vmm: cloud-hypervisor },

				          { containerd_version: lts,    vmm: qemu-runtime-rs  },

				          { containerd_version: active, vmm: clh              },

				          { containerd_version: active, vmm: dragonball       },

				          { containerd_version: active, vmm: qemu             },

				          { containerd_version: active, vmm: cloud-hypervisor },

				          { containerd_version: active, vmm: qemu-runtime-rs  },

				         ]

				    uses: ./.github/workflows/run-cri-containerd-tests.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: ubuntu-22.04

				      arch: amd64

				      containerd_version: ${{ matrix.params.containerd_version }}

				      vmm: ${{ matrix.params.vmm }}

				  run-cri-containerd-s390x:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-s390x

				    strategy:

				      fail-fast: false

				      matrix:

				        params: [

				          { containerd_version: active, vmm: qemu            },

				          { containerd_version: active, vmm: qemu-runtime-rs },

				         ]

				    uses: ./.github/workflows/run-cri-containerd-tests.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: s390x-large

				      arch: s390x

				      containerd_version: ${{ matrix.params.containerd_version }}

				      vmm: ${{ matrix.params.vmm }}

				  run-cri-containerd-tests-ppc64le:

				    if: ${{ inputs.skip-test != 'yes' }}

				    needs: build-kata-static-tarball-ppc64le

				    strategy:

				      fail-fast: false

				      matrix:

				        params: [

				          { containerd_version: active, vmm: qemu },

				         ]

				    uses: ./.github/workflows/run-cri-containerd-tests.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: ppc64le-small

				      arch: ppc64le

				      containerd_version: ${{ matrix.params.containerd_version }}

				      vmm: ${{ matrix.params.vmm }}

				  run-cri-containerd-tests-arm64:

				    if: false

				    needs: build-kata-static-tarball-arm64

				    strategy:

				      fail-fast: false

				      matrix:

				        params: [

				          { containerd_version: active, vmm: qemu },

				         ]

				    uses: ./.github/workflows/run-cri-containerd-tests.yaml

				    with:

				      tarball-suffix: -${{ inputs.tag }}

				      commit-hash: ${{ inputs.commit-hash }}

				      target-branch: ${{ inputs.target-branch }}

				      runner: arm64-non-k8s

				      arch: arm64

				      containerd_version: ${{ matrix.params.containerd_version }}

				      vmm: ${{ matrix.params.vmm }}
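Not part of the diff: a minimal sketch of how the `run-cri-containerd-amd64` matrix above fans out. GitHub Actions starts one job per entry in `matrix.params`, and each job passes its `containerd_version` and `vmm` values to `run-cri-containerd-tests.yaml`. The Python below only enumerates those entries; the function and variable names are illustrative assumptions and do not exist in the repository.

```python
from itertools import product

# Values copied from the amd64 `matrix.params` list in the diff above.
CONTAINERD_VERSIONS = ["lts", "active"]
VMMS = ["clh", "dragonball", "qemu", "cloud-hypervisor", "qemu-runtime-rs"]


def expand_matrix():
    """Return one dict per matrix entry, in the order listed in the workflow."""
    return [
        {"containerd_version": version, "vmm": vmm}
        for version, vmm in product(CONTAINERD_VERSIONS, VMMS)
    ]


if __name__ == "__main__":
    # Each printed line corresponds to one job that calls the reusable workflow.
    for params in expand_matrix():
        print(f"{params['containerd_version']:<6} {params['vmm']}")
```

The s390x, ppc64le and arm64 jobs use the same shape with shorter lists, so the same enumeration applies with different values.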

									
.github/workflows/cleanup-resources.yaml (new file, 38 lines changed)
												
				@@ -0,0 +1,38 @@

				name: Cleanup dangling Azure resources

				on:

				  schedule:

				    - cron: "0 0 * * *"

				  workflow_dispatch:

				permissions: {}

				jobs:

				  cleanup-resources:

				    name: cleanup-resources

				    runs-on: ubuntu-22.04

				    permissions:

				      id-token: write # Used for OIDC access to log into Azure

				    environment: ci

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Log into Azure

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Install Python dependencies

				        run: |

				          pip3 install --user --upgrade \

				            azure-identity==1.16.0 \

				            azure-mgmt-resource==23.0.1

				      - name: Cleanup resources

				        env:

				          AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				          CLEANUP_AFTER_HOURS: 24 # Clean up resources created more than this many hours ago.

				        run: python3 tests/cleanup_resources.py
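Not part of the diff: the workflow passes `CLEANUP_AFTER_HOURS: 24` to `tests/cleanup_resources.py`, whose contents are not shown here. The sketch below is only an assumption about what "created more than this many hours ago" could mean in code. `is_stale` and `cleanup_after_hours_from_env` are hypothetical helpers, and no Azure API is called.

```python
import os
from datetime import datetime, timedelta, timezone


def is_stale(created_at, now, cleanup_after_hours):
    """True when the resource was created more than `cleanup_after_hours` ago."""
    return now - created_at > timedelta(hours=cleanup_after_hours)


def cleanup_after_hours_from_env(default=24):
    # The workflow sets CLEANUP_AFTER_HOURS=24; fall back to the same value
    # when the variable is not present in the environment.
    return int(os.environ.get("CLEANUP_AFTER_HOURS", default))


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    hours = cleanup_after_hours_from_env()
    example_created_at = now - timedelta(hours=30)  # made-up timestamp
    print(is_stale(example_created_at, now, hours))
```

Using timezone-aware timestamps avoids comparing a naive local time against a UTC creation time, which would shift the cutoff by the local offset.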

									
.github/workflows/codeql.yml (new file, 100 lines changed)
												
				@@ -0,0 +1,100 @@

				# For most projects, this workflow file will not need changing; you simply need

				# to commit it to your repository.

				#

				# You may wish to alter this file to override the set of languages analyzed,

				# or to provide custom queries or build logic.

				#

				# ******** NOTE ********

				# We have attempted to detect the languages in your repository. Please check

				# the `language` matrix defined below to confirm you have the correct set of

				# supported CodeQL languages.

				#

				name: "CodeQL Advanced"

				on:

				  push:

				    branches: [ "main" ]

				  pull_request:

				    branches: [ "main" ]

				  schedule:

				    - cron: '45 0 * * 1'

				permissions: {}

				jobs:

				  analyze:

				    name: Analyze (${{ matrix.language }})

				    # Runner size impacts CodeQL analysis time. To learn more, please see:

				    #   - https://gh.io/recommended-hardware-resources-for-running-codeql

				    #   - https://gh.io/supported-runners-and-hardware-resources

				    #   - https://gh.io/using-larger-runners (GitHub.com only)

				    # Consider using larger runners or machines with greater resources for possible analysis time improvements.

				    runs-on: ubuntu-24.04

				    permissions:

				      # required for all workflows

				      security-events: write

				      # required to fetch internal or private CodeQL packs

				      packages: read

				      # only required for workflows in private repositories

				      actions: read

				      contents: read

				    strategy:

				      fail-fast: false

				      matrix:

				        include:

				        - language: go

				          build-mode: manual

				        - language: python

				          build-mode: none

				        # CodeQL supports the following values keywords for 'language': 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'

				        # Use `c-cpp` to analyze code written in C, C++ or both

				        # Use 'java-kotlin' to analyze code written in Java, Kotlin or both

				        # Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both

				        # To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,

				        # see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.

				        # If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how

				        # your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages

				    steps:

				    - name: Checkout repository

				      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      with:

				        persist-credentials: false

				    # Add any setup steps before running the `github/codeql-action/init` action.

				    # This includes steps like installing compilers or runtimes (`actions/setup-node`

				    # or others). This is typically only required for manual builds.

				    # - name: Setup runtime (example)

				    #   uses: actions/setup-example@v1

				    # Initializes the CodeQL tools for scanning.

				    - name: Initialize CodeQL

				      uses: github/codeql-action/init@v3

				      with:

				        languages: ${{ matrix.language }}

				        build-mode: ${{ matrix.build-mode }}

				        # If you wish to specify custom queries, you can do so here or in a config file.

				        # By default, queries listed here will override any specified in a config file.

				        # Prefix the list here with "+" to use these queries and those in the config file.

				        # For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs

				        # queries: security-extended,security-and-quality

				    # If the analyze step fails for one of the languages you are analyzing with

				    # "We were unable to automatically build your code", modify the matrix above

				    # to set the build mode to "manual" for that language. Then modify this step

				    # to build your code.

				    # ℹ️ Command-line programs to run using the OS shell.

				    # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

				    - if: matrix.build-mode == 'manual' && matrix.language == 'go'

				      shell: bash

				      run: |

				        make -C src/runtime

				    - name: Perform CodeQL Analysis

				      uses: github/codeql-action/analyze@v3

				      with:

				        category: "/language:${{matrix.language}}"

									
.github/workflows/commit-message-check.yaml (44 lines changed)
												
				@@ -6,6 +6,8 @@ on:

				      - reopened

				      - synchronize

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				@@ -18,13 +20,14 @@ env:

				jobs:

				  commit-message-check:

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    env:

				      PR_AUTHOR: ${{ github.event.pull_request.user.login }}

				    name: Commit Message Check

				    steps:

				    - name: Get PR Commits

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				      id: 'get-pr-commits'

				      uses: tim-actions/get-pr-commits@v1.2.0

				      uses: tim-actions/get-pr-commits@c64db31d359214d244884dd68f971a110b29ab83 # v1.2.0

				      with:

				        token: ${{ secrets.GITHUB_TOKEN }}

				        # Filter out revert commits

				@@ -32,23 +35,25 @@ jobs:

				        #

				        # Revert "<original-subject-line>"

				        #

				        filter_out_pattern: '^Revert "'

				        # The format of a re-re-vert commit as follows:

				        #

				        # Reapply "<original-subject-line>"

				        filter_out_pattern: '^Revert "|^Reapply "'

				    - name: DCO Check

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				      uses: tim-actions/dco@2fd0504dc0d27b33f542867c300c60840c6dcb20

				      uses: tim-actions/dco@f2279e6e62d5a7d9115b0cb8e837b777b1b02e21 # v1.1.0

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				    - name: Commit Body Missing Check

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}

				      uses: tim-actions/commit-body-check@v1.0.2

				      if: ${{ success() || failure() }}

				      uses: tim-actions/commit-body-check@d2e0e8e1f0332b3281c98867c42a2fbe25ad3f15 # v1.0.2

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				    - name: Check Subject Line Length

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@v0.3.1

				      if: ${{ (env.PR_AUTHOR != 'dependabot[bot]') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@d6d9770051dd6460679d1cab1dcaa8cffc5c2bbd # v0.3.1

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				        pattern: '^.{0,75}(\n.*)*$'

				@@ -56,8 +61,8 @@ jobs:

				        post_error: ${{ env.error_msg }}

				    - name: Check Body Line Length

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@v0.3.1

				      if: ${{ (env.PR_AUTHOR != 'dependabot[bot]') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@d6d9770051dd6460679d1cab1dcaa8cffc5c2bbd # v0.3.1

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				        # Notes:

				@@ -86,20 +91,9 @@ jobs:

				        error: 'Body line too long (max 150)'

				        post_error: ${{ env.error_msg }}

				    - name: Check Fixes

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@v0.3.1

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				        pattern: '\s*Fixes\s*:?\s*(#\d+|github\.com\/kata-containers\/[a-z-.]*#\d+)|^\s*release\s*:'

				        flags: 'i'

				        error: 'No "Fixes" found'

				        post_error: ${{ env.error_msg }}

				        one_pass_all_pass: 'true'

				    - name: Check Subsystem

				      if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@v0.3.1

				      if: ${{ (env.PR_AUTHOR != 'dependabot[bot]') && ( success() || failure() ) }}

				      uses: tim-actions/commit-message-checker-with-regex@d6d9770051dd6460679d1cab1dcaa8cffc5c2bbd # v0.3.1

				      with:

				        commits: ${{ steps.get-pr-commits.outputs.commits }}

				        pattern: '^[\s\t]*[^:\s\t]+[\s\t]*:'
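Not part of the diff: a small sketch for trying the regular expressions from this workflow against sample commit messages. The three patterns are copied verbatim from the hunks above. `check_commit` is a hypothetical helper, and applying each pattern to the full commit message is an assumption about how the `tim-actions` checkers behave. Python's `re` is used here, whereas the actions run JavaScript regexes, so edge cases may differ.

```python
import re

# Patterns copied verbatim from the workflow diff above.
FILTER_OUT = re.compile(r'^Revert "|^Reapply "')
SUBJECT_LENGTH = re.compile(r'^.{0,75}(\n.*)*$')
SUBSYSTEM = re.compile(r'^[\s\t]*[^:\s\t]+[\s\t]*:')


def check_commit(message):
    """Return the names of the checks that `message` would fail."""
    if FILTER_OUT.search(message):
        # Revert / Reapply commits are filtered out before any check runs.
        return []
    failures = []
    if not SUBJECT_LENGTH.search(message):
        failures.append("subject-length")
    if not SUBSYSTEM.search(message):
        failures.append("subsystem")
    return failures


if __name__ == "__main__":
    samples = [
        "runtime: fix foo\n\nLonger explanation in the body.",
        "fix foo without a subsystem prefix",
        'Revert "runtime: fix foo"',
    ]
    for sample in samples:
        print(repr(sample.splitlines()[0]), check_commit(sample))
```

The subject pattern allows at most 75 characters before the first newline, and the subsystem pattern requires a non-empty token followed by a colon at the very start of the message.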

									
.github/workflows/darwin-tests.yaml (28 lines changed)
												
				@@ -6,6 +6,8 @@ on:

				      - reopened

				      - synchronize

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				@@ -13,13 +15,29 @@ concurrency:

				name: Darwin tests

				jobs:

				  test:

				    name: test

				    runs-on: macos-latest

				    steps:

				    - name: Install Go

				      uses: actions/setup-go@v2

				      with:

				        go-version: 1.19.3

				    - name: Install Protoc

				      run: |

				        f=$(mktemp)

				        curl -sSLo "$f" https://github.com/protocolbuffers/protobuf/releases/download/v28.2/protoc-28.2-osx-aarch_64.zip

				        mkdir -p "$HOME/.local"

				        unzip -d "$HOME/.local" "$f"

				        echo "$HOME/.local/bin" >> "${GITHUB_PATH}"

				    - name: Checkout code

				      uses: actions/checkout@v4

				      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      with:

				        persist-credentials: false

				    - name: Install golang

				      run: |

				        ./tests/install_go.sh -f -p

				        echo "/usr/local/go/bin" >> "${GITHUB_PATH}"

				    - name: Install Rust

				      run: ./tests/install_rust.sh

				    - name: Build utils

				      run: ./ci/darwin-test.sh

									
.github/workflows/docs-url-alive-check.yaml
				@@ -1,37 +1,34 @@

				on:

				  schedule:

				    - cron:  '0 23 * * 0'

				  workflow_dispatch:

				permissions: {}

				name: Docs URL Alive Check

				jobs:

				  test:

				    runs-on: ubuntu-20.04

				    name: test

				    runs-on: ubuntu-22.04

				    # don't run this action on forks

				    if: github.repository_owner == 'kata-containers'

				    env:

				      target_branch: ${{ github.base_ref }}

				    steps:

				    - name: Install Go

				      uses: actions/setup-go@v2

				      with:

				        go-version: 1.19.3

				      env:

				        GOPATH: ${{ runner.workspace }}/kata-containers

				    - name: Set env

				      run: |

				        echo "GOPATH=${{ github.workspace }}" >> $GITHUB_ENV

				        echo "${{ github.workspace }}/bin" >> $GITHUB_PATH

				        echo "GOPATH=${GITHUB_WORKSPACE}" >> "$GITHUB_ENV"

				    - name: Checkout code

				      uses: actions/checkout@v4

				      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      with:

				        fetch-depth: 0

				        path: ./src/github.com/${{ github.repository }}

				    - name: Setup

				        persist-credentials: false

				    - name: Install golang

				      run: |

				        cd ${GOPATH}/src/github.com/${{ github.repository }} && ./ci/setup.sh

				      env:

				        GOPATH: ${{ runner.workspace }}/kata-containers

				    # docs url alive check

				        ./tests/install_go.sh -f -p

				        echo "/usr/local/go/bin" >> "${GITHUB_PATH}"

				    - name: Docs URL Alive Check

				      run: |

				        cd ${GOPATH}/src/github.com/${{ github.repository }} && make docs-url-alive-check

				        make docs-url-alive-check

									
.github/workflows/docs.yaml
				@@ -0,0 +1,32 @@

				name: Documentation

				on:

				  push:

				    branches:

				      - main

				permissions: {}

				jobs:

				  deploy-docs:

				    name: deploy-docs

				    permissions:

				      contents: read

				      pages: write

				      id-token: write

				    environment:

				      name: github-pages

				      url: ${{ steps.deployment.outputs.page_url }}

				    runs-on: ubuntu-latest

				    steps:

				      - uses: actions/configure-pages@v5

				      - uses: actions/checkout@v5

				        with:

				          persist-credentials: false

				      - uses: actions/setup-python@v5

				        with:

				          python-version: 3.x

				      - run: pip install zensical

				      - run: zensical build --clean

				      - uses: actions/upload-pages-artifact@v4

				        with:

				          path: site

				      - uses: actions/deploy-pages@v4

				        id: deployment

									
.github/workflows/editorconfig-checker.yaml
				@@ -0,0 +1,29 @@

				name: EditorConfig checker

				on:

				  pull_request:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  editorconfig-checker:

				    name: editorconfig-checker

				    runs-on: ubuntu-24.04

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Set up editorconfig-checker

				        uses: editorconfig-checker/action-editorconfig-checker@4b6cd6190d435e7e084fb35e36a096e98506f7b9 # v2.1.0

				        with:

				          version: v3.6.1

				      - name: Run editorconfig-checker

				        run: editorconfig-checker

									
.github/workflows/gatekeeper-skipper.yaml
				@@ -0,0 +1,55 @@

				name: Skipper

				# This workflow sets various "skip_*" output values that can be used to

				# determine what workflows/jobs are expected to be executed. Sample usage:

				#

				#   skipper:

				#     uses: ./.github/workflows/gatekeeper-skipper.yaml

				#     with:

				#       commit-hash: ${{ github.event.pull_request.head.sha }}

				#       target-branch: ${{ github.event.pull_request.base.ref }}

				#

				#   your-workflow:

				#     needs: skipper

				#     if: ${{ needs.skipper.outputs.skip_build != 'yes' }}

				on:

				  workflow_call:

				    inputs:

				      commit-hash:

				        required: true

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				    outputs:

				      skip_build:

				        value: ${{ jobs.skipper.outputs.skip_build }}

				      skip_test:

				        value: ${{ jobs.skipper.outputs.skip_test }}

				      skip_static:

				        value: ${{ jobs.skipper.outputs.skip_static }}

				permissions: {}

				jobs:

				  skipper:

				    name: skipper

				    runs-on: ubuntu-22.04

				    outputs:

				      skip_build: ${{ steps.skipper.outputs.skip_build }}

				      skip_test: ${{ steps.skipper.outputs.skip_test }}

				      skip_static: ${{ steps.skipper.outputs.skip_static }}

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - id: skipper

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				        run: |

				          python3 tools/testing/gatekeeper/skips.py | tee -a "$GITHUB_OUTPUT"

				        shell: /usr/bin/bash -x {0}
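Note on the skipper step above: it pipes the script's standard output into `$GITHUB_OUTPUT`, so each `name=value` line becomes a step output. The sketch below shows one way to read such lines and evaluate the `!= 'yes'` condition from the sample usage comment. It is not the repository's `skips.py`; the helper names and the idea that a non-skipped value is printed as `no` are assumptions made for illustration.

```python
def parse_step_outputs(text):
    """Parse 'name=value' lines, as appended to $GITHUB_OUTPUT, into a dict."""
    outputs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Ignore lines that are not of the form name=value.
        if sep and name.strip():
            outputs[name.strip()] = value.strip()
    return outputs


def should_run(outputs, key):
    """Mirror `needs.skipper.outputs.<key> != 'yes'`; a missing key counts as not skipped."""
    return outputs.get(key, "") != "yes"


# Hypothetical script output; the real values depend on the changed files.
sample = "skip_build=yes\nskip_test=no\nskip_static=no\n"
outputs = parse_step_outputs(sample)
```

Here `should_run(outputs, "skip_build")` is false, so a job guarded by that condition would be skipped, while jobs guarded by `skip_test` or `skip_static` would still run.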

									
.github/workflows/gatekeeper.yaml
				@@ -0,0 +1,55 @@

				name: Gatekeeper

				# Gatekeeper uses the "skips.py" to determine which job names/regexps are

				# required for given PR and waits for them to either complete or fail

				# reporting the status.

				on:

				  pull_request_target: # zizmor: ignore[dangerous-triggers] See #11332.

				    types:

				      - opened

				      - synchronize

				      - reopened

				      - edited

				      - labeled

				      - unlabeled

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  gatekeeper:

				    name: gatekeeper

				    runs-on: ubuntu-22.04

				    permissions:

				      actions: read

				      contents: read

				      issues: read

				      pull-requests: read

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ github.event.pull_request.head.sha }}

				          fetch-depth: 0

				          persist-credentials: false

				      - id: gatekeeper

				        env:

				          TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}

				          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				          COMMIT_HASH: ${{ github.event.pull_request.head.sha }}

				          GH_PR_NUMBER: ${{ github.event.pull_request.number }}

				        run: |

				          #!/usr/bin/env bash -x

				          mapfile -t lines < <(python3 tools/testing/gatekeeper/skips.py -t)

				          export REQUIRED_JOBS="${lines[0]}"

				          export REQUIRED_REGEXPS="${lines[1]}"

				          export REQUIRED_LABELS="${lines[2]}"

				          echo "REQUIRED_JOBS: $REQUIRED_JOBS"

				          echo "REQUIRED_REGEXPS: $REQUIRED_REGEXPS"

				          echo "REQUIRED_LABELS: $REQUIRED_LABELS"

				          python3 tools/testing/gatekeeper/jobs.py

				          exit $?

				        shell: /usr/bin/bash -x {0}
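Note on the gatekeeper step above: `mapfile -t lines` reads the output of `skips.py -t` line by line, and the first three lines are exported as `REQUIRED_JOBS`, `REQUIRED_REGEXPS` and `REQUIRED_LABELS`. The sketch below reproduces that positional mapping in Python. The function name and the sample values are hypothetical, and the separator used inside each line is not shown in this diff, so it is left uninterpreted.

```python
def split_required(output):
    """Map the first three output lines to the variables the step exports."""
    lines = output.splitlines()
    # In bash, "${lines[N]}" expands to an empty string when the entry is unset.
    lines += [""] * (3 - len(lines))
    return {
        "REQUIRED_JOBS": lines[0],
        "REQUIRED_REGEXPS": lines[1],
        "REQUIRED_LABELS": lines[2],
    }


# Hypothetical output: two populated lines and an empty third line.
sample = "job-a;job-b\n^build-.*\n\n"
required = split_required(sample)
```

Because the mapping is purely positional, an empty line in the script output still occupies its slot, which is why the third variable is empty here rather than shifted.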

									
.github/workflows/govulncheck.yaml
				@@ -0,0 +1,53 @@

				on:

				  workflow_call:

				name: Govulncheck

				permissions: {}

				jobs:

				  govulncheck:

				    name: govulncheck

				    runs-on: ubuntu-22.04

				    strategy:

				      matrix:

				        include:

				          - binary: "kata-runtime"

				            make_target: "runtime"

				          - binary: "containerd-shim-kata-v2"

				            make_target: "containerd-shim-v2"

				          - binary: "kata-monitor"

				            make_target: "monitor"

				      fail-fast: false

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install golang

				        run: |

				          ./tests/install_go.sh -f -p

				          echo "/usr/local/go/bin" >> "${GITHUB_PATH}"

				      - name: Install govulncheck

				        run: |

				          go install golang.org/x/vuln/cmd/govulncheck@latest

				          echo "${HOME}/go/bin" >> "${GITHUB_PATH}"

				      - name: Build runtime binaries

				        run: |

				          cd src/runtime

				          make "${MAKE_TARGET}"

				        env:

				          MAKE_TARGET: ${{ matrix.make_target }}

				          SKIP_GO_VERSION_CHECK: "1"

				      - name: Run govulncheck on ${{ matrix.binary }}

				        env:

				          BINARY: ${{ matrix.binary }}

				        run: |

				          cd src/runtime

				          bash ../../tests/govulncheck-runner.sh "./${BINARY}"

									
.github/workflows/kata-runtime-classes-sync.yaml
				@@ -1,36 +0,0 @@

				on:

				  pull_request:

				    types:

				      - opened

				      - edited

				      - reopened

				      - synchronize

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  kata-deploy-runtime-classes-check:

				    runs-on: ubuntu-latest

				    steps:

				    - name: Checkout code

				      uses: actions/checkout@v4

				    - name: Ensure the split out runtime classes match the all-in-one file

				      run: |

				        pushd tools/packaging/kata-deploy/runtimeclasses/

				        echo "::group::Combine runtime classes"

				        for runtimeClass in `find . -type f \( -name "*.yaml" -and -not -name "kata-runtimeClasses.yaml" \) | sort`; do

				            echo "Adding ${runtimeClass} to the resultingRuntimeClasses.yaml"

				            cat ${runtimeClass} >> resultingRuntimeClasses.yaml;

				        done

				        echo "::endgroup::"

				        echo "::group::Displaying the content of resultingRuntimeClasses.yaml"

				        cat resultingRuntimeClasses.yaml

				        echo "::endgroup::"

				        echo ""

				        echo "::group::Displaying the content of kata-runtimeClasses.yaml"

				        cat kata-runtimeClasses.yaml

				        echo "::endgroup::"

				        echo ""

				        diff resultingRuntimeClasses.yaml kata-runtimeClasses.yaml

									
.github/workflows/move-issues-to-in-progress.yaml
				@@ -1,92 +0,0 @@

				# Copyright (c) 2020 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				name: Move issues to "In progress" in backlog project when referenced by a PR

				on:

				  pull_request_target:

				    types:

				      - opened

				      - reopened

				jobs:

				  move-linked-issues-to-in-progress:

				    runs-on: ubuntu-latest

				    steps:

				      - name: Install hub

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        run: |

				          HUB_ARCH="amd64"

				          HUB_VER=$(curl -sL "https://api.github.com/repos/github/hub/releases/latest" |\

				            jq -r .tag_name | sed 's/^v//')

				          curl -sL \

				            "https://github.com/github/hub/releases/download/v${HUB_VER}/hub-linux-${HUB_ARCH}-${HUB_VER}.tgz" |\

				          tar xz --strip-components=2 --wildcards '*/bin/hub' && \

				          sudo install hub /usr/local/bin

				      - name: Install hub extension script

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        run: |

				          # Clone into a temporary directory to avoid overwriting

				          # any existing github directory.

				          pushd $(mktemp -d) &>/dev/null

				          git clone --single-branch --depth 1 "https://github.com/kata-containers/.github" && cd .github/scripts

				          sudo install hub-util.sh /usr/local/bin

				          popd &>/dev/null

				      - name: Checkout code to allow hub to communicate with the project

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        uses: actions/checkout@v4

				        with:

				          ref: ${{ github.event.pull_request.head.sha }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ github.event.pull_request.base.ref }}

				      - name: Move issue to "In progress"

				        if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}

				        env:

				          GITHUB_TOKEN: ${{ secrets.KATA_GITHUB_ACTIONS_TOKEN }}

				        run: |

				          pr=${{ github.event.pull_request.number }}

				          linked_issue_urls=$(hub-util.sh \

				            list-issues-for-pr "$pr" |\

				            grep -v "^\#"  |\

				            cut -d';' -f3 || true)

				          # PR doesn't have any linked issues

				          # (it should, but maybe a new user forgot to add a "Fixes: #XXX" commit).

				          [ -z "$linked_issue_urls" ] && {

				            echo "::error::No linked issues for PR $pr"

				            exit 1

				          }

				          project_name="Issue backlog"

				          project_type="org"

				          project_column="In progress"

				          for issue_url in $(echo "$linked_issue_urls")

				          do

				            issue=$(echo "$issue_url"| awk -F\/ '{print $NF}' || true)

				            [ -z "$issue" ] && {

				              echo "::error::Cannot determine issue number from $issue_url for PR $pr"

				              exit 1

				            }

				            # Move the issue to the correct column on the project board

				            hub-util.sh \

				              move-issue \

				              "$issue" \

				              "$project_name" \

				              "$project_type" \

				              "$project_column"

				          done

									
.github/workflows/nydus-snapshotter-version-in-sync.yaml
				@@ -0,0 +1,35 @@

				name: nydus-snapshotter-version-sync

				on:

				  pull_request:

				    types:

				      - opened

				      - edited

				      - reopened

				      - synchronize

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  nydus-snapshotter-version-check:

				    name: nydus-snapshotter-version-check

				    runs-on: ubuntu-22.04

				    steps:

				    - name: Checkout code

				      uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				      with:

				        persist-credentials: false

				    - name: Ensure nydus-snapshotter-version is in sync inside our repo

				      run: |

				        dockerfile_version=$(grep "ARG NYDUS_SNAPSHOTTER_VERSION" tools/packaging/kata-deploy/Dockerfile | cut -f2 -d'=')

				        versions_version=$(yq ".externals.nydus-snapshotter.version | explode(.)" versions.yaml)

				        if [[ "${dockerfile_version}" != "${versions_version}" ]]; then

				          echo "nydus-snapshotter version must be the same in the following places: "

				          echo "- versions.yaml: ${versions_version}"

				          echo "- tools/packaging/kata-deploy/Dockerfile: ${dockerfile_version}"

				          exit 1

				        fi
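Note on the version check above: the step extracts the `ARG NYDUS_SNAPSHOTTER_VERSION` default from the Dockerfile with `grep | cut -f2 -d'='` and compares it with the value `yq` reads from `versions.yaml`. The sketch below mirrors the Dockerfile side in Python and takes the `versions.yaml` value as a plain string. The function names and the version strings used with them are hypothetical, and only the first matching line is considered, which is a simplification of what `grep` does.

```python
def dockerfile_arg(dockerfile_text, name):
    """Return the second '='-separated field of the first 'ARG <name>' line."""
    for line in dockerfile_text.splitlines():
        if f"ARG {name}" in line:
            parts = line.split("=")
            # Like cut -f2 -d'=': a line without the delimiter is returned whole.
            return parts[1] if len(parts) > 1 else line
    return ""


def versions_in_sync(dockerfile_text, versions_version):
    """True when the Dockerfile default matches the versions.yaml value."""
    found = dockerfile_arg(dockerfile_text, "NYDUS_SNAPSHOTTER_VERSION")
    return found == versions_version


# Hypothetical Dockerfile content; the version string is made up.
sample_dockerfile = "FROM alpine\nARG NYDUS_SNAPSHOTTER_VERSION=v0.0.1\n"
```

A mismatch corresponds to the `exit 1` branch of the workflow step, which fails the check and prints both values.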

									
.github/workflows/osv-scanner.yaml
				@@ -0,0 +1,43 @@

				# A sample workflow which sets up periodic OSV-Scanner scanning for vulnerabilities,

				# in addition to a PR check which fails if new vulnerabilities are introduced.

				#

				# For more examples and options, including how to ignore specific vulnerabilities,

				# see https://google.github.io/osv-scanner/github-action/

				name: OSV-Scanner

				on:

				  workflow_dispatch:

				  pull_request:

				    branches: [ "main" ]

				  schedule:

				    - cron: '0 1 * * 0'

				  push:

				    branches: [ "main" ]

				permissions: {}

				jobs:

				  scan-scheduled:

				    permissions:

				      actions: read # # Required to upload SARIF file to CodeQL

				      contents: read  # Read commit contents

				      security-events: write  # Require writing security events to upload SARIF file to security tab

				    if: ${{ github.event_name == 'push' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}

				    uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@b00f71e051ddddc6e46a193c31c8c0bf283bf9e6" # v2.1.0

				    with:

				      scan-args: |-

				        -r

				        ./

				  scan-pr:

				    permissions:

				      actions: read # Required to upload SARIF file to CodeQL

				      contents: read  # Read commit contents

				      security-events: write  # Require writing security events to upload SARIF file to security tab

				    if: ${{ github.event_name == 'pull_request' }}

				    uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml@b00f71e051ddddc6e46a193c31c8c0bf283bf9e6" # v2.1.0

				    with:

				      # Example of specifying custom arguments

				      scan-args: |-

				        -r

				        ./

									
.github/workflows/payload-after-push.yaml
				@@ -5,98 +5,155 @@ on:

				      - main

				  workflow_dispatch:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				jobs:

				  build-assets-amd64:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      push-to-registry: yes

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  build-assets-arm64:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-arm64.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      push-to-registry: yes

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  build-assets-s390x:

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/build-kata-static-tarball-s390x.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      push-to-registry: yes

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				    secrets:

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-assets-ppc64le:

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/build-kata-static-tarball-ppc64le.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      push-to-registry: yes

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-amd64:

				    needs: build-assets-amd64

				    uses: ./.github/workflows/publish-kata-deploy-payload-amd64.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      registry: quay.io

				      repo: kata-containers/kata-deploy-ci

				      tag: kata-containers-latest-amd64

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				      runner: ubuntu-22.04

				      arch: amd64

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-arm64:

				    needs: build-assets-arm64

				    uses: ./.github/workflows/publish-kata-deploy-payload-arm64.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      registry: quay.io

				      repo: kata-containers/kata-deploy-ci

				      tag: kata-containers-latest-arm64

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				      runner: ubuntu-24.04-arm

				      arch: arm64

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-s390x:

				    needs: build-assets-s390x

				    uses: ./.github/workflows/publish-kata-deploy-payload-s390x.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      registry: quay.io

				      repo: kata-containers/kata-deploy-ci

				      tag: kata-containers-latest-s390x

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				      runner: s390x

				      arch: s390x

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-kata-deploy-payload-ppc64le:

				    needs: build-assets-ppc64le

				    uses: ./.github/workflows/publish-kata-deploy-payload-ppc64le.yaml

				    permissions:

				      contents: read

				      packages: write

				    uses: ./.github/workflows/publish-kata-deploy-payload.yaml

				    with:

				      commit-hash: ${{ github.sha }}

				      registry: quay.io

				      repo: kata-containers/kata-deploy-ci

				      tag: kata-containers-latest-ppc64le

				      target-branch: ${{ github.ref_name }}

				    secrets: inherit

				      runner: ubuntu-24.04-ppc64le

				      arch: ppc64le

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-manifest:

				    runs-on: ubuntu-latest

				    name: publish-manifest

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				      packages: write

				    needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x, publish-kata-deploy-payload-ppc64le]

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Push multi-arch manifest

				@@ -105,3 +162,42 @@ jobs:

				        env:

				          KATA_DEPLOY_IMAGE_TAGS: "kata-containers-latest"

				          KATA_DEPLOY_REGISTRIES: "quay.io/kata-containers/kata-deploy-ci"

				  upload-helm-chart-tarball:

				    name: upload-helm-chart-tarball

				    needs: publish-manifest

				    runs-on: ubuntu-22.04

				    permissions:

				      packages: write # needed to push the helm chart to ghcr.io

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Install helm

				        uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814 # v4.2.0

				        id: install

				      - name: Login to the OCI registries

				        env:

				          QUAY_DEPLOYER_USERNAME: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				          GITHUB_TOKEN: ${{ github.token }}

				        run: |

				          echo "${QUAY_DEPLOYER_PASSWORD}" | helm registry login quay.io --username "${QUAY_DEPLOYER_USERNAME}" --password-stdin

				          echo "${GITHUB_TOKEN}" | helm registry login ghcr.io --username "${GITHUB_ACTOR}" --password-stdin

				      - name: Push helm chart to the OCI registries

				        run: |

				          echo "Adjusting the Chart.yaml and values.yaml"

				          yq eval '.version = "0.0.0-dev" | .appVersion = "0.0.0-dev"' -i tools/packaging/kata-deploy/helm-chart/kata-deploy/Chart.yaml

				          yq eval '.image.reference = "quay.io/kata-containers/kata-deploy-ci" | .image.tag = "kata-containers-latest"' -i tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml

				          echo "Generating the chart package"

				          helm dependencies update tools/packaging/kata-deploy/helm-chart/kata-deploy

				          helm package tools/packaging/kata-deploy/helm-chart/kata-deploy

				          echo "Pushing the chart to the OCI registries"

				          helm push "kata-deploy-0.0.0-dev.tgz" oci://quay.io/kata-containers/kata-deploy-charts

				          helm push "kata-deploy-0.0.0-dev.tgz" oci://ghcr.io/kata-containers/kata-deploy-charts

									
.github/workflows/publish-kata-deploy-payload-amd64.yaml
				@@ -1,66 +0,0 @@

				name: CI | Publish kata-deploy payload for amd64

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  kata-payload:

				    runs-on: ubuntu-latest

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.registry == 'quay.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Login to Kata Containers ghcr.io

				        if: ${{ inputs.registry == 'ghcr.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: build-and-push-kata-payload

				        id: build-and-push-kata-payload

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				          $(pwd)/kata-static.tar.xz \

				          ${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

									
.github/workflows/publish-kata-deploy-payload-arm64.yaml
				@@ -1,71 +0,0 @@

				name: CI | Publish kata-deploy payload for arm64

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  kata-payload:

				    runs-on: arm64-builder

				    steps:

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-arm64${{ inputs.tarball-suffix }}

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.registry == 'quay.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Login to Kata Containers ghcr.io

				        if: ${{ inputs.registry == 'ghcr.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: build-and-push-kata-payload

				        id: build-and-push-kata-payload

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				          $(pwd)/kata-static.tar.xz \

				          ${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

									
										75

.github/workflows/publish-kata-deploy-payload-ppc64le.yaml
									
											
				@@ -1,75 +0,0 @@

				name: CI | Publish kata-deploy payload for ppc64le

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  kata-payload:

				    runs-on: ppc64le

				    steps:

				      - name: Prepare the self-hosted runner

				        run: |

				          ${HOME}/scripts/prepare_runner.sh

				          sudo rm -rf $GITHUB_WORKSPACE/*

				      - name: Adjust a permission for repo

				        run: |

				          sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-ppc64le${{ inputs.tarball-suffix }}

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.registry == 'quay.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Login to Kata Containers ghcr.io

				        if: ${{ inputs.registry == 'ghcr.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: build-and-push-kata-payload

				        id: build-and-push-kata-payload

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				          $(pwd)/kata-static.tar.xz \

				          ${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

									
										69

.github/workflows/publish-kata-deploy-payload-s390x.yaml
									
											
				@@ -1,69 +0,0 @@

				name: CI | Publish kata-deploy payload for s390x

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  kata-payload:

				    runs-on: s390x

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.registry == 'quay.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Login to Kata Containers ghcr.io

				        if: ${{ inputs.registry == 'ghcr.io' }}

				        uses: docker/login-action@v2

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: build-and-push-kata-payload

				        id: build-and-push-kata-payload

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				          $(pwd)/kata-static.tar.xz \

				          ${{ inputs.registry }}/${{ inputs.repo }} ${{ inputs.tag }}

									
										108

.github/workflows/publish-kata-deploy-payload.yaml
									
										Normal file
												
				@@ -0,0 +1,108 @@

				name: CI | Publish kata-deploy payload

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				      runner:

				        default: 'ubuntu-22.04'

				        description: The runner to execute the workflow on. Defaults to 'ubuntu-22.04'.

				        required: false

				        type: string

				      arch:

				        description: The arch of the tarball.

				        required: true

				        type: string

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  kata-payload:

				    name: kata-payload

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ${{ inputs.runner }}

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Remove unnecessary directories to free up space

				        run: |

				          sudo rm -rf /usr/local/.ghcup

				          sudo rm -rf /opt/hostedtoolcache/CodeQL

				          sudo rm -rf /usr/local/lib/android

				          sudo rm -rf /usr/share/dotnet

				          sudo rm -rf /opt/ghc

				          sudo rm -rf /usr/local/share/boost

				          sudo rm -rf /usr/lib/jvm

				          sudo rm -rf /usr/share/swift

				          sudo rm -rf /usr/local/share/powershell

				          sudo rm -rf /usr/local/julia*

				          sudo rm -rf /opt/az

				          sudo rm -rf /usr/local/share/chromium

				          sudo rm -rf /opt/microsoft

				          sudo rm -rf /opt/google

				          sudo rm -rf /usr/lib/firefox

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball for ${{ inputs.arch }}

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-${{ inputs.arch}}${{ inputs.tarball-suffix }}

				      - name: Login to Kata Containers quay.io

				        if: ${{ inputs.registry == 'quay.io' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Login to Kata Containers ghcr.io

				        if: ${{ inputs.registry == 'ghcr.io' }}

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: build-and-push-kata-payload for ${{ inputs.arch }}

				        id: build-and-push-kata-payload

				        env:

				          REGISTRY: ${{ inputs.registry }}

				          REPO: ${{ inputs.repo }}

				          TAG: ${{ inputs.tag }}

				        run: |

				          ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				          "$(pwd)/kata-static.tar.zst" \

				          "${REGISTRY}/${REPO}" \

				          "${TAG}"

									
										43

.github/workflows/push-oras-tarball-cache.yaml
									
										Normal file
												
				@@ -0,0 +1,43 @@

				# Push gperf and busybox tarballs to the ORAS cache (ghcr.io) so that

				# download-with-oras-cache.sh can pull them instead of hitting upstream.

				# Runs when versions.yaml changes on main (e.g. after a PR merge) or manually.

				name: CI | Push ORAS tarball cache

				on:

				  push:

				    branches:

				      - main

				    paths:

				      - 'versions.yaml'

				  workflow_dispatch:

				permissions: {}

				jobs:

				  push-oras-cache:

				    name: push-oras-cache

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				      packages: write

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install yq

				        run: ./ci/install_yq.sh

				      - name: Install ORAS

				        uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4

				        with:

				          version: "1.2.0"

				      - name: Populate ORAS tarball cache

				        run: ./tools/packaging/scripts/populate-oras-tarball-cache.sh all

				        env:

				          ARTEFACT_REGISTRY: ghcr.io

				          ARTEFACT_REPOSITORY: kata-containers

				          ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}

				          ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

									
										59

.github/workflows/release-amd64.yaml
									
												
				@@ -5,53 +5,78 @@ on:

				      target-arch:

				        required: true

				        type: string

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-amd64:

				    uses: ./.github/workflows/build-kata-static-tarball-amd64.yaml

				    with:

				      push-to-registry: yes

				      stage: release

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				  kata-deploy:

				    name: kata-deploy

				    needs: build-kata-static-tarball-amd64

				    runs-on: ubuntu-latest

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Login to Kata Containers docker.io

				        uses: docker/login-action@v2

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.DOCKER_USERNAME }}

				          password: ${{ secrets.DOCKER_PASSWORD }}

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64

				      - name: build-and-push-kata-deploy-ci-amd64

				        id: build-and-push-kata-deploy-ci-amd64

				        env:

				          TARGET_ARCH: ${{ inputs.target-arch }}

				        run: |

				          # We need to do such trick here as the format of the $GITHUB_REF

				          # is "refs/tags/<tag>"

				          tag=$(echo $GITHUB_REF | cut -d/ -f3-)

				          tag=$(echo "$GITHUB_REF" | cut -d/ -f3-)

				          if [ "${tag}" = "main" ]; then

				              tag=$(./tools/packaging/release/release.sh release-version)

				              tags=(${tag} "latest")

				              tags=("${tag}" "latest")

				          else

				              tags=(${tag})

				              tags=("${tag}")

				          fi

				          for tag in ${tags[@]}; do

				          for tag in "${tags[@]}"; do

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "ghcr.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				          done
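
The tag handling in the step above can be tried outside CI. What follows is a minimal sketch under stated assumptions, not the project's code: `derive_tags` is a hypothetical helper, the release version is passed in as an argument instead of being read from `./tools/packaging/release/release.sh release-version`, and positional parameters stand in for the bash array so the sketch also runs under a plain POSIX `sh`. It mirrors the `cut -d/ -f3-` extraction and the quoted expansion from the new side of the hunk.

```shell
# Hypothetical helper, not part of the repository.
# $1: git ref (e.g. refs/tags/3.27.0)
# $2: release version to use when the ref points at main (assumed input)
# $3: target architecture suffix
derive_tags() (
    ref=$1 release_version=$2 arch=$3

    # Keep everything after the second "/" ("refs/tags/<tag>" -> "<tag>").
    tag=$(echo "${ref}" | cut -d/ -f3-)

    if [ "${tag}" = "main" ]; then
        # On main, publish the release version plus a moving "latest" tag.
        set -- "${release_version}" "latest"
    else
        set -- "${tag}"
    fi

    # Quoted expansion keeps each tag as a single word.
    for tag in "$@"; do
        echo "${tag}-${arch}"
    done
)
```

For example, `derive_tags refs/tags/3.27.0 unused amd64` prints `3.27.0-amd64`, while a `refs/heads/main` ref yields one line for the release version and one for `latest`.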

									
										59

.github/workflows/release-arm64.yaml
									
												
				@@ -5,53 +5,78 @@ on:

				      target-arch:

				        required: true

				        type: string

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				      KBUILD_SIGN_PIN:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-arm64:

				    uses: ./.github/workflows/build-kata-static-tarball-arm64.yaml

				    with:

				      push-to-registry: yes

				      stage: release

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				  kata-deploy:

				    name: kata-deploy

				    needs: build-kata-static-tarball-arm64

				    runs-on: arm64-builder

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-24.04-arm

				    steps:

				      - name: Login to Kata Containers docker.io

				        uses: docker/login-action@v2

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.DOCKER_USERNAME }}

				          password: ${{ secrets.DOCKER_PASSWORD }}

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-arm64

				      - name: build-and-push-kata-deploy-ci-arm64

				        id: build-and-push-kata-deploy-ci-arm64

				        env:

				          TARGET_ARCH: ${{ inputs.target-arch }}

				        run: |

				          # We need to do such trick here as the format of the $GITHUB_REF

				          # is "refs/tags/<tag>"

				          tag=$(echo $GITHUB_REF | cut -d/ -f3-)

				          tag=$(echo "$GITHUB_REF" | cut -d/ -f3-)

				          if [ "${tag}" = "main" ]; then

				              tag=$(./tools/packaging/release/release.sh release-version)

				              tags=(${tag} "latest")

				              tags=("${tag}" "latest")

				          else

				              tags=(${tag})

				              tags=("${tag}")

				          fi

				          for tag in ${tags[@]}; do

				          for tag in "${tags[@]}"; do

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "ghcr.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				          done
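
The hunk above replaces the inline `${{ inputs.target-arch }}` expression in the `run:` script with a `TARGET_ARCH` environment variable. The diff does not say why. One likely reason, and an assumption here, is the script-injection concern that GitHub's hardening guidance describes: an inline expression is pasted into the script text before the shell parses it, whereas an environment variable is only ever expanded as data. The sketch below simulates both forms locally; `run_inline`, `run_env`, and the payload are hypothetical names chosen for illustration.

```shell
# Simulates inline templating: the value becomes part of the script text
# before the shell parses it, so shell syntax inside the value is executed.
run_inline() {
    sh -c "echo \"tag suffix: $1\""
}

# Simulates the env-based form: the script text is fixed and the value is
# only expanded as data inside double quotes.
run_env() {
    TARGET_ARCH="$1" sh -c 'echo "tag suffix: ${TARGET_ARCH}"'
}

# A value that closes the quote and smuggles in an extra command.
payload='amd64"; echo INJECTED; echo "'

run_inline "${payload}"   # also runs the embedded `echo INJECTED`
run_env "${payload}"      # prints the payload verbatim
```

With a well-formed architecture name such as `amd64` both forms behave identically, so the difference only matters when the input is not trusted.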

									
										61

.github/workflows/release-ppc64le.yaml
									
												
				@@ -5,58 +5,75 @@ on:

				      target-arch:

				        required: true

				        type: string

				    secrets:

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-ppc64le:

				    uses: ./.github/workflows/build-kata-static-tarball-ppc64le.yaml

				    with:

				      push-to-registry: yes

				      stage: release

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				  kata-deploy:

				    name: kata-deploy

				    needs: build-kata-static-tarball-ppc64le

				    runs-on: ppc64le

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-24.04-ppc64le

				    steps:

				      - name: Prepare the self-hosted runner

				        run: |

				          bash ${HOME}/scripts/prepare_runner.sh

				          sudo rm -rf $GITHUB_WORKSPACE/*

				      - name: Login to Kata Containers docker.io

				        uses: docker/login-action@v2

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.DOCKER_USERNAME }}

				          password: ${{ secrets.DOCKER_PASSWORD }}

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-ppc64le

				      - name: build-and-push-kata-deploy-ci-ppc64le

				        id: build-and-push-kata-deploy-ci-ppc64le

				        env:

				          TARGET_ARCH: ${{ inputs.target-arch }}

				        run: |

				          # We need to do such trick here as the format of the $GITHUB_REF

				          # is "refs/tags/<tag>"

				          tag=$(echo $GITHUB_REF | cut -d/ -f3-)

				          tag=$(echo "$GITHUB_REF" | cut -d/ -f3-)

				          if [ "${tag}" = "main" ]; then

				              tag=$(./tools/packaging/release/release.sh release-version)

				              tags=(${tag} "latest")

				              tags=("${tag}" "latest")

				          else

				              tags=(${tag})

				              tags=("${tag}")

				          fi

				          for tag in ${tags[@]}; do

				          for tag in "${tags[@]}"; do

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "ghcr.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				          done

									
										64

.github/workflows/release-s390x.yaml
									
												
				@@ -5,57 +5,79 @@ on:

				      target-arch:

				        required: true

				        type: string

				    secrets:

				      CI_HKD_PATH:

				        required: true

				      QUAY_DEPLOYER_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  build-kata-static-tarball-s390x:

				    uses: ./.github/workflows/build-kata-static-tarball-s390x.yaml

				    with:

				      push-to-registry: yes

				      stage: release

				    secrets: inherit

				    secrets:

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				  kata-deploy:

				    name: kata-deploy

				    needs: build-kata-static-tarball-s390x

				    runs-on: s390x

				    permissions:

				      contents: read

				      packages: write

				    runs-on: ubuntu-24.04-s390x

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - name: Login to Kata Containers docker.io

				        uses: docker/login-action@v2

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          username: ${{ secrets.DOCKER_USERNAME }}

				          password: ${{ secrets.DOCKER_PASSWORD }}

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-s390x

				      - name: build-and-push-kata-deploy-ci-s390x

				        id: build-and-push-kata-deploy-ci-s390x

				        env:

				          TARGET_ARCH: ${{ inputs.target-arch }}

				        run: |

				          # We need to do such trick here as the format of the $GITHUB_REF

				          # is "refs/tags/<tag>"

				          tag=$(echo $GITHUB_REF | cut -d/ -f3-)

				          tag=$(echo "$GITHUB_REF" | cut -d/ -f3-)

				          if [ "${tag}" = "main" ]; then

				              tag=$(./tools/packaging/release/release.sh release-version)

				              tags=(${tag} "latest")

				              tags=("${tag}" "latest")

				          else

				              tags=(${tag})

				              tags=("${tag}")

				          fi

				          for tag in ${tags[@]}; do

				          for tag in "${tags[@]}"; do

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "docker.io/katadocker/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "ghcr.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				              ./tools/packaging/kata-deploy/local-build/kata-deploy-build-and-upload-payload.sh \

				                  $(pwd)/kata-static.tar.xz "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${{ inputs.target-arch }}"

				                  "$(pwd)"/kata-static.tar.zst "quay.io/kata-containers/kata-deploy" \

				                  "${tag}-${TARGET_ARCH}"

				          done

									
										233

.github/workflows/release.yaml
									
												
				@@ -2,19 +2,20 @@ name: Release Kata Containers

				on:

				  workflow_dispatch

				permissions: {}

				jobs:

				  release:

				    runs-on: ubuntu-latest

				    name: release

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release create` command

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@v4

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				      - name: Get the new release version

				        run: |

				          release_version=$(./tools/packaging/release/release.sh release-version)

				          echo "RELEASE_VERSION=$release_version" >> "$GITHUB_ENV"

				          persist-credentials: false

				      - name: Create a new release

				        run: |

				@@ -24,50 +25,84 @@ jobs:

				  build-and-push-assets-amd64:

				    needs: release

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/release-amd64.yaml

				    with:

				      target-arch: amd64

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  build-and-push-assets-arm64:

				    needs: release

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/release-arm64.yaml

				    with:

				      target-arch: arm64

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      KBUILD_SIGN_PIN: ${{ secrets.KBUILD_SIGN_PIN }}

				  build-and-push-assets-s390x:

				    needs: release

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/release-s390x.yaml

				    with:

				      target-arch: s390x

				    secrets: inherit

				    secrets:

				      CI_HKD_PATH: ${{ secrets.CI_HKD_PATH }}

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  build-and-push-assets-ppc64le:

				    needs: release

				    permissions:

				      contents: read

				      packages: write

				      id-token: write

				      attestations: write

				    uses: ./.github/workflows/release-ppc64le.yaml

				    with:

				      target-arch: ppc64le

				    secrets: inherit

				    secrets:

				      QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				  publish-multi-arch-images:

				    runs-on: ubuntu-latest

				    name: publish-multi-arch-images

				    runs-on: ubuntu-22.04

				    needs: [build-and-push-assets-amd64, build-and-push-assets-arm64, build-and-push-assets-s390x, build-and-push-assets-ppc64le]

				    permissions:

				      contents: write # needed for the `gh release` commands

				      packages: write # needed to push the multi-arch manifest to ghcr.io

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@v4

				      - name: Login to Kata Containers docker.io

				        uses: docker/login-action@v2

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          username: ${{ secrets.DOCKER_USERNAME }}

				          password: ${{ secrets.DOCKER_PASSWORD }}

				          persist-credentials: false

				      - name: Login to Kata Containers ghcr.io

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: ghcr.io

				          username: ${{ github.actor }}

				          password: ${{ secrets.GITHUB_TOKEN }}

				      - name: Login to Kata Containers quay.io

				        uses: docker/login-action@v2

				        uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

				        with:

				          registry: quay.io

				          username: ${{ secrets.QUAY_DEPLOYER_USERNAME }}

				          username: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				      - name: Get the image tags

				@@ -75,96 +110,200 @@ jobs:

				          release_version=$(./tools/packaging/release/release.sh release-version)

				          echo "KATA_DEPLOY_IMAGE_TAGS=$release_version latest" >> "$GITHUB_ENV"

				      - name: Push multi-arch manifest

				      - name: Publish multi-arch manifest on quay.io & ghcr.io

				        run: |

				          ./tools/packaging/release/release.sh publish-multiarch-manifest

				        env:

				          KATA_DEPLOY_REGISTRIES: "quay.io/kata-containers/kata-deploy docker.io/katadocker/kata-deploy"

				          KATA_DEPLOY_REGISTRIES: "quay.io/kata-containers/kata-deploy ghcr.io/kata-containers/kata-deploy"

				  upload-multi-arch-static-tarball:

				    name: upload-multi-arch-static-tarball

				    needs: [build-and-push-assets-amd64, build-and-push-assets-arm64, build-and-push-assets-s390x, build-and-push-assets-ppc64le]

				    runs-on: ubuntu-latest

				    permissions:

				      contents: write # needed for the `gh release` commands

				    runs-on: ubuntu-22.04

				    steps:

				      - uses: actions/checkout@v4

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Set KATA_STATIC_TARBALL env var

				        run: |

				          tarball=$(pwd)/kata-static.tar.xz

				          tarball=$(pwd)/kata-static.tar.zst

				          echo "KATA_STATIC_TARBALL=${tarball}" >> "$GITHUB_ENV"

				      - name: download-artifacts-amd64

				        uses: actions/download-artifact@v3

				      - name: Download amd64 artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64

				      - name: push amd64 static tarball to github

				      - name: Upload amd64 static tarball to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-kata-static-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				          ARCHITECTURE: amd64

				      - name: download-artifacts-arm64

				        uses: actions/download-artifact@v3

				      - name: Download arm64 artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-arm64

				      - name: push arm64 static tarball to github

				      - name: Upload arm64 static tarball to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-kata-static-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				          ARCHITECTURE: arm64

				      - name: download-artifacts-s390x

				        uses: actions/download-artifact@v3

				      - name: Download s390x artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-s390x

				      - name: push s390x static tarball to github

				      - name: Upload s390x static tarball to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-kata-static-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				          ARCHITECTURE: s390x

				      - name: download-artifacts-ppc64le

				        uses: actions/download-artifact@v3

				      - name: Download ppc64le artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-ppc64le

				      - name: push ppc64le static tarball to github

				      - name: Upload ppc64le static tarball to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-kata-static-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				          ARCHITECTURE: ppc64le

				      - name: Set KATA_TOOLS_STATIC_TARBALL env var

				        run: |

				          tarball=$(pwd)/kata-tools-static.tar.zst

				          echo "KATA_TOOLS_STATIC_TARBALL=${tarball}" >> "$GITHUB_ENV"

				      - name: Download amd64 tools artifacts

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64

				      - name: Upload amd64 static tarball tools to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-kata-tools-static-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				          ARCHITECTURE: amd64

				  upload-versions-yaml:

				    name: upload-versions-yaml

				    needs: release

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release` commands

				    steps:

				      - uses: actions/checkout@v4

				      - name: upload versions.yaml

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Upload versions.yaml to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-versions-yaml-file

				        env:

				          GH_TOKEN: ${{ github.token }}

				  upload-cargo-vendored-tarball:

				    name: upload-cargo-vendored-tarball

				    needs: release

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release` commands

				    steps:

				      - uses: actions/checkout@v4

				      - name: generate-and-upload-tarball

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Generate and upload vendored code tarball

				        run: |

				          ./tools/packaging/release/release.sh upload-vendored-code-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				  upload-libseccomp-tarball:

				    name: upload-libseccomp-tarball

				    needs: release

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release` commands

				    steps:

				      - uses: actions/checkout@v4

				      - name: download-and-upload-tarball

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Download libseccomp tarball and upload it to GitHub

				        run: |

				          ./tools/packaging/release/release.sh upload-libseccomp-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				  upload-helm-chart-tarball:

				    name: upload-helm-chart-tarball

				    needs: release

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release` commands

				      packages: write # needed to push the helm chart to ghcr.io

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Install helm

				        uses: azure/setup-helm@fe7b79cd5ee1e45176fcad797de68ecaf3ca4814 # v4.2.0

				        id: install

				      - name: Generate and upload helm chart tarball

				        run: |

				          ./tools/packaging/release/release.sh upload-helm-chart-tarball

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: Login to the OCI registries

				        env:

				          QUAY_DEPLOYER_USERNAME: ${{ vars.QUAY_DEPLOYER_USERNAME }}

				          QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}

				          GITHUB_TOKEN: ${{ github.token }}

				        run: |

				          echo "${QUAY_DEPLOYER_PASSWORD}" | helm registry login quay.io --username "${QUAY_DEPLOYER_USERNAME}" --password-stdin

				          echo "${GITHUB_TOKEN}" | helm registry login ghcr.io --username "${GITHUB_ACTOR}" --password-stdin

				      - name: Push helm chart to the OCI registries

				        run: |

				          release_version=$(./tools/packaging/release/release.sh release-version)

				          helm push "kata-deploy-${release_version}.tgz" oci://quay.io/kata-containers/kata-deploy-charts

				          helm push "kata-deploy-${release_version}.tgz" oci://ghcr.io/kata-containers/kata-deploy-charts

				  publish-release:

				    name: publish-release

				    needs: [ build-and-push-assets-amd64, build-and-push-assets-arm64, build-and-push-assets-s390x, build-and-push-assets-ppc64le, publish-multi-arch-images, upload-multi-arch-static-tarball, upload-versions-yaml, upload-cargo-vendored-tarball, upload-libseccomp-tarball ]

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: write # needed for the `gh release` commands

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: Publish a release

				        run: |

				          ./tools/packaging/release/release.sh publish-release

				        env:

				          GH_TOKEN: ${{ github.token }}

									
.github/workflows/run-cri-containerd-tests-ppc64le.yaml
											
				@@ -1,67 +0,0 @@

				name: CI | Run cri-containerd tests on ppc64le

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-cri-containerd:

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # none of the tests are flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance

				      fail-fast: false

				      matrix:

				        containerd_version: ['active']

				        vmm: ['qemu']

				    runs-on: ppc64le

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - name: Adjust a permission for repo

				        run: sudo chown -R $USER:$USER $GITHUB_WORKSPACE

				      - name: Prepare the self-hosted runner

				        run: |

				          bash ${HOME}/scripts/prepare_runner.sh cri-containerd

				          sudo rm -rf $GITHUB_WORKSPACE/*

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-ppc64le${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts

				      - name: Run cri-containerd tests

				        run: bash tests/integration/cri-containerd/gha-run.sh run

				      - name: Cleanup actions for the self hosted runner

				        run: ${HOME}/scripts/cleanup_runner.sh

									
.github/workflows/run-cri-containerd-tests-s390x.yaml
											
				@@ -1,63 +0,0 @@

				name: CI | Run cri-containerd tests

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-cri-containerd:

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # none of the tests are flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance

				      fail-fast: false

				      matrix:

				        containerd_version: ['active']

				        vmm: ['qemu']

				    runs-on: s390x

				    env:

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts

				      - name: Run cri-containerd tests

				        run: bash tests/integration/cri-containerd/gha-run.sh run

				      - name: Take a post-action for self-hosted runner

				        if: always()

				        run: ${HOME}/script/post_action.sh ubuntu-2204

									
.github/workflows/run-cri-containerd-tests.yaml
												
				@@ -0,0 +1,75 @@

				name: CI | Run cri-containerd tests

				permissions: {}

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				      runner:

				        description: The runner to execute the workflow on.

				        required: true

				        type: string

				      arch:

				        description: The arch of the tarball.

				        required: true

				        type: string

				      containerd_version:

				        description: The version of containerd for testing.

				        required: true

				        type: string

				      vmm:

				        description: The kata hypervisor for testing.

				        required: true

				        type: string

				jobs:

				  run-cri-containerd:

				    name: run-cri-containerd-${{ inputs.arch }} (${{ inputs.containerd_version }}, ${{ inputs.vmm }})

				    strategy:

				      fail-fast: false

				    runs-on: ${{ inputs.runner }}

				    env:

				      CONTAINERD_VERSION: ${{ inputs.containerd_version }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ inputs.vmm }}

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        timeout-minutes: 15

				        run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball for ${{ inputs.arch }}

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-${{ inputs.arch }}${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/cri-containerd/gha-run.sh install-kata kata-artifacts

				      - name: Run cri-containerd tests for ${{ inputs.arch }}

				        timeout-minutes: 10

				        run: bash tests/integration/cri-containerd/gha-run.sh run

									
.github/workflows/run-k8s-tests-on-aks.yaml
												
				@@ -24,9 +24,21 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				        required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				permissions: {}

				jobs:

				  run-k8s-tests:

				    name: run-k8s-tests

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -36,7 +48,7 @@ jobs:

				          - clh

				          - dragonball

				          - qemu

				          - stratovirt

				          - qemu-runtime-rs

				          - cloud-hypervisor

				        instance-type:

				          - small

				@@ -45,10 +57,19 @@ jobs:

				          - host_os: cbl-mariner

				            vmm: clh

				            instance-type: small

				            genpolicy-pull-method: oci-distribution

				          - host_os: cbl-mariner

				            vmm: clh

				            instance-type: small

				            genpolicy-pull-method: containerd

				          - host_os: cbl-mariner

				            vmm: clh

				            instance-type: normal

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    permissions:

				      contents: read

				      id-token: write # Used for OIDC access to log into Azure

				    environment: ci

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				@@ -56,18 +77,15 @@ jobs:

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HOST_OS: ${{ matrix.host_os }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      # Set to install the KBS for attestation tests

				      KBS: ${{ (matrix.vmm == 'qemu' && matrix.host_os == 'ubuntu') && 'true' || 'false' }}

				      # Set the KBS ingress handler (empty string disables handling)

				      KBS_INGRESS: "aks"

				      KUBERNETES: "vanilla"

				      USING_NFD: "false"

				      K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}

				      GENPOLICY_PULL_METHOD: ${{ matrix.genpolicy-pull-method }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -75,57 +93,67 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Download Azure CLI

				        run: bash tests/integration/kubernetes/gha-run.sh install-azure-cli

				        uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1

				        with:

				          version: 'latest'

				      - name: Log into the Azure account

				        run: bash tests/integration/kubernetes/gha-run.sh login-azure

				        env:

				          AZ_APPID: ${{ secrets.AZ_APPID }}

				          AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}

				          AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				          AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Create AKS cluster

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh create-cluster

				        uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08 # v3.0.2

				        with:

				          timeout_minutes: 15

				          max_attempts: 20

				          retry_on: error

				          retry_wait_seconds: 10

				          command: bash tests/integration/kubernetes/gha-run.sh create-cluster

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Install `kubectl`

				        run: bash tests/integration/kubernetes/gha-run.sh install-kubectl

				        uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1

				        with:

				          version: 'latest'

				      - name: Download credentials for the Kubernetes CLI to use them

				        run: bash tests/integration/kubernetes/gha-run.sh get-cluster-credentials

				      - name: Deploy Kata

				        timeout-minutes: 10

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks

				      - name: Deploy CoCo KBS

				        if: env.KBS == 'true'

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				      - name: Install `kbs-client`

				        if: env.KBS == 'true'

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				      - name: Run tests

				        timeout-minutes: 60

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Refresh OIDC token in case access token expired

				        if: always()

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Delete AKS cluster

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh delete-cluster

									
.github/workflows/run-k8s-tests-on-arm64.yaml
												
				@@ -0,0 +1,91 @@

				name: CI | Run kubernetes tests on arm64

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-k8s-tests-on-arm64:

				    name: run-k8s-tests-on-arm64

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu

				          - qemu-runtime-rs

				        k8s:

				          - kubeadm

				    runs-on: arm64-k8s

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				      K8S_TEST_HOST_TYPE: all

				      TARGET_ARCH: "aarch64"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy Kata

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Run tests

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Collect artifacts ${{ matrix.vmm }}

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

				        continue-on-error: true

				      - name: Archive artifacts ${{ matrix.vmm }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: k8s-tests-${{ matrix.vmm }}-${{ matrix.k8s }}-${{ inputs.tag }}

				          path: /tmp/artifacts

				          retention-days: 1

				      - name: Delete kata-deploy

				        if: always()

				        timeout-minutes: 15

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup

									
.github/workflows/run-k8s-tests-on-garm.yaml
											
				@@ -1,100 +0,0 @@

				name: CI | Run kubernetes tests on GARM

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-k8s-tests:

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - clh #cloud-hypervisor

				          - dragonball

				          - fc #firecracker

				          - qemu

				          - cloud-hypervisor

				        snapshotter:

				          - devmapper

				        k8s:

				          - k3s

				        instance:

				          - garm-ubuntu-2004

				          - garm-ubuntu-2004-smaller

				        include:

				          - instance: garm-ubuntu-2004

				            instance-type: normal

				          - instance: garm-ubuntu-2004-smaller

				            instance-type: small

				    runs-on: ${{ matrix.instance }}

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      USING_NFD: "false"

				      K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy ${{ matrix.k8s }}

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s

				      - name: Configure the ${{ matrix.snapshotter }} snapshotter

				        run: bash tests/integration/kubernetes/gha-run.sh configure-snapshotter

				      - name: Deploy Kata

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Run tests

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Collect artifacts ${{ matrix.vmm }}

				        run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

				      - name: Archive artifacts ${{ matrix.vmm }}

				        uses: actions/upload-artifact@v3

				        with:

				          name: k8s-tests-garm-${{ matrix.vmm }}

				          path: /tmp/artifacts

				          retention-days: 1

				      - name: Delete kata-deploy

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

									
.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml
												
				@@ -0,0 +1,131 @@

				name: CI | Run NVIDIA GPU kubernetes tests on amd64

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: true

				        type: string

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      NGC_API_KEY:

				        required: true

				permissions: {}

				jobs:

				  run-nvidia-gpu-tests-on-amd64:

				    name: run-${{ matrix.environment.name }}-tests-on-amd64

				    strategy:

				      fail-fast: false

				      matrix:

				        environment: [

				          { name: nvidia-gpu,     vmm: qemu-nvidia-gpu,     runner: amd64-nvidia-a100 },

				          { name: nvidia-gpu-snp, vmm: qemu-nvidia-gpu-snp, runner: amd64-nvidia-h100-snp },

				        ]

				    runs-on: ${{ matrix.environment.runner }}

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.environment.vmm }}

				      KUBERNETES: kubeadm

				      KBS: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}

				      K8S_TEST_HOST_TYPE: baremetal

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Uninstall previous `kbs-client`

				        if: matrix.environment.name != 'nvidia-gpu'

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh uninstall-kbs-client

				      - name: Deploy CoCo KBS

				        if: matrix.environment.name != 'nvidia-gpu'

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				        env:

				          NVIDIA_VERIFIER_MODE: remote

				          KBS_INGRESS: nodeport

				      - name: Install `kbs-client`

				        if: matrix.environment.name != 'nvidia-gpu'

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				      - name: Deploy Kata

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Run tests ${{ matrix.environment.vmm }}

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-nv-tests

				        env:

				          NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Collect artifacts ${{ matrix.environment.vmm }}

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

				        continue-on-error: true

				      - name: Archive artifacts ${{ matrix.environment.vmm }}

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: k8s-tests-${{ matrix.environment.vmm }}-kubeadm-${{ inputs.tag }}

				          path: /tmp/artifacts

				          retention-days: 1

				      - name: Delete kata-deploy

				        if: always()

				        timeout-minutes: 15

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup

				      - name: Delete CoCo KBS

				        if: always() && matrix.environment.name != 'nvidia-gpu'

				        timeout-minutes: 10

				        run: |

				          bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
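The `KBS` entry in the `env` block above uses the `cond && a || b` expression idiom, which acts like a ternary as long as the middle value is truthy (here it is the non-empty string `'true'`). A minimal shell sketch of the same decision follows; `kbs_for_environment` is a hypothetical name used only for illustration and is not part of `gha-run.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the workflow expression:
#   matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false'
kbs_for_environment() {
    local name="$1"
    if [[ "${name}" == "nvidia-gpu-snp" ]]; then
        # Only the SNP environment enables the KBS-related steps.
        echo "true"
    else
        echo "false"
    fi
}
```

With the two matrix entries shown above, `kbs_for_environment nvidia-gpu` prints `false` and `kbs_for_environment nvidia-gpu-snp` prints `true`, which lines up with the steps guarded by `matrix.environment.name != 'nvidia-gpu'`.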

									
.github/workflows/run-k8s-tests-on-ppc64le.yaml
				@@ -22,8 +22,11 @@ on:

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-k8s-tests:

				    name: run-k8s-tests

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -31,27 +34,22 @@ jobs:

				          - qemu

				        k8s:

				          - kubeadm

				    runs-on: ppc64le

				    runs-on: ppc64le-k8s

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				      USING_NFD: "false"

				      TARGET_ARCH: "ppc64le"

				    steps:

				      - name: Prepare the self-hosted runner

				        run: | 

				          bash ${HOME}/scripts/prepare_runner.sh kubernetes

				          sudo rm -rf $GITHUB_WORKSPACE/*

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -62,21 +60,22 @@ jobs:

				      - name: Install golang

				        run: |

				          ./tests/install_go.sh -f -p

				          echo "/usr/local/go/bin" >> $GITHUB_PATH

				          echo "/usr/local/go/bin" >> "$GITHUB_PATH"

				      - name: Prepare the runner for k8s cluster creation

				        run: bash ${HOME}/scripts/k8s_cluster_cleanup.sh

				      - name: Prepare the runner for k8s test suite

				        run: bash "${HOME}/scripts/k8s_cluster_prepare.sh"

				      - name: Create k8s cluster using kubeadm

				        run: bash ${HOME}/scripts/k8s_cluster_create.sh

				      - name: Check if cluster is healthy to run the tests

				        run: bash "${HOME}/scripts/k8s_cluster_check.sh"

				      - name: Deploy Kata

				        timeout-minutes: 10

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-kubeadm

				      - name: Run tests

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Delete cluster and post cleanup actions

				        run: bash ${HOME}/scripts/k8s_cluster_cleanup.sh

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

									
.github/workflows/run-k8s-tests-on-zvsi.yaml
				@@ -21,37 +21,68 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  run-k8s-tests:

				    name: run-k8s-tests

				    strategy:

				      fail-fast: false

				      matrix:

				        snapshotter:

				          - overlayfs

				          - devmapper

				          - nydus

				        vmm:

				          - qemu

				        snapshotter:

				          - devmapper

				          - qemu-runtime-rs

				          - qemu-coco-dev

				        k8s:

				          - k3s

				    runs-on: s390x

				          - kubeadm

				        include:

				          - snapshotter: devmapper

				            pull-type: default

				            deploy-cmd: configure-snapshotter

				          - snapshotter: nydus

				            pull-type: guest-pull

				            deploy-cmd: deploy-snapshotter

				        exclude:

				          - snapshotter: overlayfs

				            vmm: qemu

				          - snapshotter: overlayfs

				            vmm: qemu-coco-dev

				          - snapshotter: devmapper

				            vmm: qemu-runtime-rs

				          - snapshotter: devmapper

				            vmm: qemu-coco-dev

				          - snapshotter: nydus

				            vmm: qemu

				          - snapshotter: nydus

				            vmm: qemu-runtime-rs

				    runs-on: s390x-large

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HOST_OS: "ubuntu"

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: "k3s"

				      KUBERNETES: ${{ matrix.k8s }}

				      PULL_TYPE: ${{ matrix.pull-type }}

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      USING_NFD: "true"

				      TARGET_ARCH: "s390x"

				      AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				    steps:

				      - name: Take a pre-action for self-hosted runner

				        run: ${HOME}/script/pre_action.sh ubuntu-2204

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -59,22 +90,60 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy ${{ matrix.k8s }}

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s

				      - name: Set SNAPSHOTTER to empty if overlayfs

				        run: echo "SNAPSHOTTER=" >> "$GITHUB_ENV"

				        if: ${{ matrix.snapshotter == 'overlayfs' }}

				      - name: Set KBS and KBS_INGRESS if qemu-coco-dev

				        run: |

				          echo "KBS=true" >> "$GITHUB_ENV"

				          echo "KBS_INGRESS=nodeport" >> "$GITHUB_ENV"

				        if: ${{ matrix.vmm == 'qemu-coco-dev' }}

				      # qemu-runtime-rs only works with overlayfs

				      # See: https://github.com/kata-containers/kata-containers/issues/10066

				      - name: Configure the ${{ matrix.snapshotter }} snapshotter

				        run: bash tests/integration/kubernetes/gha-run.sh configure-snapshotter

				        env:

				          DEPLOY_CMD: ${{ matrix.deploy-cmd }}

				        run: bash tests/integration/kubernetes/gha-run.sh "${DEPLOY_CMD}"

				        if: ${{ matrix.snapshotter != 'overlayfs' }}

				      - name: Deploy Kata

				        timeout-minutes: 10

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-zvsi

				      - name: Uninstall previous `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh uninstall-kbs-client

				        if: ${{ matrix.vmm == 'qemu-coco-dev' }}

				      - name: Deploy CoCo KBS

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				        if: ${{ matrix.vmm == 'qemu-coco-dev' }}

				      - name: Install `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				        if: ${{ matrix.vmm == 'qemu-coco-dev' }}

				      - name: Run tests

				        timeout-minutes: 30

				        timeout-minutes: 60

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Take a post-action

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Delete kata-deploy

				        if: always()

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-zvsi

				      - name: Delete CoCo KBS

				        if: always()

				        timeout-minutes: 10

				        run: |

				          bash tests/integration/kubernetes/gha-run.sh cleanup-zvsi || true

				          ${HOME}/script/post_action.sh ubuntu-2204

				          if [ "${KBS}" == "true" ]; then

				            bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs

				          fi
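The `exclude` list in the matrix above prunes most of the snapshotter/vmm cross product. The sketch below enumerates what is left, assuming the added lines of the hunk define `snapshotter` as overlayfs, devmapper, nydus and `vmm` as qemu, qemu-runtime-rs, qemu-coco-dev (the rendered diff no longer marks added and removed lines, so this reading is an assumption). `expand_matrix` is a hypothetical helper, not something the workflow calls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the snapshotter/vmm pairs that survive the
# `exclude` entries, one pair per line.
expand_matrix() {
    local snapshotters=(overlayfs devmapper nydus)
    local vmms=(qemu qemu-runtime-rs qemu-coco-dev)
    # Pairs listed under `exclude:`, written as snapshotter:vmm.
    local excluded=" overlayfs:qemu overlayfs:qemu-coco-dev"
    excluded+=" devmapper:qemu-runtime-rs devmapper:qemu-coco-dev"
    excluded+=" nydus:qemu nydus:qemu-runtime-rs "
    local s v
    for s in "${snapshotters[@]}"; do
        for v in "${vmms[@]}"; do
            # Skip any pair that matches an exclude entry.
            [[ "${excluded}" == *" ${s}:${v} "* ]] && continue
            echo "${s} ${v}"
        done
    done
}
```

Under that reading, each snapshotter ends up paired with exactly one hypervisor, which is consistent with the `qemu-runtime-rs only works with overlayfs` comment in the steps.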

									
.github/workflows/run-k8s-tests-with-crio-on-garm.yaml
				@@ -1,86 +0,0 @@

				name: CI | Run kubernetes tests, using CRI-O, on GARM

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-k8s-tests:

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu

				        k8s:

				          - k0s

				        instance:

				          - garm-ubuntu-2204

				          - garm-ubuntu-2204-smaller

				        include:

				          - instance: garm-ubuntu-2204

				            instance-type: normal

				          - instance: garm-ubuntu-2204-smaller

				            instance-type: small

				          - k8s: k0s

				            k8s-extra-params: '--cri-socket remote:unix:///var/run/crio/crio.sock --kubelet-extra-args --cgroup-driver="systemd"'

				    runs-on: ${{ matrix.instance }}

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				      KUBERNETES_EXTRA_PARAMS: ${{ matrix.k8s-extra-params }}

				      USING_NFD: "false"

				      K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Configure CRI-O

				        run: bash tests/integration/kubernetes/gha-run.sh setup-crio

				      - name: Deploy ${{ matrix.k8s }}

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s

				      - name: Deploy Kata

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-garm

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Run tests

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Delete kata-deploy

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-garm

									
.github/workflows/run-kata-coco-stability-tests.yaml
				@@ -0,0 +1,157 @@

				name: CI | Run Kata CoCo k8s Stability Tests

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				      tarball-suffix:

				        required: false

				        type: string

				    secrets:

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				       required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				      AUTHENTICATED_IMAGE_PASSWORD:

				        required: true

				permissions: {}

				jobs:

				  # Generate jobs for testing CoCo on non-TEE environments

				  run-stability-k8s-tests-coco-nontee:

				    name: run-stability-k8s-tests-coco-nontee

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu-coco-dev

				          - qemu-coco-dev-runtime-rs

				        snapshotter:

				          - nydus

				        pull-type:

				          - guest-pull

				    runs-on: ubuntu-22.04

				    permissions:

				      id-token: write # Used for OIDC access to log into Azure

				    environment: ci

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      # Some tests rely on that variable to run (or not)

				      KBS: "true"

				      # Set the KBS ingress handler (empty string disables handling)

				      KBS_INGRESS: "aks"

				      KUBERNETES: "vanilla"

				      PULL_TYPE: ${{ matrix.pull-type }}

				      AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Log into the Azure account

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Create AKS cluster

				        uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08 # v3.0.2

				        with:

				          timeout_minutes: 15

				          max_attempts: 20

				          retry_on: error

				          retry_wait_seconds: 10

				          command: bash tests/integration/kubernetes/gha-run.sh create-cluster

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Install `kubectl`

				        uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1

				        with:

				          version: 'latest'

				      - name: Download credentials for the Kubernetes CLI to use them

				        run: bash tests/integration/kubernetes/gha-run.sh get-cluster-credentials

				      - name: Deploy Snapshotter

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-snapshotter

				      - name: Deploy Kata

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks

				      - name: Deploy CoCo KBS

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				      - name: Install `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				      - name: Run stability tests

				        timeout-minutes: 300

				        run: bash tests/stability/gha-stability-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Refresh OIDC token in case access token expired

				        if: always()

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Delete AKS cluster

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh delete-cluster
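The `Create AKS cluster` step above hands its command to `nick-fields/retry` with `max_attempts: 20` and `retry_wait_seconds: 10`. A rough shell equivalent of that behaviour is sketched below. It is not the action's implementation: `retry` is a hypothetical helper, and the per-attempt `timeout_minutes` limit is not modelled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempt
# budget is used up, sleeping between failed attempts.
# Usage: retry <max_attempts> <wait_seconds> <command> [args...]
retry() {
    local max_attempts="$1" wait_seconds="$2"
    shift 2
    local attempt
    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Wait before the next attempt, but not after the last one.
        (( attempt < max_attempts )) && sleep "${wait_seconds}"
    done
    return 1
}
```

With this helper, the step would roughly correspond to `retry 20 10 bash tests/integration/kubernetes/gha-run.sh create-cluster`.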

									
.github/workflows/run-kata-coco-tests.yaml
				@@ -2,6 +2,9 @@ name: CI | Run kata coco tests

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      registry:

				        required: true

				        type: string

				@@ -21,90 +24,54 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      AUTHENTICATED_IMAGE_PASSWORD:

				        required: true

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				       required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				      ITA_KEY:

				        required: true

				permissions: {}

				jobs:

				  run-k8s-tests-on-tdx:

				  run-k8s-tests-on-tee:

				    name: run-k8s-tests-on-tee

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu-tdx

				        snapshotter:

				          - nydus

				        pull-type:

				          - guest-pull

				    runs-on: tdx

				        include:

				          - runner: tdx

				            vmm: qemu-tdx

				          - runner: sev-snp

				            vmm: qemu-snp

				    runs-on: ${{ matrix.runner }}

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: "k3s"

				      USING_NFD: "true"

				      K8S_TEST_HOST_TYPE: "baremetal"

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      PULL_TYPE: ${{ matrix.pull-type }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy Snapshotter

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-snapshotter

				      - name: Deploy Kata

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-tdx

				      - name: Run tests

				        timeout-minutes: 30

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Delete kata-deploy

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-tdx

				      - name: Delete Snapshotter

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-snapshotter

				  run-k8s-tests-on-sev:

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu-sev

				        snapshotter:

				          - nydus

				        pull-type:

				          - guest-pull

				    runs-on: sev

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBECONFIG: /home/kata/.kube/config

				      KUBERNETES: "vanilla"

				      USING_NFD: "false"

				      KBS: "true"

				      K8S_TEST_HOST_TYPE: "baremetal"

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      PULL_TYPE: ${{ matrix.pull-type }}

				      KBS_INGRESS: "nodeport"

				      SNAPSHOTTER: "nydus"

				      PULL_TYPE: "guest-pull"

				      AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      GH_ITA_KEY: ${{ secrets.ITA_KEY }}

				      AUTO_GENERATE_POLICY: "yes"

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -112,54 +79,109 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy Snapshotter

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-snapshotter

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Deploy Kata

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata

				      - name: Uninstall previous `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-sev

				        run: bash tests/integration/kubernetes/gha-run.sh uninstall-kbs-client

				      - name: Deploy CoCo KBS

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				        env:

				          ITA_KEY: ${{ env.KATA_HYPERVISOR == 'qemu-tdx' && env.GH_ITA_KEY || '' }}

				      - name: Install `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				      - name: Deploy CSI driver

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver

				      - name: Run tests

				        timeout-minutes: 30

				        timeout-minutes: 100

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Delete kata-deploy

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-sev

				        timeout-minutes: 15

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup

				      - name: Delete Snapshotter

				      - name: Delete CoCo KBS

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-snapshotter

				        timeout-minutes: 10

				        run: |

				          [[ "${KATA_HYPERVISOR}" == "qemu-tdx" ]] && echo "ITA_KEY=${GH_ITA_KEY}" >> "${GITHUB_ENV}"

				          bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs

				  run-k8s-tests-sev-snp:

				      - name: Delete CSI driver

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver

				  # Generate jobs for testing CoCo on non-TEE environments

				  run-k8s-tests-coco-nontee:

				    name: run-k8s-tests-coco-nontee

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu-snp

				          - qemu-coco-dev

				          - qemu-coco-dev-runtime-rs

				        snapshotter:

				          - nydus

				        pull-type:

				          - guest-pull

				    runs-on: sev-snp

				        include:

				          - pull-type: experimental-force-guest-pull

				            vmm: qemu-coco-dev

				            snapshotter: ""

				    runs-on: ubuntu-22.04

				    permissions:

				      id-token: write # Used for OIDC access to log into Azure

				    environment: ci

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBECONFIG: /home/kata/.kube/config

				      # Some tests rely on that variable to run (or not)

				      KBS: "true"

				      # Set the KBS ingress handler (empty string disables handling)

				      KBS_INGRESS: "aks"

				      KUBERNETES: "vanilla"

				      USING_NFD: "false"

				      K8S_TEST_HOST_TYPE: "baremetal"

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      PULL_TYPE: ${{ matrix.pull-type }}

				      AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}

				      AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.pull-type == 'experimental-force-guest-pull' && matrix.vmm || '' }}

				      # Caution: the current ingress controller used to expose the KBS service

				      # requires many vCPUs, leaving only a few for the tests. Depending on the

				      # host type chosen, this can result in the creation of a cluster with

				      # insufficient resources.

				      K8S_TEST_HOST_TYPE: "all"

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -167,22 +189,177 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy Snapshotter

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-snapshotter

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Log into the Azure account

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Create AKS cluster

				        uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08 # v3.0.2

				        with:

				          timeout_minutes: 15

				          max_attempts: 20

				          retry_on: error

				          retry_wait_seconds: 10

				          command: bash tests/integration/kubernetes/gha-run.sh create-cluster

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Install `kubectl`

				        uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1

				        with:

				          version: 'latest'

				      - name: Download credentials for the Kubernetes CLI to use them

				        run: bash tests/integration/kubernetes/gha-run.sh get-cluster-credentials

				      - name: Deploy Kata

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks

				        env:

				          USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ env.SNAPSHOTTER == 'nydus' }}

				          AUTO_GENERATE_POLICY: ${{ env.PULL_TYPE == 'experimental-force-guest-pull' && 'no' || 'yes' }}

				      - name: Deploy CoCo KBS

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-snp

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs

				      - name: Install `kbs-client`

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client

				      - name: Deploy CSI driver

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver

				      - name: Run tests

				        timeout-minutes: 30

				        timeout-minutes: 80

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Delete kata-deploy

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-snp

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Delete Snapshotter

				      - name: Refresh OIDC token in case access token expired

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-snapshotter

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Delete AKS cluster

				        if: always()

				        timeout-minutes: 15

				        run: bash tests/integration/kubernetes/gha-run.sh delete-cluster

				  # Generate jobs for testing CoCo on non-TEE environments with erofs-snapshotter

				  run-k8s-tests-coco-nontee-with-erofs-snapshotter:

				    name: run-k8s-tests-coco-nontee-with-erofs-snapshotter

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu-coco-dev

				        snapshotter:

				          - erofs

				        pull-type:

				          - default

				    runs-on: ubuntu-24.04

				    environment: ci

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      # Some tests rely on that variable to run (or not)

				      KBS: "false"

				      # Set the KBS ingress handler (empty string disables handling)

				      KBS_INGRESS: ""

				      KUBERNETES: "vanilla"

				      CONTAINER_ENGINE: "containerd"

				      CONTAINER_ENGINE_VERSION: "v2.2"

				      PULL_TYPE: ${{ matrix.pull-type }}

				      SNAPSHOTTER: ${{ matrix.snapshotter }}

				      USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: "true"

				      K8S_TEST_HOST_TYPE: "all"

				      # We are skipping the auto generated policy tests for now,

				      # but those should be enabled as soon as we work on that.

				      AUTO_GENERATE_POLICY: "no"

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tools-tarball

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-tools-artifacts

				      - name: Install kata-tools

				        run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts

				      - name: Remove unnecessary directories to free up space

				        run: |

				          sudo rm -rf /usr/local/.ghcup

				          sudo rm -rf /opt/hostedtoolcache/CodeQL

				          sudo rm -rf /usr/local/lib/android

				          sudo rm -rf /usr/share/dotnet

				          sudo rm -rf /opt/ghc

				          sudo rm -rf /usr/local/share/boost

				          sudo rm -rf /usr/lib/jvm

				          sudo rm -rf /usr/share/swift

				          sudo rm -rf /usr/local/share/powershell

				          sudo rm -rf /usr/local/julia*

				          sudo rm -rf /opt/az

				          sudo rm -rf /usr/local/share/chromium

				          sudo rm -rf /opt/microsoft

				          sudo rm -rf /opt/google

				          sudo rm -rf /usr/lib/firefox

				      - name: Deploy kubernetes

				        timeout-minutes: 15

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: Install `bats`

				        run: bash tests/integration/kubernetes/gha-run.sh install-bats

				      - name: Deploy Kata

				        timeout-minutes: 20

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata

				      - name: Deploy CSI driver

				        timeout-minutes: 5

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver

				      - name: Run tests

				        timeout-minutes: 80

				        run: bash tests/integration/kubernetes/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

									
.github/workflows/run-kata-deploy-tests-on-aks.yaml
				@@ -21,9 +21,19 @@ on:

				        required: false

				        type: string

				        default: ""

				    secrets:

				      AZ_APPID:

				        required: true

				      AZ_TENANT_ID:

				       required: true

				      AZ_SUBSCRIPTION_ID:

				        required: true

				permissions: {}

				jobs:

				  run-kata-deploy-tests:

				    name: run-kata-deploy-tests

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -33,10 +43,14 @@ jobs:

				          - clh

				          - dragonball

				          - qemu

				          - qemu-runtime-rs

				        include:

				          - host_os: cbl-mariner

				            vmm: clh

				    runs-on: ubuntu-latest

				    runs-on: ubuntu-22.04

				    environment: ci

				    permissions:

				      id-token: write # Used for OIDC access to log into Azure

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				@@ -45,12 +59,12 @@ jobs:

				      KATA_HOST_OS: ${{ matrix.host_os }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: "vanilla"

				      USING_NFD: "false"

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -58,33 +72,48 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Download Azure CLI

				        run: bash tests/functional/kata-deploy/gha-run.sh install-azure-cli

				      - name: Log into the Azure account

				        run: bash tests/functional/kata-deploy/gha-run.sh login-azure

				        env:

				          AZ_APPID: ${{ secrets.AZ_APPID }}

				          AZ_PASSWORD: ${{ secrets.AZ_PASSWORD }}

				          AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}

				          AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Create AKS cluster

				        timeout-minutes: 10

				        run: bash tests/functional/kata-deploy/gha-run.sh create-cluster

				        uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08 # v3.0.2

				        with:

				          timeout_minutes: 15

				          max_attempts: 20

				          retry_on: error

				          retry_wait_seconds: 10

				          command: bash tests/integration/kubernetes/gha-run.sh create-cluster

				      - name: Install `bats`

				        run: bash tests/functional/kata-deploy/gha-run.sh install-bats

				      - name: Install `kubectl`

				        run: bash tests/functional/kata-deploy/gha-run.sh install-kubectl

				        uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1

				        with:

				          version: 'latest'

				      - name: Download credentials for the Kubernetes CLI to use them

				        run: bash tests/functional/kata-deploy/gha-run.sh get-cluster-credentials

				      - name: Run tests

				        run: bash tests/functional/kata-deploy/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh report-tests

				      - name: Refresh OIDC token in case access token expired

				        if: always()

				        uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

				        with:

				          client-id: ${{ secrets.AZ_APPID }}

				          tenant-id: ${{ secrets.AZ_TENANT_ID }}

				          subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}

				      - name: Delete AKS cluster

				        if: always()

				        run: bash tests/functional/kata-deploy/gha-run.sh delete-cluster

									
.github/workflows/run-kata-deploy-tests-on-garm.yaml
				@@ -1,65 +0,0 @@

				name: CI | Run kata-deploy tests on GARM

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-kata-deploy-tests:

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - clh

				          - qemu

				        k8s:

				          - k0s

				          - k3s

				          - rke2

				    runs-on: garm-ubuntu-2004-smaller

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				      USING_NFD: "false"

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Deploy ${{ matrix.k8s }}

				        run:  bash tests/functional/kata-deploy/gha-run.sh deploy-k8s

				      - name: Install `bats`

				        run: bash tests/functional/kata-deploy/gha-run.sh install-bats

				      - name: Run tests

				        run: bash tests/functional/kata-deploy/gha-run.sh run-tests

									
.github/workflows/run-kata-deploy-tests.yaml
				@@ -0,0 +1,90 @@

				name: CI | Run kata-deploy tests

				on:

				  workflow_call:

				    inputs:

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-kata-deploy-tests:

				    name: run-kata-deploy-tests

				    strategy:

				      fail-fast: false

				      matrix:

				        vmm:

				          - qemu

				        k8s:

				          - k0s

				          - k3s

				          - rke2

				          - microk8s

				    runs-on: ubuntu-22.04

				    env:

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      KUBERNETES: ${{ matrix.k8s }}

				    steps:

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Remove unnecessary directories to free up space

				        run: |

				          sudo rm -rf /usr/local/.ghcup

				          sudo rm -rf /opt/hostedtoolcache/CodeQL

				          sudo rm -rf /usr/local/lib/android

				          sudo rm -rf /usr/share/dotnet

				          sudo rm -rf /opt/ghc

				          sudo rm -rf /usr/local/share/boost

				          sudo rm -rf /usr/lib/jvm

				          sudo rm -rf /usr/share/swift

				          sudo rm -rf /usr/local/share/powershell

				          sudo rm -rf /usr/local/julia*

				          sudo rm -rf /opt/az

				          sudo rm -rf /usr/local/share/chromium

				          sudo rm -rf /opt/microsoft

				          sudo rm -rf /opt/google

				          sudo rm -rf /usr/lib/firefox

				      - name: Deploy ${{ matrix.k8s }}

				        run:  bash tests/functional/kata-deploy/gha-run.sh deploy-k8s

				      - name: Install `bats`

				        run: bash tests/functional/kata-deploy/gha-run.sh install-bats

				      - name: Run tests

				        run: bash tests/functional/kata-deploy/gha-run.sh run-tests

				      - name: Report tests

				        if: always()

				        run: bash tests/functional/kata-deploy/gha-run.sh report-tests

									
.github/workflows/run-kata-monitor-tests.yaml
				@@ -13,8 +13,11 @@ on:

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  run-monitor:

				    name: run-monitor

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -23,19 +26,25 @@ jobs:

				        container_engine:

				          - crio

				          - containerd

				        include:

				        # TODO: enable when https://github.com/kata-containers/kata-containers/issues/9853 is fixed

				        #include:

				        #  - container_engine: containerd

				        #    containerd_version: lts

				        exclude:

				          # TODO: enable with containerd when https://github.com/kata-containers/kata-containers/issues/9761 is fixed

				          - container_engine: containerd

				            containerd_version: lts

				    runs-on: garm-ubuntu-2204-smaller

				            vmm: qemu

				    runs-on: ubuntu-22.04

				    env:

				      CONTAINER_ENGINE: ${{ matrix.container_engine }}

				      CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      #CONTAINERD_VERSION: ${{ matrix.containerd_version }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -45,9 +54,11 @@ jobs:

				      - name: Install dependencies

				        run: bash tests/functional/kata-monitor/gha-run.sh install-dependencies

				        env:

				          GH_TOKEN: ${{ github.token }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

									
.github/workflows/run-metrics.yaml
				@@ -2,8 +2,17 @@ name: CI | Run test metrics

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				      registry:

				        required: true

				        type: string

				      repo:

				        required: true

				        type: string

				      tag:

				        required: true

				        type: string

				      pr-number:

				        required: true

				        type: string

				      commit-hash:

				        required: false

				@@ -13,17 +22,35 @@ on:

				        type: string

				        default: ""

				permissions: {}

				jobs:

				  setup-kata:

				    name: Kata Setup

				  run-metrics:

				    name: run-metrics

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance.

				      fail-fast: false

				      matrix:

				        vmm: ['clh', 'qemu']

				      max-parallel: 1

				    runs-on: metrics

				    env:

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				      DOCKER_REGISTRY: ${{ inputs.registry }}

				      DOCKER_REPO: ${{ inputs.repo }}

				      DOCKER_TAG: ${{ inputs.tag }}

				      GH_PR_NUMBER: ${{ inputs.pr-number }}

				      K8S_TEST_HOST_TYPE: "baremetal"

				      KUBERNETES: kubeadm

				    steps:

				      - uses: actions/checkout@v4

				      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Rebase atop of the latest target branch

				        run: |

				@@ -31,64 +58,71 @@ jobs:

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Deploy Kata

				        timeout-minutes: 10

				        run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-kubeadm

				      - name: Install kata

				        run: bash tests/metrics/gha-run.sh install-kata kata-artifacts

				      - name: Install check metrics

				        run: bash tests/metrics/gha-run.sh install-checkmetrics

				  run-metrics:

				    needs: setup-kata

				    strategy:

				      # We can set this to true whenever we're 100% sure that

				      # all the tests are not flaky, otherwise we'll fail

				      # all the tests due to a single flaky instance.

				      fail-fast: false

				      matrix:

				        vmm: ['clh', 'qemu', 'stratovirt']

				      max-parallel: 1

				    runs-on: metrics

				    env:

				      GOPATH: ${{ github.workspace }}

				      KATA_HYPERVISOR: ${{ matrix.vmm }}

				    steps:

				      - name: enabling the hypervisor

				        run: bash tests/metrics/gha-run.sh enabling-hypervisor

				      - name: run launch times test

				        timeout-minutes: 15

				        continue-on-error: true

				        run: bash tests/metrics/gha-run.sh run-test-launchtimes

				      - name: run memory foot print test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-memory-usage

				      - name: run memory usage inside container test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-memory-usage-inside-container

				      - name: run blogbench test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-blogbench

				      - name: run tensorflow test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-tensorflow

				      - name: run fio test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-fio

				      - name: run iperf test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-iperf

				      - name: run latency test

				        timeout-minutes: 15

				        continue-on-error: true

				        run:  bash tests/metrics/gha-run.sh run-test-latency

				      - name: check metrics

				        run:  bash tests/metrics/gha-run.sh check-metrics

				      - name: make metrics tarball ${{ matrix.vmm }}

				        run: bash tests/metrics/gha-run.sh make-tarball-results

				      - name: archive metrics results ${{ matrix.vmm }}

				        uses: actions/upload-artifact@v3

				        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2

				        with:

				          name: metrics-artifacts-${{ matrix.vmm }}

				          path: results-${{ matrix.vmm }}.tar.gz

				          retention-days: 1

				          if-no-files-found: error

				      - name: Delete kata-deploy

				        timeout-minutes: 10

				        if: always()

				        run: bash tests/integration/kubernetes/gha-run.sh cleanup-kubeadm

									
.github/workflows/run-runk-tests.yaml
				@@ -1,46 +0,0 @@

				name: CI | Run runk tests

				on:

				  workflow_call:

				    inputs:

				      tarball-suffix:

				        required: false

				        type: string

				      commit-hash:

				        required: false

				        type: string

				      target-branch:

				        required: false

				        type: string

				        default: ""

				jobs:

				  run-runk:

				    runs-on: garm-ubuntu-2204-smaller

				    env:

				      CONTAINERD_VERSION: lts

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          ref: ${{ inputs.commit-hash }}

				          fetch-depth: 0

				      - name: Rebase atop of the latest target branch

				        run: |

				          ./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"

				        env:

				          TARGET_BRANCH: ${{ inputs.target-branch }}

				      - name: Install dependencies

				        run: bash tests/integration/runk/gha-run.sh install-dependencies

				      - name: get-kata-tarball

				        uses: actions/download-artifact@v3

				        with:

				          name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}

				          path: kata-artifacts

				      - name: Install kata

				        run: bash tests/integration/runk/gha-run.sh install-kata kata-artifacts

				      - name: Run runk tests

				        run: bash tests/integration/runk/gha-run.sh run

									
.github/workflows/scorecard.yaml
				@@ -0,0 +1,60 @@

				# This workflow uses actions that are not certified by GitHub. They are provided

				# by a third-party and are governed by separate terms of service, privacy

				# policy, and support documentation.

				name: Scorecard supply-chain security

				on:

				  # For Branch-Protection check. Only the default branch is supported. See

				  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection

				  branch_protection_rule:

				  push:

				    branches: [ "main" ]

				  workflow_dispatch:

				permissions: {}

				jobs:

				  analysis:

				    name: Scorecard analysis

				    runs-on: ubuntu-latest

				    # `publish_results: true` only works when run from the default branch. conditional can be removed if disabled.

				    if: github.event.repository.default_branch == github.ref_name || github.event_name == 'pull_request'

				    permissions:

				      # Needed to upload the results to code-scanning dashboard.

				      security-events: write

				      # Needed to publish results and get a badge (see publish_results below).

				      id-token: write

				    steps:

				      - name: "Checkout code"

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          persist-credentials: false

				      - name: "Run analysis"

				        uses: ossf/scorecard-action@f49aabe0b5af0936a0987cfb85d86b75731b0186 # v2.4.1

				        with:

				          results_file: results.sarif

				          results_format: sarif

				          # Public repositories:

				          #   - Publish results to OpenSSF REST API for easy access by consumers

				          #   - Allows the repository to include the Scorecard badge.

				          #   - See https://github.com/ossf/scorecard-action#publishing-results.

				          publish_results: true

				      # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF

				      # format to the repository Actions tab.

				      - name: "Upload artifact"

				        uses: actions/upload-artifact@4cec3d8aa04e39d1a68397de0c4cd6fb9dce8ec1 # v4.6.1

				        with:

				          name: SARIF file

				          path: results.sarif

				          retention-days: 5

				      # Upload the results to GitHub's code scanning dashboard (optional).

				      # Commenting out will disable upload of results to your repo's Code Scanning dashboard

				      - name: "Upload to code-scanning"

				        uses: github/codeql-action/upload-sarif@v3

				        with:

				          sarif_file: results.sarif

									
.github/workflows/shellcheck.yaml
				@@ -0,0 +1,32 @@

				# https://github.com/marketplace/actions/shellcheck

				name: Check shell scripts

				on:

				  workflow_dispatch:

				  pull_request:

				    types:

				      - opened

				      - edited

				      - reopened

				      - synchronize

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  shellcheck:

				    name: shellcheck

				    runs-on: ubuntu-24.04

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Run ShellCheck

				        uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38 # v2.0.0

				        with:

				          ignore_paths: "**/vendor/**"

									
.github/workflows/shellcheck_required.yaml
				@@ -0,0 +1,35 @@

				# https://github.com/marketplace/actions/shellcheck

				name: Shellcheck required

				on:

				  workflow_dispatch:

				  pull_request:

				    types:

				      - opened

				      - edited

				      - reopened

				      - synchronize

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  shellcheck-required:

				    name: shellcheck-required

				    runs-on: ubuntu-24.04

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Run ShellCheck

				        uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38 # v2.0.0

				        with:

				          severity: error

				          ignore_paths: "**/vendor/**"

									
.github/workflows/stale.yaml
				@@ -4,14 +4,23 @@ on:

				    - cron: '0 0 * * *'

				  workflow_dispatch:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  stale:

				    runs-on: ubuntu-latest

				    name: stale

				    runs-on: ubuntu-22.04

				    permissions:

				      actions: write # Needed to manage caches for state persistence across runs

				      pull-requests: write # Needed to add/remove labels, post comments, or close PRs

				    steps:

				      - uses: actions/stale@v8

				      - uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0

				        with:

				          start-date: '2023-05-01T00:00:00Z'

				          stale-pr-message: 'This PR has been opened without with no activity for 180 days. Comment on the issue otherwise it will be closed in 7 days'

				          stale-pr-message: 'This PR has been opened without activity for 180 days. Please comment on the issue or it will be closed in 7 days.'

				          days-before-pr-stale: 180

				          days-before-pr-close: 7

				          days-before-issue-stale: -1

									
.github/workflows/static-checks-self-hosted.yaml
				@@ -6,21 +6,31 @@ on:

				      - reopened

				      - labeled # a workflow runs only when the 'ok-to-test' label is added

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				name: Static checks self-hosted

				jobs:

				  build-checks:

				  skipper:

				    if: ${{ contains(github.event.pull_request.labels.*.name, 'ok-to-test') }}

				    uses: ./.github/workflows/gatekeeper-skipper.yaml

				    with:

				      commit-hash: ${{ github.event.pull_request.head.sha }}

				      target-branch: ${{ github.event.pull_request.base.ref }}

				  build-checks:

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    strategy:

				      fail-fast: false

				      matrix:

				        instance:

				          - "arm-no-k8s"

				          - "s390x"

				          - "ppc64le"

				          - "ubuntu-24.04-arm"

				          - "ubuntu-24.04-s390x"

				          - "ubuntu-24.04-ppc64le"

				    uses: ./.github/workflows/build-checks.yaml

				    with:

				      instance: ${{ matrix.instance }}

									
										119

.github/workflows/static-checks.yaml
									
										vendored
									
				@@ -5,6 +5,9 @@ on:

				      - edited

				      - reopened

				      - synchronize

				  workflow_dispatch:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				@@ -12,19 +15,29 @@ concurrency:

				name: Static checks

				jobs:

				  skipper:

				    uses: ./.github/workflows/gatekeeper-skipper.yaml

				    with:

				      commit-hash: ${{ github.event.pull_request.head.sha }}

				      target-branch: ${{ github.event.pull_request.base.ref }}

				  check-kernel-config-version:

				    runs-on: ubuntu-latest

				    name: check-kernel-config-version

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Ensure the kernel config version has been updated

				        run: |

				          kernel_dir="tools/packaging/kernel/"

				          kernel_version_file="${kernel_dir}kata_config_version"

				          modified_files=$(git diff --name-only origin/$GITHUB_BASE_REF..HEAD)

				          if git diff --name-only origin/$GITHUB_BASE_REF..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then

				          modified_files=$(git diff --name-only origin/"$GITHUB_BASE_REF"..HEAD)

				          if git diff --name-only origin/"$GITHUB_BASE_REF"..HEAD "${kernel_dir}" | grep "${kernel_dir}"; then

				            echo "Kernel directory has changed, checking if $kernel_version_file has been updated"

				            if echo "$modified_files" | grep -v "README.md" | grep "${kernel_dir}" >>"/dev/null"; then

				              echo "$modified_files" | grep "$kernel_version_file" >>/dev/null || ( echo "Please bump version in $kernel_version_file" && exit 1)

				@@ -35,12 +48,17 @@ jobs:

				          fi

				  build-checks:

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    uses: ./.github/workflows/build-checks.yaml

				    with:

				      instance: ubuntu-20.04

				      instance: ubuntu-22.04

				  build-checks-depending-on-kvm:

				    runs-on: garm-ubuntu-2004-smaller

				    name: build-checks-depending-on-kvm

				    runs-on: ubuntu-22.04

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -53,12 +71,13 @@ jobs:

				            component-path: src/dragonball

				    steps:

				      - name: Checkout the code

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Install system deps

				        run: |

				          sudo apt-get install -y build-essential musl-tools

				          sudo apt-get update && sudo apt-get install -y build-essential musl-tools

				      - name: Install yq

				        run: |

				          sudo -E ./ci/install_yq.sh

				@@ -71,13 +90,19 @@ jobs:

				      - name: Running `${{ matrix.command }}` for ${{ matrix.component }}

				        run: |

				          export PATH="$PATH:${HOME}/.cargo/bin"

				          cd ${{ matrix.component-path }}

				          ${{ matrix.command }}

				          cd "${COMPONENT_PATH}"

				          eval "${COMMAND}"

				        env:

				          COMMAND: ${{ matrix.command }}

				          COMPONENT_PATH: ${{ matrix.component-path }}

				          RUST_BACKTRACE: "1"

				          RUST_LIB_BACKTRACE: "0"

				  static-checks:

				    runs-on: ubuntu-20.04

				    name: static-checks

				    runs-on: ubuntu-22.04

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    strategy:

				      fail-fast: false

				      matrix:

				@@ -85,27 +110,83 @@ jobs:

				          - "make static-checks"

				    env:

				      GOPATH: ${{ github.workspace }}

				    permissions:

				      contents: read  # for checkout

				      packages: write # for push to ghcr.io

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@v4

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				          path: ./src/github.com/${{ github.repository }}

				      - name: Install yq

				        run: |

				          cd ${GOPATH}/src/github.com/${{ github.repository }}

				          cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"

				          ./ci/install_yq.sh

				        env:

				          INSTALL_IN_GOPATH: false

				      - name: Install golang

				        run: |

				          cd ${GOPATH}/src/github.com/${{ github.repository }}

				          cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"

				          ./tests/install_go.sh -f -p

				          echo "/usr/local/go/bin" >> $GITHUB_PATH

				          echo "/usr/local/go/bin" >> "$GITHUB_PATH"

				      - name: Install system dependencies

				        run: |

				          sudo apt-get -y install moreutils hunspell hunspell-en-gb hunspell-en-us pandoc

				      - name: Run check

				          sudo apt-get update && sudo apt-get -y install moreutils hunspell hunspell-en-gb hunspell-en-us pandoc

				      - name: Install open-policy-agent

				        run: |

				          export PATH=${PATH}:${GOPATH}/bin

				          cd ${GOPATH}/src/github.com/${{ github.repository }} && ${{ matrix.cmd }}

				          cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"

				          ./tests/install_opa.sh

				      - name: Install regorus

				        env:

				          ARTEFACT_REPOSITORY: "${{ github.repository }}"

				          ARTEFACT_REGISTRY_USERNAME: "${{ github.actor }}"

				          ARTEFACT_REGISTRY_PASSWORD: "${{ secrets.GITHUB_TOKEN }}"

				        run: |

				          "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}/tests/install_regorus.sh"

				      - name: Run check

				        env:

				          CMD: ${{ matrix.cmd }}

				        run: |

				          export PATH="${PATH}:${GOPATH}/bin"

				          cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}" && ${CMD}

				  govulncheck:

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    uses: ./.github/workflows/govulncheck.yaml

				  codegen:

				    name: codegen

				    runs-on: ubuntu-22.04

				    needs: skipper

				    if: ${{ needs.skipper.outputs.skip_static != 'yes' }}

				    permissions:

				      contents: read  # for checkout

				    steps:

				      - name: Checkout code

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: generate

				        run: make -C src/agent generate-protocols

				      - name: check for diff

				        run: |

				          diff=$(git diff)

				          if [[ -z "${diff}" ]]; then

				            echo "No diff detected."

				            exit 0

				          fi

				          cat << EOF >> "${GITHUB_STEP_SUMMARY}"

				          Run \`make -C src/agent generate-protocols\` to update protobuf bindings.

				          \`\`\`diff

				          ${diff}

				          \`\`\`

				          EOF

				          echo "::error::Golang protobuf bindings need to be regenerated (see Github step summary for diff)."

				          exit 1

									
										29

.github/workflows/zizmor.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,29 @@

				name: GHA security analysis

				on:

				  pull_request:

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}

				  cancel-in-progress: true

				jobs:

				  zizmor:

				    name: zizmor

				    runs-on: ubuntu-22.04

				    steps:

				      - name: Checkout repository

				        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

				        with:

				          fetch-depth: 0

				          persist-credentials: false

				      - name: Run zizmor

				        uses: zizmorcore/zizmor-action@135698455da5c3b3e55f73f4419e481ab68cdd95 # v0.4.1

				        with:

				          advanced-security: false

				          annotations: true

				          persona: auditor

				          version: v1.13.0

									
										3

.github/zizmor.yml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,3 @@

				rules:

				  undocumented-permissions:

				    disable: true

4

.gitignore vendored

@@ -16,3 +16,7 @@ src/agent/protocols/src/*.rs
 build
 src/tools/log-parser/kata-log-parser
 tools/packaging/static-build/agent/install_libseccomp.sh
 .envrc
 .direnv
 **/.DS_Store
 site/

6290

Cargo.lock generated Normal file

File diff suppressed because it is too large

									
										140

Cargo.toml
									
										Normal file
									
				@@ -0,0 +1,140 @@

				[workspace.package]

				authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]

				edition = "2018"

				license = "Apache-2.0"

				rust-version = "1.88"

				[workspace]

				members = [

				  # Dragonball

				  "src/dragonball",

				  "src/dragonball/dbs_acpi",

				  "src/dragonball/dbs_address_space",

				  "src/dragonball/dbs_allocator",

				  "src/dragonball/dbs_arch",

				  "src/dragonball/dbs_boot",

				  "src/dragonball/dbs_device",

				  "src/dragonball/dbs_interrupt",

				  "src/dragonball/dbs_legacy_devices",

				  "src/dragonball/dbs_pci",

				  "src/dragonball/dbs_tdx",

				  "src/dragonball/dbs_upcall",

				  "src/dragonball/dbs_utils",

				  "src/dragonball/dbs_virtio_devices",

				  # runtime-rs

				  "src/runtime-rs",

				  "src/runtime-rs/crates/agent",

				  "src/runtime-rs/crates/hypervisor",

				  "src/runtime-rs/crates/persist",

				  "src/runtime-rs/crates/resource",

				  "src/runtime-rs/crates/runtimes",

				  "src/runtime-rs/crates/service",

				  "src/runtime-rs/crates/shim",

				  "src/runtime-rs/crates/shim-ctl",

				  "src/runtime-rs/tests/utils",

				]

				resolver = "2"

				# TODO: Add all excluded crates to root workspace

				exclude = [

				  "src/agent",

				  "src/tools",

				  "src/libs",

				  # kata-deploy binary is standalone and has its own Cargo.toml for now

				  "tools/packaging/kata-deploy/binary",

				  # We are cloning and building rust packages under

				  # "tools/packaging/kata-deploy/local-build/build" folder, which may mislead

				  # those packages to think they are part of the kata root workspace

				  "tools/packaging/kata-deploy/local-build/build",

				]

				[workspace.dependencies]

				# Rust-VMM crates

				event-manager = "0.2.1"

				kvm-bindings = "0.6.0"

				kvm-ioctls = "=0.12.1"

				linux-loader = "0.8.0"

				seccompiler = "0.5.0"

				vfio-bindings = "0.3.0"

				vfio-ioctls = "0.1.0"

				virtio-bindings = "0.1.0"

				virtio-queue = "0.7.0"

				vm-fdt = "0.2.0"

				vm-memory = "0.10.0"

				vm-superio = "0.5.0"

				vmm-sys-util = "0.11.0"

				# Local dependencies from Dragonball Sandbox crates

				dragonball = { path = "src/dragonball" }

				dbs-acpi = { path = "src/dragonball/dbs_acpi" }

				dbs-address-space = { path = "src/dragonball/dbs_address_space" }

				dbs-allocator = { path = "src/dragonball/dbs_allocator" }

				dbs-arch = { path = "src/dragonball/dbs_arch" }

				dbs-boot = { path = "src/dragonball/dbs_boot" }

				dbs-device = { path = "src/dragonball/dbs_device" }

				dbs-interrupt = { path = "src/dragonball/dbs_interrupt" }

				dbs-legacy-devices = { path = "src/dragonball/dbs_legacy_devices" }

				dbs-pci = { path = "src/dragonball/dbs_pci" }

				dbs-tdx = { path = "src/dragonball/dbs_tdx" }

				dbs-upcall = { path = "src/dragonball/dbs_upcall" }

				dbs-utils = { path = "src/dragonball/dbs_utils" }

				dbs-virtio-devices = { path = "src/dragonball/dbs_virtio_devices" }

				# Local dependencies from runtime-rs

				agent = { path = "src/runtime-rs/crates/agent" }

				hypervisor = { path = "src/runtime-rs/crates/hypervisor" }

				persist = { path = "src/runtime-rs/crates/persist" }

				resource = { path = "src/runtime-rs/crates/resource" }

				runtimes = { path = "src/runtime-rs/crates/runtimes" }

				service = { path = "src/runtime-rs/crates/service" }

				tests_utils = { path = "src/runtime-rs/tests/utils" }

				ch-config = { path = "src/runtime-rs/crates/hypervisor/ch-config" }

				common = { path = "src/runtime-rs/crates/runtimes/common" }

				linux_container = { path = "src/runtime-rs/crates/runtimes/linux_container" }

				virt_container = { path = "src/runtime-rs/crates/runtimes/virt_container" }

				wasm_container = { path = "src/runtime-rs/crates/runtimes/wasm_container" }

				# Local dependencies from `src/lib`

				kata-sys-util = { path = "src/libs/kata-sys-util" }

				kata-types = { path = "src/libs/kata-types", features = ["safe-path"] }

				logging = { path = "src/libs/logging" }

				protocols = { path = "src/libs/protocols", features = ["async"] }

				runtime-spec = { path = "src/libs/runtime-spec" }

				safe-path = { path = "src/libs/safe-path" }

				shim-interface = { path = "src/libs/shim-interface" }

				test-utils = { path = "src/libs/test-utils" }

				# Outside dependencies

				actix-rt = "2.7.0"

				anyhow = "1.0"

				async-trait = "0.1.48"

				containerd-shim = { version = "0.10.0", features = ["async"] }

				containerd-shim-protos = { version = "0.10.0", features = ["async"] }

				go-flag = "0.1.0"

				hyper = "0.14.20"

				hyperlocal = "0.8.0"

				lazy_static = "1.4"

				libc = "0.2"

				log = "0.4.14"

				netns-rs = "0.1.0"

				# Note: nix needs to stay sync'd with libs versions

				nix = "0.26.4"

				oci-spec = { version = "0.8.1", features = ["runtime"] }

				protobuf = "3.7.2"

				rand = "0.8.4"

				serde = { version = "1.0.145", features = ["derive"] }

				serde_json = "1.0.91"

				sha2 = "0.10.9"

				slog = "2.5.2"

				slog-scope = "4.4.0"

				strum = { version = "0.24.0", features = ["derive"] }

				tempfile = "3.19.1"

				thiserror = "1.0"

				tokio = "1.46.1"

				tracing = "0.1.41"

				tracing-opentelemetry = "0.18.0"

				ttrpc = "0.8.4"

				url = "2.5.4"

									
										11

Makefile
									
				@@ -18,7 +18,6 @@ TOOLS =

				TOOLS += agent-ctl

				TOOLS += kata-ctl

				TOOLS += log-parser

				TOOLS += runk

				TOOLS += trace-forwarder

				STANDARD_TARGETS = build check clean install static-checks-build test vendor

				@@ -42,13 +41,16 @@ generate-protocols:

				# Some static checks rely on generated source files of components.

				static-checks: static-checks-build

					bash tests/static-checks.sh github.com/kata-containers/kata-containers

					bash tests/static-checks.sh

				docs-url-alive-check:

					bash ci/docs-url-alive-check.sh

				build-and-publish-kata-debug:

					bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG} 

					bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}

				docs-serve:

					docker run --rm -p 8000:8000 -v ./docs:/docs:ro -v ${PWD}/zensical.toml:/zensical.toml:ro zensical/zensical serve --config-file /zensical.toml -a 0.0.0.0:8000

				.PHONY: \

					all \

				@@ -56,4 +58,5 @@ build-and-publish-kata-debug:

					install-tarball \

					default \

					static-checks \

					docs-url-alive-check

					docs-url-alive-check \

					docs-serve

									
										3

README.md
									
				@@ -1,6 +1,7 @@

				<img src="https://object-storage-ca-ymq-1.vexxhost.net/swift/v1/6e4619c416ff4bd19e1c087f27a43eea/www-images-prod/openstack-logo/kata/SVG/kata-1.svg" width="900">

				[![CI | Publish Kata Containers payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml/badge.svg)](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml) [![Kata Containers Nightly CI](https://github.com/kata-containers/kata-containers/actions/workflows/ci-nightly.yaml/badge.svg)](https://github.com/kata-containers/kata-containers/actions/workflows/ci-nightly.yaml)

				[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/kata-containers/kata-containers/badge)](https://scorecard.dev/viewer/?uri=github.com/kata-containers/kata-containers)

				# Kata Containers

				@@ -138,8 +139,8 @@ The table below lists the remaining parts of the project:

				| [`agent-ctl`](src/tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |

				| [`kata-ctl`](src/tools/kata-ctl) | utility | Tool that provides advanced commands and debug facilities. |

				| [`trace-forwarder`](src/tools/trace-forwarder) | utility | Agent tracing helper. |

				| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |

				| [`ci`](.github/workflows) | CI | Continuous Integration configuration files and scripts. |

				| [`ocp-ci`](ci/openshift-ci/README.md) | CI | Continuous Integration configuration for the OpenShift pipelines. |

				| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |

				| [`Webhook`](tools/testing/kata-webhook/README.md) | utility | Example of a simple admission controller webhook to annotate pods with the Kata runtime class |

2

VERSION

@@ -1 +1 @@
 .3.0
 .27.0

									
										416

ci/README.md
									
										Normal file
									
				@@ -0,0 +1,416 @@

				# Kata Containers CI

				> [!WARNING]

				> While this project's CI has several areas for improvement, it is constantly

				> evolving. This document attempts to describe its current state, but due to

				> ongoing changes, you may notice some outdated information here. Feel free to

				> modify/improve this document as you use the CI and notice anything odd. The

				> community appreciates it!

				## Introduction

				The Kata Containers CI relies on [GitHub Actions][gh-actions], where the actions

				themselves can be found in the `.github/workflows` directory, and they may call

				helper scripts, which are located under the `tests` directory, to actually

				perform the tasks required for each test case.

				## The different workflows

				There are a few different sets of workflows that are running as part of our CI,

				and here we're going to cover the ones that are less likely to get rotten.  With

				this said, it's fair to advise that if the reader finds something that got

				rotten, opening an issue to the project pointing to the problem is a nice way to

				help, and providing a fix for the issue is a very encouraging way to help.

				### Jobs that run automatically when a PR is raised

				These are a bunch of tests that will automatically run as soon as a PR is

				opened. They mostly run on "cost free" runners, and they do some

				pre-checks to evaluate that your PR may be okay to start getting reviewed.

				Mind, though, that the community expects the contributors to, at least, build

				their code before submitting a PR, which the community sees as a very fair

				request.

				Without getting into the weeds with details on this, those jobs are the ones

				responsible for ensuring that:

				- The commit message is in the expected format

				- There's no missing Developer's Certificate of Origin

				- Static checks are passing

				### Jobs that require a maintainer's approval to run

				Some tests, our so-called "CI", require a maintainer's approval to run,

				as parts of those jobs run on "paid runners", which currently use Azure

				infrastructure.

				Once a maintainer of the project gives "the green light" (currently by adding an

				`ok-to-test` label to the PR, soon to be changed to commenting "/test" as part

				of a PR review), the following tests will be executed:

				- Build all the components (runs on cost free runners, or bare-metal depending on the architecture)

				- Create a tarball with all the components (runs on cost free runners, or bare-metal depending on the architecture)

				- Create a kata-deploy payload with the tarball generated in the previous step (runs on cost free runners, or bare-metal depending on the architecture)

				- Run the following tests:

				  - Tests depending on the generated tarball

				    - Metrics (runs on bare-metal)

				    - `docker` (runs on cost free runners)

				    - `nerdctl` (runs on cost free runners)

				    - `kata-monitor` (runs on cost free runners)

				    - `cri-containerd` (runs on cost free runners)

				    - `nydus` (runs on cost free runners)

				    - `vfio` (runs on cost free runners)

				  - Tests depending on the generated kata-deploy payload

				    - kata-deploy (runs on cost free runners)

				      - Tests are performed using different "Kubernetes flavors", such as k0s, k3s, rke2, and Azure Kubernetes Service (AKS).

				    - Kubernetes (runs in Azure small and medium instances depending on what's required by each test, and on TEE bare-metal machines)

				      - Tests are performed with different runtime engines, such as CRI-O and containerd.

				      - Tests are performed with different snapshotters for containerd, namely OverlayFS and devmapper.

				      - Tests are performed with all the supported hypervisors, which are Cloud Hypervisor, Dragonball, Firecracker, and QEMU.

				For all the tests relying on Azure instances, real money is being spent, so the

				community asks for the maintainers to be mindful about those, and avoid abusing

				them to merely debug issues.

				## The different runners

				The previous section mentioned different runners; this section goes through each type of runner used.

				- Cost free runners:  Those are the runners provided by GitHub itself, and

				  those are fairly small machines with virtualization capabilities enabled.

				- Azure small instances: Those are runners which have virtualization

				  capabilities enabled, 2 CPUs, and 8GB of RAM.  These runners have a "-smaller"

				  suffix to their name.

				- Azure normal instances: Those are runners which have virtualization

				  capabilities enabled, 4 CPUs, and 16GB of RAM.  These runners are usually

				  `garm` ones with no "-smaller" suffix.

				- Bare-metal runners: Those are runners provided by community contributors,

				  and they may vary in architecture, size and virtualization capabilities.

				  Builder runners don't actually require any virtualization capabilities, while

				  runners which will be actually performing the tests must have virtualization

				  capabilities and a reasonable amount of CPU and RAM available (at least

				  matching the Azure normal instances).

				## Adding new tests

				Before adding a new test, we strongly recommend going

				through the [GitHub Actions Documentation][gh-actions],

				which provides a sensible background on how to read and understand the

				current tests, and on how to write a new one.

				On the Kata Containers land, there are basically two sets of tests: "standalone"

				and "part of something bigger".

				The "standalone" tests, for example the commit message check, won't be covered

				here as they're better covered by the GitHub Actions documentation linked above.

				The "part of something bigger" is the more complicated one and not so

				straightforward to add, so we'll be focusing our efforts on describing the

				addition of those.

				> [!NOTE]

				> TODO: Currently, this document refers to "tests" when it actually means the

				> jobs (or workflows) of GitHub. In an ideal world, except in some specific cases,

				> new tests should be added without the need to add new workflows. In the

				> not-too-distant future (hopefully), we will improve the workflows to support

				> this.

				### Adding a new test that's "part of something bigger"

				The first important thing here is to align expectations, and we must say that

				the community strongly prefers receiving tests that already come with:

				- Instructions on how to run them

				- A proven run where it's passing

				There are several ways to achieve those two requirements, and an example of that

				can be seen in PR #8115.

				With the expectations aligned, adding a test consists of:

				- Adding a new yaml file for your test, and ensuring it's called from the

				  "bigger" yaml. See the [Kata Monitor test example][monitor-ex01].

				- Adding the helper scripts needed for your test to run. Again, use the [Kata Monitor script as an example][monitor-ex02].

				Following those examples, the community advice during the review, and even

				asking the community directly on Slack are the best ways to get your test

				accepted.

				## Required tests

				In our CI we have two categories of jobs - required and non-required:

				- Required jobs need to all pass for a PR to be merged normally and

				should cover all the core features on Kata Containers that we want to

				ensure don't have regressions.

				- The non-required jobs are for unstable tests, or for features that

				are experimental and not-fully supported. We'd like those tests to also

				pass on all PRs ideally, but don't block merging if they don't as it's

				not necessarily an indication of the PR code causing regressions.

				### Transitioning between required and non-required status

				Required jobs that fail block merging of PRs, so we want to ensure that

				jobs are stable and maintained before we make them required.

				The [Kata Containers CI Dashboard](https://kata-containers.github.io/)

				is a useful resource to check when collecting evidence of job stability.

				At time of writing it reports the last ten days of Kata CI nightly test

				results for each job. This isn't perfect as it doesn't currently capture

				results on PRs, but is a good guideline for stability.

				> [!NOTE]

				> Below are general guidelines about jobs being marked as

				> required/non-required, but they are subject to change and the Kata

				> Architecture Committee may overrule these guidelines at their

				> discretion.

				#### Initial marking as required

				For new jobs, or jobs that haven't been marked as required recently,

				the criterion to be initially marked as required is ten days

				of passing tests, with no relevant PR failures reported in that time.

				Required jobs also need one or more nominated maintainers that are

				responsible for the stability of their jobs. Maintainers can be registered

				in [`maintainers.yml`](https://github.com/kata-containers/kata-containers.github.io/blob/main/maintainers.yml)

				and will then show on the CI Dashboard.

				To add transparency to making jobs required/non-required and to keep the

				GitHub UI in sync with the [Gatekeeper job](../tools/testing/gatekeeper),

				the process to update a job's required state is as follows:

				1. Create a PR to update `maintainers.yml`, if new maintainers are being

				declared on a CI job.

				1. Create a PR which updates

				[`required-tests.yaml`](../tools/testing/gatekeeper/required-tests.yaml)

				adding the new job and listing the evidence that the job meets the

				requirements above. Ensure that all maintainers and

				@kata-containers/architecture-committee are notified to give them the

				opportunity to review the PR. See

				[#11015](https://github.com/kata-containers/kata-containers/pull/11015)

				as an example.

				1. The maintainers and Architecture Committee get a chance to review the PR.

				It can be discussed in an AC meeting to get broader input.

				1. Once the PR has been merged, a Kata Containers admin should be notified

				to ensure that the GitHub UI is updated to reflect the change in

				`required-tests.yaml`.

				#### Expectation of required job maintainers

				Due to the nature of the Kata Containers community having contributors

				spread around the world, required jobs being blocked due to infrastructure,

				or test issues can have a big impact on work. As such, the expectation is

				that when a problem with a required job is noticed/reported, the maintainers

				have one working day to acknowledge the issue, perform an initial

				investigation and then either fix it, or get it marked as non-required

				whilst the investigation and/or fix is done.

				### Re-marking of required status

				Once a job has been removed from the required list, it requires two

				consecutive successful nightly test runs before being made required

				again.
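
The re-marking rule above can be expressed as a small check. This is only a sketch: `eligible_for_required` is a hypothetical helper, not part of the Kata Containers tooling, and it assumes the caller supplies the two most recent nightly results, newest first. The commented `gh run list` feed is likewise an assumption, since a workflow-level conclusion is only a rough proxy for the per-job results the dashboard reports.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds only when the two most recent nightly
# results (passed newest first) are both "success".
eligible_for_required() {
  local newest="${1:-}" previous="${2:-}"
  [[ "${newest}" == "success" && "${previous}" == "success" ]]
}

# Possible feed (assumption: workflow-level conclusions are good enough):
#   mapfile -t results < <(gh run list --workflow ci-nightly.yaml --limit 2 \
#     --json conclusion --jq '.[].conclusion')
#   eligible_for_required "${results[@]}" && echo "eligible"

if eligible_for_required success success; then
  echo "job may be proposed as required again"
fi
```

The decision itself still goes through the PR process described above; a check like this would only help collect the evidence.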

				## Running tests

				### Running the tests as part of the CI

				If you're a maintainer of the project, you'll be able to kick in the tests by

				yourself.  With the current approach, you just need to add the `ok-to-test`

				label and the tests will automatically start.  We're moving, though, to use a

				`/test` command as part of a GitHub review comment, which will simplify this

				process.

				If you're not a maintainer, please, send a message on Slack or wait till one of

				the maintainers reviews your PR.  Maintainers will then kick in the tests on

				your behalf.

				In case a test fails and there's the suspicion it happens due to flakiness in

				the test itself, please create an issue for us, and then re-run (or ask

				maintainers to re-run) the tests following these steps:

				- Locate which test is failing

				- Click on "details"

				- In the top right corner, click on "Re-run jobs"

				- And then on "Re-run failed jobs"

				- And finally click on the green "Re-run jobs" button
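
For those who prefer a terminal, the same re-run can probably be done with the GitHub CLI, assuming `gh` is installed and authenticated with sufficient permissions. The sketch below only prints the command rather than executing it; `rerun_failed_cmd` is a hypothetical helper and the run ID shown is a placeholder, not a real run.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the gh command that re-runs only the failed
# jobs of a given workflow run. Nothing is executed against GitHub here.
rerun_failed_cmd() {
  local run_id="$1"
  local repo="${2:-kata-containers/kata-containers}"
  printf 'gh run rerun %s --failed --repo %s\n' "${run_id}" "${repo}"
}

# 1234567890 is a placeholder; the real ID appears in the URL of the run.
rerun_failed_cmd 1234567890
```

Printing the command first makes it easy to review before running it, which matters given that some of these jobs consume paid runners.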

				> [!NOTE]

				> TODO: We need figures here

				### Running the tests locally

				It's important to set expectations here: you will not be able to run the tests

				exactly as they run in the CI, as you most likely won't have access to an

				Azure subscription.

				However, we're trying our best here to provide you with instructions on how to

				run the tests in an environment that's "close enough" and will help you to debug

				issues you find with the current tests, or even provide a proof-of-concept to

				the new test you're trying to add.

				The basic steps, which we will cover in detail below, are:

				 1. Create a VM matching the configuration of the target runner

				 2. Generate the artifacts you'll need for the test, or download them from a

				    current failed run

				 3. Follow the steps provided in the action itself to run the tests.

				Although the general overview looks easy, we know that some tricks need to be

				shared, and we'll go through the general process of debugging one non-Kubernetes

				and one Kubernetes specific test for educational purposes.

				One important thing to note is that "Create a VM" can be done in innumerable

				different ways, using the tools of your choice.  For the sake of simplicity in

				this guide, we'll be using `kcli`, which we strongly recommend if you're an

				inexperienced user and happen to be developing on a Linux box.

				For both non-Kubernetes and Kubernetes cases, we'll be using PR #8070 as an

				example, which at the time of writing serves the purpose well, as you can see

				that we have `nerdctl` and Kubernetes tests failing.

				## Debugging tests

				### Debugging a non-Kubernetes test

				As shown above, the `nerdctl` test is failing.

				As a developer you can go ahead to the details of the job, and expand the job

				that's failing in order to gather more information.

				But when that doesn't help, we need to set up our own environment to debug

				what's going on.

				Taking a look at the `nerdctl` test, which is located here, you can easily see

				that it runs-on a `garm-ubuntu-2304-smaller` virtual machine.

				The important parts to understand are `ubuntu-2304`, which is the OS the test

				runs on, and "smaller", which means we're running it on a machine

				with 2 CPUs and 8GB of RAM.

				With this information, we can go ahead and create a similar VM locally using `kcli`.

				```bash

				$ sudo kcli create vm -i ubuntu2304 -P disks=[60] -P numcpus=2 -P memory=8192 -P cpumodel=host-passthrough debug-nerdctl-pr8070

				```
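The mapping from runner label to `kcli` parameters can be sketched as a small helper that only prints the command. The function name is hypothetical, and the only size covered is "smaller" (2 CPUs, 8GB of RAM), since that is the only one described here. Deriving the image name by dropping the hyphen from `ubuntu-2304` is an assumption based on this single example.

```bash
# Hypothetical helper: print (not run) a kcli command for a runner label such
# as "garm-ubuntu-2304-smaller".
kcli_cmd_for_runner() {
    local label="$1"
    local vm_name="$2"
    local rest="${label#garm-}"   # e.g. ubuntu-2304-smaller
    local size="${rest##*-}"      # e.g. smaller
    local os="${rest%-*}"         # e.g. ubuntu-2304
    local image="${os//-/}"       # e.g. ubuntu2304 (assumed naming)
    local cpus mem

    case "${size}" in
        smaller) cpus=2; mem=8192 ;;
        # Other sizes are not described in this guide, so refuse to guess.
        *) echo "unknown runner size: ${size}" >&2; return 1 ;;
    esac

    echo "sudo kcli create vm -i ${image} -P disks=[60] -P numcpus=${cpus} -P memory=${mem} -P cpumodel=host-passthrough ${vm_name}"
}
```

Running `kcli_cmd_for_runner garm-ubuntu-2304-smaller debug-nerdctl-pr8070` prints the command used above, which you can review before executing it.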

				In order to run the tests, you'll need the "kata-tarball" artifacts, which you

				can build yourself using `make kata-tarball` (see below), or simply get them

				from the PR where the tests failed.  To download them, click on the "Summary"

				button that's on the top left corner, and then scroll down till you see the

				artifacts, as shown below.

				Unfortunately GitHub doesn't give us a link that we can download those from

				inside the VM, but we can download them on our local box, and then `scp` the

				tarball to the newly created VM that will be used for debugging purposes.

				> [!NOTE]

				> Those artifacts are only available (for 15 days) when all jobs are finished.

				Once you have the `kata-static.tar.zst` in your VM, you can log in to the VM with

				`kcli ssh debug-nerdctl-pr8070`, and then clone your development branch:

				```bash

				$ git clone --branch feat_add-fc-runtime-rs https://github.com/nubificus/kata-containers

				```

				Add the upstream as a remote, set up your git identity, and rebase your branch on top of upstream `main`:

				```bash

				$ git remote add upstream https://github.com/kata-containers/kata-containers

				$ git remote update

				$ git config --global user.email "you@example.com"

				$ git config --global user.name "Your Name"

				$ git rebase upstream/main

				```

				Now copy the `kata-static.tar.zst` into your `kata-containers/kata-artifacts` directory

				```bash

				$ mkdir kata-artifacts

				$ cp ../kata-static.tar.zst kata-artifacts/

				```

				> [!NOTE]

				> If you downloaded the .zip from GitHub, you need to unzip it first to get `kata-static.tar.zst`

				And finally run the tests following what's in the yaml file for the test you're

				debugging.

				In our case, the `run-nerdctl-tests-on-garm.yaml`.

				When looking at the file you'll notice that some environment variables are set,

				such as `KATA_HYPERVISOR`, and you should be aware that, for this particular example,

				the important steps to follow are:

				- Install the dependencies

				- Install kata

				- Run the tests

				Let's now run the steps mentioned above exporting the expected environment variables

				```bash

				$ export KATA_HYPERVISOR=dragonball

				$ bash ./tests/integration/nerdctl/gha-run.sh install-dependencies

				$ bash ./tests/integration/nerdctl/gha-run.sh install-kata

				$ bash tests/integration/nerdctl/gha-run.sh run

				```

				And with this you should be able to reproduce exactly the same issue found

				in the CI, and from now on you can build your own code, use your own binaries,

				and have fun debugging and hacking!

				### Debugging a Kubernetes test

				Steps for debugging the Kubernetes tests are very similar to the ones for

				debugging non-Kubernetes tests, with the caveat that what you'll need, this

				time, is not the `kata-static.tar.zst` tarball, but rather a payload to be used

				with kata-deploy.

				In order to generate your own kata-deploy image you can generate your own

				`kata-static.tar.zst` and then take advantage of the following script.  Be aware

				that the image generated and uploaded must be accessible by the VM where you'll

				be performing your tests.

				In case you want to take advantage of the payload that was already generated

				when you faced the CI failure, which is considerably easier, take a look at the

				failed job, then click on "Deploy Kata" and expand the "Final kata-deploy.yaml

				that is used in the test" section.  From there you can see exactly what you'll

				have to use when deploying kata-deploy in your local cluster.

				> [!NOTE]

				> TODO: WAINER TO FINISH THIS PART BASED ON HIS PR TO RUN A LOCAL CI

				## Adding new runners

				Any admin of the project is able to add or remove GitHub runners, and those are

				the folks you should rely on.

				If you need a new runner added, please tag @ac in the Kata Containers Slack,

				and someone from that group will be able to help you.

				If you're part of that group and you're looking for information on how to help

				someone, this is simple, and must be done in private. Basically what you have to

				do is:

				- Go to the kata-containers/kata-containers repo

				- Click on the Settings button, located in the top right corner

				- On the left panel, under "Code and automation", click on "Actions"

				- Click on "Runners"

				If you want to add a new self-hosted runner:

				- In the top right corner there's a green button called "New self-hosted runner"

				If you want to remove a current self-hosted runner:

				- For each runner there's a "..." menu, where you can just click and the

				  "Remove runner" option will show up

				## Known limitations

				As the GitHub Actions workflows are structured right now, we cannot test, as part

				of a PR, the addition of a GitHub action that's not triggered by a `pull_request` event.

				[gh-actions]: https://docs.github.com/en/actions

				[monitor-ex01]: https://github.com/kata-containers/kata-containers/commit/a3fb067f1bccde0cbd3fd4d5de12dfb3d8c28b60

				[monitor-ex02]: https://github.com/kata-containers/kata-containers/commit/489caf1ad0fae27cfd00ba3c9ed40e3d512fa492

									
										39

ci/darwin-test.sh
									
												
				@@ -7,16 +7,17 @@

				set -e

				cidir=$(dirname "$0")

				runtimedir=$cidir/../src/runtime

				runtimedir=${cidir}/../src/runtime

				genpolicydir=${cidir}/../src/tools/genpolicy

				build_working_packages() {

					# working packages:

					device_api=$runtimedir/pkg/device/api

					device_config=$runtimedir/pkg/device/config

					device_drivers=$runtimedir/pkg/device/drivers

					device_manager=$runtimedir/pkg/device/manager

					rc_pkg_dir=$runtimedir/pkg/resourcecontrol/

					utils_pkg_dir=$runtimedir/virtcontainers/utils

					device_api=${runtimedir}/pkg/device/api

					device_config=${runtimedir}/pkg/device/config

					device_drivers=${runtimedir}/pkg/device/drivers

					device_manager=${runtimedir}/pkg/device/manager

					rc_pkg_dir=${runtimedir}/pkg/resourcecontrol/

					utils_pkg_dir=${runtimedir}/virtcontainers/utils

					# broken packages :( :

					#katautils=$runtimedir/pkg/katautils

				@@ -24,15 +25,15 @@ build_working_packages() {

					#vc=$runtimedir/virtcontainers

					pkgs=(

						"$device_api"

						"$device_config"

						"$device_drivers"

						"$device_manager"

						"$utils_pkg_dir"

						"$rc_pkg_dir")

						"${device_api}"

						"${device_config}"

						"${device_drivers}"

						"${device_manager}"

						"${utils_pkg_dir}"

						"${rc_pkg_dir}")

					for pkg in "${pkgs[@]}"; do

						echo building "$pkg"

						pushd "$pkg" &>/dev/null

						echo building "${pkg}"

						pushd "${pkg}" &>/dev/null

						go build

						go test

						popd &>/dev/null

				@@ -40,3 +41,11 @@ build_working_packages() {

				}

				build_working_packages

				build_genpolicy() {

					echo "building genpolicy"

					pushd "${genpolicydir}" &>/dev/null

					make TRIPLE=aarch64-apple-darwin build

				}

				build_genpolicy

									
										2

ci/docs-url-alive-check.sh
									
												
				@@ -7,6 +7,6 @@

				set -e

				cidir=$(dirname "$0")

				source "${cidir}/lib.sh"

				source "${cidir}/../tests/common.bash"

				run_docs_url_alive_check

									
										58

ci/gh-util.sh
									
												
				@@ -10,7 +10,7 @@ set -o errtrace

				set -o nounset

				set -o pipefail

				[ -n "${DEBUG:-}" ] && set -o xtrace

				[[ -n "${DEBUG:-}" ]] && set -o xtrace

				script_name=${0##*/}

				@@ -25,7 +25,7 @@ die()

				usage()

				{

				    cat <<EOF

				Usage: $script_name [OPTIONS] [command] [arguments]

				Usage: ${script_name} [OPTIONS] [command] [arguments]

				Description: Utility to expand the abilities of the GitHub CLI tool, gh.

				@@ -48,7 +48,7 @@ Examples:

				- List issues for a Pull Request 123 in kata-containers/kata-containers repo

				  $ $script_name list-issues-for-pr 123

				  $ ${script_name} list-issues-for-pr 123

				EOF

				}

				@@ -57,11 +57,12 @@ list_issues_for_pr()

				    local pr="${1:-}"

				    local repo="${2:-kata-containers/kata-containers}"

				    [ -z "$pr" ] && die "need PR"

				    [[ -z "${pr}" ]] && die "need PR"

				    local commits=$(gh pr view ${pr} --repo ${repo} --json commits --jq .commits[].messageBody)

				    local commits

					commits=$(gh pr view "${pr}" --repo "${repo}" --json commits --jq .commits[].messageBody)

				    [ -z "$commits" ] && die "cannot determine commits for PR $pr"

				    [[ -z "${commits}" ]] && die "cannot determine commits for PR ${pr}"

				    # Extract the issue number(s) from the commits.

				    #

				@@ -78,24 +79,25 @@ list_issues_for_pr()

				    #

				    #     "<git-commit> <git-commit-msg>"

				    #

				    local issues=$(echo "$commits" |\

				        egrep -v "^( |	)" |\

				        egrep -i "fixes:* *(#*[0-9][0-9]*)" |\

				    local issues

					issues=$(echo "${commits}" |\

				        grep -v -E "^( |	)" |\

				        grep -i -E "fixes:* *(#*[0-9][0-9]*)" |\

				        tr ' ' '\n' |\

				        grep "[0-9][0-9]*" |\

				        sed 's/[.,\#]//g' |\

				        sort -nu || true)

				    [ -z "$issues" ] && die "cannot determine issues for PR $pr"

				    [[ -z "${issues}" ]] && die "cannot determine issues for PR ${pr}"

				    echo "# Issues linked to PR"

				    echo "#"

				    echo "# Fields: issue_number"

				    local issue

				    echo "$issues"|while read issue

				    echo "${issues}" | while read -r issue

				    do

				        printf "%s\n" "$issue"

				        printf "%s\n" "${issue}"

				    done

				}

				@@ -103,20 +105,21 @@ list_labels_for_issue()

				{

				    local issue="${1:-}"

				    [ -z "$issue" ] && die "need issue number"

				    [[ -z "${issue}" ]] && die "need issue number"

				    local labels=$(gh issue view ${issue} --repo kata-containers/kata-containers --json labels)

				    local labels

					labels=$(gh issue view "${issue}" --repo kata-containers/kata-containers --json labels)

				    [ -z "$labels" ] && die "cannot determine labels for issue $issue"

				    [[ -z "${labels}" ]] && die "cannot determine labels for issue ${issue}"

				    printf "$labels"

				    echo "${labels}"

				}

				setup()

				{

				    for cmd in gh jq

				    do

				        command -v "$cmd" &>/dev/null || die "need command: $cmd"

				        command -v "${cmd}" &>/dev/null || die "need command: ${cmd}"

				    done

				}

				@@ -124,29 +127,28 @@ handle_args()

				{

				    setup

				    local show_all="false"

				    local opt

				    while getopts "ahr:" opt "$@"

				    while getopts "hr:" opt "$@"

				    do

				        case "$opt" in

				            a) show_all="true" ;;

				        case "${opt}" in

				            h) usage && exit 0 ;;

				            r) repo="${OPTARG}" ;;

							*) echo "use '-h' to get list of supprted aruments" && exit 1 ;;

				        esac

				    done

				    shift $(($OPTIND - 1))

				    shift $((OPTIND - 1))

				    local repo="${repo:-kata-containers/kata-containers}"

				    local cmd="${1:-}"

				    case "$cmd" in

				    case "${cmd}" in

				        list-issues-for-pr) ;;

				        list-labels-for-issue) ;;

				        "") usage && exit 0 ;;

				        *) die "invalid command: '$cmd'" ;;

				        *) die "invalid command: '${cmd}'" ;;

				    esac

				    # Consume the command name

				@@ -155,20 +157,20 @@ handle_args()

				    local issue=""

				    local pr=""

				    case "$cmd" in

				    case "${cmd}" in

				        list-issues-for-pr)

				            pr="${1:-}"

				            list_issues_for_pr "$pr" "${repo}"

				            list_issues_for_pr "${pr}" "${repo}"

				            ;;

				        list-labels-for-issue)

				            issue="${1:-}"

				            list_labels_for_issue "$issue"

				            list_labels_for_issue "${issue}"

				            ;;

				        *) die "impossible situation: cmd: '$cmd'" ;;

				        *) die "impossible situation: cmd: '${cmd}'" ;;

				    esac

				    exit 0

									
										22

ci/install_go.sh
									
											
				@@ -1,22 +0,0 @@

				#!/usr/bin/env bash

				#

				# Copyright (c) 2019 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				set -e

				cidir=$(dirname "$0")

				source "${cidir}/lib.sh"

				clone_tests_repo

				new_goroot=/usr/local/go

				pushd "${tests_repo_dir}"

				# Force overwrite the current version of golang

				[ -z "${GOROOT}" ] || rm -rf "${GOROOT}"

				.ci/install_go.sh -p -f -d "$(dirname ${new_goroot})"

				[ -z "${GOROOT}" ] || sudo ln -sf "${new_goroot}" "${GOROOT}"

				go version

				popd

									
										126

ci/install_libseccomp.sh
									
												
				@@ -8,10 +8,13 @@

				set -o errexit

				script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

				script_name="$(basename "${BASH_SOURCE[0]}")"

				source "${script_dir}/../tests/common.bash"

				# Path to the ORAS cache helper for downloading tarballs (sourced when needed)

				# Use ORAS_CACHE_HELPER env var (set by build.sh in Docker) or fallback to repo path

				oras_cache_helper="${ORAS_CACHE_HELPER:-${script_dir}/../tools/packaging/scripts/download-with-oras-cache.sh}"

				# The following variables if set on the environment will change the behavior

				# of gperf and libseccomp configure scripts, that may lead this script to

				# fail. So let's ensure they are unset here.

				@@ -22,12 +25,12 @@ workdir="$(mktemp -d --tmpdir build-libseccomp.XXXXX)"

				# Variables for libseccomp

				libseccomp_version="${LIBSECCOMP_VERSION:-""}"

				if [ -z "${libseccomp_version}" ]; then

				    libseccomp_version=$(get_from_kata_deps "externals.libseccomp.version")

				if [[ -z "${libseccomp_version}" ]]; then

					libseccomp_version=$(get_from_kata_deps ".externals.libseccomp.version")

				fi

				libseccomp_url="${LIBSECCOMP_URL:-""}"

				if [ -z "${libseccomp_url}" ]; then

				    libseccomp_url=$(get_from_kata_deps "externals.libseccomp.url")

				if [[ -z "${libseccomp_url}" ]]; then

					libseccomp_url=$(get_from_kata_deps ".externals.libseccomp.url")

				fi

				libseccomp_tarball="libseccomp-${libseccomp_version}.tar.gz"

				libseccomp_tarball_url="${libseccomp_url}/releases/download/v${libseccomp_version}/${libseccomp_tarball}"

				@@ -35,77 +38,98 @@ cflags="-O2"

				# Variables for gperf

				gperf_version="${GPERF_VERSION:-""}"

				if [ -z "${gperf_version}" ]; then

				    gperf_version=$(get_from_kata_deps "externals.gperf.version")

				if [[ -z "${gperf_version}" ]]; then

					gperf_version=$(get_from_kata_deps ".externals.gperf.version")

				fi

				gperf_url="${GPERF_URL:-""}"

				if [ -z "${gperf_url}" ]; then

				    gperf_url=$(get_from_kata_deps "externals.gperf.url")

				if [[ -z "${gperf_url}" ]]; then

					gperf_url=$(get_from_kata_deps ".externals.gperf.url")

				fi

				gperf_tarball="gperf-${gperf_version}.tar.gz"

				gperf_tarball_url="${gperf_url}/${gperf_tarball}"

				# We need to build the libseccomp library from sources to create a static library for the musl libc.

				# However, ppc64le and s390x have no musl targets in Rust. Hence, we do not set cflags for the musl libc.

				if ([ "${arch}" != "ppc64le" ] && [ "${arch}" != "s390x" ]); then

				    # Set FORTIFY_SOURCE=1 because the musl-libc does not have some functions about FORTIFY_SOURCE=2

				    cflags="-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2"

				# Use ORAS cache for gperf downloads (gperf upstream can be unreliable)

				USE_ORAS_CACHE="${USE_ORAS_CACHE:-yes}"

				# We need to build the libseccomp library from sources to create a static

				# library for the musl libc.

				# However, ppc64le, riscv64 and s390x have no musl targets in Rust. Hence, we do

				# not set cflags for the musl libc.

				if [[ "${arch}" != "ppc64le" ]] && [[ "${arch}" != "riscv64" ]] && [[ "${arch}" != "s390x" ]]; then

					# Set FORTIFY_SOURCE=1 because the musl-libc does not have some functions about FORTIFY_SOURCE=2

					cflags="-U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -O2"

				fi

				die() {

				    msg="$*"

				    echo "[Error] ${msg}" >&2

				    exit 1

					msg="$*"

					echo "[Error] ${msg}" >&2

					exit 1

				}

				finish() {

				    rm -rf "${workdir}"

					rm -rf "${workdir}"

				}

				trap finish EXIT

				build_and_install_gperf() {

				    echo "Build and install gperf version ${gperf_version}"

				    mkdir -p "${gperf_install_dir}"

				    curl -sLO "${gperf_tarball_url}"

				    tar -xf "${gperf_tarball}"

				    pushd "gperf-${gperf_version}"

				    # Unset $CC for configure, we will always use native for gperf

				    CC= ./configure --prefix="${gperf_install_dir}"

				    make

				    make install

				    export PATH=$PATH:"${gperf_install_dir}"/bin

				    popd

				    echo "Gperf installed successfully"

					echo "Build and install gperf version ${gperf_version}"

					mkdir -p "${gperf_install_dir}"

					# Use ORAS cache if available and enabled

					if [[ "${USE_ORAS_CACHE}" == "yes" ]] && [[ -f "${oras_cache_helper}" ]]; then

						echo "Using ORAS cache for gperf download"

						source "${oras_cache_helper}"

						local cached_tarball

						cached_tarball=$(download_component gperf "$(pwd)")

						if [[ -f "${cached_tarball}" ]]; then

							gperf_tarball="${cached_tarball}"

						else

							echo "ORAS cache download failed, falling back to direct download"

							curl -sLO "${gperf_tarball_url}"

						fi

					else

						curl -sLO "${gperf_tarball_url}"

					fi

					tar -xf "${gperf_tarball}"

					pushd "gperf-${gperf_version}"

					# Unset $CC for configure, we will always use native for gperf

					CC="" ./configure --prefix="${gperf_install_dir}"

					make

					make install

					export PATH=${PATH}:"${gperf_install_dir}"/bin

					popd

					echo "Gperf installed successfully"

				}

				build_and_install_libseccomp() {

				    echo "Build and install libseccomp version ${libseccomp_version}"

				    mkdir -p "${libseccomp_install_dir}"

				    curl -sLO "${libseccomp_tarball_url}"

				    tar -xf "${libseccomp_tarball}"

				    pushd "libseccomp-${libseccomp_version}"

				    [ "${arch}" == $(uname -m) ] && cc_name="" || cc_name="${arch}-linux-gnu-gcc"

				    CC=${cc_name} ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"

				    make

				    make install

				    popd

				    echo "Libseccomp installed successfully"

					echo "Build and install libseccomp version ${libseccomp_version}"

					mkdir -p "${libseccomp_install_dir}"

					curl -sLO "${libseccomp_tarball_url}"

					tar -xf "${libseccomp_tarball}"

					pushd "libseccomp-${libseccomp_version}"

					[[ "${arch}" == $(uname -m) ]] && cc_name="" || cc_name="${arch}-linux-gnu-gcc"

					CC=${cc_name} ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"

					make

					make install

					popd

					echo "Libseccomp installed successfully"

				}

				main() {

				    local libseccomp_install_dir="${1:-}"

				    local gperf_install_dir="${2:-}"

					local libseccomp_install_dir="${1:-}"

					local gperf_install_dir="${2:-}"

				    if [ -z "${libseccomp_install_dir}" ] || [ -z "${gperf_install_dir}" ]; then

				        die "Usage: ${0} <libseccomp-install-dir> <gperf-install-dir>"

				    fi

					if [[ -z "${libseccomp_install_dir}" ]] || [[ -z "${gperf_install_dir}" ]]; then

						die "Usage: ${0} <libseccomp-install-dir> <gperf-install-dir>"

					fi

				    pushd "$workdir"

				    # gperf is required for building the libseccomp.

				    build_and_install_gperf

				    build_and_install_libseccomp

				    popd

					pushd "${workdir}"

					# gperf is required for building the libseccomp.

					build_and_install_gperf

					build_and_install_libseccomp

					popd

				}

				main "$@"

									
										16

ci/install_rust.sh
									
											
				@@ -1,16 +0,0 @@

				#!/usr/bin/env bash

				# Copyright (c) 2019 Ant Financial

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				set -e

				cidir=$(dirname "$0")

				source "${cidir}/lib.sh"

				clone_tests_repo

				pushd ${tests_repo_dir}

				.ci/install_rust.sh ${1:-}

				popd

									
										19

ci/install_vc.sh
									
											
				@@ -1,19 +0,0 @@

				#!/usr/bin/env bash

				#

				# Copyright (c) 2018 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				set -e

				cidir=$(dirname "$0")

				vcdir="${cidir}/../src/runtime/virtcontainers/"

				source "${cidir}/lib.sh"

				export CI_JOB="${CI_JOB:-default}"

				clone_tests_repo

				if [ "${CI_JOB}" != "PODMAN" ]; then

					echo "Install virtcontainers"

					make -C "${vcdir}" && chronic sudo make -C "${vcdir}" install

				fi

									
										45

ci/install_yq.sh
									
												
				@@ -5,29 +5,49 @@

				# SPDX-License-Identifier: Apache-2.0

				#

				[[ -n "${DEBUG}" ]] && set -o xtrace

				# If we fail for any reason a message will be displayed

				die() {

					msg="$*"

					echo "ERROR: $msg" >&2

					echo "ERROR: ${msg}" >&2

					exit 1

				}

				function verify_yq_exists() {

					local yq_path=$1

					local yq_version=$2

					local expected="yq (https://github.com/mikefarah/yq/) version ${yq_version}"

					if [[ -x  "${yq_path}" ]] && [[ "$(${yq_path} --version)"X == "${expected}"X ]]; then

						return 0

					else

						return 1

					fi

				}

				# Install the yq yaml query package from the mikefarah github repo

				# Install via binary download, as we may not have golang installed at this point

				function install_yq() {

					local yq_pkg="github.com/mikefarah/yq"

					local yq_version=3.4.1

					local yq_version=v4.44.5

					local precmd=""

					local yq_path=""

					INSTALL_IN_GOPATH=${INSTALL_IN_GOPATH:-true}

					if [ "${INSTALL_IN_GOPATH}"  == "true" ];then

					if [[ "${INSTALL_IN_GOPATH}" == "true" ]]; then

						GOPATH=${GOPATH:-${HOME}/go}

						mkdir -p "${GOPATH}/bin"

						local yq_path="${GOPATH}/bin/yq"

						yq_path="${GOPATH}/bin/yq"

					else

						yq_path="/usr/local/bin/yq"

					fi

					if verify_yq_exists "${yq_path}" "${yq_version}"; then

						echo "yq is already installed in correct version"

						return

					fi

					if [[ "${yq_path}" == "/usr/local/bin/yq" ]]; then

						# Check if we need sudo to install yq

						if [ ! -w "/usr/local/bin" ]; then

						if [[ ! -w "/usr/local/bin" ]]; then

							# Check if we have sudo privileges

							if ! sudo -n true 2>/dev/null; then

								die "Please provide sudo privileges to install yq"

				@@ -36,7 +56,6 @@ function install_yq() {

							fi

						fi

					fi

					[ -x  "${yq_path}" ] && [ "`${yq_path} --version`"X == "yq version ${yq_version}"X ] && return

					read -r -a sysInfo <<< "$(uname -sm)"

				@@ -54,15 +73,18 @@ function install_yq() {

						goarch=arm64

						;;

					"arm64")

						# If we're on an apple silicon machine, just assign amd64. 

						# The version of yq we use doesn't have a darwin arm build, 

						# If we're on an apple silicon machine, just assign amd64.

						# The version of yq we use doesn't have a darwin arm build,

						# but Rosetta can come to the rescue here.

						if [ $goos == "Darwin" ]; then 

						if [[ ${goos} == "Darwin" ]]; then

							goarch=amd64

						else 

						else

							goarch=arm64

						fi

						;;

					"riscv64")

						goarch=riscv64

						;;

					"ppc64le")

						goarch=ppc64le

						;;

				@@ -85,8 +107,7 @@ function install_yq() {

					## NOTE: ${var,,} => gives lowercase value of var

					local yq_url="https://${yq_pkg}/releases/download/${yq_version}/yq_${goos}_${goarch}"

					${precmd} curl -o "${yq_path}" -LSsf "${yq_url}"

					[ $? -ne 0 ] && die "Download ${yq_url} failed"

					${precmd} curl -o "${yq_path}" -LSsf "${yq_url}" || die "Download ${yq_url} failed"

					${precmd} chmod +x "${yq_path}"

					if ! command -v "${yq_path}" >/dev/null; then

									
										87

ci/lib.sh
									
											
				@@ -1,87 +0,0 @@

				#

				# Copyright (c) 2018 Intel Corporation

				#

				# SPDX-License-Identifier: Apache-2.0

				set -o nounset

				GOPATH=${GOPATH:-${HOME}/go}

				export kata_repo="github.com/kata-containers/kata-containers"

				export kata_repo_dir="$GOPATH/src/$kata_repo"

				export tests_repo="${tests_repo:-github.com/kata-containers/tests}"

				export tests_repo_dir="$GOPATH/src/$tests_repo"

				export branch="${target_branch:-main}"

				# Clones the tests repository and checkout to the branch pointed out by

				# the global $branch variable.

				# If the clone exists and `CI` is exported then it does nothing. Otherwise

				# it will clone the repository or `git pull` the latest code.

				#

				clone_tests_repo()

				{

					if [ -d "$tests_repo_dir" ]; then

						[ -n "${CI:-}" ] && return

						# git config --global --add safe.directory will always append

						# the target to .gitconfig without checking the existence of

						# the target, so it's better to check it before adding the target repo.

						local sd="$(git config --global --get safe.directory ${tests_repo_dir} || true)"

						if [ -z "${sd}" ]; then

							git config --global --add safe.directory ${tests_repo_dir}

						fi

						pushd "${tests_repo_dir}"

						git checkout "${branch}"

						git pull

						popd

					else

						git clone -q "https://${tests_repo}" "$tests_repo_dir"

						pushd "${tests_repo_dir}"

						git checkout "${branch}"

						popd

					fi

				}

				run_static_checks()

				{

					# Make sure we have the targeting branch

					git remote set-branches --add origin "${branch}"

					git fetch -a

					bash "$kata_repo_dir/tests/static-checks.sh" "$@"

				}

				run_docs_url_alive_check()

				{

					# Make sure we have the targeting branch

					git remote set-branches --add origin "${branch}"

					git fetch -a

					bash "$kata_repo_dir/tests/static-checks.sh" --docs --all "$kata_repo"

				}

				run_get_pr_changed_file_details()

				{

					# Make sure we have the targeting branch

					git remote set-branches --add origin "${branch}"

					git fetch -a

					source "$kata_repo_dir/tests/common.bash"

					get_pr_changed_file_details

				}

				# Check if the 1st argument version is greater than and equal to 2nd one

				# Version format: [0-9]+ separated by period (e.g. 2.4.6, 1.11.3 and etc.)

				#

				# Parameters:

				#	$1	- a version to be tested

				#	$2	- a target version

				#

				# Return:

				# 	0 if $1 is greater than and equal to $2

				#	1 otherwise

				version_greater_than_equal() {

					local current_version=$1

					local target_version=$2

					smaller_version=$(echo -e "$current_version\n$target_version" | sort -V | head -1)

					if [ "${smaller_version}" = "${target_version}" ]; then

						return 0

					else

						return 1

					fi

				}

									
										157

ci/openshift-ci/README.md
									
										Normal file
									
												
				@@ -0,0 +1,157 @@

				OpenShift CI

				============

				This directory contains scripts used by

				[the OpenShift CI](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers)

				pipelines to monitor selected functional tests on OpenShift.

				There are 2 pipelines; their history and logs can be accessed here:

				* [main - currently supported OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests)

				* [next - currently under development OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-next-e2e-tests)

				Running openshift-tests on OCP with kata-containers manually

				============================================================

				To run openshift-tests (or other suites) with kata-containers one can use

				the kata-webhook. To deploy everything you can mimic the CI pipeline by:

				```bash

				#!/bin/bash -e

				# Setup your kubectl and check it's accessible by

				kubectl get nodes

				# Deploy kata (set KATA_DEPLOY_IMAGE to override the default kata-deploy-ci:latest image)

				./test.sh

				# Deploy the webhook

				KATA_RUNTIME=kata-qemu cluster/deploy_webhook.sh

				```

				This should ensure kata-containers as well as kata-webhook are installed and

				working. Before running the openshift-tests it's (currently) recommended to

				ignore some security features by:

				```bash

				#!/bin/bash -e

				oc adm policy add-scc-to-group privileged system:authenticated system:serviceaccounts

				oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccounts

				oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline

				```

				Now you should be ready to run the openshift-tests. Our CI only uses a subset

				of tests; for the current ``TEST_SKIPS`` see

				[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).

				The following steps require the [openshift tests](https://github.com/openshift/origin)

				to be cloned and built in the current directory:

				```bash

				#!/bin/bash -e

				# Define tests to be skipped (see the pipeline config for the current version)

				TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"

				# Get the list of tests to be executed

				TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"

				# Store the list of tests in /tmp/tsts file

				echo "${TESTS}" | grep -v "$TEST_SKIPS" > /tmp/tsts

				# Remove pre-existing temporary files as well as previous results

				OUT=RESULTS/tmp

				rm -Rf /tmp/*test* /tmp/e2e-*

				# -f keeps the script (run with -e) from aborting when the directory does not exist yet

				rm -Rf "${OUT}"

				mkdir -p "${OUT}"

				# Run the tests ignoring the monitor health checks

				./openshift-tests run --provider azure -o "$OUT/job.log" --junit-dir "$OUT" --file /tmp/tsts --max-parallel-tests 5 --cluster-stability Disruptive --run '^\[sig-node\].*|^\[sig-network\]'

				```
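
The skip list above is a single basic-regex alternation handed to ``grep -v``. A minimal sketch of just that filtering step follows. The three test names, the shortened skip pattern and the ``filter_tests`` helper are illustrative assumptions and not part of the pipeline; the ``\|`` alternation relies on GNU grep behaviour.

```shell
#!/bin/bash
# Hypothetical helper: drop every line of stdin that matches the skip pattern in $1.
# The pattern uses GNU grep basic-regex syntax, where "\|" separates alternatives.
filter_tests() {
	grep -v "$1"
}

# Illustrative values only; the real list comes from `openshift-tests run --dry-run`.
SAMPLE_SKIPS='\[sig-network\].*bond\|\[sig-node\] Security Context should support seccomp'
SAMPLE_TESTS='"[sig-node] Pods should be submitted and removed"
"[sig-network] pods with bond interface"
"[sig-node] Security Context should support seccomp runtime/default"'

# Only the first sample test survives the filter.
echo "${SAMPLE_TESTS}" | filter_tests "${SAMPLE_SKIPS}"
```

Because the whole skip list is one pattern, a new skip is added by appending another ``\|``-separated alternative rather than by chaining several ``grep`` calls.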

				[!NOTE]

				Note that we ignore the cluster stability checks because our public cloud is

				not that stable and running with VMs instead of containers results in minor

				stability issues. Some of the old monitor stability tests do not honor

				the ``--cluster-stability`` setting; simply ignore these. If you

				get a message like ``invariant was violated`` or

				``error: failed due to a MonitorTest failure``, it usually indicates that only

				those kinds of tests failed while the real tests passed. See

				[wrapped-openshift-tests.sh](https://github.com/openshift/release/blob/master/ci-operator/config/kata-containers/kata-containers/wrapped-openshift-tests.sh)

				for details on how our pipeline deals with that.

				[!TIP]

				To compare multiple results locally, one can use the

				[junit2html](https://github.com/inorton/junit2html) tool.

				Best-effort kata-containers cleanup

				===================================

				If you need to clean up the cluster after testing, you can use the

				``cleanup.sh`` script from the current directory. It tries to delete all

				resources created by ``test.sh`` as well as ``cluster/deploy_webhook.sh``

				ignoring all failures. The primary purpose of this script is to allow

				soft-cleanup after deployment to test different versions without

				re-provisioning everything.

				[!WARNING]

				Do not rely on this script in production; return codes are not checked!

				Bisecting e2e test failures

				===========================

				Let's say the OCP pipeline passed running with

				``quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``

				but failed running with

				``quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``

				and you'd like to know which PR caused the regression. You can either run with

				all 60 tags in between, or you can use the [bisecter](https://github.com/ldoktor/bisecter)

				to reduce the number of steps.

				Before running the bisection you need a reproducer script. A sample one called

				``sample-test-reproducer.sh`` is provided in this directory, but you might

				want to copy and modify it, especially:

				* ``OCP_DIR`` - directory where your openshift/release is located (can be exported)

				* ``E2E_TEST`` - openshift-test(s) to be executed (can be exported)

				* behaviour of SETUP (returning 125 skips the current image tag,

				  returning >=128 interrupts the execution, everything else reports the tag as a failure)

				* what should be executed (perhaps running the setup is enough for you or

				  you might want to be looking for specific failures...)

				* use ``timeout`` to interrupt execution in case you know things should be faster
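
The exit-code convention from the list above can be sketched as a small function. This is an illustration only: ``classify_exit`` and its verdict strings are hypothetical names, a zero exit code is assumed to mean the tag is good, and the provided ``sample-test-reproducer.sh`` may be organized differently.

```shell
#!/bin/bash
# Hypothetical helper: map a reproducer exit code to a bisection verdict,
# following the convention described in the list above.
classify_exit() {
	local rc="$1"
	if [[ "${rc}" -eq 0 ]]; then
		echo "good"        # assumed: success marks the tag as passing
	elif [[ "${rc}" -eq 125 ]]; then
		echo "skip"        # the tag cannot be tested, skip it
	elif [[ "${rc}" -ge 128 ]]; then
		echo "interrupt"   # stop the whole bisection
	else
		echo "bad"         # any other code reports the tag as a failure
	fi
}

# Print the verdict for a few representative exit codes.
for rc in 0 1 125 130; do
	echo "${rc} -> $(classify_exit "${rc}")"
done
```

Returning 125 from a setup step that failed for unrelated reasons (for example a registry outage) keeps such a tag from being recorded as a regression.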

				Executing that script with the GOOD commit should pass

				``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-d7afd31fd40e37a675b25c53618904ab57e74ccd-amd64``

				and fail when executed with the BAD commit

				``./sample-test-reproducer.sh quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``.

				To get the list of all tags in between those two PRs you can use the

				``bisect-range.sh`` script:

				```bash

				./bisect-range.sh d7afd31fd40e37a675b25c53618904ab57e74ccd 9f512c016e75599a4a921bd84ea47559fe610057

				```
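
For orientation, the last step of ``bisect-range.sh`` turns commit hashes into a comma-separated list of image references that can be handed to ``bisecter start``. A minimal sketch of only that formatting step is shown here, using a hypothetical ``format_tags`` helper and placeholder hashes; the real script additionally queries the registry with ``skopeo`` and orders the commits with ``git log``.

```shell
#!/bin/bash
REPO="quay.io/kata-containers/kata-deploy-ci"
ARCH="amd64"

# Hypothetical helper: read one commit hash per line on stdin and print
# "<repo>:kata-containers-<hash>-<arch>" entries joined by commas.
format_tags() {
	sed -e "s@^@${REPO}:kata-containers-@" -e "s@\$@-${ARCH}@" | paste -s -d, -
}

# Placeholder hashes, for illustration only.
printf '%s\n' aaaa1111 bbbb2222 | format_tags
```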

				[!NOTE]

				The tagged images are only built per PR, not for individual commits. See

				[kata-deploy-ci](https://quay.io/kata-containers/kata-deploy-ci) to see the

				available images.

				To find out which PR caused this regression, you can either manually try the

				individual commits or you can simply execute:

				```bash

				bisecter start "$(./bisect-range.sh d7afd31fd40 9f512c016)"

				OCP_DIR=/path/to/openshift/release bisecter run ./sample-test-reproducer.sh

				```

				[!NOTE]

				If you use ``KATA_WITH_SYSTEM_QEMU=yes`` you might want to deploy once with

				it and skip it for the cleanup. That way you can (in most cases) test

				all images with a single MCP update instead of one MCP update per image.

				[!TIP]

				You can check the bisection progress during/after execution by running

				``bisecter log`` from the current directory. Before starting a new

				bisection you need to execute ``bisecter reset``.

				Peer pods

				=========

				It's possible to run similar testing on peer-pods using cloud-api-adaptor.

				Our CI configuration to run inside Azure's OCP is in ``peer-pods-azure.sh``

				and can be used to replace the ``test.sh`` step in the snippets above.

									
										30

ci/openshift-ci/bisect-range.sh
									
										Executable file
									
												
				@@ -0,0 +1,30 @@

				#!/bin/bash

				# Copyright (c) 2024 Red Hat, Inc.

				#

				# SPDX-License-Identifier: Apache-2.0

				#

				if [[ "$#" -gt 2 ]] || [[ "$#" -lt 1 ]] ; then

					echo "Usage: $0 GOOD [BAD]"

					echo "Prints list of available kata-deploy-ci tags between GOOD and BAD commits (by default BAD is the latest available tag)"

					exit 255

				fi

				GOOD="$1"

				[[ -n "$2" ]] && BAD="$2"

				ARCH=amd64

				REPO="quay.io/kata-containers/kata-deploy-ci"

				TAGS=$(skopeo list-tags "docker://${REPO}")

				# For testing

				#echo "$TAGS" > tags

				#TAGS=$(cat tags)

				# Only amd64

				TAGS=$(echo "${TAGS}" | jq '.Tags' | jq "map(select(endswith(\"${ARCH}\")))" | jq -r '.[]')

				# Sort by git

				SORTED=""

				[[ -n "${BAD}" ]] && LOG_ARGS="${GOOD}~1..${BAD}" || LOG_ARGS="${GOOD}~1.."

				for TAG in $(git log --merges --pretty=format:%H --reverse "${LOG_ARGS}"); do

					[[ "${TAGS}" =~ ${TAG} ]] && SORTED+="

				kata-containers-${TAG}-${ARCH}"

				done

				# Comma separated tags with repo

				echo "${SORTED}" | tail -n +2 | sed -e "s@^@${REPO}:@" | paste -s -d, -

									
										30

ci/openshift-ci/cleanup.sh
									
												
				@@ -7,11 +7,14 @@

				# This script tries to removes most of the resources added by `test.sh` script

				# from the cluster.

				scripts_dir=$(dirname $0)

				scripts_dir=$(dirname "$0")

				deployments_dir=${scripts_dir}/cluster/deployments

				configs_dir=${scripts_dir}/configs

				source ${scripts_dir}/lib.sh

				# shellcheck disable=SC1091 # import based on variable

				source "${scripts_dir}/lib.sh"

				# Set your katacontainers repo dir location

				[[ -z "${katacontainers_repo_dir}" ]] && echo "Please set katacontainers_repo_dir variable to your kata repo"

				# Set to 'yes' if you want to configure SELinux to permissive on the cluster

				# workers.

				@@ -25,6 +28,10 @@ WORKAROUND_9206_CRIO=${WORKAROUND_9206_CRIO:-no}

				# Ignore errors as we want best-effort-approach here

				trap - ERR

				# Delete webhook resources

				oc delete -f "${scripts_dir}/../../tools/testing/kata-webhook/deploy"

				oc delete -f "${scripts_dir}/cluster/deployments/configmap_kata-webhook.yaml.in"

				# Delete potential smoke-test resources

				oc delete -f "${scripts_dir}/smoke/service.yaml"

				oc delete -f "${scripts_dir}/smoke/service_kubernetes.yaml"

				@@ -32,24 +39,19 @@ oc delete -f "${scripts_dir}/smoke/http-server.yaml"

				# Delete test.sh resources

				oc delete -f "${deployments_dir}/relabel_selinux.yaml"

				if [[ "$WORKAROUND_9206_CRIO" == "yes" ]]; then

				if [[ "${WORKAROUND_9206_CRIO}" == "yes" ]]; then

					oc delete -f "${deployments_dir}/workaround-9206-crio-ds.yaml"

					oc delete -f "${deployments_dir}/workaround-9206-crio.yaml"

				fi

				[ ${SELINUX_PERMISSIVE} == "yes" ] && oc delete -f "${deployments_dir}/machineconfig_selinux.yaml.in"

				[[ ${SELINUX_PERMISSIVE} == "yes" ]] && oc delete -f "${deployments_dir}/machineconfig_selinux.yaml.in"

				# Delete kata-containers

				pushd "$katacontainers_repo_dir/tools/packaging/kata-deploy"

				oc delete -f kata-deploy/base/kata-deploy.yaml

				helm uninstall kata-deploy --wait --namespace kube-system

				oc -n kube-system wait --timeout=10m --for=delete -l name=kata-deploy pod

				oc apply -f kata-cleanup/base/kata-cleanup.yaml

				echo "Wait for all related pods to be gone"

				( repeats=1; for i in $(seq 1 600); do

				( repeats=1; for _ in $(seq 1 600); do

				  oc get pods -l name="kubelet-kata-cleanup" --no-headers=true -n kube-system 2>&1 | grep "No resources found" -q && ((repeats++)) || repeats=1

				  [ "$repeats" -gt 5 ] && echo kata-cleanup finished && break

				  [[ "${repeats}" -gt 5 ]] && echo kata-cleanup finished && break

				  sleep 1

				done) || { echo "There are still some kata-cleanup related pods after 600 iterations"; oc get all -n kube-system; exit -1; }

				oc delete -f kata-cleanup/base/kata-cleanup.yaml

				oc delete -f kata-rbac/base/kata-rbac.yaml

				done) || { echo "There are still some kata-cleanup related pods after 600 iterations"; oc get all -n kube-system; exit 1; }

				oc delete -f runtimeclasses/kata-runtimeClasses.yaml
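
The wait loop in ``cleanup.sh`` above only reports the cleanup as finished after the check has succeeded several times in a row, so a single transient "No resources found" does not end the wait early. A generic sketch of that pattern follows; the ``wait_stable`` helper, its arguments and the zero-second interval in the demo are assumptions for illustration and do not appear in the script.

```shell
#!/bin/bash
# Hypothetical helper: run "$@" up to MAX_TRIES times and return 0 once it has
# succeeded NEEDED times in a row; return 1 if that never happens.
# Usage: wait_stable MAX_TRIES NEEDED INTERVAL command [args...]
wait_stable() {
	local max_tries="$1" needed="$2" interval="$3"
	shift 3
	local streak=0
	for _ in $(seq 1 "${max_tries}"); do
		if "$@"; then
			streak=$((streak + 1))
		else
			streak=0    # any failure resets the streak
		fi
		[[ "${streak}" -ge "${needed}" ]] && return 0
		sleep "${interval}"
	done
	return 1
}

# Demo: "true" always succeeds, so three consecutive successes are reached.
wait_stable 10 3 0 true && echo "stable"
```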

Compare commits


4748 Commits

3.3.0-test ... 3.27.0

7 .editorconfig Normal file
30 .editorconfig-checker.json Normal file
30 .github/actionlint.yaml vendored Normal file
2 .github/cargo-deny-composite-action/cargo-deny-generator.sh vendored
4 .github/cargo-deny-composite-action/cargo-deny-skeleton.yaml.in vendored
92 .github/dependabot.yml vendored Normal file
7 .github/workflows/PR-wip-checks.yaml vendored
30 .github/workflows/actionlint.yaml vendored Normal file
59 .github/workflows/add-issues-to-project.yaml vendored
53 .github/workflows/add-pr-sizing-label.yaml vendored
215 .github/workflows/basic-ci-amd64.yaml vendored
108 .github/workflows/basic-ci-s390x.yaml vendored Normal file
134 .github/workflows/build-checks-preview-riscv64.yaml vendored Normal file
159 .github/workflows/build-checks.yaml vendored
394 .github/workflows/build-kata-static-tarball-amd64.yaml vendored
272 .github/workflows/build-kata-static-tarball-arm64.yaml vendored
206 .github/workflows/build-kata-static-tarball-ppc64le.yaml vendored
75 .github/workflows/build-kata-static-tarball-riscv64.yaml vendored Normal file
295 .github/workflows/build-kata-static-tarball-s390x.yaml vendored
75 .github/workflows/build-kubectl-image.yaml vendored Normal file
14 .github/workflows/cargo-deny-runner.yaml vendored
33 .github/workflows/ci-coco-stability.yaml vendored Normal file
35 .github/workflows/ci-devel.yaml vendored Normal file
34 .github/workflows/ci-nightly-riscv.yaml vendored Normal file
32 .github/workflows/ci-nightly-s390x.yaml vendored
19 .github/workflows/ci-nightly.yaml vendored
32 .github/workflows/ci-on-push.yaml vendored
128 .github/workflows/ci-weekly.yaml vendored Normal file
374 .github/workflows/ci.yaml vendored
38 .github/workflows/cleanup-resources.yaml vendored Normal file
100 .github/workflows/codeql.yml vendored Normal file
44 .github/workflows/commit-message-check.yaml vendored
28 .github/workflows/darwin-tests.yaml vendored
31 .github/workflows/docs-url-alive-check.yaml vendored
32 .github/workflows/docs.yaml vendored Normal file
29 .github/workflows/editorconfig-checker.yaml vendored Normal file
55 .github/workflows/gatekeeper-skipper.yaml vendored Normal file
55 .github/workflows/gatekeeper.yaml vendored Normal file
53 .github/workflows/govulncheck.yaml vendored Normal file
36 .github/workflows/kata-runtime-classes-sync.yaml vendored
92 .github/workflows/move-issues-to-in-progress.yaml vendored
35 .github/workflows/nydus-snapshotter-version-in-sync.yaml vendored Normal file
43 .github/workflows/osv-scanner.yaml vendored Normal file
130 .github/workflows/payload-after-push.yaml vendored
66 .github/workflows/publish-kata-deploy-payload-amd64.yaml vendored
71 .github/workflows/publish-kata-deploy-payload-arm64.yaml vendored
75 .github/workflows/publish-kata-deploy-payload-ppc64le.yaml vendored
69 .github/workflows/publish-kata-deploy-payload-s390x.yaml vendored
108 .github/workflows/publish-kata-deploy-payload.yaml vendored Normal file
43 .github/workflows/push-oras-tarball-cache.yaml vendored Normal file
59 .github/workflows/release-amd64.yaml vendored
59 .github/workflows/release-arm64.yaml vendored
61 .github/workflows/release-ppc64le.yaml vendored
64 .github/workflows/release-s390x.yaml vendored
233 .github/workflows/release.yaml vendored
67 .github/workflows/run-cri-containerd-tests-ppc64le.yaml vendored
63 .github/workflows/run-cri-containerd-tests-s390x.yaml vendored
75 .github/workflows/run-cri-containerd-tests.yaml vendored Normal file
98 .github/workflows/run-k8s-tests-on-aks.yaml vendored
91 .github/workflows/run-k8s-tests-on-arm64.yaml vendored Normal file
100 .github/workflows/run-k8s-tests-on-garm.yaml vendored
131 .github/workflows/run-k8s-tests-on-nvidia-gpu.yaml vendored Normal file
33 .github/workflows/run-k8s-tests-on-ppc64le.yaml vendored
107 .github/workflows/run-k8s-tests-on-zvsi.yaml vendored
86 .github/workflows/run-k8s-tests-with-crio-on-garm.yaml vendored
157 .github/workflows/run-kata-coco-stability-tests.yaml vendored Normal file
365 .github/workflows/run-kata-coco-tests.yaml vendored
61 .github/workflows/run-kata-deploy-tests-on-aks.yaml vendored
65 .github/workflows/run-kata-deploy-tests-on-garm.yaml vendored
90 .github/workflows/run-kata-deploy-tests.yaml vendored Normal file
23 .github/workflows/run-kata-monitor-tests.yaml vendored
90 .github/workflows/run-metrics.yaml vendored
46 .github/workflows/run-runk-tests.yaml vendored
60 .github/workflows/scorecard.yaml vendored Normal file
32 .github/workflows/shellcheck.yaml vendored Normal file
35 .github/workflows/shellcheck_required.yaml vendored Normal file
17 .github/workflows/stale.yaml vendored
18 .github/workflows/static-checks-self-hosted.yaml vendored
119 .github/workflows/static-checks.yaml vendored