kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-07-01 22:50:54 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	850b385f6b	Revert "tests: skip Guaranteed QoS test for SNP/TDX runtime-rs" This reverts commit `6588014b54`, as the needed PR[0] was merged this morning, allowing us to just revert the image. [0]: https://github.com/kata-containers/kata-containers/pull/13173 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-25 18:18:17 +02:00
Fabiano Fidêncio	b2f7314d31	tests: harden sandbox sizing manifests for k8s cpu workloads Route runtime-rs tests to dedicated manifests/templates and ensure the CPU allocation workloads always carry explicit memory limits, avoiding Dragonball sandbox startup failures from InvalidMemorySize(0). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
Fabiano Fidêncio	a34c74a2d4	runtime-rs: size static sandboxes with overhead values When static sandbox sizing is enabled, keep configured defaults when workloads do not specify CPU or memory limits. When limits are present, size the VM as requested resources plus overhead_vcpus/overhead_memory values derived from runtime-rs profile defaults. Limit-driven vCPU sizing is clamped to a minimum of one vCPU so a 0.0 result never yields an unbootable VM, and sandbox setup fails early with a clear, actionable error when the computed memory is 0 MiB (pointing at memory limits or non-zero default/overhead memory settings). This keeps static VM sizing predictable across runtime-rs profiles, including NVIDIA ones. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-25 13:56:11 +02:00
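The sizing rule described in a34c74a2d4 can be sketched as below. This is an illustrative Python sketch, not the runtime-rs implementation: the function name, the parameter names, and the choice to round fractional vCPUs up are assumptions. Only the rules themselves come from the commit message: keep defaults when no limit is set, use limits plus overhead otherwise, clamp to one vCPU, and fail early on 0 MiB.

```python
import math
from typing import Optional, Tuple


def size_static_sandbox(
    default_vcpus: int,
    default_memory_mib: int,
    overhead_vcpus: float,
    overhead_memory_mib: int,
    cpu_limit: Optional[float] = None,
    memory_limit_mib: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (vcpus, memory_mib) for a statically sized sandbox (hypothetical helper)."""
    if cpu_limit is None:
        # No CPU limit on the workload: keep the configured default.
        vcpus = default_vcpus
    else:
        # Limit plus overhead; rounding up is an assumption of this sketch.
        # The clamp ensures a 0.0 result never yields an unbootable VM.
        vcpus = max(1, math.ceil(cpu_limit + overhead_vcpus))

    if memory_limit_mib is None:
        # No memory limit on the workload: keep the configured default.
        memory_mib = default_memory_mib
    else:
        memory_mib = memory_limit_mib + overhead_memory_mib

    if memory_mib == 0:
        # Fail early with an actionable message instead of booting a 0 MiB VM.
        raise ValueError(
            "computed sandbox memory is 0 MiB: set a memory limit or a "
            "non-zero default/overhead memory value"
        )
    return vcpus, memory_mib
```

For example, a 2-CPU, 512 MiB limit with one overhead vCPU and 256 MiB of overhead memory would size the VM at 3 vCPUs and 768 MiB under these assumptions.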
Aurélien Bombo	1217dd1584	Merge pull request #12373 from kata-containers/disable-guest-empty-dir runtime: Set `disable_guest_empty_dir = true` by default	2026-06-24 20:09:46 -05:00
Aurélien Bombo	77c3e36cf7	tests: Support GENPOLICY_SETTINGS_DIR with drop-in-examples Follow-up to `3dd77bf576`. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2026-06-24 15:22:13 -05:00
Fabiano Fidêncio	84db260d9a	docs: detail composable image runtime contracts in proposal Update the composable-vm-images proposal with the design decisions we only arrived at after experimenting with the implementation: * Replace the hardcoded agent path-resolution table with the data-driven components.toml manifest (process levels, args/optional_args, env, wait_socket, ${...} substitution, and select/variants), keeping the agent generic. * Document the attester-variant contract: NVRC exports KATA_ATTESTER_VARIANT and the manifest selects the stock vs NVIDIA attestation-agent. * Document the runtime dependency requirements found during bring-up: the nvidia attester's LD_LIBRARY_PATH (libnvat closure in the coco addon + NVML in the gpu addon) and the NVML-init failure mode, plus CDH secure_mount tooling placement -- plain storage (mke2fs/mkfs.ext4/dd) in the base vs encrypted storage (cryptsetup) in the coco addon, the CDH PATH, and the base/addon ABI lockstep. * Reflect the storage tooling and bundled libraries in the base/coco-addon build sections, and mark the GPU addon as implemented. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-22 20:04:25 +02:00
Fabiano Fidêncio	9761ea2235	Merge pull request #13164 from manuelh-dev/mahuber/remove-resource-requests tests: use limits for Kata workload resources	2026-06-22 20:01:33 +02:00
Fabiano Fidêncio	f1ebefcdfb	Merge pull request #13222 from fidencio/topic/nvidia-switch-to-kata-deploy-jobs kata-deploy: nvidia: Default to the Job-based deployment mode	2026-06-22 12:55:10 +02:00
Cameron Baird	65a5f272f8	ci: Introduce tests for VM template factory Add k8s-vm-templating-test.bats which exercises pod create with the factory initialized on the target node. Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>	2026-06-19 18:00:02 +00:00
Manuel Huber	aafd16515c	tests: use limits for Kata workload manifests Kata sizes VM CPU and memory from OCI limits, not Kubernetes resource requests. Requests are consumed by the Kubernetes control plane, but they do not drive Kata VM or sandbox sizing today. Convert the straightforward Kata workload manifests and kata-deploy examples from resource requests to limits so the declared resources match the values Kata uses for VM provisioning. Keep requests where the fixture intentionally validates Kubernetes request/limit behavior. Update fixture expectations affected by the conversion. The LimitRange fixture is limit-only at 500m. Raise the policy deployment limits to 500m and 800Mi. These tests boot CoCo/runtime-rs sandboxes with policy/initdata, and the former 100m/100Mi values became real runtime limits after the conversion, which is too constrained for the CI environments. Leave PVC storage requests, explicit request/limit validation fixtures, the env resourceFieldRef request, and non-Kata workload examples unchanged where requests are handled outside the Kata shim resource sizing path. If Kata later grows request-aware sandbox sizing, for example through Sandbox API based resource plumbing, these requests can be reintroduced where they carry the intended semantics. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-19 09:38:15 -07:00
Fabiano Fidêncio	0ddb2ee1f1	Merge pull request #13160 from LandonTClipp/kata_visible_devices feat(agent): translate KATA_VISIBLE_DEVICES into CDI GPU requests	2026-06-16 19:10:35 +02:00
davidweisse	ac56ea21d8	genpolicy: support pod-level resources Add support for resource requests and limits in the PodSpec. Fixes #12816 Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>	2026-06-16 15:30:22 +02:00
LandonTClipp	4a9da5d37a	chore(docs): Add info on building and running custom artifacts I created this over the course of testing my VISIBLE_CDI_DEVICES changes. I think this will be useful to folks who don't understand the right way to deploy custom artifacts. Signed-off-by: LandonTClipp <lclipp@coreweave.com>	2026-06-16 11:44:09 +02:00
Harshitha Gowda	6588014b54	tests: skip Guaranteed QoS test for SNP/TDX runtime-rs The Guaranteed QoS test is currently failing for SNP and TDX runtime-rs due to a podOverhead configuration issue. The test requests 600Mi of memory which, combined with the 2048Mi podOverhead, exceeds 2GiB and triggers memory management issues in confidential guests. This is a temporary skip until the podOverhead fix is merged. Related: https://github.com/kata-containers/kata-containers/pull/13228 Signed-off-by: Harshitha Gowda <hgowda@amd.com>	2026-06-15 20:04:22 +00:00
Manuel Huber	9ffdb1219d	tests: add runtime config drop-in helpers Add common Kubernetes test helpers for locating the active per-shim Kata runtime config directory and copying/removing TOML fragments under config.d. Update the NVIDIA NUMA test to install its temporary numa_mapping override through those helpers. This gives follow-up tests a shared pattern for temporary runtime config overrides. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-12 21:43:06 +00:00
Fabiano Fidêncio	fefc0b75ab	kata-deploy: nvidia: Default to the Job-based deployment mode Switch the NVIDIA GPU example values file to install Kata via the Job-based deployment mode (deploymentMode: job) instead of the always-on, privileged DaemonSet, so that nothing keeps running on the node once the install completes. To exercise this in our CI, make the helm_helper aware of the deployment mode coming from the (base) values file: - In "job" mode, clear job.nodeSelectorExpressions so the dispatcher targets every discovered node. Our CI clusters are typically single-node, where the only node carries the control-plane label, and the default selector excludes control-plane/master nodes. - There is no always-on DaemonSet to wait on in "job" mode. The dispatcher runs as a blocking post-install hook and the final per-node stage labels the node, so wait until at least one node carries the katacontainers.io/kata-runtime label as the "install complete" signal (dumping Job/pod logs on timeout). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 22:55:11 +02:00
Fabiano Fidêncio	54878fa373	kata-deploy: add job deployment mode driven by the job-dispatcher Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in `deploymentMode: job` that installs Kata via short-lived, per-node install Jobs instead of the long-running DaemonSet. The DaemonSet remains the default and is now gated behind `deploymentMode == daemonset`. Rather than render one Job per node into the Helm release (which grows the release secret O(nodes) and offers no rollout pacing), job mode ships a single tiny post-install/post-upgrade hook Job that runs the kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes LIVE from the API server and stamps out one node-pinned install Job per node from a constant-size ConfigMap of Job templates, keeping at most `job.parallelism` in flight and refilling as they finish. This guarantees per-node coverage with a paced rollout while the Helm release stays O(1) regardless of fleet size. New nodes are picked up by re-running `helm upgrade`; there is no always-on component. Each per-node Job runs the staged install pipeline as ordered initContainers and exits: host-check -> artifacts -> cri (initContainers, run sequentially) -> label (main container) The privilege split is explicit: the dispatcher pod is a pure control-plane client (lists nodes, manages Jobs in its own namespace) and runs fully unprivileged under a dedicated, least-privilege ServiceAccount (kata-rbac.yaml); only the per-node Jobs it creates carry the privileged kata-deploy host-mutation rights. Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob): - job.nodes: explicit node-name list passed to the dispatcher, and - job.nodeSelector (equality map) ANDed with - job.nodeSelectorExpressions (k8s label-selector requirements: In / NotIn / Exists / DoesNotExist), compiled into a single label-selector string the dispatcher resolves live. The default expressions target worker (non-control-plane) nodes, so no custom node labeling is required; set the expressions to [] to target all discovered nodes. Reuses the commonEnv/commonVolume* helpers and adds the stageContainer, serviceAccountName, dispatcherServiceAccountName, dispatcherImage and perNodeJob helpers shared by the dispatcher and the staged Jobs. The default (daemonset) render is unchanged. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <cursoragent@cursor.com>	2026-06-12 18:58:33 +02:00
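The node-selection step in 54878fa373 (an equality map ANDed with label-selector requirements, compiled into one selector string) can be illustrated with a short sketch. The chart does this in Helm templates; the Python below is only an approximation of the idea, and the function name and the dictionary shape of each expression are assumptions. The output uses the standard Kubernetes label-selector string syntax.

```python
from typing import Any, List, Mapping, Sequence


def compile_label_selector(
    node_selector: Mapping[str, str],
    expressions: Sequence[Mapping[str, Any]],
) -> str:
    """Compile an equality map plus requirements into one selector string.

    Hypothetical helper: each expression is assumed to look like
    {"key": ..., "operator": ..., "values": [...]}.
    """
    # Equality requirements come first: key=value.
    parts: List[str] = [f"{key}={value}" for key, value in node_selector.items()]

    for expr in expressions:
        key = expr["key"]
        operator = expr["operator"]
        values = ",".join(expr.get("values", []))
        if operator == "In":
            parts.append(f"{key} in ({values})")
        elif operator == "NotIn":
            parts.append(f"{key} notin ({values})")
        elif operator == "Exists":
            parts.append(str(key))
        elif operator == "DoesNotExist":
            parts.append(f"!{key}")
        else:
            raise ValueError(f"unsupported operator: {operator}")

    # A comma in a label selector means logical AND.
    return ",".join(parts)
```

With an empty map and an empty expression list the result is an empty selector, which matches every node; this corresponds to setting the expressions to [] to target all discovered nodes.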
Fabiano Fidêncio	a016fd0485	Merge pull request #13198 from fidencio/topic/fix-ci-tee-static-sizing-overhead tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI	2026-06-12 11:46:56 +02:00
Fabiano Fidêncio	723f74e782	Merge pull request #13209 from fidencio/topic/fix-kata-monitor-runc-pod-runtime tests: launch kata-monitor runc workload with explicit runtime	2026-06-12 11:40:19 +02:00
Fabiano Fidêncio	cda6c8c6e0	tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI Increase memory request/limit values used by k8s memory and QoS integration workloads so SNP/TDX static-sized sandboxes boot reliably under the new sizing defaults. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 22:03:36 +02:00
Fabiano Fidêncio	9e597d33f2	tests: launch kata-monitor runc workload with explicit runtime The kata-monitor negative test creates a non-kata pod and asserts it does not appear in the kata-monitor cache (built from /run/vc/sbs, where only kata sandboxes register). However, the workload was started without a runtime handler, so it used containerd's default runtime, which in the CI containerd config is set to kata, so the "runc" pod was actually launched as a kata sandbox, registered under /run/vc/sbs, and tripped the assertion ("cache: got runc pod ..."). Start the workload with an explicit runc handler (configurable via RUNC_RUNTIME) so it is a genuine runc sandbox that never touches /run/vc/sbs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-11 21:59:53 +02:00
Alex Lyn	1034d7fc46	tests: Add support nydus tests for qemu-runtime-rs and clh-runtime-rs This commit enables qemu-runtime-rs/clh-runtime-rs and makes the tests compatible with qemu-runtime-rs and clh-runtime-rs. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Alex Lyn	4eb7512e7b	docs: Update how-to guide for virtio-fs-nydus with runtime-rs Add comprehensive documentation for using virtio-fs-nydus shared filesystem with Kata Containers. This guide covers: (1) Clarify configuration options for virtio-fs-nydus and nydus image preparation and usage. (2) Update daemon configuration and lifecycle management and introduce standalone, inline nydus architecture. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-06-11 21:42:48 +02:00
Fabiano Fidêncio	38416f78ec	Merge pull request #13190 from manuelh-dev/mahuber/fix-num-cpus-bats tests: fix k8s-number-cpus expectation	2026-06-10 21:59:21 +02:00
Fabiano Fidêncio	92a9691470	tests: add kata-monitor helm chart k8s test Add a single-job k8s test that installs the kata-deploy helm chart with monitor.enabled=true, pointed at the per-PR kata-monitor image built earlier in the same run, and exercises both the rollout and the user-visible behaviour: * the kata-monitor DaemonSet rolls out and the pod stays up without container restarts; * a real kata-runtime probe pod is scheduled, then /metrics and /sandboxes are scraped through the apiserver pod-proxy to prove kata-monitor sees the sandbox (non-zero running-shim count plus at least one per-sandbox kata_shim_* metric); * after the probe pod is deleted, /metrics drops back to a zero running-shim count. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	285d5daa23	tests: install latest cri-tools dynamically Resolve the cri-tools release at install time instead of pinning a version in versions.yaml: install_cri_tools now queries the GitHub releases API for the absolute latest stable tag, and the kata-monitor, cri-containerd and nydus jobs call it directly. Also write /etc/crictl.yaml during containerd setup so crictl stops emitting deprecation warnings about the legacy default endpoints. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	63fec205fe	tests: run kata-monitor functional tests against the dedicated image Exercise the published kata-monitor container image (the one built by publish-kata-monitor-payload-amd64) rather than the on-disk binary, so integration regressions like the recent glibc/musl mismatch surface at PR time. The kata-monitor-tests.sh script keeps the binary fallback for ad-hoc local runs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	d5bc1177c0	tests: focus kata-monitor CI on containerd active Drop the stale CRI-O matrix entry (its cri-tools pin was several releases behind) along with the exclude that hid the containerd job, and pin the remaining job to containerd's "active" track (currently v2.2) via CONTAINERD_VERSION. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-09 14:33:30 +02:00
Fabiano Fidêncio	5000000883	tests: restore SystemdCgroup in installed containerd Set runc SystemdCgroup=true when generating /etc/containerd/config.toml during containerd installation, restoring behavior that was mistakenly dropped. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 10:46:38 +02:00
Fabiano Fidêncio	3ca9eb94b9	cri-containerd: fix v1 sanity-check config generation Avoid emitting unsupported plugin keys and empty runtime options in the v1.x config path so containerd 1.7 can load the generated TOML during runc sanity checks. While here, let's also dump the temporary cri-integration config on failure to speed diagnosis. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-09 10:46:38 +02:00
Fabiano Fidêncio	ac2221a6a5	Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3 versions: Bump containerd to 2.3	2026-06-09 08:21:58 +02:00
Manuel Huber	f37fb18b8c	tests: fix k8s-number-cpus expectation As pointed out in kata-containers/kata-containers#12961, the k8s-number-cpus retry loop could fail all retried assertions and still pass. k8s-number-cpus retried until the guest reported three CPUs, but the post-loop result was never checked. Bash suppresses errexit for the equality test before && break, so the test could exhaust retries and still pass. The current kata-qemu handler sizes vCPUs from fractional container quotas: two 500m limits produce one workload vCPU, then the default vCPU is added and rounded once. Expect two CPUs and assert the final retry result so the test fails if the count never converges. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Assisted-by: OpenAI Codex <codex@openai.com>	2026-06-08 22:50:02 +00:00
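The vCPU arithmetic in f37fb18b8c can be checked with a small worked example. This is a sketch under stated assumptions, not the kata-qemu code: the function names are hypothetical, rounding is assumed to be upward, and the per-container variant is shown only as a contrast that happens to yield the old expectation of three.

```python
import math
from typing import Sequence


def vcpus_rounded_once(limits_millicpu: Sequence[int], default_vcpus: int = 1) -> int:
    """Sum fractional quotas, add the default vCPU, then round a single time."""
    workload_vcpus = sum(limits_millicpu) / 1000.0
    return math.ceil(workload_vcpus + default_vcpus)


def vcpus_rounded_per_container(limits_millicpu: Sequence[int], default_vcpus: int = 1) -> int:
    """Hypothetical contrast: round every container up before summing."""
    return sum(math.ceil(m / 1000.0) for m in limits_millicpu) + default_vcpus


# Two containers with 500m limits each, plus one default vCPU.
limits = [500, 500]
print(vcpus_rounded_once(limits))           # 0.5 + 0.5 + 1 = 2.0
print(vcpus_rounded_per_container(limits))  # 1 + 1 + 1
```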
Fabiano Fidêncio	48ebbbec3a	kata-deploy: honor debug mode with CLI log-level Make the chart pass --log-level debug automatically when debug=true so CI and troubleshooting runs emit full rendered config dumps without requiring a separate log-level override. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:25:48 +02:00
Fabiano Fidêncio	95b8e8bea9	tests: update remaining containerd callers for containerd 2.x tests/functional/vfio-ap/run.sh: - Source tests/common.bash so the schema helpers are available. - configure_containerd_for_runtime_rs: write kata-qemu-runtime-rs configuration via a conf.d drop-in. Schema >= 3 uses io.containerd.cri.v1.runtime; schema 2 uses io.containerd.grpc.v1.cri. The sandboxer field is emitted only for schema >= 3. tests/integration/nerdctl/gha-run.sh: - Fix "containerd config default" pipe: propagate PATH so the newly installed binary is found, suppress stdout, and call ensure_containerd_conf_d_rootful_api_sockets. tests/integration/kubernetes/gha-run.sh: - Fix jq filter for devmapper snapshotter (.version // 0 >= 3). - Add ensure_containerd_conf_d_rootful_api_sockets after config setup. tests/gha-run-k8s-common.sh: - Remove the redundant "containerd config default \| sed" override; overwrite_containerd_config (called via check_containerd_config_for_kata) now handles SystemdCgroup and all other containerd config setup. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	1caacda174	tests/cri-containerd: update integration tests for containerd 2.x Adapt create_containerd_config to work with containerd 2.x while keeping compatibility with v1.x for completeness: - Drop the direct config.toml patching in favour of conf.d fragments: use containerd_render_config_default_with_imports to generate the base config, then write separate drop-ins for API socket overrides, debug settings, and the Kata runtime. - Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection). - Detect cfg_schema via _containerd_blob_schema_version to select the right plugin table: schema >= 3 -> io.containerd.cri.v1.runtime schema 2 -> io.containerd.grpc.v1.cri and to emit the sandboxer field only on schema >= 3. - Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable set by export_go_toolchain_for_containerd_source_builds is preserved during the containerd source build. The require_containerd_binary_default_schema_v3_plus call is kept: the test explicitly clones and builds containerd 2.x from source, so a schema v2 binary should never appear here. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	7428832c86	tests/nydus: make containerd config schema-aware Configure containerd for nydus differently depending on the active config schema, because conf.d drop-in fragments are only honoured the same way by containerd 2.x. config_containerd now delegates to _containerd_resolved_schema_version (from common.bash) to detect the active schema and passes it to config_containerd_core, which emits schema-appropriate config: schema >= 3 (containerd v2.x): Keep the base config and add a conf.d drop-in fragment using the io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and io.containerd.cri.v1.images to select nydus as the snapshotter. schema 2 (containerd v1.x): conf.d is not honoured the same way, so replace config.toml wholesale with a complete, self-contained file using the io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and no sandboxer field. The [proxy_plugins] block is written in both cases as it is schema-version agnostic. Teardown restores the whole config.toml (schema v2 path) or removes the drop-in fragment (schema v3+ path) as appropriate. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	1bb43d0a19	tests/common: make overwrite_containerd_config schema-aware Rewrite overwrite_containerd_config so that it works with containerd v1.x (schema v2) as well as containerd v2.x (schema v3+): - Always regenerate /etc/containerd/config.toml from the installed binary via "sudo containerd config default". - Call ensure_containerd_conf_d_rootful_api_sockets after regenerating the base config. - Detect the effective schema via _containerd_resolved_schema_version. - Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime plugin path with sandboxer = podsandbox into a conf.d drop-in. - Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin path without sandboxer into the drop-in. check_containerd_config_for_kata no longer appends a schema guard; the function supports both schema generations intentionally. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
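The schema-dependent choice of plugin table in 1bb43d0a19 can be sketched as follows. The real helper is a shell function in tests/common.bash; this Python version only illustrates the branching. The function name, the runtime name `kata`, and the `runtime_type` value are illustrative assumptions, while the two plugin IDs and the `sandboxer` rule come from the commit message.

```python
def render_kata_dropin(schema_version: int, runtime: str = "kata") -> str:
    """Render a conf.d TOML fragment for a Kata runtime (hypothetical helper)."""
    if schema_version >= 3:
        # containerd v2.x
        plugin = "io.containerd.cri.v1.runtime"
    elif schema_version == 2:
        # containerd v1.x
        plugin = "io.containerd.grpc.v1.cri"
    else:
        raise ValueError(f"unsupported containerd config schema: {schema_version}")

    lines = [
        f"[plugins.'{plugin}'.containerd.runtimes.{runtime}]",
        # Assumed value for this sketch; the commit does not spell it out.
        "runtime_type = 'io.containerd.kata.v2'",
    ]
    if schema_version >= 3:
        # The sandboxer field is only emitted for schema v3 and newer.
        lines.append("sandboxer = 'podsandbox'")
    return "\n".join(lines) + "\n"
```

Calling `render_kata_dropin(2)` yields a fragment under the `io.containerd.grpc.v1.cri` table with no `sandboxer` line, which mirrors the containerd v1.x path described above.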
Fabiano Fidêncio	18fbf4cd5d	tests/common: fix install_cri_containerd for containerd 2.x Three issues prevented containerd 2.x from working correctly after installation: 1. Socket uid/gid mismatch: "containerd config default" was run as the unprivileged user, which produced uid = <runner-uid> in the API socket stanza instead of uid = 0. Run it under sudo so the default output is owned by root. 2. Stale systemd unit: the CI runner ships a pre-installed containerd whose unit file is left in place after the binary is replaced by the test installer. The old unit causes "MigrateConfigTo: index out of range" panics when the new binary tries to load a schema v4 config. Always overwrite the unit file from the template so the running binary and the unit file stay in sync. 3. Schema guard removed: install_cri_containerd installs whatever version was requested (v1.7 or v2.3) and must not abort on a valid schema v2 binary. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	fbf133ce3a	tests/common: add containerd config schema helpers Introduce helper functions used by later commits to make containerd configuration schema-aware. _containerd_blob_schema_version(): Parse the version = <n> line from a containerd config blob and echo the integer. _containerd_resolved_schema_version(): Run "containerd config default" and return the schema version of the active binary. Drives conditional logic in overwrite_containerd_config and other helpers. containerd_emit_rootful_api_socket_overrides(): Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets. Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped tables. require_containerd_config_schema_v3_plus() / require_containerd_binary_default_schema_v3_plus(): Guard helpers that abort with a clear message when the installed containerd is older than v2.x. Used only in test paths that explicitly build containerd 2.x from source. containerd_render_config_default_with_imports(): Write a fresh "containerd config default" to a file and ensure the conf.d import glob is present, ready for drop-in fragments. export_go_toolchain_for_containerd_source_builds(): Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the exact toolchain in its go.mod without changing the global Go version. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
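What `_containerd_blob_schema_version()` is described as doing (parse the `version = <n>` line from a config blob and echo the integer) can be approximated in a few lines. The actual helper is a shell function; the Python name and the regular expression below are assumptions made for illustration.

```python
import re

# Matches a line such as "version = 3", allowing surrounding whitespace.
_VERSION_LINE = re.compile(r"^[ \t]*version[ \t]*=[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def blob_schema_version(blob: str) -> int:
    """Return the integer schema version found in a containerd config blob."""
    match = _VERSION_LINE.search(blob)
    if match is None:
        raise ValueError("no 'version = <n>' line found in config blob")
    return int(match.group(1))


# Minimal sample blob; real "containerd config default" output is much longer.
sample = "version = 3\nroot = '/var/lib/containerd'\n"
print(blob_schema_version(sample))
```

The first matching line wins, so a quoted or non-numeric `version` key elsewhere in the blob is ignored by this sketch.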
Fabiano Fidêncio	8ffe4e6c02	tests: add journalctl diagnostics on containerd restart failure When restart_systemd_service_with_no_burst_limit fails or times out waiting for the containerd socket, emit "journalctl -xeu containerd.service" output so the failure reason is visible in CI logs without requiring a separate log-collection step. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	e122d7ffb0	versions: bump containerd to 2.3 and define minimum/latest test matrix Bump the containerd version used by CI from v1.7.25 to v2.3.0. Rename the version-range fields in versions.yaml and throughout the GitHub Actions workflows from lts/active/version/sandbox_api to minimum/latest to make their meaning self-evident: minimum: "v1.7" # oldest containerd branch under test latest: "v2.3" # newest containerd branch under test Drop the bare version field (superseded by the matrix) and the sandbox_api alias (covered by latest). Update all containerd_version matrix entries in the workflow files accordingly, and update gha-run-k8s-common.sh to resolve the new key names. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Cursor <noreply@cursor.com>	2026-06-08 19:20:14 +02:00
Fabiano Fidêncio	b119b051cb	kata-deploy: support drop-in configs for default runtimes Allow operators to provide per-shim drop-in TOML for built-in runtimes and reconcile stale override files so upgrades and migrations remain safe when drop-ins are added or removed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Assisted-by: Codex	2026-06-08 13:31:03 +02:00
Fupan Li	024c2531a5	Merge pull request #13029 from fidencio/topic/rfc-composable-vm-images docs: add composable VM images design proposal	2026-06-08 18:40:35 +08:00
Fabiano Fidêncio	2440b5940b	docs: add composable VM images design proposal Add an RFC document describing the composable image architecture that replaces monolithic guest rootfs images with a lean base image plus purpose-specific addon images cold-plugged as virtio-blk devices. The proposal covers the runtime configuration (extra_images), host-side cold-plugging, guest-side mounting via systemd and dm-verity, agent-side dynamic path resolution, the image build pipeline, and the security model. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-07 13:58:17 +02:00
Fabiano Fidêncio	57c61e0c2f	tests: unskip hard-coded policy tests on qemu-tdx-runtime-rs Enable the hard-coded init-data policy test gate for qemu-tdx-runtime-rs so runtime-rs and Go TDX variants exercise the same Kubernetes policy coverage. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-06-06 22:48:20 +02:00
Fabiano Fidêncio	43321c7a78	Merge pull request #12931 from mythi/qemu-tdx-tests tests: fix TDX runtime-rs and initdata tests	2026-06-06 11:42:19 +02:00
Fabiano Fidêncio	f6ff9578d4	Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner ci: remove Mariner annotations and use new config	2026-06-05 20:22:58 +02:00
Mikko Ylinen	013e901f1b	tests: re-enable initdata tests for qemu-tdx The CoCo initdata tests for signature verification and authenticated registry never worked on qemu-tdx, so they have been disabled until now. Add them back now that all necessary fixes are in place. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00
Mikko Ylinen	9313e336b5	tests: set image.image_pull_proxy for CDH initdata The initdata tests set kernel arguments to "", which resets the kernel arguments configured by Helm install. However, the TDX runner depends on the agent.https_proxy= kernel argument to pull images. For the initdata tests to work on TDX, the same proxy needs to be added to the CDH configuration via image.image_pull_proxy. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00
Mikko Ylinen	f3a0ef6a7c	tests: use kubectl set to configure KBS env No need to patch yamls locally. Also, set RUST_LOG=debug and enable https_proxy for all TDX targets when the runner has HTTPS_PROXY set. Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>	2026-06-05 16:04:05 +03:00


2181 Commits