kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-05 12:02:33 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	b0a87880e7	Merge pull request #12826 from fidencio/topic/fix-concurrent-map-access-in-wait runtime: Fix concurrent map read/write panic in Wait()	2026-04-14 08:48:52 +02:00
Fabiano Fidêncio	b17dd2a902	runtime: Fix concurrent map read/write panic in Wait() Wait() was releasing s.mu immediately after getContainer(), then calling getExec() — which reads c.execs — without holding any lock. Concurrent Exec() or Delete() calls that write to c.execs under s.mu triggered a "concurrent map read and map write" fatal panic. Add a dedicated sync.RWMutex to the container struct that protects the execs map. getExec() now acquires a read lock internally, and all writes go through new setExec()/deleteExec() helpers that acquire the write lock. This keeps the locking concern local to the map and avoids complicating the s.mu usage in Wait(). Add a regression test (TestConcurrentExecAccess) that exercises concurrent getExec reads against setExec/deleteExec writes; this reliably reproduces the panic under the race detector without the fix. Fixes: #12825 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-13 21:14:28 +02:00
Fabiano Fidêncio	4c567a9c05	ci: Reduce TEE test scope for PR runs TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full test suite on every PR consumes these resources unnecessarily, since most tests exercise what is already exercised by the -coco-dev CIs. Introduce a `tee-test-scope` workflow input (small/full) and a new `baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests that are TEE-relevant: attestation tests (encrypted/authenticated/signed image pull, confidential attestation) plus policy and trusted ephemeral data storage tests. PR runs default to "small" (12 tests), nightly runs use "full" (59 tests), and manual dispatch offers a dropdown to choose. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-13 20:26:46 +02:00
Fabiano Fidêncio	bd6377a038	Merge pull request #12614 from manuelh-dev/mahuber/image-signing-nim tests: nvidia: Enforce image signing for NIM test	2026-04-11 14:48:04 +02:00
Fabiano Fidêncio	5eb7844183	Merge pull request #12430 from stevenhorsman/cargo-deny-static-checks static-checks: Rework cargo deny check	2026-04-11 12:05:53 +02:00
stevenhorsman	8be3a24112	ci: Update cargo-deny in gatekeeper Update the name and move it to the static checks as we don't need to ensure it's running for non-code changes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	9448988783	workflow: Update cargo deny check The cargo deny generated action doesn't seem to work and seems unnecessarily complex, so try using EmbarkStudios/cargo-deny-action instead Fixes: #11218 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a0410e0d5c	static-checks: Update cargo deny config The previous config is not valid, so update it based on information from https://embarkstudios.github.io/cargo-deny/checks/cfg.html Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	a32c6fd9ff	mem-agent: Add package metadata Make the authors, edition and license be inherited from the workspace Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
stevenhorsman	5bcc006447	runtime-rs: Add missing license The ch-config crate was missing a license Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-11 08:46:32 +01:00
Manuel Huber	7daeb78b67	tests: nvidia: Enforce image signing for NIM test Validate container image signatures for the NIM test using NVIDIA's public signing key. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-04-11 09:22:50 +02:00
Fabiano Fidêncio	3ce3644c3c	Merge pull request #12807 from PiotrProkop/blk-sector-rust runtime-rs: allow specifying logical/physical sector size for block devices	2026-04-11 00:42:45 +02:00
Fabiano Fidêncio	6f3c11aec4	Merge pull request #12808 from fidencio/topic/agent-allow-configuring-launch-process-timeout agent: Make launch_process_timeout configurable	2026-04-11 00:36:01 +02:00
Fabiano Fidêncio	d4a042a155	Merge pull request #12813 from fitzthum/bump-gc-ma-sigs Bump guest components to pickup additional signature support	2026-04-10 23:57:19 +02:00
Fabiano Fidêncio	78fa4c88e2	Merge pull request #12814 from fidencio/topic/nvidia-always-do-vcpu-pinning runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs	2026-04-10 23:47:44 +02:00
Fabiano Fidêncio	7244389ad4	runtime: Set `enable_vcpus_pinning = true` for NVIDIA configs So we can have a better performance by default. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 16:41:34 +02:00
Fabiano Fidêncio	1d77c4e60f	Merge pull request #12752 from LizZhang315/add-overheadEnabled helm: add overheadEnabled switch for runtimeclass	2026-04-10 16:40:42 +02:00
Tobin Feldman-Fitzthum	ff26a6b876	versions: update image-rs to pickup signature fixes The new version of image-rs supports more types of signed images. First, we added support for a few more key types. Second, we added support for multi-arch images where the manifest digest is signed but the individual arch manifest is not. These images are relatively common, so let's pick up the fix asap. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:54:58 -07:00
Tobin Feldman-Fitzthum	2588a0e5a5	agent-ctl: bump image-rs version I don't think agent-ctl will benefit from the new image-rs features, but let's update it to be complete. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-10 06:52:53 -07:00
Fabiano Fidêncio	e8f34a2b26	agent: Update protocol This is not related to this PR, but rather to #12734, which ended up not running the `make src/agent generate-protocols`. While here, let's also fix it. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-10 14:47:01 +02:00
Fabiano Fidêncio	36a2d8e7f2	agent: Make launch_process_timeout configurable The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata agent is insufficient for environments with NVIDIA GPUs and NVSwitches, where the attestation-agent needs significantly more time to collect evidence during initialization (e.g. ~2 seconds per NVSwitch). When the timeout expires, the agent (PID 1) exits with an error, causing the guest kernel to perform an orderly shutdown before the attestation-agent has finished starting. Make this timeout configurable via the kernel parameter agent.launch_process_timeout (in seconds), preserving the 6-second default for backward compatibility. The Go runtime is wired up to pass this value from the TOML config's [agent.kata] section through to the kernel command line. The NVIDIA GPU configs set the new default to 15 seconds. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-10 14:47:01 +02:00
PiotrProkop	82de35c720	runtime-rs: allow specifying logical/physical sector size for block devices Add two new configuration knobs that control the logical and physical sector sizes advertised by virtio-blk devices to the guest: block_device_logical_sector_size (config file) block_device_physical_sector_size (config file) io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation) io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation) The annotation names are abbreviated relative to the config file keys because Kubernetes enforces a 63-character limit on annotation name segments, and the full names would exceed it. Both settings default to 0 (let QEMU decide). When set, they are passed as logical_block_size and physical_block_size in the QMP device_add command during block device hotplug. Setting logical_sector_size smaller than the container filesystem block size will cause EINVAL on mount. The physical_sector_size can always be set independently. Values must be 0 or a power of 2 in the range [512, 65536]; other values are rejected with an error at sandbox creation time. Signed-off-by: PiotrProkop <pprokop@nvidia.com>	2026-04-10 11:14:51 +02:00
Fabiano Fidêncio	fd6375d8d5	Merge pull request #12806 from kata-containers/topic/ci-run-runtime-rs-on-SNP ci: Run qemu-snp-runtime-rs tests in the CI	2026-04-10 11:01:20 +02:00
LizZhang315	2312f67c9b	helm: add overheadEnabled switch for runtimeclass Add a global and per-shim configurable switch to enable/disable the overhead section in generated RuntimeClasses. This allows users to omit overhead when it's not needed or managed externally. Priority: per-shim > global > default(true). Signed-off-by: LizZhang315 <123134987@qq.com>	2026-04-10 10:26:11 +02:00
Fabiano Fidêncio	218077506b	Merge pull request #12769 from RuoqingHe/runtime-rs-allow-install-on-riscv runtime-rs: Allow installation on RISC-V platforms	2026-04-10 10:24:40 +02:00
Fabiano Fidêncio	dca89485f0	Merge pull request #12802 from stevenhorsman/bump-golang-1.25.9 versions: bump golang to 1.25.9	2026-04-10 06:50:35 +02:00
Fabiano Fidêncio	72fb41d33b	kata-deploy: Symlink original config to per-shim runtime copy Users were confused about which configuration file to edit because kata-deploy copied the base config into a per-shim runtime directory (runtimes/<shim>/) for config.d support, leaving the original file in place untouched. This made it look like the original was the authoritative config, when in reality the runtime was loading the copy from the per-shim directory. Replace the original config file with a symlink pointing to the per-shim runtime copy after the copy is made. The runtime's ResolvePath / EvalSymlinks follows the symlink and lands in the per-shim directory, where it naturally finds config.d/ with all drop-in fragments. This makes it immediately obvious that the real configuration lives in the per-shim directory and removes the ambiguity about which file to inspect or modify. During cleanup, the symlink at the original location is explicitly removed before the runtime directory is deleted. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 17:16:40 +02:00
Steve Horsman	9e8069569e	Merge pull request #12734 from Apokleos/rm-v9p-rs runtime-rs: Remove virtio-9p Shared Filesystem Support	2026-04-09 16:15:55 +01:00
Fabiano Fidêncio	5e1ab0aa7d	tests: Support runtime-rs QEMU cmdline format in attestation test The k8s-confidential-attestation test extracts the QEMU command line from journal logs to compute the SNP launch measurement. It only matched the Go runtime's log format ("launching <path> with: [<args>]"), but runtime-rs logs differently ("qemu args: <args>"). Handle both formats so the test works with qemu-snp-runtime-rs. Made-with: Cursor Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
Fabiano Fidêncio	3b155ab0b1	ci: Run runtime-rs tests for SNP As we're in the process of stabilising runtime-rs for the coming 4.0.0 release, we had better start running as many tests as possible with that. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-04-09 16:35:08 +02:00
stevenhorsman	31f9a5461b	versions: bump golang to 1.25.9 Bump the go version to resolve CVEs: - GO-2026-4947 - GO-2026-4946 - GO-2026-4870 - GO-2026-4869 - GO-2026-4865 - GO-2026-4864 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-04-09 08:59:40 +01:00
Hyounggyu Choi	f15f7f49f1	Merge pull request #12787 from fidencio/topic/runtime-rs-qemu-arm64-use-static-sandbox-resource-mgmt runtime: qemu: Enable static sandbox resource management on ARM & s390x	2026-04-09 09:18:11 +02:00
Ruoqing He	98ee385220	runtime-rs: Consolidate unsupported arch Consolidate arch we don't support at the moment, and avoid hard coding error messages per arch. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-09 04:18:50 +00:00
Ruoqing He	26ffe1223b	runtime-rs: Allow install on riscv64 platform runtime-rs works with QEMU on RISC-V platforms, let's enable installation on RISC-V. Signed-off-by: Ruoqing He <ruoqing.he@lingcage.com>	2026-04-09 04:18:50 +00:00
Fabiano Fidêncio	80b0ed273f	Merge pull request #12784 from hgowda-amd/sev-snp-tests-required Add sev-snp, qemu-snp CIs as required	2026-04-09 00:22:49 +02:00
Harshitha Gowda	bb1165b23f	tests: Set sev-snp, qemu-snp CIs as required run-k8s-tests-on-tee (sev-snp, qemu-snp) Signed-off-by: Harshitha Gowda <hgowda@amd.com>	2026-04-08 22:36:58 +02:00
Fabiano Fidêncio	2148afe243	Merge pull request #12796 from fidencio/topic/kata-deploy-run-cargo-fmt-and-cargo-check kata-deploy: Run cargo clippy during build	2026-04-08 22:32:31 +02:00
Fabiano Fidêncio	8ff630059a	Merge pull request #12778 from amd-aliem/enable-img-rootfs-snp runtime: SNP img-based rootfs with dm-verity	2026-04-08 22:06:31 +02:00
Fabiano Fidêncio	4561ae3e29	Merge pull request #12799 from fitzthum/fixup-nv-doc-1 docs: update flow for setting nvidia devices to ready	2026-04-08 21:32:55 +02:00
Tobin Feldman-Fitzthum	9119b4982c	docs: update flow for setting nvidia devices to ready Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline. Thus, we can remove the guidance for people to add it themselves when not using attestation. In fact, users don't really need to know about this flag at all. Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>	2026-04-08 18:59:51 +00:00
Fabiano Fidêncio	21466eb4e5	kata-deploy: Fix clippy warnings across crate Fix all clippy warnings triggered by -D warnings: - install.rs: remove useless .into() conversions on PathBuf values and replace vec! with an array literal where a Vec is not needed - utils/toml.rs: replace while-let-on-iterator with a for loop and drop the now-unnecessary mut on the iterator binding - main.rs: replace match-with-single-pattern with if-let in two places dealing with experimental_setup_snapshotter - utils/yaml.rs: extract repeated serde_yaml::Value::String key into a local variable, removing needless borrows on temporary values Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 20:47:59 +02:00
Fabiano Fidêncio	1874d4617b	kata-deploy: Run cargo clippy during build Ensure code formatting and compilation are verified early in the Docker build pipeline, before tests and the release build. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 20:47:59 +02:00
Amanda Liem	79f844d057	runtime: SNP img-based rootfs with dm-verity Follow-on to kata-containers/kata-containers#12396 Switch SNP config from initrd-based to image-based rootfs with dm-verity. The runtime assembles the dm-mod.create kernel cmdline from kernel_verity_params, and with kernel-hashes=on the root hash is included in the SNP launch measurement. Also add qemu-snp to the measured rootfs integration test. Signed-off-by: Amanda Liem <aliem@amd.com>	2026-04-08 16:46:32 +00:00
Greg Kurz	817580e35d	Merge pull request #12795 from fidencio/topic/kata-deploy-do-not-try-to-install-a-snapshotter-when-using-crio kata-deploy: Skip snapshotter install/uninstall on CRI-O	2026-04-08 17:18:05 +02:00
Fabiano Fidêncio	e93bfbe01a	tests: Remove qemu-coco-dev* skip from sandbox vCPU allocation test With static_sandbox_resource_mgmt calculation fixed for runtime-rs, the VM is correctly pre-sized at creation time. The vCPU allocation test no longer depends on CPU hotplug, so the qemu-coco-dev* skip is no longer needed. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	6bc2452664	tests: Remove aarch64 skip from sandbox vCPU allocation test With static_sandbox_resource_mgmt now enabled for ARM on runtime-rs, the VM is correctly pre-sized at creation time. The vCPU allocation test no longer depends on CPU hotplug, so the aarch64 skip (issue #10928) is no longer needed. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	e0141991d3	runtime-rs: Enable static sandbox resource management on s390x runtime-rs memory hotplug hard-codes the `pc-dimm` device driver, which is an x86-only QEMU device model. On s390x, the `s390-ccw-virtio` machine type does not support `pc-dimm` at all — the Go runtime handles this by using `virtio-mem-ccw` instead (controlled by the `enable_virtio_mem` config knob, defaulting to true on s390x). runtime-rs has no virtio-mem support, so any attempt to dynamically hotplug memory on s390x fails with: 'pc-dimm' is not a valid device model name This is a pre-existing limitation on main — it has never worked. It is now visible because commit 45dfb6ff252d ("runtime-rs: Fix initial vCPU / memory with static_sandbox_resource_mgmt") expanded runtime-rs test coverage, causing k8s-memory.bats and k8s-oom.bats to actually exercise this code path on s390x. Let's enforce using static_sandbox_resource_mgmt also for s390x so the VM is sized upfront at creation time, bypassing the broken dynamic hotplug path entirely. If someone decides to implement hotplug support for s390x, the work would basically be an implementation of virtio-mem-ccw support in the runtime-rs QEMU backend (boot-time device creation, qom-set based resize, and virtio-mem aware memory accounting), mirroring what the Go runtime already does, but I'm not game for this (sorry). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	ffab9b7eee	runtime: qemu: Enable static sandbox resource management on ARM runtime-rs lacks several features needed for CPU hotplug on ARM: pflash/UEFI firmware passthrough, SMP topology in -smp, nr_cpus kernel parameter, and QMP vCPU add handling for the virt machine type (which requires core-id only placement with socket/thread/die set to -1). Without static sandbox resource management, these gaps cause failures in tests like k8s-memory.bats where the VM is not correctly sized for the workload. Enable static_sandbox_resource_mgmt for aarch64 in the QEMU runtime-rs configuration so the VM is pre-sized at creation time, sidestepping the need for hotplug entirely. Together with this we're aligning the go runtime to the very same behaviour. Fixes: #10928 Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	0e5e4802d7	runtime-rs: Fix initial vCPU / memory with static_sandbox_resource_mgmt InitialSizeManager::setup_config() is responsible for applying the sandbox workload sizing (computed from containerd/CRI-O sandbox annotations) to the hypervisor configuration before VM creation. Previously, the workload vCPU count was only logged but never actually added to default_vcpus, so the VM was always created with only the base vCPUs from the configuration/annotations. This caused the k8s-sandbox-vcpus-allocation test to fail with qemu-snp-runtime-rs: a pod with default_vcpus=0.75 and a container CPU limit of 1.2 should see ceil(0.75 + 1.2) = 2 vCPUs, but only got 1. Additionally, the workload memory was being added to default_memory unconditionally, diverging from the Go runtime which only applies both CPU and memory additions when static_sandbox_resource_mgmt is enabled. In the non-static path, adding workload resources here would cause double-counting: once from setup_config() at sandbox creation, and again from update_cpu_resources()/update_mem_resources() when individual containers are added. Guard both additions behind static_sandbox_resource_mgmt, matching the Go runtime's behavior in src/runtime/pkg/oci/utils.go: if sandboxConfig.StaticResourceMgmt { sandboxConfig.HypervisorConfig.NumVCPUsF += sandboxConfig.SandboxResources.WorkloadCPUs sandboxConfig.HypervisorConfig.MemorySize += sandboxConfig.SandboxResources.WorkloadMemMB } Fixes: k8s-sandbox-vcpus-allocation test failure on qemu-snp-runtime-rs Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com> Made-with: Cursor	2026-04-08 16:36:00 +02:00
Fabiano Fidêncio	bb051bb16a	Merge pull request #12788 from fidencio/topic/kata-deploy-re-apply-GPU-specific-labels kata-deploy: re-apply labels for the GPU runtime classes	2026-04-08 16:27:59 +02:00
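
Commit b17dd2a902 in the log above describes guarding the container's execs map with a dedicated sync.RWMutex, with reads going through getExec() and writes through setExec()/deleteExec(). The following is a minimal sketch of that pattern, not the actual runtime code: the exec and container types are simplified stand-ins, and the field name execsMu is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// exec is a hypothetical stand-in for the runtime's per-exec state.
type exec struct {
	id string
}

// container keeps a lock dedicated to the execs map, so callers do not
// need to hold any wider service-level mutex to look up an exec.
type container struct {
	execsMu sync.RWMutex // assumed field name, for illustration only
	execs   map[string]*exec
}

func newContainer() *container {
	return &container{execs: make(map[string]*exec)}
}

// getExec takes the read lock; concurrent readers do not block each other.
func (c *container) getExec(id string) (*exec, error) {
	c.execsMu.RLock()
	defer c.execsMu.RUnlock()
	e, ok := c.execs[id]
	if !ok {
		return nil, fmt.Errorf("exec %q not found", id)
	}
	return e, nil
}

// setExec takes the write lock before mutating the map.
func (c *container) setExec(id string, e *exec) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	c.execs[id] = e
}

// deleteExec takes the write lock before removing an entry.
func (c *container) deleteExec(id string) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	delete(c.execs, id)
}

func main() {
	c := newContainer()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		id := fmt.Sprintf("exec-%d", i)
		wg.Add(2)
		go func() { // writer
			defer wg.Done()
			c.setExec(id, &exec{id: id})
			c.deleteExec(id)
		}()
		go func() { // reader
			defer wg.Done()
			_, _ = c.getExec(id) // a lookup may miss, but it must never race
		}()
	}
	wg.Wait()
	fmt.Println("done")
}
```

Running a program like this with `go run -race` is one way to check that the reads and writes are properly synchronized; removing the locks should make the race detector report the conflict.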


18452 Commits