kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-18 21:55:59 +00:00

Author	SHA1	Message	Date
Fabiano Fidêncio	cb652e0da1	tests: Update NVRC trace to use drop-in config mechanism Update the enable_nvrc_trace() function to use the new drop-in configuration mechanism instead of directly modifying the base configuration file. The function now creates a 90-nvrc-trace.toml drop-in file that properly combines existing kernel parameters with the nvrc.log=trace setting. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	4cb2aea9dd	kata-deploy: Document drop-in configuration and add warning to config files When kata-deploy installs Kata Containers, the base configuration files should not be modified directly. This change adds documentation explaining how to use drop-in configuration files for customization, and prepends a warning comment to all deployed configuration files reminding users to use drop-in files instead. The warning is added to both standard shim configurations and custom runtime configurations. It includes a brief explanation of how drop-in files work and points users to the documentation for more details. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	d5d561abe5	kata-deploy: Add detailed logging for drop-in configuration Add clear INFO-level messages when creating drop-in configuration files, making it easy to understand what kata-deploy is doing during installation: - "Setting up runtime directory for shim: X" - "Generating drop-in configuration files for shim: X" - "Created drop-in file: <path>" When DEBUG mode is enabled (via DEBUG=true environment variable), also log the full content of each drop-in file to aid troubleshooting. The log level is now automatically set to Debug when the DEBUG environment variable is set, ensuring debug messages are visible. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	eddd1b507e	kata-deploy: Extract common drop-in generation into shared helper Deduplicate the drop-in file generation logic between configure_shim_config and install_custom_runtime_configs by extracting it into a shared write_common_drop_ins helper function. This ensures both standard and custom runtimes use the same code path for generating drop-in configuration files. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	577aa6b319	kata-deploy: Propagate drop-in configs to custom runtime classes Ensure custom runtime classes receive the same drop-in configuration files as standard runtimes: - 10-installation-prefix.toml (if custom dest_dir) - 20-debug.toml (if debug enabled) - 30-kernel-params.toml (proxy + debug kernel params) Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	8c60a88bda	kata-deploy: Add combined kernel_params drop-in Add a combined drop-in file (30-kernel-params.toml) that handles all kernel_params modifications. This approach reads the base kernel_params from the original untouched config file and combines them with: - Proxy settings (agent.https_proxy, agent.no_proxy) - Debug settings (agent.log=debug, initcall_debug) Using a single drop-in file for kernel_params avoids the TOML merge behavior where scalar values are replaced rather than appended. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	fae96f1f82	kata-deploy: Add drop-in file for debug configuration When debug mode is enabled, generate a drop-in configuration file (20-debug.toml) with the boolean debug flags for hypervisor, runtime, and agent sections. Note: kernel_params for debug (agent.log=debug, initcall_debug) will be handled by a separate combined kernel_params drop-in file to avoid the TOML merge replacement behavior. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	bb65e516e5	kata-deploy: Add drop-in file for installation prefix When the installation prefix differs from the default /opt/kata, generate a drop-in configuration file (10-installation-prefix.toml) with the adjusted paths instead of modifying the original config file. This removes the need for adjust_installation_prefix and adjust_qemu_cmdline functions which are now deleted along with their tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Fabiano Fidêncio	cd76d61a3d	kata-deploy: Add infrastructure for per-shim drop-in configuration Instead of modifying original config files directly, set up a per-shim directory structure that uses symlinks to the original configs and config.d/ directories for drop-in overrides. This enables cleaner configuration management where the original files remain untouched and all kata-deploy customizations are in separate drop-in files that can be easily inspected and removed. Directory structure: {config_path}/runtimes/{shim}/ {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink {config_path}/runtimes/{shim}/config.d/ Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-10 18:12:17 +01:00
Paul Meyer	c5ad3f9b26	Merge pull request #12472 from katexochen/p/disable-nvdimm-cc runtime: disable nvdimm for confidential guest	2026-02-10 14:54:40 +01:00
Steve Horsman	44c86f881b	Merge pull request #12466 from ldoktor/gk-pagination tools.gatekeeper: Add support to paginate workflows	2026-02-10 11:59:57 +00:00
Steve Horsman	a8debc9841	Merge pull request #12476 from stevenhorsman/bump-rust-to-1.91 versions: Bump rust to 1.91	2026-02-10 10:03:01 +00:00
Paul Meyer	76525b97a6	runtime-rs: disable nvdimm for confidential guest nvdimm isn't supported by confidential guests, so disable it in the configuration. Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2026-02-10 08:38:41 +01:00
Paul Meyer	a5f554922c	runtime: disable nvdimm for confidential guest There is code to disable this at runtime when confidential_guest is enabled anyway[^1], but it will emit a warning every time. All the touched configuration files set confidential_guest to true, so we already know nvdimm isn't supported. [^1]: `16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)` Signed-off-by: Paul Meyer <katexochen0@gmail.com>	2026-02-10 08:38:18 +01:00
Lukáš Doktor	f7baa394d4	tools.gatekeeper: Add support to paginate workflows The number of workflows increased over 30 so we need to paginate them as well as jobs. This commit extracts the existing pagination from jobs and uses it for both jobs and workflows. Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>	2026-02-10 06:53:47 +00:00
stevenhorsman	120fde28e1	versions: Bump rust to 1.91 Following the agreed toolchain policy - bump rust to the current (1.93)-2 releases. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-10 06:52:42 +00:00
Alex Lyn	362a4c5714	runtime-rs: Fix multiqueue config propagation and tap initialization The previous implementation failed to correctly propagate the network multiqueue configuration, causing the effective queue number to remain 0. It also mixed up "queue pairs" with "queue number", so tap devices were opened without proper multiqueue initialization, which caused Cloud Hypervisor (Clh) netconfig validation to fail. This commit fixes the configuration mapping and initializes tap devices with the correct multiqueue semantics, ensuring Cloud Hypervisor receives a valid netconfig. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	79f81dae50	runtime-rs: Add network_queues for setting network device multiqueues To make network_queues configurable, a new setting is introduced in the configuration TOML. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	6723ff5c46	runtime-rs: Add configurable DEFNETQUEUES in Makefile To make the number of network queues configurable at build time, a dedicated DEFNETQUEUES variable is added. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	cfc479ef1d	kata-types: Add Network device specific annotation for network queues This commit introduces a new annotation that lets users set network queues via "io.katacontainers.config.hypervisor.network_queues". The annotation is mapped to `NetworkInfo.network_queues` within the configuration. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Alex Lyn	61e7875267	kata-types: Adjust the network_queues when load from configuration Adjusts the network queues after loading from a configuration file. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-10 11:34:25 +08:00
Manuel Huber	a6ca5c6628	ci: add editorconfig checker This adds a basic configuration for editorconfig checker. The supplied configuration checks against trailing whitespaces and issues with newlines. Example: | tools/packaging/kernel/configs/fragments/x86_64/numa.conf: | Wrong line endings or no final newline | tools/packaging/release/generate_vendor.sh: | 44: Trailing whitespace Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 15:03:26 -08:00
stevenhorsman	e6d291cf0a	trace-forwarder: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	79dc892e18	kata-ctl: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	9e1ddcdde9	agent-ctl: Bump time to 0.3.47 Bump time to remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	f840f9ad54	rust: Bump time to 0.3.47 To remediate CVE-2026-25727 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	ffcb10b6a3	agent: Bump time crate to 0.3.47 Update time to resolve CVE-2026-25727. Note: this involved bumping the versions of slog-term and slog-json and bumping the MSRV to 1.88.0 which time 0.3.47 requires. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:44:51 +01:00
stevenhorsman	33d494b07e	kata-deploy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	2ea29df99a	genpolicy: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	fa3b419965	kata-ctl: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	e49a61eea2	agent: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	bc45788356	versions: Bump bytes to 1.11.1 To remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
stevenhorsman	51d35f9261	agent-ctl: Bump bytes to 1.11.1 Remediate CVE-2026-25541 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 21:43:23 +01:00
Park.Jiyeon	082e25b297	genpolicy: skip serializing VFIO generation-only settings Skip serializing anno/value regexes and the NVIDIA VFIO device type since they are generation-time only. Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Park.Jiyeon	9231144b99	genpolicy: refactor VFIO settings and support multiple NVIDIA GPU keys - Moved VFIO-related config from "device_annotations" to a new "devices" section. - Introduced structured "nvidia" subfield for NVIDIA-specific VFIO settings. - Replaced hardcoded "nvidia.com/pgpu" with configurable "pgpu_resource_keys". - Adjusted Rego rules and code to match new config schema. Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Park.Jiyeon	5fa5d1934b	fix(genpolicy): make NVIDIA GPU resource keys configurable Allow specifying multiple NVIDIA GPU resource keys via an explicit allowlist. Keys are now configured under `device_annotations.vfio.nvidia_pgpu_resource_keys` in genpolicy-settings.json. This removes the previous hardcoded reliance on `nvidia.com/pgpu` and supports model-specific resource names. Fixes #12322 Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>	2026-02-09 11:36:34 -08:00
Manuel Huber	525192832f	tests: Clean up superfluous GPU annotation This annotation was required for GPU cold-plug before using a newer device plugin and before querying the pod resources API. As this annotation is no longer required, cleaning it up. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2026-02-09 11:28:24 -08:00
Konstantin Khlebnikov	5d99a141d9	runtime: add hypervisor options for NUMA topology With enable_numa=true the hypervisor will expose the host NUMA topology as is: map VM NUMA nodes to host nodes 1:1 and bind vCPUs to the related CPUs. Option "numa_mapping" allows redefining the NUMA node mapping: - map each VM node to a particular host node or several NUMA nodes - emulate NUMA on a host without NUMA (useful for tests) Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 20:09:25 +01:00
Fabiano Fidêncio	ab515712d4	kernel: Unify kernel and kernel-confidential Build a single kernel for both kernel and kernel-confidential on x86_64 and s390x. The kernel is built with TEE support (-x) on those arches only. This helps to simplify and maintain the code, and having a single kernel was the original plan all along. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Fabiano Fidêncio	c5b5433866	kernel: Unify nvidia-gpu and nvidia-gpu-confidential Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential, simplifying and reducing code maintenance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2026-02-09 18:28:23 +01:00
Steve Horsman	f02fa79758	Merge pull request #12470 from jirimoravcik/docs/add-os-version docs: add `OS_VERSION` to rootfs script	2026-02-09 15:06:14 +00:00
Alex Lyn	3fda59e27d	tests: rename pod_exec_with_retries to pod_exec and update callers This commit does the following: (1) Rename pod_exec_with_retries() to pod_exec(). (2) Update implementation to call container_exec(). (3) Replace all usages of pod_exec_with_retries across tests with pod_exec. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	861d39305c	tests: drop kubectl exec retries in container_exec This commit aims to drop retries when kubectl exec a container: (1) Rename container_exec_with_retries() to container_exec(). (2) Remove the retry loop and sleep backoff around kubectl exec. Keep the same logging and container-selection logic and return kubectl exec exit status directly. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	41e8acbc5e	runtime: Map empty ReadStdout/ReadStderr response to io.EOF After the kata-agent "drain-after-exit" change, stdout/stderr EOF is signaled by a successful ReadStdout/ReadStderr reply with empty Data (len==0), instead of an RPC error. However, runtime-go currently returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which violates Go io.Reader semantics and can cause `kubectl exec` to hang after the command output is already printed. To avoid exec hang: In readProcessStream(), map an empty response (len(resp.Data)==0) into (0, io.EOF). This allows the stdout/stderr copy goroutines to terminate, closes exitIOch, and unblocks the wait path so exec can complete normally. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	ffb8a6a9c3	agent: fix misleading tokio::select! biased comment in do_read_stream The previous comment incorrectly implied that `biased` prevents data loss and that the exit notifier would never be polled before all buffered data is read. Details can be found in the documentation: https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67 Tokio's `biased` only makes polling order deterministic (top-to-bottom) when multiple branches are ready in the same poll, and it makes fairness the caller's responsibility. Output can still be truncated if the exit notification becomes ready while `read_stream` is pending. This change updates the comment to reflect the actual semantics and caveats. No functional behavior change. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	1080f6d87e	agent: Introduce drain after exit mechanism to address truncation race Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode occasionally lose the last segments of their output. The root cause is a race condition where the `term_exit_notifier` triggers before the pipe buffers are fully drained. In the previous implementation, once the exit notification was received, the agent immediately returned an EOF, causing the runtime's `run_io_copy` to terminate and drop any residual data in the pipe. This patch introduces a "drain after exit" mechanism: - Upon receiving an exit notification, the agent enters a 500ms window for polling `read_stream` to flush remaining data from the buffer. - A true EOF is only returned if the stream is confirmed empty or the timeout is reached. This ensures reliable output delivery for transient exec tasks under high concurrency. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
Alex Lyn	700bddeecc	agent: treat EOF as normal for read_stdout/stderr stream Legacy IO uses shim polling via read_stdout/read_stderr. The agent previously mapped pipe EOF (read() == 0) and term_exit_notifier to errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures. This caused runtime IO copy to abort early, leading to lost stdout/stderr for short-lived exec (e.g. "echo") and spurious failures. Normalize EOF semantics: read_stream now returns Ok(empty) on EOF instead of Err("read meet eof"). This makes legacy IO behave like a proper stream: data until EOF, no INTERNAL errors for normal termination. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2026-02-09 15:56:13 +01:00
stevenhorsman	b909c41128	runtime: Bump x/net to v0.49.0 Bump x/net to resolve CVEs: - GO-2026-4441 - GO-2026-4440 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
stevenhorsman	b29312289f	versions: Bump go to 1.24.13 Bump go to 1.24.13 to fix CVE GO-2026-4337 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2026-02-09 14:49:31 +01:00
Zvonko Kaiser	7af306de13	agent: Update aarch64 create_pci_root_bus_path aarch64 is also a supported architecture for NUMA. Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>	2026-02-09 10:19:41 +01:00


18964 Commits