Update the enable_nvrc_trace() function to use the new drop-in
configuration mechanism instead of directly modifying the base
configuration file. The function now creates a 90-nvrc-trace.toml
drop-in file that properly combines existing kernel parameters
with the nvrc.log=trace setting.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When kata-deploy installs Kata Containers, the base configuration files
should not be modified directly. This change adds documentation explaining
how to use drop-in configuration files for customization, and prepends a
warning comment to all deployed configuration files reminding users to use
drop-in files instead.
The warning is added to both standard shim configurations and custom
runtime configurations. It includes a brief explanation of how drop-in
files work and points users to the documentation for more details.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add clear INFO-level messages when creating drop-in configuration
files, making it easy to understand what kata-deploy is doing during
installation:
- "Setting up runtime directory for shim: X"
- "Generating drop-in configuration files for shim: X"
- "Created drop-in file: <path>"
When DEBUG mode is enabled (via DEBUG=true environment variable),
also log the full content of each drop-in file to aid troubleshooting.
The log level is now automatically set to Debug when the DEBUG
environment variable is set, ensuring debug messages are visible.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Deduplicate the drop-in file generation logic between configure_shim_config
and install_custom_runtime_configs by extracting it into a shared
write_common_drop_ins helper function.
This ensures both standard and custom runtimes use the same code path
for generating drop-in configuration files.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a combined drop-in file (30-kernel-params.toml) that handles all
kernel_params modifications. This approach reads the base kernel_params
from the original untouched config file and combines them with:
- Proxy settings (agent.https_proxy, agent.no_proxy)
- Debug settings (agent.log=debug, initcall_debug)
Using a single drop-in file for kernel_params avoids the TOML merge
behavior where scalar values are replaced rather than appended.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When debug mode is enabled, generate a drop-in configuration file
(20-debug.toml) with the boolean debug flags for hypervisor, runtime,
and agent sections.
Note: kernel_params for debug (agent.log=debug, initcall_debug) will
be handled by a separate combined kernel_params drop-in file to avoid
the TOML merge replacement behavior.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When the installation prefix differs from the default /opt/kata,
generate a drop-in configuration file (10-installation-prefix.toml)
with the adjusted paths instead of modifying the original config file.
This removes the need for adjust_installation_prefix and
adjust_qemu_cmdline functions which are now deleted along with
their tests.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Instead of modifying original config files directly, set up a per-shim
directory structure that uses symlinks to the original configs and
config.d/ directories for drop-in overrides.
This enables cleaner configuration management where the original files
remain untouched and all kata-deploy customizations are in separate
drop-in files that can be easily inspected and removed.
Directory structure:
{config_path}/runtimes/{shim}/
{config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink
{config_path}/runtimes/{shim}/config.d/
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There is code to disable this at runtime when confidential_guest
is enabled anyway[^1], but it will omit a warning every time. All
the touched configuration files set confidential_guest to true,
so we already know nvdimm isn't supported.
[^1]: 16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
The number of workflows increased over 30 so we need to paginate them as
well as jobs. This commit extracts the existing pagination from jobs and
uses it for both jobs and workflows.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The previous implementation failed to correctly propagate the network
multiqueue configuration, causing the effective queue number to remain
0.
It also mixed up "queue pairs" with "queue number", so tap devices were
opened without proper multiqueue initialization which causes Clh
netconfig validation failed.
This commit fixes the configuration mapping and initializes tap devices
with the correct multiqueue semantics, ensuring Cloud Hypervisor
receives a valid netconfig.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To make build with a configurable item of network queues, a dedicated
variable of DEFNETQUEUES is added.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit introduces a new annotation for users to easily set network
queues via "io.katacontainers.config.hypervisor.network_queues".
And the annotation will be mapped into `NetworkInfo.network_queues`
within the configuration.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This adds a basic configuration for editorconfig checker. The
supplied configuration checks against trailing whitespaces and
issues with newlines.
Example:
| tools/packaging/kernel/configs/fragments/x86_64/numa.conf:
| Wrong line endings or no final newline
| tools/packaging/release/generate_vendor.sh:
| 44: Trailing whitespace
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Update time to resolve CVE-2026-25727.
Note: this involved bumping the versions of slog-term and slog-json
and bumping the MSRV to 1.88.0 which time 0.3.47 requires.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Skip serializing anno/value regexes and the NVIDIA VFIO device type since they
are generation-time only.
Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
- Moved VFIO-related config from "device_annotations" to a new "devices" section.
- Introduced structured "nvidia" subfield for NVIDIA-specific VFIO settings.
- Replaced hardcoded "nvidia.com/pgpu" with configurable "pgpu_resource_keys".
- Adjusted Rego rules and code to match new config schema.
Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
Allow specifying multiple NVIDIA GPU resource keys via an explicit allowlist.
Keys are now configured under `device_annotations.vfio.nvidia_pgpu_resource_keys`
in genpolicy-settings.json. This removes the previous hardcoded reliance on
`nvidia.com/pgpu` and supports model-specific resource names.
Fixes#12322
Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
This annotation was required for GPU cold-plug before using a
newer device plugin and before querying the pod resources API.
As this annotation is no longer required, cleaning it up.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
With enable_numa=true hypervisor will expose host NUMA topology as is:
map vm NUMA nodes to host 1:1 and bind vpus to relates CPUS.
Option "numa_mapping" allows to redefine NUMA nodes mapping:
- map each vm node to particular host node or several numa nodes
- emulate numa on host without numa (useful for tests)
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
Build a single kernel for both kernel and kernel-confidential on x86_64
and s390x. The kernel is built with TEE support (-x) on those arches only.
This helps to simplilfy and to maintain the code, and having a single
kernel was the original plan since forever.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential,
simplifying and reducing code maintenance.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It will do following works in this commit:
(1) Rename pod_exec_with_retries() to pod_exec().
(2) Update implementation to call container_exec().
(3) Replace all usages of pod_exec_with_retries across tests
with pod_exec.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit aims to drop retries when kubectl exec a container:
(1) Rename container_exec_with_retries() to container_exec().
(2) Remove the retry loop and sleep backoff around kubectl exec.
Keep the same logging and container-selection logic and return
kubectl exec exit status directly.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
After the kata-agent "drain-after-exit" change, stdout/stderr EOF is
signaled by a successful ReadStdout/ReadStderr reply with empty Data
(len==0), instead of an RPC error. However, runtime-go currently
returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which
violates Go io.Reader semantics and can cause `kubectl exec` to
hang after the command output is already printed.
To avoid exec hang:
In readProcessStream(), map an empty response (len(resp.Data)==0)
into (0, io.EOF). This allows the stdout/stderr copy goroutines to
terminate, closes exitIOch, and unblocks the wait path so exec can
complete normally.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The previous comment incorrectly implied that `biased` prevents data
loss and the exit notifier would never be polled before all buffered
data is read. And the detailed info can be seen from the document:
https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67
Tokio's `biased` only makes polling order deterministic(top-to-bottom)
when multiple branches are ready in the same poll, and it makes fairness
the caller's responsibility. Output can still be truncated if the exit
notification becomes ready while `read_stream` is pending.
This change updates the comment to reflect the actual semantics and
caveats. No functional behavior change.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode
occasionally lose the last segments of their output.
The root cause is a race condition where the `term_exit_notifier`
triggers before the pipe buffers are fully drained. In the previous
implementation, once the exit notification was received, the agent
immediately returned an EOF, causing the runtime's `run_io_copy` to
terminate and drop any residual data in the pipe.
This patch introduces a "drain after exit" mechanism:
- Upon receiving an exit notification, the agent enters a 500ms window
for polling `read_streaim` to flush remaining data from the buffer.
- A true EOF is only returned if the stream is confirmed empty or the
timeout is reached.
This ensures reliable output delivery for transient exec tasks under
high concurrency.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Legacy IO uses shim polling via read_stdout/read_stderr. The agent
previously mapped pipe EOF (read() == 0) and term_exit_notifier to
errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures.
This caused runtime IO copy to abort early, leading to lost
stdout/stderr for short-lived exec (e.g."echo") and spurious failures.
Normalize EOF semantics: read_stream now returns Ok(empty) on EOF
instead of Err("read meet eof").
This makes legacy IO behave like a proper stream: data until EOF, no
INTERNAL errors for normal termination.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>