tests/functional/vfio-ap/run.sh:
- Source tests/common.bash so the schema helpers are available.
- configure_containerd_for_runtime_rs: write kata-qemu-runtime-rs
configuration via a conf.d drop-in. Schema >= 3 uses
io.containerd.cri.v1.runtime; schema 2 uses io.containerd.grpc.v1.cri.
The sandboxer field is emitted only for schema >= 3.
tests/integration/nerdctl/gha-run.sh:
- Fix "containerd config default" pipe: propagate PATH so the newly
installed binary is found, suppress stdout, and call
ensure_containerd_conf_d_rootful_api_sockets.
tests/integration/kubernetes/gha-run.sh:
- Fix jq filter for devmapper snapshotter (.version // 0 >= 3).
- Add ensure_containerd_conf_d_rootful_api_sockets after config setup.
tests/gha-run-k8s-common.sh:
- Remove the redundant "containerd config default | sed" override;
overwrite_containerd_config (called via check_containerd_config_for_kata)
now handles SystemdCgroup and all other containerd config setup.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Adapt create_containerd_config to work with containerd 2.x while
keeping compatibility with v1.x for completeness:
- Drop the direct config.toml patching in favour of conf.d fragments:
use containerd_render_config_default_with_imports to generate the
base config, then write separate drop-ins for API socket overrides,
debug settings, and the Kata runtime.
- Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection).
- Detect cfg_schema via _containerd_blob_schema_version to select the
right plugin table:
schema >= 3 -> io.containerd.cri.v1.runtime
schema 2 -> io.containerd.grpc.v1.cri
and to emit the sandboxer field only on schema >= 3.
- Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable
set by export_go_toolchain_for_containerd_source_builds is preserved
during the containerd source build.
The require_containerd_binary_default_schema_v3_plus call is kept: the
test explicitly clones and builds containerd 2.x from source, so a
schema v2 binary should never appear here.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Configure containerd for nydus differently depending on the active
config schema, because only containerd 2.x honours conf.d drop-in
fragments in the way this setup relies on.
config_containerd now delegates to _containerd_resolved_schema_version
(from common.bash) to detect the active schema and passes it to
config_containerd_core, which emits schema-appropriate config:
schema >= 3 (containerd v2.x):
Keep the base config and add a conf.d drop-in fragment using the
io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and
io.containerd.cri.v1.images to select nydus as the snapshotter.
schema 2 (containerd v1.x):
conf.d is not honoured the same way, so replace config.toml
wholesale with a complete, self-contained file using the
io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and
no sandboxer field.
The [proxy_plugins] block is written in both cases as it is
schema-version agnostic.
Teardown restores the whole config.toml (schema v2 path) or removes the
drop-in fragment (schema v3+ path) as appropriate.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Rewrite overwrite_containerd_config so that it works with containerd
v1.x (schema v2) as well as containerd v2.x (schema v3+):
- Always regenerate /etc/containerd/config.toml from the installed
binary via "sudo containerd config default".
- Call ensure_containerd_conf_d_rootful_api_sockets after regenerating
the base config.
- Detect the effective schema via _containerd_resolved_schema_version.
- Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime
plugin path with sandboxer = podsandbox into a conf.d drop-in.
- Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin
path without sandboxer into the drop-in.
check_containerd_config_for_kata no longer appends a schema guard;
the function supports both schema generations intentionally.
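The schema-dependent drop-in described above can be sketched roughly as
follows. This is an illustration only, not the code in this commit:
emit_runtime_dropin and its three arguments are hypothetical, and the
leading "version" line in the fragment is an assumption.

```shell
# Hypothetical helper: print a runtime drop-in fragment for a given schema.
#   $1 = schema version (integer)
#   $2 = runtime handler name
#   $3 = runtime_type value
emit_runtime_dropin() {
    local schema="$1" name="$2" runtime_type="$3"
    # Schema 2 (containerd v1.x) plugin table by default.
    local plugin="io.containerd.grpc.v1.cri"
    if [ "${schema}" -ge 3 ]; then
        # Schema >= 3 (containerd v2.x) plugin table.
        plugin="io.containerd.cri.v1.runtime"
    fi
    # Assumption: the fragment declares the same schema as the base config.
    printf 'version = %s\n\n' "${schema}"
    printf "[plugins.'%s'.containerd.runtimes.%s]\n" "${plugin}" "${name}"
    printf "  runtime_type = '%s'\n" "${runtime_type}"
    # The sandboxer field is emitted only for schema >= 3.
    if [ "${schema}" -ge 3 ]; then
        printf "  sandboxer = 'podsandbox'\n"
    fi
}
```

For example, "emit_runtime_dropin 3 kata io.containerd.kata.v2" prints a
fragment with the io.containerd.cri.v1.runtime table and a sandboxer line,
while schema 2 prints the io.containerd.grpc.v1.cri table without it.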
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Three issues prevented containerd 2.x from working correctly after
installation:
1. Socket uid/gid mismatch: "containerd config default" was run as the
unprivileged user, which produced uid = <runner-uid> in the API
socket stanza instead of uid = 0. Run it under sudo so the generated
defaults carry uid = 0.
2. Stale systemd unit: the CI runner ships a pre-installed containerd
whose unit file is left in place after the binary is replaced by the
test installer. The old unit causes "MigrateConfigTo: index out of
range" panics when the new binary tries to load a schema v4 config.
Always overwrite the unit file from the template so the running
binary and the unit file stay in sync.
3. Schema guard removed: install_cri_containerd installs whatever
version was requested (v1.7 or v2.3) and must not abort on a valid
schema v2 binary.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Introduce helper functions used by later commits to make containerd
configuration schema-aware.
_containerd_blob_schema_version():
Parse the version = <n> line from a containerd config blob and echo
the integer.
_containerd_resolved_schema_version():
Run "containerd config default" and return the schema version of the
active binary. Drives conditional logic in overwrite_containerd_config
and other helpers.
containerd_emit_rootful_api_socket_overrides():
Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets.
Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped
tables.
require_containerd_config_schema_v3_plus() /
require_containerd_binary_default_schema_v3_plus():
Guard helpers that abort with a clear message when the installed
containerd is older than v2.x. Used only in test paths that
explicitly build containerd 2.x from source.
containerd_render_config_default_with_imports():
Write a fresh "containerd config default" to a file and ensure the
conf.d import glob is present, ready for drop-in fragments.
export_go_toolchain_for_containerd_source_builds():
Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the
exact toolchain in its go.mod without changing the global Go version.
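As an illustration of the version-line parsing described for
_containerd_blob_schema_version, a minimal sketch could look like the
following. blob_schema_version is a hypothetical stand-in; the real helper
in tests/common.bash may differ.

```shell
# Hypothetical stand-in for the schema-version parser.
# Reads a containerd config blob on stdin and prints the integer from the
# first "version = <n>" line. Prints nothing if no such line is found.
blob_schema_version() {
    sed -n 's/^[[:space:]]*version[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p' \
        | head -n 1
}

# Possible use (requires containerd on PATH, so left as a comment):
#   schema="$(containerd config default | blob_schema_version)"
```

Callers can then branch on the printed integer, for example with
[ "${schema}" -ge 3 ].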
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
When restart_systemd_service_with_no_burst_limit fails or times out
waiting for the containerd socket, emit "journalctl -xeu
containerd.service" output so the failure reason is visible in CI logs
without requiring a separate log-collection step.
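A rough sketch of the wait-then-dump pattern, assuming a polling loop:
wait_for_socket_or_dump and its arguments are hypothetical, and the actual
restart_systemd_service_with_no_burst_limit logic may be structured
differently.

```shell
# Hypothetical helper, not the project's implementation.
#   $1    = socket path to wait for
#   $2    = timeout in seconds
#   $3... = diagnostic command to run on timeout
wait_for_socket_or_dump() {
    local sock="$1" timeout="$2"
    shift 2
    local waited=0
    while [ ! -S "${sock}" ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "timed out after ${timeout}s waiting for ${sock}" >&2
            # Print the diagnostics inline so they land in the CI log.
            "$@" >&2 || true
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Possible use (left as a comment because it needs systemd and sudo):
#   wait_for_socket_or_dump /run/containerd/containerd.sock 30 \
#       sudo journalctl -xeu containerd.service --no-pager
```

Passing the diagnostic command as arguments keeps the helper usable for
services other than containerd.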
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Bump the containerd version used by CI from v1.7.25 to v2.3.0.
Rename the version-range fields in versions.yaml and throughout the
GitHub Actions workflows from lts/active/version/sandbox_api to
minimum/latest to make their meaning self-evident:
minimum: "v1.7" # oldest containerd branch under test
latest: "v2.3" # newest containerd branch under test
Drop the bare version field (superseded by the matrix) and the
sandbox_api alias (covered by latest). Update all containerd_version
matrix entries in the workflow files accordingly, and update
gha-run-k8s-common.sh to resolve the new key names.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
The dragonball nerdctl CI job can race when creating and attaching the
runtime process to the sandbox cgroup, surfacing an os error 17
(AlreadyExists) during shim task creation.
Let's retry add_proc once on this pre-existing cgroup condition so
startup remains robust.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
Low-CPU sandboxes can take longer than a few seconds to complete guest
boot and start the agent.
Let's clamp the reconnect timeout to a safe minimum so sandbox startup
does not fail early with transient vsock ECONNRESET.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
When static sandbox resource management is enabled, CRI CPU/memory
sizing may live only in sandbox annotations and be missing from the OCI
spec.
Let's fill missing sizing fields from annotations before applying static
VM sizing so runtime-rs follows the expected Kubernetes behavior for
constrained pods.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
Add top-level runtime-rs Makefile options `DEFSANDBOXCGROUP_ONLY` and
`DEFSTATICRESOURCEMGMT`, both defaulting to true, and use them for the
runtime defaults that previously disabled these paths.
This aligns runtime-rs defaults with static sandbox resource management,
which sizes sandbox memory up front instead of relying on memory hotplug,
helping avoid architecture-specific hotplug limitations.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add an RFC document describing the composable image architecture that
replaces monolithic guest rootfs images with a lean base image plus
purpose-specific addon images cold-plugged as virtio-blk devices.
The proposal covers the runtime configuration (extra_images), host-side
cold-plugging, guest-side mounting via systemd and dm-verity, agent-side
dynamic path resolution, the image build pipeline, and the security
model.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Enable the hard-coded init-data policy test gate for qemu-tdx-runtime-rs
so runtime-rs and Go TDX variants exercise the same Kubernetes policy
coverage.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Default custom runtime RuntimeClass overhead.podFixed to the selected
baseConfig values, so equivalent runtimes behave consistently without
repeating boilerplate.
In case the user wants to enforce that no overhead is set on the custom
RuntimeClass, disable inheritance with inheritBaseOverhead=false.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For containerd v2.2+, the flow assumes that the imports directive is
present. Check for it and add it if it does not exist.
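A minimal sketch of such a check, under stated assumptions:
ensure_imports_directive is a hypothetical name, the glob is passed in by
the caller, and an existing imports key (even an empty one) is left
untouched here.

```shell
# Hypothetical helper: make sure an "imports" key exists in a containerd
# config file. $1 = config file, $2 = import glob.
ensure_imports_directive() {
    local cfg="$1" glob="$2"
    if grep -qE '^[[:space:]]*imports[[:space:]]*=' "${cfg}"; then
        return 0    # already present; this sketch does not rewrite it
    fi
    local tmp
    tmp="$(mktemp)"
    # Top-level TOML keys must precede the first table header, so the key
    # is prepended rather than appended.
    { printf "imports = ['%s']\n" "${glob}"; cat "${cfg}"; } > "${tmp}"
    # Overwrite in place to keep the original file's owner and mode.
    cat "${tmp}" > "${cfg}"
    rm -f "${tmp}"
}
```

On a real host the config file is usually root-owned, so the write step
would need to run with the appropriate privileges.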
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The CoCo initdata tests for signature verification and authenticated
registry never worked on qemu-tdx, so they have been disabled.
Add them back now that all necessary fixes are in place.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
The initdata tests set the kernel arguments to "", which resets the
kernel arguments configured by the Helm install. However, the TDX
runner depends on the agent.https_proxy= kernel argument to pull
images.
For the initdata tests to work on TDX, the same proxy needs to
be added to the CDH configuration via image.image_pull_proxy.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
No need to patch yamls locally. Also, set RUST_LOG=debug
and enable https_proxy for all TDX targets when the runner
has HTTPS_PROXY set.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Add osv-scanner ignores for GO-2025-3426 (CVE-2025-0750) and
GO-2025-3897 (CVE-2025-4437), which are false positives for
kata-containers.
The vulnerabilities have been open for 10 and 16 months and there
is no indication that the cri-o community has any intention of addressing
the situation. They also only affect the main CRI-O runtime code (log
management and user creation functions), but kata-containers only
imports github.com/cri-o/cri-o/pkg/annotations for string constant
definitions. The vulnerable code paths are not imported or used,
therefore we should just filter these out.
GO-2025-3426: Path traversal in UnMountPodLogs/LinkContainerLogs
GO-2025-3897: Memory exhaustion when reading /etc/passwd
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
Migrate trace-forwarder from the deprecated opentelemetry-jaeger
exporter to the modern opentelemetry-otlp exporter.
This change remediates GHSA-2f9f-gq7v-9h6m (CVE-2026-43868), a
medium-severity vulnerability in Apache Thrift. The opentelemetry-jaeger
crate is no longer maintained and depends on vulnerable thrift versions
(0.13.0 and 0.16.0). The opentelemetry-otlp exporter does not use thrift
and is actively maintained.
Changes:
- Replace opentelemetry-jaeger with opentelemetry-otlp in Cargo.toml
- Update tracer.rs to use OTLP exporter instead of Jaeger exporter
- Replace --jaeger-host/--jaeger-port flags with --otlp-endpoint flag
- Update server.rs to use TracerProvider instead of SpanExporter
- Update documentation to reflect OTLP migration
- Add examples for common OTLP-compatible collectors
Breaking change: Users must update their trace-forwarder invocations
to use --otlp-endpoint instead of --jaeger-host and --jaeger-port.
Default endpoint: http://localhost:4317 (OTLP gRPC)
Generated-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The kata-monitor test is currently failing and runs a very EoL
version of cri-o. This area is being actively reworked in #13107,
so remove it for now. Once the kata-monitor tests are stable, we
can re-add the new versions.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When CreateContainer fails before the runtime instance is registered
(e.g. a hypervisor/cgroup error), no sandbox exists to drive the normal
teardown. containerd's follow-up Shutdown RPC then reaches
get_runtime_instance(), fails with "runtime not ready", and returns
before the service loop is ever told to stop. Because the shim ignores
SIGTERM, the containerd-shim-kata-v2 daemon is left running and orphaned.
Make the Shutdown RPC force the daemon to exit when there is no runtime
instance, emitting the same Action::Shutdown that sandbox.shutdown()
sends on the normal path. This guarantees the shim process is reaped
after a failed create instead of leaking.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Bump the go version to resolve CVEs:
- GO-2026-5037
- GO-2026-5038
- GO-2026-5039
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
k8s-sandbox-vcpus-allocation.bats was disabled for qemu-tdx due to
errors when moving to the "upstream" TDX KVM code. The failing test
is the vcpus-less-than-one-with-no-limits pod, which ends up getting
the x86 default MaxCPU = 240 and erroring:
Number of hotpluggable cpus requested (240) exceeds the maximum cpus supported by KVM (224)
TDX max vcpus is capped to the host's logical CPUs, so 240 is too much.
With the maxcpus logic fixed (maxcpus is not set at all) for configurations
where confidential guest is enabled, qemu-tdx can be enabled for
k8s-sandbox-vcpus-allocation.bats again.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
QEMU maxcpus enables CPU hotplug capabilities, but it is unused when
confidential guest is enabled.
Change the Go runtime code to skip setting maxcpus on the QEMU cmdline if
CPU hotplug is not needed.
Commit 07db945b09 built a relationship between kernel's cmdline nr_cpus and
the maxcpus config. Now that maxcpus is dropped for confidential guests, drop
nr_cpus from kernel commandline too. This hopefully helps with the reference
values computation too.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
QEMU maxcpus enables CPU hotplug capabilities, but it is unused when
confidential guest is enabled.
Change the runtime-rs code to skip setting maxcpus on the QEMU cmdline if
CPU hotplug is not needed.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>