Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs
support. This implementation acts as the bridge between runtime-rs
and the external `nydusd` daemon.
Key Capabilities:
(1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and
`NydusShareFs` (for RAFS lifecycle) traits.
(2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision,
and graceful shutdown.
(3) Native Overlay Support: Configures `nydusd` with the `passthrough_fs`
backend to provide native overlay (upperdir/workdir) support.
(4) API Integration: Utilizes `NydusClient` for granular control over RAFS
mount/umount operations.
(5) QEMU Integration: Enables `virtio-fs-nydus` device support,
facilitating standalone mode execution.
This implementation allows Kata Containers to utilize an external `nydusd`
process for Nydus rootfs management, providing a cleaner separation between
the runtime and the Nydus daemon lifecycle.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Refactor the `ShareFs` trait to improve modularity and support
standalone Nydus mode:
(1) Added `stop()` method to manage daemon teardown.
(2) Introduced a dedicated trait for Nydus-specific data-plane
operations.
This refactoring cleans up the `ShareFs` trait by consolidating
daemon lifecycle handling and isolating Nydus-specific extensions,
paving the way for a cleaner standalone Nydus implementation.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Implement NydusClient to interact with the nydusd daemon via a Unix
socket:
(1) check_status: query daemon state via GET /api/v1/daemon.
(2) mount/umount: manage filesystem mounts via POST/DELETE
/api/v1/mount.
(3) wait_until_ready: poll the daemon until it reports the RUNNING state.
This provides a lightweight, stateless HTTP client layer for the nydusd
API.
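The polling behaviour of wait_until_ready can be sketched in shell. This is
only an illustration, not the Rust client itself: the function names, the
example socket path and the assumption that the daemon's JSON reply carries a
"state" field set to "RUNNING" are all hypothetical.

```shell
# Sketch only: names, socket path and JSON shape are assumptions.

# Query the daemon over its Unix socket (GET /api/v1/daemon).
nydusd_status() {
    curl --silent --fail --unix-socket "$1" http://localhost/api/v1/daemon
}

# Poll a status command until its output reports RUNNING, or give up.
# Usage: wait_until_ready <attempts> <delay-seconds> <status-command...>
wait_until_ready() {
    local attempts="$1" delay="$2" i=0 out
    shift 2
    while [ "${i}" -lt "${attempts}" ]; do
        if out="$("$@" 2>/dev/null)" &&
            printf '%s' "${out}" | grep -q '"state"[[:space:]]*:[[:space:]]*"RUNNING"'; then
            return 0
        fi
        sleep "${delay}"
        i=$((i + 1))
    done
    return 1
}
```

A caller might run something like
`wait_until_ready 50 0.1 nydusd_status /run/nydusd/api.sock`; passing the
status command as an argument keeps the loop easy to exercise without a
daemon.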
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In standalone nydusd mode with virtio-fs passthrough, the guest-side
mkdir may fail with ENOSYS. Update the overlayfs storage handler to
skip directory creation when the directory already exists, logging a
warning instead of failing.
This ensures container rootfs setup succeeds when nydusd's native
overlay manages the directory structure.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When using virtio-fs with nydusd's passthrough_fs, mkdir operations may
return ENOSYS on certain filesystem configurations. This causes mount
destination creation to fail unexpectedly.
Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the
directory exists after the failed mkdir attempt, allowing the mount to
proceed if the directory is already present.
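The "verify after a failed mkdir" idea can be sketched in shell. This is a
loose analogue rather than the actual handler: the shell cannot inspect the
errno, so this hypothetical `ensure_dir` helper treats every mkdir failure
alike, whereas the change described above only tolerates ENOSYS and
AlreadyExists.

```shell
# Sketch only: ensure_dir is a hypothetical helper, not project code.
# Create a directory, tolerating a failed mkdir when the directory is
# already present (for example, created by another layer of the stack).
ensure_dir() {
    local dir="$1"
    if mkdir "${dir}" 2>/dev/null; then
        return 0
    fi
    # mkdir failed: accept the failure only if the directory exists.
    if [ -d "${dir}" ]; then
        echo "warning: mkdir ${dir} failed but the directory exists" >&2
        return 0
    fi
    echo "error: cannot create ${dir}" >&2
    return 1
}
```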
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add "virtio-fs-nydus" as a recognized shared filesystem type in the
hypervisor configuration. This enables the standalone nydusd mode where
nydusd runs as a separate process alongside virtiofsd.
The key changes:
(1) Add VIRTIO_FS_NYDUS constant for the new shared fs type.
(2) Register virtio-fs-nydus in the adjust() and validate() paths, reusing
the same virtio-fs validation logic, since both use the vhost-user protocol.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In #13147, a test block was added in the middle of the code, and the
branch was stale when merged, so the file ended up with a second
`mod test` section, breaking our tests. Merge the two sections
to fix this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
publish-kata-deploy-payload got renamed in #13107, which broke the CI.
Now, instead of tracking all those intermediate steps, let's make sure
we only track the tests themselves.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Switch kata-monitor workflows from the deprecated "active" key to
"latest" so CI resolves containerd versions from versions.yaml correctly
after the key rename.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Switch qemu-se config templates to use the TEE/CoCo-specific
static_sandbox_resource_mgmt defaults instead of the generic
QEMU defaults.
qemu-se-runtime-rs config now uses DEFSTATICRESOURCEMGMT_COCO
while runtime qemu-se config now uses DEFSTATICRESOURCEMGMT_TEE.
This aligns static sandbox resource management behavior with confidential
container expectations for qemu-se variants.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
These jobs build and push the kata-deploy OCI image, so call them
publish-kata-deploy-image-* instead of *-payload-*, matching the
kata-monitor image jobs and making the workflow easier to read.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a single-job k8s test that installs the kata-deploy helm chart
with monitor.enabled=true, pointed at the per-PR kata-monitor image
built earlier in the same run, and exercises both the rollout and the
user-visible behaviour:
* the kata-monitor DaemonSet rolls out and the pod stays up without
container restarts;
* a real kata-runtime probe pod is scheduled, then /metrics and
/sandboxes are scraped through the apiserver pod-proxy to prove
kata-monitor sees the sandbox (non-zero running-shim count plus at
least one per-sandbox kata_shim_* metric);
* after the probe pod is deleted, /metrics drops back to a zero
running-shim count.
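The scraping step can be sketched as two small helpers. The helper names are
hypothetical, and the metric name `kata_monitor_running_shim_count` and port
8090 used in the usage note are assumptions about kata-monitor's defaults
rather than facts taken from this change.

```shell
# Sketch only: helper names and the metric name are assumptions.

# Build the apiserver pod-proxy path for an endpoint served by a pod.
pod_proxy_path() {
    local ns="$1" pod="$2" port="$3" endpoint="$4"
    echo "/api/v1/namespaces/${ns}/pods/${pod}:${port}/proxy/${endpoint}"
}

# Extract the running-shim count from Prometheus text read on stdin.
# Fails if the metric is absent.
running_shim_count() {
    awk '$1 == "kata_monitor_running_shim_count" { print int($2); found = 1 }
         END { if (!found) exit 1 }'
}
```

With a pod name in hand, the check might look like
`kubectl get --raw "$(pod_proxy_path kube-system <pod> 8090 metrics)" | running_shim_count`.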
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Resolve the cri-tools release at install time instead of pinning a
version in versions.yaml: install_cri_tools now queries the GitHub
releases API for the absolute latest stable tag, and the kata-monitor,
cri-containerd and nydus jobs call it directly.
Also write /etc/crictl.yaml during containerd setup so crictl stops
emitting deprecation warnings about the legacy default endpoints.
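A minimal sketch of both steps follows. The function names are hypothetical
and the default socket path is an assumption; the real install_cri_tools may
parse the API response differently (for example with jq).

```shell
# Sketch only: function names and the default socket path are assumptions.

# Extract "tag_name" from a GitHub releases API JSON document on stdin.
release_tag_from_json() {
    sed -n 's/.*"tag_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Write a crictl config that names the endpoints explicitly, so crictl
# does not fall back to probing its legacy default endpoints.
write_crictl_config() {
    local path="$1" sock="${2:-unix:///run/containerd/containerd.sock}"
    cat >"${path}" <<EOF
runtime-endpoint: ${sock}
image-endpoint: ${sock}
EOF
}
```

The tag could then be resolved with something like
`curl -fsSL https://api.github.com/repos/kubernetes-sigs/cri-tools/releases/latest | release_tag_from_json`,
since the `releases/latest` endpoint skips drafts and pre-releases.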
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Exercise the published kata-monitor container image (the one built by
publish-kata-monitor-payload-amd64) rather than the on-disk binary, so
integration regressions like the recent glibc/musl mismatch surface at
PR time. The kata-monitor-tests.sh script keeps the binary fallback for
ad-hoc local runs.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Drop the stale CRI-O matrix entry (its cri-tools pin was several
releases behind) along with the exclude that hid the containerd job,
and pin the remaining job to containerd's "active" track (currently
v2.2) via CONTAINERD_VERSION.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart,
including image/configuration values so operators can enable monitor shipping as
part of the same deployment workflow when needed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
kata-monitor is published as a standalone container image starting
with 3.32.0; point users at it from the metrics design doc and the
Prometheus-on-Kubernetes how-to, and switch the DaemonSet manifest to
the dedicated image (keeping the runtime endpoint/listen settings and
hostPath cleanups).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Build kata-monitor images by extracting the binary from the
shim-v2-go tarball and shipping it on top of
gcr.io/distroless/static-debian13.
Because the binary is built inside an Ubuntu (glibc) toolchain, it
cannot run on a pure musl/alpine base: users hit __fprintf_chk /
__vfprintf_chk relocation errors. To get a small, distroless
runtime image we use the same pattern as
tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries
the binary needs (plus the dynamic linker) via ldd from a glibc
base image.
In order to do so, we also added a helper script to build and
publish architecture-specific monitor images from tarball
artifacts.
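The ldd-based collection step can be sketched as follows. Both helper names
are hypothetical and the staging layout is an assumption; the real Dockerfile
may gather the libraries differently.

```shell
# Sketch only: helper names and staging layout are assumptions.

# Print the absolute library paths found in ldd output read on stdin,
# including the dynamic linker. Virtual entries such as linux-vdso have
# no path and are skipped.
ldd_paths() {
    awk '{ for (i = 1; i <= NF; i++)
               if ($i ~ /^\//) { if (!seen[$i]++) print $i; break } }'
}

# Copy every library a binary needs into a staging root, preserving paths.
stage_libs() {
    local binary="$1" root="$2" lib
    ldd "${binary}" | ldd_paths | while read -r lib; do
        mkdir -p "${root}$(dirname "${lib}")"
        cp -L "${lib}" "${root}${lib}"
    done
}
```

The staged tree can then be copied into the distroless image alongside the
binary.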
Reported-by: Steve Linde <stevenlinde@google.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Set runc SystemdCgroup=true when generating /etc/containerd/config.toml
during containerd installation, restoring behavior that was mistakenly
dropped.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Avoid emitting unsupported plugin keys and empty runtime options in the
v1.x config path so containerd 1.7 can load the generated TOML during
runc sanity checks.
While here, let's also dump the temporary cri-integration config on
failure to speed up diagnosis.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As pointed out in kata-containers/kata-containers#12961, the
k8s-number-cpus retry loop could fail all retried assertions and
still pass.
k8s-number-cpus retried until the guest reported three CPUs, but
the post-loop result was never checked. Bash suppresses errexit for
the equality test before && break, so the test could exhaust retries
and still pass.
The current kata-qemu handler sizes vCPUs from fractional container
quotas: two 500m limits produce one workload vCPU, then the default
vCPU is added and rounded once. Expect two CPUs and assert the final
retry result so the test fails if the count never converges.
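The pitfall and the fix can be illustrated with a small sketch. The
`wait_for_value` helper is hypothetical, and the vCPU arithmetic assumes a
default of one vCPU and rounding up to whole CPUs.

```shell
set -e

# Sketch only: wait_for_value is a hypothetical helper.
# Retry a check, then assert the final result. Without the post-loop
# test, exhausting the retries would fall through as success, because
# errexit is suppressed for the left-hand side of "&&".
# Usage: wait_for_value <expected> <retries> <delay> <command...>
wait_for_value() {
    local expected="$1" retries="$2" delay="$3" actual="" i=0
    shift 3
    while [ "${i}" -lt "${retries}" ]; do
        actual="$("$@")" || actual=""
        [ "${actual}" = "${expected}" ] && break
        sleep "${delay}"
        i=$((i + 1))
    done
    # The assertion the original loop was missing.
    [ "${actual}" = "${expected}" ]
}

# Two 500m limits sum to 1000m, i.e. one workload vCPU, plus one
# default vCPU (assumed default of 1, rounding up to whole CPUs).
expected_cpus=$(( (500 + 500 + 999) / 1000 + 1 ))
```

Because the function's last command is the equality test, a count that never
converges makes the function return non-zero and the test fail.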
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Make the chart pass --log-level debug automatically when debug=true so
CI and troubleshooting runs emit full rendered config dumps without
requiring a separate log-level override.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Allow operators to force kata-deploy log verbosity and emit the fully
rendered containerd/CRI-O config and drop-in files in debug mode so
install troubleshooting can rely on exact effective configuration.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
The containerd_version matrix values were renamed from lts/active to
minimum/latest, which changes the generated CI job names. Update the
required-tests list so the gatekeeper waits on the checks that are
actually produced.
The amd64 run-containerd-stability, run-nydus, run-cri-containerd and
free-runner run-k8s-tests jobs map lts -> minimum and active -> latest.
The s390x cri-containerd job maps active -> latest, matching its
updated matrix.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
tests/functional/vfio-ap/run.sh:
- Source tests/common.bash so the schema helpers are available.
- configure_containerd_for_runtime_rs: write kata-qemu-runtime-rs
configuration via a conf.d drop-in. Schema >= 3 uses
io.containerd.cri.v1.runtime; schema 2 uses io.containerd.grpc.v1.cri.
The sandboxer field is emitted only for schema >= 3.
tests/integration/nerdctl/gha-run.sh:
- Fix "containerd config default" pipe: propagate PATH so the newly
installed binary is found, suppress stdout, and call
ensure_containerd_conf_d_rootful_api_sockets.
tests/integration/kubernetes/gha-run.sh:
- Fix jq filter for devmapper snapshotter (.version // 0 >= 3).
- Add ensure_containerd_conf_d_rootful_api_sockets after config setup.
tests/gha-run-k8s-common.sh:
- Remove the redundant "containerd config default | sed" override;
overwrite_containerd_config (called via check_containerd_config_for_kata)
now handles SystemdCgroup and all other containerd config setup.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Adapt create_containerd_config to work with containerd 2.x while
keeping compatibility with v1.x for completeness:
- Drop the direct config.toml patching in favour of conf.d fragments:
use containerd_render_config_default_with_imports to generate the
base config, then write separate drop-ins for API socket overrides,
debug settings, and the Kata runtime.
- Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection).
- Detect cfg_schema via _containerd_blob_schema_version to select the
right plugin table:
schema >= 3 -> io.containerd.cri.v1.runtime
schema 2 -> io.containerd.grpc.v1.cri
and to emit the sandboxer field only on schema >= 3.
- Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable
set by export_go_toolchain_for_containerd_source_builds is preserved
during the containerd source build.
The require_containerd_binary_default_schema_v3_plus call is kept: the
test explicitly clones and builds containerd 2.x from source, so a
schema v2 binary should never appear here.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Configure containerd for nydus differently depending on the active
config schema, because containerd v1.x does not honour conf.d drop-in
fragments the way containerd 2.x does.
config_containerd now delegates to _containerd_resolved_schema_version
(from common.bash) to detect the active schema and passes it to
config_containerd_core, which emits schema-appropriate config:
schema >= 3 (containerd v2.x):
Keep the base config and add a conf.d drop-in fragment using the
io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and
io.containerd.cri.v1.images to select nydus as the snapshotter.
schema 2 (containerd v1.x):
conf.d is not honoured the same way, so replace config.toml
wholesale with a complete, self-contained file using the
io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and
no sandboxer field.
The [proxy_plugins] block is written in both cases as it is
schema-version agnostic.
Teardown restores the whole config.toml (schema v2 path) or removes the
drop-in fragment (schema v3+ path) as appropriate.
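The schema split for the snapshotter selection can be sketched as below. The
function name and the proxy plugin socket address are assumptions, only the
snapshotter-related tables are shown, and the schema 2 output is therefore
not the complete, self-contained file the change writes.

```shell
# Sketch only: function name and socket address are assumptions.
# Emit the nydus snapshotter selection for a given config schema version.
emit_nydus_snapshotter_config() {
    local schema="$1"
    if [ "${schema}" -ge 3 ]; then
        # containerd v2.x: image settings live in their own plugin.
        cat <<'EOF'
[plugins.'io.containerd.cri.v1.images']
  snapshotter = 'nydus'
EOF
    else
        # containerd v1.x: the snapshotter is set on the CRI plugin.
        cat <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "nydus"
EOF
    fi
    # The proxy plugin block is the same for both schema generations.
    cat <<'EOF'

[proxy_plugins.nydus]
  type = "snapshot"
  address = "/run/containerd-nydus/containerd-nydus-grpc.sock"
EOF
}
```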
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Rewrite overwrite_containerd_config so that it works with containerd
v1.x (schema v2) as well as containerd v2.x (schema v3+):
- Always regenerate /etc/containerd/config.toml from the installed
binary via "sudo containerd config default".
- Call ensure_containerd_conf_d_rootful_api_sockets after regenerating
the base config.
- Detect the effective schema via _containerd_resolved_schema_version.
- Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime
plugin path with sandboxer = podsandbox into a conf.d drop-in.
- Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin
path without sandboxer into the drop-in.
check_containerd_config_for_kata no longer appends a schema guard;
the function supports both schema generations intentionally.
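A minimal sketch of the schema-dependent drop-in follows. The function names
and the `kata` handler name are hypothetical, and the real
overwrite_containerd_config writes more settings than shown here.

```shell
# Sketch only: function and handler names are assumptions.

# Print the CRI runtime plugin table name for a config schema version.
cri_runtime_plugin() {
    if [ "$1" -ge 3 ]; then
        echo "io.containerd.cri.v1.runtime"
    else
        echo "io.containerd.grpc.v1.cri"
    fi
}

# Emit a drop-in fragment registering a Kata runtime handler.
emit_kata_runtime_dropin() {
    local schema="$1" handler="${2:-kata}" plugin
    plugin="$(cri_runtime_plugin "${schema}")"
    echo "[plugins.'${plugin}'.containerd.runtimes.${handler}]"
    echo "  runtime_type = 'io.containerd.kata.v2'"
    # The sandboxer field is emitted only for schema >= 3.
    if [ "${schema}" -ge 3 ]; then
        echo "  sandboxer = 'podsandbox'"
    fi
}
```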
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Three issues prevented containerd 2.x from working correctly after
installation:
1. Socket uid/gid mismatch: "containerd config default" was run as the
unprivileged user, which produced uid = <runner-uid> in the API
socket stanza instead of uid = 0. Run it under sudo so the default
output is owned by root.
2. Stale systemd unit: the CI runner ships a pre-installed containerd
whose unit file is left in place after the binary is replaced by the
test installer. The old unit causes "MigrateConfigTo: index out of
range" panics when the new binary tries to load a schema v4 config.
Always overwrite the unit file from the template so the running
binary and the unit file stay in sync.
3. Overly strict schema guard: install_cri_containerd installs whatever
version was requested (v1.7 or v2.3), so it must not abort on a valid
schema v2 binary. Remove the guard.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Introduce helper functions used by later commits to make containerd
configuration schema-aware.
_containerd_blob_schema_version():
Parse the version = <n> line from a containerd config blob and echo
the integer.
_containerd_resolved_schema_version():
Run "containerd config default" and return the schema version of the
active binary. Drives conditional logic in overwrite_containerd_config
and other helpers.
containerd_emit_rootful_api_socket_overrides():
Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets.
Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped
tables.
require_containerd_config_schema_v3_plus() /
require_containerd_binary_default_schema_v3_plus():
Guard helpers that abort with a clear message when the installed
containerd is older than v2.x. Used only in test paths that
explicitly build containerd 2.x from source.
containerd_render_config_default_with_imports():
Write a fresh "containerd config default" to a file and ensure the
conf.d import glob is present, ready for drop-in fragments.
export_go_toolchain_for_containerd_source_builds():
Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the
exact toolchain in its go.mod without changing the global Go version.
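As an illustration of the version-parsing helper, a sketch under a different,
hypothetical name is shown below. It reads the blob from stdin, which may not
match how _containerd_blob_schema_version receives its input.

```shell
# Sketch only: config_blob_schema_version is a hypothetical stand-in.
# Echo the integer from the first "version = <n>" line of a containerd
# config blob read on stdin; fail if no such line is found.
config_blob_schema_version() {
    awk -F= '/^[[:space:]]*version[[:space:]]*=/ {
                 gsub(/[^0-9]/, "", $2); print $2; found = 1; exit
             }
             END { if (!found) exit 1 }'
}
```

Resolving the schema of the installed binary then amounts to something like
`containerd config default | config_blob_schema_version`.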
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
When restart_systemd_service_with_no_burst_limit fails or times out
waiting for the containerd socket, emit "journalctl -xeu
containerd.service" output so the failure reason is visible in CI logs
without requiring a separate log-collection step.
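The "dump logs on timeout" pattern can be sketched as below. The
`wait_for_socket` helper is hypothetical and is not the body of
restart_systemd_service_with_no_burst_limit.

```shell
# Sketch only: wait_for_socket is a hypothetical helper.
# Wait for a Unix socket to appear; on timeout, run a diagnostics
# command so the failure reason lands in the same log.
# Usage: wait_for_socket <path> <timeout-seconds> <diagnostics-command...>
wait_for_socket() {
    local sock="$1" timeout="$2" waited=0
    shift 2
    while [ ! -S "${sock}" ]; do
        if [ "${waited}" -ge "${timeout}" ]; then
            echo "timed out waiting for ${sock}" >&2
            # Diagnostics are best effort and must not mask the failure.
            "$@" >&2 || true
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

A call might look like
`wait_for_socket /run/containerd/containerd.sock 30 sudo journalctl -xeu containerd.service --no-pager`.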
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Bump the containerd version used by CI from v1.7.25 to v2.3.0.
Rename the version-range fields in versions.yaml and throughout the
GitHub Actions workflows from lts/active/version/sandbox_api to
minimum/latest to make their meaning self-evident:
minimum: "v1.7" # oldest containerd branch under test
latest: "v2.3" # newest containerd branch under test
Drop the bare version field (superseded by the matrix) and the
sandbox_api alias (covered by latest). Update all containerd_version
matrix entries in the workflow files accordingly, and update
gha-run-k8s-common.sh to resolve the new key names.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
The dragonball nerdctl CI job can race when creating and attaching the
runtime process to the sandbox cgroup, surfacing an os error 17
(AlreadyExists) during shim task creation.
Let's retry add_proc once on this pre-existing cgroup condition so
startup remains robust.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
Low-CPU sandboxes can take longer than a few seconds to complete guest
boot and start the agent.
Let's clamp the reconnect timeout to a safe minimum so sandbox startup
does not fail early with transient vsock ECONNRESET.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>
When static sandbox resource management is enabled, CRI CPU/memory
sizing may live only in sandbox annotations and be missing from the OCI
spec.
Let's fill missing sizing fields from annotations before applying static
VM sizing so runtime-rs follows the expected Kubernetes behavior for
constrained pods.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex <codex@openai.com>