Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).
Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.
Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
field index, fixing parsing with optional mount tags (shared:,
master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
Fixes: #9340
Signed-off-by: llink5 <llink5@users.noreply.github.com>
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly, as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.
For now, the TEE-based flow uses an allow-all policy. In future
work, we aim to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
To run all the tests that run in CI, we need to enable external
tests. This can be a bit tricky, so add it to our documentation.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Replace the deprecated CAA deployment with the helm one. Note that this
also installs the CAA mutating webhook, which wasn't installed before.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
An FC update caused bad requests for the runtime-rs runtime when
specifying the vcpu count and block rate limiter fields.
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Add functional tests that cover two previously untested kata-deploy
behaviors:
1. Restart resilience (regression test for #12761): deploys a
long-running kata pod, triggers a kata-deploy DaemonSet restart via
rollout restart, and verifies the kata pod survives with the same
UID and zero additional container restarts.
2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
are removed, the kata-runtime node label is cleared, /opt/kata is
gone from the host filesystem, and containerd remains healthy.
3. Artifact presence: after install, verifies /opt/kata and the shim
binary exist on the host, RuntimeClasses are created, and the node
is labeled.
Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.
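A sketch of such a helper pod (pod name, image, and the checked path are illustrative, not the test's actual manifest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-inspect
spec:
  nodeName: $NODE            # pin to the node under test
  restartPolicy: Never
  containers:
  - name: inspect
    image: busybox           # any minimal image with a shell
    command: ["sh", "-c", "ls /host/opt/kata"]
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /                # host root visible under /host
```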
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label
change or rolling update), the SIGTERM handler runs cleanup which
unconditionally removes kata artifacts and restarts containerd. This
causes containerd to lose the kata shim binary, crashing all running
kata pods on the node.
Fix this by implementing a three-stage cleanup decision:
1. If this pod's owning DaemonSet still exists (exact name match via
DAEMONSET_NAME env var), this is a pod restart — skip all cleanup.
The replacement pod will re-run install, which is idempotent.
2. If this DaemonSet is gone but other kata-deploy DaemonSets still
exist (multi-install scenario), perform instance-specific cleanup
only (snapshotters, CRI config, artifacts) but skip shared
resources (node label removal, CRI restart) to avoid disrupting
the other instances.
3. If no kata-deploy DaemonSets remain, perform full cleanup including
node label removal and CRI restart.
The Helm chart injects a DAEMONSET_NAME environment variable with the
exact DaemonSet name (including any multi-install suffix), ensuring
instance-aware lookup rather than broadly matching any DaemonSet
containing "kata-deploy".
Fixes: #12761
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Newer k3s releases (v1.34+) no longer include "k3s" in the containerd
version string at all (e.g. "containerd://2.2.2-bd1.34" instead of
"containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to
the default "containerd" runtime, configuring and restarting the system
containerd service instead of k3s's embedded containerd — leaving the
kata runtime invisible to k3s.
Fix by detecting k3s/rke2 via their systemd service names (k3s,
k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd
version string. This is more robust and works regardless of how k3s
formats its containerd version.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.
Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()
This reverts the test infrastructure part of a2216ec05.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that, in case nydus-snapshotter crashes for one reason or
another, the service is restarted.
This follows containerd's approach and avoids manual intervention on
the node.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's relax our RequiredBy and use a WantedBy in the nydus systemd unit
file, as in case of a nydus crash containerd would otherwise also be
taken down, causing the node to become NotReady.
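Combined with the restart behaviour from the previous commit, the resulting unit could look roughly like this sketch (unit, path, and option values illustrative, not the actual file):

```ini
# nydus-snapshotter.service (sketch)
[Unit]
Description=nydus snapshotter

[Service]
ExecStart=/usr/local/bin/containerd-nydus-grpc
# Restart on crash, mirroring containerd's own unit, so no manual
# intervention is needed on the node.
Restart=on-failure
RestartSec=5

[Install]
# Was RequiredBy=containerd.service: with a hard requirement, a nydus
# crash stops containerd too and the node goes NotReady. Wants keeps
# containerd up while nydus restarts.
WantedBy=containerd.service
```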
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
couldn't initialise QMP: Connection reset by peer (os error 104)
Caused by:
Connection reset by peer (os error 104)
qemu stderr: "qemu-system-ppc64: Maximum memory size 0x80000000 is not aligned to 256 MiB"
When the default max memory was assigned according to the
available host memory, it failed with the above error.
Align the memory values with the block size of 256 MiB on ppc64le.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
While attaching the tap device, it fails on ppc64le with EBADFD:
"cannot create tap device. File descriptor in bad state (os error 77)": unknown
Refactor the ioctl call to use the standard libc::TUNSETIFF constant.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
After the QEMU VM is booted, storing the guest details fails to set
capabilities, as this is not yet implemented for QEMU. This change
adds a default implementation for it.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Use the container image layer storage feature for the
k8s-nvidia-nim.bats test pod manifests. This reduces the pods'
memory requirements.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
- trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and
$PVC_STORAGE_REQUEST so that PV/PVC size can vary per test.
- confidential_common.sh: add optional size (MB) argument to
create_loop_device.
- k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and
PVC_STORAGE_REQUEST when generating storage config.
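An illustrative excerpt of the parameterized template, assuming envsubst-style substitution (object names and surrounding fields simplified, not the actual file contents):

```yaml
# trusted-storage.yaml.in (excerpt)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: trusted-pv
spec:
  capacity:
    storage: $PV_STORAGE_CAPACITY    # varies per test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: trusted-pvc
spec:
  resources:
    requests:
      storage: $PVC_STORAGE_REQUEST  # varies per test
```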
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The following differences are observed between containerd 1.x and 2.x:
```
[plugins.'io.containerd.snapshotter.v1.devmapper']
snapshotter = 'overlayfs'
```
and
```
[plugins."io.containerd.snapshotter.v1.devmapper"]
snapshotter = "overlayfs"
```
The current devmapper configuration only works with double quotes.
Make it work with both single and double quotes via tomlq.
In the default configuration for containerd 2.x, the following
configuration block is missing:
```
[[plugins.'io.containerd.transfer.v1.local'.unpack_config]]
platform = "linux/s390x" # system architecture
snapshotter = "devmapper"
```
Ensure the configuration block is added for containerd 2.x.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The govmm workflow isn't run by us, and it and the other CI files
are just legacy from when govmm was a separate repo, so let's clean up
this debt rather than having to update them frequently.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update the action to resolve the following warning in GHA:
> Node.js 20 actions are deprecated. The following actions are running
> on Node.js 20 and may not work as expected:
> actions/checkout@11bd71901b.
> Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Mermaid code makes it simple and flexible to update the pictures and
diagrams. It also removes the legacy PNG pictures, reducing the
kata-static release file size.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add all nine library crates that are missing from the workspace:
(1) kata-types with annotations, hypervisor configs, and K8s utilities.
(2) kata-sys-util with all sub-modules: cpu, device, fs, hooks, k8s,
mount, netns, numa, pcilibs, protection, spec, validate.
(3) protocols with ttrpc bindings: agent, health, remote, csi, oci,
confidential_data_hub.
(4) runtime-spec with OCI container state types and namespace constants.
(5) shim-interface with RESTful API and Unix socket path.
(6) logging with slog framework features: JSON, journal, filtering.
(7) safe-path with security-focused path resolution utilities.
(8) mem-agent with memory management: memcg, compact, psi.
(9) test-utils with privilege and KVM test macros.
Also, uniformly adopt TOCTOU in place of the redundant TOCTTOU
abbreviation.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add comprehensive hypervisor support table (Dragonball, QEMU,
Cloud Hypervisor, Firecracker, Remote). Document all runtime handlers
(VirtContainer, LinuxContainer, WasmContainer) and resource types.
List all configuration files including CoCo variants (TDX, SNP, SE).
Add shim-ctl crate to crates table for development tooling reference.
Add Feature Flags section documenting dragonball and cloud-hypervisor
options.
Simplify and restructure content for clarity while preserving technical
accuracy.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Comprehensive rewrite of docs/design/virtualization.md to improve
clarity, completeness, and usability.
This document now serves as the authoritative guide for
understanding and selecting hypervisors in Kata Containers deployments.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add 'Choose a Hypervisor', 'Hypervisor Configuration Files', and
'Hypervisor Versions' sections to virtualization.md.
Key changes:
- Integrate hypervisor comparison table from hypervisors.md
- Add configuration file reference table for both go and rust runtimes
- Add current hypervisor versions from versions.yaml:
- Cloud Hypervisor: v51.1
- Firecracker: v1.12.1
- QEMU: v10.2.1
- StratoVirt: v2.3.0
- Dragonball: builtin (part of rust runtime)
- Preserve original structure documenting each hypervisor's device model
and features
- Add reference links for all hypervisors
This consolidates hypervisor selection guidance and version information
into a single comprehensive virtualization design document.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As the document is just for CRI-O, we need to remove containerd-related
settings from it and make it clear for users.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>