It becomes simple and flexible with mermaid codes to update
the pic or diagrams. And it also remove the legacy PNG pictures
to reduce the kata-statics release file size.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add all 9 library crates which are missing in workspace including:
(1) kata-types with annotations, hypervisor configs, and K8s utilities.
(2) kata-sys-util with all sub-modules: cpu, device, fs, hooks, k8s,
mount, netns, numa, pcilibs, protection, spec, validate.
(3) protocols with ttrpc bindings: agent, health, remote, csi, oci,
confidential_data_hub.
(4) runtime-spec with OCI container state types and namespace constants.
(5) shim-interface with RESTful API and Unix socket path.
(6) logging with slog framework features: JSON, journal, filtering.
(7) safe-path with security-focused path resolution utilities.
(8) mem-agent with memory management: memcg, compact, psi.
(9) test-utils with privilege and KVM test macros.
And one more thing, uniformly adopt TOCTOU in place of the redundant
TOCTTOU abbreviation.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add comprehensive hypervisor support table (Dragonball, QEMU,
Cloud Hypervisor, Firecracker, Remote). Document all runtime handlers
(VirtContainer, LinuxContainer, WasmContainer) and resource types.
List all configuration files including CoCo variants (TDX, SNP, SE).
Add shim-ctl crate to crates table for development tooling reference.
Add Feature Flags section documenting dragonball and cloud-hypervisor
options.
Simplify and restructure content for clarity while preserving technical
accuracy.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Comprehensive rewrite of docs/design/virtualization.md to improve
clarity, completeness, and usability.
This document now serves as the authoritative guide for
understanding and selecting hypervisors in Kata Containers deployments.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add 'Choose a Hypervisor', 'Hypervisor Configuration Files', and
'Hypervisor Versions' sections to virtualization.md.
Key changes:
- Integrate hypervisor comparison table from hypervisors.md
- Add configuration file reference table for both go and rust runtimes
- Add current hypervisor versions from versions.yaml:
- Cloud Hypervisor: v51.1
- Firecracker: v1.12.1
- QEMU: v10.2.1
- StratoVirt: v2.3.0
- Dragonball: builtin (part of rust runtime)
- Preserve original structure documenting each hypervisor's device model
and features
- Add reference links for all hypervisors
This consolidates hypervisor selection guidance and version information
into a single comprehensive virtualization design document.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As the document is just for CRI-O, we need remove containerd related
settings from it and make it clear for users.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As previous document of run-kata-with-k8s.md is not clear for new comers
to quickly find the way to run kata with k8s/crio. In this commit, it
just rename the document name and make it clear.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When runtime-rs becomes default runtime, everything just for runtime-rs
will be changed, and the dedicated installation for runtime-rs will
be deprecated.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In this commit:
(1) Update containerd config with kata configurations
(2) Add more comments to guide how to use containerd/kata with default
setting and customized configure setting;
(3) Update the usage of containerd cmd tool ctr with explicitly
specified runtime-config-path options to make it work.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Updated the configuration guide to use `shared_fs = "none"`. This
change reflects that `virtio-9p` is deprecated in `runtime-rs` and
recommends the blockfile snapshotter as a stable alternative to
the buggy `virtio-fs` in SEV-SNP QEMU versions.
But this's limited in the nerdctl or ctr tools.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Corrects the typo 'BUILDIN' to the standard 'BUILTIN' across the
codebase to improve code quality and documentation consistency.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This document describes the Passthrough-FD (pass-fd) technology
implemented in Kata Containers to optimize IO performance. By bypassing
the intermediate proxy layers, this technology significantly reduces
latency and CPU overhead for container IO streams.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
After agent was moved to root workspace, the products are now under the
repo root. Change the TARGET_PATH accordingly to tell Makefile where to
lookup output.
Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>
This commit adds kata agent to the root workspace, as a follow up work
of #12413.
Remove agent from exclude list, and make it as a member of root
workspace.
Signed-off-by: Jiahao Wang <jiahao.wang@lingcage.com>
Add two new configuration knobs that control the logical and physical
sector sizes advertised by virtio-blk devices to the guest:
block_device_logical_sector_size (config file)
block_device_physical_sector_size (config file)
io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation)
io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation)
The annotation names are abbreviated relative to the config file keys
because Kubernetes enforces a 63-character limit on annotation name
segments, and the full names would exceed it.
Both settings default to 0 (let QEMU decide). When set, they are passed
as logical_block_size and physical_block_size in the QMP device_add
command during block device hotplug.
Setting logical_sector_size smaller then container filesystem
block size will cause EINVAL on mount. The physical_sector_size can
always be set independently.
Values must be 0 or a power of 2 in the range [512, 65536]; other
values are rejected with an error at sandbox creation time.
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
As the AMD maintainers switched to the 2.3.0-beta.0 containerd (due to
the nydus fixes that landed there).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that kata-deploy deploys and manages nydus-for-kata-tee on all
platforms, the separate standalone nydus-snapshotter DaemonSet deployment
is no longer needed.
- Short-circuit deploy_nydus_snapshotter and cleanup_nydus_snapshotter
to no-ops with an explanatory message.
- Add qemu-snp to the workaround case so AMD SEV-SNP baremetal runners
also get USE_EXPERIMENTAL_SETUP_SNAPSHOTTER=true and kata-deploy picks
up the snapshotter setup on every run.
- Drop the x86_64 arch guard and the hypervisor sub-case from the
EXPERIMENTAL_SETUP_SNAPSHOTTER block, allowing any architecture and
hypervisor to use the kata-deploy-managed path when the flag is set.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Rename all host-visible names of the nydus-snapshotter instance managed
by kata-deploy from the generic "nydus-snapshotter" to "nydus-for-kata-tee".
This covers the systemd service name, the containerd proxy plugin key,
the runtime class snapshotter field, the data directory
(/var/lib/nydus-for-kata-tee), the socket path (/run/nydus-for-kata-tee/),
and the host install subdirectory.
The rename makes it immediately clear that this nydus-snapshotter instance
is the one deployed and managed by kata-deploy specifically for Kata TEE
use cases, rather than any general-purpose nydus-snapshotter that might
be present on the host.
Because the old code operated under a completely separate set of paths
(nydus-snapshotter.*), any previously deployed installation continues
to run without interference during the transition to this new naming.
CI pipelines and operators can upgrade kata-deploy on their own schedule
without having to coordinate an atomic cutover: the old service keeps
serving its existing workloads until it is explicitly replaced, and the
new deployment lands cleanly alongside it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The enablement of the trusted ephemeral storage for IBM SEL was
missed in #10559. Set the emptydir_mode properly for the TEE.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Fixes: #10002
Since #11537 resolves the issue, remove the skip conditions for
the k8s e2e tests involving emptyDir volume mounts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Enable VFIO device pass-through at VM creation time on Cloud Hypervisor,
in addition to the existing hot-plug path.
Signed-off-by: Roaa Sakr <romoh@microsoft.com>
SSIA, the NIM tests are breaking due to authentication issues, and those
issues are blocking other PRs.
Let's unrequire the test for now, and mark it as required again once we
fixed the auth issues.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Removing /var/lib/nydus-snapshotter during install or uninstall creates
a split-brain state: the nydus backend starts empty while containerd's
BoltDB (meta.db) still holds snapshot records from the previous run.
Any subsequent image pull then fails with:
"unable to prepare extraction snapshot:
target snapshot \"sha256:...\": already exists"
An earlier attempt cleaned up containerd's BoltDB via `ctr snapshots rm`
before wiping the directory, but that cleanup is inherently fragile:
- It requires the nydus gRPC service to be reachable at cleanup time.
If the service is stopped, crashed, or not yet running, every `ctr`
call silently fails and the stale records remain.
- Any workload still actively using a snapshot blocks the entire
cleanup, making it impossible to guarantee a clean state.
The correct invariant is that meta.db and the nydus backend always
agree. Preserving the data directory unconditionally guarantees this:
- Fresh install: data directory does not exist, nydus starts empty.
- Reinstall: existing snapshots and nydus.db are preserved, meta.db
and backend remain in sync, new binary starts cleanly.
- After uninstall: containerd is reconfigured without the nydus
proxy_plugins entry and restarted, so the snapshot records in
meta.db are completely dormant — nothing will use them. If nydus
is reinstalled later, the data directory is still present and both
sides remain in sync, so no split-brain can occur.
Any stale snapshots from previous workloads are garbage-collected by
containerd once the images referencing them are removed.
This also removes the cleanup_containerd_nydus_snapshots,
cleanup_nydus_snapshots, and cleanup_nydus_containers helpers that
were introduced by the earlier (fragile) attempt.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Now that containerd 2.3.0-beta.0 has been released, it brings fixes for
multi-snapshotters that allows us to test the baremetal machines in the
same way we test the non-baremetal ones.
Let's start doing the switch for TDX as timezone is friendlier with
Mikko.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
`cargo check` was introduced in 3f1533a to check that Cargo.lock is in sync
with Cargo.toml. However, if there are uncommitted changes in the working
tree, the current invocation will immediately fail because of the `git
diff` call, which is frustrating for local development.
As it turns out, `cargo clippy` is a superset of `cargo check`, so we can
simply pass `--locked` to `cargo clippy` to detect Cargo.lock issues.
This is tested with the following change:
diff --git a/src/agent/Cargo.lock b/src/agent/Cargo.lock
index 96b6c676d..e1963af00 100644
--- a/src/agent/Cargo.lock
+++ b/src/agent/Cargo.lock
@@ -4305,6 +4305,7 @@ checksum = "8f50febec83f5ee1df3015341d8bd429f2d1cc62bcba7ea2076759d315084683"
name = "test-utils"
version = "0.1.0"
dependencies = [
- "libc",
"nix 0.26.4",
]
which results in the following output:
$ make -C src/agent check
make: Entering directory '/kata-containers/src/agent'
standard rust check...
cargo fmt -- --check
cargo clippy --all-targets --all-features --release --locked \
-- \
-D warnings
error: the lock file /kata-containers/src/agent/Cargo.lock needs to be updated but --locked was passed to prevent this
If you want to try to generate the lock file without accessing the network, remove the --locked flag and use --offline instead.
make: *** [../../utils.mk:184: standard_rust_check] Error 101
make: Leaving directory '/kata-containers/src/agent'
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>