This commit enables qemu-runtime-rs/clh-runtime-rs and makes it
compatible with qemu-runtime-rs and clh-runtime-rs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add comprehensive documentation for using the virtio-fs-nydus shared
filesystem with Kata Containers. This guide:
(1) Clarifies configuration options for virtio-fs-nydus and nydus image
preparation and usage.
(2) Updates daemon configuration and lifecycle management and introduces
the standalone and inline nydus architectures.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Introduce `ShareVirtioFsNydus` to enable standalone Nydus rootfs
support. This implementation acts as the bridge between runtime-rs
and the external `nydusd` daemon.
Key Capabilities:
(1) Trait Implementation: Implements `ShareFs` (for VM device/storage) and
`NydusShareFs` (for RAFS lifecycle) traits.
(2) Daemon Lifecycle Management: Handles `nydusd` spawning, supervision,
and graceful shutdown.
(3) Native Overlay Support: Configures `nydusd` with `passthrough_fs`
backend to provide native overlay (upperdir/workdir) support.
(4) API Integration: Utilizes `NydusClient` for granular control over RAFS
mount/umount operations.
(5) QEMU Integration: Enables `virtio-fs-nydus` device support,
facilitating standalone mode execution.
This implementation allows Kata Containers to utilize an external `nydusd`
process for Nydus rootfs management, providing a cleaner separation between
the runtime and the Nydus daemon lifecycle.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Refactor the `ShareFs` trait to improve modularity and support
standalone Nydus mode:
(1) Added `stop()` method to manage daemon teardown.
(2) Introduced a dedicated trait for Nydus-specific data-plane
operations.
This refactoring cleans up the `ShareFs` trait by consolidating
daemon lifecycle handling and isolating Nydus-specific extensions,
paving the way for a cleaner standalone Nydus implementation.
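The split described above could look roughly like the sketch below. It is a simplified, synchronous stand-in, not the actual runtime-rs code: only `ShareFs`, `stop()` and the idea of a dedicated Nydus trait come from this series. The `start`, `mount_rafs` and `umount_rafs` methods, the supertrait relationship, and the `FakeNydus` type are all hypothetical and exist only for illustration.

```rust
use std::io;

/// Generic shared-filesystem lifecycle (simplified, synchronous stand-in).
pub trait ShareFs {
    /// Hypothetical start hook, included so that `stop()` has a counterpart.
    fn start(&mut self) -> io::Result<()>;
    /// Tear the backing daemon down; models the `stop()` method added here.
    fn stop(&mut self) -> io::Result<()>;
}

/// Nydus-only data-plane operations, kept out of `ShareFs`.
/// Making it a supertrait of `ShareFs` is an assumption of this sketch.
pub trait NydusShareFs: ShareFs {
    fn mount_rafs(&mut self, mountpoint: &str) -> io::Result<()>;
    fn umount_rafs(&mut self, mountpoint: &str) -> io::Result<()>;
}

/// Toy implementation used only to show how the two traits compose.
#[derive(Default)]
pub struct FakeNydus {
    pub running: bool,
    pub mounts: Vec<String>,
}

impl ShareFs for FakeNydus {
    fn start(&mut self) -> io::Result<()> {
        self.running = true;
        Ok(())
    }

    fn stop(&mut self) -> io::Result<()> {
        // Stopping the daemon implicitly drops every RAFS mount it served.
        self.mounts.clear();
        self.running = false;
        Ok(())
    }
}

impl NydusShareFs for FakeNydus {
    fn mount_rafs(&mut self, mountpoint: &str) -> io::Result<()> {
        if !self.running {
            return Err(io::Error::new(io::ErrorKind::Other, "daemon not running"));
        }
        self.mounts.push(mountpoint.to_string());
        Ok(())
    }

    fn umount_rafs(&mut self, mountpoint: &str) -> io::Result<()> {
        self.mounts.retain(|m| m != mountpoint);
        Ok(())
    }
}
```

Callers that only care about lifecycle can depend on `ShareFs` alone, while Nydus-aware code asks for `NydusShareFs`.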
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Implement NydusClient to interact with the nydusd daemon via a Unix
socket:
(1) check_status: query daemon state via GET /api/v1/daemon.
(2) mount/umount: manage filesystem mounts via POST/DELETE
/api/v1/mount.
(3) wait_until_ready: poll daemon until RUNNING state.
This provides a lightweight, stateless HTTP client layer for the nydusd
API.
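A minimal sketch of the polling behaviour behind wait_until_ready follows. It is not the NydusClient code: the status query is passed in as a closure (standing in for GET /api/v1/daemon over the Unix socket), and the `max_attempts` and `interval` parameters, the return value, and the error type are assumptions made for illustration.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Poll `check_status` until it reports "RUNNING" or the attempts run out.
/// Returns the number of attempts that were needed.
pub fn wait_until_ready<F>(
    mut check_status: F,
    max_attempts: u32,
    interval: Duration,
) -> Result<u32, String>
where
    F: FnMut() -> Result<String, String>,
{
    let mut last = String::from("no attempt made");
    for attempt in 1..=max_attempts {
        match check_status() {
            Ok(state) if state == "RUNNING" => return Ok(attempt),
            // Any other state means the daemon is up but not serving yet.
            Ok(state) => last = format!("daemon state is {}", state),
            // A failed query is treated as "not ready yet", not as fatal.
            Err(e) => last = format!("status query failed: {}", e),
        }
        if attempt < max_attempts {
            sleep(interval);
        }
    }
    Err(format!(
        "nydusd not ready after {} attempts: {}",
        max_attempts, last
    ))
}
```

Keeping the transport behind a closure keeps the loop itself stateless and easy to exercise without a running daemon.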
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In standalone nydusd mode with virtio-fs passthrough, the guest-side
mkdir may fail with ENOSYS. Update the overlayfs storage handler to
skip directory creation when the directory already exists, logging a
warning instead of failing.
This ensures container rootfs setup succeeds when nydusd's native
overlay manages the directory structure.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When using virtio-fs with nydusd's passthrough_fs, mkdir operations may
return ENOSYS on certain filesystem configurations. This causes mount
destination creation to fail unexpectedly.
Handle ENOSYS errors gracefully alongside AlreadyExists by verifying the
directory exists after the failed mkdir attempt, allowing the mount to
proceed if the directory is already present.
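The fallback described above can be sketched as follows. This is an illustration rather than the actual patch: the function names are hypothetical, and the ENOSYS value is hard-coded to the Linux errno (38) to keep the example free of external crates.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Linux errno for "function not implemented" (assumed value, Linux only).
const ENOSYS: i32 = 38;

/// True when a failed mkdir may still have left a usable directory behind.
pub fn is_tolerable_mkdir_error(err: &io::Error) -> bool {
    err.kind() == io::ErrorKind::AlreadyExists || err.raw_os_error() == Some(ENOSYS)
}

/// Create `path`; tolerate AlreadyExists/ENOSYS only if the directory
/// is verifiably present after the failed attempt.
pub fn ensure_dir(path: &Path) -> io::Result<()> {
    match fs::create_dir(path) {
        Ok(()) => Ok(()),
        Err(e) if is_tolerable_mkdir_error(&e) && path.is_dir() => {
            eprintln!(
                "warning: mkdir {} failed ({}), but the directory exists; continuing",
                path.display(),
                e
            );
            Ok(())
        }
        Err(e) => Err(e),
    }
}
```

The existence check matters: an ENOSYS with no directory present is still reported as an error, so genuinely missing mount destinations are not silently ignored.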
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add "virtio-fs-nydus" as a recognized shared filesystem type in the
hypervisor configuration. This enables the standalone nydusd mode where
nydusd runs as a separate process alongside virtiofsd.
The key changes:
(1) Add VIRTIO_FS_NYDUS constant for the new shared fs type.
(2) Register virtio-fs-nydus in adjust() and validate() paths, reusing
the same virtio-fs validation logic since both use the vhost-user protocol.
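As a rough illustration of sharing one validation path between the two types: only the VIRTIO_FS_NYDUS name and the "virtio-fs-nydus" string come from this commit. The `VIRTIO_FS` constant, the function signature, and the "daemon path must be set" rule are assumptions for the sketch, not the real validation logic.

```rust
pub const VIRTIO_FS: &str = "virtio-fs";
pub const VIRTIO_FS_NYDUS: &str = "virtio-fs-nydus";

/// Validate a shared filesystem type; both vhost-user based types
/// go through the same branch.
pub fn validate_shared_fs(shared_fs: &str, virtio_fs_daemon: &str) -> Result<(), String> {
    match shared_fs {
        VIRTIO_FS | VIRTIO_FS_NYDUS => {
            // Hypothetical check: a daemon binary must be configured.
            if virtio_fs_daemon.is_empty() {
                Err(format!("{} requires a daemon path", shared_fs))
            } else {
                Ok(())
            }
        }
        other => Err(format!("unsupported shared_fs type: {}", other)),
    }
}
```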
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As independent iothreads can work with both virtio-scsi and virtio-blk
devices, this commit enables the feature for virtio-blk-pci
devices.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
1. Determine the iothread for virtio-blk devices. Only attach an
iothread when:
(1) enable_iothreads is true
(2) indep_iothreads > 0
(3) the block driver is not virtio-scsi (i.e., it is virtio-blk)
More complex cases will be handled by future enhancements.
2. Add the iothread parameter for virtio-blk devices if specified.
If iothreads are configured and passed in, set them correctly for
virtio-blk devices through the QMP device_add arguments.
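The three conditions and the optional device_add argument can be sketched like this. The function names and the argument-list representation are hypothetical; only the conditions themselves and the use of an `iothread` argument on device_add are taken from the description above.

```rust
pub const VIRTIO_SCSI: &str = "virtio-scsi";

/// Attach an iothread only when all three conditions hold.
pub fn should_attach_iothread(
    enable_iothreads: bool,
    indep_iothreads: u32,
    block_driver: &str,
) -> bool {
    enable_iothreads && indep_iothreads > 0 && block_driver != VIRTIO_SCSI
}

/// Build device_add arguments for a virtio-blk-pci device; the
/// `iothread` property is added only when an iothread id is supplied.
pub fn device_add_args(id: &str, drive: &str, iothread: Option<&str>) -> Vec<(String, String)> {
    let mut args = vec![
        ("driver".to_string(), "virtio-blk-pci".to_string()),
        ("id".to_string(), id.to_string()),
        ("drive".to_string(), drive.to_string()),
    ];
    if let Some(thread_id) = iothread {
        args.push(("iothread".to_string(), thread_id.to_string()));
    }
    args
}
```

A caller would typically pass `Some(..)` only when `should_attach_iothread` returns true, so devices keep their default behaviour otherwise.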
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add a new method that sets up independent IO threads for hot-plugged
virtio-blk devices on the QEMU command line, so that independent IO
threads work well with virtio-blk devices.
Note that ObjectIoThread already exists, so it can be reused directly
in this case.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To give users more flexibility, this feature can also be set through
annotations.
The dedicated annotation
"io.katacontainers.config.hypervisor.indep_iothreads" is introduced
for use within k8s clusters.
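A minimal sketch of how such an annotation might be read is shown below. Only the annotation key comes from this commit; the helper name, the fallback-to-default behaviour, and the error handling are assumptions, not the runtime's actual annotation code.

```rust
use std::collections::HashMap;

pub const INDEP_IOTHREADS_ANNOTATION: &str =
    "io.katacontainers.config.hypervisor.indep_iothreads";

/// Return the annotated value if present, else the configured default.
/// A value that is not an unsigned integer is reported as an error.
pub fn indep_iothreads_from_annotations(
    annotations: &HashMap<String, String>,
    default: u32,
) -> Result<u32, String> {
    match annotations.get(INDEP_IOTHREADS_ANNOTATION) {
        None => Ok(default),
        Some(raw) => raw.trim().parse::<u32>().map_err(|e| {
            format!(
                "invalid value {:?} for {}: {}",
                raw, INDEP_IOTHREADS_ANNOTATION, e
            )
        }),
    }
}
```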
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
It is useful to set indep_iothreads together with enable_iothreads
for high IO performance, so we need to provide a way for people to
set it if needed.
This commit introduces two configurable items:
- Makefile: DEFINDEPIOTHREADS, set at build time (make).
- configurations: indep_iothreads for people to set.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The 'indep_iothreads' field is introduced in Hypervisor to make the
number of independent IO threads for virtio-blk devices configurable.
When set to a value greater than 0, it creates independent IO threads
that can be attached to virtio-blk devices during hotplug.
Note that it requires 'enable_iothreads' to be true for virtio-blk
devices to use these threads.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add update_guest_filesystem_metrics() that collects disk space usage
(total/used/available) for all read-write mounted filesystems inside
the guest VM. This enables monitoring guest disk usage in kata/coco
pods through the existing GetMetrics RPC.
The output metrics look like this:
- kata_guest_filesystem_bytes{mount="/",device="vda",item="total|used|available"}
- kata_guest_filesystem_inodes{mount="/",device="vda",item="total|used|available"}
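How total/used/available might be derived from statvfs-style counters, and rendered in the label layout shown above, is sketched below. The struct, the function names, and the exact formulas are assumptions: the agent's real update_guest_filesystem_metrics() may compute or expose them differently.

```rust
/// Space figures for one filesystem, in bytes.
#[derive(Debug, PartialEq)]
pub struct FsStats {
    pub total: u64,
    pub used: u64,
    pub available: u64,
}

/// Derive byte counts from statvfs-style fields:
/// fragment size, total blocks, free blocks, blocks available to non-root.
pub fn space_from_statvfs(frsize: u64, blocks: u64, bfree: u64, bavail: u64) -> FsStats {
    FsStats {
        total: blocks.saturating_mul(frsize),
        used: blocks.saturating_sub(bfree).saturating_mul(frsize),
        available: bavail.saturating_mul(frsize),
    }
}

/// Render the three samples for one mount in Prometheus text form.
pub fn render_bytes_metric(mount: &str, device: &str, stats: &FsStats) -> Vec<String> {
    [
        ("total", stats.total),
        ("used", stats.used),
        ("available", stats.available),
    ]
    .iter()
    .map(|(item, value)| {
        format!(
            "kata_guest_filesystem_bytes{{mount=\"{}\",device=\"{}\",item=\"{}\"}} {}",
            mount, device, item, value
        )
    })
    .collect()
}
```

Note that `used` is computed from free blocks while `available` uses the non-root figure, so `used + available` can be smaller than `total` on filesystems with reserved blocks.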
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add two new GaugeVec metrics to expose guest filesystem space usage:
(1) kata_guest_filesystem_bytes{mount, device, item}: space in bytes
(total/used/available)
(2) kata_guest_filesystem_inodes{mount, device, item}: inode counts
(total/used/available)
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In #13147, for some reason a test block was added in the middle of code
and the code was stale when merged, which meant that a second
`mod test` section was added, breaking our tests. Merge the two
to fix this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
publish-kata-deploy-payload got renamed in #13107, which broke the CI.
Now, instead of tracking all those intermediate steps, let's make sure
we only track the tests themselves.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Run qemu-coco-dev-runtime-rs k8s test workflow on the zVSI
only during nightly builds.
Changes:
- Modified run-k8s-tests-on-zvsi.yaml to accept vmm as workflow
inputs instead of hardcoded matrix values
- run-k8s-tests-on-zvsi passes a conditional vmm value; 4 vmms
for nightly/dev builds and 3 vmms for all other PRs.
This ensures qemu-coco-dev-runtime-rs is only tested with nydus
snapshotter during nightly CI runs, reducing PR test time while
maintaining comprehensive nightly coverage.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add qemu-coco-dev-runtime-rs to the VMM matrix in the zVSI K8S
test workflow, configured to run only with the nydus snapshotter.
Changes:
- Add qemu-coco-dev-runtime-rs to the vmm matrix
- Exclude overlayfs + qemu-coco-dev-runtime-rs combination
- Exclude devmapper + qemu-coco-dev-runtime-rs combination
- Update CoCo-related conditional steps to include the new VMM:
* KBS environment variable setup
* kbs-client uninstall/install steps
* CoCo KBS deployment
This ensures qemu-coco-dev-runtime-rs is only tested with nydus
snapshotter, while maintaining existing test configurations for
other VMMs.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Switch kata-monitor workflows from the deprecated "active" key to
"latest" so CI resolves containerd versions from versions.yaml correctly
after the key rename.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Switch qemu-se config templates to use the TEE/CoCo-specific
static_sandbox_resource_mgmt defaults instead of the generic
QEMU defaults.
qemu-se-runtime-rs config now uses DEFSTATICRESOURCEMGMT_COCO
while runtime qemu-se config now uses DEFSTATICRESOURCEMGMT_TEE.
This aligns static sandbox resource management behavior with confidential
container expectations for qemu-se variants.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
These jobs build and push the kata-deploy OCI image, so call them
publish-kata-deploy-image-* instead of *-payload-*, matching the
kata-monitor image jobs and making the workflow easier to read.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a single-job k8s test that installs the kata-deploy helm chart
with monitor.enabled=true, pointed at the per-PR kata-monitor image
built earlier in the same run, and exercises both the rollout and the
user-visible behaviour:
* the kata-monitor DaemonSet rolls out and the pod stays up without
container restarts;
* a real kata-runtime probe pod is scheduled, then /metrics and
/sandboxes are scraped through the apiserver pod-proxy to prove
kata-monitor sees the sandbox (non-zero running-shim count plus at
least one per-sandbox kata_shim_* metric);
* after the probe pod is deleted, /metrics drops back to a zero
running-shim count.
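The running-shim check boils down to reading one sample out of the scraped /metrics text. The sketch below shows one way to do that; it is not the test's actual implementation, and the metric name used in the usage note (`kata_monitor_running_shim_count`) is an assumption about what kata-monitor exposes.

```rust
/// Return the value of the first sample whose metric name is exactly `name`.
/// Assumes Prometheus text exposition without trailing timestamps.
pub fn metric_value(exposition: &str, name: &str) -> Option<f64> {
    exposition
        .lines()
        .map(str::trim)
        // Skip blank lines and "# HELP" / "# TYPE" comments.
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .find_map(|line| {
            let rest = line.strip_prefix(name)?;
            // Reject longer metric names that merely share the prefix.
            if !(rest.starts_with(' ') || rest.starts_with('{')) {
                return None;
            }
            rest.rsplit(' ').next()?.parse::<f64>().ok()
        })
}
```

With this helper, the "sandbox is visible" step would assert that `metric_value(body, "kata_monitor_running_shim_count")` is greater than zero while the probe pod runs, and equal to zero after it is deleted.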
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Resolve the cri-tools release at install time instead of pinning a
version in versions.yaml: install_cri_tools now queries the GitHub
releases API for the absolute latest stable tag, and the kata-monitor,
cri-containerd and nydus jobs call it directly.
Also write /etc/crictl.yaml during containerd setup so crictl stops
emitting deprecation warnings about the legacy default endpoints.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Exercise the published kata-monitor container image (the one built by
publish-kata-monitor-payload-amd64) rather than the on-disk binary, so
integration regressions like the recent glibc/musl mismatch surface at
PR time. The kata-monitor-tests.sh script keeps the binary fallback for
ad-hoc local runs.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Drop the stale CRI-O matrix entry (its cri-tools pin was several
releases behind) along with the exclude that hid the containerd job,
and pin the remaining job to containerd's "active" track (currently
v2.2) via CONTAINERD_VERSION.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Add a disabled-by-default kata-monitor DaemonSet to the kata-deploy Helm chart,
including image/configuration values so operators can enable monitor shipping as
part of the same deployment workflow when needed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
kata-monitor is published as a standalone container image starting
with 3.32.0; point users at it from the metrics design doc and the
Prometheus-on-Kubernetes how-to, and switch the DaemonSet manifest to
the dedicated image (keeping the runtime endpoint/listen settings and
hostPath cleanups).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Build kata-monitor images by extracting the binary from the
shim-v2-go tarball and shipping it on top of
gcr.io/distroless/static-debian13.
Because the binary is built inside an Ubuntu (glibc) toolchain it
cannot run on a pure musl/alpine base — users hit __fprintf_chk /
__vfprintf_chk relocation errors. To get a small, distroless
runtime image we use the same pattern as
tools/packaging/kata-deploy/Dockerfile: copy the glibc libraries
the binary needs (plus the dynamic linker) via ldd from a glibc
base image.
In order to do so, we also added a helper script to build and
publish architecture-specific monitor images from tarball
artifacts.
Reported-by: Steve Linde <stevenlinde@google.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
Set runc SystemdCgroup=true when generating /etc/containerd/config.toml
during containerd installation, restoring behavior that was mistakenly
dropped.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Avoid emitting unsupported plugin keys and empty runtime options in the
v1.x config path so containerd 1.7 can load the generated TOML during
runc sanity checks.
While here, let's also dump the temporary cri-integration config on
failure to speed diagnosis.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>