The ci-coco-stability.yaml workflow has its weekly schedule
commented out with a note that the workload is not maintained.
Remove the entire chain: ci-coco-stability.yaml, ci-weekly.yaml,
run-kata-coco-stability-tests.yaml, and the kubernetes stability
test scripts that were only used through this path.
The local containerd stability tests (tests/stability/gha-run.sh)
remain as they are actively used by basic-ci workflows.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This manifest is not referenced by any .bats test file and
is effectively dead code.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The tests/integration/stdio/ directory has a gha-run.sh script
but no workflow in .github/workflows/ references it, so these
tests never run in CI.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This job in ci.yaml has been unconditionally disabled (if: false)
with no tracking issue or path to re-enablement.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-tracing job in basic-ci-amd64.yaml has been disabled
(if: false) due to issue #9763, with no path to re-enablement.
Remove the job definition and the backing
tests/functional/tracing/ directory.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-vfio job in basic-ci-amd64.yaml has been disabled
(if: false) due to issues #9764, #9851, and #9940, with no
path to re-enablement. Remove the job definition and the
backing tests/functional/vfio/ directory.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reusable workflow (workflow_call) has no caller anywhere in
the repository, making it dead code.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The run-metrics.yaml workflow is a reusable workflow_call with no
caller in the repository, making it effectively dead code. Remove
the workflow, the entire tests/metrics/ directory (~586 files
including vendored Go for checkmetrics), and the "metrics"
self-hosted runner label from actionlint.yaml.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Allowing arbitrary symlinks in the shared directory is unsafe for
confidential VM use cases. To make CopyFile safe for both the VM and the
consuming containers, we implement the following rules for symlinks (in
addition to the existing rules for other files):
1. Symlinks may not be placed directly into the shared directory.
2. Symlinks must not point 'upwards', i.e. contain `..` as a path
element.
3. Symlinks must be relative.
These rules ensure that all writes initiated by CopyFile are restricted
to the shared directory (protecting the VM), and that symlinks can't
point outside their mount points (protecting the container).
These new restrictions mean that we can't support arbitrary mount
sources (which might not follow these rules), but the usual k8s suspects
(ConfigMap, Secret, ServiceAccountToken) should still pass.
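A minimal sketch of how these rules could be checked, using a hypothetical
helper (not the actual agent code, which performs the checks inside the
CopyFile path):

```rust
use std::path::{Component, Path};

/// Hypothetical validation of a symlink request against the rules above.
/// `rel_path` is the link's location relative to the shared directory,
/// `target` is the data the link should point to.
fn validate_symlink(rel_path: &Path, target: &Path) -> Result<(), String> {
    // Rule 1: symlinks may not be placed directly into the shared directory,
    // i.e. the relative path must have at least one parent component.
    if rel_path.parent().map_or(true, |p| p.as_os_str().is_empty()) {
        return Err("symlink placed directly in the shared directory".into());
    }
    // Rule 3: the target must be relative.
    if target.is_absolute() {
        return Err("symlink target must be relative".into());
    }
    // Rule 2: the target must not point upwards ('..' components).
    if target.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err("symlink target must not contain '..'".into());
    }
    Ok(())
}
```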
In order to aid writing the policy, we convert the CopyFileRequest to a
structure that does not contain binary data, but well-defined strings
and types.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The agent referred to the `data` field of an incoming CopyFileRequest
as the 'src'. This is misleading, because 'source' is not mentioned
in the specification (where links are just a path with attached
bytes), and because the documentation for the `ln` utility calls the
path LINK_NAME and the data TARGET. This commit fixes the glitch and
calls the first argument to `symlinkat` the target.
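For illustration only, with std's symlink wrapper the naming lines up like
this (a sketch, not the agent's actual code):

```rust
use std::os::unix::fs::symlink;

// `target` is the data carried by the CopyFileRequest; `link_name` is the
// path at which the link is created, matching ln's TARGET/LINK_NAME naming.
fn create_link(target: &str, link_name: &str) -> std::io::Result<()> {
    symlink(target, link_name)
}
```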
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Building the kata-agent-policy crate only succeeded when its parents
(agent and genpolicy) pulled in the required features. This commit adds
the required features to the crate itself, such that it can be built
standalone and IDEs don't show errors while browsing it.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We should not ship configurations that we do not actively test.
This commit drops the following from the kata-deploy helm chart:
values.yaml:
- arm64 from supportedArches for the clh shim
- arm64 from supportedArches for the cloud-hypervisor shim
- arm64 from supportedArches for the dragonball shim
- arm64 from supportedArches for the fc shim
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- the entire qemu-cca shim definition
try-kata-tee.values.yaml:
- CCA from the file description comment
- qemu-cca from the TEE shims list comment
- the entire qemu-cca shim definition
- arm64: qemu-cca from the defaultShim mapping, replaced with
arm64: qemu-coco-dev-runtime-rs (which is tested)
try-kata-nvidia-gpu.values.yaml:
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- arm64: qemu-nvidia-gpu from the defaultShim mapping
Once arm64 and qemu-cca support are properly tested, they can be
re-added.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
This commit prunes the documentation tree by removing files
that are either no longer relevant to the current architecture
or have been superseded by newer guides.
Specifically, it removes the doc Intel-Discrete-GPU-passthrough-and-Kata.md
and updates the using-Intel-QAT-and-kata.md index entry in nav.yaml.
Refining the documentation helps ensure that new contributors
find accurate and up-to-date information.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
- Restructure document with clearer sections and better readability
- Add configuration format examples for both runtimes
- Add technical details including data flow and implementation references
- Add debugging section for troubleshooting
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
"cloud-hypervisor" is also a runtime-rs hypervisor. So we need to include it in the settings selection logic.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Port the Go runtime's enable_vcpus_pinning feature to runtime-rs.
The Go runtime already lets users pin each vCPU thread to a specific
host CPU when the vCPU count matches the sandbox cpuset size, using
sched_setaffinity. This is useful for latency-sensitive workloads that
benefit from eliminating cross-CPU migration of vCPU threads.
The approach mirrors the Go implementation:
After VM start and on every container add/update/delete, we fetch the
vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of
all containers' OCI cpusets, and if the two counts match, pin vCPU i to
cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset
all threads back to the full cpuset so nothing gets stuck on a single
core.
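A rough sketch of that decision, assuming the nix crate's sched_setaffinity
bindings (names are illustrative, not the actual runtime-rs code):

```rust
use nix::sched::{sched_setaffinity, CpuSet};
use nix::unistd::Pid;

/// Pin vCPU threads 1:1 to the union of the containers' cpusets, or reset
/// every thread to the full cpuset when the counts no longer match.
fn apply_vcpu_pinning(vcpu_tids: &[i32], cpuset: &[usize]) -> nix::Result<()> {
    if vcpu_tids.len() == cpuset.len() {
        // Counts match: vCPU i goes to cpuset[i].
        for (tid, &cpu) in vcpu_tids.iter().zip(cpuset) {
            let mut set = CpuSet::new();
            set.set(cpu)?;
            sched_setaffinity(Pid::from_raw(*tid), &set)?;
        }
    } else {
        // Counts diverged (hotplug, container removal, ...): allow every
        // vCPU thread to run on any CPU of the full cpuset.
        let mut set = CpuSet::new();
        for &cpu in cpuset {
            set.set(cpu)?;
        }
        for tid in vcpu_tids {
            sched_setaffinity(Pid::from_raw(*tid), &set)?;
        }
    }
    Ok(())
}
```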
The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups,
which already runs at exactly the right points in the lifecycle. The
enable_vcpus_pinning flag flows from the TOML config through
CgroupConfig into the cgroup resource layer, and can also be overridden
per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning
annotation.
The QEMU config templates default to false. The NV GPU configs will get
their own default (true) in a follow-up once those templates are added.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The cloud-hypervisor feature has been fully functional for some time
now: it's enabled by default in virt_container, used by agent-ctl,
and exercised in CI. Drop the stale comments referencing issue #6264
and promote the feature to a default.
Fixes: #6264
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
* get_rootless_symlink_sandbox_path() would be called without first checking
is_rootless(), meaning cleanup() would ALWAYS fail (see the error below), even
though the shim/CH would NOT leak thanks to containerd's recovery routine.
* Cleanup wouldn't be idempotent (in case the CRI issues multiple shutdown requests).
This was fixed by introducing remove_dir_all_if_exists().
Apr 17 17:53:21 containerd[4078033]: time="2026-04-17T17:53:21.821624475-05:00" level=error msg="failed to shutdown shim task and the shim might be leaked" error="Others(\"failed to handle message handler TaskRequest\\n\\nCaused by:\\n 0: do shutdown\\n 1: do the clean up\\n 2: delete hypervisor\\n 3: No such file or directory (os error 2)\\n\\nStack backtrace:\\n 0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from\\n 1: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::cleanup::{{closure}}\\n 2: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::cleanup::{{closure}}\\n 3: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::shutdown::{{closure}}\\n 4: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}::{{closure}}\\n 5: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}\\n 6: <service::task_service::TaskService as containerd_shim_protos::shim::shim_ttrpc_async::Task>::shutdown::{{closure}}\\n 7: <containerd_shim_protos::shim::shim_ttrpc_async::ShutdownMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}\\n 8: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}\\n 9: <core::future::poll_fn::PollFn<F> as core::future::future::Future>::poll\\n 10: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}\\n 11: tokio::runtime::task::core::Core<T,S>::poll\\n 12: tokio::runtime::task::harness::Harness<T,S>::poll\\n 13: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\\n 14: tokio::runtime::scheduler::multi_thread::worker::Context::run\\n 15: tokio::runtime::context::scoped::Scoped<T>::set\\n 16: tokio::runtime::context::runtime::enter_runtime\\n 17: tokio::runtime::scheduler::multi_thread::worker::run\\n 18: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll\\n 19: tokio::runtime::task::core::Core<T,S>::poll\\n 20: tokio::runtime::task::harness::Harness<T,S>::poll\\n 21: tokio::runtime::blocking::pool::Inner::run\\n 22: std::sys::backtrace::__rust_begin_short_backtrace\\n 23: core::ops::function::FnOnce::call_once{{vtable.shim}}\\n 24: std::sys::thread::unix::Thread::new::thread_start\\n 25: <unknown>\\n 26: <unknown>\")" id=fca6a162b8f0ed7ef2b33cd99b6f1b58124e85c5489c193ceac487db0e4acdde
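For reference, a sketch of what remove_dir_all_if_exists() could look like
(assumed shape based on the description above, not necessarily the exact
implementation):

```rust
use std::io;
use std::path::Path;

/// Remove a directory tree, treating "already gone" as success so repeated
/// cleanup calls stay idempotent.
fn remove_dir_all_if_exists(path: impl AsRef<Path>) -> io::Result<()> {
    match std::fs::remove_dir_all(path) {
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        other => other,
    }
}
```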
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This serializes CH API calls to avoid a race condition where deleting a pod
would hang indefinitely and leak both the shim and CH processes.
The race happened because the CRI can send multiple shutdown requests for the
same pod, but the CH socket wasn't guarded against concurrent usage, so HTTP
responses could interleave (see below) on the shutdown path, leading to an
error.
This would repro in <15 iterations (sometimes 2-3) using a 2-container pod.
With this commit, I haven't observed a repro in 200+ iterations.
Fixes: #12858
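A minimal sketch of the serialization idea, guarding the API socket with an
async mutex so one request/response round trip completes before the next
begins (illustrative types, not the actual shim code):

```rust
use std::sync::Arc;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::UnixStream;
use tokio::sync::Mutex;

/// Cloud Hypervisor API client whose socket is protected by a mutex, so
/// concurrent callers cannot interleave their HTTP requests and responses.
#[derive(Clone)]
struct ChApiClient {
    socket: Arc<Mutex<UnixStream>>,
}

impl ChApiClient {
    async fn request(&self, raw_http: &[u8]) -> std::io::Result<Vec<u8>> {
        // Holding the lock for the full request/response round trip is what
        // prevents two shutdown paths from racing on the same socket.
        let mut sock = self.socket.lock().await;
        sock.write_all(raw_http).await?;
        // Simplified: a single read for brevity; a real client would parse
        // the HTTP headers and read until the full body is received.
        let mut response = vec![0u8; 8192];
        let n = sock.read(&mut response).await?;
        response.truncate(n);
        Ok(response)
    }
}
```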
ORIGINAL REPRO:
while true; do
kubectl apply -f busybox.yaml
kubectl wait --for=condition=ready po busybox
kubectl exec busybox -- echo foo
kubectl delete po busybox
done
ORIGINAL ERROR:
Apr 17 20:15:54 kata[2297383]: Failed to stop process, process = ContainerProcess { container_id: ContainerID { container_id: "d4eb8984d630111bbf808c7ea30b7a21274c0193cdb8d501d20e4f26a0a69151" }, exec_id: "", process_type: Container }, err = failed to update_mem_resource
Caused by:
0: resize memory
1: get vminfo
2: failed to serde {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packages":1},"kvm_hyperv":false,"max_phys_bits":46,"affinity":null,"features":{"amx":false},"nested":null},"memory":{"size":2147483648,"mergeable":false,"hotplug_method":"Acpi","hotplug_size":132024107008,"hotplugged_size":null,"shared":true,"hugepages":false,"hugepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","initramfs":null},"rate_limit_groups":null,"disks":[{"path":"/usr/share/kata-containers/kata-containers.img","readonly":true,"direct":false,"iommu":false,"num_queues":1,"queue_size":128,"vhost_user":false,"vhost_socket":null,"rate_limit_group":null,"rate_limiter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":null,"mtu":null,"iommu":false,"num_queues":2,"queue_size":256,"vhost_user":false,"vhost_socket":null,"vhost_mode":"Client","id":"_net1","fds":[-1],"rate_limiter_config":null,"pci_segment":0,"offload_tso":true,"offload_ufo":true,"offload_csum":true}],"rng":{"src":"/dev/urandom","iommu":false},"balloon":null,"fs":[{"tag":"kataShared","socket":"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/root/virtiofsd.sock","num_queues":1,"queue_size":1024,"id":"_fs2","pci_segment":0}],"pmem":null:"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/ch-vm.sock","iommu":false,"id":"_vsock3","pci_segment":0},"pvpanic":false,"iommu":false,"numa":null,"watchdog":false,"pci_segments":null,"platform":null,"tpm":null,"landlock_enabl"index":0,"base":3891789824,"size":524288,"type_":"Mmio32","prefetchable":false}}],"parent":null,"children":["_disk0"],"pci_bdf":"0000:00:01.0"},"_virtio-pci-_vsock3":{"id":"_virtio-pci-_vsock3","resources":[{"PciBar":{"index":0,"base":70367622201344,"sizee":false}}],"parent":null,"children":["_fs2"],"pci_bdf":"0000:00:04.0"},"_vsock3":{"id":"_vsock3","resources":[],"parent":"_virtio-pci-_vsock3","children":[],"pci_bdf":null},"_net1":{"id":"_net1","resources":[],"parent":"_virtio-pci-_net1","children":[],"presources":[{"PciBar":{"index":0,"base":70367623774208,"size":524288,"type_":"Mmio64","prefetchable":false}}],"parent":null,"children":["_net1"],"pci_bdf":"0000:00:02.0"},"_virtio-pci-__rng":{"id":"_virtio-pci-__rng","resources":[{"PciBar":{"index":0,"baseesources":[],"parent":null,"children":[],"pci_bdf":null}}}HTTP/1.1 200
Server: Cloud Hypervisor API
Connection: keep-alive
Content-Type: application/json
Content-Length: 4285
{"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packagesepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","miter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":nu,"serial":{"file":null,"mode":"Tty","iommu":false,"socket":null},"console":{"file":null,"mode":"Off","iommu":false,"socket":null},"debug_console":{"file":null,"mode":"Off","iobase":233},"devices":[],"user_devices":null,"vdpa":null,"vsock":{"cid":3,"socket"
3: expected `,` or `}` at line 1 column 1924
Stack backtrace:
0: <E as anyhow::context::ext::StdError>::ext_context
1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::with_context
2: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::resize_memory::{{closure}}
3: resource::manager_inner::ResourceManagerInner::update_linux_resource::{{closure}}
4: virt_container::container_manager::container::Container::stop_process::{{closure}}
5: virt_container::container_manager::process::Process::run_io_wait::{{closure}}::{{closure}}
6: tokio::runtime::task::core::Core<T,S>::poll
7: tokio::runtime::task::harness::Harness<T,S>::poll
8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
9: tokio::runtime::scheduler::multi_thread::worker::Context::run
10: tokio::runtime::context::scoped::Scoped<T>::set
11: tokio::runtime::context::runtime::enter_runtime
12: tokio::runtime::scheduler::multi_thread::worker::run
13: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
14: tokio::runtime::task::core::Core<T,S>::poll
15: tokio::runtime::task::harness::Harness<T,S>::poll
16: tokio::runtime::blocking::pool::Inner::run
17: std::sys::backtrace::__rust_begin_short_backtrace
18: core::ops::function::FnOnce::call_once{{vtable.shim}}
19: std::sys::thread::unix::Thread::new::thread_start
20: <unknown>
21: <unknown>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Related to the previous commit, which adds the default gateway neighbor entry;
that entry has a state of reachable.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This change mirrors host networking into the guest as before, but now also
includes the default gateway neighbor entry for each interface.
Pods using overlay/synthetic gateways (e.g., 169.254.1.1) can hit a
first-connect race while the guest performs the initial ARP. Preseeding the
gateway neighbor removes that latency and makes early connections (e.g.,
to the API Service) deterministic.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
reqwest 0.11 required rustls-webpki 0.101.x, so we had to bump reqwest in
order to use rustls-webpki 0.103.12 and fix the following CVEs:
- RUSTSEC-2026-0098
- RUSTSEC-2026-0099
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In #12776 kata-deploy's binary was moved to the main cargo workspace,
but its Cargo.lock wasn't deleted. As it now shares the main Cargo.lock,
tidy this up.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The script is used to change the options used to build QEMU and **must**
be taken into consideration whenever something in it changes; otherwise the
QEMU used by the CI would be the old cached one (ignoring any newly added
flag).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
`k8s-confidential.bats` technically doesn't need attestation, but it only runs
on TEE hardware, so include it in the attestation list so that we can test it in PRs.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The skip conditional is wrong, but it's not needed, as the setup
and teardown only allow confidential hardware anyway.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The erofs snapshotter configuration is node-wide (a single containerd
drop-in) and cannot be split per runtime handler. The Go runtime does
not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with
"unsupported mount source" — so erofs is only usable with runtime-rs.
Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in
kata-deploy's configure_erofs_snapshotter() that inspects the
SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to
erofs, emit a prominent warning and bail out with a clear error telling
the operator to fix the mapping.
Since all shims are now guaranteed to be runtime-rs when erofs is
active, remove the conditional is_rust_shim gating and always emit the
full erofs configuration (differ options, default_size,
max_unmerged_layers=1).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove --tail=N limits from `kubectl logs` for kata-deploy pods so
the complete output is visible in CI job logs for debugging.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>