Commit Graph

18592 Commits

Author SHA1 Message Date
Fupan Li
18378145d2 Merge pull request #12821 from fidencio/topic/runtime-rs-cpu-pinning
runtime-rs: Add vCPU thread pinning support
2026-04-23 16:49:18 +08:00
Fabiano Fidêncio
f092210342 Merge pull request #12892 from kata-containers/topic/remove-non-running-tests
ci: Remove non-running tests
2026-04-23 09:41:38 +02:00
Fabiano Fidêncio
68cc7f8e70 ci: remove unmaintained CoCo stability test workflows
The ci-coco-stability.yaml workflow has its weekly schedule
commented out with a note that the workload is not maintained.
Remove the entire chain: ci-coco-stability.yaml, ci-weekly.yaml,
run-kata-coco-stability-tests.yaml, and the kubernetes stability
test scripts that were only used through this path.

The local containerd stability tests (tests/stability/gha-run.sh)
remain as they are actively used by basic-ci workflows.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
fccfd4dec7 tests: remove orphan vfio.yaml k8s workload manifest
This manifest is not referenced by any .bats test file and
is effectively dead code.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
c380c4c1d2 tests: remove unreferenced stdio integration tests
The tests/integration/stdio/ directory has a gha-run.sh script
but no workflow in .github/workflows/ references it, so these
tests never run in CI.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
e0d98fafe3 ci: remove disabled run-cri-containerd-tests-arm64 job
This job in ci.yaml has been unconditionally disabled (if: false)
with no tracking issue or path to re-enablement.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
c7e3f95883 tests: remove disabled tracing tests and CI job
The run-tracing job in basic-ci-amd64.yaml has been disabled
(if: false) due to issue #9763, with no path to re-enablement.
Remove the job definition and the backing
tests/functional/tracing/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8a93cf8f17 tests: remove disabled VFIO tests and CI job
The run-vfio job in basic-ci-amd64.yaml has been disabled
(if: false) due to issues #9764, #9851, and #9940, with no
path to re-enablement. Remove the job definition and the
backing tests/functional/vfio/ directory.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
8e685f22c6 ci: remove orphan run-kata-deploy-tests-on-aks.yaml workflow
This reusable workflow (workflow_call) has no caller anywhere in
the repository, making it dead code.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Fabiano Fidêncio
b74f2c0a9c tests: remove metrics tests and workflow
The run-metrics.yaml workflow is a reusable workflow_call with no
caller in the repository, making it effectively dead code. Remove
the workflow, the entire tests/metrics/ directory (~586 files
including vendored Go for checkmetrics), and the "metrics"
self-hosted runner label from actionlint.yaml.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-23 08:46:12 +02:00
Aurélien Bombo
87a3318151 Merge pull request #12695 from microsoft/saulparedes/test_mariner_runtime-rs
ci: k8s-tests: test mariner and runtime-rs
2026-04-22 16:01:08 -05:00
Fabiano Fidêncio
8dccf4cf37 Merge pull request #12896 from fidencio/release/3.29.0
release: Bump version to 3.29.0
3.29.0
2026-04-22 20:45:50 +02:00
Fabiano Fidêncio
1b9e49eb27 Merge commit from fork
genpolicy: restrict symlinks in CopyFile
2026-04-22 20:05:03 +02:00
Fabiano Fidêncio
ed3f8b4efe release: Bump version to 3.29.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-22 15:57:39 +02:00
Markus Rudy
639ff3578d genpolicy: restrict symlinks in CopyFile
Allowing arbitrary symlinks in the shared directory is unsafe for
confidential VM use cases. In order to make CopyFile safe both for the
VM and for the consuming containers, we implement the following
rules for symlinks (in addition to the existing rules for other files):

1. Symlinks may not be placed directly into the shared directory.
2. Symlinks must not point 'upwards', i.e. must not contain `..` as a
   path element.
3. Symlinks must be relative.

These rules ensure that all writes initiated by CopyFile are restricted
to the shared directory (protecting the VM), and that symlinks can't
point outside their mount points (protecting the container).

These new restrictions mean that we can't support arbitrary mount
sources (which might not follow these rules), but the usual k8s suspects
(ConfigMap, Secret, ServiceAccountToken) should still pass.

In order to aid writing the policy, we convert the CopyFileRequest to a
structure that does not contain binary data, but well-defined strings
and types.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
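
The three symlink rules from 639ff3578d can be illustrated with a small check. This is only a sketch in Python, not the genpolicy implementation: the `SHARED_DIR` path and the `symlink_allowed` helper are hypothetical names, and the "existing rules for other files" are assumed here to mean that the path stays inside the shared directory.

```python
from pathlib import PurePosixPath

# Hypothetical shared-directory root; the real path is not stated in the commit.
SHARED_DIR = PurePosixPath("/run/shared")


def symlink_allowed(link_path: str, target: str,
                    shared_dir: PurePosixPath = SHARED_DIR) -> bool:
    """Return True if a symlink at link_path pointing to target obeys the rules."""
    link = PurePosixPath(link_path)
    # Assumed pre-existing rule: the link lives inside the shared directory
    # and its own path does not climb out of it.
    if shared_dir not in link.parents or ".." in link.parts:
        return False
    # Rule 1: symlinks may not be placed directly into the shared directory.
    if link.parent == shared_dir:
        return False
    if not target:
        return False
    tgt = PurePosixPath(target)
    # Rule 3: symlinks must be relative.
    if tgt.is_absolute():
        return False
    # Rule 2: no `..` path element (a name that merely starts with dots is fine).
    if ".." in tgt.parts:
        return False
    return True


print(symlink_allowed("/run/shared/vol/..data", "..2026_04_22"))
print(symlink_allowed("/run/shared/vol/link", "../other"))
```

The first call models the kind of link a ConfigMap-style volume uses (a dotted name, but no `..` element) and is accepted; the second points upwards and is rejected.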
Markus Rudy
d6bd666b3f agent: fix naming for symlinks in CopyFile
The agent referred to the `data` field of an incoming CopyFileRequest
as the 'src'. This is misleading, because 'source' is not mentioned
in the specification (where links are just a path with attached
bytes), and because the documentation for the `ln` utility calls the
path LINK_NAME and the data TARGET. This commit fixes the glitch and
calls the first argument to `symlinkat` the target.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
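
The naming in d6bd666b3f follows `ln -s TARGET LINK_NAME`: the request's path is the link name and its `data` is the target. Python's `os.symlink` takes its arguments in the same order, which makes for a quick illustration. The `create_symlink` wrapper and the example paths are made up for this sketch and are not agent code.

```python
import os
import tempfile


def create_symlink(link_name: str, target: str) -> None:
    """Create a symlink at link_name whose stored content is target."""
    # Like `ln -s TARGET LINK_NAME`, os.symlink takes the target first
    # and the path of the new link second.
    os.symlink(target, link_name)


with tempfile.TemporaryDirectory() as tmp:
    link_name = os.path.join(tmp, "current")
    # The target does not need to exist; it is just bytes stored in the link.
    create_symlink(link_name, "releases/v1")
    stored_target = os.readlink(link_name)

print(stored_target)
```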
Markus Rudy
5c362adcff agent: add required features for standalone build
Building the kata-agent-policy crate only succeeded when its parents
(agent and genpolicy) pulled in the required features. This commit adds
the required features to the crate itself, such that it can be built
standalone and IDEs don't show errors while browsing it.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-04-22 15:46:12 +02:00
Fabiano Fidêncio
47dea24409 Merge pull request #12895 from fidencio/topic/kata-deploy-avoid-shipping-what-we-do-not-test
kata-deploy: Remove arm64 and qemu-cca shim support
2026-04-22 15:42:43 +02:00
Fabiano Fidêncio
726992cde3 Merge pull request #12702 from Apokleos/update-docs2
docs: Update docs of kata-containers
2026-04-22 12:04:48 +02:00
Fabiano Fidêncio
9b62021049 kata-deploy: Remove untested arm64 and qemu-cca shim support
We should not ship configurations that we do not actively test.

This commit drops the following from the kata-deploy helm chart:

values.yaml:
- arm64 from supportedArches for the clh shim
- arm64 from supportedArches for the cloud-hypervisor shim
- arm64 from supportedArches for the dragonball shim
- arm64 from supportedArches for the fc shim
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- the entire qemu-cca shim definition

try-kata-tee.values.yaml:
- CCA from the file description comment
- qemu-cca from the TEE shims list comment
- the entire qemu-cca shim definition
- arm64: qemu-cca from the defaultShim mapping, replaced with
  arm64: qemu-coco-dev-runtime-rs (which is tested)

try-kata-nvidia-gpu.values.yaml:
- arm64 from supportedArches for the qemu-nvidia-gpu shim
- arm64: qemu-nvidia-gpu from the defaultShim mapping

Once arm64 and qemu-cca support are properly tested, they can be
re-added.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-22 10:55:29 +02:00
Alex Lyn
978f40d631 docs: Remove obsolete and update documentation index
This commit prunes the documentation tree by removing files
that are either no longer relevant to the current architecture
or have been superseded by newer guides.

Specifically, remove the doc Intel-Discrete-GPU-passthrough-and-Kata.md
and update the using-Intel-QAT-and-kata.md index in nav.yaml.

Refining the documentation helps ensure that new contributors
find accurate and up-to-date information.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Alex Lyn
59609463e0 docs: Update kernel modules loading document
- Restructure document with clearer sections and better readability
- Add configuration format examples for both runtimes
- Add technical details including data flow and implementation references
- Add debugging section for troubleshooting

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Alex Lyn
d6308ffb8c docs: Update SPDK vhost-user guide with CSI driver
- Add support for runtime-rs with Dragonball
- Add CSI driver integration method for Kubernetes
- Add kata-ctl direct-volume method for manual setup
- Preserve SPDK vhost-user Target Overview principles
- Fix minor typo (can exposes -> can expose)

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-04-22 16:29:46 +08:00
Saul Paredes
cafdd278ba tests: k8s: policy: improve settings selection for runtime-rs hypervisors
"cloud-hypervisor" is also a runtime-rs hypervisor. So we need to include it in the settings selection logic.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-21 14:08:27 -07:00
Saul Paredes
baf0f16804 ci: k8s-tests: test mariner and runtime-rs
Disable policy tests when using mariner and runtime-rs. These are not supported yet.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-21 14:08:21 -07:00
Fabiano Fidêncio
0c80372cf5 Merge pull request #12881 from stevenhorsman/bump-web-pki-to-0.103.12
Bump web pki to 0.103.12
2026-04-21 18:11:26 +02:00
Aurélien Bombo
206c1d3be8 Merge pull request #12889 from fidencio/topic/ch-config
hypervisor: Enable cloud-hypervisor feature by default
2026-04-21 11:04:31 -05:00
Fabiano Fidêncio
48669a894e runtime-rs: Add vCPU thread pinning support
Port the Go runtime's enable_vcpus_pinning feature to runtime-rs.

The Go runtime already lets users pin each vCPU thread to a specific
host CPU when the vCPU count matches the sandbox cpuset size, using
sched_setaffinity. This is useful for latency-sensitive workloads that
benefit from eliminating cross-CPU migration of vCPU threads.

The approach mirrors the Go implementation:

After VM start and on every container add/update/delete, we fetch the
vCPU thread IDs (via QMP query-cpus-fast for QEMU), compute the union of
all containers' OCI cpusets, and if the two counts match, pin vCPU i to
cpuset[i]. If they diverge (hotplug, container removal, etc.) we reset
all threads back to the full cpuset so nothing gets stuck on a single
core.

The pinning check lives in CgroupsResourceInner::update_sandbox_cgroups,
which already runs at exactly the right points in the lifecycle. The
enable_vcpus_pinning flag flows from the TOML config through
CgroupConfig into the cgroup resource layer, and can also be overridden
per-pod via the io.katacontainers.config.runtime.enable_vcpus_pinning
annotation.

The QEMU config templates default to false. The NV GPU configs will get
their own default (true) in a follow-up once those templates are added.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-21 12:45:56 +02:00
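
The pin-or-reset decision described in 48669a894e can be sketched as a pure function. This is an illustration in Python rather than the runtime-rs code: `parse_cpuset`, `plan_affinity` and `apply_affinity` are hypothetical names, the thread ids are made up, and ordering the cpuset numerically before indexing is an assumption.

```python
import os


def parse_cpuset(spec: str) -> set[int]:
    """Parse a Linux cpuset list such as "0-2,8" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def plan_affinity(vcpu_tids: list[int],
                  container_cpusets: list[str]) -> dict[int, set[int]]:
    """Map each vCPU thread id to the host CPUs it may run on."""
    # Union of all containers' cpusets forms the sandbox cpuset.
    sandbox_cpus: set[int] = set()
    for spec in container_cpusets:
        sandbox_cpus |= parse_cpuset(spec)
    ordered = sorted(sandbox_cpus)
    if vcpu_tids and len(vcpu_tids) == len(ordered):
        # Counts match: pin vCPU i to cpuset[i].
        return {tid: {ordered[i]} for i, tid in enumerate(vcpu_tids)}
    # Counts diverge: every thread may use the full cpuset again.
    return {tid: set(sandbox_cpus) for tid in vcpu_tids}


def apply_affinity(plan: dict[int, set[int]]) -> None:
    """Apply a plan with sched_setaffinity (Linux only, needs real thread ids)."""
    for tid, cpus in plan.items():
        if cpus:
            os.sched_setaffinity(tid, cpus)


print(plan_affinity([101, 102], ["4", "6"]))
print(plan_affinity([101, 102, 103], ["4-5"]))
```

Keeping the plan separate from `apply_affinity` makes the decision testable without touching real threads.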
Fabiano Fidêncio
1c2d5cb57d Merge pull request #12848 from kata-containers/sprt/fix-block-vol-test
tests: make k8s-block-volume more robust
2026-04-21 11:27:43 +02:00
Fabiano Fidêncio
2bfa94b7cb hypervisor: Enable cloud-hypervisor feature by default
The cloud-hypervisor feature has been fully functional for some time
now: it's enabled by default in virt_container, used by agent-ctl,
and exercised in CI.  Drop the stale comments referencing issue #6264
and promote the feature to a default.

Fixes: #6264

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-21 11:26:12 +02:00
Fabiano Fidêncio
3b481813f9 Merge pull request #12887 from kata-containers/sprt/fix-runtime-rs-ch-cleanup
runtime-rs/ch: Fix pod deletion hang and make deletion idempotent
2026-04-21 11:21:09 +02:00
Aurélien Bombo
a401266f0e Merge pull request #11704 from microsoft/saulparedes/allow_default_gateway_neigh
network: preseed default-gateway neighbor
2026-04-20 15:43:55 -05:00
Aurélien Bombo
d64fce3998 Revert "ci: k8s: Adjust timeout on free runners"
This reverts commit 8d6f1d6f34.
2026-04-20 15:36:35 -05:00
Aurélien Bombo
3cf9581fbe runtime-rs/ch: Fix errors on pod deletion
* get_rootless_symlink_sandbox_path() would get called without first checking
   is_rootless(), meaning cleanup() would ALWAYS fail (see below error), even
   though the shim/CH would NOT leak thanks to containerd's recovery routine.

 * Cleanup wouldn't be idempotent (in case the CRI issues multiple shutdown requests).
   This was fixed by introducing remove_dir_all_if_exists().

   Apr 17 17:53:21 containerd[4078033]: time="2026-04-17T17:53:21.821624475-05:00" level=error msg="failed to shutdown shim task and the shim might be leaked" error="Others(\"failed to handle message handler TaskRequest\\n\\nCaused by:\\n    0: do shutdown\\n    1: do the clean up\\n    2: delete hypervisor\\n    3: No such file or directory (os error 2)\\n\\nStack backtrace:\\n   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from\\n   1: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::cleanup::{{closure}}\\n   2: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::cleanup::{{closure}}\\n   3: <virt_container::sandbox::VirtSandbox as common::sandbox::Sandbox>::shutdown::{{closure}}\\n   4: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}::{{closure}}\\n   5: runtimes::manager::RuntimeHandlerManager::handler_task_message::{{closure}}\\n   6: <service::task_service::TaskService as containerd_shim_protos::shim::shim_ttrpc_async::Task>::shutdown::{{closure}}\\n   7: <containerd_shim_protos::shim::shim_ttrpc_async::ShutdownMethod as ttrpc::asynchronous::utils::MethodHandler>::handler::{{closure}}\\n   8: ttrpc::asynchronous::server::HandlerContext::handle_msg::{{closure}}\\n   9: <core::future::poll_fn::PollFn<F> as core::future::future::Future>::poll\\n  10: <ttrpc::asynchronous::server::ServerReader as ttrpc::asynchronous::connection::ReaderDelegate>::handle_msg::{{closure}}::{{closure}}\\n  11: tokio::runtime::task::core::Core<T,S>::poll\\n  12: tokio::runtime::task::harness::Harness<T,S>::poll\\n  13: tokio::runtime::scheduler::multi_thread::worker::Context::run_task\\n  14: tokio::runtime::scheduler::multi_thread::worker::Context::run\\n  15: tokio::runtime::context::scoped::Scoped<T>::set\\n  16: tokio::runtime::context::runtime::enter_runtime\\n  17: tokio::runtime::scheduler::multi_thread::worker::run\\n  18: <tokio::runtime::blocking::task::BlockingTask<T> as 
core::future::future::Future>::poll\\n  19: tokio::runtime::task::core::Core<T,S>::poll\\n  20: tokio::runtime::task::harness::Harness<T,S>::poll\\n  21: tokio::runtime::blocking::pool::Inner::run\\n  22: std::sys::backtrace::__rust_begin_short_backtrace\\n  23: core::ops::function::FnOnce::call_once{{vtable.shim}}\\n  24: std::sys::thread::unix::Thread::new::thread_start\\n  25: <unknown>\\n  26: <unknown>\")" id=fca6a162b8f0ed7ef2b33cd99b6f1b58124e85c5489c193ceac487db0e4acdde

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-20 15:36:18 -05:00
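
The idempotent-cleanup idea from 3cf9581fbe, treating "already gone" as success, looks roughly like this. The function below is a Python analogue that borrows the helper's name for readability; it is not the Rust implementation, and the boolean return value is an addition for the demo.

```python
import shutil
import tempfile
from pathlib import Path


def remove_dir_all_if_exists(path: Path) -> bool:
    """Recursively delete path; return False if it was already gone."""
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        # A second shutdown request finds nothing to delete: not an error.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    sandbox_dir = Path(tmp) / "sandbox"
    sandbox_dir.mkdir()
    (sandbox_dir / "state.json").write_text("{}")
    first = remove_dir_all_if_exists(sandbox_dir)
    second = remove_dir_all_if_exists(sandbox_dir)

print(first, second)
```

Only the "not found" case is swallowed here; other failures (for example permission errors) still propagate.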
Aurélien Bombo
93bd2899fb runtime-rs/ch: Fix hang on pod deletion
This serializes CH API calls to avoid a race condition where deleting a pod
would hang indefinitely and leak both the shim and CH processes.

The race happened because the CRI can send multiple shutdown requests for the
same pod, but the CH socket wasn't guarded against concurrent usage, so
it was possible for HTTP responses to interleave (see below) on the
shutdown path, leading to an error.

This would repro in <15 iterations (sometimes 2-3) using a 2-container pod.
With this commit, I haven't observed a repro in 200+ iterations.

Fixes: #12858

ORIGINAL REPRO:

while true; do
  kubectl apply -f busybox.yaml
  kubectl wait --for=condition=ready po busybox
  kubectl exec busybox -- echo foo
  kubectl delete po busybox
done

ORIGINAL ERROR:

 Apr 17 20:15:54 kata[2297383]: Failed to stop process, process = ContainerProcess { container_id: ContainerID { container_id: "d4eb8984d630111bbf808c7ea30b7a21274c0193cdb8d501d20e4f26a0a69151" }, exec_id: "", process_type: Container }, err = failed to update_mem_resource

                               Caused by:
                                   0: resize memory
                                   1: get vminfo
                                   2: failed to serde {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packages":1},"kvm_hyperv":false,"max_phys_bits":46,"affinity":null,"features":{"amx":false},"nested":null},"memory":{"size":2147483648,"mergeable":false,"hotplug_method":"Acpi","hotplug_size":132024107008,"hotplugged_size":null,"shared":true,"hugepages":false,"hugepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","initramfs":null},"rate_limit_groups":null,"disks":[{"path":"/usr/share/kata-containers/kata-containers.img","readonly":true,"direct":false,"iommu":false,"num_queues":1,"queue_size":128,"vhost_user":false,"vhost_socket":null,"rate_limit_group":null,"rate_limiter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":null,"mtu":null,"iommu":false,"num_queues":2,"queue_size":256,"vhost_user":false,"vhost_socket":null,"vhost_mode":"Client","id":"_net1","fds":[-1],"rate_limiter_config":null,"pci_segment":0,"offload_tso":true,"offload_ufo":true,"offload_csum":true}],"rng":{"src":"/dev/urandom","iommu":false},"balloon":null,"fs":[{"tag":"kataShared","socket":"/run/kata/e1ae0a05f575a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/root/virtiofsd.sock","num_queues":1,"queue_size":1024,"id":"_fs2","pci_segment":0}],"pmem":null:"/run/kata/e1ae0a05f57
5a13a535aa95a9990d1fded4766a759f76be0e528c7912d3a5e39/ch-vm.sock","iommu":false,"id":"_vsock3","pci_segment":0},"pvpanic":false,"iommu":false,"numa":null,"watchdog":false,"pci_segments":null,"platform":null,"tpm":null,"landlock_enabl"index":0,"base":3891789824,"size":524288,"type_":"Mmio32","prefetchable":false}}],"parent":null,"children":["_disk0"],"pci_bdf":"0000:00:01.0"},"_virtio-pci-_vsock3":{"id":"_virtio-pci-_vsock3","resources":[{"PciBar":{"index":0,"base":70367622201344,"sizee":false}}],"parent":null,"children":["_fs2"],"pci_bdf":"0000:00:04.0"},"_vsock3":{"id":"_vsock3","resources":[],"parent":"_virtio-pci-_vsock3","children":[],"pci_bdf":null},"_net1":{"id":"_net1","resources":[],"parent":"_virtio-pci-_net1","children":[],"presources":[{"PciBar":{"index":0,"base":70367623774208,"size":524288,"type_":"Mmio64","prefetchable":false}}],"parent":null,"children":["_net1"],"pci_bdf":"0000:00:02.0"},"_virtio-pci-__rng":{"id":"_virtio-pci-__rng","resources":[{"PciBar":{"index":0,"baseesources":[],"parent":null,"children":[],"pci_bdf":null}}}HTTP/1.1 200
                                      Server: Cloud Hypervisor API
                                      Connection: keep-alive
                                      Content-Type: application/json
                                      Content-Length: 4285

                                      {"config":{"cpus":{"boot_vcpus":1,"max_vcpus":32,"topology":{"threads_per_core":1,"cores_per_die":32,"dies_per_package":1,"packagesepage_size":null,"prefault":false,"zones":null,"thp":true},"payload":{"firmware":null,"kernel":"/usr/share/cloud-hypervisor/vmlinux.bin","cmdline":"reboot=k panic=1 systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service agent.log_vport=1025 console=ttyS0,115200n8 root=/dev/vda1 rootflags=data=ordered,errors=remount-ro ro rootfstype=ext4 no_timer_check noreplace-smp systemd.log_target=console agent.container_pipe_size=1 agent.log=debug cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1","miter_config":null,"id":"_disk0","disable_io_uring":false,"disable_aio":false,"pci_segment":0,"serial":null,"queue_affinity":null,"backing_files":false}],"net":[{"tap":null,"ip":"192.168.249.1","mask":"255.255.255.0","mac":"9e:7e:13:ee:03:5c","host_mac":nu,"serial":{"file":null,"mode":"Tty","iommu":false,"socket":null},"console":{"file":null,"mode":"Off","iommu":false,"socket":null},"debug_console":{"file":null,"mode":"Off","iobase":233},"devices":[],"user_devices":null,"vdpa":null,"vsock":{"cid":3,"socket"
                                   3: expected `,` or `}` at line 1 column 1924

                               Stack backtrace:
                                  0: <E as anyhow::context::ext::StdError>::ext_context
                                  1: anyhow::context::<impl anyhow::Context<T,E> for core::result::Result<T,E>>::with_context
                                  2: <hypervisor::ch::CloudHypervisor as hypervisor::Hypervisor>::resize_memory::{{closure}}
                                  3: resource::manager_inner::ResourceManagerInner::update_linux_resource::{{closure}}
                                  4: virt_container::container_manager::container::Container::stop_process::{{closure}}
                                  5: virt_container::container_manager::process::Process::run_io_wait::{{closure}}::{{closure}}
                                  6: tokio::runtime::task::core::Core<T,S>::poll
                                  7: tokio::runtime::task::harness::Harness<T,S>::poll
                                  8: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
                                  9: tokio::runtime::scheduler::multi_thread::worker::Context::run
                                 10: tokio::runtime::context::scoped::Scoped<T>::set
                                 11: tokio::runtime::context::runtime::enter_runtime
                                 12: tokio::runtime::scheduler::multi_thread::worker::run
                                 13: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
                                 14: tokio::runtime::task::core::Core<T,S>::poll
                                 15: tokio::runtime::task::harness::Harness<T,S>::poll
                                 16: tokio::runtime::blocking::pool::Inner::run
                                 17: std::sys::backtrace::__rust_begin_short_backtrace
                                 18: core::ops::function::FnOnce::call_once{{vtable.shim}}
                                 19: std::sys::thread::unix::Thread::new::thread_start
                                 20: <unknown>
                                 21: <unknown>

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-20 15:36:00 -05:00
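
Serializing calls on a shared request/response socket, as 93bd2899fb describes, amounts to holding one lock across the whole round trip. The sketch below is Python with a fake in-memory socket; `FakeApiSocket`, `SerializedClient` and `hammer` are hypothetical stand-ins, and nothing here reflects the actual Cloud Hypervisor API client.

```python
import threading


class FakeApiSocket:
    """Stand-in for a shared request/response socket (not the real CH API)."""

    def __init__(self) -> None:
        self._pending: list[str] = []

    def send(self, request: str) -> None:
        self._pending.append(request)

    def recv(self) -> str:
        # Replies come back in the order the requests were written.
        return "reply:" + self._pending.pop(0)


class SerializedClient:
    def __init__(self, sock: FakeApiSocket) -> None:
        self._sock = sock
        self._lock = threading.Lock()

    def call(self, request: str) -> str:
        # Hold the lock for the whole round trip so no other caller can
        # write a request or read a response in between.
        with self._lock:
            self._sock.send(request)
            return self._sock.recv()


def hammer(client: SerializedClient, n_threads: int = 8,
           n_calls: int = 200) -> list[tuple[str, str]]:
    """Issue calls from several threads and collect (request, response) pairs."""
    results: list[tuple[str, str]] = []
    results_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        for i in range(n_calls):
            request = f"{worker_id}-{i}"
            response = client.call(request)
            with results_lock:
                results.append((request, response))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


pairs = hammer(SerializedClient(FakeApiSocket()))
print(all(resp == "reply:" + req for req, resp in pairs))
```

Without the lock, two callers could each send before either reads, and a caller could then receive the other's reply, which is the interleaving visible in the log above.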
Fabiano Fidêncio
847f0f40cb Merge pull request #12880 from fidencio/topic/improve-qemu-cache
ci: cache: qemu: Take configure-hypervisor.sh into account
2026-04-20 19:16:01 +02:00
Saul Paredes
f1bcfb8a62 policy: allow neighbors with reachable state
This relates to the previous commit, which adds the default gateway neighbor;
that entry has the `reachable` state.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-20 10:00:23 -07:00
Saul Paredes
83bbfedc08 network: preseed default-gateway neighbor
This change mirrors host networking into the guest as before, but now also
includes the default gateway neighbor entry for each interface.

Pods using overlay/synthetic gateways (e.g., 169.254.1.1) can hit a
first-connect race while the guest performs the initial ARP. Preseeding the
gateway neighbor removes that latency and makes early connections (e.g.,
to the API Service) deterministic.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-04-20 10:00:19 -07:00
Dan Mihai
b2ea9a8fc6 Merge pull request #12460 from microsoft/danmihai1/k8s-openvpn-runtime
tests: annotations for all k8s-openvpn yaml files
2026-04-20 09:47:02 -07:00
stevenhorsman
6b1fd4c782 kata-ctl: Bump reqwest to 0.12
reqwest 0.11 required rustls-webpki 0.101.x, so we had to bump it
to use 0.103.12 to fix CVEs:
- RUSTSEC-2026-0098
- RUSTSEC-2026-0099

Assisted-by IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 17:20:54 +01:00
stevenhorsman
9fbdf513ca kata-deploy: Delete Cargo.lock
In #12776 kata-deploy's binary was moved to the main cargo workspace,
but the Cargo.lock wasn't deleted. As it shares the main Cargo.lock, tidy
this up.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 17:09:21 +01:00
stevenhorsman
a59afa3154 versions: Update rustls-webpki to 0.103.12
Simple bump to fix CVEs:
- RUSTSEC-2026-0098
- RUSTSEC-2026-0099

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 16:24:20 +01:00
Fabiano Fidêncio
b64673196a ci: cache: qemu: Take configure-hypervisor.sh into account
The script is used to change the options used to build QEMU and **must**
be taken into consideration in case something changes, otherwise the
QEMU used by the CI would be the old cached one (ignoring any flag newly
added).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-20 14:52:57 +02:00
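
The point of b64673196a is that every file influencing the build has to feed into the cache key. One common way to get that property is to hash all such inputs together, sketched below in Python. How the CI actually derives its key is not shown in the commit, so `cache_key` and the file contents are purely illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(inputs: list[Path]) -> str:
    """Hash the names and contents of every file that influences the build."""
    digest = hashlib.sha256()
    for path in sorted(inputs):
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder contents; only the fact that they change matters here.
    versions = Path(tmp) / "versions.yaml"
    script = Path(tmp) / "configure-hypervisor.sh"
    versions.write_text("placeholder version pin\n")
    script.write_text("placeholder build flags\n")
    key_before = cache_key([versions, script])

    script.write_text("placeholder build flags plus a new one\n")
    key_after = cache_key([versions, script])

print(key_before != key_after)
```

If the script were left out of `inputs`, the key would stay the same after the edit and the stale cached build would be reused.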
Fabiano Fidêncio
07731cde21 Merge pull request #12879 from stevenhorsman/confidential-tests-fixes
Confidential tests fixes
2026-04-20 14:33:02 +02:00
stevenhorsman
c75c432c01 ci: Update TEE scope
`k8s-confidential.bats` technically doesn't need attestation, but only runs
on TEE hardware, so include it in the attestation list so we can test it in PRs

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 09:36:10 +01:00
stevenhorsman
7179e92142 tests/confidentials: Remove pointless skip
The skip conditional is wrong, but it's not needed as the setup
and teardown only allow confidential hardware anyway

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-20 09:36:10 +01:00
Fupan Li
2629df2785 Merge pull request #12763 from Apokleos/fsmerged-erofs-rs
runtime-rs: support erofs snapshotter with Fsmerge enabled
2026-04-20 11:54:19 +08:00
Alex Lyn
e975b3158b Merge pull request #12837 from stevenhorsman/rand-bump-GHSA-cq8v-f236-94qc
versions: Bump rand crate where possible
2026-04-20 10:05:19 +08:00
Fabiano Fidêncio
d6f0b15578 ci: erofs: restrict to runtime-rs only
The erofs snapshotter configuration is node-wide (a single containerd
drop-in) and cannot be split per runtime handler.  The Go runtime does
not support fsmerged EROFS — it rejects fsmeta.erofs mount sources with
"unsupported mount source" — so erofs is only usable with runtime-rs.

Drop qemu-coco-dev (Go) from the erofs CI matrix and add a check in
kata-deploy's configure_erofs_snapshotter() that inspects the
SNAPSHOTTER_HANDLER_MAPPING: if any Go shim is explicitly mapped to
erofs, emit a prominent warning and bail out with a clear error telling
the operator to fix the mapping.

Since all shims are now guaranteed to be runtime-rs when erofs is
active, remove the conditional is_rust_shim gating and always emit the
full erofs configuration (differ options, default_size,
max_unmerged_layers=1).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
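
The mapping check added in d6f0b15578 can be pictured as follows. This Python sketch assumes SNAPSHOTTER_HANDLER_MAPPING is a comma-separated list of `shim:snapshotter` pairs and that the caller supplies the set of runtime-rs shims; both are assumptions for illustration, and the helper names are hypothetical rather than kata-deploy's.

```python
def go_shims_mapped_to_erofs(mapping: str, rust_shims: set[str]) -> list[str]:
    """Return shims mapped to erofs that are not in the runtime-rs set."""
    offenders = []
    for entry in mapping.split(","):
        entry = entry.strip()
        if not entry:
            continue
        shim, _, snapshotter = entry.partition(":")
        if snapshotter.strip() == "erofs" and shim.strip() not in rust_shims:
            offenders.append(shim.strip())
    return offenders


def check_erofs_mapping(mapping: str, rust_shims: set[str]) -> None:
    """Bail out with a clear error if any non-runtime-rs shim uses erofs."""
    offenders = go_shims_mapped_to_erofs(mapping, rust_shims)
    if offenders:
        raise ValueError(
            "erofs is only usable with runtime-rs; fix the mapping for: "
            + ", ".join(offenders)
        )


# Assumed set of runtime-rs shims for the demo.
rust = {"qemu-coco-dev-runtime-rs"}
check_erofs_mapping("qemu-coco-dev-runtime-rs:erofs", rust)  # passes silently
print(go_shims_mapped_to_erofs(
    "qemu-coco-dev:erofs,qemu-coco-dev-runtime-rs:erofs", rust))
```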
Fabiano Fidêncio
cf1e6f82f2 tests: Show full kata-deploy pod logs in CI
Remove --tail=N limits from `kubectl logs` for kata-deploy pods so
the complete output is visible in CI job logs for debugging.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00