Commit Graph

2319 Commits

Author SHA1 Message Date
Fabiano Fidêncio
09bbc70302 Merge pull request #13002 from manuelh-dev/mahuber/unrequire-nim-svc
gatekeeper: Unrequire NVIDIA GPU test (temporary)
2026-05-08 10:02:00 +02:00
Manuel Huber
edfb6f5716 gatekeeper: Unrequire NVIDIA GPU test (temporary)
Temporarily unrequire the NVIDIA GPU test. We are experiencing
situations in which two NIM service instances get deployed almost
at the same time into the kata-containers-k8s-tests namespace
(expected current context) and into the default namespace. This
causes the NIM operator to create two deployments in the two
namespaces and to then schedule two pods at the same time. This
usually causes the NIM pod in the default namespace to fail and to
linger.
We can't explain yet why this does not happen in the TEE CI path
and why this is happening at all.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-07 14:39:24 +02:00
Fabiano Fidêncio
0f3160276b ci: k8s: skip no-op Helm uninstall on free runners
In cleanup_kata_deploy, bail out early when no kata-deploy Helm release
exists so baremetal-* pre-deploy cleanup on fresh clusters does not
block on helm uninstall --wait (up to 10m).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
f5533950e6 kata-deploy: helm: cap container RSS via resources block
Plumb a resources block into the kata-deploy DaemonSet container in
the Helm chart so the cluster can size its memory footprint
predictably.

Defaults are sized from real /proc/<pid>/status numbers on an
unpatched 3.30.0 build running on a ~220-vCPU GPU node:

  VmRSS:    9944 kB  (~9.7 MiB)   <- actual physical memory
  RssAnon:  2628 kB  (~2.6 MiB)   <- heap + dirty stack pages
  VmData: 464668 kB  (~454 MiB)   <- tokio multi-thread workers'
                                     reserved-but-untouched stacks
  Threads: 225                    <- num_cpus()-driven worker pool

That VmData number is the source of the original "kata-deploy is
using 400 MB" reports: any monitoring layer that surfaces virtual
data size, committed memory, or memory.usage_in_bytes on a kernel
that includes mapped-but-untouched memory will happily reproduce
~400 MB even though only ~10 MiB is ever made resident. The earlier
commits in this series (capped tokio worker pool, mimalloc, shared kube
client, JSONPath removal, post-install re-exec) collapse VmData into
the tens of MiB and drop the post-install resident set further.
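
For readers who want to reproduce the figures above, here is a minimal sketch of how the relevant /proc/<pid>/status fields could be read and converted. The helper names `parse_status` and `kb_to_mib` are hypothetical and do not come from the kata-deploy sources; the sketch only assumes the usual `Key:  value kB` line layout.

```rust
use std::collections::HashMap;

/// Parse `Key:   <number> [kB]` lines from a /proc/<pid>/status dump.
/// Lines whose value is not numeric (e.g. `Name:`) are skipped.
/// (Hypothetical helper, for illustration only.)
fn parse_status(text: &str) -> HashMap<String, u64> {
    text.lines()
        .filter_map(|line| {
            let (key, rest) = line.split_once(':')?;
            let value = rest.split_whitespace().next()?.parse::<u64>().ok()?;
            Some((key.trim().to_string(), value))
        })
        .collect()
}

/// The kernel's "kB" is 1024 bytes, so dividing by 1024 yields MiB.
fn kb_to_mib(kb: u64) -> f64 {
    kb as f64 / 1024.0
}
```

Comparing `VmRSS` (resident) with `VmData` (reserved) from the same dump is what separates "~10 MiB actually used" from "~450 MiB mapped".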

The defaults below are picked accordingly:

  requests:
    cpu: 25m            # install is mostly I/O wait; the post-install
                        # waiter is genuinely idle
    memory: 16Mi        # ~2x headroom over the unpatched VmRSS we
                        # measured, far more over the patched waiter

Operators who hit OOMKilled on unusually large or churny clusters can
override `resources` directly in their Helm values (or set it to {}
to remove all requests and inherit cluster defaults).

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
9e99b21ec5 kata-deploy: re-exec into a tiny post-install waiter
After install completes the kata-deploy DaemonSet pod has nothing else
to do for the rest of its lifetime — it just blocks on SIGTERM and then
runs cleanup. Up to here, the install path has built up substantial
peak heap (kube clients, deserialised Node/RuntimeClass objects, hyper
+ rustls TLS pools, parsed JSON / YAML), and on musl essentially none
of that is ever returned to the kernel. Idling in the same process
therefore pins the pod's RSS at the install peak indefinitely.

Re-exec the binary into a hidden `internal-post-install-wait` action
the moment install succeeds. execve(2) discards the entire address
space, so the waiter starts up holding only the working set it actually
needs (a config struct, the SIGTERM handler, and the health server).

To avoid a probe-availability gap during the handover the install
process clears FD_CLOEXEC on the health listener and passes the raw
FD to the child via KATA_DEPLOY_HEALTH_FD. The child reattaches the
FD as a tokio TcpListener and resumes serving /healthz and /readyz
without ever closing the socket — the kubelet sees no failure.

The detected container runtime is similarly threaded through
KATA_DEPLOY_DETECTED_RUNTIME so the waiter doesn't have to re-query
the apiserver. The new action is tagged `#[clap(hide = true)]` so
`--help` doesn't expose it; users should never invoke it directly.

Add the FD-inheritance helpers in health.rs:

  - prepare_listener_for_exec(): clears FD_CLOEXEC on a listener and
    returns its raw fd number.
  - listener_from_inherited_fd(): wraps an inherited fd back into a
    tokio::net::TcpListener (and re-sets FD_CLOEXEC so future host
    shellouts don't leak the socket).
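
The FD handover described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code in health.rs: it uses std::net::TcpListener rather than a tokio listener, declares fcntl(2) by hand with Linux constant values instead of using the `libc` crate, and the function names `release_for_exec` and `adopt_inherited` are hypothetical.

```rust
use std::io;
use std::net::TcpListener;
use std::os::fd::{AsRawFd, FromRawFd, IntoRawFd, RawFd};

// Linux values for fcntl(2); real code would take these from the `libc` crate.
const F_GETFD: i32 = 1;
const F_SETFD: i32 = 2;
const FD_CLOEXEC: i32 = 1;

// `unsafe extern` needs Rust 1.82+; on older toolchains drop the `unsafe`.
unsafe extern "C" {
    fn fcntl(fd: i32, cmd: i32, ...) -> i32;
}

fn is_cloexec(fd: RawFd) -> io::Result<bool> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok((flags & FD_CLOEXEC) != 0)
}

fn set_cloexec(fd: RawFd, on: bool) -> io::Result<()> {
    let flags = unsafe { fcntl(fd, F_GETFD) };
    if flags < 0 {
        return Err(io::Error::last_os_error());
    }
    let new_flags = if on { flags | FD_CLOEXEC } else { flags & !FD_CLOEXEC };
    if unsafe { fcntl(fd, F_SETFD, new_flags) } < 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}

/// Parent side: clear FD_CLOEXEC and give up ownership so the socket
/// stays open across execve(2). The returned number is what would be
/// exported to the child (the commit uses KATA_DEPLOY_HEALTH_FD).
fn release_for_exec(listener: TcpListener) -> io::Result<RawFd> {
    set_cloexec(listener.as_raw_fd(), false)?;
    Ok(listener.into_raw_fd())
}

/// Child side: re-set FD_CLOEXEC and take ownership of the inherited fd.
fn adopt_inherited(fd: RawFd) -> io::Result<TcpListener> {
    set_cloexec(fd, true)?;
    // SAFETY: the caller promises `fd` is an open listening socket that
    // nothing else in this process owns.
    Ok(unsafe { TcpListener::from_raw_fd(fd) })
}
```

Because the socket is never closed, connections that arrive during the exec stay queued in the kernel's accept backlog, which is why probes do not fail during the handover.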

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
af03ab2228 kata-deploy: replace JSONPath node lookups with typed accessors
The two pieces of node metadata kata-deploy actually reads are
.status.nodeInfo.containerRuntimeVersion and a single label, both of
which were being fetched through a homegrown JSONPath walker:

  - get_node_field() serialised the entire Node object back into a
    serde_json::Value tree on every call,
  - split_jsonpath() / get_jsonpath_value() then walked that tree by
    string key.

Both the deep clone and the helpers themselves are unnecessary — kube's
Node type is already strongly typed. Replace get_node_field() with two
purpose-built accessors that read straight off the Node struct:

  - get_container_runtime_version(): pulls
    status.node_info.container_runtime_version with a clear error if
    the field isn't populated.
  - get_node_label(key): returns Option<String> directly from
    metadata.labels.

Drop split_jsonpath, get_jsonpath_value, and their unit tests (which
existed only to cover the JSONPath walker we no longer have). Update
the three callers (config.rs, runtime/manager.rs, runtime/containerd.rs)
to use the typed accessors.

This removes the entire serde_json::Value clone-and-walk path from the
hot read path and meaningfully cuts allocator churn during install.
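
A rough sketch of what such typed accessors can look like follows. The structs are simplified stand-ins that only imitate the nesting of a Kubernetes Node; they are not the kube / k8s-openapi definitions. The function bodies are likewise an assumption about the approach rather than the committed code.

```rust
use std::collections::BTreeMap;

// Stand-in types (hypothetical): just enough structure to show the
// field walk; the real Node type has many more fields.
#[derive(Default)]
struct NodeSystemInfo {
    container_runtime_version: String,
}
#[derive(Default)]
struct NodeStatus {
    node_info: Option<NodeSystemInfo>,
}
#[derive(Default)]
struct ObjectMeta {
    labels: Option<BTreeMap<String, String>>,
}
#[derive(Default)]
struct Node {
    metadata: ObjectMeta,
    status: Option<NodeStatus>,
}

/// Read status.node_info.container_runtime_version straight off the
/// struct, with an explicit error when it is not populated.
fn get_container_runtime_version(node: &Node) -> Result<&str, String> {
    node.status
        .as_ref()
        .and_then(|status| status.node_info.as_ref())
        .map(|info| info.container_runtime_version.as_str())
        .filter(|version| !version.is_empty())
        .ok_or_else(|| "containerRuntimeVersion is not populated".to_string())
}

/// Look a single label up in metadata.labels.
fn get_node_label(node: &Node, key: &str) -> Option<String> {
    node.metadata.labels.as_ref()?.get(key).cloned()
}
```

Neither accessor allocates an intermediate JSON tree; the only allocation is the cloned label value.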

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
6cd842494c kata-deploy: cap the tokio worker pool to 2 threads
The default #[tokio::main] expands with flavor = "multi_thread" and
worker_threads = num_cpus::get(). On a typical NVIDIA GPU node
(200+ vCPUs) that allocates 200+ worker threads with ~2 MiB stacks
each, which is the single largest contributor to the DaemonSet pod's
VmData reservation — hundreds of MiB of address space mapped but never
touched, easily reproducing the "kata-deploy is using ~400 MB" reports
on any monitoring layer that surfaces VSZ / committed virtual memory.

Switch to a fixed two-worker multi-thread runtime instead:

  #[tokio::main(flavor = "multi_thread", worker_threads = 2)]

Two workers is exactly the right number for kata-deploy:

  - the install path is overwhelmingly I/O-bound and runs serially;
    one worker is enough to drive the install future itself,
  - install does shell out to `nsenter --target 1 systemctl restart
    containerd` (and friends) via the synchronous std::process::
    Command::output(), which wedges the worker thread it runs on for
    tens of seconds; the second worker keeps the spawned health-server
    task able to answer kubelet probes inside timeoutSeconds while
    the first is blocked.

flavor = "current_thread" would be tighter still on stacks (~4 MiB
saved) but is fundamentally unsafe here: with a single runtime thread,
any blocking host_systemctl call freezes the health server too, the
kubelet fails the readiness probe, and the pod is restarted long
before install completes. The CI lifecycle test reliably reproduces
this as a 15-minute timeout waiting for the kata-deploy DaemonSet pod
to become Ready.

Net result vs. upstream's num_cpus()-driven pool on a 200-vCPU node:
~200 fewer worker threads, ~400 MiB less VmData reservation, while
keeping kubelet probes responsive across the entire install path.
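
The arithmetic behind these numbers is simple enough to sketch. The helper below is hypothetical and assumes the ~2 MiB per-worker stack quoted above; actual reservations also include guard pages and allocator arenas, so treat the results as estimates.

```rust
/// Approximate stack reservation per worker thread, per the ~2 MiB
/// figure quoted in this commit message.
const STACK_MIB_PER_WORKER: u64 = 2;

/// Address space reserved for worker stacks, in MiB (estimate only).
fn reserved_stack_mib(worker_threads: u64) -> u64 {
    worker_threads * STACK_MIB_PER_WORKER
}

/// Reservation saved by shrinking the pool from `before` to `after` workers.
fn saved_stack_mib(before: u64, after: u64) -> u64 {
    reserved_stack_mib(before).saturating_sub(reserved_stack_mib(after))
}
```

With 200 workers the estimate is 400 MiB; with 2 workers it is 4 MiB, which is also the most that dropping to zero worker threads could save on top of that.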

Add the "sync" tokio feature here too so subsequent commits in the
series can use tokio::sync primitives (OnceCell) without another
features bump.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
346119108e kata-deploy: drop unused kube features
The binary doesn't use kube::runtime (controllers, watchers, reflectors)
or kube::derive (the CustomResource macro). Pulling them in only added
transitive deps (kube-runtime, kube-derive, backon, educe, ahash,
async-broadcast, ...) and inflated the binary's static data segment for
no functional gain.

Set default-features = false and select only what the binary actually
calls into: the kube-client surface plus the rustls-tls backend that
hyper-rustls already pulled in transitively. Behaviour is unchanged.

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
1682b73e38 kata-deploy: Add qemu-nvidia-gpu-tdx-runtime-rs shim
Register the new qemu-nvidia-gpu-tdx-runtime-rs shim across the kata-deploy
stack so it is built, installed, and exposed as a RuntimeClass.

This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the
runtime-rs binary), SHIMS list, the qemu-tdx-experimental share name
mapping, and the x86_64 default shim set. The Helm chart gets the new
shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the
RuntimeClass overhead definition in runtimeclasses.yaml.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-07 10:33:26 +02:00
Fabiano Fidêncio
2280620cb9 kata-deploy: Add qemu-nvidia-gpu-snp-runtime-rs shim
Register the new qemu-nvidia-gpu-snp-runtime-rs shim across the kata-deploy
stack so it is built, installed, and exposed as a RuntimeClass.

This adds the shim to the Rust binary's RUST_SHIMS list (so it uses the
runtime-rs binary), SHIMS list, the qemu-snp-experimental share name
mapping, and the x86_64 default shim set. The Helm chart gets the new
shim entry in values.yaml, try-kata-nvidia-gpu.values.yaml, and the
RuntimeClass overhead definition in runtimeclasses.yaml.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-07 10:33:26 +02:00
Fabiano Fidêncio
92a8cd56d1 kata-deploy: Add qemu-nvidia-gpu-runtime-rs shim
Register the Rust NVIDIA GPU runtime as a kata-deploy shim so it gets
installed and configured alongside the existing Go-based
qemu-nvidia-gpu shim.

Add qemu-nvidia-gpu-runtime-rs to the RUST_SHIMS list and the default
enabled shims, create its RuntimeClass entry in the Helm chart, and
include it in the try-kata-nvidia-gpu values overlay. The kata-deploy
installer will now copy the runtime-rs configuration and create the
containerd runtime entry for it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-07 10:33:26 +02:00
Fabiano Fidêncio
acfb9f9762 Merge pull request #12954 from zvonkok/modular-makefile
build: remove gha-adjust-to-use-prebuilt-components.sh
2026-05-07 10:32:32 +02:00
Greg Kurz
c18932b5ab build-checks: Remove make vendor
The `generate_vendor.sh` script already knows how to create a tarball
with all the rust and go vendored code within the repo. It is used by
the release workflow to provide vendored code to downstream consumers
that might need it.

There isn't any vendored code in the repo anymore.

Running `make vendor` in CI therefore no longer seems useful.

Stop doing it.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:49:50 +02:00
Greg Kurz
b44e56d3db runtime: Remove vendor directory
Now shipped in the vendored code tarball.

Drop the git tree status check since it isn't needed anymore.
Also stop building with `-mod=vendor`. This requires exposing
GOMODCACHE, as suggested by Fabiano Fidêncio.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:47:30 +02:00
Greg Kurz
aa9145a762 generate_vendor: Add go vendored code
Add go vendored code for all packages to the vendor tarball.
This should be enough for people who need vendored code, e.g.
for hermetic builds.

The repo only tracks 4 go vendored code directories but the
script considers all go.mod files across the repo, for the
sake of simplicity. The impact on the size of the tarball
is less than 20 MB.

It is now possible to stop tracking vendored code in git and
to get rid of `make vendor`.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:32:01 +02:00
Greg Kurz
6de1c00b77 webhook: Fix go.sum file
Run `go mod tidy`.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:31:55 +02:00
Greg Kurz
6c3de068a4 generate_vendor: Adapt to modern cargo
This silences the following warning:

warning: `.../.cargo/config` is deprecated in favor of `config.toml`
  |
  = help: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`

We don't care about cargo 1.38 or earlier.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-05-06 09:31:54 +02:00
Manuel Huber
06072552de osbuilder: install yq in container without GOPATH
rootfs.sh stops passing a host GOPATH bind-mount into the inner
osbuilder docker run. Pass INSTALL_IN_GOPATH=false so
ci/install_yq.sh installs yq under /usr/local/bin in the container.
scripts/lib.sh resolves yq after sourcing install_yq.sh and fails
clearly if yq is still missing.
This avoids build issues on (managed) build hosts where HOME, for
example, resolves to /localhome/... while the image user record
still points at /home/... On those hosts the old flow could make
the daemon bind-mount a GOPATH path that does not exist or is not
writable on the host (e.g. mkdir or mount under /home/... denied).

Co-authored-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-05 11:31:58 -07:00
Alex Tibbles
8d7246e29a kernel: bump kernel versions other than dragonball
Applies fix for CVE-2026-31431 for non-dragonball configurations on current LTS 6.18.

Signed-Off-By: Alex Tibbles <alex@bleg.org>
2026-05-05 09:30:46 +02:00
Fabiano Fidêncio
27c3dfbb8c Merge pull request #12943 from fidencio/topic/kata-deploy-add-http-health-probes
kata-deploy: add HTTP health probes (healthz/readyz)
2026-05-05 09:30:17 +02:00
Fabiano Fidêncio
d9722ba4be Merge pull request #12960 from microsoft/saul/update_mariner_test_configs
kata-deploy: configure_mariner: update test configs
2026-05-04 18:26:41 +02:00
Fabiano Fidêncio
49396b7991 kata-deploy: add HTTP health probes (healthz/readyz)
The kata-deploy DaemonSet pod had no Kubernetes health probes, so the
kubelet could not distinguish between "still installing" and "crashed",
and rolling updates would proceed to the next node before install
actually finished.

Add a lightweight HTTP health server (built on raw tokio TcpListener,
no new crate dependencies) that starts immediately in the install path:

  /healthz — liveness: returns 200 as soon as the server binds
  /readyz  — readiness: returns 503 while installing, 200 after
             install completes (artifacts extracted, CRI restarted,
             node labeled)
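
The probe semantics listed above boil down to a small routing decision. The following is a sketch of that decision only, not the server in the commit: the names `status_for` and `render_response` are hypothetical, and the 404 for unknown paths is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Map a request path to an HTTP status code. `ready` is flipped to
/// true by the install path once install has completed.
fn status_for(path: &str, ready: &AtomicBool) -> u16 {
    match path {
        // Liveness: the server is bound, so the process is alive.
        "/healthz" => 200,
        // Readiness: 200 only after install has finished.
        "/readyz" if ready.load(Ordering::Acquire) => 200,
        "/readyz" => 503,
        // Assumption: anything else is treated as not found.
        _ => 404,
    }
}

/// Render a minimal, body-less HTTP/1.1 response for a status code.
fn render_response(status: u16) -> String {
    let reason = match status {
        200 => "OK",
        503 => "Service Unavailable",
        _ => "Not Found",
    };
    format!(
        "HTTP/1.1 {} {}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
        status, reason
    )
}
```

A server loop would read the request line from each accepted connection, pass the path to `status_for`, and write the rendered response back.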

Wire the Helm chart with startup, liveness, and readiness probes
(all individually toggleable). The startup probe allows up to 10
minutes for install to complete before the liveness probe takes over.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
Fabiano Fidêncio
5540f50198 Merge pull request #12972 from stevenhorsman/release/3.30.0
release: Bump version to 3.30.0
2026-05-02 20:54:54 +02:00
Steve Horsman
fd2b85f8ad Merge pull request #12969 from burgerdev/require-codegen
gatekeeper: require codegen
2026-05-02 18:38:53 +01:00
stevenhorsman
a1a6a9a150 release: Bump version to 3.30.0
Bump VERSION and helm-charts versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-02 17:57:39 +01:00
Markus Rudy
22598a34b2 gatekeeper: require codegen
The codegen check ensures that generated files are up-to-date and
correspond to the tool versions used in CI. Requiring this check
prevents us from accidentally merging, e.g., proto changes without the
corresponding Rust/Go updates.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-05-02 12:28:58 +02:00
Saul Paredes
cbb06545f7 kata-deploy: configure_mariner: also apply test config to runtime-rs
Apply same test configs we use in runtime-go config to runtime-rs config.

These are:
- runtime.static_sandbox_resource_mgmt = true
- hypervisor.clh.valid_hypervisor_paths includes cloud-hypervisor-glibc
- hypervisor.clh.path = cloud-hypervisor-glibc

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-05-01 08:15:52 -07:00
Saul Paredes
564d381b79 kata-deploy: configure_mariner: correctly set static_sandbox_resource_mgmt
static_sandbox_resource_mgmt is under the runtime config, not the hypervisor one.

See
31f7438ecd/src/runtime/config/configuration-clh.toml.in (L439)

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2026-05-01 08:15:52 -07:00
Zvonko Kaiser
803531dd9c kernel: Bump Kernel Version
"Copy Fail" (CVE-2026-31431) is a high-severity local privilege escalation (LPE)
vulnerability found in the Linux kernel in April 2026. It affects all major
Linux distributions released since 2017, including those using Long Term Support (LTS) kernels.
The bug allows an unprivileged user to gain root access, escape containers,
and modify the in-memory page cache reliably using a tiny 732-byte script.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-05-01 14:21:49 +00:00
Fabiano Fidêncio
96b68e77a7 kata-deploy: support containerd config schema version 4 and newer
Containerd 2.3.0 introduces config schema version 4 (see upstream
RELEASES.md and the version-4 server-plugin documentation). The default file
still uses the same split-CRI layout as version 3 (plugins under
io.containerd.cri.v1.runtime and io.containerd.cri.v1.images). Schema v4
mainly moves gRPC, TTRPC, debug, and metrics listener settings under
io.containerd.server.v1.*; kata-deploy does not edit those server tables except
for containerd log verbosity when DEBUG=true.
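
As a sketch of the version handling this implies, the function below picks the CRI runtime plugin table from the config schema version. It is a hypothetical illustration, not kata-deploy's code. The version 2 plugin ID is not mentioned in this commit and is included from general containerd knowledge. Treating every version from 3 upwards as the split-CRI layout is an assumption drawn from the commit title.

```rust
/// Return the plugin table that holds CRI runtime settings for a given
/// containerd config schema version, or None if the version is not handled.
fn cri_runtime_plugin_id(config_version: u32) -> Option<&'static str> {
    match config_version {
        // Unified CRI plugin (assumption: not stated in this commit).
        2 => Some("io.containerd.grpc.v1.cri"),
        // Versions 3 and 4 share the split-CRI layout; newer versions
        // are assumed to keep it.
        v if v >= 3 => Some("io.containerd.cri.v1.runtime"),
        _ => None,
    }
}
```

Keeping the lookup in one place means a future schema bump that leaves the CRI layout alone needs no code change.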

Fixes: #12936

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-30 16:23:43 +02:00
stevenhorsman
b61b3d2f20 kata-deploy: Update default tool binary location
Now that all of the tools except agent-ctl (still WIP) are
in the root workspace, switch the default to that and add
an exception for agent-ctl, as it's the odd one out.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-30 08:46:22 +01:00
Fabiano Fidêncio
1e6c54cbcf Merge pull request #12856 from harshitgupta1337/cbl-mariner-config-return-0
rootfs: Suppress condition check failure errors in cbl-mariner/config.sh
2026-04-30 08:35:06 +02:00
Zvonko Kaiser
35dfb11fe4 build: replace prebuilt-components sed hack with DEPS=
Mutating the Makefile in-place to strip prereqs was fragile and
limited to one target per invocation. DEPS= skips deps declaratively
and propagates through recursive make, so multi-target builds can
opt out in one shot.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-04-30 00:48:46 +00:00
Zvonko Kaiser
54c514e249 build: allow overriding rootfs/boot tarball prereqs via DEPS
Skipping prereq rebuilds is useful when artifacts are already staged
from a prior run (CI splitting work across jobs, local iteration).

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-04-29 23:59:05 +00:00
Harshit Gupta
3b796c6579 rootfs: mariner: suppress condition check failure errors
Avoid returning failure from sourced scripts when a condition check evaluates
to false.

Signed-off-by: Harshit Gupta <guptaharshit@microsoft.com>
2026-04-29 14:11:32 -04:00
stevenhorsman
9cae783f14 kata-deploy: fix binary location for trace-forwarder
Moving the trace-forwarder into the root workspace moves the target
directory, so update this target.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-29 13:27:09 +01:00
Fabiano Fidêncio
1a22c3adec Merge pull request #12942 from stevenhorsman/fix-cri-containerd-test-names
ci: Fix cri-containerd-test names
2026-04-29 09:56:43 +02:00
Steve Horsman
2435970fe8 Merge pull request #12933 from fidencio/topic/runtime-rs-decouple-dragonball-from-non-x86-checks
runtime-rs: drop misleading unsupported arches gating
2026-04-28 18:36:16 +01:00
stevenhorsman
4d4dee3af2 ci: Fix cri-containerd-test names
During the zizmor refactoring I changed the names of two jobs
to make all the architectures match. I forgot to update required_tests
and, as this was a workflow-only change, the PR didn't check this, so update
them now.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-28 18:30:53 +01:00
Aurélien Bombo
dc0f1795de kata-deploy: remove useless unit tests
These essentially just test format!(), which is not our job.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-28 10:58:01 -05:00
Aurélien Bombo
cf6a91a104 runtime-rs/config: rename cloud-hypervisor to clh
This aligns with the previous commit and runtime-go.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-28 10:58:01 -05:00
Aurélien Bombo
e4fbddb91a ci: rename cloud-hypervisor to clh-runtime-rs
This aligns with qemu-runtime-rs and makes more sense.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-28 10:58:01 -05:00
Fabiano Fidêncio
8ab97a60f3 ci: install protobuf-compiler for runtime-rs build-checks
The `runtime-rs` component of `build-checks.yaml` declared `rust`
as its only dependency, but the runtime-rs build pulls in
`prost-build v0.8.0` (via `ttrpc-codegen` -> `containerd-shim-protos`,
and via the in-tree `hypervisor` crate), and `prost-build`'s build
script needs a `protoc` binary at compile time.

This worked on x86_64 and aarch64 only because `prost-build v0.8.0`
ships bundled `protoc` binaries for those targets. On s390x (and
ppc64le, when the matrix gets there) there is no bundled binary,
so the build fails with:

  Failed to find the protoc binary. The PROTOC environment variable
  is not set, there is no bundled protoc for this platform, and
  protoc is not in the PATH

The reason this didn't show up in CI before is that `make test`
and `make check` for runtime-rs were wrapped in arch-specific
`ifeq` blocks in `src/runtime-rs/Makefile` that turned them into
no-ops on s390x/ppc64le/riscv64gc. The previous commit dropped
those gates so `make {test,check}` now actually run on every arch,
which exposes this latent CI gap.

Match what `agent`, `libs`, `agent-ctl`, `kata-ctl` and `genpolicy`
already declare and add `protobuf-compiler` to runtime-rs's needs.
The existing `Install protobuf-compiler` step in this workflow
already runs `sudo apt-get -y install protobuf-compiler`, which
the s390x/ppc64le runners support (those other components have
been using it on s390x for some time).

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-28 16:25:31 +02:00
Fabiano Fidêncio
a5e1521727 kernel: bake in Mellanox MLX5 Ethernet support
The MLX5 Ethernet driver is useful well beyond the DPU/SmartNIC use case
(any guest sitting on top of a Mellanox/ConnectX NIC benefits from it),
yet the existing config fragment lived under dpu/ and was only pulled in
when the kernel was built with `-D nvidia`.

Promote it to a first-class common fragment so every Kata kernel gets
MLX5 Ethernet built in.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-28 11:02:39 +02:00
Steve Horsman
d5785b4eba Merge pull request #12872 from stevenhorsman/bump-rust-to-1.93
Bump rust to 1.93
2026-04-27 09:01:00 +01:00
Fabiano Fidêncio
28d9043d4c build: Add driver version to artefact cache
Add the nvidia driver version to the artefact cache keys so that
a driver bump triggers image and initrd rebuilds.

Also rename the helper functions to follow a consistent
get_latest_nvidia_* naming convention.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-25 19:28:31 +02:00
stevenhorsman
9d2bb4518f kata-deploy: Update MSRV to match workspace
Update the kata-deploy Cargo.toml to use the
workspace wide MSRV, so it's easy to track and bump
as and when necessary.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-25 11:27:39 +01:00
Aurélien Bombo
15296fc9fe Merge pull request #12374 from microsoft/cameronbaird/add-cifs
kernel: add required configs for CIFS support
2026-04-24 10:42:09 -05:00
Fabiano Fidêncio
877f6b2129 tools: Fix shellcheck issues in common.bash
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-24 08:14:08 +02:00
Fabiano Fidêncio
bc3a273f84 tools: Fix shellcheck issues in containerd-shim-katadbg-v2
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-24 08:14:08 +02:00