Commit Graph

2365 Commits

Author SHA1 Message Date
Pradipta Banerjee
1487eaaaa2 kernel: Enable landlock LSM
Allows using landlock LSM for the container process

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2026-05-27 13:33:46 +02:00
Fabiano Fidêncio
25491fc20c Merge pull request #13104 from kata-containers/topic/kata-deploy-build-as-an-artefact
kata-deploy: prebuild payload-specific component artifacts
2026-05-25 22:56:55 +02:00
Fabiano Fidêncio
c65d64873b kata-deploy: prebuild payload-specific component artifacts
Build and publish the kata-deploy binary and CoCo guest-pull nydus
snapshotter as dedicated per-arch artifacts, then consume those tarballs
when assembling the kata-deploy image.

This avoids rebuilding those components in the payload image path
(where they would be built serially) and reduces overall CI build time.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-25 22:13:41 +02:00
Fabiano Fidêncio
3dc02a8604 Merge pull request #13085 from Apokleos/erofs-gpt-vmdk-only
runtime-rs: Support erofs snapshotter with gpt vmdk mode
2026-05-25 16:29:59 +02:00
Zvonko Kaiser
6c6c5809f1 Merge pull request #13109 from fidencio/topic/build-validate-measured-rootfs-root-hashes-for-all-shims
build: Validate measured-rootfs root hashes all shims
2026-05-25 15:58:35 +02:00
Zvonko Kaiser
aeadb1af35 Merge pull request #12948 from fidencio/topic/numa
runtime (go): agent: Add NUMA support for QEMU
2026-05-25 15:33:14 +02:00
Alex Lyn
a359d13476 build: Validate measured-rootfs root hashes all shims
The cached shim-v2 tarballs ship per-variant `root_hash_*.txt` files
embedded in the matching measured-rootfs image. Until now only
shim-v2-rust validated those hashes against the freshly built rootfs
images on a cache hit; shim-v2-go reused whatever was cached without
checking, even though its bundled configuration files contain the
`KERNELVERITYPARAMS_*` values baked in at build time.

When a PR changes the agent (and therefore the rootfs image and its
dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache
key stays the same and the stale tarball is reused. The resulting
guest cmdline carries a verity hash that no longer matches the new
rootfs image, so the VM panics very early in boot:

    device-mapper: verity: 254:1: metadata block 0 is corrupted
    erofs (device dm-0): cannot read erofs superblock
    Kernel panic - not syncing: VFS: Unable to mount root fs ...

Generalize the shim-v2-rust cache validation so it also runs for
shim-v2-go, push the per-variant root-hash sidecar files for both
shims, and fall back to a full rebuild whenever the cached hash is
missing or differs from the image one.
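
The check described above can be pictured with a small shell sketch. It
is illustrative only: the function name `cached_root_hash_is_valid` and
its two arguments are hypothetical, and the real logic lives in the
packaging scripts.

```bash
# Hypothetical helper, not the project's actual function.
# Exit status 0: the cached hash file exists and matches the fresh one.
# Non-zero: a file is missing or the contents differ, so rebuild.
cached_root_hash_is_valid() {
    local cached_file="$1"   # root_hash_*.txt taken from the cached tarball
    local image_file="$2"    # root_hash_*.txt produced with the new image

    # A missing sidecar on either side cannot prove the cache is fresh.
    [ -f "${cached_file}" ] || return 1
    [ -f "${image_file}" ] || return 1

    # cmp -s prints nothing and exits non-zero on any difference.
    cmp -s "${cached_file}" "${image_file}"
}
```

A caller would reuse the cached tarball only when the function succeeds
and fall back to a full rebuild otherwise.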

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:12:52 +08:00
Alex Lyn
fd139a1143 kata-deploy: Reset max_unmerged_layers to "0" within erofs snapshotter
Set max_unmerged_layers = 0 for the erofs snapshotter in gpt-vmdk
mode.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-25 19:08:31 +08:00
Fabiano Fidêncio
72be31c384 build: Validate measured-rootfs root hashes all shims
The cached shim-v2 tarballs ship per-variant `root_hash_*.txt` files
embedded in the matching measured-rootfs image. Until now only
shim-v2-rust validated those hashes against the freshly built rootfs
images on a cache hit; shim-v2-go reused whatever was cached without
checking, even though its bundled configuration files contain the
`KERNELVERITYPARAMS_*` values baked in at build time.

When a PR changes the agent (and therefore the rootfs image and its
dm-verity hash) but does not touch `src/runtime`, the shim-v2-go cache
key stays the same and the stale tarball is reused. The resulting
guest cmdline carries a verity hash that no longer matches the new
rootfs image, so the VM panics very early in boot:

    device-mapper: verity: 254:1: metadata block 0 is corrupted
    erofs (device dm-0): cannot read erofs superblock
    Kernel panic - not syncing: VFS: Unable to mount root fs ...

Generalize the shim-v2-rust cache validation so it also runs for
shim-v2-go, push the per-variant root-hash sidecar files for both
shims, and fall back to a full rebuild whenever the cached hash is
missing or differs from the image one.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-25 11:04:08 +02:00
Fabiano Fidêncio
7ddea26137 Merge pull request #13086 from fvichot/flo-kata-monitor-fix
kata-monitor: use full URI for connecting to containerd
2026-05-25 10:16:11 +02:00
Fabiano Fidêncio
407a6946f2 Merge pull request #13077 from hdp617/fix-kata-deploy-build
packaging: fix parallel kernel build race and kata-deploy script bugs
2026-05-25 09:53:38 +02:00
Fabiano Fidêncio
8d2ecaabb5 versions: Bump QEMU to v11.0.0
For more details see QEMU's release notes:
https://www.qemu.org/2026/04/22/qemu-11-0-0/

GPU experimental variants are also using v11.0.0 plus one patch to solve
issues related to NUMA mapping.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-24 22:00:46 +02:00
Florian Vichot
554e8f91b1 kata-monitor: use full URI for connecting to containerd
Without the protocol in the URI, grpc-go defaults to the DNS resolver,
which results in an error for unix sockets (`name resolver error: produced
zero addresses`).

We also remove the `getAddressAndDialer(...)` and `dial(...)` functions, which
are no longer necessary because grpc-go supports connecting to unix sockets
directly. This also removes the matching tests.

This also adds a `Makefile` and tweaks the Dockerfile to simplify building
the Docker image.

Fixes #12398

Signed-off-by: Florian Vichot <florian.vichot@gmail.com>
2026-05-23 16:47:46 +02:00
Huy Pham
3ec444a7df kernel: bump config version
Bump the Kata Containers kernel configuration version to 195.

Signed-off-by: Huy Pham <huypham@google.com>
2026-05-22 12:26:53 -07:00
Huy Pham
c490373a78 kata-deploy: packaging: fix absolute path resolution in merge script
The `kata-deploy-merge-builds.sh` script blindly prepended `PWD` to the
`kata_versions_yaml_file` argument, assuming it was always a relative
path. However, the `Makefile` passes an absolute path using `$(MK_DIR)`.
This resulted in invalid double-concatenated paths like
`/workspace/...//workspace/...` which failed to copy.

Fix this by using `readlink -f` to safely resolve the path. This
correctly handles both relative and absolute paths, preventing path
corruption.
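
A minimal sketch of the difference, assuming GNU coreutils `readlink`.
The function names `resolve_naive` and `resolve_path` are hypothetical
and only contrast the two approaches; they are not taken from
`kata-deploy-merge-builds.sh`.

```bash
# Old behaviour: always prepend PWD, which breaks for absolute inputs.
resolve_naive() {
    printf '%s/%s\n' "${PWD}" "$1"
}

# New behaviour: readlink -f canonicalizes relative and absolute paths.
resolve_path() {
    readlink -f -- "$1"
}
```

Given an absolute argument, `resolve_naive` produces the doubled path
described above, while `resolve_path` returns the argument unchanged
(apart from symlink resolution).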

Signed-off-by: Huy Pham <huypham@google.com>
2026-05-22 12:05:56 -07:00
Fabiano Fidêncio
5d3e1e6396 kata-deploy: verify kata-runtime label remains stable on rke2/k3s
The retry loop added in efd468df3f still allows the install to declare
success while inside the kubelet's post-restart re-register window.

On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd
and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]`
every 2 s and returns on the first `True` observation it sees. By default
the kubelet only publishes node status every ~10 s, so that first `True`
is almost always the stale value from before the restart — the kubelet
hasn't actually finished restarting yet. `label_node_with_retry` then
applies the label, sleeps 1 s, reads back "true" (still stale, kubelet
still down), and returns Ok. Install completes, `/readyz` flips to 200,
helm releases its `--wait`, and the bats test starts — and only then
does the kubelet finish coming up, re-register the node, and clobber
the label with its cached set. The lifecycle test sees an empty
`katacontainers.io/kata-runtime` and fails:

  # Node label katacontainers.io/kata-runtime:
  not ok 1 Kata artifacts are present on host after install

A single-shot verification can't distinguish "still stale true" from
"truly stable true after kubelet re-register". Replace it with a
stability window: after (re)applying the label, require it to remain
at the expected value for STABILITY_CHECKS=6 consecutive observations
spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the
kubelet's status-update period). If the value ever drifts inside the
window, re-apply and restart the stability counter. Bounded by
MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s
to install.
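
The stability window can be sketched in shell as follows. This is an
illustration under stated assumptions, not the installer's code:
`apply_label` and `read_label` are hypothetical callbacks that the
caller must define, and only the three tunables are named in this
message.

```bash
# Succeeds once the label has held its expected value for
# STABILITY_CHECKS consecutive reads; fails after MAX_APPLY_ATTEMPTS.
wait_for_stable_label() {
    local expected="$1"
    local stability_checks="${STABILITY_CHECKS:-6}"
    local check_interval="${CHECK_INTERVAL:-2}"
    local max_apply_attempts="${MAX_APPLY_ATTEMPTS:-12}"
    local attempt=0
    local stable=0

    while [ "${attempt}" -lt "${max_apply_attempts}" ]; do
        attempt=$((attempt + 1))
        apply_label "${expected}"      # (re)apply, then restart the counter
        stable=0
        while [ "${stable}" -lt "${stability_checks}" ]; do
            sleep "${check_interval}"
            # Any drift inside the window ends this attempt early.
            [ "$(read_label)" = "${expected}" ] || break
            stable=$((stable + 1))
        done
        if [ "${stable}" -ge "${stability_checks}" ]; then
            return 0
        fi
    done
    return 1
}
```

With the defaults, the happy path takes 6 x 2 s = 12 s, and the worst
case is bounded by 12 attempts of at most 12 s each.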

Also add a short polling loop to the test's own label assertion as
belt-and-suspenders for any leftover transient race, matching the
existing retry pattern used for the container-runtime version check.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-22 11:53:18 +02:00
Huy Pham
ee4f756b75 kata-deploy: packaging: fix buggy return statements in cache check
The `install_cached_tarball_component` function in the binaries
packaging script contained syntax errors where it attempted to capture
the empty stdout of the `cleanup_and_fail` function inside a return
statement (e.g., `return "$(cleanup_and_fail ...)"`).

Since `cleanup_and_fail` only returns an exit status and produces no
stdout, this evaluated to `return ""`, which is invalid in bash and
causes the script to crash with `numeric argument required` instead of
returning the failure status.

Fix this by replacing the buggy inline returns with proper `if` blocks
that call `cleanup_and_fail` and explicitly return `1`.
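
A reduced sketch of the pattern, for illustration only. The
`cleanup_and_fail` body below is a stand-in, and `check_cached_tarball`
is a hypothetical name rather than the function touched by this commit.

```bash
# Stand-in for the real cleanup_and_fail: it cleans up, writes nothing
# to stdout, and reports failure only through its exit status.
cleanup_and_fail() {
    rm -f -- "$1"
    return 1
}

# Buggy pattern, kept as a comment because the command substitution
# captures empty stdout and the shell ends up running `return ""`:
#   [ -s "${tarball}" ] || return "$(cleanup_and_fail "${tarball}")"

# Fixed pattern: call the helper, then return an explicit numeric status.
check_cached_tarball() {
    local tarball="$1"
    if [ ! -s "${tarball}" ]; then
        cleanup_and_fail "${tarball}" || return 1
    fi
    return 0
}
```

The `|| return 1` form keeps the explicit status while remaining safe
if the caller runs under `set -e`.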

Signed-off-by: Huy Pham <huypham@google.com>
2026-05-21 09:21:05 -07:00
Huy Pham
9ddcc53f6f kernel: build: resolve race condition in parallel config generation
During parallel builds of different kernel variants (e.g., generic,
debug, nvidia-gpu), the config generation script wrote to a shared
static path: `tools/packaging/kernel/configs/fragments/x86_64/.config`.

This caused critical race conditions where concurrent processes would
overwrite or delete the `.config` file while another process was reading
it, leading to sporadic build failures with "No such file or directory"
errors.

Resolve this by changing the temporary configuration path to be
build-specific, writing it inside the unique kernel build directory
(e.g., `kata-linux-.../.config.generated`). The final config is still
copied to `.config` in the kernel source tree as before, but the
intermediate merge process is now isolated.
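
The idea can be sketched as below. `merge_fragments` is a stand-in for
the real kconfig merge step and `generate_kernel_config` is a
hypothetical name; only the `.config.generated` and `.config` file
names come from this change.

```bash
# Stand-in for the real merge step: concatenate the given fragments.
merge_fragments() {
    cat "$@"
}

generate_kernel_config() {
    local build_dir="$1"
    shift
    # Build-specific intermediate file: no other variant writes here.
    local tmp_config="${build_dir}/.config.generated"

    merge_fragments "$@" > "${tmp_config}" || return 1
    # The final config still lands in .config, as before.
    cp "${tmp_config}" "${build_dir}/.config"
}
```

Because each variant passes its own build directory, two invocations
running in parallel never share an intermediate file.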

Signed-off-by: Huy Pham <huypham@google.com>
2026-05-21 09:19:45 -07:00
Fabiano Fidêncio
7536f2c616 Merge pull request #13055 from kata-containers/topic/kata-deploy-only-install-what-will-be-used
kata-deploy: only install what will actually be used
2026-05-21 17:53:09 +02:00
Fabiano Fidêncio
efd468df3f kata-deploy: retry node labeling after CRI restart
On rke2/k3s a CRI restart also restarts the kubelet, which may briefly
re-register the node with its cached label set and clobber the
kata-runtime label that was just applied via the API.

Replace the single label_node call with a retry loop that verifies the
label value after setting it. If the label is missing or has the wrong
value, it is re-applied (up to 10 attempts with 2 s back-off). This
fixes a race condition that became more visible after the switch to
individual tarball extraction, which made install take slightly longer
and shifted the kubelet re-registration timing window.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
Fabiano Fidêncio
291e4d37be kata-deploy: implement selective tarball extraction in installer
Add zstd and tar as Rust dependencies and rewrite the artifact
installation logic to extract only the component tarballs required by
the enabled runtime classes.

extract_component_tarballs reads shim-components.json to determine which
kata-static-<name>.tar.zst files are needed for the selected shims and
current architecture.  Shared components (e.g. kernel, shim-v2-go) are
listed by multiple shims and must only be unpacked once per install run.
Deduplication is handled with an in-memory set passed through the call,
avoiding any risk of stale on-disk state surviving across pod restarts.
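
The deduplication idea, shown as a shell sketch rather than the Rust
used by the installer. `unique_components` is a hypothetical name; it
reads component names on stdin, one per line, as they would be listed
for each selected shim.

```bash
# Print each component the first time it is seen. The awk array `seen`
# is the in-memory set; nothing is written to disk.
unique_components() {
    awk 'NF && !seen[$0]++'
}
```

Feeding it the concatenated component lists of several shims yields
each tarball name once, in first-seen order.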

Within each tarball, opt/kata path prefixes are stripped and absolute
symlink / hard-link targets are rewritten to point at the resolved
installation directory, correctly handling MULTI_INSTALL_SUFFIX.
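
A shell sketch of the two path rewrites, under the assumption that
only absolute link targets below `/opt/kata` need rewriting. Both
function names are hypothetical and the installer itself is written in
Rust.

```bash
# Strip a leading "./" and the opt/kata/ prefix from a tarball entry.
strip_kata_prefix() {
    local entry="${1#./}"
    printf '%s\n' "${entry#opt/kata/}"
}

# Point absolute link targets under /opt/kata at the real install dir;
# leave relative or unrelated targets untouched.
rewrite_link_target() {
    local target="$1"
    local install_dir="$2"
    case "${target}" in
        /opt/kata/*) printf '%s/%s\n' "${install_dir}" "${target#/opt/kata/}" ;;
        /opt/kata)   printf '%s\n' "${install_dir}" ;;
        *)           printf '%s\n' "${target}" ;;
    esac
}
```

With an install directory that carries a suffix, a target such as
`/opt/kata/bin/x` is rewritten to live under that suffixed directory.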

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
Fabiano Fidêncio
9a0acc6c4c kata-deploy: ship individual component tarballs; drop merged tarball
Update the Dockerfile to copy each kata-static-<name>.tar.zst directly
into the image alongside shim-components.json, replacing the old
artifact-extractor stage that unpacked a single merged tarball.

Update the publish-kata-deploy-payload and release CI workflows to
download individual per-component artifacts instead of waiting for a
merged tarball, and simplify kata-deploy-build-and-upload-payload.sh
accordingly.  The kata-deploy image build is no longer blocked on the
merge step.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
Fabiano Fidêncio
87e55be4a3 kata-deploy: add shim-components.json component manifest
Introduces the human-maintained shim-components.json that maps each
runtime class to the list of kata-static-<name>.tar.zst component
tarballs it needs per architecture. This is the source of truth read
by the installer at deploy time to decide which tarballs to extract.

Key design choices encoded here:
- shim-v2-go vs shim-v2-rust: explicit per-shim, so a node running
  only Rust shims never extracts the Go shim binary.
- virtiofsd and nydus are both listed for hypervisors that support
  configurable shared_fs (we cannot know which the user will choose).
- fc/firecracker: no virtiofsd or nydus (devmapper only).
- remote: only the shim binary (no local hypervisor artifacts).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
Fabiano Fidêncio
c87e327876 kata-deploy: split shim-v2 into shim-v2-go and shim-v2-rust
Split the monolithic shim-v2 build target into separate shim-v2-go and
shim-v2-rust targets in kata-deploy-binaries.sh, the local-build
Makefile, and the four architecture CI workflows.

The Go and Rust shims now each produce their own kata-static-<name>.tar.zst
artifact, allowing downstream consumers to select only the shim variant
they need.  MEASURED_ROOTFS is set per-arch for the Rust job in CI.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-20 20:52:36 +02:00
stevenhorsman
3f27052184 kata-deploy: always add HEAD commit SHA tag to all builds
Previously, the commit SHA tag was only added for specific components
(agent, agent-ctl) by setting artefact_tag in individual install
functions. This was inconsistent and error-prone.

Now, the HEAD commit SHA is always added as a tag for all builds in
the central tagging logic. This ensures:
- All components get tagged with the commit SHA
- The correct HEAD commit is used (not the last commit that modified
  a specific path)
- Simpler, more maintainable code

The git command uses `git -C` to change to the repo directory before
running git log, which correctly returns the HEAD commit SHA regardless
of which files were modified in recent commits.
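
For illustration, the two lookups differ as sketched below. Both
function names are hypothetical; the `git log` options used are
standard ones and are not necessarily the exact invocation in the
script.

```bash
# HEAD commit of the repository at $1, independent of the caller's cwd.
head_commit_sha() {
    git -C "$1" log -1 --pretty=format:%H
}

# Last commit that touched path $2 (the per-path lookup that the
# message above contrasts with HEAD).
last_commit_for_path() {
    git -C "$1" log -1 --pretty=format:%H -- "$2"
}
```

The two values differ whenever the most recent commit did not modify
the given path.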

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-20 17:42:09 +01:00
stevenhorsman
76fc847c78 release: correct .cargo/config.toml reference in generate_vendor.sh
The script was creating .cargo/config.toml but referencing .cargo/config
in the vendor_dir_list, causing tar to fail with 'Cannot stat' error.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-05-19 18:23:53 +01:00
stevenhorsman
a4cfe32157 release: Bump version to 3.31.0
Bump VERSION and helm-charts versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-19 15:32:50 +02:00
Alex Lyn
bbef0a755c Merge pull request #13005 from stevenhorsman/remove-osbuilder-tests
osbuilder: Remove tests
2026-05-19 11:58:27 +08:00
Fabiano Fidêncio
7c971f0c4c Merge pull request #13069 from fidencio/topic/kata-deploy-prevent-eviction
helm-chart: add priorityClassName to prevent kata-deploy eviction
2026-05-18 21:08:45 +02:00
Fabiano Fidêncio
5d40ba66ff helm-chart: add priorityClassName to prevent kata-deploy eviction
kata-deploy is a per-node infrastructure DaemonSet; if it gets evicted
under node memory/CPU pressure the node loses its Kata runtime until
the pod is rescheduled. Default to system-node-critical so the kubelet
evicts lower-priority workloads first.

The value is configurable via `priorityClassName` in values.yaml.

Fixes: #13068

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-18 15:14:06 +02:00
stevenhorsman
e3a00a2ec2 kata-deploy: fix binary location for agent-ctl
Moving agent-ctl into the root workspace moves the target
directory, so update this target to be in root, not src/tools

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-18 09:47:15 +01:00
stevenhorsman
2c1aaa8ae7 osbuilder: Remove tests
The tests haven't been run at least since we moved to GHA,
so in the spirit of lean and mean, let's clear them up

Fixes: #10957
Assisted-by: IBM Bob

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-18 09:46:42 +01:00
Manuel Huber
275a63b266 Revert "gatekeeper: Unrequire NVIDIA GPU test"
This reverts commit edfb6f5716.

The NVIDIA non-TEE CI job has passed again over the last 5 nightly
runs after merging PRs #13007 and #13020.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-15 15:20:12 -07:00
Steve Horsman
aade0f5fbe Merge pull request #12854 from kata-containers/dependabot/go_modules/tools/testing/kata-webhook/github.com/sirupsen/logrus-1.9.4
build(deps): bump github.com/sirupsen/logrus from 1.9.3 to 1.9.4 in /tools/testing/kata-webhook
2026-05-14 13:55:44 +01:00
Manuel Huber
ed4233bf91 rootfs: cdh: Update CDH to new version
Update CDH to a newer version and:
- adjust the NVIDIA root filesystem build to reflect the change from
  using libcryptsetup to using the cryptsetup binary.
- adjust image-pull test cases to conduct parallel write operations
  on the /dev/trusted_store backed guest image pull location since
  issue #12721 has been solved on CDH side.

Fixes #12721

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-13 20:20:45 +02:00
dependabot[bot]
18a13773da build(deps): bump github.com/sirupsen/logrus
Bumps [github.com/sirupsen/logrus](https://github.com/sirupsen/logrus) from 1.9.3 to 1.9.4.
- [Release notes](https://github.com/sirupsen/logrus/releases)
- [Changelog](https://github.com/sirupsen/logrus/blob/master/CHANGELOG.md)
- [Commits](https://github.com/sirupsen/logrus/compare/v1.9.3...v1.9.4)

---
updated-dependencies:
- dependency-name: github.com/sirupsen/logrus
  dependency-version: 1.9.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-13 06:11:16 +00:00
Greg Kurz
d2dc0a923c Merge pull request #13030 from stevenhorsman/go-1.25.10-bump
Go 1.25.10 bump
2026-05-13 08:09:51 +02:00
stevenhorsman
7cc72b933d versions: bump golang.org/x/net to v0.53.0
Bump golang.org/x/net to resolve CVE:
- GO-2026-4918

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-by: IBM Bob
2026-05-12 11:56:26 +01:00
stevenhorsman
4a65aca9cf versions: bump golang to 1.25.10
Bump the go version to resolve CVEs:
- GO-2026-4918
- GO-2026-4971
- GO-2026-4976
- GO-2026-4977
- GO-2026-4980
- GO-2026-4981
- GO-2026-4982
- GO-2026-4986

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Assisted-by: IBM Bob
2026-05-12 11:56:13 +01:00
Fabiano Fidêncio
93e02944fa image-builder/nvidia: skip DAX header for virtio-blk-pci images
The DAX header (2 MiB of NVDIMM metadata + a duplicate MBR) is
unconditionally prepended to every image by set_dax_header(). NVIDIA
images use virtio-blk-pci with disable_image_nvdimm=true, so the
kernel reads MBR #1 directly and never touches the DAX metadata --
it is dead weight.

Add a SKIP_DAX_HEADER environment variable (default "no") that, when
set to "yes", skips the DAX header entirely:
- Removes the 2 MiB DAX overhead from image size calculations in
  both the erofs and ext4 paths
- Skips the set_dax_header() call, avoiding compilation and
  execution of the nsdax tool
- Passes the variable through to containerised builds

Enable SKIP_DAX_HEADER=yes for both install_image_nvidia_gpu() and
install_image_nvidia_gpu_confidential() in the build pipeline. All
other image builds are unaffected (default remains "no").
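
A simplified sketch of how the switch affects the size calculation.
`image_size_mib` is a hypothetical helper that ignores alignment and
filesystem overhead; only the variable name, its default, and the
2 MiB figure come from this change.

```bash
# Image size in MiB for a rootfs of $1 MiB, with or without DAX header.
image_size_mib() {
    local rootfs_mib="$1"
    local dax_header_mib=2

    if [ "${SKIP_DAX_HEADER:-no}" = "yes" ]; then
        dax_header_mib=0
    fi
    echo $((rootfs_mib + dax_header_mib))
}
```

With `SKIP_DAX_HEADER` unset or set to "no", the result includes the
2 MiB header; with "yes" it equals the rootfs size.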

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 17:18:05 +02:00
Fabiano Fidêncio
b72bb7243e image-builder: bump base image from Fedora 42 to 44
Fedora 42 reaches end-of-life in May 2026. Move the image-builder
container to Fedora 44, which is the current stable release.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 17:18:05 +02:00
Fabiano Fidêncio
6b802a4e30 nvidia: switch GPU rootfs images to erofs
Switch the NVIDIA GPU rootfs images (both standard and confidential)
from ext4 to erofs (Enhanced Read-Only File System).

Unlike ext4, which is a read-write filesystem mounted read-only by
convention, erofs is structurally read-only -- no journal, no write
metadata, no superblock write path. This eliminates accidental
mutation and reduces the attack surface inside the guest VM, which
is particularly important for confidential workloads using dm-verity.

Introduce a DEFROOTFSTYPE_NV Makefile variable (set to erofs) for
both Go and Rust runtimes, keeping the global DEFROOTFSTYPE as ext4
so non-NVIDIA configurations are unaffected.

Update all six NVIDIA GPU configuration templates (base, SNP, TDX
for both runtimes) to use @DEFROOTFSTYPE_NV@ instead of the global
@DEFROOTFSTYPE@.

Export FS_TYPE=erofs in install_image_nvidia_gpu() and
install_image_nvidia_gpu_confidential() so the build pipeline
produces erofs images via the image builder.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 17:18:05 +02:00
Fabiano Fidêncio
bfcd249f40 image-builder: add erofs dm-verity support and lz4hc compression
Add full dm-verity and measured rootfs support to
create_erofs_rootfs_image(), bringing it to parity with the ext4 path.

Unlike ext4, which is a read-write filesystem mounted read-only by
convention, erofs is structurally read-only -- no journal, no write
metadata, no superblock write path.

This is a natural fit for dm-verity: erofs never attempts writes, so
verity never has to reject anything. With ext4, the kernel must skip
journal replay on verity-protected devices, which is a fragile
assumption.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 17:18:05 +02:00
Fabiano Fidêncio
d2e0555cf0 image-builder: refactor dm-verity setup into shared functions
Extract build_kernel_verity_params() and setup_verity() from the
inline block inside create_rootfs_image() into top-level functions.

This is a pure refactoring with no behavior change. The verity logic
is moved verbatim, with the only difference being that
build_kernel_verity_params() now takes the image path as an explicit
parameter instead of capturing it from the enclosing scope.

The extracted functions will be reused by create_erofs_rootfs_image()
in a subsequent commit to add dm-verity support for erofs images.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 17:18:05 +02:00
Fabiano Fidêncio
341a0d366c kata-deploy: Fix containerd debug level path for config schema v4
Containerd 2.3 (config schema v4) uses the top-level [debug] table
for log level configuration, not plugins."io.containerd.server.v1.debug"
as was the case in the RC builds.

Update containerd_debug_level_toml_path() to use .debug.level for all
schema versions, matching the released containerd behavior.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-10 12:02:24 +02:00
stevenhorsman
87664c608d version: Bump to latest 6.18 kernel
Pick up the latest kernel that fixes CVE-2026-43284

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-08 17:15:24 +01:00
Fabiano Fidêncio
09bbc70302 Merge pull request #13002 from manuelh-dev/mahuber/unrequire-nim-svc
gatekeeper: Unrequire NVIDIA GPU test (temporary)
2026-05-08 10:02:00 +02:00
Manuel Huber
edfb6f5716 gatekeeper: Unrequire NVIDIA GPU test (temporary)
Temporarily unrequire the NVIDIA GPU test. We are experiencing
situations in which two NIM service instances get deployed almost
at the same time into the kata-containers-k8s-tests namespace
(expected current context) and into the default namespace. This
causes the NIM operator to create two deployments in the two
namespaces and to then schedule two pods at the same time. This
usually causes the NIM pod in the default namespace to fail and to
linger.
We can't explain yet why this does not happen in the TEE CI path
and why this is happening at all.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-07 14:39:24 +02:00
Fabiano Fidêncio
0f3160276b ci: k8s: skip no-op Helm uninstall on free runners
In cleanup_kata_deploy, bail out early when no kata-deploy Helm release
exists so baremetal-* pre-deploy cleanup on fresh clusters does not
block on helm uninstall --wait (up to 10m).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00
Fabiano Fidêncio
f5533950e6 kata-deploy: helm: cap container RSS via resources block
Plumb a resources block into the kata-deploy DaemonSet container in
the Helm chart so the cluster can size its memory footprint
predictably.

Defaults are sized from real /proc/<pid>/status numbers on an
unpatched 3.30.0 build running on a ~220-vCPU GPU node:

  VmRSS:    9944 kB  (~9.7 MiB)   <- actual physical memory
  RssAnon:  2628 kB  (~2.6 MiB)   <- heap + dirty stack pages
  VmData: 464668 kB  (~454 MiB)   <- tokio multi-thread workers'
                                     reserved-but-untouched stacks
  Threads: 225                    <- num_cpus()-driven worker pool

That VmData number is the source of the original "kata-deploy is
using 400 MB" reports: any monitoring layer that surfaces virtual
data size, committed memory, or memory.usage_in_bytes on a kernel
that includes mapped-but-untouched memory will happily reproduce
~400 MB even though only ~10 MiB is ever made resident. The earlier
commits in this series (current_thread tokio, mimalloc, shared kube
client, JSONPath removal, post-install re-exec) collapse VmData into
the tens of MiB and drop the post-install resident set further.
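
To reproduce this kind of measurement, the relevant fields can be
pulled out of a status file with a few lines of shell. This is a
sketch: `proc_mem_summary` is a hypothetical helper, and it defaults to
the calling process when no file is given.

```bash
# Print the resident, anonymous, and virtual data sizes (in kB) plus
# the thread count from a /proc/<pid>/status style file.
proc_mem_summary() {
    local status_file="${1:-/proc/self/status}"
    awk '$1 ~ /^(VmRSS|RssAnon|VmData|Threads):$/ { print $1, $2 }' "${status_file}"
}
```

Comparing `VmRSS` with `VmData` in the output shows how far the
resident set is from the reserved virtual data size.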

The defaults below are picked accordingly:

  requests:
    cpu: 25m            # install is mostly I/O wait; the post-install
                        # waiter is genuinely idle
    memory: 16Mi        # ~2x headroom over the unpatched VmRSS we
                        # measured, far more over the patched waiter

Operators who hit OOMKilled on unusually large or churny clusters can
override `resources` directly in their Helm values (or set it to {}
to remove all requests and inherit cluster defaults).

Fixes: https://github.com/kata-containers/kata-containers/discussions/12976

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-07 13:40:55 +02:00