Commit Graph

19404 Commits

Author SHA1 Message Date
Aurélien Bombo
e191c5b716 runtime-go/rs: Reconcile hugepage emptyDirs and disable_guest_empty_dir
This addresses an issue where the disable_guest_empty_dir=true code paths did
not take into account that hugepage-backed emptyDirs should always be recreated
in the guest (using guest hugepages).
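A minimal sketch of the placement rule described above, assuming a simplified two-variant model of the emptyDir medium. The enums and `place_empty_dir` are hypothetical names, not the runtime's types, and memory-backed (tmpfs) emptyDirs are deliberately left out. The idea is that hugepage-backed emptyDirs always land in the guest, while disk-backed ones follow `disable_guest_empty_dir`.

```rust
/// Hypothetical model of an emptyDir's backing medium (illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum EmptyDirMedium {
    /// Plain disk-backed emptyDir.
    Default,
    /// `medium: HugePages` emptyDir.
    HugePages,
}

/// Hypothetical: where the emptyDir content ends up.
#[derive(Debug, PartialEq)]
enum EmptyDirPlacement {
    /// Runtime shares the host Kubelet emptyDir folder with the guest.
    SharedFromHost,
    /// Agent creates the directory inside the guest.
    CreatedInGuest,
}

fn place_empty_dir(medium: EmptyDirMedium, disable_guest_empty_dir: bool) -> EmptyDirPlacement {
    match medium {
        // Hugepage-backed emptyDirs must be backed by guest hugepages, so they
        // are recreated in the guest whatever disable_guest_empty_dir says.
        EmptyDirMedium::HugePages => EmptyDirPlacement::CreatedInGuest,
        // true: share the host folder so the Kubelet can account for usage.
        EmptyDirMedium::Default if disable_guest_empty_dir => EmptyDirPlacement::SharedFromHost,
        // false: legacy behaviour, the agent creates the folder in the guest.
        EmptyDirMedium::Default => EmptyDirPlacement::CreatedInGuest,
    }
}
```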

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-24 15:22:13 -05:00
Aurélien Bombo
a3e91d9ed2 runtime-go/rs: Set disable_guest_empty_dir = true by default
This makes the runtime share the host Kubelet emptyDir folder with the guest
instead of the agent creating an empty folder in the container rootfs. Doing so
enables the Kubelet to track emptyDir usage and evict greedy pods.

In other words, with virtio-fs the container rootfs uses host storage whether
this option is true or false; however, with true, Kata uses the k8s emptyDir
folder, so the sizeLimit is properly enforced by k8s.

Addresses the ephemeral storage part of #12203.

History:

 * Initially, emptyDirs are slow because they are shared from the host with 9p.
   https://github.com/kata-containers/runtime/issues/1472

 * To address the above, emptyDirs are hardcoded to be created by the agent in
   the pause container's rootfs, potentially leveraging devicemapper and
   improving perf.
   https://github.com/kata-containers/runtime/pull/1485

 * The previous PR regressed an (interesting?) use case where emptyDirs were
   used to share data from the host to the guest, so the behavior was made
   configurable and `disable_guest_empty_dir = false` is introduced, defaulting
   to the behavior of the previous PR.
   https://github.com/kata-containers/kata-containers/pull/2056

 * Another resource accounting regression remains, which this PR addresses.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-24 15:21:53 -05:00
Steve Horsman
49ce886f20 Merge pull request #13242 from charludo/fix/runtime-rs-safe-path
runtime-rs: change `safe-path` dependency from crates.io to workspace
2026-06-18 11:39:19 +01:00
Charlotte Hartmann Paludo
b4be5fdcca runtime-rs: change safe-path dependency from crates.io to workspace
`safe-path` is resolved from the local workspace in all other workspace
member crates. This commit changes the dependency to a local one for
runtime-rs as well.

Signed-off-by: Charlotte Hartmann Paludo <git@charlotteharludo.com>
Co-authored-by: Markus Rudy <mr@edgeless.systems>
2026-06-18 06:32:06 +02:00
Steve Horsman
66e938e02d Merge pull request #13244 from BbolroC/use-ibm-actionspz-runners-for-publishing-jobs
GHA: Use IBM ActionsPZ runners for publish jobs on s390x
2026-06-17 15:45:20 +01:00
Hyounggyu Choi
308eb34af6 GHA: Use IBM ActionsPZ runners for publish jobs on s390x
Let's use the ActionsPZ runners for the following jobs:
- publish-kata-deploy-image-s390x
- publish-kata-monitor-image-s390x

to improve the CI experience.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-06-17 15:34:39 +02:00
Alex Lyn
47ac08b419 Merge pull request #13239 from Apokleos/remove-9p
runtime-rs: Remove unused msize_9p totally from configurations
2026-06-17 20:17:52 +08:00
Greg Kurz
f0f8233759 Merge pull request #13237 from gkurz/osbuilder-version
osbuilder: Simplify version fetching
2026-06-17 13:56:13 +02:00
Greg Kurz
c3d98fe323 osbuilder: Simplify version fetching
`tools/osbuilder/VERSION` points to the root `VERSION` file,
just like the code does. Use that file.

Signed-off-by: Greg Kurz <groug@kaod.org>
2026-06-17 10:08:23 +02:00
Alex Lyn
854eef0312 runtime-rs: Remove unused msize_9p totally from configurations
As virtio-9p is already deprecated, its msize_9p option should be
deprecated too. This commit removes the unused msize_9p.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-17 14:53:48 +08:00
Fabiano Fidêncio
0ddb2ee1f1 Merge pull request #13160 from LandonTClipp/kata_visible_devices
feat(agent): translate KATA_VISIBLE_DEVICES into CDI GPU requests
2026-06-16 19:10:35 +02:00
Fabiano Fidêncio
3ca5742338 Merge pull request #13129 from pmores/fix-default_memory_annontation
runtime-rs: fix default_memory annotation processing
2026-06-16 18:11:19 +02:00
Fabiano Fidêncio
3e98f925cf Merge pull request #13142 from davidweisse/dav/genpolicy-pod-resources
genpolicy: support pod-level resources
2026-06-16 15:31:50 +02:00
davidweisse
ac56ea21d8 genpolicy: support pod-level resources
Add support for resource requests and limits in the PodSpec.

Fixes #12816

Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>
2026-06-16 15:30:22 +02:00
Fabiano Fidêncio
774e698aeb Merge pull request #12293 from Apokleos/graceful-errors
runtime-rs: make OOM watcher and signal handling lifecycle-aware
2026-06-16 15:02:54 +02:00
Fabiano Fidêncio
c76c82ce1c Merge pull request #13229 from hgowda-amd/skip-qos-tests-snp-tdx-runtime-rs
tests: skip Guaranteed QoS test for SNP/TDX runtime-rs
2026-06-16 14:02:51 +02:00
Fabiano Fidêncio
492d604daf Merge pull request #13214 from fidencio/topic/block-volume-readonly-propagation
runtime(-rs): Propagate host block device read-only flag to the VMM
2026-06-16 13:39:23 +02:00
Pavel Mores
9b31e06c20 runtime-rs: bump the byte-unit dependency version
The unit tests added by the previous commit exposed a malfunction of the
byte-unit crate on big-endian systems(*), causing s390x CI to fail.
Bump the dependency's version to include a fix.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-06-16 13:15:23 +02:00
Pavel Mores
5ba5046e97 runtime-rs: fix default_memory annotation processing
The annotation value is implicitly in MiB, but the byte-unit crate
interprets it as bytes. When a common value like 2048, meant to mean
2048 MiB but interpreted as 2048 B, is then converted to MiB, the result
is zero. That is less than the minimal allowable memory, so the runtime
fails to launch.

This is fixed by detecting whether the annotation value contains units.
If it doesn't, it is first converted to MiB and the rest of the processing
proceeds as before.

This way we allow for the implicit MiB units when no units are given, thus
keeping compatibility with existing Go shim behaviour, while also allowing
any legal units to be given as well.
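A std-only sketch of the detection step, assuming a small hand-written unit table in place of the byte-unit crate. The function name and the accepted suffixes are hypothetical. A bare number is taken as MiB, as the Go shim does. Any value with a unit is converted to bytes first and then truncated to MiB. The `"2048B"` case reproduces the zero result described above.

```rust
/// Hypothetical parser: returns the memory size in MiB.
fn parse_default_memory_mib(value: &str) -> Result<u64, String> {
    let v = value.trim();
    // Split into the leading digits and the (possibly empty) unit suffix.
    let split = v.find(|c: char| !c.is_ascii_digit()).unwrap_or(v.len());
    let (num, unit) = v.split_at(split);
    let n: u64 = num
        .parse()
        .map_err(|e| format!("invalid number {num:?}: {e}"))?;
    let bytes_per_unit: u64 = match unit.trim() {
        // No unit: implicit MiB, matching the Go shim behaviour.
        "" => return Ok(n),
        "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        "KB" => 1_000,
        "MB" => 1_000_000,
        "GB" => 1_000_000_000,
        other => return Err(format!("unsupported unit {other:?}")),
    };
    let bytes = n
        .checked_mul(bytes_per_unit)
        .ok_or_else(|| format!("{v:?} overflows u64"))?;
    // Integer division by 2^20: anything below 1 MiB becomes 0.
    Ok(bytes >> 20)
}
```

Under this sketch, `"2048"` and `"2GiB"` both yield 2048, while an explicit `"2048B"` still yields 0.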

We take the opportunity to add some unit tests as well.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2026-06-16 13:15:23 +02:00
LandonTClipp
4a9da5d37a chore(docs): Add info on building and running custom artifacts
I created this over the course of testing my VISIBLE_CDI_DEVICES
changes. I think this will be useful to folks who don't understand the
right way to deploy custom artifacts.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
LandonTClipp
a1dd28cb52 feat(runtime): plumb VISIBLE_CDI_DEVICES through the Go runtime
Add a `visible_cdi_devices` TOML option to the Go runtime so the
agent.visible_cdi_devices=true kernel parameter is emitted to the guest
when enabled. Wire the option through the NVIDIA GPU configuration
templates and add tests verifying the kernel-params flow.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
LandonTClipp
b49eb577b2 feat(runtime-rs): expose visible_cdi_devices in config
Declare the `visible_cdi_devices` agent option (kernel param
agent.visible_cdi_devices) in kata-types so runtime-rs can opt into
emitting it to the guest, and expose it in the three NVIDIA GPU
configuration templates (qemu, qemu-snp, qemu-tdx) at runtime-rs/config/.

The agent consumes the corresponding VISIBLE_CDI_DEVICES env var to
drive CDI device requests.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
LandonTClipp
676fc90d0b feat(agent): translate VISIBLE_CDI_DEVICES into CDI device requests
Add an opt-in `visible_cdi_devices` agent option that lets a container
select which of the VM's CDI-known devices it sees via a
VISIBLE_CDI_DEVICES env var. The schema is `<cdi-kind>=<devices>`
(e.g. "nvidia.com/gpu=all", or "kata.com/gpu=0,1"), with multiple kinds
delimited by ':'.

When enabled, the agent maps the value to CDI device requests and feeds
them through the existing CDI injection path, so device nodes, mounts,
env and createContainer hooks from the guest CDI spec (e.g.
/var/run/cdi/nvidia.yaml, generated by NVRC/nvidia-ctk) are applied.
The variable is intentionally distinct from NVIDIA_VISIBLE_DEVICES and
does not promise identical semantics.

If a requested kind is present in the guest CDI registry but the
specific device index is not, the agent fails fast rather than waiting
for the CDI-spec watch/timeout path. An entirely absent kind falls
through to the existing wait/timeout behavior.

Defaults to false; containers that don't set the env var are unaffected.
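A sketch of the `VISIBLE_CDI_DEVICES` schema and the fail-fast rule, assuming the guest CDI registry can be modelled as a map from kind to declared device names. `parse_visible_cdi_devices`, `resolve` and `Resolution` are hypothetical names, not the agent's API. Requests are turned into fully qualified `<kind>=<device>` CDI names. A known kind with an unknown device fails fast. An absent kind falls back to waiting.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of resolving requests against the guest CDI registry.
#[derive(Debug, PartialEq)]
enum Resolution {
    /// Every requested device is known: inject these CDI device names.
    Ready(Vec<String>),
    /// The kind exists but a requested device does not: fail fast.
    FailFast(String),
    /// A kind is entirely absent: use the existing wait/timeout path.
    Wait,
}

/// Parses `<cdi-kind>=<devices>[:<cdi-kind>=<devices>...]` into (kind, device) pairs.
fn parse_visible_cdi_devices(value: &str) -> Result<Vec<(String, String)>, String> {
    let mut requests = Vec::new();
    for entry in value.split(':').filter(|e| !e.is_empty()) {
        let (kind, devices) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
        if kind.is_empty() || devices.is_empty() {
            return Err(format!("empty kind or device list in {entry:?}"));
        }
        // Devices within one kind are comma-separated, e.g. "0,1".
        for dev in devices.split(',') {
            requests.push((kind.to_string(), dev.trim().to_string()));
        }
    }
    Ok(requests)
}

/// `registry` maps a CDI kind to the device names its guest spec declares.
fn resolve(requests: &[(String, String)], registry: &HashMap<String, Vec<String>>) -> Resolution {
    let mut names = Vec::new();
    for (kind, dev) in requests {
        match registry.get(kind) {
            None => return Resolution::Wait,
            Some(known) if known.contains(dev) => names.push(format!("{kind}={dev}")),
            Some(_) => return Resolution::FailFast(format!("{kind}={dev} not in guest CDI spec")),
        }
    }
    Resolution::Ready(names)
}
```

The sketch treats `all` as an ordinary device name that the spec itself must declare; whether the agent special-cases it is not stated above.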

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
Alex Lyn
8fc1a16225 runtime-rs: Make signal_process idempotent for exited init processes
Address the issue where signal_process returns an INTERNAL error when
the container's init process has already exited, and ensure teardown
is never aborted by signal failures.

Introduce is_no_such_process_error() to detect "no such process"
conditions (ESRCH/ENOENT codes or equivalent messages). When the init
process is already gone, treat it as success with an info log instead
of an error.

In stop_process(), never propagate signal failures. During sandbox
shutdown the agent connection is often already closed, causing
AgentConnectionClosed errors that bypass is_no_such_process_error().
If stop_process() aborts on such errors, cleanup_container() is skipped
and leftover mounts cause "Resource busy" failures in sandbox cleanup.
Restore "always proceed to cleanup" semantics: log the failure as a
warning, but never skip resource cleanup.

Resource cleanup must be best-effort and idempotent regardless of kill
outcome.
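A minimal sketch of the "always proceed to cleanup" semantics. It assumes a hypothetical `SignalError` enum in place of the real error types and passes the signal and cleanup steps in as closures. Log lines are returned as strings so the behaviour can be checked without a logger. A vanished init process is logged at info level. Any other signal failure becomes a warning. Cleanup always runs.

```rust
/// Hypothetical error kinds for the sketch; the real code inspects richer errors.
#[derive(Debug)]
enum SignalError {
    NoSuchProcess,
    AgentConnectionClosed,
}

/// Best-effort stop: signal failures are never propagated.
fn stop_process(
    signal: impl FnOnce() -> Result<(), SignalError>,
    cleanup_container: impl FnOnce(),
) -> Vec<String> {
    let mut log = Vec::new();
    match signal() {
        Ok(()) => {}
        // Init already exited: treat as success, log at info level.
        Err(SignalError::NoSuchProcess) => log.push("info: process already exited".to_string()),
        // Anything else (e.g. a closed agent connection during shutdown): warn only.
        Err(e) => log.push(format!("warn: failed to signal process: {e:?}")),
    }
    // Always run cleanup so leftover mounts cannot make sandbox cleanup fail.
    cleanup_container();
    log
}
```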

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-16 15:12:28 +08:00
Alex Lyn
44dd2b1f34 runtime-rs: Refine OOM watcher error reporting for sandbox teardown
This commit refines the error handling within the OOM watcher to
distinguish between genuine failures and errors that occur as a natural
consequence of sandbox shutdown via the helper is_normal_shutdown_error.
Previously, various connection-related errors during teardown were logged
as warnings, contributing to noisy logs.

To improve OOM error handling, it distinguishes error types: the logic now
differentiates between "normal shutdown" errors (e.g. Connection reset by
peer, broken pipe) and actual OOM watcher failures.

This enhancement makes OOM event logs more informative and less prone to
clutter during normal sandbox termination.
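A sketch of an `is_normal_shutdown_error`-style check. It assumes the error can be viewed as a `std::io::Error` and classified by `ErrorKind`; the real helper's signature and matching rules may differ. The log levels picked here ("debug" vs "warn") are also an assumption. The point is that teardown-induced connection errors are separated from genuine watcher failures.

```rust
use std::io;

/// True for errors that are an expected side effect of the agent connection
/// going away during sandbox teardown (assumed set of kinds).
fn is_normal_shutdown_error(err: &io::Error) -> bool {
    matches!(
        err.kind(),
        io::ErrorKind::ConnectionReset
            | io::ErrorKind::ConnectionAborted
            | io::ErrorKind::BrokenPipe
            | io::ErrorKind::UnexpectedEof
    )
}

/// Hypothetical: pick a quieter log level for expected teardown errors.
fn oom_watcher_log_level(err: &io::Error) -> &'static str {
    if is_normal_shutdown_error(err) {
        "debug"
    } else {
        "warn"
    }
}
```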

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 15:12:24 +08:00
Alex Lyn
3095bd379b runtime-rs: Introduce cancellation for OOM watcher during teardown
This commit introduces an explicit cancellation mechanism for the OOM
watcher loop within VirtSandbox. This addresses the issue where the
watcher continues to poll for OOM events even when the sandbox is being
stopped, leading to spurious "Connection reset by peer" errors.

Key changes:
(1) A CancellationToken is added to VirtSandbox to signal the watcher
loop when the sandbox is undergoing teardown.
(2) The OOM watcher loop in VirtSandbox::start() is now wrapped in a
tokio::select! statement. This allows it to concurrently listen for
two events:
- cancel_token.cancelled(): Triggered when the sandbox/VM is stopping.
- agent.get_oom_event(): The regular OOM event polling.
(3) In the sandbox stop/teardown path, cancel_token.cancel() is called
before stopping the VM. This ensures the OOM watcher loop exits cleanly
via the cancellation token, preventing the occurrence of ECONNRESET/EOF
errors on a closed channel.

This change improves the robustness of OOM event handling during sandbox
lifecycle management.
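The change itself uses `tokio::select!` over `cancel_token.cancelled()` and `agent.get_oom_event()`. As a dependency-free analogue, the sketch below swaps in a std polling loop. An `AtomicBool` stands in for the cancellation token and an `mpsc` channel stands in for the agent's OOM event stream. `oom_watcher` is a hypothetical name. The shape is the same: cancellation is checked on every iteration, so teardown ends the loop before the channel is torn down under it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Runs until cancelled or the event source disconnects; returns the
/// container IDs of the OOM events it observed.
fn oom_watcher(events: Receiver<String>, cancel: Arc<AtomicBool>) -> Vec<String> {
    let mut seen = Vec::new();
    loop {
        // Stand-in for cancel_token.cancelled(): teardown wins over polling.
        if cancel.load(Ordering::SeqCst) {
            break;
        }
        // Stand-in for agent.get_oom_event(), with a short timeout so the
        // cancellation flag is re-checked regularly.
        match events.recv_timeout(Duration::from_millis(10)) {
            Ok(container_id) => seen.push(container_id),
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    seen
}
```

With real async code, `select!` avoids the polling timeout entirely. The timeout here only exists because std channels cannot wait on two sources at once.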

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Alex Lyn
0ffdc576d3 runtime-rs: Introduce a helper to check if process/container exists
Returns `true` if the error indicates that the target process/container
no longer exists.

This is used to determine if an operation, like signaling a process,
failed because the target is no longer available. The function checks
for standard OS error codes (`ESRCH`, `ENOENT`) and common error message
patterns.
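A std-only sketch of such a helper, assuming errors arrive as `std::error::Error` trait objects. The real function most likely works on other error types. The errno values are hardcoded Linux numbers so no `libc` crate is needed, and the message patterns are assumptions. The helper walks the source chain and accepts either an OS-level `ESRCH`/`ENOENT` or a matching message.

```rust
use std::error::Error;
use std::io;

// Linux errno values, hardcoded (assumption) to keep the sketch std-only.
const ESRCH: i32 = 3;
const ENOENT: i32 = 2;

/// True if `err`, or any error in its source chain, says the target
/// process/container no longer exists.
fn is_no_such_process_error(err: &(dyn Error + 'static)) -> bool {
    let mut current: Option<&(dyn Error + 'static)> = Some(err);
    while let Some(e) = current {
        // OS-level check on raw errno codes.
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            if matches!(io_err.raw_os_error(), Some(ESRCH) | Some(ENOENT)) {
                return true;
            }
        }
        // Message-based fallback, e.g. for errors relayed over agent RPC.
        let msg = e.to_string().to_lowercase();
        if msg.contains("no such process") || msg.contains("not found") {
            return true;
        }
        current = e.source();
    }
    false
}
```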

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Alex Lyn
59677688ee runtime-rs: Introduce a helper to check normal oom shutdown errors
It is mainly used to check whether an error is a normal OOM watcher
shutdown error caused by network disconnection.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-16 12:56:54 +08:00
Fabiano Fidêncio
cfab6f496b runtime-rs: Propagate block device read-only flag to the VMM
Block volumes and block-mode device nodes were attached to the guest
read-write regardless of the volume's read-only intent, so the
guest-visible virtio-blk device was always writable.

This matters beyond simple write protection: filesystems such as XFS
inspect the block device read-only state to decide whether to attempt
journal/log recovery. When the device is writable, XFS tries to replay
the log even on a read-only mount, which fails badly. Mounting with
"-o ro" inside the guest is not sufficient; the device itself must
advertise read-only (VIRTIO_BLK_F_RO), which only happens when the VMM
opens the backing image read-only.

Set is_readonly on the block device config from two signals, combined
with OR so either one marks the device read-only:

  - the read-only intent from the OCI spec:
      * bind-mounted block volumes and direct-assigned (raw block)
        volumes derive it from the "ro" mount option, and
      * block-mode volumes (e.g. Kubernetes volumeDevices) arrive as
        device nodes in spec.Linux.Devices with no mount option; their
        intent is expressed only via the cgroup device access in
        spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
        read-only; "rwm" for read-write). handler_devices() derives the
        flag from the matching cgroup allow rule, and
  - the host block device's own read-only flag (queried via the BLKROGET
    ioctl). Both the volume path (block_volume/rawblock_volume) and the
    device-node path (handler_devices, resolving the host node via
    get_host_path) honor it, so a device that is physically read-only on
    the host is exposed read-only to the guest even when the intent is
    not encoded in the OCI spec.

All in-tree hypervisors (qemu, cloud-hypervisor, dragonball) already
honor BlockConfig.is_readonly, so no hypervisor changes are required.
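A sketch of how the two signals combine. `DeviceCgroupRule` is a hypothetical, trimmed stand-in for an OCI `LinuxDeviceCgroup` entry. The host flag is passed in as a plain bool rather than probed with `BLKROGET`. When several allow rules match exactly, the sketch lets the last one win, which is an assumption. Read-only intent means the matching rule's access lacks `w` ("rm" vs "rwm"). Either signal is enough to mark the device read-only.

```rust
/// Hypothetical, trimmed stand-in for an OCI LinuxDeviceCgroup entry.
struct DeviceCgroupRule {
    allow: bool,
    dev_type: char, // 'b' block, 'c' char, 'a' all
    major: Option<i64>,
    minor: Option<i64>,
    access: &'static str, // e.g. "rm" (read-only) or "rwm" (read-write)
}

/// Read-only intent from the allow rule that exactly matches the block device,
/// or None when no exact rule expresses any intent.
fn cgroup_readonly(rules: &[DeviceCgroupRule], major: i64, minor: i64) -> Option<bool> {
    rules
        .iter()
        .rev() // assumption: the last exact match wins
        .find(|r| r.allow && r.dev_type == 'b' && r.major == Some(major) && r.minor == Some(minor))
        .map(|r| !r.access.contains('w'))
}

/// OR of both signals: spec intent and the host device's own read-only flag.
fn effective_readonly(intent: Option<bool>, host_blkro: bool) -> bool {
    intent.unwrap_or(false) || host_blkro
}
```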

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
2026-06-15 23:18:36 +02:00
Fabiano Fidêncio
6203e28bef runtime: Propagate block-mode device read-only flag to the VMM
Block-mode volumes (e.g. Kubernetes volumeDevices) are passed to the
container as device nodes in spec.Linux.Devices and carry no mount "ro"
option. Their read-only intent is expressed only via the cgroup device
access in spec.Linux.Resources.Devices ("rm" = read+mknod, no write, for
read-only; "rwm" for read-write).

The device path ignored that signal: newLinuxDeviceInfo() built the
DeviceInfo without ever setting ReadOnly (it only consumed FileMode, the
node permission bits, not the read/write access), so the device was
always attached read-write.

This is problematic for filesystems such as XFS, which inspect the block
device read-only state to decide whether to attempt journal/log recovery.
When the guest device is writable, XFS tries to replay the log even for a
read-only mount, which fails badly. Mounting "-o ro" in the guest is not
enough; the device itself must advertise read-only, which only happens
when the VMM opens the backing device read-only (DeviceInfo.ReadOnly ->
BlockDrive.ReadOnly -> qemu read-only=on / clh Readonly).

Derive the read-only flag from two independent signals, combined with OR
so either one marks the device read-only:

  - the cgroup device access rule that exactly matches the device, so a
    block-mode volume marked read-only by the orchestrator (e.g. a pod
    volume with persistentVolumeClaim.readOnly: true) is honored, and
  - the host block device's own read-only flag (queried via the BLKROGET
    ioctl). Block-mode volumes frequently carry no read-only signal in
    the OCI spec at all, so the device flag is often the only reliable
    source.

The BLKROGET probe is shared (pkg/device/config.BlockDeviceIsReadOnly,
Linux-only with a stub on other platforms) between the device-node path
(newLinuxDeviceInfo, probing /dev/block/<major>:<minor>) and the
bind-mounted/filesystem block path (createDeviceInfo). None of this
relies on external host tooling such as "blockdev --setro".

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor
2026-06-15 23:18:36 +02:00
Harshitha Gowda
6588014b54 tests: skip Guaranteed QoS test for SNP/TDX runtime-rs
The Guaranteed QoS test is currently failing for SNP and TDX runtime-rs
due to a podOverhead configuration issue. The test requests 600Mi of
memory which, combined with the 2048Mi podOverhead, exceeds 2GiB and
triggers memory management issues in confidential guests.

This is a temporary skip until the podOverhead fix is merged.

Related: https://github.com/kata-containers/kata-containers/pull/13228
Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-06-15 20:04:22 +00:00
Fabiano Fidêncio
1af9456e26 Merge pull request #13223 from manuelh-dev/mahuber/kata-config-dropin-helpers
tests: add runtime config drop-in helpers
2026-06-15 19:53:28 +02:00
Fabiano Fidêncio
a8a2a705ed Merge pull request #13226 from fidencio/topic/fix-image-publishing-for-release
release: do not publish a kata-monitor-job-dispatcher manifest
2026-06-15 17:48:23 +02:00
Fabiano Fidêncio
5959549645 release: do not publish a kata-monitor-job-dispatcher manifest
The shared _publish_multiarch_manifest() helper always derived a
"-job-dispatcher" registry from the registries it was given. However, the
dispatcher is a kata-deploy-specific sidecar image, so when the helper
was reused to publish the kata-monitor multi-arch manifest it wrongly
tried to push a non-existent kata-monitor-job-dispatcher image.

Let's gate the dispatcher derivation behind
KATA_DEPLOY_PUBLISH_JOB_DISPATCHER (defaulting to true so the
kata-deploy path is unchanged) and opt out of it when publishing the
kata-monitor manifest.
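The real helper is a shell function. Purely to illustrate the gating logic, here is a Rust sketch with hypothetical function names and example registry names. `KATA_DEPLOY_PUBLISH_JOB_DISPATCHER` defaults to true when unset. The sketch treats only the exact string "false" as opting out, which is an assumption. When enabled, a `-job-dispatcher` registry is derived from each input registry.

```rust
/// Unset (None) means true, keeping the kata-deploy path unchanged.
fn publish_job_dispatcher(env_value: Option<&str>) -> bool {
    !matches!(env_value, Some("false"))
}

/// Hypothetical: the registries a multi-arch manifest would be pushed to.
fn manifest_registries(registries: &[&str], env_value: Option<&str>) -> Vec<String> {
    let mut out: Vec<String> = registries.iter().map(|r| r.to_string()).collect();
    if publish_job_dispatcher(env_value) {
        // Derive the dispatcher sidecar registry only when asked to.
        out.extend(registries.iter().map(|r| format!("{r}-job-dispatcher")));
    }
    out
}
```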

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-15 16:29:29 +02:00
Hyounggyu Choi
59fd29fb33 Merge pull request #13225 from BbolroC/use-tar-in-exporting-kate-deploy-files
packaging: Optimize kata-deploy build export using tar output
2026-06-15 14:47:01 +02:00
Hyounggyu Choi
46b8e9f027 packaging: Optimize kata-deploy build export using tar output
Replace `type=local` with `type=tar` in kata-deploy build to reduce
export time and avoid build hangs during the export-to-client-directory
phase.

Update callers to extract binaries directly from the tar archive instead
of copying from an intermediate directory.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-06-15 11:52:33 +02:00
Fabiano Fidêncio
50d68d1f5f Merge pull request #13116 from thejasn/thn/fix-oom-event-deadlock
agent: fix get_oom_event deadlock after connection restart
2026-06-15 11:05:46 +02:00
Steve Horsman
ea999aa033 Merge pull request #13221 from manuelh-dev/mahuber/nydus-root-export
kata-deploy: export nydus snapshotter root
2026-06-15 08:55:08 +01:00
Thejas N
7807aa3d62 agent: fix get_oom_event deadlock after connection restart
When the agent-protocol-forwarder's inbound connection restarts (e.g.
during a Cloud API Adaptor restart in peer pod environments), the shim
re-sends a GetOOMEvent request through the new connection. Since the
forwarder→agent Unix socket survives the restart, the old handler from
the previous connection remains alive, holding the event_rx lock while
blocked in recv().await.

The new handler acquires the sandbox lock, then attempts to acquire the
event_rx lock — which is held by the old handler. Because the sandbox
lock is still held during this wait, every subsequent RPC
(ExecProcess, WaitProcess, StatsContainer, SignalProcess, etc.) blocks
on the sandbox lock, rendering the pod completely unresponsive.

The root cause is a lock ordering violation: get_oom_event held the
sandbox lock while acquiring the event_rx lock. Fix this by scoping the
sandbox lock acquisition so it is dropped before the event_rx lock is
acquired. The sandbox lock is only needed to clone the Arc<Mutex<Receiver>>
— once cloned, it can be released immediately.
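A std-only sketch of the fixed lock ordering, using `std::sync::Mutex` and `mpsc` in place of the agent's async types. `Sandbox` here is a hypothetical one-field struct. The sandbox guard lives only inside the block that clones the `Arc`. It is therefore dropped before the handler blocks on `event_rx`, and other RPCs can still take the sandbox lock.

```rust
use std::sync::mpsc::Receiver;
use std::sync::{Arc, Mutex};

/// Hypothetical, trimmed sandbox state.
struct Sandbox {
    event_rx: Arc<Mutex<Receiver<String>>>,
}

fn get_oom_event(sandbox: &Mutex<Sandbox>) -> Option<String> {
    // Hold the sandbox lock only long enough to clone the Arc...
    let event_rx = {
        let s = sandbox.lock().unwrap();
        Arc::clone(&s.event_rx)
    }; // ...the sandbox guard is dropped here.

    // Blocking on event_rx (possibly held by a stale handler) and on recv()
    // no longer stalls every other RPC that needs the sandbox lock.
    let rx = event_rx.lock().unwrap();
    rx.recv().ok()
}
```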

Assisted-by: Claude Code <noreply@anthropic.com>
Signed-off-by: Thejas N <thn@redhat.com>
2026-06-15 07:47:18 +02:00
Fabiano Fidêncio
37c4a0b6a2 Merge pull request #13128 from nikolasgkou/fix/guest-protection-fallback
runtime-rs: don't fail VM start when guest protection detection errors
2026-06-13 08:56:56 +02:00
Manuel Huber
9ffdb1219d tests: add runtime config drop-in helpers
Add common Kubernetes test helpers for locating the active per-shim
Kata runtime config directory and copying/removing TOML fragments
under config.d.

Update the NVIDIA NUMA test to install its temporary numa_mapping
override through those helpers. This gives follow-up tests a shared
pattern for temporary runtime config overrides.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-12 21:43:06 +00:00
Fabiano Fidêncio
5efc761002 Merge pull request #13211 from glingy/patch-1
runtime-rs: Fix queue_size of zero in block_rootfs
2026-06-12 22:37:18 +02:00
Fabiano Fidêncio
1b60563a34 Merge pull request #13120 from LandonTClipp/runtime-config
chore(docs): Clarify dropIn runtime configuration
2026-06-12 22:34:58 +02:00
LandonTClipp
6005f8a499 chore(docs): Add cspell makefile target for local testing
This makes it easier to check the spellchecker is happy before
submitting it as a PR.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-12 22:24:18 +02:00
LandonTClipp
03c283edec chore(docs): Clarify dropIn runtime configuration
Clean the runtime configuration section by focusing first on the helm
configuration. Then, pivot into a further explanation on how the runtime
can be directly configured. Link to where these config parameters are
explained more in-depth.

Add open-in-new-tab (already downloaded in requirements.txt) in the
mkdocs plugin config so that links don't open in the same tab.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-12 22:24:18 +02:00
nikolasgkou
80b8f592a0 runtime-rs: skip guest protection detection for non-confidential guests
prepare_protection_device_config() called available_guest_protection()
unconditionally and propagated any error before the "confidential_guest
is not set" case was handled.

On AMD hosts where the kvm_amd `sev` module parameter is "Y" but the CPU
does not expose the SEV-SNP CPUID bit (8000_001f EAX[4]) -- e.g. consumer
Ryzen -- available_guest_protection() returns Err("SEV not supported"),
which blocked every non-confidential VM from booting even though no
protection was requested.

When confidential_guest is not set there is no reason to probe the host,
so return Ok(None) before calling available_guest_protection(). Detection
(and any error it produces) now runs only when a confidential guest is
actually requested.
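A sketch of the reordering. The signature is hypothetical: detection is passed in as a closure so the early return can be exercised without an AMD host, and `GuestProtection` is a trimmed stand-in enum. Non-confidential guests return `Ok(None)` before any probing. Only confidential guests see detection errors such as "SEV not supported".

```rust
/// Hypothetical subset of protection kinds, for illustration.
#[derive(Debug, PartialEq)]
enum GuestProtection {
    Snp,
    Tdx,
}

fn prepare_protection_device_config(
    confidential_guest: bool,
    available_guest_protection: impl FnOnce() -> Result<GuestProtection, String>,
) -> Result<Option<GuestProtection>, String> {
    // No protection requested: return before touching the host at all.
    if !confidential_guest {
        return Ok(None);
    }
    // Only confidential guests run detection and see its errors.
    available_guest_protection().map(Some)
}
```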

Signed-off-by: nikolasgkou <nikolasgkou@disroot.org>
2026-06-12 22:20:13 +02:00
Fabiano Fidêncio
47b327ea35 Merge pull request #13155 from fidencio/topic/kata-deploy-no-daemonset
kata-deploy: add a Job-based deployment mode (alternative to the privileged DaemonSet)
2026-06-12 21:55:11 +02:00
Manuel Huber
639420e7f5 kata-deploy: export nydus snapshotter root
containerd uses the proxy plugin root export when reporting CRI image
filesystem paths. Without this export, the CRI plugin falls back to
/var/lib/containerd/io.containerd.snapshotter.v1.<snapshotter>.

For nydus-for-kata-tee this fallback does not match the actual
snapshotter root under /var/lib/nydus-for-kata-tee.
Kubelet/cAdvisor then fails stats collection when it tries to inspect
the nonexistent fallback path.

Export the nydus proxy snapshotter root so containerd reports the real
filesystem path for resource accounting.

When using trusted ephemeral storage, or a new work-in-progress ephemeral
storage feature that provides plain disks, resource accounting would not
kick in, and pods that exhausted their emptyDir sizeLimits would not get
evicted.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-06-12 19:06:01 +02:00
Fabiano Fidêncio
c5d5fc6ee8 Merge pull request #13213 from burgerdev/grpc-probes
genpolicy: add missing probe fields
2026-06-12 19:04:53 +02:00
Fabiano Fidêncio
aa27490801 kata-deploy: track distroless static base by tag, not digest
The kata-deploy main image pinned its gcr.io/distroless/static-debian13
base by sha256 digest. distroless does not publish versioned tags, so a
pinned digest just goes stale with no clear upgrade path. Track the
rolling tag instead (guarded with a hadolint DL3007 ignore plus a comment
explaining why), matching the kata-deploy-job-dispatcher image base.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00