Commit Graph

2170 Commits

Author SHA1 Message Date
Fabiano Fidêncio
0ddb2ee1f1 Merge pull request #13160 from LandonTClipp/kata_visible_devices
feat(agent): translate KATA_VISIBLE_DEVICES into CDI GPU requests
2026-06-16 19:10:35 +02:00
davidweisse
ac56ea21d8 genpolicy: support pod-level resources
Add support for resource requests and limits in the PodSpec.

Fixes #12816

Signed-off-by: davidweisse <98460960+davidweisse@users.noreply.github.com>
2026-06-16 15:30:22 +02:00
LandonTClipp
4a9da5d37a chore(docs): Add info on building and running custom artifacts
I created this over the course of testing my VISIBLE_CDI_DEVICES
changes. I think this will be useful to folks who don't understand the
right way to deploy custom artifacts.

Signed-off-by: LandonTClipp <lclipp@coreweave.com>
2026-06-16 11:44:09 +02:00
Harshitha Gowda
6588014b54 tests: skip Guaranteed QoS test for SNP/TDX runtime-rs
The Guaranteed QoS test is currently failing for SNP and TDX runtime-rs
due to a podOverhead configuration issue. The test requests 600Mi of
memory which, combined with the 2048Mi podOverhead, exceeds 2GiB and
triggers memory management issues in confidential guests.

This is a temporary skip until the podOverhead fix is merged.

Related: https://github.com/kata-containers/kata-containers/pull/13228
Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-06-15 20:04:22 +00:00
Manuel Huber
9ffdb1219d tests: add runtime config drop-in helpers
Add common Kubernetes test helpers for locating the active per-shim
Kata runtime config directory and copying/removing TOML fragments
under config.d.

Update the NVIDIA NUMA test to install its temporary numa_mapping
override through those helpers. This gives follow-up tests a shared
pattern for temporary runtime config overrides.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-12 21:43:06 +00:00
Fabiano Fidêncio
54878fa373 kata-deploy: add job deployment mode driven by the job-dispatcher
Phase 2 of the DaemonSet -> staged-Job migration: add an opt-in
`deploymentMode: job` that installs Kata via short-lived, per-node
install Jobs instead of the long-running DaemonSet. The DaemonSet remains
the default and is now gated behind `deploymentMode == daemonset`.

Rather than render one Job per node into the Helm release (which grows
the release secret O(nodes) and offers no rollout pacing), job mode ships
a single tiny post-install/post-upgrade hook Job that runs the
kata-deploy-job-dispatcher. The dispatcher enumerates the selected nodes
LIVE from the API server and stamps out one node-pinned install Job per
node from a constant-size ConfigMap of Job templates, keeping at most
`job.parallelism` in flight and refilling as they finish. This guarantees
per-node coverage with a paced rollout while the Helm release stays O(1)
regardless of fleet size. New nodes are picked up by re-running
`helm upgrade`; there is no always-on component.

Each per-node Job runs the staged install pipeline as ordered
initContainers and exits:

  host-check -> artifacts -> cri   (initContainers, run sequentially)
  label                            (main container)

The privilege split is explicit: the dispatcher pod is a pure
control-plane client (lists nodes, manages Jobs in its own namespace) and
runs fully unprivileged under a dedicated, least-privilege ServiceAccount
(kata-rbac.yaml); only the per-node Jobs it creates carry the privileged
kata-deploy host-mutation rights.

Node selection (templates/_helpers.tpl: nodeLabelSelector / perNodeJob):
  - job.nodes: explicit node-name list passed to the dispatcher, and
  - job.nodeSelector (equality map) ANDed with
  - job.nodeSelectorExpressions (k8s label-selector requirements:
    In / NotIn / Exists / DoesNotExist),
compiled into a single label-selector string the dispatcher resolves
live. The default expressions target worker (non-control-plane) nodes, so
no custom node labeling is required; set the expressions to [] to target
all discovered nodes.

Reuses the commonEnv/commonVolume* helpers and adds the stageContainer,
serviceAccountName, dispatcherServiceAccountName, dispatcherImage and
perNodeJob helpers shared by the dispatcher and the staged Jobs. The
default (daemonset) render is unchanged.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-06-12 18:58:33 +02:00
Fabiano Fidêncio
a016fd0485 Merge pull request #13198 from fidencio/topic/fix-ci-tee-static-sizing-overhead
tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI
2026-06-12 11:46:56 +02:00
Fabiano Fidêncio
723f74e782 Merge pull request #13209 from fidencio/topic/fix-kata-monitor-runc-pod-runtime
tests: launch kata-monitor runc workload with explicit runtime
2026-06-12 11:40:19 +02:00
Fabiano Fidêncio
cda6c8c6e0 tests: raise k8s memory/QoS pod limits for TEE runtime-rs CI
Increase memory request/limit values used by k8s memory and QoS
integration workloads so SNP/TDX static-sized sandboxes boot reliably
under the new sizing defaults.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-11 22:03:36 +02:00
Fabiano Fidêncio
9e597d33f2 tests: launch kata-monitor runc workload with explicit runtime
The kata-monitor negative test creates a non-kata pod and asserts it does
not appear in the kata-monitor cache (built from /run/vc/sbs, where only
kata sandboxes register).

However, the workload was started without a runtime handler, so it used
containerd's default runtime, which in the CI containerd config is set
to kata, so the "runc" pod was actually launched as a kata sandbox,
registered under /run/vc/sbs, and tripped the assertion ("cache: got
runc pod ...").

Start the workload with an explicit runc handler (configurable via
RUNC_RUNTIME) so it is a genuine runc sandbox that never touches
/run/vc/sbs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-11 21:59:53 +02:00
Alex Lyn
1034d7fc46 tests: Add support nydus tests for qemu-runtime-rs and clh-runtime-rs
This commit is to enable qemu-runtime-rs/clh-runtime-rs and make it
compatiable with qemu-runtime-rs and clh-runtime-rs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Alex Lyn
4eb7512e7b docs: Update how-to guide for virtio-fs-nydus with runtime-rs
Add comprehensive documentation for using virtio-fs-nydus shared
filesystem with Kata Containers. This guide covers:
(1) Clarify configuration options for virtio-fs-nydus and nydus image
    preparation and usage.
(2) Update daemon configuration and lifecycle management and introduce
    standalone, inline nydus architecture.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-06-11 21:42:48 +02:00
Fabiano Fidêncio
38416f78ec Merge pull request #13190 from manuelh-dev/mahuber/fix-num-cpus-bats
tests: fix k8s-number-cpus expectation
2026-06-10 21:59:21 +02:00
Fabiano Fidêncio
92a9691470 tests: add kata-monitor helm chart k8s test
Add a single-job k8s test that installs the kata-deploy helm chart
with monitor.enabled=true, pointed at the per-PR kata-monitor image
built earlier in the same run, and exercises both the rollout and the
user-visible behaviour:

  * the kata-monitor DaemonSet rolls out and the pod stays up without
    container restarts;
  * a real kata-runtime probe pod is scheduled, then /metrics and
    /sandboxes are scraped through the apiserver pod-proxy to prove
    kata-monitor sees the sandbox (non-zero running-shim count plus at
    least one per-sandbox kata_shim_* metric);
  * after the probe pod is deleted, /metrics drops back to a zero
    running-shim count.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
285d5daa23 tests: install latest cri-tools dynamically
Resolve the cri-tools release at install time instead of pinning a
version in versions.yaml: install_cri_tools now queries the GitHub
releases API for the absolute latest stable tag, and the kata-monitor,
cri-containerd and nydus jobs call it directly.

Also write /etc/crictl.yaml during containerd setup so crictl stops
emitting deprecation warnings about the legacy default endpoints.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
63fec205fe tests: run kata-monitor functional tests against the dedicated image
Exercise the published kata-monitor container image (the one built by
publish-kata-monitor-payload-amd64) rather than the on-disk binary, so
integration regressions like the recent glibc/musl mismatch surface at
PR time. The kata-monitor-tests.sh script keeps the binary fallback for
ad-hoc local runs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
d5bc1177c0 tests: focus kata-monitor CI on containerd active
Drop the stale CRI-O matrix entry (its cri-tools pin was several
releases behind) along with the exclude that hid the containerd job,
and pin the remaining job to containerd's "active" track (currently
v2.2) via CONTAINERD_VERSION.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-09 14:33:30 +02:00
Fabiano Fidêncio
5000000883 tests: restore SystemdCgroup in installed containerd
Set runc SystemdCgroup=true when generating /etc/containerd/config.toml
during containerd installation, restoring behavior that was mistakenly
dropped.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-09 10:46:38 +02:00
Fabiano Fidêncio
3ca9eb94b9 cri-containerd: fix v1 sanity-check config generation
Avoid emitting unsupported plugin keys and empty runtime options in the
v1.x config path so containerd 1.7 can load the generated TOML during
runc sanity checks.

While here, let's also dump the temporary cri-integration config on
failure to speed diagnosis.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-09 10:46:38 +02:00
Fabiano Fidêncio
ac2221a6a5 Merge pull request #13004 from fidencio/topic/versions-bump-containerd-to-2.3
versions: Bump containerd to 2.3
2026-06-09 08:21:58 +02:00
Manuel Huber
f37fb18b8c tests: fix k8s-number-cpus expectation
As pointed out in kata-containers/kata-containers#12961, the
k8s-number-cpus retry loop could fail all retried assertions and
still pass.

k8s-number-cpus retried until the guest reported three CPUs, but
the post-loop result was never checked. Bash suppresses errexit for
the equality test before && break, so the test could exhaust retries
and still pass.

The current kata-qemu handler sizes vCPUs from fractional container
quotas: two 500m limits produce one workload vCPU, then the default
vCPU is added and rounded once. Expect two CPUs and assert the final
retry result so the test fails if the count never converges.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Assisted-by: OpenAI Codex <codex@openai.com>
2026-06-08 22:50:02 +00:00
Fabiano Fidêncio
48ebbbec3a kata-deploy: honor debug mode with CLI log-level
Make the chart pass --log-level debug automatically when debug=true so
CI and troubleshooting runs emit full rendered config dumps without
requiring a separate log-level override.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:25:48 +02:00
Fabiano Fidêncio
95b8e8bea9 tests: update remaining containerd callers for containerd 2.x
tests/functional/vfio-ap/run.sh:
- Source tests/common.bash so the schema helpers are available.
- configure_containerd_for_runtime_rs: write kata-qemu-runtime-rs
  configuration via a conf.d drop-in.  Schema >= 3 uses
  io.containerd.cri.v1.runtime; schema 2 uses io.containerd.grpc.v1.cri.
  The sandboxer field is emitted only for schema >= 3.

tests/integration/nerdctl/gha-run.sh:
- Fix "containerd config default" pipe: propagate PATH so the newly
  installed binary is found, suppress stdout, and call
  ensure_containerd_conf_d_rootful_api_sockets.

tests/integration/kubernetes/gha-run.sh:
- Fix jq filter for devmapper snapshotter (.version // 0 >= 3).
- Add ensure_containerd_conf_d_rootful_api_sockets after config setup.

tests/gha-run-k8s-common.sh:
- Remove the redundant "containerd config default | sed" override;
  overwrite_containerd_config (called via check_containerd_config_for_kata)
  now handles SystemdCgroup and all other containerd config setup.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
1caacda174 tests/cri-containerd: update integration tests for containerd 2.x
Adapt create_containerd_config to work with containerd 2.x while
keeping compatibility with v1.x for completeness:

- Drop the direct config.toml patching in favour of conf.d fragments:
  use containerd_render_config_default_with_imports to generate the
  base config, then write separate drop-ins for API socket overrides,
  debug settings, and the Kata runtime.
- Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection).
- Detect cfg_schema via _containerd_blob_schema_version to select the
  right plugin table:
    schema >= 3 -> io.containerd.cri.v1.runtime
    schema 2    -> io.containerd.grpc.v1.cri
  and to emit the sandboxer field only on schema >= 3.
- Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable
  set by export_go_toolchain_for_containerd_source_builds is preserved
  during the containerd source build.

The require_containerd_binary_default_schema_v3_plus call is kept: the
test explicitly clones and builds containerd 2.x from source, so a
schema v2 binary should never appear here.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
7428832c86 tests/nydus: make containerd config schema-aware
Configure containerd for nydus differently depending on the active
config schema, because conf.d drop-in fragments are only honoured the
same way by containerd 2.x.

config_containerd now delegates to _containerd_resolved_schema_version
(from common.bash) to detect the active schema and passes it to
config_containerd_core, which emits schema-appropriate config:

  schema >= 3 (containerd v2.x):
    Keep the base config and add a conf.d drop-in fragment using the
    io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and
    io.containerd.cri.v1.images to select nydus as the snapshotter.

  schema 2 (containerd v1.x):
    conf.d is not honoured the same way, so replace config.toml
    wholesale with a complete, self-contained file using the
    io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and
    no sandboxer field.

The [proxy_plugins] block is written in both cases as it is
schema-version agnostic.

Teardown restores the whole config.toml (schema v2 path) or removes the
drop-in fragment (schema v3+ path) as appropriate.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
1bb43d0a19 tests/common: make overwrite_containerd_config schema-aware
Rewrite overwrite_containerd_config so that it works with containerd
v1.x (schema v2) as well as containerd v2.x (schema v3+):

- Always regenerate /etc/containerd/config.toml from the installed
  binary via "sudo containerd config default".
- Call ensure_containerd_conf_d_rootful_api_sockets after regenerating
  the base config.
- Detect the effective schema via _containerd_resolved_schema_version.
- Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime
  plugin path with sandboxer = podsandbox into a conf.d drop-in.
- Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin
  path without sandboxer into the drop-in.

check_containerd_config_for_kata no longer appends a schema guard;
the function supports both schema generations intentionally.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
18fbf4cd5d tests/common: fix install_cri_containerd for containerd 2.x
Three issues prevented containerd 2.x from working correctly after
installation:

1. Socket uid/gid mismatch: "containerd config default" was run as the
   unprivileged user, which produced uid = <runner-uid> in the API
   socket stanza instead of uid = 0.  Run it under sudo so the default
   output is owned by root.

2. Stale systemd unit: the CI runner ships a pre-installed containerd
   whose unit file is left in place after the binary is replaced by the
   test installer.  The old unit causes "MigrateConfigTo: index out of
   range" panics when the new binary tries to load a schema v4 config.
   Always overwrite the unit file from the template so the running
   binary and the unit file stay in sync.

3. Schema guard removed: install_cri_containerd installs whatever
   version was requested (v1.7 or v2.3) and must not abort on a valid
   schema v2 binary.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
fbf133ce3a tests/common: add containerd config schema helpers
Introduce helper functions used by later commits to make containerd
configuration schema-aware.

_containerd_blob_schema_version():
  Parse the version = <n> line from a containerd config blob and echo
  the integer.

_containerd_resolved_schema_version():
  Run "containerd config default" and return the schema version of the
  active binary.  Drives conditional logic in overwrite_containerd_config
  and other helpers.

containerd_emit_rootful_api_socket_overrides():
  Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets.
  Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped
  tables.

require_containerd_config_schema_v3_plus() /
require_containerd_binary_default_schema_v3_plus():
  Guard helpers that abort with a clear message when the installed
  containerd is older than v2.x.  Used only in test paths that
  explicitly build containerd 2.x from source.

containerd_render_config_default_with_imports():
  Write a fresh "containerd config default" to a file and ensure the
  conf.d import glob is present, ready for drop-in fragments.

export_go_toolchain_for_containerd_source_builds():
  Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the
  exact toolchain in its go.mod without changing the global Go version.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
8ffe4e6c02 tests: add journalctl diagnostics on containerd restart failure
When restart_systemd_service_with_no_burst_limit fails or times out
waiting for the containerd socket, emit "journalctl -xeu
containerd.service" output so the failure reason is visible in CI logs
without requiring a separate log-collection step.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
e122d7ffb0 versions: bump containerd to 2.3 and define minimum/latest test matrix
Bump the containerd version used by CI from v1.7.25 to v2.3.0.

Rename the version-range fields in versions.yaml and throughout the
GitHub Actions workflows from lts/active/version/sandbox_api to
minimum/latest to make their meaning self-evident:

  minimum: "v1.7"   # oldest containerd branch under test
  latest:  "v2.3"   # newest containerd branch under test

Drop the bare version field (superseded by the matrix) and the
sandbox_api alias (covered by latest).  Update all containerd_version
matrix entries in the workflow files accordingly, and update
gha-run-k8s-common.sh to resolve the new key names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
b119b051cb kata-deploy: support drop-in configs for default runtimes
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
2026-06-08 13:31:03 +02:00
Fupan Li
024c2531a5 Merge pull request #13029 from fidencio/topic/rfc-composable-vm-images
docs: add composable VM images design proposal
2026-06-08 18:40:35 +08:00
Fabiano Fidêncio
2440b5940b docs: add composable VM images design proposal
Add an RFC document describing the composable image architecture that
replaces monolithic guest rootfs images with a lean base image plus
purpose-specific addon images cold-plugged as virtio-blk devices.

The proposal covers the runtime configuration (extra_images), host-side
cold-plugging, guest-side mounting via systemd and dm-verity, agent-side
dynamic path resolution, the image build pipeline, and the security
model.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-07 13:58:17 +02:00
Fabiano Fidêncio
57c61e0c2f tests: unskip hard-coded policy tests on qemu-tdx-runtime-rs
Enable the hard-coded init-data policy test gate for qemu-tdx-runtime-rs
so runtime-rs and Go TDX variants exercise the same Kubernetes policy
coverage.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-06 22:48:20 +02:00
Fabiano Fidêncio
43321c7a78 Merge pull request #12931 from mythi/qemu-tdx-tests
tests: fix TDX runtime-rs and initdata tests
2026-06-06 11:42:19 +02:00
Fabiano Fidêncio
f6ff9578d4 Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner
ci: remove Mariner annotations and use new config
2026-06-05 20:22:58 +02:00
Mikko Ylinen
013e901f1b tests: re-enable initdata tests for qemu-tdx
The coco initdata tests signature verification and authenticated registry
never worked on qemu-tdx and so they have been disabled since.

Add them back now that all necessary fixes are in place.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Mikko Ylinen
9313e336b5 tests: set image.image_pull_proxy for CDH initdata
initdata tests set kernel arguments to "" which resets the
kernel arguments configured by Helm install. However, TDX
runner depends on agent.https_proxy= kernel arguments to pull
images.

In order for initdata tests to work on TDX, the same needs to
be added to CDH configuration via image.image_pull_proxy.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Mikko Ylinen
f3a0ef6a7c tests: use kubectl set to configure KBS env
No need to patch yamls locally. Also, set RUST_LOG=debug
and enable https_proxy for all TDX targets when the runner
has HTTPS_PROXY is set.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Fabiano Fidêncio
743b0a4839 Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11
versions: bump golang to 1.25.11
2026-06-04 20:24:57 +02:00
Fabiano Fidêncio
2a1ce7b8c4 Merge pull request #12539 from mythi/no-vcpu-hotplug
Disable CPU hotplug when confidential guest setting enabled
2026-06-04 10:56:52 +02:00
stevenhorsman
879912be25 versions: bump golang to 1.25.11
Bump the go version to resolve CVEs:
- GO-2026-5037
- GO-2026-5038
- GO-2026-5039

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-04 08:49:17 +01:00
Aurélien Bombo
de5333f275 ci: remove Mariner annotations and use new config
This is a follow-up to #13126 where we forgot to remove this now-unused code.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-03 09:25:12 -05:00
Mikko Ylinen
018389cb22 tests: enable k8s-sandbox-vcpus-allocation.bats for tdx and coco-dev
k8s-sandbox-vcpus-allocation.bats was disabled for qemu-tdx due to
errors when moving to use "upstream" TDX KVM code. The failing test
is vcpus-less-than-one-with-no-limits pod which ends up getting
x86 default MaxCPU = 240 and erroring:

Number of hotpluggable cpus requested (240) exceeds the maximum cpus supported by KVM (224)

TDX max vcpus is capped to host's logical CPUs so 240 is too much.

With the maxcpus logic fixed (=maxcpus not set at all) for configurations
where confidential guest is enabled, qemu-tdx can be enabled for
k8s-sandox-vcpus-allocation.bats again.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-03 15:27:35 +03:00
stevenhorsman
144ab161f1 tetss: bump golang.org/x/sys dependency
Bump golang.org/x/sys from v0.19.0 to v0.44.0 to resolve CVE:
- GO-2026-5024

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-03 09:56:54 +01:00
Fabiano Fidêncio
230e01b04e Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs
runtime/runtime-rs: introduce Azure specific configs
2026-06-02 09:17:09 +02:00
Manuel Huber
7d9a143747 ci: cover EROFS snapshotter default_size=0 path
kata-deploy currently hard-codes the EROFS snapshotter
default_size to "10G", so the CoCo EROFS CI lane only
exercises the path where the snapshotter provides an rwlayer.

Use the generic containerd.userDropIn support for the EROFS
default_size and thread it through the Kubernetes CI helpers.
Keep the kata-deploy default at "10G" to preserve current
behavior, but allow the workflow to set "0" for the runtime-rs
no-rwlayer path.

Expand the existing EROFS snapshotter job to run both values.
The override is written to containerd as a TOML string so "0"
is not parsed as an integer.

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-28 22:54:56 +00:00
Fabiano Fidêncio
744ab0b548 ci: improve kata-deploy pod wait and timeout diagnostics
Make kata-deploy deployment waits more robust by deriving the pod
selector from the rendered helm values and using it consistently for
readiness checks and logs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Fabiano Fidêncio
81ce51a9aa ci: target Azure CLH runtimes directly in AKS tests
Switch AKS Mariner matrix entries to clh-azure handlers and remove the
temporary host-OS based helm value overrides.

Update integration test wiring and required test labels so CI tracks the
new runtime names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Manuel Huber
3e874d0eaf tests: accept EROFS empty-image rootfs rejection
The empty-image test expects pod creation to fail. With an EROFS
snapshot that has a disk-backed rwlayer, runtime-rs can still reject
that pod with the existing unsupported mount-count error.

With default_size=0, there is no rwlayer mount. The same negative test
can instead reach the bind rootfs shape produced for the empty active
snapshot, which runtime-rs rejects as an unsupported rootfs mount.

Accept both messages so the test covers the expected failure for both
EROFS rwlayer modes.

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-27 17:12:20 +00:00