Commit Graph

2147 Commits

Author SHA1 Message Date
Fabiano Fidêncio
1caacda174 tests/cri-containerd: update integration tests for containerd 2.x
Adapt create_containerd_config to work with containerd 2.x while
keeping compatibility with v1.x for completeness:

- Drop the direct config.toml patching in favour of conf.d fragments:
  use containerd_render_config_default_with_imports to generate the
  base config, then write separate drop-ins for API socket overrides,
  debug settings, and the Kata runtime.
- Use CONTAINERD_SYSTEM_FRAGMENT_PREFIX directly (no PREFIX= indirection).
- Detect cfg_schema via _containerd_blob_schema_version to select the
  right plugin table:
    schema >= 3 -> io.containerd.cri.v1.runtime
    schema 2    -> io.containerd.grpc.v1.cri
  and to emit the sandboxer field only on schema >= 3.
- Pass GOTOOLCHAIN via "sudo -E make clean" so the environment variable
  set by export_go_toolchain_for_containerd_source_builds is preserved
  during the containerd source build.

The require_containerd_binary_default_schema_v3_plus call is kept: the
test explicitly clones and builds containerd 2.x from source, so a
schema v2 binary should never appear here.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
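
The schema-to-plugin-table mapping described in 1caacda174 can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the function names `cri_runtime_plugin_table` and `emit_kata_runtime_fragment`, the runtime name `kata`, and the exact fragment layout are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: names and fragment layout are illustrative only.

# Print the CRI runtime plugin table that matches a config schema version.
cri_runtime_plugin_table() {
    local schema="$1"
    if (( schema >= 3 )); then
        echo "io.containerd.cri.v1.runtime"   # containerd 2.x
    else
        echo "io.containerd.grpc.v1.cri"      # containerd 1.x (schema 2)
    fi
}

# Print a drop-in fragment for a runtime named "kata" (assumed name).
# The sandboxer field is emitted only on schema >= 3.
emit_kata_runtime_fragment() {
    local schema="$1"
    local table
    table="$(cri_runtime_plugin_table "${schema}")"
    printf "[plugins.'%s'.containerd.runtimes.kata]\n" "${table}"
    printf "  runtime_type = 'io.containerd.kata.v2'\n"
    if (( schema >= 3 )); then
        printf "  sandboxer = 'podsandbox'\n"
    fi
}
```

Calling `emit_kata_runtime_fragment 2` prints the same stanza under the older plugin table and leaves out the sandboxer line.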
Fabiano Fidêncio
7428832c86 tests/nydus: make containerd config schema-aware
Configure containerd for nydus differently depending on the active
config schema, because conf.d drop-in fragments are only honoured the
same way by containerd 2.x.

config_containerd now delegates to _containerd_resolved_schema_version
(from common.bash) to detect the active schema and passes it to
config_containerd_core, which emits schema-appropriate config:

  schema >= 3 (containerd v2.x):
    Keep the base config and add a conf.d drop-in fragment using the
    io.containerd.cri.v1.runtime plugin (sandboxer = 'podsandbox') and
    io.containerd.cri.v1.images to select nydus as the snapshotter.

  schema 2 (containerd v1.x):
    conf.d is not honoured the same way, so replace config.toml
    wholesale with a complete, self-contained file using the
    io.containerd.grpc.v1.cri plugin with nydus as the snapshotter and
    no sandboxer field.

The [proxy_plugins] block is written in both cases as it is
schema-version agnostic.

Teardown restores the whole config.toml (schema v2 path) or removes the
drop-in fragment (schema v3+ path) as appropriate.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
1bb43d0a19 tests/common: make overwrite_containerd_config schema-aware
Rewrite overwrite_containerd_config so that it works with containerd
v1.x (schema v2) as well as containerd v2.x (schema v3+):

- Always regenerate /etc/containerd/config.toml from the installed
  binary via "sudo containerd config default".
- Call ensure_containerd_conf_d_rootful_api_sockets after regenerating
  the base config.
- Detect the effective schema via _containerd_resolved_schema_version.
- Schema >= 3 (containerd v2.x): write io.containerd.cri.v1.runtime
  plugin path with sandboxer = podsandbox into a conf.d drop-in.
- Schema 2 (containerd v1.x): write io.containerd.grpc.v1.cri plugin
  path without sandboxer into the drop-in.

check_containerd_config_for_kata no longer appends a schema guard;
the function supports both schema generations intentionally.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
18fbf4cd5d tests/common: fix install_cri_containerd for containerd 2.x
Three issues prevented containerd 2.x from working correctly after
installation:

1. Socket uid/gid mismatch: "containerd config default" was run as the
   unprivileged user, which produced uid = <runner-uid> in the API
   socket stanza instead of uid = 0.  Run it under sudo so the default
   output is owned by root.

2. Stale systemd unit: the CI runner ships a pre-installed containerd
   whose unit file is left in place after the binary is replaced by the
   test installer.  The old unit causes "MigrateConfigTo: index out of
   range" panics when the new binary tries to load a schema v4 config.
   Always overwrite the unit file from the template so the running
   binary and the unit file stay in sync.

3. Schema guard removed: install_cri_containerd installs whatever
   version was requested (v1.7 or v2.3) and must not abort on a valid
   schema v2 binary.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
fbf133ce3a tests/common: add containerd config schema helpers
Introduce helper functions used by later commits to make containerd
configuration schema-aware.

_containerd_blob_schema_version():
  Parse the version = <n> line from a containerd config blob and echo
  the integer.

_containerd_resolved_schema_version():
  Run "containerd config default" and return the schema version of the
  active binary.  Drives conditional logic in overwrite_containerd_config
  and other helpers.

containerd_emit_rootful_api_socket_overrides():
  Emit the TOML fragment that fixes uid/gid on the grpc/ttrpc sockets.
  Schema v3 uses top-level [grpc]/[ttrpc]; schema v4+ uses plugin-scoped
  tables.

require_containerd_config_schema_v3_plus() /
require_containerd_binary_default_schema_v3_plus():
  Guard helpers that abort with a clear message when the installed
  containerd is older than v2.x.  Used only in test paths that
  explicitly build containerd 2.x from source.

containerd_render_config_default_with_imports():
  Write a fresh "containerd config default" to a file and ensure the
  conf.d import glob is present, ready for drop-in fragments.

export_go_toolchain_for_containerd_source_builds():
  Set GOTOOLCHAIN=auto so "go build" of containerd 2.x downloads the
  exact toolchain in its go.mod without changing the global Go version.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
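
As a rough illustration of the "parse the version = <n> line" helper described in fbf133ce3a, the sketch below reads a config blob on stdin and prints the schema integer. The function name `blob_schema_version` and the stdin interface are assumptions; the real `_containerd_blob_schema_version` may take its input differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the helper from the commit.
# Reads a containerd config blob on stdin and prints the integer from the
# first "version = <n>" line. Returns 1 when no such line is found.
blob_schema_version() {
    local line
    local re='^[[:space:]]*version[[:space:]]*=[[:space:]]*([0-9]+)[[:space:]]*$'
    # The "|| [[ -n ... ]]" clause handles a final line without a newline.
    while IFS= read -r line || [[ -n "${line}" ]]; do
        if [[ "${line}" =~ ${re} ]]; then
            echo "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}
```

A caller could pipe the output of "containerd config default" into this function to learn which schema the installed binary produces by default.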
Fabiano Fidêncio
8ffe4e6c02 tests: add journalctl diagnostics on containerd restart failure
When restart_systemd_service_with_no_burst_limit fails or times out
waiting for the containerd socket, emit "journalctl -xeu
containerd.service" output so the failure reason is visible in CI logs
without requiring a separate log-collection step.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
e122d7ffb0 versions: bump containerd to 2.3 and define minimum/latest test matrix
Bump the containerd version used by CI from v1.7.25 to v2.3.0.

Rename the version-range fields in versions.yaml and throughout the
GitHub Actions workflows from lts/active/version/sandbox_api to
minimum/latest to make their meaning self-evident:

  minimum: "v1.7"   # oldest containerd branch under test
  latest:  "v2.3"   # newest containerd branch under test

Drop the bare version field (superseded by the matrix) and the
sandbox_api alias (covered by latest).  Update all containerd_version
matrix entries in the workflow files accordingly, and update
gha-run-k8s-common.sh to resolve the new key names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <noreply@cursor.com>
2026-06-08 19:20:14 +02:00
Fabiano Fidêncio
b119b051cb kata-deploy: support drop-in configs for default runtimes
Allow operators to provide per-shim drop-in TOML for built-in runtimes
and reconcile stale override files so upgrades and migrations remain
safe when drop-ins are added or removed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Codex
2026-06-08 13:31:03 +02:00
Fupan Li
024c2531a5 Merge pull request #13029 from fidencio/topic/rfc-composable-vm-images
docs: add composable VM images design proposal
2026-06-08 18:40:35 +08:00
Fabiano Fidêncio
2440b5940b docs: add composable VM images design proposal
Add an RFC document describing the composable image architecture that
replaces monolithic guest rootfs images with a lean base image plus
purpose-specific addon images cold-plugged as virtio-blk devices.

The proposal covers the runtime configuration (extra_images), host-side
cold-plugging, guest-side mounting via systemd and dm-verity, agent-side
dynamic path resolution, the image build pipeline, and the security
model.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-07 13:58:17 +02:00
Fabiano Fidêncio
57c61e0c2f tests: unskip hard-coded policy tests on qemu-tdx-runtime-rs
Enable the hard-coded init-data policy test gate for qemu-tdx-runtime-rs
so runtime-rs and Go TDX variants exercise the same Kubernetes policy
coverage.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-06-06 22:48:20 +02:00
Fabiano Fidêncio
43321c7a78 Merge pull request #12931 from mythi/qemu-tdx-tests
tests: fix TDX runtime-rs and initdata tests
2026-06-06 11:42:19 +02:00
Fabiano Fidêncio
f6ff9578d4 Merge pull request #13161 from kata-containers/sprt/remove-configure-mariner
ci: remove Mariner annotations and use new config
2026-06-05 20:22:58 +02:00
Mikko Ylinen
013e901f1b tests: re-enable initdata tests for qemu-tdx
The CoCo initdata tests for signature verification and authenticated
registry never worked on qemu-tdx, so they have been disabled until now.

Add them back now that all necessary fixes are in place.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Mikko Ylinen
9313e336b5 tests: set image.image_pull_proxy for CDH initdata
initdata tests set kernel arguments to "" which resets the
kernel arguments configured by Helm install. However, TDX
runner depends on agent.https_proxy= kernel arguments to pull
images.

In order for initdata tests to work on TDX, the same needs to
be added to CDH configuration via image.image_pull_proxy.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Mikko Ylinen
f3a0ef6a7c tests: use kubectl set to configure KBS env
No need to patch yamls locally. Also, set RUST_LOG=debug
and enable https_proxy for all TDX targets when the runner
has HTTPS_PROXY set.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-05 16:04:05 +03:00
Fabiano Fidêncio
743b0a4839 Merge pull request #13165 from stevenhorsman/bump-go-to-1.25.11
versions: bump golang to 1.25.11
2026-06-04 20:24:57 +02:00
Fabiano Fidêncio
2a1ce7b8c4 Merge pull request #12539 from mythi/no-vcpu-hotplug
Disable CPU hotplug when confidential guest setting enabled
2026-06-04 10:56:52 +02:00
stevenhorsman
879912be25 versions: bump golang to 1.25.11
Bump the go version to resolve CVEs:
- GO-2026-5037
- GO-2026-5038
- GO-2026-5039

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-04 08:49:17 +01:00
Aurélien Bombo
de5333f275 ci: remove Mariner annotations and use new config
This is a follow-up to #13126 where we forgot to remove this now-unused code.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-06-03 09:25:12 -05:00
Mikko Ylinen
018389cb22 tests: enable k8s-sandbox-vcpus-allocation.bats for tdx and coco-dev
k8s-sandbox-vcpus-allocation.bats was disabled for qemu-tdx due to
errors when moving to use "upstream" TDX KVM code. The failing test
is vcpus-less-than-one-with-no-limits pod which ends up getting
x86 default MaxCPU = 240 and erroring:

Number of hotpluggable cpus requested (240) exceeds the maximum cpus supported by KVM (224)

TDX max vcpus is capped to host's logical CPUs so 240 is too much.

With the maxcpus logic fixed (=maxcpus not set at all) for configurations
where confidential guest is enabled, qemu-tdx can be enabled for
k8s-sandbox-vcpus-allocation.bats again.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-06-03 15:27:35 +03:00
stevenhorsman
144ab161f1 tests: bump golang.org/x/sys dependency
Bump golang.org/x/sys from v0.19.0 to v0.44.0 to resolve CVE:
- GO-2026-5024

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Generated-By: IBM Bob
2026-06-03 09:56:54 +01:00
Fabiano Fidêncio
230e01b04e Merge pull request #13126 from kata-containers/topic/runtimes-introduce-azure-specific-configs
runtime/runtime-rs: introduce Azure specific configs
2026-06-02 09:17:09 +02:00
Manuel Huber
7d9a143747 ci: cover EROFS snapshotter default_size=0 path
kata-deploy currently hard-codes the EROFS snapshotter
default_size to "10G", so the CoCo EROFS CI lane only
exercises the path where the snapshotter provides an rwlayer.

Use the generic containerd.userDropIn support for the EROFS
default_size and thread it through the Kubernetes CI helpers.
Keep the kata-deploy default at "10G" to preserve current
behavior, but allow the workflow to set "0" for the runtime-rs
no-rwlayer path.

Expand the existing EROFS snapshotter job to run both values.
The override is written to containerd as a TOML string so "0"
is not parsed as an integer.

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-28 22:54:56 +00:00
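
The point in 7d9a143747 about writing the override as a TOML string can be illustrated with the sketch below. It is only an assumption-laden example: the function name `emit_erofs_default_size` is made up, and the plugin table `io.containerd.snapshotter.v1.erofs` is assumed to be where `default_size` lives.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the table name and function name are assumptions.
# Emits a drop-in fragment where default_size is always a quoted TOML
# string, so a value such as 0 is not parsed as an integer.
emit_erofs_default_size() {
    local size="$1"
    printf "[plugins.'io.containerd.snapshotter.v1.erofs']\n"
    printf '  default_size = "%s"\n' "${size}"
}
```

With this shape, `emit_erofs_default_size 0` and `emit_erofs_default_size 10G` both produce a string-typed value.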
Fabiano Fidêncio
744ab0b548 ci: improve kata-deploy pod wait and timeout diagnostics
Make kata-deploy deployment waits more robust by deriving the pod
selector from the rendered helm values and using it consistently for
readiness checks and logs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Fabiano Fidêncio
81ce51a9aa ci: target Azure CLH runtimes directly in AKS tests
Switch AKS Mariner matrix entries to clh-azure handlers and remove the
temporary host-OS based helm value overrides.

Update integration test wiring and required test labels so CI tracks the
new runtime names.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-28 23:32:37 +02:00
Manuel Huber
3e874d0eaf tests: accept EROFS empty-image rootfs rejection
The empty-image test expects pod creation to fail. With an EROFS
snapshot that has a disk-backed rwlayer, runtime-rs can still reject
that pod with the existing unsupported mount-count error.

With default_size=0, there is no rwlayer mount. The same negative test
can instead reach the bind rootfs shape produced for the empty active
snapshot, which runtime-rs rejects as an unsupported rootfs mount.

Accept both messages so the test covers the expected failure for both
EROFS rwlayer modes.

Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-27 17:12:20 +00:00
Manuel Huber
6a715cf4f7 tests: nvidia: No policy for runtime-rs path
The current if condition causes agent security policies to be
attached to the non-TEE NVIDIA runtime-rs runtime class. While
it is good to see that this works, it is not intended. Thus,
replace the condition with is_confidential_gpu_hypervisor.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-25 16:00:49 -07:00
Fabiano Fidêncio
f763e9cca9 tests: Add NUMA topology / GPU placement tests to the NV CIs
Add k8s-nvidia-numa.bats with five tests that validate NUMA behaviour
on hosts where NUMA is configured by default (qemu-nvidia-gpu,
qemu-nvidia-gpu-snp, qemu-nvidia-gpu-tdx):

1. Multi-node sandbox (large workload spanning all host NUMA nodes):
   - Guest NUMA node count matches host
   - Guest vCPU distribution is balanced across nodes (max-min <= 1)
   - Guest memory is distributed across NUMA nodes
   - Host-side vCPU pinning is balanced across NUMA nodes

2. Right-sized single-node sandbox (small workload fitting one node):
   - Guest collapses to a single NUMA node
   - All host vCPU threads pinned to that one NUMA node

3. GPU passthrough with VFIO, multi-node:
   - Guest NUMA topology is balanced (same as test 1)
   - Guest GPU's NUMA node matches the host GPU's NUMA node
     (resolved via the vfio-pci,host=<BDF> from the QEMU command
     line and /sys/bus/pci/devices/<BDF>/numa_node)
   - QEMU command line contains pxb-pcie and policy=bind
   - Host vCPU pinning is balanced

4. GPU passthrough with VFIO, right-sized single-node: small workload
   plus GPU that fits in a single host NUMA node:
   - Guest collapses to a single NUMA node
   - The chosen node is the GPU's host NUMA node, not just any node
     that fits — verified by matching host-nodes= in the memory
     backend and pxb-pcie numa_node= against the GPU's host node
   - Guest GPU reports the same NUMA node as the host GPU

5. Explicit numa_mapping in the runtime TOML (QEMU-only):
   - Drops a config.d/ fragment that sets numa_mapping = ["1"], so the
     auto-derive + right-sizing path is bypassed entirely
   - Guest sees exactly 1 NUMA node
   - QEMU memory backend is bound to host node 1 (host-nodes=1,
     policy=bind), not host node 0
   - Host-side vCPU threads land on host node 1
   - Drop-in is removed on teardown so subsequent tests are unaffected

Guest-side checks use a dedicated container image
(quay.io/kata-containers/numa) that reads sysfs and prints results to
stdout — no kubectl exec or CoCo policy overrides needed.

Host-side checks (crictl, pgrep, taskset) run directly on the host
via sudo; a standalone numa-pinning-check.sh script handles the vCPU
thread affinity inspection.  The config.d/ helpers used by test 5 are
runtime-agnostic (probe Go vs runtime-rs layout on disk) but the test
is gated to qemu-* shims since runtime-rs does not yet implement
NUMA.

Skips cleanly on single-NUMA hosts, unsupported hypervisors, or when
no nvidia.com/pgpu resources are available (GPU tests only).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-24 22:00:46 +02:00
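
The "balanced (max-min <= 1)" criterion used by the NUMA tests in f763e9cca9 can be expressed as a small check over per-node vCPU counts. The function name `vcpu_counts_balanced` is hypothetical, and how the counts are collected from the guest is left out of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the balance criterion only.
# Arguments: one vCPU count per NUMA node.
# Succeeds when the largest and smallest counts differ by at most one.
vcpu_counts_balanced() {
    if (( $# == 0 )); then
        return 1   # nothing to compare
    fi
    local min="$1" max="$1" n
    for n in "$@"; do
        if (( n < min )); then min="${n}"; fi
        if (( n > max )); then max="${n}"; fi
    done
    (( max - min <= 1 ))
}
```

For example, counts of 4 4 3 pass the check, while 6 2 do not.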
Fabiano Fidêncio
20705470e9 docs: Add NUMA support guide for Kata Containers with QEMU
Add a step-by-step how-to guide covering host inspection, Kata NUMA
drop-in setup (via kata-deploy Helm and manual config.d/), pod
deployment examples, and guest/host verification procedures.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-24 22:00:46 +02:00
Fabiano Fidêncio
cbcdd999e4 Merge pull request #12957 from Apokleos/fix-sb-api
runtime-rs: Fix sandbox-api lifecycle and CRI status handling
2026-05-23 09:26:14 +02:00
Fabiano Fidêncio
5d3e1e6396 kata-deploy: verify kata-runtime label remains stable on rke2/k3s
The retry loop added in efd468df3f still allows the install to declare
success while inside the kubelet's post-restart re-register window.

On rke2/k3s, `systemctl restart rke2-agent` restarts both containerd
and the kubelet, but `wait_till_node_is_ready` polls `.status.conditions[Ready]`
every 2 s and returns on the first `True` observation it sees. By default
the kubelet only publishes node status every ~10 s, so that first `True`
is almost always the stale value from before the restart — the kubelet
hasn't actually finished restarting yet. `label_node_with_retry` then
applies the label, sleeps 1 s, reads back "true" (still stale, kubelet
still down), and returns Ok. Install completes, `/readyz` flips to 200,
helm releases its `--wait`, and the bats test starts — and only then
does the kubelet finish coming up, re-register the node, and clobber
the label with its cached set. The lifecycle test sees an empty
`katacontainers.io/kata-runtime` and fails:

  # Node label katacontainers.io/kata-runtime:
  not ok 1 Kata artifacts are present on host after install

A single-shot verification can't distinguish "still stale true" from
"truly stable true after kubelet re-register". Replace it with a
stability window: after (re)applying the label, require it to remain
at the expected value for STABILITY_CHECKS=6 consecutive observations
spaced CHECK_INTERVAL=2 s apart (≈ 12 s — comfortably more than the
kubelet's status-update period). If the value ever drifts inside the
window, re-apply and restart the stability counter. Bounded by
MAX_APPLY_ATTEMPTS=12, so worst case is ~3 min; happy path adds ~12 s
to install.

Also add a short polling loop to the test's own label assertion as
belt-and-suspenders for any leftover transient race, matching the
existing retry pattern used for the container-runtime version check.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-22 11:53:18 +02:00
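
The stability window described in 5d3e1e6396 can be sketched in shell as follows. This is not the kata-deploy implementation: the function name `apply_until_stable` and the two callback arguments (a function that applies the label and a function that prints its current value) are assumptions made so the loop can be shown on its own. The three tunables reuse the names and defaults quoted in the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the apply-then-verify-stability loop.
STABILITY_CHECKS="${STABILITY_CHECKS:-6}"       # consecutive good reads needed
CHECK_INTERVAL="${CHECK_INTERVAL:-2}"           # seconds between reads
MAX_APPLY_ATTEMPTS="${MAX_APPLY_ATTEMPTS:-12}"  # upper bound on re-applies

# Usage: apply_until_stable <expected> <apply_fn> <read_fn>
#   apply_fn <expected>  applies the value (assumed callback)
#   read_fn              prints the currently observed value (assumed callback)
apply_until_stable() {
    local expected="$1" apply_fn="$2" read_fn="$3"
    local attempt stable
    for (( attempt = 1; attempt <= MAX_APPLY_ATTEMPTS; attempt++ )); do
        "${apply_fn}" "${expected}" || continue
        stable=0
        while (( stable < STABILITY_CHECKS )); do
            sleep "${CHECK_INTERVAL}"
            # Any drift ends this window; the outer loop re-applies
            # and the counter starts again from zero.
            if [[ "$("${read_fn}")" != "${expected}" ]]; then
                break
            fi
            stable=$(( stable + 1 ))
        done
        if (( stable >= STABILITY_CHECKS )); then
            return 0
        fi
    done
    return 1
}
```

Requiring consecutive observations, rather than a single read-back, is what lets the loop tell a stale value apart from one that survived the kubelet re-registering the node.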
Alex Lyn
adf6d43e24 test: skip TestContainerMemoryUpdate for sandbox api
Temporarily skip the `TestContainerMemoryUpdate` test case
for sandbox api.

This test case is currently skipped in other VMMs (e.g.,
QEMU, Cloud-Hypervisor) due to known issues and environmental
stability concerns.
To maintain consistency across the project, we are skipping it
for sandbox as well.

A follow-up PR will be dedicated to addressing these issues and
properly enabling/refining this test case for all VMMs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-22 10:46:44 +08:00
Alex Lyn
b5349f4d78 versions: bump containerd to 2.3 for sandbox API tests
containerd 2.3 requires Go 1.26.3, but Kata still pins Go 1.25.10.
Use Go 1.26.3 for the sandbox-api job so that make cri-integration
can build containerd from source.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-22 10:46:16 +08:00
Alex Lyn
9f78dc687f tests: exclude TestContainerRestart from the cri-containerd test list
Creating a new container in the same sandbox VM after the previous
container has exited and been removed has never been supported by
kata-containers (neither with the go-based nor the rust-based runtime).
When the last container is removed the kata VM shuts down, so any
attempt to start a new container in the same sandbox fails.

This test exercises a use-case kata does not currently support, and it
has never been part of the passing list for good reason.  Mark it
explicitly excluded with a comment so it is clear this is a deliberate
omission rather than an oversight.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-22 10:45:50 +08:00
Alex Lyn
a7739579d6 tests: Use podsandbox sandboxer for the runc sanity check
The check_daemon_setup function verifies that containerd + runc are
functional before the real kata tests run. Using the shim sandboxer
for this runc check hits a known containerd bug where the OCI spec
is not populated before NewBundle is called, so config.json is never
written and containerd-shim-runc-v2 fails at startup.

See containerd/containerd#11640

The sandboxer choice is irrelevant for this sanity check, so use
podsandbox which works correctly with runc.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-05-22 10:44:38 +08:00
stevenhorsman
f47d1c0d69 tests/agent-ctl: Add debug
The agent-ctl tests are failing in the CI, but there is no log reporting,
so debugging is not possible. Add some debug to help.

Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-19 12:00:47 +01:00
Fabiano Fidêncio
2c1dec0c14 Merge pull request #13035 from stevenhorsman/docs-static-checks-cleanup
ci: remove docs URL alive check workflow
2026-05-18 17:59:03 +02:00
Fabiano Fidêncio
05f836ea23 Merge pull request #13038 from stevenhorsman/move-k8s-measured-rootfs
ci: Move measure-rootfs to run on TEE PRs
2026-05-18 17:29:25 +02:00
Hyounggyu Choi
f6fce19e01 Merge pull request #13062 from BbolroC/skip-coco-test-with-no-reference-values-ibm-sel
test: skip CDH resource test for qemu-se without reference values
2026-05-18 14:47:50 +02:00
Hyounggyu Choi
540986bc8f test: skip CDH resource test for qemu-se without reference values
Since gc and trustee were bumped (#13046), the test
"Cannot get CDH resource when affirming policy is set without reference values"
has started failing for IBM SEL.

The attestation policy for IBM SEL returns an "affirming"
result whenever the claim can be parsed successfully,
meaning the evidence verification succeeds. As a result,
the negative test above always produces a positive result.

Skip this negative test for IBM SEL environments
(e.g. qemu-se*).

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-05-18 08:40:16 +02:00
Steve Horsman
59b27c4645 Merge pull request #13057 from microsoft/danmihai1/deploy-check-hypervisor-name
gha: k8s: reject unsupported KATA_HYPERVISOR values
2026-05-17 18:43:49 +01:00
Dan Mihai
ddc36060d2 gha: k8s: reject unsupported KATA_HYPERVISOR values
Exit early with an error message instead of starting kata-deploy if
the value of KATA_HYPERVISOR is not expected during CI.

For example: "cloud-hypervisor" was renamed recently to
"clh-runtime-rs" and user scripts depending on the old name were
getting tangled in kata-deploy instead of just rejecting the old
value quickly.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-05-16 01:04:31 +00:00
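
An early-rejection check in the spirit of ddc36060d2 might look like the sketch below. The function name `validate_kata_hypervisor` is hypothetical, and the accepted names are only a few examples taken from elsewhere in this log, not the actual list used by CI.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the accepted values are illustrative, not the
# real CI list.
validate_kata_hypervisor() {
    local hypervisor="${1:-}"
    case "${hypervisor}" in
        clh-runtime-rs|qemu-tdx|qemu-tdx-runtime-rs|qemu-nvidia-gpu)
            return 0
            ;;
        *)
            echo "ERROR: unsupported KATA_HYPERVISOR value: '${hypervisor}'" >&2
            return 1
            ;;
    esac
}
```

A deployment script could call `validate_kata_hypervisor "${KATA_HYPERVISOR}" || exit 1` before starting kata-deploy, so a renamed value fails fast with a clear message.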
Dan Mihai
b85fc8ed13 tests: export target_branch="${branch}"
Avoid running "git remote show origin" repeatedly when common.bash
gets sourced multiple times and target_branch was not specified by
the caller.

Repeated "git remote show origin" calls inflicted the additional
overhead of authenticating and communicating with the remote git
repository.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-05-16 00:59:44 +00:00
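
The caching effect described in b85fc8ed13 can be sketched as below. The function names `lookup_default_branch` and `ensure_target_branch` are hypothetical, and parsing the "HEAD branch" line of "git remote show origin" is an assumption about how the branch is derived.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the code from common.bash.

# Ask the remote for its default branch (the expensive call).
lookup_default_branch() {
    git remote show origin | awk '/HEAD branch/ {print $NF}'
}

# Resolve target_branch once and export it, so later sourcing of the
# same file (including in child processes) skips the remote lookup.
ensure_target_branch() {
    if [[ -z "${target_branch:-}" ]]; then
        local branch
        branch="$(lookup_default_branch)"
        export target_branch="${branch}"
    fi
}
```

Because the variable is exported, a child script that sources the same file sees target_branch already set and does not contact the remote again.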
Dan Mihai
0f3df5d1e4 Merge pull request #13025 from manuelh-dev/mahuber/img-pull-policy
tests: generate guest-pull image pull agent security policies
2026-05-15 14:09:00 -07:00
Fabiano Fidêncio
c19bdbf23b tests: nvidia-nim: use trusted storage templates for runtime-rs
Now that runtime-rs supports block-encrypted emptyDir volumes, remove
the no-trusted-storage workaround templates and the is_runtime_rs
branching in the NIM test. Runtime-rs now uses the same TEE templates
as the Go runtime with emptyDir + PVC at 48Gi memory, instead of the
128Gi workaround that compensated for lacking trusted storage.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-14 22:56:11 +02:00
Fabiano Fidêncio
54aaa1ea2a tests: enable trusted ephemeral storage for runtime-rs
Remove the runtime-rs skip from the trusted ephemeral data storage
test now that runtime-rs implements block-encrypted emptyDir volumes.

Also remove the genpolicy drop-in that disabled encrypted_emptydir
for runtime-rs and the corresponding copy logic in tests_common.sh.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
2026-05-14 22:56:11 +02:00
Manuel Huber
ed4233bf91 rootfs: cdh: Update CDH to new version
Update CDH to a newer version and:
- adjust the NVIDIA root filesystem build to reflect the change from
  using libcryptsetup to using the cryptsetup binary.
- adjust image-pull test cases to conduct parallel write operations
  on the /dev/trusted_store backed guest image pull location since
  issue #12721 has been solved on CDH side.

Fixes #12721

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-05-13 20:20:45 +02:00
stevenhorsman
5c55726d11 tests/k8s: Update measured-rootfs image
Try switching the Docker nginx image to the one from our versions.yaml
so we avoid rate limit issues.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-13 17:17:48 +01:00
stevenhorsman
2870f7c2dd ci: Move measure-rootfs to run on TEE PRs
k8s-measured-rootfs only runs on confidential runtimes,
so we should move it into the subset of tests that run on TEEs.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-05-13 17:01:50 +01:00