Temporarily skip the `TestContainerMemoryUpdate` test case
for sandbox api.
This test case is currently skipped in other VMMs (e.g.,
QEMU, Cloud-Hypervisor) due to known issues and environmental
stability concerns.
To maintain consistency across the project, we are skipping it
for sandbox as well.
A follow-up PR will be dedicated to addressing these issues and
properly enabling/refining this test case for all VMMs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Creating a new container in the same sandbox VM after the previous
container has exited and been removed has never been supported by
kata-containers (neither with the go-based nor the rust-based runtime).
When the last container is removed the kata VM shuts down, so any
attempt to start a new container in the same sandbox fails.
This test exercises a use-case kata does not currently support, and it
has never been part of the passing list for good reason. Mark it
explicitly excluded with a comment so it is clear this is a deliberate
omission rather than an oversight.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The check_daemon_setup function verifies that containerd + runc are
functional before the real kata tests run. Using the shim sandboxer
for this runc check hits a known containerd bug where the OCI spec
is not populated before NewBundle is called, so config.json is never
written and containerd-shim-runc-v2 fails at startup.
See containerd/containerd#11640
The sandboxer choice is irrelevant for this sanity check, so use
podsandbox which works correctly with runc.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
BATS_TEST_COMPLETED is per-test and remains empty in teardown_file.
Track file-level state so successful NIM runs skip the journal dump
while setup or test failures still include node diagnostics.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Place the NIM service into our test namespace. We are still observing
various situations where for some reasons, the NIM service appears in
the default namespace in our CI.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Wait for the NIM operator pod to run before deploying NIM services.
Add a temporary debug function to print resource placement into the
different namespaces. Remove this function again when the NIM tests
are stabilized.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add basic genpolicy support for container environment variables sourced
from metadata.labels.
In this implementation, the relevant labels must be available as input
to the policy tool. This is slightly different from the way variables
sourced from metadata.annotations are treated by the tool: when the
relevant annotation is not available as input, the generated Policy
allows any value. Depending on metadata.labels use cases that we might
encounter maybe the labels will be handled the same way as the
annotations in the future.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Switch the rootfs bundle pull implementatio from using image-rs to
use skopeo and umoci to remove the really long crate dependency
tail that image-rs brings.
Generated-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When kubectl wait times out the pod never reached Ready, so the
existing log collection (which runs after wait succeeds) produces
"-- No entries --" with zero useful information.
Capture kubectl describe and kubectl logs (including previous
container) immediately on timeout so the next CI run shows exactly
why the pod is stuck (ImagePullBackOff, OOMKilled, probe failures,
containerd restart hang, etc.).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
In cleanup_kata_deploy, bail out early when no kata-deploy Helm release
exists so baremetal-* pre-deploy cleanup on fresh clusters does not
block on helm uninstall --wait (up to 10m).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Add qemu-nvidia-gpu-runtime-rs and qemu-nvidia-gpu-snp-runtime-rs to
the NVIDIA GPU test matrix so CI covers the new runtime-rs shims.
Introduce a `coco` boolean field in each matrix entry and use it for
all CoCo-related conditionals (KBS, snapshotter, KBS deploy/cleanup
steps). This replaces fragile name-string comparisons that were already
broken for the runtime-rs variants: `nvidia-gpu (runtime-rs)` was
incorrectly getting KBS steps, and `nvidia-gpu-snp (runtime-rs)` was
not getting the right env vars.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
1. Ignore PodAffinity's preferredDuringSchedulingIgnoredDuringExecution.
2. Ignore additional PodAffinityTerm fields.
3. Add basic tests for the new fields.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The cron-job test workload was missing `runtimeClassName: kata`, which
meant the cron job was not actually being executed under the Kata
runtime, defeating the purpose of the test.
Set it explicitly, consistent with the sibling `job.yaml` workload.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Fix two issues in kata-deploy-lifecycle.bats that caused failures on
k3s, k0s and rke2:
run_on_host():
- `kubectl run --rm -i` causes k3s/rke2 to inject session-recording
banners into stdout, polluting command output and breaking string
assertions. Replace with a create/wait/logs/delete sequence so only
the container's actual stdout is captured.
"Artifacts are fully cleaned up after uninstall":
- After a CRI restart the kubelet may briefly report "Unknown" for the
container runtime version. Retry for up to 60s before asserting.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that kata-deploy has a proper readiness probe (/readyz returns 200
only after install completes), replace the ad-hoc wait strategies with
kubectl wait --for=condition=Ready on the kata-deploy pods.
Note: helm --wait is ineffective for single-node clusters with
maxUnavailable=1 (the DaemonSet is considered ready with 0 ready pods),
so the CI uses kubectl wait on the pod readiness condition directly.
gha-run-k8s-common.sh:
- Drop the waitForProcess polling loop for Running pods
- Drop the `sleep 60s` with its FIXME comment
- Add kubectl wait --for=condition=Ready instead
helm-deploy.bash:
- Drop the extra `kubectl rollout status` after helm
- Drop the `sleep 60`
- The existing --wait on the helm command now suffices
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The ITA_KEY secret was conditionally passed to TDX jobs for Intel
Trust Authority attestation, but it is no longer needed. Remove it
from all workflow files and the test helper export.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
No longer used; its two responsibilities are now expressed directly
in the workflows and the Makefile.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For k8s 1.36.0, the events of a pod are no longer included in the "kubectl describe pod"
output when describing a deployment. Describe using the "app" label instead.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
No need to deviate from how other CoCo targets use Trustee and
enables us to add more tests (e.g., RVPS) that ITA Trustee implemention
does not support.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
At first we thought this only happened with AKS, but it seems this is a
change in k8s 1.36.0 as the tests now started failing outside of AKS as
well.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
All the CIs are failing on the tests and in order to avoid blocking
upstream while allowing enough time for the developers to properly fix
it, let's just not execute the test.
This commit should be reverted once a fix is proposed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor