46 Commits

Author SHA1 Message Date
Fabiano Fidêncio
9e3bd6b576 tests: fix kata-deploy lifecycle test reliability
Fix two issues in kata-deploy-lifecycle.bats that caused failures on
k3s, k0s and rke2:

  run_on_host():
  - `kubectl run --rm -i` causes k3s/rke2 to inject session-recording
    banners into stdout, polluting command output and breaking string
    assertions. Replace with a create/wait/logs/delete sequence so only
    the container's actual stdout is captured.

  "Artifacts are fully cleaned up after uninstall":
  - After a CRI restart the kubelet may briefly report "Unknown" for the
    container runtime version. Retry for up to 60s before asserting.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
Fabiano Fidêncio
ed4f6ebc9e tests: use readiness probes to wait for kata-deploy install
Now that kata-deploy has a proper readiness probe (/readyz returns 200
only after install completes), replace the ad-hoc wait strategies with
kubectl wait --for=condition=Ready on the kata-deploy pods.

Note: helm --wait is ineffective for single-node clusters with
maxUnavailable=1 (the DaemonSet is considered ready with 0 ready pods),
so the CI uses kubectl wait on the pod readiness condition directly.

  gha-run-k8s-common.sh:
  - Drop the waitForProcess polling loop for Running pods
  - Drop the `sleep 60s` with its FIXME comment
  - Add kubectl wait --for=condition=Ready instead

  helm-deploy.bash:
  - Drop the extra `kubectl rollout status` after helm
  - Drop the `sleep 60`
  - The existing --wait on the helm command now suffices

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-05-03 22:09:08 +02:00
Fabiano Fidêncio
b7eb3ae402 tests: Fix shellcheck issues in helm-deploy.bash
Address shellcheck warnings including proper variable quoting,
use of [[ ]] over [ ], declaring and assigning variables separately,
and adding appropriate shellcheck disable directives where needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-24 08:14:08 +02:00
Fabiano Fidêncio
143f9a7882 tests: Fix shellcheck issues in run-kata-deploy-tests.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:08 +02:00
Fabiano Fidêncio
140e08044f tests: Fix shellcheck issues in gha-run.sh
Fix shellcheck warnings and notes identified by running
shellcheck --severity=style.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-24 08:14:08 +02:00
Fabiano Fidêncio
cf1e6f82f2 tests: Show full kata-deploy pod logs in CI
Remove --tail=N limits from `kubectl logs` for kata-deploy pods so
the complete output is visible in CI job logs for debugging.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-19 13:24:31 +02:00
Fabiano Fidêncio
2131147360 tests: add kata-deploy lifecycle tests for restart resilience and cleanup
Add functional tests that cover two previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:53 +02:00
Fabiano Fidêncio
56c3618c1d tests: kata-deploy: wait for API recovery after uninstall
kata-deploy's SIGTERM cleanup restarts the CRI runtime, which on
k3s/rke2 takes down the API server temporarily. The helm uninstall
may complete with errors, and the next test suite would start with
a dead API. Add a wait loop after uninstall to ensure the API is
available before proceeding.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
c9061f9e36 tests: kata-deploy: Increase post-deployment wait time
Increase the sleep time after kata-deploy deployment from 10s to 60s
to give more time for runtimes to be configured. This helps avoid
race conditions on slower K8s distributions like k3s where the
RuntimeClass may not be immediately available after the DaemonSet
rollout completes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
0fb2c500fd tests: kata-deploy: Merge E2E tests to avoid timing issues
Merge the two E2E tests ("Custom RuntimeClass exists with correct
properties" and "Custom runtime can run a pod") into a single test, as
those 2 are very much dependent of each other.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
fef93f1e08 tests: kata-deploy: Use die() instead of fail() for error handling
Replace fail() calls with die() which is already provided by
common.bash. The fail() function doesn't exist in the test
infrastructure, causing "command not found" errors when tests fail.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
26c534d610 tests: Use shims.disableAll in test helpers
Update the CI and functional test helpers to use the new
shims.disableAll option instead of iterating over every shim
to disable them individually.

Also adds helm repo for node-feature-discovery before building
dependencies to fix CI failures on some distributions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
d8a3272f85 kata-deploy: Add tests for custom runtimes Helm templates
Add Bats tests to verify the custom runtimes Helm template rendering,
and that the we can start a pod with the custom runtime.

Tests were written with Cursor's help.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-26 20:50:01 +01:00
Fabiano Fidêncio
ec18dd79ba tests: Simplify kata-deploy test to use helm directly
The kata-deploy test was using helm_helper which made it hard to debug
failures (die() calls would cause "Executed 0 tests" errors) and added
unnecessary complexity.

The test now calls helm directly like a user would, making it simpler
and more representative of real-world usage. The verification job status
is explicitly checked with proper failure detection instead of relying
on helm --wait.

Timeouts are configurable via environment variables to account for
different network speeds and image sizes:
- KATA_DEPLOY_TIMEOUT (default: 600s)
- KATA_DEPLOY_DAEMONSET_TIMEOUT (default: 300s)
- KATA_DEPLOY_VERIFICATION_TIMEOUT (default: 120s)

Documentation has been added to explain what each timeout controls and
how to customize them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-21 20:14:33 +01:00
Fabiano Fidêncio
e0158869b1 tests: Add common bats test runner function
Add run_bats_tests() function to common.bash that provides consistent
test execution and reporting across all test suites (k8s, nvidia,
kata-deploy).

This removes duplicated test runner code from run_kubernetes_tests.sh,
run_kubernetes_nv_tests.sh, and run-kata-deploy-tests.sh.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-20 12:31:55 +01:00
Fabiano Fidêncio
ea18f543b4 tests: kata-deploy: Enable verification during helm install
Enable post-install verification in kata-deploy CI tests. When
HELM_VERIFY_DEPLOYMENT is set, a simple verification pod is created
that runs with the Kata runtime to confirm deployment succeeded.

The verification pod prints kernel info and exits - success indicates
the Kata runtime is properly configured and functional.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-01-16 10:52:43 +01:00
Fabiano Fidêncio
5b01eaf929 tests: Align kata-deploy helm's uninstall
Let's use the same method both on the kata-deploy and k8s tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-04 09:29:35 +01:00
Fabiano Fidêncio
3107533953 tests: Adjust to runtimeClass creation by the chart
It's just a follow-up on the previous commit where we move away from the
runtimeClass creation inside the script, and instead we do it using the
chart itself.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 17:32:18 +01:00
Fabiano Fidêncio
60ba121a0d kata-deploy: nit: Fix test name
Just add a "is" there as it was missing.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-09-15 15:27:54 +02:00
Aurélien Bombo
96f1d95de5 gha: Remove unnecessary install-azure-cli step
az cli is already installed by the azure/login action.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-07-30 10:42:56 -05:00
Fabiano Fidêncio
5f17e61d11 tests: kata-deploy: Remove --wait from helm uninstall
As we're using a `kubectl wait --timeout ...` to check whether the
kata-deploy pod's been deleted or not, let's remove the `--wait` from
the `helm uninstall ...` call as k0s tests were failing because the
`kubectl wait --timeout...` was starting after the pod was deleted,
making the test fail.

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-07-09 14:01:30 +02:00
Aurélien Bombo
9dd3807467 ci: Use OIDC to log into Azure
This completely eliminates the Azure secret from the repo, following the below
guidance:

https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-azure

The federated identity is scoped to the `ci` environment, meaning:

 * I had to specify this environment in some YAMLs. I don't believe there's any
   downside to this.
 * As previously, the CI works seamlessly both from PRs and in the manual
   workflow.

I also deleted the tools/packaging/kata-deploy/action folder as it doesn't seem
to be used anymore, and it contains a reference to the secret.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-06-06 15:26:10 -05:00
Fabiano Fidêncio
28be53ac92 kata-deploy: Create runtimeclasses by default
Let's make the life of the users easier and create the runtimeclasses
for them by default.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-31 11:29:44 +01:00
Fabiano Fidêncio
404e212102 tests: kata-deploy: Use helm_helper()
With this we switch to fully testing with helm, instead of testimg with
the kustomizations (which will soon be removed).

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-26 13:30:15 +01:00
Fabiano Fidêncio
c337a21a4e shellcheck: kata-deploy: Fix warnings
He were fixing the few warnings we found in the files present in the
functional tests for kata-deploy.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-03-05 19:44:27 +01:00
stevenhorsman
c5ff513e0b shellcheck: Fix shellcheck SC2068
> Double quote array expansions to avoid re-splitting elements

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-03-04 09:35:46 +00:00
Stephane Talbot
f80e7370d5 test: Verify deployement of kata-deploy on microk8s
Enable fonctional test to verify deployment of kata-deploy on a Microk8s cluster

Signed-off-by: Stephane Talbot <Stephane.Talbot@univ-savoie.fr>
2025-02-28 10:10:29 +01:00
Beraldo Leal
53b8158a81 tests: adding debug and skip to kata-deploy
If a test is failing during setup, makes no much sense to run the suite.
Let's skip and add some debug messages.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Beraldo Leal
3e8b4806b8 tests: increase debug messages for kata-deploy
When the timeout happens we can't tell much information about the nodes.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Beraldo Leal
c99ba42d62 deps: bumping yq to v4.40.7
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.

Fixes #9354
Depends-on:github.com/kata-containers/tests#5818

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Fabiano Fidêncio
e81e8a4527 tests: kata-deploy: Adjust timeout
10 minutes is waay too long.  Let's give it 4 minutes only.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-27 06:23:00 +02:00
Fabiano Fidêncio
fba5793c0d tests: kata-deploy: Run the tests from "${repo_root_dir}"
Let's see if it helps with issues like:
```
error: must build at directory: not a valid directory: evalsymlink
failure on
'"/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/../../..//tools/packaging/kata-deploy/kata-cleanup/overlays/k0s"'
: lstat
/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/":
no such file or directory
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-27 06:23:00 +02:00
Fabiano Fidêncio
8a8a7ea0e5 tests: kata-deploy: Show more logs in the setup()
This will also help us to better understand possible failures with the
CI.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-27 05:05:06 +02:00
Fabiano Fidêncio
47d9589e9b tests: kata-deploy: Show output of passing tests
This will help us to debug failures and compare passing and failures
outputs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-27 05:05:06 +02:00
ChengyuZhu6
e23737a103 gha: refactor code with yq for better clarity
refactor code with yq for better clarity:

Before:
```bash
yq write -i "${tools_dir}/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml"
'spec.template.spec.containers[0].env[7].value' "${KATA_HYPERVISOR}:${SNAPSHOTTER}"
```

After:
```bash
yq write -i \
  "${tools_dir}/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml" \
  'spec.template.spec.containers[0].env[7].value' \
  "${KATA_HYPERVISOR}:${SNAPSHOTTER}"
```

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
Wainer dos Santos Moschetta
24c163e6e1 tests/kata-deploy: fix checker for kata-deploy running
Currently, the checking for kata-deploy is running assume that the
daemonset scheduled at least one pod, however it might not had and the
kubectl wait command fails due to "error: no matching resources found".

On CI I've observed that fail intermittently. I suspect the service
account kata-deploy-sa take a while to show up then no kata-deploy is
scheduled in meanwhile.

Changed the checker logic to use waitForProcess() to keep testing if it is
already running, or hit the timeout (still 10m).

Fixes #9183
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-02-29 22:26:27 -03:00
stevenhorsman
9e718b4e23 gha: kata-deploy: Add containerd status check
After kata-deploy has installed, check that the worker nodes
are still in Ready state and don't have a containerd://Unknown
container runtime versions, identicating that container isn't working
to ensure that we didn't corrupt the containerd config during kata-deploy's edits

Fixes: #8678
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2023-12-20 09:10:43 +00:00
Fabiano Fidêncio
0015257636 ci: kata-deploy: Add deploy-k8s argument to gha-run.sh
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
bf2cb02283 ci: kata-deploy: Expland tests to run on k0s / rke2
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 13:38:10 +02:00
Fabiano Fidêncio
9e1fb8a966 ci: kata-deploy: Export KUBERNETES env var
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.

This was also done as part of fa62a4c01b,
for the k8s tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-19 12:37:56 +02:00
Fabiano Fidêncio
2d896ad12f gha: kata-deploy: Do the runtime class cleanup as part of the cleanup
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
Fabiano Fidêncio
4ffc2c86f3 gha: kata-deploy: Add the first kata-deploy test
This test, at least for now, only checks whether the runtimeclasses
have been properly created.

This is just a migration from a test we had as part of the k8s suite.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 18:54:46 +02:00
Fabiano Fidêncio
285e616b5e tests: common: Ensure test_type is used as part of the cluster's name
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 14:22:16 +02:00
Fabiano Fidêncio
ce6adecd0a gha: kata-deploy: Add run-kata-deploy-tests.sh
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.

Right now it doesn't have any tests, and the command to actually run the
tests is commented out, but right now this is just a placeholder that
will be populated sooner than later.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-17 09:49:03 +02:00
Fabiano Fidêncio
831e73ff91 tests: kata-deploy: Add functional/kata-deploy/gha-run.sh placeholder
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-14 17:46:10 +02:00