Update the release workflow and scripts to package and publish
the kata-lifecycle-manager Helm chart alongside kata-deploy.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This chart installs an Argo WorkflowTemplate for orchestrating controlled,
node-by-node upgrades of kata-deploy with verification and automatic
rollback on failure.
The workflow processes nodes sequentially rather than in parallel to
ensure fleet consistency. This design choice prevents ending up with a
mixed-version fleet where some nodes run the new version while others
remain on the old version. If verification fails on any node, the
workflow stops immediately before touching remaining nodes.
Alternative approaches considered:
- withParam loop with semaphore (max-concurrent: 1): Provides cleaner UI
with all nodes visible at the same level, but Argo's semaphore only
controls concurrency, not failure propagation. When one node fails and
releases the lock, other nodes waiting on the semaphore still proceed.
- withParam with failFast: true: Would be ideal, but Argo only supports
failFast for DAG tasks, not for steps with withParam. Attempting to use
it results in "unknown field" errors.
- Single monolithic script: Would guarantee sequential execution and
fail-fast, but loses per-node visibility in the Argo UI and makes
debugging harder.
The chosen approach uses recursive Argo templates (upgrade-node-chain)
which naturally provides fail-fast behavior because if any step in the
chain fails, the recursion stops. Despite the nesting in the Argo UI,
each node's upgrade steps remain visible for monitoring.
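The fail-fast property of the recursive chain can be pictured with a
small shell analogue (upgrade_single_node is a hypothetical stand-in for
the per-node steps, not a function in the chart):

# Each call handles exactly one node and only recurses on success, so a
# failure anywhere leaves the remaining nodes untouched.
upgrade_node_chain() {
    local node="$1"; shift
    [ -z "${node}" ] && return 0
    upgrade_single_node "${node}" || return 1   # fail-fast: stop the chain here
    upgrade_node_chain "$@"                     # recurse over the remaining nodes
}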
A verification pod is required to validate that Kata is functioning
correctly on each node after upgrade. The chart will fail to install
without one. Users must provide the verification pod when installing
kata-lifecycle-manager using --set-file
defaults.verificationPod=./pod.yaml. The pod can also be overridden at
workflow submission time using a base64-encoded workflow parameter.
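A minimal verification pod could look like the sketch below; ${NODE} and
${TEST_POD} are the placeholders substituted by the workflow for each
node, while the runtime class and image are purely illustrative:

cat <<'EOF' > my-verification-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: ${TEST_POD}
spec:
  nodeName: ${NODE}              # pin the check to the node that was just upgraded
  restartPolicy: Never
  runtimeClassName: kata-qemu    # illustrative; use whichever shim the fleet runs
  containers:
    - name: check
      image: busybox
      command: ["true"]
EOF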
When passing the verification pod as a workflow parameter, base64
encoding is required because multi-line YAML with special characters
does not survive the journey through Argo CLI and shell script parsing.
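For example (the parameter name below is illustrative; check the
WorkflowTemplate's parameter list for the real one):

argo submit -n argo --from workflowtemplate/kata-lifecycle-manager \
  -p target-version=3.25.0 \
  -p verification-pod-b64="$(base64 -w0 < ./my-verification-pod.yaml)"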
The workflow validates prerequisites before touching any nodes. If no
verification pod is configured, the workflow fails immediately with a
clear error message. This prevents partial upgrades that would leave
the cluster in an inconsistent state.
During helm upgrade, kata-deploy's verification is explicitly disabled
(--set verification.pod="") because:
- kata-deploy's verification is cluster-wide, designed for initial install
- kata-lifecycle-manager does per-node verification with proper
placeholder substitution (${NODE}, ${TEST_POD})
- Running kata-deploy's verification on each node would be redundant and
could fail due to unsubstituted placeholders
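The per-node upgrade step therefore runs something along these lines
(release name, chart reference, and namespace are simplified for
illustration):

helm upgrade kata-deploy kata-containers/kata-deploy \
  --namespace kata-system \
  --version "${TARGET_VERSION}" \
  --reuse-values \
  --set verification.pod=""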
On verification failure, the workflow triggers an automatic helm
rollback, waits for kata-deploy to stabilize, uncordons the node, and
marks it with a rolled-back status annotation. The workflow then exits
with an error so the failure is clearly visible.
The upgrade flow per node:
1. Prepare: Annotate node with upgrade status
2. Cordon: Mark node unschedulable
3. Drain (optional): Evict pods if enabled
4. Upgrade: Run helm upgrade with --reuse-values
5. Wait: Wait for kata-deploy DaemonSet pod ready
6. Verify: Run verification pod with substituted placeholders
7. Complete: Uncordon and update annotations
Draining is disabled by default because running Kata VMs continue using
their in-memory binaries after upgrade. Only new workloads use the
upgraded binaries. Users who prefer to evict all workloads before
maintenance can enable draining.
Known limitations:
- Fleet consistency during rollback: Because kata-deploy uses a DaemonSet
that is updated cluster-wide, nodes that pass verification are
uncordoned and can accept new workloads before all nodes are verified.
If a later node fails verification and triggers a rollback, workloads
that started on already-verified nodes continue running with the new
version's in-memory binaries while the cluster reverts to the old
version. This is generally acceptable since running VMs continue
functioning and new workloads use the rolled-back version. A future
improvement could implement a two-phase approach that cordons all nodes
upfront and only uncordons after all verifications pass.
The chart requires Argo Workflows v3.4+ and uses multi-arch container
images supporting amd64, arm64, s390x, and ppc64le.
Usage:
# Install kata-lifecycle-manager with verification pod (required)
helm install kata-lifecycle-manager ./kata-lifecycle-manager \
  --set-file defaults.verificationPod=./my-verification-pod.yaml
# Label nodes for upgrade
kubectl label node worker-1 katacontainers.io/kata-lifecycle-manager-window=true
# Trigger upgrade
argo submit -n argo --from workflowtemplate/kata-lifecycle-manager \
  -p target-version=3.25.0 \
  -p node-selector="katacontainers.io/kata-lifecycle-manager-window=true" \
  -p helm-namespace=kata-system
# Monitor progress
argo watch @latest -n argo
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a Dockerfile and GitHub Actions workflow to build and publish
a multi-arch helm container image to quay.io/kata-containers/helm.
The image is based on quay.io/kata-containers/kubectl and adds:
- helm (latest stable version)
The image supports the following architectures:
- linux/amd64
- linux/arm64
- linux/s390x
- linux/ppc64le
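A manual multi-arch build of the same image could look roughly like the
following (the CI workflow drives buildx with these platforms):

docker buildx build \
  --platform linux/amd64,linux/arm64,linux/s390x,linux/ppc64le \
  -t quay.io/kata-containers/helm:latest \
  --push .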
The workflow runs:
- Weekly (every Sunday at 12:00 UTC, 12 hours after kubectl image)
- On manual trigger
- When the Dockerfile or workflow changes
Image tags:
- latest
- Date-based (YYYYMMDD)
- Helm version (e.g., v3.17.0)
- Git SHA
This image is used by the kata-lifecycle-manager Helm chart for
orchestrating kata-deploy upgrades via Argo Workflows.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed
secret for the CC GPU tests. This ensures the API key is only accessible
within the confidential enclave after successful attestation.
The sealed secret uses the "vault" type which points to a resource stored
in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside
the guest will unseal this secret by fetching it from KBS after
attestation.
The initdata file is created AFTER create_tmp_policy_settings_dir()
copies the empty default file, and BEFORE auto_generate_policy() runs.
This allows genpolicy to add the generated policy.rego to our custom
CDH configuration.
The sealed secret format follows the CoCo specification:
sealed.<JWS header>.<JWS payload>.<signature>
Where the payload contains:
- version: "0.1.0"
- type: "vault" (pointer to KBS resource)
- provider: "kbs"
- resource_uri: KBS path to the actual secret
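Sketched in shell, a vault-type sealed secret string can be assembled
like this (the resource_uri is a hypothetical KBS path, and the JWS
header/signature segments are shown as opaque placeholders):

payload='{"version":"0.1.0","type":"vault","provider":"kbs","resource_uri":"kbs:///default/ngc/api-key"}'
payload_b64=$(printf '%s' "${payload}" | base64 -w0 | tr '+/' '-_' | tr -d '=')
# Only the (base64url-encoded) payload carries meaning for a vault secret.
kubectl create secret generic ngc-api-key \
  --from-literal=NGC_API_KEY="sealed.fakejwsheader.${payload_b64}.fakesignature"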
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Increase the sleep time after kata-deploy deployment from 10s to 60s
to give more time for runtimes to be configured. This helps avoid
race conditions on slower K8s distributions like k3s where the
RuntimeClass may not be immediately available after the DaemonSet
rollout completes.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Merge the two E2E tests ("Custom RuntimeClass exists with correct
properties" and "Custom runtime can run a pod") into a single test, as
the two are strongly dependent on each other.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Replace fail() calls with die() which is already provided by
common.bash. The fail() function doesn't exist in the test
infrastructure, causing "command not found" errors when tests fail.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We cannot overwrite a binary that's currently in use, which is why
elsewhere we first remove / unlink the binary (the running process keeps
its file descriptor, so that is safe) and only then copy the new binary
into place. However, we missed doing this for the nydus-snapshotter
deployment.
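In shell terms the fix boils down to the pattern below (paths are
illustrative):

# Unlink first: the running snapshotter keeps its open file descriptor,
# so removing the path is safe; only then copy the new binary into place.
rm -f /usr/local/bin/containerd-nydus-grpc
cp /opt/kata-artifacts/opt/kata/bin/containerd-nydus-grpc /usr/local/bin/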
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Clean up trailing whitespace, making life easier for those who have
configured their IDE to remove it automatically.
Also, suggest not adding new code with trailing whitespace.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add support for CRI-O annotations when fetching pod identifiers for
device cold plug. The code now checks containerd CRI annotations first,
then falls back to CRI-O annotations if they are empty.
This enables device cold plug to work with both containerd and CRI-O
container runtimes.
Annotations supported:
- containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace
- CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Clean up existing nydus-snapshotter state to ensure fresh start with new
version.
This is safe across all K8s distributions (k3s, rke2, k0s, microk8s,
etc.) because we only touch the nydus data directory, not containerd's
internals.
When containerd tries to use non-existent snapshots, it will
re-pull/re-unpack.
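A hedged sketch of the cleanup (the directory below is the snapshotter's
usual default root; the actual path depends on how nydus-snapshotter is
configured):

# Wipe only the snapshotter's own state; containerd's content store and
# metadata are untouched, so missing snapshots are simply re-pulled.
rm -rf /var/lib/containerd-nydus/*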
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we have moved to use QEMU (and OVMF already earlier) from
kata-deploy, the custom tdx configurations and distro checks
are no longer needed.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Currently, a working TDX setup expects users to install special
TDX support builds from Canonical/CentOS virt-sig for TDX to
work. kata-deploy configured TDX runtime handler to use QEMU
from the distro's paths.
With TDX support now being available in upstream Linux and
Ubuntu 24.04 having an install candidate (linux-image-generic-6.17)
for a new enough kernel, move TDX configuration to use QEMU from
kata-deploy.
While this is the new default, going back to the original
setup is possible by making manual changes to TDX runtime handlers.
Note: runtime-rs is already using QEMUPATH for TDX.
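For example, pointing the TDX handler back at a distro-provided QEMU
could look like the following (binary and config paths are illustrative
and distro-specific):

sed -i 's|^path = .*|path = "/usr/bin/qemu-system-x86_64"|' \
  /opt/kata/share/defaults/kata-containers/configuration-qemu-tdx.toml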
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Allow the updateStrategy to be configured for the kata-deploy Helm
chart, enabling administrators to control how aggressively updates roll
out. For a less aggressive approach, the strategy can be set to
`OnDelete`. Alternatively, the update process can be made more
aggressive by adjusting the `maxUnavailable` parameter.
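For example (the exact value paths live in the chart's values.yaml; the
fields themselves mirror the standard DaemonSet updateStrategy):

# Less aggressive: pods are only replaced when deleted manually.
helm upgrade kata-deploy kata-containers/kata-deploy \
  --reuse-values \
  --set updateStrategy.type=OnDelete
# More aggressive: let several DaemonSet pods roll at once.
helm upgrade kata-deploy kata-containers/kata-deploy \
  --reuse-values \
  --set updateStrategy.rollingUpdate.maxUnavailable=3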
Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>
Avoid redundant and confusing teardown_common() debug output for
k8s-policy-pod.bats and k8s-policy-pvc.bats.
The Policy tests skip the Message field when printing information about
their pods because, unfortunately, that field might contain a truncated
Policy log for the test cases that intentionally cause Policy failures.
The non-truncated Policy log is already available from other
"kubectl describe" fields.
So, avoid the redundant pod information from teardown_common(), which
also includes the confusing Message field.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Delete the pause_bundle directory before running the umoci unpack
operation. This will make builds idempotent and not fail with
errors like "create runtime bundle: config.json already exists in
.../build/pause-image/destdir/pause_bundle". This will make life
better when building locally.
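The change amounts to something like this right before the unpack
(variable names are illustrative):

# Remove any bundle left over from a previous run so umoci can unpack cleanly.
rm -rf "${destdir}/pause_bundle"
umoci unpack --rootless --image "${pause_image}:${pause_tag}" "${destdir}/pause_bundle"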
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Update Go from 1.24.11 to 1.24.12 to address security vulnerabilities
in the standard library:
- GO-2026-4342: Excessive CPU consumption in archive/zip
- GO-2026-4341: Memory exhaustion in net/url query parsing
- GO-2026-4340: TLS handshake encryption level issue in crypto/tls
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
1. Add disable_block_device_use to CLH settings file, for parity with
the already existing QEMU settings.
2. Set DEFDISABLEBLOCK := true by default for both QEMU and CLH. After
this change, Kata Guests will use by default virtio-fs to access
container rootfs directories from their Hosts. Hosts that were
designed to use Host block devices attached to the Guests can
re-enable these rootfs block devices by changing the value of
disable_block_device_use back to false in their settings files (see
the sketch after this list).
3. Add test using container image without any rootfs layers. Depending
on the container runtime and image snapshotter being used, the empty
container rootfs image might get stored on a host block device that
cannot be safely hotplugged to a guest VM, because the host is using
the same block device.
4. Add block device hotplug safety warning into the Kata Shim
configuration files.
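A possible way to flip that setting back on a kata-deploy-managed node
(file path shown for CLH; adjust for the shim actually in use):

sed -i 's/^disable_block_device_use = true/disable_block_device_use = false/' \
  /opt/kata/share/defaults/kata-containers/configuration-clh.toml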
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Cameron McDermott <cameron@northflank.com>
Remove the initrd function and add the image function, to align with
the functions that actually exist in this file.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Confidential guests cannot use traditional IOMMU Group based VFIO.
Instead, they need to use IOMMUFD. This is mainly because the group
abstraction is incompatible with a confidential device model.
If traditional VFIO is specified for a confidential guest, detect
the error and bail out early.
Fixes: #12393
Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
In CI we are testing the latest kata-deploy, which requires the latest
Helm chart. The previous query doesn't work anymore, but these days we
should be able to rely on the "0.0.0-dev" tag and on helm printing the
to-be-installed version to the console.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
I keep struggling to find the debug images, so let's include them in
the peer-pods-azure.sh script to make them easier to find.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This comment was first introduced in e111093 with secure_join()
but then we forgot to remove it when we switched to the safe-path
lib in c0ceaf6
Signed-off-by: Qingyuan Hou <lenohou@gmail.com>
We want to enable local and remote CUDA repository builds. Move the
CUDA and tools repos to versions.yaml, with a unified build for both
types.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Fix empty string handling in format conversion
When HELM_ALLOWED_HYPERVISOR_ANNOTATIONS, HELM_AGENT_HTTPS_PROXY, or
HELM_AGENT_NO_PROXY are empty, the pattern matching condition
`!= *:*` or `!= *=*` evaluates to true, causing the conversion loop
to create invalid entries like "qemu-tdx: qemu-snp:".
Add -n checks to ensure conversion only runs when variables are
non-empty.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the CI and functional test helpers to use the new
shims.disableAll option instead of iterating over every shim
to disable them individually.
Also add the node-feature-discovery helm repo before building
dependencies, to fix CI failures on some distributions.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Update the Helm chart README to document the new shims.disableAll
option and simplify the examples that previously required listing
every shim to disable.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Simplify the example values files by using the new shims.disableAll
option instead of listing every shim to disable.
Before (try-kata-nvidia-gpu.values.yaml):
shims:
  clh:
    enabled: false
  cloud-hypervisor:
    enabled: false
  # ... 15 more lines ...
After:
shims:
  disableAll: true
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add a new `shims.disableAll` option that disables all standard shims
at once. This is useful when:
- Enabling only specific shims without listing every other shim
- Using custom runtimes only mode (no standard Kata shims)
Usage:
shims:
  disableAll: true
  qemu:
    enabled: true  # Only qemu is enabled
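The command-line equivalent (chart reference illustrative):

helm install kata-deploy kata-containers/kata-deploy \
  --set shims.disableAll=true \
  --set shims.qemu.enabled=true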
All helper templates are updated to check for this flag before
iterating over shims.
One important subtlety is that Helm recursively merges user values with
chart defaults, making a simple `disableAll` flag problematic: if the
defaults had `enabled: true`, a user's `disableAll: true` would get
merged with those defaults, leaving all shims enabled.
The workaround found is to use null (`~`) as the default for `enabled`
field. The template logic interprets null differently based on
disableAll:
| enabled value | disableAll: false | disableAll: true |
|---------------|-------------------|------------------|
| ~ (null) | Enabled | Disabled |
| true | Enabled | Enabled |
| false | Disabled | Disabled |
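The decision rule from the table, spelled out as a tiny shell function
(the chart implements this in its Helm template helpers, not in shell):

shim_is_enabled() {
    local enabled="$1" disable_all="$2"   # enabled is "true", "false", or "" (null / ~)
    case "${enabled}" in
        true)  return 0 ;;                         # explicit enable always wins
        false) return 1 ;;                         # explicit disable always wins
        *)     [ "${disable_all}" != "true" ] ;;   # null: follow disableAll
    esac
}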
This is backward compatible:
- Default behavior unchanged: all shims enabled when disableAll: false
- Users can set `disableAll: true` to disable all, then explicitly
enable specific shims with `enabled: true`
- Explicit `enabled: false` always disables, regardless of disableAll
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add Bats tests to verify the custom runtimes Helm template rendering,
and that we can start a pod with the custom runtime.
Tests were written with Cursor's help.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>