Compare commits


28 Commits

Author SHA1 Message Date
Alex Lyn
4a1c2b6620 Merge pull request #12309 from kata-containers/stale-issues-by-date
workflows: Create workflow to stale issues based on date
2026-04-03 09:31:34 +08:00
Fabiano Fidêncio
09194d71bb Merge pull request #12767 from nubificus/fix/fc-rs
runtime-rs: Fix FC API fields
2026-04-02 18:24:35 +02:00
Manuel Huber
dd868dee6d tests: nvidia: onboard NIM service test
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.

For now, the TEE-based flow uses an allow-all policy. In future
work, we strive to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
being generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-02 16:58:54 +02:00
Steve Horsman
58101a2166 Merge pull request #12656 from stevenhorsman/actions/checkout-bump
workflows: Update actions/checkout version
2026-04-01 17:34:39 +01:00
Fabiano Fidêncio
75df4c0bd3 Merge pull request #12766 from fidencio/topic/kata-deploy-avoid-kata-pods-to-crash-after-containerd-restart
kata-deploy: Fix kata-deploy pods crashing if containerd restarts
2026-04-01 18:28:16 +02:00
Steve Horsman
2830c4f080 Merge pull request #12746 from ldoktor/ci-helm2
ci.ocp: Use helm deployment for peer-pods
2026-04-01 17:13:21 +01:00
Lukáš Doktor
55a3772032 ci.ocp: Add note about external tests to README.md
To run all the tests that run in CI we need to enable external
tests. This can be a bit tricky, so add it to our documentation.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
Lukáš Doktor
3bc460fd82 ci.ocp: Use helm deployment for peer-pods
Replace the deprecated CAA deployment with the helm-based one. Note that
this also installs the CAA mutating webhook, which wasn't installed before.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
Fabiano Fidêncio
2131147360 tests: add kata-deploy lifecycle tests for restart resilience and cleanup
Add functional tests that cover three previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.
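The UID/restart-count check described in item 1 boils down to a plain comparison. A minimal sketch (the helper name and the kubectl queries in the comments are illustrative, not the actual test code):

```shell
# Hypothetical shape of the restart-resilience assertion: the kata pod must
# keep the same UID and accumulate zero extra container restarts across the
# kata-deploy DaemonSet rollout.
assert_pod_survived() {
    uid_before="$1"; uid_after="$2"
    restarts_before="$3"; restarts_after="$4"
    [ "$uid_before" = "$uid_after" ] && [ "$restarts_before" -eq "$restarts_after" ]
}

# Illustrative usage (not the actual test code):
#   uid_before=$(kubectl get pod "$pod" -o jsonpath='{.metadata.uid}')
#   kubectl -n kube-system rollout restart ds/kata-deploy
#   kubectl -n kube-system rollout status ds/kata-deploy
#   uid_after=$(kubectl get pod "$pod" -o jsonpath='{.metadata.uid}')
#   assert_pod_survived "$uid_before" "$uid_after" 0 "$restarts_after"
```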

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:53 +02:00
Fabiano Fidêncio
b4b62417ed kata-deploy: skip cleanup on pod restart to avoid crashing kata pods
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label
change or rolling update), the SIGTERM handler runs cleanup which
unconditionally removes kata artifacts and restarts containerd. This
causes containerd to lose the kata shim binary, crashing all running
kata pods on the node.

Fix this by implementing a three-stage cleanup decision:

1. If this pod's owning DaemonSet still exists (exact name match via
   DAEMONSET_NAME env var), this is a pod restart — skip all cleanup.
   The replacement pod will re-run install, which is idempotent.

2. If this DaemonSet is gone but other kata-deploy DaemonSets still
   exist (multi-install scenario), perform instance-specific cleanup
   only (snapshotters, CRI config, artifacts) but skip shared
   resources (node label removal, CRI restart) to avoid disrupting
   the other instances.

3. If no kata-deploy DaemonSets remain, perform full cleanup including
   node label removal and CRI restart.

The Helm chart injects a DAEMONSET_NAME environment variable with the
exact DaemonSet name (including any multi-install suffix), ensuring
instance-aware lookup rather than broadly matching any DaemonSet
containing "kata-deploy".
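The three-stage decision reads roughly like this sketch (hypothetical function: the real script queries the API server, while here the surviving kata-deploy DaemonSet names are passed in as arguments):

```shell
# Sketch of the cleanup decision. DAEMONSET_NAME corresponds to the first
# argument; the remaining arguments stand in for the kata-deploy DaemonSets
# still present in the cluster.
decide_cleanup() {
    own_ds="$1"; shift
    for ds in "$@"; do
        # Exact name match: our own DaemonSet still exists, so this is a
        # pod restart -- skip cleanup; the replacement pod re-runs install.
        [ "$ds" = "$own_ds" ] && { echo "skip"; return 0; }
    done
    if [ "$#" -gt 0 ]; then
        # Other kata-deploy installs remain: clean only our artifacts and
        # leave shared resources (node label, CRI restart) alone.
        echo "instance-only"
    else
        # Last instance gone: full cleanup including CRI restart.
        echo "full"
    fi
}
```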

Fixes: #12761

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:52 +02:00
Fabiano Fidêncio
28414a614e kata-deploy: detect k3s/rke2 via systemd services instead of version string
Newer k3s releases (v1.34+) no longer include "k3s" in the containerd
version string at all (e.g. "containerd://2.2.2-bd1.34" instead of
"containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to
the default "containerd" runtime, configuring and restarting the system
containerd service instead of k3s's embedded containerd — leaving the
kata runtime invisible to k3s.

Fix by detecting k3s/rke2 via their systemd service names (k3s,
k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd
version string. This is more robust and works regardless of how k3s
formats its containerd version.
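A minimal sketch of the service-based detection (`is_active` and `detect_runtime` are illustrative names, not the actual kata-deploy code; the service names are the ones listed above):

```shell
# Wrap systemctl so the check can be stubbed in tests.
is_active() { systemctl is-active --quiet "$1" 2>/dev/null; }

# Classify the runtime by which systemd service is active, instead of
# parsing the containerd version string reported by the kubelet.
detect_runtime() {
    for svc in k3s k3s-agent rke2-server rke2-agent; do
        if is_active "$svc"; then
            case "$svc" in
                k3s*)  echo "k3s" ;;
                rke2*) echo "rke2" ;;
            esac
            return 0
        fi
    done
    # No k3s/rke2 service found: fall back to the system containerd.
    echo "containerd"
}
```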

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
Fabiano Fidêncio
8b9ce3b6cb tests: remove k3s/rke2 V3 containerd template workaround
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.

Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()

This reverts the test infrastructure part of a2216ec05.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
Steve Horsman
9a3f6b075e Merge pull request #12753 from stevenhorsman/remove-agent-Cargo-
agent: Remove Cargo.lock
2026-04-01 13:22:57 +01:00
Steve Horsman
0d38d88b07 Merge pull request #12484 from Amulyam24/runtime-rs-ppc64le
runtime-rs: add QEMU support for ppc64le
2026-04-01 12:54:40 +01:00
Fabiano Fidêncio
6555350625 Merge pull request #12765 from fidencio/topic/kata-deploy-nydus-fix-systemd-unit
kata-deploy: Make nydus a soft dep of containerd
2026-04-01 13:40:42 +02:00
Fabiano Fidêncio
b823184cf7 Merge pull request #12580 from manuelh-dev/mahuber/gpu-ci-storage
tests: gpu: use the container image layer storage feature
2026-04-01 12:23:56 +02:00
Fabiano Fidêncio
fe1f804543 kata-deploy: Restart nydus-snapshotter in case of failure
Let's ensure that in case nydus-snapshotter crashes for one reason or
another, the service is restarted.

This follows the containerd approach and avoids manual intervention on
the node.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 11:00:21 +02:00
Fabiano Fidêncio
789abe6fdf kata-deploy: Make nydus a soft dep of containerd
Let's relax our RequiredBy to a WantedBy in the nydus systemd unit
file: with RequiredBy, a nydus crash would also bring containerd down,
causing the node to become NotReady.
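The change amounts to something like this unit-file fragment (an illustrative sketch, not the actual kata-deploy unit; the ExecStart path is hypothetical):

```
# nydus-snapshotter.service (sketch): WantedBy ties the snapshotter to
# containerd's startup without making containerd fail when nydus crashes;
# RequiredBy would pull containerd down together with a failing nydus.
[Unit]
Description=nydus snapshotter (illustrative)

[Service]
ExecStart=/opt/kata/bin/containerd-nydus-grpc
Restart=on-failure

[Install]
WantedBy=containerd.service
```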

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 10:52:29 +02:00
Amulyam24
bf74f683d7 runtime-rs: align memory size with desired block size on ppc64le
couldn't initialise QMP: Connection reset by peer (os error 104)
Caused by:
    Connection reset by peer (os error 104)

qemu stderr: "qemu-system-ppc64: Maximum memory size 0x80000000 is not aligned to 256 MiB"

When the default max memory was assigned based on the
available host memory, QEMU failed with the above error.
Align the memory values with the block size of 256 MB on ppc64le.
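The round-up is plain arithmetic, sketched here in shell even though the actual fix lives in the Rust runtime (function name illustrative):

```shell
# Round a memory size (in MiB) up to the next multiple of the block size,
# e.g. 256 MiB on ppc64le.
align_up_mb() {
    size="$1"; block="$2"
    echo $(( (size + block - 1) / block * block ))
}
```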

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
dcb7d025c7 runtime-rs: Use libc::TUNSETIFF instead of wrapper TUNSETIFF()
While attaching the tap device, the ioctl fails on ppc64le with EBADF:

"cannot create tap device. File descriptor in bad state (os error 77)": unknown

Refactor the ioctl call to use the standard libc::TUNSETIFF constant.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
8d25ff2c36 runtime-rs: implement set_capabilities for qemu
After the qemu VM is booted, while storing the guest details,
it fails to set capabilities as it is not yet implemented
for QEMU, this change adds a default implementation for it.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
778524467b runtime-rs: enable building runtime-rs on ppc64le
Add Makefile changes to build runtime-rs on ppc64le
with QEMU support.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Manuel Huber
177f5c308e tests: gpu: use container image layer storage
Use the container image layer storage feature for the
k8s-nvidia-nim.bats test pod manifests. This reduces the pods'
memory requirements.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:22:26 +02:00
Manuel Huber
b6cf00a374 tests: parametrize storage parameters
- trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and
  $PVC_STORAGE_REQUEST so that PV/PVC size can vary per test.
- confidential_common.sh: add optional size (MB) argument to
  create_loop_device.
- k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and
  PVC_STORAGE_REQUEST when generating storage config.
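The substitution can be sketched like this (the variable names match the commit, but the template body and the sed-based rendering are stand-ins for whatever the test scripts actually use):

```shell
# Render a parametrized template so PV/PVC sizes can vary per test.
PV_STORAGE_CAPACITY="2Gi"
PVC_STORAGE_REQUEST="1Gi"

template='capacity: ${PV_STORAGE_CAPACITY}
request: ${PVC_STORAGE_REQUEST}'

# Substitute the placeholders with the values exported by the test.
rendered=$(printf '%s\n' "$template" \
    | sed -e "s|\${PV_STORAGE_CAPACITY}|${PV_STORAGE_CAPACITY}|g" \
          -e "s|\${PVC_STORAGE_REQUEST}|${PVC_STORAGE_REQUEST}|g")
printf '%s\n' "$rendered"
```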

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:22:26 +02:00
stevenhorsman
5390e470d3 agent: Remove Cargo.lock
Following on from #12690, the agent is part of the repo workspace,
so no longer needs a lock file.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-01 09:11:28 +01:00
stevenhorsman
99eaa8fcb1 workflows: Create workflow to stale issues based on date
The standard stale action is intended to be run regularly with
a date offset, but we want one we can run against a specific
date, in order to run the stale bot against issues created before a particular
release milestone, so calculate the offset in one step and use it in the next.

At the moment we want to stale issues from before 9th October 2022, when
Kata 3.0 was released, so default to this.

Note that the stale action only processes a few issues at a time to avoid
rate limiting, which is why we want a cron job: so it can work through the
backlog, but also stale/unstale issues that are commented on.
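The offset calculation described above can be sketched as follows (GNU date assumed; the variable names are illustrative, not the actual workflow step):

```shell
# Turn a fixed cutoff date into a days-before offset for the stale action.
cutoff="2022-10-09"   # Kata 3.0 release date, used as the default
now_epoch=$(date +%s)
cutoff_epoch=$(date -d "$cutoff" +%s)
days_before=$(( (now_epoch - cutoff_epoch) / 86400 ))
echo "$days_before"   # e.g. fed to the action's days-before-stale input
```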

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-31 15:57:37 +01:00
stevenhorsman
12578b41f2 govmm: Delete old files
We don't run the govmm workflow, and it and the other CI files
are just legacy from when govmm was a separate repo, so let's clean up
this debt rather than having to keep updating it.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
stevenhorsman
b3179bdd8e workflows: Update actions/checkout version
Update the action to resolve the following warning in GHA:
> Node.js 20 actions are deprecated. The following actions are running
> on Node.js 20 and may not work as expected:
> actions/checkout@11bd71901b.
> Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
81 changed files with 1047 additions and 6164 deletions

View File

@@ -35,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -141,7 +141,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -195,7 +195,7 @@ jobs:
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -242,7 +242,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -287,7 +287,7 @@ jobs:
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -334,7 +334,7 @@ jobs:
name: run-kata-agent-apis
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -35,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -72,7 +72,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -84,7 +84,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -75,7 +75,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -178,7 +178,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -272,7 +272,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -326,7 +326,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -391,7 +391,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -436,7 +436,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -64,7 +64,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -162,7 +162,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -253,7 +253,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -305,7 +305,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -51,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -109,7 +109,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -184,7 +184,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -240,7 +240,7 @@ jobs:
run: |
sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE"
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -38,7 +38,7 @@ jobs:
- kernel
- virtiofsd
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history

View File

@@ -58,7 +58,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -143,7 +143,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -196,7 +196,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Rebase atop of the latest target branch
@@ -270,7 +270,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -328,7 +328,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -28,7 +28,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -20,7 +20,7 @@ jobs:
steps:
- name: Checkout Code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Generate Action

View File

@@ -73,7 +73,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -182,7 +182,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -16,7 +16,7 @@ jobs:
name: ci
deployment: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -60,7 +60,7 @@ jobs:
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -27,7 +27,7 @@ jobs:
echo "$HOME/.local/bin" >> "${GITHUB_PATH}"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -19,7 +19,7 @@ jobs:
run: |
echo "GOPATH=${GITHUB_WORKSPACE}" >> "$GITHUB_ENV"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -42,7 +42,7 @@ jobs:
skip_test: ${{ steps.skipper.outputs.skip_test }}
skip_static: ${{ steps.skipper.outputs.skip_static }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -30,7 +30,7 @@ jobs:
issues: read
pull-requests: read
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Ensure nydus-snapshotter-version is in sync inside our repo

View File

@@ -145,7 +145,7 @@ jobs:
needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x, publish-kata-deploy-payload-ppc64le]
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -44,7 +44,7 @@ jobs:
packages: write
runs-on: ${{ inputs.runner }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -21,7 +21,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -50,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -50,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -47,7 +47,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -51,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -12,7 +12,7 @@ jobs:
contents: write # needed for the `gh release create` command
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -87,7 +87,7 @@ jobs:
packages: write # needed to push the multi-arch manifest to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -124,7 +124,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -206,7 +206,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -224,7 +224,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -242,7 +242,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -261,7 +261,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -298,7 +298,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -41,7 +41,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ inputs.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -73,7 +73,7 @@ jobs:
GENPOLICY_PULL_METHOD: ${{ matrix.genpolicy-pull-method }}
RUNS_ON_AKS: "true"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -46,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: all
TARGET_ARCH: "aarch64"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -63,7 +63,7 @@ jobs:
CONTAINER_ENGINE_VERSION: ${{ matrix.environment.containerd_version }}
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -124,4 +124,3 @@ jobs:
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup

View File

@@ -53,7 +53,7 @@ jobs:
USE_EXPERIMENTAL_SNAPSHOTTER_SETUP: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -45,7 +45,7 @@ jobs:
KUBERNETES: ${{ matrix.k8s }}
TARGET_ARCH: "ppc64le"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -78,7 +78,7 @@ jobs:
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -74,7 +74,7 @@ jobs:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -71,7 +71,7 @@ jobs:
GH_ITA_KEY: ${{ secrets.ITA_KEY }}
AUTO_GENERATE_POLICY: "yes"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -169,7 +169,7 @@ jobs:
CONTAINER_ENGINE_VERSION: "active"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -292,7 +292,7 @@ jobs:
K8S_TEST_HOST_TYPE: "all"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -408,7 +408,7 @@ jobs:
AUTO_GENERATE_POLICY: "no"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -62,7 +62,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -46,7 +46,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -40,7 +40,7 @@ jobs:
#CONTAINERD_VERSION: ${{ matrix.containerd_version }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -46,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: "baremetal"
KUBERNETES: kubeadm
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -27,7 +27,7 @@ jobs:
steps:
- name: "Checkout code"
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -22,7 +22,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -23,7 +23,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

.github/workflows/stale_issues.yaml vendored Normal file

@@ -0,0 +1,42 @@
name: 'Stale issues with activity before a fixed date'

on:
  schedule:
    - cron: '0 0 * * *'
  workflow_dispatch:
    inputs:
      date:
        description: "Date of the stale cut-off. All issues not updated since this date will be marked as stale. Format: YYYY-MM-DD, e.g. 2022-10-09"
        default: "2022-10-09"
        required: false
        type: string

permissions: {}

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  stale:
    name: stale
    runs-on: ubuntu-24.04
    permissions:
      actions: write # Needed to manage caches for state persistence across runs
      issues: write # Needed to add/remove labels, post comments, or close issues
    steps:
      - name: Calculate the age to stale
        run: |
          echo AGE=$(( ( $(date +%s) - $(date -d "${DATE:-2022-10-09}" +%s) ) / 86400 )) >> "$GITHUB_ENV"
        env:
          DATE: ${{ inputs.date }}
      - name: Run the stale action
        uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
        with:
          stale-issue-message: "This issue has had no activity since before ${{ env.DATE || '2022-10-09' }}. Please comment on the issue, or it will be closed in 30 days"
          days-before-pr-stale: -1
          days-before-pr-close: -1
          days-before-issue-stale: ${{ env.AGE }}
          days-before-issue-close: 30
        env:
          DATE: ${{ inputs.date }}
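The "Calculate the age to stale" step above derives the number of days between the cut-off date and now using plain `date` arithmetic. A minimal sketch of that computation, with "now" pinned to a fixed date so the result is reproducible (GNU `date` assumed):

```shell
# Same day-difference arithmetic as the workflow step, with "now" pinned
# to 2022-10-19 so the expected result is stable.
DATE="2022-10-09"
NOW=$(date -u -d "2022-10-19" +%s)
THEN=$(date -u -d "${DATE:-2022-10-09}" +%s)
AGE=$(( (NOW - THEN) / 86400 ))
echo "${AGE}"   # 10
```

In the real workflow `NOW` is the current time, so `AGE` grows by one every day the workflow runs.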


@@ -28,7 +28,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -71,7 +71,7 @@ jobs:
component-path: src/dragonball
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -115,7 +115,7 @@ jobs:
packages: write # for push to ghcr.io
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
contents: read # for checkout
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -37,6 +37,23 @@ oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccount
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
```
The e2e suite uses a combination of built-in (origin) and external tests. External
tests include Kubernetes upstream conformance tests from the `hyperkube` image.
To enable external tests, export a variable matching your cluster version:
```bash
export EXTENSIONS_PAYLOAD_OVERRIDE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
# Optional: limit to hyperkube only (k8s conformance tests, avoids downloading all operator extensions)
export EXTENSION_BINARY_OVERRIDE_INCLUDE_TAGS="hyperkube"
```
Alternatively, skip the external tests entirely and run only the OpenShift-specific tests from origin:
```bash
export OPENSHIFT_SKIP_EXTERNAL_TESTS=1
```
Now you should be ready to run `openshift-tests`. Our CI only runs a subset
of the tests; to get the current ``TEST_SKIPS``, see
[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).


@@ -39,6 +39,21 @@ git_sparse_clone() {
git checkout FETCH_HEAD
}
#######################
# Install prerequisites
#######################
if ! command -v helm &>/dev/null; then
echo "Helm not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | HELM_INSTALL_DIR='.' bash -s -- --no-sudo
fi
if ! command -v yq &>/dev/null; then
echo "yq not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o ./yq
chmod +x yq
fi
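Both fallback installs above rely on the same trick: drop the binary into the current directory and prepend `$PWD` to `PATH` so the freshly installed tool wins the lookup. A small sketch of that mechanism (the `fake-helm` stub is purely illustrative, not a real installer):

```shell
# Simulate installing a tool into the working directory and resolving it via PATH.
workdir=$(mktemp -d)
cd "${workdir}" || exit 1
printf '#!/bin/sh\necho fake-helm\n' > helm
chmod +x helm
PATH="${PWD}:${PATH}"
command -v helm   # resolves to the stub in ${workdir}, ahead of any system helm
helm              # prints: fake-helm
```

Because `PATH` is only modified for the current shell, this does not touch the system installation.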
###############################
# Disable security to allow e2e
###############################
@@ -83,7 +98,6 @@ AZURE_REGION=$(az group show --resource-group "${AZURE_RESOURCE_GROUP}" --query
# Create workload identity
AZURE_WORKLOAD_IDENTITY_NAME="caa-${AZURE_CLIENT_ID}"
az identity create --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --resource-group "${AZURE_RESOURCE_GROUP}" --location "${AZURE_REGION}"
USER_ASSIGNED_CLIENT_ID="$(az identity show --resource-group "${AZURE_RESOURCE_GROUP}" --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --query 'clientId' -otsv)"
#############################
@@ -184,84 +198,36 @@ echo "CAA_IMAGE=\"${CAA_IMAGE}\""
echo "CAA_TAG=\"${CAA_TAG}\""
echo "PP_IMAGE_ID=\"${PP_IMAGE_ID}\""
# Install cert-manager (prerequisite)
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager --namespace cert-manager --create-namespace --set crds.enabled=true
# Clone and configure caa
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/"
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/charts/" "src/peerpod-ctrl/chart" "src/webhook/chart"
echo "CAA_GIT_SHA=\"$(git rev-parse HEAD)\""
pushd src/cloud-api-adaptor
cat <<EOF > install/overlays/azure/workload-identity.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cloud-api-adaptor-daemonset
namespace: confidential-containers-system
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true"
---
pushd src/cloud-api-adaptor/install/charts/peerpods
# Use the latest kata-deploy
yq -i '( .dependencies[] | select(.name == "kata-deploy") ) .version = "0.0.0-dev"' Chart.yaml
helm dependency update .
# Create secrets
kubectl apply -f - << EOF
apiVersion: v1
kind: ServiceAccount
kind: Namespace
metadata:
name: cloud-api-adaptor
namespace: confidential-containers-system
annotations:
azure.workload.identity/client-id: "${USER_ASSIGNED_CLIENT_ID}"
name: confidential-containers-system
labels:
app.kubernetes.io/managed-by: Helm
annotations:
meta.helm.sh/release-name: peerpods
meta.helm.sh/release-namespace: confidential-containers-system
EOF
PP_INSTANCE_SIZE="Standard_D2as_v5"
DISABLECVM="true"
cat <<EOF > install/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../yamls
images:
- name: cloud-api-adaptor
newName: "${CAA_IMAGE}"
newTag: "${CAA_TAG}"
generatorOptions:
disableNameSuffixHash: true
configMapGenerator:
- name: peer-pods-cm
namespace: confidential-containers-system
literals:
- CLOUD_PROVIDER="azure"
- AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
- AZURE_REGION="${PP_REGION}"
- AZURE_INSTANCE_SIZE="${PP_INSTANCE_SIZE}"
- AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}"
- AZURE_SUBNET_ID="${PP_SUBNET_ID}"
- AZURE_IMAGE_ID="${PP_IMAGE_ID}"
- DISABLECVM="${DISABLECVM}"
- PEERPODS_LIMIT_PER_NODE="50"
secretGenerator:
- name: peer-pods-secret
namespace: confidential-containers-system
envs:
- service-principal.env
- name: ssh-key-secret
namespace: confidential-containers-system
files:
- id_rsa.pub
patchesStrategicMerge:
- workload-identity.yaml
EOF
ssh-keygen -t rsa -f install/overlays/azure/id_rsa -N ''
echo "AZURE_CLIENT_ID=${AZURE_CLIENT_ID}" > install/overlays/azure/service-principal.env
echo "AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}" >> install/overlays/azure/service-principal.env
echo "AZURE_TENANT_ID=${AZURE_TENANT_ID}" >> install/overlays/azure/service-principal.env
# Deploy Operator
git_sparse_clone "https://github.com/confidential-containers/operator" "${OPERATOR_SHA:-main}" "config/"
echo "OPERATOR_SHA=\"$(git rev-parse HEAD)\""
oc apply -k "config/release"
oc apply -k "config/samples/ccruntime/peer-pods"
popd
# Deploy CAA
kubectl apply -k "install/overlays/azure"
popd
popd
kubectl create secret generic my-provider-creds \
-n confidential-containers-system \
--from-literal=AZURE_CLIENT_ID="$AZURE_CLIENT_ID" \
--from-literal=AZURE_CLIENT_SECRET="$AZURE_CLIENT_SECRET" \
--from-literal=AZURE_TENANT_ID="$AZURE_TENANT_ID"
helm install peerpods . -f providers/azure.yaml \
    --set secrets.mode=reference \
    --set secrets.existingSecretName=my-provider-creds \
    --set providerConfigs.azure.AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}" \
    --set providerConfigs.azure.AZURE_REGION="${PP_REGION}" \
    --set providerConfigs.azure.AZURE_INSTANCE_SIZE="Standard_D2as_v5" \
    --set providerConfigs.azure.AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}" \
    --set providerConfigs.azure.AZURE_SUBNET_ID="${PP_SUBNET_ID}" \
    --set providerConfigs.azure.AZURE_IMAGE_ID="${PP_IMAGE_ID}" \
    --set providerConfigs.azure.DISABLECVM="true" \
    --set providerConfigs.azure.PEERPODS_LIMIT_PER_NODE="50" \
    --set kata-deploy.snapshotter.setup= \
    --dependency-update \
    -n confidential-containers-system --create-namespace --wait
popd # charts
popd # git_sparse_clone CAA
# Wait for runtimeclass
SECONDS=0

src/agent/Cargo.lock generated

File diff suppressed because it is too large


@@ -41,6 +41,7 @@ sysctl = "0.7.1"
tempfile = "3.19.1"
test-utils = { path = "../test-utils" }
nix = "0.26.4"
rstest = "0.18"
[features]
default = []


@@ -1077,6 +1077,81 @@ impl MemoryInfo {
if self.default_maxmemory == 0 || u64::from(self.default_maxmemory) > host_memory {
self.default_maxmemory = host_memory as u32;
}
// Apply PowerPC64 memory alignment
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
self.adjust_ppc64_memory_alignment()?;
Ok(())
}
/// Adjusts memory values for PowerPC64 little-endian systems to meet
/// QEMU's 256MB block size alignment requirement.
///
/// Ensures default_memory is at least 1024MB and both default_memory
/// and default_maxmemory are aligned to 256MB boundaries.
/// Returns an error if, after alignment, default_maxmemory would not
/// exceed default_memory.
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn adjust_ppc64_memory_alignment(&mut self) -> Result<()> {
const PPC64_MEM_BLOCK_SIZE: u64 = 256;
const MIN_MEMORY_MB: u64 = 1024;
fn align_memory(value: u64) -> u64 {
(value / PPC64_MEM_BLOCK_SIZE) * PPC64_MEM_BLOCK_SIZE
}
let mut mem_size = u64::from(self.default_memory);
let max_mem_size = u64::from(self.default_maxmemory);
// Ensure minimum memory size
if mem_size < MIN_MEMORY_MB {
info!(
sl!(),
"PowerPC: Increasing default_memory from {}MB to minimum {}MB",
mem_size,
MIN_MEMORY_MB
);
mem_size = MIN_MEMORY_MB;
}
// Align both values to 256MB boundaries
let aligned_mem = align_memory(mem_size);
let aligned_max_mem = align_memory(max_mem_size);
if aligned_mem != mem_size {
info!(
sl!(),
"PowerPC: Aligned default_memory from {}MB to {}MB", mem_size, aligned_mem
);
}
if aligned_max_mem != max_mem_size {
info!(
sl!(),
"PowerPC: Aligned default_maxmemory from {}MB to {}MB",
max_mem_size,
aligned_max_mem
);
}
// Check if aligned values are equal
if aligned_max_mem != 0 && aligned_max_mem <= aligned_mem {
return Err(std::io::Error::other(format!(
"PowerPC: default_maxmemory ({}MB) <= default_memory ({}MB) after alignment. \
Requires maxmemory > memory. Please increase default_maxmemory.",
aligned_max_mem, aligned_mem
)));
}
info!(
sl!(),
"PowerPC: Memory alignment applied - memory: {}MB, max_memory: {}MB",
aligned_mem,
aligned_max_mem
);
self.default_memory = aligned_mem as u32;
self.default_maxmemory = aligned_max_mem as u32;
Ok(())
}
@@ -1948,4 +2023,72 @@ mod tests {
);
}
}
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
use rstest::rstest;
#[rstest]
#[case::memory_below_minimum(512, 2048, 1024, 2048)]
#[case::already_aligned(1024, 2048, 1024, 2048)]
#[case::unaligned_rounds_down(1100, 2100, 1024, 2048)]
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn test_adjust_ppc64_memory_alignment_success(
#[case] input_memory: u32,
#[case] input_maxmemory: u32,
#[case] expected_memory: u32,
#[case] expected_maxmemory: u32,
) {
let mut mem = MemoryInfo {
default_memory: input_memory,
default_maxmemory: input_maxmemory,
..Default::default()
};
let result = mem.adjust_ppc64_memory_alignment();
assert!(
result.is_ok(),
"Expected success but got error: {:?}",
result.err()
);
assert_eq!(
mem.default_memory, expected_memory,
"Memory not aligned correctly"
);
assert_eq!(
mem.default_maxmemory, expected_maxmemory,
"Max memory not aligned correctly"
);
}
#[rstest]
#[case::equal_after_alignment(1024, 1100, "Requires maxmemory > memory")]
#[case::maxmemory_less_than_memory(2048, 1500, "Requires maxmemory > memory")]
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn test_adjust_ppc64_memory_alignment_errors(
#[case] input_memory: u32,
#[case] input_maxmemory: u32,
#[case] expected_error_msg: &str,
) {
let mut mem = MemoryInfo {
default_memory: input_memory,
default_maxmemory: input_maxmemory,
..Default::default()
};
let result = mem.adjust_ppc64_memory_alignment();
assert!(
result.is_err(),
"Expected error but got success for memory={}, maxmemory={}",
input_memory,
input_maxmemory
);
let error_msg = result.unwrap_err().to_string();
assert!(
error_msg.contains(expected_error_msg),
"Error message '{}' does not contain expected text '{}'",
error_msg,
expected_error_msg
);
}
}
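The 256 MiB round-down performed by `align_memory` is plain integer division, and the behaviour encoded in the test cases above can be reproduced in a few lines of shell (values in MiB):

```shell
# Integer round-down to a 256 MiB boundary, mirroring align_memory() above.
align() { echo $(( ($1 / 256) * 256 )); }
align 1100   # -> 1024
align 2100   # -> 2048
align 1024   # -> 1024 (already aligned)
```

This is why the error case `(1024, 1100)` fails: 1100 rounds down to 1024, leaving max memory equal to memory.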


@@ -33,15 +33,11 @@ test:
exit 0
install: install-runtime install-configs
else ifeq ($(ARCH), powerpc64le)
default:
@echo "PowerPC 64 LE is not currently supported"
exit 0
default: runtime show-header
test:
@echo "PowerPC 64 LE is not currently supported"
exit 0
install:
@echo "PowerPC 64 LE is not currently supported"
@echo "powerpc64le is not currently supported"
exit 0
install: install-runtime install-configs
else ifeq ($(ARCH), riscv64gc)
default: runtime show-header
test:


@@ -7,9 +7,6 @@
MACHINETYPE := pseries
KERNELPARAMS := cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1
MACHINEACCELERATORS := "cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-large-decr=off,cap-ccf-assist=off"
CPUFEATURES := pmu=off
CPUFEATURES :=
QEMUCMD := qemu-system-ppc64
# dragonball binary name
DBCMD := dragonball


@@ -147,7 +147,7 @@ impl Tap {
// ioctl is safe since we call it with a valid tap fd and check the return
// value.
let ret = unsafe { ioctl_with_mut_ref(&tuntap, TUNSETIFF(), ifr) };
let ret = unsafe { ioctl_with_mut_ref(&tuntap, libc::TUNSETIFF as libc::c_ulong, ifr) };
if ret < 0 {
return Err(Error::CreateTap(IoError::last_os_error()));
}


@@ -617,8 +617,10 @@ impl QemuInner {
todo!()
}
pub(crate) fn set_capabilities(&mut self, _flag: CapabilityBits) {
todo!()
pub(crate) fn set_capabilities(&mut self, flag: CapabilityBits) {
let mut caps = Capabilities::default();
caps.set(flag)
}
pub(crate) fn set_guest_memory_block_size(&mut self, size: u32) {


@@ -1,30 +0,0 @@
on: ["pull_request"]
name: Unit tests
permissions:
contents: read
jobs:
test:
name: test
strategy:
matrix:
go-version: [1.15.x, 1.16.x]
os: [ubuntu-22.04]
runs-on: ${{ matrix.os }}
steps:
- name: Install Go
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5.6.0
with:
go-version: ${{ matrix.go-version }}
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: golangci-lint
uses: golangci/golangci-lint-action@4696ba8babb6127d732c3c6dde519db15edab9ea # v6.5.1
with:
version: latest
args: -c .golangci.yml -v
- name: go test
run: go test ./...


@@ -1 +0,0 @@
*~


@@ -1,35 +0,0 @@
# Copyright (c) 2021 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
run:
concurrency: 4
deadline: 600s
skip-dirs:
- vendor
# Ignore auto-generated protobuf code.
skip-files:
- ".*\\.pb\\.go$"
linters:
disable-all: true
enable:
- deadcode
- gocyclo
- gofmt
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- structcheck
- typecheck
- unconvert
- unused
- varcheck
linters-settings:
gocyclo:
min_complexity: 15
unused:
check-exported: true


@@ -1,26 +0,0 @@
language: go
go:
- "1.10"
- "1.11"
- tip
arch:
- s390x
go_import_path: github.com/kata-containers/govmm
matrix:
allow_failures:
- go: tip
before_install:
- go get github.com/alecthomas/gometalinter
- gometalinter --install
- go get github.com/mattn/goveralls
script:
- go env
- gometalinter --tests --vendor --disable-all --enable=misspell --enable=vet --enable=ineffassign --enable=gofmt --enable=gocyclo --cyclo-over=15 --enable=golint --enable=errcheck --enable=deadcode --enable=staticcheck -enable=gas ./...
after_success:
- $GOPATH/bin/goveralls -repotoken $COVERALLS_TOKEN -v -service=travis-ci


@@ -1,124 +0,0 @@
# Copyright (c) K3s contributors
#
# SPDX-License-Identifier: Apache-2.0
#
{{- /* */ -}}
# File generated by {{ .Program }}. DO NOT EDIT. Use config-v3.toml.tmpl instead.
version = 3
imports = ["__CONTAINERD_IMPORTS_PATH__"]
root = {{ printf "%q" .NodeConfig.Containerd.Root }}
state = {{ printf "%q" .NodeConfig.Containerd.State }}
[grpc]
address = {{ deschemify .NodeConfig.Containerd.Address | printf "%q" }}
[plugins.'io.containerd.internal.v1.opt']
path = {{ printf "%q" .NodeConfig.Containerd.Opt }}
[plugins.'io.containerd.grpc.v1.cri']
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
[plugins.'io.containerd.cri.v1.runtime']
enable_selinux = {{ .NodeConfig.SELinux }}
enable_unprivileged_ports = {{ .EnableUnprivileged }}
enable_unprivileged_icmp = {{ .EnableUnprivileged }}
device_ownership_from_security_context = {{ .NonrootDevices }}
{{ if .DisableCgroup}}
disable_cgroup = true
{{ end }}
{{ if .IsRunningInUserNS }}
disable_apparmor = true
restrict_oom_score_adj = true
{{ end }}
{{ with .NodeConfig.AgentConfig.Snapshotter }}
[plugins.'io.containerd.cri.v1.images']
snapshotter = "{{ . }}"
disable_snapshot_annotations = {{ if eq . "stargz" }}false{{else}}true{{end}}
use_local_image_pull = true
{{ end }}
{{ with .NodeConfig.AgentConfig.PauseImage }}
[plugins.'io.containerd.cri.v1.images'.pinned_images]
sandbox = "{{ . }}"
{{ end }}
{{- if or .NodeConfig.AgentConfig.CNIBinDir .NodeConfig.AgentConfig.CNIConfDir }}
[plugins.'io.containerd.cri.v1.runtime'.cni]
{{ with .NodeConfig.AgentConfig.CNIBinDir }}bin_dirs = [{{ printf "%q" . }}]{{ end }}
{{ with .NodeConfig.AgentConfig.CNIConfDir }}conf_dir = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ if or .NodeConfig.Containerd.BlockIOConfig .NodeConfig.Containerd.RDTConfig }}
[plugins.'io.containerd.service.v1.tasks-service']
{{ with .NodeConfig.Containerd.BlockIOConfig }}blockio_config_file = {{ printf "%q" . }}{{ end }}
{{ with .NodeConfig.Containerd.RDTConfig }}rdt_config_file = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ with .NodeConfig.DefaultRuntime }}
[plugins.'io.containerd.cri.v1.runtime'.containerd]
default_runtime_name = "{{ . }}"
{{ end }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
SystemdCgroup = {{ .SystemdCgroup }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runhcs-wcow-process]
runtime_type = "io.containerd.runhcs.v1"
{{ range $k, $v := .ExtraRuntimes }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}']
runtime_type = "{{$v.RuntimeType}}"
{{ with $v.BinaryName}}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}'.options]
BinaryName = {{ printf "%q" . }}
SystemdCgroup = {{ $.SystemdCgroup }}
{{ end }}
{{ end }}
[plugins.'io.containerd.cri.v1.images'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.cri.v1.images'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ with .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins.'io.containerd.snapshotter.v1.stargz']
cri_keychain_image_service_path = {{ printf "%q" . }}
[plugins.'io.containerd.snapshotter.v1.stargz'.cri_keychain]
enable_keychain = true
{{ end }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ end }}


@@ -0,0 +1,213 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Kata Deploy Lifecycle Tests
#
# Validates kata-deploy behavior during DaemonSet restarts and uninstalls:
#
# 1. Artifacts present: After install, kata artifacts exist on the host,
# RuntimeClasses are created, and the node is labeled.
#
# 2. Restart resilience: Running kata pods must survive a kata-deploy
# DaemonSet restart without crashing. (Regression test for #12761)
#
# 3. Artifact cleanup: After helm uninstall, kata artifacts must be
# fully removed from the host and containerd must remain healthy.
#
# Required environment variables:
# DOCKER_REGISTRY - Container registry for kata-deploy image
# DOCKER_REPO - Repository name for kata-deploy image
# DOCKER_TAG - Image tag to test
# KATA_HYPERVISOR - Hypervisor to test (qemu, clh, etc.)
# KUBERNETES - K8s distribution (microk8s, k3s, rke2, etc.)
load "${BATS_TEST_DIRNAME}/../../common.bash"
repo_root_dir="${BATS_TEST_DIRNAME}/../../../"
load "${repo_root_dir}/tests/gha-run-k8s-common.sh"
source "${BATS_TEST_DIRNAME}/lib/helm-deploy.bash"
LIFECYCLE_POD_NAME="kata-lifecycle-test"
# Run a command on the host node's filesystem using a short-lived privileged pod.
# The host root is mounted at /host inside the pod.
# Usage: run_on_host "test -d /host/opt/kata && echo YES || echo NO"
run_on_host() {
local cmd="$1"
local node_name
node_name=$(kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name | head -1)
local pod_name="host-exec-${RANDOM}"
kubectl run "${pod_name}" \
--image=quay.io/kata-containers/alpine-bash-curl:latest \
--restart=Never --rm -i \
--overrides="{
\"spec\": {
\"nodeName\": \"${node_name}\",
\"activeDeadlineSeconds\": 300,
\"tolerations\": [{\"operator\": \"Exists\"}],
\"containers\": [{
\"name\": \"exec\",
\"image\": \"quay.io/kata-containers/alpine-bash-curl:latest\",
\"imagePullPolicy\": \"IfNotPresent\",
\"command\": [\"sh\", \"-c\", \"${cmd}\"],
\"securityContext\": {\"privileged\": true},
\"volumeMounts\": [{\"name\": \"host\", \"mountPath\": \"/host\", \"readOnly\": true}]
}],
\"volumes\": [{\"name\": \"host\", \"hostPath\": {\"path\": \"/\"}}]
}
}"
}
setup_file() {
ensure_helm
echo "# Image: ${DOCKER_REGISTRY}/${DOCKER_REPO}:${DOCKER_TAG}" >&3
echo "# Hypervisor: ${KATA_HYPERVISOR}" >&3
echo "# K8s distribution: ${KUBERNETES}" >&3
echo "# Deploying kata-deploy..." >&3
deploy_kata
echo "# kata-deploy deployed successfully" >&3
}
@test "Kata artifacts are present on host after install" {
echo "# Checking kata artifacts on host..." >&3
run run_on_host "test -d /host/opt/kata && echo PRESENT || echo MISSING"
echo "# /opt/kata directory: ${output}" >&3
[[ "${output}" == *"PRESENT"* ]]
run run_on_host "test -f /host/opt/kata/bin/containerd-shim-kata-v2 && echo FOUND || (test -f /host/opt/kata/runtime-rs/bin/containerd-shim-kata-v2 && echo FOUND || echo MISSING)"
echo "# containerd-shim-kata-v2: ${output}" >&3
[[ "${output}" == *"FOUND"* ]]
# RuntimeClasses must exist (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses: ${rc_count}" >&3
[[ ${rc_count} -gt 0 ]]
# Node must have the kata-runtime label
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}')
echo "# Node label katacontainers.io/kata-runtime: ${label}" >&3
[[ "${label}" == "true" ]]
}
@test "DaemonSet restart does not crash running kata pods" {
# Create a long-running kata pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ${LIFECYCLE_POD_NAME}
spec:
runtimeClassName: kata-${KATA_HYPERVISOR}
restartPolicy: Always
nodeSelector:
katacontainers.io/kata-runtime: "true"
containers:
- name: test
image: quay.io/kata-containers/alpine-bash-curl:latest
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
EOF
echo "# Waiting for kata pod to be running..." >&3
kubectl wait --for=condition=Ready "pod/${LIFECYCLE_POD_NAME}" --timeout=120s
# Record pod identity before the DaemonSet restart
local pod_uid_before
pod_uid_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
local restart_count_before
restart_count_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Pod UID before: ${pod_uid_before}, restarts: ${restart_count_before}" >&3
# Trigger a DaemonSet restart — this simulates what happens when a user
# changes a label, updates a config value, or does a rolling update.
echo "# Triggering kata-deploy DaemonSet restart..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout restart daemonset/kata-deploy
echo "# Waiting for DaemonSet rollout to complete..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout status daemonset/kata-deploy --timeout=300s
# On k3s/rke2 the new kata-deploy pod restarts the k3s service as
# part of install, which causes a brief API server outage. Wait for
# the node to become ready before querying pod status.
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
echo "# Node is ready after DaemonSet rollout" >&3
# The kata pod must still be Running with the same UID and no extra restarts.
# Retry kubectl through any residual API unavailability.
local pod_phase=""
local retries=0
while [[ ${retries} -lt 30 ]]; do
pod_phase=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.phase}' 2>/dev/null) && break
retries=$((retries + 1))
sleep 2
done
echo "# Pod phase after restart: ${pod_phase}" >&3
[[ "${pod_phase}" == "Running" ]]
local pod_uid_after
pod_uid_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
echo "# Pod UID after: ${pod_uid_after}" >&3
[[ "${pod_uid_before}" == "${pod_uid_after}" ]]
local restart_count_after
restart_count_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Restart count after: ${restart_count_after}" >&3
[[ "${restart_count_before}" == "${restart_count_after}" ]]
echo "# SUCCESS: Kata pod survived DaemonSet restart without crashing" >&3
}
@test "Artifacts are fully cleaned up after uninstall" {
echo "# Uninstalling kata-deploy..." >&3
uninstall_kata
echo "# Uninstall complete, verifying cleanup..." >&3
# Wait for node to recover — containerd restart during cleanup may
# cause brief unavailability (especially on k3s/rke2).
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
# RuntimeClasses must be gone (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses remaining: ${rc_count}" >&3
[[ ${rc_count} -eq 0 ]]
# Node label must be removed
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}' 2>/dev/null || echo "")
echo "# Node label after uninstall: '${label}'" >&3
[[ -z "${label}" ]]
# Kata artifacts must be removed from the host filesystem
echo "# Checking host filesystem for leftover artifacts..." >&3
run run_on_host "test -d /host/opt/kata && echo EXISTS || echo REMOVED"
echo "# /opt/kata: ${output}" >&3
[[ "${output}" == *"REMOVED"* ]]
# Containerd must still be healthy and reporting a valid version
local container_runtime_version
container_runtime_version=$(kubectl get nodes --no-headers -o custom-columns=CONTAINER_RUNTIME:.status.nodeInfo.containerRuntimeVersion)
echo "# Container runtime version: ${container_runtime_version}" >&3
[[ "${container_runtime_version}" != *"Unknown"* ]]
echo "# SUCCESS: All kata artifacts cleaned up, containerd healthy" >&3
}
teardown() {
if [[ "${BATS_TEST_NAME}" == *"restart"* ]]; then
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
fi
}
teardown_file() {
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
uninstall_kata 2>/dev/null || true
}

View File

@@ -20,6 +20,7 @@ else
KATA_DEPLOY_TEST_UNION=( \
"kata-deploy.bats" \
"kata-deploy-custom-runtimes.bats" \
"kata-deploy-lifecycle.bats" \
)
fi

View File

@@ -296,36 +296,6 @@ function deploy_k0s() {
sudo chown "${USER}":"${USER}" ~/.kube/config
}
# If the rendered containerd config (v3) does not import the drop-in dir, write
# the full V3 template (from tests/containerd-config-v3.tmpl) with the given
# import path and restart the service.
# Args: containerd_dir (e.g. /var/lib/rancher/k3s/agent/etc/containerd), service_name (e.g. k3s or rke2-server).
function _setup_containerd_v3_template_if_needed() {
local containerd_dir="$1"
local service_name="$2"
local template_file="${tests_dir}/containerd-config-v3.tmpl"
local rendered_v3="${containerd_dir}/config-v3.toml"
local imports_path="${containerd_dir}/config-v3.toml.d/*.toml"
if sudo test -f "${rendered_v3}" && sudo grep -q 'config-v3\.toml\.d' "${rendered_v3}" 2>/dev/null; then
return 0
fi
if [[ ! -f "${template_file}" ]]; then
echo "Template not found: ${template_file}" >&2
return 1
fi
sudo mkdir -p "${containerd_dir}/config-v3.toml.d"
sed "s|__CONTAINERD_IMPORTS_PATH__|${imports_path}|g" "${template_file}" | sudo tee "${containerd_dir}/config-v3.toml.tmpl" > /dev/null
sudo systemctl restart "${service_name}"
}
function setup_k3s_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/k3s/agent/etc/containerd" "k3s"
}
function setup_rke2_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/rke2/agent/etc/containerd" "rke2-server"
}
function deploy_k3s() {
# Set CRI runtime-request-timeout to 600s (same as kubeadm) for CoCo and long-running create requests.
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --kubelet-arg runtime-request-timeout=600s
@@ -333,9 +303,6 @@ function deploy_k3s() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_k3s_containerd_v3_template_if_needed
# Download the kubectl binary into /usr/bin and remove /usr/local/bin/kubectl
#
# We need to do this to avoid hitting issues like:
@@ -405,9 +372,6 @@ function deploy_rke2() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_rke2_containerd_v3_template_if_needed
# Link the kubectl binary into /usr/bin
sudo ln -sf /var/lib/rancher/rke2/bin/kubectl /usr/local/bin/kubectl

View File

@@ -116,12 +116,16 @@ function is_confidential_gpu_hardware() {
return 1
}
# create_loop_device creates a loop device backed by a file.
# $1: loop file path (default: /tmp/trusted-image-storage.img)
# $2: size in MiB, i.e. dd bs=1M count=... (default: 2500, ~2.4Gi)
function create_loop_device(){
local loop_file="${1:-/tmp/trusted-image-storage.img}"
local size_mb="${2:-2500}"
local node="$(get_one_kata_node)"
cleanup_loop_device "$loop_file"
-exec_host "$node" "dd if=/dev/zero of=$loop_file bs=1M count=2500"
+exec_host "$node" "dd if=/dev/zero of=$loop_file bs=1M count=$size_mb"
exec_host "$node" "losetup -fP $loop_file >/dev/null 2>&1"
local device=$(exec_host "$node" losetup -j $loop_file | awk -F'[: ]' '{print $1}')

View File

@@ -97,7 +97,10 @@ setup() {
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
-LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
+PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
+PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
+LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
+envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"
@@ -142,7 +145,10 @@ setup() {
@test "Test we cannot pull a large image that pull time exceeds createcontainer timeout inside the guest" {
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
-LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
+PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
+PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
+LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
+envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"
@@ -193,7 +199,10 @@ setup() {
fi
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
-LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
+PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
+PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
+LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
+envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"

View File

@@ -0,0 +1,219 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# This file is modeled after k8s-nvidia-nim.bats which contains helpful in-line documentation.
load "${BATS_TEST_DIRNAME}/lib.sh"
load "${BATS_TEST_DIRNAME}/confidential_common.sh"
export KATA_HYPERVISOR="${KATA_HYPERVISOR:-qemu-nvidia-gpu}"
TEE=false
if is_confidential_gpu_hardware; then
TEE=true
fi
export TEE
NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct"
[[ "${TEE}" = "true" ]] && NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct-tee"
export NIM_SERVICE_NAME
POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=600s
[[ "${TEE}" = "true" ]] && POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=1200s
export POD_READY_TIMEOUT_LLAMA_3_2_1B=${POD_READY_TIMEOUT_LLAMA_3_2_1B:-${POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED}}
export LOCAL_NIM_CACHE_LLAMA_3_2_1B="${LOCAL_NIM_CACHE_LLAMA_3_2_1B:-${LOCAL_NIM_CACHE:-/opt/nim/.cache}-llama-3-2-1b}"
DOCKER_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"username\":\"\$oauthtoken\",\"password\":\"${NGC_API_KEY}\",\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export DOCKER_CONFIG_JSON
KBS_AUTH_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export KBS_AUTH_CONFIG_JSON
NGC_API_KEY_BASE64=$(
echo -n "${NGC_API_KEY}" | base64 -w0
)
export NGC_API_KEY_BASE64
# Points to kbs:///default/ngc-api-key/instruct and thus re-uses the secret from k8s-nvidia-nim.bats.
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B="${SEALED_SECRET_PRECREATED_NIM_INSTRUCT}"
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64=$(echo -n "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B}" | base64 -w0)
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64
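The nesting of these encodings is easy to get wrong: the dockerconfigjson `auth` field is itself base64 of `user:password`, and the whole JSON is base64-encoded again before it lands in a Secret's `data` field. A minimal round-trip sketch (the key below is a made-up placeholder, not a real NGC credential):

```shell
# Placeholder key for illustration only -- the real NGC_API_KEY comes from CI secrets.
NGC_API_KEY="nvapi-example"

# Inner value: base64 of "user:password", as the dockerconfigjson "auth" field expects.
auth=$(printf '%s' "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)

# Outer value: the whole config JSON, base64-encoded again for the Secret's data field.
docker_config_json=$(printf '{"auths":{"nvcr.io":{"auth":"%s"}}}' "${auth}" | base64 -w0)

# Round trip: decoding twice recovers the original "user:password" pair.
decoded=$(printf '%s' "${docker_config_json}" | base64 -d | sed 's/.*"auth":"//;s/".*//' | base64 -d)
echo "${decoded}"
```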
# NIM Operator (k8s-nim-operator) install/uninstall for NIMService CRD.
NIM_OPERATOR_NAMESPACE="${NIM_OPERATOR_NAMESPACE:-nim-operator}"
NIM_OPERATOR_RELEASE_NAME="nim-operator"
install_nim_operator() {
command -v helm &>/dev/null || die "helm is required but not installed"
echo "Installing NVIDIA NIM Operator (latest chart)"
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
kubectl create namespace "${NIM_OPERATOR_NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install "${NIM_OPERATOR_RELEASE_NAME}" nvidia/k8s-nim-operator \
-n "${NIM_OPERATOR_NAMESPACE}" \
--wait
local deploy_name
deploy_name=$(kubectl get deployment -n "${NIM_OPERATOR_NAMESPACE}" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "${deploy_name}" ]]; then
kubectl wait --for=condition=available --timeout=300s "deployment/${deploy_name}" -n "${NIM_OPERATOR_NAMESPACE}"
fi
echo "NIM Operator install complete."
}
uninstall_nim_operator() {
echo "Uninstalling NVIDIA NIM Operator (release: ${NIM_OPERATOR_RELEASE_NAME}, namespace: ${NIM_OPERATOR_NAMESPACE})"
if helm status "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" &>/dev/null; then
helm uninstall "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" || true
kubectl delete namespace "${NIM_OPERATOR_NAMESPACE}" --ignore-not-found=true --timeout=60s || true
echo "NIM Operator uninstall complete."
else
echo "NIM Operator release not found, nothing to uninstall."
fi
}
setup_kbs_credentials() {
CC_KBS_ADDR=$(kbs_k8s_svc_http_addr)
export CC_KBS_ADDR
kubectl delete secret ngc-secret-llama-3-2-1b --ignore-not-found
kubectl create secret docker-registry ngc-secret-llama-3-2-1b --docker-server="nvcr.io" --docker-username="\$oauthtoken" --docker-password="${NGC_API_KEY}"
kbs_set_gpu0_resource_policy
kbs_set_resource_base64 "default" "credentials" "nvcr" "${KBS_AUTH_CONFIG_JSON}"
kbs_set_resource "default" "ngc-api-key" "instruct" "${NGC_API_KEY}"
}
# CDH initdata for guest-pull: KBS URL, registry credentials URI, and allow-all policy.
# NIMService is not supported by genpolicy; add_allow_all_policy_to_yaml only supports Pod/Deployment.
# Build initdata with policy inline so TEE pods get both CDH config and policy.
create_nim_initdata_file_llama_3_2_1b() {
local output_file="$1"
local cc_kbs_address
cc_kbs_address=$(kbs_k8s_svc_http_addr)
local allow_all_rego="${BATS_TEST_DIRNAME}/../../../src/kata-opa/allow-all.rego"
cat > "${output_file}" << EOF
version = "0.1.0"
algorithm = "sha256"
[data]
"aa.toml" = '''
[token_configs]
[token_configs.kbs]
url = "${cc_kbs_address}"
'''
"cdh.toml" = '''
[kbc]
name = "cc_kbc"
url = "${cc_kbs_address}"
[image]
authenticated_registry_credentials_uri = "kbs:///default/credentials/nvcr"
'''
"policy.rego" = '''
$(cat "${allow_all_rego}")
'''
EOF
}
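The generated TOML is gzipped and base64-encoded before being handed to the `io.katacontainers.config.hypervisor.cc_init_data` annotation. A sketch of that encoding and how to reverse it when debugging; the two-line initdata here is a stand-in for the real generated file:

```shell
# Stand-in initdata file for illustration; the real one is produced by
# create_nim_initdata_file_llama_3_2_1b().
initdata_file=$(mktemp)
cat > "${initdata_file}" << 'EOF'
version = "0.1.0"
algorithm = "sha256"
EOF

# Encode the way setup() does: gzip first, then single-line base64.
NIM_INITDATA_BASE64=$(gzip -c "${initdata_file}" | base64 -w0)

# Reverse the encoding to inspect what the guest will actually receive.
decoded=$(printf '%s' "${NIM_INITDATA_BASE64}" | base64 -d | gunzip)
echo "${decoded}"
```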
setup() {
setup_common || die "setup_common failed"
install_nim_operator || die "NIM Operator install failed"
dpkg -s jq >/dev/null 2>&1 || sudo apt -y install jq
# Same pattern as k8s-nvidia-nim.bats: choose manifest by TEE; each YAML has literal secret names.
local tee_suffix=""
[[ "${TEE}" = "true" ]] && tee_suffix="-tee"
export NIM_YAML_IN="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml.in"
export NIM_YAML="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml"
if [[ "${TEE}" = "true" ]]; then
setup_kbs_credentials
setup_sealed_secret_signing_public_key
initdata_file="${BATS_SUITE_TMPDIR}/nim-initdata-llama-3-2-1b.toml"
create_nim_initdata_file_llama_3_2_1b "${initdata_file}"
NIM_INITDATA_BASE64=$(gzip -c "${initdata_file}" | base64 -w0)
export NIM_INITDATA_BASE64
fi
envsubst < "${NIM_YAML_IN}" > "${NIM_YAML}"
}
@test "NIMService llama-3.2-1b-instruct serves /v1/models" {
echo "NIMService test: Applying NIM YAML"
kubectl apply -f "${NIM_YAML}"
echo "NIMService test: Waiting for deployment to exist (operator creates it from NIMService)"
local wait_exist_timeout=30
local elapsed=0
while ! kubectl get deployment "${NIM_SERVICE_NAME}" &>/dev/null; do
if [[ ${elapsed} -ge ${wait_exist_timeout} ]]; then
echo "Deployment ${NIM_SERVICE_NAME} did not appear within ${wait_exist_timeout}s" >&2
kubectl get deployment "${NIM_SERVICE_NAME}" 2>&1 || true
false
fi
sleep 5
elapsed=$((elapsed + 5))
done
local pod_name
pod_name=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
echo "NIMService test: POD_NAME=${pod_name} (waiting for pod ready, timeout ${POD_READY_TIMEOUT_LLAMA_3_2_1B})"
[[ -n "${pod_name}" ]]
kubectl wait --for=condition=ready --timeout="${POD_READY_TIMEOUT_LLAMA_3_2_1B}" "pod/${pod_name}"
local pod_ip
pod_ip=$(kubectl get pod "${pod_name}" -o jsonpath='{.status.podIP}')
echo "NIMService test: POD_IP=${pod_ip}"
[[ -n "${pod_ip}" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/models"
run curl -sS --connect-timeout 10 "http://${pod_ip}:8000/v1/models"
echo "NIMService test: /v1/models response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "list" ]]
[[ "$(echo "${output}" | jq -r '.data[0].id')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.data[0].object')" == "model" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/chat/completions"
run curl -sS --connect-timeout 30 "http://${pod_ip}:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model":"meta/llama-3.2-1b-instruct","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
echo "NIMService test: /v1/chat/completions response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "chat.completion" ]]
[[ "$(echo "${output}" | jq -r '.model')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.choices[0].message | has("content") or has("reasoning_content")')" == "true" ]]
}
teardown() {
if kubectl get nimservice "${NIM_SERVICE_NAME}" &>/dev/null; then
POD_NAME=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
if [[ -n "${POD_NAME}" ]]; then
echo "=== NIMService pod logs ==="
kubectl logs "${POD_NAME}" || true
kubectl describe pod "${POD_NAME}" || true
fi
kubectl describe nimservice "${NIM_SERVICE_NAME}" || true
fi
[ -f "${NIM_YAML}" ] && kubectl delete -f "${NIM_YAML}" --ignore-not-found=true
uninstall_nim_operator || true
print_node_journal_since_test_start "${node}" "${node_start_time:-}" "${BATS_TEST_COMPLETED:-}"
}

View File

@@ -85,6 +85,8 @@ setup_langchain_flow() {
# generated policy.rego to it and set it as the cc_init_data annotation.
# We must overwrite the default empty file AFTER create_tmp_policy_settings_dir()
# copies it to the temp directory.
# As we use multiple vCPUs we set `max_concurrent_layer_downloads_per_image = 1`,
# see: https://github.com/kata-containers/kata-containers/issues/12721
create_nim_initdata_file() {
local output_file="$1"
local cc_kbs_address
@@ -107,6 +109,7 @@ name = "cc_kbc"
url = "${cc_kbs_address}"
[image]
max_concurrent_layer_downloads_per_image = 1
authenticated_registry_credentials_uri = "kbs:///default/credentials/nvcr"
'''
EOF
@@ -189,12 +192,35 @@ setup_file() {
# This must happen AFTER create_tmp_policy_settings_dir() copies the empty
# file and BEFORE auto_generate_policy() runs.
create_nim_initdata_file "${policy_settings_dir}/default-initdata.toml"
# Container image layer storage: one block device and PV/PVC per pod.
storage_config_template="${pod_config_dir}/confidential/trusted-storage.yaml.in"
instruct_storage_mib=57344
local_device_instruct=$(create_loop_device /tmp/trusted-image-storage-instruct.img "$instruct_storage_mib")
storage_config_instruct=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").instruct.XXX")
PV_NAME=trusted-block-pv-instruct PVC_NAME=trusted-pvc-instruct \
PV_STORAGE_CAPACITY="${instruct_storage_mib}Mi" PVC_STORAGE_REQUEST="${instruct_storage_mib}Mi" \
LOCAL_DEVICE="$local_device_instruct" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config_instruct"
retry_kubectl_apply "$storage_config_instruct"
if [ "${SKIP_MULTI_GPU_TESTS}" != "true" ]; then
embedqa_storage_mib=8192
local_device_embedqa=$(create_loop_device /tmp/trusted-image-storage-embedqa.img "$embedqa_storage_mib")
storage_config_embedqa=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").embedqa.XXX")
PV_NAME=trusted-block-pv-embedqa PVC_NAME=trusted-pvc-embedqa \
PV_STORAGE_CAPACITY="${embedqa_storage_mib}Mi" PVC_STORAGE_REQUEST="${embedqa_storage_mib}Mi" \
LOCAL_DEVICE="$local_device_embedqa" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config_embedqa"
retry_kubectl_apply "$storage_config_embedqa"
fi
fi
create_inference_pod
if [ "${SKIP_MULTI_GPU_TESTS}" != "true" ]; then
create_embedqa_pod
fi
}
@@ -459,5 +485,13 @@ teardown_file() {
[ -f "${POD_EMBEDQA_YAML}" ] && kubectl delete -f "${POD_EMBEDQA_YAML}" --ignore-not-found=true
fi
if [[ "${TEE}" = "true" ]]; then
kubectl delete --ignore-not-found pvc trusted-pvc-instruct trusted-pvc-embedqa
kubectl delete --ignore-not-found pv trusted-block-pv-instruct trusted-block-pv-embedqa
kubectl delete --ignore-not-found storageclass local-storage
cleanup_loop_device /tmp/trusted-image-storage-instruct.img || true
cleanup_loop_device /tmp/trusted-image-storage-embedqa.img || true
fi
print_node_journal_since_test_start "${node}" "${node_start_time:-}" "${BATS_TEST_COMPLETED:-}" >&3
}

View File

@@ -89,7 +89,8 @@ if [[ -n "${K8S_TEST_NV:-}" ]]; then
else
K8S_TEST_NV=("k8s-confidential-attestation.bats" \
"k8s-nvidia-cuda.bats" \
-"k8s-nvidia-nim.bats")
+"k8s-nvidia-nim.bats" \
+"k8s-nvidia-nim-service.bats")
fi
SUPPORTED_HYPERVISORS=("qemu-nvidia-gpu" "qemu-nvidia-gpu-snp" "qemu-nvidia-gpu-tdx")

View File

@@ -14,10 +14,10 @@ volumeBindingMode: WaitForFirstConsumer
apiVersion: v1
kind: PersistentVolume
metadata:
-name: trusted-block-pv
+name: $PV_NAME
spec:
capacity:
-storage: 10Gi
+storage: $PV_STORAGE_CAPACITY
volumeMode: Block
accessModes:
- ReadWriteOnce
@@ -37,12 +37,12 @@ spec:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
-name: trusted-pvc
+name: $PVC_NAME
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
-storage: 1Gi
+storage: $PVC_STORAGE_REQUEST
volumeMode: Block
storageClassName: local-storage

View File

@@ -69,14 +69,20 @@ spec:
limits:
nvidia.com/pgpu: "1"
cpu: "16"
-memory: "64Gi"
+memory: "48Gi"
volumeMounts:
- name: nim-trusted-cache
mountPath: /opt/nim/.cache
volumeDevices:
- devicePath: /dev/trusted_store
name: trusted-storage
volumes:
- name: nim-trusted-cache
emptyDir:
sizeLimit: 64Gi
- name: trusted-storage
persistentVolumeClaim:
claimName: trusted-pvc-instruct
---
apiVersion: v1
kind: Secret

View File

@@ -0,0 +1,54 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-sealed-llama-3-2-1b
# The NIMService CRD offers no way to select the /dev/trusted_store container
# image layer storage feature; storage.emptyDir selects the container data
# storage feature instead.
storage:
emptyDir:
sizeLimit: 10Gi
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "56Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
annotations:
io.katacontainers.config.hypervisor.kernel_params: "agent.guest_components_procs=confidential-data-hub agent.aa_kbc_params=cc_kbc::${CC_KBS_ADDR}"
io.katacontainers.config.hypervisor.cc_init_data: "${NIM_INITDATA_BASE64}"
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-sealed-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64}"

View File

@@ -0,0 +1,48 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-llama-3-2-1b
storage:
hostPath: "${LOCAL_NIM_CACHE_LLAMA_3_2_1B}"
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "16Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_BASE64}"

View File

@@ -83,10 +83,16 @@ spec:
volumeMounts:
- name: nim-trusted-cache
mountPath: /opt/nim/.cache
volumeDevices:
- devicePath: /dev/trusted_store
name: trusted-storage
volumes:
- name: nim-trusted-cache
emptyDir:
sizeLimit: 40Gi
- name: trusted-storage
persistentVolumeClaim:
claimName: trusted-pvc-embedqa
---
apiVersion: v1
kind: Secret

View File

@@ -155,6 +155,7 @@ pub struct Config {
pub containerd_conf_file: String,
pub containerd_conf_file_backup: String,
pub containerd_drop_in_conf_file: String,
pub daemonset_name: String,
pub custom_runtimes_enabled: bool,
pub custom_runtimes: Vec<CustomRuntime>,
}
@@ -169,6 +170,12 @@ impl Config {
return Err(anyhow::anyhow!("NODE_NAME must not be empty"));
}
let daemonset_name = env::var("DAEMONSET_NAME")
.ok()
.map(|v| v.trim().to_string())
.filter(|v| !v.is_empty())
.unwrap_or_else(|| "kata-deploy".to_string());
let debug = env::var("DEBUG").unwrap_or_else(|_| "false".to_string()) == "true";
// Parse shims - only use arch-specific variable
@@ -293,6 +300,7 @@ impl Config {
containerd_conf_file,
containerd_conf_file_backup,
containerd_drop_in_conf_file,
daemonset_name,
custom_runtimes_enabled,
custom_runtimes,
};

View File

@@ -94,30 +94,41 @@ impl K8sClient {
Ok(())
}
-pub async fn count_kata_deploy_daemonsets(&self) -> Result<usize> {
+/// Returns whether a non-terminating DaemonSet with this exact name
+/// exists in the current namespace. Used to decide whether this pod is
+/// being restarted (true) or uninstalled (false).
+pub async fn own_daemonset_exists(&self, daemonset_name: &str) -> Result<bool> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::Api;
let ds_api: Api<DaemonSet> = Api::default_namespaced(self.client.clone());
match ds_api.get_opt(daemonset_name).await? {
Some(ds) => Ok(ds.metadata.deletion_timestamp.is_none()),
None => Ok(false),
}
}
/// Returns how many non-terminating DaemonSets across all namespaces
/// have a name containing "kata-deploy". Used to decide whether shared
/// node-level resources (node label, CRI restart) should be cleaned up:
/// they are only safe to remove when no kata-deploy instance remains
/// on the cluster.
pub async fn count_any_kata_deploy_daemonsets(&self) -> Result<usize> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::{Api, ListParams};
-let ds_api: Api<DaemonSet> = Api::default_namespaced(self.client.clone());
-let lp = ListParams::default();
-let daemonsets = ds_api.list(&lp).await?;
+let ds_api: Api<DaemonSet> = Api::all(self.client.clone());
+let daemonsets = ds_api.list(&ListParams::default()).await?;
// Note: We use client-side filtering here because Kubernetes field selectors
// don't support "contains" operations - they only support exact matches and comparisons.
// Filtering by name containing "kata-deploy" requires client-side processing.
// Exclude DaemonSets that are terminating (have deletion_timestamp) so that when our
// DaemonSet pod runs cleanup on SIGTERM during uninstall, we count 0 and remove the label.
let count = daemonsets
.iter()
.filter(|ds| {
-if ds.metadata.deletion_timestamp.is_some() {
-return false;
-}
-ds.metadata
-.name
-.as_ref()
-.map(|n| n.contains("kata-deploy"))
-.unwrap_or(false)
+ds.metadata.deletion_timestamp.is_none()
+&& ds
+.metadata
+.name
+.as_ref()
+.is_some_and(|n| n.contains("kata-deploy"))
})
.count();
@@ -584,9 +595,14 @@ pub async fn label_node(
client.label_node(label_key, label_value, overwrite).await
}
-pub async fn count_kata_deploy_daemonsets(config: &Config) -> Result<usize> {
+pub async fn own_daemonset_exists(config: &Config) -> Result<bool> {
let client = K8sClient::new(&config.node_name).await?;
-client.count_kata_deploy_daemonsets().await
+client.own_daemonset_exists(&config.daemonset_name).await
}
pub async fn count_any_kata_deploy_daemonsets(config: &Config) -> Result<usize> {
let client = K8sClient::new(&config.node_name).await?;
client.count_any_kata_deploy_daemonsets().await
}
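Because Kubernetes field selectors cannot express a "name contains" match, the filter above has to run client-side. The same idea from a shell, using canned names standing in for `kubectl get daemonsets -A` output (a sketch, not the deploy tool's code):

```shell
# Canned DaemonSet names, standing in for real `kubectl get daemonsets -A -o name` output.
names='kube-proxy
kata-deploy
kata-deploy-tdx
node-exporter'

# Field selectors only support exact matches and comparisons, so the substring
# filter happens client-side -- here with grep, as the Rust code does with contains().
count=$(printf '%s\n' "${names}" | grep -c 'kata-deploy')
echo "${count}"
```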
pub async fn crd_exists(config: &Config, crd_name: &str) -> Result<bool> {

View File

@@ -236,19 +236,29 @@ async fn install(config: &config::Config, runtime: &str) -> Result<()> {
async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
info!("Cleaning up Kata Containers");
-info!("Counting kata-deploy daemonsets");
-let kata_deploy_installations = k8s::count_kata_deploy_daemonsets(config).await?;
+// Step 1: Check if THIS pod's owning DaemonSet still exists.
+// If it does, this is a pod restart (rolling update, label change, etc.),
+// not an uninstall — skip everything so running kata pods are not disrupted.
info!(
-"Found {} kata-deploy daemonset(s)",
-kata_deploy_installations
+"Checking if DaemonSet '{}' still exists",
+config.daemonset_name
);
-if kata_deploy_installations == 0 {
-info!("Removing kata-runtime label from node");
-k8s::label_node(config, "katacontainers.io/kata-runtime", None, false).await?;
-info!("Successfully removed kata-runtime label");
+if k8s::own_daemonset_exists(config).await? {
+info!(
+"DaemonSet '{}' still exists, \
+skipping all cleanup to avoid disrupting running kata pods",
+config.daemonset_name
+);
+return Ok(());
}
// Step 2: Our DaemonSet is gone (uninstall). Perform instance-specific
// cleanup: snapshotters, CRI config, and artifacts for this instance.
info!(
"DaemonSet '{}' not found, proceeding with instance cleanup",
config.daemonset_name
);
match config.experimental_setup_snapshotter.as_ref() {
Some(snapshotters) => {
for snapshotter in snapshotters {
@@ -270,6 +280,25 @@ async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
artifacts::remove_artifacts(config).await?;
info!("Successfully removed kata artifacts");
// Step 3: Check if ANY other kata-deploy DaemonSets still exist.
// Shared resources (node label, CRI restart) are only safe to touch
// when no other kata-deploy instance remains.
let other_ds_count = k8s::count_any_kata_deploy_daemonsets(config).await?;
if other_ds_count > 0 {
info!(
"{} other kata-deploy DaemonSet(s) still exist, \
skipping node label removal and CRI restart",
other_ds_count
);
return Ok(());
}
info!("No other kata-deploy DaemonSets found, performing full shared cleanup");
info!("Removing kata-runtime label from node");
k8s::label_node(config, "katacontainers.io/kata-runtime", None, false).await?;
info!("Successfully removed kata-runtime label");
// Restart the CRI runtime last. On k3s/rke2 this restarts the entire
// server process, which kills this (terminating) pod. By doing it after
// all other cleanup, we ensure config and artifacts are already gone.

View File

@@ -51,18 +51,19 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
return Ok("crio".to_string());
}
if runtime_version.contains("containerd") && runtime_version.contains("-k3s") {
-// Check systemd services (ignore errors - service might not exist)
-let _ = utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]);
-if utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]).is_ok() {
-return Ok("rke2-agent".to_string());
-}
-if utils::host_systemctl(&["is-active", "--quiet", "rke2-server"]).is_ok() {
-return Ok("rke2-server".to_string());
-}
-if utils::host_systemctl(&["is-active", "--quiet", "k3s-agent"]).is_ok() {
-return Ok("k3s-agent".to_string());
-}
+// Detect k3s/rke2 via systemd services rather than the containerd version
+// string, which no longer reliably contains "k3s" in newer releases
+// (e.g. "containerd://2.2.2-bd1.34").
+if utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]).is_ok() {
+return Ok("rke2-agent".to_string());
+}
+if utils::host_systemctl(&["is-active", "--quiet", "rke2-server"]).is_ok() {
+return Ok("rke2-server".to_string());
+}
+if utils::host_systemctl(&["is-active", "--quiet", "k3s-agent"]).is_ok() {
+return Ok("k3s-agent".to_string());
+}
if utils::host_systemctl(&["is-active", "--quiet", "k3s"]).is_ok() {
return Ok("k3s".to_string());
}
@@ -83,7 +84,7 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
Ok(runtime)
}
-/// Returns true if containerRuntimeVersion (e.g. "containerd://2.1.5-k3s1") indicates
+/// Returns true if containerRuntimeVersion (e.g. "containerd://2.1.5-k3s1", "containerd://2.2.2-bd1.34") indicates
/// containerd 2.x or newer, false for 1.x or unparseable. Used for drop-in support
/// and for K3s/RKE2 template selection (config-v3.toml.tmpl vs config.toml.tmpl).
pub fn containerd_version_is_2_or_newer(runtime_version: &str) -> bool {
@@ -191,6 +192,7 @@ mod tests {
#[case("containerd://2.0.0", true)]
#[case("containerd://2.1.5", true)]
#[case("containerd://2.1.5-k3s1", true)]
#[case("containerd://2.2.2-bd1.34", true)]
#[case("containerd://2.2.0", true)]
#[case("containerd://2.3.1", true)]
#[case("containerd://2.0.0-rc.1", true)]

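The version gate exercised by these test cases can be approximated in shell: strip the `containerd://` scheme, keep the leading major component, and compare numerically. A rough equivalent of the Rust helper, not the tool's code (unparseable strings simply fail the numeric test and count as not-2.x):

```shell
containerd_major_is_2_or_newer() {
    # Strip the "containerd://" prefix, then keep everything before the first dot,
    # i.e. the major version ("2" for "2.2.2-bd1.34", "1" for "1.7.27").
    local version="${1#containerd://}"
    local major="${version%%.*}"
    # Non-numeric majors make the test error out (suppressed), returning non-zero.
    [ "${major}" -ge 2 ] 2>/dev/null
}

containerd_major_is_2_or_newer "containerd://2.2.2-bd1.34" && echo "v2+: drop-in supported"
containerd_major_is_2_or_newer "containerd://1.7.27" || echo "v1: legacy template"
```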
View File

@@ -143,6 +143,13 @@ spec:
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.env.multiInstallSuffix }}
- name: DAEMONSET_NAME
value: {{ printf "%s-%s" .Chart.Name .Values.env.multiInstallSuffix | quote }}
{{- else }}
- name: DAEMONSET_NAME
value: {{ .Chart.Name | quote }}
{{- end }}
- name: DEBUG
value: {{ include "kata-deploy.getDebug" . | quote }}
{{- $shimsAmd64 := include "kata-deploy.getEnabledShimsForArch" (dict "root" . "arch" "amd64") | trim -}}

View File

@@ -5,6 +5,8 @@ Before=containerd.service
[Service]
ExecStart=@CONTAINERD_NYDUS_GRPC_BINARY@ --config @CONFIG_GUEST_PULLING@ --log-to-stdout
Restart=always
RestartSec=5
[Install]
RequiredBy=containerd.service
WantedBy=containerd.service