Compare commits

...

32 Commits

Author SHA1 Message Date
Fabiano Fidêncio
13b4160523 Revert "ci: Drop docker tests"
This reverts commit d82eb8d0f1.
2026-04-02 18:28:48 +02:00
llink5
5ae8a608df runtime: use symptom-based rescan instead of runtime detection
Modern container runtimes (Docker 29+) no longer advertise their
identity through OCI hooks or annotations. Rather than attempting
fragile runtime detection, check for the symptom: no network
endpoints after sandbox creation.

- Remove IsDockerContainer guard from RescanNetwork goroutine
- Remove container kill on timeout (too aggressive without reliable
  runtime detection, breaks CNI on slow architectures)
- Restore original startVM endpoint scan condition (fixes CNI
  regression on s390x)
- RescanNetwork returns nil on timeout with warning instead of error

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 18:27:21 +02:00
llink5
d13bd3f7eb runtime: kill container on network timeout, address review nitpicks
- Kill container via SIGKILL when RescanNetwork times out instead of
  silently continuing without networking
- Remove unused networkReady channel
- Fix import ordering, structured logging, log levels
- Remove double-logging on timeout path
- Add rollback comment and interface doc comment
- Use logrus.Fields and plain const consistently

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 18:27:14 +02:00
llink5
66a04b3114 runtime: compare Dev+Ino for netns identity check
Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 18:27:11 +02:00
llink5
c445eea774 runtime: harden Docker 26+ networking fix
- Replace sandbox ID denylist with positive regex (^[0-9a-f]{64}$)
- Rollback partially-added endpoints on scan failure in scanEndpointsInNs

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 18:26:39 +02:00
llink5
d43c5c20de runtime: fix Docker 26+ networking by rescanning after Start
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).

Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.

Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
  field index, fixing parsing with optional mount tags (shared:,
  master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
  args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
  to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start

Fixes: #9340

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 18:26:34 +02:00
Fabiano Fidêncio
09194d71bb Merge pull request #12767 from nubificus/fix/fc-rs
runtime-rs: Fix FC API fields
2026-04-02 18:24:35 +02:00
Manuel Huber
dd868dee6d tests: nvidia: onboard NIM service test
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.

For now, the TEE-based flow uses an allow-all policy. In future
work, we aim to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-02 16:58:54 +02:00
Steve Horsman
58101a2166 Merge pull request #12656 from stevenhorsman/actions/checkout-bump
workflows: Update actions/checkout version
2026-04-01 17:34:39 +01:00
Fabiano Fidêncio
75df4c0bd3 Merge pull request #12766 from fidencio/topic/kata-deploy-avoid-kata-pods-to-crash-after-containerd-restart
kata-deploy: Fix kata-deploy pods crashing if containerd restarts
2026-04-01 18:28:16 +02:00
Steve Horsman
2830c4f080 Merge pull request #12746 from ldoktor/ci-helm2
ci.ocp: Use helm deployment for peer-pods
2026-04-01 17:13:21 +01:00
Lukáš Doktor
55a3772032 ci.ocp: Add note about external tests to README.md
To run all the tests that are run in CI, we need to enable external
tests. This can be a bit tricky, so add it to our documentation.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
Lukáš Doktor
3bc460fd82 ci.ocp: Use helm deployment for peer-pods
Replace the deprecated CAA deployment with the helm-based one. Note that
this also installs the CAA mutating webhook, which wasn't installed before.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
Fabiano Fidêncio
2131147360 tests: add kata-deploy lifecycle tests for restart resilience and cleanup
Add functional tests that cover three previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:53 +02:00
Fabiano Fidêncio
b4b62417ed kata-deploy: skip cleanup on pod restart to avoid crashing kata pods
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label
change or rolling update), the SIGTERM handler runs cleanup which
unconditionally removes kata artifacts and restarts containerd. This
causes containerd to lose the kata shim binary, crashing all running
kata pods on the node.

Fix this by implementing a three-stage cleanup decision:

1. If this pod's owning DaemonSet still exists (exact name match via
   DAEMONSET_NAME env var), this is a pod restart — skip all cleanup.
   The replacement pod will re-run install, which is idempotent.

2. If this DaemonSet is gone but other kata-deploy DaemonSets still
   exist (multi-install scenario), perform instance-specific cleanup
   only (snapshotters, CRI config, artifacts) but skip shared
   resources (node label removal, CRI restart) to avoid disrupting
   the other instances.

3. If no kata-deploy DaemonSets remain, perform full cleanup including
   node label removal and CRI restart.

The Helm chart injects a DAEMONSET_NAME environment variable with the
exact DaemonSet name (including any multi-install suffix), ensuring
instance-aware lookup rather than broadly matching any DaemonSet
containing "kata-deploy".

Fixes: #12761

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:52 +02:00
Fabiano Fidêncio
28414a614e kata-deploy: detect k3s/rke2 via systemd services instead of version string
Newer k3s releases (v1.34+) no longer include "k3s" in the containerd
version string at all (e.g. "containerd://2.2.2-bd1.34" instead of
"containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to
the default "containerd" runtime, configuring and restarting the system
containerd service instead of k3s's embedded containerd — leaving the
kata runtime invisible to k3s.

Fix by detecting k3s/rke2 via their systemd service names (k3s,
k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd
version string. This is more robust and works regardless of how k3s
formats its containerd version.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
Fabiano Fidêncio
8b9ce3b6cb tests: remove k3s/rke2 V3 containerd template workaround
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.

Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()

This reverts the test infrastructure part of a2216ec05.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
Steve Horsman
9a3f6b075e Merge pull request #12753 from stevenhorsman/remove-agent-Cargo-
agent: Remove Cargo.lock
2026-04-01 13:22:57 +01:00
Steve Horsman
0d38d88b07 Merge pull request #12484 from Amulyam24/runtime-rs-ppc64le
runtime-rs: add QEMU support for ppc64le
2026-04-01 12:54:40 +01:00
Fabiano Fidêncio
6555350625 Merge pull request #12765 from fidencio/topic/kata-deploy-nydus-fix-systemd-unit
kata-deploy: Make nydus a soft dep of containerd
2026-04-01 13:40:42 +02:00
Fabiano Fidêncio
b823184cf7 Merge pull request #12580 from manuelh-dev/mahuber/gpu-ci-storage
tests: gpu: use the container image layer storage feature
2026-04-01 12:23:56 +02:00
Fabiano Fidêncio
fe1f804543 kata-deploy: Restart nydus-snapshotter in case of failure
Let's ensure that in case nydus-snapshotter crashes for one reason or
another, the service is restarted.

This follows containerd's approach, and avoids manual intervention on the
node.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 11:00:21 +02:00
Fabiano Fidêncio
789abe6fdf kata-deploy: Make nydus a soft dep of containerd
Let's relax our RequiredBy to a WantedBy in the nydus systemd unit file:
with RequiredBy, a nydus crash would also take containerd down, causing
the node to become NotReady.
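Taken together with the restart fix above, the two nydus-snapshotter unit changes amount to something like the following fragment (illustrative paths and names, not the exact unit file kata-deploy ships):

```ini
[Unit]
Description=nydus snapshotter

[Service]
ExecStart=/opt/kata/bin/containerd-nydus-grpc
# Restart the snapshotter on crash, as containerd's own unit does.
Restart=on-failure

[Install]
# WantedBy: containerd keeps running if nydus fails;
# RequiredBy would take containerd down with it.
WantedBy=containerd.service
```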

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 10:52:29 +02:00
Amulyam24
bf74f683d7 runtime-rs: align memory size with desired block size on ppc64le
When the default max memory was assigned according to the available
host memory, QEMU failed to start:

couldn't initialise QMP: Connection reset by peer (os error 104)
Caused by:
    Connection reset by peer (os error 104)

qemu stderr: "qemu-system-ppc64: Maximum memory size 0x80000000 is not aligned to 256 MiB"

Align the memory values with the block size of 256 MiB on ppc64le.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
dcb7d025c7 runtime-rs: Use libc::TUNSETIFF instead of wrapper TUNSETIFF()
While attaching the tap device, it fails on ppc64le with EBADF:

"cannot create tap device. File descriptor in bad state (os error 77)": unknown

Refactor the ioctl call to use the standard libc::TUNSETIFF constant.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
8d25ff2c36 runtime-rs: implement set_capabilities for qemu
After the qemu VM is booted, storing the guest details fails because
set_capabilities is not yet implemented for QEMU. This change adds a
default implementation for it.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Amulyam24
778524467b runtime-rs: enable building runtime-rs on ppc64le
Add Makefile changes to build runtime-rs on ppc64le with QEMU support.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-04-01 09:36:45 +01:00
Manuel Huber
177f5c308e tests: gpu: use container image layer storage
Use the container image layer storage feature for the
k8s-nvidia-nim.bats test pod manifests. This reduces the pods'
memory requirements.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:22:26 +02:00
Manuel Huber
b6cf00a374 tests: parametrize storage parameters
- trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and
  $PVC_STORAGE_REQUEST so that PV/PVC size can vary per test.
- confidential_common.sh: add optional size (MB) argument to
  create_loop_device.
- k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and
  PVC_STORAGE_REQUEST when generating storage config.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-01 10:22:26 +02:00
stevenhorsman
5390e470d3 agent: Remove Cargo.lock
Following on from #12690, the agent is part of the repo workspace, so it
no longer needs a lock file.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-04-01 09:11:28 +01:00
stevenhorsman
12578b41f2 govmm: Delete old files
The govmm workflow isn't run by us; it and the other CI files are just
legacy from when govmm was a separate repo. Let's clean up this debt
rather than having to update it frequently.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
stevenhorsman
b3179bdd8e workflows: Update actions/checkout version
Update the action to resolve the following warning in GHA:
> Node.js 20 actions are deprecated. The following actions are running
> on Node.js 20 and may not work as expected:
> actions/checkout@11bd71901b.
> Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
92 changed files with 1673 additions and 6218 deletions

View File

@@ -35,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -141,7 +141,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -195,7 +195,7 @@ jobs:
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -242,7 +242,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -269,6 +269,50 @@ jobs:
timeout-minutes: 15
run: bash tests/functional/vfio/gha-run.sh run
run-docker-tests:
name: run-docker-tests
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- qemu
runs-on: ubuntu-22.04
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
env:
GH_TOKEN: ${{ github.token }}
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run
run-nerdctl-tests:
name: run-nerdctl-tests
strategy:
@@ -287,7 +331,7 @@ jobs:
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -334,7 +378,7 @@ jobs:
name: run-kata-agent-apis
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -35,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -123,3 +123,44 @@ jobs:
- name: Run containerd-stability tests
timeout-minutes: 15
run: bash tests/stability/gha-run.sh run
run-docker-tests:
name: run-docker-tests
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm: ['qemu']
runs-on: s390x-large
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run

View File

@@ -72,7 +72,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -84,7 +84,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -75,7 +75,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -178,7 +178,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -272,7 +272,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -326,7 +326,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -391,7 +391,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -436,7 +436,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -64,7 +64,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -162,7 +162,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -253,7 +253,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -305,7 +305,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -51,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -109,7 +109,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -184,7 +184,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -240,7 +240,7 @@ jobs:
run: |
sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE"
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -38,7 +38,7 @@ jobs:
- kernel
- virtiofsd
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history

View File

@@ -58,7 +58,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -143,7 +143,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -196,7 +196,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Rebase atop of the latest target branch
@@ -270,7 +270,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -328,7 +328,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -28,7 +28,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -20,7 +20,7 @@ jobs:
steps:
- name: Checkout Code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Generate Action

View File

@@ -73,7 +73,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -182,7 +182,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -16,7 +16,7 @@ jobs:
name: ci
deployment: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -60,7 +60,7 @@ jobs:
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -27,7 +27,7 @@ jobs:
echo "$HOME/.local/bin" >> "${GITHUB_PATH}"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -19,7 +19,7 @@ jobs:
run: |
echo "GOPATH=${GITHUB_WORKSPACE}" >> "$GITHUB_ENV"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -42,7 +42,7 @@ jobs:
skip_test: ${{ steps.skipper.outputs.skip_test }}
skip_static: ${{ steps.skipper.outputs.skip_static }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -30,7 +30,7 @@ jobs:
issues: read
pull-requests: read
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Ensure nydus-snapshotter-version is in sync inside our repo

View File

@@ -145,7 +145,7 @@ jobs:
needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x, publish-kata-deploy-payload-ppc64le]
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -44,7 +44,7 @@ jobs:
packages: write
runs-on: ${{ inputs.runner }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -21,7 +21,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -50,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -50,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

View File

@@ -47,7 +47,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

@@ -51,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball

@@ -12,7 +12,7 @@ jobs:
contents: write # needed for the `gh release create` command
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -87,7 +87,7 @@ jobs:
packages: write # needed to push the multi-arch manifest to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -124,7 +124,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -206,7 +206,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -224,7 +224,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -242,7 +242,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -261,7 +261,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -298,7 +298,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

@@ -41,7 +41,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ inputs.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -73,7 +73,7 @@ jobs:
GENPOLICY_PULL_METHOD: ${{ matrix.genpolicy-pull-method }}
RUNS_ON_AKS: "true"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -46,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: all
TARGET_ARCH: "aarch64"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -63,7 +63,7 @@ jobs:
CONTAINER_ENGINE_VERSION: ${{ matrix.environment.containerd_version }}
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -124,4 +124,3 @@ jobs:
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup

@@ -53,7 +53,7 @@ jobs:
USE_EXPERIMENTAL_SNAPSHOTTER_SETUP: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -45,7 +45,7 @@ jobs:
KUBERNETES: ${{ matrix.k8s }}
TARGET_ARCH: "ppc64le"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -78,7 +78,7 @@ jobs:
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -74,7 +74,7 @@ jobs:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -71,7 +71,7 @@ jobs:
GH_ITA_KEY: ${{ secrets.ITA_KEY }}
AUTO_GENERATE_POLICY: "yes"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -169,7 +169,7 @@ jobs:
CONTAINER_ENGINE_VERSION: "active"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -292,7 +292,7 @@ jobs:
K8S_TEST_HOST_TYPE: "all"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -408,7 +408,7 @@ jobs:
AUTO_GENERATE_POLICY: "no"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -62,7 +62,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -46,7 +46,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -40,7 +40,7 @@ jobs:
#CONTAINERD_VERSION: ${{ matrix.containerd_version }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -46,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: "baremetal"
KUBERNETES: kubeadm
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

@@ -27,7 +27,7 @@ jobs:
steps:
- name: "Checkout code"
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

@@ -22,7 +22,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

@@ -23,7 +23,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

@@ -28,7 +28,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -71,7 +71,7 @@ jobs:
component-path: src/dragonball
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -115,7 +115,7 @@ jobs:
packages: write # for push to ghcr.io
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
contents: read # for checkout
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

@@ -37,6 +37,23 @@ oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccount
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
```
The e2e suite uses a combination of built-in (origin) and external tests. External
tests include Kubernetes upstream conformance tests from the `hyperkube` image.
To enable external tests, export a variable matching your cluster version:
```bash
export EXTENSIONS_PAYLOAD_OVERRIDE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
# Optional: limit to hyperkube only (k8s conformance tests, avoids downloading all operator extensions)
export EXTENSION_BINARY_OVERRIDE_INCLUDE_TAGS="hyperkube"
```
Alternatively, skip external tests entirely (only OpenShift-specific tests from origin):
```bash
export OPENSHIFT_SKIP_EXTERNAL_TESTS=1
```
Now you should be ready to run openshift-tests. Our CI only runs a subset
of tests; for the current ``TEST_SKIPS``, see
[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).

@@ -39,6 +39,21 @@ git_sparse_clone() {
git checkout FETCH_HEAD
}
#######################
# Install prerequisites
#######################
if ! command -v helm &>/dev/null; then
echo "Helm not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | HELM_INSTALL_DIR='.' bash -s -- --no-sudo
fi
if ! command -v yq &>/dev/null; then
echo "yq not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o ./yq
chmod +x yq
fi
###############################
# Disable security to allow e2e
###############################
@@ -83,7 +98,6 @@ AZURE_REGION=$(az group show --resource-group "${AZURE_RESOURCE_GROUP}" --query
# Create workload identity
AZURE_WORKLOAD_IDENTITY_NAME="caa-${AZURE_CLIENT_ID}"
az identity create --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --resource-group "${AZURE_RESOURCE_GROUP}" --location "${AZURE_REGION}"
USER_ASSIGNED_CLIENT_ID="$(az identity show --resource-group "${AZURE_RESOURCE_GROUP}" --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --query 'clientId' -otsv)"
#############################
@@ -184,84 +198,36 @@ echo "CAA_IMAGE=\"${CAA_IMAGE}\""
echo "CAA_TAG=\"${CAA_TAG}\""
echo "PP_IMAGE_ID=\"${PP_IMAGE_ID}\""
# Install cert-manager (prerequisite)
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager --namespace cert-manager --create-namespace --set crds.enabled=true
# Clone and configure caa
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/"
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/charts/" "src/peerpod-ctrl/chart" "src/webhook/chart"
echo "CAA_GIT_SHA=\"$(git rev-parse HEAD)\""
pushd src/cloud-api-adaptor
cat <<EOF > install/overlays/azure/workload-identity.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cloud-api-adaptor-daemonset
namespace: confidential-containers-system
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true"
---
pushd src/cloud-api-adaptor/install/charts/peerpods
# Use the latest kata-deploy
yq -i '( .dependencies[] | select(.name == "kata-deploy") ) .version = "0.0.0-dev"' Chart.yaml
helm dependency update .
# Create secrets
kubectl apply -f - << EOF
apiVersion: v1
kind: ServiceAccount
kind: Namespace
metadata:
name: cloud-api-adaptor
namespace: confidential-containers-system
annotations:
azure.workload.identity/client-id: "${USER_ASSIGNED_CLIENT_ID}"
name: confidential-containers-system
labels:
app.kubernetes.io/managed-by: Helm
annotations:
meta.helm.sh/release-name: peerpods
meta.helm.sh/release-namespace: confidential-containers-system
EOF
PP_INSTANCE_SIZE="Standard_D2as_v5"
DISABLECVM="true"
cat <<EOF > install/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../yamls
images:
- name: cloud-api-adaptor
newName: "${CAA_IMAGE}"
newTag: "${CAA_TAG}"
generatorOptions:
disableNameSuffixHash: true
configMapGenerator:
- name: peer-pods-cm
namespace: confidential-containers-system
literals:
- CLOUD_PROVIDER="azure"
- AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
- AZURE_REGION="${PP_REGION}"
- AZURE_INSTANCE_SIZE="${PP_INSTANCE_SIZE}"
- AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}"
- AZURE_SUBNET_ID="${PP_SUBNET_ID}"
- AZURE_IMAGE_ID="${PP_IMAGE_ID}"
- DISABLECVM="${DISABLECVM}"
- PEERPODS_LIMIT_PER_NODE="50"
secretGenerator:
- name: peer-pods-secret
namespace: confidential-containers-system
envs:
- service-principal.env
- name: ssh-key-secret
namespace: confidential-containers-system
files:
- id_rsa.pub
patchesStrategicMerge:
- workload-identity.yaml
EOF
ssh-keygen -t rsa -f install/overlays/azure/id_rsa -N ''
echo "AZURE_CLIENT_ID=${AZURE_CLIENT_ID}" > install/overlays/azure/service-principal.env
echo "AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}" >> install/overlays/azure/service-principal.env
echo "AZURE_TENANT_ID=${AZURE_TENANT_ID}" >> install/overlays/azure/service-principal.env
# Deploy Operator
git_sparse_clone "https://github.com/confidential-containers/operator" "${OPERATOR_SHA:-main}" "config/"
echo "OPERATOR_SHA=\"$(git rev-parse HEAD)\""
oc apply -k "config/release"
oc apply -k "config/samples/ccruntime/peer-pods"
popd
# Deploy CAA
kubectl apply -k "install/overlays/azure"
popd
popd
kubectl create secret generic my-provider-creds \
-n confidential-containers-system \
--from-literal=AZURE_CLIENT_ID="$AZURE_CLIENT_ID" \
--from-literal=AZURE_CLIENT_SECRET="$AZURE_CLIENT_SECRET" \
--from-literal=AZURE_TENANT_ID="$AZURE_TENANT_ID"
helm install peerpods . -f providers/azure.yaml --set secrets.mode=reference --set secrets.existingSecretName=my-provider-creds --set providerConfigs.azure.AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}" --set providerConfigs.azure.AZURE_REGION="${PP_REGION}" --set providerConfigs.azure.AZURE_INSTANCE_SIZE="Standard_D2as_v5" --set providerConfigs.azure.AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}" --set providerConfigs.azure.AZURE_SUBNET_ID="${PP_SUBNET_ID}" --set providerConfigs.azure.AZURE_IMAGE_ID="${PP_IMAGE_ID}" --set providerConfigs.azure.DISABLECVM="true" --set providerConfigs.azure.PEERPODS_LIMIT_PER_NODE="50" --set kata-deploy.snapshotter.setup= --dependency-update -n confidential-containers-system --create-namespace --wait
popd # charts
popd # git_sparse_clone CAA
# Wait for runtimeclass
SECONDS=0

src/agent/Cargo.lock (generated): 5687 lines

File diff suppressed because it is too large

@@ -41,6 +41,7 @@ sysctl = "0.7.1"
tempfile = "3.19.1"
test-utils = { path = "../test-utils" }
nix = "0.26.4"
rstest = "0.18"
[features]
default = []

@@ -1077,6 +1077,81 @@ impl MemoryInfo {
if self.default_maxmemory == 0 || u64::from(self.default_maxmemory) > host_memory {
self.default_maxmemory = host_memory as u32;
}
// Apply PowerPC64 memory alignment
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
self.adjust_ppc64_memory_alignment()?;
Ok(())
}
/// Adjusts memory values for PowerPC64 little-endian systems to meet
/// QEMU's 256MB block size alignment requirement.
///
/// Ensures default_memory is at least 1024MB and both default_memory
/// and default_maxmemory are aligned to 256MB boundaries.
/// Returns an error if aligned values would be equal.
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn adjust_ppc64_memory_alignment(&mut self) -> Result<()> {
const PPC64_MEM_BLOCK_SIZE: u64 = 256;
const MIN_MEMORY_MB: u64 = 1024;
fn align_memory(value: u64) -> u64 {
(value / PPC64_MEM_BLOCK_SIZE) * PPC64_MEM_BLOCK_SIZE
}
let mut mem_size = u64::from(self.default_memory);
let max_mem_size = u64::from(self.default_maxmemory);
// Ensure minimum memory size
if mem_size < MIN_MEMORY_MB {
info!(
sl!(),
"PowerPC: Increasing default_memory from {}MB to minimum {}MB",
mem_size,
MIN_MEMORY_MB
);
mem_size = MIN_MEMORY_MB;
}
// Align both values to 256MB boundaries
let aligned_mem = align_memory(mem_size);
let aligned_max_mem = align_memory(max_mem_size);
if aligned_mem != mem_size {
info!(
sl!(),
"PowerPC: Aligned default_memory from {}MB to {}MB", mem_size, aligned_mem
);
}
if aligned_max_mem != max_mem_size {
info!(
sl!(),
"PowerPC: Aligned default_maxmemory from {}MB to {}MB",
max_mem_size,
aligned_max_mem
);
}
// Check if aligned values are equal
if aligned_max_mem != 0 && aligned_max_mem <= aligned_mem {
return Err(std::io::Error::other(format!(
"PowerPC: default_maxmemory ({}MB) <= default_memory ({}MB) after alignment. \
Requires maxmemory > memory. Please increase default_maxmemory.",
aligned_max_mem, aligned_mem
)));
}
info!(
sl!(),
"PowerPC: Memory alignment applied - memory: {}MB, max_memory: {}MB",
aligned_mem,
aligned_max_mem
);
self.default_memory = aligned_mem as u32;
self.default_maxmemory = aligned_max_mem as u32;
Ok(())
}
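The alignment arithmetic above is easy to check by hand. The following Go sketch is illustrative only (the real implementation is the Rust code in this diff); it restates the same 256 MB block size and 1024 MB minimum, and reproduces the values used in the rstest cases below:

```go
package main

import "fmt"

// Illustrative restatement of the ppc64le alignment rules:
// raise to a 1024 MB minimum, then round down to a 256 MB boundary.
const (
	ppc64MemBlockSizeMB = 256
	minMemoryMB         = 1024
)

func alignMemory(v uint64) uint64 {
	return (v / ppc64MemBlockSizeMB) * ppc64MemBlockSizeMB
}

func main() {
	// 512 MB is below the minimum, so it is raised to 1024 MB first.
	mem := uint64(512)
	if mem < minMemoryMB {
		mem = minMemoryMB
	}
	fmt.Println(alignMemory(mem)) // 1024

	// Unaligned values round down to the nearest 256 MB boundary,
	// matching the test cases (1100 -> 1024, 2100 -> 2048).
	fmt.Println(alignMemory(1100)) // 1024
	fmt.Println(alignMemory(2100)) // 2048
}
```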
@@ -1948,4 +2023,72 @@ mod tests {
);
}
}
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
use rstest::rstest;
#[rstest]
#[case::memory_below_minimum(512, 2048, 1024, 2048)]
#[case::already_aligned(1024, 2048, 1024, 2048)]
#[case::unaligned_rounds_down(1100, 2100, 1024, 2048)]
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn test_adjust_ppc64_memory_alignment_success(
#[case] input_memory: u32,
#[case] input_maxmemory: u32,
#[case] expected_memory: u32,
#[case] expected_maxmemory: u32,
) {
let mut mem = MemoryInfo {
default_memory: input_memory,
default_maxmemory: input_maxmemory,
..Default::default()
};
let result = mem.adjust_ppc64_memory_alignment();
assert!(
result.is_ok(),
"Expected success but got error: {:?}",
result.err()
);
assert_eq!(
mem.default_memory, expected_memory,
"Memory not aligned correctly"
);
assert_eq!(
mem.default_maxmemory, expected_maxmemory,
"Max memory not aligned correctly"
);
}
#[rstest]
#[case::equal_after_alignment(1024, 1100, "Requires maxmemory > memory")]
#[case::maxmemory_less_than_memory(2048, 1500, "Requires maxmemory > memory")]
#[cfg(all(target_arch = "powerpc64", target_endian = "little"))]
fn test_adjust_ppc64_memory_alignment_errors(
#[case] input_memory: u32,
#[case] input_maxmemory: u32,
#[case] expected_error_msg: &str,
) {
let mut mem = MemoryInfo {
default_memory: input_memory,
default_maxmemory: input_maxmemory,
..Default::default()
};
let result = mem.adjust_ppc64_memory_alignment();
assert!(
result.is_err(),
"Expected error but got success for memory={}, maxmemory={}",
input_memory,
input_maxmemory
);
let error_msg = result.unwrap_err().to_string();
assert!(
error_msg.contains(expected_error_msg),
"Error message '{}' does not contain expected text '{}'",
error_msg,
expected_error_msg
);
}
}

@@ -33,15 +33,11 @@ test:
exit 0
install: install-runtime install-configs
else ifeq ($(ARCH), powerpc64le)
default:
@echo "PowerPC 64 LE is not currently supported"
exit 0
default: runtime show-header
test:
@echo "PowerPC 64 LE is not currently supported"
exit 0
install:
@echo "PowerPC 64 LE is not currently supported"
@echo "powerpc64le is not currently supported"
exit 0
install: install-runtime install-configs
else ifeq ($(ARCH), riscv64gc)
default: runtime show-header
test:

@@ -7,9 +7,6 @@
MACHINETYPE := pseries
KERNELPARAMS := cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1
MACHINEACCELERATORS := "cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-large-decr=off,cap-ccf-assist=off"
CPUFEATURES := pmu=off
CPUFEATURES :=
QEMUCMD := qemu-system-ppc64
# dragonball binary name
DBCMD := dragonball

@@ -147,7 +147,7 @@ impl Tap {
// ioctl is safe since we call it with a valid tap fd and check the return
// value.
let ret = unsafe { ioctl_with_mut_ref(&tuntap, TUNSETIFF(), ifr) };
let ret = unsafe { ioctl_with_mut_ref(&tuntap, libc::TUNSETIFF as libc::c_ulong, ifr) };
if ret < 0 {
return Err(Error::CreateTap(IoError::last_os_error()));
}

@@ -617,8 +617,10 @@ impl QemuInner {
todo!()
}
pub(crate) fn set_capabilities(&mut self, _flag: CapabilityBits) {
todo!()
pub(crate) fn set_capabilities(&mut self, flag: CapabilityBits) {
let mut caps = Capabilities::default();
caps.set(flag)
}
pub(crate) fn set_guest_memory_block_size(&mut self, size: u32) {

@@ -9,9 +9,9 @@ import (
"context"
"fmt"
"github.com/containerd/containerd/api/types/task"
"github.com/sirupsen/logrus"
"github.com/containerd/containerd/api/types/task"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
)
@@ -46,6 +46,19 @@ func startContainer(ctx context.Context, s *service, c *container) (retErr error
}
go watchSandbox(ctx, s)
// If no network endpoints were discovered during sandbox creation,
// schedule an async rescan. This handles runtimes that configure
// networking after task creation (e.g. Docker 26+ configures
// networking after the Start response, and prestart hooks may
// not have run yet on slower architectures).
// RescanNetwork is idempotent — it returns immediately if
// endpoints already exist.
go func() {
if err := s.sandbox.RescanNetwork(s.ctx); err != nil {
shimLog.WithError(err).Error("async network rescan failed — container may lack networking")
}
}()
// We use s.ctx (`ctx` is derived from `s.ctx`) to check for cancellation
// of the shim context and the context passed to startContainer for tracing.
go watchOOMEvents(ctx, s)

@@ -1,30 +0,0 @@
on: ["pull_request"]
name: Unit tests
permissions:
contents: read
jobs:
test:
name: test
strategy:
matrix:
go-version: [1.15.x, 1.16.x]
os: [ubuntu-22.04]
runs-on: ${{ matrix.os }}
steps:
- name: Install Go
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5.6.0
with:
go-version: ${{ matrix.go-version }}
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: golangci-lint
uses: golangci/golangci-lint-action@4696ba8babb6127d732c3c6dde519db15edab9ea # v6.5.1
with:
version: latest
args: -c .golangci.yml -v
- name: go test
run: go test ./...

@@ -1 +0,0 @@
*~

@@ -1,35 +0,0 @@
# Copyright (c) 2021 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
run:
concurrency: 4
deadline: 600s
skip-dirs:
- vendor
# Ignore auto-generated protobuf code.
skip-files:
- ".*\\.pb\\.go$"
linters:
disable-all: true
enable:
- deadcode
- gocyclo
- gofmt
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- structcheck
- typecheck
- unconvert
- unused
- varcheck
linters-settings:
gocyclo:
min_complexity: 15
unused:
check-exported: true

@@ -1,26 +0,0 @@
language: go
go:
- "1.10"
- "1.11"
- tip
arch:
- s390x
go_import_path: github.com/kata-containers/govmm
matrix:
allow_failures:
- go: tip
before_install:
- go get github.com/alecthomas/gometalinter
- gometalinter --install
- go get github.com/mattn/goveralls
script:
- go env
- gometalinter --tests --vendor --disable-all --enable=misspell --enable=vet --enable=ineffassign --enable=gofmt --enable=gocyclo --cyclo-over=15 --enable=golint --enable=errcheck --enable=deadcode --enable=staticcheck -enable=gas ./...
after_success:
- $GOPATH/bin/goveralls -repotoken $COVERALLS_TOKEN -v -service=travis-ci

@@ -19,6 +19,7 @@ import (
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
vf "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/factory"
vcAnnotations "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/annotations"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
specs "github.com/opencontainers/runtime-spec/specs-go"
)
@@ -140,6 +141,17 @@ func CreateSandbox(ctx context.Context, vci vc.VC, ociSpec specs.Spec, runtimeCo
sandboxConfig.Containers[0].RootFs = rootFs
}
// Docker 26+ may set up networking before task creation instead of using
// prestart hooks. The netns path is not in the OCI spec but can be
// discovered from Docker's libnetwork hook args which contain the sandbox
// ID that maps to /var/run/docker/netns/<sandbox_id>.
if sandboxConfig.NetworkConfig.NetworkID == "" && !sandboxConfig.NetworkConfig.DisableNewNetwork {
if dockerNetns := utils.DockerNetnsPath(&ociSpec); dockerNetns != "" {
sandboxConfig.NetworkConfig.NetworkID = dockerNetns
kataUtilsLogger.WithField("netns", dockerNetns).Info("discovered Docker network namespace from hook args")
}
}
// Important to create the network namespace before the sandbox is
// created, because it is not responsible for the creation of the
// netns if it does not exist.
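The mapping described in the comment above can be sketched in a few lines. The positive regex `^[0-9a-f]{64}$` comes from the commit log ("Replace sandbox ID denylist with positive regex"); the helper name below is hypothetical and is not the actual `utils.DockerNetnsPath` implementation:

```go
package main

import (
	"fmt"
	"regexp"
)

// Docker's libnetwork sandbox IDs are 64 lowercase hex characters;
// validating with a positive regex avoids a fragile denylist.
var sandboxIDRe = regexp.MustCompile(`^[0-9a-f]{64}$`)

// netnsPathFromSandboxID is a hypothetical sketch of mapping a hook-arg
// sandbox ID to Docker's netns path; it returns "" for non-matches so
// the netns simply stays undiscovered.
func netnsPathFromSandboxID(id string) string {
	if !sandboxIDRe.MatchString(id) {
		return ""
	}
	return "/var/run/docker/netns/" + id
}

func main() {
	id := "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
	fmt.Println(netnsPathFromSandboxID(id))
	fmt.Println(netnsPathFromSandboxID("libnet-sbox") == "") // true
}
```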

@@ -20,7 +20,9 @@ import (
"golang.org/x/sys/unix"
)
const procMountInfoFile = "/proc/self/mountinfo"
const (
procMountInfoFile = "/proc/self/mountinfo"
)
// EnterNetNS is free from any call to a go routine, and it calls
// into runtime.LockOSThread(), meaning it won't be executed in a
@@ -29,27 +31,30 @@ func EnterNetNS(networkID string, cb func() error) error {
return vc.EnterNetNS(networkID, cb)
}
// SetupNetworkNamespace create a network namespace
// SetupNetworkNamespace creates a network namespace if one is not already
// provided via NetworkID. When NetworkID is empty and networking is not
// disabled, a new namespace is created as a placeholder; the actual
// hypervisor namespace will be discovered later by addAllEndpoints after
// the VM has started.
func SetupNetworkNamespace(config *vc.NetworkConfig) error {
if config.DisableNewNetwork {
kataUtilsLogger.Info("DisableNewNetNs is on, shim and hypervisor are running in the host netns")
return nil
}
var err error
var n ns.NetNS
if config.NetworkID == "" {
var (
err error
n ns.NetNS
)
if rootless.IsRootless() {
n, err = rootless.NewNS()
if err != nil {
return err
}
} else {
n, err = testutils.NewNS()
if err != nil {
return err
}
}
if err != nil {
return err
}
config.NetworkID = n.Path()
@@ -71,11 +76,23 @@ func SetupNetworkNamespace(config *vc.NetworkConfig) error {
}
const (
netNsMountType = "nsfs"
mountTypeFieldIdx = 8
mountDestIdx = 4
netNsMountType = "nsfs"
mountDestIdx = 4
)
// mountinfoFsType finds the filesystem type in a parsed mountinfo line.
// The mountinfo format has optional tagged fields (shared:, master:, etc.)
// between field 7 and a "-" separator. The fs type is the field immediately
// after "-". Returns "" if the separator is not found.
func mountinfoFsType(fields []string) string {
for i, f := range fields {
if f == "-" && i+1 < len(fields) {
return fields[i+1]
}
}
return ""
}
// getNetNsFromBindMount returns the network namespace for the bind-mounted path
func getNetNsFromBindMount(nsPath string, procMountFile string) (string, error) {
// Resolve all symlinks in the path as the mountinfo file contains
@@ -100,16 +117,15 @@ func getNetNsFromBindMount(nsPath string, procMountFile string) (string, error)
// "711 26 0:3 net:[4026532009] /run/docker/netns/default rw shared:535 - nsfs nsfs rw"
//
// Reference: https://www.kernel.org/doc/Documentation/filesystems/proc.txt
// We are interested in the first 9 fields of this file,
// to check for the correct mount type.
// The "-" separator has a variable position due to optional tagged
// fields, so we locate the fs type dynamically.
fields := strings.Split(text, " ")
if len(fields) < 9 {
continue
}
// We check here if the mount type is a network namespace mount type, namely "nsfs"
if fields[mountTypeFieldIdx] != netNsMountType {
if mountinfoFsType(fields) != netNsMountType {
continue
}

@@ -149,3 +149,23 @@ func TestSetupNetworkNamespace(t *testing.T) {
err = SetupNetworkNamespace(config)
assert.NoError(err)
}
func TestMountinfoFsType(t *testing.T) {
assert := assert.New(t)
// Standard mountinfo line with optional tagged fields
fields := []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw", "shared:535", "-", "nsfs", "nsfs", "rw"}
assert.Equal("nsfs", mountinfoFsType(fields))
// Multiple optional tags before separator
fields = []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw", "shared:535", "master:1", "-", "nsfs", "nsfs", "rw"}
assert.Equal("nsfs", mountinfoFsType(fields))
// No separator
fields = []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw"}
assert.Equal("", mountinfoFsType(fields))
// Separator at end (malformed)
fields = []string{"711", "26", "-"}
assert.Equal("", mountinfoFsType(fields))
}

@@ -72,6 +72,8 @@ type VCSandbox interface {
GetOOMEvent(ctx context.Context) (string, error)
GetHypervisorPid() (int, error)
// RescanNetwork re-scans the network namespace for late-discovered endpoints.
RescanNetwork(ctx context.Context) error
UpdateRuntimeMetrics() error
GetAgentMetrics(ctx context.Context) (string, error)

@@ -17,9 +17,11 @@ import (
"runtime"
"sort"
"strconv"
"strings"
"time"
"github.com/containernetworking/plugins/pkg/ns"
"github.com/sirupsen/logrus"
"github.com/vishvananda/netlink"
"github.com/vishvananda/netns"
otelTrace "go.opentelemetry.io/otel/trace"
@@ -45,6 +47,11 @@ type LinuxNetwork struct {
interworkingModel NetInterworkingModel
netNSCreated bool
danConfigPath string
// placeholderNetNS holds the path to a placeholder network namespace
// that we created but later abandoned in favour of the hypervisor's
// netns. If best-effort deletion in addAllEndpoints fails, teardown
// retries the cleanup via RemoveEndpoints.
placeholderNetNS string
}
// NewNetwork creates a new Linux Network from a NetworkConfig.
@@ -68,11 +75,11 @@ func NewNetwork(configs ...*NetworkConfig) (Network, error) {
}
return &LinuxNetwork{
config.NetworkID,
[]Endpoint{},
config.InterworkingModel,
config.NetworkCreated,
config.DanConfigPath,
netNSPath: config.NetworkID,
eps: []Endpoint{},
interworkingModel: config.InterworkingModel,
netNSCreated: config.NetworkCreated,
danConfigPath: config.DanConfigPath,
}, nil
}
@@ -325,28 +332,91 @@ func (n *LinuxNetwork) GetEndpointsNum() (int, error) {
// Scan the networking namespace through netlink and then:
// 1. Create the endpoints for the relevant interfaces found there.
// 2. Attach them to the VM.
//
// If no usable interfaces are found and the hypervisor is running in a
// different network namespace (e.g. Docker 26+ places QEMU in its own
// pre-configured namespace), switch to the hypervisor's namespace and
// rescan there. This handles the case where the OCI spec does not
// communicate the network namespace path.
func (n *LinuxNetwork) addAllEndpoints(ctx context.Context, s *Sandbox, hotplug bool) error {
netnsHandle, err := netns.GetFromPath(n.netNSPath)
endpoints, err := n.scanEndpointsInNs(ctx, s, n.netNSPath, hotplug)
if err != nil {
return err
}
// If the scan found no usable endpoints, check whether the
// hypervisor is running in a different namespace and retry there.
if len(endpoints) == 0 && s != nil {
if hypervisorNs, ok := n.detectHypervisorNetns(s); ok {
networkLogger().WithFields(logrus.Fields{
"original_netns": n.netNSPath,
"hypervisor_netns": hypervisorNs,
}).Debug("no endpoints in original netns, switching to hypervisor netns")
origPath := n.netNSPath
origCreated := n.netNSCreated
n.netNSPath = hypervisorNs
_, err = n.scanEndpointsInNs(ctx, s, n.netNSPath, hotplug)
if err != nil {
n.netNSPath = origPath
n.netNSCreated = origCreated
return err
}
// Clean up the placeholder namespace we created — we're now
// using the hypervisor's namespace and the placeholder is empty.
// Only clear netNSCreated once deletion succeeds; on failure,
// stash the path so RemoveEndpoints can retry during teardown.
if origCreated {
if delErr := deleteNetNS(origPath); delErr != nil {
networkLogger().WithField("netns", origPath).WithError(delErr).Warn("failed to delete placeholder netns, will retry during teardown")
n.placeholderNetNS = origPath
}
}
// The hypervisor's namespace was not created by us.
n.netNSCreated = false
}
}
sort.Slice(n.eps, func(i, j int) bool {
return n.eps[i].Name() < n.eps[j].Name()
})
networkLogger().WithField("endpoints", n.eps).Info("endpoints found after scan")
return nil
}
// scanEndpointsInNs scans a network namespace for usable (non-loopback,
// configured) interfaces and adds them as endpoints. Returns the list of
// newly added endpoints.
func (n *LinuxNetwork) scanEndpointsInNs(ctx context.Context, s *Sandbox, nsPath string, hotplug bool) ([]Endpoint, error) {
netnsHandle, err := netns.GetFromPath(nsPath)
if err != nil {
return nil, err
}
defer netnsHandle.Close()
netlinkHandle, err := netlink.NewHandleAt(netnsHandle)
if err != nil {
return err
return nil, err
}
defer netlinkHandle.Close()
linkList, err := netlinkHandle.LinkList()
if err != nil {
return err
return nil, err
}
epsBefore := len(n.eps)
var added []Endpoint
for _, link := range linkList {
netInfo, err := networkInfoFromLink(netlinkHandle, link)
if err != nil {
return err
// Rollback endpoints added by earlier loop iterations so a
// failed scan does not leave partial side effects.
n.eps = n.eps[:epsBefore]
return nil, err
}
// Ignore unconfigured network interfaces. These are
@@ -368,22 +438,62 @@ func (n *LinuxNetwork) addAllEndpoints(ctx context.Context, s *Sandbox, hotplug
continue
}
if err := doNetNS(n.netNSPath, func(_ ns.NetNS) error {
_, err = n.addSingleEndpoint(ctx, s, netInfo, hotplug)
return err
if err := doNetNS(nsPath, func(_ ns.NetNS) error {
ep, addErr := n.addSingleEndpoint(ctx, s, netInfo, hotplug)
if addErr == nil {
added = append(added, ep)
}
return addErr
}); err != nil {
return err
// Rollback: remove any endpoints added during this scan
// so that a failed scan does not leave partial side effects.
n.eps = n.eps[:epsBefore]
return nil, err
}
}
sort.Slice(n.eps, func(i, j int) bool {
return n.eps[i].Name() < n.eps[j].Name()
})
return added, nil
}
// detectHypervisorNetns checks whether the hypervisor process is running in a
network namespace different from the one we are currently tracking. If so, it
// returns the procfs path to the hypervisor's netns and true.
func (n *LinuxNetwork) detectHypervisorNetns(s *Sandbox) (string, bool) {
pid, err := s.GetHypervisorPid()
if err != nil || pid <= 0 {
return "", false
}
// Guard against PID recycling: verify the process belongs to this
// sandbox by checking its command line for the sandbox ID. QEMU is
// started with -name sandbox-<id>, so the ID will appear in cmdline.
// /proc/pid/cmdline uses null bytes as argument separators; replace
// them so the substring search works on the joined argument string.
cmdlineRaw, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
if err != nil {
return "", false
}
cmdline := strings.ReplaceAll(string(cmdlineRaw), "\x00", " ")
if !strings.Contains(cmdline, s.id) {
return "", false
}
hypervisorNs := fmt.Sprintf("/proc/%d/ns/net", pid)
// Compare device and inode numbers. Inode numbers are only unique
// within a device, so both must match to confirm the same namespace.
var currentStat, hvStat unix.Stat_t
if err := unix.Stat(n.netNSPath, &currentStat); err != nil {
return "", false
}
if err := unix.Stat(hypervisorNs, &hvStat); err != nil {
return "", false
}
if currentStat.Dev != hvStat.Dev || currentStat.Ino != hvStat.Ino {
return hypervisorNs, true
}
return "", false
}
func convertDanDeviceToNetworkInfo(device *vctypes.DanDevice) (*NetworkInfo, error) {
@@ -571,6 +681,17 @@ func (n *LinuxNetwork) RemoveEndpoints(ctx context.Context, s *Sandbox, endpoint
return deleteNetNS(n.netNSPath)
}
// Retry cleanup of a placeholder namespace whose earlier deletion
// failed in addAllEndpoints.
if n.placeholderNetNS != "" && endpoints == nil {
if delErr := deleteNetNS(n.placeholderNetNS); delErr != nil {
networkLogger().WithField("netns", n.placeholderNetNS).WithError(delErr).Warn("failed to delete placeholder netns during teardown")
} else {
networkLogger().WithField("netns", n.placeholderNetNS).Info("placeholder network namespace deleted")
n.placeholderNetNS = ""
}
}
return nil
}

View File

@@ -363,11 +363,11 @@ func TestConvertDanDeviceToNetworkInfo(t *testing.T) {
func TestAddEndpoints_Dan(t *testing.T) {
network := &LinuxNetwork{
"net-123",
[]Endpoint{},
NetXConnectDefaultModel,
true,
"testdata/dan-config.json",
netNSPath: "net-123",
eps: []Endpoint{},
interworkingModel: NetXConnectDefaultModel,
netNSCreated: true,
danConfigPath: "testdata/dan-config.json",
}
ctx := context.TODO()

View File

@@ -255,6 +255,10 @@ func (s *Sandbox) GetHypervisorPid() (int, error) {
return 0, nil
}
func (s *Sandbox) RescanNetwork(ctx context.Context) error {
return nil
}
func (s *Sandbox) GuestVolumeStats(ctx context.Context, path string) ([]byte, error) {
return nil, nil
}

View File

@@ -20,6 +20,7 @@ import (
"strings"
"sync"
"syscall"
"time"
v1 "github.com/containerd/cgroups/stats/v1"
v2 "github.com/containerd/cgroups/v2/stats"
@@ -330,6 +331,81 @@ func (s *Sandbox) GetHypervisorPid() (int, error) {
return pids[0], nil
}
// RescanNetwork re-scans the network namespace for endpoints if none have
// been discovered yet. This is idempotent: if endpoints already exist it
// returns immediately. It enables Docker 26+ support where networking is
// configured after task creation but before Start.
//
// Docker 26+ configures networking (veth pair, IP addresses) between
// Create and Start. The interfaces may not be present immediately, so
// this method polls until they appear or a timeout is reached.
//
// When new endpoints are found, the guest agent is informed about the
// interfaces and routes so that networking becomes functional inside the VM.
func (s *Sandbox) RescanNetwork(ctx context.Context) error {
if s.config.NetworkConfig.DisableNewNetwork {
return nil
}
if len(s.network.Endpoints()) > 0 {
return nil
}
const maxWait = 5 * time.Second
const pollInterval = 50 * time.Millisecond
deadline := time.NewTimer(maxWait)
defer deadline.Stop()
ticker := time.NewTicker(pollInterval)
defer ticker.Stop()
s.Logger().Debug("waiting for network interfaces in namespace")
for {
if _, err := s.network.AddEndpoints(ctx, s, nil, true); err != nil {
return err
}
if len(s.network.Endpoints()) > 0 {
return s.configureGuestNetwork(ctx)
}
select {
case <-ctx.Done():
return ctx.Err()
case <-deadline.C:
s.Logger().Warn("no network interfaces found after timeout — networking may be configured by prestart hooks")
return nil
case <-ticker.C:
}
}
}
// configureGuestNetwork informs the guest agent about discovered network
// endpoints so that interfaces and routes become functional inside the VM.
func (s *Sandbox) configureGuestNetwork(ctx context.Context) error {
endpoints := s.network.Endpoints()
s.Logger().WithField("endpoints", len(endpoints)).Info("configuring hotplugged network in guest")
// Note: ARP neighbors (3rd return value) are not propagated here
// because the agent interface only exposes per-entry updates. The
// full setupNetworks path in kataAgent handles them; this path is
// only reached for late-discovered endpoints where neighbor entries
// are populated dynamically by the kernel.
interfaces, routes, _, err := generateVCNetworkStructures(ctx, endpoints)
if err != nil {
return fmt.Errorf("generating network structures: %w", err)
}
for _, ifc := range interfaces {
if _, err := s.agent.updateInterface(ctx, ifc); err != nil {
return fmt.Errorf("updating interface %s in guest: %w", ifc.Name, err)
}
}
if len(routes) > 0 {
if _, err := s.agent.updateRoutes(ctx, routes); err != nil {
return fmt.Errorf("updating routes in guest: %w", err)
}
}
return nil
}
// GetAllContainers returns all containers.
func (s *Sandbox) GetAllContainers() []VCContainer {
ifa := make([]VCContainer, len(s.containers))

View File

@@ -12,6 +12,7 @@ import (
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"syscall"
"time"
@@ -493,17 +494,38 @@ func RevertBytes(num uint64) uint64 {
return 1024*RevertBytes(a) + b
}
// dockerLibnetworkSetkey is the hook argument that identifies Docker's
// network configuration hook. The argument following it is the sandbox ID.
const dockerLibnetworkSetkey = "libnetwork-setkey"
// dockerNetnsPrefixes are the well-known filesystem paths where the Docker
// daemon bind-mounts container network namespaces.
var dockerNetnsPrefixes = []string{"/var/run/docker/netns/", "/run/docker/netns/"}
// validSandboxID matches Docker sandbox IDs: exactly 64 lowercase hex characters.
var validSandboxID = regexp.MustCompile(`^[0-9a-f]{64}$`)
// IsDockerContainer returns if the container is managed by docker
// This is done by checking the prestart hook for `libnetwork` arguments.
// This is done by checking the prestart and createRuntime hooks for
// `libnetwork` arguments. Docker 26+ may use CreateRuntime hooks
// instead of the deprecated Prestart hooks.
func IsDockerContainer(spec *specs.Spec) bool {
if spec == nil || spec.Hooks == nil {
return false
}
for _, hook := range spec.Hooks.Prestart { //nolint:all
for _, arg := range hook.Args {
if strings.HasPrefix(arg, "libnetwork") {
return true
// Check both Prestart (Docker < 26) and CreateRuntime (Docker >= 26) hooks.
hookSets := [][]specs.Hook{
spec.Hooks.Prestart, //nolint:all
spec.Hooks.CreateRuntime,
}
for _, hooks := range hookSets {
for _, hook := range hooks {
for _, arg := range hook.Args {
if strings.HasPrefix(arg, "libnetwork") {
return true
}
}
}
}
@@ -511,6 +533,50 @@ func IsDockerContainer(spec *specs.Spec) bool {
return false
}
// DockerNetnsPath attempts to discover Docker's pre-created network namespace
// path from OCI spec hooks. Docker's libnetwork-setkey hook contains the
// sandbox ID as its second argument, which maps to the netns file under
// /var/run/docker/netns/<sandbox_id>.
func DockerNetnsPath(spec *specs.Spec) string {
if spec == nil || spec.Hooks == nil {
return ""
}
// Search both Prestart and CreateRuntime hooks for libnetwork-setkey.
hookSets := [][]specs.Hook{
spec.Hooks.Prestart, //nolint:all
spec.Hooks.CreateRuntime,
}
for _, hooks := range hookSets {
for _, hook := range hooks {
for i, arg := range hook.Args {
if arg == dockerLibnetworkSetkey && i+1 < len(hook.Args) {
sandboxID := hook.Args[i+1]
// Docker sandbox IDs are exactly 64 lowercase hex
// characters. Reject anything else to prevent path
// traversal and unexpected input.
if !validSandboxID.MatchString(sandboxID) {
continue
}
// Docker stores netns under well-known paths.
// Use Lstat to reject symlinks (which could point
// outside the Docker netns directory) and non-regular
// files such as directories.
for _, prefix := range dockerNetnsPrefixes {
nsPath := prefix + sandboxID
if fi, err := os.Lstat(nsPath); err == nil && fi.Mode().IsRegular() {
return nsPath
}
}
}
}
}
}
return ""
}
// GetGuestNUMANodes constructs guest NUMA nodes mapping to host NUMA nodes and host CPUs.
func GetGuestNUMANodes(numaMapping []string) ([]types.GuestNUMANode, error) {
// Add guest NUMA node for each specified subsets of host NUMA nodes.

View File

@@ -579,24 +579,178 @@ func TestRevertBytes(t *testing.T) {
assert.Equal(expectedNum, num)
}
// TestIsDockerContainer validates hook-detection logic in isolation.
// End-to-end Docker→containerd→kata integration is covered by
// external tests (see tests/integration/kubernetes/).
func TestIsDockerContainer(t *testing.T) {
assert := assert.New(t)
// nil spec
assert.False(IsDockerContainer(nil))
// nil hooks
assert.False(IsDockerContainer(&specs.Spec{}))
// Unrelated prestart hook
ociSpec := &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Args: []string{
"haha",
},
},
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"haha"}},
},
},
}
assert.False(IsDockerContainer(ociSpec))
// Prestart hook with libnetwork (Docker < 26)
ociSpec.Hooks.Prestart = append(ociSpec.Hooks.Prestart, specs.Hook{ //nolint:all
Args: []string{"libnetwork-xxx"},
})
assert.True(IsDockerContainer(ociSpec))
// CreateRuntime hook with libnetwork (Docker >= 26)
ociSpec2 := &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/usr/bin/docker-proxy", "libnetwork-setkey", "abc123", "ctrl"}},
},
},
}
assert.True(IsDockerContainer(ociSpec2))
// CreateRuntime hook without libnetwork
ociSpec3 := &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/some/other/hook"}},
},
},
}
assert.False(IsDockerContainer(ociSpec3))
}
// TestDockerNetnsPath validates netns path discovery from OCI hook args.
// This does not test the actual namespace opening or endpoint scanning;
// see integration tests for full-path coverage.
func TestDockerNetnsPath(t *testing.T) {
assert := assert.New(t)
// Valid 64-char hex sandbox IDs for test cases.
validID := strings.Repeat("ab", 32) // 64 hex chars
validID2 := strings.Repeat("cd", 32) // another 64 hex chars
invalidShortID := "abc123" // too short
invalidUpperID := strings.Repeat("AB", 32) // uppercase rejected
// nil spec
assert.Equal("", DockerNetnsPath(nil))
// nil hooks
assert.Equal("", DockerNetnsPath(&specs.Spec{}))
// Hook without libnetwork-setkey
spec := &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/some/binary", "unrelated"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey but sandbox ID too short (rejected by regex)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", invalidShortID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey but uppercase hex (rejected by regex)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", invalidUpperID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with valid sandbox ID but netns file doesn't exist on disk
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey and existing netns file — success path
tmpDir := t.TempDir()
fakeNsDir := filepath.Join(tmpDir, "netns")
err := os.MkdirAll(fakeNsDir, 0755)
assert.NoError(err)
fakeNsFile := filepath.Join(fakeNsDir, validID)
err = os.WriteFile(fakeNsFile, []byte{}, 0644)
assert.NoError(err)
// Temporarily override dockerNetnsPrefixes so DockerNetnsPath can find
// the netns file we created under the temp directory.
origPrefixes := dockerNetnsPrefixes
dockerNetnsPrefixes = []string{fakeNsDir + "/"}
defer func() { dockerNetnsPrefixes = origPrefixes }()
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID, "ctrl"}},
},
},
}
assert.Equal(fakeNsFile, DockerNetnsPath(spec))
// Sandbox ID that is a directory rather than a regular file — must be rejected
dirID := validID2
err = os.MkdirAll(filepath.Join(fakeNsDir, dirID), 0755)
assert.NoError(err)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", dirID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// CreateRuntime hook with valid sandbox ID — file doesn't exist
validID3 := strings.Repeat("ef", 32)
spec = &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID3, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Hook with libnetwork-setkey as last arg (no sandbox ID follows) — no panic
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"libnetwork-setkey"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Empty args slice
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
}

View File

@@ -1,124 +0,0 @@
# Copyright (c) K3s contributors
#
# SPDX-License-Identifier: Apache-2.0
#
{{- /* */ -}}
# File generated by {{ .Program }}. DO NOT EDIT. Use config-v3.toml.tmpl instead.
version = 3
imports = ["__CONTAINERD_IMPORTS_PATH__"]
root = {{ printf "%q" .NodeConfig.Containerd.Root }}
state = {{ printf "%q" .NodeConfig.Containerd.State }}
[grpc]
address = {{ deschemify .NodeConfig.Containerd.Address | printf "%q" }}
[plugins.'io.containerd.internal.v1.opt']
path = {{ printf "%q" .NodeConfig.Containerd.Opt }}
[plugins.'io.containerd.grpc.v1.cri']
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
[plugins.'io.containerd.cri.v1.runtime']
enable_selinux = {{ .NodeConfig.SELinux }}
enable_unprivileged_ports = {{ .EnableUnprivileged }}
enable_unprivileged_icmp = {{ .EnableUnprivileged }}
device_ownership_from_security_context = {{ .NonrootDevices }}
{{ if .DisableCgroup}}
disable_cgroup = true
{{ end }}
{{ if .IsRunningInUserNS }}
disable_apparmor = true
restrict_oom_score_adj = true
{{ end }}
{{ with .NodeConfig.AgentConfig.Snapshotter }}
[plugins.'io.containerd.cri.v1.images']
snapshotter = "{{ . }}"
disable_snapshot_annotations = {{ if eq . "stargz" }}false{{else}}true{{end}}
use_local_image_pull = true
{{ end }}
{{ with .NodeConfig.AgentConfig.PauseImage }}
[plugins.'io.containerd.cri.v1.images'.pinned_images]
sandbox = "{{ . }}"
{{ end }}
{{- if or .NodeConfig.AgentConfig.CNIBinDir .NodeConfig.AgentConfig.CNIConfDir }}
[plugins.'io.containerd.cri.v1.runtime'.cni]
{{ with .NodeConfig.AgentConfig.CNIBinDir }}bin_dirs = [{{ printf "%q" . }}]{{ end }}
{{ with .NodeConfig.AgentConfig.CNIConfDir }}conf_dir = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ if or .NodeConfig.Containerd.BlockIOConfig .NodeConfig.Containerd.RDTConfig }}
[plugins.'io.containerd.service.v1.tasks-service']
{{ with .NodeConfig.Containerd.BlockIOConfig }}blockio_config_file = {{ printf "%q" . }}{{ end }}
{{ with .NodeConfig.Containerd.RDTConfig }}rdt_config_file = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ with .NodeConfig.DefaultRuntime }}
[plugins.'io.containerd.cri.v1.runtime'.containerd]
default_runtime_name = "{{ . }}"
{{ end }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
SystemdCgroup = {{ .SystemdCgroup }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runhcs-wcow-process]
runtime_type = "io.containerd.runhcs.v1"
{{ range $k, $v := .ExtraRuntimes }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}']
runtime_type = "{{$v.RuntimeType}}"
{{ with $v.BinaryName}}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}'.options]
BinaryName = {{ printf "%q" . }}
SystemdCgroup = {{ $.SystemdCgroup }}
{{ end }}
{{ end }}
[plugins.'io.containerd.cri.v1.images'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.cri.v1.images'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ with .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins.'io.containerd.snapshotter.v1.stargz']
cri_keychain_image_service_path = {{ printf "%q" . }}
[plugins.'io.containerd.snapshotter.v1.stargz'.cri_keychain]
enable_keychain = true
{{ end }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ end }}

View File

@@ -0,0 +1,213 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Kata Deploy Lifecycle Tests
#
# Validates kata-deploy behavior during DaemonSet restarts and uninstalls:
#
# 1. Artifacts present: After install, kata artifacts exist on the host,
# RuntimeClasses are created, and the node is labeled.
#
# 2. Restart resilience: Running kata pods must survive a kata-deploy
# DaemonSet restart without crashing. (Regression test for #12761)
#
# 3. Artifact cleanup: After helm uninstall, kata artifacts must be
# fully removed from the host and containerd must remain healthy.
#
# Required environment variables:
# DOCKER_REGISTRY - Container registry for kata-deploy image
# DOCKER_REPO - Repository name for kata-deploy image
# DOCKER_TAG - Image tag to test
# KATA_HYPERVISOR - Hypervisor to test (qemu, clh, etc.)
# KUBERNETES - K8s distribution (microk8s, k3s, rke2, etc.)
load "${BATS_TEST_DIRNAME}/../../common.bash"
repo_root_dir="${BATS_TEST_DIRNAME}/../../../"
load "${repo_root_dir}/tests/gha-run-k8s-common.sh"
source "${BATS_TEST_DIRNAME}/lib/helm-deploy.bash"
LIFECYCLE_POD_NAME="kata-lifecycle-test"
# Run a command on the host node's filesystem using a short-lived privileged pod.
# The host root is mounted at /host inside the pod.
# Usage: run_on_host "test -d /host/opt/kata && echo YES || echo NO"
run_on_host() {
local cmd="$1"
local node_name
node_name=$(kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name | head -1)
local pod_name="host-exec-${RANDOM}"
kubectl run "${pod_name}" \
--image=quay.io/kata-containers/alpine-bash-curl:latest \
--restart=Never --rm -i \
--overrides="{
\"spec\": {
\"nodeName\": \"${node_name}\",
\"activeDeadlineSeconds\": 300,
\"tolerations\": [{\"operator\": \"Exists\"}],
\"containers\": [{
\"name\": \"exec\",
\"image\": \"quay.io/kata-containers/alpine-bash-curl:latest\",
\"imagePullPolicy\": \"IfNotPresent\",
\"command\": [\"sh\", \"-c\", \"${cmd}\"],
\"securityContext\": {\"privileged\": true},
\"volumeMounts\": [{\"name\": \"host\", \"mountPath\": \"/host\", \"readOnly\": true}]
}],
\"volumes\": [{\"name\": \"host\", \"hostPath\": {\"path\": \"/\"}}]
}
}"
}
setup_file() {
ensure_helm
echo "# Image: ${DOCKER_REGISTRY}/${DOCKER_REPO}:${DOCKER_TAG}" >&3
echo "# Hypervisor: ${KATA_HYPERVISOR}" >&3
echo "# K8s distribution: ${KUBERNETES}" >&3
echo "# Deploying kata-deploy..." >&3
deploy_kata
echo "# kata-deploy deployed successfully" >&3
}
@test "Kata artifacts are present on host after install" {
echo "# Checking kata artifacts on host..." >&3
run run_on_host "test -d /host/opt/kata && echo PRESENT || echo MISSING"
echo "# /opt/kata directory: ${output}" >&3
[[ "${output}" == *"PRESENT"* ]]
run run_on_host "test -f /host/opt/kata/bin/containerd-shim-kata-v2 && echo FOUND || (test -f /host/opt/kata/runtime-rs/bin/containerd-shim-kata-v2 && echo FOUND || echo MISSING)"
echo "# containerd-shim-kata-v2: ${output}" >&3
[[ "${output}" == *"FOUND"* ]]
# RuntimeClasses must exist (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses: ${rc_count}" >&3
[[ ${rc_count} -gt 0 ]]
# Node must have the kata-runtime label
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}')
echo "# Node label katacontainers.io/kata-runtime: ${label}" >&3
[[ "${label}" == "true" ]]
}
@test "DaemonSet restart does not crash running kata pods" {
# Create a long-running kata pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ${LIFECYCLE_POD_NAME}
spec:
runtimeClassName: kata-${KATA_HYPERVISOR}
restartPolicy: Always
nodeSelector:
katacontainers.io/kata-runtime: "true"
containers:
- name: test
image: quay.io/kata-containers/alpine-bash-curl:latest
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
EOF
echo "# Waiting for kata pod to be running..." >&3
kubectl wait --for=condition=Ready "pod/${LIFECYCLE_POD_NAME}" --timeout=120s
# Record pod identity before the DaemonSet restart
local pod_uid_before
pod_uid_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
local restart_count_before
restart_count_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Pod UID before: ${pod_uid_before}, restarts: ${restart_count_before}" >&3
# Trigger a DaemonSet restart — this simulates what happens when a user
# changes a label, updates a config value, or does a rolling update.
echo "# Triggering kata-deploy DaemonSet restart..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout restart daemonset/kata-deploy
echo "# Waiting for DaemonSet rollout to complete..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout status daemonset/kata-deploy --timeout=300s
# On k3s/rke2 the new kata-deploy pod restarts the k3s service as
# part of install, which causes a brief API server outage. Wait for
# the node to become ready before querying pod status.
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
echo "# Node is ready after DaemonSet rollout" >&3
# The kata pod must still be Running with the same UID and no extra restarts.
# Retry kubectl through any residual API unavailability.
local pod_phase=""
local retries=0
while [[ ${retries} -lt 30 ]]; do
pod_phase=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.phase}' 2>/dev/null) && break
retries=$((retries + 1))
sleep 2
done
echo "# Pod phase after restart: ${pod_phase}" >&3
[[ "${pod_phase}" == "Running" ]]
local pod_uid_after
pod_uid_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
echo "# Pod UID after: ${pod_uid_after}" >&3
[[ "${pod_uid_before}" == "${pod_uid_after}" ]]
local restart_count_after
restart_count_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Restart count after: ${restart_count_after}" >&3
[[ "${restart_count_before}" == "${restart_count_after}" ]]
echo "# SUCCESS: Kata pod survived DaemonSet restart without crashing" >&3
}
@test "Artifacts are fully cleaned up after uninstall" {
echo "# Uninstalling kata-deploy..." >&3
uninstall_kata
echo "# Uninstall complete, verifying cleanup..." >&3
# Wait for node to recover — containerd restart during cleanup may
# cause brief unavailability (especially on k3s/rke2).
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
# RuntimeClasses must be gone (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses remaining: ${rc_count}" >&3
[[ ${rc_count} -eq 0 ]]
# Node label must be removed
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}' 2>/dev/null || echo "")
echo "# Node label after uninstall: '${label}'" >&3
[[ -z "${label}" ]]
# Kata artifacts must be removed from the host filesystem
echo "# Checking host filesystem for leftover artifacts..." >&3
run run_on_host "test -d /host/opt/kata && echo EXISTS || echo REMOVED"
echo "# /opt/kata: ${output}" >&3
[[ "${output}" == *"REMOVED"* ]]
# Containerd must still be healthy and reporting a valid version
local container_runtime_version
container_runtime_version=$(kubectl get nodes --no-headers -o custom-columns=CONTAINER_RUNTIME:.status.nodeInfo.containerRuntimeVersion)
echo "# Container runtime version: ${container_runtime_version}" >&3
[[ "${container_runtime_version}" != *"Unknown"* ]]
echo "# SUCCESS: All kata artifacts cleaned up, containerd healthy" >&3
}
teardown() {
if [[ "${BATS_TEST_NAME}" == *"restart"* ]]; then
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
fi
}
teardown_file() {
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
uninstall_kata 2>/dev/null || true
}

View File

@@ -20,6 +20,7 @@ else
KATA_DEPLOY_TEST_UNION=( \
"kata-deploy.bats" \
"kata-deploy-custom-runtimes.bats" \
"kata-deploy-lifecycle.bats" \
)
fi

View File

@@ -296,36 +296,6 @@ function deploy_k0s() {
sudo chown "${USER}":"${USER}" ~/.kube/config
}
# If the rendered containerd config (v3) does not import the drop-in dir, write
# the full V3 template (from tests/containerd-config-v3.tmpl) with the given
# import path and restart the service.
# Args: containerd_dir (e.g. /var/lib/rancher/k3s/agent/etc/containerd), service_name (e.g. k3s or rke2-server).
function _setup_containerd_v3_template_if_needed() {
local containerd_dir="$1"
local service_name="$2"
local template_file="${tests_dir}/containerd-config-v3.tmpl"
local rendered_v3="${containerd_dir}/config-v3.toml"
local imports_path="${containerd_dir}/config-v3.toml.d/*.toml"
if sudo test -f "${rendered_v3}" && sudo grep -q 'config-v3\.toml\.d' "${rendered_v3}" 2>/dev/null; then
return 0
fi
if [[ ! -f "${template_file}" ]]; then
echo "Template not found: ${template_file}" >&2
return 1
fi
sudo mkdir -p "${containerd_dir}/config-v3.toml.d"
sed "s|__CONTAINERD_IMPORTS_PATH__|${imports_path}|g" "${template_file}" | sudo tee "${containerd_dir}/config-v3.toml.tmpl" > /dev/null
sudo systemctl restart "${service_name}"
}
function setup_k3s_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/k3s/agent/etc/containerd" "k3s"
}
function setup_rke2_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/rke2/agent/etc/containerd" "rke2-server"
}
function deploy_k3s() {
# Set CRI runtime-request-timeout to 600s (same as kubeadm) for CoCo and long-running create requests.
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --kubelet-arg runtime-request-timeout=600s
@@ -333,9 +303,6 @@ function deploy_k3s() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_k3s_containerd_v3_template_if_needed
# Download the kubectl binary into /usr/bin and remove /usr/local/bin/kubectl
#
# We need to do this to avoid hitting issues like:
@@ -405,9 +372,6 @@ function deploy_rke2() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_rke2_containerd_v3_template_if_needed
# Link the kubectl binary into /usr/bin
sudo ln -sf /var/lib/rancher/rke2/bin/kubectl /usr/local/bin/kubectl

View File

@@ -0,0 +1,45 @@
#!/bin/bash
#
# Copyright (c) 2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o nounset
set -o pipefail
kata_tarball_dir="${2:-kata-artifacts}"
docker_dir="$(dirname "$(readlink -f "$0")")"
source "${docker_dir}/../../common.bash"
image="${image:-instrumentisto/nmap:latest}"
function install_dependencies() {
info "Installing the dependencies needed for running the docker smoke test"
sudo -E docker pull "${image}"
}
function run() {
info "Running docker smoke tests using ${KATA_HYPERVISOR} hypervisor"
enabling_hypervisor
info "Running docker with runc"
sudo docker run --rm --entrypoint nping "${image}" --tcp-connect -c 2 -p 80 www.github.com
info "Running docker with Kata Containers (${KATA_HYPERVISOR})"
sudo docker run --rm --runtime io.containerd.kata-${KATA_HYPERVISOR}.v2 --entrypoint nping "${image}" --tcp-connect -c 2 -p 80 www.github.com
}
function main() {
action="${1:-}"
case "${action}" in
install-dependencies) install_dependencies ;;
install-kata) install_kata ;;
run) run ;;
*) >&2 die "Invalid argument" ;;
esac
}
main "$@"

View File

@@ -116,12 +116,16 @@ function is_confidential_gpu_hardware() {
return 1
}
# create_loop_device creates a loop device backed by a file.
# $1: loop file path (default: /tmp/trusted-image-storage.img)
# $2: size in MiB, i.e. dd bs=1M count=... (default: 2500, ~2.4Gi)
function create_loop_device(){
local loop_file="${1:-/tmp/trusted-image-storage.img}"
local size_mb="${2:-2500}"
local node="$(get_one_kata_node)"
cleanup_loop_device "$loop_file"
exec_host "$node" "dd if=/dev/zero of=$loop_file bs=1M count=2500"
exec_host "$node" "dd if=/dev/zero of=$loop_file bs=1M count=$size_mb"
exec_host "$node" "losetup -fP $loop_file >/dev/null 2>&1"
local device=$(exec_host "$node" losetup -j $loop_file | awk -F'[: ]' '{print $1}')
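The device node is recovered by field-splitting the `losetup -j` output; a standalone sketch with a canned sample line (the exact format can vary across util-linux versions):

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# losetup -j prints lines like "/dev/loop3: []: (/tmp/trusted-image-storage.img)".
# Splitting on ':' or ' ' makes the device node field 1, which is what
# create_loop_device returns to its caller.
line='/dev/loop3: []: (/tmp/trusted-image-storage.img)'
device=$(echo "${line}" | awk -F'[: ]' '{print $1}')
echo "${device}"
```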

View File

@@ -97,7 +97,10 @@ setup() {
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"
@@ -142,7 +145,10 @@ setup() {
@test "Test we cannot pull a large image that pull time exceeds createcontainer timeout inside the guest" {
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"
@@ -193,7 +199,10 @@ setup() {
fi
storage_config=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").XXXXXX.yaml")
local_device=$(create_loop_device)
LOCAL_DEVICE="$local_device" NODE_NAME="$node" envsubst < "$storage_config_template" > "$storage_config"
PV_NAME=trusted-block-pv PVC_NAME=trusted-pvc \
PV_STORAGE_CAPACITY=10Gi PVC_STORAGE_REQUEST=1Gi \
LOCAL_DEVICE="$local_device" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config"
# For debugging purposes
echo "Trusted storage $storage_config file:"

View File

@@ -0,0 +1,219 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# This file is modeled after k8s-nvidia-nim.bats which contains helpful in-line documentation.
load "${BATS_TEST_DIRNAME}/lib.sh"
load "${BATS_TEST_DIRNAME}/confidential_common.sh"
export KATA_HYPERVISOR="${KATA_HYPERVISOR:-qemu-nvidia-gpu}"
TEE=false
if is_confidential_gpu_hardware; then
TEE=true
fi
export TEE
NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct"
[[ "${TEE}" = "true" ]] && NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct-tee"
export NIM_SERVICE_NAME
POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=600s
[[ "${TEE}" = "true" ]] && POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=1200s
export POD_READY_TIMEOUT_LLAMA_3_2_1B=${POD_READY_TIMEOUT_LLAMA_3_2_1B:-${POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED}}
export LOCAL_NIM_CACHE_LLAMA_3_2_1B="${LOCAL_NIM_CACHE_LLAMA_3_2_1B:-${LOCAL_NIM_CACHE:-/opt/nim/.cache}-llama-3-2-1b}"
DOCKER_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"username\":\"\$oauthtoken\",\"password\":\"${NGC_API_KEY}\",\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export DOCKER_CONFIG_JSON
KBS_AUTH_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export KBS_AUTH_CONFIG_JSON
NGC_API_KEY_BASE64=$(
echo -n "${NGC_API_KEY}" | base64 -w0
)
export NGC_API_KEY_BASE64
# Points to kbs:///default/ngc-api-key/instruct and thus re-uses the secret from k8s-nvidia-nim.bats.
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B="${SEALED_SECRET_PRECREATED_NIM_INSTRUCT}"
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64=$(echo -n "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B}" | base64 -w0)
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64
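The dockerconfigjson material above is base64-encoded twice: once for the `auth` field inside the JSON, and once more for the Secret's `data` value. A sketch of the nesting with a dummy key (a placeholder, not a real NGC credential):

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Dummy key -- a placeholder, not a real NGC credential.
NGC_API_KEY="dummy-key"

# Inner value: base64 of "user:password", as dockerconfigjson requires.
auth=$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)

# Outer value: the whole JSON document, base64-encoded again for the
# Secret's data field.
cfg=$(echo -n "{\"auths\":{\"nvcr.io\":{\"auth\":\"${auth}\"}}}" | base64 -w0)

# Decoding one layer recovers the JSON, with the still-encoded auth inside.
decoded=$(echo "${cfg}" | base64 -d)
echo "${decoded}"
```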
# NIM Operator (k8s-nim-operator) install/uninstall for NIMService CRD.
NIM_OPERATOR_NAMESPACE="${NIM_OPERATOR_NAMESPACE:-nim-operator}"
NIM_OPERATOR_RELEASE_NAME="nim-operator"
install_nim_operator() {
command -v helm &>/dev/null || die "helm is required but not installed"
echo "Installing NVIDIA NIM Operator (latest chart)"
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
kubectl create namespace "${NIM_OPERATOR_NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install "${NIM_OPERATOR_RELEASE_NAME}" nvidia/k8s-nim-operator \
-n "${NIM_OPERATOR_NAMESPACE}" \
--wait
local deploy_name
deploy_name=$(kubectl get deployment -n "${NIM_OPERATOR_NAMESPACE}" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "${deploy_name}" ]]; then
kubectl wait --for=condition=available --timeout=300s "deployment/${deploy_name}" -n "${NIM_OPERATOR_NAMESPACE}"
fi
echo "NIM Operator install complete."
}
uninstall_nim_operator() {
echo "Uninstalling NVIDIA NIM Operator (release: ${NIM_OPERATOR_RELEASE_NAME}, namespace: ${NIM_OPERATOR_NAMESPACE})"
if helm status "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" &>/dev/null; then
helm uninstall "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" || true
kubectl delete namespace "${NIM_OPERATOR_NAMESPACE}" --ignore-not-found=true --timeout=60s || true
echo "NIM Operator uninstall complete."
else
echo "NIM Operator release not found, nothing to uninstall."
fi
}
setup_kbs_credentials() {
CC_KBS_ADDR=$(kbs_k8s_svc_http_addr)
export CC_KBS_ADDR
kubectl delete secret ngc-secret-llama-3-2-1b --ignore-not-found
kubectl create secret docker-registry ngc-secret-llama-3-2-1b --docker-server="nvcr.io" --docker-username="\$oauthtoken" --docker-password="${NGC_API_KEY}"
kbs_set_gpu0_resource_policy
kbs_set_resource_base64 "default" "credentials" "nvcr" "${KBS_AUTH_CONFIG_JSON}"
kbs_set_resource "default" "ngc-api-key" "instruct" "${NGC_API_KEY}"
}
# CDH initdata for guest-pull: KBS URL, registry credentials URI, and allow-all policy.
# NIMService is not supported by genpolicy; add_allow_all_policy_to_yaml only supports Pod/Deployment.
# Build initdata with policy inline so TEE pods get both CDH config and policy.
create_nim_initdata_file_llama_3_2_1b() {
local output_file="$1"
local cc_kbs_address
cc_kbs_address=$(kbs_k8s_svc_http_addr)
local allow_all_rego="${BATS_TEST_DIRNAME}/../../../src/kata-opa/allow-all.rego"
cat > "${output_file}" << EOF
version = "0.1.0"
algorithm = "sha256"
[data]
"aa.toml" = '''
[token_configs]
[token_configs.kbs]
url = "${cc_kbs_address}"
'''
"cdh.toml" = '''
[kbc]
name = "cc_kbc"
url = "${cc_kbs_address}"
[image]
authenticated_registry_credentials_uri = "kbs:///default/credentials/nvcr"
'''
"policy.rego" = '''
$(cat "${allow_all_rego}")
'''
EOF
}
setup() {
setup_common || die "setup_common failed"
install_nim_operator || die "NIM Operator install failed"
dpkg -s jq >/dev/null 2>&1 || sudo apt -y install jq
# Same pattern as k8s-nvidia-nim.bats: choose manifest by TEE; each YAML has literal secret names.
local tee_suffix=""
[[ "${TEE}" = "true" ]] && tee_suffix="-tee"
export NIM_YAML_IN="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml.in"
export NIM_YAML="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml"
if [[ "${TEE}" = "true" ]]; then
setup_kbs_credentials
setup_sealed_secret_signing_public_key
initdata_file="${BATS_SUITE_TMPDIR}/nim-initdata-llama-3-2-1b.toml"
create_nim_initdata_file_llama_3_2_1b "${initdata_file}"
NIM_INITDATA_BASE64=$(gzip -c "${initdata_file}" | base64 -w0)
export NIM_INITDATA_BASE64
fi
envsubst < "${NIM_YAML_IN}" > "${NIM_YAML}"
}
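The `cc_init_data` annotation carries the initdata TOML gzipped and then base64-encoded; a quick round-trip sketch of that encoding, assuming only GNU gzip and base64:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# The annotation value is produced as gzip -c file | base64 -w0; the guest
# side reverses both steps. Minimal TOML stand-in for the real initdata.
initdata=$(mktemp)
printf 'version = "0.1.0"\nalgorithm = "sha256"\n' > "${initdata}"

encoded=$(gzip -c "${initdata}" | base64 -w0)
decoded=$(echo "${encoded}" | base64 -d | gunzip)
echo "${decoded}"
rm -f "${initdata}"
```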
@test "NIMService llama-3.2-1b-instruct serves /v1/models" {
echo "NIMService test: Applying NIM YAML"
kubectl apply -f "${NIM_YAML}"
echo "NIMService test: Waiting for deployment to exist (operator creates it from NIMService)"
local wait_exist_timeout=30
local elapsed=0
while ! kubectl get deployment "${NIM_SERVICE_NAME}" &>/dev/null; do
if [[ ${elapsed} -ge ${wait_exist_timeout} ]]; then
echo "Deployment ${NIM_SERVICE_NAME} did not appear within ${wait_exist_timeout}s" >&2
kubectl get deployment "${NIM_SERVICE_NAME}" 2>&1 || true
false
fi
sleep 5
elapsed=$((elapsed + 5))
done
local pod_name
pod_name=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
echo "NIMService test: POD_NAME=${pod_name} (waiting for pod ready, timeout ${POD_READY_TIMEOUT_LLAMA_3_2_1B})"
[[ -n "${pod_name}" ]]
kubectl wait --for=condition=ready --timeout="${POD_READY_TIMEOUT_LLAMA_3_2_1B}" "pod/${pod_name}"
local pod_ip
pod_ip=$(kubectl get pod "${pod_name}" -o jsonpath='{.status.podIP}')
echo "NIMService test: POD_IP=${pod_ip}"
[[ -n "${pod_ip}" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/models"
run curl -sS --connect-timeout 10 "http://${pod_ip}:8000/v1/models"
echo "NIMService test: /v1/models response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "list" ]]
[[ "$(echo "${output}" | jq -r '.data[0].id')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.data[0].object')" == "model" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/chat/completions"
run curl -sS --connect-timeout 30 "http://${pod_ip}:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model":"meta/llama-3.2-1b-instruct","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
echo "NIMService test: /v1/chat/completions response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "chat.completion" ]]
[[ "$(echo "${output}" | jq -r '.model')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.choices[0].message | has("content") or has("reasoning_content")')" == "true" ]]
}
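The jq assertions in the test body can be exercised against a canned response (illustrative JSON, not captured from a real NIM deployment):

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Canned /v1/models response shaped like the fields the test asserts on.
resp='{"object":"list","data":[{"id":"meta/llama-3.2-1b-instruct","object":"model"}]}'

object=$(echo "${resp}" | jq -r '.object')
model_id=$(echo "${resp}" | jq -r '.data[0].id')
echo "${object} ${model_id}"
```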
teardown() {
if kubectl get nimservice "${NIM_SERVICE_NAME}" &>/dev/null; then
POD_NAME=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
if [[ -n "${POD_NAME}" ]]; then
echo "=== NIMService pod logs ==="
kubectl logs "${POD_NAME}" || true
kubectl describe pod "${POD_NAME}" || true
fi
kubectl describe nimservice "${NIM_SERVICE_NAME}" || true
fi
[ -f "${NIM_YAML}" ] && kubectl delete -f "${NIM_YAML}" --ignore-not-found=true
uninstall_nim_operator || true
print_node_journal_since_test_start "${node}" "${node_start_time:-}" "${BATS_TEST_COMPLETED:-}"
}

View File

@@ -85,6 +85,8 @@ setup_langchain_flow() {
# generated policy.rego to it and set it as the cc_init_data annotation.
# We must overwrite the default empty file AFTER create_tmp_policy_settings_dir()
# copies it to the temp directory.
# Because we use multiple vCPUs, we set `max_concurrent_layer_downloads_per_image = 1`,
# see: https://github.com/kata-containers/kata-containers/issues/12721
create_nim_initdata_file() {
local output_file="$1"
local cc_kbs_address
@@ -107,6 +109,7 @@ name = "cc_kbc"
url = "${cc_kbs_address}"
[image]
max_concurrent_layer_downloads_per_image = 1
authenticated_registry_credentials_uri = "kbs:///default/credentials/nvcr"
'''
EOF
@@ -189,12 +192,35 @@ setup_file() {
# This must happen AFTER create_tmp_policy_settings_dir() copies the empty
# file and BEFORE auto_generate_policy() runs.
create_nim_initdata_file "${policy_settings_dir}/default-initdata.toml"
# Container image layer storage: one block device and PV/PVC per pod.
storage_config_template="${pod_config_dir}/confidential/trusted-storage.yaml.in"
instruct_storage_mib=57344
local_device_instruct=$(create_loop_device /tmp/trusted-image-storage-instruct.img "$instruct_storage_mib")
storage_config_instruct=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").instruct.XXX")
PV_NAME=trusted-block-pv-instruct PVC_NAME=trusted-pvc-instruct \
PV_STORAGE_CAPACITY="${instruct_storage_mib}Mi" PVC_STORAGE_REQUEST="${instruct_storage_mib}Mi" \
LOCAL_DEVICE="$local_device_instruct" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config_instruct"
retry_kubectl_apply "$storage_config_instruct"
if [ "${SKIP_MULTI_GPU_TESTS}" != "true" ]; then
embedqa_storage_mib=8192
local_device_embedqa=$(create_loop_device /tmp/trusted-image-storage-embedqa.img "$embedqa_storage_mib")
storage_config_embedqa=$(mktemp "${BATS_FILE_TMPDIR}/$(basename "${storage_config_template}").embedqa.XXX")
PV_NAME=trusted-block-pv-embedqa PVC_NAME=trusted-pvc-embedqa \
PV_STORAGE_CAPACITY="${embedqa_storage_mib}Mi" PVC_STORAGE_REQUEST="${embedqa_storage_mib}Mi" \
LOCAL_DEVICE="$local_device_embedqa" NODE_NAME="$node" \
envsubst < "$storage_config_template" > "$storage_config_embedqa"
retry_kubectl_apply "$storage_config_embedqa"
fi
fi
create_inference_pod
if [ "${SKIP_MULTI_GPU_TESTS}" != "true" ]; then
create_embedqa_pod
create_embedqa_pod
fi
}
@@ -459,5 +485,13 @@ teardown_file() {
[ -f "${POD_EMBEDQA_YAML}" ] && kubectl delete -f "${POD_EMBEDQA_YAML}" --ignore-not-found=true
fi
if [[ "${TEE}" = "true" ]]; then
kubectl delete --ignore-not-found pvc trusted-pvc-instruct trusted-pvc-embedqa
kubectl delete --ignore-not-found pv trusted-block-pv-instruct trusted-block-pv-embedqa
kubectl delete --ignore-not-found storageclass local-storage
cleanup_loop_device /tmp/trusted-image-storage-instruct.img || true
cleanup_loop_device /tmp/trusted-image-storage-embedqa.img || true
fi
print_node_journal_since_test_start "${node}" "${node_start_time:-}" "${BATS_TEST_COMPLETED:-}" >&3
}

View File

@@ -89,7 +89,8 @@ if [[ -n "${K8S_TEST_NV:-}" ]]; then
else
K8S_TEST_NV=("k8s-confidential-attestation.bats" \
"k8s-nvidia-cuda.bats" \
"k8s-nvidia-nim.bats")
"k8s-nvidia-nim.bats" \
"k8s-nvidia-nim-service.bats")
fi
SUPPORTED_HYPERVISORS=("qemu-nvidia-gpu" "qemu-nvidia-gpu-snp" "qemu-nvidia-gpu-tdx")

View File

@@ -14,10 +14,10 @@ volumeBindingMode: WaitForFirstConsumer
apiVersion: v1
kind: PersistentVolume
metadata:
name: trusted-block-pv
name: $PV_NAME
spec:
capacity:
storage: 10Gi
storage: $PV_STORAGE_CAPACITY
volumeMode: Block
accessModes:
- ReadWriteOnce
@@ -37,12 +37,12 @@ spec:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: trusted-pvc
name: $PVC_NAME
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
storage: $PVC_STORAGE_REQUEST
volumeMode: Block
storageClassName: local-storage

View File

@@ -69,14 +69,20 @@ spec:
limits:
nvidia.com/pgpu: "1"
cpu: "16"
memory: "64Gi"
memory: "48Gi"
volumeMounts:
- name: nim-trusted-cache
mountPath: /opt/nim/.cache
volumeDevices:
- devicePath: /dev/trusted_store
name: trusted-storage
volumes:
- name: nim-trusted-cache
emptyDir:
sizeLimit: 64Gi
- name: trusted-storage
persistentVolumeClaim:
claimName: trusted-pvc-instruct
---
apiVersion: v1
kind: Secret

View File

@@ -0,0 +1,54 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-sealed-llama-3-2-1b
# The /dev/trusted_store container image layer storage feature cannot be
# selected here; storage.emptyDir only selects the container data storage
# feature.
storage:
emptyDir:
sizeLimit: 10Gi
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "56Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
annotations:
io.katacontainers.config.hypervisor.kernel_params: "agent.guest_components_procs=confidential-data-hub agent.aa_kbc_params=cc_kbc::${CC_KBS_ADDR}"
io.katacontainers.config.hypervisor.cc_init_data: "${NIM_INITDATA_BASE64}"
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-sealed-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64}"

View File

@@ -0,0 +1,48 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-llama-3-2-1b
storage:
hostPath: "${LOCAL_NIM_CACHE_LLAMA_3_2_1B}"
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "16Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_BASE64}"

View File

@@ -83,10 +83,16 @@ spec:
volumeMounts:
- name: nim-trusted-cache
mountPath: /opt/nim/.cache
volumeDevices:
- devicePath: /dev/trusted_store
name: trusted-storage
volumes:
- name: nim-trusted-cache
emptyDir:
sizeLimit: 40Gi
- name: trusted-storage
persistentVolumeClaim:
claimName: trusted-pvc-embedqa
---
apiVersion: v1
kind: Secret

View File

@@ -155,6 +155,7 @@ pub struct Config {
pub containerd_conf_file: String,
pub containerd_conf_file_backup: String,
pub containerd_drop_in_conf_file: String,
pub daemonset_name: String,
pub custom_runtimes_enabled: bool,
pub custom_runtimes: Vec<CustomRuntime>,
}
@@ -169,6 +170,12 @@ impl Config {
return Err(anyhow::anyhow!("NODE_NAME must not be empty"));
}
let daemonset_name = env::var("DAEMONSET_NAME")
.ok()
.map(|v| v.trim().to_string())
.filter(|v| !v.is_empty())
.unwrap_or_else(|| "kata-deploy".to_string());
let debug = env::var("DEBUG").unwrap_or_else(|_| "false".to_string()) == "true";
// Parse shims - only use arch-specific variable
@@ -293,6 +300,7 @@ impl Config {
containerd_conf_file,
containerd_conf_file_backup,
containerd_drop_in_conf_file,
daemonset_name,
custom_runtimes_enabled,
custom_runtimes,
};

View File

@@ -94,30 +94,41 @@ impl K8sClient {
Ok(())
}
pub async fn count_kata_deploy_daemonsets(&self) -> Result<usize> {
/// Returns whether a non-terminating DaemonSet with this exact name
/// exists in the current namespace. Used to decide whether this pod is
/// being restarted (true) or uninstalled (false).
pub async fn own_daemonset_exists(&self, daemonset_name: &str) -> Result<bool> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::Api;
let ds_api: Api<DaemonSet> = Api::default_namespaced(self.client.clone());
match ds_api.get_opt(daemonset_name).await? {
Some(ds) => Ok(ds.metadata.deletion_timestamp.is_none()),
None => Ok(false),
}
}
/// Returns how many non-terminating DaemonSets across all namespaces
/// have a name containing "kata-deploy". Used to decide whether shared
/// node-level resources (node label, CRI restart) should be cleaned up:
/// they are only safe to remove when no kata-deploy instance remains
/// on the cluster.
pub async fn count_any_kata_deploy_daemonsets(&self) -> Result<usize> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::{Api, ListParams};
let ds_api: Api<DaemonSet> = Api::default_namespaced(self.client.clone());
let lp = ListParams::default();
let daemonsets = ds_api.list(&lp).await?;
let ds_api: Api<DaemonSet> = Api::all(self.client.clone());
let daemonsets = ds_api.list(&ListParams::default()).await?;
// Note: We use client-side filtering here because Kubernetes field selectors
// don't support "contains" operations - they only support exact matches and comparisons.
// Filtering by name containing "kata-deploy" requires client-side processing.
// Exclude DaemonSets that are terminating (have deletion_timestamp) so that when our
// DaemonSet pod runs cleanup on SIGTERM during uninstall, we count 0 and remove the label.
let count = daemonsets
.iter()
.filter(|ds| {
if ds.metadata.deletion_timestamp.is_some() {
return false;
}
ds.metadata
.name
.as_ref()
.map(|n| n.contains("kata-deploy"))
.unwrap_or(false)
ds.metadata.deletion_timestamp.is_none()
&& ds
.metadata
.name
.as_ref()
.is_some_and(|n| n.contains("kata-deploy"))
})
.count();
@@ -584,9 +595,14 @@ pub async fn label_node(
client.label_node(label_key, label_value, overwrite).await
}
pub async fn count_kata_deploy_daemonsets(config: &Config) -> Result<usize> {
pub async fn own_daemonset_exists(config: &Config) -> Result<bool> {
let client = K8sClient::new(&config.node_name).await?;
client.count_kata_deploy_daemonsets().await
client.own_daemonset_exists(&config.daemonset_name).await
}
pub async fn count_any_kata_deploy_daemonsets(config: &Config) -> Result<usize> {
let client = K8sClient::new(&config.node_name).await?;
client.count_any_kata_deploy_daemonsets().await
}
pub async fn crd_exists(config: &Config, crd_name: &str) -> Result<bool> {

View File

@@ -236,19 +236,29 @@ async fn install(config: &config::Config, runtime: &str) -> Result<()> {
async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
info!("Cleaning up Kata Containers");
info!("Counting kata-deploy daemonsets");
let kata_deploy_installations = k8s::count_kata_deploy_daemonsets(config).await?;
// Step 1: Check if THIS pod's owning DaemonSet still exists.
// If it does, this is a pod restart (rolling update, label change, etc.),
// not an uninstall — skip everything so running kata pods are not disrupted.
info!(
"Found {} kata-deploy daemonset(s)",
kata_deploy_installations
"Checking if DaemonSet '{}' still exists",
config.daemonset_name
);
if kata_deploy_installations == 0 {
info!("Removing kata-runtime label from node");
k8s::label_node(config, "katacontainers.io/kata-runtime", None, false).await?;
info!("Successfully removed kata-runtime label");
if k8s::own_daemonset_exists(config).await? {
info!(
"DaemonSet '{}' still exists, \
skipping all cleanup to avoid disrupting running kata pods",
config.daemonset_name
);
return Ok(());
}
// Step 2: Our DaemonSet is gone (uninstall). Perform instance-specific
// cleanup: snapshotters, CRI config, and artifacts for this instance.
info!(
"DaemonSet '{}' not found, proceeding with instance cleanup",
config.daemonset_name
);
match config.experimental_setup_snapshotter.as_ref() {
Some(snapshotters) => {
for snapshotter in snapshotters {
@@ -270,6 +280,25 @@ async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
artifacts::remove_artifacts(config).await?;
info!("Successfully removed kata artifacts");
// Step 3: Check if ANY other kata-deploy DaemonSets still exist.
// Shared resources (node label, CRI restart) are only safe to touch
// when no other kata-deploy instance remains.
let other_ds_count = k8s::count_any_kata_deploy_daemonsets(config).await?;
if other_ds_count > 0 {
info!(
"{} other kata-deploy DaemonSet(s) still exist, \
skipping node label removal and CRI restart",
other_ds_count
);
return Ok(());
}
info!("No other kata-deploy DaemonSets found, performing full shared cleanup");
info!("Removing kata-runtime label from node");
k8s::label_node(config, "katacontainers.io/kata-runtime", None, false).await?;
info!("Successfully removed kata-runtime label");
// Restart the CRI runtime last. On k3s/rke2 this restarts the entire
// server process, which kills this (terminating) pod. By doing it after
// all other cleanup, we ensure config and artifacts are already gone.

View File

@@ -51,18 +51,19 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
return Ok("crio".to_string());
}
if runtime_version.contains("containerd") && runtime_version.contains("-k3s") {
// Check systemd services (ignore errors - service might not exist)
let _ = utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]);
if utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]).is_ok() {
return Ok("rke2-agent".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "rke2-server"]).is_ok() {
return Ok("rke2-server".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "k3s-agent"]).is_ok() {
return Ok("k3s-agent".to_string());
}
// Detect k3s/rke2 via systemd services rather than the containerd version
// string, which no longer reliably contains "k3s" in newer releases
// (e.g. "containerd://2.2.2-bd1.34").
if utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]).is_ok() {
return Ok("rke2-agent".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "rke2-server"]).is_ok() {
return Ok("rke2-server".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "k3s-agent"]).is_ok() {
return Ok("k3s-agent".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "k3s"]).is_ok() {
return Ok("k3s".to_string());
}
@@ -83,7 +84,7 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
Ok(runtime)
}
/// Returns true if containerRuntimeVersion (e.g. "containerd://2.1.5-k3s1") indicates
/// Returns true if containerRuntimeVersion (e.g. "containerd://2.1.5-k3s1", "containerd://2.2.2-bd1.34") indicates
/// containerd 2.x or newer, false for 1.x or unparseable. Used for drop-in support
/// and for K3s/RKE2 template selection (config-v3.toml.tmpl vs config.toml.tmpl).
pub fn containerd_version_is_2_or_newer(runtime_version: &str) -> bool {
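A shell sketch of the same predicate (the actual implementation is the Rust function above): strip the `containerd://` scheme, take the component before the first dot, and compare numerically, so suffixes like `-k3s1` or `-bd1.34` never affect the result:

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

# Returns success when the version string indicates containerd 2.x or newer.
is_2_or_newer() {
    local v="${1#containerd://}"   # drop the scheme prefix
    local major="${v%%.*}"         # keep everything before the first dot
    [[ "${major}" =~ ^[0-9]+$ ]] && (( major >= 2 ))
}

is_2_or_newer "containerd://2.2.2-bd1.34" && r1=yes || r1=no
is_2_or_newer "containerd://1.7.0" && r2=yes || r2=no
echo "${r1} ${r2}"
```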
@@ -191,6 +192,7 @@ mod tests {
#[case("containerd://2.0.0", true)]
#[case("containerd://2.1.5", true)]
#[case("containerd://2.1.5-k3s1", true)]
#[case("containerd://2.2.2-bd1.34", true)]
#[case("containerd://2.2.0", true)]
#[case("containerd://2.3.1", true)]
#[case("containerd://2.0.0-rc.1", true)]

View File

@@ -143,6 +143,13 @@ spec:
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.env.multiInstallSuffix }}
- name: DAEMONSET_NAME
value: {{ printf "%s-%s" .Chart.Name .Values.env.multiInstallSuffix | quote }}
{{- else }}
- name: DAEMONSET_NAME
value: {{ .Chart.Name | quote }}
{{- end }}
- name: DEBUG
value: {{ include "kata-deploy.getDebug" . | quote }}
{{- $shimsAmd64 := include "kata-deploy.getEnabledShimsForArch" (dict "root" . "arch" "amd64") | trim -}}
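For reference, the `DAEMONSET_NAME` value rendered by the template under a hypothetical `multiInstallSuffix`, mirroring the `printf "%s-%s"` above (the suffix value here is invented for illustration):

```shell
#!/usr/bin/env bash
set -o errexit
set -o nounset
set -o pipefail

chart_name="kata-deploy"
multi_install_suffix="peer-pods"   # stand-in for .Values.env.multiInstallSuffix

# Same branch the Helm template takes: suffix set -> "<chart>-<suffix>",
# otherwise just the chart name.
if [[ -n "${multi_install_suffix}" ]]; then
    daemonset_name=$(printf '%s-%s' "${chart_name}" "${multi_install_suffix}")
else
    daemonset_name="${chart_name}"
fi
echo "${daemonset_name}"
```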

View File

@@ -5,6 +5,8 @@ Before=containerd.service
[Service]
ExecStart=@CONTAINERD_NYDUS_GRPC_BINARY@ --config @CONFIG_GUEST_PULLING@ --log-to-stdout
Restart=always
RestartSec=5
[Install]
RequiredBy=containerd.service
WantedBy=containerd.service