Compare commits


25 Commits

Author SHA1 Message Date
Fabiano Fidêncio
6031a1219b runtime-rs: fix Docker 26+ networking by rescanning after Start
Docker 26+ configures networking after the Start response rather than
through prestart hooks, which means the network namespace may not have
any interfaces when the sandbox is first created. This is the runtime-rs
counterpart of the Go runtime fix in PR #12754.

Three changes are made:

1. Discover Docker's pre-created network namespace from OCI hook args
   (libnetwork-setkey) during sandbox creation, avoiding a placeholder
   netns when the real one is already available.

2. Add an async rescan_network method to VirtSandbox that polls the
   network namespace for up to 5 seconds (50ms interval) looking for
   late-appearing interfaces, then pushes them to the guest agent.

3. Spawn the async rescan after StartProcess for sandbox containers,
   matching the timing of the Go runtime's RescanNetwork goroutine.
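The bounded poll described in step 2 can be sketched as follows — a minimal Python illustration with hypothetical helper names (the actual runtime-rs code is async Rust); only the 5 s / 50 ms timing comes from the commit message:

```python
import time

def rescan_network(list_interfaces, push_to_guest_agent,
                   timeout=5.0, interval=0.05):
    """Poll the network namespace for late-appearing interfaces and,
    once any show up, push them to the guest agent.

    Returns the interfaces found, or an empty list if the deadline
    (5 s by default, checked every 50 ms) passes with none appearing."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        interfaces = list_interfaces()
        if interfaces:
            push_to_guest_agent(interfaces)
            return interfaces
        time.sleep(interval)
    return []
```

Here `list_interfaces` and `push_to_guest_agent` stand in for the netns scan and the guest-agent RPC, respectively.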

Fixes: #9340

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
2026-04-05 14:48:23 +02:00
Fabiano Fidêncio
f074ceec6d Merge pull request #12682 from PiotrProkop/fix-direct-io-kata
runtime-rs: fix setting directio via config file
2026-04-03 16:11:57 +02:00
Fabiano Fidêncio
945aa5b43f Merge pull request #12774 from zvonkok/bump-nvrc
nvrc: Bump to the latest Release
2026-04-03 15:39:01 +02:00
Fabiano Fidêncio
ccfdf5e11b Merge pull request #12754 from llink5/fix/docker26-networking-9340
runtime: fix Docker 26+ networking by rescanning after Start
2026-04-03 13:15:38 +02:00
RuoqingHe
26bd5ad754 Merge pull request #12762 from YutingNie/fix-runtime-rs-shared-fs-typo
runtime-rs: Fix typo in share_fs error message
2026-04-03 15:24:33 +08:00
Yuting Nie
517882f93d runtime-rs: Fix typo in share_fs error message
There's a typo in the error message that is printed when an
unsupported share_fs is configured. Fixed shred -> shared.

Signed-off-by: Yuting Nie <yuting.nie@spacemit.com>
2026-04-03 05:23:46 +00:00
Alex Lyn
4a1c2b6620 Merge pull request #12309 from kata-containers/stale-issues-by-date
workflows: Create workflow to stale issues based on date
2026-04-03 09:31:34 +08:00
Zvonko Kaiser
3e23ee9998 nvrc: Bump to the latest Release
v0.1.4 includes a bugfix for nvrc.log=trace, which is now
optional.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-04-02 17:40:47 -04:00
llink5
f7878cc385 runtime: fix Docker 26+ networking by rescanning after Start
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).

Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.

Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
  field index, fixing parsing with optional mount tags (shared:,
  master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
  args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
  to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
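The mountinfo fix relies on the fact that, per proc(5), each /proc/[pid]/mountinfo line terminates its variable-length block of optional tags (shared:N, master:N, ...) with a lone `-` separator, so the filesystem type must be located relative to that separator rather than at a fixed field index. A minimal Python sketch of the idea (hypothetical function name, not the actual Go parser):

```python
def mountinfo_fs_type(line):
    """Return the filesystem type from one /proc/<pid>/mountinfo line.

    Fields are: mount-ID parent-ID major:minor root mount-point
    mount-options [zero or more optional tags] - fs-type source
    super-options.  The '-' separator is always present, so the fs
    type is the field right after it, however many tags precede it."""
    fields = line.split()
    return fields[fields.index("-") + 1]
```

A hardcoded index breaks as soon as a mount carries one or more propagation tags; anchoring on the separator works for any count, including zero.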

Fixes: #9340

Signed-off-by: llink5 <llink5@users.noreply.github.com>
2026-04-02 21:23:16 +02:00
Fabiano Fidêncio
09194d71bb Merge pull request #12767 from nubificus/fix/fc-rs
runtime-rs: Fix FC API fields
2026-04-02 18:24:35 +02:00
Manuel Huber
dd868dee6d tests: nvidia: onboard NIM service test
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.

For now, the TEE-based flow uses an allow-all policy. In future
work, we strive to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
being generated on the fly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-02 16:58:54 +02:00
Steve Horsman
58101a2166 Merge pull request #12656 from stevenhorsman/actions/checkout-bump
workflows: Update actions/checkout version
2026-04-01 17:34:39 +01:00
Fabiano Fidêncio
75df4c0bd3 Merge pull request #12766 from fidencio/topic/kata-deploy-avoid-kata-pods-to-crash-after-containerd-restart
kata-deploy: Fix kata-deploy pods crashing if containerd restarts
2026-04-01 18:28:16 +02:00
Steve Horsman
2830c4f080 Merge pull request #12746 from ldoktor/ci-helm2
ci.ocp: Use helm deployment for peer-pods
2026-04-01 17:13:21 +01:00
Lukáš Doktor
55a3772032 ci.ocp: Add note about external tests to README.md
To run all the tests that run in CI, we need to enable external
tests. This can be a bit tricky, so add it to our documentation.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
Lukáš Doktor
3bc460fd82 ci.ocp: Use helm deployment for peer-pods
Replace the deprecated CAA deployment with the helm-based one. Note that
this also installs the CAA mutating webhook, which wasn't installed before.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-04-01 16:59:33 +01:00
PiotrProkop
67af63a540 runtime-rs: fix setting directio via config file
This fix applies the config file value as a fallback when the
block_device_cache_direct annotation is not explicitly set on the pod.

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2026-04-01 16:59:04 +02:00
Anastassios Nanos
02c82b174a runtime-rs: Fix FC API fields
An FC update caused bad requests from the runtime-rs runtime when
specifying the vCPU count and block rate limiter fields.

Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
2026-04-01 14:50:51 +00:00
Fabiano Fidêncio
2131147360 tests: add kata-deploy lifecycle tests for restart resilience and cleanup
Add functional tests that cover two previously untested kata-deploy
behaviors:

1. Restart resilience (regression test for #12761): deploys a
   long-running kata pod, triggers a kata-deploy DaemonSet restart via
   rollout restart, and verifies the kata pod survives with the same
   UID and zero additional container restarts.

2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
   are removed, the kata-runtime node label is cleared, /opt/kata is
   gone from the host filesystem, and containerd remains healthy.

3. Artifact presence: after install, verifies /opt/kata and the shim
   binary exist on the host, RuntimeClasses are created, and the node
   is labeled.

Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:53 +02:00
Fabiano Fidêncio
b4b62417ed kata-deploy: skip cleanup on pod restart to avoid crashing kata pods
When a kata-deploy DaemonSet pod is restarted (e.g. due to a label
change or rolling update), the SIGTERM handler runs cleanup which
unconditionally removes kata artifacts and restarts containerd. This
causes containerd to lose the kata shim binary, crashing all running
kata pods on the node.

Fix this by implementing a three-stage cleanup decision:

1. If this pod's owning DaemonSet still exists (exact name match via
   DAEMONSET_NAME env var), this is a pod restart — skip all cleanup.
   The replacement pod will re-run install, which is idempotent.

2. If this DaemonSet is gone but other kata-deploy DaemonSets still
   exist (multi-install scenario), perform instance-specific cleanup
   only (snapshotters, CRI config, artifacts) but skip shared
   resources (node label removal, CRI restart) to avoid disrupting
   the other instances.

3. If no kata-deploy DaemonSets remain, perform full cleanup including
   node label removal and CRI restart.

The Helm chart injects a DAEMONSET_NAME environment variable with the
exact DaemonSet name (including any multi-install suffix), ensuring
instance-aware lookup rather than broadly matching any DaemonSet
containing "kata-deploy".
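Under those rules, the SIGTERM handler's decision reduces to a small function. A hedged Python sketch (names are illustrative; the actual kata-deploy script is bash):

```python
def cleanup_mode(own_daemonset, existing_kata_daemonsets):
    """Decide what cleanup to run when the pod receives SIGTERM.

    own_daemonset: exact name injected via the DAEMONSET_NAME env var.
    existing_kata_daemonsets: names of kata-deploy DaemonSets still
    present in the cluster."""
    if own_daemonset in existing_kata_daemonsets:
        # Pod restart: the replacement pod re-runs the idempotent install.
        return "skip"
    if existing_kata_daemonsets:
        # Multi-install: clean this instance only, keep shared resources.
        return "instance-specific"
    # Last instance gone: full cleanup, including node label and CRI restart.
    return "full"
```

The exact-name membership test is what makes the lookup instance-aware: a sibling install named, say, kata-deploy-b never shadows this pod's own DaemonSet.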

Fixes: #12761

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 15:20:52 +02:00
Fabiano Fidêncio
28414a614e kata-deploy: detect k3s/rke2 via systemd services instead of version string
Newer k3s releases (v1.34+) no longer include "k3s" in the containerd
version string at all (e.g. "containerd://2.2.2-bd1.34" instead of
"containerd://2.1.5-k3s1"). This caused kata-deploy to fall through to
the default "containerd" runtime, configuring and restarting the system
containerd service instead of k3s's embedded containerd — leaving the
kata runtime invisible to k3s.

Fix by detecting k3s/rke2 via their systemd service names (k3s,
k3s-agent, rke2-server, rke2-agent) rather than parsing the containerd
version string. This is more robust and works regardless of how k3s
formats its containerd version.
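The detection logic boils down to checking which systemd units exist on the node. A simplified Python sketch of the classification (the real script is bash querying systemd; the function name is illustrative):

```python
K3S_UNITS = ("k3s", "k3s-agent")
RKE2_UNITS = ("rke2-server", "rke2-agent")

def detect_container_runtime(present_units):
    """Classify the node by its systemd service names instead of
    parsing the containerd version string, which newer k3s releases
    no longer tag with 'k3s'."""
    if any(unit in present_units for unit in K3S_UNITS):
        return "k3s"
    if any(unit in present_units for unit in RKE2_UNITS):
        return "rke2"
    return "containerd"
```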

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
Fabiano Fidêncio
8b9ce3b6cb tests: remove k3s/rke2 V3 containerd template workaround
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.

Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()

This reverts the test infrastructure part of a2216ec05.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-04-01 14:24:55 +02:00
stevenhorsman
99eaa8fcb1 workflows: Create workflow to stale issues based on date
The standard stale action is intended to be run regularly with
a date offset, but we want one we can run against a specific
date, in order to run the stale bot against issues created before a
particular release milestone. So we calculate the offset in one step and
use it in the next.

At the moment we want to stale issues created before 9th October 2022,
when Kata 3.0 was released, so default to this date.

Note that the stale action only processes a few issues at a time to avoid
rate limiting, which is why we want a cron job: so it can get through
the backlog, and also stale/unstale issues that are commented on.
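Computing the offset from a fixed cutoff date is a one-liner. A Python sketch of the idea (the workflow itself does this in a shell step; the name is illustrative):

```python
from datetime import date

def stale_offset_days(cutoff, today):
    """Days-before-today value to feed the stale action so that only
    issues created before `cutoff` are marked stale."""
    return (today - cutoff).days
```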

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-31 15:57:37 +01:00
stevenhorsman
12578b41f2 govmm: Delete old files
The govmm workflow isn't run by us; it and the other CI files
are just legacy from when govmm was a separate repo, so let's clean up
this debt rather than having to update them frequently.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
stevenhorsman
b3179bdd8e workflows: Update actions/checkout version
Update the action to resolve the following warning in GHA:
> Node.js 20 actions are deprecated. The following actions are running
> on Node.js 20 and may not work as expected:
> actions/checkout@11bd71901b.
> Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-30 10:45:28 +01:00
95 changed files with 1800 additions and 842 deletions

View File

@@ -15,10 +15,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-basic-amd64
cancel-in-progress: true
jobs:
run-containerd-sandboxapi:
name: run-containerd-sandboxapi
@@ -30,9 +26,6 @@ jobs:
matrix:
containerd_version: ['active']
vmm: ['dragonball', 'cloud-hypervisor', 'qemu-runtime-rs']
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-sandboxapi-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
# TODO: enable me when https://github.com/containerd/containerd/issues/11640 is fixed
if: false
runs-on: ubuntu-22.04
@@ -42,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,9 +89,6 @@ jobs:
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'cloud-hypervisor', 'dragonball', 'qemu', 'qemu-runtime-rs']
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-run-containerd-stability-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
@@ -106,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -145,16 +135,13 @@ jobs:
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu', 'dragonball', 'qemu-runtime-rs']
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-run-nydus-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -201,9 +188,6 @@ jobs:
vmm:
- clh # cloud-hypervisor
- qemu
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-tracing-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
# TODO: enable me when https://github.com/kata-containers/kata-containers/issues/9763 is fixed
# TODO: Transition to free runner (see #9940).
if: false
@@ -211,7 +195,7 @@ jobs:
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -249,9 +233,6 @@ jobs:
vmm:
- clh
- qemu
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-vfio-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
# TODO: enable with clh when https://github.com/kata-containers/kata-containers/issues/9764 is fixed
# TODO: enable with qemu when https://github.com/kata-containers/kata-containers/issues/9851 is fixed
# TODO: Transition to free runner (see #9940).
@@ -261,7 +242,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -288,6 +269,51 @@ jobs:
timeout-minutes: 15
run: bash tests/functional/vfio/gha-run.sh run
run-docker-tests:
name: run-docker-tests
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- qemu
- qemu-runtime-rs
runs-on: ubuntu-22.04
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
env:
GH_TOKEN: ${{ github.token }}
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run
run-nerdctl-tests:
name: run-nerdctl-tests
strategy:
@@ -302,14 +328,11 @@ jobs:
- qemu
- cloud-hypervisor
- qemu-runtime-rs
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-nerdctl-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -355,12 +378,8 @@ jobs:
run-kata-agent-apis:
name: run-kata-agent-apis
runs-on: ubuntu-22.04
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-agent-api-amd64
cancel-in-progress: true
timeout-minutes: 30
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -13,10 +13,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-s390x
cancel-in-progress: true
permissions: {}
jobs:
@@ -30,9 +26,6 @@ jobs:
matrix:
containerd_version: ['active']
vmm: ['qemu-runtime-rs']
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-s390x-${{ toJSON(matrix) }}
cancel-in-progress: true
# TODO: enable me when https://github.com/containerd/containerd/issues/11640 is fixed
if: false
runs-on: s390x-large
@@ -42,7 +35,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "shim"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -96,9 +89,6 @@ jobs:
matrix:
containerd_version: ['lts', 'active']
vmm: ['qemu']
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-s390x-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: s390x-large
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
@@ -106,7 +96,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
SANDBOXER: "podsandbox"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -133,3 +123,46 @@ jobs:
- name: Run containerd-stability tests
timeout-minutes: 15
run: bash tests/stability/gha-run.sh run
run-docker-tests:
name: run-docker-tests
strategy:
# We can set this to true whenever we're 100% sure that
# all the tests are not flaky, otherwise we'll fail them
# all due to a single flaky instance.
fail-fast: false
matrix:
vmm:
- qemu
- qemu-runtime-rs
runs-on: s390x-large
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install dependencies
run: bash tests/integration/docker/gha-run.sh install-dependencies
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-s390x${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/docker/gha-run.sh install-kata kata-artifacts
- name: Run docker smoke test
timeout-minutes: 5
run: bash tests/integration/docker/gha-run.sh run

View File

@@ -12,10 +12,6 @@ on:
required: true
type: string
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-riscv64
cancel-in-progress: true
permissions: {}
name: Build checks preview riscv64
@@ -67,9 +63,7 @@ jobs:
path: src/runtime-rs
needs:
- rust
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ inputs.instance }}-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Adjust a permission for repo
run: |
@@ -78,7 +72,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -5,17 +5,13 @@ on:
required: true
type: string
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-build-checks
cancel-in-progress: true
permissions: {}
name: Build checks
jobs:
check:
name: check
timeout-minutes: 60
runs-on: >-
${{
( contains(inputs.instance, 's390x') && matrix.component.name == 'runtime' ) && 's390x' ||
@@ -79,9 +75,7 @@ jobs:
- protobuf-compiler
instance:
- ${{ inputs.instance }}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ inputs.instance }}-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Adjust a permission for repo
run: |
@@ -90,7 +84,7 @@ jobs:
sudo rm -f /tmp/kata_hybrid* # Sometime we got leftover from test_setup_hvsock_failed()
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -28,10 +28,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-tarball-amd64
cancel-in-progress: true
jobs:
build-asset:
name: build-asset
@@ -68,9 +64,6 @@ jobs:
exclude:
- asset: cloud-hypervisor-glibc
stage: release
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
env:
PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}
steps:
@@ -82,7 +75,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -176,9 +169,6 @@ jobs:
- rootfs-image-nvidia-gpu-confidential
- rootfs-initrd
- rootfs-initrd-confidential
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -188,7 +178,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -246,9 +236,6 @@ jobs:
- coco-guest-components
- kernel-nvidia-gpu-modules
- pause-image
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
with:
@@ -263,9 +250,6 @@ jobs:
matrix:
asset:
- agent
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
if: ${{ inputs.stage == 'release' }}
@@ -288,7 +272,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -337,13 +321,12 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ubuntu-22.04
timeout-minutes: 10
needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -399,9 +382,6 @@ jobs:
- trace-forwarder
stage:
- ${{ inputs.stage }}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -411,7 +391,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -456,7 +436,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -28,10 +28,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-arm64
cancel-in-progress: true
jobs:
build-asset:
name: build-asset
@@ -57,9 +53,6 @@ jobs:
- ovmf
- qemu
- virtiofsd
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-arm64-${{ toJSON(matrix) }}
cancel-in-progress: true
env:
PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}
steps:
@@ -71,7 +64,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -160,9 +153,6 @@ jobs:
- rootfs-image
- rootfs-image-nvidia-gpu
- rootfs-initrd
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-arm-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -172,7 +162,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -227,9 +217,6 @@ jobs:
asset:
- busybox
- kernel-nvidia-gpu-modules
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-arm-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
with:
@@ -266,7 +253,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -313,13 +300,12 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ubuntu-24.04-arm
timeout-minutes: 10
needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -26,10 +26,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-ppc64le
cancel-in-progress: true
jobs:
build-asset:
name: build-asset
@@ -46,9 +42,6 @@ jobs:
- virtiofsd
stage:
- ${{ inputs.stage }}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-ppc64le-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -58,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -107,9 +100,6 @@ jobs:
- rootfs-initrd
stage:
- ${{ inputs.stage }}
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-ppc64le-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -119,7 +109,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -172,9 +162,6 @@ jobs:
matrix:
asset:
- agent
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-ppc64le-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
if: ${{ inputs.stage == 'release' }}
@@ -197,7 +184,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -244,7 +231,6 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ubuntu-24.04-ppc64le
timeout-minutes: 10
needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]
permissions:
contents: read
@@ -254,7 +240,7 @@ jobs:
run: |
sudo chown -R "$USER":"$USER" "$GITHUB_WORKSPACE"
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -21,10 +21,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-tarball-riscv64
cancel-in-progress: true
permissions: {}
jobs:
@@ -41,11 +37,8 @@ jobs:
asset:
- kernel
- virtiofsd
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-riscv-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history


@@ -29,10 +29,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-tarball-s390x
cancel-in-progress: true
jobs:
build-asset:
name: build-asset
@@ -51,9 +47,6 @@ jobs:
- pause-image
- qemu
- virtiofsd
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-s390x-${{ toJSON(matrix) }}
cancel-in-progress: true
env:
PERFORM_ATTESTATION: ${{ matrix.asset == 'agent' && inputs.push-to-registry == 'yes' && 'yes' || 'no' }}
steps:
@@ -65,7 +58,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -141,9 +134,6 @@ jobs:
- rootfs-image-confidential
- rootfs-initrd
- rootfs-initrd-confidential
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-s390x-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -153,7 +143,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -206,7 +196,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Rebase atop of the latest target branch
@@ -258,9 +248,6 @@ jobs:
- agent
- coco-guest-components
- pause-image
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-s390x-${{ toJSON(matrix) }}
cancel-in-progress: true
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0
if: ${{ inputs.stage == 'release' }}
@@ -283,7 +270,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0 # This is needed in order to keep the commit ids history
@@ -332,7 +319,6 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ubuntu-24.04-s390x
timeout-minutes: 10
needs:
- build-asset
- build-asset-rootfs
@@ -342,7 +328,7 @@ jobs:
contents: read
packages: write
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -15,10 +15,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-kubectl-image
cancel-in-progress: true
env:
REGISTRY: quay.io
IMAGE_NAME: kata-containers/kubectl
@@ -32,7 +28,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -20,7 +20,7 @@ jobs:
steps:
- name: Checkout Code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Generate Action


@@ -2,10 +2,6 @@ name: Kata Containers CI (manually triggered)
on:
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-devel
cancel-in-progress: true
permissions: {}
jobs:


@@ -6,10 +6,6 @@ name: Nightly CI for s390x
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-nightly-s390x
cancel-in-progress: true
jobs:
check-internal-test-result:
name: check-internal-test-result


@@ -22,7 +22,7 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
QUAY_DEPLOYER_PASSWORD:
@@ -32,10 +32,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-weekly
cancel-in-progress: true
jobs:
build-kata-static-tarball-amd64:
permissions:
@@ -77,7 +73,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -30,7 +30,7 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
CI_HKD_PATH:
@@ -46,10 +46,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-ci
cancel-in-progress: true
jobs:
build-kata-static-tarball-amd64:
permissions:
@@ -186,7 +182,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -374,7 +370,7 @@ jobs:
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
run-cri-containerd-tests-amd64:
run-cri-containerd-amd64:
if: ${{ inputs.skip-test != 'yes' }}
needs: build-kata-static-tarball-amd64
strategy:
@@ -391,10 +387,7 @@ jobs:
{ containerd_version: active, vmm: qemu },
{ containerd_version: active, vmm: cloud-hypervisor },
{ containerd_version: active, vmm: qemu-runtime-rs },
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-cri-amd64-${{ toJSON(matrix) }}
cancel-in-progress: true
]
uses: ./.github/workflows/run-cri-containerd-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
@@ -405,19 +398,16 @@ jobs:
containerd_version: ${{ matrix.params.containerd_version }}
vmm: ${{ matrix.params.vmm }}
run-cri-containerd-tests-s390x:
run-cri-containerd-s390x:
if: ${{ inputs.skip-test != 'yes' }}
needs: build-kata-static-tarball-s390x
strategy:
fail-fast: false
matrix:
params: [
{containerd_version: active, vmm: qemu},
{containerd_version: active, vmm: qemu-runtime-rs},
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
{ containerd_version: active, vmm: qemu },
{ containerd_version: active, vmm: qemu-runtime-rs },
]
uses: ./.github/workflows/run-cri-containerd-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
@@ -435,11 +425,8 @@ jobs:
fail-fast: false
matrix:
params: [
{containerd_version: active, vmm: qemu},
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-cri-ppc64le-${{ toJSON(matrix) }}
cancel-in-progress: true
{ containerd_version: active, vmm: qemu },
]
uses: ./.github/workflows/run-cri-containerd-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}
@@ -457,11 +444,8 @@ jobs:
fail-fast: false
matrix:
params: [
{containerd_version: active, vmm: qemu},
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-cri-arm64-${{ toJSON(matrix) }}
cancel-in-progress: true
{ containerd_version: active, vmm: qemu },
]
uses: ./.github/workflows/run-cri-containerd-tests.yaml
with:
tarball-suffix: -${{ inputs.tag }}


@@ -4,10 +4,6 @@ on:
- cron: "0 0 * * *"
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
@@ -20,7 +16,7 @@ jobs:
name: ci
deployment: false
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -19,10 +19,6 @@ on:
schedule:
- cron: '45 0 * * 1'
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
@@ -64,7 +60,7 @@ jobs:
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -27,7 +27,7 @@ jobs:
echo "$HOME/.local/bin" >> "${GITHUB_PATH}"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -3,10 +3,6 @@ on:
- cron: '0 23 * * 0'
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
name: Docs URL Alive Check
@@ -23,7 +19,7 @@ jobs:
run: |
echo "GOPATH=${GITHUB_WORKSPACE}" >> "$GITHUB_ENV"
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -3,9 +3,7 @@ on:
push:
branches:
- main
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true


@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -31,10 +31,6 @@ on:
skip_static:
value: ${{ jobs.skipper.outputs.skip_static }}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-gatekeeper-skipper
cancel-in-progress: true
permissions: {}
jobs:
@@ -46,7 +42,7 @@ jobs:
skip_test: ${{ steps.skipper.outputs.skip_test }}
skip_static: ${{ steps.skipper.outputs.skip_static }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -30,7 +30,7 @@ jobs:
issues: read
pull-requests: read
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ github.event.pull_request.head.sha }}
fetch-depth: 0


@@ -3,10 +3,6 @@ on:
name: Govulncheck
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
@@ -26,7 +22,7 @@ jobs:
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4
with:
fetch-depth: 0
persist-credentials: false


@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Ensure nydus-snapshotter-version is in sync inside our repo


@@ -15,10 +15,6 @@ on:
push:
branches: [ "main" ]
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-osv-scanner
cancel-in-progress: true
permissions: {}
jobs:


@@ -145,7 +145,7 @@ jobs:
needs: [publish-kata-deploy-payload-amd64, publish-kata-deploy-payload-arm64, publish-kata-deploy-payload-s390x, publish-kata-deploy-payload-ppc64le]
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -34,10 +34,6 @@ on:
QUAY_DEPLOYER_PASSWORD:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-${{ inputs.arch }}-publish-deploy
cancel-in-progress: true
permissions: {}
jobs:
@@ -48,7 +44,7 @@ jobs:
packages: write
runs-on: ${{ inputs.runner }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -12,10 +12,6 @@ on:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
push-oras-cache:
name: push-oras-cache
@@ -25,7 +21,7 @@ jobs:
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false


@@ -11,10 +11,6 @@ on:
KBUILD_SIGN_PIN:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false # Note - don't cancel the in progress build as we could end up with inconsistent results
permissions: {}
jobs:
@@ -54,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball


@@ -11,10 +11,6 @@ on:
KBUILD_SIGN_PIN:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false # Note - don't cancel the in progress build as we could end up with inconsistent results
permissions: {}
jobs:
@@ -54,7 +50,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball


@@ -9,10 +9,6 @@ on:
QUAY_DEPLOYER_PASSWORD:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false # Note - don't cancel the in progress build as we could end up with inconsistent results
permissions: {}
jobs:
@@ -51,7 +47,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball


@@ -11,10 +11,6 @@ on:
QUAY_DEPLOYER_PASSWORD:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false # Note - don't cancel the in progress build as we could end up with inconsistent results
permissions: {}
jobs:
@@ -55,7 +51,7 @@ jobs:
username: ${{ vars.QUAY_DEPLOYER_USERNAME }}
password: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: get-kata-tarball


@@ -2,10 +2,6 @@ name: Release Kata Containers
on:
workflow_dispatch
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: false # Note - don't cancel the in progress build as we could end up with inconsistent results
permissions: {}
jobs:
@@ -16,7 +12,7 @@ jobs:
contents: write # needed for the `gh release create` command
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -91,7 +87,7 @@ jobs:
packages: write # needed to push the multi-arch manifest to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -128,7 +124,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -210,7 +206,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -228,7 +224,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -246,7 +242,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -265,7 +261,7 @@ jobs:
packages: write # needed to push the helm chart to ghcr.io
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
@@ -302,7 +298,7 @@ jobs:
contents: write # needed for the `gh release` commands
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false


@@ -1,5 +1,7 @@
name: CI | Run cri-containerd tests
permissions: {}
on:
workflow_call:
inputs:
@@ -30,12 +32,6 @@ on:
required: true
type: string
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-cri-tests-${{ toJSON(inputs) }}
cancel-in-progress: true
permissions: {}
jobs:
run-cri-containerd:
name: run-cri-containerd-${{ inputs.arch }} (${{ inputs.containerd_version }}, ${{ inputs.vmm }})
@@ -45,7 +41,7 @@ jobs:
GOPATH: ${{ github.workspace }}
KATA_HYPERVISOR: ${{ inputs.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -29,13 +29,10 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-aks
cancel-in-progress: true
permissions: {}
@@ -57,9 +54,6 @@ jobs:
- host_os: cbl-mariner
vmm: clh
instance-type: normal
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-run-k8s-tests-aks-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
permissions:
contents: read
@@ -79,7 +73,7 @@ jobs:
GENPOLICY_PULL_METHOD: ${{ matrix.genpolicy-pull-method }}
RUNS_ON_AKS: "true"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -22,10 +22,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-k8s-arm64
cancel-in-progress: true
permissions: {}
jobs:
@@ -39,9 +35,6 @@ jobs:
- qemu-runtime-rs
k8s:
- kubeadm
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-arm64-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: arm64-k8s
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -53,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: all
TARGET_ARCH: "aarch64"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -27,10 +27,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-free-runner
cancel-in-progress: true
permissions: {}
jobs:
@@ -51,9 +47,6 @@ jobs:
{ vmm: cloud-hypervisor, containerd_version: lts },
{ vmm: cloud-hypervisor, containerd_version: active },
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-free-runner-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-24.04
permissions:
contents: read
@@ -70,7 +63,7 @@ jobs:
CONTAINER_ENGINE_VERSION: ${{ matrix.environment.containerd_version }}
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -28,10 +28,6 @@ on:
NGC_API_KEY:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-nvidia-gpu
cancel-in-progress: true
permissions: {}
jobs:
@@ -44,9 +40,6 @@ jobs:
{ name: nvidia-gpu, vmm: qemu-nvidia-gpu, runner: amd64-nvidia-a100 },
{ name: nvidia-gpu-snp, vmm: qemu-nvidia-gpu-snp, runner: amd64-nvidia-h100-snp },
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ${{ matrix.environment.runner }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -60,7 +53,7 @@ jobs:
USE_EXPERIMENTAL_SNAPSHOTTER_SETUP: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -22,10 +22,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-ppc64le
cancel-in-progress: true
permissions: {}
jobs:
@@ -38,9 +34,6 @@ jobs:
- qemu
k8s:
- kubeadm
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-ppc64le-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ppc64le-k8s
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -52,7 +45,7 @@ jobs:
KUBERNETES: ${{ matrix.k8s }}
TARGET_ARCH: "ppc64le"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -25,10 +25,6 @@ on:
AUTHENTICATED_IMAGE_PASSWORD:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-zvsi
cancel-in-progress: true
permissions: {}
jobs:
@@ -67,9 +63,6 @@ jobs:
vmm: qemu
- snapshotter: nydus
vmm: qemu-runtime-rs
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-zvsi-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: s390x-large
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -85,7 +78,7 @@ jobs:
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -29,16 +29,12 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
AUTHENTICATED_IMAGE_PASSWORD:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-coco-stability
cancel-in-progress: true
permissions: {}
jobs:
@@ -55,9 +51,6 @@ jobs:
- nydus
pull-type:
- guest-pull
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
permissions:
@@ -81,7 +74,7 @@ jobs:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -34,16 +34,12 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
ITA_KEY:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-coco
cancel-in-progress: true
permissions: {}
jobs:
@@ -57,9 +53,6 @@ jobs:
vmm: qemu-tdx
- runner: sev-snp
vmm: qemu-snp
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ${{ matrix.runner }}
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -78,7 +71,7 @@ jobs:
GH_ITA_KEY: ${{ secrets.ITA_KEY }}
AUTO_GENERATE_POLICY: "yes"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -148,9 +141,6 @@ jobs:
{ vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
{ vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-24.04
permissions:
contents: read
@@ -179,7 +169,7 @@ jobs:
CONTAINER_ENGINE_VERSION: "active"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -278,9 +268,6 @@ jobs:
{ k8s: microk8s, vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
{ k8s: microk8s, vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
]
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-24.04
permissions:
contents: read
@@ -305,7 +292,7 @@ jobs:
K8S_TEST_HOST_TYPE: "all"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
@@ -395,9 +382,6 @@ jobs:
- erofs
pull-type:
- default
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
runs-on: ubuntu-24.04
environment:
name: ci
@@ -424,7 +408,7 @@ jobs:
AUTO_GENERATE_POLICY: "no"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -25,14 +25,10 @@ on:
AZ_APPID:
required: true
AZ_TENANT_ID:
required: true
required: true
AZ_SUBSCRIPTION_ID:
required: true
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-kata-deploy-aks
cancel-in-progress: true
permissions: {}
jobs:
@@ -51,9 +47,6 @@ jobs:
include:
- host_os: cbl-mariner
vmm: clh
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
environment:
name: ci
@@ -69,7 +62,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: "vanilla"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -22,10 +22,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-kata-deploy
cancel-in-progress: true
permissions: {}
jobs:
@@ -41,9 +37,6 @@ jobs:
- k3s
- rke2
- microk8s
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
@@ -53,7 +46,7 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.vmm }}
KUBERNETES: ${{ matrix.k8s }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0


@@ -13,10 +13,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-kata-monitor
cancel-in-progress: true
permissions: {}
jobs:
@@ -38,16 +34,13 @@ jobs:
# TODO: enable with containerd when https://github.com/kata-containers/kata-containers/issues/9761 is fixed
- container_engine: containerd
vmm: qemu
concurrency:
group: ${{ github.workflow }}-${{ github.job }}-${{ github.event.pull_request.number || github.ref }}-${{ toJSON(matrix) }}
cancel-in-progress: true
runs-on: ubuntu-22.04
env:
CONTAINER_ENGINE: ${{ matrix.container_engine }}
#CONTAINERD_VERSION: ${{ matrix.containerd_version }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -22,10 +22,6 @@ on:
type: string
default: ""
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}-metrics
cancel-in-progress: true
permissions: {}
jobs:
@@ -50,7 +46,7 @@ jobs:
K8S_TEST_HOST_TYPE: "baremetal"
KUBERNETES: kubeadm
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0

View File

@@ -11,10 +11,6 @@ on:
branches: [ "main" ]
workflow_dispatch:
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
@@ -31,7 +27,7 @@ jobs:
steps:
- name: "Checkout code"
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false

View File

@@ -22,7 +22,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -23,7 +23,7 @@ jobs:
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

.github/workflows/stale_issues.yaml
View File

@@ -0,0 +1,42 @@
name: 'Stale issues with activity before a fixed date'
on:
schedule:
- cron: '0 0 * * *'
workflow_dispatch:
inputs:
date:
description: "Date of stale cut-off. All issues not updated since this date will be marked as stale. Format: YYYY-MM-DD e.g. 2022-10-09"
default: "2022-10-09"
required: false
type: string
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
stale:
name: stale
runs-on: ubuntu-24.04
permissions:
actions: write # Needed to manage caches for state persistence across runs
issues: write # Needed to add/remove labels, post comments, or close issues
steps:
- name: Calculate the age to stale
run: |
echo AGE=$(( ( $(date +%s) - $(date -d "${DATE:-2022-10-09}" +%s) ) / 86400 )) >> "$GITHUB_ENV"
env:
DATE: ${{ inputs.date }}
- name: Run the stale action
uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
with:
stale-issue-message: 'This issue has had no activity since before ${{ inputs.date }}. Please comment on the issue, or it will be closed in 30 days'
days-before-pr-stale: -1
days-before-pr-close: -1
days-before-issue-stale: ${{ env.AGE }}
days-before-issue-close: 30
env:
DATE: ${{ inputs.date }}

View File

@@ -28,7 +28,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -71,7 +71,7 @@ jobs:
component-path: src/dragonball
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -115,7 +115,7 @@ jobs:
packages: write # for push to ghcr.io
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -171,7 +171,7 @@ jobs:
contents: read # for checkout
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false

View File

@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
@@ -26,4 +26,4 @@ jobs:
advanced-security: false
annotations: true
persona: auditor
version: v1.22.0
version: v1.13.0

Cargo.lock
View File

@@ -5824,6 +5824,7 @@ dependencies = [
"protobuf",
"protocols",
"resource",
"rstest",
"runtime-spec",
"serde_json",
"shim-interface",

View File

@@ -37,6 +37,23 @@ oc adm policy add-scc-to-group anyuid system:authenticated system:serviceaccount
oc label --overwrite ns default pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/warn=baseline pod-security.kubernetes.io/audit=baseline
```
The e2e suite uses a combination of built-in (origin) and external tests. External
tests include Kubernetes upstream conformance tests from the `hyperkube` image.
To enable external tests, export a variable matching your cluster version:
```bash
export EXTENSIONS_PAYLOAD_OVERRIDE=$(oc get clusterversion version -o jsonpath='{.status.desired.image}')
# Optional: limit to hyperkube only (k8s conformance tests, avoids downloading all operator extensions)
export EXTENSION_BINARY_OVERRIDE_INCLUDE_TAGS="hyperkube"
```
Alternatively, skip external tests entirely (only OpenShift-specific tests from origin):
```bash
export OPENSHIFT_SKIP_EXTERNAL_TESTS=1
```
Now you should be ready to run the openshift-tests. Our CI only uses a subset
of tests; to get the current ``TEST_SKIPS``, see
[the pipeline config](https://github.com/openshift/release/tree/master/ci-operator/config/kata-containers/kata-containers).

View File

@@ -39,6 +39,21 @@ git_sparse_clone() {
git checkout FETCH_HEAD
}
#######################
# Install prerequisites
#######################
if ! command -v helm &>/dev/null; then
echo "Helm not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | HELM_INSTALL_DIR='.' bash -s -- --no-sudo
fi
if ! command -v yq &>/dev/null; then
echo "yq not installed, installing in current location..."
PATH="${PWD}:${PATH}"
curl -fsSL https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64 -o ./yq
chmod +x yq
fi
###############################
# Disable security to allow e2e
###############################
@@ -83,7 +98,6 @@ AZURE_REGION=$(az group show --resource-group "${AZURE_RESOURCE_GROUP}" --query
# Create workload identity
AZURE_WORKLOAD_IDENTITY_NAME="caa-${AZURE_CLIENT_ID}"
az identity create --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --resource-group "${AZURE_RESOURCE_GROUP}" --location "${AZURE_REGION}"
USER_ASSIGNED_CLIENT_ID="$(az identity show --resource-group "${AZURE_RESOURCE_GROUP}" --name "${AZURE_WORKLOAD_IDENTITY_NAME}" --query 'clientId' -otsv)"
#############################
@@ -184,84 +198,36 @@ echo "CAA_IMAGE=\"${CAA_IMAGE}\""
echo "CAA_TAG=\"${CAA_TAG}\""
echo "PP_IMAGE_ID=\"${PP_IMAGE_ID}\""
# Install cert-manager (prerequisite)
helm install cert-manager oci://quay.io/jetstack/charts/cert-manager --namespace cert-manager --create-namespace --set crds.enabled=true
# Clone and configure caa
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/"
git_sparse_clone "https://github.com/confidential-containers/cloud-api-adaptor.git" "${CAA_GIT_SHA:-main}" "src/cloud-api-adaptor/install/charts/" "src/peerpod-ctrl/chart" "src/webhook/chart"
echo "CAA_GIT_SHA=\"$(git rev-parse HEAD)\""
pushd src/cloud-api-adaptor
cat <<EOF > install/overlays/azure/workload-identity.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: cloud-api-adaptor-daemonset
namespace: confidential-containers-system
spec:
template:
metadata:
labels:
azure.workload.identity/use: "true"
---
pushd src/cloud-api-adaptor/install/charts/peerpods
# Use the latest kata-deploy
yq -i '( .dependencies[] | select(.name == "kata-deploy") ) .version = "0.0.0-dev"' Chart.yaml
helm dependency update .
# Create secrets
kubectl apply -f - << EOF
apiVersion: v1
kind: ServiceAccount
kind: Namespace
metadata:
name: cloud-api-adaptor
namespace: confidential-containers-system
annotations:
azure.workload.identity/client-id: "${USER_ASSIGNED_CLIENT_ID}"
name: confidential-containers-system
labels:
app.kubernetes.io/managed-by: Helm
annotations:
meta.helm.sh/release-name: peerpods
meta.helm.sh/release-namespace: confidential-containers-system
EOF
PP_INSTANCE_SIZE="Standard_D2as_v5"
DISABLECVM="true"
cat <<EOF > install/overlays/azure/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../yamls
images:
- name: cloud-api-adaptor
newName: "${CAA_IMAGE}"
newTag: "${CAA_TAG}"
generatorOptions:
disableNameSuffixHash: true
configMapGenerator:
- name: peer-pods-cm
namespace: confidential-containers-system
literals:
- CLOUD_PROVIDER="azure"
- AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
- AZURE_REGION="${PP_REGION}"
- AZURE_INSTANCE_SIZE="${PP_INSTANCE_SIZE}"
- AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}"
- AZURE_SUBNET_ID="${PP_SUBNET_ID}"
- AZURE_IMAGE_ID="${PP_IMAGE_ID}"
- DISABLECVM="${DISABLECVM}"
- PEERPODS_LIMIT_PER_NODE="50"
secretGenerator:
- name: peer-pods-secret
namespace: confidential-containers-system
envs:
- service-principal.env
- name: ssh-key-secret
namespace: confidential-containers-system
files:
- id_rsa.pub
patchesStrategicMerge:
- workload-identity.yaml
EOF
ssh-keygen -t rsa -f install/overlays/azure/id_rsa -N ''
echo "AZURE_CLIENT_ID=${AZURE_CLIENT_ID}" > install/overlays/azure/service-principal.env
echo "AZURE_CLIENT_SECRET=${AZURE_CLIENT_SECRET}" >> install/overlays/azure/service-principal.env
echo "AZURE_TENANT_ID=${AZURE_TENANT_ID}" >> install/overlays/azure/service-principal.env
# Deploy Operator
git_sparse_clone "https://github.com/confidential-containers/operator" "${OPERATOR_SHA:-main}" "config/"
echo "OPERATOR_SHA=\"$(git rev-parse HEAD)\""
oc apply -k "config/release"
oc apply -k "config/samples/ccruntime/peer-pods"
popd
# Deploy CAA
kubectl apply -k "install/overlays/azure"
popd
popd
kubectl create secret generic my-provider-creds \
-n confidential-containers-system \
--from-literal=AZURE_CLIENT_ID="$AZURE_CLIENT_ID" \
--from-literal=AZURE_CLIENT_SECRET="$AZURE_CLIENT_SECRET" \
--from-literal=AZURE_TENANT_ID="$AZURE_TENANT_ID"
helm install peerpods . -f providers/azure.yaml \
  --set secrets.mode=reference \
  --set secrets.existingSecretName=my-provider-creds \
  --set providerConfigs.azure.AZURE_SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}" \
  --set providerConfigs.azure.AZURE_REGION="${PP_REGION}" \
  --set providerConfigs.azure.AZURE_INSTANCE_SIZE="Standard_D2as_v5" \
  --set providerConfigs.azure.AZURE_RESOURCE_GROUP="${PP_RESOURCE_GROUP}" \
  --set providerConfigs.azure.AZURE_SUBNET_ID="${PP_SUBNET_ID}" \
  --set providerConfigs.azure.AZURE_IMAGE_ID="${PP_IMAGE_ID}" \
  --set providerConfigs.azure.DISABLECVM="true" \
  --set providerConfigs.azure.PEERPODS_LIMIT_PER_NODE="50" \
  --set kata-deploy.snapshotter.setup= \
  --dependency-update -n confidential-containers-system --create-namespace --wait
popd # charts
popd # git_sparse_clone CAA
# Wait for runtimeclass
SECONDS=0

View File

@@ -111,7 +111,7 @@ impl FcInner {
let body_config: String = json!({
"mem_size_mib": self.config.memory_info.default_memory,
"vcpu_count": self.config.cpu_info.default_vcpus,
"vcpu_count": self.config.cpu_info.default_vcpus.ceil() as u8,
})
.to_string();
let body_kernel: String = json!({
@@ -191,13 +191,10 @@ impl FcInner {
.disk_rate_limiter_ops_one_time_burst,
);
let rate_limiter = serde_json::to_string(&block_rate_limit)
.with_context(|| format!("serde {block_rate_limit:?} to json"))?;
let body: String = json!({
"drive_id": format!("drive{drive_id}"),
"path_on_host": new_drive_path,
"rate_limiter": rate_limiter,
"rate_limiter": block_rate_limit,
})
.to_string();
self.request_with_retry(

View File

@@ -858,7 +858,12 @@ impl QemuInner {
block_device.config.index,
&block_device.config.path_on_host,
&block_device.config.blkdev_aio.to_string(),
block_device.config.is_direct,
Some(
block_device
.config
.is_direct
.unwrap_or(self.config.blockdev_info.block_device_cache_direct),
),
block_device.config.is_readonly,
block_device.config.no_drop,
)

View File

@@ -84,6 +84,16 @@ impl ResourceManager {
inner.handle_network(network_config).await
}
pub async fn has_network_endpoints(&self) -> bool {
let inner = self.inner.read().await;
inner.has_network_endpoints().await
}
pub async fn setup_network_in_guest(&self) -> Result<()> {
let inner = self.inner.read().await;
inner.setup_network_in_guest().await
}
#[instrument]
pub async fn setup_after_start_vm(&self) -> Result<()> {
let mut inner = self.inner.write().await;

View File

@@ -296,6 +296,33 @@ impl ResourceManagerInner {
Ok(())
}
pub async fn has_network_endpoints(&self) -> bool {
if let Some(network) = &self.network {
match network.interfaces().await {
std::result::Result::Ok(interfaces) => !interfaces.is_empty(),
Err(_) => false,
}
} else {
false
}
}
pub async fn setup_network_in_guest(&self) -> Result<()> {
if let Some(network) = self.network.as_ref() {
let network = network.as_ref();
self.handle_interfaces(network)
.await
.context("handle interfaces during network rescan")?;
self.handle_neighbours(network)
.await
.context("handle neighbours during network rescan")?;
self.handle_routes(network)
.await
.context("handle routes during network rescan")?;
}
Ok(())
}
pub async fn setup_after_start_vm(&mut self) -> Result<()> {
self.cgroups_resource
.setup_after_start_vm(self.hypervisor.as_ref())

View File

@@ -165,6 +165,6 @@ pub fn new(id: &str, config: &SharedFsInfo) -> Result<Arc<dyn ShareFs>> {
VIRTIO_FS => Ok(Arc::new(
ShareVirtioFsStandalone::new(id, config).context("new standalone virtio fs")?,
)),
_ => Err(anyhow!("unsupported shred fs {:?}", &shared_fs)),
_ => Err(anyhow!("unsupported shared fs {:?}", &shared_fs)),
}
}

View File

@@ -53,6 +53,9 @@ linux_container = { workspace = true, optional = true }
virt_container = { workspace = true, optional = true }
wasm_container = { workspace = true, optional = true }
[dev-dependencies]
rstest = { workspace = true }
[features]
default = ["virt"]
linux = ["linux_container"]

View File

@@ -51,6 +51,13 @@ pub trait Sandbox: Send + Sync {
shim_pid: u32,
) -> Result<()>;
/// Re-scan the network namespace for late-discovered endpoints.
/// This handles runtimes like Docker 26+ that configure networking
/// after the Start response. The default implementation is a no-op.
async fn rescan_network(&self) -> Result<()> {
Ok(())
}
// metrics function
async fn agent_metrics(&self) -> Result<String>;
async fn hypervisor_metrics(&self) -> Result<String>;

View File

@@ -69,6 +69,53 @@ use crate::{
tracer::{KataTracer, ROOTSPAN},
};
const DOCKER_LIBNETWORK_SETKEY: &str = "libnetwork-setkey";
const DOCKER_NETNS_PREFIXES: &[&str] = &["/var/run/docker/netns/", "/run/docker/netns/"];
fn is_valid_docker_sandbox_id(id: &str) -> bool {
id.len() == 64 && id.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}
/// Discover Docker's pre-created network namespace path from OCI spec hooks.
///
/// Docker's libnetwork-setkey hook contains the sandbox ID as its
/// argument following "libnetwork-setkey", which maps to a netns file
/// under /var/run/docker/netns/<sandbox_id> or /run/docker/netns/<sandbox_id>.
fn docker_netns_path(spec: &oci::Spec) -> Option<String> {
let hooks = spec.hooks().as_ref()?;
let hook_sets: [&[oci::Hook]; 2] = [
hooks.prestart().as_deref().unwrap_or_default(),
hooks.create_runtime().as_deref().unwrap_or_default(),
];
for hooks in &hook_sets {
for hook in *hooks {
if let Some(args) = hook.args() {
for (i, arg) in args.iter().enumerate() {
if arg == DOCKER_LIBNETWORK_SETKEY && i + 1 < args.len() {
let sandbox_id = &args[i + 1];
if !is_valid_docker_sandbox_id(sandbox_id) {
continue;
}
for prefix in DOCKER_NETNS_PREFIXES {
let ns_path = format!("{}{}", prefix, sandbox_id);
if let Ok(metadata) = std::fs::symlink_metadata(&ns_path) {
if metadata.is_file() {
return Some(ns_path);
}
}
}
}
}
}
}
}
None
}
fn convert_string_to_slog_level(string_level: &str) -> slog::Level {
match string_level {
"trace" => slog::Level::Trace,
@@ -377,8 +424,17 @@ impl RuntimeHandlerManager {
if ns.path().is_some() {
netns = ns.path().clone().map(|p| p.display().to_string());
}
// if we get empty netns from oci spec, we need to create netns for the VM
else {
// Docker 26+ may configure networking outside of the OCI
// spec namespace path. Try to discover the netns from hook
// args before falling back to creating a placeholder.
else if let Some(docker_ns) = docker_netns_path(spec) {
info!(
sl!(),
"discovered Docker network namespace from hook args";
"netns" => &docker_ns
);
netns = Some(docker_ns);
} else {
let ns_name = generate_netns_name();
let raw_netns = NetNs::new(ns_name)?;
let path = Some(PathBuf::from(raw_netns.path()).display().to_string());
@@ -639,6 +695,7 @@ impl RuntimeHandlerManager {
Ok(TaskResponse::WaitProcess(exit_status))
}
TaskRequest::StartProcess(process_id) => {
let is_sandbox_container = cm.is_sandbox_container(&process_id).await;
let shim_pid = cm
.start_process(&process_id)
.await
@@ -647,6 +704,25 @@ impl RuntimeHandlerManager {
let pid = shim_pid.pid;
let process_type = process_id.process_type;
let container_id = process_id.container_id().to_string();
// Schedule an async network rescan for sandbox containers.
// This handles runtimes that configure networking after the
// Start response (e.g. Docker 26+). rescan_network is
// idempotent — it returns immediately if endpoints already
// exist.
if is_sandbox_container {
let sandbox_rescan = sandbox.clone();
tokio::spawn(async move {
if let Err(e) = sandbox_rescan.rescan_network().await {
error!(
sl!(),
"async network rescan failed — container may lack networking: {:?}",
e
);
}
});
}
tokio::spawn(async move {
let result = sandbox.wait_process(cm, process_id, pid).await;
if let Err(e) = result {
@@ -920,3 +996,85 @@ fn configure_non_root_hypervisor(config: &mut Hypervisor) -> Result<()> {
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
use oci_spec::runtime::{HookBuilder, HooksBuilder, SpecBuilder};
use rstest::rstest;
const VALID_SANDBOX_ID: &str =
"a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4e5f6a1b2";
#[rstest]
#[case::all_lowercase_hex(VALID_SANDBOX_ID, true)]
#[case::all_zeros("0000000000000000000000000000000000000000000000000000000000000000", true)]
#[case::uppercase_hex("A1B2C3D4E5F6A1B2C3D4E5F6A1B2C3D4E5F6A1B2C3D4E5F6A1B2C3D4E5F6A1B2", false)]
#[case::too_short("a1b2c3d4", false)]
#[case::non_hex("zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz", false)]
#[case::path_traversal("../../../etc/passwd", false)]
#[case::empty("", false)]
fn test_is_valid_docker_sandbox_id(#[case] id: &str, #[case] expected: bool) {
assert_eq!(is_valid_docker_sandbox_id(id), expected);
}
fn make_hook_with_args(args: Vec<&str>) -> oci::Hook {
HookBuilder::default()
.path("/usr/bin/test")
.args(args.into_iter().map(String::from).collect::<Vec<_>>())
.build()
.unwrap()
}
#[rstest]
#[case::no_hooks(None, None)]
#[case::unrelated_hooks(
Some(HooksBuilder::default()
.prestart(vec![make_hook_with_args(vec!["some-hook", "arg1"])])
.build().unwrap()),
None
)]
#[case::invalid_sandbox_id(
Some(HooksBuilder::default()
.prestart(vec![make_hook_with_args(vec![
"/usr/bin/dockerd", "libnetwork-setkey", "not-a-valid-id",
])])
.build().unwrap()),
None
)]
#[case::setkey_at_end_of_args(
Some(HooksBuilder::default()
.prestart(vec![make_hook_with_args(vec![
"/usr/bin/dockerd", "libnetwork-setkey",
])])
.build().unwrap()),
None
)]
#[case::valid_prestart_but_no_file(
Some(HooksBuilder::default()
.prestart(vec![make_hook_with_args(vec![
"/usr/bin/dockerd", "libnetwork-setkey", VALID_SANDBOX_ID,
])])
.build().unwrap()),
None
)]
#[case::valid_create_runtime_but_no_file(
Some(HooksBuilder::default()
.create_runtime(vec![make_hook_with_args(vec![
"/usr/bin/dockerd", "libnetwork-setkey", VALID_SANDBOX_ID,
])])
.build().unwrap()),
None
)]
fn test_docker_netns_path(
#[case] hooks: Option<oci::Hooks>,
#[case] expected: Option<String>,
) {
let mut builder = SpecBuilder::default();
if let Some(h) = hooks {
builder = builder.hooks(h);
}
let spec = builder.build().unwrap();
assert_eq!(docker_netns_path(&spec), expected);
}
}

View File

@@ -58,6 +58,7 @@ use resource::{ResourceConfig, ResourceManager};
use runtime_spec as spec;
use std::path::Path;
use std::sync::Arc;
use std::time::Duration;
use strum::Display;
use tokio::sync::{mpsc::Sender, Mutex, RwLock};
use tracing::instrument;
@@ -973,6 +974,71 @@ impl Sandbox for VirtSandbox {
self.hypervisor.get_hypervisor_metrics().await
}
async fn rescan_network(&self) -> Result<()> {
let config = self.resource_manager.config().await;
if config.runtime.disable_new_netns {
return Ok(());
}
if dan_config_path(&config, &self.sid).exists() {
return Ok(());
}
if self.resource_manager.has_network_endpoints().await {
return Ok(());
}
let sandbox_config = match &self.sandbox_config {
Some(c) => c,
None => return Ok(()),
};
let netns_path = match &sandbox_config.network_env.netns {
Some(p) => p.clone(),
None => return Ok(()),
};
const MAX_WAIT: Duration = Duration::from_secs(5);
const POLL_INTERVAL: Duration = Duration::from_millis(50);
let deadline = tokio::time::Instant::now() + MAX_WAIT;
info!(sl!(), "waiting for network interfaces in namespace");
loop {
let network_config = NetworkConfig::NetNs(NetworkWithNetNsConfig {
network_model: config.runtime.internetworking_model.clone(),
netns_path: netns_path.clone(),
queues: self
.hypervisor
.hypervisor_config()
.await
.network_info
.network_queues as usize,
network_created: sandbox_config.network_env.network_created,
});
if let Err(e) = self.resource_manager.handle_network(network_config).await {
warn!(sl!(), "network rescan attempt failed: {:?}", e);
}
if self.resource_manager.has_network_endpoints().await {
info!(sl!(), "network interfaces discovered during rescan");
return self
.resource_manager
.setup_network_in_guest()
.await
.context("setup network in guest after rescan");
}
if tokio::time::Instant::now() >= deadline {
warn!(
sl!(),
"no network interfaces found after timeout — networking may be configured later"
);
return Ok(());
}
tokio::time::sleep(POLL_INTERVAL).await;
}
}
async fn set_policy(&self, policy: &str) -> Result<()> {
if policy.is_empty() {
debug!(sl!(), "sb: set_policy skipped without policy");

View File

@@ -9,9 +9,9 @@ import (
"context"
"fmt"
"github.com/containerd/containerd/api/types/task"
"github.com/sirupsen/logrus"
"github.com/containerd/containerd/api/types/task"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
)
@@ -46,6 +46,19 @@ func startContainer(ctx context.Context, s *service, c *container) (retErr error
}
go watchSandbox(ctx, s)
// If no network endpoints were discovered during sandbox creation,
// schedule an async rescan. This handles runtimes that configure
// networking after task creation (e.g. Docker 26+ configures
// networking after the Start response, and prestart hooks may
// not have run yet on slower architectures).
// RescanNetwork is idempotent — it returns immediately if
// endpoints already exist.
go func() {
if err := s.sandbox.RescanNetwork(s.ctx); err != nil {
shimLog.WithError(err).Error("async network rescan failed — container may lack networking")
}
}()
// We use s.ctx(`ctx` derived from `s.ctx`) to check for cancellation of the
// shim context and the context passed to startContainer for tracing.
go watchOOMEvents(ctx, s)

View File

@@ -1,34 +0,0 @@
on: ["pull_request"]
name: Unit tests
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
permissions:
contents: read
jobs:
test:
name: test
strategy:
matrix:
go-version: [1.15.x, 1.16.x]
os: [ubuntu-22.04]
runs-on: ${{ matrix.os }}
steps:
- name: Install Go
uses: actions/setup-go@40f1582b2485089dde7abd97c1529aa768e1baff # v5.6.0
with:
go-version: ${{ matrix.go-version }}
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: golangci-lint
uses: golangci/golangci-lint-action@4696ba8babb6127d732c3c6dde519db15edab9ea # v6.5.1
with:
version: latest
args: -c .golangci.yml -v
- name: go test
run: go test ./...

View File

@@ -1 +0,0 @@
*~

View File

@@ -1,35 +0,0 @@
# Copyright (c) 2021 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
run:
concurrency: 4
deadline: 600s
skip-dirs:
- vendor
# Ignore auto-generated protobuf code.
skip-files:
- ".*\\.pb\\.go$"
linters:
disable-all: true
enable:
- deadcode
- gocyclo
- gofmt
- gosimple
- govet
- ineffassign
- misspell
- staticcheck
- structcheck
- typecheck
- unconvert
- unused
- varcheck
linters-settings:
gocyclo:
min_complexity: 15
unused:
check-exported: true

View File

@@ -1,26 +0,0 @@
language: go
go:
- "1.10"
- "1.11"
- tip
arch:
- s390x
go_import_path: github.com/kata-containers/govmm
matrix:
allow_failures:
- go: tip
before_install:
- go get github.com/alecthomas/gometalinter
- gometalinter --install
- go get github.com/mattn/goveralls
script:
- go env
- gometalinter --tests --vendor --disable-all --enable=misspell --enable=vet --enable=ineffassign --enable=gofmt --enable=gocyclo --cyclo-over=15 --enable=golint --enable=errcheck --enable=deadcode --enable=staticcheck -enable=gas ./...
after_success:
- $GOPATH/bin/goveralls -repotoken $COVERALLS_TOKEN -v -service=travis-ci

View File

@@ -19,6 +19,7 @@ import (
vc "github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
vf "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/factory"
vcAnnotations "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/annotations"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
specs "github.com/opencontainers/runtime-spec/specs-go"
)
@@ -140,6 +141,17 @@ func CreateSandbox(ctx context.Context, vci vc.VC, ociSpec specs.Spec, runtimeCo
sandboxConfig.Containers[0].RootFs = rootFs
}
// Docker 26+ may set up networking before task creation instead of using
// prestart hooks. The netns path is not in the OCI spec but can be
// discovered from Docker's libnetwork hook args which contain the sandbox
// ID that maps to /var/run/docker/netns/<sandbox_id>.
if sandboxConfig.NetworkConfig.NetworkID == "" && !sandboxConfig.NetworkConfig.DisableNewNetwork {
if dockerNetns := utils.DockerNetnsPath(&ociSpec); dockerNetns != "" {
sandboxConfig.NetworkConfig.NetworkID = dockerNetns
kataUtilsLogger.WithField("netns", dockerNetns).Info("discovered Docker network namespace from hook args")
}
}
// Important to create the network namespace before the sandbox is
// created, because it is not responsible for the creation of the
// netns if it does not exist.

View File

@@ -20,7 +20,9 @@ import (
"golang.org/x/sys/unix"
)
const procMountInfoFile = "/proc/self/mountinfo"
const (
procMountInfoFile = "/proc/self/mountinfo"
)
// EnterNetNS is free from any call to a go routine, and it calls
// into runtime.LockOSThread(), meaning it won't be executed in a
@@ -29,27 +31,30 @@ func EnterNetNS(networkID string, cb func() error) error {
return vc.EnterNetNS(networkID, cb)
}
// SetupNetworkNamespace create a network namespace
// SetupNetworkNamespace creates a network namespace if one is not already
// provided via NetworkID. When NetworkID is empty and networking is not
// disabled, a new namespace is created as a placeholder; the actual
// hypervisor namespace will be discovered later by addAllEndpoints after
// the VM has started.
func SetupNetworkNamespace(config *vc.NetworkConfig) error {
if config.DisableNewNetwork {
kataUtilsLogger.Info("DisableNewNetNs is on, shim and hypervisor are running in the host netns")
return nil
}
var err error
var n ns.NetNS
if config.NetworkID == "" {
var (
err error
n ns.NetNS
)
if rootless.IsRootless() {
n, err = rootless.NewNS()
if err != nil {
return err
}
} else {
n, err = testutils.NewNS()
if err != nil {
return err
}
}
if err != nil {
return err
}
config.NetworkID = n.Path()
@@ -71,11 +76,23 @@ func SetupNetworkNamespace(config *vc.NetworkConfig) error {
}
const (
netNsMountType = "nsfs"
mountTypeFieldIdx = 8
mountDestIdx = 4
netNsMountType = "nsfs"
mountDestIdx = 4
)
// mountinfoFsType finds the filesystem type in a parsed mountinfo line.
// The mountinfo format has optional tagged fields (shared:, master:, etc.)
// between field 7 and a "-" separator. The fs type is the field immediately
// after "-". Returns "" if the separator is not found.
func mountinfoFsType(fields []string) string {
for i, f := range fields {
if f == "-" && i+1 < len(fields) {
return fields[i+1]
}
}
return ""
}
// getNetNsFromBindMount returns the network namespace for the bind-mounted path
func getNetNsFromBindMount(nsPath string, procMountFile string) (string, error) {
// Resolve all symlinks in the path as the mountinfo file contains
@@ -100,16 +117,15 @@ func getNetNsFromBindMount(nsPath string, procMountFile string) (string, error)
// "711 26 0:3 net:[4026532009] /run/docker/netns/default rw shared:535 - nsfs nsfs rw"
//
// Reference: https://www.kernel.org/doc/Documentation/filesystems/proc.txt
// We are interested in the first 9 fields of this file,
// to check for the correct mount type.
// The "-" separator has a variable position due to optional tagged
// fields, so we locate the fs type dynamically.
fields := strings.Split(text, " ")
if len(fields) < 9 {
continue
}
// We check here if the mount type is a network namespace mount type, namely "nsfs"
if fields[mountTypeFieldIdx] != netNsMountType {
if mountinfoFsType(fields) != netNsMountType {
continue
}

View File

@@ -149,3 +149,23 @@ func TestSetupNetworkNamespace(t *testing.T) {
err = SetupNetworkNamespace(config)
assert.NoError(err)
}
func TestMountinfoFsType(t *testing.T) {
assert := assert.New(t)
// Standard mountinfo line with optional tagged fields
fields := []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw", "shared:535", "-", "nsfs", "nsfs", "rw"}
assert.Equal("nsfs", mountinfoFsType(fields))
// Multiple optional tags before separator
fields = []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw", "shared:535", "master:1", "-", "nsfs", "nsfs", "rw"}
assert.Equal("nsfs", mountinfoFsType(fields))
// No separator
fields = []string{"711", "26", "0:3", "net:[4026532009]", "/run/docker/netns/default", "rw"}
assert.Equal("", mountinfoFsType(fields))
// Separator at end (malformed)
fields = []string{"711", "26", "-"}
assert.Equal("", mountinfoFsType(fields))
}

View File

@@ -72,6 +72,8 @@ type VCSandbox interface {
GetOOMEvent(ctx context.Context) (string, error)
GetHypervisorPid() (int, error)
// RescanNetwork re-scans the network namespace for late-discovered endpoints.
RescanNetwork(ctx context.Context) error
UpdateRuntimeMetrics() error
GetAgentMetrics(ctx context.Context) (string, error)

View File

@@ -17,9 +17,11 @@ import (
"runtime"
"sort"
"strconv"
"strings"
"time"
"github.com/containernetworking/plugins/pkg/ns"
"github.com/sirupsen/logrus"
"github.com/vishvananda/netlink"
"github.com/vishvananda/netns"
otelTrace "go.opentelemetry.io/otel/trace"
@@ -45,6 +47,11 @@ type LinuxNetwork struct {
interworkingModel NetInterworkingModel
netNSCreated bool
danConfigPath string
// placeholderNetNS holds the path to a placeholder network namespace
// that we created but later abandoned in favour of the hypervisor's
// netns. If best-effort deletion in addAllEndpoints fails, teardown
// retries the cleanup via RemoveEndpoints.
placeholderNetNS string
}
// NewNetwork creates a new Linux Network from a NetworkConfig.
@@ -68,11 +75,11 @@ func NewNetwork(configs ...*NetworkConfig) (Network, error) {
}
return &LinuxNetwork{
config.NetworkID,
[]Endpoint{},
config.InterworkingModel,
config.NetworkCreated,
config.DanConfigPath,
netNSPath: config.NetworkID,
eps: []Endpoint{},
interworkingModel: config.InterworkingModel,
netNSCreated: config.NetworkCreated,
danConfigPath: config.DanConfigPath,
}, nil
}
@@ -325,28 +332,91 @@ func (n *LinuxNetwork) GetEndpointsNum() (int, error) {
// Scan the networking namespace through netlink and then:
// 1. Create the endpoints for the relevant interfaces found there.
// 2. Attach them to the VM.
//
// If no usable interfaces are found and the hypervisor is running in a
// different network namespace (e.g. Docker 26+ places QEMU in its own
// pre-configured namespace), switch to the hypervisor's namespace and
// rescan there. This handles the case where the OCI spec does not
// communicate the network namespace path.
func (n *LinuxNetwork) addAllEndpoints(ctx context.Context, s *Sandbox, hotplug bool) error {
netnsHandle, err := netns.GetFromPath(n.netNSPath)
endpoints, err := n.scanEndpointsInNs(ctx, s, n.netNSPath, hotplug)
if err != nil {
return err
}
// If the scan found no usable endpoints, check whether the
// hypervisor is running in a different namespace and retry there.
if len(endpoints) == 0 && s != nil {
if hypervisorNs, ok := n.detectHypervisorNetns(s); ok {
networkLogger().WithFields(logrus.Fields{
"original_netns": n.netNSPath,
"hypervisor_netns": hypervisorNs,
}).Debug("no endpoints in original netns, switching to hypervisor netns")
origPath := n.netNSPath
origCreated := n.netNSCreated
n.netNSPath = hypervisorNs
_, err = n.scanEndpointsInNs(ctx, s, n.netNSPath, hotplug)
if err != nil {
n.netNSPath = origPath
n.netNSCreated = origCreated
return err
}
// Clean up the placeholder namespace we created — we're now
// using the hypervisor's namespace and the placeholder is empty.
// Only clear netNSCreated once deletion succeeds; on failure,
// stash the path so RemoveEndpoints can retry during teardown.
if origCreated {
if delErr := deleteNetNS(origPath); delErr != nil {
networkLogger().WithField("netns", origPath).WithError(delErr).Warn("failed to delete placeholder netns, will retry during teardown")
n.placeholderNetNS = origPath
}
}
// The hypervisor's namespace was not created by us.
n.netNSCreated = false
}
}
sort.Slice(n.eps, func(i, j int) bool {
return n.eps[i].Name() < n.eps[j].Name()
})
networkLogger().WithField("endpoints", n.eps).Info("endpoints found after scan")
return nil
}
// scanEndpointsInNs scans a network namespace for usable (non-loopback,
// configured) interfaces and adds them as endpoints. Returns the list of
// newly added endpoints.
func (n *LinuxNetwork) scanEndpointsInNs(ctx context.Context, s *Sandbox, nsPath string, hotplug bool) ([]Endpoint, error) {
netnsHandle, err := netns.GetFromPath(nsPath)
if err != nil {
return nil, err
}
defer netnsHandle.Close()
netlinkHandle, err := netlink.NewHandleAt(netnsHandle)
if err != nil {
return err
return nil, err
}
defer netlinkHandle.Close()
linkList, err := netlinkHandle.LinkList()
if err != nil {
return err
return nil, err
}
epsBefore := len(n.eps)
var added []Endpoint
for _, link := range linkList {
netInfo, err := networkInfoFromLink(netlinkHandle, link)
if err != nil {
return err
// No rollback needed — no endpoints were added in this iteration yet.
return nil, err
}
// Ignore unconfigured network interfaces. These are
@@ -368,22 +438,62 @@ func (n *LinuxNetwork) addAllEndpoints(ctx context.Context, s *Sandbox, hotplug
continue
}
if err := doNetNS(n.netNSPath, func(_ ns.NetNS) error {
_, err = n.addSingleEndpoint(ctx, s, netInfo, hotplug)
return err
if err := doNetNS(nsPath, func(_ ns.NetNS) error {
ep, addErr := n.addSingleEndpoint(ctx, s, netInfo, hotplug)
if addErr == nil {
added = append(added, ep)
}
return addErr
}); err != nil {
return err
// Rollback: remove any endpoints added during this scan
// so that a failed scan does not leave partial side effects.
n.eps = n.eps[:epsBefore]
return nil, err
}
}
sort.Slice(n.eps, func(i, j int) bool {
return n.eps[i].Name() < n.eps[j].Name()
})
return added, nil
}
networkLogger().WithField("endpoints", n.eps).Info("endpoints found after scan")
// detectHypervisorNetns checks whether the hypervisor process is running in a
// network namespace different from the one we are currently tracking. If so,
// it returns the procfs path to the hypervisor's netns and true.
func (n *LinuxNetwork) detectHypervisorNetns(s *Sandbox) (string, bool) {
pid, err := s.GetHypervisorPid()
if err != nil || pid <= 0 {
return "", false
}
return nil
// Guard against PID recycling: verify the process belongs to this
// sandbox by checking its command line for the sandbox ID. QEMU is
// started with -name sandbox-<id>, so the ID will appear in cmdline.
// /proc/pid/cmdline uses null bytes as argument separators; replace
// them so the substring search works on the joined argument string.
cmdlineRaw, err := os.ReadFile(fmt.Sprintf("/proc/%d/cmdline", pid))
if err != nil {
return "", false
}
cmdline := strings.ReplaceAll(string(cmdlineRaw), "\x00", " ")
if !strings.Contains(cmdline, s.id) {
return "", false
}
hypervisorNs := fmt.Sprintf("/proc/%d/ns/net", pid)
// Compare device and inode numbers. Inode numbers are only unique
// within a device, so both must match to confirm the same namespace.
var currentStat, hvStat unix.Stat_t
if err := unix.Stat(n.netNSPath, &currentStat); err != nil {
return "", false
}
if err := unix.Stat(hypervisorNs, &hvStat); err != nil {
return "", false
}
if currentStat.Dev != hvStat.Dev || currentStat.Ino != hvStat.Ino {
return hypervisorNs, true
}
return "", false
}
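The dev/ino comparison above is the standard way to test whether two paths refer to the same namespace, since an inode number alone is only unique within one device. A standalone, runnable sketch of just that check (`sameNetns` is a hypothetical helper for illustration, not part of the patch; it uses stdlib `syscall` instead of `x/sys/unix`):

```go
package main

import (
	"fmt"
	"syscall"
)

// sameNetns reports whether two paths refer to the same (network)
// namespace by comparing device and inode numbers, mirroring the
// check in detectHypervisorNetns. Hypothetical helper for illustration.
func sameNetns(a, b string) (bool, error) {
	var sa, sb syscall.Stat_t
	if err := syscall.Stat(a, &sa); err != nil {
		return false, err
	}
	if err := syscall.Stat(b, &sb); err != nil {
		return false, err
	}
	// Both device and inode must match to confirm identity.
	return sa.Dev == sb.Dev && sa.Ino == sb.Ino, nil
}

func main() {
	// A process is trivially in the same netns as itself (Linux only).
	same, err := sameNetns("/proc/self/ns/net", "/proc/self/ns/net")
	fmt.Println(same, err)
}
```

In the patch this comparison runs the other way: only when the hypervisor's `/proc/<pid>/ns/net` differs from the tracked netns does `detectHypervisorNetns` report a switch target.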
func convertDanDeviceToNetworkInfo(device *vctypes.DanDevice) (*NetworkInfo, error) {
@@ -571,6 +681,17 @@ func (n *LinuxNetwork) RemoveEndpoints(ctx context.Context, s *Sandbox, endpoint
return deleteNetNS(n.netNSPath)
}
// Retry cleanup of a placeholder namespace whose earlier deletion
// failed in addAllEndpoints.
if n.placeholderNetNS != "" && endpoints == nil {
if delErr := deleteNetNS(n.placeholderNetNS); delErr != nil {
networkLogger().WithField("netns", n.placeholderNetNS).WithError(delErr).Warn("failed to delete placeholder netns during teardown")
} else {
networkLogger().WithField("netns", n.placeholderNetNS).Info("placeholder network namespace deleted")
n.placeholderNetNS = ""
}
}
return nil
}


@@ -363,11 +363,11 @@ func TestConvertDanDeviceToNetworkInfo(t *testing.T) {
func TestAddEndpoints_Dan(t *testing.T) {
network := &LinuxNetwork{
"net-123",
[]Endpoint{},
NetXConnectDefaultModel,
true,
"testdata/dan-config.json",
netNSPath: "net-123",
eps: []Endpoint{},
interworkingModel: NetXConnectDefaultModel,
netNSCreated: true,
danConfigPath: "testdata/dan-config.json",
}
ctx := context.TODO()


@@ -255,6 +255,10 @@ func (s *Sandbox) GetHypervisorPid() (int, error) {
return 0, nil
}
func (s *Sandbox) RescanNetwork(ctx context.Context) error {
return nil
}
func (s *Sandbox) GuestVolumeStats(ctx context.Context, path string) ([]byte, error) {
return nil, nil
}


@@ -20,6 +20,7 @@ import (
"strings"
"sync"
"syscall"
"time"
v1 "github.com/containerd/cgroups/stats/v1"
v2 "github.com/containerd/cgroups/v2/stats"
@@ -330,6 +331,81 @@ func (s *Sandbox) GetHypervisorPid() (int, error) {
return pids[0], nil
}
// RescanNetwork re-scans the network namespace for endpoints if none have
// been discovered yet. This is idempotent: if endpoints already exist it
// returns immediately. It enables Docker 26+ support where networking is
// configured after task creation but before Start.
//
// Docker 26+ configures networking (veth pair, IP addresses) between
// Create and Start. The interfaces may not be present immediately, so
// this method polls until they appear or a timeout is reached.
//
// When new endpoints are found, the guest agent is informed about the
// interfaces and routes so that networking becomes functional inside the VM.
func (s *Sandbox) RescanNetwork(ctx context.Context) error {
if s.config.NetworkConfig.DisableNewNetwork {
return nil
}
if len(s.network.Endpoints()) > 0 {
return nil
}
const maxWait = 5 * time.Second
const pollInterval = 50 * time.Millisecond
deadline := time.NewTimer(maxWait)
defer deadline.Stop()
ticker := time.NewTicker(pollInterval)
defer ticker.Stop()
s.Logger().Debug("waiting for network interfaces in namespace")
for {
if _, err := s.network.AddEndpoints(ctx, s, nil, true); err != nil {
return err
}
if len(s.network.Endpoints()) > 0 {
return s.configureGuestNetwork(ctx)
}
select {
case <-ctx.Done():
return ctx.Err()
case <-deadline.C:
s.Logger().Warn("no network interfaces found after timeout — networking may be configured by prestart hooks")
return nil
case <-ticker.C:
}
}
}
// configureGuestNetwork informs the guest agent about discovered network
// endpoints so that interfaces and routes become functional inside the VM.
func (s *Sandbox) configureGuestNetwork(ctx context.Context) error {
endpoints := s.network.Endpoints()
s.Logger().WithField("endpoints", len(endpoints)).Info("configuring hotplugged network in guest")
// Note: ARP neighbors (3rd return value) are not propagated here
// because the agent interface only exposes per-entry updates. The
// full setupNetworks path in kataAgent handles them; this path is
// only reached for late-discovered endpoints where neighbor entries
// are populated dynamically by the kernel.
interfaces, routes, _, err := generateVCNetworkStructures(ctx, endpoints)
if err != nil {
return fmt.Errorf("generating network structures: %w", err)
}
for _, ifc := range interfaces {
if _, err := s.agent.updateInterface(ctx, ifc); err != nil {
return fmt.Errorf("updating interface %s in guest: %w", ifc.Name, err)
}
}
if len(routes) > 0 {
if _, err := s.agent.updateRoutes(ctx, routes); err != nil {
return fmt.Errorf("updating routes in guest: %w", err)
}
}
return nil
}
// GetAllContainers returns all containers.
func (s *Sandbox) GetAllContainers() []VCContainer {
ifa := make([]VCContainer, len(s.containers))


@@ -12,6 +12,7 @@ import (
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"syscall"
"time"
@@ -493,17 +494,38 @@ func RevertBytes(num uint64) uint64 {
return 1024*RevertBytes(a) + b
}
// dockerLibnetworkSetkey is the hook argument that identifies Docker's
// network configuration hook. The argument following it is the sandbox ID.
const dockerLibnetworkSetkey = "libnetwork-setkey"
// dockerNetnsPrefixes are the well-known filesystem paths where the Docker
// daemon bind-mounts container network namespaces.
var dockerNetnsPrefixes = []string{"/var/run/docker/netns/", "/run/docker/netns/"}
// validSandboxID matches Docker sandbox IDs: exactly 64 lowercase hex characters.
var validSandboxID = regexp.MustCompile(`^[0-9a-f]{64}$`)
// IsDockerContainer returns whether the container is managed by Docker.
// This is done by checking the prestart hook for `libnetwork` arguments.
// This is done by checking the prestart and createRuntime hooks for
// `libnetwork` arguments. Docker 26+ may use CreateRuntime hooks
// instead of the deprecated Prestart hooks.
func IsDockerContainer(spec *specs.Spec) bool {
if spec == nil || spec.Hooks == nil {
return false
}
for _, hook := range spec.Hooks.Prestart { //nolint:all
for _, arg := range hook.Args {
if strings.HasPrefix(arg, "libnetwork") {
return true
// Check both Prestart (Docker < 26) and CreateRuntime (Docker >= 26) hooks.
hookSets := [][]specs.Hook{
spec.Hooks.Prestart, //nolint:all
spec.Hooks.CreateRuntime,
}
for _, hooks := range hookSets {
for _, hook := range hooks {
for _, arg := range hook.Args {
if strings.HasPrefix(arg, "libnetwork") {
return true
}
}
}
}
@@ -511,6 +533,50 @@ func IsDockerContainer(spec *specs.Spec) bool {
return false
}
// DockerNetnsPath attempts to discover Docker's pre-created network namespace
// path from OCI spec hooks. Docker's libnetwork-setkey hook contains the
// sandbox ID as its second argument, which maps to the netns file under
// /var/run/docker/netns/<sandbox_id>.
func DockerNetnsPath(spec *specs.Spec) string {
if spec == nil || spec.Hooks == nil {
return ""
}
// Search both Prestart and CreateRuntime hooks for libnetwork-setkey.
hookSets := [][]specs.Hook{
spec.Hooks.Prestart, //nolint:all
spec.Hooks.CreateRuntime,
}
for _, hooks := range hookSets {
for _, hook := range hooks {
for i, arg := range hook.Args {
if arg == dockerLibnetworkSetkey && i+1 < len(hook.Args) {
sandboxID := hook.Args[i+1]
// Docker sandbox IDs are exactly 64 lowercase hex
// characters. Reject anything else to prevent path
// traversal and unexpected input.
if !validSandboxID.MatchString(sandboxID) {
continue
}
// Docker stores netns under well-known paths.
// Use Lstat to reject symlinks (which could point
// outside the Docker netns directory) and non-regular
// files such as directories.
for _, prefix := range dockerNetnsPrefixes {
nsPath := prefix + sandboxID
if fi, err := os.Lstat(nsPath); err == nil && fi.Mode().IsRegular() {
return nsPath
}
}
}
}
}
}
return ""
}
// GetGuestNUMANodes constructs guest NUMA nodes mapping to host NUMA nodes and host CPUs.
func GetGuestNUMANodes(numaMapping []string) ([]types.GuestNUMANode, error) {
// Add guest NUMA node for each specified subsets of host NUMA nodes.


@@ -579,24 +579,178 @@ func TestRevertBytes(t *testing.T) {
assert.Equal(expectedNum, num)
}
// TestIsDockerContainer validates hook-detection logic in isolation.
// End-to-end Docker→containerd→kata integration is covered by
// external tests (see tests/integration/kubernetes/).
func TestIsDockerContainer(t *testing.T) {
assert := assert.New(t)
// nil spec
assert.False(IsDockerContainer(nil))
// nil hooks
assert.False(IsDockerContainer(&specs.Spec{}))
// Unrelated prestart hook
ociSpec := &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{
{
Args: []string{
"haha",
},
},
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"haha"}},
},
},
}
assert.False(IsDockerContainer(ociSpec))
// Prestart hook with libnetwork (Docker < 26)
ociSpec.Hooks.Prestart = append(ociSpec.Hooks.Prestart, specs.Hook{ //nolint:all
Args: []string{"libnetwork-xxx"},
})
assert.True(IsDockerContainer(ociSpec))
// CreateRuntime hook with libnetwork (Docker >= 26)
ociSpec2 := &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/usr/bin/docker-proxy", "libnetwork-setkey", "abc123", "ctrl"}},
},
},
}
assert.True(IsDockerContainer(ociSpec2))
// CreateRuntime hook without libnetwork
ociSpec3 := &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/some/other/hook"}},
},
},
}
assert.False(IsDockerContainer(ociSpec3))
}
// TestDockerNetnsPath validates netns path discovery from OCI hook args.
// This does not test the actual namespace opening or endpoint scanning;
// see integration tests for full-path coverage.
func TestDockerNetnsPath(t *testing.T) {
assert := assert.New(t)
// Valid 64-char hex sandbox IDs for test cases.
validID := strings.Repeat("ab", 32) // 64 hex chars
validID2 := strings.Repeat("cd", 32) // another 64 hex chars
invalidShortID := "abc123" // too short
invalidUpperID := strings.Repeat("AB", 32) // uppercase rejected
// nil spec
assert.Equal("", DockerNetnsPath(nil))
// nil hooks
assert.Equal("", DockerNetnsPath(&specs.Spec{}))
// Hook without libnetwork-setkey
spec := &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/some/binary", "unrelated"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey but sandbox ID too short (rejected by regex)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", invalidShortID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey but uppercase hex (rejected by regex)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", invalidUpperID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with valid sandbox ID but netns file doesn't exist on disk
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Prestart hook with libnetwork-setkey and existing netns file — success path
tmpDir := t.TempDir()
fakeNsDir := filepath.Join(tmpDir, "netns")
err := os.MkdirAll(fakeNsDir, 0755)
assert.NoError(err)
fakeNsFile := filepath.Join(fakeNsDir, validID)
err = os.WriteFile(fakeNsFile, []byte{}, 0644)
assert.NoError(err)
// Temporarily override dockerNetnsPrefixes so DockerNetnsPath can find
// the netns file we created under the temp directory.
origPrefixes := dockerNetnsPrefixes
dockerNetnsPrefixes = []string{fakeNsDir + "/"}
defer func() { dockerNetnsPrefixes = origPrefixes }()
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID, "ctrl"}},
},
},
}
assert.Equal(fakeNsFile, DockerNetnsPath(spec))
// Sandbox ID that is a directory rather than a regular file — must be rejected
dirID := validID2
err = os.MkdirAll(filepath.Join(fakeNsDir, dirID), 0755)
assert.NoError(err)
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", dirID, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// CreateRuntime hook with valid sandbox ID — file doesn't exist
validID3 := strings.Repeat("ef", 32)
spec = &specs.Spec{
Hooks: &specs.Hooks{
CreateRuntime: []specs.Hook{
{Args: []string{"/usr/bin/proxy", "libnetwork-setkey", validID3, "ctrl"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Hook with libnetwork-setkey as last arg (no sandbox ID follows) — no panic
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{"libnetwork-setkey"}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
// Empty args slice
spec = &specs.Spec{
Hooks: &specs.Hooks{
Prestart: []specs.Hook{ //nolint:all
{Args: []string{}},
},
},
}
assert.Equal("", DockerNetnsPath(spec))
}


@@ -1,124 +0,0 @@
# Copyright (c) K3s contributors
#
# SPDX-License-Identifier: Apache-2.0
#
{{- /* */ -}}
# File generated by {{ .Program }}. DO NOT EDIT. Use config-v3.toml.tmpl instead.
version = 3
imports = ["__CONTAINERD_IMPORTS_PATH__"]
root = {{ printf "%q" .NodeConfig.Containerd.Root }}
state = {{ printf "%q" .NodeConfig.Containerd.State }}
[grpc]
address = {{ deschemify .NodeConfig.Containerd.Address | printf "%q" }}
[plugins.'io.containerd.internal.v1.opt']
path = {{ printf "%q" .NodeConfig.Containerd.Opt }}
[plugins.'io.containerd.grpc.v1.cri']
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
[plugins.'io.containerd.cri.v1.runtime']
enable_selinux = {{ .NodeConfig.SELinux }}
enable_unprivileged_ports = {{ .EnableUnprivileged }}
enable_unprivileged_icmp = {{ .EnableUnprivileged }}
device_ownership_from_security_context = {{ .NonrootDevices }}
{{ if .DisableCgroup}}
disable_cgroup = true
{{ end }}
{{ if .IsRunningInUserNS }}
disable_apparmor = true
restrict_oom_score_adj = true
{{ end }}
{{ with .NodeConfig.AgentConfig.Snapshotter }}
[plugins.'io.containerd.cri.v1.images']
snapshotter = "{{ . }}"
disable_snapshot_annotations = {{ if eq . "stargz" }}false{{else}}true{{end}}
use_local_image_pull = true
{{ end }}
{{ with .NodeConfig.AgentConfig.PauseImage }}
[plugins.'io.containerd.cri.v1.images'.pinned_images]
sandbox = "{{ . }}"
{{ end }}
{{- if or .NodeConfig.AgentConfig.CNIBinDir .NodeConfig.AgentConfig.CNIConfDir }}
[plugins.'io.containerd.cri.v1.runtime'.cni]
{{ with .NodeConfig.AgentConfig.CNIBinDir }}bin_dirs = [{{ printf "%q" . }}]{{ end }}
{{ with .NodeConfig.AgentConfig.CNIConfDir }}conf_dir = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ if or .NodeConfig.Containerd.BlockIOConfig .NodeConfig.Containerd.RDTConfig }}
[plugins.'io.containerd.service.v1.tasks-service']
{{ with .NodeConfig.Containerd.BlockIOConfig }}blockio_config_file = {{ printf "%q" . }}{{ end }}
{{ with .NodeConfig.Containerd.RDTConfig }}rdt_config_file = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ with .NodeConfig.DefaultRuntime }}
[plugins.'io.containerd.cri.v1.runtime'.containerd]
default_runtime_name = "{{ . }}"
{{ end }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runc.options]
SystemdCgroup = {{ .SystemdCgroup }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.runhcs-wcow-process]
runtime_type = "io.containerd.runhcs.v1"
{{ range $k, $v := .ExtraRuntimes }}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}']
runtime_type = "{{$v.RuntimeType}}"
{{ with $v.BinaryName}}
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'{{ $k }}'.options]
BinaryName = {{ printf "%q" . }}
SystemdCgroup = {{ $.SystemdCgroup }}
{{ end }}
{{ end }}
[plugins.'io.containerd.cri.v1.images'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.cri.v1.images'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ with .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins.'io.containerd.snapshotter.v1.stargz']
cri_keychain_image_service_path = {{ printf "%q" . }}
[plugins.'io.containerd.snapshotter.v1.stargz'.cri_keychain]
enable_keychain = true
{{ end }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry]
config_path = {{ printf "%q" .NodeConfig.Containerd.Registry }}
{{ if .PrivateRegistryConfig }}
{{ range $k, $v := .PrivateRegistryConfig.Configs }}
{{ with $v.Auth }}
[plugins.'io.containerd.snapshotter.v1.stargz'.registry.configs.'{{ $k }}'.auth]
{{ with .Username }}username = {{ printf "%q" . }}{{ end }}
{{ with .Password }}password = {{ printf "%q" . }}{{ end }}
{{ with .Auth }}auth = {{ printf "%q" . }}{{ end }}
{{ with .IdentityToken }}identitytoken = {{ printf "%q" . }}{{ end }}
{{ end }}
{{ end }}
{{ end }}
{{ end }}


@@ -0,0 +1,213 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Kata Deploy Lifecycle Tests
#
# Validates kata-deploy behavior during DaemonSet restarts and uninstalls:
#
# 1. Artifacts present: After install, kata artifacts exist on the host,
# RuntimeClasses are created, and the node is labeled.
#
# 2. Restart resilience: Running kata pods must survive a kata-deploy
# DaemonSet restart without crashing. (Regression test for #12761)
#
# 3. Artifact cleanup: After helm uninstall, kata artifacts must be
# fully removed from the host and containerd must remain healthy.
#
# Required environment variables:
# DOCKER_REGISTRY - Container registry for kata-deploy image
# DOCKER_REPO - Repository name for kata-deploy image
# DOCKER_TAG - Image tag to test
# KATA_HYPERVISOR - Hypervisor to test (qemu, clh, etc.)
# KUBERNETES - K8s distribution (microk8s, k3s, rke2, etc.)
load "${BATS_TEST_DIRNAME}/../../common.bash"
repo_root_dir="${BATS_TEST_DIRNAME}/../../../"
load "${repo_root_dir}/tests/gha-run-k8s-common.sh"
source "${BATS_TEST_DIRNAME}/lib/helm-deploy.bash"
LIFECYCLE_POD_NAME="kata-lifecycle-test"
# Run a command on the host node's filesystem using a short-lived privileged pod.
# The host root is mounted at /host inside the pod.
# Usage: run_on_host "test -d /host/opt/kata && echo YES || echo NO"
run_on_host() {
local cmd="$1"
local node_name
node_name=$(kubectl get nodes --no-headers -o custom-columns=NAME:.metadata.name | head -1)
local pod_name="host-exec-${RANDOM}"
kubectl run "${pod_name}" \
--image=quay.io/kata-containers/alpine-bash-curl:latest \
--restart=Never --rm -i \
--overrides="{
\"spec\": {
\"nodeName\": \"${node_name}\",
\"activeDeadlineSeconds\": 300,
\"tolerations\": [{\"operator\": \"Exists\"}],
\"containers\": [{
\"name\": \"exec\",
\"image\": \"quay.io/kata-containers/alpine-bash-curl:latest\",
\"imagePullPolicy\": \"IfNotPresent\",
\"command\": [\"sh\", \"-c\", \"${cmd}\"],
\"securityContext\": {\"privileged\": true},
\"volumeMounts\": [{\"name\": \"host\", \"mountPath\": \"/host\", \"readOnly\": true}]
}],
\"volumes\": [{\"name\": \"host\", \"hostPath\": {\"path\": \"/\"}}]
}
}"
}
setup_file() {
ensure_helm
echo "# Image: ${DOCKER_REGISTRY}/${DOCKER_REPO}:${DOCKER_TAG}" >&3
echo "# Hypervisor: ${KATA_HYPERVISOR}" >&3
echo "# K8s distribution: ${KUBERNETES}" >&3
echo "# Deploying kata-deploy..." >&3
deploy_kata
echo "# kata-deploy deployed successfully" >&3
}
@test "Kata artifacts are present on host after install" {
echo "# Checking kata artifacts on host..." >&3
run run_on_host "test -d /host/opt/kata && echo PRESENT || echo MISSING"
echo "# /opt/kata directory: ${output}" >&3
[[ "${output}" == *"PRESENT"* ]]
run run_on_host "test -f /host/opt/kata/bin/containerd-shim-kata-v2 && echo FOUND || (test -f /host/opt/kata/runtime-rs/bin/containerd-shim-kata-v2 && echo FOUND || echo MISSING)"
echo "# containerd-shim-kata-v2: ${output}" >&3
[[ "${output}" == *"FOUND"* ]]
# RuntimeClasses must exist (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses: ${rc_count}" >&3
[[ ${rc_count} -gt 0 ]]
# Node must have the kata-runtime label
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}')
echo "# Node label katacontainers.io/kata-runtime: ${label}" >&3
[[ "${label}" == "true" ]]
}
@test "DaemonSet restart does not crash running kata pods" {
# Create a long-running kata pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: ${LIFECYCLE_POD_NAME}
spec:
runtimeClassName: kata-${KATA_HYPERVISOR}
restartPolicy: Always
nodeSelector:
katacontainers.io/kata-runtime: "true"
containers:
- name: test
image: quay.io/kata-containers/alpine-bash-curl:latest
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
EOF
echo "# Waiting for kata pod to be running..." >&3
kubectl wait --for=condition=Ready "pod/${LIFECYCLE_POD_NAME}" --timeout=120s
# Record pod identity before the DaemonSet restart
local pod_uid_before
pod_uid_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
local restart_count_before
restart_count_before=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Pod UID before: ${pod_uid_before}, restarts: ${restart_count_before}" >&3
# Trigger a DaemonSet restart — this simulates what happens when a user
# changes a label, updates a config value, or does a rolling update.
echo "# Triggering kata-deploy DaemonSet restart..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout restart daemonset/kata-deploy
echo "# Waiting for DaemonSet rollout to complete..." >&3
kubectl -n "${HELM_NAMESPACE}" rollout status daemonset/kata-deploy --timeout=300s
# On k3s/rke2 the new kata-deploy pod restarts the k3s service as
# part of install, which causes a brief API server outage. Wait for
# the node to become ready before querying pod status.
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
echo "# Node is ready after DaemonSet rollout" >&3
# The kata pod must still be Running with the same UID and no extra restarts.
# Retry kubectl through any residual API unavailability.
local pod_phase=""
local retries=0
while [[ ${retries} -lt 30 ]]; do
pod_phase=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.phase}' 2>/dev/null) && break
retries=$((retries + 1))
sleep 2
done
echo "# Pod phase after restart: ${pod_phase}" >&3
[[ "${pod_phase}" == "Running" ]]
local pod_uid_after
pod_uid_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.metadata.uid}')
echo "# Pod UID after: ${pod_uid_after}" >&3
[[ "${pod_uid_before}" == "${pod_uid_after}" ]]
local restart_count_after
restart_count_after=$(kubectl get pod "${LIFECYCLE_POD_NAME}" -o jsonpath='{.status.containerStatuses[0].restartCount}')
echo "# Restart count after: ${restart_count_after}" >&3
[[ "${restart_count_before}" == "${restart_count_after}" ]]
echo "# SUCCESS: Kata pod survived DaemonSet restart without crashing" >&3
}
@test "Artifacts are fully cleaned up after uninstall" {
echo "# Uninstalling kata-deploy..." >&3
uninstall_kata
echo "# Uninstall complete, verifying cleanup..." >&3
# Wait for node to recover — containerd restart during cleanup may
# cause brief unavailability (especially on k3s/rke2).
kubectl wait nodes --timeout=120s --all --for condition=Ready=True
# RuntimeClasses must be gone (filter out AKS-managed ones)
local rc_count
rc_count=$(kubectl get runtimeclasses --no-headers 2>/dev/null | grep -v "kata-mshv-vm-isolation" | grep -c "kata" || true)
echo "# Kata RuntimeClasses remaining: ${rc_count}" >&3
[[ ${rc_count} -eq 0 ]]
# Node label must be removed
local label
label=$(kubectl get nodes -o jsonpath='{.items[0].metadata.labels.katacontainers\.io/kata-runtime}' 2>/dev/null || echo "")
echo "# Node label after uninstall: '${label}'" >&3
[[ -z "${label}" ]]
# Kata artifacts must be removed from the host filesystem
echo "# Checking host filesystem for leftover artifacts..." >&3
run run_on_host "test -d /host/opt/kata && echo EXISTS || echo REMOVED"
echo "# /opt/kata: ${output}" >&3
[[ "${output}" == *"REMOVED"* ]]
# Containerd must still be healthy and reporting a valid version
local container_runtime_version
container_runtime_version=$(kubectl get nodes --no-headers -o custom-columns=CONTAINER_RUNTIME:.status.nodeInfo.containerRuntimeVersion)
echo "# Container runtime version: ${container_runtime_version}" >&3
[[ "${container_runtime_version}" != *"Unknown"* ]]
echo "# SUCCESS: All kata artifacts cleaned up, containerd healthy" >&3
}
teardown() {
if [[ "${BATS_TEST_NAME}" == *"restart"* ]]; then
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
fi
}
teardown_file() {
kubectl delete pod "${LIFECYCLE_POD_NAME}" --ignore-not-found=true --wait=false 2>/dev/null || true
uninstall_kata 2>/dev/null || true
}


@@ -20,6 +20,7 @@ else
KATA_DEPLOY_TEST_UNION=( \
"kata-deploy.bats" \
"kata-deploy-custom-runtimes.bats" \
"kata-deploy-lifecycle.bats" \
)
fi


@@ -296,36 +296,6 @@ function deploy_k0s() {
sudo chown "${USER}":"${USER}" ~/.kube/config
}
# If the rendered containerd config (v3) does not import the drop-in dir, write
# the full V3 template (from tests/containerd-config-v3.tmpl) with the given
# import path and restart the service.
# Args: containerd_dir (e.g. /var/lib/rancher/k3s/agent/etc/containerd), service_name (e.g. k3s or rke2-server).
function _setup_containerd_v3_template_if_needed() {
local containerd_dir="$1"
local service_name="$2"
local template_file="${tests_dir}/containerd-config-v3.tmpl"
local rendered_v3="${containerd_dir}/config-v3.toml"
local imports_path="${containerd_dir}/config-v3.toml.d/*.toml"
if sudo test -f "${rendered_v3}" && sudo grep -q 'config-v3\.toml\.d' "${rendered_v3}" 2>/dev/null; then
return 0
fi
if [[ ! -f "${template_file}" ]]; then
echo "Template not found: ${template_file}" >&2
return 1
fi
sudo mkdir -p "${containerd_dir}/config-v3.toml.d"
sed "s|__CONTAINERD_IMPORTS_PATH__|${imports_path}|g" "${template_file}" | sudo tee "${containerd_dir}/config-v3.toml.tmpl" > /dev/null
sudo systemctl restart "${service_name}"
}
function setup_k3s_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/k3s/agent/etc/containerd" "k3s"
}
function setup_rke2_containerd_v3_template_if_needed() {
_setup_containerd_v3_template_if_needed "/var/lib/rancher/rke2/agent/etc/containerd" "rke2-server"
}
function deploy_k3s() {
# Set CRI runtime-request-timeout to 600s (same as kubeadm) for CoCo and long-running create requests.
curl -sfL https://get.k3s.io | sh -s - --write-kubeconfig-mode 644 --kubelet-arg runtime-request-timeout=600s
@@ -333,9 +303,6 @@ function deploy_k3s() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_k3s_containerd_v3_template_if_needed
# Download the kubectl binary into /usr/bin and remove /usr/local/bin/kubectl
#
# We need to do this to avoid hitting issues like:
@@ -405,9 +372,6 @@ function deploy_rke2() {
# This is an arbitrary value that came up from local tests
sleep 120s
# If rendered config does not import the drop-in dir, write full V3 template so kata-deploy can use it.
setup_rke2_containerd_v3_template_if_needed
# Link the kubectl binary into /usr/bin
sudo ln -sf /var/lib/rancher/rke2/bin/kubectl /usr/local/bin/kubectl

View File

@@ -0,0 +1,45 @@
#!/bin/bash
#
# Copyright (c) 2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o nounset
set -o pipefail
kata_tarball_dir="${2:-kata-artifacts}"
docker_dir="$(dirname "$(readlink -f "$0")")"
source "${docker_dir}/../../common.bash"
image="${image:-instrumentisto/nmap:latest}"
function install_dependencies() {
info "Installing the dependencies needed for running the docker smoke test"
sudo -E docker pull "${image}"
}
function run() {
info "Running docker smoke tests using ${KATA_HYPERVISOR} hypervisor"
enabling_hypervisor
info "Running docker with runc"
sudo docker run --rm --entrypoint nping "${image}" --tcp-connect -c 2 -p 80 www.github.com
info "Running docker with Kata Containers (${KATA_HYPERVISOR})"
sudo docker run --rm --runtime "io.containerd.kata-${KATA_HYPERVISOR}.v2" --entrypoint nping "${image}" --tcp-connect -c 2 -p 80 www.github.com
}
function main() {
action="${1:-}"
case "${action}" in
install-dependencies) install_dependencies ;;
install-kata) install_kata ;;
run) run ;;
*) >&2 die "Invalid argument" ;;
esac
}
main "$@"

View File

@@ -0,0 +1,219 @@
#!/usr/bin/env bats
#
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# This file is modeled after k8s-nvidia-nim.bats which contains helpful in-line documentation.
load "${BATS_TEST_DIRNAME}/lib.sh"
load "${BATS_TEST_DIRNAME}/confidential_common.sh"
export KATA_HYPERVISOR="${KATA_HYPERVISOR:-qemu-nvidia-gpu}"
TEE=false
if is_confidential_gpu_hardware; then
TEE=true
fi
export TEE
NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct"
[[ "${TEE}" = "true" ]] && NIM_SERVICE_NAME="meta-llama-3-2-1b-instruct-tee"
export NIM_SERVICE_NAME
POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=600s
[[ "${TEE}" = "true" ]] && POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED=1200s
export POD_READY_TIMEOUT_LLAMA_3_2_1B=${POD_READY_TIMEOUT_LLAMA_3_2_1B:-${POD_READY_TIMEOUT_LLAMA_3_2_1B_PREDEFINED}}
export LOCAL_NIM_CACHE_LLAMA_3_2_1B="${LOCAL_NIM_CACHE_LLAMA_3_2_1B:-${LOCAL_NIM_CACHE:-/opt/nim/.cache}-llama-3-2-1b}"
DOCKER_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"username\":\"\$oauthtoken\",\"password\":\"${NGC_API_KEY}\",\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export DOCKER_CONFIG_JSON
KBS_AUTH_CONFIG_JSON=$(
echo -n "{\"auths\":{\"nvcr.io\":{\"auth\":\"$(echo -n "\$oauthtoken:${NGC_API_KEY}" | base64 -w0)\"}}}" |
base64 -w0
)
export KBS_AUTH_CONFIG_JSON
NGC_API_KEY_BASE64=$(
echo -n "${NGC_API_KEY}" | base64 -w0
)
export NGC_API_KEY_BASE64
# Points to kbs:///default/ngc-api-key/instruct and thus re-uses the secret from k8s-nvidia-nim.bats.
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B="${SEALED_SECRET_PRECREATED_NIM_INSTRUCT}"
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B
NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64=$(echo -n "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B}" | base64 -w0)
export NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64
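The nested encoding above (a base64 `auth` field embedded in a `.dockerconfigjson` document that is itself base64-encoded for the Secret) is easy to get wrong; a minimal sketch of the structure, using a dummy key rather than a real NGC token:

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# Dummy stand-in for NGC_API_KEY; never a real token.
api_key="dummy-key"

# Inner field: base64 of "user:password" (the user is the literal $oauthtoken).
auth=$(printf '%s' "\$oauthtoken:${api_key}" | base64 -w0)

# Outer payload: the .dockerconfigjson document, base64-encoded again for the Secret.
config_json=$(printf '{"auths":{"nvcr.io":{"username":"$oauthtoken","password":"%s","auth":"%s"}}}' \
    "${api_key}" "${auth}" | base64 -w0)

# Decoding one layer back must yield the JSON document with both fields populated.
printf '%s' "${config_json}" | base64 -d
echo
```

Decoding the outer layer once recovers the dockerconfig JSON; decoding the `auth` field a second time recovers `$oauthtoken:<key>`.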
# NIM Operator (k8s-nim-operator) install/uninstall for NIMService CRD.
NIM_OPERATOR_NAMESPACE="${NIM_OPERATOR_NAMESPACE:-nim-operator}"
NIM_OPERATOR_RELEASE_NAME="nim-operator"
install_nim_operator() {
command -v helm &>/dev/null || die "helm is required but not installed"
echo "Installing NVIDIA NIM Operator (latest chart)"
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
kubectl create namespace "${NIM_OPERATOR_NAMESPACE}" --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install "${NIM_OPERATOR_RELEASE_NAME}" nvidia/k8s-nim-operator \
-n "${NIM_OPERATOR_NAMESPACE}" \
--wait
local deploy_name
deploy_name=$(kubectl get deployment -n "${NIM_OPERATOR_NAMESPACE}" -o jsonpath='{.items[0].metadata.name}' 2>/dev/null || true)
if [[ -n "${deploy_name}" ]]; then
kubectl wait --for=condition=available --timeout=300s "deployment/${deploy_name}" -n "${NIM_OPERATOR_NAMESPACE}"
fi
echo "NIM Operator install complete."
}
uninstall_nim_operator() {
echo "Uninstalling NVIDIA NIM Operator (release: ${NIM_OPERATOR_RELEASE_NAME}, namespace: ${NIM_OPERATOR_NAMESPACE})"
if helm status "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" &>/dev/null; then
helm uninstall "${NIM_OPERATOR_RELEASE_NAME}" -n "${NIM_OPERATOR_NAMESPACE}" || true
kubectl delete namespace "${NIM_OPERATOR_NAMESPACE}" --ignore-not-found=true --timeout=60s || true
echo "NIM Operator uninstall complete."
else
echo "NIM Operator release not found, nothing to uninstall."
fi
}
setup_kbs_credentials() {
CC_KBS_ADDR=$(kbs_k8s_svc_http_addr)
export CC_KBS_ADDR
kubectl delete secret ngc-secret-llama-3-2-1b --ignore-not-found
kubectl create secret docker-registry ngc-secret-llama-3-2-1b --docker-server="nvcr.io" --docker-username="\$oauthtoken" --docker-password="${NGC_API_KEY}"
kbs_set_gpu0_resource_policy
kbs_set_resource_base64 "default" "credentials" "nvcr" "${KBS_AUTH_CONFIG_JSON}"
kbs_set_resource "default" "ngc-api-key" "instruct" "${NGC_API_KEY}"
}
# CDH initdata for guest-pull: KBS URL, registry credentials URI, and allow-all policy.
# NIMService is not supported by genpolicy; add_allow_all_policy_to_yaml only supports Pod/Deployment.
# Build initdata with policy inline so TEE pods get both CDH config and policy.
create_nim_initdata_file_llama_3_2_1b() {
local output_file="$1"
local cc_kbs_address
cc_kbs_address=$(kbs_k8s_svc_http_addr)
local allow_all_rego="${BATS_TEST_DIRNAME}/../../../src/kata-opa/allow-all.rego"
cat > "${output_file}" << EOF
version = "0.1.0"
algorithm = "sha256"
[data]
"aa.toml" = '''
[token_configs]
[token_configs.kbs]
url = "${cc_kbs_address}"
'''
"cdh.toml" = '''
[kbc]
name = "cc_kbc"
url = "${cc_kbs_address}"
[image]
authenticated_registry_credentials_uri = "kbs:///default/credentials/nvcr"
'''
"policy.rego" = '''
$(cat "${allow_all_rego}")
'''
EOF
}
setup() {
setup_common || die "setup_common failed"
install_nim_operator || die "NIM Operator install failed"
dpkg -s jq >/dev/null 2>&1 || sudo apt-get -y install jq
# Same pattern as k8s-nvidia-nim.bats: choose manifest by TEE; each YAML has literal secret names.
local tee_suffix=""
[[ "${TEE}" = "true" ]] && tee_suffix="-tee"
export NIM_YAML_IN="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml.in"
export NIM_YAML="${pod_config_dir}/nvidia-nim-llama-3-2-1b-instruct-service${tee_suffix}.yaml"
if [[ "${TEE}" = "true" ]]; then
setup_kbs_credentials
setup_sealed_secret_signing_public_key
initdata_file="${BATS_SUITE_TMPDIR}/nim-initdata-llama-3-2-1b.toml"
create_nim_initdata_file_llama_3_2_1b "${initdata_file}"
NIM_INITDATA_BASE64=$(gzip -c "${initdata_file}" | base64 -w0)
export NIM_INITDATA_BASE64
fi
envsubst < "${NIM_YAML_IN}" > "${NIM_YAML}"
}
@test "NIMService llama-3.2-1b-instruct serves /v1/models" {
echo "NIMService test: Applying NIM YAML"
kubectl apply -f "${NIM_YAML}"
echo "NIMService test: Waiting for deployment to exist (operator creates it from NIMService)"
local wait_exist_timeout=30
local elapsed=0
while ! kubectl get deployment "${NIM_SERVICE_NAME}" &>/dev/null; do
if [[ ${elapsed} -ge ${wait_exist_timeout} ]]; then
echo "Deployment ${NIM_SERVICE_NAME} did not appear within ${wait_exist_timeout}s" >&2
kubectl get deployment "${NIM_SERVICE_NAME}" 2>&1 || true
false
fi
sleep 5
elapsed=$((elapsed + 5))
done
local pod_name
pod_name=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
echo "NIMService test: POD_NAME=${pod_name} (waiting for pod ready, timeout ${POD_READY_TIMEOUT_LLAMA_3_2_1B})"
[[ -n "${pod_name}" ]]
kubectl wait --for=condition=ready --timeout="${POD_READY_TIMEOUT_LLAMA_3_2_1B}" "pod/${pod_name}"
local pod_ip
pod_ip=$(kubectl get pod "${pod_name}" -o jsonpath='{.status.podIP}')
echo "NIMService test: POD_IP=${pod_ip}"
[[ -n "${pod_ip}" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/models"
run curl -sS --connect-timeout 10 "http://${pod_ip}:8000/v1/models"
echo "NIMService test: /v1/models response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "list" ]]
[[ "$(echo "${output}" | jq -r '.data[0].id')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.data[0].object')" == "model" ]]
echo "NIMService test: Curling http://${pod_ip}:8000/v1/chat/completions"
run curl -sS --connect-timeout 30 "http://${pod_ip}:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{"model":"meta/llama-3.2-1b-instruct","messages":[{"role":"user","content":"ping"}],"max_tokens":8}'
echo "NIMService test: /v1/chat/completions response: ${output}"
[[ "${status}" -eq 0 ]]
[[ "$(echo "${output}" | jq -r '.object')" == "chat.completion" ]]
[[ "$(echo "${output}" | jq -r '.model')" == "meta/llama-3.2-1b-instruct" ]]
[[ "$(echo "${output}" | jq -r '.choices[0].message | has("content") or has("reasoning_content")')" == "true" ]]
}
teardown() {
if kubectl get nimservice "${NIM_SERVICE_NAME}" &>/dev/null; then
POD_NAME=$(kubectl get pods --no-headers -o custom-columns=":metadata.name" | head -1)
if [[ -n "${POD_NAME}" ]]; then
echo "=== NIMService pod logs ==="
kubectl logs "${POD_NAME}" || true
kubectl describe pod "${POD_NAME}" || true
fi
kubectl describe nimservice "${NIM_SERVICE_NAME}" || true
fi
[ -f "${NIM_YAML}" ] && kubectl delete -f "${NIM_YAML}" --ignore-not-found=true
uninstall_nim_operator || true
print_node_journal_since_test_start "${node}" "${node_start_time:-}" "${BATS_TEST_COMPLETED:-}"
}
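The TEE path in `setup()` above gzip-compresses the generated initdata TOML and base64-encodes it before passing it through `envsubst` into the pod annotation; a minimal round-trip sketch of that encoding, using a hypothetical inline fragment instead of the real generated file:

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# Hypothetical initdata fragment standing in for the generated TOML file.
initdata='version = "0.1.0"
algorithm = "sha256"'

# Encode the way setup() does: gzip-compress, then single-line base64.
encoded=$(printf '%s' "${initdata}" | gzip -c | base64 -w0)

# The consumer conceptually reverses it: base64-decode, then gunzip.
decoded=$(printf '%s' "${encoded}" | base64 -d | gzip -d)

[[ "${decoded}" == "${initdata}" ]] && echo "initdata round-trip OK"
```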

View File

@@ -89,7 +89,8 @@ if [[ -n "${K8S_TEST_NV:-}" ]]; then
else
K8S_TEST_NV=("k8s-confidential-attestation.bats" \
"k8s-nvidia-cuda.bats" \
"k8s-nvidia-nim.bats" \
"k8s-nvidia-nim-service.bats")
fi
SUPPORTED_HYPERVISORS=("qemu-nvidia-gpu" "qemu-nvidia-gpu-snp" "qemu-nvidia-gpu-tdx")

View File

@@ -0,0 +1,54 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-sealed-llama-3-2-1b
# The /dev/trusted_store (container image layer storage) feature cannot be
# selected here, so storage.emptyDir is used to select the container data
# storage feature instead.
storage:
emptyDir:
sizeLimit: 10Gi
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "56Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
annotations:
io.katacontainers.config.hypervisor.kernel_params: "agent.guest_components_procs=confidential-data-hub agent.aa_kbc_params=cc_kbc::${CC_KBS_ADDR}"
io.katacontainers.config.hypervisor.cc_init_data: "${NIM_INITDATA_BASE64}"
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-sealed-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_SEALED_SECRET_LLAMA_3_2_1B_BASE64}"

View File

@@ -0,0 +1,48 @@
# Copyright (c) 2026 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
name: ${NIM_SERVICE_NAME}
spec:
image:
repository: nvcr.io/nim/meta/llama-3.2-1b-instruct
tag: "1.12.0"
pullPolicy: IfNotPresent
pullSecrets:
- ngc-secret-llama-3-2-1b
authSecret: ngc-api-key-llama-3-2-1b
storage:
hostPath: "${LOCAL_NIM_CACHE_LLAMA_3_2_1B}"
replicas: 1
resources:
limits:
nvidia.com/pgpu: "1"
cpu: "8"
memory: "16Gi"
expose:
service:
type: ClusterIP
port: 8000
runtimeClassName: kata
userID: 1000
groupID: 1000
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-secret-llama-3-2-1b
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: ${DOCKER_CONFIG_JSON}
---
apiVersion: v1
kind: Secret
metadata:
name: ngc-api-key-llama-3-2-1b
type: Opaque
data:
NGC_API_KEY: "${NGC_API_KEY_BASE64}"

View File

@@ -155,6 +155,7 @@ pub struct Config {
pub containerd_conf_file: String,
pub containerd_conf_file_backup: String,
pub containerd_drop_in_conf_file: String,
pub daemonset_name: String,
pub custom_runtimes_enabled: bool,
pub custom_runtimes: Vec<CustomRuntime>,
}
@@ -169,6 +170,12 @@ impl Config {
return Err(anyhow::anyhow!("NODE_NAME must not be empty"));
}
let daemonset_name = env::var("DAEMONSET_NAME")
.ok()
.map(|v| v.trim().to_string())
.filter(|v| !v.is_empty())
.unwrap_or_else(|| "kata-deploy".to_string());
let debug = env::var("DEBUG").unwrap_or_else(|_| "false".to_string()) == "true";
// Parse shims - only use arch-specific variable
@@ -293,6 +300,7 @@ impl Config {
containerd_conf_file,
containerd_conf_file_backup,
containerd_drop_in_conf_file,
daemonset_name,
custom_runtimes_enabled,
custom_runtimes,
};

View File

@@ -94,30 +94,41 @@ impl K8sClient {
Ok(())
}
/// Returns whether a non-terminating DaemonSet with this exact name
/// exists in the current namespace. Used to decide whether this pod is
/// being restarted (true) or uninstalled (false).
pub async fn own_daemonset_exists(&self, daemonset_name: &str) -> Result<bool> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::Api;
let ds_api: Api<DaemonSet> = Api::default_namespaced(self.client.clone());
match ds_api.get_opt(daemonset_name).await? {
Some(ds) => Ok(ds.metadata.deletion_timestamp.is_none()),
None => Ok(false),
}
}
/// Returns how many non-terminating DaemonSets across all namespaces
/// have a name containing "kata-deploy". Used to decide whether shared
/// node-level resources (node label, CRI restart) should be cleaned up:
/// they are only safe to remove when no kata-deploy instance remains
/// on the cluster.
pub async fn count_any_kata_deploy_daemonsets(&self) -> Result<usize> {
use k8s_openapi::api::apps::v1::DaemonSet;
use kube::api::{Api, ListParams};
let ds_api: Api<DaemonSet> = Api::all(self.client.clone());
let daemonsets = ds_api.list(&ListParams::default()).await?;
// Note: We use client-side filtering here because Kubernetes field selectors
// don't support "contains" operations - they only support exact matches and comparisons.
// Filtering by name containing "kata-deploy" requires client-side processing.
// Exclude DaemonSets that are terminating (have deletion_timestamp) so that when our
// DaemonSet pod runs cleanup on SIGTERM during uninstall, we count 0 and remove the label.
let count = daemonsets
.iter()
.filter(|ds| {
ds.metadata.deletion_timestamp.is_none()
&& ds
.metadata
.name
.as_ref()
.is_some_and(|n| n.contains("kata-deploy"))
})
.count();
@@ -584,9 +595,14 @@ pub async fn label_node(
client.label_node(label_key, label_value, overwrite).await
}
pub async fn own_daemonset_exists(config: &Config) -> Result<bool> {
let client = K8sClient::new(&config.node_name).await?;
client.own_daemonset_exists(&config.daemonset_name).await
}
pub async fn count_any_kata_deploy_daemonsets(config: &Config) -> Result<usize> {
let client = K8sClient::new(&config.node_name).await?;
client.count_any_kata_deploy_daemonsets().await
}
pub async fn crd_exists(config: &Config, crd_name: &str) -> Result<bool> {
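The cluster-wide count in `count_any_kata_deploy_daemonsets` filters client-side on two conditions: the name contains "kata-deploy" and the DaemonSet is not terminating (no deletion timestamp). A self-contained shell sketch of the same predicate over canned data (the names and timestamps are made up; no cluster is queried):

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# Canned daemonset list: "name<TAB>deletionTimestamp" ("-" when not terminating).
daemonsets=$'kata-deploy-foo\t-\nkata-deploy-bar\t2026-04-05T00:00:00Z\nnode-exporter\t-'

count=0
while IFS=$'\t' read -r name deleted; do
    # Count only non-terminating daemonsets whose name contains "kata-deploy",
    # mirroring the filter in count_any_kata_deploy_daemonsets().
    if [[ "${name}" == *kata-deploy* && "${deleted}" == "-" ]]; then
        count=$((count + 1))
    fi
done <<< "${daemonsets}"

echo "non-terminating kata-deploy daemonsets: ${count}"
```

Here only `kata-deploy-foo` counts: `kata-deploy-bar` is terminating and `node-exporter` does not match the name filter.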

View File

@@ -236,19 +236,29 @@ async fn install(config: &config::Config, runtime: &str) -> Result<()> {
async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
info!("Cleaning up Kata Containers");
// Step 1: Check if THIS pod's owning DaemonSet still exists.
// If it does, this is a pod restart (rolling update, label change, etc.),
// not an uninstall — skip everything so running kata pods are not disrupted.
info!(
"Checking if DaemonSet '{}' still exists",
config.daemonset_name
);
if k8s::own_daemonset_exists(config).await? {
info!(
"DaemonSet '{}' still exists, \
skipping all cleanup to avoid disrupting running kata pods",
config.daemonset_name
);
return Ok(());
}
// Step 2: Our DaemonSet is gone (uninstall). Perform instance-specific
// cleanup: snapshotters, CRI config, and artifacts for this instance.
info!(
"DaemonSet '{}' not found, proceeding with instance cleanup",
config.daemonset_name
);
match config.experimental_setup_snapshotter.as_ref() {
Some(snapshotters) => {
for snapshotter in snapshotters {
@@ -270,6 +280,25 @@ async fn cleanup(config: &config::Config, runtime: &str) -> Result<()> {
artifacts::remove_artifacts(config).await?;
info!("Successfully removed kata artifacts");
// Step 3: Check if ANY other kata-deploy DaemonSets still exist.
// Shared resources (node label, CRI restart) are only safe to touch
// when no other kata-deploy instance remains.
let other_ds_count = k8s::count_any_kata_deploy_daemonsets(config).await?;
if other_ds_count > 0 {
info!(
"{} other kata-deploy DaemonSet(s) still exist, \
skipping node label removal and CRI restart",
other_ds_count
);
return Ok(());
}
info!("No other kata-deploy DaemonSets found, performing full shared cleanup");
info!("Removing kata-runtime label from node");
k8s::label_node(config, "katacontainers.io/kata-runtime", None, false).await?;
info!("Successfully removed kata-runtime label");
// Restart the CRI runtime last. On k3s/rke2 this restarts the entire
// server process, which kills this (terminating) pod. By doing it after
// all other cleanup, we ensure config and artifacts are already gone.
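The three-step decision above (own DaemonSet still exists: skip; gone: instance cleanup; no other kata-deploy DaemonSet left: shared cleanup, CRI restart last) can be sketched as plain shell control flow. The function names and messages are illustrative stand-ins, not the real binary's API:

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# Illustrative stand-ins for the cluster queries in k8s.rs.
own_daemonset_exists() { [[ "${OWN_DS:-0}" == "1" ]]; }
count_any_kata_deploy_daemonsets() { echo "${OTHER_DS:-0}"; }

cleanup() {
    # Step 1: our own DaemonSet still exists => pod restart, not uninstall.
    if own_daemonset_exists; then
        echo "pod restart: skip all cleanup"
        return 0
    fi
    # Step 2: instance-specific cleanup always runs on uninstall.
    echo "instance cleanup: snapshotters, CRI config, artifacts"
    # Step 3: shared cleanup only when no other kata-deploy install remains.
    if (( $(count_any_kata_deploy_daemonsets) > 0 )); then
        echo "other installs remain: keep node label, skip CRI restart"
        return 0
    fi
    echo "shared cleanup: remove node label, restart CRI runtime last"
}

OWN_DS=1 cleanup
OWN_DS=0 OTHER_DS=2 cleanup
OWN_DS=0 OTHER_DS=0 cleanup
```

The three invocations exercise the restart, multi-install, and last-install paths respectively.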

View File

@@ -51,18 +51,19 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
return Ok("crio".to_string());
}
// Detect k3s/rke2 via systemd services rather than the containerd version
// string, which no longer reliably contains "k3s" in newer releases
// (e.g. "containerd://2.2.2-bd1.34").
if utils::host_systemctl(&["is-active", "--quiet", "rke2-agent"]).is_ok() {
return Ok("rke2-agent".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "rke2-server"]).is_ok() {
return Ok("rke2-server".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "k3s-agent"]).is_ok() {
return Ok("k3s-agent".to_string());
}
if utils::host_systemctl(&["is-active", "--quiet", "k3s"]).is_ok() {
return Ok("k3s".to_string());
}
@@ -83,7 +84,7 @@ pub async fn get_container_runtime(config: &Config) -> Result<String> {
Ok(runtime)
}
/// Returns true if containerRuntimeVersion (e.g. "containerd://2.1.5-k3s1", "containerd://2.2.2-bd1.34") indicates
/// containerd 2.x or newer, false for 1.x or unparseable. Used for drop-in support
/// and for K3s/RKE2 template selection (config-v3.toml.tmpl vs config.toml.tmpl).
pub fn containerd_version_is_2_or_newer(runtime_version: &str) -> bool {
@@ -191,6 +192,7 @@ mod tests {
#[case("containerd://2.0.0", true)]
#[case("containerd://2.1.5", true)]
#[case("containerd://2.1.5-k3s1", true)]
#[case("containerd://2.2.2-bd1.34", true)]
#[case("containerd://2.2.0", true)]
#[case("containerd://2.3.1", true)]
#[case("containerd://2.0.0-rc.1", true)]
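The version check tested above strips the `containerd://` scheme and compares the major version, tolerating vendor suffixes like `-k3s1` or `-bd1.34`. A shell sketch of the same predicate; this is an approximation of the Rust logic, and its handling of unparseable input may differ in edge cases:

```shell
#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

# Returns 0 (true) when a containerRuntimeVersion string reports containerd 2.x+.
containerd_is_2_or_newer() {
    local v="${1#containerd://}"   # e.g. "2.2.2-bd1.34"
    local major="${v%%.*}"         # e.g. "2"
    # Non-numeric major (unparseable input) counts as "not 2.x or newer".
    [[ "${major}" =~ ^[0-9]+$ ]] && (( major >= 2 ))
}

containerd_is_2_or_newer "containerd://2.2.2-bd1.34" && echo "2.2.2-bd1.34: drop-in capable"
containerd_is_2_or_newer "containerd://1.7.22" || echo "1.7.22: legacy config"
```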

View File

@@ -143,6 +143,13 @@ spec:
valueFrom:
fieldRef:
fieldPath: spec.nodeName
{{- if .Values.env.multiInstallSuffix }}
- name: DAEMONSET_NAME
value: {{ printf "%s-%s" .Chart.Name .Values.env.multiInstallSuffix | quote }}
{{- else }}
- name: DAEMONSET_NAME
value: {{ .Chart.Name | quote }}
{{- end }}
- name: DEBUG
value: {{ include "kata-deploy.getDebug" . | quote }}
{{- $shimsAmd64 := include "kata-deploy.getEnabledShimsForArch" (dict "root" . "arch" "amd64") | trim -}}

View File

@@ -234,7 +234,7 @@ externals:
nvrc:
# yamllint disable-line rule:line-length
desc: "The NVRC project provides a Rust binary that implements a simple init system for microVMs"
version: "v0.1.4"
url: "https://github.com/NVIDIA/nvrc/releases/download/"
nvidia: