`k8s-confidential.bats` technically doesn't need attestation, but only runs
on TEE hardware, so include it in the attestation list so we can test it in PRs
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The skip conditional is wrong, but it's not needed as the setup
and teardown only allow confidential hardware anyway
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Remove --tail=N limits from `kubectl logs` for kata-deploy pods so
the complete output is visible in CI job logs for debugging.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
During tests, one error as below:
```
..k8s-kill-all-process-in-container.bats: line 40: [: too many arguments
```
This commit aims to address such issue follows:
(1) Update process query command to "ps aux || ps" to ensure
compatibility across different container images while maximizing
process visibility.
(2) Use "[t]ail" in grep to reliably match the process without
self-matching.
(3) Quote variable in assertion to resolve "too many arguments" bash
error.
(4) Improve test reliability by ensuring the process list is actually
visible to the verification logic.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The 'no-layer-image' test case was failing because the underlying shim
returned a "unsupported rootfs mounts count" error instead of the
expected application-level "file not found" or "ENOENT" error.
This change updates the BATS test to accept the shim-level rootfs
validation error as a valid failure condition for this unsupported
image scenario, ensuring the CI remains green while reflecting
current runtime behavior.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Document the end-to-end workflow for using the containerd EROFS
snapshotter with Kata Containers runtime-rs, covering containerd
configuration, Kata QEMU settings, and pod deployment examples
via crictl/ctr/Kubernetes.
Include prerequisites (containerd >= 2.2, runtime-rs main branch),
QEMU VMDK format verification command, architecture diagram,
VMDK descriptor format reference, and troubleshooting guide.
Note that Cloud Hypervisor, Firecracker, and Dragonball do not
support VMDK block devices and are currently unsupported for
fsmerged EROFS rootfs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The image used has some special (as weird) properties that are being
taking advantage of to implement policy related tests.
Changing the image is a no-go at this point, otherwise we break the
tests ... so let's just skip those for now.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add qemu-coco-dev-runtime-rs to the arm64 k8s test matrix so that the
CoCo non-TEE configuration is exercised on aarch64 runners.
Also enable auto-generated policy for qemu-coco-dev on aarch64 (matching
the existing x86_64 behavior) and register the new job as a required
gatekeeper check.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Add aarch64/arm64 to the list of supported architectures for
qemu-coco-dev and qemu-coco-dev-runtime-rs shims across kata-deploy
configuration, Helm chart values, and test helper scripts.
Note that guest-components and the related build dependencies are not
yet wired for arm64 in these configurations; those will be addressed
separately.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
This test uses YAML files from a different directory than the other
k8s CI tests, so annotations have to be added into these separate
files.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add kernel_verity_params to the qemu-coco-dev-runtime-rs configuration
so the runtime can assemble dm-verity kernel parameters, and remove the
test skip that was disabling measured rootfs tests for this hypervisor.
Fixes: #12851
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add BATS tests for the GetDiagnosticData termination log feature on
CoCo platforms where shared_fs=none.
Three test cases cover:
- Successful exit (exit 0): termination message is propagated when
GetDiagnosticDataRequest is allowed by policy.
- Failed exit (exit 1): termination message is propagated when
GetDiagnosticDataRequest is allowed by policy.
- Policy denied: with default CoCo policy (GetDiagnosticDataRequest
is false), the container stops cleanly but no termination message
is propagated (best-effort behavior).
Tests are skipped on non-CoCo platforms where shared_fs is not "none".
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
* Put the loop device creation code in the test itself, so that we get proper
logs if that part fails (following other tests).
* Reuse the $node variable to fix the test on multi-node clusters.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Utilise the new hypervisor helpers in our CI and test
code to help add clarity and reduce duplication
Note: `kubernetes_dir` is declared as readonly in
tests/integration/kubernetes/setup.sh which is sourced
by tests_common.sh, so we update it to only be set if
unset
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a pure shell script which the CI and integration tests can
use to check for different categories of runtime
Assisted-by: IBM Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I'm doing some bookkeeping in the Azure subscription that requires we move
from eastus to eastus2. This should have no user-facing impact.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Adding the pod annotation config to the doc site. A symlink is created
at docs/pod-annotations.md that points to
how-to/how-to-set-sandbox-config-kata.md so that the URL for this file will be
created at `/pod-annotations`. Also adding brief contrbuting guidelines and
how-to's for running the documentation site locally for local previews.
Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
I've updaed the images on the Confidential Containers side, in order to
add arm64 support, but I didn't realize it'd break tests not using
those.
Apologies!
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full
test suite on every PR consumes these resources unnecessarily, since
most tests exercises what is already exercised by the -coco-dev CIs.
Introduce a `tee-test-scope` workflow input (small/full) and a new
`baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests
that are TEE-relevant: attestation tests (encrypted/authenticated/
signed image pull, confidential attestation) plus policy and trusted
ephemeral data storage tests.
PR runs default to "small" (12 tests), nightly runs use "full" (59
tests), and manual dispatch offers a dropdown to choose.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The k8s-confidential-attestation test extracts the QEMU command line
from journal logs to compute the SNP launch measurement. It only
matched the Go runtime's log format ("launching <path> with: [<args>]"),
but runtime-rs logs differently ("qemu args: <args>").
Handle both formats so the test works with qemu-snp-runtime-rs.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're in the process to stabilise runtime-rs for the coming 4.0.0
release, we better start running as many tests as possible with that.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Follow-on to kata-containers/kata-containers#12396
Switch SNP config from initrd-based to image-based rootfs with
dm-verity. The runtime assembles the dm-mod.create kernel cmdline
from kernel_verity_params, and with kernel-hashes=on the root hash
is included in the SNP launch measurement.
Also add qemu-snp to the measured rootfs integration test.
Signed-off-by: Amanda Liem <aliem@amd.com>
With static_sandbox_resource_mgmt calculation fixed for runtime-rs, the
VM is correctly pre-sized at creation time. The vCPU allocation test no
longer depends on CPU hotplug, so the qemu-coco-dev* skip is no longer
needed.
Fixes: #10928
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
With static_sandbox_resource_mgmt now enabled for ARM on runtime-rs,
the VM is correctly pre-sized at creation time. The vCPU allocation
test no longer depends on CPU hotplug, so the aarch64 skip (issue
#10928) is no longer needed.
Fixes: #10928
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
The attestation-agent no longer sets nvidia devices to ready
automatically. Instead, we should use nvrc for this. Since this is
required for all nvidia workloads, add it to the default nv kernel
params.
With bounce buffers, the timing of attesting a device versus setting it
to ready is not so important.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
We recently moved the default policy in the Trustee repo. Now it's in
the same place as all the other policies. Update the test code to match.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
The busybox-pod.yaml test fixture sets tty: true on the second
container. When a container has a TTY, kubectl exec may return
\r\n line endings. The invisible \r causes string comparisons
to fail:
container_name=$(kubectl exec ... -- env | grep CONTAINER_NAME)
[ "$container_name" == "CONTAINER_NAME=second-test-container" ]
This comparison fails because $container_name contains a trailing
\r character.
Fix by piping through tr -d '\r' after grep. This is harmless
when \r is absent and fixes the mismatch when present.
Fixes: #9136
Signed-off-by: Rophy Tsai <rophy@users.noreply.github.com>
k3s and rke2 ship containerd 2.2.2, which requires the OCI 1.3.0
drop-in overlay. Move them from the separate OCI 1.2.1 branch into
the OCI 1.3.0 condition alongside nvidia-gpu, qemu-snp, qemu-tdx,
and custom container engine versions.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Docker 26+ configures container networking (veth pair, IP addresses,
routes) after task creation rather than before. Kata's endpoint scan
runs during CreateSandbox, before the interfaces exist, resulting in
VMs starting without network connectivity (no -netdev passed to QEMU).
Add RescanNetwork() which runs asynchronously after the Start RPC.
It polls the network namespace until Docker's interfaces appear, then
hotplugs them to QEMU and informs the guest agent to configure them
inside the VM.
Additional fixes:
- mountinfo parser: find fs type dynamically instead of hardcoded
field index, fixing parsing with optional mount tags (shared:,
master:)
- IsDockerContainer: check CreateRuntime hooks for Docker 26+
- DockerNetnsPath: extract netns path from libnetwork-setkey hook
args with path traversal protection
- detectHypervisorNetns: verify PID ownership via /proc/pid/cmdline
to guard against PID recycling
- startVM guard: rescan when len(endpoints)==0 after VM start
Fixes: #9340
Signed-off-by: llink5 <llink5@users.noreply.github.com>
Onboard a test case for deploying a NIM service using the NIM
operator. We install the operator helm chart on the fly as this is
a fast operation, spinning up a single operand. Once a NIM service
is scheduled, the operator creates a deployment with a single pod.
For now, the TEE-based flow uses an allow-all policy. In future
work, we strive to support generating pod security policies for the
scenario where NIM services are deployed and the pod manifest is
being generated on the fly.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Do not run the NIM containers with elevated privileges. Note that,
using hostPath requires proper host folder permissions, and that
using emptyDir requires a proper fsGroup ID.
Once issue 11162 is resolved, we can further refine the securityContext
fields for the TEE manifests.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The logic in the k8s-empty-dirs.bats file missed to add a security
policy for the pod-empty-dir-fsgroup.yaml manifest. With this change,
we add the policy annotation.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add functional tests that cover two previously untested kata-deploy
behaviors:
1. Restart resilience (regression test for #12761): deploys a
long-running kata pod, triggers a kata-deploy DaemonSet restart via
rollout restart, and verifies the kata pod survives with the same
UID and zero additional container restarts.
2. Artifact cleanup: after helm uninstall, verifies that RuntimeClasses
are removed, the kata-runtime node label is cleared, /opt/kata is
gone from the host filesystem, and containerd remains healthy.
3. Artifact presence: after install, verifies /opt/kata and the shim
binary exist on the host, RuntimeClasses are created, and the node
is labeled.
Host filesystem checks use a short-lived privileged pod with a
hostPath mount to inspect the node directly.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the workaround that wrote a synthetic containerd V3 config
template for k3s/rke2 in CI. This was added to test kata-deploy's
drop-in support before the upstream k3s/rke2 patch shipped. Now that
k3s and rke2 include the drop-in imports in their default template,
the workaround is no longer needed and breaks newer versions.
Removed:
- tests/containerd-config-v3.tmpl (synthetic Go template)
- _setup_containerd_v3_template_if_needed() and its k3s/rke2 wrappers
- Calls from deploy_k3s() and deploy_rke2()
This reverts the test infrastructure part of a2216ec05.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>