- trusted-storage.yaml.in: use $PV_STORAGE_CAPACITY and
$PVC_STORAGE_REQUEST so that PV/PVC size can vary per test.
- confidential_common.sh: add optional size (MB) argument to
create_loop_device.
- k8s-guest-pull-image.bats: pass PV_STORAGE_CAPACITY and
PVC_STORAGE_REQUEST when generating storage config.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
The follow differences are observed between container 1.x and 2.x:
```
[plugins.'io.containerd.snapshotter.v1.devmapper']
snapshotter = 'overlayfs'
```
and
```
[plugins."io.containerd.snapshotter.v1.devmapper"]
snapshotter = "overlayfs"
```
The current devmapper configuration only works with double quotes.
Make it work with both single and double quotes via tomlq.
In the default configuration for containerd 2.x, the following
configuration block is missing:
```
[[plugins.'io.containerd.transfer.v1.local'.unpack_config]]
platform = "linux/s390x" # system architecture
snapshotter = "devmapper"
```
Ensure the configuration block is added for containerd 2.x.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As the AMD maintainers switched to the 2.3.0-beta.0 containerd (due to
the nydus fixes that landed there).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that kata-deploy deploys and manages nydus-for-kata-tee on all
platforms, the separate standalone nydus-snapshotter DaemonSet deployment
is no longer needed.
- Short-circuit deploy_nydus_snapshotter and cleanup_nydus_snapshotter
to no-ops with an explanatory message.
- Add qemu-snp to the workaround case so AMD SEV-SNP baremetal runners
also get USE_EXPERIMENTAL_SETUP_SNAPSHOTTER=true and kata-deploy picks
up the snapshotter setup on every run.
- Drop the x86_64 arch guard and the hypervisor sub-case from the
EXPERIMENTAL_SETUP_SNAPSHOTTER block, allowing any architecture and
hypervisor to use the kata-deploy-managed path when the flag is set.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Fixes: #10002
Since #11537 resolves the issue, remove the skip conditions for
the k8s e2e tests involving emptyDir volume mounts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Now that containerd 2.3.0-beta.0 has been released, it brings fixes for
multi-snapshotters that allows us to test the baremetal machines in the
same way we test the non-baremetal ones.
Let's start doing the switch for TDX as timezone is friendlier with
Mikko.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Use the container data storage feature for the k8s-nvidia-nim.bats
test pod manifests. This reduces the pods' memory requirements.
For this, enable the block-encrypted emptydir_mode for the NVIDIA
GPU TEE handlers.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We need to explicitly pass `-O index.html` as the busybox' wget has a
different behaviour than GNU's wget.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In case a wget fails for one reason or another, it'll leave behind an
'index.html' file. Let's make sure we allow overriding that file so the
retry loop doesn't fail for no reason.
Fixes: #12670
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We recently had a failure on a new CI runner where
${HOME}/.cicd/venv/bin/activate was not present. The relevant call
originated from ensure_sev_snp_measure. Thus, add a function
ensure_cicd_python_venv before callers to pip install.
Currently, the NVIDIA NIM test and the confidential attestation
tests use pip to install dependencies.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
With the new CDH version, the LUKS header is moved off of the disk
into guest memory. We hence adapt the test's filesystem type checks.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
With signature support for sealed secret, use pre-created signed
sealed secrets and provision the signing public key to the KBS.
Add instructions for re-creating these signed secrets.
Improve k8s-sealed-secrets.bats by reducing repeated kubectl logs
calls. A test run showed a SIGPIPE error one one of the grep-logs
while the printouts of the initial kubectl logs invocation showed
that the expected values were actually in the logs.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Call the setup_genpolicy_registry_auth in run_kubernetes_nv_tests.sh.
Authenticate before exercising any tests.
Recently, we have seen UnauthorizedError messages for the CUDA
vectorAdd image. While this image is not gated behind authentication,
rate limiting may be a possible issue.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
k0s uses /var/lib/k0s/kubelet instead of /var/lib/kubelet as its
kubelet data directory. Introduce get_kubelet_data_dir() in
tests_common.sh and use it in k8s-trusted-ephemeral-data-storage.bats
instead of hardcoding /var/lib/kubelet.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
It can be useful to set these variables during local testing:
* AZ_REGION: Region for the cluster.
* AZ_NODEPOOL_TAGS: Node pool tags for the cluster.
* GENPOLICY_BINARY: Path to the genpolicy binary.
* GENPOLICY_SETTINGS_DIR: Directory holding the genpolicy settings.
I've also made it so that tests_common.sh modifies the duplicated
genpolicy-settings.json (used for testing) instead of the original git-tracked
one.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
In some CI runs, `mktemp` generates random characters that accidentally
form file extensions like `.cSV` or `.Xml`. This triggers downstream
parsing errors because the YAML content is misidentified as CSV/XML.
The issues look like as below:
```
'/tmp/bats-run-KodZEA/.../pod-guest-pull-in-trusted-storage.yaml.in.cSV':
...
```
This commit fixes the issue by:
1. Moving the `XXXXXX` placeholder before the `.yaml` extension.
2. Ensuring the generated file always ends in `.yaml`.
This prevents format misidentification while maintaining filename
uniqueness and security.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Create local block storage (loop device, StorageClass, PV) in the test
only when the cluster has no default StorageClass, matching the approach
used in k8s-volume.bats. Set our StorageClass as default so the PVC
binds to our PV; tear it down after the test.
When a default already exists (e.g. AKS), skip creation and cleanup so
we do not change the cluster's default storage class.
Fixes: #9846
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Allow genpolicy -j to accept a directory instead of a single file.
When given a directory, genpolicy loads genpolicy-settings.json from it
and applies all genpolicy-settings.d/*.json files (sorted by name) as
RFC 6902 JSON Patches. This gives precise control over settings with
explicit operations (add, remove, replace, move, copy, test), including
array index manipulation and assertions.
Ship composable drop-in examples in drop-in-examples/:
- 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner)
- 20-* files overlay specific adjustments (OCI version, guest pull)
Users copy the combination they need into genpolicy-settings.d/.
Replace the old adapt_common_policy_settings_* jq-patching functions
in tests_common.sh with install_genpolicy_drop_ins(), which copies the
right combination of 10-* and 20-* drop-ins for the CI scenario.
Tests still generate 99-test-overrides.json on the fly for per-test
request/exec overrides.
Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into
the tarball; the default genpolicy-settings.d/ is left empty.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
kata-deploy's SIGTERM cleanup restarts the CRI runtime, which on
k3s/rke2 takes down the API server temporarily. The helm uninstall
may complete with errors, and the next test suite would start with
a dead API. Add a wait loop after uninstall to ensure the API is
available before proceeding.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
k3s and rke2 use containerd that expects OCI bundle 1.2.1; otherwise
autogenerated policy tests fail. Add adapt_common_policy_settings_for_k3s_rke2
and call it from adapt_common_policy_settings when KUBERNETES is k3s or rke2.
Tested with k3s v1.34.4+k3s1, rke2 v1.34.4+rke2r1.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Reinstate mariner host testing - including the Agent Policy tests on
these hosts - now that a new CLH version brought in the required fixes.
This reverts commit ea53779b90.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Trustee now returns the binary SNP TCB claims as hex rather than base64
(for consistency with other platforms). Fortunately, the sev-snp-measure
tool has a flag for setting the output type of the launch digest.
I think hex is the default, but let's keep the flag here to be explicit.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
Since runtime-rs added support for virtio-blk-ccw on s390x in #12531,
the assertion in k8s-guest-pull-image.bats should be generalized
to apply to all hypervisors ending with `-runtime-rs`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Specify runAsUser, runAsGroup, supplementalGroups values embedded
in the image's /etc/group file explicitly in the security context.
With this, both genpolicy and containerd, which in case of using
nydus guest-pull, lack image introspection capabilities, use the
same values for user/group/additionalG IDs at policy generation
time and at runtime when the OCI spec is passed.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Extend the timeout for the assert_pod_fail function call for the
test case "Test we cannot pull a large image that pull time exceeds
createcontainer timeout inside the guest" when the experimental
force guest-pull method is being used. In this method, the image is
first pulled on the host before creating the pod sandbox. While
image pull times can suddenly spike, we already time out in the
assert_pod_fail function before the image is even pulled on the
host.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Treat waiting.reason RunContainerError and terminated.reason StartError/Error
as container failure, so tests that expect guest image-pull failure (e.g.
wrong credentials) pass when the container fails with those states instead
of only BackOff.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>