Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Reinstate mariner host testing - including the Agent Policy tests on
these hosts - now that a new CLH version brought in the required fixes.
This reverts commit ea53779b90.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Trustee now returns the binary SNP TCB claims as hex rather than base64
(for consistency with other platforms). Fortunately, the sev-snp-measure
tool has a flag for setting the output type of the launch digest.
I think hex is the default, but let's keep the flag here to be explicit.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
Fix `T1005: error strings should not be capitalized (staticcheck)`
This is to comply with go conventitions as errors are normally appended,
so there would be a spurious captialisation in the middle of the message
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
strings.ReplaceAll was introduced in Go 1.12 as a more readable and self-documenting way to say "replace everything".
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a setting to skip the
`T1005: error strings should not be capitalized (staticcheck)`
rule to avoid impact to our error strings
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since runtime-rs added support for virtio-blk-ccw on s390x in #12531,
the assertion in k8s-guest-pull-image.bats should be generalized
to apply to all hypervisors ending with `-runtime-rs`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Specify runAsUser, runAsGroup, supplementalGroups values embedded
in the image's /etc/group file explicitly in the security context.
With this, both genpolicy and containerd, which in case of using
nydus guest-pull, lack image introspection capabilities, use the
same values for user/group/additionalG IDs at policy generation
time and at runtime when the OCI spec is passed.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Extend the timeout for the assert_pod_fail function call for the
test case "Test we cannot pull a large image that pull time exceeds
createcontainer timeout inside the guest" when the experimental
force guest-pull method is being used. In this method, the image is
first pulled on the host before creating the pod sandbox. While
image pull times can suddenly spike, we already time out in the
assert_pod_fail function before the image is even pulled on the
host.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Set KubeletConfiguration runtimeRequestTimeout to 600s mainly for CoCo
(Confidential Containers) tests, so container creation (attestation,
policy, image pull, VM start) does not hit the default CRI timeout.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Treat waiting.reason RunContainerError and terminated.reason StartError/Error
as container failure, so tests that expect guest image-pull failure (e.g.
wrong credentials) pass when the container fails with those states instead
of only BackOff.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Run all CoCo non-TEE variants in a single job on the free runner with an
explicit environment matrix (vmm, snapshotter, pull_type, kbs,
containerd_version).
Here we're testing CoCo only with the "active" version of containerd.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
We were running most of the k8s integration tests on AKS. The ones that
don't actually depend on AKS's environment now run on normal
ubuntu-24.04 GitHub runners instead: we bring up a kubeadm cluster
there, test with both containerd lts and active, and skip attestation
tests since those runtimes don't need them. AKS is left only for the
jobs that do depend on it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Disable mariner host testing in CI, and auto-generated policy testing
for the temporary replacements of these hosts (based on ubuntu), to work
around missing:
1. cloud-hypervisor/cloud-hypervisor@0a5e79a, that will allow Kata
in the future to disable the nested property of guest VPs. Nested
is enabled by default and doesn't work yet with mariner's MSHV.
2. cloud-hypervisor/cloud-hypervisor@bf6f0f8, exposed by the large
ttrpc replies intentionally produced by the Kata CI Policy tests.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Make `az aks create` command easier to change when needed, by moving the
arguments specific to mariner nodes onto a separate line of this script.
This change also removes the need for `shellcheck disable=SC2046` here.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Issue 10838 is resolved by the prior commit, enabling the -m
option of the kernel build for confidential guests which are
not users of the measured rootfs, and by commit
976df22119, which ensures
relevant user space packages are present.
Not every confidential guest has the measured rootfs option
enabled. Every confidential guest is assumed to support CDH's
secure storage features, in contrast.
We also adjust test timeouts to account for occasional spikes on
our bare metal runners (e.g., SNP, TDX, s390x).
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This test uses YAML files from a different directory than the other
k8s CI tests, so annotations have to be added into these separate
files.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Similar to k8s-guest-pull-image-authenticated and to
k8s-guest-pull-image-signature, enabling k8s-guest-pull-image to
run against the experimental force guest pull method.
Only k8s-guest-pull-image-encrypted requires nydus.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
When NFD is detected (deployed by the chart or existing in the cluster),
apply shim-specific nodeSelectors only for TEE runtime classes (snp,
tdx, and se).
Non-TEE shims keep existing behavior (e.g. runtimeClass.nodeSelector for
nvidia GPU from f3bba0885 is unchanged).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Enhance k8s-configmap.bats and k8s-credentials-secrets.bats to test that ConfigMap and Secret updates propagate to volume-mounted pods.
- Enhanced k8s-configmap.bats to test ConfigMap propagation
* Added volume mount test for ConfigMap consumption
* Added verification that ConfigMap updates propagate to volume-mounted pods
- Enhanced k8s-credentials-secrets.bats to test Secret propagation
* Added verification that Secret updates propagate to volume-mounted pods
Fixes#8015
Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
genpolicy pulls image manifests from nvcr.io to generate policy and was
failing with 'UnauthorizedError' because it had no registry credentials.
Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential()
in registry.rs, which reads from DOCKER_CONFIG/config.json. Add
setup_genpolicy_registry_auth() to create a Docker config with nvcr.io
auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it
can authenticate when pulling manifests.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've seen a lot of spurious issues when deploying the infra needed for
the tests. Let's give it a few tries before actually failing.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
After the move to Linux 6.17 and QEMU 10.2 from Kata,
k8s-sandbox-vcpus-allocation.bats started failing on TDX.
2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created
2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created
2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created
2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met
2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits
2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits
2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating
A manual test without agent policies added it seems to work OK but disable
the test for now to get CI stable.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
*.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
checks project code
- Leave generated and binary assets unchanged (excluded from checker)
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Update the enable_nvrc_trace() function to use the new drop-in
configuration mechanism instead of directly modifying the base
configuration file. The function now creates a 90-nvrc-trace.toml
drop-in file that properly combines existing kernel parameters
with the nvrc.log=trace setting.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This adds a basic configuration for editorconfig checker. The
supplied configuration checks against trailing whitespaces and
issues with newlines.
Example:
| tools/packaging/kernel/configs/fragments/x86_64/numa.conf:
| Wrong line endings or no final newline
| tools/packaging/release/generate_vendor.sh:
| 44: Trailing whitespace
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This annotation was required for GPU cold-plug before using a
newer device plugin and before querying the pod resources API.
As this annotation is no longer required, cleaning it up.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
It will do following works in this commit:
(1) Rename pod_exec_with_retries() to pod_exec().
(2) Update implementation to call container_exec().
(3) Replace all usages of pod_exec_with_retries across tests
with pod_exec.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>