set_container_command() previously appended command arguments
one-by-one with
'.command += [...]'. This makes the helper non-idempotent and can
lead to unexpected command arrays when invoked multiple times.
Update the helper to set the full command array in a single yq v4
expression and print the target YAML path plus the command being
applied to simplify debugging when tests fail.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The pod config file created by new_pod_config() was generated via
mktemp using the template "pod-config.yaml.in.XXX", which produces
filenames that do not end with ".yaml" (e.g. pod-config.yaml.in.ABC).
If the random combination of special suffix with ".Csv" or ".Xml", etc.
the following operations with yq will fail.
Some helpers and tooling assume the config path ends with ".yaml".
Switch the mktemp template to place the random suffix before the
extension so the returned path always ends with ".yaml".
Fixes: #12268, #12319
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Change the secure_storage_integrity option's default value to true.
With this, integrity protection for encrypted block device contents
will be requested from the confidential data hub by default, see the
agent's cdh_handler_trusted_storage function in rpc.rs.
This behavior can be disabled by explicitly setting the
agent.secure_storage_integrity parameter to 0 or false via kernel
command line parameters.
This will affect the trusted storage implementation for the guest-pull
mechanism, and it will affect future implementations using this code
path, such as implementations for ephemeral secure storage.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Till k8s 1.34 we could grep by "Started containerd". From k8s 1.35
onwards the event message changed and we should, instead, grep by
"Container started".
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since #12204 was merged, the following error has been observed:
```
bats warning: Executed 1 instead of expected 2 tests
[run_kubernetes_tests.sh:162] ERROR: Tests FAILED from suites: k8s-empty-dirs.bats
```
The cause is that `pod_logs_file` is re-declared as a local variable
in the second test before skipping, which makes it inaccessible
in `teardown()` and leads to an error.
This commit removes the re-declaration of the variable.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Changes in NIM/RAG samples:
- update image references
- update memory requirements, timeouts, model name
- sanitize some of the probes and print-out
Further refinements can be made in the future.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add two attestation tests. The first one sets a resource policy that
requires CPU0 to have an affirming trust level. This is a negative test
which can run on any platform. Setting this policy without setting any
reference values should result in an attestation failure.
Next, a second test will set the same policy, but this time it will use
the journal log to find the QEMU command line from the previous test and
calculate the expected reference values. Currently this is only
supported on SNP using the sev-snp-measure tool, but the same flow
should work on other platforms.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
It will address the issue:
"# bats warning: Executed 0 instead of expected 1 tests"
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As each case need such preparation of get_pod_config_dir,
a better method is directly move it into the setup_common method.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
To measure the duration for journal, we need clearly print the journal
start time and end time for each case which helps to ensure the journal
log is for the specified period for the case.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Currently policy_settings_dir is created only when
BATS_TEST_NUMBER == "1",
but delete_tmp_policy_settings_dir "${policy_settings_dir}" is
called in teardown() for every test. This means that for tests
after the first one teardown() may attempt to delete a directory
that was already removed by a previous test, or rely on a value
that does not belong to the current test execution.
Adjust teardown logic so that policy_settings_dir is only deleted
for the first test case (BATS_TEST_NUMBER == "1") and ignored for
subsequent tests. This keeps the original optimization of running
genpolicy only once, while avoiding unnecessary or confusing cleanup
attempts in later test cases.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
the previous pod_name is set as local which can not be captured
within the teardown() function, causing failure.
This commit just remove the `local pod_name` to make it a global
variable.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Align with other test logic - declare the KATA_HYPERVISOR in the
run bash script, then declare the RUNTIME_CLASS_NAME variable in
the bats files.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Now that we have a more restrictive resource policy for KBS, let
us start adopting it across all NVIDIA test cases. This policy was
previously introduced by the NVIDIA attestation test.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
With these changes, we create pod security policies when running
against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set.
For the non-TEE GPU tests, the added functions bail out by design.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Following existing patterns, we adapt the common policy settings
for NVIDIA GPU CI platforms. For instance, for our CI runners, we
use containerd 2.x.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Enable auto-generate policy for qemu-nvidia-gpu-* if the user
didn't specify an AUTO_GENERATE_POLICY value.
Setting this in run_kubernetes_nv_tests.sh is too late as
gha-run.sh calls into run_tests, setup.sh, and then into
create_common_genpolicy_settings() where the rules.rego and
genpolicy-settings file are being copied to the right locations.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Set the attestation policy for GPU0 to affirming. This requires
the GPU, for instance, to have production properties, such as
properly signed VBIOS firmware.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
As some reasons that this CI is continuously failed, we'd like to
temporarily skip it for the s390x platform. And it will be enabled
when we addressed related issues.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit updates the non-TEE tests to disable two specific test
cases: `k8s-number-cpus.bats` and `k8s-sandbox-vcpus-allocation.bats`.
These tests are designed to cover CPU elasticity/dynamic scaling
capabilities. In the non-TEE scenario, we are enforcing the disabling of
this capability by setting the default configuration to
`static_sandbox_resource_mgmt=true`.
Although the tests currently pass, allowing them to run is logically
inconsistent with the intended non-TEE configuration. Therefore, we are
disabling them for all non-TEE runtimes, specifically targeting:
- `qemu-coco-dev`
- `qemu-coco-dev-runtime-rs`
This change ensures that our non-TEE CI accurately reflects the static
resource management policy and prevents misleading test results.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As runtime-rs doesn't support block device hotplug in s390 arch,
with this fact, we just disable or skip the test when it is the
s390.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Enable the cpu hotplug tests within the k8s-number-cpus.bats for both
cloud-hypervisor and qemu-runtime-rs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We have support cpu hotplug features within dragonball and clh, this
commit is to enable the test within the CI.
Fixes: #8660
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
As previous failure within the case, we choose to skip it, but now
the cpu hotplug has been corrected, and it's time to re-enable it.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This reverts commit 5a81b010f2, as we now
have all the infrastructure properly set up as part of our CI node.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've made the pods require a ridiculous amount of memory, just for the
sake of getting them running.
Now that those are running, tests are passing, CI is required, let's
work to lower the amount of mmemory needed as everything else is working
as expected.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Clean-up shellcheck warnings:
SC2030 (info): Modification of cmd_out is local (to subshell caused by (..) group).
SC2031 (info): cmd_out was modified in a subshell. That change might be lost.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Clean-up shellcheck warnings:
SC2250 (style): Prefer putting braces around variable references even
when not strictly required.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Let's add a simple backup and restore logic for the CDI configuration
file nvidia.com-pgpu.yaml in the k8s-nvidia-*.bats and
k8s-confidential-attestation.bats test files.
Althought not optimal, this is a temporary workaround needed until
NVIDIA releases what's needed for the GPU Operator to properly deal with
cold plugged devices for the Confidential Containers cases, which is
work in progress right now.
After that's released, we can revert/drop this patch.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add the attestation bats test case to the NVIDIA CI and provide a
second pod manifest for the attestation test with a GPU. This will
enable composite attestation in a subsequent step.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This brings two fixes:
- use the test_key variable to check against the aatest value.
- properly check the run command invocation (run w/o bash does not
seem to like the pipe which leads to ALWAYS evaluating the
status result to 1. With this, the deny-all test would ALWAYS
succeed regardless of whether aatest was actually returned or not.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
When running these tests repeatedly locally, the default policy is not
being reset after the test completes, then subsequent runs fail.
Similar to k8s-sealed-secrets.bats, we set the default policy in an if
condition.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This allows setting a GPU0 resource policy, enabling GPU
attestation tests to not use the default resource policy.
For now, the policy requires attestation's ear status to
not be contraindicated. In a future change we will require
this to be affirming once our CI runners' vBIOS version is
properly configured.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This enables attestation tests to figure out whether composite
attestation with a GPU can be executed.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Add the NVIDIA TEE hypervisors. With this, attestation tests can be run
against the NVIDIA handlers, for instance.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
As tags are mutable and digests are not, lets pin our image
by digest to give our CI a better chance of stability
Signed-off-by: stevenhorsman <steven@uk.ibm.com>