During the ${wait_time} wait for an expected condition, if
CreateContainerRequest was NOT expected to fail, detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful job with auto-generated policy in 107111ms
ok 2 Policy failure: unexpected environment variable in 7920ms
ok 3 Policy failure: unexpected command line argument in 7874ms
ok 4 Policy failure: unexpected emptyDir volume in 7823ms
ok 5 Policy failure: unexpected projected volume in 7812ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7903ms
ok 7 Policy failure: unexpected UID = 222 in 7720ms
After this change:
not ok 1 Successful job with auto-generated policy in 10271ms
ok 2 Policy failure: unexpected environment variable in 8018ms
ok 3 Policy failure: unexpected command line argument in 7886ms
ok 4 Policy failure: unexpected emptyDir volume in 7621ms
ok 5 Policy failure: unexpected projected volume in 7843ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7632ms
ok 7 Policy failure: unexpected UID = 222 in 7619ms
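The early abort works roughly like the sketch below. This is a minimal illustration only: the helper name, the seconds-based ${wait_time}, and the "CreateContainerRequest failed" pattern are assumptions, not the actual bats library code.
```
# Wait for the pod to become Ready, but bail out as soon as an unexpected
# CreateContainerRequest failure shows up, instead of burning the whole
# ${wait_time} (in seconds).
wait_ready_or_abort_on_ccr_failure() {
    local pod="$1"
    local wait_time="$2"
    local sleep_time=5
    local elapsed=0

    while [ "${elapsed}" -lt "${wait_time}" ]; do
        # The expected condition was reached: stop waiting.
        kubectl wait --for=condition=Ready --timeout=0 "pod/${pod}" 2>/dev/null && return 0

        # CreateContainerRequest was NOT expected to fail, so any such
        # failure means the test cannot succeed anymore: abort the wait.
        # (The grep pattern is illustrative.)
        if kubectl describe "pod/${pod}" | grep -q "CreateContainerRequest failed"; then
            echo "unexpected CreateContainerRequest failure for pod ${pod}" >&2
            return 1
        fi

        sleep "${sleep_time}"
        elapsed=$((elapsed + sleep_time))
    done

    return 1
}
```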
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} wait for an expected condition, if
CreateContainerRequest was NOT expected to fail, detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
ok 1 Successful sc deployment with auto-generated policy and container image volumes in 14769ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 8384ms
not ok 3 Successful sc deployment with security context choosing another valid user in 136149ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 8862ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7941ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11612ms
After:
ok 1 Successful sc deployment with auto-generated policy and container image volumes in 15230ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 9364ms
not ok 3 Successful sc deployment with security context choosing another valid user in 11060ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 9124ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7919ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11666ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
During the ${wait_time} wait for an expected condition, if
CreateContainerRequest was NOT expected to fail, detect possible
CreateContainerRequest failures early and abort the wait.
For example, before this change:
not ok 1 Successful pod with auto-generated policy in 110801ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 94104ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 95838ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 110712ms
ok 5 Policy failure: unexpected container image in 8113ms
ok 6 Policy failure: unexpected privileged security context in 7943ms
ok 7 Policy failure: unexpected terminationMessagePath in 11530ms
ok 8 Policy failure: unexpected hostPath volume mount in 7970ms
ok 9 Policy failure: unexpected config map in 7933ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 112677ms
ok 11 RuntimeClassName filter: no policy in 2302ms
not ok 12 ExecProcessRequest tests in 93946ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 94003ms
ok 14 Policy failure: unexpected UID = 0 in 8016ms
ok 15 Policy failure: unexpected UID = 1234 in 7850ms
After:
not ok 1 Successful pod with auto-generated policy in 12182ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 10121ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 11738ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 26592ms
ok 5 Policy failure: unexpected container image in 7742ms
ok 6 Policy failure: unexpected privileged security context in 7949ms
ok 7 Policy failure: unexpected terminationMessagePath in 7789ms
ok 8 Policy failure: unexpected hostPath volume mount in 7887ms
ok 9 Policy failure: unexpected config map in 7818ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 9120ms
ok 11 RuntimeClassName filter: no policy in 2081ms
not ok 12 ExecProcessRequest tests in 9883ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 9870ms
ok 14 Policy failure: unexpected UID = 0 in 11161ms
ok 15 Policy failure: unexpected UID = 1234 in 7814ms
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We've seen a few cases where the test fails due to a timeout, and when we
print the pods we see that they have only just been created.
With that in mind, let's increase the timeout a little bit.
Example:
```
not ok 1 Parallel jobs in 6250ms
(in test file k8s-parallel.bats, line 41)
`kubectl wait --for=condition=Ready --timeout=$timeout pod -l jobgroup=${job_name}' failed
No resources found in kata-containers-k8s-tests namespace.
[bats-exec-test:71] INFO: k8s configured to use runtimeclass
job.batch/process-item-test1 created
job.batch/process-item-test2 created
job.batch/process-item-test3 created
NAME STATUS COMPLETIONS DURATION AGE
process-item-test1 Running 0/1 0s
process-item-test2 Running 0/1 0s
process-item-test3 Running 0/1 0s
error: no matching resources found
No resources found in kata-containers-k8s-tests namespace.
No resources found in kata-containers-k8s-tests namespace.
DEBUG: system logs of node 'aks-nodepool1-25989463-vmss000000' since test start time (2025-11-01 16:39:03)
-- No entries --
job.batch "process-item-test1" deleted
job.batch "process-item-test2" deleted
job.batch "process-item-test3" deleted
```
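The shape of the tweak is sketched below; the variable name and the values are illustrative, not the exact ones from k8s-parallel.bats (only the kubectl command appears in the log above).
```
# Give slow-to-schedule pods more headroom before failing the test.
timeout="120s"   # previously shorter, occasionally expiring while the pods were still being created
kubectl wait --for=condition=Ready --timeout="${timeout}" pod -l "jobgroup=${job_name}"
```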
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We have two tests running on GitHub-provided runners:
* devmapper
* CRI-O
- devmapper situation
For devmapper, we're already covering it as part of one of the s390x CI
jobs.
More than that, this test has been failing here due to a lack of space
on the machine for quite some time, and no action was taken to bring it
back either via GARM or some other way.
With that said, let's rely on the s390x CI to test devmapper and avoid
one extra failure on our CI by removing this one.
- cri-o situation
CRI-O is being tested with a fixed version of kubernetes that's already
reached its EOL, and a CRI-O version that matches that k8s version.
There have been attempts to raise issues, and also to provide a PR that
does at least part of the work ... leaving the debugging part to the
maintainers of the CI. However, the maintainers took no action on those.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Change NIM bats file logic to allow skipping test cases which
require multiple GPUs. This can be helpful for test clusters where
there is only one node with a single GPU, or for local test
environments with a single-node cluster with a single GPU.
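A minimal sketch of the skip logic follows; deriving the GPU count from node allocatable resources, the jsonpath query, and the variable names are assumptions, not the actual NIM bats code.
```
# Skip multi-GPU test cases when the cluster only exposes a single GPU.
setup() {
    # Sum nvidia.com/gpu allocatable across all nodes (illustrative query).
    gpu_count=$(kubectl get nodes \
        -o jsonpath='{range .items[*]}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}' \
        | awk '{s += $1} END {print s + 0}')

    if [ "${gpu_count:-0}" -lt 2 ]; then
        skip "test requires multiple GPUs, only ${gpu_count} available"
    fi
}
```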
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
If a ConfigMap has more than 8 files it will not be mounted watchable
[1]. However, genpolicy assumes that ConfigMaps are always mounted at a
watchable path, so containers with large ConfigMap mounts fail
verification.
This commit allows mounting ConfigMaps from watchable and non-watchable
directories. ConfigMap mounts can't be meaningfully verified anyway, so
the exact location of the data does not matter, except that we stay in
the sandbox data dirs.
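For reference, the non-watchable case is easy to trigger; the command below is illustrative and not part of the test suite.
```
# A ConfigMap with 9 entries, i.e. one more than the 8-file watchable limit,
# ends up mounted from the non-watchable path.
kubectl create configmap big-cm \
    --from-literal=key1=v --from-literal=key2=v --from-literal=key3=v \
    --from-literal=key4=v --from-literal=key5=v --from-literal=key6=v \
    --from-literal=key7=v --from-literal=key8=v --from-literal=key9=v
```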
[1]: 0ce3f5fc6f/docs/design/inotify.md (L11-L21)

Fixes: #11777
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Auto-generate policy in k8s-optional-empty-secret.bats, now that
genpolicy supports optional secret-based volumes.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The new initdata variants of the tests are failing on the tdx
runner, so as discussed, skip them for now: Issue #11945
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Now that we have wider coverage of initdata testing in
k8s-guest-pull-image-signature.bats, remove the old testing.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Our current set of authenticated registry tests involves setting
kernel_params to configure the image pull process, but as of
kata-containers#11197
this approach is no longer the main way to set this configuration and the
agent config has been removed. Instead we should set the configuration in
the `cdh.toml` part of the initdata, so add new test cases for this. In the
future, once we have been through the deprecation process, we should remove
the old tests.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Our current set of signature tests involves setting kernel_params to
configure the image pull process, but as of
https://github.com/kata-containers/kata-containers/pull/11197
this approach is no longer the main way to set this configuration and the
agent config has been removed. Instead we should set the configuration in
the `cdh.toml` part of the initdata, so add new test cases for this. In the
future, once we have been through the deprecation process, we should remove
the old tests.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Create a shared get_initdata method that injects a cdh image
section, so we don't duplicate the initdata structure everywhere
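A minimal sketch of the idea follows; the initdata TOML layout and the example key name are assumptions, and the real helper in the tests may build the document differently.
```
# Build an initdata document with a caller-provided cdh.toml [image] section,
# so each test only supplies the image-pull settings it cares about.
get_initdata() {
    local cdh_image_section="$1"

    cat <<EOF
algorithm = "sha256"
version = "0.1.0"

[data]
"cdh.toml" = '''
[image]
${cdh_image_section}
'''
EOF
}

# Example use (the key name is illustrative, not from this commit):
# initdata_toml="$(get_initdata 'image_security_policy_uri = "kbs:///default/security-policy/test"')"
```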
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- copy default-initdata.toml in create_tmp_policy_settings_dir, so it can be modified by other tests if needed
- make auto_generate_policy use default-initdata.toml by default
- add auto_generate_policy_no_added_flags, so it may be used by tests that don't want to use default-initdata.toml by default
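Illustrative usage in a test, with the helper names taken from this commit; the exact arguments and variable names are assumptions.
```
# Copies default-initdata.toml into the per-test settings dir (per this change).
policy_settings_dir="$(create_tmp_policy_settings_dir "${pod_config_dir}")"

# Picks up default-initdata.toml by default.
auto_generate_policy "${policy_settings_dir}" "${pod_yaml}"

# Opt out for tests that don't want the default initdata flags.
auto_generate_policy_no_added_flags "${policy_settings_dir}" "${pod_yaml}"
```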
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Now that we have added the ability to deploy kata-containers with
experimental_force_guest_pull configured, let's make sure we test it to
avoid any kind of regressions.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There's no reason to have the code duplication between the SNP / TDX
tests for CoCo, as those are basically using the same configuration
nowadays.
Note that in the TEEs case the nydus-snapshotter is deployed once by the
admin, instead of being deployed on every run ... so I'm actually removing
the nydus-snapshotter steps to make it clear that those steps are not
performed by the CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We already have support for deploying a few flavours of k8s that are
required for different tests we perform.
Let's also add the ability to deploy vanilla k8s, as that will be very
useful in the next commits in this series.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Use grep_pod_exec_output to retry possibly failing "kubectl exec"
commands. Other tests have been hitting such errors during CI in
the past.
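The retry idea, as a minimal sketch rather than the real grep_pod_exec_output implementation; the function name, argument order and retry counts are illustrative.
```
# Retry "kubectl exec" a few times and grep its output, so a transient exec
# failure doesn't fail the whole test.
retry_exec_and_grep() {
    local pod="$1" pattern="$2"
    shift 2

    local attempt
    for attempt in 1 2 3 4 5; do
        if kubectl exec "${pod}" -- "$@" 2>/dev/null | grep -q -- "${pattern}"; then
            return 0
        fi
        sleep 3
    done
    return 1
}

# e.g.: retry_exec_and_grep "${pod_name}" "hello" cat /tmp/greeting
```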
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This adds an integration test to verify that privileged containers work
properly when deploying Kata with kata-deploy.
This is a follow-up to #11878.
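A rough shape of the scenario being exercised; the runtime class name, image and manifest details are assumptions, not the test's actual YAML.
```
# A privileged container in a Kata pod deployed through kata-deploy.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: privileged-test
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: c
    image: quay.io/prometheus/busybox:latest
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true
EOF
```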
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This allows us to stop setting up the snapshotter ourselves, and just
rely on kata-deploy to do so.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Introduce new test case which verifies that openvpn clients and servers
can run as Kata pods and can successfully establish a connection.
Ephemeral certificates and keys are generated by an initialization
container and injected into the client and server containers.
This scenario requires TUN/TAP support in the UVM kernel.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
Co-authored-by: Manuel Huber <manuelh@nvidia.com>
No need to die when a Kind that does not require a policy annotation is
found in a pod manifest. Print an informational message instead.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
This change crystallizes and simplifies the current handling of /dev
hostPath mounts with virtually no functional change.
Before this change:
- If a mount DESTINATION is in /dev and it is a non-regular file on the HOST,
the shim passes the OCI bind mount as is to the guest (e.g.
/dev/kmsg:/dev/kmsg). The container rightfully sees the GUEST device.
- If the mount DESTINATION does not exist on the host, the shim relies on
k8s/containerd to automatically create a directory (i.e. a non-regular file) on
the HOST. The shim then also passes the OCI bind mount as is to the guest. The
container rightfully sees the GUEST device.
- For other /dev mounts, the shim passes the device major/minor to the guest
over virtio-fs. The container rightfully sees the GUEST device.
After this change:
- If a mount SOURCE is in /dev and it is a non-regular file on the HOST,
the shim passes the OCI bind mount as is to the guest. The container
rightfully sees the GUEST device.
- The shim no longer relies on k8s/containerd to create missing mount
directories. Instead it explicitly handles missing mount SOURCES, and
treats them as in the previous bullet point.
- The shim no longer uses virtio-fs to pass /dev device major/minor to the
guest, instead it passes the OCI bind mount as is.
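Using the /dev/kmsg example from above, this is the kind of mount affected; the manifest is illustrative (runtime class name and image are assumptions) and not part of this change.
```
# A hostPath volume whose SOURCE is a non-regular file under /dev: the shim
# now passes the OCI bind mount through as is, and the container rightfully
# sees the GUEST /dev/kmsg.
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kmsg-reader
spec:
  runtimeClassName: kata-qemu
  containers:
  - name: c
    image: quay.io/prometheus/busybox:latest
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: kmsg
      mountPath: /dev/kmsg
  volumes:
  - name: kmsg
    hostPath:
      path: /dev/kmsg
EOF
```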
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Auto-generate policy for nginx-deployment pods, instead of hard-coding
the "allow all" policy.
Note that the `busybox_pod` - created using `kubectl run` - still
doesn't have an Init Data annotation, so it is using the default policy
built into the Kata Guest rootfs image file.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Auto-generate agent policy in k8s-liveness-probes.bats, instead of using
the non-confidential "allow all" policy.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Auto-generate the agent policy for pod-secret-env.yaml, using
"genpolicy -c inject_secret.yaml".
Support for passing Secret specification files as "-c" arguments of
genpolicy was added when fixing #10033 with PR #10986.
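The invocation follows the pattern below; only the "-c inject_secret.yaml" part is from this commit, and the "-y" input flag is an assumption about the genpolicy CLI spelling used by the test.
```
# Generate the policy for the pod YAML while giving genpolicy the Secret
# specification it needs for the secret-sourced environment variables.
genpolicy -c inject_secret.yaml -y pod-secret-env.yaml
```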
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
-o pipefail in particular ensures that exec_host() returns the right exit
code.
-u is also added for good measure. Note that $BATS_TEST_DIRNAME is set by
bats so we move its usage inside the function.
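A tiny illustration of why pipefail matters here (not the real exec_host() body):
```
# Without pipefail the pipeline's exit status is that of the LAST command,
# so a failing command earlier in the pipeline would still look successful.
demo_pipefail() {
    set -o pipefail
    false | tr -d '\r'   # exit status 1 with pipefail, 0 without it
}
demo_pipefail || echo "pipeline failure propagated"
```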
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The Hadolint warning DL3007 (pin the version explicitly) is no
longer applicable.
We have updated the base image to use a specific version
digest, which satisfies the linter's requirement for reproducible
builds. This commit removes the corresponding inline ignore comment.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>