Explain why the containerd settings on the local machine get set to
containerd's defaults when testing GENPOLICY_PULL_METHOD=containerd.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
oci-distribution is the value used by run-k8s-tests-on-aks.yaml, so
use the same value as the default for GENPOLICY_PULL_METHOD in gha-run.sh.
The value of GENPOLICY_PULL_METHOD is currently only compared with
"containerd", but using the same default in gha-run.sh avoids possible
future problems caused by the two values diverging.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
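A minimal sketch of the intended default handling in gha-run.sh; the containerd-reset helper name is hypothetical:
```
# Default matches the value used by run-k8s-tests-on-aks.yaml.
GENPOLICY_PULL_METHOD="${GENPOLICY_PULL_METHOD:-oci-distribution}"

if [[ "${GENPOLICY_PULL_METHOD}" == "containerd" ]]; then
    # Reset the local containerd config to its defaults for this pull method.
    reset_containerd_config_to_defaults   # hypothetical helper
fi
```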
Currently, it is not viable to share a writable volume (e.g., emptyDir)
between containers in a single pod for IBM SE.
The following tests are relevant:
- pod-shared-volume.bats
- k8s-empty-dirs.bats
(See: https://github.com/kata-containers/kata-containers/issues/10002)
This commit skips the tests until the issue is resolved.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
To run a KBS with the `se-verifier` in service,
specific credentials need to be configured.
(See https://github.com/confidential-containers/trustee/tree/main/attestation-service/verifier/src/se for details.)
This commit introduces the following procedures to support IBM SE attestation:
- Prepare the required files and directory structure
- Set the necessary environment variables for the KBS deployment
- Repackage a secure image once the KBS service address is determined
These changes enable `k8s-confidential-attestation.bats` for s390x.
Fixes: #9933
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current KBS deployment creates a file `key.bin` assuming that
`kustomization.yaml` is located in `overlays/`.
However, this does not hold true when the kustomize config is enabled
for multiple architectures. In such cases, the configuration file
should be located in `overlays/$(uname -m)`.
This commit changes the location for file creation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Forcibly delete test scripts in the `Delete kata-deploy` step before
deleting all kata pods.
Fixes: #9980
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
The sealed secret test depends on the KBS to provide
the unsealed value of a vault secret.
This secret is provisioned to an environment variable.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
- Due to the error we hit with pulling the agnhost
image used in the liveness-probe tests, we want to leave
the console printing in to help with debugging when we next try
to bump the image-rs version
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test was fixed by previous patches in this PR: kata-containers#9695
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This test was fixed by previous patches in this PR: kata-containers#9695
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
We have not seen instances of the nydus snapshotter hanging on its
deletion that would require us to patch its finalizer.
Let's just drop this line for now.
Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As part of archiving the tests repo, we are eliminating the dependency on
`clone_tests_repo()`. The scripts using the function are:
- `ci/install_rust.sh`
- `ci/setup.sh`
- `ci/lib.sh`
This commit removes or replaces these files and makes the corresponding adjustments.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A multi-arch image for `alpine-bash-curl` has been pushed to
`quay.io/kata-containers` and is now available there.
This commit switches the test image to `quay.io/kata-containers/alpine-bash-curl`.
Fixes: #9935
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- We only want to enable the sample verifier in the KBS for non-TEE
tests, to avoid an edge case where the TEE platform isn't set up
correctly and we might fall back to the sample verifier and get false
positives. To prevent this we add guards around the sample policy
enablement and only run it for non-confidential hardware
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add the --layers-cache-file-path flag to allow the user to
specify where the cache file for the container layers
is saved. This allows, for example, having one cache file
independent of the user's working directory.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
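A hypothetical invocation illustrating the new flag; the `--yaml-file` input option is assumed here:
```
# Keep the layers cache in a fixed location, independent of the working directory.
genpolicy --layers-cache-file-path "${HOME}/.cache/genpolicy/layers-cache.json" \
    --yaml-file pod.yaml
```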
The following error was observed during the deployment of nydus snapshotter:
```
Error from server (NotFound):
the server could not find the requested resource ( pods/log nydus-snapshotter-5v82v)
'kubectl logs nydus-snapshotter-5v82v -n nydus-system' failed after 3 tries
Error: Process completed with exit code 1.
```
This error can occur when a pod is re-created by a daemonset during the retry interval.
This commit addresses the issue by using `--selector` rather than the pod name
for `kubectl logs/describe`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
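A sketch of the selector-based variant; the `app=nydus-snapshotter` label is an assumption:
```
# Select by label so the command still works if the daemonset re-created the pod.
kubectl logs --selector "app=nydus-snapshotter" -n nydus-system
kubectl describe pods --selector "app=nydus-snapshotter" -n nydus-system
```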
The following tests are disabled because they fail (as with dragonball):
- k8s-cpu-ns.bats
- k8s-number-cpus.bats
- k8s-sandbox-vcpus-allocation.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR uses the nodeport deployment from upstream trustee.
To keep our deployment as close to upstream trustee as possible,
replace the custom nodeport handling with the nodeport
kustomize flavour from the trustee project.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR makes general improvements, such as defining variables and
using them consistently, to the general setup script for the kubernetes
tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Frequent errors have been observed during k8s e2e tests:
- The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
- Error from server (ServiceUnavailable): the server is currently unable to handle the request
- Error from server (NotFound): the server could not find the requested resource
These errors can be resolved by retrying the kubectl command.
This commit introduces a wrapper function in common.sh that runs kubectl up to 3 times
with a 5-second interval. Initially, this change only covers gha-run.sh for Kubernetes.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
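A minimal sketch of such a wrapper; the function name and call site are illustrative:
```
# Retry kubectl up to 3 times with a 5-second interval to ride out transient
# API server errors (connection refused, ServiceUnavailable, NotFound, ...).
kubectl_retry() {
    local i
    for i in 1 2 3; do
        kubectl "$@" && return 0
        [[ "${i}" -lt 3 ]] && sleep 5
    done
    return 1
}

kubectl_retry get pods -n kata-containers-k8s-tests
```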
It seems I was very loose in disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very loose in disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very loose in disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very loose in disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very loose in disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop skipping the CDH tests for TDX, as now we should have an
environment where they can run and should pass. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We must ensure we use the host IP to connect to the PCCS running on the
host side, instead of using localhost (which has a different meaning
from inside the KBS pod).
The reason we're using `hostname -i` instead of the helper functions is
that the helper functions need the coco-kbs deployed in order to work,
and this step happens before the deployment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is generally useful debug output on test failures,
and specifically this has been useful for nydus-related
issues recently.
Signed-off-by: Chris Porter <porter@ibm.com>
A new PULL_TYPE environment variable is recognized by kata-deploy's
install script to allow it to configure CRI-O for the guest-pull image
pull type.
The tests/integration/kubernetes/gha-run.sh change allows for testing it:
```
export PULL_TYPE=guest-pull
cd tests/integration/kubernetes
./gha-run.sh deploy-k8s
```
Fixes: #9474
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR replaces the name with a variable that is already defined,
to have better uniformity across the general script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
It was observed that the `node_start_time` value is sometimes empty,
leading to a test failure.
This commit retries fetching the value up to 3 times.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This test fails when using `shared_fs=none` with the nydus snapshotter.
Issue tracked here: #9666
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test is failing due to the initContainers not being properly
handled with the guest image pulling.
Issue tracked here: #9668
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test fails when using `shared_fs=none` with the nydus snapshotter.
Issue tracked here: #9664
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
To prevent download failures caused by high traffic to the Docker Hub image,
opt for quay.io/prometheus/busybox:latest over docker.io/library/busybox:latest.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
CreateContainerRequest objects can specify devices to be created inside
the guest VM. This change ensures that requested devices have a
corresponding entry in the PodSpec.
Devices that are added to the pod dynamically, for example via the
Device Plugin architecture, can be allowlisted globally by adding their
definition to the settings file.
Fixes: #9651
Signed-off-by: Markus Rudy <mr@edgeless.systems>
A file should not end with a `---` mark. This can confuse tools like
yq and kubectl, which might think it starts another object.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.
Fixes: #9354
Depends-on: github.com/kata-containers/tests#5818
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Wainer noticed this is failing for the coco-qemu-dev case, and decided
to skip it, notifying me that he didn't fully understand why it was not
failing on TDX.
Turns out, though, this is also failing on TDX, and we need to skip it
there as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add the CLI flag --runtime-class-names, which is used during
policy generation. For resources that can define a
runtimeClassName (e.g., Pods, Deployments, ReplicaSets, ...),
the value must have one of the --runtime-class-names as a
prefix; otherwise the resource is ignored.
This makes it possible to run genpolicy on larger yaml
files that define many different resources and to generate
a policy only for the resources which will be deployed in a
confidential context.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
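A hypothetical invocation; passing multiple values as repeated flags and the `--yaml-file` option are assumptions here:
```
# Only resources whose runtimeClassName starts with one of the given
# prefixes end up in the generated policy; everything else is ignored.
genpolicy --runtime-class-names kata-cc --runtime-class-names kata-cc-tdx \
    --yaml-file all-resources.yaml
```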
Use the variable BATS_TEST_COMPLETED, which is defined by the bats framework
when the test finishes. `BATS_TEST_COMPLETED=` (empty) means the test failed,
so the node syslogs will be printed only in that case.
Fixes: #9750
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
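A sketch of the teardown logic; the `print_node_journal` helper is hypothetical:
```
teardown() {
    # bats sets BATS_TEST_COMPLETED only when the test body finished;
    # if it is empty, the test failed, so dump the node syslogs.
    if [[ -z "${BATS_TEST_COMPLETED:-}" ]]; then
        print_node_journal   # hypothetical helper printing the node syslogs
    fi
}
```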
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9667
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's cloning the nydus-snapshotter repo at the version specified in
versions.yaml; however, the deployment files are set to pull in the
latest version of the snapshotter image. With this change we are
pinning the image version too.
This is a temporary fix, as it should be better worked out on the
nydus-snapshotter project side.
Fixes: #9742
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9668
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9666
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9664
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Unlike other tests that I've seen failing on this scenario, k8s-seccomp.bats
fails after a couple of consecutive executions, so it's that kind of failure
that happens once in a while.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This will enable the k8s tests to leverage guest pulling when
PULL_TYPE=guest-pull for qemu-coco-dev runtimeclass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The runtime handler annotation is required for Kubernetes <= 1.28 and
guest-pull pull type. So leverage $PULL_TYPE (which is exported by CI jobs)
to conditionally apply the annotation.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This takes a few minutes that could be saved, so let's avoid doing this
on all the platforms and simply do it when it's needed (the baremetal
use case).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently only "baremetal" runs all the tests, but we could easily run
"all" locally or using the github provided runners, even when not using
a "baremetal" system.
The reason I'd like to have a differentiation between "all" and
"baremetal" is that "baremetal" may require some cleanup, which "all"
can simply skip if testing against a freshly created VM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For now we've only exposed the option to deploy kata-deploy for k3s and
vanilla kubernetes when using containerd.
However, I do need to also deploy k0s and rke2 for an internal CI, and
having those exposed here does not hurt, and allows us to easily expand the
CI at any time in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The k8s test suite halts on the first failure, i.e., it fails fast. This
isn't the behavior that we used to see when running tests on Jenkins, and it
seems that running the entire test suite is still the most productive way. So
this disables fail-fast by default.
However, if you still wish to run in fail-fast mode, just export
K8S_TEST_FAIL_FAST=yes in your environment.
Fixes: #9697
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
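A minimal sketch of the toggle, assuming the runner iterates over a `K8S_TEST_UNION` list of bats files:
```
K8S_TEST_FAIL_FAST="${K8S_TEST_FAIL_FAST:-no}"
rc=0

for t in "${K8S_TEST_UNION[@]}"; do
    if ! bats "${t}"; then
        rc=1
        # Keep going by default; only abort on the first failure when requested.
        [[ "${K8S_TEST_FAIL_FAST}" == "yes" ]] && break
    fi
done
exit "${rc}"
```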
This test has failed in confidential runtime jobs. Skip it
until we have a fix.
Fixes: #9663
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
One of our machines is running CentOS 9 Stream, and we could easily
verify that we can build and install the kbs client there, thus we're
expanding the installation script to also support CentOS 9 Stream.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`externals.coco-kbs.toolchain` is not defined, get the rust_version from
`externals.coco-trustee.toolchain` instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR enables the installation and uninstallation of the kbs client
as well as general coco components needed for the TDX GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the use of custom Intel DCAP configuration when
deploying the KBS.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This test doesn't fail with the guest image pulling, but it for sure
should. :-)
We can see in the bats logs something like:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned kata-containers-k8s-tests/liveness-exec to 984fee00bd70.jf.intel.com
Normal Pulled 23s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 345ms (345ms including waiting)
Normal Started 21s kubelet Started container liveness
Warning Unhealthy 7s (x3 over 13s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 7s kubelet Container liveness failed liveness probe, will be restarted
Normal Pulled 7s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 389ms (389ms including waiting)
Warning Failed 5s kubelet Error: failed to create containerd task: failed to create shim task: the file /bin/sh was not found: unknown
Normal Pulling 5s (x3 over 23s) kubelet Pulling image "quay.io/prometheus/busybox:latest"
Normal Pulled 4s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 342ms (342ms including waiting)
Normal Created 4s (x3 over 23s) kubelet Created container liveness
Warning Failed 3s kubelet Error: failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/f0ec86fb156a578964007f7773a3ccbdaf60023106634fe030f039e2e154cd11/rootfs to /run/kata-containers/liveness/rootfs, with error: ENOENT: No such file or directory: unknown
Warning BackOff 1s (x3 over 3s) kubelet Back-off restarting failed container liveness in pod liveness-exec_kata-containers-k8s-tests(b1a980bf-a5b3-479d-97c2-ebdb45773eff)
```
Let's skip it for now as we have an issue opened to track it down:
https://github.com/kata-containers/kata-containers/issues/9665
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is another one that is related to initContainers not being properly
handled with the guest image pulling.
Let's skip it for now as we have
https://github.com/kata-containers/kata-containers/issues/9668 to track
it down.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to firecracker, which doesn't have support for virtio-fs /
virtio-9p, TDX used with `shared_fs=none` will face the very same
limitations.
The tests affected are:
* k8s-credentials-secrets.bats
* k8s-file-volume.bats
* k8s-inotify.bats
* k8s-nested-configmap-secret.bats
* k8s-projected-volume.bats
* k8s-volume.bats
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This test fails when using `shared_fs=none` with the nydus snapshotter,
and we're tracking the issue here:
https://github.com/kata-containers/kata-containers/issues/9664
For now, let's have it skipped.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The test has been failing on TDX for a while, and an issue has been
created to track it down, see:
https://github.com/kata-containers/kata-containers/issues/9663
For now, let's have it skipped.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure the logs will print the correct annotation and its
value, instead of always mentioning "kernel" and "initrd".
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're doing that as all tests are going to be running with
`shared_fs=none`, meaning that we don't need any specific test for this
case anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will set the annotation needed to enforce that the
image pull is handled by the snapshotter set for the runtime
handler, instead of the default one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
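A sketch of what such a helper could look like; the annotation key, function name, and yq expression are assumptions here:
```
# Hypothetical sketch: annotate the pod so containerd routes the image pull
# through the snapshotter configured for the Kata runtime handler, instead
# of the default snapshotter.
set_image_pull_annotation() {
    local yaml_file="$1"
    yq -i ".metadata.annotations.\"io.containerd.cri.runtime-handler\" = \"kata-${KATA_HYPERVISOR}\"" \
        "${yaml_file}"
}
```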
Given that we think the containerd -> snapshotter image cache
problems have been resolved by bumping to nydus-snapshotter v0.3.13,
we can try removing the skips to test this out.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- We previously had an expectation for the pause rootfs
to be pulled on the host when we did a guest pull. We weren't
really clear why, but it is plausibly related to the issues we had
with containerd and nydus caching. Now that this is fixed we can begin
to address it by setting shared_fs=none, but let's start with
updating the rootfs host check to ensure it is not higher than expected
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test is called from `tests/integration/run_kubernetes_tests.sh`,
which already ensures that yq is installed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On 1423420, I've mistakenly disabled the tests entirely, for both
non-TEEs and TEEs.
This happened as I didn't realise that `confidential_setup` would take
non-TEEs into consideration. :-/
Now, let me follow up on that and make sure that the tests will be
running on non-TEEs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function is a helper to check whether the KATA_HYPERVISOR being
used is confidential hardware (a TEE) or not, and we can use it to
skip or only run tests on those platforms when needed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's rename it to `is_confidential_runtime_class`, and adapt all the
places where it's called.
The new name provides a better description, leading to a better
understanding of what the function really does.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's skip the policy addition for now, in order to get the TDX CI back
up and running, and then we can re-enable it as soon as we get
https://github.com/kata-containers/kata-containers/issues/9612 fixed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's skip those tests on TEEs as we've been facing a reasonable amount
of issues, most likely on the containerd side, related to pulling the
image on the guest.
Once we're able to fix the issues on containerd, we can get back and
re-enable those by reverting this commit.
The decision to disable the tests for TEEs was made because the machines may
end up in a state where human intervention is necessary to get them back
to a functional state, and that's really not optimal for our CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds a k8s negative policy test to the confidential attestation
bats test.
Fixes: #9437
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.
Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.
This change has been tested together with CH v39 in #9588.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
With the addition of the 'qemu-coco-dev' runtimeClass we no longer need
to run CoCo tests on non-TEE environments with 'qemu'. As a result the
tests also no longer need to set the "io.katacontainers.config.hypervisor.image"
annotation to pods.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Enable the k8s tests for cloud hypervisor with devicemapper.
Fixes: #9221.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Co-authored-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Avoid auto-generating Policy on platforms that haven't been tested
yet with auto-generated Policy.
Support for auto-generated Policy on these additional platforms is
coming up in future PRs, so the tests being fixed here were
prematurely enabled.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Remove k8s-policy-set-keys.bats in preparation for using the regorus
crate instead of the OPA daemon for evaluating the Agent Policy. This
test depended on sending HTTP requests to OPA.
Fixes: #9388
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Pods. Additional policy failures
are injected during CI using other types of K8s resources - e.g.,
using Jobs and Replication Controllers - from separate PRs.
Fixes: #9491
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Replication Controllers. Additional
policy failures will be injected using other types of K8s resources
- e.g., using Pods and/or Jobs - in separate PRs.
Fixes: #9463
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
When running on non-TEE environments (e.g. KATA_HYPERVISOR=qemu) the tests should
be stressing the CoCo image (/opt/kata/share/kata-containers/kata-containers-confidential.img)
although currently the default image/initrd is built to be able to do guest-pull as well.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Enabled the guest-pull tests on non-TEE environments. They now require the SNAPSHOTTER
environment variable, to avoid running on jobs where nydus-snapshotter is not installed.
Fixes: #9410
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds a function to uninstall the kbs client command,
especially for when we are running with baremetal devices.
Fixes: #9460
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR defines the PULL_TYPE variable to avoid unbound variable
failures when this is being tested locally.
Fixes: #9453
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Jobs. Additional policy failures
will be injected using other types of K8s resources - e.g., using
Pods and/or Replication Controllers - in future PRs.
Fixes: #9406
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR improves the kbs_k8s_delete function to verify that the
resources were properly deleted for baremetal environments.
Fixes: #9379
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- At the moment we aren't supporting ppc64le or aarch64 for
CoCo, so filter out these tests from running
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Add basic test case to check that a running
pod can use the api-server-rest (and attestation-agent
and confidential-data-hub indirectly) to get a resource
from a remote KBS
Fixes: #9057
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Co-authored-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Use the "allow all" policy for k8s-sandbox-vcpus-allocation.bats,
instead of relying on the Kata Guest image to use the same policy
as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-nginx-connectivity.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-volume.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-sysctls.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-security-context.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-seccomp.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-projected-volume.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-pod-quota.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-optional-empty-secret.bats,
instead of relying on the Kata Guest image to use the same policy as
its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-measured-rootfs.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-liveness-probes.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-inotify.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-guest-pull-image.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-footloose.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-empty-dirs.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Check from:
- k8s-exec-rejected.bats
- k8s-policy-set-keys.bats
whether policy testing is enabled or not, to reduce the complexity of
run_kubernetes_tests.sh. After these changes, there are no policy
specific commands left in run_kubernetes_tests.sh.
add_allow_all_policy_to_yaml() is moving out of run_kubernetes_tests.sh
too, but it is not used yet. It will be used in future commits.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Don't add the "allow all" policy to all the test YAML files anymore.
After this change, the k8s tests assume that all the Kata CI Guest
rootfs image files either:
- Don't support Agent Policy at all, or
- Include an "allow all" default policy.
This reliance/assumption will be addressed in a future commit.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The service might not listen on the default port; use the full service
address to ensure we are talking to the right resource.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This can be used on kcli or other systems where cluster nodes are
accessible from all places where the tests are running.
Fixes: #9272
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Since v2.2.6 it can detect TDX guests on Azure, so let's bump it even if
Azure peer-pods are not currently used as part of our CI.
Fixes: #9348
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the unbound variable errors hit when trying to run
the setup script locally.
Fixes: #9328
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the journal log names for kubernetes artifacts
in order to make sure that we have different names when we are
running parallel GHA jobs.
Fixes: #9308
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR terminates the nydus namespace to avoid the error saying
that the flag needs an argument.
Fixes: #9264
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Those two architectures are not TEE capable, thus we can just skip
running those tests there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed as the TDX machine is hosted inside Intel and relies on
proxies in order to connect to the external world. Not having those set
causes issues when pulling the image inside the guest.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a test case for pulling an image inside the guest for confidential
containers.
Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
This commit updates `filter_k8s_test.sh` to handle skipped tests that
include comments. In addition to the existing parameter expansion,
the following expansions have been added:
- Removal of a comment
- Stripping of trailing spaces
Fixes: #9304
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
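A sketch of the added parameter expansions, applied to each skipped-test entry (variable name illustrative):
```
entry='k8s-empty-dirs.bats   # see the skip reason in the test file   '
entry="${entry%%#*}"                           # drop an inline comment
entry="${entry%"${entry##*[![:space:]]}"}"     # strip trailing spaces
echo "[${entry}]"                              # -> [k8s-empty-dirs.bats]
```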
Initialize $GITHUB_ENV to avoid nounset error when running the scripts locally
out of Github Actions.
Fixed commit 9ba5e3d2a8
Fixes: #9217
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Recently the confidential-containers/kbs repository was renamed to
confidential-containers/trustee. Github will automatically resolve the
old URL, but we'd better adjust it in the code.
The trustee repository will be cloned to $COCO_TRUSTEE_DIR. Adjusted
file paths and pushd/popd's to use $COCO_KBS_DIR
($COCO_TRUSTEE_DIR/kbs).
In versions.yaml, changed from `coco-kbs` to `coco-trustee`, as in the
future we might need other trustee components, so keeping it generic.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added the kbs_set_resources_policy() function to set the KBS policy, as well as
the kbs_set_allow_all_resources() and kbs_set_deny_all_resources() functions to
set the "allow all" and "deny all" policies, respectively.
Fixes: #9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added utility functions to manage resources in KBS:
- kbs_set_resource(), where the resource data is passed via argument
- kbs_set_resource_from_file(), where the resource data is found in a
file
Fixes: #9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added the kbs_install_cli function to build and install the kbs-client
executable if it is not present on the system.
Removed the stub from gha-run.sh; now the kbs-client installation step in
.github/workflows/run-kata-deploy-tests-on-aks.yaml will effectively
install the executable.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added functions to return the service host, port or full-qualified
HTTP address, respectively, kbs_k8s_svc_host(), kbs_k8s_svc_port(),
and kbs_k8s_svc_http_addr().
Fixes: #9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
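A sketch of how the full HTTP address could be composed from the other two helpers; the function body is illustrative:
```
kbs_k8s_svc_http_addr() {
    local host port
    host=$(kbs_k8s_svc_host) || return 1
    port=$(kbs_k8s_svc_port) || return 1
    echo "http://${host}:${port}"
}
```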
The k3s distribution of k8s uses an embedded version of containerd and
configures it to log to a file, not the journal. Hence, although we
collect the journal as a test artifact, we also need to collect the
actual log files for containerd.
Also collect the k3s containerd config files to help with debugging.
Fixes: #9104.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The jobs running on garm will collect journal information. The data gathered
is based on the time the tests started running. The $start_time is
exported on run_tests() and used in collect_artifacts(). It happens that
run_tests() and collect_artifacts() are called on different steps of the
workflow and the environment variables aren't preserved between them,
i.e., $start_time exported in the first step is not available in the
subsequent ones.
To solve that issue, let's save $start_time in the file pointed to by
$GITHUB_ENV, which Github Actions uses to export variables. If $GITHUB_ENV is
empty, it is probably running locally outside of Github, so the start
time value won't be saved.
Fixes: #9217
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
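A minimal sketch of the approach; the journalctl invocation is illustrative:
```
# In run_tests(): record when the tests started.
start_time=$(date '+%Y-%m-%d %H:%M:%S')
export start_time

# Persist it for later workflow steps; skip when running locally ($GITHUB_ENV empty).
if [[ -n "${GITHUB_ENV:-}" ]]; then
    echo "start_time=${start_time}" >> "${GITHUB_ENV}"
fi

# In collect_artifacts(): gather the journal since the tests started.
sudo journalctl --since "${start_time}" > journal.log
```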
Until this point the deployed KBS service is only reachable from within
the cluster. This introduces a generic mechanism to apply an Ingress
configuration to expose the service externally.
The first implemented ingress is for AKS. In case the HTTP application
routing isn't enabled in the cluster (this is required for ingress), an
add-on is applied.
The get_cluster_specific_dns_zone() and
enable_cluster_http_application_routing() helper functions were added
to gha-run-k8s-common.sh.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Introduce the tests/integration/kubernetes/confidential_kbs.sh library
that contains functions to manage the KBS on CI. Initially implemented
the kbs_k8s_deploy() and kbs_k8s_delete() functions to, respectively,
deploy and delete KBS on Kubernetes. Also hooked those functions in the
tests/integration/kubernetes/gha-run.sh script to follow the convention
of running commands from Github Workflows:
$ ./tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
$ ./tests/integration/kubernetes/gha-run.sh delete-coco-kbs
Fixes: #9058
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the collect artifacts function in gha-run script for
the kubernetes tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, the check that kata-deploy is running assumes that the
daemonset scheduled at least one pod; however, it might not have, and then the
kubectl wait command fails due to "error: no matching resources found".
On CI I've observed that failing intermittently. I suspect the service
account kata-deploy-sa takes a while to show up, so no kata-deploy pod is
scheduled in the meantime.
Changed the checker logic to use waitForProcess() to keep testing whether it is
already running, until it either succeeds or hits the timeout (still 10m).
Fixes: #9183
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
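A sketch of the new check, assuming the waitForProcess(wait_time, sleep_time, cmd) helper from the common test library; the label selector and namespace are assumptions:
```
# Keep polling (up to 10m, every 30s) until at least one kata-deploy pod is
# Running, instead of a single 'kubectl wait' that fails when no pod has
# been scheduled yet.
cmd="kubectl -n kube-system get pods --selector name=kata-deploy --field-selector=status.phase=Running 2>/dev/null | grep -q kata-deploy"
waitForProcess 600 30 "$cmd"
```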
Kubernetes v1.29 introduced a new `PodReadyToStartContainers` condition
that gets inserted at index 0 in the conditions array. This means that
the expected `PodCompleted` reason can now be either at index 0 with
kubernetes v1.28 and older or at index 1 starting with kubernetes v1.29.
This is fragile at best, since `kubectl wait` doesn't allow combining
multiple checks. Also, checking the reason is dubious as it doesn't really
tell whether the pods have actually completed or not.
Check the pod phase to be `Succeeded` instead, this guarantees that :
> All containers in the Pod have terminated in success, and will not
> be restarted.
Fixes: #9178
Signed-off-by: Greg Kurz <groug@kaod.org>
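A sketch of the phase check; the pod name and timeout are illustrative:
```
# 'Succeeded' guarantees all containers terminated in success and will not restart.
kubectl wait --timeout=120s --for=jsonpath='{.status.phase}'=Succeeded pod/"${pod_name}"
```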
Changed the "run k8s tests on AKS" workflows to get the CoCo KBS
installed so that we can run attestation tests.
The plan is to run attestation tests only on a subset of non-TEE jobs
initially, so this commit restricts to install KBS only on kata-qemu
configuration. Actually at this point it is added only stubs commands
to tests/integration/kubernetes/gha-run.sh that should be implemented
in a future commit.
Fixes#9058
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
CI failed to deploy the nydus snapshotter because it was not cleaned up last time.
So we can try to clean up the nydus snapshotter before deploying it.
Fixes: #9121
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
teardown() gets executed after each test case, so there is no need to
clean-up before teardown.
Fixes: #9072
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add set_namespace_to_policy_settings() for changing the pod namespace
in genpolicy settings.
Fixes: #9072
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This has been introduced by 53bc4a432b,
where the condition was changed.
The correct condition is:
* If the list of supported tees does not contain the kata hypervisor
and the list of supported non tees does not contain the kata
hypervisor.
The error is that we were checking whether kata-hypervisor would contain
the list of supported tees, and that would almost always be false
(unless the list had one and only one element).
Fixes: #9055 -- part II
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The `vcpu allocation k8s test` exhibits different behavior on s390x.
For more details, please refer to issue #9093.
This commit skips the test until the issue is resolved on
the platform.
Fixes: #9093
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR adds the ability to run k8s confidential tests in a
non-TEE environment.
Fixes: #9055
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will ensure no leftovers remain on the node, which has been causing the
TDX CI to fail every now and then.
Fixes: #9081
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Delete the debugger pod created during the test, rather than already
existing debugger pods.
Also, send the output of "kubectl delete" to stderr, just in case it's
useful for debugging.
Fixes: #9069
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We can use a daemonset to deploy the nydus snapshotter, which removes
one manual step for both the Kata Containers and Confidential Containers CI.
Fixes: #8584
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR replaces the add_kernel_initrd_annotations_to_yaml function
with a more generic one so it can later be used for other components.
Fixes: #9054
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>