Currently, `qemu-runtime-rs` does not support `virtio-scsi`,
which causes the `k8s-block-volume.bats` test to fail.
We should skip this test until `virtio-scsi` is supported by the runtime.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The nginx container seems to error out when using UID=123.
Depending on the timing between container initialization and "kubectl
wait", the test might have gotten lucky and found the pod briefly in
Ready state before nginx errored out. But on some of the nodes, the pod
never got reported as Ready.
Also, don't block in "kubectl wait --for=condition=Ready" when wrapping
that command in a waitForProcess call, because waitForProcess is
designed for short-lived commands.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the fast footprint script to remove the use
of egrep as this command has been deprecated and change it
to use grep command.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This imports the k8s-block-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.
This is a follow-up to #7132.
Fixes: #7164
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The ita kustomization for Trustee, as well as previously used one
(DCAP), doesn't have a $(uname -m) directory after the deployment
directory name.
Let's follow the same logic used for the deploy-kbs script and clean
those up accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Intel Tiber Trust Services (formerly known as Intel Trust Authority) is
Intel's own attestation service, and we want to take advantage of the
TDX CI in order to ensure ITTS works as expected.
In order to do so, let's replace the former method used (DCAP) to use
ITTS instead.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
There are many similar or duplicated code patterns in `teardown()`.
This commit consolidates them into a new function, `teardown_common()`,
which is now called within `teardown()`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current `exec_host()` accepts a given node name and
creates a node debugger pod, even if the name is invalid.
This could result in the creation of an unnecessary pending
pod (since we are using nodeAffinity; if the given name
does not match any actual node names, the pod won’t be scheduled),
which wastes resources.
This commit introduces validation for the node name to
prevent this situation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It was observed that the custom node debugger pod is not
cleaned up when a test times out.
This commit ensures the pod is cleaned up by triggering
the cleanup on EXIT, preventing any debugger pods from
being left behind.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
With #10232 merged, we now have a persistent node debugger pod throughout the test.
As a result, there’s no need to spawn another debugger pod using `kubectl debug`,
which could lead to false negatives due to premature pod termination, as reported
in #10081.
This commit removes the `print_node_journal()` call that uses `kubectl debug` and
instead uses `exec_host()` to capture the host journal. The `exec_host()` function
is relocated to `tests/integration/kubernetes/lib.sh` to prevent cyclical dependencies
between `tests_common.sh` and `lib.sh`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`assert_pod_fail()` currently calls `k8s_create_pod()` to ensure that a pod
does not become ready within the default 120s. However, this delays the test's
completion even if an error message is detected earlier in the journal.
This commit removes the use of `k8s_create_pod()` and modifies `assert_pod_fail()`
to fail as soon as the pod enters a failed state.
All failing pods end up in one of the following states:
- CrashLoopBackOff
- ImagePullBackOff
The function now polls the pod's state every 5 seconds to check for these conditions.
If the pod enters a failed state, the function immediately returns 0. If the pod
does not reach a failed state within 120 seconds, it returns 1.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Now that the issue with handling loop devices has been resolved,
this commit re-enables the guest-pull-image tests for `qemu-coco-dev`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Timeouts occur (e.g. `create_container_timeout` and `wait_time`)
when using qemu-coco-dev.
This commit increases these timeouts for the trusted image storage
test cases
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the host running the tests is different from the host where the cluster is running,
the *_loop_device() functions do not work as expected because the device is created
on the test host, while the cluster expects the device to be local.
This commit ensures that all commands for the relevant functions are executed via exec_host()
so that a device should be handled on a cluster node.
Additionally, it modifies exec_host() to return the exit code of the last executed command
because the existing logic with `kubectl debug` sometimes includes unexpected characters
that are difficult to handle. `kubectl exec` appears to properly return the exit code for
a given command to it.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Creating and deleting a node debugger pod for every `exec_host()`
call is inefficient.
This commit changes the test suite to create and delete the pod
only once, globally.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit addresses an issue with handling loop devices
via a node debugger due to restricted privileges.
It runs a pod with full privileges, allowing it to mount
the host root to `/host`, similar to the node debugger.
This change enables us to run tests for trusted image storage
using the `qemu-coco-dev` runtime class.
Fixes: #10133
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This patch adds support to call kata agents SetPolicy
API. Also adds tests for SetPolicy API using agent-ctl.
Fixes#9711
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The `deploy_k0s` and `deploy_k3s` kubectl installs aren't failing
yet, but let get ahead of this and bump them as well
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
agent built with policy feature initializes the policy engine using a
policy document from a default path, which is installed & linked during
UVM rootfs build. This commit adds support to provide a default agent
policy as environment variable.
This targets development/testing scenarios where kata-agent
is wanted to be started as a local process.
Fixes#10301
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
We've added s390x test container image, so add support
to use them based on the arch the test is running on
Fixes: #10302
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
fixuop
This commit brings some public parts of image pulling test series like
encrypted image pulling, pulling images from authenticated registry and
image verification. This would help to reduce the cost of maintainance.
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Close#8120
**Case 1**
Create a pod from an unsigned image, on an insecureAcceptAnything
registry works.
Image: quay.io/prometheus/busybox:latest
Policy rule:
```
"default": [
{
"type": "insecureAcceptAnything"
}
]
```
**Case 2**
Create a pod from an unsigned image, on a 'restricted registry' is
rejected.
Image: ghcr.io/confidential-containers/test-container-image-rs:unsigned
Policy rule:
```
"quay.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 3**
Create a pod from a signed image, on a 'restricted registry' is
successful.
Image: ghcr.io/confidential-containers/test-container-image-rs:cosign-signed
Policy rule:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 4**
Create a pod from a signed image, on a 'restricted registry', but with
the wrong key is rejected
Image:
ghcr.io/confidential-containers/test-container-image-rs:cosign-signed-key2
Policy:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 5**
Create a pod from an unsigned image, on a 'restricted registry' works
if enable_signature_verfication is false
Image: ghcr.io/kata-containers/confidential-containers:unsigned
image security enable: false
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Change pod runAsUser value of a Replication Controller after generating
the RC's policy, and verify that the RC pods get rejected due to this
change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change pod runAsUser value of a Job after generating the Job's policy,
and verify that the Job gets rejected due to this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change pod runAsUser value of a Deployment after generating the
Deployment's policy, and verify that the Deployment fails due to
this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Poll/wait for pod termination instead of sleeping 2 minutes. This
change typically saves ~90 seconds in my test cluster.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Kata-deploy often fails due to a transiently unreachable k8s cluster
for the qemu-coco-dev test on s390x.
(e.g. https://github.com/kata-containers/kata-containers/actions/runs/10831142906/job/30058527098?pr=10009)
This commit introduces a retry mechanism to mitigate these failures by
retrying the command two more times with a 10-second interval as a workaround.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the CI platform being tested doesn't support yet the prometheus
container image:
- Use busybox instead of prometheus.
- Skip the test cases that depend on the prometheus image.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the indentation in the cri containerd tests as we
have in several places a misalignment in the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Remove the workaround for #9928, now that genpolicy is able to
convert user names from container images into the corresponding
UIDs from these images.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Disabling the UID Policy rule was a workaround for #9928. Re-enable
that rule here and add a new test/CI temporary workaround for this
issue. This new test workaround will be removed after fixing #9928.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change quay.io/prometheus/busybox to quay.io/prometheus/prometheus in
this test. The prometheus image will be helpful for testing the future
fix for #9928 because it specifies user = "nobody".
Also, change:
sh -c "ls -l /"
to:
echo -n "readinessProbe with space characters"
as the test readinessProbe command line. Both include a command line
argument containing space characters, but "sh -c" behaves differently
when using the prometheus container image (causes the readinessProbe
to time out, etc.).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes the remove_img variable in the metrics common script
as it is not being used.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR decreases the number of iterations in the kubernetes soak test
as this is already taking more than 2 hours for the kata coco ci
stability.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
With an older version of image-rs, we were getting the following error:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key no suitable key found for decrypting layer key:
```
However, with the version of image-rs we are bumping to, the error comes
as:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key
Caused by:
no suitable key found for decrypting layer key:
keyprovider: failed to unwrap key by ttrpc
```
Due to this change, I'm splitting the check in two different ones.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In the process of switching the TDX CI machine we've noticed that
`hostname -i` in one of the machines returns an one and only IP address,
while in another machine it returns a full list of IPs.
As we're only interested in the first one, let's adapt the code to
always return the first one.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Since CDH/attestation related processes and its dependencies are not fully
available, the setup fails to start kata-agent as local process. This
fix removes these procs to prevent kata-agent from trying to start them.
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The test setup starts kata-agent as a local process without the
UVM. The agent policy initialization fails due to missing policy
document at `/etc/kata-opa/default-policy.rego`. The fix
- installs a relaxed `allow-all.rego` policy document
- cleans up the install during exit
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This commit introduces test cases for testing
CopyFile API using kata-agent-ctl with improved command
semantics and handling.
- copy a file to /run/kata-containers
- copy symlink to /run/kata-containers
- copy directory to /run/kata-containers
- copy file to /tmp
- copy large file to /run/kata-containers
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR increases the timeout to wait that the deployment for the soak
stability test is ready in order to avoid random failures saying that
the deployment is not ready yet.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This commit enables running tests for kata agent apis.
The 'api-tests' directory will contain bats test files for
individual APIs.
Fixes#10269
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR updates the machine learning tests references or urls for the
openVINO and oneDNN scripts as currently they are refering to a different
performance benchmark.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
enable CI to add test cases for testing kata-agent APIs. This commit
introduces:
- a workflow to run tests
- setup scripts to prepare the test environment
Fixes#10262
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
metrics tests sometimes fail with kata components still running.
sending SIGKILL and waiting for the processes to reap.
Fixes#8651
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
On commit 51690bc157 we switched the installation from kubectl to helm
and used its `--wait` expecting the execution would continue when all
kata-deploy Pods were Ready. It turns out that there is a limitation on
helm install that won't wait properly when the daemonset is made of a
single replica and maxUnavailable=1. In order to fix that issue, let's
revert the changes partially to keep using kubectl and waitForProcess
to the exection while Pods aren't Running.
Fixes#10168
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR removes the metrics report which is not longer being used
in Kata Containers.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
To deploy KBS on s390x, the environment variable `IBM_SE_CREDS_DIR`
must be exported, and the corresponding directory must be created.
This commit enables KBS deployment for `qemu-coco-dev`, in addition
to the existing `qemu-se` support on the platform.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This reverts commit 704da86e9b, as the
tests never became stable to run.
This was discussed and agreed with the maintainer.
Conflicts:
.github/workflows/basic-ci-amd64.yaml
tests/integration/stdio/gha-run.sh
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR enables the k8s soak stability test to run on the weekly
Kata CoCo stability CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the oneDNN benchmark information to the machine
learning metrics README.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
GH-9592 addressed a bug in a previous version of the AKS Mariner host
kernel that blocked the CH v39 upgrade. This bug has now been fixed so
we undo that PR.
Note we also specify a different OCI version for Mariner as it differs
from Ubuntu's.
Fixes: #9594
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This reverts commit 41b7577f08.
We were seeing a lot of issues in the TDX CI of the nature:
"Error: failed to create containerd container: create instance
470: object with key "470" already exists: unknown"
With the TDX CI, we moved to having the nydus snapsotter pre-installed.
Essentially the `deploy-snapshotter` step was performed once before any
actual CI runs.
We were seeing failures related to the error message above.
On reverting this change, we are no longer seeing errors related to
"key exists" with the TDX CI passing now.
The change reverted here is related to downloading incomplete images, but this
seems to be messing up TDX CI.
It is possible to pass --snapshotter to `ctr image check` but that does
not seem to have any effect on the data set returned.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This PR adds the OpenVINO benchmark general information into the
machine learning README metrics information.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In certain environments (e.g., those with lower performance), `k8s_create_pod()`
may require additional wait time, especially when dealing with large images.
Since `k8s_wait_pod_be_ready()` — which is called by `k8s_create_pod()` — already
accepts `wait_time` as a second argument, it makes sense to introduce `wait_time`
to `k8s_create_pod()` and propagate it to the callee.
This commit adds `wait_time` to `k8s_create_pod()` as the 2nd (optional) argument.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Some of the tests call set_metadata_annotation() for updating the kernel
parameters. For `kata-qemu-se`, repack_secure_image() is called which is
defined in `lib_se.sh` and sourced by `confidential_kbs.sh`.
This commit ensures that the function call chain for the relevant
`KATA_HYPERVISOR` is properly handled.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR adds the pod deployment yaml for soak test which is part
of the stability k8s tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
* genpolicy: deny UpdateEphemeralMountsRequest
Deny UpdateEphemeralMountsRequest by default, because paths to
critical Guest components can be redirected using such request.
Signed-off-by: Dan Mihai <Daniel.Mihai@microsoft.com>
This PR adds a kubernetes parallel test that will launch multiple replicas
from a kubernetes deployment and we will iterate this multiple times to
verify that we are able to do this using CoCo Kata. This test will be
part of the CoCo Kata stability CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This partially reverts commit 94b3348d3c,
as there's more work needed in order to have this one done in a robust
way, and we are taking the safer path of reverting for now, and adding
it back as soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This partially reverts commit 51690bc157,
as there's more work needed in order to have this one done in a robust
way, and we are taking the safer path of reverting for now, and adding
it back as soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit 1221ab73f9, as there's
more work needed in order to have this one done in a robust way, and we
are taking the safer path of reverting for now, and adding it back as
soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The deploy-kbs.sh script generates the kbs.key that's used to install
KBS. This same file is used lately by kbs-client to authenticate. This ensures
that the file was created, otherwise fail.
Another problem solved here is that on bare-metal machines the key doesn't survive
a reboot as it is created in a temporary directory (/tmp/trustee). So let's save
the file to a non-temporary location.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
I assume the PR that introduced this was based on an older version of
yq, and as the test couldn't run before it got merged we never noticed
the error.
However, this test has been failing for a reasonable amount of time,
which makes me think that we either need a maintainer for it, or just
remove it completely, but that's a discussion for another day.
For now, let's make it, at least, run.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the k8s stability Kata CoCo GHA workflow to run weekly
the k8s stability tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Add bind mounts for volumes defined by docker container images, unless
those mounts have been defined in the input K8s YAML file too.
For example, quay.io/opstree/redis defines two mounts:
/data
/node-conf
Before these changes, if these mounts were not defined in the YAML file
too, the auto-generated policy did not allow this container image to
start.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Instead of deploying and removing the snapshotter on every single run,
let's make sure the snapshotter is always deploy on the TDX case.
We're doing this as an experiment, in order to see if we'll be able to
reduce the failures we've been facing with the nydus snapshotter.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The `exec_host()` function often fails to capture the output of a given command
because the node debugger pod is prematurely terminated. To address this issue,
the function has been refactored to ensure consistent output capture by adjusting
the `kubectl debug` process as follows:
- Keep the node debugger pod running
- Wait until the pod is fully ready
- Execute the command using `kubectl exec`
- Capture the output and terminate the pod
This commit refactors `exec_host()` to implement the above steps, improving its reliability.
Fixes: #10081
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
I can't set up loop device with `exec_host`, which the command is
necessary for qemu-coco-dev. See issue #10133.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Update error message in pulling image encrypted to "failed to get decrypt key no suitable key found for decrypting layer key".
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
add tests for guest pull with configured timeout:
1) failed case: Test we cannot pull a large image that pull time exceeds a short creatcontainer timeout(10s) inside the guest
2) successful case: Test we can pull a large image inside the guest with increasing createcontainer timeout(120s)
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
add tests for pull images in the guest using trusted storage:
1) failed case: Test we cannot pull an image that exceeds the memory limit inside the guest
2) successful case: Test we can pull an image inside the guest using
trusted ephemeral storage.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR adds a k8s stability test that will be part of the CoCo Kata
stability tests that will run weekly.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR disables the k8s file volume test as we are having random failures
in multiple GHA CIs mainly because the exec_host function sometimes
does it not work properly.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds kubernetes stress-ng tests as part of the stability testing
for kata.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, setting `io.containerd.cri.runtime-handler` annotation in
the yaml is not necessary for pulling images in the guest. All TEE
hypervisors are already running tests with guest-pulling enabled.
Therefore, we can remove some duplicate tests and re-enable the
guest-pull test for running different runtime pods at the same time.
While considering to support different containerd version, I recommend
to keep setting "io.containerd.cri.runtime-handler".
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR updates the ubuntu image for stress Dockerfile. The main purpose
is to have a more updated image compared with the one that is in libpod
which has not been updated in a while.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
These new "kata-deploy" and "cleanup" actions are equivalent to
"kata-deploy-garm" "cleanup-garm", respectively, and should be
used on the workflows being migrated from GARM to
Github's managed runners.
Eventually "kata-deploy-garm" and "cleanup-garm" won't be used anymore
then we will be able to remove them.
See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR updates the image that we are using in the kubectl debug command
as part of the exec host function, as the current alpine image does not
allow to create a temporary file for example and creates random kubernetes
failures.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Rather then modifying the kata-depoy scripts let's use Helm and
create a values.yaml that can be used to render the final templates
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For easier handling of kata-deploy we can leverage a Helm chart to get
rid of all the base and overlays for the various components
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
After experimenting a little bit with those tests, they seem to be
passing on all the available TEE machines.
With this in mind, let's just enable them for those machines.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR removes duplicated entries (vcpus count, and available memory),
from onednn and openvino results files.
Fixes: #10119
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The issue is similar to #10011.
The root cause is that tty and stderr are set to true at same time in
containerd: #10031.
Fixes: #10081
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR encloses the search string for 'default_vcpus ='
and 'default_memory =' with double quotes in order to
parse the precise values, which are included in the kata
configuration file.
Fixes: #10118
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
1. Use the new value of AllowRequestsFailingPolicy after setting up a
new Policy. Before this change, the only way to enable
AllowRequestsFailingPolicy was to change the default Policy file,
built into the Guest rootfs image.
2. Ignore errors returned by regorus while evaluating Policy rules, if
AllowRequestsFailingPolicy was enabled. For example, trying to
evaluate the UpdateInterfaceRequest rules using a policy that didn't
define any UpdateInterfaceRequest rules results in a "not found"
error from regorus. Allow AllowRequestsFailingPolicy := true to
bypass that error.
3. Add simple CI test for AllowRequestsFailingPolicy.
These changes are restoring functionality that was broken recently by
commmit df23eb09a6.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the memory tests like fast footprint to use grep -F
instead of fgrep as this command has been deprecated.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds missing packages depenencies to install kbs cli in a fresh
new baremetal environment. This will avoid to have a failure when trying
to run install-kbs-client.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, there are some issues with pulling images in CI, such as :
https://github.com/kata-containers/kata-containers/actions/runs/10109747602/job/27959198585
This issue is caused by switching between different snapshotters for the same image in some scenarios.
To resolve it, we can check existing images to ensure all content is available locally before running tests.
Fixes: #10029
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Rename k8s-exec-rejected.bats to k8s-policy-hard-coded.bats, getting
ready to test additional hard-coded policies using the same script.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Users of AUTO_GENERATE_POLICY=yes:
- Already tested *auto-generated* policy on any platform.
- Will be able to test *hard-coded* policy too on any platform, after
this change.
CI continues to test hard-coded policies just on the platforms listed
here, but testing those policies locally (outside of CI) on other
platforms can be useful too.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Generate policy that validates each exec command line argument, instead
of joining those args and validating the resulting string. Joining the
args ignored the fact that some of the args might include space
characters.
The older format from genpolicy-settings.json was similar to:
"ExecProcessRequest": {
"commands": [
"sh -c cat /proc/self/status"
],
"regex": []
},
That format will not be supported anymore. genpolicy will detect if its
users are trying to use the older "commands" field and will exit with
a relevant error message in that case.
The new settings format is:
"ExecProcessRequest": {
"allowed_commands": [
[
"sh",
"-c",
"cat /proc/self/status"
]
],
"regex": []
},
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add integration test that creates two bridge networks with nerdctl and
verifies that Kata container is brought up while passing the networks
created.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Share a single test script variable for both:
- Allowing a command to be executed using Policy settings.
- Executing that command using "kubectl exec".
Fixes: #10014
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the metrics launch times to use grep -F instead of
fgrep as this command has been deprecated.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, we have found that `assert_logs_contain` does not work on TDX.
We manually located the specific log, but it fails to get the log using `kubectl debug`. The error found in CI is:
```
warning: couldn't attach to pod/node-debugger-984fee00bd70.jf.intel.com-pdgsj,
falling back to streaming logs: error stream protocol error: unknown error
```
Upon debugging the TDX CI machine, we found an error in containerd:
```
Attach container from runtime service failed" err="rpc error: code = InvalidArgument desc = tty and stderr cannot both be true"
containerID="abc8c7a546c5fede4aae53a6ff2f4382ff35da331bfc5fd3843b0c8b231728bf"
```
We believe this is the root cause of the test failures in TDX CI.
Therefore, we need to ensure that tty and stderr are not set to true at same time.
Fixes: #10011
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
This PR updates the Blogbench reference values for
read and write operations used in the CI check metrics
job.
This is due to the update to version 1.2 of blobench.
Fixes: #10039
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
It is not good practice to call repack_secure_image() from a bats file
because the test code might not consider cases where `qemu-se` is used
as `KATA_HYPERVISOR`.
This commit moves the function call to set_metadata_annotation() if a key
includes `kernel_params` and `KATA_HYPERVISOR` is set to `qemu-se`, allowing
developers to focus on the test scenario itself.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add policy to pod-secret-env.yaml from k8s-credentials-secrets.bats.
Policy was already auto-generated for the other pod used by the same
test (pod-secret.yaml). pod-secret-env.yaml was inconsistent,
because it was taking advantage of the "allow all" policy built into
the Guest image. Sooner or later, CI Guests for CoCo will not get the
"allow all" policy built in anymore and pod-secret-env.yaml would
have stopped working then.
Note that pod-secret-env.yaml continues to use an "allow all" policy
after these changes. #10033 must be solved before a more restrictive
policy will be generated for pod-secret-env.yaml.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Since #9904 was merged, newly introduced tests for `k8s-guest-pull-image-authenticated.bats`
have been failing on IBM SE (s390x). The agent fails to start because a kernel parameter
cannot pass to the guest VM via annotation. To fix this, the boot image must be rebuilt with
updated parameters.
This commit adds the rebuilding step in create_pod_yaml_with_private_image() for `qemu-se`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Move to blogbench 1.2 version from 1.1.
This version includes an important fix for the read_score test
which was reported to be broken in the previous version.
It essentially fixes this issue here:
https://github.com/jedisct1/Blogbench/issues/4
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
1. Use a container image that supports "ps --user 1000 -f".
2. Execute that command using:
sh -c "ps --user 1000 -f"
instead of passing additional arguments to sh:
sh -c ps --user 1000 -f
Fixes: #10019
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Keep track of individual exec args instead of joining them in the
policy text. Verifying each arg results in a more precise policy,
because some of the args might include space characters.
This improved validation applies to commands specified in K8s YAML
files using:
- livenessProbe
- readinessProbe
- startupProbe
- lifecycle.postStart
- lifecycle.preStop
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add tests for genpolicy's handling of container.exec_commands. These
are commands allowed by the policy and originating from these input
K8s YAML fields:
- livenessProbe
- readinessProbe
- startupProbe
- lifecycle.postStart
- lifecycle.preStop
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We had a typo in the attestation tests that we've copied around a
lot and Wainer spotted it in the authenticated registry tests, so let's fix it up now
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add three new test cases for guest pull from an authenticated registry for
the following scenarios:
_**Scenario**: Creating a container from an authenticated image, with correct credentials via KBC works_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** a KBS set up to have the correct auth.json for
registry *quay.io/kata-containers/confidential-containers-auth* embedded in the `"Credential"` section of `its resources file`
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image works and the pod can start
_**Scenario**: Creating a container from an authenticated image, with incorrect credentials via KBC fails_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** An installed kata CC with the sample_kbs set up to have the auth.json for registry
*quay.io/kata-containers/confidential-containers-auth* embedded in the `"Credential"` resource, but with a dummy user name and password
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image fails with a message that reflects that the authorisation failed
_**Scenario**: Creating a container from an authenticated image, with no credentials fails_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** An installed kata CC with no credentials section
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image fails with a message that reflects that the authorisation failed
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The only kernel built for measured rootfs was the kernel-tdx-experimental,
so this test only ran in the qemu-tdx job runs the test.
In commit 6cbdba7 we switched all TEE configurations to use the same kernel-confidential,
so rootfs measured is disabled for qemu-tdx too now.
The VM still fails to boot (because of a different reason...) but the bug
in the assert_logs_contain, fixed in this PR was masking the checks on the logs.
We still have a few open issues related to measured rootfs and generating
the root hash, so let's skip this test that doesn't work until they are looked at
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add three new tests cases for guest-pull of an encrypted image
for the following scenarios:
_**Scenario: Pull encrypted image on guest with correct key works**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*
that is configured as a resource the KBS, so that image-rs on the guest can
connect to it
**When** I try and create a pod from *i*
**Then** The pod is successfully created and runs
_**Scenario: Cannot pull encrypted image with no decryption key**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*,
that is **not** configured in a KBS that image-rs on the guest can connect to
**When** I try and create a pod from *i*
**Then** The pod is not created with an error message that reflects why
_**Scenario: Cannot pull encrypted image with wrong decryption key**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*
and a different key *k'* that is set as a resource in a KBS, that image-rs
on the guest can connect to
**When** I try and create a pod from *i*
**Then** The pod is not created with an error message that reflects why
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The pipe needs adding to the grep, otherwise the grep
gets consumed as an argument to `print_node_journal` and
run in the debug pod.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Auto-generate the policy in k8s-security-context.bats - previously
blocked by lacking support for PodSecurityContext.runAsUser.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the share fs information for dragonball using kata-ctl
to avoid the failures in runk tests saying that shared_fs is an
unbound variable.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Modify the permissions of containerd.sock just when genpolicy needs
access to this socket, when testing GENPOLICY_PULL_METHOD=containerd.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Explain why the containerd settings on the local machine get set to
containerd's defaults when testing GENPOLICY_PULL_METHOD=containerd.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
oci-distribution is the value used by run-k8s-tests-on-aks.yaml, so
use the same value as default for GENPOLICY_PULL_METHOD in gha-run.sh.
The value of GENPOLICY_PULL_METHOD is currently compared just with
"containerd", but avoid possible future problems due to using a
different default value in gha-run.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Currently, it is not viable to share a writable volume (e.g., emptyDir)
between containers in a single pod for IBM SE.
The following tests are relevant:
- pod-shared-volume.bats
- k8s-empty-dirs.bats
(See: https://github.com/kata-containers/kata-containers/issues/10002)
This commit skips the tests until the issue is resolved.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
GH-9973 introduced:
* New function get_kata_memory_and_vcpus() in
tests/metrics/lib/common.bash.
* A call to get_kata_memory_and_vcpus() from extract_kata_env(), which
is defined in tests/common.bash.
Because the nydus test only sources tests/common.bash, it can't find
get_kata_memory_and_vcpus() and errors out.
We fix this by moving the get_kata_memory_and_vcpus() call from
tests/common.bash to tests/metrics/lib/json.bash so that it doesn't
impact the nydus test.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
For running a KBS with `se-verifier` in service,
specific credentials need to be configured.
(See https://github.com/confidential-containers/trustee/tree/main/attestation-service/verifier/src/se for details.)
This commit introduces two procedures to support IBM SE attestation:
- Prepare required files and directory structure
- Set necessary environment variables for KBS deployment
- Repackage a secure image once the KBS service address is determined
These changes enable `k8s-confidential-attestation.bats` for s390x.
Fixes: #9933
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current KBS deployment creates a file `key.bin` assuming that
`kustomization.yaml` is located in `overlays/`.
However, this does not hold true when the kustomize config is enabled
for multiple architectures. In such cases, the configuration file
should be located in `overlays/$(uname -m)`.
This commit changes the location for file creation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Delete the kata-containers-k8s-tests namespace before resetting the namespace
to ensure that no deployments or services are restarting and creating pods in the default namespace.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Delete test scripts forcely in `Delete kata-deploy` step before
deleting all kata pods.
Fixes: #9980
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR removes the common_init function call from the memory
usage script to eliminate duplicate checking that is also done
from the init_env function.
It also eliminates duplicaction of nested conditionals.
Fixes: #9984
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the use_devmapper variable which was part of the jenkins
environment flags which is not longer support it or available for the
cri-containerd tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR retrieves the free memory and the vcpus count from
a kata container and includes them to the json results file of
any metric.
Additionally this PR parses the requested vcpus quantity and the
requested amount memory from kata configuration file and includes
this pair of values into the json results file of any metric.
Finally, the file system defined in the kata configuration file
is included in the results template.
Fixes: #9972
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR rephrased the description and usage of certain functions
as such as:
- set_kata_configuration_performance
- set_kata_config_file
- get_current_kata_config_file
- check_if_root
- check_ctr_images
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The sealed secret test depends on the KBS to provide
the unsealed value of a vault secret.
This secret is provisioned to an environment variable.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
This PR removes the CI variable as well as the instructions related
to this as this was part of the jenkins environment which is not
longer supported it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable as well as the instructions related
to this variable which was used on the jenkins environment and not
longer supported.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the pip installation for nerdctl by removing a flag
which is not longer supported and avoid the failure of
no such option: --break-system-packages.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- Due to the error we hit with pulling the agnhost
image used in the liveness-probe tests, we want to leave
the console printing to help with debug when we next try
to bump the image-rs version
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test was fixed by previous patches in this PR: kata-containers#9695
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This test was fixed by previous patches in this PR: kata-containers#9695
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR has better variable definitons as well the use of a variable
which is already defined in the metrics common script for soak parallel
test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR is for better variable definitions as well as the use of the
CTR_EXE variable which is already defined in the metrics common script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the CTR_EXE which is already defined in the metrics common
script to have uniformity across the multiple stability tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>