Poll/wait for pod termination instead of sleeping 2 minutes. This
change typically saves ~90 seconds in my test cluster.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Kata-deploy often fails due to a transiently unreachable k8s cluster
for the qemu-coco-dev test on s390x.
(e.g. https://github.com/kata-containers/kata-containers/actions/runs/10831142906/job/30058527098?pr=10009)
This commit introduces a retry mechanism to mitigate these failures by
retrying the command two more times with a 10-second interval as a workaround.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the CI platform being tested doesn't support yet the prometheus
container image:
- Use busybox instead of prometheus.
- Skip the test cases that depend on the prometheus image.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the indentation in the cri containerd tests as we
have in several places a misalignment in the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Remove the workaround for #9928, now that genpolicy is able to
convert user names from container images into the corresponding
UIDs from these images.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Remove the recently added default UID/GID values, because the genpolicy
design is to initialize those fields before this new code path gets
executed.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Some container images are configured such that the user (and group)
under which their entrypoint should run is not a number (or pair of
numbers), but a user name.
For example, in a Dockerfile, one might write:
> USER 185
indicating that the entrypoint should run under UID=185.
Some images, however, might have:
> RUN groupadd --system --gid=185 spark
> RUN useradd --system --uid=185 --gid=spark spark
> ...
> USER spark
indicating that the UID:GID pair should be resolved at runtime via
/etc/passwd.
To handle such images correctly, read through all /etc/passwd files in
all layers, find the latest version of it (i.e., the top-most layer with
such a file), and, in so doing, ensure that whiteouts of this file are
respected (i.e., if one layer adds the file and some subsequent layer
removes it, don't use it).
Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>
Disabling the UID Policy rule was a workaround for #9928. Re-enable
that rule here and add a new test/CI temporary workaround for this
issue. This new test workaround will be removed after fixing #9928.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change quay.io/prometheus/busybox to quay.io/prometheus/prometheus in
this test. The prometheus image will be helpful for testing the future
fix for #9928 because it specifies user = "nobody".
Also, change:
sh -c "ls -l /"
to:
echo -n "readinessProbe with space characters"
as the test readinessProbe command line. Both include a command line
argument containing space characters, but "sh -c" behaves differently
when using the prometheus container image (causes the readinessProbe
to time out, etc.).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes the remove_img variable in the metrics common script
as it is not being used.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR decreases the number of iterations in the kubernetes soak test
as this is already taking more than 2 hours for the kata coco ci
stability.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
With an older version of image-rs, we were getting the following error:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key no suitable key found for decrypting layer key:
```
However, with the version of image-rs we are bumping to, the error comes
as:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key
Caused by:
no suitable key found for decrypting layer key:
keyprovider: failed to unwrap key by ttrpc
```
Due to this change, I'm splitting the check in two different ones.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).
Proper ITA / ITTS support on Trustee was finished as part of:
* 6f767fa15f
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).
As we've bumped guest-components on trustee, let's make sure we also
bump image-rs to the commit that brings ITA support in:
* 1db6c3a876
The reason we need to bump the dependency here is to avoid kbs_protocol
mismatch between the version used by the agent and the trustee one.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In the process of switching the TDX CI machine we've noticed that
`hostname -i` in one of the machines returns an one and only IP address,
while in another machine it returns a full list of IPs.
As we're only interested in the first one, let's adapt the code to
always return the first one.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Since CDH/attestation related processes and its dependencies are not fully
available, the setup fails to start kata-agent as local process. This
fix removes these procs to prevent kata-agent from trying to start them.
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The test setup starts kata-agent as a local process without the
UVM. The agent policy initialization fails due to missing policy
document at `/etc/kata-opa/default-policy.rego`. The fix
- installs a relaxed `allow-all.rego` policy document
- cleans up the install during exit
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This commit introduces test cases for testing
CopyFile API using kata-agent-ctl with improved command
semantics and handling.
- copy a file to /run/kata-containers
- copy symlink to /run/kata-containers
- copy directory to /run/kata-containers
- copy file to /tmp
- copy large file to /run/kata-containers
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
In the existing implementation for the CopyFile subcommand,
- cmd line argument list is too long, including various metadata information.
- in case of a regular file, passing the actual data as bytes stream adds to the size and complexity of the input.
- the copy request will fail when the file size exceeds that of the allowed ttrpc max data length limit of 4Mb.
This change refactors the CopyFile handler and modifies the input to a known 'source' 'destination' syntax.
Fixes#9708
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR increases the timeout to wait that the deployment for the soak
stability test is ready in order to avoid random failures saying that
the deployment is not ready yet.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
It will panic when users do GPU vfio passthrough with cdi in runtime.
The root cause is that CustomSpec.Annotations is nil when new element
added.
To address this issue, initialization is introduced when it's nil.
Fixes#10266
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>