The AKS CLI recently introduced a regression that prevents using
aks-preview extensions (Azure/azure-cli#31345), and hence create
CI clusters.
To address this, we temporarily hardcode the last known good version of
aks-preview.
Note that I removed the comment about this being a Mariner requirement,
as aks-preview is also a requirement of AKS App Routing, which will
be introduced soon in #11164.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Let's use what we have in the k8s functional tests to create a common
function to deploy kata containers using our helm charts. This will
help us immensely in the kata-deploy testing side in the near future.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Update the code to install the version of k0s
that we have in our versions.yaml, rather than
just installing the latest, to help our CI being
less stable and prone to breaking due to things
we don't control.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When trying to deploy nydus on kcli locally we get the
following failure:
```
root@sh-kata-ci1:~# kubectl get pods -n nydus-system
NAMESPACE NAME READY STATUS RESTARTS AGE
nydus-system nydus-snapshotter-5kdqs 0/1 CrashLoopBackOff 4 (84s ago) 7m29s
```
Digging into this I found that the nydus-snapshotter service
is failing with:
```
ubuntu@kata-k8s-worker-0:~$ journalctl -u nydus-snapshotter.service
-- Logs begin at Wed 2025-02-12 15:06:08 UTC, end at Wed 2025-02-12 15:20:27 UTC. --
Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: Started nydus snapshotter.
Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc:
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required b>
Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc:
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required b>
Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: nydus-snapshotter.service: Main process exited, code=exited, status=1/FAILURE
```
I think this is because 20.04 has version:
```
ubuntu@kata-k8s-worker-0:~$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.16) 2.31
```
so it's too old for the nydus snapshotter.
Also 20.04 is EoL soon, so bumping is better.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Because az client restricts the name to be less than 64 characters. In
some cases (e.g. KATA_HYPERVISOR=qemu-runtime-rs) the generated name
will exceed the limit. This changed the function to shorten the name:
* SHA1 is computed from metadata then compound the cluster's name
* metadata as plain-text are passed as --tags
Fixes: #9850
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
We've seen some issues with tests not being run in
some of the Coco CI jobs (Issue #10451) and in the
envrionments that are more stable we noticed that
they had a newer version of bats installed.
Try updating the version to 1.10+ and print out
the version for debug purposes
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR replaces the arch uname -m to use the arch_to_golang
variable in the script to have a better uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The `deploy_k0s` and `deploy_k3s` kubectl installs aren't failing
yet, but let get ahead of this and bump them as well
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
GH-9592 addressed a bug in a previous version of the AKS Mariner host
kernel that blocked the CH v39 upgrade. This bug has now been fixed so
we undo that PR.
Note we also specify a different OCI version for Mariner as it differs
from Ubuntu's.
Fixes: #9594
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Delete the kata-containers-k8s-tests namespace before resetting the namespace
to ensure that no deployments or services are restarting and creating pods in the default namespace.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Delete test scripts forcely in `Delete kata-deploy` step before
deleting all kata pods.
Fixes: #9980
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.
Fixes#9354
Depends-on:github.com/kata-containers/tests#5818
Signed-off-by: Beraldo Leal <bleal@redhat.com>
This is done in order to work around
https://github.com/Azure/azure-cli/issues/28984, following a suggestion
on the very same issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the indentation in gha run k8s common script
to have uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
kube-router decided to use :8080 for its metrics, and this seems to be a
change that affected k0s 1.30.0+, leading to kube-router pod crashing
all the time and anything can actually be started after that.
Due to this issue, let's simply use a different port (:9999) and move on
with our tests.
Fixes: #9623
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.
Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.
This change has been tested together with CH v39 in #9588.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
By passing --overwrite-existing to `aks get-credentials` it will stop
asking if I want to overwrite the existing credentials. This is handy
for running the scripts locally.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
- Add basic test case to check that a ruuning
pod can use the api-server-rest (and attestation-agent
and confidential-data-hub indirectly) to get a resource
from a remote KBS
Fixes#9057
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Co-authored-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
There's an rg name duplication situation that got introduced by #9385
where 2 different test runs might have same rg name.
Add back uniqueness by including the first letter of GENPOLICY_PULL_METHOD to
cluster name.
Fixes: #9412
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR defines the GH_PR_NUMBER variable in gha run k8s common
script to avoid failures like unbound variable when running
locally the scripts just like the GHA CI.
Fixes#9408
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
_print_cluster_name() create a string based information like the
pull request number and commit SHA. However, when you are developing the
scripts you might want to use an arbitrary name, so it was introduced
the $AKS_NAME variable that once exported it will overwrite the
generated name.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Until this point the deployed KBS service is only reachable from within
the cluster. This introduces a generic mechanism to apply an Ingress
configuration to expose the service externally.
The first implemened ingress is for AKS. In case the HTTP application
routing isn't enabled in the cluster (this is required for ingress), an
add-on is applied.
It was added the get_cluster_specific_dns_zone() and
enable_cluster_http_application_routing() helper functions
to gha-run-k8s-common.sh.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Kustomize has been used on some of our internal components (e.g.
kata-deploy) to manage k8s deployments. On CI it has been used
the `sed` tool to edit kustomization.yaml files, but `kustomize` is
more suitable for that purpose. So in order to use that tool on CI
scripts in the future, this commit introduces the `install_kustomize()`
function that is going to download and install the binary in
/usr/local/bin in case it's found on $PATH.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
1. Avoid repeating "kata-containers-k8s-tests".
2. Allow users to specify a different test namespace.
3. Introduce the TEST_CLUSTER_NAMESPACE variable, that will also be
useful when auto-generating the Agent Policy for these tests.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
delete_cluster() has tried to delete the az resources group regardless
if it exists. In some cases the result of that operation is ignored,
i.e., fail to resource group not found, but the log messages get a
little dirty. Let's delete the RG only if it exists then.
Fixes#8989
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This addresses an internal AKS issue that intermittently prevents
clusters from getting created. The fix has been rolled out to eastus but
not yet eastus2, so we unblock the CI by switching. No downsides in
general.
This supersedes #8990.
Fixes: #8989
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Due to the changes done in the CI, we need to set the correct
subscription to be used with the account from now on, otherwise we'd end
up using CoCo subscription.
Fixes: #8946
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is to fix a broken usage for `k3s kubectl version` by switching
an option `--short` to `--client=true`.
Fixes: #8621
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
There's no reason to escape the first + on the +k3s[0-9]\+ regex, as
shown here:
```sh
ubuntu@k3s:~$ /usr/local/bin/k3s kubectl version --short 2>/dev/null | \
grep "Client Version" | \
sed \
-e 's/Client Version: //' \
-e 's/+k3s[0-9]\+//'
v1.27.7
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems that with the new k3s release, they've bumped their kubectl
version from x.y.z+k3s1 to x.y.z+k3s2.
Let's ensure our regexp is more generic and future proof for such
changes.
Fixes: #8410
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.
Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```
With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading us to an empty stringm causing then
the error in the CI.
Fixes: #8105
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will serve us quite will in the upcoming tests addition, which will
also have to be executed using CRi-O.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.
A huge thanks to Greg Kurz for spotting this and suggesting the fix.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is based on official CRI-O documentations[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.
We may want to expand this in the future, but we're good for now.
[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characteres allowed for the cluster
name. With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.
This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.
The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.
Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.
Fixes: #7982
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>