Add an extra function that updates kata config
to use the max num. of vcpus available and
to use the available memory in the system.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This is actually a first attempt to document our CI, and all this
content was based on the document created by Fabiano Fidencio (kudos to
him). We are just moving the content and discussion from Google Docs to
here.
I used the "poetic license" to add some notes on what I believe our CI
will look like in the future.
Fixes#9006
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Beraldo Leal <bleal@redhat.com>
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR improves the kbs_k8s_delete function to verify that the
resources were properly deleted for baremetal environments.
Fixes#9379
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- At the moment we aren't supporting ppc64le or
aarch64 for
CoCo, so filter out these tests from running
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Add basic test case to check that a ruuning
pod can use the api-server-rest (and attestation-agent
and confidential-data-hub indirectly) to get a resource
from a remote KBS
Fixes#9057
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Co-authored-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR improves the latency test cleanup in order to avoid random
failures of leaving the pods.
Fixes#9418
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
There's an rg name duplication situation that got introduced by #9385
where 2 different test runs might have same rg name.
Add back uniqueness by including the first letter of GENPOLICY_PULL_METHOD to
cluster name.
Fixes: #9412
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR defines the GH_PR_NUMBER variable in gha run k8s common
script to avoid failures like unbound variable when running
locally the scripts just like the GHA CI.
Fixes#9408
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use the "allow all" policy for k8s-sandbox-vcpus-allocation.bats,
instead of relying on the Kata Guest image to use the same policy
as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-nginx-connectivity.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-volume.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-sysctls.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-security-context.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-seccomp.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-projected-volume.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-pod-quota.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-optional-empty-secret.bats,
instead of relying on the Kata Guest image to use the same policy as
its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-measured-rootfs.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-liveness-probes.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-inotify.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-guest-pull-image.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-footloose.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-empty-dirs.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Check from:
- k8s-exec-rejected.bats
- k8s-policy-set-keys.bats
if policy testing is enabled or not, to reduce the complexity of
run_kubernetes_tests.sh. After these changes, there are no policy
specific commands left in run_kubernetes_tests.sh.
add_allow_all_policy_to_yaml() is moving out of run_kubernetes_tests.sh
too, but it not used yet. It will be used in future commits.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Don't add the "allow all" policy to all the test YAML files anymore.
After this change, the k8s tests assume that all the Kata CI Guest
rootfs image files either:
- Don't support Agent Policy at all, or
- Include an "allow all" default policy.
This relience/assumption will be addressed in a future commit.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the journal log name for nerdctl artifacts to make
sure that we have different names in case we add a parallel GHA job.
Fixes#9357
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
the service might not listen on the default port, use the full service
address to ensure we are talking to the right resource.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
this can be used on kcli or other systems where cluster nodes are
accessible from all places where the tests are running.
Fixes: #9272
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Since v2.2.6 it can detect TDX guests on Azure, so let's bump it even if
Azure peer-pods are not currently used as part of our CI.
Fixes: #9348
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the unbound variables error when trying to run
the setup script locally in order to avoid errors.
Fixes#9328
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the journal log names for kubernetes artifacts
in order to make sure that we have different names when we are
running parallel GHA jobs.
Fixes#9308
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR terminates the nydus namespace to avoid the error of
that the flag needs an argument.
Fixes#9264
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Added an announcement message to the `static-checks.sh` script. It runs
platform / architecture specific code so it would be useful to display
details of the platform the checker is running on to help with
debugging.
Fixes: #9258.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Those two architectures are not TEE capable, thus we can just skip
running those tests there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed as the TDX machine is hosted inside Intel and relies on
proxies in order to connect to the external world. Not having those set
causes issues when pulling the image inside the guest.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a test case of pulling image inside the guest for confidential
containers.
Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
This commit updates `filter_k8s_test.sh` to handle skipped tests that
include comments. In addition to the existing parameter expansion,
the following expansions have been added:
- Removal of a comment
- Stripping of trailing spaces
Fixes: #9304
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Initialize $GITHUB_ENV to avoid nounset error when running the scripts locally
out of Github Actions.
Fixed commit 9ba5e3d2a8Fixes#9217
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Recently confidential-containers/kbs repository was renamed to
confidential-containers/trustee. Github will automatically resolve the
old URL but we better adjust it in code.
The trustee repository will be cloned to $COCO_TRUSTEE_DIR. Adjusted
file paths and pushd/popd's to use $COCO_KBS_DIR
($COCO_TRUSTEE_DIR/kbs).
On versions.yaml changed from `coco-kbs` to `coco-trustee` as in the
future we might need other trustee components, so keeping it generic.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added the kbs_set_resources_policy() function to set the KBS policy. Also the
kbs_set_allow_all_resources() and kbs_set_deny_all_resources to set the
"allow all" and "deny all" policy, respectively.
Fixes#9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added utility functions to manage resources in KBS:
- kbs_set_resource(), where the resource data is passed via argument
- kbs_set_resource_from_file(), where the resource data is found in a
file
Fixes#9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added kbs_install_cli function to build and install the kbs-client
executable if not present into the system.
Removed the stub from gha-run.sh; now the install kbs-client in the
.github/workflows/run-kata-deploy-tests-on-aks.yaml will effectively
install the executable.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added functions to return the service host, port or full-qualified
HTTP address, respectively, kbs_k8s_svc_host(), kbs_k8s_svc_port(),
and kbs_k8s_svc_http_addr().
Fixes#9056
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fix the `grep(1)` warning caused by the unnecessary escaping of the
hash/sharp symbol.
Fixes: #9235.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The k3s distribution of k8s uses an embedded version of containerd and
configures it to log to a file, not the journal. Hence, although we
collect the journal as a test artifact, we also need to collect the
actual log files for containerd.
Also collect the k3s containerd config files to help with debugging.
Fixes: #9104.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The jobs running on garm will collect journal information. The data gathered
is based on the time the tests started running. The $start_time is
exported on run_tests() and used in collect_artifacts(). It happens that
run_tests() and collect_artifacts() are called on different steps of the
workflow and the environment variables aren't preserved between them,
i.e, $start_time exported on the first step is not available on the
subsequents.
To solve that issue, let's save $start_time in the file pointed out by
$GITHUB_ENV that Github actions uses to export variables. In case $GITHUB_ENV is
empty then probably it is running locally outside of Github, so it won't
save the start time value.
Fixes#9217
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the general tests documentation in main README of the
kata containers repository.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds general README documentation for the tests section
in the kata containers repository.
Fixes#9209
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
PR #8760 tentatively tried to have the shim to run in its own mount
namespace for the sake of improving isolation between the sandbox and
the host. Thus crio storage drivers shouldn't create a PRIVATE
bind mount on their home directory. Otherwise, the container's rootfs
mount wouldn't be propagated to kata runtime's mount namespace, and
kata runtime couldn't access the container's rootfs files.
So, when kata cooperated with crio, crio should set
skip_mount_home=true for its storage overlay.
Fixes: #9028
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
_print_cluster_name() create a string based information like the
pull request number and commit SHA. However, when you are developing the
scripts you might want to use an arbitrary name, so it was introduced
the $AKS_NAME variable that once exported it will overwrite the
generated name.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Until this point the deployed KBS service is only reachable from within
the cluster. This introduces a generic mechanism to apply an Ingress
configuration to expose the service externally.
The first implemened ingress is for AKS. In case the HTTP application
routing isn't enabled in the cluster (this is required for ingress), an
add-on is applied.
It was added the get_cluster_specific_dns_zone() and
enable_cluster_http_application_routing() helper functions
to gha-run-k8s-common.sh.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Introduce the tests/integration/kubernetes/confidential_kbs.sh library
that contains functions to manage the KBS on CI. Initially implemented
the kbs_k8s_deploy() and kbs_k8s_delete() functions to, respectively,
deploy and delete KBS on Kubernetes. Also hooked those functions in the
tests/integration/kubernetes/gha-run.sh script to follow the convention
of running commands from Github Workflows:
$ .tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
$ .tests/integration/kubernetes/gha-run.sh delete-coco-kbs
Fixes#9058
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Kustomize has been used on some of our internal components (e.g.
kata-deploy) to manage k8s deployments. On CI it has been used
the `sed` tool to edit kustomization.yaml files, but `kustomize` is
more suitable for that purpose. So in order to use that tool on CI
scripts in the future, this commit introduces the `install_kustomize()`
function that is going to download and install the binary in
/usr/local/bin in case it's found on $PATH.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the collect artifacts function in gha-run script for
the kubernetes tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, the checking for kata-deploy is running assume that the
daemonset scheduled at least one pod, however it might not had and the
kubectl wait command fails due to "error: no matching resources found".
On CI I've observed that fail intermittently. I suspect the service
account kata-deploy-sa take a while to show up then no kata-deploy is
scheduled in meanwhile.
Changed the checker logic to use waitForProcess() to keep testing if it is
already running, or hit the timeout (still 10m).
Fixes#9183
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR enables the cri-containerd tests for cloud hypervisor runtime-rs.
Fixes#9198
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Kubernetes v1.29 introduced a new `PodReadyToStartContainers` condition
that gets inserted at index 0 in the conditions array. This means that
the expected `PodCompleted` reason can now be either at index 0 with
kubernetes v1.28 and older or at index 1 starting with kubernetes v1.29.
This is fragile at best since the `kubectl wait` doesn't allow to combine
multiple checks. Also, checking the reason is dubious as it doesn't really
tell if the pods have actually completed or not.
Check the pod phase to be `Succeeded` instead, this guarantees that :
> All containers in the Pod have terminated in success, and will not
> be restarted.
Fixes#9178
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR implements general fixes to the gha-run script for the
cri-containerd tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR skips the cri-containerd in gha-run script for cloud hypervisor
runtime-rs.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Changed the "run k8s tests on AKS" workflows to get the CoCo KBS
installed so that we can run attestation tests.
The plan is to run attestation tests only on a subset of non-TEE jobs
initially, so this commit restricts to install KBS only on kata-qemu
configuration. Actually at this point it is added only stubs commands
to tests/integration/kubernetes/gha-run.sh that should be implemented
in a future commit.
Fixes#9058
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This refactor the teardown() of tests/integration/nydus/nydus_tests.sh:
* Moved boilerplate code that kill process to a loop;
* Doesn't leave teardown() if a process failed to get killed, so that
other clean up routines are ran;
* Check if the pid exist then attempt to kill the process, so avoid this
misleading message:
```
Usage:
kill [options] <pid> [...]
Options:
<pid> [...] send signal to every <pid> listed
-<signal>, -s, --signal <signal>
specify the <signal> to be sent
-q, --queue <value> integer value to be sent with the signal
-l, --list=[<signal>] list all signal names, or convert one to a name
-L, --table list all signal names in a nice table
-h, --help display this help and exit
-V, --version output version information and exit
For more details see kill(1).
```
Fixes#8948
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's recommended to avoid images from docker.io to avoid errors related
with hitting the pull limits that happens mostly on bare-metal machines.
So this replaced the docker.io's busybox with
quay.io/prometheus/busybox.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The "run ps command" test has failed once in a while because it doesn't
wait the sh command to start within the container, consequently `ps`
won't report the amount of lines expected.
Fixes#8975
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Migrated runk tests from pure shell script to bats to be consistent with
other test suites.
The install_dependencies() will install the bats tool locally.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
CI failed to deploy nydus snapshotter because it was not cleaned up last time.
So we can try to cleanup nydus snapshotter before deploying it.
Fixes: #9121
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
teardown() gets executed after each test case, so there is no need to
clean-up before teardown.
Fixes: #9072
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add set_namespace_to_policy_settings() for changing the pod namespace
in genpolicy settings.
Fixes: #9072
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the error script to display the error message with
much more information to help debugging.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds an enhanced die function in order to dump more information
in a yaml format that will help with the debugging.
Fixes#9105
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This has been introduced by 53bc4a432b,
where the condition was changed.
The correct condition is:
* If the list of supported tees does not contain the kata hypervisor
and the list of supported non tees does not contain the kata
hypervisor.
The error is that we were checking whether kata-hypervisor would contain
the list of supported tees, and that would almost always be false
(unless in the case where the list had an one and only one element).
Fixes: #9055 -- part II
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
A test `vcpu allocation k8s test` exhibits different behavior on s390x
For more details, please refer to issue #9093.
This commit is to make the test skipped until the issue is resolved on
the platform.
Fixes: #9093
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR pulls the docker image needed for the test as part of the dependencies
in order to avoid failures of timeouts mainly because the image was not
properly download it and it is unable to find it.
Fixes#9089
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the ability to run k8s confidential tests in a
non-TEE environment.
Fixes#9055
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will ensure no leftovers are in the node, which has been cause the
TDX CI to fail every now and then.
Fixes: #9081
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Delete the debugger pod created during the test, rather than already
existing debugger pods.
Also, send the output of "kubectl delete" to stderr, just in case it's
useful for debugging.
Fixes: #9069
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We can use daemonset to deploy nydus snapshotter, which will decrease
one manual step both for Kata Containers and Confidential Containers CI.
Fixes: #8584
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR replaces the add_kernel_initrd_annotations_to_yaml function
more generic so later can be used for other components.
Fixes#9054
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
1. add_requests_to_policy_settings allows one or more ttrpc requests
from the Host to the Guest. Example:
add_requests_to_policy_settings "${policy_settings_dir}" \
"ReadStreamRequest" "WriteStreamRequest"
2. add_copy_from_host_to_policy_settings allows executing on the Guest
the commands initiated behind the scenes by "kubectl cp" from the
Host to the Guest. Example:
add_copy_from_host_to_policy_settings "${policy_settings_dir}"
3. add_copy_from_guest_to_policy_settings allows executing on the Guest
the commands initiated behind the scenes by "kubectl cp" from the
Guest to the Host. Example:
add_copy_from_guest_to_policy_settings "${policy_settings_dir}" \
"/tmp/file.txt"
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
1. Rename install_kata_common to install_kata_core.
2. Add TODO for better way to install the Kata tools.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Bump nydus snapshotter version to v0.13.7.
The new release name of nydus snapshotter is `nydus-snapshotter-v0.13.7-linux-amd64.tar.gz`,
which differs from the version used by kata (`nydus-snapshotter-v0.12.0-x86_64.tgz`).
Therefore, we need to update the script to obtain the correct nydus snapshotter name.
Fixes: #9044
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Automatically generate the test policy for k8s-attach-handlers.bats,
if AUTO_GENERATE_POLICY is enabled.
Steps:
- Create a temporary directory for the current test and copy the
common genpolicy settings into this new directory.
- Change genpolicy settings in the temp directory to allow the
"kubectl exec" command that this test needs. (For CoCo, exec is
blocked by the default policy settings)
- Auto-generate the policy for the test YAML file.
- Test as usual, using the YAML file.
- Clean-up the temporary settings described above.
Fixes: #8921
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Enable AUTO_GENERATE_POLICY for one of the Kata CI K8s test platforms.
Additional platforms will be enabled after testing them.
When AUTO_GENERATE_POLICY is enabled, create genpolicy settings that
are common for all tests. Some of the tests will make temporary copies
of these common settings and customize them as needed.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
1. Avoid repeating "kata-containers-k8s-tests".
2. Allow users to specify a different test namespace.
3. Introduce the TEST_CLUSTER_NAMESPACE variable, that will also be
useful when auto-generating the Agent Policy for these tests.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR skips the k8s tests that are not working with cloud hypervisor
runtime-rs with its proper issue.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The loop that generate test cases for virtio-mem enabled/disabled
doesn't return the integers '1' and '0' as expected. Instead it returns
the strings '{1,' and '0}'.
Fixes#9024
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR is a split of #8585.
make the changes on the Github workflows, and the skeleton to deploy_snapshotter()
and cleanup_snapshotter() in tests/integration/kubernetes/gha-run.sh in this commit.
After initially merging this patch to trigger CI jobs for CoCo, which will begin executing
the dummy functions deploy_snapshotter() and cleanup_snapshotter(), the implementation details for these functions
remain in #8585. Our subsequent step involves transferring this logic to the PR #8484, enabling the PR to undergo CI testing prior to its merge.
Fixes: #8997
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
delete_cluster() has tried to delete the az resources group regardless
if it exists. In some cases the result of that operation is ignored,
i.e., fail to resource group not found, but the log messages get a
little dirty. Let's delete the RG only if it exists then.
Fixes#8989
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This addresses an internal AKS issue that intermittently prevents
clusters from getting created. The fix has been rolled out to eastus but
not yet eastus2, so we unblock the CI by switching. No downsides in
general.
This supersedes #8990.
Fixes: #8989
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds workflow for running kubernetes test suite on ppc64le.
It uses scripts to create and delete the cluster using kubeadm as none of the current cluster creation tools are supported on Power.
Fixes: #7950
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
This PR re-arranged the nerdctl tests to avoid random failures.
In this PR first will run the tests with RunC and then with the kata hypervisor.
This PR tries to avoid the random failures that is happening with cloud-hypervisor
and clh.
Fixes#8963
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the required packages for the TensorFlow ResNet50
Int8 Dockerfile.
Fixes#8950
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Due to the changes done in the CI, we need to set the correct
subscription to be used with the account from now on, otherwise we'd end
up using CoCo subscription.
Fixes: #8946
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The existing confidential basic test titled `Test unencrypted
confidential container launch success and verify that we are
running in a secure enclave` has been updated to incorporate
IBM Secure Execution (`qemu-se`).
Previously, a secure image was absent from kata-deploy, hindering
the inclusion of IBM SE in the test.
Thanks to the #6755 update, it is now possible to test the TEE.
This modification extends the existing test by introducing
`qemu-se`. The specific changes are outlined below:
- Add an additional test `cc-se-e2e-tests` to s390x nightly
- Expansion of `REMOTE_COMMAND_PER_HYPERVISOR` for `qemu-se`
- Temporary exclusion of two test cases currently incompatible with IBM SE
(`cpu-ns` is a common issue across all TEEs, while `inotify`
will be addressed in a subsequent pull request).
Fixes: #8913
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add --show-output-of-passing-tests to the k8s integration tests. The
output of a passing test can be helpful when investigating a failure
of the same test.
Fixes: #8885
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the packages necessary to build the ResNet50 fp32
Dockerfile to run properly the benchmark.
Fixes#8875
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The changes to install and test genpolicy must come later, after CI
picks up these gha changes.
Fixes: #8856
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes the references to virtiofs from memory average
calculation when the container uses a shared file system other than
virtiofs.
Fixes: #8807
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the iperf3 server protocol as this server definition is
also used for the UDP iperf3 benchmarks to avoid duplication of the
same yaml files.
Fixes#8829
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses a specific python version to run tensorflow benchmark
as it needs python 3.8 to run correctly and avoid failures.
Fixes#8791
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Containerd runtime options with wrong setting cause it failed.
Correct it as below:
...
[plugins.cri.containerd.runtimes.${runtime}.options]
ConfigPath= "${KATA_CONFIG_PATH}"
...
Fixes: #8746
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This PR removes the check images function from stressng test as now
it will part of the install dependencies function from gha-run script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
To avoid random failures while trying to build and install the stressng image,
this PR moves that step as part of the install dependencies in order to move
the stability tests and avoid timeouts.
Fixes#8787
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Log the list of the current pods between tests because these pods
might be related to cluster nodes occasionally running out of memory.
Fixes: #8769
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the qemu-experimental hypervisor in the function to
kill kata components.
Fixes#8775
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As the StratoVirt VMM has been added, we can update the docs
and make some intoduction to StratoVirt, thus users can know more
about the hypervisor choices.
Fixes: #8645
Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR improves the iperf3 cleanup to ensure all the components are
being deleted properly to avoid the random failures of leaving
the iperf3 clients on the kata metrics CI.
Fixes#8765
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The default network backend of runtime-rs with Dragonball is vhost-net
after #8609 merged. The tests might be failed if vhost modules are not
loaded.
Fixes: #8717
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
After kata-deploy has installed, check that the worker nodes
are still in Ready state and don't have a containerd://Unknown
container runtime versions, identicating that container isn't working
to ensure that we didn't corrupt the containerd config during kata-deploy's edits
Fixes: #8678
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
To become more resilient against these kinds of errors:
deployment.apps/confidential-unencrypted created
pod/confidential-unencrypted-c5fdd6964-rrb6q condition met
ssh: connect to host 10.42.0.109 port 22: Connection refused
Fixes: #8687
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the indentation of the confidential common
script for kubernetes tests.
Fixes#8698
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Update the Speed & Density metric tests baseline for StratoVirt
and re-enable them, and skip other metric tests temporarily.
Fixes: #8656
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR improves the latency network cleanup by removing the pods
even if the test fails.
Fixes#8658
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Some copyright dates were not updated with the most recent changes to
code; update them.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Change directory for running make due to local errors when building with
make -C.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This PR updates the TensorFlow ResNet50 Int8 Dockerfile to use the
proper python version for kata metrics.
Fixes#8643
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Fix paths for yqdir (where the install_yq.sh script currently is) so
that static checks can run without error.
Fixes#8595
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This is to fix a broken usage for `k3s kubectl version` by switching
an option `--short` to `--client=true`.
Fixes: #8621
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Make the URL checker cycle through a list of user agent values until we
hit one the remote server is happy with.
This is required since, unfortunately, we really, really want to check
these URLs, but some sites block clients based on their `User-Agent`
(UA) request header value. And of course, each site is different and can
change its behaviour at any time.
Our strategy therefore is to try various UA's until we find one the
server accepts:
- No explicit UA (use `curl`'s default)
- Explicitly no UA.
- A blank UA.
- Partial UA values for various CLI tools.
- Partial UA values for various console web browsers.
- Partial UA for Emacs's built-in browser.
- The existing UA which is used as a "last ditch" attempt where the UA implies multiple platforms and browser.
> **Notes:**
>
> - The "partial UA" values specify specify the UA "product" but not the
> UA "product version": we specify `foo` and not `foo/1.2.3`). We do
> this since most sites tested appear to not care about the version.
> This is as expected given that the version is strictly optional (see `[*]`).
>
> - We now log all errors and display an error summary if none of the UAs
> worked, in addition to the simple list of the URLs we believe to be
> invalid. This should make future debugging simpler.
`[*]` - https://www.rfc-editor.org/rfc/rfc9110#section-10.1.5Fixes: #8553.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Split the call to `curl` in the URL checker out into a new
`run_url_check_cmd()` function to make `check_url()` slightly clearer.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
A prerequisite for measuring kata network bandwidth is
run Iperf3 tool at a the transport layer provided by a
k8s service for exposing a network where the clients
inside the cluster can use to contact Pods in the service.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR fixes small issues like:
1. Cleaning up the k8s environment by removing the iperf test
implementation even when the test fails.
2. Checks if the workload returned a result before generating
an empty results json file as it was bein done.
3. Removes the redundancy of calls to functions that process
subtests and should compose the results json file only when
all results are ready and not before.
4. The tcp service manifest was added to the server deployment
which targets TCP port 5201.
Fixes: #8534
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR enables but do not run the nerdctl tests for cloud hypervisor
runtime-rs until we find out how stable they are.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
It is to remove a GITHUB_WORKSPACE directory for self-hosted runners
when a workflow fails due to the merge conflict. This will prevent
the subsequent workflows from getting stuck in the same situation.
Fixes: #8600
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR updates the python version for the TensorFlow ResNet FP32
dockerfile so the benchmark can run without issues.
Fixes#8593
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This will make the life easier for dragonball developers to properly
enable the tests once the tests are ready.
Fixes: #8569
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will make the life easier for dragonball developers to properly
enable the tests once the tests are ready.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will make the life easier for dragonball developers to properly
enable the tests once the tests are ready.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the iperf parallel bandwidth limit for the kata
metrics CI.
Fixes#8530
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Print more information useful for debugging. Also, use a separate YAML
file for this test, instead of reusing someone else's file.
Fixes: #8270
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
As the configuration for the runtime-rs based drivers are now placed in
a different location than the golang ones, we should adapt this script
accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`kata-ctl` is the tool for runtime-rs, and it should be used instead of
`kata-runtime`.
`kata-ctl` requires sudo, and that's the reason it's also been added as
part of the calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`kata-runtime env` is an alias for `kata-runtime kata-env, and calling
it with the `env` paramenter allows us to easily extend the scripts to
use `kata-ctl` instead of `kata-runtime` when dealing with runtime-rs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've done some changes in the runtime-rs based drivers to install
their configuration into a different location, this should also be
reflected as part of this test.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Updates to scripts for static-checks.sh functionality, including common
functions location, the move of several common functions to the existing
common.bash, adding hadolint and xurls to the versions file, and changes
to static checks for running in the main kata containers repo.
The changes to the vendor check include searching for existing go.mod
files but no other changes to expand the test.
Fixes#8187
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Move tool as part of static checks migration.
Fixes#8187
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Derek Lee <derlee@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Marco Vedovati <mvedovati@suse.com>
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Move tool as part of static checks migration.
Fixes#8187
Signed-off-by: Bo Chen <chen.bo@intel.com>
Signed-off-by: Carlos Venegas <jos.c.venegas.munoz@intel.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Dan Middleton <dan.middleton@intel.com>
Signed-off-by: Derek Lee <derlee@redhat.com>
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Graham Whaley <graham.whaley@intel.com>
Signed-off-by: Hui Zhu <teawater@antfin.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Jimmy Xu <xjmmyshcn@gmail.com>
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Move the tool as a dependency for static checks migration.
Fixes#8187
Signed-off-by: Bin Liu <bin@hyper.sh>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Julio Montes <julio.montes@intel.com>
This PR updates the iperf3 network documentation to include
the parallel bandwidth.
Fixes#8523
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Implements the following test case:
Scenario: Check incorrect hash fails
**Given** I have a version of kata installed that has a kernel with the
initramfs built and config with rootfs_verity.scheme=dm-verity
rootfs_verity.hash=<incorrect hash of rootfs> set in the kernel_params
**When** I try and create a container a basic pod
**Then** The pod is doesn't run
**And** Ideally we'd get a helpful message to indicate why
Currently on CI only qemu-tdx is built with measured
rootfs support in the kernel, so the test is restriced to that
runtimeclass.
Fixes#7415
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Bring the setup_common() from CCv0 branch test's
integration/kubernetes/confidential/tests_common.sh. It should be used
to reduce boilerplates on the setup() of the tests.
Unlike the original code, this won't export the `test_start_time` variable
as it wouldn't be accurate to grab logs from the worker nodes due
date/time mismatch between the running tests machine and the worker
node. The function export the `node` variable which holds the name of
a random node which has kata installed. Apart from that, it exports the
`node_start_time` which capture the date/time when the test started,
relative to the `node`.
Tests that should inspect the logs can schedule pods/resources to the `node`
and use `node_start_time` as the value reference to grep the logs.
Fixes#7590
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Bring the assert_logs_contain() from CCv0 branch tests'
integration/kubernetes/confidential/lib.sh.
Introduced the print_node_journal() which uses `kubectl debug` to print
the systemd's journal of a k8s's node.
Fixes#7590
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This new function allow to the annotations to metadata section in a yaml
configuration file.
Co-authored-by: Ryan Savino <ryan.savino@amd.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Copied the new_pod_config() and pod-config.yaml.in from CCv0 branch
tests' integration/kubernetes/confidential/tests_common.sh and fixtures.
Unlike the original version, new_pod_config() now gets the runtimeclass
by parameter as the RUNTIMECLASS environment variable seems not broadly
used on main branch's CI.
The pod-config.yaml.in was changed as the diff shows below. In
particular the imagePullSecrets was removed to avoid it throwing a
warning on the pod's log.
```
--- a/tests/integration/kubernetes/runtimeclass_workloads/pod-config.yaml.in
+++ b/tests/integration/kubernetes/runtimeclass_workloads/pod-config.yaml.in
@@ -5,12 +5,10 @@
apiVersion: v1
kind: Pod
metadata:
- name: busybox-cc
+ name: test-e2e
spec:
runtimeClassName: $RUNTIMECLASS
containers:
- - name: nginx
+ - name: test_container
image: $IMAGE
- imagePullPolicy: Always
- imagePullSecrets:
- - name: cococred
\ No newline at end of file
+ imagePullPolicy: Always
\ No newline at end of file
```
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The following functions were copied from CCv0's branch test's
integration/kubernetes/confidential/lib.sh. I did just smalls
refactorings (shortened their names and delinted shellcheck warnings):
- k8s_delete_all_pods_if_any_exists()
- k8s_wait_pod_be_ready()
- k8s_create_pod()
- assert_pod_fail()
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Co-authored-by: Georgina Kinge <georgina.kinge@ibm.com>
Co-authored-by: Jordan Jackson <jordan.jackson@ibm.com>
Co-authored-by: Megan Wright <Megan.Wright@ibm.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
This PR fixes the result finding for the general throughput for
the tensorflow benchmark.
Fixes#8466
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As part of the CI migration, this PR is to add workflows for containerd and k8s for s390x.
Fixes: #7930
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit enables StratoVirt hypervisor to be tested in kata GHA,
incluing k8s, metrics, cri-containerd, nydus and so on.
Meanwhile, adding some unit tests for StratoVirt to make sure it works.
Fixes: #7794
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
This PR adds the iperf udp information to the network README
for the kata metrics CI.
Fixes#8452
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As we've done some changes in the VMM vcpu allocation, let's introduce
basic tests to make sure that we're getting the expected behaviour.
The test consists in checking 3 scenarios:
* default_vcpus = 0 | no limits set
* this should allocate 1 vcpu
* default_vcpus = 0.75 | limits set to 0.25
* this should allocate 1 vcpu
* default_vcpus = 0.75 | limits set to 1.2
* this should allocate 2 vcpus
The tests are very basic, but they do ensure we're rounding things up to
what the new logic is supposed to do.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The current kata-deploy code has been doing a `sed` to add allowed
hypervisor annotations, so CBL mariner can be tested with their own
kernel and initrd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no reason to escape the first + on the +k3s[0-9]\+ regex, as
shown here:
```sh
ubuntu@k3s:~$ /usr/local/bin/k3s kubectl version --short 2>/dev/null | \
grep "Client Version" | \
sed \
-e 's/Client Version: //' \
-e 's/+k3s[0-9]\+//'
v1.27.7
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems that with the new k3s release, they've bumped their kubectl
version from x.y.z+k3s1 to x.y.z+k3s2.
Let's ensure our regexp is more generic and future proof for such
changes.
Fixes: #8410
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`TestDeviceCgroup` is added to cri-containerd's integration tests. The test
launches two containers. Each container has a block device. It checks the
validity of device cgroup.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Add test to verify kata supports ipvlan networks.
This test can be bit tricky as it requires knowledge about host interfaces
to be used as a master for the ipvlan network.
However, with github actions, we can assume interface called eth0 to be
present on the host and functioning.
Fixes: #8366
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This PR makes the change to using the SIGKILL signal instead
of SIGTERM to force stop each kata component before start
running any metric test.
Fixes: #8336
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes double quotes in jq output to return raw strings
as input of checkmetrics tool.
Fixes: #8331
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR increases the number of attempts to stop kata components
when it is required usually before starting a metrics test.
Fixes: #8307
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR enables the new FIO test based on the containerd client
which is used to track the I/O metrics in the kata-ci environment.
Additionally this PR fixes the parsing of results.
Fixes: #8199
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR updates the values of the fio parameters for iodepth
requests and for the number of jobs, in order to increase the
number of sequential operations.
Additionally, it adds the list of packages needed to parse the
results.
Fixes: #8198
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
In bare-metal machines the git tree might get on unstable state with the
previous rebase left halfway. So let's attempt to abort any rebase before.
Fixes#8318
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the iperf udp benchmark for bandwdith measurement
for network metrics.
Fixes#8246
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Encode policy file during test - easier to understand than hard-coding
the encoded file contents.
Fixes: #8214
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR corrects the init env() helper function, to make that
systemctl always returns true when enumerating masked services,
and preventing the test from failing
Fixes: #8242
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
To avoid errors when initializing the test environment, the
kill_processes_before_start() helper function needs to verify that
docker is installed before attempting to stop it.
Fixes: #8218
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes trailing commas so that the json results
file is valid.
This PR also changes the way data results are collected by
terating through the array of memory values to calculate
their average.
Fixes: #8204
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the reference in the documentation to the DAX
subtest of the FIO benchmark, because this metric is currently
WIP.
Fixes: #8159
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds a trap whenever the scrip exits, it deletes the iperf
k8s deployment and k8s services, and deletes the kata components.
This way, when the script finishes, it verifies that there are
indeed no kata components still running.
Fixes: #8126
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The KUBERNETES variable is mostly used by kata-deploy whether to apply
k3s specific deployments or not. It is used to select the type of
kubernetes to be installed (k3s, k0s, rancher...etc) and it is always
set on CI. Running the script locally we want to set a value by default
to avoid `KUBERNETES: unbound variable` errors.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test can give false-positive on a multi-node cluster. Changed it to
use the new get_one_kata_node() and the modified exec_host() to run the
setup commands on a given node (that has kata installed) and ensure the
test pod is scheduled at that same node.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The exec_host() simply fails on cluster with multi-nodes because
`kubectl get node -o name" will return a list o names. Moreover, it will
return control nodes names which usually don't have kata installed.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The introduced get_one_kata_node() returns the first node that
has the kata-runtime=true label, i.e., supposedly a node with
kata installed.
This is useful for tests that should run on a determined worker
node on a multi-nodes cluster.
Fixes#7619
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Let KATA_HYPERVISOR be qemu by default in gh-run.sh as this variable
is required to tweak some configurations of kata-deploy.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The deploy-kata() of gha-run.sh will wait for 10 minutes for the kata
deploy installation finish. This allow users of the script to overwrite
that value by exporting the KATA_DEPLOY_WAIT_TIMEOUT environment
variable.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fixed a couple of warns shellcheck emitted and disabled others:
* SC2154 (var is referenced but not assigned)
* SC2086 (Double quote to prevent globbing and word splitting)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The only difference to the other platforms is that it needs to
export KUBECONFIG.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other deploy kata for
bare-metal (e.g. sev, tdx...etc) except that KUBECONFIG
should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The cleanup-kcli() behaves like other clean up for bare-metal (e.g. sev,
tdx...etc) except that KUBECONFIG should be exported.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
On CI workflows the variables DOCKER_REGISTRY, DOCKER_REPO and
DOCKER_TAG are exported to match the built image. However, when running
the script outside of CI context, a developer might just use the latest
image which in this case will be
`quay.io/kata-containers/kata-deploy-ci:kata-containers-latest`.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Adapted the gha-run.sh script to create a Kubernetes cluster locally
using the kcli tool.
Use `./gha-run.sh create-cluster-kcli` to create it, and
`./gha-run.sh delete-cluster-kcli` to delete.
Fixes#7620
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
I'm basically moving the runk tests from the tests repo to this one, and
I'm adding the "Signed-off-by:" of every single contributor the tests.
Fixes: #8116
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Chen Yiyang <cyyzero@qq.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
I'm basically moving the tracing tests from the tests repo to this one,
and I'm adding the "Signed-off-by:" of every single contributor to the
tests.
Fixes: #8114
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
This PR enables the use of jq pretty-print feature to
improve the formatting of metric results json files.
Fixes: #8081
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
FIO benchmark is enabled to measure IO in Kata
at different latencies using containerd client,
in order to complement the CI metrics testing set.
This PR asl deprecated the previous Fio bench
based on k8s.
Fixes: #8080
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The tests are failing when setting up k0s, and that happens because we
download a kubectl binary matching the kubernetes version k0s is using,
and we do that by:
```
sudo k0s kubectl version --short 2>/dev/null | ...
```
With kubectl 1.28, which is now the default on k0s, `kubectl version
--short` has been removed, leading us to an empty stringm causing then
the error in the CI.
Fixes: #8105
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The k8s.gcr.io is deprecated for a while now and has been redirected to
registry.k8s.io. However on some bare-metal machines in our testing
pools that redirection is not working, so let's just replace the
registries.
Fixes#8098
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
(cherry picked from commit b2c3bca558c38deff2117d5909d9071c23c05590)
Let's move, adapt, and use the kata-monitor tests from the tests repo.
In this PR I'm keeping the SoB from every single contributor from who
touched those tests in the past.
Fixes: #8074
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This will serve us quite will in the upcoming tests addition, which will
also have to be executed using CRi-O.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will become handy when doing tests with CRI-O, as CRI-O doesn't
install the CNI plugins for us.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's ensure we have runc running with `SystemdCgroups = false`,
otherwise we'll face failures when running tests depending on runc on
Ubuntu 22.04, woth LTS containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the latency yamls path for the latency test for
kata metrics.
Fixes#8055
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We've faced this as part of the CI, only happening with the CRI-O tests:
```
not ok 1 Test readonly volume for pods
# (from function `exec_host' in file tests_common.sh, line 51,
# in test file k8s-file-volume.bats, line 25)
# `exec_host "echo "$file_body" > $tmp_file"' failed with status 127
# [bats-exec-test:38] INFO: k8s configured to use runtimeclass
# bash: line 1: $'\r': command not found
#
# Error from server (NotFound): pods "test-file-volume" not found
```
I must say I didn't dig into figuring out why this is happening, but we
may be safe enough to just trail the '\r', as long as all the tests keep
passing on containerd.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need the default capabilities to be enabled, especially `SYS_CHROOT`,
in order to have tests accessing the host to pass.
A huge thanks to Greg Kurz for spotting this and suggesting the fix.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR fixes the network metrics section at the README by leaving
the current tests that we have in our kata metrics.
Fixes#8017
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the latency test for gha run script for kata metrics.
Fixes#8037
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Otherwise we'll face the following error:
```
Failed to enable unit: Interactive authentication required.
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is based on official CRI-O documentations[0] and right now we're
making this specific to Ubuntu as that's what we have as runners.
We may want to expand this in the future, but we're good for now.
[0]:
https://github.com/cri-o/cri-o/blob/main/install.md#apt-based-operating-systems
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with rke2 as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will be very useful in the near future, when we start testing
kata-deploy with k0s as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll be using exactly the same code used for the k8s tests, which are
already deploying k3s on GARM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We just need to make sure the correct overlay is applied, following what
we already have been doing for k3s.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
This was also done as part of fa62a4c01b,
for the k8s tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise only the first test will be executed
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Ideally we'd add the instance_type or the full K8S_TEST_HOST_TYPE but
that exceeds the maximum amount of characteres allowed for the cluster
name. With this in mind, let's use the first letter of
K8S_TEST_HOST_TYPE instead.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This makes it so that each AKS cluster is created in its own individual
resource group, rather than using the "kataCI" resource group for all
test clusters.
This is to accommodate a tool that we recently introduced in our Azure
subscription which automatically deletes resource groups after a set
amount of time, in order to keep spending under control.
The tool will automatically delete any resource group, unless it has a
tag SkipAutoDeleteTill = YYYY-MM-DD. When this tag is present, the
resource group will be retained until the specified date.
Note that I tagged all current resource groups in our subscription with
SkipAutoDeleteTill = 2043-01-01 so that we don't lose any existing
resources.
Fixes: #7982
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR enables the parallel bandwidth iperf limit for kata metrics.
Fixes#7989
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We're hitting a specific issue after updating, which will require some
work on dragonball before it can be re-added here.
The issue:
```
...
3: failed to do rafs mount\\n
4: fail to attach rafs \\\"/var/lib/containerd-nydus/snapshots/2/fs/image/image.boot\\\"\\n
5: add share fs mount\\n
6: Mount rafs at
/rafs/197ef3db03c86b91bf3045ff59183ce8b5750941ad1d3484f4a8301a70f5109f/rootfs_lower
error: Failed to Mount backend
...
Caused by:
vmm action error: FsDevice(AttachBackendFailed(\\\"attach/detach a
backend filesystem failed:: missing field `version` at line 1 column
489\\\"))\"): unknown"
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will ensure we're testing with the correct runtime, instead of
using the `default` one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
To support the v0.12.0 nydus-snapshotter, we need to update the config
files and the commandline to start nydus-snapshotter.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
And with this we finally enable the nydus tests to run as part of our
GHA CI.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been simply doing nothing whenever `install-kata` was called, and
that was the intent when we added the placeholder calls.
Now, let's install kata, as expected. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we've added install_nydus() and install_nydus_snapshotter(), which do
conform with the pattern we're following on GHA, let's rely on them
rather than relying on the bits coming from nydus_test.sh.
Later on we'll have install_nydus() and install_nydus_snapshotter() as
part of the dependencies install in our `gha-run.sh`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done for the cri-containerd tests, as part of
84dd02e0f9, we need to add the timeout
here for the crictl calls.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we may face errors like:
```
getting sandbox status of pod "d3af2db414ce8": metadata.Name,
metadata.Namespace or metadata.Uid is not in metadata
"&PodSandboxMetadata{Name:nydus-sandbox,Uid:,Namespace:default,Attempt:1,}"
getting sandbox status of pod "-A": rpc error: code = NotFound desc = an
error occurred when try to find sandbox: not found
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise we canoot properly start the nydus snapshotter, nor properly
kill it after it's been started.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The "source ..." we've been doing was not changed since those tests were
part of the Jenkins tests, and we need to adapt them, either setting the
correct path or entirely removing the ones that are not relevant to us
anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install the
nydus-snapshotter, and it follows the same pattern we already have
introduced for downloading and installing another dependencies from
GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will be used to download and install nydus, and it follows
the same pattern we already have introduced for downloading and
installing another dependencies from GitHub.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
clippy is used as part our tests, so it's useful to have it installed
while we're already installing rust.
In case of developers, they also better be using it. :-)
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll use it as part of the refactoring we're doing in the static check
tests.
I can see a lot of other uses of this, but changing all of them to this
one is out of the scope for this PR.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We can use this a lot as part of our CI, but right now I'm just moving
those here with the intent to use later on in this series.
Fixes: #7974 -- part 0
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let me start with a fair warning that this commit is hard to split into
different parts that could be easily tested (or not tested, just
ignored) without breaking pieces.
Now, about the commit itself, as we're on the run to reduce costs
related to our sponsorship on Azure, we can split the k8s tests we run
in 2 simple groups:
* Tests that can be run in the smaller Azure instance (D2s_v5)
* Tests that required the normal Azure instance (D4s_v5)
With this in mind, we're now passing to the tests which type of host
we're using, which allows us to select to run either one of the two
types of tests, or even both in case of running the tests on a baremetal
system.
Fixes: #7972
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The list of tests which require a bigger VM instance is:
* k8s-number-cpus.bats -- failing on all CIs
* k8s-parallel.bats -- only failing on the cbl-mariner CI
* k8s-scale-nginx.bats -- only failing on the cbl-mariner CI
We'll keep those disabled while we re-work the logic to **only run
those** in a bigger (and more expensive) VM instance.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Without setting the cpu limit / request to 1, we can make this test run
in a smaller VM instance without any issue.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We need a very recent L2 guest kernel to fix all the bugs that occur in nested
virtualization.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
cloud hypervisor does not emulate pcie switches or pci bridges, so we need to
accept a lonely device.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
It is fine to start a VM with the disk image without syncing it as we now run
the test in an ephemeral Azure instance.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
tdp_mmu had some issues up until around Linux v6.3 that make it work
particularly bad when running nested on Hyper-V. Reload the module at the start
of the test and disable the tdp_mmu param.
Gather debug info at the end of the test to make it easier to figure out what
went wrong. This uses github actions group syntax so that each section can be
collapsed.
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
- reduce memory and cpu usage to fit in a D4s_v5
- source correct lib
- mount workspace from 9p
- disable cpu mitigations for speed
- drop unused commands and variables
- install containerd
- install kata from built artifacts
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
This imports the vfio test scripts github.com/kata-containers/tests. The test
case doesn't work yet but doing the changes in a separate commit will make it
easier to track the changes. The only change in this commit is renaming
vfio_jenkins_job_build.sh -> vfio_fedora_vm_wrapper.sh
Fixes: #6555
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
Otherwise we'll fail to configure kata-containers in the `install-kata`
step.
This is mostly needed because the nerdctl-full tarball doesn't provide a
contaienrd configuration, just the binary, as contaienrd does not
actually require a configuration file to run with the default config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
TIL that the Azure VMs we use are created without an explicit outbund
connectivity defined.
This leads us to issues using `ping ...` as part of our tests, and when
consulting Jeremi Piotrowski about the issue he pointed me out to two
interesting links:
* https://learn.microsoft.com/en-us/azure/virtual-network/ip-services/default-outbound-access
* https://learn.microsoft.com/en-us/archive/blogs/mast/use-port-pings-instead-of-icmp-to-test-azure-vm-connectivity
For your own sanity, do not read the comments, after all this is
internet. :-)
Anyways, the suggestion is to use nping instead, which is provided by
the nmap package, so we can explicitly switch to using the tcp port 80
for the ping. With this in mind, I'm switching the image we use for the
test and using one that provided nping as a possible entry point, and
from now on (this part of) the tests should work.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fix kernel and initrd annotations in the k8s tests on Mariner. These
annotations must be applied to the spec.template for Deployment, Job
and ReplicationController resources.
Fixes: #7764
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR ensures that docker is running as part of the init_env function
in kata metrics to avoid failures like docker is not running and making
the kata metrics CI to fail.
Fixes#7898
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
FIO test is showing ongoing issues when running in k8s.
Working on running FIO on the ctr client which has been
shown to be stable.
Fixes: #7920
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Let's add a very basic sanity test to check that we can spawn a
containers using nerdctl + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7911
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a very basic sanity test to check that we can spawn a
containers using docker + Kata Containers.
This will ensure that, at least, we don't regress to the point where
this feature doesn't work at all.
For now we're running this test against Cloud Hypervisor and QEMU only,
due to an already reported issue with dragonball:
https://github.com/kata-containers/kata-containers/issues/7912
In the future, we should also test all the VMMs with devmapper, but
that's for a follow-up PR after this test is working as expected.
Fixes: #7910
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the metrics migration from the tests to kata containers has been completed, this PR removes the warning from the main metrics documentation.
Fixes#7894
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We're changing what's been done as part of ac939c458c, as we've
notcied issues using `github.event.pull_request.merge_commit_sha`.
Basically, whenever a force-push would happen, the reference of
merge_commit_sha wouldn't be updated, leading us to test PRs with the
old code. :-/
In order to get the rebase properly working, we need to ensure we pull
the hash of the commit as part of checkout action, and ensure
fetch-depth is set to 0.
Fixes: #7414
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's absolutely no need to have the skip check as part of the test
itself when it's already done as part of the setup function.
We're only touching the files here that were touched in the previous
commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's keep both checks for now, but in the future we'll be able to
remove the check for "firecracker", as the hypervisor name used as part
of the GitHub Actions has to match what's used as part of the
kata-deploy stuff, which is `fc` (as in `kata-fc for the runtime class)
instead of `firecracker`.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We've been using the `kata-deploy-tdx` target as that also uses k3s as
base, but it's better to just have a specific garm target.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So we have a better control on which flavour of kubernetes kata-deploy
is expected to be targetting.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR changes the order in which the FIO test first
cleans the environment and then checks if the environment
is indeed clean.
Fixes: #7869
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
As we were using `tee` without the `-a` (or `--apend`) aptton, the
containerd config would be overwritten, leading to a NotReady state of
the Node.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's download the vanilla kubectl binary into `/usr/bin/`, as we need
to avoid hitting issues like:
```sh
error: open /etc/rancher/k3s/k3s.yaml.lock: permission denied
```
The issue basically happens because k3s links `/usr/local/bin/kubectl`
to `/usr/local/bin/k3s`, and that does extra stuff that vanilla
`kubectl` doesn't do.
Also, in order to properly use the k3s.yaml config with the vanilla
kubectl, we're copying it to ~/.kube/config.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Otherwise the /etc/rancher/k3s/k3s.yaml is not readable by other users
than root.
As --write-config-mode is being passed, and that's an option that has to
be passed to the `server`, -s is also added to the command line.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`wait` waits for a job to complete, not a number of seconds. Not sure
how I got that wrong in the first place, but it's what it's.
Fixes: #6542
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR replaces the ubuntu image for one which has TensorFlow optimized
for kata metrics.
Fixes#7866
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This function right now is completely based on what's part of the tests
repo[0], and that's the reason I'm keeping the `Signed-off-by` of all
the contributors to that file.
This is not perfect, though, as it changes the default snapshotter to
devmapper, instead of only doing so for the Kata Containers specific
runtime handlers. OTOH, this is exactly what we've always been doing as
part of the tests.
We'll improve it, soon enough, when we get to also add a way for
kata-deploy to set up different snapshotters for different handlers.
But, for now, this is as good (or as bad) as it's always been.
It's important to note that the devmapper setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Shiming Zhang <wzshiming@foxmail.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
One can use different kubernetes flavours for getting a kubernetes
cluster up and running.
As part of our CI, though, I really would like to avoid contributors
spending time maintaining and updating kubernetes dependencies, as done
with the tests repo, and which has been proven to be really good on
getting things rotten.
With this in mind, I'm taking the bullet and using "k3s" as the way to
deploy kubernetes for the devmapper related tests, and that's the reason
I'm adding a function to do so, and this will be used later on as part
of this series.
It's important to note that the k3s setup doesn't take into
consideration a BM machine, and this is not suitable for that. We're
really only targetting GHA runners which will be thrown away after the
run is over.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the write 95 percentile for FIO for qemu for
checkmetrics for kata metrics.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the write 95 percentile FIO value for checkmetrics
for kata metrics.
Fixes#7842
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the iperf benchmark to run on the gha for kata metrics.
Fixes#7575
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX
Also, add an example of policy rejecting ExecProcessRequest.
Fixes: #7667
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.
Fixes#7812
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the memory inside limit for clh for kata metrics due
to the recent changes that we had in the script which impacted
in the performance measurement.
Fixes#7786
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's expand the confidential test to also support TDX.
The main difference on the test, though, is that we're not grepping for
a string in the `dmesg` output, but rather relying on `cpuid` to detect
a TDX guest.
Fixes: #7184
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a test case for the launch of unencrypted confidential
container, verifying that we are running inside a TEE.
Right now the test only works with SEV, but it'll be expanded in the
coming commits, as part of this very same series.
Fixes: #7184
Signed-Off-By: Unmesh Deodhar <udeodhar@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes an issues in the parsing results stage,
by collecting just the n-results from the n-running
containers, discarding irrelevant data.
Fixes: #7774
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
k8s-pid-ns.bats was already using the test name from
k8s-kill-all-process-in-container.bats - probably a copy/paste bug.
Fixes: #7753
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR configures the corresponding kata runtime in K8s
based on the tested hypervisor.
This PR also enables FIO metrics test in the kata metrics-ci.
Fixes: #7665
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR fixes the check results for tensorflow benchmark now
that we change the name of the test.
Fixes#7684
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Instead of doing this as part of the test itself, let's ensure it's done
before running the tests and during the tests cleanup.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This test, at least for now, only checks whether the runtimeclasses
have been properly created.
This is just a migration from a test we had as part of the k8s suite.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
By doing this we can make sure there won't be any clash on the cluster
name created for either the k8s or the kata-deploy tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This will have the same function as run-k8s-tests.sh has, but for
kata-deploy.
Right now it doesn't have any tests, and the command to actually run the
tests is commented out, but right now this is just a placeholder that
will be populated sooner than later.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a follow-up series, we'll add a whole suite for the kata-deploy
tests. With this in mind, let's already get rid of this one and avoid
more kata-deploy tests to land here.
Fixes: #7642
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The default `kata` runtime class would get created with the `kata`
handler instead of `kata-$KATA_HYPERVISOR`. This made Kata use the wrong
hypervisor and broke CI.
Fixes: #7663
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR fixes MobileNet help me description in the
tensorflow script.
Fixes#7661
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the TensorFlow word across the document to have uniformity
across all the document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As the cri-containerd tests have been fully migrated to GHA, let's make
sure we get them running.
Fixes: #6543
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As part of the runners, we're hitting a timeout that I cannot reproduce,
at all, when allocating the same instance and running the tests
manually.
The default timeout to connect to the server is 2s when using `crictl`.
Let's increase this to 20s.
It's fairly important to mention that in the first tests I used a
timeout of 10s, and that helped but we still hit issues every now and
then.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR renames the tensorflow scripts to include the data format
that is being used as we will have multiple tests with different
data and model formats for tensorflow so this will help us to
distinguish them.
Fixes#7645
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Right now this file does nothing, as it's not even called by any GHA.
However, it'll be populated later on as part of a different series,
where we'll have kata-deploy specific tests running here.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's split a good portion of `tests/integration/kuberentes/gha-run.sh`
out, and put them in a place where they can be used to the soon-to-come
kata-deploy specific tests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixed the loop that stops the kata-shim and the
hypervisors used in metrics checks.
Fixes: #7628
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds check containers are running in tensorflow mobilenet
that is being defined in common script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are up function from common
in tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are running function the common metrics
script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the check containers are up in the common script
in the tensorflow mobilenet script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the check containers are up from the common script
in the tensorflow script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the collect results function defined in common for
the tensorflow mobilenet test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the collect results function from tensorflow script
as it is going to be referenced in the common metrics script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the collect results function to the common metrics
script.
Fixes#7617
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR computes average results for TF bench.
Additionally, it improves the data parsing from
all running containers.
Fixes: #7603
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
With these 2 simple checks we can ensure that we do not regress on the
behaviour of allowing the runtime classes / default runtime class to be
created by the kata-deploy payload.
Fixes: #7491
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add here the image we'll be using for unencrypted confidential
tests. Later on, we'll make sure to build and use this image as part of
our CI.
The image can easily be built as a multi-arch image, and has `cpuid`
installed in case of `x86_64` build, so it can be used to detect whether
we're running on a TEE guest without having to rely on `dmesg | grep
...`.
Fixes: #7595
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We don't have to do any sed to replace the runtimeclass being used by
the moment we start taking advantage of the `DEFAULT_SHIM` environment
variable exposed merged in the previous commits.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of using package manager to install bats, building
this from source. This gives us the updated version of bats
which supports functions such as setup_file and
teardown_file.
We can use these functions into our current tests.
Fixes: #7597
Signed-off-by: Unmesh Deodhar <udeodhar@amd.com>
This PR changes the metrics workflow in order to just install
kata once, and run the checks for multiple hypervisor variations.
In this way we save time avoiding installing kata for each
hypervisor to be tested.
Fixes: #7578
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR renames the mobilenet tensorflow test to have a more specific
tensorflow name mainly because tensorflow has different configurations
and we will add more tensorflow tests so we want to distinguish each
tensorflow test.
Fixes#7571
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>