kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2025-06-02 20:36:37 +00:00

Author	SHA1	Message	Date
Dan Mihai	fdf3088be0	Merge pull request #10842 from microsoft/danmihai1/disable-job-policy-test tests: disable k8s-policy-job.bats on coco-dev	2025-02-06 09:09:49 -08:00
Hyounggyu Choi	1bdb34e880	tests: Skip trusted storage tests for IBM SE Let's skip all tests for trusted storage until #10838 is resolved. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-02-06 12:09:14 +01:00
Dan Mihai	47ce5dad9d	tests: disable k8s-policy-job.bats on coco-dev k8s-policy-job is modeled after the older k8s-job, and it appears that both of them fail occasionally on coco-dev. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-02-05 23:06:16 +00:00
Greg Kurz	0215d958da	Merge pull request #10805 from balintTobik/egrep_removal egrep/fgrep removal	2025-01-30 18:26:59 +01:00
Balint Tobik	1943a1c96d	tests: replace egrep with grep -E to avoid deprecation warning https://lists.gnu.org/archive/html/info-gnu/2022-09/msg00001.html Signed-off-by: Balint Tobik <btobik@redhat.com>	2025-01-29 11:26:27 +01:00
Hyounggyu Choi	dde627cef4	test: Run full set of zcrypttest for VFIO-AP coldplug Previously, the test for VFIO-AP coldplug only checked whether a passthrough device was attached to the VM guest. This commit expands the test to include a full set of zcrypttest to verify that the device functions properly within a container. Additionally, since containerd has been upgraded to v1.7.25 on the test machine, it is no longer necessary to run the test via crictl. The commit removes all related code and files. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-28 10:53:00 +01:00
Zvonko Kaiser	b4c710576e	Merge pull request #10782 from stevenhorsman/clh-metrics-write-update metrics: Increase minval range for blogbench test	2025-01-24 10:21:20 -05:00
Fabiano Fidêncio	b47cc6fffe	cri-containerd: Skip TestDeviceCgroup till it's adapted to cgroupsv2 As the devices controller works in a different way in cgroupsv2, the "/sys/fs/cgroup/devices/devices.list" file simply doesn't exist. For now, let's skip the test till the test maintainer decides to re-enable it for cgroupsv2. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
Fabiano Fidêncio	0626d7182a	tests: k8s-cpu-ns: Adapt to cgroupsv2 The changes done are: * cpu/cpu.shares was replaced by cpu.weight * The weight, according to our reference[0], is calculated by: weight = (1 + ((request - 2) * 9999) / 262142) * cpu/cpu.cfs_quota_us & cpu/cpu.cfs_period_us were replaced by cpu.max, where quota and period are written together (in this order) [0]: https://github.com/containers/crun/blob/main/crun.1.md#cgroup-v2 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
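Note on 0626d7182a: the conversion quoted in that commit message can be sketched as below. This is only an illustration of the formula as written, not the test's actual code; the function names are hypothetical, and the use of integer division and of the literal `max` for an unlimited quota are assumptions based on general cgroup v2 behaviour.

```python
def shares_to_weight(shares: int) -> int:
    """Map a cgroup v1 cpu.shares value (2..262144) to a cgroup v2 cpu.weight.

    Implements weight = 1 + ((shares - 2) * 9999) / 262142 from the commit
    message; integer division is an assumption.
    """
    return 1 + ((shares - 2) * 9999) // 262142


def cpu_max(quota_us: int, period_us: int) -> str:
    """Format the single cpu.max line: quota first, then period.

    A negative quota (v1's "no limit") is written as "max" (assumption).
    """
    quota = "max" if quota_us < 0 else str(quota_us)
    return f"{quota} {period_us}"


if __name__ == "__main__":
    print(shares_to_weight(1024))   # default v1 shares value
    print(cpu_max(50000, 100000))   # half a CPU per 100 ms period
```

With this mapping the two ends of the v1 range (2 and 262144) land on the two ends of the v2 range (1 and 10000).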
Fabiano Fidêncio	4307f0c998	Revert "ci: mariner: Ensure kernel_params can be set" This reverts commit `091ad2a1b2`, in order to ensure tests would be running with cgroupsv2 on the guest. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 17:25:56 +01:00
stevenhorsman	d031e479ab	metrics: Increase minval range for blogbench test In the last couple of days I've seen the blogbench metrics write latency test on clh fail a few times because the latency was too low, so adjust the minimum range to tolerate quicker finishes. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-23 15:58:31 +00:00
Fabiano Fidêncio	734ef71cf7	tests: k8s: confidential: Cleanup $HOME/.ssh/known_hosts I've noticed the following error when running the tests with SEV: ``` 2025-01-21T17:10:28.7999896Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2025-01-21T17:10:28.8000614Z # @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ 2025-01-21T17:10:28.8001217Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2025-01-21T17:10:28.8001857Z # IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! 2025-01-21T17:10:28.8003009Z # Someone could be eavesdropping on you right now (man-in-the-middle attack)! 2025-01-21T17:10:28.8003348Z # It is also possible that a host key has just been changed. 2025-01-21T17:10:28.8004422Z # The fingerprint for the ED25519 key sent by the remote host is 2025-01-21T17:10:28.8005019Z # SHA256:x7wF8zI+LLyiwphzmUhqY12lrGY4gs5qNCD81f1Cn1E. 2025-01-21T17:10:28.8005459Z # Please contact your system administrator. 2025-01-21T17:10:28.8006734Z # Add correct host key in /home/kata/.ssh/known_hosts to get rid of this message. 2025-01-21T17:10:28.8007031Z # Offending ED25519 key in /home/kata/.ssh/known_hosts:178 2025-01-21T17:10:28.8007254Z # remove with: 2025-01-21T17:10:28.8008172Z # ssh-keygen -f "/home/kata/.ssh/known_hosts" -R "10.244.0.71" ``` And this was causing a failure to ssh into the confidential pod. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 12:04:13 +01:00
Fabiano Fidêncio	18137b1583	tests: k8s: confidential: Increase log_buf_len to 4M Relying on dmesg is really not ideal, as we may lose important info, mainly those which happen very early in the boot, depending on the size of kernel ring buffer. So, for this specific test, let's increase the kernel ring buffer, by default, to 4M. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-23 12:04:13 +01:00
Aurélien Bombo	0d70dc31c1	ci: Unify on $GH_PR_NUMBER environment variable While working on #10559, I realized that some parts of the codebase use $GH_PR_NUMBER, while other parts use $PR_NUMBER. Notably, in that PR, since I used $GH_PR_NUMBER for CoCo non-TEE tests without realizing that TEE tests use $PR_NUMBER, the tests on that PR fail on TEEs: https://github.com/kata-containers/kata-containers/actions/runs/12818127344/job/35744760351?pr=10559#step:10:45 ... 44 error: error parsing STDIN: error converting YAML to JSON: yaml: line 90: mapping values are not allowed in this context ... 135 image: ghcr.io/kata-containers/csi-kata-directvolume: ... So let's unify on $GH_PR_NUMBER so that this issue doesn't repro in the future: I replaced all instances of PR_NUMBER with GH_PR_NUMBER. Note that since some test scripts also refer to that variable, the CI for this PR will fail (would have also happened with the converse substitution), hence I'm not adding the ok-to-test label and we should force-merge this after review. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2025-01-17 10:53:08 -06:00
Hyounggyu Choi	f7816e9206	tests: Introduce retry_kubectl_apply() for trusted storage On s390x, some tests for trusted storage occasionally failed due to: ```bash etcdserver: request timed out ``` or ```bash Internal error occurred: resource quota evaluation timed out ``` These timeouts were not observed previously on k3s but occur sporadically on kubeadm. Importantly, they appear to be temporary and transient, which means they can be ignored in most cases. To address this, we introduced a new wrapper function, `retry_kubectl_apply()`, for `kubectl create`. This function retries applying a given manifest up to 5 times if it fails due to a timeout. However, it will still catch and handle any other errors during pod creation. Fixes: #10651 Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2025-01-14 21:15:44 +01:00
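Note on f7816e9206: the retry behaviour described there (retry on timeouts, surface every other error at once) might look roughly like the following. This is a Python sketch, not the repository's Bash `retry_kubectl_apply()`; the `apply` callable, the `"timed out"` substring match, and reading "up to 5 times" as five attempts in total are all assumptions.

```python
import time
from typing import Callable, Tuple


def retry_on_timeout(
    apply: Callable[[], Tuple[int, str]],
    max_tries: int = 5,
    delay_s: float = 0.0,
) -> str:
    """Call apply() until it succeeds, retrying only on timeout errors.

    apply() is a hypothetical stand-in for running `kubectl create` and
    must return (exit_code, combined_output).
    """
    for attempt in range(1, max_tries + 1):
        rc, output = apply()
        if rc == 0:
            return output
        # Both messages quoted in the commit contain "timed out".
        if "timed out" not in output:
            raise RuntimeError(output)  # any other failure is not retried
        if attempt < max_tries:
            time.sleep(delay_s)
    raise TimeoutError(f"still timing out after {max_tries} attempts")
```

Keeping the timeout check narrow means genuine pod-creation errors still fail the test on the first attempt.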
Pradipta Banerjee	36580bb642	tests: Update sealed secret CI value to base64url The existing encoding was base64 and it fails due to `874948638a` Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>	2025-01-13 09:37:05 -05:00
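Note on 36580bb642: the difference between the two encodings named in that commit can be seen with a two-byte input chosen so that the standard alphabet produces `+` and `/`. The input bytes are purely illustrative and unrelated to the actual sealed secret value; whether the CI value keeps `=` padding is not stated in the commit and is not assumed here.

```python
import base64

# Illustrative bytes whose standard encoding uses both "+" and "/".
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw).decode()          # "+/8="
url_safe = base64.urlsafe_b64encode(raw).decode()  # "-_8="

if __name__ == "__main__":
    print(standard, url_safe)
```

A decoder that expects base64url rejects or misreads `+` and `/`, which is consistent with the failure the commit describes.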
Zvonko Kaiser	f08a9eac11	Merge pull request #10721 from stevenhorsman/more-metrics-latency-minimum-range-fixes metrics: Increase latency test range	2025-01-10 21:59:39 -05:00
Wainer Moschetta	5fae2a9f91	Merge pull request #9871 from wainersm/fix-print_cluster_name tests/gha-run-k8s-common: shorten AKS cluster name	2025-01-09 14:35:02 -03:00
stevenhorsman	aaae5b6d0f	metrics: clh: Increase network-iperf3 range We hit a failure with: ``` time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]" ``` The range is very big, but in the last 3 test runs I reviewed we have got a minimum value of 0.015s and a max value of 0.052, so there is a ~350% difference possible so I think we need to have a wide range to make this stable. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-09 11:25:57 +00:00
stevenhorsman	e946d9d5d3	metrics: qemu: Increase latency test range After the kernel version bump, in the latest nightly run https://github.com/kata-containers/kata-containers/actions/runs/12681309963/job/35345228400 The sequential read throughput result was 79.7% of the expected (so failed) and the sequential write was 84% of the expected, so was fairly close, so increase their minimum ranges to make them more robust. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-09 11:25:50 +00:00
Wainer dos Santos Moschetta	badc208e9a	tests/gha-run-k8s-common: shorten AKS cluster name The az client restricts the name to fewer than 64 characters, and in some cases (e.g. KATA_HYPERVISOR=qemu-runtime-rs) the generated name exceeds that limit. This changes the function to shorten the name: * a SHA1 is computed from the metadata and used to compose the cluster's name * the metadata is passed as plain text via --tags Fixes: #9850 Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>	2025-01-08 16:39:07 -03:00
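Note on badc208e9a: the naming scheme described there (hash the metadata for the name, keep the readable metadata as tags) could be sketched as follows. The prefix, metadata keys, separator, and 12-character digest length are hypothetical; only the use of SHA1 and the under-64-character limit come from the commit message.

```python
import hashlib
from typing import Dict, List


def short_cluster_name(prefix: str, metadata: Dict[str, str], digest_len: int = 12) -> str:
    """Build a short, deterministic name from a SHA1 of the metadata."""
    plain = "-".join(f"{key}={value}" for key, value in sorted(metadata.items()))
    digest = hashlib.sha1(plain.encode("utf-8")).hexdigest()[:digest_len]
    return f"{prefix}-{digest}"


def metadata_tags(metadata: Dict[str, str]) -> List[str]:
    """Render the metadata as plain-text key=value pairs for --tags."""
    return [f"{key}={value}" for key, value in sorted(metadata.items())]


if __name__ == "__main__":
    meta = {"hypervisor": "qemu-runtime-rs", "pr": "1234", "run": "42"}  # made-up values
    print(short_cluster_name("kata", meta))
    print(metadata_tags(meta))
```

The name length no longer depends on the metadata values, so a long hypervisor name cannot push it past the limit.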
Fabiano Fidêncio	8f8988fcd1	Merge pull request #10714 from fidencio/topic/update-virtiofsd virtiofsd: Update to its v1.13.0 ( + one patch) release :-)	2025-01-08 17:59:29 +01:00
Fabiano Fidêncio	eb3fe0d27c	Merge pull request #10717 from fidencio/topic/re-enable-oom-test-for-mariner tests: Re-enable oom tests for mariner	2025-01-08 17:43:56 +01:00
stevenhorsman	dc069d83b5	metrics: Increase latency test range The bump to kernel 6.12 seems to have reduced the latency in the metrics test, so increase the ranges for the minimal value, to account for this. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2025-01-08 15:11:49 +00:00
Fabiano Fidêncio	967d5afb42	Revert "tests: k8s: Skip one of the empty-dir tests" This reverts commit `9aea7456fb`. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-08 14:07:34 +01:00
Fabiano Fidêncio	53ac0f00c5	tests: Re-enable oom tests for mariner Since we bumped to the 6.12.x LTS kernel, we've also adjusted the aggressivity of the OOM test, which may be enough to allow us to re-enable it for mariner. Fixes: #8821 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-07 18:33:17 +01:00
Fabiano Fidêncio	f4a39e8c40	Merge pull request #10468 from fidencio/topic/early-tests-on-next-lts-kernel versions: Move kernel to the latest 6.12 release (the current LTS)	2025-01-07 18:02:04 +01:00
Fupan Li	b19db40343	CI: change the containerd tarball name to containerd Since https://github.com/containerd/containerd/pull/9096, containerd no longer ships the cri-containerd-*.tar.gz release bundles, so change the tarball name to "containerd". The containerd tarball contains the following files: bin/ bin/containerd-shim bin/ctr bin/containerd-shim-runc-v1 bin/containerd-stress bin/containerd bin/containerd-shim-runc-v2 so we should untar it into the /usr/local directory instead of "/" to stay aligned with cri-containerd. In addition, no containerd.service file, runc binary, or cni-plugin is included, so we should add a specific containerd.service file and install the runc binary and cni-plugin separately. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-01-07 17:39:05 +08:00
Fabiano Fidêncio	9aea7456fb	tests: k8s: Skip one of the empty-dir tests An issue has been created for this, and we should fix the issue before the next release. However, for now, let's unblock the kernel bump and have the test skipped. Reference: https://github.com/kata-containers/kata-containers/issues/10706 Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-06 21:48:20 +01:00
Fabiano Fidêncio	44ff602c64	tests: k8s: Be more aggressive to get OOM Let's increase the amount of bytes allocated per VM worker, so we can hit the OOM sooner. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2025-01-06 21:48:20 +01:00
Fupan Li	2068801b80	Merge pull request #10626 from teawater/ma Add mem-agent to kata	2024-12-24 14:11:36 +08:00
Steve Horsman	99f239bc44	Merge pull request #10380 from stevenhorsman/required-tests-guidance doc: Add required jobs info	2024-12-20 16:24:42 +00:00
stevenhorsman	d1d4bc43a4	static-checks: Add words to dictionary devmapper and snapshotters are being marked as spelling errors, so add them to the kata dictionary Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-20 14:16:52 +00:00
stevenhorsman	dd02b6699e	tests: Fix qemu-coc-dev skip Fix the logic so that the test is skipped on qemu-coco-dev, rather than the opposite, and update the syntax to make it clearer, as it was incorrectly written and reviewed by three different people in its prior form. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-19 19:50:46 +00:00
Hyounggyu Choi	341e5ca58e	vfio-ap: Assign default string "0" for empty APID and APQI The current script logic assigns an empty string to APID and APQI when APQN consists entirely of zeros (e.g., "00.0000"). However, this behavior is incorrect, as "00" and "0000" are valid values and should be represented as "0". This commit ensures that the script assigns the default string “0” to APID and APQI if their computed values are empty. Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>	2024-12-13 14:39:03 +01:00
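Note on 341e5ca58e: the fix described there amounts to falling back to "0" when stripping leading zeros leaves nothing. A minimal sketch, assuming the script derives APID and APQI by splitting the APQN on "." and removing leading zeros (the commit message implies this but does not show it); the function name is hypothetical.

```python
from typing import Tuple


def split_apqn(apqn: str) -> Tuple[str, str]:
    """Split an APQN such as "00.0000" into (APID, APQI) without leading zeros."""
    apid_raw, apqi_raw = apqn.split(".")
    # An all-zero field strips down to "", so fall back to the valid value "0".
    apid = apid_raw.lstrip("0") or "0"
    apqi = apqi_raw.lstrip("0") or "0"
    return apid, apqi


if __name__ == "__main__":
    print(split_apqn("00.0000"))
    print(split_apqn("0a.0001"))
```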
Ryan Savino	7d45382f54	Revert "ci: Skip the failing tests in SNP" This reverts commit `2242aee099`.	2024-12-10 16:20:31 -06:00
Aurélien Bombo	037281d699	Merge pull request #10593 from microsoft/saulparedes/improve_namespace_validation policy: improve pod namespace validation	2024-12-09 11:55:09 -06:00
Hui Zhu	4407f6e098	mem-agent: Add to src mem-agent is a component designed for managing memory in Linux environments. Sub-feature memcg: Utilizes the MgLRU feature to monitor each cgroup's memory usage and periodically reclaim cold memory. Sub-feature compact: Periodically compacts memory to facilitate the kernel's free page reporting feature, enabling the release of more idle memory from guests. During memory reclamation and compaction, mem-agent monitors system pressure using Pressure Stall Information (PSI). If the system pressure becomes too high, memory reclamation or compaction will automatically stop. Fixes: #10625 Signed-off-by: Hui Zhu <teawater@antgroup.com>	2024-12-06 10:00:02 +08:00
Wainer Moschetta	a94982d8b8	Merge pull request #10617 from stevenhorsman/skip-k8s-job-test-on-non-tee tests: Skip k8s job test on qemu-coco-dev	2024-12-04 15:47:33 -03:00
Saul Paredes	84a411dac4	policy: improve pod namespace validation - Remove default_namespace from settings - Ensure container namespaces in a pod match each other in case no namespace is specified in the YAML Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-12-04 10:17:54 -08:00
Steve Horsman	c86f76d324	Merge pull request #10588 from stevenhorsman/metrics-clh-min-range-relaxation metrics: Increase minval range for failing tests	2024-12-04 16:10:26 +00:00
stevenhorsman	a8ccd9a2ac	tests: Skip k8s job test on qemu-coco-dev The test is unstable on this platform, so skip it for now to prevent the regular known failures covering up other issues. See #10616 Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-12-04 16:00:05 +00:00
Saul Paredes	711d12e5db	policy: support optional metadata uid field This prevents a deserialization error when uid is specified Signed-off-by: Saul Paredes <saulparedes@microsoft.com>	2024-12-02 11:24:58 -08:00
stevenhorsman	b87b4b6756	metrics: Increase ranges range for qemu failing tests We've also seen the qemu metrics tests failing because the results are slightly outside the max range for the network-iperf3 parallel test and below the minimum for the network-iperf3 jitter test on PRs that have no code changes, so we've increased the bounds to not see false negatives. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-29 10:52:16 +00:00
stevenhorsman	4011071526	metrics: Increase minval range for failing tests We've seen a couple of instances recently where the metrics tests are failing due to the results being below the minimum value by ~2%. For tests like latency I'm not sure why values being too low would be an issue, but I've updated the minpercent range of the failing tests to try and get them passing. Signed-off-by: stevenhorsman <steven@uk.ibm.com>	2024-11-29 10:50:02 +00:00
Aurélien Bombo	16a91fccbe	Merge pull request #10561 from sprt/csi-driver-ci coco: ci: Lay groundwork for compiling and publishing CSI driver image [1/x]	2024-11-27 10:26:45 -06:00
Aurélien Bombo	5e4990bcf5	coco: ci: Add no-op steps to deploy CSI driver This adds no-op steps that'll be used to deploy and clean up the CSI driver used for testing. Signed-off-by: Aurélien Bombo <abombo@microsoft.com>	2024-11-21 16:08:06 -06:00
Adithya Krishnan Kannan	2242aee099	ci: Skip the failing tests in SNP Per [Issue#10549](https://github.com/kata-containers/kata-containers/issues/10549), the following tests are failing on SNP. 1. k8s-guest-pull-image-encrypted.bats 2. k8s-guest-pull-image-authenticated.bats 3. k8s-guest-pull-image-signature.bats 4. k8s-confidential-attestation.bats Per @fidencio 's comment on [PR#10558](https://github.com/kata-containers/kata-containers/pull/10558), I am skipping the same. Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>	2024-11-19 10:41:43 -06:00
Fabiano Fidêncio	9b1a5f2ac2	tests: Add a way to run only tests which rely on attestation We're doing this as, at Intel, we have two different kind of machines we can plug into our CI. Without going much into details, only one of those two kinds of machines will work for the attestation tests we perform with ITA, thus in order to speed up the CI and improve test coverage (OS wise), we're going to run different tests in different machines. Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>	2024-11-14 15:51:57 +01:00
GabyCT	06fe459e52	Merge pull request #10508 from GabyCT/topic/installartsta gha: Get artifacts when installing kata tools in stability workflow	2024-11-11 15:59:06 -06:00

1402 Commits