kata-containers

mirror of https://github.com/kata-containers/kata-containers.git synced 2026-05-14 19:17:07 +00:00

Author	SHA1	Message	Date
Zvonko Kaiser	d783e59b42	Merge pull request #12055 from fidencio/topic/coco-bump-trustee versions: Bump Trustee	2025-11-12 02:48:16 -05:00
Zvonko Kaiser	76e4e6bc24	Merge pull request #12061 from Apokleos/correct-unexpected-cap tests: Correct unexpected capability for policy failure test	2025-11-11 12:20:33 -05:00
Fabiano Fidêncio	d82eb8d0f1	ci: Drop docker tests We have had those tests broken for months. It's time to get rid of those. NOTE that we could easily revert this commit and re-add those tests as soon as we find someone to maintain and be responsible for such integration. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 17:02:02 +01:00
Fabiano Fidêncio	2d2b0de160	tests: kbs: Try to get the pod logs on deployment failure As this helps immensely to figure out what went wrong with the deployment. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:24 +01:00
Fabiano Fidêncio	58df06d90e	versions: Bump Trustee This is a bump pre-release, which brings several fixes and some improvements related to initData, and NVIDIA's remote verifier. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-11 08:08:05 +01:00
Alex Lyn	c225cba0e6	tests: Correct unexpected capability for policy failure test The test case designed to verify policy failures due to an "unexpected capability" was misconfigured. It was using "CAP_SYS_CHROOT" as the unexpected capability to be added. This configuration was flawed for two main reasons: 1.Incorrect Syntax: Kubernetes Pod specs expect capability names without the "CAP_" prefix (e.g., "SYS_CHROOT", not "CAP_SYS_CHROOT"). This made the test case's premise incorrect from a K8s API perspective. 2.Part of Default Set: "SYS_CHROOT" is already included in the `default_caps` list for a standard container. Therefore, adding it would not trigger a policy violation, defeating the purpose of the "unexpected capability" test. Furthermore, a related issue was observed where a malformed capability like "CAP_CAP_SYS_CHROOT" was being generated, causing parsing failures in the `oci-spec-rs` library. This was a symptom of incorrect string manipulation when handling capabilities. This commit corrects the test by selecting "SYS_NICE" as the unexpected capability. "SYS_NICE" is a more suitable choice because: - It is a valid Linux capability. - It is relatively harmless. - It is not part of the default capability set defined in `genpolicy-settings.json`. By using "SYS_NICE", the test now accurately simulates a scenario where a Pod requests a legitimate but non-default capability, which the policy (generated from a baseline Pod without this capability) should correctly reject. This change fixes the test's logic and also resolves the downstream `oci-spec-rs` parsing error by ensuring only valid capability names are processed. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-11-11 14:06:30 +08:00
Fabiano Fidêncio	92226d0a19	tests: nvidia: Be prepared for TDX Thankfully there's only one piece that's still SNP specific (for the supported TEEs). Let's adjust it so we can have an easy and smooth execution when adding a TDX CI machine. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	4d314e8676	tests: nvidia: nims: Adjust to CC There are several changes needed in order to get this test working with CC, and yet we still are skipping it. Basically, we need to: * Pull an authenticated image inside the guest, which requires: * Using Trustee to release the credential * We still depend on a PR to be merged on Trustee side * https://github.com/confidential-containers/trustee/pull/1035 * We still depend on a Trustee bump (including the PR above) on our side Apart from those changes, I ended up "duplicating" the tests by adding a "-tee" version of those, which already have: * The proper kbs annotations set up * Dropped host mounts * Increases the memory needed Last but not least, as "bats" probably means "being a terrible script", I had to re-arrange a few things otherwise the tests would not even run due to bats-isms that I am sincerely not able to pin-point. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	8cedd96d54	tests: nvidia: k8s: Enforce experimental_force_guest_pull We added the tests using virtio-9p as we knew it'd require incremental changes to be able to use any kind of guest-pull method. Now, as in the coming commits we'll be actually ensuring that guest-pull works and is in use, we can enforce the experimental_force_guest_pull usage for the nvidia cases. Note: We're using experimental_force_guest_pull instead of nydus-snapshotter due to stability concerns with the snapshotter. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	e85cf83573	k8s: tests: Fix default for EXPERIMENTAL_FORCE_GUEST_PULL It takes either a shim name or "", but we were treating this (thankfully only in this specific file) as a boolean. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Manuel Huber	8b39468b36	tests: nvidia: Logging for NIM Adjust output to the setup_file and teardown_file behavior. With this, we will be able to observe relevant logging rather than adding to the output variable. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-11-10 13:01:30 +01:00
Fabiano Fidêncio	812191c1f3	tests: nvidia: Do not deploy NFD on nvidia-gpu cases As it'll come from the GPU Operator for now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-10 13:01:30 +01:00
Dan Mihai	df7ee2dd38	ci: k8s: AUTO_GENERATE_POLICY for cbl-mariner Auto-generate policy on cbl-mariner Hosts if the user didn't specify an AUTO_GENERATE_POLICY value. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	53acb74f26	genpolicy: adapt to new AKS pause container behavior The new image reference has changed to mcr.microsoft.com/oss/v2/kubernetes/pause:3.6 from mcr.microsoft.com/oss/kubernetes/pause:3.6. The new image uses by default UID=0, GID=0 while the older. The older image had: UID=65535, GID=65535. There is a new pause_container_id_policy field in genpolicy-settings.json, informing genpolicy about the way AdditionalGids gets updated - "v1" for the older behavior and "v2" for the newer AKS version: - When using v1, the default value of AdditionalGids is {65535}. - When using v2, the default value of AdditionalGids is {}. UID=65535 and GID=65535 are still hard-coded by default in genpolicy-settings.json. We might be able to remove/ignore these fields in the future, if we'll stop relying on policy::KataSpec::get_process_fields to use these fields. A new CI function adapt_common_policy_settings_for_aks() changes the pause container UID, GID, pause_container_id_policy, and image ref settings values when testing on AKS Hosts - i.e., when testing coco-dev or mariner Hosts. The genpolicy workarounds for the unexpected behavior with guest pull enabled have been improved to use the current container's GID instead of hard-coding GID=0 as the guest pull default. Also, AdditionalGids gets updated when the current container's GID is changing, instead of always changing the AdditionalGids at the very end of policy::AgentPolicy::get_container_process(), when the relevant evolution of the GID value was no longer available. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Dan Mihai	cacd37ee6e	tests: genpolicy: restore test settings for non-Coco configMap These settings got broken recently because the non-CoCo tests were disabled for unrelated reasons. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-08 00:00:09 +01:00
Manuel Huber	c6dc176a03	tests: nvidia: cc: Enable NIMs tests Same deal as the previous commut, just enabling the tests here, with the same list of improvements that we will need to go through in order to get is working in a perfect way. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	8ca77f2655	tests: nvidia: cc: Run CUDA vectorAdd tests on CC mode While the primary goal of this change is to detect regressions to the NVIDIA SNP GPU scenario, various improvements to reflect a more realistic CC setting are planned in subsequent changes, such as: * moving away from the overlayfs snapshotter * disabling filesystem sharing * applying a pod security policy * activating the GPUs only after attestation * using a refined approach for GPU cold-plugging without requiring annotations * revisiting pod timeout and overhead parameters (the podOverhead value was increased due to CUDA vectorAdd requiring about 6Gi of podOverhead, as well as the inference and embedqa requiring at least 12Gi, respectively, 14Gi of podOverhead to run without invoking the host's oom-killer. We will revisit this aspect after addressing points 1. and 2.) Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-06 16:28:33 +01:00
Manuel Huber	d8953f67c5	ci: Onboard another NVIDIA machine Let's add a new NVIDIA machine, which later on will be used for CC related tests. For now the current tests are skipped in the CC capable machine. Signed-off-by: Manuel Huber <manuelh@nvidia.com> Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 23:23:08 +01:00
Fupan Li	02ecab40e4	tests: disable the cpu hotplug test for coco dev runtime Since qemu-coco-dev-runtime-rs and qemu-coco-dev had disabled the cpu&memory hotplug by enable static_sandbox_resource_mgmt, thus we should disable the cpu hotplug test for those two runtime. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-05 16:59:13 +01:00
Fupan Li	1fc05491a2	tests: enable the cpu hotplug test for dragonball etc Since the qemu, cloud-hypervisor and dragonball had supported the cpu hotplug on runtime-rs, thus enable the cpu hotplug test in CI. Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>	2025-11-05 16:59:13 +01:00
Fabiano Fidêncio	0a0de4e6e3	Revert "tests: Do not enable NFD on s390x" This reverts commit `c75a46d17f`, as NFD now publishes an s390x image (and also a ppc64le one). Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-05 16:06:33 +01:00
Fabiano Fidêncio	5b01eaf929	tests: Align kata-deploy helm's uninstall Let's use the same method both on the kata-deploy and k8s tests. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-04 09:29:35 +01:00
Dan Mihai	6a4c336ca0	Merge pull request #12016 from microsoft/danmihai1/early-wait-abort tests: k8s: reduce test time for unexpected CreateContainerRequest errors	2025-11-03 12:04:56 -08:00
Fabiano Fidêncio	3107533953	tests: Adjust to runtimeClass creation by the chart It's just a follow-up on the previous commit where we move away from the runtimeClass creation inside the script, and instead we do it using the chart itself. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 17:32:18 +01:00
Fabiano Fidêncio	14039c9089	golang: Update to 1.24.9 In order to fix: ``` === Running govulncheck on containerd-shim-kata-v2 === Vulnerabilities found in containerd-shim-kata-v2: === Symbol Results === Vulnerability #1: GO-2025-4015 Excessive CPU consumption in Reader.ReadResponse in net/textproto More info: https://pkg.go.dev/vuln/GO-2025-4015 Standard library Found in: net/textproto@go1.24.6 Fixed in: net/textproto@go1.24.8 Vulnerable symbols found: #1: textproto.Reader.ReadResponse Vulnerability #2: GO-2025-4014 Unbounded allocation when parsing GNU sparse map in archive/tar More info: https://pkg.go.dev/vuln/GO-2025-4014 Standard library Found in: archive/tar@go1.24.6 Fixed in: archive/tar@go1.24.8 Vulnerable symbols found: #1: tar.Reader.Next Vulnerability #3: GO-2025-4013 Panic when validating certificates with DSA public keys in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4013 Standard library Found in: crypto/x509@go1.24.6 Fixed in: crypto/x509@go1.24.8 Vulnerable symbols found: #1: x509.Certificate.Verify #2: x509.Certificate.Verify Vulnerability #4: GO-2025-4012 Lack of limit when parsing cookies can cause memory exhaustion in net/http More info: https://pkg.go.dev/vuln/GO-2025-4012 Standard library Found in: net/http@go1.24.6 Fixed in: net/http@go1.24.8 Vulnerable symbols found: #1: http.Client.Do #2: http.Client.Get #3: http.Client.Head #4: http.Client.Post #5: http.Client.PostForm Use '-show traces' to see the other 9 found symbols Vulnerability #5: GO-2025-4011 Parsing DER payload can cause memory exhaustion in encoding/asn1 More info: https://pkg.go.dev/vuln/GO-2025-4011 Standard library Found in: encoding/asn1@go1.24.6 Fixed in: encoding/asn1@go1.24.8 Vulnerable symbols found: #1: asn1.Unmarshal #2: asn1.UnmarshalWithParams Vulnerability #6: GO-2025-4010 Insufficient validation of bracketed IPv6 hostnames in net/url More info: https://pkg.go.dev/vuln/GO-2025-4010 Standard library Found in: net/url@go1.24.6 Fixed in: net/url@go1.24.8 Vulnerable symbols found: #1: url.JoinPath #2: url.Parse #3: url.ParseRequestURI #4: url.URL.Parse #5: url.URL.UnmarshalBinary Vulnerability #7: GO-2025-4009 Quadratic complexity when parsing some invalid inputs in encoding/pem More info: https://pkg.go.dev/vuln/GO-2025-4009 Standard library Found in: encoding/pem@go1.24.6 Fixed in: encoding/pem@go1.24.8 Vulnerable symbols found: #1: pem.Decode Vulnerability #8: GO-2025-4008 ALPN negotiation error contains attacker controlled information in crypto/tls More info: https://pkg.go.dev/vuln/GO-2025-4008 Standard library Found in: crypto/tls@go1.24.6 Fixed in: crypto/tls@go1.24.8 Vulnerable symbols found: #1: tls.Conn.Handshake #2: tls.Conn.HandshakeContext #3: tls.Conn.Read #4: tls.Conn.Write #5: tls.Dial Use '-show traces' to see the other 4 found symbols Vulnerability #9: GO-2025-4007 Quadratic complexity when checking name constraints in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2025-4007 Standard library Found in: crypto/x509@go1.24.6 Fixed in: crypto/x509@go1.24.9 Vulnerable symbols found: #1: x509.CertPool.AppendCertsFromPEM #2: x509.Certificate.CheckCRLSignature #3: x509.Certificate.CheckSignature #4: x509.Certificate.CheckSignatureFrom #5: x509.Certificate.CreateCRL Use '-show traces' to see the other 27 found symbols Vulnerability #10: GO-2025-4006 Excessive CPU consumption in ParseAddress in net/mail More info: https://pkg.go.dev/vuln/GO-2025-4006 Standard library Found in: net/mail@go1.24.6 Fixed in: net/mail@go1.24.8 Vulnerable symbols found: #1: mail.AddressParser.Parse #2: mail.AddressParser.ParseList #3: mail.Header.AddressList #4: mail.ParseAddress #5: mail.ParseAddressList ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-03 16:57:22 +01:00
Dan Mihai	c563ee99fa	tests: policy-rc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful replication controller with auto-generated policy in 123335ms ok 2 Policy failure: unexpected container command in 14601ms ok 3 Policy failure: unexpected volume mountPath in 14443ms ok 4 Policy failure: unexpected host device mapping in 14515ms ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14485ms ok 6 Policy failure: unexpected capability in 14382ms ok 7 Policy failure: unexpected UID = 1000 in 14578ms After this change: not ok 1 Successful replication controller with auto-generated policy in 17108ms ok 2 Policy failure: unexpected container command in 14427ms ok 3 Policy failure: unexpected volume mountPath in 14636ms ok 4 Policy failure: unexpected host device mapping in 14493ms ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14554ms ok 6 Policy failure: unexpected capability in 15087ms ok 7 Policy failure: unexpected UID = 1000 in 14371ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	319400dc0d	tests: policy-pvc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful pod with auto-generated policy in 94852ms ok 2 Policy failure: unexpected device mount in 17807ms After this change: not ok 1 Successful pod with auto-generated policy in 35194ms ok 2 Policy failure: unexpected device mount in 21355ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	1914fcb812	tests: policy-log: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Logs empty when ReadStreamRequest is blocked in 102257ms After this change: not ok 1 Logs empty when ReadStreamRequest is blocked in 17339ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	a0bd9e02ca	tests: policy-job: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful job with auto-generated policy in 107111ms ok 2 Policy failure: unexpected environment variable in 7920ms ok 3 Policy failure: unexpected command line argument in 7874ms ok 4 Policy failure: unexpected emptyDir volume in 7823ms ok 5 Policy failure: unexpected projected volume in 7812ms ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7903ms ok 7 Policy failure: unexpected UID = 222 in 7720ms After this change: not ok 1 Successful job with auto-generated policy in 10271ms ok 2 Policy failure: unexpected environment variable in 8018ms ok 3 Policy failure: unexpected command line argument in 7886ms ok 4 Policy failure: unexpected emptyDir volume in 7621ms ok 5 Policy failure: unexpected projected volume in 7843ms ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7632ms ok 7 Policy failure: unexpected UID = 222 in 7619ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	992c91371c	tests: policy-deployment-sc: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: ok 1 Successful sc deployment with auto-generated policy and container image volumes in 14769ms ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 8384ms not ok 3 Successful sc deployment with security context choosing another valid user in 136149ms ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 8862ms ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7941ms ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11612ms After: ok 1 Successful sc deployment with auto-generated policy and container image volumes in 15230ms ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 9364ms not ok 3 Successful sc deployment with security context choosing another valid user in 11060ms ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 9124ms ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7919ms ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11666ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	704ee76f1e	tests: policy-deployment-sc: reduced redundancy Call common function instead of copy/paste of three commands. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Dan Mihai	2cafb10a6a	tests: policy-pod: detect create container errors early During the ${wait_time} for an expected condition, if CreateContainerRequest was NOT expected to fail: detect possible CreateContainerRequest failures early and abort the wait. For example, before this change: not ok 1 Successful pod with auto-generated policy in 110801ms not ok 2 Able to read env variables sourced from configmap using envFrom in 94104ms not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 95838ms not ok 4 Successful pod with auto-generated policy and custom layers cache path in 110712ms ok 5 Policy failure: unexpected container image in 8113ms ok 6 Policy failure: unexpected privileged security context in 7943ms ok 7 Policy failure: unexpected terminationMessagePath in 11530ms ok 8 Policy failure: unexpected hostPath volume mount in 7970ms ok 9 Policy failure: unexpected config map in 7933ms not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 112677ms ok 11 RuntimeClassName filter: no policy in 2302ms not ok 12 ExecProcessRequest tests in 93946ms not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 94003ms ok 14 Policy failure: unexpected UID = 0 in 8016ms ok 15 Policy failure: unexpected UID = 1234 in 7850ms After: not ok 1 Successful pod with auto-generated policy in 12182ms not ok 2 Able to read env variables sourced from configmap using envFrom in 10121ms not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 11738ms not ok 4 Successful pod with auto-generated policy and custom layers cache path in 26592ms ok 5 Policy failure: unexpected container image in 7742ms ok 6 Policy failure: unexpected privileged security context in 7949ms ok 7 Policy failure: unexpected terminationMessagePath in 7789ms ok 8 Policy failure: unexpected hostPath volume mount in 7887ms ok 9 Policy failure: unexpected config map in 7818ms not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 9120ms ok 11 RuntimeClassName filter: no policy in 2081ms not ok 12 ExecProcessRequest tests in 9883ms not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 9870ms ok 14 Policy failure: unexpected UID = 0 in 11161ms ok 15 Policy failure: unexpected UID = 1234 in 7814ms Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-11-03 15:55:55 +00:00
Fabiano Fidêncio	c539a9e90e	tests: k8s: parallel: Increase timeout We've seen a few cases where we fail the test due to timeout and when we print the pods we just see that they've been created. With that in mind, let's just increase the timeout a little bit. Example: ``` not ok 1 Parallel jobs in 6250ms (in test file k8s-parallel.bats, line 41) `kubectl wait --for=condition=Ready --timeout=$timeout pod -l jobgroup=${job_name}' failed No resources found in kata-containers-k8s-tests namespace. [bats-exec-test:71] INFO: k8s configured to use runtimeclass job.batch/process-item-test1 created job.batch/process-item-test2 created job.batch/process-item-test3 created NAME STATUS COMPLETIONS DURATION AGE process-item-test1 Running 0/1 0s process-item-test2 Running 0/1 0s process-item-test3 Running 0/1 0s error: no matching resources found No resources found in kata-containers-k8s-tests namespace. No resources found in kata-containers-k8s-tests namespace. DEBUG: system logs of node 'aks-nodepool1-25989463-vmss000000' since test start time (2025-11-01 16:39:03) -- No entries -- job.batch "process-item-test1" deleted job.batch "process-item-test2" deleted job.batch "process-item-test3" deleted ``` Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-01 18:09:37 +01:00
Fabiano Fidêncio	8a5ebd5d16	tests: k8s: run QoS tests on a bigger instance It's been failing to start quite regularly on the smaller instance. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-11-01 17:54:58 +01:00
Fabiano Fidêncio	c75a46d17f	tests: Do not enable NFD on s390x As we're failing on the uninstall, which seems related to a bug on NFD itself, but I don't have access to a s390x machine to debug, let's skip the enablement for now and enable it back once we've experimented it better on s390x. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	67e38e0f92	tests: Do not enable NFD on cbl-mariner As we're failing to install NFD on CBL Mariner, let's skip the enablement there, and enable it once we've experimented it better there. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	1bc873397b	tests: Use NFD as part of the tests As we have the ability to deploy NFD as a sub-chart of our chart, let's make sure we test it during our CI. We had to increase the timeout values, where we had timeouts set, to deploy / undeploy kata, as now NFD is also deployed / undeployed. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-31 16:30:13 +01:00
Fabiano Fidêncio	e30e2b5f45	tests: k8s: Remove tests running on GitHub provided runner We have 2 tests running on GitHub provided runners: * devmapper * CRI-O - devmapper situation For devmapper, we're currently testing devmapper with s390x as part of one of its jobs. More than that, this test has been failing here due to a lack of space in the machine for quite some time, and no-action was taken to bring it back either via GARM or some other way. With that said, let's rely on the s390x CI to test devmapper and avoid one extra failure on our CI by removing this one. - cri-o situation CRI-O is being tested with a fixed version of kubernetes that's already reached its EOL, and a CRI-O version that matches that k8s version. There has been attempts to raise issues, and also to provide a PR that does at least part of the work ... leaving the debugging part for the maintainers of the CI. However, there was no action on those from the maintainers. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-30 11:46:59 +01:00
Manuel Huber	8dc78057d6	ci: Refactor NVIDIA NIM test Change NIM bats file logic to allow skipping test cases which require multiple GPUs. This can be helpful for test clusters where there is only one node with a single GPU, or for local test environments with a single-node cluster with a single GPU. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-28 19:12:16 +01:00
Manuel Huber	be32b77baf	ci: Add NVIDIA CUDA vectoradd test This change adds a CUDA vectoradd test case and makes enabling NVRC tracing optional and idempotent. Signed-off-by: Manuel Huber <manuelh@nvidia.com>	2025-10-28 19:12:16 +01:00
Alex Lyn	25ab615da5	Merge pull request #11913 from Apokleos/dedicated-error-rs CI: Add dedicated expected error message for runtime-rs	2025-10-27 10:47:07 +08:00
Dan Mihai	61ee4d7f8b	Merge pull request #11951 from burgerdev/watchable genpolicy: allow non-watchable ConfigMaps	2025-10-24 08:38:55 -07:00
Dan Mihai	ac3ea973ee	Merge pull request #11958 from microsoft/danmihai1/policy-tests-upstream5 tests: k8s: auto-generate policy for additional tests	2025-10-24 07:18:00 -07:00
Alex Lyn	e539432a91	CI: Add dedicated expected error message for runtime-rs Runtime-rs has its dedicated error message, we need handle it separately. Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>	2025-10-24 20:08:59 +08:00
Markus Rudy	acc7974602	genpolicy: allow non-watchable ConfigMaps If a ConfigMap has more than 8 files it will not be mounted watchable [1]. However, genpolicy assumes that ConfigMaps are always mounted at a watchable path, so containers with large ConfigMap mounts fail verification. This commit allows mounting ConfigMaps from watchable and non-watchable directories. ConfigMap mounts can't be meaningfully verified anyway, so the exact location of the data does not matter, except that we stay in the sandbox data dirs. [1]: `0ce3f5fc6f/docs/design/inotify.md (L11-L21)` Fixes: #11777 Signed-off-by: Markus Rudy <mr@edgeless.systems>	2025-10-23 15:45:17 +02:00
Fabiano Fidêncio	94adc58342	tests: Ensure helm secret for kata-deploy installation is cleaned up Every now and then, in case a failure happens, helm leaves the secret behind without cleaning it up, leading to issues in the consecutive runs. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-23 11:15:13 +02:00
Fabiano Fidêncio	12a515826d	tools: Install Golang from a reliable mirror (follow-up) Aurélien has moved to a reliable mirror for our tests, but we missed that our tools Dockerfiles could benefit from the same change, which is added now. Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>	2025-10-23 11:15:13 +02:00
Hyounggyu Choi	2c805900a4	Merge pull request #11891 from stevenhorsman/signature-tests-with-initdata tests/k8s: Add initdata variants of signature verification and registry authentication tests	2025-10-22 20:27:26 +02:00
Dan Mihai	d7176ffcc8	tests: k8s-sandbox-vcpus-allocation generated policy Auto-generate policy for k8s-sandbox-vcpus-allocation.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 21:36:49 +00:00
Dan Mihai	25299bc2a9	tests: k8s-block-volume.bats generated policy Auto-generate policy for k8s-block-volume.bats. Signed-off-by: Dan Mihai <dmihai@microsoft.com>	2025-10-21 21:36:40 +00:00

1 2 3 4 5 ...

1726 Commits