Commit Graph

17146 Commits

Author SHA1 Message Date
dependabot[bot]
cfec218f52 build(deps): bump tracing-subscriber from 0.2.25 to 0.3.20 in /src/agent
Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.25 to 0.3.20.
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.25...tracing-subscriber-0.3.20)

---
updated-dependencies:
- dependency-name: tracing-subscriber
  dependency-version: 0.3.20
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-05 17:23:09 +00:00
Fabiano Fidêncio
ace9cf942d tests: guest-pull: Fix names
When added, I've mistakenly used the wrong test-type name, which is now
fixed and should be enough to trigger the tests correctly.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 18:21:48 +01:00
Fupan Li
0df6c795d8 runtime-rs: disable the default static resource management
Since the qemu & cloud-hypervisor support the cpu & memory
hotplug now, thus disable the static resource management
for qemu and cloud-hypervisor by default.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fupan Li
02ecab40e4 tests: disable the cpu hotplug test for coco dev runtime
Since qemu-coco-dev-runtime-rs and qemu-coco-dev had disabled the
cpu&memory hotplug by enable static_sandbox_resource_mgmt, thus
we should disable the cpu hotplug test for those two runtime.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fupan Li
1fc05491a2 tests: enable the cpu hotplug test for dragonball etc
Since the qemu, cloud-hypervisor and dragonball had supported the
cpu hotplug on runtime-rs, thus enable the cpu hotplug test in CI.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-05 16:59:13 +01:00
Fabiano Fidêncio
0a0de4e6e3 Revert "tests: Do not enable NFD on s390x"
This reverts commit c75a46d17f, as NFD now
publishes an s390x image (and also a ppc64le one).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-05 16:06:33 +01:00
Manuel Huber
1561d7fbba runtime: Clear outer CDI annotations
Pod annotations from the outer runtime are being used for cold-plugging
CDI devices. We need to ensure that these annotations don't leak into
the inner runtime for which specific container (sibling) annotations
are being created. Without this change, the inner runtime receives both
annotations, leading to failing CDI injection as an outer runtime
annotation observed in the guest translates to an unresolvable CDI
device, for example, cdi.k8s.io/gpu: "nvidia.com/pgpu=0".

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-04 23:18:00 +01:00
Fabiano Fidêncio
1dfbb14093 tests: Stop testing on stratovirt
Stratovirt has been failing for a considerable amount of time, with no
sign of someone watching it and being actively working on a fix.

With this we also stop building and shipping stratovirt as part of our
release as we cannot test it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-04 10:22:46 +01:00
Fabiano Fidêncio
02f47d3f18 helm: uninstall: Take nodeSelector into consideration
As we're already doing for the install part, but this bit was missed
during review.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-04 09:29:35 +01:00
Fabiano Fidêncio
5b01eaf929 tests: Align kata-deploy helm's uninstall
Let's use the same method both on the kata-deploy and k8s tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-04 09:29:35 +01:00
Fabiano Fidêncio
4293cdf846 tests: Add stability tests for experimental-force-guest-pull
A few weeks ago we've tested nydus-snapshotter with this approach, and
we DID find issues with it.

Now, let's also test this with `experimental_force_guest_pull`.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-04 09:02:19 +01:00
Dan Mihai
6a4c336ca0 Merge pull request #12016 from microsoft/danmihai1/early-wait-abort
tests: k8s: reduce test time for unexpected CreateContainerRequest errors
2025-11-03 12:04:56 -08:00
Fabiano Fidêncio
3107533953 tests: Adjust to runtimeClass creation by the chart
It's just a follow-up on the previous commit where we move away from the
runtimeClass creation inside the script, and instead we do it using the
chart itself.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 17:32:18 +01:00
Fabiano Fidêncio
12f3b206eb Revert "kata-deploy: Allow setting the default runtime class name"
This reverts commit be05e1370c, which is
not a problem as we never released such option.

 Conflicts:
	tools/packaging/kata-deploy/helm-chart/README.md

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 17:32:18 +01:00
Fabiano Fidêncio
7cfa826804 kata-deploy: Let helm deal with runtimeClass creation
We had this logic inside the script when we didn't use the helm chart.
However, this only makes the shim script more convoluted for no reason.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 17:32:18 +01:00
Fabiano Fidêncio
14039c9089 golang: Update to 1.24.9
In order to fix:
```

=== Running govulncheck on containerd-shim-kata-v2 ===
 Vulnerabilities found in containerd-shim-kata-v2:
=== Symbol Results ===

Vulnerability #1: GO-2025-4015
    Excessive CPU consumption in Reader.ReadResponse in net/textproto
  More info: https://pkg.go.dev/vuln/GO-2025-4015
  Standard library
    Found in: net/textproto@go1.24.6
    Fixed in: net/textproto@go1.24.8
    Vulnerable symbols found:
      #1: textproto.Reader.ReadResponse

Vulnerability #2: GO-2025-4014
    Unbounded allocation when parsing GNU sparse map in archive/tar
  More info: https://pkg.go.dev/vuln/GO-2025-4014
  Standard library
    Found in: archive/tar@go1.24.6
    Fixed in: archive/tar@go1.24.8
    Vulnerable symbols found:
      #1: tar.Reader.Next

Vulnerability #3: GO-2025-4013
    Panic when validating certificates with DSA public keys in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4013
  Standard library
    Found in: crypto/x509@go1.24.6
    Fixed in: crypto/x509@go1.24.8
    Vulnerable symbols found:
      #1: x509.Certificate.Verify
      #2: x509.Certificate.Verify

Vulnerability #4: GO-2025-4012
    Lack of limit when parsing cookies can cause memory exhaustion in net/http
  More info: https://pkg.go.dev/vuln/GO-2025-4012
  Standard library
    Found in: net/http@go1.24.6
    Fixed in: net/http@go1.24.8
    Vulnerable symbols found:
      #1: http.Client.Do
      #2: http.Client.Get
      #3: http.Client.Head
      #4: http.Client.Post
      #5: http.Client.PostForm
      Use '-show traces' to see the other 9 found symbols

Vulnerability #5: GO-2025-4011
    Parsing DER payload can cause memory exhaustion in encoding/asn1
  More info: https://pkg.go.dev/vuln/GO-2025-4011
  Standard library
    Found in: encoding/asn1@go1.24.6
    Fixed in: encoding/asn1@go1.24.8
    Vulnerable symbols found:
      #1: asn1.Unmarshal
      #2: asn1.UnmarshalWithParams

Vulnerability #6: GO-2025-4010
    Insufficient validation of bracketed IPv6 hostnames in net/url
  More info: https://pkg.go.dev/vuln/GO-2025-4010
  Standard library
    Found in: net/url@go1.24.6
    Fixed in: net/url@go1.24.8
    Vulnerable symbols found:
      #1: url.JoinPath
      #2: url.Parse
      #3: url.ParseRequestURI
      #4: url.URL.Parse
      #5: url.URL.UnmarshalBinary

Vulnerability #7: GO-2025-4009
    Quadratic complexity when parsing some invalid inputs in encoding/pem
  More info: https://pkg.go.dev/vuln/GO-2025-4009
  Standard library
    Found in: encoding/pem@go1.24.6
    Fixed in: encoding/pem@go1.24.8
    Vulnerable symbols found:
      #1: pem.Decode

Vulnerability #8: GO-2025-4008
    ALPN negotiation error contains attacker controlled information in
    crypto/tls
  More info: https://pkg.go.dev/vuln/GO-2025-4008
  Standard library
    Found in: crypto/tls@go1.24.6
    Fixed in: crypto/tls@go1.24.8
    Vulnerable symbols found:
      #1: tls.Conn.Handshake
      #2: tls.Conn.HandshakeContext
      #3: tls.Conn.Read
      #4: tls.Conn.Write
      #5: tls.Dial
      Use '-show traces' to see the other 4 found symbols

Vulnerability #9: GO-2025-4007
    Quadratic complexity when checking name constraints in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4007
  Standard library
    Found in: crypto/x509@go1.24.6
    Fixed in: crypto/x509@go1.24.9
    Vulnerable symbols found:
      #1: x509.CertPool.AppendCertsFromPEM
      #2: x509.Certificate.CheckCRLSignature
      #3: x509.Certificate.CheckSignature
      #4: x509.Certificate.CheckSignatureFrom
      #5: x509.Certificate.CreateCRL
      Use '-show traces' to see the other 27 found symbols

Vulnerability #10: GO-2025-4006
    Excessive CPU consumption in ParseAddress in net/mail
  More info: https://pkg.go.dev/vuln/GO-2025-4006
  Standard library
    Found in: net/mail@go1.24.6
    Fixed in: net/mail@go1.24.8
    Vulnerable symbols found:
      #1: mail.AddressParser.Parse
      #2: mail.AddressParser.ParseList
      #3: mail.Header.AddressList
      #4: mail.ParseAddress
      #5: mail.ParseAddressList
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-03 16:57:22 +01:00
Dan Mihai
c563ee99fa tests: policy-rc: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

not ok 1 Successful replication controller with auto-generated policy in 123335ms
ok 2 Policy failure: unexpected container command in 14601ms
ok 3 Policy failure: unexpected volume mountPath in 14443ms
ok 4 Policy failure: unexpected host device mapping in 14515ms
ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14485ms
ok 6 Policy failure: unexpected capability in 14382ms
ok 7 Policy failure: unexpected UID = 1000 in 14578ms

After this change:

not ok 1 Successful replication controller with auto-generated policy in 17108ms
ok 2 Policy failure: unexpected container command in 14427ms
ok 3 Policy failure: unexpected volume mountPath in 14636ms
ok 4 Policy failure: unexpected host device mapping in 14493ms
ok 5 Policy failure: unexpected securityContext.allowPrivilegeEscalation in 14554ms
ok 6 Policy failure: unexpected capability in 15087ms
ok 7 Policy failure: unexpected UID = 1000 in 14371ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
319400dc0d tests: policy-pvc: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

not ok 1 Successful pod with auto-generated policy in 94852ms
ok 2 Policy failure: unexpected device mount in 17807ms

After this change:

not ok 1 Successful pod with auto-generated policy in 35194ms
ok 2 Policy failure: unexpected device mount in 21355ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
1914fcb812 tests: policy-log: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

not ok 1 Logs empty when ReadStreamRequest is blocked in 102257ms

After this change:

not ok 1 Logs empty when ReadStreamRequest is blocked in 17339ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
a0bd9e02ca tests: policy-job: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

not ok 1 Successful job with auto-generated policy in 107111ms
ok 2 Policy failure: unexpected environment variable in 7920ms
ok 3 Policy failure: unexpected command line argument in 7874ms
ok 4 Policy failure: unexpected emptyDir volume in 7823ms
ok 5 Policy failure: unexpected projected volume in 7812ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7903ms
ok 7 Policy failure: unexpected UID = 222 in 7720ms

After this change:

not ok 1 Successful job with auto-generated policy in 10271ms
ok 2 Policy failure: unexpected environment variable in 8018ms
ok 3 Policy failure: unexpected command line argument in 7886ms
ok 4 Policy failure: unexpected emptyDir volume in 7621ms
ok 5 Policy failure: unexpected projected volume in 7843ms
ok 6 Policy failure: unexpected readOnlyRootFilesystem in 7632ms
ok 7 Policy failure: unexpected UID = 222 in 7619ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
992c91371c tests: policy-deployment-sc: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

ok 1 Successful sc deployment with auto-generated policy and container image volumes in 14769ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 8384ms
not ok 3 Successful sc deployment with security context choosing another valid user in 136149ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 8862ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7941ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11612ms

After:

ok 1 Successful sc deployment with auto-generated policy and container image volumes in 15230ms
ok 2 Successful sc with fsGroup/supplementalGroup deployment with auto-generated policy and container image volumes in 9364ms
not ok 3 Successful sc deployment with security context choosing another valid user in 11060ms
ok 4 Successful layered sc deployment with auto-generated policy and container image volumes in 9124ms
ok 5 Policy failure: unexpected GID = 0 for layered securityContext deployment in 7919ms
ok 6 Policy failure: malicious root group added via supplementalGroups deployment in 11666ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
704ee76f1e tests: policy-deployment-sc: reduced redundancy
Call common function instead of copy/paste of three commands.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Dan Mihai
2cafb10a6a tests: policy-pod: detect create container errors early
During the ${wait_time} for an expected condition, if
CreateContainerRequest was NOT expected to fail: detect possible
CreateContainerRequest failures early and abort the wait.

For example, before this change:

not ok 1 Successful pod with auto-generated policy in 110801ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 94104ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 95838ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 110712ms
ok 5 Policy failure: unexpected container image in 8113ms
ok 6 Policy failure: unexpected privileged security context in 7943ms
ok 7 Policy failure: unexpected terminationMessagePath in 11530ms
ok 8 Policy failure: unexpected hostPath volume mount in 7970ms
ok 9 Policy failure: unexpected config map in 7933ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 112677ms
ok 11 RuntimeClassName filter: no policy in 2302ms
not ok 12 ExecProcessRequest tests in 93946ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 94003ms
ok 14 Policy failure: unexpected UID = 0 in 8016ms
ok 15 Policy failure: unexpected UID = 1234 in 7850ms

After:

not ok 1 Successful pod with auto-generated policy in 12182ms
not ok 2 Able to read env variables sourced from configmap using envFrom in 10121ms
not ok 3 Successful pod with auto-generated policy and runtimeClassName filter in 11738ms
not ok 4 Successful pod with auto-generated policy and custom layers cache path in 26592ms
ok 5 Policy failure: unexpected container image in 7742ms
ok 6 Policy failure: unexpected privileged security context in 7949ms
ok 7 Policy failure: unexpected terminationMessagePath in 7789ms
ok 8 Policy failure: unexpected hostPath volume mount in 7887ms
ok 9 Policy failure: unexpected config map in 7818ms
not ok 10 Policy failure: unexpected lifecycle.postStart.exec.command in 9120ms
ok 11 RuntimeClassName filter: no policy in 2081ms
not ok 12 ExecProcessRequest tests in 9883ms
not ok 13 Successful pod: runAsUser having the same value as the UID from the container image in 9870ms
ok 14 Policy failure: unexpected UID = 0 in 11161ms
ok 15 Policy failure: unexpected UID = 1234 in 7814ms

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-03 15:55:55 +00:00
Alex Lyn
897ecfb503 Merge pull request #12014 from fidencio/topic/release-ensure-helm-dependencies-update
scripts: release: Run helm dependencies update
2025-11-03 16:34:17 +08:00
Fabiano Fidêncio
c539a9e90e tests: k8s: parallel: Increase timeout
We've seen a few cases where we fail the test due to timeout and when we
print the pods we just see that they've been created.

With that in mind, let's just increase the timeout a little bit.

Example:
```
not ok 1 Parallel jobs in 6250ms
 (in test file k8s-parallel.bats, line 41)
   `kubectl wait --for=condition=Ready --timeout=$timeout pod -l jobgroup=${job_name}' failed
 No resources found in kata-containers-k8s-tests namespace.
 [bats-exec-test:71] INFO: k8s configured to use runtimeclass
 job.batch/process-item-test1 created
 job.batch/process-item-test2 created
 job.batch/process-item-test3 created
 NAME                 STATUS    COMPLETIONS   DURATION   AGE
 process-item-test1   Running   0/1                      0s
 process-item-test2   Running   0/1                      0s
 process-item-test3   Running   0/1                      0s
 error: no matching resources found
 No resources found in kata-containers-k8s-tests namespace.
 No resources found in kata-containers-k8s-tests namespace.
 DEBUG: system logs of node 'aks-nodepool1-25989463-vmss000000' since test start time (2025-11-01 16:39:03)
 -- No entries --
 job.batch "process-item-test1" deleted
 job.batch "process-item-test2" deleted
 job.batch "process-item-test3" deleted
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-01 18:09:37 +01:00
Fabiano Fidêncio
8a5ebd5d16 tests: k8s: run QoS tests on a bigger instance
It's been failing to start quite regularly on the smaller instance.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-01 17:54:58 +01:00
Fabiano Fidêncio
157b2c32ce scripts: release: Run helm dependencies update
Otherwise we'll face issues like:
```
Error: found in Chart.yaml, but missing in charts/ directory: node-feature-discovery
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-01 17:54:58 +01:00
Fabiano Fidêncio
c75a46d17f tests: Do not enable NFD on s390x
As we're failing on the uninstall, which seems related to a bug on NFD
itself, but I don't have access to a s390x machine to debug, let's skip
the enablement for now and enable it back once we've experimented it
better on s390x.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:30:13 +01:00
Fabiano Fidêncio
67e38e0f92 tests: Do not enable NFD on cbl-mariner
As we're failing to install NFD on CBL Mariner, let's skip the
enablement there, and enable it once we've experimented it better there.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:30:13 +01:00
Fabiano Fidêncio
1bc873397b tests: Use NFD as part of the tests
As we have the ability to deploy NFD as a sub-chart of our chart, let's
make sure we test it during our CI.

We had to increase the timeout values, where we had timeouts set, to
deploy / undeploy kata, as now NFD is also deployed / undeployed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:30:13 +01:00
Fabiano Fidêncio
ebe15d154e kata-deploy: Add NFD as a dependency
Let's ensure that we add NFD as a weak dependency of the kata-deploy
helm chart.

What we're doing for now is leaving it up to the user / admin to enable
it, and if enabled then we do a explicit check for virtualization
support (x86_64 only for now).

In case NFD is already deployed, we fail the installation (in case it's
enabled on the kata-deploy helm chart) with a clear error message to the
user.

While I know that kata-remote **DOES NOT** require virtualization, I've
left this out (with a comment for when we add a peer-pods dependency on
kata-deploy) in order to simplify things for now, as kata-remote is not
a deployed shim by default.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:30:13 +01:00
Fabiano Fidêncio
be05e1370c kata-deploy: Allow setting the default runtime class name
As Kata Containers can be consumed by other helm-charts, hard coding the
default runtime class name to `kata` is not optimal.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:14:53 +01:00
Fabiano Fidêncio
820e6d6351 kata-deploy: Add more per-arch options
All the options that take a specific shim as an argument MUST have
specific per arch settings, as not all the shims are available for all
the arches, leading to issues when setting up multi-arch deployments.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 16:14:53 +01:00
Zvonko Kaiser
94abe4fc00 osbuilder: nvrc: Consume NVRC release instead of building it
Let's ensure that we consume NVRC releases straight from GitHub instead
of building the binaries ourselves.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-31 12:10:20 +01:00
Zvonko Kaiser
69c76971f3 gpu: Handle VFIO and IOMMUFD
We have here either /dev/vfio/<num> or /dev/vfio/devices/vfio<num>,
for IOMMUFD format /dev/vfio/devices/vfio<num>, strip "vfio" prefix

/dev/vfio/123 - basename "123" - vfioNum = "123" - cdi.k8s.io/vfio123
/dev/vfio/devices/vfio123 - basename "vfio123" - strip - vfioNum = "123" - cdi.k8s.io/vfio123

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-31 09:46:07 +01:00
Fabiano Fidêncio
e30e2b5f45 tests: k8s: Remove tests running on GitHub provided runner
We have 2 tests running on GitHub provided runners:
* devmapper
* CRI-O

- devmapper situation

For devmapper, we're currently testing devmapper with s390x as part of
one of its jobs.

More than that, this test has been failing here due to a lack of space
in the machine for quite some time, and no-action was taken to bring it
back either via GARM or some other way.

With that said, let's rely on the s390x CI to test devmapper and avoid
one extra failure on our CI by removing this one.

- cri-o situation

CRI-O is being tested with a fixed version of kubernetes that's already
reached its EOL, and a CRI-O version that matches that k8s version.

There has been attempts to raise issues, and also to provide a PR that
does at least part of the work ... leaving the debugging part for the
maintainers of the CI. However, there was no action on those from the
maintainers.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-30 11:46:59 +01:00
Alex Lyn
fa521220a9 Merge pull request #11816 from jiuyi123/rs-vm-template-kata-ctl-merge
kata-ctl: add factory subcommands for VM template management
2025-10-30 18:21:12 +08:00
ssc
551caad4b1 docs: add guide on VM templating usage in runtime-rs
- Explained the concept and benefits of VM templating
- Provided step-by-step instructions for enabling VM templating
- Detailed the setup for using snapshotter in place of VirtioFS for template-based VM creation
- Added performance test results comparing template-based and direct VM creation

Signed-off-by: ssc <741026400@qq.com>
2025-10-30 15:18:31 +08:00
ssc
5a586e13a1 kata-ctl: add factory subcommands for VM template management
- init: initialize the VM template factory
- status: check the current factory status
- destroy: clean up and remove factory resources
These commands provide basic lifecycle management for VM templates.

Signed-off-by: ssc <741026400@qq.com>
2025-10-30 10:27:17 +08:00
RuoqingHe
8878c46e8f Merge pull request #11867 from spectator333/update-rust-vmm-deps
dragonball: Bump kvm-ioctls to fix security issue
2025-10-30 00:17:29 +08:00
Siyu Tao
dd444d23b3 dragonball: Bump kvm-ioctls to fix security issue
Use `ioctl_with_mut_ref` instead of `ioctl_with_ref` in the
`create_device` method as it needs to write to the `kvm_create_device`
struct passed to it, which was released in v0.12.1.

Signed-off-by: Siyu Tao <taosiyu2024@163.com>
2025-10-29 14:03:29 +00:00
Steve Horsman
0e19a2bf91 Merge pull request #11993 from zvonkok/vectorAdd
gpu: Add libs for CC
2025-10-29 13:42:34 +00:00
stevenhorsman
555926ea1a libs: Fix formatting issue
Fix the cargo fmt issues and then we can make the libs tests required
again to avoid this regression happening again.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-10-29 13:13:50 +01:00
Steve Horsman
dbdd1009af Merge pull request #11933 from kata-containers/topic/kata-deploy-nfd-dependency-part-I
kata-deploy: Automatically deploy NodeFeatureRules for TEEs
2025-10-29 09:50:38 +00:00
Fabiano Fidêncio
103f80c7f5 readme: install: Drop outdated documentation
kata-deploy helm chart is *THE* way to deploy kata-containers on
kubernetes environments, and kubernetes environments is basically the
only reliably tested deployment we have.

For now, let's just drop documentation that is outdated / incorrect, and
in the future let's ensure we update the linked docs, as we work on
update / upgrade for the helm chart.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-10-29 09:41:57 +01:00
Zvonko Kaiser
5ff218823c gpu: Remove unneeded libraries
The libs in question were added when moving to developer.nvidia.com
but switching back to ubuntu only based builds they are not needed.
Remove them to keep the rootfs as minimal as possible.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-29 08:03:36 +01:00
Zvonko Kaiser
6d9b4059f5 gpu: Add libs for CC
In the case of CC we need additional libraries in the rootfs.
Add them conditionally if type == confidential.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-10-29 08:03:36 +01:00
Xuewei Niu
55d181beb1 Merge pull request #11828 from jiuyi123/rs-vm-template-runtime-rs
runtime-rs: introduce VM template lifecycle and integration
2025-10-29 14:03:46 +08:00
Xuewei Niu
8aca32dfa9 Merge pull request #11862 from StevenFryto/rootless_clh
runtime-rs: supporting the CLH VMM process running in non-root mode
2025-10-29 13:31:53 +08:00
ssc
16e8cf1a09 runtime-rs: boot vm from template
Add build_vm_from_template() that flips boot_from_template flag,
wires factory.template_path/{memory,state} into the hypervisor config,
and returns ready-to-use hypervisor & agent instances.
When factory.template is enabled, VirtContainer bypasses normal creation
and directly boots the VM by restoring the template through incoming migration,
completing the "create → save → clone" loop.

Fixes: #11413

Signed-off-by: ssc <741026400@qq.com>
2025-10-29 12:38:28 +08:00