Commit Graph

15842 Commits

Author SHA1 Message Date
RuoqingHe
1cb34c4d0a
Merge pull request #11202 from RuoqingHe/2025-04-28-upgrade-rtnetlink
runtime-rs: Upgrade `rust-netlink` crates
2025-05-05 21:35:45 +08:00
Ruoqing He
2d0f32ff96 runtime-rs: Upgrade crates from rust-netlink
Bump `netlink-sys` to v0.8, `netlink-packet-route` to v0.22 and
`rtnetlink` to v0.16 to reach a consistent state of `rust-netlink`
dependencies.

`bitflags` is bumped to v2.9.0 since those crates require it.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-05-03 02:31:02 +00:00
Ruoqing He
09700478eb runtime-rs: Group Dependencies from rust-netlink
`rtnetlink`, `netlink-sys` and `netlink-packet-route` come from the same
organization, and some of them depend on the others, so the versions of
those crates should be chosen carefully. Group them to make them easier
to manage.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-05-03 02:29:43 +00:00
Fabiano Fidêncio
fbf7faa9f4
Merge pull request #11227 from fidencio/topic/agent-only-try-ipv6-if-stack-is-supported
agent: netlink: Only add an ipv6 address if ipv6 is enabled
2025-05-02 12:31:40 +02:00
Xuewei Niu
a9b3c6a5a5
Merge pull request #11209 from lifupan/fix_slog
shimv2: fix the issue logger write failed
2025-05-02 17:25:44 +08:00
Fabiano Fidêncio
79ad68cce5
Merge pull request #11230 from kimullaa/remove-wrong-qemu-option
runtime: remove wrong qemu-system-x86_64 option
2025-05-02 11:18:45 +02:00
Fabiano Fidêncio
4ce00ea434 agent: netlink: Only add an ipv6 address if ipv6 is enabled
When running Kata Containers on CSPs, the CSPs may enforce their
clusters to be IPv4-only.

Checking the OCI spec passed down to container, on a GKE cluster, we can
see:
```
    "sysctl": {
      ...
      "net.ipv6.conf.all.disable_ipv6": "1",
      "net.ipv6.conf.default.disable_ipv6": "1",
      ...
    },
```

Even with ipv6 being explicitly disabled (behind our back ;-)), we've
noticed that IPv6 addresses would be received, but then as IPv6 was
disabled we'd break on CreatePodSandbox with the following error:
```
Warning  FailedCreatePodSandBox  4s    kubelet            Failed to
create pod sandbox: rpc error: code = Unknown desc = failed to create
containerd task: failed to create shim task: "update interface: Failed
to add address fe80::c44c:1cff:fe84:f6b7: NetlinkError(ErrorMessage {
code: Some(-13), header: [64, 0, 0, 0, 20, 0, 5, 5, 19, 0, 0, 0, 0, 0,
0, 0, 10, 64, 0, 0, 2, 0, 0, 0, 20, 0, 1, 0, 254, 128, 0, 0, 0, 0, 0, 0,
196, 76, 28, 255, 254, 132, 246, 183, 20, 0, 2, 0, 254, 128, 0, 0, 0, 0,
0, 0, 196, 76, 28, 255, 254, 132, 246, 183] })\n\nStack backtrace:\n
0: <unknown>\n   1: <unknown>\n   2: <unknown>\n   3: <unknown>\n   4:
<unknown>\n   5: <unknown>\n   6: <unknown>\n   7: <unknown>\n   8:
<unknown>\n   9: <unknown>\n  10: <unknown>": unknown
```

A huge shoutout to Fupan Li for helping with the debug on this one!

Fixes: #11200

Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
2025-05-02 09:10:45 +02:00
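The guard described in 4ce00ea434 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the agent's actual code: the helper names `ipv6_enabled` and `addresses_to_add` are hypothetical, and the only detail taken from the commit message is the `net.ipv6.conf.all.disable_ipv6` sysctl, read here through its procfs path.

```python
import ipaddress
from pathlib import Path

# procfs path of the net.ipv6.conf.all.disable_ipv6 sysctl shown in the OCI spec.
DISABLE_IPV6 = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_enabled(sysctl_path: Path = DISABLE_IPV6) -> bool:
    """Return True unless the sysctl reads "1" or cannot be read at all."""
    try:
        return sysctl_path.read_text().strip() != "1"
    except OSError:
        # Assumption: a missing or unreadable file means there is no usable IPv6 stack.
        return False


def addresses_to_add(addresses, ipv6_ok):
    """Keep every IPv4 address; keep IPv6 addresses only when ipv6_ok is True."""
    return [
        addr
        for addr in addresses
        if ipaddress.ip_address(addr).version == 4 or ipv6_ok
    ]
```

With IPv6 disabled, a link-local address such as `fe80::c44c:1cff:fe84:f6b7` is filtered out before any netlink call is made, instead of failing the whole sandbox creation.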
Shunsuke Kimura
3dba8ddd98 runtime: remove wrong qemu-system-x86_64 option
qemu-system-x86_64 does not support "-machine virt".
(this is only supported by arm,aarch64)
<https://people.redhat.com/~cohuck/2022/01/05/qemu-machine-types.html>

Fixes: #11229

Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
2025-05-02 04:37:12 +09:00
Fabiano Fidêncio
7e404dd13f
Merge pull request #11228 from zvonkok/fix-kernel-modules-build
gpu: Set the ARCH explicitly for driver builds
2025-05-01 21:07:20 +02:00
Zvonko Kaiser
445cad7754 gpu: Set the ARCH explicitly for driver builds
Kernel Makefiles changed how they deduce the right arch, so
let's set it explicitly to enable arm and amd builds.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-05-01 17:13:20 +00:00
RuoqingHe
049a4ef3a8
Merge pull request #11146 from RuoqingHe/2025-04-14-dragonball-centralize-dbs
dragonball: Put local dependencies into workspace
2025-05-01 22:06:51 +08:00
RuoqingHe
bd1071aff8
Merge pull request #11174 from kata-containers/dependabot/cargo/src/mem-agent/crossbeam-channel-0.5.15
build(deps): bump crossbeam-channel from 0.5.13 to 0.5.15 in /src/mem-agent
2025-05-01 16:53:42 +08:00
Ruoqing He
61f2b6a733 dragonball: Put local dependencies into workspace
Put local dependencies (mostly `dbs` crates) into workspace to avoid
complex path dependencies all over the workspace. Simplify path
dependency referencing.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-05-01 08:40:22 +00:00
RuoqingHe
33c69fc8bf
Merge pull request #11204 from stevenhorsman/go-security-bump-april-25
versions: Bump golang.org/x/net
2025-05-01 16:36:24 +08:00
Fabiano Fidêncio
bc66d75fe9
Merge pull request #11217 from stevenhorsman/runtime-rs-centralise-workspace-config
Runtime rs centralise workspace config
2025-05-01 10:36:07 +02:00
Fupan Li
9924fbbc70 shimv2: fix the issue logger write failed
It's better to open the log pipe file in read-write mode;
otherwise, once containerd restarts and closes the read
endpoint, the kata shim's writes to the log pipe fail with a broken pipe error.

Fixes: #11207

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-05-01 16:15:18 +08:00
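The effect described in 9924fbbc70 can be reproduced with a small sketch. It is a Python illustration assuming Linux FIFO semantics, not the shim's actual code, and `open_log_pipe` and `demo` are hypothetical names. Opening the FIFO with `O_RDWR` makes the writer itself count as a reader, so a write does not fail with a broken pipe error when no other reader is attached.

```python
import os
import tempfile


def open_log_pipe(path: str) -> int:
    """Open a FIFO for both reading and writing and return the file descriptor.

    Because this process holds a read end too, writes do not raise EPIPE
    when the peer reader goes away (Linux behaviour).
    """
    return os.open(path, os.O_RDWR)


def demo() -> bytes:
    """Write to a FIFO that has no external reader, then read the data back."""
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "log")
        os.mkfifo(fifo)
        fd = open_log_pipe(fifo)
        try:
            # No other process has the FIFO open, yet the write succeeds.
            os.write(fd, b"hello\n")
            return os.read(fd, 64)
        finally:
            os.close(fd)
```

The trade-off is that data written while no real reader is attached stays in the pipe buffer until something drains it.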
Fabiano Fidêncio
3dfabd42c2
Merge pull request #11206 from kimullaa/fix-xfs-rootfs-type
runtime: remove wrong xfs options
2025-05-01 09:05:17 +02:00
Fabiano Fidêncio
a2fbc598b8
Merge pull request #11223 from microsoft/cameronbaird/revert-aks-extension-pin
ci: revert temp: ci: Fix AKS cluster creation
2025-05-01 08:33:12 +02:00
Shunsuke Kimura
62639c861e runtime: remove wrong xfs options
"data=ordered" and "errors=remount-ro" are not valid xfs options.
(They are ext4 options.)
<https://manpages.ubuntu.com/manpages/focal/man5/xfs.5.html>

Fixes: #11205

Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
2025-05-01 07:56:39 +09:00
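One way to keep ext4-only options away from other filesystems, as 62639c861e requires for xfs, is sketched below. This is a hypothetical Python helper, not the runtime's actual code. The two option names come from the commit message; the function name and the filtering approach are assumptions.

```python
from typing import List

# Options the commit message identifies as ext4 options.
EXT4_ONLY_OPTIONS = {"data=ordered", "errors=remount-ro"}


def rootfs_mount_options(fstype: str, options: List[str]) -> List[str]:
    """Return the options unchanged for ext4; drop ext4-only ones otherwise."""
    if fstype == "ext4":
        return list(options)
    return [opt for opt in options if opt not in EXT4_ONLY_OPTIONS]
```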
Cameron Baird
6e21d14334 Revert "temp: ci: Fix AKS cluster creation"
This reverts commit 1de466fe84.

The latest release of the az aks extension fixes the issue https://github.com/Azure/azure-cli-extensions/blob/main/src/aks-preview/HISTORY.rst#1400b5

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-30 21:24:42 +00:00
stevenhorsman
a126884953 runtime-rs: Share workspace config
Update the runtime-rs workspace packages to
use workspace package versions where applicable
to centralise the config and reduce maintenance
when updating these

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-30 19:40:47 +01:00
stevenhorsman
f8fcd032ef workflow: Set RUST_LIB_BACKTRACE=0
As discussed in #9538, with anyhow >=1.0.77 we have test failures because the
backtrace behaviour changed, so set RUST_LIB_BACKTRACE=0
so that we only get backtraces on panics

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-30 19:38:13 +01:00
stevenhorsman
ffbaa793a3 versions: Update crossbeam-channel
Update crossbeam-channel to 0.5.15 in all non-agent
packages (the agent was done separately in #11175)
to get them on the latest version and remove
the versions with a vulnerability

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-30 19:36:40 +01:00
Steve Horsman
b97bc03ecb
Merge pull request #11211 from stevenhorsman/dragonball-lockfiles
dragonball: Remove package lockfiles
2025-04-30 19:34:58 +01:00
stevenhorsman
f910c7535a ci: Workaround cargo deny issue
When a PR has no new files the cargo deny runner fails with:
```
[cargo-deny-generator.sh:17] ERROR: changed_files_status=
```
so add `|| true` to try and help this

Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-30 16:27:25 +01:00
stevenhorsman
97f7d49e8e dragonball: Remove package lockfiles
Since #10780 the dbs crates are managed as members
of the dragonball workspace, so we can remove the lockfile
as it's now workspace managed

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-30 09:14:07 +01:00
Steve Horsman
8045cb982c
Merge pull request #11208 from kata-containers/dependabot/cargo/src/runtime-rs/tokio-1.38.2
build(deps): bump tokio from 1.38.0 to 1.38.2 in /src/runtime-rs
2025-04-30 08:44:51 +01:00
Aurélien Bombo
46af7cf817
Merge pull request #11077 from microsoft/cameronbaird/address-gid-mismatch
genpolicy: Align GID behavior with CRI and enable GID policy checks.
2025-04-29 22:23:23 +01:00
Aurélien Bombo
19371e2d3b
Merge pull request #11164 from wainersm/fix_kbs_on_aks
tests/k8s: fix kbs installation on Azure AKS
2025-04-29 18:25:14 +01:00
Steve Horsman
6c1fafb651
Merge pull request #11210 from kata-containers/dependabot/cargo/src/tools/runk/tokio-1.44.2
build(deps): bump tokio from 1.38.0 to 1.44.2 in /src/tools/runk
2025-04-29 16:43:58 +01:00
Steve Horsman
3c8cc0cdbf
Merge pull request #11212 from BbolroC/add-cc-vfio-ap-test-s390x
GHA: Add VFIO-AP to s390x nightly tests for CoCo
2025-04-29 16:15:00 +01:00
Steve Horsman
a6d1dc7df3
Merge pull request #10940 from ldoktor/peer-pods
ci.ocp: Add peer-pods setup script
2025-04-29 15:57:30 +01:00
Hyounggyu Choi
63b9ae3ed0 GHA: Add VFIO-AP to s390x nightly tests for CoCo
As #11076 introduces VFIO-AP bind/associate functions for IBM Secure
Execution (SEL), a new internal nightly test has been established.
This PR adds a new entry `cc-vfio-ap-e2e-tests` to the existing matrix
to share the test result.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-04-29 16:06:12 +02:00
Steve Horsman
8b32846519
Merge pull request #10882 from stevenhorsman/kbs-logging-on-failure
tests: confidential: Add KBS logging
2025-04-29 13:29:21 +01:00
dependabot[bot]
7163d7d89b
build(deps): bump tokio from 1.38.0 to 1.38.2 in /src/runtime-rs
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.38.0 to 1.38.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.38.0...tokio-1.38.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.38.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-29 12:21:58 +00:00
dependabot[bot]
2992a279ab
build(deps): bump tokio from 1.38.0 to 1.44.2 in /src/tools/runk
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.38.0 to 1.44.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.38.0...tokio-1.44.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-version: 1.44.2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-29 12:14:41 +00:00
Fabiano Fidêncio
e5cc9acab8
Merge pull request #11175 from kata-containers/dependabot/cargo/src/agent/crossbeam-channel-0.5.15
build(deps): bump crossbeam-channel from 0.5.14 to 0.5.15 in /src/agent
2025-04-29 14:13:25 +02:00
Fabiano Fidêncio
a9893e83b8
Merge pull request #11203 from stevenhorsman/high-severity-security-bumps-april-25
rust: High severity security bumps april 25
2025-04-29 14:10:05 +02:00
stevenhorsman
52b2662b75 tests: confidential: Add KBS logging
To help with debugging, add logging of the KBS,
like the container system logs, if the confidential test fails

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-29 09:48:18 +01:00
stevenhorsman
bcffe938ca versions: Bump golang.org/x/net
Bump golang.org/x/net to 0.38.0 as dependabot
isn't doing it for these packages to remediate
CVE-2025-22872

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-29 09:46:48 +01:00
Steve Horsman
57527c1ce4
Merge pull request #11161 from kata-containers/dependabot/go_modules/src/runtime/golang.org/x/net-0.38.0
build(deps): bump golang.org/x/net from 0.33.0 to 0.38.0 in /src/runtime
2025-04-29 09:39:30 +01:00
Cameron Baird
70ef0376fb genpolicy: Introduce special handling for clusters using nydus
Nydus+guest_pull has specific behavior where it improperly handles image layers on
the host, causing the CRI to not find /etc/passwd and /etc/group files
on container images which have them. This unfortunately causes different
outcomes w.r.t. the GID used, which we are trying to enforce with policy.

This behavior is observed/explained in https://github.com/kata-containers/kata-containers/issues/11162

Handle this exception with a config.settings.cluster_config.guest_pull
field. When this is true, simply ignore the /etc/* files in the
container image as they will not be parsed by the CRI.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 20:18:42 +00:00
Cameron Baird
d3b652014a genpolicy: Introduce genpolicy tests for security contexts
Add security context testcases for genpolicy, verifying that UID and GID
configurations controlled by the kubernetes security context are
enforced.

Also, fix the other CreateContainerRequest tests' expected contents to
reflect our new genpolicy parsing/enforcement of GIDs.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:31 +00:00
Cameron Baird
fc75aee13a ci: Add CI tests for runAsGroup, GID policy
Introduce tests to check for policy correctness on a redis deployment
with:
1. a pod-level securityContext
2. a container-level securityContext which shadows the pod-level
   securityContext
3. a pod-level securityContext which selects an existing user (nobody),
   causing a new GID to be selected.

Redis is an interesting container image to test with because it includes
a /etc/passwd file with existing user/group configuration of 1000:1000 baked in.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:31 +00:00
Cameron Baird
938ddeaf1e genpolicy: Enable GID checks in rules.rego
With fixes to align policy GID parsing with the CRI behavior, we can now
enable policy verification of GIDs.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:31 +00:00
Cameron Baird
eb2c7f4150 genpolicy: Integrate /etc/passwd from OCI container when setting GIDs
The GID used for the running process in an OCI container is a function of:
1. The securityContext.runAsGroup specified in a pod yaml.
2. The UID:GID mapping in /etc/passwd, if present in the container image
   layers.
3. Zero, even if the userstr specifies a GID.

Make our policy engine align with this behavior by:
1. At the registry level, always obtain the GID from the /etc/passwd
   file if present. Ignore GIDs specified in the userstr encoded in the
   OCI container.
2. After an update to UID due to securityContexts, perform one final check
   against the /etc/passwd file if present. The GID used for the running
   process is the mapping in this file from UID->GID.
3. Override everything above with the GID of the securityContext
   configuration if provided.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:31 +00:00
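The GID precedence described in eb2c7f4150 can be sketched as follows. This is an illustrative Python model, not the genpolicy implementation: the function names and the sample passwd content are hypothetical, and falling back to zero when the UID has no entry in /etc/passwd is an assumption.

```python
from typing import Dict, Optional

# Illustrative /etc/passwd content; not taken from any real image.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/sh\n"
    "app:x:1000:1000::/home/app:/bin/sh\n"
    "nobody:x:65534:65534:nobody:/:/sbin/nologin\n"
)


def parse_passwd(text: str) -> Dict[int, int]:
    """Map each UID to its primary GID (fields: name:password:UID:GID:...)."""
    mapping: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4 and fields[2].isdigit() and fields[3].isdigit():
            # Keep the first entry for a UID if it appears more than once.
            mapping.setdefault(int(fields[2]), int(fields[3]))
    return mapping


def resolve_gid(uid: int, passwd: Optional[str], run_as_group: Optional[int]) -> int:
    """Apply the precedence: runAsGroup, then /etc/passwd, then zero."""
    if run_as_group is not None:
        return run_as_group
    if passwd is not None:
        # Assumption: a UID with no passwd entry falls back to GID 0.
        return parse_passwd(passwd).get(uid, 0)
    return 0
```

Note that the userstr GID never appears as an input: per the commit message, it is ignored in every case.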
Cameron Baird
c13d7796ee genpolicy: Parse secContext runAsGroup and allowPrivilegeEscalation
Our policy should cover these fields for securityContexts at the pod or
container level of granularity.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:31 +00:00
Cameron Baird
349ce8c339 genpolicy: Refactor registry user/group parsing to account for all cases
The get_process logic in registry.rs did not account for all cases
(username:groupname), did not defer to contents of /etc/group,
/etc/passwd when it should, and was difficult to read.

Clean this implementation up, factoring the string parsing for
user/group strings into their own functions. Enable the
registry::Container class to query /etc/passwd and /etc/group, if they
exist.

Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
2025-04-28 16:28:29 +00:00
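The user/group string handling that 349ce8c339 refactors can be pictured with a short sketch. It is written in Python rather than the Rust of registry.rs, and both function names are hypothetical. It relies only on the standard layout of /etc/passwd and /etc/group, where field 0 is the name and field 2 is the numeric ID.

```python
from typing import Optional, Tuple


def split_user_string(userstr: str) -> Tuple[str, Optional[str]]:
    """Split "user[:group]" into its parts; the group is None when absent."""
    user, sep, group = userstr.partition(":")
    return user, (group if sep and group else None)


def lookup_id(name_or_id: str, db_text: str) -> Optional[int]:
    """Resolve a name or numeric ID against passwd-style or group-style content.

    Numeric input is returned as-is; a name is looked up in field 0 and
    resolved to the numeric ID in field 2. Returns None if nothing matches.
    """
    if name_or_id.isdigit():
        return int(name_or_id)
    for line in db_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == name_or_id and fields[2].isdigit():
            return int(fields[2])
    return None
```

Splitting and lookup are kept separate so that the same lookup can serve both the user half (against /etc/passwd) and the group half (against /etc/group).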
Wainer dos Santos Moschetta
460c3394dd gha: run CoCo non-TEE tests on "all" host type
By running on "all" host type there are two consequences:

1) run the "normal" tests too (until now, only the "small" tests ran),
   thus increasing the coverage
2) create the AKS cluster with larger VMs. This is a new requirement
   because the current ingress controller for the KBS service consumes too
   many vCPUs, leaving only a few for the tests (resulting in failures)

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2025-04-28 12:08:31 -03:00
Wainer dos Santos Moschetta
945482ff6e tests: make _print_instance_type() handle "all" host type
_print_instance_type() returns the instance type of the AKS nodes, based
on the host type. Tests are grouped per host type in "small" and "normal"
sets based on the CPU requirements: "small" tests require few CPUs and
"normal" more.

There is a third case: the "all" host type maps to the union of "small"
and "normal" tests, which should be handled by _print_instance_type()
properly. In this case, it should return the largest instance type
possible because "normal" tests will be executed too.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2025-04-28 12:08:31 -03:00
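The mapping described in 945482ff6e can be sketched as below. This is a Python illustration of the logic only: the real `_print_instance_type()` lives in the test scripts, and the instance type names here are placeholders because the commit message does not give the actual AKS values.

```python
# Placeholder names; the real AKS instance types are not given in the commit.
INSTANCE_TYPES = {"small": "SMALL_INSTANCE", "normal": "LARGE_INSTANCE"}


def print_instance_type(host_type: str) -> str:
    """Return the instance type for a host type.

    "all" runs both the "small" and the "normal" tests, so it needs the
    largest instance type, which is the one used for "normal".
    """
    if host_type == "all":
        return INSTANCE_TYPES["normal"]
    try:
        return INSTANCE_TYPES[host_type]
    except KeyError:
        raise ValueError(f"unknown host type: {host_type!r}") from None
```

Rejecting unknown host types explicitly, rather than silently picking a default, is a design choice of this sketch and not something the commit states.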