Compare commits

...

95 Commits

Author SHA1 Message Date
Steve Horsman
2728b493d5 Merge pull request #12681 from manuelh-dev/mahuber/ci-pip-py-venv
tests: cc: setup function for python venv
2026-03-23 14:33:30 +00:00
Fabiano Fidêncio
1ec97d25e7 Merge pull request #12704 from stevenhorsman/security-fixes-23-mar-26
Security fixes 23 mar 26
2026-03-23 15:27:07 +01:00
Fabiano Fidêncio
aa6890eae1 Merge pull request #12675 from manuelh-dev/mahuber/cdh-storage-options
agent: add mkfs_opts parameter to cdh_secure_mount
2026-03-23 15:18:38 +01:00
Fabiano Fidêncio
fe817bb47b Merge pull request #12705 from fidencio/topic/tests-nginx-connectibity-2nd-try
tests: nginx-connectivity: Use `-O index.html` to override the downloaded file
2026-03-23 13:08:51 +01:00
Fabiano Fidêncio
514a2b1a7c Merge pull request #12264 from fidencio/topic/nvidia-gpu-cc-use-nydus-snapshotter
nvidia: cc: Use nydus-snapshotter
2026-03-23 12:50:15 +01:00
stevenhorsman
2edb588ed9 kata-ctl: Pin micro_http
the micro_http crate was just pointing the the main branch and hadn't been updated for
around 3 years, so pin to the latest for stability and update to remediate RUSTSEC-2024-0002

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:28 +00:00
stevenhorsman
9871256771 versions: Bump cloud-hypervisor to v51
In v51 the license was added, so try bumping to this version
to solve the cargo deny issue

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:28 +00:00
dependabot[bot]
8de7f29981 agent-ctl: Bump aws-lc-rs to 1.16.2
Bump aws-lc-rs, so that aws-lc-sys updates to 0.39.0 to remediate
RUSTSEC-2026-0044 and https://osv.dev/vulnerability/RUSTSEC-2026-0048

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:28 +00:00
dependabot[bot]
1c63738b80 build(deps): bump aws-lc-fips-sys in /src/tools/agent-ctl
Bumps [aws-lc-fips-sys](https://github.com/aws/aws-lc-rs) from 0.13.12 to 0.13.13.
- [Release notes](https://github.com/aws/aws-lc-rs/releases)
- [Commits](https://github.com/aws/aws-lc-rs/compare/aws-lc-fips-sys/v0.13.12...aws-lc-fips-sys/v0.13.13)

---
updated-dependencies:
- dependency-name: aws-lc-fips-sys
  dependency-version: 0.13.13
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-23 10:34:28 +00:00
dependabot[bot]
6e79a9d6ad build(deps): bump rustls-webpki in /src/tools/agent-ctl
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.103.3 to 0.103.10.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.3...v/0.103.10)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-version: 0.103.10
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-23 10:34:27 +00:00
dependabot[bot]
8df9cf35df build(deps): bump rustls-webpki in /tools/packaging/kata-deploy/binary
Bumps [rustls-webpki](https://github.com/rustls/webpki) from 0.103.8 to 0.103.10.
- [Release notes](https://github.com/rustls/webpki/releases)
- [Commits](https://github.com/rustls/webpki/compare/v/0.103.8...v/0.103.10)

---
updated-dependencies:
- dependency-name: rustls-webpki
  dependency-version: 0.103.10
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-23 10:34:27 +00:00
dependabot[bot]
ef32923461 build(deps): bump tar from 0.4.44 to 0.4.45
Bumps [tar](https://github.com/alexcrichton/tar-rs) from 0.4.44 to 0.4.45.
- [Commits](https://github.com/alexcrichton/tar-rs/compare/0.4.44...0.4.45)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 0.4.45
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-23 10:34:27 +00:00
stevenhorsman
85e17c2e77 deps: Bump rustls-webpki
Bump rusttls-webpki to 0.103.10 to remediate RUSTSEC-2026-0049

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:27 +00:00
stevenhorsman
c3868f8e60 deps: Bump aws-lc-rs to 1.16.2
Bump aws-lc-rs, so that aws-lc-sys updates to 0.39.0 to remediate
RUSTSEC-2026-0044 and https://osv.dev/vulnerability/RUSTSEC-2026-0048

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:27 +00:00
stevenhorsman
27417d9d15 ci: Add more crates to dependabot groups
Add aws-lc, and rustls-webpki, so that in future the different
component bumps are all done together

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-23 10:34:27 +00:00
Fabiano Fidêncio
83f37f4beb tests: nginx-connectivity: Override index.html (2nd try)
We need to explicitly pass `-O index.html` as the busybox' wget has a
different behaviour than GNU's wget.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-23 11:11:44 +01:00
Fabiano Fidêncio
e44dfccf7a Revert "tests: nginx-connectivity: Allow overriding the downloded file"
This reverts commit 4403289123.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-23 11:06:23 +01:00
Hyounggyu Choi
1035504492 Merge pull request #12701 from fidencio/topic/tests-arm-nginx-connectivity
tests: nginx-connectivity: Allow overriding the downloded file
2026-03-23 10:37:25 +01:00
Steve Horsman
20cb65b1fb Merge pull request #12624 from lifupan/bump_rust_vmms
runtime-rs: Bump rust vmms for dragonball
2026-03-23 08:56:47 +00:00
Fabiano Fidêncio
864f181faf Merge pull request #12694 from manuelh-dev/mahuber/nv-test-timeout
tests: nvidia: Increase run test timeout
2026-03-23 09:13:20 +01:00
Fabiano Fidêncio
642b5661ff Merge pull request #12651 from manuelh-dev/mahuber/doc-update-nvidia-gpu-op
docs: Update NVIDIA GPU passthrough QEMU scenario
2026-03-23 09:01:02 +01:00
Fabiano Fidêncio
4403289123 tests: nginx-connectivity: Allow overriding the downloded file
In case a wget fails for one reason or another, it'll leave behind an
'index.html' file.  Let's make sure we allow overriding that file so the
retry loop doesn't fail for no reason.

Fixes: #12670

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-23 04:08:24 +01:00
Alex Lyn
d2c2ec6e23 Merge pull request #12633 from LandonTClipp/docs_materialx
docs: Move to mkdocs-material, port Helm to docs site
2026-03-23 09:29:25 +08:00
Fupan Li
608f378bff dragonball: make sure the nydus's worker thread access network
Since the dragonball's vmm thread had been joined in the pod's
netns, which wouldn't access the network, thus we should make
sure the nydus's worker thread join into the runD's main thread's
netns which would access the network.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-03-22 22:44:24 +08:00
Fabiano Fidêncio
f14895bdc4 Merge pull request #12673 from manuelh-dev/mahuber/release-doc-update
docs: Update release process notes
2026-03-22 13:11:03 +01:00
Fabiano Fidêncio
fd716c017d Merge pull request #12567 from agamdua/ebpf-confs
kernel: Add debug kernel with eBPF configs to static tarball builds
2026-03-22 13:06:54 +01:00
Fabiano Fidêncio
740d380b8e tests: nvidia: cc: Use nydus-snapshotter
So we can test what we just changed in the config files.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-22 10:10:34 +01:00
Fabiano Fidêncio
6194510e90 nvidia: cc: Use nydus-snapshotter
We've been using `experimental_force_guest_pull`, but now that we have a
containerd release that should work more reliably with the multi
snapshotter setup, we want to give it a try.

Note: We need containerd 2.2.2+.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-22 10:10:34 +01:00
Agam Dua
7e3fd74779 kernel: bump config version
With debug/ebpf updates in place, let's bump the kata config version.

Signed-off-by: Agam Dua <agam_dua@apple.com>
Co-authored-by: Eric Ernst <eric_ernst@apple.com>
2026-03-20 15:04:15 -07:00
Agam Dua
f6319da73d tests: Add eBPF and dwarves to spell check dictionary
Add missing terms to the spell check dictionary to fix CI failures
for kernel debug documentation:

- eBPF
- dwarves: Linux package with DWARF/BTF tools (pahole) required for
  CONFIG_DEBUG_INFO_BTF kernel option

Also fix the casing of "ebpf" to "eBPF" in the kernel README to match
the official naming convention.

Signed-off-by: Agam Dua <agam_dua@apple.com>
2026-03-20 15:04:08 -07:00
Agam Dua
91d6c39f06 kernel: Fix debug build and add debug symbols to installation
Fixed a bug with the debug kernel build where common/ was repeated
after the common path variable, resulting in the debug
confs never being picked up.

This exposed a subsequent bug where the debug conf
was included in other builds, this is also fixed by creating a
separate directory for debug confs with one file at the moment,
debug.conf that contains debug configurations and bpf specific
configs.

To enable kernel builds (specifically for bpf) the dwarves package was added
to the kernel dockerfile for the pahole package.

Signed-off-by: Agam Dua <agam_dua@apple.com>
2026-03-20 14:50:23 -07:00
Agam Dua
5ab0744c25 ci: Add pipeline for building and distributing the debug kernel
Add the debug kernel to the kata tarball alongside the other kernels.

Also update the kernel README documentation to describe the new debug
kernel build process.

Signed-off-by: Agam Dua <agam_dua@apple.com>
2026-03-20 14:50:23 -07:00
Agam Dua
e905b74267 kernel: Add eBPF configs for debug builds
Adds a BPF section in the debug.conf kernel configuration options
 to enable eBPF and BTF support for debug kernel builds.

Signed-off-by: Agam Dua <agam_dua@apple.com>
2026-03-20 14:50:23 -07:00
LandonTClipp
5333e45313 docs: Fix static-checks.sh when running locally
This fixes the test_dir variable in static-checks.sh so that
when a --repo-path is provided, the test_dir variable uses that
for the location instead of the GOPATH location.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-03-20 14:51:45 -05:00
LandonTClipp
795869152d docs: Move to mkdocs-material, port Helm to docs site
This supersedes https://github.com/kata-containers/kata-containers/pull/12622.
I replaced Zensical with mkdocs-materialx. Materialx is a fork of mkdocs-material
created after mkdocs-material was put into maintenance mode. We'll use this
platform until Zensical is more feature complete.

Added a few of the existing docs into the site to make a more user-friendly flow.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-03-20 14:51:39 -05:00
Manuel Huber
8903b12d34 tests: nvidia: Increase run test timeout
Increase the timeout as a few new features and tests are going to be
onboarded for the NVIDIA GPU CI.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-20 11:12:52 -07:00
Manuel Huber
476f550977 docs: Update NVIDIA GPU passthrough QEMU scenario
With the upcoming GPU operator 26.3 relase and recent changes to
kata-containers, we adapt this documentation with notes on multi
GPU passthrough, support for TDX, changed deployment instructions,
and with various other minor improvements.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-20 10:53:14 -07:00
Manuel Huber
ae59cf26a0 kata-deploy: Check kata-tarball size limits
For kata tarballs we eventually release to GitHub, check their
size against the GitHub size limit. With this, we fail in case of
an ongoing release process in 'CI | Publish Kata Containers payload'
instead of only later on in the 'Release Kata Containers' action,
and we fail during PR builds, avoiding this situation at all.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-20 10:40:55 -07:00
RuoqingHe
cfc1836a31 Merge pull request #12672 from stevenhorsman/agent-security-fixes
agent: Bump tracing-subscriber
2026-03-20 17:37:16 +08:00
Steve Horsman
7ab6e11e10 Merge pull request #12678 from kata-containers/dependabot/go_modules/src/runtime/google.golang.org/grpc-1.79.3
build(deps): bump google.golang.org/grpc from 1.72.0 to 1.79.3 in /src/runtime
2026-03-20 08:49:35 +00:00
Steve Horsman
e475fb2116 Merge pull request #12680 from kata-containers/dependabot/go_modules/src/tools/csi-kata-directvolume/google.golang.org/grpc-1.79.3
build(deps): bump google.golang.org/grpc from 1.63.2 to 1.79.3 in /src/tools/csi-kata-directvolume
2026-03-20 08:49:27 +00:00
RuoqingHe
f62a6b6ab2 Merge pull request #12677 from stevenhorsman/cspell-action
Switch to use cspell for spell checking
2026-03-20 13:27:36 +08:00
Manuel Huber
4afb55154a docs: Update release process notes
Update the Release-Process.md file with some clarifications on the
release process.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-19 15:14:23 -07:00
stevenhorsman
38a655487f vsock-exporter: Switch bincode for serde_json
bincode is not maintained, so switch to serde_json to
resolve RUSTSEC-2025-0141

Assisted-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
stevenhorsman
e1d7d5bef8 agent: Remove async-std
It's a dev-dependency that doesn't seem to be used, so
remove it and resolve RUSTSEC-2025-0052

Assisted-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
stevenhorsman
e4eda5e1d8 agent: Bump tracing-subscriber
- Bump tracing-subscriber to 0.3.20 to resolve RUSTSEC-2025-0055
- Switch deprecated `slog_info!` for `slog::info!`

Generated-By: Bob
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:45:17 +00:00
stevenhorsman
e62df07b6a static-checks: Delete kata-spell-check
The old hunspell based spell-check was causing contributors
challenges and proving a barrier to doc updates. We've replaced
it with a cspell based-solution, so clean up the old approach.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
stevenhorsman
44ec815f77 gatekeeper: Add check-spelling to required
Add the new job to the list of static checks that runs on doc updates.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
stevenhorsman
c2cedd7c02 workflows: Add spellcheck workflow
Add a separate spellcheck workflow, so we can replace
the complex hunspell approach embedded in static-checks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
stevenhorsman
d06dadd8ef docs: Spelling updates
Either fixing typos, or including program/repo name in
backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
stevenhorsman
829a32ee67 spellcheck: Add cspell files
Add cspell config and initial dictionary

Assisted-by: Bob (dictionary ordering and catergorisation)
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-19 10:22:54 +00:00
dependabot[bot]
2f5415d8f5 build(deps): bump google.golang.org/grpc
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.63.2 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.63.2...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-19 10:03:45 +00:00
dependabot[bot]
3876a80208 build(deps): bump google.golang.org/grpc in /src/runtime
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.72.0 to 1.79.3.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.72.0...v1.79.3)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-version: 1.79.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-19 10:03:30 +00:00
Alex Lyn
de2ddf6ed9 Merge pull request #12637 from stevenhorsman/bump-rust-to-1.92
versions: Bump rust to 1.92
2026-03-19 10:02:18 +08:00
Manuel Huber
5765bc97b4 tests: cc: setup function for python venv
We recently had a failure on a new CI runner where
${HOME}/.cicd/venv/bin/activate was not present. The relevant call
originated from ensure_sev_snp_measure. Thus, add a function
ensure_cicd_python_venv before callers to pip install.
Currently, the NVIDIA NIM test and the confidential attestation
tests use pip to install dependencies.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-18 17:07:47 -07:00
Manuel Huber
62d74bb1fd agent: add mkfs_opts parameter to cdh_secure_mount
Add an mkfs_opts parameter to cdh_secure_mount so that its users
can parametrize these options depending on their needs. For now,
there is two users providing explicit values (container image
layer storage and container data storage features).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-18 15:37:32 -07:00
Aurélien Bombo
352b4cdad2 Merge pull request #12660 from LandonTClipp/ci_docs
ci: Don't run CI builds on doc PRs
2026-03-17 12:19:11 -05:00
stevenhorsman
56b6917adf versions: Bump rust to 1.92
Now that 1.94 has release, in compliance with our toolchain guidance
we should bump to rust 1.92

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-17 16:04:58 +00:00
stevenhorsman
2a4227e02e kata-ctl: Try fixing unused_assignement error
`allow(unused_assignments)` isn't working as it's
in macro generated code, so referencing the command
in the error, to use it

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-17 16:04:58 +00:00
stevenhorsman
ca7cdcd732 kata-ctl: Rewrite path_join test
This test was failing clippy by calling .unwrap() after
an .is_ok(), but after I looked at it, it seemed a bit messy,
so I split it up and tried rewriting it to make it more readable
IMHO.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-17 16:04:58 +00:00
stevenhorsman
501578cc5a agent: Remove non-idiomatic unwrap
Calling .unwrap() after an .is_some() check is considered non-idiomatic in
as it performs redundant work and makes the code more verbose.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-17 16:04:58 +00:00
Alex Lyn
833b72470c Merge pull request #12647 from sprt/gp-improve
genpolicy: Improve emptyDir storage options and mount point validation
2026-03-17 13:56:42 +08:00
Manuel Huber
660e3bb653 gpu: Obsolete the NVIDIA initrd build
As the NVIDIA stack has shifted to using an image for both the
confidential and non-confidential variants, we retire the initrd
build.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 21:29:58 -04:00
Aurélien Bombo
f8e234c6f9 Merge pull request #12650 from kata-containers/sprt/remove-csi
ci: Stop building/deploying CSI driver
2026-03-16 16:53:02 -05:00
Steve Horsman
294c367063 Merge pull request #12668 from manuelh-dev/release/3.28.0
release: Bump version to 3.28.0
2026-03-16 19:47:12 +00:00
Manuel Huber
5210584f95 release: Bump version to 3.28.0
Bump VERSION and helm-charts versions.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:52:35 -07:00
Manuel Huber
e13748f46d tests: Adapt trusted ephemeral storage test
With the new CDH version, the LUKS header is moved off of the disk
into guest memory. We hence adapt the test's filesystem type checks.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
5bbc0abb81 tests: use pre-created, signed sealed secrets
With signature support for sealed secret, use pre-created signed
sealed secrets and provision the signing public key to the KBS.

Add instructions for re-creating these signed secrets.

Improve k8s-sealed-secrets.bats by reducing repeated kubectl logs
calls. A test run showed a SIGPIPE error one one of the grep-logs
while the printouts of the initial kubectl logs invocation showed
that the expected values were actually in the logs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
a9b222f91e gpu: Update chiseled rootfs with new CDH deps
With CDH requiring libcryptsetup, mkfs.ext4, dd, and their
dependencies, we will need to update the chiseled NVIDIA rootfs
accordingly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
169f92ff09 agent: cdh: Update CDH and API
With the new CDH version, the secure_mount API changes.
Further, the new CDH version no longer uses the luks-encrypt-storage
script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt
some of the CDH and Kata components build steps

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Alex Lyn
ef5db0a01f Merge pull request #12607 from zvonkok/system-map
kernel: Ship System.map as part of the kernel build
2026-03-16 09:37:44 +08:00
Zvonko Kaiser
99f32de1e5 kata-deploy: Update RuntimeClass PodOverhead
Align the podOverhead with the default_memory updated
in the previous commit.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
6a853a9684 gpu: Bump NVRC
We have a new release add this one to the next
Kata release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
8ff5d164c6 runtime: make CDI annotation vendor-agnostic with lookup table
Replace hardcoded NVIDIA vendor ID (0x10de) and class (0x030) checks
with a vendor-agnostic lookup table (cdiDeviceKind) that maps PCI
vendor/class pairs to CDI device kinds. This makes it straightforward
to add support for new device types by adding entries to the table.

Refactor siblingAnnotation to resolve device BDFs once upfront and
reuse them for both CDI type detection and sibling matching, eliminating
redundant sysfs reads. Devices not in the lookup table (e.g. NVSwitches)
are skipped with errNoSiblingFound, while known device types that fail
to match a sibling produce a hard error.

Consolidate the hot-plug and cold-plug device loops into a single loop
over extracted container paths, removing duplicated filtering logic.

Export GetPCIDeviceProperty from the device drivers package to allow
vendor/class lookup from sysfs in the container annotation path.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
d4c21f50b5 gpu: Bump default memory to 8G for GPU runtimes
We need enough inital memory to prepare more complex
platforms like HGX H100 or HGX B200 systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
5c9683f006 gpu: Remove devtmpfs.mount=0
With the newest NVRC release this is solved and does
not need to be overriden.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
d22c314e91 gpu: Increase dial_timeout=1200
For cold-plug when running with nerdctl the timeouts in the config
are being used, increase the dial_timeout (e.g. for CreateSandbox) to match
create_container_timeout.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
7fe84c8038 gpu: HGX Rootfs Fixes
Various smaller fixes to enable HGX systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Joji Mekkattuparamban
1fd66db271 nvidia-gpu: add missing libraries to rootfs
Added the missing packages to the nvidia rootfs.

Fixes #12534

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-03-13 16:24:32 -07:00
Dan Mihai
9332b75c04 Merge pull request #12661 from stevenhorsman/runtime-go-1.25.8
runtime: bump go.mod version
2026-03-13 14:06:08 -07:00
Zvonko Kaiser
d382379571 kernel: Ship System.map as part of the kernel build
Some use-cases need the System.map of the running kernel,
ship it via kernel-artifact.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-13 19:27:18 +00:00
Manuel Huber
4a7022d2f4 tests: nvidia: call genpolicy auth for all tests
Call the setup_genpolicy_registry_auth in run_kubernetes_nv_tests.sh.
Authenticate before exercising any tests.

Recently, we have seen UnauthorizedError messages for the CUDA
vectorAdd image. While this image is not gated behind authentication,
rate limiting may be a possible issue.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-13 09:03:01 -07:00
LandonTClipp
9a8932412d docs: remove URL and markdown reference checks
This URL check performed a CURL command to see if it was real. This will
not work in the mkdocs world because the docs might reference a link that
is not yet built on the main page. This is a chicken-and-egg problem.

For reference:

```
ERROR: Invalid URL 'https://kata-containers.github.io/kata-containers/installation/#helm-chart' found in the following files:

tools/packaging/kata-deploy/helm-chart/README.md
```

The markdown reference requirement was put in place for the old docs system, but this
will not apply anymore in the new mkdocs system. I'm removing this
entirely because it will only get in the way and cause confusion.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-03-12 15:48:35 -05:00
LandonTClipp
d5d741f4e3 ci: Don't run CI builds on doc PRs
We disable the Kata artifact builds and testing if the PR is only
related to documentation. Regular static checks will remain.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2026-03-12 15:48:05 -05:00
Zvonko Kaiser
4c450a5b01 Merge pull request #12648 from manuelh-dev/mahuber/trustee-upgrade
versions: bump trustee to latest version
2026-03-12 14:09:15 -04:00
Steve Horsman
7d2e18575c Merge pull request #12343 from zvonkok/release-model
doc: Release model update
2026-03-12 14:44:51 +00:00
Zvonko Kaiser
7f662662cf lint: Fix 80 char column size
Make markdownlint happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-12 12:03:29 +00:00
Zvonko Kaiser
6e03a95730 doc: Update Release Process
Add how Kata is doing the rolling release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-12 12:03:29 +00:00
stevenhorsman
f25fa6ab25 runtime: bump go.mod version
Update the runtime's go.mod go version to 1.25.8 to
keep in sync with versions.yaml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-12 08:53:40 +00:00
Manuel Huber
0926c92aa0 versions: bump trustee to latest version
Ingest various recent fixes and dependency updates.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-11 14:45:42 -07:00
Aurélien Bombo
32444737b5 gatekeeper: Remove csi-kata-directvolume build from required tests
Since we don't build that anymore.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
64aed13d5f Revert "ci: Add no-op step to compile CSI driver"
This reverts commit e43c59a2c6.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
dd2c4c0db3 Revert "coco: ci: Add no-op steps to deploy CSI driver"
This reverts commit 5e4990bcf5.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
d598e0baf1 Revert "ci: Implement build step for CSI driver"
This partially reverts commit fb87bf221f.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
2a15cfc5ec genpolicy: Improve emptyDir storage options and mount point validation
These are two changes following a Copilot review on #10559:

1. Restore the p_storage.driver != "blk" check in allow_storage_options():
   - An early version of #10599 hardcoded p_storage.driver to "blk".
   - Hence that check needed to be removed to validate "blk" storage options.
   - The final version of #10599 hardcodes p_storage.driver to "" to
     account for both "blk" and "scsi", and checks storage options in
     allow_block_storage().
   - Hence that check should be restored to preserve the original behavior.

https://github.com/kata-containers/kata-containers/pull/10559#discussion_r2907646552

2. Don't use a regex to validate emptyDir storage mount points:
   - It's risky to use a regex to validate a path that has base64-encoded
     components.
   - We can infer the exact path anyway so the regex is redundant.

https://github.com/kata-containers/kata-containers/pull/10559#discussion_r2907646582

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-10 11:22:10 -05:00
202 changed files with 6481 additions and 6489 deletions

37
.cspell.yaml Normal file
View File

@@ -0,0 +1,37 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json
version: "0.2"
language: en,en-GB
dictionaryDefinitions:
- name: kata-terms
path: ./tests/spellcheck/kata-dictionary.txt
addWords: true
dictionaries:
- en-GB
- en_US
- bash
- git
- golang
- k8s
- python
- rust
- companies
- mnemonics
- peopleNames
- softwareTerms
- networking-terms
- kata-terms
ignoreRegExpList:
- /@[a-z\d](?:[a-z\d]|-(?=[a-z\d])){0,38}/gi # Ignores github handles
# Ignore code blocks
- /^\s*`{3,}[\s\S]*?^\s*`{3,}/gm
- /`[^`\n]+`/g
ignorePaths:
- "**/vendor/**" # vendor files aren't owned by us
- "**/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/**" # Generated files
- "**/requirements.txt"
useGitignore: true

View File

@@ -37,9 +37,9 @@ updates:
# create groups for common dependencies, so they can all go in a single PR
# We can extend this as we see more frequent groups
groups:
bit-vec:
aws-libcrypto:
patterns:
- bit-vec
- aws-lc-*
bumpalo:
patterns:
- bumpalo
@@ -67,6 +67,9 @@ updates:
rustix:
patterns:
- rustix
rustls-webpki:
patterns:
- rustls-webpki
slab:
patterns:
- slab

View File

@@ -47,6 +47,7 @@ jobs:
- coco-guest-components
- firecracker
- kernel
- kernel-debug
- kernel-dragonball-experimental
- kernel-nvidia-gpu
- nydus
@@ -168,8 +169,6 @@ jobs:
- rootfs-image-nvidia-gpu-confidential
- rootfs-initrd
- rootfs-initrd-confidential
- rootfs-initrd-nvidia-gpu
- rootfs-initrd-nvidia-gpu-confidential
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -349,6 +348,16 @@ jobs:
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: Check kata tarball size (GitHub release asset limit)
run: |
# https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#storage-and-bandwidth-quotas
GITHUB_ASSET_MAX_BYTES=2147483648
tarball_size=$(stat -c "%s" kata-static.tar.zst)
if [[ "${tarball_size}" -ge "${GITHUB_ASSET_MAX_BYTES}" ]]; then
echo "::error::tarball size (${tarball_size} bytes) >= GitHub release asset limit (${GITHUB_ASSET_MAX_BYTES} bytes)"
exit 1
fi
echo "tarball size: ${tarball_size} bytes"
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
@@ -367,7 +376,6 @@ jobs:
matrix:
asset:
- agent-ctl
- csi-kata-directvolume
- genpolicy
- kata-ctl
- kata-manager
@@ -450,6 +458,16 @@ jobs:
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-tools-artifacts versions.yaml kata-tools-static.tar.zst
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: Check kata-tools tarball size (GitHub release asset limit)
run: |
# https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#storage-and-bandwidth-quotas
GITHUB_ASSET_MAX_BYTES=2147483648
tarball_size=$(stat -c "%s" kata-tools-static.tar.zst)
if [[ "${tarball_size}" -ge "${GITHUB_ASSET_MAX_BYTES}" ]]; then
echo "::error::tarball size (${tarball_size} bytes) >= GitHub release asset limit (${GITHUB_ASSET_MAX_BYTES} bytes)"
exit 1
fi
echo "tarball size: ${tarball_size} bytes"
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:

View File

@@ -45,6 +45,7 @@ jobs:
- cloud-hypervisor
- firecracker
- kernel
- kernel-debug
- kernel-dragonball-experimental
- kernel-nvidia-gpu
- kernel-cca-confidential
@@ -152,7 +153,6 @@ jobs:
- rootfs-image
- rootfs-image-nvidia-gpu
- rootfs-initrd
- rootfs-initrd-nvidia-gpu
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -327,6 +327,16 @@ jobs:
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: Check kata tarball size (GitHub release asset limit)
run: |
# https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#storage-and-bandwidth-quotas
GITHUB_ASSET_MAX_BYTES=2147483648
tarball_size=$(stat -c "%s" kata-static.tar.zst)
if [[ "${tarball_size}" -ge "${GITHUB_ASSET_MAX_BYTES}" ]]; then
echo "::error::tarball size (${tarball_size} bytes) >= GitHub release asset limit (${GITHUB_ASSET_MAX_BYTES} bytes)"
exit 1
fi
echo "tarball size: ${tarball_size} bytes"
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:

View File

@@ -262,6 +262,16 @@ jobs:
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: Check kata tarball size (GitHub release asset limit)
run: |
# https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#storage-and-bandwidth-quotas
GITHUB_ASSET_MAX_BYTES=2147483648
tarball_size=$(stat -c "%s" kata-static.tar.zst)
if [[ "${tarball_size}" -ge "${GITHUB_ASSET_MAX_BYTES}" ]]; then
echo "::error::tarball size (${tarball_size} bytes) >= GitHub release asset limit (${GITHUB_ASSET_MAX_BYTES} bytes)"
exit 1
fi
echo "tarball size: ${tarball_size} bytes"
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:

View File

@@ -350,6 +350,16 @@ jobs:
./tools/packaging/kata-deploy/local-build/kata-deploy-merge-builds.sh kata-artifacts versions.yaml
env:
RELEASE: ${{ inputs.stage == 'release' && 'yes' || 'no' }}
- name: Check kata tarball size (GitHub release asset limit)
run: |
# https://docs.github.com/en/repositories/releasing-projects-on-github/about-releases#storage-and-bandwidth-quotas
GITHUB_ASSET_MAX_BYTES=2147483648
tarball_size=$(stat -c "%s" kata-static.tar.zst)
if [[ "${tarball_size}" -ge "${GITHUB_ASSET_MAX_BYTES}" ]]; then
echo "::error::tarball size (${tarball_size} bytes) >= GitHub release asset limit (${GITHUB_ASSET_MAX_BYTES} bytes)"
exit 1
fi
echo "tarball size: ${tarball_size} bytes"
- name: store-artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:

View File

@@ -4,17 +4,18 @@ on:
branches:
- main
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
deploy-docs:
name: deploy-docs
build:
runs-on: ubuntu-24.04
name: Build docs
permissions:
contents: read
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
steps:
- uses: actions/configure-pages@983d7736d9b0ae728b81ab479565c72886d7745b # v5.0.0
- uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5.0.1
@@ -23,10 +24,30 @@ jobs:
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
with:
python-version: 3.x
- run: pip install zensical
- run: zensical build --clean
- run: pip install -r docs/requirements.txt
- run: python3 -m mkdocs build --config-file ./mkdocs.yaml --site-dir site/
id: build
- uses: actions/upload-pages-artifact@7b1f4a764d45c48632c6b24a0339c27f5614fb0b # v4.0.0
with:
path: site
- uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4.0.5
id: deployment
with:
path: site/
name: github-pages
deploy:
needs: build
runs-on: ubuntu-24.04
name: Deploy docs
permissions:
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
steps:
- name: Deploy to GitHub Pages
uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4.0.5
id: deployment
with:
artifact_name: github-pages

View File

@@ -49,6 +49,8 @@ jobs:
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KUBERNETES: kubeadm
KBS: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
SNAPSHOTTER: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'nydus' || '' }}
USE_EXPERIMENTAL_SNAPSHOTTER_SETUP: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -98,7 +100,7 @@ jobs:
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Run tests ${{ matrix.environment.vmm }}
timeout-minutes: 30
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-nv-tests
env:
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}

View File

@@ -110,10 +110,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 100
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -134,10 +130,6 @@ jobs:
[[ "${KATA_HYPERVISOR}" == "qemu-tdx" ]] && echo "ITA_KEY=${GH_ITA_KEY}" >> "${GITHUB_ENV}"
bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
# Generate jobs for testing CoCo on non-TEE environments
run-k8s-tests-coco-nontee:
name: run-k8s-tests-coco-nontee
@@ -235,10 +227,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -257,11 +245,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
# Extensive matrix: autogenerated policy tests (nydus + experimental-force-guest-pull) on k0s, k3s, rke2, microk8s with qemu-coco-dev / qemu-coco-dev-runtime-rs
run-k8s-tests-coco-nontee-extensive-matrix:
if: ${{ inputs.extensive-matrix-autogenerated-policy == 'yes' }}
@@ -365,10 +348,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -387,11 +366,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
# Generate jobs for testing CoCo on non-TEE environments with erofs-snapshotter
run-k8s-tests-coco-nontee-with-erofs-snapshotter:
name: run-k8s-tests-coco-nontee-with-erofs-snapshotter
@@ -478,10 +452,6 @@ jobs:
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -494,8 +464,3 @@ jobs:
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver

30
.github/workflows/spellcheck.yaml vendored Normal file
View File

@@ -0,0 +1,30 @@
name: Spelling check
on: ["pull_request"]
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check-spelling:
name: check-spelling
runs-on: ubuntu-24.04
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Check Spelling
uses: streetsidesoftware/cspell-action@9cd41bb518a24fefdafd9880cbab8f0ceba04d28 # 8.3.0
with:
files: |
**/*.md
**/*.rst
**/*.txt
incremental_files_only: true
config: ".cspell.yaml"

View File

@@ -138,7 +138,7 @@ jobs:
go-version: ${{ env.GO_VERSION }}
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get -y install moreutils hunspell hunspell-en-gb hunspell-en-us pandoc
sudo apt-get update && sudo apt-get -y install moreutils
- name: Install open-policy-agent
run: |
cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"

29
Cargo.lock generated
View File

@@ -148,9 +148,10 @@ checksum = "7f202df86484c868dbad7eaa557ef785d5c66295e41b460ef922eca0723b842c"
[[package]]
name = "api_client"
version = "0.1.0"
source = "git+https://github.com/cloud-hypervisor/cloud-hypervisor?tag=v27.0#2ba6a9bfcfd79629aecf77504fa554ab821d138e"
source = "git+https://github.com/cloud-hypervisor/cloud-hypervisor?tag=v51.0#00e106e53e7ba48b01f194f7887b20b9a0bfb905"
dependencies = [
"vmm-sys-util 0.10.0",
"thiserror 2.0.18",
"vmm-sys-util 0.14.0",
]
[[package]]
@@ -340,9 +341,9 @@ checksum = "cc17ab023b4091c10ff099f9deebaeeb59b5189df07e554c4fef042b70745d68"
[[package]]
name = "aws-lc-rs"
version = "1.16.1"
version = "1.16.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94bffc006df10ac2a68c83692d734a465f8ee6c5b384d8545a636f81d858f4bf"
checksum = "a054912289d18629dc78375ba2c3726a3afe3ff71b4edba9dedfca0e3446d1fc"
dependencies = [
"aws-lc-sys",
"zeroize",
@@ -350,9 +351,9 @@ dependencies = [
[[package]]
name = "aws-lc-sys"
version = "0.38.0"
version = "0.39.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4321e568ed89bb5a7d291a7f37997c2c0df89809d7b6d12062c81ddb54aa782e"
checksum = "1fa7e52a4c5c547c741610a2c6f123f3881e409b714cd27e6798ef020c514f0a"
dependencies = [
"cc",
"cmake",
@@ -693,7 +694,7 @@ dependencies = [
"nix 0.26.4",
"serde",
"serde_json",
"thiserror 1.0.69",
"thiserror 2.0.18",
"tokio",
]
@@ -1305,6 +1306,7 @@ dependencies = [
"epoll",
"fuse-backend-rs",
"io-uring",
"kata-sys-util",
"kvm-bindings",
"kvm-ioctls",
"libc",
@@ -1477,6 +1479,7 @@ dependencies = [
"dbs-utils",
"dbs-virtio-devices",
"derivative",
"kata-sys-util",
"kvm-bindings",
"kvm-ioctls",
"lazy_static",
@@ -5122,9 +5125,9 @@ checksum = "f87165f0995f63a9fbeea62b64d10b4d9d8e78ec6d7d51fb2125fda7bb36788f"
[[package]]
name = "rustls-webpki"
version = "0.103.9"
version = "0.103.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d7df23109aa6c1567d1c575b9952556388da57401e4ace1d15f79eedad0d8f53"
checksum = "df33b2b81ac578cabaf06b89b0631153a3f416b0a886e8a7a1707fb51abbd1ef"
dependencies = [
"aws-lc-rs",
"ring",
@@ -5917,9 +5920,9 @@ checksum = "55937e1799185b12863d447f42597ed69d9928686b8d88a1df17376a097d8369"
[[package]]
name = "tar"
version = "0.4.44"
version = "0.4.45"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1d863878d212c87a19c1a610eb53bb01fe12951c0501cf5a0d65f724914a667a"
checksum = "22692a6476a21fa75fdfc11d452fda482af402c008cdbaf3476414e122040973"
dependencies = [
"filetime",
"libc",
@@ -6782,9 +6785,9 @@ checksum = "69c376a9b84afdf97bddd2628096cf3554208b2a676cf06b4532e0f433a54e02"
[[package]]
name = "vmm-sys-util"
version = "0.10.0"
version = "0.14.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08604d7be03eb26e33b3cee3ed4aef2bf550b305d1cca60e84da5d28d3790b62"
checksum = "d21f366bf22bfba3e868349978766a965cbe628c323d58e026be80b8357ab789"
dependencies = [
"bitflags 1.3.2",
"libc",

View File

@@ -49,8 +49,11 @@ docs-url-alive-check:
build-and-publish-kata-debug:
bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}
docs-serve:
docker run --rm -p 8000:8000 -v ./docs:/docs:ro -v ${PWD}/zensical.toml:/zensical.toml:ro zensical/zensical serve --config-file /zensical.toml -a 0.0.0.0:8000
docs-build:
docker build -t kata-docs:latest -f ./docs/Dockerfile ./docs
docs-serve: docs-build
docker run --rm -p 8000:8000 -v ${PWD}:/docs:ro kata-docs:latest serve --config-file /docs/mkdocs.yaml -a 0.0.0.0:8000
.PHONY: \
all \
@@ -59,4 +62,5 @@ docs-serve:
default \
static-checks \
docs-url-alive-check \
docs-build \
docs-serve

View File

@@ -1 +1 @@
3.27.0
3.28.0

View File

@@ -378,7 +378,7 @@ that is used in the test" section. From there you can see exactly what you'll
have to use when deploying kata-deploy in your local cluster.
> [!NOTE]
> TODO: WAINER TO FINISH THIS PART BASED ON HIS PR TO RUN A LOCAL CI
> TODO: @wainersm TO FINISH THIS PART BASED ON HIS PR TO RUN A LOCAL CI
## Adding new runners

View File

@@ -98,7 +98,7 @@ Let's say the OCP pipeline passed running with
but failed running with
``quay.io/kata-containers/kata-deploy-ci:kata-containers-9f512c016e75599a4a921bd84ea47559fe610057-amd64``
and you'd like to know which PR caused the regression. You can either run with
all the 60 tags between or you can utilize the [bisecter](https://github.com/ldoktor/bisecter)
all the 60 tags between or you can utilize the [`bisecter`](https://github.com/ldoktor/bisecter)
to optimize the number of steps in between.
Before running the bisection you need a reproducer script. Sample one called

18
docs/.nav.yml Normal file
View File

@@ -0,0 +1,18 @@
# https://lukasgeiter.github.io/mkdocs-awesome-nav/
nav:
- Home: index.md
- Getting Started:
- prerequisites.md
- installation.md
- Configuration:
- helm-configuration.md
- runtime-configuration.md
- Platform Support:
- hypervisors.md
- Guides:
- Use Cases:
- NVIDIA GPU Passthrough: use-cases/NVIDIA-GPU-passthrough-and-Kata-QEMU.md
- NVIDIA vGPU: use-cases/NVIDIA-GPU-passthrough-and-Kata.md
- Intel Discrete GPU: use-cases/Intel-Discrete-GPU-passthrough-and-Kata.md
- Misc:
- Architecture: design/architecture/

View File

@@ -730,7 +730,7 @@ sudo sed -i -e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.debug_cons
##### Connecting to the debug console
Next, connect to the debug console. The VSOCKS paths vary slightly between each
Next, connect to the debug console. The VSOCK paths vary slightly between each
VMM solution.
In case of cloud-hypervisor, connect to the `vsock` as shown:

11
docs/Dockerfile Normal file
View File

@@ -0,0 +1,11 @@
# Copyright 2026 Kata Contributors
#
# SPDX-License-Identifier: Apache-2.0
#
FROM python:3.12-slim
WORKDIR /
COPY ./requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python3", "-m", "mkdocs"]

View File

@@ -188,15 +188,14 @@ and compare them with standard tools (e.g. `diff(1)`).
# Spelling
Since this project uses a number of terms not found in conventional
dictionaries, we have a
[spell checking tool](https://github.com/kata-containers/kata-containers/tree/main/tests/cmd/check-spelling)
that checks both dictionary words and the additional terms we use.
dictionaries, we have a [kata-dictionary](../tests/spellcheck/kata-dictionary.txt)
that contains some project specific terms we use.
Run the spell checking tool on your document before raising a PR to ensure it
You can run the `cspell` checking tool on your document before raising a PR to ensure it
is free of mistakes.
If your document introduces new terms, you need to update the custom
dictionary used by the spell checking tool to incorporate the new words.
dictionary to incorporate the new words.
# Names

View File

@@ -1,59 +1,69 @@
# How to do a Kata Containers Release
This document lists the tasks required to create a Kata Release.
## Requirements
- GitHub permissions to run workflows.
## Versioning
## Release Model
The Kata Containers project uses [semantic versioning](http://semver.org/) for all releases.
Semantic versions are comprised of three fields in the form:
Kata Containers follows a rolling release model with monthly snapshots.
New features, bug fixes, and improvements are continuously integrated into
`main`. Each month, a snapshot is tagged as a new `MINOR` release.
```
MAJOR.MINOR.PATCH
```
### Versioning
When `MINOR` increases, the new release adds **new features** but *without changing the existing behavior*.
Releases use the `MAJOR.MINOR.PATCH` scheme. Monthly snapshots increment
`MINOR`; `PATCH` is typically `0`. Major releases are rare (years apart) and
signal significant architectural changes that may require updates to container
managers (Containerd, CRI-O) or other infrastructure. Breaking changes in
`MINOR` releases are avoided where possible, but may occasionally occur as
features are deprecated or removed.
When `MAJOR` increases, the new release adds **new features, bug fixes, or
both** and which **changes the behavior from the previous release** (incompatible with previous releases).
### No Stable Branches
A major release will also likely require a change of the container manager version used,
-for example Containerd or CRI-O. Please refer to the release notes for further details.
**Important** : the Kata Containers project doesn't have stable branches (see
[this issue](https://github.com/kata-containers/kata-containers/issues/9064) for details).
Bug fixes are released as part of `MINOR` or `MAJOR` releases only. `PATCH` is always `0`.
The Kata Containers project does not maintain stable branches (see
[#9064](https://github.com/kata-containers/kata-containers/issues/9064)).
Bug fixes land on `main` and ship in the next monthly snapshot rather than
being backported. Downstream projects that need extended support or compliance
certifications should select a monthly snapshot as their stable base and manage
their own validation and patch backporting from there.
## Release Process
### Bump the `VERSION` and `Chart.yaml` file
### Lock the `main` branch and announce release process
When the `kata-containers/kata-containers` repository is ready for a new release,
first create a PR to set the release in the [`VERSION`](./../VERSION) file and update the
`version` and `appVersion` in the
[`Chart.yaml`](./../tools/packaging/kata-deploy/helm-chart/kata-deploy/Chart.yaml) file and
have it merged.
### Lock the `main` branch
In order to prevent any PRs getting merged during the release process, and slowing the release
process down, by impacting the payload caches, we have recently trailed setting the `main`
branch to read only whilst the release action runs.
In order to prevent any PRs getting merged during the release process, and
slowing the release process down, by impacting the payload caches, we have
recently trialed setting the `main` branch to read-only.
Once the `kata-containers/kata-containers` repository is ready for a new
release, lock the main branch until the release action has completed.
Notify the #kata-dev Slack channel about the ongoing release process.
Ideally, CI usage by others should be reduced to a minimum during the
ongoing release process.
> [!NOTE]
> Admin permission is needed to complete this task.
> Admin permission is needed to lock/unlock the `main` branch.
### Bump the `VERSION` and `Chart.yaml` file
Create a PR to set the release in the [`VERSION`](./../VERSION) file and to
update the `version` and `appVersion` fields in the
[`Chart.yaml`](./../tools/packaging/kata-deploy/helm-chart/kata-deploy/Chart.yaml)
file. Temporarily unlock the main branch to merge the PR.
### Wait for the `VERSION` bump PR payload publish to complete
To reduce the chance of need to re-run the release workflow, check the
[CI | Publish Kata Containers payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml)
To reduce the chance of need to re-run the release workflow, check the [CI |
Publish Kata Containers
payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml)
once the `VERSION` PR bump has merged to check that the assets build correctly
and are cached, so that the release process can just download these artifacts
rather than needing to build them all, which takes time and can reveal errors in infra.
rather than needing to build them all, which takes time and can reveal errors in
infra.
### Check GitHub Actions
### Trigger the `Release Kata Containers` GitHub Action
We make use of [GitHub actions](https://github.com/features/actions) in the
[release](https://github.com/kata-containers/kata-containers/actions/workflows/release.yaml)
@@ -63,11 +73,10 @@ release artifacts.
> [!NOTE]
> Write permissions to trigger the action.
The action is manually triggered and is responsible for generating a new
release (including a new tag), pushing those to the
`kata-containers/kata-containers` repository. The new release is initially
created as a draft. It is promoted to an official release when the whole
workflow has completed successfully.
The action is manually triggered and is responsible for generating a new release
(including a new tag), pushing those to the `kata-containers/kata-containers`
repository. The new release is initially created as a draft. It is promoted to
an official release when the whole workflow has completed successfully.
Check the [actions status
page](https://github.com/kata-containers/kata-containers/actions) to verify all
@@ -75,12 +84,13 @@ steps in the actions workflow have completed successfully. On success, a static
tarball containing Kata release artifacts will be uploaded to the [Release
page](https://github.com/kata-containers/kata-containers/releases).
If the workflow fails because of some external environmental causes, e.g. network
timeout, simply re-run the failed jobs until they eventually succeed.
If the workflow fails because of some external environmental causes, e.g.
network timeout, simply re-run the failed jobs until they eventually succeed.
If for some reason you need to cancel the workflow or re-run it entirely, go first
to the [Release page](https://github.com/kata-containers/kata-containers/releases) and
delete the draft release from the previous run.
If for some reason you need to cancel the workflow or re-run it entirely, go
first to the [Release
page](https://github.com/kata-containers/kata-containers/releases) and delete
the draft release from the previous run.
### Unlock the `main` branch
@@ -90,9 +100,8 @@ an admin to do it.
### Improve the release notes
Release notes are auto-generated by the GitHub CLI tool used as part of our
release workflow. However, some manual tweaking may still be necessary in
order to highlight the most important features and bug fixes in a specific
release.
release workflow. However, some manual tweaking may still be necessary in order
to highlight the most important features and bug fixes in a specific release.
With this in mind, please, poke @channel on #kata-dev and people who worked on
the release will be able to contribute to that.

View File

Before

Width:  |  Height:  |  Size: 710 B

After

Width:  |  Height:  |  Size: 710 B

View File

@@ -231,12 +231,6 @@ Run the
[markdown checker](https://github.com/kata-containers/kata-containers/tree/main/tests/cmd/check-markdown)
on your documentation changes.
### Spell check
Run the
[spell checker](https://github.com/kata-containers/kata-containers/tree/main/tests/cmd/check-spelling)
on your documentation changes.
## Finally
You may wish to read the documentation that the

View File

@@ -43,7 +43,7 @@ To fulfill the [Kata design requirements](kata-design-requirements.md), and base
|`sandbox.AddInterface(inf)`| Add new NIC to the sandbox.|
|`sandbox.RemoveInterface(inf)`| Remove a NIC from the sandbox.|
|`sandbox.ListInterfaces()`| List all NICs and their configurations in the sandbox, return a `pbTypes.Interface` list.|
|`sandbox.UpdateRoutes(routes)`| Update the sandbox route table (e.g. for portmapping support), return a `pbTypes.Route` list.|
|`sandbox.UpdateRoutes(routes)`| Update the sandbox route table (e.g. for port mapping support), return a `pbTypes.Route` list.|
|`sandbox.ListRoutes()`| List the sandbox route table, return a `pbTypes.Route` list.|
### Sandbox Relay API

View File

@@ -8,7 +8,7 @@ The following benchmarking result shows the performance improvement compared wit
## Proposal - Bring `lazyload` ability to Kata Containers
`Nydusd` is a fuse/`virtiofs` daemon which is provided by `nydus` project and it supports `PassthroughFS` and [RAFS](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) (Registry Acceleration File System) natively, so in Kata Containers, we can use `nydusd` in place of `virtiofsd` and mount `nydus` image to guest in the meanwhile.
`Nydusd` is a fuse/`virtiofs` daemon which is provided by `nydus` project and it supports `PassthroughFS` and [`rafs`](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) (Registry Acceleration File System) natively, so in Kata Containers, we can use `nydusd` in place of `virtiofsd` and mount `nydus` image to guest in the meanwhile.
The process of creating/starting Kata Containers with `virtiofsd`,

264
docs/helm-configuration.md Normal file
View File

@@ -0,0 +1,264 @@
# Helm Configuration
## Parameters
The helm chart provides a comprehensive set of configuration options. You may view the parameters and their descriptions by going to the [GitHub source](https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/helm-chart/kata-deploy/values.yaml) or by using helm:
```sh
# List available kata-deploy chart versions:
# helm search repo kata-deploy-charts/kata-deploy --versions
#
# Then replace X.Y.Z below with the desired chart version:
helm show values --version X.Y.Z oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy
```
### shims
Kata ships with a number of pre-built artifacts and runtimes. You may selectively enable or disable specific shims. For example:
```yaml title="values.yaml"
shims:
disableAll: true
qemu:
enabled: true
qemu-nvidia-gpu:
enabled: true
qemu-nvidia-gpu-snp:
enabled: false
```
Shims can also have configuration options specific to them:
```yaml
qemu-nvidia-gpu:
enabled: ~
supportedArches:
- amd64
- arm64
allowedHypervisorAnnotations: []
containerd:
snapshotter: ""
runtimeClass:
# This label is automatically added by gpu-operator. Override it
# if you want to use a different label.
# Uncomment once GPU Operator v26.3 is out
# nodeSelector:
# nvidia.com/cc.ready.state: "false"
```
It's best to reference the default `values.yaml` file above for more details.
### Custom Runtimes
Kata allows you to create custom runtime configurations. This is done by overlaying one of the pre-existing runtime configs with user-provided configs. For example, we can use the `qemu-nvidia-gpu` as a base config and overlay our own parameters to it:
```yaml
customRuntimes:
enabled: false
runtimes:
my-gpu-runtime:
baseConfig: "qemu-nvidia-gpu" # Required: existing config to use as base
dropIn: | # Optional: overrides via config.d mechanism
[hypervisor.qemu]
default_memory = 1024
default_vcpus = 4
runtimeClass: |
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
name: kata-my-gpu-runtime
labels:
app.kubernetes.io/managed-by: kata-deploy
handler: kata-my-gpu-runtime
overhead:
podFixed:
memory: "640Mi"
cpu: "500m"
scheduling:
nodeSelector:
katacontainers.io/kata-runtime: "true"
# Optional: CRI-specific configuration
containerd:
snapshotter: "nydus" # Configure containerd snapshotter (nydus, erofs, etc.)
crio:
pullType: "guest-pull" # Configure CRI-O runtime_pull_image = true
```
Again, view the default [`values.yaml`](#parameters) file for more details.
## Examples
We provide a few examples that you can pass to helm via the `-f`/`--values` flag.
### [`try-kata-tee.values.yaml`](https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/helm-chart/kata-deploy/try-kata-tee.values.yaml)
This file enables only the TEE (Trusted Execution Environment) shims for confidential computing:
```sh
helm install kata-deploy oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy \
--version VERSION \
-f try-kata-tee.values.yaml
```
Includes:
- `qemu-snp` - AMD SEV-SNP (amd64)
- `qemu-tdx` - Intel TDX (amd64)
- `qemu-se` - IBM Secure Execution for Linux (SEL) (s390x)
- `qemu-se-runtime-rs` - IBM Secure Execution for Linux (SEL) Rust runtime (s390x)
- `qemu-cca` - Arm Confidential Compute Architecture (arm64)
- `qemu-coco-dev` - Confidential Containers development (amd64, s390x)
- `qemu-coco-dev-runtime-rs` - Confidential Containers development Rust runtime (amd64, s390x)
### [`try-kata-nvidia-gpu.values.yaml`](https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/helm-chart/kata-deploy/try-kata-nvidia-gpu.values.yaml)
This file enables only the NVIDIA GPU-enabled shims:
```sh
helm install kata-deploy oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy \
--version VERSION \
-f try-kata-nvidia-gpu.values.yaml
```
Includes:
- `qemu-nvidia-gpu` - Standard NVIDIA GPU support (amd64, arm64)
- `qemu-nvidia-gpu-snp` - NVIDIA GPU with AMD SEV-SNP (amd64)
- `qemu-nvidia-gpu-tdx` - NVIDIA GPU with Intel TDX (amd64)
### `nodeSelector`
We can deploy Kata only to specific nodes using `nodeSelector`
```sh
# First, label the nodes where you want kata-containers to be installed
$ kubectl label nodes worker-node-1 kata-containers=enabled
$ kubectl label nodes worker-node-2 kata-containers=enabled
# Then install the chart with `nodeSelector`
$ helm install kata-deploy \
--set nodeSelector.kata-containers="enabled" \
"${CHART}" --version "${VERSION}"
```
You can also use a values file:
```yaml title="values.yaml"
nodeSelector:
kata-containers: "enabled"
node-type: "worker"
```
```sh
$ helm install kata-deploy -f values.yaml "${CHART}" --version "${VERSION}"
```
### Multiple Kata installations on the Same Node
For debugging, testing and other use-case it is possible to deploy multiple
versions of Kata on the very same node. All the needed artifacts are getting the
`multiInstallSuffix` appended to distinguish each installation. **BEWARE** that one
needs at least **containerd-2.0** since this version has drop-in conf support
which is a prerequisite for the `multiInstallSuffix` to work properly.
```sh
$ helm install kata-deploy-cicd \
-n kata-deploy-cicd \
--set env.multiInstallSuffix=cicd \
--set env.debug=true \
"${CHART}" --version "${VERSION}"
```
Note: `runtimeClasses` are automatically created by Helm (via
`runtimeClasses.enabled=true`, which is the default).
Now verify the installation by examining the `runtimeClasses`:
```sh
$ kubectl get runtimeClasses
NAME HANDLER AGE
kata-clh-cicd kata-clh-cicd 77s
kata-cloud-hypervisor-cicd kata-cloud-hypervisor-cicd 77s
kata-dragonball-cicd kata-dragonball-cicd 77s
kata-fc-cicd kata-fc-cicd 77s
kata-qemu-cicd kata-qemu-cicd 77s
kata-qemu-coco-dev-cicd kata-qemu-coco-dev-cicd 77s
kata-qemu-nvidia-gpu-cicd kata-qemu-nvidia-gpu-cicd 77s
kata-qemu-nvidia-gpu-snp-cicd kata-qemu-nvidia-gpu-snp-cicd 77s
kata-qemu-nvidia-gpu-tdx-cicd kata-qemu-nvidia-gpu-tdx-cicd 76s
kata-qemu-runtime-rs-cicd kata-qemu-runtime-rs-cicd 77s
kata-qemu-se-runtime-rs-cicd kata-qemu-se-runtime-rs-cicd 77s
kata-qemu-snp-cicd kata-qemu-snp-cicd 77s
kata-qemu-tdx-cicd kata-qemu-tdx-cicd 77s
kata-stratovirt-cicd kata-stratovirt-cicd 77s
```
## RuntimeClass Node Selectors for TEE Shims
**Manual configuration:** Any `nodeSelector` you set under `shims.<shim>.runtimeClass.nodeSelector`
is **always applied** to that shim's RuntimeClass, whether or not NFD is present. Use this when
you want to pin TEE workloads to specific nodes (e.g. without NFD, or with custom labels).
**Auto-inject when NFD is present:** If you do *not* set a `runtimeClass.nodeSelector` for a
TEE shim, the chart can **automatically inject** NFD-based labels when NFD is detected in the
cluster (deployed by this chart with `node-feature-discovery.enabled=true` or found externally):
- AMD SEV-SNP shims: `amd.feature.node.kubernetes.io/snp: "true"`
- Intel TDX shims: `intel.feature.node.kubernetes.io/tdx: "true"`
- IBM Secure Execution for Linux (SEL) shims (s390x): `feature.node.kubernetes.io/cpu-security.se.enabled: "true"`
The chart uses Helm's `lookup` function to detect NFD (by looking for the
`node-feature-discovery-worker` DaemonSet). Auto-inject only runs when NFD is detected and
no manual `runtimeClass.nodeSelector` is set for that shim.
**Note**: NFD detection requires cluster access. During `helm template` (dry-run without a
cluster), external NFD is not seen, so auto-injected labels are not added. Manual
`runtimeClass.nodeSelector` values are still applied in all cases.
## Customizing Configuration with Drop-in Files
When kata-deploy installs Kata Containers, the base configuration files should not
be modified directly. Instead, use drop-in configuration files to customize
settings. This approach ensures your customizations survive kata-deploy upgrades.
### How Drop-in Files Work
The Kata runtime reads the base configuration file and then applies any `.toml`
files found in the `config.d/` directory alongside it. Files are processed in
alphabetical order, with later files overriding earlier settings.
### Creating Custom Drop-in Files
To add custom settings, create a `.toml` file in the appropriate `config.d/`
directory. Use a numeric prefix to control the order of application.
**Reserved prefixes** (used by kata-deploy):
- `10-*`: Core kata-deploy settings
- `20-*`: Debug settings
- `30-*`: Kernel parameters
**Recommended prefixes for custom settings**: `50-89`
### Drop-In Config Examples
#### Adding Custom Kernel Parameters
```bash
# SSH into the node or use kubectl exec
sudo mkdir -p /opt/kata/share/defaults/kata-containers/runtimes/qemu/config.d/
sudo cat > /opt/kata/share/defaults/kata-containers/runtimes/qemu/config.d/50-custom.toml << 'EOF'
[hypervisor.qemu]
kernel_params = "my_param=value"
EOF
```
#### Changing Default Memory Size
```bash
sudo cat > /opt/kata/share/defaults/kata-containers/runtimes/qemu/config.d/50-memory.toml << 'EOF'
[hypervisor.qemu]
default_memory = 4096
EOF
```

View File

@@ -23,7 +23,7 @@ workloads with isolated sandboxes (i.e. Kata Containers).
As a result, the CRI implementations extended their semantics for the requirements:
- At the beginning, [Frakti](https://github.com/kubernetes/frakti) checks the network configuration of a Pod, and
- At the beginning, [`Frakti`](https://github.com/kubernetes/frakti) checks the network configuration of a Pod, and
treat Pod with `host` network as trusted, while others are treated as untrusted.
- The containerd introduced an annotation for untrusted Pods since [v1.0](https://github.com/containerd/cri/blob/v1.0.0-rc.0/docs/config.md):
```yaml

View File

@@ -18,7 +18,7 @@ The host kernel must be equal to or later than upstream version [6.11](https://c
[`sev-utils`](https://github.com/amd/sev-utils/blob/coco-202501150000/docs/snp.md) is an easy way to install the required host kernel with the `setup-host` command. However, it will also build compatible guest kernel, OVMF, and QEMU components which are not necessary as these components are packaged with kata. The `sev-utils` script utility can be used with these additional components to test the memory encrypted launch and attestation of a base QEMU SNP guest.
For a simplified way to build just the upstream compatible host kernel, use the Confidential Containers fork of [AMDESE AMDSEV](https://github.com/confidential-containers/amdese-amdsev/tree/amd-snp-202501150000). Individual components can be built by running the following command:
For a simplified way to build just the upstream compatible host kernel, use the Confidential Containers fork of [`amdese-amdsev`](https://github.com/confidential-containers/amdese-amdsev/tree/amd-snp-202501150000). Individual components can be built by running the following command:
```
./build.sh kernel host --install
@@ -65,7 +65,7 @@ $ ./configure --enable-virtfs --target-list=x86_64-softmmu --enable-debug
$ make -j "$(nproc)"
$ popd
```
- Create cert-chain for SNP attestation ( using [snphost](https://github.com/virtee/snphost/blob/main/docs/snphost.1.adoc) )
- Create cert-chain for SNP attestation ( using [`snphost`](https://github.com/virtee/snphost/blob/main/docs/snphost.1.adoc) )
```bash
$ git clone https://github.com/virtee/snphost.git && cd snphost/
$ cargo build
@@ -178,4 +178,3 @@ sudo reboot
```bash
sudo rmmod kvm_amd && sudo modprobe kvm_amd sev_snp=0
```

View File

@@ -315,7 +315,7 @@ $ kata-agent-ctl connect --server-address "unix:///var/run/kata/$PODID/root/kata
### compact_threshold
Control the mem-agent compaction function compact threshold.<br>
compact_threshold is the pages number.<br>
When examining the /proc/pagetypeinfo, if there's an increase in the number of movable pages of orders smaller than the compact_order compared to the amount following the previous compaction period, and this increase surpasses a certain threshold specifically, more than compact_threshold number of pages, or the number of free pages has decreased by compact_threshold since the previous compaction. Current compact run period will not do compaction because there is no enough fragmented pages to be compaction.<br>
When examining the `/proc/pagetypeinfo`, if there's an increase in the number of movable pages of orders smaller than the compact_order compared to the amount following the previous compaction period, and this increase surpasses a certain threshold specifically, more than compact_threshold number of pages, or the number of free pages has decreased by compact_threshold since the previous compaction. Current compact run period will not do compaction because there is no enough fragmented pages to be compaction.<br>
This design aims to minimize the impact of unnecessary compaction calls on system performance.<br>
Default to 1024.

View File

@@ -8,7 +8,7 @@
> **Note:** `cri-tools` is only used for debugging and validation purpose, and don't use it to run production workloads.
> **Note:** For how to install and configure `cri-tools` with CRI runtimes like `containerd` or CRI-O, please also refer to other [howtos](./README.md).
> **Note:** For how to install and configure `cri-tools` with CRI runtimes like `containerd` or CRI-O, please also refer to other [how-tos](./README.md).
## Use `crictl` run Pods in Kata containers

View File

@@ -16,83 +16,38 @@ which hypervisors you may wish to investigate further.
## Types
| Hypervisor | Written in | Architectures | Type |
|-|-|-|-|
|[Cloud Hypervisor] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) |
|[Firecracker] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) |
|[QEMU] | C | all | Type 2 ([KVM]) | `configuration-qemu.toml` |
|[`Dragonball`] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) |
|[StratoVirt] | rust | `aarch64`, `x86_64` | Type 2 ([KVM]) |
| Hypervisor | Written in | Architectures | GPU Support | Intel TDX | AMD SEV-SNP |
|-|-|-|-|-|-|
|[Cloud Hypervisor](#cloud-hypervisor) | rust | `aarch64`, `x86_64` | :x: | :x: | :x: |
|[Firecracker](#firecracker) | rust | `aarch64`, `x86_64` | :x: | :x: | :x: |
|[QEMU](#qemu) | C | all | :white_check_mark: | :white_check_mark: | :white_check_mark: |
|[Dragonball](#dragonball) | rust | `aarch64`, `x86_64` | :x: | :x: | :x: |
|StratoVirt | rust | `aarch64`, `x86_64` | :x: | :x: | :x: |
## Determine currently configured hypervisor
Each Kata runtime is configured for a specific hypervisor through the runtime's configuration file. For example:
```bash
$ kata-runtime kata-env | awk -v RS= '/\[Hypervisor\]/' | grep Path
```toml title="/opt/kata/share/defaults/kata-containers/configuration.toml"
[hypervisor.qemu]
path = "/opt/kata/bin/qemu-system-x86_64"
```
## Choose a Hypervisor
```toml title="/opt/kata/share/defaults/kata-containers/configuration-clh.toml"
[hypervisor.clh]
path = "/opt/kata/bin/cloud-hypervisor"
```
The table below provides a brief summary of some of the differences between
the hypervisors:
## Cloud Hypervisor
| Hypervisor | Summary | Features | Limitations | Container Creation speed | Memory density | Use cases | Comment |
|-|-|-|-|-|-|-|-|
|[Cloud Hypervisor] | Low latency, small memory footprint, small attack surface | Minimal | | excellent | excellent | High performance modern cloud workloads | |
|[Firecracker] | Very slimline | Extremely minimal | Doesn't support all device types | excellent | excellent | Serverless / FaaS | |
|[QEMU] | Lots of features | Lots | | good | good | Good option for most users | |
|[`Dragonball`] | Built-in VMM, low CPU and memory overhead| Minimal | | excellent | excellent | Optimized for most container workloads | `out-of-the-box` Kata Containers experience |
|[StratoVirt] | Unified architecture supporting three scenarios: VM, container, and serverless | Extremely minimal(`MicroVM`) to Lots(`StandardVM`) | | excellent | excellent | Common container workloads | `StandardVM` type of StratoVirt for Kata is under development |
[Cloud Hypervisor](https://www.cloudhypervisor.org/) is a more modern hypervisor written in Rust.
For further details, see the [Virtualization in Kata Containers](design/virtualization.md) document and the official documentation for each hypervisor.
## Firecracker
## Hypervisor configuration files
[Firecracker](https://firecracker-microvm.github.io/) is a minimal and lightweight hypervisor created for the AWS Lambda product.
Since each hypervisor offers different features and options, Kata Containers
provides a separate
[configuration file](../src/runtime/README.md#configuration)
for each. The configuration files contain comments explaining which options
are available, their default values and how each setting can be used.
## QEMU
| Hypervisor | Golang runtime config file | golang runtime short name | golang runtime default | rust runtime config file | rust runtime short name | rust runtime default |
|-|-|-|-|-|-|-|
| [Cloud Hypervisor] | [`configuration-clh.toml`](../src/runtime/config/configuration-clh.toml.in) | `clh` | | [`configuration-cloud-hypervisor.toml`](../src/runtime-rs/config/configuration-cloud-hypervisor.toml.in) | `cloud-hypervisor` | |
| [Firecracker] | [`configuration-fc.toml`](../src/runtime/config/configuration-fc.toml.in) | `fc` | | | | |
| [QEMU] | [`configuration-qemu.toml`](../src/runtime/config/configuration-qemu.toml.in) | `qemu` | yes | [`configuration-qemu.toml`](../src/runtime-rs/config/configuration-qemu-runtime-rs.toml.in) | `qemu` | |
| [`Dragonball`] | | | | [`configuration-dragonball.toml`](../src/runtime-rs/config/configuration-dragonball.toml.in) | `dragonball` | yes |
| [StratoVirt] | [`configuration-stratovirt.toml`](../src/runtime/config/configuration-stratovirt.toml.in) | `stratovirt` | | | | |
QEMU is the best supported hypervisor for NVIDIA-based GPUs and for confidential computing use-cases (such as Intel TDX and AMD SEV-SNP). Runtimes that use this are normally named `kata-qemu-nvidia-gpu-*`. The Kata project focuses primarily on QEMU runtimes for GPU support.
> **Notes:**
>
> - The short names specified are used by the [`kata-manager`](../utils/README.md) tool.
> - As shown by the default columns, each runtime type has its own default hypervisor.
> - The [golang runtime](../src/runtime) is the current default runtime.
> - The [rust runtime](../src/runtime-rs), also known as `runtime-rs`,
> is the newer runtime written in the rust language.
> - See the [Configuration](../README.md#configuration) for further details.
> - The configuration file links in the table link to the "source"
> versions: these are not usable configuration files as they contain
> variables that need to be expanded:
> - The links are provided for reference only.
> - The final (installed) versions, where all variables have been
> expanded, are built from these source configuration files.
> - The pristine configuration files are usually installed in the
> `/opt/kata/share/defaults/kata-containers/` or
> `/usr/share/defaults/kata-containers/` directories.
> - Some hypervisors may have the same name for both golang and rust
> runtimes, but the file contents may differ.
> - If there is no configuration file listed for the golang or
> rust runtimes, this either means the hypervisor cannot be run with
> a particular runtime, or that a driver has not yet been made
> available for that runtime.
## Dragonball
## Switch configured hypervisor
To switch the configured hypervisor, you only need to run a single command.
See [the `kata-manager` documentation](../utils/README.md#choose-a-hypervisor) for further details.
[Cloud Hypervisor]: https://github.com/cloud-hypervisor/cloud-hypervisor
[Firecracker]: https://github.com/firecracker-microvm/firecracker
[KVM]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
[QEMU]: http://www.qemu.org
[`Dragonball`]: https://github.com/kata-containers/kata-containers/blob/main/src/dragonball
[StratoVirt]: https://gitee.com/openeuler/stratovirt
Dragonball is a special hypervisor created by the Ant Group that runs in the same process as the Rust-based containerd shim.

94
docs/index.md Normal file
View File

@@ -0,0 +1,94 @@
# Kata Containers
Kata Containers is an open source community working to build a secure container runtime with lightweight virtual machines (VM's) that feel and perform like standard Linux containers, but provide stronger workload isolation using hardware virtualization technology as a second layer of defense.
## How it Works
Kata implements the [Open Containers Runtime Specification](https://github.com/opencontainers/runtime-spec). More specifically, it implements a containerd shim that implements the expected interface for managing container lifecycles. The default containerd runtime of `runc` spawns a container like this:
```mermaid
flowchart TD
subgraph Host
containerd
runc
process[Container Process]
containerd --> runc --> process
end
```
When containerd receives a request to spawn a container, it will pull the container image down and then call out to the runc shim (usually located at `/usr/local/bin/containerd-shim-runc-v2`). runc will then create various process isolation resources like Linux namespaces (networking, PIDs, mounts etc), seccomp filters, Linux capability reductions, and then spawn the process inside of those resources. This process runs in the host kernel.
Kata spawns containers like this:
```mermaid
flowchart TD
subgraph Host
containerdOuter[containerd]
kata
containerdOuter --> kata
kata --> kataAgent
subgraph VM
kataAgent[Kata Agent]
process[Container Process]
kataAgent --> process
end
end
```
The container process spawned inside of the VM allows us to isolate the guest kernel from the host system. This is the fundamental principle of how Kata achieves its isolation boundaries.
## Example
When Kata is installed in a system, a number of artifacts are laid down. containerd's config will be modified as such:
```toml title="/etc/containerd/config.toml"
imports = ["/opt/kata/containerd/config.d/kata-deploy.toml"]
```
This file will contain configuration for various flavors of Kata runtimes. We can see the vanilla CPU runtime config here:
```toml title="/opt/kata/containerd/config.d/kata-deploy.toml"
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.kata-qemu]
runtime_type = "io.containerd.kata-qemu.v2"
runtime_path = "/opt/kata/bin/containerd-shim-kata-v2"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
[plugins."io.containerd.cri.v1.runtime".containerd.runtimes.kata-qemu.options]
ConfigPath = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
```
Because containerd's CRI is aware of the Kata runtimes, we can spawn Kubernetes pods:
```yaml
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
runtimeClassName: kata-qemu
containers:
- name: test
image: "quay.io/libpod/ubuntu:latest"
command: ["/bin/bash", "-c"]
args: ["echo hello"]
```
We can also spawn a Kata container by submitting a request to containerd like so:
<div class="annotate" markdown>
```sh
$ ctr image pull quay.io/libpod/ubuntu:latest
$ ctr run --runtime "io.containerd.kata.v2" --runtime-config-path /opt/kata/share/defaults/kata-containers/configuration-qemu.toml --rm -t "quay.io/libpod/ubuntu:latest" foo sh
# echo hello
hello
```
</div>
!!! tip
`ctr` is not aware of the CRI config in `/etc/containerd/config.toml`. This is why you must specify the `--runtime-config-path`. Additionally, the `--runtime` value is converted into a specific binary name which containerd then searches for in its `PATH`. See the [containerd docs](https://github.com/containerd/containerd/blob/release/2.2/core/runtime/v2/README.md#usage) for more details.

64
docs/installation.md Normal file
View File

@@ -0,0 +1,64 @@
# Installation
## Helm Chart
[helm](https://helm.sh/docs/intro/install/) can be used to install templated kubernetes manifests.
### Prerequisites
- **Kubernetes ≥ v1.22** v1.22 is the first release where the CRI v1 API
became the default and `RuntimeClass` left alpha. The chart depends on those
stable interfaces; earlier clusters need `featuregates` or CRI shims that are
out of scope.
- **Kata Release 3.12** - v3.12.0 introduced publishing the helm-chart on the
release page for easier consumption, since v3.8.0 we shipped the helm-chart
via source code in the kata-containers `GitHub` repository.
- CRIcompatible runtime (containerd or CRIO). If one wants to use the
`multiInstallSuffix` feature one needs at least **containerd-2.0** which
supports drop-in config files
- Nodes must allow loading kernel modules and installing Kata artifacts (the
chart runs privileged containers to do so)
### `helm install`
```sh
# Install directly from the official ghcr.io OCI registry
# update the VERSION X.YY.Z to your needs or just use the latest
export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
export CHART="oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy"
$ helm install kata-deploy "${CHART}" --version "${VERSION}"
# See everything you can configure
$ helm show values "${CHART}" --version "${VERSION}"
```
This installs the `kata-deploy` DaemonSet and the default Kata `RuntimeClass`
resources on your cluster.
To see what versions of the chart are available:
```sh
$ helm show chart oci://ghcr.io/kata-containers/kata-deploy-charts/kata-deploy
```
### `helm uninstall`
```sh
$ helm uninstall kata-deploy -n kube-system
```
During uninstall, Helm will report that some resources were kept due to the
resource policy (`ServiceAccount`, `ClusterRole`, `ClusterRoleBinding`). This
is **normal**. A post-delete hook Job runs after uninstall and removes those
resources so no cluster-wide `RBAC` is left behind.
## Pre-Built Release
Kata can also be installed using the pre-built releases: https://github.com/kata-containers/kata-containers/releases
This method does not have any facilities for artifact lifecycle management.

116
docs/prerequisites.md Normal file
View File

@@ -0,0 +1,116 @@
# Prerequisites
## Kubernetes
If using Kubernetes, at least version `v1.22` is recommended. This is the first release that the CRI v1 API and the `RuntimeClass` left alpha.
## containerd
Kata requires a [CRI](https://kubernetes.io/docs/concepts/containers/cri/)-compatible container runtime. containerd is commonly used for Kata. We recommend installing containerd using your platform's package distribution mechanism. We recommend at least the latest version of containerd v2.1.x.[^1]
### Debian/Ubuntu
To install on Debian-based systems:
```sh
$ apt update
$ apt install containerd
$ systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/etc/systemd/system/containerd.service; enabled; preset: enabled)
Drop-In: /etc/systemd/system/containerd.service.d
└─http-proxy.conf
Active: active (running) since Wed 2026-02-25 22:58:13 UTC; 5 days ago
Docs: https://containerd.io
Main PID: 3767885 (containerd)
Tasks: 540
Memory: 70.7G (peak: 70.8G)
CPU: 4h 9min 26.153s
CGroup: /runtime.slice/containerd.service
├─ 12694 /usr/local/bin/container
```
### Fedora/RedHat
To install on Fedora-based systems:
```
$ yum install containerd
```
??? help
Documentation assistance is requested for more specific instructions on Fedora systems.
### Pre-Built Releases
Many Linux distributions will not package the latest versions of containerd. If you find that your distribution provides very old versions of containerd, it's recommended to upgrade with the [pre-built releases](https://github.com/containerd/containerd/releases).
#### Executable
Download the latest release of containerd:
```sh
$ wget https://github.com/containerd/containerd/releases/download/v${VERSION}/containerd-${VERSION}-linux-${PLATFORM}.tar.gz
# Extract to the current directory
$ tar -xf ./containerd*.tar.gz
# Extract to root if you want it installed to its final location.
$ tar -C / -xf ./*.tar.gz
```
### Containerd Config
Containerd requires a config file at `/etc/containerd/config.toml`. This needs to be populated with a simple default config:
```sh
$ /usr/local/bin/containerd config default > /etc/containerd/config.toml
```
### Systemd Unit File
Install the systemd unit file:
```sh
$ wget -O /etc/systemd/system/containerd.service https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
```
!!! info
- You must modify the `ExecStart` line to the location of the installed containerd executable.
- containerd's `PATH` variable must allow it to find `containerd-shim-kata-v2`. You can do this by either creating a symlink from `/usr/local/bin/containerd-shim-kata-v2` to `/opt/kata/bin/containerd-shim-kata-v2` or by modifying containerd's `PATH` variable to search in `/opt/kata/bin/`. See the Environment= command in systemd.exec(5) for further details.
Reload systemd and start containerd:
```sh
$ systemctl daemon-reload
$ systemctl enable --now containerd
$ systemctl start containerd
$ systemctl status containerd
```
More details can be found on the [containerd installation docs](https://github.com/containerd/containerd/blob/main/docs/getting-started.md).
### Enable CRI
If you're using Kubernetes, you must enable the containerd Container Runtime Interface (CRI) plugin:
```sh
$ ctr plugins ls | grep cri
io.containerd.cri.v1 images - ok
io.containerd.cri.v1 runtime linux/amd64 ok
io.containerd.grpc.v1 cri - ok
```
If these are not enabled, you'll need to remove it from the `disabled_plugins` section of the containerd config.
[^1]: Kata makes use of containerd's drop-in config merging in `/etc/containerd/config.d/` which is only available starting from containerd v2. containerd v1 may work, but some Kata features will not work as expected.
## runc
The default `runc` runtime needs to be installed for non-kata containers. More details can be found at the [containerd docs](https://github.com/containerd/containerd/blob/979c80d8a5d7fc7be34102a1ada53ae5a0ff09e8/docs/RUNC.md).

9
docs/requirements.txt Normal file
View File

@@ -0,0 +1,9 @@
mkdocs-materialx==10.0.9
mkdocs-glightbox==0.4.0
mkdocs-macros-plugin==1.5.0
mkdocs-awesome-nav==3.3.0
mkdocs-open-in-new-tab==1.0.8
mkdocs-redirects==1.2.2
CairoSVG==2.9.0
pillow==12.1.1
click==8.2.1

View File

@@ -0,0 +1,56 @@
# Runtime Configuration
The containerd shims (both the Rust and Go implementations) take configuration files to control their behavior. These files are in `/opt/kata/share/defaults/kata-containers/`. An example excerpt:
```toml title="/opt/kata/share/defaults/kata-containers/configuration.toml"
[hypervisor.qemu]
path = "/opt/kata/bin/qemu-system-x86_64"
kernel = "/opt/kata/share/kata-containers/vmlinux.container"
image = "/opt/kata/share/kata-containers/kata-containers.img"
machine_type = "q35"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type = "ext4"
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
enable_annotations = ["enable_iommu", "virtio_fs_extra_args", "kernel_params"]
```
These files should never be modified directly. If you wish to create a modified version of these files, you may create your own [custom runtime](helm-configuration.md#custom-runtimes). For example, to modify the image path, we provide these values to helm:
```yaml title="values.yaml"
customRuntimes:
enabled: true
runtimes:
my-gpu-runtime:
baseConfig: "qemu-nvidia-gpu"
dropIn: |
[hypervisor.qemu]
image = "/path/to/custom-image.img"
runtimeClass: |
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
name: kata-my-gpu-runtime
labels:
app.kubernetes.io/managed-by: kata-deploy
handler: kata-my-gpu-runtime
overhead:
podFixed:
memory: "640Mi"
cpu: "500m"
scheduling:
nodeSelector:
katacontainers.io/kata-runtime: "true"
```

View File

@@ -175,7 +175,7 @@ specific).
##### Dragonball networking
For Dragonball, the `virtio-net` backend default is within Dragonbasll's VMM.
For Dragonball, the `virtio-net` backend default is within Dragonball's VMM.
#### virtio-vsock

View File

@@ -1,11 +1,12 @@
# Enabling NVIDIA GPU workloads using GPU passthrough with Kata Containers
This page provides:
1. A description of the components involved when running GPU workloads with
Kata Containers using the NVIDIA TEE and non-TEE GPU runtime classes.
1. An explanation of the orchestration flow on a Kubernetes node for this
scenario.
1. A deployment guide enabling to utilize these runtime classes.
1. A deployment guide to utilize these runtime classes.
The goal is to educate readers familiar with Kubernetes and Kata Containers
on NVIDIA's reference implementation which is reflected in Kata CI's build
@@ -18,58 +19,56 @@ Confidential Containers.
> **Note:**
>
> The current supported mode for enabling GPU workloads in the TEE scenario
> is single GPU passthrough (one GPU per pod) on AMD64 platforms (AMD SEV-SNP
> being the only supported TEE scenario so far with support for Intel TDX being
> on the way).
> The currently supported modes for enabling GPU workloads in the TEE
> scenario are: (1) singleGPU passthrough (one physical GPU per pod) and
> (2) multi-GPU passthrough on NVSwitch (NVLink) based HGX systems
> (for example, HGX Hopper (SXM) and HGX Blackwell / HGX B200).
## Component Overview
Before providing deployment guidance, we describe the components involved to
support running GPU workloads. We start from a top to bottom perspective
from the NVIDIA GPU operator via the Kata runtime to the components within
from the NVIDIA GPU Operator via the Kata runtime to the components within
the NVIDIA GPU Utility Virtual Machine (UVM) root filesystem.
### NVIDIA GPU Operator
A central component is the
[NVIDIA GPU operator](https://github.com/NVIDIA/gpu-operator) which can be
deployed onto your cluster as a helm chart. Installing the GPU operator
[NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) which can be
deployed onto your cluster as a helm chart. Installing the GPU Operator
delivers various operands on your nodes in the form of Kubernetes DaemonSets.
These operands are vital to support the flow of orchestrating pod manifests
using NVIDIA GPU runtime classes with GPU passthrough on your nodes. Without
getting into the details, the most important operands and their
responsibilities are:
- **nvidia-vfio-manager:** Binding discovered NVIDIA GPUs to the `vfio-pci`
driver for VFIO passthrough.
- **nvidia-vfio-manager:** Binding discovered NVIDIA GPUs and nvswitches to
the `vfio-pci` driver for VFIO passthrough.
- **nvidia-cc-manager:** Transitioning GPUs into confidential computing (CC)
and non-CC mode (see the
[NVIDIA/k8s-cc-manager](https://github.com/NVIDIA/k8s-cc-manager)
repository).
- **nvidia-kata-manager:** Creating host-side CDI specifications for GPU
passthrough, resulting in the file `/var/run/cdi/nvidia.yaml`, containing
`kind: nvidia.com/pgpu` (see the
[NVIDIA/k8s-kata-manager](https://github.com/NVIDIA/k8s-kata-manager)
repository).
- **nvidia-sandbox-device-plugin** (see the
[NVIDIA/sandbox-device-plugin](https://github.com/NVIDIA/sandbox-device-plugin)
repository):
- Creating host-side CDI specifications for GPU passthrough,
resulting in the file `/var/run/cdi/nvidia.yaml`, containing
`kind: nvidia.com/pgpu`
- Allocating GPUs during pod deployment.
- Discovering NVIDIA GPUs, their capabilities, and advertising these to
the Kubernetes control plane (allocatable resources as type
`nvidia.com/pgpu` resources will appear for the node and GPU Device IDs
will be registered with Kubelet). These GPUs can thus be allocated as
container resources in your pod manifests. See below GPU operator
container resources in your pod manifests. See below GPU Operator
deployment instructions for the use of the key `pgpu`, controlled via a
variable.
To summarize, the GPU operator manages the GPUs on each node, allowing for
To summarize, the GPU Operator manages the GPUs on each node, allowing for
simple orchestration of pod manifests using Kata Containers. Once the cluster
with GPU operator and Kata bits is up and running, the end user can schedule
with GPU Operator and Kata bits is up and running, the end user can schedule
Kata NVIDIA GPU workloads, using resource limits and the
`kata-qemu-nvidia-gpu` or `kata-qemu-nvidia-gpu-snp` runtime classes, for
example:
`kata-qemu-nvidia-gpu`, `kata-qemu-nvidia-gpu-tdx` or
`kata-qemu-nvidia-gpu-snp` runtime classes, for example:
```yaml
apiVersion: v1
@@ -213,7 +212,7 @@ API and kernel drivers, interacting with the pass-through GPU device.
An additional step is exercised in our CI samples: when using images from an
authenticated registry, the guest-pull mechanism triggers attestation using
trustee's Key Broker Service (KBS) for secure release of the NGC API
Trustee's Key Broker Service (KBS) for secure release of the NGC API
authentication key used to access the NVCR container registry. As part of
this, the attestation agent exercises composite attestation and transitions
the GPU into `Ready` state (without this, the GPU has to explicitly be
@@ -232,24 +231,40 @@ NVIDIA GPU CI validation jobs. Note that, this setup:
- uses the genpolicy tool to attach Kata agent security policies to the pod
manifest
- has dedicated (composite) attestation tests, a CUDA vectorAdd test, and a
NIM/RA test sample with secure API key release
NIM/RA test sample with secure API key release using sealed secrets.
A similar deployment guide and scenario description can be found in NVIDIA resources
under
[Early Access: NVIDIA GPU Operator with Confidential Containers based on Kata](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/confidential-containers.html).
[NVIDIA Confidential Containers Overview (Early Access)](https://docs.nvidia.com/datacenter/cloud-native/confidential-containers/latest/overview.html).
### Feature Set
The NVIDIA stack for Kata Containers leverages features for the confidential
computing scenario from both the confidential containers open source project
and from the Kata Containers source tree, such as:
- composite attestation using Trustee and the NVIDIA Remote Attestation
Service NRAS
- generating kata agent security policies using the genpolicy tool
- use of signed sealed secrets
- access to authenticated registries for container image guest-pull
- container image signature verification and encrypted container images
- ephemeral container data and image layer storage
### Requirements
The requirements for the TEE scenario are:
- Ubuntu 25.10 as host OS
- CPU with AMD SEV-SNP support with proper BIOS/UEFI version and settings
- CPU with AMD SEV-SNP or Intel TDX support with proper BIOS/UEFI version
and settings
- CC-capable Hopper/Blackwell GPU with proper VBIOS version.
BIOS and VBIOS configuration is out of scope for this guide. Other resources,
such as the documentation found on the
[NVIDIA Trusted Computing Solutions](https://docs.nvidia.com/nvtrust/index.html)
page and the above linked NVIDIA documentation, provide guidance on
page, on the
[Secure AI Compatibility Matrix](https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/secure-ai-compatibility-matrix/)
page, and on the above linked NVIDIA documentation, provide guidance on
selecting proper hardware and on properly configuring its firmware and OS.
### Installation
@@ -257,12 +272,16 @@ selecting proper hardware and on properly configuring its firmware and OS.
#### Containerd and Kubernetes
First, set up your Kubernetes cluster. For instance, in Kata CI, our NVIDIA
jobs use a single-node vanilla Kubernetes cluster with a 2.x containerd
version and Kata's current supported Kubernetes version. We set this cluster
up using the `deploy_k8s` function from `tests/integration/kubernetes/gha-run.sh`
as follows:
jobs use a single-node vanilla Kubernetes cluster with a 2.1 containerd
version and Kata's current supported Kubernetes version. This cluster is
being set up using the `deploy_k8s` function from the script file
`tests/integration/kubernetes/gha-run.sh`. If you intend to run this script,
follow these steps, and make sure you have `yq` and `helm` installed. Note
that, these scripts query the GitHub API, so creating and declaring a
personal access token prevents rate limiting issues.
You can execute the function as follows:
```bash
$ export GH_TOKEN="<your-gh-pat>"
$ export KUBERNETES="vanilla"
$ export CONTAINER_ENGINE="containerd"
$ export CONTAINER_ENGINE_VERSION="v2.1"
@@ -276,8 +295,11 @@ $ deploy_k8s
> `runtimeRequestTimeout` timeout value than the two minute default timeout.
> Using the guest-pull mechanism, pulling large images may take a significant
> amount of time and may delay container start, possibly leading your Kubelet
> to de-allocate your pod before it transitions from the *container created*
> to the *container running* state.
> to de-allocate your pod before it transitions from the *container creating*
> to the *container running* state. The NVIDIA shim configurations use a
> `create_container_timeout` of 1200s, which is the equivalent value on shim
> side, controlling the time the shim allows for a container to remain in
> *container creating* state.
> **Note:**
>
@@ -291,7 +313,7 @@ $ deploy_k8s
#### GPU Operator
Assuming you have the helm tools installed, deploy the latest version of the
GPU Operator as a helm chart (minimum version: `v25.10.0`):
GPU Operator as a helm chart (minimum version: `v26.3.0`):
```bash
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
@@ -300,33 +322,27 @@ $ helm install --wait --generate-name \
nvidia/gpu-operator \
--set sandboxWorkloads.enabled=true \
--set sandboxWorkloads.defaultWorkload=vm-passthrough \
--set kataManager.enabled=true \
--set kataManager.config.runtimeClasses=null \
--set kataManager.repository=nvcr.io/nvidia/cloud-native \
--set kataManager.image=k8s-kata-manager \
--set kataManager.version=v0.2.4 \
--set ccManager.enabled=true \
--set ccManager.defaultMode=on \
--set ccManager.repository=nvcr.io/nvidia/cloud-native \
--set ccManager.image=k8s-cc-manager \
--set ccManager.version=v0.2.0 \
--set sandboxDevicePlugin.repository=nvcr.io/nvidia/cloud-native \
--set sandboxDevicePlugin.image=nvidia-sandbox-device-plugin \
--set sandboxDevicePlugin.version=v0.0.1 \
--set 'sandboxDevicePlugin.env[0].name=P_GPU_ALIAS' \
--set 'sandboxDevicePlugin.env[0].value=pgpu' \
--set sandboxWorkloads.mode=kata \
--set nfd.enabled=true \
--set nfd.nodefeaturerules=true
```
> **Note:**
>
> For heterogeneous clusters with different GPU types, you can omit
> the `P_GPU_ALIAS` environment variable lines. This will cause the sandbox
> device plugin to create GPU model-specific resource types (e.g.,
> `nvidia.com/GH100_H100L_94GB`) instead of the generic `nvidia.com/pgpu`,
> which in turn can be used by pods through respective resource limits.
> For simplicity, this guide uses the generic alias.
> For heterogeneous clusters with different GPU types, you can specify an
> empty `P_GPU_ALIAS` environment variable for the sandbox device plugin:
> `- --set 'sandboxDevicePlugin.env[0].name=P_GPU_ALIAS' \`
> `- --set 'sandboxDevicePlugin.env[0].value=""' \`
> This will cause the sandbox device plugin to create GPU model-specific
> resource types (e.g., `nvidia.com/GH100_H100L_94GB`) instead of the
> default `pgpu` type, which usually results in advertising a resource of
> type `nvidia.com/pgpu`
> The exposed device resource types can be used for pods by specifying
> respective resource limits.
> Your node's nvswitches are exposed as resources of type
> `nvidia.com/nvswitch` by default. Using the variable `NVSWITCH_ALIAS`
> allows to control the advertising behavior similar to the `P_GPU_ALIAS`
> variable.
> **Note:**
>
@@ -351,8 +367,7 @@ $ helm install kata-deploy \
--create-namespace \
-f "https://raw.githubusercontent.com/kata-containers/kata-containers/refs/tags/${VERSION}/tools/packaging/kata-deploy/helm-chart/kata-deploy/try-kata-nvidia-gpu.values.yaml" \
--set nfd.enabled=false \
--set shims.qemu-nvidia-gpu-tdx.enabled=false \
--wait --timeout 10m --atomic \
--wait --timeout 10m \
"${CHART}" --version "${VERSION}"
```
@@ -382,31 +397,22 @@ mode which requires entering a licensing agreement with NVIDIA, see the
### Cluster validation and preparation
If you did not use the `sandboxWorkloads.defaultWorkload=vm-passthrough`
parameter during GPU operator deployment, label your nodes for GPU VM
parameter during GPU Operator deployment, label your nodes for GPU VM
passthrough, for the example of using all nodes for GPU passthrough, run:
```bash
$ kubectl label nodes --all nvidia.com/gpu.workload.config=vm-passthrough --overwrite
```
Check if the `nvidia-cc-manager` pod is running if you intend to run GPU TEE
scenarios. If not, you need to manually label the node as CC capable. Current
GPU Operator node feature rules do not yet recognize all CC capable GPU PCI
IDs. Run the following command:
```bash
$ kubectl label nodes --all nvidia.com/cc.capable=true
```
After this, assure the `nvidia-cc-manager` pod is running. With the suggested
parameters for GPU Operator deployment, the `nvidia-cc-manager` will
automatically transition the GPU into CC mode.
With the suggested parameters for GPU Operator deployment, the
`nvidia-cc-manager` operand will automatically transition the GPU into CC
mode.
After deployment, you can transition your node(s) to the desired CC state,
using either the `on` or `off` value, depending on your scenario. For the
non-CC scenario, transition to the `off` state via:
using either the `on`, `ppcie`, or `off` value, depending on your scenario.
For the non-CC scenario, transition to the `off` state via:
`kubectl label nodes --all nvidia.com/cc.mode=off` and wait until all pods
are back running. When an actual change is exercised, various GPU operator
are back running. When an actual change is exercised, various GPU Operator
operands will be restarted.
Ensure all pods are running:
@@ -425,9 +431,10 @@ $ lspci -nnk -d 10de:
### Run the CUDA vectorAdd sample
Create the following file:
Create the pod manifest with:
```yaml
```bash
$ cat > cuda-vectoradd-kata.yaml.in << 'EOF'
apiVersion: v1
kind: Pod
metadata:
@@ -445,6 +452,7 @@ spec:
limits:
nvidia.com/pgpu: "1"
memory: 16Gi
EOF
```
Depending on your scenario and on the CC state, export your desired runtime
@@ -477,6 +485,17 @@ To stop the pod, run: `kubectl delete pod cuda-vectoradd-kata`.
### Next steps
#### Use multi-GPU passthrough
If you have machines supporting multi-GPU passthrough, use a pod deployment
manifest which uses 8 pgpu and 4 nvswitch resources.
On the NVIDIA Hopper architecture multi-GPU passthrough uses protected PCIe
(PPCIE) which claims exclusive use of the nvswitches for a single CVM. In
this case, transition your relevant node(s) GPU mode to `ppcie` mode.
The NVIDIA Blackwell architecture uses NVLink encryption which places the
switches outside of the Trusted Computing Base (TCB) and so does not
require a separate switch setting.
#### Transition between CC and non-CC mode
Use the previously described node labeling approach to transition between
@@ -492,7 +511,7 @@ and a basic NIM/RAG deployment. Running CI tests for the TEE GPU scenario
requires KBS to be deployed (except for the CUDA vectorAdd test). The best
place to get started running these tests locally is to look into our
[NVIDIA CI workflow manifest](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml)
and into the underling
and into the underlying
[run_kubernetes_nv_tests.sh](https://github.com/kata-containers/kata-containers/blob/main/tests/integration/kubernetes/run_kubernetes_nv_tests.sh)
script. For example, to run the CUDA vectorAdd scenario against the TEE GPU
runtime class use the following commands:
@@ -547,6 +566,22 @@ With GPU passthrough being supported by the
you can use the tool to create a Kata agent security policy. Our CI deploys
all sample pod manifests with a Kata agent security policy.
Note that, using containerd 2.1 in upstream's CI, we use the following
modification to the genpolicy default settings:
```bash
[
{
"op": "replace",
"path": "/kata_config/oci_version",
"value": "1.2.1"
}
]
```
This modification is applied via the genpolicy drop-in configuration file
`src\tools\genpolicy\drop-in-examples\20-oci-1.2.1-drop-in.json`.
When using a newer containerd version, such as containerd 2.2, the OCI
version field needs to be adjusted to "1.3.0", for instance.
#### Deploy pods using your own containers and manifests
You can author pod manifests leveraging your own containers, for instance,
@@ -564,6 +599,3 @@ following annotation in the manifest:
>
> - musl-based container images (e.g., using Alpine), or distro-less
> containers are not supported.
> - for the TEE scenario, only single-GPU passthrough per pod is supported,
> so your pod resource limit must be: `nvidia.com/pgpu: "1"` (on a system
> with multiple GPUs, you can thus pass through one GPU per pod).

91
mkdocs.yaml Normal file
View File

@@ -0,0 +1,91 @@
site_name: "Kata Containers Docs"
site_description: "Developer and user documentation for the Kata Containers project."
site_author: "Kata Containers Community"
repo_url: "https://github.com/kata-containers/kata-containers"
site_url: "https://kata-containers.github.io/kata-containers"
edit_uri: "edit/main/docs/"
repo_name: kata-containers
theme:
name: materialx
favicon: "assets/images/favicon.svg"
logo: "assets/images/favicon.svg"
topbar_style: glass
palette:
- media: "(prefers-color-scheme)"
toggle:
icon: material/brightness-auto
name: Switch to light mode
- media: "(prefers-color-scheme: light)"
scheme: default
primary: blue
accent: light blue
toggle:
icon: material/weather-sunny
name: Switch to dark mode
- media: "(prefers-color-scheme: dark)"
scheme: slate
primary: cyan
accent: cyan
toggle:
icon: material/brightness-4
name: Switch to system preference
features:
- content.action.edit
- content.action.view
- content.code.annotate
- content.code.copy
- content.code.select
- content.footnote.tooltips
- content.tabs.link
- content.tooltips
- navigation.expand
- navigation.indexes
- navigation.path
- navigation.sections
- navigation.tabs
- navigation.tracking
- navigation.top
- navigation.instant
- navigation.instant.prefetch
- navigation.instant.progress
- toc.follow
markdown_extensions:
- abbr
- admonition
- attr_list
- def_list
- footnotes
- md_in_html
- pymdownx.arithmatex:
generic: true
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- pymdownx.details
- pymdownx.highlight:
anchor_linenums: true
line_spans: __span
pygments_lang_class: true
auto_title: true
- pymdownx.keys
- pymdownx.magiclink
- pymdownx.superfences:
custom_fences:
- name: mermaid
class: mermaid
format: !!python/name:pymdownx.superfences.fence_code_format
- pymdownx.inlinehilite
- pymdownx.tabbed:
alternate_style: true
- pymdownx.tilde
- pymdownx.caret
- pymdownx.mark
- toc:
permalink: true
plugins:
- search
- awesome-nav

View File

@@ -1,3 +1,3 @@
[toolchain]
# Keep in sync with versions.yaml
channel = "1.91"
channel = "1.92"

1830
src/agent/Cargo.lock generated

File diff suppressed because it is too large Load Diff

View File

@@ -63,9 +63,9 @@ cgroups = { package = "cgroups-rs", git = "https://github.com/kata-containers/cg
# Tracing
tracing = "0.1.41"
tracing-subscriber = "0.2.18"
tracing-opentelemetry = "0.13.0"
opentelemetry = { version = "0.14.0", features = ["rt-tokio-current-thread"] }
tracing-subscriber = "0.3.20"
tracing-opentelemetry = "0.17.0"
opentelemetry = { version = "0.17.0", features = ["rt-tokio"] }
# Configuration
serde = { version = "1.0.129", features = ["derive"] }
@@ -78,7 +78,6 @@ strum_macros = "0.26.2"
tempfile = "3.19.1"
which = "4.3.0"
rstest = "0.18.0"
async-std = { version = "1.12.0", features = ["attributes"] }
# Local dependencies
kata-agent-policy = { path = "policy" }
@@ -195,7 +194,6 @@ pv_core = { git = "https://github.com/ibm-s390-linux/s390-tools", rev = "4942504
tempfile.workspace = true
which.workspace = true
rstest.workspace = true
async-std.workspace = true
test-utils.workspace = true

View File

@@ -89,7 +89,7 @@ pub fn baremount(
let destination_str = destination.to_string_lossy();
if let Ok(m) = get_linux_mount_info(destination_str.deref()) {
if m.fs_type == fs_type && !flags.contains(MsFlags::MS_REMOUNT) {
slog_info!(logger, "{source:?} is already mounted at {destination:?}");
slog::info!(logger, "{source:?} is already mounted at {destination:?}");
return Ok(());
}
}

View File

@@ -110,8 +110,10 @@ impl Namespace {
unshare(cf)?;
if ns_type == NamespaceType::Uts && hostname.is_some() {
nix::unistd::sethostname(hostname.unwrap())?;
if ns_type == NamespaceType::Uts {
if let Some(host) = hostname {
nix::unistd::sethostname(host)?;
}
}
// Bind mount the new namespace from the current thread onto the mount point to persist it.

View File

@@ -2317,8 +2317,14 @@ async fn cdh_handler_trusted_storage(oci: &mut Spec) -> Result<()> {
for specdev in devices.iter() {
if specdev.path().as_path().to_str() == Some(TRUSTED_IMAGE_STORAGE_DEVICE) {
let dev_major_minor = format!("{}:{}", specdev.major(), specdev.minor());
cdh_secure_mount("BlockDevice", &dev_major_minor, "LUKS", KATA_IMAGE_WORK_DIR)
.await?;
cdh_secure_mount(
"block-device",
&dev_major_minor,
"luks2",
KATA_IMAGE_WORK_DIR,
"-E lazy_journal_init",
)
.await?;
break;
}
}
@@ -2331,6 +2337,7 @@ pub(crate) async fn cdh_secure_mount(
device_id: &str,
encrypt_type: &str,
mount_point: &str,
mkfs_opts: &str,
) -> Result<()> {
if !confidential_data_hub::is_cdh_client_initialized() {
return Ok(());
@@ -2340,19 +2347,31 @@ pub(crate) async fn cdh_secure_mount(
info!(
sl(),
"cdh_secure_mount: device_type {}, device_id {}, encrypt_type {}, integrity {}",
"cdh_secure_mount: device_type {}, device_id {}, encrypt_type {}, integrity {}, mkfs_opts {}",
device_type,
device_id,
encrypt_type,
integrity
integrity,
mkfs_opts
);
let options = std::collections::HashMap::from([
("deviceId".to_string(), device_id.to_string()),
("encryptType".to_string(), encrypt_type.to_string()),
("sourceType".to_string(), "empty".to_string()),
("targetType".to_string(), "fileSystem".to_string()),
("filesystemType".to_string(), "ext4".to_string()),
("mkfsOpts".to_string(), mkfs_opts.to_string()),
("encryptionType".to_string(), encrypt_type.to_string()),
("dataIntegrity".to_string(), integrity),
]);
std::fs::create_dir_all(mount_point).inspect_err(|e| {
error!(
sl(),
"Failed to create mount point directory {}: {:?}", mount_point, e
);
})?;
confidential_data_hub::secure_mount(device_type, &options, vec![], mount_point).await?;
Ok(())

View File

@@ -59,7 +59,14 @@ async fn handle_block_storage(
.contains(&"encryption_key=ephemeral".to_string());
if has_ephemeral_encryption {
crate::rpc::cdh_secure_mount("BlockDevice", dev_num, "LUKS", &storage.mount_point).await?;
crate::rpc::cdh_secure_mount(
"block-device",
dev_num,
"luks2",
&storage.mount_point,
"-O ^has_journal -m 0 -i 163840 -I 128",
)
.await?;
set_ownership(logger, storage)?;
new_device(storage.mount_point.clone())
} else {

View File

@@ -5,7 +5,8 @@
use anyhow::Result;
use opentelemetry::sdk::propagation::TraceContextPropagator;
use opentelemetry::{global, sdk::trace::Config, trace::TracerProvider};
use opentelemetry::trace::TracerProvider;
use opentelemetry::{global, sdk::trace::Config};
use slog::{info, o, Logger};
use std::collections::HashMap;
use tracing_opentelemetry::OpenTelemetryLayer;
@@ -23,15 +24,12 @@ pub fn setup_tracing(name: &'static str, logger: &Logger) -> Result<()> {
let config = Config::default();
let builder = opentelemetry::sdk::trace::TracerProvider::builder()
.with_batch_exporter(exporter, opentelemetry::runtime::TokioCurrentThread)
.with_batch_exporter(exporter, opentelemetry::runtime::Tokio)
.with_config(config);
let provider = builder.build();
// We don't need a versioned tracer.
let version = None;
let tracer = provider.get_tracer(name, version);
let tracer = provider.tracer(name);
let _global_provider = global::set_tracer_provider(provider);

View File

@@ -10,7 +10,7 @@ libc.workspace = true
thiserror.workspace = true
opentelemetry = { workspace = true, features = ["serialize"] }
tokio-vsock.workspace = true
bincode = "1.3.3"
serde_json = "1.0"
byteorder = "1.4.3"
slog = { workspace = true, features = [
"dynamic-keys",

View File

@@ -58,7 +58,7 @@ pub enum Error {
#[error("connection error: {0}")]
ConnectionError(String),
#[error("serialisation error: {0}")]
SerialisationError(#[from] bincode::Error),
SerialisationError(#[from] serde_json::Error),
#[error("I/O error: {0}")]
IOError(#[from] std::io::Error),
}
@@ -81,8 +81,7 @@ async fn write_span(
let mut writer = writer.lock().await;
let encoded_payload: Vec<u8> =
bincode::serialize(&span).map_err(|e| make_io_error(e.to_string()))?;
serde_json::to_vec(span).map_err(|e| make_io_error(e.to_string()))?;
let payload_len: u64 = encoded_payload.len() as u64;
let mut payload_len_as_bytes: [u8; HEADER_SIZE_BYTES as usize] =

View File

@@ -50,6 +50,7 @@ vm-memory = { workspace = true, features = ["backend-mmap"] }
crossbeam-channel = "0.5.6"
vfio-bindings = { workspace = true, optional = true }
vfio-ioctls = { workspace = true, optional = true }
kata-sys-util = { path = "../libs/kata-sys-util" }
[dev-dependencies]
slog-async = "2.7.0"

View File

@@ -10,10 +10,10 @@ This repository contains the following submodules:
| Name | Arch| Description |
| --- | --- | --- |
| [`bootparam`](src/x86_64/bootparam.rs) | x86_64 | Magic addresses externally used to lay out x86_64 VMs |
| [fdt](src/aarch64/fdt.rs) | aarch64| Create FDT for Aarch64 systems |
| [layout](src/x86_64/layout.rs) | x86_64 | x86_64 layout constants |
| [layout](src/aarch64/layout.rs/) | aarch64 | aarch64 layout constants |
| [mptable](src/x86_64/mptable.rs) | x86_64 | MP Table configurations used for defining VM boot status |
| [`fdt`](src/aarch64/fdt.rs) | aarch64| Create FDT for Aarch64 systems |
| [`layout`](src/x86_64/layout.rs) | x86_64 | x86_64 layout constants |
| [`layout`](src/aarch64/layout.rs/) | aarch64 | aarch64 layout constants |
| [`mptable`](src/x86_64/mptable.rs) | x86_64 | MP Table configurations used for defining VM boot status |
## Acknowledgement

View File

@@ -3,7 +3,7 @@
This crate is a collection of modules that provides helpers and utilities to create a TDX Dragonball VM.
Currently this crate involves:
- tdx-ioctls
- `tdx-ioctls`
## Acknowledgement

View File

@@ -44,6 +44,7 @@ vm-memory = { workspace = true, features = ["backend-mmap"] }
sendfd = "0.4.3"
vhost-rs = { version = "0.15.0", package = "vhost", optional = true }
timerfd = "1.0"
kata-sys-util = { workspace = true}
[dev-dependencies]
vm-memory = { workspace = true, features = ["backend-mmap", "backend-atomic"] }

View File

@@ -2,6 +2,7 @@
//
// SPDX-License-Identifier: Apache-2.0 AND BSD-3-Clause
use kata_sys_util::netns::NetnsGuard;
use std::any::Any;
use std::collections::HashMap;
use std::ffi::CString;
@@ -454,6 +455,11 @@ impl<AS: GuestAddressSpace> VirtioFs<AS> {
prefetch_list_path: Option<String>,
) -> FsResult<()> {
debug!("http_server rafs");
// We need to make sure the nydus worker thread in the runD main process's network namespace
// instead of the vmm thread's netns, which wouldn't access the host network.
let _netns_guard =
NetnsGuard::new("/proc/self/ns/net").map_err(|e| FsError::BackendFs(e.to_string()))?;
let file = Path::new(&source);
let (mut rafs, rafs_cfg) = match config.as_ref() {
Some(cfg) => {

View File

@@ -24,9 +24,7 @@ message SecureMountRequest {
string mount_point = 4;
}
message SecureMountResponse {
string mount_path = 1;
}
message SecureMountResponse {}
message ImagePullRequest {
// - `image_url`: The reference of the image to pull

View File

@@ -4,7 +4,7 @@ Safe Path
A library to safely handle filesystem paths, typically for container runtimes.
There are often path related attacks, such as symlink based attacks, TOCTTOU attacks. The `safe-path` crate
There are often path related attacks, such as symlink based attacks, time-of-check to time-of-use (TOCTOU) attacks. The `safe-path` crate
provides several functions and utility structures to protect against path resolution related attacks.
## Support

View File

@@ -15,13 +15,13 @@ serde = { workspace = true, features = ["rc", "derive"] }
serde_json = { workspace = true }
tokio = { workspace = true, features = ["sync", "rt"] }
nix = "0.26.2"
thiserror = { workspace = true }
thiserror = "2.0.18"
# Cloud Hypervisor public HTTP API functions
# Note that the version specified is not necessarily the version of CH
# being used. This version is used to pin the CH config structure
# which is relatively static.
api_client = { git = "https://github.com/cloud-hypervisor/cloud-hypervisor", crate = "api_client", tag = "v27.0" }
api_client = { git = "https://github.com/cloud-hypervisor/cloud-hypervisor", tag = "v51.0" }
# Local dependencies
kata-types = { workspace = true }

View File

@@ -135,7 +135,7 @@ pub async fn cloud_hypervisor_vm_netdev_add_with_fds(
"PUT",
"vm.add-net",
Some(&serialised),
request_fds,
&request_fds,
)
.map_err(|e| anyhow!(e))?;

View File

@@ -65,8 +65,6 @@ INITRDCONFIDENTIALNAME = $(PROJECT_TAG)-initrd-confidential.img
IMAGENAME_NV = $(PROJECT_TAG)-nvidia-gpu.img
IMAGENAME_CONFIDENTIAL_NV = $(PROJECT_TAG)-nvidia-gpu-confidential.img
INITRDNAME_NV = $(PROJECT_TAG)-initrd-nvidia-gpu.img
INITRDNAME_CONFIDENTIAL_NV = $(PROJECT_TAG)-initrd-nvidia-gpu-confidential.img
TARGET = $(BIN_PREFIX)-runtime
RUNTIME_OUTPUT = $(CURDIR)/$(TARGET)
@@ -136,8 +134,6 @@ INITRDCONFIDENTIALPATH := $(PKGDATADIR)/$(INITRDCONFIDENTIALNAME)
IMAGEPATH_NV := $(PKGDATADIR)/$(IMAGENAME_NV)
IMAGEPATH_CONFIDENTIAL_NV := $(PKGDATADIR)/$(IMAGENAME_CONFIDENTIAL_NV)
INITRDPATH_NV := $(PKGDATADIR)/$(INITRDNAME_NV)
INITRDPATH_CONFIDENTIAL_NV := $(PKGDATADIR)/$(INITRDNAME_CONFIDENTIAL_NV)
ROOTFSTYPE_EXT4 := \"ext4\"
ROOTFSTYPE_XFS := \"xfs\"
@@ -483,16 +479,12 @@ ifneq (,$(QEMUCMD))
KERNELPATH_CONFIDENTIAL_NV = $(KERNELDIR)/$(KERNELNAME_CONFIDENTIAL_NV)
DEFAULTVCPUS_NV = 1
DEFAULTMEMORY_NV = 2048
DEFAULTMEMORY_NV = 8192
DEFAULTTIMEOUT_NV = 1200
DEFAULTVFIOPORT_NV = root-port
DEFAULTPCIEROOTPORT_NV = 8
# Disable the devtmpfs mount in guest. NVRC does this, and later kata-agent
# attempts this as well in a non-failing manner. Otherwise, NVRC fails when
# using an image and /dev is already mounted.
KERNELPARAMS_NV = "cgroup_no_v1=all"
KERNELPARAMS_NV += "devtmpfs.mount=0"
KERNELPARAMS_NV += "pci=realloc"
KERNELPARAMS_NV += "pci=nocrs"
KERNELPARAMS_NV += "pci=assign-busses"
@@ -660,10 +652,6 @@ USER_VARS += IMAGENAME_NV
USER_VARS += IMAGENAME_CONFIDENTIAL_NV
USER_VARS += IMAGEPATH_NV
USER_VARS += IMAGEPATH_CONFIDENTIAL_NV
USER_VARS += INITRDNAME_NV
USER_VARS += INITRDNAME_CONFIDENTIAL_NV
USER_VARS += INITRDPATH_NV
USER_VARS += INITRDPATH_CONFIDENTIAL_NV
USER_VARS += KERNELNAME_NV
USER_VARS += KERNELPATH_NV
USER_VARS += KERNELNAME_CONFIDENTIAL_NV

View File

@@ -599,7 +599,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the

View File

@@ -576,7 +576,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the

View File

@@ -578,7 +578,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the

View File

@@ -1,7 +1,7 @@
module github.com/kata-containers/kata-containers/src/runtime
// Keep in sync with version in versions.yaml
go 1.25.7
go 1.25.8
// WARNING: Do NOT use `replace` directives as those break dependabot:
// https://github.com/kata-containers/kata-containers/issues/11020
@@ -56,10 +56,10 @@ require (
go.opentelemetry.io/otel/exporters/jaeger v1.0.0
go.opentelemetry.io/otel/sdk v1.40.0
go.opentelemetry.io/otel/trace v1.40.0
golang.org/x/oauth2 v0.30.0
golang.org/x/oauth2 v0.34.0
golang.org/x/sys v0.40.0
google.golang.org/grpc v1.72.0
google.golang.org/protobuf v1.36.7
google.golang.org/grpc v1.79.3
google.golang.org/protobuf v1.36.10
k8s.io/apimachinery v0.33.0
k8s.io/cri-api v0.33.0
k8s.io/kubelet v0.33.0
@@ -135,7 +135,7 @@ require (
golang.org/x/sync v0.19.0 // indirect
golang.org/x/text v0.33.0 // indirect
google.golang.org/genproto v0.0.0-20250303144028-a0af3efb3deb // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250313205543-e70fdf4c4cb4 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect

View File

@@ -366,8 +366,8 @@ golang.org/x/net v0.0.0-20210428140749-89ef3d95e781/go.mod h1:OJAsFXCWl8Ukc7SiCT
golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=
golang.org/x/net v0.49.0/go.mod h1:/ysNB2EvaqvesRkuLAyjI1ycPZlQHM3q01F02UY/MV8=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.30.0 h1:dnDm7JmhM45NNpd8FDDeLhK6FwqbOf4MLCM9zb1BOHI=
golang.org/x/oauth2 v0.30.0/go.mod h1:B++QgG3ZKulg6sRPGD/mqlHQs5rB3Ml9erfeDY7xKlU=
golang.org/x/oauth2 v0.34.0 h1:hqK/t4AKgbqWkdkcAeI8XLmbK+4m4G5YeQRrmiotGlw=
golang.org/x/oauth2 v0.34.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20181108010431-42b317875d0f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -417,6 +417,8 @@ golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8T
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gonum.org/v1/gonum v0.16.0 h1:5+ul4Swaf3ESvrOnidPp4GZbzf0mxVQpDCYUQE7OJfk=
gonum.org/v1/gonum v0.16.0/go.mod h1:fef3am4MQ93R2HHpKnLk4/Tbh/s0+wqD5nfa6Pnwy4E=
google.golang.org/appengine v1.1.0/go.mod h1:EbEs0AVv82hx2wNQdGPgUI5lhzA/G0D9YwlJXL52JkM=
google.golang.org/appengine v1.4.0/go.mod h1:xpcJRLb0r/rnEns0DIKYYv+WjYCduHsrkT7/EB5XEv4=
google.golang.org/genproto v0.0.0-20180817151627-c66870c02cf8/go.mod h1:JiN7NxoALGmiZfu7CAH4rXhgtRTLTxftemlI0sWmxmc=
@@ -424,15 +426,15 @@ google.golang.org/genproto v0.0.0-20190819201941-24fa4b261c55/go.mod h1:DMBHOl98
google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013/go.mod h1:NbSheEEYHJ7i3ixzK3sjbqSGDJWnxyFXZblF3eUsNvo=
google.golang.org/genproto v0.0.0-20250303144028-a0af3efb3deb h1:ITgPrl429bc6+2ZraNSzMDk3I95nmQln2fuPstKwFDE=
google.golang.org/genproto v0.0.0-20250303144028-a0af3efb3deb/go.mod h1:sAo5UzpjUwgFBCzupwhcLcxHVDK7vG5IqI30YnwX2eE=
google.golang.org/genproto/googleapis/rpc v0.0.0-20250313205543-e70fdf4c4cb4 h1:iK2jbkWL86DXjEx0qiHcRE9dE4/Ahua5k6V8OWFb//c=
google.golang.org/genproto/googleapis/rpc v0.0.0-20250313205543-e70fdf4c4cb4/go.mod h1:LuRYeWDFV6WOn90g357N17oMCaxpgCnbi/44qJvDn2I=
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217 h1:gRkg/vSppuSQoDjxyiGfN4Upv/h/DQmIR10ZU8dh4Ww=
google.golang.org/genproto/googleapis/rpc v0.0.0-20251202230838-ff82c1b0f217/go.mod h1:7i2o+ce6H/6BluujYR+kqX3GKH+dChPTQU19wjRPiGk=
google.golang.org/grpc v1.19.0/go.mod h1:mqu4LbDTu4XGKhr4mRzUsmM4RtVoemTSY81AxZiDr8c=
google.golang.org/grpc v1.23.0/go.mod h1:Y5yQAOtifL1yxbo5wqy6BxZv8vAUGQwXBOALyacEbxg=
google.golang.org/grpc v1.25.1/go.mod h1:c3i+UQWmh7LiEpx4sFZnkU36qjEYZ0imhYfXVyQciAY=
google.golang.org/grpc v1.27.0/go.mod h1:qbnxyOmOxrQa7FizSgH+ReBfzJrCY1pSN7KXBS8abTk=
google.golang.org/grpc v1.33.2/go.mod h1:JMHMWHQWaTccqQQlmk3MJZS+GWXOdAesneDmEnv2fbc=
google.golang.org/grpc v1.72.0 h1:S7UkcVa60b5AAQTaO6ZKamFp1zMZSU0fGDK2WZLbBnM=
google.golang.org/grpc v1.72.0/go.mod h1:wH5Aktxcg25y1I3w7H69nHfXdOG3UiadoBtjh3izSDM=
google.golang.org/grpc v1.79.3 h1:sybAEdRIEtvcD68Gx7dmnwjZKlyfuc61Dyo9pGXXkKE=
google.golang.org/grpc v1.79.3/go.mod h1:KmT0Kjez+0dde/v2j9vzwoAScgEPx/Bw1CYChhHLrHQ=
google.golang.org/protobuf v0.0.0-20200109180630-ec00e32a8dfd/go.mod h1:DFci5gLYBciE7Vtevhsrf46CRTquxDuWsQurQQe4oz8=
google.golang.org/protobuf v0.0.0-20200221191635-4d8936d0db64/go.mod h1:kwYJMbMJ01Woi6D6+Kah6886xMZcty6N08ah7+eCXa0=
google.golang.org/protobuf v0.0.0-20200228230310-ab0ca4ff8a60/go.mod h1:cfTl7dwQJ+fmap5saPgwCLgHXTUD7jkjRqWcaiX5VyM=
@@ -444,8 +446,8 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
google.golang.org/protobuf v1.36.7 h1:IgrO7UwFQGJdRNXH/sQux4R1Dj1WAKcLElzeeRaXV2A=
google.golang.org/protobuf v1.36.7/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
google.golang.org/protobuf v1.36.10 h1:AYd7cD/uASjIL6Q9LiTjz8JLcrh/88q5UObnmY3aOOE=
google.golang.org/protobuf v1.36.10/go.mod h1:HTf+CrKn2C3g5S8VImy6tdcUvCska2kB7j23XfzDpco=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=

View File

@@ -72,7 +72,7 @@ func IsPCIeDevice(bdf string) bool {
}
// read from /sys/bus/pci/devices/xxx/property
func getPCIDeviceProperty(bdf string, property PCISysFsProperty) string {
func GetPCIDeviceProperty(bdf string, property PCISysFsProperty) string {
if len(strings.Split(bdf, ":")) == 2 {
bdf = PCIDomain + ":" + bdf
}
@@ -220,9 +220,9 @@ func GetDeviceFromVFIODev(device config.DeviceInfo) ([]*config.VFIODev, error) {
return nil, err
}
vendorID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
pciClass := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
vendorID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
pciClass := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
i, err := extractIndex(device.HostPath)
if err != nil {
@@ -276,7 +276,7 @@ func GetAllVFIODevicesFromIOMMUGroup(device config.DeviceInfo) ([]*config.VFIODe
switch vfioDeviceType {
case config.VFIOPCIDeviceNormalType, config.VFIOPCIDeviceMediatedType:
// This is vfio-pci and vfio-mdev specific
pciClass := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
pciClass := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
// We need to ignore Host or PCI Bridges that are in the same IOMMU group as the
// passed-through devices. One CANNOT pass-through a PCI bridge or Host bridge.
// Class 0x0604 is PCI bridge, 0x0600 is Host bridge
@@ -288,8 +288,8 @@ func GetAllVFIODevicesFromIOMMUGroup(device config.DeviceInfo) ([]*config.VFIODe
continue
}
// Fetch the PCI Vendor ID and Device ID
vendorID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
vendorID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
// Do not directly assign to `vfio` -- need to access field still
vfio = config.VFIODev{

View File

@@ -6,6 +6,7 @@ import (
"errors"
"fmt"
"io"
"mime"
"net/http"
"net/url"
"strings"
@@ -116,10 +117,38 @@ func retrieveDeviceAuth(ctx context.Context, c *Config, v url.Values) (*DeviceAu
return nil, fmt.Errorf("oauth2: cannot auth device: %v", err)
}
if code := r.StatusCode; code < 200 || code > 299 {
return nil, &RetrieveError{
retrieveError := &RetrieveError{
Response: r,
Body: body,
}
content, _, _ := mime.ParseMediaType(r.Header.Get("Content-Type"))
switch content {
case "application/x-www-form-urlencoded", "text/plain":
// some endpoints return a query string
vals, err := url.ParseQuery(string(body))
if err != nil {
return nil, retrieveError
}
retrieveError.ErrorCode = vals.Get("error")
retrieveError.ErrorDescription = vals.Get("error_description")
retrieveError.ErrorURI = vals.Get("error_uri")
default:
var tj struct {
// https://datatracker.ietf.org/doc/html/rfc6749#section-5.2
ErrorCode string `json:"error"`
ErrorDescription string `json:"error_description"`
ErrorURI string `json:"error_uri"`
}
if json.Unmarshal(body, &tj) != nil {
return nil, retrieveError
}
retrieveError.ErrorCode = tj.ErrorCode
retrieveError.ErrorDescription = tj.ErrorDescription
retrieveError.ErrorURI = tj.ErrorURI
}
return nil, retrieveError
}
da := &DeviceAuthResponse{}

View File

@@ -9,7 +9,6 @@
package oauth2 // import "golang.org/x/oauth2"
import (
"bytes"
"context"
"errors"
"net/http"
@@ -99,7 +98,7 @@ const (
// in the POST body as application/x-www-form-urlencoded parameters.
AuthStyleInParams AuthStyle = 1
// AuthStyleInHeader sends the client_id and client_password
// AuthStyleInHeader sends the client_id and client_secret
// using HTTP Basic Authorization. This is an optional style
// described in the OAuth2 RFC 6749 section 2.3.1.
AuthStyleInHeader AuthStyle = 2
@@ -158,7 +157,7 @@ func SetAuthURLParam(key, value string) AuthCodeOption {
// PKCE), https://www.oauth.com/oauth2-servers/pkce/ and
// https://www.ietf.org/archive/id/draft-ietf-oauth-v2-1-09.html#name-cross-site-request-forgery (describing both approaches)
func (c *Config) AuthCodeURL(state string, opts ...AuthCodeOption) string {
var buf bytes.Buffer
var buf strings.Builder
buf.WriteString(c.Endpoint.AuthURL)
v := url.Values{
"response_type": {"code"},

View File

@@ -51,7 +51,7 @@ func S256ChallengeFromVerifier(verifier string) string {
return base64.RawURLEncoding.EncodeToString(sha[:])
}
// S256ChallengeOption derives a PKCE code challenge derived from verifier with
// S256ChallengeOption derives a PKCE code challenge from the verifier with
// method S256. It should be passed to [Config.AuthCodeURL] or [Config.DeviceAuth]
// only.
func S256ChallengeOption(verifier string) AuthCodeOption {

View File

@@ -103,7 +103,7 @@ func (t *Token) WithExtra(extra any) *Token {
}
// Extra returns an extra field.
// Extra fields are key-value pairs returned by the server as a
// Extra fields are key-value pairs returned by the server as
// part of the token retrieval response.
func (t *Token) Extra(key string) any {
if raw, ok := t.raw.(map[string]any); ok {

View File

@@ -58,7 +58,7 @@ func (t *Transport) RoundTrip(req *http.Request) (*http.Response, error) {
var cancelOnce sync.Once
// CancelRequest does nothing. It used to be a legacy cancellation mechanism
// but now only it only logs on first use to warn that it's deprecated.
// but now only logs on first use to warn that it's deprecated.
//
// Deprecated: use contexts for cancellation instead.
func (t *Transport) CancelRequest(req *http.Request) {

View File

@@ -1,73 +1,159 @@
# How to contribute
We definitely welcome your patches and contributions to gRPC! Please read the gRPC
organization's [governance rules](https://github.com/grpc/grpc-community/blob/master/governance.md)
and [contribution guidelines](https://github.com/grpc/grpc-community/blob/master/CONTRIBUTING.md) before proceeding.
We welcome your patches and contributions to gRPC! Please read the gRPC
organization's [governance
rules](https://github.com/grpc/grpc-community/blob/master/governance.md) before
proceeding.
If you are new to GitHub, please start by reading [Pull Request howto](https://help.github.com/articles/about-pull-requests/)
## Legal requirements
In order to protect both you and ourselves, you will need to sign the
[Contributor License Agreement](https://identity.linuxfoundation.org/projects/cncf).
[Contributor License
Agreement](https://identity.linuxfoundation.org/projects/cncf). When you create
your first PR, a link will be added as a comment that contains the steps needed
to complete this process.
## Getting Started
A great way to start is by searching through our open issues. [Unassigned issues
labeled as "help
wanted"](https://github.com/grpc/grpc-go/issues?q=sort%3Aupdated-desc%20is%3Aissue%20is%3Aopen%20label%3A%22Status%3A%20Help%20Wanted%22%20no%3Aassignee)
are especially nice for first-time contributors, as they should be well-defined
problems that already have agreed-upon solutions.
## Code Style
We follow [Google's published Go style
guide](https://google.github.io/styleguide/go/). Note that there are three
primary documents that make up this style guide; please follow them as closely
as possible. If a reviewer recommends something that contradicts those
guidelines, there may be valid reasons to do so, but it should be rare.
## Guidelines for Pull Requests
How to get your contributions merged smoothly and quickly.
Please read the following carefully to ensure your contributions can be merged
smoothly and quickly.
### PR Contents
- Create **small PRs** that are narrowly focused on **addressing a single
concern**. We often times receive PRs that are trying to fix several things at
a time, but only one fix is considered acceptable, nothing gets merged and
both author's & review's time is wasted. Create more PRs to address different
concerns and everyone will be happy.
concern**. We often receive PRs that attempt to fix several things at the same
time, and if one part of the PR has a problem, that will hold up the entire
PR.
- If you are searching for features to work on, issues labeled [Status: Help
Wanted](https://github.com/grpc/grpc-go/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-desc+label%3A%22Status%3A+Help+Wanted%22)
is a great place to start. These issues are well-documented and usually can be
resolved with a single pull request.
- If your change does not address an **open issue** with an **agreed
resolution**, consider opening an issue and discussing it first. If you are
suggesting a behavioral or API change, consider starting with a [gRFC
proposal](https://github.com/grpc/proposal). Many new features that are not
bug fixes will require cross-language agreement.
- If you are adding a new file, make sure it has the copyright message template
at the top as a comment. You can copy over the message from an existing file
and update the year.
- If you want to fix **formatting or style**, consider whether your changes are
an obvious improvement or might be considered a personal preference. If a
style change is based on preference, it likely will not be accepted. If it
corrects widely agreed-upon anti-patterns, then please do create a PR and
explain the benefits of the change.
- The grpc package should only depend on standard Go packages and a small number
of exceptions. If your contribution introduces new dependencies which are NOT
in the [list](https://godoc.org/google.golang.org/grpc?imports), you need a
discussion with gRPC-Go authors and consultants.
- For speculative changes, consider opening an issue and discussing it first. If
you are suggesting a behavioral or API change, consider starting with a [gRFC
proposal](https://github.com/grpc/proposal).
- Provide a good **PR description** as a record of **what** change is being made
and **why** it was made. Link to a GitHub issue if it exists.
- If you want to fix formatting or style, consider whether your changes are an
obvious improvement or might be considered a personal preference. If a style
change is based on preference, it likely will not be accepted. If it corrects
widely agreed-upon anti-patterns, then please do create a PR and explain the
benefits of the change.
- Unless your PR is trivial, you should expect there will be reviewer comments
that you'll need to address before merging. We'll mark it as `Status: Requires
Reporter Clarification` if we expect you to respond to these comments in a
timely manner. If the PR remains inactive for 6 days, it will be marked as
`stale` and automatically close 7 days after that if we don't hear back from
you.
- Maintain **clean commit history** and use **meaningful commit messages**. PRs
with messy commit history are difficult to review and won't be merged. Use
`rebase -i upstream/master` to curate your commit history and/or to bring in
latest changes from master (but avoid rebasing in the middle of a code
review).
- Keep your PR up to date with upstream/master (if there are merge conflicts, we
can't really merge your change).
- For correcting **misspellings**, please be aware that we use some terms that
are sometimes flagged by spell checkers. As an example, "if an only if" is
often written as "iff". Please do not make spelling correction changes unless
you are certain they are misspellings.
- **All tests need to be passing** before your change can be merged. We
recommend you **run tests locally** before creating your PR to catch breakages
early on.
- `./scripts/vet.sh` to catch vet errors
- `go test -cpu 1,4 -timeout 7m ./...` to run the tests
- `go test -race -cpu 1,4 -timeout 7m ./...` to run tests in race mode
recommend you run tests locally before creating your PR to catch breakages
early on:
- Exceptions to the rules can be made if there's a compelling reason for doing so.
- `./scripts/vet.sh` to catch vet errors.
- `go test -cpu 1,4 -timeout 7m ./...` to run the tests.
- `go test -race -cpu 1,4 -timeout 7m ./...` to run tests in race mode.
Note that we have a multi-module repo, so `go test` commands may need to be
run from the root of each module in order to cause all tests to run.
*Alternatively*, you may find it easier to push your changes to your fork on
GitHub, which will trigger a GitHub Actions run that you can use to verify
everything is passing.
- Note that there are two GitHub actions checks that need not be green:
1. We test the freshness of the generated proto code we maintain via the
`vet-proto` check. If the source proto files are updated, but our repo is
not updated, an optional checker will fail. This will be fixed by our team
in a separate PR and will not prevent the merge of your PR.
2. We run a checker that will fail if there is any change in dependencies of
an exported package via the `dependencies` check. If new dependencies are
added that are not appropriate, we may not accept your PR (see below).
- If you are adding a **new file**, make sure it has the **copyright message**
template at the top as a comment. You can copy the message from an existing
file and update the year.
- The grpc package should only depend on standard Go packages and a small number
of exceptions. **If your contribution introduces new dependencies**, you will
need a discussion with gRPC-Go maintainers.
### PR Descriptions
- **PR titles** should start with the name of the component being addressed, or
the type of change. Examples: transport, client, server, round_robin, xds,
cleanup, deps.
- Read and follow the **guidelines for PR titles and descriptions** here:
https://google.github.io/eng-practices/review/developer/cl-descriptions.html
*particularly* the sections "First Line" and "Body is Informative".
Note: your PR description will be used as the git commit message in a
squash-and-merge if your PR is approved. We may make changes to this as
necessary.
- **Does this PR relate to an open issue?** On the first line, please use the
tag `Fixes #<issue>` to ensure the issue is closed when the PR is merged. Or
use `Updates #<issue>` if the PR is related to an open issue, but does not fix
it. Consider filing an issue if one does not already exist.
- PR descriptions *must* conclude with **release notes** as follows:
```
RELEASE NOTES:
* <component>: <summary>
```
This need not match the PR title.
The summary must:
* be something that gRPC users will understand.
* clearly explain the feature being added, the issue being fixed, or the
behavior being changed, etc. If fixing a bug, be clear about how the bug
can be triggered by an end-user.
* begin with a capital letter and use complete sentences.
* be as short as possible to describe the change being made.
If a PR is *not* end-user visible -- e.g. a cleanup, testing change, or
GitHub-related, use `RELEASE NOTES: n/a`.
### PR Process
- Please **self-review** your code changes before sending your PR. This will
prevent simple, obvious errors from causing delays.
- Maintain a **clean commit history** and use **meaningful commit messages**.
PRs with messy commit histories are difficult to review and won't be merged.
Before sending your PR, ensure your changes are based on top of the latest
`upstream/master` commits, and avoid rebasing in the middle of a code review.
You should **never use `git push -f`** unless absolutely necessary during a
review, as it can interfere with GitHub's tracking of comments.
- Unless your PR is trivial, you should **expect reviewer comments** that you
will need to address before merging. We'll label the PR as `Status: Requires
Reporter Clarification` if we expect you to respond to these comments in a
timely manner. If the PR remains inactive for 6 days, it will be marked as
`stale`, and we will automatically close it after 7 days if we don't hear back
from you. Please feel free to ping issues or bugs if you do not get a response
within a week.

View File

@@ -9,21 +9,19 @@ for general contribution guidelines.
## Maintainers (in alphabetical order)
- [aranjans](https://github.com/aranjans), Google LLC
- [arjan-bal](https://github.com/arjan-bal), Google LLC
- [arvindbr8](https://github.com/arvindbr8), Google LLC
- [atollena](https://github.com/atollena), Datadog, Inc.
- [dfawley](https://github.com/dfawley), Google LLC
- [easwars](https://github.com/easwars), Google LLC
- [erm-g](https://github.com/erm-g), Google LLC
- [gtcooke94](https://github.com/gtcooke94), Google LLC
- [purnesh42h](https://github.com/purnesh42h), Google LLC
- [zasweq](https://github.com/zasweq), Google LLC
## Emeritus Maintainers (in alphabetical order)
- [adelez](https://github.com/adelez)
- [aranjans](https://github.com/aranjans)
- [canguler](https://github.com/canguler)
- [cesarghali](https://github.com/cesarghali)
- [erm-g](https://github.com/erm-g)
- [iamqizhao](https://github.com/iamqizhao)
- [jeanbza](https://github.com/jeanbza)
- [jtattermusch](https://github.com/jtattermusch)
@@ -32,5 +30,7 @@ for general contribution guidelines.
- [matt-kwong](https://github.com/matt-kwong)
- [menghanl](https://github.com/menghanl)
- [nicolasnoble](https://github.com/nicolasnoble)
- [purnesh42h](https://github.com/purnesh42h)
- [srini100](https://github.com/srini100)
- [yongni](https://github.com/yongni)
- [zasweq](https://github.com/zasweq)

View File

@@ -32,6 +32,7 @@ import "google.golang.org/grpc"
- [Low-level technical docs](Documentation) from this repository
- [Performance benchmark][]
- [Examples](examples)
- [Contribution guidelines](CONTRIBUTING.md)
## FAQ

View File

@@ -75,8 +75,6 @@ func unregisterForTesting(name string) {
func init() {
internal.BalancerUnregister = unregisterForTesting
internal.ConnectedAddress = connectedAddress
internal.SetConnectedAddress = setConnectedAddress
}
// Get returns the resolver builder registered with the given name.
@@ -360,6 +358,10 @@ type Balancer interface {
// call SubConn.Shutdown for its existing SubConns; however, this will be
// required in a future release, so it is recommended.
Close()
// ExitIdle instructs the LB policy to reconnect to backends / exit the
// IDLE state, if appropriate and possible. Note that SubConns that enter
// the IDLE state will not reconnect until SubConn.Connect is called.
ExitIdle()
}
// ExitIdler is an optional interface for balancers to implement. If
@@ -367,8 +369,8 @@ type Balancer interface {
// the ClientConn is idle. If unimplemented, ClientConn.Connect will cause
// all SubConns to connect.
//
// Notice: it will be required for all balancers to implement this in a future
// release.
// Deprecated: All balancers must implement this interface. This interface will
// be removed in a future release.
type ExitIdler interface {
// ExitIdle instructs the LB policy to reconnect to backends / exit the
// IDLE state, if appropriate and possible. Note that SubConns that enter

View File

@@ -37,6 +37,8 @@ import (
"google.golang.org/grpc/resolver"
)
var randIntN = rand.IntN
// ChildState is the balancer state of a child along with the endpoint which
// identifies the child balancer.
type ChildState struct {
@@ -45,7 +47,15 @@ type ChildState struct {
// Balancer exposes only the ExitIdler interface of the child LB policy.
// Other methods of the child policy are called only by endpointsharding.
Balancer balancer.ExitIdler
Balancer ExitIdler
}
// ExitIdler provides access to only the ExitIdle method of the child balancer.
type ExitIdler interface {
// ExitIdle instructs the LB policy to reconnect to backends / exit the
// IDLE state, if appropriate and possible. Note that SubConns that enter
// the IDLE state will not reconnect until SubConn.Connect is called.
ExitIdle()
}
// Options are the options to configure the behaviour of the
@@ -104,6 +114,21 @@ type endpointSharding struct {
mu sync.Mutex
}
// rotateEndpoints returns a slice of all the input endpoints rotated a random
// amount.
func rotateEndpoints(es []resolver.Endpoint) []resolver.Endpoint {
les := len(es)
if les == 0 {
return es
}
r := randIntN(les)
// Make a copy to avoid mutating data beyond the end of es.
ret := make([]resolver.Endpoint, les)
copy(ret, es[r:])
copy(ret[les-r:], es[:r])
return ret
}
// UpdateClientConnState creates a child for new endpoints and deletes children
// for endpoints that are no longer present. It also updates all the children,
// and sends a single synchronous update of the childrens' aggregated state at
@@ -125,7 +150,7 @@ func (es *endpointSharding) UpdateClientConnState(state balancer.ClientConnState
newChildren := resolver.NewEndpointMap[*balancerWrapper]()
// Update/Create new children.
for _, endpoint := range state.ResolverState.Endpoints {
for _, endpoint := range rotateEndpoints(state.ResolverState.Endpoints) {
if _, ok := newChildren.Get(endpoint); ok {
// Endpoint child was already created, continue to avoid duplicate
// update.
@@ -205,6 +230,16 @@ func (es *endpointSharding) Close() {
}
}
func (es *endpointSharding) ExitIdle() {
es.childMu.Lock()
defer es.childMu.Unlock()
for _, bw := range es.children.Load().Values() {
if !bw.isClosed {
bw.child.ExitIdle()
}
}
}
// updateState updates this component's state. It sends the aggregated state,
// and a picker with round robin behavior with all the child states present if
// needed.
@@ -261,7 +296,7 @@ func (es *endpointSharding) updateState() {
p := &pickerWithChildStates{
pickers: pickers,
childStates: childStates,
next: uint32(rand.IntN(len(pickers))),
next: uint32(randIntN(len(pickers))),
}
es.cc.UpdateState(balancer.State{
ConnectivityState: aggState,
@@ -326,15 +361,13 @@ func (bw *balancerWrapper) UpdateState(state balancer.State) {
// ExitIdle pings an IDLE child balancer to exit idle in a new goroutine to
// avoid deadlocks due to synchronous balancer state updates.
func (bw *balancerWrapper) ExitIdle() {
if ei, ok := bw.child.(balancer.ExitIdler); ok {
go func() {
bw.es.childMu.Lock()
if !bw.isClosed {
ei.ExitIdle()
}
bw.es.childMu.Unlock()
}()
}
go func() {
bw.es.childMu.Lock()
if !bw.isClosed {
bw.child.ExitIdle()
}
bw.es.childMu.Unlock()
}()
}
// updateClientConnStateLocked delivers the ClientConnState to the child

View File

@@ -26,6 +26,8 @@ import (
var (
// RandShuffle pseudo-randomizes the order of addresses.
RandShuffle = rand.Shuffle
// RandFloat64 returns, as a float64, a pseudo-random number in [0.0,1.0).
RandFloat64 = rand.Float64
// TimeAfterFunc allows mocking the timer for testing connection delay
// related functionality.
TimeAfterFunc = func(d time.Duration, f func()) func() {

File diff suppressed because it is too large Load Diff

View File

@@ -1,927 +0,0 @@
/*
*
* Copyright 2024 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/
// Package pickfirstleaf contains the pick_first load balancing policy which
// will be the universal leaf policy after dualstack changes are implemented.
//
// # Experimental
//
// Notice: This package is EXPERIMENTAL and may be changed or removed in a
// later release.
package pickfirstleaf
import (
"encoding/json"
"errors"
"fmt"
"net"
"net/netip"
"sync"
"time"
"google.golang.org/grpc/balancer"
"google.golang.org/grpc/balancer/pickfirst/internal"
"google.golang.org/grpc/connectivity"
expstats "google.golang.org/grpc/experimental/stats"
"google.golang.org/grpc/grpclog"
"google.golang.org/grpc/internal/envconfig"
internalgrpclog "google.golang.org/grpc/internal/grpclog"
"google.golang.org/grpc/internal/pretty"
"google.golang.org/grpc/resolver"
"google.golang.org/grpc/serviceconfig"
)
func init() {
if envconfig.NewPickFirstEnabled {
// Register as the default pick_first balancer.
Name = "pick_first"
}
balancer.Register(pickfirstBuilder{})
}
type (
// enableHealthListenerKeyType is a unique key type used in resolver
// attributes to indicate whether the health listener usage is enabled.
enableHealthListenerKeyType struct{}
// managedByPickfirstKeyType is an attribute key type to inform Outlier
// Detection that the generic health listener is being used.
// TODO: https://github.com/grpc/grpc-go/issues/7915 - Remove this when
// implementing the dualstack design. This is a hack. Once Dualstack is
// completed, outlier detection will stop sending ejection updates through
// the connectivity listener.
managedByPickfirstKeyType struct{}
)
var (
logger = grpclog.Component("pick-first-leaf-lb")
// Name is the name of the pick_first_leaf balancer.
// It is changed to "pick_first" in init() if this balancer is to be
// registered as the default pickfirst.
Name = "pick_first_leaf"
disconnectionsMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.lb.pick_first.disconnections",
Description: "EXPERIMENTAL. Number of times the selected subchannel becomes disconnected.",
Unit: "disconnection",
Labels: []string{"grpc.target"},
Default: false,
})
connectionAttemptsSucceededMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.lb.pick_first.connection_attempts_succeeded",
Description: "EXPERIMENTAL. Number of successful connection attempts.",
Unit: "attempt",
Labels: []string{"grpc.target"},
Default: false,
})
connectionAttemptsFailedMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.lb.pick_first.connection_attempts_failed",
Description: "EXPERIMENTAL. Number of failed connection attempts.",
Unit: "attempt",
Labels: []string{"grpc.target"},
Default: false,
})
)
const (
// TODO: change to pick-first when this becomes the default pick_first policy.
logPrefix = "[pick-first-leaf-lb %p] "
// connectionDelayInterval is the time to wait for during the happy eyeballs
// pass before starting the next connection attempt.
connectionDelayInterval = 250 * time.Millisecond
)
type ipAddrFamily int
const (
// ipAddrFamilyUnknown represents strings that can't be parsed as an IP
// address.
ipAddrFamilyUnknown ipAddrFamily = iota
ipAddrFamilyV4
ipAddrFamilyV6
)
type pickfirstBuilder struct{}
func (pickfirstBuilder) Build(cc balancer.ClientConn, bo balancer.BuildOptions) balancer.Balancer {
b := &pickfirstBalancer{
cc: cc,
target: bo.Target.String(),
metricsRecorder: cc.MetricsRecorder(),
subConns: resolver.NewAddressMapV2[*scData](),
state: connectivity.Connecting,
cancelConnectionTimer: func() {},
}
b.logger = internalgrpclog.NewPrefixLogger(logger, fmt.Sprintf(logPrefix, b))
return b
}
func (b pickfirstBuilder) Name() string {
return Name
}
func (pickfirstBuilder) ParseConfig(js json.RawMessage) (serviceconfig.LoadBalancingConfig, error) {
var cfg pfConfig
if err := json.Unmarshal(js, &cfg); err != nil {
return nil, fmt.Errorf("pickfirst: unable to unmarshal LB policy config: %s, error: %v", string(js), err)
}
return cfg, nil
}
// EnableHealthListener updates the state to configure pickfirst for using a
// generic health listener.
func EnableHealthListener(state resolver.State) resolver.State {
state.Attributes = state.Attributes.WithValue(enableHealthListenerKeyType{}, true)
return state
}
// IsManagedByPickfirst returns whether an address belongs to a SubConn
// managed by the pickfirst LB policy.
// TODO: https://github.com/grpc/grpc-go/issues/7915 - This is a hack to disable
// outlier_detection via the with connectivity listener when using pick_first.
// Once Dualstack changes are complete, all SubConns will be created by
// pick_first and outlier detection will only use the health listener for
// ejection. This hack can then be removed.
func IsManagedByPickfirst(addr resolver.Address) bool {
return addr.BalancerAttributes.Value(managedByPickfirstKeyType{}) != nil
}
type pfConfig struct {
serviceconfig.LoadBalancingConfig `json:"-"`
// If set to true, instructs the LB policy to shuffle the order of the list
// of endpoints received from the name resolver before attempting to
// connect to them.
ShuffleAddressList bool `json:"shuffleAddressList"`
}
// scData keeps track of the current state of the subConn.
// It is not safe for concurrent access.
type scData struct {
// The following fields are initialized at build time and read-only after
// that.
subConn balancer.SubConn
addr resolver.Address
rawConnectivityState connectivity.State
// The effective connectivity state based on raw connectivity, health state
// and after following sticky TransientFailure behaviour defined in A62.
effectiveState connectivity.State
lastErr error
connectionFailedInFirstPass bool
}
func (b *pickfirstBalancer) newSCData(addr resolver.Address) (*scData, error) {
addr.BalancerAttributes = addr.BalancerAttributes.WithValue(managedByPickfirstKeyType{}, true)
sd := &scData{
rawConnectivityState: connectivity.Idle,
effectiveState: connectivity.Idle,
addr: addr,
}
sc, err := b.cc.NewSubConn([]resolver.Address{addr}, balancer.NewSubConnOptions{
StateListener: func(state balancer.SubConnState) {
b.updateSubConnState(sd, state)
},
})
if err != nil {
return nil, err
}
sd.subConn = sc
return sd, nil
}
type pickfirstBalancer struct {
// The following fields are initialized at build time and read-only after
// that and therefore do not need to be guarded by a mutex.
logger *internalgrpclog.PrefixLogger
cc balancer.ClientConn
target string
metricsRecorder expstats.MetricsRecorder // guaranteed to be non nil
// The mutex is used to ensure synchronization of updates triggered
// from the idle picker and the already serialized resolver,
// SubConn state updates.
mu sync.Mutex
// State reported to the channel based on SubConn states and resolver
// updates.
state connectivity.State
// scData for active subonns mapped by address.
subConns *resolver.AddressMapV2[*scData]
addressList addressList
firstPass bool
numTF int
cancelConnectionTimer func()
healthCheckingEnabled bool
}
// ResolverError is called by the ClientConn when the name resolver produces
// an error or when pickfirst determined the resolver update to be invalid.
func (b *pickfirstBalancer) ResolverError(err error) {
b.mu.Lock()
defer b.mu.Unlock()
b.resolverErrorLocked(err)
}
func (b *pickfirstBalancer) resolverErrorLocked(err error) {
if b.logger.V(2) {
b.logger.Infof("Received error from the name resolver: %v", err)
}
// The picker will not change since the balancer does not currently
// report an error. If the balancer hasn't received a single good resolver
// update yet, transition to TRANSIENT_FAILURE.
if b.state != connectivity.TransientFailure && b.addressList.size() > 0 {
if b.logger.V(2) {
b.logger.Infof("Ignoring resolver error because balancer is using a previous good update.")
}
return
}
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.TransientFailure,
Picker: &picker{err: fmt.Errorf("name resolver error: %v", err)},
})
}
func (b *pickfirstBalancer) UpdateClientConnState(state balancer.ClientConnState) error {
b.mu.Lock()
defer b.mu.Unlock()
b.cancelConnectionTimer()
if len(state.ResolverState.Addresses) == 0 && len(state.ResolverState.Endpoints) == 0 {
// Cleanup state pertaining to the previous resolver state.
// Treat an empty address list like an error by calling b.ResolverError.
b.closeSubConnsLocked()
b.addressList.updateAddrs(nil)
b.resolverErrorLocked(errors.New("produced zero addresses"))
return balancer.ErrBadResolverState
}
b.healthCheckingEnabled = state.ResolverState.Attributes.Value(enableHealthListenerKeyType{}) != nil
cfg, ok := state.BalancerConfig.(pfConfig)
if state.BalancerConfig != nil && !ok {
return fmt.Errorf("pickfirst: received illegal BalancerConfig (type %T): %v: %w", state.BalancerConfig, state.BalancerConfig, balancer.ErrBadResolverState)
}
if b.logger.V(2) {
b.logger.Infof("Received new config %s, resolver state %s", pretty.ToJSON(cfg), pretty.ToJSON(state.ResolverState))
}
var newAddrs []resolver.Address
if endpoints := state.ResolverState.Endpoints; len(endpoints) != 0 {
// Perform the optional shuffling described in gRFC A62. The shuffling
// will change the order of endpoints but not touch the order of the
// addresses within each endpoint. - A61
if cfg.ShuffleAddressList {
endpoints = append([]resolver.Endpoint{}, endpoints...)
internal.RandShuffle(len(endpoints), func(i, j int) { endpoints[i], endpoints[j] = endpoints[j], endpoints[i] })
}
// "Flatten the list by concatenating the ordered list of addresses for
// each of the endpoints, in order." - A61
for _, endpoint := range endpoints {
newAddrs = append(newAddrs, endpoint.Addresses...)
}
} else {
// Endpoints not set, process addresses until we migrate resolver
// emissions fully to Endpoints. The top channel does wrap emitted
// addresses with endpoints, however some balancers such as weighted
// target do not forward the corresponding correct endpoints down/split
// endpoints properly. Once all balancers correctly forward endpoints
// down, can delete this else conditional.
newAddrs = state.ResolverState.Addresses
if cfg.ShuffleAddressList {
newAddrs = append([]resolver.Address{}, newAddrs...)
internal.RandShuffle(len(endpoints), func(i, j int) { endpoints[i], endpoints[j] = endpoints[j], endpoints[i] })
}
}
// If an address appears in multiple endpoints or in the same endpoint
// multiple times, we keep it only once. We will create only one SubConn
// for the address because an AddressMap is used to store SubConns.
// Not de-duplicating would result in attempting to connect to the same
// SubConn multiple times in the same pass. We don't want this.
newAddrs = deDupAddresses(newAddrs)
newAddrs = interleaveAddresses(newAddrs)
prevAddr := b.addressList.currentAddress()
prevSCData, found := b.subConns.Get(prevAddr)
prevAddrsCount := b.addressList.size()
isPrevRawConnectivityStateReady := found && prevSCData.rawConnectivityState == connectivity.Ready
b.addressList.updateAddrs(newAddrs)
// If the previous ready SubConn exists in new address list,
// keep this connection and don't create new SubConns.
if isPrevRawConnectivityStateReady && b.addressList.seekTo(prevAddr) {
return nil
}
b.reconcileSubConnsLocked(newAddrs)
// If it's the first resolver update or the balancer was already READY
// (but the new address list does not contain the ready SubConn) or
// CONNECTING, enter CONNECTING.
// We may be in TRANSIENT_FAILURE due to a previous empty address list,
// we should still enter CONNECTING because the sticky TF behaviour
// mentioned in A62 applies only when the TRANSIENT_FAILURE is reported
// due to connectivity failures.
if isPrevRawConnectivityStateReady || b.state == connectivity.Connecting || prevAddrsCount == 0 {
// Start connection attempt at first address.
b.forceUpdateConcludedStateLocked(balancer.State{
ConnectivityState: connectivity.Connecting,
Picker: &picker{err: balancer.ErrNoSubConnAvailable},
})
b.startFirstPassLocked()
} else if b.state == connectivity.TransientFailure {
// If we're in TRANSIENT_FAILURE, we stay in TRANSIENT_FAILURE until
// we're READY. See A62.
b.startFirstPassLocked()
}
return nil
}
// UpdateSubConnState is unused as a StateListener is always registered when
// creating SubConns.
func (b *pickfirstBalancer) UpdateSubConnState(subConn balancer.SubConn, state balancer.SubConnState) {
b.logger.Errorf("UpdateSubConnState(%v, %+v) called unexpectedly", subConn, state)
}
func (b *pickfirstBalancer) Close() {
b.mu.Lock()
defer b.mu.Unlock()
b.closeSubConnsLocked()
b.cancelConnectionTimer()
b.state = connectivity.Shutdown
}
// ExitIdle moves the balancer out of idle state. It can be called concurrently
// by the idlePicker and clientConn so access to variables should be
// synchronized.
func (b *pickfirstBalancer) ExitIdle() {
b.mu.Lock()
defer b.mu.Unlock()
if b.state == connectivity.Idle {
b.startFirstPassLocked()
}
}
func (b *pickfirstBalancer) startFirstPassLocked() {
b.firstPass = true
b.numTF = 0
// Reset the connection attempt record for existing SubConns.
for _, sd := range b.subConns.Values() {
sd.connectionFailedInFirstPass = false
}
b.requestConnectionLocked()
}
func (b *pickfirstBalancer) closeSubConnsLocked() {
for _, sd := range b.subConns.Values() {
sd.subConn.Shutdown()
}
b.subConns = resolver.NewAddressMapV2[*scData]()
}
// deDupAddresses ensures that each address appears only once in the slice.
func deDupAddresses(addrs []resolver.Address) []resolver.Address {
seenAddrs := resolver.NewAddressMapV2[*scData]()
retAddrs := []resolver.Address{}
for _, addr := range addrs {
if _, ok := seenAddrs.Get(addr); ok {
continue
}
retAddrs = append(retAddrs, addr)
}
return retAddrs
}
// interleaveAddresses interleaves addresses of both families (IPv4 and IPv6)
// as per RFC-8305 section 4.
// Whichever address family is first in the list is followed by an address of
// the other address family; that is, if the first address in the list is IPv6,
// then the first IPv4 address should be moved up in the list to be second in
// the list. It doesn't support configuring "First Address Family Count", i.e.
// there will always be a single member of the first address family at the
// beginning of the interleaved list.
// Addresses that are neither IPv4 nor IPv6 are treated as part of a third
// "unknown" family for interleaving.
// See: https://datatracker.ietf.org/doc/html/rfc8305#autoid-6
func interleaveAddresses(addrs []resolver.Address) []resolver.Address {
familyAddrsMap := map[ipAddrFamily][]resolver.Address{}
interleavingOrder := []ipAddrFamily{}
for _, addr := range addrs {
family := addressFamily(addr.Addr)
if _, found := familyAddrsMap[family]; !found {
interleavingOrder = append(interleavingOrder, family)
}
familyAddrsMap[family] = append(familyAddrsMap[family], addr)
}
interleavedAddrs := make([]resolver.Address, 0, len(addrs))
for curFamilyIdx := 0; len(interleavedAddrs) < len(addrs); curFamilyIdx = (curFamilyIdx + 1) % len(interleavingOrder) {
// Some IP types may have fewer addresses than others, so we look for
// the next type that has a remaining member to add to the interleaved
// list.
family := interleavingOrder[curFamilyIdx]
remainingMembers := familyAddrsMap[family]
if len(remainingMembers) > 0 {
interleavedAddrs = append(interleavedAddrs, remainingMembers[0])
familyAddrsMap[family] = remainingMembers[1:]
}
}
return interleavedAddrs
}
// addressFamily returns the ipAddrFamily after parsing the address string.
// If the address isn't of the format "ip-address:port", it returns
// ipAddrFamilyUnknown. The address may be valid even if it's not an IP when
// using a resolver like passthrough where the address may be a hostname in
// some format that the dialer can resolve.
func addressFamily(address string) ipAddrFamily {
// Parse the IP after removing the port.
host, _, err := net.SplitHostPort(address)
if err != nil {
return ipAddrFamilyUnknown
}
ip, err := netip.ParseAddr(host)
if err != nil {
return ipAddrFamilyUnknown
}
switch {
case ip.Is4() || ip.Is4In6():
return ipAddrFamilyV4
case ip.Is6():
return ipAddrFamilyV6
default:
return ipAddrFamilyUnknown
}
}
// reconcileSubConnsLocked updates the active subchannels based on a new address
// list from the resolver. It does this by:
// - closing subchannels: any existing subchannels associated with addresses
// that are no longer in the updated list are shut down.
// - removing subchannels: entries for these closed subchannels are removed
// from the subchannel map.
//
// This ensures that the subchannel map accurately reflects the current set of
// addresses received from the name resolver.
func (b *pickfirstBalancer) reconcileSubConnsLocked(newAddrs []resolver.Address) {
newAddrsMap := resolver.NewAddressMapV2[bool]()
for _, addr := range newAddrs {
newAddrsMap.Set(addr, true)
}
for _, oldAddr := range b.subConns.Keys() {
if _, ok := newAddrsMap.Get(oldAddr); ok {
continue
}
val, _ := b.subConns.Get(oldAddr)
val.subConn.Shutdown()
b.subConns.Delete(oldAddr)
}
}
// shutdownRemainingLocked shuts down remaining subConns. Called when a subConn
// becomes ready, which means that all other subConn must be shutdown.
func (b *pickfirstBalancer) shutdownRemainingLocked(selected *scData) {
b.cancelConnectionTimer()
for _, sd := range b.subConns.Values() {
if sd.subConn != selected.subConn {
sd.subConn.Shutdown()
}
}
b.subConns = resolver.NewAddressMapV2[*scData]()
b.subConns.Set(selected.addr, selected)
}
// requestConnectionLocked starts connecting on the subchannel corresponding to
// the current address. If no subchannel exists, one is created. If the current
// subchannel is in TransientFailure, a connection to the next address is
// attempted until a subchannel is found.
func (b *pickfirstBalancer) requestConnectionLocked() {
if !b.addressList.isValid() {
return
}
var lastErr error
for valid := true; valid; valid = b.addressList.increment() {
curAddr := b.addressList.currentAddress()
sd, ok := b.subConns.Get(curAddr)
if !ok {
var err error
// We want to assign the new scData to sd from the outer scope,
// hence we can't use := below.
sd, err = b.newSCData(curAddr)
if err != nil {
// This should never happen, unless the clientConn is being shut
// down.
if b.logger.V(2) {
b.logger.Infof("Failed to create a subConn for address %v: %v", curAddr.String(), err)
}
// Do nothing, the LB policy will be closed soon.
return
}
b.subConns.Set(curAddr, sd)
}
switch sd.rawConnectivityState {
case connectivity.Idle:
sd.subConn.Connect()
b.scheduleNextConnectionLocked()
return
case connectivity.TransientFailure:
// The SubConn is being re-used and failed during a previous pass
// over the addressList. It has not completed backoff yet.
// Mark it as having failed and try the next address.
sd.connectionFailedInFirstPass = true
lastErr = sd.lastErr
continue
case connectivity.Connecting:
// Wait for the connection attempt to complete or the timer to fire
// before attempting the next address.
b.scheduleNextConnectionLocked()
return
default:
b.logger.Errorf("SubConn with unexpected state %v present in SubConns map.", sd.rawConnectivityState)
return
}
}
// All the remaining addresses in the list are in TRANSIENT_FAILURE, end the
// first pass if possible.
b.endFirstPassIfPossibleLocked(lastErr)
}
func (b *pickfirstBalancer) scheduleNextConnectionLocked() {
b.cancelConnectionTimer()
if !b.addressList.hasNext() {
return
}
curAddr := b.addressList.currentAddress()
cancelled := false // Access to this is protected by the balancer's mutex.
closeFn := internal.TimeAfterFunc(connectionDelayInterval, func() {
b.mu.Lock()
defer b.mu.Unlock()
// If the scheduled task is cancelled while acquiring the mutex, return.
if cancelled {
return
}
if b.logger.V(2) {
b.logger.Infof("Happy Eyeballs timer expired while waiting for connection to %q.", curAddr.Addr)
}
if b.addressList.increment() {
b.requestConnectionLocked()
}
})
// Access to the cancellation callback held by the balancer is guarded by
// the balancer's mutex, so it's safe to set the boolean from the callback.
b.cancelConnectionTimer = sync.OnceFunc(func() {
cancelled = true
closeFn()
})
}
func (b *pickfirstBalancer) updateSubConnState(sd *scData, newState balancer.SubConnState) {
b.mu.Lock()
defer b.mu.Unlock()
oldState := sd.rawConnectivityState
sd.rawConnectivityState = newState.ConnectivityState
// Previously relevant SubConns can still callback with state updates.
// To prevent pickers from returning these obsolete SubConns, this logic
// is included to check if the current list of active SubConns includes this
// SubConn.
if !b.isActiveSCData(sd) {
return
}
if newState.ConnectivityState == connectivity.Shutdown {
sd.effectiveState = connectivity.Shutdown
return
}
// Record a connection attempt when exiting CONNECTING.
if newState.ConnectivityState == connectivity.TransientFailure {
sd.connectionFailedInFirstPass = true
connectionAttemptsFailedMetric.Record(b.metricsRecorder, 1, b.target)
}
if newState.ConnectivityState == connectivity.Ready {
connectionAttemptsSucceededMetric.Record(b.metricsRecorder, 1, b.target)
b.shutdownRemainingLocked(sd)
if !b.addressList.seekTo(sd.addr) {
// This should not fail as we should have only one SubConn after
// entering READY. The SubConn should be present in the addressList.
b.logger.Errorf("Address %q not found address list in %v", sd.addr, b.addressList.addresses)
return
}
if !b.healthCheckingEnabled {
if b.logger.V(2) {
b.logger.Infof("SubConn %p reported connectivity state READY and the health listener is disabled. Transitioning SubConn to READY.", sd.subConn)
}
sd.effectiveState = connectivity.Ready
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Ready,
Picker: &picker{result: balancer.PickResult{SubConn: sd.subConn}},
})
return
}
if b.logger.V(2) {
b.logger.Infof("SubConn %p reported connectivity state READY. Registering health listener.", sd.subConn)
}
// Send a CONNECTING update to take the SubConn out of sticky-TF if
// required.
sd.effectiveState = connectivity.Connecting
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Connecting,
Picker: &picker{err: balancer.ErrNoSubConnAvailable},
})
sd.subConn.RegisterHealthListener(func(scs balancer.SubConnState) {
b.updateSubConnHealthState(sd, scs)
})
return
}
// If the LB policy is READY, and it receives a subchannel state change,
// it means that the READY subchannel has failed.
// A SubConn can also transition from CONNECTING directly to IDLE when
// a transport is successfully created, but the connection fails
// before the SubConn can send the notification for READY. We treat
// this as a successful connection and transition to IDLE.
// TODO: https://github.com/grpc/grpc-go/issues/7862 - Remove the second
// part of the if condition below once the issue is fixed.
if oldState == connectivity.Ready || (oldState == connectivity.Connecting && newState.ConnectivityState == connectivity.Idle) {
// Once a transport fails, the balancer enters IDLE and starts from
// the first address when the picker is used.
b.shutdownRemainingLocked(sd)
sd.effectiveState = newState.ConnectivityState
// READY SubConn interspliced in between CONNECTING and IDLE, need to
// account for that.
if oldState == connectivity.Connecting {
// A known issue (https://github.com/grpc/grpc-go/issues/7862)
// causes a race that prevents the READY state change notification.
// This works around it.
connectionAttemptsSucceededMetric.Record(b.metricsRecorder, 1, b.target)
}
disconnectionsMetric.Record(b.metricsRecorder, 1, b.target)
b.addressList.reset()
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Idle,
Picker: &idlePicker{exitIdle: sync.OnceFunc(b.ExitIdle)},
})
return
}
if b.firstPass {
switch newState.ConnectivityState {
case connectivity.Connecting:
// The effective state can be in either IDLE, CONNECTING or
// TRANSIENT_FAILURE. If it's TRANSIENT_FAILURE, stay in
// TRANSIENT_FAILURE until it's READY. See A62.
if sd.effectiveState != connectivity.TransientFailure {
sd.effectiveState = connectivity.Connecting
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Connecting,
Picker: &picker{err: balancer.ErrNoSubConnAvailable},
})
}
case connectivity.TransientFailure:
sd.lastErr = newState.ConnectionError
sd.effectiveState = connectivity.TransientFailure
// Since we're re-using common SubConns while handling resolver
// updates, we could receive an out of turn TRANSIENT_FAILURE from
// a pass over the previous address list. Happy Eyeballs will also
// cause out of order updates to arrive.
if curAddr := b.addressList.currentAddress(); equalAddressIgnoringBalAttributes(&curAddr, &sd.addr) {
b.cancelConnectionTimer()
if b.addressList.increment() {
b.requestConnectionLocked()
return
}
}
// End the first pass if we've seen a TRANSIENT_FAILURE from all
// SubConns once.
b.endFirstPassIfPossibleLocked(newState.ConnectionError)
}
return
}
// We have finished the first pass, keep re-connecting failing SubConns.
switch newState.ConnectivityState {
case connectivity.TransientFailure:
b.numTF = (b.numTF + 1) % b.subConns.Len()
sd.lastErr = newState.ConnectionError
if b.numTF%b.subConns.Len() == 0 {
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.TransientFailure,
Picker: &picker{err: newState.ConnectionError},
})
}
// We don't need to request re-resolution since the SubConn already
// does that before reporting TRANSIENT_FAILURE.
// TODO: #7534 - Move re-resolution requests from SubConn into
// pick_first.
case connectivity.Idle:
sd.subConn.Connect()
}
}
// endFirstPassIfPossibleLocked ends the first happy-eyeballs pass if all the
// addresses are tried and their SubConns have reported a failure.
func (b *pickfirstBalancer) endFirstPassIfPossibleLocked(lastErr error) {
// An optimization to avoid iterating over the entire SubConn map.
if b.addressList.isValid() {
return
}
// Connect() has been called on all the SubConns. The first pass can be
// ended if all the SubConns have reported a failure.
for _, sd := range b.subConns.Values() {
if !sd.connectionFailedInFirstPass {
return
}
}
b.firstPass = false
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.TransientFailure,
Picker: &picker{err: lastErr},
})
// Start re-connecting all the SubConns that are already in IDLE.
for _, sd := range b.subConns.Values() {
if sd.rawConnectivityState == connectivity.Idle {
sd.subConn.Connect()
}
}
}
func (b *pickfirstBalancer) isActiveSCData(sd *scData) bool {
activeSD, found := b.subConns.Get(sd.addr)
return found && activeSD == sd
}
func (b *pickfirstBalancer) updateSubConnHealthState(sd *scData, state balancer.SubConnState) {
b.mu.Lock()
defer b.mu.Unlock()
// Previously relevant SubConns can still callback with state updates.
// To prevent pickers from returning these obsolete SubConns, this logic
// is included to check if the current list of active SubConns includes
// this SubConn.
if !b.isActiveSCData(sd) {
return
}
sd.effectiveState = state.ConnectivityState
switch state.ConnectivityState {
case connectivity.Ready:
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Ready,
Picker: &picker{result: balancer.PickResult{SubConn: sd.subConn}},
})
case connectivity.TransientFailure:
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.TransientFailure,
Picker: &picker{err: fmt.Errorf("pickfirst: health check failure: %v", state.ConnectionError)},
})
case connectivity.Connecting:
b.updateBalancerState(balancer.State{
ConnectivityState: connectivity.Connecting,
Picker: &picker{err: balancer.ErrNoSubConnAvailable},
})
default:
b.logger.Errorf("Got unexpected health update for SubConn %p: %v", state)
}
}
// updateBalancerState stores the state reported to the channel and calls
// ClientConn.UpdateState(). As an optimization, it avoids sending duplicate
// updates to the channel.
func (b *pickfirstBalancer) updateBalancerState(newState balancer.State) {
// In case of TransientFailures allow the picker to be updated to update
// the connectivity error, in all other cases don't send duplicate state
// updates.
if newState.ConnectivityState == b.state && b.state != connectivity.TransientFailure {
return
}
b.forceUpdateConcludedStateLocked(newState)
}
// forceUpdateConcludedStateLocked stores the state reported to the channel and
// calls ClientConn.UpdateState().
// A separate function is defined to force update the ClientConn state since the
// channel doesn't correctly assume that LB policies start in CONNECTING and
// relies on LB policy to send an initial CONNECTING update.
func (b *pickfirstBalancer) forceUpdateConcludedStateLocked(newState balancer.State) {
b.state = newState.ConnectivityState
b.cc.UpdateState(newState)
}
type picker struct {
result balancer.PickResult
err error
}
func (p *picker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
return p.result, p.err
}
// idlePicker is used when the SubConn is IDLE and kicks the SubConn into
// CONNECTING when Pick is called.
type idlePicker struct {
exitIdle func()
}
func (i *idlePicker) Pick(balancer.PickInfo) (balancer.PickResult, error) {
i.exitIdle()
return balancer.PickResult{}, balancer.ErrNoSubConnAvailable
}
// addressList manages sequentially iterating over addresses present in a list
// of endpoints. It provides a 1 dimensional view of the addresses present in
// the endpoints.
// This type is not safe for concurrent access.
type addressList struct {
addresses []resolver.Address
idx int
}
func (al *addressList) isValid() bool {
return al.idx < len(al.addresses)
}
func (al *addressList) size() int {
return len(al.addresses)
}
// increment moves to the next index in the address list.
// This method returns false if it went off the list, true otherwise.
func (al *addressList) increment() bool {
if !al.isValid() {
return false
}
al.idx++
return al.idx < len(al.addresses)
}
// currentAddress returns the current address pointed to in the addressList.
// If the list is in an invalid state, it returns an empty address instead.
func (al *addressList) currentAddress() resolver.Address {
if !al.isValid() {
return resolver.Address{}
}
return al.addresses[al.idx]
}
func (al *addressList) reset() {
al.idx = 0
}
func (al *addressList) updateAddrs(addrs []resolver.Address) {
al.addresses = addrs
al.reset()
}
// seekTo returns false if the needle was not found and the current index was
// left unchanged.
func (al *addressList) seekTo(needle resolver.Address) bool {
for ai, addr := range al.addresses {
if !equalAddressIgnoringBalAttributes(&addr, &needle) {
continue
}
al.idx = ai
return true
}
return false
}
// hasNext returns whether incrementing the addressList will result in moving
// past the end of the list. If the list has already moved past the end, it
// returns false.
func (al *addressList) hasNext() bool {
if !al.isValid() {
return false
}
return al.idx+1 < len(al.addresses)
}
// equalAddressIgnoringBalAttributes returns true is a and b are considered
// equal. This is different from the Equal method on the resolver.Address type
// which considers all fields to determine equality. Here, we only consider
// fields that are meaningful to the SubConn.
func equalAddressIgnoringBalAttributes(a, b *resolver.Address) bool {
return a.Addr == b.Addr && a.ServerName == b.ServerName &&
a.Attributes.Equal(b.Attributes)
}

View File

@@ -26,7 +26,7 @@ import (
"google.golang.org/grpc/balancer"
"google.golang.org/grpc/balancer/endpointsharding"
"google.golang.org/grpc/balancer/pickfirst/pickfirstleaf"
"google.golang.org/grpc/balancer/pickfirst"
"google.golang.org/grpc/grpclog"
internalgrpclog "google.golang.org/grpc/internal/grpclog"
)
@@ -47,7 +47,7 @@ func (bb builder) Name() string {
}
func (bb builder) Build(cc balancer.ClientConn, opts balancer.BuildOptions) balancer.Balancer {
childBuilder := balancer.Get(pickfirstleaf.Name).Build
childBuilder := balancer.Get(pickfirst.Name).Build
bal := &rrBalancer{
cc: cc,
Balancer: endpointsharding.NewBalancer(cc, opts, childBuilder, endpointsharding.Options{}),
@@ -67,13 +67,6 @@ func (b *rrBalancer) UpdateClientConnState(ccs balancer.ClientConnState) error {
return b.Balancer.UpdateClientConnState(balancer.ClientConnState{
// Enable the health listener in pickfirst children for client side health
// checks and outlier detection, if configured.
ResolverState: pickfirstleaf.EnableHealthListener(ccs.ResolverState),
ResolverState: pickfirst.EnableHealthListener(ccs.ResolverState),
})
}
func (b *rrBalancer) ExitIdle() {
// Should always be ok, as child is endpoint sharding.
if ei, ok := b.Balancer.(balancer.ExitIdler); ok {
ei.ExitIdle()
}
}

View File

@@ -111,20 +111,6 @@ type SubConnState struct {
// ConnectionError is set if the ConnectivityState is TransientFailure,
// describing the reason the SubConn failed. Otherwise, it is nil.
ConnectionError error
// connectedAddr contains the connected address when ConnectivityState is
// Ready. Otherwise, it is indeterminate.
connectedAddress resolver.Address
}
// connectedAddress returns the connected address for a SubConnState. The
// address is only valid if the state is READY.
func connectedAddress(scs SubConnState) resolver.Address {
return scs.connectedAddress
}
// setConnectedAddress sets the connected address for a SubConnState.
func setConnectedAddress(scs *SubConnState, addr resolver.Address) {
scs.connectedAddress = addr
}
// A Producer is a type shared among potentially many consumers. It is

View File

@@ -36,7 +36,6 @@ import (
)
var (
setConnectedAddress = internal.SetConnectedAddress.(func(*balancer.SubConnState, resolver.Address))
// noOpRegisterHealthListenerFn is used when client side health checking is
// disabled. It sends a single READY update on the registered listener.
noOpRegisterHealthListenerFn = func(_ context.Context, listener func(balancer.SubConnState)) func() {
@@ -305,7 +304,7 @@ func newHealthData(s connectivity.State) *healthData {
// updateState is invoked by grpc to push a subConn state update to the
// underlying balancer.
func (acbw *acBalancerWrapper) updateState(s connectivity.State, curAddr resolver.Address, err error) {
func (acbw *acBalancerWrapper) updateState(s connectivity.State, err error) {
acbw.ccb.serializer.TrySchedule(func(ctx context.Context) {
if ctx.Err() != nil || acbw.ccb.balancer == nil {
return
@@ -317,9 +316,6 @@ func (acbw *acBalancerWrapper) updateState(s connectivity.State, curAddr resolve
// opts.StateListener is set, so this cannot ever be nil.
// TODO: delete this comment when UpdateSubConnState is removed.
scs := balancer.SubConnState{ConnectivityState: s, ConnectionError: err}
if s == connectivity.Ready {
setConnectedAddress(&scs, curAddr)
}
// Invalidate the health listener by updating the healthData.
acbw.healthMu.Lock()
// A race may occur if a health listener is registered soon after the
@@ -450,13 +446,14 @@ func (acbw *acBalancerWrapper) healthListenerRegFn() func(context.Context, func(
if acbw.ccb.cc.dopts.disableHealthCheck {
return noOpRegisterHealthListenerFn
}
cfg := acbw.ac.cc.healthCheckConfig()
if cfg == nil {
return noOpRegisterHealthListenerFn
}
regHealthLisFn := internal.RegisterClientHealthCheckListener
if regHealthLisFn == nil {
// The health package is not imported.
return noOpRegisterHealthListenerFn
}
cfg := acbw.ac.cc.healthCheckConfig()
if cfg == nil {
channelz.Error(logger, acbw.ac.channelz, "Health check is requested but health package is not imported.")
return noOpRegisterHealthListenerFn
}
return func(ctx context.Context, listener func(balancer.SubConnState)) func() {

View File

@@ -18,7 +18,7 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.5
// protoc-gen-go v1.36.10
// protoc v5.27.1
// source: grpc/binlog/v1/binarylog.proto
@@ -858,133 +858,68 @@ func (x *Address) GetIpPort() uint32 {
var File_grpc_binlog_v1_binarylog_proto protoreflect.FileDescriptor
var file_grpc_binlog_v1_binarylog_proto_rawDesc = string([]byte{
0x0a, 0x1e, 0x67, 0x72, 0x70, 0x63, 0x2f, 0x62, 0x69, 0x6e, 0x6c, 0x6f, 0x67, 0x2f, 0x76, 0x31,
0x2f, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f,
0x12, 0x11, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67,
0x2e, 0x76, 0x31, 0x1a, 0x1e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2f, 0x70, 0x72, 0x6f, 0x74,
0x6f, 0x62, 0x75, 0x66, 0x2f, 0x64, 0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x2e, 0x70, 0x72,
0x6f, 0x74, 0x6f, 0x1a, 0x1f, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2f, 0x70, 0x72, 0x6f, 0x74,
0x6f, 0x62, 0x75, 0x66, 0x2f, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x2e, 0x70,
0x72, 0x6f, 0x74, 0x6f, 0x22, 0xbb, 0x07, 0x0a, 0x0c, 0x47, 0x72, 0x70, 0x63, 0x4c, 0x6f, 0x67,
0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x38, 0x0a, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61,
0x6d, 0x70, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c,
0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73,
0x74, 0x61, 0x6d, 0x70, 0x52, 0x09, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x12,
0x17, 0x0a, 0x07, 0x63, 0x61, 0x6c, 0x6c, 0x5f, 0x69, 0x64, 0x18, 0x02, 0x20, 0x01, 0x28, 0x04,
0x52, 0x06, 0x63, 0x61, 0x6c, 0x6c, 0x49, 0x64, 0x12, 0x35, 0x0a, 0x17, 0x73, 0x65, 0x71, 0x75,
0x65, 0x6e, 0x63, 0x65, 0x5f, 0x69, 0x64, 0x5f, 0x77, 0x69, 0x74, 0x68, 0x69, 0x6e, 0x5f, 0x63,
0x61, 0x6c, 0x6c, 0x18, 0x03, 0x20, 0x01, 0x28, 0x04, 0x52, 0x14, 0x73, 0x65, 0x71, 0x75, 0x65,
0x6e, 0x63, 0x65, 0x49, 0x64, 0x57, 0x69, 0x74, 0x68, 0x69, 0x6e, 0x43, 0x61, 0x6c, 0x6c, 0x12,
0x3d, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x29, 0x2e,
0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76,
0x31, 0x2e, 0x47, 0x72, 0x70, 0x63, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x2e, 0x45,
0x76, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79, 0x70, 0x65, 0x12, 0x3e,
0x0a, 0x06, 0x6c, 0x6f, 0x67, 0x67, 0x65, 0x72, 0x18, 0x05, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x26,
0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e,
0x76, 0x31, 0x2e, 0x47, 0x72, 0x70, 0x63, 0x4c, 0x6f, 0x67, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x2e,
0x4c, 0x6f, 0x67, 0x67, 0x65, 0x72, 0x52, 0x06, 0x6c, 0x6f, 0x67, 0x67, 0x65, 0x72, 0x12, 0x46,
0x0a, 0x0d, 0x63, 0x6c, 0x69, 0x65, 0x6e, 0x74, 0x5f, 0x68, 0x65, 0x61, 0x64, 0x65, 0x72, 0x18,
0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1f, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e,
0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x43, 0x6c, 0x69, 0x65, 0x6e, 0x74,
0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x48, 0x00, 0x52, 0x0c, 0x63, 0x6c, 0x69, 0x65, 0x6e, 0x74,
0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x12, 0x46, 0x0a, 0x0d, 0x73, 0x65, 0x72, 0x76, 0x65, 0x72,
0x5f, 0x68, 0x65, 0x61, 0x64, 0x65, 0x72, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1f, 0x2e,
0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76,
0x31, 0x2e, 0x53, 0x65, 0x72, 0x76, 0x65, 0x72, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x48, 0x00,
0x52, 0x0c, 0x73, 0x65, 0x72, 0x76, 0x65, 0x72, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72, 0x12, 0x36,
0x0a, 0x07, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x18, 0x08, 0x20, 0x01, 0x28, 0x0b, 0x32,
0x1a, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67,
0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x48, 0x00, 0x52, 0x07, 0x6d,
0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x12, 0x36, 0x0a, 0x07, 0x74, 0x72, 0x61, 0x69, 0x6c, 0x65,
0x72, 0x18, 0x09, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62,
0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x54, 0x72, 0x61, 0x69,
0x6c, 0x65, 0x72, 0x48, 0x00, 0x52, 0x07, 0x74, 0x72, 0x61, 0x69, 0x6c, 0x65, 0x72, 0x12, 0x2b,
0x0a, 0x11, 0x70, 0x61, 0x79, 0x6c, 0x6f, 0x61, 0x64, 0x5f, 0x74, 0x72, 0x75, 0x6e, 0x63, 0x61,
0x74, 0x65, 0x64, 0x18, 0x0a, 0x20, 0x01, 0x28, 0x08, 0x52, 0x10, 0x70, 0x61, 0x79, 0x6c, 0x6f,
0x61, 0x64, 0x54, 0x72, 0x75, 0x6e, 0x63, 0x61, 0x74, 0x65, 0x64, 0x12, 0x2e, 0x0a, 0x04, 0x70,
0x65, 0x65, 0x72, 0x18, 0x0b, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x72, 0x70, 0x63,
0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x41, 0x64,
0x64, 0x72, 0x65, 0x73, 0x73, 0x52, 0x04, 0x70, 0x65, 0x65, 0x72, 0x22, 0xf5, 0x01, 0x0a, 0x09,
0x45, 0x76, 0x65, 0x6e, 0x74, 0x54, 0x79, 0x70, 0x65, 0x12, 0x16, 0x0a, 0x12, 0x45, 0x56, 0x45,
0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x55, 0x4e, 0x4b, 0x4e, 0x4f, 0x57, 0x4e, 0x10,
0x00, 0x12, 0x1c, 0x0a, 0x18, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f,
0x43, 0x4c, 0x49, 0x45, 0x4e, 0x54, 0x5f, 0x48, 0x45, 0x41, 0x44, 0x45, 0x52, 0x10, 0x01, 0x12,
0x1c, 0x0a, 0x18, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x53, 0x45,
0x52, 0x56, 0x45, 0x52, 0x5f, 0x48, 0x45, 0x41, 0x44, 0x45, 0x52, 0x10, 0x02, 0x12, 0x1d, 0x0a,
0x19, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x43, 0x4c, 0x49, 0x45,
0x4e, 0x54, 0x5f, 0x4d, 0x45, 0x53, 0x53, 0x41, 0x47, 0x45, 0x10, 0x03, 0x12, 0x1d, 0x0a, 0x19,
0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x53, 0x45, 0x52, 0x56, 0x45,
0x52, 0x5f, 0x4d, 0x45, 0x53, 0x53, 0x41, 0x47, 0x45, 0x10, 0x04, 0x12, 0x20, 0x0a, 0x1c, 0x45,
0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x43, 0x4c, 0x49, 0x45, 0x4e, 0x54,
0x5f, 0x48, 0x41, 0x4c, 0x46, 0x5f, 0x43, 0x4c, 0x4f, 0x53, 0x45, 0x10, 0x05, 0x12, 0x1d, 0x0a,
0x19, 0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x53, 0x45, 0x52, 0x56,
0x45, 0x52, 0x5f, 0x54, 0x52, 0x41, 0x49, 0x4c, 0x45, 0x52, 0x10, 0x06, 0x12, 0x15, 0x0a, 0x11,
0x45, 0x56, 0x45, 0x4e, 0x54, 0x5f, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x43, 0x41, 0x4e, 0x43, 0x45,
0x4c, 0x10, 0x07, 0x22, 0x42, 0x0a, 0x06, 0x4c, 0x6f, 0x67, 0x67, 0x65, 0x72, 0x12, 0x12, 0x0a,
0x0e, 0x4c, 0x4f, 0x47, 0x47, 0x45, 0x52, 0x5f, 0x55, 0x4e, 0x4b, 0x4e, 0x4f, 0x57, 0x4e, 0x10,
0x00, 0x12, 0x11, 0x0a, 0x0d, 0x4c, 0x4f, 0x47, 0x47, 0x45, 0x52, 0x5f, 0x43, 0x4c, 0x49, 0x45,
0x4e, 0x54, 0x10, 0x01, 0x12, 0x11, 0x0a, 0x0d, 0x4c, 0x4f, 0x47, 0x47, 0x45, 0x52, 0x5f, 0x53,
0x45, 0x52, 0x56, 0x45, 0x52, 0x10, 0x02, 0x42, 0x09, 0x0a, 0x07, 0x70, 0x61, 0x79, 0x6c, 0x6f,
0x61, 0x64, 0x22, 0xbb, 0x01, 0x0a, 0x0c, 0x43, 0x6c, 0x69, 0x65, 0x6e, 0x74, 0x48, 0x65, 0x61,
0x64, 0x65, 0x72, 0x12, 0x37, 0x0a, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x18,
0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1b, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e,
0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61,
0x74, 0x61, 0x52, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x12, 0x1f, 0x0a, 0x0b,
0x6d, 0x65, 0x74, 0x68, 0x6f, 0x64, 0x5f, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28,
0x09, 0x52, 0x0a, 0x6d, 0x65, 0x74, 0x68, 0x6f, 0x64, 0x4e, 0x61, 0x6d, 0x65, 0x12, 0x1c, 0x0a,
0x09, 0x61, 0x75, 0x74, 0x68, 0x6f, 0x72, 0x69, 0x74, 0x79, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09,
0x52, 0x09, 0x61, 0x75, 0x74, 0x68, 0x6f, 0x72, 0x69, 0x74, 0x79, 0x12, 0x33, 0x0a, 0x07, 0x74,
0x69, 0x6d, 0x65, 0x6f, 0x75, 0x74, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x19, 0x2e, 0x67,
0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x44,
0x75, 0x72, 0x61, 0x74, 0x69, 0x6f, 0x6e, 0x52, 0x07, 0x74, 0x69, 0x6d, 0x65, 0x6f, 0x75, 0x74,
0x22, 0x47, 0x0a, 0x0c, 0x53, 0x65, 0x72, 0x76, 0x65, 0x72, 0x48, 0x65, 0x61, 0x64, 0x65, 0x72,
0x12, 0x37, 0x0a, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x18, 0x01, 0x20, 0x01,
0x28, 0x0b, 0x32, 0x1b, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79,
0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x52,
0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x22, 0xb1, 0x01, 0x0a, 0x07, 0x54, 0x72,
0x61, 0x69, 0x6c, 0x65, 0x72, 0x12, 0x37, 0x0a, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74,
0x61, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1b, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62,
0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x65, 0x74, 0x61,
0x64, 0x61, 0x74, 0x61, 0x52, 0x08, 0x6d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x12, 0x1f,
0x0a, 0x0b, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x5f, 0x63, 0x6f, 0x64, 0x65, 0x18, 0x02, 0x20,
0x01, 0x28, 0x0d, 0x52, 0x0a, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x43, 0x6f, 0x64, 0x65, 0x12,
0x25, 0x0a, 0x0e, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x5f, 0x6d, 0x65, 0x73, 0x73, 0x61, 0x67,
0x65, 0x18, 0x03, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x4d,
0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x12, 0x25, 0x0a, 0x0e, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73,
0x5f, 0x64, 0x65, 0x74, 0x61, 0x69, 0x6c, 0x73, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x0d,
0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x44, 0x65, 0x74, 0x61, 0x69, 0x6c, 0x73, 0x22, 0x35, 0x0a,
0x07, 0x4d, 0x65, 0x73, 0x73, 0x61, 0x67, 0x65, 0x12, 0x16, 0x0a, 0x06, 0x6c, 0x65, 0x6e, 0x67,
0x74, 0x68, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0d, 0x52, 0x06, 0x6c, 0x65, 0x6e, 0x67, 0x74, 0x68,
0x12, 0x12, 0x0a, 0x04, 0x64, 0x61, 0x74, 0x61, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x04,
0x64, 0x61, 0x74, 0x61, 0x22, 0x42, 0x0a, 0x08, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61,
0x12, 0x36, 0x0a, 0x05, 0x65, 0x6e, 0x74, 0x72, 0x79, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32,
0x20, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67,
0x2e, 0x76, 0x31, 0x2e, 0x4d, 0x65, 0x74, 0x61, 0x64, 0x61, 0x74, 0x61, 0x45, 0x6e, 0x74, 0x72,
0x79, 0x52, 0x05, 0x65, 0x6e, 0x74, 0x72, 0x79, 0x22, 0x37, 0x0a, 0x0d, 0x4d, 0x65, 0x74, 0x61,
0x64, 0x61, 0x74, 0x61, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03, 0x6b, 0x65, 0x79,
0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x14, 0x0a, 0x05, 0x76,
0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0c, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75,
0x65, 0x22, 0xb8, 0x01, 0x0a, 0x07, 0x41, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x12, 0x33, 0x0a,
0x04, 0x74, 0x79, 0x70, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x1f, 0x2e, 0x67, 0x72,
0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2e, 0x76, 0x31, 0x2e,
0x41, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x2e, 0x54, 0x79, 0x70, 0x65, 0x52, 0x04, 0x74, 0x79,
0x70, 0x65, 0x12, 0x18, 0x0a, 0x07, 0x61, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x18, 0x02, 0x20,
0x01, 0x28, 0x09, 0x52, 0x07, 0x61, 0x64, 0x64, 0x72, 0x65, 0x73, 0x73, 0x12, 0x17, 0x0a, 0x07,
0x69, 0x70, 0x5f, 0x70, 0x6f, 0x72, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0d, 0x52, 0x06, 0x69,
0x70, 0x50, 0x6f, 0x72, 0x74, 0x22, 0x45, 0x0a, 0x04, 0x54, 0x79, 0x70, 0x65, 0x12, 0x10, 0x0a,
0x0c, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x55, 0x4e, 0x4b, 0x4e, 0x4f, 0x57, 0x4e, 0x10, 0x00, 0x12,
0x0d, 0x0a, 0x09, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x49, 0x50, 0x56, 0x34, 0x10, 0x01, 0x12, 0x0d,
0x0a, 0x09, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x49, 0x50, 0x56, 0x36, 0x10, 0x02, 0x12, 0x0d, 0x0a,
0x09, 0x54, 0x59, 0x50, 0x45, 0x5f, 0x55, 0x4e, 0x49, 0x58, 0x10, 0x03, 0x42, 0x5c, 0x0a, 0x14,
0x69, 0x6f, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x62, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f,
0x67, 0x2e, 0x76, 0x31, 0x42, 0x0e, 0x42, 0x69, 0x6e, 0x61, 0x72, 0x79, 0x4c, 0x6f, 0x67, 0x50,
0x72, 0x6f, 0x74, 0x6f, 0x50, 0x01, 0x5a, 0x32, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x67,
0x6f, 0x6c, 0x61, 0x6e, 0x67, 0x2e, 0x6f, 0x72, 0x67, 0x2f, 0x67, 0x72, 0x70, 0x63, 0x2f, 0x62,
0x69, 0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x2f, 0x67, 0x72, 0x70, 0x63, 0x5f, 0x62, 0x69,
0x6e, 0x61, 0x72, 0x79, 0x6c, 0x6f, 0x67, 0x5f, 0x76, 0x31, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74,
0x6f, 0x33,
})
const file_grpc_binlog_v1_binarylog_proto_rawDesc = "" +
"\n" +
"\x1egrpc/binlog/v1/binarylog.proto\x12\x11grpc.binarylog.v1\x1a\x1egoogle/protobuf/duration.proto\x1a\x1fgoogle/protobuf/timestamp.proto\"\xbb\a\n" +
"\fGrpcLogEntry\x128\n" +
"\ttimestamp\x18\x01 \x01(\v2\x1a.google.protobuf.TimestampR\ttimestamp\x12\x17\n" +
"\acall_id\x18\x02 \x01(\x04R\x06callId\x125\n" +
"\x17sequence_id_within_call\x18\x03 \x01(\x04R\x14sequenceIdWithinCall\x12=\n" +
"\x04type\x18\x04 \x01(\x0e2).grpc.binarylog.v1.GrpcLogEntry.EventTypeR\x04type\x12>\n" +
"\x06logger\x18\x05 \x01(\x0e2&.grpc.binarylog.v1.GrpcLogEntry.LoggerR\x06logger\x12F\n" +
"\rclient_header\x18\x06 \x01(\v2\x1f.grpc.binarylog.v1.ClientHeaderH\x00R\fclientHeader\x12F\n" +
"\rserver_header\x18\a \x01(\v2\x1f.grpc.binarylog.v1.ServerHeaderH\x00R\fserverHeader\x126\n" +
"\amessage\x18\b \x01(\v2\x1a.grpc.binarylog.v1.MessageH\x00R\amessage\x126\n" +
"\atrailer\x18\t \x01(\v2\x1a.grpc.binarylog.v1.TrailerH\x00R\atrailer\x12+\n" +
"\x11payload_truncated\x18\n" +
" \x01(\bR\x10payloadTruncated\x12.\n" +
"\x04peer\x18\v \x01(\v2\x1a.grpc.binarylog.v1.AddressR\x04peer\"\xf5\x01\n" +
"\tEventType\x12\x16\n" +
"\x12EVENT_TYPE_UNKNOWN\x10\x00\x12\x1c\n" +
"\x18EVENT_TYPE_CLIENT_HEADER\x10\x01\x12\x1c\n" +
"\x18EVENT_TYPE_SERVER_HEADER\x10\x02\x12\x1d\n" +
"\x19EVENT_TYPE_CLIENT_MESSAGE\x10\x03\x12\x1d\n" +
"\x19EVENT_TYPE_SERVER_MESSAGE\x10\x04\x12 \n" +
"\x1cEVENT_TYPE_CLIENT_HALF_CLOSE\x10\x05\x12\x1d\n" +
"\x19EVENT_TYPE_SERVER_TRAILER\x10\x06\x12\x15\n" +
"\x11EVENT_TYPE_CANCEL\x10\a\"B\n" +
"\x06Logger\x12\x12\n" +
"\x0eLOGGER_UNKNOWN\x10\x00\x12\x11\n" +
"\rLOGGER_CLIENT\x10\x01\x12\x11\n" +
"\rLOGGER_SERVER\x10\x02B\t\n" +
"\apayload\"\xbb\x01\n" +
"\fClientHeader\x127\n" +
"\bmetadata\x18\x01 \x01(\v2\x1b.grpc.binarylog.v1.MetadataR\bmetadata\x12\x1f\n" +
"\vmethod_name\x18\x02 \x01(\tR\n" +
"methodName\x12\x1c\n" +
"\tauthority\x18\x03 \x01(\tR\tauthority\x123\n" +
"\atimeout\x18\x04 \x01(\v2\x19.google.protobuf.DurationR\atimeout\"G\n" +
"\fServerHeader\x127\n" +
"\bmetadata\x18\x01 \x01(\v2\x1b.grpc.binarylog.v1.MetadataR\bmetadata\"\xb1\x01\n" +
"\aTrailer\x127\n" +
"\bmetadata\x18\x01 \x01(\v2\x1b.grpc.binarylog.v1.MetadataR\bmetadata\x12\x1f\n" +
"\vstatus_code\x18\x02 \x01(\rR\n" +
"statusCode\x12%\n" +
"\x0estatus_message\x18\x03 \x01(\tR\rstatusMessage\x12%\n" +
"\x0estatus_details\x18\x04 \x01(\fR\rstatusDetails\"5\n" +
"\aMessage\x12\x16\n" +
"\x06length\x18\x01 \x01(\rR\x06length\x12\x12\n" +
"\x04data\x18\x02 \x01(\fR\x04data\"B\n" +
"\bMetadata\x126\n" +
"\x05entry\x18\x01 \x03(\v2 .grpc.binarylog.v1.MetadataEntryR\x05entry\"7\n" +
"\rMetadataEntry\x12\x10\n" +
"\x03key\x18\x01 \x01(\tR\x03key\x12\x14\n" +
"\x05value\x18\x02 \x01(\fR\x05value\"\xb8\x01\n" +
"\aAddress\x123\n" +
"\x04type\x18\x01 \x01(\x0e2\x1f.grpc.binarylog.v1.Address.TypeR\x04type\x12\x18\n" +
"\aaddress\x18\x02 \x01(\tR\aaddress\x12\x17\n" +
"\aip_port\x18\x03 \x01(\rR\x06ipPort\"E\n" +
"\x04Type\x12\x10\n" +
"\fTYPE_UNKNOWN\x10\x00\x12\r\n" +
"\tTYPE_IPV4\x10\x01\x12\r\n" +
"\tTYPE_IPV6\x10\x02\x12\r\n" +
"\tTYPE_UNIX\x10\x03B\\\n" +
"\x14io.grpc.binarylog.v1B\x0eBinaryLogProtoP\x01Z2google.golang.org/grpc/binarylog/grpc_binarylog_v1b\x06proto3"
var (
file_grpc_binlog_v1_binarylog_proto_rawDescOnce sync.Once

View File

@@ -35,16 +35,19 @@ import (
"google.golang.org/grpc/balancer/pickfirst"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/connectivity"
"google.golang.org/grpc/credentials"
expstats "google.golang.org/grpc/experimental/stats"
"google.golang.org/grpc/internal"
"google.golang.org/grpc/internal/channelz"
"google.golang.org/grpc/internal/grpcsync"
"google.golang.org/grpc/internal/idle"
iresolver "google.golang.org/grpc/internal/resolver"
"google.golang.org/grpc/internal/stats"
istats "google.golang.org/grpc/internal/stats"
"google.golang.org/grpc/internal/transport"
"google.golang.org/grpc/keepalive"
"google.golang.org/grpc/resolver"
"google.golang.org/grpc/serviceconfig"
"google.golang.org/grpc/stats"
"google.golang.org/grpc/status"
_ "google.golang.org/grpc/balancer/roundrobin" // To register roundrobin.
@@ -97,6 +100,41 @@ var (
errTransportCredentialsMissing = errors.New("grpc: the credentials require transport level security (use grpc.WithTransportCredentials() to set)")
)
var (
disconnectionsMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.subchannel.disconnections",
Description: "EXPERIMENTAL. Number of times the selected subchannel becomes disconnected.",
Unit: "{disconnection}",
Labels: []string{"grpc.target"},
OptionalLabels: []string{"grpc.lb.backend_service", "grpc.lb.locality", "grpc.disconnect_error"},
Default: false,
})
connectionAttemptsSucceededMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.subchannel.connection_attempts_succeeded",
Description: "EXPERIMENTAL. Number of successful connection attempts.",
Unit: "{attempt}",
Labels: []string{"grpc.target"},
OptionalLabels: []string{"grpc.lb.backend_service", "grpc.lb.locality"},
Default: false,
})
connectionAttemptsFailedMetric = expstats.RegisterInt64Count(expstats.MetricDescriptor{
Name: "grpc.subchannel.connection_attempts_failed",
Description: "EXPERIMENTAL. Number of failed connection attempts.",
Unit: "{attempt}",
Labels: []string{"grpc.target"},
OptionalLabels: []string{"grpc.lb.backend_service", "grpc.lb.locality"},
Default: false,
})
openConnectionsMetric = expstats.RegisterInt64UpDownCount(expstats.MetricDescriptor{
Name: "grpc.subchannel.open_connections",
Description: "EXPERIMENTAL. Number of open connections.",
Unit: "{attempt}",
Labels: []string{"grpc.target"},
OptionalLabels: []string{"grpc.lb.backend_service", "grpc.security_level", "grpc.lb.locality"},
Default: false,
})
)
const (
defaultClientMaxReceiveMessageSize = 1024 * 1024 * 4
defaultClientMaxSendMessageSize = math.MaxInt32
@@ -208,9 +246,10 @@ func NewClient(target string, opts ...DialOption) (conn *ClientConn, err error)
channelz.Infof(logger, cc.channelz, "Channel authority set to %q", cc.authority)
cc.csMgr = newConnectivityStateManager(cc.ctx, cc.channelz)
cc.pickerWrapper = newPickerWrapper(cc.dopts.copts.StatsHandlers)
cc.pickerWrapper = newPickerWrapper()
cc.metricsRecorderList = stats.NewMetricsRecorderList(cc.dopts.copts.StatsHandlers)
cc.metricsRecorderList = istats.NewMetricsRecorderList(cc.dopts.copts.StatsHandlers)
cc.statsHandler = istats.NewCombinedHandler(cc.dopts.copts.StatsHandlers...)
cc.initIdleStateLocked() // Safe to call without the lock, since nothing else has a reference to cc.
cc.idlenessMgr = idle.NewManager((*idler)(cc), cc.dopts.idleTimeout)
@@ -260,9 +299,10 @@ func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *
}()
// This creates the name resolver, load balancer, etc.
if err := cc.idlenessMgr.ExitIdleMode(); err != nil {
return nil, err
if err := cc.exitIdleMode(); err != nil {
return nil, fmt.Errorf("failed to exit idle mode: %w", err)
}
cc.idlenessMgr.UnsafeSetNotIdle()
// Return now for non-blocking dials.
if !cc.dopts.block {
@@ -330,7 +370,7 @@ func (cc *ClientConn) addTraceEvent(msg string) {
Severity: channelz.CtInfo,
}
}
channelz.AddTraceEvent(logger, cc.channelz, 0, ted)
channelz.AddTraceEvent(logger, cc.channelz, 1, ted)
}
type idler ClientConn
@@ -339,14 +379,17 @@ func (i *idler) EnterIdleMode() {
(*ClientConn)(i).enterIdleMode()
}
func (i *idler) ExitIdleMode() error {
return (*ClientConn)(i).exitIdleMode()
func (i *idler) ExitIdleMode() {
// Ignore the error returned from this method, because from the perspective
// of the caller (idleness manager), the channel would have always moved out
// of IDLE by the time this method returns.
(*ClientConn)(i).exitIdleMode()
}
// exitIdleMode moves the channel out of idle mode by recreating the name
// resolver and load balancer. This should never be called directly; use
// cc.idlenessMgr.ExitIdleMode instead.
func (cc *ClientConn) exitIdleMode() (err error) {
func (cc *ClientConn) exitIdleMode() error {
cc.mu.Lock()
if cc.conns == nil {
cc.mu.Unlock()
@@ -354,11 +397,23 @@ func (cc *ClientConn) exitIdleMode() (err error) {
}
cc.mu.Unlock()
// Set state to CONNECTING before building the name resolver
// so the channel does not remain in IDLE.
cc.csMgr.updateState(connectivity.Connecting)
// This needs to be called without cc.mu because this builds a new resolver
// which might update state or report error inline, which would then need to
// acquire cc.mu.
if err := cc.resolverWrapper.start(); err != nil {
return err
// If resolver creation fails, treat it like an error reported by the
// resolver before any valid updates. Set channel's state to
// TransientFailure, and set an erroring picker with the resolver build
// error, which will returned as part of any subsequent RPCs.
logger.Warningf("Failed to start resolver: %v", err)
cc.csMgr.updateState(connectivity.TransientFailure)
cc.mu.Lock()
cc.updateResolverStateAndUnlock(resolver.State{}, err)
return fmt.Errorf("failed to start resolver: %w", err)
}
cc.addTraceEvent("exiting idle mode")
@@ -456,7 +511,7 @@ func (cc *ClientConn) validateTransportCredentials() error {
func (cc *ClientConn) channelzRegistration(target string) {
parentChannel, _ := cc.dopts.channelzParent.(*channelz.Channel)
cc.channelz = channelz.RegisterChannel(parentChannel, target)
cc.addTraceEvent("created")
cc.addTraceEvent(fmt.Sprintf("created for target %q", target))
}
// chainUnaryClientInterceptors chains all unary client interceptors into one.
@@ -621,7 +676,8 @@ type ClientConn struct {
channelz *channelz.Channel // Channelz object.
resolverBuilder resolver.Builder // See initParsedTargetAndResolverBuilder().
idlenessMgr *idle.Manager
metricsRecorderList *stats.MetricsRecorderList
metricsRecorderList *istats.MetricsRecorderList
statsHandler stats.Handler
// The following provide their own synchronization, and therefore don't
// require cc.mu to be held to access them.
@@ -678,10 +734,8 @@ func (cc *ClientConn) GetState() connectivity.State {
// Notice: This API is EXPERIMENTAL and may be changed or removed in a later
// release.
func (cc *ClientConn) Connect() {
if err := cc.idlenessMgr.ExitIdleMode(); err != nil {
cc.addTraceEvent(err.Error())
return
}
cc.idlenessMgr.ExitIdleMode()
// If the ClientConn was not in idle mode, we need to call ExitIdle on the
// LB policy so that connections can be created.
cc.mu.Lock()
@@ -689,22 +743,31 @@ func (cc *ClientConn) Connect() {
cc.mu.Unlock()
}
// waitForResolvedAddrs blocks until the resolver has provided addresses or the
// context expires. Returns nil unless the context expires first; otherwise
// returns a status error based on the context.
func (cc *ClientConn) waitForResolvedAddrs(ctx context.Context) error {
// waitForResolvedAddrs blocks until the resolver provides addresses or the
// context expires, whichever happens first.
//
// Error is nil unless the context expires first; otherwise returns a status
// error based on the context.
//
// The returned boolean indicates whether it did block or not. If the
// resolution has already happened once before, it returns false without
// blocking. Otherwise, it wait for the resolution and return true if
// resolution has succeeded or return false along with error if resolution has
// failed.
func (cc *ClientConn) waitForResolvedAddrs(ctx context.Context) (bool, error) {
// This is on the RPC path, so we use a fast path to avoid the
// more-expensive "select" below after the resolver has returned once.
if cc.firstResolveEvent.HasFired() {
return nil
return false, nil
}
internal.NewStreamWaitingForResolver()
select {
case <-cc.firstResolveEvent.Done():
return nil
return true, nil
case <-ctx.Done():
return status.FromContextError(ctx.Err()).Err()
return false, status.FromContextError(ctx.Err()).Err()
case <-cc.ctx.Done():
return ErrClientConnClosing
return false, ErrClientConnClosing
}
}
@@ -723,8 +786,8 @@ func init() {
internal.EnterIdleModeForTesting = func(cc *ClientConn) {
cc.idlenessMgr.EnterIdleModeForTesting()
}
internal.ExitIdleModeForTesting = func(cc *ClientConn) error {
return cc.idlenessMgr.ExitIdleMode()
internal.ExitIdleModeForTesting = func(cc *ClientConn) {
cc.idlenessMgr.ExitIdleMode()
}
}
@@ -849,6 +912,7 @@ func (cc *ClientConn) newAddrConnLocked(addrs []resolver.Address, opts balancer.
channelz: channelz.RegisterSubChannel(cc.channelz, ""),
resetBackoff: make(chan struct{}),
}
ac.updateTelemetryLabelsLocked()
ac.ctx, ac.cancel = context.WithCancel(cc.ctx)
// Start with our address set to the first address; this may be updated if
// we connect to different addresses.
@@ -913,25 +977,24 @@ func (cc *ClientConn) incrCallsFailed() {
// connect starts creating a transport.
// It does nothing if the ac is not IDLE.
// TODO(bar) Move this to the addrConn section.
func (ac *addrConn) connect() error {
func (ac *addrConn) connect() {
ac.mu.Lock()
if ac.state == connectivity.Shutdown {
if logger.V(2) {
logger.Infof("connect called on shutdown addrConn; ignoring.")
}
ac.mu.Unlock()
return errConnClosing
return
}
if ac.state != connectivity.Idle {
if logger.V(2) {
logger.Infof("connect called on addrConn in non-idle state (%v); ignoring.", ac.state)
}
ac.mu.Unlock()
return nil
return
}
ac.resetTransportAndUnlock()
return nil
}
// equalAddressIgnoringBalAttributes returns true is a and b are considered equal.
@@ -965,7 +1028,7 @@ func (ac *addrConn) updateAddrs(addrs []resolver.Address) {
}
ac.addrs = addrs
ac.updateTelemetryLabelsLocked()
if ac.state == connectivity.Shutdown ||
ac.state == connectivity.TransientFailure ||
ac.state == connectivity.Idle {
@@ -1067,13 +1130,6 @@ func (cc *ClientConn) healthCheckConfig() *healthCheckConfig {
return cc.sc.healthCheckConfig
}
func (cc *ClientConn) getTransport(ctx context.Context, failfast bool, method string) (transport.ClientTransport, balancer.PickResult, error) {
return cc.pickerWrapper.pick(ctx, failfast, balancer.PickInfo{
Ctx: ctx,
FullMethodName: method,
})
}
func (cc *ClientConn) applyServiceConfigAndBalancer(sc *ServiceConfig, configSelector iresolver.ConfigSelector) {
if sc == nil {
// should never reach here.
@@ -1211,6 +1267,9 @@ type addrConn struct {
resetBackoff chan struct{}
channelz *channelz.SubChannel
localityLabel string
backendServiceLabel string
}
// Note: this requires a lock on ac.mu.
@@ -1218,6 +1277,18 @@ func (ac *addrConn) updateConnectivityState(s connectivity.State, lastErr error)
if ac.state == s {
return
}
// If we are transitioning out of Ready, it means there is a disconnection.
// A SubConn can also transition from CONNECTING directly to IDLE when
// a transport is successfully created, but the connection fails
// before the SubConn can send the notification for READY. We treat
// this as a successful connection and transition to IDLE.
// TODO: https://github.com/grpc/grpc-go/issues/7862 - Remove the second
// part of the if condition below once the issue is fixed.
if ac.state == connectivity.Ready || (ac.state == connectivity.Connecting && s == connectivity.Idle) {
disconnectionsMetric.Record(ac.cc.metricsRecorderList, 1, ac.cc.target, ac.backendServiceLabel, ac.localityLabel, "unknown")
openConnectionsMetric.Record(ac.cc.metricsRecorderList, -1, ac.cc.target, ac.backendServiceLabel, ac.securityLevelLocked(), ac.localityLabel)
}
ac.state = s
ac.channelz.ChannelMetrics.State.Store(&s)
if lastErr == nil {
@@ -1225,7 +1296,7 @@ func (ac *addrConn) updateConnectivityState(s connectivity.State, lastErr error)
} else {
channelz.Infof(logger, ac.channelz, "Subchannel Connectivity change to %v, last error: %s", s, lastErr)
}
ac.acbw.updateState(s, ac.curAddr, lastErr)
ac.acbw.updateState(s, lastErr)
}
// adjustParams updates parameters used to create transports upon
@@ -1275,6 +1346,15 @@ func (ac *addrConn) resetTransportAndUnlock() {
ac.mu.Unlock()
if err := ac.tryAllAddrs(acCtx, addrs, connectDeadline); err != nil {
if !errors.Is(err, context.Canceled) {
connectionAttemptsFailedMetric.Record(ac.cc.metricsRecorderList, 1, ac.cc.target, ac.backendServiceLabel, ac.localityLabel)
} else {
if logger.V(2) {
// This records cancelled connection attempts which can be later
// replaced by a metric.
logger.Infof("Context cancellation detected; not recording this as a failed connection attempt.")
}
}
// TODO: #7534 - Move re-resolution requests into the pick_first LB policy
// to ensure one resolution request per pass instead of per subconn failure.
ac.cc.resolveNow(resolver.ResolveNowOptions{})
@@ -1314,10 +1394,50 @@ func (ac *addrConn) resetTransportAndUnlock() {
}
// Success; reset backoff.
ac.mu.Lock()
connectionAttemptsSucceededMetric.Record(ac.cc.metricsRecorderList, 1, ac.cc.target, ac.backendServiceLabel, ac.localityLabel)
openConnectionsMetric.Record(ac.cc.metricsRecorderList, 1, ac.cc.target, ac.backendServiceLabel, ac.securityLevelLocked(), ac.localityLabel)
ac.backoffIdx = 0
ac.mu.Unlock()
}
// updateTelemetryLabelsLocked calculates and caches the telemetry labels based on the
// first address in addrConn.
func (ac *addrConn) updateTelemetryLabelsLocked() {
labelsFunc, ok := internal.AddressToTelemetryLabels.(func(resolver.Address) map[string]string)
if !ok || len(ac.addrs) == 0 {
// Reset defaults
ac.localityLabel = ""
ac.backendServiceLabel = ""
return
}
labels := labelsFunc(ac.addrs[0])
ac.localityLabel = labels["grpc.lb.locality"]
ac.backendServiceLabel = labels["grpc.lb.backend_service"]
}
type securityLevelKey struct{}
func (ac *addrConn) securityLevelLocked() string {
var secLevel string
// During disconnection, ac.transport is nil. Fall back to the security level
// stored in the current address during connection.
if ac.transport == nil {
secLevel, _ = ac.curAddr.Attributes.Value(securityLevelKey{}).(string)
return secLevel
}
authInfo := ac.transport.Peer().AuthInfo
if ci, ok := authInfo.(interface {
GetCommonAuthInfo() credentials.CommonAuthInfo
}); ok {
secLevel = ci.GetCommonAuthInfo().SecurityLevel.String()
// Store the security level in the current address' attributes so
// that it remains available for disconnection metrics after the
// transport is closed.
ac.curAddr.Attributes = ac.curAddr.Attributes.WithValue(securityLevelKey{}, secLevel)
}
return secLevel
}
// tryAllAddrs tries to create a connection to the addresses, and stop when at
// the first successful one. It returns an error if no address was successfully
// connected, or updates ac appropriately with the new transport.
@@ -1407,25 +1527,26 @@ func (ac *addrConn) createTransport(ctx context.Context, addr resolver.Address,
}
ac.mu.Lock()
defer ac.mu.Unlock()
if ctx.Err() != nil {
// This can happen if the subConn was removed while in `Connecting`
// state. tearDown() would have set the state to `Shutdown`, but
// would not have closed the transport since ac.transport would not
// have been set at that point.
//
// We run this in a goroutine because newTr.Close() calls onClose()
// We unlock ac.mu because newTr.Close() calls onClose()
// inline, which requires locking ac.mu.
//
ac.mu.Unlock()
// The error we pass to Close() is immaterial since there are no open
// streams at this point, so no trailers with error details will be sent
// out. We just need to pass a non-nil error.
//
// This can also happen when updateAddrs is called during a connection
// attempt.
go newTr.Close(transport.ErrConnClosing)
newTr.Close(transport.ErrConnClosing)
return nil
}
defer ac.mu.Unlock()
if hctx.Err() != nil {
// onClose was already called for this connection, but the connection
// was successfully established first. Consider it a success and set
@@ -1822,7 +1943,7 @@ func (cc *ClientConn) initAuthority() error {
} else if auth, ok := cc.resolverBuilder.(resolver.AuthorityOverrider); ok {
cc.authority = auth.OverrideAuthority(cc.parsedTarget)
} else if strings.HasPrefix(endpoint, ":") {
cc.authority = "localhost" + endpoint
cc.authority = "localhost" + encodeAuthority(endpoint)
} else {
cc.authority = encodeAuthority(endpoint)
}

View File

@@ -44,8 +44,7 @@ type PerRPCCredentials interface {
// A54). uri is the URI of the entry point for the request. When supported
// by the underlying implementation, ctx can be used for timeout and
// cancellation. Additionally, RequestInfo data will be available via ctx
// to this call. TODO(zhaoq): Define the set of the qualified keys instead
// of leaving it as an arbitrary string.
// to this call.
GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error)
// RequireTransportSecurity indicates whether the credentials requires
// transport security.
@@ -96,10 +95,11 @@ func (c CommonAuthInfo) GetCommonAuthInfo() CommonAuthInfo {
return c
}
// ProtocolInfo provides information regarding the gRPC wire protocol version,
// security protocol, security protocol version in use, server name, etc.
// ProtocolInfo provides static information regarding transport credentials.
type ProtocolInfo struct {
// ProtocolVersion is the gRPC wire protocol version.
//
// Deprecated: this is unused by gRPC.
ProtocolVersion string
// SecurityProtocol is the security protocol in use.
SecurityProtocol string
@@ -109,7 +109,16 @@ type ProtocolInfo struct {
//
// Deprecated: please use Peer.AuthInfo.
SecurityVersion string
// ServerName is the user-configured server name.
// ServerName is the user-configured server name. If set, this overrides
// the default :authority header used for all RPCs on the channel using the
// containing credentials, unless grpc.WithAuthority is set on the channel,
// in which case that setting will take precedence.
//
// This must be a valid `:authority` header according to
// [RFC3986](https://datatracker.ietf.org/doc/html/rfc3986#section-3.2).
//
// Deprecated: Users should use grpc.WithAuthority to override the authority
// on a channel instead of configuring the credentials.
ServerName string
}
@@ -120,6 +129,20 @@ type AuthInfo interface {
AuthType() string
}
// AuthorityValidator validates the authority used to override the `:authority`
// header. This is an optional interface that implementations of AuthInfo can
// implement if they support per-RPC authority overrides. It is invoked when the
// application attempts to override the HTTP/2 `:authority` header using the
// CallAuthority call option.
type AuthorityValidator interface {
// ValidateAuthority checks the authority value used to override the
// `:authority` header. The authority parameter is the override value
// provided by the application via the CallAuthority option. This value
// typically corresponds to the server hostname or endpoint the RPC is
// targeting. It returns non-nil error if the validation fails.
ValidateAuthority(authority string) error
}
// ErrConnDispatched indicates that rawConn has been dispatched out of gRPC
// and the caller should not close rawConn.
var ErrConnDispatched = errors.New("credentials: rawConn is dispatched out of gRPC")
@@ -159,12 +182,17 @@ type TransportCredentials interface {
// Clone makes a copy of this TransportCredentials.
Clone() TransportCredentials
// OverrideServerName specifies the value used for the following:
//
// - verifying the hostname on the returned certificates
// - as SNI in the client's handshake to support virtual hosting
// - as the value for `:authority` header at stream creation time
//
// Deprecated: use grpc.WithAuthority instead. Will be supported
// throughout 1.x.
// The provided string should be a valid `:authority` header according to
// [RFC3986](https://datatracker.ietf.org/doc/html/rfc3986#section-3.2).
//
// Deprecated: this method is unused by gRPC. Users should use
// grpc.WithAuthority to override the authority on a channel instead of
// configuring the credentials.
OverrideServerName(string) error
}
@@ -207,14 +235,32 @@ type RequestInfo struct {
AuthInfo AuthInfo
}
// requestInfoKey is a struct to be used as the key to store RequestInfo in a
// context.
type requestInfoKey struct{}
// RequestInfoFromContext extracts the RequestInfo from the context if it exists.
//
// This API is experimental.
func RequestInfoFromContext(ctx context.Context) (ri RequestInfo, ok bool) {
ri, ok = icredentials.RequestInfoFromContext(ctx).(RequestInfo)
ri, ok = ctx.Value(requestInfoKey{}).(RequestInfo)
return ri, ok
}
// NewContextWithRequestInfo creates a new context from ctx and attaches ri to it.
//
// This RequestInfo will be accessible via RequestInfoFromContext.
//
// Intended to be used from tests for PerRPCCredentials implementations (that
// often need to check connection's SecurityLevel). Should not be used from
// non-test code: the gRPC client already prepares a context with the correct
// RequestInfo attached when calling PerRPCCredentials.GetRequestMetadata.
//
// This API is experimental.
func NewContextWithRequestInfo(ctx context.Context, ri RequestInfo) context.Context {
return context.WithValue(ctx, requestInfoKey{}, ri)
}
// ClientHandshakeInfo holds data to be passed to ClientHandshake. This makes
// it possible to pass arbitrary data to the handshaker from gRPC, resolver,
// balancer etc. Individual credential implementations control the actual

View File

@@ -30,7 +30,7 @@ import (
// NewCredentials returns a credentials which disables transport security.
//
// Note that using this credentials with per-RPC credentials which require
// transport security is incompatible and will cause grpc.Dial() to fail.
// transport security is incompatible and will cause RPCs to fail.
func NewCredentials() credentials.TransportCredentials {
return insecureTC{}
}
@@ -71,6 +71,12 @@ func (info) AuthType() string {
return "insecure"
}
// ValidateAuthority allows any value to be overridden for the :authority
// header.
func (info) ValidateAuthority(string) error {
return nil
}
// insecureBundle implements an insecure bundle.
// An insecure bundle provides a thin wrapper around insecureTC to support
// the credentials.Bundle interface.

View File

@@ -22,6 +22,7 @@ import (
"context"
"crypto/tls"
"crypto/x509"
"errors"
"fmt"
"net"
"net/url"
@@ -50,6 +51,25 @@ func (t TLSInfo) AuthType() string {
return "tls"
}
// ValidateAuthority validates the provided authority being used to override the
// :authority header by verifying it against the peer certificates. It returns a
// non-nil error if the validation fails.
func (t TLSInfo) ValidateAuthority(authority string) error {
var errs []error
host, _, err := net.SplitHostPort(authority)
if err != nil {
host = authority
}
for _, cert := range t.State.PeerCertificates {
var err error
if err = cert.VerifyHostname(host); err == nil {
return nil
}
errs = append(errs, err)
}
return fmt.Errorf("credentials: invalid authority %q: %v", authority, errors.Join(errs...))
}
// cipherSuiteLookup returns the string version of a TLS cipher suite ID.
func cipherSuiteLookup(cipherSuiteID uint16) string {
for _, s := range tls.CipherSuites() {
@@ -94,14 +114,14 @@ func (c tlsCreds) Info() ProtocolInfo {
func (c *tlsCreds) ClientHandshake(ctx context.Context, authority string, rawConn net.Conn) (_ net.Conn, _ AuthInfo, err error) {
// use local cfg to avoid clobbering ServerName if using multiple endpoints
cfg := credinternal.CloneTLSConfig(c.config)
if cfg.ServerName == "" {
serverName, _, err := net.SplitHostPort(authority)
if err != nil {
// If the authority had no host port or if the authority cannot be parsed, use it as-is.
serverName = authority
}
cfg.ServerName = serverName
serverName, _, err := net.SplitHostPort(authority)
if err != nil {
// If the authority had no host port or if the authority cannot be parsed, use it as-is.
serverName = authority
}
cfg.ServerName = serverName
conn := tls.Client(rawConn, cfg)
errChannel := make(chan error, 1)
go func() {
@@ -243,9 +263,11 @@ func applyDefaults(c *tls.Config) *tls.Config {
// certificates to establish the identity of the client need to be included in
// the credentials (eg: for mTLS), use NewTLS instead, where a complete
// tls.Config can be specified.
// serverNameOverride is for testing only. If set to a non empty string,
// it will override the virtual host name of authority (e.g. :authority header
// field) in requests.
//
// serverNameOverride is for testing only. If set to a non empty string, it will
// override the virtual host name of authority (e.g. :authority header field) in
// requests. Users should use grpc.WithAuthority passed to grpc.NewClient to
// override the authority of the client instead.
func NewClientTLSFromCert(cp *x509.CertPool, serverNameOverride string) TransportCredentials {
return NewTLS(&tls.Config{ServerName: serverNameOverride, RootCAs: cp})
}
@@ -255,9 +277,11 @@ func NewClientTLSFromCert(cp *x509.CertPool, serverNameOverride string) Transpor
// certificates to establish the identity of the client need to be included in
// the credentials (eg: for mTLS), use NewTLS instead, where a complete
// tls.Config can be specified.
// serverNameOverride is for testing only. If set to a non empty string,
// it will override the virtual host name of authority (e.g. :authority header
// field) in requests.
//
// serverNameOverride is for testing only. If set to a non empty string, it will
// override the virtual host name of authority (e.g. :authority header field) in
// requests. Users should use grpc.WithAuthority passed to grpc.NewClient to
// override the authority of the client instead.
func NewClientTLSFromFile(certFile, serverNameOverride string) (TransportCredentials, error) {
b, err := os.ReadFile(certFile)
if err != nil {

View File

@@ -213,6 +213,7 @@ func WithReadBufferSize(s int) DialOption {
func WithInitialWindowSize(s int32) DialOption {
return newFuncDialOption(func(o *dialOptions) {
o.copts.InitialWindowSize = s
o.copts.StaticWindowSize = true
})
}
@@ -222,6 +223,26 @@ func WithInitialWindowSize(s int32) DialOption {
func WithInitialConnWindowSize(s int32) DialOption {
return newFuncDialOption(func(o *dialOptions) {
o.copts.InitialConnWindowSize = s
o.copts.StaticWindowSize = true
})
}
// WithStaticStreamWindowSize returns a DialOption which sets the initial
// stream window size to the value provided and disables dynamic flow control.
func WithStaticStreamWindowSize(s int32) DialOption {
return newFuncDialOption(func(o *dialOptions) {
o.copts.InitialWindowSize = s
o.copts.StaticWindowSize = true
})
}
// WithStaticConnWindowSize returns a DialOption which sets the initial
// connection window size to the value provided and disables dynamic flow
// control.
func WithStaticConnWindowSize(s int32) DialOption {
return newFuncDialOption(func(o *dialOptions) {
o.copts.InitialConnWindowSize = s
o.copts.StaticWindowSize = true
})
}
@@ -360,7 +381,7 @@ func WithReturnConnectionError() DialOption {
//
// Note that using this DialOption with per-RPC credentials (through
// WithCredentialsBundle or WithPerRPCCredentials) which require transport
// security is incompatible and will cause grpc.Dial() to fail.
// security is incompatible and will cause RPCs to fail.
//
// Deprecated: use WithTransportCredentials and insecure.NewCredentials()
// instead. Will be supported throughout 1.x.
@@ -587,6 +608,8 @@ func WithChainStreamInterceptor(interceptors ...StreamClientInterceptor) DialOpt
// WithAuthority returns a DialOption that specifies the value to be used as the
// :authority pseudo-header and as the server name in authentication handshake.
// This overrides all other ways of setting authority on the channel, but can be
// overridden per-call by using grpc.CallAuthority.
func WithAuthority(a string) DialOption {
return newFuncDialOption(func(o *dialOptions) {
o.authority = a

View File

@@ -27,8 +27,10 @@ package encoding
import (
"io"
"slices"
"strings"
"google.golang.org/grpc/encoding/internal"
"google.golang.org/grpc/internal/grpcutil"
)
@@ -36,12 +38,26 @@ import (
// It is intended for grpc internal use only.
const Identity = "identity"
func init() {
internal.RegisterCompressorForTesting = func(c Compressor) func() {
name := c.Name()
curCompressor, found := registeredCompressor[name]
RegisterCompressor(c)
return func() {
if found {
registeredCompressor[name] = curCompressor
return
}
delete(registeredCompressor, name)
grpcutil.RegisteredCompressorNames = slices.DeleteFunc(grpcutil.RegisteredCompressorNames, func(s string) bool {
return s == name
})
}
}
}
// Compressor is used for compressing and decompressing when sending or
// receiving messages.
//
// If a Compressor implements `DecompressedSize(compressedBytes []byte) int`,
// gRPC will invoke it to determine the size of the buffer allocated for the
// result of decompression. A return value of -1 indicates unknown size.
type Compressor interface {
// Compress writes the data written to wc to w after compressing it. If an
// error occurs while initializing the compressor, that error is returned

View File

@@ -0,0 +1,28 @@
/*
*
* Copyright 2025 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/
// Package internal contains code internal to the encoding package.
package internal
// RegisterCompressorForTesting registers a compressor in the global compressor
// registry. It returns a cleanup function that should be called at the end
// of the test to unregister the compressor.
//
// This prevents compressors registered in one test from appearing in the
// encoding headers of subsequent tests.
var RegisterCompressorForTesting any // func RegisterCompressor(c Compressor) func()

View File

@@ -46,9 +46,25 @@ func (c *codecV2) Marshal(v any) (data mem.BufferSlice, err error) {
return nil, fmt.Errorf("proto: failed to marshal, message is %T, want proto.Message", v)
}
// Important: if we remove this Size call then we cannot use
// UseCachedSize in MarshalOptions below.
size := proto.Size(vv)
// MarshalOptions with UseCachedSize allows reusing the result from the
// previous Size call. This is safe here because:
//
// 1. We just computed the size.
// 2. We assume the message is not being mutated concurrently.
//
// Important: If the proto.Size call above is removed, using UseCachedSize
// becomes unsafe and may lead to incorrect marshaling.
//
// For more details, see the doc of UseCachedSize:
// https://pkg.go.dev/google.golang.org/protobuf/proto#MarshalOptions
marshalOptions := proto.MarshalOptions{UseCachedSize: true}
if mem.IsBelowBufferPoolingThreshold(size) {
buf, err := proto.Marshal(vv)
buf, err := marshalOptions.Marshal(vv)
if err != nil {
return nil, err
}
@@ -56,7 +72,7 @@ func (c *codecV2) Marshal(v any) (data mem.BufferSlice, err error) {
} else {
pool := mem.DefaultBufferPool()
buf := pool.Get(size)
if _, err := (proto.MarshalOptions{}).MarshalAppend((*buf)[:0], vv); err != nil {
if _, err := marshalOptions.MarshalAppend((*buf)[:0], vv); err != nil {
pool.Put(buf)
return nil, err
}

View File

@@ -75,6 +75,8 @@ const (
MetricTypeIntHisto
MetricTypeFloatHisto
MetricTypeIntGauge
MetricTypeIntUpDownCount
MetricTypeIntAsyncGauge
)
// Int64CountHandle is a typed handle for a int count metric. This handle
@@ -93,6 +95,23 @@ func (h *Int64CountHandle) Record(recorder MetricsRecorder, incr int64, labels .
recorder.RecordInt64Count(h, incr, labels...)
}
// Int64UpDownCountHandle is a typed handle for an int up-down counter metric.
// This handle is passed at the recording point in order to know which metric
// to record on.
type Int64UpDownCountHandle MetricDescriptor
// Descriptor returns the int64 up-down counter handle typecast to a pointer to a
// MetricDescriptor.
func (h *Int64UpDownCountHandle) Descriptor() *MetricDescriptor {
return (*MetricDescriptor)(h)
}
// Record records the int64 up-down counter value on the metrics recorder provided.
// The value 'v' can be positive to increment or negative to decrement.
func (h *Int64UpDownCountHandle) Record(recorder MetricsRecorder, v int64, labels ...string) {
recorder.RecordInt64UpDownCount(h, v, labels...)
}
// Float64CountHandle is a typed handle for a float count metric. This handle is
// passed at the recording point in order to know which metric to record on.
type Float64CountHandle MetricDescriptor
@@ -154,6 +173,30 @@ func (h *Int64GaugeHandle) Record(recorder MetricsRecorder, incr int64, labels .
recorder.RecordInt64Gauge(h, incr, labels...)
}
// AsyncMetric is a marker interface for asynchronous metric types.
type AsyncMetric interface {
isAsync()
Descriptor() *MetricDescriptor
}
// Int64AsyncGaugeHandle is a typed handle for an int gauge metric. This handle is
// passed at the recording point in order to know which metric to record on.
type Int64AsyncGaugeHandle MetricDescriptor
// isAsync implements the AsyncMetric interface.
func (h *Int64AsyncGaugeHandle) isAsync() {}
// Descriptor returns the int64 gauge handle typecast to a pointer to a
// MetricDescriptor.
func (h *Int64AsyncGaugeHandle) Descriptor() *MetricDescriptor {
return (*MetricDescriptor)(h)
}
// Record records the int64 gauge value on the metrics recorder provided.
func (h *Int64AsyncGaugeHandle) Record(recorder AsyncMetricsRecorder, value int64, labels ...string) {
recorder.RecordInt64AsyncGauge(h, value, labels...)
}
// registeredMetrics are the registered metric descriptor names.
var registeredMetrics = make(map[string]bool)
@@ -249,6 +292,35 @@ func RegisterInt64Gauge(descriptor MetricDescriptor) *Int64GaugeHandle {
return (*Int64GaugeHandle)(descPtr)
}
// RegisterInt64UpDownCount registers the metric description onto the global registry.
// It returns a typed handle to use for recording data.
//
// NOTE: this function must only be called during initialization time (i.e. in
// an init() function), and is not thread-safe. If multiple metrics are
// registered with the same name, this function will panic.
func RegisterInt64UpDownCount(descriptor MetricDescriptor) *Int64UpDownCountHandle {
registerMetric(descriptor.Name, descriptor.Default)
// Set the specific metric type for the up-down counter
descriptor.Type = MetricTypeIntUpDownCount
descPtr := &descriptor
metricsRegistry[descriptor.Name] = descPtr
return (*Int64UpDownCountHandle)(descPtr)
}
// RegisterInt64AsyncGauge registers the metric description onto the global registry.
// It returns a typed handle to use for recording data.
//
// NOTE: this function must only be called during initialization time (i.e. in
// an init() function), and is not thread-safe. If multiple metrics are
// registered with the same name, this function will panic.
func RegisterInt64AsyncGauge(descriptor MetricDescriptor) *Int64AsyncGaugeHandle {
registerMetric(descriptor.Name, descriptor.Default)
descriptor.Type = MetricTypeIntAsyncGauge
descPtr := &descriptor
metricsRegistry[descriptor.Name] = descPtr
return (*Int64AsyncGaugeHandle)(descPtr)
}
// snapshotMetricsRegistryForTesting snapshots the global data of the metrics
// registry. Returns a cleanup function that sets the metrics registry to its
// original state.

View File

@@ -19,9 +19,13 @@
// Package stats contains experimental metrics/stats API's.
package stats
import "google.golang.org/grpc/stats"
import (
"google.golang.org/grpc/internal"
"google.golang.org/grpc/stats"
)
// MetricsRecorder records on metrics derived from metric registry.
// Implementors must embed UnimplementedMetricsRecorder.
type MetricsRecorder interface {
// RecordInt64Count records the measurement alongside labels on the int
// count associated with the provided handle.
@@ -38,6 +42,49 @@ type MetricsRecorder interface {
// RecordInt64Gauge records the measurement alongside labels on the int
// gauge associated with the provided handle.
RecordInt64Gauge(handle *Int64GaugeHandle, incr int64, labels ...string)
// RecordInt64UpDownCounter records the measurement alongside labels on the int
// count associated with the provided handle.
RecordInt64UpDownCount(handle *Int64UpDownCountHandle, incr int64, labels ...string)
// RegisterAsyncReporter registers a reporter to produce metric values for
// only the listed descriptors. The returned function must be called when
// the metrics are no longer needed, which will remove the reporter. The
// returned method needs to be idempotent and concurrent safe.
RegisterAsyncReporter(reporter AsyncMetricReporter, descriptors ...AsyncMetric) func()
// EnforceMetricsRecorderEmbedding is included to force implementers to embed
// another implementation of this interface, allowing gRPC to add methods
// without breaking users.
internal.EnforceMetricsRecorderEmbedding
}
// AsyncMetricReporter is an interface for types that record metrics asynchronously
// for the set of descriptors they are registered with. The AsyncMetricsRecorder
// parameter is used to record values for these metrics.
//
// Implementations must make unique recordings across all registered
// AsyncMetricReporters. Meaning, they should not report values for a metric with
// the same attributes as another AsyncMetricReporter will report.
//
// Implementations must be concurrent-safe.
type AsyncMetricReporter interface {
// Report records metric values using the provided recorder.
Report(AsyncMetricsRecorder) error
}
// AsyncMetricReporterFunc is an adapter to allow the use of ordinary functions as
// AsyncMetricReporters.
type AsyncMetricReporterFunc func(AsyncMetricsRecorder) error
// Report calls f(r).
func (f AsyncMetricReporterFunc) Report(r AsyncMetricsRecorder) error {
return f(r)
}
// AsyncMetricsRecorder records on asynchronous metrics derived from metric registry.
type AsyncMetricsRecorder interface {
// RecordInt64AsyncGauge records the measurement alongside labels on the int
// count associated with the provided handle asynchronously
RecordInt64AsyncGauge(handle *Int64AsyncGaugeHandle, incr int64, labels ...string)
}
// Metrics is an experimental legacy alias of the now-stable stats.MetricSet.
@@ -52,3 +99,33 @@ type Metric = string
func NewMetrics(metrics ...Metric) *Metrics {
return stats.NewMetricSet(metrics...)
}
// UnimplementedMetricsRecorder must be embedded to have forward compatible implementations.
type UnimplementedMetricsRecorder struct {
internal.EnforceMetricsRecorderEmbedding
}
// RecordInt64Count provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordInt64Count(*Int64CountHandle, int64, ...string) {}
// RecordFloat64Count provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordFloat64Count(*Float64CountHandle, float64, ...string) {}
// RecordInt64Histo provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordInt64Histo(*Int64HistoHandle, int64, ...string) {}
// RecordFloat64Histo provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordFloat64Histo(*Float64HistoHandle, float64, ...string) {}
// RecordInt64Gauge provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordInt64Gauge(*Int64GaugeHandle, int64, ...string) {}
// RecordInt64UpDownCount provides a no-op implementation.
func (UnimplementedMetricsRecorder) RecordInt64UpDownCount(*Int64UpDownCountHandle, int64, ...string) {
}
// RegisterAsyncReporter provides a no-op implementation.
func (UnimplementedMetricsRecorder) RegisterAsyncReporter(AsyncMetricReporter, ...AsyncMetric) func() {
// No-op: Return an empty function to ensure caller doesn't panic on nil function call
return func() {}
}

View File

@@ -17,7 +17,7 @@
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
// protoc-gen-go v1.36.5
// protoc-gen-go v1.36.10
// protoc v5.27.1
// source: grpc/health/v1/health.proto
@@ -261,63 +261,29 @@ func (x *HealthListResponse) GetStatuses() map[string]*HealthCheckResponse {
var File_grpc_health_v1_health_proto protoreflect.FileDescriptor
var file_grpc_health_v1_health_proto_rawDesc = string([]byte{
0x0a, 0x1b, 0x67, 0x72, 0x70, 0x63, 0x2f, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2f, 0x76, 0x31,
0x2f, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x12, 0x0e, 0x67,
0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x22, 0x2e, 0x0a,
0x12, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65, 0x63, 0x6b, 0x52, 0x65, 0x71, 0x75,
0x65, 0x73, 0x74, 0x12, 0x18, 0x0a, 0x07, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x18, 0x01,
0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x22, 0xb1, 0x01,
0x0a, 0x13, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65, 0x63, 0x6b, 0x52, 0x65, 0x73,
0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x49, 0x0a, 0x06, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x18,
0x01, 0x20, 0x01, 0x28, 0x0e, 0x32, 0x31, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61,
0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65,
0x63, 0x6b, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x2e, 0x53, 0x65, 0x72, 0x76, 0x69,
0x6e, 0x67, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x52, 0x06, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73,
0x22, 0x4f, 0x0a, 0x0d, 0x53, 0x65, 0x72, 0x76, 0x69, 0x6e, 0x67, 0x53, 0x74, 0x61, 0x74, 0x75,
0x73, 0x12, 0x0b, 0x0a, 0x07, 0x55, 0x4e, 0x4b, 0x4e, 0x4f, 0x57, 0x4e, 0x10, 0x00, 0x12, 0x0b,
0x0a, 0x07, 0x53, 0x45, 0x52, 0x56, 0x49, 0x4e, 0x47, 0x10, 0x01, 0x12, 0x0f, 0x0a, 0x0b, 0x4e,
0x4f, 0x54, 0x5f, 0x53, 0x45, 0x52, 0x56, 0x49, 0x4e, 0x47, 0x10, 0x02, 0x12, 0x13, 0x0a, 0x0f,
0x53, 0x45, 0x52, 0x56, 0x49, 0x43, 0x45, 0x5f, 0x55, 0x4e, 0x4b, 0x4e, 0x4f, 0x57, 0x4e, 0x10,
0x03, 0x22, 0x13, 0x0a, 0x11, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x4c, 0x69, 0x73, 0x74, 0x52,
0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x22, 0xc4, 0x01, 0x0a, 0x12, 0x48, 0x65, 0x61, 0x6c, 0x74,
0x68, 0x4c, 0x69, 0x73, 0x74, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x4c, 0x0a,
0x08, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x65, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32,
0x30, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31,
0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x4c, 0x69, 0x73, 0x74, 0x52, 0x65, 0x73, 0x70, 0x6f,
0x6e, 0x73, 0x65, 0x2e, 0x53, 0x74, 0x61, 0x74, 0x75, 0x73, 0x65, 0x73, 0x45, 0x6e, 0x74, 0x72,
0x79, 0x52, 0x08, 0x73, 0x74, 0x61, 0x74, 0x75, 0x73, 0x65, 0x73, 0x1a, 0x60, 0x0a, 0x0d, 0x53,
0x74, 0x61, 0x74, 0x75, 0x73, 0x65, 0x73, 0x45, 0x6e, 0x74, 0x72, 0x79, 0x12, 0x10, 0x0a, 0x03,
0x6b, 0x65, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x03, 0x6b, 0x65, 0x79, 0x12, 0x39,
0x0a, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x23, 0x2e,
0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x2e, 0x48,
0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65, 0x63, 0x6b, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e,
0x73, 0x65, 0x52, 0x05, 0x76, 0x61, 0x6c, 0x75, 0x65, 0x3a, 0x02, 0x38, 0x01, 0x32, 0xfd, 0x01,
0x0a, 0x06, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x12, 0x50, 0x0a, 0x05, 0x43, 0x68, 0x65, 0x63,
0x6b, 0x12, 0x22, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e,
0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65, 0x63, 0x6b, 0x52, 0x65,
0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x23, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61,
0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65,
0x63, 0x6b, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x4d, 0x0a, 0x04, 0x4c, 0x69,
0x73, 0x74, 0x12, 0x21, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68,
0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x4c, 0x69, 0x73, 0x74, 0x52, 0x65,
0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x22, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61,
0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x4c, 0x69, 0x73,
0x74, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x12, 0x52, 0x0a, 0x05, 0x57, 0x61, 0x74,
0x63, 0x68, 0x12, 0x22, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68,
0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68, 0x65, 0x63, 0x6b, 0x52,
0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x1a, 0x23, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65,
0x61, 0x6c, 0x74, 0x68, 0x2e, 0x76, 0x31, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x43, 0x68,
0x65, 0x63, 0x6b, 0x52, 0x65, 0x73, 0x70, 0x6f, 0x6e, 0x73, 0x65, 0x30, 0x01, 0x42, 0x70, 0x0a,
0x11, 0x69, 0x6f, 0x2e, 0x67, 0x72, 0x70, 0x63, 0x2e, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e,
0x76, 0x31, 0x42, 0x0b, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x50, 0x72, 0x6f, 0x74, 0x6f, 0x50,
0x01, 0x5a, 0x2c, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, 0x67, 0x6f, 0x6c, 0x61, 0x6e, 0x67,
0x2e, 0x6f, 0x72, 0x67, 0x2f, 0x67, 0x72, 0x70, 0x63, 0x2f, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68,
0x2f, 0x67, 0x72, 0x70, 0x63, 0x5f, 0x68, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x5f, 0x76, 0x31, 0xa2,
0x02, 0x0c, 0x47, 0x72, 0x70, 0x63, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x56, 0x31, 0xaa, 0x02,
0x0e, 0x47, 0x72, 0x70, 0x63, 0x2e, 0x48, 0x65, 0x61, 0x6c, 0x74, 0x68, 0x2e, 0x56, 0x31, 0x62,
0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33,
})
const file_grpc_health_v1_health_proto_rawDesc = "" +
"\n" +
"\x1bgrpc/health/v1/health.proto\x12\x0egrpc.health.v1\".\n" +
"\x12HealthCheckRequest\x12\x18\n" +
"\aservice\x18\x01 \x01(\tR\aservice\"\xb1\x01\n" +
"\x13HealthCheckResponse\x12I\n" +
"\x06status\x18\x01 \x01(\x0e21.grpc.health.v1.HealthCheckResponse.ServingStatusR\x06status\"O\n" +
"\rServingStatus\x12\v\n" +
"\aUNKNOWN\x10\x00\x12\v\n" +
"\aSERVING\x10\x01\x12\x0f\n" +
"\vNOT_SERVING\x10\x02\x12\x13\n" +
"\x0fSERVICE_UNKNOWN\x10\x03\"\x13\n" +
"\x11HealthListRequest\"\xc4\x01\n" +
"\x12HealthListResponse\x12L\n" +
"\bstatuses\x18\x01 \x03(\v20.grpc.health.v1.HealthListResponse.StatusesEntryR\bstatuses\x1a`\n" +
"\rStatusesEntry\x12\x10\n" +
"\x03key\x18\x01 \x01(\tR\x03key\x129\n" +
"\x05value\x18\x02 \x01(\v2#.grpc.health.v1.HealthCheckResponseR\x05value:\x028\x012\xfd\x01\n" +
"\x06Health\x12P\n" +
"\x05Check\x12\".grpc.health.v1.HealthCheckRequest\x1a#.grpc.health.v1.HealthCheckResponse\x12M\n" +
"\x04List\x12!.grpc.health.v1.HealthListRequest\x1a\".grpc.health.v1.HealthListResponse\x12R\n" +
"\x05Watch\x12\".grpc.health.v1.HealthCheckRequest\x1a#.grpc.health.v1.HealthCheckResponse0\x01Bp\n" +
"\x11io.grpc.health.v1B\vHealthProtoP\x01Z,google.golang.org/grpc/health/grpc_health_v1\xa2\x02\fGrpcHealthV1\xaa\x02\x0eGrpc.Health.V1b\x06proto3"
var (
file_grpc_health_v1_health_proto_rawDescOnce sync.Once

View File

@@ -17,7 +17,7 @@
// Code generated by protoc-gen-go-grpc. DO NOT EDIT.
// versions:
// - protoc-gen-go-grpc v1.5.1
// - protoc-gen-go-grpc v1.6.0
// - protoc v5.27.1
// source: grpc/health/v1/health.proto
@@ -188,13 +188,13 @@ type HealthServer interface {
type UnimplementedHealthServer struct{}
func (UnimplementedHealthServer) Check(context.Context, *HealthCheckRequest) (*HealthCheckResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method Check not implemented")
return nil, status.Error(codes.Unimplemented, "method Check not implemented")
}
func (UnimplementedHealthServer) List(context.Context, *HealthListRequest) (*HealthListResponse, error) {
return nil, status.Errorf(codes.Unimplemented, "method List not implemented")
return nil, status.Error(codes.Unimplemented, "method List not implemented")
}
func (UnimplementedHealthServer) Watch(*HealthCheckRequest, grpc.ServerStreamingServer[HealthCheckResponse]) error {
return status.Errorf(codes.Unimplemented, "method Watch not implemented")
return status.Error(codes.Unimplemented, "method Watch not implemented")
}
func (UnimplementedHealthServer) testEmbeddedByValue() {}

View File

@@ -97,8 +97,12 @@ type StreamServerInfo struct {
IsServerStream bool
}
// StreamServerInterceptor provides a hook to intercept the execution of a streaming RPC on the server.
// info contains all the information of this RPC the interceptor can operate on. And handler is the
// service method implementation. It is the responsibility of the interceptor to invoke handler to
// complete the RPC.
// StreamServerInterceptor provides a hook to intercept the execution of a
// streaming RPC on the server.
//
// srv is the service implementation on which the RPC was invoked, and needs to
// be passed to handler, and not used otherwise. ss is the server side of the
// stream. info contains all the information of this RPC the interceptor can
// operate on. And handler is the service method implementation. It is the
// responsibility of the interceptor to invoke handler to complete the RPC.
type StreamServerInterceptor func(srv any, ss ServerStream, info *StreamServerInfo, handler StreamHandler) error

View File

@@ -67,6 +67,10 @@ type Balancer struct {
// balancerCurrent before the UpdateSubConnState is called on the
// balancerCurrent.
currentMu sync.Mutex
// activeGoroutines tracks all the goroutines that this balancer has started
// and that should be waited on when the balancer closes.
activeGoroutines sync.WaitGroup
}
// swap swaps out the current lb with the pending lb and updates the ClientConn.
@@ -76,7 +80,9 @@ func (gsb *Balancer) swap() {
cur := gsb.balancerCurrent
gsb.balancerCurrent = gsb.balancerPending
gsb.balancerPending = nil
gsb.activeGoroutines.Add(1)
go func() {
defer gsb.activeGoroutines.Done()
gsb.currentMu.Lock()
defer gsb.currentMu.Unlock()
cur.Close()
@@ -223,15 +229,7 @@ func (gsb *Balancer) ExitIdle() {
// There is no need to protect this read with a mutex, as the write to the
// Balancer field happens in SwitchTo, which completes before this can be
// called.
if ei, ok := balToUpdate.Balancer.(balancer.ExitIdler); ok {
ei.ExitIdle()
return
}
gsb.mu.Lock()
defer gsb.mu.Unlock()
for sc := range balToUpdate.subconns {
sc.Connect()
}
balToUpdate.ExitIdle()
}
// updateSubConnState forwards the update to the appropriate child.
@@ -282,6 +280,7 @@ func (gsb *Balancer) Close() {
currentBalancerToClose.Close()
pendingBalancerToClose.Close()
gsb.activeGoroutines.Wait()
}
// balancerWrapper wraps a balancer.Balancer, and overrides some Balancer
@@ -332,7 +331,12 @@ func (bw *balancerWrapper) UpdateState(state balancer.State) {
defer bw.gsb.mu.Unlock()
bw.lastState = state
// If Close() acquires the mutex before UpdateState(), the balancer
// will already have been removed from the current or pending state when
// reaching this point.
if !bw.gsb.balancerCurrentOrPending(bw) {
// Returning here ensures that (*Balancer).swap() is not invoked after
// (*Balancer).Close() and therefore prevents "use after close".
return
}

View File

@@ -0,0 +1,66 @@
/*
*
* Copyright 2025 gRPC authors.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/
// Package weight contains utilities to manage endpoint weights. Weights are
// used by LB policies such as ringhash to distribute load across multiple
// endpoints.
package weight
import (
"fmt"
"google.golang.org/grpc/resolver"
)
// attributeKey is the type used as the key to store EndpointInfo in the
// Attributes field of resolver.Endpoint.
type attributeKey struct{}
// EndpointInfo will be stored in the Attributes field of Endpoints in order to
// use the ringhash balancer.
type EndpointInfo struct {
Weight uint32
}
// Equal allows the values to be compared by Attributes.Equal.
func (a EndpointInfo) Equal(o any) bool {
oa, ok := o.(EndpointInfo)
return ok && oa.Weight == a.Weight
}
// Set returns a copy of endpoint in which the Attributes field is updated with
// EndpointInfo.
func Set(endpoint resolver.Endpoint, epInfo EndpointInfo) resolver.Endpoint {
endpoint.Attributes = endpoint.Attributes.WithValue(attributeKey{}, epInfo)
return endpoint
}
// String returns a human-readable representation of EndpointInfo.
// This method is intended for logging, testing, and debugging purposes only.
// Do not rely on the output format, as it is not guaranteed to remain stable.
func (a EndpointInfo) String() string {
return fmt.Sprintf("Weight: %d", a.Weight)
}
// FromEndpoint returns the EndpointInfo stored in the Attributes field of an
// endpoint. It returns an empty EndpointInfo if attribute is not found.
func FromEndpoint(endpoint resolver.Endpoint) EndpointInfo {
v := endpoint.Attributes.Value(attributeKey{})
ei, _ := v.(EndpointInfo)
return ei
}

View File

@@ -83,6 +83,7 @@ func (b *Unbounded) Load() {
default:
}
} else if b.closing && !b.closed {
b.closed = true
close(b.c)
}
}

View File

@@ -194,7 +194,7 @@ func (r RefChannelType) String() string {
// If channelz is not turned ON, this will simply log the event descriptions.
func AddTraceEvent(l grpclog.DepthLoggerV2, e Entity, depth int, desc *TraceEvent) {
// Log only the trace description associated with the bottom most entity.
d := fmt.Sprintf("[%s]%s", e, desc.Desc)
d := fmt.Sprintf("[%s] %s", e, desc.Desc)
switch desc.Severity {
case CtUnknown, CtInfo:
l.InfoDepth(depth+1, d)

View File

@@ -20,20 +20,6 @@ import (
"context"
)
// requestInfoKey is a struct to be used as the key to store RequestInfo in a
// context.
type requestInfoKey struct{}
// NewRequestInfoContext creates a context with ri.
func NewRequestInfoContext(ctx context.Context, ri any) context.Context {
return context.WithValue(ctx, requestInfoKey{}, ri)
}
// RequestInfoFromContext extracts the RequestInfo from ctx.
func RequestInfoFromContext(ctx context.Context) any {
return ctx.Value(requestInfoKey{})
}
// clientHandshakeInfoKey is a struct used as the key to store
// ClientHandshakeInfo in a context.
type clientHandshakeInfoKey struct{}

Some files were not shown because too many files have changed in this diff Show More