Compare commits

132 Commits

Author SHA1 Message Date
Manuel Huber
660e3bb653 gpu: Obsolete the NVIDIA initrd build
As the NVIDIA stack has shifted to using an image for both the
confidential and non-confidential variants, we retire the initrd
build.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 21:29:58 -04:00
Aurélien Bombo
f8e234c6f9 Merge pull request #12650 from kata-containers/sprt/remove-csi
ci: Stop building/deploying CSI driver
2026-03-16 16:53:02 -05:00
Steve Horsman
294c367063 Merge pull request #12668 from manuelh-dev/release/3.28.0
release: Bump version to 3.28.0
2026-03-16 19:47:12 +00:00
Manuel Huber
5210584f95 release: Bump version to 3.28.0
Bump VERSION and helm-charts versions.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:52:35 -07:00
Manuel Huber
e13748f46d tests: Adapt trusted ephemeral storage test
With the new CDH version, the LUKS header is moved off of the disk
into guest memory. We hence adapt the test's filesystem type checks.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
5bbc0abb81 tests: use pre-created, signed sealed secrets
With signature support for sealed secret, use pre-created signed
sealed secrets and provision the signing public key to the KBS.

Add instructions for re-creating these signed secrets.

Improve k8s-sealed-secrets.bats by reducing repeated kubectl logs
calls. A test run showed a SIGPIPE error on one of the grep-logs
while the printouts of the initial kubectl logs invocation showed
that the expected values were actually in the logs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
a9b222f91e gpu: Update chiseled rootfs with new CDH deps
With CDH requiring libcryptsetup, mkfs.ext4, dd, and their
dependencies, we will need to update the chiseled NVIDIA rootfs
accordingly.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Manuel Huber
169f92ff09 agent: cdh: Update CDH and API
With the new CDH version, the secure_mount API changes.
Further, the new CDH version no longer uses the luks-encrypt-storage
script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt
some of the CDH and Kata component build steps.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-16 09:43:17 -07:00
Alex Lyn
ef5db0a01f Merge pull request #12607 from zvonkok/system-map
kernel: Ship System.map as part of the kernel build
2026-03-16 09:37:44 +08:00
Zvonko Kaiser
99f32de1e5 kata-deploy: Update RuntimeClass PodOverhead
Align the podOverhead with the default_memory updated
in the previous commit.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
6a853a9684 gpu: Bump NVRC
We have a new release; add this one to the next
Kata release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>

Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
8ff5d164c6 runtime: make CDI annotation vendor-agnostic with lookup table
Replace hardcoded NVIDIA vendor ID (0x10de) and class (0x030) checks
with a vendor-agnostic lookup table (cdiDeviceKind) that maps PCI
vendor/class pairs to CDI device kinds. This makes it straightforward
to add support for new device types by adding entries to the table.

Refactor siblingAnnotation to resolve device BDFs once upfront and
reuse them for both CDI type detection and sibling matching, eliminating
redundant sysfs reads. Devices not in the lookup table (e.g. NVSwitches)
are skipped with errNoSiblingFound, while known device types that fail
to match a sibling produce a hard error.

Consolidate the hot-plug and cold-plug device loops into a single loop
over extracted container paths, removing duplicated filtering logic.

Export GetPCIDeviceProperty from the device drivers package to allow
vendor/class lookup from sysfs in the container annotation path.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
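
The lookup-table idea from 8ff5d164c6 can be sketched as follows. This is a minimal Python illustration, not the runtime's Go code: the table name mirrors `cdiDeviceKind` from the commit message, while the key layout, the class-prefix matching, and the `nvidia.com/gpu` kind string are assumptions made for the example.

```python
# Hypothetical sketch of a vendor-agnostic PCI -> CDI kind lookup.
# Keys are (vendor ID, class-code prefix); values are CDI device kinds.
CDI_DEVICE_KIND = {
    # Assumed entry: NVIDIA vendor ID with a display-controller class prefix.
    ("0x10de", "0x030"): "nvidia.com/gpu",
}


def cdi_kind_for(vendor: str, pci_class: str):
    """Return the CDI kind for a device, or None if it is not in the table."""
    vendor = vendor.lower()
    pci_class = pci_class.lower()
    for (known_vendor, class_prefix), kind in CDI_DEVICE_KIND.items():
        if vendor == known_vendor and pci_class.startswith(class_prefix):
            return kind
    # Devices that are not in the table are skipped by the caller.
    return None
```

Under these assumptions, supporting a new device type means adding one more entry to the table rather than another hardcoded branch.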
Zvonko Kaiser
d4c21f50b5 gpu: Bump default memory to 8G for GPU runtimes
We need enough initial memory to prepare more complex
platforms like HGX H100 or HGX B200 systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
5c9683f006 gpu: Remove devtmpfs.mount=0
With the newest NVRC release this is solved and does
not need to be overridden.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
d22c314e91 gpu: Increase dial_timeout=1200
For cold-plug when running with nerdctl, the timeouts from the config
are used. Increase the dial_timeout (e.g. for CreateSandbox) to match
create_container_timeout.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Zvonko Kaiser
7fe84c8038 gpu: HGX Rootfs Fixes
Various smaller fixes to enable HGX systems.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-15 09:53:32 -07:00
Joji Mekkattuparamban
1fd66db271 nvidia-gpu: add missing libraries to rootfs
Added the missing packages to the nvidia rootfs.

Fixes #12534

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-03-13 16:24:32 -07:00
Dan Mihai
9332b75c04 Merge pull request #12661 from stevenhorsman/runtime-go-1.25.8
runtime: bump go.mod version
2026-03-13 14:06:08 -07:00
Zvonko Kaiser
d382379571 kernel: Ship System.map as part of the kernel build
Some use-cases need the System.map of the running kernel,
ship it via kernel-artifact.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-13 19:27:18 +00:00
Manuel Huber
4a7022d2f4 tests: nvidia: call genpolicy auth for all tests
Call the setup_genpolicy_registry_auth in run_kubernetes_nv_tests.sh.
Authenticate before exercising any tests.

Recently, we have seen UnauthorizedError messages for the CUDA
vectorAdd image. While this image is not gated behind authentication,
rate limiting may be a possible issue.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-13 09:03:01 -07:00
Zvonko Kaiser
4c450a5b01 Merge pull request #12648 from manuelh-dev/mahuber/trustee-upgrade
versions: bump trustee to latest version
2026-03-12 14:09:15 -04:00
Steve Horsman
7d2e18575c Merge pull request #12343 from zvonkok/release-model
doc: Release model update
2026-03-12 14:44:51 +00:00
Zvonko Kaiser
7f662662cf lint: Fix 80 char column size
Make markdownlint happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-12 12:03:29 +00:00
Zvonko Kaiser
6e03a95730 doc: Update Release Process
Add how Kata is doing the rolling release.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-03-12 12:03:29 +00:00
stevenhorsman
f25fa6ab25 runtime: bump go.mod version
Update the runtime's go.mod go version to 1.25.8 to
keep in sync with versions.yaml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-12 08:53:40 +00:00
Steve Horsman
a29eb3751a Merge pull request #12517 from kata-containers/osv-scanner-bump-2.3.3
workflows: Bump OSV scanner
2026-03-12 08:48:52 +00:00
stevenhorsman
064a960aaa workflows: Bump OSV scanner
Bump to the latest version to pick up bug fixes

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-12 07:00:11 +00:00
Steve Horsman
f41edcb4c0 Merge pull request #12653 from kata-containers/dependabot/cargo/src/tools/agent-ctl/quinn-proto-0.11.14
build(deps): bump quinn-proto from 0.11.8 to 0.11.14 in /src/tools/agent-ctl
2026-03-12 06:53:59 +00:00
Manuel Huber
0926c92aa0 versions: bump trustee to latest version
Ingest various recent fixes and dependency updates.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-11 14:45:42 -07:00
Manuel Huber
8162d15b46 nvidia: fix invalid CTK reference
Use proper reference from versions yaml structure.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-03-11 12:49:29 -07:00
Aurélien Bombo
32444737b5 gatekeeper: Remove csi-kata-directvolume build from required tests
Since we don't build that anymore.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
64aed13d5f Revert "ci: Add no-op step to compile CSI driver"
This reverts commit e43c59a2c6.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
dd2c4c0db3 Revert "coco: ci: Add no-op steps to deploy CSI driver"
This reverts commit 5e4990bcf5.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
Aurélien Bombo
d598e0baf1 Revert "ci: Implement build step for CSI driver"
This partially reverts commit fb87bf221f.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-11 12:55:23 -05:00
dependabot[bot]
d366d103cc build(deps): bump quinn-proto in /src/tools/agent-ctl
Bumps [quinn-proto](https://github.com/quinn-rs/quinn) from 0.11.8 to 0.11.14.
- [Release notes](https://github.com/quinn-rs/quinn/releases)
- [Commits](https://github.com/quinn-rs/quinn/compare/quinn-proto-0.11.8...quinn-proto-0.11.14)

---
updated-dependencies:
- dependency-name: quinn-proto
  dependency-version: 0.11.14
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-11 16:04:34 +00:00
Dan Mihai
04f180434e Merge pull request #12640 from burgerdev/genpolicy-workspace
genpolicy: add to Cargo workspace
2026-03-11 09:02:39 -07:00
Steve Horsman
ba0f5b98fe Merge pull request #12643 from stevenhorsman/bump-golang-to-1.25.8
versions: bump golang to 1.25.8
2026-03-11 08:53:21 +00:00
Markus Rudy
cf7d4c33b3 kata-deploy: fix binary location for genpolicy
Moving the genpolicy crate into the root workspace causes the build
outputs to go into the root workspace's target directory, instead of
src/tools/genpolicy/target, invalidating assumptions made by the
kata-deploy-binaries script.

This commit adds a special case for the lookup path of the genpolicy
binary, and fixes two bugs that made identifying this problem harder.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
221a22bd7d genpolicy: ignore RUSTSEC-2024-0320
The yaml-rust dependency is unmaintained, but no suitable alternatives
exist. We log an exception for this now and will revisit the topic after
some time.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
6643b258bb genpolicy: update oci-client to v0.16.1
The older version we used transitively depends on an unmaintained crate.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:48 +01:00
Markus Rudy
8dfeeea924 genpolicy: add to Cargo workspace
This commit adds the genpolicy utility to the root workspace. For now,
only dependencies that are already in the root workspace are consumed
from there, the genpolicy-specific ones should be added later.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:30:46 +01:00
Markus Rudy
fc4eaf8b66 runtime-rs: specify the subpackage to build
Before this change, `make test` for runtime-rs used to test all crates
in the root workspace (due to the `--all` flag). This was not intended
but happened to be mostly working. However, genpolicy needs additional
steps before it can build, so this behavior blocks adding genpolicy to
the root workspace.

The solution here is to only build the intended packages. For the build
and run commands, this is the runtime-rs crate itself. For testing, we
need to include the sub-crates, too, which needs a bit of cargo metadata
scraping.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-03-11 09:28:24 +01:00
Aurélien Bombo
b6c60d9229 Merge pull request #10559 from sprt/conf-local-storage
coco: Implement trusted ephemeral data storage
2026-03-10 10:39:40 -05:00
Dan Mihai
f9a8eb6ecc genpolicy: allow_mount improvements for emptyDir
1. Reduce the complexity of the new allow_mount rules for emptyDir.

2. Reverse the order of the two allow_mount versions, as a hint to the
   rego engine that the first version is more often matching the input.

3. Remove `p_mount.source != ""` from mount_source_allows, because:
 - Policy rules typically test the values from input, not values read
   from Policy.
 - mount_source_allows is no longer called for emptyDir mounts after
   these changes, so p_mount.source is not empty.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-03-09 14:52:17 -05:00
Fabiano Fidêncio
374b0abe29 tests: Fix kubelet data dir for k0s in trusted ephemeral storage test
k0s uses /var/lib/k0s/kubelet instead of /var/lib/kubelet as its
kubelet data directory. Introduce get_kubelet_data_dir() in
tests_common.sh and use it in k8s-trusted-ephemeral-data-storage.bats
instead of hardcoding /var/lib/kubelet.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
718632bfe0 build: Add artifacts to .gitignore
This adds various files that are generated during development.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
68bdbef676 tests: Improve logging for some tests
Use modern test semantics to ease debugging.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
3dd77bf576 tests: Introduce new env variables to ease development
It can be useful to set these variables during local testing:

 * AZ_REGION: Region for the cluster.
 * AZ_NODEPOOL_TAGS: Node pool tags for the cluster.
 * GENPOLICY_BINARY: Path to the genpolicy binary.
 * GENPOLICY_SETTINGS_DIR: Directory holding the genpolicy settings.

I've also made it so that tests_common.sh modifies the duplicated
genpolicy-settings.json (used for testing) instead of the original git-tracked
one.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
aae54f704c ci: Stop deploying the CSI driver
The design moved away from CSI driver so stop deploying that.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
a98e328359 tests: Add test for trusted ephemeral data storage
This tests the feature on CoCo machines.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
9fe03fb170 genpolicy: Support trusted ephemeral data storage
 * Introduces a new cluster_config setting encrypted_emptydir defaulting to true.
 * Adapts genpolicy for encrypted emptyDirs.

Crucially, the rules.rego change checks that the mount and the storage are
well-formed together:

 * i_storage.source matches a known regex.
 * i_storage.mount_point == $(spath)/BASE64(i_storage.source)
 * i_storage.mount_point == p_storage.mount_point
 * i_storage.mount_point == i_mount.source

Note that policy enforcement is necessary to prevent rogue device injection.
E.g. the agent could not blindly encrypt all block devices as some use cases
only need dm-verity.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
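
The four consistency checks listed in 9fe03fb170 can be restated as a small predicate. This is only a Python sketch of the relationships, not the rego rules themselves: the source regex, the use of standard base64, and the dictionary shapes are assumptions for illustration.

```python
import base64
import re

# Hypothetical pattern; the real regex lives in rules.rego.
KNOWN_SOURCE = re.compile(r"^/dev/[a-z0-9]+$")


def storage_allowed(i_storage, p_storage, i_mount, spath):
    """Check that an input storage and its mount are well-formed together."""
    source = i_storage["source"]
    if not KNOWN_SOURCE.match(source):
        return False
    # Assumption: BASE64() means standard base64 of the UTF-8 source string.
    encoded = base64.b64encode(source.encode()).decode()
    mount_point = i_storage["mount_point"]
    return (
        mount_point == f"{spath}/{encoded}"
        and mount_point == p_storage["mount_point"]
        and mount_point == i_mount["source"]
    )
```

Tying the mount point to the encoded source is what lets the policy reject a storage entry that points at an unexpected device.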
Aurélien Bombo
eaa711617e agent: Support trusted ephemeral data storage
Handles block-based emptyDirs plugged via virtio-blk and virtio-scsi by
encrypting and formatting them.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
Aurélien Bombo
a4fd32a29a runtime: Support trusted ephemeral data storage
 * Introduces the `emptydir_mode` config flag to allow instructing the runtime
   to create a block device for emptyDir volumes.
 * The block device is created in the original emptyDir folder on the host
   so that Kubelet can monitor its disk usage and evict the pod if it exceeds
   its sizeLimit. This matches runc and virtio-fs.
 * The block device's disk image file is sparse to minimize host disk
   footprint.

Fixes: #10560

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
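
As a rough illustration of the sparse disk image mentioned in a4fd32a29a, the sketch below creates a file with a large apparent size without writing any data. It is a Python stand-in and not the runtime's implementation; the function name is hypothetical.

```python
import os
import tempfile


def create_sparse_image(path: str, size_bytes: int) -> None:
    """Create a file whose apparent size is size_bytes without writing data."""
    with open(path, "wb") as f:
        # Extending the file via truncate leaves the unwritten range as a
        # hole on filesystems that support sparse files.
        f.truncate(size_bytes)
```

Because blocks are allocated only as the guest writes, host disk usage tracks the data actually stored rather than the volume's nominal size.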
Alex Lyn
fb743a304c runtime: Support plugging a disk as an image file
Some VMMs support plugging a disk as an image file instead of a block device,
so we adapt the runtime to support that.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Co-authored-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-09 14:52:17 -05:00
stevenhorsman
8ae0e36737 versions: bump golang to 1.25.8
Bump the builder image and versions to resolve CVEs:
- GO-2026-4601
- GO-2026-4602
- GO-2026-4603

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-09 09:10:01 +00:00
Alex Lyn
22c4cab237 Merge pull request #12623 from Apokleos/fix-dgb-ut
runtime-rs: Fix dragonball's flaky unit tests
2026-03-09 11:38:02 +08:00
Alex Lyn
62b0f63e37 dragonball: Generate unique TAP names to avoid conflicts
The vhost-kern net unit test used a fixed TAP interface name
("test_vhosttap"). When tests run in parallel or a previous run
leaves the interface behind, TAP creation can fail with
EBUSY ("Resource busy"), making CI flaky.

Introduce a unique_tap_name() helper in the tests and use it to
generate a per-test TAP name (based on pid/thread/counter),
avoiding name collisions and stabilizing CI.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 17:33:40 +08:00
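
A helper in the spirit of `unique_tap_name()` from 62b0f63e37 might look like the sketch below. The real helper is Rust test code; this Python version only illustrates the pid/thread/counter idea, and the name format and 16-bit masking are assumptions.

```python
import itertools
import os
import threading

_counter = itertools.count()
MAX_IFNAME_LEN = 15  # Linux interface names hold at most 15 characters


def unique_tap_name(prefix: str = "tap") -> str:
    """Build a per-call TAP name from pid, thread id, and a counter."""
    # Each component is masked to 16 bits to keep the name short, so
    # uniqueness is very likely but not strictly guaranteed across processes.
    pid = os.getpid() & 0xFFFF
    tid = threading.get_ident() & 0xFFFF
    n = next(_counter) & 0xFFFF
    name = f"{prefix}{pid:04x}{tid:04x}{n:04x}"
    if len(name) > MAX_IFNAME_LEN:
        raise ValueError(f"interface name too long: {name}")
    return name
```

The counter separates repeated calls within one thread, while the pid and thread components separate parallel test runs.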
Alex Lyn
b2932f963a Merge pull request #12631 from Apokleos/fix-suffix
ci: keep mktemp output suffix stable with .yaml
2026-03-06 14:15:49 +08:00
Alex Lyn
1c8c0089da dragonball: fix flaky signal_handler test using libc::raise
The signal_handler test was intermittently failing because it used
kill(pid, sig), which sends signals asynchronously to the process.
This created a race condition where the child thread could exit and
be joined before the signal was delivered or processed.

This fix:
1. Replaces `kill` with `libc::raise` to ensure signals are delivered
   synchronously to the calling thread.
2. Reorders triggers to verify standard signals before installing
   seccomp filters.
3. Guarantees that metrics are incremented before the child thread
   terminates and is joined by the main thread.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
d0718f6001 dragonball: Fix unnecessary parentheses around type
warning: unnecessary parentheses around type
   --> src/dragonball/dbs_legacy_devices/src/serial.rs:245:39
    |
245 |         let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send + 'static)>>>> =
    |                                       ^                                   ^
    |
    = note: `#[warn(unused_parens)]` (part of `#[warn(unused)]`) on by default
help: remove these parentheses

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
b4161198ee dragonball: Remove unused imports variables in dbs_pci
Fix warnings of unused imports as below:
```
warning: unused imports: `DEVICE_ACKNOWLEDGE`, `DEVICE_DRIVER_OK`, `DEVICE_DRIVER`, `DEVICE_FEATURES_OK`, and `DEVICE_INIT`
    --> src/dragonball/dbs_pci/src/virtio_pci.rs:1177:9
     |
1177 |         DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK, DEVICE_FEATURES_OK, DEVICE_INIT,
     |         ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^
     |
     = note: `#[warn(unused_imports)]` (part of `#[warn(unused)]`) on by default
```

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
ca4e14086f runtime-rs: Fix warnings of unformatted codes
Fix warnings from unformatted code.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
ce800b7c37 dragonball: Fix flaky test_vhost_user_net_virtio_device_activate hang
The vhost-user-net tests could hang in CI because
VhostUserNet::new_server() blocks indefinitely on listener.accept()
when the slave fails to connect in time
(e.g. due to scheduler delays or flaky socket paths). This also caused
panics when connect_slave() returned None and the test unwrapped it.

Fix the tests by:
- using an absolute, unique unix socket path under `/tmp` per test run
- retrying slave connect with a deadline
- running new_server() in a separate thread and waiting via
  recv_timeout() to ensure the test never blocks indefinitely

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
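
The "run the blocking call in a separate thread and wait with a timeout" pattern from ce800b7c37 can be sketched like this. The original uses a Rust channel with `recv_timeout()`; the Python version below uses a queue instead, and `call_with_deadline` is a hypothetical name.

```python
import queue
import threading


def call_with_deadline(fn, timeout_s: float):
    """Run fn in a helper thread; return its result, or None on timeout."""
    results = queue.Queue(maxsize=1)
    # A daemon thread does not keep the process alive if fn never returns.
    worker = threading.Thread(target=lambda: results.put(fn()), daemon=True)
    worker.start()
    try:
        return results.get(timeout=timeout_s)
    except queue.Empty:
        # The caller can now fail the test instead of hanging forever.
        return None
```

In this sketch a `None` result is ambiguous if `fn` itself can return `None`; a test would normally assert on a concrete value.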
Alex Lyn
a988b10440 dragonball: Fix flaky test_vhost_user_net_virtio_device_normal hang
It aims to fix flaky test hang by implementing thread timeouts.

The `test_vhost_user_net_virtio_device_normal` was hanging in CI
when master/slave threads drifted.

This commit stabilizes the test by:
- Using `tempfile` and unique paths to ensure socket isolation.
- Adding a 5s deadline for slave connections to handle CI jitter.
- Running `new_server` in a separate thread with a `recv_timeout`
  to prevent the CI pipeline from deadlocking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
f36218d566 dragonball: Fix flaky test_inner_stream_timeout in inner backend
The `test_inner_stream_timeout` test case was prone to failure due to a
race condition between the main thread and the background handler. The
test relied on hardcoded `thread::sleep` durations, which could cause
the second read operation to time out (150ms window) before the main
thread performed its write (after a 300ms sleep) under high system load.

This commit stabilizes the test by:
1. Replacing fixed sleep durations with a `Condvar` and a `stage`
   variable to implement a deterministic state machine.
2. Synchronizing the threads so that the main thread only writes data
   after the background handler has confirmed it is ready or has
   completed its previous phase.
3. Ensuring the read timeout is explicitly managed between different
   validation stages to prevent accidental `TimedOut` errors.

This change eliminates the flakiness and ensures the test passes
consistently across different CI environments.

Fixes #12618

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
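
The Condvar-plus-stage approach described in f36218d566 can be illustrated with a small state machine. This is a Python sketch using `threading.Condition`, not the dragonball test; the stage numbers and the payload are made up for the example.

```python
import threading


def run_staged_exchange():
    """Hand data between two threads using a stage variable instead of sleeps."""
    cond = threading.Condition()
    state = {"stage": 0, "data": None}
    log = []

    def handler():
        with cond:
            state["stage"] = 1  # stage 1: handler is ready to read
            cond.notify_all()
            cond.wait_for(lambda: state["stage"] == 2)  # wait for the write
            log.append(f"handler read {state['data']}")
            state["stage"] = 3  # stage 3: read completed
            cond.notify_all()

    worker = threading.Thread(target=handler)
    worker.start()
    with cond:
        # Write only after the handler has confirmed it is ready.
        cond.wait_for(lambda: state["stage"] == 1)
        state["data"] = "payload"
        log.append("main wrote payload")
        state["stage"] = 2
        cond.notify_all()
        cond.wait_for(lambda: state["stage"] == 3)
    worker.join()
    return log
```

Because every transition is gated on the stage value, the ordering no longer depends on how long either thread happens to be scheduled out.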
Alex Lyn
c8a39ad28d dragonball: Fix flaky test_epoll_manager by improving synchronization
This commit addresses the "Infinite loop in epoll_manager
tests" issue and improves stability.

Root causes as below:
1. Using `handle_events(-1)` caused the worker thread to block forever
   if an event was missed or if the internal `kick()` signal was not
   accounted for correctly.
2. Relying on event counts was unreliable because internal signals could
   change the total count, causing the test to enter an infinite loop.
3. An EventFd registered with `EventSet::OUT` is often continuously ready,
   leading to non-deterministic trigger behavior.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:28:56 +08:00
Alex Lyn
a35dcf952e ci: Fix YAML parsing flakiness caused by mktemp random suffixes
In some CI runs, `mktemp` generates random characters that accidentally
form file extensions like `.cSV` or `.Xml`. This triggers downstream
parsing errors because the YAML content is misidentified as CSV/XML.
The issue looks like this:
```
'/tmp/bats-run-KodZEA/.../pod-guest-pull-in-trusted-storage.yaml.in.cSV':
...
```

This commit fixes the issue by:
1. Moving the `XXXXXX` placeholder before the `.yaml` extension.
2. Ensuring the generated file always ends in `.yaml`.

This prevents format misidentification while maintaining filename
uniqueness and security.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-03-06 09:21:29 +08:00
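
The same principle as a35dcf952e, keeping the random part before a fixed `.yaml` extension, looks like this in Python. The CI change itself is in shell; this sketch only shows the idea, and `make_temp_yaml` is a hypothetical helper.

```python
import os
import tempfile


def make_temp_yaml(prefix: str = "pod-") -> str:
    """Create a unique temp file whose name always ends in .yaml."""
    # The random characters land between prefix and suffix, so they can
    # never form an accidental extension such as .cSV or .Xml.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".yaml")
    os.close(fd)  # the caller reopens the path as needed
    return path
```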
Fabiano Fidêncio
2fff33cfa4 Merge pull request #12628 from stevenhorsman/agent-ctl-bump-aws-lc-rs
agent-ctl: Update aws-lc-rs
2026-03-05 20:52:03 +01:00
Fabiano Fidêncio
83a8b257d1 Merge pull request #12265 from fidencio/topic/nvidia-bump-container-toolkit
nvidia: Bump nvidia-container-toolkit to 1.18.1
2026-03-05 15:25:15 +01:00
Fabiano Fidêncio
079fac1309 Merge pull request #12591 from fidencio/topic/kernel-add-mmio-back-to-the-unified-kernels
kernel: include mmio fragment in unified build for firecracker
2026-03-05 13:45:41 +01:00
Steve Horsman
5df7c4aa9c Merge pull request #12630 from zachspar/spar/kata-deploy-helm/configurable-pod-overhead
kata-deploy: add per-shim configurable pod overhead
2026-03-05 12:42:53 +00:00
Fabiano Fidêncio
e9894c0bd8 nvidia: Bump nvidia-container-toolkit to 1.18.1
Let's update the nvidia-container-toolkit to 1.18.1 (from 1.17.6).

We're, from now on, relying on the version set in the versions.yaml
file.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-05 11:53:09 +01:00
stevenhorsman
c57f2be18e agent-ctl: Update aws-lc-rs
aws-lc has multiple high severity CVEs:
- GHSA-vw5v-4f2q-w9xf
- GHSA-65p9-r9h6-22vj
- GHSA-hfpc-8r3f-gw53

so try and bump to the latest `aws-lc-rs` crate to pull in the available fixed versions

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-05 10:02:22 +00:00
Zachary Spar
bda9f6491f kata-deploy: add per-shim configurable pod overhead
Allow users to override the default RuntimeClass pod overhead for
any shim via shims.<name>.runtimeClass.overhead.{memory,cpu}.

When the field is absent, the existing hardcoded defaults from the
dict are used, so this is fully backward compatible.

Signed-off-by: Zachary Spar <zspar@coreweave.com>
2026-03-05 08:00:01 +01:00
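
The override-with-fallback behaviour from bda9f6491f can be sketched as a dictionary merge. The Helm chart does this in templates; the Python below is only an illustration, the default values are invented, and per-field fallback (overriding memory while keeping the default cpu) is an assumption.

```python
# Hypothetical defaults; the real values live in the chart's dict.
DEFAULT_OVERHEAD = {"qemu": {"memory": "160Mi", "cpu": "250m"}}


def pod_overhead(shim: str, shims_values: dict) -> dict:
    """Merge a user override over the default overhead for one shim."""
    defaults = DEFAULT_OVERHEAD.get(shim, {})
    override = (
        shims_values.get(shim, {}).get("runtimeClass", {}).get("overhead", {})
    )
    # Keys present in the override win; everything else keeps its default.
    return {**defaults, **override}
```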
Fabiano Fidêncio
8f35c31b30 Merge pull request #12542 from fidencio/topic/genpolicy-distribute-different-settings-rather-than-patching-for-ci
genpolicy: settings.d drop-ins and scenario example drop-ins
2026-03-05 07:37:30 +01:00
Fabiano Fidêncio
b5e0a5b7d6 Merge pull request #12555 from fidencio/topic/tests-use-local-pv-pvc-for-policy-tests
k8s-policy-pvc: use local PV/PVC when no default StorageClass exists
2026-03-05 07:37:11 +01:00
Dan Mihai
cb97ebd067 Merge pull request #12615 from microsoft/danmihai1/subPathExpr
tests: k8s: basic test for subPathExpr
2026-03-04 13:10:57 -08:00
Fabiano Fidêncio
a0b9d965e5 k8s-policy-pvc: use local PV/PVC when no default StorageClass exists
Create local block storage (loop device, StorageClass, PV) in the test
only when the cluster has no default StorageClass, matching the approach
used in k8s-volume.bats. Set our StorageClass as default so the PVC
binds to our PV; tear it down after the test.

When a default already exists (e.g. AKS), skip creation and cleanup so
we do not change the cluster's default storage class.

Fixes: #9846

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 21:50:51 +01:00
Fabiano Fidêncio
83dd7dcc75 runtimes: reject virtio-blk-mmio when confidential_guest is true
Virtio-mmio transport is not hardened for confidential computing (unlike
virtio-pci). Reject config that would use virtio-blk-mmio for rootfs/block
when confidential_guest is set, so CoCo guests only use virtio-blk-pci.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 21:41:27 +01:00
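
The config check from 83dd7dcc75 amounts to a small validation step. The sketch below is Python rather than the runtimes' own code, and the function name and the exact driver string compared against are assumptions taken from the commit title.

```python
def validate_block_driver(confidential_guest: bool, block_device_driver: str) -> None:
    """Raise if a confidential guest is configured with virtio-blk-mmio."""
    # Assumption: the driver is identified by the string used in the commit title.
    if confidential_guest and block_device_driver == "virtio-blk-mmio":
        raise ValueError(
            "virtio-blk-mmio is not allowed when confidential_guest is true; "
            "use virtio-blk-pci"
        )
```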
Fabiano Fidêncio
cb0d02e40b kernel: include mmio fragment in unified build for firecracker
Remove # !confidential from mmio.conf so CONFIG_VIRTIO_MMIO and
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES are included when building the
unified x86_64/s390x kernel with -x

Firecracker requires virtio-mmio for block devices; without it the
guest kernel panics (no /dev/vda).

Fixes: #12581
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 21:18:35 +01:00
Fabiano Fidêncio
d40afe592c genpolicy: add settings drop-in directory and RFC 6902 JSON Patch support
Allow genpolicy -j to accept a directory instead of a single file.
When given a directory, genpolicy loads genpolicy-settings.json from it
and applies all genpolicy-settings.d/*.json files (sorted by name) as
RFC 6902 JSON Patches. This gives precise control over settings with
explicit operations (add, remove, replace, move, copy, test), including
array index manipulation and assertions.

Ship composable drop-in examples in drop-in-examples/:
- 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner)
- 20-* files overlay specific adjustments (OCI version, guest pull)
Users copy the combination they need into genpolicy-settings.d/.

Replace the old adapt_common_policy_settings_* jq-patching functions
in tests_common.sh with install_genpolicy_drop_ins(), which copies the
right combination of 10-* and 20-* drop-ins for the CI scenario.
Tests still generate 99-test-overrides.json on the fly for per-test
request/exec overrides.

Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into
the tarball; the default genpolicy-settings.d/ is left empty.

Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 20:13:21 +01:00
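
To make the drop-in mechanism from d40afe592c concrete, here is a minimal sketch of applying JSON Patch documents in file-name order. It is Python, not genpolicy's Rust implementation, it covers only the add, replace, remove, and test operations, it does not support the root pointer, and the settings keys and file names in the usage note are hypothetical.

```python
import copy


def _resolve(doc, pointer):
    """Resolve a JSON Pointer (RFC 6901) to (parent container, final token)."""
    if not pointer.startswith("/"):
        raise ValueError("only non-root pointers are supported in this sketch")
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in pointer.split("/")[1:]]
    parent = doc
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    return parent, tokens[-1]


def apply_patch(doc, patch):
    """Apply a subset of RFC 6902 (add, replace, remove, test) to a copy of doc."""
    doc = copy.deepcopy(doc)
    for op in patch:
        parent, token = _resolve(doc, op["path"])
        kind = op["op"]
        if isinstance(parent, list):
            # "-" addresses the position after the last element (used by add).
            key = len(parent) if token == "-" else int(token)
        else:
            key = token
        if kind == "add":
            if isinstance(parent, list):
                parent.insert(key, op["value"])
            else:
                parent[key] = op["value"]
        elif kind == "replace":
            parent[key]  # raises if the target does not exist
            parent[key] = op["value"]
        elif kind == "remove":
            del parent[key]
        elif kind == "test":
            if parent[key] != op["value"]:
                raise ValueError(f"test failed at {op['path']}")
        else:
            raise ValueError(f"unsupported op in this sketch: {kind}")
    return doc


def load_settings(base, drop_ins):
    """Apply drop-in patches (name -> patch) sorted by file name."""
    for name in sorted(drop_ins):
        base = apply_patch(base, drop_ins[name])
    return base
```

With a mapping such as `{"10-base.json": [...], "20-guest-pull.json": [...]}`, the `10-*` patch is applied before the `20-*` one, which is what makes the numbered prefixes composable.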
Dan Mihai
e40d962b13 genpolicy: improve allow_mount logging
Add a simple -------- separator line to the beginning of the
allow_mount log output, to help log readers more easily separate the ~30
lines of text generated while verifying each mount.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-03-04 16:28:29 +00:00
Dan Mihai
3f845af9d4 tests: k8s: basic test for subPathExpr
Add basic genpolicy test coverage for subPathExpr and corresponding
container mounts.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-03-04 16:28:29 +00:00
Steve Horsman
a4a4683ec7 Merge pull request #12626 from kata-containers/topic/kata-deploy-k3s-rke2-use-imports
kata-deploy: a bunch of fixes regarding uninstall, rke2 and k3s tests
2026-03-04 14:01:09 +00:00
Steve Horsman
2687ad75c1 Merge pull request #12617 from BbolroC/skip-cgroup-device-check-for-remote
runtime: Skip to call sandboxDevices() for remote hypervisor
2026-03-04 14:00:23 +00:00
Steve Horsman
8e11bb2526 Merge pull request #12611 from mythi/coco-kernel-v6.18.15
versions: bump to Linux v6.18.15 (LTS)
2026-03-04 14:00:00 +00:00
Steve Horsman
94f850979f Merge pull request #12613 from stevenhorsman/tooling-bump-x/net-to-v0.51.0
Tooling bump x/net to v0.51.0
2026-03-04 13:44:22 +00:00
stevenhorsman
8640f27516 ci: Remove SNP tests from required
The SNP tests have been unstable on nightlies, but even when these
it seems to be manually cleaned up or something as PR tests are consistently
failing, so we should skip this from the required list until it is reliable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-04 14:41:09 +01:00
Fabiano Fidêncio
56c3618c1d tests: kata-deploy: wait for API recovery after uninstall
kata-deploy's SIGTERM cleanup restarts the CRI runtime, which on
k3s/rke2 takes down the API server temporarily. The helm uninstall
may complete with errors, and the next test suite would start with
a dead API. Add a wait loop after uninstall to ensure the API is
available before proceeding.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
966d710df5 tests: increase kata-deploy wait timeout to 15 minutes
kata-deploy restarts the CRI runtime during install, which can cause
the kata-deploy pod to be killed and recreated by the DaemonSet
controller. On k3s and rke2 in particular, the restart can take
several minutes. Increase the default timeout from 600s (10m) to
900s (15m) to accommodate this.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
ebe75cc3e3 kata-deploy: make verification job resilient to CRI runtime restarts
kata-deploy restarts the CRI runtime (k3s/containerd) during install,
which can kill the verification job pod or cause transient API server
errors. Bump backoffLimit from 0 to 3 so the job can retry after being
killed, and add a retry loop around kubectl rollout status to handle
transient connection failures.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
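
The retry loop around `kubectl rollout status` mentioned in ebe75cc3e3 follows a common pattern, sketched here in Python. The actual job is a shell script; the function name, the attempt count, and the choice of `ConnectionError` as the transient failure are assumptions.

```python
import time


def retry(fn, attempts: int = 3, delay_s: float = 0.0):
    """Call fn until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            # Transient failure, e.g. the API server restarting.
            if attempt == attempts:
                raise
            time.sleep(delay_s)
```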
Fabiano Fidêncio
7a08ef2f8d kata-deploy: run cleanup on SIGTERM instead of preStop hook
Move the cleanup logic from a preStop lifecycle hook (separate exec)
into the main process's SIGTERM handler. This simplifies the
architecture: the install process now handles its own teardown when
the pod is terminated.

The SIGTERM handler is registered before install begins, and
tokio::select! races install against SIGTERM so cleanup always runs
even if SIGTERM arrives mid-install (e.g. helm uninstall while the
container is restarting after a failed install attempt).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
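
The race between install and SIGTERM described in 7a08ef2f8d can be approximated with asyncio. The real code uses `tokio::select!` in Rust; this Python sketch replaces the signal with an `asyncio.Event`, and `run_install` and its parameters are hypothetical.

```python
import asyncio


async def run_install(install, stop: asyncio.Event, cleanup):
    """Race install against a stop request; run cleanup once stop is set."""
    install_task = asyncio.ensure_future(install())
    stop_task = asyncio.ensure_future(stop.wait())
    await asyncio.wait(
        {install_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if not stop_task.done():
        # Install finished first: stay idle until a stop is requested.
        await stop_task
    if not install_task.done():
        # Stop arrived mid-install: abandon the install before cleaning up.
        install_task.cancel()
    await asyncio.gather(install_task, return_exceptions=True)
    await cleanup()
```

On Unix, the event could be set from a real signal with `loop.add_signal_handler(signal.SIGTERM, stop.set)`, registered before the install starts.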
Fabiano Fidêncio
4e024bfb43 Revert "tests: Skip testing k3s/rke2 with nydus snapshotter"
This reverts commit ab25592533, as now
we're deploying k3s/rke2 in a way that we properly test them.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
a2216ec05a tests: set up full K3s/RKE2 V3 containerd template when needed
If the rendered config-v3.toml does not import the drop-in dir, write
the full k3s ContainerdConfigTemplateV3 (with hardcoded import path) so
kata-deploy can use drop-in.

This allows us to test with K3s/RKE2 before my patch there gets
released.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:26:31 +01:00
Fabiano Fidêncio
01895bf87e kata-deploy: use k3s/rke2 drop-in
Check the rendered containerd config for the versioned drop-in dir import
(config.toml.d or config-v3.toml.d) and bail with a clear error if it is
missing.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-04 11:08:26 +01:00
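
A minimal sketch of such a check is shown below. It assumes that the rendered config lists its imports on a single `imports = [...]` line, and the helper name `check_dropin_import` is hypothetical; the real check in kata-deploy may parse the file differently.

```sh
#!/bin/sh
# check_dropin_import CONFIG_FILE DROPIN_DIR_NAME
# Succeeds if an "imports" line of the rendered config mentions the
# drop-in directory; otherwise prints a clear error and fails.
check_dropin_import() {
    config="$1"
    dropin="$2"
    if grep -E '^[[:space:]]*imports[[:space:]]*=' "$config" 2>/dev/null |
        grep -F -q -- "$dropin"; then
        return 0
    fi
    echo "error: $config does not import $dropin; drop-in configuration cannot be used" >&2
    return 1
}
```

For a v3 config the call might be `check_dropin_import "$rendered_config" config-v3.toml.d`, where `$rendered_config` is whatever path the distribution renders its containerd config to.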
Aurélien Bombo
d821d4e572 Merge pull request #12619 from sprt/require-editorconfig
gatekeeper: Add EditorConfig checker to required tests
2026-03-03 21:36:32 -06:00
Fabiano Fidêncio
b0345d50e8 build: kernel: Do not expect a modules tarball for vanilla kernel
When I added this I had in mind the period when we still relied on the
SEV module being generated, which we haven't done for quite a long time.

This wrong assumption caused the cache to **ALWAYS** fail, increasing
our build time considerably for no reason.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 20:14:42 +01:00
Aurélien Bombo
911742e26e gatekeeper: Add EditorConfig checker to required tests
Now that it's stable and fully configured.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-03 11:34:06 -06:00
Hyounggyu Choi
347ce5e3bc runtime: Skip to call sandboxDevices() for remote hypervisor
The remote hypervisor delegates VM creation to a remote service.
The VM runs on cloud infrastructure, not the local host kernel.
So requiring a KVM/MSHV device is semantically wrong and would
cause a hard failure on any host where these devices are absent
(e.g., a VM that doesn't expose nested virtualization).

Skip sandboxDevices() entirely when the configured hypervisor type
is remoteHypervisor{}.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-03-03 13:44:12 +01:00
Fabiano Fidêncio
ab25592533 tests: Skip testing k3s/rke2 with nydus snapshotter
We depend on a k3s commit so we can properly test it, or we need to
change our CI quite a bit to deploy a full template that includes that
import. For now, let's just skip the testing in k3s/rke2 and we'll address
it in a different PR.

ref:
b51167a996

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
fa3c3eb2ce ci: Add autogenerated policy tests on k0s, k3s, rke2 and microk8s
These tests run only on nightly and when triggering the dev CI manually.
They cover both nydus snapshotter with guest-pull and experimental-force-guest-pull,
using qemu-coco-dev and qemu-coco-dev-runtime-rs, and are included in the
run-kata-coco-tests workflow behind the extensive-matrix-autogenerated-policy flag.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
3e807300ac tests: k0s: Ensure --logging=containerd=debug is passed
The default is `info`, and that overrides whatever is set in
the drop-in file used by k0s.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
876c6c832d tests: set runtime-request-timeout to 600s for k0s, k3s, rke2, microk8s
Align with kubeadm and bare metal by setting the kubelet CRI
runtime-request-timeout to 600s in deploy functions for k0s (worker
profile), k3s (--kubelet-arg), rke2 (config.yaml), and microk8s
(args/kubelet + restart).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Fabiano Fidêncio
9725df658f tests: k8s: policy: set OCI bundle 1.2.1 for k3s/rke2
k3s and rke2 use containerd that expects OCI bundle 1.2.1; otherwise
autogenerated policy tests fail. Add adapt_common_policy_settings_for_k3s_rke2
and call it from adapt_common_policy_settings when KUBERNETES is k3s or rke2.

Tested with k3s v1.34.4+k3s1, rke2 v1.34.4+rke2r1.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-03-03 12:55:10 +01:00
Steve Horsman
7ca8db1e61 Merge pull request #12616 from Amulyam24/go-arch-fix
gha: pass the arch for setup-go on ppc64le
2026-03-03 11:34:30 +00:00
Amulyam24
0754a17fed gha: pass the arch for setup-go on ppc64le
By default, setup-go installs ppc64 binary instead of ppc64le,
resulting in an exec format error. Pass the arch explicitly to fix this.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-03-03 16:41:10 +05:30
Mikko Ylinen
2cf9018e35 versions: bump to Linux v6.18.15 (LTS)
Bump to the latest LTS kernel to get a fix for TDX:

efi: Fix reservation of unaccepted memory table

See details in:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0862438c90487e79822d5647f854977d50381505

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-03-03 07:56:24 +02:00
Mikko Ylinen
0b2af07b02 build: kernel: fix checksum checks for RC kernels
get_kernel() thinks it knows when it needs to skip sha256sum validation for
RC kernels since sha256sums.asc is not available:

INFO: Config version: 176
INFO: Kernel version: 6.18-rc5
INFO: kernel path does not exist, will download kernel
INFO: Release candidate kernels are not part of the official sha256sums.asc -- skipping sha256sum validation

However, it continues to check it anyway because ${rc} is tested
with -n. The sha256sum should only be checked when ${rc} is NOT
set.

Fixes a problem where downloaded RC kernels are always removed
and downloaded again.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-03-03 07:56:24 +02:00
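
The corrected condition can be illustrated with the sketch below. The helper names `rc_suffix` and `should_verify_sha256` are hypothetical and the way the suffix is extracted is an assumption; only the `-z` versus `-n` distinction comes from the commit message.

```sh
#!/bin/sh
# rc_suffix VERSION: print the "-rcN" suffix of VERSION, or nothing
# when VERSION is not a release candidate.
rc_suffix() {
    case "$1" in
        *-rc*) printf '%s\n' "-rc${1##*-rc}" ;;
        *) : ;;
    esac
}

# should_verify_sha256 VERSION: succeed only for non-RC versions.
should_verify_sha256() {
    rc="$(rc_suffix "$1")"
    # Verify when rc is empty (-z). Testing with -n would verify exactly
    # the RC kernels that have no entry in sha256sums.asc.
    [ -z "$rc" ]
}
```

With these helpers, `should_verify_sha256 6.18-rc5` fails (validation is skipped) while `should_verify_sha256 6.18.15` succeeds.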
Dan Mihai
3ea23528a5 docs: require user/group/fsGroup/supplementalGroups
Add a nydus guest-pull limitation explaining that specifying runAsUser,
runAsGroup, fsGroup, and supplementalGroups is required.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-03-02 23:48:36 +01:00
stevenhorsman
642aa12889 csi-kata-directvolume: Bump x/net to v0.51
Remediates CVE GO-2026-4559

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-02 16:40:58 +00:00
stevenhorsman
24fe232e56 ci/openshift-ci: Bump x/net to v0.51
Remediate CVE GO-2026-4559

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-02 16:40:03 +00:00
Steve Horsman
e50324ba5b Merge pull request #12609 from kata-containers/dependabot/go_modules/src/runtime/go.opentelemetry.io/otel/sdk-1.40.0
build(deps): bump go.opentelemetry.io/otel/sdk from 1.35.0 to 1.40.0 in /src/runtime
2026-03-02 16:32:40 +00:00
stevenhorsman
993a4846c8 versions: Bump go to 1.25.7
Now that go 1.26 is out, 1.24 is not supported, so bump to
1.25 as per our policy.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-03-02 16:33:47 +01:00
dependabot[bot]
d95d1796b2 build(deps): bump go.opentelemetry.io/otel/sdk in /src/runtime
Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.35.0 to 1.40.0.
- [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases)
- [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md)
- [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.35.0...v1.40.0)

---
updated-dependencies:
- dependency-name: go.opentelemetry.io/otel/sdk
  dependency-version: 1.40.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-02 12:59:21 +00:00
Steve Horsman
501d8d1916 Merge pull request #12596 from kata-containers/remove-install_go
workflow | tests: Remove install go
2026-03-02 12:36:58 +00:00
Steve Horsman
964c91f8fc Merge pull request #12608 from kata-containers/sprt/fix-hostpath-dev-docs
docs: Use more accurate wording for /dev hostPath behavior
2026-03-02 11:50:15 +00:00
Aurélien Bombo
68e67d7f8a docs: Use more accurate wording for /dev hostPath behavior
I got lazy when I first added this section in 5c21b1f, so updating the
language to specify that any non-regular host file (under /dev) qualifies,
not just devices.

This matches the actual code, see:

330bfff4be/src/runtime/virtcontainers/mount.go (L57-L83)

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-03-02 11:32:01 +00:00
Steve Horsman
b147cb1319 Merge pull request #12587 from fidencio/topic/runtime-add-configurable-kubelet-root-dir
runtimes: add configurable kubelet root dir
2026-02-28 19:06:14 +00:00
Xuewei Niu
8a4ae090e6 Merge pull request #12513 from lifupan/event_publish
send the task create/start/delete event to containerd
2026-02-28 14:41:46 +08:00
Zvonko Kaiser
afe09803a1 gpu: Ignore OVMF and use the Kernel for proper PCI setup
Sometimes OVMF provides incorrect values to the kernel. We override
them by telling the kernel to handle the PCI space setup itself,
i.e. allocating the proper window sizes and assigning the proper buses
to each device.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Manuel Huber
88f746dea8 runtime: nvidia: Use OVMF for NV GPU handler
Shift to using OVMF instead of using SeaBios.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>

Update src/runtime/Makefile

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Zvonko Kaiser
eec397ac08 qemu: Remove PCIe root port BAR reserve sizing
Stop computing and setting mem-reserve and pref64-reserve on PCIe root
ports and switch ports. Remove getBARsMaxAddressableMemory() which
scanned host GPU BARs to pre-calculate these values.

The previous approach only considered GPU devices (IsGPU(), class
0x0302) when scanning for BAR sizes, so devices like NVSwitches (class
0x0680) with their 32MB non-prefetchable BAR0 were not accounted for
and received the 4MB default. Additionally, GetTotalAddressableMemory()
classifies BARs by 32/64-bit address width rather than by the
prefetchable flag that QEMU's mem-reserve vs pref64-reserve maps to.

Modern QEMU introspects VFIO device BARs when they are attached to
root ports and sizes the MMIO windows accordingly. Modern OVMF
(edk2-stable202502+) automatically calculates the 64-bit PCI MMIO
aperture based on the BARs of actually present devices during PCI
enumeration. Omitting the reserve parameters lets QEMU and OVMF
handle MMIO window sizing correctly for all device types including
GPUs, NVSwitches, and NICs without requiring host-side BAR scanning.

This also removes the nvpci dependency from qemu_arch_base.go.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Zvonko Kaiser
bb7fd335f3 qemu: Remove OVMF X-PciMmio64Mb fw_cfg hint
Modern OVMF (edk2-stable202502 and later) automatically sizes the
64-bit PCI MMIO aperture based on the BARs of actually attached
devices during PCI enumeration. The opt/ovmf/X-PciMmio64Mb fw_cfg
hint is no longer needed to ensure large-BAR devices like NVIDIA
GPUs receive adequate MMIO space.

The previous approach was fragile: the runtime scanned host PCI
devices to estimate the required aperture size, but only considered
GPU devices (class 0x0302), missing NVSwitches and other devices
with large BARs. Removing this code avoids confusion about MMIO
sizing responsibility.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-27 22:54:31 +01:00
Fabiano Fidêncio
330bfff4be kata-deploy: Fix nydus snapshotter config (on v3 config version)
On containerd v3 config, disable_snapshot_annotations must be set under the
images plugin, not the runtime plugin.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 18:20:30 +01:00
Fabiano Fidêncio
0a73638744 runtime: add configurable kubelet root dir
Different kubernetes distributions, such as k0s, use a different kubelet
root dir location instead of the default /var/lib/kubelet, so ConfigMap
and Secret volume propagation were failing.

This adds a kubelet_root_dir config option that the go runtime uses when
matching volume paths and kata-deploy now sets it automatically for k0s
via a drop-in file.

runtime-rs does not need this option: it identifies ConfigMap/Secret,
projected, and downward-api volumes by volume-type path segment
(kubernetes.io~configmap, etc.), not by kubelet root prefix.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-27 14:10:57 +01:00
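
To illustrate why matching on the volume-type path segment is independent of the kubelet root, here is a small sketch. It is not the runtime-rs code: the function name `k8s_volume_type` is hypothetical, and the segment names other than `kubernetes.io~configmap` are assumed from the usual kubelet volume plugin directory names.

```sh
#!/bin/sh
# k8s_volume_type PATH: classify a host mount path by its volume-type
# segment, regardless of which kubelet root directory precedes it.
k8s_volume_type() {
    case "$1" in
        */kubernetes.io~configmap/*) echo configmap ;;
        */kubernetes.io~secret/*) echo secret ;;
        */kubernetes.io~projected/*) echo projected ;;
        */kubernetes.io~downward-api/*) echo downward-api ;;
        *) echo other ;;
    esac
}
```

A path under the default root and one under a different root (the k0s-style path here is only an example) classify the same way: `k8s_volume_type /var/lib/kubelet/pods/p1/volumes/kubernetes.io~configmap/cfg` and `k8s_volume_type /var/lib/k0s/kubelet/pods/p1/volumes/kubernetes.io~configmap/cfg` both print `configmap`. A prefix-based match, by contrast, needs to know the root, which is what `kubelet_root_dir` provides for the Go runtime.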
Steve Horsman
2695007ef8 Merge pull request #12584 from stevenhorsman/switch-actionlint-workflow
workflow: Update actionlint workflows
2026-02-27 13:03:58 +00:00
stevenhorsman
66e58d6490 tests: Delete install_go.sh
Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-27 12:42:43 +00:00
stevenhorsman
b71bb47e21 workflow: Use setup-go to install go
Rather than having our own script, just use the github action
to install go when needed.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-27 12:42:43 +00:00
stevenhorsman
308442e887 workflow: Update actionlint workflows
The actionlint gh extension is outdated and the wrapping seems
unnecessary when there is a github action that seems to be maintained,
so let's update to use that

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 11:52:19 +00:00
Fupan Li
2149fc0eee runtime-rs: send the task delete event to containerd
According to the shimv2 proto, the shim should send a task delete event to
containerd once a container task is deleted successfully.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-02-14 12:44:31 +08:00
Fupan Li
d2613025b7 runtime-rs: send the task create event to containerd
According to the shimv2 proto, the shim should send a task create event to
containerd once a container task is created successfully.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-02-14 12:44:23 +08:00
Fupan Li
499e18c876 runtime-rs: send the task start event to container
According to the shimv2 proto, the shim should send a task start event to
containerd once a container task is started successfully.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2026-02-14 12:44:03 +08:00
307 changed files with 42983 additions and 8433 deletions

View File

@@ -28,3 +28,9 @@ self-hosted-runner:
- s390x-large
- tdx
- ubuntu-24.04-arm
paths:
.github/workflows/**/*.{yml,yaml}:
ignore:
# We use if: false to "temporarily" skip jobs with issues
- 'constant expression "false" in condition'

View File

@@ -13,18 +13,13 @@ concurrency:
jobs:
run-actionlint:
name: run-actionlint
env:
GH_TOKEN: ${{ github.token }}
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
persist-credentials: false
- name: Install actionlint gh extension
run: gh extension install https://github.com/cschleiden/gh-actionlint
- name: Run actionlint
run: gh actionlint
uses: raven-actions/actionlint@e01d1ea33dd6a5ed517d95b4c0c357560ac6f518 # v2.1.1

View File

@@ -47,6 +47,23 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
env:

View File

@@ -47,8 +47,25 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
env:
GH_TOKEN: ${{ github.token }}

View File

@@ -82,11 +82,17 @@ jobs:
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
- name: Read properties from versions.yaml
if: contains(matrix.component.needs, 'golang')
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
if: contains(matrix.component.needs, 'golang')
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Setup rust
if: contains(matrix.component.needs, 'rust')
run: |

View File

@@ -143,7 +143,7 @@ jobs:
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ matrix.asset == 'kernel' || startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
if: ${{ startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-amd64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
@@ -168,8 +168,6 @@ jobs:
- rootfs-image-nvidia-gpu-confidential
- rootfs-initrd
- rootfs-initrd-confidential
- rootfs-initrd-nvidia-gpu
- rootfs-initrd-nvidia-gpu-confidential
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}
@@ -235,7 +233,6 @@ jobs:
asset:
- busybox
- coco-guest-components
- kernel-modules
- kernel-nvidia-gpu-modules
- pause-image
steps:
@@ -368,7 +365,6 @@ jobs:
matrix:
asset:
- agent-ctl
- csi-kata-directvolume
- genpolicy
- kata-ctl
- kata-manager

View File

@@ -152,7 +152,6 @@ jobs:
- rootfs-image
- rootfs-image-nvidia-gpu
- rootfs-initrd
- rootfs-initrd-nvidia-gpu
steps:
- name: Login to Kata Containers quay.io
if: ${{ inputs.push-to-registry == 'yes' }}

View File

@@ -120,15 +120,6 @@ jobs:
retention-days: 15
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ matrix.asset == 'kernel' }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-s390x-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst
retention-days: 15
if-no-files-found: error
build-asset-rootfs:
name: build-asset-rootfs
runs-on: s390x

View File

@@ -17,6 +17,7 @@ jobs:
pr-number: "dev"
tag: ${{ github.sha }}-dev
target-branch: ${{ github.ref_name }}
extensive-matrix-autogenerated-policy: "yes"
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}

View File

@@ -22,6 +22,7 @@ jobs:
pr-number: "nightly"
tag: ${{ github.sha }}-nightly
target-branch: ${{ github.ref_name }}
extensive-matrix-autogenerated-policy: "yes"
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}

View File

@@ -19,6 +19,10 @@ on:
required: false
type: string
default: no
extensive-matrix-autogenerated-policy:
required: false
type: string
default: no
secrets:
AUTHENTICATED_IMAGE_PASSWORD:
required: true
@@ -212,61 +216,6 @@ jobs:
platforms: linux/amd64, linux/s390x
file: tests/integration/kubernetes/runtimeclass_workloads/confidential/unencrypted/Dockerfile
publish-csi-driver-amd64:
name: publish-csi-driver-amd64
needs: build-kata-static-tarball-amd64
permissions:
contents: read
packages: write
runs-on: ubuntu-22.04
steps:
- name: Checkout code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64-${{ inputs.tag }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Copy binary into Docker context
run: |
# Copy to the location where the Dockerfile expects the binary.
mkdir -p src/tools/csi-kata-directvolume/bin/
cp /opt/kata/bin/csi-kata-directvolume src/tools/csi-kata-directvolume/bin/directvolplugin
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@b5ca514318bd6ebac0fb2aedd5d36ec1b5c232a2 # v3.10.0
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Docker build and push
uses: docker/build-push-action@ca052bb54ab0790a636c9b5f226502c73d547a25 # v5.4.0
with:
tags: ghcr.io/kata-containers/csi-kata-directvolume:${{ inputs.pr-number }}
push: true
context: src/tools/csi-kata-directvolume/
platforms: linux/amd64
file: src/tools/csi-kata-directvolume/Dockerfile
run-kata-monitor-tests:
if: ${{ inputs.skip-test != 'yes' }}
needs: build-kata-static-tarball-amd64
@@ -345,7 +294,6 @@ jobs:
needs:
- publish-kata-deploy-payload-amd64
- build-and-publish-tee-confidential-unencrypted-image
- publish-csi-driver-amd64
uses: ./.github/workflows/run-kata-coco-tests.yaml
permissions:
contents: read
@@ -358,6 +306,7 @@ jobs:
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
extensive-matrix-autogenerated-policy: ${{ inputs.extensive-matrix-autogenerated-policy }}
secrets:
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AZ_APPID: ${{ secrets.AZ_APPID }}

View File

@@ -31,10 +31,22 @@ jobs:
with:
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install Rust
run: ./tests/install_rust.sh

View File

@@ -24,10 +24,22 @@ jobs:
fetch-depth: 0
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Docs URL Alive Check
run: |

View File

@@ -27,10 +27,22 @@ jobs:
fetch-depth: 0
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install govulncheck
run: |

View File

@@ -19,23 +19,25 @@ permissions: {}
jobs:
scan-scheduled:
name: Scan of whole repo
permissions:
actions: read # # Required to upload SARIF file to CodeQL
contents: read # Read commit contents
security-events: write # Require writing security events to upload SARIF file to security tab
if: ${{ github.event_name == 'push' || github.event_name == 'schedule' || github.event_name == 'workflow_dispatch' }}
uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@b00f71e051ddddc6e46a193c31c8c0bf283bf9e6" # v2.1.0
uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@8ae4be80636b94886b3c271caad730985ce0611c" # v2.3.3
with:
scan-args: |-
-r
./
scan-pr:
name: Scan of just PR code
permissions:
actions: read # Required to upload SARIF file to CodeQL
contents: read # Read commit contents
security-events: write # Require writing security events to upload SARIF file to security tab
if: ${{ github.event_name == 'pull_request' }}
uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml@b00f71e051ddddc6e46a193c31c8c0bf283bf9e6" # v2.1.0
uses: "google/osv-scanner-action/.github/workflows/osv-scanner-reusable-pr.yml@8ae4be80636b94886b3c271caad730985ce0611c" # v2.3.3
with:
# Example of specifying custom arguments
scan-args: |-

View File

@@ -53,6 +53,25 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
# Setup-go doesn't work properly with ppc64le: https://github.com/actions/setup-go/issues/648
architecture: ${{ inputs.arch == 'ppc64le' && 'ppc64le' || '' }}
- name: Install dependencies
timeout-minutes: 15
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

View File

@@ -57,10 +57,24 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
# Setup-go doesn't work properly with ppc64le: https://github.com/actions/setup-go/issues/648
architecture: 'ppc64le'
- name: Prepare the runner for k8s test suite
run: bash "${HOME}/scripts/k8s_cluster_prepare.sh"

View File

@@ -24,6 +24,10 @@ on:
required: false
type: string
default: ""
extensive-matrix-autogenerated-policy:
required: false
type: string
default: no
secrets:
AUTHENTICATED_IMAGE_PASSWORD:
required: true
@@ -106,10 +110,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 100
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -130,10 +130,6 @@ jobs:
[[ "${KATA_HYPERVISOR}" == "qemu-tdx" ]] && echo "ITA_KEY=${GH_ITA_KEY}" >> "${GITHUB_ENV}"
bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
# Generate jobs for testing CoCo on non-TEE environments
run-k8s-tests-coco-nontee:
name: run-k8s-tests-coco-nontee
@@ -231,10 +227,6 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -253,10 +245,126 @@ jobs:
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
# Extensive matrix: autogenerated policy tests (nydus + experimental-force-guest-pull) on k0s, k3s, rke2, microk8s with qemu-coco-dev / qemu-coco-dev-runtime-rs
run-k8s-tests-coco-nontee-extensive-matrix:
if: ${{ inputs.extensive-matrix-autogenerated-policy == 'yes' }}
name: run-k8s-tests-coco-nontee-extensive-matrix
strategy:
fail-fast: false
matrix:
environment: [
{ k8s: k0s, vmm: qemu-coco-dev, snapshotter: nydus, pull_type: guest-pull },
{ k8s: k0s, vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
{ k8s: k0s, vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
{ k8s: k3s, vmm: qemu-coco-dev, snapshotter: nydus, pull_type: guest-pull },
{ k8s: k3s, vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
{ k8s: k3s, vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
{ k8s: rke2, vmm: qemu-coco-dev, snapshotter: nydus, pull_type: guest-pull },
{ k8s: rke2, vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
{ k8s: rke2, vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
{ k8s: microk8s, vmm: qemu-coco-dev, snapshotter: nydus, pull_type: guest-pull },
{ k8s: microk8s, vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
{ k8s: microk8s, vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
]
runs-on: ubuntu-24.04
permissions:
contents: read
environment: ci
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KBS: "true"
KBS_INGRESS: "nodeport"
KUBERNETES: ${{ matrix.environment.k8s }}
SNAPSHOTTER: ${{ matrix.environment.snapshotter }}
PULL_TYPE: ${{ matrix.environment.pull_type }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.pull_type == 'experimental-force-guest-pull' && matrix.environment.vmm || '' }}
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
AUTO_GENERATE_POLICY: "yes"
K8S_TEST_HOST_TYPE: "all"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Remove unnecessary directories to free up space
run: |
sudo rm -rf /usr/local/.ghcup
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/local/julia*
sudo rm -rf /opt/az
sudo rm -rf /usr/local/share/chromium
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/google
sudo rm -rf /usr/lib/firefox
- name: Deploy ${{ matrix.environment.k8s }}
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Deploy Kata
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ matrix.environment.snapshotter == 'nydus' }}
- name: Deploy CoCo KBS
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
- name: Install `kbs-client`
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CoCo KBS
if: always()
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
# Generate jobs for testing CoCo on non-TEE environments with erofs-snapshotter
run-k8s-tests-coco-nontee-with-erofs-snapshotter:
@@ -344,10 +452,6 @@ jobs:
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
@@ -360,8 +464,3 @@ jobs:
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver

View File

@@ -126,11 +126,16 @@ jobs:
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
- name: Read properties from versions.yaml
run: |
cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get -y install moreutils hunspell hunspell-en-gb hunspell-en-us pandoc

.gitignore

@@ -20,3 +20,6 @@ tools/packaging/static-build/agent/install_libseccomp.sh
.direnv
**/.DS_Store
site/
opt/
tools/packaging/kernel/configs/**/.config
root_hash.txt

Cargo.lock

File diff suppressed because it is too large

View File

@@ -22,6 +22,9 @@ members = [
"src/dragonball/dbs_utils",
"src/dragonball/dbs_virtio_devices",
# genpolicy
"src/tools/genpolicy",
# runtime-rs
"src/runtime-rs",
"src/runtime-rs/crates/agent",
@@ -107,6 +110,9 @@ safe-path = { path = "src/libs/safe-path" }
shim-interface = { path = "src/libs/shim-interface" }
test-utils = { path = "src/libs/test-utils" }
# Local dependencies from `src/agent`
kata-agent-policy = { path = "src/agent/policy" }
# Outside dependencies
actix-rt = "2.7.0"
anyhow = "1.0"

View File

@@ -1 +1 @@
3.27.0
3.28.0

View File

@@ -187,9 +187,10 @@ different compared to `runc` containers:
into the guest and exposes it directly to the container.
**Mounting guest devices**: When the source path of a hostPath volume is
under `/dev`, and the path either corresponds to a host device or is not
accessible by the Kata shim, the Kata agent bind mounts the source path
directly from the *guest* filesystem into the container.
under `/dev` (or `/dev` itself), and the path corresponds to a
non-regular file (i.e., a device, directory, or any other special file)
or is not accessible by the Kata shim, the Kata agent bind mounts the
source path directly from the *guest* filesystem into the container.
[runtime-config]: /src/runtime/README.md#configuration
[k8s-hostpath]: https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
@@ -226,6 +227,35 @@ Importantly, the default behavior to pass the host devices to a
privileged container is not supported in Kata Containers and needs to be
disabled, see [Privileged Kata Containers](how-to/privileged.md).
## Guest pulled container images
When using features like **nydus guest-pull**, set user/group IDs explicitly in the pod spec.
If the ID values are omitted:
- Your workload might be executed with unexpected user/group ID values, because image layers
may be unavailable to containerd, so image config (including user/group) is not applied.
- If using policy or genpolicy, the generated policy may detect these unexpected values and
reject the creation of workload containers.
Set `securityContext` explicitly. Use **pod-level** `spec.securityContext` (for Pods) or
`spec.template.spec.securityContext` (for controllers like Deployments) and/or **container-level**
`spec.containers[].securityContext`. Include at least:
- `runAsUser` — primary user ID
- `runAsGroup` — primary group ID
- `fsGroup` — volume group ownership (often reflected as a supplemental group)
- `supplementalGroups` — list of additional group IDs (if needed)
Example:
```yaml
# Explicit user/group/supplementary groups to support nydus guest-pull
securityContext:
runAsUser: 0
runAsGroup: 0
fsGroup: 0
supplementalGroups: [1, 2, 3, 4, 6, 10, 11, 20, 26, 27]
```
# Appendices
## The constraints challenge

View File

@@ -1,57 +1,64 @@
# How to do a Kata Containers Release
This document lists the tasks required to create a Kata Release.
## Requirements
- GitHub permissions to run workflows.
## Versioning
## Release Model
The Kata Containers project uses [semantic versioning](http://semver.org/) for all releases.
Semantic versions are comprised of three fields in the form:
Kata Containers follows a rolling release model with monthly snapshots.
New features, bug fixes, and improvements are continuously integrated into
`main`. Each month, a snapshot is tagged as a new `MINOR` release.
```
MAJOR.MINOR.PATCH
```
### Versioning
When `MINOR` increases, the new release adds **new features** but *without changing the existing behavior*.
Releases use the `MAJOR.MINOR.PATCH` scheme. Monthly snapshots increment
`MINOR`; `PATCH` is typically `0`. Major releases are rare (years apart) and
signal significant architectural changes that may require updates to container
managers (Containerd, CRI-O) or other infrastructure. Breaking changes in
`MINOR` releases are avoided where possible, but may occasionally occur as
features are deprecated or removed.
When `MAJOR` increases, the new release adds **new features, bug fixes, or
both** and which **changes the behavior from the previous release** (incompatible with previous releases).
### No Stable Branches
A major release will also likely require a change of the container manager version used,
-for example Containerd or CRI-O. Please refer to the release notes for further details.
**Important** : the Kata Containers project doesn't have stable branches (see
[this issue](https://github.com/kata-containers/kata-containers/issues/9064) for details).
Bug fixes are released as part of `MINOR` or `MAJOR` releases only. `PATCH` is always `0`.
The Kata Containers project does not maintain stable branches (see
[#9064](https://github.com/kata-containers/kata-containers/issues/9064)).
Bug fixes land on `main` and ship in the next monthly snapshot rather than
being backported. Downstream projects that need extended support or compliance
certifications should select a monthly snapshot as their stable base and manage
their own validation and patch backporting from there.
## Release Process
### Bump the `VERSION` and `Chart.yaml` file
When the `kata-containers/kata-containers` repository is ready for a new release,
first create a PR to set the release in the [`VERSION`](./../VERSION) file and update the
`version` and `appVersion` in the
[`Chart.yaml`](./../tools/packaging/kata-deploy/helm-chart/kata-deploy/Chart.yaml) file and
have it merged.
When the `kata-containers/kata-containers` repository is ready for a new
release, first create a PR to set the release in the [`VERSION`](./../VERSION)
file and update the `version` and `appVersion` in the
[`Chart.yaml`](./../tools/packaging/kata-deploy/helm-chart/kata-deploy/Chart.yaml)
file and have it merged.
### Lock the `main` branch
In order to prevent any PRs getting merged during the release process, and slowing the release
process down, by impacting the payload caches, we have recently trailed setting the `main`
branch to read only whilst the release action runs.
In order to prevent any PRs getting merged during the release process, and
slowing the release process down, by impacting the payload caches, we have
recently trialed setting the `main` branch to read only whilst the release
action runs.
> [!NOTE]
> Admin permission is needed to complete this task.
### Wait for the `VERSION` bump PR payload publish to complete
To reduce the chance of need to re-run the release workflow, check the
[CI | Publish Kata Containers payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml)
To reduce the chance of need to re-run the release workflow, check the [CI |
Publish Kata Containers
payload](https://github.com/kata-containers/kata-containers/actions/workflows/payload-after-push.yaml)
once the `VERSION` PR bump has merged to check that the assets build correctly
and are cached, so that the release process can just download these artifacts
rather than needing to build them all, which takes time and can reveal errors in infra.
rather than needing to build them all, which takes time and can reveal errors in
infra.
### Check GitHub Actions
@@ -63,11 +70,10 @@ release artifacts.
> [!NOTE]
> Write permissions to trigger the action.
The action is manually triggered and is responsible for generating a new
release (including a new tag), pushing those to the
`kata-containers/kata-containers` repository. The new release is initially
created as a draft. It is promoted to an official release when the whole
workflow has completed successfully.
The action is manually triggered and is responsible for generating a new release
(including a new tag), pushing those to the `kata-containers/kata-containers`
repository. The new release is initially created as a draft. It is promoted to
an official release when the whole workflow has completed successfully.
Check the [actions status
page](https://github.com/kata-containers/kata-containers/actions) to verify all
@@ -75,12 +81,13 @@ steps in the actions workflow have completed successfully. On success, a static
tarball containing Kata release artifacts will be uploaded to the [Release
page](https://github.com/kata-containers/kata-containers/releases).
If the workflow fails because of some external environmental causes, e.g. network
timeout, simply re-run the failed jobs until they eventually succeed.
If the workflow fails because of some external environmental causes, e.g.
network timeout, simply re-run the failed jobs until they eventually succeed.
If for some reason you need to cancel the workflow or re-run it entirely, go first
to the [Release page](https://github.com/kata-containers/kata-containers/releases) and
delete the draft release from the previous run.
If for some reason you need to cancel the workflow or re-run it entirely, go
first to the [Release
page](https://github.com/kata-containers/kata-containers/releases) and delete
the draft release from the previous run.
### Unlock the `main` branch
@@ -90,9 +97,8 @@ an admin to do it.
### Improve the release notes
Release notes are auto-generated by the GitHub CLI tool used as part of our
release workflow. However, some manual tweaking may still be necessary in
order to highlight the most important features and bug fixes in a specific
release.
release workflow. However, some manual tweaking may still be necessary in order
to highlight the most important features and bug fixes in a specific release.
With this in mind, please, poke @channel on #kata-dev and people who worked on
the release will be able to contribute to that.

View File

@@ -99,6 +99,9 @@ The [`genpolicy`](../../src/tools/genpolicy/) application can be used to generat
**Warning** Users should review carefully the automatically-generated Policy, and modify the Policy file if needed to match better their use case, before using this Policy.
**Important — User / Group / Supplemental groups for Policy and genpolicy**
When using features like **nydus guest-pull**, set user/group IDs explicitly in the pod spec, as described in [Limitations](../Limitations.md#guest-pulled-container-images).
See the [`genpolicy` documentation](../../src/tools/genpolicy/README.md) and the [Policy contents examples](#policy-contents) for additional information.
## Policy contents

osv-scanner.toml

@@ -0,0 +1,8 @@
[[IgnoredVulns]]
# yaml-rust is unmaintained.
# We tried the most promising alternative in https://github.com/kata-containers/kata-containers/pull/12509,
# but its literal quoting is not conformant.
id = "RUSTSEC-2024-0320"
ignoreUntil = 2026-10-01 # TODO(burgerdev): revisit yml library ecosystem
reason = "No alternative currently supports 'yes' strings correctly; genpolicy processes only trusted input."

src/agent/Cargo.lock

@@ -979,6 +979,12 @@ dependencies = [
"parking_lot_core",
]
[[package]]
name = "data-encoding"
version = "2.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2a2330da5de22e8a3cb63252ce2abb30116bf5265e89c0e01bc17015ce30a476"
[[package]]
name = "deranged"
version = "0.5.5"
@@ -3428,6 +3434,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "843c3d97f07e3b5ac0955d53ad0af4c91fe4a4f8525843ece5bf014f27829b73"
dependencies = [
"anyhow",
"data-encoding",
"lazy_static",
"rand",
"regex",

View File

@@ -18,6 +18,8 @@ serde_json.workspace = true
# Agent Policy
regorus = { version = "0.2.8", default-features = false, features = [
"arc",
"base64",
"base64url",
"regex",
"std",
] }

View File

@@ -2308,9 +2308,6 @@ fn is_sealed_secret_path(source_path: &str) -> bool {
}
async fn cdh_handler_trusted_storage(oci: &mut Spec) -> Result<()> {
if !confidential_data_hub::is_cdh_client_initialized() {
return Ok(());
}
let linux = oci
.linux()
.as_ref()
@@ -2320,23 +2317,10 @@ async fn cdh_handler_trusted_storage(oci: &mut Spec) -> Result<()> {
for specdev in devices.iter() {
if specdev.path().as_path().to_str() == Some(TRUSTED_IMAGE_STORAGE_DEVICE) {
let dev_major_minor = format!("{}:{}", specdev.major(), specdev.minor());
let secure_storage_integrity = AGENT_CONFIG.secure_storage_integrity.to_string();
info!(
sl(),
"trusted_store device major:min {}, enable data integrity {}",
dev_major_minor,
secure_storage_integrity
);
let options = std::collections::HashMap::from([
("deviceId".to_string(), dev_major_minor),
("encryptType".to_string(), "LUKS".to_string()),
("dataIntegrity".to_string(), secure_storage_integrity),
]);
confidential_data_hub::secure_mount(
"BlockDevice",
&options,
vec![],
cdh_secure_mount(
"block-device",
&dev_major_minor,
"luks2",
KATA_IMAGE_WORK_DIR,
)
.await?;
@@ -2347,6 +2331,49 @@ async fn cdh_handler_trusted_storage(oci: &mut Spec) -> Result<()> {
Ok(())
}
pub(crate) async fn cdh_secure_mount(
device_type: &str,
device_id: &str,
encrypt_type: &str,
mount_point: &str,
) -> Result<()> {
if !confidential_data_hub::is_cdh_client_initialized() {
return Ok(());
}
let integrity = AGENT_CONFIG.secure_storage_integrity.to_string();
info!(
sl(),
"cdh_secure_mount: device_type {}, device_id {}, encrypt_type {}, integrity {}",
device_type,
device_id,
encrypt_type,
integrity
);
let options = std::collections::HashMap::from([
("deviceId".to_string(), device_id.to_string()),
("sourceType".to_string(), "empty".to_string()),
("targetType".to_string(), "fileSystem".to_string()),
("filesystemType".to_string(), "ext4".to_string()),
("mkfsOpts".to_string(), "-E lazy_journal_init".to_string()),
("encryptionType".to_string(), encrypt_type.to_string()),
("dataIntegrity".to_string(), integrity),
]);
std::fs::create_dir_all(mount_point).inspect_err(|e| {
error!(
sl(),
"Failed to create mount point directory {}: {:?}", mount_point, e
);
})?;
confidential_data_hub::secure_mount(device_type, &options, vec![], mount_point).await?;
Ok(())
}
async fn cdh_handler_sealed_secrets(oci: &mut Spec) -> Result<()> {
if !confidential_data_hub::is_cdh_client_initialized() {
return Ok(());

View File

@@ -65,6 +65,12 @@ type UeventWatcher = (Box<dyn UeventMatcher>, oneshot::Sender<Uevent>);
pub struct StorageState {
count: Arc<AtomicU32>,
device: Arc<dyn StorageDevice>,
/// Whether the storage is shared across multiple containers (e.g.
/// block-based emptyDirs). Shared storages should not be cleaned up
/// when a container exits; cleanup happens only when the sandbox is
/// destroyed.
shared: bool,
}
impl Debug for StorageState {
@@ -74,17 +80,11 @@ impl Debug for StorageState {
}
impl StorageState {
fn new() -> Self {
fn new(shared: bool) -> Self {
StorageState {
count: Arc::new(AtomicU32::new(1)),
device: Arc::new(StorageDeviceGeneric::default()),
}
}
pub fn from_device(device: Arc<dyn StorageDevice>) -> Self {
Self {
count: Arc::new(AtomicU32::new(1)),
device,
shared,
}
}
@@ -92,6 +92,10 @@ impl StorageState {
self.device.path()
}
pub fn is_shared(&self) -> bool {
self.shared
}
pub async fn ref_count(&self) -> u32 {
self.count.load(Ordering::Relaxed)
}
@@ -171,8 +175,10 @@ impl Sandbox {
/// Add a new storage object or increase reference count of existing one.
/// The caller may detect new storage object by checking `StorageState.refcount == 1`.
/// The `shared` flag indicates if this storage is shared across multiple containers;
/// if true, cleanup will be skipped when containers exit.
#[instrument]
pub async fn add_sandbox_storage(&mut self, path: &str) -> StorageState {
pub async fn add_sandbox_storage(&mut self, path: &str, shared: bool) -> StorageState {
match self.storages.entry(path.to_string()) {
Entry::Occupied(e) => {
let state = e.get().clone();
@@ -180,7 +186,7 @@ impl Sandbox {
state
}
Entry::Vacant(e) => {
let state = StorageState::new();
let state = StorageState::new(shared);
e.insert(state.clone());
state
}
@@ -188,22 +194,32 @@ impl Sandbox {
}
/// Update the storage device associated with a path.
/// Preserves the existing shared flag and reference count.
pub fn update_sandbox_storage(
&mut self,
path: &str,
device: Arc<dyn StorageDevice>,
) -> std::result::Result<Arc<dyn StorageDevice>, Arc<dyn StorageDevice>> {
if !self.storages.contains_key(path) {
return Err(device);
match self.storages.get(path) {
None => Err(device),
Some(existing) => {
let state = StorageState {
device,
..existing.clone()
};
// Safe to unwrap() because we have just ensured existence of entry via get().
let state = self.storages.insert(path.to_string(), state).unwrap();
Ok(state.device)
}
}
let state = StorageState::from_device(device);
// Safe to unwrap() because we have just ensured existence of entry.
let state = self.storages.insert(path.to_string(), state).unwrap();
Ok(state.device)
}
/// Decrease reference count and destroy the storage object if reference count reaches zero.
///
/// For shared storages (e.g., emptyDir volumes), cleanup is skipped even when refcount
/// reaches zero. The storage entry is kept in the map so subsequent containers can reuse
/// the already-mounted storage. Actual cleanup happens when the sandbox is destroyed.
///
/// Returns `Ok(true)` if the reference count has reached zero and the storage object has been
/// removed.
#[instrument]
@@ -212,6 +228,10 @@ impl Sandbox {
None => Err(anyhow!("Sandbox storage with path {} not found", path)),
Some(state) => {
if state.dec_and_test_ref_count().await {
if state.is_shared() {
state.count.store(1, Ordering::Release);
return Ok(false);
}
if let Some(storage) = self.storages.remove(path) {
storage.device.cleanup()?;
}
@@ -720,7 +740,7 @@ mod tests {
let tmpdir_path = tmpdir.path().to_str().unwrap();
// Add a new sandbox storage
let new_storage = s.add_sandbox_storage(tmpdir_path).await;
let new_storage = s.add_sandbox_storage(tmpdir_path, false).await;
// Check the reference counter
let ref_count = new_storage.ref_count().await;
@@ -730,7 +750,7 @@ mod tests {
);
// Use the existing sandbox storage
let new_storage = s.add_sandbox_storage(tmpdir_path).await;
let new_storage = s.add_sandbox_storage(tmpdir_path, false).await;
// Since we are using existing storage, the reference counter
// should be 2 by now.
@@ -771,7 +791,7 @@ mod tests {
assert!(bind_mount(srcdir_path, destdir_path, &logger).is_ok());
s.add_sandbox_storage(destdir_path).await;
s.add_sandbox_storage(destdir_path, false).await;
let storage = StorageDeviceGeneric::new(destdir_path.to_string());
assert!(s
.update_sandbox_storage(destdir_path, Arc::new(storage))
@@ -789,7 +809,7 @@ mod tests {
let other_dir_path = other_dir.path().to_str().unwrap();
other_dir_str = other_dir_path.to_string();
s.add_sandbox_storage(other_dir_path).await;
s.add_sandbox_storage(other_dir_path, false).await;
let storage = StorageDeviceGeneric::new(other_dir_path.to_string());
assert!(s
.update_sandbox_storage(other_dir_path, Arc::new(storage))
@@ -808,9 +828,9 @@ mod tests {
let storage_path = "/tmp/testEphe";
// Add a new sandbox storage
s.add_sandbox_storage(storage_path).await;
s.add_sandbox_storage(storage_path, false).await;
// Use the existing sandbox storage
let state = s.add_sandbox_storage(storage_path).await;
let state = s.add_sandbox_storage(storage_path, false).await;
assert!(
state.ref_count().await > 1,
"Expects false as the storage is not new."
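
Reviewer sketch: the reference-counting behaviour introduced by the `shared` flag in this file can be modelled without the agent's async and atomic machinery. The `Registry` and `State` types below are hypothetical stand-ins (not the agent's `Sandbox` or `StorageState`), and device cleanup is reduced to removing the map entry. Under those assumptions, the add/remove rules look roughly like this:

```rust
use std::collections::HashMap;

/// Simplified stand-in for the agent's `StorageState` (hypothetical):
/// a plain counter plus the `shared` flag.
#[derive(Debug, Clone)]
struct State {
    count: u32,
    shared: bool,
}

/// Hypothetical registry mirroring the add/remove semantics of the diff.
#[derive(Default)]
struct Registry {
    storages: HashMap<String, State>,
}

impl Registry {
    /// Insert a new entry with a count of 1, or bump the count of an
    /// existing one. The `shared` flag is recorded only on first insert.
    fn add(&mut self, path: &str, shared: bool) -> u32 {
        let state = self
            .storages
            .entry(path.to_string())
            .and_modify(|s| s.count += 1)
            .or_insert(State { count: 1, shared });
        state.count
    }

    /// Drop one reference. Returns `Ok(true)` only when the entry was removed.
    fn remove(&mut self, path: &str) -> Result<bool, String> {
        let state = self
            .storages
            .get_mut(path)
            .ok_or_else(|| format!("storage {path} not found"))?;
        state.count -= 1;
        if state.count > 0 {
            return Ok(false);
        }
        if state.shared {
            // Keep the entry so a later container can reuse the mount;
            // the count is reset to 1, as in the diff.
            state.count = 1;
            return Ok(false);
        }
        self.storages.remove(path);
        Ok(true)
    }
}
```

With this model, a non-shared path added twice is removed on the second `remove`, while a shared path is never removed by `remove` and reports a count of 2 on the next `add`.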

View File

@@ -6,7 +6,7 @@
use crate::linux_abi::pcipath_from_dev_tree_path;
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::os::unix::fs::{MetadataExt, PermissionsExt};
use std::path::Path;
use std::sync::Arc;
@@ -17,6 +17,7 @@ use kata_types::device::{
DRIVER_BLK_MMIO_TYPE, DRIVER_BLK_PCI_TYPE, DRIVER_NVDIMM_TYPE, DRIVER_SCSI_TYPE,
};
use kata_types::mount::StorageDevice;
use nix::sys::stat::{major, minor};
use protocols::agent::Storage;
use tracing::instrument;
@@ -29,10 +30,45 @@ use crate::device::block_device_handler::{
};
use crate::device::nvdimm_device_handler::wait_for_pmem_device;
use crate::device::scsi_device_handler::get_scsi_device_name;
use crate::storage::{common_storage_handler, new_device, StorageContext, StorageHandler};
use crate::storage::{
common_storage_handler, new_device, set_ownership, StorageContext, StorageHandler,
};
use slog::Logger;
#[cfg(target_arch = "s390x")]
use std::str::FromStr;
fn get_device_number(dev_path: &str, metadata: Option<&fs::Metadata>) -> Result<String> {
let dev_id = match metadata {
Some(m) => m.rdev(),
None => {
let m =
fs::metadata(dev_path).context(format!("get metadata on file {:?}", dev_path))?;
m.rdev()
}
};
Ok(format!("{}:{}", major(dev_id), minor(dev_id)))
}
async fn handle_block_storage(
logger: &Logger,
storage: &Storage,
dev_num: &str,
) -> Result<Arc<dyn StorageDevice>> {
let has_ephemeral_encryption = storage
.driver_options
.contains(&"encryption_key=ephemeral".to_string());
if has_ephemeral_encryption {
crate::rpc::cdh_secure_mount("block-device", dev_num, "luks2", &storage.mount_point)
.await?;
set_ownership(logger, storage)?;
new_device(storage.mount_point.clone())
} else {
let path = common_storage_handler(logger, storage)?;
new_device(path)
}
}
#[derive(Debug)]
pub struct VirtioBlkMmioHandler {}
@@ -75,6 +111,8 @@ impl StorageHandler for VirtioBlkPciHandler {
mut storage: Storage,
ctx: &mut StorageContext,
) -> Result<Arc<dyn StorageDevice>> {
let dev_num: String;
// If hot-plugged, get the device node path based on the PCI path
// otherwise use the virt path provided in Storage Source
if storage.source.starts_with("/dev") {
@@ -84,15 +122,16 @@ impl StorageHandler for VirtioBlkPciHandler {
if mode & libc::S_IFBLK == 0 {
return Err(anyhow!("Invalid device {}", &storage.source));
}
dev_num = get_device_number(&storage.source, Some(&metadata))?;
} else {
let (root_complex, pcipath) = pcipath_from_dev_tree_path(&storage.source)?;
let dev_path =
get_virtio_blk_pci_device_name(ctx.sandbox, root_complex, &pcipath).await?;
storage.source = dev_path;
dev_num = get_device_number(&storage.source, None)?;
}
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
handle_block_storage(ctx.logger, &storage, &dev_num).await
}
}
@@ -151,10 +190,10 @@ impl StorageHandler for ScsiHandler {
) -> Result<Arc<dyn StorageDevice>> {
// Retrieve the device path from SCSI address.
let dev_path = get_scsi_device_name(ctx.sandbox, &storage.source).await?;
storage.source = dev_path;
storage.source = dev_path.clone();
let path = common_storage_handler(ctx.logger, &storage)?;
new_device(path)
let dev_num = get_device_number(&dev_path, None)?;
handle_block_storage(ctx.logger, &storage, &dev_num).await
}
}
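
Reviewer sketch: `get_device_number` in this file formats a block device's `rdev` as `major:minor` using `nix::sys::stat::{major, minor}`. The std-only version below is an illustration, not the agent's code: it assumes the Linux glibc `dev_t` bit layout, and the helper names (`dev_major`, `dev_minor`, `format_device_number`, `device_number`) are made up for this example.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;

/// Extract the major number from a `dev_t` (assumes the Linux glibc layout).
fn dev_major(dev: u64) -> u64 {
    ((dev >> 32) & 0xffff_f000) | ((dev >> 8) & 0x0000_0fff)
}

/// Extract the minor number from a `dev_t` (assumes the Linux glibc layout).
fn dev_minor(dev: u64) -> u64 {
    ((dev >> 12) & 0xffff_ff00) | (dev & 0x0000_00ff)
}

/// Format a raw device ID as "major:minor", the form used for the
/// `deviceId` option in the diff.
fn format_device_number(rdev: u64) -> String {
    format!("{}:{}", dev_major(rdev), dev_minor(rdev))
}

/// Look up the device number of a device node by path.
fn device_number(path: &str) -> io::Result<String> {
    Ok(format_device_number(fs::metadata(path)?.rdev()))
}
```

For example, a raw value of `0x0800` decodes to `8:0`, the conventional number of the first SCSI disk on Linux.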

View File

@@ -172,7 +172,11 @@ pub async fn add_storages(
for storage in storages {
let path = storage.mount_point.clone();
let state = sandbox.lock().await.add_sandbox_storage(&path).await;
let state = sandbox
.lock()
.await
.add_sandbox_storage(&path, storage.shared)
.await;
if state.ref_count().await > 1 {
if let Some(path) = state.path() {
if !path.is_empty() {

View File

@@ -242,7 +242,7 @@ mod tests {
let metrics = Arc::new(SerialDeviceMetrics::default());
let out: Arc<Mutex<Option<Box<(dyn std::io::Write + Send + 'static)>>>> =
let out: Arc<Mutex<Option<Box<dyn std::io::Write + Send + 'static>>>> =
Arc::new(Mutex::new(Some(Box::new(std::io::sink()))));
let mut serial = SerialDevice {
serial: Serial::with_events(

View File

@@ -1174,7 +1174,6 @@ pub(crate) mod tests {
use dbs_virtio_devices::Result as VirtIoResult;
use dbs_virtio_devices::{
ActivateResult, VirtioDeviceConfig, VirtioDeviceInfo, VirtioSharedMemory,
DEVICE_ACKNOWLEDGE, DEVICE_DRIVER, DEVICE_DRIVER_OK, DEVICE_FEATURES_OK, DEVICE_INIT,
};
use dbs_address_space::{AddressSpaceLayout, AddressSpaceRegion, AddressSpaceRegionType};

View File

@@ -99,76 +99,61 @@ impl Default for EpollManager {
#[cfg(test)]
mod tests {
use super::*;
use std::os::unix::io::AsRawFd;
use std::os::fd::AsRawFd;
use std::sync::mpsc::channel;
use std::time::Duration;
use vmm_sys_util::{epoll::EventSet, eventfd::EventFd};
struct DummySubscriber {
pub event: EventFd,
pub event: Arc<EventFd>,
pub notify: std::sync::mpsc::Sender<()>,
}
impl DummySubscriber {
fn new() -> Self {
Self {
event: EventFd::new(0).unwrap(),
}
fn new(event: Arc<EventFd>, notify: std::sync::mpsc::Sender<()>) -> Self {
Self { event, notify }
}
}
impl MutEventSubscriber for DummySubscriber {
fn process(&mut self, events: Events, _ops: &mut EventOps) {
let source = events.fd();
let event_set = events.event_set();
assert_ne!(source, self.event.as_raw_fd());
match event_set {
EventSet::IN => {
unreachable!()
}
EventSet::OUT => {
self.event.read().unwrap();
}
_ => {
unreachable!()
}
}
fn init(&mut self, ops: &mut EventOps) {
ops.add(Events::new(self.event.as_ref(), EventSet::IN))
.unwrap();
}
fn init(&mut self, _ops: &mut EventOps) {}
fn process(&mut self, events: Events, _ops: &mut EventOps) {
if events.fd() == self.event.as_raw_fd() && events.event_set().contains(EventSet::IN) {
let _ = self.event.read();
let _ = self.notify.send(());
}
}
}
#[test]
fn test_epoll_manager() {
let mut epoll_manager = EpollManager::default();
let epoll_manager_clone = epoll_manager.clone();
let thread = std::thread::spawn(move || loop {
let count = epoll_manager_clone.handle_events(-1).unwrap();
if count == 0 {
continue;
let epoll_manager = EpollManager::default();
let (stop_tx, stop_rx) = channel::<()>();
let worker_mgr = epoll_manager.clone();
let worker = std::thread::spawn(move || {
while stop_rx.try_recv().is_err() {
let _ = worker_mgr.handle_events(50);
}
assert_eq!(count, 1);
break;
});
let handler = DummySubscriber::new();
let event = handler.event.try_clone().unwrap();
let (notify_tx, notify_rx) = channel::<()>();
let event = Arc::new(EventFd::new(0).unwrap());
let handler = DummySubscriber::new(event.clone(), notify_tx);
let id = epoll_manager.add_subscriber(Box::new(handler));
thread.join().unwrap();
epoll_manager
.add_event(id, Events::new(&event, EventSet::OUT))
.unwrap();
event.write(1).unwrap();
let epoll_manager_clone = epoll_manager.clone();
let thread = std::thread::spawn(move || loop {
let count = epoll_manager_clone.handle_events(-1).unwrap();
if count == 0 {
continue;
}
assert_eq!(count, 2);
break;
});
notify_rx
.recv_timeout(Duration::from_secs(2))
.expect("timeout waiting for subscriber to be processed");
thread.join().unwrap();
epoll_manager.remove_subscriber(id).unwrap();
epoll_manager.clone().remove_subscriber(id).unwrap();
let _ = stop_tx.send(());
worker.join().unwrap();
}
}
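
Reviewer sketch: the rewritten test replaces a blocking `thread.join()` with a channel and `recv_timeout`, so a lost event fails the test after a bounded wait instead of hanging it. The same idea can be packaged as a small helper. `run_with_timeout` is a hypothetical name for this illustration and is not part of the crate under test.

```rust
use std::sync::mpsc::{channel, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Run `work` on a helper thread and wait at most `timeout` for its result.
/// On timeout the helper thread is left running detached; only the wait
/// is bounded.
fn run_with_timeout<T, F>(work: F, timeout: Duration) -> Result<T, RecvTimeoutError>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout)
}
```

A caller can then turn a timeout into a clear test failure, for instance with `.expect("timeout waiting for subscriber")`.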

View File

@@ -690,6 +690,15 @@ mod tests {
use crate::tests::{create_address_space, create_vm_and_irq_manager};
use crate::{create_queue_notifier, VirtioQueueConfig};
fn unique_tap_name(prefix: &str) -> String {
use std::sync::atomic::{AtomicUsize, Ordering};
static CNT: AtomicUsize = AtomicUsize::new(0);
let n = CNT.fetch_add(1, Ordering::Relaxed);
// "vtap" (4) + pid (<= 3 hex digits) + n (<= 3 hex digits) => len <= 10, within the 15-byte interface name limit
format!("{}{:x}{:x}", prefix, std::process::id() & 0xfff, n & 0xfff)
}
fn create_vhost_kern_net_epoll_handler(
id: String,
) -> NetEpollHandler<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> {
@@ -723,13 +732,16 @@ mod tests {
let guest_mac = MacAddr::parse_str(guest_mac_str).unwrap();
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
assert_eq!(dev.device_type(), TYPE_NET);
@@ -765,14 +777,16 @@ mod tests {
{
let queue_sizes = Arc::new(vec![128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, QueueSync, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
let queues = vec![
VirtioQueueConfig::create(128, 0).unwrap(),
VirtioQueueConfig::create(128, 0).unwrap(),
@@ -809,13 +823,17 @@ mod tests {
let queue_eventfd2 = Arc::new(EventFd::new(0).unwrap());
let queue_sizes = Arc::new(vec![128, 128]);
let epoll_mgr = EpollManager::default();
let mut dev: Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap> = Net::new(
String::from("test_vhosttap"),
Some(&guest_mac),
queue_sizes,
epoll_mgr,
)
.unwrap();
let tap_name = unique_tap_name("vtap");
let dev_result: VirtioResult<Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap>> =
Net::new(tap_name.clone(), Some(&guest_mac), queue_sizes, epoll_mgr);
let mut dev: Net<Arc<GuestMemoryMmap>, Queue, GuestRegionMmap> = match dev_result {
Ok(d) => d,
Err(e) => {
eprintln!("skip test: failed to create tap {}: {:?}", tap_name, e);
return;
}
};
let queues = vec![
VirtioQueueConfig::new(queue, queue_eventfd, notifier.clone(), 1),
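
Reviewer sketch: the tests in this file now derive a per-test TAP name instead of sharing `test_vhosttap`, which avoids name collisions when tests run in parallel. The standalone version below restates that naming scheme with an explicit length check. `unique_name` and `MAX_IFNAME_LEN` are illustrative names; the 15-byte limit reflects Linux's `IFNAMSIZ` of 16 including the trailing NUL.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Linux interface names hold at most 15 bytes plus a trailing NUL.
const MAX_IFNAME_LEN: usize = 15;

/// Build a name that is unique within this process: prefix, then the low
/// 12 bits of the PID, then the low 12 bits of a counter, both in hex.
fn unique_name(prefix: &str) -> String {
    static COUNTER: AtomicUsize = AtomicUsize::new(0);
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    // Masking with 0xfff keeps each hex field to at most three digits.
    let name = format!("{}{:x}{:x}", prefix, std::process::id() & 0xfff, n & 0xfff);
    debug_assert!(name.len() <= MAX_IFNAME_LEN);
    name
}
```

Because the counter is masked to 12 bits, names repeat after 4096 calls in one process, and two processes whose PIDs share the same low 12 bits can still collide. That is probably acceptable for a test suite but would not be for production naming.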

View File

@@ -590,6 +590,7 @@ where
mod tests {
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};
use dbs_device::resources::DeviceResources;
use dbs_interrupt::{InterruptManager, InterruptSourceType, MsiNotifier, NoopNotifier};
@@ -609,19 +610,16 @@ mod tests {
};
use crate::{VirtioDevice, VirtioDeviceConfig, VirtioQueueConfig, TYPE_NET};
fn connect_slave(path: &str) -> Option<Endpoint<MasterReq>> {
let mut retry_count = 5;
fn connect_slave(path: &str, timeout: Duration) -> Option<Endpoint<MasterReq>> {
let deadline = Instant::now() + timeout;
loop {
match Endpoint::<MasterReq>::connect(path) {
Ok(endpoint) => return Some(endpoint),
Ok(ep) => return Some(ep),
Err(_) => {
if retry_count > 0 {
std::thread::sleep(std::time::Duration::from_millis(100));
retry_count -= 1;
continue;
} else {
if Instant::now() >= deadline {
return None;
}
thread::sleep(Duration::from_millis(20));
}
}
}
@@ -639,62 +637,88 @@ mod tests {
#[test]
fn test_vhost_user_net_virtio_device_normal() {
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let dir_path = std::path::Path::new("/tmp");
let socket_path = dir_path.join(format!(
"vhost-user-net-{}-{:?}.sock",
std::process::id(),
thread::current().id()
));
let socket_str = socket_path.to_str().unwrap().to_string();
let _ = std::fs::remove_file(&socket_path);
let queue_sizes = Arc::new(vec![128u16]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {
let mut slave = connect_slave(device_socket).unwrap();
let socket_for_slave = socket_str.clone();
let slave_th = thread::spawn(move || {
let mut slave = connect_slave(&socket_for_slave, Duration::from_secs(5))
.unwrap_or_else(|| panic!("slave connect timeout: {}", socket_for_slave));
create_vhost_user_net_slave(&mut slave);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> =
VhostUserNet::new_server(device_socket, None, queue_sizes, epoll_mgr).unwrap();
let (tx, rx) = std::sync::mpsc::channel();
let socket_for_master = socket_str.clone();
let queue_sizes_for_master = queue_sizes.clone();
let epoll_mgr_for_master = epoll_mgr.clone();
thread::spawn(move || {
let res = VhostUserNet::<Arc<GuestMemoryMmap>>::new_server(
&socket_for_master,
None,
queue_sizes_for_master,
epoll_mgr_for_master,
);
let _ = tx.send(res);
});
let dev_res = rx
.recv_timeout(Duration::from_secs(5))
.unwrap_or_else(|_| panic!("new_server() stuck/timeout: {}", socket_str));
let dev: VhostUserNet<Arc<GuestMemoryMmap>> = dev_res.unwrap_or_else(|e| {
panic!(
"new_server() returned error: {:?}, socket={}",
e, socket_str
)
});
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::device_type(&dev),
TYPE_NET
);
let queue_size = [128];
let queue_size = [128u16];
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::queue_max_sizes(
&dev
),
&queue_size[..]
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 0),
dev.device().device_info.get_avail_features(0)
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 1),
dev.device().device_info.get_avail_features(1)
);
assert_eq!(
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 2),
dev.device().device_info.get_avail_features(2)
);
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::set_acked_features(
&mut dev, 2, 0,
);
assert_eq!(VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::get_avail_features(&dev, 2), 0);
let config: [u8; 8] = [0; 8];
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
&mut dev, 0, &config,
);
let mut data: [u8; 8] = [1; 8];
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
&mut dev, 0, &mut data,
);
assert_eq!(config, data);
handler.join().unwrap();
slave_th.join().unwrap();
let _ = std::fs::remove_file(&socket_path);
drop(dev);
}
#[test]
fn test_vhost_user_net_virtio_device_activate() {
skip_if_kvm_unaccessable!();
let device_socket = concat!("vhost.", line!());
let queue_sizes = Arc::new(vec![128]);
let dir_path = std::path::Path::new("/tmp");
let socket_path = dir_path.join(format!(
"vhost-user-net-{}-{:?}.sock",
std::process::id(),
thread::current().id()
));
let socket_str = socket_path.to_str().unwrap().to_string();
let _ = std::fs::remove_file(&socket_path);
let queue_sizes = Arc::new(vec![128u16]);
let epoll_mgr = EpollManager::default();
let handler = thread::spawn(move || {
let mut slave = connect_slave(device_socket).unwrap();
let socket_for_slave = socket_str.clone();
let slave_th = thread::spawn(move || {
let mut slave = connect_slave(&socket_for_slave, Duration::from_secs(10))
.unwrap_or_else(|| panic!("slave connect timeout: {}", socket_for_slave));
create_vhost_user_net_slave(&mut slave);
let mut pfeatures = VhostUserProtocolFeatures::all();
// A workaround for no support for `INFLIGHT_SHMFD`. File an issue to track
@@ -702,8 +726,30 @@ mod tests {
pfeatures -= VhostUserProtocolFeatures::INFLIGHT_SHMFD;
negotiate_slave(&mut slave, pfeatures, true, 1);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> =
VhostUserNet::new_server(device_socket, None, queue_sizes, epoll_mgr).unwrap();
let (tx, rx) = std::sync::mpsc::channel();
let socket_for_master = socket_str.clone();
let queue_sizes_for_master = queue_sizes.clone();
let epoll_mgr_for_master = epoll_mgr.clone();
thread::spawn(move || {
let res = VhostUserNet::<Arc<GuestMemoryMmap>>::new_server(
&socket_for_master,
None,
queue_sizes_for_master,
epoll_mgr_for_master,
);
let _ = tx.send(res);
});
let mut dev: VhostUserNet<Arc<GuestMemoryMmap>> = rx
.recv_timeout(Duration::from_secs(10))
.unwrap_or_else(|_| panic!("new_server() stuck/timeout: {}", socket_str))
.unwrap_or_else(|e| {
panic!(
"new_server() returned error: {:?}, socket={}",
e, socket_str
)
});
// invalid queue size
{
let kvm = Kvm::new().unwrap();
@@ -760,6 +806,9 @@ mod tests {
);
dev.activate(config).unwrap();
}
handler.join().unwrap();
slave_th.join().unwrap();
let _ = std::fs::remove_file(&socket_path);
drop(dev);
}
}
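
Reviewer note: the vhost-user-net tests above replace a fixed retry count with a deadline, and guard the blocking `new_server()` call with `recv_timeout` on a channel. The two patterns can be sketched generically as below. The function names `retry_until` and `run_with_timeout` are hypothetical and only illustrate the technique; they do not appear in the change.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Calls `attempt` until it yields `Some` or `timeout` has elapsed.
pub fn retry_until<T>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(value) = attempt() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        thread::sleep(interval);
    }
}

/// Runs a possibly blocking `f` on a helper thread and waits at most `timeout`.
/// On timeout the helper thread is left detached, as in the tests above.
pub fn run_with_timeout<T, F>(timeout: Duration, f: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore the error.
        let _ = tx.send(f());
    });
    rx.recv_timeout(timeout).ok()
}
```

A deadline bounds the total wait regardless of how long each attempt takes, which a retry counter does not. The trade-off of `run_with_timeout` is that a stuck call leaks its thread until the process exits, which is usually acceptable in a test.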

View File

@@ -867,56 +867,96 @@ mod tests {
.set_read_timeout(Some(Duration::from_millis(150)))
.is_ok());
let cond_pair = Arc::new((Mutex::new(false), Condvar::new()));
let cond_pair_2 = Arc::clone(&cond_pair);
let handler = thread::Builder::new()
.spawn(move || {
// notify handler thread start
let (lock, cvar) = &*cond_pair_2;
let mut started = lock.lock().unwrap();
*started = true;
// stage:
// 0 = handler started
// 1 = first read timed out (main can do first write now)
// 2 = timeout cancelled, handler is about to do 3rd blocking read
let stage = Arc::new((Mutex::new(0u32), Condvar::new()));
let stage2 = Arc::clone(&stage);
let handler = thread::spawn(move || {
// notify started
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 0;
cvar.notify_one();
drop(started);
}
let start_time1 = Instant::now();
let mut reader_buf = [0; 5];
// first read would timed out
assert_eq!(
outer_stream.read_exact(&mut reader_buf).unwrap_err().kind(),
ErrorKind::TimedOut
);
let end_time1 = Instant::now().duration_since(start_time1).as_millis();
assert!((150..250).contains(&end_time1));
let mut reader_buf = [0u8; 5];
// second read would ok
assert!(outer_stream.read_exact(&mut reader_buf).is_ok());
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
// 1) first read should time out
let start_time1 = Instant::now();
assert_eq!(
outer_stream.read_exact(&mut reader_buf).unwrap_err().kind(),
ErrorKind::TimedOut
);
let end_time1 = start_time1.elapsed().as_millis();
assert!((150..300).contains(&end_time1));
// cancel the read timeout
let start_time2 = Instant::now();
outer_stream.set_read_timeout(None).unwrap();
assert!(outer_stream.read_exact(&mut reader_buf).is_ok());
let end_time2 = Instant::now().duration_since(start_time2).as_millis();
assert!(end_time2 >= 500);
})
.unwrap();
outer_stream
.set_read_timeout(Some(Duration::from_secs(10)))
.unwrap();
// wait handler thread started
let (lock, cvar) = &*cond_pair;
let mut started = lock.lock().unwrap();
while !*started {
started = cvar.wait(started).unwrap();
// notify main: timeout observed, now do first write
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 1;
cvar.notify_one();
}
// 2) second read should succeed (main writes after stage == 1)
outer_stream.read_exact(&mut reader_buf).unwrap();
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
// 3) cancel timeout, then do a blocking read; notify main before blocking
outer_stream.set_read_timeout(None).unwrap();
{
let (lock, cvar) = &*stage2;
let mut s = lock.lock().unwrap();
*s = 2;
cvar.notify_one();
}
let start_time2 = Instant::now();
outer_stream.read_exact(&mut reader_buf).unwrap();
let end_time2 = start_time2.elapsed().as_millis();
assert!(end_time2 >= 500);
assert_eq!(reader_buf, [1, 2, 3, 4, 5]);
});
// wait handler started (stage==0)
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s != 0 {
s = cvar.wait(s).unwrap();
}
}
// sleep 300ms, test timeout
thread::sleep(Duration::from_millis(300));
let writer_buf = [1, 2, 3, 4, 5];
inner_stream.write_all(&writer_buf).unwrap();
// wait first timeout done (stage==1), then do first write
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s < 1 {
s = cvar.wait(s).unwrap();
}
}
inner_stream.write_all(&[1, 2, 3, 4, 5]).unwrap();
// wait handler cancelled timeout and is about to block-read (stage==2)
{
let (lock, cvar) = &*stage;
let mut s = lock.lock().unwrap();
while *s < 2 {
s = cvar.wait(s).unwrap();
}
}
// sleep 500ms again, test cancel timeout
thread::sleep(Duration::from_millis(500));
let writer_buf = [1, 2, 3, 4, 5];
inner_stream.write_all(&writer_buf).unwrap();
inner_stream.write_all(&[1, 2, 3, 4, 5]).unwrap();
handler.join().unwrap();
}
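
Reviewer note: the rewritten test above replaces fixed sleeps with a shared stage counter guarded by a `Mutex` and a `Condvar`. The handshake can be sketched as a small wrapper type; `Stage`, `advance_to`, and `wait_for` are hypothetical names chosen for illustration. The sketch waits with `>=` semantics so that a waiter cannot miss a stage the other thread has already passed.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// A monotonically increasing stage counter shared between threads.
#[derive(Clone)]
pub struct Stage(Arc<(Mutex<u32>, Condvar)>);

impl Stage {
    /// Creates a counter that starts at stage 0.
    pub fn new() -> Self {
        Stage(Arc::new((Mutex::new(0), Condvar::new())))
    }

    /// Publishes `value` and wakes every waiter.
    pub fn advance_to(&self, value: u32) {
        let (lock, cvar) = &*self.0;
        *lock.lock().unwrap() = value;
        cvar.notify_all();
    }

    /// Blocks until the stage is at least `value`; returns the observed stage.
    pub fn wait_for(&self, value: u32) -> u32 {
        let (lock, cvar) = &*self.0;
        let guard = cvar
            .wait_while(lock.lock().unwrap(), |stage| *stage < value)
            .unwrap();
        *guard
    }
}
```

Because the counter starts at 0, waiting for stage 0 returns immediately; only the later stages actually order the two threads.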

View File

@@ -120,7 +120,7 @@ mod tests {
use libc::{cpu_set_t, syscall};
use std::convert::TryInto;
use std::{mem, process, thread};
use std::{mem, thread};
use seccompiler::{apply_filter, BpfProgram, SeccompAction, SeccompFilter};
@@ -157,6 +157,16 @@ mod tests {
let child = thread::spawn(move || {
assert!(register_signal_handlers().is_ok());
// Trigger SIGBUS/SIGSEGV *before* installing the seccomp filter.
// Call SIGBUS signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigbus.count(), 0);
unsafe { libc::raise(SIGBUS) };
// Call SIGSEGV signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigsegv.count(), 0);
unsafe { libc::raise(SIGSEGV) };
// Install a seccomp filter that traps a known syscall so that we can verify SIGSYS handling.
let filter = SeccompFilter::new(
vec![(libc::SYS_mkdirat, vec![])].into_iter().collect(),
SeccompAction::Allow,
@@ -168,20 +178,8 @@ mod tests {
assert!(apply_filter(&TryInto::<BpfProgram>::try_into(filter).unwrap()).is_ok());
assert_eq!(METRICS.read().unwrap().seccomp.num_faults.count(), 0);
// Call the blacklisted `SYS_mkdirat`.
// Invoke the blacklisted syscall to trigger SIGSYS and exercise the SIGSYS handler.
unsafe { syscall(libc::SYS_mkdirat, "/foo/bar\0") };
// Call SIGBUS signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigbus.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGBUS);
}
// Call SIGSEGV signal handler.
assert_eq!(METRICS.read().unwrap().signals.sigsegv.count(), 0);
unsafe {
syscall(libc::SYS_kill, process::id(), SIGSEGV);
}
});
assert!(child.join().is_ok());

View File

@@ -13,6 +13,7 @@ use super::{default, register_hypervisor_plugin};
use crate::config::default::MAX_CH_VCPUS;
use crate::config::default::MIN_CH_MEMORY_SIZE_MB;
use crate::config::hypervisor::VIRTIO_BLK_MMIO;
use crate::config::{ConfigPlugin, TomlConfig};
use crate::{resolve_path, validate_path};
@@ -104,6 +105,16 @@ impl ConfigPlugin for CloudHypervisorConfig {
));
}
// CoCo guest hardening: virtio-mmio is not hardened for confidential computing.
if ch.security_info.confidential_guest
&& ch.boot_info.vm_rootfs_driver == VIRTIO_BLK_MMIO
{
return Err(std::io::Error::other(
"Confidential guests must not use virtio-blk-mmio (use virtio-blk-pci); \
virtio-mmio is not hardened for CoCo",
));
}
if ch.boot_info.kernel.is_empty() {
return Err(std::io::Error::other("Guest kernel image for CH is empty"));
}
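
Reviewer note: the confidential-guest check added to the hypervisor config plugins in this change can be read as a pure function of two inputs. The sketch below isolates it for illustration. The function name `check_rootfs_driver` is hypothetical, and the string value of `VIRTIO_BLK_MMIO` is an assumption inferred from the error message, since the diff shows only the constant's name.

```rust
use std::io;

/// Assumed value of the `VIRTIO_BLK_MMIO` constant (the diff shows only its name).
const VIRTIO_BLK_MMIO: &str = "virtio-blk-mmio";

/// Rejects virtio-blk-mmio as the rootfs driver for confidential guests.
pub fn check_rootfs_driver(confidential_guest: bool, vm_rootfs_driver: &str) -> io::Result<()> {
    if confidential_guest && vm_rootfs_driver == VIRTIO_BLK_MMIO {
        return Err(io::Error::other(
            "Confidential guests must not use virtio-blk-mmio (use virtio-blk-pci); \
             virtio-mmio is not hardened for CoCo",
        ));
    }
    Ok(())
}
```

Non-confidential guests are unaffected: the check only fires when both conditions hold.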

View File

@@ -124,6 +124,17 @@ impl ConfigPlugin for QemuConfig {
));
}
// CoCo guest hardening: virtio-mmio transport is not hardened for confidential
// computing; only virtio-pci is. Ensure we never use virtio-blk-mmio for rootfs.
if qemu.security_info.confidential_guest
&& qemu.boot_info.vm_rootfs_driver == VIRTIO_BLK_MMIO
{
return Err(std::io::Error::other(
"Confidential guests must not use virtio-blk-mmio (use virtio-blk-pci); \
virtio-mmio is not hardened for CoCo",
));
}
if qemu.boot_info.kernel.is_empty() {
return Err(std::io::Error::other(
"Guest kernel image for qemu is empty",

View File

@@ -520,6 +520,11 @@ message Storage {
// FSGroup consists of the group ID and group ownership change policy
// that the mounted volume must have its group ID changed to when specified.
FSGroup fs_group = 7;
// Shared indicates this storage is shared across multiple containers
// (e.g., block-based emptyDirs). When true, the agent should not clean up
// the storage when a container using it exits, as other containers
// may still need it. Cleanup will happen when the sandbox is destroyed.
bool shared = 8;
}
// Device represents only the devices that could have been defined through the
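
Reviewer note: the new `shared` field on `Storage` tells the agent to defer cleanup until the sandbox is destroyed. A minimal sketch of the container-exit side of that rule follows. The `Storage` struct here is a trimmed, hypothetical stand-in for the generated protobuf type, and `storages_to_clean_on_exit` is an illustrative name, not the agent's actual function.

```rust
/// Trimmed, hypothetical stand-in for the generated `Storage` message.
#[derive(Debug, Clone, PartialEq)]
pub struct Storage {
    pub mount_point: String,
    pub shared: bool,
}

/// Returns the storages that may be cleaned up when a single container exits.
/// Shared storages are skipped; they are left for sandbox teardown.
pub fn storages_to_clean_on_exit(storages: &[Storage]) -> Vec<&Storage> {
    storages.iter().filter(|s| !s.shared).collect()
}
```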

View File

@@ -24,9 +24,7 @@ message SecureMountRequest {
string mount_point = 4;
}
message SecureMountResponse {
string mount_path = 1;
}
message SecureMountResponse {}
message ImagePullRequest {
// - `image_url`: The reference of the image to pull

View File

@@ -15,6 +15,11 @@ PROJECT_URL = https://github.com/kata-containers
PROJECT_COMPONENT = containerd-shim-kata-v2
CONTAINERD_RUNTIME_NAME = io.containerd.kata.v2
# This snippet finds all packages inside runtime-rs. Used for testing.
PACKAGES := $(shell cargo metadata --no-deps --format-version 1 | \
jq -r '.packages[] | select(.manifest_path | contains("runtime-rs")) | .name')
PACKAGE_FLAGS := $(patsubst %,-p %,$(PACKAGES))
include ../../utils.mk
ARCH_DIR = arch
@@ -45,9 +50,9 @@ test:
else
##TARGET default: build code
default: runtime show-header
##TARGET test: run cargo tests
##TARGET test: run cargo tests for runtime-rs and all its sub-crates.
test: static-checks-build
@cargo test --all --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture --skip bindgen
@cargo test $(PACKAGE_FLAGS) --target $(TRIPLE) $(EXTRA_RUSTFEATURES) -- --nocapture --skip bindgen
install: install-runtime install-configs
endif
@@ -733,7 +738,7 @@ static-checks-build: $(GENERATED_FILES)
$(TARGET): $(GENERATED_FILES) $(TARGET_PATH)
$(TARGET_PATH): $(SOURCES) | show-summary
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build --target $(TRIPLE) $(if $(findstring release,$(BUILD_TYPE)),--release) $(EXTRA_RUSTFEATURES)
@RUSTFLAGS="$(EXTRA_RUSTFLAGS) --deny warnings" cargo build -p runtime-rs --target $(TRIPLE) $(if $(findstring release,$(BUILD_TYPE)),--release) $(EXTRA_RUSTFEATURES)
$(GENERATED_FILES): %: %.in
@sed \
@@ -769,7 +774,7 @@ endif
##TARGET run: build and run agent
run:
@cargo run --target $(TRIPLE)
@cargo run -p runtime-rs --target $(TRIPLE)
show-header:
@printf "%s - version %s (commit %s)\n\n" "$(TARGET)" "$(VERSION)" "$(COMMIT_MSG)"

View File

@@ -470,7 +470,10 @@ impl CloudHypervisorInner {
net_config.id = None;
net_config.num_queues = network_queues_pairs * 2;
info!(sl!(), "network device queue pairs {:?}", network_queues_pairs);
info!(
sl!(),
"network device queue pairs {:?}", network_queues_pairs
);
// We need to ensure that opening the network device happens in the netns.
let netns = self.netns.clone().unwrap_or_default();

View File

@@ -9,8 +9,8 @@ use crate::device::topology::PCIePort;
use crate::qemu::qmp::get_qmp_socket_path;
use crate::{
device::driver::ProtectionDeviceConfig, hypervisor_persist::HypervisorState, selinux,
HypervisorConfig, MemoryConfig, VcpuThreadIds, VsockDevice, HYPERVISOR_QEMU,
KATA_BLK_DEV_TYPE, KATA_CCW_DEV_TYPE, KATA_NVDIMM_DEV_TYPE, KATA_SCSI_DEV_TYPE,
HypervisorConfig, MemoryConfig, VcpuThreadIds, VsockDevice, HYPERVISOR_QEMU, KATA_BLK_DEV_TYPE,
KATA_CCW_DEV_TYPE, KATA_NVDIMM_DEV_TYPE, KATA_SCSI_DEV_TYPE,
};
use crate::utils::{
@@ -138,15 +138,16 @@ impl QemuInner {
&block_dev.config.path_on_host,
block_dev.config.is_readonly,
)?,
KATA_CCW_DEV_TYPE | KATA_BLK_DEV_TYPE | KATA_SCSI_DEV_TYPE => cmdline.add_block_device(
block_dev.device_id.as_str(),
&block_dev.config.path_on_host,
block_dev
.config
.is_direct
.unwrap_or(self.config.blockdev_info.block_device_cache_direct),
block_dev.config.driver_option.as_str() == KATA_SCSI_DEV_TYPE,
)?,
KATA_CCW_DEV_TYPE | KATA_BLK_DEV_TYPE | KATA_SCSI_DEV_TYPE => cmdline
.add_block_device(
block_dev.device_id.as_str(),
&block_dev.config.path_on_host,
block_dev
.config
.is_direct
.unwrap_or(self.config.blockdev_info.block_device_cache_direct),
block_dev.config.driver_option.as_str() == KATA_SCSI_DEV_TYPE,
)?,
unsupported => {
info!(sl!(), "unsupported block device driver: {}", unsupported)
}

View File

@@ -187,11 +187,21 @@ impl Qmp {
continue;
}
(None, _) => {
warn!(sl!(), "hotpluggable vcpu {} has no socket_id for driver {}, skipping", core_id, driver);
warn!(
sl!(),
"hotpluggable vcpu {} has no socket_id for driver {}, skipping",
core_id,
driver
);
continue;
}
(_, None) => {
warn!(sl!(), "hotpluggable vcpu {} has no thread_id for driver {}, skipping", core_id, driver);
warn!(
sl!(),
"hotpluggable vcpu {} has no thread_id for driver {}, skipping",
core_id,
driver
);
continue;
}
}
@@ -753,10 +763,9 @@ impl Qmp {
Ok((None, Some(scsi_addr)))
} else if block_driver == VIRTIO_BLK_CCW {
let subchannel = self
.ccw_subchannel
.as_mut()
.ok_or_else(|| anyhow!("CCW subchannel not available for virtio-blk-ccw hotplug"))?;
let subchannel = self.ccw_subchannel.as_mut().ok_or_else(|| {
anyhow!("CCW subchannel not available for virtio-blk-ccw hotplug")
})?;
let slot = subchannel
.add_device(&node_name)

View File

@@ -11,6 +11,7 @@ lazy_static = { workspace = true }
netns-rs = { workspace = true }
slog = { workspace = true }
slog-scope = { workspace = true }
containerd-shim-protos = { workspace = true }
tokio = { workspace = true, features = ["rt-multi-thread"] }
tracing = { workspace = true }
tracing-opentelemetry = { workspace = true }

View File

@@ -6,7 +6,7 @@
use std::sync::Arc;
use anyhow::{Context, Result};
use containerd_shim_protos::events::task::{TaskExit, TaskOOM};
use containerd_shim_protos::events::task::{TaskCreate, TaskDelete, TaskExit, TaskOOM, TaskStart};
use containerd_shim_protos::protobuf::Message as ProtobufMessage;
use tokio::sync::mpsc::{channel, Receiver, Sender};
@@ -49,9 +49,15 @@ impl Message {
const TASK_OOM_EVENT_TOPIC: &str = "/tasks/oom";
const TASK_EXIT_EVENT_TOPIC: &str = "/tasks/exit";
const TASK_START_EVENT_TOPIC: &str = "/tasks/start";
const TASK_CREATE_EVENT_TOPIC: &str = "/tasks/create";
const TASK_DELETE_EVENT_TOPIC: &str = "/tasks/delete";
const TASK_OOM_EVENT_URL: &str = "containerd.events.TaskOOM";
const TASK_EXIT_EVENT_URL: &str = "containerd.events.TaskExit";
const TASK_START_EVENT_URL: &str = "containerd.events.TaskStart";
const TASK_CREATE_EVENT_URL: &str = "containerd.events.TaskCreate";
const TASK_DELETE_EVENT_URL: &str = "containerd.events.TaskDelete";
pub trait Event: std::fmt::Debug + Send {
fn r#type(&self) -> String;
@@ -86,3 +92,45 @@ impl Event for TaskExit {
self.write_to_bytes().context("get exit value")
}
}
impl Event for TaskStart {
fn r#type(&self) -> String {
TASK_START_EVENT_TOPIC.to_string()
}
fn type_url(&self) -> String {
TASK_START_EVENT_URL.to_string()
}
fn value(&self) -> Result<Vec<u8>> {
self.write_to_bytes().context("get start value")
}
}
impl Event for TaskCreate {
fn r#type(&self) -> String {
TASK_CREATE_EVENT_TOPIC.to_string()
}
fn type_url(&self) -> String {
TASK_CREATE_EVENT_URL.to_string()
}
fn value(&self) -> Result<Vec<u8>> {
self.write_to_bytes().context("get create value")
}
}
impl Event for TaskDelete {
fn r#type(&self) -> String {
TASK_DELETE_EVENT_TOPIC.to_string()
}
fn type_url(&self) -> String {
TASK_DELETE_EVENT_URL.to_string()
}
fn value(&self) -> Result<Vec<u8>> {
self.write_to_bytes().context("get delete value")
}
}
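
Reviewer note: each `Event` impl above supplies a topic, a type URL, and a serialized payload. The sketch below shows how a consumer might combine the three through a trait object. To stay self-contained it redefines the trait with a `String` error instead of `anyhow`, and `FakeTaskStart` with its placeholder encoding is a hypothetical type, not the protobuf-generated `TaskStart`.

```rust
/// Simplified copy of the trait shape shown in the diff (error type swapped
/// for `String` so the sketch needs no external crates).
pub trait Event: std::fmt::Debug + Send {
    fn r#type(&self) -> String;
    fn type_url(&self) -> String;
    fn value(&self) -> Result<Vec<u8>, String>;
}

/// Hypothetical event type used only for this illustration.
#[derive(Debug, Default)]
pub struct FakeTaskStart {
    pub container_id: String,
    pub pid: u32,
}

impl Event for FakeTaskStart {
    fn r#type(&self) -> String {
        "/tasks/start".to_string()
    }

    fn type_url(&self) -> String {
        "containerd.events.TaskStart".to_string()
    }

    fn value(&self) -> Result<Vec<u8>, String> {
        // Placeholder encoding; the real impl serializes with protobuf.
        Ok(format!("{}:{}", self.container_id, self.pid).into_bytes())
    }
}

/// Builds what a forwarder might publish: (topic, type URL, payload).
pub fn envelope(event: &dyn Event) -> Result<(String, String, Vec<u8>), String> {
    Ok((event.r#type(), event.type_url(), event.value()?))
}
```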

View File

@@ -6,14 +6,16 @@
use anyhow::{anyhow, Context, Result};
use common::{
message::Message,
message::{Action, Message},
types::{
ContainerProcess, PlatformInfo, SandboxConfig, SandboxRequest, SandboxResponse,
SandboxStatusInfo, StartSandboxInfo, TaskRequest, TaskResponse, DEFAULT_SHM_SIZE,
ContainerProcess, PlatformInfo, ProcessType, SandboxConfig, SandboxRequest,
SandboxResponse, SandboxStatusInfo, StartSandboxInfo, TaskRequest, TaskResponse,
DEFAULT_SHM_SIZE,
},
RuntimeHandler, RuntimeInstance, Sandbox, SandboxNetworkEnv,
};
use containerd_shim_protos::events::task::{TaskCreate, TaskDelete, TaskStart};
use hypervisor::{
utils::{create_dir_all_with_inherit_owner, create_vmm_user, remove_vmm_user},
Param,
@@ -33,13 +35,13 @@ use netns_rs::{Env, NetNs};
use nix::{sys::statfs, unistd::User};
use oci_spec::runtime as oci;
use persist::sandbox_persist::Persist;
use protobuf::Message as ProtobufMessage;
use resource::{
cpu_mem::initial_size::InitialSizeManager,
network::{dan_config_path, generate_netns_name},
};
use runtime_spec as spec;
use shim_interface::shim_mgmt::ERR_NO_SHIM_SERVER;
use protobuf::Message as ProtobufMessage;
use std::{
collections::HashMap,
env,
@@ -480,6 +482,7 @@ impl RuntimeHandlerManager {
.await
.context("start sandbox in task handler")?;
let bundle = container_config.bundle.clone();
let container_id = container_config.container_id.clone();
let shim_pid = instance
.container_manager
@@ -501,6 +504,19 @@ impl RuntimeHandlerManager {
}
});
let msg_sender = self.inner.read().await.msg_sender.clone();
let event = TaskCreate {
container_id,
bundle,
pid,
..Default::default()
};
let msg = Message::new(Action::Event(Arc::new(event)));
msg_sender
.send(msg)
.await
.context("send task create event")?;
Ok(TaskResponse::CreateContainer(shim_pid))
} else {
self.handler_task_request(req)
@@ -570,6 +586,7 @@ impl RuntimeHandlerManager {
.context("get runtime instance")?;
let sandbox = instance.sandbox.clone();
let cm = instance.container_manager.clone();
let msg_sender = self.inner.read().await.msg_sender.clone();
match req {
TaskRequest::CreateContainer(req) => Err(anyhow!("Unreachable TaskRequest {:?}", req)),
@@ -579,6 +596,20 @@ impl RuntimeHandlerManager {
}
TaskRequest::DeleteProcess(process_id) => {
let resp = cm.delete_process(&process_id).await.context("do delete")?;
if process_id.process_type == ProcessType::Container {
let event = TaskDelete {
id: process_id.container_id().to_string(),
pid: resp.pid.pid,
exit_status: resp.exit_status as u32,
..Default::default()
};
let msg = Message::new(Action::Event(Arc::new(event)));
msg_sender
.send(msg)
.await
.context("send task delete event")?;
}
Ok(TaskResponse::DeleteProcess(resp))
}
TaskRequest::ExecProcess(req) => {
@@ -614,12 +645,28 @@ impl RuntimeHandlerManager {
.context("start process")?;
let pid = shim_pid.pid;
let process_type = process_id.process_type;
let container_id = process_id.container_id().to_string();
tokio::spawn(async move {
let result = sandbox.wait_process(cm, process_id, pid).await;
if let Err(e) = result {
error!(sl!(), "sandbox wait process error: {:?}", e);
}
});
if process_type == ProcessType::Container {
let event = TaskStart {
container_id,
pid,
..Default::default()
};
let msg = Message::new(Action::Event(Arc::new(event)));
msg_sender
.send(msg)
.await
.context("send task start event")?;
}
Ok(TaskResponse::StartProcess(shim_pid))
}

View File

@@ -65,8 +65,6 @@ INITRDCONFIDENTIALNAME = $(PROJECT_TAG)-initrd-confidential.img
IMAGENAME_NV = $(PROJECT_TAG)-nvidia-gpu.img
IMAGENAME_CONFIDENTIAL_NV = $(PROJECT_TAG)-nvidia-gpu-confidential.img
INITRDNAME_NV = $(PROJECT_TAG)-initrd-nvidia-gpu.img
INITRDNAME_CONFIDENTIAL_NV = $(PROJECT_TAG)-initrd-nvidia-gpu-confidential.img
TARGET = $(BIN_PREFIX)-runtime
RUNTIME_OUTPUT = $(CURDIR)/$(TARGET)
@@ -136,8 +134,6 @@ INITRDCONFIDENTIALPATH := $(PKGDATADIR)/$(INITRDCONFIDENTIALNAME)
IMAGEPATH_NV := $(PKGDATADIR)/$(IMAGENAME_NV)
IMAGEPATH_CONFIDENTIAL_NV := $(PKGDATADIR)/$(IMAGENAME_CONFIDENTIAL_NV)
INITRDPATH_NV := $(PKGDATADIR)/$(INITRDNAME_NV)
INITRDPATH_CONFIDENTIAL_NV := $(PKGDATADIR)/$(INITRDNAME_CONFIDENTIAL_NV)
ROOTFSTYPE_EXT4 := \"ext4\"
ROOTFSTYPE_XFS := \"xfs\"
@@ -147,10 +143,14 @@ DEFROOTFSTYPE := $(ROOTFSTYPE_EXT4)
FIRMWAREPATH :=
FIRMWAREVOLUMEPATH :=
FIRMWAREPATH_NV = $(FIRMWAREPATH)
FIRMWARETDVFPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd
FIRMWARETDVFPATH_NV := $(FIRMWARETDVFPATH)
FIRMWARETDVFVOLUMEPATH :=
FIRMWARESNPPATH := $(PREFIXDEPS)/share/ovmf/AMDSEV.fd
FIRMWARESNPPATH_NV := $(FIRMWARESNPPATH)
KERNELVERITYPARAMS ?= ""
KERNELVERITYPARAMS_NV ?= ""
@@ -221,6 +221,8 @@ DEFENABLEANNOTATIONS := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_pa
DEFENABLEANNOTATIONS_COCO := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_params\", \"kernel_verity_params\", \"default_vcpus\", \"default_memory\", \"cc_init_data\"]
DEFDISABLEGUESTSECCOMP := true
DEFDISABLEGUESTEMPTYDIR := false
DEFEMPTYDIRMODE := shared-fs
DEFEMPTYDIRMODE_COCO := block-encrypted
#Default experimental features enabled
DEFAULTEXPFEATURES := []
@@ -301,9 +303,11 @@ DEFDANCONF := /run/kata-containers/dans
DEFFORCEGUESTPULL := false
DEFKUBELETROOTDIR := /var/lib/kubelet
# Device cold plug
DEFPODRESOURCEAPISOCK := ""
DEFPODRESOURCEAPISOCK_NV := "/var/lib/kubelet/pod-resources/kubelet.sock"
DEFPODRESOURCEAPISOCK_NV := "$(DEFKUBELETROOTDIR)/pod-resources/kubelet.sock"
SED = sed
@@ -468,23 +472,22 @@ ifneq (,$(QEMUCMD))
KERNELSEPATH = $(KERNELDIR)/$(KERNELSENAME)
# NVIDIA GPU specific options (all should be suffixed by _NV)
# Normal: uncompressed (KERNELTYPE). Confidential: compressed (KERNELCONFIDENTIALTYPE).
KERNELNAME_NV = $(call MAKE_KERNEL_NAME_NV,$(KERNELTYPE))
KERNELTYPE_NV = compressed
KERNELNAME_NV = $(call MAKE_KERNEL_NAME_NV,$(KERNELTYPE_NV))
KERNELPATH_NV = $(KERNELDIR)/$(KERNELNAME_NV)
KERNELNAME_CONFIDENTIAL_NV = $(call MAKE_KERNEL_NAME_NV,$(KERNELCONFIDENTIALTYPE))
KERNELPATH_CONFIDENTIAL_NV = $(KERNELDIR)/$(KERNELNAME_CONFIDENTIAL_NV)
DEFAULTVCPUS_NV = 1
DEFAULTMEMORY_NV = 2048
DEFAULTMEMORY_NV = 8192
DEFAULTTIMEOUT_NV = 1200
DEFAULTVFIOPORT_NV = root-port
DEFAULTPCIEROOTPORT_NV = 8
# Disable the devtmpfs mount in guest. NVRC does this, and later kata-agent
# attempts this as well in a non-failing manner. Otherwise, NVRC fails when
# using an image and /dev is already mounted.
KERNELPARAMS_NV = "cgroup_no_v1=all"
KERNELPARAMS_NV += "devtmpfs.mount=0"
KERNELPARAMS_NV += "pci=realloc"
KERNELPARAMS_NV += "pci=nocrs"
KERNELPARAMS_NV += "pci=assign-busses"
# Setting this to false can lead to cgroup leakages in the host
# Best practice for production is to set this to true
@@ -649,10 +652,6 @@ USER_VARS += IMAGENAME_NV
USER_VARS += IMAGENAME_CONFIDENTIAL_NV
USER_VARS += IMAGEPATH_NV
USER_VARS += IMAGEPATH_CONFIDENTIAL_NV
USER_VARS += INITRDNAME_NV
USER_VARS += INITRDNAME_CONFIDENTIAL_NV
USER_VARS += INITRDPATH_NV
USER_VARS += INITRDPATH_CONFIDENTIAL_NV
USER_VARS += KERNELNAME_NV
USER_VARS += KERNELPATH_NV
USER_VARS += KERNELNAME_CONFIDENTIAL_NV
@@ -681,10 +680,13 @@ USER_VARS += KERNELPATH_FC
USER_VARS += KERNELPATH_STRATOVIRT
USER_VARS += KERNELVIRTIOFSPATH
USER_VARS += FIRMWAREPATH
USER_VARS += FIRMWAREPATH_NV
USER_VARS += FIRMWARETDVFPATH
USER_VARS += FIRMWAREVOLUMEPATH
USER_VARS += FIRMWARETDVFVOLUMEPATH
USER_VARS += FIRMWARESNPPATH
USER_VARS += FIRMWARETDVFPATH_NV
USER_VARS += FIRMWARESNPPATH_NV
USER_VARS += MACHINEACCELERATORS
USER_VARS += CPUFEATURES
USER_VARS += TDXCPUFEATURES
@@ -738,6 +740,8 @@ USER_VARS += DEFNETWORKMODEL_FC
USER_VARS += DEFNETWORKMODEL_QEMU
USER_VARS += DEFNETWORKMODEL_STRATOVIRT
USER_VARS += DEFDISABLEGUESTEMPTYDIR
USER_VARS += DEFEMPTYDIRMODE
USER_VARS += DEFEMPTYDIRMODE_COCO
USER_VARS += DEFDISABLEGUESTSECCOMP
USER_VARS += DEFDISABLESELINUX
USER_VARS += DEFDISABLEGUESTSELINUX
@@ -785,6 +789,7 @@ USER_VARS += DEFSTATICRESOURCEMGMT_NV
USER_VARS += DEFBINDMOUNTS
USER_VARS += DEFCREATECONTAINERTIMEOUT
USER_VARS += DEFDANCONF
USER_VARS += DEFKUBELETROOTDIR
USER_VARS += DEFFORCEGUESTPULL
USER_VARS += DEFVFIOMODE
USER_VARS += DEFVFIOMODE_SE

View File

@@ -463,6 +463,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -491,6 +503,11 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -354,6 +354,18 @@ static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_FC@
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -382,6 +394,11 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -638,6 +638,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -670,6 +682,12 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -701,6 +701,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE_COCO@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -734,6 +746,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -99,7 +99,7 @@ kernel_verity_params = "@KERNELVERITYPARAMS_CONFIDENTIAL_NV@"
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = "@FIRMWARESNPPATH@"
firmware = "@FIRMWARESNPPATH_NV@"
# Path to the firmware volume.
# firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables
@@ -599,7 +599,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the
@@ -717,6 +717,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -750,6 +762,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -76,7 +76,7 @@ kernel_verity_params = "@KERNELVERITYPARAMS_CONFIDENTIAL_NV@"
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = "@FIRMWARETDVFPATH@"
firmware = "@FIRMWARETDVFPATH_NV@"
# Path to the firmware volume.
# firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables
@@ -576,7 +576,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the
@@ -694,6 +694,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -727,6 +739,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -58,7 +58,7 @@ kernel_verity_params = "@KERNELVERITYPARAMS_NV@"
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
firmware = "@FIRMWAREPATH@"
firmware = "@FIRMWAREPATH_NV@"
# Path to the firmware volume.
# firmware TDVF or OVMF can be split into FIRMWARE_VARS.fd (UEFI variables
@@ -578,7 +578,7 @@ debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
dial_timeout = 90
dial_timeout = @DEFAULTTIMEOUT_NV@
[runtime]
# If enabled, the runtime will log additional debug messages to the
@@ -696,6 +696,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -724,6 +736,11 @@ create_container_timeout = @DEFAULTTIMEOUT_NV@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -679,6 +679,18 @@ vfio_mode = "@DEFVFIOMODE_SE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -712,6 +724,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -704,6 +704,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE_COCO@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -737,6 +749,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -686,6 +686,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE_COCO@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -719,6 +731,11 @@ dan_conf = "@DEFDANCONF@"
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -695,6 +695,18 @@ vfio_mode = "@DEFVFIOMODE@"
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -723,6 +735,11 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -262,6 +262,18 @@ vfio_mode = "@DEFVFIOMODE@"
# Note: remote hypervisor has no sharing of emptydir mounts from host to guest
disable_guest_empty_dir = false
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -290,6 +302,11 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -397,6 +397,18 @@ static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_STRATOVIRT@
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Specifies how Kubernetes emptyDir volumes are handled.
# Options:
#
# - shared-fs (default)
# Shares the emptyDir folder with the guest using the method given
# by the `shared_fs` setting.
#
# - block-encrypted
# Plugs a block device to be encrypted in the guest.
#
emptydir_mode = "@DEFEMPTYDIRMODE@"
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
@@ -425,6 +437,11 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# kubelet_root_dir is the kubelet root directory used to match ConfigMap/Secret
# volume paths for propagation. Override for distros that use a different path
# (e.g. k0s: /var/lib/k0s/kubelet).
kubelet_root_dir = "@DEFKUBELETROOTDIR@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the

View File

@@ -1,7 +1,7 @@
module github.com/kata-containers/kata-containers/src/runtime
// Keep in sync with version in versions.yaml
go 1.24.13
go 1.25.8
// WARNING: Do NOT use `replace` directives as those break dependabot:
// https://github.com/kata-containers/kata-containers/issues/11020
@@ -52,11 +52,10 @@ require (
github.com/urfave/cli v1.22.17
github.com/vishvananda/netlink v1.3.1
github.com/vishvananda/netns v0.0.5
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20220601114329-47893b162965
go.opentelemetry.io/otel v1.35.0
go.opentelemetry.io/otel v1.40.0
go.opentelemetry.io/otel/exporters/jaeger v1.0.0
go.opentelemetry.io/otel/sdk v1.35.0
go.opentelemetry.io/otel/trace v1.35.0
go.opentelemetry.io/otel/sdk v1.40.0
go.opentelemetry.io/otel/trace v1.40.0
golang.org/x/oauth2 v0.30.0
golang.org/x/sys v0.40.0
google.golang.org/grpc v1.72.0
@@ -127,9 +126,9 @@ require (
github.com/x448/float16 v0.8.4 // indirect
go.mongodb.org/mongo-driver v1.14.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/auto/sdk v1.2.1 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.59.0 // indirect
go.opentelemetry.io/otel/metric v1.35.0 // indirect
go.opentelemetry.io/otel/metric v1.40.0 // indirect
golang.org/x/exp v0.0.0-20241108190413-2d47ceb2692f // indirect
golang.org/x/mod v0.31.0 // indirect
golang.org/x/net v0.49.0 // indirect

View File

@@ -266,8 +266,8 @@ github.com/prometheus/common v0.62.0 h1:xasJaQlnWAeyHdUBeGjXmutelfJHWMRr+Fg4QszZ
github.com/prometheus/common v0.62.0/go.mod h1:vyBcEuLSvWos9B1+CyL7JZ2up+uFzXhkqml0W5zIY1I=
github.com/prometheus/procfs v0.15.1 h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc=
github.com/prometheus/procfs v0.15.1/go.mod h1:fB45yRUv8NstnjriLhBQLuOUt+WW4BsoGhij/e3PBqk=
github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII=
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/safchain/ethtool v0.6.2 h1:O3ZPFAKEUEfbtE6J/feEe2Ft7dIJ2Sy8t4SdMRiIMHY=
@@ -309,31 +309,29 @@ github.com/xeipuuv/gojsonschema v1.2.0 h1:LhYJRs+L4fBtjZUfuSZIKGeVu0QRy8e5Xi7D17
github.com/xeipuuv/gojsonschema v1.2.0/go.mod h1:anYRn/JVcOK2ZgGU+IjEV4nwlhoK5sQluxsYJ78Id3Y=
github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20220601114329-47893b162965 h1:EXE1ZsUqiUWGV5Dw2oTYpXx24ffxj0//yhTB0Ppv+4s=
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20220601114329-47893b162965/go.mod h1:TBB3sR7/jg4RCThC/cgT4fB8mAbbMO307TycfgeR59w=
go.mongodb.org/mongo-driver v1.14.0 h1:P98w8egYRjYe3XDjxhYJagTokP/H6HzlsnojRgZRd80=
go.mongodb.org/mongo-driver v1.14.0/go.mod h1:Vzb0Mk/pa7e6cWw85R4F/endUC3u0U9jGcNU603k65c=
go.opencensus.io v0.24.0 h1:y73uSU6J157QMP2kn2r30vwW1A2W2WFwSCGnAVxeaD0=
go.opencensus.io v0.24.0/go.mod h1:vNK8G9p7aAivkbmorf4v+7Hgx+Zs0yY+0fOtgBfjQKo=
go.opentelemetry.io/auto/sdk v1.1.0 h1:cH53jehLUN6UFLY71z+NDOiNJqDdPRaXzTel0sJySYA=
go.opentelemetry.io/auto/sdk v1.1.0/go.mod h1:3wSPjt5PWp2RhlCcmmOial7AvC4DQqZb7a7wCow3W8A=
go.opentelemetry.io/auto/sdk v1.2.1 h1:jXsnJ4Lmnqd11kwkBV2LgLoFMZKizbCi5fNZ/ipaZ64=
go.opentelemetry.io/auto/sdk v1.2.1/go.mod h1:KRTj+aOaElaLi+wW1kO/DZRXwkF4C5xPbEe3ZiIhN7Y=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.59.0 h1:CV7UdSGJt/Ao6Gp4CXckLxVRRsRgDHoI8XjbL3PDl8s=
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.59.0/go.mod h1:FRmFuRJfag1IZ2dPkHnEoSFVgTVPUd2qf5Vi69hLb8I=
go.opentelemetry.io/otel v1.0.0/go.mod h1:AjRVh9A5/5DE7S+mZtTR6t8vpKKryam+0lREnfmS4cg=
go.opentelemetry.io/otel v1.35.0 h1:xKWKPxrxB6OtMCbmMY021CqC45J+3Onta9MqjhnusiQ=
go.opentelemetry.io/otel v1.35.0/go.mod h1:UEqy8Zp11hpkUrL73gSlELM0DupHoiq72dR+Zqel/+Y=
go.opentelemetry.io/otel v1.40.0 h1:oA5YeOcpRTXq6NN7frwmwFR0Cn3RhTVZvXsP4duvCms=
go.opentelemetry.io/otel v1.40.0/go.mod h1:IMb+uXZUKkMXdPddhwAHm6UfOwJyh4ct1ybIlV14J0g=
go.opentelemetry.io/otel/exporters/jaeger v1.0.0 h1:cLhx8llHw02h5JTqGqaRbYn+QVKHmrzD9vEbKnSPk5U=
go.opentelemetry.io/otel/exporters/jaeger v1.0.0/go.mod h1:q10N1AolE1JjqKrFJK2tYw0iZpmX+HBaXBtuCzRnBGQ=
go.opentelemetry.io/otel/metric v1.35.0 h1:0znxYu2SNyuMSQT4Y9WDWej0VpcsxkuklLa4/siN90M=
go.opentelemetry.io/otel/metric v1.35.0/go.mod h1:nKVFgxBZ2fReX6IlyW28MgZojkoAkJGaE8CpgeAU3oE=
go.opentelemetry.io/otel/metric v1.40.0 h1:rcZe317KPftE2rstWIBitCdVp89A2HqjkxR3c11+p9g=
go.opentelemetry.io/otel/metric v1.40.0/go.mod h1:ib/crwQH7N3r5kfiBZQbwrTge743UDc7DTFVZrrXnqc=
go.opentelemetry.io/otel/sdk v1.0.0/go.mod h1:PCrDHlSy5x1kjezSdL37PhbFUMjrsLRshJ2zCzeXwbM=
go.opentelemetry.io/otel/sdk v1.35.0 h1:iPctf8iprVySXSKJffSS79eOjl9pvxV9ZqOWT0QejKY=
go.opentelemetry.io/otel/sdk v1.35.0/go.mod h1:+ga1bZliga3DxJ3CQGg3updiaAJoNECOgJREo9KHGQg=
go.opentelemetry.io/otel/sdk/metric v1.34.0 h1:5CeK9ujjbFVL5c1PhLuStg1wxA7vQv7ce1EK0Gyvahk=
go.opentelemetry.io/otel/sdk/metric v1.34.0/go.mod h1:jQ/r8Ze28zRKoNRdkjCZxfs6YvBTG1+YIqyFVFYec5w=
go.opentelemetry.io/otel/sdk v1.40.0 h1:KHW/jUzgo6wsPh9At46+h4upjtccTmuZCFAc9OJ71f8=
go.opentelemetry.io/otel/sdk v1.40.0/go.mod h1:Ph7EFdYvxq72Y8Li9q8KebuYUr2KoeyHx0DRMKrYBUE=
go.opentelemetry.io/otel/sdk/metric v1.40.0 h1:mtmdVqgQkeRxHgRv4qhyJduP3fYJRMX4AtAlbuWdCYw=
go.opentelemetry.io/otel/sdk/metric v1.40.0/go.mod h1:4Z2bGMf0KSK3uRjlczMOeMhKU2rhUqdWNoKcYrtcBPg=
go.opentelemetry.io/otel/trace v1.0.0/go.mod h1:PXTWqayeFUlJV1YDNhsJYB184+IvAH814St6o6ajzIs=
go.opentelemetry.io/otel/trace v1.35.0 h1:dPpEfJu1sDIqruz7BHFG3c7528f6ddfSWfFDVt/xgMs=
go.opentelemetry.io/otel/trace v1.35.0/go.mod h1:WUk7DtFp1Aw2MkvqGdwiXYDZZNvA/1J8o6xRXLrIkyc=
go.opentelemetry.io/otel/trace v1.40.0 h1:WA4etStDttCSYuhwvEa8OP8I5EWu24lkOzp+ZYblVjw=
go.opentelemetry.io/otel/trace v1.40.0/go.mod h1:zeAhriXecNGP/s2SEG3+Y8X9ujcJOTqQ5RgdEJcawiA=
go.uber.org/automaxprocs v1.6.0 h1:O3y2/QNTOdbF+e/dpXNNW7Rx2hZ4sTIPyybbxyNqTUs=
go.uber.org/automaxprocs v1.6.0/go.mod h1:ifeIMSnPZuznNm6jmdzmU3/bfk01Fe2fotchwEFJ8r8=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=

View File

@@ -72,7 +72,7 @@ func IsPCIeDevice(bdf string) bool {
}
// read from /sys/bus/pci/devices/xxx/property
func getPCIDeviceProperty(bdf string, property PCISysFsProperty) string {
func GetPCIDeviceProperty(bdf string, property PCISysFsProperty) string {
if len(strings.Split(bdf, ":")) == 2 {
bdf = PCIDomain + ":" + bdf
}
@@ -220,9 +220,9 @@ func GetDeviceFromVFIODev(device config.DeviceInfo) ([]*config.VFIODev, error) {
return nil, err
}
vendorID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
pciClass := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
vendorID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
pciClass := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
i, err := extractIndex(device.HostPath)
if err != nil {
@@ -276,7 +276,7 @@ func GetAllVFIODevicesFromIOMMUGroup(device config.DeviceInfo) ([]*config.VFIODe
switch vfioDeviceType {
case config.VFIOPCIDeviceNormalType, config.VFIOPCIDeviceMediatedType:
// This is vfio-pci and vfio-mdev specific
pciClass := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
pciClass := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesClass)
// We need to ignore Host or PCI Bridges that are in the same IOMMU group as the
// passed-through devices. One CANNOT pass-through a PCI bridge or Host bridge.
// Class 0x0604 is PCI bridge, 0x0600 is Host bridge
@@ -288,8 +288,8 @@ func GetAllVFIODevicesFromIOMMUGroup(device config.DeviceInfo) ([]*config.VFIODe
continue
}
// Fetch the PCI Vendor ID and Device ID
vendorID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := getPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
vendorID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesVendor)
deviceID := GetPCIDeviceProperty(deviceBDF, PCISysFsDevicesDevice)
// Do not directly assign to `vfio` -- need to access field still
vfio = config.VFIODev{

View File

@@ -17,6 +17,7 @@ import (
const (
mountInfoFileName = "mountInfo.json"
EncryptionKeyMetadataKey = "encryptionKey"
FSGroupMetadataKey = "fsGroup"
FSGroupChangePolicyMetadataKey = "fsGroupChangePolicy"
)
@@ -77,6 +78,14 @@ func Add(volumePath string, mountInfo string) error {
return os.WriteFile(filepath.Join(volumeDir, mountInfoFileName), []byte(mountInfo), 0600)
}
func AddMountInfo(volumePath string, mountInfo MountInfo) error {
s, err := json.Marshal(&mountInfo)
if err != nil {
return err
}
return Add(volumePath, string(s))
}
// Remove deletes the direct volume path including all the files inside it.
func Remove(volumePath string) error {
return os.RemoveAll(filepath.Join(kataDirectVolumeRootPath, b64.URLEncoding.EncodeToString([]byte(volumePath))))
@@ -99,7 +108,18 @@ func VolumeMountInfo(volumePath string) (*MountInfo, error) {
return &mountInfo, nil
}
// RecordSandboxID associates a sandbox id with a direct volume.
// IsVolumeMounted returns whether the direct volume mount is present.
func IsVolumeMounted(volumePath string) (bool, error) {
if _, err := VolumeMountInfo(volumePath); err != nil {
if os.IsNotExist(err) {
return false, nil
}
return false, err
}
return true, nil
}
// RecordSandboxID associates a sandbox id with a direct volume.
func RecordSandboxID(sandboxID string, volumePath string) error {
encodedPath := b64.URLEncoding.EncodeToString([]byte(volumePath))
mountInfoFilePath := filepath.Join(kataDirectVolumeRootPath, encodedPath, mountInfoFileName)

View File

@@ -197,10 +197,28 @@ type runtime struct {
StaticSandboxResourceMgmt bool `toml:"static_sandbox_resource_mgmt"`
EnablePprof bool `toml:"enable_pprof"`
DisableGuestEmptyDir bool `toml:"disable_guest_empty_dir"`
EmptyDirMode string `toml:"emptydir_mode"`
CreateContainerTimeout uint64 `toml:"create_container_timeout"`
DanConf string `toml:"dan_conf"`
ForceGuestPull bool `toml:"experimental_force_guest_pull"`
PodResourceAPISock string `toml:"pod_resource_api_sock"`
KubeletRootDir string `toml:"kubelet_root_dir"`
}
// emptyDirMode returns a valid emptydir_mode value, defaulting to shared-fs
// if the TOML field is unset.
func (r runtime) emptyDirMode() (string, error) {
if r.EmptyDirMode == "" {
return vc.EmptyDirModeSharedFs, nil
}
switch r.EmptyDirMode {
case vc.EmptyDirModeSharedFs, vc.EmptyDirModeVirtioBlkEncrypted:
return r.EmptyDirMode, nil
default:
return "", fmt.Errorf("invalid emptydir_mode=%q, allowed values: %q, %q",
r.EmptyDirMode, vc.EmptyDirModeSharedFs, vc.EmptyDirModeVirtioBlkEncrypted)
}
}
type agent struct {
@@ -1389,6 +1407,16 @@ func updateRuntimeConfigAgent(configPath string, tomlConf tomlConfig, config *oc
return nil
}
func updateRuntimeConfigRuntime(configPath string, tomlConf tomlConfig, config *oci.RuntimeConfig) error {
emptyDirMode, err := tomlConf.Runtime.emptyDirMode()
if err != nil {
return fmt.Errorf("%v: %v", configPath, err)
}
config.EmptyDirMode = emptyDirMode
return nil
}
// SetKernelParams adds the user-specified kernel parameters (from the
// configuration file) to the defaults so that the former take priority.
func SetKernelParams(runtimeConfig *oci.RuntimeConfig) error {
@@ -1453,6 +1481,10 @@ func updateRuntimeConfig(configPath string, tomlConf tomlConfig, config *oci.Run
return err
}
if err := updateRuntimeConfigRuntime(configPath, tomlConf, config); err != nil {
return err
}
fConfig, err := newFactoryConfig(tomlConf.Factory)
if err != nil {
return fmt.Errorf("%v: %v", configPath, err)
@@ -1642,6 +1674,7 @@ func LoadConfiguration(configPath string, ignoreLogging bool) (resolvedConfigPat
config.ForceGuestPull = tomlConf.Runtime.ForceGuestPull
config.PodResourceAPISock = tomlConf.Runtime.PodResourceAPISock
config.KubeletRootDir = tomlConf.Runtime.KubeletRootDir
return resolved, config, nil
}
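
The `emptydir_mode` handling in this file defaults an unset value to shared-fs and rejects anything else that is not a known mode. A minimal standalone sketch of that behavior follows. The string values `"shared-fs"` and `"block-encrypted"` are taken from the configuration comments; that they are also the values of `vc.EmptyDirModeSharedFs` and `vc.EmptyDirModeVirtioBlkEncrypted` is an assumption, and `resolveEmptyDirMode` is a hypothetical name.

```go
package main

import "fmt"

// Assumed string values, taken from the emptydir_mode config comments.
const (
	emptyDirModeSharedFs       = "shared-fs"
	emptyDirModeBlockEncrypted = "block-encrypted"
)

// resolveEmptyDirMode returns the effective mode: an empty setting falls back
// to shared-fs, known values pass through, anything else is an error.
func resolveEmptyDirMode(configured string) (string, error) {
	switch configured {
	case "":
		return emptyDirModeSharedFs, nil
	case emptyDirModeSharedFs, emptyDirModeBlockEncrypted:
		return configured, nil
	default:
		return "", fmt.Errorf("invalid emptydir_mode=%q, allowed values: %q, %q",
			configured, emptyDirModeSharedFs, emptyDirModeBlockEncrypted)
	}
}

func main() {
	for _, v := range []string{"", "shared-fs", "block-encrypted", "block_encrypted"} {
		mode, err := resolveEmptyDirMode(v)
		fmt.Printf("configured=%q effective=%q err=%v\n", v, mode, err)
	}
}
```

Note that underscore spellings such as `block_encrypted` are rejected rather than silently normalized, so a typo in the TOML file surfaces at load time.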

View File

@@ -218,6 +218,7 @@ func createAllRuntimeConfigFiles(dir, hypervisor string) (testConfig testRuntime
JaegerPassword: jaegerPassword,
FactoryConfig: factoryConfig,
EmptyDirMode: vc.EmptyDirModeSharedFs,
}
err = SetKernelParams(&runtimeConfig)
@@ -599,6 +600,7 @@ func TestMinimalRuntimeConfig(t *testing.T) {
AgentConfig: expectedAgentConfig,
FactoryConfig: expectedFactoryConfig,
EmptyDirMode: vc.EmptyDirModeSharedFs,
}
err = SetKernelParams(&expectedConfig)
if err != nil {
@@ -1609,6 +1611,39 @@ func TestCheckNetNsConfig(t *testing.T) {
assert.Error(err)
}
func TestCheckEmptyDirMode(t *testing.T) {
assert := assert.New(t)
// Valid values
r := runtime{EmptyDirMode: vc.EmptyDirModeSharedFs}
mode, err := r.emptyDirMode()
assert.NoError(err)
assert.Equal(vc.EmptyDirModeSharedFs, mode)
r = runtime{EmptyDirMode: vc.EmptyDirModeVirtioBlkEncrypted}
mode, err = r.emptyDirMode()
assert.NoError(err)
assert.Equal(vc.EmptyDirModeVirtioBlkEncrypted, mode)
r = runtime{}
mode, err = r.emptyDirMode()
assert.NoError(err)
assert.Equal(vc.EmptyDirModeSharedFs, mode)
// Invalid values
r = runtime{EmptyDirMode: "invalid"}
_, err = r.emptyDirMode()
assert.Error(err)
r = runtime{EmptyDirMode: "shared_fs"}
_, err = r.emptyDirMode()
assert.Error(err)
r = runtime{EmptyDirMode: "block_encrypted"}
_, err = r.emptyDirMode()
assert.Error(err)
}
func TestCheckFactoryConfig(t *testing.T) {
assert := assert.New(t)

View File

@@ -98,12 +98,12 @@ func HandleFactory(ctx context.Context, vci vc.VC, runtimeConfig *oci.RuntimeCon
// For a given pod, the ephemeral volume is created only once,
// backed by tmpfs inside the VM. Successive containers
// of the same pod reuse the existing volume.
func SetEphemeralStorageType(ociSpec specs.Spec, disableGuestEmptyDir bool) specs.Spec {
func SetEphemeralStorageType(ociSpec specs.Spec, disableGuestEmptyDir bool, emptyDirMode string) specs.Spec {
for idx, mnt := range ociSpec.Mounts {
if vc.IsEphemeralStorage(mnt.Source) {
ociSpec.Mounts[idx].Type = vc.KataEphemeralDevType
}
if vc.Isk8sHostEmptyDir(mnt.Source) && !disableGuestEmptyDir {
if vc.Isk8sHostEmptyDir(mnt.Source) && !disableGuestEmptyDir && emptyDirMode != vc.EmptyDirModeVirtioBlkEncrypted {
ociSpec.Mounts[idx].Type = vc.KataLocalDevType
}
}
@@ -243,7 +243,8 @@ func CreateContainer(ctx context.Context, sandbox vc.VCSandbox, ociSpec specs.Sp
// The value of this annotation is sent to the sandbox using init data.
delete(ociSpec.Annotations, vcAnnotations.Initdata)
ociSpec = SetEphemeralStorageType(ociSpec, disableGuestEmptyDir)
emptyDirMode := sandbox.Status().EmptyDirMode
ociSpec = SetEphemeralStorageType(ociSpec, disableGuestEmptyDir, emptyDirMode)
contConfig, err := oci.ContainerConfig(ociSpec, bundlePath, containerID, disableOutput)
if err != nil {

View File

@@ -141,7 +141,7 @@ func TestSetEphemeralStorageType(t *testing.T) {
ociMounts = append(ociMounts, mount)
ociSpec.Mounts = ociMounts
ociSpec = SetEphemeralStorageType(ociSpec, false)
ociSpec = SetEphemeralStorageType(ociSpec, false, vc.EmptyDirModeSharedFs)
mountType := ociSpec.Mounts[0].Type
assert.Equal(mountType, "ephemeral",

View File

@@ -165,6 +165,10 @@ type RuntimeConfig struct {
// Determines if Kata creates emptyDir on the guest
DisableGuestEmptyDir bool
// EmptyDirMode specifies how Kubernetes emptyDir volumes are handled.
// Valid values are "shared-fs" (default) or "block-encrypted".
EmptyDirMode string
// CreateContainer timeout which, if provided, indicates the createcontainer request timeout
// needed for the workload ( Mostly used for pulling images in the guest )
CreateContainerTimeout uint64
@@ -193,6 +197,10 @@ type RuntimeConfig struct {
// ColdPlugVFIO != NoPort AND PodResourceAPISock != "" => kubelet
// based cold plug.
PodResourceAPISock string
// KubeletRootDir is the kubelet root directory used to match ConfigMap/Secret
// volume paths (e.g. /var/lib/k0s/kubelet for k0s). If empty, default is used.
KubeletRootDir string
}
// AddKernelParam allows the addition of new kernel parameters to an existing
@@ -1207,6 +1215,8 @@ func SandboxConfig(ocispec specs.Spec, runtime RuntimeConfig, bundlePath, cid st
DisableGuestSeccomp: runtime.DisableGuestSeccomp,
EmptyDirMode: runtime.EmptyDirMode,
EnableVCPUsPinning: runtime.EnableVCPUsPinning,
GuestSeLinuxLabel: runtime.GuestSeLinuxLabel,
@@ -1216,6 +1226,8 @@ func SandboxConfig(ocispec specs.Spec, runtime RuntimeConfig, bundlePath, cid st
CreateContainerTimeout: runtime.CreateContainerTimeout,
ForceGuestPull: runtime.ForceGuestPull,
KubeletRootDir: runtime.KubeletRootDir,
}
if err := addAnnotations(ocispec, &sandboxConfig, runtime); err != nil {


@@ -186,13 +186,15 @@ func NewResourceController(path string, resources *specs.LinuxResources) (Resour
}, nil
}
-func NewSandboxResourceController(path string, resources *specs.LinuxResources, sandboxCgroupOnly bool) (ResourceController, error) {
+func NewSandboxResourceController(path string, resources *specs.LinuxResources, sandboxCgroupOnly bool, needsHypervisorDevices bool) (ResourceController, error) {
 sandboxResources := *resources
-sandboxDevices, err := sandboxDevices()
-if err != nil {
-return nil, err
+if needsHypervisorDevices {
+sandboxDevs, err := sandboxDevices()
+if err != nil {
+return nil, err
+}
+sandboxResources.Devices = append(sandboxResources.Devices, sandboxDevs...)
 }
-sandboxResources.Devices = append(sandboxResources.Devices, sandboxDevices...)
// Currently we know to handle systemd cgroup path only when it's the only cgroup (no overhead group), hence,
// if sandboxCgroupOnly is not true we treat it as cgroupfs path as it used to be, although it may be incorrect.


@@ -21,7 +21,7 @@ func NewResourceController(path string, resources *specs.LinuxResources) (Resour
return &DarwinResourceController{}, nil
}
-func NewSandboxResourceController(path string, resources *specs.LinuxResources, sandboxCgroupOnly bool) (ResourceController, error) {
+func NewSandboxResourceController(path string, resources *specs.LinuxResources, sandboxCgroupOnly bool, needsHypervisorDevices bool) (ResourceController, error) {
return &DarwinResourceController{}, nil
}
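
The `NewSandboxResourceController` change adds a `needsHypervisorDevices` flag, so the hypervisor device rules are appended only when the caller asks for them. The sketch below shows that gating pattern in isolation. `deviceRule`, `hypervisorDevices`, and `sandboxRules` are simplified stand-ins invented for this example, not the Kata types or functions, and the device paths are placeholders.

```go
package main

import "fmt"

// deviceRule is a simplified, hypothetical stand-in for a cgroup device rule.
type deviceRule struct {
	Path  string
	Allow bool
}

// hypervisorDevices is a hypothetical stand-in for the real device lookup.
// The paths are placeholders chosen for illustration.
func hypervisorDevices() ([]deviceRule, error) {
	return []deviceRule{
		{Path: "/dev/kvm", Allow: true},
		{Path: "/dev/vhost-net", Allow: true},
	}, nil
}

// sandboxRules returns the base rules, plus the hypervisor rules when
// needsHypervisorDevices is true. The base slice is copied first so the
// caller's slice is never modified by append.
func sandboxRules(base []deviceRule, needsHypervisorDevices bool) ([]deviceRule, error) {
	out := append([]deviceRule(nil), base...)
	if needsHypervisorDevices {
		devs, err := hypervisorDevices()
		if err != nil {
			return nil, err
		}
		out = append(out, devs...)
	}
	return out, nil
}

func main() {
	base := []deviceRule{{Path: "/dev/null", Allow: true}}
	for _, needs := range []bool{true, false} {
		rules, err := sandboxRules(base, needs)
		fmt.Printf("needsHypervisorDevices=%v rules=%d err=%v\n", needs, len(rules), err)
	}
}
```

Skipping the lookup entirely when the flag is false also means a lookup failure cannot break callers that never needed those devices.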


@@ -1,202 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


@@ -1,94 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package bytes
import (
"encoding/binary"
"unsafe"
)
// Raw returns just the bytes without any assumptions about layout
type Raw interface {
Raw() *[]byte
}
// Reader used to read various data sizes in the byte array
type Reader interface {
Read8(pos int) uint8
Read16(pos int) uint16
Read32(pos int) uint32
Read64(pos int) uint64
Len() int
}
// Writer used to write various sizes of data in the byte array
type Writer interface {
Write8(pos int, value uint8)
Write16(pos int, value uint16)
Write32(pos int, value uint32)
Write64(pos int, value uint64)
Len() int
}
// Bytes object for manipulating arbitrary byte arrays
type Bytes interface {
Raw
Reader
Writer
Slice(offset int, size int) Bytes
LittleEndian() Bytes
BigEndian() Bytes
}
var nativeByteOrder binary.ByteOrder
func init() {
buf := [2]byte{}
*(*uint16)(unsafe.Pointer(&buf[0])) = uint16(0x00FF)
switch buf {
case [2]byte{0xFF, 0x00}:
nativeByteOrder = binary.LittleEndian
case [2]byte{0x00, 0xFF}:
nativeByteOrder = binary.BigEndian
default:
panic("Unable to infer byte order")
}
}
// New raw bytearray
func New(data *[]byte) Bytes {
return (*native)(data)
}
// NewLittleEndian little endian ordering of bytes
func NewLittleEndian(data *[]byte) Bytes {
if nativeByteOrder == binary.LittleEndian {
return (*native)(data)
}
return (*swapbo)(data)
}
// NewBigEndian big endian ordering of bytes
func NewBigEndian(data *[]byte) Bytes {
if nativeByteOrder == binary.BigEndian {
return (*native)(data)
}
return (*swapbo)(data)
}
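
The removed `init` function infers the host byte order by storing a 16-bit value and inspecting which byte lands first in memory. A standalone sketch of the same probe follows. `detectNativeOrder` and `nativeBytes` are names made up for this example and are not part of the removed package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// detectNativeOrder stores 0x00FF in a uint16 and looks at the byte at the
// lowest address. On a little-endian host that byte is 0xFF.
func detectNativeOrder() binary.ByteOrder {
	var probe uint16 = 0x00FF
	first := *(*byte)(unsafe.Pointer(&probe))
	if first == 0xFF {
		return binary.LittleEndian
	}
	return binary.BigEndian
}

// nativeBytes returns the in-memory representation of v on this host.
func nativeBytes(v uint32) [4]byte {
	return *(*[4]byte)(unsafe.Pointer(&v))
}

func main() {
	order := detectNativeOrder()
	raw := nativeBytes(0x01020304)
	// Decoding the raw memory with the detected order should give back the
	// original value.
	fmt.Printf("order=%v raw=% x decoded=%#x\n", order, raw, order.Uint32(raw[:]))
}
```

Decoding the raw bytes with the detected order is a cheap self-check that the probe and the host agree.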


@@ -1,78 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package bytes
import (
"unsafe"
)
type native []byte
var _ Bytes = (*native)(nil)
func (b *native) Read8(pos int) uint8 {
return (*b)[pos]
}
func (b *native) Read16(pos int) uint16 {
return *(*uint16)(unsafe.Pointer(&((*b)[pos])))
}
func (b *native) Read32(pos int) uint32 {
return *(*uint32)(unsafe.Pointer(&((*b)[pos])))
}
func (b *native) Read64(pos int) uint64 {
return *(*uint64)(unsafe.Pointer(&((*b)[pos])))
}
func (b *native) Write8(pos int, value uint8) {
(*b)[pos] = value
}
func (b *native) Write16(pos int, value uint16) {
*(*uint16)(unsafe.Pointer(&((*b)[pos]))) = value
}
func (b *native) Write32(pos int, value uint32) {
*(*uint32)(unsafe.Pointer(&((*b)[pos]))) = value
}
func (b *native) Write64(pos int, value uint64) {
*(*uint64)(unsafe.Pointer(&((*b)[pos]))) = value
}
func (b *native) Slice(offset int, size int) Bytes {
nb := (*b)[offset : offset+size]
return &nb
}
func (b *native) LittleEndian() Bytes {
return NewLittleEndian((*[]byte)(b))
}
func (b *native) BigEndian() Bytes {
return NewBigEndian((*[]byte)(b))
}
func (b *native) Raw() *[]byte {
return (*[]byte)(b)
}
func (b *native) Len() int {
return len(*b)
}


@@ -1,112 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package bytes
import (
"unsafe"
)
type swapbo []byte
var _ Bytes = (*swapbo)(nil)
func (b *swapbo) Read8(pos int) uint8 {
return (*b)[pos]
}
func (b *swapbo) Read16(pos int) uint16 {
buf := [2]byte{}
buf[0] = (*b)[pos+1]
buf[1] = (*b)[pos+0]
return *(*uint16)(unsafe.Pointer(&buf[0]))
}
func (b *swapbo) Read32(pos int) uint32 {
buf := [4]byte{}
buf[0] = (*b)[pos+3]
buf[1] = (*b)[pos+2]
buf[2] = (*b)[pos+1]
buf[3] = (*b)[pos+0]
return *(*uint32)(unsafe.Pointer(&buf[0]))
}
func (b *swapbo) Read64(pos int) uint64 {
buf := [8]byte{}
buf[0] = (*b)[pos+7]
buf[1] = (*b)[pos+6]
buf[2] = (*b)[pos+5]
buf[3] = (*b)[pos+4]
buf[4] = (*b)[pos+3]
buf[5] = (*b)[pos+2]
buf[6] = (*b)[pos+1]
buf[7] = (*b)[pos+0]
return *(*uint64)(unsafe.Pointer(&buf[0]))
}
func (b *swapbo) Write8(pos int, value uint8) {
(*b)[pos] = value
}
func (b *swapbo) Write16(pos int, value uint16) {
buf := [2]byte{}
*(*uint16)(unsafe.Pointer(&buf[0])) = value
(*b)[pos+0] = buf[1]
(*b)[pos+1] = buf[0]
}
func (b *swapbo) Write32(pos int, value uint32) {
buf := [4]byte{}
*(*uint32)(unsafe.Pointer(&buf[0])) = value
(*b)[pos+0] = buf[3]
(*b)[pos+1] = buf[2]
(*b)[pos+2] = buf[1]
(*b)[pos+3] = buf[0]
}
func (b *swapbo) Write64(pos int, value uint64) {
buf := [8]byte{}
*(*uint64)(unsafe.Pointer(&buf[0])) = value
(*b)[pos+0] = buf[7]
(*b)[pos+1] = buf[6]
(*b)[pos+2] = buf[5]
(*b)[pos+3] = buf[4]
(*b)[pos+4] = buf[3]
(*b)[pos+5] = buf[2]
(*b)[pos+6] = buf[1]
(*b)[pos+7] = buf[0]
}
func (b *swapbo) Slice(offset int, size int) Bytes {
nb := (*b)[offset : offset+size]
return &nb
}
func (b *swapbo) LittleEndian() Bytes {
return NewLittleEndian((*[]byte)(b))
}
func (b *swapbo) BigEndian() Bytes {
return NewBigEndian((*[]byte)(b))
}
func (b *swapbo) Raw() *[]byte {
return (*[]byte)(b)
}
func (b *swapbo) Len() int {
return len(*b)
}


@@ -1,143 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package nvpci
import (
"fmt"
"io/ioutil"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci/bytes"
)
const (
// PCICfgSpaceStandardSize represents the size in bytes of the standard config space
PCICfgSpaceStandardSize = 256
// PCICfgSpaceExtendedSize represents the size in bytes of the extended config space
PCICfgSpaceExtendedSize = 4096
// PCICapabilityListPointer represents offset for the capability list pointer
PCICapabilityListPointer = 0x34
// PCIStatusCapabilityList represents the status register bit which indicates capability list support
PCIStatusCapabilityList = 0x10
// PCIStatusBytePosition represents the position of the status register
PCIStatusBytePosition = 0x06
)
// ConfigSpace PCI configuration space (standard or extended) file path
type ConfigSpace struct {
Path string
}
// ConfigSpaceIO Interface for reading and writing raw and preconfigured values
type ConfigSpaceIO interface {
bytes.Bytes
GetVendorID() uint16
GetDeviceID() uint16
GetPCICapabilities() (*PCICapabilities, error)
}
type configSpaceIO struct {
bytes.Bytes
}
// PCIStandardCapability standard PCI config space
type PCIStandardCapability struct {
bytes.Bytes
}
// PCIExtendedCapability extended PCI config space
type PCIExtendedCapability struct {
bytes.Bytes
Version uint8
}
// PCICapabilities combines the standard and extended config space
type PCICapabilities struct {
Standard map[uint8]*PCIStandardCapability
Extended map[uint16]*PCIExtendedCapability
}
func (cs *ConfigSpace) Read() (ConfigSpaceIO, error) {
config, err := ioutil.ReadFile(cs.Path)
if err != nil {
return nil, fmt.Errorf("failed to open file: %v", err)
}
return &configSpaceIO{bytes.New(&config)}, nil
}
func (cs *configSpaceIO) GetVendorID() uint16 {
return cs.Read16(0)
}
func (cs *configSpaceIO) GetDeviceID() uint16 {
return cs.Read16(2)
}
func (cs *configSpaceIO) GetPCICapabilities() (*PCICapabilities, error) {
caps := &PCICapabilities{
make(map[uint8]*PCIStandardCapability),
make(map[uint16]*PCIExtendedCapability),
}
support := cs.Read8(PCIStatusBytePosition) & PCIStatusCapabilityList
if support == 0 {
return nil, fmt.Errorf("pci device does not support capability list")
}
soffset := cs.Read8(PCICapabilityListPointer)
if int(soffset) >= cs.Len() {
return nil, fmt.Errorf("capability list pointer out of bounds")
}
for soffset != 0 {
if soffset == 0xff {
return nil, fmt.Errorf("config space broken")
}
if int(soffset) >= PCICfgSpaceStandardSize {
return nil, fmt.Errorf("standard capability list pointer out of bounds")
}
data := cs.Read32(int(soffset))
id := uint8(data & 0xff)
caps.Standard[id] = &PCIStandardCapability{
cs.Slice(int(soffset), cs.Len()-int(soffset)),
}
soffset = uint8((data >> 8) & 0xff)
}
if cs.Len() <= PCICfgSpaceStandardSize {
return caps, nil
}
eoffset := uint16(PCICfgSpaceStandardSize)
for eoffset != 0 {
if eoffset == 0xffff {
return nil, fmt.Errorf("config space broken")
}
if int(eoffset) >= PCICfgSpaceExtendedSize {
return nil, fmt.Errorf("extended capability list pointer out of bounds")
}
data := cs.Read32(int(eoffset))
id := uint16(data & 0xffff)
version := uint8((data >> 16) & 0xf)
caps.Extended[id] = &PCIExtendedCapability{
cs.Slice(int(eoffset), cs.Len()-int(eoffset)),
version,
}
eoffset = uint16((data >> 4) & 0xffc)
}
return caps, nil
}
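
`GetPCICapabilities` walks two linked lists: the standard capabilities, reached through the pointer at offset 0x34, and the extended capabilities, which start at offset 256. The sketch below walks the standard list over a plain byte slice and decodes one extended capability header. The names `standardCapabilities` and `decodeExtendedHeader` are hypothetical. The header decode follows the PCIe layout as I understand it (ID in bits 15:0, version in bits 19:16, next offset in bits 31:20), and the hop limit is an extra guard added for this sketch.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	statusOffset     = 0x06 // status register
	statusCapList    = 0x10 // "capability list present" bit
	capPointerOffset = 0x34 // pointer to the first standard capability
	standardSize     = 256  // size of the standard config space
)

// standardCapabilities maps each standard capability ID to its offset.
func standardCapabilities(cfg []byte) (map[uint8]int, error) {
	if len(cfg) < standardSize {
		return nil, errors.New("config space shorter than 256 bytes")
	}
	if cfg[statusOffset]&statusCapList == 0 {
		return nil, errors.New("device does not advertise a capability list")
	}
	caps := make(map[uint8]int)
	offset := int(cfg[capPointerOffset])
	// The hop limit stops a malformed list that points back at itself.
	for hops := 0; offset != 0; hops++ {
		if offset == 0xff || hops >= standardSize {
			return nil, errors.New("config space broken or capability list loops")
		}
		id, next := cfg[offset], cfg[offset+1]
		caps[id] = offset
		offset = int(next)
	}
	return caps, nil
}

// decodeExtendedHeader splits a 32-bit extended capability header.
func decodeExtendedHeader(header uint32) (id uint16, version uint8, next uint16) {
	id = uint16(header & 0xffff)
	version = uint8((header >> 16) & 0xf)
	next = uint16((header >> 20) & 0xffc)
	return id, version, next
}

func main() {
	cfg := make([]byte, standardSize)
	cfg[statusOffset] = statusCapList
	cfg[capPointerOffset] = 0x40
	cfg[0x40], cfg[0x41] = 0x01, 0x50 // capability 0x01, next at 0x50
	cfg[0x50], cfg[0x51] = 0x10, 0x00 // capability 0x10, end of list

	caps, err := standardCapabilities(cfg)
	fmt.Printf("cap 0x01 at %#x, cap 0x10 at %#x, err=%v\n", caps[0x01], caps[0x10], err)

	// Config space is little-endian, so decode the header bytes that way.
	header := binary.LittleEndian.Uint32([]byte{0x01, 0x00, 0x01, 0x14})
	id, version, next := decodeExtendedHeader(header)
	fmt.Printf("extended id=%#x version=%d next=%#x\n", id, version, next)
}
```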


@@ -1,127 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package mmio
import (
"fmt"
"os"
"syscall"
"unsafe"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci/bytes"
)
// Mmio memory map a region
type Mmio interface {
bytes.Raw
bytes.Reader
bytes.Writer
Sync() error
Close() error
Slice(offset int, size int) Mmio
LittleEndian() Mmio
BigEndian() Mmio
}
type mmio struct {
bytes.Bytes
}
func open(path string, offset int, size int, flags int) (Mmio, error) {
var mmapFlags int
switch flags {
case os.O_RDONLY:
mmapFlags = syscall.PROT_READ
case os.O_RDWR:
mmapFlags = syscall.PROT_READ | syscall.PROT_WRITE
default:
return nil, fmt.Errorf("invalid flags: %v", flags)
}
file, err := os.OpenFile(path, flags, 0)
if err != nil {
return nil, fmt.Errorf("failed to open file: %v", err)
}
defer file.Close()
fi, err := file.Stat()
if err != nil {
return nil, fmt.Errorf("failed to get file info: %v", err)
}
if size > int(fi.Size()) {
return nil, fmt.Errorf("requested size larger than file size")
}
if size < 0 {
size = int(fi.Size())
}
mmap, err := syscall.Mmap(
int(file.Fd()),
int64(offset),
size,
mmapFlags,
syscall.MAP_SHARED)
if err != nil {
return nil, fmt.Errorf("failed to mmap file: %v", err)
}
return &mmio{bytes.New(&mmap)}, nil
}
// OpenRO open region readonly
func OpenRO(path string, offset int, size int) (Mmio, error) {
return open(path, offset, size, os.O_RDONLY)
}
// OpenRW open region read write
func OpenRW(path string, offset int, size int) (Mmio, error) {
return open(path, offset, size, os.O_RDWR)
}
func (m *mmio) Slice(offset int, size int) Mmio {
return &mmio{m.Bytes.Slice(offset, size)}
}
func (m *mmio) LittleEndian() Mmio {
return &mmio{m.Bytes.LittleEndian()}
}
func (m *mmio) BigEndian() Mmio {
return &mmio{m.Bytes.BigEndian()}
}
func (m *mmio) Close() error {
err := syscall.Munmap(*m.Bytes.Raw())
if err != nil {
return fmt.Errorf("failed to munmap file: %v", err)
}
return nil
}
func (m *mmio) Sync() error {
_, _, errno := syscall.Syscall(
syscall.SYS_MSYNC,
uintptr(unsafe.Pointer(&(*m.Bytes.Raw())[0])),
uintptr(m.Len()),
uintptr(syscall.MS_SYNC|syscall.MS_INVALIDATE))
if errno != 0 {
return fmt.Errorf("failed to msync file: %v", errno)
}
return nil
}


@@ -1,74 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package mmio
import (
"fmt"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci/bytes"
)
type mockMmio struct {
mmio
source *[]byte
offset int
rw bool
}
func mockOpen(source *[]byte, offset int, size int, rw bool) (Mmio, error) {
if size < 0 {
size = len(*source) - offset
}
if (offset + size) > len(*source) {
return nil, fmt.Errorf("offset+size out of range")
}
data := append([]byte{}, (*source)[offset:offset+size]...)
m := &mockMmio{}
m.Bytes = bytes.New(&data).LittleEndian()
m.source = source
m.offset = offset
m.rw = rw
return m, nil
}
// MockOpenRO open read only
func MockOpenRO(source *[]byte, offset int, size int) (Mmio, error) {
return mockOpen(source, offset, size, false)
}
// MockOpenRW open read write
func MockOpenRW(source *[]byte, offset int, size int) (Mmio, error) {
return mockOpen(source, offset, size, true)
}
func (m *mockMmio) Close() error {
m = &mockMmio{}
return nil
}
func (m *mockMmio) Sync() error {
if !m.rw {
return fmt.Errorf("opened read-only")
}
for i := range *m.Bytes.Raw() {
(*m.source)[m.offset+i] = (*m.Bytes.Raw())[i]
}
return nil
}


@@ -1,141 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package nvpci
import (
"fmt"
"io/ioutil"
"os"
"path/filepath"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci/bytes"
)
// MockNvpci mock pci device
type MockNvpci struct {
*nvpci
}
var _ Interface = (*MockNvpci)(nil)
// NewMockNvpci create new mock PCI and remove old devices
func NewMockNvpci() (mock *MockNvpci, rerr error) {
rootDir, err := ioutil.TempDir("", "")
if err != nil {
return nil, err
}
defer func() {
if rerr != nil {
os.RemoveAll(rootDir)
}
}()
mock = &MockNvpci{
NewFrom(rootDir).(*nvpci),
}
return mock, nil
}
// Cleanup remove the mocked PCI devices root folder
func (m *MockNvpci) Cleanup() {
os.RemoveAll(m.pciDevicesRoot)
}
// AddMockA100 Create an A100 like GPU mock device
func (m *MockNvpci) AddMockA100(address string, numaNode int) error {
deviceDir := filepath.Join(m.pciDevicesRoot, address)
err := os.MkdirAll(deviceDir, 0755)
if err != nil {
return err
}
vendor, err := os.Create(filepath.Join(deviceDir, "vendor"))
if err != nil {
return err
}
_, err = vendor.WriteString(fmt.Sprintf("0x%x", PCINvidiaVendorID))
if err != nil {
return err
}
class, err := os.Create(filepath.Join(deviceDir, "class"))
if err != nil {
return err
}
_, err = class.WriteString(fmt.Sprintf("0x%x", PCI3dControllerClass))
if err != nil {
return err
}
device, err := os.Create(filepath.Join(deviceDir, "device"))
if err != nil {
return err
}
_, err = device.WriteString("0x20bf")
if err != nil {
return err
}
numa, err := os.Create(filepath.Join(deviceDir, "numa_node"))
if err != nil {
return err
}
_, err = numa.WriteString(fmt.Sprintf("%v", numaNode))
if err != nil {
return err
}
config, err := os.Create(filepath.Join(deviceDir, "config"))
if err != nil {
return err
}
_data := make([]byte, PCICfgSpaceStandardSize)
data := bytes.New(&_data)
data.Write16(0, PCINvidiaVendorID)
data.Write16(2, uint16(0x20bf))
data.Write8(PCIStatusBytePosition, PCIStatusCapabilityList)
_, err = config.Write(*data.Raw())
if err != nil {
return err
}
bar0 := []uint64{0x00000000c2000000, 0x00000000c2ffffff, 0x0000000000040200}
resource, err := os.Create(filepath.Join(deviceDir, "resource"))
if err != nil {
return err
}
_, err = resource.WriteString(fmt.Sprintf("0x%x 0x%x 0x%x", bar0[0], bar0[1], bar0[2]))
if err != nil {
return err
}
pmcID := uint32(0x170000a1)
resource0, err := os.Create(filepath.Join(deviceDir, "resource0"))
if err != nil {
return err
}
_data = make([]byte, bar0[1]-bar0[0]+1)
data = bytes.New(&_data).LittleEndian()
data.Write32(0, pmcID)
_, err = resource0.Write(*data.Raw())
if err != nil {
return err
}
return nil
}


@@ -1,316 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package nvpci
import (
"fmt"
"io/ioutil"
"os"
"path"
"sort"
"strconv"
"strings"
)
const (
// PCIDevicesRoot represents base path for all pci devices under sysfs
PCIDevicesRoot = "/sys/bus/pci/devices"
// PCINvidiaVendorID represents PCI vendor id for NVIDIA
PCINvidiaVendorID uint16 = 0x10de
// PCIVgaControllerClass represents the PCI class for VGA Controllers
PCIVgaControllerClass uint32 = 0x030000
// PCI3dControllerClass represents the PCI class for 3D Graphics accelerators
PCI3dControllerClass uint32 = 0x030200
// PCINvSwitchClass represents the PCI class for NVSwitches
PCINvSwitchClass uint32 = 0x068000
)
// Interface allows us to get a list of all NVIDIA PCI devices
type Interface interface {
GetAllDevices() ([]*NvidiaPCIDevice, error)
Get3DControllers() ([]*NvidiaPCIDevice, error)
GetVGAControllers() ([]*NvidiaPCIDevice, error)
GetNVSwitches() ([]*NvidiaPCIDevice, error)
GetGPUs() ([]*NvidiaPCIDevice, error)
}
// MemoryResources a more human readable handle
type MemoryResources map[int]*MemoryResource
// ResourceInterface exposes some higher level functions of resources
type ResourceInterface interface {
GetTotalAddressableMemory(bool) (uint64, uint64)
}
type nvpci struct {
pciDevicesRoot string
}
var _ Interface = (*nvpci)(nil)
var _ ResourceInterface = (*MemoryResources)(nil)
// NvidiaPCIDevice represents a PCI device for an NVIDIA product
type NvidiaPCIDevice struct {
Path string
Address string
Vendor uint16
Class uint32
Device uint16
NumaNode int
Config *ConfigSpace
Resources MemoryResources
}
// IsVGAController if class == 0x300
func (d *NvidiaPCIDevice) IsVGAController() bool {
return d.Class == PCIVgaControllerClass
}
// Is3DController if class == 0x302
func (d *NvidiaPCIDevice) Is3DController() bool {
return d.Class == PCI3dControllerClass
}
// IsNVSwitch reports whether the device class is 0x068000 (NVSwitch)
func (d *NvidiaPCIDevice) IsNVSwitch() bool {
return d.Class == PCINvSwitchClass
}
// IsGPU either VGA for older cards or 3D for newer
func (d *NvidiaPCIDevice) IsGPU() bool {
return d.IsVGAController() || d.Is3DController()
}
// IsResetAvailable some devices can be reset without rebooting,
// check if applicable
func (d *NvidiaPCIDevice) IsResetAvailable() bool {
_, err := os.Stat(path.Join(d.Path, "reset"))
return err == nil
}
// Reset performs a reset to apply a new configuration at HW level
func (d *NvidiaPCIDevice) Reset() error {
err := ioutil.WriteFile(path.Join(d.Path, "reset"), []byte("1"), 0)
if err != nil {
return fmt.Errorf("unable to write to reset file: %v", err)
}
return nil
}
// New interface that allows us to get a list of all NVIDIA PCI devices
func New() Interface {
return &nvpci{PCIDevicesRoot}
}
// NewFrom interface allows us to get a list of all NVIDIA PCI devices at a specific root directory
func NewFrom(root string) Interface {
return &nvpci{root}
}
// GetAllDevices returns all Nvidia PCI devices on the system
func (p *nvpci) GetAllDevices() ([]*NvidiaPCIDevice, error) {
deviceDirs, err := ioutil.ReadDir(p.pciDevicesRoot)
if err != nil {
return nil, fmt.Errorf("unable to read PCI bus devices: %v", err)
}
var nvdevices []*NvidiaPCIDevice
for _, deviceDir := range deviceDirs {
devicePath := path.Join(p.pciDevicesRoot, deviceDir.Name())
nvdevice, err := NewDevice(devicePath)
if err != nil {
return nil, fmt.Errorf("error constructing NVIDIA PCI device %s: %v", deviceDir.Name(), err)
}
if nvdevice == nil {
continue
}
nvdevices = append(nvdevices, nvdevice)
}
addressToID := func(address string) uint64 {
address = strings.ReplaceAll(address, ":", "")
address = strings.ReplaceAll(address, ".", "")
id, _ := strconv.ParseUint(address, 16, 64)
return id
}
sort.Slice(nvdevices, func(i, j int) bool {
return addressToID(nvdevices[i].Address) < addressToID(nvdevices[j].Address)
})
return nvdevices, nil
}
// NewDevice constructs an NvidiaPCIDevice
func NewDevice(devicePath string) (*NvidiaPCIDevice, error) {
address := path.Base(devicePath)
vendor, err := ioutil.ReadFile(path.Join(devicePath, "vendor"))
if err != nil {
return nil, fmt.Errorf("unable to read PCI device vendor id for %s: %v", address, err)
}
vendorStr := strings.TrimSpace(string(vendor))
vendorID, err := strconv.ParseUint(vendorStr, 0, 16)
if err != nil {
return nil, fmt.Errorf("unable to convert vendor string to uint16: %v", vendorStr)
}
if uint16(vendorID) != PCINvidiaVendorID {
return nil, nil
}
class, err := ioutil.ReadFile(path.Join(devicePath, "class"))
if err != nil {
return nil, fmt.Errorf("unable to read PCI device class for %s: %v", address, err)
}
classStr := strings.TrimSpace(string(class))
classID, err := strconv.ParseUint(classStr, 0, 32)
if err != nil {
return nil, fmt.Errorf("unable to convert class string to uint32: %v", classStr)
}
device, err := ioutil.ReadFile(path.Join(devicePath, "device"))
if err != nil {
return nil, fmt.Errorf("unable to read PCI device id for %s: %v", address, err)
}
deviceStr := strings.TrimSpace(string(device))
deviceID, err := strconv.ParseUint(deviceStr, 0, 16)
if err != nil {
return nil, fmt.Errorf("unable to convert device string to uint16: %v", deviceStr)
}
numa, err := ioutil.ReadFile(path.Join(devicePath, "numa_node"))
if err != nil {
return nil, fmt.Errorf("unable to read PCI NUMA node for %s: %v", address, err)
}
numaStr := strings.TrimSpace(string(numa))
numaNode, err := strconv.ParseInt(numaStr, 0, 64)
if err != nil {
return nil, fmt.Errorf("unable to convert NUMA node string to int64: %v", numaStr)
}
config := &ConfigSpace{
Path: path.Join(devicePath, "config"),
}
resource, err := ioutil.ReadFile(path.Join(devicePath, "resource"))
if err != nil {
return nil, fmt.Errorf("unable to read PCI resource file for %s: %v", address, err)
}
resources := make(map[int]*MemoryResource)
for i, line := range strings.Split(strings.TrimSpace(string(resource)), "\n") {
values := strings.Split(line, " ")
if len(values) != 3 {
return nil, fmt.Errorf("expected exactly 3 entries in line '%d' of resource file, got %d", i, len(values))
}
start, _ := strconv.ParseUint(values[0], 0, 64)
end, _ := strconv.ParseUint(values[1], 0, 64)
flags, _ := strconv.ParseUint(values[2], 0, 64)
if (end - start) != 0 {
resources[i] = &MemoryResource{
uintptr(start),
uintptr(end),
flags,
fmt.Sprintf("%s/resource%d", devicePath, i),
}
}
}
nvdevice := &NvidiaPCIDevice{
Path: devicePath,
Address: address,
Vendor: uint16(vendorID),
Class: uint32(classID),
Device: uint16(deviceID),
NumaNode: int(numaNode),
Config: config,
Resources: resources,
}
return nvdevice, nil
}
// Get3DControllers returns all NVIDIA 3D Controller PCI devices on the system
func (p *nvpci) Get3DControllers() ([]*NvidiaPCIDevice, error) {
devices, err := p.GetAllDevices()
if err != nil {
return nil, fmt.Errorf("error getting all NVIDIA devices: %v", err)
}
var filtered []*NvidiaPCIDevice
for _, d := range devices {
if d.Is3DController() {
filtered = append(filtered, d)
}
}
return filtered, nil
}
// GetVGAControllers returns all NVIDIA VGA Controller PCI devices on the system
func (p *nvpci) GetVGAControllers() ([]*NvidiaPCIDevice, error) {
devices, err := p.GetAllDevices()
if err != nil {
return nil, fmt.Errorf("error getting all NVIDIA devices: %v", err)
}
var filtered []*NvidiaPCIDevice
for _, d := range devices {
if d.IsVGAController() {
filtered = append(filtered, d)
}
}
return filtered, nil
}
// GetNVSwitches returns all NVIDIA NVSwitch PCI devices on the system
func (p *nvpci) GetNVSwitches() ([]*NvidiaPCIDevice, error) {
devices, err := p.GetAllDevices()
if err != nil {
return nil, fmt.Errorf("error getting all NVIDIA devices: %v", err)
}
var filtered []*NvidiaPCIDevice
for _, d := range devices {
if d.IsNVSwitch() {
filtered = append(filtered, d)
}
}
return filtered, nil
}
// GetGPUs returns all NVIDIA GPU devices on the system
func (p *nvpci) GetGPUs() ([]*NvidiaPCIDevice, error) {
devices, err := p.GetAllDevices()
if err != nil {
return nil, fmt.Errorf("error getting all NVIDIA devices: %v", err)
}
var filtered []*NvidiaPCIDevice
for _, d := range devices {
if d.IsGPU() {
filtered = append(filtered, d)
}
}
return filtered, nil
}
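The resource-file loop in NewDevice discards the errors returned by strconv.ParseUint, so a malformed line silently becomes a zero-sized region. A minimal sketch of a stricter parser follows; the names memoryRange and parseResourceLine are hypothetical and not part of go-nvlib, and the sample line in main is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// memoryRange is a hypothetical, simplified stand-in for MemoryResource.
type memoryRange struct {
	Start, End, Flags uint64
}

// parseResourceLine parses one "start end flags" line as found in a sysfs
// PCI resource file. Unlike the listing above, parse errors are returned
// to the caller instead of being ignored.
func parseResourceLine(line string) (memoryRange, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return memoryRange{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	var vals [3]uint64
	for i, f := range fields {
		// Base 0 lets ParseUint honor the "0x" prefix used by sysfs.
		v, err := strconv.ParseUint(f, 0, 64)
		if err != nil {
			return memoryRange{}, fmt.Errorf("invalid field %q: %w", f, err)
		}
		vals[i] = v
	}
	return memoryRange{Start: vals[0], End: vals[1], Flags: vals[2]}, nil
}

func main() {
	// Illustrative input only; real values depend on the device.
	r, err := parseResourceLine("0x00000000f6000000 0x00000000f6ffffff 0x0000000000040200")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("size=%#x flags=%#x\n", r.End-r.Start+1, r.Flags)
}
```

Using strings.Fields instead of strings.Split(line, " ") also tolerates repeated whitespace, which is a design choice of this sketch rather than behavior of the original code.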


@@ -1,140 +0,0 @@
/*
* Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package nvpci
import (
"fmt"
"sort"
"gitlab.com/nvidia/cloud-native/go-nvlib/pkg/nvpci/mmio"
)
const (
pmcEndianRegister = 0x4
pmcLittleEndian = 0x0
pmcBigEndian = 0x01000001
)
// MemoryResource represents a mmio region
type MemoryResource struct {
Start uintptr
End uintptr
Flags uint64
Path string
}
// OpenRW read write mmio region
func (mr *MemoryResource) OpenRW() (mmio.Mmio, error) {
rw, err := mmio.OpenRW(mr.Path, 0, int(mr.End-mr.Start+1))
if err != nil {
return nil, fmt.Errorf("failed to open file for mmio: %v", err)
}
switch rw.Read32(pmcEndianRegister) {
case pmcBigEndian:
return rw.BigEndian(), nil
case pmcLittleEndian:
return rw.LittleEndian(), nil
}
return nil, fmt.Errorf("unknown endianness for mmio")
}
// OpenRO read only mmio region
func (mr *MemoryResource) OpenRO() (mmio.Mmio, error) {
ro, err := mmio.OpenRO(mr.Path, 0, int(mr.End-mr.Start+1))
if err != nil {
return nil, fmt.Errorf("failed to open file for mmio: %v", err)
}
switch ro.Read32(pmcEndianRegister) {
case pmcBigEndian:
return ro.BigEndian(), nil
case pmcLittleEndian:
return ro.LittleEndian(), nil
}
return nil, fmt.Errorf("unknown endianness for mmio")
}
// From Bit Twiddling Hacks, great resource for all low level bit manipulations
func calcNextPowerOf2(n uint64) uint64 {
n--
n |= n >> 1
n |= n >> 2
n |= n >> 4
n |= n >> 8
n |= n >> 16
n |= n >> 32
n++
return n
}
// GetTotalAddressableMemory will accumulate the 32bit and 64bit memory windows
// of each BAR and round the value if needed to the next power of 2; first
// return value is the accumulated 32bit addressable memory size; the second one
// is the accumulated 64bit addressable memory size in bytes. These values are
// needed to configure virtualized environments.
func (mrs MemoryResources) GetTotalAddressableMemory(roundUp bool) (uint64, uint64) {
const pciIOVNumBAR = 6
const pciBaseAddressMemTypeMask = 0x06
const pciBaseAddressMemType32 = 0x00 /* 32 bit address */
const pciBaseAddressMemType64 = 0x04 /* 64 bit address */
// We need to sort the resources so the first 6 entries are the BARs
// How a map is represented in memory is not guaranteed, it is not an
// array. Keys do not have an order.
keys := make([]int, 0, len(mrs))
for k := range mrs {
keys = append(keys, k)
}
sort.Ints(keys)
numBAR := 0
memSize32bit := uint64(0)
memSize64bit := uint64(0)
for _, key := range keys {
// A PCI device exposes at most 6 BARs (BAR0 to BAR5), so we
// discard everything after the 6th entry of the resource
// file, see lspci.c
if key >= pciIOVNumBAR || numBAR == pciIOVNumBAR {
break
}
numBAR = numBAR + 1
region := mrs[key]
flags := region.Flags & pciBaseAddressMemTypeMask
memType32bit := flags == pciBaseAddressMemType32
memType64bit := flags == pciBaseAddressMemType64
memSize := (region.End - region.Start) + 1
if memType32bit {
memSize32bit = memSize32bit + uint64(memSize)
}
if memType64bit {
memSize64bit = memSize64bit + uint64(memSize)
}
}
if roundUp {
memSize32bit = calcNextPowerOf2(memSize32bit)
memSize64bit = calcNextPowerOf2(memSize64bit)
}
return memSize32bit, memSize64bit
}
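To make the rounding and the BAR classification above concrete, here is a small self-contained sketch. The bar type and the totals function are hypothetical simplifications of MemoryResource and GetTotalAddressableMemory, and the flag values in main are reduced to just the memory-type bits for illustration.

```go
package main

import "fmt"

// nextPowerOf2 rounds n up to the nearest power of two by smearing the
// highest set bit into all lower positions, as in the listing above.
// Note that n == 0 wraps around and yields 0.
func nextPowerOf2(n uint64) uint64 {
	n--
	n |= n >> 1
	n |= n >> 2
	n |= n >> 4
	n |= n >> 8
	n |= n >> 16
	n |= n >> 32
	n++
	return n
}

// bar is a hypothetical, simplified stand-in for MemoryResource.
type bar struct {
	Start, End, Flags uint64
}

// totals accumulates the sizes of 32-bit and 64-bit memory BARs and
// optionally rounds each sum up to the next power of two.
func totals(bars []bar, roundUp bool) (size32, size64 uint64) {
	const (
		memTypeMask = 0x06 // bits 1-2 encode the memory type
		memType32   = 0x00
		memType64   = 0x04
	)
	for _, b := range bars {
		size := b.End - b.Start + 1
		switch b.Flags & memTypeMask {
		case memType32:
			size32 += size
		case memType64:
			size64 += size
		}
	}
	if roundUp {
		size32 = nextPowerOf2(size32)
		size64 = nextPowerOf2(size64)
	}
	return size32, size64
}

func main() {
	bars := []bar{
		{Start: 0x1000, End: 0x1fff, Flags: 0x0},     // 4 KiB, 32-bit
		{Start: 0x100000, End: 0x17ffff, Flags: 0x4}, // 512 KiB, 64-bit
		{Start: 0x200000, End: 0x23ffff, Flags: 0x4}, // 256 KiB, 64-bit
	}
	s32, s64 := totals(bars, true)
	fmt.Println(s32, s64)
}
```

The 64-bit sum here is 768 KiB, which is not a power of two, so rounding raises it to 1 MiB; the 32-bit sum is already a power of two and is left unchanged.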


@@ -82,7 +82,7 @@ func marshalJSON(id []byte) ([]byte, error) {
}
// unmarshalJSON inflates trace id from hex string, possibly enclosed in quotes.
func unmarshalJSON(dst []byte, src []byte) error {
func unmarshalJSON(dst, src []byte) error {
if l := len(src); l >= 2 && src[0] == '"' && src[l-1] == '"' {
src = src[1 : l-1]
}


@@ -41,7 +41,7 @@ func (i *protoInt64) UnmarshalJSON(data []byte) error {
// strings or integers.
type protoUint64 uint64
// Int64 returns the protoUint64 as a uint64.
// Uint64 returns the protoUint64 as a uint64.
func (i *protoUint64) Uint64() uint64 { return uint64(*i) }
// UnmarshalJSON decodes both strings and integers.


@@ -10,6 +10,7 @@ import (
"errors"
"fmt"
"io"
"math"
"time"
)
@@ -151,8 +152,8 @@ func (s Span) MarshalJSON() ([]byte, error) {
}{
Alias: Alias(s),
ParentSpanID: parentSpanId,
StartTime: uint64(startT),
EndTime: uint64(endT),
StartTime: uint64(startT), // nolint:gosec // >0 checked above.
EndTime: uint64(endT), // nolint:gosec // >0 checked above.
})
}
@@ -201,11 +202,13 @@ func (s *Span) UnmarshalJSON(data []byte) error {
case "startTimeUnixNano", "start_time_unix_nano":
var val protoUint64
err = decoder.Decode(&val)
s.StartTime = time.Unix(0, int64(val.Uint64()))
v := int64(min(val.Uint64(), math.MaxInt64)) //nolint:gosec // Overflow checked.
s.StartTime = time.Unix(0, v)
case "endTimeUnixNano", "end_time_unix_nano":
var val protoUint64
err = decoder.Decode(&val)
s.EndTime = time.Unix(0, int64(val.Uint64()))
v := int64(min(val.Uint64(), math.MaxInt64)) //nolint:gosec // Overflow checked.
s.EndTime = time.Unix(0, v)
case "attributes":
err = decoder.Decode(&s.Attrs)
case "droppedAttributesCount", "dropped_attributes_count":
@@ -248,13 +251,20 @@ func (s *Span) UnmarshalJSON(data []byte) error {
type SpanFlags int32
const (
// SpanFlagsTraceFlagsMask is a mask for trace-flags.
//
// Bits 0-7 are used for trace flags.
SpanFlagsTraceFlagsMask SpanFlags = 255
// Bits 8 and 9 are used to indicate that the parent span or link span is remote.
// Bit 8 (`HAS_IS_REMOTE`) indicates whether the value is known.
// Bit 9 (`IS_REMOTE`) indicates whether the span or link is remote.
// SpanFlagsContextHasIsRemoteMask is a mask for HAS_IS_REMOTE status.
//
// Bits 8 and 9 are used to indicate that the parent span or link span is
// remote. Bit 8 (`HAS_IS_REMOTE`) indicates whether the value is known.
SpanFlagsContextHasIsRemoteMask SpanFlags = 256
// SpanFlagsContextHasIsRemoteMask indicates the Span is remote.
// SpanFlagsContextIsRemoteMask is a mask for IS_REMOTE status.
//
// Bits 8 and 9 are used to indicate that the parent span or link span is
// remote. Bit 9 (`IS_REMOTE`) indicates whether the span or link is
// remote.
SpanFlagsContextIsRemoteMask SpanFlags = 512
)
@@ -263,26 +273,30 @@ const (
type SpanKind int32
const (
// Indicates that the span represents an internal operation within an application,
// as opposed to an operation happening at the boundaries. Default value.
// SpanKindInternal indicates that the span represents an internal
// operation within an application, as opposed to an operation happening at
// the boundaries.
SpanKindInternal SpanKind = 1
// Indicates that the span covers server-side handling of an RPC or other
// remote network request.
// SpanKindServer indicates that the span covers server-side handling of an
// RPC or other remote network request.
SpanKindServer SpanKind = 2
// Indicates that the span describes a request to some remote service.
// SpanKindClient indicates that the span describes a request to some
// remote service.
SpanKindClient SpanKind = 3
// Indicates that the span describes a producer sending a message to a broker.
// Unlike CLIENT and SERVER, there is often no direct critical path latency relationship
// between producer and consumer spans. A PRODUCER span ends when the message was accepted
// by the broker while the logical processing of the message might span a much longer time.
// SpanKindProducer indicates that the span describes a producer sending a
// message to a broker. Unlike SpanKindClient and SpanKindServer, there is
// often no direct critical path latency relationship between producer and
// consumer spans. A SpanKindProducer span ends when the message was
// accepted by the broker while the logical processing of the message might
// span a much longer time.
SpanKindProducer SpanKind = 4
// Indicates that the span describes consumer receiving a message from a broker.
// Like the PRODUCER kind, there is often no direct critical path latency relationship
// between producer and consumer spans.
// SpanKindConsumer indicates that the span describes a consumer receiving
// a message from a broker. Like SpanKindProducer, there is often no direct
// critical path latency relationship between producer and consumer spans.
SpanKindConsumer SpanKind = 5
)
// Event is a time-stamped annotation of the span, consisting of user-supplied
// SpanEvent is a time-stamped annotation of the span, consisting of user-supplied
// text description and key-value pairs.
type SpanEvent struct {
// time_unix_nano is the time the event occurred.
@@ -312,7 +326,7 @@ func (e SpanEvent) MarshalJSON() ([]byte, error) {
Time uint64 `json:"timeUnixNano,omitempty"`
}{
Alias: Alias(e),
Time: uint64(t),
Time: uint64(t), //nolint:gosec // >0 checked above
})
}
@@ -347,7 +361,8 @@ func (se *SpanEvent) UnmarshalJSON(data []byte) error {
case "timeUnixNano", "time_unix_nano":
var val protoUint64
err = decoder.Decode(&val)
se.Time = time.Unix(0, int64(val.Uint64()))
v := int64(min(val.Uint64(), math.MaxInt64)) //nolint:gosec // Overflow checked.
se.Time = time.Unix(0, v)
case "name":
err = decoder.Decode(&se.Name)
case "attributes":
@@ -365,10 +380,11 @@ func (se *SpanEvent) UnmarshalJSON(data []byte) error {
return nil
}
// A pointer from the current span to another span in the same trace or in a
// different trace. For example, this can be used in batching operations,
// where a single batch handler processes multiple requests from different
// traces or when the handler receives a request from a different project.
// SpanLink is a reference from the current span to another span in the same
// trace or in a different trace. For example, this can be used in batching
// operations, where a single batch handler processes multiple requests from
// different traces or when the handler receives a request from a different
// project.
type SpanLink struct {
// A unique identifier of a trace that this linked span is part of. The ID is a
// 16-byte array.
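The UnmarshalJSON hunks above replace a direct int64(val.Uint64()) conversion with a clamped one, so that a timestamp above math.MaxInt64 no longer wraps to a negative value. A minimal sketch of that pattern, assuming Go 1.21 or later for the built-in min; the helper name unixNanoToTime is hypothetical:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// unixNanoToTime converts an unsigned nanosecond timestamp to time.Time.
// Values above math.MaxInt64 are clamped instead of wrapping negative.
func unixNanoToTime(ns uint64) time.Time {
	v := int64(min(ns, math.MaxInt64))
	return time.Unix(0, v)
}

func main() {
	// An ordinary value passes through unchanged.
	fmt.Println(unixNanoToTime(1500).UnixNano())
	// The largest uint64 is clamped to the largest int64.
	fmt.Println(unixNanoToTime(math.MaxUint64).UnixNano() == math.MaxInt64)
}
```

Clamping trades an error for a saturated value, which keeps decoding total at the cost of hiding out-of-range input.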


@@ -3,17 +3,19 @@
package telemetry
// StatusCode is the status of a Span.
//
// For the semantics of status codes see
// https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status
type StatusCode int32
const (
// The default status.
// StatusCodeUnset is the default status.
StatusCodeUnset StatusCode = 0
// The Span has been validated by an Application developer or Operator to
// have completed successfully.
// StatusCodeOK is used when the Span has been validated by an Application
// developer or Operator to have completed successfully.
StatusCodeOK StatusCode = 1
// The Span contains an error.
// StatusCodeError is used when the Span contains an error.
StatusCodeError StatusCode = 2
)


@@ -71,7 +71,7 @@ func (td *Traces) UnmarshalJSON(data []byte) error {
return nil
}
// A collection of ScopeSpans from a Resource.
// ResourceSpans is a collection of ScopeSpans from a Resource.
type ResourceSpans struct {
// The resource for the spans in this message.
// If this field is not set then no resource info is known.
@@ -128,7 +128,7 @@ func (rs *ResourceSpans) UnmarshalJSON(data []byte) error {
return nil
}
// A collection of Spans produced by an InstrumentationScope.
// ScopeSpans is a collection of Spans produced by an InstrumentationScope.
type ScopeSpans struct {
// The instrumentation scope information for the spans in this message.
// Semantically when InstrumentationScope isn't set, it is equivalent with


@@ -1,8 +1,6 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0
//go:generate stringer -type=ValueKind -trimprefix=ValueKind
package telemetry
import (
@@ -23,7 +21,7 @@ import (
// A zero value is valid and represents an empty value.
type Value struct {
// Ensure forward compatibility by explicitly making this not comparable.
noCmp [0]func() //nolint: unused // This is indeed used.
noCmp [0]func() //nolint:unused // This is indeed used.
// num holds the value for Int64, Float64, and Bool. It holds the length
// for String, Bytes, Slice, Map.
@@ -92,7 +90,7 @@ func IntValue(v int) Value { return Int64Value(int64(v)) }
// Int64Value returns a [Value] for an int64.
func Int64Value(v int64) Value {
return Value{num: uint64(v), any: ValueKindInt64}
return Value{num: uint64(v), any: ValueKindInt64} //nolint:gosec // Raw value conv.
}
// Float64Value returns a [Value] for a float64.
@@ -164,7 +162,7 @@ func (v Value) AsInt64() int64 {
// this will return garbage.
func (v Value) asInt64() int64 {
// Assumes v.num was a valid int64 (overflow not checked).
return int64(v.num) // nolint: gosec
return int64(v.num) //nolint:gosec // Bounded.
}
// AsBool returns the value held by v as a bool.
@@ -309,13 +307,13 @@ func (v Value) String() string {
return v.asString()
case ValueKindInt64:
// Assumes v.num was a valid int64 (overflow not checked).
return strconv.FormatInt(int64(v.num), 10) // nolint: gosec
return strconv.FormatInt(int64(v.num), 10) //nolint:gosec // Bounded.
case ValueKindFloat64:
return strconv.FormatFloat(v.asFloat64(), 'g', -1, 64)
case ValueKindBool:
return strconv.FormatBool(v.asBool())
case ValueKindBytes:
return fmt.Sprint(v.asBytes())
return string(v.asBytes())
case ValueKindMap:
return fmt.Sprint(v.asMap())
case ValueKindSlice:
@@ -343,7 +341,7 @@ func (v *Value) MarshalJSON() ([]byte, error) {
case ValueKindInt64:
return json.Marshal(struct {
Value string `json:"intValue"`
}{strconv.FormatInt(int64(v.num), 10)})
}{strconv.FormatInt(int64(v.num), 10)}) //nolint:gosec // Raw value conv.
case ValueKindFloat64:
return json.Marshal(struct {
Value float64 `json:"doubleValue"`


@@ -6,6 +6,7 @@ package sdk
import (
"encoding/json"
"fmt"
"math"
"reflect"
"runtime"
"strings"
@@ -16,7 +17,7 @@ import (
"go.opentelemetry.io/otel/attribute"
"go.opentelemetry.io/otel/codes"
semconv "go.opentelemetry.io/otel/semconv/v1.26.0"
semconv "go.opentelemetry.io/otel/semconv/v1.37.0"
"go.opentelemetry.io/otel/trace"
"go.opentelemetry.io/otel/trace/noop"
@@ -85,7 +86,12 @@ func (s *span) SetAttributes(attrs ...attribute.KeyValue) {
limit := maxSpan.Attrs
if limit == 0 {
// No attributes allowed.
s.span.DroppedAttrs += uint32(len(attrs))
n := int64(len(attrs))
if n > 0 {
s.span.DroppedAttrs += uint32( //nolint:gosec // Bounds checked.
min(n, math.MaxUint32),
)
}
return
}
@@ -121,8 +127,13 @@ func (s *span) SetAttributes(attrs ...attribute.KeyValue) {
// convCappedAttrs converts up to limit attrs into a []telemetry.Attr. The
// number of dropped attributes is also returned.
func convCappedAttrs(limit int, attrs []attribute.KeyValue) ([]telemetry.Attr, uint32) {
n := len(attrs)
if limit == 0 {
return nil, uint32(len(attrs))
var out uint32
if n > 0 {
out = uint32(min(int64(n), math.MaxUint32)) //nolint:gosec // Bounds checked.
}
return nil, out
}
if limit < 0 {
@@ -130,8 +141,12 @@ func convCappedAttrs(limit int, attrs []attribute.KeyValue) ([]telemetry.Attr, u
return convAttrs(attrs), 0
}
limit = min(len(attrs), limit)
return convAttrs(attrs[:limit]), uint32(len(attrs) - limit)
if n < 0 {
n = 0
}
limit = min(n, limit)
return convAttrs(attrs[:limit]), uint32(n - limit) //nolint:gosec // Bounds checked.
}
func convAttrs(attrs []attribute.KeyValue) []telemetry.Attr {
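The SetAttributes and convCappedAttrs hunks above bound an int count before converting it to uint32. The sketch below shows the same idea on a plain string slice; clampToUint32 and capItems are hypothetical helpers written for illustration (not the SDK's functions), and the built-in min assumes Go 1.21 or later.

```go
package main

import (
	"fmt"
	"math"
)

// clampToUint32 converts n to uint32, saturating at 0 and math.MaxUint32.
func clampToUint32(n int64) uint32 {
	if n <= 0 {
		return 0
	}
	return uint32(min(n, math.MaxUint32))
}

// capItems keeps at most limit items and reports how many were dropped.
// A limit of 0 drops everything; a negative limit means "no limit".
func capItems(limit int, items []string) ([]string, uint32) {
	n := len(items)
	if limit == 0 {
		return nil, clampToUint32(int64(n))
	}
	if limit < 0 {
		return items, 0
	}
	limit = min(n, limit)
	return items[:limit], clampToUint32(int64(n - limit))
}

func main() {
	kept, dropped := capItems(2, []string{"a", "b", "c"})
	fmt.Println(kept, dropped)
}
```

Taking an int64 in clampToUint32 keeps the comparison against math.MaxUint32 valid on both 32-bit and 64-bit platforms, where the width of int differs.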


@@ -5,6 +5,7 @@ package sdk
import (
"context"
"math"
"time"
"go.opentelemetry.io/otel/trace"
@@ -21,15 +22,20 @@ type tracer struct {
var _ trace.Tracer = tracer{}
func (t tracer) Start(ctx context.Context, name string, opts ...trace.SpanStartOption) (context.Context, trace.Span) {
var psc trace.SpanContext
func (t tracer) Start(
ctx context.Context,
name string,
opts ...trace.SpanStartOption,
) (context.Context, trace.Span) {
var psc, sc trace.SpanContext
sampled := true
span := new(span)
// Ask eBPF for sampling decision and span context info.
t.start(ctx, span, &psc, &sampled, &span.spanContext)
t.start(ctx, span, &psc, &sampled, &sc)
span.sampled.Store(sampled)
span.spanContext = sc
ctx = trace.ContextWithSpan(ctx, span)
@@ -58,7 +64,13 @@ func (t *tracer) start(
// start is used for testing.
var start = func(context.Context, *span, *trace.SpanContext, *bool, *trace.SpanContext) {}
func (t tracer) traces(name string, cfg trace.SpanConfig, sc, psc trace.SpanContext) (*telemetry.Traces, *telemetry.Span) {
var intToUint32Bound = min(math.MaxInt, math.MaxUint32)
func (t tracer) traces(
name string,
cfg trace.SpanConfig,
sc, psc trace.SpanContext,
) (*telemetry.Traces, *telemetry.Span) {
span := &telemetry.Span{
TraceID: telemetry.TraceID(sc.TraceID()),
SpanID: telemetry.SpanID(sc.SpanID()),
@@ -73,11 +85,16 @@ func (t tracer) traces(name string, cfg trace.SpanConfig, sc, psc trace.SpanCont
links := cfg.Links()
if limit := maxSpan.Links; limit == 0 {
span.DroppedLinks = uint32(len(links))
n := len(links)
if n > 0 {
bounded := max(min(n, intToUint32Bound), 0)
span.DroppedLinks = uint32(bounded) //nolint:gosec // Bounds checked.
}
} else {
if limit > 0 {
n := max(len(links)-limit, 0)
span.DroppedLinks = uint32(n)
bounded := min(n, intToUint32Bound)
span.DroppedLinks = uint32(bounded) //nolint:gosec // Bounds checked.
links = links[n:]
}
span.Links = convLinks(links)


@@ -0,0 +1,3 @@
exemptions:
- check: artifacthub_badge
reason: "Artifact Hub doesn't support Go packages"


@@ -7,3 +7,5 @@ ans
nam
valu
thirdparty
addOpt
observ


@@ -1,252 +1,267 @@
# See https://github.com/golangci/golangci-lint#config-file
version: "2"
run:
issues-exit-code: 1 #Default
tests: true #Default
issues-exit-code: 1
tests: true
linters:
# Disable everything by default so upgrades do not include new "default
# enabled" linters.
disable-all: true
# Specifically enable linters we want to use.
default: none
enable:
- asasalint
- bodyclose
- depguard
- errcheck
- errorlint
- gocritic
- godot
- gofumpt
- goimports
- gosec
- gosimple
- govet
- ineffassign
- misspell
- modernize
- perfsprint
- revive
- staticcheck
- testifylint
- typecheck
- unconvert
- unused
- unparam
- unused
- usestdlibvars
- usetesting
settings:
depguard:
rules:
auto/sdk:
files:
- '!internal/global/trace.go'
- ~internal/global/trace_test.go
deny:
- pkg: go.opentelemetry.io/auto/sdk
desc: Do not use SDK from automatic instrumentation.
non-tests:
files:
- '!$test'
- '!**/*test/*.go'
- '!**/internal/matchers/*.go'
deny:
- pkg: testing
- pkg: github.com/stretchr/testify
- pkg: crypto/md5
- pkg: crypto/sha1
- pkg: crypto/**/pkix
otel-internal:
files:
- '**/sdk/*.go'
- '**/sdk/**/*.go'
- '**/exporters/*.go'
- '**/exporters/**/*.go'
- '**/schema/*.go'
- '**/schema/**/*.go'
- '**/metric/*.go'
- '**/metric/**/*.go'
- '**/bridge/*.go'
- '**/bridge/**/*.go'
- '**/trace/*.go'
- '**/trace/**/*.go'
- '**/log/*.go'
- '**/log/**/*.go'
deny:
- pkg: go.opentelemetry.io/otel/internal$
desc: Do not use cross-module internal packages.
- pkg: go.opentelemetry.io/otel/internal/internaltest
desc: Do not use cross-module internal packages.
otlp-internal:
files:
- '!**/exporters/otlp/internal/**/*.go'
deny:
- pkg: go.opentelemetry.io/otel/exporters/otlp/internal
desc: Do not use cross-module internal packages.
otlpmetric-internal:
files:
- '!**/exporters/otlp/otlpmetric/internal/*.go'
- '!**/exporters/otlp/otlpmetric/internal/**/*.go'
deny:
- pkg: go.opentelemetry.io/otel/exporters/otlp/otlpmetric/internal
desc: Do not use cross-module internal packages.
otlptrace-internal:
files:
- '!**/exporters/otlp/otlptrace/*.go'
- '!**/exporters/otlp/otlptrace/internal/**.go'
deny:
- pkg: go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal
desc: Do not use cross-module internal packages.
gocritic:
disabled-checks:
- appendAssign
- commentedOutCode
- dupArg
- hugeParam
- importShadow
- preferDecodeRune
- rangeValCopy
- unnamedResult
- whyNoLint
enable-all: true
godot:
exclude:
# Exclude links.
- '^ *\[[^]]+\]:'
# Exclude sentence fragments for lists.
- ^[ ]*[-•]
# Exclude sentences prefixing a list.
- :$
misspell:
locale: US
ignore-rules:
- cancelled
modernize:
disable:
- omitzero
perfsprint:
int-conversion: true
err-error: true
errorf: true
sprintf1: true
strconcat: true
revive:
confidence: 0.01
rules:
- name: blank-imports
- name: bool-literal-in-expr
- name: constant-logical-expr
- name: context-as-argument
arguments:
- allowTypesBefore: '*testing.T'
disabled: true
- name: context-keys-type
- name: deep-exit
- name: defer
arguments:
- - call-chain
- loop
- name: dot-imports
- name: duplicated-imports
- name: early-return
arguments:
- preserveScope
- name: empty-block
- name: empty-lines
- name: error-naming
- name: error-return
- name: error-strings
- name: errorf
- name: exported
arguments:
- sayRepetitiveInsteadOfStutters
- name: flag-parameter
- name: identical-branches
- name: if-return
- name: import-shadowing
- name: increment-decrement
- name: indent-error-flow
arguments:
- preserveScope
- name: package-comments
- name: range
- name: range-val-in-closure
- name: range-val-address
- name: redefines-builtin-id
- name: string-format
arguments:
- - panic
- /^[^\n]*$/
- must not contain line breaks
- name: struct-tag
- name: superfluous-else
arguments:
- preserveScope
- name: time-equal
- name: unconditional-recursion
- name: unexported-return
- name: unhandled-error
arguments:
- fmt.Fprint
- fmt.Fprintf
- fmt.Fprintln
- fmt.Print
- fmt.Printf
- fmt.Println
- name: unused-parameter
- name: unused-receiver
- name: unnecessary-stmt
- name: use-any
- name: useless-break
- name: var-declaration
- name: var-naming
arguments:
- ["ID"] # AllowList
- ["Otel", "Aws", "Gcp"] # DenyList
- name: waitgroup-by-value
testifylint:
enable-all: true
disable:
- float-compare
- go-require
- require-error
usetesting:
context-background: true
context-todo: true
exclusions:
generated: lax
presets:
- common-false-positives
- legacy
- std-error-handling
rules:
- linters:
- revive
path: schema/v.*/types/.*
text: avoid meaningless package names
# TODO: Having appropriate comments for exported objects helps development,
# even for objects in internal packages. Appropriate comments for all
# exported objects should be added and this exclusion removed.
- linters:
- revive
path: .*internal/.*
text: exported (method|function|type|const) (.+) should have comment or be unexported
# Yes, they are, but it's okay in a test.
- linters:
- revive
path: _test\.go
text: exported func.*returns unexported type.*which can be annoying to use
# Example test functions should be treated like main.
- linters:
- revive
path: example.*_test\.go
text: calls to (.+) only in main[(][)] or init[(][)] functions
# It's okay to not run gosec and perfsprint in a test.
- linters:
- gosec
- perfsprint
path: _test\.go
# Ignoring gosec G404: Use of weak random number generator (math/rand instead of crypto/rand)
# as we commonly use it in tests and examples.
- linters:
- gosec
text: 'G404:'
# Ignoring gosec G402: TLS MinVersion too low
# as the https://pkg.go.dev/crypto/tls#Config handles MinVersion default well.
- linters:
- gosec
text: 'G402: TLS MinVersion too low.'
issues:
# Maximum issues count per one linter.
# Set to 0 to disable.
# Default: 50
# Setting to unlimited so the linter only is run once to debug all issues.
max-issues-per-linter: 0
# Maximum count of issues with the same text.
# Set to 0 to disable.
# Default: 3
# Setting to unlimited so the linter only is run once to debug all issues.
max-same-issues: 0
# Excluding configuration per-path, per-linter, per-text and per-source.
exclude-rules:
# TODO: Having appropriate comments for exported objects helps development,
# even for objects in internal packages. Appropriate comments for all
# exported objects should be added and this exclusion removed.
- path: '.*internal/.*'
text: "exported (method|function|type|const) (.+) should have comment or be unexported"
linters:
- revive
# Yes, they are, but it's okay in a test.
- path: _test\.go
text: "exported func.*returns unexported type.*which can be annoying to use"
linters:
- revive
# Example test functions should be treated like main.
- path: example.*_test\.go
text: "calls to (.+) only in main[(][)] or init[(][)] functions"
linters:
- revive
# It's okay to not run gosec and perfsprint in a test.
- path: _test\.go
linters:
- gosec
- perfsprint
# Ignoring gosec G404: Use of weak random number generator (math/rand instead of crypto/rand)
# as we commonly use it in tests and examples.
- text: "G404:"
linters:
- gosec
# Ignoring gosec G402: TLS MinVersion too low
# as the https://pkg.go.dev/crypto/tls#Config handles MinVersion default well.
- text: "G402: TLS MinVersion too low."
linters:
- gosec
include:
# revive exported should have comment or be unexported.
- EXC0012
# revive package comment should be of the form ...
- EXC0013
linters-settings:
depguard:
rules:
non-tests:
files:
- "!$test"
- "!**/*test/*.go"
- "!**/internal/matchers/*.go"
deny:
- pkg: "testing"
- pkg: "github.com/stretchr/testify"
- pkg: "crypto/md5"
- pkg: "crypto/sha1"
- pkg: "crypto/**/pkix"
auto/sdk:
files:
- "!internal/global/trace.go"
- "~internal/global/trace_test.go"
deny:
- pkg: "go.opentelemetry.io/auto/sdk"
desc: Do not use SDK from automatic instrumentation.
otlp-internal:
files:
- "!**/exporters/otlp/internal/**/*.go"
deny:
- pkg: "go.opentelemetry.io/otel/exporters/otlp/internal"
desc: Do not use cross-module internal packages.
otlptrace-internal:
files:
- "!**/exporters/otlp/otlptrace/*.go"
- "!**/exporters/otlp/otlptrace/internal/**.go"
deny:
- pkg: "go.opentelemetry.io/otel/exporters/otlp/otlptrace/internal"
desc: Do not use cross-module internal packages.
otlpmetric-internal:
files:
- "!**/exporters/otlp/otlpmetric/internal/*.go"
- "!**/exporters/otlp/otlpmetric/internal/**/*.go"
deny:
- pkg: "go.opentelemetry.io/otel/exporters/otlp/otlpmetric/internal"
desc: Do not use cross-module internal packages.
otel-internal:
files:
- "**/sdk/*.go"
- "**/sdk/**/*.go"
- "**/exporters/*.go"
- "**/exporters/**/*.go"
- "**/schema/*.go"
- "**/schema/**/*.go"
- "**/metric/*.go"
- "**/metric/**/*.go"
- "**/bridge/*.go"
- "**/bridge/**/*.go"
- "**/trace/*.go"
- "**/trace/**/*.go"
- "**/log/*.go"
- "**/log/**/*.go"
deny:
- pkg: "go.opentelemetry.io/otel/internal$"
desc: Do not use cross-module internal packages.
- pkg: "go.opentelemetry.io/otel/internal/attribute"
desc: Do not use cross-module internal packages.
- pkg: "go.opentelemetry.io/otel/internal/internaltest"
desc: Do not use cross-module internal packages.
- pkg: "go.opentelemetry.io/otel/internal/matchers"
desc: Do not use cross-module internal packages.
godot:
exclude:
# Exclude links.
- '^ *\[[^]]+\]:'
# Exclude sentence fragments for lists.
- '^[ ]*[-•]'
# Exclude sentences prefixing a list.
- ':$'
goimports:
local-prefixes: go.opentelemetry.io
misspell:
locale: US
ignore-words:
- cancelled
perfsprint:
err-error: true
errorf: true
int-conversion: true
sprintf1: true
strconcat: true
revive:
# Sets the default failure confidence.
# This means that linting errors with less than 0.8 confidence will be ignored.
# Default: 0.8
confidence: 0.01
# https://github.com/mgechev/revive/blob/master/RULES_DESCRIPTIONS.md
rules:
- name: blank-imports
- name: bool-literal-in-expr
- name: constant-logical-expr
- name: context-as-argument
disabled: true
arguments:
- allowTypesBefore: "*testing.T"
- name: context-keys-type
- name: deep-exit
- name: defer
arguments:
- ["call-chain", "loop"]
- name: dot-imports
- name: duplicated-imports
- name: early-return
arguments:
- "preserveScope"
- name: empty-block
- name: empty-lines
- name: error-naming
- name: error-return
- name: error-strings
- name: errorf
- name: exported
arguments:
- "sayRepetitiveInsteadOfStutters"
- name: flag-parameter
- name: identical-branches
- name: if-return
- name: import-shadowing
- name: increment-decrement
- name: indent-error-flow
arguments:
- "preserveScope"
- name: package-comments
- name: range
- name: range-val-in-closure
- name: range-val-address
- name: redefines-builtin-id
- name: string-format
arguments:
- - panic
- '/^[^\n]*$/'
- must not contain line breaks
- name: struct-tag
- name: superfluous-else
arguments:
- "preserveScope"
- name: time-equal
- name: unconditional-recursion
- name: unexported-return
- name: unhandled-error
arguments:
- "fmt.Fprint"
- "fmt.Fprintf"
- "fmt.Fprintln"
- "fmt.Print"
- "fmt.Printf"
- "fmt.Println"
- name: unnecessary-stmt
- name: useless-break
- name: var-declaration
- name: var-naming
arguments:
- ["ID"] # AllowList
- ["Otel", "Aws", "Gcp"] # DenyList
- name: waitgroup-by-value
testifylint:
enable-all: true
disable:
- float-compare
- go-require
- require-error
formatters:
enable:
- gofumpt
- goimports
- golines
settings:
gofumpt:
extra-rules: true
goimports:
local-prefixes:
- go.opentelemetry.io/otel
golines:
max-len: 120
exclusions:
generated: lax


@@ -1,6 +1,13 @@
http://localhost
https://localhost
http://jaeger-collector
https://github.com/open-telemetry/opentelemetry-go/milestone/
https://github.com/open-telemetry/opentelemetry-go/projects
# Weaver model URL for semantic-conventions repository.
https?:\/\/github\.com\/open-telemetry\/semantic-conventions\/archive\/refs\/tags\/[^.]+\.zip\[[^]]+]
file:///home/runner/work/opentelemetry-go/opentelemetry-go/libraries
file:///home/runner/work/opentelemetry-go/opentelemetry-go/manual
http://4.3.2.1:78/user/123
file:///home/runner/work/opentelemetry-go/opentelemetry-go/exporters/otlp/otlptrace/otlptracegrpc/internal/observ/dns:/:4317
# URL works, but it has blocked link checkers.
https://dl.acm.org/doi/10.1145/198429.198435


@@ -11,6 +11,304 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
<!-- Released section -->
<!-- Don't change this section unless doing release -->
## [1.40.0/0.62.0/0.16.0] 2026-02-02
### Added
- Add `AlwaysRecord` sampler in `go.opentelemetry.io/otel/sdk/trace`. (#7724)
- Add `Enabled` method to all synchronous instrument interfaces (`Float64Counter`, `Float64UpDownCounter`, `Float64Histogram`, `Float64Gauge`, `Int64Counter`, `Int64UpDownCounter`, `Int64Histogram`, `Int64Gauge`) in `go.opentelemetry.io/otel/metric`.
This stabilizes the synchronous instrument enabled feature, allowing users to check if an instrument will process measurements before performing computationally expensive operations. (#7763)
- Add `go.opentelemetry.io/otel/semconv/v1.39.0` package.
The package contains semantic conventions from the `v1.39.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.39.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.38.0`. (#7783, #7789)
### Changed
- Improve the concurrent performance of `HistogramReservoir` in `go.opentelemetry.io/otel/sdk/metric/exemplar` by 4x. (#7443)
- Improve the concurrent performance of `FixedSizeReservoir` in `go.opentelemetry.io/otel/sdk/metric/exemplar`. (#7447)
- Improve performance of concurrent histogram measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7474)
- Improve performance of concurrent synchronous gauge measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7478)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/stdout/stdoutmetric`. (#7492)
- `Exporter` in `go.opentelemetry.io/otel/exporters/prometheus` ignores metrics with the scope `go.opentelemetry.io/contrib/bridges/prometheus`.
This prevents scrape failures when the Prometheus exporter is misconfigured to get data from the Prometheus bridge. (#7688)
- Improve performance of concurrent exponential histogram measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7702)
- The `rpc.grpc.status_code` attribute in the experimental metrics emitted from `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc` is replaced with the `rpc.response.status_code` attribute to align with the semantic conventions. (#7854)
- The `rpc.grpc.status_code` attribute in the experimental metrics emitted from `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc` is replaced with the `rpc.response.status_code` attribute to align with the semantic conventions. (#7854)
### Fixed
- Fix bad log message when key-value pairs are dropped because of key duplication in `go.opentelemetry.io/otel/sdk/log`. (#7662)
- Fix `DroppedAttributes` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not count the non-attribute key-value pairs dropped because of key duplication. (#7662)
- Fix `SetAttributes` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not log that attributes are dropped when they are actually not dropped. (#7662)
- Fix missing `request.GetBody` in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` to correctly handle HTTP/2 `GOAWAY` frame. (#7794)
- Fix the `WithHostID` detector in `go.opentelemetry.io/otel/sdk/resource` to use the full path for the `ioreg` command on Darwin (macOS). (#7818)
### Deprecated
- Deprecate `go.opentelemetry.io/otel/exporters/zipkin`.
For more information, see the [OTel blog post deprecating the Zipkin exporter](https://opentelemetry.io/blog/2025/deprecating-zipkin-exporters/). (#7670)
## [1.39.0/0.61.0/0.15.0/0.0.14] 2025-12-05
### Added
- Greatly reduce the cost of recording metrics in `go.opentelemetry.io/otel/sdk/metric` using hashing for map keys. (#7175)
- Add `WithInstrumentationAttributeSet` option to `go.opentelemetry.io/otel/log`, `go.opentelemetry.io/otel/metric`, and `go.opentelemetry.io/otel/trace` packages.
This provides a concurrent-safe and performant alternative to `WithInstrumentationAttributes` by accepting a pre-constructed `attribute.Set`. (#7287)
- Add experimental observability for the Prometheus exporter in `go.opentelemetry.io/otel/exporters/prometheus`.
Check the `go.opentelemetry.io/otel/exporters/prometheus/internal/x` package documentation for more information. (#7345)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7353)
- Add temporality selector functions `DeltaTemporalitySelector`, `CumulativeTemporalitySelector`, `LowMemoryTemporalitySelector` to `go.opentelemetry.io/otel/sdk/metric`. (#7434)
- Add experimental observability metrics for simple log processor in `go.opentelemetry.io/otel/sdk/log`. (#7548)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#7459)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#7486)
- Add experimental observability metrics for simple span processor in `go.opentelemetry.io/otel/sdk/trace`. (#7374)
- Add experimental observability metrics in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7512)
- Add experimental observability metrics for manual reader in `go.opentelemetry.io/otel/sdk/metric`. (#7524)
- Add experimental observability metrics for periodic reader in `go.opentelemetry.io/otel/sdk/metric`. (#7571)
- Support `OTEL_EXPORTER_OTLP_LOGS_INSECURE` and `OTEL_EXPORTER_OTLP_INSECURE` environmental variables in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7608)
- Add `Enabled` method to the `Processor` interface in `go.opentelemetry.io/otel/sdk/log`.
All `Processor` implementations now include an `Enabled` method. (#7639)
- The `go.opentelemetry.io/otel/semconv/v1.38.0` package.
The package contains semantic conventions from the `v1.38.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.38.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.37.0`. (#7648)
### Changed
- `Distinct` in `go.opentelemetry.io/otel/attribute` is no longer guaranteed to uniquely identify an attribute set.
Collisions between `Distinct` values for different Sets are possible with extremely high cardinality (billions of series per instrument), but are highly unlikely. (#7175)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/trace` synchronously de-duplicates the passed attributes instead of delegating it to the returned `TracerOption`. (#7266)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/metric` synchronously de-duplicates the passed attributes instead of delegating it to the returned `MeterOption`. (#7266)
- `WithInstrumentationAttributes` in `go.opentelemetry.io/otel/log` synchronously de-duplicates the passed attributes instead of delegating it to the returned `LoggerOption`. (#7266)
- Rename the `OTEL_GO_X_SELF_OBSERVABILITY` environment variable to `OTEL_GO_X_OBSERVABILITY` in `go.opentelemetry.io/otel/sdk/trace`, `go.opentelemetry.io/otel/sdk/log`, and `go.opentelemetry.io/otel/exporters/stdout/stdouttrace`. (#7302)
- Improve performance of histogram `Record` in `go.opentelemetry.io/otel/sdk/metric` when min and max are disabled using `NoMinMax`. (#7306)
- Improve error handling for dropped data during translation by using `prometheus.NewInvalidMetric` in `go.opentelemetry.io/otel/exporters/prometheus`.
⚠️ **Breaking Change:** Previously, these cases were only logged and scrapes succeeded.
Now, when translation would drop data (e.g., invalid label/value), the exporter emits a `NewInvalidMetric`, and Prometheus scrapes **fail with HTTP 500** by default.
To preserve the prior behavior (scrapes succeed while errors are logged), configure your Prometheus HTTP handler with: `promhttp.HandlerOpts{ ErrorHandling: promhttp.ContinueOnError }`. (#7363)
- Replace fnv hash with xxhash in `go.opentelemetry.io/otel/attribute` for better performance. (#7371)
- The default `TranslationStrategy` in `go.opentelemetry.io/otel/exporters/prometheus` is changed from `otlptranslator.NoUTF8EscapingWithSuffixes` to `otlptranslator.UnderscoreEscapingWithSuffixes`. (#7421)
- Improve performance of concurrent measurements in `go.opentelemetry.io/otel/sdk/metric`. (#7427)
- Include W3C TraceFlags (bits 0-7) in the OTLP `Span.Flags` field in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp` and `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#7438)
- The `ErrorType` function in `go.opentelemetry.io/otel/semconv/v1.37.0` now handles custom error types.
If an error implements an `ErrorType() string` method, the return value of that method will be used as the error type. (#7442)
### Fixed
- Fix `WithInstrumentationAttributes` options in `go.opentelemetry.io/otel/trace`, `go.opentelemetry.io/otel/metric`, and `go.opentelemetry.io/otel/log` to properly merge attributes when passed multiple times instead of replacing them.
Attributes with duplicate keys will use the last value passed. (#7300)
- The equality of `attribute.Set` when using the `Equal` method is not affected by the user overriding the empty set pointed to by `attribute.EmptySet` in `go.opentelemetry.io/otel/attribute`. (#7357)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#7372)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#7372)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`. (#7372)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`. (#7372)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#7372)
- Return partial OTLP export errors to the caller in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#7372)
- Fix `AddAttributes`, `SetAttributes`, `SetBody` on `Record` in `go.opentelemetry.io/otel/sdk/log` to not mutate input. (#7403)
- Do not double record measurements of `RecordSet` methods in `go.opentelemetry.io/otel/semconv/v1.37.0`. (#7655)
- Do not double record measurements of `RecordSet` methods in `go.opentelemetry.io/otel/semconv/v1.36.0`. (#7656)
### Removed
- Drop support for [Go 1.23]. (#7274)
- Remove the `FilterProcessor` interface in `go.opentelemetry.io/otel/sdk/log`.
The `Enabled` method has been added to the `Processor` interface instead.
All `Processor` implementations must now implement the `Enabled` method.
Custom processors that do not filter records can implement `Enabled` to return `true`. (#7639)
## [1.38.0/0.60.0/0.14.0/0.0.13] 2025-08-29
This release is the last to support [Go 1.23].
The next release will require at least [Go 1.24].
### Added
- Add native histogram exemplar support in `go.opentelemetry.io/otel/exporters/prometheus`. (#6772)
- Add template attribute functions to the `go.opentelemetry.io/otel/semconv/v1.34.0` package. (#6939)
- `ContainerLabel`
- `DBOperationParameter`
- `DBSystemParameter`
- `HTTPRequestHeader`
- `HTTPResponseHeader`
- `K8SCronJobAnnotation`
- `K8SCronJobLabel`
- `K8SDaemonSetAnnotation`
- `K8SDaemonSetLabel`
- `K8SDeploymentAnnotation`
- `K8SDeploymentLabel`
- `K8SJobAnnotation`
- `K8SJobLabel`
- `K8SNamespaceAnnotation`
- `K8SNamespaceLabel`
- `K8SNodeAnnotation`
- `K8SNodeLabel`
- `K8SPodAnnotation`
- `K8SPodLabel`
- `K8SReplicaSetAnnotation`
- `K8SReplicaSetLabel`
- `K8SStatefulSetAnnotation`
- `K8SStatefulSetLabel`
- `ProcessEnvironmentVariable`
- `RPCConnectRPCRequestMetadata`
- `RPCConnectRPCResponseMetadata`
- `RPCGRPCRequestMetadata`
- `RPCGRPCResponseMetadata`
- Add `ErrorType` attribute helper function to the `go.opentelemetry.io/otel/semconv/v1.34.0` package. (#6962)
- Add `WithAllowKeyDuplication` in `go.opentelemetry.io/otel/sdk/log` which can be used to disable deduplication for log records. (#6968)
- Add `WithCardinalityLimit` option to configure the cardinality limit in `go.opentelemetry.io/otel/sdk/metric`. (#6996, #7065, #7081, #7164, #7165, #7179)
- Add `Clone` method to `Record` in `go.opentelemetry.io/otel/log` that returns a copy of the record with no shared state. (#7001)
- Add experimental self-observability span and batch span processor metrics in `go.opentelemetry.io/otel/sdk/trace`.
Check the `go.opentelemetry.io/otel/sdk/trace/internal/x` package documentation for more information. (#7027, #6393, #7209)
- The `go.opentelemetry.io/otel/semconv/v1.36.0` package.
The package contains semantic conventions from the `v1.36.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.36.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.34.0`. (#7032, #7041)
- Add support for configuring Prometheus name translation using `WithTranslationStrategy` option in `go.opentelemetry.io/otel/exporters/prometheus`. The current default translation strategy when UTF-8 mode is enabled is `NoUTF8EscapingWithSuffixes`, but a future release will change the default strategy to `UnderscoreEscapingWithSuffixes` for compliance with the specification. (#7111)
- Add experimental self-observability log metrics in `go.opentelemetry.io/otel/sdk/log`.
Check the `go.opentelemetry.io/otel/sdk/log/internal/x` package documentation for more information. (#7121)
- Add experimental self-observability trace exporter metrics in `go.opentelemetry.io/otel/exporters/stdout/stdouttrace`.
Check the `go.opentelemetry.io/otel/exporters/stdout/stdouttrace/internal/x` package documentation for more information. (#7133)
- Support testing of [Go 1.25]. (#7187)
- The `go.opentelemetry.io/otel/semconv/v1.37.0` package.
The package contains semantic conventions from the `v1.37.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.37.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.36.0`. (#7254)
### Changed
- Optimize `TraceIDFromHex` and `SpanIDFromHex` in `go.opentelemetry.io/otel/sdk/trace`. (#6791)
- Change `AssertEqual` in `go.opentelemetry.io/otel/log/logtest` to accept `TestingT` in order to support benchmarks and fuzz tests. (#6908)
- Change `DefaultExemplarReservoirProviderSelector` in `go.opentelemetry.io/otel/sdk/metric` to use `runtime.GOMAXPROCS(0)` instead of `runtime.NumCPU()` for the `FixedSizeReservoirProvider` default size. (#7094)
### Fixed
- `SetBody` method of `Record` in `go.opentelemetry.io/otel/sdk/log` now deduplicates key-value collections (`log.Value` of `log.KindMap` from `go.opentelemetry.io/otel/log`). (#7002)
- Fix `go.opentelemetry.io/otel/exporters/prometheus` to not append a suffix if it's already present in metric name. (#7088)
- Fix the `go.opentelemetry.io/otel/exporters/stdout/stdouttrace` self-observability component type and name. (#7195)
- Fix partial export count metric in `go.opentelemetry.io/otel/exporters/stdout/stdouttrace`. (#7199)
### Deprecated
- Deprecate `WithoutUnits` and `WithoutCounterSuffixes` options, preferring `WithTranslationStrategy` instead. (#7111)
- Deprecate support for `OTEL_GO_X_CARDINALITY_LIMIT` environment variable in `go.opentelemetry.io/otel/sdk/metric`. Use `WithCardinalityLimit` option instead. (#7166)
## [0.59.1] 2025-07-21
### Changed
- Retract `v0.59.0` release of `go.opentelemetry.io/otel/exporters/prometheus` module which appends incorrect unit suffixes. (#7046)
- Change `go.opentelemetry.io/otel/exporters/prometheus` to no longer deduplicate suffixes when UTF8 is enabled.
It is recommended to disable unit and counter suffixes in the exporter, and manually add suffixes if you rely on the existing behavior. (#7044)
### Fixed
- Fix `go.opentelemetry.io/otel/exporters/prometheus` to properly handle unit suffixes when the unit is in brackets.
E.g. `{spans}`. (#7044)
## [1.37.0/0.59.0/0.13.0] 2025-06-25
### Added
- The `go.opentelemetry.io/otel/semconv/v1.33.0` package.
The package contains semantic conventions from the `v1.33.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.33.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.32.0`. (#6799)
- The `go.opentelemetry.io/otel/semconv/v1.34.0` package.
The package contains semantic conventions from the `v1.34.0` version of the OpenTelemetry Semantic Conventions. (#6812)
- Add metric's schema URL as `otel_scope_schema_url` label in `go.opentelemetry.io/otel/exporters/prometheus`. (#5947)
- Add metric's scope attributes as `otel_scope_[attribute]` labels in `go.opentelemetry.io/otel/exporters/prometheus`. (#5947)
- Add `EventName` to `EnabledParameters` in `go.opentelemetry.io/otel/log`. (#6825)
- Add `EventName` to `EnabledParameters` in `go.opentelemetry.io/otel/sdk/log`. (#6825)
- Changed handling of `go.opentelemetry.io/otel/exporters/prometheus` metric renaming to add unit suffixes when it doesn't match one of the pre-defined values in the unit suffix map. (#6839)
### Changed
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/bridge/opentracing`. (#6827)
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/exporters/zipkin`. (#6829)
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/metric`. (#6832)
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/sdk/resource`. (#6834)
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/sdk/trace`. (#6835)
- The semantic conventions have been upgraded from `v1.26.0` to `v1.34.0` in `go.opentelemetry.io/otel/trace`. (#6836)
- `Record.Resource` now returns `*resource.Resource` instead of `resource.Resource` in `go.opentelemetry.io/otel/sdk/log`. (#6864)
- Retry now shows error cause for context timeout in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`, `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`, `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`, `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`, `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`, `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6898)
### Fixed
- Stop stripping trailing slashes from configured endpoint URL in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc`. (#6710)
- Stop stripping trailing slashes from configured endpoint URL in `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#6710)
- Stop stripping trailing slashes from configured endpoint URL in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc`. (#6710)
- Stop stripping trailing slashes from configured endpoint URL in `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`. (#6710)
- Validate exponential histogram scale range for Prometheus compatibility in `go.opentelemetry.io/otel/exporters/prometheus`. (#6822)
- Context cancellation during metric pipeline produce does not corrupt data in `go.opentelemetry.io/otel/sdk/metric`. (#6914)
### Removed
- `go.opentelemetry.io/otel/exporters/prometheus` no longer exports `otel_scope_info` metric. (#6770)
## [0.12.2] 2025-05-22
### Fixed
- Retract `v0.12.0` release of `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc` module that contains invalid dependencies. (#6804)
- Retract `v0.12.0` release of `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp` module that contains invalid dependencies. (#6804)
- Retract `v0.12.0` release of `go.opentelemetry.io/otel/exporters/stdout/stdoutlog` module that contains invalid dependencies. (#6804)
## [0.12.1] 2025-05-21
### Fixed
- Use the proper dependency version of `go.opentelemetry.io/otel/sdk/log/logtest` in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc`. (#6800)
- Use the proper dependency version of `go.opentelemetry.io/otel/sdk/log/logtest` in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6800)
- Use the proper dependency version of `go.opentelemetry.io/otel/sdk/log/logtest` in `go.opentelemetry.io/otel/exporters/stdout/stdoutlog`. (#6800)
## [1.36.0/0.58.0/0.12.0] 2025-05-20
### Added
- Add exponential histogram support in `go.opentelemetry.io/otel/exporters/prometheus`. (#6421)
- The `go.opentelemetry.io/otel/semconv/v1.31.0` package.
The package contains semantic conventions from the `v1.31.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.31.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.30.0`. (#6479)
- Add `Recording`, `Scope`, and `Record` types in `go.opentelemetry.io/otel/log/logtest`. (#6507)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp`. (#6751)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp`. (#6752)
- Add `WithHTTPClient` option to configure the `http.Client` used by `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6688)
- Add `ValuesGetter` in `go.opentelemetry.io/otel/propagation`, a `TextMapCarrier` that supports retrieving multiple values for a single key. (#5973)
- Add `Values` method to `HeaderCarrier` to implement the new `ValuesGetter` interface in `go.opentelemetry.io/otel/propagation`. (#5973)
- Update `Baggage` in `go.opentelemetry.io/otel/propagation` to retrieve multiple values for a key when the carrier implements `ValuesGetter`. (#5973)
- Add `AssertEqual` function in `go.opentelemetry.io/otel/log/logtest`. (#6662)
- The `go.opentelemetry.io/otel/semconv/v1.32.0` package.
The package contains semantic conventions from the `v1.32.0` version of the OpenTelemetry Semantic Conventions.
See the [migration documentation](./semconv/v1.32.0/MIGRATION.md) for information on how to upgrade from `go.opentelemetry.io/otel/semconv/v1.31.0`. (#6782)
- Add `Transform` option in `go.opentelemetry.io/otel/log/logtest`. (#6794)
- Add `Desc` option in `go.opentelemetry.io/otel/log/logtest`. (#6796)
### Removed
- Drop support for [Go 1.22]. (#6381, #6418)
- Remove `Resource` field from `EnabledParameters` in `go.opentelemetry.io/otel/sdk/log`. (#6494)
- Remove `RecordFactory` type from `go.opentelemetry.io/otel/log/logtest`. (#6492)
- Remove `ScopeRecords`, `EmittedRecord`, and `RecordFactory` types from `go.opentelemetry.io/otel/log/logtest`. (#6507)
- Remove `AssertRecordEqual` function in `go.opentelemetry.io/otel/log/logtest`, use `AssertEqual` instead. (#6662)
### Changed
- ⚠️ Update `github.com/prometheus/client_golang` to `v1.21.1`, which changes the `NameValidationScheme` to `UTF8Validation`.
This allows metric names to keep their original delimiters (e.g. `.`), rather than replacing them with underscores.
This can be reverted by setting `github.com/prometheus/common/model.NameValidationScheme` to `LegacyValidation` in `github.com/prometheus/common/model`. (#6433)
- Initialize map with `len(keys)` in `NewAllowKeysFilter` and `NewDenyKeysFilter` to avoid unnecessary allocations in `go.opentelemetry.io/otel/attribute`. (#6455)
- `go.opentelemetry.io/otel/log/logtest` is now a separate Go module. (#6465)
- `go.opentelemetry.io/otel/sdk/log/logtest` is now a separate Go module. (#6466)
- `Recorder` in `go.opentelemetry.io/otel/log/logtest` no longer separately stores records emitted by loggers with the same instrumentation scope. (#6507)
- Improve performance of `BatchProcessor` in `go.opentelemetry.io/otel/sdk/log` by not exporting when exporter cannot accept more. (#6569, #6641)
### Deprecated
- Deprecate support for `model.LegacyValidation` for `go.opentelemetry.io/otel/exporters/prometheus`. (#6449)
### Fixed
- Stop percent encoding header environment variables in `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc` and `go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp`. (#6392)
- Ensure the `noopSpan.tracerProvider` method is not inlined in `go.opentelemetry.io/otel/trace` so the `go.opentelemetry.io/auto` instrumentation can instrument non-recording spans. (#6456)
- Use a `sync.Pool` instead of allocating `metricdata.ResourceMetrics` in `go.opentelemetry.io/otel/exporters/prometheus`. (#6472)
## [1.35.0/0.57.0/0.11.0] 2025-03-05
This release is the last to support [Go 1.22].
@@ -3237,7 +3535,15 @@ It contains api and sdk for trace and meter.
- CircleCI build CI manifest files.
- CODEOWNERS file to track owners of this project.
[Unreleased]: https://github.com/open-telemetry/opentelemetry-go/compare/v1.35.0...HEAD
[Unreleased]: https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...HEAD
[1.40.0/0.62.0/0.16.0]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.40.0
[1.39.0/0.61.0/0.15.0/0.0.14]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.39.0
[1.38.0/0.60.0/0.14.0/0.0.13]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.38.0
[0.59.1]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/exporters/prometheus/v0.59.1
[1.37.0/0.59.0/0.13.0]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.37.0
[0.12.2]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/log/v0.12.2
[0.12.1]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/log/v0.12.1
[1.36.0/0.58.0/0.12.0]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.36.0
[1.35.0/0.57.0/0.11.0]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.35.0
[1.34.0/0.56.0/0.10.0]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.34.0
[1.33.0/0.55.0/0.9.0/0.0.12]: https://github.com/open-telemetry/opentelemetry-go/releases/tag/v1.33.0
@@ -3329,6 +3635,7 @@ It contains api and sdk for trace and meter.
<!-- Released section ended -->
[Go 1.25]: https://go.dev/doc/go1.25
[Go 1.24]: https://go.dev/doc/go1.24
[Go 1.23]: https://go.dev/doc/go1.23
[Go 1.22]: https://go.dev/doc/go1.22


@@ -12,6 +12,6 @@
# https://help.github.com/en/articles/about-code-owners
#
* @MrAlias @XSAM @dashpole @pellared @dmathieu
* @MrAlias @XSAM @dashpole @pellared @dmathieu @flc1125
CODEOWNERS @MrAlias @pellared @dashpole @XSAM @dmathieu

Some files were not shown because too many files have changed in this diff.