Compare commits

...

265 Commits

Author SHA1 Message Date
stevenhorsman
30c28d912b tests: Delete install_go.sh
Having a script to install go is legacy from Jenkins, so
delete it, so there is less code in our repo.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 16:36:59 +00:00
stevenhorsman
df49ae4272 workflow: Use setup-go to install go
Rather than having our own script, just use the github action
to install go when needed.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 16:36:09 +00:00
Steve Horsman
1048132eb1 Merge pull request #12564 from stevenhorsman/remove-unused-dependencies
Try and remove unused crates
2026-02-26 13:53:44 +00:00
Aurélien Bombo
2a13f33d50 Merge pull request #12565 from microsoft/danmihai1/clh-51.1
versions: update cloud hypervisor to v51.1
2026-02-26 07:52:57 -06:00
stevenhorsman
82c27181d8 kata-deploy: Remove unused crates
cargo machete has identified `serde` and `thiserror` as being unused,
so remove them from Cargo.toml

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
bdbfe9915b kata-ctl: Remove unused crates
cargo machete has identified the follow crates as unused:
- containerd-shim-protos
- safe-path
- strum
- ttrpc

strum is neded (and maybe isn't picked up due to it being
used by macros?), so add it to the ignore list and remove
the rest

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
b4365bdcaa genpolicy: Remove unused crates
`cargo machete` has identified `openssl` and `serde-transcode`
as being un-used. openssl is required, so add it to the ignore
list and just remove serde-transcode

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
382c6d2a2f agent-ctl: Remove unused crates
`log` and `rustjail` are flagged by cargo machete as unused,
so lets remove them to reduce the footprint of crates in this tool

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:38:35 +00:00
stevenhorsman
e43a17c2ba runtime-rs: Remove unused crates
- Remove unused crates to reduce our size and the work needed
to do updates
- Also update package.metadata.cargo-machete with some crates
that are incorrectly coming up as unused

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:46 +00:00
stevenhorsman
8177a440ca libs: Remove unused crates
Remove unused crates to reduce our size and the work needed
to do updates

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:46 +00:00
stevenhorsman
ed7ef68510 dragonball: Remove unused crates
Remove the crates that cargo machete has assessed as being unused

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:15 +00:00
stevenhorsman
c1b8c6bce6 dragonball: Update cargo-machete config
cargo machete can't understand `host-device = ["dep:vfio-bindings"`,
so tell it to ignore `vfio-bindings` and not suggest it's unused

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:14 +00:00
stevenhorsman
1139a9bb8a trace-forwarder: Try and remove unused crates
I ran cargo machete in trace-forwarder and it suggested that some
of the packages were not used, including a chain with a vulnerability,
so try and remove them to resolve RUSTSEC-2021-0139

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-26 09:37:14 +00:00
Steve Horsman
675c0c3450 Merge pull request #12553 from kata-containers/dependabot/cargo/src/tools/agent-ctl/keccak-0.1.6
build(deps): bump keccak from 0.1.5 to 0.1.6 in /src/tools/agent-ctl
2026-02-26 08:53:57 +00:00
Steve Horsman
9a921bb396 Merge pull request #12575 from kata-containers/build-checks-go-install-setup-go
workflows: Swap our go install for setup-go
2026-02-26 08:51:56 +00:00
Steve Horsman
da0ca483b0 Merge pull request #12572 from fitzthum/bump-trustee
versions: bump Trustee to latest version
2026-02-26 08:48:37 +00:00
Alex Lyn
57b0148356 Merge pull request #12400 from Apokleos/enhance-snp-rs
runtime-rs: Enhance Qemu/SNP Protection
2026-02-26 15:29:33 +08:00
Dan Mihai
2361dc7ca0 tests: k8s: reinstate testing on mariner hosts
Reinstate mariner host testing - including the Agent Policy tests on
these hosts - now that a new CLH version brought in the required fixes.

This reverts commit ea53779b90.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Dan Mihai
7973e4e2a8 runtime: clh: disable nested vCPUs on MSHV
The recently-added nested property is true by default, but is not
supported yet on MSHV.

See cloud-hypervisor/cloud-hypervisor#7408 for additional information.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Dan Mihai
24ac2ccb5c runtime-rs: clh: specify raw image format
Specify raw image format for all guest block devices.

- Attempting to auto-detect the image format from CLH would be riskier
  for the Host.

- Creating a new raw image file, auto-detecting its format, and then
  creating a filesystem from the Guest onto the block device is no
  longer supported by CLH v51. Therefore, Kata CI's k8s-block-volume.bats
  would fail without specifying the raw format when hot plugging its block
  device.

- See cloud-hypervisor/cloud-hypervisor@b3e8e2a for additional information.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Dan Mihai
dc398e801c runtime: clh: specify raw image format
Specify raw image format for all guest block devices.

- Attempting to auto-detect the image format from CLH would be riskier
  for the Host.

- Creating a new raw image file, auto-detecting its format, and then
  creating a filesystem from the Guest onto the block device is no
  longer supported by CLH v51. Therefore, Kata CI's k8s-block-volume.bats
  would fail without specifying the raw format when hot plugging its block
  device.

- See cloud-hypervisor/cloud-hypervisor@b3e8e2a for additional information.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Dan Mihai
0629354ca0 versions: update cloud hypervisor to v51.1
```
v51.1
=====

This is a bug fix release. The following issues have been addressed:

* Fix image_type in OpenAPI definition (#7734)

v51.0
=====

This release has been tracked in v51.0 group of our roadmap project.

Security Fixes

This release fixes a security vulnerability in disk image handling.
Details can be found in GHSA-jmr4-g2hv-mjj6.

* A new `backing_files=on|off` option has been added to `--disk` to
  explicitly control whether QCOW2 backing files are permitted. This
  defaults to `off` to prevent the loading of backing files entirely.
  (#7685)
* Explicit image type specification via the user interface, removing
  reliance on format autodetection (#7728).
* Prevent sector-zero writes for autodetected raw images (#7728).

Significant QCOW2 v3 Improvements

A large number of QCOW2 v3 specification features have been implemented:

* RAW backing file support for QCOW2 overlays (#7570)
* Zero bit in L2 entries (#7627)
* Incompatible feature bit validation (#7612)
* Dirty bit support (#7636)
* Variable refcount widths (1 to 64-bit) (#7633)
* Corrupt bit detection and marking (#7639)
* Autoclear feature bits handling (#7648)
* Thread safety fix for multiple virtio queues (`num_queues > 1`)
  (#7661)
* Correct zero-fill for reads beyond backing file size (#7678)
* Live disk resize support (#7687)

ACPI Generic Initiator Support

ACPI Generic Initiator Affinity (SRAT Type 5) support has been added
to associate VFIO-PCI devices with dedicated memory/CPU-less NUMA
nodes. This enables the guest OS to make NUMA-aware memory allocation
decisions for device workloads. A new `device_id` parameter has been
added to `--numa` for specifying VFIO devices. (#7626)

Block Device DISCARD and WRITE_ZEROES Support

The `virtio-blk` device now supports `DISCARD` and `WRITE_ZEROES`
operations for QCOW2 and RAW image formats. This enables thin
provisioning and efficient space reclamation when guests trim
filesystems. A new `sparse=on|off` option has been added to `--disk` to
control disk space management: `sparse=on` (default) enables thin
provisioning with space reclamation, while `sparse=off` provides thick
provisioning with consistent I/O latency. (#7666)

Notable Performance Improvements

* Transparent Huge Pages (THP) support has been extended to cover
  anonymous shared memory (`shared=on`) via `madvise`. Previously, THP
  was only used for non-shared memory. (#7646)
* The `vhost-user-net` device now uses the default set of vhost-user
  virtio features, including `VIRTIO_F_RING_INDIRECT_DESC`, which
  provides a performance improvement. (#7653)

MSHV Support Improvements

* Optimize CPU state update after emulation by only updating special
  registers when changed (#7603)
* Enable SMT for guests with `threads_per_core > 1` (#7668)
* Stub `save_data_tables()` to unblock VM pause/resume (#7692)
* Handle `GHCB_INFO_SPECIAL_DBGPRINT` VMG exit in SEV-SNP guest exit
  handler (#7703)
* Fix CVM boot failure on MSHV (#7548)
* Fix CPU topology detection for multithreaded configurations (#7576)

Notable Bug Fixes

* Fix VFIO device hot-remove leaving group and container file
  descriptors open, preventing re-add (#7676)
* Fix snapshot restore when backing file is on read-only storage with
  `shared=false` (#7674)
* Enforce `VIRTIO_BLK_F_RO` even if guest does not negotiate it
  (#7705)
* Fix read-only block device FLUSH requests from OVMF preventing VMs
  from booting (#7706)
* Fix vhost-user device not properly dropping unowned file descriptors
  (#7679)
* Fix `vhost-user-block` `get_config` interoperability (#7617)
* Fix vsock TOCTOU race condition by copying packet header from guest
  memory before processing (#7530)
* Fix vsock handling of large TX packets spanning multiple data
  descriptors (#7680)
* Add `gettid()` to all seccomp filters (#7596)
* Fix MAC address parsing that wrongly allowed `+` instead of hex
  characters (#7579)
* Improve UUID parse error message and `--net` fd help text (#7702)
* Fix various inconsistencies in our OpenAPI specification file
  (#7716, #7726)
* Various documentation fixes (#7602, #7606)
```

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-25 21:01:25 +00:00
Tobin Feldman-Fitzthum
b4b5db2f1c tests: fixup SNP attestation test for new Trustee version
Trustee now returns the binary SNP TCB claims as hex rather than base64
(for consistency with other platforms). Fortunately, the sev-snp-measure
tool has a flag for setting the output type of the launch digest.

I think hex is the default, but let's keep the flag here to be explicit.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-02-25 09:57:36 -08:00
Steve Horsman
a655605e8f Merge pull request #12566 from manuelh-dev/mahuber/fail-exp-timeout
tests: Extend fail timeout for failure test
2026-02-25 16:11:53 +00:00
stevenhorsman
856ba08c71 workflows: Swap our go install for setup-go
Unfortunately, due to golang/go#75031, there is an issue
that results in `go: no such tool "covdata"`
with a automatically installed 1.25 toolchain, so
the approach to skip the install_go.sh script (which causes
double install problems) didn't work. Try the alternative approach
of using setup-go action, which should do a more comprehensive job

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-25 13:46:40 +00:00
Alex Lyn
2fb6376a99 dragonball: Reduce warnings in dragonball when using 1.91 rust tools
Some warnings come up when we use bumped rust-1.91, this commit aims to
eliminate warnings.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
dc87c1f732 runtime-rs: Add support for configurable Qemu/SEV-SNP guest policy
This commit enables the SEV-SNP guest policy to be explicitly
configured via the runtime configuration in runtime-rs.

To provide both ease of use and maximum flexibility, the following
logic is implemented:
1. If the user provides a custom `snp_guest_policy` in the
configuration, this value is passed directly to the QEMU SEV-SNP
guest object.
2. If the user does not specify a policy, the driver defaults to
`0x30000`, matching QEMU's standard default for SEV-SNP guests.

This enhancement allows users to fine-tune security constraints through
the policy bitmask, while ensuring a sensible and functional default
for standard SNP deployments.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
9fc708ec4f kata-types: Add SNP launch configuration fields to SecurityInfo
This commit introduces three new fields to the `SecurityInfo` struct
to support SEV-SNP (Secure Nested Paging) attestation and measurement
capabilities:

(1) `snp_id_block`: A 96-byte Base64-encoded ID block for the
  SNP_LAUNCH_FINISH command.
(2) `snp_id_auth`: A 4096-byte Base64-encoded authentication structure
  accompanying the ID block.
(3) `snp_guest_policy`: A bitmask for the SNP guest policy, passed to
  the SNP_LAUNCH_START command.

These fields enable users to provide identity information to the SNP
firmware, allowing for remote attestation and verified guest launches.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
f9ffc95c3c runtime-rs: Introduce a SNP policy field in ObjectSevSnpGuest
A bitmask for the SNP guest policy is introduced in ObjectSevSnpGuest
to help pass to Qemu cmdline.

And defaults to 0x30000 (QEMU's default) to maintain standard behavior
it just looks like as: "policy=0x30000"

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
21e0df4c06 runtime-rs: Add kernel irqchip with split for SNP
Add more param with split when qemu launches for SNP.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
ebe87d0e6f runtime-rs: Disable memory hotplug setting within SEV-SNP
For SEV-SNP, memory overcommit is not supported. we only set the memory
size.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
830667c041 runtime-rs: Add two methods for Qemu Memory to control memory set
Introduce two methods to help set max memory and num_slots.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:35 +08:00
Alex Lyn
d298df7014 kata-types: Add cross-platform host_memory_mib() helper for host memory
Introduce host_memory_mib() with OS-specific implementations
(Linux/Android via nix::sysinfo,
macOS via sysctl) selected at compile time. This improves
portability and allows consistent host memory sizing/validation
across different platforms.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 21:04:26 +08:00
Zvonko Kaiser
7294719e1c Merge pull request #12559 from fidencio/topic/kata-deploy-fix-custom-runtime-no-snapshotter
kata-deploy: a few guard-rails to avoid failures if components are not set in the values.yaml file
2026-02-25 08:03:28 -05:00
dependabot[bot]
528a944b2a build(deps): bump keccak from 0.1.5 to 0.1.6 in /src/tools/agent-ctl
Bumps [keccak](https://github.com/RustCrypto/sponges) from 0.1.5 to 0.1.6.
- [Commits](https://github.com/RustCrypto/sponges/compare/keccak-v0.1.5...keccak-v0.1.6)

---
updated-dependencies:
- dependency-name: keccak
  dependency-version: 0.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-25 13:02:31 +00:00
Alex Lyn
b3d60698af runtime-rs: move host memory adjustment into MemoryInfo using nix sysinfo
As the memory related information has been serialized at the sandbox
initalization specially at the moment of parsing configuration toml.

This commit aims to refactor MemoryInfo initialization logics:

(1) Remove memory sizing/host-memory adjustment logic from QEMU cmdline
  Memory::new()
(2) Initialize/adjust memory values via kata-types MemoryInfo (single
  source of truth)
(3) Replace sysinfo::System::new_with_specifics with
  nix::sys::sysinfo::sysinfo() to get host RAM

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-25 19:32:44 +08:00
Steve Horsman
7ffb7719b5 Merge pull request #12562 from kata-containers/prep-for-go-1.25-switch
Prep for go 1.25 switch
2026-02-25 11:13:30 +00:00
dependabot[bot]
7cc2e9710b build(deps): bump github.com/BurntSushi/toml in /src/runtime
Bumps [github.com/BurntSushi/toml](https://github.com/BurntSushi/toml) from 1.3.2 to 1.5.0.
- [Release notes](https://github.com/BurntSushi/toml/releases)
- [Commits](https://github.com/BurntSushi/toml/compare/v1.3.2...v1.5.0)

---
updated-dependencies:
- dependency-name: github.com/BurntSushi/toml
  dependency-version: 1.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-25 10:00:36 +01:00
Tobin Feldman-Fitzthum
88568dd6e0 versions: bump Trustee to latest version
Update Trustee to pickup a few recent features, such as improvements to
TDX attestation configuration, and fixes to our vault/OpenBao backend.

This will also pickup our bump of Trustee to Rust 1.90.0.

We should be able to use this version of Trustee with the current
version of guest-components, which cannot be bumped at the moment due to
development dependencies.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-02-24 13:54:44 -08:00
Hyounggyu Choi
78d19a4402 Merge pull request #12569 from BbolroC/fix-assertion-guest-pull-runtime-rs
tests: Improve assertion handling for runtime-rs hypervisor
2026-02-24 16:34:40 +01:00
stevenhorsman
ef1b0b2913 runtime: Fix mismatch in receiver names
Fix: `ST1016: methods on the same type should have the same receiver name`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
1b2ca678e5 runtime: Fix identifier names
Fix identifiers that are non compliant with go's conventions
e.g. not capitalising initialisations

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
69fea195f9 runtime: Fix arm unit test
I think that c727332b0e
broke the arm unit test by removing the arm specific overrides,
so update the expected output

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
b187983f84 workflows: build_checks: skip the go install if installed
Some of our static checks are hitting issues with duplicate
go versions installed. Given that we in go.mod we set the
version to match our required toolchain, if go is already installed
we can let go handle the toolchain version management instead
of installing a second version

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
8f7a2b3d5d runtime: Add copyright & licenses
Add missing headers

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
9b307a5fa6 metrics: Uncapitalise error strings
Fix `T1005: error strings should not be capitalized (staticcheck)`
This is to comply with go conventitions as errors are normally appended,
so there would be a spurious captialisation in the middle of the message

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
6eb67327d0 tests: Use ReplaceAll over Replace
strings.ReplaceAll was introduced in Go 1.12 as a more readable and self-documenting way to say "replace everything".

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
8fc6280f5e log-parser: Use time.IsZero() to check
Using time.IsZero() to check for uninitialised times is clearer

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
c1117bc831 log-parser: Use ReplaceAll over Replace
strings.ReplaceAll was introduced in Go 1.12 as a more readable and self-documenting way to say "replace everything".

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
8311dffce3 log-parser: Apply De Morgan's law
QF1001: Distributing negation across terms and flipping operators, makes it
easy for humans to process expressions at a time, vs evaluating a whole block
and then flipping it and can allow for earlier exit

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
f24765562d csi-kata-directvolume: Fix error messages
Error messages get appended and prepended, so it's against convention
to end them with punctuation

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
f84b462b95 runtime: Fix typo in comment
Fix `requiered` is a misspelling of `required` (misspell)

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
15813564f7 runtime: Avoid using fmt.Sprintf("%s", x)
It's more efficient and concise to just call .String()

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
a577685a8a runtime: Apply De Morgan's law
QF1001: Distributing negation across terms and flipping operators, makes it
easy for humans to process expressions at a time, vs evaluating a whole block
and then flipping it and can allow for earlier exit

Signed-off-by: stevenhorsman <steven@uk.ibm.com>

fixup: demorgans
2026-02-24 14:33:04 +00:00
stevenhorsman
e86338c9c0 runtime: Remove explicit types in variable declarations
QF1011 - use the short declaration as the type can be inferred

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
f60ee411f0 runtime: Update poorly chosen Duration names
ST1011 - having time.Duration values with variable names of MS/Secs
is misleading

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
6562ec5b61 runtime: Merge conditional assignment
Fix `QF1007: could merge conditional assignment into variable declaration`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
a0ccb63f47 runtime: Use ReplaceAll over Replace
strings.ReplaceAll was introduced in Go 1.12 as a more readable and self-documenting way to say "replace everything".

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
a78d212dfc kata-monitor: Switch to switch statements
Resolve: `QF1003: could use tagged switch`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
6f438bfb19 runtime: Improve receiver name
Update from `this` to fix:
```
ST1006: receiver name should be a reflection of its identity; don't use generic names such as "this" or "self" (staticcheck)
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
f1960103d1 runtime: Improve split statement
strings.SplitN(s, sep, -1) is functionally identical to strings.Split(s, sep)
as -1 says to return all substrings, so choose the more concise version

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
8cd3aa8c84 runtime: Remove embedded field from selector
GenericDevice is an embedded (anonymous) field in the device struct, so its fields
and methods are "promoted" to the outer struct, so we go straight to it.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
4351a61f67 runtime: Fix error string formatting
Resolve `ST1005: error strings should not end with punctuation or newlines (staticcheck)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
312567a137 runtime: Fix double imports
Remove one of the double imports to tidy up the code

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
93c77a7d4e runtime: Improve print statement
fix `QF1012: Use fmt.Fprintf(...) instead of Write([]byte(fmt.Sprintf(...))) (staticcheck)`

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:33:04 +00:00
stevenhorsman
cff8994336 runtime: Switch to switch statements
Resolve: `QF1003: could use tagged switch on major (staticcheck)`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:22:10 +00:00
stevenhorsman
487f530d89 ci: Update golangci configuration
Add a setting to skip the
`T1005: error strings should not be capitalized (staticcheck)`
rule to avoid impact to our error strings

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 14:22:09 +00:00
Hyounggyu Choi
3d71be3dd3 tests: Improve assertion handling for runtime-rs hypervisor
Since runtime-rs added support for virtio-blk-ccw on s390x in #12531,
the assertion in k8s-guest-pull-image.bats should be generalized
to apply to all hypervisors ending with `-runtime-rs`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-24 11:48:07 +01:00
stevenhorsman
5ca4c34a34 kata-monitor: Fix golangci-lint warning
QF1012: Use fmt.Fprintf(...) instead of Write([]byte(fmt.Sprintf(...))) (staticcheck)
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 10:02:48 +00:00
stevenhorsman
2ac89f4569 versions: Update golangci-lint
Bump to the latest version to pick up support for Go 1.25

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-24 10:02:48 +00:00
Manuel Huber
566bb306f1 tests: enable policy for openvpn on nydus
Specify runAsUser, runAsGroup, supplementalGroups values embedded
in the image's /etc/group file explicitly in the security context.
With this, both genpolicy and containerd, which in case of using
nydus guest-pull, lack image introspection capabilities, use the
same values for user/group/additionalG IDs at policy generation
time and at runtime when the OCI spec is passed.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-24 08:08:15 +01:00
Fupan Li
0bfb6b3c45 Merge pull request #12531 from BbolroC/blkdev-hotplug-s390x-runtime-rs
runtime-rs: Support for block device hotplug on s390x
2026-02-24 13:03:59 +08:00
Fabiano Fidêncio
a0d954cf7c tests: Enable auto-generated policies for experimental_force_guest_pull
We want to run with auto-generated policies when using experimental_force_guest_pull.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-23 22:15:18 +01:00
Manuel Huber
e15c18f05c tests: Extend fail timeout for failure test
Extend the timeout for the assert_pod_fail function call for the
test case "Test we cannot pull a large image that pull time exceeds
createcontainer timeout inside the guest" when the experimental
force guest-pull method is being used. In this method, the image is
first pulled on the host before creating the pod sandbox. While
image pull times can suddenly spike, we already time out in the
assert_pod_fail function before the image is even pulled on the
host.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-23 12:56:23 -08:00
Hyounggyu Choi
4e533f82e7 tests: Remove skip condition for runtime-rs on s390x in k8s-block-volume
This commit removes the skip condition for qemu-runtime-rs on s390x
in k8s-block-volume.bats.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-23 09:00:29 +01:00
Hyounggyu Choi
2961914f54 runtime-rs: Support for virtio-blk-ccw devices and hotplug
- Introduced `ccw_addr` field in `BlockConfig` for CCW device addresses
- Updated `CcwSubChannel` to handle CCW addresses and channel itself
- Enhanced `QemuInner` to handle CCW subchannel for hotplug operations
- Handled `virtio-blk-ccw` devices in hotplug_block_device()
- Modified resource management to accommodate `ccw_addr`

Fixes: #10373

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-23 09:00:29 +01:00
Hyounggyu Choi
e893526fad runtime-rs: Reuse constants from kata-types
Some constants are duplicated in runtime-rs even though they
are already defined in kata-types. Use the definitions from
kata-types as the single source of truth to avoid inconsistencies
between components (e.g. agent and runtime).

This change makes runtime-rs use the constants defined in
kata-types.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-23 09:00:29 +01:00
Hyounggyu Choi
606d193f65 runtime-rs: Set DRIVER_BLK_CCW_TYPE correctly
`DRIVER_BLK_CCW_TYPE` is defined as `blk-ccw`
in src/libs/kata-types/src/device.rs, so set
the variable in runtime-rs accordingly.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-23 09:00:29 +01:00
Fabiano Fidêncio
b082cf1708 kata-deploy: validate defaultShim is enabled before propagating it
getDefaultShimForArch previously returned whatever string was set in
defaultShim.<arch> without any validation. A typo, a non-existent shim,
or a shim that is disabled via disableAll would all silently produce a
bogus DEFAULT_SHIM_* env var, causing kata-deploy to fail at runtime.

Guard the return value by checking whether the configured shim is
present in the list of shims that are both enabled and support the
requested architecture. If not, return empty string so the env var is
simply omitted.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 14:01:11 +01:00
Fabiano Fidêncio
4ff7f67278 kata-deploy: fix nil pointer when custom runtime omits containerd/crio
Using `$runtime.containerd.snapshotter` and `$runtime.crio.pullType`
panics with a nil pointer error when the containerd or crio block is
absent from the custom runtime definition.

Let's use the `dig` function which safely traverses nested keys and
returns an empty string as the default when any key in the path is
missing.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 13:59:41 +01:00
Fabiano Fidêncio
96c20f8baa tests: k8s: set CreateContainerRequest (on free runners) timeout to 600s
Set KubeletConfiguration runtimeRequestTimeout to 600s mainly for CoCo
(Confidential Containers) tests, so container creation (attestation,
policy, image pull, VM start) does not hit the default CRI timeout.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
9634dfa859 gatekeeper: Update tests name
We need to do so after moving some of the tests to the free runners.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
a6b7a2d8a4 tests: assert_pod_fail accept RunContainerError and StartError
Treat waiting.reason RunContainerError and terminated.reason StartError/Error
as container failure, so tests that expect guest image-pull failure (e.g.
wrong credentials) pass when the container fails with those states instead
of only BackOff.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
42d980815a tests: skip k8s-policy-pvc on non-AKS
Otherwise it'll fail as we cannot bind the device.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
1523c48a2b tests: k8s: Align coco / erofs job declaration
Later on we may even think about merging those, but for now let's at
least make sure the envs used are the same / declared in a similar place
for each job.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
1b9b53248e tests: k8s: coco: rely more on free runners
Run all CoCo non-TEE variants in a single job on the free runner with an
explicit environment matrix (vmm, snapshotter, pull_type, kbs,
containerd_version).

Here we're testing CoCo only with the "active" version of containerd.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
1fa3475e36 tests: k8s: rely more on free runners
We were running most of the k8s integration tests on AKS. The ones that
don't actually depend on AKS's environment now run on normal
ubuntu-24.04 GitHub runners instead: we bring up a kubeadm cluster
there, test with both containerd lts and active, and skip attestation
tests since those runtimes don't need them. AKS is left only for the
jobs that do depend on it.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-21 08:44:47 +01:00
Fabiano Fidêncio
2f056484f3 versions: Bump containerd active to 2.2
SSIA

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-21 08:44:47 +01:00
Zvonko Kaiser
6d1eaa1065 Merge pull request #12461 from manuelh-dev/mahuber/guest-pull-bats
tests: enable more scenarios for k8s-guest-pull-image.bats
2026-02-20 08:48:54 -05:00
Zvonko Kaiser
1de7dd58f5 gpu: Add NVLSM daemon
We need to chissel the NVLSM daemon for NVL5 systems

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-20 11:39:59 +01:00
Zvonko Kaiser
67d154fe47 gpu: Enable NVL5 based platform
NVL5 based HGX systems need ib_umad and
fabricmanager and nvlsm installed.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-20 11:39:59 +01:00
Dan Mihai
ea53779b90 ci: k8s: temporarily disable mariner host
Disable mariner host testing in CI, and auto-generated policy testing
for the temporary replacements of these hosts (based on ubuntu), to work
around missing:

1. cloud-hypervisor/cloud-hypervisor@0a5e79a, that will allow Kata
   in the future to disable the nested property of guest VPs. Nested
   is enabled by default and doesn't work yet with mariner's MSHV.
2. cloud-hypervisor/cloud-hypervisor@bf6f0f8, exposed by the large
   ttrpc replies intentionally produced by the Kata CI Policy tests.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-19 20:42:50 +01:00
Dan Mihai
3e2153bbae ci: k8s: easier to modify az aks create command
Make `az aks create` command easier to change when needed, by moving the
arguments specific to mariner nodes onto a separate line of this script.
This change also removes the need for `shellcheck disable=SC2046` here.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-19 20:42:50 +01:00
Fabiano Fidêncio
cadbf51015 versions: Update Cloud Hypervisor to v50.0
```
This release has been tracked in v50.0 group of our roadmap project.

Configurable Nested Virtualization Option on x86_64
The nested=on|off option has been added to --cpu to allow users
to configure nested virtualization support in the guest on x86_64
hosts (for both KVM and MSHV). The default value is on to maintain
consistency with existing behavior. (#7408)

Compression Support for QCOW2
QCOW2 support has been extended to handle compression clusters based on
zlib and zstd. (#7462)

Notable Performance Improvements
Performance of live migration has been improved via an optimized
implementation of dirty bitmap maintenance. (#7468)

Live Disk Resizing Support for Raw Images
The /vm.resize-disk API has been introduced to allow users to resize block
devices backed by raw images while a guest is running. (#7476)

Developer Experience Improvements
Significant improvements have been made to developer experience and
productivity. These include a simplified root manifest, codified and
tightened Clippy lints, and streamlined workflows for cargo clippy and
cargo test. (#7489)

Improved File-level Locking Support
Block devices now use byte-range advisory locks instead of whole-file
locks. While both approaches prevent multiple Cloud Hypervisor instances
from simultaneously accessing the same disk image with write
permissions, byte-range locks provide better compatibility with network
storage backends. (#7494)

Logging Improvements
Logs now include event information generated by the event-monitor
module. (#7512)

Notable Bug Fixes
* Fix several issues around CPUID in the guest (#7485, #7495, #7508)
* Fix snapshot/restore for Windows Guest (#7492)
* Respect queue size in block performance tests (#7515)
* Fix several Serial Manager issues (#7502)
* Fix several seccomp violation issues (#7477, #7497, #7518)
* Fix various issues around block and qcow (#7526, #7528, #7537, #7546,
  #7549)
* Retrieve MSRs list correctly on MSHV (#7543)
* Fix live migration (and snapshot/restore) with AMX state (#7534)
```

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-19 20:42:50 +01:00
Dan Mihai
d8b403437f static-build: delete cloud-hypervisor directory
This cloud-hypervisor is a directory, so it needs "rm -rf" instead of
"rm -f".

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-19 20:42:50 +01:00
Manuel Huber
fd340ac91c tests: remove skips for some guest-pull scenarios
Issue 10838 is resolved by the prior commit, enabling the -m
option of the kernel build for confidential guests which are
not users of the measured rootfs, and by commit
976df22119, which ensures
relevant user space packages are present.
Not every confidential guest has the measured rootfs option
enabled. Every confidential guest is assumed to support CDH's
secure storage features, in contrast.

We also adjust test timeouts to account for occasional spikes on
our bare metal runners (e.g., SNP, TDX, s390x).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-19 10:10:55 -08:00
Harshitha Gowda
728d8656ee tests: Set sev-snp, qemu-snp CIs as required
run-k8s-tests-on-tee (sev-snp, qemu-snp)

Signed-off-by: Harshitha Gowda <hgowda@amd.com>
2026-02-19 16:41:29 +01:00
Fabiano Fidêncio
855f4dc7fa release: Bump version to 3.27.0
Bump VERSION and helm-charts versions.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-19 14:01:26 +01:00
Markus Rudy
0621e65e74 genpolicy: allow RO and RW for sysfs with privileged container
After containerd 2.0.4, privileged containers handle sysfs mounts a bit
differently, so we can end up with the policy expecting RO and the input
having RW.

The sandbox needs to get privileged mounts when any container in the pod
is privileged, not only when the pause container itself is marked
privileged. So we now compute that and pass it into get_mounts.

One downside: we’re relaxing policy checks (accepting RO/RW mismatch for
sysfs) and giving the pause container privileged mounts whenever the pod
has any privileged workload. For Kata, that means a slightly broader
attack surface for privileged pods—the pause container sees more than it
strictly needs, and we’re being more permissive on sysfs.

It’s a trade-off for compatibility with newer containerd; if you need
maximum isolation, you may want to avoid privileged pods or tighten
policy elsewhere.

Fixes: #12532

Signed-off-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-19 11:16:50 +01:00
Amulyam24
a22c59a204 kata-deploy: enable kata-remote for ppc64le
When kata-deploy is deployed with cloud-api-adaptor, it
defaults to qemu instead of configuring the remote shim.
Support ppc64le to enable it correctly when shims.remote.enabled=true

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2026-02-19 11:14:27 +01:00
Steve Horsman
6a67250397 Merge commit from fork
runtime-go/rs: Disable virtio-pmem for Cloud Hypervisor
2026-02-19 09:00:56 +00:00
Chiranjeevi Uddanti
88203cbf8d tests: Add regression test for sandbox_cgroup_only=false
Add unit test for get_ch_vcpu_tids() and integration test that creates
a pod with sandbox_cgroup_only=false to verify it starts successfully.

Signed-off-by: Chiranjeevi Uddanti <244287281+chiranjeevi-max@users.noreply.github.com>
Co-authored-by: Antigravity <antigravityagent@google.com>
2026-02-18 20:20:14 +01:00
Chiranjeevi Uddanti
9c52f0caa7 runtime-rs/ch: Fix inverted vcpu/tid mapping in get_ch_vcpu_tids
The VcpuThreadIds struct expects a mapping from vcpu_id to thread_id,
but get_ch_vcpu_tids() was inserting (tid, vcpu_id) instead of
(vcpu_id, tid).

This caused move_vcpus_to_sandbox_cgroup() to interpret vcpu IDs
(0, 1, 2...) as process IDs when sandbox_cgroup_only=false, leading
to failed attempts to read /proc/0/status.

Fixes: #12479
Signed-off-by: Chiranjeevi Uddanti <244287281+chiranjeevi-max@users.noreply.github.com>
2026-02-18 20:20:14 +01:00
Aurélien Bombo
8ff9cd1f12 Merge pull request #12455 from ajaypvictor/secret-cm-without-sharedfs
ci: Add integration tests for secret & configmap propagation
2026-02-18 12:06:48 -06:00
Aurélien Bombo
336b922d4f tests/cbl-mariner: Stop disabling NVDIMM explicitly
This is not needed anymore since now disable_image_nvdimm=true for
Cloud Hypervisor.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:52:51 -06:00
Aurélien Bombo
48aa077e8c runtime{,-rs}/qemu/arm64: Disable DAX
Enabling full-featured QEMU NVDIMM support on ARM with DAX enabled causes a
kernel panic in caches_clean_inval_pou (see below, different issue from
33b1f07), so we disable DAX in that environment.

[    1.222529] EXT4-fs (pmem0p1): mounted filesystem e5a4892c-dac8-42ee-ba55-27d4ff2f38c3 ro with ordered data mode. Quota mode: disabled.
[    1.222695] VFS: Mounted root (ext4 filesystem) readonly on device 259:1.
[    1.224890] devtmpfs: mounted
[    1.225175] Freeing unused kernel memory: 1920K
[    1.226102] Run /sbin/init as init process
[    1.226164]   with arguments:
[    1.226204]     /sbin/init
[    1.226235]   with environment:
[    1.226268]     HOME=/
[    1.226295]     TERM=linux
[    1.230974] Internal error: synchronous external abort: 0000000096000010 [#1]  SMP
[    1.231963] CPU: 0 UID: 0 PID: 1 Comm: init Tainted: G   M                6.18.5 #1 NONE
[    1.232965] Tainted: [M]=MACHINE_CHECK
[    1.233428] pstate: 43400005 (nZcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[    1.234273] pc : caches_clean_inval_pou+0x68/0x84
[    1.234862] lr : sync_icache_aliases+0x30/0x38
[    1.235412] sp : ffff80008000b9a0
[    1.235842] x29: ffff80008000b9a0 x28: 0000000000000000 x27: 00000000019a00e1
[    1.236912] x26: ffff80008000bc08 x25: ffff80008000baf0 x24: fffffdffc0000000
[    1.238064] x23: ffff000001671ab0 x22: ffff000001663480 x21: fffffdffc23401c0
[    1.239356] x20: fffffdffc23401c0 x19: fffffdffc23401c0 x18: 0000000000000000
[    1.240626] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[    1.241762] x14: ffffaae8f021b3b0 x13: 0000000000000000 x12: ffffaae8f021b3b0
[    1.242874] x11: ffffffffffffffff x10: 0000000000000000 x9 : 0000ffffbb53c000
[    1.244022] x8 : 0000000000000000 x7 : 0000000000000012 x6 : ffff55178f5e5000
[    1.245157] x5 : ffff80008000b970 x4 : ffff00007fa4f680 x3 : ffff00008d007000
[    1.246257] x2 : 0000000000000040 x1 : ffff00008d008000 x0 : ffff00008d007000
[    1.247387] Call trace:
[    1.248056]  caches_clean_inval_pou+0x68/0x84 (P)
[    1.248923]  __sync_icache_dcache+0x7c/0x9c
[    1.249578]  insert_page_into_pte_locked+0x1e4/0x284
[    1.250432]  insert_page+0xa8/0xc0
[    1.251080]  vmf_insert_page_mkwrite+0x40/0x7c
[    1.251832]  dax_iomap_pte_fault+0x598/0x804
[    1.252646]  dax_iomap_fault+0x28/0x30
[    1.253293]  ext4_dax_huge_fault+0x80/0x2dc
[    1.253988]  ext4_dax_fault+0x10/0x3c
[    1.254679]  __do_fault+0x38/0x12c
[    1.255293]  __handle_mm_fault+0x530/0xcf0
[    1.255990]  handle_mm_fault+0xe4/0x230
[    1.256697]  do_page_fault+0x17c/0x4dc
[    1.257487]  do_translation_fault+0x30/0x38
[    1.258184]  do_mem_abort+0x40/0x8c
[    1.258895]  el0_ia+0x4c/0x170
[    1.259420]  el0t_64_sync_handler+0xd8/0xdc
[    1.260154]  el0t_64_sync+0x168/0x16c
[    1.260795] Code: d2800082 9ac32042 d1000443 8a230003 (d50b7523)
[    1.261756] ---[ end trace 0000000000000000 ]---

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:52:43 -06:00
Aurélien Bombo
c727332b0e runtime/qemu/arm64: Align NVDIMM usage on amd64
Nowadays on arm64 we use a modern QEMU version which supports the features we
require for NVDIMM, so we remove the arm64-specific code and use the generic
implementation.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:47:53 -06:00
Aurélien Bombo
e17f96251d runtime{,-rs}/clh: Disable virtio-pmem
This disables virtio-pmem support for Cloud Hypervisor by changing
Kata config defaults and removing the relevant code paths.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-18 11:47:53 -06:00
Zvonko Kaiser
1d09e70233 Merge pull request #12538 from fidencio/topic/kata-deploy-fix-regression-on-hardcopying-symlinks
kata-deploy: preserve symlinks when installing artifacts
2026-02-18 12:44:46 -05:00
Mikko Ylinen
5622ab644b versions: bump QEMU to v10.2.1
v10.2.1 is the latest patch release in v10.2 series. Changes:
https://github.com/qemu/qemu/compare/v10.2.0...v10.2.1

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-18 18:18:52 +01:00
Mikko Ylinen
d68adc54da versions: bump to Linux v6.18.12 (LTS)
Latest changelog in
https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.18.12

Also other changes for 6..11 updates are available.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-18 18:18:52 +01:00
Fabiano Fidêncio
34336f87c7 kata-deploy: convert install.rs get_hypervisor_name tests to rstest
Use rstest parameterized tests for QEMU variants, other hypervisors,
and unknown/empty shim cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-18 12:41:55 +01:00
Fabiano Fidêncio
bb11bf0403 kata-deploy: preserve symlinks when installing artifacts
When copying artifacts from the container to the host, detect source
entries that are symlinks and recreate them as symlinks at the
destination instead of copying the target file.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-18 12:29:14 +01:00
Dan Mihai
eee25095b5 tests: mariner annotations for k8s-openvpn
This test uses YAML files from a different directory than the other
k8s CI tests, so annotations have to be added into these separate
files.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-18 07:17:04 +01:00
Manuel Huber
4c760fd031 build: add CONFIDENTIAL_GUEST variable for kernel
This change adds the CONFIDENTIAL_GUEST variable to the kernel
build logic. Similar to commit
976df22119, we would like to enable
the cryptsetup functionalities not only when building a measured
root file system, but also when building for a confidential guest.
The current state is that not all confidential guests use a
measured root filesystem, and as a matter of fact, we should
indeed decouple these aspects.

With the current convention, a confidential guest is a user of CDH
with its storage features. A better naming of the
CONFIDENTIAL_GUEST variable could have been a naming related to CDH
storage functionality. Further, the kernel build script's -m
parameter could be improved too - as indicated by this change, not
only measured rootfs builds will need the cryptsetup.conf file.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-17 12:44:50 -08:00
Manuel Huber
d3742ca877 tests: enable guest pull bats for force guest pull
Similar to k8s-guest-pull-image-authenticated and to
k8s-guest-pull-image-signature, enabling k8s-guest-pull-image to
run against the experimental force guest pull method.
Only k8s-guest-pull-image-encrypted requires nydus.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-17 12:44:50 -08:00
Markus Rudy
8365afa336 qemu: log exit code after failure
When qemu exits prematurely, we usually see a message like

  msg="Cannot start VM" error="exiting QMP loop, command cancelled"

This is an indirect hint, caused by the QMP server shutting down. It
takes experience to understand what it even means, and it still does not
show what's actually the problem.

With this commit, we're taking the error return from the qemu
subprocess and surface it in the logs, if it's not nil. This means we
automatically capture any non-zero exit codes in the logs.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2026-02-17 21:03:13 +01:00
Fabiano Fidêncio
f0a0425617 kata-deploy: convert a few toml.rs tests to rstest
Turn test_toml_value_types into a parameterized test with one case per type
(string, bool, int). Merge the two invalid-TOML tests (get and set) into one
rstest with two cases, and the two "not an array" tests into one rstest
with two cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-17 09:33:39 +01:00
Fabiano Fidêncio
899005859c kata-deploy: avoid leading/blank lines in written TOML config
When writing containerd drop-in or other TOML (e.g. initially empty file),
the serialized document could start with many newlines.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-17 09:33:39 +01:00
Fabiano Fidêncio
cfa8188cad kata-deploy: convert containerd version support tests to rstest
Replace multiple #[test] functions for snapshotter and erofs version
checks with parameterized #[rstest] #[case] tests for consistency and
easier extension.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-17 09:33:39 +01:00
Fabiano Fidêncio
cadac7a960 kata-deploy: runtime_platform -> runtime_platforms
Fix runtime_platforms typo.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-17 09:33:39 +01:00
Hyounggyu Choi
8bc60a0761 Merge pull request #12521 from fidencio/topic/kata-deploy-auto-add-nfd-tee-labels-to-the-runtime-class
kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected
2026-02-16 18:06:18 +01:00
Jacek Tomasiak
8025fa0457 agent: Don't pass empty options to mount
With some older kernels some fs implementations don't handle empty
options strings well. This leads to failures in "setup rootfs" step.
E.g. `cgroup: cgroup2: unknown option ""`.
This is fixed by mapping empty string to `None` before passing to
`nix::mount`.

Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
2026-02-16 14:55:59 +01:00
Fabiano Fidêncio
a04df4f4cb kata-deploy: disable provenance/SBOM for quay.io compatibility
Disable provenance and SBOM when building per-arch kata-deploy images so
each tag is a single image manifest. quay.io rejects pushing multi-arch
manifest lists that include attestation manifests (400 manifest invalid).
Add a note in the release script documenting this.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-16 13:32:25 +01:00
Fabiano Fidêncio
0e8e30d6b5 kata-deploy: fix default RuntimeClass + nodeSelectors
The default RuntimeClass (e.g. kata) is meant to point at the default shim
handler (e.g. kata-qemu-$tee). We were building it in a separate block and
only sometimes adding the same TEE nodeSelectors as the shim-specific
RuntimeClass, leading to kata ending up without the SE/SNP/TDX
nodeSelector while kata-qemu-$tee had it.

The fix is to stop duplicating the RuntimeClass definition, having a
single template that renders one RuntimeClass (name, handler, overhead,
nodeSelectors).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-16 13:09:03 +01:00
Fabiano Fidêncio
80a175d09b kata-deploy: Add TEE nodeSelectors for TEE shims when NFD is detected
When NFD is detected (deployed by the chart or existing in the cluster),
apply shim-specific nodeSelectors only for TEE runtime classes (snp,
tdx, and se).

Non-TEE shims keep existing behavior (e.g. runtimeClass.nodeSelector for
nvidia GPU from f3bba0885 is unchanged).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-16 12:07:51 +01:00
Fabiano Fidêncio
d000acfe08 infra: fix multi-arch manifest publish
Per-arch images were failing publish-multiarch-manifest with 'X is a manifest
list' because Buildx now enables attestations by default, so each arch tag
became an image index. Use 'docker buildx imagetools create' instead of
'docker manifest create' so we can merge those indexes into the final
multi-arch manifest while keeping provenance and SBOM on per-arch images.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-14 19:49:00 +01:00
Fabiano Fidêncio
02c9a4b23c kata-deploy: Temporarily comment GPU specific labels
We depend on GPU Operator v26.3 release, which is not out yet.
Although we have been testing with it, it's not yet publicly available,
which would break anyone actually trying to use the GPU runtime classes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-14 09:25:14 +01:00
Ajay Victor
83935e005c ci: Add integration tests for secret & configmap propagation
Enhance k8s-configmap.bats and k8s-credentials-secrets.bats to test that ConfigMap and Secret updates propagate to volume-mounted pods.

- Enhanced k8s-configmap.bats to test ConfigMap propagation
  * Added volume mount test for ConfigMap consumption
  * Added verification that ConfigMap updates propagate to volume-mounted pods

- Enhanced k8s-credentials-secrets.bats to test Secret propagation
  * Added verification that Secret updates propagate to volume-mounted pods

Fixes #8015

Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
2026-02-14 08:56:21 +05:30
Fabiano Fidêncio
5106e7b341 build: Add gnupg to the agent's builder container
Otherwise we'll fail to check gperf's GPG signing key when needed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
79b5022a5a kata-ctl: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
30ebc4241e genpolicy: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
87d1979c84 agent-ctl: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
90dbd3f562 agent: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
stevenhorsman
7f77948658 versions: Bump rkyv version to 0.7.46
Bump to remediate RUSTSEC-2026-0001

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-14 00:33:45 +01:00
Aurélien Bombo
981f693a88 Merge pull request #11140 from balintTobik/hyperv_warning
runtime: refactor hypervisor devices cgroup creation
2026-02-13 15:16:09 -06:00
Fabiano Fidêncio
d8acc403c8 kata-deploy: set CRI images runtime_platform snapshotter for containerd v3
In containerd config v3 the CRI plugin is split into runtime and images,
and setting the snapshotter only on the runtime plugin is not enough for image
pull/prepare.

The images plugin must have runtime_platform.<runtime>.snapshotter so it
uses the correct snapshotter per runtime (e.g. nydus, erofs).

A PR on the containerd side is open so we can rely on the runtime plugin
snapshotter alone: https://github.com/containerd/containerd/pull/12836

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 22:15:02 +01:00
Fabiano Fidêncio
2930c68c0b ci: tdx: properly skip k8s-sandbox-vcpus-allocation.bats
This is a follow-up for 25962e9325

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 20:56:08 +01:00
Fabiano Fidêncio
f6e0a7c33c scripts: use temporary GPG home when verifying cached gperf tarball
In CI the default GPG keyring is often read-only or missing, so
'gpg --import' of the cached keyring fails and verification cannot
succeed. Use a temporary GNUPGHOME for import and verify so cached
gperf can be verified without writing to the system keyring.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 19:39:55 +01:00
stevenhorsman
55a89f6836 runtime: doc: Remove usage of golang.org/x/net/context
This package is deprecated and we aren't using it any more

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
06246ea18b csi-kata-directvolume: Remove usage of golang.org/x/net/context
This packages is deprecated, so use the standard library context
package instead

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
f2fae93785 csi-kata-directvolume: Bump x/net to v0.50
Remediates CVEs:
- GO-2026-4440
- GO-2026-4441

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
stevenhorsman
74d4469dab ci/openshift-ci: Bump x/net to v0.50
Remediates CVEs:
- GO-2026-4440
- GO-2026-4441

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-13 17:55:23 +01:00
Steve Horsman
bb867149bb Merge pull request #12514 from fidencio/topic/nvidia-try-to-improve-genpolicy-failures
tests: nvidia: Fix genpolicy error when pulling nvcr.io images
2026-02-13 16:34:00 +00:00
Joji Mekkattuparamban
f3bba08851 kata-deploy: add node selector to nvidia runtime classes
The CC runtime classes kata-qemu-nvidia-gpu-snp and kata-qemu-nvidia-gpu-tdx
are mutually exclusive with kata-qemu-nvidia-gpu, as dictated by the gpu
cc mode setting. In order to properly support a cluster that has both CC and
non-CC nodes, we use a node selector so the scheduling is consistent with the
GPU mode. The GPU operator sets a label nvidia.com/cc.ready.state=[true, false]
to indicate the gpu mode setting

Fixes #12431

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2026-02-13 15:58:06 +01:00
Fabiano Fidêncio
8cb7d0be9d tests: nvidia: Fix genpolicy error when pulling nvcr.io images
genpolicy pulls image manifests from nvcr.io to generate policy and was
failing with 'UnauthorizedError' because it had no registry credentials.

Genpolicy (src/tools/genpolicy) uses docker_credential::get_credential()
in registry.rs, which reads from DOCKER_CONFIG/config.json. Add
setup_genpolicy_registry_auth() to create a Docker config with nvcr.io
auth (NGC_API_KEY) and set DOCKER_CONFIG before running genpolicy so it
can authenticate when pulling manifests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 13:12:55 +01:00
Fabiano Fidêncio
f4dcb66a3c ci: add workflow to push ORAS tarball cache
Add push-oras-tarball-cache workflow that runs on push to main when
versions.yaml changes (and on workflow_dispatch). It populates the
ghcr.io ORAS cache with gperf and busybox tarballs from versions.yaml.

Remove the push_to_cache call from download-with-oras-cache.sh since
it was never triggered in CI. Cache population is now done solely by the
new workflow and by populate-oras-tarball-cache.sh when run manually.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-13 12:57:48 +01:00
Balint Tobik
295a6a81d0 runtime: refactor hypervisor devices cgroup creation
Separatly added hypervisor devices to cgroup to
omit not relevant warnings and fail if none of them
are available.
Also fix a testcase reload removed kernel modules to later testcases
and skip some tests on ARM because lack of virtualization support
Fixes #6656

Signed-off-by: Balint Tobik <btobik@redhat.com>
2026-02-13 09:23:08 +01:00
Aurélien Bombo
14be9504e7 Merge pull request #12506 from kata-containers/sprt/gperf-mirror
versions: Switch gperf mirror again
2026-02-12 17:00:17 -06:00
Fabiano Fidêncio
a01e95b988 kata-deploy: test k3s/rke2 template handling / version checks
Add tests for the split_non_toml_header helper that strips Go template
directives before TOML parsing, and for every TOML operation (set, get,
append, remove, set_array) on files that start with {{ template "base" . }}.

Also converts the containerd version detection tests in manager.rs from
individual #[test] functions with helper wrappers to parametrized #[rstest]
cases, which is more readable and easier to extend.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 22:30:08 +01:00
Fabiano Fidêncio
2e7633674f kata-deploy: use k3s/rke2 base template
K3s docs (https://docs.k3s.io/advanced#configuring-containerd) say that the
right way to customize containerd is to extend the base template with
{{ template "base" . }} and append your own TOML blocks, rather than copying a
prerendered config.toml into the template file.

We were copying config.toml into config.toml.tmpl / config-v3.toml.tmpl, which
meant we were replacing the K3s defaults with a snapshot that gets stale as
soon as K3s is upgraded.

Now we create the template files with just the base directive and let our
regular set_toml_value code path append the Kata runtime configuration on top.

To make that work, the TOML utils learned to handle files that start with a
Go template line ({{ ... }}): strip it before parsing, put it back when writing.
This keeps the K3s/RKE2 path identical to every other runtime -- no special
append logic needed.

refs:
* k3s:: https://docs.k3s.io/advanced#configuring-containerd
* rke2: https://docs.rke2.io/advanced?_highlight=conyainerd#configuring-containerd

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 22:30:08 +01:00
Aurélien Bombo
199e1ab16c versions: Switch gperf mirror again
The mirror introduced by #11178 still breaks quite often so apply this as a
quick fix.

A proper solution would probably be to load balance like in #12453.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-02-12 13:41:19 -06:00
Fabiano Fidêncio
6a3bbb1856 tests: Retry k8s deployment
We've seen a lot of spurious issues when deploying the infra needed for
the tests.  Let's give it a few tries before actually failing.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-02-12 20:13:59 +01:00
Manuel Huber
ed7de905b5 build: Tighten upstream download path for ORAS
The gperf-3.3 tarball frequently fails to download on my end with
cryptic error messages such as: "tar: This does not look like a tar
archive". This change tightens the download logic a bit: We fail at
the point in time when we're supposed to fail. This way we detect
rate limiting issues right away, and this way, the actual hashsum
and signature checks are effective, not only printouts.

This change also updates the key reference and allows for an array,
for instance, when a different signer was used for a cache vs
upstream version.
The change also makes it clear, that signature verification is only
implemented for the gperf tarball. Improvements can be made in a
subsequent change.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-12 19:20:35 +01:00
Fabiano Fidêncio
9fc5be47d0 kata-deploy: fix custom runtime config path for runtime-rs shims
Custom runtimes whose base config lives under runtime-rs/ (e.g. dragonball,
cloud-hypervisor) were not found because the path was always built under
share/defaults/kata-containers/. Use get_kata_containers_original_config_path
for the handler so rust shim configs are read from .../runtime-rs/.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-12 18:08:47 +01:00
Fabiano Fidêncio
50923b6d62 kata-deploy: run cleanup on uninstall via DaemonSet preStop
On helm uninstall let's rely on a preStop hook to run kata-deploy
cleanup so each pod cleans its node before exiting.

We **must** keep RBAC (resource-policy: keep) so pods retain API access
during termination, and then can properly delete the NodeFeatureRules
and remove the labels from the nodes.

The post-delete hook Job, which runs on a single node, now is only
responsible for cleaning the kept RBAC (cluster-wide resource) after
uninstall, not leaving any resource or artefact behind.

The changes on this commit lead to a "resouerces were kept" message when
running `helm uninstall`, which document as being normal, as the
post-delete job will remove those.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
Fabiano Fidêncio
6e0cbc28a3 kata-deploy: fix node label removal
When removing a node label, JSON merge patch semantics require setting
the key to null; omitting the key leaves it unchanged.

Fix label_node to send a patch with the label key set to null so the API
server actually removes katacontainers.io/kata-runtime.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
Fabiano Fidêncio
510d2a69ae kata-deploy: exit with 0 on SIGTERM in install mode
Wait for SIGTERM after install and exit(0) so the container terminates
cleanly. If registering the SIGTERM handler fails, log a warning and
sleep forever instead of exiting with an error (fallback to the old
behaviour).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-11 22:05:10 +01:00
Mikko Ylinen
25962e9325 tests/coco: disable k8s-sandbox-vcpus-allocation.bats for TDX
After the move to Linux 6.17 and QEMU 10.2 from Kata,
k8s-sandbox-vcpus-allocation.bats started failing on TDX.

2026-02-10T16:39:39.1305813Z # pod/vcpus-less-than-one-with-no-limits created
2026-02-10T16:39:39.1306474Z # pod/vcpus-less-than-one-with-limits created
2026-02-10T16:39:39.1307090Z # pod/vcpus-more-than-one-with-limits created
2026-02-10T16:39:39.1307672Z # pod/vcpus-less-than-one-with-limits condition met
2026-02-10T16:39:39.1308373Z # timed out waiting for the condition on pods/vcpus-less-than-one-with-no-limits
2026-02-10T16:39:39.1309132Z # timed out waiting for the condition on pods/vcpus-more-than-one-with-limits
2026-02-10T16:39:39.1310370Z # Error from server (BadRequest): container "vcpus-less-than-one-with-no-limits" in pod "vcpus-less-than-one-with-no-limits" is waiting to start: ContainerCreating

A manual test without agent policies added it seems to work OK but disable
the test for now to get CI stable.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-11 22:02:59 +01:00
stevenhorsman
006a5d5141 versions: Tidy up versions file
- We don't use containerd.latest as the comment on it suggests
- We also don't have any references to `sriov-network-device`
so remove that and the plugins section.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-11 20:49:53 +01:00
Dan Mihai
9d763e9d5a Merge pull request #12439 from sespiros/genpolicy-suppress-yaml-stdout
genpolicy: suppress YAML output when --{base64/raw}-out are used
2026-02-11 10:27:01 -08:00
Spyros Seimenis
282bfc9f14 genpolicy: suppress YAML output when --{base64/raw}-out are used
this will suppress yaml output only if the input is passed via
stdin. If {base64/raw}-out is passed in alongside a yaml file, the
encoded annotation or the policy data respectively will be printed
to stdout as before.

Fixes #12438

Signed-off-by: Spyros Seimenis <sse@edgeless.systems>
2026-02-11 14:08:30 +02:00
Hyounggyu Choi
c84e37f6ac Merge pull request #12486 from BbolroC/cpu-hotplug-s390x-runtime-rs
runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P
2026-02-11 09:40:21 +01:00
Hyounggyu Choi
67f54bdcb5 tests: Remove skip condition for runtime-rs on s390x in k8s-cpu-ns
This commit removes the skip condition for qemu-runtime-rs on s390x
in k8s-cpu-ns.bats.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-11 05:52:13 +01:00
Hyounggyu Choi
eab77a26ab runtime-rs: Skip sockets and threads for hotplug_vcpus on Z/P
As s390x and ppc64 use a flat CPU topology without sockets and threads,
this commit skips the socket_id and thread_id properties for vCPU hotplug
on these architectures instead of aborting the operation.
This is the change in line with those from the Go runtime:
- isSocketIDSupported()
- isThreadIDSupported()

Fixes: #12155

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2026-02-11 05:52:13 +01:00
Alex Lyn
c53910eb1b Merge pull request #12408 from Apokleos/netdev-multiq
runtime-rs: Add support configurable network_queues via configuration and annotation
2026-02-11 09:34:58 +08:00
stevenhorsman
a115d6d858 ci: Add copyright and license to shellcheckrc
Make the static-checks happy

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
stevenhorsman
15d6a681ed doc: Fix spelling issues
Put things in backticks

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
stevenhorsman
e84d234721 doc: Update broken/slow URLs
Update the URLs to better/existing links

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
5c0269881e tests: Make editorconfig-checker happy
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
  *.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
  checks project code
- Leave generated and binary assets unchanged (excluded from checker)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 21:58:28 +01:00
Fabiano Fidêncio
34199b09eb runtime-rs: Properly parse containerd runtime options to extract ConfigPath
The runtime-rs shim was failing to load its configuration when deployed
via kata-deploy because it couldn't correctly parse the ConfigPath passed
by containerd. The previous implementation naively skipped the first 2
bytes of the options and interpreted the rest as a UTF-8 string, which
doesn't work since containerd passes a properly serialized protobuf
message of type runtimeoptions.v1.Options.

This change adds the runtimeoptions.proto definition to the protocols
crate and updates the load_config function to correctly deserialize the
protobuf message and extract the config_path field, matching how the Go
runtime handles this via typeurl.UnmarshalAny.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
cb652e0da1 tests: Update NVRC trace to use drop-in config mechanism
Update the enable_nvrc_trace() function to use the new drop-in
configuration mechanism instead of directly modifying the base
configuration file. The function now creates a 90-nvrc-trace.toml
drop-in file that properly combines existing kernel parameters
with the nvrc.log=trace setting.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
4cb2aea9dd kata-deploy: Document drop-in configuration and add warning to config files
When kata-deploy installs Kata Containers, the base configuration files
should not be modified directly. This change adds documentation explaining
how to use drop-in configuration files for customization, and prepends a
warning comment to all deployed configuration files reminding users to use
drop-in files instead.

The warning is added to both standard shim configurations and custom
runtime configurations. It includes a brief explanation of how drop-in
files work and points users to the documentation for more details.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
d5d561abe5 kata-deploy: Add detailed logging for drop-in configuration
Add clear INFO-level messages when creating drop-in configuration
files, making it easy to understand what kata-deploy is doing during
installation:

- "Setting up runtime directory for shim: X"
- "Generating drop-in configuration files for shim: X"
- "Created drop-in file: <path>"

When DEBUG mode is enabled (via DEBUG=true environment variable),
also log the full content of each drop-in file to aid troubleshooting.

The log level is now automatically set to Debug when the DEBUG
environment variable is set, ensuring debug messages are visible.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
eddd1b507e kata-deploy: Extract common drop-in generation into shared helper
Deduplicate the drop-in file generation logic between configure_shim_config
and install_custom_runtime_configs by extracting it into a shared
write_common_drop_ins helper function.

This ensures both standard and custom runtimes use the same code path
for generating drop-in configuration files.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
577aa6b319 kata-deploy: Propagate drop-in configs to custom runtime classes
Ensure custom runtime classes receive the same drop-in configuration
files as standard runtimes:
- 10-installation-prefix.toml (if custom dest_dir)
- 20-debug.toml (if debug enabled)
- 30-kernel-params.toml (proxy + debug kernel params)

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
8c60a88bda kata-deploy: Add combined kernel_params drop-in
Add a combined drop-in file (30-kernel-params.toml) that handles all
kernel_params modifications. This approach reads the base kernel_params
from the original untouched config file and combines them with:
- Proxy settings (agent.https_proxy, agent.no_proxy)
- Debug settings (agent.log=debug, initcall_debug)

Using a single drop-in file for kernel_params avoids the TOML merge
behavior where scalar values are replaced rather than appended.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
fae96f1f82 kata-deploy: Add drop-in file for debug configuration
When debug mode is enabled, generate a drop-in configuration file
(20-debug.toml) with the boolean debug flags for hypervisor, runtime,
and agent sections.

Note: kernel_params for debug (agent.log=debug, initcall_debug) will
be handled by a separate combined kernel_params drop-in file to avoid
the TOML merge replacement behavior.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
bb65e516e5 kata-deploy: Add drop-in file for installation prefix
When the installation prefix differs from the default /opt/kata,
generate a drop-in configuration file (10-installation-prefix.toml)
with the adjusted paths instead of modifying the original config file.

This removes the need for adjust_installation_prefix and
adjust_qemu_cmdline functions which are now deleted along with
their tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Fabiano Fidêncio
cd76d61a3d kata-deploy: Add infrastructure for per-shim drop-in configuration
Instead of modifying original config files directly, set up a per-shim
directory structure that uses symlinks to the original configs and
config.d/ directories for drop-in overrides.

This enables cleaner configuration management where the original files
remain untouched and all kata-deploy customizations are in separate
drop-in files that can be easily inspected and removed.

Directory structure:
  {config_path}/runtimes/{shim}/
  {config_path}/runtimes/{shim}/configuration-{shim}.toml -> symlink
  {config_path}/runtimes/{shim}/config.d/

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-10 18:12:17 +01:00
Paul Meyer
c5ad3f9b26 Merge pull request #12472 from katexochen/p/disable-nvdimm-cc
runtime: disable nvdimm for confidential guest
2026-02-10 14:54:40 +01:00
Steve Horsman
44c86f881b Merge pull request #12466 from ldoktor/gk-pagination
tools.gatekeeper: Add support to paginate workflows
2026-02-10 11:59:57 +00:00
Steve Horsman
a8debc9841 Merge pull request #12476 from stevenhorsman/bump-rust-to-1.91
versions: Bump rust to 1.91
2026-02-10 10:03:01 +00:00
Paul Meyer
76525b97a6 runtime-rs: disable nvdimm for confidential guest
nvdimm isn't supported by confidential guests, so disable it in
the configuration.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2026-02-10 08:38:41 +01:00
Paul Meyer
a5f554922c runtime: disable nvdimm for confidential guest
There is code to disable this at runtime  when confidential_guest
is enabled anyway[^1], but it will omit a warning every time. All
the touched configuration files set confidential_guest to true,
so we already know nvdimm isn't supported.

[^1]: 16a7ed6e14/src/runtime/virtcontainers/qemu_amd64.go (L144-L148)

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2026-02-10 08:38:18 +01:00
Lukáš Doktor
f7baa394d4 tools.gatekeeper: Add support to paginate workflows
The number of workflows increased over 30 so we need to paginate them as
well as jobs. This commit extracts the existing pagination from jobs and
uses it for both jobs and workflows.

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2026-02-10 06:53:47 +00:00
stevenhorsman
120fde28e1 versions: Bump rust to 1.91
Following the agreed toolchain policy - bump rust to the current (1.93)-2
releases.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-10 06:52:42 +00:00
Alex Lyn
362a4c5714 runtime-rs: Fix multiqueue config propagation and tap initialization
The previous implementation failed to correctly propagate the network
multiqueue configuration, causing the effective queue number to remain
0.
It also mixed up "queue pairs" with "queue number", so tap devices were
opened without proper multiqueue initialization which causes Clh
netconfig validation failed.

This commit fixes the configuration mapping and initializes tap devices
with the correct multiqueue semantics, ensuring Cloud Hypervisor
receives a valid netconfig.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
79f81dae50 runtime-rs: Add network_queues for setting network device multiqueues
To make network_queues configurable, a new method is introduced via
configurtion toml.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
6723ff5c46 runtime-rs: Add configurable DEFNETQUEUES in Makefile
To make build with a configurable item of network queues, a dedicated
variable of DEFNETQUEUES is added.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
cfc479ef1d kata-types: Add Network device specific annotation for network queues
This commit introduces a new annotation for users to easily set network
queues via "io.katacontainers.config.hypervisor.network_queues".
And the annotation will be mapped into `NetworkInfo.network_queues`
within the configuration.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Alex Lyn
61e7875267 kata-types: Adjust the network_queues when load from configuration
Adjusts the network queues after loading from a configuration file.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-10 11:34:25 +08:00
Manuel Huber
a6ca5c6628 ci: add editorconfig checker
This adds a basic configuration for editorconfig checker. The
supplied configuration checks against trailing whitespaces and
issues with newlines.
Example:
| tools/packaging/kernel/configs/fragments/x86_64/numa.conf:
|       Wrong line endings or no final newline
| tools/packaging/release/generate_vendor.sh:
|       44: Trailing whitespace

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-09 15:03:26 -08:00
stevenhorsman
e6d291cf0a trace-forwarder: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
79dc892e18 kata-ctl: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
9e1ddcdde9 agent-ctl: Bump time to 0.3.47
Bump time to remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
f840f9ad54 rust: Bump time to 0.3.47
To remediate CVE-2026-25727

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
ffcb10b6a3 agent: Bump time crate to 0.3.47
Update time to resolve CVE-2026-25727.
Note: this involved bumping the versions of slog-term and slog-json
and bumping the MSRV to 1.88.0 which time 0.3.47 requires.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:44:51 +01:00
stevenhorsman
33d494b07e kata-deploy: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
2ea29df99a genpolicy: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
fa3b419965 kata-ctl: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
e49a61eea2 agent: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
bc45788356 versions: Bump bytes to 1.11.1
To remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
stevenhorsman
51d35f9261 agent-ctl: Bump bytes to 1.11.1
Remediate CVE-2026-25541

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 21:43:23 +01:00
Park.Jiyeon
082e25b297 genpolicy: skip serializing VFIO generation-only settings
Skip serializing anno/value regexes and the NVIDIA VFIO device type since they
are generation-time only.

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Park.Jiyeon
9231144b99 genpolicy: refactor VFIO settings and support multiple NVIDIA GPU keys
- Moved VFIO-related config from "device_annotations" to a new "devices" section.
- Introduced structured "nvidia" subfield for NVIDIA-specific VFIO settings.
- Replaced hardcoded "nvidia.com/pgpu" with configurable "pgpu_resource_keys".
- Adjusted Rego rules and code to match new config schema.

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Park.Jiyeon
5fa5d1934b fix(genpolicy): make NVIDIA GPU resource keys configurable
Allow specifying multiple NVIDIA GPU resource keys via an explicit allowlist.

Keys are now configured under `device_annotations.vfio.nvidia_pgpu_resource_keys`
in genpolicy-settings.json. This removes the previous hardcoded reliance on
`nvidia.com/pgpu` and supports model-specific resource names.

Fixes #12322

Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
2026-02-09 11:36:34 -08:00
Manuel Huber
525192832f tests: Clean up superfluous GPU annotation
This annotation was required for GPU cold-plug before using a
newer device plugin and before querying the pod resources API.
As this annotation is no longer required, cleaning it up.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-09 11:28:24 -08:00
Konstantin Khlebnikov
5d99a141d9 runtime: add hypervisor options for NUMA topology
With enable_numa=true hypervisor will expose host NUMA topology as is:
map vm NUMA nodes to host 1:1 and bind vpus to relates CPUS.

Option "numa_mapping" allows to redefine NUMA nodes mapping:
- map each vm node to particular host node or several numa nodes
- emulate numa on host without numa (useful for tests)

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 20:09:25 +01:00
Fabiano Fidêncio
ab515712d4 kernel: Unify kernel and kernel-confidential
Build a single kernel for both kernel and kernel-confidential on x86_64
and s390x. The kernel is built with TEE support (-x) on those arches only.

This helps to simplilfy and to maintain the code, and having a single
kernel was the original plan since forever.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-09 18:28:23 +01:00
Fabiano Fidêncio
c5b5433866 kernel: Unify nvidia-gpu and nvidia-gpu-confidential
Build a single kernel for both nvidia-gpu and nvidia-gpu-confidential,
simplifying and reducing code maintenance.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-09 18:28:23 +01:00
Steve Horsman
f02fa79758 Merge pull request #12470 from jirimoravcik/docs/add-os-version
docs: add `OS_VERSION` to rootfs script
2026-02-09 15:06:14 +00:00
Alex Lyn
3fda59e27d tests: rename pod_exec_with_retries to pod_exec and update callers
It will do following works in this commit:

(1) Rename pod_exec_with_retries() to pod_exec().
(2) Update implementation to call container_exec().
(3) Replace all usages of pod_exec_with_retries across tests
    with pod_exec.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
861d39305c tests: drop kubectl exec retries in container_exec
This commit aims to drop retries when kubectl exec a container:

(1) Rename container_exec_with_retries() to container_exec().
(2) Remove the retry loop and sleep backoff around kubectl exec.

Keep the same logging and container-selection logic and return
kubectl exec exit status directly.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
41e8acbc5e runtime: Map empty ReadStdout/ReadStderr response to io.EOF
After the kata-agent "drain-after-exit" change, stdout/stderr EOF is
signaled by a successful ReadStdout/ReadStderr reply with empty Data
(len==0), instead of an RPC error. However, runtime-go currently
returns (0, nil) to io.CopyBuffer() when resp.Data is empty, which
violates Go io.Reader semantics and can cause `kubectl exec` to
hang after the command output is already printed.

To avoid exec hang:
In readProcessStream(), map an empty response (len(resp.Data)==0)
into (0, io.EOF). This allows the stdout/stderr copy goroutines to
terminate, closes exitIOch, and unblocks the wait path so exec can
complete normally.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
ffb8a6a9c3 agent: fix misleading tokio::select! biased comment in do_read_stream
The previous comment incorrectly implied that `biased` prevents data
loss and the exit notifier would never be polled before all buffered
data is read. And the detailed info can be seen from the document:
https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67

Tokio's `biased` only makes polling order deterministic(top-to-bottom)
when multiple branches are ready in the same poll, and it makes fairness
the caller's responsibility. Output can still be truncated if the exit
notification becomes ready while `read_stream` is pending.

This change updates the comment to reflect the actual semantics and
caveats. No functional behavior change.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
1080f6d87e agent: Introduce drain after exit mechanism to address truncation race
Short-lived processes (e.g., `kubectl exec echo`) in legacy-io mode
occasionally lose the last segments of their output.

The root cause is a race condition where the `term_exit_notifier`
triggers before the pipe buffers are fully drained. In the previous
implementation, once the exit notification was received, the agent
immediately returned an EOF, causing the runtime's `run_io_copy` to
terminate and drop any residual data in the pipe.

This patch introduces a "drain after exit" mechanism:
- Upon receiving an exit notification, the agent enters a 500ms window
  for polling `read_streaim` to flush remaining data from the buffer.
- A true EOF is only returned if the stream is confirmed empty or the
  timeout is reached.

This ensures reliable output delivery for transient exec tasks under
high concurrency.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
Alex Lyn
700bddeecc agent: treat EOF as normal for read_stdout/stderr stream
Legacy IO uses shim polling via read_stdout/read_stderr. The agent
previously mapped pipe EOF (read() == 0) and term_exit_notifier to
errors ("read meet eof"/"eof"), which became ttrpc INTERNAL failures.
This caused runtime IO copy to abort early, leading to lost
stdout/stderr for short-lived exec (e.g."echo") and spurious failures.

Normalize EOF semantics: read_stream now returns Ok(empty) on EOF
instead of Err("read meet eof").

This makes legacy IO behave like a proper stream: data until EOF, no
INTERNAL errors for normal termination.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2026-02-09 15:56:13 +01:00
stevenhorsman
b909c41128 runtime: Bump x/net to v0.49.0
Bump x/net to resolve CVEs:
- GO-2026-4441
- GO-2026-4440

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 14:49:31 +01:00
stevenhorsman
b29312289f versions: Bump go to 1.24.13
Bump go to 1.24.13 to fix CVE GO-2026-4337

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2026-02-09 14:49:31 +01:00
Zvonko Kaiser
7af306de13 agent: Update aarch64 create_pci_root_bus_path
aarch64 is also a supported architecture for NUMA.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Zvonko Kaiser
8185c015ad gpu: Add Agent NUMA Support 1 of N
We're introducing a root_complex to assign each
and every device to a NUMA node or to the default
root_complex="00" aka pcie.0. This patch introduces
the proper handling of the current  qom path being
bus/device == "00/02" with NUMAA we need to extend it
with the root_complex/bus/device == "10/00/02".

We're defaulting to root_complex="00".

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2026-02-09 10:19:41 +01:00
Alex Lyn
16a7ed6e14 Merge pull request #12464 from mythi/runtime-rs-tdvf
runtime-rs: use FIRMWARETDVFPATH like Go runtime
2026-02-09 09:12:52 +08:00
Mikko Ylinen
4088881662 runtime-rs: use FIRMWARETDVFPATH like Go runtime
Use OVMF path configuration for Intel TDX consistently:

$ git grep FIRMWARETD
src/runtime-rs/Makefile:FIRMWARETDXPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd
src/runtime-rs/Makefile:USER_VARS += FIRMWARETDXPATH
src/runtime-rs/config/configuration-qemu-tdx-runtime-rs.toml.in:firmware = "@FIRMWARETDXPATH@"
src/runtime/Makefile:FIRMWARETDVFPATH := $(PREFIXDEPS)/share/ovmf/OVMF.inteltdx.fd

Go runtime has used *TDVF* so just make runtime-rs to follow. This
keeps the behavior consistent when downstreams switch from Go runtime
to runtime-rs.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-08 21:38:06 +01:00
Jiri Moravcik
d5840149d2 docs: add OS_VERSION to rootfs script
The OS_VERSION is required when trying to build RootFS with ubuntu distro.

Fixes #12469

Signed-off-by: Jiri Moravcik <jiri.moravcik@gmail.com>
2026-02-08 21:21:59 +01:00
Manuel Huber
d9d1073cf1 gpu: Install packages for devkit
Introduce a new function to install additional packages into the
devkit flavor. With modprobe, we avoid errors on pod startup
related to loading nvidia kernel modules in the NVRC phase.
Note, the production flavor gets modprobe from busybox, see its
configuration file containing CONFIG_MODPROBE=y.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-06 09:58:32 +01:00
Manuel Huber
a786582d0b rootfs: deprecate initramfs dm-verity mode
Remove the initramfs folder, its build steps, and use the kernel
based dm-verity enforcement for the handlers which used the
initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
cf7f340b39 tests: Read and overwrite kernel_verity_parameters
Read the kernel_verity_paramers from the shim config and adjust
the root hash for the negative test.
Further, improve some of the test logic by using shared
functions. This especially ensures we don't read the full
journalctl logs on a node but only the portion of the logs we are
actually supposed to look at.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
7958be8634 runtime: Make kernel_verity_params overwritable
Similar to the kernel_params annotation, add a
kernel_verity_params annotation and add logic to make these
parameters overwritable. For instance, this can be used in test
logic to provide bogus dm-verity hashes for negative tests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
7700095ea8 runtime-rs: Make kernel_verity_params overwritable
Similar to the kernel_params annotation, add a
kernel_verity_params annotation and add logic to make these
parameters overwritable. For instance, this can be used in test
logic to provide bogus dm-verity hashes for negative tests.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
472b50fa42 runtime-rs: Enable kernelinit dm-verity variant
This change introduces the kernel_verity_parameters knob to the
rust based shim, picking up dm-verity information in a new config
field (the corresponding build variable is already produced by
the shim build). The change extends the shim to parse dm-verity
information from this parameter and to construct the kernel command
line appropriately, based on the indicated initramfs or kernelinit
build variant.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f639c3fa17 runtime: Enable kernelinit dm-verity variant
This change introduces the kernel_verity_parameters knob to the
Go based shim, picking up dm-verity information in a new config
field (the corresponding build variable is already produced by
the shim build). The change extends the shim to parse dm-verity
information from this parameter and to construct the kernel command
line appropriately, based on the indicated initramfs or kernelinit
build variant.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
e120dd4cc6 tests: cc: Remove quotes from kernel command line
With dm-mod.create parameters using quotes, we remove the
backslashes used to escape these quotes from the output we
retrieve. This will enable attestation tests to work with the
kernelinit dm-verity mode.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
976df22119 rootfs: Change condition for cryptsetup-bin
Measured rootfs mode and CDH secure storage feature require the
cryptsetup-bin and e2fsprogs components in the guest.
This change makes this more explicity - confidential guests are
users of the CDH secure container image layer storage feature.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
a3c4e0b64f rootfs: Introduce kernelinit dm-verity mode
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. Avoiding this, no
cumbersome escape logic is required as we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only relevant dm-verity parameters in semi-structured manner
(see above). The only place where the final command line is
assembled is in the shims. Further, this is a line easy to comment
out for developers to disable dm-verity enforcement (or for CI
tasks).

This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change alone should thus not be used as dm-verity
information will no longer be picked up by the shims.

systemd-analyze on the coco-dev handler shows that using the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average of 172.5ms was reduced to 141ms (4 measurements, each
with a basic pod manifest), i.e., the kernel phase duration is
improved by about 18 percent.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
83a0bd1360 gpu: use dm-verity for the non-TEE GPU handler
Use a dm-verity protected rootfs image for the non-TEE NVIDIA
GPU handler as well.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
02ed4c99bc rootfs: Use maxdepth=1 to search for kata tarballs
These tarballs are in the top layer of the build directory,
no need to traverse all sub-directories.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
d37db5f068 rootfs: Restore "gpu: Handle root_hash.txt ..."
This reverts commit 923f97bc66 in
order to re-instantiate the logic from commit
e4a13b9a4a.

The latter commit was previously reverted due to the NVIDIA GPU TEE
handler using an initrd, not an image.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
f1ca547d66 initramfs: introduce log function
Log to /dev/kmsg, this way logs will show up and not get lost.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
6d0bb49716 runtime: nvidia: Use img and sanitize whitespaces
Shift NVIDIA shim configurations to use an image instead of an initrd,
and remove trailing whitespaces from the configs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Manuel Huber
282014000f tests: cc: support initrd, image for attestation
Allow using an image instead of an initrd. For confidential
guests using images, the assumption is that the guest kernel uses
dm-verity protection, implicitly measuring the rootfs image via
the kernel command line's dm-verity information.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-05 23:04:35 +01:00
Greg Kurz
e430b2641c Merge pull request #12435 from bpradipt/crio-annotation
shim: Add CRI-O annotation support for device cold plug
2026-02-05 09:29:19 +01:00
Alex Lyn
e257430976 Merge pull request #12433 from manuelh-dev/mahuber/cfg-sanitize-whitespaces
runtimes: Sanitize trailing whitespaces
2026-02-05 09:31:21 +08:00
Fabiano Fidêncio
dda1b30c34 tests: nvidia-nim: Use sealed secrets for NGC_API_KEY
Convert the NGC_API_KEY from a regular Kubernetes secret to a sealed
secret for the CC GPU tests. This ensures the API key is only accessible
within the confidential enclave after successful attestation.

The sealed secret uses the "vault" type which points to a resource stored
in the Key Broker Service (KBS). The Confidential Data Hub (CDH) inside
the guest will unseal this secret by fetching it from KBS after
attestation.

The initdata file is created AFTER create_tmp_policy_settings_dir()
copies the empty default file, and BEFORE auto_generate_policy() runs.
This allows genpolicy to add the generated policy.rego to our custom
CDH configuration.

The sealed secret format follows the CoCo specification:
sealed.<JWS header>.<JWS payload>.<signature>

Where the payload contains:
- version: "0.1.0"
- type: "vault" (pointer to KBS resource)
- provider: "kbs"
- resource_uri: KBS path to the actual secret

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:34:44 +01:00
Fabiano Fidêncio
c9061f9e36 tests: kata-deploy: Increase post-deployment wait time
Increase the sleep time after kata-deploy deployment from 10s to 60s
to give more time for runtimes to be configured. This helps avoid
race conditions on slower K8s distributions like k3s where the
RuntimeClass may not be immediately available after the DaemonSet
rollout completes.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
0fb2c500fd tests: kata-deploy: Merge E2E tests to avoid timing issues
Merge the two E2E tests ("Custom RuntimeClass exists with correct
properties" and "Custom runtime can run a pod") into a single test, as
those 2 are very much dependent of each other.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
fef93f1e08 tests: kata-deploy: Use die() instead of fail() for error handling
Replace fail() calls with die() which is already provided by
common.bash. The fail() function doesn't exist in the test
infrastructure, causing "command not found" errors when tests fail.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 12:13:53 +01:00
Fabiano Fidêncio
f90c12d4df kata-deploy: Avoid text file busy error with nydus-snapshotter
We cannot overwrtie a binary that's currently in use, and that's the
reason that elsewhere we remove / unlink the binary (the running process
keeps its file descriptor, so we're good doing that) and only then we
copy the binary.  However, we missed doing this for the
nydus-snapshotter deployment.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-04 10:24:49 +01:00
Manuel Huber
30c7325e75 runtimes: Sanitize trailing whitespaces
Clean up trailing whitespaces, making life easier for those who
have configured their IDE to clean these up.
Suggest to not add new code with trailing whitespaces etc.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-02-03 11:46:30 -08:00
Steve Horsman
30494abe48 Merge pull request #12426 from kata-containers/dependabot/github_actions/zizmorcore/zizmor-action-0.4.1
build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1
2026-02-03 14:38:54 +00:00
Pradipta Banerjee
8a449d358f shim: Add CRI-O annotation support for device cold plug
Add support for CRI-O annotations when fetching pod identifiers for
device cold plug. The code now checks containerd CRI annotations first,
then falls back to CRI-O annotations if they are empty.

This enables device cold plug to work with both containerd and CRI-O
container runtimes.

Annotations supported:
- containerd: io.kubernetes.cri.sandbox-name, io.kubernetes.cri.sandbox-namespace
- CRI-O: io.kubernetes.cri-o.KubeName, io.kubernetes.cri-o.Namespace

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
2026-02-03 04:51:15 +00:00
Steve Horsman
6bb77a2f13 Merge pull request #12390 from mythi/tdx-updates-2026-2
runtime: tdx QEMU configuration changes
2026-02-02 16:58:44 +00:00
Zvonko Kaiser
6702b48858 Merge pull request #12428 from fidencio/topic/nydus-snapshotter-start-from-a-clean-state
kata-deploy: nydus: Always start from a clean state
2026-02-02 11:21:26 -05:00
Steve Horsman
0530a3494f Merge pull request #12415 from nlle/make-helm-updatestrategy-configurable
kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
2026-02-02 10:29:01 +00:00
Steve Horsman
93dcaee965 Merge pull request #12423 from manuelh-dev/mahuber/pause-build-fix
packaging: Delete pause_bundle dir before unpack
2026-02-02 10:26:30 +00:00
Fabiano Fidêncio
62ad0814c5 kata-deploy: nydus: Always start from a clean state
Clean up existing nydus-snapshotter state to ensure fresh start with new
version.

This is safe across all K8s distributions (k3s, rke2, k0s, microk8s,
etc.) because we only touch the nydus data directory, not containerd's
internals.

When containerd tries to use non-existent snapshots, it will
re-pull/re-unpack.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2026-02-02 11:06:37 +01:00
Mikko Ylinen
870630c421 kata-deploy: drop custom TDX installation steps
As we have moved to use QEMU (and OVMF already earlier) from
kata-deploy, the custom tdx configurations and distro checks
are no longer needed.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:11:26 +02:00
Mikko Ylinen
927be7b8ad runtime: tdx: move to use QEMU from kata-deploy
Currently, a working TDX setup expects users to install special
TDX support builds from Canonical/CentOS virt-sig for TDX to
work. kata-deploy configured TDX runtime handler to use QEMU
from the distro's paths.

With TDX support now being available in upstream Linux and
Ubuntu 24.04 having an install candidate (linux-image-generic-6.17)
for a new enough kernel, move TDX configuration to use QEMU from
kata-deploy.

While this is the new default, going back to the original
setup is possible by making manual changes to TDX runtime handlers.

Note: runtime-rs is already using QEMUPATH for TDX.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2026-02-02 11:10:52 +02:00
Nikolaj Lindberg Lerche
6e98df2bac kata-deploy: Make update strategy configurable for kata-deploy DaemonSet
This Allows the updateStrategy to be configured for the kata-deploy helm
chart, this is enabling administrators to control the aggressiveness of
updates. For a less aggressive approach, the strategy can be set to
`OnDelete`. Alternatively, the update process can be made more
aggressive by adjusting the `maxUnavailable` parameter.

Signed-off-by: Nikolaj Lindberg Lerche <nlle@ambu.com>
2026-02-01 20:14:29 +01:00
Dan Mihai
d7ff54769c tests: policy: remove the need for using sudo
Modify the copy of root user's settings file, instead of modifying the
original file.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-01 20:09:50 +01:00
Dan Mihai
4d860dcaf5 tests: policy: avoid redundant debug output
Avoid redundant and confusing teardown_common() debug output for
k8s-policy-pod.bats and k8s-policy-pvc.bats.

The Policy tests skip the Message field when printing information about
their pods, because unfortunately that field might contain a truncated
Policy log - for the test cases that intentiocally cause Policy
failures. The non-truncated Policy log is already available from other
"kubectl describe" fields.

So, avoid the redundant pod information from teardown_common(), that
also included the confusing Message field.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2026-02-01 20:09:50 +01:00
dependabot[bot]
dc8d9e056d build(deps): bump zizmorcore/zizmor-action from 0.2.0 to 0.4.1
Bumps [zizmorcore/zizmor-action](https://github.com/zizmorcore/zizmor-action) from 0.2.0 to 0.4.1.
- [Release notes](https://github.com/zizmorcore/zizmor-action/releases)
- [Commits](e673c3917a...135698455d)

---
updated-dependencies:
- dependency-name: zizmorcore/zizmor-action
  dependency-version: 0.4.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-02-01 15:08:10 +00:00
Manuel Huber
8b0c199f43 packaging: Delete pause_bundle dir before unpack
Delete the pause_bundle directory before running the umoci unpack
operation. This will make builds idempotent and not fail with
errors like "create runtime bundle: config.json already exists in
.../build/pause-image/destdir/pause_bundle". This will make life
better when building locally.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-31 19:43:11 +01:00
Steve Horsman
4d1095e653 Merge pull request #12350 from manuelh-dev/mahuber/term-grace-period
tests: Remove terminationGracePeriod in manifests
2026-01-29 15:17:17 +00:00
Manuel Huber
6438fe7f2d tests: Remove terminationGracePeriod in manifests
Do not kill containers immediately, instead use Kubernetes'
default termination grace period.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-01-23 16:18:44 -08:00
577 changed files with 10634 additions and 7391 deletions

7
.editorconfig Normal file
View File

@@ -0,0 +1,7 @@
root = true
[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true

View File

@@ -0,0 +1,30 @@
{
"Verbose": false,
"Debug": false,
"IgnoreDefaults": false,
"SpacesAfterTabs": false,
"NoColor": false,
"Exclude": [
"src/runtime/vendor",
"src/tools/log-parser/vendor",
"tests/metrics/cmd/checkmetrics/vendor",
"tests/vendor",
"src/runtime/virtcontainers/pkg/cloud-hypervisor/client",
"\\.img$",
"\\.dtb$",
"\\.drawio$",
"\\.svg$",
"\\.patch$"
],
"AllowedContentTypes": [],
"PassedFiles": [],
"Disable": {
"EndOfLine": false,
"Indentation": false,
"IndentSize": false,
"InsertFinalNewline": false,
"TrimTrailingWhitespace": false,
"MaxLineLength": false,
"Charset": false
}
}

View File

@@ -17,7 +17,7 @@ runs:
uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: nightly
toolchain: nightly
override: true
- name: Cache

View File

@@ -47,6 +47,23 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
env:
@@ -347,7 +364,7 @@ jobs:
path: kata-tools-artifacts
- name: Install kata & kata-tools
run: |
run: |
bash tests/functional/kata-agent-apis/gha-run.sh install-kata kata-artifacts
bash tests/functional/kata-agent-apis/gha-run.sh install-kata-tools kata-tools-artifacts

View File

@@ -47,8 +47,25 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install dependencies
run: bash tests/integration/cri-containerd/gha-run.sh
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies
env:
GH_TOKEN: ${{ github.token }}

View File

@@ -82,11 +82,17 @@ jobs:
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
- name: Read properties from versions.yaml
if: contains(matrix.component.needs, 'golang')
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
if: contains(matrix.component.needs, 'golang')
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Setup rust
if: contains(matrix.component.needs, 'rust')
run: |

View File

@@ -74,7 +74,7 @@ jobs:
- rust
- protobuf-compiler
instance:
- ${{ inputs.instance }}
- ${{ inputs.instance }}
steps:
- name: Adjust a permission for repo
@@ -94,11 +94,19 @@ jobs:
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
- name: Read properties from versions.yaml
if: contains(matrix.component.needs, 'golang')
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
if: contains(matrix.component.needs, 'golang')
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
# Setup-go doesn't work properly with ppc64le: https://github.com/actions/setup-go/issues/648
architecture: ${{ contains(inputs.instance, 'ppc64le') && 'ppc64le' || '' }}
- name: Setup rust
if: contains(matrix.component.needs, 'rust')
run: |

View File

@@ -47,10 +47,8 @@ jobs:
- coco-guest-components
- firecracker
- kernel
- kernel-confidential
- kernel-dragonball-experimental
- kernel-nvidia-gpu
- kernel-nvidia-gpu-confidential
- nydus
- ovmf
- ovmf-sev
@@ -145,7 +143,7 @@ jobs:
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
if: ${{ matrix.asset == 'kernel' || startsWith(matrix.asset, 'kernel-nvidia-gpu') }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-amd64-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
@@ -237,8 +235,8 @@ jobs:
asset:
- busybox
- coco-guest-components
- kernel-modules
- kernel-nvidia-gpu-modules
- kernel-nvidia-gpu-confidential-modules
- pause-image
steps:
- uses: geekyeggo/delete-artifact@f275313e70c08f6120db482d7a6b98377786765b # v5.1.0

View File

@@ -44,7 +44,6 @@ jobs:
- agent
- coco-guest-components
- kernel
- kernel-confidential
- pause-image
- qemu
- virtiofsd
@@ -121,6 +120,15 @@ jobs:
retention-days: 15
if-no-files-found: error
- name: store-extratarballs-artifact ${{ matrix.asset }}
if: ${{ matrix.asset == 'kernel' }}
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
with:
name: kata-artifacts-s390x-${{ matrix.asset }}-modules${{ inputs.tarball-suffix }}
path: kata-build/kata-static-${{ matrix.asset }}-modules.tar.zst
retention-days: 15
if-no-files-found: error
build-asset-rootfs:
name: build-asset-rootfs
runs-on: s390x

View File

@@ -297,6 +297,21 @@ jobs:
AZ_TENANT_ID: ${{ secrets.AZ_TENANT_ID }}
AZ_SUBSCRIPTION_ID: ${{ secrets.AZ_SUBSCRIPTION_ID }}
run-k8s-tests-on-free-runner:
if: ${{ inputs.skip-test != 'yes' }}
needs: publish-kata-deploy-payload-amd64
permissions:
contents: read
uses: ./.github/workflows/run-k8s-tests-on-free-runner.yaml
with:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
commit-hash: ${{ inputs.commit-hash }}
pr-number: ${{ inputs.pr-number }}
target-branch: ${{ inputs.target-branch }}
run-k8s-tests-on-arm64:
if: ${{ inputs.skip-test != 'yes' }}
needs: publish-kata-deploy-payload-arm64

View File

@@ -31,10 +31,22 @@ jobs:
with:
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install Rust
run: ./tests/install_rust.sh

View File

@@ -24,10 +24,22 @@ jobs:
fetch-depth: 0
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Docs URL Alive Check
run: |

View File

@@ -0,0 +1,29 @@
name: EditorConfig checker
on:
pull_request:
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
editorconfig-checker:
name: editorconfig-checker
runs-on: ubuntu-24.04
steps:
- name: Checkout the code
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: false
- name: Set up editorconfig-checker
uses: editorconfig-checker/action-editorconfig-checker@4b6cd6190d435e7e084fb35e36a096e98506f7b9 # v2.1.0
with:
version: v3.6.1
- name: Run editorconfig-checker
run: editorconfig-checker

View File

@@ -27,10 +27,22 @@ jobs:
fetch-depth: 0
persist-credentials: false
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "${GITHUB_PATH}"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install govulncheck
run: |

View File

@@ -0,0 +1,43 @@
# Push gperf and busybox tarballs to the ORAS cache (ghcr.io) so that
# download-with-oras-cache.sh can pull them instead of hitting upstream.
# Runs when versions.yaml changes on main (e.g. after a PR merge) or manually.
name: CI | Push ORAS tarball cache
on:
push:
branches:
- main
paths:
- 'versions.yaml'
workflow_dispatch:
permissions: {}
jobs:
push-oras-cache:
name: push-oras-cache
runs-on: ubuntu-22.04
permissions:
contents: read
packages: write
steps:
- name: Checkout repository
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
fetch-depth: 0
persist-credentials: false
- name: Install yq
run: ./ci/install_yq.sh
- name: Install ORAS
uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4
with:
version: "1.2.0"
- name: Populate ORAS tarball cache
run: ./tools/packaging/scripts/populate-oras-tarball-cache.sh all
env:
ARTEFACT_REGISTRY: ghcr.io
ARTEFACT_REPOSITORY: kata-containers
ARTEFACT_REGISTRY_USERNAME: ${{ github.actor }}
ARTEFACT_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -55,6 +55,23 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install yq
run: |
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install dependencies
timeout-minutes: 15
run: bash tests/integration/cri-containerd/gha-run.sh install-dependencies

View File

@@ -42,17 +42,6 @@ jobs:
strategy:
fail-fast: false
matrix:
host_os:
- ubuntu
vmm:
- clh
- dragonball
- qemu
- qemu-runtime-rs
- cloud-hypervisor
instance-type:
- small
- normal
include:
- host_os: cbl-mariner
vmm: clh
@@ -80,6 +69,7 @@ jobs:
KUBERNETES: "vanilla"
K8S_TEST_HOST_TYPE: ${{ matrix.instance-type }}
GENPOLICY_PULL_METHOD: ${{ matrix.genpolicy-pull-method }}
RUNS_ON_AKS: "true"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:

View File

@@ -0,0 +1,127 @@
# Run Kubernetes integration tests on free GitHub runners with a locally
# deployed cluster (kubeadm).
name: CI | Run kubernetes tests on free runner
on:
workflow_call:
inputs:
tarball-suffix:
required: false
type: string
registry:
required: true
type: string
repo:
required: true
type: string
tag:
required: true
type: string
pr-number:
required: true
type: string
commit-hash:
required: false
type: string
target-branch:
required: false
type: string
default: ""
permissions: {}
jobs:
run-k8s-tests:
name: run-k8s-tests
strategy:
fail-fast: false
matrix:
environment: [
{ vmm: clh, containerd_version: lts },
{ vmm: clh, containerd_version: active },
{ vmm: dragonball, containerd_version: lts },
{ vmm: dragonball, containerd_version: active },
{ vmm: qemu, containerd_version: lts },
{ vmm: qemu, containerd_version: active },
{ vmm: qemu-runtime-rs, containerd_version: lts },
{ vmm: qemu-runtime-rs, containerd_version: active },
{ vmm: cloud-hypervisor, containerd_version: lts },
{ vmm: cloud-hypervisor, containerd_version: active },
]
runs-on: ubuntu-24.04
permissions:
contents: read
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HOST_OS: ubuntu
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KUBERNETES: vanilla
K8S_TEST_HOST_TYPE: baremetal-no-attestation
CONTAINER_ENGINE: containerd
CONTAINER_ENGINE_VERSION: ${{ matrix.environment.containerd_version }}
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Remove unnecessary directories to free up space
run: |
sudo rm -rf /usr/local/.ghcup
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/local/julia*
sudo rm -rf /opt/az
sudo rm -rf /usr/local/share/chromium
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/google
sudo rm -rf /usr/lib/firefox
- name: Deploy k8s (kubeadm)
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Deploy Kata
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
- name: Run tests
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup

View File

@@ -57,10 +57,24 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: Install golang
- name: Install yq
run: |
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Read properties from versions.yaml
run: |
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
# Setup-go doesn't work properly with ppc64le: https://github.com/actions/setup-go/issues/648
architecture: ${{ contains(inputs.instance, 'ppc64le') && 'ppc64le' || '' }}
- name: Prepare the runner for k8s test suite
run: bash "${HOME}/scripts/k8s_cluster_prepare.sh"

View File

@@ -140,165 +140,36 @@ jobs:
strategy:
fail-fast: false
matrix:
vmm:
- qemu-coco-dev
- qemu-coco-dev-runtime-rs
snapshotter:
- nydus
pull-type:
- guest-pull
include:
- pull-type: experimental-force-guest-pull
vmm: qemu-coco-dev
snapshotter: ""
runs-on: ubuntu-22.04
environment: [
{ vmm: qemu-coco-dev, snapshotter: nydus, pull_type: guest-pull },
{ vmm: qemu-coco-dev-runtime-rs, snapshotter: nydus, pull_type: guest-pull },
{ vmm: qemu-coco-dev, snapshotter: "", pull_type: experimental-force-guest-pull },
]
runs-on: ubuntu-24.04
permissions:
id-token: write # Used for OIDC access to log into Azure
contents: read
environment: ci
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
# Some tests rely on that variable to run (or not)
KBS: "true"
# Set the KBS ingress handler (empty string disables handling)
KBS_INGRESS: "aks"
KBS_INGRESS: "nodeport"
KUBERNETES: "vanilla"
PULL_TYPE: ${{ matrix.pull-type }}
PULL_TYPE: ${{ matrix.environment.pull_type }}
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.pull-type == 'experimental-force-guest-pull' && matrix.vmm || '' }}
# Caution: current ingress controller used to expose the KBS service
# requires much vCPUs, lefting only a few for the tests. Depending on the
# host type chose it will result on the creation of a cluster with
# insufficient resources.
SNAPSHOTTER: ${{ matrix.environment.snapshotter }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.pull_type == 'experimental-force-guest-pull' && matrix.environment.vmm || '' }}
AUTO_GENERATE_POLICY: "yes"
K8S_TEST_HOST_TYPE: "all"
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Log into the Azure account
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0
with:
client-id: ${{ secrets.AZ_APPID }}
tenant-id: ${{ secrets.AZ_TENANT_ID }}
subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}
- name: Create AKS cluster
uses: nick-fields/retry@ce71cc2ab81d554ebbe88c79ab5975992d79ba08 # v3.0.2
with:
timeout_minutes: 15
max_attempts: 20
retry_on: error
retry_wait_seconds: 10
command: bash tests/integration/kubernetes/gha-run.sh create-cluster
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Install `kubectl`
uses: azure/setup-kubectl@776406bce94f63e41d621b960d78ee25c8b76ede # v4.0.1
with:
version: 'latest'
- name: Download credentials for the Kubernetes CLI to use them
run: bash tests/integration/kubernetes/gha-run.sh get-cluster-credentials
- name: Deploy Kata
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks
env:
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ env.SNAPSHOTTER == 'nydus' }}
AUTO_GENERATE_POLICY: ${{ env.PULL_TYPE == 'experimental-force-guest-pull' && 'no' || 'yes' }}
- name: Deploy CoCo KBS
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
- name: Install `kbs-client`
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0
with:
client-id: ${{ secrets.AZ_APPID }}
tenant-id: ${{ secrets.AZ_TENANT_ID }}
subscription-id: ${{ secrets.AZ_SUBSCRIPTION_ID }}
- name: Delete AKS cluster
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh delete-cluster
# Generate jobs for testing CoCo on non-TEE environments with erofs-snapshotter
run-k8s-tests-coco-nontee-with-erofs-snapshotter:
name: run-k8s-tests-coco-nontee-with-erofs-snapshotter
strategy:
fail-fast: false
matrix:
vmm:
- qemu-coco-dev
snapshotter:
- erofs
pull-type:
- default
runs-on: ubuntu-24.04
environment: ci
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
# Some tests rely on that variable to run (or not)
KBS: "false"
# Set the KBS ingress handler (empty string disables handling)
KBS_INGRESS: ""
KUBERNETES: "vanilla"
CONTAINER_ENGINE: "containerd"
CONTAINER_ENGINE_VERSION: "v2.2"
PULL_TYPE: ${{ matrix.pull-type }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: "true"
K8S_TEST_HOST_TYPE: "all"
# We are skipping the auto generated policy tests for now,
# but those should be enabled as soon as we work on that.
AUTO_GENERATE_POLICY: "no"
CONTAINER_ENGINE_VERSION: "active"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
@@ -342,8 +213,129 @@ jobs:
- name: Deploy kubernetes
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
- name: Deploy Kata
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
GH_TOKEN: ${{ github.token }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ matrix.environment.snapshotter == 'nydus' }}
- name: Deploy CoCo KBS
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh deploy-coco-kbs
- name: Install `kbs-client`
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh install-kbs-client
- name: Deploy CSI driver
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh deploy-csi-driver
- name: Run tests
timeout-minutes: 80
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CoCo KBS
if: always()
timeout-minutes: 10
run: bash tests/integration/kubernetes/gha-run.sh delete-coco-kbs
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver
# Generate jobs for testing CoCo on non-TEE environments with erofs-snapshotter
run-k8s-tests-coco-nontee-with-erofs-snapshotter:
name: run-k8s-tests-coco-nontee-with-erofs-snapshotter
strategy:
fail-fast: false
matrix:
vmm:
- qemu-coco-dev
snapshotter:
- erofs
pull-type:
- default
runs-on: ubuntu-24.04
environment: ci
env:
DOCKER_REGISTRY: ${{ inputs.registry }}
DOCKER_REPO: ${{ inputs.repo }}
DOCKER_TAG: ${{ inputs.tag }}
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.vmm }}
# Some tests rely on that variable to run (or not)
KBS: "false"
# Set the KBS ingress handler (empty string disables handling)
KBS_INGRESS: ""
KUBERNETES: "vanilla"
CONTAINER_ENGINE: "containerd"
CONTAINER_ENGINE_VERSION: "active"
PULL_TYPE: ${{ matrix.pull-type }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: "true"
K8S_TEST_HOST_TYPE: "all"
# We are skipping the auto generated policy tests for now,
# but those should be enabled as soon as we work on that.
AUTO_GENERATE_POLICY: "no"
GH_TOKEN: ${{ github.token }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
ref: ${{ inputs.commit-hash }}
fetch-depth: 0
persist-credentials: false
- name: Rebase atop of the latest target branch
run: |
./tests/git-helper.sh "rebase-atop-of-the-latest-target-branch"
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tools-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-tools-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-tools-artifacts
- name: Install kata-tools
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-tools-artifacts
- name: Remove unnecessary directories to free up space
run: |
sudo rm -rf /usr/local/.ghcup
sudo rm -rf /opt/hostedtoolcache/CodeQL
sudo rm -rf /usr/local/lib/android
sudo rm -rf /usr/share/dotnet
sudo rm -rf /opt/ghc
sudo rm -rf /usr/local/share/boost
sudo rm -rf /usr/lib/jvm
sudo rm -rf /usr/share/swift
sudo rm -rf /usr/local/share/powershell
sudo rm -rf /usr/local/julia*
sudo rm -rf /opt/az
sudo rm -rf /usr/local/share/chromium
sudo rm -rf /opt/microsoft
sudo rm -rf /opt/google
sudo rm -rf /usr/lib/firefox
- name: Deploy kubernetes
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh deploy-k8s
- name: Install `bats`
run: bash tests/integration/kubernetes/gha-run.sh install-bats
@@ -363,3 +355,13 @@ jobs:
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
timeout-minutes: 15
run: bash tests/integration/kubernetes/gha-run.sh cleanup
- name: Delete CSI driver
if: always()
timeout-minutes: 5
run: bash tests/integration/kubernetes/gha-run.sh delete-csi-driver

View File

@@ -126,11 +126,15 @@ jobs:
./ci/install_yq.sh
env:
INSTALL_IN_GOPATH: false
- name: Install golang
- name: Read properties from versions.yaml
run: |
cd "${GOPATH}/src/github.com/${GITHUB_REPOSITORY}"
./tests/install_go.sh -f -p
echo "/usr/local/go/bin" >> "$GITHUB_PATH"
go_version="$(yq '.languages.golang.version' versions.yaml)"
[ -n "$go_version" ]
echo "GO_VERSION=${go_version}" >> "$GITHUB_ENV"
- name: Setup Golang version ${{ env.GO_VERSION }}
uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
with:
go-version: ${{ env.GO_VERSION }}
- name: Install system dependencies
run: |
sudo apt-get update && sudo apt-get -y install moreutils hunspell hunspell-en-gb hunspell-en-us pandoc

View File

@@ -21,7 +21,7 @@ jobs:
persist-credentials: false
- name: Run zizmor
uses: zizmorcore/zizmor-action@e673c3917a1aef3c65c972347ed84ccd013ecda4 # v0.2.0
uses: zizmorcore/zizmor-action@135698455da5c3b3e55f73f4419e481ab68cdd95 # v0.4.1
with:
advanced-security: false
annotations: true

251
Cargo.lock generated
View File

@@ -44,9 +44,7 @@ version = "0.1.0"
dependencies = [
"anyhow",
"async-trait",
"futures 0.1.31",
"kata-types",
"log",
"logging",
"nix 0.26.4",
"oci-spec 0.8.3",
@@ -141,23 +139,12 @@ version = "0.7.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "435a87a52755b8f27fcf321ac4f04b2802e337c8c4872923137471ec39c37532"
dependencies = [
"event-listener 5.4.1",
"event-listener",
"event-listener-strategy",
"futures-core",
"pin-project-lite",
]
[[package]]
name = "async-channel"
version = "1.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "81953c529336010edd6d8e358f886d9581267795c61b19475b71314bffa46d35"
dependencies = [
"concurrent-queue",
"event-listener 2.5.3",
"futures-core",
]
[[package]]
name = "async-channel"
version = "2.5.0"
@@ -184,21 +171,6 @@ dependencies = [
"slab",
]
[[package]]
name = "async-global-executor"
version = "2.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "05b1b633a2115cd122d73b955eadd9916c18c8f510ec9cd1686404c60ad1c29c"
dependencies = [
"async-channel 2.5.0",
"async-executor",
"async-io",
"async-lock",
"blocking",
"futures-lite",
"once_cell",
]
[[package]]
name = "async-io"
version = "2.6.0"
@@ -223,7 +195,7 @@ version = "3.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5fd03604047cee9b6ce9de9f70c6cd540a0520c813cbd49bae61f33ab80ed1dc"
dependencies = [
"event-listener 5.4.1",
"event-listener",
"event-listener-strategy",
"pin-project-lite",
]
@@ -234,14 +206,14 @@ version = "2.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc50921ec0055cdd8a16de48773bfeec5c972598674347252c0399676be7da75"
dependencies = [
"async-channel 2.5.0",
"async-channel",
"async-io",
"async-lock",
"async-signal",
"async-task",
"blocking",
"cfg-if 1.0.0",
"event-listener 5.4.1",
"event-listener",
"futures-lite",
"rustix 1.1.2",
]
@@ -275,32 +247,6 @@ dependencies = [
"windows-sys 0.61.2",
]
[[package]]
name = "async-std"
version = "1.13.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2c8e079a4ab67ae52b7403632e4618815d6db36d2a010cfe41b02c1b1578f93b"
dependencies = [
"async-channel 1.9.0",
"async-global-executor",
"async-io",
"async-lock",
"crossbeam-utils",
"futures-channel",
"futures-core",
"futures-io",
"futures-lite",
"gloo-timers",
"kv-log-macro",
"log",
"memchr",
"once_cell",
"pin-project-lite",
"pin-utils",
"slab",
"wasm-bindgen-futures",
]
[[package]]
name = "async-task"
version = "4.7.1"
@@ -447,7 +393,7 @@ version = "1.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e83f8d02be6967315521be875afa792a316e28d57b5a2d401897e2a7921b7f21"
dependencies = [
"async-channel 2.5.0",
"async-channel",
"async-task",
"futures-io",
"futures-lite",
@@ -525,9 +471,9 @@ checksum = "14c189c53d098945499cdfa7ecc63567cf3886b3332b312a5b4585d8d3a6a610"
[[package]]
name = "bytes"
version = "1.5.0"
version = "1.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a2bd12c1caf447e69cd4528f47f94d203fd2582878ecb9e9465484c4148a8223"
checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33"
[[package]]
name = "caps"
@@ -644,29 +590,17 @@ dependencies = [
"containerd-shim-protos",
"kata-sys-util",
"kata-types",
"lazy_static",
"nix 0.26.4",
"oci-spec 0.8.3",
"persist",
"protobuf",
"protocols",
"resource",
"runtime-spec",
"serde_json",
"slog",
"slog-scope",
"strum 0.24.1",
"thiserror 1.0.48",
"tokio",
"ttrpc",
]
[[package]]
name = "common-path"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2382f75942f4b3be3690fe4f86365e9c853c1587d6ee58212cebf6e2a9ccd101"
[[package]]
name = "concurrent-queue"
version = "2.5.0"
@@ -711,7 +645,7 @@ dependencies = [
"async-trait",
"cgroups-rs 0.3.4",
"containerd-shim-protos",
"futures 0.3.28",
"futures",
"go-flag",
"lazy_static",
"libc",
@@ -1044,7 +978,6 @@ dependencies = [
"dbs-interrupt",
"dbs-utils",
"dbs-virtio-devices",
"downcast-rs",
"kvm-bindings",
"kvm-ioctls",
"libc",
@@ -1057,7 +990,6 @@ dependencies = [
"vfio-ioctls",
"virtio-queue",
"vm-memory",
"vmm-sys-util 0.11.1",
]
[[package]]
@@ -1074,7 +1006,6 @@ dependencies = [
name = "dbs-upcall"
version = "0.3.0"
dependencies = [
"anyhow",
"dbs-utils",
"dbs-virtio-devices",
"log",
@@ -1138,12 +1069,12 @@ dependencies = [
[[package]]
name = "deranged"
version = "0.3.11"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b42b6fa04a440b495c8b04d0e71b707c585f83cb9cb28cf8cd0d976c315e31b4"
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
dependencies = [
"powerfmt",
"serde",
"serde_core",
]
[[package]]
@@ -1269,12 +1200,6 @@ version = "0.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1435fa1053d8b2fbbe9be7e97eca7f33d37b28409959813daefc1446a14247f1"
[[package]]
name = "downcast-rs"
version = "1.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9ea835d29036a4087793836fa931b08837ad5e957da9e23886b29586fb9b6650"
[[package]]
name = "dragonball"
version = "0.1.0"
@@ -1295,7 +1220,6 @@ dependencies = [
"dbs-utils",
"dbs-virtio-devices",
"derivative",
"fuse-backend-rs",
"kvm-bindings",
"kvm-ioctls",
"lazy_static",
@@ -1350,6 +1274,18 @@ version = "1.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "66b7e2430c6dff6a955451e2cfc438f09cea1965a9d6f87f7e3b90decc014099"
[[package]]
name = "enum-as-inner"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1e6a265c649f3f5979b601d26f1d05ada116434c87741c9493cb56218f76cbc"
dependencies = [
"heck 0.5.0",
"proc-macro2",
"quote",
"syn 2.0.104",
]
[[package]]
name = "enumflags2"
version = "0.7.12"
@@ -1403,12 +1339,6 @@ dependencies = [
"windows-sys 0.61.2",
]
[[package]]
name = "event-listener"
version = "2.5.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0206175f82b8d6bf6652ff7d71a1e27fd2e4efde587fd368662814d6ec1d9ce0"
[[package]]
name = "event-listener"
version = "5.4.1"
@@ -1426,7 +1356,7 @@ version = "0.5.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8be9f3dfaaffdae2972880079a491a1a8bb7cbed0b8dd7a347f668b4150a3b93"
dependencies = [
"event-listener 5.4.1",
"event-listener",
"pin-project-lite",
]
@@ -1554,12 +1484,6 @@ dependencies = [
"vmm-sys-util 0.11.1",
]
[[package]]
name = "futures"
version = "0.1.31"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3a471a38ef8ed83cd6e40aa59c1ffe17db6855c18e3604d9c4ed8c08ebc28678"
[[package]]
name = "futures"
version = "0.3.28"
@@ -1719,18 +1643,6 @@ version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cc23270f6e1808e30a928bdc84dea0b9b4136a8bc82338574f23baf47bbd280"
[[package]]
name = "gloo-timers"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bbb143cf96099802033e0d4f4963b19fd2e0b728bcf076cd9cf7f6634f092994"
dependencies = [
"futures-channel",
"futures-core",
"js-sys",
"wasm-bindgen",
]
[[package]]
name = "go-flag"
version = "0.1.0"
@@ -1966,7 +1878,7 @@ dependencies = [
"crossbeam-channel",
"dbs-utils",
"dragonball",
"futures 0.3.28",
"futures",
"go-flag",
"hyper",
"hyperlocal",
@@ -1977,10 +1889,8 @@ dependencies = [
"libc",
"logging",
"nix 0.26.4",
"oci-spec 0.8.3",
"path-clean",
"persist",
"protobuf",
"protocols",
"qapi",
"qapi-qmp",
@@ -1992,7 +1902,6 @@ dependencies = [
"serde",
"serde_json",
"serial_test 2.0.0",
"shim-interface",
"slog",
"slog-scope",
"tempfile",
@@ -2269,8 +2178,6 @@ version = "0.1.0"
dependencies = [
"anyhow",
"byteorder",
"chrono",
"common-path",
"fail",
"hex",
"kata-types",
@@ -2279,11 +2186,9 @@ dependencies = [
"mockall",
"nix 0.26.4",
"oci-spec 0.8.3",
"once_cell",
"pci-ids",
"rand 0.8.5",
"runtime-spec",
"safe-path 0.1.0",
"serde",
"serde_json",
"slog",
@@ -2302,8 +2207,8 @@ dependencies = [
"byte-unit",
"flate2",
"glob",
"hex",
"lazy_static",
"nix 0.26.4",
"num_cpus",
"oci-spec 0.8.3",
"regex",
@@ -2314,18 +2219,10 @@ dependencies = [
"sha2 0.10.9",
"slog",
"slog-scope",
"sysctl",
"sysinfo",
"thiserror 1.0.48",
"toml 0.5.11",
]
[[package]]
name = "kv-log-macro"
version = "1.0.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0de8b303297635ad57c9f5059fd9cee7a47f8e8daa09df0fcd07dd39fb22977f"
dependencies = [
"log",
"toml",
]
[[package]]
@@ -2646,7 +2543,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b65d130ee111430e47eed7896ea43ca693c387f097dd97376bffafbf25812128"
dependencies = [
"bytes",
"futures 0.3.28",
"futures",
"log",
"netlink-packet-core",
"netlink-sys",
@@ -2660,7 +2557,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "416060d346fbaf1f23f9512963e3e878f1a78e707cb699ba9215761754244307"
dependencies = [
"bytes",
"futures 0.3.28",
"futures",
"libc",
"log",
"tokio",
@@ -2784,9 +2681,9 @@ dependencies = [
[[package]]
name = "num-conv"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51d515d32fb182ee37cda2ccdcb92950d6a3c2893aa280e540671c2cd0f3b1d9"
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
[[package]]
name = "num-traits"
@@ -2817,7 +2714,7 @@ dependencies = [
"log",
"serde",
"serde_json",
"toml 0.5.11",
"toml",
]
[[package]]
@@ -3044,7 +2941,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e785d273968748578931e4dc3b4f5ec86b26e09d9e0d66b55adda7fce742f7a"
dependencies = [
"async-trait",
"futures 0.3.28",
"futures",
"futures-executor",
"headers",
"http",
@@ -3212,11 +3109,9 @@ dependencies = [
"async-trait",
"kata-sys-util",
"kata-types",
"libc",
"safe-path 0.1.0",
"serde",
"serde_json",
"shim-interface",
]
[[package]]
@@ -3626,7 +3521,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7b047adab56acc4948d4b9b58693c1f33fd13efef2d6bb5f0f66a47436ceada8"
dependencies = [
"bytes",
"futures 0.3.28",
"futures",
"log",
"memchr",
"qapi-qmp",
@@ -3908,11 +3803,10 @@ dependencies = [
"agent",
"anyhow",
"async-trait",
"bitflags 2.10.0",
"byte-unit",
"cgroups-rs 0.5.0",
"flate2",
"futures 0.3.28",
"futures",
"hex",
"hypervisor",
"inotify",
@@ -3922,7 +3816,6 @@ dependencies = [
"libc",
"logging",
"netlink-packet-route",
"netlink-sys",
"netns-rs",
"nix 0.26.4",
"oci-spec 0.8.3",
@@ -3945,9 +3838,9 @@ dependencies = [
[[package]]
name = "rkyv"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9008cd6385b9e161d8229e1f6549dd23c3d022f132a2ea37ac3a10ac4935779b"
checksum = "2297bf9c81a3f0dc96bc9521370b88f054168c29826a75e89c55ff196e7ed6a1"
dependencies = [
"bitvec",
"bytecheck",
@@ -3963,9 +3856,9 @@ dependencies = [
[[package]]
name = "rkyv_derive"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "503d1d27590a2b0a3a4ca4c94755aa2875657196ecbf401a42eff41d7de532c0"
checksum = "84d7b42d4b8d06048d3ac8db0eb31bcb942cbeb709f0b5f2b2ebde398d3038f5"
dependencies = [
"proc-macro2",
"quote",
@@ -4007,7 +3900,6 @@ dependencies = [
"common",
"containerd-shim-protos",
"go-flag",
"logging",
"nix 0.26.4",
"runtimes",
"shim",
@@ -4018,7 +3910,6 @@ dependencies = [
name = "runtime-spec"
version = "0.1.0"
dependencies = [
"libc",
"serde",
"serde_derive",
"serde_json",
@@ -4032,7 +3923,6 @@ dependencies = [
"anyhow",
"common",
"hyper",
"hyperlocal",
"hypervisor",
"kata-sys-util",
"kata-types",
@@ -4047,6 +3937,8 @@ dependencies = [
"persist",
"procfs 0.12.0",
"prometheus",
"protobuf",
"protocols",
"resource",
"runtime-spec",
"serde_json",
@@ -4349,7 +4241,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1c789ec87f4687d022a2405cf46e0cd6284889f1839de292cadeb6c6019506f2"
dependencies = [
"dashmap",
"futures 0.3.28",
"futures",
"lazy_static",
"log",
"parking_lot",
@@ -4363,7 +4255,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0e56dd856803e253c8f298af3f4d7eb0ae5e23a737252cd90bb4f3b435033b2d"
dependencies = [
"dashmap",
"futures 0.3.28",
"futures",
"lazy_static",
"log",
"parking_lot",
@@ -4403,12 +4295,10 @@ dependencies = [
"containerd-shim-protos",
"kata-types",
"logging",
"persist",
"runtimes",
"slog",
"slog-scope",
"tokio",
"tracing",
"ttrpc",
]
@@ -4472,9 +4362,7 @@ dependencies = [
"nix 0.26.4",
"oci-spec 0.8.3",
"protobuf",
"rand 0.8.5",
"runtime-spec",
"runtimes",
"serial_test 0.10.0",
"service",
"sha2 0.10.9",
@@ -4483,11 +4371,8 @@ dependencies = [
"slog-scope",
"slog-stdlog",
"tempfile",
"tests_utils",
"thiserror 1.0.48",
"tokio",
"tracing",
"tracing-opentelemetry",
"unix_socket2",
]
@@ -4497,7 +4382,6 @@ version = "0.1.0"
dependencies = [
"anyhow",
"common",
"logging",
"runtimes",
"tokio",
]
@@ -4791,6 +4675,20 @@ dependencies = [
"syn 2.0.104",
]
[[package]]
name = "sysctl"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cca424247104946a59dacd27eaad296223b7feec3d168a6dd04585183091eb0b"
dependencies = [
"bitflags 2.10.0",
"byteorder",
"enum-as-inner",
"libc",
"thiserror 2.0.12",
"walkdir",
]
[[package]]
name = "sysinfo"
version = "0.34.2"
@@ -4948,30 +4846,30 @@ dependencies = [
[[package]]
name = "time"
version = "0.3.37"
version = "0.3.47"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "35e7868883861bd0e56d9ac6efcaaca0d6d5d82a2a7ec8209ff492c07cf37b21"
checksum = "743bd48c283afc0388f9b8827b976905fb217ad9e647fae3a379a9283c4def2c"
dependencies = [
"deranged",
"itoa",
"num-conv",
"powerfmt",
"serde",
"serde_core",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.2"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ef927ca75afb808a4d64dd374f00a2adf8d0fcff8e7b184af886c3c87ec4a3f3"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "time-macros"
version = "0.2.19"
version = "0.2.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2834e6017e3e5e4b9834939793b282bc03b37a3336245fa820e35e233e2a85de"
checksum = "2e70e4c5a0e0a8a4823ad65dfe1a6930e4f4d756dcd9dd7939022b5e8c501215"
dependencies = [
"num-conv",
"time-core",
@@ -5081,21 +4979,12 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52a15c15b1bc91f90902347eff163b5b682643aff0c8e972912cca79bd9208dd"
dependencies = [
"bytes",
"futures 0.3.28",
"futures",
"libc",
"tokio",
"vsock",
]
[[package]]
name = "toml"
version = "0.4.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "758664fc71a3a69038656bee8b6be6477d2a6c315a6b81f7081f591bffa4111f"
dependencies = [
"serde",
]
[[package]]
name = "toml"
version = "0.5.11"
@@ -5238,7 +5127,7 @@ dependencies = [
"async-trait",
"byteorder",
"crossbeam",
"futures 0.3.28",
"futures",
"home",
"libc",
"log",
@@ -5440,16 +5329,13 @@ version = "0.1.0"
dependencies = [
"agent",
"anyhow",
"async-std",
"async-trait",
"awaitgroup",
"common",
"containerd-shim-protos",
"futures 0.3.28",
"hypervisor",
"kata-sys-util",
"kata-types",
"lazy_static",
"libc",
"logging",
"nix 0.26.4",
@@ -5465,7 +5351,6 @@ dependencies = [
"slog-scope",
"strum 0.24.1",
"tokio",
"toml 0.4.10",
"tracing",
"url",
"uuid 1.18.1",
@@ -6098,7 +5983,7 @@ dependencies = [
"async-trait",
"blocking",
"enumflags2",
"event-listener 5.4.1",
"event-listener",
"futures-core",
"futures-lite",
"hex",

View File

@@ -47,10 +47,10 @@ docs-url-alive-check:
bash ci/docs-url-alive-check.sh
build-and-publish-kata-debug:
bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}
bash tools/packaging/kata-debug/kata-debug-build-and-upload-payload.sh ${KATA_DEBUG_REGISTRY} ${KATA_DEBUG_TAG}
docs-serve:
docker run --rm -p 8000:8000 -v ./docs:/docs:ro -v ${PWD}/zensical.toml:/zensical.toml:ro zensical/zensical serve --config-file /zensical.toml -a 0.0.0.0:8000
docker run --rm -p 8000:8000 -v ./docs:/docs:ro -v ${PWD}/zensical.toml:/zensical.toml:ro zensical/zensical serve --config-file /zensical.toml -a 0.0.0.0:8000
.PHONY: \
all \

View File

@@ -1 +1 @@
3.26.0
3.27.0

View File

@@ -73,12 +73,12 @@ function install_yq() {
goarch=arm64
;;
"arm64")
# If we're on an apple silicon machine, just assign amd64.
# The version of yq we use doesn't have a darwin arm build,
# If we're on an apple silicon machine, just assign amd64.
# The version of yq we use doesn't have a darwin arm build,
# but Rosetta can come to the rescue here.
if [[ ${goos} == "Darwin" ]]; then
goarch=amd64
else
else
goarch=arm64
fi
;;

View File

@@ -83,4 +83,4 @@ files to the repository and create a pull request when you are ready.
If you have an idea for a blog post and would like to get feedback from the
community about it or have any questions about the process, please reach out
on one of the community's [communication channels](https://katacontainers.io/community/).
on one of the community's [communication channels](https://katacontainers.io/community/).

View File

@@ -289,14 +289,14 @@ provided by your distribution.
As a prerequisite, you need to install Docker. Otherwise, you will not be
able to run the `rootfs.sh` script with `USE_DOCKER=true` as expected in
the following example.
the following example. Specifying the `OS_VERSION` is required when using `distro="ubuntu"`.
```bash
$ export distro="ubuntu" # example
$ export ROOTFS_DIR="$(realpath kata-containers/tools/osbuilder/rootfs-builder/rootfs)"
$ sudo rm -rf "${ROOTFS_DIR}"
$ pushd kata-containers/tools/osbuilder/rootfs-builder
$ script -fec 'sudo -E USE_DOCKER=true ./rootfs.sh "${distro}"'
$ script -fec 'sudo -E USE_DOCKER=true OS_VERSION=noble ./rootfs.sh "${distro}"'
$ popd
```

View File

@@ -206,7 +206,7 @@ For security reasons, the following mounts are disallowed:
| `proc \|\| sysfs` | `*` | not a directory (e.g. symlink) | CVE-2019-19921 |
For bind mounts under /proc, these destinations are allowed:
* `/proc/cpuinfo`
* `/proc/diskstats`
* `/proc/meminfo`

View File

@@ -4,7 +4,7 @@ As we know, we can interact with cgroups in two ways, **`cgroupfs`** and **`syst
## usage
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
For systemd, kata agent configures cgroups according to the following `linux.cgroupsPath` format standard provided by `runc` (`[slice]:[prefix]:[name]`). If you don't provide a valid `linux.cgroupsPath`, kata agent will treat it as `"system.slice:kata_agent:<container-id>"`.
> Here slice is a systemd slice under which the container is placed. If empty, it defaults to system.slice, except when cgroup v2 is used and rootless container is created, in which case it defaults to user.slice.
>
@@ -65,7 +65,7 @@ The kata agent will translate the parameters in the `linux.resources` of `config
## Systemd Interface
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
`session.rs` and `system.rs` in `src/agent/rustjail/src/cgroups/systemd/interface` are automatically generated by `zbus-xmlgen`, which is is an accompanying tool provided by `zbus` to generate Rust code from `D-Bus XML interface descriptions`. The specific commands to generate these two files are as follows:
```shell
// system.rs

View File

@@ -10,7 +10,7 @@ participant proxy
#Docker Exec
Docker->kata-runtime: exec
kata-runtime->virtcontainers: EnterContainer()
virtcontainers->agent: exec
virtcontainers->agent: exec
agent->virtcontainers: Process started in the container
virtcontainers->shim: start shim
shim->agent: ReadStdout()

View File

@@ -322,7 +322,7 @@ The runtime is responsible for starting the [hypervisor](#hypervisor)
and it's VM, and communicating with the [agent](#agent) using a
[ttRPC based protocol](#agent-communications-protocol) over a VSOCK
socket that provides a communications link between the VM and the
host.
host.
This protocol allows the runtime to send container management commands
to the agent. The protocol is also used to carry the standard I/O

View File

@@ -4,15 +4,15 @@ Containers typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network.
In order to setup the network for a container, container engines call into a
In order to setup the network for a container, container engines call into a
networking plugin. The network plugin will usually create a virtual
ethernet (`veth`) pair adding one end of the `veth` pair into the container
networking namespace, while the other end of the `veth` pair is added to the
ethernet (`veth`) pair adding one end of the `veth` pair into the container
networking namespace, while the other end of the `veth` pair is added to the
host networking namespace.
This is a very namespace-centric approach as many hypervisors or VM
Managers (VMMs) such as `virt-manager` cannot handle `veth`
interfaces. Typically, [`TAP`](https://www.kernel.org/doc/Documentation/networking/tuntap.txt)
interfaces. Typically, [`TAP`](https://www.kernel.org/doc/Documentation/networking/tuntap.txt)
interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
@@ -22,15 +22,15 @@ interfaces with `TAP` ones using [Traffic Control](https://man7.org/linux/man-pa
![Kata Containers networking](../arch-images/network.png)
With a TC filter rules in place, a redirection is created between the container network
and the virtual machine. As an example, the network plugin may place a device,
`eth0`, in the container's network namespace, which is one end of a VETH device.
and the virtual machine. As an example, the network plugin may place a device,
`eth0`, in the container's network namespace, which is one end of a VETH device.
Kata Containers will create a tap device for the VM, `tap0_kata`,
and setup a TC redirection filter to redirect traffic from `eth0`'s ingress to `tap0_kata`'s egress,
and a second TC filter to redirect traffic from `tap0_kata`'s ingress to `eth0`'s egress.
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata.
With this method, Kata created a MACVTAP device to connect directly to the `eth0` device.
TC-filter is the default because it allows for simpler configuration, better CNI plugin
Kata Containers maintains support for MACVTAP, which was an earlier implementation used in Kata.
With this method, Kata created a MACVTAP device to connect directly to the `eth0` device.
TC-filter is the default because it allows for simpler configuration, better CNI plugin
compatibility, and performance on par with MACVTAP.
Kata Containers has deprecated support for bridge due to lacking performance relative to TC-filter and MACVTAP.

View File

@@ -111,20 +111,20 @@ In our case, there will be a variety of resources, and every resource has severa
- Are the "service", "message dispatcher" and "runtime handler" all part of the single Kata 3.x runtime binary?
Yes. They are components in Kata 3.x runtime. And they will be packed into one binary.
1. Service is an interface, which is responsible for handling multiple services like task service, image service and etc.
2. Message dispatcher, it is used to match multiple requests from the service module.
3. Runtime handler is used to deal with the operation for sandbox and container.
1. Service is an interface, which is responsible for handling multiple services like task service, image service and etc.
2. Message dispatcher, it is used to match multiple requests from the service module.
3. Runtime handler is used to deal with the operation for sandbox and container.
- What is the name of the Kata 3.x runtime binary?
Apparently we can't use `containerd-shim-v2-kata` because it's already used. We are facing the hardest issue of "naming" again. Any suggestions are welcomed.
Internally we use `containerd-shim-v2-rund`.
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
- Is the Kata 3.x design compatible with the containerd shimv2 architecture?
Yes. It is designed to follow the functionality of go version kata. And it implements the `containerd shim v2` interface/protocol.
- How will users migrate to the Kata 3.x architecture?
The migration plan will be provided before the Kata 3.x is merging into the main branch.
- Is `Dragonball` limited to its own built-in VMM? Can the `Dragonball` system be configured to work using an external `Dragonball` VMM/hypervisor?
@@ -134,35 +134,35 @@ In our case, there will be a variety of resources, and every resource has severa
`runD` is the `containerd-shim-v2` counterpart of `runC` and can run a pod/containers. `Dragonball` is a `microvm`/VMM that is designed to run container workloads. Instead of `microvm`/VMM, we sometimes refer to it as secure sandbox.
- QEMU, Cloud Hypervisor and Firecracker support are planned, but how that would work. Are they working in separate process?
Yes. They are unable to work as built in VMM.
- What is `upcall`?
The `upcall` is used to hotplug CPU/memory/MMIO devices, and it solves two issues.
1. avoid dependency on PCI/ACPI
2. avoid dependency on `udevd` within guest and get deterministic results for hotplug operations. So `upcall` is an alternative to ACPI based CPU/memory/device hotplug. And we may cooperate with the community to add support for ACPI based CPU/memory/device hotplug if needed.
`Dbs-upcall` is a `vsock-based` direct communication tool between VMM and guests. The server side of the `upcall` is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests once the kernel has started. And the client side is in VMM , it'll be a thread that communicates with VSOCK through `uds`. We have accomplished device hotplug / hot-unplug directly through `upcall` in order to avoid virtualization of ACPI to minimize virtual machine's overhead. And there could be many other usage through this direct communication channel. It's already open source.
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall
- The URL below says the kernel patches work with 4.19, but do they also work with 5.15+ ?
Forward compatibility should be achievable, we have ported it to 5.10 based kernel.
- Are these patches platform-specific or would they work for any architecture that supports VSOCK?
It's almost platform independent, but some message related to CPU hotplug are platform dependent.
- Could the kernel driver be replaced with a userland daemon in the guest using loopback VSOCK?
We need to create device nodes for hot-added CPU/memory/devices, so it's not easy for userspace daemon to do these tasks.
- The fact that `upcall` allows communication between the VMM and the guest suggests that this architecture might be incompatible with https://github.com/confidential-containers where the VMM should have no knowledge of what happens inside the VM.
1. `TDX` doesn't support CPU/memory hotplug yet.
2. For ACPI based device hotplug, it depends on ACPI `DSDT` table, and the guest kernel will execute `ASL` code to handle during handling those hotplug event. And it should be easier to audit VSOCK based communication than ACPI `ASL` methods.
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.

View File

@@ -1,62 +1,62 @@
# Motivation
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
Today, there exist a few gaps between Container Storage Interface (CSI) and virtual machine (VM) based runtimes such as Kata Containers
that prevent them from working together smoothly.
First, its cumbersome to use a persistent volume (PV) with Kata Containers. Today, for a PV with Filesystem volume mode, Virtio-fs
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
is the only way to surface it inside a Kata Container guest VM. But often mounting the filesystem (FS) within the guest operating system (OS) is
desired due to performance benefits, availability of native FS features and security benefits over the Virtio-fs mechanism.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Second, its difficult if not impossible to resize a PV online with Kata Containers. While a PV can be expanded on the host OS,
the updated metadata needs to be propagated to the guest OS in order for the application container to use the expanded volume.
Currently, there is not a way to propagate the PV metadata from the host OS to the guest OS without restarting the Pod sandbox.
# Proposed Solution
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
Because of the OS boundary, these features cannot be implemented in the CSI node driver plugin running on the host OS
as is normally done in the runc container. Instead, they can be done by the Kata Containers agent inside the guest OS,
but it requires the CSI driver to pass the relevant information to the Kata Containers runtime.
An ideal long term solution would be to have the `kubelet` coordinating the communication between the CSI driver and
the container runtime, as described in [KEP-2857](https://github.com/kubernetes/enhancements/pull/2893/files).
However, as the KEP is still under review, we would like to propose a short/medium term solution to unblock our use case.
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
The proposed solution is built on top of a previous [proposal](https://github.com/egernst/kata-containers/blob/da-proposal/docs/design/direct-assign-volume.md)
described by Eric Ernst. The previous proposal has two gaps:
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
1. Writing a `csiPlugin.json` file to the volume root path introduced a security risk. A malicious user can gain unauthorized
access to a block device by writing their own `csiPlugin.json` to the above location through an ephemeral CSI plugin.
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
2. The proposal didn't describe how to establish a mapping between a volume and a kata sandbox, which is needed for
implementing CSI volume resize and volume stat collection APIs.
This document particularly focuses on how to address these two gaps.
## Assumptions and Limitations
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
1. The proposal assumes that a block device volume will only be used by one Pod on a node at a time, which we believe
is the most common pattern in Kata Containers use cases. Its also unsafe to have the same block device attached to more than
one Kata pod. In the context of Kubernetes, the `PersistentVolumeClaim` (PVC) needs to have the `accessMode` as `ReadWriteOncePod`.
2. More advanced Kubernetes volume features such as, `fsGroup`, `fsGroupChangePolicy`, and `subPath` are not supported.
## End User Interface
1. The user specifies a PV as a direct-assigned volume. How a PV is specified as a direct-assigned volume is left for each CSI implementation to decide.
There are a few options for reference:
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
1. A storage class parameter specifies whether it's a direct-assigned volume. This avoids any lookups of PVC
or Pod information from the CSI plugin (as external provisioner takes care of these). However, all PVs in the storage class with the parameter set
will have host mounts skipped.
2. Use a PVC annotation. This approach requires the CSI plugins have `--extra-create-metadata` [set](https://kubernetes-csi.github.io/docs/external-provisioner.html#persistentvolumeclaim-and-persistentvolume-parameters)
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
to be able to perform a lookup of the PVC annotations from the API server. Pro: API server lookup of annotations only required during creation of PV.
Con: The CSI plugin will always skip host mounting of the PV.
3. The CSI plugin can also lookup pod `runtimeclass` during `NodePublish`. This approach can be found in the [ALIBABA CSI plugin](https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/pkg/disk/nodeserver.go#L248).
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
2. The CSI node driver delegates the direct assigned volume to the Kata Containers runtime. The CSI node driver APIs need to
be modified to pass the volume mount information and collect volume information to/from the Kata Containers runtime by invoking `kata-runtime` command line commands.
* **NodePublishVolume** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
* **`NodePublishVolume`** -- It invokes `kata-runtime direct-volume add --volume-path [volumePath] --mount-info [mountInfo]`
to propagate the volume mount information to the Kata Containers runtime for it to carry out the filesystem mount operation.
The `volumePath` is the [target_path](https://github.com/container-storage-interface/spec/blob/master/csi.proto#L1364) in the CSI `NodePublishVolumeRequest`.
The `mountInfo` is a serialized JSON string.
* **NodeGetVolumeStats** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **NodeExpandVolume** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
The `mountInfo` is a serialized JSON string.
* **`NodeGetVolumeStats`** -- It invokes `kata-runtime direct-volume stats --volume-path [volumePath]` to retrieve the filesystem stats of direct-assigned volume.
* **`NodeExpandVolume`** -- It invokes `kata-runtime direct-volume resize --volume-path [volumePath] --size [size]` to send a resize request to the Kata Containers runtime to
resize the direct-assigned volume.
* **NodeStageVolume/NodeUnStageVolume** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
* **`NodeStageVolume/NodeUnStageVolume`** -- It invokes `kata-runtime direct-volume remove --volume-path [volumePath]` to remove the persisted metadata of a direct-assigned volume.
The `mountInfo` object is defined as follows:
```Golang
@@ -78,17 +78,17 @@ Notes: given that the `mountInfo` is persisted to the disk by the Kata runtime,
## Implementation Details
### Kata runtime
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
Instead of the CSI node driver writing the mount info into a `csiPlugin.json` file under the volume root,
as described in the original proposal, here we propose that the CSI node driver passes the mount information to
the Kata Containers runtime through a new `kata-runtime` commandline command. The `kata-runtime` then writes the mount
information to a `mountInfo.json` file in a predefined location (`/run/kata-containers/shared/direct-volumes/[volume_path]/`).
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
When the Kata Containers runtime starts a container, it verifies whether a volume mount is a direct-assigned volume by checking
whether there is a `mountInfo` file under the computed Kata `direct-volumes` directory. If it is, the runtime parses the `mountInfo` file,
updates the mount spec with the data in `mountInfo`. The updated mount spec is then passed to the Kata agent in the guest VM together
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
with other mounts. The Kata Containers runtime also creates a file named by the sandbox id under the `direct-volumes/[volume_path]/`
directory. The reason for adding a sandbox id file is to establish a mapping between the volume and the sandbox using it.
Later, when the Kata Containers runtime handles the `get-stats` and `resize` commands, it uses the sandbox id to identify
the endpoint of the corresponding `containerd-shim-kata-v2`.
### containerd-shim-kata-v2 changes
@@ -101,12 +101,12 @@ $ curl --unix-socket "$shim_socket_path" -I -X GET 'http://localhost/direct-volu
$ curl --unix-socket "$shim_socket_path" -I -X POST 'http://localhost/direct-volume/resize' -d '{ "volumePath"": [volumePath], "Size": "123123" }'
```
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
The shim then forwards the corresponding request to the `kata-agent` to carry out the operations inside the guest VM. For `resize` operation,
the Kata runtime also needs to notify the hypervisor to resize the block device (e.g. call `block_resize` in QEMU).
### Kata agent changes
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
The mount spec of a direct-assigned volume is passed to `kata-agent` through the existing `Storage` GRPC object.
Two new APIs and three new GRPC objects are added to GRPC protocol between the shim and agent for resizing and getting volume stats:
```protobuf
@@ -226,7 +226,7 @@ Lets assume that changes have been made in the `aws-ebs-csi-driver` node driv
1. In the node CSI driver, the `NodePublishVolume` API invokes: `kata-runtime direct-volume add --volume-path "/kubelet/a/b/c/d/sdf" --mount-info "{\"Device\": \"/dev/sdf\", \"fstype\": \"ext4\"}"`.
2. The `Kata-runtime` writes the mount-info JSON to a file called `mountInfo.json` under `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.
**Node unstage volume**
**Node `unstage` volume**
1. In the node CSI driver, the `NodeUnstageVolume` API invokes: `kata-runtime direct-volume remove --volume-path "/kubelet/a/b/c/d/sdf"`.
2. Kata-runtime deletes the directory `/run/kata-containers/shared/direct-volumes/kubelet/a/b/c/d/sdf`.

View File

@@ -59,5 +59,5 @@ The table below summarized when and where those different hooks will be executed
+ `Hook Path` specifies where hook's path be resolved.
+ `Exec Place` specifies in which namespace those hooks can be executed.
+ For `CreateContainer` Hooks, OCI requires to run them inside the container namespace while the hook executable path is in the host runtime, which is a non-starter for VM-based containers. So we design to keep them running in the *host vmm namespace.*
+ `Exec Time` specifies at which time point those hooks can be executed.
+ For `CreateContainer` Hooks, OCI requires to run them inside the container namespace while the hook executable path is in the host runtime, which is a non-starter for VM-based containers. So we design to keep them running in the *host vmm namespace.*
+ `Exec Time` specifies at which time point those hooks can be executed.

View File

@@ -118,7 +118,7 @@ all vCPU and I/O related threads) will be created in the `/kata_<PodSandboxID>`
### Why create a kata-cgroup under the parent cgroup?
And why not directly adding the per sandbox shim directly to the pod cgroup (e.g.
And why not directly adding the per sandbox shim directly to the pod cgroup (e.g.
`/kubepods` in the Kubernetes context)?
The Kata Containers shim implementation creates a per-sandbox cgroup
@@ -219,13 +219,13 @@ the `/kubepods` cgroup hierarchy, and a `/<PodSandboxID>` under the `/kata_overh
On a typical cgroup v1 hierarchy mounted under `/sys/fs/cgroup/`, for a pod which sandbox
ID is `12345678`, create with `sandbox_cgroup_only` disabled, the 2 memory subsystems
for the sandbox cgroup and the overhead cgroup would respectively live under
for the sandbox cgroup and the overhead cgroup would respectively live under
`/sys/fs/cgroup/memory/kubepods/kata_12345678` and `/sys/fs/cgroup/memory/kata_overhead/12345678`.
Unlike when `sandbox_cgroup_only` is enabled, the Kata Containers shim will move itself
to the overhead cgroup first, and then move the vCPU threads to the sandbox cgroup as
they're created. All Kata processes and threads will run under the overhead cgroup except for
the vCPU threads.
the vCPU threads.
With `sandbox_cgroup_only` disabled, Kata Containers assumes the pod cgroup is only sized
to accommodate for the actual container workloads processes. For Kata, this maps
@@ -247,7 +247,7 @@ cgroup size and constraints accordingly.
# Supported cgroups
Kata Containers currently supports cgroups `v1` and `v2`.
Kata Containers currently supports cgroups `v1` and `v2`.
In the following sections each cgroup is described briefly.

View File

@@ -119,17 +119,17 @@ The metrics service also doesn't hold any metrics in memory.
*Metrics size*: response size of one Prometheus scrape request.
It's easy to estimate the size of one metrics fetch request issued by Prometheus.
The formula to calculate the expected size when no gzip compression is in place is:
The formula to calculate the expected size when no gzip compression is in place is:
9 + (144 - 9) * `number of kata sandboxes`
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * `number of kata sandboxes`
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = **1.35M**
If `gzip compression` is enabled:
If `gzip compression` is enabled:
2 + (10 - 2) * 10 = **82K**
#### Metrics delay ####

View File

@@ -71,7 +71,7 @@ The Kata Containers runtime **MUST** support scalable I/O through the SRIOV tech
### Virtualization overhead reduction
A compelling aspect of containers is their minimal overhead compared to bare metal applications.
A container runtime should keep the overhead to a minimum in order to provide the expected user
experience.
experience.
The Kata Containers runtime implementation **SHOULD** be optimized for:
* Minimal workload boot and shutdown times

View File

@@ -5,7 +5,7 @@ To safeguard the integrity of container images and prevent tampering from the ho
## Introduction to remote snapshot
Containerd 1.7 introduced `remote snapshotter` feature which is the foundation for pulling images in the guest for Confidential Containers.
While it's beyond the scope of this document to fully explain how the container rootfs is created to the point it can be executed, a fundamental grasp of the snapshot concept is essential. Putting it in a simple way, containerd fetches the image layers from an OCI registry into its local content storage. However, they cannot be mounted as is (e.g. the layer can be tar+gzip compressed) as well as they should be immutable so the content can be shared among containers. Thus containerd leverages snapshots of those layers to build the container's rootfs.
While it's beyond the scope of this document to fully explain how the container rootfs is created to the point it can be executed, a fundamental grasp of the snapshot concept is essential. Putting it in a simple way, containerd fetches the image layers from an OCI registry into its local content storage. However, they cannot be mounted as is (e.g. the layer can be tar+gzip compressed) as well as they should be immutable so the content can be shared among containers. Thus containerd leverages snapshots of those layers to build the container's rootfs.
The role of `remote snapshotter` is to reuse snapshots that are stored in a remotely shared place, thus enabling containerd to prepare the containers rootfs in a manner similar to that of a local `snapshotter`. The key behavior that makes this the building block of Kata's guest image management for Confidential Containers is that containerd will not pull the image layers from registry, instead it assumes that `remote snapshotter` and/or an external entity will perform that operation on his behalf.
@@ -48,7 +48,7 @@ Pull the container image directly from the guest VM using `nydus snapshotter` ba
#### Architecture
The following diagram provides an overview of the architecture for pulling image in the guest with key components.
The following diagram provides an overview of the architecture for pulling image in the guest with key components.
```mermaid
flowchart LR
Kubelet[kubelet]--> |1\. Pull image request & metadata|Containerd
@@ -129,7 +129,7 @@ Next the `handleImageGuestPullBlockVolume()` is called to build the Storage obje
Below is an example of storage information packaged in the message sent to the kata-agent:
```json
"driver": "image_guest_pull",
"driver": "image_guest_pull",
"driver_options": [
"image_guest_pull"{
"metadata":{
@@ -145,15 +145,15 @@ Below is an example of storage information packaged in the message sent to the k
"io.kubernetes.cri.sandbox-uid": "de7c6a0c-79c0-44dc-a099-69bb39f180af",
}
}
],
"source": "quay.io/kata-containers/confidential-containers:unsigned",
"fstype": "overlay",
"options": [],
],
"source": "quay.io/kata-containers/confidential-containers:unsigned",
"fstype": "overlay",
"options": [],
"mount_point": "/run/kata-containers/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6/rootfs",
```
Next, the kata-agent's RPC module will handle the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of `StorageHandler` interface for various storage types, being the `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the "driver").
`ImagePullHandler` delegates the image pulling operation to the `confidential_data_hub.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, the `ImagePullService` of Confidential Data Hub to fetch, uncompress and mount the image's rootfs.
`ImagePullHandler` delegates the image pulling operation to the `confidential_data_hub.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, the `ImagePullService` of Confidential Data Hub to fetch, uncompress and mount the image's rootfs.
> **Notes:**
> In this flow, `confidential_data_hub.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `pull_image()` RPC of Confidential Data Hub because the pause image is expected to already be inside the guest's filesystem, so instead `confidential_data_hub.unpack_pause_image()` is called.

View File

@@ -1,6 +1,6 @@
# Virtual machine vCPU sizing in Kata Containers 3.0
> Preview:
> Preview:
> [Kubernetes(since 1.23)][1] and [Containerd(since 1.6.0-beta4)][2] will help calculate `Sandbox Size` info and pass it to Kata Containers through annotations.
> In order to adapt to this beneficial change and be compatible with the past, we have implemented the new vCPUs handling way in `runtime-rs`, which is slightly different from the original `runtime-go`'s design.
@@ -20,7 +20,7 @@ Our understanding and priority of these resources are as follows, which will aff
* `default_vcpus`: default number of vCPUs when starting a VM.
* `default_maxvcpus`: maximum number of vCPUs.
* From `Annotation`:
* `InitialSize`: we call the size of the resource passed from the annotations as `InitialSize`. Kubernetes will calculate the sandbox size according to the Pod's statement, which is the `InitialSize` here. This size should be the size we want to prioritize.
* `InitialSize`: we call the size of the resource passed from the annotations as `InitialSize`. Kubernetes will calculate the sandbox size according to the Pod's statement, which is the `InitialSize` here. This size should be the size we want to prioritize.
* From `Container Spec`:
* The amount of CPU resources that the Container wants to use will be declared through the spec. Including the aforementioned annotations, we mainly consider `cpu quota` and `cpuset` when calculating the number of vCPUs.
* `cpu quota`: `cpu quota` is the most common way to declare the amount of CPU resources. The number of vCPUs introduced by `cpu quota` declared in a container's spec is: `vCPUs = ceiling( quota / period )`.

View File

@@ -1,9 +1,9 @@
# Design Doc for Kata Containers' VCPUs Pinning Feature
# Design Doc for Kata Containers VCPUs Pinning Feature
## Background
By now, vCPU threads of Kata Containers are scheduled randomly to CPUs. And each pod would request a specific set of CPUs which we call it CPU set (just the CPU set meaning in Linux cgroups).
By now, vCPU threads of Kata Containers are scheduled randomly to CPUs. And each pod would request a specific set of CPUs which we call it CPU set (just the CPU set meaning in Linux cgroups).
If the number of vCPU threads are equal to that of CPUs claimed in CPU set, we can then pin each vCPU thread to one specified CPU, to reduce the cost of random scheduling.
If the number of vCPU threads are equal to that of CPUs claimed in CPU set, we can then pin each vCPU thread to one specified CPU, to reduce the cost of random scheduling.
## Detailed Design
@@ -20,7 +20,7 @@ Two ways are provided to use this vCPU thread pinning feature: through `QEMU` co
### When is VCPUs Pinning Checked?
As shown in Section 1, when `num(vCPU threads) == num(CPUs in CPU set)`, we shall pin each vCPU thread to a specified CPU. And when this condition is broken, we should restore to the original random scheduling pattern.
As shown in Section 1, when `num(vCPU threads) == num(CPUs in CPU set)`, we shall pin each vCPU thread to a specified CPU. And when this condition is broken, we should restore to the original random scheduling pattern.
So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
| Possible scenes | Related Code |
@@ -34,4 +34,4 @@ So when may `num(CPUs in CPU set)` change? There are 5 possible scenes:
### Core Pinning Logics
We can split the whole process into the following steps. Related methods are `checkVCPUsPinning` and `resetVCPUsPinning`, in file Sandbox.go.
![](arch-images/vcpus-pinning-process.png)
![](arch-images/vcpus-pinning-process.png)

View File

@@ -49,4 +49,4 @@
- [How to use the Kata Agent Policy](how-to-use-the-kata-agent-policy.md)
- [How to pull images in the guest](how-to-pull-images-in-guest-with-kata.md)
- [How to use mem-agent to decrease the memory usage of Kata container](how-to-use-memory-agent.md)
- [How to use seccomp with runtime-rs](how-to-use-seccomp-with-runtime-rs.md)
- [How to use seccomp with runtime-rs](how-to-use-seccomp-with-runtime-rs.md)

View File

@@ -1,24 +1,24 @@
# How to use Kata Containers and Containerd
This document covers the installation and configuration of [containerd](https://containerd.io/)
This document covers the installation and configuration of [containerd](https://containerd.io/)
and [Kata Containers](https://katacontainers.io). The containerd provides not only the `ctr`
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
command line tool, but also the [CRI](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/)
interface for [Kubernetes](https://kubernetes.io) and other CRI clients.
This document is primarily written for Kata Containers v1.5.0-rc2 or above, and containerd v1.2.0 or above.
This document is primarily written for Kata Containers v1.5.0-rc2 or above, and containerd v1.2.0 or above.
Previous versions are addressed here, but we suggest users upgrade to the newer versions for better support.
## Concepts
### Kubernetes `RuntimeClass`
[`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/) is a Kubernetes feature first
introduced in Kubernetes 1.12 as alpha. It is the feature for selecting the container runtime configuration to
[`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/) is a Kubernetes feature first
introduced in Kubernetes 1.12 as alpha. It is the feature for selecting the container runtime configuration to
use to run a pods containers. This feature is supported in `containerd` since [v1.2.0](https://github.com/containerd/containerd/releases/tag/v1.2.0).
Before the `RuntimeClass` was introduced, Kubernetes was not aware of the difference of runtimes on the node. `kubelet`
creates Pod sandboxes and containers through CRI implementations, and treats all the Pods equally. However, there
are requirements to run trusted Pods (i.e. Kubernetes plugin) in a native container like runc, and to run untrusted
are requirements to run trusted Pods (i.e. Kubernetes plugin) in a native container like runc, and to run untrusted
workloads with isolated sandboxes (i.e. Kata Containers).
As a result, the CRI implementations extended their semantics for the requirements:
@@ -32,17 +32,17 @@ As a result, the CRI implementations extended their semantics for the requiremen
```
- Similarly, CRI-O introduced the annotation `io.kubernetes.cri-o.TrustedSandbox` for untrusted Pods.
To eliminate the complexity of user configuration introduced by the non-standardized annotations and provide
extensibility, `RuntimeClass` was introduced. This gives users the ability to affect the runtime behavior
through `RuntimeClass` without the knowledge of the CRI daemons. We suggest that users with multiple runtimes
To eliminate the complexity of user configuration introduced by the non-standardized annotations and provide
extensibility, `RuntimeClass` was introduced. This gives users the ability to affect the runtime behavior
through `RuntimeClass` without the knowledge of the CRI daemons. We suggest that users with multiple runtimes
use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
![Kubernetes integration with shimv2](../design/arch-images/shimv2.svg)
@@ -87,7 +87,7 @@ $ popd
### Install `cri-tools`
> **Note:** `cri-tools` is a set of tools for CRI used for development and testing. Users who only want
> **Note:** `cri-tools` is a set of tools for CRI used for development and testing. Users who only want
> to use containerd with Kubernetes can skip the `cri-tools`.
You can install the `cri-tools` from source code:
@@ -104,7 +104,7 @@ $ popd
### Configure containerd to use Kata Containers
By default, the configuration of containerd is located at `/etc/containerd/config.toml`, and the
By default, the configuration of containerd is located at `/etc/containerd/config.toml`, and the
`cri` plugins are placed in the following section:
```toml
@@ -123,7 +123,7 @@ The following sections outline how to add Kata Containers to the configurations.
#### Kata Containers as a `RuntimeClass`
For
For
- Kata Containers v1.5.0 or above (including `1.5.0-rc`)
- Containerd v1.2.0 or above
- Kubernetes v1.12.0 or above
@@ -132,8 +132,8 @@ The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/core/runtime/v2))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/core/runtime/v2))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2)).
```toml
@@ -168,9 +168,9 @@ This `ConfigPath` option is optional. If you do not specify it, shimv2 first tri
#### Kata Containers as the runtime for untrusted workload
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
for an untrusted workload. With the following configuration, you can run trusted workloads with a runtime such as `runc`
and then, run an untrusted workload with Kata Containers:
For cases without `RuntimeClass` support, we can use the legacy annotation method to support using Kata Containers
for an untrusted workload. With the following configuration, you can run trusted workloads with a runtime such as `runc`
and then, run an untrusted workload with Kata Containers:
```toml
[plugins.cri.containerd]
@@ -201,9 +201,9 @@ If you want to set Kata Containers as the only runtime in the deployment, you ca
> **Note:** If you skipped the [Install `cri-tools`](#install-cri-tools) section, you can skip this section too.
First, add the CNI configuration in the containerd configuration.
First, add the CNI configuration in the containerd configuration.
The following is the configuration if you installed CNI as the *[Install CNI plugins](#install-cni-plugins)* section outlined.
The following is the configuration if you installed CNI as the *[Install CNI plugins](#install-cni-plugins)* section outlined.
Put the CNI configuration as `/etc/cni/net.d/10-mynet.conf`:
@@ -324,7 +324,7 @@ $ sudo crictl start 1aab7585530e6
1aab7585530e6
```
In Kubernetes, you need to create a `RuntimeClass` resource and add the `RuntimeClass` field in the Pod Spec
In Kubernetes, you need to create a `RuntimeClass` resource and add the `RuntimeClass` field in the Pod Spec
(see this [document](https://kubernetes.io/docs/concepts/containers/runtime-class/) for more information).
If `RuntimeClass` is not supported, you can use the following annotation in a Kubernetes pod to identify as an untrusted workload:

View File

@@ -3358,4 +3358,4 @@
"title": "Kata containers",
"uid": "75pdqURGk",
"version": 1
}
}

View File

@@ -27,7 +27,7 @@ spec:
containers:
- name: kata-monitor
image: quay.io/kata-containers/kata-monitor:2.0.0
args:
args:
- -log-level=debug
ports:
- containerPort: 8090

View File

@@ -79,7 +79,7 @@ metadata:
spec:
replicas: 1
selector:
matchLabels:
matchLabels:
app: prometheus
template:
metadata:

View File

@@ -16,7 +16,7 @@ To pull images in the guest, we need to do the following steps:
### Delete images used for pulling in the guest
Though the `CRI Runtime Specific Snapshotter` is still an [experimental feature](https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features) in containerd, which containerd is not supported to manage the same image in different `snapshotters`(The default `snapshotter` in containerd is `overlayfs`). To avoid errors caused by this, it is recommended to delete images (including the pause image) in containerd that needs to be pulled in guest later before configuring `nydus snapshotter` in containerd.
Though the `CRI Runtime Specific Snapshotter` is still an [experimental feature](https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features) in containerd, which containerd is not supported to manage the same image in different `snapshotters`(The default `snapshotter` in containerd is `overlayfs`). To avoid errors caused by this, it is recommended to delete images (including the pause image) in containerd that needs to be pulled in guest later before configuring `nydus snapshotter` in containerd.
### Install `nydus snapshotter`
@@ -24,7 +24,7 @@ Though the `CRI Runtime Specific Snapshotter` is still an [experimental feature]
To use DaemonSet to install `nydus snapshotter`, we need to ensure that `yq` exists in the host.
1. Download `nydus snapshotter` repo
1. Download `nydus snapshotter` repo
```bash
$ nydus_snapshotter_install_dir="/tmp/nydus-snapshotter"
$ nydus_snapshotter_url=https://github.com/containerd/nydus-snapshotter
@@ -42,7 +42,7 @@ $ yq -i \
$ yq -i \
> 'data.ENABLE_CONFIG_FROM_VOLUME = "false"' -P \
> misc/snapshotter/base/nydus-snapshotter.yaml
# Enable to run snapshotter as a systemd service
# Enable to run snapshotter as a systemd service
# (skip if you want to run nydus snapshotter as a standalone process)
$ yq -i \
> 'data.ENABLE_SYSTEMD_SERVICE = "true"' -P \
@@ -79,7 +79,7 @@ Created symlink /etc/systemd/system/multi-user.target.wants/nydus-snapshotter.se
#### Install `nydus snapshotter` manually
1. Download `nydus snapshotter` binary from release
1. Download `nydus snapshotter` binary from release
```bash
$ ARCH=$(uname -m)
$ golang_arch=$(case "$ARCH" in
@@ -111,7 +111,7 @@ level=info msg="Run daemons monitor..."
Configure `nydus snapshotter` to enable `CRI Runtime Specific Snapshotter` in containerd. This ensures run kata containers with `nydus snapshotter`. Below, the steps are illustrated using `kata-qemu` as an example.
```toml
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# (Assume that the containerd config is located in /etc/containerd/config.toml)
[plugins."io.containerd.grpc.v1.cri".containerd]
@@ -124,7 +124,7 @@ Configure `nydus snapshotter` to enable `CRI Runtime Specific Snapshotter` in co
snapshotter = "nydus"
```
> **Notes:**
> **Notes:**
> The `CRI Runtime Specific Snapshotter` feature only works for containerd v1.7.0 and above. So for Containerd v1.7.0 below, in addition to the above settings, we need to set the global `snapshotter` to `nydus` in containerd config. For example:
```toml
@@ -280,7 +280,7 @@ quay.io/confidential-containers/test-images largeimage
```bash
$ lsblk --fs
NAME FSTYPE LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
sda
└─encrypted_disk_GsLDt
178M 87% /run/kata-containers/image
@@ -309,4 +309,4 @@ $ free -m
total used free shared buff/cache available
Mem: 1989 52 43 0 1893 1904
Swap: 0 0 0
```
```

View File

@@ -10,19 +10,19 @@ Currently, there is no widely applicable and convenient method available for use
## Solution
According to the proposal, it requires to use the `kata-ctl direct-volume` command to add a direct assigned block volume device to the Kata Containers runtime.
According to the proposal, it requires to use the `kata-ctl direct-volume` command to add a direct assigned block volume device to the Kata Containers runtime.
And then with the help of method [get_volume_mount_info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L95), get information from JSON file: `(mountinfo.json)` and parse them into structure [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70) which is used to save device-related information.
And then with the help of method [get_volume_mount_info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L95), get information from JSON file: `(mountinfo.json)` and parse them into structure [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70) which is used to save device-related information.
We only fill the `mountinfo.json`, such as `device` ,`volume_type`, `fs_type`, `metadata` and `options`, which correspond to the fields in [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70), to describe a device.
We only fill the `mountinfo.json`, such as `device` ,`volume_type`, `fs_type`, `metadata` and `options`, which correspond to the fields in [Direct Volume Info](https://github.com/kata-containers/kata-containers/blob/099b4b0d0e3db31b9054e7240715f0d7f51f9a1c/src/libs/kata-types/src/mount.rs#L70), to describe a device.
The JSON file `mountinfo.json` placed in a sub-path `/kubelet/kata-test-vol-001/volume001` which under fixed path `/run/kata-containers/shared/direct-volumes/`.
And the full path looks like: `/run/kata-containers/shared/direct-volumes/kubelet/kata-test-vol-001/volume001`, But for some security reasons. it is
The JSON file `mountinfo.json` placed in a sub-path `/kubelet/kata-test-vol-001/volume001` which under fixed path `/run/kata-containers/shared/direct-volumes/`.
And the full path looks like: `/run/kata-containers/shared/direct-volumes/kubelet/kata-test-vol-001/volume001`, But for some security reasons. it is
encoded as `/run/kata-containers/shared/direct-volumes/L2t1YmVsZXQva2F0YS10ZXN0LXZvbC0wMDEvdm9sdW1lMDAx`.
Finally, when running a Kata Containers with `ctr run --mount type=X, src=Y, dst=Z,,options=rbind:rw`, the `type=X` should be specified a proprietary type specifically designed for some kind of volume.
Finally, when running a Kata Containers with `ctr run --mount type=X, src=Y, dst=Z,,options=rbind:rw`, the `type=X` should be specified a proprietary type specifically designed for some kind of volume.
Now, supported types:
Now, supported types:
- `directvol` for direct volume
- `vfiovol` for VFIO device based volume
@@ -46,10 +46,10 @@ $ sudo mkfs.ext4 /tmp/stor/rawdisk01.20g
```json
{
"device": "/tmp/stor/rawdisk01.20g",
"volume_type": "directvol",
"fs_type": "ext4",
"metadata":"{}",
"device": "/tmp/stor/rawdisk01.20g",
"volume_type": "directvol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -57,7 +57,7 @@ $ sudo mkfs.ext4 /tmp/stor/rawdisk01.20g
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-direct-vol-002/directvol002 "{\"device\": \"/tmp/stor/rawdisk01.20g\", \"volume_type\": \"directvol\", \"fs_type\": \"ext4\", \"metadata\":"{}", \"options\": []}"
$# /kubelet/kata-direct-vol-002/directvol002 <==> /run/kata-containers/shared/direct-volumes/W1lMa2F0ZXQva2F0YS10a2F0DAxvbC0wMDEvdm9sdW1lMDAx
$ cat W1lMa2F0ZXQva2F0YS10a2F0DAxvbC0wMDEvdm9sdW1lMDAx/mountInfo.json
$ cat W1lMa2F0ZXQva2F0YS10a2F0DAxvbC0wMDEvdm9sdW1lMDAx/mountInfo.json
{"volume_type":"directvol","device":"/tmp/stor/rawdisk01.20g","fs_type":"ext4","metadata":{},"options":[]}
```
@@ -79,8 +79,8 @@ In this scenario, the device's host kernel driver will be replaced by `vfio-pci`
And either device's BDF or its VFIO IOMMU group ID in `/dev/vfio/` is fine for "device" in `mountinfo.json`.
```bash
$ lspci -nn -k -s 45:00.1
45:00.1 SCSI storage controller
$ lspci -nn -k -s 45:00.1
45:00.1 SCSI storage controller
...
Kernel driver in use: vfio-pci
...
@@ -99,9 +99,9 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "45:00.1",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -111,9 +111,9 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "0000:45:00.1",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -122,10 +122,10 @@ First, configure the `mountinfo.json`, as below:
```json
{
"device": "/dev/vfio/110",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"device": "/dev/vfio/110",
"volume_type": "vfiovol",
"fs_type": "ext4",
"metadata":"{}",
"options": []
}
```
@@ -135,7 +135,7 @@ Second, run kata-containers with device(`/dev/vfio/110`) as an example:
```bash
$ sudo kata-ctl direct-volume add /kubelet/kata-vfio-vol-003/vfiovol003 "{\"device\": \"/dev/vfio/110\", \"volume_type\": \"vfiovol\", \"fs_type\": \"ext4\", \"metadata\":"{}", \"options\": []}"
$ # /kubelet/kata-vfio-vol-003/directvol003 <==> /run/kata-containers/shared/direct-volumes/F0va22F0ZvaS12F0YS10a2F0DAxvbC0F0ZXvdm9sdF0Z0YSx
$ cat F0va22F0ZvaS12F0YS10a2F0DAxvbC0F0ZXvdm9sdF0Z0YSx/mountInfo.json
$ cat F0va22F0ZvaS12F0YS10a2F0DAxvbC0F0ZXvdm9sdF0Z0YSx/mountInfo.json
{"volume_type":"vfiovol","device":"/dev/vfio/110","fs_type":"ext4","metadata":{},"options":[]}
```

View File

@@ -1,5 +1,5 @@
## Introduction
To improve security, Kata Container supports running the VMM process (QEMU and cloud-hypervisor) as a non-`root` user.
To improve security, Kata Container supports running the VMM process (QEMU and cloud-hypervisor) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
@@ -20,8 +20,8 @@ By default, the VMM process still runs as the root user. There are two ways to e
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
## Implementation details
When `rootless` flag is enabled, upon a request to create a Pod, Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
When `rootless` flag is enabled, upon a request to create a Pod, Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) where only the non-root hypervisor has access to.
## Limitations
@@ -30,4 +30,4 @@ Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs
2. Currently, this feature is only supported in QEMU and cloud-hypervisor. For firecracker, you can use jailer to run the VMM process with a non-root user.
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.
2. `vfio` device will also not work because of permission denied error.

View File

@@ -59,7 +59,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_vhost_user_store` | `boolean` | enable vhost-user storage device (QEMU) |
| `io.katacontainers.config.hypervisor.vhost_user_reconnect_timeout_sec` | `string`| the timeout for reconnecting vhost user socket (QEMU)
| `io.katacontainers.config.hypervisor.vhost_user_reconnect_timeout_sec` | `string`| the timeout for reconnecting vhost user socket (QEMU)
| `io.katacontainers.config.hypervisor.enable_virtio_mem` | `boolean` | enable virtio-mem (QEMU) |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.file_mem_backend` (R) | string | file based memory backend root directory |

View File

@@ -19,7 +19,7 @@ The Kubernetes cluster will use the
## Install and configure containerd
First, follow the [How to use Kata Containers and Containerd](containerd-kata.md) to install and configure containerd.
First, follow the [How to use Kata Containers and Containerd](containerd-kata.md) to install and configure containerd.
Then, make sure the containerd works with the [examples in it](containerd-kata.md#run).
## Install and configure Kubernetes
@@ -44,11 +44,13 @@ In order to allow Kubelet to use containerd (using the CRI interface), configure
```bash
$ sudo mkdir -p /etc/systemd/system/kubelet.service.d/
$ cat << EOF | sudo tee /etc/systemd/system/kubelet.service.d/0-containerd.conf
[Service]
[Service]
Environment="KUBELET_EXTRA_ARGS=--container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock"
EOF
```
For Kata Containers (and especially CoCo / Confidential Containers tests), use at least `--runtime-request-timeout=600s` (10m) so CRI CreateContainerRequest does not time out.
- Inform systemd about the new configuration
```bash
@@ -182,7 +184,7 @@ If a pod has the `runtimeClassName` set to `kata`, the CRI runs the pod with the
containers:
- name: nginx
image: nginx
EOF
```

View File

@@ -2,9 +2,9 @@
## Sysctls
In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
process file system.
In Linux, the sysctl interface allows an administrator to modify kernel
parameters at runtime. Parameters are available via the `/proc/sys/` virtual
process file system.
The parameters include the following subsystems among others:
- `fs` (file systems)
@@ -17,9 +17,9 @@ To get a complete list of kernel parameters, run:
$ sudo sysctl -a
```
Kubernetes provide mechanisms for setting namespaced sysctls.
Kubernetes provide mechanisms for setting namespaced sysctls.
Namespaced sysctls can be set per pod in the case of Kubernetes.
The following sysctls are known to be namespaced and can be set with
The following sysctls are known to be namespaced and can be set with
Kubernetes:
- `kernel.shm*`
@@ -84,14 +84,14 @@ The recommendation is to set them directly on the host or use a privileged
container in the case of Kubernetes.
In the case of Kata, the approach of setting sysctls on the host does not
work since the host sysctls have no effect on a Kata Container running
work since the host sysctls have no effect on a Kata Container running
inside a guest. Kata gives you the ability to set non-namespaced sysctls using a privileged container.
This has the advantage that the non-namespaced sysctls are set inside the guest
without having any effect on the `/proc/sys` values of any other pod or the
host itself.
This has the advantage that the non-namespaced sysctls are set inside the guest
without having any effect on the `/proc/sys` values of any other pod or the
host itself.
The recommended approach to do this would be to set the sysctl value in a
privileged init container. In this way, the application containers do not need any elevated
privileged init container. In this way, the application containers do not need any elevated
privileges.
```

View File

@@ -116,4 +116,4 @@ time for I in $(seq 100); do
done
# Display the memory usage again after running the test
free -h
free -h

View File

@@ -1,6 +1,6 @@
# Kata Agent Policy
Agent Policy is a Kata Containers feature that enables the Guest VM to perform additional validation for each [ttRPC API](../../src/libs/protocols/protos/agent.proto) request.
Agent Policy is a Kata Containers feature that enables the Guest VM to perform additional validation for each [ttRPC API](../../src/libs/protocols/protos/agent.proto) request.
The Policy is commonly used for implementing confidential containers, where the Kata Shim and the Kata Agent have different trust properties. However, the Policy can be used for non-confidential containers too - e.g., for a basic defense in depth step of blocking the Host from starting an application on the Guest. However, for non-confidential containers, the Host might be able to modify the Policy and/or replace the Agent and disable its Policy rules, so a Policy is more helpful for confidential containers.
@@ -35,7 +35,7 @@ Kubernetes users can encode in `base64` format their Policy documents, and add t
For example, the [`allow-all-except-exec-process.rego`](../../src/kata-opa/allow-all-except-exec-process.rego) sample policy file is different from the [default Policy](../../src/kata-opa/allow-all.rego) because it rejects any `ExecProcess` requests. To encode this policy file, you need to:
- Embed the policy inside an init data struct
- Compress
- Base64 encode
- Base64 encode
For example:
```bash

View File

@@ -22,7 +22,7 @@ mitigation does not affect a container's ability to mount *guest* devices.
## Containerd
The Containerd allows configuring the privileged host devices behavior for each runtime in the containerd config. This is
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
done with the `privileged_without_host_devices` option. Setting this to `true` will disable hot plugging of the host
devices into the guest, even when privileged is enabled.
Support for configuring privileged host devices behaviour was added in containerd `1.3.0` version.
@@ -49,7 +49,7 @@ See below example config:
## CRI-O
Similar to containerd, CRI-O allows configuring the privileged host devices
behavior for each runtime in the CRI config. This is done with the
behavior for each runtime in the CRI config. This is done with the
`privileged_without_host_devices` option. Setting this to `true` will disable
hot plugging of the host devices into the guest, even when privileged is enabled.
@@ -74,4 +74,4 @@ See below example config:
```
- [Kata Containers with CRI-O](../how-to/run-kata-with-k8s.md#cri-o)

View File

@@ -30,7 +30,7 @@ POD ID CREATED STATE NAME
#### Create container in the pod sandbox with config file
```bash
$ sudo crictl create 16a62b035940f container_config.json sandbox_config.json
$ sudo crictl create 16a62b035940f container_config.json sandbox_config.json
e6ca0e0f7f532686236b8b1f549e4878e4fe32ea6b599a5d684faf168b429202
```
@@ -66,7 +66,7 @@ $ sudo crictl exec -it e6ca0e0f7f532 sh
And run commands in it:
```
/ # hostname
/ # hostname
busybox_host
/ # id
uid=0(root) gid=0(root)

View File

@@ -169,7 +169,7 @@ Add the following annotation for CRI-O
```yaml
io.kubernetes.cri-o.TrustedSandbox: "false"
```
The following is an example of what your YAML can look like:
The following is an example of what your YAML can look like:
```yaml
...
@@ -199,7 +199,7 @@ Add the following annotation for containerd
```yaml
io.kubernetes.cri.untrusted-workload: "true"
```
The following is an example of what your YAML can look like:
The following is an example of what your YAML can look like:
```yaml
...

View File

@@ -3,12 +3,12 @@
### What is VMCache
VMCache is a new function that creates VMs as caches before using it.
It helps speed up new container creation.
It helps speed up new container creation.
The function consists of a server and some clients communicating
through Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](../../src/runtime/protocols/cache/cache.proto).
through Unix socket. The protocol is gRPC in [`protocols/cache/cache.proto`](../../src/runtime/protocols/cache/cache.proto).
The VMCache server will create some VMs and cache them by factory cache.
It will convert the VM to gRPC format and transport it when gets
requested from clients.
requested from clients.
Factory `grpccache` is the VMCache client. It will request gRPC format
VM and convert it back to a VM. If VMCache function is enabled,
`kata-runtime` will request VM from factory `grpccache` when it creates
@@ -16,8 +16,8 @@ a new sandbox.
### How is this different to VM templating
Both [VM templating](../how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
Both [VM templating](../how-to/what-is-vm-templating-and-how-do-I-use-it.md) and VMCache help speed up new container creation.
When VM templating enabled, new VMs are created by cloning from a pre-created template VM, and they will share the same initramfs, kernel and agent memory in readonly mode. So it saves a lot of memory if there are many Kata Containers running on the same host.
VMCache is not vulnerable to [share memory CVE](../how-to/what-is-vm-templating-and-how-do-I-use-it.md#what-are-the-cons) because each VM doesn't share the memory.
### How to enable VMCache
@@ -25,9 +25,9 @@ VMCache is not vulnerable to [share memory CVE](../how-to/what-is-vm-templating-
VMCache can be enabled by changing your Kata Containers config file (`/usr/share/defaults/kata-containers/configuration.toml`,
overridden by `/etc/kata-containers/configuration.toml` if provided) such that:
* `vm_cache_number` specifies the number of caches of VMCache:
* unspecified or == 0
* unspecified or == 0
VMCache is disabled
* `> 0`
* `> 0`
will be set to the specified number
* `vm_cache_endpoint` specifies the address of the Unix socket.

View File

@@ -10,8 +10,8 @@ much like a process fork done by the kernel but here we *fork* VMs.
### How is this different from VMCache
Both [VMCache](../how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache enabled, new VMs are created by the VMCache server. So it is not vulnerable to share memory CVE because each VM doesn't share the memory.
Both [VMCache](../how-to/what-is-vm-cache-and-how-do-I-use-it.md) and VM templating help speed up new container creation.
When VMCache enabled, new VMs are created by the VMCache server. So it is not vulnerable to share memory CVE because each VM doesn't share the memory.
VM templating saves a lot of memory if there are many Kata Containers running on the same host.
### What are the Pros

View File

@@ -1,11 +1,11 @@
# Kata Containers installation guides
The following is an overview of the different installation methods available.
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers requires nested virtualization or bare metal. Check
[hardware requirements](./../../README.md#hardware-requirements) to see if your system is capable of running Kata
Kata Containers requires nested virtualization or bare metal. Check
[hardware requirements](./../../README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
The Kata Deploy Helm chart is the preferred way to install all of the binaries and

View File

@@ -101,7 +101,7 @@
```
> **Note:**
>
>
> The containerd daemon needs to be able to find the
> `containerd-shim-kata-v2` binary to allow Kata Containers to be created.

View File

@@ -1,10 +1,10 @@
# Kata Containers 3.0 rust runtime installation
The following is an overview of the different installation methods available.
The following is an overview of the different installation methods available.
## Prerequisites
Kata Containers 3.0 rust runtime requires nested virtualization or bare metal. Check
[hardware requirements](/src/runtime/README.md#hardware-requirements) to see if your system is capable of running Kata
Kata Containers 3.0 rust runtime requires nested virtualization or bare metal. Check
[hardware requirements](/src/runtime/README.md#hardware-requirements) to see if your system is capable of running Kata
Containers.
### Platform support
@@ -25,10 +25,10 @@ architectures:
| Installation method | Description | Automatic updates | Use case | Availability
|------------------------------------------------------|----------------------------------------------------------------------------------------------|-------------------|-----------------------------------------------------------------------------------------------|----------- |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. | Yes |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. | No |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. | No |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. | No |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. | Yes |
### Kata Deploy Installation
@@ -57,7 +57,7 @@ Follow the [`kata-deploy`](../../tools/packaging/kata-deploy/helm-chart/README.m
```
* Musl support for fully static binary
Example for `x86_64`
```
$ rustup target add x86_64-unknown-linux-musl

View File

@@ -80,8 +80,8 @@ In case of Kata, today the devices which we need in the guest are:
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM
resource management (for example: CPU, memory, device hotplug). This is
required when containers are resized, or more generally when containers are
added to a pod.
added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify
the default settings provided when integrating Kata with the QEMU, Dragonball,
Firecracker and Cloud Hypervisor VMMs in the following sections.

View File

@@ -107,7 +107,7 @@ Containers agent:
By default, tracing is disabled for all components. To enable _any_ form of
tracing an `enable_tracing` option must be enabled for at least one component.
> **Note:**
> **Note:**
>
> Enabling this option will only allow tracing for subsequently
> started containers.
@@ -187,7 +187,7 @@ from improvements in the tracing infrastructure. Overall, the impact of
enabling runtime and agent tracing should be extremely low.
## Agent shutdown behaviour
In normal operation, the Kata runtime manages the VM shutdown and performs
certain optimisations to speed up this process. However, if agent tracing is
enabled, the agent itself is responsible for shutting down the VM. This it to

View File

@@ -83,10 +83,10 @@ If you have [s390-tools](https://github.com/ibm-s390-linux/s390-tools) available
```
[container]# lszcrypt -V
CARD.DOM TYPE MODE STATUS REQUESTS PENDING HWTYPE QDEPTH FUNCTIONS DRIVER SESTAT
CARD.DOM TYPE MODE STATUS REQUESTS PENDING HWTYPE QDEPTH FUNCTIONS DRIVER SESTAT
--------------------------------------------------------------------------------------------------------
03 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4card -
03.0041 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4queue usable
03 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4card -
03.0041 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4queue usable
```
---

View File

@@ -6,15 +6,15 @@ For integrated GPUs please refer to [Integrate-Intel-GPUs-with-Kata](Intel-GPU-p
> **Note:** These instructions are for a system that has an x86_64 CPU.
An Intel Discrete GPU can be passed to a Kata Container using GPU passthrough,
An Intel Discrete GPU can be passed to a Kata Container using GPU passthrough,
or SR-IOV passthrough.
In Intel GPU pass-through mode, an entire physical GPU is directly assigned to one VM.
In Intel GPU pass-through mode, an entire physical GPU is directly assigned to one VM.
In this mode of operation, the GPU is accessed exclusively by the Intel driver running in
the VM to which it is assigned. The GPU is not shared among VMs.
With SR-IOV mode, it is possible to pass a Virtual GPU instance to a virtual machine.
With this, multiple Virtual GPU instances can be carved out of a single physical GPU
With this, multiple Virtual GPU instances can be carved out of a single physical GPU
and be passed to different VMs, allowing the GPU to be shared.
| Technology | Description |
@@ -28,13 +28,13 @@ Intel GPUs Recommended for Virtualization:
- Intel® Data Center GPU Max Series (`Ponte Vecchio`)
- Intel® Data Center GPU Flex Series (`Arctic Sound-M`)
- Intel® Data Center GPU Arc Series
- Intel® Data Center GPU Arc Series
The following steps outline the workflow for using an Intel Graphics device with Kata Containers.
## Host BIOS requirements
Hardware such as Intel Max and Flex series require larger PCI BARs.
Hardware such as Intel Max and Flex series require larger PCI BARs.
For large BAR devices, MMIO mapping above the 4GB address space should be enabled in the PCI configuration of the BIOS.
@@ -89,7 +89,7 @@ CONFIG_VFIO_IOMMU_TYPE1
CONFIG_VFIO_PCI
```
## Host kernel command line
## Host kernel command line
Your host kernel needs to be booted with `intel_iommu=on` and `i915.enable_iaf=0` on the kernel command
line.
@@ -112,8 +112,8 @@ $ sudo update-grub
For CentOS/RHEL:
```bash
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```
$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```
4. Reboot the system
```bash
@@ -234,7 +234,7 @@ Use the following steps to pass an Intel Graphics device in SR-IOV mode to a Kat
$ BDF="0000:3a:00.0"
$ cat "/sys/bus/pci/devices/$BDF/sriov_totalvfs"
63
```
```
Create SR-IOV interfaces for the GPU:
```sh

View File

@@ -1,3 +1,3 @@
[toolchain]
# Keep in sync with versions.yaml
channel = "1.89"
channel = "1.91"

View File

@@ -1,3 +1,8 @@
# Copyright (c) 2025 NVIDIA Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
# Allow opening any 'source'd file, even if not specified as input
external-sources=true
@@ -9,7 +14,7 @@ enable=check-unassigned-uppercase
# Enforces braces around variable expansions to avoid ambiguity or confusion.
# e.g. ${filename} rather than $filename
enable=require-variable-braces
enable=require-variable-braces
# Requires double-bracket syntax [[ expr ]] for safer, more consistent tests.
# NO: if [ "$var" = "value" ]

234
src/agent/Cargo.lock generated
View File

@@ -625,9 +625,9 @@ dependencies = [
[[package]]
name = "bytes"
version = "1.10.1"
version = "1.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d71b6127be86fdcfddb610f7182ac57211d4b18a3e9c82eb2d17662f2227ad6a"
checksum = "1e748733b7cbc798e1434b6ac524f0c1ff2ab456fe201501e6497c8417a4fc33"
[[package]]
name = "capctl"
@@ -743,12 +743,6 @@ version = "1.0.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b63caa9aa9397e2d9480a9b13673856c78d8ac123288526c37d7839f2a86990"
[[package]]
name = "common-path"
version = "1.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2382f75942f4b3be3690fe4f86365e9c853c1587d6ee58212cebf6e2a9ccd101"
[[package]]
name = "concurrent-queue"
version = "2.5.0"
@@ -987,9 +981,9 @@ dependencies = [
[[package]]
name = "deranged"
version = "0.4.0"
version = "0.5.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c9e6a11ca8224451684bc0d7d5a7adbf8f2fd6887261a1cfc3c0432f9d4068e"
checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
dependencies = [
"powerfmt",
]
@@ -1066,27 +1060,6 @@ dependencies = [
"crypto-common",
]
[[package]]
name = "dirs-next"
version = "2.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b98cf8ebf19c3d1b223e151f99a4f9f0690dca41414773390fc824184ac833e1"
dependencies = [
"cfg-if",
"dirs-sys-next",
]
[[package]]
name = "dirs-sys-next"
version = "0.1.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4ebda144c4fe02d1f7ea1a7d9641b6fc6b580adcfa024ae48797ecdeb6825b4d"
dependencies = [
"libc",
"redox_users",
"winapi",
]
[[package]]
name = "displaydoc"
version = "0.2.5"
@@ -1119,6 +1092,18 @@ dependencies = [
"serde",
]
[[package]]
name = "enum-as-inner"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a1e6a265c649f3f5979b601d26f1d05ada116434c87741c9493cb56218f76cbc"
dependencies = [
"heck 0.5.0",
"proc-macro2",
"quote",
"syn 2.0.101",
]
[[package]]
name = "enumflags2"
version = "0.7.11"
@@ -1590,7 +1575,7 @@ version = "1.3.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4a85d31aea989eead29a3aaf9e1115a180df8282431156e533de47660892565"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"fnv",
"itoa",
]
@@ -1601,7 +1586,7 @@ version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1efedce1fb8e6913f23e0c92de8e62cd5b772a67e7b3946df930a62566c93184"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"http",
]
@@ -1611,7 +1596,7 @@ version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b021d93e26becf5dc7e1b75b1bed1fd93124b374ceb73f43d4d4eafec896a64a"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-core",
"http",
"http-body",
@@ -1630,7 +1615,7 @@ version = "1.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cc2b571658e38e0c01b1fdca3bbbe93c00d3d71693ff2770043f8c29bc7d6f80"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-util",
"http",
@@ -1649,7 +1634,7 @@ version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "497bbc33a26fdd4af9ed9c70d63f61cf56a938375fbb32df34db9b1cd6d643f2"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-util",
"http",
@@ -1675,7 +1660,7 @@ dependencies = [
"js-sys",
"log",
"wasm-bindgen",
"windows-core 0.61.0",
"windows-core",
]
[[package]]
@@ -2123,8 +2108,6 @@ version = "0.1.0"
dependencies = [
"anyhow",
"byteorder",
"chrono",
"common-path",
"fail",
"hex",
"kata-types",
@@ -2133,11 +2116,9 @@ dependencies = [
"mockall",
"nix 0.26.4",
"oci-spec",
"once_cell",
"pci-ids",
"rand",
"runtime-spec",
"safe-path",
"serde",
"serde_json",
"slog",
@@ -2156,8 +2137,8 @@ dependencies = [
"byte-unit",
"flate2",
"glob",
"hex",
"lazy_static",
"nix 0.26.4",
"num_cpus",
"oci-spec",
"regex",
@@ -2168,6 +2149,7 @@ dependencies = [
"sha2 0.10.9",
"slog",
"slog-scope",
"sysctl",
"sysinfo",
"thiserror 1.0.69",
"toml",
@@ -2214,16 +2196,6 @@ version = "0.2.172"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d750af042f7ef4f724306de029d18836c26c1765a54a6a3f094cbd23a7267ffa"
[[package]]
name = "libredox"
version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c0ff37bd590ca25063e35af745c343cb7a0271906fb7b37e4813e8f79f00268d"
dependencies = [
"bitflags 2.9.0",
"libc",
]
[[package]]
name = "libseccomp"
version = "0.3.0"
@@ -2337,7 +2309,6 @@ name = "mem-agent"
version = "0.2.0"
dependencies = [
"anyhow",
"async-trait",
"chrono",
"maplit",
"nix 0.30.1",
@@ -2488,7 +2459,7 @@ version = "0.11.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72452e012c2f8d612410d89eea01e2d9b56205274abb35d53f60200b2ec41d60"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"log",
"netlink-packet-core",
@@ -2514,7 +2485,7 @@ version = "0.8.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "16c903aa70590cb93691bf97a767c8d1d6122d2cc9070433deb3bbf36ce8bd23"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"libc",
"log",
@@ -2678,9 +2649,9 @@ dependencies = [
[[package]]
name = "num-conv"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "51d515d32fb182ee37cda2ccdcb92950d6a3c2893aa280e540671c2cd0f3b1d9"
checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
[[package]]
name = "num-integer"
@@ -3183,7 +3154,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "de5e2533f59d08fcf364fd374ebda0692a70bd6d7e66ef97f306f45c6c5d8020"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"prost-derive",
]
@@ -3193,7 +3164,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "355f634b43cdd80724ee7848f95770e7e70eefa6dcf14fea676216573b8fd603"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"heck 0.3.3",
"itertools",
"log",
@@ -3224,7 +3195,7 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "603bbd6394701d13f3f25aada59c7de9d35a6a5887cfc156181234a44002771b"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"prost",
]
@@ -3372,17 +3343,6 @@ dependencies = [
"bitflags 2.9.0",
]
[[package]]
name = "redox_users"
version = "0.4.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ba009ff324d1fc1b900bd1fdb31564febe58a8ccc8a6fdbb93b543d33b13ca43"
dependencies = [
"getrandom 0.2.16",
"libredox",
"thiserror 1.0.69",
]
[[package]]
name = "ref-cast"
version = "1.0.24"
@@ -3498,7 +3458,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d19c46a6fdd48bc4dab94b6103fccc55d34c67cc0ad04653aad4ea2a07cd7bbb"
dependencies = [
"base64 0.22.1",
"bytes 1.10.1",
"bytes 1.11.1",
"futures-channel",
"futures-core",
"futures-util",
@@ -3530,13 +3490,13 @@ dependencies = [
[[package]]
name = "rkyv"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9008cd6385b9e161d8229e1f6549dd23c3d022f132a2ea37ac3a10ac4935779b"
checksum = "2297bf9c81a3f0dc96bc9521370b88f054168c29826a75e89c55ff196e7ed6a1"
dependencies = [
"bitvec",
"bytecheck",
"bytes 1.10.1",
"bytes 1.11.1",
"hashbrown 0.12.3",
"ptr_meta",
"rend",
@@ -3548,9 +3508,9 @@ dependencies = [
[[package]]
name = "rkyv_derive"
version = "0.7.45"
version = "0.7.46"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "503d1d27590a2b0a3a4ca4c94755aa2875657196ecbf401a42eff41d7de532c0"
checksum = "84d7b42d4b8d06048d3ac8db0eb31bcb942cbeb709f0b5f2b2ebde398d3038f5"
dependencies = [
"proc-macro2",
"quote",
@@ -3617,7 +3577,6 @@ dependencies = [
name = "runtime-spec"
version = "0.1.0"
dependencies = [
"libc",
"serde",
"serde_derive",
"serde_json",
@@ -3631,7 +3590,7 @@ checksum = "faa7de2ba56ac291bd90c6b9bece784a52ae1411f9506544b3eae36dd2356d50"
dependencies = [
"arrayvec",
"borsh",
"bytes 1.10.1",
"bytes 1.11.1",
"num-traits",
"rand",
"rkyv",
@@ -3827,10 +3786,11 @@ checksum = "56e6fa9c48d24d85fb3de5ad847117517440f6beceb7798af16b4a87d616b8d0"
[[package]]
name = "serde"
version = "1.0.219"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5f0e2c6ed6606019b4e29e69dbaba95b11854410e5347d525002456dbbb786b6"
checksum = "9a8e94ea7f378bd32cbbd37198a4a91436180c5bb472411e48b5ec2e2124ae9e"
dependencies = [
"serde_core",
"serde_derive",
]
@@ -3865,10 +3825,19 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "794e44574226fc701e3be5c651feb7939038fc67fb73f6f4dd5c4ba90fd3be70"
[[package]]
name = "serde_derive"
version = "1.0.219"
name = "serde_core"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5b0276cf7f2c73365f7157c8123c21cd9a50fbbd844757af28ca1f5925fc2a00"
checksum = "41d385c7d4ca58e59fc732af25c3983b67ac852c1a25000afe1175de458b67ad"
dependencies = [
"serde_derive",
]
[[package]]
name = "serde_derive"
version = "1.0.228"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d540f220d3187173da220f885ab66608367b6574e925011a9353e4badda91d79"
dependencies = [
"proc-macro2",
"quote",
@@ -4095,10 +4064,11 @@ dependencies = [
[[package]]
name = "slog-term"
version = "2.9.1"
version = "2.9.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6e022d0b998abfe5c3782c1f03551a596269450ccd677ea51c56f8b214610e8"
checksum = "5cb1fc680b38eed6fad4c02b3871c09d2c81db8c96aa4e9c0a34904c830f09b5"
dependencies = [
"chrono",
"is-terminal",
"slog",
"term",
@@ -4246,6 +4216,20 @@ dependencies = [
"syn 2.0.101",
]
[[package]]
name = "sysctl"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cca424247104946a59dacd27eaad296223b7feec3d168a6dd04585183091eb0b"
dependencies = [
"bitflags 2.9.0",
"byteorder",
"enum-as-inner",
"libc",
"thiserror 2.0.12",
"walkdir",
]
[[package]]
name = "sysinfo"
version = "0.34.2"
@@ -4286,13 +4270,11 @@ dependencies = [
[[package]]
name = "term"
version = "0.7.0"
version = "1.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c59df8ac95d96ff9bede18eb7300b0fda5e5d8d90960e76f8e14ae765eedbf1f"
checksum = "d8c27177b12a6399ffc08b98f76f7c9a1f4fe9fc967c784c5a071fa8d93cf7e1"
dependencies = [
"dirs-next",
"rustversion",
"winapi",
"windows-sys 0.60.2",
]
[[package]]
@@ -4361,30 +4343,30 @@ dependencies = [
[[package]]
name = "time"
version = "0.3.41"
version = "0.3.47"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a7619e19bc266e0f9c5e6686659d394bc57973859340060a69221e57dbc0c40"
checksum = "743bd48c283afc0388f9b8827b976905fb217ad9e647fae3a379a9283c4def2c"
dependencies = [
"deranged",
"itoa",
"num-conv",
"powerfmt",
"serde",
"serde_core",
"time-core",
"time-macros",
]
[[package]]
name = "time-core"
version = "0.1.4"
version = "0.1.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c9e9a38711f559d9e3ce1cdb06dd7c5b8ea546bc90052da6d06bb76da74bb07c"
checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
[[package]]
name = "time-macros"
version = "0.2.22"
version = "0.2.27"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3526739392ec93fd8b359c8e98514cb3e8e021beb4e5f597b00a0221f8ed8a49"
checksum = "2e70e4c5a0e0a8a4823ad65dfe1a6930e4f4d756dcd9dd7939022b5e8c501215"
dependencies = [
"num-conv",
"time-core",
@@ -4422,7 +4404,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0cc3a2344dafbe23a245241fe8b09735b521110d30fcefbbd5feb1797ca35d17"
dependencies = [
"backtrace",
"bytes 1.10.1",
"bytes 1.11.1",
"io-uring",
"libc",
"mio",
@@ -4476,7 +4458,7 @@ version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "52a15c15b1bc91f90902347eff163b5b682643aff0c8e972912cca79bd9208dd"
dependencies = [
"bytes 1.10.1",
"bytes 1.11.1",
"futures",
"libc",
"tokio",
@@ -5021,7 +5003,7 @@ version = "0.57.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "12342cb4d8e3b046f3d80effd474a7a02447231330ef77d71daa6fbc40681143"
dependencies = [
"windows-core 0.57.0",
"windows-core",
"windows-targets 0.52.6",
]
@@ -5031,25 +5013,12 @@ version = "0.57.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d2ed2439a290666cd67ecce2b0ffaad89c2a56b976b736e6ece670297897832d"
dependencies = [
"windows-implement 0.57.0",
"windows-interface 0.57.0",
"windows-implement",
"windows-interface",
"windows-result 0.1.2",
"windows-targets 0.52.6",
]
[[package]]
name = "windows-core"
version = "0.61.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4763c1de310c86d75a878046489e2e5ba02c649d185f21c67d4cf8a56d098980"
dependencies = [
"windows-implement 0.60.0",
"windows-interface 0.59.1",
"windows-link",
"windows-result 0.3.2",
"windows-strings 0.4.0",
]
[[package]]
name = "windows-implement"
version = "0.57.0"
@@ -5061,17 +5030,6 @@ dependencies = [
"syn 2.0.101",
]
[[package]]
name = "windows-implement"
version = "0.60.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a47fddd13af08290e67f4acabf4b459f647552718f683a7b415d290ac744a836"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.101",
]
[[package]]
name = "windows-interface"
version = "0.57.0"
@@ -5083,17 +5041,6 @@ dependencies = [
"syn 2.0.101",
]
[[package]]
name = "windows-interface"
version = "0.59.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bd9211b69f8dcdfa817bfd14bf1c97c9188afa36f4750130fcdf3f400eca9fa8"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.101",
]
[[package]]
name = "windows-link"
version = "0.1.1"
@@ -5107,7 +5054,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4286ad90ddb45071efd1a66dfa43eb02dd0dfbae1545ad6cc3c51cf34d7e8ba3"
dependencies = [
"windows-result 0.3.2",
"windows-strings 0.3.1",
"windows-strings",
"windows-targets 0.53.2",
]
@@ -5138,15 +5085,6 @@ dependencies = [
"windows-link",
]
[[package]]
name = "windows-strings"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7a2ba9642430ee452d5a7aa78d72907ebe8cfda358e8cb7918a2050581322f97"
dependencies = [
"windows-link",
]
[[package]]
name = "windows-sys"
version = "0.48.0"

View File

@@ -5,7 +5,7 @@ members = ["rustjail", "policy", "vsock-exporter"]
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
rust-version = "1.85.1"
rust-version = "1.88.0"
[workspace.dependencies]
oci-spec = { version = "0.8.1", features = ["runtime"] }

View File

@@ -857,7 +857,7 @@ fn mount_from(
dest.as_str(),
Some(mount_typ.as_str()),
flags,
Some(d.as_str()),
Some(d.as_str()).filter(|s| !s.is_empty()),
)
.inspect_err(|e| log_child!(cfd_log, "mount error: {:?}", e))?;

View File

@@ -11,11 +11,15 @@ use crate::device::{
};
#[cfg(target_arch = "s390x")]
use crate::linux_abi::CCW_ROOT_BUS_PATH;
use crate::linux_abi::{create_pci_root_bus_path, SYSFS_DIR, SYSTEM_DEV_PATH};
use crate::linux_abi::{
create_pci_root_bus_path, pcipath_from_dev_tree_path, SYSFS_DIR, SYSTEM_DEV_PATH,
};
use crate::pci;
use crate::sandbox::Sandbox;
use crate::uevent::{wait_for_uevent, Uevent, UeventMatcher};
use anyhow::{anyhow, Context, Result};
#[cfg(target_arch = "s390x")]
use std::str::FromStr;
#[cfg(target_arch = "s390x")]
use kata_types::device::DRIVER_BLK_CCW_TYPE;
@@ -23,7 +27,6 @@ use kata_types::device::{DRIVER_BLK_MMIO_TYPE, DRIVER_BLK_PCI_TYPE};
use protocols::agent::Device;
use regex::Regex;
use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use tokio::sync::Mutex;
use tracing::instrument;
@@ -47,8 +50,8 @@ impl DeviceHandler for VirtioBlkPciDeviceHandler {
#[instrument]
async fn device_handler(&self, device: &Device, ctx: &mut DeviceContext) -> Result<SpecUpdate> {
let pcipath = pci::Path::from_str(&device.id)?;
let vm_path = get_virtio_blk_pci_device_name(ctx.sandbox, &pcipath).await?;
let (root_complex, pcipath) = pcipath_from_dev_tree_path(&device.id)?;
let vm_path = get_virtio_blk_pci_device_name(ctx.sandbox, root_complex, &pcipath).await?;
Ok(DeviceInfo::new(&vm_path, true)
.context("New device info")?
@@ -112,11 +115,12 @@ impl DeviceHandler for VirtioBlkMmioDeviceHandler {
#[instrument]
pub async fn get_virtio_blk_pci_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
root_complex: &str,
pcipath: &pci::Path,
) -> Result<String> {
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path(root_complex));
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = VirtioBlkPciMatcher::new(&sysfs_rel_path);
let matcher = VirtioBlkPciMatcher::new(&sysfs_rel_path, root_complex);
let uev = wait_for_uevent(sandbox, matcher).await?;
Ok(format!("{}/{}", SYSTEM_DEV_PATH, &uev.devname))
@@ -167,8 +171,8 @@ pub struct VirtioBlkPciMatcher {
}
impl VirtioBlkPciMatcher {
pub fn new(relpath: &str) -> VirtioBlkPciMatcher {
let root_bus = create_pci_root_bus_path();
pub fn new(relpath: &str, root_complex: &str) -> VirtioBlkPciMatcher {
let root_bus = create_pci_root_bus_path(root_complex);
let re = format!(r"^{root_bus}{relpath}/virtio[0-9]+/block/");
VirtioBlkPciMatcher {
@@ -229,11 +233,13 @@ impl UeventMatcher for VirtioBlkCCWMatcher {
#[cfg(test)]
mod tests {
use super::*;
#[cfg(target_arch = "s390x")]
use std::str::FromStr;
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_virtio_blk_matcher() {
let root_bus = create_pci_root_bus_path();
let root_bus = create_pci_root_bus_path("00");
let devname = "vda";
let mut uev_a = crate::uevent::Uevent::default();
@@ -242,12 +248,12 @@ mod tests {
uev_a.subsystem = BLOCK.to_string();
uev_a.devname = devname.to_string();
uev_a.devpath = format!("{root_bus}{relpath_a}/virtio4/block/{devname}");
let matcher_a = VirtioBlkPciMatcher::new(relpath_a);
let matcher_a = VirtioBlkPciMatcher::new(relpath_a, "00");
let mut uev_b = uev_a.clone();
let relpath_b = "/0000:00:0a.0/0000:00:0b.0";
uev_b.devpath = format!("{root_bus}{relpath_b}/virtio0/block/{devname}");
let matcher_b = VirtioBlkPciMatcher::new(relpath_b);
let matcher_b = VirtioBlkPciMatcher::new(relpath_b, "00");
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));

View File

@@ -1155,9 +1155,11 @@ mod tests {
// test
async fn example_get_device_name(
sandbox: &Arc<Mutex<Sandbox>>,
root_complex: &str,
relpath: &str,
) -> Result<String> {
let matcher = crate::device::block_device_handler::VirtioBlkPciMatcher::new(relpath);
let matcher =
crate::device::block_device_handler::VirtioBlkPciMatcher::new(relpath, root_complex);
let uev = wait_for_uevent(sandbox, matcher).await?;
@@ -1167,7 +1169,8 @@ mod tests {
#[tokio::test]
async fn test_get_device_name() {
let devname = "vda";
let root_bus = create_pci_root_bus_path();
let root_complex = "00";
let root_bus = create_pci_root_bus_path(root_complex);
let relpath = "/0000:00:0a.0/0000:03:0b.0";
let devpath = format!("{root_bus}{relpath}/virtio4/block/{devname}");
@@ -1184,7 +1187,7 @@ mod tests {
sb.uevent_map.insert(devpath.clone(), uev);
drop(sb); // unlock
let name = example_get_device_name(&sandbox, relpath).await;
let name = example_get_device_name(&sandbox, root_complex, relpath).await;
assert!(name.is_ok(), "{}", name.unwrap_err());
assert_eq!(name.unwrap(), devname);
@@ -1194,7 +1197,7 @@ mod tests {
spawn_test_watcher(sandbox.clone(), uev);
let name = example_get_device_name(&sandbox, relpath).await;
let name = example_get_device_name(&sandbox, root_complex, relpath).await;
assert!(name.is_ok(), "{}", name.unwrap_err());
assert_eq!(name.unwrap(), devname);
}

View File

@@ -38,11 +38,12 @@ fn check_existing(re: Regex) -> Result<bool> {
#[cfg(not(target_arch = "s390x"))]
pub async fn wait_for_pci_net_interface(
sandbox: &Arc<Mutex<Sandbox>>,
root_complex: &str,
pcipath: &pci::Path,
) -> Result<()> {
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path(root_complex));
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = NetPciMatcher::new(&sysfs_rel_path);
let matcher = NetPciMatcher::new(&sysfs_rel_path, root_complex);
let pattern = format!(
r"[./]+{}/[a-z0-9/]*net/[a-z0-9/]*",
matcher.devpath.as_str()
@@ -65,8 +66,8 @@ pub struct NetPciMatcher {
#[cfg(not(target_arch = "s390x"))]
impl NetPciMatcher {
pub fn new(relpath: &str) -> NetPciMatcher {
let root_bus = create_pci_root_bus_path();
pub fn new(relpath: &str, root_complex: &str) -> NetPciMatcher {
let root_bus = create_pci_root_bus_path(root_complex);
NetPciMatcher {
devpath: format!("{root_bus}{relpath}"),
@@ -131,7 +132,7 @@ mod tests {
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_net_pci_matcher() {
let root_bus = create_pci_root_bus_path();
let root_bus = create_pci_root_bus_path("00");
let relpath_a = "/0000:00:02.0/0000:01:01.0";
let mut uev_a = crate::uevent::Uevent::default();
@@ -139,13 +140,13 @@ mod tests {
uev_a.devpath = format!("{root_bus}{relpath_a}");
uev_a.subsystem = String::from("net");
uev_a.interface = String::from("eth0");
let matcher_a = NetPciMatcher::new(relpath_a);
let matcher_a = NetPciMatcher::new(relpath_a, "00");
println!("Matcher a : {}", matcher_a.devpath);
let relpath_b = "/0000:00:02.0/0000:01:02.0";
let mut uev_b = uev_a.clone();
uev_b.devpath = format!("{root_bus}{relpath_b}");
let matcher_b = NetPciMatcher::new(relpath_b);
let matcher_b = NetPciMatcher::new(relpath_b, "00");
assert!(matcher_a.is_match(&uev_a));
assert!(matcher_b.is_match(&uev_b));
@@ -156,7 +157,7 @@ mod tests {
let net_substr = "/net/eth0";
let mut uev_c = uev_a.clone();
uev_c.devpath = format!("{root_bus}{relpath_c}{net_substr}");
let matcher_c = NetPciMatcher::new(relpath_c);
let matcher_c = NetPciMatcher::new(relpath_c, "00");
assert!(matcher_c.is_match(&uev_c));
assert!(!matcher_a.is_match(&uev_c));

View File

@@ -110,7 +110,7 @@ mod tests {
#[tokio::test]
#[allow(clippy::redundant_clone)]
async fn test_scsi_block_matcher() {
let root_bus = create_pci_root_bus_path();
let root_bus = create_pci_root_bus_path("00");
let devname = "sda";
let mut uev_a = crate::uevent::Uevent::default();

View File

@@ -66,9 +66,10 @@ impl DeviceHandler for VfioPciDeviceHandler {
.ok_or_else(|| anyhow!("Malformed VFIO PCI option {:?}", opt))?;
let host =
pci::Address::from_str(host).context("Bad host PCI address in VFIO option {:?}")?;
let pcipath = pci::Path::from_str(pcipath)?;
let guestdev = wait_for_pci_device(ctx.sandbox, &pcipath).await?;
let (root_complex, pcipath) = pcipath_from_dev_tree_path(pcipath)?;
let guestdev = wait_for_pci_device(ctx.sandbox, root_complex, &pcipath).await?;
if vfio_in_guest {
pci_driver_override(ctx.logger, SYSFS_BUS_PCI_PATH, guestdev, "vfio-pci")?;
@@ -212,8 +213,8 @@ pub struct PciMatcher {
}
impl PciMatcher {
pub fn new(relpath: &str) -> Result<PciMatcher> {
let root_bus = create_pci_root_bus_path();
pub fn new(relpath: &str, root_complex: &str) -> Result<PciMatcher> {
let root_bus = create_pci_root_bus_path(root_complex);
Ok(PciMatcher {
devpath: format!("{root_bus}{relpath}"),
})
@@ -302,11 +303,12 @@ async fn associate_ap_device(apqn: &Apqn, mkvp: &str) -> Result<()> {
pub async fn wait_for_pci_device(
sandbox: &Arc<Mutex<Sandbox>>,
root_complex: &str,
pcipath: &pci::Path,
) -> Result<pci::Address> {
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path());
let root_bus_sysfs = format!("{}{}", SYSFS_DIR, create_pci_root_bus_path(root_complex));
let sysfs_rel_path = pcipath_to_sysfs(&root_bus_sysfs, pcipath)?;
let matcher = PciMatcher::new(&sysfs_rel_path)?;
let matcher = PciMatcher::new(&sysfs_rel_path, root_complex)?;
let uev = wait_for_uevent(sandbox, matcher).await?;

View File

@@ -3,8 +3,10 @@
// SPDX-License-Identifier: Apache-2.0
//
use crate::pci;
use anyhow::{Context, Result};
use cfg_if::cfg_if;
use std::str::FromStr;
// Linux ABI related constants.
#[cfg(target_arch = "aarch64")]
@@ -18,19 +20,32 @@ pub const SYSFS_DIR: &str = "/sys";
target_arch = "x86_64",
target_arch = "x86"
))]
pub fn create_pci_root_bus_path() -> String {
String::from("/devices/pci0000:00")
// With NUMA, we need to make sure we use the correct root complex which is
// defined by the pxb-pcie driver.
pub fn create_pci_root_bus_path(root_complex: &str) -> String {
format!("/devices/pci0000:{root_complex}")
}
// This is used in several modules, let's create a helper function to parse the
// qom path and switch easily once the shim sends us the full NUMA path
pub fn pcipath_from_dev_tree_path(dev_tree_path: &str) -> Result<(&str, pci::Path)> {
// Placeholder until the shim send us the full NUMA path
// via shim in the form of root_complex/bus/device 10/00/02
// Currently the shim only sends us the bus/device 00/02
let pci_path = pci::Path::from_str(dev_tree_path)
.with_context(|| format!("Failed to parse PCI path from QOM path '{}'", dev_tree_path))?;
Ok(("00", pci_path))
}
#[cfg(target_arch = "aarch64")]
pub fn create_pci_root_bus_path() -> String {
let ret = String::from("/devices/platform/4010000000.pcie/pci0000:00");
pub fn create_pci_root_bus_path(root_complex: &str) -> String {
let ret = format!("/devices/platform/4010000000.pcie/pci0000:{root_complex}");
let acpi_root_bus_path = String::from("/devices/pci0000:00");
let acpi_root_bus_path = format!("/devices/pci0000:{root_complex}");
let mut acpi_sysfs_dir = String::from(SYSFS_DIR);
let mut sysfs_dir = String::from(SYSFS_DIR);
let mut start_root_bus_path = String::from("/devices/platform/");
let end_root_bus_path = String::from("/pci0000:00");
let end_root_bus_path = format!("/pci0000:{root_complex}");
// check if there is pci bus path for acpi
acpi_sysfs_dir.push_str(&acpi_root_bus_path);

View File

@@ -7,6 +7,7 @@ use async_trait::async_trait;
use rustjail::{pipestream::PipeStream, process::StreamType};
use tokio::io::{AsyncReadExt, AsyncWriteExt, ReadHalf};
use tokio::sync::Mutex;
use tokio::time::{timeout, Duration};
use std::convert::TryFrom;
use std::ffi::{CString, OsStr};
@@ -14,6 +15,7 @@ use std::fmt::Debug;
use std::io;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;
#[cfg(target_arch = "s390x")]
use std::str::FromStr;
use std::sync::Arc;
use ttrpc::{
@@ -95,7 +97,6 @@ use libc::{self, c_char, c_ushort, pid_t, winsize, TIOCSWINSZ};
use std::fs;
use std::os::unix::prelude::PermissionsExt;
use std::process::{Command, Stdio};
use std::time::Duration;
use nix::unistd::{Gid, Uid};
use std::fs::{File, OpenOptions};
@@ -700,25 +701,66 @@ impl AgentService {
let reader = reader.ok_or_else(|| anyhow!("cannot get stream reader"))?;
tokio::select! {
// Poll the futures in the order they appear from top to bottom
// it is very important to avoid data loss. If there is still
// data in the buffer and read_stream branch will return
// Poll::Ready so that the term_exit_notifier will never polled
// before all data were read.
// Create one in-flight read future and reuse it in both branches.
let read_fut = read_stream(&reader, req.len as usize);
tokio::pin!(read_fut);
// Cancellation and polling model: Rust async is polled, not preempted.
// `Future::poll()` is a synchronous function call that runs to completion and
// returns Ready or Pending (`std::future::Future`).
// Readiness notifications (Waker::wake / Tokio Notify) only schedule the task
// to be polled again later; they do not interrupt an in-progress poll.
// Therefore, a Notify becoming ready while `poll_read()` is executing cannot cause
// the read future to be dropped mid-way; cancellation can only happen when the branch
// is still pending between polls (Tokio `select!` cancels by dropping non-selected futures).
// Detailed information, please refer to Tokio doc for more information:
// - Future::poll: https://doc.rust-lang.org/std/future/trait.Future.html
// - Waker: https://doc.rust-lang.org/std/task/struct.Waker.html
// - Tokio select!: https://docs.rs/tokio/latest/tokio/macro.select.html
let data = tokio::select! {
// Use `biased` to make the polling order deterministic (top-to-bottom).
// This ensures that *when multiple branches are ready at the same time*,
// we prefer reading pending output over reacting to the exit notification.
//
// Note: `biased` does NOT guarantee that we won't lose output. If the exit
// notification becomes ready while `read_stream` is still pending, the
// exit branch may be selected and we may stop reading before draining the
// remaining buffered data.
//
// Detailed information, please refer to Tokio doc for more information:
// https://docs.rs/tokio/latest/src/tokio/macros/select.rs.html#67
biased;
v = read_stream(&reader, req.len as usize) => {
let vector = v?;
let mut resp = ReadStreamResponse::new();
resp.set_data(vector);
Ok(resp)
}
v = &mut read_fut => v?,
_ = term_exit_notifier.notified() => {
Err(anyhow!("eof"))
// Drain-after-exit rationale:
// The process has exited, but the data may still be buffered in the pipe/pty.
// We should keep waiting for the same in-flight read for a bounded window to drain the data.
//
// It enters this branch only if `term_exit_notifier.notified()` fires. It then try to "drain"
// any remaining buffered output for a short, bounded time window:
// - If non-empty data is read: return immediately.
// - else then return empty data as EOF.
const DRAIN_DEADLINE_MS: u64 = 500; // 500ms
let deadline = Duration::from_millis(DRAIN_DEADLINE_MS);
// Attempt to drain remaining buffered output after process exit
// Try reading with timeout
match timeout(deadline, &mut read_fut).await {
Ok(v) => v?, // got data or EOF (empty)
_ => {
warn!(sl(), "exit-drain timeout, return EOF"; "container-id" => cid, "exec-id" => eid);
Vec::new() // Return empty as EOF
}
}
}
}
};
let mut resp = ReadStreamResponse::new();
resp.set_data(data);
Ok(resp)
}
}
@@ -1019,10 +1061,11 @@ impl agent_ttrpc::AgentService for AgentService {
if !interface.devicePath.is_empty() {
#[cfg(not(target_arch = "s390x"))]
{
let pcipath = pci::Path::from_str(&interface.devicePath).map_ttrpc_err(|e| {
format!("Unexpected pci-path for network interface: {e:?}")
})?;
wait_for_pci_net_interface(&self.sandbox, &pcipath)
let (root_complex, pcipath) = pcipath_from_dev_tree_path(&interface.devicePath)
.map_ttrpc_err(|e| {
format!("Invalid PCI path for network interface: {:?}", e)
})?;
wait_for_pci_net_interface(&self.sandbox, root_complex, &pcipath)
.await
.map_ttrpc_err(|e| format!("interface not available: {e:?}"))?;
}
@@ -1773,10 +1816,6 @@ async fn read_stream(reader: &Mutex<ReadHalf<PipeStream>>, l: usize) -> Result<V
let len = reader.read(&mut content).await?;
content.resize(len, 0);
if len == 0 {
return Err(anyhow!("read meet eof"));
}
Ok(content)
}
@@ -2125,7 +2164,9 @@ async fn do_add_swap(sandbox: &Arc<Mutex<Sandbox>>, req: &AddSwapRequest) -> Res
slots.push(pci::SlotFn::new(*slot, 0)?);
}
let pcipath = pci::Path::new(slots)?;
let dev_name = get_virtio_blk_pci_device_name(sandbox, &pcipath).await?;
// Default all virtio devices to root_complex 00 aka pcie.0
let root_complex = "00";
let dev_name = get_virtio_blk_pci_device_name(sandbox, root_complex, &pcipath).await?;
let c_str = CString::new(dev_name)?;
let ret = unsafe { libc::swapon(c_str.as_ptr() as *const c_char, 0) };

View File

@@ -4,10 +4,10 @@
// SPDX-License-Identifier: Apache-2.0
//
use crate::linux_abi::pcipath_from_dev_tree_path;
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;
use std::str::FromStr;
use std::sync::Arc;
use anyhow::{anyhow, Context, Result};
@@ -29,8 +29,9 @@ use crate::device::block_device_handler::{
};
use crate::device::nvdimm_device_handler::wait_for_pmem_device;
use crate::device::scsi_device_handler::get_scsi_device_name;
use crate::pci;
use crate::storage::{common_storage_handler, new_device, StorageContext, StorageHandler};
#[cfg(target_arch = "s390x")]
use std::str::FromStr;
#[derive(Debug)]
pub struct VirtioBlkMmioHandler {}
@@ -84,8 +85,9 @@ impl StorageHandler for VirtioBlkPciHandler {
return Err(anyhow!("Invalid device {}", &storage.source));
}
} else {
let pcipath = pci::Path::from_str(&storage.source)?;
let dev_path = get_virtio_blk_pci_device_name(ctx.sandbox, &pcipath).await?;
let (root_complex, pcipath) = pcipath_from_dev_tree_path(&storage.source)?;
let dev_path =
get_virtio_blk_pci_device_name(ctx.sandbox, root_complex, &pcipath).await?;
storage.source = dev_path;
}

View File

@@ -48,7 +48,6 @@ vmm-sys-util = { workspace = true }
virtio-queue = { workspace = true, optional = true }
vm-memory = { workspace = true, features = ["backend-mmap"] }
crossbeam-channel = "0.5.6"
fuse-backend-rs = "0.10.5"
vfio-bindings = { workspace = true, optional = true }
vfio-ioctls = { workspace = true, optional = true }
@@ -86,3 +85,6 @@ host-device = ["dep:vfio-bindings", "dep:vfio-ioctls", "dep:dbs-pci"]
unexpected_cfgs = { level = "warn", check-cfg = [
'cfg(feature, values("test-mock"))',
] }
[package.metadata.cargo-machete]
ignored = ["vfio-bindings"]

View File

@@ -40,7 +40,7 @@ You could see the [official documentation](docs/) page for more details.
# Supported Architectures
- x86-64
- aarch64
# Supported Kernel
[TODO](https://github.com/kata-containers/kata-containers/issues/4303)
@@ -51,4 +51,4 @@ Part of the code is based on the [Cloud Hypervisor](https://github.com/cloud-hyp
# License
`Dragonball` is licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0.
`Dragonball` is licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0.

View File

@@ -24,4 +24,4 @@
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

View File

@@ -11,4 +11,4 @@ keywords = ["dragonball", "acpi", "vmm", "secure-sandbox"]
readme = "README.md"
[dependencies]
vm-memory = {workspace = true}
vm-memory = {workspace = true}

View File

@@ -7,7 +7,7 @@ sandbox (virtual machine), such as memory-mapped I/O address space, port I/O add
MSI/MSI-X vectors, device instance id, etc. The `dbs-allocator` crate is designed to help the resource manager
to track and allocate these types of resources.
Main components are:
Main components are:
- *Constraints*: struct to declare constraints for resource allocation.
```rust
#[derive(Copy, Clone, Debug)]
@@ -34,20 +34,20 @@ pub fn allocate(&mut self, constraint: &Constraint) -> Option<Range>
pub fn free(&mut self, key: &Range) -> Option<T>
pub fn insert(&mut self, key: Range, data: Option<T>) -> Self
pub fn update(&mut self, key: &Range, data: T) -> Option<T>
pub fn delete(&mut self, key: &Range) -> Option<T>
pub fn delete(&mut self, key: &Range) -> Option<T>
pub fn get(&self, key: &Range) -> Option<NodeState<&T>>
```
## Usage
The concept of Interval Tree may seem complicated, but using dbs-allocator to do resource allocation and release is simple and straightforward.
The concept of Interval Tree may seem complicated, but using dbs-allocator to do resource allocation and release is simple and straightforward.
You can following these steps to allocate your VMM resource.
```rust
// 1. To start with, we should create an interval tree for some specific resouces and give maximum address/id range as root node. The range here could be address range, id range, etc.
let mut resources_pool = IntervalTree::new();
resources_pool.insert(Range::new(MIN_RANGE, MAX_RANGE), None);
let mut resources_pool = IntervalTree::new();
resources_pool.insert(Range::new(MIN_RANGE, MAX_RANGE), None);
// 2. Next, create a constraint with the size for your resource, you could also assign the maximum, minimum and alignment for the constraint. Then we could use the constraint to allocate the resource in the range we previously decided. Interval Tree will give you the appropriate range.
// 2. Next, create a constraint with the size for your resource, you could also assign the maximum, minimum and alignment for the constraint. Then we could use the constraint to allocate the resource in the range we previously decided. Interval Tree will give you the appropriate range.
let mut constraint = Constraint::new(SIZE);
let mut resources_range = self.resources_pool.allocate(&constraint);
@@ -64,13 +64,13 @@ use dbs_allocator::{Constraint, IntervalTree, Range};
let mut pci_device_pool = IntervalTree::new();
// Init PCI device id pool with the range 0 to 255
pci_device_pool.insert(Range::new(0x0u8, 0xffu8), None);
pci_device_pool.insert(Range::new(0x0u8, 0xffu8), None);
// Construct a constraint with size 1 and alignment 1 to ask for an ID.
let mut constraint = Constraint::new(1u64).align(1u64);
// Construct a constraint with size 1 and alignment 1 to ask for an ID.
let mut constraint = Constraint::new(1u64).align(1u64);
// Get an ID from the pci_device_pool
let mut id = pci_device_pool.allocate(&constraint).map(|e| e.min as u8);
let mut id = pci_device_pool.allocate(&constraint).map(|e| e.min as u8);
// Pass the ID generated from dbs-allocator to vm-pci specified functions to create pci devices
let mut pci_device = PciDevice::new(id as u8, ..);
@@ -84,15 +84,15 @@ use dbs_allocator::{Constraint, IntervalTree, Range};
let mut mem_pool = IntervalTree::new();
// Init memory address from GUEST_MEM_START to GUEST_MEM_END
mem_pool.insert(Range::new(GUEST_MEM_START, GUEST_MEM_END), None);
mem_pool.insert(Range::new(GUEST_MEM_START, GUEST_MEM_END), None);
// Construct a constraint with size, maximum addr and minimum address of memory region to ask for an memory allocation range.
// Construct a constraint with size, maximum addr and minimum address of memory region to ask for an memory allocation range.
let constraint = Constraint::new(region.len())
.min(region.start_addr().raw_value())
.max(region.last_addr().raw_value());
// Get the memory allocation range from the pci_device_pool
let mem_range = mem_pool.allocate(&constraint).unwrap();
let mem_range = mem_pool.allocate(&constraint).unwrap();
// Update the mem_range in IntervalTree with memory region info
mem_pool.update(&mem_range, region);
@@ -103,4 +103,4 @@ mem_pool.update(&mem_range, region);
## License
This project is licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0.
This project is licensed under [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0.

View File

@@ -2,7 +2,7 @@
## Design
CPUID is designed as the CPUID filter for Intel and AMD CPU Identification. Through CPUID configuration, we could set CPU topology, Cache topology, PMU status and other features for the VMs.
CPUID is designed as the CPUID filter for Intel and AMD CPU Identification. Through CPUID configuration, we could set CPU topology, Cache topology, PMU status and other features for the VMs.
CPUID is developed based on the Firecracker CPUID code while we add other extensions such as CPU Topology and VPMU features.
@@ -34,7 +34,7 @@ pub struct VmSpec {
```
## Example
We will show examples for filtering CPUID.
We will show examples for filtering CPUID.
First, you need to use KVM_GET_CPUID2 ioctl to get the original CPUID, this part is not included in the db-cpuid.
```rust

View File

@@ -15,8 +15,7 @@ The dbs-device crate provides:
The dbs-device crate is designed to support the virtual machine's device model.
The core concepts of device model are [Port I/O](https://wiki.osdev.org/I/O_Ports) and
[Memory-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O),
The core concepts of device model are [Memory-mapped I/O and port-mapped I/O](https://en.wikipedia.org/wiki/Memory-mapped_I/O_and_port-mapped_I/O),
which are two main methods of performing I/O between CPU and devices.
The device model provided by the dbs-device crate works as below:
@@ -51,7 +50,7 @@ The difference of [`DeviceIo`] and [`DeviceIoMut`] is the reference type of `sel
interior mutability and thread-safe protection itself
- [`DeviceIoMut`] trait would pass a mutable reference `&mut self` to method, and it can give mutability to device
which is wrapped by `Mutex` directly to simplify the difficulty of achieving interior mutability.
Additionally, the [`DeviceIo`] trait has an auto implement for `Mutex<T: DeviceIoMut>`
Last, the device needs to be added to [`IoManager`] by using `register_device_io()`, and the function would add device

View File

@@ -4,7 +4,7 @@
## Serial Devices
Defined a wrapper over the Serial of [`vm-superio`](https://github.com/rust-vmm/vm-superio).
Defined a wrapper over the Serial of [`vm-superio`](https://github.com/rust-vmm/vm-superio).
This wrapper is needed because [Orphan rules](https://doc.rust-lang.org/reference/items/implementations.html#orphan-rules),
which is one crate can not implement a trait for a struct defined in
another crate. This wrapper also contains the input field that is
@@ -12,7 +12,7 @@ missing from upstream implementation.
## i8042 Devices
Defined a wrapper over the `i8042 PS/2 Controller` of [`vm-superio`](https://github.com/rust-vmm/vm-superio).
Defined a wrapper over the `i8042 PS/2 Controller` of [`vm-superio`](https://github.com/rust-vmm/vm-superio).
The i8042 PS/2 controller emulates, at this point, only the CPU reset command which is needed for announcing the VMM about the guest's shutdown.
### Acknowledgement

View File

@@ -23,24 +23,22 @@ dbs-interrupt = { workspace = true, features = [
"kvm-legacy-irq",
"kvm-msi-irq",
] }
downcast-rs = "1.2.0"
byteorder = "1.4.3"
serde = "1.0.27"
vm-memory = {workspace = true}
kvm-ioctls = {workspace = true}
kvm-bindings = {workspace = true}
vfio-ioctls = {workspace = true}
vfio-bindings = {workspace = true}
vm-memory = { workspace = true }
kvm-ioctls = { workspace = true }
kvm-bindings = { workspace = true }
vfio-ioctls = { workspace = true }
vfio-bindings = { workspace = true }
libc = "0.2.39"
vmm-sys-util = {workspace = true}
virtio-queue = {workspace = true}
dbs-utils = {workspace = true}
virtio-queue = { workspace = true }
dbs-utils = { workspace = true }
[dev-dependencies]
dbs-arch = { workspace = true }
kvm-ioctls = {workspace = true}
kvm-ioctls = { workspace = true }
test-utils = { workspace = true }
nix = { workspace = true }

View File

@@ -22,6 +22,6 @@ There are several components in `dbs-pci` crate building together to emulate PCI
8. `vfio` mod: `vfio` mod collects lots of information related to the `vfio` operations.
a. `vfio` `msi` and `msix` capability and state
b. `vfio` interrupt information
b. `vfio` interrupt information
c. PCI region information
d. `vfio` PCI device information and state
d. `vfio` PCI device information and state

View File

@@ -11,7 +11,6 @@ keywords = ["dragonball", "secure-sandbox", "devices", "upcall", "virtio"]
readme = "README.md"
[dependencies]
anyhow = "1"
log = "0.4.14"
thiserror = "1"
timerfd = "1.2.0"

View File

@@ -2,7 +2,7 @@
`dbs-upcall` is a direct communication tool between VMM and guest developed upon VSOCK. The server side of the upcall is a driver in guest kernel (kernel patches are needed for this feature) and it'll start to serve the requests after the kernel starts. And the client side is in VMM , it'll be a thread that communicates with VSOCK through UDS.
We have accomplished device hotplug / hot-unplug directly through upcall in order to avoid virtualization of ACPI to minimize virtual machines overhead. And there could be many other usage through this direct communication channel.
We have accomplished device hotplug / hot-unplug directly through upcall in order to avoid virtualization of ACPI to minimize virtual machines overhead. And there could be many other usage through this direct communication channel.
## Design
@@ -33,7 +33,7 @@ If request is sent in step 3, upcall state will change to `ServiceBusy` and upca
There are two parts for the upcall request message : message header and message load.
And there are three parts for the upcall reply message: message header, result and message load.
Message Header contains following information and it remains the same for the request and the reply :
Message Header contains following information and it remains the same for the request and the reply :
1. magic_version(u32): magic version for identifying upcall and the service type
2. msg_size(u32): size of the message load
3. msg_type(u32): type for the message to identify its usage (e.g. ADD_CPU)
@@ -49,7 +49,7 @@ msg_load type 2: `cpu_dev_info`` - for CPU hotplug / hot-unplug request:
1. count
2. `apic_ver`
3. `apic_ids[256]`
For the upcall reply message, reply contains result and two kinds of msg_load.
If result is 0, the operation is successful.
If result is not 0, result refers to the error code.
@@ -66,4 +66,4 @@ Kernel patches are needed for dbs-upcall. You could go to [Upcall Kernel Patches
## License
This project is licensed under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).
This project is licensed under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

View File

@@ -24,8 +24,8 @@ dbs-boot = { workspace = true }
epoll = ">=4.3.1, <4.3.2"
io-uring = "0.5.2"
fuse-backend-rs = { version = "0.10.5", optional = true }
kvm-bindings = { workspace = true}
kvm-ioctls = {workspace = true}
kvm-bindings = { workspace = true }
kvm-ioctls = { workspace = true }
libc = "0.2.119"
log = "0.4.14"
nix = "0.24.3"
@@ -37,19 +37,16 @@ serde = "1.0.27"
serde_json = "1.0.9"
thiserror = "1"
threadpool = "1"
virtio-bindings = {workspace = true}
virtio-queue = {workspace = true}
vmm-sys-util = {workspace = true}
virtio-bindings = { workspace = true }
virtio-queue = { workspace = true }
vmm-sys-util = { workspace = true }
vm-memory = { workspace = true, features = ["backend-mmap"] }
sendfd = "0.4.3"
vhost-rs = { version = "0.6.1", package = "vhost", optional = true }
timerfd = "1.0"
[dev-dependencies]
vm-memory = { workspace = true, features = [
"backend-mmap",
"backend-atomic",
] }
vm-memory = { workspace = true, features = ["backend-mmap", "backend-atomic"] }
test-utils = { workspace = true }
[features]

View File

@@ -439,19 +439,19 @@ pub mod tests {
VirtqDesc { desc }
}
pub fn addr(&self) -> VolatileRef<u64> {
pub fn addr(&self) -> VolatileRef<'_, u64> {
self.desc.get_ref(offset_of!(DescriptorTmp, addr)).unwrap()
}
pub fn len(&self) -> VolatileRef<u32> {
pub fn len(&self) -> VolatileRef<'_, u32> {
self.desc.get_ref(offset_of!(DescriptorTmp, len)).unwrap()
}
pub fn flags(&self) -> VolatileRef<u16> {
pub fn flags(&self) -> VolatileRef<'_, u16> {
self.desc.get_ref(offset_of!(DescriptorTmp, flags)).unwrap()
}
pub fn next(&self) -> VolatileRef<u16> {
pub fn next(&self) -> VolatileRef<'_, u16> {
self.desc.get_ref(offset_of!(DescriptorTmp, next)).unwrap()
}
@@ -513,11 +513,11 @@ pub mod tests {
self.start.unchecked_add(self.ring.len() as GuestUsize)
}
pub fn flags(&self) -> VolatileRef<u16> {
pub fn flags(&self) -> VolatileRef<'_, u16> {
self.ring.get_ref(0).unwrap()
}
pub fn idx(&self) -> VolatileRef<u16> {
pub fn idx(&self) -> VolatileRef<'_, u16> {
self.ring.get_ref(2).unwrap()
}
@@ -525,12 +525,12 @@ pub mod tests {
4 + mem::size_of::<T>() * (i as usize)
}
pub fn ring(&self, i: u16) -> VolatileRef<T> {
pub fn ring(&self, i: u16) -> VolatileRef<'_, T> {
assert!(i < self.qsize);
self.ring.get_ref(Self::ring_offset(i)).unwrap()
}
pub fn event(&self) -> VolatileRef<u16> {
pub fn event(&self) -> VolatileRef<'_, u16> {
self.ring.get_ref(Self::ring_offset(self.qsize)).unwrap()
}
@@ -602,7 +602,7 @@ pub mod tests {
(self.dtable.len() / VirtqDesc::dtable_len(1)) as u16
}
pub fn dtable(&self, i: u16) -> VirtqDesc {
pub fn dtable(&self, i: u16) -> VirtqDesc<'_> {
VirtqDesc::new(&self.dtable, i)
}

View File

@@ -865,11 +865,11 @@ mod tests {
0
);
let config: [u8; 8] = [0; 8];
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::write_config(
&mut dev, 0, &config,
);
let mut data: [u8; 8] = [1; 8];
VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
let _ = VirtioDevice::<Arc<GuestMemoryMmap<()>>, QueueSync, GuestRegionMmap>::read_config(
&mut dev, 0, &mut data,
);
assert_eq!(config, data);

View File

@@ -339,7 +339,7 @@ mod tests {
}
}
pub fn create_event_handler_context(&self) -> EventHandlerContext {
pub fn create_event_handler_context(&self) -> EventHandlerContext<'_> {
const QSIZE: u16 = 256;
let guest_rxvq = GuestQ::new(GuestAddress(0x0010_0000), &self.mem, QSIZE);

View File

@@ -3,14 +3,14 @@
## Device Manager
Currently we have following device manager:
| Name | Description |
| Name | Description |
| --- | --- |
| [address space manager](../src/address_space_manager.rs) | abstracts virtual machine's physical management and provide mapping for guest virtual memory and MMIO ranges of emulated virtual devices, pass-through devices and vCPU |
| [config manager](../src/config_manager.rs) | provides abstractions for configuration information |
| [console manager](../src/device_manager/console_manager.rs) | provides management for all console devices |
| [resource manager](../src/resource_manager.rs) |provides resource management for `legacy_irq_pool`, `msi_irq_pool`, `pio_pool`, `mmio_pool`, `mem_pool`, `kvm_mem_slot_pool` with builder `ResourceManagerBuilder` |
| [VSOCK device manager](../src/device_manager/vsock_dev_mgr.rs) | provides configuration info for `VIRTIO-VSOCK` and management for all VSOCK devices |
| [config manager](../src/config_manager.rs) | provides abstractions for configuration information |
| [console manager](../src/device_manager/console_manager.rs) | provides management for all console devices |
| [resource manager](../src/resource_manager.rs) |provides resource management for `legacy_irq_pool`, `msi_irq_pool`, `pio_pool`, `mmio_pool`, `mem_pool`, `kvm_mem_slot_pool` with builder `ResourceManagerBuilder` |
| [VSOCK device manager](../src/device_manager/vsock_dev_mgr.rs) | provides configuration info for `VIRTIO-VSOCK` and management for all VSOCK devices |
## Device supported
`VIRTIO-VSOCK`

View File

@@ -27,4 +27,4 @@ You could use following command to download the upstream kernel (currently Drago
After this command, the kernel code with `upcall` and related `.config` file are all set up in the directory `kata-linux-dragonball-experimental-5.10.25-[config version]`. You can either manually compile the kernel with `make` command or following [Document for build-kernel.sh](../../../tools/packaging/kernel/README.md) to build and use this guest kernel.
Also, a client-side is also needed in VMM. Dragonball has already open-source the way to implement `upcall` client and Dragonball compiled with `dbs-upcall` feature will enable Dragonball client side.
Also, a client-side is also needed in VMM. Dragonball has already open-source the way to implement `upcall` client and Dragonball compiled with `dbs-upcall` feature will enable Dragonball client side.

View File

@@ -27,11 +27,11 @@ When the vCPU is created, it'll turn to `paused` state. After vCPU resource is r
During the `running` state, VMM will catch vCPU exit and execute different logic according to the exit reason.
If the VMM catch some exit reasons that it cannot handle, the state will change to `waiting_exit` and VMM will stop the virtual machine.
If the VMM catch some exit reasons that it cannot handle, the state will change to `waiting_exit` and VMM will stop the virtual machine.
When the state switches to `waiting_exit`, an exit event will be sent to vCPU `exit_evt`, event manager will detect the change in `exit_evt` and set VMM `exit_evt_flag` as 1. A thread serving for VMM event loop will check `exit_evt_flag` and if the flag is 1, it'll stop the VMM.
When the VMM is stopped / destroyed, the state will change to `exited`.
## vCPU Hot plug
Since `Dragonball Sandbox` doesn't support virtualization of ACPI system, we use [`upcall`](https://github.com/openanolis/dragonball-sandbox/tree/main/crates/dbs-upcall) to establish a direct communication channel between `Dragonball` and Guest in order to trigger vCPU hotplug.

View File

@@ -389,7 +389,7 @@ impl FsDeviceMgr {
};
let vm_as = ctx.get_vm_as().map_err(|e| {
error!(ctx.logger(), "virtio-fs get vm_as error: {:?}", e;
error!(ctx.logger(), "virtio-fs get vm_as error: {:?}", e;
"subsystem" => "virito-fs");
FsDeviceError::DeviceManager(e)
})?;

Some files were not shown because too many files have changed in this diff Show More