Compare commits

..

215 Commits

Author SHA1 Message Date
Fabiano Fidêncio
c7d0c270ee release: Bump version to 3.24.0
Bump VERSION and helm-chart versions

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-12 18:15:41 +01:00
Fabiano Fidêncio
50b853eb93 tests: nvidia: Always rely on the "kata" default runtime class
This is a pattern already followed by all the other tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
ff2396aeec tests: nvidia: Declare KATA_HYPERVISOR variable
Align with other test logic - declare the KATA_HYPERVISOR in the
run bash script, then declare the RUNTIME_CLASS_NAME variable in
the bats files.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
6e31cf2156 tests: nvidia: cc: USE is_confidential_gpu_hw
This function has recently been introduced, so we align patterns.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
cd1f55b41c tests: nvidia: cc: Set GPU0 policy for NIM tests
Now that we have a more restrictive resource policy for KBS, let
us start adopting it across all NVIDIA test cases. This policy was
previously introduced by the NVIDIA attestation test.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
edbac264cb tests: nvidia: cc: Remove KBS variable
The variable is now set in the CI YAML file, thus removing the
assignment.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
9665b74653 tests: nvidia: cc: address shellcheck warnings
Address shellcheck warnings for run_kubernetes_nv_tests.sh

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Manuel Huber
5f9e7a03a8 tests: nvidia: do not use teardown_common
Clean up in each NVIDIA bats file according to our needs.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 16:31:42 +01:00
Alex Lyn
c3fd4c1621 version: Bump rtnetlink and netlink-packet-route
It aims to upgrade rtnetlink to mitigate netlink log noise.
This commit upgrades the `rtnetlink` dependency (and corresponding
libraries like `netlink-packet-route`) to address excessive and
unnecessary netlink-related logging during sandbox startup.

Problem:
The previously used `rtnetlink v0.16` (depending on `netlink-proto
v0.11.3`) generates a high volume of DEBUG/INFO level netlink messages
during sandbox initialization. This noise:
1.  Overloads the logging system, often leading to warnings like
"slog-async: logger dropped messages due to channel overflow."
2.  Interferes with effective troubleshooting by distracting developers
from legitimate Kata errors.

Solution:
We upgrade to `rtnetlink v0.19` (and `netlink-proto v0.12`), as testing
confirms that the latest versions have correctly elevated the verbosity
of these netlink internal events to the TRACE level.

This change significantly enhances the log analysis experience by
suppressing unnecessary network-related logs during startup.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-12 14:27:33 +01:00
Manuel Huber
1781fb8b06 tests: nvidia: cc: Use CUDA image from NVCR
Pull from nvcr.io to avoid hitting unauthenticated pull rate
limits.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
f63f95f315 tests: nvidia: cc: generate pod security policies
With these changes, we create pod security policies when running
against NVIDIA TEE GPU handlers where AUTO_GENERATE_POLICY is set.
For the non-TEE GPU tests, the added functions bail out by design.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
bf26ad9532 nvidia: tests: remove outer CDI annotations
With the new device plugin being used by CI runners, these
annotations are no longer necessary.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
37b4f6ae8b tests: Adapt NVIDIA common policy settings
Following existing patterns, we adapt the common policy settings
for NVIDIA GPU CI platforms. For instance, for our CI runners, we
use containerd 2.x.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
f4c0c8546e tests: Enable AUTO_GENERATE_POLICY for NVIDIA TEEs
Enable auto-generate policy for qemu-nvidia-gpu-* if the user
didn't specify an AUTO_GENERATE_POLICY value.

Setting this in run_kubernetes_nv_tests.sh is too late as
gha-run.sh calls into run_tests, setup.sh, and then into
create_common_genpolicy_settings() where the rules.rego and
genpolicy-settings file are being copied to the right locations.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
b9774e44b6 genpolicy: tests: Add VFIO passthrough test cases
Add one valid test case with 2 GPUs with proper VFIO device
entries and CDI annotations.
Add seven test cases with invalid combinations of VFIO device
entries and CDI annotations.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Manuel Huber
d3e6936820 genpolicy: validation of vfio passthrough GPUs
Add rules for vfio passthrough GPUs. When creating the security
policy document, parse GPU resource limits and derive CDI
annotation patterns and VFIO device entries.
With various values for CDI annotations and device paths being
runtime-dependent, use regular expressions.
For now, this enables passthrough of NVIDIA GPUs, but the changes
are designed to allow for other VFIO device types.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-12 12:52:33 +01:00
Alex Lyn
82e8e9fbe0 doc: add block device's settings to the doc page
Add the block device specific annotations which is dedicated within
runtime-rs for num_queues and queue_sie to the document to help
users set the two parameters.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-11 21:10:22 +01:00
Alex Lyn
a8a458664d kata-types: Allow dynamic queue config via Pod annotations
This commit introduces the capability to dynamically configure
`queue_size` and `num_queues` parameters via Pod annotations.

Currently, `kata-runtime` allows for static configuration of
`queue_size` and `num_queues` for block devices through its config
file. However, a critical issue arises when a Pod is allocated fewer
CPU cores than the statically configured `num_queues` value. In such
scenarios, the Pod fails to start, leading to operational instability
and limiting flexibility in resource allocation.

To address this, this feature enables users to override the default
queue_size and num_queues parameters by specifying them in Pod
annotations.This allows for fine-grained control and dynamic adjustment
of these parameters based on the specific resource allocation of a Pod.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-11 21:10:22 +01:00
Steve Horsman
51459b9b15 Merge pull request #12220 from fidencio/topic/ci-arm64-temporarily-disable-arm64-non-k8s-tests
ci: arm64-non-k8s: temporarily skip the tests
2025-12-11 11:35:39 +00:00
Fabiano Fidêncio
46c7d6c9f8 ci: arm64-non-k8s: temporarily skip the tests
The runner is down for a few weeks. I may end up bringing in my personal
runner, but I'm not confident I can easily do this before the holidays,
thus I'm skipping the tests for now.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-11 12:14:32 +01:00
Manuel Huber
560f6f6c74 tests: nvidia: cc: Affirming attestation policy
Set the attestation policy for GPU0 to affirming. This requires
the GPU, for instance, to have production properties, such as
properly signed VBIOS firmware.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-11 10:16:58 +01:00
Alex Lyn
751b6875f9 tests: Temporarily skip the cpu-ns test for the s390x platform
As some reasons that this CI is continuously failed, we'd like to
temporarily skip it for the s390x platform. And it will be enabled
when we addressed related issues.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
d495b77135 runtime-rs: Align the default annptations with runtime-go
As the default enable_annotations in runtime-rs is different with
runtime-go, we should make it align with configuration in runtime-go.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
c8dd5fbacf runtime-rs: Migrate vCPU tracking to fractional float
This commit refactors the vCPU resource management within runtime's
`CpuResource` structure and related calculation logic to use
floating-point numbers (`f32`) instead of integers (`u32`).

This migration is necessary to fully support the fractional vCPU
allocation introduced in the `kata-types` library, ensuring better
precision in:
1.Allocation Tracking: `current_vcpu` now tracks the precise
fractional value (e.g., 1.5 vCPUs).
2.Resource Calculation: `calc_cpu_resources` now returns a precise
`f32` sum of container vCPU requests, including normalization logic
based on the maximum period, removing the previous integer rounding
steps in the calculation.
3.Hypervisor Interaction: The integer vCPU requirement for the
hypervisor remains, so `ceil()` is now explicitly applied only when
interacting with the hypervisor or agent APIs
(`do_update_cpu_resources`, `current_vcpu`, `online_cpu_mem`).

And key changes as below:
1. `CpuResource::current_vcpu` updated from `u32` to `f32`.
2. `calc_cpu_resources` return type changed from `u32` to `f32`.
3. CPU hotplug logic now uses `f32` for the target vCPU count and applies
4. `ceil()` before calling `hypervisor.resize_vcpu()`.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
84fd33c3bc kata-types: Use fractional float for vCPU resource tracking
Refactors `LinuxContainerCpuResources` and `LinuxSandboxCpuResources`
to track calculated vCPU allocation using `f64` (fractional float)
instead of `u64` (milliseconds).

This ensures more precise resource calculation (`quota / period`) and
aggregation by avoiding rounding errors inherent in millisecond-based
integer tracking.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
0f04363ea8 tests: Disable CPU elasticity tests for nontee scenarios
This commit updates the non-TEE tests to disable two specific test
cases: `k8s-number-cpus.bats` and `k8s-sandbox-vcpus-allocation.bats`.

These tests are designed to cover CPU elasticity/dynamic scaling
capabilities. In the non-TEE scenario, we are enforcing the disabling of
this capability by setting the default configuration to
`static_sandbox_resource_mgmt=true`.

Although the tests currently pass, allowing them to run is logically
inconsistent with the intended non-TEE configuration. Therefore, we are
disabling them for all non-TEE runtimes, specifically targeting:
- `qemu-coco-dev`
- `qemu-coco-dev-runtime-rs`

This change ensures that our non-TEE CI accurately reflects the static
resource management policy and prevents misleading test results.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
beaf44dd2e tests: disable block volume test for s390 arch
As runtime-rs doesn't support block device hotplug in s390 arch,
with this fact, we just disable or skip the test when it is the
s390.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
535ba589f4 runtime-rs: Enable elastic resource feature
To support such feature, the item in Makefile should be enabled,
and it can be set true when make build, just like this:
`DEFSTATICRESOURCEMGMT_QEMU := false`
When users don't want this feature, they can set it with true via
the configuration.toml.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
28371dbec5 tests: Enable cloud-hypervisor and qemu-runtime-rs within the CI
Enable the cpu hotplug tests within the k8s-number-cpus.bats for both
cloud-hypervisor and qemu-runtime-rs.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
82a72b4564 tests: Enable cpu hotplug for dragonball and clh in vcpus allocation
We have support cpu hotplug features within dragonball and clh, this
commit is to enable the test within the CI.

Fixes: #8660

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
6196d3d646 tests: Enable cpu hotplug tests in k8s-cpu-ns.bats
As previous failure within the case, we choose to skip it, but now
the cpu hotplug has been corrected, and it's time to re-enable it.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
Alex Lyn
96bd13e85d tests: Add support for qemu-runtime-rs
We have supportted virtio-scsi driver, and now the CI should be
enabled.

Fixes: #10373

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-12-10 22:11:56 +01:00
dependabot[bot]
2137b1fa3a build(deps): bump github.com/containernetworking/plugins in /src/runtime
Bumps [github.com/containernetworking/plugins](https://github.com/containernetworking/plugins) from 1.7.1 to 1.9.0.
- [Release notes](https://github.com/containernetworking/plugins/releases)
- [Commits](https://github.com/containernetworking/plugins/compare/v1.7.1...v1.9.0)

---
updated-dependencies:
- dependency-name: github.com/containernetworking/plugins
  dependency-version: 1.9.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-12-10 16:10:24 +01:00
LandonTClipp
b50a73912d runtime: Config test extension for IOMMUFDID
Adding additional cases for the IOMMUFDID method to check for
non-IOMMUFD paths are passed. The method should do the right
thing.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
d5e4cf6b4d runtime: Add test for ExecuteVFIODeviceAdd
Copilot made a good point that we should have a test for this.
Thus, this commit.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
137866f793 runtime: Allow QMP commands to be logged in debug level
Logging the QMP commands gives us a lot of flexibility to
troubleshoot issues with what is being sent to QEMU.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
a3b5764f67 runtime: Fix import cycle and add unit test for IOMMUFDID()
An import cycle was introduced because of a mutual need
for the constant that describes the prefix of IOMMUFD files.
We need to extract this out into a higher-level package.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
LandonTClipp
09438fd54f runtime: Add IOMMUFD Object Creation for QEMU QMP Commands
The QMP commands sent to QEMU did not properly set up
IOMMUFD objects in the codepath that handles VFIO device
hot-plugging. This is mainly relevant in the Kubernetes
use-case where the VFIO devices are not available when
QEMU is first launched.

Signed-off-by: LandonTClipp <11232769+LandonTClipp@users.noreply.github.com>
2025-12-10 15:46:28 +01:00
Manuel Huber
cb8fd2e3b1 runtime: gpu: Skip CDI annos for pause container
The pause container does not need CDI annotations, these are only
intended for workload containers.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-10 13:26:04 +01:00
Fabiano Fidêncio
69a0ac979c tests: Adjust install_bats()
The function assumes that the runner is a Ubuntu machine, which so far
has been true as part of our CI.

However, the new ARM runner is running on Debian, and those mirror
additions would simply break.

With this in mind, for any distro that's not ubuntu, let's just make
sure to inform the owner of the system to have bats already installed as
part of the environment provided.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-10 12:05:04 +01:00
Fabiano Fidêncio
406f6b1d15 Revert "tests: Add workaround to override CDI files"
This reverts commit 5a81b010f2, as we now
have all the infrastructure properly set up as part of our CI node.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-09 23:18:11 +01:00
Fabiano Fidêncio
3db7b88eff tests: remove containerd guest pull stability tests
Remove the existing containerd guest pull stability tests workflow
as we're going to rebuild all the VMs used for testing and introduce
new, more focused stability tests for nydus-snapshotter.

The new tests will be added soon, as part of another PR.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-08 16:29:11 +01:00
Fabiano Fidêncio
5b6a2d25bc podOverhead: Reduce memory overhead for GPU runtime classes
Now that we've bumped to QEMU 10.2.0-rc1, we can take advantage of a fix
that's present there, which fixes the double memory allocation for the
cases where GPUs are being cold-plugged.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-06 00:16:43 +01:00
Fabiano Fidêndio
71f78cc87e tests: cc: gpu: Lower the amount of memory required by the pods
We've made the pods require a ridiculous amount of memory, just for the
sake of getting them running.

Now that those are running, tests are passing, CI is required, let's
work to lower the amount of mmemory needed as everything else is working
as expected.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-06 00:16:43 +01:00
Dan Mihai
965ad10cf2 tests: k8s: tests_common.sh local modification
Clean-up shellcheck warnings:

SC2030 (info): Modification of cmd_out is local (to subshell caused by (..) group).
SC2031 (info): cmd_out was modified in a subshell. That change might be lost.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-12-06 00:16:23 +01:00
Dan Mihai
8199171cc4 tests: k8s: tests_common.sh braces around variables
Clean-up shellcheck warnings:

SC2250 (style): Prefer putting braces around variable references even
when not strictly required.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-12-06 00:16:23 +01:00
Fabiano Fidêncio
5a81b010f2 tests: Add workaround to override CDI files
Let's add a simple backup and restore logic for the CDI configuration
file nvidia.com-pgpu.yaml in the k8s-nvidia-*.bats and
k8s-confidential-attestation.bats test files.

Althought not optimal, this is a temporary workaround needed until
NVIDIA releases what's needed for the GPU Operator to properly deal with
cold plugged devices for the Confidential Containers cases, which is
work in progress right now.

After that's released, we can revert/drop this patch.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 18:58:35 +01:00
Fabiano Fidêncio
aaa67df4dd versions: Bump experimental {tdx,snp} QEMU
Let's bump experimental {tdx,snp} QEMU to the tags created Today in the
Confidential Containers repo, which match with QEMU 10.2.0-rc1.

This bump is specially beneficial for us, as we can get rid of QEMU's
double memory allocation when **cold plugging** a GPU.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 18:58:35 +01:00
Zvonko Kaiser
f8ad17499d gpu: VFIO handling container vs sandbox
If the sandbox has cold-plugged a IOMMUFD device but the
device-plugins sends us a /dev/vfio/<NUM> device we need to
check if the IOMMUFD device and the  VFIO device are the same
We have the sibling.BDF we now need to extract the BDF of the
devPath that is either /dev/vfio/<NUM> or /dev/vfio/devices/vfio<NUM>

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-05 16:53:31 +01:00
Zvonko Kaiser
147e9f188e Merge pull request #12080 from manuelh-dev/mahuber/cc-gpu-ci-attestation
tests: nvidia: cc: Add attestation test
2025-12-05 09:31:57 -05:00
Steve Horsman
2f1b98c232 Merge pull request #12197 from stevenhorsman/logrus-1.9.3-bump
version: Bump sirupsen/logrus
2025-12-05 14:18:50 +00:00
Manuel Huber
e5861cde20 tests: use Authorization when GH_TOKEN is set
Same as for other uses of GH_TOKEN, use it when set in order to
avoid rate limiting issues.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 14:08:43 +01:00
stevenhorsman
9eba559bd6 version: Bump sirupsen/logrus
Bump the github.com/sirupsen/logrus version to 1.9.3
across our components where it is back-level to bring us
up-to-date and resolve high severity CVE-2025-65637

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-05 11:12:04 +00:00
Manuel Huber
34efa83afc tests: nvidia: cc: Add attestation test
Add the attestation bats test case to the NVIDIA CI and provide a
second pod manifest for the attestation test with a GPU. This will
enable composite attestation in a subsequent step.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
e31d592a0c versions: Bump coco-trustee
Bump to pull in a fix for composite attestation with GPUs. The new
commit ID corresponds to the fix (change for default GPU policy),
currently being the top commit of the main branch.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
73dfa9b9d5 versions: Bump coco-guest-components
Bump to pull in a fix for NVIDIA CC GPU attestation.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
116a72ad0d tests: cc: Fix command evaluation
This brings two fixes:
- use the test_key variable to check against the aatest value.
- properly check the run command invocation (run w/o bash does not
  seem to like the pipe which leads to ALWAYS evaluating the
  status result to 1. With this, the deny-all test would ALWAYS
  succeed regardless of whether aatest was actually returned or not.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
23675c784b tests: cc: Reset default policy
When running these tests repeatedly locally, the default policy is not
being reset after the test completes, then subsequent runs fail.
Similar to k8s-sealed-secrets.bats, we set the default policy in an if
condition.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
f70c3adaf1 tests: cc: Add kbs_set_gpu0_resource_policy
This allows setting a GPU0 resource policy, enabling GPU
attestation tests to not use the default resource policy.
For now, the policy requires attestation's ear status to
not be contraindicated. In a future change we will require
this to be affirming once our CI runners' vBIOS version is
properly configured.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
c2d1e2dcc9 tests: cc: Add is_confidential_gpu_hardware
This enables attestation tests to figure out whether composite
attestation with a GPU can be executed.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Manuel Huber
53e94df203 tests: nvidia: cc: add SUPPORTED_TEE_HYPERVISORS
Add the NVIDIA TEE hypervisors. With this, attestation tests can be run
against the NVIDIA handlers, for instance.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-05 11:48:55 +01:00
Fabiano Fidêncio
923f97bc66 rootfs: Temporarily revert "gpu: Handle root_hash.txt correctly"
This reverts commit e4a13b9a4a, as it
caused some issues with the GPU workflows.

Reverting it is better, as it unblocks other PRs.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-05 11:47:37 +01:00
Steve Horsman
d27af53902 Merge pull request #12185 from stevenhorsman/runtime-rs-required-checks
ci: Add qemu-runtime-rs AKS tests to required
2025-12-05 10:43:25 +00:00
stevenhorsman
403de2161f version: Update golang to 1.24.11
Needed to fix:
```
Vulnerability #1: GO-2025-4155
    Excessive resource consumption when printing error string for host
    certificate validation in crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2025-4155
  Standard library
    Found in: crypto/x509@go1.24.9
    Fixed in: crypto/x509@go1.24.11
    Vulnerable symbols found:
      #1: x509.HostnameError.Error
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-04 22:50:07 +01:00
Steve Horsman
425f4ffc8d Merge pull request #12124 from zvonkok/nvidia-measured-rootfs
gpu: Measured rootfs
2025-12-04 14:54:11 +00:00
Steve Horsman
f673f33e72 Merge pull request #12172 from fidencio/topic/gatekeeper-mark-nvidia-jobs-as-required
gatekeeper: Mark NVIDIA CC GPU test as required
2025-12-04 12:48:57 +00:00
stevenhorsman
112810c796 ci: Add qemu-runtime-rs AKS tests to required
Add the small and normal variants of the qemu-runtime-rs
tests to the required-tests list now that they are stable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-04 11:15:43 +00:00
Fabiano Fidêncio
c505afb67c gatekeeper: Mark NVIDIA CC GPU test as required
It's been stable for the past 10 nightlies, no retries.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-12-04 11:14:25 +00:00
Steve Horsman
635f7892d5 Merge pull request #12190 from BbolroC/mark-s390x-jobs-as-nonrequired
gatekeeper: Drop all s390x e2e tests temporarily
2025-12-04 11:10:46 +00:00
Steve Horsman
2a6ebc556f Merge pull request #12175 from kata-containers/mahuber/gpu-ci-genpolicy
ci: nvidia: Install kata-artifacts
2025-12-04 09:23:32 +00:00
Hyounggyu Choi
b6ef7eb9c3 gatekeeper: Drop all s390x e2e tests temporarily
This commit marks three s390x CI jobs as non-required.
Please check out the details at #12189.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-12-04 08:05:14 +01:00
Steve Horsman
10b0717cae Merge pull request #12179 from stevenhorsman/nginx-test-image-by-digest
tests: Switch nginx test image ref to digest
2025-12-03 13:39:07 +00:00
Zvonko Kaiser
e4a13b9a4a gpu: Handle root_hash.txt correctly
Updates to the shim-v2 build and the binaries.sh script.
Makeing sure that both variants "confidential" AND
"nvidia-gpu-confidential" are handled.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-02 19:56:19 +01:00
Steve Horsman
d8405cb7fb Merge pull request #11983 from stevenhorsman/toolchain-guidance
doc: Document our Toolchain policy
2025-12-02 15:47:54 +00:00
stevenhorsman
b9cb667687 doc: Document our Toolchain policy
Create an initial version of our toolchain policy as agreed in
Architecture Committee meetings and the PTG

Fixes: #9841
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 14:28:29 +00:00
stevenhorsman
79a75b63bf tests: Switch nginx test image ref to digest
As tags are mutable and digests are not, lets pin our image
by digest to give our CI a better chance of stability

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 13:02:50 +00:00
stevenhorsman
5c618dc8e2 tests: Switch nginx images to use version.yaml details
- Swap out the hard-coded nginx registry and verisons for reading
the test image details for version.yaml
which can also ensure that the quay.io mirror is used
rather than the docker hub versions which can hit pull limits
- Try setting imagePullPoliycy Always to fix issues with the arm CI

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-12-02 10:04:09 +01:00
Manuel Huber
3427b5c00e ci: nvidia: Install kata-artifacts
In preparation for Kata agent security policy testing, installing
Kata tools to provide genpolicy.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-01 17:59:19 +00:00
Manuel Huber
4355af7972 kata-deploy: Fix binary find install_tools_helper
Using make tarball targets for tools locally, binaries may exist
for both debug and release builds. In this case, cryptic errors
are shown as we try to install multiple binaries.
This change require exactly one binary to be found and errors  out
in other cases.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-12-01 09:29:24 -08:00
Manuel Huber
5a5c43429e ci: nvidia: remove kubectl_retry calls
When tests regress, the CI wait time can increase significantly
with the current kubectly_retry attempt logic. Thus, align with
other tests and remove kubectl_retry invocations. Instead, rely on
proper timeouts.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-28 19:00:57 +01:00
Fabiano Fidêncio
e3646adedf gatekeeper: Drop SEV-SNP from required
SEV-SNP machine is failing due to nydus not being deployed in the
machine.

We cannot easily contact the maintainers due to the US Holidays, and I
think this should become a criteria for a machine not be added as
required again (different regions coverage).

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-28 12:46:07 +01:00
Steve Horsman
8534afb9e8 Merge pull request #12150 from stevenhorsman/add-gatekeeper-triggers
ci: Add two extra gatekeeper triggers
2025-11-28 09:34:41 +00:00
Zvonko Kaiser
9dfa6df2cb agent: Bump CDI-rs to latest
Latest version of container-device-interface is v0.1.1

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 22:57:50 +01:00
Fabiano Fidêncio
776e08dbba build: Add nvidia image rootfs builds
So far we've only been building the initrd for the nvidia rootfs.
However, we're also interested on having the image beind used for a few
use-cases.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 22:46:07 +01:00
stevenhorsman
531311090c ci: Add two extra gatekeeper triggers
We hit a case that gatekeeper was failing due to thinking the WIP check
had failed, but since it ran the PR had been edited to remove that from
the title. We should listen to edits and unlabels of the PR to ensure that
gatekeeper doesn't get outdated in situations like this.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-27 16:45:04 +00:00
Zvonko Kaiser
bfc9e446e1 kernel: Add NUMA config
Add per arch specific NUMA enablement kernel settings

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-27 12:45:27 +01:00
Steve Horsman
c5ae8c4ba0 Merge pull request #12144 from BbolroC/use-runs-on-to-choose-runners
GHA: Use `runs-on` only for choosing proper runners
2025-11-27 09:54:39 +00:00
Fabiano Fidêncio
2e1ca580a6 runtime-rs: Only QEMU supports templating
We can remove the checks and default values attribution from all other
shims.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-27 10:31:28 +01:00
Alex Lyn
df8315c865 Merge pull request #12130 from Apokleos/stability-rs
tests: Enable stability tests for runtime-rs
2025-11-27 14:27:58 +08:00
Fupan Li
50dce0cc89 Merge pull request #12141 from Apokleos/fix-nydus-sn
tests: Properly handle containerd config based on version
2025-11-27 11:59:59 +08:00
Fabiano Fidêncio
fa42641692 kata-deploy: Cover all flavours of QEMU shims with multiInstallSuffix
We were missing all the runtime-rs variants.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Fabiano Fidêncio
96d1e0fe97 kata-deploy: Fix multiInstallSuffix for NV shims
When using the multiInstallSuffix we must be cautelous on using the shim
name, as qemu-nvidia-gpu* doesn't actually have a matching QEMU itself,
but should rather be mapped to:
qemu-nvidia-gpu -> qemu
qemu-nvidia-gpu-snp -> qemu-snp-experimental
qemu-nvidia-gpu-tdx -> qemu-tdx-experimental

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-26 17:44:16 +01:00
Markus Rudy
d8f347d397 Merge pull request #12112 from shwetha-s-poojary/fix_list_routes
agent: fix the list_routes failure
2025-11-26 17:32:10 +01:00
Steve Horsman
3573408f6b Merge pull request #11586 from zvonkok/numa-qemu
qemu: Enable NUMA
2025-11-26 16:28:16 +00:00
Steve Horsman
aae483bf1d Merge pull request #12096 from Amulyam24/enable-ibm-runners
ci: re-enable IBM runners for ppc64le and s390x
2025-11-26 13:51:21 +00:00
Steve Horsman
5c09849fe6 Merge pull request #12143 from kata-containers/topic/add-report-tests-to-workflows
workflows: Add Report tests to all workflows
2025-11-26 13:18:21 +00:00
Steve Horsman
ed7108e61a Merge pull request #12138 from arvindskumar99/SNPrequired
CI: readding SNP as required
2025-11-26 11:33:07 +00:00
Amulyam24
43a004444a ci: re-enable IBM runners for ppc64le and s390x
This PR re-enables the IBM runners for ppc64le/s390x build jobs and s390x static checks.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2025-11-26 16:20:01 +05:30
Hyounggyu Choi
6f761149a7 GHA: Use runs-on only for choosing proper runners
Fixes: #12123

`include` in #12069, introduced to choose a different runner
based on component, leads to another set of redundant jobs
where `matrix.command` is empty.
This commit gets back to the `runs-on` solution, but makes
the condition human-readable.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-11-26 11:35:30 +01:00
Alex Lyn
4e450691f4 tests: Unify nydus configuration to containerd v3 schema
Containerd configuration syntax (`config.toml`) varies across versions,
requiring per-version logic for fields like `runtime`.

However, testing confirms that containerd LTS (1.7.x) and newer
versions fully support the v3 schema for the nydus remote snapshotter.

This commit changes the previous containerd v1 settings in `config.toml`.
Instead, it introduces a unified v3-style configuration for nydus, which
can be vailid for lts and active containerds.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-26 17:58:16 +08:00
stevenhorsman
4c59cf1a5d workflows: Add Report tests to all workflows
In the CoCo tests jobs @wainersm create a report tests step
that summarises the jobs, so they are easier to understand and
get results for. This is very useful, so let's roll it out to all the bats
tests.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-26 09:28:36 +00:00
shwetha-s-poojary
4510e6b49e agent: fix the list_routes failure
relax list_routes tests so not every route requires a device

Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>
2025-11-25 20:25:46 -08:00
Xuewei Niu
04e1cf06ed Merge pull request #12137 from Apokleos/fix-netdev-mq
runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
2025-11-26 11:49:33 +08:00
Alex Lyn
ebe084e093 Merge pull request #12122 from fidencio/topic/configs-do-no-have-commented-out-options
runtimes: config: Do NOT have commented fields
2025-11-26 10:33:32 +08:00
Alex Lyn
e9f50f6e71 Merge pull request #12116 from manuelh-dev/mahuber/ci-openvpn-policy-v2
policy: ci: enable security policy for openvpn test case
2025-11-26 09:35:43 +08:00
Fabiano Fidêncio
e859537c74 runtimes: config: Do NOT have commented fields
In order to have a better way to set things up using a toml editor, we
should take the containerd approach and actually have everything
uncommnted.  This will help us to unify how we deal with such values in
the future from the kata-deploy POV.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-25 19:26:56 +01:00
Arvind Kumar
c085011a0a CI: readding SNP as required
Reenabling the SNP CI node as a required test.

Signed-off-by: Arvind Kumar <arvinkum@amd.com>
2025-11-25 17:05:01 +00:00
Fabiano Fidêncio
5ca4f2b9ff runtimes: annotations: Fix kernel param handling
We need to ensure that we do not blindly append nor blindly override the
kernel parameters set by default, but rather modify the values in case
they exist, and append in case they do not.

Now we're actually making golang and rust runtime behave the same, as so
far they were behaving differently, each version wrong in its own way.
:-p.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-25 16:04:52 +01:00
Zvonko Kaiser
45cce49b72 shellcheckk: Fix [] [[]] SC2166
This file is a beast so doing one shellcheck fix after the other.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:16 +01:00
Zvonko Kaiser
b2c9439314 qemu: Update tools/packaging/static-build/qemu/build-qemu.sh
This nit was introduced by 227e717 during the v3.1.0 era. The + sign from the bash substitution ${CI:+...} was copied by mistake.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-25 15:46:09 +01:00
Zvonko Kaiser
2f3d42c0e4 shellcheck: build-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:46:07 +01:00
Zvonko Kaiser
f55de74ac5 shellcheck: build-base-qemu.sh is clean
Make shellcheck happy

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:49 +01:00
Zvonko Kaiser
040f920de1 qemu: Enable NUMA support
Enable NUMA support with QEMU.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-25 15:45:00 +01:00
Alex Lyn
de9308419b Merge pull request #12135 from microsoft/danmihai1/init-data
agent: allow disabling detect_initdata_device
2025-11-25 21:07:57 +08:00
Alex Lyn
34d3bd18bc Merge pull request #12132 from fidencio/topic/runtime-classes-fix-nvidia-gpu-podOverhead
runtimeclasses: Fix nvidia-gpu podOverhead
2025-11-25 20:23:07 +08:00
Alex Lyn
7f4d856e38 tests: Enable nydus tests for qemu-runtime-rs
We need enable nydus tests for qemu-runtime-rs, and this commit
aims to do it.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:45:57 +08:00
Alex Lyn
98df3e760c runtime-rs: fix QMP 'mq' parameter type in netdev_add to boolean
QEMU netdev_add QMP command requires the 'mq' (multi-queue) argument
to be of boolean type (`true` / `false`). In runtime-rs the virtio-net
device hotplug logic currently passes a string value (e.g. "on"/"off"),
which causes QEMU to reject the command:
```
    Invalid parameter type for 'mq', expected: boolean
```
This patch modifies `hotplug_network_device` to insert 'mq' as a proper
boolean value of `true . This fixes sandbox startup failures when
multi-queue is enabled.

Fixes #12136

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 17:34:36 +08:00
Alex Lyn
23393d47f6 tests: Enable stability tests for qemu-runtime-rs on nontee
Enable the stability tests for qemu-runtime-rs CoCo on non-TEE
environments

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:18:37 +08:00
Alex Lyn
f1d971040d tests: Enable run-nerdctl-tests for qemu-runtime-rs
Enable nerdctl tests for qemu-runtime-rs

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:14:50 +08:00
Alex Lyn
c7842aed16 tests: Enable stability tests for runtime-rs
As previous set without qemu-runtime-rs, we enable it in this commit.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-25 16:12:12 +08:00
Alex Lyn
aadf1d6f71 Merge pull request #11932 from Apokleos/enhance-blk-params
runtime-rs: Allow configuration of virtio block queue parameters
2025-11-25 15:24:12 +08:00
Dan Mihai
22d60a36c0 agent: allow disabling detect_initdata_device
Allow users to build the Kata Agent using INIT_DATA=no to disable the
detect_initdata_device() code loop and associated debug log output.

Future additional improvements related to Init Data are tracked by #11532.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2025-11-25 02:44:28 +00:00
Fabiano Fidêncio
bb56a2e4d9 runtimeclasses: Fix nvidia-gpu podOverhead
On 69c4fc4e76, I've mistakenly changed the
nvidia-gpu podOverhead while I should only have changed the TEE
nvidia-gpu ones.

Let's move it back to its original value.

Reported-by: Joji Mekkattuparamban <jojim@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-24 21:43:29 +01:00
Zvonko Kaiser
55489818d6 gpu: TDX kernel param cleanup
This settings is not needed anymore with Ubuntu 25.10
and the newest QEMU releases for TDX by Ubuntu.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-24 15:49:16 +01:00
Steve Horsman
e1e370091c Merge pull request #12128 from fidencio/topic/kata-deploy-nfd-adjust-runtime-classe
kata-deploy: nfd: Patch TEE runtimeclasses when needed
2025-11-24 14:05:43 +00:00
Steve Horsman
d437f875aa Merge pull request #12126 from zvonkok/cold-plug-cleanup
gpu: Cleanup Makefile
2025-11-24 14:01:49 +00:00
Zvonko Kaiser
77089fe5b3 Merge pull request #12115 from nheinemans-asml/main
Kata-deploy: Add tolerations to daemonset and cleanup job
2025-11-24 09:00:42 -05:00
Manuel Huber
331515e1b8 ci: enable security policy for openvpn test
With issue 11777 being resolved, this commit enables openvpn
policy testing. The remaining work on the security policy
required to successfully run this test case was to enable UDP
ports for Service kinds and to use the mount path's last component
instead of the volume name to construct the expected storage
source path.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-23 17:23:43 +00:00
Manuel Huber
4f32816ea3 policy: Use mount path instead of volume name
Use the mount path's last component instead of the volume name to
construct the expected storage source path. Example: Name of a
volumeMount is 'openvpn-config' and its mountPath is
'/etc/openvpn/'. Without this change, we use 'openvpn-config' to
calculate the expected storage source path. However, we need to
use 'openvpn', because the shim uses the basename of the
destination path as the source suffix and not the volume name.
For reference, see 'fs_hsare_linux.go"'s 'ShareFile' function
where the filename variable uses 'filepath.Base(m.Destionation))'.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-23 17:23:43 +00:00
Manuel Huber
e4123a9848 policy: support UDP based Service types
For Service kinds using the UDP protocol as port. An example is
the openvpn-server-service.yaml file part of the openvpn CI test.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-23 17:23:43 +00:00
Fabiano Fidêncio
d0f3eb935e kata-deploy: nfd: Patch TEE runtimeclasses when needed
We've added logic to properly do the book keeping of the TEE keys when
using NFD **AND** creating the runtime classes. However, we need to also
take into consideration the case where the runtimeclasses are being
created by the helm template, and in that case we just update what helm
has deployed.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-23 10:27:52 +01:00
Zvonko Kaiser
dce207397c gpu: Cleanup Makefile
Some VARS were introduced but not cleaned up with
the recent cold-plug PR, doing this now

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-11-21 22:03:34 +00:00
Zvonko Kaiser
8afcdae31f Merge pull request #12092 from manuelh-dev/mahuber/cc-gpu-ci-smi-srs
tests: nvidia: cc: Remove nvrc.smi.srs=1 parameter
2025-11-21 08:26:13 -05:00
Steve Horsman
37dd055283 Merge pull request #12090 from stevenhorsman/required-tests-update-14-nov-2025
Required tests update 14 nov 2025
2025-11-21 12:05:05 +00:00
nheinemans-asml
ef9d4e8b0d kata-deploy: Add tolerations value to kata-deploy
This allows the daemonset and cleanup job to run on tainted nodes.

fixes #12114

Signed-off-by: nheinemans-asml <nick.heinemans@asml.com>
Signed-off-by: nheinemans-asml <97238218+nheinemans-asml@users.noreply.github.com>
2025-11-21 09:49:47 +01:00
Manuel Huber
dfc229f51e tests: nvidia: cc: Remove nvrc.smi.srs=1 parameter
Remove the nvrc.smi.srs=1 parameter from the kernel command line.
In CC use cases, the attestation agent is expected to set the GPU
ready state. For the CUDA vectorAdd case where attestation agent
is not being used, we set the ready state by adding the kernel
command line parameter through an annotation.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-21 09:35:05 +01:00
Manuel Huber
6c6fc50aa5 tests: nvidia: cc: allow-all policy and init-data
Add an allow-all policy for the CC GPU tests and ensure the init-data
device is being created (hypervisor annotations).

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-21 09:24:15 +01:00
Manuel Huber
7e20118c8e tests: nvidia: move secret definitions to bottom
The add_allow_all_policy_to_yaml in tests_common.sh needs some
improvements so that this function can support pod manifests with
different resource kinds. For now, moving the Secret definition
to the bottom so that we can create a default policy for the Pod.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-21 09:24:15 +01:00
Manuel Huber
ffd5443637 tests: nvidia: adapt is_aks_cluster
The qemu-nvida-gpu handlers should not cause is_aks_cluster to
return 1. Otherwise, CI logic will assume these hypervisors run on
AKS hosts, see the following message in CI w/o this change:
INFO: Adapting common policy settings for AKS Hosts

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-21 09:24:15 +01:00
Manuel Huber
f2bdd12e5e tests: nvidia: Check KATA_HYPERVISOR var
Fail explicitly when a wrong KATA_HYPERVISOR variable is provided.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-21 09:24:15 +01:00
Xuewei Niu
bf967b81cc runtime-rs: Bump cgroups-rs to v0.5.0
The new version fixes some issues with systemd version, path
verification.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2025-11-21 09:06:26 +01:00
Fabiano Fidêncio
6b40b59861 tests: Reduce KBS deployment check flakeness
We currently start a pod that does a `wget` to the KBS address, and
fails after 5 seconds.

By the time it fails and reports back, we can see that KBS is actually
running, but the workflow failed as the checker failed. :-/

Let's give it more time for the KBS to show up, and the flakeness should
go away.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-20 19:29:26 +01:00
Fabiano Fidêncio
35672ec5ee tests: cc: Test authenticated images with force guest pull
As this should simply work.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-20 19:02:15 +01:00
Fupan Li
b86e7ff42b Merge pull request #12087 from jojimt/device_cold_plug
shim: Support device cold plug with Kubernetes
2025-11-20 19:17:13 +08:00
Joji Mekkattuparamban
7dc292094c shim: go vendor changes for cold plug support
Vendor in the kubelet pod resources API.

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2025-11-20 10:58:55 +01:00
Joji Mekkattuparamban
5aa184925a shim: Support device cold plug with Kubernetes
Utilize Kubelet's Pod Resource API to determine device allocations
for the Pod during sandbox creation. Use CDI files to translate the device
IDs to corresponding device paths and perform device injection.

Fixes #12009

Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
2025-11-20 10:58:55 +01:00
Manuel Huber
477ca3980b tests: nvidia: cc: Re-enable multi GPU test case
Use the pod name variable so that kubectl wait finds the pod. Currently,
kubectl waits for nvidia-nim-llama-3-2-nv-embedqa-1b-v2, not for
nvidia-nim-llama-3-2-nv-embedqa-1b-v2-tee

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-20 10:05:46 +01:00
Zvonko Kaiser
89cd561340 Merge pull request #12059 from manuelh-dev/mahuber/bb-debug-v2
gpu: introduce a new devkit build flag to produce a rootfs for developers
2025-11-19 13:03:46 -05:00
Steve Horsman
8c6c31555a Merge pull request #12111 from fidencio/topic/ci-fix-erofs-ci
tests: k8s: Fix typo in authenticated tests
2025-11-19 16:08:48 +00:00
Manuel Huber
3966864376 gpu: introduce devkit build flag
Introduce a new devkit parameter which will produce a rootfs
without chisselling. This results in a larger rootfs with various
packages and binaries being included, for instance, enabling the
use of the debug console.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-19 15:50:03 +01:00
Manuel Huber
2c9e0f9f4f gpu: add signed-by to package sources
Pin to specific key. CUDA package sources in
/etc/apt/sources.list.d already use a specific key.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2025-11-19 15:50:03 +01:00
Ruoqing He
54bfbf5687 build: Exclude tools from root workspace
There are rust packages being cloned and built inside
tools/packaging/kata-deploy/local-build/build folder, which may mislead
those packages to think they are part of the kata root workspace.
Exclude the directory to avoid that.

Reported-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-19 15:49:25 +01:00
Fabiano Fidêncio
ae463642ed tests: k8s: Fix typo in authenticated tests
The person who introduced the check, someone named Fabiano Fidêncio,
forgot a `$` in a variable assignment.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-19 11:59:59 +01:00
Steve Horsman
87b180383e Merge pull request #11802 from kata-containers/dependabot/github_actions/oras-project/setup-oras-1.2.4
build(deps): bump oras-project/setup-oras from 1.2.2 to 1.2.4
2025-11-19 09:58:37 +00:00
dependabot[bot]
ede5ac9c2d build(deps): bump the bit-vec group across 2 directories with 1 update
Bumps the bit-vec group with 1 update in the /src/agent directory: [bit-vec](https://github.com/contain-rs/bit-vec).
Bumps the bit-vec group with 1 update in the /src/tools/agent-ctl directory: [bit-vec](https://github.com/contain-rs/bit-vec).


Updates `bit-vec` from 0.6.3 to 0.8.0
- [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md)
- [Commits](https://github.com/contain-rs/bit-vec/commits)

Updates `bit-vec` from 0.6.3 to 0.8.0
- [Changelog](https://github.com/contain-rs/bit-vec/blob/master/RELEASES.md)
- [Commits](https://github.com/contain-rs/bit-vec/commits)

---
updated-dependencies:
- dependency-name: bit-vec
  dependency-version: 0.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bit-vec
- dependency-name: bit-vec
  dependency-version: 0.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bit-vec
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-19 10:43:25 +01:00
stevenhorsman
b75d90b483 ci: Comment out snp ci from required-tests
The snp CI has not been required for a while and has recently been
broken, so comment it out from the list of required jobs.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-19 09:39:36 +00:00
stevenhorsman
ae71921be2 ci: Update build-checks name in required-tests
to update the required-tests to match.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-19 09:39:36 +00:00
stevenhorsman
112ed9bb46 ci: Comment out run-nydus from required-tests
The run-nydus tests are not stable and blocking PRs, so make them
non-required temporarily until they can be looked at

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-19 09:38:38 +00:00
Fupan Li
478a5ff693 Merge pull request #12109 from Apokleos/enable-cocodev-rs
tests: Enable AUTO_GENERATE_POLICY for qemu-coco-dev-runtime-rs
2025-11-19 12:05:22 +08:00
Alex Lyn
1da225efc5 tests: Enable AUTO_GENERATE_POLICY for qemu-coco-dev-runtime-rs
Enable auto-generate policy on cbl-mariner Hosts for
qemu-coco-dev-runtime-rs if the user didn't specify an
AUTO_GENERATE_POLICY value.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-19 10:44:03 +08:00
Alex Lyn
8d85548711 Merge pull request #12102 from Apokleos/rs-copyfile-devcgrp
runtime-rs: Clear Linux.Resources.Devices completely and correct the guest path for container mount binding
2025-11-19 09:05:59 +08:00
Fabiano Fidêncio
8c02b5b913 tests: nvidia: cc: Temporarily skip multi GPU for nim tests
We will re-enable this one later on once the changes to properly cold
plug multi GPUs are merged.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
69c4fc4e76 kata-deploy: Adjust podOverhead for GPU TEEs
Let's just move the podOverhead to a gigantic value, as we do need pod
snadboxes as big as that, and we've noticed QEMU being OOM killed with
smaller overheads.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
94ed4051b0 tests: nvidia: cc: Increase RAM for NIM pods
Those need to pull the models inside the guest, and the guest has 50% of
its memory "allowed" to be used as tmpfs, so, we gotta usa the RAM that
we have.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
e5062a056e tests: nvidia: cc: Adjust timeouts on NIM pods
Timeout increases for confidential computing slowness:
* livenessProbe:
  * initialDelaySeconds: 15 → 120 seconds
  * timeoutSeconds: 1 → 10 seconds
  * failureThreshold: 3 → 10

* readinessProbe:
  * initialDelaySeconds: 15 → 120 seconds
  * timeoutSeconds: 1 → 10 seconds
  * failureThreshold: 3 → 10

* startupProbe:
  * initialDelaySeconds: 40 → 180 seconds
  * timeoutSeconds: 1 → 10 seconds
  * failureThreshold: 180 → 300

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
dee6f2666b runtime: nvidia: Increase the guest pull timeout to 20 minutes
Yes, we're dealing with a combination of large images and image-rs
concurrent image layers being not optimal.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
6be43b2308 tests: nvidia: Retry kubectl commands
As with CoCo some of the commands may take longer, way longer than
expected.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
bb5bf6b864 tests: nvidia: nims: Use the current auths format for KBS
We cannot use the same format used for docker, as it includes username
and password, while what's expected when using Trustee does not.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Fabiano Fidêncio
92da54c088 tests: nvidia: cc: Enable NIM tests
Now that we've bumped Trustee to a version that supports the NVIDIA
remote verifier, let's re-enable the tests.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 22:29:42 +01:00
Steve Horsman
74254cba8f Merge pull request #12106 from stevenhorsman/gatekeeper-paging-reduction
ci: Adjust gatekeeper's job fetch
2025-11-18 14:08:26 +00:00
Fabiano Fidêncio
8eca0814bd tests: Run authenticated tests with experimental_force_guest_pull
As it should be supported.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 14:46:48 +01:00
Fabiano Fidêncio
5beb1af202 tests: Pass EXPERIMENTAL_FORCE_GUEST_PULL to the test
Right now we have only been passing the env var to the deployment
script, but we really need to pass it to the tests script as well.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-18 14:46:48 +01:00
Markus Rudy
638cad18ef Merge pull request #11978 from burgerdev/genpolicy-test-refactor
genpolicy: prepare integration tests for programmatic modification
2025-11-18 09:54:40 +01:00
stevenhorsman
9f0fea1e34 ci: Adjust gatekeeper's job fetch
Try and reduce the page limit of each job request to avoid the chances of
us tripping over github's 10s api limit.
All credit to @burgerdev for the investigation and suggestion!

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-18 08:22:36 +00:00
Alex Lyn
6ceacee0b9 runtime-rs: Add queue_size and num_queues for block volumes
Add the related block queue_size and num_queues in volumes based on
block devices, This very important for IO performance.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Alex Lyn
30a9a8b4ec runtime-rs: Add queue_size and num_queues for block device
Add the queue_size and num_queues in block device config when the
block device is handled.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Alex Lyn
9b0204a2de runtime-rs: Set Clh's disk queue_size and num_queues
Previous Clh's settings with disk queue_size and num_queues are
hardcodes, they should be configurable with user-defined values.
This commit is to address such issue via passing these settings.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Alex Lyn
f19c48505c runtime-rs: Introduce queue_size and num_queues in BlockConfig
Usually, we pass the related block config via BlockConfig, and to reach
the goal of user-friendly setting queue_size and num_queues for users,
the queue_size and num_queues are introduced in BlockConfig.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Alex Lyn
e958993348 kata-types: Introduce queue_size and num_queues within BlockDeviceInfo
Add two fields of queue_size and num_queues in BlockDeviceInfo to allow
users to set the related items via configurations

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Alex Lyn
780c45de23 runtime-rs: Add support queue_size and num_queues within configurations
Add related items for block device queue size and num queues in
configurations. And users can set the related items by configurations.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 14:53:43 +08:00
Steve Horsman
ac021e2ab9 Merge pull request #11563 from RuoqingHe/single-workspace
build: Introduce root workspace for rust components
2025-11-18 06:36:18 +00:00
Alex Lyn
d071384bba runtime-rs: Clear Linux.Resources.Devices completely
The current implementation causes issues with the Agent Policy
nontee CI tests, as Kata-Agent does not allow any configuration
for `count(Linux.Resources.Devices) == 0`.

This commit ensures that Linux.Resources.Devices, including all its
values, is completely cleared from the OCI Runtime Specification before
being passed to the Kata-Agent.

This addresses the CI failure by enforcing the required empty state for
the Devices cgroup configuration.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 13:40:09 +08:00
Xuewei Niu
ca8b3300d3 Merge pull request #11620 from zhangckid/indep_iothreads_upstream
Runtime/QEMU: Introduce virtio-blk with iothreads and enable Indep iothreads framework
2025-11-18 11:08:51 +08:00
Alex Lyn
5982e66503 runtime-rs: Ensure unique guest path for container mount binding
Previously, CopyFile implementation attempted to reuse existing guest
paths for subsequent containers within the same Pod. This prevented
correct bind mounting of shared configurations (e.g., ConfigMaps,
Service Accounts) into the later containers within a multi-containers
pod, as they lacked their own allocated guest path.

This commit modifies the logic to create a unique guest path for every
container that requires file propagation.

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-18 11:03:26 +08:00
Fupan Li
f791be1abb Merge pull request #12064 from Apokleos/policy-optional-path
genpolicy: Make cpath compatible with both runtime-rs and runtime-go
2025-11-18 10:19:26 +08:00
Ruoqing He
e6b24cd789 build: Exclude crates with no workspace setup
Crates with no workspace setup would think themselves are in the root
workspace, which our root workspace is not ready for them. Excluding
them for now.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00
Ruoqing He
6068242bf1 build: Move dragonball to root workspace
Move dragonball and all its member of that workspace into root
workspace.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00
Ruoqing He
3fbe693658 build: Introduce root workspace for rust components
Add Cargo.toml at repo root, use this root workspace for as many as
possible Rust components of Kata Containers. This would enable us to
share a common Cargo.lock file, and reduce the noise from dependabot.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-11-18 01:39:48 +00:00
Steve Horsman
650ada7bcc Merge pull request #12101 from stevenhorsman/release/3.23.0
release: Bump version to 3.23.0
2025-11-17 21:09:45 +00:00
stevenhorsman
70f1f4a3ac release: Bump version to 3.23.0
Bump VERSION and helm-chart versions

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 19:27:25 +00:00
stevenhorsman
c47e8d0ab8 kata-ctl: update backtrace and local references
Similar to #12075, bump-backtrace to 0.3.76 to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
As a side effect this brought in loads of other crate changes, which I think are due
to it bumping the local dependencies that this package builds on.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
d16620bae1 runk: update backtrace to 0.3.76
Similar to #12075, bump-backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
0b259e4fcf agent-ctl: update backtrace to 0.3.76
Similar to #12075, bump-backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
4abf79f16f genpolicy: update backtrace to 0.3.76
Similar to #12075, bump-backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
4158d9a94a runtime-rs: update flate2 & backtrace
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
fe10db233c runtime-rs: Remove libbacktrace feature from backtrace
This feature was removed in https://github.com/rust-lang/backtrace-rs/pull/615
which shows that the implementation was removed over two years ago, so
get rid of this feature, so we can move to newer versions

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
stevenhorsman
398e7987cd dragonball: update flate2 & backtrace
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 20:13:04 +01:00
Steve Horsman
04c7d11689 Merge pull request #12044 from lifupan/fix_update_interface
runtime: fix the issue of update interface error
2025-11-17 14:45:36 +00:00
Fupan Li
763a0d8675 runtime: fix the issue of update interface error
Since the network device hotplug is an asynchronous operation,
it's possible that the hotplug operation had returned, but
the network device hasn't ready in guest, thus it's better to
retry on this operation to wait until the device ready in guest.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-11-17 13:58:36 +01:00
Steve Horsman
b3eb794662 Merge pull request #12098 from stevenhorsman/csi-kata-direct-volume-xz-0.5.15-bump
csi-kata-directvolume: Bump xz module
2025-11-17 12:47:28 +00:00
Fabiano Fidêncio
75996945aa kata-deploy: try-kata-values.yaml -> values.yaml
This makes the user experience better, as the admin can deploy Kata
Containers without having to download / set up any additional file.

Of course, if the admin wants something more specific, examples are
provided.

Tests and documentation are updated to reflect this change.

Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
2025-11-17 12:16:17 +01:00
Alex Lyn
71a9ecf9f8 Merge pull request #12095 from lifupan/fix_vcpu_number
runtime-rs: fix the issue of wrong vcpu number
2025-11-17 19:11:48 +08:00
stevenhorsman
502a3ce3b6 csi-kata-directvolume: Bump xz module
Bump github.com/ulikunitz/xz to v0.5.15, to remediate vulnerability
GO-2025-3922

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-11-17 10:20:50 +00:00
Markus Rudy
b771bb6ed3 genpolicy: log requests as jsonlines
The current format of genpolicy request logs looks a bit like JSON, but
it does not parse out of the box and needs post-processing with sed, for
example.

This commit changes the log format to jsonlines[1], which is basically
newline-delimited compact JSON values. Compared to standard JSON, this
allows streaming output. The resulting file can be converted and
processed programmatically, for example with `jq -s`.

The fields are also adjusted to match the field names of TestRequest, so
that the logged requests can be used immediately in tests.

[1]: https://jsonlines.org/

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-11-17 09:01:00 +01:00
Markus Rudy
eb6cf025b3 genpolicy: format testcases.json and sort by key
This should allow keeping future diffs minimal.

The files were formatted with `jq -S`, which should be used after future
updates to the test case files.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-11-17 09:01:00 +01:00
Markus Rudy
851f8258af genpolicy: move testcase request type out of struct
Storing the request type outside the request object has two benefits:

* The request JSON passed to the Rego engine matches more closely what
  would be passed by the agent (no `type` field).
* If we want to update the requests, it's easier to insert them into a
  dedicated field, rather than inserting them and amending the type
  field.

This is a first step towards programmatic updates of testcase files.

This commit also adds the 'Request' suffix to the test case enum, such
that we can use the 'ep' input for allow_request directly.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2025-11-17 09:01:00 +01:00
zhangchen.kidd
914063bcdd runtime: documentation: Add virtio-blk support iothread comments in docs
Add comments to make the "EnableIOThreads" flag as a switch
for virtio-blk(based on IndepIOThreads) driver.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
9128112e3d runtime: qemu: Add Independent IOThread support for virtio-blk
Make hotplug virtio-blk device attach to Independent IOThread 0 as default
when enabled the EnableIOThreads and IndepIOThreads.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
fea954df7a runtime: qemu: qmp: Add iothread args for QMP ExecutePCIDeviceAdd
Qemu already support the device_add with iothread args.
Make KATA have ability to hotplug PCI device with IOThreads.
Currently, just support QEMU as the hypervisor, not sure it
works for stratovirt.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
af203b7dee runtime: qemu: introduce setup iothread function
Make the original virtio-scsi iothread and the new independent
iothread to a dedicated method for handing the related logics.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
d20712aa9e runtime: qemu: Add comments for virtio-scsi iothread args
For current implementation, just virtio-scsi use this
iothread path.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
f9d4829e77 rumtime: qemu: Add indep_iothreads for QEMU hypervisor toml
Add indep_iothreads args for QEMU related configuration toml.
The default value is 0.

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:03 +08:00
zhangchen.kidd
c3d3684f81 runtime: Introduce independent IOThreads framework
Introduce independent IOThread framework for Kata container.

What is the indep_iothreads:
This new feature introduce a way to pre-alloc IOThreads
for QEMU hypervisor (maybe other hypervisor can support too).
Independent IOThreads enables IO to be processed in a separate thread.
To generally improve the performance of each module, avoid them
running in the QEMU main loop.

Why need indep_iothreads:
In Kata container implementation, many devices based on hotplug
mechanism. The real workload container may not sync the same
lifecycle with the VM. It may require to hotplug/unplug new disks
or other devices without destroying the VM. So we can keep the
IOThread with the VM as a IOThread pool(some devices need multi iothreads
for performance like virtio-blk vq-mapping), the hotplug devices
can attach/detach with the IOThread according to business needs.
At the same time, QEMU also support the "x-blockdev-set-iothread"
to change iothreads(but it need stop VM for data secure).
Current QEMU have many devices support iothread, virtio-blk,
virtio-scsi, virtio-balloon, monitor, colo-compare...etc...

How it works:
Add new item in hypervisor struct named "indep_iothreads" in toml.
The default value is 0, it reused the original "enable_iothreads" as
the switch. If the "indep_iothreads" != 0 and "enable_iothreads" = true
it will add qmp object -iothread indepIOThreadsPrefix_No when VM startup.
The first user is the virtio-blk, it will attach the indep_iothread_0
as default when enable iothread for virtio-blk.

Thanks
Chen

Signed-off-by: zhangchen.kidd <zhangchen.kidd@jd.com>
2025-11-17 15:55:01 +08:00
Alex Lyn
daca7b268b genpolicy: Make cpath compatible with both runtime-rs and runtime-go
Update the `cpath` variable in the policy template to support the
optional `/passthrough` subpath used by runtime-rs. This ensures
that mount source path validation works correctly for both runtime
implementations.

By changing `cpath` to include the `(?:/passthrough)?` regular
expression fragment, we make the `/passthrough` segment optional.
The updated `cpath`:
`/run/kata-containers/shared/containers(?:/passthrough)?`

This single regex pattern now correctly matches both:
1.`/run/kata-containers/shared/containers/<sandbox-id>/...`
(runtime-go)
2.`/run/kata-containers/shared/containers/passthrough/<sandbox-id>/...`
(runtime-rs)

This elegantly resolves the compatibility issue without needing to add
separate or conditional logic to the policy rules, making the policy
more robust and maintainable.

Fixes: #12063

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2025-11-17 09:36:19 +08:00
dependabot[bot]
c715d8648c build(deps): bump oras-project/setup-oras from 1.2.2 to 1.2.4
Bumps [oras-project/setup-oras](https://github.com/oras-project/setup-oras) from 1.2.2 to 1.2.4.
- [Release notes](https://github.com/oras-project/setup-oras/releases)
- [Commits](5c0b487ce3...22ce207df3)

---
updated-dependencies:
- dependency-name: oras-project/setup-oras
  dependency-version: 1.2.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-11-12 09:45:27 +00:00
372 changed files with 18984 additions and 8243 deletions

View File

@@ -10,11 +10,6 @@ self-hosted-runner:
- amd64-nvidia-a100
- amd64-nvidia-h100-snp
- arm64-k8s
- containerd-v1.7-overlayfs
- containerd-v2.0-overlayfs
- containerd-v2.1-overlayfs
- containerd-v2.2
- containerd-v2.2-overlayfs
- garm-ubuntu-2004
- garm-ubuntu-2004-smaller
- garm-ubuntu-2204
@@ -25,6 +20,7 @@ self-hosted-runner:
- ppc64le-k8s
- ppc64le-small
- ubuntu-24.04-ppc64le
- ubuntu-24.04-s390x
- metrics
- riscv-builder
- sev-snp

View File

@@ -71,7 +71,7 @@ jobs:
fail-fast: false
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'cloud-hypervisor', 'dragonball', 'qemu']
vmm: ['clh', 'cloud-hypervisor', 'dragonball', 'qemu', 'qemu-runtime-rs']
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
@@ -117,7 +117,7 @@ jobs:
fail-fast: false
matrix:
containerd_version: ['lts', 'active']
vmm: ['clh', 'qemu', 'dragonball']
vmm: ['clh', 'qemu', 'dragonball', 'qemu-runtime-rs']
runs-on: ubuntu-22.04
env:
CONTAINERD_VERSION: ${{ matrix.containerd_version }}
@@ -292,6 +292,7 @@ jobs:
- dragonball
- qemu
- cloud-hypervisor
- qemu-runtime-rs
runs-on: ubuntu-22.04
env:
KATA_HYPERVISOR: ${{ matrix.vmm }}

View File

@@ -12,7 +12,12 @@ name: Build checks
jobs:
check:
name: check
runs-on: ${{ matrix.runner || inputs.instance }}
runs-on: >-
${{
( contains(inputs.instance, 's390x') && matrix.component.name == 'runtime' ) && 's390x' ||
( contains(inputs.instance, 'ppc64le') && (matrix.component.name == 'runtime' || matrix.component.name == 'agent') ) && 'ppc64le' ||
inputs.instance
}}
strategy:
fail-fast: false
matrix:
@@ -70,36 +75,6 @@ jobs:
- protobuf-compiler
instance:
- ${{ inputs.instance }}
include:
- component:
name: runtime
path: src/runtime
needs:
- golang
- XDG_RUNTIME_DIR
instance: ubuntu-24.04-s390x
runner: s390x
- component:
name: runtime
path: src/runtime
needs:
- golang
- XDG_RUNTIME_DIR
instance: ubuntu-24.04-ppc64le
runner: ppc64le
- component:
name: agent
path: src/agent
needs:
- rust
- libdevmapper
- libseccomp
- protobuf-compiler
- clang
instance: ubuntu-24.04-ppc64le
runner: ppc64le
steps:
- name: Adjust a permission for repo

View File

@@ -121,7 +121,7 @@ jobs:
echo "oci-name=${oci_image%@*}" >> "$GITHUB_OUTPUT"
echo "oci-digest=${oci_image#*@}" >> "$GITHUB_OUTPUT"
- uses: oras-project/setup-oras@5c0b487ce3fe0ce3ab0d034e63669e426e294e4d # v1.2.2
- uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4
if: ${{ env.PERFORM_ATTESTATION == 'yes' }}
with:
version: "1.2.0"
@@ -171,6 +171,8 @@ jobs:
- rootfs-image
- rootfs-image-confidential
- rootfs-image-mariner
- rootfs-image-nvidia-gpu
- rootfs-image-nvidia-gpu-confidential
- rootfs-initrd
- rootfs-initrd-confidential
- rootfs-initrd-nvidia-gpu

View File

@@ -102,7 +102,7 @@ jobs:
echo "oci-name=${oci_image%@*}" >> "$GITHUB_OUTPUT"
echo "oci-digest=${oci_image#*@}" >> "$GITHUB_OUTPUT"
- uses: oras-project/setup-oras@5c0b487ce3fe0ce3ab0d034e63669e426e294e4d # v1.2.2
- uses: oras-project/setup-oras@22ce207df3b08e061f537244349aac6ae1d214f6 # v1.2.4
if: ${{ env.PERFORM_ATTESTATION == 'yes' }}
with:
version: "1.2.0"
@@ -150,6 +150,7 @@ jobs:
matrix:
asset:
- rootfs-image
- rootfs-image-nvidia-gpu
- rootfs-initrd
- rootfs-initrd-nvidia-gpu
steps:

View File

@@ -32,7 +32,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
strategy:
matrix:
asset:
@@ -89,7 +89,7 @@ jobs:
build-asset-rootfs:
name: build-asset-rootfs
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: build-asset
permissions:
contents: read
@@ -170,7 +170,7 @@ jobs:
build-asset-shim-v2:
name: build-asset-shim-v2
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]
permissions:
contents: read
@@ -230,7 +230,7 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
needs: [build-asset, build-asset-rootfs, build-asset-shim-v2]
permissions:
contents: read

View File

@@ -32,7 +32,7 @@ permissions: {}
jobs:
build-asset:
name: build-asset
runs-on: s390x
runs-on: ubuntu-24.04-s390x
permissions:
contents: read
packages: write
@@ -257,7 +257,7 @@ jobs:
build-asset-shim-v2:
name: build-asset-shim-v2
runs-on: s390x
runs-on: ubuntu-24.04-s390x
needs: [build-asset, build-asset-rootfs, remove-rootfs-binary-artifacts]
permissions:
contents: read
@@ -319,7 +319,7 @@ jobs:
create-kata-tarball:
name: create-kata-tarball
runs-on: s390x
runs-on: ubuntu-24.04-s390x
needs:
- build-asset
- build-asset-rootfs

View File

@@ -147,7 +147,7 @@ jobs:
tag: ${{ inputs.tag }}-s390x
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
runner: s390x
runner: ubuntu-24.04-s390x
arch: s390x
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -165,7 +165,7 @@ jobs:
tag: ${{ inputs.tag }}-ppc64le
commit-hash: ${{ inputs.commit-hash }}
target-branch: ${{ inputs.target-branch }}
runner: ppc64le-small
runner: ubuntu-24.04-ppc64le
arch: ppc64le
secrets:
QUAY_DEPLOYER_PASSWORD: ${{ secrets.QUAY_DEPLOYER_PASSWORD }}
@@ -314,6 +314,7 @@ jobs:
needs: publish-kata-deploy-payload-amd64
uses: ./.github/workflows/run-k8s-tests-on-nvidia-gpu.yaml
with:
tarball-suffix: -${{ inputs.tag }}
registry: ghcr.io
repo: ${{ github.repository_owner }}/kata-deploy-ci
tag: ${{ inputs.tag }}-amd64
@@ -473,7 +474,7 @@ jobs:
vmm: ${{ matrix.params.vmm }}
run-cri-containerd-tests-arm64:
if: ${{ inputs.skip-test != 'yes' }}
if: false
needs: build-kata-static-tarball-arm64
strategy:
fail-fast: false

View File

@@ -10,7 +10,9 @@ on:
- opened
- synchronize
- reopened
- edited
- labeled
- unlabeled
permissions: {}

View File

@@ -31,7 +31,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: ppc64le-small
runs-on: ubuntu-24.04-ppc64le
steps:
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

View File

@@ -35,7 +35,7 @@ jobs:
permissions:
contents: read
packages: write
runs-on: s390x
runs-on: ubuntu-24.04-s390x
steps:
- name: Login to Kata Containers ghcr.io
uses: docker/login-action@74a5d142397b4f367a81961eba4e8cd7edddf772 # v3.4.0

View File

@@ -1,167 +0,0 @@
name: CI | Run containerd guest pull stability tests
on:
schedule:
- cron: "0 */1 * * *" #run every hour
permissions: {}
# This job relies on k8s pre-installed using kubeadm
jobs:
run-containerd-guest-pull-stability-tests:
name: run-containerd-guest-pull-stability-tests-${{ matrix.environment.test-type }}-${{ matrix.environment.containerd }}
strategy:
fail-fast: false
matrix:
environment: [
{ test-type: multi-snapshotter, containerd: v2.2 },
{ test-type: force-guest-pull, containerd: v1.7 },
{ test-type: force-guest-pull, containerd: v2.0 },
{ test-type: force-guest-pull, containerd: v2.1 },
{ test-type: force-guest-pull, containerd: v2.2 },
]
env:
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here.
IMAGES_LIST: quay.io/mongodb/mongodb-community-server@sha256:8b73733842da21b6bbb6df4d7b2449229bb3135d2ec8c6880314d88205772a11 ghcr.io/edgelesssys/redis@sha256:ecb0a964c259a166a1eb62f0eb19621d42bd1cce0bc9bb0c71c828911d4ba93d
runs-on: containerd-${{ matrix.environment.test-type }}-${{ matrix.environment.containerd }}
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
with:
persist-credentials: false
- name: Rotate the journal
run: sudo journalctl --rotate --vacuum-time 1s
- name: Pull the kata-deploy image to be used
run: sudo ctr -n k8s.io image pull quay.io/kata-containers/kata-deploy-ci:kata-containers-latest
- name: Deploy Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' && 'nydus' || '' }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ matrix.environment.test-type == 'multi-snapshotter' }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.environment.test-type == 'force-guest-pull' && 'qemu-coco-dev' || '' }}
# This is needed as we may hit the createContainerTimeout
- name: Adjust Kata Containers' create_container_timeout
run: |
sudo sed -i -e 's/^\(create_container_timeout\).*=.*$/\1 = 600/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "create_container_timeout.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
# This is needed in order to have enough tmpfs space inside the guest to pull the image
- name: Adjust Kata Containers' default_memory
run: |
sudo sed -i -e 's/^\(default_memory\).*=.*$/\1 = 4096/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "default_memory.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
- name: Run a few containers using overlayfs
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "overlayfs | Using on image: ${img}"
pod="$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
-- uname -r
done
- name: Run a the same few containers using a different snapshotter
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "nydus | Using on image: ${img}"
pod="kata-$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
--overrides='{
"spec": {
"runtimeClassName": "kata-qemu-coco-dev"
}
}' \
-- uname -r
done
- name: Uninstall Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh cleanup
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true
- name: Run a few containers using overlayfs
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "overlayfs | Using on image: ${img}"
pod="$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image=${img} \
--image-pull-policy=Always \
--pod-running-timeout=10m \
-- uname -r
done
- name: Deploy Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true
# This is needed as we may hit the createContainerTimeout
- name: Adjust Kata Containers' create_container_timeout
run: |
sudo sed -i -e 's/^\(create_container_timeout\).*=.*$/\1 = 600/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "create_container_timeout.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
# This is needed in order to have enough tmpfs space inside the guest to pull the image
- name: Adjust Kata Containers' default_memory
run: |
sudo sed -i -e 's/^\(default_memory\).*=.*$/\1 = 4096/g' /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
grep "default_memory.*=" /opt/kata/share/defaults/kata-containers/configuration-qemu-coco-dev.toml
- name: Run a the same few containers using a different snapshotter
run: |
# I don't want those to be inside double quotes, so I'm deliberately ignoring the double quotes here
# shellcheck disable=SC2086
for img in ${IMAGES_LIST}; do
echo "nydus | Using on image: ${img}"
pod="kata-$(echo ${img} | tr ':.@/' '-' | awk '{print substr($0,1,56)}')"
kubectl run "${pod}" \
-it --rm \
--restart=Never \
--image="${img}" \
--image-pull-policy=Always \
--pod-running-timeout=10m \
--overrides='{
"spec": {
"runtimeClassName": "kata-qemu-coco-dev"
}
}' \
-- uname -r
done
- name: Uninstall Kata Containers
run: bash tests/integration/kubernetes/gha-run.sh cleanup || true
if: always()
env:
KATA_HYPERVISOR: qemu-coco-dev
KUBERNETES: vanilla
SNAPSHOTTER: nydus
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: true

View File

@@ -142,6 +142,10 @@ jobs:
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -68,6 +68,10 @@ jobs:
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Collect artifacts ${{ matrix.vmm }}
if: always()
run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

View File

@@ -1,7 +1,10 @@
name: CI | Run NVIDIA GPU kubernetes tests on arm64
name: CI | Run NVIDIA GPU kubernetes tests on amd64
on:
workflow_call:
inputs:
tarball-suffix:
required: true
type: string
registry:
required: true
type: string
@@ -45,6 +48,7 @@ jobs:
GH_PR_NUMBER: ${{ inputs.pr-number }}
KATA_HYPERVISOR: ${{ matrix.environment.vmm }}
KUBERNETES: kubeadm
KBS: ${{ matrix.environment.name == 'nvidia-gpu-snp' && 'true' || 'false' }}
K8S_TEST_HOST_TYPE: baremetal
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
@@ -59,6 +63,15 @@ jobs:
env:
TARGET_BRANCH: ${{ inputs.target-branch }}
- name: get-kata-tarball
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4.3.0
with:
name: kata-static-tarball-amd64${{ inputs.tarball-suffix }}
path: kata-artifacts
- name: Install kata
run: bash tests/integration/kubernetes/gha-run.sh install-kata-tools kata-artifacts
- name: Uninstall previous `kbs-client`
if: matrix.environment.name != 'nvidia-gpu'
timeout-minutes: 10
@@ -89,6 +102,11 @@ jobs:
run: bash tests/integration/kubernetes/gha-run.sh run-nv-tests
env:
NGC_API_KEY: ${{ secrets.NGC_API_KEY }}
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Collect artifacts ${{ matrix.environment.vmm }}
if: always()
run: bash tests/integration/kubernetes/gha-run.sh collect-artifacts

View File

@@ -75,3 +75,7 @@ jobs:
- name: Run tests
timeout-minutes: 30
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests

View File

@@ -131,6 +131,10 @@ jobs:
timeout-minutes: 60
run: bash tests/integration/kubernetes/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Delete kata-deploy
if: always()
run: bash tests/integration/kubernetes/gha-run.sh cleanup-zvsi

View File

@@ -46,6 +46,7 @@ jobs:
matrix:
vmm:
- qemu-coco-dev
- qemu-coco-dev-runtime-rs
snapshotter:
- nydus
pull-type:
@@ -139,6 +140,10 @@ jobs:
timeout-minutes: 300
run: bash tests/stability/gha-stability-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -159,6 +159,7 @@ jobs:
AUTHENTICATED_IMAGE_USER: ${{ vars.AUTHENTICATED_IMAGE_USER }}
AUTHENTICATED_IMAGE_PASSWORD: ${{ secrets.AUTHENTICATED_IMAGE_PASSWORD }}
SNAPSHOTTER: ${{ matrix.snapshotter }}
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ matrix.pull-type == 'experimental-force-guest-pull' && matrix.vmm || '' }}
# Caution: current ingress controller used to expose the KBS service
# requires much vCPUs, lefting only a few for the tests. Depending on the
# host type chose it will result on the creation of a cluster with
@@ -217,7 +218,6 @@ jobs:
timeout-minutes: 20
run: bash tests/integration/kubernetes/gha-run.sh deploy-kata-aks
env:
EXPERIMENTAL_FORCE_GUEST_PULL: ${{ env.PULL_TYPE == 'experimental-force-guest-pull' && env.KATA_HYPERVISOR || '' }}
USE_EXPERIMENTAL_SETUP_SNAPSHOTTER: ${{ env.SNAPSHOTTER == 'nydus' }}
AUTO_GENERATE_POLICY: ${{ env.PULL_TYPE == 'experimental-force-guest-pull' && 'no' || 'yes' }}

View File

@@ -102,6 +102,10 @@ jobs:
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests
- name: Refresh OIDC token in case access token expired
if: always()
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2.3.0

View File

@@ -85,3 +85,7 @@ jobs:
- name: Run tests
run: bash tests/functional/kata-deploy/gha-run.sh run-tests
- name: Report tests
if: always()
run: bash tests/integration/kubernetes/gha-run.sh report-tests

View File

@@ -29,7 +29,7 @@ jobs:
matrix:
instance:
- "ubuntu-24.04-arm"
- "s390x"
- "ubuntu-24.04-s390x"
- "ubuntu-24.04-ppc64le"
uses: ./.github/workflows/build-checks.yaml
with:

1
.gitignore vendored
View File

@@ -18,3 +18,4 @@ src/tools/log-parser/kata-log-parser
tools/packaging/static-build/agent/install_libseccomp.sh
.envrc
.direnv
**/.DS_Store

View File

@@ -4,18 +4,18 @@ version = 4
[[package]]
name = "addr2line"
version = "0.21.0"
version = "0.25.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a30b2e23b9e17a9f90641c7ab1549cd9b44f296d3ccbf309d2863cfe398a0cb"
checksum = "1b5d307320b3181d6d7954e663bd7c774a838b8220fe0593c86d9fb09f498b4b"
dependencies = [
"gimli",
]
[[package]]
name = "adler"
version = "1.0.2"
name = "adler2"
version = "2.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe"
checksum = "320119579fcad9c21884f5c4861d16174d0e06250625266f50fe6898340abefa"
[[package]]
name = "android-tzdata"
@@ -64,17 +64,17 @@ checksum = "d468802bab17cbc0cc575e9b053f41e72aa36bfa6b7f55e3529ffa43161b97fa"
[[package]]
name = "backtrace"
version = "0.3.69"
version = "0.3.76"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2089b7e3f35b9dd2d0ed921ead4f6d318c27680d4a5bd167b3ee120edb105837"
checksum = "bb531853791a215d7c62a30daf0dde835f381ab5de4589cfe7c649d2cbe92bd6"
dependencies = [
"addr2line",
"cc",
"cfg-if",
"libc",
"miniz_oxide",
"object",
"rustc-demangle",
"windows-link",
]
[[package]]
@@ -638,9 +638,9 @@ dependencies = [
[[package]]
name = "flate2"
version = "1.0.27"
version = "1.1.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c6c98ee8095e9d1dcbf2fcc6d95acccb90d1c81db1e44725c6a984b1dbdfb010"
checksum = "bfe33edd8e85a12a67454e37f8c75e730830d83e313556ab9ebf9ee7fbeb3bfb"
dependencies = [
"crc32fast",
"libz-sys",
@@ -780,9 +780,9 @@ dependencies = [
[[package]]
name = "gimli"
version = "0.28.0"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6fb8d784f27acf97159b40fc4db5ecd8aa23b9ad5ef69cdd136d3bc80665f0c0"
checksum = "e629b9b98ef3dd8afe6ca2bd0f89306cec16d43d907889945bc5d6687f2f13c7"
[[package]]
name = "h2"
@@ -1250,11 +1250,12 @@ checksum = "6877bb514081ee2a7ff5ef9de3281f14a4dd4bceac4c09388074a6b5df8a139a"
[[package]]
name = "miniz_oxide"
version = "0.7.1"
version = "0.8.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e7810e0be55b428ada41041c41f32c9f1a42817901b4ccf45fa3d4b6561e74c7"
checksum = "1fa76a2c86f704bdb222d66965fb3d63269ce38518b83cb0575fca855ebb6316"
dependencies = [
"adler",
"adler2",
"simd-adler32",
]
[[package]]
@@ -1452,9 +1453,9 @@ dependencies = [
[[package]]
name = "object"
version = "0.32.1"
version = "0.37.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9cf5f9dd3933bd50a9e1f149ec995f39ae2c496d31fd772c1fd45ebc27e902b0"
checksum = "ff76201f031d8863c38aa7f905eca4f53abbfa15f609db4277d44cd8938f33fe"
dependencies = [
"memchr",
]
@@ -1756,9 +1757,9 @@ dependencies = [
[[package]]
name = "rustc-demangle"
version = "0.1.23"
version = "0.1.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d626bb9dae77e28219937af045c257c28bfd3f69333c512553507f5f9798cb76"
checksum = "56f7d92ca342cea22a06f2121d944b4fd82af56988c270852495420f961d4ace"
[[package]]
name = "rustix"
@@ -1926,6 +1927,12 @@ version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "simd-adler32"
version = "0.3.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d66dc143e6b11c1eddc06d5c423cfc97062865baf299914ab64caa38182078fe"
[[package]]
name = "slab"
version = "0.4.11"
@@ -2553,6 +2560,12 @@ dependencies = [
"windows-targets 0.48.5",
]
[[package]]
name = "windows-link"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
[[package]]
name = "windows-sys"
version = "0.48.0"

72
Cargo.toml Normal file
View File

@@ -0,0 +1,72 @@
[workspace.package]
authors = ["The Kata Containers community <kata-dev@lists.katacontainers.io>"]
edition = "2018"
license = "Apache-2.0"
rust-version = "1.85.1"
[workspace]
members = [
# Dragonball
"src/dragonball",
"src/dragonball/dbs_acpi",
"src/dragonball/dbs_address_space",
"src/dragonball/dbs_allocator",
"src/dragonball/dbs_arch",
"src/dragonball/dbs_boot",
"src/dragonball/dbs_device",
"src/dragonball/dbs_interrupt",
"src/dragonball/dbs_legacy_devices",
"src/dragonball/dbs_pci",
"src/dragonball/dbs_tdx",
"src/dragonball/dbs_upcall",
"src/dragonball/dbs_utils",
"src/dragonball/dbs_virtio_devices",
]
resolver = "2"
# TODO: Add all excluded crates to root workspace
exclude = [
"src/agent",
"src/tools",
"src/libs",
"src/runtime-rs",
# We are cloning and building rust packages under
# "tools/packaging/kata-deploy/local-build/build" folder, which may mislead
# those packages to think they are part of the kata root workspace
"tools/packaging/kata-deploy/local-build/build",
]
[workspace.dependencies]
# Rust-VMM crates
event-manager = "0.2.1"
kvm-bindings = "0.6.0"
kvm-ioctls = "=0.12.1"
linux-loader = "0.8.0"
seccompiler = "0.5.0"
vfio-bindings = "0.3.0"
vfio-ioctls = "0.1.0"
virtio-bindings = "0.1.0"
virtio-queue = "0.7.0"
vm-fdt = "0.2.0"
vm-memory = "0.10.0"
vm-superio = "0.5.0"
vmm-sys-util = "0.11.0"
# Local dependencies from Dragonball Sandbox crates
dbs-acpi = { path = "src/dragonball/dbs_acpi" }
dbs-address-space = { path = "src/dragonball/dbs_address_space" }
dbs-allocator = { path = "src/dragonball/dbs_allocator" }
dbs-arch = { path = "src/dragonball/dbs_arch" }
dbs-boot = { path = "src/dragonball/dbs_boot" }
dbs-device = { path = "src/dragonball/dbs_device" }
dbs-interrupt = { path = "src/dragonball/dbs_interrupt" }
dbs-legacy-devices = { path = "src/dragonball/dbs_legacy_devices" }
dbs-pci = { path = "src/dragonball/dbs_pci" }
dbs-tdx = { path = "src/dragonball/dbs_tdx" }
dbs-upcall = { path = "src/dragonball/dbs_upcall" }
dbs-utils = { path = "src/dragonball/dbs_utils" }
dbs-virtio-devices = { path = "src/dragonball/dbs_virtio_devices" }
# Local dependencies from `src/lib`
test-utils = { path = "src/libs/test-utils" }

View File

@@ -1 +1 @@
3.22.0
3.24.0

View File

@@ -83,3 +83,7 @@ Documents that help to understand and contribute to Kata Containers.
If you have a suggestion for how we can improve the
[website](https://katacontainers.io), please raise an issue (or a PR) on
[the repository that holds the source for the website](https://github.com/OpenStackweb/kata-netlify-refresh).
### Toolchain Guidance
* [Toolchain Guidance](./Toochain-Guidance.md)

39
docs/Toochain-Guidance.md Normal file
View File

@@ -0,0 +1,39 @@
# Toolchains
As a community we want to strike a balance between having up-to-date toolchains, to receive the
latest security fixes and to be able to benefit from new features and packages, whilst not being
too bleeding edge and disrupting downstream and other consumers. As a result we have the following
guidelines (note, not hard rules) for our go and rust toolchains that we are attempting to try out:
## Go toolchain
Go is released [every six months](https://go.dev/wiki/Go-Release-Cycle) with support for the
[last two major release versions](https://go.dev/doc/devel/release#policy). We always want to
ensure that we are on a supported version so we receive security fixes. To try and make
things easier for some of our users, we aim to be using the older of the two supported major
versions, unless there is a compelling reason to adopt the newer version.
In practice this means that we bump our major version of the go toolchain every six months to
version (1.x-1) in response to a new version (1.x) coming out, which makes our current version
(1.x-2) no longer supported. We will bump the minor version whenever required to satisfy
dependency updates, or security fixes.
Our go toolchain version is recorded in [`versions.yaml`](../versions.yaml) under
`.languages.golang.version` and should match with the version in our `go.mod` files.
## Rust toolchain
Rust has a [six week](https://doc.rust-lang.org/book/appendix-05-editions.html#:~:text=The%20Rust%20language%20and%20compiler,these%20tiny%20changes%20add%20up.)
release cycle and they only support the latest stable release, so if we wanted to remain on a
supported release we would only ever build with the latest stable and bump every 6 weeks.
However feedback from our community has indicated that this is a challenge as downstream consumers
often want to get rust from their distro, or downstream fork and these struggle to keep up with
the six week release schedule. As a result the community has agreed to try out a policy of
"stable-2", where we aim to build with a rust version that is two versions behind the latest stable
version.
In practice this should mean that we bump our rust toolchain every six weeks, to version
1.x-2 when 1.x is released as stable and we should be picking up the latest point release
of that version, if there were any.
The rust-toolchain that we are using is recorded in [`rust-toolchain.toml`](../rust-toolchain.toml).

View File

@@ -97,6 +97,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
| `io.katacontainers.config.hypervisor.default_gpus` | uint32 | the minimum number of GPUs required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.default_gpu_model` | string | the GPU model required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.block_device_num_queues` | `usize` | The number of queues to use for block devices (runtime-rs only) |
| `io.katacontainers.config.hypervisor.block_device_queue_size` | uint32 | The size of the of the queue to use for block devices (runtime-rs only) |
## Container Options
| Key | Value Type | Comments |

28
src/agent/Cargo.lock generated
View File

@@ -459,15 +459,9 @@ version = "0.8.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "08807e080ed7f9d5433fa9b275196cfc35414f66a0c79d864dc51a0d825231a3"
dependencies = [
"bit-vec 0.8.0",
"bit-vec",
]
[[package]]
name = "bit-vec"
version = "0.6.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "349f9b6a179ed607305526ca489b34ad0a41aed5f7980fa90eb03160b69598fb"
[[package]]
name = "bit-vec"
version = "0.8.0"
@@ -1250,7 +1244,6 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7ced92e76e966ca2fd84c8f7aa01a4aea65b0eb6648d72f7c8f3e2764a67fece"
dependencies = [
"crc32fast",
"libz-sys",
"miniz_oxide",
]
@@ -2266,17 +2259,6 @@ dependencies = [
"uuid 0.8.2",
]
[[package]]
name = "libz-sys"
version = "1.1.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8b70e7a7df205e92a1a4cd9aaae7898dac0aa555503cc0a649494d0d60e7651d"
dependencies = [
"cc",
"pkg-config",
"vcpkg",
]
[[package]]
name = "linux-raw-sys"
version = "0.3.8"
@@ -3719,7 +3701,7 @@ dependencies = [
"anyhow",
"async-trait",
"awaitgroup",
"bit-vec 0.6.3",
"bit-vec",
"capctl",
"caps",
"cfg-if",
@@ -4821,12 +4803,6 @@ version = "1.11.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "943ce29a8a743eb10d6082545d861b24f9d1b160b7d741e0f2cdf726bec909c5"
[[package]]
name = "vcpkg"
version = "0.2.15"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "accd4ea62f7bb7a82fe23066fb0957d48ef677f6eeb8215f372f52e48bb32426"
[[package]]
name = "version_check"
version = "0.9.5"

View File

@@ -186,7 +186,7 @@ base64 = "0.22"
sha2 = "0.10.8"
async-compression = { version = "0.4.22", features = ["tokio", "gzip"] }
container-device-interface = "0.1.0"
container-device-interface = "0.1.1"
[target.'cfg(target_arch = "s390x")'.dependencies]
pv_core = { git = "https://github.com/ibm-s390-linux/s390-tools", rev = "4942504a9a2977d49989a5e5b7c1c8e07dc0fa41", package = "s390_pv_core" }
@@ -206,6 +206,7 @@ lto = true
seccomp = ["rustjail/seccomp"]
standard-oci-runtime = ["rustjail/standard-oci-runtime"]
agent-policy = ["kata-agent-policy"]
init-data = []
[[bin]]
name = "kata-agent"

View File

@@ -41,6 +41,14 @@ ifeq ($(AGENT_POLICY),yes)
override EXTRA_RUSTFEATURES += agent-policy
endif
##VAR INIT_DATA=yes|no define if agent enables the init data feature
INIT_DATA ?= yes
# Enable the init data fature of rust build
ifeq ($(INIT_DATA),yes)
override EXTRA_RUSTFEATURES += init-data
endif
include ../../utils.mk
##VAR STANDARD_OCI_RUNTIME=yes|no define if agent enables standard oci runtime feature

View File

@@ -10,7 +10,7 @@ use anyhow::{bail, Result};
use slog::{debug, error, info, warn};
use tokio::io::AsyncWriteExt;
static POLICY_LOG_FILE: &str = "/tmp/policy.txt";
static POLICY_LOG_FILE: &str = "/tmp/policy.jsonl";
static POLICY_DEFAULT_FILE: &str = "/etc/kata-opa/default-policy.rego";
/// Convenience macro to obtain the scope logger
@@ -26,7 +26,7 @@ pub struct AgentPolicy {
/// When true policy errors are ignored, for debug purposes.
allow_failures: bool,
/// "/tmp/policy.txt" log file for policy activity.
/// "/tmp/policy.jsonl" log file for policy activity.
log_file: Option<tokio::fs::File>,
/// Regorus engine
@@ -213,7 +213,7 @@ impl AgentPolicy {
// The Policy text can be obtained directly from the pod YAML.
}
_ => {
let log_entry = format!("[\"ep\":\"{ep}\",{input}],\n\n");
let log_entry = format!("{{\"kind\":\"{ep}\",\"request\":{input}}}\n");
if let Err(e) = log_file.write_all(log_entry.as_bytes()).await {
warn!(sl!(), "policy: log_eval_input: write_all failed: {}", e);

View File

@@ -44,7 +44,7 @@ async-trait.workspace = true
inotify = "0.9.2"
libseccomp = { version = "0.3.0", optional = true }
zbus = "3.12.0"
bit-vec = "0.6.3"
bit-vec = "0.8.0"
xattr = "0.2.3"
# Local dependencies

View File

@@ -9,6 +9,7 @@
// SPDX-License-Identifier: Apache-2.0
//
#[cfg(feature = "init-data")]
use std::{os::unix::fs::FileTypeExt, path::Path};
use anyhow::{bail, Context, Result};
@@ -37,14 +38,24 @@ pub const AA_CONFIG_PATH: &str = concatcp!(INITDATA_PATH, "/aa.toml");
pub const CDH_CONFIG_PATH: &str = concatcp!(INITDATA_PATH, "/cdh.toml");
/// Magic number of initdata device
#[cfg(feature = "init-data")]
pub const INITDATA_MAGIC_NUMBER: &[u8] = b"initdata";
/// initdata device with disk type 'vd*'
#[cfg(feature = "init-data")]
const INITDATA_PREFIX_DISK_VDX: &str = "vd";
/// initdata device with disk type 'sd*'
#[cfg(feature = "init-data")]
const INITDATA_PREFIX_DISK_SDX: &str = "sd";
#[cfg(not(feature = "init-data"))]
async fn detect_initdata_device(logger: &Logger) -> Result<Option<String>> {
debug!(logger, "Initdata is disabled");
Ok(None)
}
#[cfg(feature = "init-data")]
async fn detect_initdata_device(logger: &Logger) -> Result<Option<String>> {
let dev_dir = Path::new("/dev");
let mut read_dir = tokio::fs::read_dir(dev_dir).await?;

View File

@@ -401,11 +401,10 @@ impl Handle {
}
if let RouteAttribute::Oif(index) = attribute {
route.device = self
.find_link(LinkFilter::Index(*index))
.await
.context(format!("error looking up device {index}"))?
.name();
route.device = match self.find_link(LinkFilter::Index(*index)).await {
Ok(link) => link.name(),
Err(_) => String::new(),
};
}
}
@@ -1005,10 +1004,6 @@ mod tests {
.expect("Failed to list routes");
assert_ne!(all.len(), 0);
for r in &all {
assert_ne!(r.device.len(), 0);
}
}
#[tokio::test]

View File

@@ -9,58 +9,6 @@ repository = "https://github.com/kata-containers/kata-containers.git"
license = "Apache-2.0"
edition = "2018"
[workspace]
members = [
"dbs_acpi",
"dbs_address_space",
"dbs_allocator",
"dbs_arch",
"dbs_boot",
"dbs_device",
"dbs_interrupt",
"dbs_legacy_devices",
"dbs_pci",
"dbs_tdx",
"dbs_upcall",
"dbs_utils",
"dbs_virtio_devices",
]
resolver = "2"
[workspace.dependencies]
# Rust-VMM crates
event-manager = "0.2.1"
kvm-bindings = "0.6.0"
kvm-ioctls = "=0.12.1"
linux-loader = "0.8.0"
seccompiler = "0.5.0"
vfio-bindings = "0.3.0"
vfio-ioctls = "0.1.0"
virtio-bindings = "0.1.0"
virtio-queue = "0.7.0"
vm-fdt = "0.2.0"
vm-memory = "0.10.0"
vm-superio = "0.5.0"
vmm-sys-util = "0.11.0"
# Local dependencies from Dragonball Sandbox crates
dbs-acpi = { path = "dbs_acpi" }
dbs-address-space = { path = "dbs_address_space" }
dbs-allocator = { path = "dbs_allocator" }
dbs-arch = { path = "dbs_arch" }
dbs-boot = { path = "dbs_boot" }
dbs-device = { path = "dbs_device" }
dbs-interrupt = { path = "dbs_interrupt" }
dbs-legacy-devices = { path = "dbs_legacy_devices" }
dbs-pci = { path = "dbs_pci" }
dbs-tdx = { path = "dbs_tdx" }
dbs-upcall = { path = "dbs_upcall" }
dbs-utils = { path = "dbs_utils" }
dbs-virtio-devices = { path = "dbs_virtio_devices" }
# Local dependencies from `src/lib`
test-utils = { path = "../libs/test-utils" }
[dependencies]
anyhow = "1.0.32"
arc-swap = "1.5.0"
@@ -83,12 +31,12 @@ kvm-bindings = { workspace = true }
kvm-ioctls = { workspace = true }
lazy_static = "1.2"
libc = "0.2.39"
linux-loader = {workspace = true}
linux-loader = { workspace = true }
log = "0.4.14"
nix = "0.24.2"
procfs = "0.12.0"
prometheus = { version = "0.14.0", features = ["process"] }
seccompiler = {workspace = true}
seccompiler = { workspace = true }
serde = "1.0.27"
serde_derive = "1.0.27"
serde_json = "1.0.9"
@@ -96,7 +44,7 @@ slog = "2.5.2"
slog-scope = "4.4.0"
thiserror = "1"
tracing = "0.1.41"
vmm-sys-util = {workspace = true}
vmm-sys-util = { workspace = true }
virtio-queue = { workspace = true, optional = true }
vm-memory = { workspace = true, features = ["backend-mmap"] }
crossbeam-channel = "0.5.6"
@@ -118,14 +66,14 @@ virtio-blk = ["dbs-virtio-devices/virtio-blk", "virtio-queue"]
virtio-net = ["dbs-virtio-devices/virtio-net", "virtio-queue"]
# virtio-fs only work on atomic-guest-memory
virtio-fs = [
"dbs-virtio-devices/virtio-fs-pro",
"virtio-queue",
"atomic-guest-memory",
"dbs-virtio-devices/virtio-fs-pro",
"virtio-queue",
"atomic-guest-memory",
]
virtio-mem = [
"dbs-virtio-devices/virtio-mem",
"virtio-queue",
"atomic-guest-memory",
"dbs-virtio-devices/virtio-mem",
"virtio-queue",
"atomic-guest-memory",
]
virtio-balloon = ["dbs-virtio-devices/virtio-balloon", "virtio-queue"]
vhost-net = ["dbs-virtio-devices/vhost-net"]
@@ -136,5 +84,5 @@ host-device = ["dep:vfio-bindings", "dep:vfio-ioctls", "dep:dbs-pci"]
[lints.rust]
unexpected_cfgs = { level = "warn", check-cfg = [
'cfg(feature, values("test-mock"))',
'cfg(feature, values("test-mock"))',
] }

View File

@@ -283,6 +283,13 @@ pub const KATA_ANNO_CFG_HYPERVISOR_DEFAULT_GPUS: &str =
pub const KATA_ANNO_CFG_HYPERVISOR_DEFAULT_GPU_MODEL: &str =
"io.katacontainers.config.hypervisor.default_gpu_model";
/// Block device specific annotation for num_queues
pub const KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_NUM_QUEUES: &str =
"io.katacontainers.config.hypervisor.block_device_num_queues";
/// Block device specific annotation for queue_size
pub const KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_QUEUE_SIZE: &str =
"io.katacontainers.config.hypervisor.block_device_queue_size";
// Runtime related annotations
/// Prefix for Runtime configurations.
pub const KATA_ANNO_CFG_RUNTIME_PREFIX: &str = "io.katacontainers.config.runtime.";
@@ -503,6 +510,7 @@ impl Annotation {
let u32_err = io::Error::new(io::ErrorKind::InvalidData, "parse u32 error".to_string());
let u64_err = io::Error::new(io::ErrorKind::InvalidData, "parse u64 error".to_string());
let i32_err = io::Error::new(io::ErrorKind::InvalidData, "parse i32 error".to_string());
let usize_err = io::Error::new(io::ErrorKind::InvalidData, "parse usize error".to_string());
let hv = config.hypervisor.get_mut(hypervisor_name).ok_or_else(|| {
io::Error::new(
io::ErrorKind::InvalidData,
@@ -620,7 +628,7 @@ impl Annotation {
hv.boot_info.kernel = value.to_string();
}
KATA_ANNO_CFG_HYPERVISOR_KERNEL_PARAMS => {
hv.boot_info.kernel_params = value.to_string();
hv.boot_info.replace_kernel_params(value);
}
KATA_ANNO_CFG_HYPERVISOR_IMAGE_PATH => {
hv.boot_info.validate_boot_path(value)?;
@@ -960,7 +968,26 @@ impl Annotation {
return Err(u32_err);
}
},
KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_NUM_QUEUES => {
match self.get_value::<usize>(key) {
Ok(v) => {
hv.blockdev_info.num_queues = v.unwrap_or_default();
}
Err(_e) => {
return Err(usize_err);
}
}
}
KATA_ANNO_CFG_HYPERVISOR_BLOCK_DEV_QUEUE_SIZE => {
match self.get_value::<u32>(key) {
Ok(v) => {
hv.blockdev_info.queue_size = v.unwrap_or_default();
}
Err(_e) => {
return Err(u32_err);
}
}
}
_ => {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,

View File

@@ -41,11 +41,13 @@ pub const DEFAULT_BLOCK_NVDIMM_MEM_OFFSET: u64 = 0;
pub const DEFAULT_BLOCK_DEVICE_AIO_THREADS: &str = "threads";
pub const DEFAULT_BLOCK_DEVICE_AIO_NATIVE: &str = "native";
pub const DEFAULT_BLOCK_DEVICE_AIO: &str = "io_uring";
pub const DEFAULT_BLOCK_DEVICE_NUM_QUEUES: u32 = 1;
pub const DEFAULT_BLOCK_DEVICE_QUEUE_SIZE: u32 = 128;
pub const DEFAULT_SHARED_FS_TYPE: &str = "virtio-fs";
pub const DEFAULT_VIRTIO_FS_CACHE_MODE: &str = "never";
pub const DEFAULT_VIRTIO_FS_DAX_SIZE_MB: u32 = 1024;
pub const DEFAULT_SHARED_9PFS_SIZE_MB: u32 = 128 * 1024;
pub const DEFAULT_SHARED_9PFS_SIZE_MB: u32 = 8 * 1024;
pub const MIN_SHARED_9PFS_SIZE_MB: u32 = 4 * 1024;
pub const MAX_SHARED_9PFS_SIZE_MB: u32 = 8 * 1024 * 1024;
@@ -110,3 +112,6 @@ pub const MAX_REMOTE_VCPUS: u32 = 32;
pub const MIN_REMOTE_MEMORY_SIZE_MB: u32 = 64;
pub const DEFAULT_REMOTE_MEMORY_SIZE_MB: u32 = 128;
pub const DEFAULT_REMOTE_MEMORY_SLOTS: u32 = 128;
// Default configuration for factory/templating
pub const DEFAULT_TEMPLATE_PATH: &str = "/run/vc/vm/template";

View File

@@ -189,6 +189,13 @@ pub struct BlockDeviceInfo {
/// increases the initial max rate
#[serde(default)]
pub disk_rate_limiter_ops_one_time_burst: Option<u64>,
/// virtio queue size. Size: byte
#[serde(default)]
pub queue_size: u32,
/// block device multi-queue
#[serde(default)]
pub num_queues: usize,
}
impl BlockDeviceInfo {
@@ -219,6 +226,15 @@ impl BlockDeviceInfo {
));
}
}
if self.num_queues == 0 {
self.num_queues = default::DEFAULT_BLOCK_DEVICE_NUM_QUEUES as usize;
}
if self.queue_size == 0 {
self.queue_size = default::DEFAULT_BLOCK_DEVICE_QUEUE_SIZE;
}
if self.memory_offset == 0 {
self.memory_offset = default::DEFAULT_BLOCK_NVDIMM_MEM_OFFSET;
}
@@ -358,6 +374,71 @@ impl BootInfo {
self.kernel_params = p.join(KERNEL_PARAM_DELIMITER);
}
/// Replace kernel parameters with the same key.
///
/// For each parameter in the new_params string, if a parameter with the same key
/// already exists in kernel_params, it will be removed before adding the new one.
/// This allows selective parameter override from annotations without replacing
/// the entire kernel command line.
pub fn replace_kernel_params(&mut self, new_params: &str) {
if new_params.is_empty() {
return;
}
// Parse existing kernel parameters into a map
let mut existing_params: Vec<(String, String)> = Vec::new();
for param in self.kernel_params.split(KERNEL_PARAM_DELIMITER) {
let param = param.trim();
if param.is_empty() {
continue;
}
// Split by '=' to get key and value
if let Some(eq_pos) = param.find('=') {
let key = param[..eq_pos].to_string();
let value = param[eq_pos + 1..].to_string();
existing_params.push((key, value));
} else {
// Parameter without value (like "quiet")
existing_params.push((param.to_string(), String::new()));
}
}
// Parse new parameters and collect keys to replace
let mut new_param_keys: Vec<String> = Vec::new();
let mut new_param_list: Vec<String> = Vec::new();
for param in new_params.split(KERNEL_PARAM_DELIMITER) {
let param = param.trim();
if param.is_empty() {
continue;
}
if let Some(eq_pos) = param.find('=') {
let key = param[..eq_pos].to_string();
new_param_keys.push(key);
} else {
new_param_keys.push(param.to_string());
}
new_param_list.push(param.to_string());
}
// Remove existing parameters that will be replaced
existing_params.retain(|(key, _)| !new_param_keys.contains(key));
// Reconstruct kernel_params: existing params + new params
let mut all_params: Vec<String> = existing_params
.iter()
.map(|(key, value)| {
if value.is_empty() {
key.clone()
} else {
format!("{}={}", key, value)
}
})
.collect();
all_params.extend(new_param_list);
self.kernel_params = all_params.join(KERNEL_PARAM_DELIMITER);
}
/// Validate guest kernel image annotation.
pub fn validate_boot_path(&self, path: &str) -> Result<()> {
validate_path!(path, "path {} is invalid{}")?;

View File

@@ -91,6 +91,10 @@ impl ConfigPlugin for QemuConfig {
if qemu.memory_info.memory_slots == 0 {
qemu.memory_info.memory_slots = default::DEFAULT_QEMU_MEMORY_SLOTS;
}
if qemu.factory.template_path.is_empty() {
qemu.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}
}
Ok(())

View File

@@ -65,6 +65,11 @@ impl ConfigPlugin for RemoteConfig {
if remote.memory_info.memory_slots == 0 {
remote.memory_info.memory_slots = default::DEFAULT_REMOTE_MEMORY_SLOTS
}
// Apply factory defaults
if remote.factory.template_path.is_empty() {
remote.factory.template_path = default::DEFAULT_TEMPLATE_PATH.to_string();
}
}
Ok(())

View File

@@ -25,6 +25,7 @@ pub enum Error {
}
/// Assigned CPU resources for a Linux container.
/// Stores fractional vCPU allocation for more precise resource tracking.
#[derive(Clone, Default, Debug)]
pub struct LinuxContainerCpuResources {
shares: u64,
@@ -32,7 +33,8 @@ pub struct LinuxContainerCpuResources {
quota: i64,
cpuset: CpuSet,
nodeset: NumaNodeSet,
calculated_vcpu_time_ms: Option<u64>,
/// Calculated fractional vCPU allocation, e.g., 0.25 means 1/4 of a CPU.
calculated_vcpu: Option<f64>,
}
impl LinuxContainerCpuResources {
@@ -61,10 +63,10 @@ impl LinuxContainerCpuResources {
&self.nodeset
}
/// Get number of vCPUs to fulfill the CPU resource request, `None` means unconstrained.
pub fn get_vcpus(&self) -> Option<u64> {
self.calculated_vcpu_time_ms
.map(|v| v.saturating_add(999) / 1000)
/// Get the number of vCPUs assigned to the container as a fractional value.
/// Returns `None` if unconstrained (no limit).
pub fn get_vcpus(&self) -> Option<f64> {
self.calculated_vcpu
}
}
@@ -75,15 +77,18 @@ impl TryFrom<&oci::LinuxCpu> for LinuxContainerCpuResources {
fn try_from(value: &oci::LinuxCpu) -> Result<Self, Self::Error> {
let period = value.period().unwrap_or(0);
let quota = value.quota().unwrap_or(-1);
let value_cpus = value.cpus().as_ref().map_or("", |cpus| cpus);
let value_cpus = value.cpus().as_deref().unwrap_or("");
let cpuset = CpuSet::from_str(value_cpus).map_err(Error::InvalidCpuSet)?;
let value_mems = value.mems().as_ref().map_or("", |mems| mems);
let value_mems = value.mems().as_deref().unwrap_or("");
let nodeset = NumaNodeSet::from_str(value_mems).map_err(Error::InvalidNodeSet)?;
// If quota is -1, it means the CPU resource request is unconstrained. In that case,
// we don't currently assign additional CPUs.
let milli_sec = if quota >= 0 && period != 0 {
Some((quota as u64).saturating_mul(1000) / period)
// Calculate fractional vCPUs:
// If quota >= 0 and period > 0, vCPUs = quota / period.
// Otherwise, if cpuset is non-empty, derive from cpuset length.
let vcpu_fraction = if quota >= 0 && period > 0 {
Some(quota as f64 / period as f64)
} else if !cpuset.is_empty() {
Some(cpuset.len() as f64)
} else {
None
};
@@ -94,16 +99,18 @@ impl TryFrom<&oci::LinuxCpu> for LinuxContainerCpuResources {
quota,
cpuset,
nodeset,
calculated_vcpu_time_ms: milli_sec,
calculated_vcpu: vcpu_fraction,
})
}
}
/// Assigned CPU resources for a Linux sandbox/pod.
/// Aggregated CPU resources for a Linux sandbox/pod.
/// Tracks cumulative fractional vCPU allocation across all containers in the pod.
#[derive(Default, Debug)]
pub struct LinuxSandboxCpuResources {
shares: u64,
calculated_vcpu_time_ms: u64,
/// Total fractional vCPU allocation for the sandbox.
calculated_vcpu: f64,
cpuset: CpuSet,
nodeset: NumaNodeSet,
}
@@ -122,9 +129,9 @@ impl LinuxSandboxCpuResources {
self.shares
}
/// Get assigned vCPU time in ms.
pub fn calculated_vcpu_time_ms(&self) -> u64 {
self.calculated_vcpu_time_ms
/// Return the cumulative fractional vCPU allocation for the sandbox.
pub fn calculated_vcpu(&self) -> f64 {
self.calculated_vcpu
}
/// Get the CPU set.
@@ -137,19 +144,23 @@ impl LinuxSandboxCpuResources {
&self.nodeset
}
/// Get number of vCPUs to fulfill the CPU resource request.
pub fn get_vcpus(&self) -> u64 {
if self.calculated_vcpu_time_ms == 0 && !self.cpuset.is_empty() {
self.cpuset.len() as u64
} else {
self.calculated_vcpu_time_ms.saturating_add(999) / 1000
/// Get the number of vCPUs for the sandbox as a fractional value.
/// If no quota and cpuset is defined, return cpuset length as float.
pub fn get_vcpus(&self) -> f64 {
if self.calculated_vcpu == 0.0 {
if !self.cpuset.is_empty() {
return self.cpuset.len() as f64;
}
return 0.0;
}
self.calculated_vcpu
}
/// Merge resources assigned to a container into the sandbox/pod resources.
/// Merge container CPU resources into this sandbox CPU resource object.
/// Aggregates fractional vCPU allocation and extends cpuset/nodeset.
pub fn merge(&mut self, container_resource: &LinuxContainerCpuResources) -> &mut Self {
if let Some(v) = container_resource.calculated_vcpu_time_ms.as_ref() {
self.calculated_vcpu_time_ms += v;
if let Some(v) = container_resource.calculated_vcpu {
self.calculated_vcpu += v;
}
self.cpuset.extend(&container_resource.cpuset);
self.nodeset.extend(&container_resource.nodeset);
@@ -160,16 +171,16 @@ impl LinuxSandboxCpuResources {
#[cfg(test)]
mod tests {
use super::*;
const EPSILON: f64 = 0.0001;
#[test]
fn test_linux_container_cpu_resources() {
let resources = LinuxContainerCpuResources::default();
assert_eq!(resources.shares(), 0);
assert_eq!(resources.calculated_vcpu_time_ms, None);
assert!(resources.cpuset.is_empty());
assert!(resources.nodeset.is_empty());
assert!(resources.calculated_vcpu_time_ms.is_none());
assert!(resources.get_vcpus().is_none());
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
@@ -182,11 +193,20 @@ mod tests {
assert_eq!(resources.shares(), 2048);
assert_eq!(resources.period(), 100);
assert_eq!(resources.quota(), 1001);
assert_eq!(resources.calculated_vcpu_time_ms, Some(10010));
assert_eq!(resources.get_vcpus().unwrap(), 11);
// Expected fractional vCPUs = quota / period
let expected_vcpus = 1001.0 / 100.0;
assert!(
(resources.get_vcpus().unwrap() - expected_vcpus).abs() < EPSILON,
"got {}, expect {}",
resources.get_vcpus().unwrap(),
expected_vcpus
);
assert_eq!(resources.cpuset().len(), 3);
assert_eq!(resources.nodeset().len(), 1);
// Test cpuset-only path (no quota)
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
linux_cpu.set_cpus(Some("1".to_string()));
@@ -196,8 +216,10 @@ mod tests {
assert_eq!(resources.shares(), 2048);
assert_eq!(resources.period(), 0);
assert_eq!(resources.quota(), -1);
assert_eq!(resources.calculated_vcpu_time_ms, None);
assert!(resources.get_vcpus().is_none());
assert!(
(resources.get_vcpus().unwrap() - 1.0).abs() < EPSILON,
"cpuset size vCPU mismatch"
);
assert_eq!(resources.cpuset().len(), 1);
assert_eq!(resources.nodeset().len(), 2);
}
@@ -207,8 +229,7 @@ mod tests {
let mut sandbox = LinuxSandboxCpuResources::new(1024);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 0);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 0);
assert_eq!(sandbox.get_vcpus(), 0.0);
assert!(sandbox.cpuset().is_empty());
assert!(sandbox.nodeset().is_empty());
@@ -222,11 +243,20 @@ mod tests {
let resources = LinuxContainerCpuResources::try_from(&linux_cpu).unwrap();
sandbox.merge(&resources);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 11);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 10010);
// vCPUs after merge = quota / period
let expected_vcpus = 1001.0 / 100.0;
assert!(
(sandbox.get_vcpus() - expected_vcpus).abs() < EPSILON,
"sandbox vCPU mismatch: got {}, expect {}",
sandbox.get_vcpus(),
expected_vcpus
);
assert_eq!(sandbox.cpuset().len(), 3);
assert_eq!(sandbox.nodeset().len(), 1);
// Merge cpuset-only container
let mut linux_cpu = oci::LinuxCpu::default();
linux_cpu.set_shares(Some(2048));
linux_cpu.set_cpus(Some("1,4".to_string()));
@@ -236,8 +266,15 @@ mod tests {
sandbox.merge(&resources);
assert_eq!(sandbox.shares(), 1024);
assert_eq!(sandbox.get_vcpus(), 11);
assert_eq!(sandbox.calculated_vcpu_time_ms(), 10010);
// Expect quota-based + cpuset len (since cpuset is treated as allocation)
let expected_after_merge = expected_vcpus + resources.get_vcpus().unwrap();
assert!(
(sandbox.get_vcpus() - expected_after_merge).abs() < EPSILON,
"sandbox vCPU mismatch after cpuset merge: got {}, expect {}",
sandbox.get_vcpus(),
expected_after_merge
);
assert_eq!(sandbox.cpuset().len(), 4);
assert_eq!(sandbox.nodeset().len(), 2);
}

View File

@@ -52,7 +52,8 @@ pub struct Config {
// the next compact_force_times times, a compaction will be forced
// regardless of the system's memory situation.
// If compact_force_times is set to 0, will do force compaction each time.
// If compact_force_times is set to std::u64::MAX, will never do force compaction.
// If compact_force_times is set to std::u64::MAX, u64::MAX - 1, or i64::MAX, will never do force compaction.
// Note: Using i64::MAX (9223372036854775807) instead of u64::MAX to avoid TOML parser issues.
pub compact_force_times: u64,
}
@@ -67,7 +68,7 @@ impl Default for Config {
compact_sec_max: 5 * 60,
compact_order: PAGE_REPORTING_MIN_ORDER,
compact_threshold: 2 << PAGE_REPORTING_MIN_ORDER,
compact_force_times: u64::MAX,
compact_force_times: i64::MAX as u64,
}
}
}
@@ -133,7 +134,7 @@ impl CompactCore {
}
fn need_force_compact(&self) -> bool {
if self.config.compact_force_times == u64::MAX {
if self.config.compact_force_times >= i64::MAX as u64 {
return false;
}

View File

@@ -25,19 +25,13 @@ dependencies = [
[[package]]
name = "addr2line"
version = "0.20.0"
version = "0.25.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f4fa78e18c64fce05e902adecd7a5eed15a5e0a3439f7b0e169f0252214865e3"
checksum = "1b5d307320b3181d6d7954e663bd7c774a838b8220fe0593c86d9fb09f498b4b"
dependencies = [
"gimli",
]
[[package]]
name = "adler"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe"
[[package]]
name = "adler2"
version = "2.0.1"
@@ -344,17 +338,17 @@ checksum = "cc17ab023b4091c10ff099f9deebaeeb59b5189df07e554c4fef042b70745d68"
[[package]]
name = "backtrace"
version = "0.3.68"
version = "0.3.76"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "4319208da049c43661739c5fade2ba182f09d1dc2299b32298d3a31692b17e12"
checksum = "bb531853791a215d7c62a30daf0dde835f381ab5de4589cfe7c649d2cbe92bd6"
dependencies = [
"addr2line",
"cc",
"cfg-if 1.0.0",
"libc",
"miniz_oxide 0.7.1",
"miniz_oxide",
"object",
"rustc-demangle",
"windows-link 0.2.1",
]
[[package]]
@@ -582,9 +576,9 @@ dependencies = [
[[package]]
name = "cgroups-rs"
version = "0.4.0"
version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "879433e90a9bf3c38e4e854ad36bd14507751dbd3a0df15429283ff5c10ff0e4"
checksum = "efc46cf39fc5922b840030e0e5b378ce5caa9a824a675a95c6dec2c2c9ce9468"
dependencies = [
"bit-vec",
"libc",
@@ -621,7 +615,7 @@ dependencies = [
"js-sys",
"num-traits",
"wasm-bindgen",
"windows-link",
"windows-link 0.1.3",
]
[[package]]
@@ -1448,7 +1442,7 @@ checksum = "bfe33edd8e85a12a67454e37f8c75e730830d83e313556ab9ebf9ee7fbeb3bfb"
dependencies = [
"crc32fast",
"libz-sys",
"miniz_oxide 0.8.9",
"miniz_oxide",
]
[[package]]
@@ -1674,9 +1668,9 @@ dependencies = [
[[package]]
name = "gimli"
version = "0.27.3"
version = "0.32.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6c80984affa11d98d1b88b66ac8853f143217b399d3c74116778ff8fdb4ed2e"
checksum = "e629b9b98ef3dd8afe6ca2bd0f89306cec16d43d907889945bc5d6687f2f13c7"
[[package]]
name = "glob"
@@ -2510,15 +2504,6 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "68354c5c6bd36d73ff3feceb05efa59b6acb7626617f4962be322a825e61f79a"
[[package]]
name = "miniz_oxide"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e7810e0be55b428ada41041c41f32c9f1a42817901b4ccf45fa3d4b6561e74c7"
dependencies = [
"adler",
]
[[package]]
name = "miniz_oxide"
version = "0.8.9"
@@ -2605,55 +2590,37 @@ dependencies = [
[[package]]
name = "netlink-packet-core"
version = "0.7.0"
version = "0.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "72724faf704479d67b388da142b186f916188505e7e0b26719019c525882eda4"
checksum = "3463cbb78394cb0141e2c926b93fc2197e473394b761986eca3b9da2c63ae0f4"
dependencies = [
"anyhow",
"byteorder",
"netlink-packet-utils",
"paste",
]
[[package]]
name = "netlink-packet-route"
version = "0.22.0"
version = "0.26.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc0e7987b28514adf555dc1f9a5c30dfc3e50750bbaffb1aec41ca7b23dcd8e4"
checksum = "9ea06a7cec15a9df94c58bddc472b1de04ca53bd32e72da7da2c5dd1c3885edc"
dependencies = [
"anyhow",
"bitflags 2.9.0",
"byteorder",
"libc",
"log",
"netlink-packet-core",
"netlink-packet-utils",
]
[[package]]
name = "netlink-packet-utils"
version = "0.5.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ede8a08c71ad5a95cdd0e4e52facd37190977039a4704eb82a283f713747d34"
dependencies = [
"anyhow",
"byteorder",
"paste",
"thiserror 1.0.69",
]
[[package]]
name = "netlink-proto"
version = "0.11.3"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "86b33524dc0968bfad349684447bfce6db937a9ac3332a1fe60c0c5a5ce63f21"
checksum = "b65d130ee111430e47eed7896ea43ca693c387f097dd97376bffafbf25812128"
dependencies = [
"bytes",
"futures 0.3.28",
"log",
"netlink-packet-core",
"netlink-sys",
"thiserror 1.0.69",
"tokio",
"thiserror 2.0.11",
]
[[package]]
@@ -2910,9 +2877,9 @@ dependencies = [
[[package]]
name = "object"
version = "0.31.1"
version = "0.37.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8bda667d9f2b5051b8833f59f3bf748b28ef54f850f4fcb389a252aa383866d1"
checksum = "ff76201f031d8863c38aa7f905eca4f53abbfa15f609db4277d44cd8938f33fe"
dependencies = [
"memchr",
]
@@ -3890,7 +3857,7 @@ dependencies = [
"async-trait",
"bitflags 2.9.0",
"byte-unit",
"cgroups-rs 0.4.0",
"cgroups-rs 0.5.0",
"flate2",
"futures 0.3.28",
"hex",
@@ -3963,18 +3930,18 @@ dependencies = [
[[package]]
name = "rtnetlink"
version = "0.16.0"
version = "0.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3cb5850b5aa2c9c0ae44f157694bbe85107a2e13d76eb3178d0e3ee96c410f57"
checksum = "1f3ee907fdcec9200d13b9cdb64dfc8179cb4ac16ead6ae0ac76333dc41981fc"
dependencies = [
"futures 0.3.28",
"futures-channel",
"futures-util",
"log",
"netlink-packet-core",
"netlink-packet-route",
"netlink-packet-utils",
"netlink-proto",
"netlink-sys",
"nix 0.29.0",
"nix 0.30.1",
"thiserror 1.0.69",
"tokio",
]
@@ -4055,9 +4022,9 @@ dependencies = [
[[package]]
name = "rustc-demangle"
version = "0.1.23"
version = "0.1.26"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d626bb9dae77e28219937af045c257c28bfd3f69333c512553507f5f9798cb76"
checksum = "56f7d92ca342cea22a06f2121d944b4fd82af56988c270852495420f961d4ace"
[[package]]
name = "rustix"
@@ -5696,6 +5663,12 @@ version = "0.1.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5e6ad25900d524eaabdbbb96d20b4311e1e7ae1699af4fb28c17ae66c80d798a"
[[package]]
name = "windows-link"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f0805222e57f7521d6a62e36fa9163bc891acd422f971defe97d64e70d0a4fe5"
[[package]]
name = "windows-result"
version = "0.1.2"

View File

@@ -150,8 +150,8 @@ DEFMEMSLOTS := 10
DEFMAXMEMSZ := 0
##VAR DEFBRIDGES=<number> Default number of bridges
DEFBRIDGES := 0
DEFENABLEANNOTATIONS := [\"kernel_params\"]
DEFENABLEANNOTATIONS_COCO := [\"kernel_params\",\"cc_init_data\"]
DEFENABLEANNOTATIONS := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_params\", \"default_vcpus\", \"default_memory\"]
DEFENABLEANNOTATIONS_COCO := [\"enable_iommu\", \"virtio_fs_extra_args\", \"kernel_params\", \"default_vcpus\", \"default_memory\", \"cc_init_data\"]
DEFDISABLEGUESTSECCOMP := true
DEFDISABLEGUESTEMPTYDIR := false
##VAR DEFAULTEXPFEATURES=[features] Default experimental features enabled
@@ -347,8 +347,13 @@ endif
DEFBLOCKDEVICEAIO_QEMU := io_uring
DEFNETWORKMODEL_QEMU := tcfilter
DEFDISABLEGUESTSELINUX := true
DEFSECCOMPSANDBOXPARAM := on,obsolete=deny,spawn=deny,resourcecontrol=deny
DEFGUESTSELINUXLABEL := system_u:system_r:container_t
# Default is empty string "" to match Rust default None (when commented out in config).
# Most users will want to set this to "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
# for better security. Note: "elevateprivileges=deny" doesn't work with daemonize option.
DEFSECCOMPSANDBOXPARAM := ""
# Default is empty string "" to match Rust default None (when commented out in config).
# Most users will want to set this to "system_u:system_r:container_t" for SELinux support.
DEFGUESTSELINUXLABEL := ""
endif
ifneq (,$(FCCMD))

View File

@@ -18,41 +18,15 @@ image = "@IMAGEPATH@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Block storage driver to be used for the VM rootfs is backed
# by a block device.
vm_rootfs_driver = "@VMROOTFSDRIVER_CLH@"
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Supported TEEs:
# * Intel TDX
#
# Default false
# confidential_guest = true
# Path to the firmware.
# If you want Cloud Hypervisor to use a specific firmware, set its path below.
# This is option is only used when confidential_guest is enabled.
#
# For more information about firmwared that can be used with specific TEEs,
# please, refer to:
# * Intel TDX:
# - td-shim: https://github.com/confidential-containers/td-shim
#
# firmware = "@FIRMWAREPATH@"
firmware = "@FIRMWAREPATH@"
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -68,7 +42,7 @@ valid_hypervisor_paths = @CLHVALIDHYPERVISORPATHS@
# List of valid annotations values for ctlpath
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends:
# valid_ctlpaths =
valid_ctlpaths = []
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -166,7 +140,7 @@ default_bridges = @DEFBRIDGES@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Block device driver to be used by the hypervisor when a container's storage
# is backed by a block device or a file. This driver facilitates attaching the
@@ -176,7 +150,7 @@ block_device_driver = "virtio-blk-pci"
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Bandwidth rate limiter options
#
@@ -184,29 +158,35 @@ block_device_driver = "virtio-blk-pci"
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_bw_max_rate = 0
#
disk_rate_limiter_bw_max_rate = 0
# disk_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_bw_one_time_burst = 0
#
disk_rate_limiter_bw_one_time_burst = 0
# Operation rate limiter options
#
# disk_rate_limiter_ops_max_rate controls disk I/O bandwidth (size in ops/sec
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_ops_max_rate = 0
#
disk_rate_limiter_ops_max_rate = 0
# disk_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_ops_one_time_burst = 0
disk_rate_limiter_ops_one_time_burst = 0
# Virtio queue size. Size: byte. default 128
queue_size = 128
# Block device multi-queue, default 1
num_queues = 1
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
@@ -215,7 +195,7 @@ block_device_driver = "virtio-blk-pci"
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -223,27 +203,27 @@ block_device_driver = "virtio-blk-pci"
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable running clh VMM as a non-root user.
# By default clh VMM run as root. When this is set to true, clh VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
#rootless = true
rootless = false
# Disable the 'seccomp' feature from Cloud Hypervisor, firecracker or dragonball, default false
# disable_seccomp = true
disable_seccomp = false
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
@@ -260,30 +240,31 @@ block_device_driver = "virtio-blk-pci"
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device.
#enable_guest_swap = true
enable_guest_swap = false
# If enable_guest_swap is enabled, the swap device will be created in the guest
# at this path. Default "/run/kata-containers/swap".
#guest_swap_path = "/run/kata-containers/swap"
guest_swap_path = "/run/kata-containers/swap"
# The percentage of the total memory to be used as swap device.
# Default 100.
#guest_swap_size_percent = 100
guest_swap_size_percent = 100
# The threshold in seconds to create swap device in the guest.
# Kata will wait guest_swap_create_threshold_secs seconds before creating swap device.
# Default 60.
#guest_swap_create_threshold_secs = 60
guest_swap_create_threshold_secs = 60
[agent.@PROJECT_TYPE@]
container_pipe_size=@PIPESIZE@
container_pipe_size = @PIPESIZE@
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -297,18 +278,18 @@ container_pipe_size=@PIPESIZE@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent dial timeout in millisecond.
# (default: 10)
#dial_timeout_ms = 10
dial_timeout_ms = 10
# Agent reconnect timeout in millisecond.
# Retry times = reconnect_timeout_ms / dial_timeout_ms (default: 300)
@@ -317,28 +298,28 @@ container_pipe_size=@PIPESIZE@
# You'd better not change the value of dial_timeout_ms, unless you have an
# idea of what you are doing.
# (default: 3000)
#reconnect_timeout_ms = 3000
reconnect_timeout_ms = 3000
[agent.@PROJECT_TYPE@.mem_agent]
# Control the mem-agent function enable or disable.
# Default to false
#mem_agent_enable = true
mem_agent_enable = false
# Control the mem-agent memcg function disable or enable
# Default to false
#memcg_disable = false
memcg_disable = false
# Control the mem-agent function swap enable or disable.
# Default to false
#memcg_swap = false
memcg_swap = false
# Control the mem-agent function swappiness max number.
# Default to 50
#memcg_swappiness_max = 50
memcg_swappiness_max = 50
# Control the mem-agent memcg function wait period seconds
# Default to 600
#memcg_period_secs = 600
memcg_period_secs = 600
# Control the mem-agent memcg wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
@@ -346,7 +327,7 @@ container_pipe_size=@PIPESIZE@
# then the aging and eviction for this cgroup will not be
# executed after this waiting period.
# Default to 1
#memcg_period_psi_percent_limit = 1
memcg_period_psi_percent_limit = 1
# Control the mem-agent memcg eviction PSI percent limit.
# If the percentage of memory and IO PSI stall time for a cgroup
@@ -354,44 +335,44 @@ container_pipe_size=@PIPESIZE@
# this cgroup will immediately stop and will not resume until
# the next memcg waiting period.
# Default to 1
#memcg_eviction_psi_percent_limit = 1
memcg_eviction_psi_percent_limit = 1
# Control the mem-agent memcg eviction run aging count min.
# A cgroup will only perform eviction when the number of aging cycles
# in memcg is greater than or equal to memcg_eviction_run_aging_count_min.
# Default to 3
#memcg_eviction_run_aging_count_min = 3
memcg_eviction_run_aging_count_min = 3
# Control the mem-agent compact function disable or enable
# Default to false
#compact_disable = false
compact_disable = false
# Control the mem-agent compaction function wait period seconds
# Default to 600
#compact_period_secs = 600
compact_period_secs = 600
# Control the mem-agent compaction function wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
# the compaction waiting period exceeds this value,
# then the compaction will not be executed after this waiting period.
# Default to 1
#compact_period_psi_percent_limit = 1
compact_period_psi_percent_limit = 1
# Control the mem-agent compaction function compact PSI percent limit.
# During compaction, the percentage of memory and IO PSI stall time
# is checked every second. If this percentage exceeds
# compact_psi_percent_limit, the compaction process will stop.
# Default to 5
#compact_psi_percent_limit = 5
compact_psi_percent_limit = 5
# Control the maximum number of seconds for each compaction of mem-agent compact function.
# Default to 180
#compact_sec_max = 180
# Default to 300
compact_sec_max = 300
# Control the mem-agent compaction function compact order.
# compact_order is use with compact_threshold.
# Default to 9
#compact_order = 9
compact_order = 9
# Control the mem-agent compaction function compact threshold.
# compact_threshold is the pages number.
@@ -404,7 +385,7 @@ container_pipe_size=@PIPESIZE@
# since the previous compaction.
# then the system should initiate another round of memory compaction.
# Default to 1024
#compact_threshold = 1024
compact_threshold = 1024
# Control the mem-agent compaction function force compact times.
# After one compaction, if there has not been a compaction within
@@ -413,7 +394,9 @@ container_pipe_size=@PIPESIZE@
# If compact_force_times is set to 0, will do force compaction each time.
# If compact_force_times is set to 18446744073709551615, will never do force compaction.
# Default to 18446744073709551615
#compact_force_times = 18446744073709551615
# Note: Using a large but valid u64 value (within i64::MAX range) instead of u64::MAX to avoid TOML parser issues
# Using 9223372036854775807 (i64::MAX) which is effectively "never" for practical purposes
compact_force_times = 9223372036854775807
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -426,20 +409,20 @@ container_pipe_size=@PIPESIZE@
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
# If enabled, enabled, it means that 1) if the runtime exits abnormally,
# the cleanup process will be skipped, and 2) the runtime will not exit
# even if the health check fails.
# This option is typically used to retain abnormal information for debugging.
# (default: false)
#keep_abnormal = true
keep_abnormal = false
# Internetworking model
# Determines how the VM should be connected to the
@@ -464,33 +447,33 @@ container_pipe_size=@PIPESIZE@
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_CLH@"
internetworking_model = "@DEFNETWORKMODEL_CLH@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_CLH@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_CLH@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -498,7 +481,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -506,18 +489,18 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_CLH@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_CLH@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -526,7 +509,7 @@ experimental=@DEFAULTEXPFEATURES@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_CLH@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_CLH@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted(ro, rw) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
@@ -536,7 +519,7 @@ static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_CLH@
# - "/path/to", default readonly mode.
# - "/path/to:ro", readonly mode.
# - "/path/to:rw", readwrite mode.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# Base directory of directly attachable network config.
# Network devices for VM-based containers are allowed to be placed in the

View File

@@ -16,13 +16,12 @@ path = "@DBPATH@"
ctlpath = "@DBCTLPATH@"
kernel = "@KERNELPATH_DB@"
image = "@IMAGEPATH@"
# initrd = "@INITRDPATH@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Block storage driver to be used for the VM rootfs is backed
@@ -43,7 +42,7 @@ valid_hypervisor_paths = @DBVALIDHYPERVISORPATHS@
# List of valid annotations values for ctlpath
# The default if not set is empty (all annotations rejected.)
# Your distribution recommends:
# valid_ctlpaths =
valid_ctlpaths = []
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -106,7 +105,7 @@ default_bridges = @DEFBRIDGES@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Default memory size in MiB for SB/VM.
# If unspecified then it will be set @DEFMEMSZ@ MiB.
@@ -129,7 +128,7 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_DB@"
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# The log level will be applied to hypervisor.
# Possible values are:
@@ -140,17 +139,18 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_DB@"
# - error
# - critical
# Default: info
#log_level = "info"
log_level = "info"
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
# Default false
disable_nesting_checks = false
# If host doesn't support vhost_net, set to true. Thus we won't create vhost fds for nics.
# Default false
#disable_vhost_net = true
disable_vhost_net = false
# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
@@ -167,7 +167,8 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_DB@"
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# Shared file system type:
# - inline-virtio-fs (default)
@@ -209,7 +210,13 @@ virtio_fs_cache = "@DEFVIRTIOFSCACHE@"
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Virtio queue size. Size: byte. default 128
queue_size = 128
# Block device multi-queue, default 1
num_queues = 1
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -217,33 +224,33 @@ virtio_fs_cache = "@DEFVIRTIOFSCACHE@"
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Disable the 'seccomp' feature from Cloud Hypervisor, firecracker or dragonball, default false
# disable_seccomp = true
disable_seccomp = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device.
#enable_guest_swap = true
enable_guest_swap = false
# If enable_guest_swap is enabled, the swap device will be created in the guest
# at this path. Default "/run/kata-containers/swap".
#guest_swap_path = "/run/kata-containers/swap"
guest_swap_path = "/run/kata-containers/swap"
# The percentage of the total memory to be used as swap device.
# Default 100.
#guest_swap_size_percent = 100
guest_swap_size_percent = 100
# The threshold in seconds to create swap device in the guest.
# Kata will wait guest_swap_create_threshold_secs seconds before creating swap device.
# Default 60.
#guest_swap_create_threshold_secs = 60
guest_swap_create_threshold_secs = 60
[agent.@PROJECT_TYPE@]
container_pipe_size=@PIPESIZE@
container_pipe_size = @PIPESIZE@
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# The log level will be applied to agent.
# Possible values are:
@@ -254,7 +261,7 @@ container_pipe_size=@PIPESIZE@
# - error
# - critical
# (default: info)
#log_level = "info"
log_level = "info"
# Enable agent tracing.
#
@@ -268,18 +275,18 @@ container_pipe_size=@PIPESIZE@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent dial timeout in millisecond.
# (default: 10)
#dial_timeout_ms = 10
dial_timeout_ms = 10
# Agent reconnect timeout in millisecond.
# Retry times = reconnect_timeout_ms / dial_timeout_ms (default: 300)
@@ -288,7 +295,7 @@ container_pipe_size=@PIPESIZE@
# You'd better not change the value of dial_timeout_ms, unless you have an
# idea of what you are doing.
# (default: 3000)
#reconnect_timeout_ms = 3000
reconnect_timeout_ms = 3000
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -301,28 +308,28 @@ container_pipe_size=@PIPESIZE@
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[agent.@PROJECT_TYPE@.mem_agent]
# Control the mem-agent function enable or disable.
# Default to false
#mem_agent_enable = true
mem_agent_enable = false
# Control the mem-agent memcg function disable or enable
# Default to false
#memcg_disable = false
memcg_disable = false
# Control the mem-agent function swap enable or disable.
# Default to false
#memcg_swap = false
memcg_swap = false
# Control the mem-agent function swappiness max number.
# Default to 50
#memcg_swappiness_max = 50
memcg_swappiness_max = 50
# Control the mem-agent memcg function wait period seconds
# Default to 600
#memcg_period_secs = 600
memcg_period_secs = 600
# Control the mem-agent memcg wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
@@ -330,7 +337,7 @@ container_pipe_size=@PIPESIZE@
# then the aging and eviction for this cgroup will not be
# executed after this waiting period.
# Default to 1
#memcg_period_psi_percent_limit = 1
memcg_period_psi_percent_limit = 1
# Control the mem-agent memcg eviction PSI percent limit.
# If the percentage of memory and IO PSI stall time for a cgroup
@@ -338,44 +345,44 @@ container_pipe_size=@PIPESIZE@
# this cgroup will immediately stop and will not resume until
# the next memcg waiting period.
# Default to 1
#memcg_eviction_psi_percent_limit = 1
memcg_eviction_psi_percent_limit = 1
# Control the mem-agent memcg eviction run aging count min.
# A cgroup will only perform eviction when the number of aging cycles
# in memcg is greater than or equal to memcg_eviction_run_aging_count_min.
# Default to 3
#memcg_eviction_run_aging_count_min = 3
memcg_eviction_run_aging_count_min = 3
# Control the mem-agent compact function disable or enable
# Default to false
#compact_disable = false
compact_disable = false
# Control the mem-agent compaction function wait period seconds
# Default to 600
#compact_period_secs = 600
compact_period_secs = 600
# Control the mem-agent compaction function wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
# the compaction waiting period exceeds this value,
# then the compaction will not be executed after this waiting period.
# Default to 1
#compact_period_psi_percent_limit = 1
compact_period_psi_percent_limit = 1
# Control the mem-agent compaction function compact PSI percent limit.
# During compaction, the percentage of memory and IO PSI stall time
# is checked every second. If this percentage exceeds
# compact_psi_percent_limit, the compaction process will stop.
# Default to 5
#compact_psi_percent_limit = 5
compact_psi_percent_limit = 5
# Control the maximum number of seconds for each compaction of mem-agent compact function.
# Default to 180
#compact_sec_max = 180
compact_sec_max = 180
# Control the mem-agent compaction function compact order.
# compact_order is use with compact_threshold.
# Default to 9
#compact_order = 9
compact_order = 9
# Control the mem-agent compaction function compact threshold.
# compact_threshold is the pages number.
@@ -388,22 +395,22 @@ container_pipe_size=@PIPESIZE@
# since the previous compaction.
# then the system should initiate another round of memory compaction.
# Default to 1024
#compact_threshold = 1024
compact_threshold = 1024
# Control the mem-agent compaction function force compact times.
# After one compaction, if there has not been a compaction within
# the next compact_force_times times, a compaction will be forced
# regardless of the system's memory situation.
# If compact_force_times is set to 0, will do force compaction each time.
# If compact_force_times is set to 18446744073709551615, will never do force compaction.
# Default to 18446744073709551615
#compact_force_times = 18446744073709551615
# If compact_force_times is set to 9223372036854775807, will never do force compaction.
# Default to 9223372036854775807
compact_force_times = 9223372036854775807
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
# The log level will be applied to runtimes.
# Possible values are:
@@ -414,14 +421,14 @@ container_pipe_size=@PIPESIZE@
# - error
# - critical
# (default: info)
#log_level = "info"
log_level = "info"
# If enabled, enabled, it means that 1) if the runtime exits abnormally,
# the cleanup process will be skipped, and 2) the runtime will not exit
# even if the health check fails.
# This option is typically used to retain abnormal information for debugging.
# (default: false)
#keep_abnormal = true
keep_abnormal = false
# Internetworking model
# Determines how the VM should be connected to the
@@ -446,33 +453,33 @@ container_pipe_size=@PIPESIZE@
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_DB@"
internetworking_model = "@DEFNETWORKMODEL_DB@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_DB@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_DB@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -480,7 +487,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -488,18 +495,18 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_DB@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_DB@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -508,7 +515,7 @@ experimental=@DEFAULTEXPFEATURES@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_DB@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_DB@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted(ro, rw) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
@@ -518,7 +525,7 @@ static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_DB@
# - "/path/to", default readonly mode.
# - "/path/to:ro", readonly mode.
# - "/path/to:rw", readwrite mode.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# Base directory of directly attachable network config.
# Network devices for VM-based containers are allowed to be placed in the
@@ -534,4 +541,4 @@ dan_conf = "@DEFDANCONF@"
use_passfd_io = true
# If fd passthrough io is enabled, the runtime will attempt to use the specified port instead of the default port.
# passfd_listener_port = 1027
passfd_listener_port = 1027

View File

@@ -16,14 +16,13 @@
path = "@QEMUPATH@"
kernel = "@KERNELPATH_COCO@"
image = "@IMAGECONFIDENTIALPATH@"
# initrd = "@INITRDCONFIDENTIALPATH@"
machine_type = "@MACHINETYPE@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Block storage driver to be used for the VM rootfs is backed
# by a block device. This is virtio-blk-pci, virtio-blk-mmio or nvdimm
@@ -43,18 +42,12 @@ vm_rootfs_driver = "@VMROOTFSDRIVER_QEMU@"
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true
confidential_guest = false
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -92,7 +85,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -100,12 +93,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -151,7 +145,7 @@ default_bridges = @DEFBRIDGES@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Default memory size in MiB for SB/VM.
# If unspecified then it will be set @DEFMEMSZ@ MiB.
@@ -160,7 +154,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -173,13 +167,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -256,17 +250,17 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
@@ -281,7 +275,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -289,7 +283,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -306,11 +300,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -326,7 +320,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -341,7 +335,7 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option allows to add an extra HMP or QMP socket when `enable_debug = true`
#
@@ -356,17 +350,18 @@ pflashes = []
#
# If set to the empty string "", no extra monitor socket is added. This is
# the default.
#extra_monitor_socket = "hmp"
extra_monitor_socket = ""
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
# Default false
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -374,44 +369,44 @@ pflashes = []
# nvdimm is not supported when `confidential_guest = true`.
#
# Default is false
#disable_image_nvdimm = true
disable_image_nvdimm = false
# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true
hotplug_vfio_on_root_bus = false
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
#cold_plug_vfio = "root-port"
cold_plug_vfio = "no-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# Before hot plugging a PCIe device onto a switch port, you need add a pcie_switch_port device fist.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means how many devices attached onto pcie_switch_port will be created.
# This value is valid when hotplug_vfio_on_root_bus is true, and machine_type is "q35"
# Default 0
#pcie_switch_port = 2
pcie_switch_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -423,7 +418,7 @@ pflashes = []
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source = "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -445,29 +440,19 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Enable connection to Quote Generation Service (QGS)
# The "tdx_quote_generation_service_socket_port" parameter configures how QEMU connects to the TDX Quote Generation Service (QGS).
# This connection is essential for Trusted Domain (TD) attestation, as QGS signs the TDREPORT sent by QEMU via the GetQuote hypercall.
# By default QGS runs on vsock port 4050, but can be modified by the host admin. For QEMU's tdx-guest object, this connection needs to
# be specified in a JSON format, for example:
# -object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'
# It's important to note that setting "tdx_quote_generation_service_socket_port" to 0 enables communication via Unix Domain Sockets (UDS).
# To activate UDS, the QGS service itself must be launched with the "-port=0" parameter and the UDS will always be located at /var/run/tdx-qgs/qgs.socket.
# -object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"unix","path":"/var/run/tdx-qgs/qgs.socket"}}'
# tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT@
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -477,9 +462,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -490,17 +476,17 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
#guest_memory_dump_paging=false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -515,41 +501,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
# > 0 --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before using it.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket. The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert the VM to gRPC format and transport it when gets
# requestion from clients.
# Factory grpccache is the VMCache client. It will request gRPC format
# VM and convert it back to a VM. If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
template_path = "/run/vc/vm/template"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -563,7 +525,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -576,18 +538,18 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent dial timeout in millisecond.
# (default: 10)
#dial_timeout_ms = 10
dial_timeout_ms = 10
# Agent reconnect timeout in millisecond.
# Retry times = reconnect_timeout_ms / dial_timeout_ms (default: 300)
@@ -596,7 +558,7 @@ kernel_modules=[]
# You'd better not change the value of dial_timeout_ms, unless you have an
# idea of what you are doing.
# (default: 3000)
#reconnect_timeout_ms = 3000
reconnect_timeout_ms = 3000
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -609,28 +571,28 @@ kernel_modules=[]
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[agent.@PROJECT_TYPE@.mem_agent]
# Control the mem-agent function enable or disable.
# Default to false
#mem_agent_enable = true
mem_agent_enable = false
# Control the mem-agent memcg function disable or enable
# Default to false
#memcg_disable = false
memcg_disable = false
# Control the mem-agent function swap enable or disable.
# Default to false
#memcg_swap = false
memcg_swap = false
# Control the mem-agent function swappiness max number.
# Default to 50
#memcg_swappiness_max = 50
memcg_swappiness_max = 50
# Control the mem-agent memcg function wait period seconds
# Default to 600
#memcg_period_secs = 600
memcg_period_secs = 600
# Control the mem-agent memcg wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
@@ -638,7 +600,7 @@ kernel_modules=[]
# then the aging and eviction for this cgroup will not be
# executed after this waiting period.
# Default to 1
#memcg_period_psi_percent_limit = 1
memcg_period_psi_percent_limit = 1
# Control the mem-agent memcg eviction PSI percent limit.
# If the percentage of memory and IO PSI stall time for a cgroup
@@ -646,44 +608,44 @@ kernel_modules=[]
# this cgroup will immediately stop and will not resume until
# the next memcg waiting period.
# Default to 1
#memcg_eviction_psi_percent_limit = 1
memcg_eviction_psi_percent_limit = 1
# Control the mem-agent memcg eviction run aging count min.
# A cgroup will only perform eviction when the number of aging cycles
# in memcg is greater than or equal to memcg_eviction_run_aging_count_min.
# Default to 3
#memcg_eviction_run_aging_count_min = 3
memcg_eviction_run_aging_count_min = 3
# Control the mem-agent compact function disable or enable
# Default to false
#compact_disable = false
compact_disable = false
# Control the mem-agent compaction function wait period seconds
# Default to 600
#compact_period_secs = 600
compact_period_secs = 600
# Control the mem-agent compaction function wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
# the compaction waiting period exceeds this value,
# then the compaction will not be executed after this waiting period.
# Default to 1
#compact_period_psi_percent_limit = 1
compact_period_psi_percent_limit = 1
# Control the mem-agent compaction function compact PSI percent limit.
# During compaction, the percentage of memory and IO PSI stall time
# is checked every second. If this percentage exceeds
# compact_psi_percent_limit, the compaction process will stop.
# Default to 5
#compact_psi_percent_limit = 5
compact_psi_percent_limit = 5
# Control the maximum number of seconds for each compaction of mem-agent compact function.
# Default to 180
#compact_sec_max = 180
compact_sec_max = 180
# Control the mem-agent compaction function compact order.
# compact_order is use with compact_threshold.
# Default to 9
#compact_order = 9
compact_order = 9
# Control the mem-agent compaction function compact threshold.
# compact_threshold is the pages number.
@@ -696,16 +658,16 @@ kernel_modules=[]
# since the previous compaction.
# then the system should initiate another round of memory compaction.
# Default to 1024
#compact_threshold = 1024
compact_threshold = 1024
# Control the mem-agent compaction function force compact times.
# After one compaction, if there has not been a compaction within
# the next compact_force_times times, a compaction will be forced
# regardless of the system's memory situation.
# If compact_force_times is set to 0, will do force compaction each time.
# If compact_force_times is set to 18446744073709551615, will never do force compaction.
# Default to 18446744073709551615
#compact_force_times = 18446744073709551615
# If compact_force_times is set to 9223372036854775807, will never do force compaction.
# Default to 9223372036854775807
compact_force_times = 9223372036854775807
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -718,13 +680,14 @@ kernel_modules=[]
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT_COCO@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT_COCO@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT_COCO@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -742,23 +705,23 @@ kernel_modules=[]
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_QEMU@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_QEMU@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -766,22 +729,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -789,7 +753,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -797,7 +761,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_QEMU@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -806,13 +770,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_COCO@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_COCO@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -833,19 +797,19 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false

View File

@@ -16,45 +16,22 @@
path = "@QEMUPATH@"
kernel = "@KERNELPATH_QEMU@"
image = "@IMAGEPATH@"
# initrd = "@INITRDPATH@"
machine_type = "@MACHINETYPE@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Block storage driver to be used for the VM rootfs is backed
# by a block device. This is virtio-blk-pci, virtio-blk-mmio or nvdimm
vm_rootfs_driver = "@VMROOTFSDRIVER_QEMU@"
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -92,7 +69,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -100,12 +77,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -151,7 +129,7 @@ default_bridges = @DEFBRIDGES@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Default memory size in MiB for SB/VM.
# If unspecified then it will be set @DEFMEMSZ@ MiB.
@@ -160,7 +138,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -173,13 +151,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -262,17 +240,17 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
@@ -280,6 +258,12 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Virtio queue size. Size: byte. default 128
queue_size = 128
# Block device multi-queue, default 1
num_queues = 1
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -287,7 +271,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -295,7 +279,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -312,11 +296,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -332,7 +316,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -347,7 +331,7 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option allows to add an extra HMP or QMP socket when `enable_debug = true`
#
@@ -362,17 +346,17 @@ pflashes = []
#
# If set to the empty string "", no extra monitor socket is added. This is
# the default.
#extra_monitor_socket = "hmp"
extra_monitor_socket = ""
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -380,44 +364,44 @@ pflashes = []
# nvdimm is not supported when `confidential_guest = true`.
#
# Default is false
#disable_image_nvdimm = true
disable_image_nvdimm = false
# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true
hotplug_vfio_on_root_bus = false
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
#cold_plug_vfio = "root-port"
cold_plug_vfio = "no-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# Before hot plugging a PCIe device onto a switch port, you need add a pcie_switch_port device fist.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means how many devices attached onto pcie_switch_port will be created.
# This value is valid when hotplug_vfio_on_root_bus is true, and machine_type is "q35"
# Default 0
#pcie_switch_port = 2
pcie_switch_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -429,7 +413,7 @@ pflashes = []
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source = "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -451,7 +435,8 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# Enable connection to Quote Generation Service (QGS)
# The "tdx_quote_generation_service_socket_port" parameter configures how QEMU connects to the TDX Quote Generation Service (QGS).
@@ -462,18 +447,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# It's important to note that setting "tdx_quote_generation_service_socket_port" to 0 enables communication via Unix Domain Sockets (UDS).
# To activate UDS, the QGS service itself must be launched with the "-port=0" parameter and the UDS will always be located at /var/run/tdx-qgs/qgs.socket.
# -object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"unix","path":"/var/run/tdx-qgs/qgs.socket"}}'
# tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT@
tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT@
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -483,9 +468,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -493,20 +479,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[hypervisor.qemu.factory]
@@ -521,41 +507,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
# > 0 --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before using it.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket. The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert the VM to gRPC format and transport it when gets
# requestion from clients.
# Factory grpccache is the VMCache client. It will request gRPC format
# VM and convert it back to a VM. If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
template_path = "/run/vc/vm/template"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -569,7 +531,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -582,18 +544,18 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent dial timeout in millisecond.
# (default: 10)
#dial_timeout_ms = 10
dial_timeout_ms = 10
# Agent reconnect timeout in millisecond.
# Retry times = reconnect_timeout_ms / dial_timeout_ms (default: 300)
@@ -602,28 +564,28 @@ kernel_modules=[]
# You'd better not change the value of dial_timeout_ms, unless you have an
# idea of what you are doing.
# (default: 3000)
#reconnect_timeout_ms = 3000
reconnect_timeout_ms = 3000
[agent.@PROJECT_TYPE@.mem_agent]
# Control the mem-agent function enable or disable.
# Default to false
#mem_agent_enable = true
mem_agent_enable = false
# Control the mem-agent memcg function disable or enable
# Default to false
#memcg_disable = false
memcg_disable = false
# Control the mem-agent function swap enable or disable.
# Default to false
#memcg_swap = false
memcg_swap = false
# Control the mem-agent function swappiness max number.
# Default to 50
#memcg_swappiness_max = 50
memcg_swappiness_max = 50
# Control the mem-agent memcg function wait period seconds
# Default to 600
#memcg_period_secs = 600
memcg_period_secs = 600
# Control the mem-agent memcg wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
@@ -631,7 +593,7 @@ kernel_modules=[]
# then the aging and eviction for this cgroup will not be
# executed after this waiting period.
# Default to 1
#memcg_period_psi_percent_limit = 1
memcg_period_psi_percent_limit = 1
# Control the mem-agent memcg eviction PSI percent limit.
# If the percentage of memory and IO PSI stall time for a cgroup
@@ -639,44 +601,44 @@ kernel_modules=[]
# this cgroup will immediately stop and will not resume until
# the next memcg waiting period.
# Default to 1
#memcg_eviction_psi_percent_limit = 1
memcg_eviction_psi_percent_limit = 1
# Control the mem-agent memcg eviction run aging count min.
# A cgroup will only perform eviction when the number of aging cycles
# in memcg is greater than or equal to memcg_eviction_run_aging_count_min.
# Default to 3
#memcg_eviction_run_aging_count_min = 3
memcg_eviction_run_aging_count_min = 3
# Control the mem-agent compact function disable or enable
# Default to false
#compact_disable = false
compact_disable = false
# Control the mem-agent compaction function wait period seconds
# Default to 600
#compact_period_secs = 600
compact_period_secs = 600
# Control the mem-agent compaction function wait period PSI percent limit.
# If the percentage of memory and IO PSI stall time within
# the compaction waiting period exceeds this value,
# then the compaction will not be executed after this waiting period.
# Default to 1
#compact_period_psi_percent_limit = 1
compact_period_psi_percent_limit = 1
# Control the mem-agent compaction function compact PSI percent limit.
# During compaction, the percentage of memory and IO PSI stall time
# is checked every second. If this percentage exceeds
# compact_psi_percent_limit, the compaction process will stop.
# Default to 5
#compact_psi_percent_limit = 5
compact_psi_percent_limit = 5
# Control the maximum number of seconds for each compaction of mem-agent compact function.
# Default to 180
#compact_sec_max = 180
# Default to 300
compact_sec_max = 300
# Control the mem-agent compaction function compact order.
# compact_order is use with compact_threshold.
# Default to 9
#compact_order = 9
compact_order = 9
# Control the mem-agent compaction function compact threshold.
# compact_threshold is the pages number.
@@ -689,7 +651,7 @@ kernel_modules=[]
# since the previous compaction.
# then the system should initiate another round of memory compaction.
# Default to 1024
#compact_threshold = 1024
compact_threshold = 1024
# Control the mem-agent compaction function force compact times.
# After one compaction, if there has not been a compaction within
@@ -698,7 +660,9 @@ kernel_modules=[]
# If compact_force_times is set to 0, will do force compaction each time.
# If compact_force_times is set to 18446744073709551615, will never do force compaction.
# Default to 18446744073709551615
#compact_force_times = 18446744073709551615
# Note: Using a large but valid u64 value (within i64::MAX range) instead of u64::MAX to avoid TOML parser issues
# Using 9223372036854775807 (i64::MAX) which is effectively "never" for practical purposes
compact_force_times = 9223372036854775807
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -711,14 +675,14 @@ kernel_modules=[]
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
enable_debug = false
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
@@ -735,23 +699,23 @@ kernel_modules=[]
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_QEMU@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_QEMU@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -759,22 +723,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -782,7 +747,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -790,7 +755,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_QEMU@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -799,13 +764,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_QEMU@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_QEMU@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -826,19 +791,19 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false

View File

@@ -40,7 +40,7 @@ confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -78,7 +78,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -86,12 +86,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -136,7 +137,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -149,13 +150,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -238,17 +239,17 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
@@ -256,6 +257,12 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Virtio queue size. Size: byte. default 128
queue_size = 128
# Block device multi-queue, default 1
num_queues = 1
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -263,7 +270,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -271,7 +278,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -288,11 +295,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -303,7 +310,7 @@ valid_vhost_user_store_paths = @DEFVALIDVHOSTUSERSTOREPATHS@
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -318,17 +325,17 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -336,33 +343,33 @@ pflashes = []
# nvdimm is not supported when `confidential_guest = true`.
#
# Default is false
#disable_image_nvdimm = true
disable_image_nvdimm = false
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
cold_plug_vfio = "root-port"
cold_plug_vfio = "no-port"
# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true
hotplug_vfio_on_root_bus = false
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# This value is valid when hotplug_vfio_on_root_bus is true and machine_type is "q35"
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
@@ -378,7 +385,7 @@ disable_vhost_net = true
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source = "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -400,17 +407,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
#
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -420,9 +428,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -430,7 +439,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -441,20 +450,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -469,41 +478,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
# > 0 --> will be set to the specified number
#
# VMCache is a function that creates VMs as caches before using it.
# It helps speed up new container creation.
# The function consists of a server and some clients communicating
# through Unix socket. The protocol is gRPC in protocols/cache/cache.proto.
# The VMCache server will create some VMs and cache them by factory cache.
# It will convert the VM to gRPC format and transport it when gets
# requestion from clients.
# Factory grpccache is the VMCache client. It will request gRPC format
# VM and convert it back to a VM. If VMCache function is enabled,
# kata-runtime will request VM from factory grpccache when it creates
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
template_path = "/run/vc/vm/template"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -517,7 +502,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -530,14 +515,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent dial timeout in millisecond.
# (default: 10)
@@ -563,14 +548,14 @@ reconnect_timeout_ms = 5000
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
#
enable_debug = false
# Internetworking model
# Determines how the VM should be connected to the
# the container network interface
@@ -587,23 +572,23 @@ reconnect_timeout_ms = 5000
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_QEMU@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_QEMU@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -611,22 +596,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -634,7 +620,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -642,7 +628,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_QEMU@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -651,13 +637,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_QEMU@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_QEMU@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_QEMU@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -678,19 +664,19 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE_SE@"
vfio_mode = "@DEFVFIOMODE_SE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false

View File

@@ -19,24 +19,6 @@ remote_hypervisor_socket = "/run/peerpod/hypervisor.sock"
# Timeout in seconds for creating a remote hypervisor, 600s(10min) by default
remote_hypervisor_timeout = 600
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
@@ -54,7 +36,7 @@ enable_annotations = ["machine_type", "default_memory", "default_vcpus", "defaul
# To see the list of default parameters, enable hypervisor debug, create a
# container and look for 'default-kernel-parameters' log entries.
# NOTE: kernel_params are not currently passed over in remote hypervisor
# kernel_params = ""
kernel_params = ""
# Path to the firmware.
# If you want that qemu uses the default firmware leave this option empty
@@ -65,7 +47,7 @@ firmware = "@FIRMWAREPATH@"
# < 0 --> will be set to the actual number of physical cores
# > 0 <= number of physical cores --> will be set to the specified number
# > number of physical cores --> will be set to the actual number of physical cores
# default_vcpus = 1
default_vcpus = 1
# Default maximum number of vCPUs per SB/VM:
# unspecified or == 0 --> will be set to the actual number of physical cores or to the maximum number
@@ -82,7 +64,7 @@ firmware = "@FIRMWAREPATH@"
# vCPUs supported by the SB/VM. In general, we recommend that you do not edit this variable,
# unless you know what are you doing.
# NOTICE: on arm platform with gicv2 interrupt controller, set it to 8.
# default_maxvcpus = @DEFMAXVCPUS@
default_maxvcpus = @DEFMAXVCPUS@
# Bridges can be used to hot plug devices.
# Limitations:
@@ -99,19 +81,19 @@ default_bridges = @DEFBRIDGES@
# Default memory size in MiB for SB/VM.
# If unspecified then it will be set @DEFMEMSZ@ MiB.
# Note: the remote hypervisor uses the peer pod config to determine the memory of the VM
# default_memory = @DEFMEMSZ@
default_memory = @DEFMEMSZ@
#
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
# Note: the remote hypervisor uses the peer pod config to determine the memory of the VM
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
# enable_debug = true
enable_debug = false
# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
@@ -128,10 +110,11 @@ default_bridges = @DEFBRIDGES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
@@ -144,7 +127,7 @@ disable_guest_selinux = true
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
# enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -158,18 +141,18 @@ disable_guest_selinux = true
# increasing the container shutdown time slightly.
#
# (default: disabled)
# enable_tracing = true
enable_tracing = false
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 30)
#dial_timeout = 30
dial_timeout = 30
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -182,13 +165,13 @@ disable_guest_selinux = true
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
# enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -207,11 +190,11 @@ disable_guest_selinux = true
# provided by plugin to a tap interface connected to the VM.
#
# Note: The remote hypervisor, uses it's own network, so "none" is required
internetworking_model="none"
internetworking_model = "none"
name="virt_container"
hypervisor_name="remote"
agent_name="kata"
name = "virt_container"
hypervisor_name = "remote"
agent_name = "kata"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
@@ -219,7 +202,7 @@ agent_name="kata"
# within the guest
# (default: true)
# Note: The remote hypervisor has a different guest, so currently requires this to be set to true
disable_guest_seccomp=true
disable_guest_seccomp = true
# Apply a custom SELinux security policy to the container process inside the VM.
@@ -228,22 +211,23 @@ disable_guest_seccomp=true
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -260,7 +244,7 @@ disable_new_netns = false
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_REMOTE@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_REMOTE@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -270,7 +254,7 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_REMOTE@
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
# Note: the remote hypervisor uses the peer pod config to determine the sandbox size, so requires this to be set to true
static_sandbox_resource_mgmt=true
static_sandbox_resource_mgmt = true
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -291,20 +275,20 @@ static_sandbox_resource_mgmt=true
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
# Note: remote hypervisor has no sharing of emptydir mounts from host to guest
disable_guest_empty_dir=false
disable_guest_empty_dir = false
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false

View File

@@ -16,7 +16,7 @@ path = "@FCPATH@"
kernel = "@KERNELPATH_FC@"
image = "@IMAGEPATH@"
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
@@ -32,7 +32,7 @@ valid_hypervisor_paths = @FCVALIDHYPERVISORPATHS@
# If the jailer path is not set kata will launch firecracker
# without a jail. If the jailer is set firecracker will be
# launched in a jailed enviornment created by the jailer
#jailer_path = "@FCJAILERPATH@"
jailer_path = "@FCJAILERPATH@"
# List of valid jailer path values for the hypervisor
# Each member of the list can be a regular expression
@@ -104,7 +104,7 @@ memory_slots = @DEFMEMSLOTS@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -121,12 +121,12 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Bandwidth rate limiter options
#
@@ -134,14 +134,14 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_bw_max_rate = 0
disk_rate_limiter_bw_max_rate = 0
#
# disk_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_bw_one_time_burst = 0
disk_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
@@ -149,14 +149,20 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_ops_max_rate = 0
disk_rate_limiter_ops_max_rate = 0
#
# disk_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_ops_one_time_burst = 0
disk_rate_limiter_ops_one_time_burst = 0
# Virtio queue size. Size: byte. default 128
queue_size = 128
# Block device multi-queue, default 1
num_queues = 1
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
@@ -165,7 +171,7 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -173,39 +179,40 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Disable the 'seccomp' feature from Cloud Hypervisor, firecracker or dragonball, default false
# disable_seccomp = true
disable_seccomp = false
# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
# Default false
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# VFIO devices are hotplugged on a bridge by default.
# Enable hotplugging on root bus. This may be required for devices with
# a large PCI bar, as this is a current limitation with hotplugging on
# a bridge.
# Default false
#hotplug_vfio_on_root_bus = true
hotplug_vfio_on_root_bus = false
#
# Default entropy source.
@@ -217,7 +224,7 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source = "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -239,40 +246,27 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered will scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Firecracker, it provides a built-in rate limiter, which is based on TBF(Token Bucket Filter)
# queueing discipline.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Firecracker, it provides a built-in rate limiter, which is based on TBF(Token Bucket Filter)
# queueing discipline.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
[factory]
# VM templating support. Once enabled, new VMs are created from template
# using vm cloning. They will share the same initial kernel, initramfs and
# agent memory by mapping it readonly. It helps speeding up new container
# creation and saves a lot of memory if there are many kata containers running
# on the same host.
#
# When disabled, new VMs are created from scratch.
#
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
disable_selinux = @DEFDISABLESELINUX@
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -286,7 +280,7 @@ disable_selinux=@DEFDISABLESELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -299,14 +293,14 @@ disable_selinux=@DEFDISABLESELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -314,7 +308,7 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
# Create Container Request Timeout
# This timeout value is used to set the maximum duration for the agent to process a CreateContainerRequest.
@@ -327,13 +321,14 @@ dial_timeout = 45
# - runtime-request-timeout: The timeout value specified in the Kubelet configuration described as the link below:
# (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout)
# Defaults to @DEFCREATECONTAINERTIMEOUT@ second(s)
# create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -351,33 +346,33 @@ dial_timeout = 45
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_FC@"
internetworking_model = "@DEFNETWORKMODEL_FC@"
name="@RUNTIMENAME@"
hypervisor_name="@HYPERVISOR_FC@"
agent_name="@PROJECT_TYPE@"
name = "@RUNTIMENAME@"
hypervisor_name = "@HYPERVISOR_FC@"
agent_name = "@PROJECT_TYPE@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -385,7 +380,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -393,7 +388,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_FC@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_FC@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -402,19 +397,19 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_FC@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_FC@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_FC@
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false

View File

@@ -25,7 +25,7 @@ use ch_config::ch_api::{
cloud_hypervisor_vm_fs_add, cloud_hypervisor_vm_netdev_add_with_fds,
cloud_hypervisor_vm_vsock_add, PciDeviceInfo, VmRemoveDeviceData,
};
use ch_config::convert::{DEFAULT_DISK_QUEUES, DEFAULT_DISK_QUEUE_SIZE, DEFAULT_NUM_PCI_SEGMENTS};
use ch_config::convert::DEFAULT_NUM_PCI_SEGMENTS;
use ch_config::DiskConfig;
use ch_config::{net_util::MacAddr, DeviceConfig, FsConfig, NetConfig, VsockConfig};
use kata_sys_util::netns::NetnsGuard;
@@ -542,8 +542,8 @@ impl TryFrom<BlockConfig> for DiskConfig {
let disk_config: DiskConfig = DiskConfig {
path: Some(blkcfg.path_on_host.as_str().into()),
readonly: blkcfg.is_readonly,
num_queues: DEFAULT_DISK_QUEUES,
queue_size: DEFAULT_DISK_QUEUE_SIZE,
num_queues: blkcfg.num_queues,
queue_size: blkcfg.queue_size as u16,
..Default::default()
};

View File

@@ -103,6 +103,12 @@ pub struct BlockConfig {
/// device minor number
pub minor: i64,
/// virtio queue size. size: byte
pub queue_size: u32,
/// block device multi-queue
pub num_queues: usize,
}
#[derive(Debug, Clone, Default)]

View File

@@ -488,7 +488,7 @@ impl Qmp {
);
netdev_frontend_args.insert("addr".to_owned(), format!("{:02}", slot).into());
netdev_frontend_args.insert("mac".to_owned(), virtio_net_device.get_mac_addr().into());
netdev_frontend_args.insert("mq".to_owned(), "on".into());
netdev_frontend_args.insert("mq".to_owned(), true.into());
// As the golang runtime documents the vectors computation, it's
// 2N+2 vectors, N for tx queues, N for rx queues, 1 for config, and one for possible control vq
netdev_frontend_args.insert(

View File

@@ -17,7 +17,7 @@ anyhow = { workspace = true }
async-trait = { workspace = true }
bitflags = "2.9.0"
byte-unit = "5.1.6"
cgroups-rs = { version = "0.4.0", features = ["oci"] }
cgroups-rs = { version = "0.5.0", features = ["oci"] }
futures = "0.3.11"
lazy_static = { workspace = true }
libc = { workspace = true }
@@ -40,9 +40,9 @@ tempfile = "3.19.1"
hex = "0.4"
## Dependencies from `rust-netlink`
netlink-packet-route = "0.22"
netlink-packet-route = "0.26"
netlink-sys = "0.8"
rtnetlink = "0.16"
rtnetlink = "0.19"
# Local dependencies
agent = { workspace = true }

View File

@@ -10,7 +10,6 @@ use kata_types::{config::TomlConfig, cpu::LinuxContainerCpuResources};
use oci::LinuxCpu;
use oci_spec::runtime as oci;
use std::{
cmp,
collections::{HashMap, HashSet},
convert::TryFrom,
sync::Arc,
@@ -22,7 +21,7 @@ use crate::ResourceUpdateOp;
#[derive(Default, Debug, Clone)]
pub struct CpuResource {
/// Current number of vCPUs
pub(crate) current_vcpu: Arc<RwLock<u32>>,
pub(crate) current_vcpu: Arc<RwLock<f32>>,
/// Default number of vCPUs
pub(crate) default_vcpu: f32,
@@ -39,7 +38,7 @@ impl CpuResource {
.get(&hypervisor_name)
.context(format!("failed to get hypervisor {}", hypervisor_name))?;
Ok(Self {
current_vcpu: Arc::new(RwLock::new(hypervisor_config.cpu_info.default_vcpus as u32)),
current_vcpu: Arc::new(RwLock::new(hypervisor_config.cpu_info.default_vcpus)),
default_vcpu: hypervisor_config.cpu_info.default_vcpus,
container_cpu_resources: Arc::new(RwLock::new(HashMap::new())),
})
@@ -71,14 +70,14 @@ impl CpuResource {
Ok(())
}
pub(crate) async fn current_vcpu(&self) -> u32 {
pub(crate) async fn current_vcpu(&self) -> f32 {
let current_vcpu = self.current_vcpu.read().await;
*current_vcpu
}
async fn update_current_vcpu(&self, new_vcpus: u32) {
let mut current_vcpu = self.current_vcpu.write().await;
*current_vcpu = new_vcpus;
*current_vcpu = new_vcpus as f32;
}
// update container_cpu_resources field
@@ -116,10 +115,11 @@ impl CpuResource {
}
// calculates the total required vcpus by adding each container's requirements within the pod
async fn calc_cpu_resources(&self) -> Result<u32> {
async fn calc_cpu_resources(&self) -> Result<f32> {
let resources = self.container_cpu_resources.read().await;
if resources.is_empty() {
return Ok(self.default_vcpu.ceil() as u32);
// No containers, just keep the default vCPU configuration
return Ok(self.default_vcpu);
}
// If requests of individual containers are expresses with different
@@ -128,6 +128,7 @@ impl CpuResource {
// to use the largest period as the common denominator since it
// shifts precision out of the fractional part and into the
// integral part in case a rewritten quota ends up non-integral.
// Determine the largest CPU period among containers, will be used to normalize quotas
let max_period = resources
.iter()
.map(|(_, cpu_resource)| cpu_resource.period())
@@ -155,53 +156,74 @@ impl CpuResource {
let quota = cpu_resource.quota() as f64;
let period = cpu_resource.period() as f64;
if quota >= 0.0 && period != 0.0 {
if quota >= 0.0 && period > 0.0 {
// Normalize to max_period before adding quotas
total_quota += quota * (max_period / period);
}
}
// contrained only by cpuset
// constrained only by cpuset (no quota set)
if total_quota == 0.0 && !cpuset_vcpu.is_empty() {
info!(sl!(), "(from cpuset)get vcpus # {:?}", cpuset_vcpu);
return Ok(cpuset_vcpu.len() as u32);
return Ok(cpuset_vcpu.len() as f32);
}
let total_vcpu = if total_quota > 0.0 && max_period != 0.0 {
self.default_vcpu as f64 + total_quota / max_period
} else {
self.default_vcpu as f64
};
// When quota is set: calculate vCPUs as quota/period after normalization
if total_quota > 0.0 && max_period > 0.0 {
let quota_vcpu = total_quota / max_period;
info!(
sl!(),
"(from cfs_quota&cfs_period) target vcpus {} from quota {} max_period {}",
quota_vcpu,
total_quota,
max_period
);
info!(
sl!(),
"(from cfs_quota&cfs_period)get vcpus count {}", total_vcpu
);
Ok(total_vcpu.ceil() as u32)
let total_vcpu = quota_vcpu as f32 + self.default_vcpu;
return Ok(total_vcpu);
}
// Default case: no quota, no cpuset: use default_vcpu
Ok(self.default_vcpu.max(1.0))
}
// do hotplug and hot-unplug the vcpu
async fn do_update_cpu_resources(
&self,
new_vcpus: u32,
new_vcpus: f32,
op: ResourceUpdateOp,
hypervisor: &dyn Hypervisor,
) -> Result<u32> {
let old_vcpus = self.current_vcpu().await;
// when adding vcpus, ignore old_vcpus > new_vcpus
// when deleting vcpus, ignore old_vcpus < new_vcpus
// Prevent decreasing vCPUs on an Add operation or increasing on a Delete
if (op == ResourceUpdateOp::Add && old_vcpus > new_vcpus)
|| (op == ResourceUpdateOp::Del && old_vcpus < new_vcpus)
{
return Ok(old_vcpus);
return Ok(old_vcpus.ceil() as u32);
}
// do not reduce computing power
// the number of vcpus would not be lower than the default size
let new_vcpus = cmp::max(new_vcpus, self.default_vcpu.ceil() as u32);
// Enforce minimum of 1 vCPU for the VM
let min_vcpus = 1.0_f32;
let target_vcpus = if new_vcpus < min_vcpus {
min_vcpus
} else {
new_vcpus
};
// Hypervisor only supports integer vCPU counts round up at the last step
let target_vcpus_int = target_vcpus.ceil() as u32;
info!(
sl!(),
"(do_update_cpu_resources) old_vcpus {} -> new_vcpus {} (ceil to {})",
old_vcpus,
new_vcpus,
target_vcpus_int
);
let (_, new) = hypervisor
.resize_vcpu(old_vcpus, new_vcpus)
.resize_vcpu(old_vcpus.ceil() as u32, target_vcpus_int)
.await
.context("resize vcpus")?;
@@ -225,6 +247,7 @@ mod tests {
.entry("qemu".to_owned())
.and_modify(|hv_config| hv_config.cpu_info.default_vcpus = default_vcpus);
config.runtime.hypervisor_name = "qemu".to_owned();
CpuResource::new(Arc::new(config)).unwrap()
}
@@ -251,31 +274,15 @@ mod tests {
// result of 0.99999999999999989) but it still doesn't guarantee the
// correct result in general. For instance, adding 0.1 twenty times
// in 64 bits results in 2.0000000000000004.
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
(100_000, 1_000_000),
],
)
.await;
add_linux_container_cpu_resources(&mut cpu_resource, vec![(100_000, 1_000_000); 10]).await;
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1);
// 10 * 0.1 = 1.0: matches expected vCPU sum (float-safe in f64)
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1.0);
}
#[tokio::test]
async fn test_big_allocation_1() {
let default_vcpus = 10.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(10.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![
@@ -286,16 +293,20 @@ mod tests {
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
128 + default_vcpus as u32
const EPSILON: f32 = 0.0001;
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
let expected = 138.0;
assert!(
(actual - expected).abs() < EPSILON,
"got {}, expect {}",
actual,
expected
);
}
#[tokio::test]
async fn test_big_allocation_2() {
let default_vcpus = 10.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(10.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![
@@ -306,98 +317,114 @@ mod tests {
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
(33 + 31 + 77 + 1) + default_vcpus as u32
const EPSILON: f32 = 0.0001;
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
let expected = 151.0;
assert!(
(actual - expected).abs() < EPSILON,
"got {}, expect {}",
actual,
expected
);
}
#[tokio::test]
async fn test_big_allocation_3() {
let default_vcpus = 10.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(10.0);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(141_000_008, 1_000_000)]).await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
142 + default_vcpus as u32
// 141 + 1(response to hypervisor ceil handling, still in calc we keep float)
const EPSILON: f32 = 0.0001;
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
let expected = 151.0;
assert!(
(actual - expected).abs() < EPSILON,
"got {}, expect {}",
actual,
expected
);
}
#[tokio::test]
async fn test_big_allocation_4() {
let default_vcpus = 10.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![
(17_000_001, 1_000_000),
(17_000_001, 1_000_000),
(17_000_001, 1_000_000),
(17_000_001, 1_000_000),
],
)
.await;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(10.0);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(17_000_001, 1_000_000); 4])
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
(4 * 17 + 1) + default_vcpus as u32
const EPSILON: f32 = 0.0001;
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
let expected = 78.0;
assert!(
(actual - expected).abs() < EPSILON,
"got {}, expect {}",
actual,
expected
);
}
#[tokio::test]
async fn test_divisible_periods() {
let default_vcpus = 3.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(3.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![(1_000_000, 1_000_000), (1_000_000, 500_000)],
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
3 + default_vcpus as u32
);
// periods normalized: second gets * 2 quota. total=1+2=3
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 6.0);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(3.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![(3_000_000, 1_500_000), (1_000_000, 500_000)],
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
4 + default_vcpus as u32
);
// normalized: first quota=2, second quota=2. total=4
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 7.0);
}
#[tokio::test]
async fn test_indivisible_periods() {
let default_vcpus = 1.0;
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
const EPSILON: f32 = 0.0001;
// Case 1
let mut cpu_resource = get_cpu_resource_with_default_vcpus(1.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![(1_000_000, 1_000_000), (900_000, 300_000)],
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
4 + default_vcpus as u32
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
let expected = 5.0; // pure quota sum, no default_vcpu added
assert!(
(actual - expected).abs() < EPSILON,
"case1: got {}, expect {}",
actual,
expected
);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
// Case 2
let mut cpu_resource = get_cpu_resource_with_default_vcpus(1.0);
add_linux_container_cpu_resources(
&mut cpu_resource,
vec![(1_000_000, 1_000_000), (900_000, 299_999)],
)
.await;
assert_eq!(
cpu_resource.calc_cpu_resources().await.unwrap(),
5 + default_vcpus as u32
let actual = cpu_resource.calc_cpu_resources().await.unwrap();
// total_quota = 1_000_000 + (900_000 * (1_000_000 / 299_999))
// total_vcpus = total_quota / 1_000_000
let expected = (1_000_000.0 + (900_000.0 * (1_000_000.0 / 299_999.0))) / 1_000_000.0 + 1.0;
assert!(
(actual - expected as f32).abs() < EPSILON,
"case2: got {}, expect {}",
actual,
expected
);
}
@@ -407,17 +434,18 @@ mod tests {
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(250_000, 1_000_000)]).await;
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1);
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 0.75);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(500_000, 1_000_000)]).await;
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1);
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1.0);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(500_001, 1_000_000)]).await;
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 2);
let mut cpu_resource = get_cpu_resource_with_default_vcpus(default_vcpus);
add_linux_container_cpu_resources(&mut cpu_resource, vec![(500_001, 1_000_000)]).await;
assert_eq!(cpu_resource.calc_cpu_resources().await.unwrap(), 1.000001);
// This test doesn't pass because 0.1 is periodic in binary and thus
// not exactly representable by a float of any width for fundamental

View File

@@ -17,7 +17,7 @@ use oci_spec::runtime as oci;
// sandbox/container's workload
#[derive(Clone, Copy, Debug)]
struct InitialSize {
vcpu: u32,
vcpu: f32,
mem_mb: u32,
orig_toml_default_mem: u32,
}
@@ -28,7 +28,7 @@ const MIB: i64 = 1024 * 1024;
impl TryFrom<&HashMap<String, String>> for InitialSize {
type Error = anyhow::Error;
fn try_from(an: &HashMap<String, String>) -> Result<Self> {
let mut vcpu: u32 = 0;
let mut vcpu: f32 = 0.0;
let annotation = Annotation::new(an.clone());
let (period, quota, memory) =
@@ -56,7 +56,7 @@ impl TryFrom<&HashMap<String, String>> for InitialSize {
impl TryFrom<&oci::Spec> for InitialSize {
type Error = anyhow::Error;
fn try_from(spec: &oci::Spec) -> Result<Self> {
let mut vcpu: u32 = 0;
let mut vcpu: f32 = 0.0;
let mut mem_mb: u32 = 0;
match container_type(spec) {
// podsandbox, from annotation
@@ -140,8 +140,8 @@ impl InitialSizeManager {
.get_mut(hypervisor_name)
.context("failed to get hypervisor config")?;
if self.resource.vcpu > 0 {
hv.cpu_info.default_vcpus = self.resource.vcpu as f32
if self.resource.vcpu > 0.0 {
info!(sl!(), "resource with vcpu {}", self.resource.vcpu);
}
self.resource.orig_toml_default_mem = hv.memory_info.default_memory;
if self.resource.mem_mb > 0 {
@@ -160,11 +160,11 @@ impl InitialSizeManager {
}
}
fn get_nr_vcpu(resource: &LinuxContainerCpuResources) -> u32 {
fn get_nr_vcpu(resource: &LinuxContainerCpuResources) -> f32 {
if let Some(v) = resource.get_vcpus() {
v as u32
v as f32
} else {
0
0.0
}
}
@@ -223,7 +223,7 @@ mod tests {
memory: None,
},
result: InitialSize {
vcpu: 0,
vcpu: 0.0,
mem_mb: 0,
orig_toml_default_mem: 0,
},
@@ -237,7 +237,7 @@ mod tests {
memory: Some(512 * MIB),
},
result: InitialSize {
vcpu: 3,
vcpu: 3.0,
mem_mb: 512,
orig_toml_default_mem: 0,
},
@@ -250,7 +250,7 @@ mod tests {
memory: Some(513 * MIB),
},
result: InitialSize {
vcpu: 0,
vcpu: 0.0,
mem_mb: 514,
orig_toml_default_mem: 0,
},
@@ -295,9 +295,12 @@ mod tests {
let initial_size = initial_size.unwrap();
assert_eq!(
initial_size.vcpu, d.result.vcpu,
initial_size.vcpu.ceil(),
d.result.vcpu,
"test[{}]: {:?} vcpu should be {}",
i, d.desc, d.result.vcpu,
i,
d.desc,
d.result.vcpu,
);
assert_eq!(
initial_size.mem_mb, d.result.mem_mb,
@@ -349,9 +352,12 @@ mod tests {
let initial_size = initial_size.unwrap();
assert_eq!(
initial_size.vcpu, d.result.vcpu,
initial_size.vcpu.ceil(),
d.result.vcpu,
"test[{}]: {:?} vcpu should be {}",
i, d.desc, d.result.vcpu,
i,
d.desc,
d.result.vcpu,
);
assert_eq!(
initial_size.mem_mb, d.result.mem_mb,

View File

@@ -413,17 +413,14 @@ impl ResourceManagerInner {
for d in linux_devices.iter() {
match d.typ() {
LinuxDeviceType::B => {
let block_driver = get_block_device_info(&self.device_manager)
.await
.block_device_driver;
let aio = get_block_device_info(&self.device_manager)
.await
.block_device_aio;
let blkdev_info = get_block_device_info(&self.device_manager).await;
let dev_info = DeviceConfig::BlockCfg(BlockConfig {
major: d.major(),
minor: d.minor(),
driver_option: block_driver,
blkdev_aio: BlockDeviceAio::new(&aio),
driver_option: blkdev_info.block_device_driver,
blkdev_aio: BlockDeviceAio::new(&blkdev_info.block_device_aio),
num_queues: blkdev_info.num_queues,
queue_size: blkdev_info.queue_size,
..Default::default()
});
@@ -595,7 +592,7 @@ impl ResourceManagerInner {
self.agent
.online_cpu_mem(OnlineCPUMemRequest {
wait: false,
nb_cpus: self.cpu_resource.current_vcpu().await,
nb_cpus: self.cpu_resource.current_vcpu().await.ceil() as u32,
cpu_only: false,
})
.await

View File

@@ -47,6 +47,8 @@ impl BlockVolume {
minor: stat::minor(fstat.st_rdev) as i64,
driver_option: blkdev_info.block_device_driver,
blkdev_aio: BlockDeviceAio::new(&blkdev_info.block_device_aio),
num_queues: blkdev_info.num_queues,
queue_size: blkdev_info.queue_size,
..Default::default()
};

View File

@@ -62,6 +62,8 @@ impl RawblockVolume {
path_on_host: mount_info.device.clone(),
driver_option: blkdev_info.block_device_driver,
blkdev_aio: BlockDeviceAio::new(&blkdev_info.block_device_aio),
num_queues: blkdev_info.num_queues,
queue_size: blkdev_info.queue_size,
..Default::default()
};

View File

@@ -330,9 +330,6 @@ impl VolumeManager {
state.guest_path,
state.ref_count,
);
// Return guest path
return Ok(state.guest_path.clone());
}
// Create a new volume state

View File

@@ -22,7 +22,7 @@ use kata_types::{
container::{update_ocispec_annotations, POD_CONTAINER, POD_SANDBOX},
k8s::{self, container_type},
};
use oci_spec::runtime::{self as oci, LinuxDeviceCgroup};
use oci_spec::runtime as oci;
use oci::{LinuxResources, Process as OCIProcess};
use resource::{
@@ -217,11 +217,10 @@ impl Container {
if let Some(linux) = &mut spec.linux_mut() {
linux.set_resources(resources);
// In certain scenarios, particularly under CoCo/Agent Policy enforcement, the default initial value of `Linux.Resources.Devices`
// is considered non-compliant, leading to container creation failures. To address this issue and ensure consistency with the behavior
// in `runtime-go`, the default value of `Linux.Resources.Devices` from the OCI Spec should be removed.
// In certain scenarios, particularly under CoCo/Agent Policy enforcement,
// the value of `Linux.Resources.Devices` should be empty.
if let Some(resource) = linux.resources_mut() {
clean_linux_resources_devices(resource);
resource.set_devices(None);
}
}
@@ -688,30 +687,6 @@ fn is_pid_namespace_enabled(spec: &oci::Spec) -> bool {
false
}
/// Cleans or filters specific device cgroup rules within the `devices` field of the `LinuxResources`.
/// Specifically, it iterates through all `LinuxDeviceCgroup` rules in `resources`
/// and removes those considered to be "default, all-access (rwm), and non-specific device" rules.
fn clean_linux_resources_devices(resources: &mut LinuxResources) {
if let Some(devices) = resources.devices_mut().take() {
let cleaned_devices: Vec<LinuxDeviceCgroup> = devices
.into_iter()
.filter(|device| {
!(!device.allow()
&& device.typ().is_none()
&& device.major().is_none()
&& device.minor().is_none()
&& device.access().as_deref() == Some("rwm"))
})
.collect();
resources.set_devices(if cleaned_devices.is_empty() {
None
} else {
Some(cleaned_devices)
});
}
}
#[cfg(test)]
mod tests {
use super::amend_spec;

View File

@@ -14,9 +14,8 @@ path = "src/bin/main.rs"
[dependencies]
anyhow = { workspace = true }
backtrace = { version = ">=0.3.35", features = [
backtrace = { version = ">=0.3.76", features = [
"libunwind",
"libbacktrace",
"std",
], default-features = false }
containerd-shim-protos = { workspace = true }

View File

@@ -233,13 +233,19 @@ DEFDISABLESELINUX := false
# Default guest SELinux configuration
DEFDISABLEGUESTSELINUX := true
DEFGUESTSELINUXLABEL := system_u:system_r:container_t
# Default is empty string "" to match the default golang (when commented out in config).
# Most users will want to set this to "system_u:system_r:container_t" for SELinux support.
DEFGUESTSELINUXLABEL :=
#Default SeccomSandbox param
#The same default policy is used by libvirt
#More explanation on https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03348.html
# More explanation on https://lists.gnu.org/archive/html/qemu-devel/2017-02/msg03348.html
#
# Default is empty string "" to match the default (when commented out in config).
# Most users will want to set this to "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
# for better security. Note: "elevateprivileges=deny" doesn't work with daemonize option.
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
DEFSECCOMPSANDBOXPARAM := on,obsolete=deny,spawn=deny,resourcecontrol=deny
DEFSECCOMPSANDBOXPARAM :=
#Default entropy source
DEFENTROPYSOURCE := /dev/urandom
@@ -269,6 +275,7 @@ DEFVIRTIOFSQUEUESIZE ?= 1024
# Make sure you quote args.
DEFVIRTIOFSEXTRAARGS ?= [\"--thread-pool-size=1\", \"--announce-submounts\"]
DEFENABLEIOTHREADS := false
DEFINDEPIOTHREADS := 0
DEFENABLEVHOSTUSERSTORE := false
DEFVHOSTUSERSTOREPATH := $(PKGRUNDIR)/vhost-user
DEFVALIDVHOSTUSERSTOREPATHS := [\"$(DEFVHOSTUSERSTOREPATH)\"]
@@ -295,6 +302,10 @@ DEFDANCONF := /run/kata-containers/dans
DEFFORCEGUESTPULL := false
# Device cold plug
DEFPODRESOURCEAPISOCK := ""
DEFPODRESOURCEAPISOCK_NV := "/var/lib/kubelet/pod-resources/kubelet.sock"
SED = sed
CLI_DIR = cmd
@@ -461,7 +472,7 @@ ifneq (,$(QEMUCMD))
DEFAULTVCPUS_NV = 1
DEFAULTMEMORY_NV = 2048
DEFAULTTIMEOUT_NV = 500
DEFAULTTIMEOUT_NV = 1200
DEFAULTVFIOPORT_NV = root-port
DEFAULTPCIEROOTPORT_NV = 8
@@ -469,12 +480,9 @@ ifneq (,$(QEMUCMD))
KERNELPARAMS_NV += "cgroup_no_v1=all"
KERNELTDXPARAMS_NV = $(KERNELPARAMS_NV)
KERNELTDXPARAMS_NV += "clearcpuid=mtrr"
KERNELTDXPARAMS_NV += "authorize_allow_devs=pci:ALL"
KERNELSNPPARAMS_NV = $(KERNELPARAMS_NV)
#TODO: temporary until the attestation agent activates the device after successful attestation
KERNELSNPPARAMS_NV += "nvrc.smi.srs=1"
# Setting this to false can lead to cgroup leakages in the host
# Best practice for production is to set this to true
@@ -758,6 +766,7 @@ USER_VARS += DEFVIRTIOFSEXTRAARGS
USER_VARS += DEFENABLEANNOTATIONS
USER_VARS += DEFENABLEANNOTATIONS_COCO
USER_VARS += DEFENABLEIOTHREADS
USER_VARS += DEFINDEPIOTHREADS
USER_VARS += DEFSECCOMPSANDBOXPARAM
USER_VARS += DEFENABLEVHOSTUSERSTORE
USER_VARS += DEFVHOSTUSERSTOREPATH
@@ -783,7 +792,8 @@ USER_VARS += BUILDFLAGS
USER_VARS += DEFDISABLEIMAGENVDIMM
USER_VARS += DEFCCAMEASUREMENTALGO
USER_VARS += DEFSHAREDFS_QEMU_CCA_VIRTIOFS
USER_VARS += DEFPODRESOURCEAPISOCK
USER_VARS += DEFPODRESOURCEAPISOCK_NV
V = @
Q = $(V:1=)

View File

@@ -20,41 +20,22 @@ image = "@IMAGEPATH@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Supported TEEs:
# * Intel TDX
#
# Default false
# confidential_guest = true
rootfs_type = @DEFROOTFSTYPE@
# Enable running clh VMM as a non-root user.
# By default clh VMM run as root. When this is set to true, clh VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
# Path to the firmware.
# If you want Cloud Hypervisor to use a specific firmware, set its path below.
@@ -120,7 +101,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -182,12 +163,12 @@ block_device_driver = "virtio-blk"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -197,32 +178,32 @@ block_device_driver = "virtio-blk"
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
# being allocated using huge pages.
#enable_hugepages = true
enable_hugepages = false
# Disable the 'seccomp' feature from Cloud Hypervisor, default false
# disable_seccomp = true
disable_seccomp = false
# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: iommu=pt
#enable_iommu = true
enable_iommu = false
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option specifies the loglevel of the hypervisor
#
# Default 1
#hypervisor_loglevel = 1
hypervisor_loglevel = 1
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -232,7 +213,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Enable hot-plugging of VFIO devices to a root-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
@@ -249,7 +230,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
guest_hook_path = ""
#
# These options are related to network rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
@@ -263,14 +244,14 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_bw_max_rate = 0
net_rate_limiter_bw_max_rate = 0
#
# net_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_bw_one_time_burst = 0
net_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
@@ -278,14 +259,14 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#net_rate_limiter_ops_max_rate = 0
net_rate_limiter_ops_max_rate = 0
#
# net_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if net_rate_limiter_bw_max_rate is
# set to a non zero value.
#net_rate_limiter_ops_one_time_burst = 0
net_rate_limiter_ops_one_time_burst = 0
#
# These options are related to disk rate limiter at the VMM level, and are
# based on the Cloud Hypervisor I/O throttling. Those are disabled by default
@@ -299,14 +280,14 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_bw_max_rate = 0
disk_rate_limiter_bw_max_rate = 0
#
# disk_rate_limiter_bw_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_bw_one_time_burst = 0
disk_rate_limiter_bw_one_time_burst = 0
#
# Operation rate limiter options
#
@@ -314,19 +295,19 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# for SB/VM).
# The same value is used for inbound and outbound bandwidth.
# Default 0-sized value means unlimited rate.
#disk_rate_limiter_ops_max_rate = 0
disk_rate_limiter_ops_max_rate = 0
#
# disk_rate_limiter_ops_one_time_burst increases the initial max rate and this
# initial extra credit does *NOT* affect the overall limit and can be used for
# an *initial* burst of data.
# This is *optional* and only takes effect if disk_rate_limiter_bw_max_rate is
# set to a non zero value.
#disk_rate_limiter_ops_one_time_burst = 0
disk_rate_limiter_ops_one_time_burst = 0
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -340,14 +321,14 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -355,13 +336,13 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -379,14 +360,14 @@ dial_timeout = 45
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_CLH@"
internetworking_model = "@DEFNETWORKMODEL_CLH@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -394,22 +375,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -417,7 +399,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -425,7 +407,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -434,13 +416,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_CLH@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_CLH@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -461,22 +443,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -494,3 +476,26 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -20,7 +20,7 @@ image = "@IMAGEPATH@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -102,14 +102,14 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# The size in MiB will be plused to max memory of hypervisor.
# It is the memory address space for the NVDIMM device.
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -124,12 +124,12 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
@@ -138,7 +138,7 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -146,29 +146,29 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
#
# Default entropy source.
@@ -180,7 +180,7 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_FC@"
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -202,21 +202,21 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered will scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Firecracker, it provides a built-in rate limiter, which is based on TBF(Token Bucket Filter)
# queueing discipline.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Firecracker, it provides a built-in rate limiter, which is based on TBF(Token Bucket Filter)
# queueing discipline.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
[factory]
# VM templating support. Once enabled, new VMs are created from template
@@ -230,12 +230,12 @@ disable_selinux=@DEFDISABLESELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -249,7 +249,7 @@ disable_selinux=@DEFDISABLESELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -262,14 +262,14 @@ disable_selinux=@DEFDISABLESELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -277,13 +277,13 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -301,29 +301,29 @@ dial_timeout = 45
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_FC@"
internetworking_model = "@DEFNETWORKMODEL_FC@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -331,7 +331,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -339,7 +339,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -348,22 +348,22 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_FC@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_FC@
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -381,3 +381,22 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -14,14 +14,13 @@
path = "@QEMUCCAEXPERIMENTALPATH@"
kernel = "@KERNELCONFIDENTIALPATH@"
image = "@IMAGECONFIDENTIALPATH@"
# initrd = "@INITRDCONFIDENTIALPATH@"
machine_type = "@MACHINETYPE@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
@@ -42,7 +41,7 @@ confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -80,7 +79,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -88,12 +87,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -138,7 +138,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -151,13 +151,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -217,17 +217,17 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
@@ -242,7 +242,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -250,7 +250,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -267,11 +267,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -282,7 +282,7 @@ valid_vhost_user_store_paths = @DEFVALIDVHOSTUSERSTOREPATHS@
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -297,17 +297,17 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -319,11 +319,11 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -335,7 +335,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -357,17 +357,17 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -377,9 +377,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -387,7 +388,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -398,26 +399,26 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
# In QEMU, the Realm Management Extension (RME) measurement algorithm is used for attestation, and it supports
# sha256 and sha512 as options. The default is sha512. This algorithm is crucial for verifying the integrity of a
# Realm, a secure execution environment within the larger system. QEMU supports sha256 and sha512 for CCA RME
# measurements. sha512 is generally preferred on 64-bit architectures due to potential hardware acceleration.
measurement_algo="@DEFCCAMEASUREMENTALGO@"
measurement_algo = "@DEFCCAMEASUREMENTALGO@"
[factory]
# VM templating support. Once enabled, new VMs are created from template
@@ -431,12 +432,12 @@ measurement_algo="@DEFCCAMEASUREMENTALGO@"
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -455,17 +456,17 @@ measurement_algo="@DEFCCAMEASUREMENTALGO@"
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -479,7 +480,7 @@ measurement_algo="@DEFCCAMEASUREMENTALGO@"
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -492,14 +493,14 @@ measurement_algo="@DEFCCAMEASUREMENTALGO@"
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
@@ -509,7 +510,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -527,14 +528,14 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -542,22 +543,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -565,7 +567,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -573,7 +575,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -582,13 +584,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -609,22 +611,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -647,3 +649,21 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -16,41 +16,18 @@
path = "@QEMUPATH@"
kernel = "@KERNELCONFIDENTIALPATH@"
image = "@IMAGECONFIDENTIALPATH@"
# initrd = "@INITRDCONFIDENTIALPATH@"
machine_type = "@MACHINETYPE@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true
rootfs_type = @DEFROOTFSTYPE@
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -88,7 +65,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -96,12 +73,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -146,7 +124,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -159,13 +137,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -246,24 +224,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -271,7 +253,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -281,7 +263,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -289,7 +271,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -306,11 +288,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -326,7 +308,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -341,7 +323,7 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option allows to add an extra HMP or QMP socket when `enable_debug = true`
#
@@ -356,17 +338,17 @@ pflashes = []
#
# If set to the empty string "", no extra monitor socket is added. This is
# the default.
#extra_monitor_socket = hmp
extra_monitor_socket = ""
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = true
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -377,24 +359,24 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
#cold_plug_vfio = "root-port"
cold_plug_vfio = "no-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -406,7 +388,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -428,17 +410,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -448,9 +431,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -458,7 +442,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -469,20 +453,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -497,12 +481,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -521,17 +505,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -545,7 +529,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -558,14 +542,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -573,13 +557,13 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -597,19 +581,19 @@ dial_timeout = 45
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -617,22 +601,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -640,7 +625,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -648,7 +633,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -657,13 +642,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -684,22 +669,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -722,3 +707,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -23,7 +23,7 @@ machine_type = "@MACHINETYPE@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
@@ -47,7 +47,7 @@ sev_snp_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -68,17 +68,17 @@ valid_hypervisor_paths = @QEMUSNPVALIDHYPERVISORPATHS@
#
# 96-byte, base64-encoded blob to provide the ID Block structure for the
# SNP_LAUNCH_FINISH command defined in the SEV-SNP firmware ABI (QEMU default: all-zero)
#snp_id_block = ""
snp_id_block = ""
# 4096-byte, base64-encoded blob to provide the ID Authentication Information Structure
# for the SNP_LAUNCH_FINISH command defined in the SEV-SNP firmware ABI (QEMU default: all-zero)
#snp_id_auth = ""
snp_id_auth = ""
# SNP Guest Policy, the POLICY parameter to the SNP_LAUNCH_START command.
# If unset, the QEMU default policy (0x30000) will be used.
# Notice that the guest policy is enforced at VM launch, and your pod VMs
# won't start at all if the policy denys it. This will be indicated by a
# 'SNP_LAUNCH_START' error.
#snp_guest_policy = 196608
snp_guest_policy = 196608
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -105,7 +105,7 @@ firmware_volume = "@FIRMWARETDVFVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -113,12 +113,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -163,7 +164,7 @@ default_memory = @DEFAULTMEMORY_NV@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -176,13 +177,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -263,24 +264,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -288,7 +293,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -298,7 +303,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -306,7 +311,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -323,11 +328,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -358,17 +363,17 @@ pflashes = []
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -380,7 +385,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# In a confidential compute environment hot-plugging can compromise
# security.
@@ -391,7 +396,7 @@ cold_plug_vfio = "@DEFAULTVFIOPORT_NV@"
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -403,7 +408,7 @@ cold_plug_vfio = "@DEFAULTVFIOPORT_NV@"
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -425,17 +430,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -445,9 +451,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -455,7 +462,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -466,20 +473,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -494,12 +501,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -518,17 +525,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -542,7 +549,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -555,14 +562,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
@@ -572,7 +579,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -590,19 +597,19 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -610,22 +617,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -633,7 +641,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -641,7 +649,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_NV@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -650,13 +658,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -677,22 +685,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -715,3 +723,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK_NV@"

View File

@@ -23,7 +23,7 @@ tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETP
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
@@ -44,7 +44,7 @@ confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -82,7 +82,7 @@ firmware_volume = "@FIRMWARETDVFVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -90,12 +90,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@TDXCPUFEATURES@"
cpu_features = "@TDXCPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -140,7 +141,7 @@ default_memory = @DEFAULTMEMORY_NV@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -153,13 +154,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -240,24 +241,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -265,7 +270,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -275,7 +280,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -283,7 +288,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -300,11 +305,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -320,7 +325,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -335,17 +340,17 @@ pflashes = []
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -357,7 +362,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# In a confidential compute environment hot-plugging can compromise
# security.
@@ -368,7 +373,7 @@ cold_plug_vfio = "@DEFAULTVFIOPORT_NV@"
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -380,7 +385,7 @@ cold_plug_vfio = "@DEFAULTVFIOPORT_NV@"
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -402,17 +407,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -422,9 +428,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -432,7 +439,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -443,20 +450,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -471,12 +478,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -495,17 +502,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -519,7 +526,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -532,14 +539,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
@@ -549,7 +556,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -567,19 +574,19 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -587,22 +594,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -610,7 +618,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -618,7 +626,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_NV@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -627,13 +635,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -654,22 +662,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -692,3 +700,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK_NV@"

View File

@@ -21,34 +21,12 @@ machine_type = "@MACHINETYPE@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true
rootfs_type = @DEFROOTFSTYPE@
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -86,7 +64,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -94,12 +72,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -144,7 +123,7 @@ default_memory = @DEFAULTMEMORY_NV@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -157,13 +136,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -244,24 +223,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -269,7 +252,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -279,7 +262,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -287,7 +270,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -304,11 +287,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -324,7 +307,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -339,7 +322,7 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option allows to add an extra HMP or QMP socket when `enable_debug = true`
#
@@ -354,17 +337,17 @@ pflashes = []
#
# If set to the empty string "", no extra monitor socket is added. This is
# the default.
#extra_monitor_socket = hmp
extra_monitor_socket = ""
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = true
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -382,7 +365,7 @@ hot_plug_vfio = "@DEFAULTVFIOPORT_NV@"
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
#cold_plug_vfio = "@DEFAULTVFIOPORT_NV@"
cold_plug_vfio = "no-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
@@ -392,7 +375,7 @@ pcie_root_port = @DEFAULTPCIEROOTPORT_NV@
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -404,7 +387,7 @@ pcie_root_port = @DEFAULTPCIEROOTPORT_NV@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -426,17 +409,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -446,9 +430,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -456,7 +441,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -467,20 +452,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -495,12 +480,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -519,17 +504,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -543,7 +528,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -556,14 +541,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
@@ -573,7 +558,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -591,19 +576,19 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -611,22 +596,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -634,7 +620,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -642,7 +628,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY_NV@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -651,13 +637,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY_NV@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -678,22 +664,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -711,3 +697,22 @@ create_container_timeout = @DEFAULTTIMEOUT_NV@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK_NV@"

View File

@@ -35,7 +35,7 @@ confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -73,7 +73,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -81,12 +81,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -131,7 +132,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -144,13 +145,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -230,24 +231,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -255,7 +260,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -265,7 +270,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -273,7 +278,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -290,11 +295,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -305,7 +310,7 @@ valid_vhost_user_store_paths = @DEFVALIDVHOSTUSERSTOREPATHS@
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -320,17 +325,17 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -338,10 +343,10 @@ pflashes = []
# nvdimm is not supported when `confidential_guest = true`.
disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Enable hot-plugging of VFIO devices to a bridge-port,
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "bridge-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
@@ -354,11 +359,11 @@ cold_plug_vfio = "bridge-port"
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -370,7 +375,7 @@ cold_plug_vfio = "bridge-port"
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -392,17 +397,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -412,9 +418,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -422,7 +429,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -433,20 +440,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -461,12 +468,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -485,17 +492,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -509,7 +516,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -522,14 +529,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 30)
@@ -539,7 +546,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -557,19 +564,19 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -577,22 +584,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -600,7 +608,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -608,7 +616,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -617,13 +625,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -644,22 +652,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE_SE@"
vfio_mode = "@DEFVFIOMODE_SE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -682,3 +690,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -15,7 +15,6 @@
[hypervisor.qemu]
path = "@QEMUPATH@"
kernel = "@KERNELCONFIDENTIALPATH@"
#image = "@IMAGEPATH@"
initrd = "@INITRDCONFIDENTIALPATH@"
machine_type = "@MACHINETYPE@"
@@ -23,7 +22,7 @@ machine_type = "@MACHINETYPE@"
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
@@ -47,7 +46,7 @@ sev_snp_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -68,17 +67,17 @@ valid_hypervisor_paths = @QEMUVALIDHYPERVISORPATHS@
#
# 96-byte, base64-encoded blob to provide the ID Block structure for the
# SNP_LAUNCH_FINISH command defined in the SEV-SNP firmware ABI (QEMU default: all-zero)
#snp_id_block = ""
snp_id_block = ""
# 4096-byte, base64-encoded blob to provide the ID Authentication Information Structure
# for the SNP_LAUNCH_FINISH command defined in the SEV-SNP firmware ABI (QEMU default: all-zero)
#snp_id_auth = ""
snp_id_auth = ""
# SNP Guest Policy, the POLICY parameter to the SNP_LAUNCH_START command.
# If unset, the QEMU default policy (0x30000) will be used.
# Notice that the guest policy is enforced at VM launch, and your pod VMs
# won't start at all if the policy denys it. This will be indicated by a
# 'SNP_LAUNCH_START' error.
#snp_guest_policy = 196608
snp_guest_policy = 196608
# Optional space-separated list of options to pass to the guest kernel.
# For example, use `kernel_params = "vsyscall=emulate"` if you are having
@@ -105,7 +104,7 @@ firmware_volume = "@FIRMWARETDVFVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -113,12 +112,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -163,7 +163,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -176,13 +176,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -263,24 +263,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -288,7 +292,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -298,7 +302,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -306,7 +310,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -323,11 +327,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -358,7 +362,7 @@ pflashes = []
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
@@ -368,7 +372,7 @@ disable_nesting_checks = true
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -380,11 +384,11 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -396,7 +400,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -418,17 +422,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -438,9 +443,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -448,7 +454,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -459,20 +465,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -487,12 +493,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -511,17 +517,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -535,7 +541,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -548,14 +554,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 90)
@@ -565,7 +571,7 @@ dial_timeout = 90
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -583,19 +589,19 @@ dial_timeout = 90
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -603,22 +609,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -626,7 +633,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -634,7 +641,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -643,13 +650,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -670,22 +677,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -708,3 +715,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -15,7 +15,6 @@
path = "@QEMUTDXPATH@"
kernel = "@KERNELCONFIDENTIALPATH@"
image = "@IMAGECONFIDENTIALPATH@"
# initrd = "@INITRDPATH@"
machine_type = "@MACHINETYPE@"
tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETPORT@
@@ -23,7 +22,7 @@ tdx_quote_generation_service_socket_port = @QEMUTDXQUOTEGENERATIONSERVICESOCKETP
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
rootfs_type = @DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
@@ -44,7 +43,7 @@ confidential_guest = true
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -82,7 +81,7 @@ firmware_volume = "@FIRMWARETDVFVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -90,12 +89,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@TDXCPUFEATURES@"
cpu_features = "@TDXCPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -140,7 +140,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -153,13 +153,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -240,24 +240,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -265,7 +269,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -275,7 +279,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -283,7 +287,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -300,11 +304,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -320,7 +324,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -335,17 +339,17 @@ pflashes = []
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -357,11 +361,11 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -373,7 +377,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -395,17 +399,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -415,9 +420,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -425,7 +431,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -436,20 +442,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -464,12 +470,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -488,17 +494,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -512,7 +518,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -525,14 +531,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 60)
@@ -542,7 +548,7 @@ dial_timeout = 60
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -560,19 +566,19 @@ dial_timeout = 60
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -580,22 +586,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -603,7 +610,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -611,7 +618,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -620,13 +627,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT_TEE@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT_TEE@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -647,22 +654,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -685,3 +692,22 @@ dan_conf = "@DEFDANCONF@"
# the container image should be pulled in the guest, without using an external snapshotter.
# This is an experimental feature and might be removed in the future.
experimental_force_guest_pull = @DEFFORCEGUESTPULL@
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -15,41 +15,18 @@
path = "@QEMUPATH@"
kernel = "@KERNELPATH@"
image = "@IMAGEPATH@"
# initrd = "@INITRDPATH@"
machine_type = "@MACHINETYPE@"
# rootfs filesystem type:
# - ext4 (default)
# - xfs
# - erofs
rootfs_type=@DEFROOTFSTYPE@
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# Choose AMD SEV-SNP confidential guests
# In case of using confidential guests on AMD hardware that supports both SEV
# and SEV-SNP, the following enables SEV-SNP guests. SEV guests are default.
# Default false
# sev_snp_guest = true
rootfs_type = @DEFROOTFSTYPE@
# Enable running QEMU VMM as a non-root user.
# By default QEMU VMM run as root. When this is set to true, QEMU VMM process runs as
# a non-root random user. See documentation for the limitations of this mode.
# rootless = true
rootless = false
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
@@ -87,7 +64,7 @@ firmware_volume = "@FIRMWAREVOLUMEPATH@"
# Machine accelerators
# comma-separated list of machine accelerators to pass to the hypervisor.
# For example, `machine_accelerators = "nosmm,nosmbus,nosata,nopit,static-prt,nofw"`
machine_accelerators="@MACHINEACCELERATORS@"
machine_accelerators = "@MACHINEACCELERATORS@"
# Qemu seccomp sandbox feature
# comma-separated list of seccomp sandbox features to control the syscall access.
@@ -95,12 +72,13 @@ machine_accelerators="@MACHINEACCELERATORS@"
# Note: "elevateprivileges=deny" doesn't work with daemonize option, so it's removed from the seccomp sandbox
# Another note: enabling this feature may reduce performance, you may enable
# /proc/sys/net/core/bpf_jit_enable to reduce the impact. see https://man7.org/linux/man-pages/man8/bpfc.8.html
#seccompsandbox="@DEFSECCOMPSANDBOXPARAM@"
# Recommended value when enabling: "on,obsolete=deny,spawn=deny,resourcecontrol=deny"
seccompsandbox = "@DEFSECCOMPSANDBOXPARAM@"
# CPU features
# comma-separated list of cpu features to pass to the cpu
# For example, `cpu_features = "pmu=off,vmx=off"
cpu_features="@CPUFEATURES@"
cpu_features = "@CPUFEATURES@"
# Default number of vCPUs per SB/VM:
# unspecified or 0 --> will be set to @DEFVCPUS@
@@ -145,7 +123,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -158,13 +136,13 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Specifies virtio-mem will be enabled or not.
# Please note that this option should be used with the command
# "echo 1 > /proc/sys/vm/overcommit_memory".
# Default false
#enable_virtio_mem = true
enable_virtio_mem = false
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -245,24 +223,28 @@ block_device_aio = "@DEFBLOCKDEVICEAIO_QEMU@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable iothreads (data-plane) to be used. This causes IO to be
# handled in a separate IO thread. This is currently only implemented
# for SCSI.
# handled in a separate IO thread. This is currently implemented
# for virtio-scsi and virtio-blk.
#
enable_iothreads = @DEFENABLEIOTHREADS@
# Independent IOThreads enables IO to be processed in a separate thread, it is
# for QEMU hotplug device attach to iothread, like virtio-blk.
indep_iothreads = @DEFINDEPIOTHREADS@
# Enable pre allocation of VM RAM, default false
# Enabling this will result in lower container density
# as all of the memory will be allocated and locked
@@ -270,7 +252,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# upfront or in the cases where you want memory latencies
# to be very predictable
# Default false
#enable_mem_prealloc = true
enable_mem_prealloc = false
# Reclaim guest freed memory.
# Enabling this will result in the VM balloon device having f_reporting=on set.
@@ -280,7 +262,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# the VM.
#
# Default false
#reclaim_guest_freed_memory = true
reclaim_guest_freed_memory = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -288,7 +270,7 @@ enable_iothreads = @DEFENABLEIOTHREADS@
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vhost-user storage device, default false
# Enabling this will result in some Linux reserved block type
@@ -305,11 +287,11 @@ vhost_user_store_path = "@DEFVHOSTUSERSTOREPATH@"
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# Enable IOMMU_PLATFORM, default false
# Enabling this will result in the VM device having iommu_platform=on set
#enable_iommu_platform = true
enable_iommu_platform = false
# List of valid annotations values for the vhost user store path
# The default if not set is empty (all annotations rejected.)
@@ -325,7 +307,7 @@ vhost_user_reconnect_timeout_sec = 0
# will disable this feature. In the case of virtio-fs, this is enabled
# automatically and '/dev/shm' is used as the backing folder.
# This option will be ignored if VM templating is enabled.
#file_mem_backend = "@DEFFILEMEMBACKEND@"
file_mem_backend = "@DEFFILEMEMBACKEND@"
# List of valid annotations values for the file_mem_backend annotation
# The default if not set is empty (all annotations rejected.)
@@ -340,7 +322,7 @@ pflashes = []
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# This option allows to add an extra HMP or QMP socket when `enable_debug = true`
#
@@ -355,17 +337,17 @@ pflashes = []
#
# If set to the empty string "", no extra monitor socket is added. This is
# the default.
#extra_monitor_socket = hmp
extra_monitor_socket = ""
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
# This is the msize used for 9p shares. It is the number of bytes
# used for 9p packet payload.
#msize_9p = @DEFMSIZE9P@
msize_9p = @DEFMSIZE9P@
# If false and nvdimm is supported, use nvdimm device to plug guest image.
# Otherwise virtio-block device is used.
@@ -376,24 +358,24 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# Enable hot-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port"
#hot_plug_vfio = "root-port"
hot_plug_vfio = "no-port"
# In a confidential compute environment hot-plugging can compromise
# security.
# Enable cold-plugging of VFIO devices to a bridge-port,
# root-port or switch-port.
# The default setting is "no-port", which means disabled.
#cold_plug_vfio = "root-port"
cold_plug_vfio = "no-port"
# Before hot plugging a PCIe device, you need to add a pcie_root_port device.
# Use this parameter when using some large PCI bar devices, such as Nvidia GPU
# The value means the number of pcie_root_port
# Default 0
#pcie_root_port = 2
pcie_root_port = 0
# If vhost-net backend for virtio-net is not desired, set to true. Default is false, which trades off
# security (vhost-net runs ring0) for network I/O performance.
#disable_vhost_net = true
disable_vhost_net = false
#
# Default entropy source.
@@ -405,7 +387,7 @@ disable_image_nvdimm = @DEFDISABLEIMAGENVDIMM@
# The source of entropy /dev/urandom is non-blocking and provides a
# generally acceptable source of entropy. It should work well for pretty much
# all practical purposes.
#entropy_source= "@DEFENTROPYSOURCE@"
entropy_source= "@DEFENTROPYSOURCE@"
# List of valid annotations values for entropy_source
# The default if not set is empty (all annotations rejected.)
@@ -427,17 +409,18 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
#
# Use rx Rate Limiter to control network I/O inbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) to discipline traffic.
# Default 0-sized value means unlimited rate.
#rx_rate_limiter_max_rate = 0
rx_rate_limiter_max_rate = 0
# Use tx Rate Limiter to control network I/O outbound bandwidth(size in bits/sec for SB/VM).
# In Qemu, we use classful qdiscs HTB(Hierarchy Token Bucket) and ifb(Intermediate Functional Block)
# to discipline traffic.
# Default 0-sized value means unlimited rate.
#tx_rate_limiter_max_rate = 0
tx_rate_limiter_max_rate = 0
# Set where to save the guest memory dump file.
# If set, when GUEST_PANICKED event occurred,
@@ -447,9 +430,10 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# The dumped file(also called vmcore) can be processed with crash or gdb.
#
# WARNING:
# Dump guests memory can take very long depending on the amount of guest memory
# Dump guest's memory can take very long depending on the amount of guest memory
# and use much disk space.
#guest_memory_dump_path="/var/crash/kata"
# Recommended value when enabling: "/var/crash/kata"
guest_memory_dump_path = ""
# If enable paging.
# Basically, if you want to use "gdb" rather than "crash",
@@ -457,7 +441,7 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# then you should enable paging.
#
# See: https://www.qemu.org/docs/master/qemu-qmp-ref.html#Dump-guest-memory for details
#guest_memory_dump_paging=false
guest_memory_dump_paging = false
# Enable swap in the guest. Default false.
# When enable_guest_swap is enabled, insert a raw file to the guest as the swap device
@@ -468,20 +452,20 @@ valid_entropy_sources = @DEFVALIDENTROPYSOURCES@
# If swap_in_bytes is not set, the size should be memory_limit_in_bytes.
# If swap_in_bytes and memory_limit_in_bytes is not set, the size should
# be default_memory.
#enable_guest_swap = true
enable_guest_swap = false
# use legacy serial for guest console if available and implemented for architecture. Default false
#use_legacy_serial = true
use_legacy_serial = false
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
# Note: To enable guest SELinux, the guest rootfs must be CentOS that is created and built
# with `SELINUX=yes`.
# (default: true)
disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
[factory]
@@ -496,12 +480,12 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
# Specifies the path of template.
#
# Default "/run/vc/vm/template"
#template_path = "/run/vc/vm/template"
template_path = "/run/vc/vm/template"
# The number of caches of VMCache:
# unspecified or == 0 --> VMCache is disabled
@@ -520,17 +504,17 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# a new sandbox.
#
# Default 0
#vm_cache_number = 0
vm_cache_number = 0
# Specify the address of the Unix socket that is used by VMCache.
#
# Default /var/run/kata-containers/cache.sock
#vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
vm_cache_endpoint = "/var/run/kata-containers/cache.sock"
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -544,7 +528,7 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -557,14 +541,14 @@ disable_guest_selinux=@DEFDISABLEGUESTSELINUX@
# * The module is not available in the guest or it doesn't met the guest kernel
# requirements, like architecture and version.
#
kernel_modules=[]
kernel_modules = []
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -572,13 +556,13 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -596,19 +580,19 @@ dial_timeout = 45
# Uses tc filter rules to redirect traffic from the network interface
# provided by plugin to a tap interface connected to the VM.
#
internetworking_model="@DEFNETWORKMODEL_QEMU@"
internetworking_model = "@DEFNETWORKMODEL_QEMU@"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
# machine and applied by the kata agent. If set to true, seccomp is not applied
# within the guest
# (default: true)
disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
# enable_vcpus_pinning = false
enable_vcpus_pinning = false
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -616,22 +600,23 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -639,7 +624,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -647,7 +632,7 @@ disable_guest_seccomp=@DEFDISABLEGUESTSECCOMP@
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -656,13 +641,13 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# - When running with pods, sandbox sizing information will only be available if using Kubernetes >= 1.23 and containerd >= 1.6. CRI-O
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
static_sandbox_resource_mgmt=@DEFSTATICRESOURCEMGMT@
static_sandbox_resource_mgmt = @DEFSTATICRESOURCEMGMT@
# If specified, sandbox_bind_mounts identifieds host paths to be mounted (ro) into the sandboxes shared path.
# This is only valid if filesystem sharing is utilized. The provided path(s) will be bindmounted into the shared fs directory.
# If defaults are utilized, these mounts should be available in the guest at `/run/kata-containers/shared/containers/sandbox-mounts`
# These will not be exposed to the container workloads, and are only provided for potential guest services.
sandbox_bind_mounts=@DEFBINDMOUNTS@
sandbox_bind_mounts = @DEFBINDMOUNTS@
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -683,22 +668,22 @@ sandbox_bind_mounts=@DEFBINDMOUNTS@
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
disable_guest_empty_dir=@DEFDISABLEGUESTEMPTYDIR@
disable_guest_empty_dir = @DEFDISABLEGUESTEMPTYDIR@
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -716,3 +701,22 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -16,24 +16,6 @@
remote_hypervisor_socket = "/run/peerpod/hypervisor.sock"
remote_hypervisor_timeout = 600
# Enable confidential guest support.
# Toggling that setting may trigger different hardware features, ranging
# from memory encryption to both memory and CPU-state encryption and integrity.
# The Kata Containers runtime dynamically detects the available feature set and
# aims at enabling the largest possible one, returning an error if none is
# available, or none is supported by the hypervisor.
#
# Known limitations:
# * Does not work by design:
# - CPU Hotplug
# - Memory Hotplug
# - NVDIMM devices
#
# Default false
# confidential_guest = true
# List of valid annotation names for the hypervisor
# Each member of the list is a regular expression, which is the base name
# of the annotation, e.g. "path" for io.katacontainers.config.hypervisor.path"
@@ -102,13 +84,13 @@ default_bridges = @DEFBRIDGES@
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
# Note: the remote hypervisor uses the peer pod config to determine the memory of the VM
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available. And Debug also enable the hmp socket.
#
# Default false
#enable_debug = true
enable_debug = false
# Path to OCI hook binaries in the *guest rootfs*.
# This does not affect host-side hooks which must instead be added to
@@ -125,10 +107,11 @@ default_bridges = @DEFBRIDGES@
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# disable applying SELinux on the VMM process (default false)
disable_selinux=@DEFDISABLESELINUX@
disable_selinux = @DEFDISABLESELINUX@
# disable applying SELinux on the container process
# If set to false, the type `container_t` is applied to the container process by default.
@@ -141,7 +124,7 @@ disable_guest_selinux = true
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -155,24 +138,24 @@ disable_guest_selinux = true
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Enable debug console.
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 30)
#dial_timeout = 30
dial_timeout = 30
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -191,7 +174,7 @@ disable_guest_selinux = true
# provided by plugin to a tap interface connected to the VM.
#
# Note: The remote hypervisor, uses it's own network, so "none" is required
internetworking_model="none"
internetworking_model = "none"
# disable guest seccomp
# Determines whether container seccomp profiles are passed to the virtual
@@ -199,7 +182,7 @@ internetworking_model="none"
# within the guest
# (default: true)
# Note: The remote hypervisor has a different guest, so currently requires this to be set to true
disable_guest_seccomp=true
disable_guest_seccomp = true
# Apply a custom SELinux security policy to the container process inside the VM.
@@ -208,22 +191,23 @@ disable_guest_seccomp=true
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label="@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -240,7 +224,7 @@ disable_new_netns = true
# The sandbox cgroup path is the parent cgroup of a container with the PodSandbox annotation.
# The sandbox cgroup is constrained if there is no container type annotation.
# See: https://pkg.go.dev/github.com/kata-containers/kata-containers/src/runtime/virtcontainers#ContainerType
sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
sandbox_cgroup_only = @DEFSANDBOXCGROUPONLY@
# If enabled, the runtime will attempt to determine appropriate sandbox size (memory, CPU) before booting the virtual machine. In
# this case, the runtime will not dynamically update the amount of memory and CPU in the virtual machine. This is generally helpful
@@ -250,7 +234,7 @@ sandbox_cgroup_only=@DEFSANDBOXCGROUPONLY@
# does not yet support sandbox sizing annotations.
# - When running single containers using a tool like ctr, container sizing information will be available.
# Note: the remote hypervisor uses the peer pod config to determine the sandbox size, so requires this to be set to true
static_sandbox_resource_mgmt=true
static_sandbox_resource_mgmt = true
# VFIO Mode
# Determines how VFIO devices should be be presented to the container.
@@ -271,23 +255,23 @@ static_sandbox_resource_mgmt=true
# Using this mode requires specially built workloads that know how
# to locate the relevant device interfaces within the VM.
#
vfio_mode="@DEFVFIOMODE@"
vfio_mode = "@DEFVFIOMODE@"
# If enabled, the runtime will not create Kubernetes emptyDir mounts on the guest filesystem. Instead, emptyDir mounts will
# be created on the host and shared via virtio-fs. This is potentially slower, but allows sharing of files from host to guest.
# Note: remote hypervisor has no sharing of emptydir mounts from host to guest
disable_guest_empty_dir=false
disable_guest_empty_dir = false
# Enabled experimental feature list, format: ["a", "b"].
# Experimental features are features not stable enough for production,
# they may break compatibility, and are prepared for a big version bump.
# Supported experimental features:
# (default: [])
experimental=@DEFAULTEXPFEATURES@
experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
# enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -305,3 +289,22 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -13,8 +13,7 @@
[hypervisor.stratovirt]
path = "@STRATOVIRTPATH@"
kernel = "@KERNELPATH_STRATOVIRT@"
#image = "@IMAGEPATH@"
initrd = "@INITRDPATH@"
image = "@IMAGEPATH@"
machine_type = "@DEFMACHINETYPE_STRATOVIRT@"
# rootfs filesystem type:
@@ -89,7 +88,7 @@ default_memory = @DEFMEMSZ@
# Default memory slots per SB/VM.
# If unspecified then it will be set @DEFMEMSLOTS@.
# This is will determine the times that memory will be hotadded to sandbox/VM.
#memory_slots = @DEFMEMSLOTS@
memory_slots = @DEFMEMSLOTS@
# Default maximum memory in MiB per SB / VM
# unspecified or == 0 --> will be set to the actual amount of physical RAM
@@ -102,7 +101,7 @@ default_maxmemory = @DEFMAXMEMSZ@
# If set block storage driver (block_device_driver) to "nvdimm",
# should set memory_offset to the size of block device.
# Default 0
#memory_offset = 0
memory_offset = 0
# Disable block device from being used for a container's rootfs.
# In case of a storage driver like devicemapper where a container's
@@ -164,17 +163,17 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_STRATOVIRT@"
# Specifies cache-related options will be set to block devices or not.
# Default false
#block_device_cache_set = true
block_device_cache_set = false
# Specifies cache-related options for block devices.
# Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
# Default false
#block_device_cache_direct = true
block_device_cache_direct = false
# Specifies cache-related options for block devices.
# Denotes whether flush requests for the device are ignored.
# Default false
#block_device_cache_noflush = true
block_device_cache_noflush = false
# Enable huge pages for VM RAM, default false
# Enabling this will result in the VM memory
@@ -182,25 +181,25 @@ block_device_driver = "@DEFBLOCKSTORAGEDRIVER_STRATOVIRT@"
# This is useful when you want to use vhost-user network
# stacks within the container. This will automatically
# result in memory pre allocation
#enable_hugepages = true
enable_hugepages = false
# Enable vIOMMU, default false
# Enabling this will result in the VM having a vIOMMU device
# This will also add the following options to the kernel's
# command line: intel_iommu=on,iommu=pt
#enable_iommu = true
enable_iommu = false
# This option changes the default hypervisor and kernel parameters
# to enable debug output where available.
#
# Default false
#enable_debug = true
enable_debug = false
# Disable the customizations done in the runtime when it detects
# that it is running on top a VMM. This will result in the runtime
# behaving as it would when running on bare metal.
#
#disable_nesting_checks = true
disable_nesting_checks = false
#
# Default entropy source.
@@ -229,7 +228,8 @@ entropy_source = "@DEFENTROPYSOURCE@"
# https://github.com/opencontainers/runtime-spec/blob/v1.0.1/config.md#posix-platform-hooks
# Warnings will be logged if any error is encountered while scanning for hooks,
# but it will not abort container execution.
#guest_hook_path = "/usr/share/oci/hooks"
# Recommended value when enabling: "/usr/share/oci/hooks"
guest_hook_path = ""
# disable applying SELinux on the VMM process (default false)
disable_selinux = @DEFDISABLESELINUX@
@@ -253,12 +253,12 @@ disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
# Note: Requires "initrd=" to be set ("image=" is not supported).
#
# Default false
#enable_template = true
enable_template = false
[agent.@PROJECT_TYPE@]
# If enabled, make the agent display debug-level messages.
# (default: disabled)
#enable_debug = true
enable_debug = false
# Enable agent tracing.
#
@@ -272,7 +272,7 @@ disable_guest_selinux = @DEFDISABLEGUESTSELINUX@
# increasing the container shutdown time slightly.
#
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Comma separated list of kernel modules and their parameters.
# These modules will be loaded in the guest kernel using modprobe(8).
@@ -292,7 +292,7 @@ kernel_modules = []
# If enabled, user can connect guest OS running inside hypervisor
# through "kata-runtime exec <sandbox-id>" command
#debug_console_enabled = true
debug_console_enabled = false
# Agent connection dialing timeout value in seconds
# (default: 45)
@@ -300,13 +300,13 @@ dial_timeout = 45
# Confidential Data Hub API timeout value in seconds
# (default: 50)
#cdh_api_timeout = 50
cdh_api_timeout = 50
[runtime]
# If enabled, the runtime will log additional debug messages to the
# system log
# (default: disabled)
#enable_debug = true
enable_debug = false
#
# Internetworking model
# Determines how the VM should be connected to the
@@ -336,7 +336,7 @@ disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# vCPUs pinning settings
# if enabled, each vCPU thread will be scheduled to a fixed CPU
# qualified condition: num(vCPU threads) == num(CPUs in sandbox's CPUSet)
#enable_vcpus_pinning = false
enable_vcpus_pinning = true
# Apply a custom SELinux security policy to the container process inside the VM.
# This is used when you want to apply a type other than the default `container_t`,
@@ -344,22 +344,23 @@ disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# (format: "user:role:type")
# Note: You cannot specify MCS policy with the label because the sensitivity levels and
# categories are determined automatically by high-level container runtimes such as containerd.
#guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# Example value when enabling: "system_u:system_r:container_t"
guest_selinux_label = "@DEFGUESTSELINUXLABEL@"
# If enabled, the runtime will create opentracing.io traces and spans.
# (See https://www.jaegertracing.io/docs/getting-started).
# (default: disabled)
#enable_tracing = true
enable_tracing = false
# Set the full url to the Jaeger HTTP Thrift collector.
# The default if not set will be "http://localhost:14268/api/traces"
#jaeger_endpoint = ""
jaeger_endpoint = ""
# Sets the username to be used if basic auth is required for Jaeger.
#jaeger_user = ""
jaeger_user = ""
# Sets the password to be used if basic auth is required for Jaeger.
#jaeger_password = ""
jaeger_password = ""
# If enabled, the runtime will not create a network namespace for shim and hypervisor processes.
# This option may have some potential impacts to your host. It should only be used when you know what you're doing.
@@ -367,7 +368,7 @@ disable_guest_seccomp = @DEFDISABLEGUESTSECCOMP@
# with `internetworking_model=none`. The tap device will be in the host network namespace and can connect to a bridge
# (like OVS) directly.
# (default: false)
#disable_new_netns = true
disable_new_netns = false
# if enabled, the runtime will add all the kata processes inside one dedicated cgroup.
# The container cgroups in the host are not created, just one single cgroup per sandbox.
@@ -399,7 +400,7 @@ experimental = @DEFAULTEXPFEATURES@
# If enabled, user can run pprof tools with shim v2 process through kata-monitor.
# (default: false)
#enable_pprof = true
enable_pprof = false
# Indicates the CreateContainer request timeout needed for the workload(s)
# It using guest_pull this includes the time to pull the image inside the guest
@@ -417,3 +418,22 @@ create_container_timeout = @DEFCREATECONTAINERTIMEOUT@
# to the hypervisor.
# (default: /run/kata-containers/dans)
dan_conf = "@DEFDANCONF@"
# pod_resource_api_sock specifies the unix socket for the Kubelet's
# PodResource API endpoint. If empty, kubernetes based cold plug
# will not be attempted. In order for this feature to work, the
# KubeletPodResourcesGet featureGate must be enabled in Kubelet,
# if using Kubelet older than 1.34.
#
# The pod resource API's socket is relative to the Kubelet's root-dir,
# which is defined by the cluster admin, and its location is:
# ${KubeletRootDir}/pod-resources/kubelet.sock
#
# cold_plug_vfio(see hypervisor config) acts as a feature gate:
# cold_plug_vfio = no_port (default) => no cold plug
# cold_plug_vfio != no_port AND pod_resource_api_sock = "" => need
# explicit CDI annotation for cold plug (applies mainly
# to non-k8s cases)
# cold_plug_vfio != no_port AND pod_resource_api_sock != "" => kubelet
# based cold plug.
pod_resource_api_sock = "@DEFPODRESOURCEAPISOCK@"

View File

@@ -1,7 +1,7 @@
module github.com/kata-containers/kata-containers/src/runtime
// Keep in sync with version in versions.yaml
go 1.24.9
go 1.24.11
// WARNING: Do NOT use `replace` directives as those break dependabot:
// https://github.com/kata-containers/kata-containers/issues/11020
@@ -20,8 +20,8 @@ require (
github.com/containerd/fifo v1.1.0
github.com/containerd/ttrpc v1.2.7
github.com/containerd/typeurl/v2 v2.2.3
github.com/containernetworking/plugins v1.7.1
github.com/coreos/go-systemd/v22 v22.5.1-0.20231103132048-7d375ecc2b09
github.com/containernetworking/plugins v1.9.0
github.com/coreos/go-systemd/v22 v22.6.0
github.com/cri-o/cri-o v1.34.0
github.com/docker/go-units v0.5.0
github.com/fsnotify/fsnotify v1.9.0
@@ -46,11 +46,11 @@ require (
github.com/prometheus/client_model v0.6.1
github.com/prometheus/common v0.62.0
github.com/prometheus/procfs v0.15.1
github.com/safchain/ethtool v0.5.10
github.com/safchain/ethtool v0.6.2
github.com/sirupsen/logrus v1.9.3
github.com/stretchr/testify v1.11.1
github.com/urfave/cli v1.22.15
github.com/vishvananda/netlink v1.3.1-0.20250303224720-0e7078ed04c8
github.com/vishvananda/netlink v1.3.1
github.com/vishvananda/netns v0.0.5
gitlab.com/nvidia/cloud-native/go-nvlib v0.0.0-20220601114329-47893b162965
go.opentelemetry.io/otel v1.35.0
@@ -58,11 +58,12 @@ require (
go.opentelemetry.io/otel/sdk v1.35.0
go.opentelemetry.io/otel/trace v1.35.0
golang.org/x/oauth2 v0.30.0
golang.org/x/sys v0.34.0
golang.org/x/sys v0.35.0
google.golang.org/grpc v1.72.0
google.golang.org/protobuf v1.36.6
google.golang.org/protobuf v1.36.7
k8s.io/apimachinery v0.33.0
k8s.io/cri-api v0.33.0
k8s.io/kubelet v0.33.0
tags.cncf.io/container-device-interface v1.0.1
)
@@ -71,7 +72,7 @@ require (
github.com/AdaLogics/go-fuzz-headers v0.0.0-20230811130428-ced1acdcaa24 // indirect
github.com/AdamKorcz/go-118-fuzz-build v0.0.0-20230306123547-8075edf89bb0 // indirect
github.com/Microsoft/go-winio v0.6.2 // indirect
github.com/Microsoft/hcsshim v0.12.9 // indirect
github.com/Microsoft/hcsshim v0.13.0 // indirect
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
@@ -91,7 +92,7 @@ require (
github.com/docker/go-events v0.0.0-20190806004212-e31b211e4f1c // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/logr v1.4.3 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-openapi/analysis v0.23.0 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
@@ -131,9 +132,9 @@ require (
go.opentelemetry.io/otel/metric v1.35.0 // indirect
golang.org/x/exp v0.0.0-20241108190413-2d47ceb2692f // indirect
golang.org/x/mod v0.26.0 // indirect
golang.org/x/net v0.42.0 // indirect
golang.org/x/net v0.43.0 // indirect
golang.org/x/sync v0.16.0 // indirect
golang.org/x/text v0.27.0 // indirect
golang.org/x/text v0.28.0 // indirect
google.golang.org/genproto v0.0.0-20250303144028-a0af3efb3deb // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250313205543-e70fdf4c4cb4 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect

View File

@@ -11,10 +11,12 @@ github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03
github.com/BurntSushi/toml v1.3.2/go.mod h1:CxXYINrC8qIiEnFrOxCa7Jy5BFHlXnUU2pbicEuybxQ=
github.com/BurntSushi/toml v1.5.0 h1:W5quZX/G/csjUnuI8SUYlsHs9M38FC7znL0lIO+DvMg=
github.com/BurntSushi/toml v1.5.0/go.mod h1:ukJfTF/6rtPPRCnwkur4qwRxa8vTRFBF0uk2lLoLwho=
github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=
github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY=
github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU=
github.com/Microsoft/hcsshim v0.12.9 h1:2zJy5KA+l0loz1HzEGqyNnjd3fyZA31ZBCGKacp6lLg=
github.com/Microsoft/hcsshim v0.12.9/go.mod h1:fJ0gkFAna6ukt0bLdKB8djt4XIJhF/vEPuoIWYVvZ8Y=
github.com/Microsoft/hcsshim v0.13.0 h1:/BcXOiS6Qi7N9XqUcv27vkIuVOkBEcWstd2pMlWSeaA=
github.com/Microsoft/hcsshim v0.13.0/go.mod h1:9KWJ/8DgU+QzYGupX4tzMhRQE8h6w90lH6HAaclpEok=
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2 h1:DklsrG3dyBCFEj5IhUbnKptjxatkF07cF2ak3yi77so=
github.com/asaskevich/govalidator v0.0.0-20230301143203-a9d515a09cc2/go.mod h1:WaHUgvxTVq04UNunO+XhnAqY/wQc+bxr74GqbsZ/Jqw=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
@@ -64,10 +66,10 @@ github.com/containerd/typeurl/v2 v2.2.3 h1:yNA/94zxWdvYACdYO8zofhrTVuQY73fFU1y++
github.com/containerd/typeurl/v2 v2.2.3/go.mod h1:95ljDnPfD3bAbDJRugOiShd/DlAAsxGtUBhJxIn7SCk=
github.com/containernetworking/cni v1.3.0 h1:v6EpN8RznAZj9765HhXQrtXgX+ECGebEYEmnuFjskwo=
github.com/containernetworking/cni v1.3.0/go.mod h1:Bs8glZjjFfGPHMw6hQu82RUgEPNGEaBb9KS5KtNMnJ4=
github.com/containernetworking/plugins v1.7.1 h1:CNAR0jviDj6FS5Vg85NTgKWLDzZPfi/lj+VJfhMDTIs=
github.com/containernetworking/plugins v1.7.1/go.mod h1:xuMdjuio+a1oVQsHKjr/mgzuZ24leAsqUYRnzGoXHy0=
github.com/coreos/go-systemd/v22 v22.5.1-0.20231103132048-7d375ecc2b09 h1:OoRAFlvDGCUqDLampLQjk0yeeSGdF9zzst/3G9IkBbc=
github.com/coreos/go-systemd/v22 v22.5.1-0.20231103132048-7d375ecc2b09/go.mod h1:m2r/smMKsKwgMSAoFKHaa68ImdCSNuKE1MxvQ64xuCQ=
github.com/containernetworking/plugins v1.9.0 h1:Mg3SXBdRGkdXyFC4lcwr6u2ZB2SDeL6LC3U+QrEANuQ=
github.com/containernetworking/plugins v1.9.0/go.mod h1:JG3BxoJifxxHBhG3hFyxyhid7JgRVBu/wtooGEvWf1c=
github.com/coreos/go-systemd/v22 v22.6.0 h1:aGVa/v8B7hpb0TKl0MWoAavPDmHvobFe5R5zn0bCJWo=
github.com/coreos/go-systemd/v22 v22.6.0/go.mod h1:iG+pp635Fo7ZmV/j14KUcmEyWF+0X7Lua8rrTWzYgWU=
github.com/cpuguy83/go-md2man/v2 v2.0.4/go.mod h1:tgQtvFlXSQOSOSIRvRPT7W67SCa46tRHOmNcaadrF8o=
github.com/cpuguy83/go-md2man/v2 v2.0.6 h1:XJtiaUW6dEEqVuZiMTn1ldk455QWwEIsMIJlo5vtkx0=
github.com/cpuguy83/go-md2man/v2 v2.0.6/go.mod h1:oOW0eioCTA6cOiMLiUPZOpcVxMig6NIQQ7OS05n1F4g=
@@ -100,8 +102,8 @@ github.com/fxamacker/cbor/v2 v2.7.0/go.mod h1:pxXPTn3joSm21Gbwsv0w9OSA2y1HFR9qXE
github.com/go-ini/ini v1.67.0 h1:z6ZrTEZqSWOTyH2FlglNbNgARyHG8oLW9gMELqKr06A=
github.com/go-ini/ini v1.67.0/go.mod h1:ByCAeIL28uOIIG0E3PJtZPDL8WnHpFKFOtgjp+3Ies8=
github.com/go-logr/logr v1.2.2/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY=
github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/logr v1.4.3 h1:CjnDlHq8ikf6E492q6eKboGOC0T8CDaOvkHCIg8idEI=
github.com/go-logr/logr v1.4.3/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY=
github.com/go-logr/stdr v1.2.2 h1:hSWxHoqTgW2S2qGc0LTAI563KZ5YKYRhT3MFKZMbjag=
github.com/go-logr/stdr v1.2.2/go.mod h1:mMo/vtBO5dYbehREoey6XUKy/eSumjCCveDpRre4VKE=
github.com/go-openapi/analysis v0.23.0 h1:aGday7OWupfMs+LbmLZG4k0MYXIANxcuBTYUC03zFCU=
@@ -130,7 +132,6 @@ github.com/go-task/slim-sprig v0.0.0-20210107165309-348f09dbbbc0 h1:p104kn46Q8Wd
github.com/go-task/slim-sprig v0.0.0-20210107165309-348f09dbbbc0/go.mod h1:fyg7847qk6SyHyPtNmDHnmrv/HOrqktSC+C9fM+CJOE=
github.com/go-task/slim-sprig/v3 v3.0.0 h1:sUs3vkvUymDpBKi3qH1YSqBQk9+9D/8M2mN1vB6EwHI=
github.com/go-task/slim-sprig/v3 v3.0.0/go.mod h1:W848ghGpv3Qj3dhTPRyJypKRiqCdHZiAzKg9hl15HA8=
github.com/godbus/dbus/v5 v5.1.0/go.mod h1:xhWf0FNVPg57R7Z0UbKHbJfkEywrmjJnf7w5xrFpKfA=
github.com/godbus/dbus/v5 v5.1.1-0.20230522191255-76236955d466 h1:sQspH8M4niEijh3PFscJRLDnkL547IeP7kpPe3uUhEg=
github.com/godbus/dbus/v5 v5.1.1-0.20230522191255-76236955d466/go.mod h1:ZiQxhyQ+bbbfxUKVvjfO498oPYvtYhZzycal3G/NHmU=
github.com/gogo/protobuf v1.3.2 h1:Ov1cvc58UF3b5XjBnZv7+opcTcQFZebYjWzi34vdm4Q=
@@ -165,8 +166,8 @@ github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/
github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
github.com/google/pprof v0.0.0-20250403155104-27863c87afa6 h1:BHT72Gu3keYf3ZEu2J0b1vyeLSOYI8bm5wbJM/8yDe8=
github.com/google/pprof v0.0.0-20250403155104-27863c87afa6/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/pprof v0.0.0-20250820193118-f64d9cf942d6 h1:EEHtgt9IwisQ2AZ4pIsMjahcegHh6rmhqxzIRQIyepY=
github.com/google/pprof v0.0.0-20250820193118-f64d9cf942d6/go.mod h1:I6V7YzU0XDpsHqbsyrghnFZLO1gwK6NPTNvmetQIk9U=
github.com/google/uuid v1.1.2/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
@@ -229,13 +230,13 @@ github.com/onsi/ginkgo v1.6.0/go.mod h1:lLunBs/Ym6LB5Z9jYTR76FiuTmxDTDusOGeTQH+W
github.com/onsi/ginkgo v1.12.1/go.mod h1:zj2OWP4+oCPe1qIXoGWkgMRwljMUYCdkwsT2108oapk=
github.com/onsi/ginkgo v1.16.4 h1:29JGrr5oVBm5ulCWet69zQkzWipVXIol6ygQUe/EzNc=
github.com/onsi/ginkgo v1.16.4/go.mod h1:dX+/inL/fNMqNlz0e9LfyB9TswhZpCVdJM/Z6Vvnwo0=
github.com/onsi/ginkgo/v2 v2.23.4 h1:ktYTpKJAVZnDT4VjxSbiBenUjmlL/5QkBEocaWXiQus=
github.com/onsi/ginkgo/v2 v2.23.4/go.mod h1:Bt66ApGPBFzHyR+JO10Zbt0Gsp4uWxu5mIOTusL46e8=
github.com/onsi/ginkgo/v2 v2.25.1 h1:Fwp6crTREKM+oA6Cz4MsO8RhKQzs2/gOIVOUscMAfZY=
github.com/onsi/ginkgo/v2 v2.25.1/go.mod h1:ppTWQ1dh9KM/F1XgpeRqelR+zHVwV81DGRSDnFxK7Sk=
github.com/onsi/gomega v1.7.1/go.mod h1:XdKZgCCFLUoM/7CFJVPcG8C1xQ1AJ0vpAezJrB7JYyY=
github.com/onsi/gomega v1.10.1/go.mod h1:iN09h71vgCQne3DLsj+A5owkum+a2tYe+TOCB1ybHNo=
github.com/onsi/gomega v1.16.0/go.mod h1:HnhC7FXeEQY45zxNK3PPoIUhzk/80Xly9PcubAlGdZY=
github.com/onsi/gomega v1.37.0 h1:CdEG8g0S133B4OswTDC/5XPSzE1OeP29QOioj2PID2Y=
github.com/onsi/gomega v1.37.0/go.mod h1:8D9+Txp43QWKhM24yyOBEdpkzN8FvJyAwecBgsU4KU0=
github.com/onsi/gomega v1.38.1 h1:FaLA8GlcpXDwsb7m0h2A9ew2aTk3vnZMlzFgg5tz/pk=
github.com/onsi/gomega v1.38.1/go.mod h1:LfcV8wZLvwcYRwPiJysphKAEsmcFnLMK/9c+PjvlX8g=
github.com/opencontainers/go-digest v1.0.0 h1:apOUWs51W5PlhuyGyz9FCeeBIOUDA/6nW8Oi/yOhh5U=
github.com/opencontainers/go-digest v1.0.0/go.mod h1:0JzlMkj0TRzQZfJkVvzbP0HBR3IKzErnv2BNG4W4MAM=
github.com/opencontainers/image-spec v1.1.1 h1:y0fUlFfIZhPF1W537XOLg0/fcx6zcHCJwooC2xJA040=
@@ -270,8 +271,8 @@ github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR
github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o=
github.com/russross/blackfriday/v2 v2.1.0 h1:JIOH55/0cWyOuilr9/qlrm0BSXldqnqwMsf35Ld67mk=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/safchain/ethtool v0.5.10 h1:Im294gZtuf4pSGJRAOGKaASNi3wMeFaGaWuSaomedpc=
github.com/safchain/ethtool v0.5.10/go.mod h1:w9jh2Lx7YBR4UwzLkzCmWl85UY0W2uZdd7/DckVE5+c=
github.com/safchain/ethtool v0.6.2 h1:O3ZPFAKEUEfbtE6J/feEe2Ft7dIJ2Sy8t4SdMRiIMHY=
github.com/safchain/ethtool v0.6.2/go.mod h1:VS7cn+bP3Px3rIq55xImBiZGHVLNyBh5dqG6dDQy8+I=
github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ=
github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ=
github.com/spf13/pflag v1.0.6 h1:jFzHGLGAlb3ruxLB8MhbI6A8+AQX/2eW4qeyNZXNp2o=
@@ -295,9 +296,8 @@ github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635 h1:kdXcSzyDtse
github.com/syndtr/gocapability v0.0.0-20200815063812-42c35b437635/go.mod h1:hkRG7XYTFWNJGYcbNJQlaLq0fg1yr4J4t/NcTQtrfww=
github.com/urfave/cli v1.22.15 h1:nuqt+pdC/KqswQKhETJjo7pvn/k4xMUxgW6liI7XpnM=
github.com/urfave/cli v1.22.15/go.mod h1:wSan1hmo5zeyLGBjRJbzRTNk8gwoYa2B9n4q9dmRIc0=
github.com/vishvananda/netlink v1.3.1-0.20250303224720-0e7078ed04c8 h1:Y4egeTrP7sccowz2GWTJVtHlwkZippgBTpUmMteFUWQ=
github.com/vishvananda/netlink v1.3.1-0.20250303224720-0e7078ed04c8/go.mod h1:i6NetklAujEcC6fK0JPjT8qSwWyO0HLn4UKG+hGqeJs=
github.com/vishvananda/netns v0.0.4/go.mod h1:SpkAiCQRtJ6TvvxPnOSyH3BMl6unz3xZlaprSwhNNJM=
github.com/vishvananda/netlink v1.3.1 h1:3AEMt62VKqz90r0tmNhog0r/PpWKmrEShJU0wJW6bV0=
github.com/vishvananda/netlink v1.3.1/go.mod h1:ARtKouGSTGchR8aMwmkzC0qiNPrrWO5JS/XMVl45+b4=
github.com/vishvananda/netns v0.0.5 h1:DfiHV+j8bA32MFM7bfEunvT8IAqQ/NzSJHtcmW5zdEY=
github.com/vishvananda/netns v0.0.5/go.mod h1:SpkAiCQRtJ6TvvxPnOSyH3BMl6unz3xZlaprSwhNNJM=
github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM=
@@ -339,6 +339,8 @@ go.uber.org/automaxprocs v1.6.0 h1:O3y2/QNTOdbF+e/dpXNNW7Rx2hZ4sTIPyybbxyNqTUs=
go.uber.org/automaxprocs v1.6.0/go.mod h1:ifeIMSnPZuznNm6jmdzmU3/bfk01Fe2fotchwEFJ8r8=
go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
@@ -364,8 +366,8 @@ golang.org/x/net v0.0.0-20200520004742-59133d7f0dd7/go.mod h1:qpuaurCH72eLCgpAm/
golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/net v0.0.0-20201110031124-69a78807bb2b/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU=
golang.org/x/net v0.0.0-20210428140749-89ef3d95e781/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk=
golang.org/x/net v0.42.0 h1:jzkYrhi3YQWD6MLBJcsklgQsoAcw89EcZbJw8Z614hs=
golang.org/x/net v0.42.0/go.mod h1:FF1RA5d3u7nAYA4z2TkclSCKh68eSXtiFwcWQpPXdt8=
golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE=
golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg=
golang.org/x/oauth2 v0.0.0-20180821212333-d2e6202438be/go.mod h1:N/0e6XlmueqKjAGxoOufVs8QHGRruUQn6yWY3a++T0U=
golang.org/x/oauth2 v0.30.0 h1:dnDm7JmhM45NNpd8FDDeLhK6FwqbOf4MLCM9zb1BOHI=
golang.org/x/oauth2 v0.30.0/go.mod h1:B++QgG3ZKulg6sRPGD/mqlHQs5rB3Ml9erfeDY7xKlU=
@@ -395,15 +397,14 @@ golang.org/x/sys v0.0.0-20220817070843-5a390386f1f2/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.1.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.2.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.10.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.29.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/sys v0.34.0 h1:H5Y5sJ2L2JRdyv7ROF1he/lPdvFsd0mJHFw2ThKHxLA=
golang.org/x/sys v0.34.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/sys v0.35.0 h1:vz1N37gP5bs89s7He8XuIYXpyY0+QlsKmzipCbUtyxI=
golang.org/x/sys v0.35.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.27.0 h1:4fGWRpyh641NLlecmyl4LOe6yDdfaYNrGb2zdfo4JV4=
golang.org/x/text v0.27.0/go.mod h1:1D28KMCvyooCX9hBiosv5Tz/+YLxj0j7XhWjpSUF7CU=
golang.org/x/text v0.28.0 h1:rhazDwis8INMIwQ4tpjLDzUhx6RlXqZNPEM0huQojng=
golang.org/x/text v0.28.0/go.mod h1:U8nCwOR8jO/marOQ0QbDiOngZVEBB7MAiitBuMjXiNU=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20190114222345-bf090417da8b/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
golang.org/x/tools v0.0.0-20190226205152-f727befe758c/go.mod h1:9Yl7xja0Znq3iFh3HoIrodX9oNMXvdceNzlUR8zjMvY=
@@ -413,8 +414,8 @@ golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e/go.mod h1:b+2E5dAYhXwXZwtn
golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roYkvgYkIh4xh/qjgUK9TdY2XT94GE=
golang.org/x/tools v0.0.0-20201224043029-2b0845dc783e/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.34.0 h1:qIpSLOxeCYGg9TrcJokLBG4KFA6d795g0xkBkiESGlo=
golang.org/x/tools v0.34.0/go.mod h1:pAP9OwEaY1CAW3HOmg3hLZC5Z0CCmzjAF2UQMSqNARg=
golang.org/x/tools v0.36.0 h1:kWS0uv/zsvHEle1LbV5LE8QujrxB3wfQyxHfhOk0Qkg=
golang.org/x/tools v0.36.0/go.mod h1:WBDiHKJK8YgLHlcQPYQzNCkUxUypCaa5ZegCVutKm+s=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
@@ -446,8 +447,8 @@ google.golang.org/protobuf v1.23.1-0.20200526195155-81db48ad09cc/go.mod h1:EGpAD
google.golang.org/protobuf v1.25.0/go.mod h1:9JNX74DMeImyA3h4bdi1ymwjUzf21/xIlbajtzgsN7c=
google.golang.org/protobuf v1.26.0-rc.1/go.mod h1:jlhhOSvTdKEhbULTjvd4ARK9grFBp09yW+WbY/TyQbw=
google.golang.org/protobuf v1.26.0/go.mod h1:9q0QmTI4eRPtz6boOQmLYwt+qCgq0jsYwAQnmE0givc=
google.golang.org/protobuf v1.36.6 h1:z1NpPI8ku2WgiWnf+t9wTPsn6eP1L7ksHUlkfLvd9xY=
google.golang.org/protobuf v1.36.6/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
google.golang.org/protobuf v1.36.7 h1:IgrO7UwFQGJdRNXH/sQux4R1Dj1WAKcLElzeeRaXV2A=
google.golang.org/protobuf v1.36.7/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
@@ -469,6 +470,8 @@ k8s.io/apimachinery v0.33.0 h1:1a6kHrJxb2hs4t8EE5wuR/WxKDwGN1FKH3JvDtA0CIQ=
k8s.io/apimachinery v0.33.0/go.mod h1:BHW0YOu7n22fFv/JkYOEfkUYNRN0fj0BlvMFWA7b+SM=
k8s.io/cri-api v0.33.0 h1:YyGNgWmuSREqFPlP3XCstlHLilYdW898KwtKoaTYwBs=
k8s.io/cri-api v0.33.0/go.mod h1:OLQvT45OpIA+tv91ZrpuFIGY+Y2Ho23poS7n115Aocs=
k8s.io/kubelet v0.33.0 h1:4pJA2Ge6Rp0kDNV76KH7pTBiaV2T1a1874QHMcubuSU=
k8s.io/kubelet v0.33.0/go.mod h1:iDnxbJQMy9DUNaML5L/WUlt3uJtNLWh7ZAe0JSp4Yi0=
sigs.k8s.io/randfill v1.0.0 h1:JfjMILfT8A6RbawdsK2JXGBR5AQVfd+9TbzrlneTyrU=
sigs.k8s.io/randfill v1.0.0/go.mod h1:XeLlZ/jmk4i1HRopwe7/aU3H5n1zNUcX6TM94b3QxOY=
sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E=

View File

@@ -23,7 +23,6 @@ import (
containerd_types "github.com/containerd/containerd/api/types"
"github.com/containerd/containerd/mount"
"github.com/containerd/typeurl/v2"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device/config"
"github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/annotations"
@@ -113,19 +112,12 @@ func create(ctx context.Context, s *service, r *taskAPI.CreateTaskRequest) (*con
if s.sandbox != nil {
return nil, fmt.Errorf("cannot create another sandbox in sandbox: %s", s.sandbox.ID())
}
// Here we deal with CDI devices that are cold-plugged (k8s) and
// for the single_container (nerdctl, podman, ...) use-case.
// We can provide additional directories where to search for
// CDI specs if needed. immutable OS's only have specific
// directories where applications can write too. For instance /opt/cdi
//
// _, err = withCDI(ociSpec.Annotations, []string{"/opt/cdi"}, ociSpec)
_, err = config.WithCDI(ociSpec.Annotations, []string{}, ociSpec)
if err != nil {
return nil, fmt.Errorf("adding CDI devices failed: %w", err)
}
s.config = runtimeConfig
err = coldPlugDevices(ctx, s, ociSpec)
if err != nil {
return nil, fmt.Errorf("device cold plug failed: %w", err)
}
// create tracer
// This is the earliest location we can create the tracer because we must wait

View File

@@ -0,0 +1,146 @@
// Copyright (c) 2025 NVIDIA CORPORATION.
//
// SPDX-License-Identifier: Apache-2.0
//
package containerdshim
import (
"context"
"fmt"
"net"
"strings"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device/config"
"github.com/opencontainers/runtime-spec/specs-go"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
podresourcesv1 "k8s.io/kubelet/pkg/apis/podresources/v1"
)
const (
nameAnnotation = "io.kubernetes.cri.sandbox-name"
namespaceAnnotation = "io.kubernetes.cri.sandbox-namespace"
)
// coldPlugDevices handles cold plug of CDI devices into the sandbox
// kubelet's PodResources API is used for determining the devices to be
// cold plugged, if so configured. Otherwise, cdi annotations can be used for
// covering stand alone scenarios.
func coldPlugDevices(ctx context.Context, s *service, ociSpec *specs.Spec) error {
if s.config.HypervisorConfig.ColdPlugVFIO == config.NoPort {
// device cold plug is not enabled
shimLog.Debug("cold_plug_vfio not enabled, skip device cold plug")
return nil
}
kubeletSock := s.config.PodResourceAPISock
if kubeletSock != "" {
return coldPlugWithAPI(ctx, s, ociSpec)
}
shimLog.Debug("config.PodResourceAPISock not set, skip k8s based device cold plug")
// Here we deal with CDI devices that are cold-plugged
// for the single_container (nerdctl, podman, ...) use-case.
// We can provide additional directories where to search for
// CDI specs if needed. immutable OS's only have specific
// directories where applications can write too. For instance /opt/cdi
_, err := config.WithCDI(ociSpec.Annotations, []string{}, ociSpec)
if err != nil {
return fmt.Errorf("CDI device injection failed: %w", err)
}
return nil
}
func coldPlugWithAPI(ctx context.Context, s *service, ociSpec *specs.Spec) error {
ann := ociSpec.Annotations
devices, err := getDeviceSpec(ctx, s.config.PodResourceAPISock, ann)
if err != nil {
return err
}
if len(devices) == 0 {
shimLog.WithField("pod", debugPodID(ann)).Debug("No devices found in Pod Resources, skip cold plug")
return nil
}
err = config.InjectCDIDevices(ociSpec, devices)
if err != nil {
return fmt.Errorf("cold plug: CDI device injection failed: %w", err)
}
return nil
}
// getDeviceSpec fetches the device information for the pod sandbox using
// kubelet's pod resource api. This is necessary for cold plug because
// the Kubelet does not pass the device information via CRI during
// Sandbox creation.
func getDeviceSpec(ctx context.Context, socket string, ann map[string]string) ([]string, error) {
podName := ann[nameAnnotation]
podNs := ann[namespaceAnnotation]
// create dialer for unix socket
dialer := func(ctx context.Context, target string) (net.Conn, error) {
// need this workaround to avoid duplicate prefix
addr := strings.TrimPrefix(target, "unix://")
return (&net.Dialer{}).DialContext(ctx, "unix", addr)
}
target := fmt.Sprintf("unix://%s", socket)
conn, err := grpc.NewClient(
target,
grpc.WithTransportCredentials(insecure.NewCredentials()),
grpc.WithContextDialer(dialer),
grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(16*1024*1024)),
)
if err != nil {
return nil, fmt.Errorf("cold plug: failed to connect to kubelet: %w", err)
}
defer conn.Close()
// create client
client := podresourcesv1.NewPodResourcesListerClient(conn)
// get all pod resources
prr := &podresourcesv1.GetPodResourcesRequest{
PodName: podName,
PodNamespace: podNs,
}
resp, err := client.Get(ctx, prr)
if err != nil {
return nil, fmt.Errorf("cold plug: GetPodResources failed: %w", err)
}
podRes := resp.PodResources
if podRes == nil {
return nil, fmt.Errorf("cold plug: PodResources is nil")
}
// Process results
var devices []string
for _, container := range podRes.Containers {
for _, d := range container.Devices {
shimLog.WithField("container", container.Name).Debugf("Pod Resources Device: %s = %v\n",
d.ResourceName, d.DeviceIds)
cdiDevs := formatCDIDevIDs(d.ResourceName, d.DeviceIds)
devices = append(devices, cdiDevs...)
}
}
return devices, nil
}
// formatCDIDevIDs formats the way CDI package expects
func formatCDIDevIDs(specName string, devIDs []string) []string {
var result []string
for _, id := range devIDs {
result = append(result, fmt.Sprintf("%s=%s", specName, id))
}
return result
}
func debugPodID(ann map[string]string) string {
return fmt.Sprintf("%s/%s", ann[namespaceAnnotation], ann[nameAnnotation])
}

View File

@@ -15,6 +15,7 @@ import (
"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
"github.com/go-ini/ini"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device"
vcTypes "github.com/kata-containers/kata-containers/src/runtime/virtcontainers/types"
"github.com/opencontainers/runtime-spec/specs-go"
"golang.org/x/sys/unix"
@@ -431,6 +432,16 @@ type VFIODev struct {
HostPath string
}
// IOMMUFDID returns the IOMMUFD ID if the VFIO device is backed by IOMMUFD
// otherwise returns an empty string.
func (t VFIODev) IOMMUFDID() string {
if !strings.HasPrefix(t.DevfsDev, device.IommufdDevPath) {
return ""
}
basename := filepath.Base(t.DevfsDev)
return strings.TrimPrefix(basename, "vfio")
}
// RNGDev represents a random number generator device
type RNGDev struct {
// ID is used to identify the device in the hypervisor options.
@@ -684,6 +695,25 @@ func WithCDI(annotations map[string]string, cdiSpecDirs []string, spec *specs.Sp
return spec, nil
}
if err = injectDevices(cdiSpecDirs, spec, devsFromAnnotations); err != nil {
return nil, err
}
// One crucial thing to keep in mind is that CDI device injection
// might add OCI Spec environment variables, hooks, and mounts as
// well. Therefore it is important that none of the corresponding
// OCI Spec fields are reset up in the call stack once we return.
return spec, nil
}
// InjectCDIDevices injects the specified devices into the oci spec.
// Devices must be a slice of strings of the form
// vendor.com/class=unique_name
func InjectCDIDevices(spec *specs.Spec, devices []string) error {
return injectDevices(nil, spec, devices)
}
func injectDevices(cdiSpecDirs []string, spec *specs.Spec, devices []string) error {
var registry cdi.Registry
if len(cdiSpecDirs) > 0 {
// We can override the directories where to search for CDI specs
@@ -693,22 +723,13 @@ func WithCDI(annotations map[string]string, cdiSpecDirs []string, spec *specs.Sp
registry = cdi.GetRegistry()
}
if err = registry.Refresh(); err != nil {
// We don't consider registry refresh failure a fatal error.
// For instance, a dynamically generated invalid CDI Spec file for
// any particular vendor shouldn't prevent injection of devices of
// different vendors. CDI itself knows better and it will fail the
// injection if necessary.
return nil, fmt.Errorf("CDI registry refresh failed: %w", err)
if err := registry.Refresh(); err != nil {
return fmt.Errorf("CDI registry refresh failed: %w", err)
}
if _, err := registry.InjectDevices(spec, devsFromAnnotations...); err != nil {
return nil, fmt.Errorf("CDI device injection failed: %w", err)
if _, err := registry.InjectDevices(spec, devices...); err != nil {
return fmt.Errorf("CDI device injection failed: %w", err)
}
// One crucial thing to keep in mind is that CDI device injection
// might add OCI Spec environment variables, hooks, and mounts as
// well. Therefore it is important that none of the corresponding
// OCI Spec fields are reset up in the call stack once we return.
return spec, nil
return nil
}

View File

@@ -68,3 +68,24 @@ func TestGetSysDevPathImpl(t *testing.T) {
assert.Contains(path, expectedFormat)
assert.Contains(path, "block")
}
func TestIOMMUFDID(t *testing.T) {
for _, tc := range []struct {
devfsDev string
expected string
}{
{"/dev/vfio/42", ""},
{"/dev/vfio/devices/vfio99", "99"},
{"/dev/vfio/invalid", ""},
{"/dev/other/42", ""},
} {
t.Run(tc.devfsDev, func(t *testing.T) {
assert := assert.New(t)
info := VFIODev{
DevfsDev: "/dev/vfio/devices/vfio5",
}
assert.Equal("5", info.IOMMUFDID())
})
}
}

View File

@@ -0,0 +1,11 @@
// Copyright (c) 2017-2018 Intel Corporation
// Copyright (c) 2018 Huawei Corporation
//
// SPDX-License-Identifier: Apache-2.0
//
package device
const (
IommufdDevPath = "/dev/vfio/devices"
)

View File

@@ -160,7 +160,7 @@ func checkIgnorePCIClass(pciClass string, deviceBDF string, bitmask uint64) (boo
return false, nil
}
func getMajorMinorFromDevPath(devPath string) (uint32, uint32, error) {
func GetMajorMinorFromDevPath(devPath string) (uint32, uint32, error) {
fi, err := os.Stat(devPath)
if err != nil {
return 0, 0, err
@@ -181,7 +181,7 @@ func extractIndex(devicePath string) (string, error) {
return strings.TrimPrefix(base, prefix), nil
}
func getBdfFromVFIODev(major uint32, minor uint32) (string, error) {
func GetBDFFromVFIODev(major uint32, minor uint32) (string, error) {
devPath := fmt.Sprintf("/sys/dev/char/%d:%d", major, minor)
realPath, err := filepath.EvalSymlinks(devPath)
if err != nil {
@@ -203,13 +203,13 @@ func GetDeviceFromVFIODev(device config.DeviceInfo) ([]*config.VFIODev, error) {
// device major:minor entries in /sys/chart/major:minor
// $ ls -l /dev/vfio/devices/vfio0
// crw------- 1 root root 237, 0 Jan 15 16:53 /dev/vfio/devices/vfio0
major, minor, err := getMajorMinorFromDevPath(device.HostPath)
major, minor, err := GetMajorMinorFromDevPath(device.HostPath)
if err != nil {
return nil, fmt.Errorf("Failed to get major:minor from %s: %v", device.HostPath, err)
}
// $ ls -l /sys/dev/char/237:0
// /sys/dev/char/237:0 -> ../../devices/pci0000:64/0000:64:00.0/0000:65:00.0/vfio-dev/vfio0
deviceBDF, err := getBdfFromVFIODev(major, minor)
deviceBDF, err := GetBDFFromVFIODev(major, minor)
if err != nil {
return nil, err
}

View File

@@ -15,6 +15,7 @@ import (
"github.com/sirupsen/logrus"
pkgDevice "github.com/kata-containers/kata-containers/src/runtime/pkg/device"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device/api"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device/config"
"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
@@ -28,7 +29,6 @@ const (
iommuGroupPath = "/sys/bus/pci/devices/%s/iommu_group"
vfioDevPath = "/dev/vfio/%s"
vfioAPSysfsDir = "/sys/devices/vfio_ap"
IommufdDevPath = "/dev/vfio/devices"
)
// VFIODevice is a vfio device meant to be passed to the hypervisor
@@ -69,7 +69,7 @@ func (device *VFIODevice) Attach(ctx context.Context, devReceiver api.DeviceRece
// In the case of IOMMUFD the device.HostPath will look like
// /dev/vfio/devices/vfio0
// (1) Check if we have the new IOMMUFD or old container based VFIO
if strings.HasPrefix(device.DeviceInfo.HostPath, IommufdDevPath) {
if strings.HasPrefix(device.DeviceInfo.HostPath, pkgDevice.IommufdDevPath) {
device.VfioDevs, err = GetDeviceFromVFIODev(*device.DeviceInfo)
if err != nil {
return err

View File

@@ -27,7 +27,7 @@ import (
"strings"
"syscall"
"github.com/kata-containers/kata-containers/src/runtime/pkg/device/drivers"
pkgDevice "github.com/kata-containers/kata-containers/src/runtime/pkg/device"
)
// Machine describes the machine type qemu will emulate.
@@ -2023,7 +2023,7 @@ func (vfioDev VFIODevice) QemuParams(config *Config) []string {
deviceParams = append(deviceParams, fmt.Sprintf("devno=%s", vfioDev.DevNo))
}
if strings.HasPrefix(vfioDev.DevfsDev, drivers.IommufdDevPath) {
if strings.HasPrefix(vfioDev.DevfsDev, pkgDevice.IommufdDevPath) {
qemuParams = append(qemuParams, "-object")
qemuParams = append(qemuParams, fmt.Sprintf("iommufd,id=iommufd%s", vfioDev.ID))
deviceParams = append(deviceParams, fmt.Sprintf("iommufd=iommufd%s", vfioDev.ID))

View File

@@ -43,6 +43,10 @@ type QMPLog interface {
// Errorf writes error output to the log. A newline will be
// added to the output if one is not provided.
Errorf(string, ...interface{})
// Debugf writes debug output to the log. A newline will be
// added to the output if one is not provided.
Debugf(string, ...interface{})
}
type qmpNullLogger struct{}
@@ -60,6 +64,9 @@ func (l qmpNullLogger) Warningf(format string, v ...interface{}) {
func (l qmpNullLogger) Errorf(format string, v ...interface{}) {
}
func (l qmpNullLogger) Debugf(format string, v ...interface{}) {
}
// QMPConfig is a configuration structure that can be used to specify a
// logger and a channel to which logs and QMP events are to be sent. If
// neither of these fields are specified, or are set to nil, no logs will be
@@ -653,6 +660,7 @@ func (q *QMP) executeCommandWithResponse(ctx context.Context, name string, args
func (q *QMP) executeCommand(ctx context.Context, name string, args map[string]interface{},
filter *qmpEventFilter) error {
q.cfg.Logger.Debugf("Executing QMP command: %s: %v", name, args)
_, err := q.executeCommandWithResponse(ctx, name, args, nil, filter)
return err
@@ -1101,7 +1109,7 @@ func (q *QMP) ExecuteDeviceDel(ctx context.Context, devID string) error {
// disableModern indicates if virtio version 1.0 should be replaced by the
// former version 0.9, as there is a KVM bug that occurs when using virtio
// 1.0 in nested environments.
func (q *QMP) ExecutePCIDeviceAdd(ctx context.Context, blockdevID, devID, driver, addr, bus, romfile string, queues int, shared, disableModern bool) error {
func (q *QMP) ExecutePCIDeviceAdd(ctx context.Context, blockdevID, devID, driver, addr, bus, romfile string, queues int, shared, disableModern bool, iothreadID string) error {
args := map[string]interface{}{
"id": devID,
"driver": driver,
@@ -1128,6 +1136,10 @@ func (q *QMP) ExecutePCIDeviceAdd(ctx context.Context, blockdevID, devID, driver
}
}
if iothreadID != "" {
args["iothread"] = iothreadID
}
return q.executeCommand(ctx, "device_add", args, nil)
}
@@ -1156,7 +1168,8 @@ func (q *QMP) ExecutePCIVhostUserDevAdd(ctx context.Context, driver, devID, char
// devID is the id of the device to add. Must be valid QMP identifier.
// bdf is the PCI bus-device-function of the pci device.
// bus is optional. When hot plugging a PCIe device, the bus can be the ID of the pcie-root-port.
func (q *QMP) ExecuteVFIODeviceAdd(ctx context.Context, devID, bdf, bus, romfile string) error {
// iommufdID is the ID of the iommufd object to be created for this device. If empty, no iommufd object will be created.
func (q *QMP) ExecuteVFIODeviceAdd(ctx context.Context, devID, bdf, bus, romfile string, iommufdID string) error {
var driver string
var transport VirtioTransport
@@ -1175,6 +1188,17 @@ func (q *QMP) ExecuteVFIODeviceAdd(ctx context.Context, devID, bdf, bus, romfile
if bus != "" {
args["bus"] = bus
}
if iommufdID != "" {
iommufdIDFull := "iommufd" + iommufdID
objectAddArgs := map[string]interface{}{
"qom-type": "iommufd",
"id": iommufdIDFull,
}
if err := q.executeCommand(ctx, "object-add", objectAddArgs, nil); err != nil {
return err
}
args["iommufd"] = iommufdIDFull
}
return q.executeCommand(ctx, "device_add", args, nil)
}

View File

@@ -50,6 +50,10 @@ func (l qmpTestLogger) Errorf(format string, v ...interface{}) {
l.Infof(format, v...)
}
func (l qmpTestLogger) Debugf(format string, v ...interface{}) {
l.Infof(format, v...)
}
// nolint: govet
type qmpTestCommand struct {
name string
@@ -1066,7 +1070,7 @@ func TestQMPPCIDeviceAdd(t *testing.T) {
blockdevID := fmt.Sprintf("drive_%s", volumeUUID)
devID := fmt.Sprintf("device_%s", volumeUUID)
err := q.ExecutePCIDeviceAdd(context.Background(), blockdevID, devID,
"virtio-blk-pci", "0x1", "", "", 1, true, false)
"virtio-blk-pci", "0x1", "", "", 1, true, false, "")
if err != nil {
t.Fatalf("Unexpected error %v", err)
}
@@ -1136,6 +1140,51 @@ func TestQMPAPVFIOMediatedDeviceAdd(t *testing.T) {
<-disconnectedCh
}
func TestExecuteVFIODeviceAdd(t *testing.T) {
bdf := "04:00.0"
romfile := ""
for _, tc := range []struct {
name string
iommufdID string
}{
{
name: "with IOMMUFD",
iommufdID: "0",
},
{
name: "without IOMMUFD",
iommufdID: "",
},
} {
t.Run(tc.name, func(t *testing.T) {
connectedCh := make(chan *QMPVersion)
disconnectedCh := make(chan struct{})
buf := newQMPTestCommandBuffer(t)
// Note: At the time of writing, the QMPTestCommandBuffer does not
// support verifying parameters passed to object-add and device_add.
// So we just verify that the commands are sent in the correct order.
if tc.iommufdID != "" {
buf.AddCommand("object-add", nil, "return", nil)
}
buf.AddCommand("device_add", nil, "return", nil)
cfg := QMPConfig{Logger: qmpTestLogger{}}
q := startQMPLoop(buf, cfg, connectedCh, disconnectedCh)
checkVersion(t, connectedCh)
err := q.ExecuteVFIODeviceAdd(context.Background(), "devID", bdf, "rp1", romfile, tc.iommufdID)
if err != nil {
t.Fatalf("Unexpected error %v", err)
}
q.Shutdown()
<-disconnectedCh
})
}
}
// Checks that CPU are correctly added using device_add
func TestQMPCPUDeviceAdd(t *testing.T) {
drivers := []string{"host-x86_64-cpu", "host-s390x-cpu", "host-powerpc64-cpu"}

View File

@@ -207,41 +207,42 @@ const (
)
type RuntimeConfigOptions struct {
Hypervisor string
HypervisorPath string
DefaultGuestHookPath string
KernelPath string
ImagePath string
RootfsType string
KernelParams string
MachineType string
LogPath string
BlockDeviceDriver string
BlockDeviceAIO string
SharedFS string
VirtioFSDaemon string
JaegerEndpoint string
JaegerUser string
JaegerPassword string
PFlash []string
HotPlugVFIO config.PCIePort
ColdPlugVFIO config.PCIePort
PCIeRootPort uint32
PCIeSwitchPort uint32
DefaultVCPUCount uint32
DefaultMaxVCPUCount uint32
DefaultMemSize uint32
DefaultMaxMemorySize uint64
DefaultMsize9p uint32
DisableBlock bool
EnableIOThreads bool
DisableNewNetNs bool
HypervisorDebug bool
RuntimeDebug bool
RuntimeTrace bool
AgentDebug bool
AgentTrace bool
EnablePprof bool
Hypervisor string
HypervisorPath string
DefaultGuestHookPath string
KernelPath string
ImagePath string
RootfsType string
KernelParams string
MachineType string
LogPath string
BlockDeviceDriver string
BlockDeviceAIO string
SharedFS string
VirtioFSDaemon string
JaegerEndpoint string
JaegerUser string
JaegerPassword string
PFlash []string
HotPlugVFIO config.PCIePort
ColdPlugVFIO config.PCIePort
PCIeRootPort uint32
PCIeSwitchPort uint32
DefaultVCPUCount uint32
DefaultMaxVCPUCount uint32
DefaultMemSize uint32
DefaultMaxMemorySize uint64
DefaultMsize9p uint32
DefaultIndepIOThreads uint32
DisableBlock bool
EnableIOThreads bool
DisableNewNetNs bool
HypervisorDebug bool
RuntimeDebug bool
RuntimeTrace bool
AgentDebug bool
AgentTrace bool
EnablePprof bool
}
// ContainerIDTestDataType is a type used to test Container and Sandbox ID's.
@@ -318,6 +319,7 @@ func MakeRuntimeConfigFileData(config RuntimeConfigOptions) string {
default_memory = ` + strconv.FormatUint(uint64(config.DefaultMemSize), 10) + `
disable_block_device_use = ` + strconv.FormatBool(config.DisableBlock) + `
enable_iothreads = ` + strconv.FormatBool(config.EnableIOThreads) + `
indep_iothreads = ` + strconv.FormatUint(uint64(config.DefaultIndepIOThreads), 10) + `
cold_plug_vfio = "` + config.ColdPlugVFIO.String() + `"
hot_plug_vfio = "` + config.HotPlugVFIO.String() + `"
pcie_root_port = ` + strconv.FormatUint(uint64(config.PCIeRootPort), 10) + `

View File

@@ -74,6 +74,7 @@ const defaultBlockDeviceCacheSet bool = false
const defaultBlockDeviceCacheDirect bool = false
const defaultBlockDeviceCacheNoflush bool = false
const defaultEnableIOThreads bool = false
const defaultIndepIOThreads uint32 = 0
const defaultEnableMemPrealloc bool = false
const defaultEnableReclaimGuestFreedMemory bool = false
const defaultEnableHugePages bool = false

View File

@@ -157,6 +157,7 @@ type hypervisor struct {
Debug bool `toml:"enable_debug"`
DisableNestingChecks bool `toml:"disable_nesting_checks"`
EnableIOThreads bool `toml:"enable_iothreads"`
IndepIOThreads uint32 `toml:"indep_iothreads"`
DisableImageNvdimm bool `toml:"disable_image_nvdimm"`
HotPlugVFIO config.PCIePort `toml:"hot_plug_vfio"`
ColdPlugVFIO config.PCIePort `toml:"cold_plug_vfio"`
@@ -196,6 +197,7 @@ type runtime struct {
CreateContainerTimeout uint64 `toml:"create_container_timeout"`
DanConf string `toml:"dan_conf"`
ForceGuestPull bool `toml:"experimental_force_guest_pull"`
PodResourceAPISock string `toml:"pod_resource_api_sock"`
}
type agent struct {
@@ -622,6 +624,14 @@ func (h hypervisor) msize9p() uint32 {
return h.Msize9p
}
func (h hypervisor) indepiothreads() uint32 {
if h.IndepIOThreads == 0 {
return defaultIndepIOThreads
}
return h.IndepIOThreads
}
func (h hypervisor) guestHookPath() string {
if h.GuestHookPath == "" {
return defaultGuestHookPath
@@ -818,6 +828,7 @@ func newFirecrackerHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
DisableNestingChecks: h.DisableNestingChecks,
BlockDeviceDriver: blockDriver,
EnableIOThreads: h.EnableIOThreads,
IndepIOThreads: h.indepiothreads(),
DisableVhostNet: true, // vhost-net backend is not supported in Firecracker
GuestHookPath: h.guestHookPath(),
RxRateLimiterMaxRate: rxRateLimiterMaxRate,
@@ -973,6 +984,7 @@ func newQemuHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
BlockDeviceCacheNoflush: h.BlockDeviceCacheNoflush,
EnableIOThreads: h.EnableIOThreads,
IndepIOThreads: h.indepiothreads(),
Msize9p: h.msize9p(),
DisableImageNvdimm: h.DisableImageNvdimm,
HotPlugVFIO: h.hotPlugVFIO(),
@@ -1105,6 +1117,7 @@ func newClhHypervisorConfig(h hypervisor) (vc.HypervisorConfig, error) {
BlockDeviceCacheSet: h.BlockDeviceCacheSet,
BlockDeviceCacheDirect: h.BlockDeviceCacheDirect,
EnableIOThreads: h.EnableIOThreads,
IndepIOThreads: h.indepiothreads(),
Msize9p: h.msize9p(),
DisableImageNvdimm: h.DisableImageNvdimm,
ColdPlugVFIO: h.coldPlugVFIO(),
@@ -1464,6 +1477,7 @@ func GetDefaultHypervisorConfig() vc.HypervisorConfig {
BlockDeviceCacheDirect: defaultBlockDeviceCacheDirect,
BlockDeviceCacheNoflush: defaultBlockDeviceCacheNoflush,
EnableIOThreads: defaultEnableIOThreads,
IndepIOThreads: defaultIndepIOThreads,
Msize9p: defaultMsize9p,
ColdPlugVFIO: defaultColdPlugVFIO,
HotPlugVFIO: defaultHotPlugVFIO,
@@ -1602,6 +1616,7 @@ func LoadConfiguration(configPath string, ignoreLogging bool) (resolvedConfigPat
}
config.ForceGuestPull = tomlConf.Runtime.ForceGuestPull
config.PodResourceAPISock = tomlConf.Runtime.PodResourceAPISock
return resolved, config, nil
}

View File

@@ -174,6 +174,25 @@ type RuntimeConfig struct {
// ForceGuestPull enforces guest pull independent of snapshotter annotations.
ForceGuestPull bool
// PodResourceAPISock specifies the unix socket for the Kubelet's
// PodResource API endpoint. If empty, kubernetes based cold plug
// will not be attempted. In order for this feature to work, the
// KubeletPodResourcesGet featureGate must be enabled in Kubelet,
// if using Kubelet older than 1.34.
//
// The pod resource API's socket is relative to the Kubelet's root-dir,
// which is defined by the cluster admin, and its location is:
// ${KubeletRootDir}/pod-resources/kubelet.sock
//
// HypervisorConfig.ColdPlugVFIO acts as a feature gate:
// ColdPlugVFIO = NoPort => no cold plug
// ColdPlugVFIO != NoPort AND PodResourceAPISock = "" => need
// explicit CDI annotation for cold plug (applies mainly
// to non-k8s cases)
// ColdPlugVFIO != NoPort AND PodResourceAPISock != "" => kubelet
// based cold plug.
PodResourceAPISock string
}
// AddKernelParam allows the addition of new kernel parameters to an existing
@@ -596,7 +615,20 @@ func addHypervisorPathOverrides(ocispec specs.Spec, config *vc.SandboxConfig, ru
if value, ok := ocispec.Annotations[vcAnnotations.KernelParams]; ok {
if value != "" {
params := vc.DeserializeParams(strings.Fields(value))
// Annotation parameters should replace existing parameters with the same key
// rather than append, to allow overriding default values
for _, param := range params {
// Remove any existing parameter with the same key
var newParams []vc.Param
for _, existingParam := range config.HypervisorConfig.KernelParams {
if existingParam.Key != param.Key {
newParams = append(newParams, existingParam)
}
}
config.HypervisorConfig.KernelParams = newParams
// Now add the annotation parameter
if err := config.HypervisorConfig.AddKernelParam(param); err != nil {
return fmt.Errorf("Error adding kernel parameters in annotation kernel_params : %v", err)
}
@@ -840,6 +872,17 @@ func addHypervisorBlockOverrides(ocispec specs.Spec, sbConfig *vc.SandboxConfig)
return err
}
if err := newAnnotationConfiguration(ocispec, vcAnnotations.IndepIOThreads).setUintWithCheck(func(indepiothreads uint64) error {
// Default indepiothreads limit is less than 50.
if indepiothreads == 0 || indepiothreads > 50 {
return fmt.Errorf("Error parsing annotation for indepiothreads, please specify numeric value less than 50")
}
sbConfig.HypervisorConfig.IndepIOThreads = uint32(indepiothreads)
return nil
}); err != nil {
return err
}
if err := newAnnotationConfiguration(ocispec, vcAnnotations.BlockDeviceCacheSet).setBool(func(blockDeviceCacheSet bool) {
sbConfig.HypervisorConfig.BlockDeviceCacheSet = blockDeviceCacheSet
}); err != nil {

View File

@@ -0,0 +1,12 @@
Language: Cpp
BasedOnStyle: Microsoft
BreakBeforeBraces: Attach
PointerAlignment: Left
AllowShortFunctionsOnASingleLine: All
# match Go style
IndentCaseLabels: false
# don't break comments over line limit (needed for CodeQL exceptions)
ReflowComments: false
InsertNewlineAtEOF: true
KeepEmptyLines:
AtEndOfFile: true

View File

@@ -5,9 +5,6 @@ run:
- admin
- functional
- integration
skip-dirs:
# paths are relative to module root
- cri-containerd/test-images
linters:
enable:
@@ -34,13 +31,15 @@ linters-settings:
# struct order is often for Win32 compat
# also, ignore pointer bytes/GC issues for now until performance becomes an issue
- fieldalignment
check-shadowing: true
stylecheck:
# https://staticcheck.io/docs/checks
checks: ["all"]
issues:
exclude-dirs:
# paths are relative to module root
- cri-containerd/test-images
exclude-rules:
# err is very often shadowed in nested scopes
- linters:
@@ -70,22 +69,22 @@ issues:
- path: layer.go
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: hcsshim.go
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: cmd\\ncproxy\\nodenetsvc\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: cmd\\ncproxy_mock\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\hcs\\schema2\\
linters:
@@ -95,67 +94,67 @@ issues:
- path: internal\\wclayer\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: hcn\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\hcs\\schema1\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\hns\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: ext4\\internal\\compactext4\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: ext4\\internal\\format\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\guestrequest\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\guest\\prot\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\windevice\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\winapi\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\vmcompute\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\regstate\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
- path: internal\\hcserror\\
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"
# v0 APIs are deprecated, but still retained for backwards compatability
- path: cmd\\ncproxy\\
@@ -171,4 +170,4 @@ issues:
- path: internal\\vhdx\\info
linters:
- stylecheck
Text: "ST1003:"
text: "ST1003:"

View File

@@ -1,13 +1,20 @@
BASE:=base.tar.gz
DEV_BUILD:=0
include Makefile.bootfiles
GO:=go
GO_FLAGS:=-ldflags "-s -w" # strip Go binaries
CGO_ENABLED:=0
GOMODVENDOR:=
KMOD:=0
CFLAGS:=-O2 -Wall
LDFLAGS:=-static -s # strip C binaries
LDFLAGS:=-static -s #strip C binaries
LDLIBS:=
PREPROCESSORFLAGS:=
ifeq "$(KMOD)" "1"
LDFLAGS:= -s
LDLIBS:= -lkmod
PREPROCESSORFLAGS:=-DMODULES=1
endif
GO_FLAGS_EXTRA:=
ifeq "$(GOMODVENDOR)" "1"
@@ -23,108 +30,14 @@ SRCROOT=$(dir $(abspath $(firstword $(MAKEFILE_LIST))))
# additional directories to search for rule prerequisites and targets
VPATH=$(SRCROOT)
DELTA_TARGET=out/delta.tar.gz
ifeq "$(DEV_BUILD)" "1"
DELTA_TARGET=out/delta-dev.tar.gz
endif
ifeq "$(SNP_BUILD)" "1"
DELTA_TARGET=out/delta-snp.tar.gz
endif
# The link aliases for gcstools
GCS_TOOLS=\
generichook \
install-drivers
# Common path prefix.
PATH_PREFIX:=
# These have PATH_PREFIX prepended to obtain the full path in recipies e.g. $(PATH_PREFIX)/$(VMGS_TOOL)
VMGS_TOOL:=
IGVM_TOOL:=
KERNEL_PATH:=
.PHONY: all always rootfs test snp simple
.DEFAULT_GOAL := all
all: out/initrd.img out/rootfs.tar.gz
clean:
find -name '*.o' -print0 | xargs -0 -r rm
rm -rf bin deps rootfs out
test:
cd $(SRCROOT) && $(GO) test -v ./internal/guest/...
rootfs: out/rootfs.vhd
snp: out/kernelinitrd.vmgs out/rootfs.hash.vhd out/rootfs.vhd out/v2056.vmgs
simple: out/simple.vmgs snp
%.vmgs: %.bin
rm -f $@
# du -BM returns the size of the bin file in M, eg 7M. The sed command replaces the M with *1024*1024 and then bc does the math to convert to bytes
$(PATH_PREFIX)/$(VMGS_TOOL) create --filepath $@ --filesize `du -BM $< | sed "s/M.*/*1024*1024/" | bc`
$(PATH_PREFIX)/$(VMGS_TOOL) write --filepath $@ --datapath $< -i=8
# Simplest debug UVM used to test changes to the linux kernel. No dmverity protection. Boots an initramdisk rather than directly booting a vhd disk.
out/simple.bin: out/initrd.img $(PATH_PREFIX)/$(KERNEL_PATH) boot/startup_simple.sh
rm -f $@
python3 $(PATH_PREFIX)/$(IGVM_TOOL) -o $@ -kernel $(PATH_PREFIX)/$(KERNEL_PATH) -append "8250_core.nr_uarts=0 panic=-1 debug loglevel=7 rdinit=/startup_simple.sh" -rdinit out/initrd.img -vtl 0
ROOTFS_DEVICE:=/dev/sda
VERITY_DEVICE:=/dev/sdb
# Debug build for use with uvmtester. UVM with dm-verity protected vhd disk mounted directly via the kernel command line. Ignores corruption in dm-verity protected disk. (Use dmesg to see if dm-verity is ignoring data corruption.)
out/v2056.bin: out/rootfs.vhd out/rootfs.hash.vhd $(PATH_PREFIX)/$(KERNEL_PATH) out/rootfs.hash.datasectors out/rootfs.hash.datablocksize out/rootfs.hash.hashblocksize out/rootfs.hash.datablocks out/rootfs.hash.rootdigest out/rootfs.hash.salt boot/startup_v2056.sh
rm -f $@
python3 $(PATH_PREFIX)/$(IGVM_TOOL) -o $@ -kernel $(PATH_PREFIX)/$(KERNEL_PATH) -append "8250_core.nr_uarts=0 panic=-1 debug loglevel=7 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(VERITY_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) 0 sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt) 1 ignore_corruption\" init=/startup_v2056.sh" -vtl 0
# Full UVM with dm-verity protected vhd disk mounted directly via the kernel command line.
out/kernelinitrd.bin: out/rootfs.vhd out/rootfs.hash.vhd out/rootfs.hash.datasectors out/rootfs.hash.datablocksize out/rootfs.hash.hashblocksize out/rootfs.hash.datablocks out/rootfs.hash.rootdigest out/rootfs.hash.salt $(PATH_PREFIX)/$(KERNEL_PATH) boot/startup.sh
rm -f $@
python3 $(PATH_PREFIX)/$(IGVM_TOOL) -o $@ -kernel $(PATH_PREFIX)/$(KERNEL_PATH) -append "8250_core.nr_uarts=0 panic=-1 debug loglevel=7 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(VERITY_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) 0 sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt)\" init=/startup.sh" -vtl 0
# Rule to make a vhd from a file. This is used to create the rootfs.hash.vhd from rootfs.hash.
%.vhd: % bin/cmd/tar2ext4
./bin/cmd/tar2ext4 -only-vhd -i $< -o $@
# Rule to make a vhd from an ext4 file. This is used to create the rootfs.vhd from rootfs.ext4.
%.vhd: %.ext4 bin/cmd/tar2ext4
./bin/cmd/tar2ext4 -only-vhd -i $< -o $@
%.hash %.hash.info %.hash.datablocks %.hash.rootdigest %hash.datablocksize %.hash.datasectors %.hash.hashblocksize: %.ext4 %.hash.salt
veritysetup format --no-superblock --salt $(shell cat out/rootfs.hash.salt) $< $*.hash > $*.hash.info
# Retrieve info required by dm-verity at boot time
# Get the blocksize of rootfs
cat $*.hash.info | awk '/^Root hash:/{ print $$3 }' > $*.hash.rootdigest
cat $*.hash.info | awk '/^Salt:/{ print $$2 }' > $*.hash.salt
cat $*.hash.info | awk '/^Data block size:/{ print $$4 }' > $*.hash.datablocksize
cat $*.hash.info | awk '/^Hash block size:/{ print $$4 }' > $*.hash.hashblocksize
cat $*.hash.info | awk '/^Data blocks:/{ print $$3 }' > $*.hash.datablocks
echo $$(( $$(cat $*.hash.datablocks) * $$(cat $*.hash.datablocksize) / 512 )) > $*.hash.datasectors
out/rootfs.hash.salt:
hexdump -vn32 -e'8/4 "%08X" 1 "\n"' /dev/random > $@
out/rootfs.ext4: out/rootfs.tar.gz bin/cmd/tar2ext4
gzip -f -d ./out/rootfs.tar.gz
./bin/cmd/tar2ext4 -i ./out/rootfs.tar -o $@
out/rootfs.tar.gz: out/initrd.img
rm -rf rootfs-conv
mkdir rootfs-conv
gunzip -c out/initrd.img | (cd rootfs-conv && cpio -imd)
tar -zcf $@ -C rootfs-conv .
rm -rf rootfs-conv
out/initrd.img: $(BASE) $(DELTA_TARGET) $(SRCROOT)/hack/catcpio.sh
$(SRCROOT)/hack/catcpio.sh "$(BASE)" $(DELTA_TARGET) > out/initrd.img.uncompressed
gzip -c out/initrd.img.uncompressed > $@
rm out/initrd.img.uncompressed
# This target includes utilities which may be useful for testing purposes.
out/delta-dev.tar.gz: out/delta.tar.gz bin/internal/tools/snp-report
rm -rf rootfs-dev
@@ -168,10 +81,7 @@ out/delta.tar.gz: bin/init bin/vsockexec bin/cmd/gcs bin/cmd/gcstools bin/cmd/ho
tar -zcf $@ -C rootfs .
rm -rf rootfs
out/containerd-shim-runhcs-v1.exe:
GOOS=windows $(GO_BUILD) -o $@ $(SRCROOT)/cmd/containerd-shim-runhcs-v1
bin/cmd/gcs bin/cmd/gcstools bin/cmd/hooks/wait-paths bin/cmd/tar2ext4 bin/internal/tools/snp-report bin/cmd/dmverity-vhd:
bin/cmd/gcs bin/cmd/gcstools bin/cmd/hooks/wait-paths bin/cmd/tar2ext4 bin/internal/tools/snp-report:
@mkdir -p $(dir $@)
GOOS=linux $(GO_BUILD) -o $@ $(SRCROOT)/$(@:bin/%=%)
@@ -181,8 +91,8 @@ bin/vsockexec: vsockexec/vsockexec.o vsockexec/vsock.o
bin/init: init/init.o vsockexec/vsock.o
@mkdir -p bin
$(CC) $(LDFLAGS) -o $@ $^
$(CC) $(LDFLAGS) -o $@ $^ $(LDLIBS)
%.o: %.c
@mkdir -p $(dir $@)
$(CC) $(CFLAGS) $(CPPFLAGS) -c -o $@ $<
$(CC) $(PREPROCESSORFLAGS) $(CFLAGS) $(CPPFLAGS) -c -o $@ $<

View File

@@ -0,0 +1,197 @@
BASE:=base.tar.gz
DEV_BUILD:=0
DELTA_TARGET=out/delta.tar.gz
ifeq "$(DEV_BUILD)" "1"
DELTA_TARGET=out/delta-dev.tar.gz
endif
ifeq "$(SNP_BUILD)" "1"
DELTA_TARGET=out/delta-snp.tar.gz
endif
SRCROOT=$(dir $(abspath $(firstword $(MAKEFILE_LIST))))
PATH_PREFIX:=
# These have PATH_PREFIX prepended to obtain the full path in recipies e.g. $(PATH_PREFIX)/$(VMGS_TOOL)
VMGS_TOOL:=
IGVM_TOOL:=
KERNEL_PATH:=
TAR2EXT4_TOOL:=bin/cmd/tar2ext4
ROOTFS_DEVICE:=/dev/sda
HASH_DEVICE:=/dev/sdb
.PHONY: all always rootfs test snp simple
.DEFAULT_GOAL := all
all: out/initrd.img out/rootfs.tar.gz
clean:
find -name '*.o' -print0 | xargs -0 -r rm
rm -rf bin rootfs out
rootfs: out/rootfs.vhd
snp: out/kernel.vmgs out/rootfs-verity.vhd out/v2056.vmgs out/v2056combined.vmgs
simple: out/simple.vmgs snp
%.vmgs: %.bin
rm -f $@
# du -BM returns the size of the bin file in M, eg 7M. The sed command replaces the M with *1024*1024 and then bc does the math to convert to bytes
$(PATH_PREFIX)/$(VMGS_TOOL) create --filepath $@ --filesize `du -BM $< | sed "s/M.*/*1024*1024/" | bc`
$(PATH_PREFIX)/$(VMGS_TOOL) write --filepath $@ --datapath $< -i=8
# Simplest debug UVM used to test changes to the linux kernel. No dmverity protection. Boots an initramdisk rather than directly booting a vhd disk.
out/simple.bin: out/initrd.img $(PATH_PREFIX)/$(KERNEL_PATH) boot/startup_simple.sh
rm -f $@
python3 $(PATH_PREFIX)/$(IGVM_TOOL) \
-o $@ \
-kernel $(PATH_PREFIX)/$(KERNEL_PATH) \
-append "8250_core.nr_uarts=0 panic=-1 debug loglevel=7 rdinit=/startup_simple.sh" \
-rdinit out/initrd.img \
-vtl 0
# The boot performance is optimized by supplying rootfs as a SCSI attachment. In this case the kernel boots with
# dm-verity to ensure the integrity. Similar to layer VHDs the verity Merkle tree is appended to ext4 filesystem.
# It transpires that the /dev/sd* order is not deterministic wrt the scsi device order. Thus build a single userland
# fs + merkle tree device and boot that.
#
# From https://www.kernel.org/doc/html/latest/admin-guide/device-mapper/dm-init.html
#
# dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
#
# where:
# <name> ::= The device name.
# <uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
# <minor> ::= The device minor number | ""
# <flags> ::= "ro" | "rw"
# <table> ::= <start_sector> <num_sectors> <target_type> <target_args>
# <target_type> ::= "verity" | "linear" | ... (see list below)
#
# From https://docs.kernel.org/admin-guide/device-mapper/verity.html
# <version> <dev> <hash_dev>
# <data_block_size> <hash_block_size>
# <num_data_blocks> <hash_start_block>
# <algorithm> <digest> <salt>
# [<#opt_params> <opt_params>]
#
# typical igvm tool line once all the macros are expanded
# python3 /home/user/igvmfile.py -o out/v2056.bin -kernel /hose/user/bzImage -append "8250_core.nr_uarts=0 panic=-1 debug loglevel=9 ignore_loglevel dev.scsi.logging_level=9411 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 196744 verity 1 /dev/sda /dev/sdb 4096 4096 24593 0 sha256 6d625a306aafdf73125a84388b7bfdd2c3a154bd8d698955f4adffc736bdfd66 b9065c23231f0d8901cc3a68e1d3b8d624213e76d6f9f6d3ccbcb829f9c710ba 1 ignore_corruption\" init=/startup_v2056.sh" -vtl 0
#
# so a kernel command line of:
# 8250_core.nr_uarts=0 panic=-1 debug loglevel=9 ignore_loglevel dev.scsi.logging_level=9411 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 196744 verity 1 /dev/sda /dev/sdb 4096 4096 24593 0 sha256 6d625a306aafdf73125a84388b7bfdd2c3a154bd8d698955f4adffc736bdfd66 b9065c23231f0d8901cc3a68e1d3b8d624213e76d6f9f6d3ccbcb829f9c710ba 1 ignore_corruption\" init=/startup_v2056.sh
#
# and a dm-mod.create of:
# dmverity,,,ro,0 196744 verity 1 /dev/sda /dev/sdb 4096 4096 24593 0 sha256 6d625a306aafdf73125a84388b7bfdd2c3a154bd8d698955f4adffc736bdfd66 b9065c23231f0d8901cc3a68e1d3b8d624213e76d6f9f6d3ccbcb829f9c710ba 1 ignore_corruption
#
# which breaks down to:
#
# name = "dmverity"
# uuid = ""
# minor = ""
# flags = "ro"
# table = 0 196744 verity "args"
# start_sector = 0
# num_sectors = 196744
# target_type = verity
# target_args = 1 /dev/sda /dev/sdb 4096 4096 24593 0 sha256 6d625a306aafdf73125a84388b7bfdd2c3a154bd8d698955f4adffc736bdfd66 b9065c23231f0d8901cc3a68e1d3b8d624213e76d6f9f6d3ccbcb829f9c710ba 1 ignore_corruption
# args:
# version 1
# dev /dev/sda
# hash_dev /dev/sdb
# data_block_size 4096
# hash_block_size 4096
# num_data_blocks 24593
# hash_start_block 0
# algorithm sha256
# digest 6d625a306aafdf73125a84388b7bfdd2c3a154bd8d698955f4adffc736bdfd66
# salt b9065c23231f0d8901cc3a68e1d3b8d624213e76d6f9f6d3ccbcb829f9c710ba
# opt_params
# count = 1
# ignore_corruption
#
# combined typical (not bigger count of sectors for the whole device)
# dmverity,,,ro,0 199672 verity 1 /dev/sda /dev/sda 4096 4096 24959 24959 sha256 4aa6e79866ee946ddbd9cddd6554bc6449272942fcc65934326817785a3bd374 adc4956274489c936395bab046a2d476f21ef436e571ba53da2fdf3aee59bf0a
#
# A few notes:
# - num_sectors is the size of the final (aka target) verity device, i.e. the size of our rootfs excluding the Merkle
# tree.
# - We don't add verity superblock, so the <hash_start_block> will be exactly at the end of ext4 filesystem and equal
# to its size. In the case when verity superblock is present an extra block should be added to the offset value,
# i.e. 24959 becomes 24960.
# Debug build for use with uvmtester. UVM with dm-verity protected vhd disk mounted directly via the kernel command line.
# Ignores corruption in dm-verity protected disk. (Use dmesg to see if dm-verity is ignoring data corruption.)
out/v2056.bin: out/rootfs.vhd out/rootfs.hash.vhd $(PATH_PREFIX)/$(KERNEL_PATH) out/rootfs.hash.datasectors out/rootfs.hash.datablocksize out/rootfs.hash.hashblocksize out/rootfs.hash.datablocks out/rootfs.hash.rootdigest out/rootfs.hash.salt boot/startup_v2056.sh
rm -f $@
python3 $(PATH_PREFIX)/$(IGVM_TOOL) \
-o $@ \
-kernel $(PATH_PREFIX)/$(KERNEL_PATH) \
-append "8250_core.nr_uarts=0 panic=-1 debug loglevel=9 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(HASH_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) $(shell cat out/rootfs.hash.datablocks) sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt) 1 ignore_corruption\" init=/startup_v2056.sh" \
-vtl 0
out/v2056combined.bin: out/rootfs-verity.vhd $(PATH_PREFIX)/$(KERNEL_PATH) out/rootfs.hash.datablocksize out/rootfs.hash.hashblocksize out/rootfs.hash.datablocks out/rootfs.hash.rootdigest out/rootfs.hash.salt boot/startup_v2056.sh
rm -f $@
echo root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(ROOTFS_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) $(shell cat out/rootfs.hash.datablocks) sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt) 1 ignore_corruption\"
python3 $(PATH_PREFIX)/$(IGVM_TOOL) \
-o $@ \
-kernel $(PATH_PREFIX)/$(KERNEL_PATH) \
-append "8250_core.nr_uarts=0 panic=-1 debug loglevel=9 ignore_loglevel dev.scsi.logging_level=9411 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(ROOTFS_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) $(shell cat out/rootfs.hash.datablocks) sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt) 1 ignore_corruption\" init=/startup_v2056.sh" \
-vtl 0
# Full UVM with dm-verity protected vhd disk mounted directly via the kernel command line.
out/kernel.bin: out/rootfs-verity.vhd $(PATH_PREFIX)/$(KERNEL_PATH) out/rootfs.hash.datasectors out/rootfs.hash.datablocksize out/rootfs.hash.hashblocksize out/rootfs.hash.datablocks out/rootfs.hash.rootdigest out/rootfs.hash.salt boot/startup.sh
rm -f $@
echo root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(ROOTFS_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) $(shell cat out/rootfs.hash.datablocks) sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt)\"
python3 $(PATH_PREFIX)/$(IGVM_TOOL) \
-o $@ \
-kernel $(PATH_PREFIX)/$(KERNEL_PATH) \
-append "8250_core.nr_uarts=0 panic=-1 debug loglevel=7 root=/dev/dm-0 dm-mod.create=\"dmverity,,,ro,0 $(shell cat out/rootfs.hash.datasectors) verity 1 $(ROOTFS_DEVICE) $(ROOTFS_DEVICE) $(shell cat out/rootfs.hash.datablocksize) $(shell cat out/rootfs.hash.hashblocksize) $(shell cat out/rootfs.hash.datablocks) $(shell cat out/rootfs.hash.datablocks) sha256 $(shell cat out/rootfs.hash.rootdigest) $(shell cat out/rootfs.hash.salt)\" init=/startup.sh" \
-vtl 0
# Rule to make a vhd from a file. This is used to create the rootfs.hash.vhd from rootfs.hash.
%.vhd: % $(TAR2EXT4_TOOL)
$(TAR2EXT4_TOOL) -only-vhd -i $< -o $@
# Rule to make a vhd from an ext4 file. This is used to create the rootfs.vhd from rootfs.ext4.
%.vhd: %.ext4 $(TAR2EXT4_TOOL)
$(TAR2EXT4_TOOL) -only-vhd -i $< -o $@
%.hash %.hash.info %.hash.datablocks %.hash.rootdigest %hash.datablocksize %.hash.datasectors %.hash.hashblocksize: %.ext4 %.hash.salt
veritysetup format --no-superblock --salt $(shell cat out/rootfs.hash.salt) $< $*.hash > $*.hash.info
# Retrieve info required by dm-verity at boot time
# Get the blocksize of rootfs
cat $*.hash.info | awk '/^Root hash:/{ print $$3 }' > $*.hash.rootdigest
cat $*.hash.info | awk '/^Salt:/{ print $$2 }' > $*.hash.salt
cat $*.hash.info | awk '/^Data block size:/{ print $$4 }' > $*.hash.datablocksize
cat $*.hash.info | awk '/^Hash block size:/{ print $$4 }' > $*.hash.hashblocksize
cat $*.hash.info | awk '/^Data blocks:/{ print $$3 }' > $*.hash.datablocks
echo $$(( $$(cat $*.hash.datablocks) * $$(cat $*.hash.datablocksize) / 512 )) > $*.hash.datasectors
out/rootfs.hash.salt:
hexdump -vn32 -e'8/4 "%08X" 1 "\n"' /dev/random > $@
out/rootfs.ext4: out/rootfs.tar.gz $(TAR2EXT4_TOOL)
gzip -f -d ./out/rootfs.tar.gz
$(TAR2EXT4_TOOL) -i ./out/rootfs.tar -o $@
out/rootfs-verity.ext4: out/rootfs.ext4 out/rootfs.hash
cp out/rootfs.ext4 $@
cat out/rootfs.hash >> $@
out/rootfs.tar.gz: out/initrd.img
rm -rf rootfs-conv
mkdir rootfs-conv
gunzip -c out/initrd.img | (cd rootfs-conv && cpio -imd)
tar -zcf $@ -C rootfs-conv .
rm -rf rootfs-conv
out/initrd.img: $(BASE) $(DELTA_TARGET) $(SRCROOT)/hack/catcpio.sh
$(SRCROOT)/hack/catcpio.sh "$(BASE)" $(DELTA_TARGET) > out/initrd.img.uncompressed
gzip -c out/initrd.img.uncompressed > $@
rm out/initrd.img.uncompressed

View File

@@ -44,7 +44,7 @@ delta.tar.gz initrd.img rootfs.tar.gz
### Containerd Shim
For info on the [Runtime V2 API](https://github.com/containerd/containerd/blob/master/runtime/v2/README.md).
For info on the [Runtime V2 API](https://github.com/containerd/containerd/blob/main/core/runtime/v2/README.md).
Contrary to the typical Linux architecture of shim -> runc, the runhcs shim is used both to launch and manage the lifetime of containers.

Some files were not shown because too many files have changed in this diff Show More