The test currently uses a static directory at `/tmp/initimg_test`. This
introduces non-determinism into the unit test:
* Files that already exist in that dir might alter test results.
* If the directory is owned by root, the test will fail due to
permissions.
Switch to using the tempfile crate instead.
Fixes: #13053
Signed-off-by: Markus Rudy <mr@edgeless.systems>
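To illustrate why a per-test directory removes both failure modes listed above, here is a minimal std-only sketch. `ScratchDir` is a hypothetical stand-in for the `tempfile` crate's directory guard, not the code from this commit; the naming scheme (prefix, PID, counter) is an assumption, and the real crate uses random names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a tempfile-style guard: a uniquely named
/// directory that is removed again when the guard is dropped.
pub struct ScratchDir {
    path: PathBuf,
}

/// Per-process counter so that several guards in one test binary never collide.
static COUNTER: AtomicU64 = AtomicU64::new(0);

impl ScratchDir {
    pub fn new(prefix: &str) -> io::Result<Self> {
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let name = format!("{}-{}-{}", prefix, process::id(), n);
        let path = std::env::temp_dir().join(name);
        // create_dir (unlike create_dir_all) fails if the path already exists,
        // so a test never silently reuses someone else's files.
        fs::create_dir(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored on purpose.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

A test would create the guard at the start and use `path()` wherever it previously used the fixed directory; the directory starts empty, is owned by the user running the test, and is removed when the guard is dropped.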
The initdata is currently being decoded, and then re-encoded with the
to_string function. This will usually not preserve the original initdata
document, and thus the initdata hash will differ between the annotation
and the block device.
This commit changes the logic to only decode the base64, but keep the
initdata document intact. Since the error message is now nested, adjust
the tests to look for the expected error in the chain.
Fixes: #12951
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Exit early with an error message instead of starting kata-deploy if
the value of KATA_HYPERVISOR is not expected during CI.
For example: "cloud-hypervisor" was renamed recently to
"clh-runtime-rs" and user scripts depending on the old name were
getting tangled in kata-deploy instead of just rejecting the old
value quickly.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
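The early-exit check described above is implemented in the CI shell scripts; the same idea can be sketched in Rust as follows. The allow-list below is purely illustrative (only "clh-runtime-rs" and "cloud-hypervisor" are named in the commit message), and `validate_hypervisor` is a hypothetical helper.

```rust
/// Illustrative allow-list only; the real set of accepted values is defined
/// by the CI scripts, not by this sketch.
const SUPPORTED_HYPERVISORS: &[&str] = &["qemu", "clh-runtime-rs"];

/// Return the value unchanged if it is supported, or an error message that
/// names the rejected value so the caller can fail before deploying anything.
pub fn validate_hypervisor(value: &str) -> Result<&str, String> {
    if SUPPORTED_HYPERVISORS.contains(&value) {
        Ok(value)
    } else {
        Err(format!(
            "unexpected KATA_HYPERVISOR value {:?}; expected one of {:?}",
            value, SUPPORTED_HYPERVISORS
        ))
    }
}
```

With this shape, a stale name such as "cloud-hypervisor" is rejected immediately with a message listing the accepted values.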
Avoid running "git remote show origin" repeatedly when common.bash
gets sourced multiple times and target_branch was not specified by
the caller.
Each repeated "git remote show origin" call incurred the additional
overhead of authenticating and communicating with the remote git
repository.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
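The caching pattern above lives in common.bash; as a language-neutral illustration, the sketch below shows the same "compute once, reuse afterwards" idea in Rust. `query_remote_default_branch` is a hypothetical placeholder for the expensive remote query and simply returns a fixed string, and `LOOKUPS` exists only to make the number of queries observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Cached result of the expensive lookup; filled at most once per process.
static TARGET_BRANCH: OnceLock<String> = OnceLock::new();

/// Counts how often the expensive lookup actually ran (for illustration).
pub static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical placeholder for querying the remote; the returned branch
/// name is an assumption, not taken from the commit.
fn query_remote_default_branch() -> String {
    LOOKUPS.fetch_add(1, Ordering::SeqCst);
    "main".to_string()
}

/// Prefer the caller-supplied branch; otherwise fall back to the cached
/// lookup, running the expensive query only on the first call.
pub fn target_branch(explicit: Option<&str>) -> String {
    if let Some(branch) = explicit {
        return branch.to_string();
    }
    TARGET_BRANCH
        .get_or_init(query_remote_default_branch)
        .clone()
}
```

Calling `target_branch(None)` any number of times triggers a single lookup, and a caller-specified branch never triggers one.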
This reverts commit edfb6f5716.
The NVIDIA non-TEE CI job has passed again over the last 5 nightly
runs after merging PRs #13007 and #13020.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
If the policy loading encounters an error, we `abort(3)` the agent for
safety. Since abort causes the process to stop immediately, the async
logs might not be flushed yet, and thus won't make it to the runtime,
hiding the reason for the abort. Wait a bit before aborting so that the
logs are fully written.
Fixes: #13031
Signed-off-by: Markus Rudy <mr@edgeless.systems>
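The "wait, then abort" ordering from the policy-loading fix can be sketched as below. This is a simplified, synchronous illustration: `after_grace_period` is a hypothetical helper, the grace period is chosen by the caller, and an async agent would more likely use its runtime's non-blocking sleep.

```rust
use std::thread;
use std::time::Duration;

/// Sleep for `grace` so that background log writers get a chance to drain,
/// then run the terminating action and return whatever it returns.
pub fn after_grace_period<T>(grace: Duration, terminate: impl FnOnce() -> T) -> T {
    thread::sleep(grace);
    terminate()
}
```

In the failure path this would be called with a closure that invokes `std::process::abort()`, so the abort still happens but only after the delay.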
Group the shared-context parameters (share_fs, device_manager, sid,
agent, emptydir_mode) into a VolumeContext struct so handler_volumes
stays within clippy's argument count limit and avoids -D warnings
breakage in CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
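A rough sketch of the VolumeContext refactoring follows. The unit structs are hypothetical placeholders for the real runtime-rs types, and the field types and the body of `handler_volumes` are assumptions made only so the example compiles. Clippy's `too_many_arguments` lint fires above seven parameters by default, which is the limit a grouping struct avoids.

```rust
// Hypothetical placeholders for the real runtime-rs types.
pub struct ShareFs;
pub struct DeviceManager;
pub struct Agent;

/// Shared context passed to volume handling as one argument instead of five.
pub struct VolumeContext<'a> {
    pub share_fs: Option<&'a ShareFs>,
    pub device_manager: &'a DeviceManager,
    pub sid: &'a str,
    pub agent: &'a Agent,
    pub emptydir_mode: &'a str,
}

/// Illustrative handler: it only formats a label per mount, but shows how
/// the shared parameters are read from the context.
pub fn handler_volumes(ctx: &VolumeContext<'_>, mounts: &[&str]) -> Vec<String> {
    mounts
        .iter()
        .map(|m| format!("{}:{}:{}", ctx.sid, ctx.emptydir_mode, m))
        .collect()
}
```

Adding another shared parameter later then means adding a field rather than touching every call site's argument list.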
Now that runtime-rs supports block-encrypted emptyDir volumes, remove
the no-trusted-storage workaround templates and the is_runtime_rs
branching in the NIM test. Runtime-rs now uses the same TEE templates
as the Go runtime with emptyDir + PVC at 48Gi memory, instead of the
128Gi workaround that compensated for lacking trusted storage.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Remove the runtime-rs skip from the trusted ephemeral data storage
test now that runtime-rs implements block-encrypted emptyDir volumes.
Also remove the genpolicy drop-in that disabled encrypted_emptydir
for runtime-rs and the corresponding copy logic in tests_common.sh.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Add the emptydir_mode configuration option to all runtime-rs config
template files. CoCo configs (snp, tdx, se, coco-dev, nvidia-gpu-snp,
nvidia-gpu-tdx) default to block-encrypted via @DEFEMPTYDIRMODE_COCO@,
while non-CoCo configs (qemu, nvidia-gpu, fc) default to shared-fs
via @DEFEMPTYDIRMODE@.
Also add DEFEMPTYDIRMODE and DEFEMPTYDIRMODE_COCO variables to the
runtime-rs Makefile for template substitution.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
When emptydir_mode is "block-encrypted", host emptyDir paths must
remain as "bind" mounts so the EncryptedEmptyDirVolume handler can
intercept them in the volume dispatch chain. Previously,
update_ephemeral_storage_type() would unconditionally convert them
to "local" type, causing them to be handled as plain local volumes
instead.
Add the emptydir_mode parameter to update_ephemeral_storage_type()
and its call chain (amend_spec in container.rs) and skip the
host-emptyDir-to-local conversion when the mode is block-encrypted.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
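The conditional conversion described above can be sketched roughly as follows. The `Mount` struct is a simplified, hypothetical stand-in for the OCI mount type, and detecting host emptyDir paths through the `kubernetes.io~empty-dir` path component is an assumption of this sketch; the real function handles more cases.

```rust
pub const EMPTYDIR_MODE_SHARED_FS: &str = "shared-fs";
pub const EMPTYDIR_MODE_BLOCK_ENCRYPTED: &str = "block-encrypted";

/// Path component assumed here to identify a host emptyDir volume.
const K8S_EMPTYDIR: &str = "kubernetes.io~empty-dir";

/// Simplified, hypothetical stand-in for an OCI mount entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Mount {
    pub source: String,
    pub mount_type: String,
}

/// Convert host emptyDir bind mounts to "local" unless the mode is
/// block-encrypted, in which case they stay "bind" so a later handler can
/// pick them up.
pub fn update_ephemeral_storage_type(mounts: &mut [Mount], emptydir_mode: &str) {
    if emptydir_mode == EMPTYDIR_MODE_BLOCK_ENCRYPTED {
        return;
    }
    for m in mounts.iter_mut() {
        if m.mount_type == "bind" && m.source.contains(K8S_EMPTYDIR) {
            m.mount_type = "local".to_string();
        }
    }
}
```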
Add the core volume handler for block-encrypted emptyDir support
in runtime-rs, bringing it to parity with the Go runtime (PR #10559).
When emptydir_mode is set to "block-encrypted", host emptyDir bind
mounts are intercepted and handled as follows:
1. A sparse disk image (disk.img) is created inside the emptyDir
folder, sized to match the host filesystem capacity.
2. A mountInfo.json is written under the kata direct-volume root
with volume_type "blk", fs_type "ext4", and metadata
encryptionKey=ephemeral.
3. The disk image is plugged into the guest VM as a virtio-blk
device via the hypervisor device manager.
4. An agent::Storage is built with driver_options containing
encryption_key=ephemeral and shared=true, so the kata-agent
delegates formatting and encryption to CDH using LUKS2.
The volume is registered in the dispatch chain before the regular
block-volume check, and ephemeral disk metadata is tracked for
sandbox-level cleanup at teardown.
Also re-export the EMPTYDIR_MODE_* constants from kata-types::config
so downstream crates can reference them.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
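Step 1 of the list above (creating the sparse disk.img) can be sketched with the standard library alone. `create_sparse_image` is a hypothetical helper; the size is passed in by the caller, so querying the host filesystem capacity is left out, and whether the file is actually stored sparsely depends on the underlying filesystem.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::{Path, PathBuf};

/// Create `disk.img` inside `emptydir` and extend it to `size_bytes`
/// without writing any data, which yields a sparse file on filesystems
/// that support holes.
pub fn create_sparse_image(emptydir: &Path, size_bytes: u64) -> io::Result<PathBuf> {
    let image = emptydir.join("disk.img");
    // create_new refuses to overwrite an existing image.
    let file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&image)?;
    // set_len grows the file to the requested logical size.
    file.set_len(size_bytes)?;
    Ok(image)
}
```

The remaining steps (writing mountInfo.json, hot-plugging the device, building the agent storage entry) depend on runtime-rs internals and are not sketched here.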
The proto Storage message already has a "shared" field (field 8),
but the runtime-rs agent crate's internal Storage struct was
missing it, so it was never forwarded to the kata-agent.
Add the field to the Rust struct and its From<Storage> translation,
and update all explicit struct initialisers across the resource
crate to include shared: false so the build stays clean.
This is needed for trusted ephemeral data storage, where the
agent uses the shared flag to avoid premature cleanup of volumes
that are shared across containers in a pod.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
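The missing-field problem described above can be illustrated with two simplified structs. Both `Storage` and `ProtoStorage` here are hypothetical, reduced versions of the real types (the generated protobuf type has more fields); the point is only that the `From` translation has to copy `shared` explicitly.

```rust
/// Simplified stand-in for the runtime-side Storage struct.
#[derive(Debug, Clone, Default)]
pub struct Storage {
    pub driver: String,
    pub driver_options: Vec<String>,
    pub source: String,
    pub fs_type: String,
    pub mount_point: String,
    pub shared: bool,
}

/// Simplified stand-in for the protobuf-generated message.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ProtoStorage {
    pub driver: String,
    pub driver_options: Vec<String>,
    pub source: String,
    pub fstype: String,
    pub mount_point: String,
    pub shared: bool,
}

impl From<Storage> for ProtoStorage {
    fn from(s: Storage) -> Self {
        ProtoStorage {
            driver: s.driver,
            driver_options: s.driver_options,
            source: s.source,
            fstype: s.fs_type,
            mount_point: s.mount_point,
            // Without this line the flag would silently stay at its default.
            shared: s.shared,
        }
    }
}
```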
Add add_volume_mount_info(), is_volume_mounted(), and
remove_volume_path() to the mount module. These mirror the Go
helpers (AddMountInfo, IsVolumeMounted, Remove) in
src/runtime/pkg/direct-volume/utils.go and are needed by the
upcoming EncryptedEmptyDirVolume to write and clean up
mountInfo.json metadata for block-encrypted emptyDir volumes.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
Add the emptydir_mode field to the Runtime configuration struct,
allowing runtime-rs to read the emptyDir handling mode from the
TOML config file. This is groundwork for trusted ephemeral data
storage support in runtime-rs (parity with the Go runtime).
Two modes are supported:
- shared-fs (default): share emptyDir via virtio-fs/9p.
- block-encrypted: plug a block device encrypted in-guest via
CDH/LUKS2.
Empty values default to "shared-fs"; unknown values are rejected
during validation.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
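The defaulting and validation rules above amount to a small mapping, sketched below. `resolve_emptydir_mode` is a hypothetical name and the error text is invented for illustration; only the two mode strings and the empty-value default come from the commit message.

```rust
pub const EMPTYDIR_MODE_SHARED_FS: &str = "shared-fs";
pub const EMPTYDIR_MODE_BLOCK_ENCRYPTED: &str = "block-encrypted";

/// Map the raw TOML value to a known mode: an empty value falls back to
/// shared-fs, known values pass through, anything else is rejected.
pub fn resolve_emptydir_mode(raw: &str) -> Result<&'static str, String> {
    match raw {
        "" | EMPTYDIR_MODE_SHARED_FS => Ok(EMPTYDIR_MODE_SHARED_FS),
        EMPTYDIR_MODE_BLOCK_ENCRYPTED => Ok(EMPTYDIR_MODE_BLOCK_ENCRYPTED),
        other => Err(format!("invalid emptydir_mode {:?}", other)),
    }
}
```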
Add a firmware module to the dbs_boot crate so that the guest VM can
load tdshim into memory, which is a prerequisite for booting a TDX VM.
The other sections (including the kernel payload and cmdline) are also
loaded at the correct guest physical addresses according to the tdshim
layout.
Signed-off-by: Xiaofan Xxf <xiaofan.xxf@antgroup.com>
This follows the current runtime-go behaviour introduced in
https://github.com/kata-containers/kata-containers/pull/9195.
When using static resources, always set the maxvcpus value equal to the
vcpus value. The static resources case does not support dynamic CPU
hotplugging, so the maximum number of vCPUs should be limited to the
number of vCPUs. Booting with a high maximum number of vCPUs is also
somewhat slower than booting with a lower one.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
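The static-resources rule above reduces to a single decision, sketched here. `effective_max_vcpus` and its parameter names are hypothetical; the non-static branch simply passes the configured maximum through, which is an assumption of this sketch.

```rust
/// With static resources there is no CPU hotplug, so the maximum is pinned
/// to the boot-time vCPU count; otherwise the configured maximum is kept.
pub fn effective_max_vcpus(static_resources: bool, vcpus: u32, configured_max: u32) -> u32 {
    if static_resources {
        vcpus
    } else {
        configured_max
    }
}
```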
Update CDH to a newer version and:
- adjust the NVIDIA root filesystem build to reflect the change from
using libcryptsetup to using the cryptsetup binary.
- adjust image-pull test cases to conduct parallel write operations
on the /dev/trusted_store backed guest image pull location now that
issue #12721 has been resolved on the CDH side.
Fixes: #12721
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
- Added click==8.3.3 to docs/requirements.txt
- Click 8.3.3 is the latest version for Python >=3.10
- Required for mkdocs toolchain compatibility and resolves vulnerability in indirect dependencies
- Ref: CVE-2026-7246
Signed-off-by: pavithiran34 <pavithiran.p@ibm.com>
Replace guest-pull image allow-all placeholders with explicit
auto-generated policies for each generated pod manifest.
Generate policy after the final YAML edits so initdata and image
pull secrets are represented in the policy inputs.
Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Teach auto_generate_policy to reuse a cc_init_data annotation by
decoding it into the temporary default-initdata.toml file.
This lets tests preserve CDH initdata while genpolicy appends the
generated agent security policy for the workload.
Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Move the Docker auth setup into common.bash so tests beyond the
NVIDIA runner can provide credentials for genpolicy image pulls.
Make the registry, username, password and output directory explicit
while preserving the nvcr.io setup used by the NIM tests.
Assisted-by: OpenAI Codex <codex@openai.com>
Signed-off-by: Manuel Huber <manuelh@nvidia.com>