v9 is based on Node.js 20, which is deprecated, so update to the
latest to pick up a Node.js 24 version before GitHub removes Node 20.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add a manually-triggered workflow that builds and pushes a multi-arch
busybox-based image to quay.io/kata-containers/confidential-containers-auth
for use as an authenticated container image in CI tests.
The workflow uses skopeo to copy per-arch images and buildah to create
and push the multi-arch manifest.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Wait() was releasing s.mu immediately after getContainer(), then
calling getExec() — which reads c.execs — without holding any lock.
Concurrent Exec() or Delete() calls that write to c.execs under s.mu
triggered a "concurrent map read and map write" fatal panic.
Add a dedicated sync.RWMutex to the container struct that protects the
execs map. getExec() now acquires a read lock internally, and all
writes go through new setExec()/deleteExec() helpers that acquire the
write lock. This keeps the locking concern local to the map and avoids
complicating the s.mu usage in Wait().
Add a regression test (TestConcurrentExecAccess) that exercises
concurrent getExec reads against setExec/deleteExec writes; this
reliably reproduces the panic under the race detector without the fix.
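
A minimal, self-contained sketch of the locking pattern described above.
The field name execsMu, the exec type, and the hammer() helper are
assumptions made for illustration; only getExec/setExec/deleteExec and
the use of sync.RWMutex come from this commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// exec is a hypothetical stand-in for the shim's per-exec state.
type exec struct {
	id string
}

// container gives the execs map its own lock (field name is an assumption).
type container struct {
	execsMu sync.RWMutex
	execs   map[string]*exec
}

// getExec takes the read lock internally, so callers need no other lock.
func (c *container) getExec(id string) (*exec, error) {
	c.execsMu.RLock()
	defer c.execsMu.RUnlock()
	e, ok := c.execs[id]
	if !ok {
		return nil, fmt.Errorf("exec %q not found", id)
	}
	return e, nil
}

// setExec and deleteExec are the only writers; both take the write lock.
func (c *container) setExec(id string, e *exec) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	c.execs[id] = e
}

func (c *container) deleteExec(id string) {
	c.execsMu.Lock()
	defer c.execsMu.Unlock()
	delete(c.execs, id)
}

// hammer runs concurrent readers against concurrent writers, roughly the
// access pattern a regression test for this bug would exercise.
func hammer(c *container, workers, iterations int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(2)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				id := fmt.Sprintf("exec-%d-%d", w, i)
				c.setExec(id, &exec{id: id})
				c.deleteExec(id)
			}
		}(w)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				_, _ = c.getExec("exec-0-0")
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := &container{execs: make(map[string]*exec)}
	hammer(c, 8, 1000)
	fmt.Println("remaining execs:", len(c.execs))
}
```

Running this with `go run -race` should stay quiet; dropping the lock
calls from getExec() would be expected to make the race detector report
the unsynchronized map access.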
Fixes: #12825
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
TEE hardware (TDX, SEV-SNP) is very limited in CI. Running the full
test suite on every PR consumes these resources unnecessarily, since
most tests exercise what is already covered by the -coco-dev CIs.
Introduce a `tee-test-scope` workflow input (small/full) and a new
`baremetal-small-tee` K8S_TEST_HOST_TYPE that runs only the 12 tests
that are TEE-relevant: attestation tests (encrypted/authenticated/
signed image pull, confidential attestation) plus policy and trusted
ephemeral data storage tests.
PR runs default to "small" (12 tests), nightly runs use "full" (59
tests), and manual dispatch offers a dropdown to choose.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Update the name and move it to the static checks, as we don't
need to ensure it's running for non-code changes.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The cargo deny generated action doesn't seem to work
and seems unnecessarily complex, so try using
EmbarkStudios/cargo-deny-action instead.
Fixes: #11218
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The new version of image-rs supports more types of signed images. First,
we added support for a few more key types. Second, we added support
for multi-arch images where the manifest digest is signed but the
individual arch manifest is not. These images are relatively common, so
let's pick up the fix ASAP.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
I don't think agent-ctl will benefit from the new image-rs features, but
let's update it to be complete.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
This is not related to this PR, but rather to #12734, which ended up not
running `make src/agent generate-protocols`.
While here, let's also fix it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The hardcoded DEFAULT_LAUNCH_PROCESS_TIMEOUT of 6 seconds in the kata
agent is insufficient for environments with NVIDIA GPUs and NVSwitches,
where the attestation-agent needs significantly more time to collect
evidence during initialization (e.g. ~2 seconds per NVSwitch).
When the timeout expires, the agent (PID 1) exits with an error, causing
the guest kernel to perform an orderly shutdown before the
attestation-agent has finished starting.
Make this timeout configurable via the kernel parameter
agent.launch_process_timeout (in seconds), preserving the 6-second
default for backward compatibility. The Go runtime is wired up to pass
this value from the TOML config's [agent.kata] section through to the
kernel command line.
The NVIDIA GPU configs set the new default to 15 seconds.
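
As an illustration only, the lookup of this parameter on a kernel
command line could be sketched as below. This is not the agent's code
(the agent is not written in Go); the function and constant names are
hypothetical, and falling back to the default on a malformed value is an
assumption rather than something this commit states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// The 6-second default and the parameter name come from the commit message.
const (
	defaultLaunchProcessTimeout = 6 * time.Second
	launchProcessTimeoutParam   = "agent.launch_process_timeout"
)

// launchProcessTimeout scans a kernel command line for the parameter and
// returns its value in seconds as a time.Duration.
func launchProcessTimeout(cmdline string) time.Duration {
	for _, field := range strings.Fields(cmdline) {
		key, value, found := strings.Cut(field, "=")
		if !found || key != launchProcessTimeoutParam {
			continue
		}
		secs, err := strconv.ParseUint(value, 10, 32)
		if err != nil {
			// Assumption: a malformed value keeps the default.
			return defaultLaunchProcessTimeout
		}
		return time.Duration(secs) * time.Second
	}
	return defaultLaunchProcessTimeout
}

func main() {
	fmt.Println(launchProcessTimeout("console=ttyS0 quiet"))
	fmt.Println(launchProcessTimeout("console=ttyS0 agent.launch_process_timeout=15"))
}
```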
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor
Add two new configuration knobs that control the logical and physical
sector sizes advertised by virtio-blk devices to the guest:
block_device_logical_sector_size (config file)
block_device_physical_sector_size (config file)
io.katacontainers.config.hypervisor.blk_logical_sector_size (annotation)
io.katacontainers.config.hypervisor.blk_physical_sector_size (annotation)
The annotation names are abbreviated relative to the config file keys
because Kubernetes enforces a 63-character limit on annotation name
segments, and the full names would exceed it.
Both settings default to 0 (let QEMU decide). When set, they are passed
as logical_block_size and physical_block_size in the QMP device_add
command during block device hotplug.
Setting logical_sector_size smaller than the container filesystem
block size will cause EINVAL on mount. The physical_sector_size can
always be set independently.
Values must be 0 or a power of 2 in the range [512, 65536]; other
values are rejected with an error at sandbox creation time.
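
The validation rule above can be sketched as follows. The function names
are hypothetical, and omitting a property from the device_add arguments
when its value is 0 is an assumption based on "let QEMU decide"; only
the config keys, the QMP property names, and the accepted range come
from this commit message.

```go
package main

import "fmt"

const (
	minSectorSize = 512
	maxSectorSize = 65536
)

// validateSectorSize accepts 0 (let QEMU decide) or a power of 2 in
// the range [512, 65536].
func validateSectorSize(name string, size uint32) error {
	if size == 0 {
		return nil
	}
	// size&(size-1) == 0 holds only for powers of 2.
	if size < minSectorSize || size > maxSectorSize || size&(size-1) != 0 {
		return fmt.Errorf("%s: invalid sector size %d: must be 0 or a power of 2 in [%d, %d]",
			name, size, minSectorSize, maxSectorSize)
	}
	return nil
}

// blockSizeArgs builds the extra device_add arguments, skipping any
// property left at 0 (assumption).
func blockSizeArgs(logical, physical uint32) (map[string]interface{}, error) {
	if err := validateSectorSize("block_device_logical_sector_size", logical); err != nil {
		return nil, err
	}
	if err := validateSectorSize("block_device_physical_sector_size", physical); err != nil {
		return nil, err
	}
	args := map[string]interface{}{}
	if logical != 0 {
		args["logical_block_size"] = logical
	}
	if physical != 0 {
		args["physical_block_size"] = physical
	}
	return args, nil
}

func main() {
	for _, size := range []uint32{0, 512, 4096, 1000, 131072} {
		fmt.Printf("%d -> %v\n", size, validateSectorSize("block_device_logical_sector_size", size))
	}
	args, err := blockSizeArgs(512, 4096)
	fmt.Println(args, err)
}
```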
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
Add a global and per-shim configurable switch to enable/disable
the overhead section in generated RuntimeClasses. This allows users
to omit overhead when it's not needed or managed externally.
Priority: per-shim > global > default(true).
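
The priority order can be sketched with optional booleans, where nil
means "not set". This is an illustration in Go with hypothetical names,
not the code added by this commit.

```go
package main

import "fmt"

// overheadEnabled resolves the switch with the priority
// per-shim > global > default(true). A nil pointer means "not set".
func overheadEnabled(perShim, global *bool) bool {
	if perShim != nil {
		return *perShim
	}
	if global != nil {
		return *global
	}
	return true
}

// boolPtr is a small helper for building optional values.
func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(overheadEnabled(nil, nil))                      // nothing set
	fmt.Println(overheadEnabled(nil, boolPtr(false)))           // global only
	fmt.Println(overheadEnabled(boolPtr(true), boolPtr(false))) // per-shim wins
}
```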
Signed-off-by: LizZhang315 <123134987@qq.com>
Users were confused about which configuration file to edit because
kata-deploy copied the base config into a per-shim runtime directory
(runtimes/<shim>/) for config.d support, leaving the original file
in place untouched. This made it look like the original was the
authoritative config, when in reality the runtime was loading the
copy from the per-shim directory.
Replace the original config file with a symlink pointing to the
per-shim runtime copy after the copy is made. The runtime's
ResolvePath / EvalSymlinks follows the symlink and lands in the
per-shim directory, where it naturally finds config.d/ with all
drop-in fragments. This makes it immediately obvious that the
real configuration lives in the per-shim directory and removes the
ambiguity about which file to inspect or modify.
During cleanup, the symlink at the original location is explicitly
removed before the runtime directory is deleted.
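
A rough Go sketch of the resulting layout, run against a temporary
directory. The file name configuration-qemu.toml, the shim name qemu,
and the helper names are illustrative assumptions, and this is not the
kata-deploy implementation; it only shows that resolving the symlink
lands next to config.d/.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkToRuntimeCopy replaces the original config with a symlink that
// points at the per-shim runtime copy.
func linkToRuntimeCopy(original, runtimeCopy string) error {
	if err := os.Remove(original); err != nil && !os.IsNotExist(err) {
		return err
	}
	return os.Symlink(runtimeCopy, original)
}

// cleanup removes the symlink first, then the per-shim directory.
func cleanup(original, shimDir string) error {
	if err := os.Remove(original); err != nil && !os.IsNotExist(err) {
		return err
	}
	return os.RemoveAll(shimDir)
}

// demo builds the layout and reports the directory name the symlink
// resolves into and whether config.d/ is found there.
func demo() (string, bool, error) {
	root, err := os.MkdirTemp("", "kata-config-")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(root)

	original := filepath.Join(root, "configuration-qemu.toml")
	shimDir := filepath.Join(root, "runtimes", "qemu")
	runtimeCopy := filepath.Join(shimDir, "configuration-qemu.toml")

	if err := os.MkdirAll(filepath.Join(shimDir, "config.d"), 0o755); err != nil {
		return "", false, err
	}
	if err := os.WriteFile(original, []byte("# base config\n"), 0o644); err != nil {
		return "", false, err
	}
	// Copy the base config into the per-shim directory, then link back.
	data, err := os.ReadFile(original)
	if err != nil {
		return "", false, err
	}
	if err := os.WriteFile(runtimeCopy, data, 0o644); err != nil {
		return "", false, err
	}
	if err := linkToRuntimeCopy(original, runtimeCopy); err != nil {
		return "", false, err
	}

	resolved, err := filepath.EvalSymlinks(original)
	if err != nil {
		return "", false, err
	}
	dir := filepath.Dir(resolved)
	_, statErr := os.Stat(filepath.Join(dir, "config.d"))
	return filepath.Base(dir), statErr == nil, cleanup(original, shimDir)
}

func main() {
	dir, hasConfigD, err := demo()
	fmt.Println(dir, hasConfigD, err)
}
```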
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The k8s-confidential-attestation test extracts the QEMU command line
from journal logs to compute the SNP launch measurement. It only
matched the Go runtime's log format ("launching <path> with: [<args>]"),
but runtime-rs logs differently ("qemu args: <args>").
Handle both formats so the test works with qemu-snp-runtime-rs.
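
The two-format matching can be sketched as below, written in Go purely
for illustration. The regular expressions are assumptions derived from
the two formats quoted above; real journal lines may carry extra
quoting or fields that this sketch does not handle.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// Go runtime: "launching <path> with: [<args>]"
	goRuntimeRe = regexp.MustCompile(`launching \S+ with: \[(.*)\]`)
	// runtime-rs: "qemu args: <args>"
	runtimeRsRe = regexp.MustCompile(`qemu args: (.*)`)
)

// extractQemuArgs returns the QEMU arguments from a log line in either
// format, or false when the line matches neither.
func extractQemuArgs(line string) (string, bool) {
	if m := goRuntimeRe.FindStringSubmatch(line); m != nil {
		return strings.TrimSpace(m[1]), true
	}
	if m := runtimeRsRe.FindStringSubmatch(line); m != nil {
		return strings.TrimSpace(m[1]), true
	}
	return "", false
}

func main() {
	lines := []string{
		"launching /usr/bin/qemu with: [-name sandbox -m 2048M]",
		"qemu args: -name sandbox -m 2048M",
		"unrelated log line",
	}
	for _, l := range lines {
		args, ok := extractQemuArgs(l)
		fmt.Printf("%q %v\n", args, ok)
	}
}
```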
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're in the process of stabilising runtime-rs for the coming 4.0.0
release, we should start running as many tests as possible with it.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now, we include the nvrc.smi.srs=1 flag in the default kernel cmdline.
Thus, we can remove the guidance for people to add it themselves when
not using attestation. In fact, users don't really need to know about
this flag at all.
Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
Fix all clippy warnings triggered by -D warnings:
- install.rs: remove useless .into() conversions on PathBuf values
  and replace vec! with an array literal where a Vec is not needed
- utils/toml.rs: replace while-let-on-iterator with a for loop and
  drop the now-unnecessary mut on the iterator binding
- main.rs: replace match-with-single-pattern with if-let in two
  places dealing with experimental_setup_snapshotter
- utils/yaml.rs: extract repeated serde_yaml::Value::String key into
  a local variable, removing needless borrows on temporary values
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Made-with: Cursor