12 Commits

Author SHA1 Message Date
Aurélien Bombo
8916f5f301 gha: Pin action for cargo-deny workflow
The cargo-deny workflow should be the last workflow to not use a pinned version.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2026-04-07 15:41:09 -05:00
Fabiano Fidêncio
b3ae6ef99c Merge pull request #12760 from fitzthum/bump-nvat
Bump trustee and guest-components to add nvswitch / ppcie support
2026-04-07 19:07:50 +02:00
Aurélien Bombo
79fab93041 Merge pull request #12779 from rophy/fix/strip-cr-from-tty-exec
tests: strip \r from kubectl exec output for TTY containers
2026-04-07 10:19:21 -05:00
Tobin Feldman-Fitzthum
e40abcf72d nvidia: add nvrc.smi.srs=1 to default nvidia kernel params
The attestation-agent no longer sets nvidia devices to ready
automatically. Instead, we should use nvrc for this. Since this is
required for all nvidia workloads, add it to the default nv kernel
params.

With bounce buffers, the timing of attesting a device versus setting it
to ready is not so important.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 14:28:50 +00:00
Manuel Huber
0fd4559f7e docs: Update NVIDIA GPU passthrough QEMU scenario
Updates for the NVIDIA GPU passthrough scenario for the
kata-containers release 3.29.0.

Signed-off-by: Manuel Huber <manuelh@nvidia.com>
2026-04-07 14:58:40 +02:00
Tobin Feldman-Fitzthum
7385938c57 tests: fix default KBS Policy path
We recently moved the default policy in the Trustee repo. Now it's in
the same place as all the other policies. Update the test code to match.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 05:46:27 +00:00
Tobin Feldman-Fitzthum
38e04bb6d8 versions: bump guest-components for switch attestation
Pick up the new version of guest-components which uses NVAT bindings
instead of NVML bindings. This will allow us to attest guests with
nvswitches.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-07 05:46:27 +00:00
Rophy Tsai
f7d9024249 tests: strip \r from kubectl exec output for TTY containers
The busybox-pod.yaml test fixture sets tty: true on the second
container. When a container has a TTY, kubectl exec may return
\r\n line endings. The invisible \r causes string comparisons
to fail:

  container_name=$(kubectl exec ... -- env | grep CONTAINER_NAME)
  [ "$container_name" == "CONTAINER_NAME=second-test-container" ]

This comparison fails because $container_name contains a trailing
\r character.

Fix by piping through tr -d '\r' after grep. This is harmless
when \r is absent and fixes the mismatch when present.

Fixes: #9136

Signed-off-by: Rophy Tsai <rophy@users.noreply.github.com>
2026-04-07 01:35:10 +00:00
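The mismatch described in this commit message can be reproduced without a cluster. The following is a minimal sketch, assuming `printf` stands in for the `kubectl exec` output of a TTY container; the variable names `expected`, `raw`, and `clean` are illustrative and not part of the test suite.

```shell
expected="CONTAINER_NAME=second-test-container"

# Simulate TTY output: the line ends with \r\n instead of \n.
# Command substitution strips the trailing \n but keeps the \r.
raw=$(printf 'CONTAINER_NAME=second-test-container\r\n' | grep CONTAINER_NAME)

if [ "$raw" = "$expected" ]; then
    echo "raw: match"
else
    echo "raw: mismatch (trailing carriage return)"
fi

# Deleting \r after grep restores the expected value.
clean=$(printf 'CONTAINER_NAME=second-test-container\r\n' | grep CONTAINER_NAME | tr -d '\r')

if [ "$clean" = "$expected" ]; then
    echo "clean: match"
else
    echo "clean: mismatch"
fi
```

When the input carries no `\r`, `tr -d '\r'` passes it through unchanged, which is why the fix is safe for containers without a TTY.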
Tobin Feldman-Fitzthum
3d60196735 versions: bump Trustee to pickup PPCIE support
Trustee is compatible with old guest components (using NVML bindings) or
new guest components (using NVAT). If we have the new version of gc, we
can attest PPCIE guests, which we need the new version of Trustee to
verify.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Tobin Feldman-Fitzthum
0444d70704 rootfs: add runtime support for NVAT
Update NVIDIA rootfs builder to include runtime dependencies for NVAT
Rust bindings.

The nvattest package does not include the .so file, so we need to build
from source.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Tobin Feldman-Fitzthum
78c61459f8 packaging: add built-time support for NVAT
The attestation agent will soon rely on the NVAT Rust bindings, which
have some build-time dependencies.

There is currently no nvattest-dev package, so we need to build from
source to get the headers and .so file.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-04-06 17:51:12 +00:00
Tobin Feldman-Fitzthum
8944058a5b versions: add nvat version
Keep track of which version of NVIDIA Attestation SDK to use when
building the attestation agent with NVIDIA support.

Signed-off-by: Tobin Feldman-Fitzthum <tfeldmanfitz@nvidia.com>
2026-03-31 21:15:03 +00:00
15 changed files with 120 additions and 37 deletions

@@ -14,7 +14,7 @@ runs:
 using: "composite"
 steps:
 - name: Install Rust
-uses: actions-rs/toolchain@v1
+uses: actions-rs/toolchain@b2417cde72dcf67f306c0ae8e0828a81bf0b189f # v1.0.6
 with:
 profile: minimal
 toolchain: nightly

@@ -227,7 +227,7 @@ Kata's non-TEE and TEE GPU workload deployment scenarios for your Kubernetes
 nodes. We provide guidance based on the upstream Kata CI procedures for the
 NVIDIA GPU CI validation jobs. Note that, this setup:
-- uses the guest image pull method to pull container image layers
+- uses the nydus snapshotter to pull container image layers in the guest
 - uses the genpolicy tool to attach Kata agent security policies to the pod
 manifest
 - has dedicated (composite) attestation tests, a CUDA vectorAdd test, and a
@@ -250,6 +250,17 @@ Service NRAS
 - container image signature verification and encrypted container images
 - ephemeral container data and image layer storage
+For the use of these features, we refer to separate documentation in the
+kata-containers and confidential-containers documentation resources.
+For example, see a
+[list of features](https://confidentialcontainers.org/docs/features/) along
+with their documentation in the confidential-containers documentation.
+> **Note:**
+>
+> Image signature verification for signed multi-arch images is currently not
+> supported.
 ### Requirements
 The requirements for the TEE scenario are:
@@ -272,8 +283,8 @@ selecting proper hardware and on properly configuring its firmware and OS.
 #### Containerd and Kubernetes
 First, set up your Kubernetes cluster. For instance, in Kata CI, our NVIDIA
-jobs use a single-node vanilla Kubernetes cluster with a 2.1 containerd
-version and Kata's current supported Kubernetes version. This cluster is
+jobs use a single-node vanilla Kubernetes cluster with containerd v2.2
+and Kata's current supported Kubernetes version. This cluster is
 being set up using the `deploy_k8s` function from the script file
 `tests/integration/kubernetes/gha-run.sh`. If you intend to run this script,
 follow these steps, and make sure you have `yq` and `helm` installed. Note
@@ -284,7 +295,7 @@ You can execute the function as follows:
 $ export GH_TOKEN="<your-gh-pat>"
 $ export KUBERNETES="vanilla"
 $ export CONTAINER_ENGINE="containerd"
-$ export CONTAINER_ENGINE_VERSION="v2.1"
+$ export CONTAINER_ENGINE_VERSION="v2.2"
 $ source tests/gha-run-k8s-common.sh
 $ deploy_k8s
 ```
@@ -300,6 +311,13 @@ $ deploy_k8s
 > `create_container_timeout` of 1200s, which is the equivalent value on shim
 > side, controlling the time the shim allows for a container to remain in
 > *container creating* state.
+> If you need a timeout of more than 1200s, you will also need to adjust the
+> agent's `image_pull_timeout`, which in turn sets the confidential data
+> hub's image pull API timeout in seconds. For this, add the
+> `agent.image_pull_timeout=<seconds>` kernel parameter to your shim
+> configuration's `kernel_params` field, or pass the parameter explicitly
+> via the `io.katacontainers.config.hypervisor.kernel_params: "..."` pod
+> annotation. The default value for this timeout is 1200s.
 > **Note:**
 >
@@ -356,7 +374,7 @@ $ helm install --wait --generate-name \
 Install the latest Kata Containers helm chart, similar to
 [existing documentation](https://github.com/kata-containers/kata-containers/blob/main/tools/packaging/kata-deploy/helm-chart/README.md)
-(minimum version: `3.24.0`).
+(minimum version: `3.29.0`).
 ```bash
 $ export VERSION=$(curl -sSL https://api.github.com/repos/kata-containers/kata-containers/releases/latest | jq .tag_name | tr -d '"')
@@ -371,6 +389,13 @@ $ helm install kata-deploy \
 "${CHART}" --version "${VERSION}"
 ```
+> **Note:**
+>
+> For node lifecycle management, see the
+> [lifecycle-manager](https://github.com/kata-containers/lifecycle-manager)
+> repository which enables Argo Workflows-based lifecycle management for your
+> node's Kata deployments.
 #### Trustee's KBS for remote attestation
 For our Kata CI runners we use Trustee's KBS for composite attestation for
@@ -566,21 +591,21 @@ With GPU passthrough being supported by the
 you can use the tool to create a Kata agent security policy. Our CI deploys
 all sample pod manifests with a Kata agent security policy.
-Note that, using containerd 2.1 in upstream's CI, we use the following
-modification to the genpolicy default settings:
+Note that, in Kata CI, we use snippets such as the following to modify the
+genpolicy default settings:
 ```bash
 [
 {
 "op": "replace",
 "path": "/kata_config/oci_version",
-"value": "1.2.1"
+"value": "1.3.0"
 }
 ]
 ```
 This modification is applied via the genpolicy drop-in configuration file
-`src\tools\genpolicy\drop-in-examples\20-oci-1.2.1-drop-in.json`.
-When using a newer containerd version, such as containerd 2.2, the OCI
-version field needs to be adjusted to "1.3.0", for instance.
+`src/tools/genpolicy/drop-in-examples/20-oci-1.3.0-drop-in.json`.
+When using a newer (or older) containerd version, the OCI version field
+may need to be adjusted accordingly.
 #### Deploy pods using your own containers and manifests

@@ -495,6 +495,9 @@ ifneq (,$(QEMUCMD))
 KERNELPARAMS_NV += "pci=nocrs"
 KERNELPARAMS_NV += "pci=assign-busses"
+KERNELPARAMS_CONFIDENTIAL_NV = $(KERNELPARAMS_NV)
+KERNELPARAMS_CONFIDENTIAL_NV += "nvrc.smi.srs=1"
 # Setting this to false can lead to cgroup leakages in the host
 # Best practice for production is to set this to true
 DEFSANDBOXCGROUPONLY_NV = true
@@ -667,6 +670,7 @@ USER_VARS += DEFAULTMEMORY_NV
 USER_VARS += DEFAULTVFIOPORT_NV
 USER_VARS += DEFAULTPCIEROOTPORT_NV
 USER_VARS += KERNELPARAMS_NV
+USER_VARS += KERNELPARAMS_CONFIDENTIAL_NV
 USER_VARS += KERNELVERITYPARAMS_NV
 USER_VARS += KERNELVERITYPARAMS_CONFIDENTIAL_NV
 USER_VARS += DEFAULTTIMEOUT_NV

@@ -90,7 +90,7 @@ snp_guest_policy = 196608
 # may stop the virtual machine from booting.
 # To see the list of default parameters, enable hypervisor debug, create a
 # container and look for 'default-kernel-parameters' log entries.
-kernel_params = "@KERNELPARAMS_NV@"
+kernel_params = "@KERNELPARAMS_CONFIDENTIAL_NV@"
 # Optional dm-verity parameters (comma-separated key=value list):
 # root_hash=...,salt=...,data_blocks=...,data_block_size=...,hash_block_size=...

@@ -67,7 +67,7 @@ valid_hypervisor_paths = @QEMUTDXEXPERIMENTALVALIDHYPERVISORPATHS@
 # may stop the virtual machine from booting.
 # To see the list of default parameters, enable hypervisor debug, create a
 # container and look for 'default-kernel-parameters' log entries.
-kernel_params = "@KERNELPARAMS_NV@"
+kernel_params = "@KERNELPARAMS_CONFIDENTIAL_NV@"
 # Optional dm-verity parameters (comma-separated key=value list):
 # root_hash=...,salt=...,data_blocks=...,data_block_size=...,hash_block_size=...

@@ -45,7 +45,7 @@ kbs_set_allow_all_resources() {
 kbs_set_default_policy() {
 kbs_set_resources_policy \
-"${COCO_KBS_DIR}/src/policy_engine/opa/default_policy.rego"
+"${COCO_KBS_DIR}/sample_policies/default.rego"
 }
 # Set "deny all" policy to resources.

@@ -69,11 +69,11 @@ EOF"
 ## Cases for target container
 ### First container
-container_name=$(kubectl exec $pod_name -c $first_container_name -- $env_command | grep CONTAINER_NAME)
+container_name=$(kubectl exec $pod_name -c $first_container_name -- $env_command | grep CONTAINER_NAME | tr -d '\r')
 [ "$container_name" == "CONTAINER_NAME=$first_container_name" ]
 ### Second container
-container_name=$(kubectl exec $pod_name -c $second_container_name -- $env_command | grep CONTAINER_NAME)
+container_name=$(kubectl exec $pod_name -c $second_container_name -- $env_command | grep CONTAINER_NAME | tr -d '\r')
 [ "$container_name" == "CONTAINER_NAME=$second_container_name" ]
 }

@@ -29,14 +29,6 @@ setup() {
 envsubst < "${pod_yaml_in}" > "${pod_yaml}"
-if [ "${TEE}" = "true" ]; then
-kernel_params_annotation="io.katacontainers.config.hypervisor.kernel_params"
-kernel_params_value="nvrc.smi.srs=1"
-set_metadata_annotation "${pod_yaml}" \
-"${kernel_params_annotation}" \
-"${kernel_params_value}"
-fi
 policy_settings_dir="$(create_tmp_policy_settings_dir "${pod_config_dir}")"
 add_requests_to_policy_settings "${policy_settings_dir}" "ReadStreamRequest"

@@ -35,15 +35,16 @@ setup() {
 kubectl wait --for=condition=Ready --timeout=$timeout pod $pod_name
 # Check PID from first container
+# Strip \r — containers with tty: true return \r\n line endings
 first_pid_container=$(kubectl exec $pod_name -c $first_container_name \
--- $ps_command | grep "/pause")
+-- $ps_command | grep "/pause" | tr -d '\r')
 # Verify that is not empty
 check_first_pid=$(echo $first_pid_container | wc -l)
 [ "$check_first_pid" == "1" ]
 # Check PID from second container
 second_pid_container=$(kubectl exec $pod_name -c $second_container_name \
--- $ps_command | grep "/pause")
+-- $ps_command | grep "/pause" | tr -d '\r')
 # Verify that is not empty
 check_second_pid=$(echo $second_pid_container | wc -l)
 [ "$check_second_pid" == "1" ]
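A side observation on the "Verify that is not empty" checks in this hunk: `echo $var | wc -l` reports one line even when `$var` is empty, because `echo` always emits a newline. The following is a minimal sketch of that behaviour and of a stricter check. It is an illustration only and not part of this change, and the variable name `pid_line` is hypothetical.

```shell
# An empty result, as if grep had matched nothing.
pid_line=""

# echo prints a bare newline, so wc -l still counts one line.
# tr removes the padding some wc implementations add.
count=$(echo $pid_line | wc -l | tr -d ' ')
echo "line count for empty input: $count"

# A stricter alternative: test the string itself for non-emptiness.
if [ -n "$pid_line" ]; then
    result="non-empty"
else
    result="empty"
fi
echo "string test: $result"
```

With `[ -n "$pid_line" ]`, an empty `grep` result would fail the assertion instead of passing it.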

@@ -68,12 +68,12 @@ install_userspace_components() {
 libnvidia-decode libnvidia-fbc1 libnvidia-encode \
 libnvidia-nscq libnvidia-compute nvidia-settings
-# Needed for confidential-data-hub runtime dependencies
+# Needed for confidential-data-hub and NVAT runtime dependencies
 eval "${APT_INSTALL}" cryptsetup-bin dmsetup \
-libargon2-1 e2fsprogs
+libargon2-1 e2fsprogs libxml2
 apt-mark hold cryptsetup-bin dmsetup libargon2-1 \
-e2fsprogs
+e2fsprogs libxml2
 }
 setup_apt_repositories() {

@@ -224,6 +224,26 @@ chisseled_gpudirect() {
 exit 1
 }
+chisseled_nvat() {
+if [[ "${type}" != "confidential" ]]; then
+return
+fi
+echo "nvidia: chisseling NVAT"
+local libdir="lib/${machine_arch}-linux-gnu"
+# NVAT shared library (bundled via coco-guest-components tarball)
+cp -a "${stage_one}"/usr/local/lib/libnvat.so* "${libdir}"/.
+# NVAT runtime dependencies (per ldd on attestation-agent)
+cp -a "${stage_one}/${libdir}"/libxml2.so.2* "${libdir}"/.
+cp -a "${stage_one}/${libdir}"/libstdc++.so.6* "${libdir}"/.
+cp -a "${stage_one}/${libdir}"/liblzma.so.5* "${libdir}"/.
+cp -a "${stage_one}/${libdir}"/libicuuc.so.* "${libdir}"/.
+cp -a "${stage_one}/${libdir}"/libicudata.so.* "${libdir}"/.
+}
 setup_nvrc_init_symlinks() {
 local nvrc="NVRC-${machine_arch}-unknown-linux-musl"
 # make sure NVRC is the init process for the initrd and image case
@@ -358,7 +378,7 @@ coco_guest_components() {
 local -r pause_dir="pause_bundle"
 mkdir -p "${coco_bin_dir}"
-cp -a "${stage_one}/${coco_bin_dir}"/attestation-agent "${coco_bin_dir}/."
+cp -a "${stage_one}/${coco_bin_dir}"/attestation-agent-nv "${coco_bin_dir}/attestation-agent"
 cp -a "${stage_one}/${coco_bin_dir}"/api-server-rest "${coco_bin_dir}/."
 cp -a "${stage_one}/${coco_bin_dir}"/confidential-data-hub "${coco_bin_dir}/."
@@ -418,6 +438,7 @@ setup_nvidia_gpu_rootfs_stage_two() {
 done
 coco_guest_components
+chisseled_nvat
 fi
 compress_rootfs

@@ -14,7 +14,7 @@ ENV PATH="/opt/cargo/bin/:${PATH}"
 SHELL ["/bin/bash", "-o", "pipefail", "-c"]
-RUN mkdir ${RUSTUP_HOME} ${CARGO_HOME} && chmod -R a+rwX ${RUSTUP_HOME} ${CARGO_HOME}
+RUN mkdir ${RUSTUP_HOME} ${CARGO_HOME}
 RUN apt-get update && \
 apt-get --no-install-recommends install -y \
@@ -38,6 +38,18 @@ RUN apt-get update && \
 apt-get clean && rm -rf /var/lib/apt/lists/ && \
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain ${RUST_TOOLCHAIN}
+ARG NVAT_VERSION
+RUN if [ "$(uname -m)" = "x86_64" ] && [ -n "${NVAT_VERSION}" ]; then \
+apt-get update && apt-get --no-install-recommends install -y \
+build-essential libxml2-dev zlib1g-dev && \
+tmpdir=$(mktemp -d) && pushd "$tmpdir" && \
+git clone https://github.com/NVIDIA/attestation-sdk && \
+pushd attestation-sdk && git fetch --depth=1 origin "${NVAT_VERSION}" && \
+git checkout FETCH_HEAD && pushd nv-attestation-sdk-cpp && cmake . && make install && \
+mkdir -p /usr/include && ln -sf /usr/local/include/nvat.h /usr/include/nvat.h && ldconfig && \
+popd && popd && popd && rm -rf "$tmpdir" && \
+apt-get clean && rm -rf /var/lib/apt/lists/; fi
 ENV LIBC="gnu"
 RUN ARCH=$(uname -m); \
 rust_arch=""; \
@@ -50,3 +62,5 @@ RUN ARCH=$(uname -m); \
 esac; \
 echo "RUST_ARCH=${rust_arch}" > /etc/profile.d/rust.sh; \
 rustup target add "${rust_arch}-unknown-linux-${LIBC}"
+RUN chmod -R a+rwX ${RUSTUP_HOME} ${CARGO_HOME}

@@ -35,6 +35,22 @@ build_coco_guest_components_from_source() {
 DESTDIR="${DESTDIR}/usr/local/bin" TEE_PLATFORM=${TEE_PLATFORM} make install
 install -D -m0644 "confidential-data-hub/hub/src/image/ocicrypt_config.json" "${DESTDIR}/etc/ocicrypt_config.json"
+if [ -n "${NV_ATTESTER:-}" ]; then
+echo "build attestation-agent-nv with nvidia-attester support"
+rm "target/${RUST_ARCH}-unknown-linux-${LIBC}/release/attestation-agent"
+ATTESTER="${NV_ATTESTER}" NVAT_USE_SYSTEM_LIB=1 RUSTFLAGS="-L /usr/local/lib" \
+DESTDIR="${DESTDIR}/usr/local/bin" TEE_PLATFORM=${TEE_PLATFORM} make build
+strip "target/${RUST_ARCH}-unknown-linux-${LIBC}/release/attestation-agent"
+install -D -m0755 "target/${RUST_ARCH}-unknown-linux-${LIBC}/release/attestation-agent" \
+"${DESTDIR}/usr/local/bin/attestation-agent-nv"
+mkdir -p "${DESTDIR}/usr/local/lib"
+cp -a /usr/local/lib/libnvat.so* "${DESTDIR}/usr/local/lib/"
+fi
 popd
 }

@@ -28,12 +28,16 @@ package_output_dir="${package_output_dir:-}"
 [ -n "${coco_guest_components_version}" ] || die "Failed to get coco-guest-components version or commit"
 [ -n "${coco_guest_components_toolchain}" ] || die "Failed to get the rust toolchain to build coco-guest-components"
+nvat_version="${nvat_version:-}"
+[ -n "${nvat_version}" ] || nvat_version=$(get_from_kata_deps ".externals.nvidia.nvat.version" 2>/dev/null || true)
 container_image="${COCO_GUEST_COMPONENTS_CONTAINER_BUILDER:-$(get_coco_guest_components_image_name)}"
 [ "${CROSS_BUILD}" == "true" ] && container_image="${container_image}-cross-build"
 docker pull ${container_image} || \
 (docker $BUILDX build $PLATFORM \
 --build-arg RUST_TOOLCHAIN="${coco_guest_components_toolchain}" \
+--build-arg NVAT_VERSION="${nvat_version}" \
 -t "${container_image}" "${script_dir}" && \
 # No-op unless PUSH_TO_REGISTRY is exported as "yes"
 push_to_registry "${container_image}")
@@ -44,7 +48,8 @@ RESOURCE_PROVIDER="kbs,sev"
 # snp-attester and tdx-attester crates require packages only available on x86
 # se-attester crate requires packages only available on s390x
 case "$(uname -m)" in
-x86_64) ATTESTER="snp-attester,tdx-attester,nvidia-attester" ;;
+x86_64) ATTESTER="snp-attester,tdx-attester"
+NV_ATTESTER="snp-attester,tdx-attester,nvidia-attester" ;;
 s390x) ATTESTER="se-attester" ;;
 aarch64) ATTESTER="cca-attester" ;;
 *) ATTESTER="none" ;;
@@ -56,6 +61,7 @@ docker run --rm -i -v "${repo_root_dir}:${repo_root_dir}" \
 --env TEE_PLATFORM=${TEE_PLATFORM:+"all"} \
 --env RESOURCE_PROVIDER=${RESOURCE_PROVIDER:-} \
 --env ATTESTER=${ATTESTER:-} \
+--env NV_ATTESTER=${NV_ATTESTER:-} \
 --env coco_guest_components_repo="${coco_guest_components_repo}" \
 --env coco_guest_components_version="${coco_guest_components_version}" \
 --user "$(id -u)":"$(id -g)" \

@@ -269,6 +269,10 @@ externals:
 ctk:
 version: "1.18.1-1"
 url: "https://github.com/NVIDIA/nvidia-container-toolkit"
+nvat:
+desc: "NVIDIA Attestation SDK"
+version: "2026.03.02"
+url: "https://github.com/NVIDIA/attestation-sdk"
 busybox:
 desc: "The Swiss Army Knife of Embedded Linux"
@@ -288,18 +292,18 @@ externals:
 coco-guest-components:
 description: "Provides attested key unwrapping for image decryption"
 url: "https://github.com/confidential-containers/guest-components/"
-version: "ab95914ac84c32a43102463cc0ae330710af47be"
+version: "30b552e7841b10e656fa28cf643ed25b9d45e33f"
 toolchain: "1.90.0"
 coco-trustee:
 description: "Provides attestation and secret delivery components"
 url: "https://github.com/confidential-containers/trustee"
-version: "f5cb8fc1b51b652fc24e2d6b8742cf417805352e"
+version: "22788122660d6e9be3e4bf52704282de5fcc0a2a"
 # image / ita_image and image_tag / ita_image_tag must be in sync
 image: "ghcr.io/confidential-containers/staged-images/kbs"
-image_tag: "f5cb8fc1b51b652fc24e2d6b8742cf417805352e"
+image_tag: "22788122660d6e9be3e4bf52704282de5fcc0a2a"
 ita_image: "ghcr.io/confidential-containers/staged-images/kbs-ita-as"
-ita_image_tag: "f5cb8fc1b51b652fc24e2d6b8742cf417805352e-x86_64"
+ita_image_tag: "22788122660d6e9be3e4bf52704282de5fcc0a2a-x86_64"
 toolchain: "1.90.0"
 containerd: