Merge pull request #11076 from Jakob-Naucke/ap-bind-assoc

Bind/associate for VFIO-AP
Hyounggyu Choi 2025-05-09 09:32:46 +02:00 committed by GitHub
commit a286a5aee8
GPG Key ID: B5690EEEBB952194
7 changed files with 243 additions and 28 deletions

@ -32,6 +32,7 @@ See the [how-to documentation](how-to).
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)
* [IBM Crypto Express passthrough with Confidential Containers](./use-cases/CEX-passthrough-and-coco.md)
## Developer Guide

@ -0,0 +1,96 @@
# Using IBM Crypto Express with Confidential Containers
On IBM Z (s390x), IBM Crypto Express (CEX) hardware security modules (HSM) can be passed through to virtual guests.
This VFIO pass-through is domain-wise, i.e. guests can securely share one physical card.
For the Accelerator and Enterprise PKCS #11 (EP11) modes of CEX, on IBM z16 and up, pass-through is also supported when using the IBM Secure Execution trusted execution environment.
To maintain confidentiality when using EP11 within Secure Execution, additional steps are required.
When using Secure Execution within Kata Containers, some of these steps are managed by the Kata agent, but preparation is required to make pass-through work.
The Kata agent will expect required confidential information at runtime via [Confidential Data Hub](https://github.com/confidential-containers/guest-components/tree/main/confidential-data-hub) from Confidential Containers, and this guide assumes Confidential Containers components as a means of secret provisioning.
At the time of writing, devices for trusted execution environments are only supported via the `--device` option of e.g. `ctr`, `docker`, or `podman`, but **not** via Kubernetes.
Refer to [KEP 4113](https://github.com/kubernetes/enhancements/pull/4113) for details.
Using a CEX card in Accelerator mode is much simpler and does not require the steps below.
To do so, prepare [Kata for Secure Execution](../how-to/how-to-run-kata-containers-with-SE-VMs.md), set `vfio_mode = "vfio"` and `cold_plug_vfio = "bridge-port"` in the Kata `configuration.toml` file and use a [mediated device](../../src/runtime/virtcontainers/README.md#how-to-pass-a-device-using-vfio-ap-passthrough) similar to operating without Secure Execution.
The Kata agent will do the [Secure Execution bind](https://www.ibm.com/docs/en/linux-on-systems?topic=adapters-accelerator-mode) automatically.
## Prerequisites
- A host kernel that supports adjunct processor (AP) pass-through with Secure Execution. [Official support](https://www.ibm.com/docs/en/linux-on-systems?topic=restrictions-required-software) exists as of Ubuntu 24.04, RHEL 8.10 and 9.4, and SLES 15 SP6.
- An EP11 domain with a master key set up. In this process, you will need the master key verification pattern (MKVP) [1].
- A [mediated device](../../src/runtime/virtcontainers/README.md#how-to-pass-a-device-using-vfio-ap-passthrough), created from this domain, to pass through.
- Working [Kata Containers with Secure Execution](../how-to/how-to-run-kata-containers-with-SE-VMs.md).
- Working access to a [key broker service (KBS) with the IBM Secure Execution verifier](https://github.com/confidential-containers/trustee/blob/main/deps/verifier/src/se/README.md) from a Kata container. The provided Secure Execution header must match the Kata guest image and a policy to allow the appropriate secrets for this guest must be set up.
- In Kata's `configuration.toml`, set `vfio_mode = "vfio"` and `cold_plug_vfio = "bridge-port"`
## Prepare an association secret
An EP11 Secure Execution workload requires an [association secret](https://www.ibm.com/docs/en/linux-on-systems?topic=adapters-ep11-mode) to be inserted in the guest and associated with the adjunct processor (AP) queue.
In Kata Containers, this secret must be created and made available via Trustee, whereas the Kata agent performs the actual secret insertion and association.
On a trusted system, create an association secret from the host key document (HKD) `z16.crt`, the guest header `hdr.bin`, a CA certificate `DigiCertCA.crt`, and an IBM signing key `ibm-z-host-key-signing-gen2.crt`. To let the command generate a random association secret named `my secret` and save it to `my_random_secret`, run:
```
[trusted]# pvsecret create -k z16.crt --hdr hdr.bin -o my_addsecreq \
--crt DigiCertCA.crt --crt ibm-z-host-key-signing-gen2.crt \
association "my secret" --output-secret my_random_secret
```
using `pvsecret` from the [s390-tools](https://github.com/ibm-s390-linux/s390-tools) suite.
`hdr.bin` **must** be the Secure Execution header matching the Kata guest image, i.e. the one also provided to Trustee.
This command saves the add-secret request itself to `my_addsecreq`, and information on the secret, including the secret ID, to `my_secret.yaml`.
This secret ID must be provided alongside the secret.
Write the ID to `my_addsecid`, with or without a leading `0x`. For example, using `yq`:
```
[trusted]# yq ".id" my_secret.yaml > my_addsecid
```
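The agent code in this change strips a leading `0x` and trailing whitespace from the secret ID before comparing it, which is why either form works. A minimal sketch of that normalization follows; the function name `normalize_secret_id` is hypothetical and is not part of the Kata agent.

```rust
/// Hypothetical helper mirroring the trimming the agent applies to the
/// secret ID: drop a leading "0x" and any trailing whitespace or newline.
fn normalize_secret_id(raw: &str) -> &str {
    raw.trim_start_matches("0x").trim_end()
}

fn main() {
    // Illustrative values only; a real secret ID comes from `my_secret.yaml`.
    for raw in ["0xabcdef01\n", "abcdef01"] {
        println!("{:?} -> {:?}", raw, normalize_secret_id(raw));
    }
}
```

Note that only trailing whitespace is removed, so the file should not begin with blanks.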
## Provision the association secret with Trustee
The secret and secret ID must be provided via Trustee with respect to the MKVP.
The Kata agent expects this information at the paths `vfio_ap/${mkvp}/secret` and `vfio_ap/${mkvp}/secret_id`, where `${mkvp}` is the first 16 bytes (32 hexadecimal digits) of the MKVP, without the leading `0x`.
For example, if your MKVPs read [1] as
```
WK CUR: valid 0xdb3c3b3c3f097dd55ec7eb0e7fdbcb933b773619640a1a75a9161cec00000000
WK NEW: empty -
```
use `db3c3b3c3f097dd55ec7eb0e7fdbcb93` in the provision for Trustee.
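The truncation can be sketched in a few lines of Rust. This is an illustration for operators preparing the Trustee paths, not code from the Kata agent; the helper names `mkvp_path_component` and `resource_paths` are hypothetical.

```rust
/// Hypothetical helper: take an MKVP as printed in the `mkvps` file
/// (with or without a leading "0x") and keep its first 32 hex digits.
fn mkvp_path_component(mkvp: &str) -> Option<String> {
    let trimmed = mkvp.trim();
    let hex = trimmed.strip_prefix("0x").unwrap_or(trimmed);
    // `get` returns None if the string is shorter than 32 characters.
    let head = hex.get(..32)?;
    if head.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(head.to_string())
    } else {
        None
    }
}

/// Build the two resource paths named in this guide for a given MKVP.
fn resource_paths(mkvp: &str) -> Option<(String, String)> {
    let id = mkvp_path_component(mkvp)?;
    Some((
        format!("vfio_ap/{id}/secret"),
        format!("vfio_ap/{id}/secret_id"),
    ))
}

fn main() {
    let cur = "0xdb3c3b3c3f097dd55ec7eb0e7fdbcb933b773619640a1a75a9161cec00000000";
    if let Some((secret, secret_id)) = resource_paths(cur) {
        println!("{secret}");
        println!("{secret_id}");
    }
}
```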
With a KBS running at `127.0.0.1:8080`, to store the secret and ID created above in the KBS with the authentication key `kbs.key` and this MKVP, run:
```
[trusted]# kbs-client --url http://127.0.0.1:8080 config \
--auth-private-key kbs.key set-resource \
--path vfio_ap/db3c3b3c3f097dd55ec7eb0e7fdbcb93/secret \
--resource-file my_addsecreq
[trusted]# kbs-client --url http://127.0.0.1:8080 config \
--auth-private-key kbs.key set-resource \
--path vfio_ap/db3c3b3c3f097dd55ec7eb0e7fdbcb93/secret_id \
--resource-file my_addsecid
```
## Run the workload
Assuming the mediated device exists at `/dev/vfio/0`, run e.g.
```
[host]# docker run --rm --runtime io.containerd.run.kata.v2 --device /dev/vfio/0 -it ubuntu
```
If you have [s390-tools](https://github.com/ibm-s390-linux/s390-tools) available in the container, you can see the available CEX domains including Secure Execution info using `lszcrypt -V`:
```
[container]# lszcrypt -V
CARD.DOM TYPE MODE STATUS REQUESTS PENDING HWTYPE QDEPTH FUNCTIONS DRIVER SESTAT
--------------------------------------------------------------------------------------------------------
03 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4card -
03.0041 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4queue usable
```
---
[1] If you have access to the host, the MKVP can be read at `/sys/bus/ap/card${cardno}/${apqn}/mkvps`, where `${cardno}` is the two-digit hexadecimal identifier of the card, and `${apqn}` is the APQN of the domain you want to pass, e.g. `card03/03.0041` for domain 0x41 on card 3.
This information is only readable when card and domain are not yet masked for use with VFIO.
If you do not have access to the host, you should receive the MKVP from your HSM domain administrator.
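As a small illustration of the naming in the footnote above, the `card${cardno}/${apqn}` component can be derived from a card number and a domain number as follows. The function name `apqn_sysfs_suffix` is hypothetical, and the sketch only formats the string; it does not touch sysfs.

```rust
/// Hypothetical helper: format the `card${cardno}/${apqn}` path component,
/// using two hex digits for the card and four for the domain.
fn apqn_sysfs_suffix(card: u8, domain: u16) -> String {
    format!("card{card:02x}/{card:02x}.{domain:04x}")
}

fn main() {
    // Domain 0x41 on card 3, as in the example above.
    println!("{}", apqn_sysfs_suffix(3, 0x41));
}
```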

src/agent/Cargo.lock generated

@ -3085,6 +3085,7 @@ dependencies = [
"rtnetlink",
"runtime-spec",
"rustjail",
"s390_pv_core",
"safe-path",
"scan_fmt",
"scopeguard",
@ -5575,6 +5576,20 @@ version = "0.2.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6518fc26bced4d53678a22d6e423e9d8716377def84545fe328236e3af070e7f"
[[package]]
name = "s390_pv_core"
version = "0.11.0"
source = "git+https://github.com/ibm-s390-linux/s390-tools?rev=4942504a9a2977d49989a5e5b7c1c8e07dc0fa41#4942504a9a2977d49989a5e5b7c1c8e07dc0fa41"
dependencies = [
"byteorder",
"libc",
"log",
"regex",
"serde",
"thiserror 2.0.12",
"zerocopy 0.7.35",
]
[[package]]
name = "safe-path"
version = "0.1.0"
@ -7768,6 +7783,7 @@ version = "0.7.35"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1b9b4fd18abc82b8136838da5d50bae7bdea537c574d8dc1a34ed098d6c166f0"
dependencies = [
"byteorder",
"zerocopy-derive 0.7.35",
]

@ -187,6 +187,9 @@ base64 = "0.22"
sha2 = "0.10.8"
async-compression = { version = "0.4.22", features = ["tokio", "gzip"] }
[target.'cfg(target_arch = "s390x")'.dependencies]
pv_core = { git = "https://github.com/ibm-s390-linux/s390-tools", rev = "4942504a9a2977d49989a5e5b7c1c8e07dc0fa41", package = "s390_pv_core" }
[dev-dependencies]
tempfile.workspace = true
which.workspace = true

@ -11,8 +11,12 @@ use crate::AGENT_CONFIG;
use anyhow::{bail, Context, Result};
use derivative::Derivative;
use protocols::{
confidential_data_hub, confidential_data_hub_ttrpc_async,
confidential_data_hub_ttrpc_async::{SealedSecretServiceClient, SecureMountServiceClient},
confidential_data_hub,
confidential_data_hub::GetResourceRequest,
confidential_data_hub_ttrpc_async,
confidential_data_hub_ttrpc_async::{
GetResourceServiceClient, SealedSecretServiceClient, SecureMountServiceClient,
},
};
use std::fs;
use std::os::unix::fs::symlink;
@ -39,6 +43,8 @@ pub struct CDHClient {
sealed_secret_client: SealedSecretServiceClient,
#[derivative(Debug = "ignore")]
secure_mount_client: SecureMountServiceClient,
#[derivative(Debug = "ignore")]
get_resource_client: GetResourceServiceClient,
}
impl CDHClient {
@ -47,10 +53,13 @@ impl CDHClient {
let sealed_secret_client =
confidential_data_hub_ttrpc_async::SealedSecretServiceClient::new(client.clone());
let secure_mount_client =
confidential_data_hub_ttrpc_async::SecureMountServiceClient::new(client);
confidential_data_hub_ttrpc_async::SecureMountServiceClient::new(client.clone());
let get_resource_client =
confidential_data_hub_ttrpc_async::GetResourceServiceClient::new(client);
Ok(CDHClient {
sealed_secret_client,
secure_mount_client,
get_resource_client,
})
}
@ -84,6 +93,18 @@ impl CDHClient {
.await?;
Ok(())
}
pub async fn get_resource(&self, resource_path: &str) -> Result<Vec<u8>> {
let req = GetResourceRequest {
ResourcePath: format!("kbs://{}", resource_path),
..Default::default()
};
let res = self
.get_resource_client
.get_resource(ttrpc::context::with_timeout(*CDH_API_TIMEOUT), &req)
.await?;
Ok(res.Resource)
}
}
pub async fn init_cdh_client(cdh_socket_uri: &str) -> Result<()> {
@ -201,6 +222,15 @@ pub async fn secure_mount(
Ok(())
}
#[allow(dead_code)]
pub async fn get_cdh_resource(resource_path: &str) -> Result<Vec<u8>> {
let cdh_client = CDH_CLIENT
.get()
.expect("Confidential Data Hub not initialized");
cdh_client.get_resource(resource_path).await
}
#[cfg(test)]
mod tests {
use super::*;

@ -4,14 +4,13 @@
// SPDX-License-Identifier: Apache-2.0
//
#[cfg(target_arch = "s390x")]
use crate::ap;
use crate::device::{pcipath_to_sysfs, DevUpdate, DeviceContext, DeviceHandler, SpecUpdate};
use crate::linux_abi::*;
use crate::pci;
use crate::sandbox::Sandbox;
use crate::uevent::{wait_for_uevent, Uevent, UeventMatcher};
use anyhow::{anyhow, Context, Result};
use cfg_if::cfg_if;
use kata_types::device::{
DRIVER_VFIO_AP_COLD_TYPE, DRIVER_VFIO_AP_TYPE, DRIVER_VFIO_PCI_GK_TYPE, DRIVER_VFIO_PCI_TYPE,
};
@ -27,6 +26,22 @@ use std::sync::Arc;
use tokio::sync::Mutex;
use tracing::instrument;
cfg_if! {
if #[cfg(target_arch = "s390x")] {
use crate::ap;
use crate::cdh::get_cdh_resource;
use std::convert::TryFrom;
use pv_core::ap::{
Apqn,
apqn_info::Ep11,
assoc_state::AssocState,
bind_state::BindState,
};
use pv_core::misc::{encode_hex, pv_guest_bit_set};
use pv_core::uv;
}
}
#[derive(Debug)]
pub struct VfioPciDeviceHandler {}
@ -103,7 +118,14 @@ impl DeviceHandler for VfioApDeviceHandler {
#[instrument]
async fn device_handler(&self, device: &Device, ctx: &mut DeviceContext) -> Result<SpecUpdate> {
// Force AP bus rescan
fs::write(AP_SCANS_PATH, "1")?;
let mut ap_context = String::from("Failed to rescan AP bus");
if pv_guest_bit_set() {
ap_context.push_str(
". Verify your host kernel supports AP pass-through with Secure Execution",
);
}
fs::write(AP_SCANS_PATH, "1").context(ap_context)?;
for apqn in device.options.iter() {
let ap_address = ap::Address::from_str(apqn).context("Failed to parse AP address")?;
match device.type_.as_str() {
@ -111,7 +133,7 @@ impl DeviceHandler for VfioApDeviceHandler {
wait_for_ap_device(ctx.sandbox, ap_address).await?;
}
DRIVER_VFIO_AP_COLD_TYPE => {
check_ap_device(ctx.sandbox, ap_address).await?;
check_ap_device(ap_address).await?;
}
_ => return Err(anyhow!("Unsupported AP device type: {}", device.type_)),
}
@ -214,35 +236,70 @@ async fn wait_for_ap_device(sandbox: &Arc<Mutex<Sandbox>>, address: ap::Address)
#[cfg(target_arch = "s390x")]
#[instrument]
async fn check_ap_device(sandbox: &Arc<Mutex<Sandbox>>, address: ap::Address) -> Result<()> {
let ap_path = format!(
"/sys/{}/card{:02x}/{}/online",
AP_ROOT_BUS_PATH, address.adapter_id, address
);
if !Path::new(&ap_path).is_file() {
return Err(anyhow!(
"AP device online file not found or not accessible: {}",
ap_path
));
async fn check_ap_device(address: ap::Address) -> Result<()> {
let apqn = Apqn::try_from(&address.to_string() as &str)
.context(anyhow!("Failed to establish AP at {address}"))?;
if apqn.info.is_none() {
return Err(anyhow!("Failed to read info for AP {address}"));
}
match fs::read_to_string(&ap_path) {
Ok(content) => {
let is_online = content.trim() == "1";
if !is_online {
return Err(anyhow!("AP device {} exists but is not online", address));
}
}
Err(e) => {
if !pv_guest_bit_set() {
return Ok(());
}
apqn.set_bind_state(BindState::Bound)
.context(anyhow!("Failed to bind AP {address}"))?;
if let Some(Ep11(ep11_info)) = &apqn.info {
if ep11_info.mkvp.is_empty() {
return Err(anyhow!(
"Failed to read online status for AP device {}: {}",
address,
e
"Master key verification pattern for AP {address} is unset"
));
}
associate_ap_device(&apqn, &ep11_info.mkvp)
.await
.context(anyhow!("Failed to associate AP {address}"))?;
}
Ok(())
}
#[cfg(target_arch = "s390x")]
async fn associate_ap_device(apqn: &Apqn, mkvp: &str) -> Result<()> {
let resource_path = format!("/vfio_ap/{mkvp}");
let secret_resource_path = format!("{resource_path}/secret");
let secret_id_resource_path = format!("{resource_path}/secret_id");
let uv_secret = get_cdh_resource(&secret_resource_path)
.await
.context(anyhow!(
"Failed to read Confidential Data Hub secret {secret_resource_path}. \
Provide the desired Ultravisor secret for this MKVP with an appropriate key broker service."
))?;
let secret_id_bytes = get_cdh_resource(&secret_id_resource_path)
.await
.context(anyhow!(
"Failed to read Confidential Data Hub secret {secret_id_resource_path}. \
Provide the desired Ultravisor secret ID for this MKVP with an appropriate key broker service."
))?;
let secret_id = std::str::from_utf8(&secret_id_bytes)?
.trim_start_matches("0x")
.trim_end();
// TODO Once initdata is stable, enable and mandate this request be signed
// (`pvsecret create --user-sign-key`, `pvsecret verify --user-cert`)
let uv = uv::UvDevice::open()?;
let mut add_cmd = uv::AddCmd::new(&mut uv_secret.as_slice())
.context("Failed to create add secret request")?;
uv.send_cmd(&mut add_cmd).context("Failed to add secret")?;
let mut list_cmd = uv::ListCmd::new();
uv.send_cmd(&mut list_cmd)?;
let secret_idx = uv::SecretList::try_from(list_cmd)?
.iter()
.find(|&s| encode_hex(s.id()) == secret_id)
.ok_or_else(|| anyhow!("Could not find secret with the ID {secret_id}. \
Perhaps there is a mismatch between the provided secret and secret ID."))?
.index();
Ok(apqn.set_associate_state(AssocState::Associated(secret_idx))?)
}
pub async fn wait_for_pci_device(
sandbox: &Arc<Mutex<Sandbox>>,
pcipath: &pci::Path,

@ -34,4 +34,16 @@ service SealedSecretService {
service SecureMountService {
rpc SecureMount(SecureMountRequest) returns (SecureMountResponse) {};
}
message GetResourceRequest {
string ResourcePath = 1;
}
message GetResourceResponse {
bytes Resource = 1;
}
service GetResourceService {
rpc GetResource(GetResourceRequest) returns (GetResourceResponse) {};
}