mirror of
https://github.com/kata-containers/kata-containers.git
synced 2025-07-17 00:43:36 +00:00
Merge pull request #11076 from Jakob-Naucke/ap-bind-assoc
Bind/associate for VFIO-AP
commit
a286a5aee8
@ -32,6 +32,7 @@ See the [how-to documentation](how-to).
|
||||
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
|
||||
* [SPDK vhost-user with Kata](./use-cases/using-SPDK-vhostuser-and-kata.md)
|
||||
* [Intel SGX with Kata](./use-cases/using-Intel-SGX-and-kata.md)
|
||||
* [IBM Crypto Express passthrough with Confidential Containers](./use-cases/CEX-passthrough-and-coco.md)
|
||||
|
||||
## Developer Guide
|
||||
|
||||
|
96
docs/use-cases/CEX-passthrough-and-coco.md
Normal file
@ -0,0 +1,96 @@
|
||||
# Using IBM Crypto Express with Confidential Containers
|
||||
|
||||
On IBM Z (s390x), IBM Crypto Express (CEX) hardware security modules (HSM) can be passed through to virtual guests.
|
||||
This VFIO pass-through is domain-wise, i.e. guests can securely share one physical card.
|
||||
For the Accelerator and Enterprise PKCS #11 (EP11) modes of CEX, on IBM z16 and up, pass-through is also supported when using the IBM Secure Execution trusted execution environment.
|
||||
To maintain confidentiality when using EP11 within Secure Execution, additional steps are required.
|
||||
When using Secure Execution within Kata Containers, some of these steps are managed by the Kata agent, but preparation is required to make pass-through work.
|
||||
The Kata agent will expect required confidential information at runtime via [Confidential Data Hub](https://github.com/confidential-containers/guest-components/tree/main/confidential-data-hub) from Confidential Containers, and this guide assumes Confidential Containers components as a means of secret provisioning.
|
||||
|
||||
At the time of writing, devices for trusted execution environments are only supported via the `--device` option of e.g. `ctr`, `docker`, or `podman`, but **not** via Kubernetes.
|
||||
Refer to [KEP 4113](https://github.com/kubernetes/enhancements/pull/4113) for details.
|
||||
|
||||
Using a CEX card in Accelerator mode is much simpler and does not require the steps below.
|
||||
To do so, prepare [Kata for Secure Execution](../how-to/how-to-run-kata-containers-with-SE-VMs.md), set `vfio_mode = "vfio"` and `cold_plug_vfio = "bridge-port"` in the Kata `configuration.toml` file and use a [mediated device](../../src/runtime/virtcontainers/README.md#how-to-pass-a-device-using-vfio-ap-passthrough) similar to operating without Secure Execution.
|
||||
The Kata agent will do the [Secure Execution bind](https://www.ibm.com/docs/en/linux-on-systems?topic=adapters-accelerator-mode) automatically.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- A host kernel that supports adjunct processor (AP) pass-through with Secure Execution. [Official support](https://www.ibm.com/docs/en/linux-on-systems?topic=restrictions-required-software) exists as of Ubuntu 24.04, RHEL 8.10 and 9.4, and SLES 15 SP6.
|
||||
- An EP11 domain with a master key set up. In this process, you will need the master key verification pattern (MKVP) [1].
|
||||
- A [mediated device](../../src/runtime/virtcontainers/README.md#how-to-pass-a-device-using-vfio-ap-passthrough), created from this domain, to pass through.
|
||||
- Working [Kata Containers with Secure Execution](../how-to/how-to-run-kata-containers-with-SE-VMs.md).
|
||||
- Working access to a [key broker service (KBS) with the IBM Secure Execution verifier](https://github.com/confidential-containers/trustee/blob/main/deps/verifier/src/se/README.md) from a Kata container. The provided Secure Execution header must match the Kata guest image and a policy to allow the appropriate secrets for this guest must be set up.
|
||||
- In Kata's `configuration.toml`, set `vfio_mode = "vfio"` and `cold_plug_vfio = "bridge-port"`.
|
||||
|
||||
## Prepare an association secret
|
||||
|
||||
An EP11 Secure Execution workload requires an [association secret](https://www.ibm.com/docs/en/linux-on-systems?topic=adapters-ep11-mode) to be inserted in the guest and associated with the adjunct processor (AP) queue.
|
||||
In Kata Containers, this secret must be created and made available via Trustee, whereas the Kata agent performs the actual secret insertion and association.
|
||||
To create an association secret on a trusted system, given the host key document (HKD) `z16.crt`, the guest header `hdr.bin`, the CA certificate `DigiCertCA.crt`, and the IBM signing key `ibm-z-host-key-signing-gen2.crt`, and to have the command generate a random association secret named `my secret` and save it to `my_random_secret`, run:
|
||||
|
||||
```
|
||||
[trusted]# pvsecret create -k z16.crt --hdr hdr.bin -o my_addsecreq \
|
||||
--crt DigiCertCA.crt --crt ibm-z-host-key-signing-gen2.crt \
|
||||
association "my secret" --output-secret my_random_secret
|
||||
```
|
||||
|
||||
using `pvsecret` from the [s390-tools](https://github.com/ibm-s390-linux/s390-tools) suite.
|
||||
`hdr.bin` **must** be the Secure Execution header matching the Kata guest image, i.e. the one also provided to Trustee.
|
||||
This command saves the add-secret request itself to `my_addsecreq`, and information on the secret, including the secret ID, to `my_secret.yaml`.
|
||||
This secret ID must be provided alongside the secret.
|
||||
Write it to `my_addsecid`, with or without the leading `0x`, either by hand or using `yq`:
|
||||
|
||||
```
|
||||
[trusted]# yq ".id" my_secret.yaml > my_addsecid
|
||||
```
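The agent change in this commit trims an optional leading `0x` and any trailing whitespace from the secret ID before comparing it with the IDs known to the Ultravisor. As a minimal sketch of that normalization (the function name `normalize_secret_id` is hypothetical and not part of the Kata agent), the following illustrates why both forms of `my_addsecid` are treated alike:

```rust
/// Normalize a secret ID as read from a file such as `my_addsecid`.
///
/// Hypothetical helper for illustration only. It mirrors the trimming the
/// agent applies: leading `0x` is dropped, as is trailing whitespace
/// (for example the newline that a shell redirect leaves in the file).
pub fn normalize_secret_id(raw: &str) -> &str {
    raw.trim_start_matches("0x").trim_end()
}
```

With this helper, `normalize_secret_id("0xabc123\n")` and `normalize_secret_id("abc123")` both yield `abc123`. Leading whitespace is not removed, so the file should start directly with the ID.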
|
||||
|
||||
## Provision the association secret with Trustee
|
||||
|
||||
The secret and secret ID must be provided via Trustee with respect to the MKVP.
|
||||
The paths where the Kata agent expects this information are `vfio_ap/${mkvp}/secret` and `vfio_ap/${mkvp}/secret_id`, where `${mkvp}` is the first 16 bytes (32 hexadecimal digits) of the MKVP, without the leading `0x`.
|
||||
|
||||
For example, if your MKVPs read [1] as
|
||||
|
||||
```
|
||||
WK CUR: valid 0xdb3c3b3c3f097dd55ec7eb0e7fdbcb933b773619640a1a75a9161cec00000000
|
||||
WK NEW: empty -
|
||||
```
|
||||
|
||||
use `db3c3b3c3f097dd55ec7eb0e7fdbcb93` in the provision for Trustee.
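The truncation described above can be sketched as follows. This is an illustration under assumptions, not code from the Kata agent: the helper names `mkvp_from_line` and `resource_paths` are hypothetical, and the input is assumed to look like the `WK CUR` line in the example.

```rust
/// Extract the MKVP from a line such as `WK CUR: valid 0xdb3c...` and
/// shorten it to the first 16 bytes (32 hexadecimal digits), without `0x`.
///
/// Hypothetical helper. Returns `None` if the last field is not a
/// `0x`-prefixed hex string of at least 32 digits (e.g. `WK NEW: empty -`).
pub fn mkvp_from_line(line: &str) -> Option<String> {
    let hex = line.split_whitespace().last()?.strip_prefix("0x")?;
    if hex.len() < 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // All characters are ASCII here, so slicing by byte index is safe.
    Some(hex[..32].to_ascii_lowercase())
}

/// Build the two Trustee resource paths for a shortened MKVP.
/// Returns `(secret_path, secret_id_path)`.
pub fn resource_paths(mkvp: &str) -> (String, String) {
    (
        format!("vfio_ap/{mkvp}/secret"),
        format!("vfio_ap/{mkvp}/secret_id"),
    )
}
```

Feeding the `WK CUR` line from the example into `mkvp_from_line` gives `db3c3b3c3f097dd55ec7eb0e7fdbcb93`, and `resource_paths` then produces the two paths to use with `kbs-client`.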
|
||||
With a KBS running at `127.0.0.1:8080`, to store the secret and ID created above in the KBS with the authentication key `kbs.key` and this MKVP, run:
|
||||
|
||||
```
|
||||
[trusted]# kbs-client --url http://127.0.0.1:8080 config \
|
||||
--auth-private-key kbs.key set-resource \
|
||||
--path vfio_ap/db3c3b3c3f097dd55ec7eb0e7fdbcb93/secret \
|
||||
--resource-file my_addsecreq
|
||||
[trusted]# kbs-client --url http://127.0.0.1:8080 config \
|
||||
--auth-private-key kbs.key set-resource \
|
||||
--path vfio_ap/db3c3b3c3f097dd55ec7eb0e7fdbcb93/secret_id \
|
||||
--resource-file my_addsecid
|
||||
```
|
||||
|
||||
## Run the workload
|
||||
|
||||
Assuming the mediated device exists at `/dev/vfio/0`, run e.g.
|
||||
|
||||
```
|
||||
[host]# docker run --rm --runtime io.containerd.run.kata.v2 --device /dev/vfio/0 -it ubuntu
|
||||
```
|
||||
|
||||
If you have [s390-tools](https://github.com/ibm-s390-linux/s390-tools) available in the container, you can see the available CEX domains including Secure Execution info using `lszcrypt -V`:
|
||||
|
||||
```
|
||||
[container]# lszcrypt -V
|
||||
CARD.DOM TYPE MODE STATUS REQUESTS PENDING HWTYPE QDEPTH FUNCTIONS DRIVER SESTAT
|
||||
--------------------------------------------------------------------------------------------------------
|
||||
03 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4card -
|
||||
03.0041 CEX8P EP11-Coproc online 2 0 14 08 -----XN-F- cex4queue usable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
[1] If you have access to the host, the MKVP can be read at `/sys/bus/ap/card${cardno}/${apqn}/mkvps`, where `${cardno}` is the two-digit hexadecimal identification for the card, and `${apqn}` is the APQN of the domain you want to pass, e.g. `card03/03.0041` for domain 0x41 on card 3.
|
||||
This information is only readable when card and domain are not yet masked for use with VFIO.
|
||||
If you do not have access to the host, you should receive the MKVP from your HSM domain administrator.
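As a small sketch of how `${cardno}` and `${apqn}` combine, the hypothetical helpers below format the APQN as a two-digit card and a four-digit domain in hexadecimal. The path layout is assumed to be exactly as written in this footnote, so verify it against the sysfs of your host before relying on it.

```rust
/// Format an APQN as `<card>.<domain>`, e.g. `03.0041`.
/// Hypothetical helper for illustration only.
pub fn apqn(card: u8, domain: u16) -> String {
    format!("{card:02x}.{domain:04x}")
}

/// Build the path of the `mkvps` attribute for a card and domain.
/// The layout follows the footnote above and is an assumption,
/// not something queried from the running system.
pub fn mkvps_path(card: u8, domain: u16) -> String {
    format!("/sys/bus/ap/card{card:02x}/{}/mkvps", apqn(card, domain))
}
```

For card 3 and domain 0x41, `apqn(3, 0x41)` gives `03.0041`, matching the `card03/03.0041` example.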
|
16
src/agent/Cargo.lock
generated
@ -3085,6 +3085,7 @@ dependencies = [
|
||||
"rtnetlink",
|
||||
"runtime-spec",
|
||||
"rustjail",
|
||||
"s390_pv_core",
|
||||
"safe-path",
|
||||
"scan_fmt",
|
||||
"scopeguard",
|
||||
@ -5575,6 +5576,20 @@ version = "0.2.2"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "6518fc26bced4d53678a22d6e423e9d8716377def84545fe328236e3af070e7f"
|
||||
|
||||
[[package]]
|
||||
name = "s390_pv_core"
|
||||
version = "0.11.0"
|
||||
source = "git+https://github.com/ibm-s390-linux/s390-tools?rev=4942504a9a2977d49989a5e5b7c1c8e07dc0fa41#4942504a9a2977d49989a5e5b7c1c8e07dc0fa41"
|
||||
dependencies = [
|
||||
"byteorder",
|
||||
"libc",
|
||||
"log",
|
||||
"regex",
|
||||
"serde",
|
||||
"thiserror 2.0.12",
|
||||
"zerocopy 0.7.35",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "safe-path"
|
||||
version = "0.1.0"
|
||||
@ -7768,6 +7783,7 @@ version = "0.7.35"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "1b9b4fd18abc82b8136838da5d50bae7bdea537c574d8dc1a34ed098d6c166f0"
|
||||
dependencies = [
|
||||
"byteorder",
|
||||
"zerocopy-derive 0.7.35",
|
||||
]
|
||||
|
||||
|
@ -187,6 +187,9 @@ base64 = "0.22"
|
||||
sha2 = "0.10.8"
|
||||
async-compression = { version = "0.4.22", features = ["tokio", "gzip"] }
|
||||
|
||||
[target.'cfg(target_arch = "s390x")'.dependencies]
|
||||
pv_core = { git = "https://github.com/ibm-s390-linux/s390-tools", rev = "4942504a9a2977d49989a5e5b7c1c8e07dc0fa41", package = "s390_pv_core" }
|
||||
|
||||
[dev-dependencies]
|
||||
tempfile.workspace = true
|
||||
which.workspace = true
|
||||
|
@ -11,8 +11,12 @@ use crate::AGENT_CONFIG;
|
||||
use anyhow::{bail, Context, Result};
|
||||
use derivative::Derivative;
|
||||
use protocols::{
|
||||
confidential_data_hub, confidential_data_hub_ttrpc_async,
|
||||
confidential_data_hub_ttrpc_async::{SealedSecretServiceClient, SecureMountServiceClient},
|
||||
confidential_data_hub,
|
||||
confidential_data_hub::GetResourceRequest,
|
||||
confidential_data_hub_ttrpc_async,
|
||||
confidential_data_hub_ttrpc_async::{
|
||||
GetResourceServiceClient, SealedSecretServiceClient, SecureMountServiceClient,
|
||||
},
|
||||
};
|
||||
use std::fs;
|
||||
use std::os::unix::fs::symlink;
|
||||
@ -39,6 +43,8 @@ pub struct CDHClient {
|
||||
sealed_secret_client: SealedSecretServiceClient,
|
||||
#[derivative(Debug = "ignore")]
|
||||
secure_mount_client: SecureMountServiceClient,
|
||||
#[derivative(Debug = "ignore")]
|
||||
get_resource_client: GetResourceServiceClient,
|
||||
}
|
||||
|
||||
impl CDHClient {
|
||||
@ -47,10 +53,13 @@ impl CDHClient {
|
||||
let sealed_secret_client =
|
||||
confidential_data_hub_ttrpc_async::SealedSecretServiceClient::new(client.clone());
|
||||
let secure_mount_client =
|
||||
confidential_data_hub_ttrpc_async::SecureMountServiceClient::new(client);
|
||||
confidential_data_hub_ttrpc_async::SecureMountServiceClient::new(client.clone());
|
||||
let get_resource_client =
|
||||
confidential_data_hub_ttrpc_async::GetResourceServiceClient::new(client);
|
||||
Ok(CDHClient {
|
||||
sealed_secret_client,
|
||||
secure_mount_client,
|
||||
get_resource_client,
|
||||
})
|
||||
}
|
||||
|
||||
@ -84,6 +93,18 @@ impl CDHClient {
|
||||
.await?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
pub async fn get_resource(&self, resource_path: &str) -> Result<Vec<u8>> {
|
||||
let req = GetResourceRequest {
|
||||
ResourcePath: format!("kbs://{}", resource_path),
|
||||
..Default::default()
|
||||
};
|
||||
let res = self
|
||||
.get_resource_client
|
||||
.get_resource(ttrpc::context::with_timeout(*CDH_API_TIMEOUT), &req)
|
||||
.await?;
|
||||
Ok(res.Resource)
|
||||
}
|
||||
}
|
||||
|
||||
pub async fn init_cdh_client(cdh_socket_uri: &str) -> Result<()> {
|
||||
@ -201,6 +222,15 @@ pub async fn secure_mount(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[allow(dead_code)]
|
||||
pub async fn get_cdh_resource(resource_path: &str) -> Result<Vec<u8>> {
|
||||
let cdh_client = CDH_CLIENT
|
||||
.get()
|
||||
.expect("Confidential Data Hub not initialized");
|
||||
|
||||
cdh_client.get_resource(resource_path).await
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
@ -4,14 +4,13 @@
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#[cfg(target_arch = "s390x")]
|
||||
use crate::ap;
|
||||
use crate::device::{pcipath_to_sysfs, DevUpdate, DeviceContext, DeviceHandler, SpecUpdate};
|
||||
use crate::linux_abi::*;
|
||||
use crate::pci;
|
||||
use crate::sandbox::Sandbox;
|
||||
use crate::uevent::{wait_for_uevent, Uevent, UeventMatcher};
|
||||
use anyhow::{anyhow, Context, Result};
|
||||
use cfg_if::cfg_if;
|
||||
use kata_types::device::{
|
||||
DRIVER_VFIO_AP_COLD_TYPE, DRIVER_VFIO_AP_TYPE, DRIVER_VFIO_PCI_GK_TYPE, DRIVER_VFIO_PCI_TYPE,
|
||||
};
|
||||
@ -27,6 +26,22 @@ use std::sync::Arc;
|
||||
use tokio::sync::Mutex;
|
||||
use tracing::instrument;
|
||||
|
||||
cfg_if! {
|
||||
if #[cfg(target_arch = "s390x")] {
|
||||
use crate::ap;
|
||||
use crate::cdh::get_cdh_resource;
|
||||
use std::convert::TryFrom;
|
||||
use pv_core::ap::{
|
||||
Apqn,
|
||||
apqn_info::Ep11,
|
||||
assoc_state::AssocState,
|
||||
bind_state::BindState,
|
||||
};
|
||||
use pv_core::misc::{encode_hex, pv_guest_bit_set};
|
||||
use pv_core::uv;
|
||||
}
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
pub struct VfioPciDeviceHandler {}
|
||||
|
||||
@ -103,7 +118,14 @@ impl DeviceHandler for VfioApDeviceHandler {
|
||||
#[instrument]
|
||||
async fn device_handler(&self, device: &Device, ctx: &mut DeviceContext) -> Result<SpecUpdate> {
|
||||
// Force AP bus rescan
|
||||
fs::write(AP_SCANS_PATH, "1")?;
|
||||
let mut ap_context = String::from("Failed to rescan AP bus");
|
||||
if pv_guest_bit_set() {
|
||||
ap_context.push_str(
|
||||
". Verify your host kernel supports AP pass-through with Secure Execution",
|
||||
);
|
||||
}
|
||||
fs::write(AP_SCANS_PATH, "1").context(ap_context)?;
|
||||
|
||||
for apqn in device.options.iter() {
|
||||
let ap_address = ap::Address::from_str(apqn).context("Failed to parse AP address")?;
|
||||
match device.type_.as_str() {
|
||||
@ -111,7 +133,7 @@ impl DeviceHandler for VfioApDeviceHandler {
|
||||
wait_for_ap_device(ctx.sandbox, ap_address).await?;
|
||||
}
|
||||
DRIVER_VFIO_AP_COLD_TYPE => {
|
||||
check_ap_device(ctx.sandbox, ap_address).await?;
|
||||
check_ap_device(ap_address).await?;
|
||||
}
|
||||
_ => return Err(anyhow!("Unsupported AP device type: {}", device.type_)),
|
||||
}
|
||||
@ -214,35 +236,70 @@ async fn wait_for_ap_device(sandbox: &Arc<Mutex<Sandbox>>, address: ap::Address)
|
||||
|
||||
#[cfg(target_arch = "s390x")]
|
||||
#[instrument]
|
||||
async fn check_ap_device(sandbox: &Arc<Mutex<Sandbox>>, address: ap::Address) -> Result<()> {
|
||||
let ap_path = format!(
|
||||
"/sys/{}/card{:02x}/{}/online",
|
||||
AP_ROOT_BUS_PATH, address.adapter_id, address
|
||||
);
|
||||
if !Path::new(&ap_path).is_file() {
|
||||
return Err(anyhow!(
|
||||
"AP device online file not found or not accessible: {}",
|
||||
ap_path
|
||||
));
|
||||
async fn check_ap_device(address: ap::Address) -> Result<()> {
|
||||
let apqn = Apqn::try_from(&address.to_string() as &str)
|
||||
.context(anyhow!("Failed to establish AP at {address}"))?;
|
||||
if apqn.info.is_none() {
|
||||
return Err(anyhow!("Failed to read info for AP {address}"));
|
||||
}
|
||||
match fs::read_to_string(&ap_path) {
|
||||
Ok(content) => {
|
||||
let is_online = content.trim() == "1";
|
||||
if !is_online {
|
||||
return Err(anyhow!("AP device {} exists but is not online", address));
|
||||
}
|
||||
}
|
||||
Err(e) => {
|
||||
if !pv_guest_bit_set() {
|
||||
return Ok(());
|
||||
}
|
||||
apqn.set_bind_state(BindState::Bound)
|
||||
.context(anyhow!("Failed to bind AP {address}"))?;
|
||||
if let Some(Ep11(ep11_info)) = &apqn.info {
|
||||
if ep11_info.mkvp.is_empty() {
|
||||
return Err(anyhow!(
|
||||
"Failed to read online status for AP device {}: {}",
|
||||
address,
|
||||
e
|
||||
"Master key verification pattern for AP {address} is unset"
|
||||
));
|
||||
}
|
||||
associate_ap_device(&apqn, &ep11_info.mkvp)
|
||||
.await
|
||||
.context(anyhow!("Failed to associate AP {address}"))?;
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(target_arch = "s390x")]
|
||||
async fn associate_ap_device(apqn: &Apqn, mkvp: &str) -> Result<()> {
|
||||
let resource_path = format!("/vfio_ap/{mkvp}");
|
||||
let secret_resource_path = format!("{resource_path}/secret");
|
||||
let secret_id_resource_path = format!("{resource_path}/secret_id");
|
||||
|
||||
let uv_secret = get_cdh_resource(&secret_resource_path)
|
||||
.await
|
||||
.context(anyhow!(
|
||||
"Failed to read Confidential Data Hub secret {secret_resource_path}. \
|
||||
Provide the desired Ultravisor secret for this MKVP with an appropriate key broker service."
|
||||
))?;
|
||||
let secret_id_bytes = get_cdh_resource(&secret_id_resource_path)
|
||||
.await
|
||||
.context(anyhow!(
|
||||
"Failed to read Confidential Data Hub secret {secret_id_resource_path}. \
|
||||
Provide the desired Ultravisor secret ID for this MKVP with an appropriate key broker service."
|
||||
))?;
|
||||
let secret_id = std::str::from_utf8(&secret_id_bytes)?
|
||||
.trim_start_matches("0x")
|
||||
.trim_end();
|
||||
|
||||
// TODO Once initdata is stable, enable and mandate this request be signed
|
||||
// (`pvsecret create --user-sign-key`, `pvsecret verify --user-cert`)
|
||||
let uv = uv::UvDevice::open()?;
|
||||
let mut add_cmd = uv::AddCmd::new(&mut uv_secret.as_slice())
|
||||
.context("Failed to create add secret request")?;
|
||||
uv.send_cmd(&mut add_cmd).context("Failed to add secret")?;
|
||||
let mut list_cmd = uv::ListCmd::new();
|
||||
uv.send_cmd(&mut list_cmd)?;
|
||||
|
||||
let secret_idx = uv::SecretList::try_from(list_cmd)?
|
||||
.iter()
|
||||
.find(|&s| encode_hex(s.id()) == secret_id)
|
||||
.ok_or_else(|| anyhow!("Could not find secret with the ID {secret_id}. \
|
||||
Perhaps there is a mismatch between the provided secret and secret ID."))?
|
||||
.index();
|
||||
Ok(apqn.set_associate_state(AssocState::Associated(secret_idx))?)
|
||||
}
|
||||
|
||||
pub async fn wait_for_pci_device(
|
||||
sandbox: &Arc<Mutex<Sandbox>>,
|
||||
pcipath: &pci::Path,
|
||||
|
@ -34,4 +34,16 @@ service SealedSecretService {
|
||||
|
||||
service SecureMountService {
|
||||
rpc SecureMount(SecureMountRequest) returns (SecureMountResponse) {};
|
||||
}
|
||||
|
||||
message GetResourceRequest {
|
||||
string ResourcePath = 1;
|
||||
}
|
||||
|
||||
message GetResourceResponse {
|
||||
bytes Resource = 1;
|
||||
}
|
||||
|
||||
service GetResourceService {
|
||||
rpc GetResource(GetResourceRequest) returns (GetResourceResponse) {};
|
||||
}
|