Compare commits


38 Commits
2.4.1 ... 2.4.3

Author SHA1 Message Date
Archana Shinde
6330386ab6 Merge pull request #4593 from fidencio/2.4.3-branch-bump
# Kata Containers 2.4.3
2022-07-05 15:04:34 -07:00
Fabiano Fidêncio
847003187c release: Kata Containers 2.4.3
- stable-2.4 | shim: set a non-zero return code if the wait process call failed.
- stable-2.4 | rootfs: Fix chronyd.service failing on boot

396fed42c release: Adapt kata-deploy for 2.4.3
025e3ea6a shim: set a non-zero return code if the wait process call failed.
f32a14663 snap: Fix debug cli option
0718b9b55 rootfs: Fix chronyd.service failing on boot

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-07-05 22:26:52 +02:00
Fabiano Fidêncio
396fed42c1 release: Adapt kata-deploy for 2.4.3
kata-deploy files must be adapted to each new release. This happens when
the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

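The three cases above amount to a small lookup table. The following is a hypothetical Rust sketch that returns a description of each change. `Transition` and `kata_deploy_changes` are illustrative names and are not part of the real release scripts.

```rust
/// Release transitions named in the commit message (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transition {
    MainToStable,
    StableToStable,
    Alpha,
}

/// Describe how the kata-deploy files change for a given transition.
fn kata_deploy_changes(t: Transition, new_version: &str) -> Vec<String> {
    match t {
        // main -> stable: the "-stable" manifests are removed.
        Transition::MainToStable => vec![
            "remove kata-deploy-stable".to_string(),
            "remove kata-cleanup-stable".to_string(),
        ],
        // stable -> stable: bump the release to the new one.
        Transition::StableToStable => vec![
            format!("bump kata-deploy to {}", new_version),
            format!("bump kata-cleanup to {}", new_version),
        ],
        // Alpha: files on main already point at "latest" and "stable".
        Transition::Alpha => Vec::new(),
    }
}
```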
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2022-07-05 22:26:52 +02:00
GabyCT
ca7bb9dceb Merge pull request #4571 from fidencio/topic/stable-2.4-set-status-if-wait-process-failed
stable-2.4 | shim: set a non-zero return code if the wait process call failed.
2022-07-01 11:28:35 -05:00
liubin
025e3ea6ab shim: set a non-zero return code if the wait process call failed.
The return code is an int32, so if an error occurs its default value may
be zero, which would be treated as a normal exit code.

Setting the return code to 255 lets the caller (for example, Kubernetes)
know that there is a problem with the pod/container.
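The rule is that a failed wait must never fall through with the zero default. The real fix lives in the Go shim. The Rust sketch below only illustrates the rule, and `exit_code_for` and `WAIT_FAILED_EXIT_CODE` are hypothetical names.

```rust
/// Exit status reported when the wait itself failed (value taken from the commit message).
const WAIT_FAILED_EXIT_CODE: i32 = 255;

/// Hypothetical helper: map the outcome of a "wait process" call to the
/// exit code reported to the caller.
fn exit_code_for(wait_result: Result<i32, String>) -> i32 {
    match wait_result {
        // The process was reaped: report its real status.
        Ok(status) => status,
        // Never report the zero default here, or the failure looks like a clean exit.
        Err(_) => WAIT_FAILED_EXIT_CODE,
    }
}
```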

Fixes: #4419

Signed-off-by: liubin <liubin0329@gmail.com>
(cherry picked from commit ab5f1c9564)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-06-30 21:55:13 +02:00
GabyCT
5d50fc3908 Merge pull request #4501 from fidencio/topic/stable-2.4-backport-chronyd-fix
stable-2.4 | rootfs: Fix chronyd.service failing on boot
2022-06-22 12:32:21 -05:00
James O. D. Hunt
f32a146637 snap: Fix debug cli option
`snap`/`snapcraft` seems to have changed recently. Since `snap`
auto-updates all `snap` packages and since we use the `snapcraft` `snap`
for building snaps, this is impacting all our CI jobs which now show:

```
Installing Snapcraft for Linux…
snapcraft 7.0.4 from Canonical* installed

Run snapcraft -d snap --destructive-mode
Usage: snapcraft [options] command [args]...
Try 'snapcraft pack -h' for help.
Error: unrecognized arguments: -d
Error: Process completed with exit code 1.
```

Move the debug option to make it a sub-command (long) option to resolve
this issue.

Fixes: #4457.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 90a7763ac6)
2022-06-21 16:35:53 +02:00
Champ-Goblem
0718b9b55f rootfs: Fix chronyd.service failing on boot
In at least kata versions 2.3.3 and 2.4.0 it was noticed that the guest
operating system's clock would drift out of sync slowly over time
whilst the pod was running.

This had previously been raised and fixed in the old repository via [1].
In essence kvm_ptp and chrony were paired together in order to
keep the system clock up to date with the host.

In the recent versions of kata mentioned above,
the chronyd.service fails upon boot with status `226/NAMESPACE`,
which seems to be due to the fact that the `/var/lib/chrony`
directory no longer exists.

This change marks the `/var/lib/chrony` entry in `ReadWritePaths`
so that it is ignored when the directory does not exist, as per [2].
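The systemd.exec documentation allows a `ReadWritePaths=` entry to be prefixed with `-`. With that prefix, a missing path is ignored instead of causing namespace setup to fail. Assuming the fix relies on this prefix, the behaviour can be modelled as follows. `resolve_read_write_paths` is a hypothetical helper, not systemd code, and the `exists` callback stands in for a real filesystem check.

```rust
use std::path::Path;

/// Hypothetical model of how systemd treats `ReadWritePaths=` entries.
fn resolve_read_write_paths(
    entries: &[&str],
    exists: impl Fn(&Path) -> bool,
) -> Result<Vec<String>, String> {
    let mut resolved = Vec::new();
    for entry in entries {
        // A leading '-' marks the path as optional.
        let (optional, path) = match entry.strip_prefix('-') {
            Some(rest) => (true, rest),
            None => (false, *entry),
        };
        if exists(Path::new(path)) {
            resolved.push(path.to_string());
        } else if !optional {
            // Corresponds to the unit failing with 226/NAMESPACE.
            return Err(format!("{} does not exist", path));
        }
        // Optional and missing: silently skipped.
    }
    Ok(resolved)
}
```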

[1] https://github.com/kata-containers/runtime/issues/1279
[2] https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ReadWritePaths=

Fixes: #4167
Signed-off-by: Champ-Goblem <cameron_mcdermott@yahoo.co.uk>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 1b7fd19acb)
2022-06-21 15:49:56 +02:00
snir911
6d93875ead Merge pull request #4385 from snir911/2.4.2-branch-bump
# Kata Containers 2.4.2
2022-06-08 19:38:17 +03:00
Snir Sheriber
7fd22d77d0 release: Kata Containers 2.4.2
- My 2.4 pr backport -- fix shim leak caused by ESRCH in agent destroy
- backport-2.4 | workflows: add workflow_dispatch triggering to test-kata-deploy
- stable-2.4: runtime: Adding the correct detection of mediated PCIe devices
- stable-2.4: backport agent fixes
- stable-2.4 | clh: Update to the v24.0 release
- stable-2.4 | Backport fixes for direct-volume stats
- stable-2.4 | tools: Add QEMU patches for SGX numa support
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.1

607a8a9c2 release: Adapt kata-deploy for 2.4.2
e5568a31a agent: ignore ESRCH error when destroying containers
322839ac7 runtime: force stop container after the container process exits
b75d5cee7 docs: update release process github token instructions
e938ce443 docs: update release process with latest workflow triggering
046ba4df7 workflows: add workflow_dispatch triggering to test-kata-deploy
14ce4b01b runtime: Adding the correct detection of mediated PCIe devices
f54d5cf16 agent: Fix is_signal_handled failing parsing str to u64
80d5f9e14 agent: move assert_result macro to test_utils file
50a74dfee agent: add tests for is_signal_handled function
560247f8d agent: add tests for update_container_namespaces
47d4e79c1 agent: add tests for do_write_stream function
e3ce8aff9 agent: add tests for get_memory_info function
ebe9fc2ca clh: Update to the v24.0 release
29c9391da agent: fix direct-assigned volume stats
d1848523d runtime: direct-volume stats use correct name
338c9f2b0 runtime: direct-volume stats update to use GET parameter
f528bc010 runtime: fix incorrect Action function for direct-volume stats
3413c8588 tools: Add QEMU patches for SGX numa support
db6d4f7e1 versions: Upgrade to Cloud Hypervisor v23.1

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-08 11:54:59 +03:00
Snir Sheriber
607a8a9c2d release: Adapt kata-deploy for 2.4.2
kata-deploy files must be adapted to each new release. This happens when
the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-08 11:54:59 +03:00
Feng Wang
562e968d19 Merge pull request #4389 from fengwang666/my_2.4_pr_backport
My 2.4 pr backport -- fix shim leak caused by ESRCH in agent destroy
2022-06-02 14:28:40 -07:00
Feng Wang
e5568a31a7 agent: ignore ESRCH error when destroying containers
The destroy() method should ignore the ESRCH error from signal::kill
and continue, as ESRCH (no such process) is harmless here: the process has already exited.
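The loop can be sketched in Rust as shown below. This minimal version assumes a pluggable `kill` callback and a local `Errno` enum in place of `nix::errno::Errno`, so it runs without signalling real processes. `kill_all` is a hypothetical name.

```rust
/// Stand-in for the errno values involved (hypothetical; the agent uses nix::errno::Errno).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Errno {
    ESRCH,
    EPERM,
}

/// Send a kill to every pid, treating ESRCH as "already gone" rather than a failure.
/// Returns the pids that had already exited.
fn kill_all<F>(pids: &[i32], mut kill: F) -> Result<Vec<i32>, Errno>
where
    F: FnMut(i32) -> Result<(), Errno>,
{
    let mut already_gone = Vec::new();
    for &pid in pids {
        match kill(pid) {
            Ok(()) => continue,
            // The process exited before it was signalled: note it and keep going.
            Err(Errno::ESRCH) => already_gone.push(pid),
            // Any other error still aborts the destroy.
            Err(e) => return Err(e),
        }
    }
    Ok(already_gone)
}
```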

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 12:50:54 -07:00
Feng Wang
322839ac75 runtime: force stop container after the container process exits
Set the stop-container force flag to true so that the container state is always set to
"StateStopped" after the container wait goroutine finishes. This is necessary for
the subsequent delete-container step to succeed.
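The following is a hypothetical Rust model of the force flag. The real code is in the Go runtime. `Container`, `State` and the `cleanup` callback are illustrative, and the model assumes that a forced stop ignores teardown errors.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Running,
    Stopped,
}

/// Hypothetical container model.
struct Container {
    state: State,
}

impl Container {
    /// `cleanup` stands in for teardown work that can fail.
    /// With `force`, a cleanup error is ignored and the state still becomes Stopped.
    fn stop(&mut self, force: bool, cleanup: impl FnOnce() -> Result<(), String>) -> Result<(), String> {
        if let Err(e) = cleanup() {
            if !force {
                // Non-forced stop: report the error and leave the state unchanged.
                return Err(e);
            }
        }
        self.state = State::Stopped;
        Ok(())
    }

    /// Deletion is only allowed from the Stopped state.
    fn delete(&self) -> Result<(), String> {
        if self.state == State::Stopped {
            Ok(())
        } else {
            Err(format!("cannot delete container in state {:?}", self.state))
        }
    }
}
```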

Fixes: #4359

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-06-02 12:50:40 -07:00
snir911
4be3aebd15 Merge pull request #4352 from snir911/fix-workflow-stable-2.4
backport-2.4 | workflows: add workflow_dispatch triggering to test-kata-deploy
2022-06-02 13:19:19 +03:00
Snir Sheriber
b75d5cee74 docs: update release process github token instructions
and fix the GPG key generation URL

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 19:12:45 +03:00
Snir Sheriber
e938ce443c docs: update release process with latest workflow triggering
instructions

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 19:12:37 +03:00
Snir Sheriber
046ba4df7f workflows: add workflow_dispatch triggering to test-kata-deploy
This allows the test-kata-deploy workflow to be triggered manually from
any branch, instead of always using the one defined on main.

See: https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/

Fixes: #4349
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-06-01 16:25:25 +03:00
Fabiano Fidêncio
a1d2049bee Merge pull request #4337 from snir911/backports-stable-2.4
stable-2.4: runtime: Adding the correct detection of mediated PCIe devices
2022-05-30 22:35:26 +02:00
James O. D. Hunt
8dcf6c354f Merge pull request #4274 from egernst/backport-agent-fixes
stable-2.4: backport agent fixes
2022-05-30 16:57:07 +01:00
Zvonko Kaiser
14ce4b01ba runtime: Adding the correct detection of mediated PCIe devices
Fixes #4212

Backport-of: https://github.com/kata-containers/kata-containers/pull/4213
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-05-30 12:32:41 +03:00
snir911
4018fdc9b2 Merge pull request #4319 from fidencio/topic/stable-2.4-update-clh-to-v24.0
stable-2.4 | clh: Update to the v24.0 release
2022-05-29 11:46:58 +03:00
Champ-Goblem
f54d5cf165 agent: Fix is_signal_handled failing parsing str to u64
In the is_signal_handled function, when parsing the hex string returned
from `/proc/<pid>/status` the space/tab character after the colon
is not removed.

This patch trims the SigCgt value so that leading and trailing
whitespace is removed. It also extends the existing
test cases to check for this scenario.
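The parsing problem is easy to reproduce. `/proc/<pid>/status` separates the field name from the hex mask with whitespace (a tab), and `u64::from_str_radix` rejects that character. The following is a hypothetical string-based variant of the check. `sig_caught` is not an agent function, and unlike the agent it does not handle signal 0.

```rust
/// Does the SigCgt mask in the text of /proc/<pid>/status have the bit for `signum` set?
fn sig_caught(status: &str, signum: u32) -> bool {
    // 0 is a liveness probe and >64 is invalid; both are outside this sketch.
    if signum == 0 || signum > 64 {
        return false;
    }
    // Signals are numbered from 1, so bit (signum - 1) represents signal `signum`.
    let mask_bit = 1u64 << (signum - 1);
    for line in status.lines() {
        if let Some(rest) = line.strip_prefix("SigCgt:") {
            // Without trim() the leading tab makes parsing fail.
            return match u64::from_str_radix(rest.trim(), 16) {
                Ok(mask) => (mask & mask_bit) != 0,
                Err(_) => false,
            };
        }
    }
    false
}
```

For example, a mask of `0000000000004000` has only bit 14 set, so only SIGTERM (15) is reported as handled.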

Fixes: #4250
Signed-off-by: Champ-Goblem <cameron@northflank.com>
2022-05-26 15:44:56 -07:00
Braden Rayhorn
80d5f9e145 agent: move assert_result macro to test_utils file
Move the assert_result macro to the shared test_utils file
so that it is not duplicated in individual files.

Fixes: #4093

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:44:56 -07:00
Braden Rayhorn
50a74dfeee agent: add tests for is_signal_handled function
Add test coverage for is_signal_handled function in rpc.rs. Includes
refactors to make the function testable and handle additional cases.

Fixes #3939

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:43:45 -07:00
Braden Rayhorn
560247f8da agent: add tests for update_container_namespaces
Add test coverage for update_container_namespaces function
in src/rpc.rs. Includes minor refactor to make function easier
to test.

Fixes #4034

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:43:45 -07:00
Braden Rayhorn
47d4e79c15 agent: add tests for do_write_stream function
Add test coverage for do_write_stream function of AgentService
in src/rpc.rs. Includes minor refactoring to make function more
easily testable.

Fixes #3984

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:42:45 -07:00
Braden Rayhorn
e3ce8aff99 agent: add tests for get_memory_info function
Add test coverage for get_memory_info function in src/rpc.rs. Includes
some minor refactoring of the function.
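The refactor passes the sysfs paths in as parameters so that tests can point at temporary files. The following is a minimal sketch of the block-size half, using a hypothetical `read_block_size`. Unlike the agent, it propagates a missing file as `io::ErrorKind::NotFound` instead of treating it as size 0.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Read the memory block size (a hex string) from a caller-supplied path.
fn read_block_size(path: &Path) -> io::Result<u64> {
    let text = fs::read_to_string(path)?;
    let trimmed = text.trim();
    if trimmed.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "Invalid block size"));
    }
    u64::from_str_radix(trimmed, 16)
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "Invalid block size"))
}
```

In production, the caller would pass the sysfs file, usually `/sys/devices/system/memory/block_size_bytes`.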

Fixes #3837

Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
2022-05-26 15:42:45 -07:00
Yibo Zhuang
fc2c933a88 Merge pull request #4305 from yibozhuang/stable-2.4
stable-2.4 | Backport fixes for direct-volume stats
2022-05-26 13:52:19 -07:00
Fabiano Fidêncio
ebe9fc2cad clh: Update to the v24.0 release
This release has been tracked through the v24.0 project.

virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still needs to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.

Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device is hot plugged into the VM.

Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.

A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.

* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)

* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)

Deprecated features will be removed in a subsequent release and users should
plan to use alternatives.

* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)

Fixes: #4317

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-26 08:58:27 +00:00
Yibo Zhuang
29c9391da1 agent: fix direct-assigned volume stats
The current implementation of walking the
disks to match with the requested volume path
in agent doesn't work because the volume path
provided by the shim to the agent is the mount
path within the guest and not the device name.
The current logic is trying to match the
device name to the volume path which will never
match.

This change will simplify the
get_volume_capacity_stats and
get_volume_inode_stats to just call statfs and
get the bytes and inodes usage of the volume
path directly.
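Both statistics come directly from the fields that statfs(2) reports. The following dependency-free sketch shows the arithmetic, using a hypothetical `FsStats` struct in place of `nix::sys::statfs::Statfs`.

```rust
/// Hypothetical subset of the statfs(2) fields used by the agent.
struct FsStats {
    block_size: u64,
    blocks: u64,
    blocks_free: u64,
    files: u64,
    files_free: u64,
}

#[derive(Debug, PartialEq)]
struct Usage {
    total: u64,
    available: u64,
    used: u64,
}

/// Byte usage, computed as in the patched get_volume_capacity_stats.
fn capacity_usage(s: &FsStats) -> Usage {
    let total = s.blocks * s.block_size;
    let available = s.blocks_free * s.block_size;
    // saturating_sub avoids a panic on inconsistent input.
    Usage { total, available, used: total.saturating_sub(available) }
}

/// Inode usage, computed as in the patched get_volume_inode_stats.
fn inode_usage(s: &FsStats) -> Usage {
    Usage { total: s.files, available: s.files_free, used: s.files.saturating_sub(s.files_free) }
}
```

Like the patch, `capacity_usage` treats free blocks (`f_bfree`) as available. A variant could use `f_bavail`, which excludes blocks reserved for the superuser, to match what `df` reports.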

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:40:35 -07:00
Yibo Zhuang
d1848523d3 runtime: direct-volume stats use correct name
Today the shim does a translation when doing
direct-volume stats where it takes the source and
returns the mount path within the guest.

The source for a direct-assigned volume is actually
the device path on the host and not the publish
volume path.

This change will perform a lookup of the mount info
during direct-volume stats to ensure that the
device path is provided to the shim for querying
the volume stats.

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:50 -07:00
Yibo Zhuang
338c9f2b0b runtime: direct-volume stats update to use GET parameter
As far as I know, Go's default HTTP mux doesn't support pattern
routing, so right now the client appends the volume path to the
direct-volume stats URL as a subpath, which always results in
a 404 Not Found from the shim.

This change updates the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, the shim returns
400 Bad Request to the client.
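The following dependency-free Rust sketch spells out the new contract: read a query parameter, and return 400 if it is missing or empty. The parameter name `path` and both function names are assumptions for illustration. The shim itself is Go code.

```rust
/// Extract the volume path from a query string such as "path=%2Fmnt%2Fvol".
/// Err(400) corresponds to the "400 Bad Request" returned by the shim.
fn volume_path_from_query(query: &str) -> Result<String, u16> {
    for pair in query.split('&') {
        // A bare key such as "path" is treated as having an empty value.
        let (key, value) = pair.split_once('=').unwrap_or((pair, ""));
        if key == "path" {
            let decoded = percent_decode(value).ok_or(400u16)?;
            return if decoded.is_empty() { Err(400) } else { Ok(decoded) };
        }
    }
    // Parameter missing entirely.
    Err(400)
}

/// Minimal %XX decoder; unlike form decoding it leaves '+' untouched.
fn percent_decode(s: &str) -> Option<String> {
    let bytes = s.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Expect exactly two hex digits after '%'.
            let hex = bytes.get(i + 1..i + 3)?;
            if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            let hex = std::str::from_utf8(hex).ok()?;
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}
```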

Fixes: #4297

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:41 -07:00
Yibo Zhuang
f528bc0103 runtime: fix incorrect Action function for direct-volume stats
The CLI's Action expects a function that returns only an error,
but the current direct-volume stats Action returns
(string, error), which is invalid.

This change fixes the signature and prints the stats from
the command instead.

Fixes: #4293

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-05-23 16:34:29 -07:00
Chelsea Mafrica
f821ecbdc6 Merge pull request #4268 from cmaf/tools-patch-qemu-sgx-numa-2.4
stable-2.4 | tools: Add QEMU patches for SGX numa support
2022-05-16 14:31:48 -07:00
Chelsea Mafrica
3413c8588d tools: Add QEMU patches for SGX numa support
There are a few patches for SGX NUMA support that were added to QEMU after
the 6.2.0 release. Add them for SGX support in Kata.

Fixes #4254

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit b4b9068cb7)
2022-05-16 11:48:22 -07:00
Fabiano Fidêncio
b93a0b1012 Merge pull request #4229 from likebreath/0510/backport_clh_v23.1
stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.1
2022-05-12 21:55:20 +02:00
Bo Chen
db6d4f7e16 versions: Upgrade to Cloud Hypervisor v23.1
The following issues have been addressed in the latest bug fix release,
v23.1, of Cloud Hypervisor:
1) Add some missing seccomp rules;
2) Remove virtio-fs filesystem entries from config on removal;
3) Do not delete API socket on API server start;
4) Reject virtio-mem resize if the guest doesn't activate the device;
5) Fix OpenAPI naming of I/O throttling knobs.

Fixes: #4222

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 82ea018281)
2022-05-10 11:26:57 -07:00
28 changed files with 1404 additions and 108 deletions


@@ -1,4 +1,5 @@
 on:
+  workflow_dispatch: # this is used to trigger the workflow on non-main branches
   issue_comment:
     types: [created, edited]


@@ -26,7 +26,7 @@ jobs:
           # Check semantic versioning format (x.y.z) and if the current tag is the latest tag
           if echo "${current_version}" | grep -q "^[[:digit:]]\+\.[[:digit:]]\+\.[[:digit:]]\+$" && echo -e "$latest_version\n$current_version" | sort -C -V; then
             # Current version is the latest version, build it
-            snapcraft -d snap --destructive-mode
+            snapcraft snap --debug --destructive-mode
           fi
       - name: Upload snap


@@ -24,4 +24,4 @@ jobs:
       - name: Build snap
         if: ${{ !contains(github.event.pull_request.labels.*.name, 'force-skip-ci') }}
         run: |
-          snapcraft -d snap --destructive-mode
+          snapcraft snap --debug --destructive-mode


@@ -1 +1 @@
-2.4.1
+2.4.3


@@ -4,11 +4,11 @@
 ## Requirements
 - [hub](https://github.com/github/hub)
-  * Using an [application token](https://github.com/settings/tokens) is required for hub.
+  * Using an [application token](https://github.com/settings/tokens) is required for hub (set to a GITHUB_TOKEN environment variable).
 - GitHub permissions to push tags and create releases in Kata repositories.
-- GPG configured to sign git tags. https://help.github.com/articles/generating-a-new-gpg-key/
+- GPG configured to sign git tags. https://docs.github.com/en/authentication/managing-commit-signature-verification/generating-a-new-gpg-key
 - You should configure your GitHub to use your ssh keys (to push to branches). See https://help.github.com/articles/adding-a-new-ssh-key-to-your-github-account/.
   * As an alternative, configure hub to push and fork with HTTPS, `git config --global hub.protocol https` (Not tested yet) *
@@ -48,7 +48,7 @@
 ### Merge all bump version Pull requests
 - The above step will create a GitHub pull request in the Kata projects. Trigger the CI using `/test` command on each bump Pull request.
-- Trigger the test-kata-deploy workflow on the kata-containers repository bump Pull request using `/test_kata_deploy` (monitor under the "action" tab).
+- Trigger the `test-kata-deploy` workflow which is under the `Actions` tab on the repository GitHub page (make sure to select the correct branch and validate it passes).
 - Check any failures and fix if needed.
 - Work with the Kata approvers to verify that the CI works and the pull requests are merged.


@@ -1032,7 +1032,19 @@ impl BaseContainer for LinuxContainer {
         let st = self.oci_state()?;
         for pid in self.processes.keys() {
-            signal::kill(Pid::from_raw(*pid), Some(Signal::SIGKILL))?;
+            match signal::kill(Pid::from_raw(*pid), Some(Signal::SIGKILL)) {
+                Err(Errno::ESRCH) => {
+                    info!(
+                        self.logger,
+                        "kill encounters ESRCH, pid: {}, container: {}",
+                        pid,
+                        self.id.clone()
+                    );
+                    continue;
+                }
+                Err(err) => return Err(anyhow!(err)),
+                Ok(_) => continue,
+            }
         }
         if spec.hooks.is_some() {


@@ -432,6 +432,8 @@ fn get_container_pipe_size(param: &str) -> Result<i32> {
 #[cfg(test)]
 mod tests {
+    use crate::assert_result;
     use super::*;
     use anyhow::anyhow;
     use std::fs::File;
@@ -439,32 +441,6 @@ mod tests {
     use std::time;
     use tempfile::tempdir;
-    // Parameters:
-    //
-    // 1: expected Result
-    // 2: actual Result
-    // 3: string used to identify the test on error
-    macro_rules! assert_result {
-        ($expected_result:expr, $actual_result:expr, $msg:expr) => {
-            if $expected_result.is_ok() {
-                let expected_level = $expected_result.as_ref().unwrap();
-                let actual_level = $actual_result.unwrap();
-                assert!(*expected_level == actual_level, "{}", $msg);
-            } else {
-                let expected_error = $expected_result.as_ref().unwrap_err();
-                let expected_error_msg = format!("{:?}", expected_error);
-                if let Err(actual_error) = $actual_result {
-                    let actual_error_msg = format!("{:?}", actual_error);
-                    assert!(expected_error_msg == actual_error_msg, "{}", $msg);
-                } else {
-                    assert!(expected_error_msg == "expected error, got OK", "{}", $msg);
-                }
-            }
-        };
-    }
     #[test]
     fn test_new() {
         let config: AgentConfig = Default::default();


@@ -40,13 +40,11 @@ use rustjail::specconv::CreateOpts;
 use nix::errno::Errno;
 use nix::mount::MsFlags;
-use nix::sys::stat;
+use nix::sys::{stat, statfs};
 use nix::unistd::{self, Pid};
 use rustjail::cgroups::Manager;
 use rustjail::process::ProcessOperations;
-use sysinfo::{DiskExt, System, SystemExt};
 use crate::device::{
     add_devices, get_virtio_blk_pci_device_name, update_device_cgroup, update_env_pci,
 };
@@ -71,7 +69,6 @@ use tracing::instrument;
 use libc::{self, c_char, c_ushort, pid_t, winsize, TIOCSWINSZ};
 use std::fs;
-use std::os::unix::fs::MetadataExt;
 use std::os::unix::prelude::PermissionsExt;
 use std::process::{Command, Stdio};
 use std::time::Duration;
@@ -85,6 +82,11 @@ use std::path::PathBuf;
 const CONTAINER_BASE: &str = "/run/kata-containers";
 const MODPROBE_PATH: &str = "/sbin/modprobe";
+const ERR_CANNOT_GET_WRITER: &str = "Cannot get writer";
+const ERR_INVALID_BLOCK_SIZE: &str = "Invalid block size";
+const ERR_NO_LINUX_FIELD: &str = "Spec does not contain linux field";
+const ERR_NO_SANDBOX_PIDNS: &str = "Sandbox does not have sandbox_pidns";
 // Convenience macro to obtain the scope logger
 macro_rules! sl {
     () => {
@@ -404,7 +406,8 @@ impl AgentService {
         // For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
         // it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
         // instead of "SIGTERM" to terminate it.
-        if p.init && sig == libc::SIGTERM && !is_signal_handled(p.pid, sig as u32) {
+        let proc_status_file = format!("/proc/{}/status", p.pid);
+        if p.init && sig == libc::SIGTERM && !is_signal_handled(&proc_status_file, sig as u32) {
             sig = libc::SIGKILL;
         }
         p.signal(sig)?;
@@ -575,7 +578,7 @@ impl AgentService {
             }
         };
-        let writer = writer.ok_or_else(|| anyhow!("cannot get writer"))?;
+        let writer = writer.ok_or_else(|| anyhow!(ERR_CANNOT_GET_WRITER))?;
         writer.lock().await.write_all(req.data.as_slice()).await?;
         let mut resp = WriteStreamResponse::new();
@@ -1219,7 +1222,12 @@ impl protocols::agent_ttrpc::AgentService for AgentService {
         info!(sl!(), "get guest details!");
         let mut resp = GuestDetailsResponse::new();
         // to get memory block size
-        match get_memory_info(req.mem_block_size, req.mem_hotplug_probe) {
+        match get_memory_info(
+            req.mem_block_size,
+            req.mem_hotplug_probe,
+            SYSFS_MEMORY_BLOCK_SIZE_PATH,
+            SYSFS_MEMORY_HOTPLUG_PROBE_PATH,
+        ) {
             Ok((u, v)) => {
                 resp.mem_block_size_bytes = u;
                 resp.support_mem_hotplug_probe = v;
@@ -1408,24 +1416,29 @@ impl protocols::health_ttrpc::Health for HealthService {
     }
 }
-fn get_memory_info(block_size: bool, hotplug: bool) -> Result<(u64, bool)> {
+fn get_memory_info(
+    block_size: bool,
+    hotplug: bool,
+    block_size_path: &str,
+    hotplug_probe_path: &str,
+) -> Result<(u64, bool)> {
     let mut size: u64 = 0;
     let mut plug: bool = false;
     if block_size {
-        match fs::read_to_string(SYSFS_MEMORY_BLOCK_SIZE_PATH) {
+        match fs::read_to_string(block_size_path) {
             Ok(v) => {
                 if v.is_empty() {
-                    info!(sl!(), "string in empty???");
-                    return Err(anyhow!("Invalid block size"));
+                    warn!(sl!(), "file {} is empty", block_size_path);
+                    return Err(anyhow!(ERR_INVALID_BLOCK_SIZE));
                 }
                 size = u64::from_str_radix(v.trim(), 16).map_err(|_| {
                     warn!(sl!(), "failed to parse the str {} to hex", size);
-                    anyhow!("Invalid block size")
+                    anyhow!(ERR_INVALID_BLOCK_SIZE)
                 })?;
             }
             Err(e) => {
-                info!(sl!(), "memory block size error: {:?}", e.kind());
+                warn!(sl!(), "memory block size error: {:?}", e.kind());
                 if e.kind() != std::io::ErrorKind::NotFound {
                     return Err(anyhow!(e));
                 }
@@ -1434,10 +1447,10 @@ fn get_memory_info(block_size: bool, hotplug: bool) -> Result<(u64, bool)> {
     }
     if hotplug {
-        match stat::stat(SYSFS_MEMORY_HOTPLUG_PROBE_PATH) {
+        match stat::stat(hotplug_probe_path) {
             Ok(_) => plug = true,
             Err(e) => {
-                info!(sl!(), "hotplug memory error: {:?}", e);
+                warn!(sl!(), "hotplug memory error: {:?}", e);
                 match e {
                     nix::Error::ENOENT => plug = false,
                     _ => return Err(anyhow!(e)),
@@ -1452,20 +1465,12 @@ fn get_memory_info(block_size: bool, hotplug: bool) -> Result<(u64, bool)> {
 fn get_volume_capacity_stats(path: &str) -> Result<VolumeUsage> {
     let mut usage = VolumeUsage::new();
-    let s = System::new();
-    for disk in s.disks() {
-        if let Some(v) = disk.name().to_str() {
-            if v.to_string().eq(path) {
-                usage.available = disk.available_space();
-                usage.total = disk.total_space();
-                usage.used = usage.total - usage.available;
-                usage.unit = VolumeUsage_Unit::BYTES; // bytes
-                break;
-            }
-        } else {
-            return Err(anyhow!(nix::Error::EINVAL));
-        }
-    }
+    let stat = statfs::statfs(path)?;
+    let block_size = stat.block_size() as u64;
+    usage.total = stat.blocks() * block_size;
+    usage.available = stat.blocks_free() * block_size;
+    usage.used = usage.total - usage.available;
+    usage.unit = VolumeUsage_Unit::BYTES;
     Ok(usage)
 }
@@ -1473,20 +1478,11 @@ fn get_volume_capacity_stats(path: &str) -> Result<VolumeUsage> {
 fn get_volume_inode_stats(path: &str) -> Result<VolumeUsage> {
     let mut usage = VolumeUsage::new();
-    let s = System::new();
-    for disk in s.disks() {
-        if let Some(v) = disk.name().to_str() {
-            if v.to_string().eq(path) {
-                let meta = fs::metadata(disk.mount_point())?;
-                let inode = meta.ino();
-                usage.used = inode;
-                usage.unit = VolumeUsage_Unit::INODES;
-                break;
-            }
-        } else {
-            return Err(anyhow!(nix::Error::EINVAL));
-        }
-    }
+    let stat = statfs::statfs(path)?;
+    usage.total = stat.files();
+    usage.available = stat.files_free();
+    usage.used = usage.total - usage.available;
+    usage.unit = VolumeUsage_Unit::INODES;
     Ok(usage)
 }
@@ -1575,7 +1571,7 @@ fn update_container_namespaces(
     let linux = spec
         .linux
         .as_mut()
-        .ok_or_else(|| anyhow!("Spec didn't container linux field"))?;
+        .ok_or_else(|| anyhow!(ERR_NO_LINUX_FIELD))?;
     let namespaces = linux.namespaces.as_mut_slice();
     for namespace in namespaces.iter_mut() {
@@ -1602,7 +1598,7 @@ fn update_container_namespaces(
         if let Some(ref pidns) = &sandbox.sandbox_pidns {
             pid_ns.path = String::from(pidns.path.as_str());
         } else {
-            return Err(anyhow!("failed to get sandbox pidns"));
+            return Err(anyhow!(ERR_NO_SANDBOX_PIDNS));
         }
     }
@@ -1622,21 +1618,33 @@ fn append_guest_hooks(s: &Sandbox, oci: &mut Spec) -> Result<()> {
     Ok(())
 }
-// Check is the container process installed the
+// Check if the container process installed the
 // handler for specific signal.
-fn is_signal_handled(pid: pid_t, signum: u32) -> bool {
-    let sig_mask: u64 = 1u64 << (signum - 1);
-    let file_name = format!("/proc/{}/status", pid);
+fn is_signal_handled(proc_status_file: &str, signum: u32) -> bool {
+    let shift_count: u64 = if signum == 0 {
+        // signum 0 is used to check for process liveness.
+        // Since that signal is not part of the mask in the file, we only need
+        // to know if the file (and therefore) process exists to handle
+        // that signal.
+        return fs::metadata(proc_status_file).is_ok();
+    } else if signum > 64 {
+        // Ensure invalid signum won't break bit shift logic
+        warn!(sl!(), "received invalid signum {}", signum);
+        return false;
+    } else {
+        (signum - 1).into()
+    };
     // Open the file in read-only mode (ignoring errors).
-    let file = match File::open(&file_name) {
+    let file = match File::open(proc_status_file) {
         Ok(f) => f,
         Err(_) => {
-            warn!(sl!(), "failed to open file {}\n", file_name);
+            warn!(sl!(), "failed to open file {}", proc_status_file);
             return false;
         }
     };
+    let sig_mask: u64 = 1 << shift_count;
     let reader = BufReader::new(file);
     // Read the file line by line using the lines() iterator from std::io::BufRead.
@@ -1644,21 +1652,21 @@ fn is_signal_handled(pid: pid_t, signum: u32) -> bool {
         let line = match line {
             Ok(l) => l,
             Err(_) => {
-                warn!(sl!(), "failed to read file {}\n", file_name);
+                warn!(sl!(), "failed to read file {}", proc_status_file);
                 return false;
             }
         };
         if line.starts_with("SigCgt:") {
             let mask_vec: Vec<&str> = line.split(':').collect();
             if mask_vec.len() != 2 {
-                warn!(sl!(), "parse the SigCgt field failed\n");
+                warn!(sl!(), "parse the SigCgt field failed");
                 return false;
             }
-            let sig_cgt_str = mask_vec[1];
+            let sig_cgt_str = mask_vec[1].trim();
             let sig_cgt_mask = match u64::from_str_radix(sig_cgt_str, 16) {
                 Ok(h) => h,
                 Err(_) => {
-                    warn!(sl!(), "failed to parse the str {} to hex\n", sig_cgt_str);
+                    warn!(sl!(), "failed to parse the str {} to hex", sig_cgt_str);
                     return false;
                 }
             };
@@ -1866,8 +1874,13 @@ fn load_kernel_module(module: &protocols::agent::KernelModule) -> Result<()> {
 #[cfg(test)]
 mod tests {
     use super::*;
-    use crate::protocols::agent_ttrpc::AgentService as _;
-    use oci::{Hook, Hooks};
+    use crate::{
+        assert_result, namespace::Namespace, protocols::agent_ttrpc::AgentService as _,
+        skip_if_not_root,
+    };
+    use nix::mount;
+    use oci::{Hook, Hooks, Linux, LinuxNamespace};
     use tempfile::{tempdir, TempDir};
     use ttrpc::{r#async::TtrpcContext, MessageHeader};
     fn mk_ttrpc_context() -> TtrpcContext {
@@ -1879,6 +1892,44 @@ mod tests {
         }
     }
+    fn create_dummy_opts() -> CreateOpts {
+        let root = Root {
+            path: String::from("/"),
+            ..Default::default()
+        };
+        let spec = Spec {
+            linux: Some(oci::Linux::default()),
+            root: Some(root),
+            ..Default::default()
+        };
+        CreateOpts {
+            cgroup_name: "".to_string(),
+            use_systemd_cgroup: false,
+            no_pivot_root: false,
+            no_new_keyring: false,
+            spec: Some(spec),
+            rootless_euid: false,
+            rootless_cgroup: false,
+        }
+    }
+    fn create_linuxcontainer() -> (LinuxContainer, TempDir) {
+        let dir = tempdir().expect("failed to make tempdir");
+        (
+            LinuxContainer::new(
+                "some_id",
+                dir.path().join("rootfs").to_str().unwrap(),
+                create_dummy_opts(),
+                &slog_scope::logger(),
+            )
+            .unwrap(),
+            dir,
+        )
+    }
     #[test]
     fn test_load_kernel_module() {
         let mut m = protocols::agent::KernelModule {
@@ -1971,6 +2022,511 @@ mod tests {
assert!(result.is_err(), "expected add arp neighbors to fail");
}
#[tokio::test]
async fn test_do_write_stream() {
#[derive(Debug)]
struct TestData<'a> {
create_container: bool,
has_fd: bool,
has_tty: bool,
break_pipe: bool,
container_id: &'a str,
exec_id: &'a str,
data: Vec<u8>,
result: Result<protocols::agent::WriteStreamResponse>,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
create_container: true,
has_fd: true,
has_tty: true,
break_pipe: false,
container_id: "1",
exec_id: "2",
data: vec![1, 2, 3],
result: Ok(WriteStreamResponse {
len: 3,
..WriteStreamResponse::default()
}),
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
has_tty: false,
..Default::default()
},
TestData {
break_pipe: true,
result: Err(anyhow!(std::io::Error::from_raw_os_error(libc::EPIPE))),
..Default::default()
},
TestData {
create_container: false,
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
..Default::default()
},
TestData {
container_id: "8181",
result: Err(anyhow!(crate::sandbox::ERR_INVALID_CONTAINER_ID)),
..Default::default()
},
TestData {
data: vec![],
result: Ok(WriteStreamResponse {
len: 0,
..WriteStreamResponse::default()
}),
..Default::default()
},
TestData {
has_fd: false,
result: Err(anyhow!(ERR_CANNOT_GET_WRITER)),
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let logger = slog::Logger::root(slog::Discard, o!());
let mut sandbox = Sandbox::new(&logger).unwrap();
let (rfd, wfd) = unistd::pipe().unwrap();
if d.break_pipe {
unistd::close(rfd).unwrap();
}
if d.create_container {
let (mut linux_container, _root) = create_linuxcontainer();
let exec_process_id = 2;
linux_container.id = "1".to_string();
let mut exec_process = Process::new(
&logger,
&oci::Process::default(),
&exec_process_id.to_string(),
false,
1,
)
.unwrap();
let fd = {
if d.has_fd {
Some(wfd)
} else {
None
}
};
if d.has_tty {
exec_process.parent_stdin = None;
exec_process.term_master = fd;
} else {
exec_process.parent_stdin = fd;
exec_process.term_master = None;
}
linux_container
.processes
.insert(exec_process_id, exec_process);
sandbox.add_container(linux_container);
}
let agent_service = Box::new(AgentService {
sandbox: Arc::new(Mutex::new(sandbox)),
});
let result = agent_service
.do_write_stream(protocols::agent::WriteStreamRequest {
container_id: d.container_id.to_string(),
exec_id: d.exec_id.to_string(),
data: d.data.clone(),
..Default::default()
})
.await;
if !d.break_pipe {
unistd::close(rfd).unwrap();
}
unistd::close(wfd).unwrap();
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_update_container_namespaces() {
#[derive(Debug)]
struct TestData<'a> {
has_linux_in_spec: bool,
sandbox_pidns_path: Option<&'a str>,
namespaces: Vec<LinuxNamespace>,
use_sandbox_pidns: bool,
result: Result<()>,
expected_namespaces: Vec<LinuxNamespace>,
}
impl Default for TestData<'_> {
fn default() -> Self {
TestData {
has_linux_in_spec: true,
sandbox_pidns_path: Some("sharedpidns"),
namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "ipcpath".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "utspath".to_string(),
},
],
use_sandbox_pidns: false,
result: Ok(()),
expected_namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "".to_string(),
},
],
}
}
}
let tests = &[
TestData {
..Default::default()
},
TestData {
use_sandbox_pidns: true,
expected_namespaces: vec![
LinuxNamespace {
r#type: NSTYPEIPC.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEUTS.to_string(),
path: "".to_string(),
},
LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "sharedpidns".to_string(),
},
],
..Default::default()
},
TestData {
namespaces: vec![],
use_sandbox_pidns: true,
expected_namespaces: vec![LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "sharedpidns".to_string(),
}],
..Default::default()
},
TestData {
namespaces: vec![],
use_sandbox_pidns: false,
expected_namespaces: vec![LinuxNamespace {
r#type: NSTYPEPID.to_string(),
path: "".to_string(),
}],
..Default::default()
},
TestData {
namespaces: vec![],
sandbox_pidns_path: None,
use_sandbox_pidns: true,
result: Err(anyhow!(ERR_NO_SANDBOX_PIDNS)),
expected_namespaces: vec![],
..Default::default()
},
TestData {
has_linux_in_spec: false,
result: Err(anyhow!(ERR_NO_LINUX_FIELD)),
..Default::default()
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let logger = slog::Logger::root(slog::Discard, o!());
let mut sandbox = Sandbox::new(&logger).unwrap();
if let Some(pidns_path) = d.sandbox_pidns_path {
let mut sandbox_pidns = Namespace::new(&logger);
sandbox_pidns.path = pidns_path.to_string();
sandbox.sandbox_pidns = Some(sandbox_pidns);
}
let mut oci = Spec::default();
if d.has_linux_in_spec {
oci.linux = Some(Linux {
namespaces: d.namespaces.clone(),
..Default::default()
});
}
let result = update_container_namespaces(&sandbox, &mut oci, d.use_sandbox_pidns);
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
if let Some(linux) = oci.linux {
assert_eq!(d.expected_namespaces, linux.namespaces, "{}", msg);
}
}
}
#[tokio::test]
async fn test_get_memory_info() {
#[derive(Debug)]
struct TestData<'a> {
// if None is provided, no file will be generated, else the data in the Option will populate the file
block_size_data: Option<&'a str>,
hotplug_probe_data: bool,
get_block_size: bool,
get_hotplug: bool,
result: Result<(u64, bool)>,
}
let tests = &[
TestData {
block_size_data: Some("10000000"),
hotplug_probe_data: true,
get_block_size: true,
get_hotplug: true,
result: Ok((268435456, true)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: true,
result: Ok((256, false)),
},
TestData {
block_size_data: None,
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: true,
result: Ok((0, false)),
},
TestData {
block_size_data: Some(""),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("-1"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some(" "),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("some data"),
hotplug_probe_data: false,
get_block_size: true,
get_hotplug: false,
result: Err(anyhow!(ERR_INVALID_BLOCK_SIZE)),
},
TestData {
block_size_data: Some("some data"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: false,
result: Ok((0, false)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: false,
result: Ok((0, false)),
},
TestData {
block_size_data: Some("100"),
hotplug_probe_data: true,
get_block_size: false,
get_hotplug: true,
result: Ok((0, true)),
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let dir = tempdir().expect("failed to make tempdir");
let block_size_path = dir.path().join("block_size_bytes");
let hotplug_probe_path = dir.path().join("probe");
if let Some(block_size_data) = d.block_size_data {
fs::write(&block_size_path, block_size_data).unwrap();
}
if d.hotplug_probe_data {
fs::write(&hotplug_probe_path, []).unwrap();
}
let result = get_memory_info(
d.get_block_size,
d.get_hotplug,
block_size_path.to_str().unwrap(),
hotplug_probe_path.to_str().unwrap(),
);
let msg = format!("{}, result: {:?}", msg, result);
assert_result!(d.result, result, msg);
}
}
#[tokio::test]
async fn test_is_signal_handled() {
#[derive(Debug)]
struct TestData<'a> {
status_file_data: Option<&'a str>,
signum: u32,
result: bool,
}
let tests = &[
TestData {
status_file_data: Some(
r#"
SigBlk:0000000000010000
SigCgt:0000000000000001
OtherField:other
"#,
),
signum: 1,
result: true,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt:\t000000004b813efb"),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt: 000000004b813efb"),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb "),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt:\t000000004b813efb "),
signum: 4,
result: true,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 3,
result: false,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 65,
result: false,
},
TestData {
status_file_data: Some("SigCgt:000000004b813efb"),
signum: 0,
result: true,
},
TestData {
status_file_data: Some("SigCgt:ZZZZZZZZ"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("SigCgt:-1"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("SigCgt"),
signum: 1,
result: false,
},
TestData {
status_file_data: Some("any data"),
signum: 0,
result: true,
},
TestData {
status_file_data: Some("SigBlk:0000000000000001"),
signum: 1,
result: false,
},
TestData {
status_file_data: None,
signum: 1,
result: false,
},
TestData {
status_file_data: None,
signum: 0,
result: false,
},
];
for (i, d) in tests.iter().enumerate() {
let msg = format!("test[{}]: {:?}", i, d);
let dir = tempdir().expect("failed to make tempdir");
let proc_status_file_path = dir.path().join("status");
if let Some(file_data) = d.status_file_data {
fs::write(&proc_status_file_path, file_data).unwrap();
}
let result = is_signal_handled(proc_status_file_path.to_str().unwrap(), d.signum);
let msg = format!("{}, result: {:?}", msg, result);
assert_eq!(d.result, result, "{}", msg);
}
}
#[tokio::test]
async fn test_verify_cid() {
#[derive(Debug)]
@@ -2197,4 +2753,66 @@ mod tests {
}
}
}
#[tokio::test]
async fn test_volume_capacity_stats() {
skip_if_not_root!();
// Verify error if path does not exist
assert!(get_volume_capacity_stats("/does-not-exist").is_err());
// Create a new tmpfs mount, and verify the initial values
let mount_dir = tempfile::tempdir().unwrap();
mount::mount(
Some("tmpfs"),
mount_dir.path().to_str().unwrap(),
Some("tmpfs"),
mount::MsFlags::empty(),
None::<&str>,
)
.unwrap();
let mut stats = get_volume_capacity_stats(mount_dir.path().to_str().unwrap()).unwrap();
assert_eq!(stats.used, 0);
assert_ne!(stats.available, 0);
let available = stats.available;
// Verify that writing a file will result in increased utilization
fs::write(mount_dir.path().join("file.dat"), "foobar").unwrap();
stats = get_volume_capacity_stats(mount_dir.path().to_str().unwrap()).unwrap();
assert_eq!(stats.used, 4 * 1024);
assert_eq!(stats.available, available - 4 * 1024);
}
#[tokio::test]
async fn test_get_volume_inode_stats() {
skip_if_not_root!();
// Verify error if path does not exist
assert!(get_volume_inode_stats("/does-not-exist").is_err());
// Create a new tmpfs mount, and verify the initial values
let mount_dir = tempfile::tempdir().unwrap();
mount::mount(
Some("tmpfs"),
mount_dir.path().to_str().unwrap(),
Some("tmpfs"),
mount::MsFlags::empty(),
None::<&str>,
)
.unwrap();
let mut stats = get_volume_inode_stats(mount_dir.path().to_str().unwrap()).unwrap();
assert_eq!(stats.used, 1);
assert_ne!(stats.available, 0);
let available = stats.available;
// Verify that creating a directory and writing a file will result in increased utilization
let dir = mount_dir.path().join("foobar");
fs::create_dir_all(&dir).unwrap();
fs::write(dir.as_path().join("file.dat"), "foobar").unwrap();
stats = get_volume_inode_stats(mount_dir.path().to_str().unwrap()).unwrap();
assert_eq!(stats.used, 3);
assert_eq!(stats.available, available - 2);
}
}
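The test tables above pin down two parsing rules that the agent code (not shown in this diff) has to follow:

- `block_size_bytes` is read as hexadecimal.
- A signal counts as handled when bit (signum - 1) is set in the `SigCgt` mask of a `/proc/<pid>/status`-style file.

The Go sketch below restates those rules as inferred from the tests. `parseBlockSize` and `isSignalHandled` are hypothetical helpers, not the agent's Rust functions, and the whitespace trimming is an assumption. File handling is left out. Per the tables, a missing `block_size_bytes` file yields a block size of 0, and a missing status file yields `false` even for signal 0.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseBlockSize interprets the contents of a sysfs block_size_bytes file.
// Assumption inferred from the test table: the value is hexadecimal, so
// "10000000" is 0x10000000 = 268435456 bytes and "100" is 256 bytes.
func parseBlockSize(data string) (uint64, error) {
	v, err := strconv.ParseUint(strings.TrimSpace(data), 16, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid block size %q: %w", data, err)
	}
	return v, nil
}

// isSignalHandled reports whether signum is set in the SigCgt (signals
// caught) mask found in /proc/<pid>/status-style text. Signal n maps to
// bit n-1 of the hexadecimal mask. Signal 0 is treated as handled and
// signals above 64 as not handled, matching the test table.
func isSignalHandled(status string, signum uint32) bool {
	if signum == 0 {
		return true
	}
	if signum > 64 {
		return false
	}
	scanner := bufio.NewScanner(strings.NewReader(status))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "SigCgt:") {
			continue
		}
		hex := strings.TrimSpace(strings.TrimPrefix(line, "SigCgt:"))
		mask, err := strconv.ParseUint(hex, 16, 64)
		if err != nil {
			return false // malformed mask, e.g. "ZZZZZZZZ" or "-1"
		}
		return mask&(uint64(1)<<(signum-1)) != 0
	}
	return false // no SigCgt line at all
}

func main() {
	size, err := parseBlockSize("10000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(size)                                          // 268435456
	fmt.Println(isSignalHandled("SigCgt:000000004b813efb", 4)) // true
	fmt.Println(isSignalHandled("SigCgt:000000004b813efb", 3)) // false
}
```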

@@ -32,6 +32,8 @@ use tokio::sync::oneshot;
use tokio::sync::Mutex;
use tracing::instrument;
pub const ERR_INVALID_CONTAINER_ID: &str = "Invalid container id";
type UeventWatcher = (Box<dyn UeventMatcher>, oneshot::Sender<Uevent>);
#[derive(Debug)]
@@ -237,7 +239,7 @@ impl Sandbox {
pub fn find_container_process(&mut self, cid: &str, eid: &str) -> Result<&mut Process> {
let ctr = self
.get_container(cid)
- .ok_or_else(|| anyhow!("Invalid container id"))?;
+ .ok_or_else(|| anyhow!(ERR_INVALID_CONTAINER_ID))?;
if eid.is_empty() {
return ctr

@@ -53,4 +53,29 @@ mod test_utils {
}
};
}
// Parameters:
//
// 1: expected Result
// 2: actual Result
// 3: string used to identify the test on error
#[macro_export]
macro_rules! assert_result {
($expected_result:expr, $actual_result:expr, $msg:expr) => {
if $expected_result.is_ok() {
let expected_value = $expected_result.as_ref().unwrap();
let actual_value = $actual_result.unwrap();
assert!(*expected_value == actual_value, "{}", $msg);
} else {
assert!($actual_result.is_err(), "{}", $msg);
let expected_error = $expected_result.as_ref().unwrap_err();
let expected_error_msg = format!("{:?}", expected_error);
let actual_error_msg = format!("{:?}", $actual_result.unwrap_err());
assert!(expected_error_msg == actual_error_msg, "{}", $msg);
}
};
}
}

@@ -7,10 +7,11 @@ package main
import (
"encoding/json"
"fmt"
"net/url"
containerdshim "github.com/kata-containers/kata-containers/src/runtime/pkg/containerd-shim-v2"
- "github.com/kata-containers/kata-containers/src/runtime/pkg/direct-volume"
+ volume "github.com/kata-containers/kata-containers/src/runtime/pkg/direct-volume"
"github.com/kata-containers/kata-containers/src/runtime/pkg/utils/shimclient"
"github.com/urfave/cli"
@@ -89,12 +90,14 @@ var statsCommand = cli.Command{
Destination: &volumePath,
},
},
- Action: func(c *cli.Context) (string, error) {
+ Action: func(c *cli.Context) error {
stats, err := Stats(volumePath)
if err != nil {
- return "", cli.NewExitError(err.Error(), 1)
+ return cli.NewExitError(err.Error(), 1)
}
- return string(stats), nil
+ fmt.Println(string(stats))
+ return nil
},
}
@@ -127,8 +130,14 @@ func Stats(volumePath string) ([]byte, error) {
if err != nil {
return nil, err
}
- urlSafeDevicePath := url.PathEscape(volumePath)
- body, err := shimclient.DoGet(sandboxId, defaultTimeout, containerdshim.DirectVolumeStatUrl+"/"+urlSafeDevicePath)
+ volumeMountInfo, err := volume.VolumeMountInfo(volumePath)
+ if err != nil {
+ return nil, err
+ }
+ urlSafeDevicePath := url.PathEscape(volumeMountInfo.Device)
+ body, err := shimclient.DoGet(sandboxId, defaultTimeout,
+ fmt.Sprintf("%s?%s=%s", containerdshim.DirectVolumeStatUrl, containerdshim.DirectVolumePathKey, urlSafeDevicePath))
if err != nil {
return nil, err
}
@@ -141,8 +150,13 @@ func Resize(volumePath string, size uint64) error {
if err != nil {
return err
}
+ volumeMountInfo, err := volume.VolumeMountInfo(volumePath)
+ if err != nil {
+ return err
+ }
resizeReq := containerdshim.ResizeRequest{
- VolumePath: volumePath,
+ VolumePath: volumeMountInfo.Device,
Size: size,
}
encoded, err := json.Marshal(resizeReq)

@@ -32,6 +32,8 @@ import (
)
const (
DirectVolumePathKey = "path"
DirectVolumeStatUrl = "/direct-volume/stats"
DirectVolumeResizeUrl = "/direct-volume/resize"
)
@@ -139,7 +141,16 @@ func decodeAgentMetrics(body string) []*dto.MetricFamily {
}
func (s *service) serveVolumeStats(w http.ResponseWriter, r *http.Request) {
- volumePath, err := url.PathUnescape(strings.TrimPrefix(r.URL.Path, DirectVolumeStatUrl))
+ val := r.URL.Query().Get(DirectVolumePathKey)
+ if val == "" {
+ msg := fmt.Sprintf("Required parameter %s not found", DirectVolumePathKey)
+ shimMgtLog.Info(msg)
+ w.WriteHeader(http.StatusBadRequest)
+ w.Write([]byte(msg))
+ return
+ }
+ volumePath, err := url.PathUnescape(val)
if err != nil {
shimMgtLog.WithError(err).Error("failed to unescape the volume stat url path")
w.WriteHeader(http.StatusInternalServerError)

@@ -53,6 +53,11 @@ func wait(ctx context.Context, s *service, c *container, execID string) (int32,
"container": c.id,
"pid": processID,
}).Error("Wait for process failed")
// set return code if wait failed
if ret == 0 {
ret = exitCode255
}
}
timeStamp := time.Now()
@@ -78,7 +83,7 @@ func wait(ctx context.Context, s *service, c *container, execID string) (int32,
shimLog.WithField("sandbox", s.sandbox.ID()).Error("failed to delete sandbox")
}
} else {
- if _, err = s.sandbox.StopContainer(ctx, c.id, false); err != nil {
+ if _, err = s.sandbox.StopContainer(ctx, c.id, true); err != nil {
shimLog.WithError(err).WithField("container", c.id).Warn("stop container failed")
}
}

@@ -45,9 +45,8 @@ func deviceLogger() *logrus.Entry {
return api.DeviceLogger()
}
- // Identify PCIe device by /sys/bus/pci/slots/xx/max_bus_speed, sample content "8.0 GT/s PCIe"
- // The /sys/bus/pci/slots/xx/address contains bdf, sample content "0000:04:00"
- // bdf format: bus:slot.function
+ // Identify PCIe device by reading the size of the PCI config space
+ // Plain PCI device have 256 bytes of config space where PCIe devices have 4K
func isPCIeDevice(bdf string) bool {
if len(strings.Split(bdf, ":")) == 2 {
bdf = PCIDomain + ":" + bdf

@@ -222,6 +222,7 @@ func getVFIODetails(deviceFileName, iommuDevicesPath string) (deviceBDF, deviceS
// Get sysfsdev of device eg. /sys/devices/pci0000:00/0000:00:02.0/f79944e4-5a3d-11e8-99ce-479cbab002e4
sysfsDevStr := filepath.Join(iommuDevicesPath, deviceFileName)
deviceSysfsDev, err = getSysfsDev(sysfsDevStr)
deviceBDF = getBDF(getMediatedBDF(deviceSysfsDev))
default:
err = fmt.Errorf("Incorrect tokens found while parsing vfio details: %s", deviceFileName)
}
@@ -229,10 +230,23 @@ func getVFIODetails(deviceFileName, iommuDevicesPath string) (deviceBDF, deviceS
return deviceBDF, deviceSysfsDev, vfioDeviceType, err
}
// getMediatedBDF returns the BDF of a VF
// Expected input string format is /sys/devices/pci0000:d7/BDF0/BDF1/.../MDEVBDF/UUID
func getMediatedBDF(deviceSysfsDev string) string {
tokens := strings.SplitN(deviceSysfsDev, "/", -1)
if len(tokens) < 4 {
return ""
}
return tokens[len(tokens)-2]
}
// getBDF returns the BDF of pci device
// Expected input string format is [<domain>]:[<bus>][<slot>].[<func>] eg. 0000:02:10.0
func getBDF(deviceSysStr string) string {
tokens := strings.SplitN(deviceSysStr, ":", 2)
if len(tokens) == 1 {
return ""
}
return tokens[1]
}
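`getMediatedBDF` takes the path component just before the mdev UUID, and `getBDF` then strips the PCI domain from it. Applied to the sample path from the comment in `getVFIODetails`, the two functions (copied verbatim from the diff) behave as follows:

```go
package main

import (
	"fmt"
	"strings"
)

// getMediatedBDF returns the BDF of a VF (copied from the diff).
// Expected input: /sys/devices/pci0000:d7/BDF0/BDF1/.../MDEVBDF/UUID
func getMediatedBDF(deviceSysfsDev string) string {
	tokens := strings.SplitN(deviceSysfsDev, "/", -1)
	if len(tokens) < 4 {
		return ""
	}
	return tokens[len(tokens)-2]
}

// getBDF drops the domain from a [<domain>]:<bus>:<slot>.<func> string
// (copied from the diff).
func getBDF(deviceSysStr string) string {
	tokens := strings.SplitN(deviceSysStr, ":", 2)
	if len(tokens) == 1 {
		return ""
	}
	return tokens[1]
}

func main() {
	sysfsDev := "/sys/devices/pci0000:00/0000:00:02.0/f79944e4-5a3d-11e8-99ce-479cbab002e4"
	mdevBDF := getMediatedBDF(sysfsDev)
	fmt.Println(mdevBDF)         // 0000:00:02.0 (parent of the mdev UUID)
	fmt.Println(getBDF(mdevBDF)) // 00:02.0 (domain stripped)
}
```

Both helpers return an empty string rather than an error on unexpected input, so a malformed sysfs path surfaces as an empty `deviceBDF`.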

@@ -46,4 +46,5 @@ func TestGetVFIODetails(t *testing.T) {
assert.Nil(t, err)
}
}
}

@@ -606,6 +606,7 @@ components:
- 3
- 3
num_pci_segments: 3
serial_number: serial_number
pmem:
- pci_segment: 6
mergeable: false
@@ -948,6 +949,7 @@ components:
- 3
- 3
num_pci_segments: 3
serial_number: serial_number
pmem:
- pci_segment: 6
mergeable: false
@@ -1169,6 +1171,7 @@ components:
- 3
- 3
num_pci_segments: 3
serial_number: serial_number
properties:
num_pci_segments:
format: int16
@@ -1178,6 +1181,8 @@ components:
format: int16
type: integer
type: array
serial_number:
type: string
type: object
MemoryZoneConfig:
example:

@@ -6,6 +6,7 @@ Name | Type | Description | Notes
------------ | ------------- | ------------- | -------------
**NumPciSegments** | Pointer to **int32** | | [optional]
**IommuSegments** | Pointer to **[]int32** | | [optional]
**SerialNumber** | Pointer to **string** | | [optional]
## Methods
@@ -76,6 +77,31 @@ SetIommuSegments sets IommuSegments field to given value.
HasIommuSegments returns a boolean if a field has been set.
### GetSerialNumber
`func (o *PlatformConfig) GetSerialNumber() string`
GetSerialNumber returns the SerialNumber field if non-nil, zero value otherwise.
### GetSerialNumberOk
`func (o *PlatformConfig) GetSerialNumberOk() (*string, bool)`
GetSerialNumberOk returns a tuple with the SerialNumber field if it's non-nil, zero value otherwise
and a boolean to check if the value has been set.
### SetSerialNumber
`func (o *PlatformConfig) SetSerialNumber(v string)`
SetSerialNumber sets SerialNumber field to given value.
### HasSerialNumber
`func (o *PlatformConfig) HasSerialNumber() bool`
HasSerialNumber returns a boolean if a field has been set.
[[Back to Model list]](../README.md#documentation-for-models) [[Back to API list]](../README.md#documentation-for-api-endpoints) [[Back to README]](../README.md)

@@ -18,6 +18,7 @@ import (
type PlatformConfig struct {
NumPciSegments *int32 `json:"num_pci_segments,omitempty"`
IommuSegments *[]int32 `json:"iommu_segments,omitempty"`
SerialNumber *string `json:"serial_number,omitempty"`
}
// NewPlatformConfig instantiates a new PlatformConfig object
@@ -101,6 +102,38 @@ func (o *PlatformConfig) SetIommuSegments(v []int32) {
o.IommuSegments = &v
}
// GetSerialNumber returns the SerialNumber field value if set, zero value otherwise.
func (o *PlatformConfig) GetSerialNumber() string {
if o == nil || o.SerialNumber == nil {
var ret string
return ret
}
return *o.SerialNumber
}
// GetSerialNumberOk returns a tuple with the SerialNumber field value if set, nil otherwise
// and a boolean to check if the value has been set.
func (o *PlatformConfig) GetSerialNumberOk() (*string, bool) {
if o == nil || o.SerialNumber == nil {
return nil, false
}
return o.SerialNumber, true
}
// HasSerialNumber returns a boolean if a field has been set.
func (o *PlatformConfig) HasSerialNumber() bool {
if o != nil && o.SerialNumber != nil {
return true
}
return false
}
// SetSerialNumber gets a reference to the given string and assigns it to the SerialNumber field.
func (o *PlatformConfig) SetSerialNumber(v string) {
o.SerialNumber = &v
}
func (o PlatformConfig) MarshalJSON() ([]byte, error) {
toSerialize := map[string]interface{}{}
if o.NumPciSegments != nil {
@@ -109,6 +142,9 @@ func (o PlatformConfig) MarshalJSON() ([]byte, error) {
if o.IommuSegments != nil {
toSerialize["iommu_segments"] = o.IommuSegments
}
if o.SerialNumber != nil {
toSerialize["serial_number"] = o.SerialNumber
}
return json.Marshal(toSerialize)
}

@@ -616,6 +616,8 @@ components:
items:
type: integer
format: int16
serial_number:
type: string
MemoryZoneConfig:
required:

@@ -1593,7 +1593,7 @@ func (s *Sandbox) ResumeContainer(ctx context.Context, containerID string) error
}
// createContainers registers all containers, create the
- // containers in the guest and starts one shim per container.
+ // containers in the guest.
func (s *Sandbox) createContainers(ctx context.Context) error {
span, ctx := katatrace.Trace(ctx, s.Logger(), "createContainers", sandboxTracingTags, map[string]string{"sandbox_id": s.id})
defer span.End()

@@ -532,8 +532,13 @@ EOT
if [ -f "$chrony_systemd_service" ]; then
# Remove user option, user could not exist in the rootfs
+ # Set the /var/lib/chrony for ReadWritePaths to be ignored if
+ # its nonexistent, this broke the service on boot previously
+ # due to the directory not being present "(code=exited, status=226/NAMESPACE)"
sed -i -e 's/^\(ExecStart=.*\)-u [[:alnum:]]*/\1/g' \
- -e '/^\[Unit\]/a ConditionPathExists=\/dev\/ptp0' ${chrony_systemd_service}
+ -e '/^\[Unit\]/a ConditionPathExists=\/dev\/ptp0' \
+ -e 's/^ReadWritePaths=\(.\+\) \/var\/lib\/chrony \(.\+\)$/ReadWritePaths=\1 -\/var\/lib\/chrony \2/m' \
+ ${chrony_systemd_service}
fi
AGENT_DIR="${ROOTFS_DIR}/usr/bin"
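The added sed expression prefixes `/var/lib/chrony` in `ReadWritePaths=` with `-`. That prefix tells systemd to ignore the path if it does not exist, instead of failing the unit with `status=226/NAMESPACE`. The Go sketch below mirrors the `ExecStart` and `ReadWritePaths` expressions with `regexp`; the `ConditionPathExists` append is omitted. `patchChronyUnit` and the sample unit text are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Mirrors: s/^\(ExecStart=.*\)-u [[:alnum:]]*/\1/g (drop "-u <user>").
	execStartUser = regexp.MustCompile(`(?m)^(ExecStart=.*)-u [[:alnum:]]*`)
	// Mirrors: s/^ReadWritePaths=\(.\+\) \/var\/lib\/chrony \(.\+\)$/.../m
	readWriteChrony = regexp.MustCompile(`(?m)^ReadWritePaths=(.+) /var/lib/chrony (.+)$`)
)

// patchChronyUnit applies both rewrites to the full text of a unit file.
func patchChronyUnit(unit string) string {
	unit = execStartUser.ReplaceAllString(unit, "${1}")
	return readWriteChrony.ReplaceAllString(unit, "ReadWritePaths=${1} -/var/lib/chrony ${2}")
}

func main() {
	unit := "[Service]\n" +
		"ExecStart=/usr/sbin/chronyd -u chrony $OPTIONS\n" +
		"ReadWritePaths=/run /var/lib/chrony -/var/lib/samba\n"
	fmt.Print(patchChronyUnit(unit))
}
```

The pattern only matches when `/var/lib/chrony` is neither the first nor the last entry on the line. A unit that lists it last is left unchanged.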

@@ -18,7 +18,7 @@ spec:
katacontainers.io/kata-runtime: cleanup
containers:
- name: kube-kata-cleanup
- image: quay.io/kata-containers/kata-deploy:2.4.1
+ image: quay.io/kata-containers/kata-deploy:2.4.3
imagePullPolicy: Always
command: [ "bash", "-c", "/opt/kata-artifacts/scripts/kata-deploy.sh reset" ]
env:

@@ -16,7 +16,7 @@ spec:
serviceAccountName: kata-label-node
containers:
- name: kube-kata
- image: quay.io/kata-containers/kata-deploy:2.4.1
+ image: quay.io/kata-containers/kata-deploy:2.4.3
imagePullPolicy: Always
lifecycle:
preStop:

@@ -0,0 +1,277 @@
From 1105812382e1126d86dddc16b3700f8c79dc93d1 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:05 -0400
Subject: [PATCH 1/3] numa: Enable numa for SGX EPC sections
The basic SGX did not enable numa for SGX EPC sections, which
result in all EPC sections located in numa node 0. This patch
enable SGX numa function in the guest and the EPC section can
work with RAM as one numa node.
The Guest kernel related log:
[ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
[ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
The SRAT table can normally show SGX EPC sections menory info in different
numa nodes.
The SGX EPC numa related command:
......
-m 4G,maxmem=20G \
-smp sockets=2,cores=2 \
-cpu host,+sgx-provisionkey \
-object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \
-object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-1,memdev=node0 \
-object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \
-object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \
-numa node,nodeid=1,cpus=2-3,memdev=node1 \
-M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \
......
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/core/numa.c | 5 ++---
hw/i386/acpi-build.c | 2 ++
hw/i386/sgx-epc.c | 3 +++
hw/i386/sgx-stub.c | 4 ++++
hw/i386/sgx.c | 44 +++++++++++++++++++++++++++++++++++++++
include/hw/i386/sgx-epc.h | 3 +++
monitor/hmp-cmds.c | 1 +
qapi/machine.json | 10 ++++++++-
qemu-options.hx | 4 ++--
9 files changed, 70 insertions(+), 6 deletions(-)
diff --git a/hw/core/numa.c b/hw/core/numa.c
index e6050b2273..1aa05dcf42 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -784,9 +784,8 @@ static void numa_stat_memory_devices(NumaNodeMem node_mem[])
break;
case MEMORY_DEVICE_INFO_KIND_SGX_EPC:
se = value->u.sgx_epc.data;
- /* TODO: once we support numa, assign to right node */
- node_mem[0].node_mem += se->size;
- node_mem[0].node_plugged_mem += se->size;
+ node_mem[se->node].node_mem += se->size;
+ node_mem[se->node].node_plugged_mem = 0;
break;
default:
g_assert_not_reached();
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a99c6e4fe3..8383b83ee3 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2068,6 +2068,8 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
nvdimm_build_srat(table_data);
}
+ sgx_epc_build_srat(table_data);
+
/*
* TODO: this part is not in ACPI spec and current linux kernel boots fine
* without these entries. But I recall there were issues the last time I
diff --git a/hw/i386/sgx-epc.c b/hw/i386/sgx-epc.c
index e508827e78..96b2940d75 100644
--- a/hw/i386/sgx-epc.c
+++ b/hw/i386/sgx-epc.c
@@ -21,6 +21,7 @@
static Property sgx_epc_properties[] = {
DEFINE_PROP_UINT64(SGX_EPC_ADDR_PROP, SGXEPCDevice, addr, 0),
+ DEFINE_PROP_UINT32(SGX_EPC_NUMA_NODE_PROP, SGXEPCDevice, node, 0),
DEFINE_PROP_LINK(SGX_EPC_MEMDEV_PROP, SGXEPCDevice, hostmem,
TYPE_MEMORY_BACKEND_EPC, HostMemoryBackendEpc *),
DEFINE_PROP_END_OF_LIST(),
@@ -139,6 +140,8 @@ static void sgx_epc_md_fill_device_info(const MemoryDeviceState *md,
se->memaddr = epc->addr;
se->size = object_property_get_uint(OBJECT(epc), SGX_EPC_SIZE_PROP,
NULL);
+ se->node = object_property_get_uint(OBJECT(epc), SGX_EPC_NUMA_NODE_PROP,
+ NULL);
se->memdev = object_get_canonical_path(OBJECT(epc->hostmem));
info->u.sgx_epc.data = se;
diff --git a/hw/i386/sgx-stub.c b/hw/i386/sgx-stub.c
index c9b379e665..26833eb233 100644
--- a/hw/i386/sgx-stub.c
+++ b/hw/i386/sgx-stub.c
@@ -6,6 +6,10 @@
#include "qapi/error.h"
#include "qapi/qapi-commands-misc-target.h"
+void sgx_epc_build_srat(GArray *table_data)
+{
+}
+
SGXInfo *qmp_query_sgx(Error **errp)
{
error_setg(errp, "SGX support is not compiled in");
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 8fef3dd8fa..d04299904a 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -23,6 +23,7 @@
#include "sysemu/hw_accel.h"
#include "sysemu/reset.h"
#include <sys/ioctl.h>
+#include "hw/acpi/aml-build.h"
#define SGX_MAX_EPC_SECTIONS 8
#define SGX_CPUID_EPC_INVALID 0x0
@@ -36,6 +37,46 @@
#define RETRY_NUM 2
+static int sgx_epc_device_list(Object *obj, void *opaque)
+{
+ GSList **list = opaque;
+
+ if (object_dynamic_cast(obj, TYPE_SGX_EPC)) {
+ *list = g_slist_append(*list, DEVICE(obj));
+ }
+
+ object_child_foreach(obj, sgx_epc_device_list, opaque);
+ return 0;
+}
+
+static GSList *sgx_epc_get_device_list(void)
+{
+ GSList *list = NULL;
+
+ object_child_foreach(qdev_get_machine(), sgx_epc_device_list, &list);
+ return list;
+}
+
+void sgx_epc_build_srat(GArray *table_data)
+{
+ GSList *device_list = sgx_epc_get_device_list();
+
+ for (; device_list; device_list = device_list->next) {
+ DeviceState *dev = device_list->data;
+ Object *obj = OBJECT(dev);
+ uint64_t addr, size;
+ int node;
+
+ node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+ &error_abort);
+ addr = object_property_get_uint(obj, SGX_EPC_ADDR_PROP, &error_abort);
+ size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP, &error_abort);
+
+ build_srat_memory(table_data, addr, size, node, MEM_AFFINITY_ENABLED);
+ }
+ g_slist_free(device_list);
+}
+
static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
{
return (low & MAKE_64BIT_MASK(12, 20)) +
@@ -226,6 +267,9 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
/* set the memdev link with memory backend */
object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
&error_fatal);
+ /* set the numa node property for sgx epc object */
+ object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, list->value->node,
+ &error_fatal);
object_property_set_bool(obj, "realized", true, &error_fatal);
object_unref(obj);
}
diff --git a/include/hw/i386/sgx-epc.h b/include/hw/i386/sgx-epc.h
index a6a65be854..581fac389a 100644
--- a/include/hw/i386/sgx-epc.h
+++ b/include/hw/i386/sgx-epc.h
@@ -25,6 +25,7 @@
#define SGX_EPC_ADDR_PROP "addr"
#define SGX_EPC_SIZE_PROP "size"
#define SGX_EPC_MEMDEV_PROP "memdev"
+#define SGX_EPC_NUMA_NODE_PROP "node"
/**
* SGXEPCDevice:
@@ -38,6 +39,7 @@ typedef struct SGXEPCDevice {
/* public */
uint64_t addr;
+ uint32_t node;
HostMemoryBackendEpc *hostmem;
} SGXEPCDevice;
@@ -56,6 +58,7 @@ typedef struct SGXEPCState {
} SGXEPCState;
bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size);
+void sgx_epc_build_srat(GArray *table_data);
static inline uint64_t sgx_epc_above_4g_end(SGXEPCState *sgx_epc)
{
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9c91bf93e9..2669156b28 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1810,6 +1810,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
se->id ? se->id : "");
monitor_printf(mon, " memaddr: 0x%" PRIx64 "\n", se->memaddr);
monitor_printf(mon, " size: %" PRIu64 "\n", se->size);
+ monitor_printf(mon, " node: %" PRId64 "\n", se->node);
monitor_printf(mon, " memdev: %s\n", se->memdev);
break;
default:
diff --git a/qapi/machine.json b/qapi/machine.json
index f1839acf20..edeab6084b 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1207,12 +1207,15 @@
#
# @memdev: memory backend linked with device
#
+# @node: the numa node
+#
# Since: 6.2
##
{ 'struct': 'SgxEPCDeviceInfo',
'data': { '*id': 'str',
'memaddr': 'size',
'size': 'size',
+ 'node': 'int',
'memdev': 'str'
}
}
@@ -1285,10 +1288,15 @@
#
# @memdev: memory backend linked with device
#
+# @node: the numa node
+#
# Since: 6.2
##
{ 'struct': 'SgxEPC',
- 'data': { 'memdev': 'str' } }
+ 'data': { 'memdev': 'str',
+ 'node': 'int'
+ }
+}
##
# @SgxEPCProperties:
diff --git a/qemu-options.hx b/qemu-options.hx
index ae2c6dbbfc..489b58e151 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -127,11 +127,11 @@ SRST
ERST
DEF("M", HAS_ARG, QEMU_OPTION_M,
- " sgx-epc.0.memdev=memid\n",
+ " sgx-epc.0.memdev=memid,sgx-epc.0.node=numaid\n",
QEMU_ARCH_ALL)
SRST
-``sgx-epc.0.memdev=@var{memid}``
+``sgx-epc.0.memdev=@var{memid},sgx-epc.0.node=@var{numaid}``
Define an SGX EPC section.
ERST
--
2.25.1
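With this patch applied, each EPC section passed to QEMU's `-M` option can carry a `node=` key next to `memdev=`, as in the sample command line in the commit message. Each node must also be declared with `-numa node,nodeid=N`. A launcher could compose that value as sketched below. `epcSection` and `sgxEPCMachineOpt` are hypothetical and do not represent the Kata runtime's own QEMU configuration code.

```go
package main

import (
	"fmt"
	"strings"
)

// epcSection is a hypothetical description of one guest EPC section.
type epcSection struct {
	MemDev string // id of a memory-backend-epc object, e.g. "mem0"
	Node   int    // guest NUMA node declared with -numa node,nodeid=N
}

// sgxEPCMachineOpt builds the -M value in the
// sgx-epc.N.memdev=...,sgx-epc.N.node=... form documented by the patch.
func sgxEPCMachineOpt(sections []epcSection) string {
	parts := make([]string, 0, 2*len(sections))
	for i, s := range sections {
		parts = append(parts,
			fmt.Sprintf("sgx-epc.%d.memdev=%s", i, s.MemDev),
			fmt.Sprintf("sgx-epc.%d.node=%d", i, s.Node))
	}
	return strings.Join(parts, ",")
}

func main() {
	opt := sgxEPCMachineOpt([]epcSection{{"mem0", 0}, {"mem1", 1}})
	fmt.Println("-M", opt)
	// -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1
}
```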

@@ -0,0 +1,200 @@
From 4755927ae12547c2e7cb22c5fa1b39038c6c11b1 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:07 -0400
Subject: [PATCH 2/3] numa: Support SGX numa in the monitor and Libvirt
interfaces
Add the SGXEPCSection list into SGXInfo to show the multiple
SGX EPC sections detailed info, not the total size like before.
This patch can enable numa support for 'info sgx' command and
QMP interfaces. The new interfaces show each EPC section info
in one numa node. Libvirt can use QMP interface to get the
detailed host SGX EPC capabilities to decide how to allocate
host EPC sections to guest.
(qemu) info sgx
SGX support: enabled
SGX1 support: enabled
SGX2 support: enabled
FLC support: enabled
NUMA node #0: size=67108864
NUMA node #1: size=29360128
The QMP interface show:
(QEMU) query-sgx
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": true}}
(QEMU) query-sgx-capabilities
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], "flc": true}}
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
hw/i386/sgx.c | 51 +++++++++++++++++++++++++++++++++++--------
qapi/misc-target.json | 19 ++++++++++++++--
2 files changed, 59 insertions(+), 11 deletions(-)
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index d04299904a..5de5dd0893 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -83,11 +83,13 @@ static uint64_t sgx_calc_section_metric(uint64_t low, uint64_t high)
((high & MAKE_64BIT_MASK(0, 20)) << 32);
}
-static uint64_t sgx_calc_host_epc_section_size(void)
+static SGXEPCSectionList *sgx_calc_host_epc_sections(void)
{
+ SGXEPCSectionList *head = NULL, **tail = &head;
+ SGXEPCSection *section;
uint32_t i, type;
uint32_t eax, ebx, ecx, edx;
- uint64_t size = 0;
+ uint32_t j = 0;
for (i = 0; i < SGX_MAX_EPC_SECTIONS; i++) {
host_cpuid(0x12, i + 2, &eax, &ebx, &ecx, &edx);
@@ -101,10 +103,13 @@ static uint64_t sgx_calc_host_epc_section_size(void)
break;
}
- size += sgx_calc_section_metric(ecx, edx);
+ section = g_new0(SGXEPCSection, 1);
+ section->node = j++;
+ section->size = sgx_calc_section_metric(ecx, edx);
+ QAPI_LIST_APPEND(tail, section);
}
- return size;
+ return head;
}
static void sgx_epc_reset(void *opaque)
@@ -168,13 +173,35 @@ SGXInfo *qmp_query_sgx_capabilities(Error **errp)
info->sgx1 = eax & (1U << 0) ? true : false;
info->sgx2 = eax & (1U << 1) ? true : false;
- info->section_size = sgx_calc_host_epc_section_size();
+ info->sections = sgx_calc_host_epc_sections();
close(fd);
return info;
}
+static SGXEPCSectionList *sgx_get_epc_sections_list(void)
+{
+ GSList *device_list = sgx_epc_get_device_list();
+ SGXEPCSectionList *head = NULL, **tail = &head;
+ SGXEPCSection *section;
+
+ for (; device_list; device_list = device_list->next) {
+ DeviceState *dev = device_list->data;
+ Object *obj = OBJECT(dev);
+
+ section = g_new0(SGXEPCSection, 1);
+ section->node = object_property_get_uint(obj, SGX_EPC_NUMA_NODE_PROP,
+ &error_abort);
+ section->size = object_property_get_uint(obj, SGX_EPC_SIZE_PROP,
+ &error_abort);
+ QAPI_LIST_APPEND(tail, section);
+ }
+ g_slist_free(device_list);
+
+ return head;
+}
+
SGXInfo *qmp_query_sgx(Error **errp)
{
SGXInfo *info = NULL;
@@ -193,14 +220,13 @@ SGXInfo *qmp_query_sgx(Error **errp)
return NULL;
}
- SGXEPCState *sgx_epc = &pcms->sgx_epc;
info = g_new0(SGXInfo, 1);
info->sgx = true;
info->sgx1 = true;
info->sgx2 = true;
info->flc = true;
- info->section_size = sgx_epc->size;
+ info->sections = sgx_get_epc_sections_list();
return info;
}
@@ -208,6 +234,7 @@ SGXInfo *qmp_query_sgx(Error **errp)
void hmp_info_sgx(Monitor *mon, const QDict *qdict)
{
Error *err = NULL;
+ SGXEPCSectionList *section_list, *section;
g_autoptr(SGXInfo) info = qmp_query_sgx(&err);
if (err) {
@@ -222,8 +249,14 @@ void hmp_info_sgx(Monitor *mon, const QDict *qdict)
info->sgx2 ? "enabled" : "disabled");
monitor_printf(mon, "FLC support: %s\n",
info->flc ? "enabled" : "disabled");
- monitor_printf(mon, "size: %" PRIu64 "\n",
- info->section_size);
+
+ section_list = info->sections;
+ for (section = section_list; section; section = section->next) {
+ monitor_printf(mon, "NUMA node #%" PRId64 ": ",
+ section->value->node);
+ monitor_printf(mon, "size=%" PRIu64 "\n",
+ section->value->size);
+ }
}
bool sgx_epc_get_section(int section_nr, uint64_t *addr, uint64_t *size)
diff --git a/qapi/misc-target.json b/qapi/misc-target.json
index 5aa2b95b7d..1022aa0184 100644
--- a/qapi/misc-target.json
+++ b/qapi/misc-target.json
@@ -337,6 +337,21 @@
'if': 'TARGET_ARM' }
+##
+# @SGXEPCSection:
+#
+# Information about an Intel SGX EPC section
+#
+# @node: the NUMA node
+#
+# @size: the size of the EPC section
+#
+# Since: 6.2
+##
+{ 'struct': 'SGXEPCSection',
+ 'data': { 'node': 'int',
+ 'size': 'uint64'}}
+
##
# @SGXInfo:
#
@@ -350,7 +365,7 @@
#
# @flc: true if FLC is supported
#
-# @section-size: The EPC section size for guest
+# @sections: The EPC sections info for guest
#
# Since: 6.2
##
@@ -359,7 +374,7 @@
'sgx1': 'bool',
'sgx2': 'bool',
'flc': 'bool',
- 'section-size': 'uint64'},
+ 'sections': ['SGXEPCSection']},
'if': 'TARGET_I386' }
##
--
2.25.1
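Patch 2/3 replaces the single `section-size` total with one `SGXEPCSection` entry per EPC section. The self-contained C sketch below imitates the host-side `sgx_calc_host_epc_sections()` used by `query-sgx-capabilities`, but it does not execute CPUID. Instead it decodes fixed, made-up register values for CPUID leaf 0x12 sub-leaves. Per the Intel SDM, ECX[31:12] carries bits 31:12 of the section size and EDX[19:0] carries bits 51:32. `EpcSection`, `SubLeaf`, and the helper functions are hypothetical stand-ins for the QAPI-generated types. Like the patch (`section->node = j++`), the sketch assumes that the Nth valid section belongs to NUMA node N.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Equivalent of QEMU's MAKE_64BIT_MASK(shift, length). */
static uint64_t mask64(unsigned shift, unsigned length)
{
    assert(length > 0 && shift + length <= 64);
    return (~0ULL >> (64 - length)) << shift;
}

/* Size bits 31:12 come from low (ECX), bits 51:32 from high (EDX). */
static uint64_t section_metric(uint64_t low, uint64_t high)
{
    return (low & mask64(12, 20)) + ((high & mask64(0, 20)) << 32);
}

/* Hypothetical stand-in for the generated SGXEPCSectionList. */
typedef struct EpcSection {
    int64_t node;
    uint64_t size;
    struct EpcSection *next;
} EpcSection;

/* Hypothetical snapshot of one CPUID.(EAX=12H, ECX=i+2) result. */
typedef struct {
    uint32_t eax, ecx, edx;
} SubLeaf;

static EpcSection *build_sections(const SubLeaf *leaves, size_t n)
{
    EpcSection *head = NULL, **tail = &head;
    int64_t node = 0;

    for (size_t i = 0; i < n; i++) {
        /* EAX[3:0] == 1 marks a valid EPC section; anything else ends the scan. */
        if ((leaves[i].eax & 0xf) != 1) {
            break;
        }
        EpcSection *s = calloc(1, sizeof(*s));
        if (s == NULL) {
            break; /* out of memory: return what was built so far */
        }
        s->node = node++; /* sequential numbering, as in the patch */
        s->size = section_metric(leaves[i].ecx, leaves[i].edx);
        *tail = s; /* append at the tail, like QAPI_LIST_APPEND */
        tail = &s->next;
    }
    return head;
}

/* The single value the old 'section-size' field reported. */
static uint64_t total_size(const EpcSection *s)
{
    uint64_t total = 0;
    for (; s != NULL; s = s->next) {
        total += s->size;
    }
    return total;
}

/* Same line format as the new loop in hmp_info_sgx(). */
static void print_sections(FILE *out, const EpcSection *s)
{
    for (; s != NULL; s = s->next) {
        fprintf(out, "NUMA node #%" PRId64 ": size=%" PRIu64 "\n",
                s->node, s->size);
    }
}

static void free_sections(EpcSection *s)
{
    while (s != NULL) {
        EpcSection *next = s->next;
        free(s);
        s = next;
    }
}
```

Consider sub-leaves with ECX values `0x04000001` and `0x01C00001` (EDX 0), followed by a type-0 sub-leaf. For that input, `print_sections` prints lines with the same format and sizes as the `info sgx` example in the commit message (`size=67108864` and `size=29360128`). That example reports the guest's EPC devices rather than host CPUID. `total_size` returns 96468992, the single number the old `section-size` field would have carried.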

@@ -0,0 +1,67 @@
From d1889b36098c79e2e6ac90faf3d0dc5ec0057677 Mon Sep 17 00:00:00 2001
From: Yang Zhong <yang.zhong@intel.com>
Date: Mon, 1 Nov 2021 12:20:08 -0400
Subject: [PATCH 3/3] doc: Add the SGX numa description

Add the reference commands for SGX NUMA and describe how to check
whether SGX NUMA is supported with multiple EPC sections.

Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
docs/system/i386/sgx.rst | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/docs/system/i386/sgx.rst b/docs/system/i386/sgx.rst
index f8fade5ac2..0f0a73f758 100644
--- a/docs/system/i386/sgx.rst
+++ b/docs/system/i386/sgx.rst
@@ -141,8 +141,7 @@ To launch a SGX guest:
|qemu_system_x86| \\
-cpu host,+sgx-provisionkey \\
-object memory-backend-epc,id=mem1,size=64M,prealloc=on \\
- -object memory-backend-epc,id=mem2,size=28M \\
- -M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
+ -M sgx-epc.0.memdev=mem1,sgx-epc.0.node=0
Utilizing SGX in the guest requires a kernel/OS with SGX support.
The support can be determined in guest by::
@@ -152,8 +151,32 @@ The support can be determined in guest by::
and SGX epc info by::
$ dmesg | grep sgx
- [ 1.242142] sgx: EPC section 0x180000000-0x181bfffff
- [ 1.242319] sgx: EPC section 0x181c00000-0x1837fffff
+ [ 0.182807] sgx: EPC section 0x140000000-0x143ffffff
+ [ 0.183695] sgx: [Firmware Bug]: Unable to map EPC section to online node. Fallback to the NUMA node 0.
+
+To launch a SGX numa guest:
+
+.. parsed-literal::
+
+ |qemu_system_x86| \\
+ -cpu host,+sgx-provisionkey \\
+ -object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \\
+ -object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \\
+ -numa node,nodeid=0,cpus=0-1,memdev=node0 \\
+ -object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \\
+ -object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \\
+ -numa node,nodeid=1,cpus=2-3,memdev=node1 \\
+ -M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1
+
+and SGX epc numa info by::
+
+ $ dmesg | grep sgx
+ [ 0.369937] sgx: EPC section 0x180000000-0x183ffffff
+ [ 0.370259] sgx: EPC section 0x184000000-0x185bfffff
+
+ $ dmesg | grep SRAT
+ [ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
+ [ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
References
----------
--
2.25.1
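The ranges in the guest's `dmesg` output are inclusive, so a section's size is END - START + 1. For example, 0x183ffffff - 0x180000000 + 1 = 0x4000000 (64 MiB, matching `mem0`), and 0x185bfffff - 0x184000000 + 1 = 0x1c00000 (28 MiB, matching `mem1`). The SRAT lines report the same ranges for Node 0 and Node 1. The C sketch below checks that arithmetic. `epc_range_size` is a hypothetical helper that parses only the `0xSTART-0xEND` fragment, not a full kernel log line.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses an inclusive "0xSTART-0xEND" range, as printed
 * by the guest kernel, and stores END - START + 1 in *size.
 * Returns 0 on success, or -1 if the text is malformed or END < START.
 */
static int epc_range_size(const char *range, uint64_t *size)
{
    char *end;

    assert(range != NULL && size != NULL);
    errno = 0;
    /* Base 16 also accepts the leading "0x" prefix. */
    unsigned long long start = strtoull(range, &end, 16);
    if (errno != 0 || end == range || *end != '-') {
        return -1;
    }
    const char *second = end + 1;
    unsigned long long last = strtoull(second, &end, 16);
    if (errno != 0 || end == second || last < start) {
        return -1;
    }
    *size = (uint64_t)(last - start) + 1; /* inclusive range */
    return 0;
}
```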

@@ -75,7 +75,7 @@ assets:
url: "https://github.com/cloud-hypervisor/cloud-hypervisor"
uscan-url: >-
https://github.com/cloud-hypervisor/cloud-hypervisor/tags.*/v?(\d\S+)\.tar\.gz
- version: "v23.0"
+ version: "v24.0"
firecracker:
description: "Firecracker micro-VMM"