Mirror of https://github.com/kata-containers/kata-containers.git (synced 2025-08-24 10:41:43 +00:00)

Merge pull request #7564 from fidencio/topic/merge-from-main-Aug-7th
CC | Merge from main to CCv0 -- Aug 7th, 2023

Commit d0abf45ed1

1  .github/workflows/ci-on-push.yaml (vendored)
@@ -3,6 +3,7 @@ on:
   pull_request_target:
     branches:
       - 'main'
       - 'stable-*'
     types:
       # Adding 'labeled' to the list of activity types that trigger this event
       # (default: opened, synchronize, reopened) so that we can run this

5  .github/workflows/run-metrics.yaml (vendored)
@@ -49,9 +49,12 @@ jobs:
       - name: run tensorflow test
         run: bash tests/metrics/gha-run.sh run-test-tensorflow
 
+      - name: run fio test
+        run: bash tests/metrics/gha-run.sh run-test-fio
+
       - name: make metrics tarball ${{ matrix.vmm }}
         run: bash tests/metrics/gha-run.sh make-tarball-results
 
       - name: archive metrics results ${{ matrix.vmm }}
         uses: actions/upload-artifact@v3
         with:
@@ -88,7 +88,8 @@ build_and_install_libseccomp() {
     curl -sLO "${libseccomp_tarball_url}"
     tar -xf "${libseccomp_tarball}"
     pushd "libseccomp-${libseccomp_version}"
-    ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
+    [ "${arch}" == $(uname -m) ] && cc_name="" || cc_name="${arch}-linux-gnu-gcc"
+    CC=${cc_name} ./configure --prefix="${libseccomp_install_dir}" CFLAGS="${cflags}" --enable-static --host="${arch}"
     make
     make install
     popd
@@ -14,6 +14,7 @@ Kata Containers design documents:
 - [`Inotify` support](inotify.md)
 - [`Hooks` support](hooks-handling.md)
 - [Metrics(Kata 2.0)](kata-2-0-metrics.md)
+- [Metrics in Rust Runtime(runtime-rs)](kata-metrics-in-runtime-rs.md)
 - [Design for Kata Containers `Lazyload` ability with `nydus`](kata-nydus-design.md)
 - [Design for direct-assigned volume](direct-blk-device-assignment.md)
 - [Design for core-scheduling](core-scheduling.md)
@@ -12,7 +12,7 @@ only needs to run a container runtime and a container agent (called a
 Kata Containers represents a Kubelet pod as a VM.
 
 A Kubernetes cluster runs a control plane where a scheduler (typically
-running on a dedicated master node) calls into a compute Kubelet. This
+running on a dedicated control-plane node) calls into a compute Kubelet. This
 Kubelet instance is responsible for managing the lifecycle of pods
 within the nodes and eventually relies on a container runtime to
 handle execution. The Kubelet architecture decouples lifecycle

50  docs/design/kata-metrics-in-runtime-rs.md (new file)
@@ -0,0 +1,50 @@
# Kata Metrics in Rust Runtime (runtime-rs)

Rust Runtime (runtime-rs) is responsible for:

- Gathering metrics about the `shim`.
- Gathering metrics from the `hypervisor` (through a `channel`).
- Getting metrics from the `agent` (through `ttrpc`); see the sketch below.
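
For the agent path, a minimal usage sketch (the hypothetical `dump_agent_metrics` helper is not part of this patch; the `get_metrics` trait method and the `MetricsResponse` type are the ones added by this patch, and `Empty` is assumed to implement `Default`):

```rust
// Sketch only: fetch the agent's metrics over ttrpc through the `Agent` trait.
// `agent` is an already-connected handle.
async fn dump_agent_metrics(agent: &dyn Agent) -> anyhow::Result<()> {
    let resp = agent.get_metrics(Empty::default()).await?;
    // `metrics` carries the Prometheus text-format payload.
    println!("{}", resp.metrics);
    Ok(())
}
```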

---

Here are all the metrics gathered by `runtime-rs`.

> * Current status of each entry is marked as:
>   * ✅: DONE
>   * 🚧: TODO

### Kata Shim
| STATUS | Metric name | Type | Units | Labels |
| ------ | ------------------------------------------------------------ | ----------- | -------------- | ------------------------------------------------------------ |
| 🚧 | `kata_shim_agent_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (RPC actions of Kata agent)<ul><li>`grpc.CheckRequest`</li><li>`grpc.CloseStdinRequest`</li><li>`grpc.CopyFileRequest`</li><li>`grpc.CreateContainerRequest`</li><li>`grpc.CreateSandboxRequest`</li><li>`grpc.DestroySandboxRequest`</li><li>`grpc.ExecProcessRequest`</li><li>`grpc.GetMetricsRequest`</li><li>`grpc.GuestDetailsRequest`</li><li>`grpc.ListInterfacesRequest`</li><li>`grpc.ListProcessesRequest`</li><li>`grpc.ListRoutesRequest`</li><li>`grpc.MemHotplugByProbeRequest`</li><li>`grpc.OnlineCPUMemRequest`</li><li>`grpc.PauseContainerRequest`</li><li>`grpc.RemoveContainerRequest`</li><li>`grpc.ReseedRandomDevRequest`</li><li>`grpc.ResumeContainerRequest`</li><li>`grpc.SetGuestDateTimeRequest`</li><li>`grpc.SignalProcessRequest`</li><li>`grpc.StartContainerRequest`</li><li>`grpc.StatsContainerRequest`</li><li>`grpc.TtyWinResizeRequest`</li><li>`grpc.UpdateContainerRequest`</li><li>`grpc.UpdateInterfaceRequest`</li><li>`grpc.UpdateRoutesRequest`</li><li>`grpc.WaitProcessRequest`</li><li>`grpc.WriteStreamRequest`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_fds`: <br> Kata containerd shim v2 open FDs. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_io_stat`: <br> Kata containerd shim v2 process IO statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/io`)<ul><li>`cancelledwritebytes`</li><li>`rchar`</li><li>`readbytes`</li><li>`syscr`</li><li>`syscw`</li><li>`wchar`</li><li>`writebytes`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_netdev`: <br> Kata containerd shim v2 network devices statistics. | `GAUGE` | | <ul><li>`interface` (network device name)</li><li>`item` (see `/proc/net/dev`)<ul><li>`recv_bytes`</li><li>`recv_compressed`</li><li>`recv_drop`</li><li>`recv_errs`</li><li>`recv_fifo`</li><li>`recv_frame`</li><li>`recv_multicast`</li><li>`recv_packets`</li><li>`sent_bytes`</li><li>`sent_carrier`</li><li>`sent_colls`</li><li>`sent_compressed`</li><li>`sent_drop`</li><li>`sent_errs`</li><li>`sent_fifo`</li><li>`sent_packets`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_cpu`: <br> Kata Pod overhead for CPU resources(percent). | `GAUGE` | percent | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_pod_overhead_memory_in_bytes`: <br> Kata Pod overhead for memory resources(bytes). | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_stat`: <br> Kata containerd shim v2 process statistics. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/stat`)<ul><li>`cstime`</li><li>`cutime`</li><li>`stime`</li><li>`utime`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_proc_status`: <br> Kata containerd shim v2 process status. | `GAUGE` | | <ul><li>`item` (see `/proc/<pid>/status`)<ul><li>`hugetlbpages`</li><li>`nonvoluntary_ctxt_switches`</li><li>`rssanon`</li><li>`rssfile`</li><li>`rssshmem`</li><li>`vmdata`</li><li>`vmexe`</li><li>`vmhwm`</li><li>`vmlck`</li><li>`vmlib`</li><li>`vmpeak`</li><li>`vmpin`</li><li>`vmpmd`</li><li>`vmpte`</li><li>`vmrss`</li><li>`vmsize`</li><li>`vmstk`</li><li>`vmswap`</li><li>`voluntary_ctxt_switches`</li></ul></li><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_cpu_seconds_total`: <br> Total user and system CPU time spent in seconds. | `COUNTER` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_max_fds`: <br> Maximum number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_open_fds`: <br> Number of open file descriptors. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_resident_memory_bytes`: <br> Resident memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_start_time_seconds`: <br> Start time of the process since `unix` epoch in seconds. | `GAUGE` | `seconds` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_bytes`: <br> Virtual memory size in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_process_virtual_memory_max_bytes`: <br> Maximum amount of virtual memory available in bytes. | `GAUGE` | `bytes` | <ul><li>`sandbox_id`</li></ul> |
| 🚧 | `kata_shim_rpc_durations_histogram_milliseconds`: <br> RPC latency distributions. | `HISTOGRAM` | `milliseconds` | <ul><li>`action` (Kata shim v2 actions)<ul><li>`checkpoint`</li><li>`close_io`</li><li>`connect`</li><li>`create`</li><li>`delete`</li><li>`exec`</li><li>`kill`</li><li>`pause`</li><li>`pids`</li><li>`resize_pty`</li><li>`resume`</li><li>`shutdown`</li><li>`start`</li><li>`state`</li><li>`stats`</li><li>`update`</li><li>`wait`</li></ul></li><li>`sandbox_id`</li></ul> |
| ✅ | `kata_shim_threads`: <br> Kata containerd shim v2 process threads. | `GAUGE` | | <ul><li>`sandbox_id`</li></ul> |
### Kata Hypervisor

Unlike the Go runtime, the hypervisor and the shim in runtime-rs belong to the **same process**, so the hypervisor and shim metrics listed above only need to be gathered once; for now they are all collected on the Kata shim side.

At the same time, an interface (`VmmAction::GetHypervisorMetrics`) was added for gathering hypervisor metrics, in case tailor-made hypervisor metrics are designed in the future. Here are the metrics exposed from [src/dragonball/src/metric.rs](https://github.com/kata-containers/kata-containers/blob/main/src/dragonball/src/metric.rs); a usage sketch follows the table.

| Metric name | Type | Units | Labels |
| ------------------------------------------------------------ | ---------- | ----- | ------------------------------------------------------------ |
| `kata_hypervisor_scrape_count`: <br> Metrics scrape count | `COUNTER` | | <ul><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_vcpu`: <br> Hypervisor metrics specific to VCPUs' mode of functioning. | `IntGauge` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_seccomp`: <br> Hypervisor metrics for the seccomp filtering. | `IntGauge` | | <ul><li>`item`<ul><li>`num_faults`</li></ul></li><li>`sandbox_id`</li></ul> |
| `kata_hypervisor_signals`: <br> Hypervisor metrics related to signals. | `IntGauge` | | <ul><li>`item`<ul><li>`sigbus`</li><li>`sigsegv`</li></ul></li><li>`sandbox_id`</li></ul> |
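
As a usage sketch (assumed wiring; this mirrors the `VmmInstance::get_hypervisor_metrics` helper added elsewhere in this patch and assumes `handle_request` is reachable from the caller):

```rust
// Sketch only: drive the new action over the synchronous VMM API channel.
// `Request`, `VmmAction` and `VmmData` are the types this patch extends.
fn fetch_hypervisor_metrics(vmm: &VmmInstance) -> anyhow::Result<String> {
    match vmm.handle_request(Request::Sync(VmmAction::GetHypervisorMetrics)) {
        Ok(VmmData::HypervisorMetrics(text)) => Ok(text), // Prometheus text format
        Ok(_) => Err(anyhow::anyhow!("unexpected response variant")),
        Err(e) => Err(anyhow::anyhow!("failed to get hypervisor metrics: {:?}", e)),
    }
}
```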
@@ -139,12 +139,12 @@ By default the CNI plugin binaries is installed under `/opt/cni/bin` (in package
 EOF
 ```
 
-## Allow pods to run in the master node
+## Allow pods to run in the control-plane node
 
-By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
+By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
 
 ```bash
-$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
+$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
 ```
 
 ## Create runtime class for Kata Containers
@@ -115,11 +115,11 @@ $ sudo kubeadm init --ignore-preflight-errors=all --config kubeadm-config.yaml
 $ export KUBECONFIG=/etc/kubernetes/admin.conf
 ```
 
-### Allow pods to run in the master node
+### Allow pods to run in the control-plane node
 
-By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
+By default, the cluster will not schedule pods in the control-plane node. To enable control-plane node scheduling:
 ```bash
-$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
+$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/control-plane-
 ```
 
 ### Create runtime class for Kata Containers
@@ -91,7 +91,7 @@ Before you install Kata Containers, check that your Minikube is operating. On yo
 $ kubectl get nodes
 ```
 
-You should see your `master` node listed as being `Ready`.
+You should see your `control-plane` node listed as being `Ready`.
 
 Check you have virtualization enabled inside your Minikube. The following should return
 a number larger than `0` if you have either of the `vmx` or `svm` nested virtualization features
@@ -230,7 +230,7 @@ pub fn baremount(
 async fn ephemeral_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     // hugetlbfs
     if storage.fstype == FS_TYPE_HUGETLB {
@@ -278,7 +278,7 @@ async fn ephemeral_storage_handler(
 pub async fn update_ephemeral_mounts(
     logger: Logger,
     storages: Vec<Storage>,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<()> {
     for (_, storage) in storages.iter().enumerate() {
         let handler_name = storage.driver.clone();
@@ -340,8 +340,33 @@ pub async fn update_ephemeral_mounts(
 async fn overlayfs_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    _sandbox: Arc<Mutex<Sandbox>>,
+    cid: Option<&str>,
+    _sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
+    if storage
+        .options
+        .iter()
+        .any(|e| e == "io.katacontainers.fs-opt.overlay-rw")
+    {
+        let cid = cid.ok_or_else(|| anyhow!("No container id in rw overlay"))?;
+        let cpath = Path::new(crate::rpc::CONTAINER_BASE).join(cid);
+        let work = cpath.join("work");
+        let upper = cpath.join("upper");
+
+        fs::create_dir_all(&work).context("Creating overlay work directory")?;
+        fs::create_dir_all(&upper).context("Creating overlay upper directory")?;
+
+        let mut storage = storage.clone();
+        storage.fstype = "overlay".into();
+        storage
+            .options
+            .push(format!("upperdir={}", upper.to_string_lossy()));
+        storage
+            .options
+            .push(format!("workdir={}", work.to_string_lossy()));
+        return common_storage_handler(logger, &storage);
+    }
+
     common_storage_handler(logger, storage)
 }
 
@@ -349,7 +374,7 @@ async fn overlayfs_storage_handler(
 async fn local_storage_handler(
     _logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     fs::create_dir_all(&storage.mount_point).context(format!(
         "failed to create dir all {:?}",
@@ -389,7 +414,7 @@ async fn local_storage_handler(
 async fn virtio9p_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    _sandbox: Arc<Mutex<Sandbox>>,
+    _sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     common_storage_handler(logger, storage)
 }
@@ -503,11 +528,11 @@ fn get_pagesize_and_size_from_option(options: &[String]) -> Result<(u64, u64)> {
 async fn virtiommio_blk_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     let storage = storage.clone();
     if !Path::new(&storage.source).exists() {
-        get_virtio_mmio_device_name(&sandbox, &storage.source)
+        get_virtio_mmio_device_name(sandbox, &storage.source)
             .await
             .context("failed to get mmio device name")?;
     }
@@ -520,7 +545,7 @@ async fn virtiommio_blk_storage_handler(
 async fn virtiofs_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    _sandbox: Arc<Mutex<Sandbox>>,
+    _sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     common_storage_handler(logger, storage)
 }
@@ -530,7 +555,7 @@ async fn virtiofs_storage_handler(
 async fn virtio_blk_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     let mut storage = storage.clone();
     // If hot-plugged, get the device node path based on the PCI path
@@ -545,7 +570,7 @@ async fn virtio_blk_storage_handler(
         }
     } else {
         let pcipath = pci::Path::from_str(&storage.source)?;
-        let dev_path = get_virtio_blk_pci_device_name(&sandbox, &pcipath).await?;
+        let dev_path = get_virtio_blk_pci_device_name(sandbox, &pcipath).await?;
         storage.source = dev_path;
     }
 
@@ -558,11 +583,11 @@ async fn virtio_blk_storage_handler(
 async fn virtio_blk_ccw_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     let mut storage = storage.clone();
     let ccw_device = ccw::Device::from_str(&storage.source)?;
-    let dev_path = get_virtio_blk_ccw_device_name(&sandbox, &ccw_device).await?;
+    let dev_path = get_virtio_blk_ccw_device_name(sandbox, &ccw_device).await?;
     storage.source = dev_path;
     common_storage_handler(logger, &storage)
 }
@@ -572,7 +597,7 @@ async fn virtio_blk_ccw_storage_handler(
 async fn virtio_blk_ccw_storage_handler(
     _: &Logger,
     _: &Storage,
-    _: Arc<Mutex<Sandbox>>,
+    _: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     Err(anyhow!("CCW is only supported on s390x"))
 }
@@ -582,12 +607,12 @@ async fn virtio_blk_ccw_storage_handler(
 async fn virtio_scsi_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     let mut storage = storage.clone();
 
     // Retrieve the device path from SCSI address.
-    let dev_path = get_scsi_device_name(&sandbox, &storage.source).await?;
+    let dev_path = get_scsi_device_name(sandbox, &storage.source).await?;
     storage.source = dev_path;
 
     common_storage_handler(logger, &storage)
@@ -608,12 +633,12 @@ fn common_storage_handler(logger: &Logger, storage: &Storage) -> Result<String>
 async fn nvdimm_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
 ) -> Result<String> {
     let storage = storage.clone();
 
     // Retrieve the device path from NVDIMM address.
-    wait_for_pmem_device(&sandbox, &storage.source).await?;
+    wait_for_pmem_device(sandbox, &storage.source).await?;
 
     common_storage_handler(logger, &storage)
 }
@@ -621,7 +646,7 @@ async fn nvdimm_storage_handler(
 async fn bind_watcher_storage_handler(
     logger: &Logger,
     storage: &Storage,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
     cid: Option<String>,
 ) -> Result<()> {
     let mut locked = sandbox.lock().await;
@@ -813,10 +838,14 @@ fn parse_mount_flags_and_options(options_vec: Vec<&str>) -> (MsFlags, String) {
             }
         }
         None => {
+            if opt.starts_with("io.katacontainers.") {
+                continue;
+            }
+
             if !options.is_empty() {
                 options.push_str(format!(",{}", opt).as_str());
             } else {
-                options.push_str(opt.to_string().as_str());
+                options.push_str(opt);
             }
         }
     };
@@ -833,7 +862,7 @@ fn parse_mount_flags_and_options(options_vec: Vec<&str>) -> (MsFlags, String) {
 pub async fn add_storages(
     logger: Logger,
     storages: Vec<Storage>,
-    sandbox: Arc<Mutex<Sandbox>>,
+    sandbox: &Arc<Mutex<Sandbox>>,
     cid: Option<String>,
 ) -> Result<Vec<String>> {
     let mut mount_list = Vec::new();
@@ -853,31 +882,22 @@ pub async fn add_storages(
     }
 
     let res = match handler_name.as_str() {
-        DRIVER_BLK_TYPE => virtio_blk_storage_handler(&logger, &storage, sandbox.clone()).await,
-        DRIVER_BLK_CCW_TYPE => {
-            virtio_blk_ccw_storage_handler(&logger, &storage, sandbox.clone()).await
-        }
-        DRIVER_9P_TYPE => virtio9p_storage_handler(&logger, &storage, sandbox.clone()).await,
-        DRIVER_VIRTIOFS_TYPE => {
-            virtiofs_storage_handler(&logger, &storage, sandbox.clone()).await
-        }
-        DRIVER_EPHEMERAL_TYPE => {
-            ephemeral_storage_handler(&logger, &storage, sandbox.clone()).await
-        }
+        DRIVER_BLK_TYPE => virtio_blk_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_BLK_CCW_TYPE => virtio_blk_ccw_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_9P_TYPE => virtio9p_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_VIRTIOFS_TYPE => virtiofs_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_EPHEMERAL_TYPE => ephemeral_storage_handler(&logger, &storage, sandbox).await,
         DRIVER_OVERLAYFS_TYPE => {
-            overlayfs_storage_handler(&logger, &storage, sandbox.clone()).await
+            overlayfs_storage_handler(&logger, &storage, cid.as_deref(), sandbox).await
         }
         DRIVER_MMIO_BLK_TYPE => {
-            virtiommio_blk_storage_handler(&logger, &storage, sandbox.clone()).await
+            virtiommio_blk_storage_handler(&logger, &storage, sandbox).await
         }
-        DRIVER_LOCAL_TYPE => local_storage_handler(&logger, &storage, sandbox.clone()).await,
-        DRIVER_SCSI_TYPE => {
-            virtio_scsi_storage_handler(&logger, &storage, sandbox.clone()).await
-        }
-        DRIVER_NVDIMM_TYPE => nvdimm_storage_handler(&logger, &storage, sandbox.clone()).await,
+        DRIVER_LOCAL_TYPE => local_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_SCSI_TYPE => virtio_scsi_storage_handler(&logger, &storage, sandbox).await,
+        DRIVER_NVDIMM_TYPE => nvdimm_storage_handler(&logger, &storage, sandbox).await,
         DRIVER_WATCHABLE_BIND_TYPE => {
-            bind_watcher_storage_handler(&logger, &storage, sandbox.clone(), cid.clone())
-                .await?;
+            bind_watcher_storage_handler(&logger, &storage, sandbox, cid.clone()).await?;
             // Don't register watch mounts, they're handled separately by the watcher.
             Ok(String::new())
        }
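
The recurring signature change in the hunks above (taking `&Arc<Mutex<Sandbox>>` rather than an owned `Arc<Mutex<Sandbox>>`) keeps the locking behavior identical while dropping one atomic refcount update per call, and it lets `add_storages` hand the same borrow to every handler instead of calling `sandbox.clone()` at each call site. A minimal standalone sketch of the pattern (hypothetical types, not from this patch):

```rust
use std::sync::Arc;
use tokio::sync::Mutex;

// Hypothetical stand-in for the agent's sandbox state.
struct Sandbox {
    mounts: Vec<String>,
}

// Borrowing the Arc is enough to lock the shared state; a callee that
// really needs to retain ownership can still clone the Arc explicitly.
async fn record_mount(sandbox: &Arc<Mutex<Sandbox>>, mount: String) {
    sandbox.lock().await.mounts.push(mount);
}

async fn demo(sandbox: Arc<Mutex<Sandbox>>) {
    // Two calls, one Arc: no refcount traffic on either call.
    record_mount(&sandbox, "/run/a".into()).await;
    record_mount(&sandbox, "/run/b".into()).await;
}
```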
@@ -193,9 +193,6 @@ impl AgentService {
         let mut oci_spec = req.OCI.clone();
         let use_sandbox_pidns = req.sandbox_pidns();
 
-        let sandbox;
-        let mut s;
-
         let mut oci = match oci_spec.as_mut() {
             Some(spec) => rustjail::grpc_to_oci(spec),
             None => {
@@ -254,15 +251,13 @@ impl AgentService {
         let m = add_storages(
             sl(),
             req.storages.to_vec(),
-            self.sandbox.clone(),
+            &self.sandbox,
             Some(req.container_id.clone()),
         )
         .await?;
-        {
-            sandbox = self.sandbox.clone();
-            s = sandbox.lock().await;
-            s.container_mounts.insert(cid.clone(), m);
-        }
 
+        let mut s = self.sandbox.lock().await;
+        s.container_mounts.insert(cid.clone(), m);
 
         update_container_namespaces(&s, &mut oci, use_sandbox_pidns)?;
@@ -335,8 +330,7 @@ impl AgentService {
     async fn do_start_container(&self, req: protocols::agent::StartContainerRequest) -> Result<()> {
         let cid = req.container_id;
 
-        let sandbox = self.sandbox.clone();
-        let mut s = sandbox.lock().await;
+        let mut s = self.sandbox.lock().await;
         let sid = s.id.clone();
 
         let ctr = s
@@ -370,8 +364,7 @@ impl AgentService {
         let cid = req.container_id.clone();
 
         if req.timeout == 0 {
-            let s = Arc::clone(&self.sandbox);
-            let mut sandbox = s.lock().await;
+            let mut sandbox = self.sandbox.lock().await;
 
             sandbox.bind_watcher.remove_container(&cid).await;
 
@@ -411,8 +404,7 @@ impl AgentService {
             return Err(anyhow!(nix::Error::UnknownErrno));
         }
 
-        let s = self.sandbox.clone();
-        let mut sandbox = s.lock().await;
+        let mut sandbox = self.sandbox.lock().await;
         remove_container_resources(&mut sandbox, &cid)?;
 
         Ok(())
|
||||
|
||||
info!(sl(), "do_exec_process cid: {} eid: {}", cid, exec_id);
|
||||
|
||||
let s = self.sandbox.clone();
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let mut process = req
|
||||
.process
|
||||
@ -453,7 +444,6 @@ impl AgentService {
|
||||
async fn do_signal_process(&self, req: protocols::agent::SignalProcessRequest) -> Result<()> {
|
||||
let cid = req.container_id.clone();
|
||||
let eid = req.exec_id.clone();
|
||||
let s = self.sandbox.clone();
|
||||
|
||||
info!(
|
||||
sl(),
|
||||
@ -465,7 +455,7 @@ impl AgentService {
|
||||
|
||||
let mut sig: libc::c_int = req.signal as libc::c_int;
|
||||
{
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
|
||||
// For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
|
||||
// it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
|
||||
@ -538,8 +528,7 @@ impl AgentService {
|
||||
}
|
||||
|
||||
async fn freeze_cgroup(&self, cid: &str, state: FreezerState) -> Result<()> {
|
||||
let s = self.sandbox.clone();
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let ctr = sandbox
|
||||
.get_container(cid)
|
||||
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
|
||||
@ -548,8 +537,7 @@ impl AgentService {
|
||||
}
|
||||
|
||||
async fn get_pids(&self, cid: &str) -> Result<Vec<i32>> {
|
||||
let s = self.sandbox.clone();
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let ctr = sandbox
|
||||
.get_container(cid)
|
||||
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
|
||||
@ -564,7 +552,6 @@ impl AgentService {
|
||||
) -> Result<protocols::agent::WaitProcessResponse> {
|
||||
let cid = req.container_id.clone();
|
||||
let eid = req.exec_id;
|
||||
let s = self.sandbox.clone();
|
||||
let mut resp = WaitProcessResponse::new();
|
||||
let pid: pid_t;
|
||||
|
||||
@ -578,7 +565,7 @@ impl AgentService {
|
||||
);
|
||||
|
||||
let exit_rx = {
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
|
||||
|
||||
p.exit_watchers.push(exit_send);
|
||||
@ -593,7 +580,7 @@ impl AgentService {
|
||||
info!(sl(), "cid {} eid {} received exit signal", &cid, &eid);
|
||||
}
|
||||
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let ctr = sandbox
|
||||
.get_container(&cid)
|
||||
.ok_or_else(|| anyhow!("Invalid container id"))?;
|
||||
@ -635,8 +622,7 @@ impl AgentService {
|
||||
let eid = req.exec_id.clone();
|
||||
|
||||
let writer = {
|
||||
let s = self.sandbox.clone();
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
|
||||
|
||||
// use ptmx io
|
||||
@ -667,9 +653,7 @@ impl AgentService {
|
||||
|
||||
let term_exit_notifier;
|
||||
let reader = {
|
||||
let s = self.sandbox.clone();
|
||||
let mut sandbox = s.lock().await;
|
||||
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
|
||||
|
||||
term_exit_notifier = p.term_exit_notifier.clone();
|
||||
@ -700,7 +684,7 @@ impl AgentService {
|
||||
// Poll::Ready so that the term_exit_notifier will never polled
|
||||
// before all data were read.
|
||||
biased;
|
||||
v = read_stream(reader, req.len as usize) => {
|
||||
v = read_stream(&reader, req.len as usize) => {
|
||||
let vector = v?;
|
||||
let mut resp = ReadStreamResponse::new();
|
||||
resp.set_data(vector);
|
||||
@ -852,8 +836,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
let cid = req.container_id.clone();
|
||||
let res = req.resources;
|
||||
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let ctr = sandbox.get_container(&cid).ok_or_else(|| {
|
||||
ttrpc_error(
|
||||
@ -886,8 +869,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
trace_rpc_call!(ctx, "stats_container", req);
|
||||
is_allowed(&req)?;
|
||||
let cid = req.container_id;
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let ctr = sandbox.get_container(&cid).ok_or_else(|| {
|
||||
ttrpc_error(
|
||||
@ -908,8 +890,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
trace_rpc_call!(ctx, "pause_container", req);
|
||||
is_allowed(&req)?;
|
||||
let cid = req.container_id();
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let ctr = sandbox.get_container(cid).ok_or_else(|| {
|
||||
ttrpc_error(
|
||||
@ -932,8 +913,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
trace_rpc_call!(ctx, "resume_container", req);
|
||||
is_allowed(&req)?;
|
||||
let cid = req.container_id();
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let ctr = sandbox.get_container(cid).ok_or_else(|| {
|
||||
ttrpc_error(
|
||||
@ -1014,8 +994,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
|
||||
let cid = req.container_id.clone();
|
||||
let eid = req.exec_id;
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
|
||||
let p = sandbox
|
||||
.find_container_process(cid.as_str(), eid.as_str())
|
||||
@ -1041,8 +1020,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
|
||||
let cid = req.container_id.clone();
|
||||
let eid = req.exec_id.clone();
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
let p = sandbox
|
||||
.find_container_process(cid.as_str(), eid.as_str())
|
||||
.map_err(|e| {
|
||||
@ -1146,7 +1124,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
trace_rpc_call!(ctx, "update_mounts", req);
|
||||
is_allowed(&req)?;
|
||||
|
||||
match update_ephemeral_mounts(sl(), req.storages.to_vec(), self.sandbox.clone()).await {
|
||||
match update_ephemeral_mounts(sl(), req.storages.to_vec(), &self.sandbox).await {
|
||||
Ok(_) => Ok(Empty::new()),
|
||||
Err(e) => Err(ttrpc_error(
|
||||
ttrpc::Code::INTERNAL,
|
||||
@ -1369,8 +1347,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
is_allowed(&req)?;
|
||||
|
||||
{
|
||||
let sandbox = self.sandbox.clone();
|
||||
let mut s = sandbox.lock().await;
|
||||
let mut s = self.sandbox.lock().await;
|
||||
|
||||
let _ = fs::remove_dir_all(CONTAINER_BASE);
|
||||
let _ = fs::create_dir_all(CONTAINER_BASE);
|
||||
@ -1400,19 +1377,16 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
.map_err(|e| ttrpc_error(ttrpc::Code::INTERNAL, e))?;
|
||||
}
|
||||
|
||||
match add_storages(sl(), req.storages.to_vec(), self.sandbox.clone(), None).await {
|
||||
match add_storages(sl(), req.storages.to_vec(), &self.sandbox, None).await {
|
||||
Ok(m) => {
|
||||
let sandbox = self.sandbox.clone();
|
||||
let mut s = sandbox.lock().await;
|
||||
s.mounts = m
|
||||
self.sandbox.lock().await.mounts = m;
|
||||
}
|
||||
Err(e) => return Err(ttrpc_error(ttrpc::Code::INTERNAL, e)),
|
||||
};
|
||||
|
||||
match setup_guest_dns(sl(), req.dns.to_vec()) {
|
||||
Ok(_) => {
|
||||
let sandbox = self.sandbox.clone();
|
||||
let mut s = sandbox.lock().await;
|
||||
let mut s = self.sandbox.lock().await;
|
||||
let _dns = req
|
||||
.dns
|
||||
.to_vec()
|
||||
@ -1433,8 +1407,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
trace_rpc_call!(ctx, "destroy_sandbox", req);
|
||||
is_allowed(&req)?;
|
||||
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let mut sandbox = s.lock().await;
|
||||
let mut sandbox = self.sandbox.lock().await;
|
||||
// destroy all containers, clean up, notify agent to exit
|
||||
// etc.
|
||||
sandbox
|
||||
@ -1501,8 +1474,7 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
req: protocols::agent::OnlineCPUMemRequest,
|
||||
) -> ttrpc::Result<Empty> {
|
||||
is_allowed(&req)?;
|
||||
let s = Arc::clone(&self.sandbox);
|
||||
let sandbox = s.lock().await;
|
||||
let sandbox = self.sandbox.lock().await;
|
||||
trace_rpc_call!(ctx, "online_cpu_mem", req);
|
||||
|
||||
sandbox
|
||||
@ -1625,12 +1597,10 @@ impl agent_ttrpc::AgentService for AgentService {
|
||||
req: protocols::agent::GetOOMEventRequest,
|
||||
) -> ttrpc::Result<OOMEvent> {
|
||||
is_allowed(&req)?;
|
||||
let sandbox = self.sandbox.clone();
|
||||
let s = sandbox.lock().await;
|
||||
let s = self.sandbox.lock().await;
|
||||
let event_rx = &s.event_rx.clone();
|
||||
let mut event_rx = event_rx.lock().await;
|
||||
drop(s);
|
||||
drop(sandbox);
|
||||
|
||||
if let Some(container_id) = event_rx.recv().await {
|
||||
info!(sl(), "get_oom_event return {}", &container_id);
|
||||
@ -1827,7 +1797,7 @@ fn get_agent_details() -> AgentDetails {
|
||||
detail
|
||||
}
|
||||
|
||||
async fn read_stream(reader: Arc<Mutex<ReadHalf<PipeStream>>>, l: usize) -> Result<Vec<u8>> {
|
||||
async fn read_stream(reader: &Mutex<ReadHalf<PipeStream>>, l: usize) -> Result<Vec<u8>> {
|
||||
let mut content = vec![0u8; l];
|
||||
|
||||
let mut reader = reader.lock().await;
|
||||
|

1705  src/dragonball/Cargo.lock (generated)
File diff suppressed because it is too large.
@@ -10,6 +10,7 @@ license = "Apache-2.0"
 edition = "2018"
 
 [dependencies]
 anyhow = "1.0.32"
 arc-swap = "1.5.0"
 bytes = "1.1.0"
 dbs-address-space = { path = "./src/dbs_address_space" }
@@ -29,6 +30,8 @@ libc = "0.2.39"
 linux-loader = "0.6.0"
 log = "0.4.14"
 nix = "0.24.2"
+procfs = "0.12.0"
+prometheus = { version = "0.13.0", features = ["process"] }
 seccompiler = "0.2.0"
 serde = "1.0.27"
 serde_derive = "1.0.27"
@@ -40,10 +43,11 @@ vmm-sys-util = "0.11.0"
 virtio-queue = { version = "0.6.0", optional = true }
 vm-memory = { version = "0.9.0", features = ["backend-mmap"] }
 crossbeam-channel = "0.5.6"
+fuse-backend-rs = "=0.10.4"
 
 [dev-dependencies]
-slog-term = "2.9.0"
 slog-async = "2.7.0"
+slog-term = "2.9.0"
 test-utils = { path = "../libs/test-utils" }
 
 [features]
@@ -16,6 +16,8 @@ use crate::event_manager::EventManager;
 use crate::vm::{CpuTopology, KernelConfigInfo, VmConfigInfo};
 use crate::vmm::Vmm;
 
+use crate::hypervisor_metrics::get_hypervisor_metrics;
+
 use self::VmConfigError::*;
 use self::VmmActionError::MachineConfig;
 
@@ -58,6 +60,11 @@ pub enum VmmActionError {
     #[error("Upcall not ready, can't hotplug device.")]
     UpcallServerNotReady,
 
+    /// Error when getting prometheus metrics.
+    /// Currently does not distinguish between error types for metrics.
+    #[error("failed to get hypervisor metrics")]
+    GetHypervisorMetrics,
+
     /// The action `ConfigureBootSource` failed either because of bad user input or an internal
     /// error.
     #[error("failed to configure boot source for VM: {0}")]
@@ -135,6 +142,9 @@ pub enum VmmAction {
     /// Get the configuration of the microVM.
     GetVmConfiguration,
 
+    /// Get Prometheus Metrics.
+    GetHypervisorMetrics,
+
     /// Set the microVM configuration (memory & vcpu) using `VmConfig` as input. This
     /// action can only be called before the microVM has booted.
     SetVmConfiguration(VmConfigInfo),
@@ -208,6 +218,8 @@ pub enum VmmData {
     Empty,
     /// The microVM configuration represented by `VmConfigInfo`.
     MachineConfiguration(Box<VmConfigInfo>),
+    /// Prometheus Metrics represented by String.
+    HypervisorMetrics(String),
 }
 
 /// Request data type used to communicate between the API and the VMM.
@@ -262,6 +274,7 @@ impl VmmService {
             VmmAction::GetVmConfiguration => Ok(VmmData::MachineConfiguration(Box::new(
                 self.machine_config.clone(),
             ))),
+            VmmAction::GetHypervisorMetrics => self.get_hypervisor_metrics(),
             VmmAction::SetVmConfiguration(machine_config) => {
                 self.set_vm_configuration(vmm, machine_config)
             }
@@ -381,6 +394,13 @@ impl VmmService {
         Ok(VmmData::Empty)
     }
 
+    /// Get prometheus metrics.
+    fn get_hypervisor_metrics(&self) -> VmmRequestResult {
+        get_hypervisor_metrics()
+            .map_err(|_| VmmActionError::GetHypervisorMetrics)
+            .map(VmmData::HypervisorMetrics)
+    }
+
     /// Set virtual machine configuration.
     pub fn set_vm_configuration(
         &mut self,

102  src/dragonball/src/hypervisor_metrics.rs (new file)
@@ -0,0 +1,102 @@
// Copyright 2021-2022 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//

extern crate procfs;

use crate::metric::{IncMetric, METRICS};
use anyhow::{anyhow, Result};
use prometheus::{Encoder, IntCounter, IntGaugeVec, Opts, Registry, TextEncoder};
use std::sync::Mutex;

const NAMESPACE_KATA_HYPERVISOR: &str = "kata_hypervisor";

lazy_static! {
    static ref REGISTERED: Mutex<bool> = Mutex::new(false);

    // custom registry
    static ref REGISTRY: Registry = Registry::new();

    // hypervisor metrics
    static ref HYPERVISOR_SCRAPE_COUNT: IntCounter =
        IntCounter::new(format!("{}_{}", NAMESPACE_KATA_HYPERVISOR, "scrape_count"), "Hypervisor metrics scrape count.").unwrap();

    static ref HYPERVISOR_VCPU: IntGaugeVec =
        IntGaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_HYPERVISOR, "vcpu"), "Hypervisor metrics specific to VCPUs' mode of functioning."), &["item"]).unwrap();

    static ref HYPERVISOR_SECCOMP: IntGaugeVec =
        IntGaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_HYPERVISOR, "seccomp"), "Hypervisor metrics for the seccomp filtering."), &["item"]).unwrap();

    static ref HYPERVISOR_SIGNALS: IntGaugeVec =
        IntGaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_HYPERVISOR, "signals"), "Hypervisor metrics related to signals."), &["item"]).unwrap();
}

/// get prometheus metrics
pub fn get_hypervisor_metrics() -> Result<String> {
    let mut registered = REGISTERED
        .lock()
        .map_err(|e| anyhow!("failed to check hypervisor metrics register status {:?}", e))?;

    if !(*registered) {
        register_hypervisor_metrics()?;
        *registered = true;
    }

    update_hypervisor_metrics()?;

    // gather all metrics and return as a String
    let metric_families = REGISTRY.gather();

    let mut buffer = Vec::new();
    let encoder = TextEncoder::new();
    encoder.encode(&metric_families, &mut buffer)?;

    Ok(String::from_utf8(buffer)?)
}

fn register_hypervisor_metrics() -> Result<()> {
    REGISTRY.register(Box::new(HYPERVISOR_SCRAPE_COUNT.clone()))?;
    REGISTRY.register(Box::new(HYPERVISOR_VCPU.clone()))?;
    REGISTRY.register(Box::new(HYPERVISOR_SECCOMP.clone()))?;
    REGISTRY.register(Box::new(HYPERVISOR_SIGNALS.clone()))?;

    Ok(())
}

fn update_hypervisor_metrics() -> Result<()> {
    HYPERVISOR_SCRAPE_COUNT.inc();

    set_intgauge_vec_vcpu(&HYPERVISOR_VCPU);
    set_intgauge_vec_seccomp(&HYPERVISOR_SECCOMP);
    set_intgauge_vec_signals(&HYPERVISOR_SIGNALS);

    Ok(())
}

fn set_intgauge_vec_vcpu(icv: &prometheus::IntGaugeVec) {
    icv.with_label_values(&["exit_io_in"])
        .set(METRICS.vcpu.exit_io_in.count() as i64);
    icv.with_label_values(&["exit_io_out"])
        .set(METRICS.vcpu.exit_io_out.count() as i64);
    icv.with_label_values(&["exit_mmio_read"])
        .set(METRICS.vcpu.exit_mmio_read.count() as i64);
    icv.with_label_values(&["exit_mmio_write"])
        .set(METRICS.vcpu.exit_mmio_write.count() as i64);
    icv.with_label_values(&["failures"])
        .set(METRICS.vcpu.failures.count() as i64);
    icv.with_label_values(&["filter_cpuid"])
        .set(METRICS.vcpu.filter_cpuid.count() as i64);
}

fn set_intgauge_vec_seccomp(icv: &prometheus::IntGaugeVec) {
    icv.with_label_values(&["num_faults"])
        .set(METRICS.seccomp.num_faults.count() as i64);
}

fn set_intgauge_vec_signals(icv: &prometheus::IntGaugeVec) {
    icv.with_label_values(&["sigbus"])
        .set(METRICS.signals.sigbus.count() as i64);
    icv.with_label_values(&["sigsegv"])
        .set(METRICS.signals.sigsegv.count() as i64);
}
@@ -9,6 +9,9 @@
 //TODO: Remove this, after the rest of dragonball has been committed.
 #![allow(dead_code)]
 
+#[macro_use]
+extern crate lazy_static;
+
 /// Address space manager for virtual machines.
 pub mod address_space_manager;
 /// API to handle vmm requests.
@@ -19,6 +22,8 @@ pub mod config_manager;
 pub mod device_manager;
 /// Errors related to Virtual machine manager.
 pub mod error;
+/// Prometheus Metrics.
+pub mod hypervisor_metrics;
 /// KVM operation context for virtual machines.
 pub mod kvm_context;
 /// Metrics system.

590  src/runtime-rs/Cargo.lock (generated)
File diff suppressed because it is too large.
@@ -121,5 +121,6 @@ impl_agent!(
     set_ip_tables | crate::SetIPTablesRequest | crate::SetIPTablesResponse | None,
     get_volume_stats | crate::VolumeStatsRequest | crate::VolumeStatsResponse | None,
     resize_volume | crate::ResizeVolumeRequest | crate::Empty | None,
-    online_cpu_mem | crate::OnlineCPUMemRequest | crate::Empty | None
+    online_cpu_mem | crate::OnlineCPUMemRequest | crate::Empty | None,
+    get_metrics | crate::Empty | crate::MetricsResponse | None
 );
@@ -7,7 +7,7 @@
 use std::convert::Into;
 
 use protocols::{
-    agent::{self, OOMEvent},
+    agent::{self, Metrics, OOMEvent},
     csi, empty, health, types,
 };
 
@@ -19,13 +19,13 @@ use crate::{
         Empty, ExecProcessRequest, FSGroup, FSGroupChangePolicy, GetIPTablesRequest,
         GetIPTablesResponse, GuestDetailsResponse, HealthCheckResponse, HugetlbStats, IPAddress,
         IPFamily, Interface, Interfaces, KernelModule, MemHotplugByProbeRequest, MemoryData,
-        MemoryStats, NetworkStats, OnlineCPUMemRequest, PidsStats, ReadStreamRequest,
-        ReadStreamResponse, RemoveContainerRequest, ReseedRandomDevRequest, ResizeVolumeRequest,
-        Route, Routes, SetGuestDateTimeRequest, SetIPTablesRequest, SetIPTablesResponse,
-        SignalProcessRequest, StatsContainerResponse, Storage, StringUser, ThrottlingData,
-        TtyWinResizeRequest, UpdateContainerRequest, UpdateInterfaceRequest, UpdateRoutesRequest,
-        VersionCheckResponse, VolumeStatsRequest, VolumeStatsResponse, WaitProcessRequest,
-        WriteStreamRequest,
+        MemoryStats, MetricsResponse, NetworkStats, OnlineCPUMemRequest, PidsStats,
+        ReadStreamRequest, ReadStreamResponse, RemoveContainerRequest, ReseedRandomDevRequest,
+        ResizeVolumeRequest, Route, Routes, SetGuestDateTimeRequest, SetIPTablesRequest,
+        SetIPTablesResponse, SignalProcessRequest, StatsContainerResponse, Storage, StringUser,
+        ThrottlingData, TtyWinResizeRequest, UpdateContainerRequest, UpdateInterfaceRequest,
+        UpdateRoutesRequest, VersionCheckResponse, VolumeStatsRequest, VolumeStatsResponse,
+        WaitProcessRequest, WriteStreamRequest,
     },
     OomEventResponse, WaitProcessResponse, WriteStreamResponse,
 };
@@ -755,6 +755,14 @@ impl From<agent::WaitProcessResponse> for WaitProcessResponse {
     }
 }
 
+impl From<Empty> for agent::GetMetricsRequest {
+    fn from(_: Empty) -> Self {
+        Self {
+            ..Default::default()
+        }
+    }
+}
+
 impl From<Empty> for agent::GetOOMEventRequest {
     fn from(_: Empty) -> Self {
         Self {
@@ -789,6 +797,14 @@ impl From<health::VersionCheckResponse> for VersionCheckResponse {
     }
 }
 
+impl From<agent::Metrics> for MetricsResponse {
+    fn from(from: Metrics) -> Self {
+        Self {
+            metrics: from.metrics,
+        }
+    }
+}
+
 impl From<agent::OOMEvent> for OomEventResponse {
     fn from(from: OOMEvent) -> Self {
         Self {
@@ -18,13 +18,14 @@ pub use types::{
     CloseStdinRequest, ContainerID, ContainerProcessID, CopyFileRequest, CreateContainerRequest,
     CreateSandboxRequest, Empty, ExecProcessRequest, GetGuestDetailsRequest, GetIPTablesRequest,
     GetIPTablesResponse, GuestDetailsResponse, HealthCheckResponse, IPAddress, IPFamily, Interface,
-    Interfaces, ListProcessesRequest, MemHotplugByProbeRequest, OnlineCPUMemRequest,
-    OomEventResponse, ReadStreamRequest, ReadStreamResponse, RemoveContainerRequest,
-    ReseedRandomDevRequest, ResizeVolumeRequest, Route, Routes, SetGuestDateTimeRequest,
-    SetIPTablesRequest, SetIPTablesResponse, SignalProcessRequest, StatsContainerResponse, Storage,
-    TtyWinResizeRequest, UpdateContainerRequest, UpdateInterfaceRequest, UpdateRoutesRequest,
-    VersionCheckResponse, VolumeStatsRequest, VolumeStatsResponse, WaitProcessRequest,
-    WaitProcessResponse, WriteStreamRequest, WriteStreamResponse,
+    Interfaces, ListProcessesRequest, MemHotplugByProbeRequest, MetricsResponse,
+    OnlineCPUMemRequest, OomEventResponse, ReadStreamRequest, ReadStreamResponse,
+    RemoveContainerRequest, ReseedRandomDevRequest, ResizeVolumeRequest, Route, Routes,
+    SetGuestDateTimeRequest, SetIPTablesRequest, SetIPTablesResponse, SignalProcessRequest,
+    StatsContainerResponse, Storage, TtyWinResizeRequest, UpdateContainerRequest,
+    UpdateInterfaceRequest, UpdateRoutesRequest, VersionCheckResponse, VolumeStatsRequest,
+    VolumeStatsResponse, WaitProcessRequest, WaitProcessResponse, WriteStreamRequest,
+    WriteStreamResponse,
 };
 
 use anyhow::Result;
@@ -86,6 +87,7 @@ pub trait Agent: AgentManager + HealthService + Send + Sync {
 
     // utils
     async fn copy_file(&self, req: CopyFileRequest) -> Result<Empty>;
+    async fn get_metrics(&self, req: Empty) -> Result<MetricsResponse>;
     async fn get_oom_event(&self, req: Empty) -> Result<OomEventResponse>;
     async fn get_ip_tables(&self, req: GetIPTablesRequest) -> Result<GetIPTablesResponse>;
     async fn set_ip_tables(&self, req: SetIPTablesRequest) -> Result<SetIPTablesResponse>;
@@ -556,6 +556,11 @@ pub struct VersionCheckResponse {
     pub agent_version: String,
 }
 
+#[derive(PartialEq, Clone, Default, Debug)]
+pub struct MetricsResponse {
+    pub metrics: String,
+}
+
 #[derive(PartialEq, Clone, Default, Debug)]
 pub struct OomEventResponse {
     pub container_id: String,
@@ -2,7 +2,7 @@
 //
 // SPDX-License-Identifier: Apache-2.0
 
-use crate::{DeviceConfig, FsConfig, VmConfig};
+use crate::{DeviceConfig, DiskConfig, FsConfig, VmConfig};
 use anyhow::{anyhow, Result};
 use api_client::simple_api_full_command_and_response;
 
@@ -69,6 +69,24 @@ pub async fn cloud_hypervisor_vm_stop(mut socket: UnixStream) -> Result<Option<S
     .await?
 }
 
+pub async fn cloud_hypervisor_vm_blockdev_add(
+    mut socket: UnixStream,
+    blk_config: DiskConfig,
+) -> Result<Option<String>> {
+    task::spawn_blocking(move || -> Result<Option<String>> {
+        let response = simple_api_full_command_and_response(
+            &mut socket,
+            "PUT",
+            "vm.add-disk",
+            Some(&serde_json::to_string(&blk_config)?),
+        )
+        .map_err(|e| anyhow!(e))?;
+
+        Ok(response)
+    })
+    .await?
+}
+
 #[allow(dead_code)]
 pub async fn cloud_hypervisor_vm_device_add(mut socket: UnixStream) -> Result<Option<String>> {
     let device_config = DeviceConfig::default();
@@ -6,17 +6,21 @@
 
 use super::inner::CloudHypervisorInner;
 use crate::device::DeviceType;
+use crate::BlockConfig;
 use crate::HybridVsockConfig;
 use crate::ShareFsDeviceConfig;
 use crate::VmmState;
 use anyhow::{anyhow, Context, Result};
-use ch_config::ch_api::cloud_hypervisor_vm_fs_add;
+use ch_config::ch_api::{cloud_hypervisor_vm_blockdev_add, cloud_hypervisor_vm_fs_add};
+use ch_config::DiskConfig;
 use ch_config::FsConfig;
 use safe_path::scoped_join;
 use std::convert::TryFrom;
 use std::path::PathBuf;
 
 const VIRTIO_FS: &str = "virtio-fs";
+const DEFAULT_DISK_QUEUES: usize = 1;
+const DEFAULT_DISK_QUEUE_SIZE: u16 = 1024;
 
 impl CloudHypervisorInner {
     pub(crate) async fn add_device(&mut self, device: DeviceType) -> Result<()> {
@@ -43,6 +47,7 @@ impl CloudHypervisorInner {
         match device {
             DeviceType::ShareFs(sharefs) => self.handle_share_fs_device(sharefs.config).await,
             DeviceType::HybridVsock(hvsock) => self.handle_hvsock_device(&hvsock.config).await,
+            DeviceType::Block(block) => self.handle_block_device(block.config).await,
             _ => Err(anyhow!("unhandled device: {:?}", device)),
         }
     }
@@ -125,6 +130,37 @@ impl CloudHypervisorInner {
         Ok(())
     }
 
+    async fn handle_block_device(&mut self, cfg: BlockConfig) -> Result<()> {
+        let socket = self
+            .api_socket
+            .as_ref()
+            .ok_or("missing socket")
+            .map_err(|e| anyhow!(e))?;
+
+        let num_queues: usize = DEFAULT_DISK_QUEUES;
+        let queue_size: u16 = DEFAULT_DISK_QUEUE_SIZE;
+
+        let block_config = DiskConfig {
+            path: Some(cfg.path_on_host.as_str().into()),
+            readonly: cfg.is_readonly,
+            num_queues,
+            queue_size,
+            ..Default::default()
+        };
+
+        let response = cloud_hypervisor_vm_blockdev_add(
+            socket.try_clone().context("failed to clone socket")?,
+            block_config,
+        )
+        .await?;
+
+        if let Some(detail) = response {
+            debug!(sl!(), "blockdev add response: {:?}", detail);
+        }
+
+        Ok(())
+    }
+
     pub(crate) async fn get_shared_fs_devices(&mut self) -> Result<Option<Vec<FsConfig>>> {
         let pending_root_devices = self.pending_devices.take();
 
@@ -173,13 +209,13 @@ impl TryFrom<ShareFsSettings> for FsConfig {
         let num_queues: usize = if cfg.queue_num > 0 {
             cfg.queue_num as usize
         } else {
-            1
+            DEFAULT_DISK_QUEUES
        };
 
         let queue_size: u16 = if cfg.queue_num > 0 {
             u16::try_from(cfg.queue_size)?
         } else {
-            1024
+            DEFAULT_DISK_QUEUE_SIZE
         };
 
         let socket_path = if cfg.sock_path.starts_with('/') {
@@ -536,6 +536,10 @@ impl CloudHypervisorInner {
         caps.set(CapabilityBits::FsSharingSupport);
         Ok(caps)
     }
+
+    pub(crate) async fn get_hypervisor_metrics(&self) -> Result<String> {
+        todo!()
+    }
 }
 
 // Log all output from the CH process until a shutdown signal is received.
@@ -152,6 +152,11 @@ impl Hypervisor for CloudHypervisor {
         let inner = self.inner.read().await;
         inner.capabilities().await
     }
+
+    async fn get_hypervisor_metrics(&self) -> Result<String> {
+        let inner = self.inner.read().await;
+        inner.get_hypervisor_metrics().await
+    }
 }
 
 #[async_trait]
@@ -92,6 +92,11 @@ impl DragonballInner {
         ))
     }
 
+    pub(crate) async fn get_hypervisor_metrics(&self) -> Result<String> {
+        info!(sl!(), "get hypervisor metrics");
+        self.vmm_instance.get_hypervisor_metrics()
+    }
+
     pub(crate) async fn disconnect(&mut self) {
         self.state = VmmState::NotReady;
     }
@@ -160,6 +160,11 @@ impl Hypervisor for Dragonball {
         let inner = self.inner.read().await;
         inner.capabilities().await
     }
+
+    async fn get_hypervisor_metrics(&self) -> Result<String> {
+        let inner = self.inner.read().await;
+        inner.get_hypervisor_metrics().await
+    }
 }
 
 #[async_trait]
@@ -267,6 +267,15 @@ impl VmmInstance {
         std::process::id()
     }
 
+    pub fn get_hypervisor_metrics(&self) -> Result<String> {
+        if let Ok(VmmData::HypervisorMetrics(metrics)) =
+            self.handle_request(Request::Sync(VmmAction::GetHypervisorMetrics))
+        {
+            return Ok(metrics);
+        }
+        Err(anyhow!("Failed to get hypervisor metrics"))
+    }
+
     pub fn stop(&mut self) -> Result<()> {
         self.handle_request(Request::Sync(VmmAction::ShutdownMicroVm))
             .map_err(|e| {
@@ -97,4 +97,5 @@ pub trait Hypervisor: std::fmt::Debug + Send + Sync {
     async fn get_jailer_root(&self) -> Result<String>;
     async fn save_state(&self) -> Result<HypervisorState>;
     async fn capabilities(&self) -> Result<Capabilities>;
+    async fn get_hypervisor_metrics(&self) -> Result<String>;
 }
@@ -136,6 +136,10 @@ impl QemuInner {
         info!(sl!(), "QemuInner::hypervisor_config()");
         self.config.clone()
     }
+
+    pub(crate) async fn get_hypervisor_metrics(&self) -> Result<String> {
+        todo!()
+    }
 }
 
 use crate::device::DeviceType;
@@ -147,4 +147,9 @@ impl Hypervisor for Qemu {
         let inner = self.inner.read().await;
         inner.capabilities().await
     }
+
+    async fn get_hypervisor_metrics(&self) -> Result<String> {
+        let inner = self.inner.read().await;
+        inner.get_hypervisor_metrics().await
+    }
 }
@@ -22,6 +22,8 @@ hyperlocal = "0.8"
 serde_json = "1.0.88"
 nix = "0.25.0"
 url = "2.3.1"
+procfs = "0.12.0"
+prometheus = { version = "0.13.0", features = ["process"] }
 
 agent = { path = "../agent" }
 common = { path = "./common" }
@@ -41,4 +41,8 @@ pub trait Sandbox: Send + Sync {
     async fn direct_volume_stats(&self, volume_path: &str) -> Result<String>;
     async fn direct_volume_resize(&self, resize_req: agent::ResizeVolumeRequest) -> Result<()>;
     async fn agent_sock(&self) -> Result<String>;
+
+    // metrics function
+    async fn agent_metrics(&self) -> Result<String>;
+    async fn hypervisor_metrics(&self) -> Result<String>;
 }
@@ -4,6 +4,9 @@
 // SPDX-License-Identifier: Apache-2.0
 //
 
+#[macro_use(lazy_static)]
+extern crate lazy_static;
+
 #[macro_use]
 extern crate slog;
 
@@ -12,5 +15,6 @@ logging::logger_with_subsystem!(sl, "runtimes");
 pub mod manager;
 pub use manager::RuntimeHandlerManager;
 pub use shim_interface;
+mod shim_metrics;
 mod shim_mgmt;
 pub mod tracer;

235  src/runtime-rs/crates/runtimes/src/shim_metrics.rs (new file)
@@ -0,0 +1,235 @@
// Copyright 2021-2022 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//

extern crate procfs;

use anyhow::{anyhow, Result};
use prometheus::{Encoder, Gauge, GaugeVec, Opts, Registry, TextEncoder};
use slog::warn;
use std::sync::Mutex;

const NAMESPACE_KATA_SHIM: &str = "kata_shim";

// Convenience macro to obtain the scope logger
macro_rules! sl {
    () => {
        slog_scope::logger().new(o!("subsystem" => "metrics"))
    };
}

lazy_static! {
    static ref REGISTERED: Mutex<bool> = Mutex::new(false);

    // custom registry
    static ref REGISTRY: Registry = Registry::new();

    // shim metrics
    static ref SHIM_THREADS: Gauge = Gauge::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "threads"), "Kata containerd shim v2 process threads.").unwrap();

    static ref SHIM_PROC_STATUS: GaugeVec =
        GaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "proc_status"), "Kata containerd shim v2 process status."), &["item"]).unwrap();

    static ref SHIM_PROC_STAT: GaugeVec = GaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "proc_stat"), "Kata containerd shim v2 process statistics."), &["item"]).unwrap();

    static ref SHIM_NETDEV: GaugeVec = GaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "netdev"), "Kata containerd shim v2 network devices statistics."), &["interface", "item"]).unwrap();

    static ref SHIM_IO_STAT: GaugeVec = GaugeVec::new(Opts::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "io_stat"), "Kata containerd shim v2 process IO statistics."), &["item"]).unwrap();

    static ref SHIM_OPEN_FDS: Gauge = Gauge::new(format!("{}_{}", NAMESPACE_KATA_SHIM, "fds"), "Kata containerd shim v2 open FDs.").unwrap();
}

pub fn get_shim_metrics() -> Result<String> {
    let mut registered = REGISTERED
        .lock()
        .map_err(|e| anyhow!("failed to check shim metrics register status {:?}", e))?;

    if !(*registered) {
        register_shim_metrics()?;
        *registered = true;
    }

    update_shim_metrics()?;

    // gather all metrics and return as a String
    let metric_families = REGISTRY.gather();

    let mut buffer = Vec::new();
    let encoder = TextEncoder::new();
    encoder.encode(&metric_families, &mut buffer)?;

    Ok(String::from_utf8(buffer)?)
}

fn register_shim_metrics() -> Result<()> {
    REGISTRY.register(Box::new(SHIM_THREADS.clone()))?;
    REGISTRY.register(Box::new(SHIM_PROC_STATUS.clone()))?;
    REGISTRY.register(Box::new(SHIM_PROC_STAT.clone()))?;
    REGISTRY.register(Box::new(SHIM_NETDEV.clone()))?;
    REGISTRY.register(Box::new(SHIM_IO_STAT.clone()))?;
    REGISTRY.register(Box::new(SHIM_OPEN_FDS.clone()))?;

    // TODO:
    // REGISTRY.register(Box::new(RPC_DURATIONS_HISTOGRAM.clone()))?;
    // REGISTRY.register(Box::new(SHIM_POD_OVERHEAD_CPU.clone()))?;
    // REGISTRY.register(Box::new(SHIM_POD_OVERHEAD_MEMORY.clone()))?;

    Ok(())
}

fn update_shim_metrics() -> Result<()> {
    let me = procfs::process::Process::myself();

    let me = match me {
        Ok(p) => p,
        Err(e) => {
            warn!(sl!(), "failed to create process instance: {:?}", e);
            return Ok(());
        }
    };

    SHIM_THREADS.set(me.stat.num_threads as f64);

    match me.status() {
        Err(err) => error!(sl!(), "failed to get process status: {:?}", err),
        Ok(status) => set_gauge_vec_proc_status(&SHIM_PROC_STATUS, &status),
    }

    match me.stat() {
        Err(err) => {
            error!(sl!(), "failed to get process stat: {:?}", err);
        }
        Ok(stat) => {
            set_gauge_vec_proc_stat(&SHIM_PROC_STAT, &stat);
        }
    }

    match procfs::net::dev_status() {
        Err(err) => {
            error!(sl!(), "failed to get host net::dev_status: {:?}", err);
        }
        Ok(devs) => {
            for (_, status) in devs {
                set_gauge_vec_netdev(&SHIM_NETDEV, &status);
}
|
||||
}
|
||||
}
|
||||
|
||||
match me.io() {
|
||||
Err(err) => {
|
||||
error!(sl!(), "failed to get process io stat: {:?}", err);
|
||||
}
|
||||
Ok(io) => {
|
||||
set_gauge_vec_proc_io(&SHIM_IO_STAT, &io);
|
||||
}
|
||||
}
|
||||
|
||||
match me.fd_count() {
|
||||
Err(err) => {
|
||||
error!(sl!(), "failed to get process open fds number: {:?}", err);
|
||||
}
|
||||
Ok(fds) => {
|
||||
SHIM_OPEN_FDS.set(fds as f64);
|
||||
}
|
||||
}
|
||||
|
||||
// TODO:
|
||||
// RPC_DURATIONS_HISTOGRAM & SHIM_POD_OVERHEAD_CPU & SHIM_POD_OVERHEAD_MEMORY
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn set_gauge_vec_proc_status(gv: &prometheus::GaugeVec, status: &procfs::process::Status) {
|
||||
gv.with_label_values(&["vmpeak"])
|
||||
.set(status.vmpeak.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmsize"])
|
||||
.set(status.vmsize.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmlck"])
|
||||
.set(status.vmlck.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmpin"])
|
||||
.set(status.vmpin.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmhwm"])
|
||||
.set(status.vmhwm.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmrss"])
|
||||
.set(status.vmrss.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["rssanon"])
|
||||
.set(status.rssanon.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["rssfile"])
|
||||
.set(status.rssfile.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["rssshmem"])
|
||||
.set(status.rssshmem.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmdata"])
|
||||
.set(status.vmdata.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmstk"])
|
||||
.set(status.vmstk.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmexe"])
|
||||
.set(status.vmexe.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmlib"])
|
||||
.set(status.vmlib.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmpte"])
|
||||
.set(status.vmpte.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["vmswap"])
|
||||
.set(status.vmswap.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["hugetlbpages"])
|
||||
.set(status.hugetlbpages.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["voluntary_ctxt_switches"])
|
||||
.set(status.voluntary_ctxt_switches.unwrap_or(0) as f64);
|
||||
gv.with_label_values(&["nonvoluntary_ctxt_switches"])
|
||||
.set(status.nonvoluntary_ctxt_switches.unwrap_or(0) as f64);
|
||||
}
|
||||
|
||||
fn set_gauge_vec_proc_stat(gv: &prometheus::GaugeVec, stat: &procfs::process::Stat) {
|
||||
gv.with_label_values(&["utime"]).set(stat.utime as f64);
|
||||
gv.with_label_values(&["stime"]).set(stat.stime as f64);
|
||||
gv.with_label_values(&["cutime"]).set(stat.cutime as f64);
|
||||
gv.with_label_values(&["cstime"]).set(stat.cstime as f64);
|
||||
}
|
||||
|
||||
fn set_gauge_vec_netdev(gv: &prometheus::GaugeVec, status: &procfs::net::DeviceStatus) {
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_bytes"])
|
||||
.set(status.recv_bytes as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_packets"])
|
||||
.set(status.recv_packets as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_errs"])
|
||||
.set(status.recv_errs as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_drop"])
|
||||
.set(status.recv_drop as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_fifo"])
|
||||
.set(status.recv_fifo as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_frame"])
|
||||
.set(status.recv_frame as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_compressed"])
|
||||
.set(status.recv_compressed as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "recv_multicast"])
|
||||
.set(status.recv_multicast as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_bytes"])
|
||||
.set(status.sent_bytes as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_packets"])
|
||||
.set(status.sent_packets as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_errs"])
|
||||
.set(status.sent_errs as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_drop"])
|
||||
.set(status.sent_drop as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_fifo"])
|
||||
.set(status.sent_fifo as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_colls"])
|
||||
.set(status.sent_colls as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_carrier"])
|
||||
.set(status.sent_carrier as f64);
|
||||
gv.with_label_values(&[status.name.as_str(), "sent_compressed"])
|
||||
.set(status.sent_compressed as f64);
|
||||
}
|
||||
|
||||
fn set_gauge_vec_proc_io(gv: &prometheus::GaugeVec, io_stat: &procfs::process::Io) {
|
||||
gv.with_label_values(&["rchar"]).set(io_stat.rchar as f64);
|
||||
gv.with_label_values(&["wchar"]).set(io_stat.wchar as f64);
|
||||
gv.with_label_values(&["syscr"]).set(io_stat.syscr as f64);
|
||||
gv.with_label_values(&["syscw"]).set(io_stat.syscw as f64);
|
||||
gv.with_label_values(&["read_bytes"])
|
||||
.set(io_stat.read_bytes as f64);
|
||||
gv.with_label_values(&["write_bytes"])
|
||||
.set(io_stat.write_bytes as f64);
|
||||
gv.with_label_values(&["cancelled_write_bytes"])
|
||||
.set(io_stat.cancelled_write_bytes as f64);
|
||||
}
|
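The payload returned by `get_shim_metrics()` is standard Prometheus text exposition. A representative excerpt (values illustrative, not captured from a real run) looks like:

```
# HELP kata_shim_threads Kata containerd shim v2 process threads.
# TYPE kata_shim_threads gauge
kata_shim_threads 14
# HELP kata_shim_proc_stat Kata containerd shim v2 process statistics.
# TYPE kata_shim_proc_stat gauge
kata_shim_proc_stat{item="utime"} 42
kata_shim_netdev{interface="eth0",item="recv_bytes"} 123456
```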
@@ -7,6 +7,7 @@
// This defines the handlers corresponding to the url when a request is sent to destined url,
// the handler function should be invoked, and the corresponding data will be in the response

+use crate::shim_metrics::get_shim_metrics;
use agent::ResizeVolumeRequest;
use anyhow::{anyhow, Context, Result};
use common::Sandbox;
@@ -16,7 +17,7 @@ use url::Url;

use shim_interface::shim_mgmt::{
    AGENT_URL, DIRECT_VOLUME_PATH_KEY, DIRECT_VOLUME_RESIZE_URL, DIRECT_VOLUME_STATS_URL,
-   IP6_TABLE_URL, IP_TABLE_URL,
+   IP6_TABLE_URL, IP_TABLE_URL, METRICS_URL,
};

// main router for response, this works as a multiplexer on
@@ -43,6 +44,7 @@ pub(crate) async fn handler_mux(
        (&Method::POST, DIRECT_VOLUME_RESIZE_URL) => {
            direct_volume_resize_handler(sandbox, req).await
        }
+       (&Method::GET, METRICS_URL) => metrics_url_handler(sandbox, req).await,
        _ => Ok(not_found(req).await),
    }
}
@@ -146,3 +148,19 @@ async fn direct_volume_resize_handler(
        _ => Err(anyhow!("handler: Failed to resize volume")),
    }
}

+// handles the metrics url: gathers and concatenates agent, hypervisor, and shim metrics
+async fn metrics_url_handler(
+    sandbox: Arc<dyn Sandbox>,
+    _req: Request<Body>,
+) -> Result<Response<Body>> {
+    // get metrics from agent, hypervisor, and shim
+    let agent_metrics = sandbox.agent_metrics().await.unwrap_or_default();
+    let hypervisor_metrics = sandbox.hypervisor_metrics().await.unwrap_or_default();
+    let shim_metrics = get_shim_metrics().unwrap_or_default();
+
+    Ok(Response::new(Body::from(format!(
+        "{}{}{}",
+        agent_metrics, hypervisor_metrics, shim_metrics
+    ))))
+}
@@ -459,6 +459,18 @@ impl Sandbox for VirtSandbox {
            .context("sandbox: failed to get iptables")?;
        Ok(resp.data)
    }

+   async fn agent_metrics(&self) -> Result<String> {
+       self.agent
+           .get_metrics(agent::Empty::new())
+           .await
+           .map_err(|err| anyhow!("failed to get agent metrics {:?}", err))
+           .map(|resp| resp.metrics)
+   }
+
+   async fn hypervisor_metrics(&self) -> Result<String> {
+       self.hypervisor.get_hypervisor_metrics().await
+   }
}

#[async_trait]
@@ -16,6 +16,7 @@ import (
	"path"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"

	"github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
@@ -24,6 +25,8 @@ import (
	taskAPI "github.com/containerd/containerd/runtime/v2/task"
	"github.com/containerd/typeurl"
	"github.com/kata-containers/kata-containers/src/runtime/pkg/utils"
	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers"
	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/annotations"
	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/rootless"
	"github.com/opencontainers/runtime-spec/specs-go"
	"github.com/pkg/errors"
@@ -105,6 +108,28 @@ func withCDI(annotations map[string]string, cdiSpecDirs []string, spec *specs.Sp
	return spec, nil
}

+func copyLayersToMounts(rootFs *vc.RootFs, spec *specs.Spec) error {
+	for _, o := range rootFs.Options {
+		if !strings.HasPrefix(o, annotations.FileSystemLayer) {
+			continue
+		}
+
+		fields := strings.Split(o[len(annotations.FileSystemLayer):], ",")
+		if len(fields) < 2 {
+			return fmt.Errorf("Missing fields in rootfs layer: %q", o)
+		}
+
+		spec.Mounts = append(spec.Mounts, specs.Mount{
+			Destination: "/run/kata-containers/sandbox/layers/" + filepath.Base(fields[0]),
+			Type:        fields[1],
+			Source:      fields[0],
+			Options:     fields[2:],
+		})
+	}
+
+	return nil
+}

func create(ctx context.Context, s *service, r *taskAPI.CreateTaskRequest) (*container, error) {
	rootFs := vc.RootFs{}
	if len(r.Rootfs) == 1 {
@@ -120,6 +145,11 @@ func create(ctx context.Context, s *service, r *taskAPI.CreateTaskRequest) (*con
	if err != nil {
		return nil, err
	}

+	if err := copyLayersToMounts(&rootFs, ociSpec); err != nil {
+		return nil, err
+	}

	containerType, err := oci.ContainerType(*ociSpec)
	if err != nil {
		return nil, err
@@ -340,6 +370,11 @@ func checkAndMount(s *service, r *taskAPI.CreateTaskRequest) (bool, error) {
	if katautils.IsBlockDevice(m.Source) && !s.config.HypervisorConfig.DisableBlockDeviceUse {
		return false, nil
	}

+	if virtcontainers.HasOptionPrefix(m.Options, annotations.FileSystemLayer) {
+		return false, nil
+	}

	if m.Type == vc.NydusRootFSType {
		// if kata + nydus, do not mount
		return false, nil
@@ -480,6 +480,10 @@ func GetHostPath(devInfo DeviceInfo, vhostUserStoreEnabled bool, vhostUserStoreP
		return "", fmt.Errorf("Empty path provided for device")
	}

+	if devInfo.Major == -1 {
+		return devInfo.HostPath, nil
+	}

	// Filter out vhost-user storage devices by device Major numbers.
	if vhostUserStoreEnabled && devInfo.DevType == "b" &&
		(devInfo.Major == VhostUserSCSIMajor || devInfo.Major == VhostUserBlkMajor) {
@@ -83,10 +83,21 @@ func NewDeviceManager(blockDriver string, vhostUserStoreEnabled bool, vhostUserS
	return dm
}

-func (dm *deviceManager) findDeviceByMajorMinor(major, minor int64) api.Device {
+func (dm *deviceManager) findDevice(devInfo *config.DeviceInfo) api.Device {
+	// For devices with a major of -1, we use the host path to find existing instances.
+	if devInfo.Major == -1 {
+		for _, dev := range dm.devices {
+			dma, _ := dev.GetMajorMinor()
+			if dma == -1 && dev.GetHostPath() == devInfo.HostPath {
+				return dev
+			}
+		}
+		return nil
+	}
+
	for _, dev := range dm.devices {
		dma, dmi := dev.GetMajorMinor()
-		if dma == major && dmi == minor {
+		if dma == devInfo.Major && dmi == devInfo.Minor {
			return dev
		}
	}
@@ -111,7 +122,7 @@ func (dm *deviceManager) createDevice(devInfo config.DeviceInfo) (dev api.Device
		}
	}()

-	if existingDev := dm.findDeviceByMajorMinor(devInfo.Major, devInfo.Minor); existingDev != nil {
+	if existingDev := dm.findDevice(&devInfo); existingDev != nil {
		return existingDev, nil
	}
@@ -818,7 +818,12 @@ func (q *QMP) blockdevAddBaseArgs(driver string, blockDevice *BlockDevice) map[s
// used to name the device. As this identifier will be passed directly to QMP,
// it must obey QMP's naming rules, e.g., it must start with a letter.
func (q *QMP) ExecuteBlockdevAdd(ctx context.Context, blockDevice *BlockDevice) error {
-	args := q.blockdevAddBaseArgs("host_device", blockDevice)
+	var args map[string]interface{}
+	if fi, err := os.Stat(blockDevice.File); err == nil && fi.Mode().IsRegular() {
+		args = q.blockdevAddBaseArgs("file", blockDevice)
+	} else {
+		args = q.blockdevAddBaseArgs("host_device", blockDevice)
+	}

	return q.executeCommand(ctx, "blockdev-add", args, nil)
}
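With this change, a regular file backing the block device selects QEMU's `file` blockdev driver instead of `host_device`. For a file-backed device, the resulting QMP command would look roughly like the following (the field values are illustrative; the exact argument set comes from `blockdevAddBaseArgs`):

```json
{
  "execute": "blockdev-add",
  "arguments": {
    "driver": "file",
    "node-name": "drive-0",
    "filename": "/path/to/backing-file.img"
  }
}
```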
@@ -614,8 +614,9 @@ func (c *Container) createBlockDevices(ctx context.Context) error {
			continue
		}

-		if c.mounts[i].Type != "bind" {
-			// We only handle for bind-mounts
+		isBlockFile := HasOption(c.mounts[i].Options, vcAnnotations.IsFileBlockDevice)
+		if c.mounts[i].Type != "bind" && !isBlockFile {
+			// We only handle for bind and block device mounts.
			continue
		}

@@ -677,7 +678,7 @@ func (c *Container) createBlockDevices(ctx context.Context) error {

		// Check if mount is a block device file. If it is, the block device will be attached to the host
		// instead of passing this as a shared mount.
-		if stat.Mode&unix.S_IFBLK == unix.S_IFBLK {
+		if stat.Mode&unix.S_IFMT == unix.S_IFBLK {
			di = &config.DeviceInfo{
				HostPath:      c.mounts[i].Source,
				ContainerPath: c.mounts[i].Destination,
@@ -686,6 +687,15 @@ func (c *Container) createBlockDevices(ctx context.Context) error {
				Minor:    int64(unix.Minor(uint64(stat.Rdev))),
				ReadOnly: c.mounts[i].ReadOnly,
			}
+		} else if isBlockFile && stat.Mode&unix.S_IFMT == unix.S_IFREG {
+			di = &config.DeviceInfo{
+				HostPath:      c.mounts[i].Source,
+				ContainerPath: c.mounts[i].Destination,
+				DevType:       "b",
+				Major:         -1,
+				Minor:         0,
+				ReadOnly:      c.mounts[i].ReadOnly,
+			}
			// Check whether source can be used as a pmem device
		} else if di, err = config.PmemDeviceInfo(c.mounts[i].Source, c.mounts[i].Destination); err != nil {
			c.Logger().WithError(err).
@@ -857,6 +867,21 @@ func (c *Container) checkBlockDeviceSupport(ctx context.Context) bool {
	return false
}

+// Sort the devices starting with device #1 being the VFIO control group
+// device and the next the actual device(s) e.g. /dev/vfio/<group>
+func sortContainerVFIODevices(devices []ContainerDevice) []ContainerDevice {
+	var vfioDevices []ContainerDevice
+
+	for _, device := range devices {
+		if deviceManager.IsVFIOControlDevice(device.ContainerPath) {
+			vfioDevices = append([]ContainerDevice{device}, vfioDevices...)
+			continue
+		}
+		vfioDevices = append(vfioDevices, device)
+	}
+	return vfioDevices
+}

// create creates and starts a container inside a Sandbox. It has to be
// called only when a new container, not known by the sandbox, has to be created.
func (c *Container) create(ctx context.Context) (err error) {
@@ -899,6 +924,13 @@ func (c *Container) create(ctx context.Context) (err error) {
		}
		c.devices = cntDevices
	}
+	// If modeVFIO is enabled we need 1st to attach the VFIO control group
+	// device /dev/vfio/vfio and 2nd the actual device(s) afterwards.
+	// Sort the devices starting with device #1 being the VFIO control group
+	// device and the next the actual device(s) /dev/vfio/<group>
+	if modeVFIO {
+		c.devices = sortContainerVFIODevices(c.devices)
+	}

	c.Logger().WithFields(logrus.Fields{
		"devices": c.devices,
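A quick illustration of what `sortContainerVFIODevices` does to an example device list (paths illustrative, not from this change):

```
input:  /dev/vfio/12, /dev/vfio/vfio, /dev/vfio/13
output: /dev/vfio/vfio, /dev/vfio/12, /dev/vfio/13
```

The control group device `/dev/vfio/vfio` is moved to the front so it is attached before the actual group devices.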
@@ -26,6 +26,7 @@ import (
	"github.com/kata-containers/kata-containers/src/runtime/pkg/device/config"
	"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils/katatrace"
+	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/agent/protocols/grpc"
	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/pkg/annotations"
	"github.com/kata-containers/kata-containers/src/runtime/virtcontainers/utils"
)

@@ -459,6 +460,20 @@ func (f *FilesystemShare) ShareRootFilesystem(ctx context.Context, c *Container)
		return f.shareRootFilesystemWithNydus(ctx, c)
	}

+	if HasOptionPrefix(c.rootFs.Options, annotations.FileSystemLayer) {
+		path := filepath.Join("/run/kata-containers", c.id, "rootfs")
+		return &SharedFile{
+			storage: &grpc.Storage{
+				MountPoint: path,
+				Source:     "none",
+				Fstype:     c.rootFs.Type,
+				Driver:     kataOverlayDevType,
+				Options:    c.rootFs.Options,
+			},
+			guestPath: path,
+		}, nil
+	}

	if c.state.Fstype != "" && c.state.BlockDeviceID != "" {
		// The rootfs storage volume represents the container rootfs
		// mount point inside the guest.
@@ -924,6 +924,8 @@ func (k *kataAgent) removeIgnoredOCIMount(spec *specs.Spec, ignoredMounts map[st
	for _, m := range spec.Mounts {
		if _, found := ignoredMounts[m.Source]; found {
			k.Logger().WithField("removed-mount", m.Source).Debug("Removing OCI mount")
+		} else if HasOption(m.Options, vcAnnotations.IsFileSystemLayer) {
+			k.Logger().WithField("removed-mount", m.Source).Debug("Removing layer")
		} else {
			mounts = append(mounts, m)
		}
@@ -1311,13 +1313,17 @@ func (k *kataAgent) createContainer(ctx context.Context, sandbox *Sandbox, c *Co

	// Block based volumes will require some adjustments in the OCI spec, and creation of
	// storage objects to pass to the agent.
-	volumeStorages, err := k.handleBlkOCIMounts(c, ociSpec)
+	layerStorages, volumeStorages, err := k.handleBlkOCIMounts(c, ociSpec)
	if err != nil {
		return nil, err
	}

	ctrStorages = append(ctrStorages, volumeStorages...)

+	// Layer storage objects are prepended to the list so that they come _before_ the
+	// rootfs because the rootfs depends on them (it's an overlay of the layers).
+	ctrStorages = append(layerStorages, ctrStorages...)

	grpcSpec, err := grpc.OCItoGRPC(ociSpec)
	if err != nil {
		return nil, err
@@ -1629,9 +1635,10 @@ func (k *kataAgent) createBlkStorageObject(c *Container, m Mount) (*grpc.Storage
// handleBlkOCIMounts will create a unique destination mountpoint in the guest for each volume in the
// given container and will update the OCI spec to utilize this mount point as the new source for the
// container volume. The container mount structure is updated to store the guest destination mountpoint.
-func (k *kataAgent) handleBlkOCIMounts(c *Container, spec *specs.Spec) ([]*grpc.Storage, error) {
+func (k *kataAgent) handleBlkOCIMounts(c *Container, spec *specs.Spec) ([]*grpc.Storage, []*grpc.Storage, error) {

	var volumeStorages []*grpc.Storage
+	var layerStorages []*grpc.Storage

	for i, m := range c.mounts {
		id := m.BlockDeviceID
@@ -1647,7 +1654,12 @@ func (k *kataAgent) handleBlkOCIMounts(c *Container, spec *specs.Spec) ([]*grpc.
		// Create Storage structure
		vol, err := k.createBlkStorageObject(c, m)
		if vol == nil || err != nil {
-			return nil, err
+			return nil, nil, err
		}

+		if HasOption(m.Options, vcAnnotations.IsFileSystemLayer) {
+			layerStorages = append(layerStorages, vol)
+			continue
+		}

		// Each device will be mounted at a unique location within the VM only once. Mounting
@@ -1668,6 +1680,10 @@ func (k *kataAgent) handleBlkOCIMounts(c *Container, spec *specs.Spec) ([]*grpc.
				"new-source": path,
			}).Debug("Replacing OCI mount source")
			spec.Mounts[idx].Source = path
+			if HasOption(spec.Mounts[idx].Options, vcAnnotations.IsFileBlockDevice) {
+				// The device is already mounted, just bind to path in container.
+				spec.Mounts[idx].Options = []string{"bind"}
+			}
			break
		}

@@ -1678,7 +1694,7 @@ func (k *kataAgent) handleBlkOCIMounts(c *Container, spec *specs.Spec) ([]*grpc.
		volumeStorages = append(volumeStorages, vol)
	}

-	return volumeStorages, nil
+	return layerStorages, volumeStorages, nil
}

// handlePidNamespace checks if Pid namespace for a container needs to be shared with its sandbox
@@ -407,3 +407,21 @@ func isWatchableMount(path string) bool {

	return false
}

+func HasOption(options []string, option string) bool {
+	for _, o := range options {
+		if o == option {
+			return true
+		}
+	}
+	return false
+}
+
+func HasOptionPrefix(options []string, prefix string) bool {
+	for _, o := range options {
+		if strings.HasPrefix(o, prefix) {
+			return true
+		}
+	}
+	return false
+}
@@ -330,6 +330,21 @@ const (
	ContainerResourcesSwapInBytes = kataAnnotContainerResourcePrefix + "swap_in_bytes"
)

+// Annotations related to file system options.
+const (
+	kataAnnotFsOptPrefix = kataAnnotationsPrefix + "fs-opt."
+
+	// FileSystemLayer describes a layer of an overlay filesystem.
+	FileSystemLayer = kataAnnotFsOptPrefix + "layer="
+
+	// IsFileSystemLayer indicates that the annotated filesystem is a layer of an overlay fs.
+	IsFileSystemLayer = kataAnnotFsOptPrefix + "is-layer"
+
+	// IsFileBlockDevice indicates that the annotated filesystem is mounted on a block device
+	// backed by a host file.
+	IsFileBlockDevice = kataAnnotFsOptPrefix + "block_device=file"
+)

const (
	// SHA512 is the SHA-512 (64) hash algorithm
	SHA512 string = "sha512"
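Tying this back to `copyLayersToMounts` above: each rootfs option carrying the `FileSystemLayer` prefix encodes `<source>,<fstype>[,<mount-option>...]`. Assuming the usual `io.katacontainers.` annotation prefix (an assumption here, inferred from the constant name rather than shown in this diff), a layer option would look like:

```
io.katacontainers.fs-opt.layer=/var/lib/layers/base.tar,tar,ro
```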
@@ -647,7 +647,8 @@ func (s *Sandbox) coldOrHotPlugVFIO(sandboxConfig *SandboxConfig) (bool, error)
	hotPlugVFIO := (sandboxConfig.HypervisorConfig.HotPlugVFIO != config.NoPort)

	modeIsGK := (sandboxConfig.VfioMode == config.VFIOModeGuestKernel)
-	modeIsVFIO := (sandboxConfig.VfioMode == config.VFIOModeVFIO)
+	// modeIsVFIO is needed at the container level not the sandbox level.
+	// modeIsVFIO := (sandboxConfig.VfioMode == config.VFIOModeVFIO)

	var vfioDevices []config.DeviceInfo
	// vhost-user-block device is a PCIe device in Virt, keep track of it
@@ -662,13 +663,6 @@ func (s *Sandbox) coldOrHotPlugVFIO(sandboxConfig *SandboxConfig) (bool, error)
		continue
	}
	isVFIODevice := deviceManager.IsVFIODevice(device.ContainerPath)
-	isVFIOControlDevice := deviceManager.IsVFIOControlDevice(device.ContainerPath)
-	// vfio_mode=vfio needs the VFIO control device add it to the list
-	// of devices to be added to the VM.
-	if modeIsVFIO && isVFIOControlDevice && !hotPlugVFIO {
-		vfioDevices = append(vfioDevices, device)
-	}

	if hotPlugVFIO && isVFIODevice {
		device.ColdPlug = false
		device.Port = sandboxConfig.HypervisorConfig.HotPlugVFIO
877 src/tools/kata-ctl/Cargo.lock generated
File diff suppressed because it is too large

@@ -44,7 +44,12 @@ logging = { path = "../../libs/logging" }
slog = "2.7.0"
slog-scope = "4.4.0"
hyper = "0.14.20"
-tokio = "1.28.1"
+tokio = { version = "1.28.1", features = ["signal"] }
ttrpc = "0.6.0"

+prometheus = { version = "0.13.0", features = ["process"] }
+procfs = "0.12.0"
+lazy_static = "1.2"

[target.'cfg(target_arch = "s390x")'.dependencies]
reqwest = { version = "0.11", default-features = false, features = ["json", "blocking", "native-tls"] }
@@ -56,6 +56,9 @@ pub enum Commands {
    /// Gather metrics associated with infrastructure used to run a sandbox
    Metrics(MetricsCommand),

+   /// Start a monitor to get metrics of Kata Containers
+   Monitor(MonitorArgument),

    /// Display version details
    Version,
}
@@ -122,6 +125,12 @@ pub enum IpTablesArguments {
    Metrics,
}

+#[derive(Debug, Args)]
+pub struct MonitorArgument {
+    /// The address to listen on for HTTP requests. (default "127.0.0.1:8090")
+    pub address: Option<String>,
+}

#[derive(Debug, Args)]
pub struct DirectVolumeCommand {
    #[clap(subcommand)]
@@ -3,9 +3,16 @@
// SPDX-License-Identifier: Apache-2.0
//

+#[macro_use]
+extern crate lazy_static;

#[macro_use]
extern crate slog;

mod arch;
mod args;
mod check;
+mod monitor;
mod ops;
mod types;
mod utils;
@@ -18,7 +25,7 @@ use std::process::exit;
use args::{Commands, KataCtlCli};

use ops::check_ops::{
-    handle_check, handle_factory, handle_iptables, handle_metrics, handle_version,
+    handle_check, handle_factory, handle_iptables, handle_metrics, handle_monitor, handle_version,
};
use ops::env_ops::handle_env;
use ops::exec_ops::handle_exec;
@@ -52,6 +59,7 @@ fn real_main() -> Result<()> {
        Commands::Factory => handle_factory(),
        Commands::Iptables(args) => handle_iptables(args),
        Commands::Metrics(args) => handle_metrics(args),
+       Commands::Monitor(args) => handle_monitor(args),
        Commands::Version => handle_version(),
    };
181 src/tools/kata-ctl/src/monitor/http_server.rs Normal file
@@ -0,0 +1,181 @@
// Copyright 2022-2023 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//

use crate::monitor::metrics::get_monitor_metrics;
use crate::sl;
use crate::utils::TIMEOUT;

use anyhow::{anyhow, Context, Result};
use hyper::body;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Method, Request, Response, Server, StatusCode};
use shim_interface::shim_mgmt::client::MgmtClient;
use slog::{self, info};
use std::collections::HashMap;
use std::net::SocketAddr;

const ROOT_URI: &str = "/";
const METRICS_URI: &str = "/metrics";

async fn handler_mux(req: Request<Body>) -> Result<Response<Body>> {
    info!(
        sl!(),
        "mgmt-svr(mux): recv req, method: {}, uri: {}",
        req.method(),
        req.uri().path()
    );

    match (req.method(), req.uri().path()) {
        (&Method::GET, ROOT_URI) => root_uri_handler(req).await,
        (&Method::GET, METRICS_URI) => metrics_uri_handler(req).await,
        _ => not_found_uri_handler(req).await,
    }
    .map_or_else(
        |e| {
            Response::builder()
                .status(StatusCode::INTERNAL_SERVER_ERROR)
                .body(Body::from(format!("{:?}\n", e)))
                .map_err(|e| anyhow!("Failed to Build Response {:?}", e))
        },
        Ok,
    )
}

pub async fn http_server_setup(socket_addr: &str) -> Result<()> {
    let addr: SocketAddr = socket_addr
        .parse()
        .context("failed to parse http socket address")?;

    let make_svc =
        make_service_fn(|_conn| async { Ok::<_, anyhow::Error>(service_fn(handler_mux)) });

    Server::bind(&addr).serve(make_svc).await?;

    Ok(())
}

async fn root_uri_handler(_req: Request<Body>) -> Result<Response<Body>> {
    Response::builder()
        .status(StatusCode::OK)
        .body(Body::from(
            r#"Available HTTP endpoints:
/metrics : Get metrics from sandboxes.
"#,
        ))
        .map_err(|e| anyhow!("Failed to Build Response {:?}", e))
}

async fn metrics_uri_handler(req: Request<Body>) -> Result<Response<Body>> {
    let mut response_body = String::new();

    response_body += &get_monitor_metrics().context("Failed to Get Monitor Metrics")?;

    if let Some(uri_query) = req.uri().query() {
        if let Ok(sandbox_id) = parse_sandbox_id(uri_query) {
            response_body += &get_runtime_metrics(sandbox_id)
                .await
                .context(format!("{}\nFailed to Get Runtime Metrics", response_body))?;
        }
    }

    Response::builder()
        .status(StatusCode::OK)
        .body(Body::from(response_body))
        .map_err(|e| anyhow!("Failed to Build Response {:?}", e))
}

async fn get_runtime_metrics(sandbox_id: &str) -> Result<String> {
    // build shim client
    let shim_client =
        MgmtClient::new(sandbox_id, Some(TIMEOUT)).context("failed to build shim mgmt client")?;

    // get METRICS_URI
    let shim_response = shim_client
        .get(METRICS_URI)
        .await
        .context("failed to get METRICS_URI")?;

    // get runtime_metrics
    let runtime_metrics = String::from_utf8(body::to_bytes(shim_response).await?.to_vec())
        .context("failed to get runtime_metrics")?;

    Ok(runtime_metrics)
}

async fn not_found_uri_handler(_req: Request<Body>) -> Result<Response<Body>> {
    Response::builder()
        .status(StatusCode::NOT_FOUND)
        .body(Body::from("NOT FOUND"))
        .map_err(|e| anyhow!("Failed to Build Response {:?}", e))
}

fn parse_sandbox_id(uri: &str) -> Result<&str> {
    let uri_pairs: HashMap<_, _> = uri
        .split_whitespace()
        .map(|s| s.split_at(s.find('=').unwrap_or(0)))
        .map(|(key, val)| (key, &val[1..]))
        .collect();

    match uri_pairs.get("sandbox") {
        Some(sid) => Ok(sid.to_owned()),
        None => Err(anyhow!("params sandbox not found")),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parse_sandbox_id() {
        assert!(parse_sandbox_id("sandbox=demo_sandbox").unwrap() == "demo_sandbox");
        assert!(parse_sandbox_id("foo=bar").is_err());
    }

    #[tokio::test]
    async fn test_root_uri_handler() {
        let root_resp = handler_mux(
            Request::builder()
                .method("GET")
                .uri("/")
                .body(hyper::Body::from(""))
                .unwrap(),
        )
        .await
        .unwrap();

        assert!(root_resp.status() == StatusCode::OK);
    }

    #[tokio::test]
    async fn test_metrics_uri_handler() {
        let metrics_resp = handler_mux(
            Request::builder()
                .method("GET")
                .uri("/metrics?sandbox=demo_sandbox")
                .body(hyper::Body::from(""))
                .unwrap(),
        )
        .await
        .unwrap();

        assert!(metrics_resp.status() == StatusCode::INTERNAL_SERVER_ERROR);
    }

    #[tokio::test]
    async fn test_not_found_uri_handler() {
        let not_found_resp = handler_mux(
            Request::builder()
                .method("POST")
                .uri("/metrics?sandbox=demo_sandbox")
                .body(hyper::Body::from(""))
                .unwrap(),
        )
        .await
        .unwrap();

        assert!(not_found_resp.status() == StatusCode::NOT_FOUND);
    }
}
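Putting the pieces together, a typical session would look like the following (illustrative: the default address comes from `MONITOR_DEFAULT_SOCK_ADDR` below, and the sandbox ID is a placeholder):

```
$ kata-ctl monitor
# in another shell:
$ curl "http://127.0.0.1:8090/metrics?sandbox=<sandbox-id>"
```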
91 src/tools/kata-ctl/src/monitor/metrics.rs Normal file
@@ -0,0 +1,91 @@
// Copyright 2022-2023 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//

extern crate procfs;

use anyhow::{anyhow, Context, Result};

use prometheus::{Encoder, Gauge, IntCounter, Registry, TextEncoder};
use std::sync::Mutex;

const NAMESPACE_KATA_MONITOR: &str = "kata_ctl_monitor";

lazy_static! {
    static ref REGISTERED: Mutex<bool> = Mutex::new(false);

    // custom registry
    static ref REGISTRY: Registry = Registry::new();

    // monitor metrics
    static ref MONITOR_SCRAPE_COUNT: IntCounter =
        IntCounter::new(format!("{}_{}", NAMESPACE_KATA_MONITOR, "scrape_count"), "Monitor scrape count").unwrap();

    static ref MONITOR_MAX_FDS: Gauge = Gauge::new(format!("{}_{}", NAMESPACE_KATA_MONITOR, "process_max_fds"), "Max FDs for monitor").unwrap();

    static ref MONITOR_OPEN_FDS: Gauge = Gauge::new(format!("{}_{}", NAMESPACE_KATA_MONITOR, "process_open_fds"), "Open FDs for monitor").unwrap();

    static ref MONITOR_RESIDENT_MEMORY: Gauge = Gauge::new(format!("{}_{}", NAMESPACE_KATA_MONITOR, "process_resident_memory_bytes"), "Resident memory size in bytes for monitor").unwrap();
}

/// get monitor metrics
pub fn get_monitor_metrics() -> Result<String> {
    let mut registered = REGISTERED
        .lock()
        .map_err(|e| anyhow!("failed to check monitor metrics register status {:?}", e))?;

    if !(*registered) {
        register_monitor_metrics().context("failed to register monitor metrics")?;
        *registered = true;
    }

    update_monitor_metrics().context("failed to update monitor metrics")?;

    // gather all metrics and return as a String
    let metric_families = REGISTRY.gather();

    let mut buffer = Vec::new();
    TextEncoder::new()
        .encode(&metric_families, &mut buffer)
        .context("failed to encode gathered metrics")?;

    Ok(String::from_utf8(buffer)?)
}

fn register_monitor_metrics() -> Result<()> {
    REGISTRY.register(Box::new(MONITOR_SCRAPE_COUNT.clone()))?;
    REGISTRY.register(Box::new(MONITOR_MAX_FDS.clone()))?;
    REGISTRY.register(Box::new(MONITOR_OPEN_FDS.clone()))?;
    REGISTRY.register(Box::new(MONITOR_RESIDENT_MEMORY.clone()))?;

    Ok(())
}

fn update_monitor_metrics() -> Result<()> {
    MONITOR_SCRAPE_COUNT.inc();

    let me = match procfs::process::Process::myself() {
        Ok(p) => p,
        Err(e) => {
            eprintln!("failed to create process instance: {:?}", e);

            return Ok(());
        }
    };

    if let Ok(fds) = procfs::sys::fs::file_max() {
        MONITOR_MAX_FDS.set(fds as f64);
    }

    if let Ok(fds) = me.fd_count() {
        MONITOR_OPEN_FDS.set(fds as f64);
    }

    if let Ok(statm) = me.statm() {
        MONITOR_RESIDENT_MEMORY.set(statm.resident as f64);
    }

    Ok(())
}
8 src/tools/kata-ctl/src/monitor/mod.rs Normal file
@@ -0,0 +1,8 @@
// Copyright 2022-2023 Ant Group
//
// SPDX-License-Identifier: Apache-2.0
//

mod metrics;

pub mod http_server;
@@ -5,15 +5,21 @@

use crate::arch::arch_specific::get_checks;

-use crate::args::{CheckArgument, CheckSubCommand, IptablesCommand, MetricsCommand};
+use crate::args::{
+    CheckArgument, CheckSubCommand, IptablesCommand, MetricsCommand, MonitorArgument,
+};

use crate::check;

+use crate::monitor::http_server;

use crate::ops::version;

use crate::types::*;

-use anyhow::{anyhow, Result};
+use anyhow::{anyhow, Context, Result};

+const MONITOR_DEFAULT_SOCK_ADDR: &str = "127.0.0.1:8090";

use slog::{info, o, warn};

@@ -128,6 +134,17 @@ pub fn handle_metrics(_args: MetricsCommand) -> Result<()> {
    Ok(())
}

+pub fn handle_monitor(monitor_args: MonitorArgument) -> Result<()> {
+    tokio::runtime::Runtime::new()
+        .context("failed to create runtime for async http server")?
+        .block_on(http_server::http_server_setup(
+            monitor_args
+                .address
+                .as_deref()
+                .unwrap_or(MONITOR_DEFAULT_SOCK_ADDR),
+        ))
+}

pub fn handle_version() -> Result<()> {
    let version = version::get().unwrap();
@@ -25,6 +25,8 @@ use vmm_sys_util::terminal::Terminal;
use crate::args::ExecArguments;
use shim_interface::shim_mgmt::{client::MgmtClient, AGENT_URL};

+use crate::utils::TIMEOUT;

const CMD_CONNECT: &str = "CONNECT";
const CMD_OK: &str = "OK";
const SCHEME_VSOCK: &str = "VSOCK";
@@ -32,7 +34,6 @@ const SCHEME_HYBRID_VSOCK: &str = "HVSOCK";

const EPOLL_EVENTS_LEN: usize = 16;
const KATA_AGENT_VSOCK_TIMEOUT: u64 = 5;
-const TIMEOUT: Duration = Duration::from_millis(2000);

type Result<T> = std::result::Result<T, Error>;
@@ -14,7 +14,7 @@ use kata_types::mount::{
use nix;
use reqwest::StatusCode;
use slog::{info, o};
-use std::{fs, time::Duration};
+use std::fs;
use url;

use agent::ResizeVolumeRequest;
@@ -23,7 +23,8 @@ use shim_interface::shim_mgmt::{
    DIRECT_VOLUME_PATH_KEY, DIRECT_VOLUME_RESIZE_URL, DIRECT_VOLUME_STATS_URL,
};

-const TIMEOUT: Duration = Duration::from_millis(2000);
+use crate::utils::TIMEOUT;

const CONTENT_TYPE_JSON: &str = "application/json";

macro_rules! sl {
@@ -8,10 +8,12 @@
use crate::arch::arch_specific;

use anyhow::{anyhow, Context, Result};
-use std::fs;
+use std::{fs, time::Duration};

const NON_PRIV_USER: &str = "nobody";

+pub const TIMEOUT: Duration = Duration::from_millis(2000);

pub fn drop_privs() -> Result<()> {
    if nix::unistd::Uid::effective().is_root() {
        privdrop::PrivDrop::default()
@@ -186,31 +186,20 @@ function clean_env_ctr()
	if (( count_tasks > 0 )); then
		die "Can't remove running containers."
	fi

	kill_kata_components
}

# Kills running shim and hypervisor components
# by using the kata-component file name.
function kill_kata_components() {
-	local kata_bin_dir="/opt/kata/bin"
-	local shim_path="${kata_bin_dir}/containerd-shim-kata-v2"
-	local hypervisor_path="${kata_bin_dir}/qemu-system-x86_64"
-	local pid_shim_count="$(pgrep -fc ${shim_path} || exit 0)"
+	local PID_NAMES=( "containerd-shim-kata-v2" "qemu-system-x86_64" "cloud-hypervisor" )

-	[ ${pid_shim_count} -gt "0" ] && sudo kill -SIGKILL "$(pgrep -f ${shim_path})" > /dev/null 2>&1
-
-	if [ "${KATA_HYPERVISOR}" = 'clh' ]; then
-		hypervisor_path="${kata_bin_dir}/cloud-hypervisor"
-	elif [ "${KATA_HYPERVISOR}" != 'qemu' ]; then
-		echo "Failed to stop the hypervisor: '${KATA_HYPERVISOR}' as it is not recognized"
-		return
-	fi
-
-	local pid_hypervisor_count="$(pgrep -fc ${hypervisor_path} || exit 0)"
-
-	if [ ${pid_hypervisor_count} -gt "0" ]; then
-		sudo kill -SIGKILL "$(pgrep -f ${hypervisor_path})" > /dev/null 2>&1
-	fi
+	sudo systemctl stop containerd
+	# Get the filenames of the kata components
+	# and kill the corresponding processes
+	for PID_NAME in "${PID_NAMES[@]}" ; do
+		sudo killall ${PID_NAME} > /dev/null 2>&1 || true
+	done
+	sudo systemctl start containerd
}

# Restarts a systemd service while ensuring the start-limit-burst is set to 0.
@@ -173,8 +173,8 @@ function delete_cluster() {
}

function get_nodes_and_pods_info() {
-	kubectl debug $(kubectl get nodes -o name) -it --image=quay.io/kata-containers/kata-debug:latest
-	kubectl get pods -o name | grep node-debugger | xargs kubectl delete
+	kubectl debug $(kubectl get nodes -o name) -it --image=quay.io/kata-containers/kata-debug:latest || true
+	kubectl get pods -o name | grep node-debugger | xargs kubectl delete || true
}

function main() {
@@ -66,6 +66,8 @@ Tests relating to networking. General items could include:
- parallel bandwidth
- write and read percentiles

+For further details see the [network tests documentation](network).

### Storage

Tests relating to the storage (graph, volume) drivers.
@@ -17,7 +17,7 @@ description = "measure container lifecycle timings"
checkvar = ".\"boot-times\".Results | .[] | .\"to-workload\".Result"
checktype = "mean"
midval = 0.69
-minpercent = 30.0
+minpercent = 40.0
maxpercent = 30.0

[[metric]]

@@ -17,7 +17,7 @@ description = "measure container lifecycle timings"
checkvar = ".\"boot-times\".Results | .[] | .\"to-workload\".Result"
checktype = "mean"
midval = 0.71
-minpercent = 30.0
+minpercent = 40.0
maxpercent = 30.0

[[metric]]
@@ -51,3 +51,8 @@ For more details see the [footprint test documentation](footprint_data.md).
Measures the memory statistics *inside* the container. This allows evaluation of
the overhead the VM kernel and rootfs are having on the memory that was requested
by the container co-ordination system, and thus supplied to the VM.

+## `k8s-sysbench`
+
+`Sysbench` is an open-source, multi-purpose benchmark utility that evaluates `CPU`, memory
+and I/O performance. Currently the `k8s-sysbench` test measures `CPU` performance.
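The test can be run by hand from a checkout, for example (assuming a working cluster with the `kata` runtime class available):

```
$ bash tests/metrics/density/k8s-sysbench.sh
```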
69 tests/metrics/density/k8s-sysbench.sh Executable file
@@ -0,0 +1,69 @@
#!/bin/bash
#
# Copyright (c) 2022-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

set -o errexit
set -o nounset
set -o pipefail

SCRIPT_PATH=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_PATH}/../lib/common.bash"
sysbench_file=$(mktemp sysbenchresults.XXXXXXXXXX)
TEST_NAME="${TEST_NAME:-sysbench}"
CI_JOB="${CI_JOB:-}"
IMAGE="docker.io/library/local-sysbench:latest"
DOCKERFILE="${SCRIPT_PATH}/sysbench-dockerfile/Dockerfile"

function remove_tmp_file() {
	rm -rf "${sysbench_file}"
}

trap remove_tmp_file EXIT

function sysbench_memory() {
	kubectl exec -i "$pod_name" -- sh -c "sysbench memory --threads=2 run" > "${sysbench_file}"
	metrics_json_init
	local memory_latency_sum=$(cat "$sysbench_file" | grep sum | cut -f2 -d':' | sed 's/[[:blank:]]//g')
	metrics_json_start_array
	local json="$(cat << EOF
	{
		"memory-latency-sum": {
			"Result" : $memory_latency_sum,
			"Units" : "ms"
		}
	}
EOF
)"
	metrics_json_add_array_element "$json"
	metrics_json_end_array "Results"
	metrics_json_save
}

function sysbench_start_deployment() {
	cmds=("bc" "jq")
	check_cmds "${cmds[@]}"

	# Check no processes are left behind
	check_processes

	export pod_name="test-sysbench"

	kubectl create -f "${SCRIPT_PATH}/runtimeclass_workloads/sysbench-pod.yaml"
	kubectl wait --for=condition=Ready --timeout=120s pod "$pod_name"
}

function sysbench_cleanup() {
	kubectl delete pod "$pod_name"
	check_processes
}

function main() {
	init_env
	sysbench_start_deployment
	sysbench_memory
	sysbench_cleanup
}

main "$@"
@@ -0,0 +1,18 @@
#
# Copyright (c) 2018-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
apiVersion: v1
kind: Pod
metadata:
  name: test-sysbench
spec:
  terminationGracePeriodSeconds: 0
  runtimeClassName: kata
  containers:
  - name: test-sysbench
    image: localhost:5000/sysbench-kata:latest
    command:
      - sleep
      - "60"
17 tests/metrics/density/sysbench-dockerfile/Dockerfile Normal file
@@ -0,0 +1,17 @@
# Copyright (c) 2022-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

# Usage: FROM [image name]
FROM ubuntu:20.04

# Version of the Dockerfile
LABEL DOCKERFILE_VERSION="1.0"

RUN apt-get update && \
    apt-get install -y build-essential git curl sudo && \
    apt-get remove -y unattended-upgrades && \
    curl -OkL https://packagecloud.io/install/repositories/akopytov/sysbench/script.deb.sh && \
    apt-get install -y sysbench

CMD ["/bin/bash"]
@@ -85,6 +85,14 @@ function run_test_tensorflow() {
	check_metrics
}

+function run_test_fio() {
+	info "Running FIO test using ${KATA_HYPERVISOR} hypervisor"
+	# ToDo: remove the exit once the metrics workflow is stable
+	exit 0
+
+	bash storage/fio-k8s/fio-test-ci.sh
+}

function main() {
	action="${1:-}"
	case "${action}" in
@@ -95,6 +103,7 @@ function main() {
		run-test-memory-usage-inside-container) run_test_memory_usage_inside_container ;;
		run-test-blogbench) run_test_blogbench ;;
		run-test-tensorflow) run_test_tensorflow ;;
+		run-test-fio) run_test_fio ;;
		*) >&2 die "Invalid argument" ;;
	esac
}
@@ -192,6 +192,8 @@ function kill_processes_before_start()
	CTR_PROCS=$(sudo "${CTR_EXE}" t list -q)
	[[ -n "${CTR_PROCS}" ]] && clean_env_ctr

+	kill_kata_components

	check_processes
}
22 tests/metrics/network/README.md Normal file
@@ -0,0 +1,22 @@
# Kata Containers network metrics

Kata Containers provides a series of network performance tests. Running these provides a basic reference for measuring network essentials like
bandwidth, jitter, latency and parallel bandwidth.

## Performance tools

- `iperf3` measures bandwidth, jitter, CPU usage and the quality of a network link.

## Networking tests

- `k8s-network-metrics-iperf3.sh` measures bandwidth, which is the speed of the data transfer.
- `latency-network.sh` measures network latency.

## Running the tests

Individual tests can be run by hand, for example:

```
$ cd metrics
$ bash network/iperf3_kubernetes/k8s-network-metrics-iperf3.sh -b
```
314
tests/metrics/network/iperf3_kubernetes/k8s-network-metrics-iperf3.sh
Executable file
314
tests/metrics/network/iperf3_kubernetes/k8s-network-metrics-iperf3.sh
Executable file
@ -0,0 +1,314 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Copyright (c) 2021-2023 Intel Corporation
|
||||
#
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
#
|
||||
# This test measures the following network essentials:
|
||||
# - bandwith simplex
|
||||
# - jitter
|
||||
#
|
||||
# These metrics/results will be got from the interconnection between
|
||||
# a client and a server using iperf3 tool.
|
||||
# The following cases are covered:
|
||||
#
|
||||
# case 1:
|
||||
# container-server <----> container-client
|
||||
#
|
||||
# case 2"
|
||||
# container-server <----> host-client
|
||||
|
||||
set -o pipefail
|
||||
|
||||
SCRIPT_PATH=$(dirname "$(readlink -f "$0")")
|
||||
|
||||
source "${SCRIPT_PATH}/../../lib/common.bash"
|
||||
iperf_file=$(mktemp iperfresults.XXXXXXXXXX)
|
||||
TEST_NAME="${TEST_NAME:-network-iperf3}"
|
||||
COLLECT_ALL="${COLLECT_ALL:-false}"
|
||||
|
||||
function remove_tmp_file() {
|
||||
rm -rf "${iperf_file}"
|
||||
}
|
||||
|
||||
trap remove_tmp_file EXIT
|
||||
|
||||
function iperf3_all_collect_results() {
|
||||
metrics_json_init
|
||||
metrics_json_start_array
|
||||
local json="$(cat << EOF
|
||||
{
|
||||
"bandwidth": {
|
||||
"Result" : $bandwidth_result,
|
||||
"Units" : "$bandwidth_units"
|
||||
},
|
||||
"jitter": {
|
||||
"Result" : $jitter_result,
|
||||
"Units" : "$jitter_units"
|
||||
},
|
||||
"cpu": {
|
||||
"Result" : $cpu_result,
|
||||
"Units" : "$cpu_units"
|
||||
},
|
||||
"parallel": {
|
||||
"Result" : $parallel_result,
|
||||
"Units" : "$parallel_units"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
)"
|
||||
metrics_json_add_array_element "$json"
|
||||
metrics_json_end_array "Results"
|
||||
}
|
||||
|
||||
function iperf3_bandwidth() {
|
||||
# Start server
|
||||
local transmit_timeout="30"
|
||||
|
||||
kubectl exec -i "$client_pod_name" -- sh -c "iperf3 -J -c ${server_ip_add} -t ${transmit_timeout}" | jq '.end.sum_received.bits_per_second' > "${iperf_file}"
|
||||
export bandwidth_result=$(cat "${iperf_file}")
|
||||
export bandwidth_units="bits per second"
|
||||
|
||||
if [ "$COLLECT_ALL" == "true" ]; then
|
||||
iperf3_all_collect_results
|
||||
else
|
||||
metrics_json_init
|
||||
metrics_json_start_array
|
||||
|
||||
local json="$(cat << EOF
|
||||
{
|
||||
"bandwidth": {
|
||||
"Result" : $bandwidth_result,
|
||||
"Units" : "$bandwidth_units"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
)"
|
||||
metrics_json_add_array_element "$json"
|
||||
metrics_json_end_array "Results"
|
||||
fi
|
||||
}
|
||||
|
||||
function iperf3_jitter() {
|
||||
# Start server
|
||||
local transmit_timeout="30"
|
||||
|
||||
	kubectl exec -i "$client_pod_name" -- sh -c "iperf3 -J -c ${server_ip_add} -u -t ${transmit_timeout}" | jq '.end.sum.jitter_ms' > "${iperf_file}"

	result=$(cat "${iperf_file}")
	export jitter_result=$(printf "%0.3f\n" "$result")
	export jitter_units="ms"

	if [ "$COLLECT_ALL" == "true" ]; then
		iperf3_all_collect_results
	else
		metrics_json_init
		metrics_json_start_array

		local json="$(cat << EOF
{
	"jitter": {
		"Result" : $jitter_result,
		"Units" : "ms"
	}
}
EOF
)"
		metrics_json_add_array_element "$json"
		metrics_json_end_array "Results"
	fi
}

function iperf3_parallel() {
	# This will measure four parallel connections with iperf3
	kubectl exec -i "$client_pod_name" -- sh -c "iperf3 -J -c ${server_ip_add} -P 4" | jq '.end.sum_received.bits_per_second' > "${iperf_file}"
	export parallel_result=$(cat "${iperf_file}")
	export parallel_units="bits per second"

	if [ "$COLLECT_ALL" == "true" ]; then
		iperf3_all_collect_results
	else
		metrics_json_init
		metrics_json_start_array

		local json="$(cat << EOF
{
	"parallel": {
		"Result" : $parallel_result,
		"Units" : "$parallel_units"
	}
}
EOF
)"
		metrics_json_add_array_element "$json"
		metrics_json_end_array "Results"
	fi
}

function iperf3_cpu() {
	# Measure host CPU utilization during a timed transfer
	local transmit_timeout="80"

	kubectl exec -i "$client_pod_name" -- sh -c "iperf3 -J -c ${server_ip_add} -t ${transmit_timeout}" | jq '.end.cpu_utilization_percent.host_total' > "${iperf_file}"
	export cpu_result=$(cat "${iperf_file}")
	export cpu_units="percent"

	if [ "$COLLECT_ALL" == "true" ]; then
		iperf3_all_collect_results
	else
		metrics_json_init
		metrics_json_start_array

		local json="$(cat << EOF
{
	"cpu": {
		"Result" : $cpu_result,
		"Units" : "$cpu_units"
	}
}
EOF
)"

		metrics_json_add_array_element "$json"
		metrics_json_end_array "Results"
	fi
}

function iperf3_start_deployment() {
	cmds=("bc" "jq")
	check_cmds "${cmds[@]}"

	# Check no processes are left behind
	check_processes

	export service="iperf3-server"
	export deployment="iperf3-server-deployment"

	wait_time=20
	sleep_time=2

	# Create deployment
	kubectl create -f "${SCRIPT_PATH}/runtimeclass_workloads/iperf3-deployment.yaml"

	# Check deployment creation
	local cmd="kubectl wait --for=condition=Available deployment/${deployment}"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Create DaemonSet
	kubectl create -f "${SCRIPT_PATH}/runtimeclass_workloads/iperf3-daemonset.yaml"

	# Expose deployment
	kubectl expose deployment/"${deployment}"

	# Get the name of the server pod
	export server_pod_name=$(kubectl get pods -o name | grep server | cut -d '/' -f2)

	# Verify the server pod is working
	local cmd="kubectl get pod $server_pod_name -o yaml | grep 'phase: Running'"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Get the name of the client pod
	export client_pod_name=$(kubectl get pods -o name | grep client | cut -d '/' -f2)

	# Verify the client pod is working
	local cmd="kubectl get pod $client_pod_name -o yaml | grep 'phase: Running'"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Get the IP address of the server pod
	export server_ip_add=$(kubectl get pod "$server_pod_name" -o jsonpath='{.status.podIP}')
}

function iperf3_deployment_cleanup() {
	kubectl delete pod "$server_pod_name" "$client_pod_name"
	kubectl delete ds iperf3-clients
	kubectl delete deployment "$deployment"
	kubectl delete service "$deployment"
	check_processes
}

function help() {
	echo "$(cat << EOF
Usage: $0 [options]
	Description:
		This script implements a number of network metrics
		using iperf3.

	Options:
		-a	Run all tests
		-b	Run bandwidth tests
		-c	Run CPU metrics tests
		-h	Help
		-j	Run jitter tests
		-p	Run parallel connection tests
EOF
)"
}

function main() {
	init_env
	iperf3_start_deployment

	local OPTIND
	while getopts ":abchjp" opt
	do
		case "$opt" in
		a)	# all tests
			test_all="1"
			;;
		b)	# bandwidth test
			test_bandwidth="1"
			;;
		c)	# CPU tests
			test_cpu="1"
			;;
		h)
			help
			exit 0;
			;;
		j)	# jitter tests
			test_jitter="1"
			;;
		p)	# parallel connection tests
			test_parallel="1"
			;;
		:)
			echo "Missing argument for -$OPTARG";
			help
			exit 1;
			;;
		esac
	done
	shift $((OPTIND-1))

	[[ -z "$test_bandwidth" ]] && \
	[[ -z "$test_jitter" ]] && \
	[[ -z "$test_cpu" ]] && \
	[[ -z "$test_parallel" ]] && \
	[[ -z "$test_all" ]] && \
		help && die "Must choose at least one test"

	if [ "$test_bandwidth" == "1" ]; then
		iperf3_bandwidth
	fi

	if [ "$test_jitter" == "1" ]; then
		iperf3_jitter
	fi

	if [ "$test_cpu" == "1" ]; then
		iperf3_cpu
	fi

	if [ "$test_parallel" == "1" ]; then
		iperf3_parallel
	fi

	if [ "$test_all" == "1" ]; then
		export COLLECT_ALL=true && iperf3_bandwidth && iperf3_jitter && iperf3_cpu && iperf3_parallel
	fi

	metrics_json_save
	iperf3_deployment_cleanup
}

main "$@"
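For reference, running the script by hand looks like this (a sketch: the script file name is assumed here, and the cluster must already satisfy the prerequisites checked by `iperf3_start_deployment`):

```sh
# Run only the jitter and parallel-connection tests
$ bash k8s-network-metrics-iperf3.sh -j -p

# Run every iperf3 test, collecting all results into a single JSON report
$ bash k8s-network-metrics-iperf3.sh -a
```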
@ -0,0 +1,29 @@
#
# Copyright (c) 2021-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: iperf3-clients
  labels:
    app: iperf3-client
spec:
  selector:
    matchLabels:
      app: iperf3-client
  template:
    metadata:
      labels:
        app: iperf3-client
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: iperf3-client
        image: networkstatic/iperf3
        command: ['/bin/sh', '-c', 'sleep infinity']
      terminationGracePeriodSeconds: 0
@ -0,0 +1,44 @@
#
# Copyright (c) 2021-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
apiVersion: apps/v1
kind: Deployment
metadata:
  name: iperf3-server-deployment
  labels:
    app: iperf3-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: iperf3-server
  template:
    metadata:
      labels:
        app: iperf3-server
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: kubernetes.io/role
                operator: In
                values:
                - master
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: iperf3-server
        image: networkstatic/iperf3
        args: ['-s']
        ports:
        - containerPort: 5201
          name: server
      terminationGracePeriodSeconds: 0
      runtimeClassName: kata
17
tests/metrics/network/latency_kubernetes/latency-client.yaml
Normal file
@ -0,0 +1,17 @@
# Copyright (c) 2022-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
apiVersion: v1
kind: Pod
metadata:
  name: latency-client
spec:
  terminationGracePeriodSeconds: 0
  runtimeClassName: kata
  containers:
  - name: client-container
    image: quay.io/prometheus/busybox:latest
    command:
      - sleep
      - "180"
85
tests/metrics/network/latency_kubernetes/latency-network.sh
Executable file
@ -0,0 +1,85 @@
#!/bin/bash
#
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

set -o pipefail

SCRIPT_PATH=$(dirname "$(readlink -f "$0")")

source "${SCRIPT_PATH}/../../lib/common.bash"
latency_file=$(mktemp latencyresults.XXXXXXXXXX)
TEST_NAME="${TEST_NAME:-latency}"

function remove_tmp_file() {
	rm -rf "${latency_file}"
}

trap remove_tmp_file EXIT

function main() {
	init_env
	cmds=("bc" "jq")
	check_cmds "${cmds[@]}"

	# Check no processes are left behind
	check_processes

	wait_time=20
	sleep_time=2

	# Create server
	kubectl create -f "${SCRIPT_PATH}/runtimeclass_workloads/latency-server.yaml"

	# Get the name of the server pod
	export server_pod_name="latency-server"

	# Verify the server pod is working
	local cmd="kubectl get pod $server_pod_name -o yaml | grep 'phase: Running'"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Create client
	kubectl create -f "${SCRIPT_PATH}/runtimeclass_workloads/latency-client.yaml"

	# Get the name of the client pod
	export client_pod_name="latency-client"

	# Verify the client pod is working
	local cmd="kubectl get pod $client_pod_name -o yaml | grep 'phase: Running'"
	waitForProcess "$wait_time" "$sleep_time" "$cmd"

	# Get the IP address of the server pod
	export server_ip_add=$(kubectl get pod "$server_pod_name" -o jsonpath='{.status.podIP}')

	# Number of packets (sent)
	local number="${number:-30}"

	local client_command="ping -c ${number} ${server_ip_add}"

	kubectl exec "$client_pod_name" -- sh -c "$client_command" > "$latency_file"

	metrics_json_init

	# Extract the average round-trip time from ping's summary line
	local latency=$(cat "$latency_file" | grep avg | cut -f2 -d'=' | sed 's/[[:blank:]]//g' | cut -f2 -d'/')

	metrics_json_start_array

	local json="$(cat << EOF
{
	"latency": {
		"Result" : $latency,
		"Units" : "ms"
	}
}
EOF
)"

	metrics_json_add_array_element "$json"
	metrics_json_end_array "Results"
	metrics_json_save

	kubectl delete pod "$client_pod_name" "$server_pod_name"
	check_processes
}
main "$@"
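To make the extraction pipeline above concrete: with the busybox image used by these pods, the ping summary line looks roughly as follows, and the pipeline keeps the second `/`-separated field after the `=` sign, i.e. the average RTT in milliseconds (numbers are illustrative):

```sh
$ ping -c 30 10.244.1.7 | grep avg
round-trip min/avg/max = 0.120/0.186/0.321 ms
# -> latency=0.186
```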
18
tests/metrics/network/latency_kubernetes/latency-server.yaml
Normal file
@ -0,0 +1,18 @@
#
# Copyright (c) 2022-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
apiVersion: v1
kind: Pod
metadata:
  name: latency-server
spec:
  terminationGracePeriodSeconds: 0
  runtimeClassName: kata
  containers:
  - name: server-container
    image: quay.io/prometheus/busybox:latest
    command:
      - sleep
      - "180"
@ -1,11 +1,27 @@
# Kata Containers storage I/O tests

The metrics tests in this directory are designed to be used to assess storage I/O.

## `Blogbench` test

The `blogbench` script is based on the `blogbench` program, which is designed to emulate a busy blog server with a number of concurrent
threads performing a mixture of reads, writes and rewrites.

### Running the `blogbench` test

The `blogbench` test can be run by hand, for example:
```
$ cd metrics
$ bash storage/blogbench.sh
```
## `fio` test

The `fio` test utilises the [fio tool](https://github.com/axboe/fio), configured
to perform measurements upon a single test file.

The test configuration used by the script can be modified by setting a number of
environment variables to change or override the test defaults.
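As a sketch of such a customised run (the variable and script names below are assumptions for illustration, not the test's documented interface; check the script header for the exact names):

```
$ cd metrics
$ FIO_BLOCKSIZE=4k FIO_RUNTIME=60 bash storage/fio_test.sh
```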

## DAX `virtio-fs` `fio` Kubernetes tests

[Test](fio-k8s/README.md) to compare the use of the DAX option in `virtio-fs`.
30
tests/metrics/storage/fio-k8s/README.md
Normal file
@ -0,0 +1,30 @@
# FIO test in Kubernetes

This is an automation to run `fio` with Kubernetes.

## Requirements:

- Kubernetes cluster running.
- Kata configured as `runtimeclass`.

## Test structure:

- [fio-test]: Program wrapper to launch `fio` in a K8s pod.
- [pkg]: Library code that could be used for more `fio` automation.
- [configs]: Configuration files used by [fio-test].
- [DAX-compare-test]: Script to run [fio-test] to generate `fio` data for Kata with/without `virtio-fs DAX` and the K8s bare-metal runtime (`runc`).
- [report]: Jupyter Notebook to create reports for data generated by [DAX-compare-test].

## Top-level Makefile targets

- `build`: Build `fio` metrics.
- `test`: Quick test, used to verify changes in [fio-test].
- `run`: Run `fio` metrics and generate reports.
- `test-report-interactive`: Run the Python notebook on `localhost:8888`, useful to edit the report.
- `test-report`: Generate a report from data generated by `make test`.
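For instance, a typical flow with these targets might be (a sketch, run from this directory):

```
$ make build
$ make run
```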
[fio-test]:cmd/fiotest
[configs]:configs
[pkg]:pkg
[report]:scripts/dax-compare-test/report
[DAX-compare-test]:scripts/dax-compare-test/README.md
85
tests/metrics/storage/fio-k8s/fio-test-ci.sh
Executable file
@ -0,0 +1,85 @@
#!/bin/bash
#
# Copyright (c) 2022-2023 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0

set -e

# General env
SCRIPT_PATH=$(dirname "$(readlink -f "$0")")
source "${SCRIPT_PATH}/../../lib/common.bash"
FIO_PATH="${GOPATH}/src/github.com/kata-containers/kata-containers/tests/metrics/storage/fio-k8s"
TEST_NAME="${TEST_NAME:-fio}"

function main() {
	cmds=("bc" "jq")
	check_cmds "${cmds[@]}"
	check_processes
	init_env

	export KUBECONFIG="$HOME/.kube/config"

	pushd "${FIO_PATH}"
	echo "INFO: Running K8S FIO test"
	make test-ci
	popd

	test_result_file="${FIO_PATH}/cmd/fiotest/test-results/kata/randrw-sync.job/output.json"

	metrics_json_init
	# fio's JSON output reports the read statistics before the write
	# statistics, so the first match of each key belongs to the read
	# section ('head -1') and the second to the write section
	# ('head -2 | tail -1').
	local read_io=$(cat "$test_result_file" | grep io_bytes | head -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local read_bw=$(cat "$test_result_file" | grep bw_bytes | head -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local read_90_percentile=$(cat "$test_result_file" | grep 90.000000 | head -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local read_95_percentile=$(cat "$test_result_file" | grep 95.000000 | head -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local write_io=$(cat "$test_result_file" | grep io_bytes | head -2 | tail -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local write_bw=$(cat "$test_result_file" | grep bw_bytes | head -2 | tail -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local write_90_percentile=$(cat "$test_result_file" | grep 90.000000 | head -2 | tail -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')
	local write_95_percentile=$(cat "$test_result_file" | grep 95.000000 | head -2 | tail -1 | sed 's/[[:blank:]]//g' | cut -f2 -d ':' | cut -f1 -d ',')

	metrics_json_start_array
	local json="$(cat << EOF
{
	"readio": {
		"Result" : $read_io,
		"Units" : "bytes"
	},
	"readbw": {
		"Result" : $read_bw,
		"Units" : "bytes/sec"
	},
	"read90percentile": {
		"Result" : $read_90_percentile,
		"Units" : "ns"
	},
	"read95percentile": {
		"Result" : $read_95_percentile,
		"Units" : "ns"
	},
	"writeio": {
		"Result" : $write_io,
		"Units" : "bytes"
	},
	"writebw": {
		"Result" : $write_bw,
		"Units" : "bytes/sec"
	},
	"write90percentile": {
		"Result" : $write_90_percentile,
		"Units" : "ns"
	},
	"write95percentile": {
		"Result" : $write_95_percentile,
		"Units" : "ns"
	}
}
EOF
)"
	metrics_json_add_array_element "$json"
	metrics_json_end_array "Results"
	metrics_json_save

	check_processes
}

main "$@"
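Since `jq` is already a required command here, the same fields could be read from fio's JSON structure directly rather than with `grep`/`cut`; a minimal sketch, assuming fio's standard JSON output layout with a single job:

```sh
read_io=$(jq '.jobs[0].read.io_bytes' "$test_result_file")
write_bw=$(jq '.jobs[0].write.bw_bytes' "$test_result_file")
read_90_percentile=$(jq '.jobs[0].read.clat_ns.percentile["90.000000"]' "$test_result_file")
```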
@ -0,0 +1,47 @@
# FIO in Kubernetes

This test uses Kubernetes to run `fio` jobs measuring how Kata Containers performs with `virtio-fs` DAX.
The test has to run on a single-node cluster, because it modifies the Kata configuration file.

The `virtio-fs` options that this test will use are:

* `cache mode`: only `auto` is used, as it is the most compatible mode for most Kata use cases and is the current default in Kata.
* `thread pool size`: restricts the number of worker threads per request queue; zero means no thread pool.
* `DAX`
```
File contents can be mapped into a memory window on the host, allowing the guest to directly access data from the host page cache. This has several advantages: The guest page cache is bypassed, reducing the memory footprint. No communication is necessary
to access file contents, improving I/O performance. Shared file access is coherent between virtual machines on the same host even with mmap.
```

This test by default iterates over different `virtio-fs` configurations.

| test name                 | DAX | thread pool size | cache mode |
|---------------------------|-----|------------------|------------|
| pool_0_cache_auto_no_DAX  | no  | 0                | auto       |
| pool_0_cache_auto_DAX     | yes | 0                | auto       |

The `fio` options used are:

`ioengine`: How the I/O requests are issued to the kernel.
* `libaio`: Supports async I/O for both direct and buffered I/O.
* `mmap`: The file is memory mapped with mmap(2) and data is copied to/from it using memcpy(3).

`rw type`: Type of I/O pattern.
* `randread`: Random reads.
* `randrw`: Random mixed reads and writes.
* `randwrite`: Random writes.
* `read`: Sequential reads.
* `write`: Sequential writes.

Additional notes: Some jobs carry a `multi` prefix, meaning the same job runs more than once at the same time, each instance using its own file.

### Static `fio` values:

Some `fio` values are not modified across the jobs; a sketch of a job file using them follows this list.

* `runtime`: Tell `fio` to terminate processing after the specified period of time (seconds).
* `loops`: Run the specified number of iterations of this job. Used to repeat the same workload a given number of times.
* `iodepth`: Number of I/O units to keep in flight against the file. Note that increasing `iodepth` beyond 1 will not affect synchronous `ioengine`s.
* `size`: The total size of file I/O for each thread of this job.
* `direct`: If true, use non-buffered I/O (usually `O_DIRECT`).
* `blocksize`: The block size in bytes used for I/O units.
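As an illustration only (this job file is not part of the test's shipped configuration; the values are invented), a job combining the static options above could look like:

```
; hypothetical fio job built from the static values listed above
[randrw-example]
ioengine=libaio
rw=randrw
runtime=60
loops=3
iodepth=1
size=1G
direct=1
blocksize=4k
```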
@ -96,6 +96,7 @@ run_workload() {
	# number of decimal digits after the decimal points
	# for 'bc' performing math in kernel period estimation
	L_CALC_SCALE=13
+	local CONTAINER_NAME="kata_launch_times_$(( $RANDOM % 1000 + 1))"
	start_time=$($DATECMD)

	# Check entropy level of the host
@ -103,8 +104,7 @@

	# Run the image and command and capture the results into an array...
	declare workload_result
-	readarray -n 0 workload_result < <(sudo -E "${CTR_EXE}" run --rm --runtime=${CTR_RUNTIME} ${IMAGE} test bash -c "$DATECMD $DMESGCMD")
+	readarray -n 0 workload_result < <(sudo -E "${CTR_EXE}" run --rm --runtime ${CTR_RUNTIME} ${IMAGE} ${CONTAINER_NAME} bash -c "$DATECMD $DMESGCMD")
	end_time=$($DATECMD)

	# Delay this calculation until after we have run - do not want
@ -13,6 +13,16 @@ set -o pipefail
DOCKER_RUNTIME=${DOCKER_RUNTIME:-runc}
MEASURED_ROOTFS=${MEASURED_ROOTFS:-no}

+# For cross build
+CROSS_BUILD=${CROSS_BUILD:-false}
+BUILDX=""
+PLATFORM=""
+TARGET_ARCH=${TARGET_ARCH:-$(uname -m)}
+ARCH=${ARCH:-$(uname -m)}
+[ "${TARGET_ARCH}" == "aarch64" ] && TARGET_ARCH=arm64
+TARGET_OS=${TARGET_OS:-linux}
+[ "${CROSS_BUILD}" == "true" ] && BUILDX=buildx && PLATFORM="--platform=${TARGET_OS}/${TARGET_ARCH}"

readonly script_name="${0##*/}"
readonly script_dir=$(dirname "$(readlink -f "$0")")
readonly lib_file="${script_dir}/../scripts/lib.sh"
@ -154,7 +164,7 @@ build_with_container() {
	engine_build_args+=" --runtime ${DOCKER_RUNTIME}"
	fi

-	"${container_engine}" build \
+	"${container_engine}" ${BUILDX} build ${PLATFORM} \
		${engine_build_args} \
		--build-arg http_proxy="${http_proxy}" \
		--build-arg https_proxy="${https_proxy}" \
@ -189,6 +199,8 @@ build_with_container() {
		--env MEASURED_ROOTFS="${MEASURED_ROOTFS}" \
		--env SELINUX="${SELINUX}" \
		--env DEBUG="${DEBUG}" \
+		--env ARCH="${ARCH}" \
+		--env TARGET_ARCH="${TARGET_ARCH}" \
		-v /dev:/dev \
		-v "${script_dir}":"/osbuilder" \
		-v "${script_dir}/../scripts":"/scripts" \
@ -32,6 +32,16 @@ SELINUX=${SELINUX:-"no"}
lib_file="${script_dir}/../scripts/lib.sh"
source "$lib_file"

+# For cross build
+CROSS_BUILD=${CROSS_BUILD:-false}
+BUILDX=""
+PLATFORM=""
+TARGET_ARCH=${TARGET_ARCH:-$(uname -m)}
+ARCH=${ARCH:-$(uname -m)}
+[ "${TARGET_ARCH}" == "aarch64" ] && TARGET_ARCH=arm64
+TARGET_OS=${TARGET_OS:-linux}
+[ "${CROSS_BUILD}" == "true" ] && BUILDX=buildx && PLATFORM="--platform=${TARGET_OS}/${TARGET_ARCH}"

handle_error() {
	local exit_code="${?}"
	local line_number="${1:-}"
@ -10,6 +10,7 @@ FROM ${IMAGE_REGISTRY}/golang:1.18 AS skopeo
@SET_PROXY@
WORKDIR /skopeo
ARG SKOPEO_VERSION
+# hadolint ignore=DL4006
RUN curl -fsSL "https://github.com/containers/skopeo/archive/v${SKOPEO_VERSION}.tar.gz" \
	| tar -xzf - --strip-components=1
RUN CGO_ENABLED=0 DISABLE_DOCS=1 make BUILDTAGS=containers_image_openpgp GO_DYN_FLAGS=
@ -19,6 +20,7 @@ FROM ${IMAGE_REGISTRY}/ubuntu:@OS_VERSION@

# makedev tries to mknod from postinst
RUN [ -x /usr/bin/systemd-detect-virt ] || ( echo "echo docker" >/usr/bin/systemd-detect-virt && chmod +x /usr/bin/systemd-detect-virt )
+# hadolint ignore=DL3009,SC2046
RUN apt-get update && \
	DEBIAN_FRONTEND=noninteractive \
	apt-get --no-install-recommends -y install \
@ -29,6 +31,7 @@ RUN apt-get update && \
		libc_arch="$gcc_arch" && \
		[ "$gcc_arch" = aarch64 ] && libc_arch=arm64; \
		[ "$gcc_arch" = ppc64le ] && gcc_arch=powerpc64le && libc_arch=ppc64el; \
+		[ "$gcc_arch" = s390x ] && gcc_arch=s390x && libc_arch=s390x; \
		[ "$gcc_arch" = x86_64 ] && gcc_arch=x86-64 && libc_arch=amd64; \
		echo "gcc-$gcc_arch-linux-gnu libc6-dev-$libc_arch-cross")) \
	git \
@ -21,7 +21,13 @@ readonly osbuilder_dir="$(cd "${repo_root_dir}/tools/osbuilder" && pwd)"

export GOPATH=${GOPATH:-${HOME}/go}

-arch_target="$(uname -m)"
+ARCH=${ARCH:-$(uname -m)}
+if [ $(uname -m) == "${ARCH}" ]; then
+	arch_target="$(uname -m)"
+else
+	arch_target="${ARCH}"
+fi

final_artifact_name="kata-containers"
image_initrd_extension=".img"

@ -14,6 +14,7 @@ spec:
      name: kubelet-kata-cleanup
    spec:
      serviceAccountName: kata-deploy-sa
+     hostPID: true
      nodeSelector:
        katacontainers.io/kata-runtime: cleanup
      containers:
@ -38,18 +39,6 @@ spec:
              value: "false"
          securityContext:
            privileged: true
-         volumeMounts:
-           - name: dbus
-             mountPath: /var/run/dbus/system_bus_socket
-           - name: systemd
-             mountPath: /run/systemd/system
-     volumes:
-       - name: dbus
-         hostPath:
-           path: /var/run/dbus/system_bus_socket
-       - name: systemd
-         hostPath:
-           path: /run/systemd/system
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
@ -14,6 +14,7 @@ spec:
      name: kata-deploy
    spec:
      serviceAccountName: kata-deploy-sa
+     hostPID: true
      containers:
      - name: kube-kata
        image: quay.io/kata-containers/kata-deploy-cc:v0
@ -47,10 +48,6 @@ spec:
            mountPath: /etc/containerd/
          - name: kata-artifacts
            mountPath: /opt/kata/
-         - name: dbus
-           mountPath: /var/run/dbus/system_bus_socket
-         - name: systemd
-           mountPath: /run/systemd/system
          - name: local-bin
            mountPath: /usr/local/bin/
      volumes:
@ -64,12 +61,6 @@ spec:
          hostPath:
            path: /opt/kata/
            type: DirectoryOrCreate
-       - name: dbus
-         hostPath:
-           path: /var/run/dbus/system_bus_socket
-       - name: systemd
-         hostPath:
-           path: /run/systemd/system
        - name: local-bin
          hostPath:
            path: /usr/local/bin/
@ -19,6 +19,29 @@ gid=$(id -g ${USER})
http_proxy="${http_proxy:-}"
https_proxy="${https_proxy:-}"

+ARCH=${ARCH:-$(uname -m)}
+CROSS_BUILD=
+BUILDX=""
+PLATFORM=""
+TARGET_ARCH=${TARGET_ARCH:-$(uname -m)}
+[ "$(uname -m)" != "${TARGET_ARCH}" ] && CROSS_BUILD=true
+
+[ "${TARGET_ARCH}" == "aarch64" ] && TARGET_ARCH=arm64
+
+# used for cross build
+TARGET_OS=${TARGET_OS:-linux}
+TARGET_ARCH=${TARGET_ARCH:-$ARCH}
+
+[ "${CROSS_BUILD}" == "true" ] && BUILDX="buildx" && PLATFORM="--platform=${TARGET_OS}/${TARGET_ARCH}"
+if [ "${CROSS_BUILD}" == "true" ]; then
+	# Fail early if the current docker lacks buildx support
+	if ! docker buildx ls > /dev/null 2>&1; then
+		echo "no docker buildx support, please upgrade your docker"
+		exit 1
+	fi
+	# Check whether docker buildx supports TARGET_ARCH; if not, install the binfmt handlers for it
+	r=$(docker buildx ls | grep "${TARGET_ARCH}" || true)
+	[ -z "$r" ] && sudo docker run --privileged --rm tonistiigi/binfmt --install ${TARGET_ARCH}
+fi

if [ "${script_dir}" != "${PWD}" ]; then
	ln -sf "${script_dir}/build" "${PWD}/build"
fi
@ -72,6 +95,9 @@ docker run \
	--env VIRTIOFSD_CONTAINER_BUILDER="${VIRTIOFSD_CONTAINER_BUILDER:-}" \
	--env MEASURED_ROOTFS="${MEASURED_ROOTFS:-}" \
	--env USE_CACHE="${USE_CACHE:-}" \
+	--env CROSS_BUILD="${CROSS_BUILD}" \
+	--env TARGET_ARCH="${TARGET_ARCH}" \
+	--env ARCH="${ARCH}" \
	--rm \
	-w ${script_dir} \
	build-kata-deploy "${kata_deploy_create}" $@
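Putting the pieces together, a cross build for arm64 from an x86_64 host could be started as follows (a sketch: the script name and build target are assumptions based on this diff):

```sh
# CROSS_BUILD is derived automatically because uname -m != TARGET_ARCH
$ TARGET_ARCH=aarch64 ./kata-deploy-binaries-in-docker.sh --build=shim-v2
```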
@ -42,7 +42,7 @@ source "${script_dir}/../../scripts/lib.sh"
readonly jenkins_url="http://jenkins.katacontainers.io"
readonly cached_artifacts_path="lastSuccessfulBuild/artifact/artifacts"

-ARCH=$(uname -m)
+ARCH=${ARCH:-$(uname -m)}
MEASURED_ROOTFS=${MEASURED_ROOTFS:-no}
USE_CACHE="${USE_CACHE:-"yes"}"

@ -368,7 +368,7 @@ install_initrd() {

	local jenkins="${jenkins_url}/job/kata-containers-main-rootfs-${initrd_type}-${ARCH}/${cached_artifacts_path}"
	if [ -n "${variant}" ]; then
-		jenkins="${jenkins_url}/job/kata-containers-2.0-rootfs-initrd-${variant}-cc-$(uname -m)/${cached_artifacts_path}"
+		jenkins="${jenkins_url}/job/kata-containers-2.0-rootfs-initrd-${variant}-cc-${ARCH}/${cached_artifacts_path}"
	fi
	local component="rootfs-${initrd_type}"

@ -392,6 +392,8 @@ install_initrd() {
		version_checker="${osbuilder_last_commit}-${guest_image_last_commit}-${initramfs_last_commit}-${agent_last_commit}-${libs_last_commit}-${attestation_agent_version}-${gperf_version}-${libseccomp_version}-${pause_version}-${rust_version}-${initrd_type}-${AA_KBC}"
	fi

+	[[ "${ARCH}" == "aarch64" && "${CROSS_BUILD}" == "true" ]] && echo "warning: Don't cross build initrd for aarch64 as it's too slow" && exit 0
+
	install_cached_tarball_component \
		"${component}" \
		"${jenkins}" \
@ -438,9 +440,9 @@ install_cached_kernel_tarball_component() {

	# This must only be done as part of the CCv0 branch, as the TDX version of
	# the kernel is not the same as the one used on main
-	local url="${jenkins_url}/job/kata-containers-main-${kernel_name}-$(uname -m)/${cached_artifacts_path}"
+	local url="${jenkins_url}/job/kata-containers-main-${kernel_name}-${ARCH}/${cached_artifacts_path}"
	if [[ "${kernel_name}" == "kernel-tdx-experimental" ]]; then
-		url="${jenkins_url}/job/kata-containers-2.0-kernel-tdx-cc-$(uname -m)/${cached_artifacts_path}"
+		url="${jenkins_url}/job/kata-containers-2.0-kernel-tdx-cc-${ARCH}/${cached_artifacts_path}"
	fi

	install_cached_tarball_component \
@ -588,9 +590,9 @@ install_qemu_helper() {

	# This must only be done as part of the CCv0 branch, as the TDX version of
	# QEMU is not the same as the one used on main
-	local url="${jenkins_url}/job/kata-containers-main-${qemu_name}-$(uname -m)/${cached_artifacts_path}"
+	local url="${jenkins_url}/job/kata-containers-main-${qemu_name}-${ARCH}/${cached_artifacts_path}"
	if [[ "${qemu_name}" == "qemu-tdx-experimental" ]]; then
-		url="${jenkins_url}/job/kata-containers-2.0-qemu-tdx-cc-$(uname -m)/${cached_artifacts_path}"
+		url="${jenkins_url}/job/kata-containers-2.0-qemu-tdx-cc-${ARCH}/${cached_artifacts_path}"
	fi

	install_cached_tarball_component \
@ -713,7 +715,7 @@ install_clh_glibc() {
install_virtiofsd() {
	install_cached_tarball_component \
		"virtiofsd" \
-		"${jenkins_url}/job/kata-containers-main-virtiofsd-$(uname -m)/${cached_artifacts_path}" \
+		"${jenkins_url}/job/kata-containers-main-virtiofsd-${ARCH}/${cached_artifacts_path}" \
		"$(get_from_kata_deps "externals.virtiofsd.version")-$(get_from_kata_deps "externals.virtiofsd.toolchain")" \
		"$(get_virtiofsd_image_name)" \
		"${final_tarball_name}" \
@ -760,7 +762,7 @@ install_shimv2() {

	install_cached_tarball_component \
		"shim-v2" \
-		"${jenkins_url}/job/kata-containers-main-shim-v2-$(uname -m)/${cached_artifacts_path}" \
+		"${jenkins_url}/job/kata-containers-main-shim-v2-${ARCH}/${cached_artifacts_path}" \
		"${shim_v2_version}" \
		"$(get_shim_v2_image_name)" \
		"${final_tarball_name}" \
@ -25,6 +25,10 @@ die() {
	exit 1
}

+# Run systemctl on the host: with hostPID enabled, PID 1 is the host's init,
+# so entering its mount namespace with nsenter reaches the host's systemd
+function host_systemctl() {
+	nsenter --target 1 --mount systemctl "${@}"
+}

function print_usage() {
	echo "Usage: $0 [install/cleanup/reset]"
}
@ -71,11 +75,11 @@ function get_container_runtime() {
		die "invalid node name"
	fi
	if echo "$runtime" | grep -qE 'containerd.*-k3s'; then
-		if systemctl is-active --quiet rke2-agent; then
+		if host_systemctl is-active --quiet rke2-agent; then
			echo "rke2-agent"
-		elif systemctl is-active --quiet rke2-server; then
+		elif host_systemctl is-active --quiet rke2-server; then
			echo "rke2-server"
-		elif systemctl is-active --quiet k3s-agent; then
+		elif host_systemctl is-active --quiet k3s-agent; then
			echo "k3s-agent"
		else
			echo "k3s"
@ -136,8 +140,8 @@ function configure_cri_runtime() {
		configure_containerd
		;;
	esac
-	systemctl daemon-reload
-	systemctl restart "$1"
+	host_systemctl daemon-reload
+	host_systemctl restart "$1"

	wait_till_node_is_ready
}
@ -372,10 +376,10 @@ function cleanup_containerd() {

function reset_runtime() {
	kubectl label node "$NODE_NAME" katacontainers.io/kata-runtime-
-	systemctl daemon-reload
-	systemctl restart "$1"
+	host_systemctl daemon-reload
+	host_systemctl restart "$1"
	if [ "$1" == "crio" ] || [ "$1" == "containerd" ]; then
-		systemctl restart kubelet
+		host_systemctl restart kubelet
	fi

	wait_till_node_is_ready
@ -63,6 +63,8 @@ kernel_url=""
#Linux headers for GPU guest fs module building
linux_headers=""

+CROSS_BUILD_ARG=""
+
MEASURED_ROOTFS=${MEASURED_ROOTFS:-no}

packaging_scripts_dir="${script_dir}/../scripts"
@ -412,7 +414,7 @@ setup_kernel() {

	info "Copying config file from: ${kernel_config_path}"
	cp "${kernel_config_path}" ./.config
-	make oldconfig
+	ARCH=${arch_target} make oldconfig ${CROSS_BUILD_ARG}
	)
}

@ -423,7 +425,7 @@ build_kernel() {
	[ -n "${arch_target}" ] || arch_target="$(uname -m)"
	arch_target=$(arch_to_kernel "${arch_target}")
	pushd "${kernel_path}" >>/dev/null
-	make -j $(nproc ${CI:+--ignore 1}) ARCH="${arch_target}"
+	make -j $(nproc ${CI:+--ignore 1}) ARCH="${arch_target}" ${CROSS_BUILD_ARG}
	if [ "${conf_guest}" == "sev" ]; then
		make -j $(nproc ${CI:+--ignore 1}) INSTALL_MOD_STRIP=1 INSTALL_MOD_PATH=${kernel_path} modules_install
	fi
@ -630,6 +632,8 @@ main() {

	info "Kernel version: ${kernel_version}"

+	[ "${arch_target}" != "" -a "${arch_target}" != $(uname -m) ] && CROSS_BUILD_ARG="CROSS_COMPILE=${arch_target}-linux-gnu-"
+
	case "${subcmd}" in
	build)
		build_kernel "${kernel_path}"
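Concretely, when cross-building an arm64 guest kernel on an x86_64 host, the invocations above effectively expand to something like the following (`arch_to_kernel` maps `aarch64` to the kernel's `arm64` architecture name, while `CROSS_COMPILE` keeps the `aarch64-linux-gnu-` toolchain prefix):

```sh
$ make -j "$(nproc)" ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
```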
@ -1 +1 @@
-111
+112
@ -141,18 +141,18 @@ build reproducibility we publish those container images, and when those are used
of the projects listed as part of the "versions.yaml" file, users can get as close as possible to the environment we
used to build the release artefacts.
* Kernel (on all its different flavours): $(get_kernel_image_name)
-* OVMF (on all its diferent flavours): $(get_ovmf_image_name)
+* OVMF (on all its different flavours): $(get_ovmf_image_name)
* QEMU (on all its different flavours): $(get_qemu_image_name)
* shim-v2: $(get_shim_v2_image_name)
* virtiofsd: $(get_virtiofsd_image_name)

The users who want to rebuild the tarballs using exactly the same images can simply use the following environment
variables:
-* `KERNEL_CONTAINER_BUILDER`
-* `OVMF_CONTAINER_BUILDER`
-* `QEMU_CONTAINER_BUILDER`
-* `SHIM_V2_CONTAINER_BUILDER`
-* `VIRTIOFSD_CONTAINER_BUILDER`
+* \`KERNEL_CONTAINER_BUILDER\`
+* \`OVMF_CONTAINER_BUILDER\`
+* \`QEMU_CONTAINER_BUILDER\`
+* \`SHIM_V2_CONTAINER_BUILDER\`
+* \`VIRTIOFSD_CONTAINER_BUILDER\`

## Kata Linux Containers Kernel
Kata Containers ${runtime_version} suggests using the Linux kernel [${kernel_version}][kernel]
@ -27,6 +27,16 @@ jenkins_url="http://jenkins.katacontainers.io"
# Path where cached artifacts are found.
cached_artifacts_path="lastSuccessfulBuild/artifact/artifacts"

+# For cross build
+CROSS_BUILD=${CROSS_BUILD:-}
+BUILDX=""
+PLATFORM=""
+TARGET_ARCH=${TARGET_ARCH:-$(uname -m)}
+ARCH=${ARCH:-$(uname -m)}
+[ "${TARGET_ARCH}" == "aarch64" ] && TARGET_ARCH=arm64
+TARGET_OS=${TARGET_OS:-linux}
+[ "${CROSS_BUILD}" == "true" ] && BUILDX=buildx && PLATFORM="--platform=${TARGET_OS}/${TARGET_ARCH}"

clone_tests_repo() {
	# KATA_CI_NO_NETWORK is (has to be) ignored if there is
	# no existing clone.
@ -228,7 +238,7 @@ get_ovmf_image_name() {
}

get_virtiofsd_image_name() {
-	ARCH=$(uname -m)
+	ARCH=${ARCH:-$(uname -m)}
	case ${ARCH} in
		"aarch64")
			libc="musl"
@ -5,6 +5,8 @@
FROM ubuntu:22.04
ENV DEBIAN_FRONTEND=noninteractive

+ARG ARCH
+
# kernel deps
RUN apt-get update && \
	apt-get install -y --no-install-recommends \
@ -23,4 +25,5 @@ RUN apt-get update && \
	rsync \
	cpio \
	patch && \
-	apt-get clean && apt-get autoclean
+	if [ "${ARCH}" != "$(uname -m)" ]; then apt-get install --no-install-recommends -y gcc-"${ARCH}"-linux-gnu binutils-"${ARCH}"-linux-gnu; fi && \
+	apt-get clean && apt-get autoclean && rm -rf /var/lib/apt/lists/*
@ -14,12 +14,26 @@ source "${script_dir}/../../scripts/lib.sh"

readonly kernel_builder="${repo_root_dir}/tools/packaging/kernel/build-kernel.sh"

+BUILDX=
+PLATFORM=
+
DESTDIR=${DESTDIR:-${PWD}}
PREFIX=${PREFIX:-/opt/kata}
container_image="${KERNEL_CONTAINER_BUILDER:-$(get_kernel_image_name)}"

+if [ "${CROSS_BUILD}" == "true" ]; then
+	container_image="${container_image}-${ARCH}-cross-build"
+	# Need to build an s390x image due to an issue at
+	# https://github.com/kata-containers/kata-containers/pull/6586#issuecomment-1603189242
+	if [ ${ARCH} == "s390x" ]; then
+		BUILDX="buildx"
+		PLATFORM="--platform=linux/s390x"
+	fi
+fi

sudo docker pull ${container_image} || \
-	(sudo docker build -t "${container_image}" "${script_dir}" && \
+	(sudo docker ${BUILDX} build ${PLATFORM} \
+		--build-arg ARCH=${ARCH} -t "${container_image}" "${script_dir}" && \
	# No-op unless PUSH_TO_REGISTRY is exported as "yes"
	push_to_registry "${container_image}")

@ -27,21 +41,21 @@ sudo docker run --rm -i -v "${repo_root_dir}:${repo_root_dir}" \
	-w "${PWD}" \
	--env MEASURED_ROOTFS="${MEASURED_ROOTFS:-}" \
	"${container_image}" \
-	bash -c "${kernel_builder} $* setup"
+	bash -c "${kernel_builder} -a ${ARCH} $* setup"

sudo docker run --rm -i -v "${repo_root_dir}:${repo_root_dir}" \
	-w "${PWD}" \
	"${container_image}" \
-	bash -c "${kernel_builder} $* build"
+	bash -c "${kernel_builder} -a ${ARCH} $* build"

sudo docker run --rm -i -v "${repo_root_dir}:${repo_root_dir}" \
	-w "${PWD}" \
	--env DESTDIR="${DESTDIR}" --env PREFIX="${PREFIX}" \
	"${container_image}" \
-	bash -c "${kernel_builder} $* install"
+	bash -c "${kernel_builder} -a ${ARCH} $* install"

sudo docker run --rm -i -v "${repo_root_dir}:${repo_root_dir}" \
	-w "${PWD}" \
	--env DESTDIR="${DESTDIR}" --env PREFIX="${PREFIX}" \
	"${container_image}" \
-	bash -c "${kernel_builder} $* build-headers"
+	bash -c "${kernel_builder} -a ${ARCH} $* build-headers"