Compare commits

...

36 Commits

Author SHA1 Message Date
Fabiano Fidêncio
0ad6f05dee Merge pull request #4024 from bergwolf/2.4.0-branch-bump
# Kata Containers 2.4.0
2022-04-01 13:46:35 +02:00
Peng Tao
4c9c01a124 release: Kata Containers 2.4.0
- stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
- stable-2.4 | kata-monitor: fix duplicated output when printing usage
- stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
- kata-deploy: fix version bump from -rc to stable
- stable-2.4: release: Include all the rust vendored code into the vendored tarball
- stable-2.4 | tools: release: Do not consider release candidates as stable releases
- agent: Signal the whole process group
- stable-2.4 | docs: Update k8s documentation
- backport main commits to stable 2.4
- stable-2.4: Bump QEMU to 6.2 (bringing SGX support in)
- runtime: Properly handle ESRCH error when signaling container
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1

f2319d69 release: Adapt kata-deploy for 2.4.0
cae48e9c agent: fix container stop error with signal SIGRTMIN+3
342aa95c kata-monitor: fix duplicated output when printing usage
9f75e226 runtime: add logs around sandbox monitor
363fbed8 runtime: stop getting OOM events when ttrpc: closed error
f840de5a workflows,release: Ship *all* the rust vendored code
952cea5f tools: Add a generate_vendor.sh script
cc965fa0 kata-deploy: fix version bump from -rc to stable
f41cc184 tools: release: Do not consider release candidates as stable releases
e059b50f runtime: Add more debug logs for container io stream copy
71ce6f53 agent: Kill all the container processes of the same cgroup
30fc2c86 docs: Update k8s documentation
24028969 virtcontainers: Run mock hook from build tree rather than system bin dir
4e54aa5a doc: fix filename typo
d815393c manager: Add options to change self test behaviour
4111e1a3 manager: Add option to enable component debug
2918be18 manager: Create containerd link
6b31b068 kernel: fix cve-2022-0847
5589b246 doc: update Intel SGX use cases document
1da88dca tools: update QEMU to 6.2
3e2f9223 runtime: Properly handle ESRCH error when signaling container
4c21cb3e versions: Upgrade to Cloud Hypervisor v22.1

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Peng Tao
f2319d693d release: Adapt kata-deploy for 2.4.0
kata-deploy files must be adapted for a new release. The cases where
this happens are when the release goes from -> to:
* main -> stable:
  * kata-deploy-stable / kata-cleanup-stable: are removed

* stable -> stable:
  * kata-deploy / kata-cleanup: bump the release to the new one.

There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.

Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-04-01 06:20:20 +00:00
Bin Liu
98ccf8f6a1 Merge pull request #4008 from wxx213/stable-2.4
stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
2022-04-01 11:29:18 +08:00
Wang Xingxing
cae48e9c9b agent: fix container stop error with signal SIGRTMIN+3
The nix::sys::signal::Signal API cannot deal with SIGRTMIN+3, so use
the libc function directly to send the signal.

Fixes: #3990

Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
2022-03-31 16:49:06 +08:00
snir911
a36103c759 Merge pull request #4003 from fgiudici/kata-monitor_fix_help_backport
stable-2.4 | kata-monitor: fix duplicated output when printing usage
2022-03-30 18:57:17 +03:00
Fabiano Fidêncio
6abbcc551c Merge pull request #3997 from liubin/backport-2.4
stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
2022-03-30 14:08:55 +02:00
Francesco Giudici
342aa95cc8 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

The golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it is not printed
on screen twice.
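
A minimal standalone sketch of the duplication (stdlib only, not the actual kata-monitor source):

```go
package main

import "flag"

func main() {
	// flag.PrintDefaults appends `(default "...")` to every flag's help
	// text on its own, so embedding the default in the usage string (as
	// kata-monitor did) makes it appear twice. Dropping the embedded
	// "(default: ...)" text is the whole fix.
	flag.String("runtime-endpoint", "/run/containerd/containerd.sock",
		`Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock")`)
	flag.Usage()
}
```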

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit a63bbf9793)
2022-03-30 14:02:54 +02:00
bin
9f75e226f1 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:40 +08:00
bin
363fbed804 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call that retries when it fails. For
cases of agent shutdown, the retry should stop.

Even when the runtime hasn't yet detected that the agent has died, we
can check whether the error is "ttrpc: closed".
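
The shape of the fix, distilled into a runnable sketch (the getOOMEvent stub here stands in for the real sandbox.GetOOMEvent call):

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Stand-in for sandbox.GetOOMEvent: once the agent connection drops,
// the ttrpc transport keeps returning this error on every retry.
func getOOMEvent() (string, error) {
	return "", errors.New("ttrpc: closed")
}

func watchOOMEvents() {
	for {
		containerID, err := getOOMEvent()
		if err != nil {
			// "Dead agent" means the runtime already marked the agent dead;
			// "ttrpc: closed" is what the transport reports when the
			// connection went away before that detection. Retrying is
			// pointless in either case, so stop the loop.
			if err.Error() == "ttrpc: closed" || err.Error() == "Dead agent" {
				fmt.Println("agent has shutdown, stop watching OOM events:", err)
				return
			}
			time.Sleep(time.Second)
			continue
		}
		fmt.Println("OOM event for container", containerID)
	}
}

func main() { watchOOMEvents() }
```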

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-30 17:11:35 +08:00
Fabiano Fidêncio
54a638317a Merge pull request #3988 from bergwolf/github/kata-deploy
kata-deploy: fix version bump from -rc to stable
2022-03-30 11:01:45 +02:00
Peng Tao
8ce6b12b41 Merge pull request #3993 from fidencio/wip/stable-2.4-release-include-all-rust-vendored-code-to-the-vendored-tarball
stable-2.4: release: Include all the rust vendored code into the vendored tarball
2022-03-30 16:10:47 +08:00
Fabiano Fidêncio
f840de5acb workflows,release: Ship *all* the rust vendored code
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced generate_vendor.sh script.

Fixes: #3973

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3606923ac8)
2022-03-29 23:27:43 +02:00
Fabiano Fidêncio
952cea5f5d tools: Add a generate_vendor.sh script
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers in a
disconnected environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2eb07455d0)
2022-03-29 23:27:29 +02:00
Peng Tao
cc965fa0cb kata-deploy: fix version bump from -rc to stable
In that case, we should bump from the "latest" tag rather than from
current_version.

Fixes: #3986
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
2022-03-29 03:45:27 +00:00
GabyCT
44b1473d0c Merge pull request #3977 from fidencio/wip/backport-fix-for-3847
stable-2.4 | tools: release: Do not consider release candidates as stable releases
2022-03-28 10:38:47 -06:00
Fupan Li
565efd1bf2 Merge pull request #3975 from bergwolf/github/backport-stable-2.4
agent: Signal the whole process group
2022-03-28 18:26:12 +08:00
Fabiano Fidêncio
f41cc18427 tools: release: Do not consider release candidates as stable releases
During the release of 2.4.0-rc0 @egernst noticed an inconsistency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider them as "latest".

Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside
the scope of this quick fix.

For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and the kata-deploy
test, and tag "-rc" as latest, regardless of which branch it's coming
from.

Fixes: #3847

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4adf93ef2c)
2022-03-28 11:01:58 +02:00
Feng Wang
e059b50f5c runtime: Add more debug logs for container io stream copy
This can help with debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:22:22 +08:00
Feng Wang
71ce6f537f agent: Kill all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit
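
The agent implements this in Rust through its cgroup manager (see the freeze_cgroup/get_pids hunks further down); a minimal Go sketch of the same freeze -> signal -> thaw pattern, assuming a cgroup v1 freezer hierarchy and a hypothetical cgroup path:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

// killCgroup signals every process listed in a cgroup v1 freezer cgroup.
// Freezing first prevents processes from forking between the moment the
// pid list is read and the moment the signal is delivered.
func killCgroup(cgroupPath string, sig syscall.Signal) error {
	state := filepath.Join(cgroupPath, "freezer.state")
	if err := os.WriteFile(state, []byte("FROZEN"), 0o644); err != nil {
		return err
	}
	// Always thaw again, even if signaling fails part-way.
	defer os.WriteFile(state, []byte("THAWED"), 0o644)

	procs, err := os.ReadFile(filepath.Join(cgroupPath, "cgroup.procs"))
	if err != nil {
		return err
	}
	for _, field := range strings.Fields(string(procs)) {
		pid, err := strconv.Atoi(field)
		if err != nil {
			continue
		}
		// ESRCH just means the process exited already; anything else is
		// worth logging, as the agent does.
		if err := syscall.Kill(pid, sig); err != nil && err != syscall.ESRCH {
			fmt.Fprintf(os.Stderr, "signal pid %d failed: %v\n", pid, err)
		}
	}
	return nil
}

func main() {
	// Hypothetical path; the real agent resolves it via its cgroup manager.
	if err := killCgroup("/sys/fs/cgroup/freezer/kata/mycontainer", syscall.SIGKILL); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```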

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-28 16:21:51 +08:00
Bin Liu
a2b73b60bd Merge pull request #3960 from cmaf/update-k8s-docs-1-stable-2.4
stable-2.4 | docs: Update k8s documentation
2022-03-25 15:25:25 +08:00
Bin Liu
2ce9ce7b8f Merge pull request #3954 from bergwolf/github/backport-stable-2.4
backport main commits to stable 2.4
2022-03-25 14:45:17 +08:00
Chelsea Mafrica
30fc2c863d docs: Update k8s documentation
Update the documentation with the missing step to untaint the node to
enable scheduling, and update the example to run a pod using the kata
runtime class instead of untrusted workloads, which applies to versions
of CRI-O prior to v1.12.

Fixes #3863

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit 5c434270d1)
2022-03-24 11:22:18 -07:00
David Gibson
24028969c2 virtcontainers: Run mock hook from build tree rather than system bin dir
Unit tests should generally have minimal dependencies on things outside
the build tree.  They *definitely* shouldn't modify system-wide things
outside the build tree.  Currently the runtime "make test" target does
so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-24 12:02:00 +08:00
Garrett Mahin
4e54aa5a7b doc: fix filename typo
Corrects a filename typo in the cluster cleanup section of the
kata-deploy README.md

Fixes: #3869
Signed-off-by: Garrett Mahin <garrett.mahin@gmail.com>
2022-03-24 12:00:17 +08:00
James O. D. Hunt
d815393c3e manager: Add options to change self test behaviour
Added new `kata-manager` options to control the self-test behaviour. By
default, after installation the manager will run a test to ensure a Kata
Containers container can be created. New options allow:

- The self test to be disabled.
- Only the self test to be run (no installation).

These features allow changes to be made to the installed system before
the self test is run.

Fixes: #3851.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:48 +08:00
James O. D. Hunt
4111e1a3de manager: Add option to enable component debug
Added a `-d` option to `kata-manager` to enable Kata Containers
and containerd debug.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:33 +08:00
James O. D. Hunt
2918be180f manager: Create containerd link
Make the `kata-manager` create a `containerd` link to ensure the
downloaded containerd systemd service file can find the daemon when
using the GitHub packaged version of containerd.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-24 11:59:26 +08:00
Julio Montes
6b31b06832 kernel: fix cve-2022-0847
bump guest kernel version to fix cve-2022-0847 "Dirty Pipe"

fixes #3852

Signed-off-by: Julio Montes <julio.montes@intel.com>
2022-03-24 11:58:43 +08:00
Fabiano Fidêncio
53a9cf7dc4 Merge pull request #3927 from fidencio/stable-2.4/qemu-bump
stable-2.4: Bump QEMU to 6.2 (bringing SGX support in)
2022-03-23 07:20:35 +01:00
Julio Montes
5589b246d7 doc: update Intel SGX use cases document
The Installation section is no longer needed because the latest
default Kata kernel supports Intel SGX.
Include QEMU in the list of supported hypervisors.

fixes #3911

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 24b29310b2)
2022-03-22 08:36:04 +01:00
Julio Montes
1da88dca4b tools: update QEMU to 6.2
bring in Intel SGX support

Changes that may impact Kata Containers:

Arm:
- The 'virt' machine now supports an emulated ITS
- The 'virt' machine now supports more than 123 CPUs in TCG emulation mode
- The pl031 real-time clock device now supports sending RTC_CHANGE QMP events

PowerPC:
- Improved POWER10 support for the 'powernv' machine
- Initial support for POWER10 DD2.0 CPU added
- Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine type

s390x:
- Improved storage key emulation (e.g. fixed address handling, lazy storage key enablement for TCG, ...)
- New gen16 CPU features are now enabled automatically in the latest machine type

KVM:
- Support for SGX in the virtual machine, using the /dev/sgx_vepc device on the host and the "memory-backend-epc" backend in QEMU.
- New "hv-apicv" CPU property (aliased to "hv-avic") sets the HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.

virtio-mem:
- QEMU now fully supports guest memory dumps with virtio-mem.
- QEMU now cleanly supports precopy migration, postcopy migration and background snapshots with virtio-mem.

fixes #3902

Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 18d4d7fb1d)
2022-03-22 08:35:45 +01:00
Peng Tao
8cc2231818 Merge pull request #3892 from fengwang666/my_2.4_pr_backport
runtime: Properly handle ESRCH error when signaling container
2022-03-15 10:11:25 +08:00
GabyCT
63c1498f05 Merge pull request #3891 from likebreath/stable-2.4
stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1
2022-03-14 17:44:09 -06:00
Feng Wang
3e2f9223b0 runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate the ESRCH error, causing the
container to fail to stop and the shim to leak.
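
A distilled sketch of the swallow logic (the signalProcess stub stands in for the agent RPC; the real change in Container.signalProcess matches on the error text, as the container.go hunk further down shows):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Stand-in for the agent call: the shim only sees the failure as an
// error string, not as a raw syscall.ESRCH value.
func signalProcess() error {
	return errors.New("ESRCH: No such process")
}

func signalContainer() error {
	err := signalProcess()
	if err != nil && strings.Contains(err.Error(), "ESRCH: No such process") {
		// The process already exited: report success so containerd / CRI-O
		// can finish stopping the container instead of leaking the shim.
		fmt.Println("signal encountered ESRCH, process already finished")
		return nil
	}
	return err
}

func main() {
	if err := signalContainer(); err != nil {
		fmt.Println("stop failed:", err)
	}
}
```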

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aa5ae6b17c)
2022-03-14 13:15:54 -07:00
Bo Chen
4c21cb3eb1 versions: Upgrade to Cloud Hypervisor v22.1
This is a bug fix release. The following issues have been addressed:
1) VFIO ioctl reordering to fix MSI on AMD platforms; 2) Fix virtio-net
control queue.

Details can be found at: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v22.1

Fixes: #3872

Signed-off-by: Bo Chen <chen.bo@intel.com>
(cherry picked from commit 7a18e32fa7)
2022-03-14 12:34:31 -07:00
28 changed files with 504 additions and 123 deletions

View File

@@ -140,13 +140,10 @@ jobs:
- uses: actions/checkout@v2
- name: generate-and-upload-tarball
run: |
pushd $GITHUB_WORKSPACE/src/agent
cargo vendor >> .cargo/config
popd
tag=$(echo $GITHUB_REF | cut -d/ -f3-)
tarball="kata-containers-$tag-vendor.tar.gz"
pushd $GITHUB_WORKSPACE
tar -cvzf "${tarball}" src/agent/.cargo/config src/agent/vendor
bash -c "tools/packaging/release/generate_vendor.sh ${tarball}"
GITHUB_TOKEN=${{ secrets.GIT_UPLOAD_TOKEN }} hub release edit -m "" -a "${tarball}" "${tag}"
popd

View File

@@ -1 +1 @@
2.4.0-rc0
2.4.0

View File

@@ -104,26 +104,69 @@ $ sudo kubeadm init --ignore-preflight-errors=all --cri-socket /run/containerd/c
$ export KUBECONFIG=/etc/kubernetes/admin.conf
```
You can force Kubelet to use Kata Containers by adding some `untrusted`
annotation to your pod configuration. In our case, this ensures Kata
Containers is the selected runtime to run the described workload.
### Allow pods to run in the master node
`nginx-untrusted.yaml`
```yaml
apiVersion: v1
kind: Pod
By default, the cluster will not schedule pods in the master node. To enable master node scheduling:
```bash
$ sudo -E kubectl taint nodes --all node-role.kubernetes.io/master-
```
### Create runtime class for Kata Containers
Users can use [`RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/#runtime-class) to specify a different runtime for Pods.
```bash
$ cat > runtime.yaml <<EOF
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: nginx-untrusted
annotations:
io.kubernetes.cri.untrusted-workload: "true"
spec:
containers:
name: kata
handler: kata
EOF
$ sudo -E kubectl apply -f runtime.yaml
```
### Run pod in Kata Containers
If a pod has the `runtimeClassName` set to `kata`, the CRI plugin runs the pod with the
[Kata Containers runtime](../../src/runtime/README.md).
- Create a pod configuration that uses the Kata Containers runtime
```bash
$ cat << EOF | tee nginx-kata.yaml
apiVersion: v1
kind: Pod
metadata:
name: nginx-kata
spec:
runtimeClassName: kata
containers:
- name: nginx
image: nginx
```
Next, you run your pod:
```
$ sudo -E kubectl apply -f nginx-untrusted.yaml
```
EOF
```
- Create the pod
```bash
$ sudo -E kubectl apply -f nginx-kata.yaml
```
- Check pod is running
```bash
$ sudo -E kubectl get pods
```
- Check hypervisor is running
```bash
$ ps aux | grep qemu
```
### Delete created pod
```bash
$ sudo -E kubectl delete -f nginx-kata.yaml
```

View File

@@ -21,20 +21,7 @@ CONFIG_X86_SGX_KVM=y
* [Intel SGX Kubernetes device plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#deploying-with-pre-built-images)
> Note: Kata Containers supports creating VM sandboxes with Intel® SGX enabled
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) VMM only. QEMU support is waiting to get the
> Intel SGX enabled QEMU upstream release.
## Installation
### Kata Containers Guest Kernel
Follow the instructions to [setup](../../tools/packaging/kernel/README.md#setup-kernel-source-code) and [build](../../tools/packaging/kernel/README.md#build-the-kernel) the experimental guest kernel. Then, install as:
```sh
$ sudo cp kata-linux-experimental-*/vmlinux /opt/kata/share/kata-containers/vmlinux.sgx
$ sudo sed -i 's|vmlinux.container|vmlinux.sgx|g' \
/opt/kata/share/defaults/kata-containers/configuration-clh.toml
```
> using [cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) and [QEMU](https://www.qemu.org/) VMMs only.
### Kata Containers Configuration
@@ -48,6 +35,8 @@ to the `sandbox` are: `["io.katacontainers.*", "sgx.intel.com/epc"]`.
With the following sample job deployed using `kubectl apply -f`:
> Note: Change the `runtimeClassName` option accordingly, only `kata-clh` and `kata-qemu` support Intel® SGX.
```yaml
apiVersion: batch/v1
kind: Job

View File

@@ -8,8 +8,8 @@ use std::fs::File;
use std::os::unix::io::RawFd;
use tokio::sync::mpsc::Sender;
use nix::errno::Errno;
use nix::fcntl::{fcntl, FcntlArg, OFlag};
use nix::sys::signal::{self, Signal};
use nix::sys::wait::{self, WaitStatus};
use nix::unistd::{self, Pid};
use nix::Result;
@@ -80,7 +80,7 @@ pub struct Process {
pub trait ProcessOperations {
fn pid(&self) -> Pid;
fn wait(&self) -> Result<WaitStatus>;
fn signal(&self, sig: Signal) -> Result<()>;
fn signal(&self, sig: libc::c_int) -> Result<()>;
}
impl ProcessOperations for Process {
@@ -92,8 +92,10 @@ impl ProcessOperations for Process {
wait::waitpid(Some(self.pid()), None)
}
fn signal(&self, sig: Signal) -> Result<()> {
signal::kill(self.pid(), Some(sig))
fn signal(&self, sig: libc::c_int) -> Result<()> {
let res = unsafe { libc::kill(self.pid().into(), sig) };
Errno::result(res).map(drop)
}
}
@@ -281,6 +283,6 @@ mod tests {
// signal to every process in the process
// group of the calling process.
process.pid = 0;
assert!(process.signal(Signal::SIGCONT).is_ok());
assert!(process.signal(libc::SIGCONT).is_ok());
}
}

View File

@@ -19,6 +19,7 @@ use ttrpc::{
};
use anyhow::{anyhow, Context, Result};
use cgroups::freezer::FreezerState;
use oci::{LinuxNamespace, Root, Spec};
use protobuf::{Message, RepeatedField, SingularPtrField};
use protocols::agent::{
@@ -39,9 +40,9 @@ use rustjail::specconv::CreateOpts;
use nix::errno::Errno;
use nix::mount::MsFlags;
use nix::sys::signal::Signal;
use nix::sys::stat;
use nix::unistd::{self, Pid};
use rustjail::cgroups::Manager;
use rustjail::process::ProcessOperations;
use sysinfo::{DiskExt, System, SystemExt};
@@ -69,7 +70,6 @@ use tracing_opentelemetry::OpenTelemetrySpanExt;
use tracing::instrument;
use libc::{self, c_char, c_ushort, pid_t, winsize, TIOCSWINSZ};
use std::convert::TryFrom;
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::os::unix::prelude::PermissionsExt;
@@ -389,7 +389,6 @@ impl AgentService {
let cid = req.container_id.clone();
let eid = req.exec_id.clone();
let s = self.sandbox.clone();
let mut sandbox = s.lock().await;
info!(
sl!(),
@@ -398,27 +397,93 @@ impl AgentService {
"exec-id" => eid.clone(),
);
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
let mut signal = Signal::try_from(req.signal as i32).map_err(|e| {
anyhow!(e).context(format!(
"failed to convert {:?} to signal (container-id: {}, exec-id: {})",
req.signal, cid, eid
))
})?;
// For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
// it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
// instead of "SIGTERM" to terminate it.
if p.init && signal == Signal::SIGTERM && !is_signal_handled(p.pid, req.signal) {
signal = Signal::SIGKILL;
let mut sig: libc::c_int = req.signal as libc::c_int;
{
let mut sandbox = s.lock().await;
let p = sandbox.find_container_process(cid.as_str(), eid.as_str())?;
// For container initProcess, if it hasn't installed handler for "SIGTERM" signal,
// it will ignore the "SIGTERM" signal sent to it, thus send it "SIGKILL" signal
// instead of "SIGTERM" to terminate it.
if p.init && sig == libc::SIGTERM && !is_signal_handled(p.pid, sig as u32) {
sig = libc::SIGKILL;
}
p.signal(sig)?;
}
p.signal(signal)?;
if eid.is_empty() {
// eid is empty, signal all the remaining processes in the container cgroup
info!(
sl!(),
"signal all the remaining processes";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
);
if let Err(err) = self.freeze_cgroup(&cid, FreezerState::Frozen).await {
warn!(
sl!(),
"freeze cgroup failed";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
"error" => format!("{:?}", err),
);
}
let pids = self.get_pids(&cid).await?;
for pid in pids.iter() {
let res = unsafe { libc::kill(*pid, sig) };
if let Err(err) = Errno::result(res).map(drop) {
warn!(
sl!(),
"signal failed";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
"pid" => pid,
"error" => format!("{:?}", err),
);
}
}
if let Err(err) = self.freeze_cgroup(&cid, FreezerState::Thawed).await {
warn!(
sl!(),
"unfreeze cgroup failed";
"container-id" => cid.clone(),
"exec-id" => eid.clone(),
"error" => format!("{:?}", err),
);
}
}
Ok(())
}
async fn freeze_cgroup(&self, cid: &str, state: FreezerState) -> Result<()> {
let s = self.sandbox.clone();
let mut sandbox = s.lock().await;
let ctr = sandbox
.get_container(cid)
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
let cm = ctr
.cgroup_manager
.as_ref()
.ok_or_else(|| anyhow!("cgroup manager not exist"))?;
cm.freeze(state)?;
Ok(())
}
async fn get_pids(&self, cid: &str) -> Result<Vec<i32>> {
let s = self.sandbox.clone();
let mut sandbox = s.lock().await;
let ctr = sandbox
.get_container(cid)
.ok_or_else(|| anyhow!("Invalid container id {}", cid))?;
let cm = ctr
.cgroup_manager
.as_ref()
.ok_or_else(|| anyhow!("cgroup manager not exist"))?;
let pids = cm.get_pids()?;
Ok(pids)
}
#[instrument]
async fn do_wait_process(
&self,

View File

@@ -589,12 +589,10 @@ $(GENERATED_FILES): %: %.in $(MAKEFILE_LIST) VERSION .git-commit
generate-config: $(CONFIGS)
test: install-hook go-test
test: hook go-test
install-hook:
hook:
make -C virtcontainers hook
echo "installing mock hook"
sudo -E make -C virtcontainers install
go-test: $(GENERATED_FILES)
go clean -testcache

View File

@@ -21,7 +21,7 @@ import (
const defaultListenAddress = "127.0.0.1:8090"
var monitorListenAddr = flag.String("listen-address", defaultListenAddress, "The address to listen on for HTTP requests.")
var runtimeEndpoint = flag.String("runtime-endpoint", "/run/containerd/containerd.sock", `Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock")`)
var runtimeEndpoint = flag.String("runtime-endpoint", "/run/containerd/containerd.sock", "Endpoint of CRI container runtime service.")
var logLevel = flag.String("log-level", "info", "Log level of logrus(trace/debug/info/warn/error/fatal/panic).")
// These values are overridden via ldflags

View File

@@ -776,6 +776,8 @@ func (s *service) Kill(ctx context.Context, r *taskAPI.KillRequest) (_ *ptypes.E
return empty, errors.New("The exec process does not exist")
}
processStatus = execs.status
} else {
r.All = true
}
// According to CRI specs, kubelet will call StopPodSandbox()

View File

@@ -8,12 +8,14 @@ package containerdshim
import (
"context"
"fmt"
"github.com/sirupsen/logrus"
"github.com/containerd/containerd/api/types/task"
"github.com/kata-containers/kata-containers/src/runtime/pkg/katautils"
)
func startContainer(ctx context.Context, s *service, c *container) (retErr error) {
shimLog.WithField("container", c.id).Debug("start container")
defer func() {
if retErr != nil {
// notify the wait goroutine to continue
@@ -78,7 +80,8 @@ func startContainer(ctx context.Context, s *service, c *container) (retErr error
return err
}
c.ttyio = tty
go ioCopy(c.exitIOch, c.stdinCloser, tty, stdin, stdout, stderr)
go ioCopy(shimLog.WithField("container", c.id), c.exitIOch, c.stdinCloser, tty, stdin, stdout, stderr)
} else {
// close the io exit channel, since there is no io for this container,
// otherwise the following wait goroutine will hang on this channel.
@@ -94,6 +97,10 @@ func startContainer(ctx context.Context, s *service, c *container) (retErr error
}
func startExec(ctx context.Context, s *service, containerID, execID string) (e *exec, retErr error) {
shimLog.WithFields(logrus.Fields{
"container": containerID,
"exec": execID,
}).Debug("start container execution")
// start an exec
c, err := s.getContainer(containerID)
if err != nil {
@@ -140,7 +147,10 @@ func startExec(ctx context.Context, s *service, containerID, execID string) (e *
}
execs.ttyio = tty
go ioCopy(execs.exitIOch, execs.stdinCloser, tty, stdin, stdout, stderr)
go ioCopy(shimLog.WithFields(logrus.Fields{
"container": c.id,
"exec": execID,
}), execs.exitIOch, execs.stdinCloser, tty, stdin, stdout, stderr)
go wait(ctx, s, c, execID)

View File

@@ -12,6 +12,7 @@ import (
"syscall"
"github.com/containerd/fifo"
"github.com/sirupsen/logrus"
)
// The buffer size used to specify the buffer for IO streams copy
@@ -86,18 +87,20 @@ func newTtyIO(ctx context.Context, stdin, stdout, stderr string, console bool) (
return ttyIO, nil
}
func ioCopy(exitch, stdinCloser chan struct{}, tty *ttyIO, stdinPipe io.WriteCloser, stdoutPipe, stderrPipe io.Reader) {
func ioCopy(shimLog *logrus.Entry, exitch, stdinCloser chan struct{}, tty *ttyIO, stdinPipe io.WriteCloser, stdoutPipe, stderrPipe io.Reader) {
var wg sync.WaitGroup
if tty.Stdin != nil {
wg.Add(1)
go func() {
shimLog.Debug("stdin io stream copy started")
p := bufPool.Get().(*[]byte)
defer bufPool.Put(p)
io.CopyBuffer(stdinPipe, tty.Stdin, *p)
// notify that we can close process's io safely.
close(stdinCloser)
wg.Done()
shimLog.Debug("stdin io stream copy exited")
}()
}
@@ -105,6 +108,7 @@ func ioCopy(exitch, stdinCloser chan struct{}, tty *ttyIO, stdinPipe io.WriteClo
wg.Add(1)
go func() {
shimLog.Debug("stdout io stream copy started")
p := bufPool.Get().(*[]byte)
defer bufPool.Put(p)
io.CopyBuffer(tty.Stdout, stdoutPipe, *p)
@@ -113,20 +117,24 @@ func ioCopy(exitch, stdinCloser chan struct{}, tty *ttyIO, stdinPipe io.WriteClo
// close stdin to make the other routine stop
tty.Stdin.Close()
}
shimLog.Debug("stdout io stream copy exited")
}()
}
if tty.Stderr != nil && stderrPipe != nil {
wg.Add(1)
go func() {
shimLog.Debug("stderr io stream copy started")
p := bufPool.Get().(*[]byte)
defer bufPool.Put(p)
io.CopyBuffer(tty.Stderr, stderrPipe, *p)
wg.Done()
shimLog.Debug("stderr io stream copy exited")
}()
}
wg.Wait()
tty.close()
close(exitch)
shimLog.Debug("all io stream copy goroutines exited")
}

View File

@@ -7,6 +7,7 @@ package containerdshim
import (
"context"
"github.com/sirupsen/logrus"
"io"
"os"
"path/filepath"
@@ -179,7 +180,7 @@ func TestIoCopy(t *testing.T) {
defer tty.close()
// start the ioCopy threads : copy from src to dst
go ioCopy(exitioch, stdinCloser, tty, dstInW, srcOutR, srcErrR)
go ioCopy(logrus.WithContext(context.Background()), exitioch, stdinCloser, tty, dstInW, srcOutR, srcErrR)
var firstW, secondW, thirdW io.WriteCloser
var firstR, secondR, thirdR io.Reader

View File

@@ -15,7 +15,6 @@ import (
"github.com/containerd/containerd/api/types/task"
"github.com/containerd/containerd/mount"
"github.com/sirupsen/logrus"
"google.golang.org/grpc/codes"
"github.com/kata-containers/kata-containers/src/runtime/pkg/oci"
)
@@ -31,12 +30,17 @@ func wait(ctx context.Context, s *service, c *container, execID string) (int32,
if execID == "" {
//wait until the io closed, then wait the container
<-c.exitIOch
shimLog.WithField("container", c.id).Debug("The container io streams closed")
} else {
execs, err = c.getExec(execID)
if err != nil {
return exitCode255, err
}
<-execs.exitIOch
shimLog.WithFields(logrus.Fields{
"container": c.id,
"exec": execID,
}).Debug("The container process io streams closed")
//This wait could be triggered before exec start which
//will get the exec's id, thus this assignment must after
//the exec exit, to make sure it get the exec's id.
@@ -63,6 +67,7 @@ func wait(ctx context.Context, s *service, c *container, execID string) (int32,
if c.cType.IsSandbox() {
// cancel watcher
if s.monitor != nil {
shimLog.WithField("sandbox", s.sandbox.ID()).Info("cancel watcher")
s.monitor <- nil
}
if err = s.sandbox.Stop(ctx, true); err != nil {
@@ -82,13 +87,17 @@ func wait(ctx context.Context, s *service, c *container, execID string) (int32,
c.exitTime = timeStamp
c.exitCh <- uint32(ret)
shimLog.WithField("container", c.id).Debug("The container status is StatusStopped")
} else {
execs.status = task.StatusStopped
execs.exitCode = ret
execs.exitTime = timeStamp
execs.exitCh <- uint32(ret)
shimLog.WithFields(logrus.Fields{
"container": c.id,
"exec": execID,
}).Debug("The container exec status is StatusStopped")
}
s.mu.Unlock()
@@ -102,6 +111,7 @@ func watchSandbox(ctx context.Context, s *service) {
return
}
err := <-s.monitor
shimLog.WithError(err).WithField("sandbox", s.sandbox.ID()).Info("watchSandbox gets an error or stop signal")
if err == nil {
return
}
@@ -147,13 +157,11 @@ func watchOOMEvents(ctx context.Context, s *service) {
default:
containerID, err := s.sandbox.GetOOMEvent(ctx)
if err != nil {
shimLog.WithError(err).Warn("failed to get OOM event from sandbox")
// If the GetOOMEvent call is not implemented, then the agent is most likely an older version,
// stop attempting to get OOM events.
// for rust agent, the response code is not found
if isGRPCErrorCode(codes.NotFound, err) || err.Error() == "Dead agent" {
if err.Error() == "ttrpc: closed" || err.Error() == "Dead agent" {
shimLog.WithError(err).Warn("agent has shutdown, return from watching of OOM events")
return
}
shimLog.WithError(err).Warn("failed to get OOM event from sandbox")
time.Sleep(defaultCheckInterval)
continue
}

View File

@@ -20,7 +20,7 @@ import (
var testKeyHook = "test-key"
var testContainerIDHook = "test-container-id"
var testControllerIDHook = "test-controller-id"
var testBinHookPath = "/usr/bin/virtcontainers/bin/test/hook"
var testBinHookPath = "../../virtcontainers/hook/mock/hook"
var testBundlePath = "/test/bundle"
func getMockHookBinPath() string {

View File

@@ -12,6 +12,7 @@ import (
"os"
"path/filepath"
"strconv"
"strings"
"syscall"
"time"
@@ -1060,7 +1061,18 @@ func (c *Container) signalProcess(ctx context.Context, processID string, signal
return fmt.Errorf("Container not ready, running or paused, impossible to signal the container")
}
return c.sandbox.agent.signalProcess(ctx, c, processID, signal, all)
// kill(2) method can return ESRCH in certain cases, which is not handled by containerd cri server in container_stop.go.
// CRIO server also doesn't handle ESRCH. So kata runtime will swallow it here.
var err error
if err = c.sandbox.agent.signalProcess(ctx, c, processID, signal, all); err != nil &&
strings.Contains(err.Error(), "ESRCH: No such process") {
c.Logger().WithFields(logrus.Fields{
"container": c.id,
"process-id": processID,
}).Warn("signal encounters ESRCH, process already finished")
return nil
}
return err
}
func (c *Container) winsizeProcess(ctx context.Context, processID string, height, width uint32) error {

View File

@@ -18,6 +18,8 @@ const (
watcherChannelSize = 128
)
var monitorLog = virtLog.WithField("subsystem", "virtcontainers/monitor")
// nolint: govet
type monitor struct {
watchers []chan error
@@ -33,6 +35,9 @@ type monitor struct {
}
func newMonitor(s *Sandbox) *monitor {
// there should only be one monitor for one sandbox,
// so it's safe to let monitorLog as a global variable.
monitorLog = monitorLog.WithField("sandbox", s.ID())
return &monitor{
sandbox: s,
checkInterval: defaultCheckInterval,
@@ -72,6 +77,7 @@ func (m *monitor) newWatcher(ctx context.Context) (chan error, error) {
}
func (m *monitor) notify(ctx context.Context, err error) {
monitorLog.WithError(err).Warn("notify on errors")
m.sandbox.agent.markDead(ctx)
m.Lock()
@@ -85,18 +91,19 @@ func (m *monitor) notify(ctx context.Context, err error) {
// but just in case...
defer func() {
if x := recover(); x != nil {
virtLog.Warnf("watcher closed channel: %v", x)
monitorLog.Warnf("watcher closed channel: %v", x)
}
}()
for _, c := range m.watchers {
monitorLog.WithError(err).Warn("write error to watcher")
// throw away message can not write to channel
// make it not stuck, the first error is useful.
select {
case c <- err:
default:
virtLog.WithField("channel-size", watcherChannelSize).Warnf("watcher channel is full, throw notify message")
monitorLog.WithField("channel-size", watcherChannelSize).Warnf("watcher channel is full, throw notify message")
}
}
}
@@ -104,6 +111,7 @@ func (m *monitor) notify(ctx context.Context, err error) {
func (m *monitor) stop() {
// wait outside of monitor lock for the watcher channel to exit.
defer m.wg.Wait()
monitorLog.Info("stopping monitor")
m.Lock()
defer m.Unlock()
@@ -122,7 +130,7 @@ func (m *monitor) stop() {
// but just in case...
defer func() {
if x := recover(); x != nil {
virtLog.Warnf("watcher closed channel: %v", x)
monitorLog.Warnf("watcher closed channel: %v", x)
}
}()

View File

@@ -321,6 +321,7 @@ func WaitLocalProcess(pid int, timeoutSecs uint, initialSignal syscall.Signal, l
if initialSignal != syscall.Signal(0) {
if err = syscall.Kill(pid, initialSignal); err != nil {
if err == syscall.ESRCH {
logger.WithField("pid", pid).Warnf("kill encounters ESRCH, process already finished")
return nil
}

View File

@@ -143,7 +143,7 @@ $ kubectl -n kube-system wait --timeout=10m --for=delete -l name=kata-deploy pod
After ensuring kata-deploy has been deleted, cleanup the cluster:
```sh
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-cleanup/base/kata-cleanup-stabe.yaml
$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-cleanup/base/kata-cleanup-stable.yaml
```
The cleanup daemon-set will run a single time, cleaning up the node-label, which makes it difficult to check in an automated fashion.

View File

@@ -18,7 +18,7 @@ spec:
katacontainers.io/kata-runtime: cleanup
containers:
- name: kube-kata-cleanup
image: quay.io/kata-containers/kata-deploy:latest
image: quay.io/kata-containers/kata-deploy:2.4.0
imagePullPolicy: Always
command: [ "bash", "-c", "/opt/kata-artifacts/scripts/kata-deploy.sh reset" ]
env:

View File

@@ -16,7 +16,7 @@ spec:
serviceAccountName: kata-label-node
containers:
- name: kube-kata
image: quay.io/kata-containers/kata-deploy:latest
image: quay.io/kata-containers/kata-deploy:2.4.0
imagePullPolicy: Always
lifecycle:
preStop:

View File

@@ -1 +1 @@
89
90

View File

@@ -0,0 +1,81 @@
From 29c4a3363bf287bb9a7b0342b1bc2dba3661c96c Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@linux.ibm.com>
Date: Fri, 17 Dec 2021 17:57:18 +0100
Subject: [PATCH] Revert "target/ppc: Move SPR_DSISR setting to powerpc_excp"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This reverts commit 336e91f85332dda0ede4c1d15b87a19a0fb898a2.
It breaks the --disable-tcg build:
../target/ppc/excp_helper.c:463:29: error: implicit declaration of
function cpu_ldl_code [-Werror=implicit-function-declaration]
We should not have TCG code in powerpc_excp because some kvm-only
routines use it indirectly to dispatch interrupts. See
kvm_handle_debug, spapr_mce_req_event and
spapr_do_system_reset_on_cpu.
We can re-introduce the change once we have split the interrupt
injection code between KVM and TCG.
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20211209173323.2166642-1-farosas@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
target/ppc/excp_helper.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index feb3fd42e2..6ba0840e99 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -464,15 +464,13 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int excp_model, int excp)
break;
}
case POWERPC_EXCP_ALIGN: /* Alignment exception */
+ /* Get rS/rD and rA from faulting opcode */
/*
- * Get rS/rD and rA from faulting opcode.
- * Note: We will only invoke ALIGN for atomic operations,
- * so all instructions are X-form.
+ * Note: the opcode fields will not be set properly for a
+ * direct store load/store, but nobody cares as nobody
+ * actually uses direct store segments.
*/
- {
- uint32_t insn = cpu_ldl_code(env, env->nip);
- env->spr[SPR_DSISR] |= (insn & 0x03FF0000) >> 16;
- }
+ env->spr[SPR_DSISR] |= (env->error_code & 0x03FF0000) >> 16;
break;
case POWERPC_EXCP_PROGRAM: /* Program exception */
switch (env->error_code & ~0xF) {
@@ -1441,6 +1439,11 @@ void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
int mmu_idx, uintptr_t retaddr)
{
CPUPPCState *env = cs->env_ptr;
+ uint32_t insn;
+
+ /* Restore state and reload the insn we executed, for filling in DSISR. */
+ cpu_restore_state(cs, retaddr, true);
+ insn = cpu_ldl_code(env, env->nip);
switch (env->mmu_model) {
case POWERPC_MMU_SOFT_4xx:
@@ -1456,8 +1459,8 @@ void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
}
cs->exception_index = POWERPC_EXCP_ALIGN;
- env->error_code = 0;
- cpu_loop_exit_restore(cs, retaddr);
+ env->error_code = insn & 0x03FF0000;
+ cpu_loop_exit(cs);
}
#endif /* CONFIG_TCG */
#endif /* !CONFIG_USER_ONLY */
--
GitLab

View File

@@ -0,0 +1,53 @@
#!/usr/bin/env bash
#
# Copyright (c) 2022 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#
set -o errexit
set -o nounset
set -o pipefail
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
script_name="$(basename "${BASH_SOURCE[0]}")"
# This is very much error prone in case we re-structure our
# repos again, but it's also used in a few other places :-/
repo_dir="${script_dir}/../../.."
function usage() {
cat <<EOF
Usage: ${script_name} tarball-name
This script creates a tarball with all the cargo vendored code
that a distro would need to do a full build of the project in
a disconnected environment, generating a "tarball-name" file.
EOF
}
create_vendor_tarball() {
vendor_dir_list=""
pushd ${repo_dir}
for i in $(find . -name 'Cargo.lock'); do
dir="$(dirname $i)"
pushd "${dir}"
[ -d .cargo ] || mkdir .cargo
cargo vendor >> .cargo/config
vendor_dir_list+=" $dir/vendor $dir/.cargo/config"
echo "${vendor_dir_list}"
popd
done
popd
tar -cvzf ${1} ${vendor_dir_list}
}
main () {
[ $# -ne 1 ] && usage && exit 0
create_vendor_tarball ${1}
}
main "$@"

View File

@@ -68,7 +68,6 @@ generate_kata_deploy_commit() {
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy / kata-cleanup: change from \"latest\" to \"rc0\"
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
@@ -161,7 +160,7 @@ bump_repo() {
# +----------------+----------------+
# | from | to |
# -------------------+----------------+----------------+
# kata-deploy | "latest" | "rc0" |
# kata-deploy | "latest" | "latest" |
# -------------------+----------------+----------------+
# kata-deploy-stable | "stable" | REMOVED |
# -------------------+----------------+----------------+
@@ -183,29 +182,34 @@ bump_repo() {
info "Updating kata-deploy / kata-cleanup image tags"
local version_to_replace="${current_version}"
local replacement="${new_version}"
if [ "${target_branch}" == "main" ]; then
local need_commit=false
if [ "${target_branch}" == "main" ];then
if [[ "${new_version}" =~ "rc" ]]; then
## this is the case 2) where we remove te kata-deploy / kata-cleanup stable files
## We are bumping from alpha to RC, should drop kata-deploy-stable yamls.
git rm "${kata_deploy_stable_yaml}"
git rm "${kata_cleanup_stable_yaml}"
else
## this is the case 1) where we just do nothing
replacement="latest"
need_commit=true
fi
version_to_replace="latest"
fi
if [ "${version_to_replace}" != "${replacement}" ]; then
## this covers case 2) and 3), as on both of them we have changes on kata-deploy / kata-cleanup files
sed -i "s#${registry}:${version_to_replace}#${registry}:${new_version}#g" "${kata_deploy_yaml}"
sed -i "s#${registry}:${version_to_replace}#${registry}:${new_version}#g" "${kata_cleanup_yaml}"
elif [ "${new_version}" != *"rc"* ]; then
## We are on a stable branch and creating new stable releases.
## Need to change kata-deploy / kata-cleanup to use the stable tags.
if [[ "${version_to_replace}" =~ "rc" ]]; then
## Coming from "rcX" so from the latest tag.
version_to_replace="latest"
fi
sed -i "s#${registry}:${version_to_replace}#${registry}:${replacement}#g" "${kata_deploy_yaml}"
sed -i "s#${registry}:${version_to_replace}#${registry}:${replacement}#g" "${kata_cleanup_yaml}"
git diff
git add "${kata_deploy_yaml}"
git add "${kata_cleanup_yaml}"
need_commit=true
fi
if [ "${need_commit}" == "true" ]; then
info "Creating the commit with the kata-deploy changes"
local commit_msg="$(generate_kata_deploy_commit $new_version)"
git commit -s -m "${commit_msg}"

View File

@@ -250,7 +250,6 @@ generate_qemu_options() {
qemu_options+=(size:--disable-auth-pam)
# Disable unused filesystem support
[ "$arch" == x86_64 ] && qemu_options+=(size:--disable-fdt)
qemu_options+=(size:--disable-glusterfs)
qemu_options+=(size:--disable-libiscsi)
qemu_options+=(size:--disable-libnfs)
@@ -303,7 +302,6 @@ generate_qemu_options() {
;;
esac
qemu_options+=(size:--disable-qom-cast-debug)
qemu_options+=(size:--disable-tcmalloc)
# Disable libudev since it is only needed for qemu-pr-helper and USB,
# none of which are used with Kata

View File

@@ -208,11 +208,14 @@ Description: Install $kata_project [1] (and optionally $containerd_project [2])
Options:
-c <version> : Specify containerd version.
-d : Enable debug for all components.
-f : Force installation (use with care).
-h : Show this help statement.
-k <version> : Specify Kata Containers version.
-o : Only install Kata Containers.
-r : Don't cleanup on failure (retain files).
-t : Disable self test (don't try to create a container after install).
-T : Only run self test (do not install anything).
Notes:
@@ -402,13 +405,21 @@ install_containerd()
sudo tar -C /usr/local -xvf "${file}"
sudo ln -sf /usr/local/bin/ctr "${link_dir}"
for file in \
/usr/local/bin/containerd \
/usr/local/bin/ctr
do
sudo ln -sf "$file" "${link_dir}"
done
info "$project installed\n"
}
configure_containerd()
{
local enable_debug="${1:-}"
[ -z "$enable_debug" ] && die "no enable debug value"
local project="$containerd_project"
info "Configuring $project"
@@ -460,26 +471,55 @@ configure_containerd()
info "Backed up $cfg to $original"
}
local modified="false"
# Add the Kata Containers configuration details:
local comment_text
comment_text=$(printf "%s: Added by %s\n" \
"$(date -Iseconds)" \
"$script_name")
sudo grep -q "$kata_runtime_type" "$cfg" || {
cat <<-EOT | sudo tee -a "$cfg"
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "${kata_runtime_name}"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${kata_runtime_name}]
runtime_type = "${kata_runtime_type}"
EOT
# $comment_text
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "${kata_runtime_name}"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.${kata_runtime_name}]
runtime_type = "${kata_runtime_type}"
EOT
info "Modified $cfg"
modified="true"
}
if [ "$enable_debug" = "true" ]
then
local debug_enabled
debug_enabled=$(awk -v RS='' '/\[debug\]/' "$cfg" |\
grep -E "^\s*\<level\>\s*=\s*.*\<debug\>" || true)
[ -n "$debug_enabled" ] || {
cat <<-EOT | sudo tee -a "$cfg"
# $comment_text
[debug]
level = "debug"
EOT
}
modified="true"
fi
[ "$modified" = "true" ] && info "Modified $cfg"
sudo systemctl enable containerd
sudo systemctl start containerd
info "Configured $project\n"
local msg="disabled"
[ "$enable_debug" = "true" ] && msg="enabled"
info "Configured $project (debug $msg)\n"
}
install_kata()
@@ -540,11 +580,48 @@ install_kata()
info "$project installed\n"
}
configure_kata()
{
local enable_debug="${1:-}"
[ -z "$enable_debug" ] && die "no enable debug value"
[ "$enable_debug" = "false" ] && \
info "Using default $kata_project configuration" && \
return 0
local config_file='configuration.toml'
local kata_dir='/etc/kata-containers'
sudo mkdir -p "$kata_dir"
local cfg_from
local cfg_to
cfg_from="${kata_install_dir}/share/defaults/kata-containers/${config_file}"
cfg_to="${kata_dir}/${config_file}"
[ -e "$cfg_from" ] || die "cannot find $kata_project configuration file"
sudo install -o root -g root -m 0644 "$cfg_from" "$cfg_to"
sudo sed -i \
-e 's/^# *\(enable_debug\).*=.*$/\1 = true/g' \
-e 's/^kernel_params = "\(.*\)"/kernel_params = "\1 agent.log=debug initcall_debug"/g' \
"$cfg_to"
info "Configured $kata_project for full debug (delete $cfg_to to use pristine $kata_project configuration)"
}
handle_kata()
{
local version="${1:-}"
install_kata "$version"
local enable_debug="${2:-}"
[ -z "$enable_debug" ] && die "no enable debug value"
install_kata "$version" "$enable_debug"
configure_kata "$enable_debug"
kata-runtime --version
}
@@ -556,6 +633,9 @@ handle_containerd()
local force="${2:-}"
[ -z "$force" ] && die "need force value"
local enable_debug="${3:-}"
[ -z "$enable_debug" ] && die "no enable debug value"
local ret
if [ "$force" = "true" ]
@@ -572,7 +652,7 @@ handle_containerd()
fi
fi
configure_containerd
configure_containerd "$enable_debug"
containerd --version
}
@@ -617,20 +697,32 @@ handle_installation()
local only_kata="${3:-}"
[ -z "$only_kata" ] && die "no only Kata value"
local enable_debug="${4:-}"
[ -z "$enable_debug" ] && die "no enable debug value"
local disable_test="${5:-}"
[ -z "$disable_test" ] && die "no disable test value"
local only_run_test="${6:-}"
[ -z "$only_run_test" ] && die "no only run test value"
# These params can be blank
local kata_version="${4:-}"
local containerd_version="${5:-}"
local kata_version="${7:-}"
local containerd_version="${8:-}"
[ "$only_run_test" = "true" ] && test_installation && return 0
setup "$cleanup" "$force"
handle_kata "$kata_version"
handle_kata "$kata_version" "$enable_debug"
[ "$only_kata" = "false" ] && \
handle_containerd \
"$containerd_version" \
"$force"
"$force" \
"$enable_debug"
test_installation
[ "$disable_test" = "false" ] && test_installation
if [ "$only_kata" = "true" ]
then
@@ -647,21 +739,27 @@ handle_args()
local cleanup="true"
local force="false"
local only_kata="false"
local disable_test="false"
local only_run_test="false"
local enable_debug="false"
local opt
local kata_version=""
local containerd_version=""
while getopts "c:fhk:or" opt "$@"
while getopts "c:dfhk:ortT" opt "$@"
do
case "$opt" in
c) containerd_version="$OPTARG" ;;
d) enable_debug="true" ;;
f) force="true" ;;
h) usage; exit 0 ;;
k) kata_version="$OPTARG" ;;
o) only_kata="true" ;;
r) cleanup="false" ;;
t) disable_test="true" ;;
T) only_run_test="true" ;;
esac
done
@@ -674,6 +772,9 @@ handle_args()
"$cleanup" \
"$force" \
"$only_kata" \
"$enable_debug" \
"$disable_test" \
"$only_run_test" \
"$kata_version" \
"$containerd_version"
}

View File

@@ -75,7 +75,7 @@ assets:
url: "https://github.com/cloud-hypervisor/cloud-hypervisor"
uscan-url: >-
https://github.com/cloud-hypervisor/cloud-hypervisor/tags.*/v?(\d\S+)\.tar\.gz
version: "v22.0"
version: "v22.1"
firecracker:
description: "Firecracker micro-VMM"
@@ -88,8 +88,8 @@ assets:
qemu:
description: "VMM that uses KVM"
url: "https://github.com/qemu/qemu"
version: "v6.1.0"
tag: "v6.1.0"
version: "v6.2.0"
tag: "v6.2.0"
# Do not include any non-full release versions
# Break the line *without CR or space being appended*, to appease
# yamllint, and note the deliberate ' ' at the end of the expression.
@@ -153,7 +153,7 @@ assets:
kernel:
description: "Linux kernel optimised for virtual machines"
url: "https://cdn.kernel.org/pub/linux/kernel/v5.x/"
version: "v5.15.23"
version: "v5.15.26"
tdx:
description: "Linux kernel that supports TDX"
url: "https://github.com/intel/tdx/archive/refs/tags"