With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.
In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.
This interface provides a preparation step, where concrete
implementations can e.g. prepare the host filesystem.
It then provides two methods: one for sharing any file (a regular file
or a directory) and another one for sharing a container root filesystem.
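For illustration, a minimal Go sketch of what such an interface could
look like; the method names and signatures below are assumptions, not
the final API:

```go
package virtcontainers // assumed package; sketch only

import "context"

// FilesystemSharer: method names and signatures are assumptions.
type FilesystemSharer interface {
	// Prepare lets an implementation set up the host side,
	// e.g. create and mount the shared directories.
	Prepare(ctx context.Context) error

	// ShareFile shares a regular file or a directory with the guest
	// and returns the path it is reachable at from within the guest.
	ShareFile(ctx context.Context, hostPath string) (guestPath string, err error)

	// ShareRootFilesystem shares a container bundle root filesystem
	// with the guest and returns its guest path.
	ShareRootFilesystem(ctx context.Context, bundleRootfs string) (guestPath string, err error)
}
```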
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.
Fixes: #3632
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.
Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.
[0]: https://github.com/confidential-containers/td-shim
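For reference, a rough sketch of how the exposure could look in the
runtime's configuration-parsing code; the struct, package and tag names
here are assumptions:

```go
package config // illustrative package name, not the real one

// Assumed shape of the Cloud Hypervisor section parser; only the two
// options discussed here are shown and the tag names are assumptions.
type hypervisor struct {
	Firmware          string `toml:"firmware"`           // path to e.g. the td-shim binary
	ConfidentialGuest bool   `toml:"confidential_guest"` // enable guest protection (TDX)
}
```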
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NVDIMM is also not supported with Confidential Guests, and Virtio Block
devices should be used instead.
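A minimal sketch of the intended behavior, with a hypothetical helper
name and a plain boolean standing in for the real configuration:

```go
package clh // illustrative only

// rootfsDeviceDriver is a hypothetical helper showing the intent:
// never use NVDIMM for a confidential guest, use virtio-blk instead.
func rootfsDeviceDriver(confidentialGuest bool) string {
	if confidentialGuest {
		return "virtio-blk"
	}
	return "nvdimm"
}
```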
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.
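As a rough illustration of such a guard (the type, field and method
names below are stand-ins, not the actual driver code):

```go
package clh // illustrative only; type and field names are assumptions

import "errors"

type hypervisorConfig struct {
	ConfidentialGuest bool
	MemorySizeMB      uint32
}

type cloudHypervisor struct {
	config hypervisorConfig
}

// resizeMemory sketches the guard: refuse memory hotplug requests for a
// confidential guest and keep the boot-time memory size instead.
func (c *cloudHypervisor) resizeMemory(requestMB uint32) (uint32, error) {
	if c.config.ConfidentialGuest && requestMB != c.config.MemorySizeMB {
		return c.config.MemorySizeMB,
			errors.New("memory hotplug is not supported for confidential guests")
	}
	// Normal hotplug path elided in this sketch.
	return requestMB, nil
}
```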
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.
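The "don't advertise" part could look roughly like this; the capability
type and its field are simplified stand-ins for the real ones:

```go
package clh // illustrative only; types are simplified stand-ins

// capabilities reports what the hypervisor driver supports.
type capabilities struct {
	BlockDeviceHotplug bool
}

// guestCapabilities never sets the device hotplug flag for a
// confidential guest, so it is never advertised as supported.
func guestCapabilities(confidentialGuest bool) capabilities {
	caps := capabilities{}
	if !confidentialGuest {
		caps.BlockDeviceHotplug = true
	}
	return caps
}
```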
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".
The reason to do this is to ensure that guests are started with the
correct amount of VCPUs, without exposing to the guest all the possible
VCPUs the host could provide.
One clear side effect of this limitation is that workloads that request
more VCPUs in their YAML definition will not run in this scenario.
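The adjustment itself boils down to something like the following; the
field names follow the ones quoted above, while the surrounding struct
and helper are stand-ins:

```go
package config // illustrative only

type hypervisorConfig struct {
	ConfidentialGuest bool
	NumVCPUs          uint32
	DefaultMaxVCPUs   uint32
}

// capMaxVCPUs sketches the change: with VCPU hotplug unavailable, the
// maximum number of VCPUs is pinned to the boot-time value.
func capMaxVCPUs(cfg *hypervisorConfig) {
	if cfg.ConfidentialGuest {
		cfg.DefaultMaxVCPUs = cfg.NumVCPUs
	}
}
```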
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ConfidentialGuest is an option already present and exposed for QEMU,
which allows using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.
Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed building blocks for this as part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a small code refactor removing dead code, based on the checks
already done in the generic hypervisor abstraction.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block
As the parameters used for Cloud Hypervisor overlap with the ones
provided by the NVDIMM case, let's take advantage of that.
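For context, these shared parameter sets are plain lists of kernel
command-line arguments. A hypothetical rendering of the NVDIMM set that
Cloud Hypervisor can now reuse (the exact values live in the generic
hypervisor code and may differ):

```go
package hypervisor // illustrative only

// Param is a kernel command-line key/value pair.
type Param struct {
	Key   string
	Value string
}

// Hypothetical rendering of the shared NVDIMM root parameter set; the
// real values may differ.
var commonNvdimmKernelRootParams = []Param{
	{"root", "/dev/pmem0p1"},
	{"rootflags", "dax,data=ordered,errors=remount-ro ro"},
	{"rootfstype", "ext4"},
}
```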
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:
* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
- This is needed for the TDX support on the cloud hypervisor driver,
which is part of this very same series.
* openapi: Update the PciBdf types
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
- This is needed due to a change in a DeviceNode field, which would
cause a marshalling / unmarshalling error when running with a
version of cloud-hypervisor that includes the TDX fixes mentioned
above.
* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
- This is needed due to changes in the scripts used to build Cloud
Hypervisor, which are used as part of Kata Containers CIs and
github actions.
Due to this change, we're also adapting the build scripts as part
of this very same commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In TestHandleHugepages, on s390x, hugepage sizes must be set at boot,
so test with any sizes that are present (the default is 1M).
Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
We introduced collection of sandbox metadata from the CRI that will be
attached to the sandbox metrics: this allows immediately matching
sandbox metrics with CRI workloads.
Rename the symbols from *Kube* to *CRI*, as the metadata will be there
every time pods are created through the CRI, even if Kubernetes is not
installed (e.g., 'crictl runp').
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Rename the package to resourcecontrol, making it consistent with the
fact that cgroups are a Linux implementation of the ResourceController
interface.
Fixes: #3601
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We call it a ResourceController, and we make it less Linux-specific.
Now the Linux implementation is the cgroups one.
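A rough sketch of the idea, with hypothetical method names; the point is
only that the interface is platform-neutral while cgroups provide the
Linux implementation:

```go
package resourcecontrol // illustrative only; method names are assumptions

import specs "github.com/opencontainers/runtime-spec/specs-go"

// ResourceController abstracts host resource management; on Linux the
// concrete implementation is backed by cgroups.
type ResourceController interface {
	// AddProcess places a process under this controller.
	AddProcess(pid int) error
	// Update applies new resource constraints to the controller.
	Update(resources *specs.LinuxResources) error
	// Delete removes the controller and releases its resources.
	Delete() error
}
```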
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We now rely on fs events only to update the sandbox cache. This does
not hold, however, for sandboxes already present at kata-monitor
startup: we just retrieve the list and add them to the cache only once
we get their CRI metadata. If the CRI metadata is not available we will
never add them to the sandbox cache.
Fix this by immediately adding the sandboxes we find at startup time to
the sandbox cache.
Fixes: #3705
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Since kata-monitor now:
- relies on fs events *only* to update the sandbox cache
- adds CRI metadata as labels (CRI pod name, namespace and uid)
it deserves a version bump.
Note that while we could let kata-monitor match the runtime version,
kata-monitor will usually work flawlessly with different kata shim
releases: so it makes sense to keep the kata-monitor version separate.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The Developer-Guide.md builds a custom kata agent with the
`x86_64-unknown-linux-musl` target. The `x86_64` part should be replaced
by the system architecture (e.g. aarch64, ppc64le or s390x), and when
the arch is ppc64le or s390x, `musl` should also be replaced by `gnu`.
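The mapping, written out as a small Go helper purely for clarity (the
guide itself uses shell commands, and the helper name is made up):

```go
package build // illustrative only

// rustTarget maps the machine architecture reported by `uname -m` to
// the Rust target triple used to build the agent.
func rustTarget(arch string) string {
	switch arch {
	case "ppc64le":
		// Note the Rust triple spells out the architecture name.
		return "powerpc64le-unknown-linux-gnu"
	case "s390x":
		return "s390x-unknown-linux-gnu"
	default: // x86_64, aarch64
		return arch + "-unknown-linux-musl"
	}
}
```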
Fixes: #3731
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
CRI-O starts the shim process without setting TTRPC_ADDRESS, so the
events-forwarding goroutine will get errors. For the CRI-O runtime, we
can log the events to a log file instead.
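A minimal sketch of the fallback logic, assuming a hypothetical helper;
the real shim code differs:

```go
package main // illustrative sketch, not the shim's actual code

import (
	"fmt"
	"os"
)

// eventSink decides where shim events are forwarded: containerd sets
// TTRPC_ADDRESS for the shim, CRI-O does not, so fall back to logging.
func eventSink() string {
	if os.Getenv("TTRPC_ADDRESS") == "" {
		return "log file"
	}
	return "ttrpc"
}

func main() {
	fmt.Println("forwarding events via:", eventSink())
}
```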
Fixes: #3733
Signed-off-by: bin <bin@hyper.sh>
This PR updates the contributing documentation link to the one that is
used for Kata 2.0.
Fixes: #3740
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The kata-deploy Dockerfile is based on CentOS 7, which has no s390x
support. Add an `IMAGE` argument to specify the registry, which still
defaults to CentOS, but e.g. ClefOS can be selected instead.
Other x86_64 assumptions are also removed, and some general
simplifications are made.
This does not address the more general issue of #3723 -- what we're
doing here does not seem to be working with systemd >= something between
235-237.
Fixes: #3722
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
When using kata-deploy, no `containerd-shim-kata-v2` binary is deployed,
but we do deploy a `kata` runtime class, which seems very much
inconsistent.
As the default configuration for kata-containers points to QEMU, let's
also use kata with QEMU as the default shim-v2 binary.
Fixes: #3228, #3734
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As kata with QEMU already supports lazyload, this PR aims to bring the
lazyload ability to kata with CLH.
Fixes: #3654
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
The name of SYS_SUPPORTS_HUGETLBFS has been changed to
ARCH_SUPPORTS_HUGETLBFS, which is selected by default by another kernel
config option.
More info: 855f9a8e87
The change applies from v5.13 onwards.
Fixes: #3720
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
`tools/packaging/scripts/apply_patches.sh` uses `git apply $patch`, but
this will not apply to subdirectories. If one wanted to apply with
`git apply`, they'd have to run it with `--directory=...`
_relative to the Git tree's root_ (absolute will not work!). I suggest
we just use `patch`, which will do what we expected `git apply` would
do.
`patch` is also added to build containers that require it.
Fixes: #3690
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Let's take advantage of the fact that we've bumped our kernel version
to the 5.15 LTS and enable SGX by default, as it's present there.
Fixes: #3692
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop building the experimental kernel as, currently, we have
all the needed contents as part of the vanilla kernel.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no need to build an experimental kernel for x86_64 as all the
bits which were part of the experimental one (SGX only, really) are now
part of the vanilla one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>