Mount the direct-assigned block device filesystem only once and keep a
refcount in the guest. Also use the ro flag inside the options field to
determine whether the block device and filesystem should be mounted read-only.
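The mount-once behaviour described above can be sketched as follows. This is
only an illustration, not the agent's implementation: mountTracker, Acquire,
isReadOnly and the injected mountFn are hypothetical names, and the real mount
call is replaced by a callback.

```go
package main

import (
	"fmt"
	"sync"
)

// mountTracker is a hypothetical guest-side helper: it mounts a device on the
// first reference and only bumps a counter on later references.
type mountTracker struct {
	mu      sync.Mutex
	refs    map[string]int
	mountFn func(device string, readOnly bool) error // stand-in for the real mount call
}

func newMountTracker(mountFn func(device string, readOnly bool) error) *mountTracker {
	return &mountTracker{refs: make(map[string]int), mountFn: mountFn}
}

// isReadOnly reports whether the options list carries the "ro" flag.
func isReadOnly(options []string) bool {
	for _, opt := range options {
		if opt == "ro" {
			return true
		}
	}
	return false
}

// Acquire mounts the device if it is not mounted yet and returns the new refcount.
func (t *mountTracker) Acquire(device string, options []string) (int, error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.refs[device] == 0 {
		if err := t.mountFn(device, isReadOnly(options)); err != nil {
			return 0, err
		}
	}
	t.refs[device]++
	return t.refs[device], nil
}

func main() {
	mounts := 0
	t := newMountTracker(func(device string, readOnly bool) error {
		mounts++
		fmt.Printf("mounting %s (read-only: %v)\n", device, readOnly)
		return nil
	})
	t.Acquire("/dev/vdb", []string{"ro"})
	n, _ := t.Acquire("/dev/vdb", []string{"ro"})
	fmt.Println("refcount:", n, "mount calls:", mounts)
}
```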
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
During container creation, parse the mount info file of the
direct-assigned volumes and update the in-memory mount object.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stats
and resize the filesystem in the guest.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
To query fs stats and resize the fs, the requests need to be passed to
the kata agent through containerd-shim-v2. So we're adding two REST APIs
on the shim management endpoint.
Also refactor the shim management client into its own Go file.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
In the direct-assigned volume scenario, Kata Containers persists
the information required for managing the volume inside the guest
on the host filesystem.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add commands to add, remove, resize and get stats of a direct-assigned volume.
These commands are expected to be consumed by CSI.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR updates the README by using the proper link for the
contributing guide and fixing a misspelling.
Fixes: #3791
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
There are a few outstanding changes required to build the runtime on
Darwin.
Let's add a GitHub action to exercise the build and unit tests of the
packages which we do expect to work. Eventually this should be dropped
and we can run any Darwin-specific tests, or just add macOS to the
matrix for our static check OSes.
Fixes: #3778
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This utility function is also used to check the spec that will run in
the guest, so there is no need for it to be Linux-specific.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Their types may differ on various host OSes, but
unix.Major|Minor always takes a uint64
Depends-on: github.com/kata-containers/tests#4516
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...
Fixes: #3777
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.
Fixes: #3787
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's mention the supported TEEs to be used with confidential guests.
Right now, Cloud Hypervisor supports only Intel TDX, used together with
TD Shim.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Nydusd uses a bufio.Scanner to check whether the nydusd process has
exited, but the stderr/stdout passed to Cmd is a self-created pipe, and
this pipe will not be closed if the process fails to start.
Using the standard Cmd.StdoutPipe closes stdout, so the kata shim
will detect the exit of the nydusd process and then call cmd.Wait to
reap the process' resources.
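A minimal sketch of that pattern, using only the standard os/exec and bufio
packages. The helper name runAndScan and the use of `sh -c "echo ready"` in
place of nydusd are assumptions made for illustration; this is not the code
from the change itself.

```go
package main

import (
	"bufio"
	"fmt"
	"os/exec"
)

// runAndScan starts a command, reads its stdout line by line through
// Cmd.StdoutPipe, and reaps the process with Wait once the pipe is closed.
func runAndScan(name string, args ...string) ([]string, error) {
	cmd := exec.Command(name, args...)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return nil, err
	}
	if err := cmd.Start(); err != nil {
		// The process never started, so there is nothing to scan or reap.
		return nil, err
	}

	var lines []string
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	// Scan returned false: stdout was closed (typically because the process
	// exited) or a read error occurred. All reads are done, so Wait is safe.
	waitErr := cmd.Wait()
	if err := scanner.Err(); err != nil {
		return lines, err
	}
	return lines, waitErr
}

func main() {
	// "sh" stands in for the nydusd binary in this sketch.
	lines, err := runAndScan("sh", "-c", "echo ready")
	fmt.Println(lines, err)
}
```

Wait is called only after the scanner has drained the pipe, which matches the
documented requirement that all reads from StdoutPipe complete before Wait.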
Fixes: #3783
Signed-off-by: bin <bin@hyper.sh>
The systemd entry in "/proc/self/cgroup" looks like:
1:name=systemd:/kubepods/pod1815643d-3789-4e4e-aaf4-00de024912e1/0e15a65bd5f7b30a0b818d90706212354d8b3f0998a1495473c3be9a24706ccf
and in "/proc/self/mountinfo" it looks like:
30 29 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
The key extracted from both files is the same: "name=systemd". So there is no need to rename the key to "systemd".
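The comparison can be reproduced with a short sketch. The helper names
cgroupKey and mountinfoKey are hypothetical and the parsing is deliberately
simplified; it is not the agent's parser, only a way to check that both sample
lines yield the same key.

```go
package main

import (
	"fmt"
	"strings"
)

// cgroupKey returns the controller field of a /proc/self/cgroup line,
// which has the form "hierarchy-ID:controller-list:cgroup-path".
func cgroupKey(line string) string {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return ""
	}
	return parts[1]
}

// mountinfoKey returns the "name=..." super option of a
// /proc/self/mountinfo line, or "" if there is none.
func mountinfoKey(line string) string {
	idx := strings.Index(line, " - ")
	if idx < 0 {
		return ""
	}
	// After the separator: filesystem type, mount source, super options.
	fields := strings.Fields(line[idx+3:])
	if len(fields) < 3 {
		return ""
	}
	for _, opt := range strings.Split(fields[2], ",") {
		if strings.HasPrefix(opt, "name=") {
			return opt
		}
	}
	return ""
}

func main() {
	cg := "1:name=systemd:/kubepods/pod1815643d-3789-4e4e-aaf4-00de024912e1/0e15a65bd5f7b30a0b818d90706212354d8b3f0998a1495473c3be9a24706ccf"
	mi := "30 29 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd"
	fmt.Println(cgroupKey(cg), mountinfoKey(mi), cgroupKey(cg) == mountinfoKey(mi))
}
```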
Fixes: #3385
Signed-off-by: sailorvii <challengingway@hotmail.com>
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
4c164afbac renamed extra_build_args to
features, but did it only in one place, leading to:
```
21:15:28 /home/jenkins/workspace/kata-containers-2.0-ubuntu-ARM-PR/go/src/github.com/kata-containers/kata-containers/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh: line 55: features: unbound variable
21:15:29 make[1]: *** [tools/packaging/kata-deploy/local-build/Makefile:30: cloud-hypervisor-tarball-build] Error 1
```
Fixes: #3775
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Remove the --tags selinux handling in the Makefile (part of it was introduced in d78ffd6)
and make SELinux configurable via configuration.toml.
Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
Switching to the generic FilesystemSharer brings 2 major improvements:
1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non-Linux implementations to provide ways to share
container files and root filesystems with the Kata Linux guest.
Fixes: #3622
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.
In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.
This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesystem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem.
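The shape of such an interface can be sketched as below. The method names,
the string-based signatures and the fakeSharer type are simplifying
assumptions for illustration only; the interface in the tree may take
different parameter and return types.

```go
package main

import (
	"errors"
	"fmt"
	"path"
)

// FilesystemSharer here is a simplified, hypothetical rendering of the
// interface described above: one preparation step and two sharing methods.
type FilesystemSharer interface {
	Prepare() error
	ShareFile(hostPath string) (guestPath string, err error)
	ShareRootFilesystem(containerID string) (guestPath string, err error)
}

// fakeSharer is a toy implementation that only computes guest paths.
type fakeSharer struct {
	guestRoot string
	prepared  bool
}

func (s *fakeSharer) Prepare() error {
	// A concrete implementation would prepare the host filesystem here.
	s.prepared = true
	return nil
}

func (s *fakeSharer) ShareFile(hostPath string) (string, error) {
	if !s.prepared {
		return "", errors.New("sharer not prepared")
	}
	return path.Join(s.guestRoot, path.Base(hostPath)), nil
}

func (s *fakeSharer) ShareRootFilesystem(containerID string) (string, error) {
	if !s.prepared {
		return "", errors.New("sharer not prepared")
	}
	return path.Join(s.guestRoot, containerID, "rootfs"), nil
}

func main() {
	var sharer FilesystemSharer = &fakeSharer{guestRoot: "/run/guest-shared"}
	if err := sharer.Prepare(); err != nil {
		panic(err)
	}
	file, _ := sharer.ShareFile("/etc/hosts")
	rootfs, _ := sharer.ShareRootFilesystem("ctr1")
	fmt.Println(file, rootfs)
}
```

Keeping the callers behind an interface like this is what lets a non-Linux
host plug in its own sharing mechanism without touching the agent code.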
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.
Fixes: #3632
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The "firmware" option has been present for a while, but it has never
been exposed to the configuration file before.
Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so-called
`td-shim`[0] with Cloud Hypervisor.
[0]: https://github.com/confidential-containers/td-shim
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".
The reason to do this is to ensure that guests will be started with the
correct number of VCPUs, without giving the guest all the possible
VCPUs the host could provide.
One clear side effect of this limitation is that workloads that
require more VCPUs in their YAML definition will not run in this
scenario.
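The rule above amounts to a single assignment. The sketch below uses a
hypothetical hypervisorConfig struct and capMaxVCPUs helper; only the field
names NumVCPUs, DefaultMaxVCPUs and ConfidentialGuest come from the
description, and the real configuration type has many more fields.

```go
package main

import "fmt"

// hypervisorConfig is a trimmed-down, hypothetical stand-in for the
// hypervisor configuration.
type hypervisorConfig struct {
	NumVCPUs          uint32
	DefaultMaxVCPUs   uint32
	ConfidentialGuest bool
}

// capMaxVCPUs pins the maximum VCPU count to the boot-time count when
// VCPU hotplug is unavailable, i.e. for confidential guests.
func capMaxVCPUs(cfg hypervisorConfig) hypervisorConfig {
	if cfg.ConfidentialGuest {
		cfg.DefaultMaxVCPUs = cfg.NumVCPUs
	}
	return cfg
}

func main() {
	cfg := capMaxVCPUs(hypervisorConfig{NumVCPUs: 2, DefaultMaxVCPUs: 16, ConfidentialGuest: true})
	fmt.Println(cfg.NumVCPUs, cfg.DefaultMaxVCPUs)
}
```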
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.
Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a small code refactor removing dead code, based on the checks
already done in the generic hypervisor abstraction.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block
As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>