containerd does not automatically add groups to the list of additional
GIDs when the groups have the same name as the user:
https://github.com/containerd/containerd/blob/f482992/pkg/oci/spec_opts.go#L852-L854
This is a bug and should be corrected, but it has been present since at
least 1.6.0 and thus affects almost all containerd deployments in
existence. Thus, we adopt the same behavior and ignore groups with the
same name as the user when calculating additional GIDs.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
When connecting to guest through vsock, a log is printed for each failure.
The failure comes from two main reasons: (1) the guest is not ready or (2)
some real errors happen. Printing logs for the first case leads to log
clutter, and your logs will like this:
```
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
```
To avoid this, the sock implmentations save the last error and return it
after all retries are exhausted. Users are able to check all errors by
setting the log level to trace.
Reorganize the log format to "{sock type}: {message}" to make it clearer.
Apart from that, errors return by the socks use `self`, instead of
`ConnectConfig`, since the `ConnectConfig` doesn't provide any useful
information.
Disable infinite loop for the log forwarder. There is retry logic in the
sock implmentations. We can consider the agent-log unavailable if
`sock.connect()` encounters an error.
Fixes: #10847
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The vhost-user-fs has been added to Dragonball, so we can remove
`update_memory`'s dead_code attribute.
Fixes: #8691
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Removed unnecessary dynamic dispatch for services. Properly dereferenced
service Box values and stored in Arc.
Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Previous version of `ttrpc-codegen` is generating outdated
`#![allow(box_pointers)]` which was deprecated. Bump `ttrpc-codegen`
from v0.4.2 to v0.5.0 and `protobuf` from vx to v3.7.1 to get rid of
this.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The additional GIDs are handled by genpolicy as a BTreeSet. This set is
then serialized to an ordered JSON array. On the containerd side, the
GIDs are added to a list in the order they are discovered in /etc/group,
and the main GID of the user is prepended to that list. This means that
we don't have any guarantees that the input GIDs will be sorted. Since
the order does not matter here, comparing the list of GIDs as sets is
close enough.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The warning used to trigger even if the passwd file was not needed. This
commit moves it down to where it actually matters.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
https://github.com/kata-containers/kata-containers/pull/11077
established that the GID from the image config is never used for
deriving the primary group of the container process. This commit removes
the associated logic that derived a GID from a named group.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We need more and accurate documentation. Let's start
by providing an Helm Chart install doc and as a second
step remove the kustomize steps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Steve Horsman <steven@uk.ibm.com>
The Guest rootfs image file size is aligned up to 128M boundary,
since commmit 2b0d5b2. This change allows users to use a custom
alignment value - e.g., to align up to 2M, users will be able to
specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
- Create groups for commonly seen cargo packages so that rather than
getting up to 9 PRs for each rust components, bumps to the same package
are grouped together.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Create a dependabot configuration to check for updates
to our rust and golang packages each day and our github
actions each month
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
After the last commit, the initdata test on SNP should be ok. Thus we
turn on this flag for CI.
Fixes#11300
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
the qemu commandline of SNP should start with `sev-snp-guest`, and then
following other parameters separeted by ','. This patch fixes the
parameter order.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
We're switching to using a rev as it may take some time for the package
to be updated on crates.io.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
There's no benefit on keeping those restricted to the dragonball build,
when they can be used with other VMMs as well (as long as they support
the mem-agent).
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
TUN/TAP is a must for VPN related services.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled,
the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:`
and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process
fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`.
This patch updates the deletion logic to take in to account the sandbox_cgroup_only=false option and in this case uses
the cgroupfs delete.
Fixes: #11036
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Increase the NOFILE limit in the systemd service, this helps with
running databases in the Kata runtime.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Since 3.12 we're shipping the helm-chart per default
with each release. Update the documentation to use helm rather
then the kata-deploy manifests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We have a number of jobs that either need,or nest workflows
that need gh permissions, such as for pushing to ghcr,
or doing attest build provenance. This means they need write
permissions on things like `packages`, `id-token` and `attestations`,
so we need to set these permissions at the job-level
(along with `contents: read`), so they are not restricted by our
safe defaults.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I shortsightedly forgot that gatekeeper would need
to read more than just the commit content in it's
python scripts, so add read permissions to actions
issues which it uses in it's processing
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a number of jobs that nest the build-static-tarball
workflows later on. Due to these doing attest build provenance,
and pushing to ghcr.io, t hey need write permissions on
`packages`, `id-token` and `attestations`, so we need to set
these permissions on the top-level jobs (along with `contents: read`),
so they are not blocked.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some legacy workflows require write access to github which
is a security weakness and don't provide much value,
so lets remove them.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It frequently causes "Resource Temporarily Unavailable (OS Error 11)"
with the original 250ms read timeout When passing through devices via
VFIO in QEMU. The root cause lies in synchronization timeout windows
failing to accommodate inherent delays during critical hardware init
phases in kernel space. This commit would increase the timeout to 5000ms
which was determined through some tests. While not guaranteeing complete
resolution for all hardware combinations, this change significantly
reduces timeout failures.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.
Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>