runtime-rs uses the "/run/kata-containers/shared/sandboxes/" guest path,
which is checked and denied by the agent policy. To make it pass the rules,
align the guest path definition with runtime-go.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When Sharedfs is disabled (share_fs = None), ShareFsVolume uses CopyFile to
create directories in the guest VM. However, it incorrectly uses host fs
metadata (root:root 755) instead of the container's security context, causing
permission denied errors for non-root containers.
The issue occurs because:
(1) Host emptyDir at /var/lib/kubelet/pods/.../empty-dir is owned by root
(2) CopyFile copies these permissions directly to guest
(3) Container running as uid=1001 with fsGroup=123 cannot write to `root:root 755`
dirs.
This commit fixes the issue by:
1. Extracting uid/gid/fsGroup from OCI spec.process.user
2. Setting proper permissions when creating directories.
For fsGroup dirs: 0o2775 (setgid) to ensure group inheritance
3. Ensuring new files automatically inherit the directory's group
ownership
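A minimal sketch of the ownership rule described above, written in Go for
illustration only. The names `dirOwnership` and `guestDirOwnership` are
hypothetical and are not taken from the runtime code, and the 0o755 fallback
for containers without an fsGroup is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// dirOwnership is a hypothetical type: the owner and mode to apply to a
// directory created in the guest.
type dirOwnership struct {
	UID, GID uint32
	Mode     os.FileMode
}

// guestDirOwnership derives ownership from the container's security context
// rather than from the host directory metadata. fsGroup is nil when unset.
func guestDirOwnership(uid, gid uint32, fsGroup *uint32) dirOwnership {
	if fsGroup != nil {
		// rwxrwxr-x plus setgid (octal 2775), so that new files inherit
		// the directory's group.
		return dirOwnership{UID: uid, GID: *fsGroup, Mode: os.ModeSetgid | 0o775}
	}
	// Assumption: without an fsGroup, plain 0o755 owned by the container user.
	return dirOwnership{UID: uid, GID: gid, Mode: 0o755}
}

func main() {
	fsGroup := uint32(123)
	o := guestDirOwnership(1001, 1001, &fsGroup)
	fmt.Printf("uid=%d gid=%d mode=%v\n", o.UID, o.GID, o.Mode)
}
```

With uid=1001 and fsGroup=123, the directory ends up owned by 1001:123 with
the setgid bit set, so the non-root container can write to it.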
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit refactors the volume handling logic to centralize shared
filesystem volume management. Previously, `ShareFsVolume` instances
would each create and manage their own `VolumeManager`, leading to a
fragmented state and issues with sharing.
The key changes include:
(1) A single `VolumeManager` instance is now created and owned by
the `VolumeResource`.
(2) The `VolumeResource` passes its `volume_manager` to
`ShareFsVolume::new` when a new shared volume is handled.
(3) This design ensures that all `ShareFsVolume` instances within a
single sandbox share the same `VolumeManager`, enabling correct reference
counting and preventing redundant file copies to the guest.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit integrates the new `VolumeManager` into the `ShareFsVolume`
lifecycle. Instead of directly copying files, `ShareFsVolume::new` now
uses the `VolumeManager` to get a guest path and determine if the volume
needs to be copied. It also updates the `cleanup` function to release
the volume's reference count, allowing the `VolumeManager` to manage its
state and clean up resources when no longer in use.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit introduces a new `VolumeManager` to track the state of shared
volumes, including their reference count and whether they have been copied
to the guest.
The manager's goal is to handle the lifecycle of shared filesystem volumes,
including:
(1) Volume State Tracking: Tracks the mapping from host source paths
to guest destination paths.
(2) Reference Counting: Manages reference counts for each volume, preventing
premature cleanup when multiple containers share the same source.
(3) Deterministic guest paths: Generates unique and deterministic guest
paths using SHA-256 hashing to avoid naming conflicts.
(4) Improved Management: Provides a centralized way to handle volume creation,
copying, and release, including aborting file watchers when volumes are no
longer in use.
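The state tracking, reference counting and path derivation listed above can
be sketched roughly as follows. This is an illustration in Go, not the
runtime-rs implementation: the type and method names, the `guestBase`
directory and the choice to truncate the digest to 8 bytes are all
assumptions, and file watchers are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
	"sync"
)

// guestBase is a placeholder directory, not the path used by the runtime.
const guestBase = "/run/example/volumes"

type volumeState struct {
	guestPath string
	refCount  int
	copied    bool
}

// VolumeManager is a simplified, hypothetical stand-in for the manager.
type VolumeManager struct {
	mu      sync.Mutex
	volumes map[string]*volumeState // keyed by host source path
}

func NewVolumeManager() *VolumeManager {
	return &VolumeManager{volumes: make(map[string]*volumeState)}
}

// guestPathFor derives a deterministic guest path from the host source path.
func guestPathFor(hostSrc string) string {
	sum := sha256.Sum256([]byte(hostSrc))
	return filepath.Join(guestBase, hex.EncodeToString(sum[:8]))
}

// Acquire takes a reference and reports whether the caller still has to
// copy the volume to the guest.
func (m *VolumeManager) Acquire(hostSrc string) (string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	st, ok := m.volumes[hostSrc]
	if !ok {
		st = &volumeState{guestPath: guestPathFor(hostSrc)}
		m.volumes[hostSrc] = st
	}
	st.refCount++
	needCopy := !st.copied
	st.copied = true
	return st.guestPath, needCopy
}

// Release drops one reference and returns true once the last user is gone
// and the state has been removed.
func (m *VolumeManager) Release(hostSrc string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	st, ok := m.volumes[hostSrc]
	if !ok {
		return false
	}
	st.refCount--
	if st.refCount > 0 {
		return false
	}
	delete(m.volumes, hostSrc)
	return true
}

func main() {
	m := NewVolumeManager()
	p1, first := m.Acquire("/host/vol")
	p2, second := m.Acquire("/host/vol")
	fmt.Println(p1 == p2, first, second)
}
```

Two containers that mount the same host source get the same guest path, and
only the first one is asked to copy the files.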
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit refactors the `CopyFile` related code to streamline the
logic for creating guest directories and make the code structure clearer.
Its main goal is to improve the overall maintainability and facilitate
future feature extensions.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit performs a full sync before monitoring starts, so that files
which already exist at that point are also synced.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
ef642fe890 added a special case to avoid
moving cgroups that are on the "default" slice in case of deletion.
However, this special check should be done in the Parent() method
instead, which ensures that the default resource controller ID is
returned, instead of ".".
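As a rough illustration of why "." shows up: Go's `filepath.Dir` returns "."
for a path with a single element. The sketch below is not the actual
`Parent()` method; the function name `parent` and the value of
`defaultResourceControllerID` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// defaultResourceControllerID is assumed here for illustration.
const defaultResourceControllerID = "/vc"

// parent returns the parent of a cgroup path, falling back to the default
// ID when filepath.Dir has nothing left to strip and returns ".".
func parent(cgroupPath string) string {
	p := filepath.Dir(cgroupPath)
	if p == "." {
		return defaultResourceControllerID
	}
	return p
}

func main() {
	fmt.Println(parent("sandbox"))
	fmt.Println(parent("/kubepods/pod1/ctr"))
}
```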
Fixes: #11599
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
- Set guest Storage.options for block rootfs to empty (do not propagate host mount options).
- Align behavior with Go runtime: only add xfs nouuid when needed.
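A sketch of the resulting option handling, in Go for illustration. It assumes
that "when needed" means the rootfs filesystem type is xfs; the function name
`rootfsStorageOptions` is hypothetical.

```go
package main

import "fmt"

// rootfsStorageOptions builds the guest Storage options for a block rootfs.
// Host mount options are deliberately not propagated.
func rootfsStorageOptions(fsType string, hostOptions []string) []string {
	_ = hostOptions // ignored on purpose
	opts := []string{}
	if fsType == "xfs" {
		// Assumption: nouuid is only required for xfs.
		opts = append(opts, "nouuid")
	}
	return opts
}

func main() {
	fmt.Println(rootfsStorageOptions("ext4", []string{"ro", "noatime"}))
	fmt.Println(rootfsStorageOptions("xfs", []string{"ro"}))
}
```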
Signed-off-by: Caspian443 <scrisis843@gmail.com>
We moved to `.zst`, but users still use the upstream kata-manager to
download older versions of the project, thus we need to support both
suffixes.
Fixes: #11714
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Similar to what we've done for Cloud Hypervisor in the commit
9f76467cb7, we're backporting a runtime-rs
feature that would be beneficial to have as part of the go runtime.
This allows users to use virtio-balloon for the hypervisor to reclaim
memory freed by the guest.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The default suggestion for top-level permissions was
`contents: read`, but scorecard notes anything other than empty,
so try updating it and see if there are any issues. I think it's
only needed if we run workflows from other repos.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since the previous tightening, a few workflow updates have
gone in that the zizmor job isn't flagging as issues,
so address them to remove potential attack vectors.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This reverts commit cb5f143b1b, as the
cached packages have been regenerated after the switch to using zstd.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
As part of the go 1.24.6 bump there are errors about the incorrect
use of errorf, so switch to the non-formatting version, or add
the format string as appropriate.
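The two kinds of fix look roughly like this. It is a generic Go sketch, not
code from the repository; `describe`, `plainError` and `formattedError` are
made-up names.

```go
package main

import (
	"errors"
	"fmt"
)

// describe builds a message at run time; it stands in for any
// non-constant string.
func describe(name string) string {
	return "device " + name + " not found"
}

// plainError uses the non-formatting constructor.
// Before: fmt.Errorf(describe(name)), which vet reports as a
// non-constant format string.
func plainError(name string) error {
	return errors.New(describe(name))
}

// formattedError keeps fmt.Errorf but adds an explicit format string.
func formattedError(name string) error {
	return fmt.Errorf("%s", describe(name))
}

func main() {
	fmt.Println(plainError("vda"))
	fmt.Println(formattedError("vda"))
}
```

Either form also avoids surprises when the message itself contains a `%`
character.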
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
golang 1.25 has been released, which makes 1.23 EoL,
so we should update to ensure we don't end up with security issues.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update the two workflows that used setup-go to
instead call the `install_go.sh` script, which handles
installing the correct version of golang.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`${kernel_name,,}` is a bash 4.0 feature and not POSIX compliant, so it
doesn't work on macOS. Switch to `tr`, which is more widely
supported.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In #11693 the cc_init_data annotation was changed to be hypervisor
scoped, so each hypervisor now needs to explicitly allow it in order to
use it. Add this to both the go and rust runtime's remote
configurations.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We need to get the root_hash.txt file from the image build, otherwise
there's no way to build the shim using those values for the
configuration files.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Although the compression ratio is not as good as with xz, it's way
faster to compress / uncompress, and it's "good enough".
This change is not small, but it's still self-contained, and has to get
in at once, in order to help bisects in the future.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As 3.18 is already EOL.
We need to add `--break-system-packages` to force the installation of
the yq version that we rely on. The tests have shown
that no breakage actually happens, fortunately.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Use block_device_driver instead of vm_rootfs_driver as the initdata
device driver.
Fixes: #11697
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We also need to support initdata when the platform is detected as
NonProtection, which is usually called a non-TEE host. In these cases
there's no need to validate the `confidential_guest=true` item; we
trust the result of the method `available_guest_protection()?`.
Fixes: #11697
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The default `reconnect_timeout` (3 seconds) was found to be insufficient for
IBM SEL when using VSOCK. This commit updates the timeouts as follows:
- `dial_timeout_ms`: Set to 90ms to match the value used in go-runtime for IBM SEL
- `reconnect_timeout_ms`: Increased to 5000ms based on empirical testing
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add support for the `InitData` resource config on IBM SEL,
so that a corresponding block device is created and the
initdata is passed to the guest through this device.
Note that we skip passing the initdata hash via QEMU’s
object, since the hypervisor does not yet support this
mechanism for IBM SEL. It will be introduced separately
once QEMU adds the feature.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Linux v6.16 brings some useful features for the confidential guests.
Most importantly, it adds an ABI to extend runtime measurement registers
(RTMR) for the TEE platforms supporting it. This is currently enabled
on Intel TDX only.
The kernel version bump from v6.12.x to v6.16 forces some CONFIG_*
changes too:
MEMORY_HOTPLUG_DEFAULT_ONLINE was dropped in favor of more config
choices. The equivalent option is MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
X86_5LEVEL was made unconditional. Since this was only a TDX
configuration, dropping it completely as part of v6.16 is fine.
CRYPTO_NULL2 was merged with CRYPTO_NULL. This was only added in
confidential guest fragments (cryptsetup) so we can drop it in this update.
CRYPTO_FIPS now depends on CRYPTO_SELFTESTS which further depends on
EXPERT which we don't have. Enable both in a separate config fragment
for confidential guests. This can be moved to a common setting once
other targets bump to post v6.16.
CRYPTO_SHA256_SSE3 arch optimizations were reworked and are now enabled
by default. Instead of adding it to whitelist.conf, just drop it completely
since it was only enabled as part of "measured boot" feature for
confidential guests. CONFIG_CRYPTO_CRC32_S390 was reworked the same way.
In this case, whitelist.conf is needed.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>