With the new CDH version, the secure_mount API changes.
Further, the new CDH version no longer uses the luks-encrypt-storage
script but utilizes libcryptsetup as well as mkfs.ext4 and dd. Hence, adapt
some of the CDH and Kata components build steps
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
For cold-plug when running with nerdctl the timeouts in the config
are being used, increase the dial_timeout (e.g. for CreateSandbox) to match
create_container_timeout.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Moving the genpolicy crate into the root workspace causes the build
outputs to go into the root workspace's target directory, instead of
src/tools/genpolicy/target, invalidating assumptions made by the
kata-deploy-binaries script.
This commit adds a special case for the lookup path of the genpolicy
binary, and fixes two bugs that made identifying this problem harder.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Let's update the nvidia-container-toolkit to 1.18.1 (from 1.17.6).
We're, from now on, relying on the version set in the versions.yaml
file.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Allow users to override the default RuntimeClass pod overhead for
any shim via shims.<name>.runtimeClass.overhead.{memory,cpu}.
When the field is absent the existing hardcoded defaults from the
dict are used, so this is fully
backward compatible.
Signed-off-by: Zachary Spar <zspar@coreweave.com>
Remove # !confidential from mmio.conf so CONFIG_VIRTIO_MMIO and
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES are included when building the
unified x86_64/s390x kernel with -x
Firecracker requires virtio-mmio for block devices; without it the
guest kernel panics (no /dev/vda).
Fixes: #12581
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Allow genpolicy -j to accept a directory instead of a single file.
When given a directory, genpolicy loads genpolicy-settings.json from it
and applies all genpolicy-settings.d/*.json files (sorted by name) as
RFC 6902 JSON Patches. This gives precise control over settings with
explicit operations (add, remove, replace, move, copy, test), including
array index manipulation and assertions.
Ship composable drop-in examples in drop-in-examples/:
- 10-* files set platform base settings (non-CoCo, AKS, CBL-Mariner)
- 20-* files overlay specific adjustments (OCI version, guest pull)
Users copy the combination they need into genpolicy-settings.d/.
Replace the old adapt_common_policy_settings_* jq-patching functions
in tests_common.sh with install_genpolicy_drop_ins(), which copies the
right combination of 10-* and 20-* drop-ins for the CI scenario.
Tests still generate 99-test-overrides.json on the fly for per-test
request/exec overrides.
Packaging installs 10-* and 20-* drop-ins from drop-in-examples/ into
the tarball; the default genpolicy-settings.d/ is left empty.
Made-with: Cursor
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The SNP tests have been unstable on nightlies, but even when these
it seems to be manually cleaned up or something as PR tests are consistently
failing, so we should skip this from the required list until it is reliable.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
kata-deploy restarts the CRI runtime (k3s/containerd) during install,
which can kill the verification job pod or cause transient API server
errors. Bump backoffLimit from 0 to 3 so the job can retry after being
killed, and add a retry loop around kubectl rollout status to handle
transient connection failures.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Move the cleanup logic from a preStop lifecycle hook (separate exec)
into the main process's SIGTERM handler. This simplifies the
architecture: the install process now handles its own teardown when
the pod is terminated.
The SIGTERM handler is registered before install begins, and
tokio::select! races install against SIGTERM so cleanup always runs
even if SIGTERM arrives mid-install (e.g. helm uninstall while the
container is restarting after a failed install attempt).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Check the rendered containerd config for the versioned drop-in dir import
(config.toml.d or config-v3.toml.d) and bail with a clear error if it is
missing.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When I added this I had in mind the period that we still relied on the
SEV module being generated, which we don't do for quite a long time.
This wrong assumption caused the cache to **ALWAYS** fail, increasing
our build time considerably for no reason.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
get_kernel() thinks it knows when it needs to skip sha256sum validation for
RC kernels since sha256sums.asc is not available:
INFO: Config version: 176
INFO: Kernel version: 6.18-rc5
INFO: kernel path does not exist, will download kernel
INFO: Release candidate kernels are not part of the official sha256sums.asc -- skipping sha256sum validation
But continues to check it anyway since ${rc} matches
with -n. sha256sum should only be checked when ${rc} is NOT
set.
Fixes a problem where downloaded RC kernels are always removed
and downloaded again.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
On containerd v3 config, disable_snapshot_annotations must be set under the
images plugin, not the runtime plugin.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Different kubernetes distributions, such as k0s, use a different kubelet
root dir location instead of the default /var/lib/kubelet, so ConfigMap
and Secret volume propagation were failing.
This adds a kubelet_root_dir config option that the go runtime uses when
matching volume paths and kata-deploy now sets it automatically for k0s
via a drop-in file.
runtime-rs does not need this option: it identifies ConfigMap/Secret,
projected, and downward-api volumes by volume-type path segment
(kubernetes.io~configmap, etc.), not by kubelet root prefix.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When image.reference or kubectlImage.reference already contains a digest
(e.g. quay.io/...@sha256:...), use the reference as-is instead of
appending :tag. This avoids invalid image strings like 'image@sha256🔤'
when tag is empty and allows users to pin by digest.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since QEMU v10.0.0 and Linux v6.13, virtio-mem-ccw is supported.
Let's enable the required kernel configs for s390x.
This commit enables `CONFIG_VIRTIO_MEM` and `CONFIG_MEMORY_HOTREMOVE`
to support memory hotplug in the VM guest.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
getDefaultShimForArch previously returned whatever string was set in
defaultShim.<arch> without any validation. A typo, a non-existent shim,
or a shim that is disabled via disableAll would all silently produce a
bogus DEFAULT_SHIM_* env var, causing kata-deploy to fail at runtime.
Guard the return value by checking whether the configured shim is
present in the list of shims that are both enabled and support the
requested architecture. If not, return empty string so the env var is
simply omitted.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Using `$runtime.containerd.snapshotter` and `$runtime.crio.pullType`
panics with a nil pointer error when the containerd or crio block is
absent from the custom runtime definition.
Let's use the `dig` function which safely traverses nested keys and
returns an empty string as the default when any key in the path is
missing.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When kata-deploy is deployed with cloud-api-adaptor, it
defaults to qemu instead of configuring the remote shim.
Support ppc64le to enable it correctly when shims.remote.enabled=true
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>