After supporting the Arm CCA, it will rely on the kernel kvm.h headers to build the
runtime. The kernel-headers currently quite new with the traditional one, so that we
rely on build the kernel header first and then inject it to the shim-v2 build container.
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Co-authored-by: Seunguk Shin <seunguk.shin@arm.com>
On commit 9602ba6ccc, from February this
year, we've introduced a check to ensure that the files needed for
signing the kernel build are present. However, we've noticed last week
that there were a reasonable amount of wrong assumptions with the
workflow. :-)
Zvonko fixed the majority of those, but this bit was left and it'd cause
breakages when using kernel that was cached ... although passing when
building new kernels.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For those who are not willing to use the nydus-snapshotter for pulling
the image inside the guest, let's allow them setting the
experimetal_force_guest_pull, introduced by Edgeless, as part of our
helm-chart.
This option can be set as:
_experimentalForceGuestPull: "qemu-tdx,qemu-coco-dev"
Which would them ensure that the configuration for `qemu-tdx` and
`qemu-coco-dev` would have the option enabled.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As the kata-deploy helm chart has been the only way we've been testing
kata-containers deployment as part of our CI, it's time to finally get
rid of the kustomize yamls and avoid us having to maintain two different
methods (with one of those not being tested).
Here I removed:
* kata-deploy yamls and kustomize yamls
* kata-cleanup yamls and kustomize yamls
* kata-rbac yals and kustomize yamls
* README.md for the kustomize yamls was removed
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Canonical TDX release is not needed for vanilla Ubuntu 25.10 but
GRUB_CMDLINE_LINUX_DEFAULT needs to contain `nohibernate` and
`kvm_intel.tdx=1`
Signed-off-by: Szymon Klimek <szymon.klimek@intel.com>
Let's expose the EXPERIMENTAL_SETUP_SNAPSHOTTER script environment
variable to our chart, allowing then users of our helm chart to take
advantage of this experimental feature.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We may deploy in scenarios where we want to have both snapshotters set
up, sometimes even for simple test on which one behaves better.
With this in mind, let's allow EXTERNAL_SETUP_SNAPSHOTTER to receive a
comma separated list of snapshotters, such as:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs,nydus"
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Similarly to what's been done for the nydus-snapshotter, let's allow
users to have erofs-snapshotter set up by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs".
```
Mind that erofs, although a built-in containerd snapshotter, has system
depdencies that we will *NOT* install and it's up to the admin to do so.
These dependencies are:
* erofs-utils
* fsverity
* erofs module loaded
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's introduce a new EXPERIMENTAL_SETUP_SNAPSHOTTER environemnt
variable that, when set, allows kata-deploy to put the nydus snapshotter
in the correct place, and configure containerd accordingly.
Mind, this is a stop gap till the nydus-snapshotter helm chart is ready
to be used and behaving well enough to become a weak dependency of our
helm chart. When that happens this code can be deleted entirely.
Users can have nydus-snapshotter deployed and configured for the
guest-pull use case by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="nydus"
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Otherwise we'd end up adding a the file several times, which could lead
to problems when removing the entry, leading to containerd not being
able to start due to an import file not being present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Moving the CUDA repo to the top for all essential packages
and adding a repo priority favouring NVIDIA based repos.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
I've hit this when using a machine with slow internet connection, which
took ages to download the kata-cleanup image, and then helm timed out in
the middle of the cleanup, leading to the cleanup job being restarted
and then bailing with an error as the runtimeclasses that kata-deploy
tries to delete were already deleted.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit cb5f143b1b, as the
cached packages have been regenerated after the switch to using zstd.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We need to get the root_hash.txt file from the image build, otherwise
there's no way to build the shim using those values for the
configuration files.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Although the compress ratio is not as optimal as using xz, it's way
faster to compress / uncompress, and it's "good enough".
This change is not small, but it's still self-contained, and has to get
in at once, in order to help bisects in the future.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As 3.18 is already EOL.
We need to add `--break-system-packages` to enforce the install of the
installation of the yq version that we rely on. The tests have shown
that no breakage actually happens, fortunately.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We need to make sure that we use the latest kernel
and rebuild the initrd and image for the nvidia-gpu
use-cases otherwise the tests will fail since
the modules are not build against the new kernel and
they simply fail to load.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
As there were a few moderate security vulnerability fixes missed as part
of the 3.19.0 release.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
For the release itself, let's simply copy the VERSION file to the
tarball.
To do so, we had to change the logic that merges the build, as at that
point the tag is not yet pushed to the repo.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
- Add nodeSelector configuration to values.yaml with empty default
- Update DaemonSet template to conditionally include nodeSelector
- Add documentation and examples for nodeSelector usage in README
- Allows users to restrict kata-containers deployment to specific nodes by labeling them
Signed-off-by: Gus Minto-Cowcher <gus@basecamp-research.com>
The KERNEL_DEBUG_ENABLED was missing in the outer shell script
so overrides via make were not possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The peer pods project is using the agent-ctl tool in some
tests, so tagging our cache will let them more easily identify
development versions of kata for testing between releases.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We want to be able to build a debug version of the kernel for various
use-cases like debugging, tracing and others.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The convention for rootfs-* names is:
* rootfs-${image_type}-${special_build}
If this is not followed, cache will never work as expected, leading to
building the initrd / image on every single build, which is specially
constly when building the nvidia specific targets.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Removing files related to SEV, responsible for
installing and configuring Kata containers.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
679cc9d47c was merged and bumped the
podoverhead for the gpu related runtimeclasses. However, the bump on the
`kata-runtimeClasses.yaml` as overlooked, making our tests fail due to
that discrepancy.
Let's just adjust the values here and move on.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We cannot only rely only on default_cpu and default_memory in the
config, default is 1 and 2Gi but we need some overhead for QEMU and
the other related binaries running as the pod overhead. Especially
when QEMU is hot-plugging GPUs, CPUs, and memory it can consume more
memory.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This helps considerably to avoid patching the code, and just adjusting
the build environment to use a smaller alignment than the default one.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
the latest Canonical TDX release supports 25.04 / Plucky as
well. Users experimenting with the latest goodies in the
25.04 TDX enablement won't get Kata deployed properly.
This change accepts 25.04 as supported distro for TDX.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Now kata-agent by default supports both guest pull and host pull
abilities, thus we do not need to specify the PULL_TYPE env when
building kata-agent.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This completely eliminates the Azure secret from the repo, following the below
guidance:
https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-azure
The federated identity is scoped to the `ci` environment, meaning:
* I had to specify this environment in some YAMLs. I don't believe there's any
downside to this.
* As previously, the CI works seamlessly both from PRs and in the manual
workflow.
I also deleted the tools/packaging/kata-deploy/action folder as it doesn't seem
to be used anymore, and it contains a reference to the secret.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
We need more and accurate documentation. Let's start
by providing an Helm Chart install doc and as a second
step remove the kustomize steps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Steve Horsman <steven@uk.ibm.com>
The Guest rootfs image file size is aligned up to 128M boundary,
since commmit 2b0d5b2. This change allows users to use a custom
alignment value - e.g., to align up to 2M, users will be able to
specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>