As the some of the global vars can be empty, we should actually check
their _FOR_ARCH version instead.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're making the values.yaml more user friendly, we actually have to
handle the https_proxy and no_proxy entries per shim, instead of having
this globally available, as this will only affect images being pulled
inside the guest (as in, when using TEE variations of the shims).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For the nvidia-gpu-snp and nvidia-gpu-tdx we must set containerd to
allow the CDI annotation to be passed to down.
This solution may become obsolete soon enough, but the cleanest way to
have it properly working is by adding it here (even if we remove it
before the next release).
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's now make sure that we don't add duplicated values to any of our
entries, making the script as sane as possible for sequential runs.
Vibed with Cursor's help!
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's add some helper functions, not yet used, to avoid adding
duplicated items.
This idea is an expansion of Choi's idea to avoid setting duplicated
items, and it'll help on making the whole script idempotent on
sequential runs.
Vibed with Cursor's help!
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
I know, this is not simplifying much things for now, but it has a good
intent in the background and will serve as base for making the
kata-deploy helm chart more user friendly.
With that said, let's add ALLOWED_HYPERVISOR_ANNOTATIONS per arch, while
adding support to set something like "qemu:foo,bar clh:bar foobar
barfoo". Why? Because in the future we'll have a better way to set this
per shim (and the shim is per arch ...).
More details of what we'll do in the future are being discussed here:
https://github.com/kata-containers/kata-containers/issues/12024
Anyways, the variables are **DELIBERATELY** not exposed to the chart for
now, as those will be later on when addressing the issue mentioned
above.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This reverts commit be05e1370c, which is
not a problem as we never released such option.
Conflicts:
tools/packaging/kata-deploy/helm-chart/README.md
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As Kata Containers can be consumed by other helm-charts, hard coding the
default runtime class name to `kata` is not optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
All the options that take a specific shim as an argument MUST have
specific per arch settings, as not all the shims are available for all
the arches, leading to issues when setting up multi-arch deployments.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When the NodeFeatureRule CRD is detected kata-deploy will:
* Create the specific NodeFeatureRules for the x86_64 TEEs
* Adapt the TEEs runtime classes to take into account the amount of keys
available in the system when spawning the podsandbox.
Note, we still do not have NFD as sub-dependency of the helm chart, and
I'm not even sure if we will have. However, it's important to integrate
better with the scenarios where the NFD is already present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This allows us to do a full multi-arch deployment, as the user can
easily select which shim can be deployed per arch, as some of the VMMs
are not supported on all architectures, which would lead to a broken
installation.
Now, passing shims per arch we can easily have an heterogenous
deployment where, for instance, we can set qemu-se-runtime-rs for s390x,
qemu-cca for aarch64, and qemu-snp / qemu-tdx for x86_64 and call all of
those a default kata-confidential ... and have everything working with
the same deployment.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
By doing this we can ensure that more than one instance of
nydus-snapshotter can be running inside the cluster, which is super
useful for doing A-B "upgrades" (where we install a new version of
kata-containers + nydus on B, while A is still running, and then only
uninstall A after making sure that B is working as expected).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've been wrongly trying to set up the `${shim}` (as the qemu-snp, for
instance) as the hypervisor name in the kata-containers configuration
file, leading to an `tomlq` breaking as all the .hypervisors.qemu* shims
are tied to the `qemu` hypervisor, and it happens regardless of the shim
having a different name, or the hypervisor being experimental or not.
```sh
$ grep "hypervisor.qemu*" src/runtime/config/configuration-*
src/runtime/config/configuration-qemu-cca.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-coco-dev.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu-snp.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu-tdx.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-se.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-snp.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-tdx.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu.toml.in:[hypervisor.qemu]
$ grep "hypervisor.qemu*" src/runtime-rs/config/configuration-*
src/runtime-rs/config/configuration-qemu-runtime-rs.toml.in:[hypervisor.qemu]
src/runtime-rs/config/configuration-qemu-se-runtime-rs.toml.in:[hypervisor.qemu]
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For those who are not willing to use the nydus-snapshotter for pulling
the image inside the guest, let's allow them setting the
experimetal_force_guest_pull, introduced by Edgeless, as part of our
helm-chart.
This option can be set as:
_experimentalForceGuestPull: "qemu-tdx,qemu-coco-dev"
Which would them ensure that the configuration for `qemu-tdx` and
`qemu-coco-dev` would have the option enabled.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Canonical TDX release is not needed for vanilla Ubuntu 25.10 but
GRUB_CMDLINE_LINUX_DEFAULT needs to contain `nohibernate` and
`kvm_intel.tdx=1`
Signed-off-by: Szymon Klimek <szymon.klimek@intel.com>
We may deploy in scenarios where we want to have both snapshotters set
up, sometimes even for simple test on which one behaves better.
With this in mind, let's allow EXTERNAL_SETUP_SNAPSHOTTER to receive a
comma separated list of snapshotters, such as:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs,nydus"
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Similarly to what's been done for the nydus-snapshotter, let's allow
users to have erofs-snapshotter set up by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="erofs".
```
Mind that erofs, although a built-in containerd snapshotter, has system
depdencies that we will *NOT* install and it's up to the admin to do so.
These dependencies are:
* erofs-utils
* fsverity
* erofs module loaded
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's introduce a new EXPERIMENTAL_SETUP_SNAPSHOTTER environemnt
variable that, when set, allows kata-deploy to put the nydus snapshotter
in the correct place, and configure containerd accordingly.
Mind, this is a stop gap till the nydus-snapshotter helm chart is ready
to be used and behaving well enough to become a weak dependency of our
helm chart. When that happens this code can be deleted entirely.
Users can have nydus-snapshotter deployed and configured for the
guest-pull use case by simply passing:
```
EXPERIMENTAL_SETUP_SNAPSHOTTER="nydus"
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Otherwise we'd end up adding a the file several times, which could lead
to problems when removing the entry, leading to containerd not being
able to start due to an import file not being present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
I've hit this when using a machine with slow internet connection, which
took ages to download the kata-cleanup image, and then helm timed out in
the middle of the cleanup, leading to the cleanup job being restarted
and then bailing with an error as the runtimeclasses that kata-deploy
tries to delete were already deleted.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Removing files related to SEV, responsible for
installing and configuring Kata containers.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
the latest Canonical TDX release supports 25.04 / Plucky as
well. Users experimenting with the latest goodies in the
25.04 TDX enablement won't get Kata deployed properly.
This change accepts 25.04 as supported distro for TDX.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
`containerd` command should be executed in the host environment.
(To generate the config that matches the host's containerd version.)
Fixes: #11092
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
The previous attempt to fix this issue only took in consideration the
QEMU binary, as I completely forgot that there were other pieces of the
config that we also adjusted.
Now, let's just check one of the configs before trying to adjust
anything else, and only do the changes if the suffix added with the
multi-install suffix is not yet added.{
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Once the multiInstallSuffix has been taken into account, we should not
keep appending it on every re-run/restart, as that would lead to a path
that does not exist.
Fixes: #11187
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
INSTALLATION_PREFIX must begin with a "/"
because it is being concatenated with /host.
If there is no /, displays a message and makes an error.
Fixes: #11096
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
It hangs when invalid arguments are specified.
```bash
kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh xxx
Action:
* xxx
...
Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset]
ERROR: invalid arguments
...
^C <- hang
```
I changed it to behave the same as when there are no arguments.
```bash
kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh
Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset]
ERROR: invalid arguments
kata-deploy-6sr2p:/# echo $?
1
```
Fixes: #11068
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
When `KATA_HYPERVISOR` is set to `qemu-se-runtime-rs`,
a configuration file is properly referenced and a runtime class
should be created via kata-deploy.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Change kata-deploy script and Helm chart in order to be able to use kata-deploy on a microk8s cluster deployed with snap.
Fixes: #10830
Signed-off-by: Stephane Talbot <Stephane.Talbot@univ-savoie.fr>
This is super useful for development / debugging scenarios, mainly when
dealing with limited hardware availability, as this change allows
multiple people to develop into one single machine, while still using
kata-deploy.
Fixes: #10546
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This will make our lives considerably easier when it comes to cleaning
up content added, while it's also a groundwork needed for having
multiple installations running in parallel.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's actually mount the whole /etc/k0s as /etc/containerd, so we can
easily access the containerd configuration file which has the version in
it, allowing us to parse it instead of just making a guess based on
kubernetes distro being used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
On Ubuntu 24.04, with the distro default containerd, we're already
getting:
```
$ containerd config default | grep "version = "
version = 3
```
With that in mind, let's make sure that we're ready to support this from
the next release.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Use the configuration used by AKS (static_sandbox_resource_mgmt=true)
for CI testing on Mariner hosts.
Hopefully pod startup will become more predictable on these hosts -
e.g., by avoiding the occasional hotplug timeouts described by #10413.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This will help to avoid code duplication on what's needed on the helm
and non-helm cases.
The reason it's not been added as part of the commit which adds the
post-delete hook is simply for helping the reviewer (as the diff would
be less readable with this change).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Instead of using a lifecycle.preStop hook, as done when we're using
using the helm chat, let's add a post-delete hook to take care of
properly cleaning up the node during when uninstalling kata-deploy.
The reason why the lifecyle.preStop hook would never work on our case is
simply because each helm chart operation follows the Kuberentes
"declarative" approach, meaning that an operation won't wait for its
previous operation to successfully finish before being called, leading
to us trying to access content that's defined by our RBAC, in an
operation that was started before our RBAC was deleted, but having the
RBAC being deleted before the operation actually started.
Unfortunately this hook brings in some code duplicatioon, mainly related
to the RBAC parts, but that's not new as the same happens with our
deamonset.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's simply easier if we just use /host/opt/kata instead in our scripts,
which will simplify a lot the logic of adding an INSTALLATION_PREFIX
later on.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As we build our binaries with the `/opt/kata` prefix, that's the value
of $dest_dir.
Later in thise series it'll become handy, as we'll introduce a way to
install the Kata Containers artefacts in a different location.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>