As we're failing on the uninstall, which seems related to a bug in NFD
itself, and I don't have access to an s390x machine to debug it, let's
skip the enablement for now and re-enable it once we've experimented
with it more on s390x.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we're failing to install NFD on CBL Mariner, let's skip the
enablement there, and re-enable it once we've experimented with it more
on that platform.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As we now have the ability to deploy NFD as a sub-chart of our chart,
let's make sure we test it in our CI.
We had to increase the existing timeout values for deploying /
undeploying kata, as NFD is now also deployed / undeployed.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we add NFD as a weak dependency of the kata-deploy
helm chart.
For now we're leaving it up to the user / admin to enable it, and if
enabled we do an explicit check for virtualization support (x86_64 only
for now).
If NFD is already deployed and it's also enabled in the kata-deploy
helm chart, we fail the installation with a clear error message to the
user.
While I know that kata-remote **DOES NOT** require virtualization, I've
left it out (with a comment for when we add a peer-pods dependency on
kata-deploy) in order to simplify things for now, as kata-remote is not
a shim that's deployed by default.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As Kata Containers can be consumed by other helm-charts, hard coding the
default runtime class name to `kata` is not optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
All the options that take a specific shim as an argument MUST have
per-arch settings, as not all shims are available on all architectures,
which leads to issues when setting up multi-arch deployments.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we consume NVRC releases straight from GitHub instead
of building the binaries ourselves.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The device node is either /dev/vfio/<num> or, in the IOMMUFD format,
/dev/vfio/devices/vfio<num>; in the latter case we strip the "vfio"
prefix from the basename:
/dev/vfio/123 -> basename "123" -> vfioNum = "123" -> cdi.k8s.io/vfio123
/dev/vfio/devices/vfio123 -> basename "vfio123" -> strip -> vfioNum = "123" -> cdi.k8s.io/vfio123
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This allows us to test privileged containers when using the webhook.
We can do this because kata-deploy sets privileged_without_host_devices = true for kata runtime by default.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
We have 2 tests running on GitHub-provided runners:
* devmapper
* CRI-O
- devmapper situation
devmapper is already being tested as part of one of the s390x CI jobs.
More than that, this test has been failing due to a lack of space on
the machine for quite some time, and no action was taken to bring it
back, either via GARM or some other way.
With that said, let's rely on the s390x CI to test devmapper and avoid
one extra failure on our CI by removing this one.
- cri-o situation
CRI-O is being tested with a pinned version of Kubernetes that has
already reached its EOL, and a CRI-O version matching that k8s version.
There have been attempts to raise issues, and also to provide a PR that
does at least part of the work ... leaving the debugging part to the
maintainers of the CI. However, the maintainers took no action on
those.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
- Explained the concept and benefits of VM templating
- Provided step-by-step instructions for enabling VM templating
- Detailed the setup for using snapshotter in place of VirtioFS for template-based VM creation
- Added performance test results comparing template-based and direct VM creation
Signed-off-by: ssc <741026400@qq.com>
- init: initialize the VM template factory
- status: check the current factory status
- destroy: clean up and remove factory resources
These commands provide basic lifecycle management for VM templates.
Signed-off-by: ssc <741026400@qq.com>
Use `ioctl_with_mut_ref` instead of `ioctl_with_ref` in the
`create_device` method, as the ioctl needs to write the fd of the
created device back into the `kvm_create_device` struct passed to it;
this fix was released in v0.12.1.
Signed-off-by: Siyu Tao <taosiyu2024@163.com>
Fix the cargo fmt issues so that we can make the libs tests required
again and prevent this regression from recurring.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The kata-deploy helm chart is *THE* way to deploy Kata Containers on
Kubernetes environments, and Kubernetes environments are basically the
only reliably tested deployment we have.
For now, let's just drop documentation that is outdated / incorrect;
in the future, let's ensure we update the linked docs as we work on
update / upgrade support for the helm chart.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The libs in question were added when moving to developer.nvidia.com,
but now that we're switching back to Ubuntu-only builds they are no
longer needed.
Remove them to keep the rootfs as minimal as possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In the case of CC we need additional libraries in the rootfs.
Add them conditionally if type == confidential.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Add build_vm_from_template(), which flips the boot_from_template flag,
wires factory.template_path/{memory,state} into the hypervisor config,
and returns ready-to-use hypervisor & agent instances.
When factory.template is enabled, VirtContainer bypasses the normal
creation path and boots the VM directly by restoring the template
through incoming migration, completing the "create → save → clone"
loop.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
Introduced factory::FactoryConfig with init/destroy/status commands to manage template pools.
Added template::Template to fetch, create and persist base VMs.
Introduced vm::{VM, VMConfig} exposing create, pause, save, resume, stop,
disconnect and migration helpers for sandbox integration.
Extended QemuInner to execute QMP incoming migration, pause/resume and
status tracking.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
Added new fields to the Hypervisor struct to support VM template
creation, template boot, memory and device state paths, shared path,
and store paths. Introduced a Factory struct in config to manage the
template path, cache endpoint, cache number, and template enable flag.
Integrated Factory into TomlConfig for runtime configuration parsing.
Fixes: #11413
Signed-off-by: ssc <741026400@qq.com>
This change enables running the Cloud Hypervisor VMM as a non-root user
when the rootless flag is set to true in the configuration.
Fixes: #11414
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
Pass the file descriptors of the tuntap device to the Cloud Hypervisor
VMM process, so that the process can open the device without
cap_net_admin.
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
There's no reason to keep the env var / input, as it has never been
used and kata-deploy now automatically detects whether NFD is deployed
or not.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
When the NodeFeatureRule CRD is detected kata-deploy will:
* Create the specific NodeFeatureRules for the x86_64 TEEs
* Adapt the TEEs' runtime classes to take into account the number of
keys available on the system when spawning the pod sandbox.
Note, we still do not have NFD as a sub-dependency of the helm chart,
and I'm not even sure we will. However, it's important to integrate
better with scenarios where NFD is already present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Change NIM bats file logic to allow skipping test cases which
require multiple GPUs. This can be helpful for test clusters where
there is only one node with a single GPU, or for local test
environments with a single-node cluster with a single GPU.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Temporarily disable the new runners for the artifact-building jobs.
They will be re-enabled once they are stable.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
This partially reverts 8dcd91c for the s390x because the
CI jobs are currently blocking the release. The new runners
will be re-introduced once they are stable and no longer
impact critical paths.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This allows us to do a full multi-arch deployment, as the user can
easily select which shim should be deployed per arch, since some of the
VMMs are not supported on all architectures, which would otherwise lead
to a broken installation.
Now, by passing shims per arch, we can easily have a heterogeneous
deployment where, for instance, we set qemu-se-runtime-rs for s390x,
qemu-cca for aarch64, and qemu-snp / qemu-tdx for x86_64, call all of
those the default kata-confidential ... and have everything working
within the same deployment.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The previous procedure failed to reliably ensure that all unused Device
Cgroups were completely removed, a failure consistently verified by CI
tests.
This change introduces a more robust and thorough cleanup mechanism.
The goal is to prevent the previous issues (likely stemming from
improper use of Rust mutable references) that caused the modifications
to be ineffective or incomplete.
This ensures a clean environment and reliable CI test execution.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Build only from Ubuntu repositories; do not mix in packages from
developer.nvidia.com.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Update tools/osbuilder/rootfs-builder/nvidia/nvidia_chroot.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Migrate the k8s job to a different runner and use a long running cluster
instead of creating the cluster on every run.
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>