Use the kubelet's Pod Resources API to determine the device allocations
for the Pod during sandbox creation. Use CDI files to translate the device
IDs into the corresponding device paths, then inject the devices.
Fixes #12009
Signed-off-by: Joji Mekkattuparamban <jojim@nvidia.com>
Use the pod name variable so that kubectl wait finds the pod. Currently,
kubectl waits for nvidia-nim-llama-3-2-nv-embedqa-1b-v2, not for
nvidia-nim-llama-3-2-nv-embedqa-1b-v2-tee.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Introduce a new devkit parameter that produces a rootfs without
chiselling. This results in a larger rootfs that includes various
packages and binaries, which, for instance, enables the use of the
debug console.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Rust packages are cloned and built inside the
tools/packaging/kata-deploy/local-build/build folder, which may mislead
those packages into thinking they are part of the Kata root workspace.
Exclude the directory to avoid that.
Reported-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The person who introduced the check, someone named Fabiano Fidêncio,
forgot a `$` in a variable assignment.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Enable policy auto-generation on cbl-mariner hosts for
qemu-coco-dev-runtime-rs if the user didn't specify an
AUTO_GENERATE_POLICY value.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We will re-enable this one later on, once the changes to properly cold
plug multiple GPUs are merged.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's just move the podOverhead to a gigantic value, as we do need pod
sandboxes as big as that, and we've noticed QEMU being OOM killed with
smaller overheads.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Those need to pull the models inside the guest, and the guest has 50% of
its memory "allowed" to be used as tmpfs, so we gotta use the RAM that
we have.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Yes, we're dealing with a combination of large images and image-rs
concurrent image layers not being optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We cannot use the same format used for Docker, as it includes a username
and password, while what's expected when using Trustee does not.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Now that we've bumped Trustee to a version that supports the NVIDIA
remote verifier, let's re-enable the tests.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Right now we have only been passing the env var to the deployment
script, but we really need to pass it to the tests script as well.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Try to reduce the page limit of each job request to lower the chances of
us tripping over GitHub's 10s API limit.
All credit to @burgerdev for the investigation and suggestion!
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The current implementation causes issues with the Agent Policy
non-TEE CI tests, as Kata-Agent does not allow any configuration
other than `count(Linux.Resources.Devices) == 0`.
This commit ensures that Linux.Resources.Devices, including all its
values, is completely cleared from the OCI Runtime Specification before
being passed to the Kata-Agent.
This addresses the CI failure by enforcing the required empty state for
the Devices cgroup configuration.
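As a rough illustration of the clearing step described above, the sketch
below uses simplified, hypothetical stand-ins for the OCI spec types (the
struct names, fields, and the clear_device_cgroup_rules helper are
assumptions made for this example, not the runtime's actual definitions):

```rust
/// Hypothetical, simplified stand-ins for the OCI runtime spec types.
#[derive(Debug, Default, Clone, PartialEq)]
struct LinuxDeviceCgroup {
    allow: bool,
    access: String,
}

#[derive(Debug, Default)]
struct LinuxResources {
    devices: Vec<LinuxDeviceCgroup>,
}

#[derive(Debug, Default)]
struct Linux {
    resources: Option<LinuxResources>,
}

#[derive(Debug, Default)]
struct Spec {
    linux: Option<Linux>,
}

/// Drop every device cgroup rule so that the spec handed to the agent
/// satisfies `count(Linux.Resources.Devices) == 0`.
/// A spec without a `linux` or `resources` section is left untouched.
fn clear_device_cgroup_rules(spec: &mut Spec) {
    if let Some(resources) = spec.linux.as_mut().and_then(|l| l.resources.as_mut()) {
        resources.devices.clear();
    }
}
```

Clearing the vector in place, rather than dropping the whole resources
section, keeps any other resource settings intact.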
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Previously, the CopyFile implementation attempted to reuse existing guest
paths for subsequent containers within the same Pod. This prevented
correct bind mounting of shared configurations (e.g., ConfigMaps,
Service Accounts) into the later containers of a multi-container
pod, as they lacked their own allocated guest path.
This commit modifies the logic to create a unique guest path for every
container that requires file propagation.
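One way to picture the per-container guest path is the minimal sketch
below. The guest_path_for helper, its parameters, and the directory layout
are hypothetical and only illustrate the idea of keying the path on the
container ID; they are not the actual implementation:

```rust
use std::path::{Path, PathBuf};

/// Build a guest path that is unique per container by placing the copied
/// file under a directory named after the container ID (assumed layout).
/// If `source` has no final file name component, the per-container
/// directory itself is returned.
fn guest_path_for(base: &Path, container_id: &str, source: &Path) -> PathBuf {
    let dir = base.join(container_id);
    match source.file_name() {
        Some(name) => dir.join(name),
        None => dir,
    }
}
```

With this shape, two containers in the same Pod that share the same host
file end up with distinct guest paths, so each one has its own bind mount
source.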
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Crates with no workspace setup of their own would assume they are part
of the root workspace, which is not ready for them yet. Exclude them
for now.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add a Cargo.toml at the repo root and use this root workspace for as many
Rust components of Kata Containers as possible. This enables us to
share a common Cargo.lock file and reduces the noise from dependabot.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Similar to #12075, bump backtrace to 0.3.76 to remove the dependency
on adler, which is unmaintained, contributing to mitigating RUSTSEC-2025-0056.
As a side effect, this brought in loads of other crate changes, which I think are due
to it bumping the local dependencies that this package builds on.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained, contributing to mitigating RUSTSEC-2025-0056.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained, contributing to mitigating RUSTSEC-2025-0056.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump backtrace to remove the dependency
on adler, which is unmaintained, contributing to mitigating RUSTSEC-2025-0056.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Similar to #12075, bump flate2 and backtrace to remove the dependency
on adler, which is unmaintained - contributing to mitigating RUSTSEC-2025-0056
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since network device hotplug is an asynchronous operation,
the hotplug call may return before the network device is
ready in the guest. Retry the operation until the device is
ready in the guest.
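The retry idea can be sketched as follows. This is a generic helper under
assumed names (retry_until_ready, max_attempts, delay), not the code from
this change; the real attempt count and delay are not stated here:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` tries have been made,
/// sleeping `delay` between failed attempts. `op` is always tried at
/// least once. Returns the first success, or the last error.
fn retry_until_ready<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Device not ready yet: wait and try again.
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

Here `op` would be the guest-side operation that fails while the
hotplugged device is still missing.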
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This makes the user experience better, as the admin can deploy Kata
Containers without having to download / set up any additional file.
Of course, if the admin wants something more specific, examples are
provided.
Tests and documentation are updated to reflect this change.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>