This reverts commit be05e1370c, which is
not a problem as we never released such option.
Conflicts:
tools/packaging/kata-deploy/helm-chart/README.md
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We had this logic inside the script when we didn't use the helm chart.
However, this only makes the shim script more convoluted for no reason.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
In order to fix:
```
=== Running govulncheck on containerd-shim-kata-v2 ===
Vulnerabilities found in containerd-shim-kata-v2:
=== Symbol Results ===
Vulnerability #1: GO-2025-4015
Excessive CPU consumption in Reader.ReadResponse in net/textproto
More info: https://pkg.go.dev/vuln/GO-2025-4015
Standard library
Found in: net/textproto@go1.24.6
Fixed in: net/textproto@go1.24.8
Vulnerable symbols found:
#1: textproto.Reader.ReadResponse
Vulnerability #2: GO-2025-4014
Unbounded allocation when parsing GNU sparse map in archive/tar
More info: https://pkg.go.dev/vuln/GO-2025-4014
Standard library
Found in: archive/tar@go1.24.6
Fixed in: archive/tar@go1.24.8
Vulnerable symbols found:
#1: tar.Reader.Next
Vulnerability #3: GO-2025-4013
Panic when validating certificates with DSA public keys in crypto/x509
More info: https://pkg.go.dev/vuln/GO-2025-4013
Standard library
Found in: crypto/x509@go1.24.6
Fixed in: crypto/x509@go1.24.8
Vulnerable symbols found:
#1: x509.Certificate.Verify
#2: x509.Certificate.Verify
Vulnerability #4: GO-2025-4012
Lack of limit when parsing cookies can cause memory exhaustion in net/http
More info: https://pkg.go.dev/vuln/GO-2025-4012
Standard library
Found in: net/http@go1.24.6
Fixed in: net/http@go1.24.8
Vulnerable symbols found:
#1: http.Client.Do
#2: http.Client.Get
#3: http.Client.Head
#4: http.Client.Post
#5: http.Client.PostForm
Use '-show traces' to see the other 9 found symbols
Vulnerability #5: GO-2025-4011
Parsing DER payload can cause memory exhaustion in encoding/asn1
More info: https://pkg.go.dev/vuln/GO-2025-4011
Standard library
Found in: encoding/asn1@go1.24.6
Fixed in: encoding/asn1@go1.24.8
Vulnerable symbols found:
#1: asn1.Unmarshal
#2: asn1.UnmarshalWithParams
Vulnerability #6: GO-2025-4010
Insufficient validation of bracketed IPv6 hostnames in net/url
More info: https://pkg.go.dev/vuln/GO-2025-4010
Standard library
Found in: net/url@go1.24.6
Fixed in: net/url@go1.24.8
Vulnerable symbols found:
#1: url.JoinPath
#2: url.Parse
#3: url.ParseRequestURI
#4: url.URL.Parse
#5: url.URL.UnmarshalBinary
Vulnerability #7: GO-2025-4009
Quadratic complexity when parsing some invalid inputs in encoding/pem
More info: https://pkg.go.dev/vuln/GO-2025-4009
Standard library
Found in: encoding/pem@go1.24.6
Fixed in: encoding/pem@go1.24.8
Vulnerable symbols found:
#1: pem.Decode
Vulnerability #8: GO-2025-4008
ALPN negotiation error contains attacker controlled information in
crypto/tls
More info: https://pkg.go.dev/vuln/GO-2025-4008
Standard library
Found in: crypto/tls@go1.24.6
Fixed in: crypto/tls@go1.24.8
Vulnerable symbols found:
#1: tls.Conn.Handshake
#2: tls.Conn.HandshakeContext
#3: tls.Conn.Read
#4: tls.Conn.Write
#5: tls.Dial
Use '-show traces' to see the other 4 found symbols
Vulnerability #9: GO-2025-4007
Quadratic complexity when checking name constraints in crypto/x509
More info: https://pkg.go.dev/vuln/GO-2025-4007
Standard library
Found in: crypto/x509@go1.24.6
Fixed in: crypto/x509@go1.24.9
Vulnerable symbols found:
#1: x509.CertPool.AppendCertsFromPEM
#2: x509.Certificate.CheckCRLSignature
#3: x509.Certificate.CheckSignature
#4: x509.Certificate.CheckSignatureFrom
#5: x509.Certificate.CreateCRL
Use '-show traces' to see the other 27 found symbols
Vulnerability #10: GO-2025-4006
Excessive CPU consumption in ParseAddress in net/mail
More info: https://pkg.go.dev/vuln/GO-2025-4006
Standard library
Found in: net/mail@go1.24.6
Fixed in: net/mail@go1.24.8
Vulnerable symbols found:
#1: mail.AddressParser.Parse
#2: mail.AddressParser.ParseList
#3: mail.Header.AddressList
#4: mail.ParseAddress
#5: mail.ParseAddressList
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Otherwise we'll face issues like:
```
Error: found in Chart.yaml, but missing in charts/ directory: node-feature-discovery
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we add NFD as a weak dependency of the kata-deploy
helm chart.
What we're doing for now is leaving it up to the user / admin to enable
it, and if enabled then we do a explicit check for virtualization
support (x86_64 only for now).
In case NFD is already deployed, we fail the installation (in case it's
enabled on the kata-deploy helm chart) with a clear error message to the
user.
While I know that kata-remote **DOES NOT** require virtualization, I've
left this out (with a comment for when we add a peer-pods dependency on
kata-deploy) in order to simplify things for now, as kata-remote is not
a deployed shim by default.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As Kata Containers can be consumed by other helm-charts, hard coding the
default runtime class name to `kata` is not optimal.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
All the options that take a specific shim as an argument MUST have
specific per arch settings, as not all the shims are available for all
the arches, leading to issues when setting up multi-arch deployments.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Let's ensure that we consume NVRC releases straight from GitHub instead
of building the binaries ourselves.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The libs in question were added when moving to developer.nvidia.com
but switching back to ubuntu only based builds they are not needed.
Remove them to keep the rootfs as minimal as possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In the case of CC we need additional libraries in the rootfs.
Add them conditionally if type == confidential.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
When the NodeFeatureRule CRD is detected kata-deploy will:
* Create the specific NodeFeatureRules for the x86_64 TEEs
* Adapt the TEEs runtime classes to take into account the amount of keys
available in the system when spawning the podsandbox.
Note, we still do not have NFD as sub-dependency of the helm chart, and
I'm not even sure if we will have. However, it's important to integrate
better with the scenarios where the NFD is already present.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This allows us to do a full multi-arch deployment, as the user can
easily select which shim can be deployed per arch, as some of the VMMs
are not supported on all architectures, which would lead to a broken
installation.
Now, passing shims per arch we can easily have an heterogenous
deployment where, for instance, we can set qemu-se-runtime-rs for s390x,
qemu-cca for aarch64, and qemu-snp / qemu-tdx for x86_64 and call all of
those a default kata-confidential ... and have everything working with
the same deployment.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Build only from Ubuntu repositories do not mix with developer.nvidia.com
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Update tools/osbuilder/rootfs-builder/nvidia/nvidia_chroot.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Aurélien has moved to a reliable mirror for our tests, but we missed
that our tools Dockerfiles could benefit from the same change, which is
added now.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Although we saw this happening, we expected it to NOT happen ...
As the kernel is not signed, but we expect it to be (the cached
version), then we're bailing. :-/
Let's ensure a full rebuild of kernels happen and we'll be good from
that point onwards.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
By doing this we can ensure that more than one instance of
nydus-snapshotter can be running inside the cluster, which is super
useful for doing A-B "upgrades" (where we install a new version of
kata-containers + nydus on B, while A is still running, and then only
uninstall A after making sure that B is working as expected).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've been wrongly trying to set up the `${shim}` (as the qemu-snp, for
instance) as the hypervisor name in the kata-containers configuration
file, leading to an `tomlq` breaking as all the .hypervisors.qemu* shims
are tied to the `qemu` hypervisor, and it happens regardless of the shim
having a different name, or the hypervisor being experimental or not.
```sh
$ grep "hypervisor.qemu*" src/runtime/config/configuration-*
src/runtime/config/configuration-qemu-cca.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-coco-dev.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu-snp.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu-tdx.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-nvidia-gpu.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-se.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-snp.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu-tdx.toml.in:[hypervisor.qemu]
src/runtime/config/configuration-qemu.toml.in:[hypervisor.qemu]
$ grep "hypervisor.qemu*" src/runtime-rs/config/configuration-*
src/runtime-rs/config/configuration-qemu-runtime-rs.toml.in:[hypervisor.qemu]
src/runtime-rs/config/configuration-qemu-se-runtime-rs.toml.in:[hypervisor.qemu]
```
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We've recently added support for:
* deploying and setting up a snapshotter, via
_experimentalSetupSnapshotter
* enabling experimental_force_guest_pull, via
_experimentalForceGuestPull
However, we never updated the documentation for those, thus let's do it
now.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This change ensures that the NVIDIA package repository for nvidia-imex
and libnvidia-nspc is being used as source.
The NVIDIA repository does not publish these packages with a -580
version suffix, which made us fall back to the packages from the
Ubuntu repository.
These two packages were recently updated by Ubuntu to depend on
nvidia-kernel-common-580-server (this happened from version
580.82.07-0ubuntu1 to version 580.95.05-0ubuntu1). This conflicts
with nvidia-kernel-common-580 which gets installed by
nvidia-headless-no-dkms-580-open, thus causing a build failure.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
While the local-build's folder's Makefile dependencies for the
confidential nvidia rootfs targets already declare the pause image
and coco-guest-components dependencies, the actual rootfs
composition does not contain the pause image bundle and relevant
certificates for guest pull. This change ensure the rootfs gets
composed with the relevant files.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
After supporting the Arm CCA, it will rely on the kernel kvm.h headers to build the
runtime. The kernel-headers currently quite new with the traditional one, so that we
rely on build the kernel header first and then inject it to the shim-v2 build container.
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Co-authored-by: Seunguk Shin <seunguk.shin@arm.com>
With this change we namespace the stage one rootfs tarball name
and use the same name across all uses. This will help overcome
several subtle local build problems.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We need to ensure that any change on the Dockerfile (and its dir) leads
to the build being retriggered, rather than using the cached version.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We are seeing more protoc related failures on the new
runners, so try adding the protobuf-compiler dependency
to these steps to see if it helps.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
On commit 9602ba6ccc, from February this
year, we've introduced a check to ensure that the files needed for
signing the kernel build are present. However, we've noticed last week
that there were a reasonable amount of wrong assumptions with the
workflow. :-)
Zvonko fixed the majority of those, but this bit was left and it'd cause
breakages when using kernel that was cached ... although passing when
building new kernels.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This is needed to the kernel setup picks up the correct
config values from our fragments directories.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to make sure that the kernel we're using has the
correct configs set, otherwise the module signing will not work.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
What was done in the past, trying to set the env var on the same step
it'd be used, simply does not work.
Instead, we need to properly set it through the `env` set up, as done
now.
We're also bumping the kata_config_version to ensure we retrigger the
kernel builds.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There's no reason to have the code duplication between the SNP / TDX
tests for CoCo, as those are basically using the same configuration
nowadays.
Note that for the TEEs case, as the nydus-snapshotter is deployed by the
admin, once, instead of deploying it on every run ... I'm actually
removing the nydus-snapshotter steps so we make it clear that those
steps are not performed by the CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
For DGX like systems we need additional binaries and libraries,
enable the Kata AND CoCo use-case.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Update tools/osbuilder/rootfs-builder/nvidia/nvidia_rootfs.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
For those who are not willing to use the nydus-snapshotter for pulling
the image inside the guest, let's allow them setting the
experimetal_force_guest_pull, introduced by Edgeless, as part of our
helm-chart.
This option can be set as:
_experimentalForceGuestPull: "qemu-tdx,qemu-coco-dev"
Which would them ensure that the configuration for `qemu-tdx` and
`qemu-coco-dev` would have the option enabled.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As the kata-deploy helm chart has been the only way we've been testing
kata-containers deployment as part of our CI, it's time to finally get
rid of the kustomize yamls and avoid us having to maintain two different
methods (with one of those not being tested).
Here I removed:
* kata-deploy yamls and kustomize yamls
* kata-cleanup yamls and kustomize yamls
* kata-rbac yals and kustomize yamls
* README.md for the kustomize yamls was removed
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
NVRC introduced the confidential feature flag and we
haven't updated the rootfs build to accomodate.
If rootfs_type==confidential user --feature=confidential
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Canonical TDX release is not needed for vanilla Ubuntu 25.10 but
GRUB_CMDLINE_LINUX_DEFAULT needs to contain `nohibernate` and
`kvm_intel.tdx=1`
Signed-off-by: Szymon Klimek <szymon.klimek@intel.com>