Fixed a bug with the debug kernel build where common/ was repeated
after the common path variable, resulting in the debug
confs never being picked up.
This exposed a subsequent bug where the debug conf
was included in other builds, this is also fixed by creating a
separate directory for debug confs with one file at the moment,
debug.conf that contains debug configurations and bpf specific
configs.
To enable kernel builds (specifically for bpf) the dwarves package was added
to the kernel dockerfile for the pahole package.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Adds a BPF section in the debug.conf kernel configuration options
to enable eBPF and BTF support for debug kernel builds.
Signed-off-by: Agam Dua <agam_dua@apple.com>
Remove # !confidential from mmio.conf so CONFIG_VIRTIO_MMIO and
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES are included when building the
unified x86_64/s390x kernel with -x
Firecracker requires virtio-mmio for block devices; without it the
guest kernel panics (no /dev/vda).
Fixes: #12581
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Since QEMU v10.0.0 and Linux v6.13, virtio-mem-ccw is supported.
Let's enable the required kernel configs for s390x.
This commit enables `CONFIG_VIRTIO_MEM` and `CONFIG_MEMORY_HOTREMOVE`
to support memory hotplug in the VM guest.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Trim trailing whitespace and ensure final newline in non-vendor files
- Add .editorconfig-checker.json excluding vendor dirs, *.patch, *.img,
*.dtb, *.drawio, *.svg, and pkg/cloud-hypervisor/client so CI only
checks project code
- Leave generated and binary assets unchanged (excluded from checker)
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Remove the initramfs folder, its build steps, and use the kernel
based dm-verity enforcement for the handlers which used the
initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. Avoiding this, no
cumbersome escape logic is required as we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only relevant dm-verity parameters in semi-structured manner
(see above). The only place where the final command line is
assembled is in the shims. Further, this is a line easy to comment
out for developers to disable dm-verity enforcement (or for CI
tasks).
This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change alone should thus not be used as dm-verity
information will no longer be picked up by the shims.
systemd-analyze on the coco-dev handler shows that using the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average of 172.5ms was reduced to 141ms (4 measurements, each
with a basic pod manifest), i.e., the kernel phase duration is
improved by about 18 percent.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
Bump both the kernel and kernel-confidential versions from v6.12.x and
v6.16.x to v6.18.4, aligning with the new LTS release.
Kernel 6.18 introduced several configuration changes that required
updates to our kernel config fragments:
* CRYPTO_FIPS dependencies changed:
- In 6.12: depended on !CRYPTO_MANAGER_DISABLE_TESTS
- In 6.18: now depends on CRYPTO_SELFTESTS (which requires EXPERT)
Added CONFIG_EXPERT=y and CONFIG_CRYPTO_SELFTESTS=y to crypto.conf
to satisfy the new dependency chain.
* CONFIG_EXPERT is a naughty one, as it disables / enables a bunch
of things behind ones back, probably just to prove a point that
it is for experts ;-) ... regardless, a reasonable amount of
options had to be re-added in order to make sure anything ends
up broken.
* Legacy iptables support:
Kernel 6.18 requires explicit legacy xtables/iptables configs for
IP_NF_* options. Added CONFIG_NETFILTER_XTABLES_LEGACY,
CONFIG_IP_NF_IPTABLES_LEGACY, and CONFIG_IP6_NF_IPTABLES_LEGACY
to netfilter.conf.
* Module signing dependencies:
Added CONFIG_MODULES=y and other required dependencies to
module_signing.conf to ensure MODULE_SIG can be properly enabled.
* Whitelist updates:
- Added CONFIG_NF_CT_PROTO_DCCP (removed in 6.18+)
- Added CONFIG_CRYPTO_SELFTESTS, CONFIG_NETFILTER_XTABLES_LEGACY,
CONFIG_IP_NF_IPTABLES_LEGACY, CONFIG_IP6_NF_IPTABLES_LEGACY
(added in 6.18+, not present in older kernels like 6.12)
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Adds a practical set of kernel config used by docker-in-docker and kind
for network bridging and filtering. It also includes the matching IPv6
support to allow tools like kind that require IPv6 network policies to
work out of the box.
This support includes:
- nftables reject and filtering support for inet/ipv4/ipv6
- Bridge filtering for container-to-container traffic
- IPv6 NAT, filtering, and packet matching rules for network policies
- VXLAN and IPsec crypto support for network tunneling
- TMPFS POSIX ACL support for filesystem permissions
The configs are organized across fragment files:
- common/fs.conf: TMPFS ACL support
- common/crypto.conf: IPsec/VXLAN crypto algorithms
- common/network.conf: VXLAN, IPsec ESP, nftables bridge/ARP/netdev
- common/netfilter.conf: IPv6 netfilter stack and nftables advanced features
Fixes: #11886
Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
After supporting the Arm CCA, it will rely on the kernel kvm.h headers to build the
runtime. The kernel-headers currently quite new with the traditional one, so that we
rely on build the kernel header first and then inject it to the shim-v2 build container.
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Co-authored-by: Seunguk Shin <seunguk.shin@arm.com>
Currently, use of openvpn clients/servers is not possible in Kata UVMs.
Following error message can be expected:
ERROR: Cannot open TUN/TAP dev /dev/net/tun: No such device (errno=19)
To support opevpn scenarios using bridging and TAP, we enable various
kernel networking config options.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
SNP launch was failing after the confidential guest kernel was upgraded to 6.16.1.
Added required module CONFIG_MTRR enabled.
Added required module CONFIG_X86_PAT enabled.
Fixes: #11779
Signed-off-by: Ryan Savino <ryan.savino@amd.com>
Currently, the UVM kernel fails for istio deployments (at least with the
version we tested, 1.27.0). This is because the istio sidecar container
uses ip6tables and the required kernel configs are not built-in:
```
iptables binary ip6tables has no loaded kernel support and cannot be used, err: exit status 3 out: ip6tables v1.8.10 (legacy):
can't initialize ip6tables table `filter': Table does not exist (do you need to insmod?)
Perhaps ip6tables or your kernel needs to be upgraded.
```
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Linux v6.16 brings some useful features for the confidential guests.
Most importantly, it adds an ABI to extend runtime measurement registers
(RTMR) for the TEE platforms supporting it. This is currently enabled
on Intel TDX only.
The kernel version bump from v6.12.x to v6.16 forces some CONFIG_*
changes too:
MEMORY_HOTPLUG_DEFAULT_ONLINE was dropped in favor of more config
choices. The equivalent option is MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
X86_5LEVEL was made unconditional. Since this was only a TDX
configuration, dropping it completely as part of v6.16 is fine.
CRYPTO_NULL2 was merged with CRYPTO_NULL. This was only added in
confidential guest fragments (cryptsetup) so we can drop it in this update.
CRYPTO_FIPS now depends on CRYPTO_SELFTESTS which further depends on
EXPERT which we don't have. Enable both in a separate config fragment
for confidential guests. This can be moved to a common setting once
other targets bump to post v6.16.
CRYPTO_SHA256_SSE3 arch optimizations were reworked and are now enabled
by default. Instead of adding it to whitelist.conf, just drop it completely
since it was only enabled as part of "measured boot" feature for
confidential guests. CONFIG_CRYPTO_CRC32_S390 was reworked the same way.
In this case, whitelist.conf is needed.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
We want to be able to build a debug version of the kernel for various
use-cases like debugging, tracing and others.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Removing kernel config files realting
to SEV as part of the SEV deprecation
efforts.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Only sign the kernel if the user has provided the KBUILD_SIGN_PIN
otherwise ignore.
Whole here, let's move the functionality to the common fragments as it's
not a GPU specific functionality.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
There's no benefit on keeping those restricted to the dragonball build,
when they can be used with other VMMs as well (as long as they support
the mem-agent).
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Currently, Kata EROFS support needs it, otherwise it will:
[ 0.564610] erofs: (device sda): mounted with root inode @ nid 36.
[ 0.564858] overlayfs: failed to set xattr on upper
[ 0.564859] overlayfs: ...falling back to index=off,metacopy=off.
[ 0.564860] overlayfs: ...falling back to xino=off.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Linux CoCo x86 guest is hardened to ensure RDRAND provides enough
entropy to initialize Linux RNG. A failure will panic the guest.
For confidential guests any other RNG source is untrusted so disable
them.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
As the comment in the fragment suggests, this is for the firecracker builds
and not relevant for confidential guests, for example.
Exlude mmio.conf fragment by adding the new !confidential tag to drop
virtio MMIO transport for the confidential guest kernel (as virtio PCI is
enough for the use cases today).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
build-kernel.sh supports exluding fragments from the common base
set based on the kernel target architecture.
However, there are also cases where the base set must be stripped
down for other reason. For example, confidential guest builds want to
exclude some drivers the untrusted host may try to add devices (e.g.,
virtio-rng).
Make build-kernel.sh to skip fragments tagged using '!confidential'
when confidential guest kernels are built.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Knowing that the upstream project provides a "ready to use" version of
the kernel, it's good to include an easy way to users to monitor
performance, and that's what we're doing by enabling the TASKSTATS (and
related) kernel configs.
This has been present as part of older kernels, but I couldn't
reasonably find the reason why it's been dropped.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
AIA (Advanced Interrupt Architecture) is available and enabled by
default after v6.10 kernel, provide pci.conf to make proper use of IMSIC
of AIA.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Create `riscv` folder for riscv64 architecture to be inferred while
constructing kernel configuration, and introduce `base.conf` which
builds 64-bit kernel and with KVM built-in to kernel.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
At the proper step pass-through the var KBUILD_SIGN_PIN
so that the kernel_headers step has the PIN for encrypting
the signing key.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Set CONFIG_BLK_DEV_WRITE_MOUNTED=y to restore previous kernel behaviour.
Kernel v6.8+ will by default block buffer writes to block devices mounted by filesystems.
This unfortunately is what we need to use mounted loop devices needed by some teams
to build OSIs and as an overlay backing store.
More info on this config item [here](https://cateee.net/lkddb/web-lkddb/BLK_DEV_WRITE_MOUNTED.html)
Fixes: #10808
Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
Add mglru, debugfs and psi to dragonball-experimental/mem_agent.conf to
support mem_agent function.
Fixes: #10625
Signed-off-by: Hui Zhu <teawater@antgroup.com>
KinD checks for the presence of this (and other) kernel configuration
via scripts like
https://blog.hypriot.com/post/verify-kernel-container-compatibility/ or
attempts to directly use /proc/sys/kernel/keys/ without checking to see
if it exists, causing an exit when it does not see it.
Docker/it's consumers apparently expect to be able to use the kernel
keyring and it's associated syscalls from/for containers.
There aren't any known downsides to enabling this except that it would
by definition enable additional syscalls defined in
https://man7.org/linux/man-pages/man7/keyrings.7.html which are
reachable from userspace. This minimally increases the attack surface of
the Kata Kernel, but this attack surface is minimal (especially since
the kernel is most likely being executed by some kind of hypervisor) and
highly restricted compared to the utility of enabling this feature to
get further containerization compatibility.
Signed-off-by: Crypt0s <BryanHalf@gmail.com>
Add CONFIG_PAGE_REPORTING, CONFIG_BALLOON_COMPACTION and
CONFIG_VIRTIO_BALLOON to dragonball-experimental configs to open
dragonball function and free page reporting function.
Signed-off-by: Hui Zhu <teawater@antgroup.com>
With Nvidia DPU or ConnectX network adapter, VF can do VFIO passthrough
to guest VM in `guest-kernel` mode. In the guest kernel, the adapter's
driver is required to claim the VFIO device and create network interface.
Signed-off-by: Lei Huang <leih@nvidia.com>
While enabling the attestation for IBM SE, it was observed that
a kernel config `CONFIG_S390_UV_UAPI` is missing.
This config is required to present an ultravisor in the guest VM.
Ths commit adds the missing config.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`CONFIG_TN3270_TTY` and `CONFIG_S390_AP_IOMMU` are dropped for s390x
in 6.7.x which is used for a confidential kernel.
But they are still used for a vanilla kernel. So we need to add them
to the whitelist.
Fixes: #9465
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
CONFIG_CRYPTO_ECDSA is not supported in older kernels such as 5.10.x
which may cause building broken problem if we build such kernel with
NVIDIA GPU in version 5.10.x
So this patch is to add CONFIG_CRYPTO_ECDSA into whitelist.conf to
avoid break building guest kernel with NVIDIA GPU.
Fixes: #9140
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
- enable CONFIG_MTRR,CONFIG_X86_PAT on x86_64 for nvidia gpu
- optimize -f of build-kernel.sh, clean old kernel path and config before setup
- add kernel 5.16.x
Fixes: #9143
Signed-off-by: Jimmy-Xu <xjimmyshcn@gmail.com>
We're using a Kernel based on v6.7, which should include all te
patches needed for SEV / SNP / TDX.
By doing this, later on, we'll be able to stop building the specific
kernel for each one of the targets we have for the TEEs.
Let's note that we've introduced the "confidential" target for the
kernel builder script, while the TEE specific builds are being kept as
they're -- at least for now.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>