Addon images (e.g. the CoCo guest components addon) are not full root
filesystems -- they contain only the binaries and configuration that
get bind-mounted into the real rootfs at boot. The existing
check_rootfs() validation requires /sbin/init and systemd, which are
not present in addon images.
Add a SKIP_ROOTFS_CHECK environment variable that, when set to "yes",
bypasses the check_rootfs() call. Forward the variable into the
container environment when using the Docker-based build path so it
works in both direct and containerised invocations.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Assisted-by: Cursor <cursoragent@cursor.com>
The DAX header (2 MiB of NVDIMM metadata + a duplicate MBR) is
unconditionally prepended to every image by set_dax_header(). NVIDIA
images use virtio-blk-pci with disable_image_nvdimm=true, so the
kernel reads MBR #1 directly and never touches the DAX metadata --
it is dead weight.
Add a SKIP_DAX_HEADER environment variable (default "no") that, when
set to "yes", skips the DAX header entirely:
- Removes the 2 MiB DAX overhead from image size calculations in
both the erofs and ext4 paths
- Skips the set_dax_header() call, avoiding compilation and
execution of the nsdax tool
- Passes the variable through to containerised builds
Enable SKIP_DAX_HEADER=yes for both install_image_nvidia_gpu() and
install_image_nvidia_gpu_confidential() in the build pipeline. All
other image builds are unaffected (default remains "no").
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Add full dm-verity and measured rootfs support to
create_erofs_rootfs_image(), bringing it to parity with the ext4 path.
Unlike ext4, which is a read-write filesystem mounted read-only by
convention, erofs is structurally read-only -- no journal, no write
metadata, no superblock write path.
This is a natural fit for dm-verity: erofs never attempts writes, so
verity never has to reject anything. With ext4, the kernel must skip
journal replay on verity-protected devices, which is a fragile
assumption.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Extract build_kernel_verity_params() and setup_verity() from the
inline block inside create_rootfs_image() into top-level functions.
This is a pure refactoring with no behavior change. The verity logic
is moved verbatim, with the only difference being that
build_kernel_verity_params() now takes the image path as an explicit
parameter instead of capturing it from the enclosing scope.
The extracted functions will be reused by create_erofs_rootfs_image()
in a subsequent commit to add dm-verity support for erofs images.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Remove the initramfs folder, its build steps, and use the kernel
based dm-verity enforcement for the handlers which used the
initramfs mode. Also, remove the initramfs verity mode
capability from the shims and their configs.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This change introduces the kernelinit dm-verity mode, allowing
initramfs-less dm-verity enforcement against the rootfs image.
For this, the change introduces a new variable with dm-verity
information. This variable will be picked up by shim
configurations in subsequent commits.
This will allow the shims to build the kernel command line
with dm-verity information based on the existing
kernel_parameters configuration knob and a new
kernel_verity_params configuration knob. The latter
specifically provides the relevant dm-verity information.
This new configuration knob avoids merging the verity
parameters into the kernel_params field. Avoiding this, no
cumbersome escape logic is required as we do not need to pass the
dm-mod.create="..." parameter directly in the kernel_parameters,
but only relevant dm-verity parameters in semi-structured manner
(see above). The only place where the final command line is
assembled is in the shims. Further, this is a line easy to comment
out for developers to disable dm-verity enforcement (or for CI
tasks).
This change produces the new kernelinit dm-verity parameters for
the NVIDIA runtime handlers, and modifies the format of how
these parameters are prepared for all handlers. With this, the
parameters are currently no longer provided to the
kernel_params configuration knob for any runtime handler.
This change alone should thus not be used as dm-verity
information will no longer be picked up by the shims.
systemd-analyze on the coco-dev handler shows that using the
kernelinit mode on a local machine, less time is spent in the
kernel phase, slightly speeding up pod start-up. On that machine,
the average of 172.5ms was reduced to 141ms (4 measurements, each
with a basic pod manifest), i.e., the kernel phase duration is
improved by about 18 percent.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This reverts commit 923f97bc66 in
order to re-instantiate the logic from commit
e4a13b9a4a.
The latter commit was previously reverted due to the NVIDIA GPU TEE
handler using an initrd, not an image.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
This reverts commit e4a13b9a4a, as it
caused some issues with the GPU workflows.
Reverting it is better, as it unblocks other PRs.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Updates to the shim-v2 build and the binaries.sh script.
Makeing sure that both variants "confidential" AND
"nvidia-gpu-confidential" are handled.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
So far we've only been building the initrd for the nvidia rootfs.
However, we're also interested on having the image beind used for a few
use-cases.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
The Guest rootfs image file size is aligned up to 128M boundary,
since commmit 2b0d5b2. This change allows users to use a custom
alignment value - e.g., to align up to 2M, users will be able to
specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Move the deletion of unnecessary systemd units and files from
image_builder.sh into rootfs.sh.
The files being deleted can be applicable to other image file formats
too, not just to the rootfs-image format created by image_builder.sh.
Also, image_builder.sh was deleting these files *after* it calculated
the size of the rootfs files, thus missing out on the opportunity to
possibly create a smaller image file.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
It's too long a time to cross build agent based on docker buildx, thus
we cross build rootfs based on a container with cross compile toolchain
of gcc and rust with musl libc. Then we get fast build just like native
build.
rootfs initrd cross build is disabled as no cross compile tolchain for
rust with musl lib if found for alpine and based on docker buildx takes
too long a time.
Fixes: #6557
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Generate rootfs hash data during creating the kata rootfs,
current kata image only have one partition, we add another
partition as hash device to save hash data of rootfs data blocks.
Fixes: #6674
Signed-off-by: Wang, Arron <arron.wang@intel.com>
For kata containers, rootfs is used in the read-only way.
EROFS can noticably decrease metadata overhead.
On the basis of supporting the EROFS file system, it supports using the config parameter to switch the file system used by rootfs.
Fixes: #6063
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Create a guest image to support SELinux for containers inside the guest
if `SELINUX=yes` is specified. This works only if the guest rootfs is
CentOS and the init service is systemd, not the agent init. To enable
labeling the guest image on the host, selinuxfs must be mounted on the
host. The kata-agent will be labeled as `container_runtime_exec_t` type.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Now if no options/arguments specified, the shell scripts will return an error:
ERROR: Invalid rootfs directory: ''
This commit will show usage if no options/arguments specified.
Fixes: #3256
Signed-off-by: bin <bin@hyper.sh>
Use the same runtime used for podman run also for the podman build cmd
Additionally remove "docker" from the docker_run_args variable
Fixes: #3239
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
The help information of '-f' option is missing, and same issue
with 'BLOCK_SIZE' env variables, fix it in usage() function.
Fixes: #3231
Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
This patch fixes inconsistent calculations of the rootfs size.
For `du` and `df`, `-B 1MB` is different from `-BM`. The
former is the power of 1000, and the latter is the power of
1024. So comparing them doesn't make sense. The bug may result
in a larger image than needed.
Fixes: #2560
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
There is an inconformity between qemu and kernel of memory alignment
check in memory hotplug. Both of qemu and kernel will do the start
address alignment check in memory hotplug. But it's 2M in qemu
while 128M in kernel. It leads to an issue when memory hotplug.
Currently, the kata image is a nvdimm device, which will plug into the VM as
a dimm. If another dimm is pluged, it will reside on top of that nvdimm.
So, the start address of the second dimm may not pass the alginment
check in kernel if the nvdimm size doesn't align with 128M.
There are 3 ways to address this issue I think:
1. fix the alignment size in kernel according to qemu. I think people
in linux kernel community will not accept it.
2. do alignment check in qemu and force the start address of hotplug
in alignment with 128M, which means there maybe holes between memory blocks.
3. obey the rule in user end, which means fix it in kata.
I think the second one is the best, but I can't do that for some reason.
Thus, the last one is the choice here.
Fixes: #1769
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Give the user chance to specify their own registry in event the default
provided are not accessible, desirable.
Fixes: #1393
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Changed the user-visible urls to point to the right Kata Containers
files/repositories.
Fixes#234
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Disable reflink when using DAX. Reflink is a xfs feature that cannot be
used together with DAX.
fixes kata-containers/osbuilder#456
fixes#577
Signed-off-by: Julio Montes <julio.montes@intel.com>
move all osbuilder files into `tools` directory to be able
to merge this into kata-containers repo.
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>