While the local-build's folder's Makefile dependencies for the
confidential nvidia rootfs targets already declare the pause image
and coco-guest-components dependencies, the actual rootfs
composition does not contain the pause image bundle and relevant
certificates for guest pull. This change ensure the rootfs gets
composed with the relevant files.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
TOML was chosen for initdata particularly for the ability to include
policy docs and other configuration files without mangling them. The
default TOML encoding renders string values as single-line,
double-quoted strings, effectively depriving us of this feature.
This commit changes the encoding to use `to_string_pretty`, and includes
a test that verifies the desirable aspect of encoding: newlines are kept
verbatim.
Fixes: #11943
Signed-off-by: Markus Rudy <mr@edgeless.systems>
One problem that we've been having for a reasonable amount of time, is
containerd not behaving very well when we have multiple snapshotters.
Although I'm adding this test with my "CoCo" hat in mind, the issue can
happen easily with any other case that requires a different snapshotter
(such as, for instance, firecracker + devmapper).
With this in mind, let's do some stability tests, checking every hour a
simple case of running a few pre-defined containers with runc, and then
running the same containers with kata.
This should be enough to put us in the situation where containerd gets
confused about which snapshotter owns the image layers, and break on us
(or not break and show us that this has been solved ...).
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
With this change we namespace the stage one rootfs tarball name
and use the same name across all uses. This will help overcome
several subtle local build problems.
Signed-off-by: Manuel Huber <manuelh@nvidia.com>
We fix the following error:
```
thread 'sandbox::tests::add_and_get_container' panicked at src/sandbox.rs:901:10:
called `Result::unwrap()` on an `Err` value: Create cgroupfs manager
Caused by:
0: fs error caused by: Os { code: 17, kind: AlreadyExists, message: "File exists" }
1: File exists (os error 17)
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
by ensuring that the cgroup path is unique for tests run in the same millisecond.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Use CDI exclusively from crates.io and not from a GH repository.
Cargo can easily check if a new version is available and we can
far more easier bump it if needed.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
with the shellcheck fixes we accidentally quoted the "-n NAMESPACE"
argument where we should have used array instead, which lead to oc
considering this as a pod name and returning error.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We need to ensure that any change on the Dockerfile (and its dir) leads
to the build being retriggered, rather than using the cached version.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We are seeing more protoc related failures on the new
runners, so try adding the protobuf-compiler dependency
to these steps to see if it helps.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- copy default-initdata.toml in create_tmp_policy_settings_dir, so it can be modified by other tests if needed
- make auto_generate_policy use default-initdata.toml by default
- add auto_generate_policy_no_added_flags, so it may be used by tests that don't want to use default-initdata.toml by default
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
On commit 9602ba6ccc, from February this
year, we've introduced a check to ensure that the files needed for
signing the kernel build are present. However, we've noticed last week
that there were a reasonable amount of wrong assumptions with the
workflow. :-)
Zvonko fixed the majority of those, but this bit was left and it'd cause
breakages when using kernel that was cached ... although passing when
building new kernels.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
This is needed to the kernel setup picks up the correct
config values from our fragments directories.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to make sure that the kernel we're using has the
correct configs set, otherwise the module signing will not work.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Now that we have added the ability to deploy kata-containers with
experimental_force_guest_pull configured, let's make sure we test it to
avoid any kind of regressions.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
Otherwise we have no way to differentiate running tests on qemu-coco-dev
with different snapshotters.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
What was done in the past, trying to set the env var on the same step
it'd be used, simply does not work.
Instead, we need to properly set it through the `env` set up, as done
now.
We're also bumping the kata_config_version to ensure we retrigger the
kernel builds.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
We have some scalable s390x and ppc runners, so
start to use them for build and test, to improve
the throughput of our CI
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
For some reason we didn't have the "Report tests" step as part of the
TEE jobs. This step immensely helps to check which tests are failing and
why, so let's add it while touching the workflow.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
There's no reason to have the code duplication between the SNP / TDX
tests for CoCo, as those are basically using the same configuration
nowadays.
Note that for the TEEs case, as the nydus-snapshotter is deployed by the
admin, once, instead of deploying it on every run ... I'm actually
removing the nydus-snapshotter steps so we make it clear that those
steps are not performed by the CI.
Signed-off-by: Fabiano Fidêncio <ffidencio@nvidia.com>
As virtio-scsi has been set the default block device driver, the
runtime also need to correctly handle the virtio-scsi info, specially
the SCSI address required within kata-agent handling logic.
And getting and assigning the scsi_addr to kata agent device id
will be enough. This commit just do such work.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Since runtime-rs support the block device hotplug with
creating new containers, and the device would also be
removed when the container stopped, thus add the block
device unplug for clh.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This commit introduces support for selecting `virtio-scsi` as the
block device driver for QEMU during initial setup.
The primary goal is to resolve a conflict in non-TEE environments:
1. The global block device configuration defaults to `virtio-scsi`.
2. The `initdata` device driver was previously designed and hardcoded
to `virtio-blk-pci`.
3. This conflict prevented unified block device usage.
By allowing `virtio-scsi` to be configured at cold boot, the `initdata`
device can now correctly adhere to the global setting, eliminating the
need for a hardcoded driver and ensuring consistent block device
configuration across all supported devices (excluding rootfs).
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The implementation of the seccomp feature in Dragonball currently has a basic framework.
But the actual restriction rules are empty.
This pull request includes the following changes:
- Modifiy configuration files to relevant configuration files.
- Modifiy seccomp framework to support different restrictions for different threads.
- Add new seccomp rules for the modified framework.
This commit primarily implements the changes 1 and 3 for runtime-rs.
Fixes: #11673
Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
For DGX like systems we need additional binaries and libraries,
enable the Kata AND CoCo use-case.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Update tools/osbuilder/rootfs-builder/nvidia/nvidia_rootfs.sh
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>