In order to have a reproducible code generation process, we need to pin
the versions of the tools used. This is accomplished easiest by
generating inside a container.
This commit adds a container image definition with fixed dependencies
for Golang proto/ttrpc code generation, and changes the agent Makefile
to invoke the update-generated-proto.sh script from within that
container.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The generated Go bindings for the agent are out of date. This commit
was produced by running
src/agent/src/libs/protocols/hack/update-generated-proto.sh with
protobuf compiler versions matching those of the last run, according to
the generated code comments.
Since there are new RPC methods, those needed to be added to the
HybridVSockTTRPCMockImp.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
- "confidential_emptyDir" becomes "emptyDir" in the settings file.
- "confidential_configMap" becomes "configMap" in settings.
- "mount_source_cpath" becomes "cpath".
- The new "root_path" gets used instead of the old "cpath" to point to
the container root path..
- "confidential_guest" is no longer used. By default it gets replaced
by "enable_configmap_secret_storages"=false, because CoCo is using
CopyFileRequest instead of the Storage data structures for ConfigMap
and/or Secret volume mounts during CreateContainerRequest.
- The value of "guest_pull" becomes true by default.
- "image_layer_verification" is no longer used - just CoCo's guest pull
is supported.
- The Request input files from unit tests are changing to reflect the
new default settings values described above.
- tests/integration/kubernetes/tests_common.sh adjusts the settings for
platforms that are not set-up for CoCo during CI (i.e., platforms
other than SNP, TDX, and CoCo Dev).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Skip pulling container image layers when guest-pull=true. The contents
of these layers were ignored due to:
- #11162, and
- tarfs snapshotter support having been removed from genpolicy.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
AKS Confidential Containers are using the tarfs snapshotter. CoCo
upstream doesn't use this snapshotter, so remove this Policy complexity
from upstream.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
`mem-agent` here is now a library and do not contain examples, ignore
Cargo.lock to get rid of untracked file noise produced by `cargo run` or
`cargo test`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Re-generates the client code against Cloud Hypervisor v47.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`MmapRegion` is only used while `virtio-fs` is enabled during testing
dragonball, gate the import behind `virtio-fs` feature.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Some variables went unused if certain features are not enabled, use
`#[allow(unused)]` to suppress those warnings at the time being.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`VcpuManagerError` is only needed when `host-device` feature is enabled,
gate the import behind that feature.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Code inside `test_mac_addr_serialization_and_deserialization` test does
not actually require this `with-serde` feature to test, removing the
assertion here to enable this test.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add full cgroups support on host. Cgroups are managed by `FsManager` and
`SystemdManager`. As the names impies, the `FsManager` manages cgroups
through cgroupfs, while the `SystemdManager` manages cgroups through
systemd. The two manages support cgroup v1 and cgroup v2.
Two types of cgroups path are supported:
1. For colon paths, for example "foo.slice:bar:baz", the runtime manages
cgroups by `SystemdManager`;
2. For relative/absolute paths, the runtime manages cgroups by
`FsManager`.
vCPU threads are added into the sandbox cgroups in cgroup v1 + cgroupfs,
others, cgroup v1 + systemd, cgroup v2 + cgroupfs, cgroup v2 + systemd, VMM
process is added into the cgroups.
The systemd doesn't provide a way to add thread to a unit. `add_thread()`
in `SystemdManager` is equivalent to `add_process()`.
Cgroup v2 supports threaded mode. However, we should enable threaded mode
from leaf node to the root node (`/`) iteratively [1]. This means the
runtime needs to modify the cgroups created by container runtime (e.g.
containerd). Considering cgroupfs + cgroup v2 is not a common combination,
its behavior is aligned with systemd + cgroup v2, which is not allowed to
manage process at the thread level.
1: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#threadsFixes: #11356
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
As some reasons, it first should make it align with runtime-go, this
commit will do this work.
Fixes#11543
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The actual memory usage on the host is equal to the hypervisor memory usage
plus the user memory usage. An OOM killer might kill the shim when the
memory limit on host is same with that of container and the container
consumes all available memory. In this case, the containerd will never
receive OOM event, but get "task exit" event. That makes the `k8s-oom.bats`
test fail.
The fix is to add a new container to increase the sandbox memory limit.
When the container "oom-test" is killed by OOM killer, there is still
available memory for the shim, so it will not be killed.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
When enabling systemd cgroup driver and sandbox cgroup only, the shim is
under a systemd unit. When the unit is stopping, systemd sends SIGTERM to
the shim. The shim can't exit immediately, as there are some cleanups to
do. Therefore, ignoring SIGTERM is required here. The shim should complete
the work within a period (Kata sets it to 300s by default). Once a timeout
occurs, systemd will send SIGKILL.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Our CI keeps on getting
```
jq: error (at <stdin>:1): Cannot index string with string "tag_name"
```
during the install dependencies phase, which I suspect
might be due to github rate limits being reduced, so try
to pass through the `GH_TOKEN` env and use it in the auth header.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It is important that we continue to support VirtIO-SCSI. While
VirtIO-BLK is a common choice, virtio-scsi offers significant
performance advantages in specific scenarios, particularly when
utilizing iothreads and with NVMe Fabrics.
Maintaining Flexibility and Choice by supporting both virtio-blk and
virtio-scsi, we provide greater flexibility for users to choose the
optimal storage(virtio-blk, virtio-scsi) interface based on their
specific workload requirements and hardware configurations.
As virtio-scsi controller has been created when qemu vm starts with
block device driver is set to `virtio-scsi`. This commit is for blockdev_add
the backend block device and device_add frondend virtio-scsi device via qmp.
Fixes#11516
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
As block device index is an very important unique id of a block device
and can indicate a block device which is equivalent to device_id.
In case of index is required in calculating scsi LUN and reduce
useless arguments within reusing `hotplug_block_device`, we'd better
change the device_id with block device index.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In this commit, block device aio are introduced within hotplug_block_device
within qemu via qmp and the "iouring" is set the default.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It should be correctly handled within the device manager when do
create_block_device if the driver_option is virtio-scsi.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It supports handling scsi device when block device driver is `scsi`.
And it will ensure a correct storage source with LUN.
Fixes#11516
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It's used to help discover scsi devices inside guest and also add a
new const value `KATA_SCSI_DEV_TYPE` to help pass information.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
AIO is the I/O mechanism used by qemu with options:
- threads
Pthread based disk I/O.
- native
Native Linux I/O.
- io_uring (default mode)
Linux io_uring API. This provides the fastest I/O operations on
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Although Previous implementation of hotplugging block device via QMP
can successfully hot-plug the regular file based block device, but it
fails when the backend is /dev/xxx(e.g. /dev/loop0). With analysis about
it, we can know that it lacks the ablility to hotplug host block devices.
This commit will fill the gap, and make it work well for host block
devices.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
As there were a few moderate security vulnerability fixes missed as part
of the 3.19.0 release.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
For the release itself, let's simply copy the VERSION file to the
tarball.
To do so, we had to change the logic that merges the build, as at that
point the tag is not yet pushed to the repo.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
On commit 90bc749a19, we've changed the
QEMUTDXPATH in order to get it to work with GPUs, but the change broke
the non-GPU TDX use-case, which depends on the distro binary.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
- Add nodeSelector configuration to values.yaml with empty default
- Update DaemonSet template to conditionally include nodeSelector
- Add documentation and examples for nodeSelector usage in README
- Allows users to restrict kata-containers deployment to specific nodes by labeling them
Signed-off-by: Gus Minto-Cowcher <gus@basecamp-research.com>
According to the issue [1], Tokio will panic when we are giving a blocking
socket to Tokio's `from_std()` method, the information is as follows:
```
A panic occurred at crates/agent/src/sock/vsock.rs:59: Registering a
blocking socket with the tokio runtime is unsupported. If you wish to do
anyways, please add `--cfg tokio_allow_from_blocking_fd` to your RUSTFLAGS.
```
A workaround is to set the socket to non-blocking.
1: https://github.com/tokio-rs/tokio/issues/7172
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The KERNEL_DEBUG_ENABLED was missing in the outer shell script
so overrides via make were not possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Bump these crates to remove the unmaintained dependency
proc-macro-error and remediate RUSTSEC-2024-0370
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump these crates across various components to remove the
dependency on unmaintained instant crate and remediate
RUSTSEC-2024-0384
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- The github generated template had an old version which
isn't valid for the pr-scan, so update to the latest
- The action needs also `actions: read` and `contents:read` to run in kata-containers
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some of the nix apis we are using are now enabled by features,
so add these to resolve the compilation issues
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This new version of gc fixes s390x attestation, also introduces registry
configuration setting directly via initdata.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The peer pods project is using the agent-ctl tool in some
tests, so tagging our cache will let them more easily identify
development versions of kata for testing between releases.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Sometimes, containers or execs do not use stdin, so there is no chance
to add parent stdin to the process's writer hashmap, resulting in the
parent stdin's fd not being closed when the process is cleaned up later.
Therefore, when creating a process, first explicitly add parent stdin to
the wirter hashmap. Make sure that the parent stdin's fd can be closed
when the process is cleaned up later.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We want to be able to build a debug version of the kernel for various
use-cases like debugging, tracing and others.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The convention for rootfs-* names is:
* rootfs-${image_type}-${special_build}
If this is not followed, cache will never work as expected, leading to
building the initrd / image on every single build, which is specially
constly when building the nvidia specific targets.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The init data could not be read properly within kata-agent because the
data length field was omitted, a consequence of a mismatch in the data
write format.
Fixes#11556
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Now AA supports to receive initdata toml plaintext and deliver it in the
attestation. This patch creates a file under
'/run/confidential-containers/initdata'
to store the initdata toml and give it to AA process.
When we have a separate component to handle initdata, we will move the
logic to that component.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Update to https://github.com/teawater/mem-agent/tree/kata-20250627.
The commit list:
3854b3a Update nix version from 0.23.2 to 0.30.1
d9a4ced Update tokio version from 1.33 to 1.45.1
9115c4d run_eviction_single_config: Simplify check evicted pages after
eviction
68b48d2 get_swappiness: Use a rounding method to obtain the swappiness
value
14c4508 run_eviction_single_config: Add max_seq and min_seq check with
each info
8a3a642 run_eviction_single_config: Move infov update to main loop
b6d30cf memcg.rs: run_aging_single_config: Fix error of last_inc_time
check
54fce7e memcg.rs: Update anon eviction code
41c31bf cgroup.rs: Fix build issue with musl
0d6aa77 Remove lazy_static from dependencies
a66711d memcg.rs: update_and_add: Fix memcg not work after set memcg
issue
cb932b1 Add logs and change some level of some logs
93c7ad8 Add per-cgroup and per-numa config support
092a75b Remove all Cargo.lock to support different versions of rust
540bf04 Update mem-agent-srv, mem-agent-ctl and mem-agent-lib to
v0.2.0
81f39b2 compact.rs: Change default value of compact_sec_max to 300
c455d47 compact.rs: Fix psi_path error with cgroup v2 issue
6016e86 misc.rs: Fix log error
ded90e9 Set mem-agent-srv and mem-agent-ctl as bin
Fixes: #11478
Signed-off-by: teawater <zhuhui@kylinos.cn>
As the following job has passed 10 days in a row for the nightly test:
```
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (nydus, qemu-coco-dev, kubeadm)
```
this commit makes the job required again.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Set the node in the spec template of a Job manifest, allowing to use
set_node() on tests like k8s-parallel.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The previous description for the `block_device_driver` was inaccurate or
outdated. This commit updates the documentation to provide a more
precise explanation of its function.
Fixes#11488
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When we run a kata pod with runtime-rs/qemu and with a default
configuration toml, it will fail with error "unsupported driver type
virtio-scsi".
As virtio-scsi within runtime-rs is not so popular, we set default block
device driver with `virtio-blk-*`.
Fixes#11488
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This patch changes the container process HashMap to use exec_id as the primary
key instead of PID, preventing exec_id collisions that could be exploited in
Confidential Computing scenarios where the host is less trusted than the guest.
Key changes:
- Changed `processes: HashMap<pid_t, Process>` to `HashMap<String, Process>`
- Added exec_id collision detection in `start()` method
- Updated process lookup operations to use exec_id directly
- Simplified `get_process()` with direct HashMap access
This prevents multiple exec operations from reusing the same exec_id, which
could be problematic in CoCo use cases where process isolation and unique
identification are critical for security.
Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>
The `/opt/kata/VERSION` file, which is created using `git describe
--tags`, requires the newly released tag to be updated in order to be
accurate.
To do so, let's add a `fetch-tags: true` to the checkout action used
during the `create-kata-tarball` job.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The CoCo non-TEE job (run-k8s-tests-coco-nontee) used to be required but
we had to withdraw it to fix a problem (#11156). Now the job is back
running and stable, so time to make it required again.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
tempdir hasn't been updated for seven years and pulls in
remove_dir_all@0.5.3 which has security advisory
GHSA-mc8h-8q98-g5hr, so replace this with using tempfile,
which the crate got merged into and we use elsewhere in the
project
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Ignore Cargo.lock in `libs` to prevent developers from accidentally
track lock files in `libs` workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.
This commit will get the host devices from NamedHypervisorConfig and
assign it to VmConfig's devices which is for vfio devices when clh
starts launching.
And with this, it successfully finish the vfio devices conversion from
a generic Hypervisor config to a clh specific VmConfig.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This commit introduce `host_devices` to help convert vfio devices from
a generic hypervisor config to a cloud-hypervisor specific VmConfig.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR adds support for adding a vfio device before starting the
cloud-hypervisor VM (or cold-plug vfio device).
This commit changes "pending_devices" for clh implementation via adding
DeviceType::Vfio() into pending_devices. And it will get shared host devices
after correctly handling vfio devices (Specially for primary device).
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
crates in `libs` workspace do not ship binaries, they are just libraries
for other workspace to reference, the `Cargo.lock` file hence would not
take effect. Removing Cargo.lock for `libs` workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
In line with configuration for other TEEs, shared_fs should
be set to none for IBM SEL. This commit updates the value for
runtime/runtime-rs.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As we're using a `kubectl wait --timeout ...` to check whether the
kata-deploy pod's been deleted or not, let's remove the `--wait` from
the `helm uninstall ...` call as k0s tests were failing because the
`kubectl wait --timeout...` was starting after the pod was deleted,
making the test fail.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We've been pinning a specific version of k0s for CRI-O tests, which may
make sense for CRI-O, but doesn't make sense at all when it comes to
testing that we can install kata-deploy on latest k0s (and currently our
test for that is broken).
Let's bump to the latest, and from this point we start debugging,
instead of debugging on an ancient version of the project.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Bump url, reqwests and idna crates in order to move away from
idna <1.0.3 and remediate CVE-2024-12224.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Previously, the rootlessDir variable in `src/runtime/virtcontainers/pkg/rootless.go` was initialized at
package load time using `os.Getenv("XDG_RUNTIME_DIR")`. However, in rootless
VMM mode, the correct value of $XDG_RUNTIME_DIR is set later during runtime
using os.Setenv(), so rootlessDir remained empty.
This patch defers the initialization of rootlessDir until the first call
to `GetRootlessDir()`, ensuring it always reflects the current environment
value of $XDG_RUNTIME_DIR.
Fixes: #11526
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
There are workflows that rely on `az aks install-cli` to get kubectl
installed. There is a well-known problem on install-cli, related with
API usage rate limit, that has recently caused the command to fail
quite often.
This is replacing install-cli with the azure/setup-kubectl github
action which has no such as rate limit problem.
While here, removed the install_cli() function from gha-run-k8s-common.sh
so avoid developers using it by mistake in the future.
Fixes#11463
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removing kernel config files realting
to SEV as part of the SEV deprecation
efforts.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Removing runtime SEV functionality,
such as the kbs, ovmf, VMSA handling,
and SEV configs as part of deprecating
SEV from kata.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Removing files related to SEV, responsible for
installing and configuring Kata containers.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Add init data annotation within preparing remote hypervisor annotations
when prepare vm, so that it can be passed within CreateVMRequest.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
679cc9d47c was merged and bumped the
podoverhead for the gpu related runtimeclasses. However, the bump on the
`kata-runtimeClasses.yaml` as overlooked, making our tests fail due to
that discrepancy.
Let's just adjust the values here and move on.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We cannot only rely only on default_cpu and default_memory in the
config, default is 1 and 2Gi but we need some overhead for QEMU and
the other related binaries running as the pod overhead. Especially
when QEMU is hot-plugging GPUs, CPUs, and memory it can consume more
memory.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
atty is unmaintained, with the last release almost 3 years
ago, so we don't need to check for updates, but instead will
remove it from out dependency tree.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
structopt features were integrated into clap v3 and so is not
actively updated and pulls in the atty crate which has a security
advisory, so update clap, remove structopts, update the code that
used it to remove the outdated dependencies.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
slog-term 2.9.0 included atty, which is unmaintained
as has a security advisory GHSA-g98v-hv3f-hcfr,
so bump the version across our components to remove
this dependency.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We had the proper config.toml configuration for static builds
but were building the glibc target and not the musl target.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The way GH works, we can only require Zizmor results on ALL PR runs, or
none, so remove the path filter.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Previously, the source field was subject to mandatory checks. However,
in guest-pull mode, this field doesn't consistently provide useful
information. Our practical experience has shown that relying on this
field for critical data isn't always necessary.
In other aspect, not all cases need mandatory check for KataVirtualVolume.
based on this fact, we'd better to make from_base64 do only one thing and
remove the validate(). Of course, We also keep the previous capability to
make it easy for possible cases which use such method and we rename it
clearly with from_base64_and_validate.
This commit relaxes the mandatory checks on the KataVirtualVolume specifically
for guest-pull mode, acknowledging its diminished utility in this context.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When hot plugging vcpu in dragonball hypervisor, use the synchronization
interface and wait until the hot plug cpu is executed in the guest
before returning. This ensures that the subsequent device hot plug will
not conflict with the previous call.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Let dragonball's resize_vcpu api support synchronization, and only
return after the hot-plug of the CPU is successfully executed in the
guest kernel. This ensures that the subsequent device hot-plug operation
can also proceed smoothly.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Got follow warning with make test of kata-agent:
Compiling rustjail v0.1.0 (/data/teawater/kata-containers/src/agent/rustjail)
Compiling kata-agent v0.1.0 (/data/teawater/kata-containers/src/agent)
warning: unused import: `std::os::unix::fs`
--> rustjail/src/mount.rs:1147:9
|
1147 | use std::os::unix::fs;
| ^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
This commit fixes it.
Fixes: #11508
Signed-off-by: teawater <zhuhui@kylinos.cn>
Introduce a const value `KATA_VIRTUAL_VOLUME_PREFIX` defined in the libs/kata-types,
and it'll be better import such const value from there.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This was originally implemented as a Jenkins skip and is only used in a few
workflows. Nowadays this would be better implemented via the gatekeeper.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This patch fixes the rules.rego file to ensure that the
policy is correctly parsed and applied by opa.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This commit updates the `tests_common.sh` script
to enable the `confidential_guest`
setting for the coco tests in the Kubernetes
integration tests.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This patch removes storages from the testcases.json file for execprocess.
This is because input storage objects are invalid for two reasons:
1. "io.katacontainers.fs-opt.layer=" is missing option in annotations.
2. by default, we don't have host-tarfs-dm-verity enabled, so the storage
objects are not created in policy.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
---
This patch introduces some basic checks for the
`image_guest_pull` storage type in the genpolicy tool.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This patch improves the test framework for the
genpolicy tool by enabling the use of config maps.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
Add the definiation of variable DEFCREATECONTAINERTIMEOUT into
Makefile target with default timeout 30s.
Fixes: #485
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It's used to indicate timeout value set for image pulling in
guest during creating container.
This allows users to set this timeout with annotation according to the
size of image to be pulled.
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It allows users to set this create container timeout within
configuration.toml according to the size of image to be pulled
inside guest.
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To better understand the impact of different timeout values on system
behavior, this section provides a more comprehensive explanation of the
request_timeout_ms:
This timeout value is used to set the maximum duration for the agent to
process a CreateContainerRequest. It's also used to ensure that workloads,
especially those involving large image pulls within the guest, have sufficient
time to complete.
Based on explaination above, it's renamed with `create_container_timeout`,
Specially, exposed in 'configuration.toml'
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This helps considerably to avoid patching the code, and just adjusting
the build environment to use a smaller alignment than the default one.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
These tests are not passing, or being maintained,
so as discussed on the AC meeting, we will skip them
from automatically running until they can be reviewed
and re-worked, so avoid wasting CI cycles.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This adds Zizmor GHA security scanning as a PR gate.
Note that this does NOT require that Zizmor returns 0 alerts, but rather
that Zizmor's invocation completes successfully (regardless of how many
alerts it raises).
I will set up the former after this commit is merged (through the GH UI).
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Enable GPU annotations by adding `default_gpus` and `default_gpu_model`
into the list of valid annotations `enable_annotations`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Add GPU specific annotations used by remote hypervisor for instance
selection during `prepare_vm`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Two annotations: `default_gpus and `default_gpu_model` as GPU annotations
are introduced for Kata VM configurations to improve instance selection on
remote hypervisors. By adding these annotations:
(1) `default_gpus`: Allows users to specify the minimum number of GPUs a VM
requires. This ensures that the remote hypervisor selects an instance
with at least that many GPUs, preventing resource under-provisioning.
(2) `default_gpu_model`: Lets users define the specific GPU model needed for
the VM. This is crucial for workloads that depend on particular GPU archs or
features, ensuring compatibility and optimal performance.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To provide the remote hypervisor with the necessary intelligence
to select the most appropriate instance for a given GPU instance,
leading to better resource allocation, two fields `default_gpus`
and `default_gpu_model` are introduced in `RemoteInfo`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To better support containerd 2.1 and later versions, remove the
hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the
real mount source (and `/sys/block/loopX/loop/backing_file` if needed).
If the mount source doesn't end with `layer.erofs`, it should be marked
as unsupported, as it may be a filesystem meta file generated by later
containerd versions for the EROFS flattened filesystem feature.
Also check whether the filesystem type is `overlay` or not, since the
containerd mount manager [1] may change it after being introduced.
[1] https://github.com/containerd/containerd/issues/11303
Fixes: f63ec50ba3 ("runtime: Add EROFS snapshotter with block device support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Current Dockerfile fails when trying to build from the root of the repo
docker build -t kata-monitor -f tools/packaging/kata-monitor/Dockerfile .
with "invalid go version '1.23.0': must match format 1.23"
Using go 1.23 in the Dockerfile fixes the build error
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
I notices that agent-ctl is including a 9 month old version of
image-rs and the libs crates haven't been update for potentially
many years, so bump all of these.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This commit introduces the ability to run Pods without shared fs
mechanism in Kata.
The default shared fs can lead to unnecessary resource consumption
and security risks for certain use cases. Specifically, scenarios
where files only need to be copied into the VM once at Pod creation
(e.g., non-tee envs) and don't require dynamic updates make the shared
fs redundant and inefficient.
By explicitly disabling shared fs functionality, we reduce resource
overhead and shrink the attack surface. Users will need to employ
alternative methods(e.g. guest-pull) to ensure container images are
shared into the guest VM for these specific scenarios.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In the pre commit:74eccc54e7b31cc4c9abd8b6e4007c3a4c1d4dd4,
it missed return the right rootfs volume.
In the is_block_rootfs fn, if the rootfs is based on a
block device such as devicemapper, it should clear the
volume's source and let the device_manager to use the
dev_id to get the device's host path.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For containerd's Blockfile Snapshotter, it will pass
a rootfs mounts with a rawfile as a mount source
and mount options with "loop" embeded.
To support this type of rootfs, it is necessary to identify this as a
blockfile rootfs through the "loop" flag, and then use the volume source
of the rootfs as the source of the block device to hot-insert it into
the guest.
Fixes:#11464
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Instead of building it every time, we can store the regorus
binary in OCI registry using oras and download it from there.
This reduces the install time from ~1m40s to ~15s.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
This commit add support of resize_vcpu for cloud-hypervisor
using the it's vm resize api. It can support bothof vcpu hotplug
and hot unplug.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For cloud-hypervisor, currently only hot plugging of memory is
supported, but hot unplugging of memory is not supported. In addition,
by default, cloud-hypervisor uses ACPI-based memory hot-plugging instead
of virtio-mem based memory hot-plugging.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Add API interfaces for get vminfo and resize. get vminfo can obtain the
memory size and number of vCPUs from the cloud hypervisor vmm in real
time. This interface provides information for the subsequent resize
memory and vCPU.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The system's own Deserialize cannot implement parsing from string to
MacAddr, so we need to implement this trait ourself.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
To make it flexibility and extensibility This change modifies the Kata
Agent's handling of `InitData` to allow for unrecognized key-value pairs.
The `InitData` field now directly utilizes `HashMap<String, String>`,
enabling it to carry arbitrary metadata and information that may be
consumed by other components
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
During sandbox preparation, initdata should be specified to TdxConfig,
specially mrconfigid, which is used to pass to tdx guest report for
measurement.
Fixes#11180
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
SEV-SNP guest configuration utilizes a different set of properties
compared to the existing 'sev-guest' object. This change introduces
the `host-data` property within the sev-snp-guest object. This property
allows for configuring an SEV-SNP guest with host-provided data, which
is crucial for data integrity verification during attestation.
The `host-data` property is specifically valid for SEV-SNP guests
running
on a capable platform. It is configured as a base64-encoded string when
using the sev-snp-guest object.
the example cmdline looks like:
```shell
-object sev-snp-guest,id=sev-snp0,host-data=CGNkCHoBC5CcdGXir...
```
Fixes#11180
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To facilitate the transfer of initdata generated during
`prepare_initdata_device_config`, a new parameter has been
introduced into the `prepare_protection_device_config` function.
Furthermore, to specifically pass initdata to SEV-SNP Guests, a
`host_data` field has been added to the `SevSnpConfig` structure.
However, this field is exclusively applicable to the SEV-SNP platform.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Retrieve the Initdata string content from the security_info of the
Configuration. Based on the Protection Platform type, calculate the
digest of the Initdata. Write the Initdata content to the block
device. Subsequently, construct the BlockConfig based on this block
device information.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To correctly manage initdata as a block device, a new InitData
Resource type, inherently a block device, has been introduced
within the ResourceManager. As a component of the Sandbox's
resources, this InitData Resource needs to be appropriately
handled by the Device Manager's handler.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit implements the retrieval and processing of InitData provided
via a Pod annotation. Specifically, it enables runtime-rs to:
(1) Parse the "io.katacontainers.config.hypervisor.cc_init_data"
annotation from the Pod YAML.
(2) Perform reverse operations on the annotation value: base64 decoding
followed by gzip decompression.
(3) Deserialize the decompressed data into the internal InitData
structure.
(4) Serialize the resulting InitData into a string and store it in the
Configuration.
This allows users to inject configuration data into the TEE Guest by
encoding and compressing it and passing it as an annotation in the Pod
configuration. This mechanism supports scenarios where dynamic config
is required for Confidential Containers.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the Initdata Spec and the logic for
calculating its digest. It includes:
(1) Define a `ProtectedPlatform` enum to represent major TEE platform
types.
(2) Create an `InitData` struct to support building and serializing
initialization data in TOML format.
(3) Implement adaptation for SHA-256, SHA-384, and SHA-512 digest
algorithms.
(4) Provide a platform-specific mechanism for adjusting digest lengths
(zero-padding).
(5) Supporting the decoding and verification of base64+gzip encoded
Initdata.
The core functionality ensures the integrity of data injected by the
host through trusted algorithms, while also accommodating the
measurement requirements of different TEE platforms.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces a new `initdata` field of type String to
hypervisor `SecurityInfo`.
In accordance with the Initdata Specification, this field will
facilitate the injection of well-defined data from an untrusted host
into the TEE. To ensure the integrity of this injected data, the TEE
evidence's hostdata capability or the (v)TPM dynamic measurement
capability will be leveraged, as outlined in the specification.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Don't use local launched_pods variable in test_rc_policy(), because
teardown() needs to use this variable to print a description of the
pods, for debugging purposes.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The locking mechanism around the layers cache file was insufficient to
prevent corruption of the file. This commit moves the layers cache's
management in-memory, only reading the cache file once at the beginning
of `genpolicy`, and only writing to it once, at the end of `genpolicy`.
In the case that obtaining a lock on the cache file fails,
reading/writing to it is skipped, and the cache is not used/persisted.
Signed-off-by: charludo <git@charlotteharludo.com>
`vmm-sys-util` was duplicated while updating the `ignore` list of
`rust-vmm` crates in #11431, remove duplicated one and sort the list.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
When moving from clap v2 to v4 a bunch of
functions have been removed, so update the code
to handle these replacements
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When moving from clap v2 to v4 a bunch of
functions have been removed, so update the code
to handle these replacements
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update dependabot ignore list in cargo ecosystem to ignore upgrades from
rust-vmm crates, since those crates need to be managed carefully and
manually.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Pin Github owned actions to specific hashes as recommended
as tags are mutable see https://pin-gh-actions.kammel.dev/.
This one of the recommendations that scorecard gives us.
Note this was generated with `frizbee actions`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fedora 40 is EoL, and I've seen the registry pull fail
a few times recently, so let's bump to fedora 42 which
has 10 months of support left.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Now we are decoupled from the image-rs crate,
we can bump the protobuf version across our project
to resolve the GHSA-2gh3-rmm4-6rq5 advisory
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This patch updates the container image for the CI test workloads:
- `k8s-layered-sc-deployment.yaml`
- `k8s-pod-sc-deployment.yaml`
- `k8s-pod-sc-nobodyupdate-deployment.yaml`
- `k8s-pod-sc-supplementalgroups-deployment.yaml`
- `k8s-policy-deployment.yaml`
Also updates unit tests:
- `test_create_container_security_context`
- `test_create_container_security_context_supplemental_groups`
This fixes tests failing due to an image pull error as the previous image is no longer available in
the container registry.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Switch the hyper for an underscore, so the ghcr
helm publish can work properly.
Co-authored-by: Fabiano Fidêncio <fidencio@northflank.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Only sign the kernel if the user has provided the KBUILD_SIGN_PIN
otherwise ignore.
Whole here, let's move the functionality to the common fragments as it's
not a GPU specific functionality.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
At the moment if any of the tests in the matric fails
then the rest of the jobs are cancelled, so we have to
re-run everything. Add `fail-fast: false` to stop this
behaviour.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Remove the rule that causes gatekeeper to skip tests
if we've only updated the required-tests.yaml list.
Although update to just the required-tests.yaml
doesn't change the outcome of any of the CI tests, it
does change whether gatekeeper will still pass with the new
rules. Although it's a bit of a hit to run the CI, it's probably
worth it to keep gatekeeper validated.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This adds govulncheck vulnerability scanning as a non-blocking check in
the static checks workflow. The check scans Go runtime binaries for known
vulnerabilities while filtering out verified false positives.
Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
Since #11197 was merged, all confidential k8s e2e tests for s390x
have been failing with the following errors:
```
attestation-agent: error while loading shared libraries:
libcurl.so.4: cannot open shared object file
libnghttp2.so.14: cannot open shared object file
```
In line with the update on x86_64, we need to upgrade the OS used
in rootfs-{image,initrd} on s390x.
This commit also bumps all 22.04 to 24.04 for all architectures.
For s390x, this ensures the missing packages listed above are
installed.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
After commit a3f973db3b merged, protection::GuestProtection::[Snp,Sev]
have changed to tuple variants, and can no longer be used in assert_eq
marco without tuple values, or some errors will raised:
```
assert_eq!(actual.unwrap(), GuestProtection::Snp);
| ^^^^^^^^^^^^^^^^^^^^ expected \
`GuestProtection`, found enum constructor
```
Signed-off-by: Lei Liu <liulei.pt@bytedance.com>
containerd-sandboxapi fails with `containerd v2.0.x` and passes with
`containerd v1.7.x` regardless kata-containers. And it was not tested
with `containerd v2.0.x` because `containerd v2.0.x` could not
recognize `[plugins.cri.containerd]` in `config.toml`.
Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>
the latest Canonical TDX release supports 25.04 / Plucky as
well. Users experimenting with the latest goodies in the
25.04 TDX enablement won't get Kata deployed properly.
This change accepts 25.04 as supported distro for TDX.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Container image integrity protection is a critical practice involving a
multi-layered defense mechanism. While container images inherently offer
basic integrity verification through Content-Addressable Storage (CAS)
(ensuring pulled content matches stored hashes), a combination of other
measures is crucial for production environments. These layers include:
Encrypted Transport (HTTPS/TLS) to prevent tampering during transfer;
Image Signing to confirm the image originates from a trusted source;
Vulnerability Scanning to ensure the image content is "healthy"; and
Trusted Registries with stringent access controls.
In certain scenarios, such as when container image confidentiality
requirements are not stringent, and integrity is already ensured via the
aforementioned mechanisms (especially CAS and HTTPS/TLS), adopting
"force guest pull" can be a viable option. This implies that even when
pulling images from a container registry, their integrity remains
guaranteed through content hashes and other built-in mechanisms, without
relying on additional host-side verification or specialized transfer
methods.
Since this feature is already available in runtime-go and offers
synergistic benefits with guest pull, we have chosen to support force
guest pull.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the `adjust_rootfs_mounts` function to manage
root filesystem mounts for guest-pull scenarios.
When the force guest-pull mechanism is active, this function ensures that
the rootfs is exclusively configured via a dedicated `KataVirtualVolume`.
It disregards any provided input mounts, instead generating a single,
default `KataVirtualVolume`. This volume is then base64-encoded and set
as the sole mount option for a new, singular `Mount` entry, which is
returned as the only item in the `Vec<Mount>`.
This change guarantees consistent and exclusive rootfs configuration
when utilizing guest-pull for container images.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In CoCo scenarios, there's no image pulling on host side, and it will
disable such operations, that's to say, there's no files sharing between
host and guest, especially for container rootfs.
We introduce Kata Virtual Volume to help handle such cases:
(1) Introduce is_kata_virtual_volume to ensure the volume is kata
virtual volume.
(2) Introduce VirtualVolume Handling logic in handle_rootfs when the
mount is kata virtual volume.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces comprehensive support for rootfs mount mgmt
through Kata Virtual Volumes, specifically enabling the guest-pull
mechanism.
It enhances the runtime's ability to:
(1) Extract image references from container annotations (CRI/CRI-O).
(2) Process `KataVirtualVolume` objects, configuring them for guest-pull operations.
(3) Set up the agent's storage for guest-pulled images.
This functionality streamlines the process of pulling container images
directly within the guest for rootfs, aligning with guest-side image management strategies.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The Multistrap issue has been fixed in noble thus we can use the LTS.
Also, this will fix the error reported by CDH
```
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found
```
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The new version of AA allows the config not having a coco_as token
config. If not provided, it will mark as None.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This patch updates the guest-components to new version with better
error logging for CDH. It also allows the config of AA not having a
coco_as token config.
Also, the new version of CDH requires to build aws-lc-sys thus needs to
install cmake for build.
See
https://github.com/kata-containers/kata-containers/actions/runs/15327923347/job/43127108813?pr=11197#step:6:1609
for details.
Besides, the new version of guest-components have some fixes for SNP
stack, which requires the updates of trustee side.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This new parameter for kata-agent is used to control the timeout for a
guest pull request. Note that sometimes an image can be really big, so
we set default timeout to 1200 seconds (20 minutes).
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
After moving image pulling from kata-agent to CDH, the failed image pull
error messages have been slightly changed. This commit is to apply for
the change.
Note that in original and current image-rs implementation, both no key
or wrong key will result in a same error information.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Now kata-agent by default supports both guest pull and host pull
abilities, thus we do not need to specify the PULL_TYPE env when
building kata-agent.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
After moving guest pull abilities to CDH, the document of guest pull
should be updated due to new workflow.
Also, replace the diagram of PNG into a mermaid one for better
maintaince.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
In previous version, only when the `guest-pull` feature is enabled
during the build time, the OCI process will be tried to be overrided
when the storage has a guest pull volume and also it is sandbox. After
getting rid of the feature, whether it is guest-pull is runtimely
determined thus we can always do this trying override, by checking if
there is kata guest pull volume in storages and it's sandbox.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Now the ocicrypt configuration used by CDH is always the same and it's
not a good practics to write it into the rootfs during runtime by
kata-agent. Thus we now move it to coco-guest-components build script.
The config will be embedded into guest image/initrd together with CDH
binary.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The feature `guest-pull` and `default-pull` are both removed, because
both guest pull and host pull are supported in building time without
without involving new dependencies like image-rs before. The guest pull
will depend on the CDH process, not the build time feature.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This is a higher level calling to pull image inside guest. Now it should
call confidential_data_hub's API. As the previous pull_image API does
1. check is sandbox
2. generate bundle_path
inside the original logic, and the new API does not do them to keep the
API semantice clean, thus before we call the API, we explicitly do the
two things.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
now image pull ability is moved to CDH, thus the CDH process needs
environment variables of ocicrypt to help find the keyprovider(cdh) to
decrypt images.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
As image pull ability is moved to CDH, kata-agent does not need the
confugurations of image pulling anymore.
All these configurations reading from kernel cmdline is now implemented
by CDH.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Image pull abilities are all moved to the separate component
Confidential Data Hub (CDH) and we only left the auxiliary functions
except pull_image in confidential_data_hub/image.rs
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This is a little refactoring commit that moves the mod `cdh.rs` and
`image.rs` to a directory module `confidential_data_hub`. This is
because the image pull ability will be moved into confidential data
hub, thus it is better to handle image pull things in the confidential
data hub submodule.
Also, this commit does some changes upon the original code. It gets rid
of a static variable for CDH timeout config and directly use the global
config variable's member. Also, this changes the
`is_cdh_client_initialized` function to sync version as it does not need
to be async.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
CDH provides the image pull api. This commit adds the declaration of the
API in the CDH proto file. This will be used in following commits.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This method is not used when guest-pull is not used.
Add a flag that prevents a compile error when building with rust version > 1.84.0 and not using guest-pull
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Fixes a confusing log message shown when Virtio-FS is disabled.
Previously we logged “The virtiofsd had stopped” regardless of whether Virtio-FS was actually enabled or not.
Signed-off-by: Paweł Bęza <pawel.beza99@gmail.com>
Add the memory prealloc support for qemu hypervisor.
When it was enabled, all of the memory will be allocated
and locked. This is useful when you want to reserve all the
memory upfront or in the cases where you want memory latencies
to be very predictable.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This reverts commit 2ee3470627.
This is mostly redundant given we already have workflow approval for external
contributors.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Fix `empty_line_after_doc_comments` clippy warning as suggested by rust
1.85.1.
```console
error: empty line after doc comment
--> dbs_boot/src/x86_64/layout.rs:11:1
|
11 | / /// Magic addresses externally used to lay out x86_64 VMs.
12 | |
| |_^
13 | /// Global Descriptor Table Offset
14 | pub const BOOT_GDT_OFFSET: u64 = 0x500;
| ------------------------------ the comment documents this constant
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
= note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
= help: if the empty line is unintentional remove it
help: if the documentation should include the empty line include it in the comment
|
12 | ///
|
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `unstable_name_collisions` clippy warning reported by rust
1.85.1.
```console
error: a method with this name may be added to the standard library in the future
--> src/registry.rs:646:10
|
646 | file.unlock()?;
| ^^^^^^
|
= warning: once this associated item is added to the standard library, the ambiguity may cause an error or change in behavior!
= note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
= help: call with fully qualified syntax `fs2::FileExt::unlock(...)` to keep using the current method
= note: `-D unstable-name-collisions` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unstable_name_collisions)]`
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `empty_line_after_outer_attr` clippy warning reported by
rust 1.85.1.
```console
error: empty line after outer attribute
--> src/check.rs:515:9
|
515 | / #[allow(dead_code)]
516 | |
| |_^
517 | struct TestData<'a> {
| ------------------- the attribute applies to this struct
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_outer_attr
= note: `-D clippy::empty-line-after-outer-attr` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_outer_attr)]`
= help: if the empty line is unintentional remove it
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `empty_line_after_doc_comments` clippy warning reported by
rust 1.85.1.
```console
error: empty line after doc comment
--> src/linux_abi.rs:8:1
|
8 | / /// Linux ABI related constants.
9 | |
| |_^
10 | #[cfg(target_arch = "aarch64")]
11 | use std::fs;
| ------- the comment documents this import
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
= note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
= help: if the empty line is unintentional remove it
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The users@0.11.0 has a high severity CVE-2025-5791
and doesn't seem to be maintained, so switch to
uzers which forked it.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In the nvidia rootfs build, only copy in `kata-opa` if `AGENT_POLICY` is enabled. This fixes
builds when `AGENT_POLICY` is disabled and opa is not built.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Manually fix `unnecessary_get_then_check` clippy warning as suggested by
rust 1.85.1.
```console
warning: unnecessary use of `get(&shared_mount.src_ctr).is_none()`
--> src/sandbox.rs:431:25
|
431 | if src_ctrs.get(&shared_mount.src_ctr).is_none() {
| ---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| help: replace it with: `!src_ctrs.contains_key(&shared_mount.src_ctr)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_get_then_check
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Kata runtime employs a CapabilityBits mechanism for VMM capability
governance. Fundamentally, this mechanism utilizes predefined feature
flags to manage the VMM's operational boundaries.
To meet demands for storage performance and security, it's necessary
to explicitly enable capability flags such as `BlockDeviceSupport`
(basic block device support) and `BlockDeviceHotplugSupport` (block
device hotplug) which ensures the VMM provides the expected caps.
In CoCo scenarios, due to the potential risks of sensitive data leaks
or side-channel attacks introduced by virtio-fs through shared file
systems, the `FsSharingSupport` flag must be forcibly disabled. This
disables the virtio-fs feature at the capability set level, blocking
insecure data channels.
Fixes#11341
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Two key important scenarios:
(1) Support `virtio-blk-pci` cold plug capability for confidential guests
instead of nvdimm device in CVM due to security constraints in CoCo cases.
(2) Push initdata payload into compressed raw block device and insert it
in CVM through `virtio-blk-pci` cold plug mechanism.
Fixes#11341
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
By default the checkout action leave the credentials
in the checked-out repo's `.git/config`, which means
they could get exposed. Use persist-credentials: false
to prevent this happening.
Note: static-checks.yaml does use git diff after the checkout,
but the git docs state that git diff is just local, so doesn't
need authentication.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
On newer TDX platforms, checking `/sys/firmware/tdx` for `major_version` and
`minor_version` is no longer necessary. Instead, we only need to verify that
`/sys/module/kvm_intel/parameters/tdx` is set to `'Y'`.
This commit addresses the following:
(1) Removes the outdated check and corrects related code, primarily impacting
`cloud-hypervisor`.
(2) Refines the TDX platform detection logic within `arch_guest_protection`.
Fixes#11177
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Currently, the TDX Quote Generation Service (QGS) connection in
QEMU with default vsock port 4050 for TD attestation. To make it
flexible for users to modify the QGS port. Based on the introduced
qgs_port, This commit supports the QGS port to be configured via
configuration
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Currently, the TDX Quote Generation Service (QGS) connection in QEMU is
hardcoded to vsock port 4050, which limits flexibility for TD attestation.
While the users will be able to modify the QGS port. To address this
inflexibility, this commit introduces a new qgs_port field within security
info and make it default with 4050.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
During the prepare for `start sandbox` phase, this commit
ensures the correct `ProtectionDeviceConfig` is prepared
based on the `GuestProtection` type in a TEE platform.
Specifically, for the TDX platform, this commit sets the
essential parameters within the ProtectionDeviceConfig,
including the TDX ID, firmware path, and the default QGS
port (4050).
This information is then passed to the underlying VMM for
further processing using the existing ResourceManager and
DeviceManager infrastructure.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This patch introduces TdxConfig with key fields, firmare,
qgs_port, mrconfigid, and other useful things. With this config,
a new ProtectionDeviceConfig type `Tdx(TdxConfig)` is added.
With this new type supported, we finally add tdx protection device
into the cmdline to launch a TDX-based CVM.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the `tdx-guest` designed to facilitate
the launch of CVMs leveraging Intel's TDX.
Launching a TDX-based CVM requires various properties, including
`quote-generation-socket`, and `mrconfigid`,`sept-ve-disable` .etc.
(1) The `quote-generation-socket` property is added to the
`tdx-guest` object, which is of type `SocketAddress`, specifies the
address of the Quote Generation Service (QGS).
(2) The `mrconfigid` property, representing the SHA384 hash
for non-owner-defined configurations of the guest TD, is introduced as a
runtime or OS configuration parameter.
(3) And the `sept-ve-disable` property allows control over whether
EPT violation conversions to #VE exceptions are disabled when the guest
TD accesses PENDING pages.
With the introduction of the `tdx-guest` object and its associated
properties, launching TDX-based CVMs is now supported. For example, a
TDX guest can be configured via the command line as follows:
```shell
-object {"qom-type":"tdx-guest", "id":"tdx", "sept-ve-disable":true,\
"mrconfigid":"vHswGkzG4B3Kikg96sLQ5vPCYx4AtuB4Ubfzz9UOXvZtCGat8b8ok7Ubz4AxDDHh",\
"quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"} \
-machine q35,accel=kvm,confidential-guest-support=tdx
```
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables consistent JSON representation of socket addresses
across system components:
(1) Add serde serialization/deserialization with standardized
field naming convention.
(2) Enforce string-based port/cid and unix/path representation
for protocol compatibility.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
For CoCo, shared_fs is prohibited as we cannot guarantee the security of
guest/host sharing. Therefore, this PR enables administrators to configure
shared_fs = "none" via the configuration.toml file, thereby enforcing the
disablement of sharing.
Fixes#10677
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Run the k8s tests on mariner with annotation disable_image_nvdimm=true,
to use virtio-blk instead of nvdimm for the guest rootfs block device.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to
set disable_image_nvdimm=true in configuration-clh.toml.
disable_image_nvdimm=false is the default config value.
Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in
configuration-clh.toml.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they
want to set disable_image_nvdimm=true in configuration-qemu*.toml.
disable_image_nvdimm=false is the default configuration value.
Note that the value of disable_image_nvdimm gets ignored for
platforms using "confidential_guest = true".
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Comment out "disable_image_nvdimm = true" in:
- configuration-qemu-snp.toml
- configuration-qemu-nvidia-gpu-snp.toml
for consistency with the other configuration-qemu*.toml files.
Those two platforms are using "confidential_guest = true", and therefore
the value of disable_image_nvdimm gets ignored.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Bump chrono package to 0.4.41 and thereby
remove the time 0.1.43 dependency and remediate
CVE-2020-26235
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This removes the ok-to-test label on every push, except if the PR author
has write access to the repo (ie. permission to modify labels).
This protects against attackers who would initially open a genuine PR,
then push malicious code after the initial review.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This completely eliminates the Azure secret from the repo, following the below
guidance:
https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-azure
The federated identity is scoped to the `ci` environment, meaning:
* I had to specify this environment in some YAMLs. I don't believe there's any
downside to this.
* As previously, the CI works seamlessly both from PRs and in the manual
workflow.
I also deleted the tools/packaging/kata-deploy/action folder as it doesn't seem
to be used anymore, and it contains a reference to the secret.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Having secrets unconditionally being inherited is
bad practice, so update the workflows to only pass
through the minimal secrets that are needed
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In this commit, hotplug_vfio_on_root_bus parameter is removed.
<dd422ccb69>
pcie_root_port parameter description
(`This value is valid when hotplug_vfio_on_root_bus is true and
machine_type is "q35"`) will have no value,
and not completely valid, since vrit or DB as also support for root-ports and CLH as well.
so removed.
Fixes: #11316
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Instead of looping over the users per group and parsing passwd for each
user, we can do the reverse lookup uid->user up front and then compare
the names directly. This has the nice side-effect of silencing warnings
about non-existent users mentioned in /etc/group, which is not relevant
for policy decisions.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
containerd does not automatically add groups to the list of additional
GIDs when the groups have the same name as the user:
https://github.com/containerd/containerd/blob/f482992/pkg/oci/spec_opts.go#L852-L854
This is a bug and should be corrected, but it has been present since at
least 1.6.0 and thus affects almost all containerd deployments in
existence. Thus, we adopt the same behavior and ignore groups with the
same name as the user when calculating additional GIDs.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
When connecting to guest through vsock, a log is printed for each failure.
The failure comes from two main reasons: (1) the guest is not ready or (2)
some real errors happen. Printing logs for the first case leads to log
clutter, and your logs will like this:
```
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
```
To avoid this, the sock implmentations save the last error and return it
after all retries are exhausted. Users are able to check all errors by
setting the log level to trace.
Reorganize the log format to "{sock type}: {message}" to make it clearer.
Apart from that, errors return by the socks use `self`, instead of
`ConnectConfig`, since the `ConnectConfig` doesn't provide any useful
information.
Disable infinite loop for the log forwarder. There is retry logic in the
sock implmentations. We can consider the agent-log unavailable if
`sock.connect()` encounters an error.
Fixes: #10847
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The vhost-user-fs has been added to Dragonball, so we can remove
`update_memory`'s dead_code attribute.
Fixes: #8691
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Removed unnecessary dynamic dispatch for services. Properly dereferenced
service Box values and stored in Arc.
Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Previous version of `ttrpc-codegen` is generating outdated
`#![allow(box_pointers)]` which was deprecated. Bump `ttrpc-codegen`
from v0.4.2 to v0.5.0 and `protobuf` from vx to v3.7.1 to get rid of
this.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The additional GIDs are handled by genpolicy as a BTreeSet. This set is
then serialized to an ordered JSON array. On the containerd side, the
GIDs are added to a list in the order they are discovered in /etc/group,
and the main GID of the user is prepended to that list. This means that
we don't have any guarantees that the input GIDs will be sorted. Since
the order does not matter here, comparing the list of GIDs as sets is
close enough.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The warning used to trigger even if the passwd file was not needed. This
commit moves it down to where it actually matters.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We need more and accurate documentation. Let's start
by providing an Helm Chart install doc and as a second
step remove the kustomize steps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Steve Horsman <steven@uk.ibm.com>
The Guest rootfs image file size is aligned up to 128M boundary,
since commmit 2b0d5b2. This change allows users to use a custom
alignment value - e.g., to align up to 2M, users will be able to
specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
- Create groups for commonly seen cargo packages so that rather than
getting up to 9 PRs for each rust components, bumps to the same package
are grouped together.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Create a dependabot configuration to check for updates
to our rust and golang packages each day and our github
actions each month
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
After the last commit, the initdata test on SNP should be ok. Thus we
turn on this flag for CI.
Fixes#11300
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
the qemu commandline of SNP should start with `sev-snp-guest`, and then
following other parameters separeted by ','. This patch fixes the
parameter order.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
We're switching to using a rev as it may take some time for the package
to be updated on crates.io.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
There's no benefit on keeping those restricted to the dragonball build,
when they can be used with other VMMs as well (as long as they support
the mem-agent).
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled,
the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:`
and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process
fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`.
This patch updates the deletion logic to take in to account the sandbox_cgroup_only=false option and in this case uses
the cgroupfs delete.
Fixes: #11036
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Increase the NOFILE limit in the systemd service, this helps with
running databases in the Kata runtime.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Since 3.12 we're shipping the helm-chart per default
with each release. Update the documentation to use helm rather
then the kata-deploy manifests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We have a number of jobs that either need,or nest workflows
that need gh permissions, such as for pushing to ghcr,
or doing attest build provenance. This means they need write
permissions on things like `packages`, `id-token` and `attestations`,
so we need to set these permissions at the job-level
(along with `contents: read`), so they are not restricted by our
safe defaults.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I shortsightedly forgot that gatekeeper would need
to read more than just the commit content in it's
python scripts, so add read permissions to actions
issues which it uses in it's processing
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a number of jobs that nest the build-static-tarball
workflows later on. Due to these doing attest build provenance,
and pushing to ghcr.io, t hey need write permissions on
`packages`, `id-token` and `attestations`, so we need to set
these permissions on the top-level jobs (along with `contents: read`),
so they are not blocked.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some legacy workflows require write access to github which
is a security weakness and don't provide much value,
so lets remove them.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It frequently causes "Resource Temporarily Unavailable (OS Error 11)"
with the original 250ms read timeout When passing through devices via
VFIO in QEMU. The root cause lies in synchronization timeout windows
failing to accommodate inherent delays during critical hardware init
phases in kernel space. This commit would increase the timeout to 5000ms
which was determined through some tests. While not guaranteeing complete
resolution for all hardware combinations, this change significantly
reduces timeout failures.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.
Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
The service name is specified as RFC 1035 lable name [1]. The svc_name
regex in the genpolicy settings is applied to the downward API env
variables created based on the service name. So it tries to match
RFC 1035 labels after they are transformed to downward API variable
names [2]. So the set of lower case alphanumerics and dashes is
transformed to upper case alphanumerics and underscores.
The previous regex wronly permitted use of numbers, but did allow
dot and dash, which shouldn't be allowed (dot not because they aren't
conform with RFC 1035, dash not because it is transformed to underscore).
We have to take care not to also try to use the regex in places where
we actually want to check for RFC 1035 label instead of the downward
API transformed version of it.
Further, we should consider using a format like JSON5/JSONC for the
policy settings, as these are far from trivial and would highly benefit
from proper documentation through comments.
[1]: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
[2]: b2dfba4151/pkg/kubelet/envvars/envvars.go (L29-L70)
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Since kernel v6.3 the vsock packet is not split over two descriptors and
is instead included in a single one.
Therefore, we currently decide the specific method of obtaining
BufWrapper based on the length of descriptor.
Refer:
a2752fe04fhttps://git.kernel.org/torvalds/c/71dc9ec9ac7d
Signed-off-by: Xingru Li <lixingru.lxr@linux.alibaba.com>
[ Gao Xiang: port this patch from the internal branch to address Linux 6.1.63+. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Currently, Kata EROFS support needs it, otherwise it will:
[ 0.564610] erofs: (device sda): mounted with root inode @ nid 36.
[ 0.564858] overlayfs: failed to set xattr on upper
[ 0.564859] overlayfs: ...falling back to index=off,metacopy=off.
[ 0.564860] overlayfs: ...falling back to xino=off.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Some nvidia gpu pci address domain with 0001,
current runtime default deal with 0000:bdf,
which cause address errors during device initialization
and address conflicts during device registration.
Fixes#11252
Signed-off-by: yangsong <yunya.ys@antgroup.com>
Fixed "note: Not following: ./../../../tools/packaging/guest-image/lib_se.sh:
openBinaryFile: does not exist (No such file or directory) [SC1091]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Although the script will inherit that setting from the caller scripts,
expliciting it in the file will vanish shellcheck "warning: Use 'pushd
... || exit' or 'pushd ... || return' in case pushd fails. [SC2164]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Addressed the following shellcheck advices:
SC2046 (warning): Quote this to prevent word splitting.
SC2248 (style): Prefer double quoting even when variables don't contain special characters
SC2250 (style): Prefer putting braces around variable references even when not strictly required.
SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's import to handle port allocation in a PCIe topology before vfio
deivce hotplug via QMP.
The code ensures that VFIO devices are properly allocated to available
ports (either root ports or switch ports) and updates the device's bus
and port information accordingly.
It'll first retrieves the PCIe port type from the topology using
pcie_topo.get_pcie_port(). And then, searches for an available node in
the PCIe topology with RootPort or SwitchPort type and allocates the
VFIO device to the found available port. Finally, Updates the device's
bus with the allocated port's ID and type.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit implements the `find_available_node` function,
which searches the PCIe topology for the first available
`TopologyPortDevice` or `SwitchDownPort`.
If no available node is found in either the `pcie_port_devices`
or the connected switches' downstream ports, the function returns
`None`.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit note that the current implementation restriction where
'multifunction=on' is temporarily unsupported. While the feature
isn't available in the present version, we explicitly acknowledge
this limitation and commit to addressing it in future iterations
to enhance functional completeness.
Tracking issue #11292 has been created to monitor progress towards
full multifunction support.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To support port devices for vfio devices, more fields need to be
introduced to help pass port type, bus and other information.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When try to delete a cgroup, it's needed to move all of the
tasks/procs in the cgroup into root cgroup and then delete it.
Since for cgroup v2, it doesn't support to move thread into
root cgroup, thus move the processes instead of moving tasks
can fix this issue.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
this script relies on temporary subscriptions and won't cleanup any
resources. Let's improve the logging to better describe what resources
were created and how to clean them, if the user needs to do so.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We used hardcoded "ci/openshift-ci/cluster" location which expects this
script to be only executed from the root. Let's use SCRIPT_DIR instead
to allow execution from elsewhere eg. by user bisecting a failed CI run.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
In CI we hit problem where just after `az login` the first `az
network vnet list` command fails due to permission. We see
"insufficient permissions" or "pending permissions", suggesting we should
retry later. Manual tests and successful runs indicate we do have the
permissions, but not immediately after login.
Azure docs suggest using extra `az account set` but still the
propagation might take some time. Add a loop retrying
the first command a few times before declaring failure.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
kbs_k8s_svc_host() returns the ingress IP when the KBS service is
exposed via an ingress. In Azure AKS the ingress can time a while to be
fully ready and recently we have noticed on CI that kbs_k8s_svc_host()
has returned empty value. Maybe the problem is on current timeout being
too low, so let's increase it to 50 seconds to see if the situation
improves.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added 'report-tests' command to gha-run.sh to print to stdout a report
of the tests executed.
For example:
```
SUMMARY (2025-02-17-14:43:53):
Pass: 0
Fail: 1
STATUSES:
not_ok foo.bats
OUTPUTS:
::group::foo.bats
1..3
not ok 1 test 1
not ok 2 test 2
ok 3 test 3
1..2
not ok 1 test 1
not ok 2 test 2
::endgroup::
```
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently run_kubernetes_tests.sh sends all the bats outputs to stdout
which can be very difficult to browse to find a problem, mainly on
CI. With this change, each bats execution have its output sent to
'reports/yyy-mm-dd-hh:mm:ss/<status>-<bats file>.log' where <status>
is either 'ok' (tests passed) or 'not_ok' (some tests failed).
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added variable REPO_COMPONENTS (default: "main") which sets components
used by mmdebstrap for rootfs building.
This is useful for custom image builders who want to include EXTRA_PKGS
from components other than the default "main" (e.g. "universe").
Fixes: #11278
Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
Use $(sandbox-namespace) wildcard in case none is specified in yaml. If wildcard is present, compare
input against annotation value.
Fixes regression introduced in https://github.com/microsoft/kata-containers/pull/273
where samples that use metadata.namespace env var were no longer working.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Validate more process fields for commands enabled using the
ExecProcessRequest "commands" and/or "regex" fields from the
settings file.
Add function to get the container from state based on container_id
matching instead of matching it against every policy container data
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>
Using process data inputs for allow_process() is easier to
read/understand compared with the older OCI data inputs.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
For security reasons, we have restricted directory copying.
Introduces the `is_allowlisted_copy_volume` function to verify
if a given volume path is present in an allowed copy directory.
This enhances security by ensuring only permitted volumes are
copied
Currently, only directories under the path
`/var/lib/kubelet/pods/<uid>/volumes/{kubernetes.io~configmap,
kubernetes.io~secret, kubernetes.io~downward-api,
kubernetes.io~projected}` are allowed to be copied into the
guest. Copying of other directories will be prohibited.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When synchronizing file changes on the host, a "symlink AlreadyExists"
issue occurs, primarily due to improper handling of symbolic links
(symlinks). Additionally, there are other related problems.
This patch will try to address these problems.
(1) Handle symlink target existence (files, dirs, symlinks) during host file
sync. Use appropriate removal methods (unlink, remove_file, remove_dir_all).
(2) Enhance temporary file handling for safer operations and implement truncate
only at offset 0 for resume support.
(3) Set permissions and ownership for parent directories.
(4) Check and clean target path for regular files before rename.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Introduce event-driven file sync mechanism between host and guest when
sharedfs is disabled, which will help monitor the host path in time and
do sync files changes:
1. Introduce FsWatcher to monitor directory changes via inotify;
2. Support recursive watching with configurable filters;
3. Add debounce logic (default 500ms cooldown) to handle burst events;
4. Trigger `copy_dir_recursively` on stable state;
5. Handle CREATE/MODIFY/DELETE/MOVED/CLOSE_WRITE events;
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In Kubernetes (k8s), while Kata Pods often use virtiofs for injecting
Service Accounts, Secrets, and ConfigMaps, security-sensitive
environments like CoCo disable host-guest sharing. Consequently, when
SharedFs is disabled, we propagate these configurations into the guest
via file copy and bind mount for correct container access.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
There's several cases that block device plays very import roles:
1. Direct Volume:
In Kata cases, to achieve high-performance I/O, raw files on the host
are typically passed directly to the Guest via virtio-blk, and then
bond/mounted within the Guest for container usage.
2. Trusted Storage
In CoCo scenarios, particularly in Guest image pull mode, images are
typically pulled directly from the registry within the Guest. However,
due to constrained memory resources (prioritized for containers), CoCo
leverages externally attached encrypted storage to store images,
requiring hot-plug capability for block devices.
and as other vmms, like dragonball and cloud-hypervisor in runtime-rs or
qemu in kata-runtime have already supported such capabilities, we need
support block device with hot-plug method (QMP) in qemu-rs. Let's do it.
Fixes#11143
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces block device hotplugging capability using
QMP commands.
The implementation enables attaching raw block devices to a running
VM through the following steps:
1.Block Device Configuration
Uses `blockdev-add` QMP command to define a raw block backend with
(1) Direct I/O mode
(2) Configurable read-only flag
(3) Host file/block device path (`/path/to/block`)
2.PCI Device Attachment, Attaches the block device via `device_add`
QMP command as a `virtio-blk-pci` device:
(1) Dynamically allocates PCI slots using `find_free_slot()`
(2) Binds to user-specified PCIe bus (e.g., `pcie.1`)
(3) Returns PCI path for further management
Fixes#11143
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The get_pci_path_by_qdev_id function is designed to search for a PCI
device within a given list of devices based on a specified qdev_id.
It tracks the device's path in the PCI topology by recording the slot
values of the devices traversed during the search. If the device is
located behind a PCI bridge, the function recursively explores the
bridge's device list to find the target device. The function returns
the matching device along with its updated path if found, otherwise,
it returns None.
Fixes#11143
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
`containerd` command should be executed in the host environment.
(To generate the config that matches the host's containerd version.)
Fixes: #11092
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Fixes: #11288
This commit appends hotplug devices (e.g., persistent volume)
to deviceInfos when `vfio_mod` is `vfio` and `cold_plug_vfio`
is set to one except `no-port`. For details, please visit the issue.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This exposes REPO_URL and adds REPO_URL_X86_64 which can be set to use
custom Ubuntu repo for building rootfs.
If only one architecture is built, REPO_URL can be set. Otherwise,
REPO_URL_X86_64 is used for x86_64 arch and REPO_URL for others.
Fixes: #11276
Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
Now that memory hotplug should work, as we're using a firmware that
supports that, let's re-enable the tests that rely on hotplug.
Fixes: #10926, #10927
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
As the genpolicy from_files call makes network requests to container
registries, it has a chance to fail.
Harden us against flakes due to network by introducing a 6x retry loop
in genpolicy tests.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Drop '-vmx-rdseed-exit' from '-cpu host' QEMU options. The history
of it is unknown but it's likely related to early TDX enablement.
TD pods start up fine without it (tested by manually editing the
configuration file) and it's also not used elsewhere.
Keep TDXCPUFEATURES for now in case a need for it shows up later.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Extract PortDevice relevant information, and then invoke different
processing methods based on the device type.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Some data structures and methods are introduced to help handle vfio devices.
And mothods add_pcie_root_ports and add_pcie_switch_ports follow runtime's
related implementations of vfio devices.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Prepare pcie port devices before starting VM with the help of
device manager and PCIe Topology.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
A new resource type `PortDevice` is introduced which is dedicated
for handling root ports/switch ports during sandbox creation(VM).
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
PortDevice is for handling root ports or switch ports in PCIe
Topology. It will make it easy pass the root ports/switch ports
information during create VM with requirements of PCIe devices.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces an implementation for managing PCIe topologies,
focusing on the relationship between Root Ports and Switch Ports. The
design supports two strategies for generating Switch Ports:
Let's take the requirement of 4 switch ports as an example. There'll be
three possible solutions as below:
(1) Single Root Port + Single PCIe Switch: Uses 1 Root Port and 1 Switch
with 4 Downstream Ports.
(2) Multiple Root Ports + Multiple PCIe Switches: Uses 2 Root Ports and
2 Switches, each with 2 Downstream Ports.
The recommended strategy is Option 1 due to its simplicity, efficiency,
and scalability. The implementation includes data structures
(PcieTopology, RootPort, PcieSwitch, SwitchPort) and operations
(add_pcie_root_port, add_switch_to_root_port, add_switch_port_to_switch)
to manage the topology effectively.
Fxies #10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
(1) Introduce new field `pcie_switch_port` for switch ports.
(2) Add related checking logics in vmms(dragonball, qemu)
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The edk2 is required for memory hot plug on qemu for arm64.
This adds the edk2 to configuration-qemu.toml for arm64.
Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
The edk2 is required for memory hot plug on qemu for arm64.
This adds the edk2 to static tarball for arm64.
Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
The github rest api truncated job names that are >100
characters (which doesn't seem to be documented).
There doesn't seem to be a way to easily make gatekeeper
handle this automatically, so lets update the required-tests
to expect the truncated job names
Fixes: #11176
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
to simplify gatekeeper development add support for DEBUG_INPUT which can
be used to report content from files gathered in DEBUG run.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
to avoid manual curling to analyze GK issues let's add a way to dump all
GK requests in a directory when the use specifies "DEBUG" env variable.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Let's take advantage that helm take and OCI registry as the charts, and
upload our charts to the OCI registries we've been using so far.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The previous attempt to fix this issue only took in consideration the
QEMU binary, as I completely forgot that there were other pieces of the
config that we also adjusted.
Now, let's just check one of the configs before trying to adjust
anything else, and only do the changes if the suffix added with the
multi-install suffix is not yet added.{
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Multistrap requires usrmerge package which was dropped in Ubuntu 24.04
(Noble). Based on details from [0], the rootfs build process was switched
to mmdebstrap.
Some additional minor tweaks were needed around chrony as the version
from Noble has very strict systemd sandboxing configured and it doesn't
work with readonly root by default.
[0] https://lists.debian.org/debian-dpkg/2023/05/msg00080.htmlFixes: #11245
Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
Guest components is now less verbose with its error messages. This will
be fixed after the release but for now switch to a more generic error
message that is still found in the logs.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Guest components is less verbose with its error message now. This will
be fixed after the release, but for now, update the tests with the new
more general message.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Guets components is less verbose with its error messages. This will be
fixed after the release, but for now let's replace this with a more
generic message.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Fix up genpolicy test inputs to include required additionalGids
Include a test for the pod_container container in security_context tests
as these containers follow slightly different paths in containerd.
Introduce a test for fsGroup/supplementalGroups fields in the security
context.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Introduce new test case to the security context bats file which verifies
that policy works properly for a deployment yaml containing fsGroup and
supplementalGroup configuration.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
With added support for parsing these fields in genpolicy, we can now
enable policy verification of AdditionalGids.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Some changes in guest components have obscured the error message that we
show when we fail to get the credentials for an authenticated image. The
new error message is a little bit misleading since it references
decrypting an image. This will be udpated in a future release, but for
now look for this message.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Guest components prints out a different error when failing to decrypt an
image. Update the test to look for this new error.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Adding:
"-object rng-random,id=rng0,filename=/dev/urandom -device
virtio-rng-pci,rng=rng0"
for confidential guests is not necessary as the RNG source cannot
be trusted and the guest kernel has the driver already disable as well.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Linux CoCo x86 guest is hardened to ensure RDRAND provides enough
entropy to initialize Linux RNG. A failure will panic the guest.
For confidential guests any other RNG source is untrusted so disable
them.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
With #11076 merged, a VFIO configuration is needed in the runtime
when IBM SEL is involved (e.g., qemu-se or qemu-se-runtime-rs).
For the Go runtime, we already have a nightly test
(e.g., https://github.com/kata-containers/kata-containers/actions/runs/14964175872/job/42031097043)
in which this change has been applied.
For the Rust runtime, the feature has not yet been migrated.
Thus, this change serves as a placeholder and a reminder for future implementation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Pick up changes to guest components. This hash is right before the
changes to GC to support image pull via the CDH.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
As the comment in the fragment suggests, this is for the firecracker builds
and not relevant for confidential guests, for example.
Exlude mmio.conf fragment by adding the new !confidential tag to drop
virtio MMIO transport for the confidential guest kernel (as virtio PCI is
enough for the use cases today).
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
build-kernel.sh supports exluding fragments from the common base
set based on the kernel target architecture.
However, there are also cases where the base set must be stripped
down for other reason. For example, confidential guest builds want to
exclude some drivers the untrusted host may try to add devices (e.g.,
virtio-rng).
Make build-kernel.sh to skip fragments tagged using '!confidential'
when confidential guest kernels are built.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
It looks like the 22.04 image got updated and broke
the docker tests (see #11247), so make these un-required
until we can get a resolution
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update the tempfile crate to resolve security issue
[WS-2023-0045](7247a8b6ee)
that came with the remove_dir_all dependency in prior versions
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This aligns the workdir preparation more closely with the workdir
preparation for the generate integration test. Most notably, we
clean up the temporary directory before we execute the tests in it.
This way we better isolate different runs.
Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Add a new type of integration test to genpolicy. Now we can test flag handling
and how the CLI behaves with certain yaml inputs.
The first tests cover the case when a Pod references a Kubernetes secret of
config map in another file. Those need to be explicitly added via the
--config-files flag.
In the future we can easily add test suites that cover that all yaml fields
of all resources are understood by genpolicy.
Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
In preparation for adding more types of integration tests, moving the
policy enforcements test into a separate folder.
Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
This allows passing config maps and secrets (as well as any other
resource kinds relevant in the future) using the -c flag.
Fixes: #10033
Co-authored-by: Leonard Cohnen <leonard.cohnen@gmail.com>
Signed-off-by: Leonard Cohnen <leonard.cohnen@gmail.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Switch imports to resolve:
```
SA1019: "github.com/opencontainers/runc/libcontainer/userns" is deprecated:
use github.com/moby/sys/userns
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In the latest oci-spec, the prestart hook is deprecated.
However, the docker & nerdctl tests failed when I switched
to one of the newer hooks which don't run at quite the same time,
so ignore the deprecation warnings for now to unblock the security fix
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We've been using the
github.com/containers/podman/v4/pkg/annotations module
to get cri-o annotations, which has some major CVEs in, but
in v5 most of the annotations were moved into crio (from 1.30)
(see https://github.com/cri-o/cri-o/pull/7867). Let's switch
to use the cri-o annotations module instead and remediate
CVE-2024-3056.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When installing with kata-deploy,
usually `/opt/kata/bin` is not in the PATH.
Therefore, it will fail to execute.
so add it to the PATH.
Fixes: #11122
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
`musl` target is not yet available for riscv64 as of 1.80.0 rust
toolchain, set `FORTIFY_SOURCE` to 1 on riscv64 platforms.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`create_pci_root_bus_path` needs to be enabled on riscv64 for agent to
compile and work on those platforms.
Signed-off-by: Nikos Ch. Papadopoulos <ncpapad@cslab.ece.ntua.gr>
Since the ephemeral volume already has a separate volume type for
processing, the processing in the virtiofs share volume can be deleted.
Moreover, it is not appropriate to process the ephemeral in the share
fs.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For k8s, there's two type of volumes based on ephemral memory,
one is emptydir volume based on ephemeral memory, and the other
one is used for shm device such as /dev/shm. Thus add a new volume
type ephemeral volume to support those two type volumes and remove
the legacy shm volume.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Implementing directory creation logic in the OverlayfsHandler to process
driver options with the KATA_VOLUME_OVERLAYFS_CREATE_DIR prefix
Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
- Detection of EROFS options in container rootfs
- Creation of necessary EROFS devices
- Sharing of rootfs with EROFS via overlayfs
Fixes: #11163
Signed-off-by: ChengyuZhu6 <hudson@cyzhu.com>
This patch:
- adds a count check on mounts
- adds various test scenarios for mounts with emptyDir volume source
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
Some cni plugins will set the MTU of some routes, such as cilium will
modify the MTU of the default route. If the mtu of the route is not set
correctly, it may cause excessive fragmentation or even packet loss of
network packets. Therefore, this PR adds the setting of the MTU of the
route. First, when obtaining the route, if the MTU is set, the MTU will
also be obtained and set to the route in the guest.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Bump `netlink-sys` to v0.8, `netlink-packet-route` to v0.22 and
`rtnetlink` to v0.16 to reach a consistent state of `rust-netlink`
dependencies.
`bitflags` is bumped to v2.9.0 since those crates requires it.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`rtnetlink`, `netlink-sys` and `netlink-packet-route` are from the same
organization, and some of them are depending on the others, which
implies the version of those crates should be chosen and dealt with
carefully, group them to provide better management.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Kernel Makefiles changed how to deduce the right arch
lets set it explicilty to enable arm and amd builds.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Put local dependencies (mostly `dbs` crates) into workspace to avoid
complex path dependencies all over the workspace. Simplify path
dependency referencing.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
It's better to open the log pipe file with read & write option,
otherwise, once the containerd reboot and closed the read
endpoint, kata shim would write the log pipe with broken pipe error.
Fixes: #11207
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Update the runtime-rs workspace packages to
use workspace package versions where applicable
to centralise the config and reduce maintenance
when updating these
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As discussed in #9538, with anyhow >=1.0.77 we have test failures due to backtrace behaviour
changing, so set RUST_LIB_BACKTRACE=0,
so that we only have backtrace on panics
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update all crossbeam-channel for all non-agent
packages (it was done separately in #11175)
to 0.5.15 to get them on latest version and remove
the versions with a vulnerability
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When a PR has no new files the cargo deny runner fails with:
```
[cargo-deny-generator.sh:17] ERROR: changed_files_status=
```
so add `|| true` to try and help this
Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We are seeing failures in this test, where the output of
the kubectl exec command seems to be blank, so try
retrying the exec like #11024Fixes: #11133
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since #10780 the dbs crates are managed as members
of the dragonball workspace, so we can remove the lockfile
as it's now workspace managed now
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As #11076 introduces VFIO-AP bind/associate funtions for IBM Secure
Execution (SEL), a new internal nightly test has been established.
This PR adds a new entry `cc-vfio-ap-e2e-tests` to the existing matrix
to share the test result.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
For help with debugging add, logging of the KBS,
like the container system logs if the confidential test fails
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump golang.org/x/net to 0.38.0 as dependabot
isn't doing it for these packages to remediate
CVE-2025-22872
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Nydus+guest_pull has specific behavior where it improperly handles image layers on
the host, causing the CRI to not find /etc/passwd and /etc/group files
on container images which have them. The unfortunately causes different
outcomes w.r.t. GID used which we are trying to enforce with policy.
This behavior is observed/explained in https://github.com/kata-containers/kata-containers/issues/11162
Handle this exception with a config.settings.cluster_config.guest_pull
field. When this is true, simply ignore the /etc/* files in the
container image as they will not be parsed by the CRI.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Add security context testcases for genpolicy, verifying that UID and GID
configurations controlled by the kubernetes security context are
enforced.
Also, fix the other CreateContainerRequest tests' expected contents to
reflect our new genpolicy parsing/enforcement of GIDs.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Introduce tests to check for policy correctness on a redis deployment
with 1. a pod-level securityContext 2. a container-level securityContext
which shadows the pod-level securityContext 3. a pod-level
securityContext which selects an existing user (nobody), causing a new GID to be selected.
Redis is an interesting container image to test with because it includes
a /etc/passwd file with existing user/group configuration of 1000:1000 baked in.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
With fixes to align policy GID parsing with the CRI behavior, we can now
enable policy verification of GIDs.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
The GID used for the running process in an OCI container is a function of
1. The securityContext.runAsGroup specified in a pod yaml, 2. The UID:GID mapping in
/etc/passwd, if present in the container image layers, 3. Zero, even if
the userstr specifies a GID.
Make our policy engine align with this behavior by:
1. At the registry level, always obtain the GID from the /etc/passwd
file if present. Ignore GIDs specified in the userstr encoded in the
OCI container.
2. After an update to UID due to securityContexts, perform one final check against
the /etc/passwd file if present. The GID used for the running
process is the mapping in this file from UID->GID.
3. Override everything above with the GID of the securityContext
configuration if provided
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
Our policy should cover these fields for securityContexts at the pod or
container level of granularity.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
The get_process logic in registry.rs did not account for all cases
(username:groupname), did not defer to contents of /etc/group,
/etc/passwd when it should, and was difficult to read.
Clean this implementation up, factoring the string parsing for
user/group strings into their own functions. Enable the
registry::Container class to query /etc/passwd and /etc/group, if they
exist.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
By running on "all" host type there are two consequences:
1) run the "normal" tests too (until now, it's only "small" tests), so
increasing the coverage
2) create AKS cluster with larger VMs. This is a new requirement due to
the current ingress controller for the KBS service eating too much
vCPUs and lefting only few for the tests (resulting on failures)
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
_print_instance_type() returns the instance type of the AKS nodes, based
on the host type. Tests are grouped per host type in "small" and "normal"
sets based on the CPU requirements: "small" tests require few CPUs and
"normal" more.
There is an 3rd case: "all" host type maps to the union of "small"
and "normal" tests, which should be handled by _print_instance_type()
properly. In this case, it should return the largest instance type
possible because "normal" tests will be executed too.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's used an AKS managed ingress controller which keeps two nginx pod
replicas where both request 500m of CPU. On small VMs like we've used on
CI for running the CoCo non-TEE tests, it left only a few amount of CPU
for the tests. Actually, one of these pod replicas won't even get
started. So let's patch the ingress controller to have only one replica
of nginx.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The Azure AKS addon-http-application-routing add-on is deprecated and
cannot be enabled on new clusters which has caused some CI jobs to fail.
Migrated our code to use approuting instead. Unlike
addon-http-application-routing, this add-on doesn't
configure a managed cluster DNS zone, but the created ingress has a
public IP. To avoid having to deal with DNS setup, we will be using that
address from now on. Thus, some functions no longer used are deleted.
Fixes#11156
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Once the multiInstallSuffix has been taken into account, we should not
keep appending it on every re-run/restart, as that would lead to a path
that does not exist.
Fixes: #11187
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
In #11044, `run-k8s-tests-coco-nontee` was set as requried by mistake.
This PR disables the test again.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
We're bringing to *Cloud Hypervisor only* the reclaim_guest_freed_memory
option already present in the runtime-rs.
This allows us to use virtio-balloon for the hypervisor to reclaim
memory freed by the guest.
The reason we're not touching other hypervisors is because we're very
much aware of avoiding to clutter the go code at this point, so we'll
leave it for whoever really needs this on other hypervisor (and trust
me, we really do need it for Cloud Hypervisor right now ;-)).
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The AKS CLI recently introduced a regression that prevents using
aks-preview extensions (Azure/azure-cli#31345), and hence create
CI clusters.
To address this, we temporarily hardcode the last known good version of
aks-preview.
Note that I removed the comment about this being a Mariner requirement,
as aks-preview is also a requirement of AKS App Routing, which will
be introduced soon in #11164.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Knowing that the upstream project provides a "ready to use" version of
the kernel, it's good to include an easy way to users to monitor
performance, and that's what we're doing by enabling the TASKSTATS (and
related) kernel configs.
This has been present as part of older kernels, but I couldn't
reasonably find the reason why it's been dropped.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Let's add a RUNTIME_CHOICE env var that can be passed to be build
scripts, which allows the user to select whether they bulld the go
runtime, the rust runtime, or both.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
genpolicy is sending more HTTPS requests than other components during
CI so it's more likely to be affected by transient network errors
similar to:
ConnectError(
"dns error",
Custom {
kind: Uncategorized,
error: "failed to lookup address information: Try again",
},
)
Note that genpolicy is not the only component hitting network errors
during CI. Recent example from a different component:
"Message: failed to create containerd task: failed to create shim task:
failed to async pull blob stream HTTP status server error (502 Bad Gateway)"
This CI change might help just with the genpolicy errors.
Fixes: #11182
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We can provide devices during cold-plug with CDI annotation on a Pod
level and add per container device information wit the device plugin.
Since the sandbox has already attached the VFIO device remove them
from consideration and just apply the inner runtime CDI annotation.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The addition of CDI devices is now done for single_container
and pod_sandbox and pod_container before the devmanager creates
the deviceinfos no need for extra parsing.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
After the outer runtime has processed the CDI annotation from the
spec we can delete them since they were converted into Linux
devices in the OCI spec.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Every so often the main gnu site has an outage, so
we can't download gperf. GNU providesthe generic URL https://ftpmirror.gnu.org to
automatically choose a nearby and up-to-date mirror,
so switch to this to help avoid this problem
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
If the correct version of go is already installed then
install_go.sh runs `exit`. When calling this as source from
cri-containerd/gha-run.sh it means all dependencies after
are skipped, so remove this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump golang version to the latest minor 1.23.x release
now that 1.24 has been released and 1.22.x is no longer
stable and receiving security fixes
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add apt/apt-get updates before we do
apt/apt-get installs to try and help with
issues where we fail to fetch packages
Co-authored-by: Fabiano Fidêncio <fidencio@northflank.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test we will test initdata in the following logic
1. Enable image signature verification via kernel commandline
2. Set Trustee address via initdata
3. Pull an image from a banned registry
4. Check if the pulling fails with log `image security validation
failed` the initdata works.
Note that if initdata does not work, the pod still fails to launch. But
the error information is `[CDH] [ERROR]: Get Resource failed` which
internally means that the KBS URL has not been set correctly.
This test now only runs on qemu-coco-dev+x86_64 and qemu-tdx
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
For a long time, there has been unformatted code in the kata-types
codebase, for example:
```
if qemu.memory_info.enable_guest_swap {
- return Err(eother!(
- "Qemu hypervisor doesn't support enable_guest_swap"
- ));
+ return Err(eother!("Qemu hypervisor doesn't support
enable_guest_swap"));
}
...
- }, device::DRIVER_NVDIMM_TYPE, eother, resolve_path
+ },
+ device::DRIVER_NVDIMM_TYPE,
+ eother, resolve_path,
-use std::collections::HashMap;
-use anyhow::{Result, anyhow};
+use anyhow::{anyhow, Result};
use std::collections::hash_map::Entry;
+use std::collections::HashMap;
-/// DRIVER_VFIO_PCI_GK_TYPE is the device driver for vfio-pci
+/// DRIVER_VFIO_PCI_GK_TYPE is the device driver for vfio-pci
```
This has brought unnecessary difficulties in version maintenance and
commit difficulties. This commit will address this issue.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The Coniguration initialization was observed to be significantly slow
due to the extensive system information gathering performed by
`sysinfo::System::new_all()`. This function collects data on CPU,
memory, disks, and network, most of which is unnecessary for Kata's
memory adjusting config phase, where only the total system memory is
required.
This commit optimizes the initialization process by implementing a more
targeted approach to retrieve only the total system memory. This avoids
the overhead of collecting a large amount of irrelevant data, resulting
in a noticeable performance improvement.
Fixes#11165
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
We need more information (BAR memory and other future
ures...)for
PCI devices when vfio devices passed through.
So the method get_bars_max_addressable_memory is introduced for vfio
devices to deduce the memory_reserve and pref64_reserve for NVIDIA
devices. But it will be extended for other devices.
Fixes#10556
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It's the basic framework for getting information of pci devices.
Currently, we focus on the PCI Max bar memory size, but it'll be
extended in the future.
Fixes#10556
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Use `|=` instead of `+=` while calculating and iterating through a
vector of flags, which makes more sense and prevents situations like
duplicated flags in vector, which would cause problems.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Currently, s390x only tests cri-containerd. Partially converge to the
feature set of basic-ci-amd64:
- containerd-sandboxapi
- containerd-stability
- docker
with the appropriate hypervisors.
Do not run tests currently skipped on amd64, as well as
- agent-ctl, which we don't package for s390x
- nerdctl, does not package the `full` image for s390x
- nydus, does not package for s390x
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Recent PR #10732 moved the deletion of systemd files and units that were
deemed uneccessary by 02b3b3b977 from `image_builder.sh` to `rootfs.sh`.
This unfortunately broke `rootfs.sh centos` and `rootfs.sh -r` as used by
some other downstream users like fedora and RHEL, with the following error :
Warning FailedCreatePodSandBox 1s (x5 over 63s) kubelet
Failed to create pod sandbox: rpc error: code = Unknown
desc = CreateContainer failed: Establishing a D-Bus connection
Caused by:
0: I/O error: Connection reset by peer (os error 104)
1: Connection reset by peer (os error 104)
This is because the aforementioned distros use dbus-broker [1] that requires
systemd-journald to be present.
It is questionable that systemd units or files should be deemed unnecessary
for _all_ distros but this has been around since 2019. There's now also a
long-standing expectation from CI that `make rootfs && make image` does
remove these files.
In order to accomodate all the expectations, add a `-d` flag to `rootfs.sh`
to delete the systemd files and have `make rootfs` to use it.
[1] https://github.com/bus1/dbus-broker
Reported-by: Niteesh Dubey <niteesh@us.ibm.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
The CoCo non-TEE job has failed due the removal of an add-on
from AKS, causing KBS to not get installed (see #11156).
The fix should be done in this repo as well as in trustee, which can
take some time. We don't want to hold kata-containers PRs from getting
merged anylonger, so removing the job from required list.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
this script will be used in a new OCP integration pipeline to monitor
basic workflows of OCP+peer-pods.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
io.katacontainers.config.runtime.cc_init_data specifies initdata used by
the pod in base64(gzip(initdata toml)) format. The initdata will be
encapsulated into an initdata image and mount it as a raw block device
to the guest.
The initdata image will be aligned with 512 bytes, which is chosen as a
usual sector size supported by different hypervisors like qemu, clh and
dragonball.
Note that this patch only adds support for qemu hypervisor.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This commit adds changes to add input container_id and related
container data to state after a CreateContainerRequest is allowed. This
helps constrain reference container data for evaluating request
inputs to one instead of matching against every policy container data,
Ex: in ExecProcessRequest inputs.
Fixes#11109
Signed-off-by: Sumedh Sharma <sumsharma@microsoft.com>
We need to make sure the device files are created correctly
in the rootfs otherwise kata-agent will apply permission 0o000.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
It's been released for some time already ... and although we did have
the necessary patches in, we better to stick to a released version of
the project.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
This is mostly used for Kata Containers backing up Confidential
Computing use cases, this also has benefits for the normal Kata
Containers use cases, this it's left enabled by default.
However, let's allow users to specify whether or not they want to have
it enabled, as depending on their use-case, it just does not make sense.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Right now we've had some logic to add EXTRA_PKGS, but those were
restrict to the nvidia builds, and would require changing the file
manually.
Let's make sure a user can add this just by specifying an env var.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Kata Containers provides, since forever, a way to run OCI guest-hooks
from the rootfs, as long as the files are dropped in a specific location
defined in the configuration.toml.
However, so far, it's been up to the ones using it to hack the generated
image in order to add those guest hooks, which is far from handy.
Let's add a way for the ones interested on this feature to just drop a
tarball file under the same known build directory, spcificy an env var,
and let the guest hooks be installed during the rootfs build.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Add a top-level rust-toolchain.toml with the version
that matches version.yaml to ensure that we stay in sync
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Kata-agent now will check if a device /dev/vd* with 'initdata' magic
number exists. If it exists, kata-agent will try to read it. Bytes 9~16
are the length of the compressed initdata toml in little endine.
Bytes starting from 17 is the compressed initdata.
The initdata image device layout looks like
0 8 16 16+length ... EOF
'initdata' length gzip(initdata toml) paddings
The initdata will be parsed and put as aa.toml, cdh.toml and
policy.rego to /run/confidential-containers/initdata.
When AgentPolicy is initialized, the default policy will be overwritten
by that.
When AA is to be launched, if initdata is once processed, the launch arg
will include --initdata parameter.
Also, if
/run/confidential-containers/initdata/aa.toml exists, the launch args
will include -c /run/confidential-containers/initdata/aa.toml.
When CDH is to be launched, if initdata is once processed, the launch
args will include -c /run/confidential-containers/initdata/cdh.toml
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
When making new tests required, or removing existing tests
from required, this doesn't impact the CI jobs, so we don't need
to run all the tests.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Remove metrics setup job
- Update some truncation typos of job names
- Add shellcheck-required
- Remove the ok-to-test as a required label on the build test
as it isn't needed as a trigger
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
To reduce duplication, we could update
the vsock-exporter crate to use settings and versions
from the agent, where applicable.
> [!NOTE]
> In order to use the workspace, this has bumped some crate versions
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- To reduce duplication, we could update
the rustjail crate to use settings and versions
from the agent, where applicable.
- Also switch to using the derive feature in serde crate
rather than the separate serde_derive to avoid keeping
both versions in sync
> [!NOTE]
> In order to use the workspace, this has bumped
some crate versions
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
To reduce duplication, we could update
the policy crate to use settings and versions
from the agent, where applicable.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Create agent workspace dependencies and packge info
so that the packages in the workspace can use them
- Group the local dependencies together for clarity
(like in #11129)
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Put local dependencies into workspace to avoid complex path dependencies all
over the workspace. This gives an overview of local dependencies this workspace
uses, where those crates are located, and simplifies the local dependencies
referencing process.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Judging by the layout of the `Cargo.toml` files, local dependencies are
intentionally separated from other dependencies, let's enforce it
workspace-wise.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Only `shim` and `shim-ctl` are incorporated in `runtime-rs`'s workspace, let's
extend it to cover all crates in `runtime-rs/crates`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Just get base name from iommu group symlink is enough. As the
validation will be handled in subsequent steps when constructing
the full path /sys/kernel/iommu_groups/$iommu_group.
In this PR, it will remove dupicalted validation of iommu_group.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
For those not interested in CoCo, let's at least allow them to easily
build the agent without the guest-pull feature.
This reduces the binary size (already stripped) from 25M to 18M.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This commit introduces missing validations for input fields in ExecProcessRequest to
harden the security policy.
The changes include:
- Update rules.rego to add null/empty field enforcements for String_user, SelinuxLabel and ApparmorProfile
- Add unit test cases for ExecProcessRequest for each of the validations
Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>
Add how-to-use-memory-agent.md (How to use mem-agent to decrease the
memory usage of Kata container) to docs to show how to use mem-agent.
Fixes: #11013
Signed-off-by: Hui Zhu <teawater@gmail.com>
some of the e2e tests spawn a lot of workers which are mainly idle, but
the scheduler fails to schedule them due to cpu resource overcommit. For
our testing we are more focused on having actual pods running than the
speed of the scheduled pods so let's increase the amount of schedulable
pods by decreasing the default cpu requests.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Previously we introduced `build-kata-static-tarball-riscv64.yaml`,
enable that workflow in `ci.yaml`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
INSTALLATION_PREFIX must begin with a "/"
because it is being concatenated with /host.
If there is no /, displays a message and makes an error.
Fixes: #11096
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Since it is difficult to update the README when modifying the options of ./build-kernel.sh,
instead of update the README, we encourage users to run the -h command.
Fixes: #11065
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
While running `cargo fmt -- --check` in `src/runtime-rs` directory, it
errors out and suggesting these is an redundant empty line, which
prevents `make check` of `runtime-rs` component from passing.
Remove redundant empty line to fix this.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
kata-deploy tests have been quite stable, working for more than 10 days
without any nightly failure (or any failure reported at all), and I'll
be the one maintaining those.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
`VMContainerCapable` requires a present `kvm` device, which is not yet
available in our RISC-V runners. Skipped related tests if it is running
on `riscv-builder`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Provide according tests to cover `kata-runtime` package, test
`kata-runtime`'s `check` functionality on riscv64 platforms.
Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>
Add `getExpectedHostDetails` with expected value according to template
defined in `kata-check_data_riscv64_test.go`. This provides necessary
`HostInfo` for tests to cover `kata-check_riscv64.go`.
Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>
Add definition of `testCPUInfoTemplate` which is retrieved from
`/proc/cpuinfo` of a QEMU emulated virtual machine on virt board.
Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>
`testSetCPUTypeGeneric` will be used for writting `kata-check` in
`kata-runtime` on riscv64 platforms, enable building for later testing.
Signed-off-by: Yuting Nie <nieyuting@iscas.ac.cn>
Convert Rust arch to Go arch in Makefile, and add `riscv64-options.mk`
to provide definitions required for runtime to build on riscv64.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Enable `kata-runtime check` command to work on riscv64 platforms to make
sure required features/devices presents.
Co-authored-by: Yuting Nie <nieyuting@iscas.ac.cn>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
We get the following error while writing containerd config
if a base dir `/etc/containerd` does not exist like:
```
sudo tee /etc/containerd/config.toml << EOF
...
EOF
tee: /etc/containerd/config.toml: No such file or directory
```
The commit makes sure a base directory for containerd before
writing config and drops the config file deletion because a
default behaviour of `tee` is overwriting.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As reported in #11011, mounted secrets are available after
a container image is pulled by add_storage() for IBM SE.
But secure mount should be handled before the `add_storage()`.
Therefore, this commit divides cdh_handler() into:
- cdh_handler_trusted_storage()
- cdh_handler_sealed_secrets()
and calls cdh_handler_sealed_secrets() after add_storage()
while keeping cdh_handler_trusted_storage() unchanged.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The test `Cannot get CDH resource when deny-all policy is set`
completes with a KBS policy set to deny-all. This affects the
future TEE test (e.g. k8s-sealed-secrets.bats) which makes a
request against KBS.
This commit introduces kbs_set_default_policy() and puts it to
the setup() in k8s-sealed-secrets.bats.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Kata Containers has support for both the IBM Secure Execution trusted
execution environment and the IBM Crypto Express hardware security
module (used via the Adjunct Processor bus), but using them together
requires specific steps.
In Secure Execution, the Acceleration and Enterprise PKCS11 modes of
Crypto Express are supported. Both modes require the domain to be
_bound_ in the guest, and the latter also requires the domain to be
_associated_ with a _guest secret_. Guest secrets must be submitted to
the ultravisor from within the guest.
Each EP11 domain has a master key verification pattern (MKVP) that can
be established at HSM setup time. The guest secret and its ID are to
be provided at `/vfio_ap/{mkvp}/secret` and
`/vfio_ap/{mkvp}/secret_id` via a key broker service respectively.
Bind each domain, and for each EP11 domain,
- get the secret and secret ID from the addresses above,
- submit the secret to the ultravisor,
- find the index of the secret corresponding to the ID, and
- associate the domain to the index of this secret.
To bind, add the secret, parse the info about the domain, and
associate, the s390_pv_core crate is used. The code from this crate
also does the AP online check, which can be removed from here.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
With this we switch to fully testing with helm, instead of testimg with
the kustomizations (which will soon be removed).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's use what we have in the k8s functional tests to create a common
function to deploy kata containers using our helm charts. This will
help us immensely in the kata-deploy testing side in the near future.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This is not strictly needed, but it does help a lot when setting up a
cluster manually, while still relying on those scripts.
While here, let's also ensure the assignment is between quotes, to make
shellchecker happier.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This commit introduces changes to add test data for multiple request
type in a single testcases.json file. This allows for stateful testing,
for ex: enable testing ExecProcessRequest using policy state set after testing
a CreateContainerRequest.
Fixes#11073.
Signed-off-by: Sumedh Sharma <sumsharma@microsoft.com>
TDX Quote Generation Service (QGS) signs TDREPORT sent to it from
Qemu (GetQuote hypercall). Qemu needs quote-generation-socket
address configured for IPC.
Currently, Kata govmm only enables vsock based IPC for QGS but
QGS supports Unix Domain Sockets too which works well for host
process to process IPC (Qemu <-> QGS).
The QGS configuration to enable UDS is to run the service with "-port=0"
parameter. The same works well here too: setting
"tdx_quote_generation_service_socket_port=0" let's users to enable
UDS based IPC.
The socket path is fixed in QGS and cannot be configured: when "-port=0"
is used, the socket appears in /var/run/tdx-qgs/qgs.socket.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Use "cargo build --release" when BUILD_TYPE was not specified, or when
BUILD_TYPE=release. The default "cargo build" behavior is to build in
debug mode.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
It hangs when invalid arguments are specified.
```bash
kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh xxx
Action:
* xxx
...
Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset]
ERROR: invalid arguments
...
^C <- hang
```
I changed it to behave the same as when there are no arguments.
```bash
kata-deploy-6sr2p:/# /opt/kata-artifacts/scripts/kata-deploy.sh
Usage: /opt/kata-artifacts/scripts/kata-deploy.sh [install/cleanup/reset]
ERROR: invalid arguments
kata-deploy-6sr2p:/# echo $?
1
```
Fixes: #11068
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Since some files generated by protobuf were share between
runtime-rs and kata agent, and the kata agent's dependency
image-rs dependened protobuf@3.7.1, thus we'd better to keep
the protobuf version aligned between runtime-rs and agent,
otherwise, we couldn't compile the runtime-rs and agent
at the same time.
Fixes: https://github.com/kata-containers/kata-containers/issues/10650
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
`GITHUB_RUNNER_CI_ARM64` is turned on for self hosted runners without
virtualization to skipped those tests depend on virtualization. This may
happen to other archs/runners as well, let's generalize it to
`GITHUB_RUNNER_CI_NON_VIRT` so we can reuse it on other archs.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
More straightforward implementation of hard_coded_policy_tests_enabled,
that avoids ShellCheck warning:
warning: Remove quotes from right-hand side of =~ to match as a regex rather than literally. [SC2076]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Fix unintended use of caller's variable. Use the corresponding function
parameter instead. ShellCheck:
warning: policy_settings_dir is referenced but not assigned. [SC2154]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Avoid masking command return values by declaring and only then assigning.
ShellCheck:
warning: Declare and assign separately to avoid masking return values. [SC2155]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Pick the the values exported by other scripts. ShellCheck:
warning: AUTO_GENERATE_POLICY is referenced but not assigned. [SC2154]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
ShellCheck:
warning: This assignment is only seen by the forked process. [SC2097]
warning: This expansion will not see the mentioned assignment. [SC2098]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
ShellCheck: add braces around variable references:
note: Prefer putting braces around variable references even when not strictly required. [SC2250]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
ShellCheck: export variables used outside of tests_common.sh - e.g.,
warning: timeout appears unused. Verify use (or export if used externally). [SC2034]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Replace [ ] with [[ ]] as advised by shellcheck:
note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The metrics tests haven't been stable, or required through
github for many week now, so update the required-tests.yaml
list to re-sync
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
with the latest CoCo guest-components, tdx-attester no longer
depends on libtdx attest. Stop installing it to the rootfs.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
`go.opentelemetry.io/otel/trace.NewNoopTracerProvider`
is deprectated now, so switch to
`go.opentelemetry.io/otel/trace/noop.NewTracerProvider`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
SC2248 (style): Prefer double quoting even when variables don't contain
special characters, might result in arguments difference, shouldn't in
our cases.
Related to: #10951
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
SC2155 (warning): Declare and assign separately to avoid masking return
values, should be harmless.
Related to: #10951
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
SC2086 Double quote to prevent globbing and word splitting, might break
places where we deliberately use word splitting, but we are not using it
here.
Related to: #10951
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh. This might
result in different handling of globs and some ops which we don't use.
Related to: #10951
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
`runtime-rs` is now buildable and testable on riscv64 platforms, enable
`build-check` on `runtime-rs`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`available_guets_protection` is required for `runtime-rs` to infer while
building it on riscv64 platforms. Set it to `NoProtection` as riscv64
does not support guest protection for now.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Same rationale as for runtime. With tests, the blackfriday replacement was
actually meaningful, so I refactored some imports.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
These replace directives aren't understood by dependabot, hence dependabot can
claim to upgrade a dependency, while a replace directive still makes the
dependency point to an old version.
Fixes: #11020
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The SNP CI has been consistently passing and we request the @kata-containers/architecture-committee to mark this test as a required test.
Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Retry "kubectl exec" a few times if it unexpectedly produced an empty
output string.
This is an attempt to work around test failures similar to:
https://github.com/kata-containers/kata-containers/actions/runs/13840930994/job/38730153687?pr=10983
not ok 1 Environment variables
(from function `grep_pod_exec_output' in file tests_common.sh, line 394,
in test file k8s-env.bats, line 36)
`grep_pod_exec_output "${pod_name}" "HOST_IP=\([0-9]\+\(\.\|$\)\)\{4\}" "${exec_command[@]}"' failed
That test obtained correct ouput from "sh -c printenv" one time, but the
second execution of the same command returned an empty output string.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
image-rs has gotten a number of significant updates, eliminating corner
cases with obscure containers, improving support for local certs, and
more.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Update to the latest hash of guest-components. This will pick up some
nice new features including using ec key for the rcar handshake.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Update to new hashes for Trustee. The MSRV for Trustee is now 1.80.0 so
bump the rust toolchain as well.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
In ef0e8669fb we
had been seeing some significantly lower minvalues in
the jitter.Result test, so I lowered the mid-value rather
than having a very high minpercent, but it appears that the
variability of this result is very high, so we are still getting
the occasional high value, so reset the midval and just
have a bigger ranges on both sides, to try and keep the test
stable.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The kubectl wait has a built in timeout of 30s, so
wrapping it in waitForProcess, means we have
180/2 * 30 delay, which is much longer than intended,
so just set the timeout directly.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR makes changes to remove steps to run scripts for
preparing and cleaning the runner and instead use runner
hooks env variables to manage them.
Fixes: #9934
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
For a use case, we want to set the SNP IDBlock, which allows
configuring the AMD ASP to enforce parameters like expected launch
digest at launch. The struct with the config that should be enforced
(IDBlock) is signed. The public key is placed in the auth block and
the signature is verified by the ASP before launch. The digest of the
public key is also part of the attestation report (ID_KEY_DIGESTS).
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Test using the host path /tmp/k8s-policy-pod-test instead of
/var/lib/kubelet/pods.
/var/lib/kubelet/pods might happen to contain files that CopyFileRequest
would try to send to the Guest before CreateContainerRequest. Such
CopyFileRequest was an unintended side effect of this test.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Validate sandbox name using a regex. If the YAML specifies metadata.name, use a regex that exact matches.
If the YAML specifies metadata.generateName, use a regex that matches the prefix of the generated name.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
We have three SG2042 connected and labeled as `riscv-builder`, add that
entry to `actionlint.yaml` to help linting while setting up workflows.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Enable `kernel` and `virtiofsd` static-tarball build for riscv64. Since
`virtiofsd` was previously supported and `kernel` is supported now.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
AIA (Advanced Interrupt Architecture) is available and enabled by
default after v6.10 kernel, provide pci.conf to make proper use of IMSIC
of AIA.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Create `riscv` folder for riscv64 architecture to be inferred while
constructing kernel configuration, and introduce `base.conf` which
builds 64-bit kernel and with KVM built-in to kernel.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Introduce rule to block routes from source addresses which are the
loopback. Block routes added to the lo device.
Signed-off-by: Cameron Baird <cameronbaird@microsoft.com>
AddSwap send the pci path to guest kernel to let it add swap device.
But some mmio device doesn't have pci path. To support it add
AddSwapPath send virt_path to guest kernel as swap device.
Fixes: #10988
Signed-off-by: Hui Zhu <teawater@antgroup.com>
This commit add guest swap support.
When configuration enable_guest_swap is enabled, runtime-rs will start a
swap task.
When the VM start or update the guest memory, the swap task will be
waked up to create and insert a swap file.
Before this job, swap task will sleep some seconds (set by configuration
guest_swap_create_threshold_secs) to reduce the impact on guest kernel
boot performance and prevent the insertion of multiple swap files due to
frequent memory elasticity within a short period.
The size of swap file is set by configuration guest_swap_size_percent.
The percentage of the total memory to be used as swap device.
Fixes: #10988
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Add is_direct to struct BlockConfig.
This option specifies cache-related options for block devices.
Denotes whether use of O_DIRECT (bypass the host page cache) is enabled.
If not set, use configurarion block_device_cache_direct.
Fixes: #10988
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Log the "kubectl exec" ouput, just in case it helps investigate sporadic
test errors like:
https://github.com/kata-containers/kata-containers/actions/runs/13724022494/job/38387329321?pr=10973
not ok 1 Environment variables
(in test file k8s-env.bats, line 37)
`grep "HOST_IP=\([0-9]\+\(\.\|$\)\)\{4\}"' failed
It appears that the first exec from this test case produced the expected
output:
MY_POD_NAME=test-env
but the second exec produced something else - that will be logged after
this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Log the "kubectl exec" ouput, just in case it helps investigate sporadic
test errors like:
https://github.com/kata-containers/kata-containers/actions/runs/13724022494/job/38387329268?pr=10973
not ok 1 ConfigMap for a pod
(in test file k8s-configmap.bats, line 44)
`kubectl exec $pod_name -- "${exec_command[@]}" | grep "KUBE_CONFIG_2=value-2"' failed
It appears that the first exec from this test case produced the expected
output:
KUBE_CONFIG_1=value-1
but the second exec produced something else - that will be logged after
this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
grep_pod_exec_output invokes "kubectl exec", logs its output, and checks
that a grep pattern is present in the output.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We can use the netlink update method to add a route or an interface
address. There is no need to delete it first and then add it. This can
save two system commissions.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Get the route entry's flags from the host and
pass it into kata-agent to add route entries
with flags support.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We should support the flags when add the route from
host to guest. Otherwise, some route would be set
failed.
Fixes: #7934
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
./tests/git-helper.sh:20:5: note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292]
./tests/git-helper.sh:22:26: note: Double quote to prevent globbing and word splitting. [SC2086]
./tests/git-helper.sh:23:7: note: Prefer [[ ]] over [ ] for tests in Bash/Ksh. [SC2292]
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Switch to `docker.io` provided by Ubuntu sources. It is not necessary
for us to install docker through `get-docker.sh`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
- In the previous PR I only skipped the runtime/vendor
directory, but errors are showing up in other vendor
packages, so try a wildcard skip
- Also update the job step was we can distinguish between the
required and non-required versions
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- remove hard link to v.1.47.0 in go.mod
- run go mod tidy, go mod vendor to actually update to v1.58.3
- addresses CVE-2023-44487
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
Cloud-Hypervisor currently only supports `x86_64` and `aarch64`, this
features should not be avaiable even if other architectures explicitly
requires it.
Restrict `cloud-hypervisor` feature to only `x86_64` and `aarch64`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Drop `target_arch = "s390x"` all over `runtime-rs`, it is strange to
have such predicates on features and code while we do not support it.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
As we'll touch this file during this series, let's already make sure we
solve all the needed warnings.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
He were fixing the few warnings we found in the files present in the
functional tests for kata-deploy.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
It doesn't make much sense to test different VMMs as that wouldn't
trigger a different code path.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The previous PR mistakenly set all perms to 0o666 we should follow
what runc does and fetch the permission from the guest aka host
if the file_mode == 0. If we do not find the device on the guest aka
host fallback to 0.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
As we're testing against the LTS and the Active versions of
containers, let's upgrade the lts version from 1.6 to 1.7 and
active version from 1.7 to 2.0 to cover the sandboxapi tests.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
After the introduction of the following kernel parameters (see #6163):
```
CONFIG_SCLP_VT220_TTY=y
CONFIG_SCLP_VT220_CONSOLE=y
```
the system log for Kata components (e.g., the agent) no longer appeared
on the SCLP console (i.e., /dev/ttysclp0). Let's switch to the default
fallback console (likely /dev/console) for logging.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
When `KATA_HYPERVISOR` is set to `qemu-se-runtime-rs`,
a configuration file is properly referenced and a runtime class
should be created via kata-deploy.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A configuration file, `configuration-qemu-se-runtime-rs.toml`,
is referenced when the `qemu-se-runtime-rs` runtime is configured.
This commit adds a template file and updates the Makefile configuration
accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
We had the default permissions set to 0o000 if the file_mode was not
present, for most container devices this is the wrong default. Since
those devices are meant also to be accessed by users and others add a
sane default of 0o666 to devices that do not have any permissions set.
Otherwise only root can acess those and we cannot run containers as a
user.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
`musl-tools` is only needed when a component needs `rust`, and the
`instance` running is of `x86_64` or `aarch64`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
We had a static mapping of host guest PCI addresses, which prevented to
use VFIO devices in initContainers. We're tracking now the host-guest
mapping per container and removing this mapping if a container is
removed.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Now we've added the double quotes around
`${K8S_TEST_UNION[@]}`, so platforms are
failing with:
```
Error: Test file "/home/ubuntu/runner/_layout/_work/kata-containers/kata-containers/tests/integration/kubernetes/k8s-nginx-connectivity.bats
" does not exist
```
due to the line continuation, so sanitise the value
to try and fix this.
Co-authored-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The space was missing before `]`, so fix this and also
swtich to double square brackets and variable braces
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This was checking that a literal string was non-zero.
I'm assume it instead wanted to check if the file exists
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
> In functions, use return instead of break.
> rationale: break or continue are used to abort or
continue a loop, and are not the right way to exit
a function. Use return instead.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
> Can only exit with status 0-255. Other data should be written to stdout/stderr.
Switch exit -1 to exit 1
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
> Argument mixes string and array. Use * or separate argument.
- Swap echos for printfs and improve formatting
- Replace $@ with $*
- Split arrays into separate arguments
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I'm not sure if we use test_images anywhere, so before
we invest the time to fix the 120 shellcheck errors and warnings
we should decide if we want to keep it. See #10957
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Start with a required smaller set of shellchecks
to try and prevent regressions whilst we fix
the current problems
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Ignore the vendor directories in our shellcheck
workflow as we can't fix them. If there is a way to
set this in shellcheckrc that would be better, but
it doesn't seem to be implemented yet.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When using `virtio-net-pci` for IBM SE, the following error occurs:
```
update interface: Link not found (Address: f2:21:48:25:f4:10)
```
On s390x, it is more appropriate to use the CCW type of virtio
network device.
This commit ensures that a subchannel is configured accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
For IBM SE, the following kernel parameters are not required:
- Basic parameters (reboot and systemd-related)
- Rootfs parameters
This commit suppresses these parameters when IBM SE is configured.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit configures the command line for IBM Secure Execution (SE)
and other TEEs. The following changes are made:
- Add a new item `Se` to ProtectionDeviceConfig and handle it at sandbox
- Introduce `add_se_protection_device()` for SE cmdline config
- Bypass rootfs image/initrd validity checks when SE is configured.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`USE_BUILDIN_DB` is turned on by default for architectures do not
support `Dragonball`, which leads `s390x` is building `runtime-rs` with
`--features dragonball` presents.
Let's restrict `USE_BUILDIN_DB` to be enable only for architectures
supported by `Dragonball` (namely x86_64 and aarch64 as of now).
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
On qemu the run seems to error after ~4-7 runs, so try
a cut down version of repetitions to see if this helps us
get results in a stable way.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a new metrics machine and environment
and the iperf jitter result failed as it finished too quickly,
so increase the minpercent to try and get it stable
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a new metrics machine and environment
and the fio write.bw and iperf3 parallel.Results
tests failed for clh, as below
the minimum range, so increase the
minpercent to try and get it stable
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a new metrics machine and environment
and the boot time test failed for clh, so increase the
maxpercent to try and get it stable
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The iperf deployment is quite a lot out of date
and uses `master` for it's affinity and toleration,
so update this to control-plane, so it can run on
newer Kubernetes clusters
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The new metrics runner seems slower, so we are
seeing errors like:
The iperf3 tests are failing with:
```
pod rejected: RuntimeClass "kata" not found
```
so give more time for it to succeed
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Move `kill_kata_components` from common.bash
into the metrics code base as the only user of it
- Increase the timeout on the start of containerd as
the last 10 nightlies metric tests have failed with:
```
223478 Killed sudo timeout -s SIGKILL "${TIMEOUT}" systemctl start containerd
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- As the metrics tests are largely independent
then allow subsequent tests to run even if previous
ones failed. The results might not be perfect if
clean-up is required, but we can work on that later.
- Move the test results check out of the latency
test that seems arbitrary and into it's own job step
- Add timeouts to steps that might fail/hang if there
are containerd/K8s issues
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Currently the run-metrics job runs a manual install
and does this in a separate job before the metrics
tests run. This doesn't make sense as if we have multiple
CI runs in parallel (like we often do), there is a high chance
that the setup for another PR runs between the metrics
setup and the runs, meaning it's not testing the correct
version of code. We want to remove this from happening,
so install (and delete to cleanup) kata as part of the metrics
test jobs.
Also switch to kata-deploy rather than manual install for
simplicity and in order to test what we recommend to users.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The drop-in path should be /etc/containerd (from the containers'
perspective), which mounts to the host path /etc/k0s/containerd.d.
With what we had we ended up dropping the file under the
/etc/k0s/containerd.d/containerd.d/, which is wrong.
This is a regression introduce by: 94b3348d3c
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Change kata-deploy script and Helm chart in order to be able to use kata-deploy on a microk8s cluster deployed with snap.
Fixes: #10830
Signed-off-by: Stephane Talbot <Stephane.Talbot@univ-savoie.fr>
Refator matrix setup and according dependencies installation logic in
`build-checks.yaml` and `build-checks-preview-riscv64.yaml` to provide
better readability and maintainability.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`install-libseccomp` is applied only for `agent` component, and we are
already combining matrix with `if`s in steps, drop `install-libseccomp`
in matrix to reduce complexity.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
At the proper step pass-through the var KBUILD_SIGN_PIN
so that the kernel_headers step has the PIN for encrypting
the signing key.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In kata-deploy-binaries.sh we need to pass-through the var
KBUILD_SIGN_PIN to the other static builder scripts.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Update kata-deploy-binaries-in-docker.sh to read the
env variable KBUILD_SIGN_PIN that either can be set via
GHA or other means.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need to place the signing key and cert at the right place
and hide the KBUILD_SIGN_PIN from echo'ing or xtrace
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
If KBUILD_SIGN_PIN is provided we can encrypt the signing key
for out-of-tree builds and second round jobs in GHA
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The GitHub hosted runners for ARM64 do not provide virtualisation
support, thus we're just skipping the tests as those would check whether
or not the system is "VMContainerCapable".
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Update the code to install the version of k0s
that we have in our versions.yaml, rather than
just installing the latest, to help our CI being
less stable and prone to breaking due to things
we don't control.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add external versions support for k0s and
initially pin it at v1.31.5 as our cri-o tests
started failing when v1.32 became the latest
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In some cases, /init is not following two levels of symlinks
i.e. /init to /sbin/init to /lib/systemd/systemd
Setting /init directly to /lib/systemd/systemd when AGENT_INIT is not mandated
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Sysctls may be added to a container by the Kubernetes pod definition or
by containerd configuration. This commit adds support for the
corresponding PodSecurityContext field and an option to specify
environment-dependent sysctls in the settings file.
The sysctls requested in a CreateContainerRequest are checked against
the sysctls in the pod definition, or if not defined there in the
defaults in genpolicy-settings.json. There is no check for the presence
of expected sysctls, though, because Kubernetes might legitimately
omit unsafe syscalls itself and because default sysctls might not apply
to all containers.
Fixes: #10064
Signed-off-by: Markus Rudy <mr@edgeless.systems>
On s390x, a virtio-net device will use the CCW bus instead of PCI,
which impacts how its uevent should be handled. Take the respective
path accordingly.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
To support virtio-net-ccw for s390x, add CCW devices to the Endpoint
interface. Add respective fields and functions to implementing structs.
Device paths may be empty. PciPath resolves this by being a list that
may be empty, but this design does not map to CcwDevice. Use a pointer
instead.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Field is being used for both PCI and CCW devices. Name it devicePath
to avoid confusion when the device isn't a PCI device.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
As part of device preparation in Sandbox we check available protection
and create a corresponding ProtectionDeviceConfig if appropriate. The
resource-side handling is trivial.
Signed-off-by: Pavel Mores <pmores@redhat.com>
As an example, or a test case, we add some implementation of SEV/SEV-SNP.
Within the QEMU command line generation, the 'Cpu' object is extended to
accomodate the EPYC-v4 CPU type for SEV-SNP.
'Machine' is extended to support the confidential-guest-support parameter
which is useful for other TEEs as well.
Support for emitting the -bios command line switch is added as that seems
to be the preferred way of supplying a path to firmware for SEV/SEV-SNP.
Support for emitting '-object sev-guest' and '-object sev-snp-guest'
with an appropriate set of parameters is added as well.
Signed-off-by: Pavel Mores <pmores@redhat.com>
ProtectionDevice is a new device type whose implementation structure
matches the one of other devices in the device module. It is split into
an inner "config" part which contains device details (we implement
SEV/SEV-SNP for now) and the customary outer "device" part which just adds
a device instance ID and the customary Device trait implementation.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This matches the existing TDX handling where additional details are
retrieved right away after TDX is detected. Note that the actual details
(cbitpos) acquisition is NOT included at this time.
This change might seem bigger than it is. The change itself is just in
protection.rs, the rest are corresponding adjustments.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This function was accidentally left unimplemented for CronJob, resulting
in runAsUser not being supported there.
Fixes: #10653
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Since we have RISC-V builders available now, let's start with
`agent-ctl`, `trace-forwarder` and `genpolicy` components to run
build-checks on these `riscv-builder`s, and gradually add the rest
components when they are ready, to catch up with other architectures
eventually.
This workflow could be mannually triggered, `riscv-builder` will be the
default instance when that is the case.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Adding devices by CDI annotation can fail for a variety of reasons. If
that happens, it's helpful to know the root cause of the issue (CDI spec
missing, malformatted, requested device not present, etc.).
This commit adds the root cause of the CDI device addition to the errors
reported back to the caller. Since this error is bubbled up all the way
back to the shimv2 task.Create handler, it will be visible in Kubernetes
logs and enable fixing the root cause.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Currently, if a layer can't be processed, we log this a warning and
continue execution, finally exit with a zero exit code. This can lead
to the generation of invalid policies. One reason a layer might not be
processed is that the pull of that layer fails.
We need all layers to be processed successfully to generate a valid
policy, as otherwise we will miss the verity hash for that layer or
we might miss the USER information from a passwd stored in that layer.
This will cause our VM to not get through the agent's policy validation.
Returning an error instead of printing a warning will cause genpolicy
to fail in such cases.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
As the guest-pull is a very Confidental Containers specific feature,
let's make sure we, at least, don't break folks who decide to build Kata
Containers' agent without having this feature enabled (for instance, for
the sake of the agent size).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Architectures here with `musl` available are minority, which is more
suitable for enumeration.
With this change, we are implicitly choosing gnu target for `ppc64le`,
`riscv64` and `s390x`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
While installing Rust and Golang in our CI workflow, `arch_to_golang`
and `arch_to_rust` are needed for inferring the correct arch string for
riscv64 architecture.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Since `ARCH` for `riscv64` is `riscv64gc`, we'll need to override it in
`utils.mk`, and forcing `gnu` target for `riscv64` because `musl` target
is not yet made ready.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
We need a proper ID otherwise QEMU sometimes fails with invalid ID.
Use the same pattern as with the old VFIO implementation.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
I noticed that CodeQl using the default config hasn't
scanned since May 2024, so figured it would be worth
trying an explicit configuration to see if that gets better results.
It's mostly the template, but updated to be more relevant:
- Only scan PRs and pushes to the `main` branch
- Set a pinned runner version rather than latest (with mac support)
- Edit the list of languages to be scanned to be more relevant
for kata-containers
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Currently the ci-on-push workflow that runs on PRs runs
two jobs: gatekeeper-skipper.yaml and ci.yaml. In order
to test things like for the error
```
too many workflows are referenced, total: 21, limit: 20
```
on topic branches, we need ci-devel.yaml to have an
extra workflow to match ci-on-push, so add the build-checks
as this is helpful to run on topic branches anyway.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Replace the four different publish workflows with
a single one that take input parameters of the arch
and runner, so reduce the amount of duplicated code
and try and avoid the
```
too many workflows are referenced, total: 21, limit: 20
```
error
Let's take advantege of the current arm64 runners, and make sure we have
those tests running there as well.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
There are many `override ARCH = powerpc64le` after where `utils.mk` is
included, which are redundant.
Drop those redundant `override`s.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
In the CI, test containers intermittently fail to start after creation,
with an error like below (see #10872 for more details):
# State: Terminated
# Reason: StartError
# Message: failed to start containerd task "afd43e77fae0815afbc7205eac78f94859e247968a6a4e8bcbb987690fcf10a6": No such file or directory (os error 2)
I've observed this error to repro with the following containers, which
have in common that they're all *very short-lived* by design (more tests
might be affected):
* k8s-job.bats
* k8s-seccomp.bats
* k8s-hostname.bats
* k8s-policy-job.bats
* k8s-policy-logs.bats
Furthermore, appending a `; sleep 1` to the command line for those
containers seemed to consistently get rid of the error.
Investigating further, I've uncovered a race between the end of the container
process and the setting up of the cgroup watchers (to report OOMs).
If the process terminates first, the agent will try to watch cgroup
paths that don't exist anymore, and it will fail to start the container.
The added error context in notifier.rs confirms that the error comes
from the missing cgroup:
https://github.com/kata-containers/kata-containers/actions/runs/13450787436/job/37585901466#step:17:6536
The fix simply consists in creating the watchers *before* we start the
container but still *after* we create it -- this is non-blocking, and IIUC the
cgroup is guaranteed to already be present then.
Fixes: #10872
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
I missed that when I added the other comments, so, for the sake of
consistency, let's just add it there as well.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We never ever tested MEASURED_ROOTFS with initrd, and I sincerely do not
know why we've been setting that to "yes" in the initrd cases.
Let's drop it, as it may be causing issues with the jobs that rely on
the rootfs-initrd-confidential.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
For each IOMMUFD device create an object and assign
it to the device, we need additional information that
is populated now correctly to decide if we run the old VFIO
or new VFIO backend.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
As a follow-up for #10904, we do not need to set MEASURED_ROOTFS to no
on s390x explicitly. The GHA workflow already exports this variable.
This commit removes the redundant assignment.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This test verifies that, when ReadStreamRequest is blocked by the
policy, the logs are empty and the container does not deadlock.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This is the first attempt to remove the following code:
```
if [ "${ARCH}" == "s390x" ]; then
export MEASURED_ROOTFS=no
fi
```
from install_shimv2() in kata-deploy-binaries.sh.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
On Ubuntu 24.04, due to the /usr merge, system-provided unit files
now reside in `/usr/lib/systemd/system/` instead of `/lib/systemd/system/`.
For example, the command below now returns a different path:
```
$ systemctl show containerd.service -p FragmentPath
/usr/lib/systemd/system/containerd.service
```
Previously, on Ubuntu 22.04 and earlier, it returned:
```
/lib/systemd/system/containerd.service
```
The current pattern `if [[ $unit_file == /lib* ]]` fails to match the new path.
To ensure compatibility across versions, we update the pattern to match both
`/lib` and `/usr/lib` like:
```
if [[ $unit_file =~ ^/(usr/)?lib/ ]]
```
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Centralize all RustVMM crates to workspace.dependencies to prevent
having multiple versions of each RustVMM crate, which is error-prone and
inconsistent. With this setup, updates on RustVMM crates would be much
easier.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Setup workspace in dragonball, move `dbs` crates one level up to be
managed as members of dragonball workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add entries for dbs_* crates' README.md to pass `kata-spell-check.sh`
spell checking.
Changed British terms to American terms in README of `dbs_pci` to pass
`hunspell` check.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
sudo hides the environment variables that are sometimes
useful with the builds (for example: proxy settings).
While install_oras.sh could run completely without sudo in
the container it's COPY'd to, make minimal changes to it
to keep it functional outside the container too while still
addressing the problem of 'sudo curl' not working with proxy
env variables.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
To mitigate:
warning: `.../kata-containers/src/agent/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This picks up a security fix for confidential pulling of unsigned
images.
The crate moved permanently to oci-client, which required a few import
changes.
Co-authored-by: Paul Meyer <katexochen0@gmail.com>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We are running `header_check` for non-text files like binary files,
symbolic link files, image files (pictures) and etc., which does not
make sense.
Filter out non-text files and run `header_check` only for text files
changed.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
This was messed up a little when factoring out the policy crate.
Removing the dependencies no longer used by the agent and making the
import of kata-agent-policy optional again.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
An image `registry.k8s.io/hpa-example` only supports amd64.
Let's use a multi-arch image `quay.io/prometheus/prometheus`
for the QEMU example instead.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`kata-dictionary.dic` changes after running `kata-spell-check.sh
make-dict`. This is due to someone forgot to first update entries in
data and run `make-dict`, but directly updated `kata-dictionary.dic`
instead.
Add mssing entries to data and re-run `make-dict` to generate correct
`kata-dictionary.dic`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
previously we used to deploy the webhook and then modified the cm from
our ci/openshift-ci/ script to the desired value, but sometimes it
happens that the webhook pod starts before we modify the cm and keeps
using the default value.
Let's change the approach and modify the deployments in-place. The only
cons is it leaves the git dirty, but since this script is only supposed
to be used in ci it should be safe.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
So attestation-agent and others have a version including the ttrpc bump
to v0.8.4, allowing us to use the latest LTS kernel.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We've been appending to the wrong variable for quite some time, it
seems, leading to not actually regenerating the rootfs when needed.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Set CONFIG_BLK_DEV_WRITE_MOUNTED=y to restore previous kernel behaviour.
Kernel v6.8+ will by default block buffer writes to block devices mounted by filesystems.
This unfortunately is what we need to use mounted loop devices needed by some teams
to build OSIs and as an overlay backing store.
More info on this config item [here](https://cateee.net/lkddb/web-lkddb/BLK_DEV_WRITE_MOUNTED.html)
Fixes: #10808
Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
Run:
```
cargo update -p cookie-store
cargo update -p publicsuffix
```
to update the version of idna and resolve CVE-2024-12224
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Removed a rogue printf and updated the logging to say
that we're waiting for CDI spec(s) to be generated rather
than saying there is an error, it's not we have a timeout
after that it is an error.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
With the create_container_timeout the dial_timeout is lest important.
Add the custom timeout for GPUs in create_container_timeout
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The tags created automatically for published Github releases
are probably not annotated, so by simply running `git describe` we are
not getting the correct tag. Use a `git describe --tags` to allow git
to look at all tags, not just annotated ones.
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
AgentConfig now has the cdi_timeout from the kernel
cmdline, update the proper function signature and use
it in the for loop.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Some systems like a DGX where we have 8 H100 or 8 H800 GPUs
need some extended time to be initialized. We need to make
sure we can configure CDI timeout, to enable even systems with 16 GPUs.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Phase 1 of Issue #10840
AMD has deprecated SEV support on
Kata Containers, and going forward,
SNP will be the only AMD feature
supported. As a first step in this
deprecation process, we are removing
the SEV CI workflow from the test suite
to unblock the CI.
Will be adding future commits to
remove redundant SEV code paths.
Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
The block volume test has failed on 10/10 nightlies
and all the PRs I've seen, so skip it until it can be assessed.
See #10873
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Based on the guidance from @Xynnn007 in #10851
> The new version of image-rs will do attestation once
ClientBuilder.build().await() is called, while the old version
will do so lazily the first image pull request comes.
Looks like it's called in rpc::start() in kata-agent, when
I'm afraid the network hasn't been initialized yet.
> I am not sure if the guest network is prepared after
the DNS is configured (in create_sandbox),
if so we can move (the init_image_service) right after that.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As this brings in the commit bumping ttrpc to 0.8.4, which fixes
connection issues with kernel 6.12.9+.
As image-rs has a new builder pattern and several of the values in the
image client config have been renamed, let's change the agent to account
for this.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@linux.ibm.com>
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
cgroups v2 enforces stricter delegation rules, preventing operations on
cgroups outside our ownership boundary. When running Docker-in-Docker (DinD),
processes must be attached to an "init" subcgroup within the systemd unit.
This fix detects and uses the init subcgroup when proxying process attachment.
Fixes#10733
Signed-off-by: Antoine Gaillard <antoine.gaillard@datadoghq.com>
When trying to deploy nydus on kcli locally we get the
following failure:
```
root@sh-kata-ci1:~# kubectl get pods -n nydus-system
NAMESPACE NAME READY STATUS RESTARTS AGE
nydus-system nydus-snapshotter-5kdqs 0/1 CrashLoopBackOff 4 (84s ago) 7m29s
```
Digging into this I found that the nydus-snapshotter service
is failing with:
```
ubuntu@kata-k8s-worker-0:~$ journalctl -u nydus-snapshotter.service
-- Logs begin at Wed 2025-02-12 15:06:08 UTC, end at Wed 2025-02-12 15:20:27 UTC. --
Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: Started nydus snapshotter.
Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc:
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required b>
Feb 12 15:10:39 kata-k8s-worker-0 containerd-nydus-grpc[6349]: /usr/local/bin/containerd-nydus-grpc:
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required b>
Feb 12 15:10:39 kata-k8s-worker-0 systemd[1]: nydus-snapshotter.service: Main process exited, code=exited, status=1/FAILURE
```
I think this is because 20.04 has version:
```
ubuntu@kata-k8s-worker-0:~$ ldd --version
ldd (Ubuntu GLIBC 2.31-0ubuntu9.16) 2.31
```
so it's too old for the nydus snapshotter.
Also 20.04 is EoL soon, so bumping is better.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some problem hidden in `dbs` crates are revealed after making these
crates workspace components, fix according to `cargo clippy` suggests.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Peer pods have a linux namespace of type network. We want to make sure that all
container in the same pod use the same namespace. Therefore, we add the first
namespace path to the state and check all other requests against that.
This commit also adds the corresponding integration test in the policy crate
showcasing the benefit of having rust integration tests for the policy.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
The generated rego policies for `CreateContainerRequest` are stateful and that
state is handled in the policy crate. We use this policy crate in the
genpolicy integration test to be able to test if those state changes are
handled correctly without spinning up an agent or even a cluster.
This also allows to easily test on a e.g., CreateContainerRequest level
instead of relying on changing the yaml that is applied to a cluster.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
This commit allows to programmatically invoke genpolicy. This allows for other
rust tools that don't want to consume genpolicy as binary to generate policies.
One such use-case is the policy integration test implemented in the following
commits.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
The policy module augments the policy generated with genpolicy by keeping and
providing state to each invocation.
Therefore, it is not sufficient anymore to test the passing of requests in
the genpolicy crate.
Since in Rust, integration tests cannot call functions that are not exposed
publicly, this commit factors out the policy module of the agent into its
own crate and exposes the necessary functions to be consumed by the agent
and an integration tests. The integration test itself is implemented in the
following commits.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Since the CPU SNP is upstreamed and available via our
default QEMU target we're repurposing the SNP-experimental
for the GPU+SNP enablement.
First step is to update the version we're basing it off.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
the latest containerd had an issue for its e2e test, thus we should do
the following fix to workaround this issue. For much info about this issue,
please see:
https://github.com/containerd/containerd/pull/11240
Once this pr was merged and release new version, we can remove
this workaround.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
A test case is added based on the intergrated cri-containerd case.
The difference between cri containerd integrated testcase and sandbox
api testcase is the "sandboxer" setting in the sandbox runtime handler.
If the "sandboxer" is set to "" or "podsandbox", then containerd will
use the legacy shimv2 api, and if the "sandboxer" is set to "shim", then
it will use the sandbox api to launch the pod.
In addition, add a containerd v2.0.0 version. Because containerd officially
supports the sandbox api from version 2.0.0.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
add and resiger the sandbox api service, thus runtime-rs
can deal with the sandbox api rpc call from the containerd.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For Kata-Containers, we add SandboxService for these new calls alongside
the existing
TaskService, including processing requests and replies, and properly
calling
VirtSandbox's interfaces. By splitting the start logic of the sandbox,
virt_container
is compatible with calls from the SandboxService and TaskService. In
addition, we modify
the processing of resource configuration to solve the problem that
SandboxService does not
have a spec file when creating a pod.
Sandbox api can be supported from containerd 1.7. But there's a
difference from container 2.0.
To enbale it from 2.0, you can support the sandbox api for a specific
runtime by adding:
sandboxer = "shim", take kata runtime as an example:
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
sandboxer = "shim"
privileged_without_host_devices = true
pod_annotations = ["io.katacontainers.*"]
For container version 1.7, you can enable it by:
1: add env ENABLE_CRI_SANDBOXES=true
2: add sandbox_mode = "shim" to runtime config.
Acknowledgement
This work was based on @wllenyj's POC code:
(f5b62a2d7c)
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Signed-off-by: wllenyj <wllenyj@linux.alibaba.com>
For the processing of init sandbox, the init of task
api has some more special processing procedures than
the init of sandbox api, so these two types of init
are separated here.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
When start the sandbox, the sandbox id would be passed from the
shim command line, and it only need to get the containerd id from
oci spec when starting the pod container instead of the pod sandbox.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
When the sandbox api was enabled, the pause container would
be removed and sandbox start api only pass an empty bundle
directory, which means there's no oci spec file under it, thus
the cgroup config couldn't get the cgroup path from pause container's
oci spec. So we should set a default cgroup path for sandbox api
case.
In the future, we can promote containerd to pass the cgroup path during
the sandbox start phase.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Remove block_device_cache_direct from config of fc in runtime-rs because
fc doesn't support this config.
Fixes: #10849
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Add block_device_cache_direct to config of ch and dragonball in
runtime-rs because they support this config.
Fixes: #10849
Signed-off-by: Hui Zhu <teawater@antgroup.com>
This commit change config in CloudHypervisorInner to normal
HypervisorConfig to decrease the change of its type.
Fixes: #10849
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Otherwise we may end up simply unpacking kata-containers specific
binaries into the same location that system ones are needed, leading to
a broken system (most likely what happened with the metrics CI, and also
what's happening with the GHA runners).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We've been hitting issues with the CentOS 9 Stream machine, which Intel
doesn't have cycles to debug.
After raising this up in the Confidential Containers community meeting
we got the green light from Red Hat (Ariel Adam) to just disable the CI
based on CentOS 9 Stream for now.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
IBM SE ensures to make initrd measured by genprotimg and verified by ultravisor.
Let's not build the measured rootf on s390x.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This is needed in order to avoid agent build issues, such as:
```
error[E0658]: use of unstable library feature 'lazy_cell'
--> /home/ansible/.cargo/git/checkouts/guest-components-1e54b222ad8d9630/514c561/ocicrypt-rs/src/lib.rs:10:5
|
10 | use std::sync::LazyLock;
| ^^^^^^^^^^^^^^^^^^^
|
= note: see issue #109736 <https://github.com/rust-lang/rust/issues/109736> for more information
```
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As we want to make sure a new builder image is generated if the rust
version is bumped.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Due to the way that multi-arch support is done, on various platforms
we will get a clippy error:
```
error: this expression always evaluates to false
```
which might not be true on those other platforms, so
allow this code pattern to suppress the clippy error
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
k8s-policy-job is modeled after the older k8s-job, and it appears
that both of them fail occasionally on coco-dev.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Preparing to install nydus permanently on the AMD node,
so disabling deploy and delete command for SNP and SEV.
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
We introduced extratarballs with a make target. The CI
currently only uploads tarballs that are listed in the matrix.
The NV kernel builds a headers package which needs to be uploaded
as well.
The get-artifacts has a glob to download all artifacts hence we
should be good.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
So it avoids us hitting
```
error[E0282]: type annotations needed for `Box<_>`
--> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/time-0.3.31/src/format_description/parse/mod.rs:83:9
|
83 | let items = format_items
| ^^^^^
...
86 | Ok(items.into())
| ---- type must be known at this point
|
help: consider giving `items` an explicit type, where the placeholders `_` are specified
|
83 | let items: Box<_> = format_items
| ++++++++
```
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
On powerpc64le platform the ip neigh command has
a trailing space after the state, so the test is failing e.g.
```
assertion `left == right` failed
left: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT \n"
right: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT\n"
```
Trim the whitespace to make the test pass on all platforms
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`host_is_vmcontainer_capable` is required, but wasn't
implemented for powerpc64, so copy the aarch64 approach
@Amulyam24
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In #7236 the guest protection code was moved to kata-sys-utils,
but some of it was left behind, and the adjustment to the new
location wasn't completed, so the powerpc64 code doesn't
build now we've fixed the cfg to test it.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some of the Kernel structs have `#[allow(dead_code)]`
but not all and this results in the clippy error:
```
error: fields `name` and `value` are never read
```
so complete the job started before to remove the error.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy errors with:
```
error: field `driver` is never read
--> crates/resource/src/network/utils/link/driver_info.rs:77:9
|
76 | pub struct DriverInfo {
| ---------- field in this struct
77 | pub driver: String,
| ^^^^^^
```
We set this, but never read it, so clippy is correct,
but I'm not sure if it's useful for logging, or other purposes,
so I'll allow it for now.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy fails with:
```
warning: unexpected `cfg` condition value: `test-mock`
--> /root/go/src/github.com/kata-containers/kata-containers/src/dragonball/src/dbs_pci/src/vfio.rs:1929:17
|
1929 | #[cfg(all(test, feature = "test-mock"))]
| ^^^^^^^^^^^^^^^^^^^^^ help: remove the condition
|
= note: no expected values for `feature`
= help: consider adding `test-mock` as a feature in `Cargo.toml`
```
So add it as an expected cfg in the linter to skip this
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy fails with:
```
error: unexpected `cfg` condition value: `enable-vendor`
--> crates/hypervisor/src/device/driver/vfio.rs:180:11
|
180 | #[cfg(feature = "enable-vendor")]
| ^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: expected values for `feature` are: `ch-config`, `cloud-hypervisor`, `default`, and `dragonball`
= help: consider adding `enable-vendor` as a feature in `Cargo.toml`
```
So add it as an expected cfg in the linter to skip this
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy complains about:
```
error: for loop over a `&Result`. This is more readably written as an `if let` statement
--> crates/hypervisor/src/firecracker/fc_api.rs:99:22
|
99 | for param in &kernel_params.to_string() {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fix clippy error:
```
direct implementation of `ToString`
```
by switching to implement Display instead
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy errors with:
```
error: field `0` is never read
--> crates/hypervisor/src/qemu/cmdline_generator.rs:375:25
|
375 | DeviceAlreadyExists(String), // Error when trying to add an existing device
| ------------------- ^^^^^^
```
but this is used when creating the error later, so add an allow
to ignore this warning
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fix clippy error
```
error: usage of a legacy numeric constant
```
by swapping `std::u8::MAX` for `u8::MAX`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy errors with:
```
error: field `0` is never read
```
but the field is required for the `map_err`, so ignore this
error for now to avoid too much disruption
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
There were references to `config_manager::DeviceInfoGroup`
which doesn't exist, so I guess it means `DeviceConfigInfo`
instead, so update them
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Clippy errors with:
```
error: doc list item missing indentation
```
which I think is because the Return is between two list
items, so add a blank line to separate this into a separate
paragraph
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
clippy errors with:
```
error: initializer for `thread_local` value can be made `const`
```
so update as suggested
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fix clippy error:
```
direct implementation of `ToString`
```
by switching to implement Display instead
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fix clippy error
```
error: usage of a legacy numeric constant
```
by swapping `std::i32::<MIN/MAX>` for `i32::<MIN/MAX>`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
```
error: file opened with `create`, but `truncate` behavior not defined
```
`truncate(true)` ensures the file is entirely overwritten with new data
which I believe is the behaviour we want
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
```
error: bound is defined in more than one place
```
Move Sized into the later definition of `R` & `W`
rather than defining them in two places
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
```
error: file opened with `create`, but `truncate` behavior not defined
```
`truncate(true)` ensures the file is entirely overwritten with new data
which I believe is the behaviour we want
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
```
error: field `image` is never read
--> src/registry.rs:35:9
|
34 | pub struct Container {
| --------- field in this struct
35 | pub image: String,
| ^^^^^
|
= note: `Container` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis
= note: `-D dead-code` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(dead_code)]`
error: field `use_cache` is never read
--> src/utils.rs:106:9
|
105 | pub struct Config {
| ------ field in this struct
106 | pub use_cache: bool,
| ^^^^^^^^^
|
= note: `Config` has derived impls for the traits `Debug` and `Clone`, but these are intentionally ignored during dead code analysis
error: could not compile `genpolicy` (bin "genpolicy") due to 2 previous errors
```
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Based on comments from @Amulyam24 we need to use
the `target_endian = "little"` as well as target_arch = "powerpc64"
to ensure we are working on powerpc64le.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Starting with version 1.80, the Rust linter does not accept an invalid
value for `target_arch` in configuration checks:
```
Compiling kata-sys-util v0.1.0 (/home/ddd/Work/kata/kata-containers/src/libs/kata-sys-util)
error: unexpected `cfg` condition value: `powerpc64le`
--> /home/ddd/Work/kata/kata-containers/src/libs/kata-sys-util/src/protection.rs:17:34
|
17 | #[cfg(any(target_arch = "s390x", target_arch = "powerpc64le"))]
| ^^^^^^^^^^^^^^-------------
| |
| help: there is a expected value with a similar name: `"powerpc64"`
|
= note: expected values for `target_arch` are: `aarch64`, `arm`, `arm64ec`, `avr`, `bpf`, `csky`, `hexagon`, `loongarch64`, `m68k`, `mips`, `mips32r6`, `mips64`, `mips64r6`, `msp430`, `nvptx64`, `powerpc`, `powerpc64`, `riscv32`, `riscv64`, `s390x`, `sparc`, `sparc64`, `wasm32`, `wasm64`, `x86`, and `x86_64`
= note: see <https://doc.rust-lang.org/nightly/rustc/check-cfg/cargo-specifics.html> for more information about checking conditional configuration
= note: `-D unexpected-cfgs` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unexpected_cfgs)]`
```
According [to GitHub user @Urgau][explain], this is a new warning
introduced in Rust 1.80, but the problem exists before. The correct
architecture name should be `powerpc64`, and the differentiation
between `powerpc64le` and `powerpc64` should use the `target_endian =
"little"` check.
[explain]: #10072 (comment)
Fixes: #10067
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
[emlima: fix some more occurences and typos]
Signed-off-by: Emanuel Lima <emlima@redhat.com>
[stevenhorsman: fix some more occurences and typos]
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add aarch64 and x86_64 handling. Especially build the Rust
dependency with the correct rust musl target.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Container logs are forwarded to the agent through a unix pipe. These
pipes have limited capacity and block the writer when full. If reading
logs is blocked by policy, a common setup for confidential containers,
the pipes fill up and eventually block the container.
This commit changes the implementation of ReadStream such that it
returns empty log messages instead of a policy failure (in case reading
log messages is forbidden by policy). As long as the runtime does not
encounter a failure, it keeps pulling logs periodically. In turn, this
triggers the agent to flush the pipes.
Fixes: #10680
Co-Authored-By: Aurélien Bombo <abombo@microsoft.com>
Signed-off-by: Markus Rudy <mr@edgeless.systems>
In v4.44.5 of `yq`, artifacts for riscv64 are released. Update the
version used for `yq` and enable `install_yq.sh` to work on riscv64.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The static_checks_versions test uses yamllint which fails with:
```
[comments] too few spaces before comment
```
many times and so makes code reviews more annoying with
all these extra messages. Other it's probably not the worse issues,
I checked the
[yaml spec](https://yaml.org/spec/1.2.2/#66-comments)
and it does say
> Comments must be separated from other tokens by white space character*s*
so it's easiest to fix it and move on.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I've also seen cases (the qemu, crio, k0s tests) where Delete kata-deploy is still
running for this test after 2 hours, and had to be manually
cancelled, so let's try adding a 5m timeout to the kata-deploy delete to stop CI jobs hanging.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
#10714 added support for building a specific commit,
but due to the clone only having `--depth=1`, we can only
reset to a commit if it's the latest on the `main` branch,
otherwise we will get:
```
+ git clone --depth 1 --branch main https://gitlab.com/virtio-fs/virtiofsd virtiofsd
Cloning into 'virtiofsd'...
warning: redirecting to https://gitlab.com/virtio-fs/virtiofsd.git/
+ pushd virtiofsd
+ git reset --hard cecc61bca981ab42aae6ec490dfd59965e79025e
...
fatal: Could not parse object 'cecc61bca981ab42aae6ec490dfd59965e79025e'.
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Referenced AMD developer page for latest SEV firmware.
Instructions to point to upstream 6.11 kernel or later.
Referenced sev-utils and AMDESE fork for kernel setup.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
use upstream qemu in snp and nvidia snp configs.
load ovmf with bios flag on qemu cmdline instead of file.
Fixes: #10750
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
snp standard attestation with the upstream kernel and qemu do not support extended attestation with certs.
Fixes: #10750
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Previously, the test for VFIO-AP coldplug only checked whether a
passthrough device was attached to the VM guest. This commit expands
the test to include a full set of zcrypttest to verify that the device
functions properly within a container.
Additionally, since containerd has been upgraded to v1.7.25 on the
test machine, it is no longer necessary to run the test via crictl.
The commit removes all related codes/files.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit updates the device handler to call check_ap_device()
instead of wait_for_ap_device() for VFIO-AP coldplug.
The handler now returns a SpecUpdate for passthrough devices if
the device is online (e.g., `/sys/devices/ap/card05/05.001f/online`
is set to 1).
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit introduces a new gRPC device type, `vfio-ap-cold`, to support
VFIO-AP coldplug. This enables the VM guest to handle passthrough devices
differently from VFIO-AP hotplug.
With this new type, the guest no longer needs to wait for events (e.g., device
addition) because the device already exists at the time the device type is checked.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Even though ociSpec.Linux.Devices is preserved when vfio_mode is VFIO,
it has not been updated correctly for coldplug scenarios. This happens
because the device info passed to the agent via CreateContainerRequest
is dropped by the Kata runtime.
This commit ensures that the device info is added to the sandbox's
device manager when vfio_mode is VFIO and coldPlugVFIO is true
(e.g., vfio-ap-cold), allowing ociSpec.Linux.Devices to be properly
updated with the device information before the container is created on
the guest.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Since we're importing some build script for nvidia and we're
setting set -u we have some unbound variables in rootfs.sh
add initialization for those.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
stack-only types are handled properly with the
parse_cmdline_param macro advancted types like
String couldn't be guarded by a guard function since
it passed the variable by value rather than reference.
Now we can have guard functions for the String type
parse_cmdline_param!(
param,
CGROUP_NO_V1,
config.cgroup_no_v1,
get_string_value,
| no_v1 | no_v1 == "all"
);
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For AGENT_INIT=yes we do not run systemd and hence
systemd.unified_... does not mean anything to other init
systems. Providing cgroup_no_v1=all is enough to signal
other init systemd to use cgroupV2.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Since we're defaulting to AGENT_INIT=no for all the initrd/images
adapt the NV build to properly get kata-agent installed.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
While I wish we could be bumping to the very same version everywhere,
it's not possible and it's been quite a ride to get a combination of
things that work.
Let me try to describe my approach here:
* Do *NOT* stay on 20.04
* This version will be EOL'ed by April
* This version has a very old version of systemd that causes a bug
when trying to online the cpusets for guests using systemd as
init, causing then a breakage on the qemu-coco-non-tee and TDX
non-attestation set of tests
* Bump to 22.04 when possible
* This was possible for the majority of the cases, but for the
confidential initrd & confidential images for x86_64, the reason
being failures on AMD SEV CI (which I didn't debug), and a kernel
panic on the CentOS 9 Stream TDX machine
* 22.04 is being used instead of 24.04 as multistrap is simply broken
on Ubuntu 24.04, and I'd prefer to stay on an LTS release whenever
it's possible
* Bump to 24.10 for x86_64 image confidential
* This was done as we got everything working with 24.10 in the CI.
* This requires using libtdx-attest from noble (Ubuntu 24.04), as
Intel only releases their sgx stuff for LTS releases.
* Stick to 20.04 for x86_64 initrd confidential
* 24.10 caused a panic on their CI
* This is only being used by AMD so far, so they can decide when to
bump, after doing the proper testing & debug that the bump will work
as expected for them
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We first try without passing the `--break-system-packages` argument, as
that's not supported on Ubuntu 22.04 or older, but that's required on
Ubuntu 24.04 or newer.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Otherwise a bump in the os name and / or os version would lead to the CI
using a cached artefact.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We have gotten Ubuntu 20.04 working pretty much "by luck", as multistrap
fails the deployment, and then a hacky function was introduced to add
the proper dbus links. However, this does not scale at all, and we
should:
* Fail if multistrap fails
* I won't do this for Ubuntu 20.04 as it's working for now and soon
enough it'll be EOL
* Add better logging to ensure someone can know when multistrap fails
Below you can find the failure that we're hitting on Ubuntu 20.04:
```sh
Errors were encountered while processing:
dbus
ERR: dpkg configure reported an error.
Native mode configuration reported an error!
I: Tidying up apt cache and list data.
Multistrap system reported 1 error in /rootfs/.
I: Tidying up apt cache and list data.
```
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Right now we're hitting an interesting situation with osbuilder, where
regardless of what's being passed Ubuntu 20.04 (focal) is being used
when building the rootfs-image, as shown in the snippets of the logs
below:
```
ffidenci@tatu:~/src/upstream/kata-containers/kata-containers$ make rootfs-image-confidential-tarball
/home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-copy-libseccomp-installer.sh "agent"
make agent-tarball-build
...
make pause-image-tarball-build
...
make coco-guest-components-tarball-build
...
make kernel-confidential-tarball-build
...
make rootfs-image-confidential-tarball-build
make[1]: Entering directory '/home/ffidenci/src/upstream/kata-containers/kata-containers'
/home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build//kata-deploy-binaries-in-docker.sh --build=rootfs-image-confidential
sha256:f16c57890b0e85f6e1bbe1957926822495063bc6082a83e6ab7f7f13cabeeb93
Build kata version 3.13.0: rootfs-image-confidential
INFO: DESTDIR /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/destdir
INFO: Create image
build image
~/src/upstream/kata-containers/kata-containers/tools/osbuilder ~/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/builddir
INFO: Build image
INFO: image os: ubuntu
INFO: image os version: latest
Creating rootfs for ubuntu
/home/ffidenci/src/upstream/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/rootfs.sh -o 3.13.0-13f0807e9f5687d8e5e9a0f4a0a8bb57ca50d00c-dirty -r /home/ffidenci/src/upstream/kata-containers/kata-containers/tools/packaging/kata-deploy/local-build/build/rootfs-image-confidential/builddir/rootfs-image/ubuntu_rootfs ubuntu
INFO: rootfs_lib.sh file found. Loading content
~/src/upstream/kata-containers/kata-containers/tools/osbuilder/rootfs-builder/ubuntu ~/src/upstream/kata-containers/kata-containers/tools/osbuilder
~/src/upstream/kata-containers/kata-containers/tools/osbuilder
INFO: rootfs_lib.sh file found. Loading content
INFO: build directly
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [128 kB]
Get:2 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [128 kB]
Get:4 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [4276 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [128 kB]
Get:6 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:7 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1297 kB]
Get:8 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [30.9 kB]
Get:9 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [4187 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:13 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [4663 kB]
Get:14 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1589 kB]
Get:15 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [34.6 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [4463 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [28.6 kB]
Fetched 34.1 MB in 5s (6284 kB/s)
...
```
The reason this is happening is due to a few issues in different places:
1. IMG_OS_VERSION, passed to osbuilder, is not used anywhere and
OS_VERSION should be used instead. And we should break if OS_VERSION
is not properly passed down
2. Using UBUNTU_CODENAME is simply wrong, as it'll use whatever comes as
the base container from kata-deploy's local-build scripts, and it has
just been working by luck
Note that at the same time this commit fixes the wrong behaviour, it
would break the rootfses build as they are, this we need to set the
versions.yaml to use 20.04 were it was already using 20.04 even without
us knowing.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As this is required as part of the osbuilder tool to be able to properly
set the repositories used when building the rootfs.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
While having variables are nice, those are more extensive to write down,
and actually confusing for tired developer eyes to read, plus we're
mixing the use of the yaml variables here and there together with not
using them for some architectures.
With the best "all or nothing" spirit, let's just make it easier for our
developers to read the versions.yaml and easily understand what's being
used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As the devices controller works in a different way in cgroupsv2, the
"/sys/fs/cgroup/devices/devices.list" file simply doesn't exist.
For now, let's skip the test till the test maintainer decides to
re-enable it for cgroupsv2.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The changes done are:
* cpu/cpu.shares was replaced by cpu.weight
* The weight, according to our reference[0], is calculated by:
weight = (1 + ((request - 2) * 9999) / 262142)
* cpu/cpu.cfs_quota_us & cpu/cpu.cfs_period_us were replaced by cpu.max,
where quota and period are written together (in this order)
[0]: https://github.com/containers/crun/blob/main/crun.1.md#cgroup-v2
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit 091ad2a1b2, in order
to ensure tests would be running with cgroupsv2 on the guest.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In the last couple of days I've seen the blogbench
metrics write latency test on clh fail a few times because
the latency was too low, so adjust the minimum range
to tolerate quicker finishes.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The static-checks targets are `pull_request`, so
they can run the PR workflow version, so we want to
update the required-tests.yaml so that static-check
workflow changes do trigger static checks in order
to test them properly.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Now we have the build-assets running on the gh-hosted
runners, try the same approach for the static-checks
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I've noticed the following error when running the tests with SEV:
```
2025-01-21T17:10:28.7999896Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2025-01-21T17:10:28.8000614Z # @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
2025-01-21T17:10:28.8001217Z # @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2025-01-21T17:10:28.8001857Z # IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2025-01-21T17:10:28.8003009Z # Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2025-01-21T17:10:28.8003348Z # It is also possible that a host key has just been changed.
2025-01-21T17:10:28.8004422Z # The fingerprint for the ED25519 key sent by the remote host is
2025-01-21T17:10:28.8005019Z # SHA256:x7wF8zI+LLyiwphzmUhqY12lrGY4gs5qNCD81f1Cn1E.
2025-01-21T17:10:28.8005459Z # Please contact your system administrator.
2025-01-21T17:10:28.8006734Z # Add correct host key in /home/kata/.ssh/known_hosts to get rid of this message.
2025-01-21T17:10:28.8007031Z # Offending ED25519 key in /home/kata/.ssh/known_hosts:178
2025-01-21T17:10:28.8007254Z # remove with:
2025-01-21T17:10:28.8008172Z # ssh-keygen -f "/home/kata/.ssh/known_hosts" -R "10.244.0.71"
```
And this was causing a failure to ssh into the confidential pod.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Relying on dmesg is really not ideal, as we may lose important info,
mainly those which happen very early in the boot, depending on the size
of kernel ring buffer.
So, for this specific test, let's increase the kernel ring buffer, by
default, to 4M.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's make sure that we don't use Kata Containers' agent as init for the
Confidential related rootfses*, as we don't want to increase the agent's
complexity for no reason ... mainly when we can rely on a proper init
system.
*:
- images already used systemd as init
- initrds are now using systemd as init
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Bumps the go_modules group with 1 update in the /src/runtime directory: [golang.org/x/net](https://github.com/golang/net).
Bumps the go_modules group with 1 update in the /src/tools/csi-kata-directvolume directory: [golang.org/x/net](https://github.com/golang/net).
Bumps the go_modules group with 1 update in the /tools/testing/kata-webhook directory: [golang.org/x/net](https://github.com/golang/net).
Updates `golang.org/x/net` from 0.25.0 to 0.33.0
- [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0)
Updates `golang.org/x/net` from 0.23.0 to 0.33.0
- [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0)
Updates `golang.org/x/net` from 0.23.0 to 0.33.0
- [Commits](https://github.com/golang/net/compare/v0.25.0...v0.33.0)
---
updated-dependencies:
- dependency-name: golang.org/x/net
dependency-type: indirect
dependency-group: go_modules
- dependency-name: golang.org/x/net
dependency-type: direct:production
dependency-group: go_modules
- dependency-name: golang.org/x/net
dependency-type: indirect
dependency-group: go_modules
...
Signed-off-by: dependabot[bot] <support@github.com>
When the agent is run as the init process cgroupfs is being
setup. In the case of cgroupsV1 we needed to enable the memory hiearchy
this is now per default enabled in cgroupsV2. Additionally the file
/sys/fs/cgroup/memory/memory.use_hierarchy isn't even available with V2.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The `Create AKS cluster` step in `run-k8s-tests-on-aks.yaml` is likely
to fail fail since we are trying to issue `PUT` to `aks` in a relatively
high frequency, while the `aks` end has it's limit on `bucket-size` and
`refill-rate`, documented here [1].
Use `nick-fields/retry@v3` to retry in 10 seconds after request fail,
based on observations that AKS were request 7, or 8 second delays
before retry as part of their 429 response
[1] https://learn.microsoft.com/en-us/azure/aks/quotas-skus-regions#throttling-limits-on-aks-resource-provider-apisFixes: #10772
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
With this change, `virtiofsd` (gnu target) could be built and then to be
used with other components.
Depends: #10741Fixes: #10739
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
While working on #10559, I realized that some parts of the codebase use
$GH_PR_NUMBER, while other parts use $PR_NUMBER.
Notably, in that PR, since I used $GH_PR_NUMBER for CoCo non-TEE tests
without realizing that TEE tests use $PR_NUMBER, the tests on that PR
fail on TEEs:
https://github.com/kata-containers/kata-containers/actions/runs/12818127344/job/35744760351?pr=10559#step:10:45
...
44 error: error parsing STDIN: error converting YAML to JSON: yaml: line 90: mapping values are not allowed in this context
...
135 image: ghcr.io/kata-containers/csi-kata-directvolume:
...
So let's unify on $GH_PR_NUMBER so that this issue doesn't repro in the
future: I replaced all instances of PR_NUMBER with GH_PR_NUMBER.
Note that since some test scripts also refer to that variable, the CI
for this PR will fail (would have also happened with the converse
substitution), hence I'm not adding the ok-to-test label and we should
force-merge this after review.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
With ubuntu 20.04 image, virtiofsd gnu target couldn't be built due to
"unsupported ISA subset z" reported by "cc".
Updating to ubuntu 22.04 image addresses this problem.
Relates: #10739
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
With newer kernels we have a new backend for VFIO
called IOMMUFD this is a departure from VFIO IOMMU Groups
since it has only one device associated with an IOMMUFD entry.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The pyinstaller is located per default under /usr/local/bin
some prior versions were installing it to ${HOME}.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Skip logging empty lines of text from the Guest console output, if
there are any such lines.
Without this change, the Guest console log from CLH + /dev/pts/0 has
twice as many lines of text. Half of these lines are empty.
Fixes: #10737
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
On s390x, some tests for trusted storage occasionally failed due to:
```bash
etcdserver: request timed out
```
or
```bash
Internal error occurred: resource quota evaluation timed out
```
These timeouts were not observed previously on k3s but occur
sporadically on kubeadm. Importantly, they appear to be temporary
and transient, which means they can be ignored in most cases.
To address this, we introduced a new wrapper function, `retry_kubectl_apply()`,
for `kubectl create`. This function retries applying a given manifest up to 5
times if it fails due to a timeout. However, it will still catch and handle
any other errors during pod creation.
Fixes: #10651
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Move the deletion of unnecessary systemd units and files from
image_builder.sh into rootfs.sh.
The files being deleted can be applicable to other image file formats
too, not just to the rootfs-image format created by image_builder.sh.
Also, image_builder.sh was deleting these files *after* it calculated
the size of the rootfs files, thus missing out on the opportunity to
possibly create a smaller image file.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This update addresses an issue with token verification for SE and SNP
introduced in the last update by #10541.
Bumping the project to the latest commit resolves the issue.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The earlier implementation relied on using a specific mount-path prefix - `/sealed`
to determine that the referenced secret is a sealed secret.
However that was restrictive for certain use cases as it forced
the user to always use a specific mountpath naming convention.
This commit introduces an alternative implementation to relax the
restriction. A sealed secret can be mounted in any mount-path.
However it comes with a potential performance penality. The
implementation loops through all volume mounts and reads the file
to determine if it's a sealed secret or not.
Fixes: #10398
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
We hit a failure with:
```
time="2025-01-09T09:55:58Z" level=warning msg="Failed Minval (0.017600 > 0.015000) for [network-iperf3]"
```
The range is very big, but in the last 3 test runs I reviewed we have got a minimum value of 0.015s
and a max value of 0.052, so there is a ~350% difference possible
so I think we need to have a wide range to make this stable.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Because az client restricts the name to be less than 64 characters. In
some cases (e.g. KATA_HYPERVISOR=qemu-runtime-rs) the generated name
will exceed the limit. This changed the function to shorten the name:
* SHA1 is computed from metadata then compound the cluster's name
* metadata as plain-text are passed as --tags
Fixes: #9850
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The bump to kernel 6.12 seems to have reduced the latency in
the metrics test, so increase the ranges for the minimal value,
to account for this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Together with the bump, let's also bump the rust version needed to build
the package, with the caveat that virtiofsd doesn't actually use a
pinned version as part of their CI, so we're bumping to whatever is the
version on `alpine:rust` (which is used in their CI).
It's important to note that we're using a version which brings in one
extra patch apart from the release, as the next virtiofsd release will
happen at the end of February, 2025.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Right now we've been only building releases from virtiofsd, but we'll
need to pin a specific commit till v1.14.0 is out, thus let's add the
needed machinery to do so.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The guest-components directory has been re-arranged slightly. Adjust the
installation path of the LUKS helper script to account for this.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Trustee has some new features including a plugin backend, support for
PKCS11 resources, improvements to token verification, and adjustments to
logging, and more.
Also update guest-components to pickup improvements and keep the KBS
protocol in sync.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Since we bumped to the 6.12.x LTS kernel, we've also adjusted the
aggressivity of the OOM test, which may be enough to allow us to
re-enable it for mariner.
Fixes: #8821
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Since from https://github.com/containerd/containerd/pull/9096
containerd removed cri-containerd-*.tar.gz release bundles,
thus we'd better change the tarball name to "containerd".
BTW, the containerd tarball containerd the follow files:
bin/
bin/containerd-shim
bin/ctr
bin/containerd-shim-runc-v1
bin/containerd-stress
bin/containerd
bin/containerd-shim-runc-v2
thus we should untar containerd into /usr/local directory instead of "/"
to keep align with the cri-containerd.
In addition, there's no containerd.service file,runc binary and cni-plugin
included, thus we should add a specific containerd.service file and
install install the runc binary and cni-pluginspecifically.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Got following issue:
kata-ctl direct-volume add /kubelet/kata-direct-vol-002/directvol002
"{\"device\": \"/home/t4/teawater/coco/t.img\", \"volume-type\":
\"directvol\", \"fstype\": \"\", \"metadata\":"{}", \"options\": []}"
subsystem: kata-ctl_main
Dec 30 09:43:41.150 ERRO Os {
code: 2,
kind: NotFound,
message: "No such file or directory",
}
The reason is KATA_DIRECT_VOLUME_ROOT_PATH is not exist.
This commit create_dir_all KATA_DIRECT_VOLUME_ROOT_PATH before join_path
to handle this issue.
Fixes: #10695
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Unsurprisingly now we've got passed the containerd test
hangs on the ppc64le, we are hitting others in the "Prepare the
self-hosted runner" stage, so add timeouts to all of them
to avoid CI blockages.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add information about what required jobs are and
our initial guidelines for how jobs are eligible for being
made required, or non-required
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`set_hypervisor_config()` and `set_passfd_listener_port()` acquire inner
lock, so that `mut` for `hypervisor` is unneeded.
Fixes: #10682
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Fix the logic to make the test skipped on qemu-coco-dev,
rather than the opposite and update the syntax to make it
clearer as it incorrectly got written and reviewed by three
different people in it's prior form.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In order to check that we don't accidentally overwrite
release artifacts, we should add a check if the release
name already exists and bail if it does.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Building static binaries for aarch64 requires disabling PIE
We get an GOT overflow and the OS libraries are only build with fpic
and not with fPIC which enables unlimited sized GOT tables.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The current script logic assigns an empty string to APID and APQI
when APQN consists entirely of zeros (e.g., "00.0000").
However, this behavior is incorrect, as "00" and "0000" are valid
values and should be represented as "0".
This commit ensures that the script assigns the default string “0”
to APID and APQI if their computed values are empty.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Due to the agent-api tests requiring the agent to be deployed in the
CI by the tarball, so in the short-term lets only do this on the release
stage, so that both kata-manager works with the release and the
agent-api tests work with the other CI builds.
In the longer term we need to re-evaluate what is in our tarballs
(issue #10619), but want to unblock the tests in the short-term.
Fixes: #10630
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
With the code I originally did I think there is potentially
a case where we can get a failure due to timing of steps.
Before this change the `build-asset-shim-v2`
job could start the `get-artifacts` step and concurrently
`remove-rootfs-binary-artifacts` could run and delete the artifact
during the download and result in the error. In this commit, I
try to resolve this by making sure that the shim build waits
for the artifact deletes to complete before starting.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We want to have the latest QEMU version available
which is as of this writing v9.1.2
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
qemu: Add new options for 9.1.2
We need to fence specific options depending on the version
and disable ones that are not needed anymore
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
qemu: Add no_patches.txt
Since we do not have any patches for this version
let's create the appropriate files.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The new QEMU build needs python-tomli, now that we bumped Ubuntu
we can include the needed tomli package
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We're disabling pmem support, it is heavilly broken with
Ubuntu's static build of QEMU and not needed
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In jammy we have the liburing package available, hence
remove the source build and include the package.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This path fixes the logic of getting the type of volume: when the type of
OCI mount is Some("none") and the options have "bind" or "rbind", the
type will be considered as "bind".
Fixes: #10642
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
This commit sets memory config `shared` to false in cloud hypervisor
when creating vm with shared_fs=None && hugePages = false.
Currently in runtime/virtcontainers/clh.go,the memory config shared is by default set to true.
As per the CLH memory document,
(a) shared=true is needed in case like when using virtio_fs since virtiofs daemon runs as separate process than clh.
(b) for shared_fs=none + hugespages=false, shared=false can be set to use private anonymous memory for guest (with no file backing).
(c) Another memory config thp (use transparent huge pages) is always enabled by default.
As per documentation, (b) + (c) can be used in combination.
However, with the current CLH implementation, the above combination cannot be used since shared=true is always set.
Fixes#10547
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
> A && B || C is not if-then-else. C may run when A is true
Refactor the echo so that we can't get into a situation where
the retry of workspace delete happens if the original one was
successful
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump some actions that are significantly out-of-date
and out of sync with the versions used in other workflows
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The tdx runners got split into two different
runners, so we need to update the known self-hosted
runner labels
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
On PRs that update anything in the workflows directory,
add an actionlint run to validate our workflow files for errors
and hopefully catch issues earlier.
Fixes: #9646
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
So users can simply download the chart and use it accordingly without
the need to download the full repo.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Add sub command MemAgentMemcgSet and MemAgentCompactSet to agent-ctl to
configate the mem-agent inside the running kata-containers.
Fixes: #10625
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Add MemAgentMemcgSet and MemAgentCompactSet to agent API to set the config of
mem-agent memcg and compact.
Fixes: #10625
Signed-off-by: Hui Zhu <teawater@antgroup.com>
mem-agent is a component designed for managing memory in Linux
environments.
Sub-feature memcg: Utilizes the MgLRU feature to monitor each cgroup's
memory usage and periodically reclaim cold memory.
Sub-feature compact: Periodically compacts memory to facilitate the
kernel's free page reporting feature, enabling the release of more idle
memory from guests.
During memory reclamation and compaction, mem-agent monitors system
pressure using Pressure Stall Information (PSI). If the system pressure
becomes too high, memory reclamation or compaction will automatically
stop.
Fixes: #10625
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Add mglru, debugfs and psi to dragonball-experimental/mem_agent.conf to
support mem_agent function.
Fixes: #10625
Signed-off-by: Hui Zhu <teawater@antgroup.com>
This is super useful for development / debugging scenarios, mainly when
dealing with limited hardware availability, as this change allows
multiple people to develop into one single machine, while still using
kata-deploy.
Fixes: #10546
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
At the same time that INSTALLATION_PREFIX was added, I was working on
the helm changes to properly do the cleanup / deletion when it's
removed. However, I missed adding the INSTALLATION_PREFIX env var
there. which I'm doing now.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
- Remove default_namespace from settings
- Ensure container namespaces in a pod match each other in case no namespace is specified in the YAML
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
The tests is unstable on this platform, so skip it for now to prevent
the regular known failures covering up other issues. See #10616
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Fix copy-paste errors in artifact filters for arm64 and ppc64le
- Remove the trailing wildcard filter that falsely ends up removing agent-ctl
and replace with the tarball-suffix, which should exactly match the artifacts
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
With the building/publishing step for the CSI driver validated, we can
set that as a requirement for the CoCo tests.
Depends on: #10561
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The driver build recipe has a script to check the current Go version against
the go.mod version. However, the script is broken ($expected is unbound) and I
don't believe we do this for other components. On top of this, Go should be
backward-compatible. Let's keep things simple for now and we can evaluate
restoring this script in the future if need be.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This fully implements the compilation step for csi-kata-directvolume.
This component can now be built by the CI running:
$ cd tools/packaging/kata-deploy/local-build
$ make csi-kata-directvolume-tarball
A couple notes:
* When installing the binary, we rename it from directvolplugin to
csi-kata-directvolume on the fly to make it more readable.
* We add go to the tools builder Dockerfile to support building this
tool.
* I've noticed the file install_libseccomp.sh gets created by the build
process so I've added it to a .gitignore.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The devno assignment logic is repeated in 5 different places
during device addition.
To improve code maintainability and readability, this commit
introduces a standalone function, `get_devno_ccw()`,
to handle the deduplication.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Before this patch there was a mismatch between the JSON path under which
the state of the rule evaluation is set in comparison to under which
it is retrieved.
This resulted in the behavior that each time the policy was evaluated,
it thought it was the _first_ time the policy was evaluated.
This also means that the consistency check for the `sandbox_name`
was ineffective.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
With full debug logging enabled there might be around 1,500 redials
so log just ~15 of these redials to avoid flooding the log.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The default initrd confidential target will have a
variant=confidential we need to accomodate this
and make sure we also accomodate aaa-xxx-confidential targets.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We've also seen the qemu metrics tests are failing due to the results
being slightly outside the max range for network-iperf3 parallel and minimum for network-iperf3 jitter tests on PRs that have no code changes,
so we've increase the bounds to not see false negatives.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We've seen a couple of instances recently where the metrics
tests are failing due to the results being below the minimum
value by ~2%.
For tests like latency I'm not sure why values being too low would
be an issue, but I've updated the minpercent range of the failing tests
to try and get them passing.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As the following CI job has been marked as required:
- kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (devmapper, qemu, kubeadm)
we need to add it to the gatekeeper's required job list.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This reverts commit 559018554b.
As we've noticed that this is causing issues with initrd builds in the
CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This will help us to simply allow a new dummy build whenever a new
component is added.
As long as the format `$(call DUMMY,$@)` is followed, we should be good
to go without taking the risk of breaking the CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This will make our lives considerably easier when it comes to cleaning
up content added, while it's also a groundwork needed for having
multiple installations running in parallel.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We need the publish certain artefacts for the rootfs,
like the agent, guest-components, pause bundle etc
as they are consumed in the `build-asset-rootfs` step.
However after this point they aren't needed and probably
shouldn't be included in the overall kata tarball, so delete
them once they aren't needed any more to avoid them
being included.
Fixes: #10575
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Let's actually mount the whole /etc/k0s as /etc/containerd, so we can
easily access the containerd configuration file which has the version in
it, allowing us to parse it instead of just making a guess based on
kubernetes distro being used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
On Ubuntu 24.04, with the distro default containerd, we're already
getting:
```
$ containerd config default | grep "version = "
version = 3
```
With that in mind, let's make sure that we're ready to support this from
the next release.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
A new attribute named `devno` is added to DeviceVirtioScsi.
It will be used to specify a device number for a CCW bus type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A new attribute named `devno` is added to DeviceVhostUserFs.
It will be used to specify a device number for a CCW bus type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A new attribute named `devno` is added to VhostVsock.
It will be used to specify a device number for a CCW bus type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A new attribute named `devno` is added to DeviceVirtioSerial.
It will be used to specify a device number for a CCW bus type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A new attribute named `devno` is added to DeviceVirtioBlk.
It will be used to specify a device number for a CCW bus type.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
To explicitly specify a device number on the QEMU command line
for the following devices using the CCW transport on s390x:
- SerialDevice
- BlockDevice
- VhostUserDevice
- SCSIController
- VSOCKDevice
this commit introduces a new structure CcwSubChannel and implements
the following methods:
- add_device()
- remove_device()
- address_format_ccw()
- set_addr()
You can see the detailed explanation for each method in the comment.
This resolves the 1st part of #10573.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
let's print the also the existing result's id when printing the
information about ignoring older result id to simplify debugging.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
tha matching run_id means we're dealing with the same job but with
updated results and not with an older job. Update the results in such
case.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This adds a new job to build and publish the CSI driver Docker image.
Of course this job will fail after we merge this PR because the CSI driver
compilation job hasn't been implemented yet. However that will be implemented
directly after in #10561.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This adds a no-op build step to compile the CSI driver. The actual compilation
will be implemented in an ulterior PR, so as to ensure we don't break the CI.
Addresses: #10560
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Build ubuntu rootfs with Docker failed with error:
`Unable to find libclang`
Fix this error by adding libclang-dev to the dependency.
Signed-off-by: Jitang Lei <leijitang@outlook.com>
We need to clean-up any created files/dirs otherwise
we cause problems on self-hosted runners. Using tempdir which
will be removed automatically.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Now we are downloading artifacts to create the rootfs
we need to ensure they are uploaded always,
even on releases
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
KinD checks for the presence of this (and other) kernel configuration
via scripts like
https://blog.hypriot.com/post/verify-kernel-container-compatibility/ or
attempts to directly use /proc/sys/kernel/keys/ without checking to see
if it exists, causing an exit when it does not see it.
Docker/it's consumers apparently expect to be able to use the kernel
keyring and it's associated syscalls from/for containers.
There aren't any known downsides to enabling this except that it would
by definition enable additional syscalls defined in
https://man7.org/linux/man-pages/man7/keyrings.7.html which are
reachable from userspace. This minimally increases the attack surface of
the Kata Kernel, but this attack surface is minimal (especially since
the kernel is most likely being executed by some kind of hypervisor) and
highly restricted compared to the utility of enabling this feature to
get further containerization compatibility.
Signed-off-by: Crypt0s <BryanHalf@gmail.com>
This will be helpful in order to increase the OS coverage (we'll be
using both Ubuntu 24.04 and CentOS 9 Stream), while also reducing the
amount spent on the tests (as one machine will only run attestation
related tests, and the other the tests that do *not* require
attestation).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We're doing this as, at Intel, we have two different kind of machines we
can plug into our CI. Without going much into details, only one of
those two kinds of machines will work for the attestation tests we
perform with ITA, thus in order to speed up the CI and improve test
coverage (OS wise), we're going to run different tests in different
machines.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As discussed in the CI working group,
we are temporarily skipping the SNP CI
to unblock the remaining workflow.
Will revert after fixing the SNP runner.
Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Use regorous engine's add_data method to add state to the policy.
This data can later be accessed inside rego context through the data namespace.
Support state modifications (json-patches) that may be returned as a result from policy evaluation.
Also initialize a policy engine data slice "pstate" dedicated for storing state.
Fixes#10087
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
The Clear Linux rootfs is not being tested anywhere, and it seems Intel
doesn't have the capacity to review the PRs related to this (combined
with the lack of interested from the rest of the community on reviewing
PRs that are specific to this untested rootfs).
With this in mind, I'm suggesting we drop Clear Linux support and focus
on what we can actually maintain.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit f15e16b692, as we
don't have to do this since we're relying on the
`static_sandbox_resource_mgmt` feature, which gives us the correct
amount of memory and CPUs to be allocated.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Trustee's deployment must set the correct https_proxy as env var on the
container that will talk to the ITA / ITTS server, otherwise the kbs
service won't be able to start, causing then issues in our CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: Krzysztof Sandowicz <krzysztof.sandowicz@intel.com>
Fedora F40 removed python3 from the base container, to avoid such issues
let's rely on the latest and greates official python container.
Fixes: #10497
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This PR adds the get artifacts which are needed when installing kata
tools in stability workflow to avoid failures saying that artifacts
are missing.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As discussed in the AC meeting, we don't have a maintainer,
(or users?) of runk, and the CI is unstable, so giving we can't
support it, we shouldn't waste CI cycles on it.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As discussed on the AC call, we are lacking maintainers for the
metrics tests. As a starting point for potentially phasing them
out, we discussed starting with removing the test for stratovirt
as a non-core hypervisor and a job that is problematic in leaving
behind resources that need cleaning up.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR adds the install kata tools step as part of the k8s stability workflow.
To avoid the failures saying that certain kata components are not installed it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently the error we are checking for is
`CreateContainerRequest timed out`, but this message
doesn't always seem to be printed to our pod log.
Try using a more general message that should be present
more reliably.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have an error with service name resolution with this test when using crio.
This error could not be reproduced outside of the CI for now.
Skipping it to keep the CI job running until we find a solution.
See: #10414
Signed-off-by: Julien Ropé <jrope@redhat.com>
By the moment we're testing it also with qemu-coco-dev, it becomes
easier for a developer without access to TEE to also test it locally.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The qemu-coco-dev runtime class should be as close as possible to what
the TEEs runtime classes are doing, and this was one of the options that
ended up overlooked till now.
Shout out to Dan Mihai for noticing that!
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Most likely this was overlooked during the development / review, but
we're actually interested on the size rather than on the pagesize of the
hugepages.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
So far we were not prepared to deal with release candidates as those:
* Do not have a sha256sum in the sha256sums provided by the kernel cdn
* Come from a different URL (directly from Linus)
* Have a different suffix (.tar.gz, instead of .tar.xz)
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This doesn't change much on how we're doing things Today, but it
simplifies a lot cases that may be added later on (and will be) like
building -rc kernels.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
By doing this we can ensure this can be re-used, if needed (and it'll be
needed), for also getting the URL.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
-f forces the (re)generaton of the config when doing the setup, which
helps a lot on local development whilst not causing any harm in the CI
builds.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's just do a simple `sed` and **not** use the repo that became
private.
This is not a backport of https://github.com/tianocore/edk2/pull/6402,
but it's a similar approach that allows us to proceed without the need
to pick up a newer version of edk2.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR fixes the source to avoid duplication specially in the common.sh
script and avoid failures saying that certain script is not in the directory.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This change is motivated by a problem in peerpod's podvms. In this setup
the lifecycle of guest components is managed by systemd. The current code
skips over init steps like setting the ocicrypt-rs env and initialization
of a CDH client in this case.
To address this the launch of the processes has been isolated into its
own fn.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
This reverts commit 973b8a1d8f.
As @danmihai1 points out https://github.com/bats-core/bats-core/issues/364
states that using traps in bats is error prone, so this could be the cause
of the confidential test instability we've been seeing, like it was
in the static checks, so let's try and revert this.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
At @danmihai1's suggestion add a die message in case
the call to setup_common fails, so we can see if in the test
output.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We've seen some issues with tests not being run in
some of the Coco CI jobs (Issue #10451) and in the
envrionments that are more stable we noticed that
they had a newer version of bats installed.
Try updating the version to 1.10+ and print out
the version for debug purposes
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We need to know if we're building a nvidia initrd or image
Additionally if we build a regular or confidential VARIANT
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Fixes: #9478
We want to keep track of the driver versions build during initrd/image build so update the artifact_name after the fact.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This commit introduces changes to enable testing kata-agent's container
APIs of CreateContainer/StartContainer/RemoveContainer. The changeset
include:
- using confidential-containers image-rs crate to pull/unpack/mount a
container image. Currently supports only un-authenicated registry pull
- re-factor api handlers to reduce cmdline complexity and handle
request generation logic in tool
- introduce an OCI config template for container creation
- add test case
Fixes#9707
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR updates the virtualization document by removing a url link
which is not longer valid.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The current version of the oci-spec crate compiles RLimit structs only
for Linux and Solaris. Until this is fixed upstream, add compilation
conditions to the type converters for the affected structs.
Fixes: #10071
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The parse_json_string function is specific to parsing capability strings
out of ttRPC proto definitions and does not benefit from being available
to other crates. Moving it into the protocols crate allows removing
kata-sys-util as a dependency, which in turn enables compiling the
library on darwin.
Fixes: #10071
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The cgroups.rs source file was removed in
234d7bca04. With cgroups support handled
in runtime-rs, the cgroups dependency on kata-sys-util can be removed.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
This commit introduces changes to use ubuntu for statically
building kata tools. In the existing CI setup, the tools
currently build only for x86_64 architecture.
It also fixes the build error seen for agent-ctl PR#10395.
Fixes#10441
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR adds missing steps in the gha run script for the kata stability
workflow.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the kernel config repo variable at the build kernel
script as it is not used.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The cloud-api-adaptor needs to support different types of pod VM
instance.
We needs to pass some annotations like machine_type, default_vcpus and
default_memory to prepare the VMs.
Signed-off-by: Chasing1020 <643601464@qq.com>
This patch adds the support of the remote hypervisor type for runtime-rs.
The cloud-api-adaptor needs the annotations and network namespace path
to create the VMs.
The remote hypervisor opens a UNIX domain socket specified in the config
file, and sends ttrpc requests to a external process to control sandbox
VMs.
Fixes: #10350
Signed-off-by: Chasing1020 <643601464@qq.com>
Add GPU annotations for remote hypervisor to help
with the right instance selection based on number of GPUs
and model
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Last November, SUSE discontinued support for s390x, leaving k3s
on this platform stuck at k8s version 1.28, while upstream k8s
has since reached 1.31. Fortunately, kubeadm allows us to create
a 1.30 Kubernetes cluster on s390x.
This commit switches the KUBERNETES option from k3s to kubeadm
for s390x and removes a dedicated cluster creation step.
Now, cluster setup and teardown occur in ACTIONS_RUNNER_HOOK_JOB_{STARTED,COMPLETED}.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
When creating a k8s cluster via kubeadm, the devmapper setup
for containerd requires a different configuration.
This commit introduces a new `kubeadm` option for the KUBERNETES
variable and adjusts the path to the containerd config file for
devmapper setup.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
All the oras push logic happens from inside `${workdir}`, while the
root_hash.txt extraction and renaming was not taking this into
consideration.
This was not caught during the manually triggered runs as those do not
perform the oras push.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As we're now building everything needed to test TDX with measured rootfs
support, let's bring this test back in (for TDX only, at least for now).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The approach taken for now is to export MEASURED_ROOTFS=yes on the
workflow files for the architectures using confidential stuff, and leave
the "normal" build without having it set (to avoid any change of
expectation on the current bevahiour).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's make sure we take the root_hashes into consideration to decide
whether the shim-v2 should or should not be used from the cached
artefacts.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's cache the root_hash.txt from the confidential image so we can use
them later on to decide whether there was a rootfs change that would
require shim-v2 to be rebuilt.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's ensure we remove the component and any extra tarball provided by
ORAS in case the cached component cannot be used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
We may decide to add this later on, but for now this is only targetting
TEEs and the confidential image / initrd.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's ensure that we get the already built rootfs tarball from previous
steps of the action at the time we're building the shim-v2.
The reason we do that is because the rootfs binary tarballs has a
root_hash.txt file that contains the information needed the shim-v2
build scripts to add the measured rootfs arguments to the shim-v2
configuration files.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As the rootfs will have what we need to add as part of the shim-v2
configuration files for measured rootfs, we **must** ensure this is
built **before** shim-v2.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
By doing this we can just re-use the dependencies already built, saving
us a reasonable amount of time.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This is a helper script that does basically what's already being done by
the s390x CI, which is:
* Move a folder with the components that we were stored / downloaded
during the GHA execution to the expected `build` location
* Get rid of the dependencies for a specific asset, as the dependencies
are already pulled in from previous GHA steps
For now this script is only being added but not yet executed anywhere,
and that will come as the next step in this series.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
So far we haven't been storing the rootfs dependencies as part of our
workflows, but we better do it to re-use them as part of the rootfs
build.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Return of proper error to the initiator is not guaranteed.
Method StopVM could kill shim process together with VM pieces.
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
The only reason we had this one passing for amd64 is because the check
was done using the wrong variable (`matrix.stage`, while in the other
workflows the variable used is `inputs.stage`).
The commit that broke the release process is 67a8665f51, which
blindly copy & pasted the logic from the matrix assets.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Corrects device filemode permissions typo/regression in rustjail to `666` instead of `066`.
`666` is the standard and expected value for these devices in containers.
Fixes: #10454
Signed-off-by: Simon Kaegi <simon.kaegi@gmail.com>
This PR increase the time to run the stressng k8s tests for the
CoCo stability CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
More than one developer can and should be able to run this workflow at
the same time, without cancelling the job started by another developer.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Let's use "dev" instead of "manually-triggered" as it avoids the name
being too long, which results in failures to create AKS clusters.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR adds the trap statement into the kata doc
script to clean up properly the temporary files.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As a new workflow was added for the cases where developers want to test
their changes in the workflow itself, let's make sure we stop allowing
manual triggers on this workflow, which can lead to a polluted /
misleading weather of the CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This workflow is intended to replace the `workflow_dispatch` trigger
currently present as part of the `ci-nightly.yaml`.
The reasoning behind having this done in this way is because of our good
and old GHA behaviour for `pull_request_target`, which requires a PR to
be merged in order to check the changes in the workflow itself, which
leads to:
* when a change in a workflow is done, developers (should) do:
* push their branch to the kata-containers repo
* manually trigger the "nightly" CI in order to ensure the changes
don't break anything
* this can result in the "nightly" CI weather being polluted
* we don't have the guarantee / assurance about the last n nightly
runs anymore
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Change the PACKAGES variable for the cbl-mariner rootfs-builder
to use the kata-packages-uvm meta package from
packages.microsoft.com to define the set of packages to be
contained in the UVM.
This aligns the UVM build for the Azure Linux distribution
with the UVM build done for the Kata Containers offering on
Azure Kubernetes Services (AKS).
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
By doing this we can ensure that whenever the rootfs changes, we'll be
able to get the new root_hash.txt and use it.
This is the very first step to bring the measured rootfs tests back.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
When QEMU is terminated by signal 15, it deletes the PidFile.
Upon detecting that QEMU has exited, the shim executes the stopVM function.
If the PidFile is not found, the PID is set to 0.
Subsequently, the shim executes `kill -9 0`, which terminates the current process group.
This prevents any further logic from being executed, resulting in resources not being cleaned up.
Signed-off-by: wangyaqi54 <wangyaqi54@jd.com>
Semantics are lifted straight out of the go runtime for compatibility.
We introduce DeviceVirtioScsi to represent a virtio-scsi device and
instantiate it if block device driver in the configuration file is set
to virtio-scsi. We also introduce ObjectIoThread which is instantiated
if the configuration file additionally enables iothreads.
Signed-off-by: Pavel Mores <pmores@redhat.com>
When do update_container_namespaces updating namespaces, setting
all UTS(and IPC) namespace paths to None resulted in hostnames
set prior to the update becoming ineffective. This was primarily
due to an error made while aligning with the oci spec: in an attempt
to match empty strings with None values in oci-spec-rs, all paths
were incorrectly set to None.
Fixes#10325
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Moved the parsing of the oci image marker into its own step, since we
only need to perform that for attestation purposes and some cached
images might not have that file in the tarball.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
This adds provenance attestation logic for agent binaries that are
published to an oci registry via ORAS.
As a downstream consumer of the kata-agent binary the Peerpod project
needs to verify that the artifact has been built on kata's CI.
To create an attestation we need to know the exact digest of the oci
artifact, at the point when the artifact was pushed.
Therefore we record the full oci image as returned by oras push.
The pushing and tagging logic has been slightly reworked to make this
task less repetetive.
The oras cli accepts multiple tags separated by comma on pushes, so a
push can be performed atomically instead of iterating through tags and
pushing each individually. This removes the risk of partially successful
push operations (think: rate limits on the oci registry).
So far the provenance creation has been only enabled for agent builds on
amd64 and xs390x.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
This PR updates the CI documentation referring to the several tests and
in which kind of instances is running them.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use rstest for unit test rather than TestData arrays where
possible to make the code more compact, easier to read
and open the possibility to enhance test cases with a
description more easily.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR replaces the arch uname -m to use the arch_to_golang
variable in the script to have a better uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
inotify-configmap-pod.yaml is using: "inotifywait --timeout 120",
so wait for up to 180 seconds for the pod termination to be
reported.
Hopefully, some of the sporadic errors from #10413 will be avoided
this way:
not ok 1 configmap update works, and preserves symlinks
waitForProcess "${wait_time}" "$sleep_time" "${command}" failed
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
While Kubernetes defines `binaryData` as `[]byte`,
when defined in a YAML file the raw bytes are
base64 encoded. Therefore, we need to read the YAML
value as `String` and not as `Vec<u8>`.
Fixes: #10410
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Adds dependencies of 'clang' & 'protobuf' to be installed in runners
when building agent-ctl sources having image pull support.
Fixes#10400
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR removes egrep command as it has been deprecated and it replaces by
grep in the test images script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use the configuration used by AKS (static_sandbox_resource_mgmt=true)
for CI testing on Mariner hosts.
Hopefully pod startup will become more predictable on these hosts -
e.g., by avoiding the occasional hotplug timeouts described by #10413.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
When dealing with a specific release, it was easier to just do some
adjustments on the image that has to be used for ITA without actually
adding a new entry in the versions.yaml.
However, it's been proven to be more complicated than that when it comes
to dealing with staged images, and we better explicitly add (and
update) those versions altogether to avoid CI issues.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The root cause is that the CDH client is a global variable, and unit tests `test_unseal_env` and `test_unseal_file`
share this lock-free global variable, leading to resource contention and destruction.
Merging the two unit tests into one test_sealed_secret will resolve this issue.
Fixes: #10403
Signed-off-by: ChengyuZhu6 <zhucy0405@gmail.com>
As mariner has switched to using an image instead of an initrd, let's
just drop the abiliy to build the initrd and avoid keeping something in
the tree that won't be used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As the mariner image is already in place, and the tests were modified to
use them (as part of this series), let's just stop building it as part
of the CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The reason we're doing this is because mariner image uses, by default,
cgroups default-hierarchy as `unified` (aka, cgroupsv2).
In order to keep the same initrd behaviour for mariner, let's enforce
that `SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1
systemd.legacy_systemd_cgroup_controller=yes
systemd.unified_cgroup_hierarchy=0` is passed to the kernel cmdline, at
least for now.
Other tests that are setting `kernel_params` are not running on mariner,
then we're safe taking this path as it's done as part of this PR.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As an image has been added for mariner as part of the commit 63c1f81c2,
let's start using it in the CI, instead of using the initrd.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit 31e09058af, as it's
breaking the agent unit tests CI.
This is a stop gap till Chengyu Zhu finds the time to properly address
the issue, avoiding the CI to be blocked for now.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In https://github.com/confidential-containers/trustee/pull/521
the overlays logic was modified to add non-SE
s390x support and simplify non-ibm-se platforms.
We need to update the logic in `kbs_k8s_deploy`
to match and can remove the dummying of `IBM_SE_CREDS_DIR`
for non-SE now
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR adds the trap statement in the test images script to clean up
tmp files.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Kata CI will start testing the new rootfs-image-mariner instead of the
older rootfs-initrd-mariner image.
The "official" AKS images are moving from a rootfs-initrd-mariner
format to the rootfs-image-mariner format. Making the same change in
Kata CI is useful to keep this testing in sync with the AKS settings.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Refactor CDH-related operations into the cdh_handler function to make the `create_container` code clearer.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Users must set the mount path to `/sealed/<path>` for kata agent to detect the sealed secret mount
and handle it in createcontainer stage.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Introduced `unseal_file` function to unseal secret as files:
- Implemented logic to handle symlinks and regular files within the sealed secret directory.
- For each entry, call CDH to unseal secrets and the unsealed contents are written to a new file, and a symlink is created to replace the sealed symlink.
Fixes: #8123
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
By checking for AGENT_POLICY we ensure we only try to read
allow-all.rego if AGENT_POLICY is set to "yes"
Signed-off-by: Emanuel Lima <emlima@redhat.com>
The behavior of Kata CI doesn't change.
For local testing using kubernetes/gha-run.sh and AUTO_GENERATE_POLICY=yes:
1. Before these changes users were forced to use:
- SEV, SNP, or TDX guests, or
- KATA_HOST_OS=cbl-mariner
2. After these changes users can also use other platforms that are
configured with "shared_fs = virtio-fs" - e.g.,
- KATA_HOST_OS=ubuntu + KATA_HYPERVISOR=qemu
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes duplicated arch variable definition in the rootfs script
as this variable and its value is already defined at the top of the
script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
add_device() now checks if QEMU is running already by checking if we have
a QMP connection. If we do a new function hotplug_device() is called
which hotplugs the device if it's a network one.
Signed-off-by: Pavel Mores <pmores@redhat.com>
With the helpers from previous commit, the actual hotplugging
implementation, though lengthy, is mostly just assembling a QMP command
to hotplug the network device backend and then doing the same for the
corresponding frontend.
Note that hotplug_network_device() takes cmdline_generator types Netdev
and DeviceVirtioNet. This is intentional and aims to take advantage of
the similarity between parameter sets needed to coldplug and hotplug
devices reuse and simplify our code. To enable using the types from qmp,
accessors were added as needed.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Before adding network device hotplugging functionality itself we add
a couple of helpers in a separate commit since their functionality is
non-trivial.
To hotplug a device we need a free PCI slot. We add find_free_slot()
which can be called to obtain one. It looks for PCI bridges connected
to the root bridge and looks for an unoccupied slot on each of them. The
first found is returned to the caller. The algorithm explicitly doesn't
support any more complex bridge hierarchies since those are never produced
when coldplugging PCI bridges.
Sending netdev queue and vhost file descriptors to QEMU is slightly
involved and implemented in pass_fd(). The actual socket has to be passed
in an SCM_RIGHTS socket control message (also called ancillary data, see
man 3 cmsg) so we have to use the msghdr structure and sendmsg() call
(see man 2 sendmsg) to send the message. Since qapi-rs doesn't support
sending messages with ancillary data we have to do the sending sort of
"under it", manually, by retrieving qapi-rs's socket and using it directly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
NetworkConfig::index has been used to generate an id for a network device
backend. However, it turns out that it's not unique (it's always zero
as confirmed by a comment at its definition) so it's not suitable to
generate an id that needs to be unique.
Use the host device name instead.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Network device hotplugging will use the same infrastructure (Netdev,
DeviceVirtioNet) as coldplugging, i.e. QemuCmdLine. To make the code
of network device setup visible outside of QemuCmdLine we factor it out
to a non-member function `get_network_device()` and make QemuCmdLine just
delegate to it.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The function takes a whole QemuCmdLine but only actually uses
HypervisorConfig. We increase callability of the function by limiting
its interface to what it needs. This will come handy shortly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
At least one PCI bridge is necessary to hotplug PCI devices. We only
support PCI (at this point at least) since that's what the go runtime
does (note that looking at the code in virtcontainers it might seem that
other bus types are supported, however when the bridge objects are passed
to govmm, all but PCI bridges are actually ignored). The entire logic of
bridge setup is lifted from runtime-go for compatibility's sake.
Signed-off-by: Pavel Mores <pmores@redhat.com>
with multiple iterations/reruns we need to use the latest run of each
workflow. For that we can use the "run_id" and only update results of
the same or newer run_ids.
To do that we need to store the "run_id". To avoid adding individual
attributes this commit stores the full job object that contains the
status, conclussion as well as other attributes of the individual jobs,
which might come handy in the future in exchange for slightly bigger
memory overhead (still we only store the latest run of required jobs
only).
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
some tests require certain labels before they are executed. When our PR
is not labeled appropriately the gatekeeper detects skipped required
tests and reports a failure. With this change we add "required-labeles"
to the tests mapping and check the expected labels first informing the
user about the missing labeles before even checking the test statuses.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
the test names are using `;` and regexps were designed to use `,` but
during development simply joined the expressions by `|`. This should
work but might be confusing so let's go with the semi-colon separator
everywhere.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The Github SHA of triggering PR should be exported in the environment
so that gatekeeper can fetch the right workflows/jobs.
Note: by default github will export GITHUB_SHA in the job's environment
but that value cannot be used if the gatekeeper was triggered from a
pull_request_target event, because the SHA correspond to the push
branch.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This will allow to cancel-in-progress the gatekeeper jobs.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
to allow selective testing as well as selective list of required tests
let's add a mapping of required jobs/tests in "skips.py" and a
"gatekeaper" workflow that will ensure the expected required jobs were
successful. Then we can only mark the "gatekeaper" as the required job
and modify the logic to suit our needs.
Fixes: #9237
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The behavior of Kata CI doesn't change.
For local testing using kubernetes/gha-run.sh:
1. Before these changes:
- AUTO_GENERATE_POLICY=yes was always used by the users of SEV, SNP,
TDX, or KATA_HOST_OS=cbl-mariner.
2. After these changes:
- Users of SEV, SNP, TDX, or KATA_HOST_OS=cbl-mariner must specify
AUTO_GENERATE_POLICY=yes if they want to auto-generate policy.
- These users have the option to test just using hard-coded policies
(e.g., using the default policy built into the Guest rootfs) by
using AUTO_GENERATE_POLICY=no. AUTO_GENERATE_POLICY=no is the default
value of this env variable.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the trap statement in the confidential kbs script
to clean up temporary files and ensure we are leaving them.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The tests is disabled for qemu-coco-dev / qemu-tdx, but it doesn't seen
to actually be failing on those. Plus, it's passing on SEV / SNP, which
means that we most likely missed re-enabling this one in the past.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The quay.io registry returns the tags sorted alphabetically and doesn't
seem to provide a way to sort it by age. Let's use "git log" to get all
changes between the commits and print all tags that were actually
pushed.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Currently, `qemu-runtime-rs` does not support `virtio-scsi`,
which causes the `k8s-block-volume.bats` test to fail.
We should skip this test until `virtio-scsi` is supported by the runtime.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The nginx container seems to error out when using UID=123.
Depending on the timing between container initialization and "kubectl
wait", the test might have gotten lucky and found the pod briefly in
Ready state before nginx errored out. But on some of the nodes, the pod
never got reported as Ready.
Also, don't block in "kubectl wait --for=condition=Ready" when wrapping
that command in a waitForProcess call, because waitForProcess is
designed for short-lived commands.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the fast footprint script to remove the use
of egrep as this command has been deprecated and change it
to use grep command.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This imports the k8s-block-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.
This is a follow-up to #7132.
Fixes: #7164
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The ita kustomization for Trustee, as well as previously used one
(DCAP), doesn't have a $(uname -m) directory after the deployment
directory name.
Let's follow the same logic used for the deploy-kbs script and clean
those up accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Intel Tiber Trust Services (formerly known as Intel Trust Authority) is
Intel's own attestation service, and we want to take advantage of the
TDX CI in order to ensure ITTS works as expected.
In order to do so, let's replace the former method used (DCAP) to use
ITTS instead.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Inorder to support sandbox api, intorduce the sandbox_config
struct and split the sandbox start stage from init process.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
There are many similar or duplicated code patterns in `teardown()`.
This commit consolidates them into a new function, `teardown_common()`,
which is now called within `teardown()`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current `exec_host()` accepts a given node name and
creates a node debugger pod, even if the name is invalid.
This could result in the creation of an unnecessary pending
pod (since we are using nodeAffinity; if the given name
does not match any actual node names, the pod won’t be scheduled),
which wastes resources.
This commit introduces validation for the node name to
prevent this situation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit enables basic amd64 tests of docker for runtime-rs by adding
vmm types "dragonball" and "cloud-hypervisor".
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
Docker cannot exit normally after the container process exits when
used with runtime-rs since it doesn't receive the exit event. This
commit enable runtime-rs to send TaskExit to containerd after process
exits.
Also, it moves "system_time_into" and "option_system_time_into" from
crates/runtimes/common/src/types/trans_into_shim.rs to a new utility
mod.
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
When the sandbox api was enabled, the pasue container
wouldn't be created, thus the shared sandbox pidns
should be fallbacked to the first container's init process,
instead of return any error here.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
It was observed that the custom node debugger pod is not
cleaned up when a test times out.
This commit ensures the pod is cleaned up by triggering
the cleanup on EXIT, preventing any debugger pods from
being left behind.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
When using network adapters that support SR-IOV, a VFIO device can be
plugged into a guest VM and claimed as a network interface. This can
significantly enhance network performance.
Fixes: #9758
Signed-off-by: Lei Huang <leih@nvidia.com>
With #10232 merged, we now have a persistent node debugger pod throughout the test.
As a result, there’s no need to spawn another debugger pod using `kubectl debug`,
which could lead to false negatives due to premature pod termination, as reported
in #10081.
This commit removes the `print_node_journal()` call that uses `kubectl debug` and
instead uses `exec_host()` to capture the host journal. The `exec_host()` function
is relocated to `tests/integration/kubernetes/lib.sh` to prevent cyclical dependencies
between `tests_common.sh` and `lib.sh`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
in b9d88f74ed the `runtime_class` CM was
added which overrides the one we previously set. Let's reorder our logic
to first deploy webhook and then override the default CM in order to use
the one we really want.
Since we need to change dirs we also have to use realpath to ensure the
files are located well.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
`assert_pod_fail()` currently calls `k8s_create_pod()` to ensure that a pod
does not become ready within the default 120s. However, this delays the test's
completion even if an error message is detected earlier in the journal.
This commit removes the use of `k8s_create_pod()` and modifies `assert_pod_fail()`
to fail as soon as the pod enters a failed state.
All failing pods end up in one of the following states:
- CrashLoopBackOff
- ImagePullBackOff
The function now polls the pod's state every 5 seconds to check for these conditions.
If the pod enters a failed state, the function immediately returns 0. If the pod
does not reach a failed state within 120 seconds, it returns 1.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR removes some qemu information which is not longer valid as
this is referring to the tests repository and to kata 1.x.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
rename the task_service to service, in order to
incopperate with the following added sandbox
services.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In order to make different from sandbox request/response, this commit
changed the task request/response to TaskRequest/TaskResponse.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the wait_vm would be called before calling stop_vm,
which would take the reader lock, thus blocking the stop_vm
getting the writer lock, which would trigge the dead lock.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the block_on would block on the current thread
which would prevent other async tasks to be run on this
worker thread, thus change it to use the async task for
this task.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
- Reflect the need to update the versions in the Helm Chart
- Add the lock branch instruction
- Add clarity about the permissions needed to complete tasks
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
* Clarifies instructions for k0s.
* Adds kata-deploy step for each cluster type.
* Removes the old kata-deploy-stable step for vanilla k8s.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR introduces support for selectively compiling Dragonball in
runtime-rs. By default, Dragonball will continue to be compiled into
the containerd-shim-kata-v2 executable, but users now have the option
to disable Dragonball compilation.
Fixes#10310
Signed-off-by: sidney chang <2190206983@qq.com>
Now that the issue with handling loop devices has been resolved,
this commit re-enables the guest-pull-image tests for `qemu-coco-dev`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Timeouts occur (e.g. `create_container_timeout` and `wait_time`)
when using qemu-coco-dev.
This commit increases these timeouts for the trusted image storage
test cases
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the host running the tests is different from the host where the cluster is running,
the *_loop_device() functions do not work as expected because the device is created
on the test host, while the cluster expects the device to be local.
This commit ensures that all commands for the relevant functions are executed via exec_host()
so that a device should be handled on a cluster node.
Additionally, it modifies exec_host() to return the exit code of the last executed command
because the existing logic with `kubectl debug` sometimes includes unexpected characters
that are difficult to handle. `kubectl exec` appears to properly return the exit code for
a given command to it.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Creating and deleting a node debugger pod for every `exec_host()`
call is inefficient.
This commit changes the test suite to create and delete the pod
only once, globally.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit addresses an issue with handling loop devices
via a node debugger due to restricted privileges.
It runs a pod with full privileges, allowing it to mount
the host root to `/host`, similar to the node debugger.
This change enables us to run tests for trusted image storage
using the `qemu-coco-dev` runtime class.
Fixes: #10133
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add cdi devices including ContainerDevice definition and
annotation_container_device method to annotate vfio device
in OCI Spec annotations which is inserted into Guest with
its mapping of vendor-class and guest pci path.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We need vfio device's properties device, vendor and
class, but we can only get property device and vendor.
just extend it with class is ok.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The kata webhook requires a configmap to define what runtime class it
should set for the newly created pods. Additionally, the configmap
allows others to modify the default runtime class name we wish to set
(in case the handler is kata but the name of the runtimeclass is
different).
Finally, this PR changes the webhook-check to compare the runtime of the
newly created pod against the specific runtime class in the configmap,
if said confimap doesn't exist, then it will default to "kata".
Signed-off-by: Martin <mheberling@microsoft.com>
This PR increases the timeout to run k8s tests for Kata CoCo TDX
to avoid the random failures of timeout.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Yq installation shouldn't force to use sudo in case yq is already installed in correct version.
Signed-off-by: Pawel Proskurnicki <pawel.proskurnicki@intel.com>
It's a prerequisite PR to make built-in vmm dragonball compilation
options configurable.
Extract TAP device-related code from dragonball's dbs_utils into a
separate library within the runtime-rs hypervisor module.
To enhance functionality and reduce dependencies, the extracted code
has been reimplemented using the libc crate and the ifreq structure.
Fixes#10182
Signed-off-by: sidney chang <2190206983@qq.com>
2024-09-13 07:32:14 -07:00
2837 changed files with 403724 additions and 77274 deletions
# Consider using larger runners or machines with greater resources for possible analysis time improvements.
runs-on:ubuntu-24.04
permissions:
# required for all workflows
security-events:write
# required to fetch internal or private CodeQL packs
packages:read
# only required for workflows in private repositories
actions:read
contents:read
strategy:
fail-fast:false
matrix:
include:
- language:go
build-mode:manual
- language:python
build-mode:none
# CodeQL supports the following values keywords for 'language': 'c-cpp', 'csharp', 'go', 'java-kotlin', 'javascript-typescript', 'python', 'ruby', 'swift'
# Use `c-cpp` to analyze code written in C, C++ or both
# Use 'java-kotlin' to analyze code written in Java, Kotlin or both
# Use 'javascript-typescript' to analyze code written in JavaScript, TypeScript or both
# To learn more about changing the languages that are analyzed or customizing the build mode for your analysis,
# see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning.
# If you are analyzing a compiled language, you can modify the 'build-mode' for that language to customize how
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
# Add any setup steps before running the `github/codeql-action/init` action.
# This includes steps like installing compilers or runtimes (`actions/setup-node`
# or others). This is typically only required for manual builds.
# - name: Setup runtime (example)
# uses: actions/setup-example@v1
# Initializes the CodeQL tools for scanning.
- name:Initialize CodeQL
uses:github/codeql-action/init@v3
with:
languages:${{ matrix.language }}
build-mode:${{ matrix.build-mode }}
# If you wish to specify custom queries, you can do so here or in a config file.
# By default, queries listed here will override any specified in a config file.
# Prefix the list here with "+" to use these queries and those in the config file.
# For more details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
# queries: security-extended,security-and-quality
# If the analyze step fails for one of the languages you are analyzing with
# "We were unable to automatically build your code", modify the matrix above
# to set the build mode to "manual" for that language. Then modify this step
# to build your code.
# ℹ️ Command-line programs to run using the OS shell.
# 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
pushd"${katacontainers_repo_dir}/tools/packaging/kata-deploy"||{echo"Failed to push to ${katacontainers_repo_dir}/tools/packaging/kata-deploy";exit 125;}
oc delete -f kata-deploy/base/kata-deploy.yaml
oc -n kube-system wait --timeout=10m --for=delete -l name=kata-deploy pod
oc apply -f kata-cleanup/base/kata-cleanup.yaml
echo"Wait for all related pods to be gone"
(repeats=1;fori in $(seq 1 600);do
(repeats=1;for_ in $(seq 1 600);do
oc get pods -l name="kubelet-kata-cleanup" --no-headers=true -n kube-system 2>&1| grep "No resources found" -q &&((repeats++))||repeats=1
for NODE_NAME in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}');do[["${NODE_NAME}"=~ 'worker']]&& kubectl label node "${NODE_NAME}" node.kubernetes.io/worker=;done
echo"This script created additional resources to create peering between ${AZURE_REGION} and ${PP_REGION}. Ensure you release those resources after the testing (or use temporary subscription)"
E2E_TEST="${E2E_TEST:-'"[sig-node] Container Runtime blackbox test on terminated container should report termination message as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"'}"
@@ -499,19 +499,6 @@ If you do not want to install the respective QEMU version, the configuration fil
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
> **Note:**
>
> - You should only do this step if you are on aarch64/arm64.
> - You should include [Eric Auger's latest PCDIMM/NVDIMM patches](https://patchwork.kernel.org/cover/10647305/) which are
> under upstream review for supporting NVDIMM on aarch64.
>
You could build the custom `qemu-system-aarch64` as required with the following command:
| [image](background.md#root-filesystem-image) | [Ubuntu](https://ubuntu.com) (for x86_64 systems)| systemd | Fully tested in our CI | systemd offers flexibility |
| [initrd](#initrd-image) | [Alpine Linux](https://alpinelinux.org) | Kata [agent](README.md#agent) (as no systemd support) | Security hardened and tiny C library |
@@ -50,7 +50,7 @@ We provide `Dragonball` Sandbox to enable built-in VMM by integrating VMM's func
#### How To Support Async
The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thread, which is 2 threads by default. For TTRPC and container-related threads run in the `tokio` thread in a unified manner, and related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support no-block I/O and timer. Currently, we only utilize Async for kata-runtime. The built-in VMM keeps the OS thread because it can ensure that the threads are controllable.
**For N tokio worker threads and M containers**
**For N `tokio` worker threads and M containers**
- Sync runtime(both OS thread and `tokio` task are OS thread but without `tokio` worker thread) OS thread number: 4 + 12*M
- Async runtime(only OS thread is OS thread) OS thread number: 2 + N
@@ -103,7 +103,6 @@ In our case, there will be a variety of resources, and every resource has severa
| `Cgroup V2` | | Stage 2 | 🚧 |
| Hypervisor | `Dragonball` | Stage 1 | 🚧 |
| | QEMU | Stage 2 | 🚫 |
| | ACRN | Stage 3 | 🚫 |
| | Cloud Hypervisor | Stage 3 | 🚫 |
| | Firecracker | Stage 3 | 🚫 |
@@ -166,4 +165,4 @@ In our case, there will be a variety of resources, and every resource has severa
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.
It has the security boundary of virtualization. More details will be provided in next stage.
The following sequence diagram depicted below offers a detailed overview of the messages/calls exchanged to pull an unencrypted unsigned image from an unauthenticated container registry. This involves the kata-runtime, kata-agent, and the guest-components’ image-rs to use the guest pull mechanism.
First and foremost, the guest pull code path is only activated when `nydus snapshotter` requires the handling of a volume which type is `image_guest_pull`, as can be seen on the message below:
```json
@@ -108,10 +153,10 @@ Below is an example of storage information packaged in the message sent to the k
```
Next, the kata-agent's RPC module will handle the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of `StorageHandler` interface for various storage types, being the `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the "driver").
`ImagePullHandler` delegates the image pulling operation to the `ImageService.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, class the image-rs to in fact fetch and uncompress the image's bundle.
`ImagePullHandler` delegates the image pulling operation to the `confidential_data_hub.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, the `ImagePullService` of Confidential Data Hub to fetch, uncompress and mount the image's rootfs.
> **Notes:**
> In this flow, `ImageService.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `image-rs.pull_image()` because the pause image is expected to already be inside the guest's filesystem, so instead `ImageService.unpack_pause_image()` is called.
> In this flow, `confidential_data_hub.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `pull_image()` RPC of Confidential Data Hub because the pause image is expected to already be inside the guest's filesystem, so instead `confidential_data_hub.unpack_pause_image()` is called.
@@ -10,7 +10,19 @@ To run Kata Containers in SNP-VMs, the following software stack is used.

The host BIOS and kernel must be capable of supporting AMD SEV-SNP and configured accordingly. For Kata Containers, the host kernel with branch [`sev-snp-iommu-avic_5.19-rc6_v3`](https://github.com/AMDESE/linux/tree/sev-snp-iommu-avic_5.19-rc6_v3) and commit [`3a88547`](https://github.com/AMDESE/linux/commit/3a885471cf89156ea555341f3b737ad2a8d9d3d0) is known to work in conjunction with SEV Firmware version 1.51.3 (0xh\_1.33.03) available on AMD's [SEV developer website](https://developer.amd.com/sev/). See [AMD's guide](https://github.com/AMDESE/AMDSEV/tree/sev-snp-devel) to configure the host accordingly. Verify that you are able to run SEV-SNP encrypted VMs first. The guest components required for Kata Containers are built as described below.
The host BIOS and kernel must be capable of supporting AMD SEV-SNP and the host must be configured accordingly.
The latest SEV Firmware version is available on AMD's [SEV Developer Webpage](https://www.amd.com/en/developer/sev.html). It can also be updated via a platform OEM BIOS update.
The host kernel must be equal to or later than upstream version [6.11](https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.11.tar.xz).
[`sev-utils`](https://github.com/amd/sev-utils/blob/coco-202501150000/docs/snp.md) is an easy way to install the required host kernel with the `setup-host` command. However, it will also build compatible guest kernel, OVMF, and QEMU components which are not necessary as these components are packaged with kata. The `sev-utils` script utility can be used with these additional components to test the memory encrypted launch and attestation of a base QEMU SNP guest.
For a simplified way to build just the upstream compatible host kernel, use the Confidential Containers fork of [AMDESE AMDSEV](https://github.com/confidential-containers/amdese-amdsev/tree/amd-snp-202501150000). Individual components can be built by running the following command:
```
./build.sh kernel host --install
```
**Tip**: It is easiest to first have Kata Containers running on your system and then modify it to run containers in SNP-VMs. Follow the [Developer guide](../Developer-Guide.md#warning) and then follow the below steps. Nonetheless, you can just follow this guide from the start.
@@ -29,16 +41,16 @@ __SNP-specific steps:__
- Build the SNP-specific kernel as shown below (see this [guide](../../tools/packaging/kernel/README.md#build-kata-containers-kernel) for more information)
```bash
$ pushd kata-containers/tools/packaging/
$ ./kernel/build-kernel.sh -a x86_64 -x snp setup
$ ./kernel/build-kernel.sh -a x86_64 -x snp build
$ sudo -E PATH="${PATH}" ./kernel/build-kernel.sh -x snp install
@@ -28,6 +28,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
| `io.katacontainers.config.runtime.enable_pprof` | `boolean` | enables Golang `pprof` for `containerd-shim-kata-v2` process |
| `io.katacontainers.config.runtime.create_container_timeout` | `uint64` | the timeout for create a container in `seconds`, default is `60` |
| `io.katacontainers.config.runtime.experimental_force_guest_pull` | `boolean` | forces the runtime to pull the image in the guest VM, default is `false`. This is an experimental feature and might be removed in the future. |
## Agent Options
| Key | Value Type | Comments |
@@ -46,7 +47,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | cache-related options will be set to block devices or not |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.cpu_features` | `string` | Comma-separated list of CPU features to pass to the CPU (QEMU) |
| `io.katacontainers.config.hypervisor.ctlpath` (R) | `string` | Path to the `acrnctl` binary for the ACRN hypervisor |
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | float32| the default vCPUs assigned for a VM by the hypervisor |
@@ -95,6 +95,8 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.virtio_fs_extra_args` | string | extra options passed to `virtiofs` daemon |
| `io.katacontainers.config.hypervisor.enable_guest_swap` | `boolean` | enable swap in the guest |
| `io.katacontainers.config.hypervisor.use_legacy_serial` | `boolean` | uses legacy serial device for guest's console (QEMU) |
| `io.katacontainers.config.hypervisor.default_gpus` | uint32 | the minimum number of GPUs required for the VM. Only used by remote hypervisor to help with instance selection |
| `io.katacontainers.config.hypervisor.default_gpu_model` | string | the GPU model required for the VM. Only used by remote hypervisor to help with instance selection |
| `file_mem_backend` | `valid_file_mem_backends` | Valid locations for the file-based memory backend root directory |
| `jailer_path` | `valid_jailer_paths`| Valid paths for the jailer constraining the container VM (Firecracker) |
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.