Semantics are lifted straight out of the go runtime for compatibility.
We introduce DeviceVirtioScsi to represent a virtio-scsi device and
instantiate it if block device driver in the configuration file is set
to virtio-scsi. We also introduce ObjectIoThread which is instantiated
if the configuration file additionally enables iothreads.
Signed-off-by: Pavel Mores <pmores@redhat.com>
When do update_container_namespaces updating namespaces, setting
all UTS(and IPC) namespace paths to None resulted in hostnames
set prior to the update becoming ineffective. This was primarily
due to an error made while aligning with the oci spec: in an attempt
to match empty strings with None values in oci-spec-rs, all paths
were incorrectly set to None.
Fixes#10325
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Moved the parsing of the oci image marker into its own step, since we
only need to perform that for attestation purposes and some cached
images might not have that file in the tarball.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
This adds provenance attestation logic for agent binaries that are
published to an oci registry via ORAS.
As a downstream consumer of the kata-agent binary the Peerpod project
needs to verify that the artifact has been built on kata's CI.
To create an attestation we need to know the exact digest of the oci
artifact, at the point when the artifact was pushed.
Therefore we record the full oci image as returned by oras push.
The pushing and tagging logic has been slightly reworked to make this
task less repetetive.
The oras cli accepts multiple tags separated by comma on pushes, so a
push can be performed atomically instead of iterating through tags and
pushing each individually. This removes the risk of partially successful
push operations (think: rate limits on the oci registry).
So far the provenance creation has been only enabled for agent builds on
amd64 and xs390x.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
This PR updates the CI documentation referring to the several tests and
in which kind of instances is running them.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use rstest for unit test rather than TestData arrays where
possible to make the code more compact, easier to read
and open the possibility to enhance test cases with a
description more easily.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR replaces the arch uname -m to use the arch_to_golang
variable in the script to have a better uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
inotify-configmap-pod.yaml is using: "inotifywait --timeout 120",
so wait for up to 180 seconds for the pod termination to be
reported.
Hopefully, some of the sporadic errors from #10413 will be avoided
this way:
not ok 1 configmap update works, and preserves symlinks
waitForProcess "${wait_time}" "$sleep_time" "${command}" failed
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
While Kubernetes defines `binaryData` as `[]byte`,
when defined in a YAML file the raw bytes are
base64 encoded. Therefore, we need to read the YAML
value as `String` and not as `Vec<u8>`.
Fixes: #10410
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Adds dependencies of 'clang' & 'protobuf' to be installed in runners
when building agent-ctl sources having image pull support.
Fixes#10400
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR removes egrep command as it has been deprecated and it replaces by
grep in the test images script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Use the configuration used by AKS (static_sandbox_resource_mgmt=true)
for CI testing on Mariner hosts.
Hopefully pod startup will become more predictable on these hosts -
e.g., by avoiding the occasional hotplug timeouts described by #10413.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
When dealing with a specific release, it was easier to just do some
adjustments on the image that has to be used for ITA without actually
adding a new entry in the versions.yaml.
However, it's been proven to be more complicated than that when it comes
to dealing with staged images, and we better explicitly add (and
update) those versions altogether to avoid CI issues.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The root cause is that the CDH client is a global variable, and unit tests `test_unseal_env` and `test_unseal_file`
share this lock-free global variable, leading to resource contention and destruction.
Merging the two unit tests into one test_sealed_secret will resolve this issue.
Fixes: #10403
Signed-off-by: ChengyuZhu6 <zhucy0405@gmail.com>
As mariner has switched to using an image instead of an initrd, let's
just drop the abiliy to build the initrd and avoid keeping something in
the tree that won't be used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As the mariner image is already in place, and the tests were modified to
use them (as part of this series), let's just stop building it as part
of the CI.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The reason we're doing this is because mariner image uses, by default,
cgroups default-hierarchy as `unified` (aka, cgroupsv2).
In order to keep the same initrd behaviour for mariner, let's enforce
that `SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1
systemd.legacy_systemd_cgroup_controller=yes
systemd.unified_cgroup_hierarchy=0` is passed to the kernel cmdline, at
least for now.
Other tests that are setting `kernel_params` are not running on mariner,
then we're safe taking this path as it's done as part of this PR.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As an image has been added for mariner as part of the commit 63c1f81c2,
let's start using it in the CI, instead of using the initrd.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit 31e09058af, as it's
breaking the agent unit tests CI.
This is a stop gap till Chengyu Zhu finds the time to properly address
the issue, avoiding the CI to be blocked for now.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In https://github.com/confidential-containers/trustee/pull/521
the overlays logic was modified to add non-SE
s390x support and simplify non-ibm-se platforms.
We need to update the logic in `kbs_k8s_deploy`
to match and can remove the dummying of `IBM_SE_CREDS_DIR`
for non-SE now
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR adds the trap statement in the test images script to clean up
tmp files.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Kata CI will start testing the new rootfs-image-mariner instead of the
older rootfs-initrd-mariner image.
The "official" AKS images are moving from a rootfs-initrd-mariner
format to the rootfs-image-mariner format. Making the same change in
Kata CI is useful to keep this testing in sync with the AKS settings.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Refactor CDH-related operations into the cdh_handler function to make the `create_container` code clearer.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Users must set the mount path to `/sealed/<path>` for kata agent to detect the sealed secret mount
and handle it in createcontainer stage.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Introduced `unseal_file` function to unseal secret as files:
- Implemented logic to handle symlinks and regular files within the sealed secret directory.
- For each entry, call CDH to unseal secrets and the unsealed contents are written to a new file, and a symlink is created to replace the sealed symlink.
Fixes: #8123
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
By checking for AGENT_POLICY we ensure we only try to read
allow-all.rego if AGENT_POLICY is set to "yes"
Signed-off-by: Emanuel Lima <emlima@redhat.com>
The behavior of Kata CI doesn't change.
For local testing using kubernetes/gha-run.sh and AUTO_GENERATE_POLICY=yes:
1. Before these changes users were forced to use:
- SEV, SNP, or TDX guests, or
- KATA_HOST_OS=cbl-mariner
2. After these changes users can also use other platforms that are
configured with "shared_fs = virtio-fs" - e.g.,
- KATA_HOST_OS=ubuntu + KATA_HYPERVISOR=qemu
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes duplicated arch variable definition in the rootfs script
as this variable and its value is already defined at the top of the
script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
add_device() now checks if QEMU is running already by checking if we have
a QMP connection. If we do a new function hotplug_device() is called
which hotplugs the device if it's a network one.
Signed-off-by: Pavel Mores <pmores@redhat.com>
With the helpers from previous commit, the actual hotplugging
implementation, though lengthy, is mostly just assembling a QMP command
to hotplug the network device backend and then doing the same for the
corresponding frontend.
Note that hotplug_network_device() takes cmdline_generator types Netdev
and DeviceVirtioNet. This is intentional and aims to take advantage of
the similarity between parameter sets needed to coldplug and hotplug
devices reuse and simplify our code. To enable using the types from qmp,
accessors were added as needed.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Before adding network device hotplugging functionality itself we add
a couple of helpers in a separate commit since their functionality is
non-trivial.
To hotplug a device we need a free PCI slot. We add find_free_slot()
which can be called to obtain one. It looks for PCI bridges connected
to the root bridge and looks for an unoccupied slot on each of them. The
first found is returned to the caller. The algorithm explicitly doesn't
support any more complex bridge hierarchies since those are never produced
when coldplugging PCI bridges.
Sending netdev queue and vhost file descriptors to QEMU is slightly
involved and implemented in pass_fd(). The actual socket has to be passed
in an SCM_RIGHTS socket control message (also called ancillary data, see
man 3 cmsg) so we have to use the msghdr structure and sendmsg() call
(see man 2 sendmsg) to send the message. Since qapi-rs doesn't support
sending messages with ancillary data we have to do the sending sort of
"under it", manually, by retrieving qapi-rs's socket and using it directly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
NetworkConfig::index has been used to generate an id for a network device
backend. However, it turns out that it's not unique (it's always zero
as confirmed by a comment at its definition) so it's not suitable to
generate an id that needs to be unique.
Use the host device name instead.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Network device hotplugging will use the same infrastructure (Netdev,
DeviceVirtioNet) as coldplugging, i.e. QemuCmdLine. To make the code
of network device setup visible outside of QemuCmdLine we factor it out
to a non-member function `get_network_device()` and make QemuCmdLine just
delegate to it.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The function takes a whole QemuCmdLine but only actually uses
HypervisorConfig. We increase callability of the function by limiting
its interface to what it needs. This will come handy shortly.
Signed-off-by: Pavel Mores <pmores@redhat.com>
At least one PCI bridge is necessary to hotplug PCI devices. We only
support PCI (at this point at least) since that's what the go runtime
does (note that looking at the code in virtcontainers it might seem that
other bus types are supported, however when the bridge objects are passed
to govmm, all but PCI bridges are actually ignored). The entire logic of
bridge setup is lifted from runtime-go for compatibility's sake.
Signed-off-by: Pavel Mores <pmores@redhat.com>
with multiple iterations/reruns we need to use the latest run of each
workflow. For that we can use the "run_id" and only update results of
the same or newer run_ids.
To do that we need to store the "run_id". To avoid adding individual
attributes this commit stores the full job object that contains the
status, conclussion as well as other attributes of the individual jobs,
which might come handy in the future in exchange for slightly bigger
memory overhead (still we only store the latest run of required jobs
only).
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
some tests require certain labels before they are executed. When our PR
is not labeled appropriately the gatekeeper detects skipped required
tests and reports a failure. With this change we add "required-labeles"
to the tests mapping and check the expected labels first informing the
user about the missing labeles before even checking the test statuses.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
the test names are using `;` and regexps were designed to use `,` but
during development simply joined the expressions by `|`. This should
work but might be confusing so let's go with the semi-colon separator
everywhere.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The Github SHA of triggering PR should be exported in the environment
so that gatekeeper can fetch the right workflows/jobs.
Note: by default github will export GITHUB_SHA in the job's environment
but that value cannot be used if the gatekeeper was triggered from a
pull_request_target event, because the SHA correspond to the push
branch.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This will allow to cancel-in-progress the gatekeeper jobs.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
to allow selective testing as well as selective list of required tests
let's add a mapping of required jobs/tests in "skips.py" and a
"gatekeaper" workflow that will ensure the expected required jobs were
successful. Then we can only mark the "gatekeaper" as the required job
and modify the logic to suit our needs.
Fixes: #9237
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The behavior of Kata CI doesn't change.
For local testing using kubernetes/gha-run.sh:
1. Before these changes:
- AUTO_GENERATE_POLICY=yes was always used by the users of SEV, SNP,
TDX, or KATA_HOST_OS=cbl-mariner.
2. After these changes:
- Users of SEV, SNP, TDX, or KATA_HOST_OS=cbl-mariner must specify
AUTO_GENERATE_POLICY=yes if they want to auto-generate policy.
- These users have the option to test just using hard-coded policies
(e.g., using the default policy built into the Guest rootfs) by
using AUTO_GENERATE_POLICY=no. AUTO_GENERATE_POLICY=no is the default
value of this env variable.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the trap statement in the confidential kbs script
to clean up temporary files and ensure we are leaving them.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The tests is disabled for qemu-coco-dev / qemu-tdx, but it doesn't seen
to actually be failing on those. Plus, it's passing on SEV / SNP, which
means that we most likely missed re-enabling this one in the past.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The quay.io registry returns the tags sorted alphabetically and doesn't
seem to provide a way to sort it by age. Let's use "git log" to get all
changes between the commits and print all tags that were actually
pushed.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Currently, `qemu-runtime-rs` does not support `virtio-scsi`,
which causes the `k8s-block-volume.bats` test to fail.
We should skip this test until `virtio-scsi` is supported by the runtime.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The nginx container seems to error out when using UID=123.
Depending on the timing between container initialization and "kubectl
wait", the test might have gotten lucky and found the pod briefly in
Ready state before nginx errored out. But on some of the nodes, the pod
never got reported as Ready.
Also, don't block in "kubectl wait --for=condition=Ready" when wrapping
that command in a waitForProcess call, because waitForProcess is
designed for short-lived commands.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR updates the fast footprint script to remove the use
of egrep as this command has been deprecated and change it
to use grep command.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This imports the k8s-block-volume test from the tests repo and modifies
it slightly to set up the host volume on the AKS host.
This is a follow-up to #7132.
Fixes: #7164
Signed-off-by: Salvador Fuentes <salvador.fuentes@intel.com>
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The ita kustomization for Trustee, as well as previously used one
(DCAP), doesn't have a $(uname -m) directory after the deployment
directory name.
Let's follow the same logic used for the deploy-kbs script and clean
those up accordingly.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Intel Tiber Trust Services (formerly known as Intel Trust Authority) is
Intel's own attestation service, and we want to take advantage of the
TDX CI in order to ensure ITTS works as expected.
In order to do so, let's replace the former method used (DCAP) to use
ITTS instead.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Inorder to support sandbox api, intorduce the sandbox_config
struct and split the sandbox start stage from init process.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
There are many similar or duplicated code patterns in `teardown()`.
This commit consolidates them into a new function, `teardown_common()`,
which is now called within `teardown()`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current `exec_host()` accepts a given node name and
creates a node debugger pod, even if the name is invalid.
This could result in the creation of an unnecessary pending
pod (since we are using nodeAffinity; if the given name
does not match any actual node names, the pod won’t be scheduled),
which wastes resources.
This commit introduces validation for the node name to
prevent this situation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit enables basic amd64 tests of docker for runtime-rs by adding
vmm types "dragonball" and "cloud-hypervisor".
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
Docker cannot exit normally after the container process exits when
used with runtime-rs since it doesn't receive the exit event. This
commit enable runtime-rs to send TaskExit to containerd after process
exits.
Also, it moves "system_time_into" and "option_system_time_into" from
crates/runtimes/common/src/types/trans_into_shim.rs to a new utility
mod.
Signed-off-by: Sicheng Liu <lsc2001@outlook.com>
When the sandbox api was enabled, the pasue container
wouldn't be created, thus the shared sandbox pidns
should be fallbacked to the first container's init process,
instead of return any error here.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
It was observed that the custom node debugger pod is not
cleaned up when a test times out.
This commit ensures the pod is cleaned up by triggering
the cleanup on EXIT, preventing any debugger pods from
being left behind.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
When using network adapters that support SR-IOV, a VFIO device can be
plugged into a guest VM and claimed as a network interface. This can
significantly enhance network performance.
Fixes: #9758
Signed-off-by: Lei Huang <leih@nvidia.com>
With #10232 merged, we now have a persistent node debugger pod throughout the test.
As a result, there’s no need to spawn another debugger pod using `kubectl debug`,
which could lead to false negatives due to premature pod termination, as reported
in #10081.
This commit removes the `print_node_journal()` call that uses `kubectl debug` and
instead uses `exec_host()` to capture the host journal. The `exec_host()` function
is relocated to `tests/integration/kubernetes/lib.sh` to prevent cyclical dependencies
between `tests_common.sh` and `lib.sh`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
in b9d88f74ed the `runtime_class` CM was
added which overrides the one we previously set. Let's reorder our logic
to first deploy webhook and then override the default CM in order to use
the one we really want.
Since we need to change dirs we also have to use realpath to ensure the
files are located well.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
`assert_pod_fail()` currently calls `k8s_create_pod()` to ensure that a pod
does not become ready within the default 120s. However, this delays the test's
completion even if an error message is detected earlier in the journal.
This commit removes the use of `k8s_create_pod()` and modifies `assert_pod_fail()`
to fail as soon as the pod enters a failed state.
All failing pods end up in one of the following states:
- CrashLoopBackOff
- ImagePullBackOff
The function now polls the pod's state every 5 seconds to check for these conditions.
If the pod enters a failed state, the function immediately returns 0. If the pod
does not reach a failed state within 120 seconds, it returns 1.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR removes some qemu information which is not longer valid as
this is referring to the tests repository and to kata 1.x.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
rename the task_service to service, in order to
incopperate with the following added sandbox
services.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In order to make different from sandbox request/response, this commit
changed the task request/response to TaskRequest/TaskResponse.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the wait_vm would be called before calling stop_vm,
which would take the reader lock, thus blocking the stop_vm
getting the writer lock, which would trigge the dead lock.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since the block_on would block on the current thread
which would prevent other async tasks to be run on this
worker thread, thus change it to use the async task for
this task.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
- Reflect the need to update the versions in the Helm Chart
- Add the lock branch instruction
- Add clarity about the permissions needed to complete tasks
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
* Clarifies instructions for k0s.
* Adds kata-deploy step for each cluster type.
* Removes the old kata-deploy-stable step for vanilla k8s.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR introduces support for selectively compiling Dragonball in
runtime-rs. By default, Dragonball will continue to be compiled into
the containerd-shim-kata-v2 executable, but users now have the option
to disable Dragonball compilation.
Fixes#10310
Signed-off-by: sidney chang <2190206983@qq.com>
Now that the issue with handling loop devices has been resolved,
this commit re-enables the guest-pull-image tests for `qemu-coco-dev`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Timeouts occur (e.g. `create_container_timeout` and `wait_time`)
when using qemu-coco-dev.
This commit increases these timeouts for the trusted image storage
test cases
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the host running the tests is different from the host where the cluster is running,
the *_loop_device() functions do not work as expected because the device is created
on the test host, while the cluster expects the device to be local.
This commit ensures that all commands for the relevant functions are executed via exec_host()
so that a device should be handled on a cluster node.
Additionally, it modifies exec_host() to return the exit code of the last executed command
because the existing logic with `kubectl debug` sometimes includes unexpected characters
that are difficult to handle. `kubectl exec` appears to properly return the exit code for
a given command to it.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Creating and deleting a node debugger pod for every `exec_host()`
call is inefficient.
This commit changes the test suite to create and delete the pod
only once, globally.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit addresses an issue with handling loop devices
via a node debugger due to restricted privileges.
It runs a pod with full privileges, allowing it to mount
the host root to `/host`, similar to the node debugger.
This change enables us to run tests for trusted image storage
using the `qemu-coco-dev` runtime class.
Fixes: #10133
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add cdi devices including ContainerDevice definition and
annotation_container_device method to annotate vfio device
in OCI Spec annotations which is inserted into Guest with
its mapping of vendor-class and guest pci path.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We need vfio device's properties device, vendor and
class, but we can only get property device and vendor.
just extend it with class is ok.
Fixes#10145
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The kata webhook requires a configmap to define what runtime class it
should set for the newly created pods. Additionally, the configmap
allows others to modify the default runtime class name we wish to set
(in case the handler is kata but the name of the runtimeclass is
different).
Finally, this PR changes the webhook-check to compare the runtime of the
newly created pod against the specific runtime class in the configmap,
if said confimap doesn't exist, then it will default to "kata".
Signed-off-by: Martin <mheberling@microsoft.com>
This PR increases the timeout to run k8s tests for Kata CoCo TDX
to avoid the random failures of timeout.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
As we don't have any CI, nor maintainer to keep ACRN code around, we
better have it removed than give users the expectation that it should or
would work at some point.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This change technically affects the path for enabled guest selinux as well,
however since this is not implemented in runtime-rs anyway nothing should
break. When guest selinux support is added this change will come handy.
Signed-off-by: Pavel Mores <pmores@redhat.com>
If guest selinux is off the runtime has to ensure that container OCI spec
contains no selinux labels for the container rootfs and process. Failure
to do so causes kata agent to try and apply the labels which fails since
selinux is not enabled in guest, which in turn causes container launch
to fail.
This is largely inspired by golang runtime(*) with a slight deviation
in ordering of checks. This change simply checks the disable_guest_selinux
config setting and if it's true it clears both rootfs and process label if
necessary. Golang runtime, on the other hand, seems to first check if
process label is non-empty and only then it checks the config setting,
meaning that if process label is empty the rootfs label is not reset
even if it's non-empty. Frankly, this looks like a potential bug though
probably unlikely to manifest since it can be assumed that the labels are
either both empty, or both non-empty.
(*) 4fd4b02f2e/src/runtime/virtcontainers/kata_agent.go (L1005)
Signed-off-by: Pavel Mores <pmores@redhat.com>
In order to handle the setting we have to first parse it and make its
value available to the rest of the program.
The yes() function is added to comply with serde which seems to insist
on default values being returned from functions. Long term, this is
surely not the best place for this function to live, however given that
this is currently the first and only place where it's used it seems
appropriate to put it near its use. If it ends up being reused elsewhere
a better place will surely emerge.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Azure internal mirrors for Ubuntu 20.04 have gone awry, leading to a
situation where dependencies cannot be installed (such as
libdevmapper-dev), blocking then our CI.
Let's bump the runners to 22.04 regardless, even knowing it'll cause an
issue with the runk tests, as the agent check tests are considered more
crucial to the project at this point.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
kata-shim was not reporting `inactive_file` in memory stat.
This memory is deducted by containerd when calculating the size of container working set, as it can be paged out by the operating
system under memory pressure. Without reporting `inactive_file`, containerd will over report container memory usage.
[Here](https://github.com/containerd/containerd/blob/v1.7.22/pkg/cri/server/container_stats_list_linux.go#L117) is where containerd
deducts `inactive_file` from memory usage.
Note that kata-shim correctly reports `total_inactive_file` for cgroup v1, but this was not implemented for cgroup v2.
This commit:
- Adds code in kata-shim to report "inactive_file" memory for cgroup v2
- Implements reporting of all available cgroup v2 memory stats to containerd
- Uses defensive coding to avoid assuming existence of any memory.stat fields
The list of available cgroup v2 memory stats defined by containerd can be found
[here](https://pkg.go.dev/github.com/containerd/cgroups/v2/stats#MemoryStat).
Fixes#10280
Signed-off-by: Alex Man <alexman@stripe.com>
Yq installation shouldn't force to use sudo in case yq is already installed in correct version.
Signed-off-by: Pawel Proskurnicki <pawel.proskurnicki@intel.com>
This patch adds support to call kata agents SetPolicy
API. Also adds tests for SetPolicy API using agent-ctl.
Fixes#9711
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The `deploy_k0s` and `deploy_k3s` kubectl installs aren't failing
yet, but let get ahead of this and bump them as well
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This commit adds a step called `Rebase atop of the latest target branch`
to the job named `build-asset-boot-image-se` which can test the PR properly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
agent built with policy feature initializes the policy engine using a
policy document from a default path, which is installed & linked during
UVM rootfs build. This commit adds support to provide a default agent
policy as environment variable.
This targets development/testing scenarios where kata-agent
is wanted to be started as a local process.
Fixes#10301
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The following changes have been made:
- Remove unnecessary `sudo`
- Add an error message where an incorrect host key document is used
- Add a missing artifact `kernel-confidential-modules`
- Make a variable `kernel_version` and replace it with relevant hits
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
We've added s390x test container image, so add support
to use them based on the arch the test is running on
Fixes: #10302
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
fixuop
This commit brings some public parts of image pulling test series like
encrypted image pulling, pulling images from authenticated registry and
image verification. This would help to reduce the cost of maintainance.
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Close#8120
**Case 1**
Create a pod from an unsigned image, on an insecureAcceptAnything
registry works.
Image: quay.io/prometheus/busybox:latest
Policy rule:
```
"default": [
{
"type": "insecureAcceptAnything"
}
]
```
**Case 2**
Create a pod from an unsigned image, on a 'restricted registry' is
rejected.
Image: ghcr.io/confidential-containers/test-container-image-rs:unsigned
Policy rule:
```
"quay.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 3**
Create a pod from a signed image, on a 'restricted registry' is
successful.
Image: ghcr.io/confidential-containers/test-container-image-rs:cosign-signed
Policy rule:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 4**
Create a pod from a signed image, on a 'restricted registry', but with
the wrong key is rejected
Image:
ghcr.io/confidential-containers/test-container-image-rs:cosign-signed-key2
Policy:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
{
"type": "sigstoreSigned",
"keyPath": "kbs:///default/cosign-public-key/test"
}
]
```
**Case 5**
Create a pod from an unsigned image, on a 'restricted registry' works
if enable_signature_verfication is false
Image: ghcr.io/kata-containers/confidential-containers:unsigned
image security enable: false
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add two parameters for enabling cosign signature image verification.
- `enable_signature_verification`: to activate signature verification
- `image_policy`: URI of the image policy
config
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
new version of the anyhow crate has changed the backtrace capture thus
unit tests of kata-agent that compares a raised error with an expected
one would fail. To fix this, we need only panics to have backtraces,
thus set `RUST_BACKTRACE=1` and `RUST_LIB_BACKTRACE=0` for tests due to
document
https://docs.rs/anyhow/latest/anyhow/
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Change pod runAsUser value of a Replication Controller after generating
the RC's policy, and verify that the RC pods get rejected due to this
change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change pod runAsUser value of a Job after generating the Job's policy,
and verify that the Job gets rejected due to this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change pod runAsUser value of a Deployment after generating the
Deployment's policy, and verify that the Deployment fails due to
this change.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Poll/wait for pod termination instead of sleeping 2 minutes. This
change typically saves ~90 seconds in my test cluster.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
It's a prerequisite PR to make built-in vmm dragonball compilation
options configurable.
Extract TAP device-related code from dragonball's dbs_utils into a
separate library within the runtime-rs hypervisor module.
To enhance functionality and reduce dependencies, the extracted code
has been reimplemented using the libc crate and the ifreq structure.
Fixes#10182
Signed-off-by: sidney chang <2190206983@qq.com>
In the latest `s390-tools`, there has been update on how to
verify a secure boot image. A host key revocation list (CRL),
which was optinoal, now becomes mandatory for verification.
This commit updates the relevant scripts and documentation accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
If the CI platform being tested doesn't support yet the prometheus
container image:
- Use busybox instead of prometheus.
- Skip the test cases that depend on the prometheus image.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR fixes the indentation in the cri containerd tests as we
have in several places a misalignment in the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Remove the workaround for #9928, now that genpolicy is able to
convert user names from container images into the corresponding
UIDs from these images.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Remove the recently added default UID/GID values, because the genpolicy
design is to initialize those fields before this new code path gets
executed.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Some container images are configured such that the user (and group)
under which their entrypoint should run is not a number (or pair of
numbers), but a user name.
For example, in a Dockerfile, one might write:
> USER 185
indicating that the entrypoint should run under UID=185.
Some images, however, might have:
> RUN groupadd --system --gid=185 spark
> RUN useradd --system --uid=185 --gid=spark spark
> ...
> USER spark
indicating that the UID:GID pair should be resolved at runtime via
/etc/passwd.
To handle such images correctly, read through all /etc/passwd files in
all layers, find the latest version of it (i.e., the top-most layer with
such a file), and, in so doing, ensure that whiteouts of this file are
respected (i.e., if one layer adds the file and some subsequent layer
removes it, don't use it).
Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>
Disabling the UID Policy rule was a workaround for #9928. Re-enable
that rule here and add a new test/CI temporary workaround for this
issue. This new test workaround will be removed after fixing #9928.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Change quay.io/prometheus/busybox to quay.io/prometheus/prometheus in
this test. The prometheus image will be helpful for testing the future
fix for #9928 because it specifies user = "nobody".
Also, change:
sh -c "ls -l /"
to:
echo -n "readinessProbe with space characters"
as the test readinessProbe command line. Both include a command line
argument containing space characters, but "sh -c" behaves differently
when using the prometheus container image (causes the readinessProbe
to time out, etc.).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR removes the remove_img variable in the metrics common script
as it is not being used.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR decreases the number of iterations in the kubernetes soak test
as this is already taking more than 2 hours for the kata coco ci
stability.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
kata-agent incorrectly reports CPU time for cgroup v2, causing 1000x underreporting.
For cgroup v2, kata-agent reads the cpu.stat file, which reports the time consumed by the processes in the cgroup in µs.
However, there was a bug in kata-agent where it returned this value in µs without converting it to ns.
This commit adds the necessary µs to ns conversion for cgroup v2, aligning it with v1 behavior and kata-shim's expectations.
This fixes#10278
Signed-off-by: Alex Man <alexman@stripe.com>
With an older version of image-rs, we were getting the following error:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key no suitable key found for decrypting layer key:
```
However, with the version of image-rs we are bumping to, the error comes
as:
```
Message: failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key
Caused by:
no suitable key found for decrypting layer key:
keyprovider: failed to unwrap key by ttrpc
```
Due to this change, I'm splitting the check in two different ones.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).
Proper ITA / ITTS support on Trustee was finished as part of:
* 6f767fa15f
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).
As we've bumped guest-components on trustee, let's make sure we also
bump image-rs to the commit that brings ITA support in:
* 1db6c3a876
The reason we need to bump the dependency here is to avoid kbs_protocol
mismatch between the version used by the agent and the trustee one.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
In the process of switching the TDX CI machine we've noticed that
`hostname -i` in one of the machines returns an one and only IP address,
while in another machine it returns a full list of IPs.
As we're only interested in the first one, let's adapt the code to
always return the first one.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Since CDH/attestation related processes and its dependencies are not fully
available, the setup fails to start kata-agent as local process. This
fix removes these procs to prevent kata-agent from trying to start them.
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The test setup starts kata-agent as a local process without the
UVM. The agent policy initialization fails due to missing policy
document at `/etc/kata-opa/default-policy.rego`. The fix
- installs a relaxed `allow-all.rego` policy document
- cleans up the install during exit
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This commit introduces test cases for testing
CopyFile API using kata-agent-ctl with improved command
semantics and handling.
- copy a file to /run/kata-containers
- copy symlink to /run/kata-containers
- copy directory to /run/kata-containers
- copy file to /tmp
- copy large file to /run/kata-containers
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
In the existing implementation for the CopyFile subcommand,
- cmd line argument list is too long, including various metadata information.
- in case of a regular file, passing the actual data as bytes stream adds to the size and complexity of the input.
- the copy request will fail when the file size exceeds that of the allowed ttrpc max data length limit of 4Mb.
This change refactors the CopyFile handler and modifies the input to a known 'source' 'destination' syntax.
Fixes#9708
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR increases the timeout to wait that the deployment for the soak
stability test is ready in order to avoid random failures saying that
the deployment is not ready yet.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
It will panic when users do GPU vfio passthrough with cdi in runtime.
The root cause is that CustomSpec.Annotations is nil when new element
added.
To address this issue, initialization is introduced when it's nil.
Fixes#10266
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Move device code with nvdimm driver to nvdimm_device_handler, including
nvdimm device and pmem device.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Removed the `StorageHandlerManager` struct and its associated implementations and
introduced a type alias `StorageHandlerManager` for `HandlerManager` to simplify the code.
The new type alias maintains the same functionality while reducing redundancy.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Introduced `HandlerManager` struct to manage registered handlers, which will be used to storage and device management for kata-agent.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
I know right now we're always passing a value for that, but this doesn't
really have to be set unless attestation is used. Thus, let's also omit
it in case it's empty.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This is a quick and simple pre-req for supporting initData, which will
take advantage of the mrconfigid in the TDX case.
While already adding mrconfigid, which is hardcoded empty right now,
let's do the same for mrowner and mrownerconfig, and leave it prepared
for future expansions.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
The reason we're relying on yet another function to do so is because the
TDX object will be used in its qom / qapi json format.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This commit enables running tests for kata agent apis.
The 'api-tests' directory will contain bats test files for
individual APIs.
Fixes#10269
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This PR updates the machine learning tests references or urls for the
openVINO and oneDNN scripts as currently they are refering to a different
performance benchmark.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
enable CI to add test cases for testing kata-agent APIs. This commit
introduces:
- a workflow to run tests
- setup scripts to prepare the test environment
Fixes#10262
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
As kata-cleanup will only call `reset_runtime()`, there's absolutely no
need to export the other set of environment variables in its yaml file.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
metrics tests sometimes fail with kata components still running.
sending SIGKILL and waiting for the processes to reap.
Fixes#8651
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
This error is specific to SNP platforms, so let's make sure we only
error this out when an SNP platform is used.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Add CONFIG_PAGE_REPORTING, CONFIG_BALLOON_COMPACTION and
CONFIG_VIRTIO_BALLOON to dragonball-experimental configs to open
dragonball function and free page reporting function.
Signed-off-by: Hui Zhu <teawater@antgroup.com>
Under normal circumstances, the virtual machine only requests memory
from the host and does not actively release it back to host when it is
no longer needed, leading to a waste of memory resources.
Free page reporting is a sub-feature of virtio-balloon. When this
feature is enabled, the Linux guest kernel will send information about
released pages to dragonball via virtio-balloon, and dragonball will
then release these pages.
This commit adds an option enable_balloon_f_reporting to runtime-rs.
When this option is enabled, runtime-rs will insert a virtio-balloon
device with the f_reporting option enabled during the Dragonball virtual
machine startup.
Signed-off-by: Hui Zhu <teawater@antgroup.com>
...or by using a binary with additional suffix.
This allows having multiple versions of nydus-overlayfs installed on the
host, telling nydus-snapshotter which one to use while still detecting
Nydus is used.
Signed-off-by: Paul Meyer <49727155+katexochen@users.noreply.github.com>
This will help to avoid code duplication on what's needed on the helm
and non-helm cases.
The reason it's not been added as part of the commit which adds the
post-delete hook is simply for helping the reviewer (as the diff would
be less readable with this change).
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Instead of using a lifecycle.preStop hook, as done when we're using
using the helm chat, let's add a post-delete hook to take care of
properly cleaning up the node during when uninstalling kata-deploy.
The reason why the lifecyle.preStop hook would never work on our case is
simply because each helm chart operation follows the Kuberentes
"declarative" approach, meaning that an operation won't wait for its
previous operation to successfully finish before being called, leading
to us trying to access content that's defined by our RBAC, in an
operation that was started before our RBAC was deleted, but having the
RBAC being deleted before the operation actually started.
Unfortunately this hook brings in some code duplicatioon, mainly related
to the RBAC parts, but that's not new as the same happens with our
deamonset.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On commit 51690bc157 we switched the installation from kubectl to helm
and used its `--wait` expecting the execution would continue when all
kata-deploy Pods were Ready. It turns out that there is a limitation on
helm install that won't wait properly when the daemonset is made of a
single replica and maxUnavailable=1. In order to fix that issue, let's
revert the changes partially to keep using kubectl and waitForProcess
to the exection while Pods aren't Running.
Fixes#10168
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Move `unseal_env` and `secure_mount` functions on the global `CDH_CLIENT` instance to access the CDH client.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Introduced a global `CDH_CLIENT` instance to hold the cdh client and
implemented `init_cdh_client` function to initialize the cdh client if not already set.
Fixes: #10231
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR removes the metrics report which is not longer being used
in Kata Containers.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The following steps are required for enabling KBS:
- Set environment variables `KBS` and `KBS_INGRESS`
- Uninstall and install `kbs-client`
- Deploy KBS
This commit adds the above stpes to the existing workflow
for `qemu-coco-dev`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
To deploy KBS on s390x, the environment variable `IBM_SE_CREDS_DIR`
must be exported, and the corresponding directory must be created.
This commit enables KBS deployment for `qemu-coco-dev`, in addition
to the existing `qemu-se` support on the platform.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Added `driver_types` method to `StorageHandler` trait to return driver
types managed by each handler.
- Implemented driver_types method for all storage handlers.
- Updated `STORAGE_HANDLERS` initialization to use `driver_types` for
handler registration.
Fixes: #10242
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
- Updated the `add_handler` function in `StorageHandlerManager` to accept a slice of IDs (`&[&str]`) instead of a single ID (`&str`).
This change allows a single handler to be registered for multiple storage device types.
- Refactored calls to `add_handler` in `Storage` of kata-agent to use the new function, passing arrays of storage drivers instead of single driver.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This reverts commit 704da86e9b, as the
tests never became stable to run.
This was discussed and agreed with the maintainer.
Conflicts:
.github/workflows/basic-ci-amd64.yaml
tests/integration/stdio/gha-run.sh
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR increases the time to run the Kata CoCo stability tests as
this tests are design to run for more than 2 hours.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR enables the k8s soak stability test to run on the weekly
Kata CoCo stability CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
It's simply easier if we just use /host/opt/kata instead in our scripts,
which will simplify a lot the logic of adding an INSTALLATION_PREFIX
later on.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As we build our binaries with the `/opt/kata` prefix, that's the value
of $dest_dir.
Later in thise series it'll become handy, as we'll introduce a way to
install the Kata Containers artefacts in a different location.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR updates the firecracker version to 1.8.0 which includes the
following changes:
- Added ACPI support to Firecracker for x86_64 microVMs. Currently, we pass ACPI tables with information about the available vCPUs, interrupt controllers, VirtIO and legacy x86 devices to the guest. This allows booting kernels without MPTable support. Please see our kernel policy documentation for more information regarding relevant kernel configurations.
- Added support for the Virtual Machine Generation Identifier (VMGenID) device on x86_64 platforms. VMGenID is a virtual device that allows VMMs to notify guests when they are resumed from a snapshot. Linux includes VMGenID support since version 5.18. It uses notifications from the device to reseed its internal CSPRNG. Please refer to snapshot support and random for clones documention for more info on VMGenID. VMGenID state is part of the snapshot format of Firecracker. As a result, Firecracker snapshot version is now 2.0.0.
- Changed T2CL template to pass through bit 27 and 28 of MSR_IA32_ARCH_CAPABILITIES (RFDS_NO and RFDS_CLEAR) since KVM consider they are able to be passed through and T2CL isn't designed for secure snapshot migration between different processors.
- Avoid setting kvm_immediate_exit to 1 if are already handling an exit, or if the vCPU is stopped. This avoids a spurious KVM exit upon restoring snapshots.
- Changed T2S template to set bit 27 of MSR_IA32_ARCH_CAPABILITIES (RFDS_NO) to 1 since it assumes that the fleet only consists of processors that are not affected by RFDS.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This commit includes a fix for pulling an image on platforms that do not
support xattr.
Some platforms/file-systems do not support xattrs, this would make the
image pull fail because of failing to set xattr. This commit will check
whether the target path supports xattr. If yes, the unpacking will
maintain xattrs; if not, it will not set xattrs.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
slog's is_enabled() is documented as:
- "best effort", and
- Sometime resulting in false positives.
Use AGENT_CONFIG.log_level.as_usize() instead, to avoid those false
positives.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
22.04 is the default today:
23da668261/README.md
Being more specific will avoid unexpected errors when Github updates the
default.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Also keeps the Rust installation step even though it's preinstalled, so that we
use the version specified in versions.yaml.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR adds the oneDNN benchmark information to the machine
learning metrics README.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
GH-9592 addressed a bug in a previous version of the AKS Mariner host
kernel that blocked the CH v39 upgrade. This bug has now been fixed so
we undo that PR.
Note we also specify a different OCI version for Mariner as it differs
from Ubuntu's.
Fixes: #9594
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR fixes the entry or use of the ci weekly GHA workflow
to run properly the weekly k8s tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
PhysicalEndpoint unbinds its VF interface and rebinds it as a VFIO device,
then cold-plugs the VFIO device into the guest kernel.
When `cold_plug_vfio` is set to "no-port", cold-plugging the VFIO device
will fail.
This change checks if `cold_plug_vfio` is enabled before creating PhysicalEndpoint
to avoid unnecessary VFIO rebind operations.
Fixes: #10162
Signed-off-by: Lei Huang <leih@nvidia.com>
This reverts commit 41b7577f08.
We were seeing a lot of issues in the TDX CI of the nature:
"Error: failed to create containerd container: create instance
470: object with key "470" already exists: unknown"
With the TDX CI, we moved to having the nydus snapsotter pre-installed.
Essentially the `deploy-snapshotter` step was performed once before any
actual CI runs.
We were seeing failures related to the error message above.
On reverting this change, we are no longer seeing errors related to
"key exists" with the TDX CI passing now.
The change reverted here is related to downloading incomplete images, but this
seems to be messing up TDX CI.
It is possible to pass --snapshotter to `ctr image check` but that does
not seem to have any effect on the data set returned.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This patch re-generates the client code for Cloud Hypervisor v41.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #10203
Signed-off-by: Bo Chen <chen.bo@intel.com>
This PR adds the OpenVINO benchmark general information into the
machine learning README metrics information.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In certain environments (e.g., those with lower performance), `k8s_create_pod()`
may require additional wait time, especially when dealing with large images.
Since `k8s_wait_pod_be_ready()` — which is called by `k8s_create_pod()` — already
accepts `wait_time` as a second argument, it makes sense to introduce `wait_time`
to `k8s_create_pod()` and propagate it to the callee.
This commit adds `wait_time` to `k8s_create_pod()` as the 2nd (optional) argument.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Some of the tests call set_metadata_annotation() for updating the kernel
parameters. For `kata-qemu-se`, repack_secure_image() is called which is
defined in `lib_se.sh` and sourced by `confidential_kbs.sh`.
This commit ensures that the function call chain for the relevant
`KATA_HYPERVISOR` is properly handled.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit add a row for `cdh_api_timeout` to the agent options in
how-to-set-sandbox-config-kata.md.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit allows `cdh_api_timeout` to be configured from the configuration file.
The configuration is commented out with specifying a default value (50s) because
the default value is configured in the agent.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
therwise we may end up running into unexpected issues when calling the
cleanup option, as the same checks would be done, and files could end up
being copied again, overwriting the original content which was backked
up by the install option.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This commit updates CDH_API_TIMEOUT to use AGENT_CONFIG.cdh_api_timeout
and changes it from a `const` to `lazy_static` to accommodate runtime-determined values.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
To make the `cdh_api_timeout` variable configurable, it has been added to
the `AgentConfig` structure.
This change includes storing the variable as a `time::Duration` type and
generalizing the existing `hotplug_timeout` code to handle both timeouts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This PR adds the pod deployment yaml for soak test which is part
of the stability k8s tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
* genpolicy: deny UpdateEphemeralMountsRequest
Deny UpdateEphemeralMountsRequest by default, because paths to
critical Guest components can be redirected using such request.
Signed-off-by: Dan Mihai <Daniel.Mihai@microsoft.com>
This PR adds a kubernetes parallel test that will launch multiple replicas
from a kubernetes deployment and we will iterate this multiple times to
verify that we are able to do this using CoCo Kata. This test will be
part of the CoCo Kata stability CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This partially reverts commit 94b3348d3c,
as there's more work needed in order to have this one done in a robust
way, and we are taking the safer path of reverting for now, and adding
it back as soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This partially reverts commit 51690bc157,
as there's more work needed in order to have this one done in a robust
way, and we are taking the safer path of reverting for now, and adding
it back as soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This reverts commit 1221ab73f9, as there's
more work needed in order to have this one done in a robust way, and we
are taking the safer path of reverting for now, and adding it back as
soon as the release is cut out.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
This PR adds the Kata CoCo Stability workflow that will setup the
environment to run the k8s tests on a non-tee environment.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds the Kata CoCo stability weekly yaml that will trigger
weekly the k8s stability tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The deploy-kbs.sh script generates the kbs.key that's used to install
KBS. This same file is used lately by kbs-client to authenticate. This ensures
that the file was created, otherwise fail.
Another problem solved here is that on bare-metal machines the key doesn't survive
a reboot as it is created in a temporary directory (/tmp/trustee). So let's save
the file to a non-temporary location.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Neither CRI-O nor containerd requires that, and removing such symlinks
makes everything less intrusive from our side.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's already being used with CRi-O, let's simplify what we do and also
use this for containerd, which will allow us to do further cleanups in
the coming patches.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
"gha-run.sh" requires a `run` argument in order to run the tests, which
seems to be forgotten when the test was added.
This PR needs to get merged before the test can successfully run.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
I assume the PR that introduced this was based on an older version of
yq, and as the test couldn't run before it got merged we never noticed
the error.
However, this test has been failing for a reasonable amount of time,
which makes me think that we either need a maintainer for it, or just
remove it completely, but that's a discussion for another day.
For now, let's make it, at least, run.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR adds the k8s stability Kata CoCo GHA workflow to run weekly
the k8s stability tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This reverts commit d35320472c.
Although the commit in question does solve an issue related to the usage
of busybox from docker.io, as it's reasonably easy to hit the rate
limit, the commit also brings in functionalities that are causing issues
in, at least, the TDX CI, such as:
```sh
[2024-08-16T16:03:52Z INFO actix_web::middleware::logger] 10.244.0.1 "POST /kbs/v0/attest HTTP/1.1" 401 259 "-" "attestation-agent-kbs-client/0.1.0" 0.065266
[2024-08-16T16:03:53Z INFO kbs::http::attest] Auth API called.
[2024-08-16T16:03:53Z INFO actix_web::middleware::logger] 10.244.0.1 "POST /kbs/v0/auth HTTP/1.1" 200 74 "-" "attestation-agent-kbs-client/0.1.0" 0.000169
[2024-08-16T16:03:54Z INFO kbs::http::attest] Attest API called.
[2024-08-16T16:03:54Z INFO verifier::tdx] Quote DCAP check succeeded.
[2024-08-16T16:03:54Z INFO verifier::tdx] MRCONFIGID check succeeded.
[2024-08-16T16:03:54Z INFO verifier::tdx] CCEL integrity check succeeded.
[2024-08-16T16:03:54Z ERROR kbs::http::error] Attestation failed: Verifier evaluate failed: TDX Verifier: failed to parse AA Eventlog from evidence
Caused by:
at least one line should be included in AAEL
```
Let's revert this for now, and then once we get this one fixed on
trustee side we'll update again.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add bind mounts for volumes defined by docker container images, unless
those mounts have been defined in the input K8s YAML file too.
For example, quay.io/opstree/redis defines two mounts:
/data
/node-conf
Before these changes, if these mounts were not defined in the YAML file
too, the auto-generated policy did not allow this container image to
start.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Improved error handling to provide clearer feedback on request failures.
For example:
Improve createcontainer request timeout error message from
"Error: failed to create containerd task: failed to create shim task:context deadline exceed"
to "Error: failed to create containerd task: failed to create shim task: CreateContainerRequest timed out: context deadline exceed".
Fixes: #10173 -- part II
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
For the rare cases where containerd_conf_file does not exist, cp could fail
and let the pod in Error state. Let's make it a little bit more robust.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Install luks-encrypt-storage script by guest-components. So that we can maintain a single source and prevent synchronization issues.
Fixes: #10173 -- part I
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Container/Sandbox clean up should not fail if root FS is not mounted.
This PR handles EINVAL errors when umount2 is called.
Fixes: #10166
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
Instead of deploying and removing the snapshotter on every single run,
let's make sure the snapshotter is always deploy on the TDX case.
We're doing this as an experiment, in order to see if we'll be able to
reduce the failures we've been facing with the nydus snapshotter.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The `exec_host()` function often fails to capture the output of a given command
because the node debugger pod is prematurely terminated. To address this issue,
the function has been refactored to ensure consistent output capture by adjusting
the `kubectl debug` process as follows:
- Keep the node debugger pod running
- Wait until the pod is fully ready
- Execute the command using `kubectl exec`
- Capture the output and terminate the pod
This commit refactors `exec_host()` to implement the above steps, improving its reliability.
Fixes: #10081
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This reverts commit 8d9bec2e01, as it
causes issues in the operator and kata-deploy itself, leading to the
node to be NotReady.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
I can't set up loop device with `exec_host`, which the command is
necessary for qemu-coco-dev. See issue #10133.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Update error message in pulling image encrypted to "failed to get decrypt key no suitable key found for decrypting layer key".
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Introduces `secure_mount` API in the cdh. It includes:
- Adding the `SecureMountServiceClient`.
- Implementing the `secure_mount` function to handle secure mounting requests.
- Updating the confidential_data_hub.proto file to define SecureMountRequest and SecureMountResponse messages
and adding the SecureMountService service.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
renames the sealed_secret.proto file to confidential_data_hub.proto and
updates the corresponding API namespace from sealed_secret to confidential_data_hub.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
add tests for guest pull with configured timeout:
1) failed case: Test we cannot pull a large image that pull time exceeds a short creatcontainer timeout(10s) inside the guest
2) successful case: Test we can pull a large image inside the guest with increasing createcontainer timeout(120s)
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
add tests for pull images in the guest using trusted storage:
1) failed case: Test we cannot pull an image that exceeds the memory limit inside the guest
2) successful case: Test we can pull an image inside the guest using
trusted ephemeral storage.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR adds a k8s stability test that will be part of the CoCo Kata
stability tests that will run weekly.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR disables the k8s file volume test as we are having random failures
in multiple GHA CIs mainly because the exec_host function sometimes
does it not work properly.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds kubernetes stress-ng tests as part of the stability testing
for kata.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Hooks are executed on the host, so we don't expect to run hooks and thus
require that no hook paths are set.
Additional Kernel modules expand the attack surface, so require that
none are set. If a use case arises, modules should be allowlisted via
settings.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
CopyFile is invoked by the host's FileSystemShare.ShareFile function,
which puts all files into directories with a common pattern. Copying
files anywhere else is dangerous and must be prevented. Thus, we check
that the target path prefix matches the expected directory pattern of
ShareFile, and that this directory is not escaped by .. traversal.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
when use debug console, the shell run in child process may not be
exited, in some scenes.
eg. directly Ctrl-C in the host to terminate the kata-runtime process,
that will block the task handling the console connection,while waiting
for the child to exit.
Signed-off-by: soulfy <liukai254@jd.com>
Only do the checking in case the tarball was not explicitly passed by
the user. We have no control of what's passed and we cannot expect that
all the files are going to be under /opt.
Fixes: #10147
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently, setting `io.containerd.cri.runtime-handler` annotation in
the yaml is not necessary for pulling images in the guest. All TEE
hypervisors are already running tests with guest-pulling enabled.
Therefore, we can remove some duplicate tests and re-enable the
guest-pull test for running different runtime pods at the same time.
While considering to support different containerd version, I recommend
to keep setting "io.containerd.cri.runtime-handler".
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Initialize the trusted stroage when the device is defined
as "/dev/trusted_store" with shell script as first step.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
After enable secure storage integrity for trusted storage, the initialize
time will take more times, the default value will be NOT enabled but add this config to
allow the user to enable if they care more strict security.
Fixes: #8142
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
This PR updates the ubuntu image for stress Dockerfile. The main purpose
is to have a more updated image compared with the one that is in libpod
which has not been updated in a while.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- At the moment we aren't factoring in the kata version on our caches,
so it means that when we bump this just before release, we don't
rebuilt components that pull in the VERSION content, so the release build
ends up with incorrect versions in it's binaries
Fixes: #10092
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The jobs are all executed on ubuntu-22.04 so it's invariant and
can be removed from the matrix (this will shrink the jobs names).
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created the run-k8s-tests-on-amd64.yaml which is a merge of
run-k8s-tests-on-garm.yaml and run-k8s-tests-with-crio-on-garm.yaml
ps: renamed the job from 'run-k8s-tests' to 'run-k8s-tests-on-amd64' to
it is easier to find on Github UI and be distinguished from s390x,
ppc64le, etc...
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Switch to Github managed runners just like the run-k8s-tests-on-garm
workflow.
See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Switched to Github managed runners. The instance_type parameter was
removed and K8S_TEST_HOST_TYPE is set to "all" which combine the
tests of "small" and "normal". This way it will reduze to half of
the jobs.
See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
These new "kata-deploy" and "cleanup" actions are equivalent to
"kata-deploy-garm" "cleanup-garm", respectively, and should be
used on the workflows being migrated from GARM to
Github's managed runners.
Eventually "kata-deploy-garm" and "cleanup-garm" won't be used anymore
then we will be able to remove them.
See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR updates the image that we are using in the kubectl debug command
as part of the exec host function, as the current alpine image does not
allow to create a temporary file for example and creates random kubernetes
failures.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
A recent fix should resolve some the issues seen earlier with clh
with the go runtime. Enabling this test to check if the issue is still
seen.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Our code for handling images being pulled inside the guest relies on a
containerType ("sandbox" or "container") being set as part of the
container annotations, which is done by the CRI Engine being used, and
depending on the used CRI Engine we check for a specfic annotation
related to the image-name, which is then passed to the agent.
However, when running kata-containers without kubernetes, specifically
when using `nerdctl`, none of those annotations are set at all.
One thing that we can do to allow folks to use `nerdctl`, however, is to
take advantage of the `--label` flag, and document on our side that
users must pass `io.kubernetes.cri.image-name=$image_name` as part of
the label.
By doing this, and changing our "fallback" so we can always look for
such annotation, we ensure that nerdctl will work when using the nydus
snapshotter, with kata-containers, to perform image pulling inside the
pod sandbox / guest.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adding reset_cleanup to cleanup action so that it is done automatically
without the need to run yet another DS just to reset the runtime.
This is now part of the lifecycle hook when issuing kata-deploy.sh
cleanup
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Rather then modifying the kata-depoy scripts let's use Helm and
create a values.yaml that can be used to render the final templates
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
For easier handling of kata-deploy we can leverage a Helm chart to get
rid of all the base and overlays for the various components
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The kata containers hypervisior qemu configuration supports setting
block_device_aio="native", but the kata static build of qemu does
not add the linux aio feature.
The libaio-dev library is a necessary dependency for building qemu
with linux aio.
Fixes: #10130
Signed-off-by: Zhiwei Huang <ai.william@outlook.com>
Provides a test runner that generates a policy and validates it
with canned requests. The initial set of test cases is mostly for
illustration and will be expanded incrementally.
In order to enable both cross-compilation on Ubuntu test runners as well
as native compilation on the Alpine tools builder, it is easiest to
switch to the vendored openssl-src variant. This builds OpenSSL from
source, which depends on Perl at build time.
Adding the test to the Makefile makes it execute in CI, on a variety of
architectures. Building on ppc64le requires a newer version of the
libz-ng-sys crate.
Fixes: #10061
Signed-off-by: Markus Rudy <mr@edgeless.systems>
cargo clippy has two new warnings that need addressing:
- assigning_clones
These were fixed by clippy itself.
- suspicious_open_options
I added truncate(false) because we're opening the file for reading.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
After experimenting a little bit with those tests, they seem to be
passing on all the available TEE machines.
With this in mind, let's just enable them for those machines.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When re-enabling those we'll need a smart way to do so, as this limit of
20 workflows referenced is just ... weird.
However, for now, it's more important to add the jobs related to the new
platforms than keep the ones that are actively disabled.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we have new runners added, let's enable the builders so we can
prevent build failures happening after something gets merged.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit e9710332e7, as there
are now 2 arm64-builders (to be expanded to 4 really soon).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit c5dad991ce, as there
are now 2 arm64-builders (to be expanded to 4 really soon).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This parameter has been deprecated for a long time and QEMU 9.1.0 finally removes it.
Fixes: kata-containers#10112
Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
- Add --version flag to the genpolicy tool that prints the current
version
- Add version.rs.in template to store the version information
- Update makefile to autogenerate version.rs from version.rs.in
- Add license to Cargo.toml
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR removes duplicated entries (vcpus count, and available memory),
from onednn and openvino results files.
Fixes: #10119
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
The issue is similar to #10011.
The root cause is that tty and stderr are set to true at same time in
containerd: #10031.
Fixes: #10081
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR encloses the search string for 'default_vcpus ='
and 'default_memory =' with double quotes in order to
parse the precise values, which are included in the kata
configuration file.
Fixes: #10118
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
1. Use the new value of AllowRequestsFailingPolicy after setting up a
new Policy. Before this change, the only way to enable
AllowRequestsFailingPolicy was to change the default Policy file,
built into the Guest rootfs image.
2. Ignore errors returned by regorus while evaluating Policy rules, if
AllowRequestsFailingPolicy was enabled. For example, trying to
evaluate the UpdateInterfaceRequest rules using a policy that didn't
define any UpdateInterfaceRequest rules results in a "not found"
error from regorus. Allow AllowRequestsFailingPolicy := true to
bypass that error.
3. Add simple CI test for AllowRequestsFailingPolicy.
These changes are restoring functionality that was broken recently by
commmit df23eb09a6.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Let's remove what we commented out, as publish manifest complains:
```
Created manifest list quay.io/kata-containers/kata-deploy-ci:kata-containers-latest
./tools/packaging/release/release.sh: line 146: --amend: command not found
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR updates the memory tests like fast footprint to use grep -F
instead of fgrep as this command has been deprecated.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds missing packages depenencies to install kbs cli in a fresh
new baremetal environment. This will avoid to have a failure when trying
to run install-kbs-client.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We've done something quite similar for kata-deploy, but I've noticed we
forgot about the kata-manager counterpart.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's been a reasonable time that we're not able to even build arm64
artefacts.
For now I am removing the builds as it doesn't make sense to keep
running failing builds, and those can be re-enabled once we have arm64
machines plugged in that can be used for building the stuff, and
maintainers for those machines.
The `arm-jetson-xavier-nx-01` is also being removed from the runners.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In #10096, a cleanup step for kata-deploy is removed by mistake.
This leads to a cleanup error in the following `Complete job` step.
This commit restores the removed step to resolve the current CI failure on s390x.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As suggested in #9934, the following hooks have been introduced for s390x runners:
- ACTIONS_RUNNER_HOOK_JOB_STARTED
- ACTIONS_RUNNER_HOOK_JOB_COMPLETED
These hooks will perfectly replace the existing {pre,post}-action scripts.
This commit wipes out all GHA steps for s390x where the actions are triggered.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Rename k8s-exec-rejected.bats to k8s-policy-hard-coded.bats, getting
ready to test additional hard-coded policies using the same script.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Users of AUTO_GENERATE_POLICY=yes:
- Already tested *auto-generated* policy on any platform.
- Will be able to test *hard-coded* policy too on any platform, after
this change.
CI continues to test hard-coded policies just on the platforms listed
here, but testing those policies locally (outside of CI) on other
platforms can be useful too.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Since we can't find a homogeneous value for the resource/cgroup
management of multiple hypervisors, and we have decoupled the
env vars in the Makefile, we don't need the generic ones.
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
To avoid overriding env vars when multiple hypervisors are
available, we add per-hypervisor vars for static resource
management and cgroups handling. We reflect that in the
relevant config files as well.
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Let's ensure at least 50% of the memory is used for /run, as systemd by
default forces it to be 10%, which is way too small even for very small
workloads.
This is only done for the rootfs-confidential image.
Fixes: kata-containers#6775
Signed-off-by: Alex Carter <Alex.Carter@ibm.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.co
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It's better to check the container's status before
try to send signal to it. Since there's no need
to send signal to it when the container's stopped.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Since stop sandbox would be called in multi path,
thus it's better to set and check the sandbox's state.
Fixes: #10042
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Generate policy that validates each exec command line argument, instead
of joining those args and validating the resulting string. Joining the
args ignored the fact that some of the args might include space
characters.
The older format from genpolicy-settings.json was similar to:
"ExecProcessRequest": {
"commands": [
"sh -c cat /proc/self/status"
],
"regex": []
},
That format will not be supported anymore. genpolicy will detect if its
users are trying to use the older "commands" field and will exit with
a relevant error message in that case.
The new settings format is:
"ExecProcessRequest": {
"allowed_commands": [
[
"sh",
"-c",
"cat /proc/self/status"
]
],
"regex": []
},
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
It's time to delete the kata oci spec implemented just
for kata. As we have already done align OCI Spec with
oci-spec-rs.
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit aligns the OCI Spec implementation in runtime-rs
with the OCI Spec definitions and related operations provided
by oci-spec-rs. Key changes as below:
(1) Leveraged oci-spec-rs to align Kata Runtime OCI Spec with
the official OCI Spec.
(2) Introduced runtime-spec to separate OCI Spec definitions
from Kata-specific State data structures.
(3) Preserved the original code logic and implementation as
much as possible.
(4) Made minor code adjustments to adhere to Rust programming
conventions;
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Utilized oci-spec-rs to align OCI Spec structures
and data representations in runk with the OCI Spec.
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit aligns the OCI Spec used within agent-ctl
with the oci-spec-rs definition and operations. This
enhancement ensures that agent-ctl adheres to the latest
OCI standards and provides a more consistent and reliable
experience for managing container images and configurations.
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This commit transitions the data implementation for OCI Spec
from kata-oci-spec to oci-spec-rs. While both libraries adhere
to the OCI Spec standard, significant implementation details
differ. To ensure data exchange through TTRPC services, this
commit reimplements necessary data conversion logic.
This conversion bridges the gap between oci-spec-rs data and
TTRPC data formats, guaranteeing consistent and reliable data
transfer across the system.
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add integration test that creates two bridge networks with nerdctl and
verifies that Kata container is brought up while passing the networks
created.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
For nerdctl and docker runtimes, network is hot-plugged instead of
cold-plugged. While this change was made in the runtime,
we did not have the agent waiting for the device to be ready.
On some systems, the device hotplug could take some time causing
the update_interface rpc call to fail as the interface is not available.
Add a watcher for the network interface based on the pci-path of the
network interface. Note, waiting on the device based on name is really
not reliable especially in case multiple networks are hotplugged.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Share a single test script variable for both:
- Allowing a command to be executed using Policy settings.
- Executing that command using "kubectl exec".
Fixes: #10014
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Hotunplugging memory is not guaranteed or even likely to work.
Nevertheless I'd really like to have this code in for tests and
observation. It shouldn't hurt, from experience so far.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The bulk of this implementation are simple though tedious sanity checks,
alignment computations and logging.
Note that before any hotplugging, we query qemu directly for the current
size of hotplugged memory. This ensures that any request to resize memory
will be properly compared to the actual already available amount and only
necessary amount will be added.
Note also that we borrow checked_next_multiple_of() from CH implementation.
While this might look uncleanly it's just a rather temporary solution since
an equivalent function will apparently be part of std soon, likely the
upcoming 1.75.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The algorithm is rather simple - we query qemu for existing memory devices
to figure out the index of the one we're about to add. Then we add a
backend object and a corresponding frontend device.
Signed-off-by: Pavel Mores <pmores@redhat.com>
As part of aligning the Kata OCI Spec with oci-spec-rs,
the concept of "State" falls outside the scope of the OCI
Spec itself. While we'll retain the existing code for State
management for now, to improve code organizationand clarity,
we propose moving the State-related code from the oci/ dir
to a dedicated directory named runtime-spec/.
This separation will be completed in subsequent commits with
the removal of the oci/ directory.
Fixes#9766
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This PR increases the timeout for running the CoCo tests to avoid random failures.
These failures occur when the action `Run tests` times out after 30 minutes, causing the CI to fail.
Fixes: #10062
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR updates the metrics launch times to use grep -F instead of
fgrep as this command has been deprecated.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
For the GPU build we need go/rust and some other helpers
to build the rootfs.
Always use versions.yaml for the correct and working Rust and golang
version
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Initramfs expects /init, create symlink only if ${ROOTFS}/init does not exist
Init may be provided by other packages, e.g. systemd or GPU initrd/rootfs
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Currently, we have found that `assert_logs_contain` does not work on TDX.
We manually located the specific log, but it fails to get the log using `kubectl debug`. The error found in CI is:
```
warning: couldn't attach to pod/node-debugger-984fee00bd70.jf.intel.com-pdgsj,
falling back to streaming logs: error stream protocol error: unknown error
```
Upon debugging the TDX CI machine, we found an error in containerd:
```
Attach container from runtime service failed" err="rpc error: code = InvalidArgument desc = tty and stderr cannot both be true"
containerID="abc8c7a546c5fede4aae53a6ff2f4382ff35da331bfc5fd3843b0c8b231728bf"
```
We believe this is the root cause of the test failures in TDX CI.
Therefore, we need to ensure that tty and stderr are not set to true at same time.
Fixes: #10011
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
With Nvidia DPU or ConnectX network adapter, VF can do VFIO passthrough
to guest VM in `guest-kernel` mode. In the guest kernel, the adapter's
driver is required to claim the VFIO device and create network interface.
Signed-off-by: Lei Huang <leih@nvidia.com>
When create container failed, it should cleanup the container
thus there's no device/resource left.
Fixes: #10044
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR updates the Blogbench reference values for
read and write operations used in the CI check metrics
job.
This is due to the update to version 1.2 of blobench.
Fixes: #10039
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
It is not good practice to call repack_secure_image() from a bats file
because the test code might not consider cases where `qemu-se` is used
as `KATA_HYPERVISOR`.
This commit moves the function call to set_metadata_annotation() if a key
includes `kernel_params` and `KATA_HYPERVISOR` is set to `qemu-se`, allowing
developers to focus on the test scenario itself.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add policy to pod-secret-env.yaml from k8s-credentials-secrets.bats.
Policy was already auto-generated for the other pod used by the same
test (pod-secret.yaml). pod-secret-env.yaml was inconsistent,
because it was taking advantage of the "allow all" policy built into
the Guest image. Sooner or later, CI Guests for CoCo will not get the
"allow all" policy built in anymore and pod-secret-env.yaml would
have stopped working then.
Note that pod-secret-env.yaml continues to use an "allow all" policy
after these changes. #10033 must be solved before a more restrictive
policy will be generated for pod-secret-env.yaml.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Since #9904 was merged, newly introduced tests for `k8s-guest-pull-image-authenticated.bats`
have been failing on IBM SE (s390x). The agent fails to start because a kernel parameter
cannot pass to the guest VM via annotation. To fix this, the boot image must be rebuilt with
updated parameters.
This commit adds the rebuilding step in create_pod_yaml_with_private_image() for `qemu-se`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Similar to HotAttach, the HotDetach method signature for network
endoints needs to be changed as well to allow for the method to make
use of device manager to manage the hot unplug of physical network
devices.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Enable physical network interfaces to be hotplugged.
For this, we need to change the signature of the HotAttach method
to make use of Sandbox instead of Hypervisor. Similar approach was
followed for Attach method, but this change was overlooked for
HotAttach.
The signature change is required in order to make use of
device manager and receiver for physical network
enpoints.
Fixes: #8405
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
The current implementation for device binding using driver bind/unbind
and new_id fails in the scenario when the physical device is not bound
to a driver before assigning it to vfio.
There exists and updated mechanism to accomplish the same that does not
have the same issue as above.
The driver_override field for a device allows us to specify the driver for a device
rather than relying on the bound driver to provide a positive match of the
device. It also has other advantages referenced here:
https://patchwork.kernel.org/project/linux-pci/patch/1396372540.476.160.camel@ul30vt.home/
So use the updated driver_override mechanism for binding/unbinding a
physical device/virtual function to vfio-pci.
Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Move to blogbench 1.2 version from 1.1.
This version includes an important fix for the read_score test
which was reported to be broken in the previous version.
It essentially fixes this issue here:
https://github.com/jedisct1/Blogbench/issues/4
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
1. Use a container image that supports "ps --user 1000 -f".
2. Execute that command using:
sh -c "ps --user 1000 -f"
instead of passing additional arguments to sh:
sh -c ps --user 1000 -f
Fixes: #10019
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Keep track of individual exec args instead of joining them in the
policy text. Verifying each arg results in a more precise policy,
because some of the args might include space characters.
This improved validation applies to commands specified in K8s YAML
files using:
- livenessProbe
- readinessProbe
- startupProbe
- lifecycle.postStart
- lifecycle.preStop
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add tests for genpolicy's handling of container.exec_commands. These
are commands allowed by the policy and originating from these input
K8s YAML fields:
- livenessProbe
- readinessProbe
- startupProbe
- lifecycle.postStart
- lifecycle.preStop
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
We had a typo in the attestation tests that we've copied around a
lot and Wainer spotted it in the authenticated registry tests, so let's fix it up now
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add three new test cases for guest pull from an authenticated registry for
the following scenarios:
_**Scenario**: Creating a container from an authenticated image, with correct credentials via KBC works_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** a KBS set up to have the correct auth.json for
registry *quay.io/kata-containers/confidential-containers-auth* embedded in the `"Credential"` section of `its resources file`
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image works and the pod can start
_**Scenario**: Creating a container from an authenticated image, with incorrect credentials via KBC fails_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** An installed kata CC with the sample_kbs set up to have the auth.json for registry
*quay.io/kata-containers/confidential-containers-auth* embedded in the `"Credential"` resource, but with a dummy user name and password
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image fails with a message that reflects that the authorisation failed
_**Scenario**: Creating a container from an authenticated image, with no credentials fails_
**Given** An authenticated container registry *quay.io/kata-containers/confidential-containers-auth*
**And** a version of kata deployed with a guest image that has an agent with `guest_pull`
feature enabled and nydus-snapshotter installed and configured for
[guest-pulling](https://github.com/containerd/nydus-snapshotter/blob/main/misc/snapshotter/config-coco-guest-pulling.toml)
**And** An installed kata CC with no credentials section
**When** I create a pod from the container image *quay.io/kata-containers/confidential-containers-auth:test*
**Then** The pull image fails with a message that reflects that the authorisation failed
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
If the agent-config has a value for `image_registry_auth`,
Then pass this to the image-rs client and enable auth mode too
Fixes: #8122
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add optional config for agent.image_registry_auth, to specify
the uri of credentials to be used when pulling images in the guest
from an authenticated registry
Fixes: #8122
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR updates the devmapper docs by updating the url link
for the current containerd devmapper information.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Made run-k8s-tests-on-zvsi.yaml free of warnings by removing:
SC2086:info:1:1: Double quote to prevent globbing and word splitting ...
SC2086:info:2:1: Double quote to prevent globbing and word splitting ...
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Add setting to allow specifying the cpath for a mount source.
cpath is the root path for most files used by a container. For example,
the container rootfs and various files copied from the Host to the
Guest when shared_fs=none are hosted under cpath.
mount_source_cpath is the root of the paths used a storage mount
sources. Depending on Kata settings, mount_source_cpath might have the
same value as cpath - but on TDX for example these two paths are
different: TDX uses "/run/kata-containers" as cpath,
but "/run/kata-containers/shared/containers" as mount_source_cpath.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR updates the runtime v2 containerd url information at containerd
documentation.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the cri installation guide url link in the containerd
documentation guide as the previous url link does not exists.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The only kernel built for measured rootfs was the kernel-tdx-experimental,
so this test only ran in the qemu-tdx job runs the test.
In commit 6cbdba7 we switched all TEE configurations to use the same kernel-confidential,
so rootfs measured is disabled for qemu-tdx too now.
The VM still fails to boot (because of a different reason...) but the bug
in the assert_logs_contain, fixed in this PR was masking the checks on the logs.
We still have a few open issues related to measured rootfs and generating
the root hash, so let's skip this test that doesn't work until they are looked at
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add three new tests cases for guest-pull of an encrypted image
for the following scenarios:
_**Scenario: Pull encrypted image on guest with correct key works**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*
that is configured as a resource the KBS, so that image-rs on the guest can
connect to it
**When** I try and create a pod from *i*
**Then** The pod is successfully created and runs
_**Scenario: Cannot pull encrypted image with no decryption key**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*,
that is **not** configured in a KBS that image-rs on the guest can connect to
**When** I try and create a pod from *i*
**Then** The pod is not created with an error message that reflects why
_**Scenario: Cannot pull encrypted image with wrong decryption key**_
**Given** I have a version of kata deployed with a guest image that has
an agent with `guest_pull` feature enabled and nydus-snapshotter installed
and configured for guest-pulling
**And** A public encrypted container image *i* with a decryption key *k*
and a different key *k'* that is set as a resource in a KBS, that image-rs
on the guest can connect to
**When** I try and create a pod from *i*
**Then** The pod is not created with an error message that reflects why
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
ocicrypt config is for kata-agent to connect to CDH to request for image
decryption key. This value is specified by an env. We use this
workaround the same as CCv0 branch.
In future, we will consider better ways instead of writting files and
setting envs inside inner logic of kata-agent.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
- Enable the kata-cc-rustls-tls feature in image-rs, so that it
can get resources from the KBS in order to retrieve the registry
credentials.
- Also bump to the latest image-rs to pick up protobuf fixes
- Add libprotobuf-dev dependency to the agent packaging
as it is needed by the new image-rs feature
- Add extra env in the agent make test as the
new version of the anyhow crate has changed the backtrace capture thus unit
tests of kata-agent that compares a raised error with an expected one
would fail. To fix this, we need only panics to have backtraces, thus
set RUST_BACKTRACE=0 for tests due to document
https://docs.rs/anyhow/latest/anyhow/Fixes#9538
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The pipe needs adding to the grep, otherwise the grep
gets consumed as an argument to `print_node_journal` and
run in the debug pod.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Auto-generate the policy in k8s-security-context.bats - previously
blocked by lacking support for PodSecurityContext.runAsUser.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add ability to auto-generate policy for SecurityContext.runAsUser and
PodSecurityContext.runAsUser.
Fixes: #8879
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds the share fs information for dragonball using kata-ctl
to avoid the failures in runk tests saying that shared_fs is an
unbound variable.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Modify the permissions of containerd.sock just when genpolicy needs
access to this socket, when testing GENPOLICY_PULL_METHOD=containerd.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Explain why the containerd settings on the local machine get set to
containerd's defaults when testing GENPOLICY_PULL_METHOD=containerd.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
oci-distribution is the value used by run-k8s-tests-on-aks.yaml, so
use the same value as default for GENPOLICY_PULL_METHOD in gha-run.sh.
The value of GENPOLICY_PULL_METHOD is currently compared just with
"containerd", but avoid possible future problems due to using a
different default value in gha-run.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Currently, it is not viable to share a writable volume (e.g., emptyDir)
between containers in a single pod for IBM SE.
The following tests are relevant:
- pod-shared-volume.bats
- k8s-empty-dirs.bats
(See: https://github.com/kata-containers/kata-containers/issues/10002)
This commit skips the tests until the issue is resolved.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Sets SharedFS config to NoSharedFS for remote hypervisor in order to start the file watcher which syncs files from the host to the guest VMs.
Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
GH-9973 introduced:
* New function get_kata_memory_and_vcpus() in
tests/metrics/lib/common.bash.
* A call to get_kata_memory_and_vcpus() from extract_kata_env(), which
is defined in tests/common.bash.
Because the nydus test only sources tests/common.bash, it can't find
get_kata_memory_and_vcpus() and errors out.
We fix this by moving the get_kata_memory_and_vcpus() call from
tests/common.bash to tests/metrics/lib/json.bash so that it doesn't
impact the nydus test.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR increases the timeout to run the CoCo TDX tests in order
to avoid the random failures on TDX saying that
The action 'Run tests' has timed out after 30 minutes and making
the GHA job fail.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
For running a KBS with `se-verifier` in service,
specific credentials need to be configured.
(See https://github.com/confidential-containers/trustee/tree/main/attestation-service/verifier/src/se for details.)
This commit introduces two procedures to support IBM SE attestation:
- Prepare required files and directory structure
- Set necessary environment variables for KBS deployment
- Repackage a secure image once the KBS service address is determined
These changes enable `k8s-confidential-attestation.bats` for s390x.
Fixes: #9933
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Currently, all functions in `build_se_image.sh` are dedicated to
publishing a payload image. However, `build_secure_image()` is now
also used for repackaging a secure image when a kernel parameter
is reconfigured. This reconfiguration is necessary because the KBS
service address is determined after the initial secure image build.
This commit extracts `build_secure_image()` from `build_se_image.sh`
and creates a separate library, which can be loaded by bats-core.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The current KBS deployment creates a file `key.bin` assuming that
`kustomization.yaml` is located in `overlays/`.
However, this does not hold true when the kustomize config is enabled
for multiple architectures. In such cases, the configuration file
should be located in `overlays/$(uname -m)`.
This commit changes the location for file creation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As part of the enablement for s390x, KBS should support multi-arch deployment.
This commit updates the version of coco-trustee to a commit where the support
is implemented.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The DAN feature has already been implemented in kata-runtime-rs, and
this commit brings the same capability to the Go kata-runtime.
Fixes: #9758
Signed-off-by: Lei Huang <leih@nvidia.com>
Delete the kata-containers-k8s-tests namespace before resetting the namespace
to ensure that no deployments or services are restarting and creating pods in the default namespace.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Signed-off-by: Wang, Arron <arron.wang@intel.com>
Delete test scripts forcely in `Delete kata-deploy` step before
deleting all kata pods.
Fixes: #9980
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
In dragonball Vfio device passthrough scenarois, the first passthrough
device will be allocated slot 0 which is occupied by root device.
It will cause error, looks like as below:
```
...
6: failed to add VFIO passthrough device: NoResource\n
7: no resource available for VFIO device"): unknown
...
```
To address such problem, we adopt another method with no pre-allocated
guest device id and just let dragonball auto allocate guest device id
and return it to runtime. With this idea, add_device will return value
Result<DeviceType> and apply the change to related code.
Fixes#9813
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This PR removes the common_init function call from the memory
usage script to eliminate duplicate checking that is also done
from the init_env function.
It also eliminates duplicaction of nested conditionals.
Fixes: #9984
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR removes the CI variable from build kernel script which
is not longer supported it as this was part of the jenkins
environment.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable in kata deploy in docker script
which was supported it in jenkins environment which is not
longer being supported it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable from the local build makefile as
this was part of the jenkins environment which is not longer supported
it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable in test images script for osbuilder
as this was part of the jenkins environment which is not longer supported
it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable in test config osbuilder script
which was supported on the jenkins environment which is not
longer supported it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
[greg: squash all fixes into a single patch]
Signed-off-by: Greg Kurz <groug@kaod.org>
This PR removes the use_devmapper variable which was part of the jenkins
environment flags which is not longer support it or available for the
cri-containerd tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR retrieves the free memory and the vcpus count from
a kata container and includes them to the json results file of
any metric.
Additionally this PR parses the requested vcpus quantity and the
requested amount memory from kata configuration file and includes
this pair of values into the json results file of any metric.
Finally, the file system defined in the kata configuration file
is included in the results template.
Fixes: #9972
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR rephrased the description and usage of certain functions
as such as:
- set_kata_configuration_performance
- set_kata_config_file
- get_current_kata_config_file
- check_if_root
- check_ctr_images
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
run-k8s-tests-on-zvsi runs the coco tests and we've added new
secrets to provide credentials for the authenticated image testing,
so we need to let the zvsi job inherit these from the caller workflow
like the rest of the coco tests
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fixes: #9532
Instead of call agent.close_stdin in close_io, we call agent.write_stdin
with 0 len data when the stdin pipe ends.
Signed-off-by: Tim Zhang <tim@hyper.sh>
The sealed secret test depends on the KBS to provide
the unsealed value of a vault secret.
This secret is provisioned to an environment variable.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
To test unsealing secrets stored in environment variables,
we create a simple test server that takes the place of
the CDH. We start this server and then use it to
unseal a test secret.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
When sealed-secret is enabled, the Kata Agent
intercepts environment variables containing
sealed secrets and uses the CDH to unseal
the value.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
This PR removes the CI variable from QAT run script which was used
in the jenkins environment and not longer used.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable as well as the instructions related
to this as this was part of the jenkins environment which is not
longer supported it.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable as well as the instructions related
to this variable which was used on the jenkins environment and not
longer supported.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the Intel QAT documentation by removing the CI variable
which is not longer being supported as this was part of the jenkins
CI environment.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the CI variable which is not longer being used or valid
in the kata containers repository. The CI variable was used when we
were using jenkins and scripts setups which are not longer supported.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the pip installation for nerdctl by removing a flag
which is not longer supported and avoid the failure of
no such option: --break-system-packages.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- Due to the error we hit with pulling the agnhost
image used in the liveness-probe tests, we want to leave
the console printing to help with debug when we next try
to bump the image-rs version
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Bump the commit of image-rs we are pulling in to 413295415
Note: This is the last commmit before a change to whiteout handling
was introduced that lead to the error `'failed to unpack: convert whiteout"`
when pulling the agnhost:2.21 image
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Let me explain why:
In our previous approach, we implemented guest pull by passing PullImageRequest to the guest.
However, this method resulted in the loss of specifications essential for running the container,
such as commands specified in YAML, during the CreateContainer stage. To address this,
it is necessary to integrate the OCI specifications and process information
from the image’s configuration with the container in guest pull.
The snapshotter method does not care this issue. Nevertheless, a problem arises
when two containers in the same pod attempt to pull the same image, like InitContainer.
This is because the image service searches for the existing configuration,
which resides in the guest. The configuration, associated with <image name, cid>,
is stored in the directory /run/kata-containers/<cid>. Consequently, when the InitContainer finishes
its task and terminates, the directory ceases to exist. As a result, during the creation
of the application container, the OCI spec and process information cannot
be merged due to the absence of the expected configuration file.
Fixes: kata-containers#9665
Fixes: kata-containers#9666
Fixes: kata-containers#9667
Fixes: kata-containers#9668
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Currently, the image is pulled by image-rs in the guest and mounted at
`/run/kata-containers/image/cid/rootfs`. Finally, the agent rebinds
`/run/kata-containers/image/cid/rootfs` to `/run/kata-containers/cid/rootfs` in CreateContainer.
However, this process requires specific cleanup steps for these mount points.
To simplify, we reuse the mount point `/run/kata-containers/cid/rootfs`
and allow image-rs to directly mount the image there, eliminating the need for rebinding.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
- Add a check in setup_bundle to see if the bundle already exists
and if it does then skip the setup.
This commit is cherry-picked from 44ed3ab80e.
The reason that k8s-kill-all-process-in-container.bats failed is that
deletion of the directory `/root/kata-containers/cid/rootfs` failed during removing container
because it was mounted twice (one in image-rs and one in set_bundle ) and only unmounted once in removing container.
Fixes: #9664
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Dave Hay <david_hay@uk.ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
After v5.14 there is no cpu_hotplug_begin function
now cpus_write_lock same for cpu_hotplug_done = cpus_write_unlock
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This PR has better variable definitons as well the use of a variable
which is already defined in the metrics common script for soak parallel
test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR is for better variable definitions as well as the use of the
CTR_EXE variable which is already defined in the metrics common script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR uses the CTR_EXE which is already defined in the metrics common
script to have uniformity across the multiple stability tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
To unseal a secret, the Kata agent will contact the CDH
using ttRPC. Add the proto that describes the sealed
secret service and messages that will be used.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Biao Lu <biao.lu@intel.com>
We have not seen instances of the nydus snapshotter hanging on its
deletion that we must patch its finalize.
Let's just drop this line for now.
Signed-Off-By: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add a basic runtime-rs `Hypervisor` trait implementation for
AWS Firecracker
- Add basic hypervisor operations (setup / start / stop / add_device)
- Implement AWS Firecracker API on a separate file `fc_api.rs`
- Add support for running jailed (include all sandbox-related content)
- Add initial device support (limited as hotplug is not supported)
- Add separate config for runtime-rs (FC)
Notes:
- devmapper is the only snapshotter supported
- to account for no sharefs support, we copy files in the sandbox (as
in the GO runtime)
- nerdctl spawn is broken (TODO: #7703)
Fixes: #5268
Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk>
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk>
Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>
Some resource names seem to be lingering in Azure limbo but do not map
to any actual resources, so we ignore those.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This PR removes the CI_JOB variable which previously was used but
not longer being supported of the metrics sysbench test.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the jenkins reference from unit testing presentation
as this is not longer supported on the kata containers project.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR updates the container name to put a random name instead
of using a hard coded name. This PR is a general improvement
to avoid random bug failures specially when we are running on
baremetal environments.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Test reports that it is a onednn test when it is openvino; update
description.
Fixes: #9948
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
This commit extends the vfio-ap hotplug test to include the use of `zcrypttest`.
A newly introduced test by the tool consists of several test rounds as follows:
- ioctl_test
- simple_test
- simple_one_thread_test
- simple_multi_threads_test
- multi_thread_stress_test
- hang_after_offline_online_test
A writable root filesystem is required for testing because the reference count
needs to be reset after each test round. The current containerd kata containers
support does not include `--privileged_without_host_devices`, which is necessary
to configure a writable filesystem along with `--privileged`. (Please check out
https://github.com/kata-containers/kata-containers/issues/9791 for details)
So `crictl` is chosen to extend the test.
The commit also includes the removal of old commands previously used for the
tests repository but no longer in use.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit copies an internal testing tool `zcrypttest` to the
test image. A base image is changed to `ubuntu:22.04` due to a
library dependency issue.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit adds the missing step of passing an attached vfio-ap device
to a container via ociSpec. It instantiates and passes a vfio-ap device
(e.g. a Z crypto device).
A device at `/dev/z90crypt` covers all use cases at the time of writing.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Do not install the packages librados-dev and librbd-dev as they are not needed for building static qemu.
Add machine option cap-ail-mode-3=off while creating the VM to qemu cmdline.
Fixes: #9893
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
The CI is failing with:
```
Invalid workflow file: .github/workflows/cleanup-resources.yaml#L10
The workflow is not valid. .github/workflows/cleanup-resources.yaml (Line: 10, Col: 5): Unexpected value 'secrets'
```
I think this is because `secrets: inherit` is only applicable
when re-using a workflow, not for a standalone job like
we have here.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For docker-based builds only install Rust when necessary.
Further, execute the detect Rust version check only when
intending to install Rust.
As of today, this is the case when we intend to build the
agent during rootfs build.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
Per the decision taken in the 6/27 AC meeting, this PR temporarily
disables kata-deploy and GARM tests until we secure further Azure CI
funding.
In the meantime, I'll transition the GARM tests to free runners and
reenable them to regain that coverage without affecting spending (see
#9940). If it turns out the free runners are too slow, we'll switch back
to GARM.
After funding is secured, we'll reenable the kata-deploy tests (see
#9939).
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
As part of archiving the tests repo, we are eliminating the dependency on
`clone_tests_repo()`. The scripts using the function is as follows:
- `ci/install_rust.sh`.
- `ci/setup.sh`
- `ci/lib.sh`
This commit removes or replaces the files, and makes an adjustment accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
A multi-arch image for `alpine-bash-curl` has been pushed to and available
at `quay.io/kata-containers`.
This commit switches the test image to `quay.io/kata-containers/alpine-bash-curl`.
Fixes: #9935
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The following scripts are not used by the repository any more:
- ci/install_go.sh
- ci/run.sh
- ci/install_vc.sh
Additionally, they rely on the tests repo, which is soon to be archived.
This commit drops the unused scripts.
Fixes: #8507
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
In case we are dealing with multiple interfaces and there exists a
network interface with a conflicting name, we temporarily rename it to
avoid name conflicts.
Before doing this, we need to rename bring the interface down.
Failure to do so results in netlink returning Resource busy errors.
The resource needs to be down for subsequent operation when the name is
swapped back as well.
This solves the issue of passing multiple networks in case of nerdctl
as:
nerdctl run --rm --net foo --net bar docker.io/library/busybox:latest ip a
Fixes: #9900
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This patch re-generates the client code for Cloud Hypervisor v40.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #9929
Signed-off-by: Bo Chen <chen.bo@intel.com>
Observed instability in the API server after deploying kata-deploy caused test failures.
(see: https://github.com/kata-containers/kata-containers/actions/runs/9681494440/job/26743286861)
Specifically, `kubectl_retry logs` failed before the API server could respond properly.
This commit increases the interval and max_tries for kubectl_retry(), allowing sufficient
time to handle this situation.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
In some DMZ'ed or CI systems the repos are not up to date
and multistrap fails to find the ubuntu-keyring package.
Update the repos to fix this;
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This PR increases the timeout to crictl calls on kata monitor
tests to avoid to hit issues every now and avoid random failures.
This PR is very similar to PR #7640.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Updated genpolicy settings to allow 2 empty environment variables that
may be forgotten to specify (AZURE_CLIENT_ID and AZURE_TENANT_ID)
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This is the first part of adding a job to clean up potentially dangling
Azure resources. This will be based on Jeremi's tool from
https://github.com/jepio/kata-azure-automation.
At first, we'll only clean up AKS clusters, as this is what has been
causing us problems lately, but this could very well be extended to
cleaning up entire resource groups, which is why I left the different
names pretty generic (i.e. "resources" instead of "clusters").
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
- We only want to enable the sample verifier in the KBS for non-TEE
tests, so prevent an edge case where the TEE platform isn't set up
correctly and we might fall back to the sample and get false positives.
To prevent this we add guards around the sample policy enablement and
only run it for non confidential hardware
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Add the `AUTHENTICATED_IMAGE_USER` and
`AUTHENTICATED_IMAGE_PASSWORD` repository secrets as env vars
to the coco tests, so we can use them to pull an images from
and authenticated registry for testing
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Add --layers-cache-file-path flag to allow the user to
specify where the cache file for the container layers
is saved. This allows e.g. to have one cache file
independent of the user's working directory.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
While enabling the attestation for IBM SE, it was observed that
a kernel config `CONFIG_S390_UV_UAPI` is missing.
This config is required to present an ultravisor in the guest VM.
Ths commit adds the missing config.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
we do encourage people to set the KATA_RUNTIME, but it is only used by
the webhook. Let's define it in the main `test.sh` and use it in the
smoke test to ensure the user-defined runtime is smoke-tested rather
than hard-coded kata-qemu one.
Related to: #9804
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Now, using vanilla kubernetes, let's re-enable the TDX CI and hope it
becomes more stable than it used to be.
The cleanup-snapshotter is now taking ~4 minutes, and that matches with
the other platforms, mainly considering there's a sum of 210 seconds
sleep in the process.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Move the `sandbox.agent.setPolicy` call out of the remoteHypervisor
if, block, so we can use the policy implementation on peer pods
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
get_vmm_master_tid() currently returns an error with the message "cannot
get qemu pid (though it seems running)" when it finds a valid
QemuInner::qemu_process instance but fails to extract the PID out of it.
This condition however in fact means that a qemu child process was running
(otherwise QemuInner::qemu_process would be None) but isn't anymore (id()
returns None).
Signed-off-by: Pavel Mores <pmores@redhat.com>
Since Hypervisor::stop_vm() is called from the WaitProcess request handling
which appears to be per-container, it can be called multiple times during
kata pod shutdown. Currently the function errors out on any subsequent
call after the initial one since there's no VM to stop anymore. This
commit makes the function tolerate that condition.
While it seems conceivable that sandbox shouldn't be stopped by WaitProcess
handling, and the right fix would then have to happen elsewhere, this
commit at least makes qemu driver's behaviour consistent with other
hypervisor drivers in runtime-rs.
We also slightly improve the error message in case there's no
QemuInner::qemu_process instance.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Since no objections were raised in the linked issue (#9847) this commit
removes the attempt to derive sandbox bundle path from container bundle
path. As described in more detail in the linked issue, this is container
runtime specific and doesn't seem to serve any purpose.
As for implementation, we hoist the only part of
get_shim_info_from_sandbox() that's still useful (getting the socket
address) directly into the caller and remove the function altogether.
Fixes#9847
Signed-off-by: Pavel Mores <pmores@redhat.com>
We've noticed a bunch of issues related to deploying and deleting the
nydus-snapshotter. As we don't see the same issues on other machines
using vanilla kubernetes, let's avoid using k3s for now follow the flow.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the variables names for the network that was created as well
removes the network that were created for the tests to ensure a clean environment
when running all the tests and avoid failures specially on baremental environments
that network already exists.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Reject CreateContainerRequest field values that are not tested by
Kata CI and that might impact the confidentiality of CoCo Guests.
This change uses a "better safe than sorry" approach to untested
fields. It is very possible that in the future we'll encounter
reasonable use cases that will either:
- Show that some of these fields are benign and don't have to be
verified by Policy, or
- Show that Policy should verify legitimate values of these fields
These are the new CreateContainerRequest Policy rules:
count(input.shared_mounts) == 0
is_null(input.string_user)
i_oci := input.OCI
is_null(i_oci.Hooks)
is_null(i_oci.Linux.Seccomp)
is_null(i_oci.Solaris)
is_null(i_oci.Windows)
i_linux := i_oci.Linux
count(i_linux.GIDMappings) == 0
count(i_linux.MountLabel) == 0
count(i_linux.Resources.Devices) == 0
count(i_linux.RootfsPropagation) == 0
count(i_linux.UIDMappings) == 0
is_null(i_linux.IntelRdt)
is_null(i_linux.Resources.BlockIO)
is_null(i_linux.Resources.Network)
is_null(i_linux.Resources.Pids)
is_null(i_linux.Seccomp)
i_linux.Sysctl == {}
i_process := i_oci.Process
count(i_process.SelinuxLabel) == 0
count(i_process.User.Username) == 0
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR improves the variable definition in memory inside
the container script for metrics. This change declares and assigns
the variables separately to avoid masking return values.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR will avoid the failures when collecting artifacts for the gha.
This will ensure that we collect and archive system's data for the
purpose of debugging.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
For a minimal initrd/image build we may want to leverage busybox.
This is part number two of the NVIDIA initrd/image build
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
TDX CI has been having some issues with the Nydus snapshotter cleanup,
which has been stuck for hours depending every now and then.
With this in mind, let's disable the TDX CI, so we avoid it blocking the
progress of Kata Containers project, and we re-enable it as soon as we
have it solved on Intel's side.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Commit 'ca02c9f5124e' implements the vhost-user-blk reconnection functionality,
However, it has missed assigning VhostUserDeviceReconnect when new the QEMU
HypervisorConfig, resulting in VhostUserDeviceReconnect always set to default value 0.
Real change is this line, most of changes caused by go format,
return vc.HypervisorConfig{
// ...
VhostUserDeviceReconnect: h.VhostUserDeviceReconnect,
}, nil
Fixes: #9848
Signed-off-by: markyangcc <mmdou3@163.com>
This is a counterpart of commit abf52420a4 for the qemu-coco-dev
configuration. By allowing default_vcpu and default_memory annotations
users can fine-tune the VM based on the size of the container
image to avoid issues related with pulling large images in the guest.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Just like the TEE configurations (sev, snp, tdx) we want to have the
qemu-coco-dev using shared_fs=none.
Fixes: #9676
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The following error was observed during the deployment of nydus snapshotter:
```
Error from server (NotFound):
the server could not find the requested resource ( pods/log nydus-snapshotter-5v82v)
'kubectl logs nydus-snapshotter-5v82v -n nydus-system' failed after 3 tries
Error: Process completed with exit code 1.
```
This error can occur when a pod is re-created by a daemonset during the retry interval.
This commit addresses the issue by using `--selector` rather than the pod name
for `kubectl logs/describe`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The following jobs have failed more than 50% on nightly CI.
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, k0s)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (clh, rke2)
run-kata-deploy-tests-on-garm / run-kata-deploy-tests (qemu, k0s)
Instead of removing only those jobs, let's skip the kata-deploy-tests
on GARM completely so we can try to fix all the issues (or maybe
drop the jobs altogether).
Issue: #9854
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
To build the build-kata-deploy image, it should be copied ci/install_yq.sh to
tools/packaging/kata-deploy/local-build/dockerbuild as this script will install
yq within the image. Currently, if
tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh exists then
make won't copy it again. This can raise problems as, for example, the current
update of yq version (commit c99ba42d) in ci/install_yq.sh won't force the
rebuild of the build-kata-deploy image.
Note: this isn't a problem on a fresh dev or CI environment.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019
If possible it would be good to add the many runtime-rs creates into the
runtime-rs workspace and provide a centralised version to avoid the updates
in many places.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: #9853
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: #9852
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
The clh variation was disabled on commit 5f5274e699 so this change will
actually result on all the VFIO jobs disabled. Instead of delete the entire
entry from this workflow yaml (or comment the entry), I preferred to use
`if: false` which will make the jobs appear on the UI as skipped.
Issue: 9851
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
- Fix the lint error:
```
error: you seem to use `.enumerate()` and immediately discard the index
--> src/device_manager/mod.rs:427:33
|
427 | for (_index, device) in self.virtio_devices.iter().enumerate() {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
by removing the unnecessary enumerate
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The ci failed with:
```
error: use of `or_insert_with` to construct default value
--> src/address_space_manager.rs:650:14
|
650 | .or_insert_with(NumaNode::new);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `or_default()`
|
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We take advantage of the Inner pattern to enable QemuInner::resize_vcpu()
take `&mut self` which we need to call non-const functions on Qmp.
This runs on Intel architecture but will need to be verified and ported
(if necessary) to other architectures in the future.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The QMP_SOCKET_FILE constant in cmdline_generator.rs is made public to make
it accessible from QemuInner. This is fine for now however if the constant
needs to be accessed from additional places in the future we could consider
moving it to somewhere more visible.
The Debug impl for Qmp is empty since first, we don't actually want it,
it's only forced by Hypervisor trait bounds, and second, it doesn't have
anything to display anyway. If Qmp gets any members in the future that
can be meaningfully displayed they should be handled by Qmp's Debug::fmt().
Signed-off-by: Pavel Mores <pmores@redhat.com>
The constructor handles QMP connection initialisation, too, so there can
be non-functional Qmp instance.
Signed-off-by: Pavel Mores <pmores@redhat.com>
We set the VERSION variable consistently across Makefiles to
'unknown' if the file is empty or not present.
We also use git commands consistently for calculating the COMMIT,
COMMIT_NO variables, not erroring out when building outside of
a git repository.
In create_summary_file we also account for a missing/empty VERSION
file.
This makes e.g. the UVM build process in an environment where we
build outside of git with a minimal/reduced set of files smoother.
Signed-off-by: Manuel Huber <mahuber@microsoft.com>
The following tests are disabled because they fail (alike with dragonball):
- k8s-cpu-ns.bats
- k8s-number-cpus.bats
- k8s-sandbox-vcpus-allocation.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
There is a known issue in qemu 7.2.0 that causes kernel-hashes to fail the verification of the launch binaries for the SEV legacy use case.
Upgraded to qemu 8.2.4.
new available features disabled.
Fixes: #9148
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
When the total number of files observed is greater than limit, return -1 directly.
runtime has fixed this bug, it should b ported to runtime-rs.
Fixes:#9829
Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
It shouldn't call the initial_size_manager's setup_config
in the load_config since it had been called in the sandbox's
try_init function.
Fixes: #9778
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For kata container, the container's pid is meaning less to
containerd/crio since the container's pid is belonged to VM,
and containerd/crio couldn't use it. Thus we just return any
tid of kata shim or hypervisor. But since the hypervisor had
been stopped before deleting the container, and it wouldn't
get the hypervisor's tid for some supported hypervisor, thus
we'd better to return the kata shim's pid instead of hypervisor's
tid.
Fixes: #9777
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR uses the nodeport deployment from upstream trustee.
To ensure our deployment is as close to upstream trustee replace
the custom nodeport handling and replace it with nodeport
kustomized flavour from the trustee project.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR makes general improvements like definition of variables and
the use of them to improve the general setup script for kubernetes
tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
While running with a remote hypervisor, whenever kata-monitor tries to access
metrics from the shim, the shim does a "panic" and no metric can be gathered.
The function GetVirtioFsPid() is called on metrics gathering, and had a call
to "panic()". Since there is no virtiofs process for remote hypervisor, the
right implementation is to return nil. The caller expects that, and will skip
metrics gathering for virtiofs.
Fixes: #9826
Signed-off-by: Julien Ropé <jrope@redhat.com>
This corrects the warning to point to the \`-j\` flag,
which is the correct flag for the JSON settings file.
Previously, the warning was confusing, as it pointed to
the \`-p\` flag, which specifies to the path for the Rego ruleset.
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
This PR uses the function definition to have uniformity across
all the launch times script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This patch re-generates the client code for Cloud Hypervisor v39.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #8694, #9574
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch upgrades Cloud Hypervisor to v39.0 from v36.0, which contains
fixes of several security advisories from dependencies. Details can be
found from #9574.
Fixes: #8694, #9574
Signed-off-by: Bo Chen <chen.bo@intel.com>
Start testing the ability of kata-deploy to install and configure
the qemu-runtime-rs runtimeClass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Allow kata-deploy to install and configure the qemu-runtime-rs runtimeClass
which ties to qemu hypervisor implementation in rust for the runtime-rs.
Fixes: #9804
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
- Update the config parsing logic so that when reading from the
agent-config.toml file any envs are still processed
- Add units tests to formalise that the envs take precedence over values
from the command line and the config file
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When the total number of files observed is greater than limit, return (-1, err).
When the returned err is not nil, the func countFiles should return -1.
Fixes:#9780
Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
fixes#9810
Add an annotation to the enum values in the agent config that will
deserialize them using a kebab-case conversion, aligning the behaviour
to parsing of params specified via kernel cmdline.
drive-by fix: add config override for guest_component_procs variable
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Frequent errors have been observed during k8s e2e tests:
- The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
- Error from server (ServiceUnavailable): the server is currently unable to handle the request
- Error from server (NotFound): the server could not find the requested resource
These errors can be resolved by retrying the kubectl command.
This commit introduces a wrapper function in common.sh that runs kubectl up to 3 times
with a 5-second interval. Initially, this change only covers gha-run.sh for Kubernetes.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop skipping the CDH tests for TDX, as know we should have an
environmemnt where it can run and should pass. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We must ensure we use the host ip to connect to the PCCS running on the
host side, instead of using localhost (which has a different meaning
from inside the KBS pod).
The reason we're using `hostname -i` isntead of the helper functions, is
because the helper functions need the coco-kbs deployed for them to
work, and what we do is before the deployment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(1) As mis-use of cap.set causing previous Caps lost which
causing assert! failed, just replacing cap.set with cap.add.
(2) It will return error if there's no such name setting when
do update_config_by_annotation {
...
if config.runtime.name.is_empty() {
return Err(io::Error::new(
io::ErrorKind::InvalidData,
"Runtime name is missing in the configuration",
));
}
...
}
Fixes#9783
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Centos8 is EOL and repos are not available anymore. Centos9 contains the
same packages and should do well as a base for testing.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This is generally useful debug output on test failures,
and specifically this has been useful for nydus-related
issues recently.
Signed-off-by: Chris Porter <porter@ibm.com>
This is required to support SEV-SNP confidential container with kernel-hashes.
Since this ovmf is latest stable version, it is good to upgrade for tdx
and Vanilaa builds too.
Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
This is required to provide the hashes of kernel, initrd and cmdline
needed during the attestation of the coco.
Fixes: #9150
Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
A new PULL_TYPE environment variable is recognized by the kata-deploy's
install script to allow it to configure CRIO-O for guest-pull image pulling
type.
The tests/integration/kubernetes/gha-run.sh change allows for testing it:
```
export PULL_TYPE=guest-pull
cd tests/integration/kubernetes
./gha-run.sh deploy-k8s
```
Fixes#9474
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR replaces the name to use a variable that is already defined
to have a better uniformity across the general script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In line with the changes for x86_64, the k8s nydus test for s390x should
also use `qemu-coco-dev` for `KATA_HYPERVISOR`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It was observed that the `node_start_time` value is sometimes empty,
leading to a test failure.
This commit retries fetching the value up to 3 times.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The Kata blog was recently moved to the project's website. The content
of the blog is stored together with the rest of the website source on
GitHub.
This patch adds a short guide that describes how to submit a new
blog post as a PR, to appear on the project's website.
Signed-off-by: Ildiko Vancsa <ildiko.vancsa@gmail.com>
This test fails when using `shared_fs=none` with the nydus snapshotter.
Issue tracked here: #9666
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test is failing due to the initContainers not being properly
handled with the guest image pulling.
Issue tracked here: #9668
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test fails when using `shared_fs=none` with the nydus napshotter,
Issue tracked here: #9664
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
To prevent download failures caused by high traffic to the Docker image,
opt for quay.io/prometheus/busybox:latest over docker.io/library/busybox:latest .
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Added configuration file with rules to exclude some self-hosted
runners from the linter warnings.
Related-with: #9646
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: 9764
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
These jobs have failed more than 50% on nightly CI. Remove them from the list of
execution until we don't have a fix.
Issue: 9763
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: 9761
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
For nerdctl install, add symlinks for runc and slirp4netns in the
binary install path.
runc link comes in handy for running runc containers with nerdctl fir
quick tests.
slirp4netns allows for running containers with user mode networking
useful in case of rootless containers.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
CreateContainerRequest objects can specify devices to be created inside
the guest VM. This change ensures that requested devices have a
corresponding entry in the PodSpec.
Devices that are added to the pod dynamically, for example via the
Device Plugin architecture, can be allowlisted globally by adding their
definition to the settings file.
Fixes: #9651
Signed-off-by: Markus Rudy <mr@edgeless.systems>
This adds structs and fields required to parse PodSpecs with
VolumeDevices and PVCs with non-default VolumeModes.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
If a test is failing during setup, makes no much sense to run the suite.
Let's skip and add some debug messages.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
End of file should not end with --- mark. This will confuse tools like
yq and kubectl that might think this is another object.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.
Fixes#9354
Depends-on:github.com/kata-containers/tests#5818
Signed-off-by: Beraldo Leal <bleal@redhat.com>
golang.mk is not ready to deal with non GOPATH installs. This is
breaking test on s390x.
Since previous steps here are installing go and yq our way, we could
skip this aditional check. A full refactor to golang.mk would be needed
to work with different paths.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
fixes#9748
A configuration option `guest_component_procs` has been introduced that
indicates which guest component processes are supposed to be spawned by
the agent. The default behaviour remains that all of those processes are
actively spawned by the agent. At the moment this is based on presence
of binaries in the rootfs and the guest_component_api_rest option.
The new option is incremental:
none -> attestation-agent -> confidential-data-hub -> api-server-rest
e.g. api-server-rest implies attestation-agent and confidential-data-hub
the `none` option has been removed from guest_component_api_rest, since
this is addresses by the introduced option.
To not change expected behaviour for non-coco guests we still will still
only attempt to spawn the processes if the requested attestation binaries
are present on the rootfs, and issue in warning in those cases.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
We are currently building Oras from source on ppc64le. Now that they offically release the artefacts
for power, consume them to install Oras.
Fixes: #9213
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Wainer noticed this is failing for the coco-qemu-dev case, and decided
to skip it, notifying me that he didn't fully understand why it was not
failing on TDX.
Turns out, though, this is also failing on TDX, and we need to skip it
there as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add the CLI flag --runtime-class-names, which is used during
policy generation. For resources that can define a
runtimeClassName (e.g., Pods, Deployments, ReplicaSets,...)
the value must have any of the --runtime-class-names as
prefix, otherwise the resource is ignored.
This allows to run genpolicy on larger yaml
files defining many different resources and only generating
a policy for resources which will be deployed in a
confidential context.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Use the variable BATS_TEST_COMPLETED which is defined by the bats framework
when the test finishes. `BATS_TEST_COMPLETED=` (empty) means the test failed,
so the node syslogs will be printed only at that condition.
Fixes: #9750
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9667
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's cloning the nydus-snapshotter repo from the version specified in
versions.yaml, however, the deployment files are set to pull in the
latest version of the snapshotter image. With this version we are
pinning the image version too.
This is a temporary fix as it should be better worked out at nydus-snapshotter
project side.
Fixes: #9742
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9668
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9666
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9664
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Unlike other tests that I've seen failing on this scenario, k8s-seccomp.bats
fails after a couple of consecutive executions, so it's that kind of failure
that happens once in a while.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This will enable the k8s tests to leverage guest pulling when
PULL_TYPE=guest-pull for qemu-coco-dev runtimeclass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The runtime handler annotation is required for Kubernetes <= 1.28 and
guest-pull pull type. So leverage $PULL_TYPE (which is exported by CI jobs)
to conditionally apply the annotation.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It creates this line, as the Golang runtime does:
-object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0
Signed-off-by: Emanuel Lima <emlima@redhat.com>
We need to remove the device from the tracking map, a container
restart will increment the bus index and we will get out of root-ports
and crash the machine.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need special handling for pod_sandbox, pod_container and
single_container how and when to inject CDI devices
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigates
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.
Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched contianerd.
We want to eliminate the patched continerd in the GPU case
as well.
Fixes: #8860
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Let's see if it helps with issues like:
```
error: must build at directory: not a valid directory: evalsymlink
failure on
'"/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/../../..//tools/packaging/kata-deploy/kata-cleanup/overlays/k0s"'
: lstat
/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/":
no such file or directory
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This takes a few minutes that could be saved, so let's avoid doing this
on all the platforms, but simply do this when it's needed (the baremetal
use case).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently only "baremetal" runs all the tests, but we could easily run
"all" locally or using the github provided runners, even when not using
a "baremetal" system.
The reason I'd like to have a differentiation between "all" and
"baremetal" is because "baremetal" may require some cleanup, which "all"
can simply skip if testing against a fresh created VM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For now we've only exposed the option to deploy kata-deploy for k3s and
vanilla kubernetes when using containerd.
However, I do need to also deploy k0s and rke2 for an internal CI, and
having those exposed here do not hurt, and allow us to easily expand the
CI at any time in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s was added to kata-deploy, but it's kata-cleanup counterpart was
never added. Let's fix it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s deployment has been broken since we moved to using `tomlq` in our
scripts. The reason is that before using `tomlq` our script would,
involuntarily, end up creating the file.
Now, in order to fix the situation, we need to explicitly create the
file and let `tomlq` add the needed content.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This option allows to add a debug monitor socket when
`enable_debug = true` to control QEMU within debugging case.
Fixes: #9603
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The `update_env_pci()` function need the PCI address mapping to
translate the host PCI address to guest PCI address in below
environment variables:
- PCIDEVICE_<prefix>_<resource-name>_INFO
- PCIDEVICE_<prefix>_<resource-name>
So collect PCI address mapping for both vfio-pci-gk and
vfio-pci devices.
Fixes#9614
Signed-off-by: Lei Huang <leih@nvidia.com>
The k8s test suite halts on the first failure, i.e., failing-fast. This
isn't the behavior that we used to see when running tests on Jenkins and it
seems that running the entire test suite is still the most productive way. So
this disable fail-fast by default.
However, if you still wish to run on fail-fast mode then just export
K8S_TEST_FAIL_FAST=yes in your environment.
Fixes: #9697
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test has failed in confidential runtime jobs. Skip it
until we don't have a fix.
Fixes: #9663
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
One of our machines is running CentOS 9 Stream, and we could easily
verify that we can build and install the kbs client there, thus we're
expanding the installation script to also support CentOS 9 Stream.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`externals.coco-kbs.toolchain` is not defined, get the rust_version from
`externals.coco-trustee.toolchain` instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're bumping the version in order to bring in the customisation needed
for setting up a custom pccs, which is needed for the KBS integration
tests with Kata Containers + TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR enables the installation and unistallation of the kbs client
as well as general coco components needed for the TDX GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the minvalue for boot time to avoid the random failures
of the GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code.
Fixes:#9692
Signed-off-by: sidneychang <2190206983@qq.com>
The `dial_timeout` works fine for Runtime-go, but is obsoleted in
Runtime-rs.
When the pod cannot connect to the Agent upon starting, we need to adjust
the `reconnect_timeout_ms` to increase the number of connection attempts to
the Agent.
Fixes: #9688
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
For the TD attestation to work the connection to QGS on the host is needed.
By default QGS runs on vsock port 4050, but can be modified by the host owner.
Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below:
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'
Fixes: #9497
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
The new version of sriov-network-device-plugin adds an env
`PCIDEVICE_<prefix>_<resource-name>_INFO`, which has a json
value; kata-agent can't parse it as env
`PCIDEVICE_<prefix>_<resource-name>` which has value in format
"DDDD:BB:SS.F".
This change updates env `PCIDEVICE_<prefix>_<resource-name>_INFO`.
Signed-off-by: Lei Huang <leih@nvidia.com>
Container tags can be a maximum of 128 characters long
so calculate the length of the arch suffix and then restrict
the tag to this length subtracted from 128
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Previously I copied the logic that abbreviated the commit hash
from the versioning, but looking at our versions.yaml the clear pattern
is that when pointing at commits of dependencies we use the full
commit hash, not the abbreviated one, so for consistency I think we should
do the same with the components that we make available
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As we have multi-arch builds for nearly all components, we want to ensure
that all the cache tags we set have the architecture suffix, not just the
`TARGET_BRANCH` one.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Remove `rand.Seed` call to resolve the following failure:
```
rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value.
```
The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed
back this up and states:
> If Seed is not called, the generator is seeded randomly at program startup.
so I believe we can just delete the call.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For now, let's allow the users to set the default_cpu and default_memory
when using TDX, as they may hit issues related to the size of the
container image that must be pulled and unpacked inside the guest,
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- Make due to us bumping the golang version used in our CI
but `make vendor` fails without the go version in the runtime go.mod
being increased, so update this and run go mod tidy
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR adds the use of custom Intel DCAP configuration when
deploying the KBS.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR improves general format like variable definition to have
uniformity across the memory usage script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Now we have updated the release builds to push
artefacts to
our registry for the release, so we can cache the images, we need to
set `secrets: inherit` for all architecture's tarball builds
so that we can log into quay.io and ghcr in those steps
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
VERSION_ID is not guaranteed to be specified in os-release, this
makes kaka-deploy breaks in rolling distros like arch linux and void
linux.
Note that operating system vendors may choose not to provide
version information, for example to accommodate for rolling releases.
In this case, VERSION and VERSION_ID may be unset.
Applications should not rely on these fields to be set.
Signed-off-by: vac <dot.fun@protonmail.com>
Fixes: #9532
Close stdin when write_stdin receives data of length 0.
Stop call notify_term_close() in close_stdin, because it could
discard stdout unexpectedly.
Signed-off-by: Tim Zhang <tim@hyper.sh>
This test doesn't fail with the guest image pulling, but it for sure
should. :-)
We can see in the bats logs, something like:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned kata-containers-k8s-tests/liveness-exec to 984fee00bd70.jf.intel.com
Normal Pulled 23s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 345ms (345ms including waiting)
Normal Started 21s kubelet Started container liveness
Warning Unhealthy 7s (x3 over 13s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 7s kubelet Container liveness failed liveness probe, will be restarted
Normal Pulled 7s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 389ms (389ms including waiting)
Warning Failed 5s kubelet Error: failed to create containerd task: failed to create shim task: the file /bin/sh was not found: unknown
Normal Pulling 5s (x3 over 23s) kubelet Pulling image "quay.io/prometheus/busybox:latest"
Normal Pulled 4s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 342ms (342ms including waiting)
Normal Created 4s (x3 over 23s) kubelet Created container liveness
Warning Failed 3s kubelet Error: failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/f0ec86fb156a578964007f7773a3ccbdaf60023106634fe030f039e2e154cd11/rootfs to /run/kata-containers/liveness/rootfs, with error: ENOENT: No such file or directory: unknown
Warning BackOff 1s (x3 over 3s) kubelet Back-off restarting failed container liveness in pod liveness-exec_kata-containers-k8s-tests(b1a980bf-a5b3-479d-97c2-ebdb45773eff)
```
Let's skip it for now as we have an issue opened to track it down:
https://github.com/kata-containers/kata-containers/issues/9665
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to firecracker, which doesn't have support for virtio-fs /
virtio-9p, TDX used with `shared_fs=none` will face the very same
limitations.
The tests affected are:
* k8s-credentials-secrets.bats
* k8s-file-volume.bats
* k8s-inotify.bats
* k8s-nested-configmap-secret.bats
* k8s-projected-volume.bats
* k8s-volume.bats
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure the logs will print the correct annotation and its
value, instead of always mentioning "kernel" and "initrd".
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're doing that as all tests are going to be running with
`shared_fs=none`, meaning that we don't need any specific test for this
case anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will set the needed annotation for enforcing that the
image pull will be handled by the snapshotter set for the runtime
handler, instead of using the default one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We shouldn't be using 9p, at all, with TEEs, as off right now we have no
way to ensure the channels are encrypted. The way to work this around
for now is using guest pull, either with containerd + nydus snapshotter
or with CRI-O; or even tardev snapshotter for pulling on the host (which
is the approach used by MSFT).
This is only done for TDX for now, leaving the generic, AMD, and IBM
related stuff for the folks working on those to switch and debug
possible issues on their environment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace:
- ` ` (namespace field missing)
- `namespace: ""` (namespace field is the empty string)
- `namespace: "default"`(namespace field has the explicit value `default`)
Genpolicy currently does not handle the empty string case correctly.
Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
When just containerd is installed without installing nerdctl,
cni plugins are missing from the installation.
containerd tarball does not include cni plugin files.
Hence install cni plugins separately for containerd.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
nerdctl requires cni plugins to be installed in /opt/cni/bin
Without bridge plugin installed, it is not possible to run a
container with nerdctl.
The downloaded nerdctl tarball contains cni plugin files, but are
extracted under /usr/local/libexec.
Copy extracted tarball cni files under /usr/local/libexec
to /opt/cni/bin
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Use `eval` to process the `date` command along with its parameters,
thus avoiding misinterpreting the parameters as commands.
Fixes: #9661
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Determine the realpath of kata-shim avoiding the check fails
in case the kata-shim is not a symlink, as was happening prior
to this commit.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
- The tags have a trailing non-printable character, which results
in our cache tags having a trailing underscore e.g. `ghcr.io/kata-containers/cached-artefacts/agent:ce24e9835_`
For ease of use of these cached components, we should strip off the trailing underscore.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
An e2e test for `vfio-ap` has been conducted internally in IBM
due to the lack of publicly available test machines equipped
with a required crypto device.
The test is performed by the `tests` repository:
(i.e. 772105b560/Makefile (L144))
The community is working to integrate all tests into the `kata-containers`
repository, so the `vfio-ap` test should be part of that effort.
This commit moves a test script and Dockerfile for a test image from
the `tests` repository. We do not rename the script to `gha-run.sh`
because it is not executed by Github Actions' workflow.
You can check the test results from the s390x nightly test with the migrated files here:
https://github.com/kata-containers/kata-containers/actions/runs/9123170010/job/25100026025
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Given that we think the containerd -> snapshotter image cache
problems have been resolved by bumping to nydus-snapshotter v0.3.13
we can try removing the skips to test this out
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- We previously have an expectation for the pause rootfs
to be pull on the host when we did a guest pull. We weren't
really clear why, but it is plausible related to the issues we had
with containerd and nydus caching. Now that is fixed we can begin
to address this with setting shared_fs=none, but let's start with
updating the rootfs host check to be not higher than expected
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test is called from `tests/integration/run_kuberentes_tests.sh`,
which already ensures that yq is installed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On 1423420, I've mistakenly disabled the tests entirely, for both
non-TEEs and TEEs.
This happened as I didn't realise that `confidential_setup` would take
non-TEEs into consideration. :-/
Now, let me follow-up on that and make sure that the tests will be
running on non-TEEs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function is a helper to check whether the KATA_HYPERVISOR being
used is a confidential hardware (TEE) or not, and we can use it to
skip or only run tests on those platforms when needed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's rename it to `is_confidential_runtime_class`, and adapt all the
places where it's called.
The new name provides a better description, leading to a better
understanding of what the function really does.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Implementation of QemuCmdLine has a fairly uniform and repetitive structure
that's guided by a set of conventions. These conventions have however been
mostly implicit so far, leading to a superfluous and annoying
request/force-push churn during qemu-rs PR reviews.
This commit aims to make things explicit so that contributors can take them
into account before an initial PR submission.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit is to append an arch type to the initramfs-cryptsetup image
to prevent a wrong arch image from being pulled on a different arch host.
Fixes: #9654
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Container image tags can only contain alphanumeric, period,
hyphen and underscore characters, so convert characters outside
of these to be underscores, to avoid having invalid tag failures
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Recently the extra gpu caching was added, unfortunately when I
rebased I ended up with both the new tagging logic and old logic.
Let's try and integrate them properly to avoid doing the push twice.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
in case the upstream CI fails it's useful to pin-point the PR that
caused the regression. Currently openshift-ci does not allow doing that
from their setup but we can mimic the setup on our infrastructure and
use the available kata-deploy-ci images to find the first failing one.
To help with that add a few helper scripts and a howto.
Fixes: #9228
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Now we have the workflow updated and can test the changes in caching
we've hit an error:
```
line 1180: artefact_tag: unbound variable
```
so we need to fix that up. Sorry for missing this before.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- It appears like the `if` isn't required when setting env as a
conditional
- `inputs.stage` over input.stage
- Swap matrix.component to matrix.asset
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
ResizeMemory for Cloud Hypervisor is missing a check for the new
requested memory being greater than the max hotplug size after
alignment. Add the check, and since an earlier check for this
setsrequested memory to the max hotplug size, do the same in the
post-alignment check.
Fixes#9640
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
All these settings are hardcoded as `false` and result in
no extra options on the QEMU command line, like the go
runtime does. There actually not needed :
- we're never going to ask QEMU to survive a guest shutdown
- we're never going to run QEMU daemonized since it prevents
log collection
- we're never going to ask QEMU to start with the guest stopped
No need to keep this code around then.
Signed-off-by: Greg Kurz <groug@kaod.org>
- CoCo wants to use the agent and coco-guest-components cached artifacts
so tag them with a helpful version, so make these easier to get
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
No commands remaining.
- We don't want to ship certain components (agent, coco-guest-components)
as part of the release, but for other consumers it's useful to be able to pull in the components
from oras, so rather than not building them, just don't upload it as part of the release.
- Also make the archs all consistent on not shipping the agent
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Set RELEASE env to 'yes', or 'no', based on if the stage
passed in was 'release', so we can use it in the build scripts
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For other projects (e.g. CoCo projects) being able to
access the released versions of components is helpful,
so push these during the release process
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
A length of the result of `git log -1 --pretty=format:%h` could vary
over different CI systems, highly likely messing up their caching
mechanisms.
This commit is to use an option `--abbrev=9` to standardize the length
to 9 characters for CI.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Bump nydus snapshotter to v0.13.13 to fix the gap when switching
different snapshotters in guest pull.
Fixes: #8407
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This will help us to debug issues in the future (and would have helped
in the past as well). :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed, as b1710ee2c0 made the
default agent shipped the one with policy support. However, we simply
didn't update the rootfs to reflect that, causing then an issue to start
the agent as shown by the strace below:
```
open("/etc/kata-opa/default-policy.rego", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7f401eba0c28, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
tkill(553681, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=553681, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
```
This happens as the default policy **must** be set when the agent is
built with policy support, but the code path that copies that into the
rootfs is only triggered if the rootfs itself is built with
AGENT_POLICY=yes, which we're now doing for both confidential and
non-confidential cases.
Sadly this was not caught by CI till we the cache was not used for
rootfs, which should be solved by the previous commit.
Fixes: #9630, #9631
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is to add an info for files at `tools/packaging/kata-deploy/local-build/*
to a version of the components and ensure that the cached artefacts are not used
when the files of interest are updated.
Fixes: #9630
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
the `tdx_not_supported_warning` function does not exists, the
`tdx_not_supported` should be called instead.
Fixes: #9628
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Using a debugger with the kata runtime is complicated, but it can be done
and can be very useful.
This commits provides a helper script that simplifies it, and updates
the developper's documentation to explain how to use it.
Signed-off-by: Julien Ropé <jrope@redhat.com>
This PR fixes the indentation in gha run k8s common script
to have uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
By default, when a container is created with the `--privileged` flag,
all devices in `/dev` from the host are mounted into the guest. If
there is a block device(e.g. `/dev/dm`) followed by a generic
device(e.g. `/dev/null`),two identical block devices(`/dev/dm`)
would be requested to the kata agent causing the agent to exit with error:
> Conflicting device updates for /dev/dm-2
As the generic device type does not hit any cases defined in `switch`,
the variable `kataDevice` which is defined outside of the loop is still
the value of the previous block device rather than `nil`. Defining `kataDevice`
in the loop fixes this bug.
Signed-off-by: cncal <flycalvin@qq.com>
kube-router decided to use :8080 for its metrics, and this seems to be a
change that affected k0s 1.30.0+, leading to kube-router pod crashing
all the time and anything can actually be started after that.
Due to this issue, let's simply use a different port (:9999) and move on
with our tests.
Fixes: #9623
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Whenever we count on having the headers tarball, we must unpack the
cached content into the expected directory, otherwise we'd simply fail,
as we've been failing in our CI, at the end of the process where we
generate the tarball from the cached components.
It's weird to me, sincerely, that the headers tarball end up in such
weird place (build/kernel-nvidia-gpu/builddir/), but I'll leave that to
Zvonko to figure out whether something better can be done, as the intuit
of this PR is simply unblock Kata Containers CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
New env var so everyone can test the PUSH_TO_REGISTRY feature
export PUSH_TO_REGISTRY=yes
export ARTEFACT_REGISTRY=quay.io
export ARTEFACT_REPOSITORY=my-fancy-kata-containers
export ARTEFACT_REGISTRY_USERNAME=zvonkok
export ARTEFACT_REGISTRY_PASSWORD=<super-secret>
make ...-tarball
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This PR updates the launch times scripts by improving the variable
definition as well as trying to use the same format across all the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517
RTC was being constructed inside the QemuCmdLine struct,
but it should've been built inside the devices vector.
Signed-off-by: Emanuel Lima <emlima@redhat.com>
This was needed when we were using an old (and not maintained anymore)
host stack. Considering what we have as part of the distros, Today,
this can simply be dropped, as I cannot find any reference of this one
being needed in any up-to-date documentation.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit b7cccfa019.
The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using. With that in mind, let's
revert its usage in the code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 582b5b6b19.
The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using. With that in mind, let's
revert its addition, and later on its usage in the code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Here we're checking the distro's `/etc/os-release` or
`/usr/lib/os-release` in order to get which distro we're deploying the
Kata Containers artefacts to, and then to properly adjust the QEMU and
OVMF with TDX support that's been shipped with the distros.
Together with that, we're also printing the instructions provided by the
distro on how to enable and use TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT
variables instead of actually setting a path, so we can easily replace
those as part of our deployment scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll need to have access to the host os-release file (either under
`/etc/os-release` or under `/usr/lib/os-release`), and the simplest
approach that comes to my mind to do is doing what a debug pod would do,
mounting `/` as `/host` and then allowing us to have access to those
files, and then corectly set the TDX specific QEMU and OVMF (TDVF) paths
for the tdx available configurations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We haven't been using nor testing with td-shim, as Cloud Hypervisor does
not officially support TDX yet, and TDVF is supposed to be used with
QEMU, instead of td-shim.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's remove everything related to the TDX specific QEMU building /
shipping from our repo, as we'll be relying on the one coming from the
distros.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's remove everything related to the TDVF building / shipping from our
repo, as we'll be relying on the one coming from the distro.
Later on, we may need to re-add TDVF logic, as we're already using
upstream edk2 repo / content, but when that's needed we'll simply revert
this commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the random write value for FIO for qemu by decreasing it
to avoid the random failures of the GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's skip those tests on TEEs as we've been facing a reasonable amount
of issues, most likely on the containerd side, related to pulling the
image on the guest.
Once we're able to fix the issues on containerd, we can get back and
re-enable those by reverting this commit.
The decision of disabling the tests for TEEs is because the machines may
end up in a state where human intervention is necessary to get them back
to a functional state, and that's really not optimal for our CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the first step of the work to start relying on the artefacts
coming from the distros (CentOS 9 Stream, and Ubuntu) themselves.
Let's have this first one merged, as this will not run the CI due to the
changes being on the yaml itself, and then follow-up with the changes
needed on other parts of the project (kata-deploy, runtime, etc).
Fixes: #9590 -- part I
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Everytime I create contianer on arm64 machine, containerd/kata logs a redundant warning
as follows:
``` shell
time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2
pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor
```
I added an error statement so that the error would be logged when it occurs.
Signed-off-by: cncal <flycalvin@qq.com>
This PR removes oci information from versions file as this is not
longer being used in kata containers repository.
Fixes#9599
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds support to install KBS to k8s TDX GHA workflow in
order to run confidential attestation tests.
Fixes#9451
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds a k8s negative policy test to the confidential attestation
bats test.
Fixes#9437
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We should init and asign the runtime instance to runtime
handler, otherwise, if the pause container failed to start,
which means the runtime instance failed to start, then the
following delete & shutdown request wouldn't be run, thus
the dead shim would be left.
Fixes: #9597
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
dragonball reserves 2048G of mmio space for the pci root bus by default
on physical addresses greater than 4G. However, for some machines with
smaller physical address widths, such as 39-bit wide physical addresses,
dragonball reserves the mmio space when initializing the memory. It is
less than 2048G, so this commit dynamically calculates and allocates the
mmio size of each pci root bus.
Fixes: #9509
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.
Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.
This change has been tested together with CH v39 in #9588.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The build message shows that yq was not found when I tried to build
runtime binaries, but I've actually installed yq by yum install.
Signed-off-by: cncal <flycalvin@qq.com>
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:
```
ERRO[0000] cannot open kvm device: no such file or directory arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```
Signed-off-by: cncal <flycalvin@qq.com>
- The testVersionString logic use regex to check that the ociVersion is
displayed correctly, but with the new go module that version has a
`+` in, so we need to quote this to escape special characters
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Dependabot doesn't follow all our commit format guidelines,
so add a check and skip these if the author is `dependabot[bot]`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
With the addition of the 'qemu-coco-dev' runtimeClass we no longer need
to run CoCo tests on non-TEE environments with 'qemu'. As a result the
tests also no longer need to set the "io.katacontainers.config.hypervisor.image"
annotation to pods.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created the runtimeclasses/kata-qemu-coco-dev.yaml file and updated the list
of SHIMS.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created a new configuration to configure Kata for CoCo without requiring TEE
hardware so to allow developers implement/test/debug platform agnostic code
on their workstations. It will also ease testing of CoCo features on CI with
non-TEE supported VMs.
This is based off qemu configuration. The following differences applied:
- switched to confidential guest image/initrd
- switched to confidential kernel
- switched to 9p shared_fs
Fixes#9487
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Now that the `kata-agent` is being built with policy support, let's stop
building the `kata-opa-agent`, reducing the amount of things we need to
test and maintain.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the OPA binary is not required anymore, let's start shipping
the agent with the policy enabled by default.
The agent *without* policy enabled has 30MB, while it's 34MB *with* the
policy enabled.
This 4MB (~10%) increase is, IMHO, worth it in order to reduce the
amount of components we have to maintain and test, including the
possibility to also reduce the amount of possible rootfs / initrd
images.
Whoever wants to use the agent without policy enabled can simply do that
by building their own agent. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since OPA binary was replaced by the regorus crate, we can finally stop
building and shipping the binary.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we have an issue with a golang version for `run-cri-containerd`,
it is required to bump the language.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
iommu_platform support was already added on initial DeviceVhostUserFs
introduction, however it incorrectly enabled iommu_platform also on
non-CCW (e.g. PCI) systems.
Signed-off-by: Pavel Mores <pmores@redhat.com>
iommu_platform is only turned on for CCW systems.
PartialEq is added to VirtioBusType to enable the '==' operator.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The adding itself is done by a new function add_iommu() that conforms with
the add_*() convention. Note though that this function is called
internally, by the QemuCmdLine constructor, simply because there's nothing
to trigger its invocation from QemuInner (unlike the other add_*()
functions so far).
Signed-off-by: Pavel Mores <pmores@redhat.com>
Remove reminder to initialize Policy earlier, because currently there
are no plans to initialize earlier.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
When docker is installed on the host system using script from https://get.docker.com/ it automatically creates a docker group with gid=999.
Then during docker build process of tarball, eg. make qemu-tdx-experimental-tarball docker is also installed inside the image with the same
script, which also automatically adds docker group with gid=999.
Then, the build tries to add a new group docker_on_host with gid=999, which already exists, which breaks the build.
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
Avoid auto-generating Policy on platforms that haven't been tested
yet with auto-generated Policy.
Support for auto-generated Policy on these additional platforms is
coming up in future PRs, so the tests being fixed here were
prematurely enabled.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This should only be done once, and if CRI-O restarts, there's a big
chance kata-deploy will also restart and the user would end up with a
file that looks like:
```
[crio]
log_level = "debug"
[crio]
log_level = "debug"
[crio]
log_level = "debug"
...
```
And that would simply cause CRI-O to not start.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Implement Agent Policy using the regorus crate instead of the OPA
daemon.
The OPA daemon will be removed from the Guest rootfs in a future PR.
Fixes: #9388
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Remove k8s-policy-set-keys.bats in preparation for using the regorus
crate instead of the OPA daemon for evaluating the Agent Policy. This
test depended on sending HTTP requests to OPA.
Fixes: #9388
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Lock anyhow version to 1.0.58 because:
- Versions between 1.0.59 - 1.0.76 have not been tested yet using
Kata CI. However, those versions pass "make test" for the
Kata Agent.
- Versions 1.0.77 or newer fail during "make test" - see
https://github.com/kata-containers/kata-containers/issues/9538.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
In case of block devices using virtio-block, we need to pass the
pci-path as the storage source field to the agent.
Current the virt-path is being passed which works just for mmio block
devices.
In the future when support is added for scsi, block-ccw and pmem
devices, the storage source would need to be handled accordingly.
Fixes: #9034
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This commit is to add a new CI job to run-k8s-tests-on-zvsi.yaml.
Why the job is not configured in run-kata-coco-tests.yaml by having it
integrated with `run-k8s-tests-coco-nontee` is:
- It uses k3s instead of AKS
- It runs on a self-hosted runner
These differences make the integrated job not easy to read and maintain
when it comes to incorporating other platforms in the near future.
Fixes: #9467
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
In do_create_container and do_exec_process, we should create the proc_io first,
in case there's some error occur below, thus we can make sure
the io stream closed when error occur.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Fixes: #9334
In linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic.
To avoid this problem, we open stdin in write mode and keep the stdin-writer
We need to open the stdout/stderr as the read mode and keep the open endpoint
until the process is delete. otherwise,
the process would exit before the containerd side open and read
the stdout fifo, thus runD would write all of the stdout contents into
the stdout fifo and then closed the write endpoint. Then, containerd
open the stdout fifo and try to read, since the write side had closed,
thus containerd would block on the read forever.
Here we keep the stdout/stderr read endpoint File in the common_process,
which would be destroied when containerd send the delete rpc call,
at this time the containerd had waited the stdout read return, thus it
can make sure the contents in the stdout/stderr fifo wouldn't be lost.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In scenario passfd-io, we should wait for stdin to close itself
instead of manually intervening in it.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This patch adds a default implementation for the use_sandbox_pidns
and updates the structs that implement the K8sResource trait to use
the default.
Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
- Provide default implementation for get_sandbox_name in K8sResource trait
- Remove default implementation from structs implementing the trait K8sResource
Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
concurrently with itself
Based on 3a1461b0a5186a92afedaaea33ff2bd120d1cea0
Previously the tool would use the layers_cache folder for all instances
and hence delete the cache when it was done, interfereing with other
instances. This change makes it so that each instance of the tool will
have its own temp folder to use.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
The new run-k8s-tests-coco-nontee job should be the home of attestation
tests.
Changed run-k8s-tests-coco-nontee to get KBS installed and by the time the
KBS variable is exported in the environment then the attestation tests
will kick in (likewise they will skip in run-k8s-tests-on-aks).
Fixes#9455
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This is to update a document `how-to-run-kata-containers-with-SE-VMs`
on using confidential artifacts to build a secure image.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This is to make `boot-image-se-tarball` use confidential kernel and
initrd instead of vanilla version of artifacts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
1. EPOLLHUP events also need to be read and will be got len 0.
2. We should kill the connection when EPOLLERR events are received.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
isClhRunning uses signal 0 to test whether the process is
still alive or not. This doesn't work because the process is a
direct child of the shim. Once it is dead the process becomes
zombie.
Since no one waits for it the process lingers until
its parent dies and init reaps it. Hence sending signal 0 in
isClhRunning will always return success whether the process is
dead or not.
This patch calls wait to reap the process, if it succeeds that
means it is our child process, if not we send the signal.
Fixes: #9431
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Fixes: #9267
The doc states we have support for all lifecycle hooks. There are still some missing.
This is the first issue regarding the CreateContainer hook which is run before pivot_root but after prestart and createruntime
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-13 09:24:25 +00:00
1839 changed files with 184870 additions and 162474 deletions
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](.github/workflows) | CI | Continuous Integration configuration files and scripts. |
| [`ocp-ci`](ci/openshift-ci/README.md) | CI | Continuous Integration configuration for the OpenShift pipelines. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
| [`Webhook`](tools/testing/kata-webhook/README.md) | utility | Example of a simple admission controller webhook to annotate pods with the Kata runtime class |
pipelines to monitor selected functional tests on OpenShift.
There are 2 pipelines, history and logs can be accessed here:
* [main - currently supported OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests)
* [next - currently under development OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-next-e2e-tests)
Running openshift-tests on OCP with kata-containers manually
Following steps require the [openshift tests](https://github.com/openshift/origin)
being cloned and built in the current directory:
```bash
#!/bin/bash -e
# Define tests to be skipped (see the pipeline config for the current version)
TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"
# Get the list of tests to be executed
TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"
E2E_TEST="${E2E_TEST:-'"[sig-node] Container Runtime blackbox test on terminated container should report termination message as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"'}"
KATA_CI_DIR="${KATA_CI_DIR:-$(pwd)}"
exportKATA_RUNTIME="${KATA_RUNTIME:-kata-qemu}"
## SETUP
# Deploy kata
SETUP=0
pushd"$KATA_CI_DIR"||{echo"Failed to cd to '$KATA_CI_DIR'";exit 255;}
@@ -499,19 +499,6 @@ If you do not want to install the respective QEMU version, the configuration fil
See the [static-build script for QEMU](../tools/packaging/static-build/qemu/build-static-qemu.sh) for a reference on how to get, setup, configure and build QEMU for Kata.
### Build a custom QEMU for aarch64/arm64 - REQUIRED
> **Note:**
>
> - You should only do this step if you are on aarch64/arm64.
> - You should include [Eric Auger's latest PCDIMM/NVDIMM patches](https://patchwork.kernel.org/cover/10647305/) which are
> under upstream review for supporting NVDIMM on aarch64.
>
You could build the custom `qemu-system-aarch64` as required with the following command:
@@ -50,7 +50,7 @@ We provide `Dragonball` Sandbox to enable built-in VMM by integrating VMM's func
#### How To Support Async
The kata-runtime is controlled by TOKIO_RUNTIME_WORKER_THREADS to run the OS thread, which is 2 threads by default. For TTRPC and container-related threads run in the `tokio` thread in a unified manner, and related dependencies need to be switched to Async, such as Timer, File, Netlink, etc. With the help of Async, we can easily support no-block I/O and timer. Currently, we only utilize Async for kata-runtime. The built-in VMM keeps the OS thread because it can ensure that the threads are controllable.
**For N tokio worker threads and M containers**
**For N `tokio` worker threads and M containers**
- Sync runtime(both OS thread and `tokio` task are OS thread but without `tokio` worker thread) OS thread number: 4 + 12*M
- Async runtime(only OS thread is OS thread) OS thread number: 2 + N
@@ -103,7 +103,6 @@ In our case, there will be a variety of resources, and every resource has severa
| `Cgroup V2` | | Stage 2 | 🚧 |
| Hypervisor | `Dragonball` | Stage 1 | 🚧 |
| | QEMU | Stage 2 | 🚫 |
| | ACRN | Stage 3 | 🚫 |
| | Cloud Hypervisor | Stage 3 | 🚫 |
| | Firecracker | Stage 3 | 🚫 |
@@ -166,4 +165,4 @@ In our case, there will be a variety of resources, and every resource has severa
- What is the security boundary for the monolithic / "Built-in VMM" case?
It has the security boundary of virtualization. More details will be provided in next stage.
It has the security boundary of virtualization. More details will be provided in next stage.
@@ -113,6 +113,13 @@ Next, the kata-agent's RPC module will handle the create container request which
> **Notes:**
> In this flow, `ImageService.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `image-rs.pull_image()` because the pause image is expected to already be inside the guest's filesystem, so instead `ImageService.unpack_pause_image()` is called.
## Using guest image pull with `nerdctl`
When running a workload, add the `--label io.kubernetes.cri.image-name=<image>` option e.g.:
@@ -60,7 +60,7 @@ So in guest, container rootfs=overlay(`lowerdir=rafs`, `upperdir=snapshotdir/fs`
> how to transfer the `rafs` info from `nydus-snapshotter` to the Kata Containers Containerd v2 shim?
By default, when creating `OCI` image container, `nydus-snapshotter` will return [`struct` Mount slice](https://github.com/containerd/containerd/blob/main/mount/mount.go#L21) below to containerd and containerd use them to mount rootfs
By default, when creating `OCI` image container, `nydus-snapshotter` will return [`struct` Mount slice](https://github.com/containerd/containerd/blob/main/core/mount/mount.go#L30) below to containerd and containerd use them to mount rootfs
```
[
@@ -72,7 +72,7 @@ By default, when creating `OCI` image container, `nydus-snapshotter` will return
]
```
Then, we can append `rafs` info into `Options`, but if do this, containerd will mount failed, as containerd can not identify `rafs` info. Here, we can refer to [containerd mount helper](https://github.com/containerd/containerd/blob/main/mount/mount_linux.go#L42) and provide a binary called `nydus-overlayfs`. The `Mount` slice which `nydus-snapshotter` returned becomes
Then, we can append `rafs` info into `Options`, but if do this, containerd will mount failed, as containerd can not identify `rafs` info. Here, we can refer to [containerd mount helper](https://github.com/containerd/containerd/blob/main/core/mount/mount_linux.go#L81) and provide a binary called `nydus-overlayfs`. The `Mount` slice which `nydus-snapshotter` returned becomes
@@ -40,7 +40,7 @@ use `RuntimeClass` instead of the deprecated annotations.
### Containerd Runtime V2 API: Shim V2 API
The [`containerd-shim-kata-v2` (short as `shimv2` in this documentation)](../../src/runtime/cmd/containerd-shim-kata-v2/)
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2) for Kata.
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2) for Kata.
With `shimv2`, Kubernetes can launch Pod and OCI-compatible containers with one shim per Pod. Prior to `shimv2`, `2N+1`
shims (i.e. a `containerd-shim` and a `kata-shim` for each container and the Pod sandbox itself) and no standalone `kata-proxy`
process were used, even with VSOCK not available.
@@ -62,7 +62,7 @@ Follow the instructions to [install Kata Containers](../install/README.md).
> You do not need to install `cri` if you have containerd 1.1 or above. Just remove the `cri` plugin from the list of
> `disabled_plugins` in the containerd configuration file (`/etc/containerd/config.toml`).
Follow the instructions from the [CRI installation guide](https://github.com/containerd/containerd/blob/main/docs/cri/installation.md).
Follow the instructions from the [CRI installation guide](https://github.com/containerd/containerd/blob/main/docs/cri/crictl.md#install-crictl).
Then, check if `containerd` is now available:
@@ -132,9 +132,9 @@ The `RuntimeClass` is suggested.
The following configuration includes two runtime classes:
- `plugins.cri.containerd.runtimes.runc`: the runc, and it is the default runtime.
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/runtime/v2#binary-naming))
- `plugins.cri.containerd.runtimes.kata`: The function in containerd (reference [the document here](https://github.com/containerd/containerd/tree/main/core/runtime/v2))
where the dot-connected string `io.containerd.kata.v2` is translated to `containerd-shim-kata-v2` (i.e. the
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/runtime/v2)).
binary name of the Kata implementation of [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/main/core/runtime/v2)).
## Run pod in kata containers with pulling image in guest
To verify pulling images in a guest VM, please refer to the following commands:
@@ -152,8 +148,6 @@ apiVersion: v1
kind: Pod
metadata:
name: busybox
annotations:
io.containerd.cri.runtime-handler: kata-qemu
spec:
runtimeClassName: kata-qemu
containers:
@@ -167,9 +161,6 @@ NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 10s
```
> **Notes:**
> The `CRI Runtime Specific Snapshotter` is still an experimental feature. To pull images in the guest under the specific kata runtime (such as `kata-qemu`), we need to add the following annotation in metadata to each pod yaml: `io.containerd.cri.runtime-handler: kata-qemu`. By adding the annotation, we can ensure that the feature works as expected.
2. Verify that the pod's images have been successfully downloaded in the guest.
If images intended for deployment are deleted prior to deploying with `nydus snapshotter`, the root filesystems required for the pod's images (including the pause image and the container image) should not be present on the host.
## Run pod in kata containers with pulling large image in guest
Currently, the image pulled in the guest will be downloaded and unpacked in the `/run/kata-containers/image` directory. However, by default, in rootfs-confidential image, systemd allocates 50% of the available physical RAM to the `/run` directory using a `tmpfs` filesystem. As we all know, memory is valuable, especially for confidential containers. This means that if we run a kata container with the default configuration (where the default memory assigned for a VM is 2048 MiB), `/run` would be allocated around 1024 MiB. Consequently, we can only pull images up to 1024 MiB in the guest. So we can use a block volume from the host and use `dm-crypt` and `dm-integrity` to encrypt the block volume in the guest, providing a secure place to store downloaded container images.
### Create block volume with k8s
There are a lot of CSI Plugins that support block volumes: AWS EBS, Azure Disk, Open-Local and so on. But as an example, we use Local Persistent Volumes to use local disks as block storage with k8s cluster.
1. Create an empty disk image and attach the image to a loop device, such as `/dev/loop0`
To build an image on the `x86_64` platform, set the following environment variables together with the variables above before `make boot-image-se-tarball`:
@@ -35,6 +35,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.cdh_api_timeout` | uint32 | timeout in second for Confidential Data Hub (CDH) API service, default is `50` |
## Hypervisor Options
| Key | Value Type | Comments |
@@ -45,7 +46,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.block_device_cache_set` | `boolean` | cache-related options will be set to block devices or not |
| `io.katacontainers.config.hypervisor.block_device_driver` | string | the driver to be used for block device, valid values are `virtio-blk`, `virtio-scsi`, `nvdimm`|
| `io.katacontainers.config.hypervisor.cpu_features` | `string` | Comma-separated list of CPU features to pass to the CPU (QEMU) |
| `io.katacontainers.config.hypervisor.ctlpath` (R) | `string` | Path to the `acrnctl` binary for the ACRN hypervisor |
| `io.katacontainers.config.hypervisor.default_max_vcpus` | uint32| the maximum number of vCPUs allocated for the VM by the hypervisor |
| `io.katacontainers.config.hypervisor.default_memory` | uint32| the memory assigned for a VM by the hypervisor in `MiB` |
| `io.katacontainers.config.hypervisor.default_vcpus` | float32| the default vCPUs assigned for a VM by the hypervisor |
This document provides an overview on how to run Kata containers with ACRN hypervisor and device model.
## Introduction
ACRN is a flexible, lightweight Type-1 reference hypervisor built with real-time and safety-criticality in mind. ACRN uses an open source platform making it optimized to streamline embedded development.
Some of the key features being:
- Small footprint - Approx. 25K lines of code (LOC).
- Real Time - Low latency, faster boot time, improves overall responsiveness with hardware.
- Adaptability - Multi-OS support for guest operating systems like Linux, Android, RTOSes.
- Rich I/O mediators - Allows sharing of various I/O devices across VMs.
- Optimized for a variety of IoT (Internet of Things) and embedded device solutions.
Please refer to ACRN [documentation](https://projectacrn.github.io/latest/index.html) for more details on ACRN hypervisor and device model.
## Pre-requisites
This document requires the presence of the ACRN hypervisor and Kata Containers on your system. Install using the instructions available through the following links:
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the below steps to update the kernel:
$ sudo sed -i "s/$kernel_img/bzImage/g" /mnt/loader/entries/$conf_file
$ sync && sudo umount /mnt && sudo reboot
```
- Kata Containers installation: Automated installation does not seem to be supported for Clear Linux, so please use [manual installation](../Developer-Guide.md) steps.
> **Note:** Create rootfs image and not initrd image.
In order to run Kata with ACRN, your container stack must provide block-based storage, such as device-mapper.
> **Note:** Currently, by design you can only launch one VM from Kata Containers using ACRN hypervisor (SDC scenario). Based on feedback from community we can increase number of VMs.
## Configure Docker
To configure Docker for device-mapper and Kata,
1. Stop Docker daemon if it is already running.
```bash
$ sudo systemctl stop docker
```
2. Set `/etc/docker/daemon.json` with the following contents.
```
{
"storage-driver": "devicemapper"
}
```
3. Restart docker.
```bash
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
```
4. Configure [Docker](../Developer-Guide.md#update-the-docker-systemd-unit-file) to use `kata-runtime`.
## Configure Kata Containers with ACRN
To configure Kata Containers with ACRN, copy the generated `configuration-acrn.toml` file when building the `kata-runtime` to either `/etc/kata-containers/configuration.toml` or `/usr/share/defaults/kata-containers/configuration.toml`.
The following command shows full paths to the `configuration.toml` files that the runtime loads. It will use the first path that exists. (Please make sure the kernel and image paths are set correctly in the `configuration.toml` file)
```bash
$ sudo kata-runtime --show-default-config-paths
```
>**Warning:** Please offline CPUs using [this](offline_cpu.sh) script, else VM launches will fail.
```bash
$ sudo ./offline_cpu.sh
```
Start an ACRN based Kata Container,
```bash
$ sudo docker run -ti --runtime=kata-runtime busybox sh
```
You will see ACRN(`acrn-dm`) is now running on your system, as well as a `kata-shim`. You should obtain an interactive shell prompt. Verify that all the Kata processes terminate once you exit the container.
```bash
$ ps -ef | grep -E "kata|acrn"
```
Validate ACRN hypervisor by using `kata-runtime kata-env`,
After running the command above, the default config file `configuration.toml` will be installed under `/usr/share/defaults/kata-containers/`, the binary file `containerd-shim-kata-v2` will be installed under `/usr/local/bin/` .
### Install Shim Without Builtin Dragonball VMM
By default, runtime-rs includes the `Dragonball` VMM. To build without the built-in `Dragonball` hypervisor, use `make USE_BUILDIN_DB=false`:
```bash
$ cd kata-containers/src/runtime-rs
$ make USE_BUILDIN_DB=false
```
After building, specify the desired hypervisor during installation using `HYPERVISOR`. For example, to use `qemu` or `cloud-hypervisor`:
```
sudo make install HYPERVISOR=qemu
```
or
```
sudo make install HYPERVISOR=cloud-hypervisor
```
### Build Kata Containers Kernel
Follow the [Kernel installation guide](/tools/packaging/kernel/README.md).
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document discusses threat models associated with the Kata Containers
project. Kata was designed to provide additional isolation of container
workloads, protecting the host infrastructure from potentially malicious
container users or workloads. Since Kata Containers adds a level of isolation on
top of traditional containers, the focus is on the additional layer provided,
not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
This document provides a brief background on containers and layered security,
describes the interface to Kata from CRI runtimes, a review of utilized virtual
machine interfaces, and then a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload to gain
control of, obtain information from, or tamper with the host infrastructure.
Kata seeks to prevent an untrusted container workload or user of that container
workload to gain control of, obtain information from, or tamper with the host
infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
In our scenario, an asset is anything on the host system, or elsewhere in the
cluster infrastructure. The attacker is assumed to be either a malicious user or
the workload itself running within the container. The goal of Kata is to prevent
attacks which would allow any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capablities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by runc.
Traditional containers leverage several key Linux kernel features to provide
isolation and a view that the container workload is the only entity running on
the host. Key features include `Namespaces`, `cgroups`, `capablities`, `SELinux`
and `seccomp`. The canonical runtime for creating such a container is `runc`. In
the remainder of the document, the term `traditional-container` will be used to
describe a container workload created by runc.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guest’s Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
Kata Containers provides a second layer of isolation on top of those provided by
traditional-containers. The hardware virtualization interface is the basis of
this additional layer. Kata launches a lightweight virtual machine, and uses the
guest’s Linux kernel to create a container workload, or workloads in the case of
multi-container pods. In Kubernetes and in the Kata implementation, the sandbox
is carried out at the pod level. In Kata, this sandbox is created using a
virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
On every node, Kubelet will interact with a CRI implementor, which will in turn
interface with an OCI based runtime, such as Kata Containers. Typical CRI
implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
The CRI API, as defined at the Kubernetes [CRI-API
repo](https://github.com/kubernetes/cri-api/), results in a few constructs being
supported by the CRI implementation, and ultimately in the OCI runtime creating
the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
In order to run a container inside of the Kata sandbox, several virtual machine
devices and interfaces are required. Kata translates sandbox and container
definitions to underlying virtualization technologies provided by a set of
virtual machine monitors (VMMs) and hypervisors. These devices and their
underlying implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
-Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
- Storage: In the current design of Kata Containers, we are reliant on the CRI
implementor to assist in image handling and volume management on the host. As a
result, we need to support a way of passing to the sandbox the container
rootfs, volumes requested by the workload, and any other volumes created to
facilitate sharing of secrets and `configmaps` with the containers. Depending
on how these are managed, a block based device or file-system sharing is
required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
-Networking: A method for enabling network connectivity with the workload is
required. Typically this will be done providing a `TAP` device to the VMM, and
this will be exposed to the guest as a `virtio-net` device. It is feasible to
pass in a NIC device directly, in which case `VFIO` is leveraged and the device
itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from
containers, a medium of communication is required. This is available via
`virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual
machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM
resource management (for example: CPU, memory, device hotplug). This is
required when containers are resized, or more generally when containers are
added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
How these devices are utilized varies depending on the VMM utilized. We clarify
the default settings provided when integrating Kata with the QEMU, Dragonball,
Firecracker and Cloud Hypervisor VMMs in the following sections.
### Virtual Machine Monitor(s)
In a KVM/QEMU (any other VMM utilizing KVM) virtualization setup, all virtual
machines (VMs) share the same host kernel. This shared environment can lead to
scenarios where one VM could potentially impact the performance or stability of
other VMs, including the possibility of a Denial of Service attack.
- Kernel Vulnerabilities: Since all VMs rely on the host's kernel, a
vulnerability in the kernel could be exploited by a process running within one
VM to affect the entire system. This could lead to scenarios where the
compromised VM impacts other VMs or even takes down the host.
- Improper Isolation and Containment: If the virtualization environment is not
correctly configured, processes in one VM might impact other VMs. This could
occur through improper isolation of network traffic, shared file systems, or
other inter-VM communication channels.
- Hypervisor Vulnerabilities: Flaws in the KVM hypervisor or QEMU could be
exploited to cause information disclosure, data tampering, elevation of
privileges, denial of service, and others. Since KVM/QEMU leverages the host
kernel for its operation, any exploit at this level can have widespread impacts.
- Malicious or Flawed Guest Operating Systems: A guest operating system that is
maliciously designed or has serious flaws could engage in activities that
disrupt the normal operation of the host or other guests. This might include
aggressive network activity or interactions with the virtualization stack that
lead to instability.
- Resource Exhaustion: A VM could consume excessive shared resources such as
CPU, memory, or I/O bandwidth, leading to resource starvation for other VMs.
This could be due to misconfiguration, a runaway process, or a deliberate
denial of service attack from a compromised VM.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While it may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would be already running within the kernel space. While VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows for the back-end process to require less system calls and capabilities compared to a full VMM.
Each virtio device is implemented by a backend, which may execute within
userspace on the host (vhost-user), the VMM itself, or within the host kernel
(vhost). While it may provide enhanced performance, vhost devices are often seen
as higher risk since an exploit would be already running within the kernel
space. While VMM and vhost-user are both in userspace on the host, `vhost-user`
generally allows for the back-end process to require less system calls and
capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, it is not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they are not utilized in Kata today.
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself
(ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and
QEMU. While `vhost` based back-ends are available for QEMU, it is not
recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they
are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction
with the host filesystem is done through a vhost-user daemon, `virtiofsd`. The
`virtio-fs` client, running in the guest, will generate requests to access
files. `virtiofsd` will receive requests, open the file, and request the VMM to
`mmap` it into the guest. When DAX is utilized, the guest will access the host's
page cache, avoiding the need for copy and duplication. DAX is still an
experimental feature, and is not enabled by default.
From the `virtiofsd` [documentation](https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU
VMM supports virtio-fs as of v4.2. Cloud Hypervisor supports `virtio-fs`.
#### `virtio-net`
@@ -97,9 +159,9 @@ supports `virtio-fs`.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the
`virtio-net` backend for Kata defaults to `vhost-net` for performance reasons.
The default configuration is being reevaluated.
##### Firecracker networking
@@ -107,8 +169,14 @@ For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in rust, Cloud Hypervisor specific).
For Cloud Hypervisor, the current backend default is within the VMM.
`vhost-user-net` support is being added (written in rust, Cloud Hypervisor
specific).
##### Dragonball networking
For Dragonball, the `virtio-net` backend default is within Dragonbasll's VMM.
#### virtio-vsock
@@ -116,22 +184,88 @@ is being added (written in rust, Cloud Hypervisor specific).
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
##### Dragonball, Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the hosts userspace.
In Dragonball, Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in
the hosts userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
Utilizing VFIO, devices can be passed through to the virtual machine. Exposure
to the host is limited to gaps in device pass-through handling. This is
supported in QEMU and Cloud Hypervisor, but not Firecracker.
- Device Isolation Failure: One of the primary risks associated with VFIO is the
failure to isolate the physical device. If a VM can affect the operation of the
physical device in a way that impacts other VMs or the host system, it could
lead to security breaches or system instability.
- DMA Attacks: Direct Memory Access (DMA) attacks are a significant concern with
VFIO. Since the device has direct access to the system's memory, there's a risk
that a compromised VM could use its assigned device to read or write memory
outside of its allocated space, potentially accessing sensitive information or
affecting the host or other VMs.
- Firmware Vulnerabilities: Devices attached via VFIO rely on their firmware,
which can have vulnerabilities. A compromised device firmware could be exploited
to gain unauthorized access or to disrupt the system. Resource Starvation:
Improperly managed, a VM with direct access to hardware resources could
monopolize those resources, leading to performance degradation or denial of
service for other VMs or the host system.
- Escalation of Privileges: If a VM with VFIO access is compromised, it could
potentially be used to gain higher privileges than intended, especially if the
I/O devices have capabilities that are not adequately controlled or monitored.
- Improper Configuration and Management: Human errors in configuring VFIO, such
as incorrect group or user permissions, can expose the system to risks.
Additionally, inadequate monitoring and management of the VMs and their devices
can lead to security lapses.
- Software Vulnerabilities: Like any software, the components of VFIO (like the
kernel modules, device drivers, and management tools) can have vulnerabilities
that might be exploited by an attacker to compromise the security of the system.
Inter-VM Interference and Side-Channel Attacks: Even with device assignment,
there could be side-channel attacks where an attacker VM infers sensitive
information from the physical device's behavior or through shared resources like
cache.
#### ACPI (Dragonball uses Upcall)
ACPI is necessary for hotplugging of CPU, memory and devices. ACPI is available
in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug are not available
in Firecracker.
- Hypervisor Vulnerabilities: In virtualized environments, the hypervisor
manages ACPI calls for virtual machines (VMs). If the hypervisor has
vulnerabilities in handling ACPI requests, it could lead to escalated privileges
or other security breaches.
- VM Escape: A sophisticated attack could exploit ACPI functionality to achieve
a VM escape, where malicious code in a VM breaks out to the host system or other
VMs. Firmware Attacks in a Virtualized Context: Similar to physical
environments, firmware-based attacks (including those targeting ACPI) in
virtualized systems can be persistent and difficult to detect. In a virtualized
environment, such attacks might not only compromise the host system but also all
the VMs running on it.
- Resource Starvation Attacks: ACPI functionality could be exploited to
manipulate power management features, causing denial of service through
resource starvation. For example, an attacker could force a VM into a low-power
state, degrading its performance or availability.
- Compromised VMs Affecting Host ACPI Settings: If a VM is compromised, it might
be used to alter ACPI settings on the host, affecting all VMs on that host. This
could lead to various impacts, from performance degradation to system
instability.
- Supply Chain Risks: As with non-virtualized environments, the firmware,
including ACPI firmware used in virtualized environments, could be compromised
during the supply chain process, leading to vulnerabilities that affect all VMs
running on the hardware.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
@@ -125,13 +125,19 @@ The kata agent has the ability to configure agent options in guest kernel comman
| `agent.debug_console` | Debug console flag | Allow to connect guest OS running inside hypervisor Connect using `kata-runtime exec <sandbox-id>` | boolean | `false` |
| `agent.debug_console_vport` | Debug console port | Allow to specify the `vsock` port to connect the debugging console | integer | `0` |
| `agent.devmode` | Developer mode | Allow the agent process to coredump | boolean | `false` |
| `agent.guest_components_rest_api` | `api-server-rest` configuration | Select the features that the API Server Rest attestation component will run with. Valid values are `all`, `attestation`, `resource` | string | `resource` |
| `agent.guest_components_procs` | guest-components processes | Attestation-related processes that should be spawned as children of the guest. Valid values are `none`, `attestation-agent`, `confidential-data-hub` (implies `attestation-agent`), `api-server-rest` (implies `attestation-agent` and `confidential-data-hub`) | string | `api-server-rest` |
| `agent.hotplug_timeout` | Hotplug timeout | Allow to configure hotplug timeout(seconds) of block devices | integer | `3` |
| `agent.guest_components_rest_api` | `api-server-rest`configuration | Select the features that the API Server Rest attestation component will run with. Valid values are `all`, `attestation`, `resource`, or `none` to not launch the `api-server-rest` component | string | `resource` |
| `agent.cdh_api_timeout`| Confidential Data Hub (CDH) API timeout | Allow to configure CDH API timeout(seconds) | integer | `50` |
| `agent.https_proxy` | HTTPS proxy | Allow to configure `https_proxy` in the guest | string | `""` |
| `agent.image_registry_auth` | Image registry credential URI | The URI to where image-rs can find the credentials for pulling images from private registries e.g. `file:///root/.docker/config.json` to read from a file in the guest image, or `kbs:///default/credentials/test` to get the file from the KBS| string | `""` |
| `agent.enable_signature_verification` | Image security policy flag | Whether enable image security policy enforcement. If `true`, the resource indexed by URI `agent.image_policy_file` will be got to work as image pulling policy. | string | `""` |
| `agent.image_policy_file` | Image security policy URI | The URI to where image-rs Typical policy URIs are like `file:///etc/image.json` to read from a file in the guest image, or `kbs:///default/security-policy/test` to get the file from the KBS| string | `""` |
| `agent.log` | Log level | Allow the agent log level to be changed (produces more or less output) | string | `"info"` |
| `agent.log_vport` | Log port | Allow to specify the `vsock` port to read logs | integer | `0` |
| `agent.no_proxy` | NO proxy | Allow to configure `no_proxy` in the guest | string | `""` |
| `agent.passfd_listener_port` | File descriptor passthrough IO listener port | Allow to set the file descriptor passthrough IO listener port | integer | `0` |
| `agent.secure_image_storage_integrity` | Image storage integrity | Allow to use `dm-integrity` to protect the integrity of encrypted block volume | boolean | `false` |
| `agent.server_addr` | Server address | Allow the ttRPC server address to be specified | string | `"vsock://-1:1024"` |
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.