We need to remove the device from the tracking map, a container
restart will increment the bus index and we will get out of root-ports
and crash the machine.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need special handling for pod_sandbox, pod_container and
single_container how and when to inject CDI devices
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigates
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.
Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched contianerd.
We want to eliminate the patched continerd in the GPU case
as well.
Fixes: #8860
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Let's see if it helps with issues like:
```
error: must build at directory: not a valid directory: evalsymlink
failure on
'"/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/../../..//tools/packaging/kata-deploy/kata-cleanup/overlays/k0s"'
: lstat
/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/":
no such file or directory
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This takes a few minutes that could be saved, so let's avoid doing this
on all the platforms, but simply do this when it's needed (the baremetal
use case).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently only "baremetal" runs all the tests, but we could easily run
"all" locally or using the github provided runners, even when not using
a "baremetal" system.
The reason I'd like to have a differentiation between "all" and
"baremetal" is because "baremetal" may require some cleanup, which "all"
can simply skip if testing against a fresh created VM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For now we've only exposed the option to deploy kata-deploy for k3s and
vanilla kubernetes when using containerd.
However, I do need to also deploy k0s and rke2 for an internal CI, and
having those exposed here do not hurt, and allow us to easily expand the
CI at any time in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s was added to kata-deploy, but it's kata-cleanup counterpart was
never added. Let's fix it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s deployment has been broken since we moved to using `tomlq` in our
scripts. The reason is that before using `tomlq` our script would,
involuntarily, end up creating the file.
Now, in order to fix the situation, we need to explicitly create the
file and let `tomlq` add the needed content.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This option allows to add a debug monitor socket when
`enable_debug = true` to control QEMU within debugging case.
Fixes: #9603
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The `update_env_pci()` function need the PCI address mapping to
translate the host PCI address to guest PCI address in below
environment variables:
- PCIDEVICE_<prefix>_<resource-name>_INFO
- PCIDEVICE_<prefix>_<resource-name>
So collect PCI address mapping for both vfio-pci-gk and
vfio-pci devices.
Fixes#9614
Signed-off-by: Lei Huang <leih@nvidia.com>
The k8s test suite halts on the first failure, i.e., failing-fast. This
isn't the behavior that we used to see when running tests on Jenkins and it
seems that running the entire test suite is still the most productive way. So
this disable fail-fast by default.
However, if you still wish to run on fail-fast mode then just export
K8S_TEST_FAIL_FAST=yes in your environment.
Fixes: #9697
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test has failed in confidential runtime jobs. Skip it
until we don't have a fix.
Fixes: #9663
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
One of our machines is running CentOS 9 Stream, and we could easily
verify that we can build and install the kbs client there, thus we're
expanding the installation script to also support CentOS 9 Stream.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`externals.coco-kbs.toolchain` is not defined, get the rust_version from
`externals.coco-trustee.toolchain` instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're bumping the version in order to bring in the customisation needed
for setting up a custom pccs, which is needed for the KBS integration
tests with Kata Containers + TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR enables the installation and unistallation of the kbs client
as well as general coco components needed for the TDX GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the minvalue for boot time to avoid the random failures
of the GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code.
Fixes:#9692
Signed-off-by: sidneychang <2190206983@qq.com>
The `dial_timeout` works fine for Runtime-go, but is obsoleted in
Runtime-rs.
When the pod cannot connect to the Agent upon starting, we need to adjust
the `reconnect_timeout_ms` to increase the number of connection attempts to
the Agent.
Fixes: #9688
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
For the TD attestation to work the connection to QGS on the host is needed.
By default QGS runs on vsock port 4050, but can be modified by the host owner.
Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below:
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'
Fixes: #9497
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>