It frequently causes "Resource Temporarily Unavailable (OS Error 11)"
with the original 250ms read timeout When passing through devices via
VFIO in QEMU. The root cause lies in synchronization timeout windows
failing to accommodate inherent delays during critical hardware init
phases in kernel space. This commit would increase the timeout to 5000ms
which was determined through some tests. While not guaranteeing complete
resolution for all hardware combinations, this change significantly
reduces timeout failures.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.
Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Since kernel v6.3 the vsock packet is not split over two descriptors and
is instead included in a single one.
Therefore, we currently decide the specific method of obtaining
BufWrapper based on the length of descriptor.
Refer:
a2752fe04fhttps://git.kernel.org/torvalds/c/71dc9ec9ac7d
Signed-off-by: Xingru Li <lixingru.lxr@linux.alibaba.com>
[ Gao Xiang: port this patch from the internal branch to address Linux 6.1.63+. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Currently, Kata EROFS support needs it, otherwise it will:
[ 0.564610] erofs: (device sda): mounted with root inode @ nid 36.
[ 0.564858] overlayfs: failed to set xattr on upper
[ 0.564859] overlayfs: ...falling back to index=off,metacopy=off.
[ 0.564860] overlayfs: ...falling back to xino=off.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Some nvidia gpu pci address domain with 0001,
current runtime default deal with 0000:bdf,
which cause address errors during device initialization
and address conflicts during device registration.
Fixes#11252
Signed-off-by: yangsong <yunya.ys@antgroup.com>
Fixed "note: Not following: ./../../../tools/packaging/guest-image/lib_se.sh:
openBinaryFile: does not exist (No such file or directory) [SC1091]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Although the script will inherit that setting from the caller scripts,
expliciting it in the file will vanish shellcheck "warning: Use 'pushd
... || exit' or 'pushd ... || return' in case pushd fails. [SC2164]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Addressed the following shellcheck advices:
SC2046 (warning): Quote this to prevent word splitting.
SC2248 (style): Prefer double quoting even when variables don't contain special characters
SC2250 (style): Prefer putting braces around variable references even when not strictly required.
SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's import to handle port allocation in a PCIe topology before vfio
deivce hotplug via QMP.
The code ensures that VFIO devices are properly allocated to available
ports (either root ports or switch ports) and updates the device's bus
and port information accordingly.
It'll first retrieves the PCIe port type from the topology using
pcie_topo.get_pcie_port(). And then, searches for an available node in
the PCIe topology with RootPort or SwitchPort type and allocates the
VFIO device to the found available port. Finally, Updates the device's
bus with the allocated port's ID and type.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit implements the `find_available_node` function,
which searches the PCIe topology for the first available
`TopologyPortDevice` or `SwitchDownPort`.
If no available node is found in either the `pcie_port_devices`
or the connected switches' downstream ports, the function returns
`None`.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit note that the current implementation restriction where
'multifunction=on' is temporarily unsupported. While the feature
isn't available in the present version, we explicitly acknowledge
this limitation and commit to addressing it in future iterations
to enhance functional completeness.
Tracking issue #11292 has been created to monitor progress towards
full multifunction support.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To support port devices for vfio devices, more fields need to be
introduced to help pass port type, bus and other information.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When try to delete a cgroup, it's needed to move all of the
tasks/procs in the cgroup into root cgroup and then delete it.
Since for cgroup v2, it doesn't support to move thread into
root cgroup, thus move the processes instead of moving tasks
can fix this issue.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
this script relies on temporary subscriptions and won't cleanup any
resources. Let's improve the logging to better describe what resources
were created and how to clean them, if the user needs to do so.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We used hardcoded "ci/openshift-ci/cluster" location which expects this
script to be only executed from the root. Let's use SCRIPT_DIR instead
to allow execution from elsewhere eg. by user bisecting a failed CI run.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
In CI we hit problem where just after `az login` the first `az
network vnet list` command fails due to permission. We see
"insufficient permissions" or "pending permissions", suggesting we should
retry later. Manual tests and successful runs indicate we do have the
permissions, but not immediately after login.
Azure docs suggest using extra `az account set` but still the
propagation might take some time. Add a loop retrying
the first command a few times before declaring failure.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
kbs_k8s_svc_host() returns the ingress IP when the KBS service is
exposed via an ingress. In Azure AKS the ingress can time a while to be
fully ready and recently we have noticed on CI that kbs_k8s_svc_host()
has returned empty value. Maybe the problem is on current timeout being
too low, so let's increase it to 50 seconds to see if the situation
improves.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added 'report-tests' command to gha-run.sh to print to stdout a report
of the tests executed.
For example:
```
SUMMARY (2025-02-17-14:43:53):
Pass: 0
Fail: 1
STATUSES:
not_ok foo.bats
OUTPUTS:
::group::foo.bats
1..3
not ok 1 test 1
not ok 2 test 2
ok 3 test 3
1..2
not ok 1 test 1
not ok 2 test 2
::endgroup::
```
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently run_kubernetes_tests.sh sends all the bats outputs to stdout
which can be very difficult to browse to find a problem, mainly on
CI. With this change, each bats execution have its output sent to
'reports/yyy-mm-dd-hh:mm:ss/<status>-<bats file>.log' where <status>
is either 'ok' (tests passed) or 'not_ok' (some tests failed).
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added variable REPO_COMPONENTS (default: "main") which sets components
used by mmdebstrap for rootfs building.
This is useful for custom image builders who want to include EXTRA_PKGS
from components other than the default "main" (e.g. "universe").
Fixes: #11278
Signed-off-by: Jacek Tomasiak <jtomasiak@arista.com>
Signed-off-by: Jacek Tomasiak <jacek.tomasiak@gmail.com>
Use $(sandbox-namespace) wildcard in case none is specified in yaml. If wildcard is present, compare
input against annotation value.
Fixes regression introduced in https://github.com/microsoft/kata-containers/pull/273
where samples that use metadata.namespace env var were no longer working.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>