Commit Graph

13370 Commits

Author SHA1 Message Date
Lukáš Doktor
6b0eaca4d4
tests: Add support for nodeport ingress for the kbs setup
this can be used on kcli or other systems where cluster nodes are
accessible from all places where the tests are running.

Fixes: #9272

Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
2024-03-26 16:59:00 +01:00
Greg Kurz
5009fabde4 release: Keep it draft until all artifacts have been published
The automated release workflow starts with the creation of the release in
GitHub. This is followed by the build and upload of the various artifacts,
which can be very long (like hours). During this period, the release appears
to be fully available in https://github.com/kata-containers/kata-containers/
even though it lacks all the artifacts. This might be confusing for users
or automation consuming the release.

Create the release as draft and clear the draft flag when all jobs are
done. This ensure that the release will only be tagged and made public
when it is fully usable.

If some job fails because of network timeout or any other transient
error, the correct action is to restart the failed jobs until they
eventually all succeed. This is by far the quicker path to complete
the release process.

If the workflow is *canceled* for some reason, the draft release is left
behind. A new run of the workflow will create a brand new draft release
with the same name (not an issue with GitHub). The draft release from
the previous run should be manually deleted. This step won't be automated
as it looks safer to leave the decision to a human.

[1] https://github.com/kata-containers/kata-containers/releases

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-26 14:48:05 +01:00
Fabiano Fidêncio
cfe75f9422
k8s: confidential: Update cpuid to its latest release
Since v2.2.6 it can detect TDX guests on Azure, so let's bump it even if
Azure peer-pods are not currently used as part of our CI.

Fixes: #9348

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-03-26 10:21:12 +01:00
Chengyu Zhu
d16971e37e
Merge pull request #9325 from ChengyuZhu6/image_service
agent:image: Refactor code to improve memory efficiency of image service
2024-03-26 10:38:37 +08:00
Dan Mihai
6c72c29535 genpolicy: reduce policy debug prints
Kata CI has full debug output enabled for the cbl-mariner k8s tests,
and the test AKS node is relatively slow. So debug prints from policy
are expensive during CI.

Fixes: #9296

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-26 02:21:26 +00:00
Alex Lyn
cec943fc26
Merge pull request #9244 from Apokleos/dgb-gpu
runtime-rs/dragonball: add support building kernel with upcall and GPU hotplug
2024-03-26 08:53:54 +08:00
Chelsea Mafrica
4e3deb5a3b tools: Fix path for installing yq in packaging script
The lib.sh script uses the right directory but the wrong path for the
script that installs yq; fix it.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Chelsea Mafrica
cfb977625e docs: Remove links to tests repo
Remove links to tests repo and update with corresponding location in the
current repo.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Chelsea Mafrica
d69514766e src: Remove references to files in tests repo
Change scripts and source that uses files in the tests repo to use the
corresponding file in the current repo.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Gabriela Cervantes
ddef2be4f1 docs: Remove stale kernel information
This PR removes stale kernel information from the README document.

Fixes #9343

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-25 15:57:00 +00:00
Greg Kurz
e9e94d2dbd release: Give a pretty name to all steps
For a prettier rendering in the web UI.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-25 15:50:35 +01:00
Greg Kurz
dce6ea57b2 release: Simplify the create-new-release action of release.sh
Now that the version is an invariant for the entire workflow, it
isn't required to obtain it with an environment variable. Just
rely on the content of the `VERSION` file like other actions.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-25 15:50:35 +01:00
Alex Lyn
5c54315a87 dragonball: fix CI failure due to poor UT adaptation.
Fixes: #9144

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:27 +08:00
Alex Lyn
079d894496 kernel: bump version in kata config version
Fixes: #9140

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:27 +08:00
Alex Lyn
070c3fa657 docs: add doc about building kernel with upcall and GPU hotplug
We need some docs about how to build a guest kernel to support
both Upcall and Nvidia GPU Passthrough(hotplug) at the same time.
This patch is to do such thing to help users to build a guest
kernel with support both Upcall and Nvidia GPU hotplug/unlplug.

Fixes: #9140

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:17 +08:00
ChengyuZhu6
06b9935402 docs: Add a document for kata guest image management design
Add a document for kata guest image management design.

Related feature: #8484

Fixes: #9225 -- part I

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-03-25 18:17:23 +08:00
Chengyu Zhu
4029d154ba
Merge pull request #9313 from ChengyuZhu6/rtest
agent: Refactor unit tests to leverage rstest for parameterization
2024-03-25 10:31:45 +08:00
Alex Lyn
bc309b9865 kernel: add CONFIG_CRYPTO_ECDSA into whitelist
CONFIG_CRYPTO_ECDSA is not supported in older kernels such as 5.10.x
which may cause building broken problem if we build such kernel with
NVIDIA GPU in version 5.10.x

So this patch is to add CONFIG_CRYPTO_ECDSA into whitelist.conf to
avoid break building guest kernel with NVIDIA GPU.

Fixes: #9140

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 08:05:31 +08:00
ChengyuZhu6
f47408fdf4 agent:image: Refactor code to improve memory efficiency of image service
Currently, `.lock().await.clone()` results in `Option<ImageService>` being duplicated in memory with each call to `singleton()`.
Consequently, if kata-agent receives numerous image pulling requests simultaneously,
it will lead to the allocation of multiple `Option<ImageService>` instances in memory, thereby consuming additional memory resources.

In image.rs, we introduce two public functions:
`merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate
the operations on `IMAGE_SERVICE`, ensuring that its internal details remain
hidden from external modules such as `rpc.rs`.

Fixes: #9225 -- part II

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-25 07:46:50 +08:00
ChengyuZhu6
7a49ec1c80 agent:util: Refactor the unit tests to leverage rstest
Refactor the unit tests in util.rs to leverage rstest for parameterization.

Fixes: #9314

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:53 +08:00
ChengyuZhu6
2df2b4d30d agent:namespace: Refactor unit tests to leverage rstest
Refactor the unit tests in `namespace.rs` to leverage rstest for parameterization.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:48 +08:00
Hyounggyu Choi
d915a79e2d
Merge pull request #9280 from BbolroC/enable-qemu-on-s390x
runtime-rs: Enable qemu on s390x
2024-03-22 23:58:42 +01:00
Fabiano Fidêncio
25cd28a32b
Merge pull request #9337 from fidencio/topic/bump-nydus-snapshotter
versions: Update nydus-snapshotter to v0.13.11
2024-03-22 22:18:18 +01:00
Hyounggyu Choi
81aaa34bd6 runtime-rs: Add DeviceVirtioSerial and DeviceVirtconsole
It is observed that virtiofsd exits immediately on s390x
if there is no attached console devices.
This commit resolves the issue by migrating `appendConsole()`
from runtime and being triggered in `start_vm()`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
2cfe745efb runtime-rs: Enable memory backend option for Machine for s390x
For s390x, it requires an additional option `memory-backend` for `-machine`.
Otherwise, virtiofsd exits with HandleRequest(InvalidParam).

This commit is to add a field `memory_backend` to `struct Machine`
and turn it on for s390x.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
9bcfaad625 runtime-rs: Add ccw block device for rootfs
Like nvdimm for x86_64, a block device for s390x should be
treated differently with `virtio-blk-ccw`.
This is to generate a QEMU command line parameter for a block
device by using `-blockdev` and `-device` if the `vm_rootfs_driver`
is set to `virtio-blk-ccw`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
David Esparza
3e40051634
Merge pull request #9255 from dborquez/thread_pid_function
runtime-rs: ch: Implement full thread/tid/pid handling
2024-03-22 10:05:02 -06:00
Fabiano Fidêncio
d0949759ec
versions: Update nydus-snapshotter to v0.13.11
This version brings in a fix for cleaning up k3s/rke2 environments,
which directly impacts the TDX machine that's part of our CI.

Fixes: #9318

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-03-22 14:56:18 +01:00
Greg Kurz
e4f6a778a8
Merge pull request #9321 from fidencio/topic/releases-follow-up-VI
Revert "release: Skip --generate-notes for this release"
2024-03-22 10:44:40 +01:00
GabyCT
a67382fd00
Merge pull request #9324 from GabyCT/topic/udevguide
docs: Update libseccomp instructions in Developers Guide
2024-03-21 14:25:41 -06:00
Gabriela Cervantes
d54cdd3f0c scripts: Fix unbound variables in k8s setup script
This PR fixes the unbound variables error when trying to run
the setup script locally in order to avoid errors.

Fixes #9328

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-21 19:10:16 +00:00
Chengyu Zhu
9a4cb96262
Merge pull request #9312 from ChengyuZhu6/show-feature
agent: Add guest-pull to the list of agent features in announce()
2024-03-21 23:35:29 +08:00
David Esparza
b498e140a1
runtime-rs: ch: Implement full thread/tid/pid handling
Add in the full details once cloud-hypervisor/cloud-hypervisor#6103
has been implemented, and the feature is available in a Cloud Hypervisor
release.

Fixes: #8799

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-03-21 08:24:53 -06:00
James O. D. Hunt
1e684f5848
Merge pull request #9259 from jodh-intel/tests-add-static-checks-announce
tests: static checker: Add announce message
2024-03-21 13:59:36 +00:00
ChengyuZhu6
754399d909 agent: Add guest-pull to the list of agent features in announce()
Add guest-pull to the list of agent features in announce().

Fixes: #9225 -- part IV

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-21 20:01:52 +08:00
Xuewei Niu
9c4f9dcb35
Merge pull request #9311 from studychao/chao/fix_mtrr
Dragonballl: introduce MTRR regs support
2024-03-21 17:24:27 +08:00
Hyounggyu Choi
9b2c08935b runtime-rs: Pass different device argument based on bus type
Currently, `*-pci` is used as an argument for the device config.
It is not true for a case where a different type of bus is used.
s390x uses `ccw`.
This commit is to make it flexible to generate the device argument
based on the bus type. A structure `DeviceVhostUserFsPci` and
`VhostVsockPci` is renamed to `DeviceVhostUserFs` and `VhostVsock`
because the structure name is not bound to a certain bus type any more.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-21 09:25:37 +01:00
GabyCT
03f3d3491d
Merge pull request #9265 from GabyCT/topic/fixnydusclean
gha: Fix nydus namespace clean up
2024-03-20 16:17:38 -06:00
GabyCT
702a8a440f
Merge pull request #9309 from GabyCT/topic/fixlograndom
gha: Update journal log names for kubernetes artifacts
2024-03-20 16:17:17 -06:00
Gabriela Cervantes
05f4dc1902 docs: Update libseccomp instructions in Developers Guide
This PR updates the libseccomp instructions in the Developers Guide.

Fixes #9323

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-20 20:44:24 +00:00
GabyCT
163103d59e
Merge pull request #9307 from GabyCT/topic/fixdocreq
docs: Update links in the Documentation Requirements document
2024-03-20 14:29:04 -06:00
Gabriela Cervantes
af18221ab7 docs: Update links in the Documentation Requirements document
This PR updates the url links in the Documentation Requirements
document.

Fixes #9306

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-20 15:45:49 +00:00
Gabriela Cervantes
a855ecf21b gha: Update journal log names for kubernetes artifacts
This PR updates the journal log names for kubernetes artifacts
in order to make sure that we have different names when we are
running parallel GHA jobs.

Fixes #9308

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-20 15:44:20 +00:00
Gabriela Cervantes
4fb8f8705f gha: Fix nydus namespace clean up
This PR terminates the nydus namespace to avoid the error of
that the flag needs an argument.

Fixes #9264

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-20 15:41:39 +00:00
Fabiano Fidêncio
0278fc8a91
Revert "release: Skip --generate-notes for this release"
This reverts commit 0fa59ff94b, as now
we'll be able to use the `--generate-notes`, hopefully, without blowing
the allowed limit.

Fixes: #9064 - part VI

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-03-20 15:48:22 +01:00
James O. D. Hunt
577abd014b tests: static checker: Add announce message
Added an announcement message to the `static-checks.sh` script. It runs
platform / architecture specific code so it would be useful to display
details of the platform the checker is running on to help with
debugging.

Fixes: #9258.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-03-20 13:41:26 +00:00
James O. D. Hunt
4af4a8ad2b tests: static checker: Create setup function
Move some of the common code into a setup function.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-03-20 11:58:28 +00:00
Fabiano Fidêncio
1aec4f737a
Merge pull request #9316 from fidencio/topic/releases-follow-up-V
release: Skip --generate-notes for this release
2024-03-20 10:50:14 +01:00
Fabiano Fidêncio
0fa59ff94b
release: Skip --generate-notes for this release
This release is a special case, as we've slacked for 6 months and the
release content is way too long ... long enough to exceed the allowed
limit for the release notes.

With this in mind we'll just remove the `--generate-notes` for now, and
then revert this commit as soon as the release is out, as releases
should be happening every month and, ideally, we won't reach this
situation never ever again.

Fixes: #9064 - part V

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-03-20 10:32:11 +01:00
Hyounggyu Choi
7b3d1adb8c libs: Bump sysinfo to v0.30.5
It has been observed that the runtime stops running around
`sysinfo::total_memory()` while adjusting a config on s390x.
This is to update the crate to the latest version which happened
to resolve the issue. (No explicit release note for this)

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-20 09:27:13 +01:00