Commit Graph

14303 Commits

Author SHA1 Message Date
Fabiano Fidêncio
3733266a60
ci: nydus: Treat the snapshotter as a dependency
Instead of deploying and removing the snapshotter on every single run,
let's make sure the snapshotter is always deployed in the TDX case.

We're doing this as an experiment, in order to see if we'll be able to
reduce the failures we've been facing with the nydus snapshotter.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-15 22:44:30 +02:00
Aurélien Bombo
0223eedda5
Merge pull request #10050 from burgerdev/request-hardening
genpolicy: hardening some agent requests
2024-08-15 08:31:21 -07:00
Fupan Li
365df81d5e
Merge pull request #10148 from lifupan/main_sandboxapi
runtime-rs: Add the wait_vm support for hypervisors
2024-08-15 17:08:38 +08:00
GabyCT
ecfbc9515a
Merge pull request #10158 from GabyCT/topic/k8sstabil
tests: Add kubernetes stability test
2024-08-14 14:44:49 -06:00
Gabriela Cervantes
d48ad94825 tests: Add kubernetes stability test
This PR adds a k8s stability test that will be part of the CoCo Kata
stability tests that will run weekly.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-14 15:30:49 +00:00
Fupan Li
cadcf5f92d runtime-rs: Add the wait_vm support for hypervisors
Add the wait_vm method for hypervisors. This is a
prerequisite for sandbox api support.

Fixes: #7043

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-08-14 12:01:34 +08:00
Fupan Li
506977b102
Merge pull request #10156 from GabyCT/topic/disablevolume
tests: Disable k8s file volume test
2024-08-14 12:00:47 +08:00
GabyCT
b0b6a1baea
Merge pull request #10154 from GabyCT/topic/stressk8s
tests: Add kubernetes stress-ng tests
2024-08-13 15:09:59 -06:00
Gabriela Cervantes
e580e29246 tests: Disable k8s file volume test
This PR disables the k8s file volume test as we are having random failures
in multiple GHA CIs mainly because the exec_host function sometimes
does not work properly.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-13 20:50:18 +00:00
Gabriela Cervantes
bdca5ca145 tests: Add kubernetes stress-ng tests
This PR adds kubernetes stress-ng tests as part of the stability testing
for kata.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-13 16:23:52 +00:00
Fabiano Fidêncio
99730256a2
Merge pull request #10149 from fidencio/topic/kata-manager-relax-opt-check
kata-manager: Only check files when tarball is not passed
2024-08-13 16:26:16 +02:00
Markus Rudy
bce5cb2ce5 genpolicy: harden CreateSandboxRequest checks
Hooks are executed on the host, so we don't expect to run hooks and thus
require that no hook paths are set.

Additional Kernel modules expand the attack surface, so require that
none are set. If a use case arises, modules should be allowlisted via
settings.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-08-13 09:01:58 +02:00
Markus Rudy
aee23409da genpolicy: harden CopyFileRequest checks
CopyFile is invoked by the host's FileSystemShare.ShareFile function,
which puts all files into directories with a common pattern. Copying
files anywhere else is dangerous and must be prevented. Thus, we check
that the target path prefix matches the expected directory pattern of
ShareFile, and that this directory is not escaped by .. traversal.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-08-13 09:01:58 +02:00
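
The path check described in the CopyFileRequest commit above can be sketched roughly as follows. This is only an illustration in Rust, not the rule the commit actually adds: the function name `is_allowed_copy_target` and the prefix `/run/example/shared` used in the usage note are hypothetical, since the log does not state the real directory pattern.

```rust
use std::path::{Component, Path};

/// Hypothetical helper: returns true only when `target` lies under
/// `allowed_prefix` and contains no `..` component.
fn is_allowed_copy_target(target: &str, allowed_prefix: &str) -> bool {
    let target = Path::new(target);

    // Reject any `..` component outright, so the prefix cannot be
    // escaped by directory traversal.
    if target
        .components()
        .any(|c| matches!(c, Component::ParentDir))
    {
        return false;
    }

    // `Path::starts_with` compares whole path components, so a sibling
    // such as `/run/example/shared-evil` does not match the prefix.
    target.starts_with(allowed_prefix)
}
```

With the assumed prefix, `is_allowed_copy_target("/run/example/shared/abc/file.txt", "/run/example/shared")` is accepted, while `/run/example/shared/../../etc/passwd` and `/etc/passwd` are both rejected.
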
Steve Horsman
91084058ae
Merge pull request #10007 from wainersm/run_k8s_on_free_runners
ci: Transition GARM tests to free runners, pt. II
2024-08-12 18:12:18 +01:00
Fabiano Fidêncio
5fe65e9fc2
kata-manager: Only check files when tarball is not passed
Only do the checking in case the tarball was not explicitly passed by
the user.  We have no control of what's passed and we cannot expect that
all the files are going to be under /opt.

Fixes: #10147

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-12 13:54:24 +02:00
GabyCT
775f6bdc5c
Merge pull request #10142 from GabyCT/topic/updatestress
tests: Update ubuntu image for stress Dockerfile
2024-08-09 16:11:35 -06:00
Gabriela Cervantes
5e5fc145cd tests: Update ubuntu image for stress Dockerfile
This PR updates the ubuntu image for stress Dockerfile. The main purpose
is to have a more updated image compared with the one that is in libpod
which has not been updated in a while.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-09 15:29:10 +00:00
Steve Horsman
e4c023a9fa
Merge pull request #10140 from stevenhorsman/kata-version-in-artefact-version
ci: cache: Include kata version in artefact versions
2024-08-09 11:37:09 +01:00
Fabiano Fidêncio
44b08b84b0
Merge pull request #10113 from Freax13/fix/no-scsi-off
qemu: don't emit scsi parameter
2024-08-08 16:23:36 +02:00
stevenhorsman
b6a3a3f8fe ci: cache: Include kata version in artefact versions
- At the moment we aren't factoring in the kata version on our caches,
so it means that when we bump this just before release, we don't
rebuild components that pull in the VERSION content, so the release build
ends up with incorrect versions in its binaries

Fixes: #10092
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-08-08 14:58:58 +01:00
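
The idea behind the cache commit above is that a cached artefact's version should change whenever the kata version changes. A minimal Rust sketch of that idea follows; the function name `artefact_version` and the `-` separator are assumptions made for illustration, and the project's real cache logic is not shown in this log.

```rust
/// Hypothetical helper: combines a component's own version with the
/// kata version, so a kata version bump yields a new cache entry.
fn artefact_version(component_version: &str, kata_version: &str) -> String {
    format!("{}-{}", component_version, kata_version)
}
```

Because the kata version is part of the resulting string, bumping it just before a release no longer reuses an artefact built against the previous VERSION content.
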
GabyCT
584d7a265e
Merge pull request #10127 from GabyCT/topic/execimage
tests:k8s: Update image in kubectl debug for the exec host function
2024-08-07 17:00:52 -06:00
Archana Shinde
1012449141
Merge pull request #10129 from hex2dec/qemu-aio-native
tools: Support for building qemu with linux aio
2024-08-07 14:32:52 -07:00
Archana Shinde
a6a736eeaf
Merge pull request #10089 from amshinde/enable-nerdctl-clh
ci: Enable nerdctl tests for clh
2024-08-07 12:13:00 -07:00
Wainer dos Santos Moschetta
374405aed1 workflows/run-k8s-tests-on-amd64: remove 'instance' from matrix
The jobs are all executed on ubuntu-22.04 so it's invariant and
can be removed from the matrix (this will shrink the jobs names).

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-08-07 16:00:39 -03:00
Wainer dos Santos Moschetta
d11ce129ac workflows: merge run-k8s-tests-on-garm and run-k8s-tests-with-crio-on-garm
Created the run-k8s-tests-on-amd64.yaml which is a merge of
run-k8s-tests-on-garm.yaml and run-k8s-tests-with-crio-on-garm.yaml

ps: renamed the job from 'run-k8s-tests' to 'run-k8s-tests-on-amd64' so
it is easier to find in the GitHub UI and to distinguish from s390x,
ppc64le, etc...

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-08-07 15:50:43 -03:00
Wainer dos Santos Moschetta
ed0732c75d workflows: migrate run-k8s-tests-with-crio-on-garm to free runners
Switch to GitHub-managed runners, just like the run-k8s-tests-on-garm
workflow.

See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-08-07 15:20:42 -03:00
Wainer dos Santos Moschetta
3d053a70ab workflows: migrate run-k8s-tests-on-garm to free runners
Switched to GitHub-managed runners. The instance_type parameter was
removed and K8S_TEST_HOST_TYPE is set to "all", which combines the
tests of "small" and "normal". This halves the number of
jobs.

See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-08-07 15:20:42 -03:00
Wainer dos Santos Moschetta
dfb92e403e tests/k8s: add "deploy-kata"/"cleanup" actions to gh-run.sh
These new "kata-deploy" and "cleanup" actions are equivalent to
"kata-deploy-garm" and "cleanup-garm", respectively, and should be
used on the workflows being migrated from GARM to
GitHub's managed runners.

Eventually "kata-deploy-garm" and "cleanup-garm" won't be used anymore,
and then we will be able to remove them.

See: #9940
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-08-07 15:20:23 -03:00
Dan Mihai
2da77c6979
Merge pull request #10068 from burgerdev/genpolicy-test
genpolicy: add crate-scoped integration test
2024-08-06 16:10:46 -07:00
GabyCT
fb166956ab
Merge pull request #10132 from fidencio/topic/support-image-pull-with-nerdctl
runtime: image-pull: Make it work with nerdctl
2024-08-06 15:33:40 -06:00
Gabriela Cervantes
d0ca43162d tests:k8s: Update image in kubectl debug for the exec host function
This PR updates the image that we are using in the kubectl debug command
as part of the exec host function, as the current alpine image does not
allow creating a temporary file, for example, which causes random
Kubernetes failures.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-08-06 21:13:46 +00:00
Fabiano Fidêncio
63802ecdd9
Merge pull request #9880 from zvonkok/helm-chart
kata-deploy: Add Helm Chart
2024-08-06 22:55:31 +02:00
Archana Shinde
ba884aac13 ci: Enable nerdctl tests for clh
A recent fix should resolve some of the issues seen earlier with clh
with the go runtime. Enabling this test to check if the issue is still
seen.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-08-06 10:41:42 -07:00
Fabiano Fidêncio
f33f2d09f7 runtime: image-pull: Make it work with nerdctl
Our code for handling images being pulled inside the guest relies on a
containerType ("sandbox" or "container") being set as part of the
container annotations, which is done by the CRI Engine being used, and
depending on the used CRI Engine we check for a specific annotation
related to the image-name, which is then passed to the agent.

However, when running kata-containers without kubernetes, specifically
when using `nerdctl`, none of those annotations are set at all.

One thing that we can do to allow folks to use `nerdctl`, however, is to
take advantage of the `--label` flag, and document on our side that
users must pass `io.kubernetes.cri.image-name=$image_name` as part of
the label.

By doing this, and changing our "fallback" so we can always look for
such annotation, we ensure that nerdctl will work when using the nydus
snapshotter, with kata-containers, to perform image pulling inside the
pod sandbox / guest.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-06 17:07:45 +02:00
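
The fallback lookup described in the image-pull commit above can be sketched as follows. This is a Rust illustration under stated assumptions, not the runtime's actual code: the function name `image_name` and the engine-specific key `example.engine/image-name` in the usage note are hypothetical. Only the `io.kubernetes.cri.image-name` key comes from the commit message.

```rust
use std::collections::HashMap;

/// Annotation key named in the commit message; with nerdctl it is
/// supplied by the user via `--label`.
const FALLBACK_IMAGE_KEY: &str = "io.kubernetes.cri.image-name";

/// Hypothetical helper: try the engine-specific annotation first (if
/// one is known), then always fall back to the CRI image-name key.
fn image_name<'a>(
    annotations: &'a HashMap<String, String>,
    engine_key: Option<&str>,
) -> Option<&'a str> {
    engine_key
        .and_then(|key| annotations.get(key))
        .or_else(|| annotations.get(FALLBACK_IMAGE_KEY))
        .map(String::as_str)
}
```

If the annotations contain only `io.kubernetes.cri.image-name`, calling `image_name(&annotations, None)` or `image_name(&annotations, Some("example.engine/image-name"))` both resolve to that value, which is the behavior the commit relies on for nerdctl.
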
Zvonko Kaiser
8d9bec2e01
ci: add reset_runtime to cleanup
Adding reset_cleanup to cleanup action so that it is done automatically
without the need to run yet another DS just to reset the runtime.

This is now part of the lifecycle hook when issuing kata-deploy.sh
cleanup

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-08-06 11:57:04 +02:00
Zvonko Kaiser
1221ab73f9
ci: make cleanup_kata_deploy really simple
Remove the unneeded logic for cleanup; the values are
encapsulated in the deployed Helm release

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-08-06 11:57:04 +02:00
Zvonko Kaiser
51690bc157
ci: Use helm to deploy kata-deploy
Rather than modifying the kata-deploy scripts let's use Helm and
create a values.yaml that can be used to render the final templates

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-08-06 11:57:04 +02:00
Zvonko Kaiser
94b3348d3c
kata-deploy: Add Helm Chart
For easier handling of kata-deploy we can leverage a Helm chart to get
rid of all the base and overlays for the various components

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-08-06 11:57:04 +02:00
Zhiwei Huang
d455883b46 tools: Support for building qemu with linux aio
The kata containers hypervisor qemu configuration supports setting
block_device_aio="native", but the kata static build of qemu does
not add the linux aio feature.

The libaio-dev library is a necessary dependency for building qemu
with linux aio.

Fixes: #10130

Signed-off-by: Zhiwei Huang <ai.william@outlook.com>
2024-08-06 14:30:45 +08:00
Markus Rudy
69535e5458 genpolicy: add crate-scoped integration test
Provides a test runner that generates a policy and validates it
with canned requests. The initial set of test cases is mostly for
illustration and will be expanded incrementally.

In order to enable both cross-compilation on Ubuntu test runners as well
as native compilation on the Alpine tools builder, it is easiest to
switch to the vendored openssl-src variant. This builds OpenSSL from
source, which depends on Perl at build time.

Adding the test to the Makefile makes it execute in CI, on a variety of
architectures. Building on ppc64le requires a newer version of the
libz-ng-sys crate.

Fixes: #10061

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-08-05 11:52:01 +02:00
Markus Rudy
4d1416529d genpolicy: fix clippy v1.78.0 warnings
cargo clippy has two new warnings that need addressing:
- assigning_clones
  These were fixed by clippy itself.
- suspicious_open_options
  I added truncate(false) because we're opening the file for reading.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-08-05 11:48:30 +02:00
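
For context on the `suspicious_open_options` fix in the clippy commit above: the lint flags `OpenOptions` chains that call `create(true)` without saying whether an existing file should be truncated. A minimal Rust sketch of the explicit form follows; the function name `open_keep_contents` is hypothetical and this is not the genpolicy code itself.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::path::Path;

/// Hypothetical helper: open (or create) a file while explicitly
/// keeping any existing contents.
fn open_keep_contents(path: &Path) -> io::Result<File> {
    OpenOptions::new()
        .read(true)
        // `create(true)` requires write (or append) access.
        .write(true)
        .create(true)
        // Stating the truncation behavior explicitly is what
        // satisfies the lint; `false` preserves existing data.
        .truncate(false)
        .open(path)
}
```

Passing `truncate(false)` does not change the default behavior; it only makes the intent visible, which matters when the file is going to be read rather than overwritten.
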
Fabiano Fidêncio
43dca8deb4
Merge pull request #10121 from microsoft/saulparedes/add_version_flag
genpolicy: add --version flag
2024-08-03 21:22:10 +02:00
Fabiano Fidêncio
3b2173c87a
Merge pull request #10124 from fidencio/topic/ci-enable-encrypted-image-tests-for-tees
ci: Enable encrypted image tests for TEEs
2024-08-03 11:39:51 +02:00
Fabiano Fidêncio
89f1581e54
ci: Enable encrypted image tests for TEEs
After experimenting a little bit with those tests, they seem to be
passing on all the available TEE machines.

With this in mind, let's just enable them for those machines.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-03 09:27:32 +02:00
Fabiano Fidêncio
3b896cf3ef
Merge pull request #10125 from fidencio/topic/un-break-ci
ci: Remove jobs that are not running
2024-08-03 09:27:04 +02:00
Fabiano Fidêncio
62a086937e
ci: Remove jobs that are not running
When re-enabling those we'll need a smart way to do so, as this limit of
20 workflows referenced is just ... weird.

However, for now, it's more important to add the jobs related to the new
platforms than keep the ones that are actively disabled.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-08-03 09:24:05 +02:00
GabyCT
76af5a444b
Merge pull request #10075 from microsoft/saulparedes/hooks
genpolicy: reject create custom hook settings
2024-08-02 15:36:34 -06:00
GabyCT
aadde2c25b
Merge pull request #10120 from kata-containers/fix_metrics_json_results_file
Fix metrics json results file
2024-08-02 11:29:02 -06:00
Fabiano Fidêncio
b93a0642e0
Merge pull request #10123 from fidencio/topic/re-enable-arm-ci
ci: re-enable arm CI
2024-08-02 17:48:35 +02:00
Dan Mihai
2628b34435
Merge pull request #10098 from microsoft/danmihai1/allow-failing
agent: fix the AllowRequestsFailingPolicy functionality
2024-08-02 08:42:47 -07:00