Commit Graph

11493 Commits

Author SHA1 Message Date
Jeremi Piotrowski
cf46b056fd
Merge pull request #7839 from openanolis/chao/switch_to_azure
CI: switch static-checks-dragonball CI machines to Azure
2023-09-05 10:59:02 +02:00
Chao Wu
60f733d301 CI: switch static-checks-dragonball CI machines to Azure
Previously, static-checks-dragonball is using machines from Alibaba
Cloud to run all the CI jobs.

Currently, we are going through an internal process to apply for the new
machines for Dragonball CI. Before the internal process is over, we will
temporarily use Azure VM to run static-checks-dragonball jobs.

fixes: #7838

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-09-05 15:19:07 +08:00
Fabiano Fidêncio
b663ec21ac
Merge pull request #7803 from GabyCT/topic/readmereportdoc
metrics: Add README for kata metrics report
2023-09-03 21:57:13 +02:00
Fabiano Fidêncio
e490b0bc76
Merge pull request #7808 from ManaSugi/fix/remove-manual-chcon
osbuilder: Remove chcon operation for guest SELinux
2023-09-03 21:55:02 +02:00
Fabiano Fidêncio
27dab249a0
Merge pull request #7800 from jodh-intel/kata-sys-util-update-tdx-protection-checks
kata-sys-util: protection: Update TDX checks
2023-09-02 14:47:51 +02:00
Jiang Liu
d5729e818c
Merge pull request #7819 from jiangliu/storage-cleanup
Improve the way to clean up storage devices for sandbox
2023-09-02 17:02:51 +08:00
Jiang Liu
57e7bf14a6 agent: refine StorageDeviceGeneric::cleanup()
Refine StorageDeviceGeneric::cleanup() to improve safety.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:22:21 +08:00
Jiang Liu
53edb19374 agent: implement StorageDeviceGeneric::cleanup()
Refactor cleanup_sandbox_storage as StorageDeviceGeneric::cleanup().

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 14:00:26 +08:00
Jiang Liu
0c63453e28 types: make StorageDevice::cleanup() return possible error code
Make StorageDevice::cleanup() return possible error code.

Fixes: #7818

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:27:06 +08:00
Jiang Liu
3a3d77b3b5 agent: move StorageDeviceGeneric from kata-types into agent
Move StorageDeviceGeneric from kata-types into agent, so we can
refactor code later.

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-02 13:12:17 +08:00
Jiang Liu
d848126b61
Merge pull request #7821 from jiangliu/storage-leak
agent: avoid possible leakage of storage device
2023-09-02 12:40:40 +08:00
Fabiano Fidêncio
4f92e6df90
Merge pull request #7683 from microsoft/danmihai1/policy-tests
tests: add policy to existing tests
2023-09-01 23:52:15 +02:00
Fabiano Fidêncio
f3e1a6a94f osbuilder: alpine: Change mirror
As we're hitting a lot of:
```
ERROR: https://dl-5.alpinelinux.org/alpine/v3.18/main: operation timed
out
```

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:42 +00:00
Fabiano Fidêncio
ac612aef5e osbuilder: alpine: Match the version on versions.yaml
We've switching to 3.18 as part of
82cd14ba39.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 16:01:33 +00:00
Jiang Liu
9cd706d1c9 agent: avoid possible leakage of storage device
When a storage device is used by more than one container, the second
and forth instances will cause storage device reference count leakage,
thus cause storage device leakage. The reason is:
add_storages() will increase reference count of existing storage device,
but forget to add the device to the `mount_list` array, thus leak the
reference count.

Fixes: #7820

Signed-off-by: Jiang Liu <gerry@linux.alibaba.com>
2023-09-01 22:52:42 +08:00
Dan Mihai
bf21411e90 tests: add policy to k8s tests
Use AGENT_POLICY=yes when building the Guest images, and add a
permissive test policy to the k8s tests for:
- CBL-Mariner
- SEV
- SNP
- TDX

Also, add an example of policy rejecting ExecProcessRequest.

Fixes: #7667

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Dan Mihai
d0e0610679 runtime: config: use the SEV initrd for SNP
Thanks Unmesh Deodhar!

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
67fed26f18 runtime: Use TDX image with in the qemu-tdx config
Let's make sure we use the TDX image as part of the QEMU TDX
configuration, which will help us to have the policies tested here.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 14:28:08 +00:00
Fabiano Fidêncio
f65ffb23da
Merge pull request #7814 from fidencio/topic/gha-rebase-prs-atop-of-main-for-the-tests
gha: Rebase PR atop of the target branch before testing
2023-09-01 16:26:32 +02:00
Fabiano Fidêncio
ef70aeb6b8
Merge pull request #7817 from fidencio/topic/update-alpine-to-its-latest-release
versions: Update alpine to its 3.18 version
2023-09-01 14:51:58 +02:00
Fabiano Fidêncio
ac939c458c gha: Rebase atop of the target branch
We have two scenarios we care about this, `pull_request` and
`pull_request_target` events triggered a job.

`pull_request` event:
When using the checkout action, it'll already provide a "rebased atop of
main" repo for us, nothing else is needed, and that's basically what we
already have as part of the jobs in our CI.

`pull_request_target` event:
This one is a little bit tricky, as the checkout action, unless passing
a spsecific repo, give us the PR checked out rebased atop of the HEAD of
the PR branch.  Jeremi Piotrowski nicely pointed out that we could use
github.event.pull_request.merge_commit_sha instead, which is the result
of the PR's branch with the official repo target branch.

Now, the only cases where the contributor's rebase would still be needed
is when the action itself has been changed.

Fixes: #7414

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-09-01 11:23:31 +02:00
Jeremi Piotrowski
bde06758b1
Merge pull request #7761 from jepio/iocopy-fix-race
runtime: Fix data race in ioCopy
2023-09-01 09:30:54 +02:00
Fabiano Fidêncio
82cd14ba39 versions: Update alpine to its 3.18 version
3.15 will be out of life in 2 months from now.

Fixes: #7816

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 23:02:54 +02:00
GabyCT
d75c7b5f9c
Merge pull request #7813 from GabyCT/topic/genreport
metrics: Add grabdata script for metrics report
2023-08-31 13:33:38 -06:00
Gabriela Cervantes
6668825752 metrics: Add grabdata script for metrics report
This PR adds the grabdata script so it can be used for the metrics report
for kata metrics.

Fixes #7812

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-31 16:17:29 +00:00
James O. D. Hunt
c290eaed8c kata-sys-util: protection: Update TDX checks
Update the protection checking code to detect newer versions of Intel
TDX (whose userland interface has now stabilised).

> **Note:** that we don't need to retain the existing behaviour since:
>
> - We haven't yet landed the TDX feature (#6448).
> - Systems wishing to use TDX will need to use the latest available
>   system components (such as firmware and host kernel).

Also added an explicit TDX unit test.

Fixes: #7384.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-08-31 16:15:15 +01:00
Fabiano Fidêncio
d7a996c686 gha: Update to checkout@v3 action
At this point we should always be using the latest checkout action.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2023-08-31 16:02:31 +02:00
Jeremi Piotrowski
d7612440b8
Merge pull request #7789 from beraldoleal/tests/amd
Fixes tests on AMD machines
2023-08-31 11:23:51 +02:00
Jeremi Piotrowski
c2ba29c15b runtime: Fix data race in ioCopy
IoCopy is a tricky function (I don't claim to fully understand its contract),
but here is what I see: The goroutine that runs it spawns 3 goroutines - one
for each stream to handle (stdin/stdout/stderr). The goroutine then waits for
the stream goroutines to exit. The idea is that when the process exits and is
closed, the stdout goroutine will be unblocked and close stdin - this should
unblock the stdin goroutine. The stderr goroutine will exit at the same time as
the stdout goroutine. The iocopy routine then closes all tty.io streams.

The problem is that the stdout goroutine decrements the WaitGroup before
closing the stdin stream, which causes the iocopy goroutine to race to close
the streams. Move the wg.Done() of the stdout routine past the close so that
*this* race becomes impossible. I can't guarantee that this doesn't affect some
unspecified behavior.

Fixes: #5031
Signed-off-by: Jeremi Piotrowski <jpiotrowski@microsoft.com>
2023-08-31 10:17:38 +02:00
Manabu Sugimoto
211de08d9e osbuilder: Remove chcon operation for guest SELinux
Remove the `chcon` operation which adds `container_runtime_exec_t` label to
the `kata-agent` binary because the container-selinux package including
the 39f83cc74d
commit has been released officially.
Ref. https://centos.pkgs.org/9-stream/centos-appstream-x86_64/container-selinux-2.221.0-1.el9.noarch.rpm.html

The container-selinux package is installed in a guest rootfs when we create it with `SELinux = yes`,
and `restorecon` sets `container_runtime_exec_t` to the `kata-agent`.

Fixes: #7807

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2023-08-31 16:44:32 +09:00
GabyCT
b467f2ef68
Merge pull request #7772 from GabyCT/topic/fiolimit
metrics: Enable FIO limits for kata metrics
2023-08-30 14:49:04 -06:00
Gabriela Cervantes
9f21fa9b39 metrics: Add report generator link to general documentation
This PR adds the report generator link to general documentation.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:55:14 +00:00
Gabriela Cervantes
c0ed5ea0ad metrics: Add README for kata metrics report
This PR adds the README for kata metrics report.

Fixes #7802

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 16:36:08 +00:00
Fabiano Fidêncio
aa2b51a831
Merge pull request #7783 from GabyCT/topic/makereport
metrics: Add metrics report script
2023-08-30 17:11:39 +02:00
Gabriela Cervantes
a7b59a5bf9 metrics: Add limit for 90 percentile for qemu value
This PR adds the limit for 90 percentile for qemu value for
FIO kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
99db6568e9 metrics: Add limit for write 90 percentile value for clh
This PR adds the limit for write 90 percentile value for clh for
FIO metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
Gabriela Cervantes
6e06392c55 metrics: Enable FIO limits for kata metrics
This PR enables the FIO limits for kata metrics.

Fixes #7771

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-30 13:53:38 +00:00
David Esparza
924d06a7f5
Merge pull request #7787 from GabyCT/topic/fixmemoryinsidelimit
metrics: Fix memory inside limits for kata metrics
2023-08-30 07:45:17 -06:00
Beraldo Leal
00e7ffd988 tests: check vmx only on Intel machines
When running on amd machines, those tests will fail because there is no
vmx flag. Following other tests that checks for cpuType, let's adapt
them to restrict vmx only on Intel machines.

Fixes #7788.
Related #5066

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 20:04:31 -04:00
Gabriela Cervantes
c8dd3c0737 metrics: Fix memory footprint qemu limit
This PR fixes the memory footprint qemu limit for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 22:51:21 +00:00
Gabriela Cervantes
8877ec62fb metrics: Fix memory inside limits for kata metrics
This PR fixes the memory inside limit for clh for kata metrics due
to the recent changes that we had in the script which impacted
in the performance measurement.

Fixes #7786

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 21:38:18 +00:00
Beraldo Leal
80146f2078 tests: Fixes cpuType check on AMD machines
cpuType is not initialized yet. gets 0 (Intel) by default, failing on
AMD machines.

Fixes #7785

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2023-08-29 17:04:07 -04:00
Gabriela Cervantes
7e364716dd metrics: Add test setup details to metrics report
This PR adds test setup details to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:56:53 +00:00
Gabriela Cervantes
17dc1b9760 metrics: Add boot lifecycle times to metrics report
This PR adds the boot lifecycle times to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:55:44 +00:00
Gabriela Cervantes
3b0d6538f2 metrics: Add memory inside container to metrics report
This PR adds memory inside container to metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:53:17 +00:00
Gabriela Cervantes
79fbb9d243 metrics: Add scaling system footprint in metrics report
This PR adds scaling system footprint in metrics report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:51:27 +00:00
Gabriela Cervantes
8e6d4e6f3d metrics: Add metrics reportgen
This PR adds metrics reportgen for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:45:36 +00:00
Gabriela Cervantes
139ffd4f75 metrics: Add report file titles
This PR adds report file titles for kata metrics.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 17:43:06 +00:00
GabyCT
8f2dae7b53
Merge pull request #7775 from dborquez/fix_memory_usage_parsing_results
metrics: fix parsing issue on memory-usage test
2023-08-29 11:26:13 -06:00
Gabriela Cervantes
878d1a2e7d metrics: Generate PNGs alongside the PDF report
This PR generates the PNGs for the kata metrics PDF report.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2023-08-29 16:50:32 +00:00