13293 Commits

Author SHA1 Message Date
stevenhorsman
29a5652e31 packaging: guest-components, set new environment variables
- Set KBC_PROVIDER and ATTESTER rather than TEE_PLATFORM
to avoid tss build issues for vTPM attester(s)
- There are future plans to make a matching TEE_PLATFORM, so this can be simplified once that is available

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
stevenhorsman
a284a20a14 tests: Filter CoCo tests on ppc64le/arm
- At the moment we aren't supporting ppc64le or aarch64 for CoCo,
so filter out these tests from running

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
stevenhorsman
a0c03966c2 versions: Bump guest-components
- Bump guest-components to try and test compatibility with the latest version

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
stevenhorsman
101a5bf273 packaging: Update guest-components Dockerfile
- Switch to Ubuntu 20.04 for building guest-components, as
the rootfs is based on 20.04 and we need matching GLIBC versions.
See #8955
- Add dependencies needed by TDX verifier as we want to build for all platforms

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
Gabriela Cervantes
6d85025e59 test/k8s: Add basic attestation test
- Add basic test case to check that a running
pod can use the api-server-rest (and attestation-agent
and confidential-data-hub indirectly) to get a resource
from a remote KBS

Fixes #9057

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Co-authored-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-04-08 11:38:53 +01:00
Biao Lu
f0edec84f6 agent: Launch api-server-rest
If 'rest_api' is configured, let's start the api-server-rest after
the attestation-agent and the confidential-data-hub have been started.

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:38:53 +01:00
Biao lu
4d752e6350 agent: Add config for api-server-rest
Add configuration for 'rest api server'.

Optional configurations are
  'agent.rest_api=attestation' will enable attestation api
  'agent.rest_api=resource' will enable resource api
  'agent.rest_api=all' will enable all (attestation and resource) api

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
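The three 'agent.rest_api' values listed in the commit above can be illustrated with a small sketch. This is not the agent's actual parser; the function name, the use of a whitespace-separated kernel command line, and the choice to reject unknown values are all assumptions made for illustration.

```python
# Hypothetical sketch: map the 'agent.rest_api' option to the set of enabled APIs.
# Only the three values named in the commit message are recognised.
FEATURES = {
    "attestation": frozenset({"attestation"}),
    "resource": frozenset({"resource"}),
    "all": frozenset({"attestation", "resource"}),
}


def parse_rest_api(cmdline: str) -> frozenset:
    """Return the APIs enabled by 'agent.rest_api=<value>' in cmdline.

    An empty set means the option is absent, so no REST API is enabled.
    Rejecting unknown values is an assumption of this sketch.
    """
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "agent.rest_api" and sep:
            if value not in FEATURES:
                raise ValueError(f"unsupported agent.rest_api value: {value!r}")
            return FEATURES[value]
    return frozenset()
```

For example, `parse_rest_api("quiet agent.rest_api=all")` yields both the attestation and resource APIs, while a command line without the option yields an empty set.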
Biao Lu
f476d671ed agent: Launch the confidential data hub
Let's introduce a new method to start the confidential data hub and the
attestation agent. The former depends on the latter, and it needs to be
started before the RPC server.

Starting the attestation components is based on whether the confidential
containers guest components binaries are found in the rootfs.

Fixes: #7544

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
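The start-up ordering described in the agent commits above (attestation agent first, then the confidential data hub, then api-server-rest if configured, all before the RPC server) might be sketched as follows. The binary paths, the function name, and the rule that api-server-rest also requires its binary to be present are assumptions, not details taken from the commits.

```python
import os

# Hypothetical binary locations; the real paths in the rootfs may differ.
AA_PATH = "/usr/local/bin/attestation-agent"
CDH_PATH = "/usr/local/bin/confidential-data-hub"
API_SERVER_PATH = "/usr/local/bin/api-server-rest"


def plan_startup(rest_api_configured, exists=os.path.exists):
    """Return the ordered list of components to start.

    The guest components are only started when their binaries are found,
    and the confidential data hub is only started after the attestation
    agent, because it depends on it. The RPC server always comes last.
    """
    steps = []
    if exists(AA_PATH):
        steps.append("attestation-agent")
        if exists(CDH_PATH):
            steps.append("confidential-data-hub")
            if rest_api_configured and exists(API_SERVER_PATH):
                steps.append("api-server-rest")
    steps.append("rpc-server")
    return steps
```

Passing `exists` as a parameter keeps the sketch testable without a real rootfs.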
Greg Kurz
be8f0cb520
Merge pull request #9402 from deagon/feat/debug-threads
qemu: show the thread name when enable the hypervisor.debug option
2024-04-08 11:04:36 +02:00
Hyounggyu Choi
e39be7a45e
Merge pull request #9415 from BbolroC/fix-dir-removal-error
GHA: Implement secondary GITHUB_WORKSPACE cleanup on 1st failure
2024-04-08 10:44:44 +02:00
GabyCT
9d2c5b180e
Merge pull request #9419 from GabyCT/topic/fxlatency
metrics: Improve latency test cleanup
2024-04-05 16:31:00 -06:00
Wainer Moschetta
aae7048d4f
Merge pull request #9273 from ldoktor/kcli-coco-kbs
tests: Support for kbs setup on kcli
2024-04-05 18:55:58 -03:00
Fabiano Fidêncio
f09bb98f51
Merge pull request #8840 from fidencio/topic/update-tdx-artefacts-to-the-new-host-os
tdx: Update TDX artefacts to be used with the Ubuntu 23.10 / CentOS 9 stream OSVs.
2024-04-05 22:36:03 +02:00
Fabiano Fidêncio
cdb8531302
hypervisor: Simplify TDX protection detection
Let's rely on the kvm module 'tdx' parameter to do so.
This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX
adoption (https://github.com/intel/tdx-linux) stacks.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
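A minimal sketch of detecting TDX through the kvm module's 'tdx' parameter, as the commit above describes, could look like this. The sysfs path and the accepted values are assumptions; the commit only says that the 'tdx' parameter of the kvm module is used.

```python
from pathlib import Path

# Assumed sysfs location of the parameter; adjust if the host exposes it elsewhere.
TDX_PARAM_PATH = "/sys/module/kvm_intel/parameters/tdx"


def tdx_enabled(param_path=TDX_PARAM_PATH):
    """Return True when the module parameter file reports TDX as enabled.

    A missing or unreadable file is treated as "TDX not available".
    """
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        return False
    # Boolean module parameters are usually shown as "Y" or "N";
    # accepting "1" as well is an assumption of this sketch.
    return value in ("Y", "1")
```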
Fabiano Fidêncio
2ee03b5dc3
tdvf: Adapt the build command
This is done in order to match the example from:
https://github.com/intel/tdx-linux/wiki/Instruction-to-set-up-TDX-host-and-guest#build-tdvf-image

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
b7cccfa019
qemu: tdx: Adapt command line
This commit is a mess, but I'm not exactly sure what's the best way to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.

With that said, let me cover the content of this commit.

Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it didn't fly upstream.

Secondly, in order to get QEMU to properly work with TDX, we need to
enforce the 'private=on' knob and use the "memory-backend-ram", and
we're doing so, and also making sure to test the `private=on` newly
added knob.

I'm sorry for the confusion, I understand this is not optimal, I just
don't see an easy path to do changes without leaving the code broken
during those changes.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Hyounggyu Choi
4493459937 GHA: Implement secondary GITHUB_WORKSPACE cleanup on 1st failure
Occasionally, the removal of GITHUB_WORKSPACE fails for self-hosted runners
because one of the subdirectories is not empty. This is likely due to another
process occupying the directory at the time.
Implementing a secondary cleanup resolves this issue.
This commit focuses on the implementation for the secondary cleanup.

Fixes: #9317

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-04-05 11:41:51 +02:00
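The idea of a secondary cleanup after a first failed removal can be sketched as below. This is not the workflow step from the commit; the function name, the delay, and the single-retry policy are illustrative assumptions.

```python
import shutil
import time


def remove_workspace(path, retry_delay=1.0, rmtree=shutil.rmtree):
    """Remove path; if the first attempt fails, wait briefly and try once more.

    Returns the number of attempts that were needed (1 or 2).
    A failure of the second attempt propagates to the caller.
    """
    try:
        rmtree(path)
        return 1
    except OSError:
        # Another process may still have been using a subdirectory;
        # give it a moment before the secondary cleanup.
        time.sleep(retry_delay)
        rmtree(path)
        return 2
```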
Fabiano Fidêncio
6b4cc5ea6a
Revert "qemu: tdx: Workaround SMP issue with TDX 1.5"
This reverts commit d1b54ede29.

 Conflicts:
	src/runtime/virtcontainers/qemu.go

This commit was a hack that was needed in order to get QEMU + TDX to
work atop of the stack our CI was running on.  As we're moving to "the
officially supported by distros" host OS, we need to get rid of this.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
582b5b6b19
govmm: tdx: Expose the private=on|off knob
The private=on|off knob is required in order to properly launch a TDX
guest VM.

This is a brand new property that is part of the still in-flight patches
adding TDX support on QEMU.

Please, see:
3fdd8072da

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
fe5adae5d9
qemu-tdx: Update to v8.1.0 + TDX patches
Let's update the QEMU to the one that's officially maintained by Intel
till all the TDX patches make their way upstream.

We've had to also update python to explicitly use python3 and add
python3-venv as part of the dependencies.

Fixes: #8810

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:51 +02:00
Alex Lyn
0e0a361f0e
Merge pull request #8782 from Apokleos/device-increate-count
bugfix and refactor device increate count
2024-04-05 13:43:49 +08:00
Dan Mihai
6f9f8ae285
Merge pull request #9413 from microsoft/saulparedes/ensure_unique_rg_in_gha
gha: ensure unique resource group name
2024-04-04 17:13:09 -07:00
GabyCT
80d926c357
Merge pull request #9411 from microsoft/danmihai1/k8s-job
tests: k8s-job: wait for job successful create
2024-04-04 15:14:56 -06:00
Gabriela Cervantes
8e5d401be0 metrics: Improve latency test cleanup
This PR improves the latency test cleanup in order to avoid random
failures caused by pods being left behind.

Fixes #9418

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-04-04 20:43:53 +00:00
Saul Paredes
f20caac1c0 gha: ensure unique resource group name
There's an rg name duplication situation that got introduced by #9385
where 2 different test runs might have same rg name.

Add back uniqueness by including the first letter of GENPOLICY_PULL_METHOD to
cluster name.

Fixes: #9412

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-04 13:13:32 -07:00
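The naming fix described above, adding the first letter of GENPOLICY_PULL_METHOD to the cluster name, can be illustrated with a short sketch. The function name, the base name format, and the separator are hypothetical; only the "first letter" rule comes from the commit.

```python
def cluster_name(base, pull_method):
    """Append the first letter of pull_method to base.

    Two test runs that differ only in their pull method then get
    different names, as long as the methods start with different letters.
    """
    if not pull_method:
        raise ValueError("pull_method must not be empty")
    return f"{base}-{pull_method[0]}"
```

With the two pull methods mentioned elsewhere in this log (the oci-distribution crate and the containerd interface), the suffixes would be "o" and "c".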
GabyCT
aae2679f09
Merge pull request #9409 from GabyCT/topic/ghrunset
gha: Define GH_PR_NUMBER variable in gha run k8s common script
2024-04-04 09:46:48 -06:00
Dan Mihai
3e72b3f360 tests: k8s-job: wait for job successful create
Don't just verify SuccessfulCreate - wait for it if needed.

Fixes: #9138

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-03 22:11:15 +00:00
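The difference between verifying a condition once and waiting for it can be sketched with a generic polling helper. This is not the test's actual code; the helper name, the timeout, and the interval are assumptions.

```python
import time


def wait_for(check, timeout=30.0, interval=1.0):
    """Poll check() until it returns a truthy value or timeout seconds elapse.

    Returns True if the condition was met, False if the deadline passed.
    The condition is always evaluated at least once.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would pass a function that looks for the SuccessfulCreate event, so the test no longer fails when the event simply has not appeared yet.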
Gabriela Cervantes
73f27e28d1 gha: Define GH_PR_NUMBER variable in gha run k8s common script
This PR defines the GH_PR_NUMBER variable in the gha run k8s common
script to avoid failures such as unbound variable errors when running
the scripts locally, the same way the GHA CI does.

Fixes #9408

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-04-03 18:25:00 +00:00
GabyCT
c5c229b330
Merge pull request #9397 from GabyCT/topic/removeconmon
versions: Remove conmon information from versions.yaml
2024-04-03 11:14:43 -06:00
GabyCT
12947b1ba6
Merge pull request #9344 from GabyCT/topic/kerneldoc
docs: Remove stale kernel information
2024-04-03 11:13:54 -06:00
Dan Mihai
07c23a05f2
Merge pull request #9385 from microsoft/saulparedes/add_genpolicy_yaml_params
gha: add GENPOLICY_PULL_METHOD
2024-04-03 09:20:16 -07:00
Alex Lyn
935a1a3b40 runtime-rs: refactor decrease_attach_count with do_decrease_count
Try to reduce duplicated code in decrease_attach_count with the new
public function do_decrease_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
4f0fab938d runtime-rs: refactor increase_attach_count with do_increase_count
Try to reduce duplicated code in increase_attach_count with the new
public function do_increase_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
fff64f1c3e runtime-rs: introduce dedicated function do_decrease_count
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:08 +08:00
Alex Lyn
5750faaf31 runtime-rs: introduce dedicated function do_increase_count
Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:09:17 +08:00
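The shared reference-counting helpers named in these runtime-rs commits might work along the following lines. This is a Python sketch of the idea, not the Rust code; the class, the return-value convention, and the error on underflow are assumptions.

```python
class AttachCounter:
    """Holds the attach count that each device driver would otherwise track itself."""

    def __init__(self):
        self.count = 0


def do_increase_count(counter):
    """Increment the count.

    Returns True when the device was already attached, meaning the
    caller can skip the real attach operation.
    """
    counter.count += 1
    return counter.count > 1


def do_decrease_count(counter):
    """Decrement the count.

    Returns True when the device is still in use, meaning the caller
    can skip the real detach operation. Decrementing a count of zero
    is treated as an error in this sketch.
    """
    if counter.count == 0:
        raise RuntimeError("attach count is already 0")
    counter.count -= 1
    return counter.count > 0
```

Each driver's increase_attach_count and decrease_attach_count can then delegate to these two functions instead of repeating the same logic.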
Guoqiang Ding
cd0c31e185 qemu: show the thread name when enable the hypervisor.debug option
Add debug-threads=on in the name argument if debug enabled.

Fixes: #9400
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-04-03 10:36:52 +08:00
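As a rough illustration of the change above, the QEMU -name argument could be built like this. The function name and the example sandbox name are hypothetical; debug-threads=on is the sub-option named in the commit.

```python
def qemu_name_args(name, debug=False):
    """Build the '-name' argument, adding debug-threads=on when debug is enabled."""
    value = name
    if debug:
        value += ",debug-threads=on"
    return ["-name", value]
```

With debug enabled, `qemu_name_args("sandbox-abc", debug=True)` returns `["-name", "sandbox-abc,debug-threads=on"]`.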
Saul Paredes
8a92e81f98 gha: add GENPOLICY_PULL_METHOD
Add GENPOLICY_PULL_METHOD that will be used to test pulling
container images in genpolicy using the oci-distribution crate
and/or the containerd interface.

GENPOLICY_PULL_METHOD will start being used in a future PR.

Fixes: #9384

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-02 19:03:28 -07:00
Gabriela Cervantes
f3957352f0 versions: Remove conmon information from versions.yaml
This PR removes conmon information from versions.yaml as it is no
longer used in the kata containers repository.

Fixes #9396

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-04-02 16:25:45 +00:00
Alex Lyn
7795f9c016
Merge pull request #9365 from GabyCT/topic/removerunc
versions: Remove runc version information
2024-04-02 09:21:56 +08:00
Alex Lyn
fa8049af6c
Merge pull request #9383 from Apokleos/unified-cgrp-cmdline
kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
2024-04-02 09:08:04 +08:00
Alex Lyn
07bfdf4a22
Merge pull request #9275 from Apokleos/swap-hooks-bindmnt
kata-agent: Change order of guest hook and bind mount processing
2024-04-02 07:40:10 +08:00
Alex Lyn
c88014834b kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
To have systemd mount cgroups-v2 by default during system boot,
we must add the systemd.unified_cgroup_hierarchy=1 parameter to the
kernel cmdline, which is passed via kernel_params in
configuration.toml.
To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.

Fixes: #9336

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 18:45:12 +08:00
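Adding the parameter to a kernel_params string can be sketched as follows. The helper name and the rule of leaving an existing setting untouched are assumptions; the parameter itself is the one named in the commit.

```python
CGROUP_V2_PARAM = "systemd.unified_cgroup_hierarchy=1"


def with_cgroup_v2(kernel_params):
    """Return kernel_params with the cgroups-v2 parameter appended.

    If the string already sets systemd.unified_cgroup_hierarchy,
    the existing setting is kept and nothing is added.
    """
    tokens = kernel_params.split()
    if any(t.startswith("systemd.unified_cgroup_hierarchy=") for t in tokens):
        return " ".join(tokens)
    return " ".join(tokens + [CGROUP_V2_PARAM])
```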
alex.lyn
548f252bc4 runtime-rs: bugfix incorrect use of refcount before vfio attach
When a pod has multiple containers, there may be cases where the number
of attach points is more than 2. In that case the attach operation
should not return Err, but just return Ok.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 11:28:57 +08:00
Alex Lyn
aa9cd232cd
Merge pull request #9358 from GabyCT/topic/nerdrandom
gha: Update journal log names for nerdctl artifacts
2024-04-01 09:50:16 +08:00
Alex Lyn
dfa8832406
Merge pull request #9345 from c3d/bug/9342-agent-test-errors
agent: Fix errors in `make check`
2024-04-01 09:48:44 +08:00
Dan Mihai
3a7dbcfc17
Merge pull request #9367 from microsoft/danmihai1/infinite-io-stream-copy-loop
runtime: remove stream copy infinite loop
2024-03-29 09:37:44 -07:00
Dan Mihai
600f9266f3 runtime: remove stream copy infinite loop
This reverts commit 1c5693be86.

Avoid apparent infinite loop when ReadStreamRequest is blocked by
policy - for some of the pods.

When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

Fixes: #9380

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-28 22:43:28 +00:00
Dan Mihai
ebb26edf42
Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints
genpolicy: reduce policy debug prints
2024-03-27 15:12:10 -07:00
Gabriela Cervantes
a32418bf32 versions: Remove runc version information
This PR removes the runc version information as it is no longer used
in the kata containers scripts.

Fixes #9364

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-03-27 20:32:38 +00:00
Steve Horsman
b3acbe0b7f
Merge pull request #8046 from fitzthum/clean-config
runtime: remove unimplemented CoCo configurations
2024-03-27 19:39:48 +00:00