Commit Graph

14760 Commits

Author SHA1 Message Date
stevenhorsman
aa9f21bd19 test: Add support for s390x in cosign testing
We've added s390x test container image, so add support
to use them based on the arch the test is running on

Fixes: #10302

Signed-off-by: stevenhorsman <steven@uk.ibm.com>

fixuop
2024-09-16 09:20:57 +01:00
stevenhorsman
3087ce17a6 tests: combined pod yaml creation for CoCo tests
This commit brings some public parts of image pulling test series like
encrypted image pulling, pulling images from authenticated registry and
image verification. This would help to reduce the cost of maintainance.

Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-09-16 09:20:57 +01:00
Xynnn007
c80c8d84c3 test: add cosign signature verificaton tests
Close #8120

**Case 1**
Create a pod from an unsigned image, on an insecureAcceptAnything
registry works.

Image: quay.io/prometheus/busybox:latest
Policy rule:
```
"default": [
    {
        "type": "insecureAcceptAnything"
    }
]
```

**Case 2**
Create a pod from an unsigned image, on a 'restricted registry' is
rejected.

Image: ghcr.io/confidential-containers/test-container-image-rs:unsigned
Policy rule:
```
"quay.io/confidential-containers/test-container-image-rs": [
    {
        "type": "sigstoreSigned",
        "keyPath": "kbs:///default/cosign-public-key/test"
    }
]
```

**Case 3**
Create a pod from a signed image, on a 'restricted registry' is
successful.

Image: ghcr.io/confidential-containers/test-container-image-rs:cosign-signed
Policy rule:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
    {
        "type": "sigstoreSigned",
        "keyPath": "kbs:///default/cosign-public-key/test"
    }
]
```

**Case 4**
Create a pod from a signed image, on a 'restricted registry', but with
the wrong key is rejected

Image:
ghcr.io/confidential-containers/test-container-image-rs:cosign-signed-key2

Policy:
```
"ghcr.io/confidential-containers/test-container-image-rs": [
    {
        "type": "sigstoreSigned",
        "keyPath": "kbs:///default/cosign-public-key/test"
    }
]
```

**Case 5**
Create a pod from an unsigned image, on a 'restricted registry' works
if enable_signature_verfication is false

Image: ghcr.io/kata-containers/confidential-containers:unsigned

image security enable: false

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-09-16 09:20:57 +01:00
Xynnn007
9606e7ac8b agent: Set image-rs image security policy
Add two parameters for enabling cosign signature image verification.
- `enable_signature_verification`: to activate signature verification
- `image_policy`: URI of the image policy
config

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-09-16 09:20:57 +01:00
Xynnn007
653bc3973f agent: fix make test for kata-agent of dependency anyhow
new version of the anyhow crate has changed the backtrace capture thus
unit tests of kata-agent that compares a raised error with an expected
one would fail. To fix this, we need only panics to have backtraces,
thus set `RUST_BACKTRACE=1` and `RUST_LIB_BACKTRACE=0` for tests due to
document

https://docs.rs/anyhow/latest/anyhow/

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-09-16 09:20:57 +01:00
Fabiano Fidêncio
dfcb41b5cc
Merge pull request #10313 from stevenhorsman/coco-components-0.10-bump
CoCo: Bump Coco components to 0.10 releases
2024-09-14 21:43:28 +02:00
stevenhorsman
705e469696
rootf: Change initrd alpine mirror
The rootfs-initrd build is failing with:
```
fetch https://mirror.math.princeton.edu/pub/alpinelinux//v3.18/main/aarch64/APKINDEX.tar.gz
6684368:error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1914:
ERROR: https://mirror.math.princeton.edu/pub/alpinelinux//v3.18/main: Permission denied
```
so try bumping to a newer version of alpine to see
if that helps the issue

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-09-14 18:47:45 +02:00
Dan Mihai
5777869cf4 tests: k8s-policy-rc: add unexpected UID test
Change pod runAsUser value of a Replication Controller after generating
the RC's policy, and verify that the RC pods get rejected due to this
change.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-13 22:05:31 +00:00
Dan Mihai
6773f14667 tests: k8s-policy-job: add unexpected UID test
Change pod runAsUser value of a Job after generating the Job's policy,
and verify that the Job gets rejected due to this change.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-13 22:05:31 +00:00
Dan Mihai
124f01beb3 tests: k8s-policy-deployment: add bad UID test
Change pod runAsUser value of a Deployment after generating the
Deployment's policy, and verify that the Deployment fails due to
this change.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-13 22:05:31 +00:00
Dan Mihai
16f5ebf5f9 genpolicy: get UID from PodSecurityContext
Get UID from PodSecurityContext for other k8s resource types too,
not just for Pods.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-13 22:05:31 +00:00
Dan Mihai
5badc30a69
Merge pull request #10316 from microsoft/danmihai1/k8s-inotify
tests: k8s-inotify: pod termination polling
2024-09-13 15:02:38 -07:00
GabyCT
6f363bba18
Merge pull request #10304 from GabyCT/topic/fixcricont
tests: Fix indentation in the cri containerd tests
2024-09-13 14:49:12 -06:00
Dan Mihai
d3127af9c5 tests: k8s-inotify: pod termination polling
Poll/wait for pod termination instead of sleeping 2 minutes. This
change typically saves ~90 seconds in my test cluster.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-13 17:12:55 +00:00
sidney chang
5a7d0ed3ad runtime-rs: introduce tap in hypervisor by extrating it from dragonball
It's a prerequisite PR to make built-in vmm dragonball compilation
options configurable.

Extract TAP device-related code from dragonball's dbs_utils into a
separate library within the runtime-rs hypervisor module.
To enhance functionality and reduce dependencies, the extracted code
has been reimplemented using the libc crate and the ifreq structure.

Fixes #10182

Signed-off-by: sidney chang <2190206983@qq.com>
2024-09-13 07:32:14 -07:00
Fabiano Fidêncio
b09eba8c46
Merge pull request #10309 from BbolroC/helm-install-with-retry
tests: Introduce retry mechanism for helm install
2024-09-13 15:08:46 +02:00
stevenhorsman
00e657cdb7 agent: image-rs: Update to v0.10.0 release
Update image-rs to use the latest release of guest-components

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-09-13 13:29:54 +01:00
stevenhorsman
5e03890562 versions: Bump trustee and guest-components
Bump to the v0.10.1 release of trustee and v0.10.0
release of guest-components

Signed-off-by: stevenhorsman <steven@uk.ibm.com>

fixup
2024-09-13 13:28:54 +01:00
Hyounggyu Choi
0aae847ae5 tests: Update secure boot image verification for IBM SE
In the latest `s390-tools`, there has been update on how to
verify a secure boot image. A host key revocation list (CRL),
which was optinoal, now becomes mandatory for verification.
This commit updates the relevant scripts and documentation accordingly.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-13 14:14:02 +02:00
Hyounggyu Choi
4c933a5611 tests: Introduce retry mechanism for helm install
Kata-deploy often fails due to a transiently unreachable k8s cluster
for the qemu-coco-dev test on s390x.
(e.g. https://github.com/kata-containers/kata-containers/actions/runs/10831142906/job/30058527098?pr=10009)
This commit introduces a retry mechanism to mitigate these failures by
retrying the command two more times with a 10-second interval as a workaround.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-09-13 14:03:44 +02:00
Dan Mihai
e937cb1ded
Merge pull request #10291 from microsoft/danmihai1/user-name-to-uid
genpolicy: fix and re-enable create container UID verification
2024-09-12 15:47:59 -07:00
Dan Mihai
0c5ac042e7 tests: k8s-policy-pod: add workaround for #10297
If the CI platform being tested doesn't support yet the prometheus
container image:
- Use busybox instead of prometheus.
- Skip the test cases that depend on the prometheus image.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-12 18:26:38 +00:00
Gabriela Cervantes
0346b32a90 tests: Fix indentation in the cri containerd tests
This PR fixes the indentation in the cri containerd tests as we
have in several places a misalignment in the script.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-12 16:18:34 +00:00
Dan Mihai
94d95fc055 tests: k8s-policy-pod: test container UID changes
Add test cases for changing container UID after generating the policy.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-11 22:38:20 +00:00
Dan Mihai
db1ca4b665 tests: k8s-policy-pod: remove UID workaround
Remove the workaround for #9928, now that genpolicy is able to
convert user names from container images into the corresponding
UIDs from these images.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-11 22:38:20 +00:00
Dan Mihai
d2d8d2e519 genpolicy: remove default UID/GID values
Remove the recently added default UID/GID values, because the genpolicy
design is to initialize those fields before this new code path gets
executed.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-11 22:38:20 +00:00
Hernan Gatta
871476c3cb genpolicy: pull UID:GID values from /etc/passwd
Some container images are configured such that the user (and group)
under which their entrypoint should run is not a number (or pair of
numbers), but a user name.

For example, in a Dockerfile, one might write:

> USER 185

indicating that the entrypoint should run under UID=185.

Some images, however, might have:

> RUN groupadd --system --gid=185 spark
> RUN useradd --system --uid=185 --gid=spark spark
> ...
> USER spark

indicating that the UID:GID pair should be resolved at runtime via
/etc/passwd.

To handle such images correctly, read through all /etc/passwd files in
all layers, find the latest version of it (i.e., the top-most layer with
such a file), and, in so doing, ensure that whiteouts of this file are
respected (i.e., if one layer adds the file and some subsequent layer
removes it, don't use it).

Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>
2024-09-11 22:38:20 +00:00
Hernan Gatta
f9249b4476 genpolicy: add tar dependency
Used to read /etc/passwd from tar files.

Signed-off-by: Hernan Gatta <hernan.gatta@opaque.co>
2024-09-11 22:38:20 +00:00
Dan Mihai
eb7f747df1 genpolicy: enable create container UID verification
Disabling the UID Policy rule was a workaround for #9928. Re-enable
that rule here and add a new test/CI temporary workaround for this
issue. This new test workaround will be removed after fixing #9928.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-11 22:38:20 +00:00
Dan Mihai
71ede4ea3f tests: k8s-policy-pod: use prometheus container
Change quay.io/prometheus/busybox to quay.io/prometheus/prometheus in
this test. The prometheus image will be helpful for testing the future
fix for #9928 because it specifies user = "nobody".

Also, change:

sh -c "ls -l /"

to:

echo -n "readinessProbe with space characters"

as the test readinessProbe command line. Both include a command line
argument containing space characters, but "sh -c" behaves differently
when using the prometheus container image (causes the readinessProbe
to time out, etc.).

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-09-11 22:38:20 +00:00
GabyCT
614328f342
Merge pull request #10295 from GabyCT/topic/removeimgvar
metrics: Remove unused remove img var in common script
2024-09-11 15:02:39 -07:00
GabyCT
095c5ed961
Merge pull request #10289 from GabyCT/topic/enablestresst
tests: Enable stressng k8s stability test for Kata CoCo CI
2024-09-11 10:47:33 -07:00
Fabiano Fidêncio
97ecdabde9
Merge pull request #10294 from fidencio/topic/bring-ita-support
Bump guest-components / trustee to a version that supports ITA
2024-09-11 19:45:48 +02:00
Gabriela Cervantes
fdaf12d16c metrics: Remove unused remove img var in common script
This PR removes the remove_img variable in the metrics common script
as it is not being used.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-11 17:45:18 +00:00
Gabriela Cervantes
04d1122a46 tests: Decrease iterations in soak test
This PR decreases the number of iterations in the kubernetes soak test
as this is already taking more than 2 hours for the kata coco ci
stability.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-11 17:39:06 +00:00
Gabriela Cervantes
c48c6f974e tests: Enable stressng k8s stability test for Kata CoCo CI
This PR enables the stressng k8s stability test for Kata CoCo CI.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2024-09-11 17:38:13 +00:00
Alex Man
7e400f7bb2 agent: Fix CPU usage reporting for cgroup v2 in kata-agent
kata-agent incorrectly reports CPU time for cgroup v2, causing 1000x underreporting.

For cgroup v2, kata-agent reads the cpu.stat file, which reports the time consumed by the processes in the cgroup in µs.
However, there was a bug in kata-agent where it returned this value in µs without converting it to ns.

This commit adds the necessary µs to ns conversion for cgroup v2, aligning it with v1 behavior and kata-shim's expectations.

This fixes #10278

Signed-off-by: Alex Man <alexman@stripe.com>
2024-09-11 10:29:03 -07:00
Fabiano Fidêncio
1178fe20e9 tests: Adapt error parser for failed image decryption
With an older version of image-rs, we were getting the following error:
```
       Message:   failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key no suitable key found for decrypting layer key:
```

However, with the version of image-rs we are bumping to, the error comes
as:
```
       Message:   failed to create containerd task: failed to create shim task: failed to handle layer: failed to get decrypt key

 Caused by:
     no suitable key found for decrypting layer key:
      keyprovider: failed to unwrap key by ttrpc
```

Due to this change, I'm splitting the check in two different ones.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-11 17:07:56 +02:00
Dan Mihai
66dda37877
Merge pull request #10271 from Sumynwa/sumsharma/agent_ctl_issue_9689_local
agent-ctl: Refactor CopyFile Handler
2024-09-11 07:35:09 -07:00
Fabiano Fidêncio
f6cfc33314
Merge pull request #10292 from fidencio/topic/ci-tdx-adapt-how-we-get-the-host-ip
ci: tdx: Adapt how we get the host IP
2024-09-11 14:42:22 +02:00
Fabiano Fidêncio
e2200f0690
versions: trustee: Update to a version that supports ITA
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).

Proper ITA / ITTS support on Trustee was finished as part of:
* 6f767fa15f

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-11 13:39:35 +02:00
Fabiano Fidêncio
d3e3ee7755
versions: guest-components: Update to a version that supports ITA
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).

As we've bumped guest-components on trustee, let's make sure we also
bump image-rs to the commit that brings ITA support in:
* https://github.com/confidential-containers/guest-components/commit/1db6c3a87665dde58d0efa56f4e4af5fc

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-11 13:36:56 +02:00
Fabiano Fidêncio
f94d80783d
agent: image-rs: Update to a version that supports ITA
ITA stands for Intel Trust Authority, which is in the process to being
renamed to ITTS (Intel Tiber Trust Services).

As we've bumped guest-components on trustee, let's make sure we also
bump image-rs to the commit that brings ITA support in:
* 1db6c3a876

The reason we need to bump the dependency here is to avoid kbs_protocol
mismatch between the version used by the agent and the trustee one.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-11 13:36:46 +02:00
Fabiano Fidêncio
3946aa7283
ci: tdx: Adapt how we get the host IP
In the process of switching the TDX CI machine we've noticed that
`hostname -i` in one of the machines returns an one and only IP address,
while in another machine it returns a full list of IPs.

As we're only interested in the first one, let's adapt the code to
always return the first one.

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2024-09-11 09:31:43 +02:00
Sumedh Alok Sharma
b4bbbf65c6 ci: Do not start CDH/attestation procs with kata-agent as local process.
Since CDH/attestation related processes and its dependencies are not fully
available, the setup fails to start kata-agent as local process. This
fix removes these procs to prevent kata-agent from trying to start them.

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-11 11:53:59 +05:30
Sumedh Alok Sharma
8045a7a2ba ci: Install policy document on host to run kata-agent as local process.
The test setup starts kata-agent as a local process without the
UVM. The agent policy initialization fails due to missing policy
document at `/etc/kata-opa/default-policy.rego`. The fix
- installs a relaxed `allow-all.rego` policy document
- cleans up the install during exit

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-11 11:25:08 +05:30
Sumedh Alok Sharma
822f898433 ci: Install bats as dependencies
Install bats as part of dependencies for running the tests.

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-11 10:57:15 +05:30
Sumedh Alok Sharma
2c774fb207 ci: Add tests for CopyFile api.
This commit introduces test cases for testing
CopyFile API using kata-agent-ctl with improved command
semantics and handling.
- copy a file to /run/kata-containers
- copy symlink to /run/kata-containers
- copy directory to /run/kata-containers
- copy file to /tmp
- copy large file to /run/kata-containers

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-11 10:54:01 +05:30
Sumedh Alok Sharma
2af1113426 agent-ctl: Refactor CopyFile handler
In the existing implementation for the CopyFile subcommand,
- cmd line argument list is too long, including various metadata information.
- in case of a regular file, passing the actual data as bytes stream adds to the size and complexity of the input.
- the copy request will fail when the file size exceeds that of the allowed ttrpc max data length limit of 4Mb.

This change refactors the CopyFile handler and modifies the input to a known 'source' 'destination' syntax.

Fixes #9708

Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
2024-09-11 10:54:01 +05:30
Alex Lyn
d0968032f7
Merge pull request #10276 from Apokleos/fix-runtime-cdi
runtime: Fix runtime/cdi panic with assignment to entry in nil map
2024-09-11 09:00:11 +08:00