- Fix the lint error:
```
error: you seem to use `.enumerate()` and immediately discard the index
--> src/device_manager/mod.rs:427:33
|
427 | for (_index, device) in self.virtio_devices.iter().enumerate() {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
by removing the unnecessary enumerate
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The ci failed with:
```
error: use of `or_insert_with` to construct default value
--> src/address_space_manager.rs:650:14
|
650 | .or_insert_with(NumaNode::new);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `or_default()`
|
```
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
The following tests are disabled because they fail (alike with dragonball):
- k8s-cpu-ns.bats
- k8s-number-cpus.bats
- k8s-sandbox-vcpus-allocation.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It shouldn't call the initial_size_manager's setup_config
in the load_config since it had been called in the sandbox's
try_init function.
Fixes: #9778
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR uses the nodeport deployment from upstream trustee.
To ensure our deployment is as close to upstream trustee replace
the custom nodeport handling and replace it with nodeport
kustomized flavour from the trustee project.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR makes general improvements like definition of variables and
the use of them to improve the general setup script for kubernetes
tests.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
While running with a remote hypervisor, whenever kata-monitor tries to access
metrics from the shim, the shim does a "panic" and no metric can be gathered.
The function GetVirtioFsPid() is called on metrics gathering, and had a call
to "panic()". Since there is no virtiofs process for remote hypervisor, the
right implementation is to return nil. The caller expects that, and will skip
metrics gathering for virtiofs.
Fixes: #9826
Signed-off-by: Julien Ropé <jrope@redhat.com>
This corrects the warning to point to the \`-j\` flag,
which is the correct flag for the JSON settings file.
Previously, the warning was confusing, as it pointed to
the \`-p\` flag, which specifies to the path for the Rego ruleset.
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
This PR uses the function definition to have uniformity across
all the launch times script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This patch re-generates the client code for Cloud Hypervisor v39.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #8694, #9574
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch upgrades Cloud Hypervisor to v39.0 from v36.0, which contains
fixes of several security advisories from dependencies. Details can be
found from #9574.
Fixes: #8694, #9574
Signed-off-by: Bo Chen <chen.bo@intel.com>
Start testing the ability of kata-deploy to install and configure
the qemu-runtime-rs runtimeClass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Allow kata-deploy to install and configure the qemu-runtime-rs runtimeClass
which ties to qemu hypervisor implementation in rust for the runtime-rs.
Fixes: #9804
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
- Update the config parsing logic so that when reading from the
agent-config.toml file any envs are still processed
- Add units tests to formalise that the envs take precedence over values
from the command line and the config file
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When the total number of files observed is greater than limit, return (-1, err).
When the returned err is not nil, the func countFiles should return -1.
Fixes:#9780
Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
fixes#9810
Add an annotation to the enum values in the agent config that will
deserialize them using a kebab-case conversion, aligning the behaviour
to parsing of params specified via kernel cmdline.
drive-by fix: add config override for guest_component_procs variable
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Frequent errors have been observed during k8s e2e tests:
- The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
- Error from server (ServiceUnavailable): the server is currently unable to handle the request
- Error from server (NotFound): the server could not find the requested resource
These errors can be resolved by retrying the kubectl command.
This commit introduces a wrapper function in common.sh that runs kubectl up to 3 times
with a 5-second interval. Initially, this change only covers gha-run.sh for Kubernetes.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
It seems I was very lose on disabling some of the tests, and the issues
I faced could be related to other instabilities in the CI.
Let's re-enable this one, following what was done for the SEV, SNP, and
coco-qemu-dev.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop skipping the CDH tests for TDX, as know we should have an
environmemnt where it can run and should pass. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We must ensure we use the host ip to connect to the PCCS running on the
host side, instead of using localhost (which has a different meaning
from inside the KBS pod).
The reason we're using `hostname -i` isntead of the helper functions, is
because the helper functions need the coco-kbs deployed for them to
work, and what we do is before the deployment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(1) As mis-use of cap.set causing previous Caps lost which
causing assert! failed, just replacing cap.set with cap.add.
(2) It will return error if there's no such name setting when
do update_config_by_annotation {
...
if config.runtime.name.is_empty() {
return Err(io::Error::new(
io::ErrorKind::InvalidData,
"Runtime name is missing in the configuration",
));
}
...
}
Fixes#9783
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Centos8 is EOL and repos are not available anymore. Centos9 contains the
same packages and should do well as a base for testing.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
This is generally useful debug output on test failures,
and specifically this has been useful for nydus-related
issues recently.
Signed-off-by: Chris Porter <porter@ibm.com>
This is required to support SEV-SNP confidential container with kernel-hashes.
Since this ovmf is latest stable version, it is good to upgrade for tdx
and Vanilaa builds too.
Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
This is required to provide the hashes of kernel, initrd and cmdline
needed during the attestation of the coco.
Fixes: #9150
Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
A new PULL_TYPE environment variable is recognized by the kata-deploy's
install script to allow it to configure CRIO-O for guest-pull image pulling
type.
The tests/integration/kubernetes/gha-run.sh change allows for testing it:
```
export PULL_TYPE=guest-pull
cd tests/integration/kubernetes
./gha-run.sh deploy-k8s
```
Fixes#9474
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR replaces the name to use a variable that is already defined
to have a better uniformity across the general script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In line with the changes for x86_64, the k8s nydus test for s390x should
also use `qemu-coco-dev` for `KATA_HYPERVISOR`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It was observed that the `node_start_time` value is sometimes empty,
leading to a test failure.
This commit retries fetching the value up to 3 times.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The Kata blog was recently moved to the project's website. The content
of the blog is stored together with the rest of the website source on
GitHub.
This patch adds a short guide that describes how to submit a new
blog post as a PR, to appear on the project's website.
Signed-off-by: Ildiko Vancsa <ildiko.vancsa@gmail.com>
This test fails when using `shared_fs=none` with the nydus snapshotter.
Issue tracked here: #9666
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test is failing due to the initContainers not being properly
handled with the guest image pulling.
Issue tracked here: #9668
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
This test fails when using `shared_fs=none` with the nydus napshotter,
Issue tracked here: #9664
Skipping for now.
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
To prevent download failures caused by high traffic to the Docker image,
opt for quay.io/prometheus/busybox:latest over docker.io/library/busybox:latest .
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Added configuration file with rules to exclude some self-hosted
runners from the linter warnings.
Related-with: #9646
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: 9764
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
These jobs have failed more than 50% on nightly CI. Remove them from the list of
execution until we don't have a fix.
Issue: 9763
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This job has failed more than 50% on nightly CI. Remove it from the list of
execution until we don't have a fix.
Issue: 9761
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
For nerdctl install, add symlinks for runc and slirp4netns in the
binary install path.
runc link comes in handy for running runc containers with nerdctl fir
quick tests.
slirp4netns allows for running containers with user mode networking
useful in case of rootless containers.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
CreateContainerRequest objects can specify devices to be created inside
the guest VM. This change ensures that requested devices have a
corresponding entry in the PodSpec.
Devices that are added to the pod dynamically, for example via the
Device Plugin architecture, can be allowlisted globally by adding their
definition to the settings file.
Fixes: #9651
Signed-off-by: Markus Rudy <mr@edgeless.systems>
This adds structs and fields required to parse PodSpecs with
VolumeDevices and PVCs with non-default VolumeModes.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
If a test is failing during setup, makes no much sense to run the suite.
Let's skip and add some debug messages.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
End of file should not end with --- mark. This will confuse tools like
yq and kubectl that might think this is another object.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.
Fixes#9354
Depends-on:github.com/kata-containers/tests#5818
Signed-off-by: Beraldo Leal <bleal@redhat.com>
golang.mk is not ready to deal with non GOPATH installs. This is
breaking test on s390x.
Since previous steps here are installing go and yq our way, we could
skip this aditional check. A full refactor to golang.mk would be needed
to work with different paths.
Signed-off-by: Beraldo Leal <bleal@redhat.com>
fixes#9748
A configuration option `guest_component_procs` has been introduced that
indicates which guest component processes are supposed to be spawned by
the agent. The default behaviour remains that all of those processes are
actively spawned by the agent. At the moment this is based on presence
of binaries in the rootfs and the guest_component_api_rest option.
The new option is incremental:
none -> attestation-agent -> confidential-data-hub -> api-server-rest
e.g. api-server-rest implies attestation-agent and confidential-data-hub
the `none` option has been removed from guest_component_api_rest, since
this is addresses by the introduced option.
To not change expected behaviour for non-coco guests we still will still
only attempt to spawn the processes if the requested attestation binaries
are present on the rootfs, and issue in warning in those cases.
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
We are currently building Oras from source on ppc64le. Now that they offically release the artefacts
for power, consume them to install Oras.
Fixes: #9213
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Wainer noticed this is failing for the coco-qemu-dev case, and decided
to skip it, notifying me that he didn't fully understand why it was not
failing on TDX.
Turns out, though, this is also failing on TDX, and we need to skip it
there as well.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add the CLI flag --runtime-class-names, which is used during
policy generation. For resources that can define a
runtimeClassName (e.g., Pods, Deployments, ReplicaSets,...)
the value must have any of the --runtime-class-names as
prefix, otherwise the resource is ignored.
This allows to run genpolicy on larger yaml
files defining many different resources and only generating
a policy for resources which will be deployed in a
confidential context.
Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
Use the variable BATS_TEST_COMPLETED which is defined by the bats framework
when the test finishes. `BATS_TEST_COMPLETED=` (empty) means the test failed,
so the node syslogs will be printed only at that condition.
Fixes: #9750
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9667
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's cloning the nydus-snapshotter repo from the version specified in
versions.yaml, however, the deployment files are set to pull in the
latest version of the snapshotter image. With this version we are
pinning the image version too.
This is a temporary fix as it should be better worked out at nydus-snapshotter
project side.
Fixes: #9742
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9668
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9666
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Issue: #9664
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test fails with qemu-coco-dev configuration and guest-pull image pull.
Unlike other tests that I've seen failing on this scenario, k8s-seccomp.bats
fails after a couple of consecutive executions, so it's that kind of failure
that happens once in a while.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This will enable the k8s tests to leverage guest pulling when
PULL_TYPE=guest-pull for qemu-coco-dev runtimeclass.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The runtime handler annotation is required for Kubernetes <= 1.28 and
guest-pull pull type. So leverage $PULL_TYPE (which is exported by CI jobs)
to conditionally apply the annotation.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It creates this line, as the Golang runtime does:
-object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0
Signed-off-by: Emanuel Lima <emlima@redhat.com>
We need to remove the device from the tracking map, a container
restart will increment the bus index and we will get out of root-ports
and crash the machine.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We need special handling for pod_sandbox, pod_container and
single_container how and when to inject CDI devices
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigates
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.
Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched contianerd.
We want to eliminate the patched continerd in the GPU case
as well.
Fixes: #8860
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Let's see if it helps with issues like:
```
error: must build at directory: not a valid directory: evalsymlink
failure on
'"/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/../../..//tools/packaging/kata-deploy/kata-cleanup/overlays/k0s"'
: lstat
/home/runner/actions-runner/_work/kata-containers/kata-containers/tests/functional/kata-deploy/":
no such file or directory
```
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This takes a few minutes that could be saved, so let's avoid doing this
on all the platforms, but simply do this when it's needed (the baremetal
use case).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently only "baremetal" runs all the tests, but we could easily run
"all" locally or using the github provided runners, even when not using
a "baremetal" system.
The reason I'd like to have a differentiation between "all" and
"baremetal" is because "baremetal" may require some cleanup, which "all"
can simply skip if testing against a fresh created VM.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For now we've only exposed the option to deploy kata-deploy for k3s and
vanilla kubernetes when using containerd.
However, I do need to also deploy k0s and rke2 for an internal CI, and
having those exposed here do not hurt, and allow us to easily expand the
CI at any time in the future.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s was added to kata-deploy, but it's kata-cleanup counterpart was
never added. Let's fix it.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
k0s deployment has been broken since we moved to using `tomlq` in our
scripts. The reason is that before using `tomlq` our script would,
involuntarily, end up creating the file.
Now, in order to fix the situation, we need to explicitly create the
file and let `tomlq` add the needed content.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This option allows to add a debug monitor socket when
`enable_debug = true` to control QEMU within debugging case.
Fixes: #9603
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The `update_env_pci()` function need the PCI address mapping to
translate the host PCI address to guest PCI address in below
environment variables:
- PCIDEVICE_<prefix>_<resource-name>_INFO
- PCIDEVICE_<prefix>_<resource-name>
So collect PCI address mapping for both vfio-pci-gk and
vfio-pci devices.
Fixes#9614
Signed-off-by: Lei Huang <leih@nvidia.com>
The k8s test suite halts on the first failure, i.e., failing-fast. This
isn't the behavior that we used to see when running tests on Jenkins and it
seems that running the entire test suite is still the most productive way. So
this disable fail-fast by default.
However, if you still wish to run on fail-fast mode then just export
K8S_TEST_FAIL_FAST=yes in your environment.
Fixes: #9697
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This test has failed in confidential runtime jobs. Skip it
until we don't have a fix.
Fixes: #9663
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
One of our machines is running CentOS 9 Stream, and we could easily
verify that we can build and install the kbs client there, thus we're
expanding the installation script to also support CentOS 9 Stream.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`externals.coco-kbs.toolchain` is not defined, get the rust_version from
`externals.coco-trustee.toolchain` instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're bumping the version in order to bring in the customisation needed
for setting up a custom pccs, which is needed for the KBS integration
tests with Kata Containers + TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR enables the installation and unistallation of the kbs client
as well as general coco components needed for the TDX GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR fixes the minvalue for boot time to avoid the random failures
of the GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code.
Fixes:#9692
Signed-off-by: sidneychang <2190206983@qq.com>
The `dial_timeout` works fine for Runtime-go, but is obsoleted in
Runtime-rs.
When the pod cannot connect to the Agent upon starting, we need to adjust
the `reconnect_timeout_ms` to increase the number of connection attempts to
the Agent.
Fixes: #9688
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
For the TD attestation to work the connection to QGS on the host is needed.
By default QGS runs on vsock port 4050, but can be modified by the host owner.
Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below:
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'
Fixes: #9497
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
The new version of sriov-network-device-plugin adds an env
`PCIDEVICE_<prefix>_<resource-name>_INFO`, which has a json
value; kata-agent can't parse it as env
`PCIDEVICE_<prefix>_<resource-name>` which has value in format
"DDDD:BB:SS.F".
This change updates env `PCIDEVICE_<prefix>_<resource-name>_INFO`.
Signed-off-by: Lei Huang <leih@nvidia.com>
Container tags can be a maximum of 128 characters long
so calculate the length of the arch suffix and then restrict
the tag to this length subtracted from 128
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Previously I copied the logic that abbreviated the commit hash
from the versioning, but looking at our versions.yaml the clear pattern
is that when pointing at commits of dependencies we use the full
commit hash, not the abbreviated one, so for consistency I think we should
do the same with the components that we make available
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
As we have multi-arch builds for nearly all components, we want to ensure
that all the cache tags we set have the architecture suffix, not just the
`TARGET_BRANCH` one.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Remove `rand.Seed` call to resolve the following failure:
```
rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value.
```
The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed
back this up and states:
> If Seed is not called, the generator is seeded randomly at program startup.
so I believe we can just delete the call.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For now, let's allow the users to set the default_cpu and default_memory
when using TDX, as they may hit issues related to the size of the
container image that must be pulled and unpacked inside the guest,
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- Make due to us bumping the golang version used in our CI
but `make vendor` fails without the go version in the runtime go.mod
being increased, so update this and run go mod tidy
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This PR adds the use of custom Intel DCAP configuration when
deploying the KBS.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR improves general format like variable definition to have
uniformity across the memory usage script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Now we have updated the release builds to push
artefacts to
our registry for the release, so we can cache the images, we need to
set `secrets: inherit` for all architecture's tarball builds
so that we can log into quay.io and ghcr in those steps
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
VERSION_ID is not guaranteed to be specified in os-release, this
makes kaka-deploy breaks in rolling distros like arch linux and void
linux.
Note that operating system vendors may choose not to provide
version information, for example to accommodate for rolling releases.
In this case, VERSION and VERSION_ID may be unset.
Applications should not rely on these fields to be set.
Signed-off-by: vac <dot.fun@protonmail.com>
This test doesn't fail with the guest image pulling, but it for sure
should. :-)
We can see in the bats logs, something like:
```
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 31s default-scheduler Successfully assigned kata-containers-k8s-tests/liveness-exec to 984fee00bd70.jf.intel.com
Normal Pulled 23s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 345ms (345ms including waiting)
Normal Started 21s kubelet Started container liveness
Warning Unhealthy 7s (x3 over 13s) kubelet Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Normal Killing 7s kubelet Container liveness failed liveness probe, will be restarted
Normal Pulled 7s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 389ms (389ms including waiting)
Warning Failed 5s kubelet Error: failed to create containerd task: failed to create shim task: the file /bin/sh was not found: unknown
Normal Pulling 5s (x3 over 23s) kubelet Pulling image "quay.io/prometheus/busybox:latest"
Normal Pulled 4s kubelet Successfully pulled image "quay.io/prometheus/busybox:latest" in 342ms (342ms including waiting)
Normal Created 4s (x3 over 23s) kubelet Created container liveness
Warning Failed 3s kubelet Error: failed to create containerd task: failed to create shim task: failed to mount /run/kata-containers/f0ec86fb156a578964007f7773a3ccbdaf60023106634fe030f039e2e154cd11/rootfs to /run/kata-containers/liveness/rootfs, with error: ENOENT: No such file or directory: unknown
Warning BackOff 1s (x3 over 3s) kubelet Back-off restarting failed container liveness in pod liveness-exec_kata-containers-k8s-tests(b1a980bf-a5b3-479d-97c2-ebdb45773eff)
```
Let's skip it for now as we have an issue opened to track it down:
https://github.com/kata-containers/kata-containers/issues/9665
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to firecracker, which doesn't have support for virtio-fs /
virtio-9p, TDX used with `shared_fs=none` will face the very same
limitations.
The tests affected are:
* k8s-credentials-secrets.bats
* k8s-file-volume.bats
* k8s-inotify.bats
* k8s-nested-configmap-secret.bats
* k8s-projected-volume.bats
* k8s-volume.bats
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's make sure the logs will print the correct annotation and its
value, instead of always mentioning "kernel" and "initrd".
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're doing that as all tests are going to be running with
`shared_fs=none`, meaning that we don't need any specific test for this
case anymore.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function will set the needed annotation for enforcing that the
image pull will be handled by the snapshotter set for the runtime
handler, instead of using the default one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We shouldn't be using 9p, at all, with TEEs, as off right now we have no
way to ensure the channels are encrypted. The way to work this around
for now is using guest pull, either with containerd + nydus snapshotter
or with CRI-O; or even tardev snapshotter for pulling on the host (which
is the approach used by MSFT).
This is only done for TDX for now, leaving the generic, AMD, and IBM
related stuff for the folks working on those to switch and debug
possible issues on their environment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace:
- ` ` (namespace field missing)
- `namespace: ""` (namespace field is the empty string)
- `namespace: "default"`(namespace field has the explicit value `default`)
Genpolicy currently does not handle the empty string case correctly.
Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
When just containerd is installed without installing nerdctl,
cni plugins are missing from the installation.
containerd tarball does not include cni plugin files.
Hence install cni plugins separately for containerd.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
nerdctl requires cni plugins to be installed in /opt/cni/bin
Without bridge plugin installed, it is not possible to run a
container with nerdctl.
The downloaded nerdctl tarball contains cni plugin files, but are
extracted under /usr/local/libexec.
Copy extracted tarball cni files under /usr/local/libexec
to /opt/cni/bin
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Use `eval` to process the `date` command along with its parameters,
thus avoiding misinterpreting the parameters as commands.
Fixes: #9661
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Determine the realpath of kata-shim avoiding the check fails
in case the kata-shim is not a symlink, as was happening prior
to this commit.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
- The tags have a trailing non-printable character, which results
in our cache tags having a trailing underscore e.g. `ghcr.io/kata-containers/cached-artefacts/agent:ce24e9835_`
For ease of use of these cached components, we should strip off the trailing underscore.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
An e2e test for `vfio-ap` has been conducted internally in IBM
due to the lack of publicly available test machines equipped
with a required crypto device.
The test is performed by the `tests` repository:
(i.e. 772105b560/Makefile (L144))
The community is working to integrate all tests into the `kata-containers`
repository, so the `vfio-ap` test should be part of that effort.
This commit moves a test script and Dockerfile for a test image from
the `tests` repository. We do not rename the script to `gha-run.sh`
because it is not executed by Github Actions' workflow.
You can check the test results from the s390x nightly test with the migrated files here:
https://github.com/kata-containers/kata-containers/actions/runs/9123170010/job/25100026025
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Given that we think the containerd -> snapshotter image cache
problems have been resolved by bumping to nydus-snapshotter v0.3.13
we can try removing the skips to test this out
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- We previously have an expectation for the pause rootfs
to be pull on the host when we did a guest pull. We weren't
really clear why, but it is plausible related to the issues we had
with containerd and nydus caching. Now that is fixed we can begin
to address this with setting shared_fs=none, but let's start with
updating the rootfs host check to be not higher than expected
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This test is called from `tests/integration/run_kuberentes_tests.sh`,
which already ensures that yq is installed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
On 1423420, I've mistakenly disabled the tests entirely, for both
non-TEEs and TEEs.
This happened as I didn't realise that `confidential_setup` would take
non-TEEs into consideration. :-/
Now, let me follow-up on that and make sure that the tests will be
running on non-TEEs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This function is a helper to check whether the KATA_HYPERVISOR being
used is a confidential hardware (TEE) or not, and we can use it to
skip or only run tests on those platforms when needed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's rename it to `is_confidential_runtime_class`, and adapt all the
places where it's called.
The new name provides a better description, leading to a better
understanding of what the function really does.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Implementation of QemuCmdLine has a fairly uniform and repetitive structure
that's guided by a set of conventions. These conventions have however been
mostly implicit so far, leading to a superfluous and annoying
request/force-push churn during qemu-rs PR reviews.
This commit aims to make things explicit so that contributors can take them
into account before an initial PR submission.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit is to append an arch type to the initramfs-cryptsetup image
to prevent a wrong arch image from being pulled on a different arch host.
Fixes: #9654
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Container image tags can only contain alphanumeric, period,
hyphen and underscore characters, so convert characters outside
of these to be underscores, to avoid having invalid tag failures
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Recently the extra gpu caching was added, unfortunately when I
rebased I ended up with both the new tagging logic and old logic.
Let's try and integrate them properly to avoid doing the push twice.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
in case the upstream CI fails it's useful to pin-point the PR that
caused the regression. Currently openshift-ci does not allow doing that
from their setup but we can mimic the setup on our infrastructure and
use the available kata-deploy-ci images to find the first failing one.
To help with that add a few helper scripts and a howto.
Fixes: #9228
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Now we have the workflow updated and can test the changes in caching
we've hit an error:
```
line 1180: artefact_tag: unbound variable
```
so we need to fix that up. Sorry for missing this before.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- It appears like the `if` isn't required when setting env as a
conditional
- `inputs.stage` over input.stage
- Swap matrix.component to matrix.asset
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
ResizeMemory for Cloud Hypervisor is missing a check for the new
requested memory being greater than the max hotplug size after
alignment. Add the check, and since an earlier check for this
setsrequested memory to the max hotplug size, do the same in the
post-alignment check.
Fixes#9640
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
All these settings are hardcoded as `false` and result in
no extra options on the QEMU command line, like the go
runtime does. There actually not needed :
- we're never going to ask QEMU to survive a guest shutdown
- we're never going to run QEMU daemonized since it prevents
log collection
- we're never going to ask QEMU to start with the guest stopped
No need to keep this code around then.
Signed-off-by: Greg Kurz <groug@kaod.org>
- CoCo wants to use the agent and coco-guest-components cached artifacts
so tag them with a helpful version, so make these easier to get
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
No commands remaining.
- We don't want to ship certain components (agent, coco-guest-components)
as part of the release, but for other consumers it's useful to be able to pull in the components
from oras, so rather than not building them, just don't upload it as part of the release.
- Also make the archs all consistent on not shipping the agent
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Set RELEASE env to 'yes', or 'no', based on if the stage
passed in was 'release', so we can use it in the build scripts
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
For other projects (e.g. CoCo projects) being able to
access the released versions of components is helpful,
so push these during the release process
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
A length of the result of `git log -1 --pretty=format:%h` could vary
over different CI systems, highly likely messing up their caching
mechanisms.
This commit is to use an option `--abbrev=9` to standardize the length
to 9 characters for CI.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Bump nydus snapshotter to v0.13.13 to fix the gap when switching
different snapshotters in guest pull.
Fixes: #8407
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This will help us to debug issues in the future (and would have helped
in the past as well). :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is needed, as b1710ee2c0 made the
default agent shipped the one with policy support. However, we simply
didn't update the rootfs to reflect that, causing then an issue to start
the agent as shown by the strace below:
```
open("/etc/kata-opa/default-policy.rego", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = -1 ENOENT (No such file or directory)
futex(0x7f401eba0c28, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1 RT_2], [], 8) = 0
tkill(553681, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=553681, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
```
This happens as the default policy **must** be set when the agent is
built with policy support, but the code path that copies that into the
rootfs is only triggered if the rootfs itself is built with
AGENT_POLICY=yes, which we're now doing for both confidential and
non-confidential cases.
Sadly this was not caught by CI till we the cache was not used for
rootfs, which should be solved by the previous commit.
Fixes: #9630, #9631
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is to add an info for files at `tools/packaging/kata-deploy/local-build/*
to a version of the components and ensure that the cached artefacts are not used
when the files of interest are updated.
Fixes: #9630
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
the `tdx_not_supported_warning` function does not exists, the
`tdx_not_supported` should be called instead.
Fixes: #9628
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Using a debugger with the kata runtime is complicated, but it can be done
and can be very useful.
This commits provides a helper script that simplifies it, and updates
the developper's documentation to explain how to use it.
Signed-off-by: Julien Ropé <jrope@redhat.com>
This PR fixes the indentation in gha run k8s common script
to have uniformity across the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
By default, when a container is created with the `--privileged` flag,
all devices in `/dev` from the host are mounted into the guest. If
there is a block device(e.g. `/dev/dm`) followed by a generic
device(e.g. `/dev/null`),two identical block devices(`/dev/dm`)
would be requested to the kata agent causing the agent to exit with error:
> Conflicting device updates for /dev/dm-2
As the generic device type does not hit any cases defined in `switch`,
the variable `kataDevice` which is defined outside of the loop is still
the value of the previous block device rather than `nil`. Defining `kataDevice`
in the loop fixes this bug.
Signed-off-by: cncal <flycalvin@qq.com>
kube-router decided to use :8080 for its metrics, and this seems to be a
change that affected k0s 1.30.0+, leading to kube-router pod crashing
all the time and anything can actually be started after that.
Due to this issue, let's simply use a different port (:9999) and move on
with our tests.
Fixes: #9623
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Whenever we count on having the headers tarball, we must unpack the
cached content into the expected directory, otherwise we'd simply fail,
as we've been failing in our CI, at the end of the process where we
generate the tarball from the cached components.
It's weird to me, sincerely, that the headers tarball end up in such
weird place (build/kernel-nvidia-gpu/builddir/), but I'll leave that to
Zvonko to figure out whether something better can be done, as the intuit
of this PR is simply unblock Kata Containers CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
New env var so everyone can test the PUSH_TO_REGISTRY feature
export PUSH_TO_REGISTRY=yes
export ARTEFACT_REGISTRY=quay.io
export ARTEFACT_REPOSITORY=my-fancy-kata-containers
export ARTEFACT_REGISTRY_USERNAME=zvonkok
export ARTEFACT_REGISTRY_PASSWORD=<super-secret>
make ...-tarball
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
This PR updates the launch times scripts by improving the variable
definition as well as trying to use the same format across all the script.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517
RTC was being constructed inside the QemuCmdLine struct,
but it should've been built inside the devices vector.
Signed-off-by: Emanuel Lima <emlima@redhat.com>
This was needed when we were using an old (and not maintained anymore)
host stack. Considering what we have as part of the distros, Today,
this can simply be dropped, as I cannot find any reference of this one
being needed in any up-to-date documentation.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit b7cccfa019.
The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using. With that in mind, let's
revert its usage in the code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit 582b5b6b19.
The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using. With that in mind, let's
revert its addition, and later on its usage in the code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Here we're checking the distro's `/etc/os-release` or
`/usr/lib/os-release` in order to get which distro we're deploying the
Kata Containers artefacts to, and then to properly adjust the QEMU and
OVMF with TDX support that's been shipped with the distros.
Together with that, we're also printing the instructions provided by the
distro on how to enable and use TDX.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT
variables instead of actually setting a path, so we can easily replace
those as part of our deployment scripts.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We'll need to have access to the host os-release file (either under
`/etc/os-release` or under `/usr/lib/os-release`), and the simplest
approach that comes to my mind to do is doing what a debug pod would do,
mounting `/` as `/host` and then allowing us to have access to those
files, and then corectly set the TDX specific QEMU and OVMF (TDVF) paths
for the tdx available configurations.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We haven't been using nor testing with td-shim, as Cloud Hypervisor does
not officially support TDX yet, and TDVF is supposed to be used with
QEMU, instead of td-shim.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's remove everything related to the TDX specific QEMU building /
shipping from our repo, as we'll be relying on the one coming from the
distros.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's remove everything related to the TDVF building / shipping from our
repo, as we'll be relying on the one coming from the distro.
Later on, we may need to re-add TDVF logic, as we're already using
upstream edk2 repo / content, but when that's needed we'll simply revert
this commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the random write value for FIO for qemu by decreasing it
to avoid the random failures of the GHA CI.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's skip those tests on TEEs as we've been facing a reasonable amount
of issues, most likely on the containerd side, related to pulling the
image on the guest.
Once we're able to fix the issues on containerd, we can get back and
re-enable those by reverting this commit.
The decision of disabling the tests for TEEs is because the machines may
end up in a state where human intervention is necessary to get them back
to a functional state, and that's really not optimal for our CI.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the first step of the work to start relying on the artefacts
coming from the distros (CentOS 9 Stream, and Ubuntu) themselves.
Let's have this first one merged, as this will not run the CI due to the
changes being on the yaml itself, and then follow-up with the changes
needed on other parts of the project (kata-deploy, runtime, etc).
Fixes: #9590 -- part I
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Everytime I create contianer on arm64 machine, containerd/kata logs a redundant warning
as follows:
``` shell
time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2
pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor
```
I added an error statement so that the error would be logged when it occurs.
Signed-off-by: cncal <flycalvin@qq.com>
This PR removes oci information from versions file as this is not
longer being used in kata containers repository.
Fixes#9599
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds support to install KBS to k8s TDX GHA workflow in
order to run confidential attestation tests.
Fixes#9451
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR adds a k8s negative policy test to the confidential attestation
bats test.
Fixes#9437
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
We should init and asign the runtime instance to runtime
handler, otherwise, if the pause container failed to start,
which means the runtime instance failed to start, then the
following delete & shutdown request wouldn't be run, thus
the dead shim would be left.
Fixes: #9597
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
dragonball reserves 2048G of mmio space for the pci root bus by default
on physical addresses greater than 4G. However, for some machines with
smaller physical address widths, such as 39-bit wide physical addresses,
dragonball reserves the mmio space when initializing the memory. It is
less than 2048G, so this commit dynamically calculates and allocates the
mmio size of each pci root bus.
Fixes: #9509
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.
Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.
This change has been tested together with CH v39 in #9588.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
The build message shows that yq was not found when I tried to build
runtime binaries, but I've actually installed yq by yum install.
Signed-off-by: cncal <flycalvin@qq.com>
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:
```
ERRO[0000] cannot open kvm device: no such file or directory arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```
Signed-off-by: cncal <flycalvin@qq.com>
- The testVersionString logic use regex to check that the ociVersion is
displayed correctly, but with the new go module that version has a
`+` in, so we need to quote this to escape special characters
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Dependabot doesn't follow all our commit format guidelines,
so add a check and skip these if the author is `dependabot[bot]`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
With the addition of the 'qemu-coco-dev' runtimeClass we no longer need
to run CoCo tests on non-TEE environments with 'qemu'. As a result the
tests also no longer need to set the "io.katacontainers.config.hypervisor.image"
annotation to pods.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created the runtimeclasses/kata-qemu-coco-dev.yaml file and updated the list
of SHIMS.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created a new configuration to configure Kata for CoCo without requiring TEE
hardware so to allow developers implement/test/debug platform agnostic code
on their workstations. It will also ease testing of CoCo features on CI with
non-TEE supported VMs.
This is based off qemu configuration. The following differences applied:
- switched to confidential guest image/initrd
- switched to confidential kernel
- switched to 9p shared_fs
Fixes#9487
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Now that the `kata-agent` is being built with policy support, let's stop
building the `kata-opa-agent`, reducing the amount of things we need to
test and maintain.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now that the OPA binary is not required anymore, let's start shipping
the agent with the policy enabled by default.
The agent *without* policy enabled has 30MB, while it's 34MB *with* the
policy enabled.
This 4MB (~10%) increase is, IMHO, worth it in order to reduce the
amount of components we have to maintain and test, including the
possibility to also reduce the amount of possible rootfs / initrd
images.
Whoever wants to use the agent without policy enabled can simply do that
by building their own agent. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Since OPA binary was replaced by the regorus crate, we can finally stop
building and shipping the binary.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As we have an issue with a golang version for `run-cri-containerd`,
it is required to bump the language.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
iommu_platform support was already added on initial DeviceVhostUserFs
introduction, however it incorrectly enabled iommu_platform also on
non-CCW (e.g. PCI) systems.
Signed-off-by: Pavel Mores <pmores@redhat.com>
iommu_platform is only turned on for CCW systems.
PartialEq is added to VirtioBusType to enable the '==' operator.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The adding itself is done by a new function add_iommu() that conforms with
the add_*() convention. Note though that this function is called
internally, by the QemuCmdLine constructor, simply because there's nothing
to trigger its invocation from QemuInner (unlike the other add_*()
functions so far).
Signed-off-by: Pavel Mores <pmores@redhat.com>
Remove reminder to initialize Policy earlier, because currently there
are no plans to initialize earlier.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
When docker is installed on the host system using script from https://get.docker.com/ it automatically creates a docker group with gid=999.
Then during docker build process of tarball, eg. make qemu-tdx-experimental-tarball docker is also installed inside the image with the same
script, which also automatically adds docker group with gid=999.
Then, the build tries to add a new group docker_on_host with gid=999, which already exists, which breaks the build.
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
Avoid auto-generating Policy on platforms that haven't been tested
yet with auto-generated Policy.
Support for auto-generated Policy on these additional platforms is
coming up in future PRs, so the tests being fixed here were
prematurely enabled.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This should only be done once, and if CRI-O restarts, there's a big
chance kata-deploy will also restart and the user would end up with a
file that looks like:
```
[crio]
log_level = "debug"
[crio]
log_level = "debug"
[crio]
log_level = "debug"
...
```
And that would simply cause CRI-O to not start.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Implement Agent Policy using the regorus crate instead of the OPA
daemon.
The OPA daemon will be removed from the Guest rootfs in a future PR.
Fixes: #9388
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Remove k8s-policy-set-keys.bats in preparation for using the regorus
crate instead of the OPA daemon for evaluating the Agent Policy. This
test depended on sending HTTP requests to OPA.
Fixes: #9388
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Lock anyhow version to 1.0.58 because:
- Versions between 1.0.59 - 1.0.76 have not been tested yet using
Kata CI. However, those versions pass "make test" for the
Kata Agent.
- Versions 1.0.77 or newer fail during "make test" - see
https://github.com/kata-containers/kata-containers/issues/9538.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
In case of block devices using virtio-block, we need to pass the
pci-path as the storage source field to the agent.
Current the virt-path is being passed which works just for mmio block
devices.
In the future when support is added for scsi, block-ccw and pmem
devices, the storage source would need to be handled accordingly.
Fixes: #9034
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This commit is to add a new CI job to run-k8s-tests-on-zvsi.yaml.
Why the job is not configured in run-kata-coco-tests.yaml by having it
integrated with `run-k8s-tests-coco-nontee` is:
- It uses k3s instead of AKS
- It runs on a self-hosted runner
These differences make the integrated job not easy to read and maintain
when it comes to incorporating other platforms in the near future.
Fixes: #9467
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
In do_create_container and do_exec_process, we should create the proc_io first,
in case there's some error occur below, thus we can make sure
the io stream closed when error occur.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Fixes: #9334
In linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic.
To avoid this problem, we open stdin in write mode and keep the stdin-writer
We need to open the stdout/stderr as the read mode and keep the open endpoint
until the process is delete. otherwise,
the process would exit before the containerd side open and read
the stdout fifo, thus runD would write all of the stdout contents into
the stdout fifo and then closed the write endpoint. Then, containerd
open the stdout fifo and try to read, since the write side had closed,
thus containerd would block on the read forever.
Here we keep the stdout/stderr read endpoint File in the common_process,
which would be destroied when containerd send the delete rpc call,
at this time the containerd had waited the stdout read return, thus it
can make sure the contents in the stdout/stderr fifo wouldn't be lost.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
In scenario passfd-io, we should wait for stdin to close itself
instead of manually intervening in it.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This patch adds a default implementation for the use_sandbox_pidns
and updates the structs that implement the K8sResource trait to use
the default.
Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
- Provide default implementation for get_sandbox_name in K8sResource trait
- Remove default implementation from structs implementing the trait K8sResource
Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
concurrently with itself
Based on 3a1461b0a5186a92afedaaea33ff2bd120d1cea0
Previously the tool would use the layers_cache folder for all instances
and hence delete the cache when it was done, interfereing with other
instances. This change makes it so that each instance of the tool will
have its own temp folder to use.
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
The new run-k8s-tests-coco-nontee job should be the home of attestation
tests.
Changed run-k8s-tests-coco-nontee to get KBS installed and by the time the
KBS variable is exported in the environment then the attestation tests
will kick in (likewise they will skip in run-k8s-tests-on-aks).
Fixes#9455
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This is to update a document `how-to-run-kata-containers-with-SE-VMs`
on using confidential artifacts to build a secure image.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This is to make `boot-image-se-tarball` use confidential kernel and
initrd instead of vanilla version of artifacts.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
updates golang.org/x/net to newer version that closes some reported
vulnerabilities and security issues
Fixes#9486
Signed-off-by: Adil Sadik <sparky.005@gmail.com>
1. EPOLLHUP events also need to be read and will be got len 0.
2. We should kill the connection when EPOLLERR events are received.
Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This is to make a workflow `run-k8s-tests` and `run-cri-containerd`
(s390x and zvsi) run only on the runners labeled by `s390x-large`.
Fixes: #9507
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
- Bump the stalebot action version to v9 as that fixes the
```
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/stale@v8.
```
warning.
Fixes: #9512
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This commit is simply to remove a CI workflow `k8s-cri-containerd-rhel9-e2e-tests`.
Fixes: #9504
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As documented in https://github.com/actions/stale?tab=readme-ov-file#start-date
> The start date is used to ignore the issues and pull requests created before the start date.
> Particularly useful when you wish to add this stale workflow on an existing repository
> and only wish to stale the new issues and pull requests.
As we don't want need to treat PRs older than May 2023 as a special case, then remove this option.
Fixes: #9502
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We've discussed this over and over. Let's try to get to an agreement here.
I will use this issue to remove the mandatory Issue - PR dependency.
Fixes: #9500
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Configuration file for qemu with runtime-rs was recently renamed.
Doc contains name for old file. This was somehow not caught in the CI
earlier.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Document describes the steps needed to pass an entire Intel Discrete GPU
as well a GPU SR-IOV interface to a Kata Container.
Fixes: #9083
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Pods. Additional policy failures
are injected during CI using other types of K8s resources - e.g.,
using Jobs and Replication Controllers - from separate PRs.
Fixes: #9491
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The commit is to make the OPA build from source working in `ubuntu-rootfs-osbuilder`.
To achieve the goal, the configuration is changed as follows:
- Switch the make target to `ci-build-linux-static` not triggering docker-in-docker build
- Install go in the builder image for s390x and ppc64le
Fixes: #9466
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The new CoCo non-tee job introduced on commit 0d5399ba92 need to read secrets
like AZ_TENANT_ID, so run-kata-coco-tests workflow should inherit the secrets from
the caller workflow.
Fixes#9477
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fixes: #9472
For initrd and image, the related NVIDIA will not use the default targets and we will pin them to a specific release.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
In order to build a coco {image,initrd}, it is required to
specify its name and version in versions.yaml. This commit
is to add the configuration for them, respectively.
Fixes: #9470
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`CONFIG_TN3270_TTY` and `CONFIG_S390_AP_IOMMU` are dropped for s390x
in 6.7.x which is used for a confidential kernel.
But they are still used for a vanilla kernel. So we need to add them
to the whitelist.
Fixes: #9465
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit expands the VMM matrix for run-cri-containerd,
adding a new item `qemu-runtime-rs` for a test scenario where
the VMM is QEMU and runtime-rs is employed.
This expansion affects the workflows for both x86_64 and s390x platforms.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
To make `qemu-runtime-rs` working for CI, we have to rename a configuration
template file and `CONFIG_FILE_QEMU` in Makefile.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
`qemu-runtime-rs` will be utilized to handle a test scenario where
the VMM is QEMU and runtime-rs is employed.
Note: Some of the tests are skipped. They are going to be reintegrated in
the follow-up PR (Check out #9375).
Fixes: #9371
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Linux kernel generates a panic when the init process exits.
The kernel is booted with panic=1, hence this leads to a
vm reboot.
When used as a service the kata-agent service has an ExecStop
option which does a full sync and shuts down the vm.
This patch mimicks this behavior when kata-agent is used as
the init process.
Fixes: #9429
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
isClhRunning uses signal 0 to test whether the process is
still alive or not. This doesn't work because the process is a
direct child of the shim. Once it is dead the process becomes
zombie.
Since no one waits for it the process lingers until
its parent dies and init reaps it. Hence sending signal 0 in
isClhRunning will always return success whether the process is
dead or not.
This patch calls wait to reap the process, if it succeeds that
means it is our child process, if not we send the signal.
Fixes: #9431
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Replication Controllers. Additional
policy failures will be injected using other types of K8s resources
- e.g., using Pods and/or Jobs - in separate PRs.
Fixes: #9463
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
genpolicy is a handy tool to use in CI systems, to prepare workloads
before applying them to the Kubernetes API server. However, many modern
build systems like Bazel or Nix restrict network access, and rightfully
so, so any registry interaction must take place on localhost.
Configuring certificates for localhost is tricky at best, and since
there are no privacy concerns for localhost traffic, genpolicy should
allow to contact some registries insecurely. As this is a runtime
environment detail, not a target environment detail, configuring
insecure registries does not belong into the JSON settings, so it's
implemented as command line flags.
Fixes: #9008
Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
By passing --overwrite-existing to `aks get-credentials` it will stop
asking if I want to overwrite the existing credentials. This is handy
for running the scripts locally.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
When running on non-TEE environments (e.g. KATA_HYPERVISOR=qemu) the tests should
be stressing the CoCo image (/opt/kata/share/kata-containers/kata-containers-confidential.img)
although currently the default image/initrd is built to be able to do guest-pull as well.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Enabled guest-pull tests on non-TEE environment. It know requires the SNAPSHOTTER environment
variable to avoid it running on jobs where nydus-snapshotter is not installed
Fixes: #9410
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Created the new run-k8s-tests-coco-nontee jobs for running CoCo tests on
non-TEE. It currently generates the run-k8s-tests-coco-nontee(qemu, nydus, guest-pull)
job only to run the guest-pull tests.
Fixes: #9410
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This PR adds the function to uninstall kbs client command function
specially when we are running with baremetal devices.
Fixes#9460
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
CRIs don't always use a pause container, but even if they do the
concrete container choice is not specified. Even if the CRI config can
be tweaked, it's not guaranteed that registries in the public internet
can be reached. To be portable across CRI implementations and
configurations, the genpolicy user needs to be able to configure the
container the tool should append to the policy.
Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
Decouple initialization of the Settings struct from creating the
AgentPolicy struct, so that the settings are available for evaluating,
extending or overriding command line arguments.
Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
This patch introduces a one-time cpath to mitigate the cgroup residuals. It
might break the device cgroup merging rules when the cgroup has children.
Fixes: #9456
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
This PR defines the PULL_TYPE variable to avoid failures of unbound
variable when this is being test it locally.
Fixes#9453
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Auto-generate the policy and then simulate attacks from the K8s
control plane by modifying the test yaml files. The policy then
detects and blocks those changes.
These test cases are using K8s Jobs. Additional policy failures
will be injected using other types of K8s resources - e.g., using
Pods and/or Replication Controllers - in future PRs.
Fixes: #9406
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This PR adds onednn test to exercise additional ML benchmarks.
Onednn is an Intel-optimized library for Deep Neural Networks.
Fixes: #9390
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This PR adds openvino test in order to exercise additional ML
benchmarks.
OpenVino bench used to optimize and deploy deep learning models.
Fixes: #9389
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Include HTTP and HTTPS env variables in the building docker
images because they are required to download packages
such as Phoronix.
Added a restriction that verifies that docker building images
is performed as root.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Adds a function that receives as a single parameter the name of
a valid Kata configuration file which will be established as
the default kata configuration to start kata containers.
Adds a second function that returns the path to the current
kata configuration file.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
Add an extra function that updates kata config
to use the max num. of vcpus available and
to use the available memory in the system.
Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
This is actually a first attempt to document our CI, and all this
content was based on the document created by Fabiano Fidencio (kudos to
him). We are just moving the content and discussion from Google Docs to
here.
I used the "poetic license" to add some notes on what I believe our CI
will look like in the future.
Fixes#9006
Co-authored-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Beraldo Leal <bleal@redhat.com>
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR improves the kbs_k8s_delete function to verify that the
resources were properly deleted for baremetal environments.
Fixes#9379
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Add optional toggle to use existing containerd installation to pull and manage container images.
This adds support to a wider set of images that are currently not supported by standard pull method,
such as those that use v1 manifest.
Fixes: #9144
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
- Set KBC_PROVIDER and ATTESTER rather than TEE_PLATFORM
to avoid tss build issues for vTPM attester(s)
- There are future plans to make a matching TEE_PLATFORM, so this can be simplified once that is available
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- At the moment we aren't supporting ppc64le or
aarch64 for
CoCo, so filter out these tests from running
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Switch to Ubuntu 20.04 for building guest-components as
The rootfs is based on 20.04, so we need matching GLIBC versions.
See #8955
- Add dependencies needed by TDX verifier as we want to build for all platforms
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Add basic test case to check that a ruuning
pod can use the api-server-rest (and attestation-agent
and confidential-data-hub indirectly) to get a resource
from a remote KBS
Fixes#9057
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Co-authored-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Support different pause images in the guest for guest-pull, such as k8s
pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause).
Fixes: #9225 -- part III
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This commit is a mess, but I'm not exactly sure what's the best way to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.
With that said, let me cover the content of this commit.
Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it
didn't fly upstream.
Secondly, in order to get QEMU to properly work with TDX, we need to
enforce the 'private=on' knob and use the "memory-backend-ram", and
we're doing so, and also making sure to test the `private=on` newly
added knob.
I'm sorry for the confusion, I understand this is not optimal, I just
don't see an easy path to do changes without leaving the code broken
during those changes.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`.
This fixes all remaining sites.
Fixes#9245
Signed-off-by: Greg Kurz <groug@kaod.org>
`Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`.
As explained at [1] :
> The contents of an Artifact are uploaded together into an immutable
> archive. They cannot be altered by subsequent jobs. Both of these
> factors help reduce the possibility of accidentally corrupting
> Artifact files.
This means that artifacts cannot have the same name.
Adapt the `run-k8s-tests-on-garm` workflow accordingly by embedding all
the other `${{ vmm.* }}` fields and `${{ inputs.tag }}` in the artifact
names that would otherwise collide.
Fixes#9245
Signed-off-by: Greg Kurz <groug@kaod.org>
`Node.js 19` is deprecated. Bump to a new version based on `Node.js 20`.
As explained at [1] :
> The contents of an Artifact are uploaded together into an immutable
> archive. They cannot be altered by subsequent jobs. Both of these
> factors help reduce the possibility of accidentally corrupting
> Artifact files.
This means that artifacts cannot have the same name.
Adapt all `build-kata-static-tarball` workflows accordingly by
embedding `${{ matrix.asset }}` in the artifact names that would
otherwise collide.
Fixes#9245
Signed-off-by: Greg Kurz <groug@kaod.org>
Occasionally, the removal of GITHUB_WORKSPACE fails for self-hosted runners
because one of the subdirectories is not empty. This is likely due to another
process occupying the directory at the time.
Implementing a secondary cleanup resolves this issue.
This commit focuses on the implementation for the secondary cleanup.
Fixes: #9317
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This reverts commit d1b54ede29.
Conflicts:
src/runtime/virtcontainers/qemu.go
This commit was a hack that was needed in order to get QEMU + TDX to
work atop of the stack our CI was running on. As we're moving to "the
officially supported by distros" host OS, we need to get rid of this.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The private=on|off knob is required in order to properly lauunch a TDX
guest VM.
This is a brand new property that is part of the still in-flight patches
adding TDX support on QEMU.
Please, see:
3fdd8072da
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's update the QEMU to the one that's officially maintained by Intel
till all the TDX patches make their way upstream.
We've had to also update python to explicitly use python3 and add
python3-venv as part of the dependencies.
Fixes: #8810
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR improves the latency test cleanup in order to avoid random
failures of leaving the pods.
Fixes#9418
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
There's an rg name duplication situation that got introduced by #9385
where 2 different test runs might have same rg name.
Add back uniqueness by including the first letter of GENPOLICY_PULL_METHOD to
cluster name.
Fixes: #9412
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
We used to utilize go runtime's "NumCPUs()", which will give the number
of cores available to the Go runtime, which may be a subset of physical
cores if the shim is started from within a cpuset. From the function's
description:
"NumCPU returns the number of logical CPUs usable by the current
process."
As an example, if containerd is run from within a smaller CPUset, the
maximum size of a pod will be dictated by this CPUset, instead of what
will be available on the rest of the system.
Since the shim will be moved into its own cgroup that may have a
different CPUset, let's stick with checking physical cores. This also
aligns with what we have documented for maxVCPU handling.
In the event we fail to read /proc/cpuinfo, let's use the goruntime.
Fixes: #9327
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This PR defines the GH_PR_NUMBER variable in gha run k8s common
script to avoid failures like unbound variable when running
locally the scripts just like the GHA CI.
Fixes#9408
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
updating the machine config takes even longer than 1200s, use 60m to be
sure everything is updated.
Fixes: #9338
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
Try to reduce duplicated code in decrease_attach_count with public
new function do_decrease_count.
Fixes: #8738
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count.
Fixes: #8738
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.
Fixes: #8738
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.
Fixes: #8738
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Use the "allow all" policy for k8s-sandbox-vcpus-allocation.bats,
instead of relying on the Kata Guest image to use the same policy
as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-nginx-connectivity.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-volume.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-sysctls.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-security-context.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-seccomp.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-projected-volume.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-pod-quota.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-optional-empty-secret.bats,
instead of relying on the Kata Guest image to use the same policy as
its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-measured-rootfs.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-liveness-probes.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-inotify.bats, instead of relying
on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-guest-pull-image.bats, instead of
relying on the Kata Guest image to use the same policy as its default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-footloose.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Use the "allow all" policy for k8s-empty-dirs.bats, instead of
relying on the Kata Guest image to use the same policy as its
default.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Check from:
- k8s-exec-rejected.bats
- k8s-policy-set-keys.bats
if policy testing is enabled or not, to reduce the complexity of
run_kubernetes_tests.sh. After these changes, there are no policy
specific commands left in run_kubernetes_tests.sh.
add_allow_all_policy_to_yaml() is moving out of run_kubernetes_tests.sh
too, but it not used yet. It will be used in future commits.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Add GENPOLICY_PULL_METHOD that will be used to test pulling
container images in genpolicy using the oci-distribution crate
and/or the containerd interface.
GENPOLICY_PULL_METHOD will start being used in a future PR.
Fixes: #9384
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
This PR removes conmon information from versions.yaml as this is not
longer being used in kata containers repository.
Fixes#9396
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Don't add the "allow all" policy to all the test YAML files anymore.
After this change, the k8s tests assume that all the Kata CI Guest
rootfs image files either:
- Don't support Agent Policy at all, or
- Include an "allow all" default policy.
This relience/assumption will be addressed in a future commit.
Fixes: #9395
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Configure the system to mount cgroups-v2 by default during system boot
by the systemd system, We must add systemd.unified_cgroup_hierarchy=1
parameter to kernel cmdline, which will be passed by kernel_params in
configuration.toml.
To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.
Fixes: #9336
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.
Fixes: #8738
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This reverts commit 1c5693be86.
Avoid apparent infinite loop when ReadStreamRequest is blocked by
policy - for some of the pods.
When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:
stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy
...
policy check: ReadStreamRequest
...
stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy
...
policy check: ReadStreamRequest
...
Fixes: #9380
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The 3.3.0 release installs the `kata-manager` script with overly restrictive
permissions (see #9373), so add details to help users handle the situation.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR removes the runc version information as this is not longer being used
in the kata containers scripts.
Fixes#9364
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Since we're removing the unused service_offload parameter,
don't set it in any of the packaging scripts.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Since we no longer use the service_offload configuration,
remove the ServiceOffload field from the image struct.
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
These experimental options were added 2 years ago
in anticipation of features that would be added
in CoCo. These do not match the features that were
eventually added and will soon be ported to main.
Fixes: #8047
Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
This is the report from `make check`:
```
error: this loop never actually loops
--> src/signal.rs:147:9
|
147 | / loop {
148 | | select! {
149 | | _ = handle => {
150 | | println!("INFO: task completed");
... |
156 | | }
157 | | }
| |_________^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
= note: `#[deny(clippy::never_loop)]` on by default
```
There is only one option: you get something or a timeout. You never retry, so
the report is correct.
Fixes: #9342
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
The lint report is the following:
```
error: `flatten()` will run forever if the iterator repeatedly produces an `Err`
--> src/rpc.rs:1754:10
|
1754 | .flatten()
| ^^^^^^^^^ help: replace with: `map_while(Result::ok)`
|
note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error
--> src/rpc.rs:1752:5
|
1752 | / reader
1753 | | .lines()
| |________________^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
= note: `-D clippy::lines-filter-map-ok` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]`
```
This commit simply applies the suggestion.
Fixes: #9342
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Running `make check` in the `src/agent` directory gives:
```
error: you seem to use `.enumerate()` and immediately discard the index
--> rustjail/src/mount.rs:572:27
|
572 | for (_index, line) in reader.lines().enumerate() {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index
= note: `-D clippy::unused-enumerate-index` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]`
help: remove the `.enumerate()` call
|
572 | for line in reader.lines() {
| ~~~~ ~~~~~~~~~~~~~~
Checking tokio-native-tls v0.3.1
Checking hyper-tls v0.5.0
Checking reqwest v0.11.18
error: could not compile `rustjail` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
make: *** [../../utils.mk:177: standard_rust_check] Error 101
```
Fixes: #9342
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
support to configure CreateContainerRequestTimeout in the
configurations.
e.g.:
[runtime]
...
create_container_timeout = 300
Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto
deprecated by the re-design of the release process described in #9064.
Remove this file and all its references in the repo.
The `## Versioning` section has some useful information though. It is
moved to `docs/Release-Process.md`. The documentation of the `PATCH`
field is adapted according to new workflow.
Fixes#9064 - part VI
Signed-off-by: Greg Kurz <groug@kaod.org>
Support to configure CreateContainerRequestTimeout in the annotations.
e.g.:
annotations:
"io.katacontainers.config.runtime.create_container_timeout": "300"
Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
In the situation to pull images in the guest #8484, it’s important to account for pulling large images.
Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout.
However, this duration may prove insufficient for pulling larger images, such as those containing AI models.
Consequently, we must devise a method to extend the timeout period for large image pull.
Fixes: #8141
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
This PR updates the journal log name for nerdctl artifacts to make
sure that we have different names in case we add a parallel GHA job.
Fixes#9357
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
the service might not listen on the default port, use the full service
address to ensure we are talking to the right resource.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
this can be used on kcli or other systems where cluster nodes are
accessible from all places where the tests are running.
Fixes: #9272
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
The automated release workflow starts with the creation of the release in
GitHub. This is followed by the build and upload of the various artifacts,
which can be very long (like hours). During this period, the release appears
to be fully available in https://github.com/kata-containers/kata-containers/
even though it lacks all the artifacts. This might be confusing for users
or automation consuming the release.
Create the release as draft and clear the draft flag when all jobs are
done. This ensure that the release will only be tagged and made public
when it is fully usable.
If some job fails because of network timeout or any other transient
error, the correct action is to restart the failed jobs until they
eventually all succeed. This is by far the quicker path to complete
the release process.
If the workflow is *canceled* for some reason, the draft release is left
behind. A new run of the workflow will create a brand new draft release
with the same name (not an issue with GitHub). The draft release from
the previous run should be manually deleted. This step won't be automated
as it looks safer to leave the decision to a human.
[1] https://github.com/kata-containers/kata-containers/releasesFixes#9064 - part VI
Signed-off-by: Greg Kurz <groug@kaod.org>
The remaining code in network.rs was mostly moved to utils.rs which seems
better home for these utility functions anyway (and a closely related
function open_named_tuntap() has already lived there).
ToString implementation for Address was removed after some consideration.
Address should probably ideally implement Display (as per RFC 565) which
would also supply a ToString implementation, however it implements Debug
instead, probably to enable automatic implementation of Debug for anything
that Address is a member of, if for no other reason. Rather than having
two identical functions this commit simply switches to using the Debug
implementation for printing Address on qemu command line.
Fixes#9352
Signed-off-by: Pavel Mores <pmores@redhat.com>
Make file descriptors to be passed to qemu owned by QemuCmdLine. See
commit 52958f17cd for more explanation.
Signed-off-by: Pavel Mores <pmores@redhat.com>
add_network_device() doesn't need to be passed NetworkInfo since it
already has access to the full HypervisorConfig.
Also, one of the goals of QemuCmdLine interface's design is to avoid
coupling between QemuCmdLine and the hypervisor crate's device module,
if at all possible. That's why add_network_device() shouldn't take
device module's NetworkConfig but just parts that are useful in
add_network_device()'s implementation.
Signed-off-by: Pavel Mores <pmores@redhat.com>
is_running_in_vm() is enough to figure out whether to disable_modern but
it's clumsy and verbose to use. should_disable_modern() streamlines the
usage by encapsulating the verbosity.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit replaces the existing NetDevice-based implementation with one
using Netdev and DeviceVirtioNet.
Signed-off-by: Pavel Mores <pmores@redhat.com>
In keeping with architecture of QemuCmdLine implementation we split the
functionality into two objects: Netdev to represent and generate the
-netdev part and DeviceVirtioNet for the -device virtio-net-<transport>
part.
This change is a pure refactor, existing functionality does not change.
However, we do remove some stub generalizations and govmm-isms, notably:
- we remove the NetDev enum since the only network interface types that
kata seems to use with qemu are tuntap and macvtap, both of which are
implemented by the same -netdev tap
- enum DeviceDriver is also left out since it doesn't seem reasonable to
try to represent VFIO NICs (which are completely different from
virtio-net ones) with the same struct as virtio-net
- we also remove VirtioTransport because there's no use for it so far, but
with the expectation that it will be added soon.
We also make struct Netdev the owner of any vhost-net and queue file
descriptors so that their lifetime is tied ultimately to the lifetime of
QemuCmdLine automatically, instead of returning the fds to the caller and
forcing it to achieve the equivalent functionality but manually.
Signed-off-by: Pavel Mores <pmores@redhat.com>
generate_netdev_fds() takes NetworkConfig from which it however only needs
a host-side network device name. This commit makes it take the device name
directly, making the function useful to callers who don't have the whole
NetworkConfig but do have the requisite device name.
Signed-off-by: Pavel Mores <pmores@redhat.com>
The idea of this function is to make sure O_CLOEXEC is not set on file
descriptors that should be inherited by a child (=hypervisor) process.
The approach so far is however rather heavy-handed - clearing *all* flags
is unjustifiably aggresive for a low-level function with no knowledge of
context whatsoever.
This commit refactors the function so that it only does what's expected
and renames it accordingly. It also clarifies some of its call sites.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Since v2.2.6 it can detect TDX guests on Azure, so let's bump it even if
Azure peer-pods are not currently used as part of our CI.
Fixes: #9348
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Kata CI has full debug output enabled for the cbl-mariner k8s tests,
and the test AKS node is relatively slow. So debug prints from policy
are expensive during CI.
Fixes: #9296
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The lib.sh script uses the right directory but the wrong path for the
script that installs yq; fix it.
Fixes#9165
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Remove links to tests repo and update with corresponding location in the
current repo.
Fixes#9165
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Change scripts and source that uses files in the tests repo to use the
corresponding file in the current repo.
Fixes#9165
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Now that the version is an invariant for the entire workflow, it
isn't required to obtain it with an environment variable. Just
rely on the content of the `VERSION` file like other actions.
Fixes#9064 - part VI
Signed-off-by: Greg Kurz <groug@kaod.org>
We need some docs about how to build a guest kernel to support
both Upcall and Nvidia GPU Passthrough(hotplug) at the same time.
This patch is to do such thing to help users to build a guest
kernel with support both Upcall and Nvidia GPU hotplug/unlplug.
Fixes: #9140
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Add a document for kata guest image management design.
Related feature: #8484Fixes: #9225 -- part I
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
CONFIG_CRYPTO_ECDSA is not supported in older kernels such as 5.10.x
which may cause building broken problem if we build such kernel with
NVIDIA GPU in version 5.10.x
So this patch is to add CONFIG_CRYPTO_ECDSA into whitelist.conf to
avoid break building guest kernel with NVIDIA GPU.
Fixes: #9140
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Currently, `.lock().await.clone()` results in `Option<ImageService>` being duplicated in memory with each call to `singleton()`.
Consequently, if kata-agent receives numerous image pulling requests simultaneously,
it will lead to the allocation of multiple `Option<ImageService>` instances in memory, thereby consuming additional memory resources.
In image.rs, we introduce two public functions:
`merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate
the operations on `IMAGE_SERVICE`, ensuring that its internal details remain
hidden from external modules such as `rpc.rs`.
Fixes: #9225 -- part II
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
It is observed that virtiofsd exits immediately on s390x
if there is no attached console devices.
This commit resolves the issue by migrating `appendConsole()`
from runtime and being triggered in `start_vm()`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
For s390x, it requires an additional option `memory-backend` for `-machine`.
Otherwise, virtiofsd exits with HandleRequest(InvalidParam).
This commit is to add a field `memory_backend` to `struct Machine`
and turn it on for s390x.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Like nvdimm for x86_64, a block device for s390x should be
treated differently with `virtio-blk-ccw`.
This is to generate a QEMU command line parameter for a block
device by using `-blockdev` and `-device` if the `vm_rootfs_driver`
is set to `virtio-blk-ccw`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This version brings in a fix for cleaning up k3s/rke2 environments,
which directly impacts the TDX machine that's part of our CI.
Fixes: #9318
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR fixes the unbound variables error when trying to run
the setup script locally in order to avoid errors.
Fixes#9328
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently, `*-pci` is used as an argument for the device config.
It is not true for a case where a different type of bus is used.
s390x uses `ccw`.
This commit is to make it flexible to generate the device argument
based on the bus type. A structure `DeviceVhostUserFsPci` and
`VhostVsockPci` is renamed to `DeviceVhostUserFs` and `VhostVsock`
because the structure name is not bound to a certain bus type any more.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
It has been observed that the runtime stops running around
`sysinfo::total_memory()` while adjusting a config on s390x.
This is to update the crate to the latest version which happened
to resolve the issue. (No explicit release note for this)
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
s390x supports a different machine type `s390-ccw-virtio` and it is
not required to configure cpu features by default for the platform.
A hypervisor `dragonball` is not supported on s390x so that `DBCMD`
is not necessary. `vm-rootfs_driver` should be set to `virtio-blk-ccw`.
This commit is to set the architecture-specific flags for Makefile.
Fixes: #9158
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The guest_hook_path item in configuration.toml allows OCI hook scripts
to be executed within Kata's guest environment. Traditionally, these
guest hook programs are pre-built and included in Kata's guest rootfs
image at a fixed location.
While setting guest_hook_path = "/usr/share/oci/hooks" in configuration.toml
works, it lacks flexibility. Not all guest hooks reside in the path
/usr/share/oci/hooks, and users might have custom locations.
To address this, a more flexible and configurable approach is to be proposed
that allows users to specify their desired path. This could include using a
sandbox bind mount path for hooks specific to that particular container.
However, The current implementation of guest hooks and bind mounts in kata-agent
has a reversed order of execution compared to the desired behavior.
To achieve the intended functionality, we simply need to swap the order of their
implementation.
Fixes: #9274
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Fixes: #9267
The doc states we have support for all lifecycle hooks. There are still some missing.
This is the first issue regarding the CreateContainer hook which is run before pivot_root but after prestart and createruntime
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-13 09:24:25 +00:00
1791 changed files with 170300 additions and 150342 deletions
| [`runk`](src/tools/runk) | utility | Standard OCI container runtime based on the agent. |
| [`ci`](.github/workflows) | CI | Continuous Integration configuration files and scripts. |
| [`ocp-ci`](ci/openshift-ci/README.md) | CI | Continuous Integration configuration for the OpenShift pipelines. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
| [`Webhook`](tools/testing/kata-webhook/README.md) | utility | Example of a simple admission controller webhook to annotate pods with the Kata runtime class |
> While this project's CI has several areas for improvement, it is constantly
> evolving. This document attempts to describe its current state, but due to
> ongoing changes, you may notice some outdated information here. Feel free to
> modify/improve this document as you use the CI and notice anything odd. The
> community appreciates it!
## Introduction
The Kata Containers CI relies on [GitHub Actions][gh-actions], where the actions
themselves can be found in the `.github/workflows` directory, and they may call
helper scripts, which are located under the `tests` directory, to actually
perform the tasks required for each test case.
## The different workflows
There are a few different sets of workflows that are running as part of our CI,
and here we're going to cover the ones that are less likely to get rotten. With
this said, it's fair to advise that if the reader finds something that got
rotten, opening an issue to the project pointing to the problem is a nice way to
help, and providing a fix for the issue is a very encouraging way to help.
### Jobs that run automatically when a PR is raised
These are a bunch of tests that will automatically run as soon as a PR is
opened, they're mostly running on "cost free" runners, and they do some
pre-checks to evaluate that your PR may be okay to start getting reviewed.
Mind, though, that the community expects the contributors to, at least, build
their code before submitting a PR, which the community sees as a very fair
request.
Without getting into the weeds with details on this, those jobs are the ones
responsible for ensuring that:
- The commit message is in the expected format
- There's no missing Developer's Certificate of Origin
- Static checks are passing
### Jobs that require a maintainer's approval to run
These are the required tests, and our so-called "CI". These require a
maintainer's approval to run as parts of those jobs will be running on "paid
runners", which are currently using Azure infrastructure.
Once a maintainer of the project gives "the green light" (currently by adding an
`ok-to-test` label to the PR, soon to be changed to commenting "/test" as part
of a PR review), the following tests will be executed:
- Build all the components (runs on free cost runners, or bare-metal depending on the architecture)
- Create a tarball with all the components (runs on free cost runners, or bare-metal depending on the architecture)
- Create a kata-deploy payload with the tarball generated in the previous step (runs on free costs runner, or bare-metal depending on the architecture)
- Run the following tests:
- Tests depending on the generated tarball
- Metrics (runs on bare-metal)
-`docker` (runs on Azure small instances)
-`nerdctl` (runs on Azure small instances)
-`kata-monitor` (runs on Azure small instances)
-`cri-containerd` (runs on Azure small instances)
-`nydus` (runs on Azure small instances)
-`vfio` (runs on Azure normal instances)
- Tests depending on the generated kata-deploy payload
- kata-deploy (runs on Azure small instances)
- Tests are performed using different "Kubernetes flavors", such as k0s, k3s, rke2, and Azure Kubernetes Service (AKS).
- Kubernetes (runs in Azure small and medium instances depending on what's required by each test, and on TEE bare-metal machines)
- Tests are performed with different runtime engines, such as CRI-O and containerd.
- Tests are performed with different snapshotters for containerd, namely OverlayFS and devmapper.
- Tests are performed with all the supported hypervisors, which are Cloud Hypervisor, Dragonball, Firecracker, and QEMU.
For all the tests relying on Azure instances, real money is being spent, so the
community asks for the maintainers to be mindful about those, and avoid abusing
them to merely debug issues.
## The different runners
In the previous section we've mentioned using different runners, now in this section we'll go through each type of runner used.
- Cost free runners: Those are the runners provided by GIthub itself, and
those are fairly small machines with no virtualization capabilities enabled -
- Azure small instances: Those are runners which have virtualization
capabilities enabled, 2 CPUs, and 8GB of RAM. These runners have a "-smaller"
suffix to their name.
- Azure normal instances: Those are runners which have virtualization
capabilities enabled, 4 CPUs, and 16GB of RAM. These runners are usually
`garm` ones with no "-smaller" suffix.
- Bare-metal runners: Those are runners provided by community contributors,
and they may vary in architecture, size and virtualization capabilities.
Builder runners don't actually require any virtualization capabilities, while
runners which will be actually performing the tests must have virtualization
capabilities and a reasonable amount for CPU and RAM available (at least
matching the Azure normal instances).
## Adding new tests
Before someone decides to add a new test, we strongly recommend them to go
through [GitHub Actions Documentation][gh-actions],
which will provide you a very sensible background on how to read and understand
current tests we have, and also become familiar with how to write a new test.
On the Kata Containers land, there are basically two sets of tests: "standalone"
and "part of something bigger".
The "standalone" tests, for example the commit message check, won't be covered
here as they're better covered by the GitHub Actions documentation pasted above.
The "part of something bigger" is the more complicated one and not so
straightforward to add, so we'll be focusing our efforts on describing the
addition of those.
> [!NOTE]
> TODO: Currently, this document refers to "tests" when it actually means the
> jobs (or workflows) of GitHub. In an ideal world, except in some specific cases,
> new tests should be added without the need to add new workflows. In the
> not-too-distant future (hopefully), we will improve the workflows to support
> this.
### Adding a new test that's "part of something bigger"
The first important thing here is to align expectations, and we must say that
the community strongly prefers receiving tests that already come with:
- Instructions how to run them
- A proven run where it's passing
There are several ways to achieve those two requirements, and an example of that
can be seen in PR #8115.
With the expectations aligned, adding a test consists in:
- Adding a new yaml file for your test, and ensure it's called from the
"bigger" yaml. See the [Kata Monitor test example][monitor-ex01].
- Adding the helper scripts needed for your test to run. Again, use the [Kata Monitor script as example][monitor-ex02].
Following those examples, the community advice during the review, and even
asking the community directly on Slack are the best ways to get your test
accepted.
## Running tests
### Running the tests as part of the CI
If you're a maintainer of the project, you'll be able to kick in the tests by
yourself. With the current approach, you just need to add the `ok-to-test`
label and the tests will automatically start. We're moving, though, to use a
`/test` command as part of a GitHub review comment, which will simplify this
process.
If you're not a maintainer, please, send a message on Slack or wait till one of
the maintainers reviews your PR. Maintainers will then kick in the tests on
your behalf.
In case a test fails and there's the suspicion it happens due to flakiness in
the test itself, please, create an issue for us, and then re-run (or asks
maintainers to re-run) the tests following these steps:
- Locate which tests is failing
- Click in "details"
- In the top right corner, click in "Re-run jobs"
- And then in "Re-run failed jobs"
- And finally click in the green "Re-run jobs" button
> [!NOTE]
> TODO: We need figures here
### Running the tests locally
In this section, aligning expectations is also something very important, as one
will not be able to run the tests exactly in the same way the tests are running
in the CI, as one most likely won't have access to an Azure subscription.
However, we're trying our best here to provide you with instructions on how to
run the tests in an environment that's "close enough" and will help you to debug
issues you find with the current tests, or even provide a proof-of-concept to
the new test you're trying to add.
The basic steps, which we will cover in details down below are:
1. Create a VM matching the configuration of the target runner
2. Generate the artifacts you'll need for the test, or download them from a
current failed run
3. Follow the steps provided in the action itself to run the tests.
Although the general overview looks easy, we know that some tricks need to be
shared, and we'll go through the general process of debugging one non-Kubernetes
and one Kubernetes specific test for educational purposes.
One important thing to note is that "Create a VM" can be done in innumerable
different ways, using the tools of your choice. For the sake of simplicity on
this guide, we'll be using `kcli`, which we strongly recommend in case you're a
non-experienced user, and happen to be developing on a Linux box.
For both non-Kubernetes and Kubernetes cases, we'll be using PR #8070 as an
example, which at the time this document is being written serves us very well
the purpose, as you can see that we have `nerdctl` and Kubernetes tests failing.
## Debugging tests
### Debugging a non Kubernetes test
As shown above, the `nerdctl` test is failing.
As a developer you can go ahead to the details of the job, and expand the job
that's failing in order to gather more information.
But when that doesn't help, we need to set up our own environment to debug
what's going on.
Taking a look at the `nerdctl` test, which is located here, you can easily see
that it runs-on a `garm-ubuntu-2304-smaller` virtual machine.
The important parts to understand are `ubuntu-2304`, which is the OS where the
test is running on; and "smaller", which means we're running it on a machine
with 2 CPUs and 8GB of RAM.
With this information, we can go ahead and create a similar VM locally using `kcli`.
pipelines to monitor selected functional tests on OpenShift.
There are 2 pipelines, history and logs can be accessed here:
* [main - currently supported OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-e2e-tests)
* [next - currently under development OCP](https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/periodic-ci-kata-containers-kata-containers-main-next-e2e-tests)
Running openshift-tests on OCP with kata-containers manually
Following steps require the [openshift tests](https://github.com/openshift/origin)
being cloned and built in the current directory:
```bash
#!/bin/bash -e
# Define tests to be skipped (see the pipeline config for the current version)
TEST_SKIPS="\[sig-node\] Security Context should support seccomp runtime/default\|\[sig-node\] Variable Expansion should allow substituting values in a volume subpath\|\[k8s.io\] Probing container should be restarted with a docker exec liveness probe with timeout\|\[sig-node\] Pods Extended Pod Container lifecycle evicted pods should be terminal\|\[sig-node\] PodOSRejection \[NodeConformance\] Kubelet should reject pod when the node OS doesn't match pod's OS\|\[sig-network\].*for evicted pods\|\[sig-network\].*HAProxy router should override the route\|\[sig-network\].*HAProxy router should serve a route\|\[sig-network\].*HAProxy router should serve the correct\|\[sig-network\].*HAProxy router should run\|\[sig-network\].*when FIPS.*the HAProxy router\|\[sig-network\].*bond\|\[sig-network\].*all sysctl on whitelist\|\[sig-network\].*sysctls should not affect\|\[sig-network\] pods should successfully create sandboxes by adding pod to network"
# Get the list of tests to be executed
TESTS="$(./openshift-tests run --dry-run --provider "${TEST_PROVIDER}" "${TEST_SUITE}")"
E2E_TEST="${E2E_TEST:-'"[sig-node] Container Runtime blackbox test on terminated container should report termination message as empty when pod succeeds and TerminationMessagePolicy FallbackToLogsOnError is set [NodeConformance] [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"'}"
KATA_CI_DIR="${KATA_CI_DIR:-$(pwd)}"
exportKATA_RUNTIME="${KATA_RUNTIME:-kata-qemu}"
## SETUP
# Deploy kata
SETUP=0
pushd"$KATA_CI_DIR"||{echo"Failed to cd to '$KATA_CI_DIR'";exit 255;}
Note, the stable-2.2 branch will still exist with tag 2.2.1, but under current plans it is
not maintained further. The next tag applied to main will be 2.4.0-alpha0. We would then
create a couple of alpha releases gathering features targeted for that particular release (in
this case 2.4.0), followed by a release candidate. The release candidate marks a feature freeze.
A new stable branch is created for the release candidate. Only bug fixes and any security issues
are added to the branch going forward until release 2.4.0 is made.
## Backporting Process
Development that occurs against the main branch and applicable code commits should also be submitted
against the stable branches. Some guidelines for this process follow::
1. Only bug and security fixes which do not introduce inter-component dependencies are
candidates for stable branches. These PRs should be marked with "bug" in GitHub.
2. Once a PR is created against main which meets requirement of (1), a comparable one
should also be submitted against the stable branches. It is the responsibility of the submitter
to apply their pull request against stable, and it is the responsibility of the
reviewers to help identify stable-candidate pull requests.
## Continuous Integration Testing
The test repository is forked to create stable branches from main. Full CI
runs on each stable and main PR using its respective tests repository branch.
### An alternative method for CI testing:
Ideally, the continuous integration infrastructure will run the same test suite on both main
and the stable branches. When tests are modified or new feature tests are introduced, explicit
logic should exist within the testing CI to make sure only applicable tests are executed against
stable and main. While this is not in place currently, it should be considered in the long term.
## Release Management
### Patch releases
Releases are made every four weeks, which include a GitHub release as
well as binary packages. These patch releases are made for both stable branches, and a "release candidate"
for the next `MAJOR` or `MINOR` is created from main. If there are no changes across all the repositories, no
release is created and an announcement is made on the developer mailing list to highlight this.
If a release is being made, each repository is tagged for this release, regardless
of whether changes are introduced. The release schedule can be seen on the
[release rotation wiki page](https://github.com/kata-containers/community/wiki/Release-Team-Rota).
If there is urgent need for a fix, a patch release will be made outside of the planned schedule.
The process followed for making a release can be found at [Release Process](Release-Process.md).
## Minor releases
### Frequency
Minor releases are less frequent in order to provide a more stable baseline for users. They are currently
running on a sixteen weeks cadence. The release schedule can be seen on the
[release rotation wiki page](https://github.com/kata-containers/community/wiki/Release-Team-Rota).
### Compatibility
Kata guarantees compatibility between components that are within one minor release of each other.
This is critical for dependencies which cross between host (shimv2 runtime) and
the guest (hypervisor, rootfs and agent). For example, consider a cluster with a long-running
deployment, workload-never-dies, all on Kata version 2.1.3 components. If the operator updates
the Kata components to the next new minor release (i.e. 2.2.0), we need to guarantee that the 2.2.0
shimv2 runtime still communicates with 2.1.3 agent within workload-never-dies.
Handling live-update is out of the scope of this document. See this [`kata-runtime` issue](https://github.com/kata-containers/runtime/issues/492) for details.
To safeguard the integrity of container images and prevent tampering from the host side, we propose guest image management. This method, employed for Confidential Containers, ensures container images remain unaltered and secure.
## Introduction to remote snapshot
Containerd 1.7 introduced `remote snapshotter` feature which is the foundation for pulling images in the guest for Confidential Containers.
While it's beyond the scope of this document to fully explain how the container rootfs is created to the point it can be executed, a fundamental grasp of the snapshot concept is essential. Putting it in a simple way, containerd fetches the image layers from an OCI registry into its local content storage. However, they cannot be mounted as is (e.g. the layer can be tar+gzip compressed) as well as they should be immutable so the content can be shared among containers. Thus containerd leverages snapshots of those layers to build the container's rootfs.
The role of `remote snapshotter` is to reuse snapshots that are stored in a remotely shared place, thus enabling containerd to prepare the container’s rootfs in a manner similar to that of a local `snapshotter`. The key behavior that makes this the building block of Kata's guest image management for Confidential Containers is that containerd will not pull the image layers from registry, instead it assumes that `remote snapshotter` and/or an external entity will perform that operation on his behalf.
Maybe the simplest example of `remote snapshotter` in Confidential Containers is the pulling of images in the guest VM. Once ensuring the VM is part of a Trusted Computing Base (TCB) and throughout a chain of delegations involving containerd, `remote snapshotter` and kata-runtime, it is possible for the kata-agent to pull the image directly.
## `Remote snapshotter` implementations
`Remote snapshotter` is containerd plug-ins that should implement the [Snapshot Interface](https://pkg.go.dev/github.com/containerd/containerd/v2/snapshots?utm_source=godoc#Snapshotter).
The following `remote snapshotter` is leveraged by Kata Containers:
-`nydus snapshotter`
### `Nydus snapshotter`
This `snapshotter` is implemented as an external containerd proxy plug-in for [`Nydus`](https://nydus.dev/).
Currently it supports a couple of runtime backend, notably, `FUSE`, `virtiofs`, and `EROFS`, being the former leveraged on the tested Kata Containers CI.
## `Nydus snapshotter` and containerd-shim-v2 integration
```go
// KataVirtualVolume encapsulates information for extra mount options and direct volumes.
The following sequence diagram depicted below offers a detailed overview of the messages/calls exchanged to pull an unencrypted unsigned image from an unauthenticated container registry. This involves the kata-runtime, kata-agent, and the guest-components’ image-rs to use the guest pull mechanism.
First and foremost, the guest pull code path is only activated when `nydus snapshotter` requires the handling of a volume which type is `image_guest_pull`, as can be seen on the message below:
In other words, `VolumeType` of `KataVirtualVolumeType` is set to `image_guest_pull`.
Next the `handleImageGuestPullBlockVolume()` is called to build the Storage object that will be attached to the message later sent to kata-agent via the `CreateContainerRequest()` RPC. It is in the `handleImageGuestPullBlockVolume()` that it will begin the handling of the pause image if the request is for a sandbox container type (see more about pause image below).
Below is an example of storage information packaged in the message sent to the kata-agent:
Next, the kata-agent's RPC module will handle the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of `StorageHandler` interface for various storage types, being the `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the "driver").
`ImagePullHandler` delegates the image pulling operation to the `ImageService.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, class the image-rs to in fact fetch and uncompress the image's bundle.
> **Notes:**
> In this flow, `ImageService.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `image-rs.pull_image()` because the pause image is expected to already be inside the guest's filesystem, so instead `ImageService.unpack_pause_image()` is called.
References:
[1] [[RFC] Image management proposal for hosting sharing and peer pods](https://github.com/confidential-containers/confidential-containers/issues/137)
Kata Containers 3.3.0 introduces the guest image management feature, which enables the guest VM to directly pull images using `nydus snapshotter`. This feature is designed to protect the integrity of container images and guard against any tampering by the host, which is used for confidential containers. Please refer to [kata-guest-image-management-design](../design/kata-guest-image-management-design.md) for details.
## Prerequisites
- The k8s cluster with Kata 3.3.0+ is ready to use.
-`yq` is installed in the host and it's directory is included in the `PATH` environment variable. (optional, for DaemonSet only)
## Deploy `nydus snapshotter` for guest image management
To pull images in the guest, we need to do the following steps:
1. Delete images used for pulling in the guest (optional, for containerd only)
2. Install `nydus snapshotter`:
1. Install `nydus snapshotter` by k8s DaemonSet (recommended)
2. Install `nydus snapshotter` manually
### Delete images used for pulling in the guest
Though the `CRI Runtime Specific Snapshotter` is still an [experimental feature](https://github.com/containerd/containerd/blob/main/RELEASES.md#experimental-features) in containerd, which containerd is not supported to manage the same image in different `snapshotters`(The default `snapshotter` in containerd is `overlayfs`). To avoid errors caused by this, it is recommended to delete images (including the pause image) in containerd that needs to be pulled in guest later before configuring `nydus snapshotter` in containerd.
### Install `nydus snapshotter`
#### Install `nydus snapshotter` by k8s DaemonSet (recommended)
To use DaemonSet to install `nydus snapshotter`, we need to ensure that `yq` exists in the host.
Configure `nydus snapshotter` to enable `CRI Runtime Specific Snapshotter` in containerd. This ensures run kata containers with `nydus snapshotter`. Below, the steps are illustrated using `kata-qemu` as an example.
```toml
# Modify containerd configuration to ensure that the following lines appear in the containerd configuration
# (Assume that the containerd config is located in /etc/containerd/config.toml)
> The `CRI Runtime Specific Snapshotter` feature only works for containerd v1.7.0 and above. So for Containerd v1.7.0 below, in addition to the above settings, we need to set the global `snapshotter` to `nydus` in containerd config. For example:
```toml
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter="nydus"
```
5. Restart containerd service
```bash
$ sudo systemctl restart containerd
```
## Verification
To verify pulling images in a guest VM, please refer to the following commands:
1. Run a kata container
```bash
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: busybox
annotations:
io.containerd.cri.runtime-handler: kata-qemu
spec:
runtimeClassName: kata-qemu
containers:
- name: busybox
image: quay.io/prometheus/busybox:latest
imagePullPolicy: Always
EOF
pod/busybox created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
busybox 1/1 Running 0 10s
```
> **Notes:**
> The `CRI Runtime Specific Snapshotter` is still an experimental feature. To pull images in the guest under the specific kata runtime (such as `kata-qemu`), we need to add the following annotation in metadata to each pod yaml: `io.containerd.cri.runtime-handler: kata-qemu`. By adding the annotation, we can ensure that the feature works as expected.
2. Verify that the pod's images have been successfully downloaded in the guest.
If images intended for deployment are deleted prior to deploying with `nydus snapshotter`, the root filesystems required for the pod's images (including the pause image and the container image) should not be present on the host.
@@ -27,6 +27,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.runtime.internetworking_model` | string| determines how the VM should be connected to the container network interface. Valid values are `macvtap`, `tcfilter` and `none` |
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
| `io.katacontainers.config.runtime.enable_pprof` | `boolean` | enables Golang `pprof` for `containerd-shim-kata-v2` process |
| `io.katacontainers.config.runtime.create_container_timeout` | `uint64` | the timeout for create a container in `seconds`, default is `60` |
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document discusses threat models associated with the Kata Containers
project. Kata was designed to provide additional isolation of container
workloads, protecting the host infrastructure from potentially malicious
container users or workloads. Since Kata Containers adds a level of isolation on
top of traditional containers, the focus is on the additional layer provided,
not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
This document provides a brief background on containers and layered security,
describes the interface to Kata from CRI runtimes, a review of utilized virtual
machine interfaces, and then a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload to gain
control of, obtain information from, or tamper with the host infrastructure.
Kata seeks to prevent an untrusted container workload or user of that container
workload to gain control of, obtain information from, or tamper with the host
infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
In our scenario, an asset is anything on the host system, or elsewhere in the
cluster infrastructure. The attacker is assumed to be either a malicious user or
the workload itself running within the container. The goal of Kata is to prevent
attacks which would allow any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capablities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by runc.
Traditional containers leverage several key Linux kernel features to provide
isolation and a view that the container workload is the only entity running on
the host. Key features include `Namespaces`, `cgroups`, `capablities`, `SELinux`
and `seccomp`. The canonical runtime for creating such a container is `runc`. In
the remainder of the document, the term `traditional-container` will be used to
describe a container workload created by runc.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guest’s Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
Kata Containers provides a second layer of isolation on top of those provided by
traditional-containers. The hardware virtualization interface is the basis of
this additional layer. Kata launches a lightweight virtual machine, and uses the
guest’s Linux kernel to create a container workload, or workloads in the case of
multi-container pods. In Kubernetes and in the Kata implementation, the sandbox
is carried out at the pod level. In Kata, this sandbox is created using a
virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
On every node, Kubelet will interact with a CRI implementor, which will in turn
interface with an OCI based runtime, such as Kata Containers. Typical CRI
implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
The CRI API, as defined at the Kubernetes [CRI-API
repo](https://github.com/kubernetes/cri-api/), results in a few constructs being
supported by the CRI implementation, and ultimately in the OCI runtime creating
the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
In order to run a container inside of the Kata sandbox, several virtual machine
devices and interfaces are required. Kata translates sandbox and container
definitions to underlying virtualization technologies provided by a set of
virtual machine monitors (VMMs) and hypervisors. These devices and their
underlying implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
-Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
- Storage: In the current design of Kata Containers, we are reliant on the CRI
implementor to assist in image handling and volume management on the host. As a
result, we need to support a way of passing to the sandbox the container
rootfs, volumes requested by the workload, and any other volumes created to
facilitate sharing of secrets and `configmaps` with the containers. Depending
on how these are managed, a block based device or file-system sharing is
required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
-Networking: A method for enabling network connectivity with the workload is
required. Typically this will be done providing a `TAP` device to the VMM, and
this will be exposed to the guest as a `virtio-net` device. It is feasible to
pass in a NIC device directly, in which case `VFIO` is leveraged and the device
itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from
containers, a medium of communication is required. This is available via
`virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual
machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM
resource management (for example: CPU, memory, device hotplug). This is
required when containers are resized, or more generally when containers are
added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
How these devices are utilized varies depending on the VMM utilized. We clarify
the default settings provided when integrating Kata with the QEMU, Dragonball,
Firecracker and Cloud Hypervisor VMMs in the following sections.
### Virtual Machine Monitor(s)
In a KVM/QEMU (any other VMM utilizing KVM) virtualization setup, all virtual
machines (VMs) share the same host kernel. This shared environment can lead to
scenarios where one VM could potentially impact the performance or stability of
other VMs, including the possibility of a Denial of Service attack.
- Kernel Vulnerabilities: Since all VMs rely on the host's kernel, a
vulnerability in the kernel could be exploited by a process running within one
VM to affect the entire system. This could lead to scenarios where the
compromised VM impacts other VMs or even takes down the host.
- Improper Isolation and Containment: If the virtualization environment is not
correctly configured, processes in one VM might impact other VMs. This could
occur through improper isolation of network traffic, shared file systems, or
other inter-VM communication channels.
- Hypervisor Vulnerabilities: Flaws in the KVM hypervisor or QEMU could be
exploited to cause information disclosure, data tampering, elevation of
privileges, denial of service, and others. Since KVM/QEMU leverages the host
kernel for its operation, any exploit at this level can have widespread impacts.
- Malicious or Flawed Guest Operating Systems: A guest operating system that is
maliciously designed or has serious flaws could engage in activities that
disrupt the normal operation of the host or other guests. This might include
aggressive network activity or interactions with the virtualization stack that
lead to instability.
- Resource Exhaustion: A VM could consume excessive shared resources such as
CPU, memory, or I/O bandwidth, leading to resource starvation for other VMs.
This could be due to misconfiguration, a runaway process, or a deliberate
denial of service attack from a compromised VM.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While it may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would be already running within the kernel space. While VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows for the back-end process to require less system calls and capabilities compared to a full VMM.
Each virtio device is implemented by a backend, which may execute within
userspace on the host (vhost-user), the VMM itself, or within the host kernel
(vhost). While it may provide enhanced performance, vhost devices are often seen
as higher risk since an exploit would be already running within the kernel
space. While VMM and vhost-user are both in userspace on the host, `vhost-user`
generally allows for the back-end process to require less system calls and
capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, it is not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they are not utilized in Kata today.
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself
(ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and
QEMU. While `vhost` based back-ends are available for QEMU, it is not
recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they
are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction
with the host filesystem is done through a vhost-user daemon, `virtiofsd`. The
`virtio-fs` client, running in the guest, will generate requests to access
files. `virtiofsd` will receive requests, open the file, and request the VMM to
`mmap` it into the guest. When DAX is utilized, the guest will access the host's
page cache, avoiding the need for copy and duplication. DAX is still an
experimental feature, and is not enabled by default.
From the `virtiofsd` [documentation](https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU
VMM supports virtio-fs as of v4.2. Cloud Hypervisor supports `virtio-fs`.
#### `virtio-net`
@@ -97,9 +159,9 @@ supports `virtio-fs`.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the
`virtio-net` backend for Kata defaults to `vhost-net` for performance reasons.
The default configuration is being reevaluated.
##### Firecracker networking
@@ -107,8 +169,14 @@ For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in rust, Cloud Hypervisor specific).
For Cloud Hypervisor, the current backend default is within the VMM.
`vhost-user-net` support is being added (written in rust, Cloud Hypervisor
specific).
##### Dragonball networking
For Dragonball, the `virtio-net` backend default is within Dragonbasll's VMM.
#### virtio-vsock
@@ -116,22 +184,88 @@ is being added (written in rust, Cloud Hypervisor specific).
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
##### Dragonball, Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the hosts userspace.
In Dragonball, Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in
the hosts userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
Utilizing VFIO, devices can be passed through to the virtual machine. Exposure
to the host is limited to gaps in device pass-through handling. This is
supported in QEMU and Cloud Hypervisor, but not Firecracker.
- Device Isolation Failure: One of the primary risks associated with VFIO is the
failure to isolate the physical device. If a VM can affect the operation of the
physical device in a way that impacts other VMs or the host system, it could
lead to security breaches or system instability.
- DMA Attacks: Direct Memory Access (DMA) attacks are a significant concern with
VFIO. Since the device has direct access to the system's memory, there's a risk
that a compromised VM could use its assigned device to read or write memory
outside of its allocated space, potentially accessing sensitive information or
affecting the host or other VMs.
- Firmware Vulnerabilities: Devices attached via VFIO rely on their firmware,
which can have vulnerabilities. A compromised device firmware could be exploited
to gain unauthorized access or to disrupt the system. Resource Starvation:
Improperly managed, a VM with direct access to hardware resources could
monopolize those resources, leading to performance degradation or denial of
service for other VMs or the host system.
- Escalation of Privileges: If a VM with VFIO access is compromised, it could
potentially be used to gain higher privileges than intended, especially if the
I/O devices have capabilities that are not adequately controlled or monitored.
- Improper Configuration and Management: Human errors in configuring VFIO, such
as incorrect group or user permissions, can expose the system to risks.
Additionally, inadequate monitoring and management of the VMs and their devices
can lead to security lapses.
- Software Vulnerabilities: Like any software, the components of VFIO (like the
kernel modules, device drivers, and management tools) can have vulnerabilities
that might be exploited by an attacker to compromise the security of the system.
Inter-VM Interference and Side-Channel Attacks: Even with device assignment,
there could be side-channel attacks where an attacker VM infers sensitive
information from the physical device's behavior or through shared resources like
cache.
#### ACPI (Dragonball uses Upcall)
ACPI is necessary for hotplugging of CPU, memory and devices. ACPI is available
in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug are not available
in Firecracker.
- Hypervisor Vulnerabilities: In virtualized environments, the hypervisor
manages ACPI calls for virtual machines (VMs). If the hypervisor has
vulnerabilities in handling ACPI requests, it could lead to escalated privileges
or other security breaches.
- VM Escape: A sophisticated attack could exploit ACPI functionality to achieve
a VM escape, where malicious code in a VM breaks out to the host system or other
VMs. Firmware Attacks in a Virtualized Context: Similar to physical
environments, firmware-based attacks (including those targeting ACPI) in
virtualized systems can be persistent and difficult to detect. In a virtualized
environment, such attacks might not only compromise the host system but also all
the VMs running on it.
- Resource Starvation Attacks: ACPI functionality could be exploited to
manipulate power management features, causing denial of service through
resource starvation. For example, an attacker could force a VM into a low-power
state, degrading its performance or availability.
- Compromised VMs Affecting Host ACPI Settings: If a VM is compromised, it might
be used to alter ACPI settings on the host, affecting all VMs on that host. This
could lead to various impacts, from performance degradation to system
instability.
- Supply Chain Risks: As with non-virtualized environments, the firmware,
including ACPI firmware used in virtualized environments, could be compromised
during the supply chain process, leading to vulnerabilities that affect all VMs
running on the hardware.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
The kata agent has the ability to configure agent options in guest kernel command line at runtime. Currently, the following agent options can be configured:
The kata agent has the ability to configure agent options in guest kernel command line at runtime. Currently, the following agent options can be configured:
| Option | Name | Description | Type | Default |
|-|-|:-:|:-:|:-:|
@@ -126,6 +126,8 @@ The kata agent has the ability to configure agent options in guest kernel comman
| `agent.debug_console_vport` | Debug console port | Allow to specify the `vsock` port to connect the debugging console | integer | `0` |
| `agent.devmode` | Developer mode | Allow the agent process to coredump | boolean | `false` |
| `agent.hotplug_timeout` | Hotplug timeout | Allow to configure hotplug timeout(seconds) of block devices | integer | `3` |
| `agent.guest_components_rest_api` | `api-server-rest` configuration | Select the features that the API Server Rest attestation component will run with. Valid values are `all`, `attestation`, `resource` | string | `resource` |
| `agent.guest_components_procs` | guest-components processes | Attestation-related processes that should be spawned as children of the guest. Valid values are `none`, `attestation-agent`, `confidential-data-hub` (implies `attestation-agent`), `api-server-rest` (implies `attestation-agent` and `confidential-data-hub`) | string | `api-server-rest` |
| `agent.https_proxy` | HTTPS proxy | Allow to configure `https_proxy` in the guest | string | `""` |
| `agent.log` | Log level | Allow the agent log level to be changed (produces more or less output) | string | `"info"` |
| `agent.log_vport` | Log port | Allow to specify the `vsock` port to read logs | integer | `0` |
@@ -133,10 +135,10 @@ The kata agent has the ability to configure agent options in guest kernel comman
| `agent.passfd_listener_port` | File descriptor passthrough IO listener port | Allow to set the file descriptor passthrough IO listener port | integer | `0` |
| `agent.server_addr` | Server address | Allow the ttRPC server address to be specified | string | `"vsock://-1:1024"` |
For instance, you can enable the debug console and set the agent log level to debug by configuring the guest kernel command line in the configuration file:
For instance, you can enable the debug console and set the agent log level to debug by configuring the guest kernel command line in the configuration file:
constERR_INVALID_CONTAINER_PIPE_NEGATIVE: &str="container pipe size should not be negative";
constERR_INVALID_GUEST_COMPONENTS_REST_API_VALUE: &str="invalid guest components rest api feature given. Valid values are `all`, `attestation`, `resource`";
constERR_INVALID_GUEST_COMPONENTS_PROCS_VALUE: &str="invalid guest components process param given. Valid values are `attestation-agent`, `confidential-data-hub`, `api-server-rest`, or `none`";
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.