Commit Graph

4749 Commits

Author SHA1 Message Date
Bo Chen
a68aeca356
Merge pull request #9575 from likebreath/0430/clh_v39.0
versions: Upgrade to Cloud Hypervisor v39.0
2024-06-14 09:10:19 -07:00
stevenhorsman
3fb176970f dragonball: Fix device manager warning
- Fix the lint error:
```
error: you seem to use `.enumerate()` and immediately discard the index
   --> src/device_manager/mod.rs:427:33
    |
427 |         for (_index, device) in self.virtio_devices.iter().enumerate() {
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
 by removing the unnecessary enumerate

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-14 16:45:16 +01:00
stevenhorsman
1ea2671f2f dragonball: Fix lint with rust 1.75.0
The ci failed with:
```
error: use of `or_insert_with` to construct default value
   --> src/address_space_manager.rs:650:14
    |
650 |             .or_insert_with(NumaNode::new);
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `or_default()`
    |
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-14 16:45:16 +01:00
Steve Horsman
ab8a9882c1
Merge pull request #9818 from EmmEff/fix-spelling
runtime: fix minor spelling issues
2024-06-14 13:12:56 +01:00
Steve Horsman
99bf95f773
Merge pull request #9827 from littlejawa/fix_panic_on_metrics_gathering
runtime: avoid panic on metrics gathering
2024-06-14 11:12:43 +01:00
Pavel Mores
380f8ad03f runtime-rs: add base vCPU hotplugging support
We take advantage of the Inner pattern to enable QemuInner::resize_vcpu()
take `&mut self` which we need to call non-const functions on Qmp.

This runs on Intel architecture but will need to be verified and ported
(if necessary) to other architectures in the future.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Pavel Mores
8231c6c4a3 runtime-rs: instantiate Qmp as (optional) member of QemuInner
The QMP_SOCKET_FILE constant in cmdline_generator.rs is made public to make
it accessible from QemuInner.  This is fine for now however if the constant
needs to be accessed from additional places in the future we could consider
moving it to somewhere more visible.

The Debug impl for Qmp is empty since first, we don't actually want it,
it's only forced by Hypervisor trait bounds, and second, it doesn't have
anything to display anyway.  If Qmp gets any members in the future that
can be meaningfully displayed they should be handled by Qmp's Debug::fmt().

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Pavel Mores
6fdb262dca runtime-rs: add Qmp object to encapsulate QMP functionality
The constructor handles QMP connection initialisation, too, so there can
be non-functional Qmp instance.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Manuel Huber
62fd84dfd8 build: allow rootfs builds w/o git or VERSION file deps
We set the VERSION variable consistently across Makefiles to
'unknown'  if the file is empty or not present.
We also use git commands consistently for calculating the COMMIT,
COMMIT_NO variables, not erroring out when building outside of
a git repository.
In create_summary_file we also account for a missing/empty VERSION
file.
This makes e.g. the UVM build process in an environment where we
build outside of git with a minimal/reduced set of files smoother.

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
2024-06-13 22:46:52 +00:00
Mike Frisch
c2f61b0fe3 runtime: spelling fixes
Minor spelling fixes in runtime log messages.

Signed-off-by: Mike Frisch <mikef17@gmail.com>
2024-06-13 12:11:34 -04:00
Greg Kurz
b85b1c1058
Merge pull request #9790 from gkurz/kill-some-dead-runtime-code
Kill some dead runtime code
2024-06-13 15:45:51 +02:00
gaohuatao
4cb4e44234 runtime-rs: fix the bug of func count_files
When the total number of files observed is greater than limit, return -1 directly.
runtime has fixed this bug, it should b ported to runtime-rs.

Fixes:#9829

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2024-06-13 16:02:33 +08:00
Fupan Li
cd68ef372f sandbox: fix the issue of double initial_size_manager config
It shouldn't call the initial_size_manager's setup_config
in the load_config since it had been called in the sandbox's
try_init function.

Fixes: #9778

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-06-13 15:44:51 +08:00
Fupan Li
61687992f4 sandbox: fix the issue of failed to get the vmm master tid
For kata container, the container's pid is meaning less to
containerd/crio since the container's pid is belonged to VM,
and containerd/crio couldn't use it. Thus we just return any
tid of kata shim or hypervisor. But since the hypervisor had
been stopped before deleting the container, and it wouldn't
get the hypervisor's tid for some supported hypervisor, thus
we'd better to return the kata shim's pid instead of hypervisor's
tid.

Fixes: #9777

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-06-13 10:27:04 +08:00
Fabiano Fidêncio
56423cbbfe
Merge pull request #9706 from burgerdev/burgerdev/genpolicy-devices
genpolicy: add support for devices
2024-06-12 23:03:41 +02:00
Fabiano Fidêncio
3a0247ed43
Merge pull request #9819 from stevenhorsman/config-envvar-precedence
agent: config: Ensure envs take precedence
2024-06-12 11:26:02 +02:00
Julien Ropé
9c86eb1d35 runtime: avoid panic on metrics gathering
While running with a remote hypervisor, whenever kata-monitor tries to access
metrics from the shim, the shim does a "panic" and no metric can be gathered.

The function GetVirtioFsPid() is called on metrics gathering, and had a call
to "panic()". Since there is no virtiofs process for remote hypervisor, the
right implementation is to return nil. The caller expects that, and will skip
metrics gathering for virtiofs.

Fixes: #9826

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-06-12 10:02:44 +02:00
Xuewei Niu
92cc5e0adb
Merge pull request #9781 from gaohuatao-1/ght/shm 2024-06-12 12:39:28 +08:00
Moritz Sanft
84903c898c
genpolicy: fix settings path flag name
This corrects the warning to point to the \`-j\` flag,
which is the correct flag for the JSON settings file.
Previously, the warning was confusing, as it pointed to
the \`-p\` flag, which specifies to the path for the Rego ruleset.

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2024-06-11 21:17:18 +02:00
Greg Kurz
1acf8d0c35 govmm: Drop QEMU's NoShutdown knob
Code is not used.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Greg Kurz
cb5b548ad7 govmm: Drop QEMU's Daemonize knob
Code isn't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Greg Kurz
33eaf69d5f virtcontainers: Drop QEMU's Daemonize knob
QEMU isn't started as daemon anymore and this won't change (see #5736
for details). Drop the related code.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Dan Mihai
d47f40210a
Merge pull request #9808 from microsoft/saulparedes/oci_from_settings
genpolicy: load OCI version from settings
2024-06-11 10:42:04 -07:00
Saul Paredes
3e9d6c11a1 genpolicy: add back support for insecure
registries

Adding back changes from
77540503f9.

Fixes: #9008

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-11 09:42:23 -07:00
Bo Chen
2398442c58 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v39.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8694, #9574

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-06-11 09:42:17 -07:00
stevenhorsman
40e02b34cb agent: config: Ensure envs take precedence
- Update the config parsing logic so that when reading from the
agent-config.toml file any envs are still processed
- Add units tests to formalise that the envs take precedence over values
from the command line and the config file

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-11 16:31:10 +01:00
Steve Horsman
59ff40f054
Merge pull request #9811 from mkulke/mkulke/use-kebabcase-for-enum-values-in-config-file-parsing
agent: convert enum vals to kebab-case in cfg file
2024-06-11 14:49:30 +01:00
gaohuatao
638e9acf89 runtime: fix the bug of func countFiles
When the total number of files observed is greater than limit, return (-1, err).
When the returned err is not nil, the func countFiles should return -1.

Fixes:#9780

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2024-06-11 18:17:18 +08:00
Alex Lyn
1c8db85d54
Merge pull request #9784 from Apokleos/bufix-testcases
kata-types: fix bug in kata-types several test cases
2024-06-11 10:01:45 +08:00
Saul Paredes
6a84562c16 genpolicy: load OCI version from settings
Load OCI version from genpolicy-settings.json and validate it in
rules.rego

Fixes: #9593

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-10 15:30:39 -07:00
Magnus Kulke
abc704a720 agent: convert enum vals to kebab-case in cfg file
fixes #9810

Add an annotation to the enum values in the agent config that will
deserialize them using a kebab-case conversion, aligning the behaviour
to parsing of params specified via kernel cmdline.

drive-by fix: add config override for guest_component_procs variable

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
2024-06-10 21:55:05 +02:00
Alex Lyn
27685c91e5 kata-types: fix bug in kata-types several test cases
(1) As mis-use of cap.set causing previous Caps lost which
causing assert! failed, just replacing cap.set with cap.add.

(2) It will return error if there's no such name setting when
do update_config_by_annotation {
    ...
if config.runtime.name.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "Runtime name is missing in the configuration",
            ));
        }
...
}

Fixes #9783

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-06-07 09:16:23 +08:00
Niteesh Dubey
62d3d7c58f runtime: enable kernel-hashes for SNP confidential container
This is required to provide the hashes of kernel, initrd and cmdline
needed during the attestation of the coco.

Fixes: #9150

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-06-05 15:02:02 +00:00
Fabiano Fidêncio
138ef2c55f
Merge pull request #9678 from AdithyaKrishnan/main
TEEs: Skip a few CI tests for SEV/SNP
2024-06-04 23:42:51 +02:00
GabyCT
6c2e8bed77
Merge pull request #9725 from 3u13r/feat/genpolicy/filter-by-runtime
genpolicy: add ability to filter for runtimeClassName
2024-06-04 10:06:14 -06:00
Wainer Moschetta
b5561074c3
Merge pull request #9377 from beraldoleal/yqbump
deps: bumping yq to v4.40.7
2024-06-03 14:34:58 -03:00
Ryan Savino
6db08ed620 runtime: sev: snp: Use shared_fs=none
Disabling 9p for SEV and SNP TEEs.

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2024-06-03 01:14:16 -05:00
Markus Rudy
13310587ed genpolicy: check requested devices
CreateContainerRequest objects can specify devices to be created inside
the guest VM. This change ensures that requested devices have a
corresponding entry in the PodSpec.

Devices that are added to the pod dynamically, for example via the
Device Plugin architecture, can be allowlisted globally by adding their
definition to the settings file.

Fixes: #9651
Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-05-31 22:05:49 +02:00
Markus Rudy
ea578f0a80 genpolicy: add support for VolumeDevices
This adds structs and fields required to parse PodSpecs with
VolumeDevices and PVCs with non-default VolumeModes.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-05-31 19:34:14 +02:00
Beraldo Leal
c99ba42d62 deps: bumping yq to v4.40.7
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.

Fixes #9354
Depends-on:github.com/kata-containers/tests#5818

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Beraldo Leal
4f6732595d ci: skip go version check
golang.mk is not ready to deal with non GOPATH installs. This is
breaking test on s390x.

Since previous steps here are installing go and yq our way, we could
skip this aditional check. A full refactor to golang.mk would be needed
to work with different paths.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Magnus Kulke
9f04dc4c8b agent: introduce config for coco attestion procs
fixes #9748

A configuration option `guest_component_procs` has been introduced that
indicates which guest component processes are supposed to be spawned by
the agent. The default behaviour remains that all of those processes are
actively spawned by the agent. At the moment this is based on presence
of binaries in the rootfs and the guest_component_api_rest option.

The new option is incremental:

none -> attestation-agent -> confidential-data-hub -> api-server-rest

e.g. api-server-rest implies attestation-agent and confidential-data-hub

the `none` option has been removed from guest_component_api_rest, since
this is addresses by the introduced option.

To not change expected behaviour for  non-coco guests we still will still
only attempt to spawn the processes if the requested attestation binaries
are present on the rootfs, and issue in warning in those cases.

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
2024-05-31 12:15:41 +02:00
Leonard Cohnen
1d1690e2a4 genpolicy: add ability to filter for runtimeClassName
Add the CLI flag --runtime-class-names, which is used during
policy generation. For resources that can define a
runtimeClassName (e.g., Pods, Deployments, ReplicaSets,...)
the value must have any of the --runtime-class-names as
prefix, otherwise the resource is ignored.

This allows to run genpolicy on larger yaml
files defining many different resources and only generating
a policy for resources which will be deployed in a
confidential context.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2024-05-31 03:17:02 +02:00
Greg Kurz
b3cb19b6a7
Merge pull request #9639 from emanuellima1/rng-impl
runtime-rs: Add RNG to QEMU cmdline
2024-05-30 12:00:11 +02:00
Emanuel Lima
138d985c64
runtime-rs: Add RNG to QEMU cmdline
It creates this line, as the Golang runtime does:
-object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-05-29 13:11:00 -03:00
Zvonko Kaiser
d4832b3b74 vfio: Fix hotpunplug
We need to remove the device from the tracking map, a container
restart will increment the bus index and we will get out of root-ports
and crash the machine.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-28 07:37:30 +00:00
Zvonko Kaiser
4c93bb2d61 qemu: Add CDI device handling for any container type
We need special handling for pod_sandbox, pod_container and
single_container how and when to inject CDI devices

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-27 10:13:01 +00:00
Zvonko Kaiser
c7b41361b2 gpu: reintroduce pcie_root_port and add pcie_switch_port
In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigates
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.

Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched contianerd.
We want to eliminate the patched continerd in the GPU case
as well.

Fixes: #8860

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-27 10:13:01 +00:00
Fupan Li
6f6a164451
Merge pull request #9268 from zvonkok/kata-agent-createcontainer
kata-agent: CreateContainer Hook
2024-05-27 16:36:22 +08:00
Alex Lyn
713c929a64
Merge pull request #9656 from pmores/document-qemu-rs-conventions
runtime-rs: document architecture & implementation conventions in qem…
2024-05-27 10:38:58 +08:00
Xuewei Niu
bb7a1c56e9
Merge pull request #9693 from sidneychang/9690/Adjust-indentation 2024-05-27 00:20:34 +08:00
Alex Lyn
55dbf6121a
Merge pull request #9604 from Apokleos/qmp-cmdline01
runtime-rs: add QMP support for Qemu(part I)
2024-05-26 20:22:59 +08:00
Alex Lyn
028b10ce7a
Merge pull request #9687 from l8huang/vfio-pci-gk
agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device
2024-05-26 17:48:25 +08:00
Steve Horsman
b89c3e35dd
Merge pull request #9583 from cncal/update_check_error_message
runtime: make kata-runtime check error more understandable when /dev/kvm doesn't exist
2024-05-24 17:49:43 +01:00
Alex Lyn
41fb7aeb89 runtime-rs: add QMP params suppport in cmdline
Fixes: #9603

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-05-24 22:16:24 +08:00
Alex Lyn
7ed6c6896b runtime-rs: add an option dbg_monitor_socket for HMP support
This option allows to add a debug monitor socket when
`enable_debug = true` to control QEMU within debugging case.

Fixes: #9603

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-05-24 22:16:17 +08:00
Lei Huang
3624573b12 agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device
The `update_env_pci()` function need the PCI address mapping to
translate the host PCI address to guest PCI address in below
environment variables:
- PCIDEVICE_<prefix>_<resource-name>_INFO
- PCIDEVICE_<prefix>_<resource-name>

So collect PCI address mapping for both vfio-pci-gk and
vfio-pci devices.

Fixes #9614

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-05-23 21:20:01 -07:00
Fupan Li
d73876252e
Merge pull request #9690 from justxuewei/agent-timeout
runtime-rs: Remove obsoleted dial_timeout config
2024-05-24 10:31:12 +08:00
Zvonko Kaiser
3affd83e14
Merge pull request #9605 from l8huang/skip-env
kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO
2024-05-23 18:45:00 +02:00
Fabiano Fidêncio
d83cf39ba1
Merge pull request #9680 from kata-containers/dependabot/go_modules/src/runtime/go_modules-5e29427af7
build(deps): bump golang.org/x/net from 0.24.0 to 0.25.0 in /src/runtime in the go_modules group across 1 directory
2024-05-23 12:55:29 +02:00
Fabiano Fidêncio
0e33ecf7fc
Merge pull request #9653 from JakubLedworowski/fixes-9497-ensure-quote-generation-service-is-added-to-qemu-cmd-2
runtime: Enable connection to Quote Generation Service (QGS)
2024-05-22 15:49:23 +02:00
sidneychang
8938f35627 runtime-rs: Adjust indentation in ifneq statements within Makefile.
Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code.

Fixes:#9692

Signed-off-by: sidneychang <2190206983@qq.com>
2024-05-22 20:24:35 +08:00
Fabiano Fidêncio
94f7bbf253
Merge pull request #9682 from fidencio/topic/allow-increasing-cpus-and-memory-via-annotation-for-tdx
runtime: tdx: Allow default_{cpu,memory} annotations
2024-05-22 12:07:28 +02:00
Xuewei Niu
d31616cec3 runtime-rs: Remove obsoleted dial_timeout config
The `dial_timeout` works fine for Runtime-go, but is obsoleted in
Runtime-rs.

When the pod cannot connect to the Agent upon starting, we need to adjust
the `reconnect_timeout_ms` to increase the number of connection attempts to
the Agent.

Fixes: #9688

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-05-22 17:57:05 +08:00
Jakub Ledworowski
fc680139e5 runtime: Enable connection to Quote Generation Service (QGS)
For the TD attestation to work the connection to QGS on the host is needed.
By default QGS runs on vsock port 4050, but can be modified by the host owner.
Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below:
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'

Fixes: #9497
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
2024-05-22 11:16:24 +02:00
Alex Lyn
0331859740
Merge pull request #9642 from gkurz/drop-unused-knobs-qemu-rs
runtime-rs: Drop some useless QEMU arguments
2024-05-22 16:13:14 +08:00
Alex Lyn
ce030d1804
Merge pull request #9641 from cmaf/runtime-resize-mem-1
runtime: Add missing check in ResizeMemory for CH
2024-05-22 14:05:30 +08:00
Alex Lyn
b7af00be2a
Merge pull request #9624 from cncal/bugfix_duplicated_devices
runtime: fix duplicated devices requested to the agent
2024-05-22 12:45:46 +08:00
Steve Horsman
f41f642b90
Merge pull request #9635 from kata-containers/dependabot/go_modules/src/runtime/go_modules-f0df977846
build(deps): bump github.com/containerd/containerd from 1.7.11 to 1.7.16 in /src/runtime in the go_modules group across 1 directory
2024-05-21 21:19:32 +01:00
Steve Horsman
9b0ed3dfa7
Merge pull request #9657 from ajaypvictor/remote-hyp-annotations
runtime: Disable number of cpu comparison on remote hypervisor scenario
2024-05-21 21:19:12 +01:00
Lei Huang
b0a91b0d13 kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO
The new version of sriov-network-device-plugin adds an env
`PCIDEVICE_<prefix>_<resource-name>_INFO`, which has a json
value; kata-agent can't parse it as env
`PCIDEVICE_<prefix>_<resource-name>` which has value in format
"DDDD:BB:SS.F".

This change updates env `PCIDEVICE_<prefix>_<resource-name>_INFO`.

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-05-21 10:46:41 -07:00
stevenhorsman
865fa9da15 runtime: Resolve go static-checks failure
Remove `rand.Seed` call to resolve the following failure:
```
rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value.
```

The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed
back this up and states:
> If Seed is not called, the generator is seeded randomly at program startup.
so I believe we can just delete the call.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-21 11:08:59 +01:00
Fabiano Fidêncio
abf52420a4 runtime: tdx: Allow default_{cpu,memory} annotations
For now, let's allow the users to set the default_cpu and default_memory
when using TDX, as they may hit issues related to the size of the
container image that must be pulled and unpacked inside the guest,

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-21 10:26:39 +02:00
stevenhorsman
75a201389d runtime: update go version in go.mod
- Make due to us bumping the golang version used in our CI
but `make vendor` fails without the go version in the runtime go.mod
being increased, so update this and run go mod tidy

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-21 09:11:46 +01:00
dependabot[bot]
735185b15c build(deps): bump github.com/containerd/containerd
Bumps the go_modules group with 1 update in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd).


Updates `github.com/containerd/containerd` from 1.7.11 to 1.7.16
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.11...v1.7.16)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: direct:production
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 09:11:46 +01:00
Ajay Victor
abe607b0c7 runtime: Disable number of cpu comparison on remote hypervisor scenario
Fixes https://github.com/kata-containers/kata-containers/issues/9238

Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
2024-05-21 13:34:21 +05:30
dependabot[bot]
01868b2849
---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-20 22:06:41 +00:00
Fabiano Fidêncio
072b929b6f
Merge pull request #9660 from malt3/fix/genpolicy/namespace_empty_string
genpolicy: detect empty string in ns as default
2024-05-20 21:34:13 +02:00
Tim Zhang
857d2bbc8e agent: Fix ctr exec stuck problem
Fixes: #9532

Close stdin when write_stdin receives data of length 0.

Stop call notify_term_close() in close_stdin, because it could
discard stdout unexpectedly.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2024-05-20 14:52:14 +08:00
Fabiano Fidêncio
f2de259387
runtime: tdx: Use shared_fs=none
We shouldn't be using 9p, at all, with TEEs, as off right now we have no
way to ensure the channels are encrypted.  The way to work this around
for now is using guest pull, either with containerd + nydus snapshotter
or with CRI-O; or even tardev snapshotter for pulling on the host (which
is the approach used by MSFT).

This is only done for TDX for now, leaving the generic, AMD, and IBM
related stuff for the folks working on those to switch and debug
possible issues on their environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:47:09 +02:00
Malte Poll
babdab9078 genpolicy: detect empty string in ns as default
In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace:

- ` ` (namespace field missing)
- `namespace: ""` (namespace field is the empty string)
- `namespace: "default"`(namespace field has the explicit value `default`)

Genpolicy currently does not handle the empty string case correctly.

Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
2024-05-18 12:44:59 +02:00
Pavel Mores
b9febc4458 runtime-rs: document architecture & implementation conventions in qemu-rs
Implementation of QemuCmdLine has a fairly uniform and repetitive structure
that's guided by a set of conventions.  These conventions have however been
mostly implicit so far, leading to a superfluous and annoying
request/force-push churn during qemu-rs PR reviews.

This commit aims to make things explicit so that contributors can take them
into account before an initial PR submission.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-05-17 12:21:44 +02:00
Chelsea Mafrica
5d2af555da runtime: Add missing check in ResizeMemory for CH
ResizeMemory for Cloud Hypervisor is missing a check for the new
requested memory being greater than the max hotplug size after
alignment. Add the check, and since an earlier check for this
setsrequested memory to the max hotplug size, do the same in the
post-alignment check.

Fixes #9640

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-05-15 11:29:18 -07:00
Greg Kurz
bd6420e0cc runtime-rs: Drop some useless QEMU arguments
All these settings are hardcoded as `false` and result in
no extra options on the QEMU command line, like the go
runtime does. There actually not needed :
- we're never going to ask QEMU to survive a guest shutdown
- we're never going to run QEMU daemonized since it prevents
  log collection
- we're never going to ask QEMU to start with the guest stopped

No need to keep this code around then.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-05-15 18:33:43 +02:00
Greg Kurz
e2117d3b71
Merge pull request #9571 from emanuellima1/fix-impl-rtc
runtime-rs: Fix constructing the RTC struct
2024-05-14 09:17:27 +02:00
cncal
232db2d906 runtime: fix duplicated devices requested to the agent
By default, when a container is created with the `--privileged` flag,
all devices in `/dev` from the host are mounted into the guest. If
there is a block device(e.g. `/dev/dm`) followed by a generic
device(e.g. `/dev/null`),two identical block devices(`/dev/dm`)
would be requested to the kata agent causing the agent to exit with error:

> Conflicting device updates for /dev/dm-2

As the generic device type does not hit any cases defined in `switch`,
the variable `kataDevice` which is defined outside of the loop is still
the value of the previous block device rather than `nil`. Defining `kataDevice`
in the loop fixes this bug.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-12 16:38:37 +08:00
Emanuel Lima
59c1567f80 runtime-rs: Fix constructing the RTC struct
RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517

RTC was being constructed inside the QemuCmdLine struct,
but it should've been built inside the devices vector.

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-05-09 15:00:47 -03:00
Fabiano Fidêncio
77f457c0e1
runtime: tdx: Drop sept-ve-disable=on
This was needed when we were using an old (and not maintained anymore)
host stack.  Considering what we have as part of the distros, Today,
this can simply be dropped, as I cannot find any reference of this one
being needed in any up-to-date documentation.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
416d00228c
Revert "qemu: tdx: Adapt command line" (partially)
This reverts commit b7cccfa019.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
1c3037fd25
Revert "govmm: tdx: Expose the private=on|off knob"
This reverts commit 582b5b6b19.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its addition, and later on its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
f48450b360
runtime: config: tdx: Add QEMU / OVMF placeholder var
Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT
variables instead of actually setting a path, so we can easily replace
those as part of our deployment scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
GabyCT
3b8a910393
Merge pull request #9596 from lifupan/main
db: fix the issue of failed to init pci root bus
2024-05-08 13:14:20 -06:00
Alex Lyn
875e6e3815
Merge pull request #9601 from cncal/fix_redundant_log
qemu: the error is logged only when it occurs
2024-05-08 08:59:01 +08:00
GabyCT
22087f9db9
Merge pull request #9598 from lifupan/main_shim
runtime-rs: fix the issue of the leak of dead shim
2024-05-07 10:14:11 -06:00
GabyCT
a564422b7b
Merge pull request #9582 from cncal/main
build: fix the confusing build message if yq doesn't exist in GOPATH/bin
2024-05-07 09:34:27 -06:00
Fabiano Fidêncio
ddf6b367c7
Merge pull request #9568 from kata-containers/dependabot/go_modules/src/runtime/go_modules-22ef55fa20
build(deps): bump the go_modules group across 5 directories with 8 updates
2024-05-07 13:14:48 +02:00
cncal
15d511af97 qemu: the error is logged only when it occurs
Everytime I create contianer on arm64 machine, containerd/kata logs a redundant warning
as follows:
``` shell
time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2
pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor
```
I added an error statement so that the error would be logged when it occurs.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-07 14:28:04 +08:00
Fupan Li
3694f3d9fe runtime-rs: fix the issue of the leak of dead shim
We should init and asign the runtime instance to runtime
handler, otherwise, if the pause container failed to start,
which means the runtime instance failed to start, then the
following delete & shutdown request wouldn't be run, thus
the dead shim would be left.

Fixes: #9597

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 17:31:31 +08:00
Fupan Li
26bee78e8d db: fix the issue of failed to init pci root bus
dragonball reserves 2048G of mmio space for the pci root bus by default
on physical addresses greater than 4G. However, for some machines with
smaller physical address widths, such as 39-bit wide physical addresses,
dragonball reserves the mmio space when initializing the memory. It is
less than 2048G, so this commit dynamically calculates and allocates the
mmio size of each pci root bus.

Fixes: #9509

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 11:34:18 +08:00
Aurélien Bombo
0cc2b07a8c tests: adapt Mariner CI to unblock CH v39 upgrade
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.

Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.

This change has been tested together with CH v39 in #9588.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-05-03 16:29:12 +00:00
cncal
48d873b52b build: fix the confusing build message if yq doesn't exist in GOPATH/bin
The build message shows that yq was not found when I tried to build
runtime binaries, but I've actually installed yq by yum install.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:34:45 +08:00
cncal
9caa7beb1f runtime: make kata-runtime check error more understandable
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:

```
ERRO[0000] cannot open kvm device: no such file or directory  arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory                          arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:29:08 +08:00
Zvonko Kaiser
e5e0983b56
Merge pull request #9476 from zvonkok/nvidia-config-tomls
config: Add NVIDIA GPU SNP, TDX configuration files
2024-05-02 10:27:10 +02:00
stevenhorsman
3c2232d898 runtime: fix testVersionString logic
- The testVersionString logic use regex to check that the ociVersion is
displayed correctly, but with the new go module that version has a
`+` in, so we need to quote this to escape special characters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-30 10:54:49 +01:00
dependabot[bot]
391bc35805 build(deps): bump the go_modules group across 5 directories with 8 updates
Bumps the go_modules group with 2 updates in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd) and [github.com/containers/podman/v4](https://github.com/containers/podman).
Bumps the go_modules group with 4 updates in the /src/tools/csi-kata-directvolume directory: [golang.org/x/sys](https://github.com/golang/sys), google.golang.org/protobuf, [golang.org/x/net](https://github.com/golang/net) and [google.golang.org/grpc](https://github.com/grpc/grpc-go).
Bumps the go_modules group with 2 updates in the /src/tools/log-parser directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3.
Bumps the go_modules group with 2 updates in the /tests directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3.
Bumps the go_modules group with 2 updates in the /tools/testing/kata-webhook directory: [golang.org/x/sys](https://github.com/golang/sys) and [golang.org/x/net](https://github.com/golang/net).


Updates `github.com/containerd/containerd` from 1.7.2 to 1.7.11
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.2...v1.7.11)

Updates `github.com/containers/podman/v4` from 4.2.0 to 4.9.4
- [Release notes](https://github.com/containers/podman/releases)
- [Changelog](https://github.com/containers/podman/blob/v4.9.4/RELEASE_NOTES.md)
- [Commits](https://github.com/containers/podman/compare/v4.2.0...v4.9.4)

Updates `google.golang.org/protobuf` from 1.29.1 to 1.33.0

Updates `github.com/cyphar/filepath-securejoin` from 0.2.3 to 0.2.4
- [Release notes](https://github.com/cyphar/filepath-securejoin/releases)
- [Commits](https://github.com/cyphar/filepath-securejoin/compare/v0.2.3...v0.2.4)

Updates `golang.org/x/sys` from 0.15.0 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `google.golang.org/protobuf` from 1.31.0 to 1.33.0

Updates `golang.org/x/net` from 0.19.0 to 0.23.0
- [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0)

Updates `google.golang.org/grpc` from 1.59.0 to 1.63.2
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.59.0...v1.63.2)

Updates `golang.org/x/sys` from 0.0.0-20191026070338-33540a1f6037 to 0.1.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `gopkg.in/yaml.v3` from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0

Updates `golang.org/x/sys` from 0.0.0-20220429233432-b5fbb4746d32 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `gopkg.in/yaml.v3` from 3.0.0-20210107192922-496545a6307b to 3.0.0

Updates `golang.org/x/sys` from 0.15.0 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `golang.org/x/net` from 0.19.0 to 0.23.0
- [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: github.com/containers/podman/v4
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: github.com/cyphar/filepath-securejoin
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/net
  dependency-type: indirect
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-30 09:46:13 +01:00
Wainer Moschetta
eae429a39b
Merge pull request #9552 from wainersm/kata_cc_dev
runtime: new qemu-coco-dev configuration
2024-04-30 05:21:49 -03:00
Pavel Mores
1dd06cf40d
Merge pull request #9551 from pmores/support-iommu
runtime-rs: support IOMMU in qemu VMs
2024-04-29 15:26:11 +02:00
Wainer dos Santos Moschetta
42fb5d7760 runtime: new qemu-coco-dev configuration
Created a new configuration to configure Kata for CoCo without requiring TEE
hardware so to allow developers implement/test/debug platform agnostic code
on their workstations. It will also ease testing of CoCo features on CI with
non-TEE supported VMs.

This is based off qemu configuration. The following differences applied:
 - switched to confidential guest image/initrd
 - switched to confidential kernel
 - switched to 9p shared_fs

Fixes #9487
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-04-29 05:45:10 -03:00
Fabiano Fidêncio
fe21d7a58b
rootfs: Stop building and shipping OPA
Since OPA binary was replaced by the regorus crate, we can finally stop
building and shipping the binary.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-26 18:51:28 +02:00
Pavel Mores
908ec31d9b runtime-rs: fix iommu_platform support for qemu vhost-user-fs device
iommu_platform support was already added on initial DeviceVhostUserFs
introduction, however it incorrectly enabled iommu_platform also on
non-CCW (e.g. PCI) systems.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
174fc8f44b runtime-rs: support iommu_platform for qemu virtio-net device
Note that it's only supported on CCW systems.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
0d038f20cc runtime-rs: support iommu_platform for qemu virtio-serial device
iommu_platform is only turned on for CCW systems.

PartialEq is added to VirtioBusType to enable the '==' operator.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
66a2dc48ae runtime-rs: support iommu_platform for qemu vhost-vsock device
iommu_platform addition is controlled solely by the configuration file.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
d1e6f9cc4e runtime-rs: add IOMMU to qemu VM if configured
The adding itself is done by a new function add_iommu() that conforms with
the add_*() convention.  Note though that this function is called
internally, by the QemuCmdLine constructor, simply because there's nothing
to trigger its invocation from QemuInner (unlike the other add_*()
functions so far).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
0859f47a17 runtime-rs: add representation of '-device intel-iommu' to qemu-rs
Following the golang shim example, the values are hardcoded.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:47:51 +02:00
Pavel Mores
702bf0d35e runtime-rs: support qemu machine's 'kernel_irqchip' param
We will want to set kernel_irqchip when enabling IOMMU and this commit
adds the requisite support.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:42:54 +02:00
Alex Lyn
f72c6ba814
Merge pull request #9519 from emanuellima1/impl-rtc
runtime-rs: Add RTC to QEMU cmdline
2024-04-26 17:44:47 +08:00
Dan Mihai
b42ddaf15f
Merge pull request #9530 from microsoft/saulparedes/improve_caching
genpolicy: changing caching so the tool can run concurrently with itself
2024-04-25 13:06:23 -07:00
Fabiano Fidêncio
b4360e7e37
Merge pull request #9510 from microsoft/danmihai1/regorus-policy2
agent: use regorus instead of opa
2024-04-24 21:40:29 +02:00
Dan Mihai
2400a4d249
Merge pull request #9428 from arc9693/archana1/genplicyfixes
genpolicy: implement default methods for K8sResource trait
2024-04-24 08:04:19 -07:00
Dan Mihai
ff385eac41 agent: remove unnecessary comment
Remove reminder to initialize Policy earlier, because currently there
are no plans to initialize earlier.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-24 14:53:51 +00:00
Dan Mihai
89c85dfe84
Merge pull request #9432 from UiPath/fix-clh-wait
clh: isClhRunning waits for full timeout when clh exits
2024-04-23 13:02:45 -07:00
Emanuel Lima
2bc5e3c6e2 runtime-rs: Add RTC to QEMU cmdline
Add RTC by hardcoding the ooptions base=utc,driftfix=slew,clock=host

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-04-23 10:46:30 -03:00
Greg Kurz
42a79801f3
Merge pull request #9524 from littlejawa/fix_createruntime_hook_not_called
runtime: Call CreateRuntime hooks at container creation time
2024-04-23 13:43:36 +02:00
Fupan Li
469c4e4f44
Merge pull request #9335 from Tim-Zhang/fix-passfd-fifo-open
passfd-io: fix FIFO opening and vsock handling
2024-04-23 09:04:45 +08:00
Alex Lyn
bc2cf95e7a
Merge pull request #9517 from amshinde/update-storage-source-pciblock
runtime-rs: Update storage source for pci block devices
2024-04-23 07:32:36 +08:00
Dan Mihai
5d31eb4847 agent: use regorus 0.1.4
Use regorus 0.1.4 from crates.io, instead of its source code
repository.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 23:21:17 +00:00
Dan Mihai
df23eb09a6 agent: use regorus instead of opa
Implement Agent Policy using the regorus crate instead of the OPA
daemon.

The OPA daemon will be removed from the Guest rootfs in a future PR.

Fixes: #9388

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 19:58:30 +00:00
Dan Mihai
b509c1beee agent: lock anyhow version to 1.0.58
Lock anyhow version to 1.0.58 because:

- Versions between 1.0.59 - 1.0.76 have not been tested yet using
  Kata CI. However, those versions pass "make test" for the
  Kata Agent.

- Versions 1.0.77 or newer fail during "make test" - see
  https://github.com/kata-containers/kata-containers/issues/9538.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 19:49:15 +00:00
Archana Shinde
cc6b671101 runtime-rs: Update storage source for pci block devices
In case of block devices using virtio-block, we need to pass the
pci-path as the storage source field to the agent.
Current the virt-path is being passed which works just for mmio block
devices.
In the future when support is added for scsi, block-ccw and pmem
devices, the storage source would need to be handled accordingly.

Fixes: #9034

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-04-22 11:36:58 -07:00
Greg Kurz
6ca0f09710
Merge pull request #9518 from microsoft/danmihai1/agent-cargo-lock
agent: update cargo.lock
2024-04-22 13:36:06 +02:00
Tim Zhang
aeba483ec8 agent: avoid fd leakage of passfd-io
In do_create_container and do_exec_process, we should create the proc_io first,
in case there's some error occur below, thus we can make sure
the io stream closed when error occur.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:33 +08:00
Tim Zhang
8441187d5e runtime-rs: fix FIFO handling
Fixes: #9334

In linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic.
To avoid this problem, we open stdin in write mode and keep the stdin-writer

We need to open the stdout/stderr as the read mode and keep the open endpoint
until the process is delete. otherwise,
the process would exit before the containerd side open and read
the stdout fifo, thus runD would write all of the stdout contents into
the stdout fifo and then closed the write endpoint. Then, containerd
open the stdout fifo and try to read, since the write side had closed,
thus containerd would block on the read forever.
Here we keep the stdout/stderr read endpoint File in the common_process,
which would be destroied when containerd send the delete rpc call,
at this time the containerd had waited the stdout read return, thus it
can make sure the contents in the stdout/stderr fifo wouldn't be lost.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:33 +08:00
Tim Zhang
d68eb7f0ad agent: Fix close_stdin for passfd-io
In scenario passfd-io, we should wait for stdin to close itself
instead of manually intervening in it.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:32 +08:00
Archana Choudhary
4a010cf71b genpolicy: add default implementations for K8sResource trait
This commit adds default implementations for following methods of
K8sResource trait:
- generate_policy
- serialize

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
6edc3b6b0a genpolicy: add default implementation for use_sandbox_pidns
This patch adds a default implementation for the use_sandbox_pidns
and updates the structs that implement the K8sResource trait to use
the default.

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
d5d3f9cda7 genpolicy: add default implementation for use_host_network
- Provide default implementation for use_host_network
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
9a3eac5306 genpolicy: add default impl for get_containers
- Provide default impl for get_containers
- Remove default impl from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
2db3470602 genpolicy: add default impl for get_container_mounts_and_storages
- Provide default impl for get_container_mounts_and_storages
- Remove default impl from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
09b0b4c11d genpolicy: add default implementation for get_sandbox_name
- Provide default implementation for get_sandbox_name in K8sResource trait
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:55:32 +00:00
Archana Choudhary
43e9de8125 genpolicy: add default implementation for get_annotations
- Provide default implementation for get_annontations.
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:55:32 +00:00
Saul Paredes
2149cb6502 genpolicy: changing caching so the tool can run
concurrently with itself

Based on 3a1461b0a5186a92afedaaea33ff2bd120d1cea0

Previously the tool would use the layers_cache folder for all instances
and hence delete the cache when it was done, interfereing with other
instances. This change makes it so that each instance of the tool will
have its own temp folder to use.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-19 15:46:30 -07:00
Steve Horsman
7e12d588c0
Merge pull request #9485 from sparky005/update_golang.org/x/net
update golang.org/x/net
2024-04-19 11:26:13 +01:00
Julien Ropé
70e798ed35 runtime: Call CreateRuntime hooks at container creation time
CreateRuntime hooks are called at the CreateSandbox time,
but not after CreateContainer.

Fixes: #9523

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-04-19 10:25:02 +02:00
Dan Mihai
4242801b1c agent: update cargo.lock
Update Kata Agent's Cargo.lock after the recent changes to Cargo.toml.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-18 17:12:48 +00:00
Adil Sadik
1c5ca0c915 runtime: update golang.org/x/net
updates golang.org/x/net to newer version that closes some reported
vulnerabilities and security issues

Fixes #9486

Signed-off-by: Adil Sadik <sparky.005@gmail.com>
2024-04-18 10:55:02 -04:00
Tim Zhang
221c5b51fe dragonball: fix EPOLLHUP/EPOLLERR events handling in vsock
1. EPOLLHUP events also need to be read and will be got len 0.
2. We should kill the connection when EPOLLERR events are received.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-18 20:47:02 +08:00
Zvonko Kaiser
eda3bfe2ef config: Add NVIDIA GPU SNP, TDX configuration files
Fixes: #9475

For TDX and SNP add NVIDIA specific configuration files

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-04-17 12:49:13 +00:00
Dan Mihai
c26dad8fe5
Merge pull request #9294 from burgerdev/burgerdev/genpolicy-configurable-pause
genpolicy: support insecure registries and custom pause containers
2024-04-16 09:39:33 -07:00
Greg Kurz
aca6a1bcb5
Merge pull request #9353 from pmores/pr-8866-follow-up
runtime-rs: refactor qemu driver
2024-04-16 16:07:36 +02:00
Hyounggyu Choi
32f58abfde
Merge pull request #9403 from BbolroC/runtime-rs-ci-qemu
CI: Enable GHA cri-containerd workflow for runtime-rs with QEMU
2024-04-15 09:31:25 +02:00
Hyounggyu Choi
606f8e1ab2 runtime-rs: Adjust configuration for qemu-runtime-rs
To make `qemu-runtime-rs` working for CI, we have to rename a configuration
template file and `CONFIG_FILE_QEMU` in Makefile.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-04-12 12:25:53 +02:00
Alexandru Matei
9e01732f7a agent: shutdown vm on exit when agent is used as init process
Linux kernel generates a panic when the init process exits.
The kernel is booted with panic=1, hence this leads to a
vm reboot.
When used as a service the kata-agent service has an ExecStop
option which does a full sync and shuts down the vm.
This patch mimicks this behavior when kata-agent is used as
the init process.

Fixes: #9429

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-04-12 11:32:31 +03:00
Alexandru Matei
54923164b5 clh: isClhRunning waits for full timeout when clh exits
isClhRunning uses signal 0 to test whether the process is
still alive or not. This doesn't work because the process is a
direct child of the shim. Once it is dead the process becomes
zombie.
Since no one waits for it the process lingers until
its parent dies and init reaps it. Hence sending signal 0 in
isClhRunning will always return success whether the process is
dead or not.
This patch calls wait to reap the process, if it succeeds that
means it is our child process, if not we send the signal.

Fixes: #9431

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-04-12 11:31:53 +03:00
Markus Rudy
77540503f9 genpolicy: add support for insecure registries
genpolicy is a handy tool to use in CI systems, to prepare workloads
before applying them to the Kubernetes API server. However, many modern
build systems like Bazel or Nix restrict network access, and rightfully
so, so any registry interaction must take place on localhost.
Configuring certificates for localhost is tricky at best, and since
there are no privacy concerns for localhost traffic, genpolicy should
allow to contact some registries insecurely. As this is a runtime
environment detail, not a target environment detail, configuring
insecure registries does not belong into the JSON settings, so it's
implemented as command line flags.

Fixes: #9008

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 22:29:03 +02:00
Markus Rudy
bc2292bc27 genpolicy: make pause container image configurable
CRIs don't always use a pause container, but even if they do the
concrete container choice is not specified. Even if the CRI config can
be tweaked, it's not guaranteed that registries in the public internet
can be reached. To be portable across CRI implementations and
configurations, the genpolicy user needs to be able to configure the
container the tool should append to the policy.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:26:35 +02:00
Markus Rudy
8b30fa103f genpolicy: parse json settings during config init
Decouple initialization of the Settings struct from creating the
AgentPolicy struct, so that the settings are available for evaluating,
extending or overriding command line arguments.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:17:33 +02:00
Xuewei Niu
50f78ec52c agent: Fix the issue with the "test_new_fs_manager" test
This patch introduces a one-time cpath to mitigate the cgroup residuals. It
might break the device cgroup merging rules when the cgroup has children.

Fixes: #9456

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-04-11 18:06:05 +08:00
Saul Paredes
51498ba99a genpolicy: toggle containerd pull in tests
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 19:28:29 -07:00
Saul Paredes
c96ebf237c genpolicy: add containerd pull method
Add optional toggle to use existing containerd installation to pull and manage container images.
This adds support to a wider set of images that are currently not supported by standard pull method,
such as those that use v1 manifest.

Fixes: #9144

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 09:56:59 -07:00
Greg Kurz
8b996b9307
Merge pull request #9331 from egernst/foobar
katautils: check number of cores on the system intead of go runtime
2024-04-08 18:38:49 +02:00
Wainer Moschetta
fba1d394d7
Merge pull request #9369 from ChengyuZhu6/sandbox-image
agent:image: Support different pause image in the guest for guest pull
2024-04-08 11:06:21 -03:00
stevenhorsman
864e9c22ba agent: doc: Add new config doc
Document the new guest_components_rest_api config parameter

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
Biao Lu
f0edec84f6 agent: Launch api-server-rest
If 'rest_api' is configured, let's start the  api-server-rest after
the attestation-agent and the confidential-data-hub have been started.

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:38:53 +01:00
Biao lu
4d752e6350 agent: Add config for api-server-rest
Add configuration for 'rest api server'.

Optional configurations are
  'agent.rest_api=attestation' will enable attestation api
  'agent.rest_api=resource' will enable resource api
  'agent.rest_api=all' will enable all (attestation and resource) api

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Biao Lu
f476d671ed agent: Launch the confidential data hub
Let's introduce a new method to start the confidential data hub and the
attestation agent.  The former depends on the later, and it needs to be
started before the RPC server.

Starting the attestation components is based on whether the confidential
containers guest components binaries are found in the rootfs.

Fixes: #7544

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Greg Kurz
be8f0cb520
Merge pull request #9402 from deagon/feat/debug-threads
qemu: show the thread name when enable the hypervisor.debug option
2024-04-08 11:04:36 +02:00
ChengyuZhu6
8c897f822c agent:image: Support different pause image in the guest for guest pull
Support different pause images in the guest for guest-pull, such as k8s
pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause).

Fixes: #9225 -- part III

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-04-07 09:00:10 +08:00
Fabiano Fidêncio
cdb8531302
hypervisor: Simplify TDX protection detection
Let's rely on the kvm module 'tdx' parameter to do so.
This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX
adoption (https://github.com/intel/tdx-linux) stacks.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
b7cccfa019
qemu: tdx: Adapt command line
This commit is a mess, but I'm not exactly sure what's the best way to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.

With that said, let me cover the content of this commit.

Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it
didn't fly upstream.

Secondly, in order to get QEMU to properly work with TDX, we need to
enforce the 'private=on' knob and use the "memory-backend-ram", and
we're doing so, and also making sure to test the `private=on` newly
added knob.

I'm sorry for the confusion, I understand this is not optimal, I just
don't see an easy path to do changes without leaving the code broken
during those changes.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
6b4cc5ea6a
Revert "qemu: tdx: Workaround SMP issue with TDX 1.5"
This reverts commit d1b54ede29.

 Conflicts:
	src/runtime/virtcontainers/qemu.go

This commit was a hack that was needed in order to get QEMU + TDX to
work atop of the stack our CI was running on.  As we're moving to "the
officially supported by distros" host OS, we need to get rid of this.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
582b5b6b19
govmm: tdx: Expose the private=on|off knob
The private=on|off knob is required in order to properly lauunch a TDX
guest VM.

This is a brand new property that is part of the still in-flight patches
adding TDX support on QEMU.

Please, see:
3fdd8072da

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Alex Lyn
0e0a361f0e
Merge pull request #8782 from Apokleos/device-increate-count
bugfix and refactor device increate count
2024-04-05 13:43:49 +08:00
Eric Ernst
da01bccd36 katautils: check number of cores on the system intead of go runtime
We used to utilize go runtime's "NumCPUs()", which will give the number
of cores available to the Go runtime, which may be a subset of physical
cores if the shim is started from within a cpuset. From the function's
description:
"NumCPU returns the number of logical CPUs usable by the current
process."

As an example, if containerd is run from within a smaller CPUset, the
maximum size of a pod will be dictated by this CPUset, instead of what
will be available on the rest of the system.

Since the shim will be moved into its own cgroup that may have a
different CPUset, let's stick with checking physical cores. This also
aligns with what we have documented for maxVCPU handling.

In the event we fail to read /proc/cpuinfo, let's use the goruntime.

Fixes: #9327

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2024-04-03 16:09:16 -07:00
Alex Lyn
935a1a3b40 runtime-rs: refactor decrease_attach_count with do_decrease_count
Try to reduce duplicated code in decrease_attach_count with public
new function do_decrease_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
4f0fab938d runtime-rs: refactor increase_attach_count with do_increase_count
Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
fff64f1c3e runtime-rs: introduce dedicated function do_decrease_count
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:08 +08:00
Alex Lyn
5750faaf31 runtime-rs: introduce dedicated function do_increase_count
Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:09:17 +08:00
Guoqiang Ding
cd0c31e185 qemu: show the thread name when enable the hypervisor.debug option
Add debug-threads=on in the name argument if debug enabled.

Fixes: #9400
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-04-03 10:36:52 +08:00
Alex Lyn
fa8049af6c
Merge pull request #9383 from Apokleos/unified-cgrp-cmdline
kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
2024-04-02 09:08:04 +08:00
Alex Lyn
07bfdf4a22
Merge pull request #9275 from Apokleos/swap-hooks-bindmnt
kata-agent: Change order of guest hook and bind mount processing
2024-04-02 07:40:10 +08:00
Alex Lyn
c88014834b kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
Configure the system to mount cgroups-v2 by default during system boot
by the systemd system, We must add systemd.unified_cgroup_hierarchy=1
parameter to kernel cmdline, which will be passed by kernel_params in
configuration.toml.
To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.

Fixes: #9336

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 18:45:12 +08:00
alex.lyn
548f252bc4 runtime-rs: bugfix incorrect use of refcount before vfio attach
When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 11:28:57 +08:00
Alex Lyn
dfa8832406
Merge pull request #9345 from c3d/bug/9342-agent-test-errors
agent: Fix errors in `make check`
2024-04-01 09:48:44 +08:00
Dan Mihai
600f9266f3 runtime: remove stream copy infinite loop
This reverts commit 1c5693be86.

Avoid apparent infinite loop when ReadStreamRequest is blocked by
policy - for some of the pods.

When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

Fixes: #9380

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-28 22:43:28 +00:00
Dan Mihai
ebb26edf42
Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints
genpolicy: reduce policy debug prints
2024-03-27 15:12:10 -07:00
Tobin Feldman-Fitzthum
9856fe5bea runtime: remove ServiceOffload parameter
Since we no longer use the service_offload configuration,
remove the ServiceOffload field from the image struct.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum
a18c7ca307 runtime: remove unimplemented CoCo configurations
These experimental options were added 2 years ago
in anticipation of features that would be added
in CoCo. These do not match the features that were
eventually added and will soon be ported to main.

Fixes: #8047

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:06 -05:00
Chengyu Zhu
e66a5cb54d
Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout
Support to set timeout to pull large image in guest
2024-03-28 00:34:08 +08:00
Christophe de Dinechin
82c4079fd0 agent: Remove useless loop
This is the report from `make check`:

```
error: this loop never actually loops
   --> src/signal.rs:147:9
    |
147 | /         loop {
148 | |             select! {
149 | |                 _ = handle => {
150 | |                     println!("INFO: task completed");
...   |
156 | |             }
157 | |         }
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
    = note: `#[deny(clippy::never_loop)]` on by default
```

There is only one option: you get something or a timeout. You never retry, so
the report is correct.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
df5c88cdf0 agent: Remove lint error about .flatten running forever
The lint report is the following:

```
error: `flatten()` will run forever if the iterator repeatedly produces an `Err`
    --> src/rpc.rs:1754:10
     |
1754 |         .flatten()
     |          ^^^^^^^^^ help: replace with: `map_while(Result::ok)`
     |
note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error
    --> src/rpc.rs:1752:5
     |
1752 | /     reader
1753 | |         .lines()
     | |________________^
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
     = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]`
```

This commit simply applies the suggestion.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
bfb55312be agent: Fix .enumerate errors during make check
Running `make check` in the `src/agent` directory gives:

```
error: you seem to use `.enumerate()` and immediately discard the index
   --> rustjail/src/mount.rs:572:27
    |
572 |     for (_index, line) in reader.lines().enumerate() {
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index
    = note: `-D clippy::unused-enumerate-index` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]`
help: remove the `.enumerate()` call
    |
572 |     for line in reader.lines() {
    |         ~~~~    ~~~~~~~~~~~~~~

    Checking tokio-native-tls v0.3.1
    Checking hyper-tls v0.5.0
    Checking reqwest v0.11.18
error: could not compile `rustjail` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
make: *** [../../utils.mk:177: standard_rust_check] Error 101
```

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Greg Kurz
e1068da1a0
Merge pull request #9326 from gkurz/draft-release
Only tag and publish the release when it is fully ready
2024-03-27 15:59:59 +01:00
ChengyuZhu6
c2dc13ebaa runtime: support to configure CreateContainer Timeout in configurations
support to configure CreateContainerRequestTimeout in the
configurations.

e.g.:
[runtime]
...
create_container_timeout = 300

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 21:58:41 +08:00
Greg Kurz
693c9487d4 docs: Adjust release documentation
Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto
deprecated by the re-design of the release process described in #9064.
Remove this file and all its references in the repo.

The `## Versioning` section has some useful information though. It is
moved to `docs/Release-Process.md`. The documentation of the `PATCH`
field is adapted according to new workflow.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-27 12:41:48 +01:00
Steve Horsman
45aba769c0
Merge pull request #9346 from cmaf/ci-remove-repo-docs
Remove additional links to tests directory
2024-03-27 11:13:32 +00:00
ChengyuZhu6
2224f6d63f runtime: support to configure CreateContainer timeout in annotation
Support to configure CreateContainerRequestTimeout in the annotations.

e.g.:
annotations:
      "io.katacontainers.config.runtime.create_container_timeout": "300"

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
ChengyuZhu6
39bd462431 runtime: support to set timeout for CreateContainerRequest
In the situation to pull images in the guest #8484, it’s important to account for pulling large images.
Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout.
However, this duration may prove insufficient for pulling larger images, such as those containing AI models.
Consequently, we must devise a method to extend the timeout period for large image pull.

Fixes: #8141

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
Pavel Mores
4c72b02e53 runtime-rs: remove the now-unused code of NetDevice
The remaining code in network.rs was mostly moved to utils.rs which seems
better home for these utility functions anyway (and a closely related
function open_named_tuntap() has already lived there).

ToString implementation for Address was removed after some consideration.
Address should probably ideally implement Display (as per RFC 565) which
would also supply a ToString implementation, however it implements Debug
instead, probably to enable automatic implementation of Debug for anything
that Address is a member of, if for no other reason.  Rather than having
two identical functions this commit simply switches to using the Debug
implementation for printing Address on qemu command line.

Fixes #9352

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:52:40 +01:00
Pavel Mores
c94e55d45a runtime-rs: make QemuCmdLine own vsock file descriptor
Make file descriptors to be passed to qemu owned by QemuCmdLine.  See
commit 52958f17cd for more explanation.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
0cf0e923fc runtime-rs: refactor QemuCmdLine::add_network_device() signature
add_network_device() doesn't need to be passed NetworkInfo since it
already has access to the full HypervisorConfig.

Also, one of the goals of QemuCmdLine interface's design is to avoid
coupling between QemuCmdLine and the hypervisor crate's device module,
if at all possible.  That's why add_network_device() shouldn't take
device module's NetworkConfig but just parts that are useful in
add_network_device()'s implementation.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
a4f033f864 runtime-rs: add should_disable_modern() utility function
is_running_in_vm() is enough to figure out whether to disable_modern but
it's clumsy and verbose to use.  should_disable_modern() streamlines the
usage by encapsulating the verbosity.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
12e40ede97 runtime-rs: reimplement add_network_device() using Netdev & DeviceVirtioNet
This commit replaces the existing NetDevice-based implementation with one
using Netdev and DeviceVirtioNet.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
0a57e2bb32 runtime-rs: refactor NetDevice in qemu driver
In keeping with architecture of QemuCmdLine implementation we split the
functionality into two objects: Netdev to represent and generate the
-netdev part and DeviceVirtioNet for the -device virtio-net-<transport>
part.

This change is a pure refactor, existing functionality does not change.
However, we do remove some stub generalizations and govmm-isms, notably:
- we remove the NetDev enum since the only network interface types that
  kata seems to use with qemu are tuntap and macvtap, both of which are
  implemented by the same -netdev tap
- enum DeviceDriver is also left out since it doesn't seem reasonable to
  try to represent VFIO NICs (which are completely different from
  virtio-net ones) with the same struct as virtio-net
- we also remove VirtioTransport because there's no use for it so far, but
  with the expectation that it will be added soon.

We also make struct Netdev the owner of any vhost-net and queue file
descriptors so that their lifetime is tied ultimately to the lifetime of
QemuCmdLine automatically, instead of returning the fds to the caller and
forcing it to achieve the equivalent functionality but manually.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
7f23734172 runtime-rs: reduce generate_netdev_fds() dependencies
generate_netdev_fds() takes NetworkConfig from which it however only needs
a host-side network device name.  This commit makes it take the device name
directly, making the function useful to callers who don't have the whole
NetworkConfig but do have the requisite device name.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
d4ac45d840 runtime-rs: refactor clear_fd_flags()
The idea of this function is to make sure O_CLOEXEC is not set on file
descriptors that should be inherited by a child (=hypervisor) process.
The approach so far is however rather heavy-handed - clearing *all* flags
is unjustifiably aggresive for a low-level function with no knowledge of
context whatsoever.

This commit refactors the function so that it only does what's expected
and renames it accordingly.  It also clarifies some of its call sites.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:14 +01:00
Chengyu Zhu
d16971e37e
Merge pull request #9325 from ChengyuZhu6/image_service
agent:image: Refactor code to improve memory efficiency of image service
2024-03-26 10:38:37 +08:00
Dan Mihai
6c72c29535 genpolicy: reduce policy debug prints
Kata CI has full debug output enabled for the cbl-mariner k8s tests,
and the test AKS node is relatively slow. So debug prints from policy
are expensive during CI.

Fixes: #9296

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-26 02:21:26 +00:00
Alex Lyn
cec943fc26
Merge pull request #9244 from Apokleos/dgb-gpu
runtime-rs/dragonball: add support building kernel with upcall and GPU hotplug
2024-03-26 08:53:54 +08:00
Chelsea Mafrica
d69514766e src: Remove references to files in tests repo
Change scripts and source that uses files in the tests repo to use the
corresponding file in the current repo.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Alex Lyn
5c54315a87 dragonball: fix CI failure due to poor UT adaptation.
Fixes: #9144

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:27 +08:00
ChengyuZhu6
f47408fdf4 agent:image: Refactor code to improve memory efficiency of image service
Currently, `.lock().await.clone()` results in `Option<ImageService>` being duplicated in memory with each call to `singleton()`.
Consequently, if kata-agent receives numerous image pulling requests simultaneously,
it will lead to the allocation of multiple `Option<ImageService>` instances in memory, thereby consuming additional memory resources.

In image.rs, we introduce two public functions:
`merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate
the operations on `IMAGE_SERVICE`, ensuring that its internal details remain
hidden from external modules such as `rpc.rs`.

Fixes: #9225 -- part II

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-25 07:46:50 +08:00
ChengyuZhu6
7a49ec1c80 agent:util: Refactor the unit tests to leverage rstest
Refactor the unit tests in util.rs to leverage rstest for parameterization.

Fixes: #9314

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:53 +08:00
ChengyuZhu6
2df2b4d30d agent:namespace: Refactor unit tests to leverage rstest
Refactor the unit tests in `namespace.rs` to leverage rstest for parameterization.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:48 +08:00
Hyounggyu Choi
d915a79e2d
Merge pull request #9280 from BbolroC/enable-qemu-on-s390x
runtime-rs: Enable qemu on s390x
2024-03-22 23:58:42 +01:00
Hyounggyu Choi
81aaa34bd6 runtime-rs: Add DeviceVirtioSerial and DeviceVirtconsole
It is observed that virtiofsd exits immediately on s390x
if there is no attached console devices.
This commit resolves the issue by migrating `appendConsole()`
from runtime and being triggered in `start_vm()`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
2cfe745efb runtime-rs: Enable memory backend option for Machine for s390x
For s390x, it requires an additional option `memory-backend` for `-machine`.
Otherwise, virtiofsd exits with HandleRequest(InvalidParam).

This commit is to add a field `memory_backend` to `struct Machine`
and turn it on for s390x.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
9bcfaad625 runtime-rs: Add ccw block device for rootfs
Like nvdimm for x86_64, a block device for s390x should be
treated differently with `virtio-blk-ccw`.
This is to generate a QEMU command line parameter for a block
device by using `-blockdev` and `-device` if the `vm_rootfs_driver`
is set to `virtio-blk-ccw`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
David Esparza
3e40051634
Merge pull request #9255 from dborquez/thread_pid_function
runtime-rs: ch: Implement full thread/tid/pid handling
2024-03-22 10:05:02 -06:00
Chengyu Zhu
9a4cb96262
Merge pull request #9312 from ChengyuZhu6/show-feature
agent: Add guest-pull to the list of agent features in announce()
2024-03-21 23:35:29 +08:00
David Esparza
b498e140a1
runtime-rs: ch: Implement full thread/tid/pid handling
Add in the full details once cloud-hypervisor/cloud-hypervisor#6103
has been implemented, and the feature is available in a Cloud Hypervisor
release.

Fixes: #8799

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-03-21 08:24:53 -06:00
ChengyuZhu6
754399d909 agent: Add guest-pull to the list of agent features in announce()
Add guest-pull to the list of agent features in announce().

Fixes: #9225 -- part IV

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-21 20:01:52 +08:00
Xuewei Niu
9c4f9dcb35
Merge pull request #9311 from studychao/chao/fix_mtrr
Dragonballl: introduce MTRR regs support
2024-03-21 17:24:27 +08:00
Hyounggyu Choi
9b2c08935b runtime-rs: Pass different device argument based on bus type
Currently, `*-pci` is used as an argument for the device config.
It is not true for a case where a different type of bus is used.
s390x uses `ccw`.
This commit is to make it flexible to generate the device argument
based on the bus type. A structure `DeviceVhostUserFsPci` and
`VhostVsockPci` is renamed to `DeviceVhostUserFs` and `VhostVsock`
because the structure name is not bound to a certain bus type any more.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-21 09:25:37 +01:00
Hyounggyu Choi
7b3d1adb8c libs: Bump sysinfo to v0.30.5
It has been observed that the runtime stops running around
`sysinfo::total_memory()` while adjusting a config on s390x.
This is to update the crate to the latest version which happened
to resolve the issue. (No explicit release note for this)

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-20 09:27:13 +01:00
Chao Wu
5a4b858ece Dragonballl: introduce MTRR regs support
MTRR, or Memory-Type Range Registers are a group of x86 MSRs providing a way to control access
 and cache ability of physical memory regions.
During our test in runtime-rs + Dragonball, we found out that this register support is a must
for passthrough GPU running CUDA application, GPU needs that information to properly use GPU memory.

fixes: #9310
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-20 14:18:16 +08:00
ChengyuZhu6
5bad18f9c9
agent: set https_proxy/no_proxy before initializing agent policy
When the https_proxy/no_proxy settings are configured alongside agent-policy enabled, the process of pulling image in the guest will hang.
This issue could stem from the instantiation of `reqwest`’s HTTP client at the time of agent-policy initialization,
potentially impacting the effectiveness of the proxy settings during image guest pulling.
Given that both functionalities use `reqwest`, it is advisable to set https_proxy/no_proxy prior to the initialization of agent-policy.

Fixes: #9212

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
db9f18029c
README: Add https_proxy and no_proxy to agent README
Add agent.https_proxy and agent.no_proxy to the table in the agent README.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
8724d7deeb
packaging: Enable to build agent with PULL_TYPE feature
Enable to build kata-agent with PULL_TYPE feature.

We build kata-agent with guest-pull feature by default, with PULL_TYPE set to default.
This doesn't affect how kata shares images by virtio-fs. The snapshotter controls the image pulling in the guest.
Only the nydus snapshotter with proxy mode can activate this feature.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
ba242b0198
runtime: support different cri container type check
To support handle image-guest-pull block volume from different CRIs, including cri-o and containerd.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
874d83b510
agent/image: Use guest provided pause image
By default the pause image and runtime config will provided
by host side, this may have potential security risks when the
host config a malicious pause image, then we will use the pause
image packaged in the rootfs.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Julien Ropé <jrope@redhat.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
c269b9e8c6
agent: Add guest-pull feature for kata-agent
Add "guest-pull" feature option to determine that the related dependencies
would be compiled if the feature is enabled.

By default, agent would be built with default-pull feature, which would
support all pull types, including sharing images by virtio-fs and
pulling images in the guest.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
965da9bc9b
runtime: support to pass image information to guest by KataVirtualVolume
support to pass image information to guest by KataVirtualVolumeImageGuestPullType
in KataVirtualVolume, which will be used to pull image on the guest.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
cfd14784a0
agent: Introduce ImagePullHandler to support IMAGE_GUEST_PULL volume
As we do not employ a forked containerd in confidential-containers, we utilize the KataVirtualVolume
which storing the image information as an integral part of `CreateContainer`.
Within this process, we store the image information in rootfs.storage and pass this image url through `CreateContainerRequest`.
This approach distinguishes itself from the use of `PullImageRequest`, as rootfs.storage is already set and initialized at this stage.
To maintain clarity and avoid any need for modification to the `OverlayfsHandler`,we introduce the `ImagePullHandler`.
This dedicated handler is responsible for orchestrating the image-pulling logic within the guest environment.
This logic encompasses tasks such as calling the image-rs to download and unpack the image into `/run/kata-containers/{container_id}/images`,
followed by a bind mount to `/run/kata-containers/{container_id}`.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
462051b067
agent/image: merge container spec for images pulled inside guest
When being passed an image name through a container annotation,
merge its corresponding bundle OCI specification and process into the passed container creation one.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
Co-authored-by: jordan9500 <jordan.jackson@ibm.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
cec1916196
agent: Support https_proxy/no_proxy config for image download in guest
Containerd can support set a proxy when downloading images with a environment variable.
For CC stack, image download is offload to the kata agent, we need support similar feature.
Current we add https_proxy and no_proxy, http_proxy is not added since it is insecure.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
9cddd5813c
agent/image: Enable image-rs crate to pull image inside guest
With image-rs pull_image API, the downloaded container image layers
will store at IMAGE_RS_WORK_DIR, and generated bundle dir with rootfs
and config.json will be saved under CONTAINER_BASE/cid directory.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
2b3a00f848
agent: export the image service singleton instance
Export the image service singleton instance.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
1f1ca6187d
agent: Introduce ImageService
Introduce structure ImageService, which will be used to pull images
inside the guest.

Fixes: #8103

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
co-authored-by: stevenhorsman <steven@uk.ibm.com>
2024-03-19 17:22:33 +01:00
Chelsea Mafrica
42dfe0e8d1
Merge pull request #9286 from jodh-intel/agent-show-enabled-features
agent: Show features enabled at build time
2024-03-19 08:54:49 -07:00
Alex Lyn
7af2df408e
Merge pull request #9295 from likebreath/0318/fix_clh_default_netconfig
runtime-rs: ch: Provide valid default value for NetConfig
2024-03-19 15:17:18 +08:00
Xuewei Niu
99d0e5fff8
Merge pull request #9270 from zvonkok/kata-agent-bind-mount
kata-agent: optional bind flag
2024-03-19 10:39:23 +08:00
Bo Chen
ad4262e86b runtime-rs: ch: Provide valid default value for NetConfig
The current default value of IP `0.0.0.0` with mask `0.0.0.0` will cause
ioctl error when being used to create and configure TAP device, with
newer version of Cloud Hypervisor [1]. This patch replaces them with
valid value that are the same as the Go-lang runtime [2].

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5924
[2] e3f7852738/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go (L40-L57)

Fixes: #9254

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-03-18 15:47:58 -07:00
Greg Kurz
1e526a4769 runtime-rs: Consolidate the handling of fds passed to QEMU
File descriptors that are passed to QEMU need some special care.
We want them to be closed when the QEMU process is started. But
at the same time, it is required that the associated rust File
structures, either coming from the` std::fs` or the `tokio::fs`
crates, are still in scope when the QEMU process is forked. This
is currently achieved by keeping File structures in variables
at the outer scope of `start_vm()`. This scheme is currently
duplicated, with similar justifications in the corresponding
comments.

Consolidate all this handling in one place with a more generic
explanation.

Fixes #9281

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-15 16:14:59 +01:00
James O. D. Hunt
9ef59488d9 agent: Show features enabled at build time
The agent now has a number of optional build-time features that can be
enabled.

Add details of these features to the following areas:

- Version output (`kata-agent --version`)
- Announce message (so that the details are always added to the journal
  at agent startup).
- The response message returned by the ttRPC `GetGuestDetails()` API.

Fixes: #9285.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-03-15 13:29:21 +00:00
Greg Kurz
6a112cc7a5 runtime-rs: Fix missing dependency
Some previous contribution missed to run cargo clippy.
Fix the dependency now so that it doesn't cause noise
in future contributions.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-14 23:19:38 +01:00
Dan Mihai
b3b00e00a6
Merge pull request #9246 from microsoft/danmihai/default-env
genpolicy: default env if image doesn't have env
2024-03-14 11:01:43 -07:00
Zvonko Kaiser
c15e19c806 kata-agent: optional bind flag
Fixes: #9269

From https://github.com/opencontainers/runtime-spec/blob/main/config.md#mounts
type (string, OPTIONAL) The type of the filesystem to be mounted.
bind may be only specified in the oci spec options -> flags update r#type
The agent will ignore bind mounts if they are only specified in the OCI spec options and not in the flags.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-14 14:42:01 +00:00
Hyounggyu Choi
1dac6b1357 runtime-rs: Configure s390x specific flags for Makefile
s390x supports a different machine type `s390-ccw-virtio` and it is
not required to configure cpu features by default for the platform.
A hypervisor `dragonball` is not supported on s390x so that `DBCMD`
is not necessary. `vm-rootfs_driver` should be set to `virtio-blk-ccw`.
This commit is to set the architecture-specific flags for Makefile.

Fixes: #9158

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-14 13:05:35 +01:00
Alex Lyn
2aa3519520 kata-agent: Change order of guest hook and bind mount processing
The guest_hook_path item in configuration.toml allows OCI hook scripts
to be executed within Kata's guest environment. Traditionally, these
guest hook programs are pre-built and included in Kata's guest rootfs
image at a fixed location.

While setting guest_hook_path = "/usr/share/oci/hooks" in configuration.toml
works, it lacks flexibility. Not all guest hooks reside in the path
/usr/share/oci/hooks, and users might have custom locations.

To address this, a more flexible and configurable approach is to be proposed
that allows users to specify their desired path. This could include using a
sandbox bind mount path for hooks specific to that particular container.

However, The current implementation of guest hooks and bind mounts in kata-agent
has a reversed order of execution compared to the desired behavior.
To achieve the intended functionality, we simply need to swap the order of their
implementation.

Fixes: #9274

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-13 20:30:32 +08:00
Zvonko Kaiser
63dff9a9f2 kata-agent: CreateContainer Hook
Fixes: #9267

The doc states we have support for all lifecycle hooks. There are still some missing.
This is the first issue regarding the CreateContainer hook which is run before pivot_root but after prestart and createruntime

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-13 09:24:25 +00:00
Alex Lyn
410afcc913
Merge pull request #8866 from Apokleos/netdev-qemu-rs
runtime-rs: add netdev params to cmdline for qemu-rs.
2024-03-13 13:07:43 +08:00
Alex Lyn
e2ae8ba79b runtime-rs: add network device into Qemu's cmdline
It will open tuntap device and vhost-net device
and store device files.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:28:54 +08:00
Alex Lyn
d3bca4597e runtime-rs: add open_named_tuntap to open a named tuntap device.
The open_named_tuntap function is designed as a public function to
open a tuntap device with the specified name. However, in order to
reference existing methods in dbs_utils, we still need to keep the
reference "path = "../../../dragonball/src/dbs_utils" in dependencies
and cannot hide it.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:26:32 +08:00
Alex Lyn
005b333976 runtime-rs: add network helpers and impl ToQemuParams
Add network helpers and impl ToQemuParams trait to build
netdev params which are put into cmdline for Qemu VM running.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:25:39 +08:00
Alex Lyn
63786934f4 runtime-rs: set network namespace for qemu process and netdev.
We need ensure the add_network_device happens in netns and
move qemu process into netns which keeps the qemu process
running in this net namespace.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:21:43 +08:00
Alex Lyn
69a5e5b955 runtime-rs: add network device handler in start_vm.
Add network device handler in start_vm, which is sepcially
for Qemu VM running with added net params to command line.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:18:01 +08:00
Alex Lyn
9f6003adde runtime-rs: add a new netns field in struct QemuInner.
We need add a new netns field in struct QemuInner, and
initialize it with argument passed down in prepare_vm().

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 16:02:39 +08:00
Alex Lyn
f571ec84d2 runtime-rs: add a public method to support process entering netns.
The enter_netns function is designed as a public method to help
VMMs running as a independent process enter a network namespace,
reducing duplicate code.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 15:55:52 +08:00
Alex Lyn
4176fcc3c6 runtime-rs: make the code for cleanup fd flags as public method.
It just move the related code to a public file(utils.rs) and make
it a common method for both vsock and network, or some others.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-11 15:52:20 +08:00
Alex Lyn
b1038704e0 runtime-rs: make NetnsGuard common for hypervisor and resource.
In order to better support non-builtin vmm usage of NetnsGuard and
reduce code duplication, we need to move it to a common path that
can be referenced by both hypervisor and resource manager.

In this patch, it just do moving code from network/utils/netns.rs
to kata-sys-utils/src/netns.rs

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 15:38:42 +08:00
Alexandru Matei
617b0114b3 clh: initialize clh pid before using it
The PID needs to be initialized before calling isClhRunning.
waitVMM() uses isClhRunning and is called by launchClh() just
before returning from function.

Fixes: #9230

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-03-09 13:53:51 +02:00
Steve Horsman
54e5ce2464
Merge pull request #9154 from chungeun-choi/change-deprecated-package
fixed - Change the deprecated module from 'io/util' to util. 'io/util…
2024-03-08 15:05:43 +00:00
Alex Lyn
c73597c39d
Merge pull request #9208 from studychao/chao/fix_virt_ci
Dragonball: fix unit test problems when switching to new virt github machine
2024-03-08 09:41:05 +08:00
Chengyu Zhu
d49391a555
Merge pull request #8798 from LindaYu17/setpolicy
add setpolicy function to kata-runtime tool
2024-03-08 06:31:57 +08:00
Dan Mihai
5398b6466c
Merge pull request #9224 from 3u13r/sidecar-container
genpolicy: add restartPolicy to container struct
2024-03-07 12:59:55 -08:00
Dan Mihai
4c3d6fadc8 genpolicy: default env if image doesn't have env
Use containerd's default environment for container images that don't
specify the Env field.

Also, re-enable policy env variable verification, now that these
uncommon images are supported too.

Fixes: #9239

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-07 16:56:06 +00:00
Leonard Cohnen
e30e8ab7dc genpolicy: add restartPolicy to container struct
This adds support for sidecar container introduced in Kubernetes 1.28

Fixes: #9220

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2024-03-07 12:00:14 +01:00
Chungeun Choi
bad263f399 runtime: Replace deprecated module io/ioutil" to "io"
This change updates the module import to use 'util' instead of the deprecated 'io/util'

Fixes: #9166

Signed-off-by: Chungeun Choi <ce.choi@okestro.com>
2024-03-07 10:56:06 +00:00
Alex Lyn
ef9a38e551 shim-interface: add Copyright of AntGroup in file shim-interface.rs
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:46:32 +08:00
Alex Lyn
2972a3a675 shim-interface: add UT for get_uds_with_sid
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:45:44 +08:00
Alex Lyn
7145243bd3 kata-ctl: Support using container short ID to enter guest.
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:44:47 +08:00
Linda Yu
1c5693be86 stream: repeat copybuffer if it is blocked by policy
copyBuffer returns and the streams will be closed when error occurs.
If the error contains "blocked by policy" it means the log output is
disabled by policy with "ReadStreamRequest" and "WriteStreamRequest" set
to false. But at this moment, we want the real stream still working (not
be seen) because we might want to enable logging for debugging purpose,
so we repeat copybuffer in this case to avoid streams being closed.

Fixes: #8797

Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-03-07 15:00:23 +08:00
Linda Yu
eda419cb03 kata-runtime: add set policy function to kata-runtime
logging/debugging information might probably be disabled in production
due to security consideration, but we'd better provide an approach for
customer to get logging information during runtime, this PR implement
setpolicy function in kata-runtime tools, although it can set whole policy
other than logging.
setpolicy would evokes remote attestation, which means before setting
policy during runtime, user has to reconfigure new policy hash in KBS/AS.

usage:  kata-runtime policy set policy.rego --sandbox-id XXXXXXXX

Fixes: #8797

Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-03-07 15:00:23 +08:00
Dan Mihai
e61ef30a76 genpolicy: disable env variable verification
Disable env variable verification to unblock CI, until container
images that don't specify the Env variables will be handled correctly
(see #9239).

Also, mark the image config Env field as optional, thus allowing
policy generation for these container images.

Fixes: #9240

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-07 01:59:18 +00:00
Fupan Li
628f57aca0
Merge pull request #9193 from UiPath/fix/clh-dax
clh: Enable DAX for rootfs
2024-03-05 09:39:22 +08:00
Chao Wu
9f0eab904b Dragonball: fix test_signal_handler
a) There is some unknown syscalls triggered in new github virt machine
that would break the make test process with SIGSYS after applying
SeccompFilter. In order to fix this, we change the allowlist in this
unit test for seccompfileter into a blocklist to avoid meeting the unknown syscalls.
b) lazy static METRICS is not fully initialize in the unit test and may lead to
unstable result for this UT.

fixes: #9207

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-04 16:27:27 +08:00
Chao Wu
253fe72435 Dragonball: fix test_handler_insert_region
the mmap region start guest addr hard-code a value and later there
 would be check whether the mentioned addr is larger than or equal
 to mem_end (default to host_phy_mem >> 1) in order to satisfy the
 requirement for DaxMemory. Since github virt machine phy_mem is larger
 than previous CI machine we use, the hard-code value could no longer be
 worked. To fix this, we change the address to mem_end in unit test to
 avoid the influence of host machine change.

fixes: #9207
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-04 16:27:19 +08:00
Xuewei Niu
daab76de36
Merge pull request #9201 from liubogithub/liubo/dev/panic_fix_3
katautils: fix panic on tracing.
2024-03-02 10:27:02 +08:00
Alex Lyn
13a20957cb
Merge pull request #9164 from Apokleos/directvol-csi-dockerfile
csi-kata-directvolume: add Dockerfile for building csi image
2024-03-01 18:12:19 +08:00
Alex Lyn
f69428a1e7 csi-kata-directvolume: add Dockerfile for building csi image
Fixes: #9163

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-01 10:41:51 +08:00
Liu Bo
b6f8355ea3 katautils: fix panic on tracing.
This fixes a panic on tracing on container exit.

The root cause is that global var needs to be set by "=" instead of
":=".

Fixes: #9102

Signed-off-by: Liu Bo <liub.liubo@gmail.com>
2024-02-29 18:40:23 -08:00
Dan Mihai
11b603e5f1
Merge pull request #9139 from microsoft/saulparedes/genolicy_panic_subpath
genpolicy: panic when we see a volume mount subpath
2024-02-29 12:18:56 -08:00
Alexandru Matei
6856e8f678 clh: Enable DAX for rootfs
Fixes: #9192

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-02-29 18:01:47 +02:00
Chengyu Zhu
bb4c608b32
Merge pull request #9110 from ChengyuZhu6/agent_option
agent: Add all agent configuration options to README
2024-02-28 18:50:44 +08:00
Dan Mihai
352e2af5f0
Merge pull request #9153 from microsoft/danmihai1/clh-bootVM-timeout
runtime: clh: minimum 10s timeout for CreateVM + BootVM
2024-02-27 09:58:01 -08:00
ChengyuZhu6
731c490ded agent: Add all agent configuration options to README
Add all agent configuration options to README so that users can more easily understand
what these options do and how to configure them at runtime.

Fixes: #9109

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-27 17:35:19 +08:00
Xuewei Niu
bb5e33b33a
Merge pull request #9100 from littlejawa/fix_5738_metrics_memory
runtime: remove kata_shim_netdev metric
2024-02-26 19:01:21 +08:00
Dan Mihai
f4509b806b runtime: clh: minimum 10s timeout for CreateVM + BootVM
Relax the timeout for calling CLH's CreateVM + BootVM APIs. When
hitting the older 1s timeout, killing a half-booted Guest and
retrying the same boot sequence could have been wasteful and resulting
in unstable CI testing on slower Hosts.

Fixes: #9152

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-24 19:15:57 +00:00
Saul Paredes
9b7bd376eb genpolicy: panic when we see a volume mount subpath
Based on https://github.com/kata-containers/runtime/issues/2812

Fixes: #9145

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-02-23 09:56:51 -08:00
Xuewei Niu
89c76d7d8d
Merge pull request #9125 from gkurz/fix-agent-cgroup-ns
agent: Run container workload in its own cgroup namespace (cgroup v2 guest only)
2024-02-23 10:40:17 +08:00
Julien Ropé
1c306fe4a6 runtime-rs: stop reporting net dev metrics for the shim
For consistency with the go runtime.
As the shim itself is not using the network (all its communication with
other processes is done with local unix sockets), there is no reason to
keep gathering and reporting shim-specific network metrics.
Actual network usage of the kata containers can be found from the existing
agent network metrics (kata_guest_netdev_stat).

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-02-22 14:00:00 +01:00
Julien Ropé
9de65707ca runtime: stop reporting net dev metrics for the shim
As part of the shim network metrics, the shim is reporting network interfaces
from the host with no namespace isolation - this gives insight in interfaces
not tied to the kata containers, and causes an increase in resource usage for
kata metrics.

As the shim itself is not using the network (all its communication with
other processes is done with local unix sockets), there is no reason to
keep gathering and reporting shim-specific network metrics.
Actual network usage of the kata containers can be found from the existing
hypervisor network metrics (kata_hypervisor_netdev) and from the agent
network metrics (kata_guest_netdev_stat).

Fixes: #5738

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-02-22 14:00:00 +01:00
Alex Lyn
014e0f4e46 runtime-rs: bugfix for GPU passthrough failed with InvalidOperation.
We need initailize the pci_hotplug_enabled with true before we do GPU
passthrough with runtime-rs/dragonball. Otherwise it fails with error
`InvalidOperation`.

Fixes: #9129

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-02-22 10:22:32 +08:00
Dan Mihai
d84f50db5b genpolicy: fix typo in policy logging
Improve logging, for easier debugging.

Fixes: #9072

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-21 18:08:07 +00:00
Greg Kurz
600b951afd agent: Run container workload in its own cgroup namespace
When cgroup v2 is in use, a container should only see its part of the
unified hierarchy in `/sys/fs/cgroup`, not the full hierarchy created
at the OS level. Similarly, `/proc/self/cgroup` inside the container
should display `0::/`, rather than a full path such as :

0::/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podde291f58_8f20_4d44_aa89_c9e538613d85.slice/crio-9e1823d09627f3c2d42f30d76f0d2933abdbc033a630aab732339c90334fbc5f.scope

What is needed here is isolation from the OS. Do that by running the
container in its own cgroup namespace. This matches what runc and
other non VM based runtimes do.

Fixes #9124

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-02-21 13:14:13 +01:00
Greg Kurz
14886c7b32 agent: lint code
Run cargo-clippy to reduce noise in actual functional changes.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-02-21 13:14:13 +01:00
Archana Shinde
6d38fa1530 network: Try removing as many changes as possible during network cleanup
In case an error is encountered while removing a network endpoint during
network cleanup, we cuurently return immediately with the error.
With this change, in case of error we simply log the error and proceed
towards removing the next endpoint. With this, we can cleanup the
network changes made by the shim as much as possible.
This is especially important when multiple interfaces are passed to the
network namespace using a network plugin like multus.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-20 06:08:05 -08:00
Archana Shinde
b005cda689 network: Move up defer block tp cleanup network
Move the defer for cleaning up network before the call to add network.
This way if any change made by add network is reverted by in case of
failure. This is particulary important for physical network interfaces
as with this step we make sure that driver for the physical interface is
reverted back to the original host driver. Without this the physical
network iterface will remain bound to vfio.

Fixes: #8646

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-20 06:06:42 -08:00
ChengyuZhu6
96c297cb37 runtime: fix checksum mismatch error in make vendor
Fix checksum mismatch error in `make vendor`.

Fixes: #9111

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-18 22:22:38 +08:00
Fabiano Fidêncio
eea4277fbf
runtime: Update runc to v1.1.12
Although we don't seem to be affected by
https://nvd.nist.gov/vuln/detail/CVE-2024-21626, we vendor and use the
runc package in a few different places of our code, and we better update
the package to its latest release.

Fixes: #9097

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-14 23:13:39 +01:00
Greg Kurz
d7afd31fd4
Merge pull request #8455 from BbolroC/runtime-rs-qemu-config
runtime-rs: Add a new config option for QEMU
2024-02-10 08:48:23 +01:00
Dan Mihai
a054462eb7
Merge pull request #9051 from microsoft/danmihai1/k8s-copy-file
tests: k8s: k8s-copy-file auto-generated policy
2024-02-09 12:30:49 -08:00
Hyounggyu Choi
05c4c8055c runtime-rs: Configure argument replacement for QEMU in Makefile
Last but not least, all placeholders for argument replacement
should be configured to generate a configuration file when `QEMUCMD`
is defined. This enriches those variables.

Additionally, this involves creating a symbolic link to `configuration-qemu.toml`
if QEMU is defined as the default hypervisor.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-09 19:31:20 +01:00
Hyounggyu Choi
27cb30d8ce runtime-rs: Adjust configuration template for runtime-rs
There are some variables newly introduced to runtime-rs, such as:

- runtime.name
- runtime.hypervisor_name
- runtime.agent_name
- vm_rootfs_driver

Additionally some of the placeholders for argument replacement are
made hypervisor-specific based on the changes made for dragonball.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-09 16:26:59 +01:00
Steve Horsman
b99f574522
Merge pull request #9037 from niteeshkd/nd_SevSnpGuest
runtime: fix creation of SEV confidential container on SNP enabled host.
2024-02-08 09:29:20 +00:00
Greg Kurz
6ead48ec06
Merge pull request #8986 from pmores/drop-shim-v2-address-value-validation
runtime-rs: fix interoperability issues between runtime-rs and cri-o
2024-02-08 09:44:12 +01:00
Dan Mihai
9a780aa98f genpolicy: improve logging from ExecProcessRequest
Additional logging from the ExecProcessRequest rules, for easier
debugging.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
dab567bdfa genpolicy: add easy way to allow CloseStdinRequest
For example, Kata CI's k8s-copy-file.bats transfers files between the
Host and the Guest using "kubectl exec", and that results in
CloseStdinRequest being called from the Host.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
8401adb113 genpolicy: update default values
1. Remove PullImageRequest because that is not used in the main
   branch. It was used in the CCv0 branch.

2. Add default false values for the remaining Kata Agent ttrpc
   requests.

These changes don't change the functionality of the auto generated
Policy, but they help with easier understanding the Policy text and
the logging from the Rego rules.

Fixes: #9049

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-08 02:21:58 +00:00
Dan Mihai
535db6b29c
Merge pull request #9043 from ChengyuZhu6/assert
runtime-rs: fix assert error in `make check`
2024-02-07 18:19:18 -08:00
Dan Mihai
01745689e1
Merge pull request #9029 from microsoft/danmihai1/k8s-empty-dirs
genpolicy: mount source for non-confidential guest
2024-02-07 11:26:16 -08:00
Pavel Mores
6346e04cf7 runtime-rs: fix handling of TTRCP_ADDRESS
Since cri-o doesn't seem to use address for event publishing as mentioned
in the previous commit it will not send it.  However, the exact way of
not sending it is unfortunately different from what is assumed by
runtime-rs.  Due to an implementation detail of cri-o which uses containerd
libraries for some low-level tasks, TTRPC_ADDRESS will not be missing from
environment as assumed, instead it will be present with an empty value.

This commit contains a small adjustment to account for that and use
LogForwarder even if TTRPC_ADDRESS is present, but with an empty value.

Fixes #8985

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-02-07 17:01:04 +01:00
ChengyuZhu6
34c47e08b2 runtime-rs: fix assert error in test in make check
Fix assert error:
error: used `assert_eq!` with a literal bool
   --> crates/hypervisor/src/ch/inner.rs:218:9
    |
218 |         assert_eq!(state.jailed, false);
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bool_assert_comparison
    = note: `-D clippy::bool-assert-comparison` implied by `-D warnings`

Fixes: #9042

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-07 19:31:10 +08:00
Archana Shinde
d9ce88ada3
Merge pull request #8704 from amshinde/runtime-rs-clh-implement-persist
runtime-rs: implement persist api for cloud-hypervisor
2024-02-07 02:29:33 -08:00
Niteesh Dubey
3e383674f8 runtime: fix creation of SEV confidential container on SNP enabled host.
This is needed to fix the bug which is not allowing to create SEV container
on SNP enabled host anymore. This is a regression that was introduced as
part of the following commit:
de39fb7d38

Fixes: #9036

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-02-06 19:01:30 +00:00
Hyounggyu Choi
462afcf829 runtime-rs: Copy configuration for QEMU from runtime
It makes sense to reuse a configuration template for runtime-golang
as a base. This is simply to copy it into the config directory.

Fixes: #8441

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-06 19:35:44 +01:00
Pavel Mores
f0256fded5 runtime-rs: remove validation of shim v2 -address value
It appears that under the shim v2 protocol, a shim has no use of its own
for the -address value, it just passes it back to container runtime's
(mostly containerd or cri-o) event-publishing binary.  Since the -address
value only flows through the shim, being passed to the shim by a container
runtime and then essentially passed back by shim to the container runtime,
it seems inappropriate for a shim to validate the value that is fully
owned and only used by the container runtime.

This commit removes such validation from runtime-rs.  Doing so, it solves
(part of) an interoperability problem between runtime-rs and cri-o.  cri-o
seems to intentionally choose not to implement the event-publishing part
of the shim v2 protocol and thus it has no value it could pass to
runtime-rs for -address.  As a result, it sends an empty string which has
been failing the excessive validation performed by runtime-rs so far.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-02-06 13:43:09 +01:00
Alex Lyn
1ab9a21492
Merge pull request #8552 from deagon/fix/missing-port-type
runtime: missing port type in the DeviceInfo
2024-02-06 10:56:46 +08:00
Dan Mihai
473efc2149 genpolicy: mount source for non-confidential guest
The emergent Kata CI tests for Policy use confidential_guest = false
in genpolicy-settings.json. That value is inconsistent with the
following mount settings:

        "emptyDir": {
            "mount_type": "local",
            "mount_source": "^$(cpath)/$(sandbox-id)/local/",
            "mount_point": "^$(cpath)/$(sandbox-id)/local/",
            "driver": "local",
            "source": "local",
            "fstype": "local",
            "options": [
                "mode=0777"
            ]
        },

We need to keep those settings for confidential_guest = true, and
change confidential_guest = false to use:

        "emptyDir": {
            "mount_type": "local",
            "mount_source": "^$(cpath)/$(sandbox-id)/rootfs/local/",
            "mount_point": "^$(cpath)/$(sandbox-id)/local/",
            "driver": "local",
            "source": "local",
            "fstype": "local",
            "options": [
                "mode=0777"
            ]
        },

The value of the mount_source field is different.

This change unblocks testing using Kata CI's pod-empty-dir.yaml:

genpolicy -u -y pod-empty-dir.yaml

kubectl apply -f pod-empty-dir.yaml

k get pod sharevol-kata
NAME            READY   STATUS    RESTARTS   AGE
sharevol-kata   1/1     Running   0          53s

Fixes: #8887

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-06 01:19:48 +00:00
Fabiano Fidêncio
1362918ff0
Merge pull request #9011 from fidencio/topic/switch-to-using-the-confidential-rootfs
runtime: Replace TEE specific initrd / image for the confidential one
2024-02-05 10:43:12 +01:00
Guoqiang Ding
6068faf40b runtime: failed to run in the case of ColdPlugVFIO
Add the missing port type in the DeviceInfo.

Fixes: #9014
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-02-05 17:30:11 +08:00
Archana Shinde
b3c74411f6 runtime-rs: Add tests for persist api for clh
Add tests to check clh struct is saved/restored correctly.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:57 -08:00
Archana Shinde
0b78296dca runtime-rs: Store additional field for hypervisor state
Implementing Persist API for cloud-hypervisor was done partially with
initial support for cloud-hypervisor. Store and retrieve additional
fields to/from the hypervisor state.

Fixes: #6202

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:57 -08:00
Archana Shinde
a5f0b92bca runtime-rs: Add guest protection to hypervisor state
Store guest-protection used while storing the state of the hypervisor.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-04 22:03:54 -08:00
Alex Lyn
cf74166d75
Merge pull request #9015 from Apokleos/bugfix-exec-uds
runtime: display accurate error msg to avoid misleading users.
2024-02-05 13:50:43 +08:00
Alex Lyn
c6830ceb89 runtime: display accurate error msg to avoid misleading users.
The original handling method does not reach user expectations.
When the ClientSocketAddress method stats the corresponding
path of runtime-rs and has not found it yet, we should return
an error message here that includes the reason for the failure
(which should be an error display indicating that both runtime-go
and runtime-rs were not found). Instead of simply displaying the
corresponding path of runtime-rs as the final error message to
users.
It is also necessary to return the error promptly to the caller
for further error handling.

Fixes: #8999

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-02-04 16:45:59 +08:00
Guoqiang Ding
7bf1ebe16d kata-monitor: fix agentUrl from containerd shim
Fix the missing leading slash.

Fixes: #9013
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-02-04 16:24:13 +08:00
Fabiano Fidêncio
e4258d8694
runtime: Use confidential image / initrd instead of TEE specific ones
Now that we have a confidential image / initrd being built, instead of a
specific one for each TEE, let's use it everywhere possible.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-03 13:20:14 +01:00
Fabiano Fidêncio
3755c69165
runtime: makefile: remove SNP specific kernel references
As this is not used anymore, we can go ahead and just remove it

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:12:21 +01:00
Fabiano Fidêncio
57b132f94c
runtime: makefile: remove SEV specific kernel references
As this is not used anymore, we can go ahead and just remove it

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:12:21 +01:00
Fabiano Fidêncio
2562d23242
runtime: makefile: remove TDX specific kernel references
As this is not used anymore, we can go ahead and just remove it.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:43 +01:00
Fabiano Fidêncio
f4e3c936d8
runtime: snp: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debugability-sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:36 +01:00
Fabiano Fidêncio
8731366d7b
runtime: sev: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debugability-sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 21:11:36 +01:00
Fabiano Fidêncio
6cbdba7268
runtime: tdx: config: Use the confidential kernel
As we're building a single confidential kernel, we should rely on it
rather than keep using the specific ones for TDX / SEV / SNP.

However, for debugability-sake, let's do this change TEE by TEE.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 17:13:06 +01:00
Fabiano Fidêncio
a618461d3a
runtime: Add confidential kernel to the makefile
With this we can properly generate and the the `-confidential` kernel,
which supports SEV / SNP / TDX as part of our configuration files.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-02-02 17:13:05 +01:00
Wenyuan Liu
cb888516c1
Merge pull request #8760 from fadecoder/reduce_go_runtime_mounts
runtime: Reduce the mount points with namespace isolation
2024-02-02 16:54:44 +08:00
Greg Kurz
d1a26ead94
Merge pull request #8454 from BbolroC/compile-with-qemu-s390x
runtime-rs: make compilation for QEMU on s390x
2024-02-02 09:29:32 +01:00
Hyounggyu Choi
bb6f5073aa runtime-rs: Allow compilation for s390x
Until now, runtime-rs couldn't be compiled on s390x.
We need to lift those restrictions in Makefile first.

Fixes: #8446

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 23:48:15 +01:00
Dan Mihai
6f1062b5d6
Merge pull request #8966 from microsoft/danmihai1/k8s-sandbox-vcpus-allocation
genpolicy: ignore empty YAML as input
2024-02-01 13:51:02 -08:00
Dan Mihai
8f9c92c0ee
Merge pull request #8977 from microsoft/danmihai1/default-namespace
genpolicy: support non-default namespace name
2024-02-01 13:50:33 -08:00
Hyounggyu Choi
8fcee6e6ec runtime-rs: Use Persist::restore() of QEMU for VirtSandbox
It fails to compile virt_container because Dragonball is only
used in the implementation of the trait method Persist::restore().
As the hypervisor is not compiled on s390x and QEMU implements
the trait method, this commit is to let the method use QEMUi's.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 18:02:10 +01:00
Hyounggyu Choi
56aef3741d runtime-rs: Exclude hypervisors plugins except QEMU for s390x
Dragonball and cloud-hypervisor are not supported on s390x.
We need to exclude the plugins for these hypervisors from compilation.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-02-01 18:02:10 +01:00
Zhigang Wang
9317e23df1 mount: Reduce the mount points with namespace isolation
This patch can reduce load on systemd process, and
increase the k8s deployment density when using go runtime.

Fixes: #8758

Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
Signed-off-by: Liu Wenyuan <liuwenyuan9@huawei.com>
2024-02-01 18:34:24 +08:00
Xuewei Niu
2332552c8f
Merge pull request #7483 from frezcirno/passfd_io_feature
runtime-rs: improving io performance using dragonball's vsock fd passthrough
2024-02-01 14:53:53 +08:00
Alex Lyn
cf26c16017
Merge pull request #8931 from yaoyinnan/8930/feat/merge-ValidCgroupPath
runtime: merged ValidCgroupPath method
2024-02-01 12:53:55 +08:00
Alex Lyn
a157fc3b74
Merge pull request #8974 from yaoyinnan/5240/fix/cgroup-parallel
runtime: add SingleContainer when obtaining OCI Spec
2024-02-01 11:43:02 +08:00
Alex Lyn
1b8f3ce28a
Merge pull request #8929 from yaoyinnan/8838/fix/error-message
runtime-rs: report error on missing or empty fields in configuration
2024-02-01 11:02:30 +08:00
Dan Mihai
09ea0eed9d genpolicy: ignore empty YAML as input
Kata CI's pod-sandbox-vcpus-allocation.yaml ends with "---", so the
empty YAML document following that line should be ignored.

To test this fix:

genpolicy -u -y pod-sandbox-vcpus-allocation.yaml

Fixes: #8895

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-01 02:22:21 +00:00
Dan Mihai
befef119ff
Merge pull request #8941 from malt3/genpolicy-flags
genpolicy: allow separate paths for rules and settings files
2024-01-31 18:14:12 -08:00
Dan Mihai
21125baec3
Merge pull request #8962 from microsoft/danmihai1/config-map-optional2
genpolicy: ignore volume configMap optional field
2024-01-31 12:29:30 -08:00
Greg Kurz
8b1dc06971
Merge pull request #8938 from pmores/log-qemus-stderr-in-shim-log
runtime-rs: Log qemu's stderr in shim log
2024-01-31 18:04:28 +01:00
Dan Mihai
f0339a79a6 genpolicy: support non-default namespace name
Allow users to specify in genpolicy-settings.json a default cluster
namespace other than "default". For example, Kata CI uses as default
namespace: "kata-containers-k8s-tests".

Fixes: #8976

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-31 15:47:01 +00:00
Zixuan Tan
222de4f684 agent: Fix a race condition in passfd_io.rs
There is a race condition in agent HVSOCK_STREAMS hashmap, where a
stream may be taken before it is inserted into the hashmap. This patch
add simple retry logic to the stream consumer to alleviate this issue.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
6e4d4c329a agent,runtime-rs: Add license header to passfd_io.rs
Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
1206de2c23 agent: Use pipes as stdout/stderr of container process
Linux forbids opening an existing socket through /proc/<pid>/fd/<fd>,
making some images relying on the special file /dev/stdout(stderr),
/proc/self/fd/1(2) fail to boot in passfd io mode, where the
stdout/stderr of a container process is a vsock socket.

For back compatibility, a pipe is introduced between the process
and the socket, and its read end is set as stdout/stderr of the
container process instead of the socket. The agent will do the
forwarding between the pipe and the socket.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
f6710610d1 agent,runtime-rs,runk: fix fmt and clippy warnings
Fix rustfmt and clippy warnings detected by CI.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
89be42a177 runtime-rs: open stdout and stderr fifos NONBLOCK
This patch adds O_NONBLOCK flag when open stdout and stderr FIFOs
to avoid blocking.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
3eb4bed957 agent: use biased select to avoid data loss
This patch uses a biased select to avoid stdin data loss in case of
CloseStdinRequest.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
7874ef5fd2 agent: set stdout/err vsock stream as blocking before passing to child
In passfd io mode, when not using a terminal, the stdout/stderr vsock
streams are directly used as the stdout/stderr of the child process.
These streams are non-blocking by default.

The stdout/stderr of the process should be blocking, otherwise
the process may encounter EAGAIN error when writing to stdout/stderr.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Fupan Li
cfb262d02f container: keep the io connection when pass fd to hybrid vsock
We want the io connection keep connected when the containerd closed
the io pipe, thus it can be attached on the io stream.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-01-31 21:07:48 +08:00
Fupan Li
4a762fcfdd dbs: hybrid stream support keep the connection when local closed
Support the hybrid fd passthrough mode with passing pipe fd,
which can specify this connection kept even when the pipe
peer closed, and this connection can be reget wich re-opening
the pipe.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
5536743361 agent,runtime-rs: fix container io detach and attach
Partially fix some issues related to container io detach and attach.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
657b17a86f runtime-rs: open stdin fifo with RDWR|NONBLOCK when pass vsock streams
In linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic
when creating containers in detached mode, as the stdin FIFO writer
is closed after the container is created, resulting in this situation.

In passfd io mode, open stdin fifo with O_RDWR|O_NONBLOCK to avoid the
HUP event.

Fixes: #6714
Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
f1b33fd2e0 agent: clean up term master fd when container exits
When container exits, the agent should clean up the term master fd,
otherwise the fd will be leaked.

Fixes: kata-containers#6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
b8632b4034 dragonball: vsock: properly handle EPOLLHUP/EPOLLERR events
When one end of the connection close, the epoll event will be triggered
forever. We should close the connection and kill the connection.

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
442df71fe5 agent,runtime-rs: refactor process io using vsock fd passthrough feature
Currently in the kata container, every io read/write operation requires
an RPC request from the runtime to the agent. This process involves
data copying into/from an RPC request/response, which are high overhead.

To solve this issue, this commit utilize the vsock fd passthrough, a
newly introduced feature in the Dragonball hypervisor. This feature
allows other host programs to pass a file descriptor to the Dragonball
process, directly as the backend of an ordinary hybrid vsock connection.

The runtime-rs now utilizes this feature for container process io. It
open the stdin/stdout/stderr fifo from containerd, and pass them to
Dragonball, then don't bother with process io any more, eliminating
the need for an RPC for each io read/write operation.

In passfd io mode, the agent uses the vsock connections as the child
process's stdin/stdout/stderr, eliminating the need for a pipe
to bump data (in non-tty mode).

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
eb6bb6fe0d config: add two options to control vsock passthrough io feature
Two toml options, `use_passfd_io` and `passfd_listener_port` are introduced
to enable and configure dragonball's vsock fd passthrough io feature.

This commit is a preparation for vsock fd passthrough io feature.

Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Zixuan Tan
973b5ad1f4 runtime-rs: make Container::new async
Fixes: #6714

Signed-off-by: Zixuan Tan <tanzixuan.me@gmail.com>
2024-01-31 21:07:48 +08:00
Malte Poll
531a11159f genpolicy: allow separate paths for rules and settings files
Using custom input paths with -i is counter-intuitive. Simplify path handling with explicit flags for rules.rego and genpolicy-settings.json.

Fixes: #8568

Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
2024-01-31 11:00:19 +01:00
yaoyinnan
9aa1ed805a runtime: add SingleContainer when obtaining OCI Spec
When creating a cgroup, add a SingleContainer when obtaining the OCI Spec to apply to ctr, podman, etc.

Fixes: #5240

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 15:24:07 +08:00
yaoyinnan
b0b8523cea runtime: modify ValidCgroupPath unit test
Modify ValidCgroupPath unit test.

Fixes: #8930

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 14:37:17 +08:00
yaoyinnan
feed5c8ff9 runtime: merged ValidCgroupPath method
Merged ValidCgroupPath method to handle cgroupv1 and cgroupv2.

Fixes: #8930

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 14:37:13 +08:00
yaoyinnan
864389c524 runtime-rs: report error on missing or empty fields in configuration
Removed the setting of default values for runtime fields. Added explicit checks for missing or empty fields, reporting errors with clear messages.

Fixes: #8838

Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-31 12:46:17 +08:00
Kvlil
3fd5628771 dragonball: fix noop-method-call warning
The `noop-method-call` is a rustc lint that has existed since v1.52.0.
This lint has been moved to the warn by default lint level since v1.73.0.
Therefore build is failing with this version and above.
This commit removes the unnecessary call to `<&T as Deref>::deref` on `T: !Deref`.

Fixes: #8586

Signed-off-by: Kvlil <kalil.pelissier@gmail.com>
2024-01-30 17:16:49 +00:00
Wainer Moschetta
bf54a02e16
Merge pull request #8924 from microsoft/danmihai1/pod-nested-configmap-secret
genpolicy: fix ConfigMap volume mount paths
2024-01-30 14:09:41 -03:00
Dan Mihai
d12875ee66 genpolicy: ignore volume configMap optional field
The auto-generated Policy already allows these volumes to be mounted,
regardless if they are:
- Present, or
- Missing and optional

Fixes: #8893

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-30 15:32:37 +00:00
Xuewei Niu
7e10000b6f
Merge pull request #8928 from yaoyinnan/8927/fix/unused-DriverInfo
runtime-rs: fix unused driverInfo error
2024-01-30 20:39:10 +08:00
Pavel Mores
d53edbd0a5 runtime-rs: collect qemu stderr and log it in shim log
Qemu stderr monitoring runs in its own asynchronous green thread.
For that, `stderr` is taken out of the Child representing the qemu child
process to avoid partial move and make it possible for the main thread
still to call functions on QemuInner::qemu_process (e.g. kill(), id()).

Fixes #8937

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-30 09:09:05 +01:00
Pavel Mores
684d740122 runtime-rs: switch qemu child process management from std to tokio
We'll want to capture qemu's stderr in parallel with normal runtime-rs
execution.  Tokio's primitives make this much easier than std's.  This
also makes child process management more consistent across runtime-rs
(i.e. virtiofsd child process is already launched and managed using tokio).

Some changes were necessary due to tokio functions being slightly different
from their std counterparts.  Child::kill() is now async and Child::id()
now returns an Option.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-30 09:07:14 +01:00
Dan Mihai
6a8f46f3b8
Merge pull request #8918 from microsoft/danmihai1/metadata
genpolicy: optional PodTemplateSpec metadata field
2024-01-29 12:36:30 -08:00
Dan Mihai
60ac3048e9 genpolicy: fix ConfigMap volume mount paths
Allow Kata CI's pod-nested-configmap-secret.yaml to work with
genpolicy and current cbl-mariner images:

1. Ignore the optional type field of Secret input YAML files.

   It's possible that CoCo will need a more sophisticated Policy
   for Secrets, but this change at least unblocks CI testing for
   already-existing genpolicy features.

2. Adapt the value of the settings field below to fit current CI
   images for testing on cbl-mariner Hosts:

    "kata_config": {
        "confidential_guest": false
    },

    Switching this value from true to false instructs genpolicy to
    expect ConfigMap volume mounts similar to:

        "configMap": {
            "mount_type": "bind",
            "mount_source": "$(sfprefix)",
            "mount_point": "^$(cpath)/watchable/$(bundle-id)-[a-z0-9]{16}-",
            "driver": "watchable-bind",
            "fstype": "bind",
            "options": [
                "rbind",
                "rprivate",
                "ro"
            ]
        },

    instead of:

        "confidential_configMap": {
            "mount_type": "bind",
            "mount_source": "$(sfprefix)",
            "mount_point": "$(sfprefix)",
            "driver": "local",
            "fstype": "bind",
            "options": [
                "rbind",
                "rprivate",
                "ro"
            ]
        }
    },

    This settings change unblocks CI testing for ConfigMaps.

Simple sanity testing for these changes:

genpolicy -u -y pod-nested-configmap-secret.yaml

kubectl apply -f pod-nested-configmap-secret.yaml

kubectl get pods | grep config
nested-configmap-secret-pod 1/1     Running   0          26s

Fixes: #8892

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-29 16:13:47 +00:00
Pavel Mores
b52a398469 runtime-rs: move creation of VM path from start_vm() to prepare_vm()
This fixes a flaw pointed out in review of PR #8185.  Creation of the
directory semantically fits better into VM preparation than VM launch.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-27 13:46:35 +01:00
Dan Mihai
076869aa39 genpolicy: ignore the nodeName field
Validating the node name is currently outside the scope of the CoCo
policy.

This change unblocks testing using Kata CI's test-pod-file-volume.yaml
and pv-pod.yaml.

Fixes: #8888

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-26 16:30:55 +00:00
yaoyinnan
9b7c5c69cf runtime-rs: fix unused driverInfo error
Remove the unused DriverInfo declaration or integrate it into the codebase where applicable.

Fixes: #8927
Signed-off-by: yaoyinnan <35447132+yaoyinnan@users.noreply.github.com>
2024-01-26 19:59:52 +08:00
Dan Mihai
8ad5459beb genpolicy: optional PodTemplateSpec metadata field
Add metadata containing the Policy annotation if the user didn't
provide any metadata in the input yaml file.

For a simple sanity test using a Kata CI YAML file:

genpolicy -u -y job.yaml

kubectl apply -f job.yaml

kubectl get pods | grep job
job-pi-test-64dxs 0/1     Completed   0          14s

Fixes: #8891

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-25 19:06:59 +00:00
Dan Mihai
535cf04edb genpolicy: add shareProcessNamespace support
Validate the sandbox_pidns field value for CreateSandbox and
CreateContainer.

Fixes: #8868

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-25 16:48:57 +00:00
Kvlil
a4b208a712 runtime: remove SharedVersions field dead code
SharedVersion fiel add a versiontable property that isn't supported by upstream QEMU.
This is dead code since virtcontainers isn't setting SharedVersions to true.

Fixes: #7720

Signed-off-by: Kvlil <kalil.pelissier@gmail.com>
2024-01-22 12:18:42 +00:00
Fabiano Fidêncio
1e30fde8fa
Merge pull request #8862 from microsoft/danmihai1/genpolicy-dns
genpolicy: ignore pod DNS settings
2024-01-19 23:08:26 +01:00
Dan Mihai
ca03d47634 genpolicy: ignore pod DNS settings
Ignore pod DNS settings because policing the network traffic is
currently outside the scope of the Agent Policy.

Example from Kata CI: pod-custom-dns.yaml

Fixes: #8832

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-19 16:42:35 +00:00
Alex.Lyn
826c751bf3
Merge pull request #8185 from pmores/add-qemu-cmdline-generation-framework
Add qemu cmdline generation framework
2024-01-19 21:42:49 +08:00
Pavel Mores
25c8d5db5d runtime-rs: use qemu cmdline generation framework to launch VM
Deploy the framework added by the previous commit to generate qemu
command line and launch the VM.

We now properly store the child process object which allows us to
implement remaining Hypervisor functions necessary for a simple but
successful VM lifecycle, get_vmm_master_tid() and stop_vm().

Fixes #8184

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-19 11:42:23 +01:00
Amulyam24
f6fea5f2ca agent: fix failing unit tests on ppc64le
- test_volume_capacity_stats: verify the file block size against the fetched size via statfs()
 - test_reseed_rng: Correct the request codes for RNDADDTOENTCNT and RNDRESEEDCRNG when platform is ppc64le
 - test list_routes: Add the route only if destination is not empty
 - test_new_fs_manager: skip the test if cgroups v2 is used by default
 - skip test cases rpc::tests::test_do_write_stream, sandbox::tests::test_find_process, sandbox::t
ests::test_find_container_process and sandbox::tests::add_and_get_container on ppc64le as they are fl
aky

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:32:16 +01:00
Hyounggyu Choi
610f878894 dragonball: Fix compile error for aarch64
This is to fix a compile error raised for aarch64.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:32:15 +01:00
Amulyam24
376941cf69 kata-ctl: skip building kata-ctl on ppc64le
kata-ctl currently fails to build on ppc64le. Skip it for running static checks and the issues will be fixed and tracked in a seperate issue.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
4ecd82a5df runk: skip the test_init_container_create_launcher if not root on ppc64le
This is to skip the test_init_container_create_launcher if not root on ppc64le.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
a4b5447924 tools: fix makefile spacing
This minor PR removes the extra space in the makefiles.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
394777291d runtime: fix failing unit tests on ppc64le
A few CPU related test cases were failing as the version was being verified against Power8 while the CI machine is Power9.

Fixes: #5531

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Amulyam24
486b8a0538 dragonball: skip running static-checks for ppc64le
Since dragonball is not currently supported on ppc64le, skip running the targets for static-checks.

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-01-18 16:31:13 +01:00
Hyounggyu Choi
290ecf4c46 Static-check: Exclude s390x from dragonball and runtime-rs
At the moment, a project `dragonball` and `runtime-rs` does not support
for s390x. During the enablement, some errors due to the misconfiguration
of Makefile for `make check` and `make vendor` were identified.

This is to skip the build for the affected target of the projects.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:31:13 +01:00
Hyounggyu Choi
c0f57c9e0a Lint: Fix cargo clippy errors for s390x
Some linting errors were identified during the enablement of `make check`.
These have not been found by the Jenkins CI job because `make test` was
only triggered.

The errors for the `agent` occurs under the s390x specific tests while
the other ones for the `kata-ctl` are the architecture-specific code.

This commit is to fix those errors.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-01-18 16:31:13 +01:00
Jianyong Wu
ba74a624a8 runtime-rs: use pathBuf only for x86
PathBuf here is only used for x86.

Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2024-01-18 16:31:13 +01:00
Chelsea Mafrica
32ad465663
Merge pull request #8710 from jodh-intel/runtime-rs-ch-get-thread-ids
runtime-rs: ch: Implement minimal implementation for missing thread/pid APIs
2024-01-17 14:51:44 -08:00
Fabiano Fidêncio
147d5fd752
Merge pull request #8836 from microsoft/danmihai1/test-with-cbl-mariner
genpolicy: use root path from cbl-mariner Guest VM
2024-01-17 17:51:44 +01:00
Pavel Mores
f550d9a325 runtime-rs: add basic implementation of qemu command line generation
This current framework is enough to launch a VM with a simple container
in it (e.g. busybox).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-17 12:55:00 +01:00
Pavel Mores
e8e13044da runtime-rs: add simple impls to some of Qemu's Hypervisor functions
The idea of most of these is just to prevent running into todo!()s where
we can at the moment, while implementing the fundamental functionality of
VM launch.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-17 12:55:00 +01:00
Dan Mihai
69557e5ad6
Merge pull request #8814 from microsoft/danmihai1/genpolicy-kata-deploy
tools: genpolicy static checks
2024-01-16 07:33:42 -08:00
Dan Mihai
13f2398fe8
Merge pull request #8837 from microsoft/danmihai1/allow_storages
genpolicy: temporarily disable allow_storages()
2024-01-16 07:10:49 -08:00
alex.lyn
99717371c1 runtime-rs: bugfix for DirectVolume/rawblock when driver is blk
DirectVolume/Rawblock doesn't work well when device's block driver
is virtio-blk-pci and the storage handler is DRIVER_BLK_PCI_TYPE.

Fixes: #8707

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-16 10:35:08 +08:00
Dan Mihai
205dafd323 genpolicy: temporarily disable allow_storages()
Temporarily disable the allow_storages() rules, because they are based
on the tarfs snapshotter + container image integrity information that
are not available yet in the main branch - see #8833.

Fixes: #8834

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 23:55:27 +00:00
Dan Mihai
f4106a6107 genpolicy: use root path from cbl-mariner Guest VM
Adjust genpolicy-settings.json to match the container root path from
the main branch + cbl-mariner Guest VMs.

This configuration might have to be adjusted again when other types of
Guest VMs will be tested during CI using genpolicy, in the future.

Also, improve logging from allow_root_path(), to easier debug these
issues in the future.

Fixes: #8835

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 23:33:28 +00:00
Dan Mihai
201eec628a tools: genpolicy static checks
Package genpolicy and enable static checks for it.

Fixes: #8813

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-15 16:49:58 +00:00
Fabiano Fidêncio
0dc00ae373
Merge pull request #8822 from microsoft/danmihai1/cargo-clippy
genpolicy: cargo clippy fixes
2024-01-15 14:59:04 +01:00
Xuewei Niu
923bd65dff
Merge pull request #8819 from justxuewei/rm-protocol-backend
dragonball: Remove unused definition
2024-01-15 10:09:46 +08:00
Dan Mihai
681cb1626a genpolicy: cargo clippy fixes
Clean up cargo clippy errors.

Fixes: #8818

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-14 01:23:46 +00:00
Xuewei Niu
f1fda3d6b0 dragonball: Remove unused definition
`EndpointProtocolFlags::ProtocolBackend` is removed due to no reference.

Fixes: #8745

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-13 13:25:11 +08:00
Dan Mihai
dcaae54cf6 genpolicy: "cargo fmt -- --check" clean-up
Also, update Cargo.lock

Fixes: #8816

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-13 01:57:00 +00:00
Fabiano Fidêncio
a606401722
Merge pull request #8803 from jodh-intel/issues-8784-runtime-rs-ch-rm-todo-to-unbreak
runtime-rs: ch: Unbreak CH driver
2024-01-11 19:37:13 -03:00
Fabiano Fidêncio
86a6d133e4
Merge pull request #8248 from microsoft/danmihai1/genpolicy-main
tools: add policy generation tool
2024-01-11 17:02:54 -03:00
James O. D. Hunt
29e0de4e4a runtime-rs: ch: Implement minimal memory hotplug APIs
Replace the `todo!()` calls with a minimal NOP implementation to return
the CH driver to working order since the `todo!()`'s forcibly crash the
driver at runtime. Full implementations for these APIs will be added on
issues #8800, #8801, and #8802.

Fixes: #8784.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-01-11 14:11:31 +00:00
James O. D. Hunt
1c0df670af runtime-rs: ch: Add minimal implementation of hypervisor metrics method
Remove the `todo!()` macro which would cause a runtime crash and replace
with a implementation that returns an error as a stop-gap until #8800 is
implemented.

Fixes: #8785.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-01-11 14:11:01 +00:00
Hyounggyu Choi
f62ec0a7f5
Merge pull request #8693 from BbolroC/ibm-se-config-validation-fix
runtime: Allow no initrd path for IBM Z Secure Execution
2024-01-11 09:53:51 +01:00
Xuewei Niu
70305fefc5
Merge pull request #8780 from justxuewei/containerd-events
runtime-rs: Forward events to containerd via ttrpc
2024-01-11 14:58:14 +08:00
Xuewei Niu
6fd49f7604 runtime-rs: Forward events to containerd via ttrpc
It is a little bit heavy for the runtime-rs to forwards events via
containerd CLI, contrast to the ttrpc way. Plus, for runtimes that haven't
this mechanism, e.g. CRI-O, we can't get those events anywhere.

This patch introduces two types of forwarders:

- `ContainerdForwarder`: Acquire ttrpc address from environment variables
  and forward events via ttrpc connection.
- `LogForwarder`: Write event info into logs.

Fixes: #7881

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-11 10:32:50 +08:00
Alex.Lyn
695440a431
Merge pull request #8749 from Apokleos/fixup-dragonball-vfio
runtime-rs: fixup vfio device in runtime-rs/dragonball
2024-01-10 15:20:34 +08:00
Greg Kurz
e3611cf27d
Merge pull request #8326 from cheriL/8325/fix_method_param
agent: use method params instead of const params in functions
2024-01-09 07:35:19 +01:00
Pavel Mores
0cfb2d2570 runtime-rs: add simple Persist implementation for Qemu
This is not necessarily meant to work, just to stub out unimplemented
functionality while focusing on more fundamental things.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-08 13:12:39 +01:00
Pavel Mores
45862aeec0 runtime-rs: add default rootfs type for qemu
Make sure that rootfs type is known early on even if it's not set in
configuration.toml.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-01-08 13:12:39 +01:00
Xuewei Niu
192c6ee9c3
Merge pull request #8773 from justxuewei/dbs-k8s-fragile 2024-01-05 12:54:32 +08:00
Xuewei Niu
0e9d73fe30 agent: Fix an issue reporting OOM events by mistake
The agent registers an event fd in `memory.oom_control`. An OOM event is
forwarded to containerd when the event is emitted, regardless of the
content in that file.

I observed content indicating that events should not be forwarded, as shown
below. When `oom_kill` is set to 0, it means no OOM has occurred. Therefore,
it is important to check the content to avoid mistakenly forwarding OOM
events.

```
oom_kill_disable 0
under_oom 0
oom_kill 0
```

Fixes: #8715

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-05 11:06:37 +08:00
Dan Mihai
7d5336aca3 agent: hold lock while setting new policy
Don't release the lock between is_allowed and set_policy calls,
because the policy might change in between these calls.

Also, move more policy code into policy.rs.

Fixes: #8734

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-01-04 16:45:30 +00:00
Xuewei Niu
b5a6e74cdf
Merge pull request #8744 from justxuewei/vhu-net-compile
dragonball: Fix compilation issue without all net features
2024-01-04 19:02:55 +08:00
soup
7c176a62fe agent: use method params instead of const params in functions
Fixes: #8325

Signed-off-by: soup <lqh348659137@outlook.com>
2024-01-04 09:29:29 +01:00
Xuewei Niu
f97f16a44a agent-ctl: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8757

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
bf59c7b3d4 runtime-rs: Bump ttrpc and containerd-shim-protos versions
- `ttrpc` from `0.7.1` to `0.8`.
- `containerd-shim-protos` from `0.3.0` to `0.6.0`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
cf9a0e21a1 protocols: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Xuewei Niu
91360e7ddb agent: Bump ttrpc version
- `ttrpc` from `0.7.1` to `0.8`.

Fixes: #8756

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-01-04 15:58:34 +08:00
Chao Wu
f1235ddba3 dbs_virtio_devices: add Cargo.lock
In order to avoid rust-vmm upstream change breaks Dragonball
compilation, we introduce Cargo.lock to dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:23:30 +08:00
Chao Wu
02cd726bfc dbs-utils: add Cargo.lock
In order to avoid rust-vmm upstream change breaks Dragonball
compilation, we introduce Cargo.lock to dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:17:45 +08:00
Chao Wu
97bdc1529b dbs-pci: introduce Cargo.lock
As reported in #8767, we have found that the root cause is that rust-vmm's vmm-sys-utils
introduce a new release 0.12.1 and dbs-pci rely on rust-vmm's vfio-ioctls which uses >=
to declare vmm-sys-utils so it automatically upgrade vmm-sys-utils to 0.12.1.
That's how two different versions of vmm-sys-utils is introduced and this breaks the compilation.

In order to fix this and also avoid future problems, we introduce Cargo.lock file to dbs crates.

fixes: #8770

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-01-04 11:11:56 +08:00
alex.lyn
d2080fd221 runtime-rs: refactor getting the vfio device guest pci path
Fixes: #8748

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-02 14:28:34 +08:00
alex.lyn
d795fcfc2f runtime-rs: bridge the vfio device between runtime-rs and dragonball
Previously, Dragonball did not support PCI device hot-plugging or
VFIO device passthrough. Therefore, the runtime-rs support for
Dragonball was incomplete. it is time to complete it so that users
can use Dragonball's PCI hot-plugging and VFIO passthrough capabilities.

Fixes: #8748

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2024-01-02 14:28:10 +08:00
Chao Wu
67b91c1eb3
Merge pull request #8740 from openanolis/upstream/pci-6-final
Dragonball: add pci vfio passthrough, hot(un)plug support
2023-12-29 01:58:32 +08:00
Chao Wu
71c322c293 runtime-rs: fix ci complains
vfio commits introduce quite a lot change in runtime-rs, this commit is
for all the changes related to ci, including compilation errors and so on.

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 23:34:41 +08:00
Chao Wu
a3f7601f5a dragonball: add pci hotplug / hot-unplug support
Introduce two new vmm action to implement pci hotplug
and pci hot-unplug: PrepareRemoveHostDevice and RemoveHostDevice.

PrepareRemoveHostDevice is to call upcall to unregister the pci device
in the guest kernel.
RemoveHostDevice should be called after PrepareRemoveHostDevice, it is used
to clean the PCI resource in the Dragonball side.

fixes: #8741

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 16:08:31 +08:00
Chao Wu
0f402a14f9 dragonball: add InsertHostDevice vmm action
Introduce a new vmm action InsertHostDevice to passthrough
host pci devices like NIC or GPU devices into guest so that
users could have high performance usage of those devices.

fixes: #8741

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Helin Guo <helinguo@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-28 16:04:22 +08:00
Xuewei Niu
4c023e341c dragonball: Fix compilation issue without all net features
Combinations of network features were tested:

- None
- virtio-net
- vhost-net
- vhost-user-net
- virtio-net,vhost-net
- vhost-net,vhost-user-net
- virtio-net,vhost-user-net
- virtio-net,vhost-net,vhost-user-net

Fixes: #8742

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-28 11:37:26 +08:00
Alex.Lyn
990a3adf39
Merge pull request #8618 from Apokleos/csi-for-directvol
runtime-rs: Add dedicated CSI driver for DirectVolume support in Kata
2023-12-27 21:27:29 +08:00
alex.lyn
ea69c17008 runtime-rs: initialize pcie topology in Device Manager
Add a pcie_topology field to DeviceManager and initialize
pcie_topology when ResourceManager calls DeviceManager's new()
with TopologyConfigInfo.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:57:23 +08:00
alex.lyn
b42548b8e1 runtime-rs: do unregister device in Trait Device/detach
Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:53:18 +08:00
alex.lyn
0f0b6d13c9 runtime-rs: do register/update device in Trait Device/attach
Before calling the device driver to attach a device, register
the device to PCIe topology and allocate a PciPath for it.

However, for some hypervisor such as CLH, the allocation is invalid
when plugging devices to VM, they have the ability to return
DeviceInfo containing PciPath. It'll update the PciPath with the
returned pci path in the PCIe topology for them to prevent the
inferred pcipath from being different from the actual value returned.

But the update will not be executed if the pcipath value doesn't change.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:49:18 +08:00
alex.lyn
ce7d363695 runtime-rs: Introduce helper macros to simplify PCIe device ops
Introduce helper macros to simplify PCIe device register/unregister
and update, which provides a convenient way to handle devices in
topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:43:58 +08:00
alex.lyn
0d4992b24d runtime-rs: add one more argument in Device attach/detach
Add one more argument with type &mut Option<&mut PCIeTopology>
in attach and detach to inroduce methods within PCIe Topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:40:01 +08:00
alex.lyn
b425de6105 runtime-rs: implement Trait PCIeDevice for pcie/pci device
Implement Trait PCIeDevice register/unregister for pcie/pci
device, such as vfio device which needs set/get device's pci
path for kata agent's device handler.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:33:08 +08:00
alex.lyn
87e39cd1f6 runtime-rs: introduce Trait PCIeDevice to do [un]register device
Introduce Trait PCIeDevice with register/unregister, which are
used to register or unregister pcie device within the PCIe topology.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:29:35 +08:00
alex.lyn
6ebc4884fa runtime-rs: introduce PCIe Topology framework for pcie/pci devices
Due to different ways that different VMMs handle PCI devices,
we expect to provide a general PCIe topology processing framework
that is as compatible as possible with VMMs such as dragonball,
qemu, clh(Though it has its own management method, no conflict).

Currently,it's mainly developed for kinds of PCIe/PCI devices in
dragonball/clh which are attached on the pci/pcie root bus directly.

More will be added when Qemu is ready in runtime-rs.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:29:25 +08:00
alex.lyn
88839026b9 runtime-rs: introduce TopologyConfigInfo to initialize pcie topology
A TopologyConfigInfo added to store device config info for PCIe/PCI
devices in the VM from Hypervisor DeviceInfo.

And TopologyConfigInfo::new will be the entry to initialize PCIe
Topology for each VM.

Fixes: #7218

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-27 15:21:53 +08:00
Chao Wu
8895cb82df
Merge pull request #8724 from openanolis/chao/add_vfio
dragonball: introduce vfio support
2023-12-27 11:40:53 +08:00
Xuewei Niu
43a627c96f
Merge pull request #8632 from adamqqqplay/support-vhost-user-blk
dragonball: introduce vhost-user-blk device
2023-12-27 09:54:21 +08:00
Chao Wu
2f797a6eb7 pci: rename 2 parameters to follow rust naming convention
PciCapabilityID -> PciCapabilityId
PciBarRegionType::IORegion -> PciBarRegionType::IoRegion

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-26 23:28:47 +08:00
Chao Wu
9c13b2c990 dragonball: introduce vfio support
vfio mod collects lots of information related to the vfio operations, including VfioMsi and VfioMsix capability & state,
vfio interrupt info, pci region infor and vfio pci device info & state.

fixes: #8722

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-26 23:28:43 +08:00
alex.lyn
ba5437382a runtime-rs: add examples about Kata pod with directvol by CSI.
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:34 +08:00
alex.lyn
c6d2a32146 runtime-rs: add support for directvol csi deploy scripts.
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:34 +08:00
alex.lyn
25d8e83e43 runtime-rs: Add dedicated CSI driver for DirectVolume support in Kata
Bridge the gap between user requirements for direct block device access
and the DirectVolume capabilities provided by Kata runtimes
(kata-runtime/runtime-rs), and facilitate seamless integration with CSI
to improve user experience.

It aims to integrate DirectVolume CSI support into Kata, enabling users
to benefit from its performance and flexibility advantages.

Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 20:36:22 +08:00
Qinqi Qu
81ab174c16 dragonball: support vhost-user-blk in device manager
This patch introduces a feature of supporting vhost-user-blk device.

Fixes: #8631

Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
2023-12-26 20:02:38 +08:00
Qinqi Qu
ef8dc3b0ce dragonball: support vhost-user-blk
This patch introduces a feature of supporting vhost-user-blk device.

This device needs to be defined before the VM instance is started,
which can be done through the dbs-cli tool with --virblks option:
--virblks '{
	"drive_id": "8623",
	"device_type": "Spdk",
	"path_on_host": "spdk:///var/tmp/vhost.sock",
	"is_root_device": false,
	"is_read_only": false,
	"is_direct": false,
	"no_drop": false,
	"num_queues": 1,
	"queue_size": 256
}'

Fixes: #8631

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: fupan <fupan.lfp@antgroup.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Qinqi Qu <quqinqi@linux.alibaba.com>
2023-12-26 20:02:32 +08:00
alex.lyn
3b317e69e2 runtime-rs: add README and user guide to deploy directvol CSI Driver
Fixes: #8602

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-26 18:00:35 +08:00
Xuewei Niu
36a4cbccf6 runtime-rs: Expand all DeviceType in match arms
The compiler will give a warning if a developer forget to add an arm for
a new variants defined.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
f2d08bc00f runtime-rs: Remove unused index from Endpoints
The affected `Endpoint`s are `VhostUserEndpoint` and `TapEndpoint`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
60a42351e2 runtime-rs: DAN supports vhost-user-net device
DAN reads vhost-user-net device from JSON config. It only supports VMM
running as server right now.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
693a0cfbfd dragonball: Make vhost-user-net ready for VhostUserEndpoint
The changes involve:

- Expose VhostUserConfig struct to runtime-rs.
- Set a default value while num_queues or queue_size are 0.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:59 +08:00
Xuewei Niu
54df832407 runtime-rs: Support VhostUserEndpoint
This commit introduces VhostUserEndpoint and supports relative to
vhost-user-net devices for device manager. For now, Dragonball is able to
attach vhost-user-net devices.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-26 10:18:50 +08:00
Xuewei Niu
374c2f01aa runtime-rs: Simplify VhostUserType enum
Remove unused string parameter from each item.

Fixes: #8625

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-25 16:15:57 +08:00
Xuewei Niu
38eb4077a6
Merge pull request #8503 from justxuewei/vhost-user-net
dragonball: Support vhost-user-net device
2023-12-25 13:47:51 +08:00
Xuewei Niu
4c5de72863 dragonball: Wrap config space into set_config_space
Config space of network device is shared and accord with virtio 1.1 spec.
It is a good way to abstract the common part into one function.
`set_config_space()` implements this.

Plus, this patch removes `vq_pairs` from vhost-net devices, since there is
a possibility of data inconsistency. For example, some places read that
from `self.vq_pairs`, others read from `queue_sizes.len() / 2`.

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-25 10:47:34 +08:00
Alex.Lyn
3a3f39aa2d
Merge pull request #8668 from Apokleos/pci-path-refactor
runtime-rs: Refactor the code related to PCI paths and VFIO device driver initialize in DM.
2023-12-23 21:44:07 +08:00
Dan Mihai
080541a0f2 genpolicy: add SPDX license header
Add SPDX license header to rules.rego.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Saul Paredes
7f126be67e genpolicy: Update oci_distribution to 0.10.0
Also support alternative media type and update samples

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
9eb6fd4c24 docs: add agent policy and genpolicy docs
Add docs for the Agent Policy and for the genpolicy tool.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
57f93195ef genpolicy: add support for StatefulSet YAML input
Generate policy for K8s StatefulSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
35958ec9cc genpolicy: add support for ReplicationController
Generate policy for K8s ReplicationController YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
7da17099f2 genpolicy: add support for ReplicaSet YAML input
Generate policy for K8s ReplicaSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
d84300f1ee genpolicy: add support for List YAML input
Generate policy for K8s List YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
a03452637b genpolicy: add support for Job YAML input
Generate policy for K8s Job YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
2dbd01c80b genpolicy: add support for Deployment YAML input
Generate policy for K8s Deployment YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
a40a6003d0 genpolicy: add support for DaemonSet YAML input
Generate policy for K8s DaemonSet YAML.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Dan Mihai
48829120b6 policy: initial genpolicy commit
Add application that infers K8s user's intentions based on user's
K8s YAML file, and generates a Rego/OPA based policy for that YAML.

Just Pod YAML files are supported as input using this initial source
code. Support for other types of YAML files will come with upcoming
commits.

Fixes: #7673

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2023-12-22 15:35:05 +00:00
Chao Wu
8cf3bcefd8 dragonball: introduce pci msi/msix interrupt
introduce msi/msix mod to maintain information for PCI Message Signalled
Interrupt Extended Capability. It will be initialized when parsing pci
configuration space and used when getting interrupt capabilities.

fixes: #8661

Signed-off-by: Gerry Liu <gerry@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Shifang Feng <fengshifang@linux.alibaba.com>
Signed-off-by: Yang Su <yang.su@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Xin Lin <jingshan@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2023-12-22 16:28:22 +08:00
Xuewei Niu
beadce54c5 dragonball: Support vhost-user-net devices
This PR introduces vhost-user-net devices to Dragonball. The devices are
allowed to run as server on the VMM side.

Fixes: #8502

Signed-off-by: Eric Ren <renzhen@linux.alibaba.com>
Signed-off-by: Liu Jiang <gerry@linux.alibaba.com>
Signed-off-by: Zha Bin <zhabin@linux.alibaba.com>
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
Signed-off-by: Zizheng Bian <zizheng.bian@linux.alibaba.com>
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-22 14:53:18 +08:00
Xuewei Niu
1f21d3cb2c dragonball: Introduce address space for MmioV2DeviceState
Vhost-user-net has a dependency on address space from `MmioV2DeviceState`.
The addition of the address space is introduced in this patch. Plus, it
makes sure all unit tests have the according parameter as well.

Fixes: #8502

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-22 14:53:18 +08:00
alex.lyn
94c83cea84 runtime-rs: Refactor vfio driver implementation
It's important to ensure that these tasks which setup vfio
devices are completed before add_device.

So Moving vfio device setup code to a dedicated method at device
building time which does not affect the behavior of other code.

And this change makes it easier to understand the difference
between create and attach, and also makes the boundaries
clearer.

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:37:40 +08:00
alex.lyn
82d3cfdeda runtime-rs: Make VhostUserConfig's field pci_path type more specific
Make VhostUserConfig pci_path's type more specific, change it
from Option<String> to Option<PciPath>.

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:35:38 +08:00
alex.lyn
5cc2890a10 runtime-rs: refactor and re-implement pci path.
Do refactor and re-implement to make the pci path more "rusty".

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-22 10:34:41 +08:00
alex.lyn
1b5758c1f2 runtime-rs: Move the PciPath-related code to a dedicated file
Move the pciPath code to a new file pci_path.rs and update the
references.

Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-21 11:35:18 +08:00
alex.lyn
275de453d5 runtime-rs: remove useless get_host_guest_map and its test case
Fixes: #8665

Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
2023-12-21 11:07:56 +08:00
James O. D. Hunt
7da6d0a845 runtime-rs: ch: Implement missing thread/pid APIs
Add implementations for the following `Hypervisor` trait methods which
simply return the same details as the `get_vmm_master_tid()` method:

- `get_thread_ids()`
- `get_pids()`

Fixes: #6438.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2023-12-20 17:58:40 +00:00
Archana Shinde
7e5868a55f
Merge pull request #8588 from amshinde/runtime-rs-update-readme
runtime-rs: Update readme to indicate cloud-hypervisor support
2023-12-19 22:09:14 -08:00
Hyounggyu Choi
540a2a7fb1 runtime: Allow no initrd path for IBM Z Secure Execution
This is to reintroduce a configuration rule for IBM Z Secure Execution,
where no initrd path should be configured. For the TEE of interest,
only a kernel image should be specified with `confidential_guest=true`.

Fixes: #8692

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2023-12-19 11:21:16 +01:00
Xuewei Niu
ec30d5a9a8
Merge pull request #8700 from justxuewei/dbs-ut
dragonball: Trigger unit tests of dbs_* subcrates by `make test`
2023-12-19 17:51:20 +08:00
Xuewei Niu
039fe7f391 dragonball: Trigger unit tests of dbs_* subcrates by make test
`make SUPPORT_VIRTUALIZATION=1 test` iterates through all subcrates and
does test.

Plus, this patch fixes some issues about unit tests:

- Feed too much parameters to `I8042Device::new()`.
- Virtqueue checks have been introduced since `virtio-queue v0.7.0`.
- GHA might have no access to `/var/tmp` dir on runner.

Fixes: #8690

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2023-12-19 16:22:37 +08:00