Commit Graph

4749 Commits

Author SHA1 Message Date
Aurélien Bombo
f08b594733
Merge pull request #9576 from microsoft/saulparedes/support_env_from
genpolicy: Add support for envFrom
2024-07-24 13:39:54 -07:00
Archana Shinde
49fbae4fb1 agent: Wait for interface in update_interface
For nerdctl and docker runtimes, network is hot-plugged instead of
cold-plugged. While this change was made in the runtime,
we did not have the agent waiting for the device to be ready.
On some systems, the device hotplug could take some time causing
the update_interface rpc call to fail as the interface is not available.

Add a watcher for the network interface based on the pci-path of the
network interface. Note, waiting on the device based on name is really
not reliable especially in case multiple networks are hotplugged.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-24 10:45:56 -07:00
Pavel Mores
dd1e09bd9d runtime-rs: add experimental support for memory hotunplugging to qemu-rs
Hotunplugging memory is not guaranteed or even likely to work.
Nevertheless I'd really like to have this code in for tests and
observation.  It shouldn't hurt, from experience so far.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-07-24 13:22:41 +02:00
Pavel Mores
3095b65ac3 runtime-rs: support hotplugging memory in QemuInner
The bulk of this implementation are simple though tedious sanity checks,
alignment computations and logging.

Note that before any hotplugging, we query qemu directly for the current
size of hotplugged memory.  This ensures that any request to resize memory
will be properly compared to the actual already available amount and only
necessary amount will be added.

Note also that we borrow checked_next_multiple_of() from CH implementation.
While this might look uncleanly it's just a rather temporary solution since
an equivalent function will apparently be part of std soon, likely the
upcoming 1.75.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-07-24 13:22:41 +02:00
Pavel Mores
4a1c828bf8 runtime-rs: support hotplugging memory in Qmp
The algorithm is rather simple - we query qemu for existing memory devices
to figure out the index of the one we're about to add.  Then we add a
backend object and a corresponding frontend device.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-07-24 13:22:41 +02:00
Pavel Mores
0e0b146b87 runtime-rs: support storage & retrieval of guest memblock size in qemu-rs
This will be used for ensuring that hotplugged memory block sizes are
properly aligned.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-07-24 13:22:41 +02:00
Alex Lyn
efb7390357 kata-sys-utils: align OCI Spec with oci-spec-rs
Do align oci spec and fix warnings to make clippy
happy.

Fixes #9766

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-07-24 14:38:48 +08:00
Alex Lyn
012029063c runtime-spec: Introduce runtime-spec for Container State
As part of aligning the Kata OCI Spec with oci-spec-rs,
the concept of "State" falls outside the scope of the OCI
Spec itself. While we'll retain the existing code for State
management for now, to improve code organizationand clarity,
we propose moving the State-related code from the oci/ dir
to a dedicated directory named runtime-spec/.
This separation will be completed in subsequent commits with
the removal of the oci/ directory.

Fixes #9766

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-07-24 14:38:30 +08:00
Dan Mihai
f26d595e5d
Merge pull request #9910 from microsoft/saulparedes/set_policy_rego_via_env
tools: Allow setting policy rego file via
2024-07-22 11:00:30 -07:00
Alex Lyn
67466aa27f kata-types: do alignment of oci-spec for kata-types
Fixes #9766

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-07-21 22:54:43 +08:00
Saul Paredes
2681fc7eb0 genpolicy: Add support for envFrom
This change adds support for the `envFrom` field in the `Pod` resource

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-07-19 09:53:58 -07:00
Greg Kurz
dc97f3f540
Merge pull request #10045 from lifupan/cleanup_container
runtime-rs: container: fix the issue of missing cleanup container
2024-07-19 16:36:04 +02:00
Alex Lyn
d0dc67bb96
Merge pull request #8597 from amshinde/vfio-hotplug-support
Implement hotplug support for physical endpoints
2024-07-19 13:41:11 +08:00
Fupan Li
8a2f7b7a8c container: fix the issue of missing cleanup container
When create container failed, it should cleanup the container
thus there's no device/resource left.

Fixes: #10044

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-07-19 11:02:55 +08:00
ms-mahuber
ddff762782 tools: Allow setting policy rego file via
environment variable

* Set policy file via env var

* Add restrictive policy file to kata-opa folder

* Change restrictive policy file name

* Change relative default path location

* Add license headers

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-07-18 15:05:45 -07:00
Aurélien Bombo
ab6f37aa52
Merge pull request #10022 from microsoft/danmihai1/probes-and-lifecycle
genpolicy: container.exec_commands args validation
2024-07-18 12:21:31 -07:00
Archana Shinde
1636c201f4 network: Implement network hotunplug for physical endpoints
Similar to HotAttach, the HotDetach method signature for network
endoints needs to be changed as well to allow for the method to make
use of device manager to manage the hot unplug of physical network
devices.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:41 -07:00
Archana Shinde
c6390f2a2a vfio: Introduce function to get vfio dev path
This function will be later used to get the vfio dev path.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:41 -07:00
Archana Shinde
1e304e6307 network: Implement hotplug for physical endpoints
Enable physical network interfaces to be hotplugged.
For this, we need to change the signature of the HotAttach method
to make use of Sandbox instead of Hypervisor. Similar approach was
followed for Attach method, but this change was overlooked for
HotAttach.
The signature change is required in order to make use of
device manager and receiver for physical network
enpoints.

Fixes: #8405

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:40 -07:00
Archana Shinde
2fef4bc844 vfio: use driver_override field for device binding.
The current implementation for device binding using driver bind/unbind
and new_id fails in the scenario when the physical device is not bound
to a driver before assigning it to vfio.
There exists and updated mechanism to accomplish the same that does not
have the same issue as above.
The driver_override field for a device allows us to specify the driver for a device
rather than relying on the bound driver to provide a positive match of the
device. It also has other advantages referenced here:
https://patchwork.kernel.org/project/linux-pci/patch/1396372540.476.160.camel@ul30vt.home/

So use the updated driver_override mechanism for binding/unbinding a
physical device/virtual function to vfio-pci.

Signed-off-by: liangxianlong <liang.xianlong@zte.com.cn>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-07-17 16:42:40 -07:00
Dan Mihai
f31c1b121e
Merge pull request #9812 from microsoft/saulparedes/test_policy_on_tdx
gha: enable policy testing on TDX
2024-07-17 08:47:44 -07:00
Dan Mihai
9f4d1ffd43 genpolicy: container.exec_commands args validation
Keep track of individual exec args instead of joining them in the
policy text. Verifying each arg results in a more precise policy,
because some of the args might include space characters.

This improved validation applies to commands specified in K8s YAML
files using:

- livenessProbe
- readinessProbe
- startupProbe
- lifecycle.postStart
- lifecycle.preStop

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-07-17 01:19:23 +00:00
stevenhorsman
eb07f5ef5e agent: doc: Fix ordering of options
- Fix the config options to be back in alphabetical order to be
easier to find

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-16 21:39:31 -03:00
stevenhorsman
7cc81ce867 agent: image: Set image-rs auth config
If the agent-config has a value for `image_registry_auth`,
Then pass this to the image-rs client and enable auth mode too

Fixes: #8122

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-16 21:39:31 -03:00
stevenhorsman
265322990a agent: config: Add config option to provide auth for guest-pull
Add optional config for agent.image_registry_auth, to specify
the uri of credentials to be used when pulling images in the guest
from an authenticated registry

Fixes: #8122

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-16 21:39:31 -03:00
Saul Paredes
0b3d193730 genpolicy: Support cpath for mount sources
Add setting to allow specifying the cpath for a mount source.

cpath is the root path for most files used by a container. For example,
the container rootfs and various files copied from the Host to the
Guest when shared_fs=none are hosted under cpath.

mount_source_cpath is the root of the paths used a storage mount
sources. Depending on Kata settings, mount_source_cpath might have the
same value as cpath - but on TDX for example these two paths are
different: TDX uses "/run/kata-containers" as cpath,
but "/run/kata-containers/shared/containers" as mount_source_cpath.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-07-15 14:09:49 -07:00
Dan Mihai
bcaf7fc3b4
Merge pull request #10008 from microsoft/danmihai1/runAsUser
genpolicy: add support for runAsUser fields
2024-07-15 12:08:50 -07:00
Xynnn007
a56b15112a agent: add ocicrypt config
ocicrypt config is for kata-agent to connect to CDH to request for image
decryption key. This value is specified by an env. We use this
workaround the same as CCv0 branch.

In future, we will consider better ways instead of writting files and
setting envs inside inner logic of kata-agent.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-07-15 12:00:50 +01:00
Xynnn007
1072658219 agent: Enable kata-cc-rustls-tls in image-rs
- Enable the kata-cc-rustls-tls feature in image-rs, so that it
can get resources from the KBS in order to retrieve the registry
credentials.
- Also bump to the latest image-rs to pick up protobuf fixes
- Add libprotobuf-dev dependency to the agent packaging
as it is needed by the new image-rs feature
- Add extra env in the agent make test as the
new version of the anyhow crate has changed the backtrace capture thus unit
tests of kata-agent that compares a raised error with an expected one
would fail. To fix this, we need only panics to have backtraces, thus
set RUST_BACKTRACE=0 for tests due to document
https://docs.rs/anyhow/latest/anyhow/

Fixes #9538

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-15 12:00:50 +01:00
Fupan Li
a7179be31d
Merge pull request #9534 from Tim-Zhang/fix-stdin-stuck
Fix ctr exec stuck problem
2024-07-15 13:19:19 +08:00
Dan Mihai
f087044ecb genpolicy: add support for runAsUser
Add ability to auto-generate policy for SecurityContext.runAsUser and
PodSecurityContext.runAsUser.

Fixes: #8879

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-07-13 01:10:43 +00:00
Dan Mihai
5282701b5b genpolicy: add link to allow_user() active issue
Improve comment to workaround in rules.rego, to explain better the
reason for that workaround.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-07-13 01:05:58 +00:00
Qi Feng Huo
4d66ee1935 initdata: add initdata annotation in hypervisor config
- Add Initdata annotation for hypervisor config, so that it can be passed when CreateVM

Signed-off-by: Qi Feng Huo <huoqif@cn.ibm.com>
2024-07-11 10:56:18 +08:00
Silenio Quarti
8260ce8d15 runtime: Initialize SharedFS for remote hypervisor
Sets SharedFS config to NoSharedFS for remote hypervisor in order to start the file watcher which syncs files from the host to the guest VMs. 

Signed-off-by: Silenio Quarti <silenio_quarti@ca.ibm.com>
2024-07-10 14:31:25 -03:00
Xuewei Niu
7f71eac6de
Merge pull request #9868 from l8huang/dan
runtime: implement DAN in Go kata-runtime
2024-07-10 19:09:46 +08:00
Alex Lyn
dafff26f01
Merge pull request #9814 from Apokleos/bugfix-pcipath
runtime-rs: bugfix for root bus slot allocation
2024-07-10 16:19:06 +08:00
Steve Horsman
78bbc51ff0
Merge pull request #9806 from niteeshkd/nd_snp_certs
runtime: pass certificates to get extended attestation report for SNP coco
2024-07-10 08:57:45 +01:00
Lei Huang
171d298dea runtime: implement DAN in Go kata-runtime
The DAN feature has already been implemented in kata-runtime-rs, and
this commit brings the same capability to the Go kata-runtime.

Fixes: #9758

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-07-10 00:22:30 -07:00
Alex Lyn
806e959b01 runtime-rs: bugfix for device slot allocation failed in dragonball
In dragonball Vfio device passthrough scenarois, the first passthrough
device will be allocated slot 0 which is occupied by root device.
It will cause error, looks like as below:
```
...
6: failed to add VFIO passthrough device: NoResource\n
7: no resource available for VFIO device"): unknown
...
```
To address such problem, we adopt another method with no pre-allocated
guest device id and just let dragonball auto allocate guest device id
and return it to runtime. With this idea, add_device will return value
Result<DeviceType> and apply the change to related code.

Fixes #9813

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-07-10 10:59:57 +08:00
Alex Lyn
27947cbb0b dragonball: make add vfio device return guest device id
Fixes #9813

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-07-10 10:59:51 +08:00
Hyounggyu Choi
37b907dfbc
Merge pull request #9859 from BbolroC/set-ocispec-for-vfio-ap
tests: Extend vfio-ap hotplug test to use a zcrypttest tool
2024-07-09 14:03:45 +02:00
Steve Horsman
ff498c55d1
Merge pull request #9719 from fitzthum/sealed-secret
Support Confidential Sealed Secrets (as env vars)
2024-07-09 09:43:51 +01:00
Niteesh Dubey
529660fafb runtime: pass certificates for SNP coco
This will be used to get extended attestation report.

Fixes: #9805

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-07-09 03:46:00 +00:00
Tim Zhang
8801554889 runtime-rs: Fix ctr exec stuck problem
Fixes: #9532

Instead of call agent.close_stdin in close_io, we call agent.write_stdin
with 0 len data when the stdin pipe ends.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2024-07-09 11:44:36 +08:00
Linda Yu
b4d61f887b agent: unittest for sealed secret as env in kata
To test unsealing secrets stored in environment variables,
we create a simple test server that takes the place of
the CDH. We start this server and then use it to
unseal a test secret.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-07-08 17:32:45 -05:00
Linda Yu
6003608fe6 agent: support sealed secret as env in kata
When sealed-secret is enabled, the Kata Agent
intercepts environment variables containing
sealed secrets and uses the CDH to unseal
the value.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-07-08 17:31:33 -05:00
stevenhorsman
d511820974 agent: Bump image-rs
- Bump the commit of image-rs we are pulling in to 413295415
Note: This is the last commmit before a change to whiteout handling
was introduced that lead to the error `'failed to unpack: convert whiteout"`
when pulling the agnhost:2.21 image

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-05 10:25:28 +01:00
ChengyuZhu6
e71c7ab932 agent/image: Remove functions about merging container spec for guest pull
Let me explain why:

In our previous approach, we implemented guest pull by passing PullImageRequest to the guest.
However, this method  resulted in the loss of specifications essential for running the container,
such as commands specified in YAML, during the CreateContainer stage. To address this,
it is necessary to integrate the OCI specifications and process information
from the image’s configuration with the container in guest pull.

The snapshotter method does not care this issue. Nevertheless, a problem arises
when two containers in the same pod attempt to pull the same image, like InitContainer.
This is because the image service searches for the existing configuration,
which resides in the guest. The configuration, associated with <image name, cid>,
is stored in the directory /run/kata-containers/<cid>. Consequently, when the InitContainer finishes
its task and terminates, the directory ceases to exist. As a result, during the creation
of the application container, the OCI spec and process information cannot
be merged due to the absence of the expected configuration file.

Fixes: kata-containers#9665
Fixes: kata-containers#9666
Fixes: kata-containers#9667
Fixes: kata-containers#9668

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-07-05 08:10:04 +08:00
ChengyuZhu6
c9d1a758cd agent/image: Reuse the mountpoint in image-rs
Currently, the image is pulled by image-rs in the guest and mounted at
`/run/kata-containers/image/cid/rootfs`. Finally, the agent rebinds
`/run/kata-containers/image/cid/rootfs` to `/run/kata-containers/cid/rootfs` in CreateContainer.
However, this process requires specific cleanup steps for these mount points.

To simplify, we reuse the mount point `/run/kata-containers/cid/rootfs`
and allow image-rs to directly mount the image there, eliminating the need for rebinding.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-07-05 08:10:04 +08:00
stevenhorsman
05cd1cc7a0 agent: Add CreateContainer support for pre-pulled bundle
- Add a check in setup_bundle to see if the bundle already exists
and if it does then skip the setup.

This commit is cherry-picked from 44ed3ab80e.

The reason that k8s-kill-all-process-in-container.bats failed is that
deletion of the directory `/root/kata-containers/cid/rootfs` failed during removing container
because it was mounted twice (one in image-rs and one in set_bundle ) and only unmounted once in removing container.

Fixes: #9664

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Dave Hay <david_hay@uk.ibm.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-07-05 08:10:00 +08:00
Biao Lu
6c1a2f01f8 protocols: add support for sealed_secret service
To unseal a secret, the Kata agent will contact the CDH
using ttRPC. Add the proto that describes the sealed
secret service and messages that will be used.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
Signed-off-by: Biao Lu <biao.lu@intel.com>
2024-07-04 01:03:41 -05:00
Anastassios Nanos
db75b5f3c4
Merge pull request #8070 from nubificus/feat_add-fc-runtime-rs
runtime-rs: firecracker hypervisor backend
2024-07-03 22:29:30 +03:00
Dan Mihai
ada53744ea
Merge pull request #9907 from microsoft/saulparedes/allow_empty_env_vars
genpolicy: allow some empty env vars
2024-07-03 08:07:23 -07:00
George Pyrros
2d19f3fbd7 runtime-rs: firecracker hypervisor backend
Add a basic runtime-rs `Hypervisor` trait implementation for
AWS Firecracker

- Add basic hypervisor operations (setup / start / stop / add_device)
- Implement AWS Firecracker API on a separate file `fc_api.rs`
- Add support for running jailed (include all sandbox-related content)
- Add initial device support (limited as hotplug is not supported)
- Add separate config for runtime-rs (FC)

Notes:
- devmapper is the only snapshotter supported
- to account for no sharefs support, we copy files in the sandbox (as
  in the GO runtime)
- nerdctl spawn is broken (TODO: #7703)

Fixes: #5268

Signed-off-by: George Pyrros <gpyrros@nubificus.co.uk>
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
Signed-off-by: Charalampos Mainas <cmainas@nubificus.co.uk>
Signed-off-by: George Ntoutsos <gntouts@nubificus.co.uk>
2024-07-03 08:30:30 +00:00
Aurélien Bombo
33d08a8417
Merge pull request #9825 from microsoft/mahuber/main
osbuilder: allow rootfs builds w/o git or version file deps
2024-07-02 09:38:13 -07:00
Wainer Moschetta
ec695f67e1
Merge pull request #9577 from microsoft/saulparedes/topology
genpolicy: add topologySpreadConstraints support
2024-07-02 11:24:26 -03:00
Amulya Meka
dd12089e0d
Merge pull request #9914 from Amulyam24/qemu-fix
kata-deploy: fix qemu static build on ppc64le
2024-07-02 10:45:03 +05:30
Saul Paredes
f3f3caa80a genpolicy: update sample
Update pod-one-container.yaml sample

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-07-01 13:49:08 -07:00
Dan Mihai
75aee526a9 genpolicy: add topologySpreadConstraints support
Allow genpolicy to process Pod YAML files including
topologySpreadConstraints.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-07-01 13:32:49 -07:00
Hyounggyu Choi
99690ab202 runtime: Instantiate/pass vfio-ap device to ociSpec
This commit adds the missing step of passing an attached vfio-ap device
to a container via ociSpec. It instantiates and passes a vfio-ap device
(e.g. a Z crypto device).
A device at `/dev/z90crypt` covers all use cases at the time of writing.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-07-01 11:40:49 +02:00
Amulyam24
259ec408b5 kata-deploy: fix qemu static build for v8.2.1 on ppc64le
Do not install the packages librados-dev and librbd-dev as they are not needed for building static qemu.

Add machine option cap-ail-mode-3=off while creating the VM to qemu cmdline.
Fixes: #9893

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2024-07-01 14:56:43 +05:30
Archana Shinde
82a1892d34 agent: Add additional info while returning errors for update_interface
This should provide additional context for errors while updating network
interface.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-06-27 12:56:53 -07:00
Archana Shinde
2127288437 agent: Bring interface down before renaming it.
In case we are dealing with multiple interfaces and there exists a
network interface with a conflicting name, we temporarily rename it to
avoid name conflicts.
Before doing this, we need to rename bring the interface down.
Failure to do so results in netlink returning Resource busy errors.

The resource needs to be down for subsequent operation when the name is
swapped back as well.

This solves the issue of passing multiple networks in case of nerdctl
as:
nerdctl run --rm  --net foo --net bar docker.io/library/busybox:latest ip a

Fixes: #9900

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-06-27 12:56:53 -07:00
Bo Chen
25e3cab028 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v40.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #9929

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-06-27 09:59:00 -07:00
Alex Lyn
d66c214ae7
Merge pull request #9849 from markyangcc/main
runtime: fix missing of VhostUserDeviceReconnect parameter assignment
2024-06-27 21:48:37 +08:00
Zvonko Kaiser
893fd2b59c
Merge pull request #9916 from zvonkok/config-fix
gpu: Missing separator
2024-06-26 14:46:47 +02:00
Greg Kurz
fe7ef878d2
Merge pull request #9913 from gkurz/update-kata-ctl-deps
kata-ctl: Update Cargo.lock
2024-06-26 14:31:03 +02:00
Zvonko Kaiser
e0aa54301f gpu: Missing separator
Add the correct separator for replacement

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-06-26 10:40:35 +00:00
Greg Kurz
ac33a389c0
Merge pull request #9879 from pmores/remove-dependency-on-containerd-bundle-dir-tree
runtime-rs: remove attempt to access sandbox bundle from container bu…
2024-06-26 10:57:50 +02:00
Greg Kurz
db7b2f7aaa kata-ctl: Update Cargo.lock
A previous change missed to refresh Cargo.lock.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-26 08:27:52 +02:00
Saul Paredes
ce19419d72 genpolicy: allow some empty env vars
Updated genpolicy settings to allow 2 empty environment variables that
may be forgotten to specify (AZURE_CLIENT_ID and AZURE_TENANT_ID)

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-25 10:53:05 -07:00
Aurélien Bombo
0582a9c75b
Merge pull request #9864 from 3u13r/feat/genpolicy/layers-cache-file-path
genpolicy: allow specifying layer cache file
2024-06-25 10:42:22 -07:00
Alex Lyn
2c5b3a5c20
Merge pull request #9830 from gaohuatao-1/ght/count-rs
runtime-rs: fix the bug of func count_files
2024-06-25 15:00:46 +08:00
Aurélien Bombo
b0cdf4eb0d
Merge pull request #9579 from microsoft/saulparedes/add_seccomp_support
genpolicy: ignore SeccompProfile in PodSpec
2024-06-24 08:58:01 -07:00
Leonard Cohnen
6a3ed38140 genpolicy: allow specifying layer cache file
Add --layers-cache-file-path flag to allow the user to
specify where the cache file for the container layers
is saved. This allows e.g. to have one cache file
independent of the user's working directory.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2024-06-24 14:53:27 +02:00
Wainer Moschetta
f7e0d6313b
Merge pull request #9865 from wainersm/qemu-coco-dev_updates
runtime: updates to qemu-coco-dev configuration
2024-06-21 10:14:30 -03:00
Saul Paredes
44afb4aa5f genpolicy: ignore SeccompProfile in PodSpec
Ignore SeccompProfile in PodSpec

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-20 09:42:17 -07:00
Dan Mihai
7aeaf2502a
Merge pull request #9856 from microsoft/danmihai1/new-policy-rules
genpolicy: reject untested CreateContainer field values
2024-06-20 09:34:53 -07:00
Steve Horsman
d5b4da7331
Merge pull request #9881 from stevenhorsman/remote-hypervisor-policy
runtime: Support policy in remote hypervisor
2024-06-20 14:01:29 +01:00
stevenhorsman
779754dcf6 runtime: Support policy in remote hypervisor
Move the `sandbox.agent.setPolicy` call out of the remoteHypervisor
if, block, so we can use the policy implementation on peer pods

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-19 16:43:53 +01:00
Pavel Mores
6a4919eeb9 runtime-rs: fix misleading log message
get_vmm_master_tid() currently returns an error with the message "cannot
get qemu pid (though it seems running)" when it finds a valid
QemuInner::qemu_process instance but fails to extract the PID out of it.

This condition however in fact means that a qemu child process was running
(otherwise QemuInner::qemu_process would be None) but isn't anymore (id()
returns None).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-19 17:15:24 +02:00
Pavel Mores
af5492e773 runtime-rs: made Qemu::stop_vm() idempotent
Since Hypervisor::stop_vm() is called from the WaitProcess request handling
which appears to be per-container, it can be called multiple times during
kata pod shutdown.  Currently the function errors out on any subsequent
call after the initial one since there's no VM to stop anymore.  This
commit makes the function tolerate that condition.

While it seems conceivable that sandbox shouldn't be stopped by WaitProcess
handling, and the right fix would then have to happen elsewhere, this
commit at least makes qemu driver's behaviour consistent with other
hypervisor drivers in runtime-rs.

We also slightly improve the error message in case there's no
QemuInner::qemu_process instance.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-19 17:15:24 +02:00
Pavel Mores
5fbbff9e5e runtime-rs: remove attempt to access sandbox bundle from container bundle
Since no objections were raised in the linked issue (#9847) this commit
removes the attempt to derive sandbox bundle path from container bundle
path.  As described in more detail in the linked issue, this is container
runtime specific and doesn't seem to serve any purpose.

As for implementation, we hoist the only part of
get_shim_info_from_sandbox() that's still useful (getting the socket
address) directly into the caller and remove the function altogether.

Fixes #9847

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-19 17:09:15 +02:00
Dan Mihai
4df66568cf genpolicy: reject untested CreateContainer field values
Reject CreateContainerRequest field values that are not tested by
Kata CI and that might impact the confidentiality of CoCo Guests.

This change uses a "better safe than sorry" approach to untested
fields. It is very possible that in the future we'll encounter
reasonable use cases that will either:

- Show that some of these fields are benign and don't have to be
  verified by Policy, or
- Show that Policy should verify legitimate values of these fields

These are the new CreateContainerRequest Policy rules:

    count(input.shared_mounts) == 0
    is_null(input.string_user)

    i_oci := input.OCI
    is_null(i_oci.Hooks)
    is_null(i_oci.Linux.Seccomp)
    is_null(i_oci.Solaris)
    is_null(i_oci.Windows)

    i_linux := i_oci.Linux
    count(i_linux.GIDMappings) == 0
    count(i_linux.MountLabel) == 0
    count(i_linux.Resources.Devices) == 0
    count(i_linux.RootfsPropagation) == 0
    count(i_linux.UIDMappings) == 0
    is_null(i_linux.IntelRdt)
    is_null(i_linux.Resources.BlockIO)
    is_null(i_linux.Resources.Network)
    is_null(i_linux.Resources.Pids)
    is_null(i_linux.Seccomp)
    i_linux.Sysctl == {}

    i_process := i_oci.Process
    count(i_process.SelinuxLabel) == 0
    count(i_process.User.Username) == 0

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-06-18 18:09:31 +00:00
markyangcc
a28bf266f9 runtime: fix missing of VhostUserDeviceReconnect parameter assignment
Commit 'ca02c9f5124e' implements the vhost-user-blk reconnection functionality,
However, it has missed assigning VhostUserDeviceReconnect when new the QEMU
HypervisorConfig, resulting in VhostUserDeviceReconnect always set to default value 0.

Real change is this line, most of changes caused by go format,

return vc.HypervisorConfig{
	// ...
	VhostUserDeviceReconnect: h.VhostUserDeviceReconnect,
}, nil

Fixes: #9848
Signed-off-by: markyangcc <mmdou3@163.com>
2024-06-18 12:15:10 +08:00
Alex Lyn
388cd7dde4
Merge pull request #9772 from pmores/add-base-qmp-framework
runtime-rs: add base qmp framework
2024-06-18 09:53:28 +08:00
Alex Lyn
275c498dc9
Merge pull request #9834 from lifupan/main
sandbox: fix the issue of failed to get the vmm master tid
2024-06-18 08:57:21 +08:00
Alex Lyn
d3fb6bfd35
Merge pull request #9860 from stevenhorsman/tokio-vulnerability-bump
Tokio vulnerability bump
2024-06-18 08:35:34 +08:00
Wainer dos Santos Moschetta
bdbee78517 runtime: allow default_{vcpus,memory} annotations to qemu-coco-dev
This is a counterpart of commit abf52420a4 for the qemu-coco-dev
configuration. By allowing default_vcpu and default_memory annotations
users can fine-tune the VM based on the size of the container
image to avoid issues related with pulling large images in the guest.

Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-17 18:59:52 -03:00
Wainer dos Santos Moschetta
baa8d9d99c runtime: set shared_fs=none to qemu-coco-dev configuration
Just like the TEE configurations (sev, snp, tdx) we want to have the
qemu-coco-dev using shared_fs=none.

Fixes: #9676
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-06-17 18:42:46 -03:00
Wainer Moschetta
b6a28bd932
Merge pull request #9786 from microsoft/saulparedes/add_back_insecure_registry_pull
genpolicy: add back support for insecure
2024-06-17 15:21:25 -03:00
Wainer Moschetta
68415dabcd
Merge pull request #9815 from msanft/fix/genpolicy/flag-name
genpolicy: fix settings path flag name
2024-06-17 15:13:25 -03:00
stevenhorsman
53659f1ede libs: Update tokio dependencies
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:03:01 +01:00
stevenhorsman
35f6be97df runtime-rs: Update tokio dependency
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

If possible it would be good to add the many runtime-rs creates into the
runtime-rs workspace and provide a centralised version to avoid the updates
in many places.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:03:01 +01:00
stevenhorsman
3bb1a67d80 agent-ctl: Update rustjail dependencies
- Run `cargo update -p rustjail` to pick up rustjail's bump of
tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:03:01 +01:00
stevenhorsman
d2d35d2dcc runk: Update tokio dependencies
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:03:01 +01:00
stevenhorsman
adda401a8c genpolicy: Update tokio dependencies
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:03:01 +01:00
stevenhorsman
b7928f465e agent: Update tokio dependencies
- Bump tokio to 1.38.0 to fix the security vulnerability
https://rustsec.org/advisories/RUSTSEC-2024-0019

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-17 13:02:47 +01:00
Steve Horsman
cce735a09e
Merge pull request #9840 from stevenhorsman/bump-agent-rust-1.75.0
versions: Bump rust toolchain
2024-06-17 11:28:07 +01:00
Fupan Li
b218c4bc10
Merge pull request #9836 from lifupan/main_fix
sandbox: fix the issue of double initial_size_manager config
2024-06-17 09:15:51 +08:00
Bo Chen
a68aeca356
Merge pull request #9575 from likebreath/0430/clh_v39.0
versions: Upgrade to Cloud Hypervisor v39.0
2024-06-14 09:10:19 -07:00
stevenhorsman
3fb176970f dragonball: Fix device manager warning
- Fix the lint error:
```
error: you seem to use `.enumerate()` and immediately discard the index
   --> src/device_manager/mod.rs:427:33
    |
427 |         for (_index, device) in self.virtio_devices.iter().enumerate() {
    |                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
 by removing the unnecessary enumerate

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-14 16:45:16 +01:00
stevenhorsman
1ea2671f2f dragonball: Fix lint with rust 1.75.0
The ci failed with:
```
error: use of `or_insert_with` to construct default value
   --> src/address_space_manager.rs:650:14
    |
650 |             .or_insert_with(NumaNode::new);
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try: `or_default()`
    |
```

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-14 16:45:16 +01:00
Steve Horsman
ab8a9882c1
Merge pull request #9818 from EmmEff/fix-spelling
runtime: fix minor spelling issues
2024-06-14 13:12:56 +01:00
Steve Horsman
99bf95f773
Merge pull request #9827 from littlejawa/fix_panic_on_metrics_gathering
runtime: avoid panic on metrics gathering
2024-06-14 11:12:43 +01:00
Pavel Mores
380f8ad03f runtime-rs: add base vCPU hotplugging support
We take advantage of the Inner pattern to enable QemuInner::resize_vcpu()
take `&mut self` which we need to call non-const functions on Qmp.

This runs on Intel architecture but will need to be verified and ported
(if necessary) to other architectures in the future.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Pavel Mores
8231c6c4a3 runtime-rs: instantiate Qmp as (optional) member of QemuInner
The QMP_SOCKET_FILE constant in cmdline_generator.rs is made public to make
it accessible from QemuInner.  This is fine for now however if the constant
needs to be accessed from additional places in the future we could consider
moving it to somewhere more visible.

The Debug impl for Qmp is empty since first, we don't actually want it,
it's only forced by Hypervisor trait bounds, and second, it doesn't have
anything to display anyway.  If Qmp gets any members in the future that
can be meaningfully displayed they should be handled by Qmp's Debug::fmt().

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Pavel Mores
6fdb262dca runtime-rs: add Qmp object to encapsulate QMP functionality
The constructor handles QMP connection initialisation, too, so there can
be non-functional Qmp instance.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-06-14 10:13:32 +02:00
Manuel Huber
62fd84dfd8 build: allow rootfs builds w/o git or VERSION file deps
We set the VERSION variable consistently across Makefiles to
'unknown'  if the file is empty or not present.
We also use git commands consistently for calculating the COMMIT,
COMMIT_NO variables, not erroring out when building outside of
a git repository.
In create_summary_file we also account for a missing/empty VERSION
file.
This makes e.g. the UVM build process in an environment where we
build outside of git with a minimal/reduced set of files smoother.

Signed-off-by: Manuel Huber <mahuber@microsoft.com>
2024-06-13 22:46:52 +00:00
Mike Frisch
c2f61b0fe3 runtime: spelling fixes
Minor spelling fixes in runtime log messages.

Signed-off-by: Mike Frisch <mikef17@gmail.com>
2024-06-13 12:11:34 -04:00
Greg Kurz
b85b1c1058
Merge pull request #9790 from gkurz/kill-some-dead-runtime-code
Kill some dead runtime code
2024-06-13 15:45:51 +02:00
gaohuatao
4cb4e44234 runtime-rs: fix the bug of func count_files
When the total number of files observed is greater than limit, return -1 directly.
runtime has fixed this bug, it should b ported to runtime-rs.

Fixes:#9829

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2024-06-13 16:02:33 +08:00
Fupan Li
cd68ef372f sandbox: fix the issue of double initial_size_manager config
It shouldn't call the initial_size_manager's setup_config
in the load_config since it had been called in the sandbox's
try_init function.

Fixes: #9778

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-06-13 15:44:51 +08:00
Fupan Li
61687992f4 sandbox: fix the issue of failed to get the vmm master tid
For kata container, the container's pid is meaning less to
containerd/crio since the container's pid is belonged to VM,
and containerd/crio couldn't use it. Thus we just return any
tid of kata shim or hypervisor. But since the hypervisor had
been stopped before deleting the container, and it wouldn't
get the hypervisor's tid for some supported hypervisor, thus
we'd better to return the kata shim's pid instead of hypervisor's
tid.

Fixes: #9777

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-06-13 10:27:04 +08:00
Fabiano Fidêncio
56423cbbfe
Merge pull request #9706 from burgerdev/burgerdev/genpolicy-devices
genpolicy: add support for devices
2024-06-12 23:03:41 +02:00
Fabiano Fidêncio
3a0247ed43
Merge pull request #9819 from stevenhorsman/config-envvar-precedence
agent: config: Ensure envs take precedence
2024-06-12 11:26:02 +02:00
Julien Ropé
9c86eb1d35 runtime: avoid panic on metrics gathering
While running with a remote hypervisor, whenever kata-monitor tries to access
metrics from the shim, the shim does a "panic" and no metric can be gathered.

The function GetVirtioFsPid() is called on metrics gathering, and had a call
to "panic()". Since there is no virtiofs process for remote hypervisor, the
right implementation is to return nil. The caller expects that, and will skip
metrics gathering for virtiofs.

Fixes: #9826

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-06-12 10:02:44 +02:00
Xuewei Niu
92cc5e0adb
Merge pull request #9781 from gaohuatao-1/ght/shm 2024-06-12 12:39:28 +08:00
Moritz Sanft
84903c898c
genpolicy: fix settings path flag name
This corrects the warning to point to the \`-j\` flag,
which is the correct flag for the JSON settings file.
Previously, the warning was confusing, as it pointed to
the \`-p\` flag, which specifies to the path for the Rego ruleset.

Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
2024-06-11 21:17:18 +02:00
Greg Kurz
1acf8d0c35 govmm: Drop QEMU's NoShutdown knob
Code is not used.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Greg Kurz
cb5b548ad7 govmm: Drop QEMU's Daemonize knob
Code isn't used anymore.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Greg Kurz
33eaf69d5f virtcontainers: Drop QEMU's Daemonize knob
QEMU isn't started as daemon anymore and this won't change (see #5736
for details). Drop the related code.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-06-11 19:55:54 +02:00
Dan Mihai
d47f40210a
Merge pull request #9808 from microsoft/saulparedes/oci_from_settings
genpolicy: load OCI version from settings
2024-06-11 10:42:04 -07:00
Saul Paredes
3e9d6c11a1 genpolicy: add back support for insecure
registries

Adding back changes from
77540503f9.

Fixes: #9008

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-11 09:42:23 -07:00
Bo Chen
2398442c58 runtime: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v39.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.

Fixes: #8694, #9574

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-06-11 09:42:17 -07:00
stevenhorsman
40e02b34cb agent: config: Ensure envs take precedence
- Update the config parsing logic so that when reading from the
agent-config.toml file any envs are still processed
- Add units tests to formalise that the envs take precedence over values
from the command line and the config file

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-06-11 16:31:10 +01:00
Steve Horsman
59ff40f054
Merge pull request #9811 from mkulke/mkulke/use-kebabcase-for-enum-values-in-config-file-parsing
agent: convert enum vals to kebab-case in cfg file
2024-06-11 14:49:30 +01:00
gaohuatao
638e9acf89 runtime: fix the bug of func countFiles
When the total number of files observed is greater than limit, return (-1, err).
When the returned err is not nil, the func countFiles should return -1.

Fixes:#9780

Signed-off-by: gaohuatao <gaohuatao@bytedance.com>
2024-06-11 18:17:18 +08:00
Alex Lyn
1c8db85d54
Merge pull request #9784 from Apokleos/bufix-testcases
kata-types: fix bug in kata-types several test cases
2024-06-11 10:01:45 +08:00
Saul Paredes
6a84562c16 genpolicy: load OCI version from settings
Load OCI version from genpolicy-settings.json and validate it in
rules.rego

Fixes: #9593

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-06-10 15:30:39 -07:00
Magnus Kulke
abc704a720 agent: convert enum vals to kebab-case in cfg file
fixes #9810

Add an annotation to the enum values in the agent config that will
deserialize them using a kebab-case conversion, aligning the behaviour
to parsing of params specified via kernel cmdline.

drive-by fix: add config override for guest_component_procs variable

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
2024-06-10 21:55:05 +02:00
Alex Lyn
27685c91e5 kata-types: fix bug in kata-types several test cases
(1) As mis-use of cap.set causing previous Caps lost which
causing assert! failed, just replacing cap.set with cap.add.

(2) It will return error if there's no such name setting when
do update_config_by_annotation {
    ...
if config.runtime.name.is_empty() {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "Runtime name is missing in the configuration",
            ));
        }
...
}

Fixes #9783

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-06-07 09:16:23 +08:00
Niteesh Dubey
62d3d7c58f runtime: enable kernel-hashes for SNP confidential container
This is required to provide the hashes of kernel, initrd and cmdline
needed during the attestation of the coco.

Fixes: #9150

Signed-off-by: Niteesh Dubey <niteesh@us.ibm.com>
2024-06-05 15:02:02 +00:00
Fabiano Fidêncio
138ef2c55f
Merge pull request #9678 from AdithyaKrishnan/main
TEEs: Skip a few CI tests for SEV/SNP
2024-06-04 23:42:51 +02:00
GabyCT
6c2e8bed77
Merge pull request #9725 from 3u13r/feat/genpolicy/filter-by-runtime
genpolicy: add ability to filter for runtimeClassName
2024-06-04 10:06:14 -06:00
Wainer Moschetta
b5561074c3
Merge pull request #9377 from beraldoleal/yqbump
deps: bumping yq to v4.40.7
2024-06-03 14:34:58 -03:00
Ryan Savino
6db08ed620 runtime: sev: snp: Use shared_fs=none
Disabling 9p for SEV and SNP TEEs.

Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
2024-06-03 01:14:16 -05:00
Markus Rudy
13310587ed genpolicy: check requested devices
CreateContainerRequest objects can specify devices to be created inside
the guest VM. This change ensures that requested devices have a
corresponding entry in the PodSpec.

Devices that are added to the pod dynamically, for example via the
Device Plugin architecture, can be allowlisted globally by adding their
definition to the settings file.

Fixes: #9651
Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-05-31 22:05:49 +02:00
Markus Rudy
ea578f0a80 genpolicy: add support for VolumeDevices
This adds structs and fields required to parse PodSpecs with
VolumeDevices and PVCs with non-default VolumeModes.

Signed-off-by: Markus Rudy <mr@edgeless.systems>
2024-05-31 19:34:14 +02:00
Beraldo Leal
c99ba42d62 deps: bumping yq to v4.40.7
Since yq frequently updates, let's upgrade to a version from February to
bypass potential issues with versions 4.41-4.43 for now. We can always
upgrade to the newest version if necessary.

Fixes #9354
Depends-on:github.com/kata-containers/tests#5818

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Beraldo Leal
4f6732595d ci: skip go version check
golang.mk is not ready to deal with non GOPATH installs. This is
breaking test on s390x.

Since previous steps here are installing go and yq our way, we could
skip this aditional check. A full refactor to golang.mk would be needed
to work with different paths.

Signed-off-by: Beraldo Leal <bleal@redhat.com>
2024-05-31 13:28:34 -04:00
Magnus Kulke
9f04dc4c8b agent: introduce config for coco attestion procs
fixes #9748

A configuration option `guest_component_procs` has been introduced that
indicates which guest component processes are supposed to be spawned by
the agent. The default behaviour remains that all of those processes are
actively spawned by the agent. At the moment this is based on presence
of binaries in the rootfs and the guest_component_api_rest option.

The new option is incremental:

none -> attestation-agent -> confidential-data-hub -> api-server-rest

e.g. api-server-rest implies attestation-agent and confidential-data-hub

the `none` option has been removed from guest_component_api_rest, since
this is addresses by the introduced option.

To not change expected behaviour for  non-coco guests we still will still
only attempt to spawn the processes if the requested attestation binaries
are present on the rootfs, and issue in warning in those cases.

Signed-off-by: Magnus Kulke <magnuskulke@microsoft.com>
2024-05-31 12:15:41 +02:00
Leonard Cohnen
1d1690e2a4 genpolicy: add ability to filter for runtimeClassName
Add the CLI flag --runtime-class-names, which is used during
policy generation. For resources that can define a
runtimeClassName (e.g., Pods, Deployments, ReplicaSets,...)
the value must have any of the --runtime-class-names as
prefix, otherwise the resource is ignored.

This allows to run genpolicy on larger yaml
files defining many different resources and only generating
a policy for resources which will be deployed in a
confidential context.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2024-05-31 03:17:02 +02:00
Greg Kurz
b3cb19b6a7
Merge pull request #9639 from emanuellima1/rng-impl
runtime-rs: Add RNG to QEMU cmdline
2024-05-30 12:00:11 +02:00
Emanuel Lima
138d985c64
runtime-rs: Add RNG to QEMU cmdline
It creates this line, as the Golang runtime does:
-object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-05-29 13:11:00 -03:00
Zvonko Kaiser
d4832b3b74 vfio: Fix hotpunplug
We need to remove the device from the tracking map, a container
restart will increment the bus index and we will get out of root-ports
and crash the machine.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-28 07:37:30 +00:00
Zvonko Kaiser
4c93bb2d61 qemu: Add CDI device handling for any container type
We need special handling for pod_sandbox, pod_container and
single_container how and when to inject CDI devices

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-27 10:13:01 +00:00
Zvonko Kaiser
c7b41361b2 gpu: reintroduce pcie_root_port and add pcie_switch_port
In Kubernetes we still do not have proper VM sizing
at sandbox creation level. This KEP tries to mitigates
that: kubernetes/enhancements#4113 but this can take
some time until Kube and containerd or other runtimes
have those changes rolled out.

Before we used a static config of VFIO ports, and we
introduced CDI support which needs a patched contianerd.
We want to eliminate the patched continerd in the GPU case
as well.

Fixes: #8860

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-05-27 10:13:01 +00:00
Fupan Li
6f6a164451
Merge pull request #9268 from zvonkok/kata-agent-createcontainer
kata-agent: CreateContainer Hook
2024-05-27 16:36:22 +08:00
Alex Lyn
713c929a64
Merge pull request #9656 from pmores/document-qemu-rs-conventions
runtime-rs: document architecture & implementation conventions in qem…
2024-05-27 10:38:58 +08:00
Xuewei Niu
bb7a1c56e9
Merge pull request #9693 from sidneychang/9690/Adjust-indentation 2024-05-27 00:20:34 +08:00
Alex Lyn
55dbf6121a
Merge pull request #9604 from Apokleos/qmp-cmdline01
runtime-rs: add QMP support for Qemu(part I)
2024-05-26 20:22:59 +08:00
Alex Lyn
028b10ce7a
Merge pull request #9687 from l8huang/vfio-pci-gk
agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device
2024-05-26 17:48:25 +08:00
Steve Horsman
b89c3e35dd
Merge pull request #9583 from cncal/update_check_error_message
runtime: make kata-runtime check error more understandable when /dev/kvm doesn't exist
2024-05-24 17:49:43 +01:00
Alex Lyn
41fb7aeb89 runtime-rs: add QMP params suppport in cmdline
Fixes: #9603

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-05-24 22:16:24 +08:00
Alex Lyn
7ed6c6896b runtime-rs: add an option dbg_monitor_socket for HMP support
This option allows to add a debug monitor socket when
`enable_debug = true` to control QEMU within debugging case.

Fixes: #9603

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-05-24 22:16:17 +08:00
Lei Huang
3624573b12 agent: collect PCI address mapping for both vfio-pci-gk and vfio-pci device
The `update_env_pci()` function need the PCI address mapping to
translate the host PCI address to guest PCI address in below
environment variables:
- PCIDEVICE_<prefix>_<resource-name>_INFO
- PCIDEVICE_<prefix>_<resource-name>

So collect PCI address mapping for both vfio-pci-gk and
vfio-pci devices.

Fixes #9614

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-05-23 21:20:01 -07:00
Fupan Li
d73876252e
Merge pull request #9690 from justxuewei/agent-timeout
runtime-rs: Remove obsoleted dial_timeout config
2024-05-24 10:31:12 +08:00
Zvonko Kaiser
3affd83e14
Merge pull request #9605 from l8huang/skip-env
kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO
2024-05-23 18:45:00 +02:00
Fabiano Fidêncio
d83cf39ba1
Merge pull request #9680 from kata-containers/dependabot/go_modules/src/runtime/go_modules-5e29427af7
build(deps): bump golang.org/x/net from 0.24.0 to 0.25.0 in /src/runtime in the go_modules group across 1 directory
2024-05-23 12:55:29 +02:00
Fabiano Fidêncio
0e33ecf7fc
Merge pull request #9653 from JakubLedworowski/fixes-9497-ensure-quote-generation-service-is-added-to-qemu-cmd-2
runtime: Enable connection to Quote Generation Service (QGS)
2024-05-22 15:49:23 +02:00
sidneychang
8938f35627 runtime-rs: Adjust indentation in ifneq statements within Makefile.
Replace tab indentation with spaces for the three lines within the ifneq statements, aligning them with the surrounding code.

Fixes:#9692

Signed-off-by: sidneychang <2190206983@qq.com>
2024-05-22 20:24:35 +08:00
Fabiano Fidêncio
94f7bbf253
Merge pull request #9682 from fidencio/topic/allow-increasing-cpus-and-memory-via-annotation-for-tdx
runtime: tdx: Allow default_{cpu,memory} annotations
2024-05-22 12:07:28 +02:00
Xuewei Niu
d31616cec3 runtime-rs: Remove obsoleted dial_timeout config
The `dial_timeout` works fine for Runtime-go, but is obsoleted in
Runtime-rs.

When the pod cannot connect to the Agent upon starting, we need to adjust
the `reconnect_timeout_ms` to increase the number of connection attempts to
the Agent.

Fixes: #9688

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-05-22 17:57:05 +08:00
Jakub Ledworowski
fc680139e5 runtime: Enable connection to Quote Generation Service (QGS)
For the TD attestation to work the connection to QGS on the host is needed.
By default QGS runs on vsock port 4050, but can be modified by the host owner.
Format of the qemu object follows the SocketAddress structure, so it needs to be provided in the JSON format, as in the example below:
-object '{"qom-type":"tdx-guest","id":"tdx","quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"}}'

Fixes: #9497
Signed-off-by: Jakub Ledworowski <jakub.ledworowski@intel.com>
2024-05-22 11:16:24 +02:00
Alex Lyn
0331859740
Merge pull request #9642 from gkurz/drop-unused-knobs-qemu-rs
runtime-rs: Drop some useless QEMU arguments
2024-05-22 16:13:14 +08:00
Alex Lyn
ce030d1804
Merge pull request #9641 from cmaf/runtime-resize-mem-1
runtime: Add missing check in ResizeMemory for CH
2024-05-22 14:05:30 +08:00
Alex Lyn
b7af00be2a
Merge pull request #9624 from cncal/bugfix_duplicated_devices
runtime: fix duplicated devices requested to the agent
2024-05-22 12:45:46 +08:00
Steve Horsman
f41f642b90
Merge pull request #9635 from kata-containers/dependabot/go_modules/src/runtime/go_modules-f0df977846
build(deps): bump github.com/containerd/containerd from 1.7.11 to 1.7.16 in /src/runtime in the go_modules group across 1 directory
2024-05-21 21:19:32 +01:00
Steve Horsman
9b0ed3dfa7
Merge pull request #9657 from ajaypvictor/remote-hyp-annotations
runtime: Disable number of cpu comparison on remote hypervisor scenario
2024-05-21 21:19:12 +01:00
Lei Huang
b0a91b0d13 kata-agent: update env PCIDEVICE_<prefix>_<resource-name>_INFO
The new version of sriov-network-device-plugin adds an env
`PCIDEVICE_<prefix>_<resource-name>_INFO`, which has a json
value; kata-agent can't parse it as env
`PCIDEVICE_<prefix>_<resource-name>` which has value in format
"DDDD:BB:SS.F".

This change updates env `PCIDEVICE_<prefix>_<resource-name>_INFO`.

Signed-off-by: Lei Huang <leih@nvidia.com>
2024-05-21 10:46:41 -07:00
stevenhorsman
865fa9da15 runtime: Resolve go static-checks failure
Remove `rand.Seed` call to resolve the following failure:
```
rand.Seed is deprecated: As of Go 1.20 there is no reason to call Seed with a random value.
```

The go rand.Seed docs: https://pkg.go.dev/math/rand@go1.20#Seed
back this up and states:
> If Seed is not called, the generator is seeded randomly at program startup.
so I believe we can just delete the call.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-21 11:08:59 +01:00
Fabiano Fidêncio
abf52420a4 runtime: tdx: Allow default_{cpu,memory} annotations
For now, let's allow the users to set the default_cpu and default_memory
when using TDX, as they may hit issues related to the size of the
container image that must be pulled and unpacked inside the guest,

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-21 10:26:39 +02:00
stevenhorsman
75a201389d runtime: update go version in go.mod
- Make due to us bumping the golang version used in our CI
but `make vendor` fails without the go version in the runtime go.mod
being increased, so update this and run go mod tidy

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-05-21 09:11:46 +01:00
dependabot[bot]
735185b15c build(deps): bump github.com/containerd/containerd
Bumps the go_modules group with 1 update in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd).


Updates `github.com/containerd/containerd` from 1.7.11 to 1.7.16
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.11...v1.7.16)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: direct:production
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-21 09:11:46 +01:00
Ajay Victor
abe607b0c7 runtime: Disable number of cpu comparison on remote hypervisor scenario
Fixes https://github.com/kata-containers/kata-containers/issues/9238

Signed-off-by: Ajay Victor <ajvictor@in.ibm.com>
2024-05-21 13:34:21 +05:30
dependabot[bot]
01868b2849
---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-05-20 22:06:41 +00:00
Fabiano Fidêncio
072b929b6f
Merge pull request #9660 from malt3/fix/genpolicy/namespace_empty_string
genpolicy: detect empty string in ns as default
2024-05-20 21:34:13 +02:00
Tim Zhang
857d2bbc8e agent: Fix ctr exec stuck problem
Fixes: #9532

Close stdin when write_stdin receives data of length 0.

Stop call notify_term_close() in close_stdin, because it could
discard stdout unexpectedly.

Signed-off-by: Tim Zhang <tim@hyper.sh>
2024-05-20 14:52:14 +08:00
Fabiano Fidêncio
f2de259387
runtime: tdx: Use shared_fs=none
We shouldn't be using 9p, at all, with TEEs, as off right now we have no
way to ensure the channels are encrypted.  The way to work this around
for now is using guest pull, either with containerd + nydus snapshotter
or with CRI-O; or even tardev snapshotter for pulling on the host (which
is the approach used by MSFT).

This is only done for TDX for now, leaving the generic, AMD, and IBM
related stuff for the folks working on those to switch and debug
possible issues on their environment.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-19 18:47:09 +02:00
Malte Poll
babdab9078 genpolicy: detect empty string in ns as default
In Kubernetes, the following values for namespace are equivalent and all refer to the default namespace:

- ` ` (namespace field missing)
- `namespace: ""` (namespace field is the empty string)
- `namespace: "default"`(namespace field has the explicit value `default`)

Genpolicy currently does not handle the empty string case correctly.

Signed-Off-By: Malte Poll <1780588+malt3@users.noreply.github.com>
2024-05-18 12:44:59 +02:00
Pavel Mores
b9febc4458 runtime-rs: document architecture & implementation conventions in qemu-rs
Implementation of QemuCmdLine has a fairly uniform and repetitive structure
that's guided by a set of conventions.  These conventions have however been
mostly implicit so far, leading to a superfluous and annoying
request/force-push churn during qemu-rs PR reviews.

This commit aims to make things explicit so that contributors can take them
into account before an initial PR submission.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-05-17 12:21:44 +02:00
Chelsea Mafrica
5d2af555da runtime: Add missing check in ResizeMemory for CH
ResizeMemory for Cloud Hypervisor is missing a check for the new
requested memory being greater than the max hotplug size after
alignment. Add the check, and since an earlier check for this
setsrequested memory to the max hotplug size, do the same in the
post-alignment check.

Fixes #9640

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-05-15 11:29:18 -07:00
Greg Kurz
bd6420e0cc runtime-rs: Drop some useless QEMU arguments
All these settings are hardcoded as `false` and result in
no extra options on the QEMU command line, like the go
runtime does. There actually not needed :
- we're never going to ask QEMU to survive a guest shutdown
- we're never going to run QEMU daemonized since it prevents
  log collection
- we're never going to ask QEMU to start with the guest stopped

No need to keep this code around then.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-05-15 18:33:43 +02:00
Greg Kurz
e2117d3b71
Merge pull request #9571 from emanuellima1/fix-impl-rtc
runtime-rs: Fix constructing the RTC struct
2024-05-14 09:17:27 +02:00
cncal
232db2d906 runtime: fix duplicated devices requested to the agent
By default, when a container is created with the `--privileged` flag,
all devices in `/dev` from the host are mounted into the guest. If
there is a block device(e.g. `/dev/dm`) followed by a generic
device(e.g. `/dev/null`),two identical block devices(`/dev/dm`)
would be requested to the kata agent causing the agent to exit with error:

> Conflicting device updates for /dev/dm-2

As the generic device type does not hit any cases defined in `switch`,
the variable `kataDevice` which is defined outside of the loop is still
the value of the previous block device rather than `nil`. Defining `kataDevice`
in the loop fixes this bug.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-12 16:38:37 +08:00
Emanuel Lima
59c1567f80 runtime-rs: Fix constructing the RTC struct
RTC was being built in a wrong fashion on commit #2bc5e3c6e2ab0145fa9e8be95df0d5086c07a517

RTC was being constructed inside the QemuCmdLine struct,
but it should've been built inside the devices vector.

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-05-09 15:00:47 -03:00
Fabiano Fidêncio
77f457c0e1
runtime: tdx: Drop sept-ve-disable=on
This was needed when we were using an old (and not maintained anymore)
host stack.  Considering what we have as part of the distros, Today,
this can simply be dropped, as I cannot find any reference of this one
being needed in any up-to-date documentation.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
416d00228c
Revert "qemu: tdx: Adapt command line" (partially)
This reverts commit b7cccfa019.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
1c3037fd25
Revert "govmm: tdx: Expose the private=on|off knob"
This reverts commit 582b5b6b19.

The `private=on` bit has never made its way upstream, and was removed
from the latest iteration that we're using.  With that in mind, let's
revert its addition, and later on its usage in the code.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
Fabiano Fidêncio
f48450b360
runtime: config: tdx: Add QEMU / OVMF placeholder var
Let's add the PLACEHOLDER_FOR_DISTRO_{QEMU,OVMF}_WITH_TDX_SUPPORT
variables instead of actually setting a path, so we can easily replace
those as part of our deployment scripts.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-05-09 07:59:12 +02:00
GabyCT
3b8a910393
Merge pull request #9596 from lifupan/main
db: fix the issue of failed to init pci root bus
2024-05-08 13:14:20 -06:00
Alex Lyn
875e6e3815
Merge pull request #9601 from cncal/fix_redundant_log
qemu: the error is logged only when it occurs
2024-05-08 08:59:01 +08:00
GabyCT
22087f9db9
Merge pull request #9598 from lifupan/main_shim
runtime-rs: fix the issue of the leak of dead shim
2024-05-07 10:14:11 -06:00
GabyCT
a564422b7b
Merge pull request #9582 from cncal/main
build: fix the confusing build message if yq doesn't exist in GOPATH/bin
2024-05-07 09:34:27 -06:00
Fabiano Fidêncio
ddf6b367c7
Merge pull request #9568 from kata-containers/dependabot/go_modules/src/runtime/go_modules-22ef55fa20
build(deps): bump the go_modules group across 5 directories with 8 updates
2024-05-07 13:14:48 +02:00
cncal
15d511af97 qemu: the error is logged only when it occurs
Everytime I create contianer on arm64 machine, containerd/kata logs a redundant warning
as follows:
``` shell
time="2024-05-07" level=warning msg="<nil>" arch=arm64 name=containerd-shim-v2
pid=xxx sandbox=fdd1f05 source=virtcontainers/hypervisor
```
I added an error statement so that the error would be logged when it occurs.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-07 14:28:04 +08:00
Fupan Li
3694f3d9fe runtime-rs: fix the issue of the leak of dead shim
We should init and asign the runtime instance to runtime
handler, otherwise, if the pause container failed to start,
which means the runtime instance failed to start, then the
following delete & shutdown request wouldn't be run, thus
the dead shim would be left.

Fixes: #9597

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 17:31:31 +08:00
Fupan Li
26bee78e8d db: fix the issue of failed to init pci root bus
dragonball reserves 2048G of mmio space for the pci root bus by default
on physical addresses greater than 4G. However, for some machines with
smaller physical address widths, such as 39-bit wide physical addresses,
dragonball reserves the mmio space when initializing the memory. It is
less than 2048G, so this commit dynamically calculates and allocates the
mmio size of each pci root bus.

Fixes: #9509

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-05-06 11:34:18 +08:00
Aurélien Bombo
0cc2b07a8c tests: adapt Mariner CI to unblock CH v39 upgrade
The CH v39 upgrade in #9575 is currently blocked because of a bug in the
Mariner host kernel. To address this, we temporarily tweak the Mariner
CI to use an Ubuntu host and the Kata guest kernel, while retaining the
Mariner initrd. This is tracked in #9594.

Importantly, this allows us to preserve CI for genpolicy. We had to
tweak the default rules.rego however, as the OCI version is now
different in the Ubuntu host. This is tracked in #9593.

This change has been tested together with CH v39 in #9588.

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2024-05-03 16:29:12 +00:00
cncal
48d873b52b build: fix the confusing build message if yq doesn't exist in GOPATH/bin
The build message shows that yq was not found when I tried to build
runtime binaries, but I've actually installed yq by yum install.

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:34:45 +08:00
cncal
9caa7beb1f runtime: make kata-runtime check error more understandable
If device /dev/kvm does not exist, kata-runtime check would fail with
an ambiguous error messae 'no such file or directory'. I added a little
more details to make it understandable and it will belike:

```
ERRO[0000] cannot open kvm device: no such file or directory  arch=arm64 check-type=full device=/dev/kvm name=kata-runtime pid=2849085 source=runtime
ERRO[0000] no such file or directory                          arch=arm64 name=kata-runtime pid=2849085 source=runtime
no such file or directory
```

Signed-off-by: cncal <flycalvin@qq.com>
2024-05-03 08:29:08 +08:00
Zvonko Kaiser
e5e0983b56
Merge pull request #9476 from zvonkok/nvidia-config-tomls
config: Add NVIDIA GPU SNP, TDX configuration files
2024-05-02 10:27:10 +02:00
stevenhorsman
3c2232d898 runtime: fix testVersionString logic
- The testVersionString logic use regex to check that the ociVersion is
displayed correctly, but with the new go module that version has a
`+` in, so we need to quote this to escape special characters

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-30 10:54:49 +01:00
dependabot[bot]
391bc35805 build(deps): bump the go_modules group across 5 directories with 8 updates
Bumps the go_modules group with 2 updates in the /src/runtime directory: [github.com/containerd/containerd](https://github.com/containerd/containerd) and [github.com/containers/podman/v4](https://github.com/containers/podman).
Bumps the go_modules group with 4 updates in the /src/tools/csi-kata-directvolume directory: [golang.org/x/sys](https://github.com/golang/sys), google.golang.org/protobuf, [golang.org/x/net](https://github.com/golang/net) and [google.golang.org/grpc](https://github.com/grpc/grpc-go).
Bumps the go_modules group with 2 updates in the /src/tools/log-parser directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3.
Bumps the go_modules group with 2 updates in the /tests directory: [golang.org/x/sys](https://github.com/golang/sys) and gopkg.in/yaml.v3.
Bumps the go_modules group with 2 updates in the /tools/testing/kata-webhook directory: [golang.org/x/sys](https://github.com/golang/sys) and [golang.org/x/net](https://github.com/golang/net).


Updates `github.com/containerd/containerd` from 1.7.2 to 1.7.11
- [Release notes](https://github.com/containerd/containerd/releases)
- [Changelog](https://github.com/containerd/containerd/blob/main/RELEASES.md)
- [Commits](https://github.com/containerd/containerd/compare/v1.7.2...v1.7.11)

Updates `github.com/containers/podman/v4` from 4.2.0 to 4.9.4
- [Release notes](https://github.com/containers/podman/releases)
- [Changelog](https://github.com/containers/podman/blob/v4.9.4/RELEASE_NOTES.md)
- [Commits](https://github.com/containers/podman/compare/v4.2.0...v4.9.4)

Updates `google.golang.org/protobuf` from 1.29.1 to 1.33.0

Updates `github.com/cyphar/filepath-securejoin` from 0.2.3 to 0.2.4
- [Release notes](https://github.com/cyphar/filepath-securejoin/releases)
- [Commits](https://github.com/cyphar/filepath-securejoin/compare/v0.2.3...v0.2.4)

Updates `golang.org/x/sys` from 0.15.0 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `google.golang.org/protobuf` from 1.31.0 to 1.33.0

Updates `golang.org/x/net` from 0.19.0 to 0.23.0
- [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0)

Updates `google.golang.org/grpc` from 1.59.0 to 1.63.2
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.59.0...v1.63.2)

Updates `golang.org/x/sys` from 0.0.0-20191026070338-33540a1f6037 to 0.1.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `gopkg.in/yaml.v3` from 3.0.0-20200313102051-9f266ea9e77c to 3.0.0

Updates `golang.org/x/sys` from 0.0.0-20220429233432-b5fbb4746d32 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `gopkg.in/yaml.v3` from 3.0.0-20210107192922-496545a6307b to 3.0.0

Updates `golang.org/x/sys` from 0.15.0 to 0.19.0
- [Commits](https://github.com/golang/sys/compare/v0.15.0...v0.19.0)

Updates `golang.org/x/net` from 0.19.0 to 0.23.0
- [Commits](https://github.com/golang/net/compare/v0.19.0...v0.23.0)

---
updated-dependencies:
- dependency-name: github.com/containerd/containerd
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: github.com/containers/podman/v4
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: google.golang.org/protobuf
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: github.com/cyphar/filepath-securejoin
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: gopkg.in/yaml.v3
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/sys
  dependency-type: indirect
  dependency-group: go_modules
- dependency-name: golang.org/x/net
  dependency-type: indirect
  dependency-group: go_modules
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-04-30 09:46:13 +01:00
Wainer Moschetta
eae429a39b
Merge pull request #9552 from wainersm/kata_cc_dev
runtime: new qemu-coco-dev configuration
2024-04-30 05:21:49 -03:00
Pavel Mores
1dd06cf40d
Merge pull request #9551 from pmores/support-iommu
runtime-rs: support IOMMU in qemu VMs
2024-04-29 15:26:11 +02:00
Wainer dos Santos Moschetta
42fb5d7760 runtime: new qemu-coco-dev configuration
Created a new configuration to configure Kata for CoCo without requiring TEE
hardware so to allow developers implement/test/debug platform agnostic code
on their workstations. It will also ease testing of CoCo features on CI with
non-TEE supported VMs.

This is based off qemu configuration. The following differences applied:
 - switched to confidential guest image/initrd
 - switched to confidential kernel
 - switched to 9p shared_fs

Fixes #9487
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2024-04-29 05:45:10 -03:00
Fabiano Fidêncio
fe21d7a58b
rootfs: Stop building and shipping OPA
Since OPA binary was replaced by the regorus crate, we can finally stop
building and shipping the binary.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-26 18:51:28 +02:00
Pavel Mores
908ec31d9b runtime-rs: fix iommu_platform support for qemu vhost-user-fs device
iommu_platform support was already added on initial DeviceVhostUserFs
introduction, however it incorrectly enabled iommu_platform also on
non-CCW (e.g. PCI) systems.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
174fc8f44b runtime-rs: support iommu_platform for qemu virtio-net device
Note that it's only supported on CCW systems.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
0d038f20cc runtime-rs: support iommu_platform for qemu virtio-serial device
iommu_platform is only turned on for CCW systems.

PartialEq is added to VirtioBusType to enable the '==' operator.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
66a2dc48ae runtime-rs: support iommu_platform for qemu vhost-vsock device
iommu_platform addition is controlled solely by the configuration file.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
d1e6f9cc4e runtime-rs: add IOMMU to qemu VM if configured
The adding itself is done by a new function add_iommu() that conforms with
the add_*() convention.  Note though that this function is called
internally, by the QemuCmdLine constructor, simply because there's nothing
to trigger its invocation from QemuInner (unlike the other add_*()
functions so far).

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:48:00 +02:00
Pavel Mores
0859f47a17 runtime-rs: add representation of '-device intel-iommu' to qemu-rs
Following the golang shim example, the values are hardcoded.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:47:51 +02:00
Pavel Mores
702bf0d35e runtime-rs: support qemu machine's 'kernel_irqchip' param
We will want to set kernel_irqchip when enabling IOMMU and this commit
adds the requisite support.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-04-26 14:42:54 +02:00
Alex Lyn
f72c6ba814
Merge pull request #9519 from emanuellima1/impl-rtc
runtime-rs: Add RTC to QEMU cmdline
2024-04-26 17:44:47 +08:00
Dan Mihai
b42ddaf15f
Merge pull request #9530 from microsoft/saulparedes/improve_caching
genpolicy: changing caching so the tool can run concurrently with itself
2024-04-25 13:06:23 -07:00
Fabiano Fidêncio
b4360e7e37
Merge pull request #9510 from microsoft/danmihai1/regorus-policy2
agent: use regorus instead of opa
2024-04-24 21:40:29 +02:00
Dan Mihai
2400a4d249
Merge pull request #9428 from arc9693/archana1/genplicyfixes
genpolicy: implement default methods for K8sResource trait
2024-04-24 08:04:19 -07:00
Dan Mihai
ff385eac41 agent: remove unnecessary comment
Remove reminder to initialize Policy earlier, because currently there
are no plans to initialize earlier.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-24 14:53:51 +00:00
Dan Mihai
89c85dfe84
Merge pull request #9432 from UiPath/fix-clh-wait
clh: isClhRunning waits for full timeout when clh exits
2024-04-23 13:02:45 -07:00
Emanuel Lima
2bc5e3c6e2 runtime-rs: Add RTC to QEMU cmdline
Add RTC by hardcoding the ooptions base=utc,driftfix=slew,clock=host

Signed-off-by: Emanuel Lima <emlima@redhat.com>
2024-04-23 10:46:30 -03:00
Greg Kurz
42a79801f3
Merge pull request #9524 from littlejawa/fix_createruntime_hook_not_called
runtime: Call CreateRuntime hooks at container creation time
2024-04-23 13:43:36 +02:00
Fupan Li
469c4e4f44
Merge pull request #9335 from Tim-Zhang/fix-passfd-fifo-open
passfd-io: fix FIFO opening and vsock handling
2024-04-23 09:04:45 +08:00
Alex Lyn
bc2cf95e7a
Merge pull request #9517 from amshinde/update-storage-source-pciblock
runtime-rs: Update storage source for pci block devices
2024-04-23 07:32:36 +08:00
Dan Mihai
5d31eb4847 agent: use regorus 0.1.4
Use regorus 0.1.4 from crates.io, instead of its source code
repository.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 23:21:17 +00:00
Dan Mihai
df23eb09a6 agent: use regorus instead of opa
Implement Agent Policy using the regorus crate instead of the OPA
daemon.

The OPA daemon will be removed from the Guest rootfs in a future PR.

Fixes: #9388

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 19:58:30 +00:00
Dan Mihai
b509c1beee agent: lock anyhow version to 1.0.58
Lock anyhow version to 1.0.58 because:

- Versions between 1.0.59 - 1.0.76 have not been tested yet using
  Kata CI. However, those versions pass "make test" for the
  Kata Agent.

- Versions 1.0.77 or newer fail during "make test" - see
  https://github.com/kata-containers/kata-containers/issues/9538.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-22 19:49:15 +00:00
Archana Shinde
cc6b671101 runtime-rs: Update storage source for pci block devices
In case of block devices using virtio-block, we need to pass the
pci-path as the storage source field to the agent.
Current the virt-path is being passed which works just for mmio block
devices.
In the future when support is added for scsi, block-ccw and pmem
devices, the storage source would need to be handled accordingly.

Fixes: #9034

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-04-22 11:36:58 -07:00
Greg Kurz
6ca0f09710
Merge pull request #9518 from microsoft/danmihai1/agent-cargo-lock
agent: update cargo.lock
2024-04-22 13:36:06 +02:00
Tim Zhang
aeba483ec8 agent: avoid fd leakage of passfd-io
In do_create_container and do_exec_process, we should create the proc_io first,
in case there's some error occur below, thus we can make sure
the io stream closed when error occur.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:33 +08:00
Tim Zhang
8441187d5e runtime-rs: fix FIFO handling
Fixes: #9334

In linux, when a FIFO is opened and there are no writers, the reader
will continuously receive the HUP event. This can be problematic.
To avoid this problem, we open stdin in write mode and keep the stdin-writer

We need to open the stdout/stderr as the read mode and keep the open endpoint
until the process is delete. otherwise,
the process would exit before the containerd side open and read
the stdout fifo, thus runD would write all of the stdout contents into
the stdout fifo and then closed the write endpoint. Then, containerd
open the stdout fifo and try to read, since the write side had closed,
thus containerd would block on the read forever.
Here we keep the stdout/stderr read endpoint File in the common_process,
which would be destroied when containerd send the delete rpc call,
at this time the containerd had waited the stdout read return, thus it
can make sure the contents in the stdout/stderr fifo wouldn't be lost.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:33 +08:00
Tim Zhang
d68eb7f0ad agent: Fix close_stdin for passfd-io
In scenario passfd-io, we should wait for stdin to close itself
instead of manually intervening in it.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-22 17:39:32 +08:00
Archana Choudhary
4a010cf71b genpolicy: add default implementations for K8sResource trait
This commit adds default implementations for following methods of
K8sResource trait:
- generate_policy
- serialize

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
6edc3b6b0a genpolicy: add default implementation for use_sandbox_pidns
This patch adds a default implementation for the use_sandbox_pidns
and updates the structs that implement the K8sResource trait to use
the default.

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
d5d3f9cda7 genpolicy: add default implementation for use_host_network
- Provide default implementation for use_host_network
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
9a3eac5306 genpolicy: add default impl for get_containers
- Provide default impl for get_containers
- Remove default impl from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
2db3470602 genpolicy: add default impl for get_container_mounts_and_storages
- Provide default impl for get_container_mounts_and_storages
- Remove default impl from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:59:02 +00:00
Archana Choudhary
09b0b4c11d genpolicy: add default implementation for get_sandbox_name
- Provide default implementation for get_sandbox_name in K8sResource trait
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:55:32 +00:00
Archana Choudhary
43e9de8125 genpolicy: add default implementation for get_annotations
- Provide default implementation for get_annontations.
- Remove default implementation from structs implementing the trait K8sResource

Fixes: #8960
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
2024-04-21 12:55:32 +00:00
Saul Paredes
2149cb6502 genpolicy: changing caching so the tool can run
concurrently with itself

Based on 3a1461b0a5186a92afedaaea33ff2bd120d1cea0

Previously the tool would use the layers_cache folder for all instances
and hence delete the cache when it was done, interfereing with other
instances. This change makes it so that each instance of the tool will
have its own temp folder to use.

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-19 15:46:30 -07:00
Steve Horsman
7e12d588c0
Merge pull request #9485 from sparky005/update_golang.org/x/net
update golang.org/x/net
2024-04-19 11:26:13 +01:00
Julien Ropé
70e798ed35 runtime: Call CreateRuntime hooks at container creation time
CreateRuntime hooks are called at the CreateSandbox time,
but not after CreateContainer.

Fixes: #9523

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-04-19 10:25:02 +02:00
Dan Mihai
4242801b1c agent: update cargo.lock
Update Kata Agent's Cargo.lock after the recent changes to Cargo.toml.

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-04-18 17:12:48 +00:00
Adil Sadik
1c5ca0c915 runtime: update golang.org/x/net
updates golang.org/x/net to newer version that closes some reported
vulnerabilities and security issues

Fixes #9486

Signed-off-by: Adil Sadik <sparky.005@gmail.com>
2024-04-18 10:55:02 -04:00
Tim Zhang
221c5b51fe dragonball: fix EPOLLHUP/EPOLLERR events handling in vsock
1. EPOLLHUP events also need to be read and will be got len 0.
2. We should kill the connection when EPOLLERR events are received.

Signed-off-by: Tim Zhang <tim@hyper.sh>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2024-04-18 20:47:02 +08:00
Zvonko Kaiser
eda3bfe2ef config: Add NVIDIA GPU SNP, TDX configuration files
Fixes: #9475

For TDX and SNP add NVIDIA specific configuration files

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-04-17 12:49:13 +00:00
Dan Mihai
c26dad8fe5
Merge pull request #9294 from burgerdev/burgerdev/genpolicy-configurable-pause
genpolicy: support insecure registries and custom pause containers
2024-04-16 09:39:33 -07:00
Greg Kurz
aca6a1bcb5
Merge pull request #9353 from pmores/pr-8866-follow-up
runtime-rs: refactor qemu driver
2024-04-16 16:07:36 +02:00
Hyounggyu Choi
32f58abfde
Merge pull request #9403 from BbolroC/runtime-rs-ci-qemu
CI: Enable GHA cri-containerd workflow for runtime-rs with QEMU
2024-04-15 09:31:25 +02:00
Hyounggyu Choi
606f8e1ab2 runtime-rs: Adjust configuration for qemu-runtime-rs
To make `qemu-runtime-rs` working for CI, we have to rename a configuration
template file and `CONFIG_FILE_QEMU` in Makefile.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-04-12 12:25:53 +02:00
Alexandru Matei
9e01732f7a agent: shutdown vm on exit when agent is used as init process
Linux kernel generates a panic when the init process exits.
The kernel is booted with panic=1, hence this leads to a
vm reboot.
When used as a service the kata-agent service has an ExecStop
option which does a full sync and shuts down the vm.
This patch mimicks this behavior when kata-agent is used as
the init process.

Fixes: #9429

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-04-12 11:32:31 +03:00
Alexandru Matei
54923164b5 clh: isClhRunning waits for full timeout when clh exits
isClhRunning uses signal 0 to test whether the process is
still alive or not. This doesn't work because the process is a
direct child of the shim. Once it is dead the process becomes
zombie.
Since no one waits for it the process lingers until
its parent dies and init reaps it. Hence sending signal 0 in
isClhRunning will always return success whether the process is
dead or not.
This patch calls wait to reap the process, if it succeeds that
means it is our child process, if not we send the signal.

Fixes: #9431

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-04-12 11:31:53 +03:00
Markus Rudy
77540503f9 genpolicy: add support for insecure registries
genpolicy is a handy tool to use in CI systems, to prepare workloads
before applying them to the Kubernetes API server. However, many modern
build systems like Bazel or Nix restrict network access, and rightfully
so, so any registry interaction must take place on localhost.
Configuring certificates for localhost is tricky at best, and since
there are no privacy concerns for localhost traffic, genpolicy should
allow to contact some registries insecurely. As this is a runtime
environment detail, not a target environment detail, configuring
insecure registries does not belong into the JSON settings, so it's
implemented as command line flags.

Fixes: #9008

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 22:29:03 +02:00
Markus Rudy
bc2292bc27 genpolicy: make pause container image configurable
CRIs don't always use a pause container, but even if they do the
concrete container choice is not specified. Even if the CRI config can
be tweaked, it's not guaranteed that registries in the public internet
can be reached. To be portable across CRI implementations and
configurations, the genpolicy user needs to be able to configure the
container the tool should append to the policy.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:26:35 +02:00
Markus Rudy
8b30fa103f genpolicy: parse json settings during config init
Decouple initialization of the Settings struct from creating the
AgentPolicy struct, so that the settings are available for evaluating,
extending or overriding command line arguments.

Signed-off-by: Markus Rudy <webmaster@burgerdev.de>
2024-04-11 16:17:33 +02:00
Xuewei Niu
50f78ec52c agent: Fix the issue with the "test_new_fs_manager" test
This patch introduces a one-time cpath to mitigate the cgroup residuals. It
might break the device cgroup merging rules when the cgroup has children.

Fixes: #9456

Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
2024-04-11 18:06:05 +08:00
Saul Paredes
51498ba99a genpolicy: toggle containerd pull in tests
- Add v1 image test case
- Install protobuf-compiler in build check
- Reset containerd config to default in kubernetes test if we are testing genpolicy
- Update docker_credential crate
- Add test that uses default pull method
- Use GENPOLICY_PULL_METHOD in test

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 19:28:29 -07:00
Saul Paredes
c96ebf237c genpolicy: add containerd pull method
Add optional toggle to use existing containerd installation to pull and manage container images.
This adds support to a wider set of images that are currently not supported by standard pull method,
such as those that use v1 manifest.

Fixes: #9144

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-04-08 09:56:59 -07:00
Greg Kurz
8b996b9307
Merge pull request #9331 from egernst/foobar
katautils: check number of cores on the system intead of go runtime
2024-04-08 18:38:49 +02:00
Wainer Moschetta
fba1d394d7
Merge pull request #9369 from ChengyuZhu6/sandbox-image
agent:image: Support different pause image in the guest for guest pull
2024-04-08 11:06:21 -03:00
stevenhorsman
864e9c22ba agent: doc: Add new config doc
Document the new guest_components_rest_api config parameter

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2024-04-08 11:38:53 +01:00
Biao Lu
f0edec84f6 agent: Launch api-server-rest
If 'rest_api' is configured, let's start the  api-server-rest after
the attestation-agent and the confidential-data-hub have been started.

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:38:53 +01:00
Biao lu
4d752e6350 agent: Add config for api-server-rest
Add configuration for 'rest api server'.

Optional configurations are
  'agent.rest_api=attestation' will enable attestation api
  'agent.rest_api=resource' will enable resource api
  'agent.rest_api=all' will enable all (attestation and resource) api

Fixes: #7555

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Biao Lu
f476d671ed agent: Launch the confidential data hub
Let's introduce a new method to start the confidential data hub and the
attestation agent.  The former depends on the later, and it needs to be
started before the RPC server.

Starting the attestation components is based on whether the confidential
containers guest components binaries are found in the rootfs.

Fixes: #7544

Signed-off-by: Biao Lu <biao.lu@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Signed-off-by: Linda Yu <linda.yu@intel.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: Jakob Naucke <jakob.naucke@ibm.com>
Co-authored-by: Wang, Arron <arron.wang@intel.com>
Co-authored-by: zhouliang121 <liang.a.zhou@linux.alibaba.com>
Co-authored-by: Alex Carter <alex.carter@ibm.com>
Co-authored-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
2024-04-08 11:06:14 +01:00
Greg Kurz
be8f0cb520
Merge pull request #9402 from deagon/feat/debug-threads
qemu: show the thread name when enable the hypervisor.debug option
2024-04-08 11:04:36 +02:00
ChengyuZhu6
8c897f822c agent:image: Support different pause image in the guest for guest pull
Support different pause images in the guest for guest-pull, such as k8s
pause image (registry.k8s.io/pause) and openshift pause image (quay.io/bpradipt/okd-pause).

Fixes: #9225 -- part III

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-04-07 09:00:10 +08:00
Fabiano Fidêncio
cdb8531302
hypervisor: Simplify TDX protection detection
Let's rely on the kvm module 'tdx' parameter to do so.
This aligns with both OSVs (Canonical, Red Hat, SUSE) and the TDX
adoption (https://github.com/intel/tdx-linux) stacks.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
b7cccfa019
qemu: tdx: Adapt command line
This commit is a mess, but I'm not exactly sure what's the best way to
make it less messy, as we're getting QEMU TDX to work while partially
reverting 1e34220c41.

With that said, let me cover the content of this commit.

Firstly, we're reverting all the changes related to
"memory-backend-memfd-private", as that's what was used with the
previous host stack, but it seems it
didn't fly upstream.

Secondly, in order to get QEMU to properly work with TDX, we need to
enforce the 'private=on' knob and use the "memory-backend-ram", and
we're doing so, and also making sure to test the `private=on` newly
added knob.

I'm sorry for the confusion, I understand this is not optimal, I just
don't see an easy path to do changes without leaving the code broken
during those changes.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 19:51:27 +02:00
Fabiano Fidêncio
6b4cc5ea6a
Revert "qemu: tdx: Workaround SMP issue with TDX 1.5"
This reverts commit d1b54ede29.

 Conflicts:
	src/runtime/virtcontainers/qemu.go

This commit was a hack that was needed in order to get QEMU + TDX to
work atop of the stack our CI was running on.  As we're moving to "the
officially supported by distros" host OS, we need to get rid of this.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Fabiano Fidêncio
582b5b6b19
govmm: tdx: Expose the private=on|off knob
The private=on|off knob is required in order to properly lauunch a TDX
guest VM.

This is a brand new property that is part of the still in-flight patches
adding TDX support on QEMU.

Please, see:
3fdd8072da

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2024-04-05 10:23:52 +02:00
Alex Lyn
0e0a361f0e
Merge pull request #8782 from Apokleos/device-increate-count
bugfix and refactor device increate count
2024-04-05 13:43:49 +08:00
Eric Ernst
da01bccd36 katautils: check number of cores on the system intead of go runtime
We used to utilize go runtime's "NumCPUs()", which will give the number
of cores available to the Go runtime, which may be a subset of physical
cores if the shim is started from within a cpuset. From the function's
description:
"NumCPU returns the number of logical CPUs usable by the current
process."

As an example, if containerd is run from within a smaller CPUset, the
maximum size of a pod will be dictated by this CPUset, instead of what
will be available on the rest of the system.

Since the shim will be moved into its own cgroup that may have a
different CPUset, let's stick with checking physical cores. This also
aligns with what we have documented for maxVCPU handling.

In the event we fail to read /proc/cpuinfo, let's use the goruntime.

Fixes: #9327

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2024-04-03 16:09:16 -07:00
Alex Lyn
935a1a3b40 runtime-rs: refactor decrease_attach_count with do_decrease_count
Try to reduce duplicated code in decrease_attach_count with public
new function do_decrease_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
4f0fab938d runtime-rs: refactor increase_attach_count with do_increase_count
Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:19 +08:00
Alex Lyn
fff64f1c3e runtime-rs: introduce dedicated function do_decrease_count
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:19:08 +08:00
Alex Lyn
5750faaf31 runtime-rs: introduce dedicated function do_increase_count
Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-03 17:09:17 +08:00
Guoqiang Ding
cd0c31e185 qemu: show the thread name when enable the hypervisor.debug option
Add debug-threads=on in the name argument if debug enabled.

Fixes: #9400
Signed-off-by: Guoqiang Ding <dgq8211@gmail.com>
2024-04-03 10:36:52 +08:00
Alex Lyn
fa8049af6c
Merge pull request #9383 from Apokleos/unified-cgrp-cmdline
kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
2024-04-02 09:08:04 +08:00
Alex Lyn
07bfdf4a22
Merge pull request #9275 from Apokleos/swap-hooks-bindmnt
kata-agent: Change order of guest hook and bind mount processing
2024-04-02 07:40:10 +08:00
Alex Lyn
c88014834b kata-agent: enabling cgroups-v2 by systemd.unified_cgroup_hierarchy
Configure the system to mount cgroups-v2 by default during system boot
by the systemd system, We must add systemd.unified_cgroup_hierarchy=1
parameter to kernel cmdline, which will be passed by kernel_params in
configuration.toml.
To enable cgroup-v2, just add systemd.unified_cgroup_hierarchy=true[1]
to kernel_params.

Fixes: #9336

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 18:45:12 +08:00
alex.lyn
548f252bc4 runtime-rs: bugfix incorrect use of refcount before vfio attach
When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.

Fixes: #8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-04-01 11:28:57 +08:00
Alex Lyn
dfa8832406
Merge pull request #9345 from c3d/bug/9342-agent-test-errors
agent: Fix errors in `make check`
2024-04-01 09:48:44 +08:00
Dan Mihai
600f9266f3 runtime: remove stream copy infinite loop
This reverts commit 1c5693be86.

Avoid apparent infinite loop when ReadStreamRequest is blocked by
policy - for some of the pods.

When running the k8s-limit-range.bats test with Policy enabled,
the Shim + VMM never get terminated on my cluster. Not sure why
the sandbox clean-up works better for other tests, but the
k8s-limit-range test pod gets stuck in an infinite loop:

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

stdout io stream copy error happens: error = %wrpc error: code =
PermissionDenied desc = \"ReadStreamRequest is blocked by policy

...

policy check: ReadStreamRequest

...

Fixes: #9380

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-28 22:43:28 +00:00
Dan Mihai
ebb26edf42
Merge pull request #9347 from microsoft/danmihai1/reduce-exec-test-policy-prints
genpolicy: reduce policy debug prints
2024-03-27 15:12:10 -07:00
Tobin Feldman-Fitzthum
9856fe5bea runtime: remove ServiceOffload parameter
Since we no longer use the service_offload configuration,
remove the ServiceOffload field from the image struct.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:13 -05:00
Tobin Feldman-Fitzthum
a18c7ca307 runtime: remove unimplemented CoCo configurations
These experimental options were added 2 years ago
in anticipation of features that would be added
in CoCo. These do not match the features that were
eventually added and will soon be ported to main.

Fixes: #8047

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2024-03-27 12:21:06 -05:00
Chengyu Zhu
e66a5cb54d
Merge pull request #9332 from ChengyuZhu6/guest-pull-timeout
Support to set timeout to pull large image in guest
2024-03-28 00:34:08 +08:00
Christophe de Dinechin
82c4079fd0 agent: Remove useless loop
This is the report from `make check`:

```
error: this loop never actually loops
   --> src/signal.rs:147:9
    |
147 | /         loop {
148 | |             select! {
149 | |                 _ = handle => {
150 | |                     println!("INFO: task completed");
...   |
156 | |             }
157 | |         }
    | |_________^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#never_loop
    = note: `#[deny(clippy::never_loop)]` on by default
```

There is only one option: you get something or a timeout. You never retry, so
the report is correct.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
df5c88cdf0 agent: Remove lint error about .flatten running forever
The lint report is the following:

```
error: `flatten()` will run forever if the iterator repeatedly produces an `Err`
    --> src/rpc.rs:1754:10
     |
1754 |         .flatten()
     |          ^^^^^^^^^ help: replace with: `map_while(Result::ok)`
     |
note: this expression returning a `std::io::Lines` may produce an infinite number of `Err` in case of a read error
    --> src/rpc.rs:1752:5
     |
1752 | /     reader
1753 | |         .lines()
     | |________________^
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#lines_filter_map_ok
     = note: `-D clippy::lines-filter-map-ok` implied by `-D warnings`
     = help: to override `-D warnings` add `#[allow(clippy::lines_filter_map_ok)]`
```

This commit simply applies the suggestion.

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Christophe de Dinechin
bfb55312be agent: Fix .enumerate errors during make check
Running `make check` in the `src/agent` directory gives:

```
error: you seem to use `.enumerate()` and immediately discard the index
   --> rustjail/src/mount.rs:572:27
    |
572 |     for (_index, line) in reader.lines().enumerate() {
    |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unused_enumerate_index
    = note: `-D clippy::unused-enumerate-index` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::unused_enumerate_index)]`
help: remove the `.enumerate()` call
    |
572 |     for line in reader.lines() {
    |         ~~~~    ~~~~~~~~~~~~~~

    Checking tokio-native-tls v0.3.1
    Checking hyper-tls v0.5.0
    Checking reqwest v0.11.18
error: could not compile `rustjail` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...
make: *** [../../utils.mk:177: standard_rust_check] Error 101
```

Fixes: #9342

Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2024-03-27 17:03:44 +01:00
Greg Kurz
e1068da1a0
Merge pull request #9326 from gkurz/draft-release
Only tag and publish the release when it is fully ready
2024-03-27 15:59:59 +01:00
ChengyuZhu6
c2dc13ebaa runtime: support to configure CreateContainer Timeout in configurations
support to configure CreateContainerRequestTimeout in the
configurations.

e.g.:
[runtime]
...
create_container_timeout = 300

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 21:58:41 +08:00
Greg Kurz
693c9487d4 docs: Adjust release documentation
Most of the content of `docs/Stable-Branch-Strategy.md` got de-facto
deprecated by the re-design of the release process described in #9064.
Remove this file and all its references in the repo.

The `## Versioning` section has some useful information though. It is
moved to `docs/Release-Process.md`. The documentation of the `PATCH`
field is adapted according to new workflow.

Fixes #9064 - part VI

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-27 12:41:48 +01:00
Steve Horsman
45aba769c0
Merge pull request #9346 from cmaf/ci-remove-repo-docs
Remove additional links to tests directory
2024-03-27 11:13:32 +00:00
ChengyuZhu6
2224f6d63f runtime: support to configure CreateContainer timeout in annotation
Support to configure CreateContainerRequestTimeout in the annotations.

e.g.:
annotations:
      "io.katacontainers.config.runtime.create_container_timeout": "300"

Note: The effective timeout is determined by the lesser of two values: runtime-request-timeout from kubelet config
(https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=runtime%2Drequest%2Dtimeout) and create_container_timeout.
In essence, the timeout used for guest pull=runtime-request-timeout<create_container_timeout?runtime-request-timeout:create_container_timeout.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
ChengyuZhu6
39bd462431 runtime: support to set timeout for CreateContainerRequest
In the situation to pull images in the guest #8484, it’s important to account for pulling large images.
Presently, the image pull process in the guest hinges on `CreateContainerRequest`, which defaults to a 60-second timeout.
However, this duration may prove insufficient for pulling larger images, such as those containing AI models.
Consequently, we must devise a method to extend the timeout period for large image pull.

Fixes: #8141

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-27 15:44:29 +08:00
Pavel Mores
4c72b02e53 runtime-rs: remove the now-unused code of NetDevice
The remaining code in network.rs was mostly moved to utils.rs which seems
better home for these utility functions anyway (and a closely related
function open_named_tuntap() has already lived there).

ToString implementation for Address was removed after some consideration.
Address should probably ideally implement Display (as per RFC 565) which
would also supply a ToString implementation, however it implements Debug
instead, probably to enable automatic implementation of Debug for anything
that Address is a member of, if for no other reason.  Rather than having
two identical functions this commit simply switches to using the Debug
implementation for printing Address on qemu command line.

Fixes #9352

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:52:40 +01:00
Pavel Mores
c94e55d45a runtime-rs: make QemuCmdLine own vsock file descriptor
Make file descriptors to be passed to qemu owned by QemuCmdLine.  See
commit 52958f17cd for more explanation.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
0cf0e923fc runtime-rs: refactor QemuCmdLine::add_network_device() signature
add_network_device() doesn't need to be passed NetworkInfo since it
already has access to the full HypervisorConfig.

Also, one of the goals of QemuCmdLine interface's design is to avoid
coupling between QemuCmdLine and the hypervisor crate's device module,
if at all possible.  That's why add_network_device() shouldn't take
device module's NetworkConfig but just parts that are useful in
add_network_device()'s implementation.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
a4f033f864 runtime-rs: add should_disable_modern() utility function
is_running_in_vm() is enough to figure out whether to disable_modern but
it's clumsy and verbose to use.  should_disable_modern() streamlines the
usage by encapsulating the verbosity.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
12e40ede97 runtime-rs: reimplement add_network_device() using Netdev & DeviceVirtioNet
This commit replaces the existing NetDevice-based implementation with one
using Netdev and DeviceVirtioNet.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
0a57e2bb32 runtime-rs: refactor NetDevice in qemu driver
In keeping with architecture of QemuCmdLine implementation we split the
functionality into two objects: Netdev to represent and generate the
-netdev part and DeviceVirtioNet for the -device virtio-net-<transport>
part.

This change is a pure refactor, existing functionality does not change.
However, we do remove some stub generalizations and govmm-isms, notably:
- we remove the NetDev enum since the only network interface types that
  kata seems to use with qemu are tuntap and macvtap, both of which are
  implemented by the same -netdev tap
- enum DeviceDriver is also left out since it doesn't seem reasonable to
  try to represent VFIO NICs (which are completely different from
  virtio-net ones) with the same struct as virtio-net
- we also remove VirtioTransport because there's no use for it so far, but
  with the expectation that it will be added soon.

We also make struct Netdev the owner of any vhost-net and queue file
descriptors so that their lifetime is tied ultimately to the lifetime of
QemuCmdLine automatically, instead of returning the fds to the caller and
forcing it to achieve the equivalent functionality but manually.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
7f23734172 runtime-rs: reduce generate_netdev_fds() dependencies
generate_netdev_fds() takes NetworkConfig from which it however only needs
a host-side network device name.  This commit makes it take the device name
directly, making the function useful to callers who don't have the whole
NetworkConfig but do have the requisite device name.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:41 +01:00
Pavel Mores
d4ac45d840 runtime-rs: refactor clear_fd_flags()
The idea of this function is to make sure O_CLOEXEC is not set on file
descriptors that should be inherited by a child (=hypervisor) process.
The approach so far is however rather heavy-handed - clearing *all* flags
is unjustifiably aggresive for a low-level function with no knowledge of
context whatsoever.

This commit refactors the function so that it only does what's expected
and renames it accordingly.  It also clarifies some of its call sites.

Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-26 12:50:14 +01:00
Chengyu Zhu
d16971e37e
Merge pull request #9325 from ChengyuZhu6/image_service
agent:image: Refactor code to improve memory efficiency of image service
2024-03-26 10:38:37 +08:00
Dan Mihai
6c72c29535 genpolicy: reduce policy debug prints
Kata CI has full debug output enabled for the cbl-mariner k8s tests,
and the test AKS node is relatively slow. So debug prints from policy
are expensive during CI.

Fixes: #9296

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-26 02:21:26 +00:00
Alex Lyn
cec943fc26
Merge pull request #9244 from Apokleos/dgb-gpu
runtime-rs/dragonball: add support building kernel with upcall and GPU hotplug
2024-03-26 08:53:54 +08:00
Chelsea Mafrica
d69514766e src: Remove references to files in tests repo
Change scripts and source that uses files in the tests repo to use the
corresponding file in the current repo.

Fixes #9165

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2024-03-25 15:09:52 -07:00
Alex Lyn
5c54315a87 dragonball: fix CI failure due to poor UT adaptation.
Fixes: #9144

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-25 20:25:27 +08:00
ChengyuZhu6
f47408fdf4 agent:image: Refactor code to improve memory efficiency of image service
Currently, `.lock().await.clone()` results in `Option<ImageService>` being duplicated in memory with each call to `singleton()`.
Consequently, if kata-agent receives numerous image pulling requests simultaneously,
it will lead to the allocation of multiple `Option<ImageService>` instances in memory, thereby consuming additional memory resources.

In image.rs, we introduce two public functions:
`merge_bundle_oci()` and `init_image_service()`. These functions will encapsulate
the operations on `IMAGE_SERVICE`, ensuring that its internal details remain
hidden from external modules such as `rpc.rs`.

Fixes: #9225 -- part II

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-25 07:46:50 +08:00
ChengyuZhu6
7a49ec1c80 agent:util: Refactor the unit tests to leverage rstest
Refactor the unit tests in util.rs to leverage rstest for parameterization.

Fixes: #9314

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:53 +08:00
ChengyuZhu6
2df2b4d30d agent:namespace: Refactor unit tests to leverage rstest
Refactor the unit tests in `namespace.rs` to leverage rstest for parameterization.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-23 10:49:48 +08:00
Hyounggyu Choi
d915a79e2d
Merge pull request #9280 from BbolroC/enable-qemu-on-s390x
runtime-rs: Enable qemu on s390x
2024-03-22 23:58:42 +01:00
Hyounggyu Choi
81aaa34bd6 runtime-rs: Add DeviceVirtioSerial and DeviceVirtconsole
It is observed that virtiofsd exits immediately on s390x
if there is no attached console devices.
This commit resolves the issue by migrating `appendConsole()`
from runtime and being triggered in `start_vm()`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
2cfe745efb runtime-rs: Enable memory backend option for Machine for s390x
For s390x, it requires an additional option `memory-backend` for `-machine`.
Otherwise, virtiofsd exits with HandleRequest(InvalidParam).

This commit is to add a field `memory_backend` to `struct Machine`
and turn it on for s390x.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
Hyounggyu Choi
9bcfaad625 runtime-rs: Add ccw block device for rootfs
Like nvdimm for x86_64, a block device for s390x should be
treated differently with `virtio-blk-ccw`.
This is to generate a QEMU command line parameter for a block
device by using `-blockdev` and `-device` if the `vm_rootfs_driver`
is set to `virtio-blk-ccw`.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-22 19:27:13 +01:00
David Esparza
3e40051634
Merge pull request #9255 from dborquez/thread_pid_function
runtime-rs: ch: Implement full thread/tid/pid handling
2024-03-22 10:05:02 -06:00
Chengyu Zhu
9a4cb96262
Merge pull request #9312 from ChengyuZhu6/show-feature
agent: Add guest-pull to the list of agent features in announce()
2024-03-21 23:35:29 +08:00
David Esparza
b498e140a1
runtime-rs: ch: Implement full thread/tid/pid handling
Add in the full details once cloud-hypervisor/cloud-hypervisor#6103
has been implemented, and the feature is available in a Cloud Hypervisor
release.

Fixes: #8799

Signed-off-by: David Esparza <david.esparza.borquez@intel.com>
2024-03-21 08:24:53 -06:00
ChengyuZhu6
754399d909 agent: Add guest-pull to the list of agent features in announce()
Add guest-pull to the list of agent features in announce().

Fixes: #9225 -- part IV

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-21 20:01:52 +08:00
Xuewei Niu
9c4f9dcb35
Merge pull request #9311 from studychao/chao/fix_mtrr
Dragonballl: introduce MTRR regs support
2024-03-21 17:24:27 +08:00
Hyounggyu Choi
9b2c08935b runtime-rs: Pass different device argument based on bus type
Currently, `*-pci` is used as an argument for the device config.
It is not true for a case where a different type of bus is used.
s390x uses `ccw`.
This commit is to make it flexible to generate the device argument
based on the bus type. A structure `DeviceVhostUserFsPci` and
`VhostVsockPci` is renamed to `DeviceVhostUserFs` and `VhostVsock`
because the structure name is not bound to a certain bus type any more.

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-21 09:25:37 +01:00
Hyounggyu Choi
7b3d1adb8c libs: Bump sysinfo to v0.30.5
It has been observed that the runtime stops running around
`sysinfo::total_memory()` while adjusting a config on s390x.
This is to update the crate to the latest version which happened
to resolve the issue. (No explicit release note for this)

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-20 09:27:13 +01:00
Chao Wu
5a4b858ece Dragonballl: introduce MTRR regs support
MTRR, or Memory-Type Range Registers are a group of x86 MSRs providing a way to control access
 and cache ability of physical memory regions.
During our test in runtime-rs + Dragonball, we found out that this register support is a must
for passthrough GPU running CUDA application, GPU needs that information to properly use GPU memory.

fixes: #9310
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-20 14:18:16 +08:00
ChengyuZhu6
5bad18f9c9
agent: set https_proxy/no_proxy before initializing agent policy
When the https_proxy/no_proxy settings are configured alongside agent-policy enabled, the process of pulling image in the guest will hang.
This issue could stem from the instantiation of `reqwest`’s HTTP client at the time of agent-policy initialization,
potentially impacting the effectiveness of the proxy settings during image guest pulling.
Given that both functionalities use `reqwest`, it is advisable to set https_proxy/no_proxy prior to the initialization of agent-policy.

Fixes: #9212

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
db9f18029c
README: Add https_proxy and no_proxy to agent README
Add agent.https_proxy and agent.no_proxy to the table in the agent README.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
8724d7deeb
packaging: Enable to build agent with PULL_TYPE feature
Enable to build kata-agent with PULL_TYPE feature.

We build kata-agent with guest-pull feature by default, with PULL_TYPE set to default.
This doesn't affect how kata shares images by virtio-fs. The snapshotter controls the image pulling in the guest.
Only the nydus snapshotter with proxy mode can activate this feature.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:06:00 +01:00
ChengyuZhu6
ba242b0198
runtime: support different cri container type check
To support handle image-guest-pull block volume from different CRIs, including cri-o and containerd.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
874d83b510
agent/image: Use guest provided pause image
By default the pause image and runtime config will provided
by host side, this may have potential security risks when the
host config a malicious pause image, then we will use the pause
image packaged in the rootfs.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Julien Ropé <jrope@redhat.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
c269b9e8c6
agent: Add guest-pull feature for kata-agent
Add "guest-pull" feature option to determine that the related dependencies
would be compiled if the feature is enabled.

By default, agent would be built with default-pull feature, which would
support all pull types, including sharing images by virtio-fs and
pulling images in the guest.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 18:05:59 +01:00
ChengyuZhu6
965da9bc9b
runtime: support to pass image information to guest by KataVirtualVolume
support to pass image information to guest by KataVirtualVolumeImageGuestPullType
in KataVirtualVolume, which will be used to pull image on the guest.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
cfd14784a0
agent: Introduce ImagePullHandler to support IMAGE_GUEST_PULL volume
As we do not employ a forked containerd in confidential-containers, we utilize the KataVirtualVolume
which storing the image information as an integral part of `CreateContainer`.
Within this process, we store the image information in rootfs.storage and pass this image url through `CreateContainerRequest`.
This approach distinguishes itself from the use of `PullImageRequest`, as rootfs.storage is already set and initialized at this stage.
To maintain clarity and avoid any need for modification to the `OverlayfsHandler`,we introduce the `ImagePullHandler`.
This dedicated handler is responsible for orchestrating the image-pulling logic within the guest environment.
This logic encompasses tasks such as calling the image-rs to download and unpack the image into `/run/kata-containers/{container_id}/images`,
followed by a bind mount to `/run/kata-containers/{container_id}`.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
462051b067
agent/image: merge container spec for images pulled inside guest
When being passed an image name through a container annotation,
merge its corresponding bundle OCI specification and process into the passed container creation one.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
Co-authored-by: jordan9500 <jordan.jackson@ibm.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
cec1916196
agent: Support https_proxy/no_proxy config for image download in guest
Containerd can support set a proxy when downloading images with a environment variable.
For CC stack, image download is offload to the kata agent, we need support similar feature.
Current we add https_proxy and no_proxy, http_proxy is not added since it is insecure.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
9cddd5813c
agent/image: Enable image-rs crate to pull image inside guest
With image-rs pull_image API, the downloaded container image layers
will store at IMAGE_RS_WORK_DIR, and generated bundle dir with rootfs
and config.json will be saved under CONTAINER_BASE/cid directory.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Arron Wang <arron.wang@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
2b3a00f848
agent: export the image service singleton instance
Export the image service singleton instance.

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
Co-authored-by: Jiang Liu <gerry@linux.alibaba.com>
Co-authored-by: Xynnn007 <xynnn@linux.alibaba.com>
Co-authored-by: stevenhorsman <steven@uk.ibm.com>
Co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
2024-03-19 17:22:36 +01:00
ChengyuZhu6
1f1ca6187d
agent: Introduce ImageService
Introduce structure ImageService, which will be used to pull images
inside the guest.

Fixes: #8103

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
co-authored-by: wllenyj <wllenyj@linux.alibaba.com>
co-authored-by: stevenhorsman <steven@uk.ibm.com>
2024-03-19 17:22:33 +01:00
Chelsea Mafrica
42dfe0e8d1
Merge pull request #9286 from jodh-intel/agent-show-enabled-features
agent: Show features enabled at build time
2024-03-19 08:54:49 -07:00
Alex Lyn
7af2df408e
Merge pull request #9295 from likebreath/0318/fix_clh_default_netconfig
runtime-rs: ch: Provide valid default value for NetConfig
2024-03-19 15:17:18 +08:00
Xuewei Niu
99d0e5fff8
Merge pull request #9270 from zvonkok/kata-agent-bind-mount
kata-agent: optional bind flag
2024-03-19 10:39:23 +08:00
Bo Chen
ad4262e86b runtime-rs: ch: Provide valid default value for NetConfig
The current default value of IP `0.0.0.0` with mask `0.0.0.0` will cause
ioctl error when being used to create and configure TAP device, with
newer version of Cloud Hypervisor [1]. This patch replaces them with
valid value that are the same as the Go-lang runtime [2].

[1] https://github.com/cloud-hypervisor/cloud-hypervisor/pull/5924
[2] e3f7852738/src/runtime/virtcontainers/pkg/cloud-hypervisor/client/model_net_config.go (L40-L57)

Fixes: #9254

Signed-off-by: Bo Chen <chen.bo@intel.com>
2024-03-18 15:47:58 -07:00
Greg Kurz
1e526a4769 runtime-rs: Consolidate the handling of fds passed to QEMU
File descriptors that are passed to QEMU need some special care.
We want them to be closed when the QEMU process is started. But
at the same time, it is required that the associated rust File
structures, either coming from the` std::fs` or the `tokio::fs`
crates, are still in scope when the QEMU process is forked. This
is currently achieved by keeping File structures in variables
at the outer scope of `start_vm()`. This scheme is currently
duplicated, with similar justifications in the corresponding
comments.

Consolidate all this handling in one place with a more generic
explanation.

Fixes #9281

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-15 16:14:59 +01:00
James O. D. Hunt
9ef59488d9 agent: Show features enabled at build time
The agent now has a number of optional build-time features that can be
enabled.

Add details of these features to the following areas:

- Version output (`kata-agent --version`)
- Announce message (so that the details are always added to the journal
  at agent startup).
- The response message returned by the ttRPC `GetGuestDetails()` API.

Fixes: #9285.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2024-03-15 13:29:21 +00:00
Greg Kurz
6a112cc7a5 runtime-rs: Fix missing dependency
Some previous contribution missed to run cargo clippy.
Fix the dependency now so that it doesn't cause noise
in future contributions.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-03-14 23:19:38 +01:00
Dan Mihai
b3b00e00a6
Merge pull request #9246 from microsoft/danmihai/default-env
genpolicy: default env if image doesn't have env
2024-03-14 11:01:43 -07:00
Zvonko Kaiser
c15e19c806 kata-agent: optional bind flag
Fixes: #9269

From https://github.com/opencontainers/runtime-spec/blob/main/config.md#mounts
type (string, OPTIONAL) The type of the filesystem to be mounted.
bind may be only specified in the oci spec options -> flags update r#type
The agent will ignore bind mounts if they are only specified in the OCI spec options and not in the flags.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-14 14:42:01 +00:00
Hyounggyu Choi
1dac6b1357 runtime-rs: Configure s390x specific flags for Makefile
s390x supports a different machine type `s390-ccw-virtio` and it is
not required to configure cpu features by default for the platform.
A hypervisor `dragonball` is not supported on s390x so that `DBCMD`
is not necessary. `vm-rootfs_driver` should be set to `virtio-blk-ccw`.
This commit is to set the architecture-specific flags for Makefile.

Fixes: #9158

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2024-03-14 13:05:35 +01:00
Alex Lyn
2aa3519520 kata-agent: Change order of guest hook and bind mount processing
The guest_hook_path item in configuration.toml allows OCI hook scripts
to be executed within Kata's guest environment. Traditionally, these
guest hook programs are pre-built and included in Kata's guest rootfs
image at a fixed location.

While setting guest_hook_path = "/usr/share/oci/hooks" in configuration.toml
works, it lacks flexibility. Not all guest hooks reside in the path
/usr/share/oci/hooks, and users might have custom locations.

To address this, a more flexible and configurable approach is to be proposed
that allows users to specify their desired path. This could include using a
sandbox bind mount path for hooks specific to that particular container.

However, The current implementation of guest hooks and bind mounts in kata-agent
has a reversed order of execution compared to the desired behavior.
To achieve the intended functionality, we simply need to swap the order of their
implementation.

Fixes: #9274

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-13 20:30:32 +08:00
Zvonko Kaiser
63dff9a9f2 kata-agent: CreateContainer Hook
Fixes: #9267

The doc states we have support for all lifecycle hooks. There are still some missing.
This is the first issue regarding the CreateContainer hook which is run before pivot_root but after prestart and createruntime

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2024-03-13 09:24:25 +00:00
Alex Lyn
410afcc913
Merge pull request #8866 from Apokleos/netdev-qemu-rs
runtime-rs: add netdev params to cmdline for qemu-rs.
2024-03-13 13:07:43 +08:00
Alex Lyn
e2ae8ba79b runtime-rs: add network device into Qemu's cmdline
It will open tuntap device and vhost-net device
and store device files.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:28:54 +08:00
Alex Lyn
d3bca4597e runtime-rs: add open_named_tuntap to open a named tuntap device.
The open_named_tuntap function is designed as a public function to
open a tuntap device with the specified name. However, in order to
reference existing methods in dbs_utils, we still need to keep the
reference "path = "../../../dragonball/src/dbs_utils" in dependencies
and cannot hide it.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:26:32 +08:00
Alex Lyn
005b333976 runtime-rs: add network helpers and impl ToQemuParams
Add network helpers and impl ToQemuParams trait to build
netdev params which are put into cmdline for Qemu VM running.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:25:39 +08:00
Alex Lyn
63786934f4 runtime-rs: set network namespace for qemu process and netdev.
We need ensure the add_network_device happens in netns and
move qemu process into netns which keeps the qemu process
running in this net namespace.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:21:43 +08:00
Alex Lyn
69a5e5b955 runtime-rs: add network device handler in start_vm.
Add network device handler in start_vm, which is sepcially
for Qemu VM running with added net params to command line.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-12 22:18:01 +08:00
Alex Lyn
9f6003adde runtime-rs: add a new netns field in struct QemuInner.
We need add a new netns field in struct QemuInner, and
initialize it with argument passed down in prepare_vm().

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 16:02:39 +08:00
Alex Lyn
f571ec84d2 runtime-rs: add a public method to support process entering netns.
The enter_netns function is designed as a public method to help
VMMs running as a independent process enter a network namespace,
reducing duplicate code.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 15:55:52 +08:00
Alex Lyn
4176fcc3c6 runtime-rs: make the code for cleanup fd flags as public method.
It just move the related code to a public file(utils.rs) and make
it a common method for both vsock and network, or some others.

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Pavel Mores <pmores@redhat.com>
2024-03-11 15:52:20 +08:00
Alex Lyn
b1038704e0 runtime-rs: make NetnsGuard common for hypervisor and resource.
In order to better support non-builtin vmm usage of NetnsGuard and
reduce code duplication, we need to move it to a common path that
can be referenced by both hypervisor and resource manager.

In this patch, it just do moving code from network/utils/netns.rs
to kata-sys-utils/src/netns.rs

Fixes: #8865

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-11 15:38:42 +08:00
Alexandru Matei
617b0114b3 clh: initialize clh pid before using it
The PID needs to be initialized before calling isClhRunning.
waitVMM() uses isClhRunning and is called by launchClh() just
before returning from function.

Fixes: #9230

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-03-09 13:53:51 +02:00
Steve Horsman
54e5ce2464
Merge pull request #9154 from chungeun-choi/change-deprecated-package
fixed - Change the deprecated module from 'io/util' to util. 'io/util…
2024-03-08 15:05:43 +00:00
Alex Lyn
c73597c39d
Merge pull request #9208 from studychao/chao/fix_virt_ci
Dragonball: fix unit test problems when switching to new virt github machine
2024-03-08 09:41:05 +08:00
Chengyu Zhu
d49391a555
Merge pull request #8798 from LindaYu17/setpolicy
add setpolicy function to kata-runtime tool
2024-03-08 06:31:57 +08:00
Dan Mihai
5398b6466c
Merge pull request #9224 from 3u13r/sidecar-container
genpolicy: add restartPolicy to container struct
2024-03-07 12:59:55 -08:00
Dan Mihai
4c3d6fadc8 genpolicy: default env if image doesn't have env
Use containerd's default environment for container images that don't
specify the Env field.

Also, re-enable policy env variable verification, now that these
uncommon images are supported too.

Fixes: #9239

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-07 16:56:06 +00:00
Leonard Cohnen
e30e8ab7dc genpolicy: add restartPolicy to container struct
This adds support for sidecar container introduced in Kubernetes 1.28

Fixes: #9220

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2024-03-07 12:00:14 +01:00
Chungeun Choi
bad263f399 runtime: Replace deprecated module io/ioutil" to "io"
This change updates the module import to use 'util' instead of the deprecated 'io/util'

Fixes: #9166

Signed-off-by: Chungeun Choi <ce.choi@okestro.com>
2024-03-07 10:56:06 +00:00
Alex Lyn
ef9a38e551 shim-interface: add Copyright of AntGroup in file shim-interface.rs
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:46:32 +08:00
Alex Lyn
2972a3a675 shim-interface: add UT for get_uds_with_sid
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:45:44 +08:00
Alex Lyn
7145243bd3 kata-ctl: Support using container short ID to enter guest.
Fixes: #9189

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-07 15:44:47 +08:00
Linda Yu
1c5693be86 stream: repeat copybuffer if it is blocked by policy
copyBuffer returns and the streams will be closed when error occurs.
If the error contains "blocked by policy" it means the log output is
disabled by policy with "ReadStreamRequest" and "WriteStreamRequest" set
to false. But at this moment, we want the real stream still working (not
be seen) because we might want to enable logging for debugging purpose,
so we repeat copybuffer in this case to avoid streams being closed.

Fixes: #8797

Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-03-07 15:00:23 +08:00
Linda Yu
eda419cb03 kata-runtime: add set policy function to kata-runtime
logging/debugging information might probably be disabled in production
due to security consideration, but we'd better provide an approach for
customer to get logging information during runtime, this PR implement
setpolicy function in kata-runtime tools, although it can set whole policy
other than logging.
setpolicy would evokes remote attestation, which means before setting
policy during runtime, user has to reconfigure new policy hash in KBS/AS.

usage:  kata-runtime policy set policy.rego --sandbox-id XXXXXXXX

Fixes: #8797

Signed-off-by: Linda Yu <linda.yu@intel.com>
2024-03-07 15:00:23 +08:00
Dan Mihai
e61ef30a76 genpolicy: disable env variable verification
Disable env variable verification to unblock CI, until container
images that don't specify the Env variables will be handled correctly
(see #9239).

Also, mark the image config Env field as optional, thus allowing
policy generation for these container images.

Fixes: #9240

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-03-07 01:59:18 +00:00
Fupan Li
628f57aca0
Merge pull request #9193 from UiPath/fix/clh-dax
clh: Enable DAX for rootfs
2024-03-05 09:39:22 +08:00
Chao Wu
9f0eab904b Dragonball: fix test_signal_handler
a) There is some unknown syscalls triggered in new github virt machine
that would break the make test process with SIGSYS after applying
SeccompFilter. In order to fix this, we change the allowlist in this
unit test for seccompfileter into a blocklist to avoid meeting the unknown syscalls.
b) lazy static METRICS is not fully initialize in the unit test and may lead to
unstable result for this UT.

fixes: #9207

Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-04 16:27:27 +08:00
Chao Wu
253fe72435 Dragonball: fix test_handler_insert_region
the mmap region start guest addr hard-code a value and later there
 would be check whether the mentioned addr is larger than or equal
 to mem_end (default to host_phy_mem >> 1) in order to satisfy the
 requirement for DaxMemory. Since github virt machine phy_mem is larger
 than previous CI machine we use, the hard-code value could no longer be
 worked. To fix this, we change the address to mem_end in unit test to
 avoid the influence of host machine change.

fixes: #9207
Signed-off-by: Chao Wu <chaowu@linux.alibaba.com>
2024-03-04 16:27:19 +08:00
Xuewei Niu
daab76de36
Merge pull request #9201 from liubogithub/liubo/dev/panic_fix_3
katautils: fix panic on tracing.
2024-03-02 10:27:02 +08:00
Alex Lyn
13a20957cb
Merge pull request #9164 from Apokleos/directvol-csi-dockerfile
csi-kata-directvolume: add Dockerfile for building csi image
2024-03-01 18:12:19 +08:00
Alex Lyn
f69428a1e7 csi-kata-directvolume: add Dockerfile for building csi image
Fixes: #9163

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-03-01 10:41:51 +08:00
Liu Bo
b6f8355ea3 katautils: fix panic on tracing.
This fixes a panic on tracing on container exit.

The root cause is that global var needs to be set by "=" instead of
":=".

Fixes: #9102

Signed-off-by: Liu Bo <liub.liubo@gmail.com>
2024-02-29 18:40:23 -08:00
Dan Mihai
11b603e5f1
Merge pull request #9139 from microsoft/saulparedes/genolicy_panic_subpath
genpolicy: panic when we see a volume mount subpath
2024-02-29 12:18:56 -08:00
Alexandru Matei
6856e8f678 clh: Enable DAX for rootfs
Fixes: #9192

Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
2024-02-29 18:01:47 +02:00
Chengyu Zhu
bb4c608b32
Merge pull request #9110 from ChengyuZhu6/agent_option
agent: Add all agent configuration options to README
2024-02-28 18:50:44 +08:00
Dan Mihai
352e2af5f0
Merge pull request #9153 from microsoft/danmihai1/clh-bootVM-timeout
runtime: clh: minimum 10s timeout for CreateVM + BootVM
2024-02-27 09:58:01 -08:00
ChengyuZhu6
731c490ded agent: Add all agent configuration options to README
Add all agent configuration options to README so that users can more easily understand
what these options do and how to configure them at runtime.

Fixes: #9109

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-27 17:35:19 +08:00
Xuewei Niu
bb5e33b33a
Merge pull request #9100 from littlejawa/fix_5738_metrics_memory
runtime: remove kata_shim_netdev metric
2024-02-26 19:01:21 +08:00
Dan Mihai
f4509b806b runtime: clh: minimum 10s timeout for CreateVM + BootVM
Relax the timeout for calling CLH's CreateVM + BootVM APIs. When
hitting the older 1s timeout, killing a half-booted Guest and
retrying the same boot sequence could have been wasteful and resulting
in unstable CI testing on slower Hosts.

Fixes: #9152

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-24 19:15:57 +00:00
Saul Paredes
9b7bd376eb genpolicy: panic when we see a volume mount subpath
Based on https://github.com/kata-containers/runtime/issues/2812

Fixes: #9145

Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
2024-02-23 09:56:51 -08:00
Xuewei Niu
89c76d7d8d
Merge pull request #9125 from gkurz/fix-agent-cgroup-ns
agent: Run container workload in its own cgroup namespace (cgroup v2 guest only)
2024-02-23 10:40:17 +08:00
Julien Ropé
1c306fe4a6 runtime-rs: stop reporting net dev metrics for the shim
For consistency with the go runtime.
As the shim itself is not using the network (all its communication with
other processes is done with local unix sockets), there is no reason to
keep gathering and reporting shim-specific network metrics.
Actual network usage of the kata containers can be found from the existing
agent network metrics (kata_guest_netdev_stat).

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-02-22 14:00:00 +01:00
Julien Ropé
9de65707ca runtime: stop reporting net dev metrics for the shim
As part of the shim network metrics, the shim is reporting network interfaces
from the host with no namespace isolation - this gives insight in interfaces
not tied to the kata containers, and causes an increase in resource usage for
kata metrics.

As the shim itself is not using the network (all its communication with
other processes is done with local unix sockets), there is no reason to
keep gathering and reporting shim-specific network metrics.
Actual network usage of the kata containers can be found from the existing
hypervisor network metrics (kata_hypervisor_netdev) and from the agent
network metrics (kata_guest_netdev_stat).

Fixes: #5738

Signed-off-by: Julien Ropé <jrope@redhat.com>
2024-02-22 14:00:00 +01:00
Alex Lyn
014e0f4e46 runtime-rs: bugfix for GPU passthrough failed with InvalidOperation.
We need initailize the pci_hotplug_enabled with true before we do GPU
passthrough with runtime-rs/dragonball. Otherwise it fails with error
`InvalidOperation`.

Fixes: #9129

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
2024-02-22 10:22:32 +08:00
Dan Mihai
d84f50db5b genpolicy: fix typo in policy logging
Improve logging, for easier debugging.

Fixes: #9072

Signed-off-by: Dan Mihai <dmihai@microsoft.com>
2024-02-21 18:08:07 +00:00
Greg Kurz
600b951afd agent: Run container workload in its own cgroup namespace
When cgroup v2 is in use, a container should only see its part of the
unified hierarchy in `/sys/fs/cgroup`, not the full hierarchy created
at the OS level. Similarly, `/proc/self/cgroup` inside the container
should display `0::/`, rather than a full path such as :

0::/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podde291f58_8f20_4d44_aa89_c9e538613d85.slice/crio-9e1823d09627f3c2d42f30d76f0d2933abdbc033a630aab732339c90334fbc5f.scope

What is needed here is isolation from the OS. Do that by running the
container in its own cgroup namespace. This matches what runc and
other non VM based runtimes do.

Fixes #9124

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-02-21 13:14:13 +01:00
Greg Kurz
14886c7b32 agent: lint code
Run cargo-clippy to reduce noise in actual functional changes.

Signed-off-by: Greg Kurz <groug@kaod.org>
2024-02-21 13:14:13 +01:00
Archana Shinde
6d38fa1530 network: Try removing as many changes as possible during network cleanup
In case an error is encountered while removing a network endpoint during
network cleanup, we cuurently return immediately with the error.
With this change, in case of error we simply log the error and proceed
towards removing the next endpoint. With this, we can cleanup the
network changes made by the shim as much as possible.
This is especially important when multiple interfaces are passed to the
network namespace using a network plugin like multus.

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-20 06:08:05 -08:00
Archana Shinde
b005cda689 network: Move up defer block tp cleanup network
Move the defer for cleaning up network before the call to add network.
This way if any change made by add network is reverted by in case of
failure. This is particulary important for physical network interfaces
as with this step we make sure that driver for the physical interface is
reverted back to the original host driver. Without this the physical
network iterface will remain bound to vfio.

Fixes: #8646

Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
2024-02-20 06:06:42 -08:00
ChengyuZhu6
96c297cb37 runtime: fix checksum mismatch error in make vendor
Fix checksum mismatch error in `make vendor`.

Fixes: #9111

Signed-off-by: ChengyuZhu6 <chengyu.zhu@intel.com>
2024-02-18 22:22:38 +08:00