Commit Graph

1277 Commits

Author SHA1 Message Date
Alex Lyn
8b49564c01
Merge pull request #10610 from Xynnn007/faet-initdata-rbd
Feat | Implement initdata for bare-metal/qemu hypervisor
2025-04-24 09:59:14 +08:00
Xynnn007
17d0db9865 agent: add initdata parse logic
Kata-agent now checks whether a /dev/vd* device carrying the 'initdata'
magic number exists. If it does, kata-agent reads it. Bytes 9~16
hold the length of the compressed initdata TOML in little endian.
The compressed initdata starts at byte 17.

The initdata image device layout looks like

0            8          16                    16+length  EOF
| 'initdata' |  length  | gzip(initdata toml) | paddings |
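A minimal sketch of validating this header, assuming the whole device has already been read into a byte buffer. `split_initdata` is a hypothetical name, not the agent's function, and gzip decompression is left out because it needs an external crate such as `flate2`.

```rust
/// Magic bytes at offset 0 of an initdata image (bytes 1~8).
const INITDATA_MAGIC: &[u8; 8] = b"initdata";

/// Hypothetical helper: checks the magic and length header and returns the
/// gzip-compressed initdata TOML, ignoring any trailing padding.
fn split_initdata(image: &[u8]) -> Result<&[u8], String> {
    // Offsets 0..8 hold the magic, offsets 8..16 the little-endian length.
    if image.len() < 16 {
        return Err("image too short for initdata header".to_string());
    }
    if &image[..8] != &INITDATA_MAGIC[..] {
        return Err("initdata magic not found".to_string());
    }
    let mut len_bytes = [0u8; 8];
    len_bytes.copy_from_slice(&image[8..16]);
    let len = usize::try_from(u64::from_le_bytes(len_bytes))
        .map_err(|_| "length does not fit in usize".to_string())?;
    // Reject lengths that would run past the end of the device.
    let end = 16usize
        .checked_add(len)
        .filter(|&end| end <= image.len())
        .ok_or_else(|| "initdata length exceeds image size".to_string())?;
    Ok(&image[16..end])
}
```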

The initdata is parsed and written out as aa.toml, cdh.toml and
policy.rego under /run/confidential-containers/initdata.

When AgentPolicy is initialized, the default policy is overwritten
by the initdata policy.rego.

When AA is launched and initdata has been processed, the launch args
include the --initdata parameter.

Also, if /run/confidential-containers/initdata/aa.toml exists, the
launch args include -c /run/confidential-containers/initdata/aa.toml.

When CDH is launched and initdata has been processed, the launch
args include -c /run/confidential-containers/initdata/cdh.toml.

Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
2025-04-10 13:09:51 +08:00
stevenhorsman
6603cf7872 agent: Update vsock-exporter to use workspace settings
To reduce duplication, update the vsock-exporter crate to use
settings and versions from the agent workspace, where applicable.

> [!NOTE]
> In order to use the workspace, this has bumped some crate versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-09 12:02:43 +01:00
stevenhorsman
2cb9fd3c69 agent: Update rustjail to use workspace settings
- To reduce duplication, update the rustjail crate to use settings
and versions from the agent workspace, where applicable.
- Also switch to using the derive feature of the serde crate
rather than the separate serde_derive crate, to avoid having to
keep both versions in sync.

> [!NOTE]
> In order to use the workspace, this has bumped some crate versions.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-09 12:02:43 +01:00
stevenhorsman
655255b50c agent: Update policy to use workspace settings
To reduce duplication, update the policy crate to use settings
and versions from the agent workspace, where applicable.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-09 11:42:05 +01:00
stevenhorsman
1bec432ffa agent: Create workspace package and dependencies
- Create agent workspace dependencies and package info
so that the packages in the workspace can use them
- Group the local dependencies together for clarity
(like in #11129)

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-04-09 11:42:00 +01:00
Fabiano Fidêncio
e3c98a5ac7 agent: Allow users to build without guest-pull
For those not interested in CoCo, let's at least allow them to easily
build the agent without the guest-pull feature.

This reduces the binary size (already stripped) from 25M to 18M.
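A minimal sketch of how a Cargo feature compiles a code path out, assuming a feature named `guest-pull` as in the commit. `pull_image` and `guest_pull_enabled` are hypothetical names and the agent's real gating may differ; presumably the related dependencies are also made optional in Cargo.toml, which is what actually shrinks the binary.

```rust
/// Compile-time check: true only when built with the `guest-pull` feature.
fn guest_pull_enabled() -> bool {
    cfg!(feature = "guest-pull")
}

/// With the feature enabled, the image-pulling path is compiled in.
#[cfg(feature = "guest-pull")]
fn pull_image(image: &str) -> Result<(), String> {
    // Placeholder for the real pull logic.
    let _ = image;
    Ok(())
}

/// Without the feature, the path is compiled out and requests are rejected.
#[cfg(not(feature = "guest-pull"))]
fn pull_image(image: &str) -> Result<(), String> {
    Err(format!("cannot pull {image}: agent built without guest-pull"))
}
```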

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-04-04 22:58:43 +01:00
Zvonko Kaiser
e5c4cfb8a1
Merge pull request #11081 from BbolroC/unsealed-secret-fix
tests: Enable sealed secrets for all TEEs
2025-03-31 11:19:52 -04:00
Hyounggyu Choi
423ad8341d agent: Call cdh_handler for sealed secrets after add_storage()
As reported in #11011, on IBM SE the mounted secrets only become
available after the container image has been pulled by add_storage().
However, secure mount must be handled before add_storage().
Therefore, this commit divides cdh_handler() into:

- cdh_handler_trusted_storage()
- cdh_handler_sealed_secrets()

and calls cdh_handler_sealed_secrets() after add_storage()
while keeping cdh_handler_trusted_storage() unchanged.
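The resulting call order can be sketched as follows. The three step functions are hypothetical stand-ins that only record their names; they are not the agent's real signatures.

```rust
/// Hypothetical stand-ins for the agent steps named in the commit; each one
/// records its name so the call order can be inspected.
fn cdh_handler_trusted_storage(log: &mut Vec<&'static str>) {
    log.push("trusted_storage");
}
fn add_storage(log: &mut Vec<&'static str>) {
    log.push("add_storage");
}
fn cdh_handler_sealed_secrets(log: &mut Vec<&'static str>) {
    log.push("sealed_secrets");
}

/// Order after the split: secure mount first, then storage (which may pull
/// the image on IBM SE), and sealed secrets last.
fn create_container_steps() -> Vec<&'static str> {
    let mut log = Vec::new();
    cdh_handler_trusted_storage(&mut log);
    add_storage(&mut log);
    cdh_handler_sealed_secrets(&mut log);
    log
}
```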

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2025-03-26 17:50:41 +01:00
Fupan Li
4b93176225 runtime-rs: update the protobuf to 3.7.1
Since some files generated by protobuf are shared between
runtime-rs and the kata agent, and the kata agent's dependency
image-rs depends on protobuf@3.7.1, we should keep the protobuf
version aligned between runtime-rs and the agent; otherwise, we
cannot compile runtime-rs and the agent at the same time.

Fixes: https://github.com/kata-containers/kata-containers/issues/10650

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-03-21 17:46:12 +08:00
Tobin Feldman-Fitzthum
b7786fbcf0 agent: update image-rs for coco v0.13.0
image-rs has gotten a number of significant updates, eliminating corner
cases with obscure containers, improving support for local certs, and
more.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@ibm.com>
2025-03-14 10:44:10 -05:00
Hui Zhu
93cd30862d libs: Add AddSwapPath to service AgentService
AddSwap sends the PCI path to the guest kernel so that it can add a
swap device, but some MMIO devices don't have a PCI path. To support
them, add AddSwapPath, which sends virt_path to the guest kernel as
the swap device.

Fixes: #10988

Signed-off-by: Hui Zhu <teawater@antgroup.com>
2025-03-11 16:02:48 +08:00
Fupan Li
9a4c0a5c5c agent: add the route flags support when adding routes
Get the route entry's flags passed from the host and
set them in the add-route request.

Fixes: #7934

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-03-07 09:56:08 +08:00
Fupan Li
d929bc0224 agent: refactor the code of update routes/interfaces
We can use the netlink update method to add a route or an interface
address; there is no need to delete it first and then add it. This
saves two system calls.
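As background, the kernel distinguishes a strict add from an update by the netlink header flags; the same create-or-replace semantics back `ip route replace`. The commit does not show which flags rtnetlink sends, so this sketch only illustrates the `NLM_F_*` values from `<linux/netlink.h>`; the helper names are hypothetical.

```rust
// Netlink header flags from <linux/netlink.h>.
const NLM_F_REQUEST: u16 = 0x001;
const NLM_F_ACK: u16 = 0x004;
const NLM_F_REPLACE: u16 = 0x100;
const NLM_F_EXCL: u16 = 0x200;
const NLM_F_CREATE: u16 = 0x400;

/// Strict add: the kernel returns EEXIST if the entry already exists,
/// which is why a delete-then-add sequence was needed before.
fn add_flags() -> u16 {
    NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL
}

/// Update: create the entry if missing, replace it otherwise,
/// in a single request.
fn replace_flags() -> u16 {
    NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_REPLACE
}
```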

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-03-07 09:56:08 +08:00
Fupan Li
aad915a7a1 agent: upgrade the netlink related crates
Upgrade rtnetlink and related crates to support
route flags.

Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
2025-03-07 09:56:08 +08:00
Zvonko Kaiser
3cea080185 agent: fix permission according to runc
The previous PR mistakenly set all perms to 0o666. We should follow
what runc does and, if file_mode == 0, fetch the permission from the
guest (aka host) device. If we do not find the device on the guest
(aka host), fall back to 0.
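A minimal sketch of that fallback on a Unix target. `device_mode` is a hypothetical helper, not the agent's code; it keeps only the permission bits of the existing device node.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;

/// Hypothetical helper: choose the mode for a device node in the container.
/// A non-zero `file_mode` from the spec wins; otherwise copy the permission
/// bits of the existing device, and fall back to 0 if it is not found.
fn device_mode(file_mode: u32, device_path: &str) -> u32 {
    if file_mode != 0 {
        return file_mode;
    }
    match fs::metadata(device_path) {
        // Mask off the file-type bits, keeping only rwx permissions.
        Ok(meta) => meta.permissions().mode() & 0o777,
        Err(_) => 0,
    }
}
```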

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-05 17:33:40 +00:00
Zvonko Kaiser
a5629f9bfa
Merge pull request #10971 from zvonkok/host-guest-mapping
agent: Enable VFIO and initContainers
2025-03-05 08:58:45 -05:00
Zvonko Kaiser
c73ff7518e agent: Fix default linux device permissions
We had the default permissions set to 0o000 if the file_mode was not
present; for most container devices this is the wrong default. Since
those devices are also meant to be accessed by users and others, add a
sane default of 0o666 to devices that do not have any permissions set.

Otherwise only root can access those devices and we cannot run
containers as a non-root user.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-05 02:22:24 +00:00
Zvonko Kaiser
248d04c20c agent: Enable VFIO and initContainers
We had a static mapping of host-guest PCI addresses, which prevented
using VFIO devices in initContainers. We now track the host-guest
mapping per container and remove it when the container is removed.
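A minimal sketch of per-container tracking. The commit does not show the agent's data structure, so `PciMappings` and its methods are hypothetical, and the address strings are placeholders.

```rust
use std::collections::HashMap;

/// Hypothetical tracker: host PCI address -> guest PCI address, kept per
/// container instead of in one static sandbox-wide table.
#[derive(Default)]
struct PciMappings {
    by_container: HashMap<String, HashMap<String, String>>,
}

impl PciMappings {
    fn insert(&mut self, container_id: &str, host: &str, guest: &str) {
        self.by_container
            .entry(container_id.to_string())
            .or_default()
            .insert(host.to_string(), guest.to_string());
    }

    /// Look up a guest address within one container's mapping only.
    fn guest_for(&self, container_id: &str, host: &str) -> Option<&str> {
        self.by_container
            .get(container_id)?
            .get(host)
            .map(String::as_str)
    }

    /// Drop a container's mapping (e.g. when an initContainer is removed),
    /// so the same host device can be mapped again for the next container.
    fn remove_container(&mut self, container_id: &str) {
        self.by_container.remove(container_id);
    }
}
```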

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-03-04 19:53:52 +00:00
Jakob Naucke
c146980bcd
agent: Handle virtio-net-ccw devices separately
On s390x, a virtio-net device will use the CCW bus instead of PCI,
which impacts how its uevent should be handled. Take the respective
path accordingly.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2025-02-26 11:36:43 +01:00
Jakob Naucke
b325069d72
agent: Update QEMU URL
Readthedocs URL was outdated.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2025-02-26 11:36:42 +01:00
Jakob Naucke
9935f9ea7e
proto: Rename Interface.pciPath to devicePath
Field is being used for both PCI and CCW devices. Name it devicePath
to avoid confusion when the device isn't a PCI device.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2025-02-26 11:36:42 +01:00
Fabiano Fidêncio
b3b570e4c4 agent: Fix non-guest-pull build
As guest-pull is a very Confidential Containers specific feature,
let's make sure we, at least, don't break folks who decide to build Kata
Containers' agent without having this feature enabled (for instance, for
the sake of the agent size).

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
2025-02-25 21:48:41 +01:00
Fabiano Fidêncio
e09ae2cc0b
Merge pull request #10921 from RuoqingHe/drop-redundant-override
build: Drop redundant ARCH override
2025-02-25 14:54:36 +01:00
Ruoqing He
265a751837 build: Drop redundant ARCH override
There are many `override ARCH = powerpc64le` lines placed after
`utils.mk` is included, and they are redundant.

Drop those redundant `override`s.

Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
2025-02-24 22:04:28 +08:00
Aurélien Bombo
a1ed923740 agent: Fix race condition with cgroup watchers
In the CI, test containers intermittently fail to start after creation,
with an error like below (see #10872 for more details):

  #     State:      Terminated
  #       Reason:   StartError
  #       Message:  failed to start containerd task "afd43e77fae0815afbc7205eac78f94859e247968a6a4e8bcbb987690fcf10a6": No such file or directory (os error 2)

I've observed this error to repro with the following containers, which
have in common that they're all *very short-lived* by design (more tests
might be affected):

 * k8s-job.bats
 * k8s-seccomp.bats
 * k8s-hostname.bats
 * k8s-policy-job.bats
 * k8s-policy-logs.bats

Furthermore, appending a `; sleep 1` to the command line for those
containers seemed to consistently get rid of the error.

Investigating further, I've uncovered a race between the end of the container
process and the setting up of the cgroup watchers (to report OOMs).

If the process terminates first, the agent will try to watch cgroup
paths that don't exist anymore, and it will fail to start the container.
The added error context in notifier.rs confirms that the error comes
from the missing cgroup:

  https://github.com/kata-containers/kata-containers/actions/runs/13450787436/job/37585901466#step:17:6536

The fix simply consists of creating the watchers *before* we start the
container but still *after* we create it. This is non-blocking and, IIUC,
the cgroup is guaranteed to already be present at that point.
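A toy model of the race and the reordering, assuming a cgroup path exists from create until the short-lived process exits and a watcher can only be registered on an existing path. `Sandbox` and its methods are hypothetical and do not reflect the agent's API.

```rust
use std::collections::HashSet;

/// Toy model: tracks which cgroup paths currently exist.
struct Sandbox {
    cgroups: HashSet<String>,
}

impl Sandbox {
    fn create(&mut self, id: &str) {
        self.cgroups.insert(format!("/sys/fs/cgroup/{id}"));
    }
    /// A very short-lived process may exit, and its cgroup vanish,
    /// right after it is started.
    fn start_and_exit(&mut self, id: &str) {
        self.cgroups.remove(&format!("/sys/fs/cgroup/{id}"));
    }
    fn watch_oom(&self, id: &str) -> Result<(), String> {
        let path = format!("/sys/fs/cgroup/{id}");
        if self.cgroups.contains(&path) {
            Ok(())
        } else {
            Err(format!("{path}: No such file or directory"))
        }
    }
}

/// Old order: start, then watch; fails if the process already exited.
fn start_then_watch(sb: &mut Sandbox, id: &str) -> Result<(), String> {
    sb.create(id);
    sb.start_and_exit(id);
    sb.watch_oom(id)
}

/// Fixed order: create, watch, then start.
fn watch_then_start(sb: &mut Sandbox, id: &str) -> Result<(), String> {
    sb.create(id);
    sb.watch_oom(id)?;
    sb.start_and_exit(id);
    Ok(())
}
```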

Fixes: #10872

Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
2025-02-21 17:52:11 -06:00
Dan Mihai
b90c537f79
Merge pull request #10881 from mythi/build-fixes
minor build fixes
2025-02-21 09:54:55 -08:00
Aurélien Bombo
601c403603
Merge pull request #10818 from burgerdev/plumbing
agent: clear log pipes if denied by policy
2025-02-19 16:28:58 -06:00
Mikko Ylinen
0d8242aee4 agent: rename cargo config
To mitigate:

warning: `.../kata-containers/src/agent/.cargo/config` is deprecated in favor of `config.toml`
note: if you need to support cargo 1.38 or earlier, you can symlink `config` to `config.toml`

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
2025-02-19 09:34:13 +02:00
Paul Meyer
80af09aae9 agent: make policy feature optional again
This was messed up a little when factoring out the policy crate.
Remove the dependencies no longer used by the agent and make the
import of kata-agent-policy optional again.

Signed-off-by: Paul Meyer <katexochen0@gmail.com>
2025-02-18 15:28:06 +01:00
Zvonko Kaiser
72833cb00b
Merge pull request #10878 from zvonkok/agent_cdi_timeout
gpu: agent cdi timeout
2025-02-17 12:49:51 -05:00
stevenhorsman
e5a284474d deps: Update cookie-store & publicsuffix
Run:
```
cargo update -p cookie-store
cargo update -p publicsuffix
```
to update the version of idna and resolve CVE-2024-12224

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
stevenhorsman
5656fc6139 deps: Bump reqwest
Bump reqwest to 0.12.12 to pick up fixes

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
stevenhorsman
3a3849efff deps: Update quinn-proto
Update quinn-proto to fix CVE-2024-45311

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-14 17:30:03 +00:00
Fabiano Fidêncio
64ceb0832a
Merge pull request #10851 from fidencio/topic/bump-image-rs-to-bring-in-ttrpc-0.8.4
agent: Bump image-rs to 514c561d93
2025-02-14 18:21:56 +01:00
Fabiano Fidêncio
d5878437a4
Merge pull request #10845 from DataDog/dind-subcgroup-fix
Add process to init subcgroup when we're using dind with cgroups v2
2025-02-14 18:12:24 +01:00
Zvonko Kaiser
908aacfa78 gpu: Update the logging around CDI
Removed a rogue printf and updated the logging to say
that we're waiting for CDI spec(s) to be generated rather
than reporting an error. It only becomes an error once the
timeout expires.

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-14 14:32:00 +00:00
Zvonko Kaiser
2499d013bd gpu: Update handle_cdi_devices
AgentConfig now has the cdi_timeout from the kernel
cmdline; update the function signature accordingly and use
it in the for loop.
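The commit only says the timeout is used in the for loop. As an illustration, a polling wait bounded by `cdi_timeout` could look like this; `wait_for_cdi_specs` and the 100 ms poll interval are assumptions, not the agent's code.

```rust
use std::path::Path;
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical sketch: poll until every expected CDI spec file exists or
/// `cdi_timeout` elapses. Returns true if all specs showed up in time.
fn wait_for_cdi_specs(specs: &[&Path], cdi_timeout: Duration) -> bool {
    let deadline = Instant::now() + cdi_timeout;
    loop {
        if specs.iter().all(|p| p.exists()) {
            return true;
        }
        if Instant::now() >= deadline {
            // Only now is the missing spec treated as an error.
            return false;
        }
        // Still within the timeout: this is "waiting", not a failure.
        sleep(Duration::from_millis(100));
    }
}
```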

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-13 20:11:48 +00:00
Zvonko Kaiser
95aa21f018 gpu: Add CDI timeout via kernel config
Some systems, like a DGX with 8 H100 or 8 H800 GPUs, need
extra time to initialize. We need to make the CDI timeout
configurable to support even systems with 16 GPUs.
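A minimal sketch of reading such a value from the kernel command line. The key `agent.cdi_timeout` (in seconds) is an assumption based on the agent's `agent.` prefix convention, and the parser and default are hypothetical; check the agent documentation for the exact option.

```rust
use std::time::Duration;

/// Assumed option name; verify against the agent documentation.
const CDI_TIMEOUT_OPTION: &str = "agent.cdi_timeout";

/// Hypothetical parser: returns the timeout from e.g.
/// `agent.cdi_timeout=300` (seconds), or `default` if it is absent or
/// malformed. The last well-formed occurrence wins.
fn cdi_timeout_from_cmdline(cmdline: &str, default: Duration) -> Duration {
    cmdline
        .split_whitespace()
        .filter_map(|arg| arg.split_once('='))
        .filter(|(key, _)| *key == CDI_TIMEOUT_OPTION)
        .filter_map(|(_, value)| value.parse::<u64>().ok())
        .last()
        .map(Duration::from_secs)
        .unwrap_or(default)
}
```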

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2025-02-13 19:23:19 +00:00
stevenhorsman
d28a512d29 agent: Wait for network before init_image_service
Based on the guidance from @Xynnn007 in #10851:
> The new version of image-rs will do attestation once
> ClientBuilder.build().await() is called, while the old version
> will do so lazily the first image pull request comes.
> Looks like it's called in rpc::start() in kata-agent, when
> I'm afraid the network hasn't been initialized yet.

> I am not sure if the guest network is prepared after
> the DNS is configured (in create_sandbox),
> if so we can move (the init_image_service) right after that.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-13 11:44:51 +00:00
Tobin Feldman-Fitzthum
a13d5a3f04 agent: Bump image-rs to 514c561d93
This brings in the commit bumping ttrpc to 0.8.4, which fixes
connection issues with kernel 6.12.9+.

As image-rs has a new builder pattern and several of the values in the
image client config have been renamed, let's change the agent to account
for this.

Signed-off-by: Tobin Feldman-Fitzthum <tobin@linux.ibm.com>
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-13 11:44:51 +00:00
Antoine Gaillard
4b5b788918
agent: Use init subcgroup for process attachment in DinD
cgroups v2 enforces stricter delegation rules, preventing operations on
cgroups outside our ownership boundary. When running Docker-in-Docker (DinD),
processes must be attached to an "init" subcgroup within the systemd unit.
This fix detects and uses the init subcgroup when proxying process attachment.
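A minimal sketch of the detection step, assuming the subcgroup is a directory literally named `init` under the target cgroup. `attach_target` is hypothetical; under cgroups v2 the process is then moved by writing its PID to `cgroup.procs` in the returned directory.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: attach to the "init" child of the target cgroup if
/// it exists (as in DinD under cgroups v2), otherwise to the cgroup itself.
fn attach_target(cgroup_dir: &Path) -> PathBuf {
    let init = cgroup_dir.join("init");
    if init.is_dir() {
        init
    } else {
        cgroup_dir.to_path_buf()
    }
}
```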

Fixes #10733

Signed-off-by: Antoine Gaillard <antoine.gaillard@datadoghq.com>
2025-02-13 10:44:51 +01:00
Leonard Cohnen
cf54a1b0e1 agent: move policy module into separate crate
The policy module augments the policy generated with genpolicy by keeping and
providing state to each invocation.
Therefore, it is not sufficient anymore to test the passing of requests in
the genpolicy crate.

Since in Rust, integration tests cannot call functions that are not exposed
publicly, this commit factors out the policy module of the agent into its
own crate and exposes the necessary functions to be consumed by the agent
and an integration test. The integration test itself is implemented in the
following commits.

Signed-off-by: Leonard Cohnen <lc@edgeless.systems>
2025-02-12 10:41:15 +01:00
stevenhorsman
17b1e94f1a cargo: Update time crate
So that we avoid hitting:
```
error[E0282]: type annotations needed for `Box<_>`
  --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/time-0.3.31/src/format_description/parse/mod.rs:83:9
   |
83 |     let items = format_items
   |         ^^^^^
...
86 |     Ok(items.into())
   |              ---- type must be known at this point
   |
help: consider giving `items` an explicit type, where the placeholders `_` are specified
   |
83 |     let items: Box<_> = format_items
   |              ++++++++
```

Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 15:16:47 +00:00
stevenhorsman
e9393827e8 agent: Workaround ppc formatting
On the powerpc64le platform the ip neigh command prints
a trailing space after the state, so the test fails, e.g.:
```
 assertion `left == right` failed
  left: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT \n"
 right: "169.254.1.1 lladdr 6a:92:3a:59:70:aa PERMANENT\n"
```
Trim the whitespace to make the test pass on all platforms
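One way to do that normalization, as a sketch: strip trailing whitespace on each line before comparing. `normalize_neigh_output` is a hypothetical helper; the test's actual trimming may differ.

```rust
/// Drop trailing whitespace on every line of `ip neigh` output, so
/// "PERMANENT \n" and "PERMANENT\n" compare equal.
fn normalize_neigh_output(output: &str) -> String {
    output
        .lines()
        .map(|line| format!("{}\n", line.trim_end()))
        .collect()
}
```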

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 15:16:47 +00:00
stevenhorsman
7257ee0397 agent: Remove implementation of ToString
Fix clippy error:
```
direct implementation of `ToString`
```
by switching to implement Display instead
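The pattern, sketched on a hypothetical `LinkState` type (not the agent's actual type): implement `Display` and let the standard library provide `to_string()`.

```rust
use std::fmt;

/// Hypothetical type standing in for one that used to implement
/// `ToString` directly.
enum LinkState {
    Up,
    Down,
}

// Implementing Display is enough: the standard library's blanket
// `impl<T: fmt::Display + ?Sized> ToString for T` provides `to_string()`,
// and the type also works with `format!` and `println!`.
impl fmt::Display for LinkState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            LinkState::Up => "up",
            LinkState::Down => "down",
        };
        f.write_str(s)
    }
}
```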

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 14:45:01 +00:00
stevenhorsman
ca87aca1a6 agent: Remove use of legacy constants
Fix clippy error
```
error: usage of a legacy numeric constant
```
by swapping `std::i32::<MIN/MAX>` for `i32::<MIN/MAX>`
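For illustration, a hypothetical helper written with the associated constants instead of the legacy module constants:

```rust
/// Clamp an i64 into the i32 range using `i32::MIN`/`i32::MAX`
/// (associated constants) rather than `std::i32::MIN`/`std::i32::MAX`.
fn clamp_to_i32(v: i64) -> i32 {
    v.clamp(i64::from(i32::MIN), i64::from(i32::MAX)) as i32
}
```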

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 14:45:01 +00:00
stevenhorsman
6008fd56a1 agent: Fix clippy error
```
error: file opened with `create`, but `truncate` behavior not defined
```
`truncate(true)` ensures the file is entirely overwritten with new data,
which I believe is the behaviour we want.
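A minimal sketch of why the explicit `truncate(true)` matters. `overwrite` is a hypothetical helper: without truncation, writing a shorter payload over a longer existing file would leave stale trailing bytes.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Replace the contents of `path` with `data`, creating the file if needed.
fn overwrite(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = OpenOptions::new()
        .write(true)
        .create(true)
        // Discard old contents so no stale bytes remain after `data`.
        .truncate(true)
        .open(path)?;
    f.write_all(data)
}
```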

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 14:45:01 +00:00
stevenhorsman
a640bb86ec agent: cdh: Remove unnecessary borrows
Fix clippy error:
```
error: the borrowed expression implements the required traits
```
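A typical instance of this lint, shown on a hypothetical command rather than the actual cdh code: `Command::args` accepts any iterable of `AsRef<OsStr>` items, so an array can be passed by value and the `&` is unnecessary.

```rust
use std::process::Command;

/// Builds (but does not run) a command.
fn build_command() -> Command {
    let mut cmd = Command::new("ls");
    // clippy flags `cmd.args(&["-l", "-a"])`: the array itself already
    // implements the required traits, so the borrow can be dropped.
    cmd.args(["-l", "-a"]);
    cmd
}
```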

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 14:45:01 +00:00
stevenhorsman
a131eec5c1 agent: config: Remove supports_seccomp
supports_seccomp is never used, so it triggers a clippy error.

Signed-off-by: stevenhorsman <steven@uk.ibm.com>
2025-02-05 14:45:01 +00:00