To align with the runtime-go setting, we should keep it empty unless
users enable it and set its path explicitly.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
OSV-Scanner flags the go.mod references to Go stdlib 1.23.0, which contradict the intention stated in versions.yaml, so synchronize them.
Add a corresponding comment to versions.yaml.
Fixes: #11700
Signed-off-by: Alex Tibbles <alex@bleg.org>
Let's rename the runtime-rs initdata annotation from
`io.katacontainers.config.runtime.cc_init_data` to
`io.katacontainers.config.hypervisor.cc_init_data`.
Rationale:
- initdata itself is a hypervisor-specific feature
- the new name aligns with the annotation handling logic:
c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)
This commit updates the annotation for go-runtime and tests accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
This commit supports the seccomp_sandbox option from the configuration.toml file
and adds the logic for appending command-line arguments based on this new configuration parameter.
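A minimal sketch of the argument-appending logic, assuming `seccomp_sandbox` holds a value for QEMU's `-sandbox` option (for example `on,obsolete=deny,spawn=deny,resourcecontrol=deny`) and that an empty value means "leave QEMU's default". `QemuCmdline` and `append_seccomp_sandbox()` are hypothetical names, not the runtime-rs API.

```rust
/// Hypothetical stand-in for a QEMU command-line builder.
#[derive(Debug, Default)]
pub struct QemuCmdline {
    pub args: Vec<String>,
}

impl QemuCmdline {
    /// Append `-sandbox <value>` only when the configured value is non-empty
    /// (after trimming); otherwise QEMU's default behaviour is left untouched.
    pub fn append_seccomp_sandbox(&mut self, seccomp_sandbox: &str) {
        let value = seccomp_sandbox.trim();
        if value.is_empty() {
            return;
        }
        self.args.push("-sandbox".to_string());
        self.args.push(value.to_string());
    }
}
```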
Fixes: #11524
Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
Previously it reused OVMF, which caused path-checking
issues, so switch to AAVMF as intended.
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Read only the sealed secret prefix instead of the whole file.
Improves performance and reduces memory usage in I/O-heavy environments.
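A minimal sketch of the prefix check using only the standard library: `Read::take` caps the reader at the prefix length, so at most that many bytes are read regardless of file size. The function name is hypothetical, and the caller supplies the prefix (the exact sealed-secret prefix string is not stated here).

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Return true if the file at `path` starts with `prefix`, reading at most
/// `prefix.len()` bytes instead of loading the whole file into memory.
pub fn file_has_prefix(path: &Path, prefix: &[u8]) -> io::Result<bool> {
    let file = File::open(path)?;
    let mut head = Vec::with_capacity(prefix.len());
    // `take` limits the reader, so large files are never read in full.
    file.take(prefix.len() as u64).read_to_end(&mut head)?;
    // A file shorter than the prefix yields a shorter `head` and compares unequal.
    Ok(head.as_slice() == prefix)
}
```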
Fixes: #11643
Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
Depending on the platform configuration, users might want to
set a more secure policy than the QEMU default.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
This change introduces a new command line option `--vm`
to boot up a pod VM for testing. The tool connects with
kata agent running inside the VM to send the test commands.
The tool uses `hypervisor` crates from runtime-rs for VM
lifecycle management. The current implementation supports
QEMU and Cloud Hypervisor as VMMs.
In summary:
- the tool parses the VMM-specific runtime-rs kata config file in
/opt/kata/share/defaults/kata-containers/runtime-rs/*
- prepares and starts a VM using the runtime-rs::hypervisor VM APIs
- retrieves the agent's server address to set up a connection
- runs the requested test commands and shuts down the VM
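A minimal sketch of that flow with a blocking, hypothetical `Vmm` trait; the real tool uses the runtime-rs `hypervisor` crate, whose API differs. `run_cmd` stands in for sending one test command to the agent at the retrieved address. The design choice worth keeping is that shutdown is attempted even when a test command fails.

```rust
/// Hypothetical, simplified VM lifecycle interface for this sketch.
pub trait Vmm {
    fn prepare_vm(&mut self) -> Result<(), String>;
    fn start_vm(&mut self) -> Result<(), String>;
    fn agent_address(&self) -> Result<String, String>;
    fn stop_vm(&mut self) -> Result<(), String>;
}

/// Look up the agent address and run each test command in order.
fn run_commands<V, F>(vmm: &V, cmds: &[&str], run_cmd: &mut F) -> Result<Vec<String>, String>
where
    V: Vmm,
    F: FnMut(&str, &str) -> Result<String, String>,
{
    let addr = vmm.agent_address()?;
    cmds.iter().map(|c| run_cmd(addr.as_str(), *c)).collect()
}

/// Boot the VM, run the tests, and shut the VM down even if a test fails.
pub fn run_vm_tests<V, F>(vmm: &mut V, cmds: &[&str], mut run_cmd: F) -> Result<Vec<String>, String>
where
    V: Vmm,
    F: FnMut(&str, &str) -> Result<String, String>,
{
    vmm.prepare_vm()?;
    vmm.start_vm()?;
    let result = run_commands(&*vmm, cmds, &mut run_cmd);
    // Always attempt shutdown before reporting a test failure.
    let stopped = vmm.stop_vm();
    let outputs = result?;
    stopped?;
    Ok(outputs)
}
```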
Fixes: #11566
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The seccomp feature for Cloud Hypervisor and Firecracker is enabled by default.
This commit introduces an option to disable seccomp for both and updates the built-in configuration.toml file accordingly.
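A minimal sketch of how a disable switch could translate into VMM arguments. The `VmmKind` enum and `seccomp_args()` function are hypothetical, and the flags reflect our understanding of the upstream CLIs (`--seccomp false` for Cloud Hypervisor, `--no-seccomp` for Firecracker); check them against the VMM versions in use.

```rust
/// Hypothetical VMM selector for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum VmmKind {
    CloudHypervisor,
    Firecracker,
}

/// Extra VMM arguments for a hypothetical `disable_seccomp` setting.
/// Seccomp stays enabled (no extra arguments) unless explicitly disabled.
pub fn seccomp_args(vmm: VmmKind, disable_seccomp: bool) -> Vec<&'static str> {
    if !disable_seccomp {
        return Vec::new();
    }
    match vmm {
        // Assumption: Cloud Hypervisor accepts `--seccomp false`.
        VmmKind::CloudHypervisor => vec!["--seccomp", "false"],
        // Assumption: Firecracker accepts `--no-seccomp`.
        VmmKind::Firecracker => vec!["--no-seccomp"],
    }
}
```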
Fixes: #11535
Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
Route kata-shim logs directly to systemd-journald under 'kata' identifier.
This refactoring enables `kata-shim` logs to be properly attributed to
'kata' in systemd-journald, instead of inheriting the 'containerd'
identifier.
Previously, `kata-shim` logs were challenging to filter and debug as
they appeared under the `containerd.service` unit.
This commit resolves this by:
1. Introducing a `LogDestination` enum to explicitly define logging
targets (File or Journal).
2. Modifying logger creation to set `SYSLOG_IDENTIFIER=kata` when
logging to journald.
3. Ensuring type safety and correct ownership handling for different
logging backends.
This significantly enhances the observability and debuggability of Kata
Containers, making it easier to monitor and troubleshoot Kata-specific
events.
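A minimal sketch of the destination split. The `LogDestination` variants follow the description above, while `journal_fields()` and its output format (newline-separated `KEY=VALUE` pairs, the simple form of journald's native protocol, valid only for values without newlines) are illustrative assumptions rather than the shim's actual logger code.

```rust
use std::path::PathBuf;

/// Where shim log records go; the owned `PathBuf` keeps the file target self-contained.
#[derive(Debug, Clone, PartialEq)]
pub enum LogDestination {
    File(PathBuf),
    Journal,
}

/// Identifier the shim's journald records are tagged with.
pub const KATA_SYSLOG_IDENTIFIER: &str = "kata";

/// Render a record as journald fields, or None for file destinations.
/// Tagging every record avoids inheriting containerd's identifier.
pub fn journal_fields(dest: &LogDestination, message: &str) -> Option<String> {
    match dest {
        LogDestination::Journal => Some(format!(
            "SYSLOG_IDENTIFIER={}\nMESSAGE={}\n",
            KATA_SYSLOG_IDENTIFIER, message
        )),
        LogDestination::File(_) => None,
    }
}
```

Records tagged this way can be filtered with `journalctl -t kata`.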
Fixes: #11590
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Configuration information is adjusted after loading from file but so
far, there has been no similar check for configuration coming from
annotations. This commit introduces re-adjusting config after
annotations have been processed.
A small refactor was necessary as a prerequisite which introduces
function TomlConfig::adjust_config() to make it easier to invoke
the adjustment for a whole TomlConfig instance. This function is
analogous to the existing validate() function.
The immediate motivation for this change is to make sure that 0
in "default_vcpus" annotation will be properly adjusted to 1 as
is the case if 0 is loaded from a config file. This is required
to match the golang runtime behaviour.
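A minimal sketch of the ordering fix: annotation overrides are applied first, then the same adjustment used for file-loaded values runs again. The single-field `TomlConfig` and the annotation key constant are simplifying assumptions for illustration, not the kata-types definitions.

```rust
use std::collections::HashMap;

/// Assumed annotation key for this sketch.
pub const DEFAULT_VCPUS_ANNOTATION: &str = "io.katacontainers.config.hypervisor.default_vcpus";

/// Hypothetical, heavily reduced configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct TomlConfig {
    pub default_vcpus: u32,
}

impl TomlConfig {
    /// The fix-up also applied after loading from file: 0 vCPUs becomes 1.
    pub fn adjust_config(&mut self) {
        if self.default_vcpus == 0 {
            self.default_vcpus = 1;
        }
    }

    /// Apply annotation overrides, then re-run adjust_config() so annotated
    /// values get the same treatment as values from the config file.
    pub fn apply_annotations(&mut self, annotations: &HashMap<String, String>) -> Result<(), String> {
        if let Some(v) = annotations.get(DEFAULT_VCPUS_ANNOTATION) {
            self.default_vcpus = v
                .parse::<u32>()
                .map_err(|e| format!("invalid default_vcpus {:?}: {}", v, e))?;
        }
        self.adjust_config();
        Ok(())
    }
}
```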
Signed-off-by: Pavel Mores <pmores@redhat.com>
Also included (as commented out) is a test that does not pass although
it should. See source code comment for explanation why fixing this seems
beyond the scope of this PR.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit focuses purely on the formal change of type. If any subsequent
changes in semantics are needed they are purposely avoided here so that the
commit can be reviewed as a 100% formal and 0% semantic change.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit addresses a part of the same problem as PR #7623 did for the
golang runtime. So far we've been rounding up individual containers'
vCPU requests and then summing them up which can lead to allocation of
excess vCPUs as described in the mentioned PR's cover letter. We address
this by reversing the order of operations: we sum the (possibly fractional)
container requests and only then round up the total.
We also align runtime-rs's behaviour with runtime-go in that we now
include the default vcpu request from the config file ('default_vcpu')
in the total.
We diverge from PR #7623 in that `default_vcpu` is still treated as an
integer (this will be a topic of a separate commit), and that this
implementation avoids relying on 32-bit floating point arithmetic as there
are some potential problems with using f32. For instance, some numbers
commonly used in decimal, notably all of single-decimal-digit numbers
0.1, 0.2 .. 0.9 except 0.5, are periodic in binary and thus fundamentally
not representable exactly. Arithmetic performed on such numbers can lead
to surprising results, e.g. adding 0.1 ten times gives 1.0000001, not 1,
and taking a ceil() results in 2, clearly a wrong answer in vcpu
allocation.
So instead, we take advantage of the fact that container requests happen
to be expressed as a quota/period fraction so we can sum up quotas,
fundamentally integral numbers (possibly fractional only due to the need
to rewrite them with a common denominator) with much less danger of
precision loss.
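A minimal sketch of the quota-based summation under stated assumptions: the `CpuRequest` type is hypothetical, quotas are non-negative (unlimited quotas are assumed to be filtered out beforehand), and `u128` is used so that rewriting quotas over the LCM of the periods is unlikely to overflow for realistic period values. `default_vcpus` stays an integer, as described above.

```rust
/// A container CPU request as a CFS quota/period pair (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct CpuRequest {
    pub quota: u64,
    pub period: u64,
}

fn gcd(mut a: u128, mut b: u128) -> u128 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

fn lcm(a: u128, b: u128) -> u128 {
    a / gcd(a, b) * b
}

/// Sum all requests exactly over a common denominator, round up once,
/// then add the integral `default_vcpus` from the config file.
pub fn total_vcpus(default_vcpus: u32, requests: &[CpuRequest]) -> u32 {
    // Requests without a period carry no CPU limit in this sketch.
    let valid: Vec<CpuRequest> = requests.iter().copied().filter(|r| r.period > 0).collect();
    if valid.is_empty() {
        return default_vcpus;
    }
    // Common denominator: the LCM of all periods.
    let denom = valid.iter().fold(1u128, |acc, r| lcm(acc, u128::from(r.period)));
    // Rewrite each quota over `denom`; the sum is exact integer arithmetic.
    let numer: u128 = valid
        .iter()
        .map(|r| u128::from(r.quota) * (denom / u128::from(r.period)))
        .sum();
    // Integer ceil(numer / denom), with no floating point involved.
    let rounded = (numer + denom - 1) / denom;
    default_vcpus + rounded as u32
}
```

With ten containers each requesting 0.1 vCPU (quota 10000, period 100000) and `default_vcpus = 1`, this yields 2 vCPUs, whereas rounding each container up before summing would request 10 vCPUs for the containers alone.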
Signed-off-by: Pavel Mores <pmores@redhat.com>
When hot-plugging CPUs on QEMU, we send a QMP command with JSON
arguments. QEMU 9.2 recently became more strict[1] enforcing the
JSON schema for QMP parameters. As a result, running Kata Containers
with QEMU 9.2 results in a message complaining that the core-id
parameter is expected to be an integer:
```
qmp hotplug cpu, cpuID=cpu-0 socketID=1, error:
QMP command failed:
Invalid parameter type for 'core-id', expected: integer
```
Fix that by changing the core-id, socket-id and thread-id to be
integer values.
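A minimal sketch of the corrected argument encoding, building the QMP `device_add` command as a string so the JSON types are visible. The function name is hypothetical, the driver value (for example `host-x86_64-cpu`) depends on the guest architecture, and `driver` and `cpu_id` are assumed not to need JSON escaping.

```rust
/// Build the QMP `device_add` command for hot-plugging one vCPU.
/// socket-id, core-id and thread-id are emitted as JSON integers (unquoted),
/// which QEMU 9.2's stricter schema checking requires.
pub fn qmp_cpu_hotplug(driver: &str, cpu_id: &str, socket_id: u32, core_id: u32, thread_id: u32) -> String {
    format!(
        r#"{{"execute":"device_add","arguments":{{"driver":"{}","id":"{}","socket-id":{},"core-id":{},"thread-id":{}}}}}"#,
        driver, cpu_id, socket_id, core_id, thread_id
    )
}
```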
[1]: be93fd5372
Fixes: #11633
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
As we have changed the initdata annotation definition, we also need to
correct its const definition, KATA_ANNO_CFG_RUNTIME_INIT_DATA, accordingly.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
When the network interface provisioned by the CNI has static ARP table entries,
the runtime calls AddARPNeighbor to propagate these to the agent. As of today,
these calls are simply rejected.
In order to allow the calls, we do some sanity checks on the arguments:
We must ensure that we don't unexpectedly route traffic to the host that was
not intended to leave the VM. In a first approximation, this applies to
loopback IPs and devices. However, there may be other sensitive ranges (for
example, VPNs between VMs), so there should be some flexibility for users to
restrict this further. This is why we introduce a setting, similar to
UpdateRoutes, that allows restricting the neighbor IPs further.
The only valid state of an ARP neighbor entry is NUD_PERMANENT, which has a
value of 128 [1]. This is already enforced by the runtime.
According to rtnetlink(7), the valid flag values are 8 (NTF_PROXY) and 128
(NTF_ROUTER) [2], so we allow any combination of these.
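A minimal sketch of the checks described above. `check_arp_neighbor()` is a hypothetical name, the `user_allows` predicate stands in for the new user-configurable restriction (whose real shape is not shown here), and treating the device name `lo` as the loopback device is an assumption. The constants match the values cited from `linux/neighbour.h`.

```rust
use std::net::IpAddr;

/// NUD_PERMANENT from linux/neighbour.h.
pub const NUD_PERMANENT: u16 = 0x80;
/// NTF_PROXY and NTF_ROUTER from linux/neighbour.h.
pub const NTF_PROXY: u8 = 0x08;
pub const NTF_ROUTER: u8 = 0x80;

/// Validate an ARP neighbor entry before adding it inside the guest.
pub fn check_arp_neighbor<F>(ip: &IpAddr, device: &str, state: u16, flags: u8, user_allows: F) -> Result<(), String>
where
    F: Fn(&IpAddr) -> bool,
{
    // Never let loopback traffic be routed out of the VM (127.0.0.0/8, ::1).
    if ip.is_loopback() {
        return Err(format!("loopback neighbor IP {} rejected", ip));
    }
    // Assumption: the loopback device is named "lo".
    if device == "lo" {
        return Err("neighbor on loopback device rejected".to_string());
    }
    if state != NUD_PERMANENT {
        return Err(format!("neighbor state {} is not NUD_PERMANENT", state));
    }
    // Any combination of NTF_PROXY and NTF_ROUTER is fine; other bits are not.
    if (flags & !(NTF_PROXY | NTF_ROUTER)) != 0 {
        return Err(format!("unexpected neighbor flags {:#x}", flags));
    }
    // User-configurable restriction on top of the built-in checks.
    if !user_allows(ip) {
        return Err(format!("neighbor IP {} rejected by policy", ip));
    }
    Ok(())
}
```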
[1]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L72
[2]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L49C20-L53
Fixes: #11664
Signed-off-by: Markus Rudy <mr@edgeless.systems>
To make it work within CI, we align with kata-runtime's definition,
"io.katacontainers.config.runtime.cc_init_data".
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
In order to have a reproducible code generation process, we need to pin
the versions of the tools used. This is most easily accomplished by
generating the code inside a container.
This commit adds a container image definition with fixed dependencies
for Golang proto/ttrpc code generation, and changes the agent Makefile
to invoke the update-generated-proto.sh script from within that
container.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The generated Go bindings for the agent are out of date. This commit
was produced by running
src/agent/src/libs/protocols/hack/update-generated-proto.sh with
protobuf compiler versions matching those of the last run, according to
the generated code comments.
Since there are new RPC methods, those needed to be added to the
HybridVSockTTRPCMockImp.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
- "confidential_emptyDir" becomes "emptyDir" in the settings file.
- "confidential_configMap" becomes "configMap" in settings.
- "mount_source_cpath" becomes "cpath".
- The new "root_path" gets used instead of the old "cpath" to point to
the container root path.
- "confidential_guest" is no longer used. By default it gets replaced
by "enable_configmap_secret_storages"=false, because CoCo is using
CopyFileRequest instead of the Storage data structures for ConfigMap
and/or Secret volume mounts during CreateContainerRequest.
- The value of "guest_pull" becomes true by default.
- "image_layer_verification" is no longer used - just CoCo's guest pull
is supported.
- The Request input files from unit tests are changing to reflect the
new default settings values described above.
- tests/integration/kubernetes/tests_common.sh adjusts the settings for
platforms that are not set up for CoCo during CI (i.e., platforms
other than SNP, TDX, and CoCo Dev).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Skip pulling container image layers when guest-pull=true. The contents
of these layers were ignored due to:
- #11162, and
- tarfs snapshotter support having been removed from genpolicy.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
AKS Confidential Containers are using the tarfs snapshotter. CoCo
upstream doesn't use this snapshotter, so remove this Policy complexity
from upstream.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
`mem-agent` here is now a library and does not contain examples, so ignore
Cargo.lock to get rid of the untracked-file noise produced by `cargo run` or
`cargo test`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Re-generates the client code against Cloud Hypervisor v47.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`MmapRegion` is only used when `virtio-fs` is enabled while testing
dragonball, so gate the import behind the `virtio-fs` feature.
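A minimal sketch of the gating pattern. The `vm_memory` module and `new_dax_region()` helper are hypothetical stand-ins for dragonball's real imports and call sites; the point is that the import and its only user share the same `#[cfg(feature = "virtio-fs")]` gate, so builds without the feature see neither.

```rust
// Stand-in module for this sketch; in dragonball `MmapRegion` comes from a
// dependency crate.
#[allow(dead_code)]
mod vm_memory {
    pub struct MmapRegion;
}

// Compiled only with `virtio-fs`, so other builds get no unused-import warning.
#[cfg(feature = "virtio-fs")]
use self::vm_memory::MmapRegion;

// The import's only user is gated by the same feature.
#[cfg(feature = "virtio-fs")]
pub fn new_dax_region() -> MmapRegion {
    MmapRegion
}

/// Reports whether the gated code path was compiled in.
pub fn virtio_fs_enabled() -> bool {
    cfg!(feature = "virtio-fs")
}
```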
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Some variables are unused when certain features are not enabled; use
`#[allow(unused)]` to suppress those warnings for the time being.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`VcpuManagerError` is only needed when the `host-device` feature is enabled,
so gate the import behind that feature.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Code inside the `test_mac_addr_serialization_and_deserialization` test does
not actually require the `with-serde` feature, so remove the
assertion here to enable this test.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add full cgroups support on the host. Cgroups are managed by `FsManager` and
`SystemdManager`. As the names imply, the `FsManager` manages cgroups
through cgroupfs, while the `SystemdManager` manages cgroups through
systemd. Both managers support cgroup v1 and cgroup v2.
Two types of cgroup paths are supported:
1. For colon paths, for example "foo.slice:bar:baz", the runtime manages
cgroups by `SystemdManager`;
2. For relative/absolute paths, the runtime manages cgroups by
`FsManager`.
With cgroup v1 + cgroupfs, vCPU threads are added into the sandbox cgroups.
In the other combinations (cgroup v1 + systemd, cgroup v2 + cgroupfs, and
cgroup v2 + systemd), the VMM process is added into the cgroups.
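A minimal sketch of the two decisions described above: which manager a cgroups path selects, and what gets moved into the sandbox cgroups. All type and function names are hypothetical, and detecting the systemd form by its three colon-separated fields is an assumption based on the `foo.slice:bar:baz` example.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CgroupManagerKind {
    Fs,
    Systemd,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CgroupVersion {
    V1,
    V2,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Placement {
    VcpuThreads,
    VmmProcess,
}

/// Colon paths such as "foo.slice:bar:baz" go to systemd; relative or
/// absolute filesystem paths go to cgroupfs.
pub fn manager_for_path(path: &str) -> CgroupManagerKind {
    if path.split(':').count() == 3 {
        CgroupManagerKind::Systemd
    } else {
        CgroupManagerKind::Fs
    }
}

/// Only cgroup v1 + cgroupfs moves individual vCPU threads; every other
/// combination moves the whole VMM process.
pub fn placement(version: CgroupVersion, manager: CgroupManagerKind) -> Placement {
    match (version, manager) {
        (CgroupVersion::V1, CgroupManagerKind::Fs) => Placement::VcpuThreads,
        _ => Placement::VmmProcess,
    }
}
```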
systemd doesn't provide a way to add a thread to a unit, so `add_thread()`
in `SystemdManager` is equivalent to `add_process()`.
Cgroup v2 supports threaded mode. However, threaded mode has to be enabled
iteratively from the leaf node up to the root node (`/`) [1]. This means the
runtime would need to modify the cgroups created by the container runtime
(e.g. containerd). Since cgroupfs + cgroup v2 is not a common combination,
its behavior is aligned with systemd + cgroup v2, which does not allow
managing processes at the thread level.
[1]: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#threads
Fixes: #11356
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
For several reasons, it should first be aligned with runtime-go; this
commit does that work.
Fixes: #11543
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>