Since it aligns with the create_container_timeout definition in
runtime-go, we need to set the value in configuration.toml in seconds,
not milliseconds. We must also convert it to milliseconds when the
configuration is loaded for request_timeout_ms.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
ef642fe890 added a special case to avoid
moving cgroups that are on the "default" slice in case of deletion.
However, this special check should be done in the Parent() method
instead, which ensures that the default resource controller ID is
returned, instead of ".".
Fixes: #11599
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
- Set guest Storage.options for block rootfs to empty (do not propagate host mount options).
- Align behavior with Go runtime: only add xfs nouuid when needed.
Signed-off-by: Caspian443 <scrisis843@gmail.com>
We moved to `.zst`, but users still use the upstream kata-manager to
download older versions of the project, thus we need to support both
suffixes.
Fixes: #11714
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Similar to what we've done for Cloud Hypervisor in the commit
9f76467cb7, we're backporting a runtime-rs
feature that would be benificial to have as part of the go runtime.
This allows users to use virito-balloon for the hypervisor to reclaim
memory freed by the guest.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The default suggestion for top-level permissions was
`contents: read`, but scorecard notes anything other than empty,
so try updating it and see if there are any issues. I think it's
only needed if we run workflows from other repos.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Since the previous tightening a few workflow updates have
gone in and the zizmor job isn't flagging them as issues,
so address this to remove potential attack vectors
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This reverts commit cb5f143b1b, as the
cached packages have been regenerated after the switch to using zstd.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
As part of the go 1.24.6 bump there are errors about the incorrect
use of a errorf, so switch to the non-formatting version, or add
the format string as appropriate
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
golang 1.25 has been released, so 1.23 is EoL,
so we should update to ensure we don't end up with security issues
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update the two workflows that used setup-go to
instead call `install_go.sh` script, which handles
installing the correct version of golang
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
`${kernel_name,,}` is bash 4.0 and not posix compliant, so doesn't
work on macos, so switch to `tr` which is more widely
supported
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In #11693 the cc_init_data annotation was changes to be hypervisor
scoped, so each hypervisor needs to explicitly allow it in order to
use it now, so add this to both the go and rust runtime's remote
configurations
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We need to get the root_hash.txt file from the image build, otherwise
there's no way to build the shim using those values for the
configuration files.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Although the compress ratio is not as optimal as using xz, it's way
faster to compress / uncompress, and it's "good enough".
This change is not small, but it's still self-contained, and has to get
in at once, in order to help bisects in the future.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
As 3.18 is already EOL.
We need to add `--break-system-packages` to enforce the install of the
installation of the yq version that we rely on. The tests have shown
that no breakage actually happens, fortunately.
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Currently, we change vm_rootfs_driver as the initdata device driver
with block_device_driver.
Fixes#11697
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
we also need support initdat within nonprotection even though the
platform is detected as NonProtection or usually is called nontee
host. Within these cases, there's no need to validate the item of
`confidential_guest=true`, we believe the result of the method
`available_guest_protection()?`.
Fixes#11697
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
The default `reconnect_timeout` (3 seconds) was found to be insufficient for
IBM SEL when using VSOCK. This commit updates the timeouts as follows:
- `dial_timeout_ms`: Set to 90ms to match the value used in go-runtime for IBM SEL
- `reconnect_timeout_ms`: Increased to 5000ms based on empirical testing
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Add support for the `InitData` resource config on IBM SEL,
so that a corresponding block device is created and the
initdata is passed to the guest through this device.
Note that we skip passing the initdata hash via QEMU’s
object, since the hypervisor does not yet support this
mechanism for IBM SEL. It will be introduced separately
once QEMU adds the feature.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Linux v6.16 brings some useful features for the confidential guests.
Most importantly, it adds an ABI to extend runtime measurement registers
(RTMR) for the TEE platforms supporting it. This is currently enabled
on Intel TDX only.
The kernel version bump from v6.12.x to v6.16 forces some CONFIG_*
changes too:
MEMORY_HOTPLUG_DEFAULT_ONLINE was dropped in favor of more config
choices. The equivalent option is MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
X86_5LEVEL was made unconditional. Since this was only a TDX
configuration, dropping it completely as part of v6.16 is fine.
CRYPTO_NULL2 was merged with CRYPTO_NULL. This was only added in
confidential guest fragments (cryptsetup) so we can drop it in this update.
CRYPTO_FIPS now depends on CRYPTO_SELFTESTS which further depends on
EXPERT which we don't have. Enable both in a separate config fragment
for confidential guests. This can be moved to a common setting once
other targets bump to post v6.16.
CRYPTO_SHA256_SSE3 arch optimizations were reworked and are now enabled
by default. Instead of adding it to whitelist.conf, just drop it completely
since it was only enabled as part of "measured boot" feature for
confidential guests. CONFIG_CRYPTO_CRC32_S390 was reworked the same way.
In this case, whitelist.conf is needed.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
This reverts commit ede773db17.
`cc_init_data` should be under a hypervisor category because
it is a hypervisor-specific feature. The annotation including
`runtime` also breaks a logic for `is_annotation_enabled()`.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
We need to include `cc_init_data` in the enable_annotations
array to pass the data. Since initdata is a CoCo-specific
feature, this commit introduces a new array,
`DEFENABLEANNOTATIONS_COCO`, which contains the required
string and applies it to the relevant CoCo configuration.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Currently, there are 2 issues for the empty initdata annotation
test:
- Empty string handling
- "\[CDH\] \[ERROR\]: Get Resource failed" not appearing
`add_hypervisor_initdata_overrides()` does not handle
an empty string, which might lead to panic like:
```
called `Result::unwrap()` on an `Err` value: gz decoder failed
Caused by:
failed to fill whole buffer
```
This commit makes the function return an empty string
for a given empty input and updates the assertion string
to one that appears in both go-runtime and runtime-rs.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Currently, runtime-rs related code within the libs directory lacks
sufficient CI protection. We frequently observe the following issues:
- Inconsistent Code Formatting: Code that has not been properly
formatted
is merged.
- Failing Tests: Code with failing unit or integration tests is merged.
To address these issues, we need introduce stricter CI checks for the
libs directory. This may specifically include:
- Code Formatting Checks
- Mandatory Test Runs
Fixes#11512
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To make it aligned with the setting of runtime-go, we should keep
it as empty when users doesn't enable and set its specified path.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
We need to make sure that we use the latest kernel
and rebuild the initrd and image for the nvidia-gpu
use-cases otherwise the tests will fail since
the modules are not build against the new kernel and
they simply fail to load.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
OSV-Scanner highlights go.mod references to go stdlib 1.23.0 contrary to intention in versions.yaml, so synchronize them.
Make a converse comment for versions.yaml.
Fixes: #11700
Signed-off-by: Alex Tibbles <alex@bleg.org>
Let's rename the runtime-rs initdata annotation from
`io.katacontainers.config.runtime.cc_init_data` to
`io.katacontainers.config.hypervisor.cc_init_data`.
Rationale:
- initdata itself is a hypervisor-specific feature
- the new name aligns with the annotation handling logic:
c92bb1aa88/src/libs/kata-types/src/annotations/mod.rs (L514-L968)
This commit updates the annotation for go-runtime and tests accordingly.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Enable testing of initdata on the qemu-coco-dev and qemu-se
runtime classes, so we can validate the function on s390x
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This commit support the seccomp_sandbox option from the configuration.toml file
and add the logic for appending command-line arguments based on this new configuration parameter.
Fixes: #11524
Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
Previouly it is reusing the ovmf, which will enter some
issue for path checking, so move to aavmf as it should
be.
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
Read only the sealed secret prefix instead of the whole file.
Improves performance and reduces memory usage in I/O-heavy environments.
Fixes: #11643
Signed-off-by: Park.Jiyeon <jiyeonnn2@icloud.com>
Dependening on the platform configuration, users might want to
set a more secure policy than the QEMU default.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
This change introduces a new command line option `--vm`
to boot up a pod VM for testing. The tool connects with
kata agent running inside the VM to send the test commands.
The tool uses `hypervisor` crates from runtime-rs for VM
lifecycle management. Current implementation supports
Qemu & Cloud Hypervisor as VMMs.
In summary:
- tool parses the VMM specific runtime-rs kata config file in
/opt/kata/share/defaults/kata-containers/runtime-rs/*
- prepares and starts a VM using runtime-rs::hypervisor vm APIs
- retrieves agent's server address to setup connection
- tests the requested commands & shutdown the VM
Fixes#11566
Signed-off-by: Sumedh Alok Sharma <sumsharma@microsoft.com>
The seccomp feature for Cloud Hypervisor and Firecracker is enabled by default.
This commit introduces an option to disable seccomp for both and updates the built-in configuration.toml file accordingly.
Fixes: #11535
Signed-off-by: wangxinge <wangxinge@bupt.edu.cn>
Route kata-shim logs directly to systemd-journald under 'kata' identifier.
This refactoring enables `kata-shim` logs to be properly attributed to
'kata' in systemd-journald, instead of inheriting the 'containerd'
identifier.
Previously, `kata-shim` logs were challenging to filter and debug as
they
appeared under the `containerd.service` unit.
This commit resolves this by:
1. Introducing a `LogDestination` enum to explicitly define logging
targets (File or Journal).
2. Modifying logger creation to set `SYSLOG_IDENTIFIER=kata` when
logging
to Journald.
3. Ensuring type safety and correct ownership handling for different
logging backends.
This significantly enhances the observability and debuggability of Kata
Containers, making it easier to monitor and troubleshoot Kata-specific
events.
Fixes: #11590
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
After moving Arm64 CI nodes to new one, we do faced an interesting
issue for timeout when it executes the command with crictl runp,
the error is usally: code = DeadlineExceeded
Fixes: #11662
Signed-off-by: Kevin Zhao <kevin.zhao@linaro.org>
This series should make runtime-rs's vcpu allocation behaviour match the
behaviour of runtime-go so we can now enable pertinent tests which were
skipped so far due the difference between both shims.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Configuration information is adjusted after loading from file but so
far, there has been no similar check for configuration coming from
annotations. This commit introduces re-adjusting config after
annotations have been processed.
A small refactor was necessary as a prerequisite which introduces
function TomlConfig::adjust_config() to make it easier to invoke
the adjustment for a whole TomlConfig instance. This function is
analogous to the existing validate() function.
The immediate motivation for this change is to make sure that 0
in "default_vcpus" annotation will be properly adjusted to 1 as
is the case if 0 is loaded from a config file. This is required
to match the golang runtime behaviour.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Also included (as commented out) is a test that does not pass although
it should. See source code comment for explanation why fixing this seems
beyond the scope of this PR.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit focuses purely on the formal change of type. If any subsequent
changes in semantics are needed they are purposely avoided here so that the
commit can be reviewed as a 100% formal and 0% semantic change.
Signed-off-by: Pavel Mores <pmores@redhat.com>
This commit addresses a part of the same problem as PR #7623 did for the
golang runtime. So far we've been rounding up individual containers'
vCPU requests and then summing them up which can lead to allocation of
excess vCPUs as described in the mentioned PR's cover letter. We address
this by reversing the order of operations, we sum the (possibly fractional)
container requests and only then round up the total.
We also align runtime-rs's behaviour with runtime-go in that we now
include the default vcpu request from the config file ('default_vcpu')
in the total.
We diverge from PR #7623 in that `default_vcpu` is still treated as an
integer (this will be a topic of a separate commit), and that this
implementation avoids relying on 32-bit floating point arithmetic as there
are some potential problems with using f32. For instance, some numbers
commonly used in decimal, notably all of single-decimal-digit numbers
0.1, 0.2 .. 0.9 except 0.5, are periodic in binary and thus fundamentally
not representable exactly. Arithmetics performed on such numbers can lead
to surprising results, e.g. adding 0.1 ten times gives 1.0000001, not 1,
and taking a ceil() results in 2, clearly a wrong answer in vcpu
allocation.
So instead, we take advantage of the fact that container requests happen
to be expressed as a quota/period fraction so we can sum up quotas,
fundamentally integral numbers (possibly fractional only due to the need
to rewrite them with a common denominator) with much less danger of
precision loss.
Signed-off-by: Pavel Mores <pmores@redhat.com>
When hot-plugging CPUs on QEMU, we send a QMP command with JSON
arguments. QEMU 9.2 recently became more strict[1] enforcing the
JSON schema for QMP parameters. As a result, running Kata Containers
with QEMU 9.2 results in a message complaining that the core-id
parameter is expected to be an integer:
```
qmp hotplug cpu, cpuID=cpu-0 socketID=1, error:
QMP command failed:
Invalid parameter type for 'core-id', expected: integer
```
Fix that by changing the core-id, socket-id and thread-id to be
integer values.
[1]: be93fd5372Fixes: #11633
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
As we have changed the initdata annotation definition, Accordingly, we also
need correct its const definition with KATA_ANNO_CFG_RUNTIME_INIT_DATA.
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
This adds SECURITY.md to the list of GH-native files that should be excluded by
the reference checker.
Today this is useful for downstreams who already have a SECURITY.md file for
compliance reasons. When Kata onboards that file, this commit will also be
required.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
When the network interface provisioned by the CNI has static ARP table entries,
the runtime calls AddARPNeighbor to propagate these to the agent. As of today,
these calls are simply rejected.
In order to allow the calls, we do some sanity checks on the arguments:
We must ensure that we don't unexpectedly route traffic to the host that was
not intended to leave the VM. In a first approximation, this applies to
loopback IPs and devices. However, there may be other sensitive ranges (for
example, VPNs between VMs), so there should be some flexibility for users to
restrict this further. This is why we introduce a setting, similar to
UpdateRoutes, that allows restricting the neighbor IPs further.
The only valid state of an ARP neighbor entry is NUD_PERMANENT, which has a
value of 128 [1]. This is already enforced by the runtime.
According to rtnetlink(7), valid flag values are 8 and 128, respectively [2],
thus we allow any combination of these.
[1]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L72
[2]: https://github.com/torvalds/linux/blob/4790580/include/uapi/linux/neighbour.h#L49C20-L53Fixes: #11664
Signed-off-by: Markus Rudy <mr@edgeless.systems>
To make it work within CI, we do alignment with kata-runtime's definition
with "io.katacontainers.config.runtime.cc_init_data".
Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Print more details about the behavior of "kubectl logs", trying to understand
errors like:
https://github.com/kata-containers/kata-containers/actions/runs/16662887973/job/47164791712
not ok 1 Check the number vcpus are correctly allocated to the sandbox
(in test file k8s-sandbox-vcpus-allocation.bats, line 37)
`[ `kubectl logs ${pods[$i]}` -eq ${expected_vcpus[$i]} ]' failed with status 2
No resources found in kata-containers-k8s-tests namespace.
...
k8s-sandbox-vcpus-allocation.bats: line 37: [: -eq: unary operator expected
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
This auto-detects the repo by default (instead of having to specify
KATA_DEV_MODE=true) so that forked repos can leverage the static-checks.yaml CI
check without modification.
An alternative would have been to pass the repo in static-checks.yaml. However,
because of the matrix, this would've changed the check name, which is a pain to
handle in either the gatekeeper/GH UI.
Example fork failure:
https://github.com/microsoft/kata-containers/actions/runs/16656407213/job/47142421739#step:8:75
I've tested this change to work in a fork.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
In order to have a reproducible code generation process, we need to pin
the versions of the tools used. This is accomplished easiest by
generating inside a container.
This commit adds a container image definition with fixed dependencies
for Golang proto/ttrpc code generation, and changes the agent Makefile
to invoke the update-generated-proto.sh script from within that
container.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The generated Go bindings for the agent are out of date. This commit
was produced by running
src/agent/src/libs/protocols/hack/update-generated-proto.sh with
protobuf compiler versions matching those of the last run, according to
the generated code comments.
Since there are new RPC methods, those needed to be added to the
HybridVSockTTRPCMockImp.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
Updated versions.yaml to use Firecracker v1.12.1.
Replaced firecracker and jailer binaries under /opt/kata/bin.
Tested with kata-fc runtime on Kubernetes:
- Deployed pods using gitpod/openvscode-server
- Verified microVM startup, container access, and Firecracker usage
- Confirmed Firecracker and jailer versions via CLI
Signed-off-by: Kumar Mohit <68772712+itsmohitnarayan@users.noreply.github.com>
- "confidential_emptyDir" becomes "emptyDir" in the settings file.
- "confidential_configMap" becomes "configMap" in settings.
- "mount_source_cpath" becomes "cpath".
- The new "root_path" gets used instead of the old "cpath" to point to
the container root path..
- "confidential_guest" is no longer used. By default it gets replaced
by "enable_configmap_secret_storages"=false, because CoCo is using
CopyFileRequest instead of the Storage data structures for ConfigMap
and/or Secret volume mounts during CreateContainerRequest.
- The value of "guest_pull" becomes true by default.
- "image_layer_verification" is no longer used - just CoCo's guest pull
is supported.
- The Request input files from unit tests are changing to reflect the
new default settings values described above.
- tests/integration/kubernetes/tests_common.sh adjusts the settings for
platforms that are not set-up for CoCo during CI (i.e., platforms
other than SNP, TDX, and CoCo Dev).
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Skip pulling container image layers when guest-pull=true. The contents
of these layers were ignored due to:
- #11162, and
- tarfs snapshotter support having been removed from genpolicy.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
AKS Confidential Containers are using the tarfs snapshotter. CoCo
upstream doesn't use this snapshotter, so remove this Policy complexity
from upstream.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
`mem-agent` here is now a library and do not contain examples, ignore
Cargo.lock to get rid of untracked file noise produced by `cargo run` or
`cargo test`.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Re-generates the client code against Cloud Hypervisor v47.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`MmapRegion` is only used while `virtio-fs` is enabled during testing
dragonball, gate the import behind `virtio-fs` feature.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Some variables went unused if certain features are not enabled, use
`#[allow(unused)]` to suppress those warnings at the time being.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
`VcpuManagerError` is only needed when `host-device` feature is enabled,
gate the import behind that feature.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Code inside `test_mac_addr_serialization_and_deserialization` test does
not actually require this `with-serde` feature to test, removing the
assertion here to enable this test.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Add full cgroups support on host. Cgroups are managed by `FsManager` and
`SystemdManager`. As the names impies, the `FsManager` manages cgroups
through cgroupfs, while the `SystemdManager` manages cgroups through
systemd. The two manages support cgroup v1 and cgroup v2.
Two types of cgroups path are supported:
1. For colon paths, for example "foo.slice:bar:baz", the runtime manages
cgroups by `SystemdManager`;
2. For relative/absolute paths, the runtime manages cgroups by
`FsManager`.
vCPU threads are added into the sandbox cgroups in cgroup v1 + cgroupfs,
others, cgroup v1 + systemd, cgroup v2 + cgroupfs, cgroup v2 + systemd, VMM
process is added into the cgroups.
The systemd doesn't provide a way to add thread to a unit. `add_thread()`
in `SystemdManager` is equivalent to `add_process()`.
Cgroup v2 supports threaded mode. However, we should enable threaded mode
from leaf node to the root node (`/`) iteratively [1]. This means the
runtime needs to modify the cgroups created by container runtime (e.g.
containerd). Considering cgroupfs + cgroup v2 is not a common combination,
its behavior is aligned with systemd + cgroup v2, which is not allowed to
manage process at the thread level.
1: https://www.kernel.org/doc/html/v4.18/admin-guide/cgroup-v2.html#threadsFixes: #11356
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
As some reasons, it first should make it align with runtime-go, this
commit will do this work.
Fixes#11543
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The actual memory usage on the host is equal to the hypervisor memory usage
plus the user memory usage. An OOM killer might kill the shim when the
memory limit on host is same with that of container and the container
consumes all available memory. In this case, the containerd will never
receive OOM event, but get "task exit" event. That makes the `k8s-oom.bats`
test fail.
The fix is to add a new container to increase the sandbox memory limit.
When the container "oom-test" is killed by OOM killer, there is still
available memory for the shim, so it will not be killed.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
When enabling systemd cgroup driver and sandbox cgroup only, the shim is
under a systemd unit. When the unit is stopping, systemd sends SIGTERM to
the shim. The shim can't exit immediately, as there are some cleanups to
do. Therefore, ignoring SIGTERM is required here. The shim should complete
the work within a period (Kata sets it to 300s by default). Once a timeout
occurs, systemd will send SIGKILL.
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Our CI keeps on getting
```
jq: error (at <stdin>:1): Cannot index string with string "tag_name"
```
during the install dependencies phase, which I suspect
might be due to github rate limits being reduced, so try
to pass through the `GH_TOKEN` env and use it in the auth header.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It is important that we continue to support VirtIO-SCSI. While
VirtIO-BLK is a common choice, virtio-scsi offers significant
performance advantages in specific scenarios, particularly when
utilizing iothreads and with NVMe Fabrics.
Maintaining Flexibility and Choice by supporting both virtio-blk and
virtio-scsi, we provide greater flexibility for users to choose the
optimal storage(virtio-blk, virtio-scsi) interface based on their
specific workload requirements and hardware configurations.
As virtio-scsi controller has been created when qemu vm starts with
block device driver is set to `virtio-scsi`. This commit is for blockdev_add
the backend block device and device_add frondend virtio-scsi device via qmp.
Fixes#11516
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
As block device index is an very important unique id of a block device
and can indicate a block device which is equivalent to device_id.
In case of index is required in calculating scsi LUN and reduce
useless arguments within reusing `hotplug_block_device`, we'd better
change the device_id with block device index.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In this commit, block device aio are introduced within hotplug_block_device
within qemu via qmp and the "iouring" is set the default.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It should be correctly handled within the device manager when do
create_block_device if the driver_option is virtio-scsi.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It supports handling scsi device when block device driver is `scsi`.
And it will ensure a correct storage source with LUN.
Fixes#11516
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It's used to help discover scsi devices inside guest and also add a
new const value `KATA_SCSI_DEV_TYPE` to help pass information.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
AIO is the I/O mechanism used by qemu with options:
- threads
Pthread based disk I/O.
- native
Native Linux I/O.
- io_uring (default mode)
Linux io_uring API. This provides the fastest I/O operations on
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Although Previous implementation of hotplugging block device via QMP
can successfully hot-plug the regular file based block device, but it
fails when the backend is /dev/xxx(e.g. /dev/loop0). With analysis about
it, we can know that it lacks the ablility to hotplug host block devices.
This commit will fill the gap, and make it work well for host block
devices.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
As there were a few moderate security vulnerability fixes missed as part
of the 3.19.0 release.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
For the release itself, let's simply copy the VERSION file to the
tarball.
To do so, we had to change the logic that merges the build, as at that
point the tag is not yet pushed to the repo.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
On commit 90bc749a19, we've changed the
QEMUTDXPATH in order to get it to work with GPUs, but the change broke
the non-GPU TDX use-case, which depends on the distro binary.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
- Add nodeSelector configuration to values.yaml with empty default
- Update DaemonSet template to conditionally include nodeSelector
- Add documentation and examples for nodeSelector usage in README
- Allows users to restrict kata-containers deployment to specific nodes by labeling them
Signed-off-by: Gus Minto-Cowcher <gus@basecamp-research.com>
According to the issue [1], Tokio will panic when we are giving a blocking
socket to Tokio's `from_std()` method, the information is as follows:
```
A panic occurred at crates/agent/src/sock/vsock.rs:59: Registering a
blocking socket with the tokio runtime is unsupported. If you wish to do
anyways, please add `--cfg tokio_allow_from_blocking_fd` to your RUSTFLAGS.
```
A workaround is to set the socket to non-blocking.
1: https://github.com/tokio-rs/tokio/issues/7172
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The KERNEL_DEBUG_ENABLED was missing in the outer shell script
so overrides via make were not possible.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Bump these crates to remove the unmaintained dependency
proc-macro-error and remediate RUSTSEC-2024-0370
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Bump these crates across various components to remove the
dependency on unmaintained instant crate and remediate
RUSTSEC-2024-0384
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- The github generated template had an old version which
isn't valid for the pr-scan, so update to the latest
- The action needs also `actions: read` and `contents:read` to run in kata-containers
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some of the nix apis we are using are now enabled by features,
so add these to resolve the compilation issues
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This new version of gc fixes s390x attestation, also introduces registry
configuration setting directly via initdata.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The peer pods project is using the agent-ctl tool in some
tests, so tagging our cache will let them more easily identify
development versions of kata for testing between releases.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Sometimes, containers or execs do not use stdin, so there is no chance
to add parent stdin to the process's writer hashmap, resulting in the
parent stdin's fd not being closed when the process is cleaned up later.
Therefore, when creating a process, first explicitly add parent stdin to
the wirter hashmap. Make sure that the parent stdin's fd can be closed
when the process is cleaned up later.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
We want to be able to build a debug version of the kernel for various
use-cases like debugging, tracing and others.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The convention for rootfs-* names is:
* rootfs-${image_type}-${special_build}
If this is not followed, cache will never work as expected, leading to
building the initrd / image on every single build, which is specially
constly when building the nvidia specific targets.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The init data could not be read properly within kata-agent because the
data length field was omitted, a consequence of a mismatch in the data
write format.
Fixes#11556
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Now AA supports to receive initdata toml plaintext and deliver it in the
attestation. This patch creates a file under
'/run/confidential-containers/initdata'
to store the initdata toml and give it to AA process.
When we have a separate component to handle initdata, we will move the
logic to that component.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Update to https://github.com/teawater/mem-agent/tree/kata-20250627.
The commit list:
3854b3a Update nix version from 0.23.2 to 0.30.1
d9a4ced Update tokio version from 1.33 to 1.45.1
9115c4d run_eviction_single_config: Simplify check evicted pages after
eviction
68b48d2 get_swappiness: Use a rounding method to obtain the swappiness
value
14c4508 run_eviction_single_config: Add max_seq and min_seq check with
each info
8a3a642 run_eviction_single_config: Move infov update to main loop
b6d30cf memcg.rs: run_aging_single_config: Fix error of last_inc_time
check
54fce7e memcg.rs: Update anon eviction code
41c31bf cgroup.rs: Fix build issue with musl
0d6aa77 Remove lazy_static from dependencies
a66711d memcg.rs: update_and_add: Fix memcg not work after set memcg
issue
cb932b1 Add logs and change some level of some logs
93c7ad8 Add per-cgroup and per-numa config support
092a75b Remove all Cargo.lock to support different versions of rust
540bf04 Update mem-agent-srv, mem-agent-ctl and mem-agent-lib to
v0.2.0
81f39b2 compact.rs: Change default value of compact_sec_max to 300
c455d47 compact.rs: Fix psi_path error with cgroup v2 issue
6016e86 misc.rs: Fix log error
ded90e9 Set mem-agent-srv and mem-agent-ctl as bin
Fixes: #11478
Signed-off-by: teawater <zhuhui@kylinos.cn>
As the following job has passed 10 days in a row for the nightly test:
```
kata-containers-ci-on-push / run-k8s-tests-on-zvsi / run-k8s-tests (nydus, qemu-coco-dev, kubeadm)
```
this commit makes the job required again.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Set the node in the spec template of a Job manifest, allowing to use
set_node() on tests like k8s-parallel.bats
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The teardown_common will print the description of the running pods, kill
them all and print the system's syslogs afterwards.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The previous description for the `block_device_driver` was inaccurate or
outdated. This commit updates the documentation to provide a more
precise explanation of its function.
Fixes#11488
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When we run a kata pod with runtime-rs/qemu and with a default
configuration toml, it will fail with error "unsupported driver type
virtio-scsi".
As virtio-scsi within runtime-rs is not so popular, we set default block
device driver with `virtio-blk-*`.
Fixes#11488
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This patch changes the container process HashMap to use exec_id as the primary
key instead of PID, preventing exec_id collisions that could be exploited in
Confidential Computing scenarios where the host is less trusted than the guest.
Key changes:
- Changed `processes: HashMap<pid_t, Process>` to `HashMap<String, Process>`
- Added exec_id collision detection in `start()` method
- Updated process lookup operations to use exec_id directly
- Simplified `get_process()` with direct HashMap access
This prevents multiple exec operations from reusing the same exec_id, which
could be problematic in CoCo use cases where process isolation and unique
identification are critical for security.
Signed-off-by: Ankita Pareek <ankitapareek@microsoft.com>
The `/opt/kata/VERSION` file, which is created using `git describe
--tags`, requires the newly released tag to be updated in order to be
accurate.
To do so, let's add a `fetch-tags: true` to the checkout action used
during the `create-kata-tarball` job.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
The CoCo non-TEE job (run-k8s-tests-coco-nontee) used to be required but
we had to withdraw it to fix a problem (#11156). Now the job is back
running and stable, so time to make it required again.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
tempdir hasn't been updated for seven years and pulls in
remove_dir_all@0.5.3 which has security advisory
GHSA-mc8h-8q98-g5hr, so replace this with using tempfile,
which the crate got merged into and we use elsewhere in the
project
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Ignore Cargo.lock in `libs` to prevent developers from accidentally
track lock files in `libs` workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
This PR adds support for adding a network device before starting the
cloud-hypervisor VM.
This commit will get the host devices from NamedHypervisorConfig and
assign it to VmConfig's devices which is for vfio devices when clh
starts launching.
And with this, it successfully finish the vfio devices conversion from
a generic Hypervisor config to a clh specific VmConfig.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This commit introduce `host_devices` to help convert vfio devices from
a generic hypervisor config to a cloud-hypervisor specific VmConfig.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This PR adds support for adding a vfio device before starting the
cloud-hypervisor VM (or cold-plug vfio device).
This commit changes "pending_devices" for clh implementation via adding
DeviceType::Vfio() into pending_devices. And it will get shared host devices
after correctly handling vfio devices (Specially for primary device).
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
crates in `libs` workspace do not ship binaries, they are just libraries
for other workspace to reference, the `Cargo.lock` file hence would not
take effect. Removing Cargo.lock for `libs` workspace.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
In line with configuration for other TEEs, shared_fs should
be set to none for IBM SEL. This commit updates the value for
runtime/runtime-rs.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
As we're using a `kubectl wait --timeout ...` to check whether the
kata-deploy pod's been deleted or not, let's remove the `--wait` from
the `helm uninstall ...` call as k0s tests were failing because the
`kubectl wait --timeout...` was starting after the pod was deleted,
making the test fail.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We've been pinning a specific version of k0s for CRI-O tests, which may
make sense for CRI-O, but doesn't make sense at all when it comes to
testing that we can install kata-deploy on latest k0s (and currently our
test for that is broken).
Let's bump to the latest, and from this point we start debugging,
instead of debugging on an ancient version of the project.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Bump url, reqwests and idna crates in order to move away from
idna <1.0.3 and remediate CVE-2024-12224.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Previously, the rootlessDir variable in `src/runtime/virtcontainers/pkg/rootless.go` was initialized at
package load time using `os.Getenv("XDG_RUNTIME_DIR")`. However, in rootless
VMM mode, the correct value of $XDG_RUNTIME_DIR is set later during runtime
using os.Setenv(), so rootlessDir remained empty.
This patch defers the initialization of rootlessDir until the first call
to `GetRootlessDir()`, ensuring it always reflects the current environment
value of $XDG_RUNTIME_DIR.
Fixes: #11526
Signed-off-by: stevenfryto <sunzitai_1832@bupt.edu.cn>
There are workflows that rely on `az aks install-cli` to get kubectl
installed. There is a well-known problem on install-cli, related with
API usage rate limit, that has recently caused the command to fail
quite often.
This is replacing install-cli with the azure/setup-kubectl github
action which has no such as rate limit problem.
While here, removed the install_cli() function from gha-run-k8s-common.sh
so avoid developers using it by mistake in the future.
Fixes#11463
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removing kernel config files realting
to SEV as part of the SEV deprecation
efforts.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Removing runtime SEV functionality,
such as the kbs, ovmf, VMSA handling,
and SEV configs as part of deprecating
SEV from kata.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Removing files related to SEV, responsible for
installing and configuring Kata containers.
Co-authored-by: Adithya Krishnan Kannan <AdithyaKrishnan.Kannan@amd.com>
Signed-off-by: Arvind Kumar <arvinkum@amd.com>
Add init data annotation within preparing remote hypervisor annotations
when prepare vm, so that it can be passed within CreateVMRequest.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
679cc9d47c was merged and bumped the
podoverhead for the gpu related runtimeclasses. However, the bump on the
`kata-runtimeClasses.yaml` as overlooked, making our tests fail due to
that discrepancy.
Let's just adjust the values here and move on.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
We cannot only rely only on default_cpu and default_memory in the
config, default is 1 and 2Gi but we need some overhead for QEMU and
the other related binaries running as the pod overhead. Especially
when QEMU is hot-plugging GPUs, CPUs, and memory it can consume more
memory.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
atty is unmaintained, with the last release almost 3 years
ago, so we don't need to check for updates, but instead will
remove it from out dependency tree.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
structopt features were integrated into clap v3 and so is not
actively updated and pulls in the atty crate which has a security
advisory, so update clap, remove structopts, update the code that
used it to remove the outdated dependencies.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
slog-term 2.9.0 included atty, which is unmaintained
as has a security advisory GHSA-g98v-hv3f-hcfr,
so bump the version across our components to remove
this dependency.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We had the proper config.toml configuration for static builds
but were building the glibc target and not the musl target.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
The way GH works, we can only require Zizmor results on ALL PR runs, or
none, so remove the path filter.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Previously, the source field was subject to mandatory checks. However,
in guest-pull mode, this field doesn't consistently provide useful
information. Our practical experience has shown that relying on this
field for critical data isn't always necessary.
In other aspect, not all cases need mandatory check for KataVirtualVolume.
based on this fact, we'd better to make from_base64 do only one thing and
remove the validate(). Of course, We also keep the previous capability to
make it easy for possible cases which use such method and we rename it
clearly with from_base64_and_validate.
This commit relaxes the mandatory checks on the KataVirtualVolume specifically
for guest-pull mode, acknowledging its diminished utility in this context.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When hot plugging vcpu in dragonball hypervisor, use the synchronization
interface and wait until the hot plug cpu is executed in the guest
before returning. This ensures that the subsequent device hot plug will
not conflict with the previous call.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Let dragonball's resize_vcpu api support synchronization, and only
return after the hot-plug of the CPU is successfully executed in the
guest kernel. This ensures that the subsequent device hot-plug operation
can also proceed smoothly.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Got follow warning with make test of kata-agent:
Compiling rustjail v0.1.0 (/data/teawater/kata-containers/src/agent/rustjail)
Compiling kata-agent v0.1.0 (/data/teawater/kata-containers/src/agent)
warning: unused import: `std::os::unix::fs`
--> rustjail/src/mount.rs:1147:9
|
1147 | use std::os::unix::fs;
| ^^^^^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
This commit fixes it.
Fixes: #11508
Signed-off-by: teawater <zhuhui@kylinos.cn>
Introduce a const value `KATA_VIRTUAL_VOLUME_PREFIX` defined in the libs/kata-types,
and it'll be better import such const value from there.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This was originally implemented as a Jenkins skip and is only used in a few
workflows. Nowadays this would be better implemented via the gatekeeper.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This patch fixes the rules.rego file to ensure that the
policy is correctly parsed and applied by opa.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This commit updates the `tests_common.sh` script
to enable the `confidential_guest`
setting for the coco tests in the Kubernetes
integration tests.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This patch removes storages from the testcases.json file for execprocess.
This is because input storage objects are invalid for two reasons:
1. "io.katacontainers.fs-opt.layer=" is missing option in annotations.
2. by default, we don't have host-tarfs-dm-verity enabled, so the storage
objects are not created in policy.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
---
This patch introduces some basic checks for the
`image_guest_pull` storage type in the genpolicy tool.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
This patch improves the test framework for the
genpolicy tool by enabling the use of config maps.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
Add the definiation of variable DEFCREATECONTAINERTIMEOUT into
Makefile target with default timeout 30s.
Fixes: #485
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It's used to indicate timeout value set for image pulling in
guest during creating container.
This allows users to set this timeout with annotation according to the
size of image to be pulled.
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
It allows users to set this create container timeout within
configuration.toml according to the size of image to be pulled
inside guest.
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To better understand the impact of different timeout values on system
behavior, this section provides a more comprehensive explanation of the
request_timeout_ms:
This timeout value is used to set the maximum duration for the agent to
process a CreateContainerRequest. It's also used to ensure that workloads,
especially those involving large image pulls within the guest, have sufficient
time to complete.
Based on explaination above, it's renamed with `create_container_timeout`,
Specially, exposed in 'configuration.toml'
Fixes#10692
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This helps considerably to avoid patching the code, and just adjusting
the build environment to use a smaller alignment than the default one.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
These tests are not passing, or being maintained,
so as discussed on the AC meeting, we will skip them
from automatically running until they can be reviewed
and re-worked, so avoid wasting CI cycles.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This adds Zizmor GHA security scanning as a PR gate.
Note that this does NOT require that Zizmor returns 0 alerts, but rather
that Zizmor's invocation completes successfully (regardless of how many
alerts it raises).
I will set up the former after this commit is merged (through the GH UI).
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Enable GPU annotations by adding `default_gpus` and `default_gpu_model`
into the list of valid annotations `enable_annotations`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Add GPU specific annotations used by remote hypervisor for instance
selection during `prepare_vm`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Two annotations: `default_gpus and `default_gpu_model` as GPU annotations
are introduced for Kata VM configurations to improve instance selection on
remote hypervisors. By adding these annotations:
(1) `default_gpus`: Allows users to specify the minimum number of GPUs a VM
requires. This ensures that the remote hypervisor selects an instance
with at least that many GPUs, preventing resource under-provisioning.
(2) `default_gpu_model`: Lets users define the specific GPU model needed for
the VM. This is crucial for workloads that depend on particular GPU archs or
features, ensuring compatibility and optimal performance.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To provide the remote hypervisor with the necessary intelligence
to select the most appropriate instance for a given GPU instance,
leading to better resource allocation, two fields `default_gpus`
and `default_gpu_model` are introduced in `RemoteInfo`.
Fixes#10484
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To better support containerd 2.1 and later versions, remove the
hardcoded `layer.erofs` and instead parse `/proc/mounts` to obtain the
real mount source (and `/sys/block/loopX/loop/backing_file` if needed).
If the mount source doesn't end with `layer.erofs`, it should be marked
as unsupported, as it may be a filesystem meta file generated by later
containerd versions for the EROFS flattened filesystem feature.
Also check whether the filesystem type is `overlay` or not, since the
containerd mount manager [1] may change it after being introduced.
[1] https://github.com/containerd/containerd/issues/11303
Fixes: f63ec50ba3 ("runtime: Add EROFS snapshotter with block device support")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Current Dockerfile fails when trying to build from the root of the repo
docker build -t kata-monitor -f tools/packaging/kata-monitor/Dockerfile .
with "invalid go version '1.23.0': must match format 1.23"
Using go 1.23 in the Dockerfile fixes the build error
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
I notices that agent-ctl is including a 9 month old version of
image-rs and the libs crates haven't been update for potentially
many years, so bump all of these.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This commit introduces the ability to run Pods without shared fs
mechanism in Kata.
The default shared fs can lead to unnecessary resource consumption
and security risks for certain use cases. Specifically, scenarios
where files only need to be copied into the VM once at Pod creation
(e.g., non-tee envs) and don't require dynamic updates make the shared
fs redundant and inefficient.
By explicitly disabling shared fs functionality, we reduce resource
overhead and shrink the attack surface. Users will need to employ
alternative methods(e.g. guest-pull) to ensure container images are
shared into the guest VM for these specific scenarios.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In the pre commit:74eccc54e7b31cc4c9abd8b6e4007c3a4c1d4dd4,
it missed return the right rootfs volume.
In the is_block_rootfs fn, if the rootfs is based on a
block device such as devicemapper, it should clear the
volume's source and let the device_manager to use the
dev_id to get the device's host path.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For containerd's Blockfile Snapshotter, it will pass
a rootfs mounts with a rawfile as a mount source
and mount options with "loop" embeded.
To support this type of rootfs, it is necessary to identify this as a
blockfile rootfs through the "loop" flag, and then use the volume source
of the rootfs as the source of the block device to hot-insert it into
the guest.
Fixes:#11464
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Instead of building it every time, we can store the regorus
binary in OCI registry using oras and download it from there.
This reduces the install time from ~1m40s to ~15s.
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
This commit add support of resize_vcpu for cloud-hypervisor
using the it's vm resize api. It can support bothof vcpu hotplug
and hot unplug.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For cloud-hypervisor, currently only hot plugging of memory is
supported, but hot unplugging of memory is not supported. In addition,
by default, cloud-hypervisor uses ACPI-based memory hot-plugging instead
of virtio-mem based memory hot-plugging.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Add API interfaces for get vminfo and resize. get vminfo can obtain the
memory size and number of vCPUs from the cloud hypervisor vmm in real
time. This interface provides information for the subsequent resize
memory and vCPU.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
The system's own Deserialize cannot implement parsing from string to
MacAddr, so we need to implement this trait ourself.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
To make it flexibility and extensibility This change modifies the Kata
Agent's handling of `InitData` to allow for unrecognized key-value pairs.
The `InitData` field now directly utilizes `HashMap<String, String>`,
enabling it to carry arbitrary metadata and information that may be
consumed by other components
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
During sandbox preparation, initdata should be specified to TdxConfig,
specially mrconfigid, which is used to pass to tdx guest report for
measurement.
Fixes#11180
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
SEV-SNP guest configuration utilizes a different set of properties
compared to the existing 'sev-guest' object. This change introduces
the `host-data` property within the sev-snp-guest object. This property
allows for configuring an SEV-SNP guest with host-provided data, which
is crucial for data integrity verification during attestation.
The `host-data` property is specifically valid for SEV-SNP guests
running
on a capable platform. It is configured as a base64-encoded string when
using the sev-snp-guest object.
the example cmdline looks like:
```shell
-object sev-snp-guest,id=sev-snp0,host-data=CGNkCHoBC5CcdGXir...
```
Fixes#11180
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To facilitate the transfer of initdata generated during
`prepare_initdata_device_config`, a new parameter has been
introduced into the `prepare_protection_device_config` function.
Furthermore, to specifically pass initdata to SEV-SNP Guests, a
`host_data` field has been added to the `SevSnpConfig` structure.
However, this field is exclusively applicable to the SEV-SNP platform.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Retrieve the Initdata string content from the security_info of the
Configuration. Based on the Protection Platform type, calculate the
digest of the Initdata. Write the Initdata content to the block
device. Subsequently, construct the BlockConfig based on this block
device information.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To correctly manage initdata as a block device, a new InitData
Resource type, inherently a block device, has been introduced
within the ResourceManager. As a component of the Sandbox's
resources, this InitData Resource needs to be appropriately
handled by the Device Manager's handler.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit implements the retrieval and processing of InitData provided
via a Pod annotation. Specifically, it enables runtime-rs to:
(1) Parse the "io.katacontainers.config.hypervisor.cc_init_data"
annotation from the Pod YAML.
(2) Perform reverse operations on the annotation value: base64 decoding
followed by gzip decompression.
(3) Deserialize the decompressed data into the internal InitData
structure.
(4) Serialize the resulting InitData into a string and store it in the
Configuration.
This allows users to inject configuration data into the TEE Guest by
encoding and compressing it and passing it as an annotation in the Pod
configuration. This mechanism supports scenarios where dynamic config
is required for Confidential Containers.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the Initdata Spec and the logic for
calculating its digest. It includes:
(1) Define a `ProtectedPlatform` enum to represent major TEE platform
types.
(2) Create an `InitData` struct to support building and serializing
initialization data in TOML format.
(3) Implement adaptation for SHA-256, SHA-384, and SHA-512 digest
algorithms.
(4) Provide a platform-specific mechanism for adjusting digest lengths
(zero-padding).
(5) Supporting the decoding and verification of base64+gzip encoded
Initdata.
The core functionality ensures the integrity of data injected by the
host through trusted algorithms, while also accommodating the
measurement requirements of different TEE platforms.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces a new `initdata` field of type String to
hypervisor `SecurityInfo`.
In accordance with the Initdata Specification, this field will
facilitate the injection of well-defined data from an untrusted host
into the TEE. To ensure the integrity of this injected data, the TEE
evidence's hostdata capability or the (v)TPM dynamic measurement
capability will be leveraged, as outlined in the specification.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Don't use local launched_pods variable in test_rc_policy(), because
teardown() needs to use this variable to print a description of the
pods, for debugging purposes.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
The locking mechanism around the layers cache file was insufficient to
prevent corruption of the file. This commit moves the layers cache's
management in-memory, only reading the cache file once at the beginning
of `genpolicy`, and only writing to it once, at the end of `genpolicy`.
In the case that obtaining a lock on the cache file fails,
reading/writing to it is skipped, and the cache is not used/persisted.
Signed-off-by: charludo <git@charlotteharludo.com>
`vmm-sys-util` was duplicated while updating the `ignore` list of
`rust-vmm` crates in #11431, remove duplicated one and sort the list.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
When moving from clap v2 to v4 a bunch of
functions have been removed, so update the code
to handle these replacements
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
When moving from clap v2 to v4 a bunch of
functions have been removed, so update the code
to handle these replacements
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Update dependabot ignore list in cargo ecosystem to ignore upgrades from
rust-vmm crates, since those crates need to be managed carefully and
manually.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Pin Github owned actions to specific hashes as recommended
as tags are mutable see https://pin-gh-actions.kammel.dev/.
This one of the recommendations that scorecard gives us.
Note this was generated with `frizbee actions`
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Fedora 40 is EoL, and I've seen the registry pull fail
a few times recently, so let's bump to fedora 42 which
has 10 months of support left.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Now we are decoupled from the image-rs crate,
we can bump the protobuf version across our project
to resolve the GHSA-2gh3-rmm4-6rq5 advisory
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This patch updates the container image for the CI test workloads:
- `k8s-layered-sc-deployment.yaml`
- `k8s-pod-sc-deployment.yaml`
- `k8s-pod-sc-nobodyupdate-deployment.yaml`
- `k8s-pod-sc-supplementalgroups-deployment.yaml`
- `k8s-policy-deployment.yaml`
Also updates unit tests:
- `test_create_container_security_context`
- `test_create_container_security_context_supplemental_groups`
This fixes tests failing due to an image pull error as the previous image is no longer available in
the container registry.
Signed-off-by: Archana Choudhary <archana1@microsoft.com>
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Switch the hyper for an underscore, so the ghcr
helm publish can work properly.
Co-authored-by: Fabiano Fidêncio <fidencio@northflank.com>
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Only sign the kernel if the user has provided the KBUILD_SIGN_PIN
otherwise ignore.
Whole here, let's move the functionality to the common fragments as it's
not a GPU specific functionality.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
At the moment if any of the tests in the matric fails
then the rest of the jobs are cancelled, so we have to
re-run everything. Add `fail-fast: false` to stop this
behaviour.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Remove the rule that causes gatekeeper to skip tests
if we've only updated the required-tests.yaml list.
Although update to just the required-tests.yaml
doesn't change the outcome of any of the CI tests, it
does change whether gatekeeper will still pass with the new
rules. Although it's a bit of a hit to run the CI, it's probably
worth it to keep gatekeeper validated.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This adds govulncheck vulnerability scanning as a non-blocking check in
the static checks workflow. The check scans Go runtime binaries for known
vulnerabilities while filtering out verified false positives.
Signed-off-by: Mitch Zhu <mitchzhu@microsoft.com>
Since #11197 was merged, all confidential k8s e2e tests for s390x
have been failing with the following errors:
```
attestation-agent: error while loading shared libraries:
libcurl.so.4: cannot open shared object file
libnghttp2.so.14: cannot open shared object file
```
In line with the update on x86_64, we need to upgrade the OS used
in rootfs-{image,initrd} on s390x.
This commit also bumps all 22.04 to 24.04 for all architectures.
For s390x, this ensures the missing packages listed above are
installed.
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
After commit a3f973db3b merged, protection::GuestProtection::[Snp,Sev]
have changed to tuple variants, and can no longer be used in assert_eq
marco without tuple values, or some errors will raised:
```
assert_eq!(actual.unwrap(), GuestProtection::Snp);
| ^^^^^^^^^^^^^^^^^^^^ expected \
`GuestProtection`, found enum constructor
```
Signed-off-by: Lei Liu <liulei.pt@bytedance.com>
containerd-sandboxapi fails with `containerd v2.0.x` and passes with
`containerd v1.7.x` regardless kata-containers. And it was not tested
with `containerd v2.0.x` because `containerd v2.0.x` could not
recognize `[plugins.cri.containerd]` in `config.toml`.
Signed-off-by: Seunguk Shin <seunguk.shin@arm.com>
the latest Canonical TDX release supports 25.04 / Plucky as
well. Users experimenting with the latest goodies in the
25.04 TDX enablement won't get Kata deployed properly.
This change accepts 25.04 as supported distro for TDX.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Container image integrity protection is a critical practice involving a
multi-layered defense mechanism. While container images inherently offer
basic integrity verification through Content-Addressable Storage (CAS)
(ensuring pulled content matches stored hashes), a combination of other
measures is crucial for production environments. These layers include:
Encrypted Transport (HTTPS/TLS) to prevent tampering during transfer;
Image Signing to confirm the image originates from a trusted source;
Vulnerability Scanning to ensure the image content is "healthy"; and
Trusted Registries with stringent access controls.
In certain scenarios, such as when container image confidentiality
requirements are not stringent, and integrity is already ensured via the
aforementioned mechanisms (especially CAS and HTTPS/TLS), adopting
"force guest pull" can be a viable option. This implies that even when
pulling images from a container registry, their integrity remains
guaranteed through content hashes and other built-in mechanisms, without
relying on additional host-side verification or specialized transfer
methods.
Since this feature is already available in runtime-go and offers
synergistic benefits with guest pull, we have chosen to support force
guest pull.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the `adjust_rootfs_mounts` function to manage
root filesystem mounts for guest-pull scenarios.
When the force guest-pull mechanism is active, this function ensures that
the rootfs is exclusively configured via a dedicated `KataVirtualVolume`.
It disregards any provided input mounts, instead generating a single,
default `KataVirtualVolume`. This volume is then base64-encoded and set
as the sole mount option for a new, singular `Mount` entry, which is
returned as the only item in the `Vec<Mount>`.
This change guarantees consistent and exclusive rootfs configuration
when utilizing guest-pull for container images.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In CoCo scenarios, there's no image pulling on host side, and it will
disable such operations, that's to say, there's no files sharing between
host and guest, especially for container rootfs.
We introduce Kata Virtual Volume to help handle such cases:
(1) Introduce is_kata_virtual_volume to ensure the volume is kata
virtual volume.
(2) Introduce VirtualVolume Handling logic in handle_rootfs when the
mount is kata virtual volume.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces comprehensive support for rootfs mount mgmt
through Kata Virtual Volumes, specifically enabling the guest-pull
mechanism.
It enhances the runtime's ability to:
(1) Extract image references from container annotations (CRI/CRI-O).
(2) Process `KataVirtualVolume` objects, configuring them for guest-pull operations.
(3) Set up the agent's storage for guest-pulled images.
This functionality streamlines the process of pulling container images
directly within the guest for rootfs, aligning with guest-side image management strategies.
Fixes#10690
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
The Multistrap issue has been fixed in noble thus we can use the LTS.
Also, this will fix the error reported by CDH
```
/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.38' not found
```
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The new version of AA allows the config not having a coco_as token
config. If not provided, it will mark as None.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This patch updates the guest-components to new version with better
error logging for CDH. It also allows the config of AA not having a
coco_as token config.
Also, the new version of CDH requires to build aws-lc-sys thus needs to
install cmake for build.
See
https://github.com/kata-containers/kata-containers/actions/runs/15327923347/job/43127108813?pr=11197#step:6:1609
for details.
Besides, the new version of guest-components have some fixes for SNP
stack, which requires the updates of trustee side.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This new parameter for kata-agent is used to control the timeout for a
guest pull request. Note that sometimes an image can be really big, so
we set default timeout to 1200 seconds (20 minutes).
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
After moving image pulling from kata-agent to CDH, the failed image pull
error messages have been slightly changed. This commit is to apply for
the change.
Note that in original and current image-rs implementation, both no key
or wrong key will result in a same error information.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Now kata-agent by default supports both guest pull and host pull
abilities, thus we do not need to specify the PULL_TYPE env when
building kata-agent.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
After moving guest pull abilities to CDH, the document of guest pull
should be updated due to new workflow.
Also, replace the diagram of PNG into a mermaid one for better
maintaince.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
In previous version, only when the `guest-pull` feature is enabled
during the build time, the OCI process will be tried to be overrided
when the storage has a guest pull volume and also it is sandbox. After
getting rid of the feature, whether it is guest-pull is runtimely
determined thus we can always do this trying override, by checking if
there is kata guest pull volume in storages and it's sandbox.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Now the ocicrypt configuration used by CDH is always the same and it's
not a good practics to write it into the rootfs during runtime by
kata-agent. Thus we now move it to coco-guest-components build script.
The config will be embedded into guest image/initrd together with CDH
binary.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
The feature `guest-pull` and `default-pull` are both removed, because
both guest pull and host pull are supported in building time without
without involving new dependencies like image-rs before. The guest pull
will depend on the CDH process, not the build time feature.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This is a higher level calling to pull image inside guest. Now it should
call confidential_data_hub's API. As the previous pull_image API does
1. check is sandbox
2. generate bundle_path
inside the original logic, and the new API does not do them to keep the
API semantice clean, thus before we call the API, we explicitly do the
two things.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
now image pull ability is moved to CDH, thus the CDH process needs
environment variables of ocicrypt to help find the keyprovider(cdh) to
decrypt images.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
As image pull ability is moved to CDH, kata-agent does not need the
confugurations of image pulling anymore.
All these configurations reading from kernel cmdline is now implemented
by CDH.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
Image pull abilities are all moved to the separate component
Confidential Data Hub (CDH) and we only left the auxiliary functions
except pull_image in confidential_data_hub/image.rs
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This is a little refactoring commit that moves the mod `cdh.rs` and
`image.rs` to a directory module `confidential_data_hub`. This is
because the image pull ability will be moved into confidential data
hub, thus it is better to handle image pull things in the confidential
data hub submodule.
Also, this commit does some changes upon the original code. It gets rid
of a static variable for CDH timeout config and directly use the global
config variable's member. Also, this changes the
`is_cdh_client_initialized` function to sync version as it does not need
to be async.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
CDH provides the image pull api. This commit adds the declaration of the
API in the CDH proto file. This will be used in following commits.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
This method is not used when guest-pull is not used.
Add a flag that prevents a compile error when building with rust version > 1.84.0 and not using guest-pull
Signed-off-by: Saul Paredes <saulparedes@microsoft.com>
Fixes a confusing log message shown when Virtio-FS is disabled.
Previously we logged “The virtiofsd had stopped” regardless of whether Virtio-FS was actually enabled or not.
Signed-off-by: Paweł Bęza <pawel.beza99@gmail.com>
Add the memory prealloc support for qemu hypervisor.
When it was enabled, all of the memory will be allocated
and locked. This is useful when you want to reserve all the
memory upfront or in the cases where you want memory latencies
to be very predictable.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
This reverts commit 2ee3470627.
This is mostly redundant given we already have workflow approval for external
contributors.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Fix `empty_line_after_doc_comments` clippy warning as suggested by rust
1.85.1.
```console
error: empty line after doc comment
--> dbs_boot/src/x86_64/layout.rs:11:1
|
11 | / /// Magic addresses externally used to lay out x86_64 VMs.
12 | |
| |_^
13 | /// Global Descriptor Table Offset
14 | pub const BOOT_GDT_OFFSET: u64 = 0x500;
| ------------------------------ the comment documents this constant
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
= note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
= help: if the empty line is unintentional remove it
help: if the documentation should include the empty line include it in the comment
|
12 | ///
|
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `unstable_name_collisions` clippy warning reported by rust
1.85.1.
```console
error: a method with this name may be added to the standard library in the future
--> src/registry.rs:646:10
|
646 | file.unlock()?;
| ^^^^^^
|
= warning: once this associated item is added to the standard library, the ambiguity may cause an error or change in behavior!
= note: for more information, see issue #48919 <https://github.com/rust-lang/rust/issues/48919>
= help: call with fully qualified syntax `fs2::FileExt::unlock(...)` to keep using the current method
= note: `-D unstable-name-collisions` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(unstable_name_collisions)]`
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `empty_line_after_outer_attr` clippy warning reported by
rust 1.85.1.
```console
error: empty line after outer attribute
--> src/check.rs:515:9
|
515 | / #[allow(dead_code)]
516 | |
| |_^
517 | struct TestData<'a> {
| ------------------- the attribute applies to this struct
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_outer_attr
= note: `-D clippy::empty-line-after-outer-attr` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_outer_attr)]`
= help: if the empty line is unintentional remove it
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Bump `ttrpc-codegen` related dependencies in response to `ttrpc-codegen`
bump in `libs/protocol`.
Relates: #11376
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Manually fix `empty_line_after_doc_comments` clippy warning reported by
rust 1.85.1.
```console
error: empty line after doc comment
--> src/linux_abi.rs:8:1
|
8 | / /// Linux ABI related constants.
9 | |
| |_^
10 | #[cfg(target_arch = "aarch64")]
11 | use std::fs;
| ------- the comment documents this import
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#empty_line_after_doc_comments
= note: `-D clippy::empty-line-after-doc-comments` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::empty_line_after_doc_comments)]`
= help: if the empty line is unintentional remove it
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The users@0.11.0 has a high severity CVE-2025-5791
and doesn't seem to be maintained, so switch to
uzers which forked it.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In the nvidia rootfs build, only copy in `kata-opa` if `AGENT_POLICY` is enabled. This fixes
builds when `AGENT_POLICY` is disabled and opa is not built.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Manually fix `unnecessary_get_then_check` clippy warning as suggested by
rust 1.85.1.
```console
warning: unnecessary use of `get(&shared_mount.src_ctr).is_none()`
--> src/sandbox.rs:431:25
|
431 | if src_ctrs.get(&shared_mount.src_ctr).is_none() {
| ---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| |
| help: replace it with: `!src_ctrs.contains_key(&shared_mount.src_ctr)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unnecessary_get_then_check
```
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Kata runtime employs a CapabilityBits mechanism for VMM capability
governance. Fundamentally, this mechanism utilizes predefined feature
flags to manage the VMM's operational boundaries.
To meet demands for storage performance and security, it's necessary
to explicitly enable capability flags such as `BlockDeviceSupport`
(basic block device support) and `BlockDeviceHotplugSupport` (block
device hotplug) which ensures the VMM provides the expected caps.
In CoCo scenarios, due to the potential risks of sensitive data leaks
or side-channel attacks introduced by virtio-fs through shared file
systems, the `FsSharingSupport` flag must be forcibly disabled. This
disables the virtio-fs feature at the capability set level, blocking
insecure data channels.
Fixes#11341
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Two key important scenarios:
(1) Support `virtio-blk-pci` cold plug capability for confidential guests
instead of nvdimm device in CVM due to security constraints in CoCo cases.
(2) Push initdata payload into compressed raw block device and insert it
in CVM through `virtio-blk-pci` cold plug mechanism.
Fixes#11341
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
By default the checkout action leave the credentials
in the checked-out repo's `.git/config`, which means
they could get exposed. Use persist-credentials: false
to prevent this happening.
Note: static-checks.yaml does use git diff after the checkout,
but the git docs state that git diff is just local, so doesn't
need authentication.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
On newer TDX platforms, checking `/sys/firmware/tdx` for `major_version` and
`minor_version` is no longer necessary. Instead, we only need to verify that
`/sys/module/kvm_intel/parameters/tdx` is set to `'Y'`.
This commit addresses the following:
(1) Removes the outdated check and corrects related code, primarily impacting
`cloud-hypervisor`.
(2) Refines the TDX platform detection logic within `arch_guest_protection`.
Fixes#11177
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Currently, the TDX Quote Generation Service (QGS) connection in
QEMU with default vsock port 4050 for TD attestation. To make it
flexible for users to modify the QGS port. Based on the introduced
qgs_port, This commit supports the QGS port to be configured via
configuration
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Currently, the TDX Quote Generation Service (QGS) connection in QEMU is
hardcoded to vsock port 4050, which limits flexibility for TD attestation.
While the users will be able to modify the QGS port. To address this
inflexibility, this commit introduces a new qgs_port field within security
info and make it default with 4050.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
During the prepare for `start sandbox` phase, this commit
ensures the correct `ProtectionDeviceConfig` is prepared
based on the `GuestProtection` type in a TEE platform.
Specifically, for the TDX platform, this commit sets the
essential parameters within the ProtectionDeviceConfig,
including the TDX ID, firmware path, and the default QGS
port (4050).
This information is then passed to the underlying VMM for
further processing using the existing ResourceManager and
DeviceManager infrastructure.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This patch introduces TdxConfig with key fields, firmare,
qgs_port, mrconfigid, and other useful things. With this config,
a new ProtectionDeviceConfig type `Tdx(TdxConfig)` is added.
With this new type supported, we finally add tdx protection device
into the cmdline to launch a TDX-based CVM.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit introduces the `tdx-guest` designed to facilitate
the launch of CVMs leveraging Intel's TDX.
Launching a TDX-based CVM requires various properties, including
`quote-generation-socket`, and `mrconfigid`,`sept-ve-disable` .etc.
(1) The `quote-generation-socket` property is added to the
`tdx-guest` object, which is of type `SocketAddress`, specifies the
address of the Quote Generation Service (QGS).
(2) The `mrconfigid` property, representing the SHA384 hash
for non-owner-defined configurations of the guest TD, is introduced as a
runtime or OS configuration parameter.
(3) And the `sept-ve-disable` property allows control over whether
EPT violation conversions to #VE exceptions are disabled when the guest
TD accesses PENDING pages.
With the introduction of the `tdx-guest` object and its associated
properties, launching TDX-based CVMs is now supported. For example, a
TDX guest can be configured via the command line as follows:
```shell
-object {"qom-type":"tdx-guest", "id":"tdx", "sept-ve-disable":true,\
"mrconfigid":"vHswGkzG4B3Kikg96sLQ5vPCYx4AtuB4Ubfzz9UOXvZtCGat8b8ok7Ubz4AxDDHh",\
"quote-generation-socket":{"type":"vsock","cid":"2","port":"4050"} \
-machine q35,accel=kvm,confidential-guest-support=tdx
```
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables consistent JSON representation of socket addresses
across system components:
(1) Add serde serialization/deserialization with standardized
field naming convention.
(2) Enforce string-based port/cid and unix/path representation
for protocol compatibility.
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
For CoCo, shared_fs is prohibited as we cannot guarantee the security of
guest/host sharing. Therefore, this PR enables administrators to configure
shared_fs = "none" via the configuration.toml file, thereby enforcing the
disablement of sharing.
Fixes#10677
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Run the k8s tests on mariner with annotation disable_image_nvdimm=true,
to use virtio-blk instead of nvdimm for the guest rootfs block device.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they want to
set disable_image_nvdimm=true in configuration-clh.toml.
disable_image_nvdimm=false is the default config value.
Also, use virtio-blk instead of nvdimm if disable_image_nvdimm=true in
configuration-clh.toml.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Allow users to build using DEFDISABLEIMAGENVDIMM=true if they
want to set disable_image_nvdimm=true in configuration-qemu*.toml.
disable_image_nvdimm=false is the default configuration value.
Note that the value of disable_image_nvdimm gets ignored for
platforms using "confidential_guest = true".
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Comment out "disable_image_nvdimm = true" in:
- configuration-qemu-snp.toml
- configuration-qemu-nvidia-gpu-snp.toml
for consistency with the other configuration-qemu*.toml files.
Those two platforms are using "confidential_guest = true", and therefore
the value of disable_image_nvdimm gets ignored.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
Bump chrono package to 0.4.41 and thereby
remove the time 0.1.43 dependency and remediate
CVE-2020-26235
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
This removes the ok-to-test label on every push, except if the PR author
has write access to the repo (ie. permission to modify labels).
This protects against attackers who would initially open a genuine PR,
then push malicious code after the initial review.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
This completely eliminates the Azure secret from the repo, following the below
guidance:
https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-azure
The federated identity is scoped to the `ci` environment, meaning:
* I had to specify this environment in some YAMLs. I don't believe there's any
downside to this.
* As previously, the CI works seamlessly both from PRs and in the manual
workflow.
I also deleted the tools/packaging/kata-deploy/action folder as it doesn't seem
to be used anymore, and it contains a reference to the secret.
Signed-off-by: Aurélien Bombo <abombo@microsoft.com>
Having secrets unconditionally being inherited is
bad practice, so update the workflows to only pass
through the minimal secrets that are needed
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
In this commit, hotplug_vfio_on_root_bus parameter is removed.
<dd422ccb69>
pcie_root_port parameter description
(`This value is valid when hotplug_vfio_on_root_bus is true and
machine_type is "q35"`) will have no value,
and not completely valid, since vrit or DB as also support for root-ports and CLH as well.
so removed.
Fixes: #11316
Co-authored-by: Zvonko Kaiser <zkaiser@nvidia.com>
Signed-off-by: Shunsuke Kimura <pbrehpuum@gmail.com>
Instead of looping over the users per group and parsing passwd for each
user, we can do the reverse lookup uid->user up front and then compare
the names directly. This has the nice side-effect of silencing warnings
about non-existent users mentioned in /etc/group, which is not relevant
for policy decisions.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
containerd does not automatically add groups to the list of additional
GIDs when the groups have the same name as the user:
https://github.com/containerd/containerd/blob/f482992/pkg/oci/spec_opts.go#L852-L854
This is a bug and should be corrected, but it has been present since at
least 1.6.0 and thus affects almost all containerd deployments in
existence. Thus, we adopt the same behavior and ignore groups with the
same name as the user when calculating additional GIDs.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
When connecting to guest through vsock, a log is printed for each failure.
The failure comes from two main reasons: (1) the guest is not ready or (2)
some real errors happen. Printing logs for the first case leads to log
clutter, and your logs will like this:
```
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
Feb 07 02:47:24 ubuntu containerd[520]: {"msg":"connect uds \"/run/kata/...
```
To avoid this, the sock implmentations save the last error and return it
after all retries are exhausted. Users are able to check all errors by
setting the log level to trace.
Reorganize the log format to "{sock type}: {message}" to make it clearer.
Apart from that, errors return by the socks use `self`, instead of
`ConnectConfig`, since the `ConnectConfig` doesn't provide any useful
information.
Disable infinite loop for the log forwarder. There is retry logic in the
sock implmentations. We can consider the agent-log unavailable if
`sock.connect()` encounters an error.
Fixes: #10847
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
The vhost-user-fs has been added to Dragonball, so we can remove
`update_memory`'s dead_code attribute.
Fixes: #8691
Signed-off-by: Xuewei Niu <niuxuewei.nxw@antgroup.com>
Removed unnecessary dynamic dispatch for services. Properly dereferenced
service Box values and stored in Arc.
Co-authored-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
Signed-Off-By: Ryan Savino <ryan.savino@amd.com>
Previous version of `ttrpc-codegen` is generating outdated
`#![allow(box_pointers)]` which was deprecated. Bump `ttrpc-codegen`
from v0.4.2 to v0.5.0 and `protobuf` from vx to v3.7.1 to get rid of
this.
Signed-off-by: Ruoqing He <heruoqing@iscas.ac.cn>
The additional GIDs are handled by genpolicy as a BTreeSet. This set is
then serialized to an ordered JSON array. On the containerd side, the
GIDs are added to a list in the order they are discovered in /etc/group,
and the main GID of the user is prepended to that list. This means that
we don't have any guarantees that the input GIDs will be sorted. Since
the order does not matter here, comparing the list of GIDs as sets is
close enough.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
The warning used to trigger even if the passwd file was not needed. This
commit moves it down to where it actually matters.
Signed-off-by: Markus Rudy <mr@edgeless.systems>
We need more and accurate documentation. Let's start
by providing an Helm Chart install doc and as a second
step remove the kustomize steps.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
Co-authored-by: Steve Horsman <steven@uk.ibm.com>
The Guest rootfs image file size is aligned up to 128M boundary,
since commmit 2b0d5b2. This change allows users to use a custom
alignment value - e.g., to align up to 2M, users will be able to
specify IMAGE_SIZE_ALIGNMENT_MB=2 for image_builder.sh.
Signed-off-by: Dan Mihai <dmihai@microsoft.com>
- Create groups for commonly seen cargo packages so that rather than
getting up to 9 PRs for each rust components, bumps to the same package
are grouped together.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
- Create a dependabot configuration to check for updates
to our rust and golang packages each day and our github
actions each month
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
After the last commit, the initdata test on SNP should be ok. Thus we
turn on this flag for CI.
Fixes#11300
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
the qemu commandline of SNP should start with `sev-snp-guest`, and then
following other parameters separeted by ','. This patch fixes the
parameter order.
Signed-off-by: Xynnn007 <xynnn@linux.alibaba.com>
We're switching to using a rev as it may take some time for the package
to be updated on crates.io.
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
There's no benefit on keeping those restricted to the dragonball build,
when they can be used with other VMMs as well (as long as they support
the mem-agent).
Signed-off-by: Fabiano Fidêncio <fidencio@northflank.com>
Currently, when a new sandbox resource controller is created with cgroupsv2 and sandbox_cgroup_only is disabled,
the cgroup management falls back to cgroupfs. During deletion, `IsSystemdCgroup` checks if the path contains `:`
and tries to delete the cgroup via systemd. However, the cgroup was originally set up via cgroupfs and this process
fails with `lstat /sys/fs/cgroup/kubepods.slice/kubepods-besteffort.slice/....scope: no such file or directory`.
This patch updates the deletion logic to take in to account the sandbox_cgroup_only=false option and in this case uses
the cgroupfs delete.
Fixes: #11036
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Increase the NOFILE limit in the systemd service, this helps with
running databases in the Kata runtime.
Signed-off-by: Champ-Goblem <cameron@northflank.com>
Since 3.12 we're shipping the helm-chart per default
with each release. Update the documentation to use helm rather
then the kata-deploy manifests.
Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
We have a number of jobs that either need,or nest workflows
that need gh permissions, such as for pushing to ghcr,
or doing attest build provenance. This means they need write
permissions on things like `packages`, `id-token` and `attestations`,
so we need to set these permissions at the job-level
(along with `contents: read`), so they are not restricted by our
safe defaults.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
I shortsightedly forgot that gatekeeper would need
to read more than just the commit content in it's
python scripts, so add read permissions to actions
issues which it uses in it's processing
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
We have a number of jobs that nest the build-static-tarball
workflows later on. Due to these doing attest build provenance,
and pushing to ghcr.io, t hey need write permissions on
`packages`, `id-token` and `attestations`, so we need to set
these permissions on the top-level jobs (along with `contents: read`),
so they are not blocked.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
Some legacy workflows require write access to github which
is a security weakness and don't provide much value,
so lets remove them.
Signed-off-by: stevenhorsman <steven@uk.ibm.com>
It frequently causes "Resource Temporarily Unavailable (OS Error 11)"
with the original 250ms read timeout When passing through devices via
VFIO in QEMU. The root cause lies in synchronization timeout windows
failing to accommodate inherent delays during critical hardware init
phases in kernel space. This commit would increase the timeout to 5000ms
which was determined through some tests. While not guaranteeing complete
resolution for all hardware combinations, this change significantly
reduces timeout failures.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This enables guest pull via config, without the need of any external
snapshotter. When the config enables runtime.experimental_force_guest_pull, instead of
relying on annotations to select the way to share the root FS, we always
use guest pull.
Co-authored-by: Markus Rudy <mr@edgeless.systems>
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
The service name is specified as RFC 1035 lable name [1]. The svc_name
regex in the genpolicy settings is applied to the downward API env
variables created based on the service name. So it tries to match
RFC 1035 labels after they are transformed to downward API variable
names [2]. So the set of lower case alphanumerics and dashes is
transformed to upper case alphanumerics and underscores.
The previous regex wronly permitted use of numbers, but did allow
dot and dash, which shouldn't be allowed (dot not because they aren't
conform with RFC 1035, dash not because it is transformed to underscore).
We have to take care not to also try to use the regex in places where
we actually want to check for RFC 1035 label instead of the downward
API transformed version of it.
Further, we should consider using a format like JSON5/JSONC for the
policy settings, as these are far from trivial and would highly benefit
from proper documentation through comments.
[1]: https://kubernetes.io/docs/concepts/services-networking/service/#defining-a-service
[2]: b2dfba4151/pkg/kubelet/envvars/envvars.go (L29-L70)
Signed-off-by: Paul Meyer <katexochen0@gmail.com>
Since kernel v6.3 the vsock packet is not split over two descriptors and
is instead included in a single one.
Therefore, we currently decide the specific method of obtaining
BufWrapper based on the length of descriptor.
Refer:
a2752fe04fhttps://git.kernel.org/torvalds/c/71dc9ec9ac7d
Signed-off-by: Xingru Li <lixingru.lxr@linux.alibaba.com>
[ Gao Xiang: port this patch from the internal branch to address Linux 6.1.63+. ]
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Currently, Kata EROFS support needs it, otherwise it will:
[ 0.564610] erofs: (device sda): mounted with root inode @ nid 36.
[ 0.564858] overlayfs: failed to set xattr on upper
[ 0.564859] overlayfs: ...falling back to index=off,metacopy=off.
[ 0.564860] overlayfs: ...falling back to xino=off.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Some nvidia gpu pci address domain with 0001,
current runtime default deal with 0000:bdf,
which cause address errors during device initialization
and address conflicts during device registration.
Fixes#11252
Signed-off-by: yangsong <yunya.ys@antgroup.com>
Fixed "note: Not following: ./../../../tools/packaging/guest-image/lib_se.sh:
openBinaryFile: does not exist (No such file or directory) [SC1091]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Although the script will inherit that setting from the caller scripts,
expliciting it in the file will vanish shellcheck "warning: Use 'pushd
... || exit' or 'pushd ... || return' in case pushd fails. [SC2164]"
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Addressed the following shellcheck advices:
SC2046 (warning): Quote this to prevent word splitting.
SC2248 (style): Prefer double quoting even when variables don't contain special characters
SC2250 (style): Prefer putting braces around variable references even when not strictly required.
SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It's import to handle port allocation in a PCIe topology before vfio
deivce hotplug via QMP.
The code ensures that VFIO devices are properly allocated to available
ports (either root ports or switch ports) and updates the device's bus
and port information accordingly.
It'll first retrieves the PCIe port type from the topology using
pcie_topo.get_pcie_port(). And then, searches for an available node in
the PCIe topology with RootPort or SwitchPort type and allocates the
VFIO device to the found available port. Finally, Updates the device's
bus with the allocated port's ID and type.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit implements the `find_available_node` function,
which searches the PCIe topology for the first available
`TopologyPortDevice` or `SwitchDownPort`.
If no available node is found in either the `pcie_port_devices`
or the connected switches' downstream ports, the function returns
`None`.
Fixes # 10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
This commit note that the current implementation restriction where
'multifunction=on' is temporarily unsupported. While the feature
isn't available in the present version, we explicitly acknowledge
this limitation and commit to addressing it in future iterations
to enhance functional completeness.
Tracking issue #11292 has been created to monitor progress towards
full multifunction support.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
To support port devices for vfio devices, more fields need to be
introduced to help pass port type, bus and other information.
Fixes#10361
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When try to delete a cgroup, it's needed to move all of the
tasks/procs in the cgroup into root cgroup and then delete it.
Since for cgroup v2, it doesn't support to move thread into
root cgroup, thus move the processes instead of moving tasks
can fix this issue.
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
this script relies on temporary subscriptions and won't cleanup any
resources. Let's improve the logging to better describe what resources
were created and how to clean them, if the user needs to do so.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
We used hardcoded "ci/openshift-ci/cluster" location which expects this
script to be only executed from the root. Let's use SCRIPT_DIR instead
to allow execution from elsewhere eg. by user bisecting a failed CI run.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
In CI we hit problem where just after `az login` the first `az
network vnet list` command fails due to permission. We see
"insufficient permissions" or "pending permissions", suggesting we should
retry later. Manual tests and successful runs indicate we do have the
permissions, but not immediately after login.
Azure docs suggest using extra `az account set` but still the
propagation might take some time. Add a loop retrying
the first command a few times before declaring failure.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
kbs_k8s_svc_host() returns the ingress IP when the KBS service is
exposed via an ingress. In Azure AKS the ingress can time a while to be
fully ready and recently we have noticed on CI that kbs_k8s_svc_host()
has returned empty value. Maybe the problem is on current timeout being
too low, so let's increase it to 50 seconds to see if the situation
improves.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Added 'report-tests' command to gha-run.sh to print to stdout a report
of the tests executed.
For example:
```
SUMMARY (2025-02-17-14:43:53):
Pass: 0
Fail: 1
STATUSES:
not_ok foo.bats
OUTPUTS:
::group::foo.bats
1..3
not ok 1 test 1
not ok 2 test 2
ok 3 test 3
1..2
not ok 1 test 1
not ok 2 test 2
::endgroup::
```
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Currently run_kubernetes_tests.sh sends all the bats outputs to stdout
which can be very difficult to browse to find a problem, mainly on
CI. With this change, each bats execution have its output sent to
'reports/yyy-mm-dd-hh:mm:ss/<status>-<bats file>.log' where <status>
is either 'ok' (tests passed) or 'not_ok' (some tests failed).
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
For security reasons, we have restricted directory copying.
Introduces the `is_allowlisted_copy_volume` function to verify
if a given volume path is present in an allowed copy directory.
This enhances security by ensuring only permitted volumes are
copied
Currently, only directories under the path
`/var/lib/kubelet/pods/<uid>/volumes/{kubernetes.io~configmap,
kubernetes.io~secret, kubernetes.io~downward-api,
kubernetes.io~projected}` are allowed to be copied into the
guest. Copying of other directories will be prohibited.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
When synchronizing file changes on the host, a "symlink AlreadyExists"
issue occurs, primarily due to improper handling of symbolic links
(symlinks). Additionally, there are other related problems.
This patch will try to address these problems.
(1) Handle symlink target existence (files, dirs, symlinks) during host file
sync. Use appropriate removal methods (unlink, remove_file, remove_dir_all).
(2) Enhance temporary file handling for safer operations and implement truncate
only at offset 0 for resume support.
(3) Set permissions and ownership for parent directories.
(4) Check and clean target path for regular files before rename.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
Introduce event-driven file sync mechanism between host and guest when
sharedfs is disabled, which will help monitor the host path in time and
do sync files changes:
1. Introduce FsWatcher to monitor directory changes via inotify;
2. Support recursive watching with configurable filters;
3. Add debounce logic (default 500ms cooldown) to handle burst events;
4. Trigger `copy_dir_recursively` on stable state;
5. Handle CREATE/MODIFY/DELETE/MOVED/CLOSE_WRITE events;
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
In Kubernetes (k8s), while Kata Pods often use virtiofs for injecting
Service Accounts, Secrets, and ConfigMaps, security-sensitive
environments like CoCo disable host-guest sharing. Consequently, when
SharedFs is disabled, we propagate these configurations into the guest
via file copy and bind mount for correct container access.
Fixes#11237
Signed-off-by: alex.lyn <alex.lyn@antgroup.com>
to simplify gatekeeper development add support for DEBUG_INPUT which can
be used to report content from files gathered in DEBUG run.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
to avoid manual curling to analyze GK issues let's add a way to dump all
GK requests in a directory when the use specifies "DEBUG" env variable.
Signed-off-by: Lukáš Doktor <ldoktor@redhat.com>
# your codebase is analyzed, see https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/codeql-code-scanning-for-compiled-languages
echo"This script created additional resources to create peering between ${AZURE_REGION} and ${PP_REGION}. Ensure you release those resources after the testing (or use temporary subscription)"
The following sequence diagram depicted below offers a detailed overview of the messages/calls exchanged to pull an unencrypted unsigned image from an unauthenticated container registry. This involves the kata-runtime, kata-agent, and the guest-components’ image-rs to use the guest pull mechanism.
First and foremost, the guest pull code path is only activated when `nydus snapshotter` requires the handling of a volume which type is `image_guest_pull`, as can be seen on the message below:
```json
@@ -108,10 +153,10 @@ Below is an example of storage information packaged in the message sent to the k
```
Next, the kata-agent's RPC module will handle the create container request which, among other things, involves adding storages to the sandbox. The storage module contains implementations of `StorageHandler` interface for various storage types, being the `ImagePullHandler` in charge of handling the storage object for the container image (the storage manager instantiates the handler based on the value of the "driver").
`ImagePullHandler` delegates the image pulling operation to the `ImageService.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, class the image-rs to in fact fetch and uncompress the image's bundle.
`ImagePullHandler` delegates the image pulling operation to the `confidential_data_hub.pull_image()` that is going to create the image's bundle directory on the guest filesystem and, in turn, the `ImagePullService` of Confidential Data Hub to fetch, uncompress and mount the image's rootfs.
> **Notes:**
> In this flow, `ImageService.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `image-rs.pull_image()` because the pause image is expected to already be inside the guest's filesystem, so instead `ImageService.unpack_pause_image()` is called.
> In this flow, `confidential_data_hub.pull_image()` parses the image metadata, looking for either the `io.kubernetes.cri.container-type: sandbox` or `io.kubernetes.cri-o.ContainerType: sandbox` (CRI-IO case) annotation, then it never calls the `pull_image()` RPC of Confidential Data Hub because the pause image is expected to already be inside the guest's filesystem, so instead `confidential_data_hub.unpack_pause_image()` is called.
@@ -28,6 +28,7 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.runtime.sandbox_cgroup_only`| `boolean` | determines if Kata processes are managed only in sandbox cgroup |
| `io.katacontainers.config.runtime.enable_pprof` | `boolean` | enables Golang `pprof` for `containerd-shim-kata-v2` process |
| `io.katacontainers.config.runtime.create_container_timeout` | `uint64` | the timeout for create a container in `seconds`, default is `60` |
| `io.katacontainers.config.runtime.experimental_force_guest_pull` | `boolean` | forces the runtime to pull the image in the guest VM, default is `false`. This is an experimental feature and might be removed in the future. |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. |
| [Using kata-deploy Helm chart](#kata-deploy-helm-chart) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. |
### Kata Deploy Installation
### Kata Deploy Helm Chart
Kata Deploy provides a Dockerfile, which contains all of the binaries and
artifacts required to run Kata Containers, as well as reference DaemonSets,
which can be utilized to install Kata Containers on a running Kubernetes
cluster.
The Kata Deploy Helm chart is a convenient way to install all of the binaries and
artifacts required to run Kata Containers on Kubernetes.
[Use Kata Deploy](/tools/packaging/kata-deploy/README.md) to install Kata Containers on a Kubernetes Cluster.
[Use Kata Deploy Helm Chart](/tools/packaging/kata-deploy/helm-chart/README.md) to install Kata Containers on a Kubernetes Cluster.
@@ -129,6 +129,7 @@ The kata agent has the ability to configure agent options in guest kernel comman
| `agent.guest_components_procs` | guest-components processes | Attestation-related processes that should be spawned as children of the guest. Valid values are `none`, `attestation-agent`, `confidential-data-hub` (implies `attestation-agent`), `api-server-rest` (implies `attestation-agent` and `confidential-data-hub`) | string | `api-server-rest` |
| `agent.hotplug_timeout` | Hotplug timeout | Allow to configure hotplug timeout(seconds) of block devices | integer | `3` |
| `agent.cdh_api_timeout` | Confidential Data Hub (CDH) API timeout | Allow to configure CDH API timeout(seconds) | integer | `50` |
| `agent.image_pull_timeout` | Confidential Data Hub (CDH) Image Pull API timeout | Allow to configure CDH API image pull timeout(seconds) | integer | `1200` |
| `agent.https_proxy` | HTTPS proxy | Allow to configure `https_proxy` in the guest | string | `""` |
| `agent.image_registry_auth` | Image registry credential URI | The URI to where image-rs can find the credentials for pulling images from private registries e.g. `file:///root/.docker/config.json` to read from a file in the guest image, or `kbs:///default/credentials/test` to get the file from the KBS| string | `""` |
| `agent.enable_signature_verification` | Image security policy flag | Whether enable image security policy enforcement. If `true`, the resource indexed by URI `agent.image_policy_file` will be got to work as image pulling policy. | string | `""` |
@@ -148,7 +149,7 @@ The kata agent has the ability to configure agent options in guest kernel comman
> The agent will fail to start if the configuration file is not present,
> or if it can't be parsed properly.
> - `agent.devmode`: true | false
> - `agent.hotplug_timeout` and `agent.cdh_api_timeout`: a whole number of seconds
> - `agent.hotplug_timeout`, `agent.image_pull_timeout` and `agent.cdh_api_timeout`: a whole number of seconds
/// pull_image is used for call confidential data hub to pull image in the guest.
/// Image layers will store at [`image::KATA_IMAGE_WORK_DIR`]`,
/// rootfs and config.json will store under given `bundle_path`.
///
/// # Parameters
/// - `image`: Image name (exp: quay.io/prometheus/busybox:latest)
/// - `bundle_path`: The path to store the image bundle (exp. /run/kata-containers/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6/rootfs)
/// pull_image is used for call image-rs to pull image in the guest.
/// # Parameters
/// - `image`: Image name (exp: quay.io/prometheus/busybox:latest)
/// - `cid`: Container id
/// - `image_metadata`: Annotations about the image (exp: "containerd.io/snapshot/cri.layer-digest": "sha256:24fb2886d6f6c5d16481dd7608b47e78a8e92a13d6e64d87d57cb16d5f766d63")
/// # Returns
/// - The image rootfs bundle path. (exp. /run/kata-containers/cb0b47276ea66ee9f44cc53afa94d7980b57a52c3f306f68cb034e58d9fbd3c6/rootfs)
pubasyncfnpull_image(
&mutself,
image: &str,
cid: &str,
image_metadata: &HashMap<String,String>,
)-> Result<String>{
info!(sl(),"image metadata: {image_metadata:?}");
ifSelf::is_sandbox(image_metadata){
letmount_path=Self::unpack_pause_image(cid)?;
returnOk(mount_path);
}
// Image layers will store at KATA_IMAGE_WORK_DIR, generated bundles
// with rootfs and config.json will store under CONTAINER_BASE/cid/images.
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.