When QEMU is launched daemonized, we have the guarantee that the
QMP socket is available. In order to launch a non-daemonized QEMU,
the QMP connection should be created before QEMU is started in order
to avoid a race. Introduce a variant of QMPStart() that can use such
an existing connection.
Signed-off-by: Greg Kurz <groug@kaod.org>
Fixes: #6095
We're already importing the virtcontainers package so might as well
use the constants for the hypervisor types we're checking against instead
of typing the names out in the switch cases.
Signed-off-by: Danny Canter <danny@dcantah.dev>
When the vmm process exits abnormally, a goroutine sets s.monitor
to null in the 'watchSandbox' function without getting service.mu,
This will cause another goroutine to block when sending a message
to s.monitor, and it holds service.mu, which leads to a deadlock.
For example, the wait function in the file
.../pkg/containerd-shim-v2/wait.go will send a message to s.monitor
after obtaining service.mu, but s.monitor may be null at this time
Fixes: #6059
Signed-off-by: ls <335814617@qq.com>
Mount handling is often unique in Linux. Let's ensure that the common
parts remain in mount.go, while Linux speific parts are within a linux
file.
Fixes: #6049
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The .git-commit can be a multiple line file, potentially confusing
the Darwin linker for example.
Fixes: #6046
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Cgroups do not exist on Darwin, so use an empty implementation for
resourcecontrol for the time being. In the process, ensure that the
utilized cgroup handling (ie, isSystemdCgroup) is kept in general file,
since we use this to help assess/constrain the container spec we pass to
the guest.
Fixes: #6051
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This PR fixes a misspelling in the error message when it tries to run
a system without Confidential computing support.
Fixes#6042
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
With `disable_netns=true`, we should never scan the sandbox netns which
is the host netns in such case.
Fixes: #6021
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
In TestHandleHugepages it will do a mount operation with different pagesizes,
but some systems only support 2M pagesize, test for a 1g pagesize will fail.
This commit try to fix by only mount pagesizes under `/sys/kernel/mm/hugepages`, which are
supported to mount by the OS.
Fixes: #6029
Signed-off-by: Bin Liu <bin@hyper.sh>
Fixes: #6004
A Virtualization.framework based Hypervisor implementation.
This is just stubs for now to eventually get this building.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #6002
As a first pass for testing, let's add a skeleton for filesystem
sharing support on Darwin..
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5993
Several tests utilize linux'isms like Mounts, bindmounts, vsock etc.
Let's ensure that these are still tested on Linux, but that we also skip
these tests when on other operating systems (Darwin). This commit just
moves tests; there shouldn't be any functional test changes. While the
tests still won't be runnable on Darwin/other hosts yet, this is a necessary
step forward.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
This is needed in order to have Moby / Docker working properly with
Cloud Hypervisor, as Moby / Docker relies on hotplugging a network
device to the VM as a preStartHook.
Fixes: #5997
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
THe only bit needed for having the vmAddNetPutRequest() capable of
dealing with hotplugs, instead of only coldplugs, is making sure it
doesn't error out in case a `200` response is returned.
The 200 response means:
"""
The new device was successfully added to the VM instance.
"""
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Fixes: #5995
Placeholder skeleton at this point - implementation will be added after
basic build refactoring lands.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5990
Some signals may not be defined on non Linux host OSes, like
SIGSTKFLT for example. It's also not defined on certain architectures,
but irrelevant for this.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5983
sched-core only makes sense on Linux hosts. Let's add stub/error for
other platforms.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Fixes: #5985
With nydus not being its own pkg, it is challenging to implement cleanly
in a virtcontainers package that isn't necesarily Linux-only. The
existing code utilizes network namespace code in order to ensure nydus
is launched in the host netns. This is very Linux specific - so let's
make sure we only carry this out in a linux specific file.
In the Darwin case, to allow for compilation at least, let's add a stub
for doNetNS. Ideally the nydus and vc code can be refactored /
decoupled.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Danny Canter <danny@dcantah.dev>
Moby relies on the prestart hooks to configure network endpoints. We
should rescan the netns after running them so that the newly added
endpoints can be found and plugged to the guest.
Fixes: #5941
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Substitution in the yq install script doesn't like zsh, and additionally
the version of yq we're using doesn't have a darwin/arm64 build so grab
the amd64 version and let rosetta work its magic.
Additionally swap to abspath from readlink -m for the printing of what binaries
to install, as the -m flag doesn't exist on the BSD variant, and this
should be the same behavior.
Fixes: #5970
Signed-off-by: Danny Canter <danny@dcantah.dev>
Was about to change `urandomdev` to a constant when I realized it's
intentionally mutable so it can be mocked in tests. There's other
comments to the same effect so clarify here as well.
Fixes: #5965
Signed-off-by: Danny Canter <danny@dcantah.dev>
Use pidfd_open and poll on newer versions of Linux to wait
for the process to exit. For older versions use existing wait logic
Fixes: #5617
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Now we are supporting two runtime/shim, the go version,
and the rust version, for debug purposes, we can
add an identification in the version info
to tell us which runtime/shim is used.
Fixes: #5806
Signed-off-by: Bin Liu <bin@hyper.sh>
Pass SELinux policy for containers to the agent if `disable_guest_selinux`
is set to `false` in the runtime configuration. The `container_t` type
is applied to the container process inside the guest by default.
Users can also set a custom SELinux policy to the container process using
`guest_selinux_label` in the runtime configuration. This will be an
alternative configuration of Kubernetes' security context for SELinux
because users cannot specify the policy in Kata through Kubernetes's security
context. To apply SELinux policy to the container, the guest rootfs must
be CentOS that is created and built with `SELINUX=yes`.
Fixes: #4812
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
We have starting to use golang 1.19, some features are
not supported later, so run `go fix` to fix them.
Fixes: #5750
Signed-off-by: Bin Liu <bin@hyper.sh>
Use MkdirAll instead of Mkdir so it doesn't generate an
error when the folder is created by another process
Fixes#5713
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
When the user tried to add new devices to the VM, there is no error info for the invalid
device. This PR adds a log record to the `appendDevices` for the invalid device of the
qemu config.
Fixes: #5719
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Let's follow the binary bump used in the CI and also bump the vendored
version of containerd to v1.6.8.
Fixes: #5722
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The default vhost-user-fs queue-size of qemu is 128 now. Set it to 1024
by default which is same as clh. Also make this value configurable.
Fixes: #5694
Signed-off-by: liyuxuan.darfux <liyuxuan.darfux@bytedance.com>
This patch re-generates the client code for Cloud Hypervisor v28.0.
Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by openapi-generator.
Fixes: #5683
Signed-off-by: Bo Chen <chen.bo@intel.com>
It seems that bumping the version of golang and golangci-lint new format
changes are required.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The package has been deprecated as part of 1.16 and the same
functionality is now provided by either the io or the os package.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So that we get the latest language fixes.
There is little use to maitain compiler backward compatibility.
Let's just set the default golang version to the latest 1.19.2.
Fixes: #5494
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Through proactively checking if Cloud Hypervisor process is dead,
this patch provides a faster path for isClhRunning
Fixes: #5623
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Use atomic operations instead of acquiring a mutex in isClhRunning.
This stops isClhRunning from generating a deadlock by trying to
reacquire an already-acquired lock when called via StopVM->terminate.
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Avoid executing StopVM concurrently when virtiofs dies as a result of clh
being stopped in StopVM.
Fixes: #5622
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
1. Implemented a rust module for operating cgroups through systemd with the help of zbus (src/agent/rustjail/src/cgroups/systemd).
2. Add support for optional cgroup configuration through fs and systemd at agent (src/agent/rustjail/src/container.rs).
3. Described the usage and supported properties of the agent systemd cgroup (docs/design/agent-systemd-cgroup.md).
Fixes: #4336
Signed-off-by: Yuan-Zhuo <yuanzhuo0118@outlook.com>
An API change, done a long time ago, has been exposed on Cloud
Hypervisor and we should update it on the Kata Containers side to ensure
it doesn't affect Cloud Hypervisor CI and because the change is needed
for an upcoming work to get QAT working with Cloud Hypervisor.
Fixes: #5492
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently ACRN hypervisor support in Kata2.x releases is broken.
This commit re-enables ACRN hypervisor support and also refactors
the code so as to remove dependency on Sandbox.
Fixes#3027
Signed-off-by: Vijay Dhanraj <vijay.dhanraj@intel.com>
The containerd stats method and metrics API are broken with Kata 2.5.x, the stats fail to load and the metrics API responds with status code 500
This seems to be down to the conversion from the stats reported by the agent RPC `StatsContainer` where the field `Pagesize` is not
completed by the `setHugetlbStats` method. In the case where multiple sized tables stats are reported, this causes containerd to register two metrics
with the same label set, rather than each being partitioned by the `page` label.
Fixes: #5316
Signed-off-by: Champ-Goblem <cameron@northflank.com>
The new way to boot from TDX firmware (e.g. td-shim) is using the
combination of '--platform tdx=on' with '--firmware tdshim'.
Fixes: #5309
Signed-off-by: Bo Chen <chen.bo@intel.com>
`kernel_irqchip` option doesn't seem to bring any benefits and, on the
contrary, its usage cause issues when using the microvm machine type.
With this in mind, let's remove it.
Fixes: #1984, #4386
Signed-off-by: norbjd <norbjd@users.noreply.github.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The qmp command of hotplug cpu failed error was hidden. It didn't friendly for
the user tracing the hotplug cpu error. The PR help us to improve the hotplug
cpu error log. Add real qemu command error log for `failed to hot add vCPUs`.
Through the error message, we can get the reason of the failed qmp command
for hotplug cpu operation.
Fixes: #5234
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
This is based on a patch from @niteeshkd that adds a config
parameter to choose between AMD SEV and SEV-SNP VMs as the
confidential guest type in case both types are supported. SEV is
the default.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
This commit adds AMD SEV-SNP as a confidential guest option to the
runtime. Information on required components such as OVMF, QEMU and
a kernel supporting SEV-SNP are defined in the versions file and
corresponding configs are added.
Note: The CPU model 'host' provided by the current SNP-QEMU does
not support all SNP capabilities yet, which is why this option is
changed to EPYC-v4.
Note: The guest's physical address space reduction specified with
ReducedPhysBits is 1. Details are can be found in Section 15.34.6
here https://www.amd.com/system/files/TechDocs/24593.pdfFixes#4437
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
Adds initrd configuration option to the configuration.toml that is
generated for the setup using QEMU.
Signed-off-by: Joana Pecholt <joana.pecholt@aisec.fraunhofer.de>
The user name will be used to delete the user instead of relying on
uid lookup because uid can be reused.
Fixes: #5155
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Augment the mock hypervisor so that we can validate that ACPI memory hotplug
is carried out as expected.
We'll augment the number of memory slots in the hypervisor config each
time the memory of the hypervisor is changed. In this way we can ensure
that large memory hotplugs are broken up into appropriately sized
pieces in the unit test.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
If we're using ACPI hotplug for memory, there's a limitation on the
amount of memory which can be hotplugged at a single time.
During hotplug, we'll allocate memory for the memmap for each page,
resulting in a 64 byte per 4KiB page allocation. As an example, hotplugging 12GiB
of memory requires ~192 MiB of *free* memory, which is about the limit
we should expect for an idle 256 MiB guest (conservative heuristic of 75%
of provided memory).
From experimentation, at pod creation time we can reliably add 48 times
what is provided to the guest. (a factor of 48 results in using 75% of
provided memory for hotplug). Using prior example of a guest with 256Mi
RAM, 256 Mi * 48 = 12 Gi; 12GiB is upper end of what we should expect
can be hotplugged successfully into the guest.
Note: It isn't expected that we'll need to hotplug large amounts of RAM
after workloads have already started -- container additions are expected
to occur first in pod lifecycle. Based on this, we expect that provided
memory should be freely available for hotplug.
If virtio-mem is being utilized, there isn't such a limitation - we can
hotplug the max allowed memory at a single time.
Fixes: #4847
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
It'll be useful to get the total memory provided to the guest
(hotplugged + coldplugged). We'll use this information when calcualting
how much memory we can add at a time when utilizing ACPI hotplug.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
With the current TDX kernel used with Kata Containers, `tdx_guest` is
not needed, as TDX_GUEST is now a kernel configuration.
With this in mind, let's just drop the kernel parameter.
Fixes: #4981
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As right now the TDX guest kernel doesn't support "serial" console,
let's switch to using HVC in this case.
Fixes: #4980
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The runtime will crash when trying to resize memory when memory hotplug
is not allowed.
This happens because we cannot simply set the hotplug amount to zero,
leading is to not set memory hotplug at all, and later then trying to
access the value of a nil pointer.
Fixes: #4979
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
While doing tests using `ctr`, I've noticed that I've been hitting those
timeouts more frequently than expected.
Till we find the root cause of the issue (which is *not* in the Kata
Containers), let's increase the timeouts when dealing with a
Confidential Guest.
Fixes: #4978
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When booting the TDX kernel with `tdx_disable_filter`, as it's been done
for QEMU, VirtioFS can work without any issues.
Whether this will be part of the upstream kernel or not is a different
story, but it easily could make it there as Cloud Hypervisor relies on
the VIRTIO_F_IOMMU_PLATFORM feature, which forces the guest to use the
DMA API, making these devices compatible with TDX.
See Sebastien Boeuf's explanation of this in the
3c973fa7ce208e7113f69424b7574b83f584885d commit:
"""
By using DMA API, the guest triggers the TDX codepath to share some of
the guest memory, in particular the virtqueues and associated buffers so
that the VMM and vhost-user backends/processes can access this memory.
"""
Fixes: #4977
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's swith to depending on podman which also simplies indirect
dependency on kubernetes components. And it helps to avoid cri-o
security issues like CVE-2022-1708 as well.
Fixes: #4972
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
We are not spinning up any L2 guests in vm factory, so the L1 guest
migration is expected to work even with VMX.
See https://www.linux-kvm.org/page/Nested_GuestsFixes: #4050
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The root span should exist the duration of the trace. Defer ending span
until the end of the trace instead of end of function. Add the span to
the service struct to do so.
Fixes#4902
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
If the API server is not ready, the mount call will fail, so before
mounting share fs, we should wait the nydusd is started and
the API server is ready.
Fixes: #4710
Signed-off-by: liubin <liubin0329@gmail.com>
Signed-off-by: Bin Liu <bin@hyper.sh>
Instead of passing a bunch of arguments to qmp functions for
adding block devices, use govmm BlockDevice structure to reduce these.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Get rid of redundant return values from function.
args and blockdevArgs used to return different values to maintain
compatilibity between qemu versions. These are exactly the same now.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This configuration will allow users to choose between different
I/O backends for qemu, with the default being io_uring.
This will allow users to fallback to a different I/O mechanism while
running on kernels olders than 5.1.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
io_uring was introduced as a new kernel IO interface in kernel 5.1.
It is designed for higher performance than the older Linux AIO API.
This feature was added in qemu 5.0.
Fixes#4645
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
To keep runtime-rs up to date, we will merge main into runtime-rs every
week.
Fixes:#4776
Signed-off-by: Zhongtao Hu <zhongtaohu.tim@linux.alibaba.com>
In qemu.StopVM(), if debug is enabled, the shim will dump logs
from qemu.log, but users don't know which logs are from qemu.log
and shim itself. Adding some additional messages will
help users to distinguish these logs.
Fixes: #4745
Signed-off-by: Bin Liu <bin@hyper.sh>
Enable Kata runtime to handle `disable_selinux` flag properly in order
to be able to change the status by the runtime configuration whether the
runtime applies the SELinux label to VMM process.
Fixes: #4599
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Some clients like nerdctl may pass mount type of none for volumes/bind mounts,
this will lead to container start fails.
Referring to runc, it overwrites the mount type to bind and ignores the input value.
Fixes: #4548
Signed-off-by: liubin <liubin0329@gmail.com>
The tests ensure that interactions between drop-ins and the base
configuration.toml and among drop-ins themselves work as intended,
basically that files are evaluated in the correct order (base file
first, then drop-ins in alphabetical order) and the last one to set
a specific key wins.
Signed-off-by: Pavel Mores <pmores@redhat.com>
updateFromDropIn() uses the infrastructure built by previous commits to
ensure no contents of 'tomlConfig' are lost during decoding. To do
this, we preserve the current contents of our tomlConfig in a clone and
decode a drop-in into the original. At this point, the original
instance is updated but its Agent and/or Hypervisor fields are
potentially damaged.
To merge, we update the clone's Agent/Hypervisor from the original
instance. Now the clone has the desired Agent/Hypervisor and the
original instance has the rest, so to finish, we just need to move the
clone's Agent/Hypervisor to the original.
Signed-off-by: Pavel Mores <pmores@redhat.com>
These functions take a TOML key - an array of individual components,
e.g. ["agent" "kata" "enable_tracing"], as returned by BurntSushi - and
two 'tomlConfig' instances. They copy the value of the struct field
identified by the key from the source instance to the target one if
necessary.
This is only done if the TOML key points to structures stored in
maps by 'tomlConfig', i.e. 'hypervisor' and 'agent'. Nothing needs to
be done in other cases.
Signed-off-by: Pavel Mores <pmores@redhat.com>
For 'tomlConfig' substructures stored in Golang maps - 'hypervisor' and
'agent' - BurntSushi doesn't preserve their previous contents as it does
for substructures stored directly (e.g. 'runtime'). We use reflection
to work around this.
This commit adds three primitive operations to work with struct fields
identified by their `toml:"..."` tags - one to get a field value, one to
set a field value and one to assign a source struct field value to the
corresponding field of a target.
Signed-off-by: Pavel Mores <pmores@redhat.com>
Return code is an int32 type, so if an error occurred, the default value
may be zero, this value will be created as a normal exit code.
Set return code to 255 will let the caller(for example Kubernetes) know
that there are some problems with the pod/container.
Fixes: #4419
Signed-off-by: liubin <liubin0329@gmail.com>
Prior device config move didn't update the comments. Let's address this,
and make sure comments match the new path...
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Ideally this config validation would be in a seperate package
(katautils?), but that would introduce circular dependency since we'd
call it from vc, and it depends on vc types (which, shouldn't be vc, but
probably a hypervisor package instead).
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
While working on the previous commits, some of the functions become
non-used. Let's simply remove them.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Expose the newly added `default_maxmemory` to the project's Makefile and
to the configuration files.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's adapt Cloud Hypervisor's and QEMU's code to properly behave to the
newly added `default_maxmemory` config.
While implementing this, a change of behaviour (or a bug fix, depending
on how you see it) has been introduced as if a pod requests more memory
than the amount avaiable in the host, instead of failing to start the
pod, we simply hotplug the maximum amount of memory available, mimicing
better the runc behaviour.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add a `default_maxmemory` configuration, which allows the admins
to set the maximum amount of memory to be used by a VM, considering the
initial amount + whatever ends up being hotplugged via the pod limits.
By default this value is 0 (zero), and it means that the whole physical
RAM is the limit.
Fixes: #4516
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now kata shim only supports stdout/stderr of fifo from
containerd/CRI-O, but shim v2 supports logging plugins,
and nerdctl default will use the binary schema for logs.
This commit will add the others type of log plugins:
- file
- binary
In case of binary, kata shim will receive a stdout/stderr like:
binary:///nerdctl?_NERDCTL_INTERNAL_LOGGING=/var/lib/nerdctl/1935db59
That means the nerdctl process will handle the logs(stdout/stderr)
Fixes: #4420
Signed-off-by: Bin Liu <bin@hyper.sh>
Depending on the user of it, the hypervisor from hypervisor interface
could have differing view on what is valid or not. To help decouple,
let's instead check the hypervisor config validity as part of the
sandbox creation, rather than as part of the CreateVM call within the
hypervisor interface implementation.
Fixes: #4251
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Policy for whats valid/invalid within the config varies by VMM, host,
and by silicon architecture. Let's keep katautils simple for just
translating a toml to the hypervisor config structure, and leave
validation to virtcontainers.
Without this change, we're doing duplicate validation.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Before, we maintained almost identical structures between our persist
API and what we keep for our devices, with the persist API being a
slight subset of device structures.
Let's deduplicate this, now that persist is importing device package.
Json unmarshal of prior persist structure will work fine, since it was
an exact subset of fields.
Fixes: #4468
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Rather than have device package depend on persist, let's define the
(almost duplicate) structures within device itself, and have the Kata
Container's persist pkg import these.
This'll help avoid unecessary dependencies within our core packages.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Similar to network, we can use multiple queues for virtio-block
devices. This can help improve storage performance.
This commit changes the number of queues for block devices to
the number of cpus for cloud-hypervisor and qemu.
Today the default number of cpus a VM starts with is 1.
Hence the queues used will be 1. This change will help
improve performance when the default cold-plugged cpus is greater
than one by changing this in the config file. This may also help
when we use the sandboxing feature with k8s that passes down
the sum of the resources required down to Kata.
Fixes#4502
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Enable "-sandbox on" in qemu can introduce another protect layer
on the host, to make the secure container more secure.
The default option is disable because this feature may introduce some
performance cost, even though user can enable
/proc/sys/net/core/bpf_jit_enable to reduce the impact.
Fixes: #2266
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Remove space from root span name to follow camel casing of other tracing
span names in the runtime and to make parsing easier in testing.
Fixes#4483
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
By comparing the content of the old url and the new url,
ensure that their content is consistent and does not contain ambiguities
Fixes: #4454
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
Let's improve the log so we make it clear that we're only *actually*
adding the net device to the Cloud Hypervisor configuration when calling
our own version of VmAddNetPut().
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We want to have the file descriptors of the opened tuntap device to pass
them down to the VMMs, so the VMMs don't have to explicitly open a new
tuntap device themselves, as the `container_kvm_t` label does not allow
such a thing.
With this change we ensure that what's currently done when using QEMU as
the hypervisor, can be easily replicated with other VMMs, even if they
don't support multiqueue.
As a side effect of this, we need to close the received file descriptors
in the code of the VMMs which are not going to use them.
Fixes: #3533
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adding FFI_NO_PI to the netlink flags causes no harm to the supported
and tested hypervisors as when opening the device by its name Cloud
Hypervisor[0], Firecracker[1], and QEMU[2] do set the flag already.
However, when receiving the file descriptor of an opened tutap device
Cloud Hypervisor is not able to set the flag, leaving the guest without
connectivity.
To avoid such an issue, let's simply add the FFI_NO_PI flag to the
netlink flags and ensure, from our side, that the VMMs don't have to set
it on their side when dealing with an already opened tuntap device.
Note that there's a PR opened[3] just for testing that this change
doesn't cause any breakage.
[0]: e52175c2ab/net_util/src/tap.rs (L129)
[1]: b6d6f71213/src/devices/src/virtio/net/tap.rs (L126)
[2]: 3757b0d08b/net/tap-linux.c (L54)
[3]: https://github.com/kata-containers/kata-containers/pull/4292
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is basically a no-op right now, as:
* netPair.TapInterface.VMFds is nil
* the tap name is still passed to Cloud Hypervisor, which is the Cloud
Hypervisor's first choice when opening a tap device.
In the very near future we'll stop passing the tap name to Cloud
Hypervisor, and start passing the file descriptors of the opened tap
instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Knowing that VmAddNetPut works as expected, let's switch to manually
building the request and writing it to the appropriate socket.
By doing this it gives us more flexibility to, later on, pass the file
descriptor of the tuntap device to Cloud Hypervisor, as openAPI doesn't
support such operation (it has no notion of SCM Rights).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of creating the VM with the network device already plugged in,
let's actually add the network device *after* the VM is created, but
*before* the Vm is actually booted.
Although it looks like it doesn't make any functional difference between
what's done in the past and what this commit introduces, this will be
used to workaround a limitation on OpenAPI when it comes to passing down
the network device's file descriptor to Cloud Hypervisor, so Cloud
Hypervisor can use it instead of opening the device by its name on the
VMM side.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
VmAddNetPut is the API provided by the Cloud Hypervisor client (auto
generated) code to hotplug a new network device to the VM.
Let's expose it now as it'll be used as part this series, mostly to
guide the reviewer through the process of what we have to do, as later
on, spoiler alert, it'll end up being removed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So far this has been done for x86_64. Now that the support for building
and testing has been added for all arches, let's do the second part of
the switch.
We're still not done yet for powerpc, as some a virtifosd crash on the
rust version has been found by the maintainer.
Fixes: #4258, #4260
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Changed bitsize for parsing functions to 64-bit in order to avoid
parsing errors.
Fixes#4435
Signed-off-by: Alexandru Matei <alexandru.matei@uipath.com>
Add more detail to the `kata-monitor` doc to allow an admin to make a
more informed decision about where and how to run the daemon.
Fixes: #4416.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
GetOOMEvent is a blocking call that will fail if
the container exit, in this case, it's not an error or warning.
Changing the log level for logs in case of GetOOMEvent call fails
will reduce log noise in a large cluster that has pods
creating/deleting frequently.
Fixes: #4376
Signed-off-by: Bin Liu <bin@hyper.sh>
Since #902 the `io.katacontainers.config.hypervisor` pod annotations
have only been permitted if explicitly allowed in the global
configuration. The default global configuration allows no such
annotations. That's important because several of those annotations
would cause Kata to execute arbitrary binaries, and so were wildly
unsafe.
However, this is inconvenient for the
`io.katacontainers.config.hypervisor.enable_iommu` annotation
specifically, which controls whether the sandbox VM includes a vIOMMU.
A guest side vIOMMU is necessary to implement VFIO passthrough devices
with `vfio_mode = vfio`, so enabling that mode of operation currently
requires a global configuration change, and can't just be enabled
per-pod.
Unlike some of the other hypervisor annotations, the `enable_iommu`
annotation is quite safe. By default the vIOMMU is not present, so
allowing a user to override it for a pod only improves their
facilities for isolation. Even if the global default were changed to
enable the vIOMMU, that doesn't compel the guest kernel to use it, so
allowing a user to disable the vIOMMU doesn't materially affect
isolation either.
Therefore, allow the io.katacontainers.config.hypervisor.enable_iommu
annotation to work in the default configurations.
fixes#4330
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Set thestop container force flag to true so that the container state is always set to
“StateStopped” after the container wait goroutine is finished. This is necessary for
the following delete container step to succeed.
Fixes: #4359
Signed-off-by: Feng Wang <feng.wang@databricks.com>
In linux 5.14 and hopefully some backports, core scheduling allows processes to
be co scheduled within the same domain on SMT enabled systems.
Containerd impl sets the core sched domain when launching a shim. This
allows a clean way for each shim(container/pod) to be in its own domain and any
additional containers, (v2 pods) be be launched with the same domain as well as
any exec'd process added to the container.
kernel docs: https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/core-scheduling.html
For Kata specifically, we will look for SCHED_CORE environment variable
to be set to indicate we shuold create a new schedule core domain.
This is equivalent to the containerd shim's PR: e48bbe8394Fixes: #4309
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Michael Crosby <michael@thepasture.io>
While end users can connect directly to the shim, let's provide a way to
easily get/set iptables from kata-runtime itself.
Fixes: #4080
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Without this, potential errors are silently dropped. Let's ensure we
return the error code as well as potenial data from the response.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Before, we had a mix of slash, etc. Unfortunately, when cleaning URL
paths, serve mux seems to mangle the request method, resulting in each
request being a GET (instead of PUT or POST).
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add two endpoints: ip6tables, iptables.
Each url handler supports GET and PUT operations. PUT expects
the requests' data to be []bytes, and to contain iptable information in
format to be consumed by iptables-restore.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Introduce get/set iptable handling. We add a sandbox API for getting and
setting the IPTables within the guest. This routes it from sandbox
interface, through kata-agent, ultimately making requests to the guest
agent.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This release has been tracked through the v24.0 project.
virtio-iommu specification describes how a device can be attached by default
to a bypass domain. This feature is particularly helpful for booting a VM with
guest software which doesn't support virtio-iommu but still need to access
the device. Now that Cloud Hypervisor supports this feature, it can boot a VM
with Rust Hypervisor Firmware or OVMF even if the virtio-block device exposing
the disk image is placed behind a virtual IOMMU.
Multiple checks have been added to the code to prevent devices with identical
identifiers from being created, and therefore avoid unexpected behaviors at boot
or whenever a device was hot plugged into the VM.
Sparse mmap support has been added to both VFIO and vfio-user devices. This
allows the device regions that are not fully mappable to be partially mapped.
And the more a device region can be mapped into the guest address space, the
fewer VM exits will be generated when this device is accessed. This directly
impacts the performance related to this device.
A new serial_number option has been added to --platform, allowing a user to
set a specific serial number for the platform. This number is exposed to the
guest through the SMBIOS.
* Fix loading RAW firmware (#4072)
* Reject compressed QCOW images (#4055)
* Reject virtio-mem resize if device is not activated (#4003)
* Fix potential mmap leaks from VFIO/vfio-user MMIO regions (#4069)
* Fix algorithm finding HOB memory resources (#3983)
* Refactor interrupt handling (#4083)
* Load kernel asynchronously (#4022)
* Only create ACPI memory manager DSDT when resizable (#4013)
Deprecated features will be removed in a subsequent release and users should
plan to use alternatives
* The mergeable option from the virtio-pmem support has been deprecated
(#3968)
* The dax option from the virtio-fs support has been deprecated (#3889)
Fixes: #4317
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Today the shim does a translation when doing
direct-volume stats where it takes the source and
returns the mount path within the guest.
The source for a direct-assigned volume is actually
the device path on the host and not the publish
volume path.
This change will perform a lookup of the mount info
during direct-volume stats to ensure that the
device path is provided to the shim for querying
the volume stats.
Fixes: #4297
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
The go default http mux AFAIK doesn’t support pattern
routing so right now client is padding the url
for direct-volume stats with a subpath of the volume
path and this will always result in 404 not found returned
by the shim.
This change will update the shim to take the volume
path as a GET query parameter instead of a subpath.
If the parameter is missing or empty, then return
400 BadRequest to the client.
Fixes: #4297
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
The action function expects a function that returns error
but the current direct-volume stats Action returns
(string, error) which is invalid.
This change fixes the format and print out the stats from
the command instead.
Fixes: #4293
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
The documentation of the bufio package explicitly says
"Err returns the first non-EOF error that was encountered by the
Scanner."
When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.
Fixes#4079
Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added
Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
As now we build and ship the rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match with where
we're installing this binary.
Fixes: #4249
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
go-test.sh by default adds the -v option to 'go test' meaning that output
will be printed from all the passing tests as well as any failing ones.
This results in a lot of output in which it's often difficult to locate the
failing tests you're interested in.
So, remove -v from the default flags.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
One of the responsibilities of the go-test.sh script is setting up the
default flags for 'go test'. This is constructed across several different
places in the script using several unneeded intermediate variables though.
Consolidate all the flag construction into one place.
fixes#4190
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE
variables, but not in a way that makes a lot of sense.
If either one is set it uses the test_coverage path, instead of the
test_local path. That collects coverage information, as the name
suggests, but it also means it runs the tests twice as root and
non-root, which is very non-obvious.
It's not clear what use case the test_local path is for at all.
Developer local builds will typically have $KATA_DEV_MODE set and CI
builds will have $CI set. There's essentially no downside to running
coverage all the time - it has little impact on the test runtime.
In addition, if *both* $CI and $KATA_DEV_MODE are set, the script
refuses to run things as root, considering it "unsafe". While having
both set might be unwise in a general sense, there's not really any
way running sudo can be any more unsafe than it is with either one
set.
So, simplify everything by just always running the test_coverage path.
This leaves the test_local path unused, so we can remove it entirely.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh accepts subcommands, however invoking it in the usual way via
the Makefile doesn't use them. In fact the only remaining subcommand is
"help" and we already have another way of getting the usage information
(-h or --help). We don't need a second way, so just drop subcommand
handling.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
go-test.sh defaults to testing all the packages listed by go list, except
for a number filtered out. It turns out that none of those filters are
necessary any more:
* We've long required a Go newer than 1.9 which means the vendor filter
isn't needed
* The agent filter doesn't do anything now that we've moved to the Kata
2.x unified repo
* The tests filters don't hit anything on the list of modules in
src/runtime (which is the only user of the script)
But since we don't need to filter anything out any more, we don't even need
to iterate through a list ourselves. We can simply pass "./..." directly
to go test and it will iterate through all the sub-packages itself.
Interestingly this more than doubles the speed of "make test" for me - I
suspect because go test's internal paralellism works better over a larger
pool of tests.
This also lets us remove handling of non-existent coverage files from
test_go_package(), since with default options we will no longer test packages without tests
by default. If the user explicitly requests testing of a package with no
tests, then failing makes sense.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The go-test.sh script has an explicit chmod command, run as root, to
set the mode of the temporary coverage files to 0644. AFAICT the
point of this is specifically the 004 bit allowing world read access,
so that we can then merge the temporary coverage file into the main
coverage file.
That's a convoluted way of doing things. Instead we can just run the tail
command which reads the temporary file as the same user that generated it.
In addition, go-test.sh became root to remove that temporary coverage
file. This is not necessary, since deleting a regular file just requires
write access to the directory, not the file itself.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The html-coverage option to this script doesn't really alter behaviour
it just does the same thing as normal coverage, then converts the
report to HTML. That conversion is a single command, plus a chmod to
make the final output mode 0644. That overrides any umask the user
has set, which doesn't seem like a policy decision this script should
be making.
Nothing in the kata-containers or tests repository uses this, so it doesn't
really make sense to keep this logic inside this script.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In addition to coverage.txt, the go-test.sh script creates
coverage.txt.tmp files while running. These are temporary and
certainly shouldn't be committed, so add them to the gitignore file.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The go unit tests for the runtime are invoked by the helper script
ci/go-test.sh. Which calls the run_go_test() function in ci/lib.sh. Which
calls into .ci/go-test.sh from the tests repository.
But.. the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.
So, just move the actual script into src/runtime. A change to remove it
from the tests repo will follow.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.
This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed. Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.
This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.
The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts. With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.
This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that. Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.
Fixes: #4176
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4139
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4017
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).
Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.
The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.
The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the DiskRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.
The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user. Instead, it uses a contant value of 1000
miliseconds (1 second).
As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.
The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.
The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.
The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the NetRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs. This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens). If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.
So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting(). In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.
fixes#4140
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs. This change allows it to be initialized at any path
given as a parameter. This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).
For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing. A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.
This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go. This is slightly cleaner to begin with, and will
allow some further enhancements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant. We
duplicate this information in fc.go which also needs it.
Export it from fs.go instead, so it can be used in fc.go.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory. This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.
Instead use t.TempDir() which is for exactly this purpose. As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are several tests in mount_test.go which perform a sample bind
mount. These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint. Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.
In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer. Change them all to use the t.Cleanup mechanism.
For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test. That
stops the test being messed up by a previous run, but messily. Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log. Currently we don't clean up those logs
when the tests are done. Use a test cleanup function to do this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp(). This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.
Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount. This is not cleaned up after the test. Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.
Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is to fix a test failure for the
kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job
Fixes: #4088
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.
The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.
Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.
Fixes: #4054
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log
Fixes: #4063
Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.
Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.
And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
This change adds two fields to the Storage pb
FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.
FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.
These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.
Fixes: #4061
Signed-off-by: bin <bin@hyper.sh>
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.
Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:
virtio_fs_extra_args = ["-o", "log_level=debug"]
Fixes: #3303
Signed-off-by: bin <bin@hyper.sh>
The vhost-user-blk can be hotplugged on the PCI bridge successfully on
X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe
device can work well on ARM.
Open the "pcie_root_port" in configuration.toml is needed.
Fixes: #4019
Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
This configuration option is valid for all the hypervisor that are going
to be used with the confidential containers effort, thus exposing the
configuration option for Cloud Hypervisor as well.
Fixes: #4022
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.
Fixes: #3924
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:
[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
-listen-address string
The address to listen on for HTTP requests. (default ":8090")
-log-level string
Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
-runtime-endpoint string
Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")
the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.
Fixes: #3998
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.
When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".
Fixes: #3815
Signed-off-by: bin <bin@hyper.sh>
Modify the 2Mib in the comment to 4Mib.
VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true.
Fixes: #3750
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.
Also
- remove redundant code for image/initrd checking -- no need to check in
`newQemuHypervisorConfig` (calling) when it is also checked in
`getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible
Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go. It doesn't really have anything
to do with the rest of the virtcontainers package.
So, move it next to the test code that uses it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We've now removed the need to install the mock hook binary for unit tests.
However, it turns out that managing that was the *only* thing that the
install and uninstall targets in the virtcontainers Makefile handled.
So, remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Running unit tests should generally have minimal dependencies on
things outside the build tree. It *definitely* shouldn't modify
system wide things outside the build tree. Currently the runtime
"make test" target does so, though.
Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary. They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.
Go tests automatically run within the package's directory though, so
there's no need to use a system wide path. We can use a relative path to
the binary build within the tree just as easily.
fixes#3941
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused.
It's used to generate TEST_BIN_DIR, and it's created in the install target.
However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR
with mkdir -p, so it will necessarily create VC_BIN_DIR along the way.
So we can drop the unnecessary mkdir and expand the definition of
VC_BIN_DIR in the definition of TEST_BIN_DIR.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers
Makefile (unlike those from the runtime Makefile in the parent directory)
are entirely unused. Remove them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The check-go-test target passes the path to the mock hook test binary to
go-test.sh when it invokes it. But go-test.sh just calls run_go_test from
ci/lib.sh, which invokes a script from the tests repo *without* any
parameters.
That is, this parameter is ignored anyway, so remove it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.
Fixes: #3874
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.
Fixes: #3321
Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).
Fixes#2053
Signed-off-by: Evan Foster <efoster@adobe.com>
Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.
With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.
Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
- use `grpc.WithTransportCredentials(insecure.NewCredentials())`
instead
Fixes: #3820
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
To query fs stats and resize fs, the requests need to be passed to
kata agent through containerd-shim-v2. So we're adding to rest APIs
on the shim management endpoint.
Also refactor shim management client to its own go file.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
In the direct assigned volume scenario, Kata Containers persists
the information required for managing the volume inside the guest
on host filesystem.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add commands to add, remove, resize and get stats of a direct-assigned volume.
These commands are expected to be consumed by CSI.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.
Let's ensure, via defer, that we stop virtiofsd in case of errors.
Fixes: #3819
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.
1. virtio-fs, as of now, cannot be part of the trust boundary, so the
Confidential Guest will not be using it.
2. virtio-block hotplug should be enabled in order to use virtio-block
for the rootfs (used with the devmapper plugin).
When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```
After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.
@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".
Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.
Fixes: #3810
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.
It will also become handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's introduce and use a new `createVirtiofsDaemon` method. Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
virtio-fs, instead of virtio-9p, is the default shared file system type
in case virtio-blk is not used.
Fixes: #3813
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Relying on virtio-block is the *only* way to use Firecracker with Kata
Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by
Firecracker.
As configuration doesn't make sense to be exposed, we hardcode the
`false` value in the Firecracker configuration structure.
Fixes: #3813
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Change `kata-monitor` to listen to port `8090` on the local interface
only by default.
> **Note:**
>
> This is a breaking change as previously it listened on all interfaces.
Fixes: #3795.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Removed redundant and duplicated build options to build
`kata-monitor` the same way as the other components:
- `CGO_ENABLED=0` is not necessary.
- `-buildmode=exe` is not necessary since `BUILDFLAGS` already sets the
build mode.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
- Mostly blank lines after `+build` -- see
https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
`gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go
Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
This utility function is also used to check the spec that will run in
the guest - no need for this to be linux specific.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Their types may differ on various host OSes, but
unix.Major|Minor always takes a uint64
Depends-on: github.com/kata-containers/tests#4516
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...
Fixes:# 3777
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.
Fixes: #3787
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's mention the supported TEEs to be used with confidential guests.
Right now, Cloud Hyperisor supports only Intel TDX, used together with
TD Shim.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.
Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.
Fixes: #3783
Signed-off-by: bin <bin@hyper.sh>
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml
Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
Switching to the generic FilesystemSharer brings 2 majors improvements:
1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
container files and root filesystems with the Kata Linux guest.
Fixes#3622
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.
In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.
This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.
Fixes: #3632
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.
Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.
[0]: https://github.com/confidential-containers/td-shim
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".
The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.
One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.
Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block
As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:
* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
- This is needed for the TDX support on the cloud hypervisor driver,
which is part of this very same series.
* openapi: Update the PciBdf types
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
- This is needed due to a change in a DeviceNode field, which would
cause a marshalling / demarshalling error when running with a
version of cloud-hypervisor that includes the TDX fixes mentioned
above.
* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
- This is needed due to changes in the scripts used to build Cloud
Hypervisor, which are used as part of Kata Containers CIs and
github actions.
Due to this change, we're also adapting the build scripts as part
of this very same commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).
Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
We introduced collection of sandboxes metadata from the CRI that will be
attached to the sandbox metrics: this will allow to immediately match
sandboxes metrics with CRI workloads.
Rename the symbols from *Kube* to *CRI* as the metadata will be there
every time pods are created through CRI, also if kubernetes is not
installed (e.g., 'crictl runp').
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.
Fixes: #3601
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We call it a ResourceController, and we make it not so Linux specific.
Now the Linux implementations is the cgroups one.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We now rely on fs events only to update the sandbox cache. This is not
true anyway for sandboxes already present at kata-monitor startup: we
just retrieve the list and add them in the cache only when we get their
CRI metadata. If CRI metadata is not available we will never add them to
the sandbox cache.
Fix this by immediately adding the sandboxes we find at startup time to
the sandbox cache.
Fixes: #3705
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Since kata-monitor now:
- relies on fs events *only* to update the sandbox cache
- adds CRI meta-data as labels (CRI pod name, namespace and uid)
it deserves a version bump.
Note that while we could let kata-monitor match the runtime version,
kata-monitor will usually work flawlessy with different kata shim
releases: so it makes sense to keep kata-monitor version separated.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.
For CRI-O runtime, we can log the events to log file.
Fixes: #3733
Signed-off-by: bin <bin@hyper.sh>
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.
Fixes#3654
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
As the container runtime, we're never inspecting, adding or configuring
host networking endpoints.
Make sure we're always do that by wrapping addSingleEndpoint calls into
the pod network namespace.
Fixes#3661
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Keep all the OS agnostic bits in the hypervisor.go and
hypervisor_ARCH.go files.
Fixes#3614
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Even though it's still actually defined as the QEMU upper bound,
it's now abstracted away through govmm.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Various improvements to the top-level README file:
- Moved the following sections from the runtime's README to the
top-level README:
- License
- Platform support / Hardware requirements
- Added the following sections to the top-level README:
- Configuration
- Hypervisors
- Improved formatting of the Documentation section in the top-level
README.
- Removed some unused named links from the top-level README.
Also improvements to the runtime README:
- Removed confusing mention of the old 1.x runtime name.
- Clarify the binary name for the 2.x runtime and the utility program.
> **Note:**
>
> We cannot currently link to the AMD website as that site's
> configuration causes the CI static checks to fail. See
> https://github.com/kata-containers/tests/issues/4401Fixes: #3557.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Support hugepages and port from:
96dbb2e8f0Fixes: #3342
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
The OCI spec is very specific about it:
"The prestart hooks MUST be executed in the runtime namespace."
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
That allows us to amend those annotations with information that could be
used when running those hooks.
For example nerdctl will use those annotations to resolve the networking
namespace path in where to run the CNI plugin, i.e. the created pod
networking namespace.
Fixes#3629
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
German Maglione, one of the current virtio-fs developers, has brought to
our attention that using "announce-submounts" could help us to prevent
inode number collisions.
This feature was introduced a year ago or so by Hanna Reitz as part of
the 08dce386e77eb9ab044cb118e5391dc9ae11c5a8, and as we already mandate
QEMU >= 6.1.0, let's take advantage of that.
Fixes: #3507
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When building with `ARCH=x86_64`, the previous `Makefile` will use it
without checking and cause:
Makefile:319: *** "ERROR: No hypervisors known for architecture x86_64 (looked for: acrn firecracker qemu cloud-hypervisor)". Stop.
This commit fix the above issue by checking `ARCH` no matter where it
is assigned.
Fixes: #3444
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
The distro constraint parses os release files, which may not contain
distro version(VERSION_ID field), for example rolling release distributions
like Debian testing, archlinux.
These distro constraints are not used anyway, so removing them instead
of fixing the complex version detection.
Fixes: #1864
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
Let's update cloud-hypervisor to a version that exposes the TDx support
via the OpenAPI's auto-generated code.
Fixes: #3663
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Change the variables `mountTypeFieldIdx := 8`, `mntDestIdx := 4` and `netNsMountType := "nsfs"` to const.
And unify the variable naming style, modify `mntDestIdx` to `mountDestIdx`.
Fixes: #3646
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.
Fixes#2724
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
Relative links within this repository allow for easier navigation to
the corresponding file / directory in the current commit / for the
selected version.
Link text was slightly changed / fixed in
- docs/Unit-Test-Advice.md
- docs/how-to/how-to-run-docker-with-kata.md
Fixes#3045
Signed-off-by: Daniel Höxtermann <daniel@hxtm.dev>
We don't need to call NewNetwork() twice, and we can have the VM factory
case return immediatly. That makes the code more readable.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Move the netlink dependent code into network_linux.go.
Other OSes will have to provide the same functions.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
And only have AddEndpoints/RemoveEndpoints for all cases (single
endpoint vs all of them, hotplug or not).
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We are converting the Network structure into an interface, so that
different host OSes can have different networking implementations for
Kata.
One step into that direction is to rename all the Network structure
fields and methods to something that is less Linux networking namespace
specific. This will make the Network interface naming consistent.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We are replacing the NetworkingNamespace structure with the Network
one, so we should have the hypervisor interface switching to it as well.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Remove unused parameters.
Reduce the number of parameters by deriving some of them (e.g. a
networking config) from their outer structure (e.g. a Sandbox
reference).
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Endpoints creations, attachement and hotplug are bound to the networking
namespace described through the Network structure.
Making them Network methods is natural and simplifies the code.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
For simplicity sake, there should only be one networking structure per
sandbox, as opposed to two (Network and NetworkingNamespace) currently.
This commit start expanding the Network structure in order to eventually
make it the single representation of a virtcontainers sandbox
networking.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Runtime now accepts both `1` and `Y` as valid values for
kvm_amd module parameter kvm_amd.sev.
Fixes#3273
Signed-off-by: Pierre Kohler <pierre.kohler@cysec.systems>
On s390x, skip adding a virtio-rng device. The on-chip CPACF provides
entropy instead. For Confidential Containers, when using Secure
Execution, entropy attacks on virtio-rng are mitigated.
Fixes: #3598
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.
fixes#3583
Signed-off-by: Julio Montes <julio.montes@intel.com>
Move the netns specific bits into a Linux specific file.
Fixes: #3607
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
There are software and hardware architectures which do not support
dynamically adjusting the CPU and memory resources associated with a
sandbox. For these, today, they rely on "default CPU" and "default
memory" configuration options for the runtime, either set by annotation
or by the configuration toml on disk.
In the case of a single container (launched by ctr, or something like
"docker run"), we could allow for sizing the VM correctly, since all of
the information is already available to us at creation time.
In the sandbox / pod container case, it is possible for the upper layer
container runtime (ie, containerd or crio) could send a specific
annotation indicating the total workload resource requirements
associated with the sandbox creation request.
In the case of sizing information not being provided, we will follow
same behavior as today: start the VM with (just) the default CPU/memory.
If this information is provided, we'll track this as Workload specific
resources, and track default sizing information as Base resources. We
will update the hypervisor configuration to utilize Base+Workload
resources, thus starting the VM with the appropriate amount of CPU and
memory.
In this scenario (we start the VM with the "right" amount of
CPU/Memory), we do not want to update the VM resources when containers
are added, or adjusted in size.
This functionality is introduced behind a configuration flag,
`static_sandbox_resource_mgmt`. This is defaulted to false for all
configurations except Firecracker, which is set to true.
This'll greatly improve UX for folks who are utilizing
Kata with a VMM or hardware architecture that doesn't support hotplug.
Note, users will still be unable to do in place vertical pod autoscaling
or other dynamic container/pod sizing with this enabled.
Fixes: #3264
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add the POD metadata we get from the container manager to the metrics by
adding more labels.
Fixes: #3551
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Kata-monitor detects started and terminated kata pods by monitoring the
vc/sbs fs (this makes sense since we will have to access that path to
access the sockets there to get the metrics from the shim).
While kata-monitor updates its sandbox cache based on the sbs fs events,
it will schedule also a sync with the container manager via the CRI in
order to sync the list of sandboxes there.
The container manager will be the ultimate source of truth, so we will
stick with the response from the container manager, removing the
sandboxes not reported from the container manager.
May happen anyway that when we check the container manager, the new kata
pod is not reported yet, and we will remove it from the kata-monitor pod
cache. If we don't get any new kata pod added or removed, we will not
check with the container manager again, missing reporting metrics about
that kata pod.
Let's stick with the sbs fs as the source of truth: we will update the
cache just following what happens on the sbs fs.
At this point we may have also decided to drop the container manager
connection... better instead to keep it in order to get the kube pod
metadata from it, i.e., the kube UID, Name and Namespace associated with
the sandbox.
Every time we get a new sandbox from the sbs fs we will try to retrieve the
pod metadata associated with it.
Right now we just attach the container manager sandbox id as a label to
the exposed metrics, making hard to link the metrics to the running pod
in the kubernetes cluster.
With kubernetes pod metadata we will be able to add them as labels to map
explicitly the metrics to the kubernetes workloads.
Fixes: #3550
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
We currently WARN about unexpected fs events, which includes CHMOD
operations (which should be actually expected...).
Just ignore all the fs events we don't care about without any warn.
We dump all the events with debug log in any case.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Improve debug log formatting of the sandbox cache update process.
Move raw and tracing logs from the DEBUG to the TRACE log level.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The OCI container spec specifies a limit of -1 signifies
unlimited memory. Update the sandbox memory calculator
to reflect this part of the spec.
Fixes: #3512
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
no explicit PCI test, just switch path depending on architecture
(CCW for s390x, PCI for others). Also fixes an unknown variable error.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
When no options are passed to -ldflags, it passes
incorrect values(in this case, $BUILDFLAGS) to it.
Fix passing empty values by passing $KATA_LDFLAGS
in quotes.
Fixes: #3521
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
For now a bunch of tests are simply not working.
Let's skip them all, and re-enable them once
kata-containers/kata-containers/issues/3500 gets fixed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Both projects follow the same license, Apache-2.0, but the header saying
that comes from govmm is different from the one expected for the tests
present on the kata-containers repo.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
govet checks have been ignored on govmm repo, but those are enabled on
kata-containers one. So, in order to avoid failing our CIs let's just
keep ignoring the checks for the govmm structs and have an issue opened
for fixing it whenever someone has cycles to do it.
The important bit here is, we're not making anything worse that it
already is. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
govmm, from now on, should follow the same guidelines from contributing,
copying, and etc as kata-containers does.
The go.mod is not needed anymore as the project lives inside the
runtime.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop using govmm from kata-containers/govmm and let's start using
it from our own repo.
Fixes: #3495
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.
Signed-off-by: Julio Montes <julio.montes@intel.com>
for linking. Required for basic KVM checks on some kernels (e.g. the
one RHEL is currently shipping), cf.
6621441db5/target/s390x/kvm/meson.build (L15-L16).
Fixes: #3469
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
bring SGX support and other fixes
shortlog:
8939b0f qemu: add support for SGX
b17f073 qemu: update readonly flag for block devices
f971801 qemu: only set wait parameter for server mode socket based
char device
82cc01d qemu: Fix 32 bit int overflow in test file
1d1a231 qemu: Add support for legacy serial device
9a2bbed qemu: Remove -realtime in favor of -overcommit
fe83c20 qemu: Add support for --no-shutdown Knob
1ed5271 qmp: wait for POWERDOWN event in ExecuteSystemPowerdown()
fixes#3080
Signed-off-by: Julio Montes <julio.montes@intel.com>
When Sandbox AddInterface() is called, it may fail after endpoint.HotAttach,
we'd better rollback and call save() in the end.
Fixes: #3419
Signed-off-by: yangfeiyu <yangfeiyu20102011@163.com>
After the protocols are moved to upper libs (PR3355),
the runtime protocol generation is broken. This fixes it.
Fixes: #3414
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR removes the reference of how to use disable_new_netns
configuration with docker as for kata 2.0 we are not supporting docker
and this information was used for kata 1.x
Fixes#3400
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Update our containerd vendoring. In particular, we're interested in
grabbing the updated annotation definitions for defining sandbox sizing.
- go get github.com/containerd/containerd@v1.6.0-beta.4
- edit go.mod to remove containerd v1.5.8 replacement directive
- go mod vendor
- go mod tidy
Fixes: #3276
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Restore Debian as a rootfs.
1. revert of #3154, but some change
2. update debian version to 10.11
3. update `libstdc++-6-dev` to `libstdc++-8-dev`
4. changes discarded in QAT are not restored
Fixes: #3372
Signed-off-by: zhaojizhuang <571130360@qq.com>
Add span name to logging error to help with debugging when the context
is not set before the span is created.
Fixes#3289
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Move the architecture document into a new `docs/design/architecture/` directory
in preparation for splitting it into more manageable pieces.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
not clear why this was commented out before -- ensure that we set
approprate annotation on the sandbox container's annotations to indicate
this is a sandbox.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today we assume that if the CRI/upper layer doesn't provide a container
type annotation, it should be treated as a sandbox. Up to this point, a
sandbox with a pause container in CRI context and a single container
(ala ctr run) are treated the same.
For VM sizing and container constraining, it'll be useful to know if
this is a sandbox or if this is a single container.
In updating this, we cleanup the type handling tests and we update the
containerd annotations vendoring.
Fixes: #2926
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Some new attributes are added to hypervisor config:
- VMStorePath
- RunStorePath
- SharedPath
These attributes should be handled in two places:
- reset when check the new hypervisor's config is suitable
to the base config.
- copy from new hypervisor's config when create new VM
Fixes: #3193
Signed-off-by: bin <bin@hyper.sh>
The latest release of openapi-generator v5.3.0 contains the fix for
`dropping err` bug [1]. This patch also re-generated the client code of
Cloud Hypervisor to have the bug fixed.
[1] https://github.com/OpenAPITools/openapi-generator/pull/10275Fixes: #3201
Signed-off-by: Bo Chen <chen.bo@intel.com>
vhost-net is disabled in the rootless kata runtime feature, which has been abandoned since kata 2.0.
I reused the rootless flag for nonroot hypervisor and would like to enable vhost-net.
Fixes#3182
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR updates the comments in the configuration.toml to point to
the current kata containers repository instead of the kata 1.x.
Fixes#3163
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently we do not have debian as part of the kata CI as we
do not have a mantainer, this PR removes debian as a supported
rootfs in order to have only the distros that we are supporting
and mantainining.
Fixes#3153
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
os.Exit() will terminate program immediately, the defer functions
won't be executed, so we add defer functions again before os.Exit().
Refer to https://pkg.go.dev/os#ExitFixes: #3059
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
1. use ci/go-test.sh to replace the direct call to go test
2. fix data race test
3. install hook whether it is root or not
Fixes#1494
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
This'll end up moving to hypervisors pkg, but let's stop using virtLog,
instead introduce hvLogger.
Fixes: #2884
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today the hypervisor code in vc relies on persist pkg for two things:
1. To get the VM/run store path on the host filesystem,
2. For type definition of the Load/Save functions of the hypervisor
interface.
For (1), we can simply remove the store interface from the hypervisor
config and replace it with just the path, since this is all we really
need. When we create a NewHypervisor structure, outside of the
hypervisor, we can populate this path.
For (2), rather than have the persist pkg define the structure, let's
let the hypervisor code (soon to be pkg) define the structure. persist
API already needs to call into hypervisor anyway; let's allow us to
define the structure.
We'll probably want to look at following similar pattern for other parts
of vc that we want to make independent of the persist API.
In doing this, we started an initial hypervisors pkg, to hold these
types (avoid a circular dependency between virtcontainers and persist
pkg). Next step will be to remove all other dependencies and move the
hypervisor specific code into this pkg, and out of virtcontaienrs.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today, acrn relies on sandbox level information, as well as a store
provided by common parts of the hypervisor. As we cleanup the
abstractions within our runtime, we need to ensure that there aren't
cross dependencies between the sandbox, the persistence logic and the
hypervisor.
Ensure that ACRN still compiles, but remove the setSandbox usage as
well as persist driver setup.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
If a file/directory doesn't exist, os.Stat() returns an
error. Assert the returned value with os.IsNotExist() to
prevent it from failing.
Fixes: #2920
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Many of these functions are just used on one place throughout the rest
of the code base. If we create hypervisor package, newtork package, etc, we may want to
parse this out.
Fixes: #3049
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.
While at it, ensure we run gofmt on the changed files.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The fact that we need to "bridge" the endpoint is a bit irrelevant. To
be consistent with the rest of the endpoints, let's just call this
"macvlan"
Fixes: #3050
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The thread monitor will check if the agent and the VMM are alive every
second in a blocking thread. The Cloud hypervisor API server is
single-threaded, if the monitor does a `check()`, while a slow request
is still in progress, the monitor check() method will timeout. The
monitor thread will stop all the shim-v2 execution.
This commit modifies the monitor thread to make it check the status of
the hypervisor after 5 seconds. Additionally, the `check()` method from
cloud-hypervisor will use the method `clh.isClhRunning(timeout)` with a
10 seconds timeout. The monitor function does no timeout, so even if
`hypervisor.check()` takes more 10 seconds, the isClhRunning method
handles errors doing a VmmPing and retry in case of errors until the
timeout is reached.
Reduce the time to the next check to 5 should not affect any functionality,
but it will reduce the overhead polling the hypervisor.
Fixes: #2777
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.
Fixes: #3031
Signed-off-by: bin <bin@hyper.sh>
As github.com/containerd/cgroups doesn't support scope
units which are essential in some cases lets create
the cgroups manually and load it trough the cgroups
api
This is currently done only when there's single sandbox
cgroup (sandbox_cgroup_only=true), otherwise we set it
as static cgroup path as it used to be (until a proper
soultion for overhead cgroup under systemd will be
suggested)
Fixes: #2868
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
This reverts commit 76f16fd1a7 to bring
back cri-containerd crioptions parsing so that kata works with older
containerd versions like v1.3.9 and v1.4.6.
Fixes: #2999
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Protection types like tdxProtection or seProtection were marked nolint,
remove this. As a side effect, ARM needs dummy tests for these.
Fixes: #2801
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Remove the `StartTracing` and `StopTracing` agent APIs that toggle
dynamic tracing. This is not supported in Kata 2.x, as documented in the
[tracing proposals document](https://github.com/kata-containers/kata-containers/pull/2062).
Fixes: #2985.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Upgrade from v0.20.0 to v1.0.0, first stable release.
Git log
4bfa0034 Release prep v1.0.0-RC3 (2218)
c7ae470a Refactor SDK span creation and implementation (2213)
db317fce Verify and update OTLP trace exporter documentation (2053)
04de34a2 Update the website getting started docs (2203)
a7b9d021 Rename metric instruments to match feature-freeze API specification (2202)
1f527a52 Update trace API config creation functions (2212)
361a2096 Fix RC2 header in changelog (2215)
e209ee75 chore(exporter/zipkin): improves logging on invalid collector. (2191)
c0c5ef65 Fix typos in resource.go. (2201)
abf6afe0 Update otel example guide (2210)
3b05ba02 Bump actions/setup-go from 2.1.3 to 2.1.4 (2206)
bcd7ff7b Bump codecov/codecov-action from 2.0.2 to 2.0.3 (2205)
c912b179 Print JSON objects to stdout without a wrapping array (2196)
add511c1 Make WithoutTimestamps work (2195)
85c27e01 Bump github.com/golangci/golangci-lint from 1.41.1 to 1.42.0 in /internal/tools (2199)
bf6500b3 Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /exporters/otlp/otlptrace (2184)
9392af96 Bump google.golang.org/grpc in /exporters/otlp/otlptrace/otlptracegrpc (2185)
c95694dc Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /example/otel-collector (2183)
0528fa66 Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /exporters/otlp/otlpmetric (2186)
3a26ed21 Deprecate the oteltest package (2188)
c885435f Website: support GH page links to canonical src (2189)
6da20a27 Add cross-module test coverage (2182)
dfc866bd Support capturing stack trace (2163)
41588fea Deprecate the attribute.Any function (2181)
4e8d667f Support a single Resource per MeterProvider in the SDK (2120)
a8bb0bf8 Make the tracetest.SpanRecorder concurrent safe (2178)
87d09df3 Deprecate Array attribute in favor of *Slice types (2162)
df384a9a Move InstrumentKind into the new metric/sdkapi package (2091)
1cb5cdca Unify the OTLP attribute transform (2170)
a882ee37 Clarify the attribute package documentation and order/grouping (2168)
5d25c4d2 Add support for int32 in attribute.Any (2169)
2b0e139e Refactor attributes benchmark tests (2167)
4c7470d9 Bump google.golang.org/grpc from 1.39.0 to 1.39.1 in /exporters/otlp/otlptrace (2176)
990c534a Bump google.golang.org/grpc in /example/otel-collector (2172)
b45c9d31 Bump google.golang.org/grpc from 1.39.0 to 1.39.1 in /exporters/otlp/otlpmetric (2174)
a3d4ff5c Deprecated the bridge/opencensus/utils package (2166)
b1d1d529 Move OC bridge integration tests to own mod (2165)
89a9489c Add OC bridge internal unit tests (2164)
56c743ba Allow global ErrorHandler to be set multiple times (2160)
d18c135f Add OpenCensus bridge internal package (2146)
fcf945a4 Just a little typo fix in code documentation. (2159)
59a82eba Update version.go (2157)
21d4686f Add ErrorHandlerFunc to simplify creating ErrorHandlers (2149)
23cb9396 Remove `internal/semconv-gen` (2155)
39acab32 Fix code sample in otel.GetTraceProvider (2147)
2b1bb29e Update OpenCensus bridge docs with limitations (2145)
fd7c327b Fix Jaeger exporter agent port default value and docs (2131)
b8561785 fix(2138): add guard to constructOTResources to return an empty resource (2139)
11f62640 Add a SpanRecorder to the sdk/trace/tracetest (2132)
fd9de7ec rename assertsocketbuffersize.go to *_test (2136)
a6b4d90c nit doc fix (2135)
79398418 pre-release v1.0.0-RC2 (2133)
2501e0fd Use semconv.SchemaURL in STDOUT exporter example (2134)
ef03dbc9 Bump codecov/codecov-action from 1 to 2.0.2 (2129)
bbe6ca40 Deprecate oteltest.Harness for removal (2123)
7a624ac2 Deprecated the oteltest.TraceStateFromKeyValues function (2122)
ece1879f Removed dropped link's attributes field from API package (2118)
03902d98 Rename sdk/trace/tracetest test.go -> exporter.go (2128)
cb607b0a Unify OTLP exporter retry logic (2095)
abe22437 API: create new linked span from current context (2115)
db81d4aa Update internal/global/trace testing (2111)
7f10ef72 Remove propagation testing types from oteltest (2116)
25d739b0 Remove resource.WithBuiltinDetectors() which has not been maintained (2097)
d57c5a56 Remove several metrics test helpers (2105)
49359495 Simplify trace_context tests (2108)
56d42011 Simplify trace context benchmark test (2109)
63dfe64a Correct status transform in OTLP exporter (2102)
9b1a5f70 Performance improvement: avoid creating multiple same read-only objects (2104)
ab78dbd0 Update release URL (2106)
647af3a0 Pre release experimental metrics v0.22.0 (2101)
0a562337 Fixed OS type value for DragonFly BSD (2092)
62c21ffb Bump golang.org/x/tools from 0.1.4 to 0.1.5 in /internal/tools (2096)
4a3da55a Ensure sample code in website_docs getting started page works (2094)
d3063a3d Update otel.Meter to global.Meter in Getting Started Document.(2087) (2093)
00a1ec5f Add documentation guidelines and improve Jaeger exporter readme (2082)
12f737c7 oteltest: ensure valid SpanContext created for span started WithNewRoot (2073)
484258eb OS description attribute detector (1840)
d8c9a955 Bump google.golang.org/grpc from 1.38.0 to 1.39.0 in /example/otel-collector (2054)
4ffdf034 Add @pellard as an Approver (2047)
1a74b399 Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 in /exporters/otlp/otlpmetric (2040)
57c2e8fb Bump golang.org/x/tools from 0.1.3 to 0.1.4 in /internal/tools (2036)
7cff31a9 Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 in /exporters/otlp/otlptrace (2035)
9e8f523d when using WithNewRoot, don't use the parent context for sampling (2032)
62af6c70 semconv-gen: fix capitalization at word boundaries, add stability/deprecation indicators (2033)
0bceed7e Fix docs on otel-collector example (2034)
6428cd69 Update doc.go (2030)
311a6396 fix documentation for trace.Status (2029)
16f83ce6 export ToZipkinSpanModels for use outside this library (2027)
d5d4c87f Add HTTP metrics exporter for OTLP (2022)
d6e8f60f Bump github.com/golangci/golangci-lint from 1.40.1 to 1.41.1 in /internal/tools (2023)
51dbe3cb Remove deprecated exporters (2020)
257ef7fc Update project status in README (2017)
ced177b7 Pre-release 1.0.0-RC1 (2013)
694c9a41 Interface stability documentation (2012)
39fe8092 Add span.TracerProvider() (2009)
d020e1a2 Add more tests for go.opentelemetry.io/otel/trace package. (2004)
6d4a38f1 replace WithSyncer with WithBatcher in opencensus example (2007)
c30cd1d0 Split stdout exporter into stdouttrace and stdoutmetric (2005)
80ca2b1e otlp: mark unix endpoints to work without transport security (2001)
65140985 Update codecov ignore (2006)
3be9813d Deprecate the exporters in the "trace" and "metric" sub-directories (1993)
377f7ce4 remove WithTrace* options from otlptrace exporters (1997)
b33edaa5 OTLP metrics gRPC exporter (1991)
64b640cc Remove old OTLP exporter (1990)
7728a521 Remove dependency on metrics packages (1988)
135ac4b6 Moved internal/tools duplicated findRepoRoot function to common package (1978)
cdf67ddf Update semantic conventions to v1.4.0, move to versioned package (1987)
4883cb11 Refactor exporter creation functions (1985)
87cc1e1f Test BatchSpanProcessor export timeout directly (1982)
7ffe2845 Added inputPath validation to semconv-gen (1986)
a113856a Add caveat about installing opencensus bridge (1983)
741cb9a3 Fix generator.go call typo in RELEASING.md (1977)
7a0cee7b Replaces golint by revive and fix newly reported linter issues (1946)
46d9687a Add Schema URL support to Resource (1938)
0827aa62 Use mock server as jaeger agent listener. (1930)
20886012 Bugfix jaeger exporter test panic (1973)
4bf6150f Add baggage implementation based on the W3C and OpenTelemetry specification (1967)
bbe2b8a3 Bump github.com/itchyny/gojq from 0.12.3 to 0.12.4 in /internal/tools (1971)
4949bf05 Bump github.com/cenkalti/backoff/v4 from 4.1.0 to 4.1.1 in /exporters/otlp/otlptrace (1972)
015b4c17 Bump github.com/cenkalti/backoff/v4 from 4.1.0 to 4.1.1 in /exporters/otlp (1970)
13eb12ac Bump github.com/prometheus/client_golang from 1.10.0 to 1.11.0 in /exporters/metric/prometheus (1974)
2371bb0a add otlp trace http exporter (1963)
a75ade4e sdk/resource: honor OTEL_SERVICE_NAME in fromEnv resource detector (1969)
aed45802 Bump go.opentelemetry.io/proto/otlp from 0.8.0 to 0.9.0 in /exporters/otlp/otlptrace (1959)
c4ebae6a Bump go.opentelemetry.io/proto/otlp (1960)
b1d2be3b Bump google.golang.org/grpc from 1.37.1 to 1.38.0 in /exporters/otlp/otlptrace (1958)
f6daea5e Generate semantic conventions according to specification latest tagged version (1933)
435a63b3 Bump github.com/google/go-cmp from 0.5.5 to 0.5.6 (1954)
6c46af66 Bump github.com/google/go-cmp from 0.5.5 to 0.5.6 in /exporters/trace/jaeger (1953)
4d294853 Bump actions/cache from 2.1.5 to 2.1.6 (1952)
dfe2b6f1 OTLP trace gRPC exporter (1922)
5a8f7ff7 Bump go.opentelemetry.io/proto/otlp from 0.8.0 to 0.9.0 in /exporters/otlp (1943)
bd935866 Add schema URL support to Tracer (1889)
c1f460e0 Update API configs. (1921)
270cc603 Small fixes on some Span method's documentation headers (1950)
8603b902 Fix typo in doc (1949)
acbb1882 Bump google.golang.org/grpc from 1.37.1 to 1.38.0 in /exporters/otlp (1942)
b1621501 Add codecov badge (1940)
ea1434c3 Fix some golint issues (1947)
0eeb8f87 Refactor Tracestate (1931)
d3b12808 Add Passthrough example (1912)
f06cace6 Add @MadVikingGod as a project Approver (1923)
ab5facb3 Bump github.com/golangci/golangci-lint in /internal/tools (1925)
d23cc61b Refactor configs (1882)
6324adaa Add tracer option argument to global Tracer function (1902)
035fc650 Do not include authentication information in the http.url attribute (1919)
d8ac212c Fix sporadic test failure in otlp exporter http driver (1906)
a3df00f4 Create .gitattributes (1920)
fb88e926 Bump google.golang.org/grpc from 1.37.0 to 1.37.1 in /exporters/otlp (1914)
1982dc46 Bump google.golang.org/grpc in /example/prom-collector (1915)
1759c630 Bump github.com/golangci/golangci-lint in /internal/tools (1916)
7342aa47 Bump google.golang.org/grpc in /example/otel-collector (1913)
21c16418 Add support for scheme in OTEL_EXPORTER_OTLP_ENDPOINT (1886)
5cb62636 Semantic Convention generation tooling (1891)
6219221f Move the unit package to the metric module (1903)
63e0ecfc Implement global default non-recording span (1901)
b6d5442f Remove the Tracer method from the Span API (1900)
ae85fab3 Document functional options (1899)
cabf0c07 Fix default Jaeger collector endpoint (1898)
1e3fa3a3 Bump go.opentelemetry.io/proto/otlp from 0.7.0 to 0.8.0 in /exporters/otlp (1872)
696af787 Bump github.com/benbjohnson/clock from 1.0.3 to 1.1.0 in /sdk/metric (1532)
97eea6c3 Fix some golint issues (1894)
79d9852e fix container port mismatch issue (1895)
d20e7228 CI builds validate against last two versions of Go, dropping 1.14 and adding 1.16 (1865)
cbcd4b1a Redefine ExportSpans of SpanExporter with ReadOnlySpan (1873)
c99d5e99 Split large jaeger span batch to admire the udp packet size limit (1853)
42a84509 Unembed SpanContext (1877)
b7d02db1 Add Status type to SDK (1874)
f90d0d93 Update README (1876)
a1349944 Update resource.go (1871)
f40cad5e Add markdown link check configuration and action (1869)
9bc28f6b Fix existing markdown lint issues (1866)
08f4c270 Add documentation for tracer.Start() (1864)
2bd4840c remove Set.Encoded(Encoder) enconding cache (1855)
7674eebf Removed different types of Detectors for Resources. (1810)
f92a6d83 Implement retry policy for the OTLP/gRPC exporter (1832)
ec75390f Fix BSP context done tests (1863)
8e55f10a Move the Event type from the API to the SDK (1846)
e399d355 drop failed to exporter batches and return error when forcing flush a span processor (1860)
f6a9279a Honor context deadline or cancellation in SimpleSpanProcessor.Shutdown (1856)
aeef8e00 Add markdown lint GitHub action (1849)
d4c8ffad Replace spaces to tabs in Go code snippets (1854)
cb097250 fixed typo (1857)
392a44fa Refine configuration design docs (1841)
62cd933d Handle Resource env error when non-nil (1851)
24a91628 Document the SSP is not for production use (1844)
ec26ac23 Update RELEASING.md (1843)
8eb0bb99 Fix golint issue caused by typo (1847)
ca130e54 Markdownlint (1842)
1144a83d Small typo fixes to existing CHANGELOG entries (1839)
e6086958 Update website_docs to v0.20.0 (1838)
0f4e454c Change NewSplitDriver paramater and initialization (1798)
92551d39 Prerelease v1.0.0 (2250)
61839133 zipkin: remove no-op WithSDKOptions (2248)
568e7556 Set Schema URL when exporting traces to OTLP (2242)
ec26b556 Fix RC tags in docs (2239)
767ce26c Bump github.com/itchyny/gojq from 0.12.4 to 0.12.5 in /internal/tools (2216)
fe7058da adding NewNoopMeterProvider to follow trace api (2237)
c338a5ef Bump github.com/golangci/golangci-lint from 1.42.0 to 1.42.1 in /internal/tools (2236)
ef126f5c Remove deprecated Array from attribute package (2235)
360d1302 Add tests for nil *Resource (2227)
9e7812d1 Remove the deprecated oteltest package (2234)
486afd34 Remove the deprecated bridge/opencensus/utils pkg (2233)
eaacfaa8 Fix slice-valued attributes when used as map keys (2223)
df2bdbba Fix the import comments of otelpconfig (2224)
7aae2a02 otlptrace: Document supported environment variables (2222)
Fixes#2591
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Update OpenTelemetry from v0.15.0 to v0.20.0.
Git log
02d8bdd5 Release v0.20.0 (1837)
aa66fe75 OS and Process resource detectors (1788)
7374d679 Fix Links documents (1835)
856f5b84 Add feature request issue template (1831)
0fdc3d78 Remove bundler from Jaeger exporter (1830)
738ef11e Fix flaky global ErrorHandler delegation test (1829)
e43d9c00 Update Default Value for Jaeger Exporter Endpoint (1824)
0032bd64 Fix default merging of resource attributes from environment variable (1785)
96c5e4ba Add SpanProcessor example for Span annotation on start (1733)
543c8144 Remove the WithSDKOptions from the Jaeger exporter (1825)
66389ad6 Update function docs in sdk.go (1826)
70bc9eb3 Adds support for timeout on the otlp/gRPC exporter (1821)
081cc61d Update Jaeger exporter convenience functions (1822)
1b9f16d3 Remove the WithDisabled option from Jaeger exporter (1806)
6867faa0 Bump actions/cache from v2.1.4 to v2.1.5 (1818)
a2bf04dc Build context pipeline in Jaeger upload process (1809)
2de86f23 Remove locking from Jaeger exporter shutdown/export (1807)
4f9fec29 Add ExportSpans benchmark to Jaeger exporter (1805)
d9566abe Fix OTLP testing flake: signal connection from mock collector (1816)
a2cecb6e add support for env var configuration to otlp/gRPC (1811)
d616df61 Fix flaky OTLP exporter reconnect test (1814)
b09df84a Changes stdout to expose the `*sdktrace.TracerProvider` (1800)
04890608 Remove options field from Jaeger exporter (1808)
6db20e00 Remove the abandoned Process struct in Jaeger exporter (1804)
086abf34 docs: use test example to document prometheus.InstallNewPipeline (1796)
d0cea04b Bump google.golang.org/api from 0.43.0 to 0.44.0 in /exporters/trace/jaeger (1792)
99c477fe Fixed typo for default service name in Jaeger Exporter (1797)
95fd8f50 Bump google.golang.org/grpc from 1.36.1 to 1.37.0 in /exporters/otlp (1791)
9b251644 Zipkin Exporter: Use default resouce's serviceName as default serivce name (1777) (1786)
4d141e47 Add k8s.node.name and k8s.node.uid to semconv (1789)
5c99a34c Fix golint issue caused by incorrect comment (1795)
c5d006c0 Update Jaeger environment variables (1752)
58432808 add NewExportPipeline and InstallNewPipeline for otlp (1373)
7d8e6bd7 Zipkin Exporter: Adjust span transformation to comply with the spec (1688)
2817c091 Merge sdk/export/trace into sdk/trace (1778)
c61e654c Refactor prometheus exporter tests to match file headers as well (1470)
23422c56 Remove process config for Jaeger exporter (1776)
0d49b592 Add test to check bsp ignores `OnEnd` and `ForceFlush` post Shutdown` (1772)
e9aaa04b Record links/events attribute drops independently (1771)
5bbfc22c Make ExportSpans for Jaeger Exporter honor deadline (1773)
0786fe32 Add Bug report issue templates (1775)
3c7facee Add `ExportTimeout` option to batch span processor (1755)
c6b92d5b Make TraceFlags spec-compliant (1770)
ee687ca5 Bump github.com/itchyny/gojq from 0.12.2 to 0.12.3 in /internal/tools (1774)
52a24774 add support for configuring tls certs via env var to otlp/HTTP (1769)
35cfbc7e Update precedence of event name in Jaeger exporter (1768)
33699d24 Adds semantic conventions for exceptions (1492)
928e3c38 Modify ForceFlush to abort after timeout/cancellation (1757)
3947cab4 Fix testCollectorEndpoint typo and add tag assertions in jaeger_test (1753)
ecc635dc add website docs (1747)
07a8d195 Fix Jaeger span status reporting and unify tag keys (1761)
4fa35c90 add partial support for env var config to otlp/HTTP (1758)
bf180d0f improve OTLP/gRPC connection errors (1737)
d575865b Fix span IsRecording when not sampling (1750)
20c93b01 Update SamplingParameters (1749)
97501a3f Update SpanSnapshot to use parent SpanContext (1748)
604b05cb Store current Span instead of local and remote SpanContext in context.Context (1731)
c61f4b6d Set @lizthegrey to emeritus status (1745)
b1342fec Bump github.com/golangci/golangci-lint in /internal/tools (1743)
54e1bd19 Bump google.golang.org/api from 0.41.0 to 0.43.0 in /exporters/trace/jaeger (1741)
4d25b6a2 Bump github.com/prometheus/client_golang from 1.9.0 to 1.10.0 in /exporters/metric/prometheus (1740)
0a47b66f Bump google.golang.org/grpc from 1.36.0 to 1.36.1 in /exporters/otlp (1739)
26f006b8 Reinstate @paivagustavo as an Approver (1734)
382c7ced Remove hasRemoteParent field from SDK span (1728)
862a5a68 Remove setting error status while recording error with Span from oteltest package (1729)
6defcfdf Remove links on NewRoot spans (1726)
a9b2f851 upgrade thrift to v0.14.1 in jaeger exporter (1712)
5a6a854d Bump google.golang.org/protobuf from 1.25.0 to 1.26.0 in /exporters/otlp (1724)
23486213 Migrate to using go.opentelemetry.io/proto/otlp (1713)
5d559b40 Remove makeSamplingDecision func (1711)
e24702da Update the TraceContext.Extract docs (1720)
9d4eb1f6 Update dates in CHANGELOG.md for 2021 releases (1723)
2b4fa968 Release v0.19.0 (1710)
4beb7041 sdk/trace: removing ApplyConfig and Config (1693)
1d42be16 Rename WithDefaultSampler TracerProvider option to WithSampler and update docs (1702)
860d5d86 Add flag to determine whether SpanContext is remote (1701)
0fe65e6b Comply with OpenTelemetry attributes specification (1703)
88884351 Bump google.golang.org/api from 0.40.0 to 0.41.0 in /exporters/trace/jaeger (1700)
345f264a breaking(zipkin): removes servicName from zipkin exporter. (1697)
62cbf0f2 Populate Jaeger's Span.Process from Resource (1673)
28eaaa9a Add a test to prove the Tracer is safe for concurrent calls (1665)
8b1be11a Rename resource pkg label vars and methods (1692)
a1539d44 OpenCensus metric exporter bridge (1444)
77aa218d Fix issue #1490, apply same logic as in the SDK (1687)
9d3416cc Fix synchronization issues in global trace delegate implementation (1686)
58f69f09 Span status from HTTP code: Do not set status message if it can be inferred (1681)
9c305bde Flush metric events prior to shutdown in OTLP example (1678)
66b1135a Fix CHANGELOG (1680)
90bd4ab5 Update employer information for maintainers (1683)
36841913 Remove WithRecord() option from trace.SpanOption when starting a span (1660)
65c7de20 Remove trace prefix from NoOp src files. (1679)
e88a091a Make SpanContext Immutable (1573)
d75e2680 Avoid overriding configuration of tracer provider (1633)
2b4d5ac3 Bump github.com/golangci/golangci-lint in /internal/tools (1671)
150b868d Bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (1667)
76aa924e Fix the examples target info messaging (1676)
a3aa9fda Bump github.com/itchyny/gojq from 0.12.1 to 0.12.2 in /internal/tools (1672)
a5edd79e Removed setting error status while recording err as span event (1663)
e9814758 chore(zipkin): improves zipkin example to not to depend on timeouts. (1566)
3dc91f2d Add ForceFlush method to TracerProvider (1608)
bd0bba43 exporter: swap pusher for exporter (1656)
56904859 Update the SimpleSpanProcessor (1612)
a7f7abac SpanStatus description set only when status code is set to Error (1662)
05252f40 Jaeger Exporter: Fix minor mapping discrepancies (1626)
238e7c61 Add non-empty string check for attribute keys (1659)
e9b9aca8 Add tests for propagation of Sampler Tracestate changes (1655)
875a2583 Add docs on when reviews should be cleared (1556)
7153ef2d Add HTTP/JSON to the otlp exporter (1586)
62e2a0f7 Unexport the simple and batch SpanProcessors (1638)
992837f1 Add TracerProvider tests to oteltest harness (1607)
bb4c297e Pre release v0.18.0 (1635)
712c3dcc Fix makefile ci target and coverage test packages (1634)
841d2a58 Rename local var new to not collide with builtin (1610)
13938ab5 Update SpanProcessor docs (1611)
e25503a0 Add compatibility tests to CI (1567)
1519d959 Use reasonable interval in sdktrace.WithBatchTimeout (1621)
7d4496e0 Pass metric labels when transforming to gaugeArray (1570)
6d4a5e0d Bump google.golang.org/grpc from 1.35.0 to 1.36.0 in /exporters/otlp (1619)
a93393a0 Bump google.golang.org/grpc in /example/prom-collector (1620)
e499ca86 Fix validation for tracestate with vendor and add tests (1581)
43886e52 Make timestamps sequential in lastvalue agg check (1579)
37688ef6 revent end-users from implementing some interfaces (1575)
85e696d2 Updating documentation with an working example for creating NewExporter (1513)
562eb28b Unify the Added sections of the unreleased changes (1580)
c4cf1aff Fix Windows build of Jaeger tests (1577)
4a163bea Fix stdout TestStdoutTimestamp failure with sleep (1572)
bd4701eb Stagger timestamps in exact aggregator tests (1569)
b94cd4b2 add code attributes to semconv package (1558)
78c06cef Update docs from gitter to slack for communication (1554)
1307c911 Remove vendor exclude from license-check (1552)
5d2636e5 Bump github.com/golangci/golangci-lint in /internal/tools (1565)
d7aff473 Vendor Thrift dependency (1551)
298c5a14 Update span limits to conform with OpenTelemetry specification (1535)
ecf65d79 Rename otel/label -> otel/attribute (1541)
1b5b6621 Remove resampling on span.SetName (1545)
8da52996 fix: grpc reconnection (1521)
3bce9c97 Add Keys() method to propagation.TextMapCarrier (1544)
0b1a1c72 Make oteltest.SpanRecorder into a concrete type (1542)
7d0e3e52 SDK span no modification after ended (1543)
7de3b58c Remove extra labels types (1314)
73194e44 Bump google.golang.org/api from 0.39.0 to 0.40.0 in /exporters/trace/jaeger (1536)
8fae0a64 Create resource.Default() with required attributes/default values (1507)
76f93422 Release v0.17.0 (1534)
9b242bc4 Organize API into Go modules based on stability and dependencies (1528)
e50a1c8c Bump actions/cache from v2 to v2.1.4 (1518)
a6aa7f00 Bump google.golang.org/api from 0.38.0 to 0.39.0 in /exporters/trace/jaeger (1517)
38efc875 Code Improvement - Error strings should not be capitalized (1488)
6b340501 Update default branch name (1505)
b39fd052 nit: Fix comment to be up-to-date (1510)
186c2953 Fix golint error of package comment form (1487)
9308d662 Bump google.golang.org/api from 0.37.0 to 0.38.0 in /exporters/trace/jaeger (1506)
1952d7b6 Reverse order of attribute precedence when merging two Resources (1501)
ad7b4715 Remove build flags for runtime/trace support (1498)
4bf4b690 Remove inaccurate and unnecessary import comment (1481)
7e19eb6a Bump google.golang.org/api from 0.36.0 to 0.37.0 in /exporters/trace/jaeger (1504)
c6a4406a Bump github.com/golangci/golangci-lint in /internal/tools (1503)
9524ac09 Update workflows to include main branch as trigger (1497)
c066f15e Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /internal/tools (1478)
894e0240 Bump github.com/golangci/golangci-lint in /internal/tools (1477)
71ffba39 Bump google.golang.org/grpc from 1.34.0 to 1.35.0 in /exporters/otlp (1471)
515809a8 Bump github.com/itchyny/gojq from 0.12.0 to 0.12.1 in /internal/tools (1472)
3e96ad1e gitignore: remove unused example path (1474)
c5622777 Histogram aggregator functional options (1434)
0df8cd62 Rename Makefile.proto to avoid interpretation as proto file (1468)
979ff51f Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 (1453)
1df8b3b8 Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /exporters/otlp (1456)
4c30a90a Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /sdk (1455)
5a9f8f6e Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/stdout (1454)
7786f34c Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/zipkin (1457)
4352a7a6 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/otlp (1460)
6990b3b3 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/metric/prometheus (1461)
7af40d22 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/jaeger (1463)
f16f1892 Bump google.golang.org/grpc in /example/otel-collector (1465)
fe363be3 Move Span Event to API (1452)
43922240 Bump google.golang.org/grpc in /example/prom-collector (1466)
0aadfb27 Prepare release v0.16.0 (1464)
207587b6 Metric histogram aggregator: Swap in SynchronizedMove to avoid allocations (1435)
c29c6fd1 Shutdown underlying span exporter while shutting down BatchSpanProcessor (1443)
dfece3d2 Combine the Push and Pull metric controllers (1378)
74deeddd Handle tracestate in TraceContext propagator (1447)
49f699d6 Remove Quantile aggregation, DDSketch aggregator; add Exact timestamps (1412)
9c949411 Rename internal/testing to internal/internaltest (1449)
8d809814 Move gRPC driver to a subpackage and add an HTTP driver (1420)
9332af1b Bump github.com/golangci/golangci-lint in /internal/tools (1445)
5ed96e92 Update exporters/otlp Readme.md (1441)
bc9cb5e3 Switch CircleCI badge to GitHub Actions (1440)
716ad082 Remove CircleCI config (1439)
0682db1e Adding Security Workflows to GitHub Actions (2/2): gosec workflow (1429)
11f732b8 Adding Security Workflows to GitHub Actions (1/2): codeql workflow (1428)
40f1c003 Add Tracestate into the SamplingResult struct (1432)
db06c8d1 Flush metric events before shutdown in collector example (1438)
f6f458e1 Fix golint issue caused by typo in trace.go (1436)
fe9d1f7e Use uint64 Count consistently in metric aggregation (1430)
3a337d0b Bump github.com/golangci/golangci-lint in /internal/tools (1433)
1e4c8321 cleanup: drop the removed examples in gitignore (1427)
5c9221cf Unify endpoint API that related to OTel exporter (1401)
045c3ffe Build scripts: Replace mapfile with read loop for old bash versions (1425)
2def8c3d Add Versioning Documentation (1388)
6bcd1085 Bump github.com/itchyny/gojq from 0.11.2 to 0.12.0 in /internal/tools (1424)
38e76efe Add a split protocol driver for otlp exporter (1418)
439cd313 Add TraceState to SpanContext in API (1340)
35215264 Split connection management away from exporter (1369)
add9d933 Bump github.com/prometheus/client_golang from 1.8.0 to 1.9.0 in /exporters/metric/prometheus (1414)
93d426a1 Add @dashpole as a project Approver (1410)
6fe20ef3 Fix small typo (1409)
b22d0d70 Mention the getting started guide (1406)
3fb80fb2 Fix duplicate checkout action in GitHub workflow (1407)
2051927b Correct CI workflow syntax (1403)
f11a86f7 Fix typo in comment (1402)
bdf87a78 Migrate CircleCI ci.yml workflow to GitHub Actions (1382)
4e59dd1f Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /example/otel-collector (1400)
83513f70 Bump google.golang.org/api from 0.32.0 to 0.36.0 in /exporters/trace/jaeger (1398)
a354fc41 Bump github.com/prometheus/client_golang from 1.7.1 to 1.8.0 in /exporters/metric/prometheus (1397)
3528e42c Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /exporters/otlp (1396)
af114baf Call otel.Handle with non-nil errors (1384)
c3c4273e Add RO/RW span interfaces (1360)
Fixes#2591
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
In later versions of OpenTelemetry label.Any() is deprecated. Create
addTag() to handle type assertions of values. Change AddTag() to
variadic function that accepts multiple keys and values.
Fixes#2547
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
There are some issues with Makefile for runtime:
- default target can't be used as a dependent of other targets.
- empty target `check`
And also add two targets for locally development/tests.
- lint: run golangci-lint
- pre-commit: run lint and test
Fixes: #2942
Signed-off-by: bin <bin@hyper.sh>
We only added span.End() in the main process of the shim2 Shutdown method.
The "Shutdown" span would keep alive, when the containers number is not 0.
This PR make sure the "Shutdown" trace span have a correct end.
Fixes: #2930
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Add -failfast option to let test exit on error, but -failfast option
can't cross package, so there is a for loop used to test on all packages
in src/runtime, and the parallel number is set to 1, this may lead test
to be slow.
Fixes: #1997
Signed-off-by: bin <bin@hyper.sh>
Virtcontainers API document functions weren't sync with the codes Sandbox and VCImpl.
And we have two functions named `CreateSandbox` functions, diff by one parameter,
very confused. So this pr sync the codes to api documents.
Fixes: #2928
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Cloud hypervisor is only supporting virtio-blk, this PR removes comments
that make a wrong reference of other features that are not supported
by clh.
Fixes#2924
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
1. make a private ro bind mount at privateDest to the mount source
2. make a bind mount at mountDest to the mount created in step 1
3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.
Fixes: #2205
Depends-on: github.com/kata-containers/tests#4106
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
This changed valid() in hypervisor to check the case where both
initrd and image path are set; in this case it returns an error.
Fixes#1868
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Show available guest protections in the
`kata-runtime env` output. Also bump the formatVersion.
Fixes: #1982
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
Add functions to return guestProtection as a string slice, which
can be then used in `kata-runtime env` output.
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
On a conventional (e.g. runc) container, passing in a VFIO group device,
/dev/vfio/NN, will result in the same VFIO group device being available
within the container.
With Kata, however, the VFIO device will be bound to the guest kernel's
driver (if it has one), possibly appearing as some other device (or a
network interface) within the guest.
This add a new `vfio_mode` option to alter this. If set to "vfio" it will
instruct the agent to remap VFIO devices to the VFIO driver within the
guest as well, meaning they will appear as VFIO devices within the
container.
Unlike a runc container, the VFIO devices will have different names to the
host, since the names correspond to the IOMMU groups of the guest and those
can't be remapped with namespaces.
For now we keep 'guest-kernel' as the value in the default configuration
files, to maintain current Kata behaviour. In future we should change this
to 'vfio' as the default. That will make Kata's default behaviour more
closely resemble OCI specified behaviour.
fixes#693
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently constrainGRPCSpec always removes VFIO devices from the OCI
container spec which will be used for the inner container. For
upcoming support for VFIO devices in DPDK usecases we'll need to not
do that.
As a preliminary to that, add an extra parameter to the function to
control whether or not it will remove the VFIO devices from the spec.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
"constraint" is a noun, "constrain" is the associated verb, which makes
more sense in this context.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers. However, the current method, although
it is not remotely OCI compliant has real uses. Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.
We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.
For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The physical current vcpu number should not be used directly as the
largest vcpu number is limited to defaultMaxQemuVCPUs.
Here, a new helper is introduced in pkg/katautils/config.go to get
current vcpu number.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The current kernel version parse lib can't process suffix '+', as the
modified kernel version will add '+' as suffix, thus panic will occur.
For example, if the current kernel version is "5.14.0-rc4+", test
TestHostNetworkingRequested will panic:
--- FAIL: TestHostNetworkingRequested (0.00s)
panic: &{DistroName:ubuntu DistroVersion:18.04
KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true
ActualEUID:0}: failed to check test constraints: error: Build meta data
is empty
Here, remove the suffix '+' in kernel version fix helper.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Export the top level hypervisor type
s/hypervisor/Hypervisor
Fixes: #2880
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Last of a series of commits to export the top level
hypervisor generic methods.
s/createSandbox/CreateVM
Fixes#2880
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.
Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
This PR includes these optimize changes:
- Remove the dependency on the container engine.
The old code uses runc to generate config.json and
Docker to export rootfs, that will be heavy and need
additional dependency.
Using a fixed config for busybox image can avoid
the heavy processing above.
- Moved duplicate code to pkg/katatestutils package
Fixes: #2752
Signed-off-by: bin <bin@hyper.sh>
cri-containerd project has been merged into containerd repo, and
we should not reference it any more in code and docs.
This commit will use containerd package instead of cri-containerd
package.
Fixes: #2791
Signed-off-by: bin <bin@hyper.sh>
When running the TestIoCopy test, on some occasions, the test
runs too quick, and closes the stdin pipe before the ioCopy()
routine start to read from it. This causes a SIGSEGV error.
To fix this issue, I am adding additional read/write tests before
closing the pipes. As the read operation waits for the writer to
be done, this actually synchronizes the threads and make sure
the final tests (with closed pipes) works as expected.
Fixes: #2042
Signed-off-by: Julien Ropé <jrope@redhat.com>
The tracing tags for api.go contain `"packages"` as a tag name,
whereas all other tags contain `"package"`.
Fixes: #2847
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.
The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.
This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.
Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.
Fix: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.
Fixes: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
The variable for 'name' in config-settings.go.in was previously
hardcoded as "kata". In e7c42fb it was changed to the runtime name,
which is "kata-runtime". Add a variable to specify a syslog identifier
for consistency for tests and documentation that use it.
Fixes#2806
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method
Fixes#2770
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This patch adds an option "disable_seccomp" to the config
hypervisor.clh, from which users can disable the `seccomp`
feature from Cloud Hypervisor when needed (for debugging purposes).
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch enables the `seccomp` feature from Cloud Hypervisor which
provides fine-grained allowed syscalls for each of its worker
threads. It brings important security benefits, while would increase
memory footprint.
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
Shim management server is running in a go routine, in test mode
this will cause the directory where the listen socket
file(/run/vc/sbs/777-77-77777777/shim-monitor.sock) in leak
after the tests finished.
Fixes: #2805
Signed-off-by: bin <bin@hyper.sh>
This commit does two chagnes:
- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.
Fixes: #2759
Signed-off-by: bin <bin@hyper.sh>
Even CCA, which is the confidential compute archtecture, has not been
ready, add a empty implementation to avoid static check error.
Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
Exclude from lint checking for it is ultimately only used in
architecture-specific code.
Fixes: #2273
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Bump containerd to v1.5.7 in order to bring in a fix for CVE-2021-41103,
"insufficiently restricted permissions ons plugins directories
(https://github.com/advisories/GHSA-c2h3-6mxw-7mvq)".
dependabot found a potential security vulnerability and raised a PR to
fix it. However, dependabot does not properly follows nor understands
the needed of our CIs (mainly related to formatting the PR and whatnot),
thus I'm re-raising it.
Fixes: #2796
Supersedes: #2787
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.
The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor
debug is enabled. This is required since `Debug` level:
- Is overkill for debugging hypervisor failures.
- Effectively hides the output from the guest kernel and userland: CLH
generates so much output that the output from the guest gets "lost in
the noise" (experiments show that for each full CLH debug message, at most
1 _byte_ of guest output is displayed).
Fixes: #2726.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>