- stable-2.4 | Second round of backports for the 2.4.1 release
- stable-2.4 | First round of backports for the 2.4.1 release
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v23.0
- stable-2.4 | runtime: Base64 encode the direct volume mountInfo path
- stable-2.4 | agent: Avoid agent panic when reading empty stats
8e076c87 release: Adapt kata-deploy for 2.4.1
b50b091c agent: watchers: ensure uid/gid is preserved on copy/mkdir
03bc89ab clh: Rely on Cloud Hypervisor for generating the device ID
6b2c641f tools: fix typo in clh directory name
81e10fe3 packaging: Fix clh build from source fall-back
8b21c5f7 agent: modify the type of swappiness to u64
3f5c6e71 runtime: Allock mockfs storage to be placed in any directory
0bd1abac runtime: Let MockFSInit create a mock fs driver at any path
3e74243f runtime: Move mockfs control global into mockfs.go
aed4fe6a runtime: Export StoragePathSuffix
e1c4f57c runtime: Don't abuse MockStorageRootPath() for factory tests
c49084f3 runtime: Make bind mount tests better clean up after themselves
4e350f7d runtime: Clean up mock hook logs in tests
415420f6 runtime: Make SetupOCIConfigFile clean up after itself
688b9abd runtime: Don't use fixed /tmp/mountPoint path
dc1288de kata-monitor: add a README file
78edf827 kata-monitor: add some links when generating pages for browsers
eff74fab agent: fsGroup support for direct-assigned volume
01cd5809 proto: fsGroup support for direct-assigned volume
97ad1d55 runtime: fsGroup support for direct-assigned volume
b62cced7 runtime: no need to write virtiofsd error to log
8242cfd2 kata-monitor: update the hrefs in the debug/pprof index page
a37d4e53 agent: best-effort removing mount point
d1197ee8 tools/packaging: Fix error path in 'kata-deploy-binaries.sh -s'
c9c77511 tools/packaging: Fix usage of kata-deploy-binaries.sh
1e622316 tools/packaging/kata-deploy: Copy install_yq.sh in a dedicated script
8fa64e01 packaging: Eliminate TTY_OPT and NO_TTY variables in kata-deploy
8f67f9e3 tools/packaging/kata-deploy/local-build: Add build to gitignore
3049b776 versions: Bump firecracker to v0.23.4
aedfef29 runtime/virtcontainers: Pass the hugepages resources to agent
c9e1f727 agent: Verify that we allocated as many hugepages as we need
ba858e8c agent: Don't attempt to create directories for hugepage configuration
bc32eff7 virtcontainers: clh: Re-generate the client code
984ef538 versions: Upgrade to Cloud Hypervisor v23.0
adf6493b runtime: Base64 encode the direct volume mountInfo path
6b417540 agent: Avoid agent panic when reading empty stats
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Today in agent watchers, when we copy files/symlinks
or create directories, the ownership of the source path
is not preserved which can lead to permission issues.
In copy, ensure that we do a chown of the source path
uid/gid to the destination file/symlink after copy to
ensure that ownership matches the source ownership.
fs::copy() takes care of setting the permissions.
For directory creation, ensure that we set the
permissions of the created directory to the source
directory permissions and also perform a chown of the
source path uid/gid to ensure directory ownership
and permissions matches to the source.
Fixes: #4188
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 70eda2fa6c)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.
This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed. Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.
This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.
The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts. With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.
This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that. Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.
Fixes: #4196
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 33a8b70558)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
If we fail to download the clh binary, we fall-back to build from source.
Unfortunately, `pull_clh_released_binary()` leaves a `cloud_hypervisor`
directory behind, which causes `build_clh_from_source()` not to clone
the git repo:
[ -d "${repo_dir}" ] || git clone "${cloud_hypervisor_repo}"
When building from a kata-containers git repo, the subsequent calls
to `git` in this function thus apply to the kata-containers repo and
eventually fail, e.g.:
+ git checkout v23.0
error: pathspec 'v23.0' did not match any file(s) known to git
It doesn't quite make sense actually to keep an existing directory the
content of which is arbitrary when we want to it to contain a specific
version of clh. Just remove it instead.
Fixes: #4151
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit afbd60da27)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The type of MemorySwappiness in runtime is uint64, and the type of swappiness in agent is int64,
if we set max uint64 in runtime and pass it to agent, the value will be equal to -1. We should
modify the type of swappiness to u64
Fixes: #4123
Signed-off-by: holyfei <yangfeiyu20092010@163.com>
(cherry picked from commit 0239502781)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs. This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens). If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.
So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting(). In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.
fixes#4140
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1b931f4203)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs. This change allows it to be initialized at any path
given as a parameter. This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).
For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit ef6d54a781)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing. A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.
This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go. This is slightly cleaner to begin with, and will
allow some further enhancements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 5d8438e939)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant. We
duplicate this information in fc.go which also needs it.
Export it from fs.go instead, so it can be used in fc.go.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 963d03ea8a)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory. This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.
Instead use t.TempDir() which is for exactly this purpose. As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1719a8b491)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There are several tests in mount_test.go which perform a sample bind
mount. These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint. Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.
In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer. Change them all to use the t.Cleanup mechanism.
For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test. That
stops the test being messed up by a previous run, but messily. Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit bec59f9e39)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log. Currently we don't clean up those logs
when the tests are done. Use a test cleanup function to do this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit f7ba21c86f)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp(). This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.
Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 90b2f5b776)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount. This is not cleaned up after the test. Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.
Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2eeb5dc223)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.
Fixes: #4061
Signed-off-by: bin <bin@hyper.sh>
(cherry picked from commit f8cc5d1ad8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Adding two functions set_ownership and
recursive_ownership_change to support changing group id
ownership for a mounted volume.
The set_ownership will be called in common_storage_handler
after mount_storage performs the mount for the volume.
set_ownership will be a noop if the FSGroup field in the
Storage struct is not set which indicates no chown will be
performed. If FSGroup field is specified, then it will
perform the recursive walk of the mounted volume path to
change ownership of all files and directories to the
desired group id. It will also configure the SetGid bit
so that files created the directory will have group
following parent directory group.
If the fsGroupChangePolicy is on root mismatch,
then the group ownership will be skipped if the root
directory group id alreasy matches the desired group
id and if the SetGid bit is also set on the root directory.
This is the same behavior as what
Kubelet does today when performing the recursive walk
to change ownership.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 92c00c7e84)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This change adds two fields to the Storage pb
FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.
FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.
These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 6a47b82c81)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.
Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.
And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.
Fixes#4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
(cherry picked from commit 532d53977e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log
Fixes: #4063
Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
(cherry picked from commit 6e79042aa0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.
The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.
Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.
Fixes: #4054
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit 86977ff780)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
During container exit, the agent tries to remove all the mount point directories,
which can fail if it's a readonly filesytem (e.g. device mapper). This commit ignores
the removal failure and logs a warning message.
Fixes: #4043
Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aabcebbf58)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
`make kata-tarball` relies on `kata-deploy-binaries.sh -s` which
silently ignores errors, and you may end up with an incomplete
tarball without noticing it because `make`'s exit status is 0.
`kata-deploy-binaries.sh` does set the `errexit` option and all the
code in the script seems to assume that since it doesn't do error
checking. Unfortunately, bash automatically disables `errexit` when
calling a function from a conditional pipeline, like done in the `-s`
case:
if [ "${silent}" == true ]; then
if ! handle_build "${t}" &>"$log_file"; then
^^^^^^
this disables `errexit`
and `handle_build` ends with a `tar tvf` that always succeeds.
Adding error checking all over the place isn't really an option
as it would seriously obfuscate the code. Drop the conditional
pipeline instead and print the final error message from a `trap`
handler on the special ERR signal. This requires the `errtrace`
option as `trap`s aren't propagated to functions by default.
Since all outputs of `handle_build` are redirected to the build
log file, some file descriptor duplication magic is needed for
the handler to be able to write to the orignal stdout and stderr.
Fixes#3757
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit a779e19bee)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
'make kata-tarball' sometimes fails early with:
cp: cannot create regular file '[...]/tools/packaging/kata-deploy/local-build/dockerbuild/install_yq.sh': File exists
This happens because all assets are built in parallel using the same
`kata-deploy-binaries-in-docker.sh` script, and thus all try to copy
the `install_yq.sh` script to the same location with the `cp` command.
This is a well known race condition that cannot be avoided without
serialization of `cp` invocations.
Move the copying of `install_yq.sh` to a separate script and ensure
it is called *before* parallel builds. Make the presence of the copy
a prerequisite for each sub-build so that they still can be triggered
individually. Update the GH release workflow to also call this script
before calling `kata-deploy-binaries-in-docker.sh`.
Fixes#3756
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 154c8b03d3)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NO_TTY configured whether to add the -t option to docker run. It makes no
sense for the caller to configure this, since whether you need it depends
on the commands you're running. Since the point here is to run
non-interactive build scripts, we don't need -t, or -i either.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 1ed7da8fc7)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This directory consists entirely of files built during a make kata-tarball,
so it should not be committed to the tree. A symbolic link to this directory
might be created during 'make tarball', ignore it as well.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[greg: - rearranged the subject to make the subsystem checker happy
- also ignore the symbolic link created by
`kata-deploy-binaries-in-docker.sh`]
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit bad859d2f8)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This release changes Docker images repository from DockerHub to Amazon
ECR. This resolves the `You have reached your pull rate limit` error
when building the firecracker tarball.
Fixes#4001
Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 0d5f80b803)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.
Fixes: #3695
Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
(cherry picked from commit a2f5c1768e)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
allocate_hugepages() writes to the kernel sysfs file to allocate hugepages
in the Kata VM. However, even if the write succeeds, it's not certain that
the kernel will actually be able to allocate as many hugepages as we
requested.
This patch reads back the file after writing it to check if we were able to
allocate all the required hugepages.
fixes#3816
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 42e35505b0)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
allocate_hugepages() constructs the path for the sysfs directory containing
hugepage configuration, then attempts to create this directory if it does
not exist.
This doesn't make sense: sysfs is a view into kernel configuration, if the
kernel has support for the hugepage size, the directory will already be
there, if it doesn't, trying to create it won't help.
For the same reason, attempting to create the "nr_hugepages" file
itself is pointless, so there's no reason to call
OpenOptions::create(true).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 608e003abc)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is to avoid accidentally deleting multiple volumes.
Fixes#4020
Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit 354cd3b9b6)
- stable-2.4 | agent: fix container stop error with signal SIGRTMIN+3
- stable-2.4 | kata-monitor: fix duplicated output when printing usage
- stable-2.4 | runtime: Stop getting OOM events from agent for "ttrpc closed" error
- kata-deploy: fix version bump from -rc to stable
- stable-2.4: release: Include all the rust vendored code into the vendored tarball
- stable-2.4 | tools: release: Do not consider release candidates as stable releases
- agent: Signal the whole process group
- stable-2.4 | docs: Update k8s documentation
- backport main commits to stable 2.4
- stable-2.4: Bump QEMU to 6.2 (bringing then SGX support in)
- runtime: Properly handle ESRCH error when signaling container
- stable-2.4 | versions: Upgrade to Cloud Hypervisor v22.1
f2319d69 release: Adapt kata-deploy for 2.4.0
cae48e9c agent: fix container stop error with signal SIGRTMIN+3
342aa95c kata-monitor: fix duplicated output when printing usage
9f75e226 runtime: add logs around sandbox monitor
363fbed8 runtime: stop getting OOM events when ttrpc: closed error
f840de5a workflows,release: Ship *all* the rust vendored code
952cea5f tools: Add a generate_vendor.sh script
cc965fa0 kata-deploy: fix version bump from -rc to stable
f41cc184 tools: release: Do not consider release candidates as stable releases
e059b50f runtime: Add more debug logs for container io stream copy
71ce6f53 agent: Kill the all the container processes of the same cgroup
30fc2c86 docs: Update k8s documentation
24028969 virtcontainers: Run mock hook from build tree rather than system bin dir
4e54aa5a doc: fix filename typo
d815393c manager: Add options to change self test behaviour
4111e1a3 manager: Add option to enable component debug
2918be18 manager: Create containerd link
6b31b068 kernel: fix cve-2022-0847
5589b246 doc: update Intel SGX use cases document
1da88dca tools: update QEMU to 6.2
3e2f9223 runtime: Properly handle ESRCH error when signaling container
4c21cb3e versions: Upgrade to Cloud Hypervisor v22.1
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The nix::sys::signal::Signal package api cannot deal with SIGRTMIN+3,
directly use libc function to send the signal.
Fixes: #3990
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(cherry picked from commit 0d765bd082)
Signed-off-by: Wang Xingxing <stellarwxx@163.com>
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:
[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
-listen-address string
The address to listen on for HTTP requests. (default ":8090")
-log-level string
Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
-runtime-endpoint string
Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")
the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.
Fixes: #3998
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
(cherry picked from commit a63bbf9793)
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.
When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".
Fixes: #3815
Signed-off-by: bin <bin@hyper.sh>
Instead of only vendoring the code needed by the agent, let's ensure we
vendor all the needed rust code, and let's do it using the newly
introduced enerate_vendor.sh script.
Fixes: #3973
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 3606923ac8)
This script is responsible for generating a tarball with all the rust
vendored code that is needed for fully building kata-containers on a
disconnected environment.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 2eb07455d0)
During the release of 2.4.0-rc0 @egernst noticed an incositency in the
way we handle release tags, as release candidates are being taken as
"stable" releases, while both the kata-deploy tests and the release
action consider this as "latest".
Ideally we should have our own tag for "release candidate", but that's
something that could and should be discussed more extensively outside of
the scope of this quick fix.
For now, let's align the code generating the PR for bumping the release
with what we already do as part of the release action and kata-deploy
test, and tag "-rc" as latest, regardless of which branch it's coming
from.
Fixes: #3847
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
(cherry picked from commit 4adf93ef2c)
Update documentation with missing step to untaint node to enable
scheduling and update the example to run a pod using the kata runtime
class instead of untrusted workloads, which applies to versions of CRI-O
prior to v1.12.
Fixes#3863
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
(cherry picked from commit 5c434270d1)
Running unit tests should generally have minimal dependencies on
things outside the build tree. It *definitely* shouldn't modify
system wide things outside the build tree. Currently the runtime
"make test" target does so, though.
Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary. They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.
Go tests automatically run within the package's directory though, so
there's no need to use a system wide path. We can use a relative path to
the binary build within the tree just as easily.
fixes#3941
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Added new `kata-manager` options to control the self-test behaviour. By
default, after installation the manager will run a test to ensure a Kata
Containers container can be created. New options allow:
- The self test to be disabled.
- Only the self test to be run (no installation).
These features allow changes to be made to the installed system before
the self test is run.
Fixes: #3851.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Make the `kata-manager` create a `containerd` link to ensure the
downloaded containerd systemd service file can find the daemon when
using the GitHub packaged version of containerd.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Installation section is not longer needed because of the latest
default kata kernel supports Intel SGX.
Include QEMU to the list of supported hypervisors.
fixes#3911
Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 24b29310b2)
bring Intel SGX support
Changes tha may impact in Kata Containers
Arm:
The 'virt' machine now supports an emulated ITS
The 'virt' machine now supports more than 123 CPUs in TCG emulation mode
The pl031 real-time clock device now supports sending RTC_CHANGE QMP events
PowerPC:
Improved POWER10 support for the 'powernv' machine
Initial support for POWER10 DD2.0 CPU added
Added support for FORM2 PAPR NUMA descriptions in the "pseries" machine
type
s390x:
Improved storage key emulation (e.g. fixed address handling, lazy
storage key enablement for TCG, ...)
New gen16 CPU features are now enabled automatically in the latest
machine type
KVM:
Support for SGX in the virtual machine, using the /dev/sgx_vepc device
on the host and the "memory-backend-epc" backend in QEMU.
New "hv-apicv" CPU property (aliased to "hv-avic") sets the
HV_DEPRECATING_AEOI_RECOMMENDED bit in CPUID[0x40000004].EAX.
virtio-mem:
QEMU now fully supports guest memory dumps with virtio-mem.
QEMU now cleanly supports precopy migration, postcopy migration and
background snapshots with virtio-mem.
fixes#3902
Signed-off-by: Julio Montes <julio.montes@intel.com>
(cherry picked from commit 18d4d7fb1d)
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.
Fixes: #3874
Signed-off-by: Feng Wang <feng.wang@databricks.com>
(cherry picked from commit aa5ae6b17c)
- Enhancement: fix comments/logs and delete not used function
- storage: make k8s emptyDir volume creation location configurable
- Implement direct-assigned volume
- Bump containerd to 1.6.1
- experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part
- versions: Upgrade to Cloud Hypervisor v22.0
- katatestutils: remove distro constraints
- Minor fixes for the `disable_block_device_use` comments
- clh: stop virtofsd if clh fails to boot up the vm
- clh: tdx: Don't use sharedFS with Confidential Guests
- runtime: Build golang components with extra security options
- snap: Use git clone depth 1 for QEMU and dependencies
- snap: Don't build cloud-hypevisor on ppc64le
- build: always reset ARCH after getting it
- virtcontainers: remove temp dir created for vsock in test code
- docs: Add unit testing presentation
- virtcontainers: Use available s390x hugepages
- Update QEMU >= 6.1.0 in configure-hypervisor.sh
- Fix monitor listen address
- snap: clh: Re-use kata-deploy script here
- osbuilder: Add CentOS Stream rootfs
- runtime: Gofmt fixes
- Update `confidential_guest` comments
- cleanup runtime pkgs for Darwin build, add basic Darwin build/unit test
- docs: Update Readme document
- runtime: use Cmd.StdoutPipe instead of self-created pipe
- docs: Developer-Guide build a custom Kata agent with musl
- kata-agent: Fix mismatching error of cgroup and mountinfo.
- runtime, config: make selinux configurable
- Fix unbound variable / typo on error mesage
- clh: Add TDX support
- virtcontainers: Do not add a virtio-rng-ccw device
- kata-monitor: fix collecting metrics for sandboxes not started through CRI
- runtime: fix package declaration for ppc64le
- Make the hypervisor framework not Linux specific
- kata-deploy: Simplify Dockerfile and support s390x
- Support nerdctl OCI hooks
- shim: log events for CRI-O
- docs: Update contributing link
- kata-deploy: Use (kata with) qemu as the default shim-v2 binary
- kata-monitor: simplify sandbox cache management and attach kubernetes POD metadata to metrics
- nydus: add lazyload support for kata with clh
- kernel: remove SYS_SUPPORTS_HUGETLBFS from powerpc fragments
- packaging: Use `patch` for applying patches
- virtcontainers: Remove duplicated assert messages in utils test code
- versions: add nydus-snapshotter
- docs: Update limitations document
- packaging: support qemu-tdx
- Kata manager fix install
- versions: Linux 5.15.x
- trace-forwarder/agent-ctl: run cargo fmt/clippy in make check
- docs: Improve top-level README
- runtime: use github.com/mdlayher/vsock@v1.1.0
- tools: Build cloud-hypervisor with "--features tdx"
- virtiofsd: Use "-o announce_submounts"
- feature: hugepages support
- tools: clh: Allow to set when to build from sources and the build flags passed down to cargo
- docs: Remove docker run and shared memory from limitations
- versions: Udpate Cloud Hypervisor to 55479a64d237
- kernel: add missing config fragment for TDx
- runtime: The index variable is initialized multiple times in for
- scripts: fix a typo while to check build_type
- versions: bump CRI-O to its 1.23 release
- feature(nydusd): add nydusd support to introduce lazyload ability
- docs: Fix relative links in Markdown
- kernel: support TDx
- device: Actually update PCIDEVICE_ environment variables for the guest
- docs: Update link to EFK stack docs
- runtime: support QEMU SGX
- snap: update qemu version to 6.1.0 for arm
- Release process related fixes
- openshift-ci: switch to CentOS Stream
- virtcontainers: Split the rootless package into OS specific parts
- runtime: suppport split firmware
- kata-deploy: for testing, make sure we use the PR branch
- docs: Remove Zun documentation with kata containers
- agent: Fix execute_hook() args error
- workflows: stop checking revert commit
84dff440 release: Adapt kata-deploy for 2.4.0-rc0
b257e0e5 rustjail: delete function signal in BaseContainer
d647b28b agent: delete meaningless FIXME comment
1b34494b runtime: fix invalid comments for pkg/resourcecontrol
afc567a9 storage: make k8s emptyDir creation configurable
e76519af runtime: small refactor to improve readability
7e5f11a5 vendor: Update containerd to 1.6.1
42771fa7 runtime: don't set socket and thread for arm/virt
8828ef41 kernel: add arm experimental kernel build support
8a9007fe config: remove 2 config as they are removed in 5.15
1b6f7401 kernel: add arm experimental patches to support vcpu hotplug and virtio-mem
f905161b runtime: mount direct-assigned block device fs only once
27fb4902 agent: add get volume stats handler in agent
ea51ef1c runtime: forward the stat and resize requests from shimv2 to kata agent
c39281ad runtime: update container creation to work with direct assigned volumes
4e00c237 agent: add grpc interface for stat and resize operations
e9b5a255 runtime: add stat and resize APIs to containerd-shim-v2
6e0090ab runtime: persist direct volume mount info
fa326b4e runtime: augment kata-runtime CLI to support direct-assigned volume
b8844fb8 versions: Upgrade to Cloud Hypervisor v22.0
af804734 clh: stop virtofsd if clh fails to boot up the vm
97951a2d clh: Don't use SharedFS with Confidential Guests
c30b3a9f clh: Adding a volume is not supported without SharedFS
f889f1f9 clh: introduce supportsSharedFS()
54d27ed7 clh: introduce loadVirtiofsDaemon()
ae2221ea clh: introduce stopVirtiofsDaemon()
e8bc26f9 clh: introduce setupVirtiofsDaemon()
413b3b47 clh: introduce createVirtiofsDaemon()
55cd0c89 runtime: Build golang components with extra security options
76e4f6a2 Revert "hypervisors: Confidential Guests do not support Device hotplug"
fa8b9392 config: qemu: Fix disable_block_device_use comments
9615c8bc config: fc: Don't expose disable_block_device_use
c1fb4bb7 snap: Don't build cloud-hypevisor on ppc64le
58913694 snap: Use git clone depth 1 for QEMU and dependencies
b27c7f40 docs: Add unit testing presentation
e64c54a2 monitor: Listen to localhost only by default
e6350d3d monitor: Fix build options
a67b93bb snap: clh: Re-use kata-deploy script here
f31125fe version: Bump cloud-hypervisor to b0324f85571c441f
54d0a672 subsystem: build
edf20766 docs: Update Readme document
eda8ea15 runtime: Gofmt fixes
4afb278f ci: add github action to exercise darwin build, unit tests
e355a718 container: file is not linux specific
b31876ee device-manager: move linux-only test to a linux-only file
6a5c6344 resourcecontrol: SystemdCgroup check is not necessarily linux specific
cc58cf69 resourcecontrol: convert stats dev_t to unit64types
5be188cc utils: Add darwin stub
ad044919 virtcontainers: Convert stats dev_t to uint64
56751089 katautils: Use a syscall wrapper for the hook JSON state
7d64ae7a runtime: Add a syscall wrapper package
abc681ca katautils: Add Darwin stub for the netNS API
de574662 config: Expand confidential_guest comments
641d475f config: clh: Use "Intel TDX" instead of just "TDX"
0bafa2de config: clh: Mention supported TEEs
81ed269e runtime: use Cmd.StdoutPipe instead of self-created pipe
8edca8bb kata-agent: Fix mismatching error of cgroup and mountinfo.
a9ba7c13 clh: Fix typo on HotplugRemoveDevice
827ab82a tools: clh: Fix unbound variable
082d538c runtime: make selinux configurable
1103f5a4 virtcontainers: Use FilesystemSharer for sharing the containers files
533c1c0e virtcontainers: Keep all filesystem sharing prep code to sandbox.go
61590bbd virtcontainers: Add a Linux implementation for the FilesystemSharer
03fc1cbd virtcontainers: Add a filesystem sharing interface
72434333 clh: Add TDX support
a13b4d5a clh: Add firmware to the config file
a8827e0c hypervisors: Confidential Guests do not support NVDIMM
f50ff9f7 hypervisors: Confidential Guests do not support Memory hotplug
df8ffecd hypervisors: Confidential Guests do not support Device hotplug
28c4c044 hypervisors: Confidential Guests do not support VCPUs hotplug
29ee870d clh: Add confidential_guest to the config file
9621c596 clh: refactor image / initrd configuration set
dcdc412e clh: use common kernel params from the hypervisor code
4c164afb versions: Update Cloud Hypervisor to 5343e09e7b8db
b2a65f90 virtcontainers: Use available s390x hugepages
cb4230e6 runtime: fix package declaration for ppc64le
fec26f8e kata-monitor: trivial: rename symbols & labels
9fd4e551 runtime: Move the resourcecontrol package one layer up
823faee8 virtcontainers: Rename the cgroups package
0d1a7da6 virtcontainers: Rename and clean the cgroup interface
ad10e201 virtcontainers: cgroups: Move non Linux routine to utils.go
d49d0b6f virtcontainers: cgroups: Define a cgroup interface
3ac52e81 kata-monitor: fix updating sandbox cache at startup
160bb621 kata-monitor: bump version to 0.3.0
1a3381b0 docs: Developer-Guide build a custom Kata agent with musl
f6fc1621 shim: log events for CRI-O
1d68a08f docs: Update contributing link
9123fc09 kata-deploy: Simplify Dockerfile and support s390x
11220f05 kata-deploy: Use (kata with) qemu as the default shim-v2 binary
3175aad5 virtiofs-nydus: add lazyload support for kata with clh
94b831eb virtcontainers: remove temp dir created for vsock in test code
8cc1b186 kernel: remove SYS_SUPPORTS_HUGETLBFS from powerpc fragments
5c9d2b41 packaging: Use `patch` for applying patches
5b3fb6f8 kernel: Build SGX as part of the vanilla kernel
2c35d8cb workflows: Stop building the experimental kernel
32e7845d snap: Build vanilla kernel for all arches
27de212f runtime: Always add network endpoints from the pod netns
1cee0a94 virtcontainers: Remove duplicated assert messages in utils test code
6c1d149a docs: Update limitations document
7c4ee6ec packaging/qemu: create no_patches file for qemu-tdx
d47c488b versions: add qemu tdx section
77c29bfd container: Remove VFIO lazy attach handling
7241d618 versions: add nydus-snapshotter
26b3f001 virtcontainers: Split hypervisor into Linux and OS agnostic bits
fa0e9dc6 virtcontainers: Make all Linux VMMs only build on Linux
c91035d0 virtcontainers: Move non QEMU specific constants to hypervisor.go
10ae0591 virtcontainers: Move guest protection definitions to hypervisor.go
b28d0274 virtcontainers: Make max vCPU config less QEMU specific
a5f6df6a govmm: Define the number of supported vCPUs per architecture
a6b40151 tools: clh: Remove unused variables
5816c132 tools: Build cloud-hypervisor with "--features tdx"
e6060cb7 versions: Linux 5.15.x
9818cf71 docs: Improve top-level and runtime README
36c3fc12 agent: support hugepages for containers
81a8baa5 runtime: add hugepages support
7df677c0 runtime: Update calculateSandboxMemory to include Hugepages Limit
948a2b09 tools: clh: Ensure the download binary is executable
72bf5496 agent: handle hook process result
80e8dbf1 agent: valid envs for hooks
4f96e3ea katautils: Pass the nerdctl netns annotation to the OCI hooks
a871a33b katautils: Run the createRuntime hooks
d9dfce14 katautils: Run the preStart hook in the host namespace
6be6d0a3 katautils: Pass the OCI annotations back to the called OCI hooks
493ebc8c utils: Update kata manager docs
34b2e67d utils: Added more kata manager cli options
714c9f56 utils: Improve containerd configuration
c464f326 utils: kata-manager: Force containerd sym link creation
4755d004 utils: Fix unused parameter
601be4e6 utils: Fix containerd installation
ae21fcc7 utils: Fix Kata tar archive check
f4d1e45c utils: Add kata-manager CLI options for kata and containerd
395cff48 docs: Remove docker run and shared memory from limitations
e07545a2 tools: clh: Allow passing down a build flag
55cdef22 tools: clh: Add the possibility to always build from sources
3f87835a utils: Switch kata manager to use getopts
4bd945b6 virtiofsd: Use "-o announce_submounts"
37df1678 build: always reset ARCH after getting it
3a641b56 katatestutils: remove distro constraints
90fd625d versions: Udpate Cloud Hypervisor to 55479a64d237
573a37b3 osbuilder: Add CentOS Stream rootfs
f10642c8 osbuilder: Source .cargo/env before checking Rust
955d359f kernel: add missing config fragment for TDx
734b618c agent-ctl: run cargo fmt/clippy in make check
12c37faf trace-forwarder: add make check for Rust
c1ce67d9 runtime: use github.com/mdlayher/vsock@v1.1.0
42a878e6 runtime: The index variable is initialized multiple times in for
1797b3eb packaging/kernel: build TDX guest kernel
98752529 versions: add url and tag for tdx kernel
bc8464e0 packaging/kernel: add option -s option
2d9f89ae feature(nydusd): add nydusd support to introduse lazyload ability
b19b6938 docs: Fix relative links in Markdown
9590874d device: Update PCIDEVICE_ environment variables for the guest
7b7f426a device: Keep host to VM PCI mapping persistently
0b2bd641 device: Rework update_spec_pci() to update_env_pci()
982f14fa runtime: support QEMU SGX
40aa43f4 docs: Update link to EFK stack docs
54e1faec scripts: fix a typo while to check build_type
07b9d93f virtcontainer: Simplify the sandbox network creation flow
2c7087ff virtcontainers: Make all endpoints Linux only
49d2cde1 virtcontainers: Split network tests into generic and OS specific parts
0269077e virtcontainers: Remove the netlink package dependency from network.go
7fca5792 virtcontainers: Unify Network endpoints management interface
c67109a2 virtcontainers: Remove the Network PostAdd method
e0b26443 virtcontainers: Define a Network interface
5e119e90 virtcontainers: Rename the Network structure fields and methods
b858d0de virtcontainers: Make all Network fields private
49eee79f virtcontainers: Remove the NetworkNamespace structure
844eb619 virtcontainers: Have CreateVM use a Network reference
d7b67a7d virtcontainers: Network API cleanups and simplifications
2edea883 virtcontainers: Make the Network structure manage endpoints
8f48e283 virtcontainers: Expand the Network structure
5ef522f7 runtime: check kvm module `sev` correctly
419d8134 snap: update qemu version to 6.1.0 for arm
00722187 docs: update Release-Process.md
496bc10d tools: check for yq before using it
88a70d32 Revert "workflows: Ensure a label change re-triggers the actions"
a9bebb31 openshift-ci: switch to CentOS Stream
89047901 kata-deploy-push: only run if PR modifying tools path
7ffe9e51 virtcontainers: Do not add a virtio-rng-ccw device
1f29478b runtime: suppport split firmware
24796d2f kata-deploy: for testing, make sure we use the PR branch
1cc1c8d0 docs: Remove images from Zun documentation
5861e52f docs: Remove Zun documentation with kata containers
903a6a45 versions: Bump critools to its 1.23 release
63eb1158 versions: bump CRI-O to its 1.23 release
5083ae65 workflows: stop checking revert commit
14e7f52a virtcontainers: Split the rootless package into OS specific parts
ab447285 kata-monitor: add kubernetes pod metadata labels to metrics
834e199e kata-monitor: drop unused functions
7516a8c5 kata-monitor: rework the sandbox cache sync with the container manager
e78d80ea kata-monitor: silently ignore CHMOD events on the sandboxes fs
e9eb34ce kata-monitor: improve debug logging
4fc4c76b agent: Fix execute_hook() args error
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
kata-deploy files must be adapted to a new release. The cases where it
happens are when the release goes from -> to:
* main -> stable:
* kata-deploy-stable / kata-cleanup-stable: are removed
* stable -> stable:
* kata-deploy / kata-cleanup: bump the release to the new one.
There are no changes when doing an alpha release, as the files on the
"main" branch always point to the "latest" and "stable" tags.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).
Fixes#2053
Signed-off-by: Evan Foster <efoster@adobe.com>
Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.
With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.
Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
- use `grpc.WithTransportCredentials(insecure.NewCredentials())`
instead
Fixes: #3820
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Add a new entry of arm-kernel-experimental and let the kernel build
script support to build it.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
I'm sure that it is correct to remove CONFIG_ARM64_UAO and
CONFIG_MANDATORY_FILE_LOCKING and . Both are gone in 5.15. Maintain a
specific config files for a kernel version is a little ugly. If someone
needs them, shout at me.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
As the support for vcpu hotplug is on the road, I pick them up here as
experimental to let user try cpu hotplug and virtio-mem on arm64.
Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
To query fs stats and resize fs, the requests need to be passed to
kata agent through containerd-shim-v2. So we're adding to rest APIs
on the shim management endpoint.
Also refactor shim management client to its own go file.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
In the direct assigned volume scenario, Kata Containers persists
the information required for managing the volume inside the guest
on host filesystem.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add commands to add, remove, resize and get stats of a direct-assigned volume.
These commands are expected to be consumed by CSI.
Fixes: #3454
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Highlights from the Cloud Hypervisor release v22.0: 1) GDB Debug Stub
Support; 2) `virtio-iommu` Backed Segments (to facilitate hotplug
devices that require being behind an IOMMU, e.g. QAT); 3) Before Boot
Configuration Changes; 4) `virtio-balloon` Free Page Reporting; 5)
Support for Direct Kernel Booting with TDX; 6) PMU Support for AArch64;
7) Documentation Under CC-BY-4.0 License; 8) Deprecation of "Classic"
virtiofsd (rust-based virtiofsd now is recommended); 9) Bug fixes on
`virtio-balloon`, `virtio-net` with multiple TAP fd support, REST APIs,
seccomp filters, migration with `vhost-user`, etc;
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v22.0Fixes: #3825
Signed-off-by: Bo Chen <chen.bo@intel.com>
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.
Let's ensure, via defer, that we stop virtiofsd in case of errors.
Fixes: #3819
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.
1. virtio-fs, as of now, cannot be part of the trust boundary, so the
Confidential Guest will not be using it.
2. virtio-block hotplug should be enabled in order to use virtio-block
for the rootfs (used with the devmapper plugin).
When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```
After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.
@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".
Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.
Fixes: #3810
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.
It will also become handy later in this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's introduce and use a new `createVirtiofsDaemon` method. Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
virtio-fs, instead of virtio-9p, is the default shared file system type
in case virtio-blk is not used.
Fixes: #3813
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Relying on virtio-block is the *only* way to use Firecracker with Kata
Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by
Firecracker.
As configuration doesn't make sense to be exposed, we hardcode the
`false` value in the Firecracker configuration structure.
Fixes: #3813
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
snapcraft build is failing due to:
``
utils.mk:130: "WARNING: powerpc64le-unknown-linux-musl target is unavailable"
```
It seems to happen as powerpc64-unknown-linux-musl is a target that
although there's support for it, it's not exactly built or
automatically tested, at least according to:
https://doc.rust-lang.org/rustc/platform-support.htmlFixes: #3803
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Use `git clone --depth 1 ...` for QEMU and its dependencies
to speed up checkouts.
Fixes: #3799.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add the Kata Containers unit testing presentation I gave to the Kata
outreach students as this may be of some use to others.
Fixes: #3781
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Change `kata-monitor` to listen to port `8090` on the local interface
only by default.
> **Note:**
>
> This is a breaking change as previously it listened on all interfaces.
Fixes: #3795.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Removed redundant and duplicated build options to build
`kata-monitor` the same way as the other components:
- `CGO_ENABLED=0` is not necessary.
- `-buildmode=exe` is not necessary since `BUILDFLAGS` already sets the
build mode.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The current snap build for clh is broken as it's not aware of how to
build the binary from sources.
Instead of fixing it here, let's take advantage of the kata-deploy
script, which is capable of building from sources, and re-use it here.
Fixes: #3693
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This bump brings a fix on the build script, for ARM, so we can use the
very same build script everywhere.
The commit of our interest is b0324f85571c441f840e9bdeb25410514a00bb74:
```
scripts: Fix musl build on aarch64
Adding the missing TARGET_CC environment variable to get the build to
complete correctly.
Fixes#3776
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With the ACPI PCI hotplug changes introduced in 2.3, QEMU >= 6.1 is required.
Remove unnecessary qemu version check in build script.
Fixes#3547
Signed-off-by: goodluckbot <tangbo_gl@hotmail.com>
This PR updates the README document by using the proper link for
the contributing guide as well as a misspelling.
Fixes#3791
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- Mostly blank lines after `+build` -- see
https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
`gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go
Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
There are a few outstanding changes required to build the runtime on
Darwin.
Let's add a GitHub action to exercise build and unit tests of the
packages which we do expect to work. Eventually this should be dropped
and we can run any Darwin specific tests, or just add MacOS to the
matrix for our static check OSes.
Fixes: #3778
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This utility function is also used to check the spec that will run in
the guest - no need for this to be linux specific.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Their types may differ on various host OSes, but
unix.Major|Minor always takes a uint64
Depends-on: github.com/kata-containers/tests#4516
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...
Fixes:# 3777
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.
Fixes: #3787
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's mention the supported TEEs to be used with confidential guests.
Right now, Cloud Hyperisor supports only Intel TDX, used together with
TD Shim.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.
Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.
Fixes: #3783
Signed-off-by: bin <bin@hyper.sh>
The content about systemd in "/proc/self/cgroup" is as:
1:name=systemd:/kubepods/pod1815643d-3789-4e4e-aaf4-00de024912e1/0e15a65bd5f7b30a0b818d90706212354d8b3f0998a1495473c3be9a24706ccf
and in "/prol/self/mountinfo" is as:
30 29 0:26 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:6 - cgroup cgroup rw,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
The keys extracted from the two files are the same as "name=systemd". So no need to rename the key to "systemd".
Fixes: #3385
Signed-off-by: sailorvii <challengingway@hotmail.com>
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
4c164afbac renamed extra_build_args to
features, but did it only in one place, leading to:
```
21:15:28 /home/jenkins/workspace/kata-containers-2.0-ubuntu-ARM-PR/go/src/github.com/kata-containers/kata-containers/tools/packaging/static-build/cloud-hypervisor/build-static-clh.sh: line 55: features: unbound variable
21:15:29 make[1]: *** [tools/packaging/kata-deploy/local-build/Makefile:30: cloud-hypervisor-tarball-build] Error 1
```
Fixes: #3775
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml
Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
Switching to the generic FilesystemSharer brings 2 majors improvements:
1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
container files and root filesystems with the Kata Linux guest.
Fixes#3622
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.
In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.
This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.
Fixes: #3632
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.
Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.
[0]: https://github.com/confidential-containers/td-shim
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.
Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".
The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.
One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.
Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block
As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:
* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
- This is needed for the TDX support on the cloud hypervisor driver,
which is part of this very same series.
* openapi: Update the PciBdf types
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
- This is needed due to a change in a DeviceNode field, which would
cause a marshalling / demarshalling error when running with a
version of cloud-hypervisor that includes the TDX fixes mentioned
above.
* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
- https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
- This is needed due to changes in the scripts used to build Cloud
Hypervisor, which are used as part of Kata Containers CIs and
github actions.
Due to this change, we're also adapting the build scripts as part
of this very same commit.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).
Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
We introduced collection of sandboxes metadata from the CRI that will be
attached to the sandbox metrics: this will allow to immediately match
sandboxes metrics with CRI workloads.
Rename the symbols from *Kube* to *CRI* as the metadata will be there
every time pods are created through CRI, also if kubernetes is not
installed (e.g., 'crictl runp').
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.
Fixes: #3601
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We call it a ResourceController, and we make it not so Linux specific.
Now the Linux implementations is the cgroups one.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We now rely on fs events only to update the sandbox cache. This is not
true anyway for sandboxes already present at kata-monitor startup: we
just retrieve the list and add them in the cache only when we get their
CRI metadata. If CRI metadata is not available we will never add them to
the sandbox cache.
Fix this by immediately adding the sandboxes we find at startup time to
the sandbox cache.
Fixes: #3705
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Since kata-monitor now:
- relies on fs events *only* to update the sandbox cache
- adds CRI meta-data as labels (CRI pod name, namespace and uid)
it deserves a version bump.
Note that while we could let kata-monitor match the runtime version,
kata-monitor will usually work flawlessy with different kata shim
releases: so it makes sense to keep kata-monitor version separated.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The Developer-Guide.md build a custom kata agent with `x86_64-unknown-linux-musl`.
The `musl` should be changed by the system arch. The system arch is aarch64,
ppc64le and s390x, the musl should be changed. When the arch is ppc64le or s390x,
the musl should be replaced by the gnu.
Fixes: #3731
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.
For CRI-O runtime, we can log the events to log file.
Fixes: #3733
Signed-off-by: bin <bin@hyper.sh>
This PR updates the contributing documentation link to the
one that is using kata 2.0
Fixes#3740
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
The kata-deploy Dockerfile is based on CentOS 7, which has no s390x
support. Add an `IMAGE` argument to specify the registry, which still
defaults to CentOS, but e.g. ClefOS can be selected instead.
Other x86_64 assumptions are also removed. Other general simplicifations
are made.
This does not address the more general issue of #3723 -- what we're
doing here does not seem to be working with systemd >= something between
235-237.
Fixes: #3722
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
When using kata-deploy, no `containerd-shim-kata-v2` binary is deployed,
but we do deploy a `kata` runtime class, which seems very much
incosistent.
As the default configuration for kata-containers points to QEMU, let's
also use kata with QEMU as the default shim-v2 binary.
Fixes: #3228, #3734
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.
Fixes#3654
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
The name of SYS_SUPPORTS_HUGETLBFS has been changed to
ARCH_SUPPORTS_HUGETLBFS which is being selected on default
by another kernel config.
More info- 855f9a8e87
Change applicable from v5.13.
Fixes: #3720
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
`tools/packaging/scripts/apply_patches.sh` uses `git apply $patch`, but
this will not apply to subdirectories. If one wanted to apply with
`git apply`, they'd have to run it with `--directory=...`
_relative to the Git tree's root_ (absolute will not work!). I suggest
we just use `patch`, which will do what we expected `git apply` would
do.
`patch` is also added to build containers that require it.
Fixes: #3690
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Let's take advantage of the fact that we've bumped to our kernel version
ot the 5.15 LTS and enable SGX by default, as it's present there.
Fixes: #3692
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop building the experimental kernel as, currently, we have
all the needed contents as part of the vanilla kernel.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There's no need to build an experimental kernel for x86_64 as all the
bits which were part of the experimental one (SGX only, really) are now
part of the vanilla one.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the container runtime, we're never inspecting, adding or configuring
host networking endpoints.
Make sure we're always do that by wrapping addSingleEndpoint calls into
the pod network namespace.
Fixes#3661
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
This PR updates the limitations document by removing the docker
references belonged to kata 1.x and add as a limitation the
docker and podman support for kata 2.0
Fixes#3709
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Keep all the OS agnostic bits in the hypervisor.go and
hypervisor_ARCH.go files.
Fixes#3614
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Even though it's still actually defined as the QEMU upper bound,
it's now abstracted away through govmm.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Right now we're getting the info for the Cloud Hypervisor repo and
version, but we don't do anything with them, as those are not passed
down to the build script.
Morever, the build script itself gets the info from exactly the same
place when those are not passed, making those redundant.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Right now TDx support on Cloud Hypervisor is gated behind a "--features
tdx" flag. However, having TDx support enabled should not and does not
impact on the general usability of cloud-hypervisor.
As sooner than later we'll need kata-deploy binaries to be tested on a
CI that's TDx capable, for the confidential containers effort, let's
take the bullet and already enable it by default.
By the way, touching kata-deploy-binaries.sh as it's ensure the change
will be used in the following workflows:
* kata-deploy-push
* kata-deploy-test
* release
Fixes: #3688
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Various improvements to the top-level README file:
- Moved the following sections from the runtime's README to the
top-level README:
- License
- Platform support / Hardware requirements
- Added the following sections to the top-level README:
- Configuration
- Hypervisors
- Improved formatting of the Documentation section in the top-level
README.
- Removed some unused named links from the top-level README.
Also improvements to the runtime README:
- Removed confusing mention of the old 1.x runtime name.
- Clarify the binary name for the 2.x runtime and the utility program.
> **Note:**
>
> We cannot currently link to the AMD website as that site's
> configuration causes the CI static checks to fail. See
> https://github.com/kata-containers/tests/issues/4401Fixes: #3557.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Mount hugepage directories and configure the requested number of hugepages
dynamically by writing to sysfs files
Port from:
78b307b5bdFixes: #3342
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
We're downloading the released cloud-hypervisor binary from GitHub, but
we should also ensure we set the binary as executable.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Current hook process is handled by just calling
unwrap() on it, sometime it will cause panic.
By handling all Result type and check the error can
avoid panic.
Fixes: #3649
Signed-off-by: bin <bin@hyper.sh>
Envs contain null-byte will cause running hooks to panic,
this commit will filter envs and only pass valid envs to hooks.
Fixes: #3667
Signed-off-by: bin <bin@hyper.sh>
The OCI spec is very specific about it:
"The prestart hooks MUST be executed in the runtime namespace."
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
That allows us to amend those annotations with information that could be
used when running those hooks.
For example nerdctl will use those annotations to resolve the networking
namespace path in where to run the CNI plugin, i.e. the created pod
networking namespace.
Fixes#3629
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Update the `kata-manager.sh` README to recommend users view the
available options before running the script.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Added CLI options to the `kata-manager.sh` script to:
- Force installation
- Disable cleanup (retain downloaded files)
- Only install Kata (don't consider containerd).
> **Note:**
>
> This change introduces a subtle behaviour difference:
>
> - Previously, the script would error if containerd was already installed.
>
> - Now, the script will detect the existing installation and skip
> trying to install containerd.
>
> This new behaviour makes more sense for most users but if you wish
> to use the old behaviour, you (now) need to run the script specifying
> the `-f` (force) option.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
`kata-manager.sh` improvements for containerd:
- Fixed containerd default branch (which is now `main`).
- Only install service file if it doesn't already exist.
- Enable the containerd service to ensure it can be started.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
For consistency with the rest of the script force the creation of a
symbolic link for containerd in `kata-manager.sh`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Actually make use of the `requested_version` parameter in
`kata-manager.sh` and added a comment.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Fix bug introduced inadvertently on #3330 which fixes the Kata
installation, but unfortunately breaks installing containerd.
The new approach is to check that the download URL matches a
project-specific regular expression.
Also improves the architecture test to handle the containerd
architecture name (`amd64` rather than `x86_64`).
Fixes: #3674.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The static tar archive published on GitHub (now) contains `./` which is
being being flagged as an "unknown path" and resulting in the
`kata-manager.sh` script failing.
Partially fixes: #3674.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR removes the docker run and shared memory segment from the
limitations document as for kata 2.0 we do not support docker
and this is not longer valid.
Fixes#3676
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Let's allow passing down a build flag to cargo, when building Cloud
Hypervisor.
By doing this we allow calling this script with:
```
extra_build_flags="--features tdx" ./build-static-clh.sh
```
Fixes: #3671
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The current code will always pull the release binaries in case the
version requested by Kata Containers matches with a released version.
This, however, has a limitation of preventing users / CIs to build
cloud-hypervisor from source for a reason or another, such as passing a
specific build flag to cloud-hypervisor.
This is a pre-req to solving
https://github.com/kata-containers/kata-containers/issues/3671.
While here, a small changes were needed in order to improve readability
and debugability of why we're building something from the sources rather
than simply downloading and using a pre-built binary.
Fixes: #3672
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
German Maglione, one of the current virtio-fs developers, has brought to
our attention that using "announce-submounts" could help us to prevent
inode number collisions.
This feature was introduced a year ago or so by Hanna Reitz as part of
the 08dce386e77eb9ab044cb118e5391dc9ae11c5a8, and as we already mandate
QEMU >= 6.1.0, let's take advantage of that.
Fixes: #3507
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When building with `ARCH=x86_64`, the previous `Makefile` will use it
without checking and cause:
Makefile:319: *** "ERROR: No hypervisors known for architecture x86_64 (looked for: acrn firecracker qemu cloud-hypervisor)". Stop.
This commit fix the above issue by checking `ARCH` no matter where it
is assigned.
Fixes: #3444
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
The distro constraint parses os release files, which may not contain
distro version(VERSION_ID field), for example rolling release distributions
like Debian testing, archlinux.
These distro constraints are not used anyway, so removing them instead
of fixing the complex version detection.
Fixes: #1864
Signed-off-by: Shengjing Zhu <zhsj@debian.org>
Let's update cloud-hypervisor to a version that exposes the TDx support
via the OpenAPI's auto-generated code.
Fixes: #3663
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
to cover a Red Hat (adjacent) rootfs with great cross-platform compatibility
and a workable release cadence. The previous CentOS & Fedora workflows are
simplified.
Also remove unnecessary `/usr/share` files as on Ubuntu and mark Alpine
as unuspported on ppc64le (due to musl, for a while already).
Fixes: #3340
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
We install Rust in the build containers, but we also install Rust in
`rootfs.sh` if it is missing. It makes sense to install Rust in the build
containers so it does not have to be installed every time, but for that check
to work on non-login shells, we should source `.cargo/env` before running it.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Change the variables `mountTypeFieldIdx := 8`, `mntDestIdx := 4` and `netNsMountType := "nsfs"` to const.
And unify the variable naming style, modify `mntDestIdx` to `mountDestIdx`.
Fixes: #3646
Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
Add support for building TDX kernel from github.com/intel/tdx
To build a guest kernel that supports Intel TDx run:
```
./build-kernel.sh -s -x tdx -d setup
./build-kernel.sh -s -x tdx -d install
```
fixes#3650
Signed-off-by: Julio Montes <julio.montes@intel.com>
Pulling image is the most time-consuming step in the container lifecycle. This PR
introduse nydus to kata container, it can lazily pull image when container start. So it
can speed up kata container create and start.
Fixes#2724
Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
Relative links within this repository allow for easier navigation to
the corresponding file / directory in the current commit / for the
selected version.
Link text was slightly changed / fixed in
- docs/Unit-Test-Advice.md
- docs/how-to/how-to-run-docker-with-kata.md
Fixes#3045
Signed-off-by: Daniel Höxtermann <daniel@hxtm.dev>
In commit 78dff468bf1 we introduced logic to rewrite PCIDEVICE_ environment
variables for the container so that they contain correct addresses for the
Kata VM rather than for the host. Unfortunately, we never actually invoked
the function to do this.
It turns out we need to do this not only at container creation time, but
also for environment variables supplied to processes exec-ed into the
container after creation (e.g. with crictl exec). Add calls to make both
those updates.
fixes#3634
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
add_devices() generates a mapping of host to guest PCI addresses which is
used to update some environment variables for the workload. Currently it
just does this locally, but it turns out we're going to need the same map
again in order to correct environment variables for processes exec-ed into
the existing container.
Move the map to the sandbox structure so we can keep it around for those
later uses.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This function updates PCIDEVICE_ environment variables (such as those
supplied by the Kubernetes SR-IOV plugin) in the OCI spec to be correct
for the Kata VM, rather than for the host.
We neglected to actually call this function, however, and it turns out that
when we do, we need to do things slightly different. We actually need to
adjust envionment variables both in the OCI spec when creating a container
and also in the variables supplied for exec-ing a new process within an
existing container.
Adjust the function so that it can be used for both these cases.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We don't need to call NewNetwork() twice, and we can have the VM factory
case return immediatly. That makes the code more readable.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Move the netlink dependent code into network_linux.go.
Other OSes will have to provide the same functions.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
And only have AddEndpoints/RemoveEndpoints for all cases (single
endpoint vs all of them, hotplug or not).
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We are converting the Network structure into an interface, so that
different host OSes can have different networking implementations for
Kata.
One step into that direction is to rename all the Network structure
fields and methods to something that is less Linux networking namespace
specific. This will make the Network interface naming consistent.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We are replacing the NetworkingNamespace structure with the Network
one, so we should have the hypervisor interface switching to it as well.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Remove unused parameters.
Reduce the number of parameters by deriving some of them (e.g. a
networking config) from their outer structure (e.g. a Sandbox
reference).
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Endpoints creations, attachement and hotplug are bound to the networking
namespace described through the Network structure.
Making them Network methods is natural and simplifies the code.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
For simplicity sake, there should only be one networking structure per
sandbox, as opposed to two (Network and NetworkingNamespace) currently.
This commit start expanding the Network structure in order to eventually
make it the single representation of a virtcontainers sandbox
networking.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Runtime now accepts both `1` and `Y` as valid values for
kvm_amd module parameter kvm_amd.sev.
Fixes#3273
Signed-off-by: Pierre Kohler <pierre.kohler@cysec.systems>
Update qemu version of snap for arm to 6.1.0 thus the arch specific qemu
version for arm needs clean up.
Fixes: #3627
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This reverts commit 7a879164bd, as it's
been proved that re-triggering the checks at every single change is more
painful than having to close / re-open a PR in case we ever use the
`force-skip-ci` label again.
Fixes: #2804
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The build root container is switched from CentOS 8 to Stream 8 as
the former reached EOL.
Fixes#3605
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Since we are using this to exercise any changes to osbuilder or
packaging scripts, let's make sure that we only run the test in that
case.
Similarly, don't run for every single push. Just run this workflow for
pull requests.
Fixes: #3594
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
On s390x, skip adding a virtio-rng device. The on-chip CPACF provides
entropy instead. For Confidential Containers, when using Secure
Execution, entropy attacks on virtio-rng are mitigated.
Fixes: #3598
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
firmware can be split into FIRMWARE_VARS.fd (UEFI variables as
configuration) and FIRMWARE_CODE.fd (UEFI program image). UEFI
variables can be customized per each user while UEFI code is kept same.
fixes#3583
Signed-off-by: Julio Montes <julio.montes@intel.com>
Since we are already checking that only an admin is triggering the job,
let's go ahead and make sure we are testing against the PR itself. This
will ensure that we are exercising changes to kata-deploy tooling, which
is important for this test.
While at it, cleanup and simplify some of the tarball creation.
Fixes: #3586
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This PR removes the images belonged to the Zun documentation at
the use cases directory.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the zun documentation use case with kata containers mainly
because is not longer valid as it is using as a reference docker with
clear containers 2.0 which are not longer being supported and it is also
using docker to test kata with openstack zun and docker is also not supported.
Fixes#3581
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
critools v1.23.0 has been released a few days ago. As we're already
bumping kubernetes, and CRI-O, let's also update critools.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As done for kubernetes, CRI-O should also be bumped to its 1.23 release
so those are in sync.
Fixes: #3481
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
- virtcontainers: Enable initrd for Cloud Hypervisor
- versions: update Rust to 1.58.1
- Sandbox sizing feature
- kata-deploy: Fix the tag replacement logic
- docs: Update networking details in the architecture doc
- Fix and re-enable s390x GoVMM tests
- runtime: fix handling container spec's memory limit
- ci: Pass function arguments in static-checks.sh
- docs: Remove docker run and sysctl limitation
- runtime: update runc and image-spec dependencies
- agent: resolve unused variables in tests
- Upgrade to Cloud Hypervisor v21.0
- runtime: rectify passing empty options to -ldflags
- osbuilder: Remove libseccomp from Dockerfile
- agent: fix the issue of creating new namespaces for agent
- docs: Remove kata-pkgsync reference
- docs: Redirect glossary to the wiki
- workflows: Use base instead of head ref for kata-deploy-test
- govmm: Use it from our own repo
- tools: Fix groupname if it differs from username
- workflows: Fix typo in kata-deploy-push action
- release: Escape backticks in Libseccomp Notices
- packaging: Remove kata-pkgsync tool
- govmm: Bring the project in
- version: bump to kubernetes 1.23
- vendor: update govmm
- workflows: Ensure force-skip-ci skips all actions
- runtime: -Wl,--s390-pgste for s390x
- workflows: Use the correct branch ref on test kata-deploy
- update apiVersion
- scripts: Use shebang /usr/bin/env bash
- packaging: Make kernel config accessible to guest
- docs: fix a typo in host-cgroups.md doc
- qemu: add support for SGX
- experimentally enable the vcpu-hotplug for arm in qemu side
- Remove all the non-tested rootfs
- docs: Remove ccloudvm reference
- runtime: Provide protection for shared data
- kata-deploy: validate conf file can be created
- runtime: it should rollback when failed in Sandbox AddInterface
- libs: add some generated files to .gitignore
- runtime: close span before return from function in case of error
- packaging: Remove ccloudvm instructions and script
- docs: Default machine type is q35 meanwhile
- CI: Revert "CI: Switch to a mirror as gnu.org is down"
- agent: fix the broken protobuf generation code
- packaging: Remove obs packages testing for kata 2.0
- runtime: Remove docker comments for kata 2.0 configuration.tomls
- docs: fix agent proto file path
- qemu: update readonly flag for block devices
- qemu: only set wait parameter for server mode socket based char device
- qemu: Fix 32 bit int overflow in test file
- qemu: Add support for legacy serial device
- qemu: Remove -realtime in favor of -overcommit
- Add clean shutdown support
- govmm/qemu: Let IO/memory reservations be specified for bridge devices
- QMP: Add ExecuteBlockdevAddWithDriverCache
- qemu: Fix iommu_platform for CCW
- qemu: Add credentials to qemu Cmd
- Don't use deprecated 'props' argument to QMP 'object-add'
- Use 'host_device' driver for blockdev backends
- add support for "sandbox" feature to qemu
- qemu: support read-only nvdimm
- Support golang 1.16
- qemu: Consistent parameter building
- qemu: Allow hot-plugging memory devices on PCI bridges
- qemu: Add support for PEF
- qemu: Add support for Secure Execution
- qemu: VhostUserDevice CCW device numbers
- qmp: remove chatty log
- Fix qemu commandline issue with empty romfile
- qemu: add support for tdx-guest object
- qemu: Append memory backend for non-DIMM setups
- qemu: add support for device loaders
- qemu: support QEMU 6
- qmp: Add ro argument for block-device hotplug funcs
- qemu: add arm64 to support list of dimm
- qemu: enable "-pflash"
- qemu: add pvpanic and dump guest memory support
- Add serial ID to blk device
- Make fw_cfg a slice
- contributors: remove CONTRIBUTORS.md file
- misc: Update for new GitHub organisation name
- qemu: add fw_cfg flag to config
- Add qom-get function
- typo fix
- Add support for hot-plugging IBM Adjunct Processor (AP) devices
- github: enable github actions
- travis: Run coveralls after success
- qemu: add iommu_platform knob for qemuParams
- qemu: Add NoReboot config Knob for qemuParams
- Add multidevs option to fsdev
- qemu/qmp: use boolean type for the vhost
- qemu: add IOMMU Device
- Enable Numa support for Power (ppc64le) architecture
- qemu: Add max_ports option to virtio-serial device
- Add rt clock definition for rtc clock in qemu
- qemu: Add microvm machine type support
- qemu: add pmem flag to memory-backend-file
- Refactor code to support multiple virtio transports at runtime
- qemu: Don't set ".cache-size=" when CacheSize is 0
- qemu: Add pcie-root-port device support.
- qmp: Add ExecMemdevAdd and ExecQomSet API
- qmp: add ExecutePCIVhostUserDevAdd and ExecuteChardevDel to hotplug vhost-user device
- s390x: add s390x travis support
- virtio-blk: Add support for share-rw flag
- s390x: dimm not supported
- improve qemu interaction
- qmp: support command 'query-qmp-schema'
- qmp: add checks for the CPU toplogy
- qemu: support x86 SMP die
- Support x-pci-vendor-id and x-pci-device-id pass to qemu
- Support for virtio-blk-ccw
- Allow sharing of memory backend file
- qemu: add migration incoming defer support
- qmp: add virtio-blk multiqueue
- qemu: fix the issue of wrong driver for VirtioBlock
- qemu: use MiB instead of Gib for virtio-fs cache size
- qemu/qmp: re-implement mainLoop
- qemu/qmp: fix readLoop() reuse scanner.Bytes() underlying array problem
- govmm: add VhostUserFS vhost-user device type
- qmp: Conditionally pass threadID and socketID when CPU device add
- Fix travis
- qmp: Add nvdimm support
- qemu: Allow disable-modern option from QMP
- qmp: Output error detail when execute QMP command failed
- Run tests for the s390x build
- Contributors: Add Clare Chen to CONTRIBUTORS.md
- Verify govmm builds on s390x
- Contributors: Add my name
- qemu: Add s390x support
- Update file headers , CONTRIBUTING.md and add CONTRIBUTORS.md
- qmp: fix mem-path properties for hotplug memory.
- qemu: change Context ID for Vsock to uint64
- qemu/qmp: preparation for s390x support
- qemu/qmp: add new function ExecuteBlockdevAddWithCache
- qemu: add support for pidfile option
- qemu: Fix virtio-net-pci QMP command
- qemu: Add support for romfile option
- Update guidelines on security issue reporting
- qemu: Add virtio-balloon device suppport.
- qemu: Show full path to qemu binary at launch time
- qemu: Fix the support of PCIe bridge
- qmp: add ExecuteQueryMigration
- qemu: skip setting system memory if it is set via dimm device
- qmp: add "query-cpus" support
- qemu/qmp: add vfio mediated device support on root bus
- qemu/image: Reduce permissions of .iso creation dir
- qemu/qmp: nic can works without vhost
- qemu: Add rng device .
- qemu/qmp: support query-memory-devices qmp command.
- govmm: modify govmm to be compatible with qemu 2.8
- qemu/qmp: support hotplug a nic whose qdisc is mq
- qmp: Remind users that you must first call ExecuteQMPCapabilities()
- qemu/qmp: Add netdev_add with chardev support
- Add some negative test cases
- qemu: Use the supplied context.Context for launching
- disk: Add --share-rw option for hotplugging disks
- qemu/qmp: add vfio mediated device support
- qemu: Do not try and generate invalid RTC parameters
- qemu/qmp: add addr and bus to hotplug vsock devices
- qemu/qmp: add function for hotplug network by fds
- qemu/qmp: implement functions to hotplug chardevs and serial ports
- qemu: add vhostfd and disable-modern to vsock hotplug
- Add two additional static analysis tools to the travis builds
- qemu/qmp: implement function for hotplug network
- qemu: add vhostfd and disable-modern to vhost-vsock-pci
- qemu/qmp: implement function to hotplug vsock-pci
- Add APIs to enable vm templating
- qemu: Add qemu parameter for PCI address for a bridge.
- Add ability to associate a SCSI controller device with an iothread
- qemu: add initrd support
- qemu: add DisableModern to SCSIController
- qemu: add extra options for the machine type
- scsi: Add function to send device_add qmp command for a scsi device
- Compute coverage statistics for unit tests in Travis builds
- scsi: Add a scsi controller device
- qemu: Add VSOCK support
- Vhost-user: add block device support
- qemu: Add maxcpus attribute to -smp
- Add badges to the README.md file
- Enable Travis builds
- qemu: introduce vhost-user handling
bcce1a19 versions: update Rust to 1.58.1
7c956e0d virtcontainers: Enable initrd for Cloud Hypervisor
aa3fae13 kata-deploy: Fix the tag replacement logic
8cde5413 runtime: introduce static sandbox resource management
13eb1f81 docs: describe vCPU handling when hotplug is unavailable
c3e97a0a config: updates to configuration clh, fc toml template
75ae5361 docs: Update networking details in the architecture doc
fc0e0951 runtime: fix handling container spec's memory limit
7af40fbc docs: Remove docker run, sysctl and docker daemon limitations
17211979 ci: Pass function arguments in static-checks.sh
5643c6dc runtime: update runc and image-spec dependencies
2f37165f govmm: Unite VirtioNet tests
4a428fd1 govmm: readonly=on in s390x blkdev test
79ecebb2 govmm: TestAppendPCIBridgeDevice et al. on !s390x
dc285ab1 govmm: Remove unnecessary comma in iommu_platform
d23f2eb0 govmm: Revert "govmm: s390x: Skip broken tests"
f52ce302 runtime: rectify passing empty options to -ldflags
2d799cbf virtcontainers: clh: Re-generate the client code
7e15e99d versions: Upgrade to Cloud Hypervisor v21.0
9c2f1de1 docs: Remove kata-pkgsync reference
df6ae1e7 osbuilder: Remove libseccomp from Dockerfile
0338fc65 docs: Redirect glossary to the wiki
3924470c workflows: Use base instead of head ref for kata-deploy-test
5ce9011a govmm: s390x: Skip broken tests
8bcaed0b govmm: Adapt license headers to kata-containers
6dd65779 govmm: Ignore govet checks, at least for now
de678a3a govmm: Remove non-relevant top files
ec6655af govmm: Use govmm from our own pkg
8cc088b5 packaging: Remove kata-pkgsync tool
a8b66de5 release: Escape backticks in Libseccomp Notices
c3785f66 workflows: Fix typo in kata-deploy-push action
f4a4c3c7 version: bump to kubernetes 1.23
49223e67 runtime: remove enable_swap option
7a879164 workflows: Ensure a label change re-triggers the actions
d87ab14f workflows: Ensure force-skip-ci skips all actions
5285ac2b runtime: -Wl,--s390-pgste for s390x
fc646434 workflows: Use the correct branch ref on test kata-deploy
e347694f tools: Fix groupname if it differs from username
41e0c414 vendor: update govmm
a5829a29 docs: fix a typo in host-cgroups.md doc
92773170 agent: resolve unused variables in tests
8939b0f8 qemu: add support for SGX
2d0ec00a Qemu: Enable the vcpu-hotplug for arm
e22a4e2a packaging: Make kernel config accessible to guest
adffd3f8 scripts: Use shebang /usr/bin/env bash
e4b7a12b qat: Add Debian to the distro examples
6979d5be osbuilder: Remove gentoo rootfs-builder
22c1a093 osbuilder: Remove suse rootfs-builder
85dd5873 osbuilder: Remove fedora rootfs-builder
06fae29f osbuilder: Remove centos rootfs-builder
01005c5a docs: Remove ccloudvm reference
878ab93c runtime: Provide protection for shared data
ac7acbf8 kata-deploy: validate conf file can be created
7e2bc4d7 packaging: Remove ccloudvm instructions and script
85f5ae19 runtime: close span before return from function in case of error
106df33f libs: add some generated files to .gitignore
b133a236 runtime: it should rollback when failed in Sandbox AddInterface
7f546748 CI: Revert "CI: Switch to a mirror as gnu.org is down"
c486c2ca agent: fix the broken protobuf generation code
f6cdf464 docs: Default machine type is q35 meanwhile
b48322d4 packaging: Remove obs packages testing for kata 2.0
ad16d75c runtime: Remove docker comments for kata 2.0 configuration.tomls
905e124b docs: fix agent proto file path
ea1a1738 agent: fix the issue of creating new namespaces for agent
b17f0739 qemu: update readonly flag for block devices
b5b9de1d kata-deploy: Update API Version of RuntimeClass to v1
f971801b qemu: only set wait parameter for server mode socket based char device
82cc01d2 qemu: Fix 32 bit int overflow in test file
1d1a2313 qemu: Add support for legacy serial device
9a2bbeda qemu: Remove -realtime in favor of -overcommit
fe83c208 qemu: Add support for --no-shutdown Knob
1ed52714 qmp: wait for POWERDOWN event in ExecuteSystemPowerdown()
de039da2 govmm/qemu: Let IO/memory reservations be specified for bridge devices
5c7998db QMP: Add ExecuteBlockdevAddWithDriverCache
3a9a6749 qemu: Add credentials to qemu Cmd
d27256f8 qmp: Don't use deprecated 'props' field for object-add
d8cdf9aa qemu: Drop support for versions older than 5.0
18352c36 qemu: Fix iommu_platform for vhost user CCW
1b021929 Use 'host_device' driver for blockdev backends
9518675e add support for "sandbox" feature to qemu
335fa816 qemu: fix golangci-lint errors
61b63787 .github/workflows: reimplement github actions CI
9d6e7970 go: support go modules
0d21263a qemu: support read-only nvdimm
ff34d283 qemu: Consistent parameter building
0e19ffb6 qemu: Allow hot-plugging memory devices on PCI bridges
c135681d qemu: Add support for PEF
03b55ea5 qemu: Add support for Secure Execution
7a367dc0 qemu: Simplify (Object).Valid()
a6cec2d3 qemu: add support for SevGuest object
abd3c7ea qemu: VhostUserDevice CCW device numbers
3eaeda7f qemu: Refactor vhostuserDev.QemuParams
511cf58b Fix qemu commandline issue with empty romfile
b3eac95b qmp: remove frequent, chatty log
31418940 qemu: add support for tdx-guest object
4b136f3f qemu: Append memory backend for non-DIMM setups
6213dea4 qemu: support QEMU 6
0d47025d qemu: add support for device loaders
e2eb549f qmp: Add ro argument for block-device hotplug funcs
0592c825 qemu: add arm64 to support list of dimm
2079c15c qemu: enable "-pflash"
b8cd7059 qmp: add dump-guest-memory support
d7836877 qemu: add pvpanic device to get GUEST_PANICKED event
43d774d2 Add serial to blk device
8cb8b24c Make fw_cfg a slice
cb0d3391 contributors: remove CONTRIBUTORS.md file
29ba5a90 qemu: add fw_cfg flag to config
9f309c2a misc: Update for new GitHub organisation name
3d46d08a Add qom-get function
39c372a2 Add support for hot-plugging IBM VFIO-AP devices
f5bdd53c travis: disable amd64 jobs
1af1c0d7 github: enable github actions
4831c6e0 travis: Run coveralls after success
cf0f05d2 qemu: add iommu_platform knob for qemuParams
6645baf2 qemu: Add NoReboot config Knob for qemuParams
abca6f3c Add multidevs option to fsdev
cc538766 qemu/qmp: use boolean type for the vhost
e57e86e2 qemu: add IOMMU Device
b2aa0225 Enable Numa support for Power (ppc64le) architecture
29529a5d Add rt clock definition for rtc clock in qemu
0e98b613 qemu: Add max_ports option to virtio-serial device
787c86b7 qemu: Add microvm machine type support
5378725f qemu: add pmem flag to memory-backend-file
3700c55d qemu: add block device readonly support
88a25a2d Refactor code to support multiple virtio transports at runtime
2ee53b00 qemu: Don't set ".cache-size=" when CacheSize is 0
f1252f6e qemu: Add pcie-root-port device support.
6667f4e9 qmp_test: Add TestExecMemdevAdd and TestExecQomSet
201fd0ae qmp: Add ExecMemdevAdd and ExecQomSet API
e04be2cc qmp: add ExecutePCIVhostUserDevAdd API
13aeba09 qmp: support command 'chardev-remove'
6d6b2d88 s390x: add s390x travis support
175ac499 typo fix
cb9f640b virtio-blk: Add support for share-rw flag
9463486d s390x: dimm not supported
164bd8cd test/fmt: drop extra newlines
73555a40 qmp: add query-status API
234e0edf qemu: fix memory prealloc handling
30bfcaaa qemu: add debug logfile
79e0d533 qmp: support command 'query-qmp-schema'
68cdf64f test: add cpu topology tests
e0cf9d5c qmp: add checks for the CPU toplogy
a5c11908 qemu: support x86 SMP die
8fd28e23 Support x-pci-vendor-id and x-pci-device-id pass to qemu
713d0d94 s390x: add virtio-blk-ccw type
65cc343f test: add devno in the tests for s390x
9cf98da0 s390x: add devno support
0c900f59 Allow sharing of memory backend file
f695ddf8 qemu: add migration incoming defer support
f0f18dd0 qmp: add virtio-blk multiqueue
7d3deea4 qemu: Add a virtio-blk-pci device driver support
058cda06 qemu: use MiB instead of Gib for virtio-fs cache size
694a7b1c qemu/qmp: re-implement mainLoop
5712b119 qemu/qmp: fix readLoop() reuse scanner.Bytes() underlying array problem
3c84b1da govmm: add VhostUserFS vhost-user device type
4692f6b9 qmp: Conditionally pass threadID and socketID when CPU device add
1f51b438 Update the versions of Go used to build GoVMM
ad310f9f Fix staticcheck S1023
932fdc7f Fix staticcheck S1023
cb2ce933 Fix staticcheck S1008
f0172cd2 Fix staticcheck (S1002)
5f2e630b Fix staticcheck (S1025)
4beea513 Fix staticcheck (ST1005) errors
97fc3435 contributors: add my name
c891f5f8 qmp: Add nvdimm support
f9b31c0f qemu: Allow disable-modern option from QMP
d6173077 Run tests for the s390x build
b36b5a8f Contributors: Add Clare Chen to CONTRIBUTORS.md
b41939c6 Contributors: Add my name
dab4cf1d qmp: Add tests
5ea6da14 Verify govmm builds on s390x
ee75813a contributors: add my name
c80fc3b1 qemu: Add s390x support
ca477a18 Update source file headers
e68e0056 Update the CONTRIBUTING.md
2b7db547 Add the CONTRIBUTORS.md file
b3b765cb qemu: test Valid for Vsock for Context ID
3becff5f qemu: change of ContextID from uint32 to uint64
f30fd135 qmp: Output error detail when execute QMP command failed
7da6a4c7 qmp: fix mem-path properties for hotplug memory.
e4892e33 qemu/qmp: preparation for s390x support
110d2fa0 qemu/qmp: add new function ExecuteBlockdevAddWithCache
a0b0c86e qmp_test: Change QMP version from 2.6 to 2.9
10c36a13 qemu: add support for pidfile option
9c819db5 qemu: Fix virtio-net-pci QMP command
7fdfc6a4 qemu: Add support for romfile option
e74de3c7 Update guidelines on security issue reporting
ec83abe6 qemu: Add virtio-balloon device suppport.
46970781 qemu: Show full path to qemu binary at launch time
ef725050 qemu: Fix the support of PCIe bridge
56f645ea qmp: add ExecuteQueryMigration
a429677a govmm: fix memory prealloc
1130aab8 qmp: add "query-cpus" support
de5d2788 qemu/qmp: add vfio mediated device support on root bus
de00d7a6 qemu/image: Reduce permissions of .iso creation dir
1a1fee75 qemu/qmp: nic can works without vhost
6c3d84ea qemu: Add virtio RNG device.
b16291cf qemu/qmp: support query-memory-devices qmp command.
ce070d11 govmm: modify govmm to be compatible with qemu 2.8
0286ff9e qemu/qmp: support hotplug a nic whose qdisc is mq
8515ae48 qmp: Remind users that you must first call ExecuteQMPCapabilities()
21504d31 qemu/qmp: Add netdev_add with chardev support
ed34f616 Add some negative test cases for qmp.go
17cacc72 Add negative test cases for qemu.go
2706a07b qemu: Use the supplied context.Context for launching
e46092e0 qemu: Do not try and generate invalid RTC parameters
fcaf61dc qemu/qmp: add vfio mediated device support
4461c459 disk: Add --share-rw option for hotplugging disks
68519998 qemu/qmp: add addr and bus to hotplug vsock devices
10efa841 qemu/qmp: add function for hotplug network by fds
80ed88ed qemu/qmp: implement function to hotplug serial ports
ca46f21f qemu/qmp: implement function to hotplug character devices
03f1a1c3 qemu/qmp: implement getfd
84b212f1 qemu: add vhostfd and disable-modern to vsock hotplug
12dfa872 qemu/qmp: implement function for hotplug network
3830b441 qemu: add vhostfd and disable-modern to vhost-vsock-pci
f700a97b qemu/qmp: implement function to hotplug vsock-pci
4ca232ec qmp_test: Fix Warning and Error level logs
430e72c6 qemu,qmp: Enable gas security checker
ffc06e6b qemu,qmp: Add staticcheck to travis and fix errors
54caf781 qmp: add hotplug memory
e66a9b48 qemu: add appendMemoryKnobs helper
8aeca153 qmp: add migrate set arguments
a03d4968 qmp: add set migration capabilities
0ace4176 qemu: allow to set migration incoming
723bc5f3 qemu: allow to create a stopped guest
283d7df9 qemu: add file backed memory device support
30aeacb8 qemu: Add qemu parameter for PCI address for a bridge.
9130f375 scsi: Allow scsi controller to associate with an IO thread.
a54de183 iothread: Add ability to configure iothreads
0c0ec8f3 qemu: add initrd support
68f30718 qemu: add DisableModern to SCSIController
693d9548 qemu: add options for the machine type
3273aafd scsi: Add function to send device_add qmp command for a scsi device
6d198b8a Compute coverage statistics for unit tests in Travis builds
3a31da32 scsi: Add a scsi controller device
5316779d qemu: Add VSOCK support
f5655366 vhost-user: add blk device support
e9e27673 vhost-user: updating comments for accuracy, rename device field
8fe57236 qemu: Add maxcpus attribute to -smp
3baa7765 Add badges to the README.md file
d74e3b66 Fix errcheck failures in the unit tests
db60e32f Enable Travis builds
9cb47fc0 Add .gitignore file.
a8aaf534 Add project documentation
57aafb56 Remove all references to and dependencies on ciao
27709fce Move files to the qemu folder
48feb29f qemu: introduce vhost-user handling
b8ddd244 qemu: Add function to list hotpluggable CPUs
8c428ed7 qemu: Add function to hotplug CPUs
24b14059 qemu: Add functions to process QMP response
e39da6ca qmp: Add support for hot plugging VFIO devices on PCI(E) bridges
bc030d13 qemu: Add a SysProcAttr parameter to CreateCloudInitISO
11977072 qemu: Add a SysProcAttr parameter to LaunchCustomQemu
b639da45 qemu: Add function to hotplug vfio device
7e5614b8 Networking: Add vhost fd support
14316ce0 qemu/qmp: Implement function to hot plug PCI devices
83485dc9 qemu: Implement Bridge struct
cfa8a995 Networking: Add support for handling macvtap interfaces
83126d3e bios: add support for custom bios
3da2ef9d QEMU: Knobs: Huge Page Support: Add support for huge pages
9bfa7927 vfio: Add ability to pass VFIO devices to qemu
a70ffd19 Build: Fix the build after repo move.
0c206170 Knobs: Modify the behaviour of the Mlock knob.
ddee41d5 QEMU: Enable realtime options
4ecb9de5 qemu: Add support for memory pre-allocation
1fbe6c5d qmp: Update block device deletion for newer versions of qemu
e74aeef1 qemu: Add disable-modern option for virtio devices
8d617ff5 qemu: Update virtio-net-pci command line
25a2dc8f qemu: Update blockdev-add qmp command to support newer qemu versions
d4f77103 misc: Remove some of the code flagged by unused linter
a1600dc1 misc: Remove unused fields identified by structcheck
58a835e6 misc: Remove unused variables identified by varcheck
d48b5b5f qemu: Add PCI option to the NetDevice
a84228ae qemu: Document how cancelling works.
1e7202a5 qemu: Fix spelling error in qmp_test.go
c6f33453 qemu: Fix command cancelling.
a8a798b0 qemu, ciao-launcher: Move ConfigDrive ISO creation code to qemu
30cf1163 Add missing bus parameter for a CharDevice
2aa5f5a3 qemu: Add support for serial port addition
6fe338d6 qemu: Support creating multiple QMP sockets
992b861e qemu: Add the daemonize qemu option to the Knobs structure
997cb233 qemu: Remove dead code
e555f565 qemu: Add support for socket based consoles
eae8fae0 qemu: Fix security model typo
db067857 qemu: Make Config's FDs field private
12f6ebe3 qemu: Embed the qemu parameters into the Config structure
e193a77b qemu: Add support for block devices
3908185c qemu: Add MACVTAP support
6d7dfa04 qemu: Get rid of the Driver structure
cc9cb33a qemu: Add QMPSocket specific type
2d736d71 qemu: Add RTC specific types
e543c338 qemu: Probe each qemu device with a driver
eda8607c qemu: Add netdev options to the Device structure
4780e237 qemu: Add multi-queue and vhost definitions to NetDevice
137e7c72 qemu: Add a NetDevice slice to the Config structure
c0e2aaca qemu: Add one unit test for the Config strings
5ba8ef79 qemu: Add QMP socket unit tests
7b2f7eb5 qemu: Add Memory and SMP unit tests
2ea9b9a3 qemu: Add a Kernel unit test
8e495f6e qemu: Add a Knobs unit test
8aeb3d45 qemu: Add an Object unit test
38e041dc qemu: Add Device unit tests
54d32c24 qemu: Add parameters adding unit tests
ebfa382d qemu: Add a Knobs field to the Config structure
fe1bdcd2 qemu: Remove the extra parameters field from the Config structure
15bce61a qemu: Group all machine configurations into one structure
d94b5af8 qemu: Add a VGA parameter field to the Config structure
4892d041 qemu: Add a Global parameter field to the Config structure
612a5a9e qemu: Add a RTC field to the Config structure
c63ec096 qemu: Add a SMP field to the Config structure
7cf386a8 qemu: Add a Memory field to the Config structure
b198bc67 qemu: Add a UUID field to the Config structure
6239e846 qemu: Add a Character Devices slice field to the Config structure
73e2d53c qemu: Add a Filesystem Devices slice field to the Config structure
518ba627 qemu: Add a Kernel field to the Config structure
b973bc59 qemu: Add an Object slice field to the Config structure
8744dfe8 qemu: Add a Device slice field to the Config structure
5458de70 qemu: Add a QMP socket field to the Config structure
17118270 qemu: Add qemu's name to the Config structure
37a1f500 qemu: Add configuration structure to simplify LaunchQemu
5ccbaf2b ciao-launcher, qemu: Upgrade to new context package.
f5720198 qemu: Use null QMP logger when the logger parameter is nil
7d4199a4 qemu: Fix ineffassign error
7f50a415 qemu: Fix a silly bug in LaunchQemu
fc6bf8cf qemu: Add package documentation
306f54a9 ciao-launcher, qemu: Move launchQemu to qemu
344aa22b qemu: Add the qemu package
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
The commit message of a revert commit usually generated by
`git revert`, we should consider this as legal.
Consider the commit as the merge commit if the subject
starts with 'Reject "'
Follow the pr kata-containers/tests/#3938, the suttle diffrence
is we skip all commit checks for revert commit including fixes checking
and subsystem checking. Because the commit was reverted must have passed
the check so the revert-commit should have the Fixes and Subsystem.
Fixes: #3568Fixes: kata-containers/tests#3934
Signed-off-by: Tim Zhang <tim@hyper.sh>
When building a non-stable release, the tag is **always** "latest¨,
instead of the version. The same magic done for setting the correct
tags up should be done for replacing the tag on the kata-deploy and
kata-cleanup yaml files, as part of the kata-deploy test.
Fixes: #3559
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There are software and hardware architectures which do not support
dynamically adjusting the CPU and memory resources associated with a
sandbox. For these, today, they rely on "default CPU" and "default
memory" configuration options for the runtime, either set by annotation
or by the configuration toml on disk.
In the case of a single container (launched by ctr, or something like
"docker run"), we could allow for sizing the VM correctly, since all of
the information is already available to us at creation time.
In the sandbox / pod container case, it is possible for the upper layer
container runtime (ie, containerd or crio) could send a specific
annotation indicating the total workload resource requirements
associated with the sandbox creation request.
In the case of sizing information not being provided, we will follow
same behavior as today: start the VM with (just) the default CPU/memory.
If this information is provided, we'll track this as Workload specific
resources, and track default sizing information as Base resources. We
will update the hypervisor configuration to utilize Base+Workload
resources, thus starting the VM with the appropriate amount of CPU and
memory.
In this scenario (we start the VM with the "right" amount of
CPU/Memory), we do not want to update the VM resources when containers
are added, or adjusted in size.
This functionality is introduced behind a configuration flag,
`static_sandbox_resource_mgmt`. This is defaulted to false for all
configurations except Firecracker, which is set to true.
This'll greatly improve UX for folks who are utilizing
Kata with a VMM or hardware architecture that doesn't support hotplug.
Note, users will still be unable to do in place vertical pod autoscaling
or other dynamic container/pod sizing with this enabled.
Fixes: #3264
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Describe the static_sandbox_resource_mgmt flag, and how this applies to
configurations that do not utilize hotplug.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Add the POD metadata we get from the container manager to the metrics by
adding more labels.
Fixes: #3551
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Kata-monitor detects started and terminated kata pods by monitoring the
vc/sbs fs (this makes sense since we will have to access that path to
access the sockets there to get the metrics from the shim).
While kata-monitor updates its sandbox cache based on the sbs fs events,
it will schedule also a sync with the container manager via the CRI in
order to sync the list of sandboxes there.
The container manager will be the ultimate source of truth, so we will
stick with the response from the container manager, removing the
sandboxes not reported from the container manager.
May happen anyway that when we check the container manager, the new kata
pod is not reported yet, and we will remove it from the kata-monitor pod
cache. If we don't get any new kata pod added or removed, we will not
check with the container manager again, missing reporting metrics about
that kata pod.
Let's stick with the sbs fs as the source of truth: we will update the
cache just following what happens on the sbs fs.
At this point we may have also decided to drop the container manager
connection... better instead to keep it in order to get the kube pod
metadata from it, i.e., the kube UID, Name and Namespace associated with
the sandbox.
Every time we get a new sandbox from the sbs fs we will try to retrieve the
pod metadata associated with it.
Right now we just attach the container manager sandbox id as a label to
the exposed metrics, making hard to link the metrics to the running pod
in the kubernetes cluster.
With kubernetes pod metadata we will be able to add them as labels to map
explicitly the metrics to the kubernetes workloads.
Fixes: #3550
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
We currently WARN about unexpected fs events, which includes CHMOD
operations (which should be actually expected...).
Just ignore all the fs events we don't care about without any warn.
We dump all the events with debug log in any case.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Improve debug log formatting of the sandbox cache update process.
Move raw and tracing logs from the DEBUG to the TRACE log level.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Updated the doc to clarify certain networking details and
external links to some of the networking terms used.
Fixes#3308
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
The OCI container spec specifies a limit of -1 signifies
unlimited memory. Update the sandbox memory calculator
to reflect this part of the spec.
Fixes: #3512
Signed-off-by: Braden Rayhorn <bradenrayhorn@fastmail.com>
This PR removes the docker run and sysctl limitation reference
for kata 2.0 as well as docker daemon limitation as currently
for kata we are not supporting docker and this reference belonged
to kata 1.0
Fixes#3545
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
1. The hook.args[0] is the hook binary name which shouldn't be included
in the Command.args.
2. Add new unit tests
Fixes: #2610
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
no explicit PCI test, just switch path depending on architecture
(CCW for s390x, PCI for others). Also fixes an unknown variable error.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
When no options are passed to -ldflags, it passes
incorrect values(in this case, $BUILDFLAGS) to it.
Fix passing empty values by passing $KATA_LDFLAGS
in quotes.
Fixes: #3521
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
Now that kata-pkgsync has been removed, this PR removes the reference
in the documentation.
Fixes#3513
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Remove the libseccomp package from Dockerfile of `alpine` and `clearlinux`
because the libseccomp library is installed by the `ci/install_libseccomp.sh`
script when building the kata-agent.
Fixes: #3508
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Although I've done tests on my own fork using `head_ref` and those
worked, it seems those only worked as the PR was coming from exactly the
same repository as the target one.
Let's switch to base_ref, instead, which we for sure have as part of our
repo.
The downside of this is that we run the test with the last merged PR,
rather than with the "to-be-approved" PR, but that's a limitation we've
always had.
Fixes: #3482
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
For now a bunch of tests are simply not working.
Let's skip them all, and re-enable them once
kata-containers/kata-containers/issues/3500 gets fixed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Both projects follow the same license, Apache-2.0, but the header saying
that comes from govmm is different from the one expected for the tests
present on the kata-containers repo.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
govet checks have been ignored on govmm repo, but those are enabled on
kata-containers one. So, in order to avoid failing our CIs let's just
keep ignoring the checks for the govmm structs and have an issue opened
for fixing it whenever someone has cycles to do it.
The important bit here is, we're not making anything worse that it
already is. :-)
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
govmm, from now on, should follow the same guidelines from contributing,
copying, and etc as kata-containers does.
The go.mod is not needed anymore as the project lives inside the
runtime.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's stop using govmm from kata-containers/govmm and let's start using
it from our own repo.
Fixes: #3495
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR removes the kata-pkgsync tool that is mainly used for OBS
packages, currently for kata 2.0 we do not have OBS packages and
this tool is not being used for kata 2.0
Fixes#3493
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Escape (with backslash) backticks (`) to prevent them from being
evaluated by the shell.
Fixes: #3487
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Current latest release is 1.23.1. Let's update to this version for our
integration testing.
Fixes: #3477
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
`enable_swap` option was added long time ago to add
`-realtime mlock=off` to the QEMU's command line.
Kata now supports QEMU 6, `-realtime` option has been deprecated and
`mlock=on` is causing unexpected behaviors in kata.
This patch removes support for `enable_swap`, `-realtime` and `mlock=`
since they are causing bugs in kata.
Signed-off-by: Julio Montes <julio.montes@intel.com>
This is needed in order to ensure that, for instance, if `force-skip-ci`
label is either added or removed later, the jobs related to the actions
will be restarted and accordingly checked.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Before this change it was only applied to the static-checks, but if
we're already taking the extreme path of skipping the CI, we better
ensure we skip all the actions and not just a few of them.
Fixes: #3471
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The action used for testing kata-deploy is entirely based on the action
used to build the kata-deploy tarball, but while the latter is able to
use the correct branch, the former always uses `main`.
This happens as the `issue_comment`, from GitHub actions, passed the
"default branch" as the GITHUB_REF.
As we're not the first ones to face such a issue, I've decided to take
one of the approaches suggested at one of the checkout's issues,
https://github.com/actions/checkout/issues/331, and take advantage of a
new action provided by the community, which will get the PR where the
comment was made, give us that ref, and that then can be used with the
checkout action, resulting on what we originally wanted.
Fixes: #3443
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The script `tools/packaging/static-build/qemu/build-base-qemu.sh`
previously failed on systems where the user's groupname differs from the
username
Fixes: #3461
Signed-off-by: Sebastian Hasler <sebastian.hasler@stuvus.uni-stuttgart.de>
bring SGX support and other fixes
shortlog:
8939b0f qemu: add support for SGX
b17f073 qemu: update readonly flag for block devices
f971801 qemu: only set wait parameter for server mode socket based
char device
82cc01d qemu: Fix 32 bit int overflow in test file
1d1a231 qemu: Add support for legacy serial device
9a2bbed qemu: Remove -realtime in favor of -overcommit
fe83c20 qemu: Add support for --no-shutdown Knob
1ed5271 qmp: wait for POWERDOWN event in ExecuteSystemPowerdown()
fixes#3080
Signed-off-by: Julio Montes <julio.montes@intel.com>
Provide the `/proc/config.gz` file in guest kernels that allow the guest
to determine the kernel configuration used to build the running kernel.
Note that since `gunzip` expects to rename the gzip'ed file it operates
on, to use this feature you need to run something like the following in
the container environment:
```bash
# cat /proc/config.gz|gunzip -c
```
Fixes: #3445.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Debian is a supported rootfs that uses systemd as init, thus, it should
be mentioned in the QAT README document.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the gentoo rootfs is not tested in our CI, we can't guarantee it
actually works as expected.
Whenever we have someone willing to maintain this rootfs we can have it
added back, and also add a CI job to test it altogether, avoiding then
any possible regression.
Fixes: #2144
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the suse rootfs is not tested in our CI, we can't guarantee it
actually works as expected.
Whenver we have someone willing to maintain this rootfs we can have it
added back, and also add a CI job to test it altogether, avoiding then
any possible regression.
Fixes: #2145
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the fedora rootfs is not tested in our CI, we can't guarantee it
actually works as expected.
Whenever we have someone willing to maintain the rootfs we can have it
added back, and also add a CI job to test it altogether, avoiding then
any possible regression.
Fixes: #2143
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
As the centos rootfs is not tested in our CI, we can't guarantee it
actually works as expected.
Whenever we have someone willing to maintain the rootfs we can have it
added back, and also add a CI job to test it altogether, avoiding then
any possible regression.
Fixes: #2140
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This PR removes the ccloudvm reference at the README document as the
setup of scripts of ccloudvm were removed.
Fixes#3448
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes ccloudvm for kata 2.0, ccloudvm was used in kata 1.x
and we are not longer using it for kata 2.0.
Fixes#3427
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Generated protocols files should not be inclued in Git repo.
And also add Cargo.lock in oci/protocols directory to .gitignore.
Fixes: #3422
Signed-off-by: bin <bin@hyper.sh>
When Sandbox AddInterface() is called, it may fail after endpoint.HotAttach,
we'd better rollback and call save() in the end.
Fixes: #3419
Signed-off-by: yangfeiyu <yangfeiyu20102011@163.com>
This reverts commit 321995b7df.
Now that gnu.org is back online, we don't need to use a mirror.
Fixes: #3313.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
After the protocols are moved to upper libs (PR3355),
the runtime protocol generation is broken. This fixes it.
Fixes: #3414
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR removes the scripts and the dockerfiles that were used in kata 1.x
to test the different kata components for different distributions in OBS.
Currently for kata 2.0 we are not generating packages in OBS so these scripts
are not longer being used.
Fixes#3404
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
This PR removes the reference of how to use disable_new_netns
configuration with docker as for kata 2.0 we are not supporting docker
and this information was used for kata 1.x
Fixes#3400
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
- kata-deploy: fix tar command in dockerfile
- vendor: update to containerd v1.6.0-beta.4
- versions: Upgrade to Cloud Hypervisor v20.2
- vc: remove swagger binary
- agent: Refactor command line parsing to use a framework
- move the oci and protocols crates from agent to upper libs
- docs: Remove word duplication
- osbuilder: Restore Debian as a rootfs
- runtime: fix a typo in kata-collect-data.sh
- agent: return detail error message for RPC calls from shim
- use-cases: clarify SPDK vhost-user-nvme target status in using-spdk-v…
- Delint dockerfiles
- Makefile: update `make go-test` call
- docs: add how-to on DinD in Kata
- agent: Ignore unknown seccomp system calls
- agent: mount: Remove unneeded mount_point local variable
- docs: Fix outdated links
- docs: Fix kernel configs README spelling errors
- security: Update rust crate versions
- kata-manager: Retrieve static tarball
- osbuilder: avoid to copy versions.txt which already deprecated
- qemu: Disable libudev for QEMU 5.2 and newer
- osbuilder: Add protoc to the alpine container
- docs: Clarify where to run agent API generation commands
- packaging/qemu: partial git clone
- docs: Fix arch doc formatting
- CI: Switch to a mirror as gnu.org is down
- Split architecture doc into separate files
- docs: Update the stable branch strategy
- tracing: Add span name to logging error
- docs: Update code PR advice document
- agent: Add config file option to cli
- update container type handling
- docs: Update architecture document
- runtime: update golang to 1.16 and remove ioutil package
- kata-deploy: Deal with empty containerd conf file
- src: reorg source code directory
- osbuilder: show usage if no options/arguments specified
- Upgrade to Cloud Hypervisor v20.1
- image_build: add help info for '-f' option and 'BLOCK_SIZE' env.
- osbuilder: be runtime consistent with podman build
- osbuilder: Revert to using apk.static for Alpine
- runtime/template: Handling new attributes for hypervisor config
- docs: fix check-markdown test
- runtime: correct span name for stopSandbox function
- runtime: only call stopVirtiofsd when shared_fs is virtio-fs
- snap: read initrd and image distros from version.yaml
- versions: Use Ubuntu initrd for non-musl archs
- packaging: Fix missing commit message in building kata-runtime
- virtcontainers: clh: Upgrade to openapi-generator v5.3.0
- agent: user container ID as watchable storage key for hashmap
- runtime: enable vhost-net for rootless hypervisor
- packaging: add help information for '-f' option in install_go.sh
- Cleanup some unused variables, definitions
- Upgrade to Cloud Hypervisor v20.0
- docs: Update limitation document regarding docker swarm
- runtime: Enable FUSE_DAX kernel config for DAX
- agent: copy empty directories for watchable-bind mounts
- runtime: Update comments for virtcontainers to use kata 2.0
- Update rust crate versions
- osbuilder: Remove debian as a rootfs
e2c1e65e kata-deploy: fix tar command in dockerfile
615224e9 agent: move the protocols to upper libs
330e3dcc agent: move the oci crate to upper libs
7b03d78f vendor: update to containerd v1.6.0-beta.4
1f581a04 versions: Upgrade to Cloud Hypervisor v20.2
623d8f08 docs: Remove word duplication
1c4edb96 agent: Refactor arg parsing to use clap
3093f93a osbuilder: Restore Debian as a rootfs
073a3459 use-cases: clarify vhost-user-nvme status in using-spdk-vhost-user
2254fa86 runtime: fix a typo in kata-collect-data.sh
2d0f9d2d vc: remove swagger binary
cf91307c agent: return detail error message for rpc calls from shim
137e217b docs: Fix outdated k8s link
55bac67a docs: Fix kernel configs README spelling errors
205420d2 docs: Replicate branch rename on runtime-spec
91abebf9 agent: mount: Remove unneeded mount_point local variable
b1f4e945 security: Update rust crate versions
d79268ac tools/packaging: add copyright to kata-monitor's Dockerfile
428cf0a6 packaging: delint tests dockerfiles
1ea9b703 packaging: delint kata-deploy dockerfiles
3669e1b6 ci/openshift-ci: delint dockerfiles
aeb2b673 osbuilder: delint dockerfiles
bc120289 packaging: delint kata-monitor dockerfiles
bc71dd58 packaging: delint static-build dockerfiles
99ef52a3 osbuilder: Add protoc to the alpine container
c2578cd9 docs: Clarify where to run agent API generation commands
321995b7 CI: Switch to a mirror as gnu.org is down
fb1989b2 docs: Fix arch doc formatting
2938bb7f packaging/qemu: Use QEMU script to update submodules
5d49ccd6 packaging/qemu: Use partial git clone
87a219a1 docs: Update the stable branch strategy
d1bc409d osbuilder: avoid to copy versions.txt which already deprecated
1653dd4a tracing: Add span name to logging error
12c8e41c qemu: Disable libudev for QEMU 5.2 and newer
233015a6 docs: Split guest assets details out of arch doc
db411c23 docs: Split k8s info out of arch doc
7ac619b2 docs: Split networking out of arch doc
5df0cb64 docs: Split storage out of arch doc
7229b7a6 docs: Split background and example out of arch doc
283d7d52 docs: Split history out of arch doc
6f9efb40 docs: Move arch doc to separate directory
02608e13 docs: Update code PR advice document
cb5c948a kata-manager: Retrieve static tarball
51bf9807 docs: Update architecture document
f3a97e94 docs: add how-to on Docker in Kata
7a989a83 runtime: api-test: fixup
52f79aef utils: update container type handling
5b002f3c docs: change io/ioutil to io/os packages
03546f75 runtime: change io/ioutil to io/os packages
24a530ce versions: bump minimum golang version to 1.16.10
7c4263b3 src: reorg source directories
1a34fbcd agent: Add config file option to cli
bbfb10e1 versions: Upgrade to Cloud Hypervisor v20.1
84571506 kata-deploy: Deal with empty containerd conf file
3f7cf7ae osbuilder: show usage if no options/arguments specified
2ebaaac7 osbuilder: be runtime consistent also with podman build
f3103696 docs: fix check-markdown test
2204ecac versions: Upgrade Alpine, using minor version
dfd0732f osbuilder: Revert to using apk.static for Alpine
6b3e4c21 image_build: add help info for '-f' option and 'BLOCK_SIZE' env.
b92babf9 runtime/template: Handling new attributes for hypervisor config
40bd34ca runtime: only call stopVirtiofsd when shared_fs is virtio-fs
33f343ee runtime: correct span name for stopSandbox function
d7cc952c versions: Use Ubuntu initrd for non-musl archs
ff929fc0 snap: read initrd and image distros from version.yaml
8fae2631 packaging: Fix missing commit message in building kata-runtime
99530026 virtcontainers: clh: Upgrade to openapi-generator v5.3.0
b3bcb7b2 runtime: enable vhost-net for rootless hypervisor
7cb7b9d5 agent: remove unused field in mount handling
f6ae1582 agent: drop unused fields from network
4756a04b virtcontainers: clh: Re-generate the client code
0bf4d257 versions: Upgrade to Cloud Hypervisor v20.0
647082b2 docs: Update limitation document regarding docker swarm
39b35d00 agent: user container ID as watchable storage key for hashmap
1e6f58e5 packaging: add help information for '-f' option in install_go.sh
2af95bc5 agent: create directories for watchable-bind mounts
6105e3ee runtime: enable FUSE_DAX kernel config for DAX
591d4af1 runtime: Update comments for virtcontainers to use kata 2.0
923e098d osbuilder: Remove debian as a rootfs
afb96c00 agent: Wrap remaining nix errors with anyhow
aba572e0 rustjail: Wrap remaining nix errors with anyhow
30d60078 uevent: Fix clippy issue in test code
4a2be13c agent: Upgrade nix version for security fix
256d5008 agent: Update crate versions
13257986 agent-ctl: Update rust lockfile
4ebdd424 forwarder: Update rust lockfile
6007322d agent: Fixed invalid error message
7b356151 agent: Log unknown seccomp system calls
7304e52a Makefile: update `make go-test` call
c66b5668 agent: Ignore unknown seccomp system calls
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
move the protocols to upper libs thus it can
be shared between agent and other rust runtime.
Depends-on: github.com/kata-containers/tests#4306
Fixes: #3348
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Move the oci crate to upper libs thus it can be
shared between agent and other rust runtimes.
Fixes: #3348
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Update our containerd vendoring. In particular, we're interested in
grabbing the updated annotation definitions for defining sandbox sizing.
- go get github.com/containerd/containerd@v1.6.0-beta.4
- edit go.mod to remove containerd v1.5.8 replacement directive
- go mod vendor
- go mod tidy
Fixes: #3276
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This is a bug release from Cloud Hypervisor addressing the following
issues: 1) Don't error out when setting up the SIGWINCH handler (for
console resize) when this fails due to older kernel; 2) Seccomp rules
were refined to remove syscalls that are now unused; 3) Fix reboot on
older host kernels when SIGWINCH handler was not initialised; 4) Fix
virtio-vsock blocking issue.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v20.2Fixes: #3383
Signed-off-by: Bo Chen <chen.bo@intel.com>
Restore Debian as a rootfs.
1. revert of #3154, but some change
2. update debian version to 10.11
3. update `libstdc++-6-dev` to `libstdc++-8-dev`
4. changes discarded in QAT are not restored
Fixes: #3372
Signed-off-by: zhaojizhuang <571130360@qq.com>
SPDK vhost-user-nvme target is removed from SPDK 21.07 release since
upstreamed QEMU version does not support. Fixes this usage.
Fixes#3371
Signed-off-by: Ziye Yang <ziye.yang@intel.com>
The tokio's spawn will only create an future async task
instead of a new real thread, thus executing unshare to
create a new namespace in tokio's async task would make
the agent process to join in the new created namespace,
which isn't expected.
Thus, we'd better to to the unshare in a real thread to
prevent moving the agent process into a new namespace.
Fixes: #3369
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
For calls from shim to agent, the return error will be processed like this:
match self.do_start_container(req).await {
Err(e) => Err(ttrpc_error(ttrpc::Code::INTERNAL, e.to_string())),
Ok(_) => Ok(Empty::new()),
}
The e.to_string() return only a part of the error(for example set by context()),
this may lead lack of information.
The `format!("{:?}", err)` will return more info.
Fixes: #3353
Signed-off-by: bin <bin@hyper.sh>
We already have a `mount_path` local Path variable which holds the mount
point.
Use it instead of creating a new `mount_point` variable with identical
type and content.
Fixes: #3332
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
The kata-monitor's Dockerfile was added by Eric Ernst on commit 2f1cb7995f
but for some reason the static checker did not catch the file misses the copyright statement
at the time it was added. But it is now complaining about it. So this assign the copyright to
him to make the static-checker happy.
Fixes#3329
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Removed all errors/warnings pointed out by hadolint version 2.7.0, except for the following
ignored rules:
- "DL3008 warning: Pin versions in apt get install"
- "DL3041 warning: Specify version with `dnf install -y <package>-<version>`"
- "DL3033 warning: Specify version with `yum install -y <package>-<version>`"
- "DL3048 style: Invalid label key"
- "DL3003 warning: Use WORKDIR to switch to a directory"
- "DL3018 warning: Pin versions in apk add. Instead of apk add <package> use apk add <package>=<version>"
- "DL3037 warning: Specify version with zypper install -y <package>[=]<version>"
Fixes#3107
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
It seems the lack of protoc in the alpine containers is causing issues
with some of our CIs, such as the VFIO one.
Fixes: #3323
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Make it clear when reading the table in the agent's "Change the agent
API" documentation that the commands in the "Generation method" column
should be run in the agent repo.
Fixes: #3317.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
All CI jobs are failing as www.gnu.org is down, so switch to a mirror
for the time being.
Fixes: #3314.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
PR #3298 failed to move the named link for the debug console to the
`guest-assets.md` meaning the debug console cells in the "User
accessible" column in the table in the "Root filesystem image" section
do not work as a link.
Fixes: #3311.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Currently QEMU's submodules are git cloned but there is the scripts/git-submodule.sh
which is meant for that. Let's use that script.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
The static build of QEMU takes a good amount of time on cloning the
source tree because we do a full git clone. In order to speed up that
operation this changed the Dockerfile so that it is carried out a
partial clone by using --depth=1 argument.
Fixes#3291
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
On the last architecture committee meeting, the one held on December
14th 2021, we reached the agreement that minor releases will be cut once
every 16 weeks (instead of 12), and that patch releases will be cut
every 4 weeks (instead of 3)
Fixes: #3298
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently the versions.txt in rootfs-builder dir is already removed,
so avoid to copy it in list of helper files.
Fixes: #3267
Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
Add span name to logging error to help with debugging when the context
is not set before the span is created.
Fixes#3289
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Commit 112ea25859 disabled libudev for static builds because it was
breaking snap. It turns out that the only users of libudev in QEMU are
qemu-pr-helper and USB. Kata already disables USB and doesn't use
qemu-pr-helper. Disable libudev for all builds if QEMU supports it, i.e.
version 5.2 or newer.
Fixes#3078
Signed-off-by: Greg Kurz <groug@kaod.org>
Move the guest assets details out of the architecture doc and into a
separate file.
Fixes: #3246.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the Kubernetes information out of the architecture doc and into a
separate file.
Partially fixes: #3246.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the networking details out of the architecture doc and into a
separate file.
Partially fixes: #3246.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the background and example command details out of the architecture
doc and into separate files.
Partially fixes: #3246.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the historical details out of the architecture doc
and into a separate file.
Partially fixes: #3246.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the architecture document into a new `docs/design/architecture/` directory
in preparation for splitting it into more manageable pieces.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Allow using `expect()` for `Mutex.lock()` because it is almost
unrecoverable if failed in the lock acquisition
Fixes: #3285
Signed-off-by: Zack <zmlcc@linux.alibaba.com>
In `utils/kata-manager.sh`, we download the first asset listed for the
release, which used to be the static x86_64 tarball. If that happened to
not match the system architecture, we would abort. Besides that logic
being invalid for !x86_64 (despite not distributing other tarballs at
the moment), the first asset listed is also not the static tarball any
more, it is the vendored source tarball. Retrieve all _static_ tarballs
and select the appropriate one depending on architecture.
Fixes: #3254
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Refresh the content and formatting of the architecture document.
Out of scope of these changes:
- Diagram updates.
- Updates to the Networking section.
Fixes: #3190.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
not clear why this was commented out before -- ensure that we set
approprate annotation on the sandbox container's annotations to indicate
this is a sandbox.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today we assume that if the CRI/upper layer doesn't provide a container
type annotation, it should be treated as a sandbox. Up to this point, a
sandbox with a pause container in CRI context and a single container
(ala ctr run) are treated the same.
For VM sizing and container constraining, it'll be useful to know if
this is a sandbox or if this is a single container.
In updating this, we cleanup the type handling tests and we update the
containerd annotations vendoring.
Fixes: #2926
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Change io/ioutil to io/os packages because io/ioutil package
is deprecated from 1.16:
TempDir => os.MkdirTemp
Details: https://go.dev/doc/go1.16#ioutilFixes: #3265
Signed-off-by: bin <bin@hyper.sh>
since qemu 6.0, readonly flag for block devices must be enable or
disable with `on` or `off` respectively.
Signed-off-by: Julio Montes <julio.montes@intel.com>
According to https://endoflife.date/go golang 1.11.10 is not supported
anymore, 1.16.10 is the minimum supported version.
Fixes: #3265
Signed-off-by: bin <bin@hyper.sh>
This is a bug release from Cloud Hypervisor addressing the following
issues: 1) Networking performance regression with virtio-net; 2) Limit
file descriptors sent in vfio-user support; 3) Fully advertise PCI MMIO
config regions in ACPI tables; 4) Set the TSS and KVM identity maps so
they don't overlap with firmware RAM; 5) Correctly update the DeviceTree
on restore.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v20.1Fixes: #3262
Signed-off-by: Bo Chen <chen.bo@intel.com>
As containerd can properly run without having a existent
`/etc/containerd/config.toml` file (it'd run using the default
cobnfiguration), let's explicitly create the file in those cases.
This will avoid issues on ammending runtime classes to a non-existent
file.
Fixes: #3229
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Tested-by: Jakob Naucke <jakob.naucke@ibm.com>
Now if no options/arguments specified, the shell scripts will return an error:
ERROR: Invalid rootfs directory: ''
This commit will show usage if no options/arguments specified.
Fixes: #3256
Signed-off-by: bin <bin@hyper.sh>
Use the same runtime used for podman run also for the podman build cmd
Additionally remove "docker" from the docker_run_args variable
Fixes: #3239
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Unit-Test-Advice.md was moved to kata-containers repo but URLs pointing
to that document were not updated. This patch updates these URLs.
Depends-on: github.com/kata-containers/tests#4273
fixes#3240
Signed-off-by: Julio Montes <julio.montes@intel.com>
- Upgrade Alpine guest rootfs to 3.15
- Specify a minor version rather than patch level as the Alpine
repositories use that.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
#2399 partially reverted #418, missing on returning to bootstrapping a
rootfs with `apk.static` instead of copying the entire root, which can
result in drastically larger (more than 10x) images. Revert this as well
(requires some updates to URL building).
Fixes: #3216
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
The help information of '-f' option is missing, and same issue
with 'BLOCK_SIZE' env variables, fix it in usage() function.
Fixes: #3231
Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
Some new attributes are added to hypervisor config:
- VMStorePath
- RunStorePath
- SharedPath
These attributes should be handled in two places:
- reset when check the new hypervisor's config is suitable
to the base config.
- copy from new hypervisor's config when create new VM
Fixes: #3193
Signed-off-by: bin <bin@hyper.sh>
ppc64le & s390x have no (well supported) musl target for Rust,
therefore, the agent must use glibc and cannot use Alpine. Specify
Ubuntu as the distribution to be used for initrd.
Fixes: #3212
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
vhost-net is disabled in the rootless kata runtime feature, which has been abandoned since kata 2.0.
I reused the rootless flag for nonroot hypervisor and would like to enable vhost-net.
Fixes#3182
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This PR removes the information about docker swarm and docker compose
as currently for kata 2.0 we have not support for docker swarm and docker
compose and the links and references that the document is referring are
currently not part of kata 1.0
Fixes#3174
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
In function `update_target`, if the updated source is a directory,
we should create the corresponding directory.
Fixes: #3140
Signed-off-by: bin <bin@hyper.sh>
This PR updates the comments in the configuration.toml to point to
the current kata containers repository instead of the kata 1.x.
Fixes#3163
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Currently we do not have debian as part of the kata CI as we
do not have a mantainer, this PR removes debian as a supported
rootfs in order to have only the distros that we are supporting
and mantainining.
Fixes#3153
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Wrap `nix` `Error`'s in an `anyhow` error for consistency with the way
`rustjail` handles errors.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Replace `Result` values that use a "bare" `nix` `Error` like this:
```rust
return Err(nix::Error::EINVAL.into());
```
... to the following which wraps the nix` error in an `anyhow` call for
consistency with the other errors returned by `rustjail`:
```rust
return Err(anyhow!(nix::Error::EINVAL));
```
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Remove a bare `return` from a test function. This looks wrong but isn't
because the callers are all tests that just wait for a state change
caused by this test function.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Running `cargo audit` showed that the `nix` package for the agent and
the `rustjail` and `vsock-exporter` local crates need to be updated to
resolve rust security issue
[RUSTSEC-2021-0119](https://rustsec.org/advisories/RUSTSEC-2021-0119).
Hence, bumped `nix` to the latest version (which required changes to
work with the new, simpler `errno` handling).
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Run `cargo update` to update to the latest crate dependency versions.
The agent is an application so this includes expanding the partially
specified semvers to full semver values for the following crates,
which makes those crates consistent with the other agent dependencies:
- `futures`
- `regex`
- `scan_fmt`
- `tokio`
Fixes: #3124.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Remove the format specifier in the `"failed to get VFIO group"` error
returned by `vfio_device_handler()`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
- osbuilder: fix missing cpio package when building rootfs-initrd image
- osbuilder: add coreutils to guest rootfs
- workflows: only allow org members to run `/test_kata_deploy`
- agent: use temp directory for test containers
- tools/osbuilder: build QAT kernel in fedora 34
- agent: refactor find_process function and add test cases
- Hypervisor cleanup, refactoring
- agent: clear cargo test warnings
- docs: Add a code PR advice document
- tools: Automatically revert kata-deploy changes
- runtime: delete netmon
- agent: Remove some unwrap and expect calls
- agent: fixed the `make optimize` bug
- docs: make kata-deploy more visible
- workflows: Add back the checks for running test-kata-deploy
- kata-deploy: Ensure we test HEAD with `/test_kata_deploy`
- docs: update using-SPDK-vhostuser-and-kata.md
- Update k8s SR-IOV plugin environment variables to work properly with Kata
- watchers: don't dereference symlinks when copying files
- kata-deploy: Add back stable & latest tags
- agent: fix the issue of missing create a new session for container
- runtime: Update containerd to 1.5.8
- qemu: fix snap build on ppc64le
- virtcontainers: fix failing template test on ppc64le
- agent: Update README
- Remove cruft, do some simple non-functional cleanup in the runtime
- macvlan: drop bridged part of name
- clh: Fix race condition that prevent start pods
- Update CRI-O documentation
- cgroups: Fix systemd cgroup support
- runtime: merge virtcontainers/pkg/types into virtcontainers/types
- workflows: Remove non-used main.yaml
- agent/src: improve unit test coverage for src/namespace.rs
- doc: update kata metrics documentation
- runtime: delete not used codes
- versions: bump golang to 1.17.x
- release: Use ${GOPATH}/bin/yq for upload-libseccomp-tarball action
- agent-ctl: Allow API specification in JSON format
- virtcontainers: Lint protection types
- agent: check environment variables if empty or invalid
- runtime: Revert "runtime: use containerd package instead of cri-containerd"
- rustjail: Fix created time of container
- agent: Remove dynamic tracing APIs
- kernel: add VFIO kernel dependencies for ppc64le
- logging: Always run crate tests
8ee67aae osbuilder: fix missing cpio package when building rootfs-initrd image
f59d3ff6 osbuilder: add coreutils to guest rootfs
5e7c1a29 workflows: only allow org members to run `/test_kata_deploy`
857501d8 tools/osbuilder: build QAT kernel in fedora 34
a32e02a1 agent: use temp directory as root of test containers
f0734f52 docs: Remove extraneous whitespace
aff32756 docs: Add a code PR advice document
d41c375c docs: Add more advice to the UT advice doc
baf4f76d docs: More detail on running tests as different users
fcf45b0c docs: Use more idiomatic rust string check
9fed7d0b docs: Mention anyhow for error handling in UT doc
318b3f18 docs: No present continuous in UT advice doc
e8bb6b26 docs: Correct repo name usage
c1111a1d docs: Use leading caps for lang names in UT advice doc
597b239e docs: Remove TOC in UT advice doc
cf360fad docs: Move unit test advice doc from tests repo
bc955814 docs: Move doc requirements section higher
6a0b7165 agent: refactor find_process function and add test cases
5ba2f52c tools: Quote functions arguments in the update repos script
5dbd752f tools: Remove the check for the VERSION file
85eb743f tools: Make hub usage slightly less fragile
76540dbd tools: Automatically revert kata-deploy changes
36d73c96 tools: Do the kata-deploy changes on its own commit
c8e22daf tools: Use vars for the registry in the update repo script
ac958a30 tools: Use vars for the yaml files used in the update repo script
edca8292 tools: Rewrite the logic around kata-deploy changes
31f6c2c2 tools: Update comments about the kata-deploy yaml changes
75bb3401 shimv2/service: fix defer funtions never run with os.Exit()
bd3217da agent: Remove redundant returns
adab6434 agent: Remove some unwrap and expect calls
351cef7b agent: Remove unwrap from verify_cid()
a7d1c70c agent: Improve baremount
09abcd4d agent-ctl: Remove some unwrap and expect calls
35db75ba agent-ctl: Remove redundant returns
46e45958 agent-ctl: Simplify main
c7349d0b agent-ctl: Simplify error handling
ddc68131 runtime: delete netmon
705687dc docs: Add kata-deploy as part of the install docs
acece849 docs: Use the default notation for "Note" on install README
143fb278 kata-deploy: Use the default notation for "Note"
45d76407 kata-deploy: Don't mention arch specific binaries in the README
0c6c0735 agent: fixed the `make optimize` bug
a7c08aa4 workflows: Add back the checks for running test-kata-deploy
ce0693d6 agent: clear cargo test warnings
ce92cadc vc: hypervisor: remove setSandbox
2227c46c vc: hypervisor: use our own logger
4c2883f7 vc: hypervisor: remove dependency on persist API
34f23de5 vc: hypervisor: Remove need to get shared address from sandbox
c28e5a78 acrn: remove dependency on sandbox, persistapi datatypes
a0e0e186 hypervisors: introduce pkg to unbreak vc/persist dependency
b5dfcf26 watcher: tests: ensure there is 20ms delay between fs writes
78dff468 agent/device: Adjust PCIDEVICE_* container environment variables for VM
4530e7df agent/device: Use simpler structure in update_spec_devices()
b6062278 agent/device: Correct misleading comment on test case
89ff7000 agent/device: Remove unnecessary check for empty container_path
c855a312 agent/device: Make DevIndex local to update_spec_devices()
084538d3 agent/device: Change update_spec_device to handle multiple devices at once
d6a3ebc4 agent/device: Obtain guest major/minor numbers when creating DevNumUpdate
f4982130 agent/device: Check for conflicting device updates
f10e8c81 agent/device: Batch changes to the OCI specification
46a4020e agent/device: Types to represent update for a device in the OCI spec
e7beed54 agent/device: Remove unneeded clone() from several device handlers
2029eeeb agent/device: Improve update_spec_device() final_path handling
57541315 agent/device: Correct misleading parameter name in update_spec_device()
0c51da3d agent/device: Correct misleading error message in update_spec_device()
94b7936f agent/device: Use nix::sys::stat::{major,minor} instead of libc::*
296e76f8 watchers: handle symlinked directories, dir removal
2b6dfe41 watchers: don't dereference symlinks when copying files
3c9ae7fb kata-deploy: Ensure we test HEAD with `/test_kata_deploy`
0380b9bd runtime: Update containerd to 1.5.8
112ea258 qemu: fix snap build by disabling libudev
d5a18173 virtcontainers: fix failing template test on ppc64le
6955d144 kata-deploy: Add back stable & latest tags
bbaf57ad agent: fix the issue of missing create a new session for container
46fd5069 docs: update using-SPDK-vhostuser-and-kata.md
7e6f2b8d vc-utils: don't export unused function
860f3088 virtcontainers: move oci, uuid packages top level
8acb3a32 virtcontainers: remove unused package nsenter
4788cb82 vc-network: remove unused functions
b6ebddd7 oci: remove unused function GetContainerType
599bc0c2 agent: Update README
1e7cb4bc macvlan: drop bridged part of name
55412044 monitor: Fix monitor race condition doing hypervisor.check()
eb11d053 cri-o: Update deployment documentation
92e3a140 cri-o: Update links for the CRI-O github page
0a19340a cri-o: Remove outdated documentation
a3b3c85e workflows: Remove non-used main.yaml
09f7962f runtime: merge virtcontainers/pkg/types into virtcontainers/types
6acedc25 runtime: delete not used codes
395638c4 versions: bump golang to 1.17.x
570915a8 docs: update kata 2.0 metrics documentation
bcf181b7 cgroups: Fix systemd cgroup support
34307235 release: Use ${GOPATH}/bin/yq for upload-libseccomp-tarball action
6339fdd1 docs: update kata metrics architecture image
57bb7ffa agent: check environment variables if empty or invalid
8ab90e10 agent-ctl: Allow API specification in JSON format
eacfcdec runtime: Revert "runtime: use containerd package instead of cri-containerd"
e7856ff1 rustjail: Fix created time of container
b7b89905 virtcontainers: Lint protection types
7566b736 kernel: add VFIO kernel dependencies for ppc64le
87f67606 agent: Remove dynamic tracing APIs
b09dd7a8 docs: Fix typo
d47484e7 logging: Always run crate tests
5c9c0b6e build: Fix default target
b34ed403 cgroups: pass vhost-vsock device to cgroup
7362e1e8 runtime: remove prefix when cgroups are managed by systemd
1b1790fd agent/src: improve unit test coverage for src/namespace.rs
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
So that the debug console is more useful. In the meantime, remove
iptables as it is not used by kata-agent any more.
Fixes: #3138
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Let's take advantage of the "is-organization-member" action and only
allow members who are part of the `kata-containers` organization to
trigger `/test_kata_deploy`.
One caveat with this approach is that for the user to be considered as
part of an organization, they **must** have their "Organization
Visibility" configured as Public (and I think the default is Private).
This was found out and suggested by @jcvenegas!
Fixes: #3130
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
kernel compiled in fedora 35 (latest) is not working, following error
is reported:
```
qemu-system-x86_64: Error loading uncompressed kernel without PVH ELF
Note
```
Build QAT kernel in fedora 34 container to fix it
fixes#3135
Signed-off-by: Julio Montes <julio.montes@intel.com>
Some tests in sandbox.rs need root user to run, because they need create
directories under /run/agent directories, actually this is a limit
that shouldn't be there. By using a temp directory for test containers
will not need run tests as root user.
Fixes: #3122
Signed-off-by: bin <bin@hyper.sh>
Kata agent logs unknown system calls given by seccomp profiles
in advance before the log file descriptor closes.
Fixes: #2957
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Rather than comparing a string to a literal in the rust example,
use `.is_empty()` as that approach is more idiomatic and preferred.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add a comment stating that `anyhow` and `thiserror` should be used in
real rust code, rather than the unwieldy default `Result` handling
shown in the example.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Change some headings to avoid using the present continuous tense which
should not be used for headings.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Use a capital letter when referring to Golang and Rust (and remove
unnecessary backticks for Rust).
> **Note:**
>
> We continue refer to "Go" as "Golang" since it's a common alias,
> but, crucially, familiarity with this name makes searching for
> information using this term possible: "Go" is too generic a word.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Unit tests necessarily need to be maintained with the code they test so
it makes sense to keep the Unit Test Advice document into the main repo
since that is where the majority of unit tests reside.
Note: The
[`Unit-Test-Advice.md` file](https://github.com/kata-containers/tests/blob/main/Unit-Test-Advice.md)
was copied from the `tests` repo when it's `HEAD` was
38855f1f40.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Move the documentation requirements document link up so that it appears
immediately below the "How to Contribute" section.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
`grep`ing by a specific output, in a specific language, is quite fragile
and could easily break `hub`. For now, let's work this around following
James' suggestion of setting `LC_ALL=C LANG=C` when calling `hub`.
> **Note**: I don't think we should invest much time on fixing `hub`
> usage, as it'll be soon replaced by `gh`, see:
> https://github.com/kata-containers/kata-containers/issues/3083
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
When branching the "stable-x.y" branch, we need to do some quite
specific changes to kata-deploy / kata-cleanup files, such as:
* changing the tags from "latest" to "stable-x.y".
* removing the kata-deploy / kata-cleanup stable files.
However, after the branching is done, we need to get the `main` repo to
its original state, with the kata-deploy / kata-cleanup using the
"latest" tag, and with the stable files present there, and this commit
ensures that, during the release process, a new PR is automatically
created with these changes.
Fixes: #3069
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Rather than doing the kata-deploy changes as part of the release bump
commit, let's split those on its own changes, as it will both make the
life of the reviewer less confusing and also allows us to start
preparing the field for a possible automated revert of these changes,
whenever it becomes needed.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Similarly to what was done for the yaml files, let's use a var for
representing the registry where our images will be pushed to and avoid
repetition and too long lines.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Instead of always writing the full path of some files, let's just create
some vars and avoid both repetition (which is quite error prone) and too
long lines (which makes the file not so easy to read).
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We can simplify the code a little bit, as at least now we group common
operationr together. Hopefully this will improve the maintainability
and the readability of the code.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The comments were mentioning kata-deploy-base files while it really
should mention kata-deploy-stable files.
While here, I've also added a missing '"' to one of the tags.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
os.Exit() will terminate program immediately, the defer functions
won't be executed, so we add defer functions again before os.Exit().
Refer to https://pkg.go.dev/os#ExitFixes: #3059
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
Replace some `unwrap()` and `expect()` calls with code to return the
error to the caller.
Fixes: #3011.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Improved the `verify_cid()` function that validates container ID's by
removing the need for an `unwrap()`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Change `baremount()` to accept `Path` values rather than string values
since:
- `Path` is more natural given the function deals with paths.
- This minimises the caller having to convert between string and `Path`
types, which simplifies the surrounding code.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Although the binary name of the shipped binary is `qemu-system-x86_64`,
and we only ship kata-deploy for `x86_64`, we better leaving the
architecture specific name out of our README file.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The unrecognized option: 'deny-warnings' args caused `make optimize` failed.
Fixed the Makefile of the agent project, make sure the `make optimize` command
execute correctly. This PR modify the rustc args from '--deny-warnings' to
'--deny warnings'.
Fixes: #3104
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Commit 3c9ae7f made /test_kata_deploy run
against HEAD, but it also mistakenly removed all the checks that ensure
/test_kata_deploy only runs when explicitly called.
Mea culpa on this, and let's add the tests back.
Fixes: #3101
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Function parameters in test config is not used. This
commit will add under score before variable name
in test config.
Fixes: #3091
Signed-off-by: bin <bin@hyper.sh>
1. use ci/go-test.sh to replace the direct call to go test
2. fix data race test
3. install hook whether it is root or not
Fixes#1494
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
This'll end up moving to hypervisors pkg, but let's stop using virtLog,
instead introduce hvLogger.
Fixes: #2884
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today the hypervisor code in vc relies on persist pkg for two things:
1. To get the VM/run store path on the host filesystem,
2. For type definition of the Load/Save functions of the hypervisor
interface.
For (1), we can simply remove the store interface from the hypervisor
config and replace it with just the path, since this is all we really
need. When we create a NewHypervisor structure, outside of the
hypervisor, we can populate this path.
For (2), rather than have the persist pkg define the structure, let's
let the hypervisor code (soon to be pkg) define the structure. persist
API already needs to call into hypervisor anyway; let's allow us to
define the structure.
We'll probably want to look at following similar pattern for other parts
of vc that we want to make independent of the persist API.
In doing this, we started an initial hypervisors pkg, to hold these
types (avoid a circular dependency between virtcontainers and persist
pkg). Next step will be to remove all other dependencies and move the
hypervisor specific code into this pkg, and out of virtcontaienrs.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Today, acrn relies on sandbox level information, as well as a store
provided by common parts of the hypervisor. As we cleanup the
abstractions within our runtime, we need to ensure that there aren't
cross dependencies between the sandbox, the persistence logic and the
hypervisor.
Ensure that ACRN still compiles, but remove the setSandbox usage as
well as persist driver setup.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We noticed s390x test failures on several of the watcher unit tests.
Discovered that on s390 in particular, if we update a file in quick
sucecssion, the time stampe on the file would not be unique between the
writes. Through testing, we observe that a 20 millisecond delay is very
reliable for being able to observe the timestamp update. Let's ensure we
have this delay between writes for our tests so our tests are more
reliable.
In "the real world" we'll be polling for changes every 2 seconds, and
frequency of filesystem updates will be on order of minutes and days,
rather that microseconds.
Fixes: #2946
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The k8s SR-IOV plugin, when it assigns a VFIO device to a container, adds
an variable of the form PCIDEVICE_<identifier> to the container's
environment, so that the payload knows which device is which. The contents
of the variable gives the PCI address of the device to use.
Kata allows VFIO devices to be passed in to a Kata container, however it
runs within a VM which has a different PCI topology. In order for the
payload to find the right device, the environment variables therefore need
to be converted to list the guest PCI addresses instead of the host PCI
addresses.
fixes#2897
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_devices() takes a bunch of updates for the device entries in
the OCI spec and applies them, adjusting things in both the linux.devices
and linux.resources.devices sections of the spec.
It's important that each entry in the spec only be updated once. Currently
we ensure this by first creating an index of where the entries are, then
consulting that as we apply each update, so that earlier updates don't
cause us to incorrectly detect an entry as being relevant to a later
update. This method works, but it's quite awkward.
This inverts the loop structure in update_spec_devices() to make this
clearer. Instead of stepping through each update and finding the relevant
entries in the spec to change, we step through each entry in the spec and
find the relevant update. This makes it structurally clear that we're only
updating each entry once.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We have a test case commented as testing the case where linux.devices is
empty in the OCI spec. While it's true that linux.devices is empth in this
example, the reason it fails isn't specifically because it's empty but
because it doesn't contain a device for the update we're trying to apply.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_devices() explicitly checks for being called with an empty
container path and fails. We have a unit test to verify this behaviour.
But while an empty container_path probably does mean something has gone
wrong elsewhere, that's also true of any number of other bad paths. Having
an empty string here doesn't prevent what we're doing in this function
making sense - we can compare it to the strings in the OCI spec perfectly
well (though more likely we simply won't find it there).
So, there's no real reason to check this one particular odd case.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The DevIndex data structure keeps track of devices in the OCI
specification. We used to carry it around to quite a lot of
functions, but it's now used only within update_spec_devices(). That
means we can simplify things a bit by just open coding the maps we
need, rather than declaring a special type.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device() adjusts the OCI spec for device differences
between the host and guest. It is called repeatedly for each device
we need to alter. These calls are now all in a single loop in
add_devices(), so it makes more sense to move the loop into a renamed
update_spec_devices() and process all the fixups in one call.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently the DevNumUpdate structure is created with a path to a
device node in the VM, which is then used by update_spec_device().
However the only piece of information that update_spec_device()
actually needs is the VM side major and minor numbers for the device.
We can determine those when we create the DevNumUpdate structure.
This means we detect errors earlier and as a bonus we don't need to
make a copy of the vm path string.
Since that change requires updating 2 of the log statements, we take the
opportunity to update all the log statements to structured style.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For each device in the OCI spec we need to update it to reflect the guest
rather than the host. We do this with additional device information
provided by the runtime. There should only be one update for each device
though, if there are multiple, something has gone horribly wrong.
Detect and report this situation, for safety.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
As we process container devices in the agent, we repeatedly call
update_spec_device() to adjust the OCI spec as necessary for differences
between the host and the VM. This means that for the whole of a pretty
complex call graph, the spec is in a partially-updated state - neither
fully as it was on the host, not fully as it will be for the container
within the VM.
Worse, it's not discernable from the contents itself which parts of the
spec have already been updated and which have not. We used to have real
bugs because of this, until the DevIndex structure was introduced, but that
means a whole, fairly complex, parallel data structure needs to be passed
around this call graph just to keep track of the state we're in.
Start simplifying this by having the device handler functions not directly
update the spec, but instead return an update structure describing the
change they need. Once all the devices are added, add_devices() will
process all the updates as a batch.
Note that collecting the updates in a HashMap, rather than a simple Vec
doesn't make a lot of sense in the current code, but will reduce churn
in future changes which make use of it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently update_spec_device() takes parameters 'vm_path' and 'final_path'
to give it the information it needs to update a single device in the OCI
spec for the guest. This bundles these parameters into a single structure
type describing the updates to a single device. This doesn't accomplish
much immediately, but will allow a number of further cleanups.
At the same time we change the representation of vm_path from a Unicode
string to a std::path::Path, which is a bit more natural since we are
performing file operations on it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
virtio_blk_device_handler(), virtio_blk_ccw_device_handler() and
virtio_scsi_device_handler() all take a clone of their 'device' parameter.
They appear to do this in order to get a mutable copy in which they can
update the vm_path field.
However, the copy is dropped at the end of the function, so the only thing
that's used in it is the vm_path field passed to update_spec_device()
afterwards.
We can avoid the clone by just using a local variable for the vm_path.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device() takes a 'final_path' parameter which gives the
name the device should be given in the "inner" OCI spec. We need this
for VFIO devices where the name the payload sees needs to match the
VM's IOMMU groups. However, in all other cases (for now, and maybe
forever), this is the same as the original 'container_path' given in
the input OCI spec. To make this clearer and simplify callers, make
this parameter an Option, and only update the device name if it is
non-None.
Additionally, update_spec_device() needs to call to_string() on
update_path to get an owned version. Rust convention[0] is to let the
caller decide whether it should copy, or just give an existing owned
version to the function. Change from &str to String to allow that; it
doesn't buy us anything right now, but will make some things a little
nicer in future.
[0] https://rust-lang.github.io/api-guidelines/flexibility.html?highlight=clone#caller-decides-where-to-copy-and-place-data-c-caller-control
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device() takes a 'host_path' parameter which it uses to locate
the device to correct in the OCI spec. Although this will usually be the
path of the device on the host, it doesn't have to be - a traditional
runtime like runc would create a device node of that name in the container
with the given (host) major and minor numbers. To clarify that, rename it
to 'container_path'.
We also update the block comment to explain the distinctions more
carefully. Finally we update some variable names in tests to match.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This error is returned if we have information for a device from the
runtime, but a matching device does not appear in the OCI spec. However,
the name for the device we print is the name from the VM, rather than the
name from the container which is what we actually expect in the spec.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_devices() includes an unsafe block, in order to call the libc
functions to get the major and minor numbers from a device ID. However,
the nix crate already has a safe wrapper for this function, which we use in
other places in the file.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
- Even a directory could be a symlink - check for this. This is very
common when using configmaps/secrets
- Add unit test to better mimic a configmap, configmap update
- We would never remove directories before. Let's ensure that these are
added to the watched_list, and verify in unit tests
- Update unit tests which exercise maximum number of files per entry. There's a change
in behavior now that we consider directories/symlinks watchable as well.
For these tests, it means we support one less file in a watchable mount.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The current implementation just copies the file, dereferencing any
simlinks in the process. This results in symlinks no being preserved,
and a change in layout relative to the mount that we are making
watchable.
What we want is something like "cp -d"
This isn't available in a crate, so let's go ahead and introduce a copy
function which will create a symlink with same relative path if the
source file is a symlink. Regular files are handled with the standard
fs::copy.
Introduce a unit test to verify symlinks are now handled appropriately.
Fixes: #2950
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Is the past few releases we ended up hitting issues that could be easily
avoided if `/test_kata_deploy` would use HEAD instead of a specific
tarball.
By the end of the day, we want to ensure kata-deploy works, but before
we cut a release we also want to ensure that the binaries used in that
release are in a good shape. If we don't do that we end up either
having to roll a release back, or to cut a second release in a really
short time (and that's time consuming).
Note: there's code duplication here that could and should be avoided,b
but I sincerely would prefer treating it in a different PR.
Fixes: #3001
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
While building snap, static qemu is considered. Disable libudev
as it doesn't have static libraries on most of the distros of all
archs.
Fixes: #3002
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
If a file/directory doesn't exist, os.Stat() returns an
error. Assert the returned value with os.IsNotExist() to
prevent it from failing.
Fixes: #2920
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
stable-2.3 was the first time we branched the repo since
43a72d76e2 was merged. One bit that I
didn't notice while working on this, regardless of being warned by
@amshinde (sorry!), was that the change would happen on `main` branch,
rather than on the branched `stable-2.3` one.
In my mind, the workflow was:
* we branch.
* we do the changes, including removing the files.
* we tag a release.
However, the workflow actually is:
* we do the changes, including removing the files.
* we branch.
* we tag a release.
A better way to deal with this has to be figured out before 2.4.0 is
out, but for now let's just re-add the files back.
Fixes: #3067
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Now the `wait` is passed to qmp command, even at non-server mode. This
will cause qemu return this error:
'wait' option is incompatible with socket in client connect mode
Fixes: #205
Signed-off-by: bin liu <liubin0329@gmail.com>
When the container didn't had a tty console, it would be in a same
process group with the kata-agent, which wasn't expected. Thus,
create a new session for the container process.
Fixes: #3063
Signed-off-by: Fupan Li <fupan.lfp@antgroup.com>
Many of these functions are just used on one place throughout the rest
of the code base. If we create hypervisor package, newtork package, etc, we may want to
parse this out.
Fixes: #3049
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This will be useful at runtime level; no need for oci or uuid to be subpkg of
virtcontainers.
While at it, ensure we run gofmt on the changed files.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Update the agent README by removing the historical details about the
conversion from golang to rust which (occurred at the start of Kata 2.x
development) and replacing it with information that developers and
testers should find more useful.
Fixes: #3056.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The fact that we need to "bridge" the endpoint is a bit irrelevant. To
be consistent with the rest of the endpoints, let's just call this
"macvlan"
Fixes: #3050
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
The thread monitor will check if the agent and the VMM are alive every
second in a blocking thread. The Cloud hypervisor API server is
single-threaded, if the monitor does a `check()`, while a slow request
is still in progress, the monitor check() method will timeout. The
monitor thread will stop all the shim-v2 execution.
This commit modifies the monitor thread to make it check the status of
the hypervisor after 5 seconds. Additionally, the `check()` method from
cloud-hypervisor will use the method `clh.isClhRunning(timeout)` with a
10 seconds timeout. The monitor function does no timeout, so even if
`hypervisor.check()` takes more 10 seconds, the isClhRunning method
handles errors doing a VmmPing and retry in case of errors until the
timeout is reached.
Reduce the time to the next check to 5 should not affect any functionality,
but it will reduce the overhead polling the hypervisor.
Fixes: #2777
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
CRI-O deployment documentation was quite outdated, giving info from the
`1.x` era. Let's update this to reflect what we currently have.
Fixes: #2498
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The links are either pointing to the not-used-anymore `master` branch,
or to the kubernetes-incubator page.
Let's always point to the CRI-O github page, using the `main`branch.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Although the documentation removed is correct, it's not relevant to the
current supported versions of CRI-O.
Related: #2498
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The main.yaml workflow was created and used only on 1.x. We inherited
it, but we didn't remove it after deprecating the 1.x repos.
While here, let's also update the reference to the `main.yaml` file,
and point to `release.yaml` (the file that's actually used for 2.x).
Fixes: #3033
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
There are two types packages under virtcontainers, and the
virtcontainers/pkg/types has a few codes, merging them into
one can make it easy for outstanding and using types package.
Fixes: #3031
Signed-off-by: bin <bin@hyper.sh>
According to https://endoflife.date/go golang 1.15 is not supported
anymore. Let's remove it from out tests, add 1.17.x, and bump the
newest version known to work when building kata to 1.17.3.
Fixes: #3016
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
We now support any container engine CRI compliant in kata-monitor.
Update documentation to reflect it.
Fixes: #980
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
As github.com/containerd/cgroups doesn't support scope
units which are essential in some cases lets create
the cgroups manually and load it trough the cgroups
api
This is currently done only when there's single sandbox
cgroup (sandbox_cgroup_only=true), otherwise we set it
as static cgroup path as it used to be (until a proper
soultion for overhead cgroup under systemd will be
suggested)
Fixes: #2868
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
We need to explicitly call `${GOPATH}/bin/yq` that is installed by
`ci/install_yq.sh`.
Fixes: #3014
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
We now support any CRI container engine in kata-monitor, notably CRI-O.
Add both containerd and CRI-O in the kata metrics architecture image.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Update the `agent-ctl` tool to allow API fields to be specified in JSON
format, either directly on the command-line, or via a file URI.
This feature is made possible by enabling `serde` support in the agent
`protocols` crate. Careful use of the `serde` macros allows the
`agent-ctl` tool to accept _partially_ specified API objects in JSON
format; fields that are not specified are set to the default value for
their respective types.
`build.rs` changes based on work by Fupan.
Fixes: #2978.
Contributions-by: Fupan Li <lifupan@gmail.com>
Contributions-by: Bin Liu <bin@hyper.sh>
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
This reverts commit 76f16fd1a7 to bring
back cri-containerd crioptions parsing so that kata works with older
containerd versions like v1.3.9 and v1.4.6.
Fixes: #2999
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Protection types like tdxProtection or seProtection were marked nolint,
remove this. As a side effect, ARM needs dummy tests for these.
Fixes: #2801
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
If Kata agent cannot resolve the system calls given by seccomp profiles,
the agent ignores the system calls and continues to run without an error.
Fixes: #2957
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
- runtime# make sure the "Shutdown" trace span have a correct end
- tracing: Accept multiple dynamic tags
- logging: Enable agent debug output for release builds
- agent: "Revert agent: Disable seccomp feature on aarch64 temporarily"
- runtime: Enhancement for Makefile
- osbuilder: build image-builder image from Fedora 34
- agent: refactor process IO processing
- agent-ctl: Update for Hybrid VSOCK
- docs: Fix outdated links
- ci/install_libseccomp: Fix libseccomp build and misc improvement
- virtcontainers: simplify read-only mount handling
- runtime: add fast-test to let test exit on error
- test: Fix random failure for TestIoCopy
- cli: Show available guest protection in env output
- Update k8s, critools, and CRI-O to their 1.22 release
- package: assign proper value to redefined_string in build-kernel.sh
- agent: Make wording of error message match CRI-O test suite
- docs: Moving from EOT to EOF
- virtcontainers: api: update the functions in the api.md docs
- release: Upload libseccomp sources with notice to release page
- virtcontainers: check that both initrd and image are not set
- agent: Fix the configuration sample file
- runtime: set tags for trace span
- agent-ctl: Implement Linux OCI spec handling
- runtime: Remove comments about unsupported features in config for clh
- tools/packaging: Add options for VFIO to guest kernel
- agent/runtime: Add seccomp feature
- ci: test-kata-deploy: Get rid of slash-command-action action
- This is to bump the OOT QAT 1.7 driver version to the latest version.…
- forwarder: Drop privileges when using hybrid VSOCK
- packaging/static-build: s390x fixes
- agent-ctl: improve the oci_to_grpc code
- agent: do not return error but print it if task wait failed
- virtcontainers: delete duplicated notify in watchHypervisor function
- agent: Handle uevent remove actions
- enable unit test on arm
- rustjail: Consistent coding style of LinuxDevice type
- cli: Fix outdated kata-runtime bash completion
- Allow VFIO devices to be used as VFIO devices in the container
- Expose top level hypervisor methods -
- Upgrade to Cloud Hypervisor v19.0
- docs: use-cases: Update Intel SGX use case
- virtcontainers: clh: Enable the `seccomp` feature
- runtime: delete cri containerd plugin from versions.yaml
- docs: Write tracing documentation
- runtime: delete useless src/runtime/cli/exit.go
- snap: add cloud-hypervisor and experimental kernel
- osbuilder: Call detect_rust_version() right before install_rust.sh
- docs: Updating Developer Guide re qemu-img
- versions: Add libseccomp and gperf version
- Enable agent tracing for hybrid VSOCK hypervisors
- runtime: optimize test code
- runtime: use containerd package instead of cri-containerd
- runtime: update sandbox root dir cleanup behavior in rootless hypervisor
- utils: kata-manager: Update kata-manager.sh for new containerd config
- osbuilder: Re-enable building the agent in Docker
- agent: Do not fail when trying to adding existing routes
- tracing: Fix typo in "package" tag name
- kata-deploy: add .dockerignore file
- runtime: change name in config settings back to "kata"
- tracing: Remove trace mode and trace type
09d5d88 runtime: tracing: Change method for adding tags
bcf3e82 logging: Enable agent debug output for release builds
a239a38 osbuilder: build image-builder image from Fedora 34
375ad2b runtime: Enhancement for Makefile
b468dc5 agent: Use dup3 system call in unit tests of seccomp
1aaa059 agent: "Revert agent: Disable seccomp feature on aarch64 temporarily"
1e331f7 agent: refactor process IO processing
9d3ec58 runtime: make sure the "Shutdown" trace span have a correct end
3f21af9 runtime: add fast-test to let test exit on error
9b270d7 ci/install_libseccomp: use a temporary work directory
98b4406 ci/install_libseccomp: Fix fail when DESTDIR is set
338ac87 virtcontainers: api: update the functions in the api.md docs
23496f9 release: Upload libseccomp sources with notice to release page
e610fc8 runtime: Remove comments about unsupported features in config for clh
7e40195 agent-ctl: Add stub for AddSwap API
82de838 agent-ctl: Update for Hybrid VSOCK
d1bcf10 forwarder: Remove quotes from socket path in doc
e66d047 virtcontainers: simplify read-only mount handling
bdf4824 tools/packaging: Add options for VFIO to guest kernel
c509a20 agent-ctl: Implement Linux OCI spec handling
42add7f agent: Disable seccomp feature on aarch64 temporarily
5dfedc2 docs: Add explanation about seccomp
45e7c2c static-checks: Add step for installing libseccomp
a3647e3 osbuilder: Set up libseccomp library
3be50ad agent: Add support for Seccomp
4280415 agent: Fix the configuration sample file
b0bc71f ci: test-kata-deploy: Get rid of slash-command-action action
309dae6 virtcontainers: check that both initrd and image are not set
a10cfff forwarder: Fix changing log level
6abccb9 forwarder: Drop privileges when using hybrid VSOCK
bf00b8d agent-ctl: improve the oci_to_grpc code
b67fa9e forwarder: Make explicit root check
e377578 forwarder: Fix docs socket path
5f30633 virtcontainers: delete duplicated notify in watchHypervisor function
5f5eca6 agent: do not return error but print it if task wait failed
d2a7b6f packaging/static-build: s390x fixes
6cc8000 cli: Show available guest protection in env output
2063b13 virtcontainers: Add func AvailableGuestProtections
a13e2f7 agent: Handle uevent remove actions
34273da runtime/device: Allow VFIO devices to be presented to guest as VFIO devices
68696e0 runtime: Add parameter to constrainGRPCSpec to control VFIO handling
d9e2e9e runtime: Rename constraintGRPCSpec to improve grammar
57ab408 runtime: Introduce "vfio_mode" config variable and annotation
730b9c4 agent/device: Create device nodes for VFIO devices
175f9b0 rustjail: Allow container devices in subdirectories
9891efc rustjail: Correct sanity checks on device path
d6b62c0 rustjail: Change mknod_dev() and bind_dev() to take relative device path
2680c0b rustjail: Provide useful context on device node creation errors
42b92b2 agent/device: Allow container devname to differ from the host
827a41f agent/device: Refactor update_spec_device_list()
8ceadcc agent/device: Sanity check guest IOMMU groups
ff59db7 agent/device: Add function to get IOMMU group for a PCI device
13b06a3 agent/device: Rebind VFIO devices to VFIO driver inside guest
e22bd78 agent/device: Add helper function for binding a guest device to a driver
b40eedc rustjail: Consistent coding style of LinuxDevice type
57c0f93 agent: fix race condition when test watcher
1a96b8b template: disable template unit test on arm
43b13a4 runtime: DefaultMaxVCPUs should not greater than defaultMaxQemuVCPUs
c59c367 runtime: current vcpu number should be limited
fa92251 runtime: kernel version with '+' as suffix panic in parse
52268d0 hypervisor: Expose the hypervisor itself
a72bed5 hypervisor: update tests based on createSandbox->CreateVM change
f434bcb hypervisor: createSandbox is CreateVM
76f1ce9 hypervisor: startSandbox is StartVM
fd24a69 hypervisor: waitSandbox is waitVM
a6385c8 hypervisor: stopSandbox is StopVM
f989078 hypervisor: resumeSandbox is ResumeVM
73b4f27 hypervisor: saveSandbox is SaveVM
7308610 hypervisor: pauseSandbox is nothing but PauseVM
8f78e1c hypervisor: The SandboxConsole is the VM's console
4d47aee hypervisor: Export generic interface methods
6baf258 hypervisor: Minimal exports of generic hypervisor internal fields
37fa453 osbuilder: Update QAT driver in Dockerfile
8030b6c virtcontainers: clh: Re-generate the client code
8296754 versions: Upgrade to Cloud Hypervisor v19.0
2b13944 docs: Fix outdated links
4f75ccb docs: use-cases: Update Intel SGX use case
4f018b5 runtime: delete useless src/runtime/cli/exit.go
7a80aeb docs: Moving from EOT to EOF
09a5e03 docs: Write tracing documentation
b625f62 runtime: delete cri containerd plugin from versions.yaml
24fff57 snap: make curl commands consistent
2b9f79c snap: add cloud-hypervisor and experimental kernel
273a1a9 runtime: optimize test code
76f16fd runtime: use containerd package instead of cri-containerd
6d55b1b docs: use containerd to replace cri-containerd
ed02bc9 packaging: add containerd to versions.yaml
50da26d osbuilder: Call detect_rust_version() right before install_rust.sh
b4fadc9 docs: Updating Developer Guide re qemu-img
b8e69ce versions: Add libseccomp and gperf version
17a8c5c runtime: Fix random failure for TestIoCopy
f34f67d osbuilder: Specify version when installing Rust
135a080 osbuilder: Pass CI env to container agent build
eb5dd76 osbuilder: Re-enable building the agent in Docker
bcffa26 tracing: Fix typo in "package" tag name
e61f5e2 runtime: Show socket path in kata-env output
5b3a349 trace-forwarder: Support Hybrid VSOCK
e42bc05 kata-deploy: add .dockerignore file
321be0f tracing: Remove trace mode and trace type
7d0b616 agent: Do not fail when trying to adding existing routes
3f95469 runtime: logging: Add variable for syslog tag
adc9e0b runtime: fix two bugs in rootless hypervisor
51cbe14 runtime: Add option "disable_seccomp" to config hypervisor.clh
98b7350 virtcontainers: clh: Enable the `seccomp` feature
46720c6 runtime: set tags for trace span
d789b42 package: assign proper value to redefined_string
4d7ddff utils: kata-manager: Update kata-manager.sh for new containerd config
f5172d1 cli: Fix outdated kata-runtime bash completion
d45c86d versions: Update CRI-O to its 1.22 release
c4a6426 versions: Update k8s & critools to v1.22
881b996 agent: Make wording of error message match CRI-O test suite
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
- Add support for legacy serial device
- Additionally add support for the file backend for chardev
Legacy serial plus char backend file will allow us to support
capture early boot messages.
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Upgrade from v0.20.0 to v1.0.0, first stable release.
Git log
4bfa0034 Release prep v1.0.0-RC3 (2218)
c7ae470a Refactor SDK span creation and implementation (2213)
db317fce Verify and update OTLP trace exporter documentation (2053)
04de34a2 Update the website getting started docs (2203)
a7b9d021 Rename metric instruments to match feature-freeze API specification (2202)
1f527a52 Update trace API config creation functions (2212)
361a2096 Fix RC2 header in changelog (2215)
e209ee75 chore(exporter/zipkin): improves logging on invalid collector. (2191)
c0c5ef65 Fix typos in resource.go. (2201)
abf6afe0 Update otel example guide (2210)
3b05ba02 Bump actions/setup-go from 2.1.3 to 2.1.4 (2206)
bcd7ff7b Bump codecov/codecov-action from 2.0.2 to 2.0.3 (2205)
c912b179 Print JSON objects to stdout without a wrapping array (2196)
add511c1 Make WithoutTimestamps work (2195)
85c27e01 Bump github.com/golangci/golangci-lint from 1.41.1 to 1.42.0 in /internal/tools (2199)
bf6500b3 Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /exporters/otlp/otlptrace (2184)
9392af96 Bump google.golang.org/grpc in /exporters/otlp/otlptrace/otlptracegrpc (2185)
c95694dc Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /example/otel-collector (2183)
0528fa66 Bump google.golang.org/grpc from 1.39.1 to 1.40.0 in /exporters/otlp/otlpmetric (2186)
3a26ed21 Deprecate the oteltest package (2188)
c885435f Website: support GH page links to canonical src (2189)
6da20a27 Add cross-module test coverage (2182)
dfc866bd Support capturing stack trace (2163)
41588fea Deprecate the attribute.Any function (2181)
4e8d667f Support a single Resource per MeterProvider in the SDK (2120)
a8bb0bf8 Make the tracetest.SpanRecorder concurrent safe (2178)
87d09df3 Deprecate Array attribute in favor of *Slice types (2162)
df384a9a Move InstrumentKind into the new metric/sdkapi package (2091)
1cb5cdca Unify the OTLP attribute transform (2170)
a882ee37 Clarify the attribute package documentation and order/grouping (2168)
5d25c4d2 Add support for int32 in attribute.Any (2169)
2b0e139e Refactor attributes benchmark tests (2167)
4c7470d9 Bump google.golang.org/grpc from 1.39.0 to 1.39.1 in /exporters/otlp/otlptrace (2176)
990c534a Bump google.golang.org/grpc in /example/otel-collector (2172)
b45c9d31 Bump google.golang.org/grpc from 1.39.0 to 1.39.1 in /exporters/otlp/otlpmetric (2174)
a3d4ff5c Deprecated the bridge/opencensus/utils package (2166)
b1d1d529 Move OC bridge integration tests to own mod (2165)
89a9489c Add OC bridge internal unit tests (2164)
56c743ba Allow global ErrorHandler to be set multiple times (2160)
d18c135f Add OpenCensus bridge internal package (2146)
fcf945a4 Just a little typo fix in code documentation. (2159)
59a82eba Update version.go (2157)
21d4686f Add ErrorHandlerFunc to simplify creating ErrorHandlers (2149)
23cb9396 Remove `internal/semconv-gen` (2155)
39acab32 Fix code sample in otel.GetTraceProvider (2147)
2b1bb29e Update OpenCensus bridge docs with limitations (2145)
fd7c327b Fix Jaeger exporter agent port default value and docs (2131)
b8561785 fix(2138): add guard to constructOTResources to return an empty resource (2139)
11f62640 Add a SpanRecorder to the sdk/trace/tracetest (2132)
fd9de7ec rename assertsocketbuffersize.go to *_test (2136)
a6b4d90c nit doc fix (2135)
79398418 pre-release v1.0.0-RC2 (2133)
2501e0fd Use semconv.SchemaURL in STDOUT exporter example (2134)
ef03dbc9 Bump codecov/codecov-action from 1 to 2.0.2 (2129)
bbe6ca40 Deprecate oteltest.Harness for removal (2123)
7a624ac2 Deprecated the oteltest.TraceStateFromKeyValues function (2122)
ece1879f Removed dropped link's attributes field from API package (2118)
03902d98 Rename sdk/trace/tracetest test.go -> exporter.go (2128)
cb607b0a Unify OTLP exporter retry logic (2095)
abe22437 API: create new linked span from current context (2115)
db81d4aa Update internal/global/trace testing (2111)
7f10ef72 Remove propagation testing types from oteltest (2116)
25d739b0 Remove resource.WithBuiltinDetectors() which has not been maintained (2097)
d57c5a56 Remove several metrics test helpers (2105)
49359495 Simplify trace_context tests (2108)
56d42011 Simplify trace context benchmark test (2109)
63dfe64a Correct status transform in OTLP exporter (2102)
9b1a5f70 Performance improvement: avoid creating multiple same read-only objects (2104)
ab78dbd0 Update release URL (2106)
647af3a0 Pre release experimental metrics v0.22.0 (2101)
0a562337 Fixed OS type value for DragonFly BSD (2092)
62c21ffb Bump golang.org/x/tools from 0.1.4 to 0.1.5 in /internal/tools (2096)
4a3da55a Ensure sample code in website_docs getting started page works (2094)
d3063a3d Update otel.Meter to global.Meter in Getting Started Document.(2087) (2093)
00a1ec5f Add documentation guidelines and improve Jaeger exporter readme (2082)
12f737c7 oteltest: ensure valid SpanContext created for span started WithNewRoot (2073)
484258eb OS description attribute detector (1840)
d8c9a955 Bump google.golang.org/grpc from 1.38.0 to 1.39.0 in /example/otel-collector (2054)
4ffdf034 Add @pellard as an Approver (2047)
1a74b399 Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 in /exporters/otlp/otlpmetric (2040)
57c2e8fb Bump golang.org/x/tools from 0.1.3 to 0.1.4 in /internal/tools (2036)
7cff31a9 Bump google.golang.org/protobuf from 1.26.0 to 1.27.0 in /exporters/otlp/otlptrace (2035)
9e8f523d when using WithNewRoot, don't use the parent context for sampling (2032)
62af6c70 semconv-gen: fix capitalization at word boundaries, add stability/deprecation indicators (2033)
0bceed7e Fix docs on otel-collector example (2034)
6428cd69 Update doc.go (2030)
311a6396 fix documentation for trace.Status (2029)
16f83ce6 export ToZipkinSpanModels for use outside this library (2027)
d5d4c87f Add HTTP metrics exporter for OTLP (2022)
d6e8f60f Bump github.com/golangci/golangci-lint from 1.40.1 to 1.41.1 in /internal/tools (2023)
51dbe3cb Remove deprecated exporters (2020)
257ef7fc Update project status in README (2017)
ced177b7 Pre-release 1.0.0-RC1 (2013)
694c9a41 Interface stability documentation (2012)
39fe8092 Add span.TracerProvider() (2009)
d020e1a2 Add more tests for go.opentelemetry.io/otel/trace package. (2004)
6d4a38f1 replace WithSyncer with WithBatcher in opencensus example (2007)
c30cd1d0 Split stdout exporter into stdouttrace and stdoutmetric (2005)
80ca2b1e otlp: mark unix endpoints to work without transport security (2001)
65140985 Update codecov ignore (2006)
3be9813d Deprecate the exporters in the "trace" and "metric" sub-directories (1993)
377f7ce4 remove WithTrace* options from otlptrace exporters (1997)
b33edaa5 OTLP metrics gRPC exporter (1991)
64b640cc Remove old OTLP exporter (1990)
7728a521 Remove dependency on metrics packages (1988)
135ac4b6 Moved internal/tools duplicated findRepoRoot function to common package (1978)
cdf67ddf Update semantic conventions to v1.4.0, move to versioned package (1987)
4883cb11 Refactor exporter creation functions (1985)
87cc1e1f Test BatchSpanProcessor export timeout directly (1982)
7ffe2845 Added inputPath validation to semconv-gen (1986)
a113856a Add caveat about installing opencensus bridge (1983)
741cb9a3 Fix generator.go call typo in RELEASING.md (1977)
7a0cee7b Replaces golint by revive and fix newly reported linter issues (1946)
46d9687a Add Schema URL support to Resource (1938)
0827aa62 Use mock server as jaeger agent listener. (1930)
20886012 Bugfix jaeger exporter test panic (1973)
4bf6150f Add baggage implementation based on the W3C and OpenTelemetry specification (1967)
bbe2b8a3 Bump github.com/itchyny/gojq from 0.12.3 to 0.12.4 in /internal/tools (1971)
4949bf05 Bump github.com/cenkalti/backoff/v4 from 4.1.0 to 4.1.1 in /exporters/otlp/otlptrace (1972)
015b4c17 Bump github.com/cenkalti/backoff/v4 from 4.1.0 to 4.1.1 in /exporters/otlp (1970)
13eb12ac Bump github.com/prometheus/client_golang from 1.10.0 to 1.11.0 in /exporters/metric/prometheus (1974)
2371bb0a add otlp trace http exporter (1963)
a75ade4e sdk/resource: honor OTEL_SERVICE_NAME in fromEnv resource detector (1969)
aed45802 Bump go.opentelemetry.io/proto/otlp from 0.8.0 to 0.9.0 in /exporters/otlp/otlptrace (1959)
c4ebae6a Bump go.opentelemetry.io/proto/otlp (1960)
b1d2be3b Bump google.golang.org/grpc from 1.37.1 to 1.38.0 in /exporters/otlp/otlptrace (1958)
f6daea5e Generate semantic conventions according to specification latest tagged version (1933)
435a63b3 Bump github.com/google/go-cmp from 0.5.5 to 0.5.6 (1954)
6c46af66 Bump github.com/google/go-cmp from 0.5.5 to 0.5.6 in /exporters/trace/jaeger (1953)
4d294853 Bump actions/cache from 2.1.5 to 2.1.6 (1952)
dfe2b6f1 OTLP trace gRPC exporter (1922)
5a8f7ff7 Bump go.opentelemetry.io/proto/otlp from 0.8.0 to 0.9.0 in /exporters/otlp (1943)
bd935866 Add schema URL support to Tracer (1889)
c1f460e0 Update API configs. (1921)
270cc603 Small fixes on some Span method's documentation headers (1950)
8603b902 Fix typo in doc (1949)
acbb1882 Bump google.golang.org/grpc from 1.37.1 to 1.38.0 in /exporters/otlp (1942)
b1621501 Add codecov badge (1940)
ea1434c3 Fix some golint issues (1947)
0eeb8f87 Refactor Tracestate (1931)
d3b12808 Add Passthrough example (1912)
f06cace6 Add @MadVikingGod as a project Approver (1923)
ab5facb3 Bump github.com/golangci/golangci-lint in /internal/tools (1925)
d23cc61b Refactor configs (1882)
6324adaa Add tracer option argument to global Tracer function (1902)
035fc650 Do not include authentication information in the http.url attribute (1919)
d8ac212c Fix sporadic test failure in otlp exporter http driver (1906)
a3df00f4 Create .gitattributes (1920)
fb88e926 Bump google.golang.org/grpc from 1.37.0 to 1.37.1 in /exporters/otlp (1914)
1982dc46 Bump google.golang.org/grpc in /example/prom-collector (1915)
1759c630 Bump github.com/golangci/golangci-lint in /internal/tools (1916)
7342aa47 Bump google.golang.org/grpc in /example/otel-collector (1913)
21c16418 Add support for scheme in OTEL_EXPORTER_OTLP_ENDPOINT (1886)
5cb62636 Semantic Convention generation tooling (1891)
6219221f Move the unit package to the metric module (1903)
63e0ecfc Implement global default non-recording span (1901)
b6d5442f Remove the Tracer method from the Span API (1900)
ae85fab3 Document functional options (1899)
cabf0c07 Fix default Jaeger collector endpoint (1898)
1e3fa3a3 Bump go.opentelemetry.io/proto/otlp from 0.7.0 to 0.8.0 in /exporters/otlp (1872)
696af787 Bump github.com/benbjohnson/clock from 1.0.3 to 1.1.0 in /sdk/metric (1532)
97eea6c3 Fix some golint issues (1894)
79d9852e fix container port mismatch issue (1895)
d20e7228 CI builds validate against last two versions of Go, dropping 1.14 and adding 1.16 (1865)
cbcd4b1a Redefine ExportSpans of SpanExporter with ReadOnlySpan (1873)
c99d5e99 Split large jaeger span batch to admire the udp packet size limit (1853)
42a84509 Unembed SpanContext (1877)
b7d02db1 Add Status type to SDK (1874)
f90d0d93 Update README (1876)
a1349944 Update resource.go (1871)
f40cad5e Add markdown link check configuration and action (1869)
9bc28f6b Fix existing markdown lint issues (1866)
08f4c270 Add documentation for tracer.Start() (1864)
2bd4840c remove Set.Encoded(Encoder) enconding cache (1855)
7674eebf Removed different types of Detectors for Resources. (1810)
f92a6d83 Implement retry policy for the OTLP/gRPC exporter (1832)
ec75390f Fix BSP context done tests (1863)
8e55f10a Move the Event type from the API to the SDK (1846)
e399d355 drop failed to exporter batches and return error when forcing flush a span processor (1860)
f6a9279a Honor context deadline or cancellation in SimpleSpanProcessor.Shutdown (1856)
aeef8e00 Add markdown lint GitHub action (1849)
d4c8ffad Replace spaces to tabs in Go code snippets (1854)
cb097250 fixed typo (1857)
392a44fa Refine configuration design docs (1841)
62cd933d Handle Resource env error when non-nil (1851)
24a91628 Document the SSP is not for production use (1844)
ec26ac23 Update RELEASING.md (1843)
8eb0bb99 Fix golint issue caused by typo (1847)
ca130e54 Markdownlint (1842)
1144a83d Small typo fixes to existing CHANGELOG entries (1839)
e6086958 Update website_docs to v0.20.0 (1838)
0f4e454c Change NewSplitDriver paramater and initialization (1798)
92551d39 Prerelease v1.0.0 (2250)
61839133 zipkin: remove no-op WithSDKOptions (2248)
568e7556 Set Schema URL when exporting traces to OTLP (2242)
ec26b556 Fix RC tags in docs (2239)
767ce26c Bump github.com/itchyny/gojq from 0.12.4 to 0.12.5 in /internal/tools (2216)
fe7058da adding NewNoopMeterProvider to follow trace api (2237)
c338a5ef Bump github.com/golangci/golangci-lint from 1.42.0 to 1.42.1 in /internal/tools (2236)
ef126f5c Remove deprecated Array from attribute package (2235)
360d1302 Add tests for nil *Resource (2227)
9e7812d1 Remove the deprecated oteltest package (2234)
486afd34 Remove the deprecated bridge/opencensus/utils pkg (2233)
eaacfaa8 Fix slice-valued attributes when used as map keys (2223)
df2bdbba Fix the import comments of otelpconfig (2224)
7aae2a02 otlptrace: Document supported environment variables (2222)
Fixes#2591
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Update OpenTelemetry from v0.15.0 to v0.20.0.
Git log
02d8bdd5 Release v0.20.0 (1837)
aa66fe75 OS and Process resource detectors (1788)
7374d679 Fix Links documents (1835)
856f5b84 Add feature request issue template (1831)
0fdc3d78 Remove bundler from Jaeger exporter (1830)
738ef11e Fix flaky global ErrorHandler delegation test (1829)
e43d9c00 Update Default Value for Jaeger Exporter Endpoint (1824)
0032bd64 Fix default merging of resource attributes from environment variable (1785)
96c5e4ba Add SpanProcessor example for Span annotation on start (1733)
543c8144 Remove the WithSDKOptions from the Jaeger exporter (1825)
66389ad6 Update function docs in sdk.go (1826)
70bc9eb3 Adds support for timeout on the otlp/gRPC exporter (1821)
081cc61d Update Jaeger exporter convenience functions (1822)
1b9f16d3 Remove the WithDisabled option from Jaeger exporter (1806)
6867faa0 Bump actions/cache from v2.1.4 to v2.1.5 (1818)
a2bf04dc Build context pipeline in Jaeger upload process (1809)
2de86f23 Remove locking from Jaeger exporter shutdown/export (1807)
4f9fec29 Add ExportSpans benchmark to Jaeger exporter (1805)
d9566abe Fix OTLP testing flake: signal connection from mock collector (1816)
a2cecb6e add support for env var configuration to otlp/gRPC (1811)
d616df61 Fix flaky OTLP exporter reconnect test (1814)
b09df84a Changes stdout to expose the `*sdktrace.TracerProvider` (1800)
04890608 Remove options field from Jaeger exporter (1808)
6db20e00 Remove the abandoned Process struct in Jaeger exporter (1804)
086abf34 docs: use test example to document prometheus.InstallNewPipeline (1796)
d0cea04b Bump google.golang.org/api from 0.43.0 to 0.44.0 in /exporters/trace/jaeger (1792)
99c477fe Fixed typo for default service name in Jaeger Exporter (1797)
95fd8f50 Bump google.golang.org/grpc from 1.36.1 to 1.37.0 in /exporters/otlp (1791)
9b251644 Zipkin Exporter: Use default resouce's serviceName as default serivce name (1777) (1786)
4d141e47 Add k8s.node.name and k8s.node.uid to semconv (1789)
5c99a34c Fix golint issue caused by incorrect comment (1795)
c5d006c0 Update Jaeger environment variables (1752)
58432808 add NewExportPipeline and InstallNewPipeline for otlp (1373)
7d8e6bd7 Zipkin Exporter: Adjust span transformation to comply with the spec (1688)
2817c091 Merge sdk/export/trace into sdk/trace (1778)
c61e654c Refactor prometheus exporter tests to match file headers as well (1470)
23422c56 Remove process config for Jaeger exporter (1776)
0d49b592 Add test to check bsp ignores `OnEnd` and `ForceFlush` post Shutdown` (1772)
e9aaa04b Record links/events attribute drops independently (1771)
5bbfc22c Make ExportSpans for Jaeger Exporter honor deadline (1773)
0786fe32 Add Bug report issue templates (1775)
3c7facee Add `ExportTimeout` option to batch span processor (1755)
c6b92d5b Make TraceFlags spec-compliant (1770)
ee687ca5 Bump github.com/itchyny/gojq from 0.12.2 to 0.12.3 in /internal/tools (1774)
52a24774 add support for configuring tls certs via env var to otlp/HTTP (1769)
35cfbc7e Update precedence of event name in Jaeger exporter (1768)
33699d24 Adds semantic conventions for exceptions (1492)
928e3c38 Modify ForceFlush to abort after timeout/cancellation (1757)
3947cab4 Fix testCollectorEndpoint typo and add tag assertions in jaeger_test (1753)
ecc635dc add website docs (1747)
07a8d195 Fix Jaeger span status reporting and unify tag keys (1761)
4fa35c90 add partial support for env var config to otlp/HTTP (1758)
bf180d0f improve OTLP/gRPC connection errors (1737)
d575865b Fix span IsRecording when not sampling (1750)
20c93b01 Update SamplingParameters (1749)
97501a3f Update SpanSnapshot to use parent SpanContext (1748)
604b05cb Store current Span instead of local and remote SpanContext in context.Context (1731)
c61f4b6d Set @lizthegrey to emeritus status (1745)
b1342fec Bump github.com/golangci/golangci-lint in /internal/tools (1743)
54e1bd19 Bump google.golang.org/api from 0.41.0 to 0.43.0 in /exporters/trace/jaeger (1741)
4d25b6a2 Bump github.com/prometheus/client_golang from 1.9.0 to 1.10.0 in /exporters/metric/prometheus (1740)
0a47b66f Bump google.golang.org/grpc from 1.36.0 to 1.36.1 in /exporters/otlp (1739)
26f006b8 Reinstate @paivagustavo as an Approver (1734)
382c7ced Remove hasRemoteParent field from SDK span (1728)
862a5a68 Remove setting error status while recording error with Span from oteltest package (1729)
6defcfdf Remove links on NewRoot spans (1726)
a9b2f851 upgrade thrift to v0.14.1 in jaeger exporter (1712)
5a6a854d Bump google.golang.org/protobuf from 1.25.0 to 1.26.0 in /exporters/otlp (1724)
23486213 Migrate to using go.opentelemetry.io/proto/otlp (1713)
5d559b40 Remove makeSamplingDecision func (1711)
e24702da Update the TraceContext.Extract docs (1720)
9d4eb1f6 Update dates in CHANGELOG.md for 2021 releases (1723)
2b4fa968 Release v0.19.0 (1710)
4beb7041 sdk/trace: removing ApplyConfig and Config (1693)
1d42be16 Rename WithDefaultSampler TracerProvider option to WithSampler and update docs (1702)
860d5d86 Add flag to determine whether SpanContext is remote (1701)
0fe65e6b Comply with OpenTelemetry attributes specification (1703)
88884351 Bump google.golang.org/api from 0.40.0 to 0.41.0 in /exporters/trace/jaeger (1700)
345f264a breaking(zipkin): removes servicName from zipkin exporter. (1697)
62cbf0f2 Populate Jaeger's Span.Process from Resource (1673)
28eaaa9a Add a test to prove the Tracer is safe for concurrent calls (1665)
8b1be11a Rename resource pkg label vars and methods (1692)
a1539d44 OpenCensus metric exporter bridge (1444)
77aa218d Fix issue #1490, apply same logic as in the SDK (1687)
9d3416cc Fix synchronization issues in global trace delegate implementation (1686)
58f69f09 Span status from HTTP code: Do not set status message if it can be inferred (1681)
9c305bde Flush metric events prior to shutdown in OTLP example (1678)
66b1135a Fix CHANGELOG (1680)
90bd4ab5 Update employer information for maintainers (1683)
36841913 Remove WithRecord() option from trace.SpanOption when starting a span (1660)
65c7de20 Remove trace prefix from NoOp src files. (1679)
e88a091a Make SpanContext Immutable (1573)
d75e2680 Avoid overriding configuration of tracer provider (1633)
2b4d5ac3 Bump github.com/golangci/golangci-lint in /internal/tools (1671)
150b868d Bump github.com/google/go-cmp from 0.5.4 to 0.5.5 (1667)
76aa924e Fix the examples target info messaging (1676)
a3aa9fda Bump github.com/itchyny/gojq from 0.12.1 to 0.12.2 in /internal/tools (1672)
a5edd79e Removed setting error status while recording err as span event (1663)
e9814758 chore(zipkin): improves zipkin example to not to depend on timeouts. (1566)
3dc91f2d Add ForceFlush method to TracerProvider (1608)
bd0bba43 exporter: swap pusher for exporter (1656)
56904859 Update the SimpleSpanProcessor (1612)
a7f7abac SpanStatus description set only when status code is set to Error (1662)
05252f40 Jaeger Exporter: Fix minor mapping discrepancies (1626)
238e7c61 Add non-empty string check for attribute keys (1659)
e9b9aca8 Add tests for propagation of Sampler Tracestate changes (1655)
875a2583 Add docs on when reviews should be cleared (1556)
7153ef2d Add HTTP/JSON to the otlp exporter (1586)
62e2a0f7 Unexport the simple and batch SpanProcessors (1638)
992837f1 Add TracerProvider tests to oteltest harness (1607)
bb4c297e Pre release v0.18.0 (1635)
712c3dcc Fix makefile ci target and coverage test packages (1634)
841d2a58 Rename local var new to not collide with builtin (1610)
13938ab5 Update SpanProcessor docs (1611)
e25503a0 Add compatibility tests to CI (1567)
1519d959 Use reasonable interval in sdktrace.WithBatchTimeout (1621)
7d4496e0 Pass metric labels when transforming to gaugeArray (1570)
6d4a5e0d Bump google.golang.org/grpc from 1.35.0 to 1.36.0 in /exporters/otlp (1619)
a93393a0 Bump google.golang.org/grpc in /example/prom-collector (1620)
e499ca86 Fix validation for tracestate with vendor and add tests (1581)
43886e52 Make timestamps sequential in lastvalue agg check (1579)
37688ef6 revent end-users from implementing some interfaces (1575)
85e696d2 Updating documentation with an working example for creating NewExporter (1513)
562eb28b Unify the Added sections of the unreleased changes (1580)
c4cf1aff Fix Windows build of Jaeger tests (1577)
4a163bea Fix stdout TestStdoutTimestamp failure with sleep (1572)
bd4701eb Stagger timestamps in exact aggregator tests (1569)
b94cd4b2 add code attributes to semconv package (1558)
78c06cef Update docs from gitter to slack for communication (1554)
1307c911 Remove vendor exclude from license-check (1552)
5d2636e5 Bump github.com/golangci/golangci-lint in /internal/tools (1565)
d7aff473 Vendor Thrift dependency (1551)
298c5a14 Update span limits to conform with OpenTelemetry specification (1535)
ecf65d79 Rename otel/label -> otel/attribute (1541)
1b5b6621 Remove resampling on span.SetName (1545)
8da52996 fix: grpc reconnection (1521)
3bce9c97 Add Keys() method to propagation.TextMapCarrier (1544)
0b1a1c72 Make oteltest.SpanRecorder into a concrete type (1542)
7d0e3e52 SDK span no modification after ended (1543)
7de3b58c Remove extra labels types (1314)
73194e44 Bump google.golang.org/api from 0.39.0 to 0.40.0 in /exporters/trace/jaeger (1536)
8fae0a64 Create resource.Default() with required attributes/default values (1507)
76f93422 Release v0.17.0 (1534)
9b242bc4 Organize API into Go modules based on stability and dependencies (1528)
e50a1c8c Bump actions/cache from v2 to v2.1.4 (1518)
a6aa7f00 Bump google.golang.org/api from 0.38.0 to 0.39.0 in /exporters/trace/jaeger (1517)
38efc875 Code Improvement - Error strings should not be capitalized (1488)
6b340501 Update default branch name (1505)
b39fd052 nit: Fix comment to be up-to-date (1510)
186c2953 Fix golint error of package comment form (1487)
9308d662 Bump google.golang.org/api from 0.37.0 to 0.38.0 in /exporters/trace/jaeger (1506)
1952d7b6 Reverse order of attribute precedence when merging two Resources (1501)
ad7b4715 Remove build flags for runtime/trace support (1498)
4bf4b690 Remove inaccurate and unnecessary import comment (1481)
7e19eb6a Bump google.golang.org/api from 0.36.0 to 0.37.0 in /exporters/trace/jaeger (1504)
c6a4406a Bump github.com/golangci/golangci-lint in /internal/tools (1503)
9524ac09 Update workflows to include main branch as trigger (1497)
c066f15e Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /internal/tools (1478)
894e0240 Bump github.com/golangci/golangci-lint in /internal/tools (1477)
71ffba39 Bump google.golang.org/grpc from 1.34.0 to 1.35.0 in /exporters/otlp (1471)
515809a8 Bump github.com/itchyny/gojq from 0.12.0 to 0.12.1 in /internal/tools (1472)
3e96ad1e gitignore: remove unused example path (1474)
c5622777 Histogram aggregator functional options (1434)
0df8cd62 Rename Makefile.proto to avoid interpretation as proto file (1468)
979ff51f Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 (1453)
1df8b3b8 Bump github.com/gogo/protobuf from 1.3.1 to 1.3.2 in /exporters/otlp (1456)
4c30a90a Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /sdk (1455)
5a9f8f6e Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/stdout (1454)
7786f34c Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/zipkin (1457)
4352a7a6 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/otlp (1460)
6990b3b3 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/metric/prometheus (1461)
7af40d22 Bump github.com/stretchr/testify from 1.6.1 to 1.7.0 in /exporters/trace/jaeger (1463)
f16f1892 Bump google.golang.org/grpc in /example/otel-collector (1465)
fe363be3 Move Span Event to API (1452)
43922240 Bump google.golang.org/grpc in /example/prom-collector (1466)
0aadfb27 Prepare release v0.16.0 (1464)
207587b6 Metric histogram aggregator: Swap in SynchronizedMove to avoid allocations (1435)
c29c6fd1 Shutdown underlying span exporter while shutting down BatchSpanProcessor (1443)
dfece3d2 Combine the Push and Pull metric controllers (1378)
74deeddd Handle tracestate in TraceContext propagator (1447)
49f699d6 Remove Quantile aggregation, DDSketch aggregator; add Exact timestamps (1412)
9c949411 Rename internal/testing to internal/internaltest (1449)
8d809814 Move gRPC driver to a subpackage and add an HTTP driver (1420)
9332af1b Bump github.com/golangci/golangci-lint in /internal/tools (1445)
5ed96e92 Update exporters/otlp Readme.md (1441)
bc9cb5e3 Switch CircleCI badge to GitHub Actions (1440)
716ad082 Remove CircleCI config (1439)
0682db1e Adding Security Workflows to GitHub Actions (2/2): gosec workflow (1429)
11f732b8 Adding Security Workflows to GitHub Actions (1/2): codeql workflow (1428)
40f1c003 Add Tracestate into the SamplingResult struct (1432)
db06c8d1 Flush metric events before shutdown in collector example (1438)
f6f458e1 Fix golint issue caused by typo in trace.go (1436)
fe9d1f7e Use uint64 Count consistently in metric aggregation (1430)
3a337d0b Bump github.com/golangci/golangci-lint in /internal/tools (1433)
1e4c8321 cleanup: drop the removed examples in gitignore (1427)
5c9221cf Unify endpoint API that related to OTel exporter (1401)
045c3ffe Build scripts: Replace mapfile with read loop for old bash versions (1425)
2def8c3d Add Versioning Documentation (1388)
6bcd1085 Bump github.com/itchyny/gojq from 0.11.2 to 0.12.0 in /internal/tools (1424)
38e76efe Add a split protocol driver for otlp exporter (1418)
439cd313 Add TraceState to SpanContext in API (1340)
35215264 Split connection management away from exporter (1369)
add9d933 Bump github.com/prometheus/client_golang from 1.8.0 to 1.9.0 in /exporters/metric/prometheus (1414)
93d426a1 Add @dashpole as a project Approver (1410)
6fe20ef3 Fix small typo (1409)
b22d0d70 Mention the getting started guide (1406)
3fb80fb2 Fix duplicate checkout action in GitHub workflow (1407)
2051927b Correct CI workflow syntax (1403)
f11a86f7 Fix typo in comment (1402)
bdf87a78 Migrate CircleCI ci.yml workflow to GitHub Actions (1382)
4e59dd1f Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /example/otel-collector (1400)
83513f70 Bump google.golang.org/api from 0.32.0 to 0.36.0 in /exporters/trace/jaeger (1398)
a354fc41 Bump github.com/prometheus/client_golang from 1.7.1 to 1.8.0 in /exporters/metric/prometheus (1397)
3528e42c Bump google.golang.org/grpc from 1.32.0 to 1.34.0 in /exporters/otlp (1396)
af114baf Call otel.Handle with non-nil errors (1384)
c3c4273e Add RO/RW span interfaces (1360)
Fixes#2591
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Ensure the tests in the local `logging` crate are run for all consumers
of it.
Additionally, add a new test which checks that output is generated by a
range of different log level `slog` macros. This is designed to ensure
debug level output is always available for the consumers of the
`logging` crate.
Fixes: #2969.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
In later versions of OpenTelemetry label.Any() is deprecated. Create
addTag() to handle type assertions of values. Change AddTag() to
variadic function that accepts multiple keys and values.
Fixes#2547
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Fixed the top-level build which was broken: the kata deploy
Makefile was being sourced, but it was defining the first target, which
became the default.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Raise the `slog` maximum log level feature for release code from `info`
to `debug` by changing the `slog` maximum level features in the shared
`logging` crate. This allows the consumers of the `logging` crate (the
agent, the `trace-forwarder` and the `agent-ctl` tool) to produce debug
output when their debug options are enabled. Currently, those options
will essentially be a NOP (unless using a debug version of the code).
Testing showed that setting the `slog` maximum level features in the
rust manifest files for the consumers of the `logging` crate has no
impact: those values are ignored, so they have been removed and replaced
with a comment stating the levels are set in the `logging` crate.
Fixes: #2966.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Currently the image-builder image is built from `fedora:latest` and
this is error-prone as any update of the base image can lead to
breakage. Instead let's create the image from Fedora 34, which is the
last known version to build fine.
Fixes#2960
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
There are some issues with Makefile for runtime:
- default target can't be used as a dependent of other targets.
- empty target `check`
And also add two targets for locally development/tests.
- lint: run golangci-lint
- pre-commit: run lint and test
Fixes: #2942
Signed-off-by: bin <bin@hyper.sh>
Use `dup3` system call instead of `dup2` in unit tests of seccomp
because `dup2` is obsolete on aarch64.
Fixes: #2939
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
We only added span.End() in the main process of the shim2 Shutdown method.
The "Shutdown" span would keep alive, when the containers number is not 0.
This PR make sure the "Shutdown" trace span have a correct end.
Fixes: #2930
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Add -failfast option to let test exit on error, but -failfast option
can't cross package, so there is a for loop used to test on all packages
in src/runtime, and the parallel number is set to 1, this may lead test
to be slow.
Fixes: #1997
Signed-off-by: bin <bin@hyper.sh>
It is safer to download the tarballs and work on a temporary directory
which can be proper cleaned up when the script finishes.
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
If DESTDIR is set on the environment then gperf will be installed
in an unexpected directory, resulting on the libseccomp's configure
not being able to find it. To avoid that issue this changed the
ci/install_libseccomp.sh so that PREFIX and DESTDIR are unset
inside the script.
Fixes#2932
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Virtcontainers API document functions weren't sync with the codes Sandbox and VCImpl.
And we have two functions named `CreateSandbox` functions, diff by one parameter,
very confused. So this pr sync the codes to api documents.
Fixes: #2928
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
The `kata-agent` binaries inside the Kata Containers images provided
with release are statically linked with the GNU LGPL-2.1 licensed
libseccomp library by default.
Therefore, we attach the complete source code of the libseccomp
to the release page in order to comply with the LGPL-2.1 (6(a)).
In addition, we add the description about the libseccomp license
to the release page.
Fixes: #2922
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
Cloud hypervisor is only supporting virtio-blk, this PR removes comments
that make a wrong reference of other features that are not supported
by clh.
Fixes#2924
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Allow the `agent-ctl` tool to connect to a Hybrid VSOCK hypervisor such
as Cloud Hypervisor or Firecracker.
Fixes: #2914.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Update the trace forwarder README to remove the quotes around the socket
path, which makes manipulating that path easier.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Current handling of read-only mounts is a little tricky.
However, a clearer solution can be used here:
1. make a private ro bind mount at privateDest to the mount source
2. make a bind mount at mountDest to the mount created in step 1
3. umount the private bind mount created in step 1
One important aspect is that the mount in step 2 is duplicated from
the one we created in step 1. So the MS_RDONLY flag is properly
preserved in all mounts created in the propagtion.
Fixes: #2205
Depends-on: github.com/kata-containers/tests#4106
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
Pull #2795 recently added support for a closer-to-OCI behaviour for
VFIO devices, in which they appear to the container as VFIO devices,
rather than being interpreted by the guest kernel. However, in order
to use this, the Kata guest kernel needs to include the VFIO PCI
driver, along with dependencies like the Intel IOMMU driver.
The kernel as built by the scripts within Kata don't currently include
those, so this patch adds them.
fixes#2913
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
- convert linux field from oci spec to grpc spec
- include all the fields below linux oci spec
Fixes: #2715
Signed-off-by: Da Li Liu <liudali@cn.ibm.com>
In order to pass CI test of aarch64, it is necessary to run
`ci/install_libseccomp.sh` before ruuning unit tests in
`jenkins_job_build.sh`.
However, `ci/install_libseccomp.sh` is not available
until PR #1788 including this commit is merged in the mainline.
Therefore, we disable seccomp feature on aarch64 temporarily.
After #1788 lands and CI is fixed, this commit will be reverted.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
This adds explanation about how to enable seccomp in the kata-runtime and
build the kata-agent with seccomp capability.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
This adds a step for installing libseccomp because the kata-agent
supports seccomp feature.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The osbuilder needs to set up libseccomp library to build the kata-agent
because the kata-agent supports seccomp currently.
The library is built from the sources to create a static library for musl libc.
In addition, environment variables for the libseccomp crate are set to
link the library statically.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
The kata-agent supports seccomp feature based on the OCI runtime specification.
This seccomp capability in the kata-agent is enabled by default.
However, it is not enforced by default: users need to enable that by setting
`disable_guest_seccomp` to `false` in the main configuration file.
Fixes: #1476
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
All endpoint names share the `Request` suffix.
Also, the current list is based on functions, not requests.
Fixes#2916
Reported-by: Jakob Naucke <jakob.naucke@ibm.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
There is a problem with slash-command-action which is on absence of a slash command
the job fails (instead of simply ignore, i.e., skip). This is documented on
https://github.com/xt0rted/slash-command-action/issues/124. There is a workaround
also documented on that issue, but here instead let's get rid of the action.
In this new implementation all comments sent to the pull request are parsed, if any
starts with "/test_kata-deploy" then the job is triggered.
Fixes#2836
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
This changed valid() in hypervisor to check the case where both
initrd and image path are set; in this case it returns an error.
Fixes#1868
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Fix `-l <log-level>` for the trace forwarder which didn't work
previously as it lacked the magic Cargo configuration.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Hybrid VSOCK requires `root` privileges to access the sandbox-specific
host-side AF_UNIX socket created by the hypervisor (CLH or FC). However,
once the socket has been bound, privileges can be dropped, allowing the
forwarder to run as user `nobody`.
Fixes: #2905.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
The oci_to_grpc function just handles part of oci fields,
and others are not copied from oci spec to grpc spec,
such as process.env, process.capabilities, mounts and so on.
Try to implement more handlings to convert thoses fields.
Fixes#2686
Signed-off-by: Lei Li <cdlleili@cn.ibm.com>
Rather than generating a potentially misleading error message if the
socket bind fails, perform an explicit check for `root` for Hybrid
VSOCK.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Updated the trace forwarder README to ensure the real socket path is
created, not the template socket path returned by `kata-runtime env`.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
- Install OpenSSL for key generation in kernel build
- Do not install libpmem
- Do not exclude `*/share/*/*.img` files in QEMU tarball since among
them are boot loader files critical for IPLing.
Fixes: #2895
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Show available guest protections in the
`kata-runtime env` output. Also bump the formatVersion.
Fixes: #1982
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
Add functions to return guestProtection as a string slice, which
can be then used in `kata-runtime env` output.
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
uevents with action=remove was ignored causing the agent to reuse stale
data in the device map. This patch adds handling of such uevents.
Fixes#2405
Signed-off-by: Haitao Li <lihaitao@gmail.com>
On a conventional (e.g. runc) container, passing in a VFIO group device,
/dev/vfio/NN, will result in the same VFIO group device being available
within the container.
With Kata, however, the VFIO device will be bound to the guest kernel's
driver (if it has one), possibly appearing as some other device (or a
network interface) within the guest.
This add a new `vfio_mode` option to alter this. If set to "vfio" it will
instruct the agent to remap VFIO devices to the VFIO driver within the
guest as well, meaning they will appear as VFIO devices within the
container.
Unlike a runc container, the VFIO devices will have different names to the
host, since the names correspond to the IOMMU groups of the guest and those
can't be remapped with namespaces.
For now we keep 'guest-kernel' as the value in the default configuration
files, to maintain current Kata behaviour. In future we should change this
to 'vfio' as the default. That will make Kata's default behaviour more
closely resemble OCI specified behaviour.
fixes#693
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently constrainGRPCSpec always removes VFIO devices from the OCI
container spec which will be used for the inner container. For
upcoming support for VFIO devices in DPDK usecases we'll need to not
do that.
As a preliminary to that, add an extra parameter to the function to
control whether or not it will remove the VFIO devices from the spec.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
"constraint" is a noun, "constrain" is the associated verb, which makes
more sense in this context.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In order to support DPDK workloads, we need to change the way VFIO devices
will be handled in Kata containers. However, the current method, although
it is not remotely OCI compliant has real uses. Therefore, introduce a new
runtime configuration field "vfio_mode" to control how VFIO devices will be
presented to the container.
We also add a new sandbox annotation -
io.katacontainers.config.runtime.vfio_mode - to override this on a
per-sandbox basis.
For now, the only allowed value is "guest-kernel" which refers to the
current behaviour where VFIO devices added to the container will be bound
to whatever driver in the VM kernel claims them.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add and adjust the vfio devices in the inner container spec so that
rustjail will create device nodes for them.
In order to do that, we also need to make sure the VFIO device node is
ready within the guest VM first. That may take (slightly) longer than
just the underlying PCI device(s) being ready, because vfio-pci needs
to initialize. So, add a helper function that will wait for a
specific VFIO device node to be ready, using the existing uevent
listening mechanism. It also returns the device node name for the
device (though in practice it will always /dev/vfio/NN where NN is the
group number).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Many device nodes go directly under /dev, however some are conventionally
placed in subdirectories under /dev. For example /dev/vfio/vfio or
/dev/pts/ptmx.
Currently, attempting to pass such a device into a Kata container will fail
because mknod() will get an ENOENT because the parent directory is missing
(or an equivalent error for bind_dev()).
Correct that by making subdirectories as necessary in create_devices().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For each user supplied device, create_devices() checks that the given path
actually is in /dev, by checking that its path starts with /dev and does
not contain "..".
However, this has subtle errors because it's interpreting the path as a raw
string without considering separators. It will accept the path /devfoo
which it should not, while it will not accept the valid (though weird)
paths /dev/... and /dev/a..b.
Correct this by using std::path::Path methods designed for the purpose.
Having done this, it's trivial to also generate the relative path that
mknod_dev() or bind_dev() will need, so do that at the same time.
We also move this logic into a helper function so that we can add some unit
tests for it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both these functions take the absolute path from LinuxDevice and drop the
leading '/' to make a relative path. They do that with a simple
&dev.path[1..]. That can be technically incorrect in some edge cases such
as a path with redundant /s like "//dev//sda".
To handle cases like that, have the explicit relative path passed into
these functions. For now we calculate it in the same buggy way, but we'll
fix that shortly.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
create_devices() within the rustjail module is responsible for creating
device nodes within the (inner) containers. Errors that occur here will
be propagated up, but are likely to be low level failures of mknod() - e.g.
ENOENT or EACCESS - which won't be very useful without context when
reported all the way up to the runtime without the context of what we were
trying to do.
Add some anyhow context information giving the details of the device we
were trying to create when it failed.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, update_spec_device() assumes that the proper device path in the
(inner) container is the same as the device path specified in the outer OCI
spec on the host.
Usually that's correct. However for VFIO group devices we actually need
the container to see the VM's device path, since it's normal to correlate
that with IOMMU group information from sysfs which will be different in the
guest and which we can't namespace away.
So, add an extra "final_path" parameter to update_spec_device() to allow
callers to chose the device path that should be used for the inner
container. All current callers pass the same thing as container_path, but
that will change in future.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
update_spec_device_list() is used to update the container configuration to
change device major/minor numbers configured by the Kata client based on
host details to values suitable for the sandbox VM, which may differ. It
takes a 'device' object, but the only things it actually uses from there
are container_path and vm_path.
Refactor this as update_spec_device(), taking the host and guest paths to
the device as explicit parameters. This makes the function more
self-contained and will enable some future extensions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Each VFIO device passed into the guest could represent a whole IOMMU group
of devices on the host. Since these devices aren't DMA isolated from each
other, they must appear as the same IOMMU group in the guest as well.
The VMM should enforce that for us, but double check it, since things can't
work otherwise. This also means we determine the guest IOMMU group for the
VFIO device, which we'll be needing later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For upcoming VFIO extensions we'll need to work with the IOMMU groups of
VFIO devices. This helps us towards that by adding pci_iommu_group() to
retrieve the IOMMU group (if any) of a given PCI device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
VFIO devices can be added to a Kata container and they will be passed
through to the sandbox guest. However, inside the guest those devices
will bind to a native guest driver, so they will no longer appear as VFIO
devices within the guest. This behaviour differs from runc or other
conventional container runtimes.
This code allows the agent to match the behaviour of other runtimes,
if instructed to by kata-runtime. VFIO devices it's informed about
with the "vfio" type instead of the existing "vfio-gk" type will be
rebound to the vfio-pci driver within the guest.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For better VFIO support, we're going to need to take control of which guest
driver controls specific guest devices. To assist with that, add the
pci_driver_override() function to force a specific guest device to be
bound to a specific guest driver.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Use `"c".to_string` in the device type of `dev/full`
in order to consistent with the coding style of other devices
Fixes: #2890
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
create_tmpfs won't pass as the race condition in watcher umount. quote
James's words here:
1. Rust runs all tests in parallel.
2. Mounts are a process-wide, not a per-thread resource.
The only test that calls watcher.mount() is create_tmpfs().
However, other tests create BindWatcher objects.
3. BindWatcher's drop() implementation calls self.cleanup(),
which calls unmount for the mountpoint create_tmpfs() asserts.
4. The other tests are calling unmount whenever a BindWatcher goes
out of scope.
To avoid that issue, let the tests using BindWatcher in watcher and
sandbox.rs run sequentially.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
DefaultMaxVCPUs may be larger than the defaultMaxQemuVCPUs that should
be checked and avoided.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The physical current vcpu number should not be used directly as the
largest vcpu number is limited to defaultMaxQemuVCPUs.
Here, a new helper is introduced in pkg/katautils/config.go to get
current vcpu number.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
The current kernel version parse lib can't process suffix '+', as the
modified kernel version will add '+' as suffix, thus panic will occur.
For example, if the current kernel version is "5.14.0-rc4+", test
TestHostNetworkingRequested will panic:
--- FAIL: TestHostNetworkingRequested (0.00s)
panic: &{DistroName:ubuntu DistroVersion:18.04
KernelVersion:5.11.0-rc3+ Issue: Passed:[] Failed:[] Debug:true
ActualEUID:0}: failed to check test constraints: error: Build meta data
is empty
Here, remove the suffix '+' in kernel version fix helper.
Fixes: #2809
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Last of a series of commits to export the top level
hypervisor generic methods.
s/createSandbox/CreateVM
Fixes#2880
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Export commonly used hypervisor fields and utility functions.
These need to be exposed to allow the hypervisor to be consumed
externally.
Note: This does not change the hypervisor interface definition.
Those changes will be separate commits.
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
This is to bump the OOT QAT 1.7 driver version to the
latest version. I dida test on my QAT enabled system and
everything functioned as expected.
Fixes: #2877
Signed-off-by: Eric Adams <eric.adams@intel.com>
Highlights from the Cloud Hypervisor release v19.0: 1) Improved PTY
handling for serial and virtio-console; 2) PCI boot time optimisations;
3) Improved TDX support; 4) Live migration enhancements (support with
virtio-mem and virtio-balloon); 5) virtio-mem support with vfio-user; 6)
AArch64 for virtio-iommu; 7) Various bug fixes for live-migration and
VFIO passthrough.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v19.0Fixes: #2871
Signed-off-by: Bo Chen <chen.bo@intel.com>
The upstream kernel SGX support has changed drastically since
the initial version of the Intel SGX use case doc was written.
The updated use case documents how to easily setup SGX with
Kata Containers running in a Kubernetes cluster.
Fixes: #2811
Depends-on: github.com/kata-containers/tests#4079
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Co-authored-by: James O. D. Hunt <james.o.hunt@intel.com>
This PR includes these optimize changes:
- Remove the dependency on the container engine.
The old code uses runc to generate config.json and
Docker to export rootfs, that will be heavy and need
additional dependency.
Using a fixed config for busybox image can avoid
the heavy processing above.
- Moved duplicate code to pkg/katatestutils package
Fixes: #2752
Signed-off-by: bin <bin@hyper.sh>
cri-containerd project has been merged into containerd repo, and
we should not reference it any more in code and docs.
This commit will use containerd package instead of cri-containerd
package.
Fixes: #2791
Signed-off-by: bin <bin@hyper.sh>
This commit will add containerd to versions.yaml.
Please at now there are both containerd and cri-containerd
in the versions.yaml.
After updating of kata-containers/tests repo, the cri-containerd
should be removed.
Fixes: #2791
Signed-off-by: bin <bin@hyper.sh>
When building with dracut method the build_rootfs_distro() is not called, in turn
detect_rust_version() isn't either, so the install_rust.sh script is gave a null
rust version. This changed the script to call detect_rust_version() right before
install_rust.sh.
Related to commit: f34f67d610Fixes#2862
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Add `libseccomp` and `gperf` version information to support
for seccomp feature in Kata agent: #1788.
Fixes: #2858
Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
When running the TestIoCopy test, on some occasions, the test
runs too quick, and closes the stdin pipe before the ioCopy()
routine start to read from it. This causes a SIGSEGV error.
To fix this issue, I am adding additional read/write tests before
closing the pipes. As the read operation waits for the writer to
be done, this actually synchronizes the threads and make sure
the final tests (with closed pipes) works as expected.
Fixes: #2042
Signed-off-by: Julien Ropé <jrope@redhat.com>
and update the script in `ci/` accordingly.
When only parts of the Kata Containers repositories are checked out
(e.g. when building with Snap) and no Rust version is provided in
calling `install_rust.sh`, the scripts will attempt to clone the
appropriate repos to read the version, which will fail because the
directories already exist. Since we have read the version already, we
can just specify it.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
The agent build inside a Docker or Podman container has been re-enabled,
but we have since introduced the `$CI` environment variable. Pass it to
avoid checking out the tests repo to main when there is a dependency.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
or Podman. This is a partial revert of
76c18aa345. The rationale behind that
commit was the fact that the agent could not be built on Alpine, and
then this capability was removed altogether. The issue in Alpine has
since been resolved (see
https://github.com/kata-containers/osbuilder/issues/386). At the same
time, this ensures being able to run a glibc agent on hosts with distros
more recent than the osbuilder distro used (i.e. as of now, when you
build the agent on the host, and its glibc is newer than the one used in
the guest, the agent may encounter unresolved symbols).
Fixes#2398
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
The tracing tags for api.go contain `"packages"` as a tag name,
whereas all other tags contain `"package"`.
Fixes: #2847
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Display a pseudo path to the sandbox socket in the output of
`kata-runtime env` for those hypervisors that use Hybrid VSOCK.
The path is not a real path since the command does not create a sandbox.
The output includes a `{ID}` tag which would be replaced with the real
sandbox ID (name) when the sandbox was created.
This feature is only useful for agent tracing with the trace forwarder
where the configured hypervisor uses Hybrid VSOCK.
Note that the features required a new `setConfig()` method to be added
to the `hypervisor` interface. This isn't normally needed as the
specified hypervisor configuration passed to `setConfig()` is also
passed to `createSandbox()`. However the new call is required by
`kata-runtime env` to display the correct socket path for Firecracker.
The new method isn't wholly redundant for the main code path though as
it's now used by each hypervisor's `createSandbox()` call.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add support for Hybrid VSOCK. Unlike standard vsock (`vsock(7)`), under
hybrid VSOCK, the hypervisor creates a "master" *UNIX* socket on the
host. For guest-initiated VSOCK connections (such as the Kata agent uses
for agent tracing), the hypervisor will then attempt to open a VSOCK
port-specific variant of the socket which it expects a server to be
listening on. Running the trace forwarder with the new `--socket-path`
option and passing it the Hypervisor specific master UNIX socket path,
the trace forwarder will listen on the VSOCK port-specific socket path
to handle Kata agent traces.
For further details and examples, see the README or run the
trace forwarder with `--help`.
Fixes: #2786.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
.dockerignore file is similar to .gitignore and serves the purpose to
simply ignore paths in the build context.
For now, let me just use it to fix the following problem:
```
docker build --build-arg KATA_ARTIFACTS=kata-static.tar.xz .
error checking context: 'no permission to read from
'(...)/local-build/build/firecracker/builddir/firecracker/(...)/crc64-1.0.0/.gitignore''.
```
Fixes: #2845
Signed-off-by: Fabiano Fidêncio <fabiano@fidencio.org>
Adding a route that already exists should not be a reason for the agent to fail
booting and thus preventing the sandbox to start.
Fixes#2712
Signed-off-by: zhaojizhuang <571130360@qq.com>
- kata-monitor: add index page
- clh: Refine the usage of guest console and kernel parameters with Cloud Hypervisor
- agent: exec should inherit container process capabilities
- GitHubActions: fix invalid format of require-pr-porting-labels.yaml
- agent: flush root span before process finish
- Extend PCI submodules to represent non-zero functions and addresses
- packaging/kernel: Add CONFIG_PCI_MMCONFIG to x86 guest kernel configuration
- runtime: don't start shim management server in tests
- qemu: use GitLab repos instead of qemu.org
- runtime: optimize code for managing temp users for rootless mode
- Agent configuration file and API restriction
- Delete file virtcontainers-setup.sh
- vendor: Update containerd to v1.5.7
- runtime: Optimize func noNeedForOutput and add test cases
- runtime: Fix !x86 static checks
- #2676: fixing centos gpg key url for ppc64le
- Pass the host route IP family to the guest
- cmd: get return value for setCPUtype
- packaging: Configure QEMU with --enable-pie
- clh: Enable guest userland output
- cmd: Fix mismatched types in testModuleData
- runtime: update .gitignore to ignore monitor_address file
- runtime: fix the make check-go-static command error
- virtcontainers: clean up useless code
- Remove forced PCI rescans from agent
- kernel: Enable SGX in experimental kernel.
- runtime: fix nil reference in cleanup rootless user
- qemu: prepare to upgrade qemu version to 6.1.0 for arm
- kata-monitor (minor) improvements
- virtcontainers: Fix incorrect scripts path
- runtime: clear virtcontainers cgroup duplicated function
- Kata monitor: cache improvements
- virtiofs: fix error report in TestVirtiofsdStart when go test running
176dee6f agent: exec should inherit container process capabilities
7b2bfd4e virtcontainers: clh: Use 'quiet' as the default kernel parameter
3e24e46c virtcontainers: clh: Turn-off serial and virtio-console by default
2d7b65e8 agent: flush root span before process finish
5c77cc2c runtime: don't start shim management server in tests
72044180 agent/device: Return PCI address from wait_for_pci_device()
e50b05d9 agent/pci: Add type to represent PCI addresses
8528157b agent/pci: Extend Slot type to represent PCI function as well
bf8f582c runtime: optimize code for managing temp users for rootless mode
a9c2a4ba GitHubActions: fix invalid format of require-pr-porting-labels.yaml
c4236cb2 packaging/kernel: Add CONFIG_PCI_MMCONFIG to x86 guest kernel configuration
08360c98 agent: Add an agent configutation file example
8a4e69d2 agent: rpc: Return UNIMPLEMENTED for not allowed endpoints
0ea2e3af agent: config: Allow for building the configuration from a file
63539dc9 agent: config: Add allowed endpoints
a953fea3 agent: config: Simplify configuration creation
b888edc2 agent: config: Implement Default
7eac2ec7 protection: add confidential compute frame for arm
8acfc154 check: fix typecheck failure in qemu_arm64_test.go
5b02d54e virtcontainers: fix lint failure on ppc64le
ff9728f0 virtcontainers: nolint guestProtection
5c138c8f runtime: Fix field alignment on s390x
191d0016 vendor: Update containerd to v1.5.7
f7f6bd01 kata-monitor: add index page
a44cde7e agent: netlink: Use the grpc IP family field when updating the route
71ce6cfe runtime: Pass the route IP family to the agent
99450bd1 agent: protos: Add a Family field to the Route payload
f85fe702 runtime: vendor: Bump the netlink package dependency
e439cec7 cmd: fix field alignment on ppc64le
e5159ea7 cmd: get return value for setCPUtype
2ce8d426 clh: Suppress hypervisor output to make guest output visible
cd1064b1 packaging: Configure QEMU with --enable-pie
762922a5 runtime: delete func ConstraintsToVCPUs
4f485430 runtime: delete virtcontainers-setup.sh
80f6b977 osbuilder: fixing centos gpg key url for ppc64le
bb99bfb4 runtime: fix the make check-go-static command error
870771d7 runtime: update .gitignore to ignore monitor_address file
18bff584 runtime: Optimize func noNeedForOutput and add test cases
e5fe53f0 runtime: fix nil reference in cleanup rootless user
2304a596 runtime: set the sandbox storage path static
315295e0 runtime: rename GetSanboxesStoragePath() --> GetSandboxesStoragePath()
13e65f2e cmd: Fix mismatched types in testModuleData
da42cbc0 actions: Build experimental kernel on kata-deploy push action
dffc5092 kernel: Enable SGX in experimental kernel.
ff6a677d kernel-build: Enable multiple config types.
90046964 experimental-kernel: bump 5.13.10
1fbb7304 build: kata-deploy kernel experimental
907459c1 agent/device: Don't force PCI rescans
75f426dd agent: Simplify do_add_swap()
aad1a873 runtime/device: Give the agent information about VFIO devices
ebd7b618 runtime: Don't repeat GetDeviceByID between appendDevices() and append*()
ad45c52f runtime/device: Record guest PCI path for VFIO devices
5c2af3e3 runtime/device: Refactor hotplugVFIODevice() to have common exit path
8bc71105 agent/device: Add device type for VFIO devices
f7a27075 agent: Move driver type constants into device.rs
5b1eb08b agent/uevent: Improve logging of wait_for_uevent()
cf36fd87 runtime: Fix some leftover go fmt errors
6d94957a kernel: reduce alignment size of memory hotplug to 128M
48090f62 qemu: disable plug on arm64 when pie is added
57e3712d virtiofs: fix error report in TestVirtiofsdStart when go test running
8b0bc1f4 kata-monitor: bump version to 0.2.0
bfb556d5 kata-monitor: refresh kata sandbox list on fs events
0e854f3b kata-monitor: improve detection of kata workloads
80463b44 qemu: use GitLab repos instead of qemu.org
3b0c4bf9 runtime: clear virtcontainers cgroup duplicated function
afad910d kata-monitor: add getSandboxFS()
e38686f7 runtime: add GetSandboxesStoragePath()
245a12bb kata-monitor: improve sandbox caching
fc067d61 kata-monitor: warn when unable to retrive the lower level runtime
53ec4df9 kata-monitor: minor fixes
47516988 virtcontainers: Fix incorrect scripts path
814cea96 virtcontainers: clean up useless code
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
The 'quiet' kernel parameter can avoid guest kernel logs while booting,
which can reduce boot time.
Fix: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
We will need to have console output from the guest only for debugging
purposes. As a result, we can turn-off both the serial and
virtio-console devices by default for better boot time.
Fixes: #2820
Signed-off-by: Bo Chen <chen.bo@intel.com>
Variables in rust will be dropped at the end of the function.
In function real_main the trace will be shut down by `tracer::end_tracing()`,
but at this time the root span is in an active state, so this root span
will not be sent to the trace collector.
This can be fixed by dropping the root span manually.
Fixes: #2812
Signed-off-by: bin <bin@hyper.sh>
The variable for 'name' in config-settings.go.in was previously
hardcoded as "kata". In e7c42fb it was changed to the runtime name,
which is "kata-runtime". Add a variable to specify a syslog identifier
for consistency for tests and documentation that use it.
Fixes#2806
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Update the sandbox dir clean up logic to be more appropriate
Add different seeds for randInt() method
Fixes#2770
Signed-off-by: Feng Wang <feng.wang@databricks.com>
This patch adds an option "disable_seccomp" to the config
hypervisor.clh, from which users can disable the `seccomp`
feature from Cloud Hypervisor when needed (for debugging purposes).
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
This patch enables the `seccomp` feature from Cloud Hypervisor which
provides fine-grained allowed syscalls for each of its worker
threads. It brings important security benefits, while would increase
memory footprint.
Fixes: #2782
Signed-off-by: Bo Chen <chen.bo@intel.com>
Shim management server is running in a go routine, in test mode
this will cause the directory where the listen socket
file(/run/vc/sbs/777-77-77777777/shim-monitor.sock) in leak
after the tests finished.
Fixes: #2805
Signed-off-by: bin <bin@hyper.sh>
wait_for_pci_device() waits for the PCI device at the given path to become
ready, but it doesn't currently give you any meaningful handle on that
device.
Change the signature, so that it returns the PCI address of the device.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add a new pci::Address type which represents a guest PCI address in
DDDD:BB:SS.F form.
fixes#2745
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
pci::Slot represents a PCI slot. However, in all cases where we use it, we
actually care about addressing a specific PCI function. So, at the moment
we can only refer to function 0 in each slot.
Replace pci::Slot with pci::SlotFn to represent both the slot and function.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This commit does two chagnes:
- move code for managing temp users to rootless.go.
- use common function in qemu.go when shutdown the VM.
Fixes: #2759
Signed-off-by: bin <bin@hyper.sh>
The yaml file has an indent issue from line 15.
And the branches filter should be under pull_request_target but
not the pull_request trigger.
Also actions/checkout@v2 does not need the token parameter.
Fixes: #2798
Signed-off-by: bin <bin@hyper.sh>
The guest kernel configuration suggested for Kata, and which is used by the
CI didn't include CONFIG_PCI_MMCONFIG. That's kind of weird, MMCONFIG is
the modern normal way of handling configuration cycles.
In addition, due to a complex set of interactions through the ACPI code,
disabling MMCONFIG means that SHPC hotplug doesn't work: the driver is
included in the guest kernel, but will fail to probe on PCI to PCI bridges,
meaning it won't actually be activated.
Enable MMCONFIG so that we suggest and testa more typical guest kernel
configuration.
fixes#2288
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
From the endpoints string described through the configuration file, we
build a hash set of allowed enpoints. If a configuration files does not
include an endpoints section, we assume all endpoints are not allowed.
If there is no configuration file, then all endpoints are allowed.
Then for every ttrpc request, we check if the name of the endpoint is
part of the hashset. If it is not, then we return ttrcp::UNIMPLEMENTED.
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
When the kernel command line includes a agent.config_file=<path> entry,
then we will try to override the default confiuguration values with the
ones we parse from a TOML file at <path>.
As the configuration file overrides the default values, we need to go
through a simplified builder that convert a set of Option<> fields into
the actual AgentConfig structure.
Fixes: #1837
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
They will define the list of endpoints that an agent supports.
They're empty and non actionable for now.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
A single constructor setting default value is a typical pattern for a
Default implementation.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Even CCA, which is the confidential compute archtecture, has not been
ready, add a empty implementation to avoid static check error.
Fixes: #2789
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
Suggested-by: Fabiano Fidêncio <fidencio@redhat.com>
Exclude from lint checking for it is ultimately only used in
architecture-specific code.
Fixes: #2273
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Bump containerd to v1.5.7 in order to bring in a fix for CVE-2021-41103,
"insufficiently restricted permissions ons plugins directories
(https://github.com/advisories/GHSA-c2h3-6mxw-7mvq)".
dependabot found a potential security vulnerability and raised a PR to
fix it. However, dependabot does not properly follows nor understands
the needed of our CIs (mainly related to formatting the PR and whatnot),
thus I'm re-raising it.
Fixes: #2796
Supersedes: #2787
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Not all routes have either a gateway or a destination IP.
Interface routes, where the source, destination and gateway are undefined,
will default to IP v4 with the current is_ipv6() check even when they
are v6 routes.
We use the provided gRPC Route.Family field instead. This field is built
from the host netlink messages, and is a reliable way of finding out
a route's IP family.
Fixes: #2768
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Our check for the IP family is working as long as we have either a
gateway or a destination IP. Some routes are missing both.
The RT netlink messages provide the IP family information for each
route, so we can carry that piece of information up to the guest. That
will allow for a more reliable route IP family determination.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
We need to be able to get the IP family from the netlink route meesages,
and the Route.Family field only got recently added to the netlink
package.
The update generates static check warnings about the call for
nethandler.Delete() being deprecated in favor of a Close() call instead.
So we include the s/Delete()/Close()/ change as part of this PR.
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
Reduce the cloud-hypervisor log level from `Debug` to `Info` when hypervisor
debug is enabled. This is required since `Debug` level:
- Is overkill for debugging hypervisor failures.
- Effectively hides the output from the guest kernel and userland: CLH
generates so much output that the output from the guest gets "lost in
the noise" (experiments show that for each full CLH debug message, at most
1 _byte_ of guest output is displayed).
Fixes: #2726.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
We explicitely set the Postion Independant Executlable (PIE) options
in the extra CFLAGS and LDFLAGS that are passed to the QEMU configure
script for all archs. This means that these options are used pretty
much everywhere, including when building the sample plugins under the
test directory. These cannot be linked with -pie and break the build,
as experienced recently on ARM (see PR #2732).
This only broke on ARM because other archs are configured with
--disable-tcg : this disables plugins which are built by default
otherwise.
The --enable-pie option is all that is needed. The QEMU build system
knows which binaries should be created as PIE, e.g. the important
bits like QEMU and virtiofsd, and which ones should not, e.g. the
sample plugins that aren't used in production.
Rely on --enable-pie only, for all archs. This allows to drop the
workaround that was put in place in PR #2732.
Fixes: #2757
Signed-off-by: Greg Kurz <groug@kaod.org>
The centos ppc64le gpg key at mirror.centos.org doesn't exist (link rot?).
Replacing it with url from CentOS/sig-core-AltArch on github.
Fixes: #2676
Signed-off-by: Aaron Simmons <paleozogt@gmail.com>
modify the make script of the check-go-static, changing the `./cli` path to `./cmd/kata-runtime`
Fixes: #2765
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Run tests sometimes generate pkg/containerd-shim-v2/monitor_address,
and `git status` will treat it as a new file.
Package containerd-shim-v2 has moved to pkg/containerd-shim-v2,
the monitor_address in .gitignore should be updated too.
Fixes: #2762
Signed-off-by: bin <bin@hyper.sh>
It seems the client (crio) can send multiple requests to stop the Kata VM,
resulting a nil reference if the uid has already been cleaned up by a different thread.
Fixes#2743
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Since we now have "unix://" kind of socket returned by the
SocketAddress() function, there is no more need to build the sandbox
storage path dynamically to keep OS compatibility.
Fixes: #2738
Suggested-by: Christophe de Dinechin <dinechin@redhat.com>
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Rectify the values of testModuleData with the correct
types in TestCCCheckCLiFunction in kata-check_(!x86)_test.go
Fixes: #2735
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
Optional build types are common for early adoption.
Lets add a flag to build and optional config.
e.g.
kernel-build.sh -b experimental
In the future instead of add more flags just add a new build type.
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
The agent initiates a PCI rescan from two places. One is triggered
for each virtio-blk PCI device, and one is triggered unconditionally
when we start a new container.
The PCI bus rescan code was added long time ago in Clear Containers due to
lack of ACPI support in QEMU 2.9 + q35. Since Kata routinely plugs devices
under a PCIe-to-PCI bridge, that left SHPC as the only available hotplug
mechanism.
However, while Kata was using SHPC on the qemu side, it wasn't actually
using it on the guest side. Due to a quirk of our guest kernel
configuration, the SHPC driver never bound to the bridge, and *no* hotplug
was working at all. To work around that, Kata was forcing the rescan
manually, which would discover the new device. That was very fragile (we
were arguably relying on a kernel bug). Even if we were using SHPC
propertly, it includes a mandatory 5s delay during plug operations
(designed for physical cards and human operators), which makes it
unsuitable quick start up.
Worse, the forced PCI rescans could race with either SHPC or PCIe native
hotplug sequences, causing several problems. In some cases this could put
the device into an entirely broken state where it wouldn't respond to
config space accesses at all.
Since pull request #2323 was merged, we have instead used ACPI hotplug
which is both fast, and more solid in terms of semantics and races. So,
the forced PCI rescans are no longer necessary. Remove them all.
fixes#683
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
do_add_swap() has some mildly complex code to translate the PCI path of
a virtio-blk device (where the swap will reside) into a /dev path. However,
the device module already has get_virtio_blk_pci_device_name() which does
exactly that. The existing code has some further advantages: it uses
more precise matching of the sysfs paths, and if necessary it will wait for
the device to be added to the guest.
While we're there, remove an unnecessary 'as u8' from the PCI path
construction: pci::Path::new() already accepts anything which implements
TryInfo<u8>, which u32 certainly does.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We send information about several kinds of devices to the agent so
that it can apply specific handling. We don't currently do this with
VFIO devices. However we need to do that so that the agent can
properly wait for VFIO devices to be ready (previously it did that
using a PCI rescan which may not be reliable and has some very bad
side effects).
This patch collates and sends the relevant information.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Both appendBlockDevice and appendVhostUserBlkDevice start by using
GetDeviceByID to lookup the api.Device object corresponding to their
ContainerDevice object. However their common caller, appendDevices() has
already done this.
This changes it so the looked up api.Device is passed to the individual
append*Device() functions. This slightly reduces duplicated work, but more
importantly it makes it clearer that append*Device() don't need to check
for a nil result from GetDeviceByID, since the caller has already done
that.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
For several device types which correspond to a PCI device in the guest
we record the device's PCI path in the guest. We don't currently do
that for VFIO devices, but we're going to need to for better handling
of SR-IOV devices.
To accomplish this, we have to determine the guest PCI path from the
information the VMM gives us:
For qemu, we query the slot of the device and its bridge from QMP.
For cloud-hypervisor, the device add interface gives us a guest PCI
address. In fact this represents a design error in the clh API -
there's no way it can really know the guest PCI address in general.
It works in this case, because clh doesn't use PCI bridges, so the
device will always be on the root bus. Based on that, the PCI path is
simply the device's slot number.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
hotplugVFIODevice() has several different paths depending if we're
plugging into a root port or a PCIE<->PCI bridge and if we're using a
regular or mediated VFIO device.
We're going to want some common code on the successful exit path here,
so refactor the function to allow that without duplication.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently, VFIO devices attached to a Kata container aren't described to
the agent at all. We essentially just hope they're ready by the time
we've entered the container proper, which is usually the case because of
the PCI rescan - but that causes other problems.
This adds a new device type to the agent representing VFIO devices. The
agent will use its existing uevent watching mechanisms to wait for the
associated guest PCI device to appear before proceeding.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently the constants giving the names for each device/driver type in
the protocol are in mount.rs, and used in device.rs. Since these constants
are inherently related to, well, devices, it makes more sense to put them
in device.rs and use them from mount.rs.
This will become even more so with planned extensions which will add some
device types that will not be used in mount.rs at all.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A few "go fmt" errors appear to have crept it. Clean them up with
"go fmt ./..." in the src/runtime directory.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
After 5.11-rc4, memory hotplug alignment size is reduced to 128M for 4K
page.
It works better for memory hotplug and nvdimm plug in kata on arm.
without this patch, memory hotplug will fail for the current memory
hotplug alignment is 1G but the nvdimm size align with 128M in kata.
After port it here, we can avoid a fix in qemu side.
Note: if you change the page size to other size than 4K, memory hotplug
will has no effect.
Fixes: #2707
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
For qemu 6.1.0 build on arm64, compile error occurs when "-pie" is added
to ldflag.
tests/plugins/empty.c won't be linked as a sysmbol is missing.
I consider there maybe a bug.
Before figure it out, we should disable plugins for qemu 6.1.0 on arm64.
Fixes: #2707
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
There's a typo in the file that should receive the output of `cargo
vendor`. We should use forward the output to `.cargo/config` instead of
`.cargo/vendor`.
This was introduced by 21c8511630.
Fixes: #2729
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
While releasing kata-containers 2.3.0-alpha1 we've hit some issues as
the tags attribution is done incorrectly. We want an array of tags to
iterate over, but the currently code is just lost is the parenthesis.
This issue was introduced in a156288c1f.
Fixes: #2725
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
- virtiofs: Create shared directory with 0700 mode, not 0750
- watcher: ensure we create target mount point for storage
- packaging: fix qemu build on ppc64le
- runtime: tracing: Use root context to stop tracing
- Replace SHPC with ACPI PCI hotplug for Kata guests
- kata-deploy: Also provide "stable" & "latest" tags
- runtime: tracing: Fix logger passed in newContainer
- virtcontainers: update VC SandboxConfig API add SandboxBindMounts field
- sandbox: Allow the device to be accessed,such as /dev/null and /dev/u…
- qemu: add v5.1.0 dir under tag_patches
- threat-model: Add missing threat-model document
- docs: documentation for running non-root VMM
- workflows,release: Upload the vendored cargo code
- runtime: run the QEMU VMM process with a non-root user
- runtime: update .gitignore file cleare the vc shim config
- runtime: fix empty cgroup path validation error
- ci: Call agent shutdown test only in the correspondent CI_JOB
- runtime: Remove outdated TestStoreContainer
- runtime: refactor commandline code directory
- virtcontainers: update VC HypervisorConfig API add three lost fields
- virtcontainers: add unit tests for container.go
- runtime: clh: Enable hugepages support
- agent: Simplify mount point creation
- versions: Allow newer Rust versions
- runtime/qemu: Move from query-cpus to query-cpus-fast
- Update Kata to use qemu-6.1
- Host cgroups improvements and simplifications
- Add doc for guest swap
- versions: Upgrade to Cloud Hypervisor v18.0
- runtime: Fix README link
- qemu: remove default config for arm64.
- sandbox: Add device permissions such as /dev/null to cgroup
- virtcontainers: fc: parse vcpuID correctly
- kata-tarball: Build and test fixes
- test: enable running tests under root user
- osbuilder: Change to "=" operator to make script more portable
- makefile: Fix error exit status code
- osbuilder: fix inconsistent calculation of fs size
- virtcontainers: Remove NewStoreFeature
- snap: Test variable instead of executing "branch"
- license: drop redundent license files
- Fix swap fail insert fail issue
272771dc watcher: ensure we create target mount point for storage
439e5ac3 packaging: fix qemu build on ppc64le
8bbcb06a qemu: Disable SHPC hotplug
cc4983ee runtime: Remove unused qemuArchBase.appendBridges definition
e248de46 vendor: Update govmm
0ca8c272 qemu: add v5.1.0 dir under tag_patches
3bdcfaa6 kata-deploy: Add more info about the stable tag
41c590fa kata-deploy: Improve README
debf3c9f kata-deploy: Remove qemu-virtiofs runtime class
43a72d76 release: update the kata-deploy yaml files accordingly
ea9b2f9c kata-deploy: Add "stable" info to the README
e5411056 kata-deploy: Update the README
9acf4e5d kata-deploy: Add `stable` yaml files
a86babe0 kata-deploy: Point to the `latest` release
a156288c workflows: Add "stable" & "latest" tags to kata-deploy
305afc8b docs: documentation for running non-root VMM
1fe080fd threat-model: Add missing threat-model document
21c85116 workflows,release: Upload the vendored cargo code
9a6d56f1 runtime: fix empty cgroup path validation error
90e63887 ci: Call agent shutdown test only in the correspondent CI_JOB
48fb1d92 virtiofs: Create shared directory with 0700 mode, not 0750
077b77c1 runtime: tracing: Fix logger passed in newContainer
39cd05e0 runtime: tracing: Use root context to stop tracing
1cfe5930 runtime: Run QEMU using a non-root user/group
fd983738 runtime: update .gitignore file cleare the vc shim config
067c44d0 runtime: fix UT build failure
9353cd77 runtime: Remove outdated TestStoreContainer
9a311a2b docs: fix invalid kernel dax doc url
e7c42fbc runtime: unify generated config
4f7cc186 runtime: refactor commandline code directory
9d3cd984 agent/mount: Remove unused ensure_destination_exists()
64aa5623 agent: Correct mount point creation
08d7aebc agent/mount: Split out regular file case from ensure_destination_exists()
9fa3beff agent: Remove unnecessary BareMount structure
49282854 agent: Simplify BareMount::mount by using nix::mount::mount
d00decc9 runtime: clh: Enable hugepages support
64bb803f runtime/qemu: Move from query-cpus to query-cpus-fast
25ac3524 versions: Allow newer Rust versions
851d5f86 tests: Correct heading in static checks test
4b7e4a4c runtime: Vendoring update
8d9d6e6a docs: Host cgroups documentation update
9bed2ade virtcontainers: Convert to the new cgroups package API
b42ed393 virtcontainers: cgroups: Add a containerd API based cgroups package
f17752b0 virtcontainers: container: Do not create and manage container host cgroups
dc7e9bce virtcontainers: sandbox: Host cgroups partitioning
f811026c virtcontainers: Unconditionally create the sandbox cgroup manager
a6066404 virtcontainers: update VC HypervisorConfig API add three lost fields
bb18cd47 virtcontainers: update VC SandboxConfig API add SandboxBindMounts field
58e77a3c sandbox: Allow the device to be accessed,such as /dev/null and /dev/urandom
d67a414b src/runtime/README.md: Fix URL of Licence
13b8bb0c runtime: Fix README link
25670d30 packaging/qemu: Update qemu-exerimental version to v6.1.0
041a513f versions: Update qemu to v6.1.0
62baa48e virtcontainers: fc: parse vcpuID correctly
81de2d47 packaging: Correct error message in apply_patches.sh
f785ff0b virtcontainers: clh: Revert the workaround incorrect default values
0e0e59dc virtcontainers: clh: Re-generate the client code
f0b53314 versions: Upgrade to Cloud Hypervisor v18.0
11652136 actions: test make kata-tarball
626d659f actions: kata-deploy on PRs and use makefile
78d99f51 kata-deploy: Make verbose single builds
59486b85 kata-deploy: Add tarball suffix to makefile targets
96e1246b makefile: Include kata-deploy targets
74d645cd how-to: Add how-to-setup-swap-devices-in-guest-kernel.md
d865c809 virtcontainers: add unit tests for container.go
71f915c6 sandbox: Add device permissions such as /dev/null to cgroup
2174fee4 docs: Add swap annotations introduction
2abc450a test: enable running tests under root user
924a68d0 osbuilder: Change to "=" operator to make script more portable
1fff9be7 qemu: remove default config for arm64.
e2a9e78c virtcontainers: Remove NewStoreFeature
bfcee911 osbuilder: fix inconsistent calculation of fs size
4996f9b7 snap: Test variable instead of executing "branch"
256c3b27 license: drop redundent license files
bcc9fa3b hotplugAddBlockDevice: Use ExecuteBlockdevAddWithDriverCache with swap
bd85da04 vendor: Update vendor/github.com/kata-containers/govmm
d422789f makefile: Fix error exit status code
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
We would only create the target when updating files. We need to make
sure that we create the target if the source is a directory. Without
this, we'll fail to start a container that utilizes an empty configmap,
for example.
Add unit tests for this.
Fixes: #2638
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
We now support any container engine CRI compliant. Let's bump the
kata-monitor version to 0.2.0.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
This commit stops the container engine polling in favor of
the kata sandbox storage path monitoring.
The pod cache list is now refreshed based on fs events and synced with
the container engine only when needed.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
When the container engine is different than containerd or CRI-O we
lack proper detection of kata workloads and consider all the pods as
kata ones.
Instead of querying the container engine for the lower level runtime
used in each pod, check if a directory matching the pod exists in
the virtualcontainers sandboxes storage path.
This provides a container engine independent way to check for kata pods.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Since the qemu upgrade to v6.1.0, the build fails
with a linking issue. Adding --disable-tcg to fix
it.
Fixes: #2710
Signed-off-by: Amulya Meka <amulmek1@in.ibm.com>
Under certain circumstances[0] Kata will attempt to use SHPC hotplug
for PCI devices on the guest. In fact we explicitly enable SHPC on
our PCI to PCI bridges, regardless of the qemu default.
SHPC was designed a long, long time ago for physical hotplugging and
works very poorly for a virtual environment. In particular it has a
mandatory 5s delay to allow a (real, human) operator to back out the
operation if they press a button by mistake. This alone makes it
unusable for a fast start up application like Kata.
Worse, the agent forces a PCI rescan during startup. That will race
with the SHPC hotplug operation causing the device to go into a bad
state where config space can't be accessed from the guest at all.
The only reason we've sort of gotten away with this is that our
default guest kernel configuration triggers what's arguably a kernel
bug effectively disabling SHPC. That makes the agent rescan the only
reason we see the new device.
Now that we require a qemu >=6.1, which includes ACPI PCI hotplug on
the q35 machine, we can explicitly disable SHPC in all cases. It's
nothing but trouble.
fixes#2174
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
qemuArchBase.appendBridges is never actually used, because the bare
qemuArchBase type is itself never used (outside of unit tests). Instead
*all* the subclasses of qemuArchBase override appendBridges() to call
the very similar, but not identical genericAppendBridges. So, we can
remove the qemuArchBase.appendBridges implementation.
Furthermore, all those subclasses override appendBridges() in exactly
the same way, and so we can remove *those* definitions and replace the
base class qemuArchBase appendBridges() with that version, calling
genericAppendBridges().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A related dir is needed when apply qemu patch using script. As qemu 5.1
is used for arm, a dir of "v5.1.0" is needed under tag_patches.
Fixes: #2696
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
as `-realtime` has been removed in QEMU 6. `-overcommit` has been
supported since at least QEMU 3.1.
Fixes: #189
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
There are `DeviceToDeviceCgroup` and `deviceToDeviceCgroup` two functions,
creating a `specs.LinuxDeviceCgroup` object. We clear the new function `deviceToDeviceCgroup`.
Fixes: #2694
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
Let's make it as clear as possible for the user that if they go for a
tagged version of kata-deploy, eg, 2.2.1, they'll have the kata runtime
2.2.1 deployed on their cluster.
Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's add more instructions in the README in order to make clear to the
reader what they can do to check whether kata-deploy is ready, or
whether they have to wait till proceeding with the next instruction.
Suggested-by: Eric Adams <eric.adams@intel.com>
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
There's only one QEMU runtime class deployed as part of kata-deploy, and
that includes virtiofs support (which is the default for quite some time
already). Knowing this, let's just remove the `qemu-virtiofs` runtime
class definition.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's teach our `update-repository-version.sh` script to properly update
the kata-deploy tags on both kata-deploy and kata-cleanup yaml files.
The 3 scenarios that we're dealing with, based on which branch we're
targetting, are:
```
1) [main] ------> [main] NO-OP
"alpha0" "alpha1"
+----------------+----------------+
| from | to |
-----------------+----------------+----------------+
kata-deploy | "latest" | "latest" |
-----------------+----------------+----------------+
kata-deploy-base | "stable | "stable" |
-----------------+----------------+----------------+
2) [main] ------> [stable] Update kata-deploy and
"alpha2" "rc0" get rid of kata-deploy-base
+----------------+----------------+
| from | to |
-----------------+----------------+----------------+
kata-deploy | "latest" | "rc0" |
-----------------+----------------+----------------+
kata-deploy-base | "stable" | REMOVED |
-----------------+----------------+----------------+
3) [stable] ------> [stable] Update kata-deploy
"x.y.z" "x.y.(z+1)"
+----------------+----------------+
| from | to |
-----------------+----------------+----------------+
kata-deploy | "x.y.z" | "x.y.(z+1)" |
-----------------+----------------+----------------+
kata-deploy-base | NON-EXISTENT | NON-EXISTENT |
-----------------+----------------+----------------+
```
And we can easily cover those 3 cases only with the information about
the "${target_branch}" and the "${new_version}", where:
* case 1) if "${target_branch}" is "main" *and* "${new_version}"
contains "alpha", do nothing
* case 2) if "${target_branch}" is "main" *and* "${new_version}"
contains "rc":
* change the kata-deploy & kata-cleanup tags from "latest" to
"${new_version}".
* delete the kata-deploy-stable & kata-cleanup-stable files.
* case 3) if the "${target_branch}" contains "stable":
* change the kata-deploy & kata-cleanup tags from "${current_version}"
to "${new_version}".
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Similar to the instructions we have for the "latest" images, let's also
add instructions about the "stable" images.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's just point to our repo URLs rather than assume users using
kata-deploy will have our repo cloned.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
This is **not** the nicest patch of my career, and I know it adds code
duplication. However, I've decided to take this approach in order to
have easier / better instructions for users who're consuming
kata-deploy.
Having both stable & latest yaml on `main` will let us point to just one
place, without having to update the instructions.
I know, would be better to have those generated from a .in file,
wouldn't it? For sure, but then we'd lose the ability to just point to
those files from kata-deploy pages (either on dockerhub or quay.io).
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Instead of point to a specific release number, let's point to the
`latest` tag on the main branch.
There's still some work needed in order to point to the `stable` tag on
the stable-x.y branches, as this is something that should be done
automagically as part of the release process.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
When releasing a tarball, let's *also* add the "stable" & "latest" tags
to the kata-deploy image.
The "stable" tag refers to any official release, while the "latest" tag
refers to any pre-release / release candidate.
Fixes: #2302
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
This was added in the 1.x repo and is missing in the 2.x repo.
Copying over the document from 1.x.
This is a starting point and focuses on the devices / interfaces
with the virtual machine, and ultimately to the container itself.
We then discuss how these devices/interfaces vary by VMM/hypervisor.
The threat model drawing is created via gdocs, located here:
https://docs.google.com/drawings/d/1dPi9DG9bcCUXlayxrR2OUa1miEZXewtW7YCt4r_VDmA/edit?usp=sharing
For Kata 2.x, the block named as `kata-runtime` has been changed to
`kata-shim`.
Fixes: #2340
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
As part of the release, let's also upload a tarball with the vendored
cargo code. By doing this we allow distros, which usually don't have
access to the internet while performing the builds, to just add the
vendored code as a second source, making the life of the downstream
maintainers slightly easier*.
Fixes: #1203
*: The current workflow requires the downstream maintainer to download
the tarball, unpack it, run `cargo vendor`, create the tarball, etc.
Although this doesn't look like a ridiculous amount of work, it's better
if we can have it in an automated fashion.
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
The agent shutdown test should only run on the CI JOB of CRI_CONTAINERD_K8S_MINIMAL
which is the only one where testing tracing is being enabled, however, this
test is being triggered in multiple CI jobs where it should not run. This PR
fixes that issue.
Fixes#2683
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
A discussion on the Linux kernel mailing list [1] exposed that virtiofsd makes a
core assumption that the file systems being shared are not accessible by any
non-privileged user. We currently create the `shared` directory in the sandbox
with the default `0750` permissions, which gives read and directory traversal
access to the group. There is no real good reason for a non-root user to access
the shared directory, and this is potentially dangerous.
Fixes: #2589
[1]: https://lore.kernel.org/linux-fsdevel/YTI+k29AoeGdX13Q@redhat.com/
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Retrieve the absolute sandbox storage path. We will soon need this to
monitor the creation/deletion of new kata sandboxes.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The storage path we use to collect the sandbox files is defined in the
virtcontainers/persist/fs package.
We create the runtime socket in that storage path, by hardcoding the
full path in the SocketAddress() function in the runtime package.
This commit splits the hardcoded path by the socket address path so that
the runtime package will be able to provide the storage path to all the
components that may need it.
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
In order to retrieve the list of sandboxes, we poll the container engine
every 15 seconds via the CRI. Once we have the list we have to inspect
each pod to find out the kata ones.
This commit extend the sandbox cache to keep track of all the pods,
marking the kata ones, so that during the next polling only the new
sandboxes should be inspected to figure out which ones are using the
kata runtime.
Fixes: #2563
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
this is an unexpected event (likely a change in how containerd/cri-o
record the lower level runtime in the pod) and should be more visible:
raise the log level to "warning".
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
Change logger in Trace call in newContainer from sandbox.Logger() to
nil. Passing nil will cause an error to be logged by kataTraceLogger
instead of the sandbox logger, which will avoid having the log message
report it as part of the sandbox subsystem when it is part of the
container subsystem.
The kataTraceLogger will not log it as related to the container
subsystem, but since the container logger has not been created at this
point, and we already use the kataTraceLogger in other instances where a
subsystem's logger has not been created yet, this PR makes the call
consistent with other code.
Fixes#2665
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
Call StopTracing with s.rootCtx, which is the root context for tracing,
instead of s.ctx, which is parent to a subset of trace spans.
Fixes#2661
Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
A random generated user/group is used to start QEMU VMM process.
The /dev/kvm group owner is also added to the QEMU process to grant it access.
Fixes#2444
Signed-off-by: Feng Wang <feng.wang@databricks.com>
Add support for --no-shutdown Knob. This allows us to
shutdown the VM without quitting QEMU.
Note: Also fix the comment around --no-reboot to be
more accurate.
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
ExecuteSystemPowerdown issues `system_powerdown` and waits
for `SHUTDOWN`. The event emitted is `POWERDOWN` per spec.
Without this we get an error even though the VM has shutdown
gracefully.
Per QEMU spec:
```
POWERDOWN (Event)
Emitted when the virtual machine is powered down through the power
control system, such as via ACPI.
Since
0.12
Example
<- { "event": "POWERDOWN",
"timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
SHUTDOWN (Event)
Emitted when the virtual machine has shut down, indicating that qemu is
about to exit.
Arguments
guest: boolean
If true, the shutdown was triggered by a guest request (such as a
guest-initiated ACPI shutdown request or other hardware-specific action)
rather than a host request (such as sending qemu a SIGINT). (since 2.10)
reason: ShutdownCause
The ShutdownCause which resulted in the SHUTDOWN. (since 4.0)
Note
If the command-line option “-no-shutdown” has been specified, qemu will
not exit, and a STOP event will eventually follow the SHUTDOWN event
Since
0.12
Example
<- { "event": "SHUTDOWN", "data": { "guest": true },
"timestamp": { "seconds": 1267040730, "microseconds": 682951 } }
```
Signed-off-by: Manohar Castelino <mcastelino@apple.com>
Due to #2332 being merged after running tests for #2604, and the latter
being merged now, a test for the now removed `storeContainer` was added.
Remove it.
Fixes: #2652
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
And use a released version instead of the master branch so that it no
longer gets invalidated.
Depends-on: github.com/kata-containers/kata-containers#2645
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The only remaining callers of ensure_destination_exists() are in its own
unit tests. So, just remove it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
mount_storage() first makes sure the mount point for the storage volume
exists. It uses fs::create_dir_all() in the case of 9p or virtiofs volumes
otherwise ensure_destination_exists(). But.. ensure_destination_exists()
boils down to an fs::create_dir_all() in most cases anyway. The only case
it doesn't is for a bind fstype, where it creates a file instead of a
directory. But, that's not correct anyway because we need to create either
a file or a directory depending on the source of the bind mount, which
ensure_destination_exists() doesn't know.
The 9p/virtiofs paths also check if the mountpoint exists before calling
fs::create_dir_all(), which is unnecessary (fs::create_dir_all already
handles that case).
mount_storage() does have the information to know what we need to create,
so have it explicitly call ensure_destination_file_exists() for the bind
mount to a non-directory case, and fs::create_dir_all() in all other cases.
fixes#2390
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ensure_destination_exists() can create either a directory or a regular file
depending on the arguments. This patch extracts the regular file specific
option into its own helper: ensure_destination_file_exists(). This:
- Avoids doing some steps in the directory case (they're already handled
by create_dir_all())
- Enables some further future cleanups
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
struct Baremount contains the information necessary to make a new mount.
As a datastructure, however, it's pointless, since every user just
constructs it, immediately calls the BareMount::mount() method then
discards the structure.
Simplify the code by making this a direct function call baremount().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
BareMount::mount does some complicated marshalling and uses unsafe code to
call into the mount(2) system call. However, we're already using the nix
crate which provides a more Rust-like wrapper for mount(2). We're even
already using nix::mount::umount and nix::mount::MsFlags from the same
module.
In the same way, we can replace the direct usage of libc::umount() with
nix::mount::umount() in one of the tests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This patch adds the configuration option that allows to use hugepages
with Cloud Hypervisor guests.
Fixes: #2648
Signed-off-by: Bo Chen <chen.bo@intel.com>
We recently updated to using qemu-6.1 (from qemu 5.2). Unfortunately one
breaking change in qemu 6.0 wasn't caught by the CI.
The query-cpus QMP command has been removed, replaced by query-cpus-fast
(which has been available since qemu 2.12). govmm already had support for
query-cpus-fast, we just weren't using it, so the change is quite easy.
fixes#2643
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Rust 1.47.0 which is the latest we note as tested in versions.yaml is now
getting fairly old - many current distros have newer versions (e.g.
Rust 1.54.0 in Fedora 34). Bring this more up to date.
Note that this is only updating the 'newest-version', not the minimum
required version.
The new version changes the name of the 'clippy::unknown_clipp_lints'
option to simply 'unknown_lints' so we need to change that as well to avoid
warnings.
fixes#2633
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The github static checks action has a section heading called "Building
rust". It doesn't actually build rust, though, just installs it with
rustup. Correct the misleading message.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The new API is based on containerd's cgroups package.
With that conversion we can simpligy the virtcontainers sandbox code and
also uniformize our cgroups external API dependency. We now only depend
on containerd/cgroups for everything cgroups related.
Depends-on: github.com/kata-containers/tests#3805
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Eventually, we will convert the virtcontainers and the whole Kata
runtime code base to only rely on that package.
This will make Kata only depends on the simpler containerd cgroups API.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
The only process we are adding there is the container host one, and
there is no such thing anymore.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
This is a simplification of the host cgroup handling by partitioning the
host cgroups into 2: A sandbox cgroup and an overhead cgroup.
The sandbox cgroup is always created and initialized. The overhead
cgroup is only available when sandbox_cgroup_only is unset, and is
unconstrained on all controllers. The goal of having an overhead cgroup
is to be more flexible on how we manage a pod overhead. Having such
cgroup will allow for setting a fixed overhead per pod, for a subset of
controllers, while at the same time not having the pod being accounted
for those resources.
When sandbox_cgroup_only is not set, we move all non vCPU threads
to the overhead cgroup and let them run unconstrained. When it is set,
all pod related processes and threads will run in the sandbox cgroup.
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Regardless of the sandbox_cgroup_only setting, we create the sandbox
cgroup manager and set the sandbox cgroup path at the same time.
Without doing this, the hypervisor constraint routine is mostly a NOP as
the sandbox state cgroup path is not initialized.
Fixes#2184
Signed-off-by: Samuel Ortiz <samuel.e.ortiz@protonmail.com>
Sync the virtcontainers api.md document, add `ConfidentialGuest` `EntropySourceList` `GuestSwap` three
fields to the HypervisorConfig API.
Fixes#2625
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
sync the virtcontainers api.md document, add SandboxBindMounts field to the SandboxConfig API.
And update the order of the SandboxConfig API fields.
Fixes#2621
Signed-off-by: wangyongchao.bj <wangyongchao.bj@inspur.com>
If the device has no permission, such as /dev/null, /dev/urandom,
it needs to be added into cgroup.
Fixes: #2615
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
This brings it back into line with the normal qemu version. We refer to
v6.1.0 by full SHA in versions.yaml, rather than the tag, so that
apply_patches.sh sees it as different and applies the virtiofs DAX patches
which is what the experimental version is actually about having.
The virtiofs DAX patches themselves are updated to the version from
https://gitlab.com/virtio-fs/qemu, virtio-fs-dev branch as of commit
3620cb0a.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We need qemu-6.1 for ACPI PCI hotplug support for the q35 machine. At the
moment qemu will use SHPC hotplug under the PCIe to PCI bridge on q35.
SHPC is too slow to use for our purposes (it requires a 5s delay).
Update the qemu version to v6.1.0. This leaves the experimental version
*older* than the normal version, but we'll fix that up later.
We also need to tweak the snapcraft.yaml, since the location for configs
has changed in the new qemu version.
fixes#1691
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
In getThreadIDs(), the cpuID variable is derived from a string that
already contains a whitespace. As a result, strings.SplitAfter returns
the cpuID with a leading space. This makes any go variant of string to int
fail (strconv.ParseInt() in our case). This patch makes sure that the
leading space character is removed so the string passed to
strconv.ParseInt() is "CPUID" and not " CPUID".
This has been caused by a change in the naming scheme of vcpu threads
for Firecracker after v0.19.1.
Fixes: #2592
Signed-off-by: Anastassios Nanos <ananos@nubificus.co.uk>
If the script doesn't find a patches directory it expects, it gives an
error saying to create a dummy 'no_patches' file if you really don't want
any patches applied for that version.
But actual practice in the tree is to call the dummy file 'no_patches.txt'
rather than simply 'no_patches'. Correct the message to match existing
practice.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Highlights from the Cloud Hypervisor release v18.0: 1) Experimental User
Device (vfio-user) support; 2) Migration support for vhost-user devices;
3) VHDX disk image support; 4) Device pass through on MSHV hypervisor;
5) AArch64 for support virtio-mem; 6) Live migration on MSHV hypervisor;
7) AArch64 CPU topology support; 8) Power button support on AArch64; 9)
Various bug fixes on PTY, TTY, signal handling, and live-migration on
AArch64.
Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v18.0Fixes: #2543
Signed-off-by: Bo Chen <chen.bo@intel.com>
make kata-tarball is the main way to
build a kata in a single host. Lets
test it to make sure it works on every PR.
Fixes: #2416
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
- Run kata-deploy tarball generation action on every PR.
- Use kata-deploy makefile targets.
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
If a binary tarball for a single component is done,
the logs will be shown in stdout.
e.g.
make kernel-tarball
To build all a the same time still store logs in files.
make kata-tarball
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
Now that local-build kata-deploy makefile is inlucded in toplevel
makefile, lets use the suffix `-tarball` to avoid name collitions
and identify the tarball releted targets.
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
Use kata-deploy targets from toplevel.
This will help if want to build and
reinstall just one single kata component.
Signed-off-by: Carlos Venegas <jose.carlos.venegas.munoz@intel.com>
Add how-to-setup-swap-devices-in-guest-kernel.md to how-to to introduce
how to setup swap device in guest kernel.
Fixes: #2326
Signed-off-by: Hui Zhu <teawater@antfin.com>
adds the default devices for unix such as /dev/null, /dev/urandom to
the container's resource cgroup spec
Fixes: #2539
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
This adds fields to BridgeDevice struct to allow qemu's io-reserve,
mem-reserve and pref64-reserve properties to be set for PCI bridges.
This is needed for Kata's upcoming change to ACPI hotplug.
fixes#200
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
zsh doesn't support "==" as equal comparison operator, so
replace "==" with "=" to make the script more portable
Fixes: #2584
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
The current default config in qemu for arm64 doesn't suit for qemu
version 5.1+, so remove them here.
Fixes: #2595
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
This patch fixes inconsistent calculations of the rootfs size.
For `du` and `df`, `-B 1MB` is different from `-BM`. The
former is the power of 1000, and the latter is the power of
1024. So comparing them doesn't make sense. The bug may result
in a larger image than needed.
Fixes: #2560
Signed-off-by: Yujia Qiao <rapiz3142@gmail.com>
In snapcraft.yaml we have a case statement on $(branch) - that is on the
output of executing a command "branch". From the selections it appears
that what it actually wants is to simply select on the contents of the
$branch variable, which should be ${branch} instead.
fixes#2558
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There is no need to keep multiple copies of the license file in
different directory. We can just use the top level one for the project.
Fixes: #2553
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
Use ExecuteBlockdevAddWithDriverCache with swap in
hotplugAddBlockDevice to handle swap file cannot work OK with
ExecuteBlockdevAddWithCache issue.
Fixes: #2548
Signed-off-by: Hui Zhu <teawater@antfin.com>
ExecuteBlockdevAddWithDriverCache has three one parameter driver
than ExecuteBlockdevAddWithCache.
Parameter driver can set the driver of block device.
Fixes: #198
Signed-off-by: Hui Zhu <teawater@antfin.com>
Generate `config-generated.go` file under src/runtime/cli/containerd-shim-kata-v2 before excuting test or coverage.
Fixes#2479
Signed-off-by: Binbin Zhang <binbin36520@gmail.com>
As kubernetes version has been bumped to 1.22, let's bump the CRI-O
version accordingly.
Related: #2434
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Let's test our `main` branch against the latest version of k8s. In
order to do the bump, let's also update critools version accordingly.
Depends-on: github.com/kata-containers/tests#3818
Fixes: #2433
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
The CRI-O integration test suite has two tests that fail because they search for
"not found" in the error message, but we emit "is not exist".
Change the error message to match the expectations of the test suite.
Fixes: #2036
Reported-by: Julien Ropé <jrope@redhat.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
Use of the 'props' argument to 'object-add' has been deprecated since QEMU
5.0 (commit 5f07c4d60d09) in favor of flattening the properties directly
into the 'object-add' arguments. Support for 'props' is removed entirely
in qemu 6.0 (commit 50243407457a).
fixes#193
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Kata requires version 5.2 (or 5.1 on ARM) anyway. Simplify code by
dropping support for older versions. In any case explicit checks against
version number aren't necessarily reliable for patched qemu versions.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
ExecuteBlockdevAdd() and ExecuteBlockdevAddWithCache() both appear to be
intended to create block devices in the guest which backend onto a block
device in the host. That seems to be the way that Kata always uses it.
However blockdevAddBaseArgs(), used by both those functions always uses the
"file" driver, which is only intended for use with regular file backends.
Use of the "file" driver for host block devices was deprecated in qemu-3.0,
and has been removed entirely in qemu-6.0 (commit 8d17adf34f5). We should
be using the "host_device" driver instead.
fixes#191
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Update the govmm code in order to support "sandbox" feature on qemu,
which can introduce another protect layer on the host,
to make the secure container more secure.
Fixes: #185
Signed-off-by: Liang Zhou <zhoul110@chinatelecom.cn>
* Remove golang 1.13 and 1.14, add golang 1.16
* gometalinter has been deprecated, use golangci-lint instead
Signed-off-by: Julio Montes <julio.montes@intel.com>
Append `readonly=on` to a `memory-backend-file` object and
`unarmed=on` to a `nvdimm` device when `ReadOnly` is set to `true`
Signed-off-by: Julio Montes <julio.montes@intel.com>
Always join by ",", do not put commas in the parameter slices. Always
use the variable name `deviceParams`.
Fixes: #180
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Currently virtio-mem-pci devices can be hotplugged only on the root bus.
This doesn't work for PCIe machines like q35.
Extend the API to optionally support hotplugging on PCI bridges.
Fixes: #176
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Adding the support for Protected Execution Facility(PEF) is
which is the confidential computing technology on ppc64le.
Fixes: #174
Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
Secure Execution, also known as Protected Virtualization in QEMU, is a
confidential computing technology for s390x (IBM Z & LinuxONE). Allow
the respective object.
Fixes: #172
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Add CCW (s390x) device numbers to VhostUserDevices, as is with other
device types. Add them to VhostUserFS devices (the only type currently
supported on s390x) when building QEMU parameters.
Fixes: #170
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
by splitting out the respective functionality to QemuNetParams,
QemuSCSIParams, QemuBlkParams, and QemuFSParams. This allows adding
functionality to these functions without going beyond the cyclomatic
complexity of 15 mandated by the lint checks.
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Currently, if romfile field is empty, the commandline will
shows like below:
-device driver=virtio-net-pci,...,mq=on,vectors=4,romfile=
This does not make sense, just remove this field in commandline
Add unittest support.
Signed-off-by: Michael Qiu <qiudayu@huayun.com>
In Kata, we are getting a *lot* of logs at runtime from QMP, in particular `read from QMP: xxxx`
Ideally we'd set this to only be visible for trace, but I did not see this working when adding a
V(7) check around these prints. To avoid filling journal with info that isn't useful, let's drop.
Fixes: #165
Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
Some architectures and setups do not support DIMM/NUMA. However, they
can still use memory backends, provided a memory backend of the same ID
is specified under -machine. This was introduced in QEMU 5.0. Enable
this functionality in appendMemoryKnobs.
Fixes: #160
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
Listening to the events channel from QEMU and a guest
panic event issued, then we can get the event and do some
work for the special event.
Fixes: #152
Signed-off-by: bin liu <bin@hyper.sh>
Remove CONTRIBUTORS.md file since, this repo is now part of the
kata-containers organization, the other repos don't have this file
and we are not willing to maintain (update) it.
Signed-off-by: Julio Montes <julio.montes@intel.com>
`govmm` is now part of the `kata-containers` GitHub organisation, so
update to reflect this.
Fixes: #145.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Add ExecuteAPVFIOMediatedDeviceAdd to qmp.go, which executes a hotplug
for an IBM Adjunct processor (AP) VFIO device (see also
https://www.kernel.org/doc/html/latest/s390/vfio-ap.html )
Also includes the respective unittest and adds the VfioAP DeviceDriver
constant to qemu.go.
Pushing again due to incidental CI failure
Fixes: #133
Signed-off-by: Jakob-Naucke <jakob.naucke@ibm.com>
Reviewed-by: alicefr <afrosi@redhat.com>
Use github actions to run unit tests.
Github actions service looks more stable and reliable than travis.
fixes#136
Signed-off-by: Julio Montes <julio.montes@intel.com>
Fix the following error:
```
Bad response status from coveralls: 422
{"message":"service_job_id (717167073) must be unique for Travis Jobs
not supplying a Coveralls Repo Token","error":true}
The command "$GOPATH/bin/goveralls -v -service=travis-ci" exited with 1.
```
fixes#135
Signed-off-by: Julio Montes <julio.montes@intel.com>
The Kata architecture does not support rebooting VMs (the lifecycle
being start/exec/kill) and if a VM is killed (e.g. using sysrq-trigger),
the VM does not exit fully and other layers do not notice the state change.
Kata needs a way to tell QEMU to run with the '--no-reboot' option
so that the guest VM exits and does not attempt to reboot.
Add a NoReboot boolean Knob so when Knobs.NoReboot is set, the '--no-reboot'
command-line option will be passed to QEMU on startup.
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
multidevs specifies how to deal with multiple devices being shared with a 9p
export. `multidevs=remap` fixes the following warning:
```
9p: Multiple devices detected in same VirtFS export, which might lead to file
ID collisions and severe misbehaviours on guest!
You should either use a separate export for each device shared from host or
use virtfs option 'multidevs=remap'!
```
Signed-off-by: Julio Montes <julio.montes@intel.com>
vhost is a Netdev Tap Option used to configure a host TAP network interface
backend, according to the QMP API documentation the type for such option must
be a boolean. Use boolean type for vhost option to fix the following
error on recent versions of QEMU:
```
Invalid parameter type for 'vhost', expected: boolean
```
Signed-off-by: Julio Montes <julio.montes@intel.com>
The following options can be provided
Intremap: activates interrupt remapping
DeviceIotlb: enables device IOTLB support for the vIOMMU
CachingMode: enables Cahing Mode
See: https://wiki.qemu.org/Features/VT-d
Signed-off-by: Adrian Moreno <amorenoz@redhat.com>
There are three different types for the RTC clock: host, rt and vm.
Add `rt` to the list of RTC clocks.
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Allow API consumers to change the maximum number of ports in the virtio-serial
devices, setting a lower number of ports can improve the boot time and
reduce the attack surface.
fixes#120
Signed-off-by: Julio Montes <julio.montes@intel.com>
Following on from #111 which added support for multiple virtio transports,
add code to use virtio-mmio as the transport when booting a guest with
the microvm machine type and add a microvm case when checking for
NUMA support. Also add a test case for machine string parsing.
Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
According to QEMU's nvdimm documentation: When 'pmem' is 'on' and QEMU is
built with libpmem support, QEMU will take necessary operations to guarantee
the persistence of its own writes to the vNVDIMM backend.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Currently, virtio transports for each device are determined with
architecture dependent build time conditionals. This isn't the ideal
solution, as virtio transports aren't exactly tied to the host's
architecture.
For example, aarch64 VMs do support both PCI and MMIO devices, and
after the recent introduction of the microvm machine type, that's also
the case for x86_64.
This patch extends each device that supports multiple transports with
a VirtioTransport field, so users of the library can manually specify
a transport for each device. To avoid breaking the compatibility, if
VirtioTransport is empty a behavior equivalent to the legacy one is
achieved by checking runtime.GOARCH and Config.Machine.Type.
Keeping support for isVirtioPCI/isVirtioCCW in qmp.go is a bit
tricky. Eventually, the hot-plug API should be extended so callers
must manually specify the transport for the device.
Signed-off-by: Sergio Lopez <slp@redhat.com>
As there's no guarantee that ".cache-size" is a supported QEMU property,
let's not add it to the QEMU command line when the user explicitly set
virtio_fs_cache_size to zero.
By not always setting ".cache-size" property we avoid errors like:
```
$ sudo podman --runtime=/usr/bin/kata-runtime run --security-opt label=disable -it fedora:31 /bin/bash
Error: failed to launch qemu: exit status 1, error messages from qemu log: qemu-kvm: -device vhost-user-fs-pci,chardev=char-88c350403e95d3db,tag=kataShared,cache-size=0M: Property '.cache-size' not found: OCI runtime error
```
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
Caller can hotplug vhost-user device via qmp.
The Qemu vhost-user device, like vhost-user-blk-pci and
vhost-user-scsi-pci can be hotplugged by qmp API:
ExecuteCharDevUnixSocketAdd() together with
ExecutePCIVhostUserDevAdd()
Signed-off-by: Liu Xiaodong <xiaodong.liu@intel.com>
Memory preallocation is just a property of different memory backends.
We should treat it similar to memory sharing property. Also rename
FileBackedMemShared to MemShared as it is just another memory backend
property that works with different memory backends not just file backed
memory.
Signed-off-by: Peng Tao <bergwolf@hyper.sh>
The upper hyervisor manager application maybe need to wait some
QMP event to control boot sequence, but the event we wanted maybe
not exist in some older version, so we need query all QMP ABI and
check the event is supported or not.
related: kata-containers/runtime#1918
Signed-off-by: Ning Bo <ning.bo9@zte.com.cn>
Support for function isSocketIDSupported, isThreadIDSupported and isDieIDSupported.
The functions check if the cpu driver and the qemu version support the
id parameter.
Fixes: #102
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
In QEMU 4.1 the CPU topology for x86 will change to:
`socket > die > core > thread`.
Add `die-id` field to `CPUProperties` and include it in CPU hotplugging
Signed-off-by: Julio Montes <julio.montes@intel.com>
since some vendor id like 1ded can not be identified by virtio-pci
driver, so upper level need to pass a specified vendor id to qemu.
the upper level will change unavailable id and pass it to qemu.
Signed-off-by: Ace-Tang <aceapril@126.com>
Hotplugged memory could be backed by a file on the host with sharing
turned on. This change allows qmp to pass that option to a govmm.
Signed-off-by: Ganesh Maharaj Mahalingam <ganesh.mahalingam@intel.com>
QEMU supports finer-grained units than GiB. Change the cache size to
MiB so users have more control over the cache size.
Note that changing the semantics of the CacheSize field is fine because
there are no users of this API yet. kata-runtime will be the first
users and prefers MiB instead of GiB.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
In newer versions of QEMU, like 4.0-rc2, QMP events can be thrown even before
the QMP-version response, one example of this behaviour is when a virtio serial
is closed and a VSERPORT_CHANGE event is thrown.
Re-implement mainLoop to check the data received from the VM channel, since
it's not a guarantee that the first data read from the VM channel is the
QMP version.
fixes https://github.com/kata-containers/runtime/issues/1474
Signed-off-by: Julio Montes <julio.montes@intel.com>
Since []byte channel type transfer slice info(include slice underlying array pointer, len, cap)
between channel sender and receiver. scanner.Bytes() function returned slice's underlying array
may point to data that will be overwritten by a subsequent call to Scan(reference from:
https://golang.org/pkg/bufio/#Scanner.Bytes), which may make consecutive scan() call write the
read data into the same underlying array which causes receiver read mixed data,so we need to
copy line to new allocated space and then send to channel receiver to solve this problem.
Fixes: #88
Signed-off-by: jiangpengfei <jiangpengfei9@huawei.com>
The QEMU vhost-user-fs-pci device provides virtio-fs host<->guest file
system sharing (https://virtio-fs.gitlab.io/). The device is
instantiated like this:
$ qemu -chardev socket,path=/tmp/vhost-fs.sock,id=chr0
-device vhost-user-fs-pci,tag=myfs,chardev=chr0,cache-size=4G,versiontable=/dev/shm/fuse_shared_versions
This patch adds the VhostUserFS DeviceDriver and command-line generation
for this QEMU device.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
For vCPU hotplug to work on ppc64le, we need not
pass threadID and socketID. So conditionally pass
arguments when executing CPU device add.
Fixes: #83
Signed-off-by: Nitesh Konkar niteshkonkar@in.ibm.com
The .travis file was building GoVMM with some old of date versions of
Go that seem to be incompatible with the latest versions of gometalinter.
This commit updates the .travis file so that we build against 1.10 and
1.11.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
Static check was complaining about code that looked like
_ = <-ch
when it wants to see simply
<-ch
There was only one instance of this in govmm and this commit fixes
that instance.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
static check was complaining about code that looked like
if x == "" {
return false
}
return true
when what it wants to see is return x != "". This commit fixes the issue.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
staticcheck was complaining about code that looked like
if x == true {
}
rather than
if x {
}
This commit fixes the issue.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
staticcheck was complaining as there were quite a lot of
fmt.Sprintf("%s",d) in the code where d was either a string or
had string as its underlying type.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
staticcheck was complaining as some of the error messages returned by
govmm began with a capital letter. This commit fixes the issue.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
ExecuteNVDIMMDeviceAdd can add a nvdimm disk to qemu.
Not implement NVDIMM device delete function because qemu doesn't support it.
Signed-off-by: Hui Zhu <teawater@hyper.sh>
For devices that actually support the option disable-modern, this
current commit provides a proper flag to the caller. This will allow
for better support when used in nested environment as virtio-pci
devices should rely on virtio 0.9 instead of 1.0 due to a bug in
KVM.
Fixes#80
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
It turns out it is possible to run the unit tests for the s390x build
on travis by renaming the s390x specific files, so that their
inclusion in the build is determined only by tags and not by filename,
and by introducing a new tag s390x_test that we can use to force
their inclusion into a build by using this tag. The .travis file is
then updated to include the line
go test --tags s390x_test ./...
This creates a build on travis that includes the s390x specific
files and runs the unit tests.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit adds a single command to the travis script that checks
that the s390x build works. We can't run the unit tests but at
least we can check that everything builds on this architecture.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
The PR adds the s390x support. It sets the CCW devices and sets to false
all the devices in the mapping isVirtioPCI. It reimplements the functions
QemuNetdevParam and QemuDeviceParam to print an error message if the vhost-user
devices are used. It introduces a new function ExecuteNetCCWDeviceAdd for qmp
for the CCW devices.
Fixes: #37
Co-authored-by: Yash D Jain <ydjainopensource@gmail.com>
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
This commit updates the headers in the Go source files to adhere
to the new guidelines in the CONTRIBUTING.md file.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
The CONTRIBUTING.md file is updated to provide a template for new
source files and to invite contributors to add themselves to the
CONTRIBUTORS.md file.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This file is a partial list of contributors to the Virtual Machine
Manager for Go project. To see the full list of contributors,
see the revision history in source control.
Contributors who wish to be recognized in this file should add
themselves (or their employer, as appropriate).
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
Only get 'QMP command failed' error message now when execute QMP
command by 'executeCommandWithResponse' failed. This patch will
output more error detail.
Signed-off-by: NingBo <ning.bo9@zte.com.cn>
This PR prepares for the s390x support. It introduces:
- a generalization of ccw and pci devices. The variables for the pci devices
have been renamed by removing the Pci suffix. They have been moved to the
qemu_arch_base.go
- the mapping isVirtioPCI has been move to qemu_arch_base.go because in
this way a different mapping can be added for other architecture (e.g
s390x)
- the functions QemuNetdevParam and QemuDeviceParam have been moved to
qemu_arch_base.go. In this way, they could be reimplemented for other
architecture for the case VHOSTUSER
- a function disableModern has been introduced to check if the device is
a pci device and then returns the right parameters. In the case of ccw
devices, they don't have the disable-modern flag
- a function mqParameter has been introduced to return the right
parameters for the mq case. The virtio-net-ccw device doesn't have the
vectors flag
- in qemu_arch_base_test.go contains the test and strings that can be
overwritten for other architectures (e.g s390). The devices names and
the flags for the devices can be overwritten.
- the string for the romfile has been replaced by a variable romfile
that could be left empty if the devices doesn't support a romfile as
for the ccw devices for s390.
- clean-up: the disable-modern=on/off options have been changed to
disable-modern=true/false. In the code there was a mixture of on/true
off/false
Fixes: #61
Co-authored-by: Yash D Jain <ydjainopensource@gmail.com>
Signed-off-by: Alice Frosi <afrosi@de.ibm.com>
ExecuteBlockdevAddWithCache has two more parameters direct and noFlush
than ExecuteBlockdevAdd.
They are cache-related options for block devices that are described in
https://github.com/qemu/qemu/blob/master/qapi/block-core.json.
direct denotes whether use of O_DIRECT (bypass the host page cache)
is enabled. noFlush denotes whether flush requests for the device are
ignored.
Signed-off-by: Hui Zhu <teawater@hyper.sh>
Also change TestQMPXBlockdevDel to TestQMPBlockdevDel because QMP verion
2.9 and older use blockdev-del but not x-blockdev-del.
Signed-off-by: Hui Zhu <teawater@hyper.sh>
Add input for -pidfile option of qemu, so that we can get pid of
qemu main process, and apply resource limitations to it.
Fixes#62
Signed-off-by: l00397676 <lujingxiao@huawei.com>
This patch fixes the wrong behavior of specifying a netdev, MAC
address or PCI address entry when those were empty. Instead, it
does not provide those entries if the content is empty.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Any device inheriting from virtio-pci can specify a ROM file. This
option is provisioned by default with "efi-virtio.rom", but most
of the time, firmwares such as OVMF or seabios will already support
what is provided by this ROM file.
In order to reduce the "forced" dependency on such ROM file, govmm
should provide an empty path if the consumer of the library does not
provide one.
This patch reorganizes the list of devices, so that it gets easier to
list which devices inherit from virtio-pci, and then adds the romfile
option to every single device that support this option.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Add support for virtio-balloon.
- Add test
- Support disable-modern
- Support deflate-on-oom
Signed-off-by: Jose Carlos Venegas Munoz <jose.carlos.venegas.munoz@intel.com>
Rather than show the generic "qemu", log the full path to the
particular qemu binary being used.
Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
In case the type of bridge is PCIEBridge, which we expect as ending
up using pcie-pci-bridge device from Qemu, the properties chassis_nr
and shpc don't exist.
This commit simply fixes this use case by removing those parameters
from the command line.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The memory-backend-ram should also be set to a numa node instead of
being inserted as a new device. Otherwise it becomes additional memory
and requires explicit online to be available, instead of just being a
backend of the memory specified by -m option.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
In addition to supporting hotplug for VFIO mediated device on PCI bridge,
this patch adds hotplug functionality on root bus.
When parameter bus and addr are set to be empty, the system will pick up
an empty slot on root bus.
Signed-off-by: Zhao Xinda <xinda.zhao@intel.com>
The contents of .iso used to bootstrap VMs with cloudinit are
initialised using a precreated, short-lived directory. The
permissions on this directory were too lenient. This commit
restricts access to this directory to the user and his/her group.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
govmm has ExecuteBlockdevAdd() function and ExecuteBlockdevDel() function
doesn't compatible with qemu 2.8,because blockdev-add and x-blockdev-del usages
are different between qemu 2.7 and qemu 2.8
Follow the qemu 2.7 and qemu 2.8 qmp-commands.txt documents to modify ExecuteBlockdevAdd()
function and ExecuteBlockdevDel() function to be compatible with qemu 2.8
Signed-off-by: flyflypeng <jiangpengfei9@huawei.com>
If we hotplug a nic with args mq=on, its qdisc will be mq by default.
This aligns with cold plug nics.
Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
Before calling any other command it is necessary to call
ExecuteQMPCapabilities() otherwise QEMU will not process the subsequent QMP
commands.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In order to be able to hotplug network devices such as vhost user
net, we need to be able to define a previously declared chardev as
a parameter of this new network device.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
This commit adds a couple of negative test cases for qmp.go, one
which checks that failed commands return errors and the other
checks that QMPStart exits gracefully when passed an invalid
socket path.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit adds some negative test cases for the append functions
in qemu.go that build up the qemu command line.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This will kill the process when the context is cancelled. As using a nil
context is not permitted it is necessary to substitute with a real
context if it is not initialised in the Config struct.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
If no RTC is specified in the config then do not generate any RTC command line
options. RTC command line options are optional for QEMU so make Valid() return
false when presented with the empty version of the RTC struct containing empty
strings.
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
In addition to normal VFIO device, this patch adds VFIO mediated device
as a supplement to do hot plug on PCI(E) bridges.
Signed-off-by: Zhao Xinda <xinda.zhao@intel.com>
With qemu 2.10, a write lock was added for qcow images that
prevents the same image to be passed more than once.
This can be over-ridden using the --share-rw option which is
desired for raw images.
This solves an issue with running Kata with devicemapper
using the privileged mode as in this case all devices on the host
are passed to the container using the block device associated
with the rootfs, causing it to be passed twice to qemu.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
For machines types based on PCIe like q35, device addr and bus must be specified.
For machines types based on PCI like pc, device addr must be specified and bus
is optional since devices can be hot plugged directly on the root bus.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Implement function to hotplug a network device to QEMU by fds.
Macvtap can only be hotplug by this way.
Signed-off-by: Ruidong Cao <caoruidong@huawei.com>
Implement function to hotplug virtio serial ports, the serial ports
are visible in the guest at the directory /dev/virtio-ports.
Signed-off-by: Julio Montes <julio.montes@intel.com>
implement function to hotplug character devices using as backend
unix sockets, binding a character device with a serial port allows
the communnication between processes running in the guest with
processes running in the host.
Signed-off-by: Julio Montes <julio.montes@intel.com>
`getfd` receives a file descriptor via SCM rights and assign it a name,
this command is useful to send file descriptors from the host, and then
hot plug devices that needs file descriptors like vhost-vsock-pci devices.
Signed-off-by: Julio Montes <julio.montes@intel.com>
`vhostfd` is used to specify the vhost-vsock device fd, and it holds
the context ID previously opened.
`disable-modern` is to disable the use of "modern" devices, by using virtio 0.9
instead of virtio 1.0. Particularly, this is useful when running the VM in a
nested environment.
Signed-off-by: Julio Montes <julio.montes@intel.com>
`vhostfd` is the vhost file descriptor that holds the socket context ID
`disable-modern` prevents qemu from relying on fast MMIO
Signed-off-by: Julio Montes <julio.montes@intel.com>
Implement function to hotplug vsocks, vsocks are needed
to communicate processes are running inside the VM
with processes are running on the host.
Signed-off-by: Julio Montes <julio.montes@intel.com>
This commit fixes an issue with the log handlers defined by qmp_test.
The issue was picked up by the latest version of go vet on go tip.
qemu/qmp_test.go:56::error: missing ... in args forwarded to printf-like function (vet)
qemu/qmp_test.go:60::error: missing ... in args forwarded to printf-like function (vet)
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit enables the gas security checker on govmm builds. The
security checker has signalled 4 issues all of which I've checked
and have determined to be non issues. These issues are disabled
by this commit.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit enables staticcheck in the travis builds and fixes the existing
errors detected by staticcheck. There was one type of error repeated in
qemu.go in which the type of some constants was not explicitly specified.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
It is useful when we want to specify migration incoming source.
Supported source are fd and exec right now.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
It allows a caller to use a local file as the memory backend of the
guest, and it also allows the file backed memory device to be set shared
or not.
Signed-off-by: Peng Tao <bergwolf@gmail.com>
We need to be able to specify the PCI slot for a bridge while
adding it.
Add test to verify bridge is correctly added.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This enable data-plane for scsi. All drives attached to the
scsi controller will have their IO processed in a single separate
IO thread instead of qemu's main event loop.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
IOthreads also known as x-data-plane allow IO to
be processed in a separate thread rather than the main event
loop. This produces much better IO throughput and latency.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
certain machines types need to have options to enable or disable features
For example the machine type virt in certain hosts must have the gic version
(gic-version=3 or gic-version=host) to start without problems
Signed-off-by: Julio Montes <julio.montes@intel.com>
device_add qmp command for scsi devices accepts additional parameters like
scsi-id and lun. Implement function to add scsi devices. Devices
with drivers "scsi-hd", "scsi-cd" and "scsi-disk" are accepted.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
This commit enables unit test coverage computation in Travis CI builds.
Going forward, builds that decrease the unit test coverage by more than
1.0% will fail.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
VSOCK sockets are added through a vhost PCI device.
It takes a device ID and a context ID, the latter being
the endpoint value to be reached from the host.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Introduce basic vhost-user-blk-pci support.
In adding this, cleaned up the QemuParams function to use a more
appropriate switch statement. Similarly, cleanup up the Valid() logic.
We still need to look into parameterization of the block parameter
fields as well as introducing multiqueue support for the vhost-user devices.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Some comments were network specific for vhost-user devices, which is
incorect. Fixed these.
Renamed the HWAddress field to be Address, so that it could potentially
be used more generically for non-network based vhost-user types.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
maxcpus is used to specify how many cpus a VM can have.
This attribute must be specified to enable the hotplugging CPUs capability,
otherwise the maximum number of CPU will be defined by the number of CPU
in -smp.
Signed-off-by: Julio Montes <julio.montes@intel.com>
There were some unchecked errors in some of the unit files relating to
the closure and removal of temporary files. As the closure and removal
of these files is not really important to whether the next passes or
fails we ignore the errors.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit adds a .travis file which enables Travis builds for
govmm. The script builds the source and runs the unit tests
and gometalinter enabling
- misspell
- vet
- ineffassign
- gofmt
- gocyclo 15
- golint
- errcheck
- deadcode
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit adds three documents:
- CONTRIBUTING.md ( a files describing how to contribute to the project )`
- COPYING ( the Apache 2.0 license )
- README.md ( a brief description of the project)
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit removes all the references to the ciao project. It also removes
some of the dependencies that the unit tests were pulling in.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
Add ability to add a vhostuser device to the
QEMU commandline. We expect two different types of devices
to be connected through a vhostuser socket: SCSI and network.
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Some QMP commands like ```query-hotpluggable-cpus``` returns a
response that needs to be processed and returned to the client as
a struct. This patch adds the function ```executeCommandWithResponse```
that returns the response of a QMP command.
Signed-off-by: Julio Montes <julio.montes@intel.com>
This patch adds a new function to hot plug VFIO devices on PCI(E) bridges,
This change allows to hot plug N VFIO devices in Qemu PC and Q35
Signed-off-by: Julio Montes <julio.montes@intel.com>
This change adds an additional parameter to CreateCloudInitISO that
allows users more control over the newly created xorriso process.
They can for instance specify the user under which the new qemu process
should run and which capabilities should be retained in the child
xorriso process.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This change adds an additional parameter to LaunchCustomQemu that
allows users more control over the newly created process. They can
for instance specify the user under which the new qemu process should
run and which capabilities should be retained in the child qemu
process.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
ExecutePCIDeviceAdd is a function that can be used to hot plug
devices directly on pci(e).0 or pci(e) bridges. ExecutePCIDeviceAdd
is PCI specific because unlike ExecuteDeviceAdd, it includes an
extra parameter to specify the device address on its parent bus.
Signed-off-by: Julio Montes <julio.montes@intel.com>
Bridge struct represent pci bridges(pci-bridge) or
pcie bridges(pcie-pci-bridges), bridges can be used to
hot plug devices in pc and q35 machines
Signed-off-by: Julio Montes <julio.montes@intel.com>
Add support for macvtap interfaces. This also brings in support
for generic multiqueue support in virt containers.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Add support to launch virtual machines where the RAM is
allocated using huge pages. This is useful for running
with a user mode networking stack, and for custom setups
which require high performance and low latency.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
VFIO is meant for exposing exposing direct device access
to the virtual machine.
Add ability to append VFIO devices to qemu command line.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Ciao has recently moved from github.com/01org/ciao to
github.com/ciao-project/ciao. This moves requires us to update our
import paths to build successfully.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
The Mlock knob is unfortunately tied to realtime.
Allow Mlock knob to implicitly enable realtime to get the
desired swapping behavior when swapping is desired.
Note: Realtime as implemented today can only be used to enable
swap, and as such does not really control realtime behaviour.
The knob is redundant but retained here just to ensure that
when more capabilities are added in future QEMU iterations
we can take advantage of the same.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Enable realtime options in QEMU. Also add support to control memory
locking. Turning realtime on with memory locking disabled allows
memory to be swapped out, potentially increasing density of VMs.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
Add support for pre-allocating all of the RAM.
This increases the memory footprint of QEMU and should be used
only when needed.
Signed-off-by: Manohar Castelino <manohar.r.castelino@intel.com>
blockdev-del command has been added in qemu 2.9 to replace
x-blockdev-del command used earlier for deleting block devices.
Update ExecuteXBlockdevDel() to use this updated qmp command.
Rename ExecuteXBlockdevDel to ExecuteBlockdevDel as this no longer
executes x-block-del command for qemu>=2.9.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
For some cases, we have to disable the fast MMIO support, by disabling
virtio 1.0. The reason for this is that we want to be able to nest our
qemu VM inside a VM run by an hypervisor with no support for fast MMIO.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
In case of a network device, and specifically virtio-net-pci, we have
to update to what is expected by qemu. In this case, the driver name
should be prefixed with "driver=".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
With qemu 2.9, the qmp block-dev command was updated from:
{ "execute": "blockdev-add", "arguments": { "options": { ... } } }
to:
{ "execute": "blockdev-add", "arguments": { ... } }
Also, instead of id, blockdev-add now requires a node-name for the
root node(https://wiki.qemu.org/index.php/ChangeLog/2.9)
Store the version information with QMPStart and use that to issue
qmp command for adding block devices in the correct format.
Signed-off-by: Archana Shinde <archana.m.shinde@intel.com>
Unfortunately the ununused linter is overzealous with some of the fields
that it things are unused as gophercloud relies on their values. So go
ahead with the most straightforward removals but do not enable unused on
travis builds.
ciao-image/datastore/datastore_test.go:28:5⚠️ var metaDsTables is unused (U1000) (unused)
ciao-controller/api/api_test.go:39:6⚠️ func myHostname is unused (U1000) (unused)
ciao-cli/identity.go:58:3⚠️ field Description is unused (U1000) (unused)
ciao-cli/identity.go:59:3⚠️ field DomainID is unused (U1000) (unused)
ciao-cli/identity.go:60:3⚠️ field Enabled is unused (U1000) (unused)
ciao-cli/identity.go:62:3⚠️ field ParentID is unused (U1000) (unused)
ciao-cli/identity.go:63:3⚠️ field Links is unused (U1000) (unused)
ciao-cli/identity.go:70:3⚠️ field Self is unused (U1000) (unused)
ciao-cli/identity.go:71:3⚠️ field Previous is unused (U1000) (unused)
ciao-cli/identity.go:72:3⚠️ field Next is unused (U1000) (unused)
ciao-cli/identity.go:207:3⚠️ field Next is unused (U1000) (unused)
ciao-cli/identity.go:208:3⚠️ field Previous is unused (U1000) (unused)
ciao-cli/identity.go:209:3⚠️ field Self is unused (U1000) (unused)
ciao-cli/identity.go:213:3⚠️ field Description is unused (U1000) (unused)
ciao-cli/identity.go:214:3⚠️ field DomainID is unused (U1000) (unused)
ciao-cli/identity.go:215:3⚠️ field Enabled is unused (U1000) (unused)
ciao-cli/identity.go:217:3⚠️ field Links is unused (U1000) (unused)
ciao-cli/identity.go:221:3⚠️ field ParentID is unused (U1000) (unused)
ciao-cli/main.go:105:6⚠️ type action is unused (U1000) (unused)
ciao-cli/volume.go:37:6⚠️ type customVolumeExt is unused (U1000) (unused)
ciao-cli/volume.go:39:2⚠️ field customVolumeExt is unused (U1000) (unused)
networking/ciao-cnci-agent/network.go:98:8⚠️ const maxKey is unused (U1000) (unused)
networking/libsnnet/tests/parallel/parallel_test.go:371:6⚠️ func dockerNetList is unused (U1000) (unused)
networking/libsnnet/tests/parallel/parallel_test.go:379:6⚠️ func dockerNetInfo is unused (U1000) (unused)
openstack/compute/api.go:308:2⚠️ const limit is unused (U1000) (unused)
openstack/compute/api.go:309:2⚠️ const marker is unused (U1000) (unused)
openstack/compute/api.go:312:6⚠️ type pager is unused (U1000) (unused)
openstack/compute/api.go:313:2⚠️ func pager.filter is unused (U1000) (unused)
openstack/compute/api.go:314:2⚠️ func pager.nextPage is unused (U1000) (unused)
openstack/compute/api_test.go:34:6⚠️ func myHostname is unused (U1000) (unused)
ciao-controller/api.go:72:2⚠️ const statusFilter is unused (U1000) (unused)
ciao-controller/api.go:75:6⚠️ type pager is unused (U1000) (unused)
ciao-controller/api.go:76:2⚠️ func pager.filter is unused (U1000) (unused)
ciao-controller/api.go:77:2⚠️ func pager.nextPage is unused (U1000) (unused)
ciao-controller/api.go:136:25⚠️ func (*nodePager).filter is unused (U1000) (unused)
ciao-controller/api.go:198:31⚠️ func (*nodeServerPager).filter is unused (U1000) (unused)
ciao-controller/controller_test.go:107:6⚠️ func addTestTenantNoCNCI is unused (U1000) (unused)
ciao-controller/controller_test.go:1104:6⚠️ func startTestWorkload is unused (U1000) (unused)
ciao-controller/controller_test.go:1123:6⚠️ func testStartWorkloadLaunchCNCI is unused (U1000) (unused)
ciao-controller/openstack_compute.go:552:5⚠️ field Links is unused (U1000) (unused)
qemu/qmp_test.go:493:3⚠️ const seconds is unused (U1000) (unused)
qemu/qmp_test.go:494:3⚠️ const microsecondsEv1 is unused (U1000) (unused)
qemu/qmp_test.go:495:3⚠️ const device is unused (U1000) (unused)
qemu/qmp_test.go:496:3⚠️ const path is unused (U1000) (unused)
templateutils/example_test.go:53:3⚠️ field hidden is unused (U1000) (unused)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
Add structcheck to the list of linters used on travis runs.
ciao-cli/event.go:109:2⚠️ unused struct field github.com/01org/ciao/ciao-cli.eventDeleteCommand.all (structcheck)
ciao-cli/event.go:110:2⚠️ unused struct field github.com/01org/ciao/ciao-cli.eventDeleteCommand.tenant (structcheck)
ciao-cli/external_ips.go:636:2⚠️ unused struct field github.com/01org/ciao/ciao-cli.poolAddCommand.ips (structcheck)
ciao-cli/node.go:43:2⚠️ unused struct field github.com/01org/ciao/ciao-cli.nodeListCommand.nodeID (structcheck)
ciao-controller/client_wrapper_test.go:29:2⚠️ unused struct field github.com/01org/ciao/ciao-controller.ssntpClientWrapper.ctl (structcheck)
qemu/qmp.go:111:2⚠️ unused struct field github.com/01org/ciao/qemu.qmpResult.data (structcheck)
ssntp/ssntp_test.go:193:2⚠️ unused struct field github.com/01org/ciao/ssntp_test.ssntpClient.evtTracedChannel (structcheck)
ssntp/ssntp_test.go:192:2⚠️ unused struct field github.com/01org/ciao/ssntp_test.ssntpClient.staTracedChannel (structcheck)
ssntp/ssntp_test.go:194:2⚠️ unused struct field github.com/01org/ciao/ssntp_test.ssntpClient.errTracedChannel (structcheck)
ssntp/server.go:75:2⚠️ unused struct field github.com/01org/ciao/ssntp.Server.roleVerify (structcheck)
networking/ciao-cnci-agent/client.go:97:2⚠️ unused struct field github.com/01org/ciao/networking/ciao-cnci-agent.agentClient.netCh (structcheck)
testutil/agent.go:37:2⚠️ unused struct field github.com/01org/ciao/testutil.SsntpTestClient.ticker (structcheck)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
And add varcheck to the list of linters used on travis runs (with an
increased deadline.)
ciao-launcher/qemu_test.go:31:5⚠️ unused variable or constant imageInfoTestGood (varcheck)
ciao-launcher/qemu_test.go:44:5⚠️ unused variable or constant imageInfoTestMissingBytes (varcheck)
ciao-launcher/qemu_test.go:57:5⚠️ unused variable or constant imageInfoTestMissingLine (varcheck)
ciao-launcher/qemu_test.go:69:5⚠️ unused variable or constant imageInfoTooBig (varcheck)
ciao-launcher/qemu_test.go:82:5⚠️ unused variable or constant imageInfoBadBytes (varcheck)
configuration/configuration_test.go:35:7⚠️ unused variable or constant glanceURL (varcheck)
ciao-controller/controller_test.go:1918:5⚠️ unused variable or constant testClients (varcheck)
qemu/qmp_test.go:44:2⚠️ unused variable or constant qmpSuccess (varcheck)
qemu/qmp_test.go:45:2⚠️ unused variable or constant qmpFailure (varcheck)
Signed-off-by: Rob Bradford <robert.bradford@intel.com>
The existing NetDevice relies on virtio-net driver, but there is a
useful PCI variant which was not available: virtio-net-pci.
This patch adds this new driver and adds two parameters specific to
this: "bus" and "addr".
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
The code that handles the serialization and cancelling of QMP commands
is a little complex and it took me some time to remember how it actually
works and why it works in this particular way. For this reason I've
added some comments which will hopefully make the next bug fix in this
area a little less painful.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
There was a bug with the cancelling of commands that meant that when
an attempt was made to cancel a command and then to issue a second
command, the first, cancelled command was re-issued. This commit
fixes the issue and adds a new test case to check that cancelling
of commands does indeed work. There was also an issue with the
test harness which meant that tests that issued more than one command
were not actually testing the second and third commands.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
Launcher's ConfigDrive ISO creation function, createCloudInitISO has
been moved to the qemu package so that it can be re-used by ciao-down.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
When creating a CharDevice, we need to add a "bus" parameter
so that it can match the serial pci device previously created.
Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
We add a new device driver, and also a name to the CharDev structure
this is needed for qemu to actually create the serial port on
the guest.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The QMP socket implementation does not support multiple clients sending
and receiving QMP commands. As a consequence we need to be able to
create multiple QMP sockets from the qemu package, so that at least we
can support a fixed number of QMP clients.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
When we get no virtual console to plug into, we may want qemu to create
a socket where we can asynchronously connect to.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
All file descriptors will come from specific devices configurations, so
this patch:
1) Make the Config FDs file private
2) Provide an appendFDs() method for Config, that takes a slice of
os.File pointers and
a) Adds them to the Config private fd slice
b) Return a slice of ints that represent the file descriptors for
these device specific files, as seen by the qemu process.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
It is a private field now, and all append*() routines are now
Config methods instead of private qemu functions.
Since we will have to carry a kernelParams private field as well,
this change will keep all built parameters internal and make things
consistent.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
By adding QemuParams() to the Device interface, we can get rid of the
driver structure and simplify further the appendDevices() routine.
With that implementation we can generate the following qemu parameters:
"-device virtio-9p-pci,fsdev=foo,mount_tag=rootfs -fsdev local,id=foo,path=/bar/foo,security-model=none"
from these single structures:
fsdev := FSDevice{
Driver: Virtio9P
FSDriver: Local,
ID: "foo",
Path: "/bar/foo",
MountTag: "rootfs",
SecurityModel: None,
}
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Instead of open coding the RTC fields, we now have specific types for
it.
We also have a RTC unit test now.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Having separate structures for the qemu driver definitions
and each possible device definitions is confusing and error prone as one
needs to be very careful using matching IDs and names in both
structures.
As the driver parameter can be derived from the device
ones, this patch changes the Device and Driver structures to be linked
together, i.e. each driver needs to have its corresponding device.
For example this allows us to build the following 9pfs qemu parameters:
"-fsdev local,id=foo,path=/bar/foo,security-model=none -device virtio-9p-pci,fsdev=foo,mount_tag=rootfs"
from these structures:
fsdev := FSDevice{
Driver: Local,
ID: "foo",
Path: "/bar/foo",
MountTag: "rootfs",
SecurityModel: None,
}
driver := Driver{
Driver: Virtio9P,
Device: fsdev,
}
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
With the NetDev and MACAddress strings, we can now create networking
device drivers.
We also add a unit test for netdev Device creation.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We can now specify if we want vhost to be enabled and wich fds we should
use for multiqueue support.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The NetDevice structure represents a network device to be emulated by
qemu.
We also add the corresponding unit test.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Knobs structure groups all qemu isolated boolean settings.
For now this is -no-user-config, -no-defaults and -nographic.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The extraParams is confusing and can conflict with the rest of the
Config structure definitions.
We remove it and will add new fields to that structure as needed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Here we group the machine type and acceleration together as they are
defined through the same qemu parameter (-machine).
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Global string represents the set of default Device driver properties
we want qemu to use. This is mostly useful for automatically created
devices.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The SMP structure defines the amount of virtual CPUs, sockets, and
threads per CPU that is made available to the guest.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Memory field holds the guest memory configuration.
It is used to define the current and maximum RAM is made available to
the guest and how this amount of RAM is splitted into several slots.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Qemu character devices typically allow for sending traffic from the
guest to the host by emulating a console, a tty, a serial device for
example.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Each Filesystem device should have a corresponding "virtio-9p-pci"
Device driver. They represent a filesystem to be exported through 9pfs.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Kernel structure holds the guest kernel configuration: its path and
its parameters. This is the kernel qemu will boot the VM from.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
The Object slice tells qemu which specific object to create.
Qemu objects can represent memory backend files, random number
generators, TLS credentials, etc...
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
We may need to support a large range of devices in the qemu created VM
and the Device slice allows us to define which drivers are needed.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
QMP sockets are used to send qemu specific commands to the running qemu
process.
The QMPSocket structure allows us to define the socket type we want,
along with its name.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
LaunchQemu() now takes a Config structure that contains some more
descriptive fields than raw qemu parameter strings.
LaunchQemu is now simpler to call and more extensible as supporting more
qemu parameters would mean expanding Config instead of changing the API.
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Ciao will use the new standard library context package from now on.
This will allow us to use some of the new standard library functions
such as DialContext.
Partial fix for issue #541
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
Fix ciao/qemu/qmp.go:349:3: ineffectual assignment to ok.
Strictly speaking this is a bug in ineffassign but it's easier
to change the ciao code.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
There's no point in setting cmd.ExtraFiles if the fds array is an
empty slice. This won't do any harm but is essentially a no-op.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
This commit adds some package documentation to the qemu package,
including an overview of the package and an example of its use.
Signed-off-by: Mark Ryan <mark.d.ryan@intel.com>
## This repo is part of [Kata Containers](https://katacontainers.io)
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md).
For details on how to contribute to the Kata Containers project, please see the main [contributing document](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md).
a method used in cloud computing, whereby the amount of computational resources in a server farm, typically measured in terms of the number of active servers, which vary automatically based on the load on the farm.
## B
## C
### Container Security Solutions
The process of implementing security tools and policies that will give you the assurance that everything in your container is running as intended, and only as intended.
### Container Software
A standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another.
### Container Runtime Interface
A plugin interface which enables Kubelet to use a wide variety of container runtimes, without the need to recompile.
### Container Virtualization
A container is a virtual runtime environment that runs on top of a single operating system (OS) kernel and emulates an operating system rather than the underlying hardware.
## D
## E
## F
## G
## H
## I
### Infrastructure Architecture
A structured and modern approach for supporting an organization and facilitating innovation within an enterprise.
## J
## K
### Kata Containers
Kata containers is an open source project delivering increased container security and Workload isolation through an implementation of lightweight virtual machines.
## L
## M
## N
## O
## P
### Pod Containers
A Group of one or more containers , with shared storage/network, and a specification for how to run the containers.
### Private Cloud
A computing model that offers a proprietary environment dedicated to a single business entity.
### Public Cloud
Computing services offered by third-party providers over the public Internet, making them available to anyone who wants to use or purchase them.
## Q
## R
## S
### Serverless Containers
An architecture in which code is executed on-demand. Serverless workloads are typically in the cloud, but on-premises serverless platforms exist, too.
## T
## U
## V
### Virtual Machine Monitor
Computer software, firmware or hardware that creates and runs virtual machines.
### Virtual Machine Software
A software program or operating system that not only exhibits the behavior of a separate computer, but is also capable of performing tasks such as running applications and programs like a separate computer.
## W
## X
## Y
## Z
See the [project glossary hosted in the wiki](https://github.com/kata-containers/kata-containers/wiki/Glossary).
which contains a number of sections for various parts of the Kata
Containers system including the [runtime](src/runtime), the
[agent](src/agent) and the [hypervisor](#hypervisors).
## Hypervisors
See the [hypervisors document](docs/hypervisors.md) and the
[Hypervisor specific configuration details](src/runtime/README.md#hypervisor-specific-configuration).
## Community
@@ -48,6 +105,8 @@ Please raise an issue
## Developers
See the [developer guide](docs/Developer-Guide.md).
### Components
### Main components
@@ -70,8 +129,8 @@ The table below lists the remaining parts of the project:
| [packaging](tools/packaging) | infrastructure | Scripts and metadata for producing packaged binaries<br/>(components, hypervisors, kernel and rootfs). |
| [kernel](https://www.kernel.org) | kernel | Linux kernel used by the hypervisor to boot the guest image. Patches are stored [here](tools/packaging/kernel). |
| [osbuilder](tools/osbuilder) | infrastructure | Tool to create "mini O/S" rootfs and initrd images and kernel for the hypervisor. |
| [`agent-ctl`](tools/agent-ctl) | utility | Tool that provides low-level access for testing the agent. |
| [`ci`](https://github.com/kata-containers/ci) | CI | Continuous Integration configuration files and scripts. |
| [`katacontainers.io`](https://github.com/kata-containers/www.katacontainers.io) | Source for the [`katacontainers.io`](https://www.katacontainers.io) site. |
@@ -84,8 +143,4 @@ the [components](#components) section for further details.
## Glossary of Terms
See the [glossary of terms](Glossary.md) related to Kata Containers.
> - You should only do this step if you are testing with the latest version of the agent.
The rust-agent is built with a static linked `musl.` To configure this:
The agent is built with a statically linked `musl.` The default `libc` used is `musl`, but on `ppc64le` and `s390x`, `gnu` should be used. To configure this:
```
rustup target add x86_64-unknown-linux-musl
sudo ln -s /usr/bin/g++ /bin/musl-g++
$ export ARCH=$(uname -m)
$ if [ "$ARCH" = "ppc64le" -o "$ARCH" = "s390x" ]; then export LIBC=gnu; else export LIBC=musl; fi
You MUST choose one of `alpine`, `centos`, `clearlinux`, `debian`, `euleros`, `fedora`, `suse`, and `ubuntu` for `${distro}`. By default `seccomp` packages are not included in the rootfs image. Set `SECCOMP` to `yes` to include them.
> - If you do *not* wish to build under Docker, remove the `USE_DOCKER`
> variable in the previous command and ensure the `qemu-img` command is
> available on your system.
> - If `qemu-img` is not installed, you will likely see errors such as `ERROR: File /dev/loop19p1 is not a block device` and `losetup: /tmp/tmp.bHz11oY851: Warning: file is smaller than 512 bytes; the loop device may be useless or invisible for system tools`. These can be mitigated by installing the `qemu-img` command (available in the `qemu-img` package on Fedora or the `qemu-utils` package on Debian).
See issue https://github.com/kata-containers/runtime/issues/175 for more information.
Docker compose normally uses custom networks, so also has the same limitations.
## Resource management
Due to the way VMs differ in their CPU and memory allocation, and sharing
@@ -112,82 +104,12 @@ See issue https://github.com/clearcontainers/runtime/issues/341 and [the constra
For CPUs resource management see
[CPU constraints](design/vcpu-handling.md).
### docker run and shared memory
The runtime does not implement the `docker run --shm-size` command to
set the size of the `/dev/shm tmpfs` within the container. It is possible to pass this configuration value into the VM container so the appropriate mount command happens at launch time.
See issue https://github.com/kata-containers/kata-containers/issues/21 for more information.
### docker run and sysctl
The `docker run --sysctl` feature is not implemented. At the runtime
level, this equates to the `linux.sysctl` OCI configuration. Docker
allows configuring the sysctl settings that support namespacing. From a security and isolation point of view, it might make sense to set them in the VM, which isolates sysctl settings. Also, given that each Kata Container has its own kernel, we can support setting of sysctl settings that are not namespaced. In some cases, we might need to support configuring some of the settings on both the host side Kata Container namespace and the Kata Containers kernel.
See issue https://github.com/kata-containers/runtime/issues/185 for more information.
## Docker daemon features
Some features enabled or implemented via the
[`dockerd` daemon](https://docs.docker.com/config/daemon/) configuration are not yet
implemented.
### SELinux support
The `dockerd` configuration option `"selinux-enabled": true` is not presently implemented
in Kata Containers. Enabling this option causes an OCI runtime error.
See issue https://github.com/kata-containers/runtime/issues/784 for more information.
The consequence of this is that the [Docker --security-opt is only partially supported](#docker---security-opt-option-partially-supported).
Kubernetes [SELinux labels](https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container) will also not be applied.
# Architectural limitations
This section lists items that might not be fixed due to fundamental
architectural differences between "soft containers" (i.e. traditional Linux*
containers) and those based on VMs.
## Networking limitations
### Support for joining an existing VM network
Docker supports the ability for containers to join another containers
namespace with the `docker run --net=containers` syntax. This allows
multiple containers to share a common network namespace and the network
interfaces placed in the network namespace. Kata Containers does not
support network namespace sharing. If a Kata Container is setup to
share the network namespace of a `runc` container, the runtime
effectively takes over all the network interfaces assigned to the
namespace and binds them to the VM. Consequently, the `runc` container loses
its network connectivity.
### docker --net=host
Docker host network support (`docker --net=host run`) is not supported.
It is not possible to directly access the host networking configuration
from within the VM.
The `--net=host` option can still be used with `runc` containers and
inter-mixed with running Kata Containers, thus enabling use of `--net=host`
when necessary.
It should be noted, currently passing the `--net=host` option into a
Kata Container may result in the Kata Container networking setup
modifying, re-configuring and therefore possibly breaking the host
networking setup. Do not use `--net=host` with Kata Containers.
### docker run --link
The runtime does not support the `docker run --link` command. This
command is now deprecated by docker and we have no intention of adding support.
Equivalent functionality can be achieved with the newer docker networking commands.
@@ -11,20 +11,23 @@ For details of the other Kata Containers repositories, see the
* [Installation guides](./install/README.md): Install and run Kata Containers with Docker or Kubernetes
## Tracing
See the [tracing documentation](tracing.md).
## More User Guides
* [Upgrading](Upgrading.md): how to upgrade from [Clear Containers](https://github.com/clearcontainers) and [runV](https://github.com/hyperhq/runv) to [Kata Containers](https://github.com/kata-containers) and how to upgrade an existing Kata Containers system to the latest version.
* [Limitations](Limitations.md): differences and limitations compared with the default [Docker](https://www.docker.com/) runtime,
[`runc`](https://github.com/opencontainers/runc).
### Howto guides
### How-to guides
See the [howto documentation](how-to).
See the [how-to documentation](how-to).
## Kata Use-Cases
* [GPU Passthrough with Kata](./use-cases/GPU-passthrough-and-Kata.md)
* [OpenStack Zun with Kata Containers](./use-cases/zun_kata.md)
* [SR-IOV with Kata](./use-cases/using-SRIOV-and-kata.md)
* [Intel QAT with Kata](./use-cases/using-Intel-QAT-and-kata.md)
* [VPP with Kata](./use-cases/using-vpp-and-kata.md)
@@ -37,16 +40,30 @@ Documents that help to understand and contribute to Kata Containers.
### Design and Implementations
* [Kata Containers Architecture](design/architecture.md): Architectural overview of Kata Containers
* [Kata Containers Architecture](design/architecture): Architectural overview of Kata Containers
* [Kata Containers E2E Flow](design/end-to-end-flow.md): The entire end-to-end flow of Kata Containers
* [Kata Containers design](./design/README.md): More Kata Containers design documents
* [Kata Containers threat model](./threat-model/threat-model.md): Kata Containers threat model
### How to Contribute
* [Developer Guide](Developer-Guide.md): Setup the Kata Containers developing environments
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/master/CONTRIBUTING.md)
* [How to contribute to Kata Containers](https://github.com/kata-containers/community/blob/main/CONTRIBUTING.md)
- The above step will create a GitHub pull request in the Kata projects. Trigger the CI using `/test` command on each bump Pull request.
- Trigger the test-kata-deploy workflow on the kata-containers repository bump Pull request using `/test_kata_deploy` (monitor under the "action" tab).
- Check any failures and fix if needed.
- Work with the Kata approvers to verify that the CI works and the pull requests are merged.
@@ -64,7 +65,7 @@
### Check Git-hub Actions
We make use of [GitHub actions](https://github.com/features/actions) in this [file](https://github.com/kata-containers/kata-containers/blob/main/.github/workflows/main.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
We make use of [GitHub actions](https://github.com/features/actions) in this [file](../.github/workflows/release.yaml) in the `kata-containers/kata-containers` repository to build and upload release artifacts. This action is auto triggered with the above step when a new tag is pushed to the `kata-containers/kata-containers` repository.
Check the [actions status page](https://github.com/kata-containers/kata-containers/actions) to verify all steps in the actions workflow have completed successfully. On success, a static tarball containing Kata release artifacts will be uploaded to the [Release page](https://github.com/kata-containers/kata-containers/releases).
This is an architectural overview of Kata Containers, based on the 2.0 release.
The primary deliverable of the Kata Containers project is a CRI friendly shim. There is also a CRI friendly library API behind them.
The [Kata Containers runtime](../../src/runtime)
is compatible with the [OCI](https://github.com/opencontainers) [runtime specification](https://github.com/opencontainers/runtime-spec)
and therefore works seamlessly with the [Kubernetes\* Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md)
through the [CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and
Kata Containers creates a QEMU\*/KVM virtual machine for pod that `kubelet` (Kubernetes) creates respectively.
The [`containerd-shim-kata-v2` (shown as `shimv2` from this point onwards)](../../src/runtime/containerd-shim-v2)
is the Kata Containers entrypoint, which
implements the [Containerd Runtime V2 (Shim API)](https://github.com/containerd/containerd/tree/master/runtime/v2) for Kata.
Before `shimv2` (as done in [Kata Containers 1.x releases](https://github.com/kata-containers/runtime/releases)), we need to create a `containerd-shim` and a [`kata-shim`](https://github.com/kata-containers/shim) for each container and the Pod sandbox itself, plus an optional [`kata-proxy`](https://github.com/kata-containers/proxy) when VSOCK is not available. With `shimv2`, Kubernetes can launch Pod and OCI compatible containers with one shim (the `shimv2`) per Pod instead of `2N+1` shims, and no standalone `kata-proxy` process even if no VSOCK is available.

The container process is then spawned by
[`kata-agent`](../../src/agent), an agent process running
as a daemon inside the virtual machine. `kata-agent` runs a [`ttRPC`](https://github.com/containerd/ttrpc-rust) server in
the guest using a VIRTIO serial or VSOCK interface which QEMU exposes as a socket
file on the host. `shimv2` uses a `ttRPC` protocol to communicate with
the agent. This protocol allows the runtime to send container management
commands to the agent. The protocol is also used to carry the I/O streams (stdout,
stderr, stdin) between the containers and the manage engines (e.g. CRI-O or containerd).
For any given container, both the init process and all potentially executed
commands within that container, together with their related I/O streams, need
to go through the VSOCK interface exported by QEMU.
The container workload, that is, the actual OCI bundle rootfs, is exported from the
host to the virtual machine. In the case where a block-based graph driver is
configured, `virtio-scsi` will be used. In all other cases a `virtio-fs` VIRTIO mount point
will be used. `kata-agent` uses this mount point as the root filesystem for the
container processes.
## Virtualization
How Kata Containers maps container concepts to virtual machine technologies, and how this is realized in the multiple
hypervisors and VMMs that Kata supports is described within the [virtualization documentation](./virtualization.md)
## Guest assets
The hypervisor will launch a virtual machine which includes a minimal guest kernel
and a guest image.
### Guest kernel
The guest kernel is passed to the hypervisor and used to boot the virtual
machine. The default kernel provided in Kata Containers is highly optimized for
kernel boot time and minimal memory footprint, providing only those services
required by a container workload. This is based on a very current upstream Linux
kernel.
### Guest image
Kata Containers supports both an `initrd` and `rootfs` based minimal guest image.
#### Root filesystem image
The default packaged root filesystem image, sometimes referred to as the "mini O/S", is a
highly optimized container bootstrap system based on [Clear Linux](https://clearlinux.org/). It provides an extremely minimal environment and
has a highly optimized boot path.
The only services running in the context of the mini O/S are the init daemon
(`systemd`) and the [Agent](#agent). The real workload the user wishes to run
is created using libcontainer, creating a container in the same manner that is done
by `runc`.
For example, when `ctr run -ti ubuntu date` is run:
- The hypervisor will boot the mini-OS image using the guest kernel.
-`systemd`, running inside the mini-OS context, will launch the `kata-agent` in
the same context.
- The agent will create a new confined context to run the specified command in
(`date` in this example).
- The agent will then execute the command (`date` in this example) inside this
new context, first setting the root filesystem to the expected Ubuntu\* root
filesystem.
#### Initrd image
A compressed `cpio(1)` archive, created from a rootfs which is loaded into memory and used as part of the Linux startup process. During startup, the kernel unpacks it into a special instance of a `tmpfs` that becomes the initial root filesystem.
The only service running in the context of the initrd is the [Agent](#agent) as the init daemon. The real workload the user wishes to run is created using libcontainer, creating a container in the same manner that is done by `runc`.
## Agent
[`kata-agent`](../../src/agent) is a process running in the guest as a supervisor for managing containers and processes running within those containers.
For the 2.0 release, the `kata-agent` is rewritten in the [RUST programming language](https://www.rust-lang.org/) so that we can minimize its memory footprint while keeping the memory safety of the original GO version of [`kata-agent` used in Kata Container 1.x](https://github.com/kata-containers/agent). This memory footprint reduction is pretty impressive, from tens of megabytes down to less than 100 kilobytes, enabling Kata Containers in more use cases like functional computing and edge computing.
The `kata-agent` execution unit is the sandbox. A `kata-agent` sandbox is a container sandbox defined by a set of namespaces (NS, UTS, IPC and PID). `shimv2` can
run several containers per VM to support container engines that require multiple
containers running inside a pod.
`kata-agent` communicates with the other Kata components over `ttRPC`.
## Runtime
`containerd-shim-kata-v2` is a [containerd runtime shimv2](https://github.com/containerd/containerd/blob/v1.4.1/runtime/v2/README.md) implementation and is responsible for handling the `runtime v2 shim APIs`, which is similar to [the OCI runtime specification](https://github.com/opencontainers/runtime-spec) but simplifies the architecture by loading the runtime once and making RPC calls to handle the various container lifecycle commands. This refinement is an improvement on the OCI specification which requires the container manager call the runtime binary multiple times, at least once for each lifecycle command.
`containerd-shim-kata-v2` heavily utilizes the
[virtcontainers package](../../src/runtime/virtcontainers/), which provides a generic, runtime-specification agnostic, hardware-virtualized containers library.
### Configuration
The runtime uses a TOML format configuration file called `configuration.toml`. By default this file is installed in the `/usr/share/defaults/kata-containers` directory and contains various settings such as the paths to the hypervisor, the guest kernel and the mini-OS image.
The actual configuration file paths can be determined by running:
```
$ kata-runtime --show-default-config-paths
```
Most users will not need to modify the configuration file.
The file is well commented and provides a few "knobs" that can be used to modify the behavior of the runtime and your chosen hypervisor.
The configuration file is also used to enable runtime [debug output](../Developer-Guide.md#enable-full-debug).
## Networking
Containers will typically live in their own, possibly shared, networking namespace.
At some point in a container lifecycle, container engines will set up that namespace
to add the container to a network which is isolated from the host network, but
which is shared between containers
In order to do so, container engines will usually add one end of a virtual
ethernet (`veth`) pair into the container networking namespace. The other end of
the `veth` pair is added to the host networking namespace.
This is a very namespace-centric approach as many hypervisors/VMMs cannot handle `veth`
interfaces. Typically, `TAP` interfaces are created for VM connectivity.
To overcome incompatibility between typical container engines expectations
and virtual machines, Kata Containers networking transparently connects `veth`
Container workloads are shared with the virtualized environment through [virtio-fs](https://virtio-fs.gitlab.io/).
The [devicemapper `snapshotter`](https://github.com/containerd/containerd/tree/master/snapshots/devmapper) is a special case. The `snapshotter` uses dedicated block devices rather than formatted filesystems, and operates at the block level rather than the file level. This knowledge is used to directly use the underlying block device instead of the overlay file system for the container root file system. The block device maps to the top read-write layer for the overlay. This approach gives much better I/O performance compared to using `virtio-fs` to share the container file system.
Kata Containers has the ability to hotplug and remove block devices, which makes it possible to use block devices for containers started after the VM has been launched.
Users can check to see if the container uses the devicemapper block device as its rootfs by calling `mount(8)` within the container. If the devicemapper block device
is used, `/` will be mounted on `/dev/vda`. Users can disable direct mounting of the underlying block device through the runtime configuration.
## Kubernetes support
[Kubernetes\*](https://github.com/kubernetes/kubernetes/) is a popular open source
container orchestration engine. In Kubernetes, a set of containers sharing resources
such as networking, storage, mount, PID, etc. is called a
A Kubernetes cluster runs a control plane where a scheduler (typically running on a
dedicated master node) calls into a compute Kubelet. This Kubelet instance is
responsible for managing the lifecycle of pods within the nodes and eventually relies
on a container runtime to handle execution. The Kubelet architecture decouples
lifecycle management from container execution through the dedicated
`gRPC` based [Container Runtime Interface (CRI)](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/container-runtime-interface-v1.md).
In other words, a Kubelet is a CRI client and expects a CRI implementation to
handle the server side of the interface.
[CRI-O\*](https://github.com/kubernetes-incubator/cri-o) and [Containerd\*](https://github.com/containerd/containerd/) are CRI implementations that rely on [OCI](https://github.com/opencontainers/runtime-spec)
compatible runtimes for managing container instances.
Kata Containers is an officially supported CRI-O and Containerd runtime. Refer to the following guides on how to set up Kata Containers with Kubernetes:
- [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
- [Run Kata Containers with Kubernetes](../how-to/run-kata-with-k8s.md)
#### OCI annotations
In order for the Kata Containers runtime (or any virtual machine based OCI compatible
runtime) to be able to understand if it needs to create a full virtual machine or if it
has to create a new container inside an existing pod's virtual machine, CRI-O adds
specific annotations to the OCI configuration file (`config.json`) which is passed to
the OCI compatible runtime.
Before calling its runtime, CRI-O will always add a `io.kubernetes.cri-o.ContainerType`
annotation to the `config.json` configuration file it produces from the Kubelet CRI
request. The `io.kubernetes.cri-o.ContainerType` annotation can either be set to `sandbox`
or `container`. Kata Containers will then use this annotation to decide if it needs to
respectively create a virtual machine or a container inside a virtual machine associated
> **Note:** Since Kubernetes 1.12, the [`Kubernetes RuntimeClass`](https://kubernetes.io/docs/concepts/containers/runtime-class/)
> has been supported and the user can specify runtime without the non-standardized annotations.
With `RuntimeClass`, users can define Kata Containers as a `RuntimeClass` and then explicitly specify that a pod being created as a Kata Containers pod. For details, please refer to [How to use Kata Containers and Containerd](../../docs/how-to/containerd-kata.md).
# Appendices
## DAX
Kata Containers utilizes the Linux kernel DAX [(Direct Access filesystem)](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/filesystems/dax.txt)
feature to efficiently map some host-side files into the guest VM space.
In particular, Kata Containers uses the QEMU NVDIMM feature to provide a
memory-mapped virtual device that can be used to DAX map the virtual machine's
root filesystem into the guest memory address space.
Mapping files using DAX provides a number of benefits over more traditional VM
file and device mapping mechanisms:
- Mapping as a direct access devices allows the guest to directly access
the host memory pages (such as via Execute In Place (XIP)), bypassing the guest
page cache. This provides both time and space optimizations.
- Mapping as a direct access device inside the VM allows pages from the
host to be demand loaded using page faults, rather than having to make requests
via a virtualized device (causing expensive VM exits/hypercalls), thus providing
a speed optimization.
- Utilizing `MAP_SHARED` shared memory on the host allows the host to efficiently
share pages.
Kata Containers uses the following steps to set up the DAX mappings:
1. QEMU is configured with an NVDIMM memory device, with a memory file
backend to map in the host-side file into the virtual NVDIMM space.
2. The guest kernel command line mounts this NVDIMM device with the DAX
feature enabled, allowing direct page mapping and access, thus bypassing the
guest page cache.

Information on the use of NVDIMM via QEMU is available in the [QEMU source code](http://git.qemu-project.org/?p=qemu.git;a=blob;f=docs/nvdimm.txt;hb=HEAD)
API in terms of the way the container lifecycle is split into
different verbs. Rather than calling the runtime multiple times, the
container manager creates a socket and passes it to the shimv2
runtime. The socket is a bi-directional communication channel that
uses a gRPC based protocol to allow the container manager to send API
calls to the runtime, which returns the result to the container
manager using the same channel.
The shimv2 architecture allows running several containers per VM to
support container engines that require multiple containers running
inside a pod.
With the new architecture [Kubernetes](kubernetes.md) can
launch both Pod and OCI compatible containers with a single
[runtime](#runtime) shim per Pod, rather than `2N+1` shims. No stand
alone `kata-proxy` process is required, even if VSOCK is not
available.
## Workload
The workload is the command the user requested to run in the
container and is specified in the [OCI bundle](background.md#oci-bundle)'s
configuration file.
In our [example](example-command.md), the workload is the `sh(1)` command.
### Workload root filesystem
For details of how the [runtime](#runtime) makes the
[container image](background.md#container-image) chosen by the user available to
the workload process, see the
[Container creation](#container-creation) and [storage](#storage) sections.
Note that the workload is isolated from the [guest VM](#environments) environment by its
surrounding [container environment](#environments). The guest VM
environment where the container runs in is also isolated from the _outer_
[host environment](#environments) where the container manager runs.
## System overview
### Environments
The following terminology is used to describe the different or
environments (or contexts) various processes run in. It is necessary
to study this table closely to make sense of what follows:
| Type | Name | Virtualized | Containerized | rootfs | Rootfs device type | Mount type | Description |
|-|-|-|-|-|-|-|-|
| Host | Host | no `[1]` | no | Host specific | Host specific | Host specific | The environment provided by a standard, physical non virtualized system. |
| VM root | Guest VM | yes | no | rootfs inside the [guest image](guest-assets.md#guest-image) | Hypervisor specific `[2]` | `ext4` | The first (or top) level VM environment created on a host system. |
| VM container root | Container | yes | yes | rootfs type requested by user ([`ubuntu` in the example](example-command.md)) | `kataShared` | [virtio FS](storage.md#virtio-fs) | The first (or top) level container environment created inside the VM. Based on the [OCI bundle](background.md#oci-bundle). |
**Key:**
-`[1]`: For simplicity, this document assumes the host environment
runs on physical hardware.
-`[2]`: See the [DAX](#dax) section.
> **Notes:**
>
> - The word "root" is used to mean _top level_ here in a similar
> manner to the term [rootfs](background.md#root-filesystem).
>
> - The term "first level" prefix used above is important since it implies
> that it is possible to create multi level systems. However, they do
> not form part of a standard Kata Containers environment so will not
> be considered in this document.
The reasons for containerizing the [workload](#workload) inside the VM
are:
- Isolates the workload entirely from the VM environment.
- Provides better isolation between containers in a [pod](kubernetes.md).
- Allows the workload to be managed and monitored through its cgroup
confinement.
### Container creation
The steps below show at a high level how a Kata Containers container is
created using the containerd container manager:
1. The user requests the creation of a container by running a command
like the [example command](example-command.md).
1. The container manager daemon runs a single instance of the Kata
[runtime](#runtime).
1. The Kata runtime loads its [configuration file](#configuration).
1. The container manager calls a set of shimv2 API functions on the runtime.
1. The Kata runtime launches the configured [hypervisor](#hypervisor).
1. The hypervisor creates and starts (_boots_) a VM using the
[guest assets](guest-assets.md#guest-assets):
- The hypervisor [DAX](#dax) shares the
[guest image](guest-assets.md#guest-image)
into the VM to become the VM [rootfs](background.md#root-filesystem) (mounted on a `/dev/pmem*` device),
which is known as the [VM root environment](#environments).
- The hypervisor mounts the [OCI bundle](background.md#oci-bundle), using [virtio FS](storage.md#virtio-fs),
into a container specific directory inside the VM's rootfs.
This container specific directory will become the
[container rootfs](#environments), known as the
[container environment](#environments).
1. The [agent](#agent) is started as part of the VM boot.
1. The runtime calls the agent's `CreateSandbox` API to request the
agent create a container:
1. The agent creates a [container environment](#environments)
in the container specific directory that contains the [container rootfs](#environments).
The container environment hosts the [workload](#workload) in the
[container rootfs](#environments) directory.
1. The agent spawns the workload inside the container environment.
> **Notes:**
>
> - The container environment created by the agent is equivalent to
> script as `root` and looking at the "Image details" section of the
> output.
#### Root filesystem image
The default packaged rootfs image, sometimes referred to as the _mini
O/S_, is a highly optimized container bootstrap system.
If this image type is [configured](README.md#configuration), when the
user runs the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (`systemd`) inside the VM root environment.
-`systemd`, running inside the mini-OS context, will launch the [agent](README.md#agent)
in the root context of the VM.
- The agent will create a new container environment, setting its root
filesystem to that requested by the user (Ubuntu in [the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the
environments that are created, the services running in those
environments (for all platforms) and the root filesystem used by
each service:
| Process | Environment | systemd service? | rootfs | User accessible | Notes |
|-|-|-|-|-|-|
| systemd | VM root | n/a | [VM guest image](#guest-image)| [debug console][debug-console] | The init daemon, running as PID 1 |
| [Agent](README.md#agent) | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Runs as a systemd service |
| `chronyd` | VM root | yes | [VM guest image](#guest-image)| [debug console][debug-console] | Used to synchronise the time with the host |
| container workload (`sh(1)` in [the example](example-command.md)) | VM container | no | User specified (Ubuntu in [the example](example-command.md)) | [exec command](README.md#exec-command) | Managed by the agent |
See also the [process overview](README.md#process-overview).
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - The container workload is running inside a full container
> environment which itself is running within a VM environment.
>
> - See the [configuration files for the `osbuilder` tool](../../../tools/osbuilder/rootfs-builder)
> for details of the default distribution for platforms other than
> Intel x86_64.
#### Initrd image
The initrd image is a compressed `cpio(1)` archive, created from a
rootfs which is loaded into memory and used as part of the Linux
startup process. During startup, the kernel unpacks it into a special
instance of a `tmpfs` mount that becomes the initial root filesystem.
If this image type is [configured](README.md#configuration), when the user runs
the [example command](example-command.md):
- The [runtime](README.md#runtime) will launch the configured [hypervisor](README.md#hypervisor).
- The hypervisor will boot the mini-OS image using the [guest kernel](#guest-kernel).
- The kernel will start the init daemon as PID 1 (the
[agent](README.md#agent))
inside the VM root environment.
- The [agent](README.md#agent) will create a new container environment, setting its root
filesystem to that requested by the user (`ubuntu` in
[the example](example-command.md)).
- The agent will then execute the command (`sh(1)` in [the example](example-command.md))
inside the new container.
The table below summarises the default mini O/S showing the environments that are created,
the processes running in those environments (for all platforms) and
the root filesystem used by each service:
| Process | Environment | rootfs | User accessible | Notes |
|-|-|-|-|-|
| [Agent](README.md#agent) | VM root | [VM guest image](#guest-image) | [debug console][debug-console] | Runs as the init daemon (PID 1) |
| container workload | VM container | User specified (Ubuntu in this example) | [exec command](README.md#exec-command) | Managed by the agent |
> **Notes:**
>
> - The "User accessible" column shows how an administrator can access
> the environment.
>
> - It is possible to use a standard init daemon such as systemd with
> an initrd image if this is desirable.
See also the [process overview](README.md#process-overview).
| [initrd](#initrd-image) | [Alpine Linux](https://alpinelinux.org) | Kata [agent](README.md#agent) (as no systemd support) | Security hardened and tiny C library |
See also:
- The [osbuilder](../../../tools/osbuilder) tool
This is used to build all default image types.
- The [versions database](../../../versions.yaml)
The `default-image-name` and `default-initrd-name` options specify
| `SandboxCgroupOnly=false` | yes | legacy | Easiest to make Kata work | Unaccounted for memory and resource utilization | v1
| `SandboxCgroupOnly=true` | no | recommended | Complete tracking of Kata memory and CPU utilization. In Kubernetes, the Kubelet can fully constrain Kata via the pod cgroup | Requires upper layer orchestrator which sizes sandbox cgroup appropriately | v1, v2
Kata implement CRI's API and support [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interface to get basic metrics about container.
Kata implements CRI's API and supports [`ContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L101) and [`ListContainerStats`](https://github.com/kubernetes/kubernetes/blob/release-1.18/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.proto#L103) interfaces to expose containers metrics. User can use these interfaces to get basic metrics about containers.
But unlike `runc`, Kata is a VM-based runtime and has a different architecture.
Unlike `runc`, Kata is a VM-based runtime and has a different architecture.
## Limitations of Kata 1.x and the target of Kata 2.0
## Limitations of Kata 1.x and target of Kata 2.0
Kata 1.x has a number of limitations related to observability that may be obstacles to running Kata Containers at scale.
In Kata 2.0, the following components will be able to provide more details about the system.
In Kata 2.0, the following components will be able to provide more details about the system:
- containerd shim v2 (effectively `kata-runtime`)
- Hypervisor statistics
- Agent process
- Guest OS statistics
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata then introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
> **Note**: In Kata 1.x, the main user-facing component was the runtime (`kata-runtime`). From 1.5, Kata introduced the Kata containerd shim v2 (`containerd-shim-kata-v2`) which is essentially a modified runtime that is loaded by containerd to simplify and improve the way VM-based containers are created and managed.
>
> For Kata 2.0, the main component is the Kata containerd shim v2, although the deprecated `kata-runtime` binary will be maintained for a period of time.
>
@@ -25,14 +25,15 @@ In Kata 2.0, the following components will be able to provide more details about
Kata 2.0 metrics strongly depend on [Prometheus](https://prometheus.io/), a graduated project from CNCF.
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the other Kata components on the host. It's the monitor interface with Kata runtime, and we can do something like these:
Kata Containers 2.0 introduces a new Kata component called `kata-monitor` which is used to monitor the Kata components on the host. It's shipped with the Kata runtime to provide an interface to:
- Get metrics
- Get events
In this document we will cover metrics only. And until now it only supports metrics function.
At present, `kata-monitor` supports retrieval of metrics only: this is what will be covered in this document.
This is the architecture overview metrics in Kata Containers 2.0.
This is the architecture overview of metrics in Kata Containers 2.0:
@@ -45,38 +46,39 @@ For a quick evaluation, you can check out [this how to](../how-to/how-to-set-pro
### Kata monitor
`kata-monitor` is a management agent on one node, where many Kata containers are running. `kata-monitor`'s work include:
The `kata-monitor` management agent should be started on each node where the Kata containers runtime is installed. `kata-monitor` will:
> **Note**: node is a single host system or a node in K8s clusters.
> **Note**: a *node* running Kata containers will be either a single host system or a worker node belonging to a K8s cluster capable of running Kata pods.
- Aggregate sandbox metrics running on this node, and add`sandbox_id` label
- As a Prometheus target, all metrics from Kata shim on this node will be collected by Prometheus indirectly. This can easy the targets count in Prometheus, and also need not to expose shim's metrics by `ip:port`
- Aggregate sandbox metrics running on the node, adding the`sandbox_id` label to them.
- Attach the additional `cri_uid`, `cri_name` and `cri_namespace` labels to the sandbox metrics, tracking the `uid`, `name` and `namespace` Kubernetes pod metadata.
- Expose a new Prometheus target, allowing all node metrics coming from the Kata shim to be collected by Prometheus indirectly. This simplifies the targets count in Prometheus and avoids exposing shim's metrics by `ip:port`.
Only one `kata-monitor` process are running on one node.
Only one `kata-monitor` process runs in each node.
`kata-monitor`is using a different communication channel other than that `conatinerd` communicating with Kata shim, and Kata shim listen on a new socket address for communicating with`kata-monitor`.
`kata-monitor`uses a different communication channel than the one used by the container engine (`containerd`/`CRI-O`) to communicate with the Kata shim. The Kata shim exposes a dedicated socket address reserved to`kata-monitor`.
The way `kata-monitor` get shim's metrics socket file(`monitor_address`) like that `containerd` get shim address. The socket is an abstract socket and saved as file `abstract` with the same directory of `address` for `containerd`.
The shim's metrics socket file is created under the virtcontainers sandboxes directory, i.e. `vc/sbs/${PODID}/shim-monitor.sock`.
> **Note**: If there is no Prometheus server is configured, i.e., there is no scrape operations, `kata-monitor` will do nothing initiative.
> **Note**: If there is no Prometheus server configured, i.e., there are no scrape operations, `kata-monitor` will not collect any metrics.
### Kata runtime
Runtime is responsible for:
Kata runtime is responsible for:
- Gather metrics about shim process
- Gather metrics about hypervisor process
- Gather metrics about running sandbox
- Get metrics from Kata agent(through `ttrpc`)
- Get metrics from Kata agent(through `ttrpc`)
### Kata agent
Agent is responsible for:
Kata agent is responsible for:
- Gather agent process metrics
- Gather guest OS metrics
And in Kata 2.0, agent will add a new interface:
In Kata 2.0, the agent adds a new interface:
```protobuf
rpcGetMetrics(GetMetricsRequest)returns(Metrics);
@@ -93,33 +95,49 @@ The `metrics` field is Prometheus encoded content. This can avoid defining a fix
### Performance and overhead
Metrics should not become the bottleneck of system, downgrade the performance, and run with minimal overhead.
Metrics should not become a bottleneck for the system or downgrade the performance: they should run with minimal overhead.
Requirements:
* Metrics **MUST** be quick to collect
* Metrics **MUST** be small.
* Metrics **MUST** be small
* Metrics **MUST** be generated only if there are subscribers to the Kata metrics service
* Metrics **MUST** be stateless
In Kata 2.0, metrics are collected mainly from `/proc` filesystem, and consumed by Prometheus, based on a pull mode, that is mean if there is no Prometheus collector is running, so there will be zero overhead if nobody cares the metrics.
In Kata 2.0, metrics are collected only when needed (pull mode), mainly from the `/proc` filesystem, and consumed by Prometheus. This means that if the Prometheus collector is not running (so no one cares about the metrics) the overhead will be zero.
Metrics service also doesn't hold any metrics in memory.
The metrics service also doesn't hold any metrics in memory.
#### Metrics size ####
|\*|No Sandbox | 1 Sandbox | 2 Sandboxes |
|---|---|---|---|
|Metrics count| 39 | 106 | 173 |
|Metrics size(bytes)| 9K | 144K | 283K |
|Metrics size(`gzipped`, bytes)| 2K | 10K | 17K |
|Metrics size(bytes)| 9K | 144K | 283K |
|Metrics size(`gzipped`, bytes)| 2K | 10K | 17K |
*Metrics size*: Response size of one Prometheus scrape request.
*Metrics size*: response size of one Prometheus scrape request.
It's easy to estimated that if there are 10 sandboxes running in the host, the size of one metrics fetch request issued by Prometheus will be about to 9 + (144 - 9) * 10 = 1.35M (not `gzipped`) or 2 + (10 - 2) * 10 = 82K (`gzipped`). Of course Prometheus support `gzip` compression, that can reduce the response size of every request.
It's easy to estimate the size of one metrics fetch request issued by Prometheus.
The formula to calculate the expected size when no gzip compression is in place is:
9 + (144 - 9) * `number of kata sandboxes`
Prometheus supports `gzip compression`. When enabled, the response size of each request will be smaller:
2 + (10 - 2) * `number of kata sandboxes`
**Example**
We have 10 sandboxes running on a node. The expected size of one metrics fetch request issued by Prometheus against the kata-monitor agent running on that node will be:
9 + (144 - 9) * 10 = **1.35M**
If `gzip compression` is enabled:
2 + (10 - 2) * 10 = **82K**
#### Metrics delay ####
And here is some test data:
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): 20ms(avg)
- Agent(RPC all from shim to agent): 3ms(avg)
- End-to-end (from Prometheus server to `kata-monitor` and `kata-monitor` write response back): **20ms**(avg)
- Agent(RPC all from shim to agent): **3ms**(avg)
Test infrastructure:
@@ -128,13 +146,13 @@ Test infrastructure:
**Scrape interval**
Prometheus default `scrape_interval` is 1 minute, and usually it is set to 15s. Small `scrape_interval` will cause more overhead, so user should set it on monitor demand.
Prometheus default `scrape_interval` is 1 minute, but it is usually set to 15 seconds. A smaller`scrape_interval` causes more overhead, so users should set it depending on their monitoring needs.
## Metrics list
Here listed is all supported metrics by Kata 2.0. Some metrics is dependent on guest kernels in the VM, so there may be some different by your environment.
Here are listed all the metrics supported by Kata 2.0. Some metrics are dependent on the VM guest kernel, so the available ones may differ based on the environment.
Metrics is categorized by component where metrics are collected from and for.
Metrics are categorized by the component from/for which the metrics are collected.
* [Metric types](#metric-types)
* [Kata agent metrics](#kata-agent-metrics)
@@ -145,15 +163,15 @@ Metrics is categorized by component where metrics are collected from and for.
> * Labels here are not include `instance` and `job` labels that added by Prometheus.
> * Labels here do not include the `instance` and `job` labels added by Prometheus.
> * Notes about metrics unit
> * `Kibibytes`, abbreviated `KiB`. 1 `KiB` equals 1024 B.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit is depend on label( for example `recv_bytes` and `recv_packets` are having different units).
> * Most of these metrics is collected from `/proc` filesystem, so the unit of metrics are keeping the same unit as `/proc`. See the `proc(5)` manual page for further details.
> * For some metrics (like network devices statistics from file `/proc/net/dev`), unit depends on label( for example `recv_bytes` and `recv_packets` have different units).
> * Most of these metrics are collected from the `/proc` filesystem, so the unit of each metric matches the unit of the relevant `/proc` entry. See the `proc(5)` manual page for further details.
### Metric types
Prometheus offer four core metric types.
Prometheus offers four core metric types.
- Counter: A counter is a cumulative metric that represents a single monotonically increasing counter whose value can only increase.
@@ -207,7 +225,7 @@ Metrics for Firecracker vmm.
| `kata_firecracker_uart`: <br> Metrics specific to the UART device. | `GAUGE` | | <ul><li>`item`<ul><li>`error_count`</li><li>`flush_count`</li><li>`missed_read_count`</li><li>`missed_write_count`</li><li>`read_count`</li><li>`write_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vcpu`: <br> Metrics specific to VCPUs' mode of functioning. | `GAUGE` | | <ul><li>`item`<ul><li>`exit_io_in`</li><li>`exit_io_out`</li><li>`exit_mmio_read`</li><li>`exit_mmio_write`</li><li>`failures`</li><li>`filter_cpuid`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
| `kata_firecracker_vmm`: <br> Metrics specific to the machine manager as a whole. | `GAUGE` | | <ul><li>`item`<ul><li>`device_events`</li><li>`panic_count`</li></ul></li><li>`sandbox_id`</li></ul> | 2.0.0 |
@@ -30,7 +30,7 @@ The Kata Containers runtime **MUST** implement the following command line option
The Kata Containers project **MUST** provide two interfaces for CRI shims to manage hardware
virtualization based Kubernetes pods and containers:
- An OCI and `runc` compatible command line interface, as described in the previous section.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`cri-containerd`](https://github.com/containerd/cri-containerd), for example.
This interface is used by implementations such as [`CRI-O`](http://cri-o.io) and [`containerd`](https://github.com/containerd/containerd), for example.
- A hardware virtualization runtime library API for CRI shims to consume and provide a more
CRI native implementation. The [`frakti`](https://github.com/kubernetes/frakti) CRI shim is an example of such a consumer.
[Research](https://www.usenix.org/conference/fast16/technical-sessions/presentation/harter) shows that time to take for pull operation accounts for 76% of container startup time but only 6.4% of that data is read. So if we can get data on demand (lazy load), it will speed up the container start. [`Nydus`](https://github.com/dragonflyoss/image-service) is a project which build image with new format and can get data on demand when container start.
The following benchmarking result shows the performance improvement compared with the OCI image for the container cold startup elapsed time on containerd. As the OCI image size increases, the container startup time of using `nydus` image remains very short. [Click here](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) to see `nydus` design.
## Proposal - Bring `lazyload` ability to Kata Containers
`Nydusd` is a fuse/`virtiofs` daemon which is provided by `nydus` project and it supports `PassthroughFS` and [RAFS](https://github.com/dragonflyoss/image-service/blob/master/docs/nydus-design.md) (Registry Acceleration File System) natively, so in Kata Containers, we can use `nydusd` in place of `virtiofsd` and mount `nydus` image to guest in the meanwhile.
The process of creating/starting Kata Containers with `virtiofsd`,
1. When creating sandbox, the Kata Containers Containerd v2 [shim](https://github.com/kata-containers/kata-containers/blob/main/docs/design/architecture/README.md#runtime) will launch `virtiofsd` before VM starts and share directories with VM.
2. When creating container, the Kata Containers Containerd v2 shim will mount rootfs to `kataShared`(/run/kata-containers/shared/sandboxes/\<SANDBOX\>/mounts/\<CONTAINER\>/rootfs), so it can be seen at the path `/run/kata-containers/shared/containers/shared/\<CONTAINER\>/rootfs` in the guest and used as container's rootfs.
The process of creating/starting Kata Containers with `nydusd`,

1. When creating sandbox, the Kata Containers Containerd v2 shim will launch `nydusd` daemon before VM starts.
After VM starts, `kata-agent` will mount `virtiofs` at the path `/run/kata-containers/shared` and Kata Containers Containerd v2 shim mount `passthroughfs` filesystem to path `/run/kata-containers/shared/containers` when the VM starts.
```bash
# start nydusd
$ sandbox_id=my-test-sandbox
$ sudo /usr/local/bin/nydusd --log-level info --sock /run/vc/vm/${sandbox_id}/vhost-user-fs.sock --apisock /run/vc/vm/${sandbox_id}/api.sock
```
```bash
# source: the host sharedir which will pass through to guest
-X POST "http://localhost/api/v1/mount?mountpoint=/containers" -H "accept: */*"\
-H "Content-Type: application/json"\
-d '{
"source":"/path/to/sharedir",
"fs_type":"passthrough_fs",
"config":""
}'
```
2. When creating normal container, the Kata Containers Containerd v2 shim send request to `nydusd` to mount `rafs` at the path `/run/kata-containers/shared/rafs/<container_id>/lowerdir` in guest.
The Kata Containers Containerd v2 shim will also bind mount `snapshotdir` which `nydus-snapshotter` assigns to `sharedir`。
So in guest, container rootfs=overlay(`lowerdir=rafs`, `upperdir=snapshotdir/fs`, `workdir=snapshotdir/work`)
> how to transfer the `rafs` info from `nydus-snapshotter` to the Kata Containers Containerd v2 shim?
By default, when creating `OCI` image container, `nydus-snapshotter` will return [`struct` Mount slice](https://github.com/containerd/containerd/blob/main/mount/mount.go#L21) below to containerd and containerd use them to mount rootfs
Then, we can append `rafs` info into `Options`, but if do this, containerd will mount failed, as containerd can not identify `rafs` info. Here, we can refer to [containerd mount helper](https://github.com/containerd/containerd/blob/main/mount/mount_linux.go#L42) and provide a binary called `nydus-overlayfs`. The `Mount` slice which `nydus-snapshotter` returned becomes
# How to run Docker in Docker with Kata Containers
This document describes the why and how behind running Docker in a Kata Container.
> **Note:** While in other environments this might be described as "Docker in Docker", the new architecture of Kata 2.x means [Docker can no longer be used to create containers using a Kata Containers runtime](https://github.com/kata-containers/kata-containers/issues/722).
## Requirements
- A working Kata Containers installation
## Install and configure Kata Containers
Follow the [Kata Containers installation guide](../install/README.md) to Install Kata Containers on your Kubernetes cluster.
## Background
Docker in Docker ("DinD") is the colloquial name for the ability to run `docker` from inside a container.
You can learn more about about Docker-in-Docker at the following links:
- [The original announcement of DinD](https://www.docker.com/blog/docker-can-now-run-within-docker/)
While normally DinD refers to running `docker` from inside a Docker container,
Kata Containers 2.x allows only [supported runtimes][kata-2.x-supported-runtimes] (such as [`containerd`](../install/container-manager/containerd/containerd-install.md)).
Running `docker` in a Kata Container implies creating Docker containers from inside a container managed by `containerd` (or another supported container manager), as illustrated below:
[OverlayFS][OverlayFS] is the preferred storage driver for most container runtimes on Linux ([including Docker](https://docs.docker.com/storage/storagedriver/select-storage-driver)).
> **Note:** While in the past Kata Containers did not contain the [`overlay` kernel module (aka OverlayFS)][OverlayFS], the kernel modules have been included since the [Kata Containers v2.0.0 release][v2.0.0].
## Why Docker in Kata Containers 2.x requires special measures
Running Docker containers Kata Containers requires care because `VOLUME`s specified in `Dockerfile`s run by Kata Containers are given the `kataShared` mount type by default, which applies to the root directory `/`:
```console
/ # mount
kataShared on / type virtiofs (rw,relatime,dax)
```
`kataShared` mount types are powered by [`virtio-fs`][virtio-fs], a marked improvement over `virtio-9p`, thanks to [PR #1016](https://github.com/kata-containers/runtime/pull/1016). While `virtio-fs` is normally an excellent choice, in the case of DinD workloads `virtio-fs` causes an issue -- [it *cannot* be used as a "upper layer" of `overlayfs` without a custom patch](http://lists.katacontainers.io/pipermail/kata-dev/2020-January/001216.html).
As `/var/lib/docker` is a `VOLUME` specified by DinD (i.e. the `docker` images tagged `*-dind`/`*-dind-rootless`), `docker` fill fail to start (or even worse, silently pick a worse storage driver like `vfs`) when started in a Kata Container. Special measures must be taken when running DinD-powered workloads in Kata Containers.
## Workarounds/Solutions
Thanks to various community contributions (see [issue references below](#references)) the following options, with various trade-offs have been uncovered:
### Use a memory backed volume
For small workloads (small container images, without much generated filesystem load), a memory-backed volume is sufficient. Kubernetes supports a variant of [the `EmptyDir` volume][k8s-emptydir], which allows for memdisk-backed storage -- the [the `medium: Memory` ][k8s-memory-volume-type]. An example of a `Pod` using such a setup [was contributed](https://github.com/kata-containers/runtime/issues/1429#issuecomment-477385283), and is reproduced below:
```yaml
apiVersion:v1
kind:Pod
metadata:
name:dind
spec:
runtimeClassName:kata
containers:
- name:dind
securityContext:
privileged:true
image:docker:20.10-dind
args:["--storage-driver=overlay2"]
resources:
limits:
memory:"3G"
volumeMounts:
- mountPath:/var/run/
name:dockersock
- mountPath:/var/lib/docker
name:docker
volumes:
- name:dockersock
emptyDir:{}
- name:docker
emptyDir:
medium:Memory
```
Inside the container you can view the mount:
```console
/ # mount | grep lib\/docker
tmpfs on /var/lib/docker type tmpfs (rw,relatime)
```
As is mentioned in the comment encapsulating this code, using volatile memory for container storage backing is a risky and could be possibly wasteful on machines that do not have a lot of RAM.
### Use a loop mounted disk
Using a loop mounted disk that is provisioned shortly before starting of the container workload is another approach that yields good performance.
Contributors provided [an example in issue #1888](https://github.com/kata-containers/runtime/issues/1888#issuecomment-739057384), which is reproduced in part below:
```yaml
spec:
containers:
- name:docker
image:docker:20.10-dind
command:["sh","-c"]
args:
- if [[ $(df -PT /var/lib/docker | awk 'NR==2 {print $2}') == virtiofs ]]; then
apk add e2fsprogs &&
truncate -s 20G /tmp/disk.img &&
mkfs.ext4 /tmp/disk.img &&
mount /tmp/disk.img /var/lib/docker; fi &&
dockerd-entrypoint.sh;
securityContext:
privileged:true
```
Note that loop mounted disks are often sparse, which means they *do not* take up the full amount of space that has been provisioned. This solution seems to produce the best performance and flexibility, at the expense of increased complexity and additional required setup.
### Build a custom kernel
It's possible to [modify the kernel](https://github.com/kata-containers/runtime/issues/1888#issuecomment-616872558) (in addition to applying the earlier mentioned mailing list patch) to support using `virtio-fs` as an upper. Note that if you modify your kernel and use `virtio-fs` you may require [additional changes](https://github.com/kata-containers/runtime/issues/1888#issuecomment-739057384) for decent performance and to address other issues.
> **NOTE:** A future kernel release may rectify the usability and performance issues of using `virtio-fs` as an OverlayFS upper layer.
## References
The solutions proposed in this document are an amalgamation of thoughtful contributions from the Kata Containers community.
Find links to issues & related discussion and the fruits therein below:
- [How to run Docker in Docker with Kata Containers (#2474)](https://github.com/kata-containers/kata-containers/issues/2474)
- [Does Kata-container support AUFS/OverlayFS? (#2493)](https://github.com/kata-containers/runtime/issues/2493)
- [Unable to start docker in docker with virtio-fs (#1888)](https://github.com/kata-containers/runtime/issues/1888)
- [Not using native diff for overlay2 (#1429)](https://github.com/kata-containers/runtime/issues/1429)
To improve security, Kata Container supports running the VMM process (currently only QEMU) as a non-`root` user.
This document describes how to enable the rootless VMM mode and its limitations.
## Pre-requisites
The permission and ownership of the `kvm` device node (`/dev/kvm`) need to be configured to:
```
$ crw-rw---- 1 root kvm
```
use the following commands:
```
$ sudo groupadd kvm -r
$ sudo chown root:kvm /dev/kvm
$ sudo chmod 660 /dev/kvm
```
## Configure rootless VMM
By default, the VMM process still runs as the root user. There are two ways to enable rootless VMM:
1. Set the `rootless` flag to `true` in the hypervisor section of `configuration.toml`.
2. Set the Kubernetes annotation `io.katacontainers.hypervisor.rootless` to `true`.
## Implementation details
When `rootless` flag is enabled, upon a request to create a Pod, Kata Containers runtime creates a random user and group (e.g. `kata-123`), and uses them to start the hypervisor process.
The `kvm` group is also given to the hypervisor process as a supplemental group to give the hypervisor process access to the `/dev/kvm` device.
Another necessary change is to move the hypervisor runtime files (e.g. `vhost-fs.sock`, `qmp.sock`) to a directory (under `/run/user/[uid]/`) where only the non-root hypervisor has access to.
## Limitations
1. Only the VMM process is running as a non-root user. Other processes such as Kata Container shimv2 and `virtiofsd` still run as the root user.
2. Currently, this feature is only supported in QEMU. Still need to bring it to Firecracker and Cloud Hypervisor (see https://github.com/kata-containers/kata-containers/issues/2567).
3. Certain features will not work when rootless VMM is enabled, including:
1. Passing devices to the guest (`virtio-blk`, `virtio-scsi`) will not work if the non-privileged user does not have permission to access it (leading to a permission denied error). A more permissive permission (e.g. 666) may overcome this issue. However, you need to be aware of the potential security implications of reducing the security on such devices.
2. `vfio` device will also not work because of permission denied error.
@@ -34,8 +34,6 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.agent.enable_tracing` | `boolean` | enable tracing for the agent |
| `io.katacontainers.config.agent.container_pipe_size` | uint32 | specify the size of the std(in/out) pipes created for containers |
| `io.katacontainers.config.agent.kernel_modules` | string | the list of kernel modules and their parameters that will be loaded in the guest kernel. Semicolon separated list of kernel modules and their parameters. These modules will be loaded in the guest kernel using `modprobe`(8). E.g., `e1000e InterruptThrottleRate=3000,3000,3000 EEE=1; i915 enable_ppgtt=0` |
| `io.katacontainers.config.agent.trace_mode` | string | the trace mode for the agent |
| `io.katacontainers.config.agent.trace_type` | string | the trace type for the agent |
## Hypervisor Options
| Key | Value Type | Comments |
@@ -58,13 +56,14 @@ There are several kinds of Kata configurations and they are listed below.
| `io.katacontainers.config.hypervisor.enable_iothreads` | `boolean`| enable IO to be processed in a separate thread. Supported currently for virtio-`scsi` driver |
| `io.katacontainers.config.hypervisor.enable_mem_prealloc` | `boolean` | the memory space used for `nvdimm` device by the hypervisor |
| `io.katacontainers.config.hypervisor.enable_swap` | `boolean` | enable swap of VM memory |
| `io.katacontainers.config.hypervisor.entropy_source` (R) | string| the path to a host source of entropy (`/dev/random`, `/dev/urandom` or real hardware RNG device) |
| `io.katacontainers.config.hypervisor.firmware_volume` | string | the guest firmware volume that will be passed to the container VM |
| `io.katacontainers.config.hypervisor.guest_hook_path` | string | the path within the VM that will be used for drop in hooks |
| `io.katacontainers.config.hypervisor.hotplug_vfio_on_root_bus` | `boolean` | indicate if devices need to be hotplugged on the root bus instead of a bridge|
Setup swap device in guest kernel can help to increase memory capacity, handle some memory issues and increase file access speed sometimes.
Kata Containers can insert a raw file to the guest as the swap device.
## Requisites
The swap config of the containers should be set by [annotations](how-to-set-sandbox-config-kata.md#container-options). So [extra configuration is needed for containerd](how-to-set-sandbox-config-kata.md#containerd-configuration).
Kata Containers just supports setup swap device in guest kernel with QEMU.
Install and setup Kata Containers as shown [here](../install/README.md).
Enable setup swap device in guest kernel as follows:
```
$ sudo sed -i -e 's/^#enable_guest_swap.*$/enable_guest_swap = true/g' /etc/kata-containers/configuration.toml
```
## Run a Kata Container utilizing swap device
Use following command to start a Kata Container with swappiness 60 and 1GB swap device (swap_in_bytes - memory_limit_in_bytes).
- For networking, ACRN supports either MACVTAP or TAP. If MACVTAP is not enabled in the Service OS, please follow the below steps to update the kernel:
Refer to [kata-`nydus`-design](../design/kata-nydus-design.md) for introduction and `nydus` has supported Kata Containers with hypervisor `QEMU` and `CLH` currently.
## How to
You can use Kata Containers with `nydus` as follows,
1. Use [`nydus` latest branch](https://github.com/dragonflyoss/image-service);
2. Deploy `nydus` environment as [`Nydus` Setup for Containerd Environment](https://github.com/dragonflyoss/image-service/blob/master/docs/containerd-env-setup.md);
3. Start `nydus-snapshotter` with `enable_nydus_overlayfs` enabled;
4. Use [kata-containers](https://github.com/kata-containers/kata-containers) `latest` branch to compile and build `kata-containers.img`;
5. Update `configuration-qemu.toml` or `configuration-clh.toml`to include:
```toml
shared_fs="virtio-fs-nydus"
virtio_fs_daemon="<nydusd binary path>"
virtio_fs_extra_args=[]
```
6. run `crictl run -r kata nydus-container.yaml nydus-sandbox.yaml`;
@@ -6,4 +6,4 @@ Container deployments utilize explicit or implicit file sharing between host fil
As of the 2.0 release of Kata Containers, [virtio-fs](https://virtio-fs.gitlab.io/) is the default filesystem sharing mechanism.
virtio-fs support works out of the box for `cloud-hypervisor` and `qemu`, when Kata Containers is deployed using `kata-deploy`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](https://github.com/kata-containers/kata-containers/tree/main/tools/packaging/kata-deploy#kubernetes-quick-start).
virtio-fs support works out of the box for `cloud-hypervisor` and `qemu`, when Kata Containers is deployed using `kata-deploy`. Learn more about `kata-deploy` and how to use `kata-deploy` in Kubernetes [here](../../tools/packaging/kata-deploy/README.md#kubernetes-quick-start).
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) | Easy to install | yes| Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
| Installation method | Description | Automatic updates | Use case |
| [Using kata-deploy](#kata-deploy-installation) | The preferred way to deploy the Kata Containers distributed binaries on a Kubernetes cluster | **No!** | Best way to give it a try on kata-containers on an already up and running Kubernetes cluster. |
| [Using official distro packages](#official-packages) | Kata packages provided by Linux distributions official repositories | yes | Recommended for most users. |
| [Using snap](#snap-installation) | Easy to install | yes | Good alternative to official distro packages. |
| [Automatic](#automatic-installation) | Run a single command to install a full system | **No!** | For those wanting the latest release quickly. |
| [Manual](#manual-installation) | Follow a guide step-by-step to install a working system | **No!** | For those who want the latest release with more control. |
| [Build from source](#build-from-source-installation) | Build the software components manually | **No!** | Power users and developers only. |
### Kata Deploy Installation
Kata Deploy provides a Dockerfile, which contains all of the binaries and
artifacts required to run Kata Containers, as well as reference DaemonSets,
which can be utilized to install Kata Containers on a running Kubernetes
cluster.
[Use Kata Deploy](/tools/packaging/kata-deploy/README.md) to install Kata Containers on a Kubernetes Cluster.
### Official packages
@@ -48,9 +58,9 @@ Follow the [containerd installation guide](container-manager/containerd/containe
## Build from source installation
*Note:* Power users who decide to build from sources should be aware of the
implications of using an unpackaged system which will not be automatically
updated as new [releases](../Stable-Branch-Strategy.md) are made available.
> **Note:** Power users who decide to build from sources should be aware of the
> implications of using an unpackaged system which will not be automatically
> updated as new [releases](../Stable-Branch-Strategy.md) are made available.
[Building from sources](../Developer-Guide.md#initial-setup) allows power users
who are comfortable building software from source to use the latest component
Read the following documents to know how to run Kata Containers 2.x with `containerd`.
* [How to use Kata Containers and Containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/containerd-kata.md)
* [Install Kata Containers with containerd](https://github.com/kata-containers/kata-containers/blob/main/docs/install/container-manager/containerd/containerd-install.md)
* [How to use Kata Containers and Containerd](../how-to/containerd-kata.md)
* [Install Kata Containers with containerd](./container-manager/containerd/containerd-install.md)
This document discusses threat models associated with the Kata Containers project.
Kata was designed to provide additional isolation of container workloads, protecting
the host infrastructure from potentially malicious container users or workloads. Since
Kata Containers adds a level of isolation on top of traditional containers, the focus
is on the additional layer provided, not on traditional container security.
This document provides a brief background on containers and layered security, describes
the interface to Kata from CRI runtimes, a review of utilized virtual machine interfaces, and then
a review of threats.
## Kata security objective
Kata seeks to prevent an untrusted container workload or user of that container workload to gain
control of, obtain information from, or tamper with the host infrastructure.
In our scenario, an asset is anything on the host system, or elsewhere in the cluster
infrastructure. The attacker is assumed to be either a malicious user or the workload itself
running within the container. The goal of Kata is to prevent attacks which would allow
any access to the defined assets.
## Background on containers, layered security
Traditional containers leverage several key Linux kernel features to provide isolation and
a view that the container workload is the only entity running on the host. Key features include
`Namespaces`, `cgroups`, `capablities`, `SELinux` and `seccomp`. The canonical runtime for creating such
a container is `runc`. In the remainder of the document, the term `traditional-container` will be used
to describe a container workload created by runc.
Kata Containers provides a second layer of isolation on top of those provided by traditional-containers.
The hardware virtualization interface is the basis of this additional layer. Kata launches a lightweight
virtual machine, and uses the guest’s Linux kernel to create a container workload, or workloads in the case
of multi-container pods. In Kubernetes and in the Kata implementation, the sandbox is carried out at the
pod level. In Kata, this sandbox is created using a virtual machine.
## Interface to Kata Containers: CRI, v2-shim, OCI
A typical Kata Containers deployment uses Kubernetes with a CRI implementation.
On every node, Kubelet will interact with a CRI implementor, which will in turn interface with
an OCI based runtime, such as Kata Containers. Typical CRI implementors are `cri-o` and `containerd`.
The CRI API, as defined at the Kubernetes [CRI-API repo](https://github.com/kubernetes/cri-api/),
results in a few constructs being supported by the CRI implementation, and ultimately in the OCI
runtime creating the workloads.
In order to run a container inside of the Kata sandbox, several virtual machine devices and interfaces
are required. Kata translates sandbox and container definitions to underlying virtualization technologies provided
by a set of virtual machine monitors (VMMs) and hypervisors. These devices and their underlying
implementations are discussed in detail in the following section.
## Interface to the Kata sandbox/virtual machine
In case of Kata, today the devices which we need in the guest are:
- Storage: In the current design of Kata Containers, we are reliant on the CRI implementor to
assist in image handling and volume management on the host. As a result, we need to support a way of passing to the sandbox the container rootfs, volumes requested
by the workload, and any other volumes created to facilitate sharing of secrets and `configmaps` with the containers. Depending on how these are managed, a block based device or file-system
sharing is required. Kata Containers does this by way of `virtio-blk` and/or `virtio-fs`.
- Networking: A method for enabling network connectivity with the workload is required. Typically this will be done providing a `TAP` device
to the VMM, and this will be exposed to the guest as a `virtio-net` device. It is feasible to pass in a NIC device directly, in which case `VFIO` is leveraged
and the device itself will be exposed to the guest.
- Control: In order to interact with the guest agent and retrieve `STDIO` from containers, a medium of communication is required.
This is available via `virtio-vsock`.
- Devices: `VFIO` is utilized when devices are passed directly to the virtual machine and exposed to the container.
- Dynamic Resource Management: `ACPI` is utilized to allow for dynamic VM resource management (for example: CPU, memory, device hotplug). This is required when containers are resized,
or more generally when containers are added to a pod.
How these devices are utilized varies depending on the VMM utilized. We clarify the default settings provided when integrating Kata
with the QEMU, Firecracker and Cloud Hypervisor VMMs in the following sections.
### Devices
Each virtio device is implemented by a backend, which may execute within userspace on the host (vhost-user), the VMM itself, or within the host kernel (vhost). While it may provide enhanced performance,
vhost devices are often seen as higher risk since an exploit would be already running within the kernel space. While VMM and vhost-user are both in userspace on the host, `vhost-user` generally allows for the back-end process to require less system calls and capabilities compared to a full VMM.
#### `virtio-blk` and `virtio-scsi`
The backend for `virtio-blk` and `virtio-scsi` are based in the VMM itself (ring3 in the context of x86) by default for Cloud Hypervisor, Firecracker and QEMU.
While `vhost` based back-ends are available for QEMU, it is not recommended. `vhost-user` back-ends are being added for Cloud Hypervisor, they are not utilized in Kata today.
#### `virtio-fs`
`virtio-fs` is supported in Cloud Hypervisor and QEMU. `virtio-fs`'s interaction with the host filesystem is done through a vhost-user daemon, `virtiofsd`.
The `virtio-fs` client, running in the guest, will generate requests to access files. `virtiofsd` will receive requests, open the file, and request the VMM
to `mmap` it into the guest. When DAX is utilized, the guest will access the host's page cache, avoiding the need for copy and duplication. DAX is still an experimental feature,
and is not enabled by default.
From the `virtiofsd` [documentation](https://qemu-project.gitlab.io/qemu/tools/virtiofsd.html):
```This program must be run as the root user. Upon startup the program will switch into a new file system namespace with the shared directory tree as its root. This prevents “file system escapes” due to symlinks and other file system objects that might lead to files outside the shared directory. The program also sandboxes itself using seccomp(2) to prevent ptrace(2) and other vectors that could allow an attacker to compromise the system after gaining control of the virtiofsd process.```
DAX-less support for `virtio-fs` is available as of the 5.4 Linux kernel. QEMU VMM supports virtio-fs as of v4.2. Cloud Hypervisor
supports `virtio-fs`.
#### `virtio-net`
`virtio-net` has many options, depending on the VMM and Kata configurations.
##### QEMU networking
While QEMU has options for `vhost`, `virtio-net` and `vhost-user`, the `virtio-net` backend
for Kata defaults to `vhost-net` for performance reasons. The default configuration is being
reevaluated.
##### Firecracker networking
For Firecracker, the `virtio-net` backend is within Firecracker's VMM.
##### Cloud Hypervisor networking
For Cloud Hypervisor, the current backend default is within the VMM. `vhost-user-net` support
is being added (written in rust, Cloud Hypervisor specific).
#### virtio-vsock
##### QEMU vsock
In QEMU, vsock is backed by `vhost_vsock`, which runs within the kernel itself.
##### Firecracker and Cloud Hypervisor
In Firecracker and Cloud Hypervisor, vsock is backed by a unix-domain-socket in the hosts userspace.
#### VFIO
Utilizing VFIO, devices can be passed through to the virtual machine. We will assess this separately. Exposure to
host is limited to gaps in device pass-through handling. This is supported in QEMU and Cloud Hypervisor, but not
Firecracker.
#### ACPI
ACPI is necessary for hotplug of CPU, memory and devices. ACPI is available in QEMU and Cloud Hypervisor. Device, CPU and memory hotplug
### Compile Intel® QAT drivers for Kata Containers kernel and add to Kata Containers rootfs
@@ -355,10 +355,10 @@ this small script so that it redirects to be able to use either QEMU or
Cloud Hypervisor with Kata.
```bash
$ echo'#!/bin/bash'| sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo'#!/usr/bin/env bash'| sudo tee /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-qemu.toml /opt/kata/bin/containerd-shim-kata-v2 $@'| sudo tee -a /usr/local/bin/containerd-shim-kata-qemu-v2
$ echo'#!/bin/bash'| sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo'#!/usr/bin/env bash'| sudo tee /usr/local/bin/containerd-shim-kata-clh-v2
$ echo'KATA_CONF_FILE=/opt/kata/share/defaults/kata-containers/configuration-clh.toml /opt/kata/bin/containerd-shim-kata-v2 $@'| sudo tee -a /usr/local/bin/containerd-shim-kata-clh-v2
* The Kata VM's SGX Encrypted Page Cache (EPC) memory size is based on the sum of `sgx.intel.com/epc`
resource requests within the pod.
*`init-sgx` can be removed from the YAML configuration file if the Kata rootfs is modified with the
necessary udev rules.
See the [note on SGX backwards compatibility](https://github.com/intel/intel-device-plugins-for-kubernetes/tree/main/cmd/sgx_plugin#backwards-compatibility-note).
* Intel SGX DCAP attestation is known to work from Kata sandboxes but it comes with one limitation: If
the Intel SGX `aesm` daemon runs on the bare metal node and DCAP `out-of-proc` attestation is used,
containers within the Kata sandbox cannot get the access to the host's `/var/run/aesmd/aesm.sock`
because socket passthrough is not supported. An alternative is to deploy the `aesm` daemon as a side-car
container.
* Projects like [Gramine Shielded Containers (GSC)](https://gramine-gsc.readthedocs.io/en/latest/) are
also known to work. For GSC specifically, the Kata guest kernel needs to have the `CONFIG_NUMA=y`
enabled and at least one CPU online when running the GSC container.
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.