We're currently hitting a race condition in the Cloud Hypervisor
driver code when quickly removing and adding a block device.
This happens because device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor
to know when the device has actually been removed. On top of that,
the sandbox code is unaware of this, and when a new device is attached
it may quickly assign the very same ID to the new device, leading to
the Cloud Hypervisor driver trying to hotplug a device with the very
same ID as the device that was not yet removed.
This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.
The workaround taken to solve the issue is basically *not* passing
down the device ID to Cloud Hypervisor and simply letting Cloud
Hypervisor generate those itself, as it does so in a manner that
avoids such conflicts. With this change we then have to keep a map
between the device ID and Cloud Hypervisor's generated ID, so we can
properly remove the device.
This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that. Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.
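
As a rough illustration, the bookkeeping boils down to the map below.
This is a minimal sketch: the names (hotplugBlock, removeDevice,
kataID) are made up for illustration, and the actual driver keeps
this map in its internal state.

  // devIDs maps the device ID used by the Kata runtime to the ID
  // Cloud Hypervisor generated when the device was hotplugged.
  devIDs := make(map[string]string)

  // On hotplug: let Cloud Hypervisor pick a conflict-free ID and
  // remember it for later removal.
  chID, err := hotplugBlock(devicePath)
  if err != nil {
      return err
  }
  devIDs[kataID] = chID

  // On removal: ask Cloud Hypervisor to remove *its* ID.
  if err := removeDevice(devIDs[kataID]); err != nil {
      return err
  }
  delete(devIDs, kataID)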
Fixes: #4176
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.
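
Assuming the usual snake_case naming for the keys (a sketch, with
purely illustrative values), this looks something like the following
in the Cloud Hypervisor section of configuration.toml:

  disk_rate_limiter_bw_max_rate = 100000000        # bits/sec
  disk_rate_limiter_bw_one_time_burst = 200000000  # bits/sec
  disk_rate_limiter_ops_max_rate = 1000            # ops/sec
  disk_rate_limiter_ops_one_time_burst = 2000      # ops/sec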
Fixes: #4139
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.
Fixes: #4017
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).
Nowadays, however, Cloud Hypervisor takes advantage of the very same
crate used by Firecracker to do I/O throttling.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the disk device configuration.
The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly introduced disk rate limiter configurations to
the Cloud Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
This is the disk counterpart of what was introduced for the network
as part of the previous commits in this series.
The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
control the disk I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the DiskRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.
The logic here is quite similar to the one already present in the
Firecracker driver, with the main differences being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiters.
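
In rough terms the mapping looks like the sketch below. TokenBucket
and RateLimiterConfig are simplified stand-ins for the generated
Cloud Hypervisor client types, and the helper name is made up:

  // TokenBucket mirrors the shape Cloud Hypervisor expects: a bucket
  // size, an optional one-time burst, and a refill time in msecs.
  type TokenBucket struct {
      Size         int64
      OneTimeBurst int64
      RefillTime   int64
  }

  // RateLimiterConfig holds one bucket for bandwidth and one for
  // operations; a single config covers inbound and outbound queues.
  type RateLimiterConfig struct {
      Bandwidth *TokenBucket
      Ops       *TokenBucket
  }

  func netRateLimiter(bwMaxRate, bwBurst, opsMaxRate, opsBurst int64) *RateLimiterConfig {
      const refillTime = 1000 // msecs, the fixed value Firecracker's driver uses
      rl := &RateLimiterConfig{}
      if bwMaxRate > 0 {
          rl.Bandwidth = &TokenBucket{
              Size:         bwMaxRate / 8, // config is bits/sec, the bucket is in bytes
              OneTimeBurst: bwBurst / 8,
              RefillTime:   refillTime,
          }
      }
      if opsMaxRate > 0 {
          rl.Ops = &TokenBucket{
              Size:         opsMaxRate,
              OneTimeBurst: opsBurst,
              RefillTime:   refillTime,
          }
      }
      if rl.Bandwidth == nil && rl.Ops == nil {
          return nil
      }
      return rl
  }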
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's driver doesn't expose the rate limiter's RefillTime
option to the user. Instead, it uses a constant value of 1000
milliseconds (1 second).
As we're following Firecracker's driver implementation, let's create
a new constant, use it as part of the Firecracker driver, and later
on re-use it as part of the Cloud Hypervisor driver.
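
Something along these lines (the constant name is an assumption):

  // defaultRefillTime is the token bucket refill time, in msecs,
  // that Firecracker's driver has always used; it is now shared so
  // the Cloud Hypervisor driver can reuse the same value.
  const defaultRefillTime = 1000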
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function
will be reused by Cloud Hypervisor when adding the rate limiter logic
there.
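
For context, the helper converts a 1000-based (SI) byte count into
its 1024-based equivalent; a sketch following the shape of the
Firecracker implementation:

  // RevertBytes converts a 1000-based (SI) byte count into its
  // 1024-based equivalent, e.g. 1000000 -> 1048576.
  func RevertBytes(num uint64) uint64 {
      a := num / 1000
      b := num % 1000
      if a == 0 {
          return num
      }
      return 1024*RevertBytes(a) + b
  }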
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Let's add the newly introduced network rate limiter configurations to
the Cloud Hypervisor's hypervisor configuration.
Right now those are not used anywhere, and there's absolutely no way the
users can set those up. That's coming later in this very same series.
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.
The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.
The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.
The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
the NetRateLimiterBwOneTimeBurst.
For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.
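
In Go terms the addition is just four new HypervisorConfig fields
(a sketch omitting the rest of the struct; the integer type is an
assumption):

  type HypervisorConfig struct {
      // ... existing fields, including RxRateLimiterMaxRate and
      // TxRateLimiterMaxRate, elided ...

      // NetRateLimiterBwMaxRate is the max bandwidth, in bits/sec,
      // shared by the inbound and outbound queues.
      NetRateLimiterBwMaxRate int64

      // NetRateLimiterBwOneTimeBurst is an *initial* bandwidth
      // burst, in bits/sec, which doesn't replenish.
      NetRateLimiterBwOneTimeBurst int64

      // NetRateLimiterOpsMaxRate is the operations per second
      // equivalent of NetRateLimiterBwMaxRate.
      NetRateLimiterOpsMaxRate int64

      // NetRateLimiterOpsOneTimeBurst is the operations per second
      // equivalent of NetRateLimiterBwOneTimeBurst.
      NetRateLimiterOpsOneTimeBurst int64
  }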
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
Currently EnableMockTesting() takes no arguments and will always place
the mock storage in the fixed location /tmp/vc/mockfs. This means that
one test run can interfere with the next one if anything isn't cleaned
up (and there are other bugs which mean that happens). Even if those
were fixed, the fixed location would still allow developers testing on
the same machine to interfere with each other.
So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting(). In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.
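
A minimal sketch of the new wiring (names illustrative; note a defer
would be useless here since os.Exit skips deferred calls):

  import (
      "os"
      "path/filepath"
      "testing"
  )

  func TestMain(m *testing.M) {
      dir, err := os.MkdirTemp("", "vc-tests-")
      if err != nil {
          panic(err)
      }
      // Place the mockfs under the suite's temporary directory so
      // one cleanup covers everything.
      EnableMockTesting(filepath.Join(dir, "mockfs"))
      ret := m.Run()
      os.RemoveAll(dir)
      os.Exit(ret)
  }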
Fixes: #4140
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs. This change allows it to be initialized at any path
given as a parameter. This allows the tests in fs_test.go to be
simplified: by using a temporary directory from t.TempDir(), which is
automatically cleaned up, we don't need to manually trigger
initTestDir() (which is misnamed; it's actually a cleanup function).
For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing. A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.
This patch moves the global, and the EnableMockTesting() function
which sets it, into mockfs.go. This is slightly cleaner to begin with,
and will allow some further enhancements.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant. We
duplicate this information in fc.go which also needs it.
Export it from fs.go instead, so it can be used in fc.go.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory. This
doesn't make sense: the mockfs driver isn't even in use here, since we
only call EnableMockTesting for the base virtcontainers package, not
the subpackages.
Instead use t.TempDir() which is for exactly this purpose. As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
There are several tests in mount_test.go which perform a sample bind
mount. These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint. Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.
In addition, the existing unmounts were done inconsistently - one was
simply inline (so wouldn't be executed if the test failed too early)
and one was a defer. Change them all to use the t.Cleanup mechanism.
For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test. That
stops the test being messed up by a previous run, but messily. Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.
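
The resulting pattern, roughly (a sketch using raw syscalls rather
than the test helpers; the test name is made up):

  import (
      "syscall"
      "testing"
  )

  func TestBindMountCleanup(t *testing.T) {
      src, dst := t.TempDir(), t.TempDir()
      if err := syscall.Mount(src, dst, "", syscall.MS_BIND, ""); err != nil {
          t.Fatal(err)
      }
      // t.Cleanup runs even when the test fails early, unlike an
      // inline unmount at the end of the function.
      t.Cleanup(func() {
          if err := syscall.Unmount(dst, 0); err != nil {
              t.Errorf("unmount %s: %v", dst, err)
          }
      })
      // ... test body ...
  }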
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log. Currently we don't clean up those logs
when the tests are done. Use a test cleanup function to do this.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
SetupOCIConfigFile creates a temporary directory with os.MkdirTemp().
This means the callers need to register a deferred function to remove
it again. At least one of them was commented out, meaning that a
/tmp/katatest- directory was left over after the unit tests ran.
Change to using t.TempDir(), which, as well as better matching other
parts of the tests, means the testing framework will handle cleaning
it up.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount. This is not cleaned up after the test. Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.
Use the built-in t.TempDir() to use an automatically named and deleted
temporary directory instead.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This is to fix a test failure for the
kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job
Fixes: #4088
Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
kata-monitor allows retrieving profiling data from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires the sandbox id to be passed via a query string
in the url.
The profiling index page proxied by kata-monitor contains links to all
the available data profiles. None of those links, however, carry the
sandbox id included in the request: the links are thus broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.
Let's add the sandbox id on-the-fly to each href tag returned by the
kata shim index page before serving the proxied page.
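
The rewrite itself can be as simple as the hypothetical sketch below
(the real proxy code may well differ):

  import (
      "fmt"
      "regexp"
  )

  // patchHrefs re-attaches the sandbox id to every link in the
  // proxied pprof index page so the links stay routable through
  // kata-monitor.
  var hrefRe = regexp.MustCompile(`href="([^"?]+)"`)

  func patchHrefs(page []byte, sandboxID string) []byte {
      repl := fmt.Sprintf(`href="${1}?sandbox=%s"`, sandboxID)
      return hrefRe.ReplaceAll(page, []byte(repl))
  }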
Fixes: #4054
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
The scanner reads nothing from virtiofsd's stderr pipe, because the
'--syslog' parameter redirects stderr to syslog. So there is no need
to write scanner.Text() to the kata log.
Fixes: #4063
Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
The fsGroup will be specified by the fsGroup key in the direct-assign
mountinfo metadata field. This will be set when invoking the
kata-runtime binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also be provided in the
mountinfo metadata field.
Add two extra fields, FSGroup and FSGroupChangePolicy, to the Mount
construct for container mounts, which will be populated when creating
block devices by parsing out mountInfo.json.
And in handleDeviceBlockVolume of the kata-agent client, check whether
the mount's FSGroup is not nil, which indicates that an fsGroup change
is required in the guest, and if so provide the FSGroup field in the
protobuf to pass the value to the agent.
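
A sketch of the new Mount fields (the exact types are assumptions):

  type Mount struct {
      // ... existing fields elided ...

      // FSGroup is the group id the volume should be chowned to in
      // the guest after mounting; nil means no change is required.
      FSGroup *uint32

      // FSGroupChangePolicy states whether the ownership change is
      // always performed, or only when the root directory's group
      // id differs from FSGroup.
      FSGroupChangePolicy string
  }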
Fixes: #4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
This change adds two fields to the Storage pb:
* FSGroup, a group id that the runtime specifies to indicate to the
agent to perform a chown of the mounted volume to that group id after
mounting completes in the guest.
* FSGroupChangePolicy, a policy indicating whether to always perform
the group id ownership change, or only when the root directory's group
id does not match the desired group id.
These two fields will allow CSI plugins to indicate to Kata that,
after the block device is mounted in the guest, a group id ownership
change should be performed on that volume.
Fixes: #4018
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
Add some links to rendered webpages for a better user experience,
so users can jump to pages just by clicking links in their browsers.
Fixes: #4061
Signed-off-by: bin <bin@hyper.sh>
virtiofsd's debug log is enabled whenever the hypervisor's debug is
enabled, which generates too many noisy logs from virtiofsd.
Unbind the log level of virtiofsd from that of the hypervisor; users
who want to see virtiofsd's debug log can set it with:
virtio_fs_extra_args = ["-o", "log_level=debug"]
Fixes: #3303
Signed-off-by: bin <bin@hyper.sh>
The vhost-user-blk device can be hotplugged on the PCI bridge
successfully on x86, but fails on Arm. However, hotplugging it on a
Root Port as a PCIe device works well on Arm.
Enabling "pcie_root_port" in configuration.toml is needed.
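
For example (the value, i.e. the number of root ports to reserve, is
purely illustrative):

  pcie_root_port = 2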
Fixes: #4019
Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
This configuration option is valid for all the hypervisors that are
going to be used with the confidential containers effort, thus we
expose the configuration option for Cloud Hypervisor as well.
Fixes: #4022
Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.
This commit also updates the unit test advice to use `T.TempDir` to
create temporary directories in tests.
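
A minimal sketch of the pattern this enables:

  import "testing"

  func TestExample(t *testing.T) {
      // Previously tests did something like:
      //   dir, err := os.MkdirTemp("", "test")
      //   assert.NoError(t, err)
      //   defer os.RemoveAll(dir)
      // With T.TempDir the framework creates the directory and
      // removes it after the test and its subtests complete.
      dir := t.TempDir()
      _ = dir
  }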
Fixes: #3924
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:
[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
-listen-address string
The address to listen on for HTTP requests. (default ":8090")
-log-level string
Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
-runtime-endpoint string
Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")
The golang flag package takes care of adding the defaults when
printing usage. Remove the explicit print of the default value so that
it is not printed on screen twice.
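
The fix amounts to dropping the hand-written default from the usage
string; a sketch (not the exact kata-monitor code):

  import "flag"

  // The flag package appends `(default "...")` on its own, so the
  // usage string must not repeat it.
  var runtimeEndpoint = flag.String("runtime-endpoint",
      "/run/containerd/containerd.sock",
      "Endpoint of CRI container runtime service.")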
Fixes: #3998
Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
getOOMEvents is a long-waiting call which retries when it fails.
For cases of agent shutdown, the retry should stop.
When the runtime hasn't yet detected that the agent has died, we can
also check whether the error is "ttrpc: closed".
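
A hypothetical helper capturing the check (the surrounding retry loop
is elided):

  import "strings"

  // shouldStopRetrying reports whether the getOOMEvents retry loop
  // should give up: either we already know the agent died, or the
  // ttrpc connection to it has been closed underneath us.
  func shouldStopRetrying(agentDead bool, err error) bool {
      return agentDead ||
          (err != nil && strings.Contains(err.Error(), "ttrpc: closed"))
  }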
Fixes: #3815
Signed-off-by: bin <bin@hyper.sh>