Commit Graph

1387 Commits

Author SHA1 Message Date
GabyCT
12f0ab120a
Merge pull request #4191 from dgibson/go-test-script
Improve Go unit test script
2022-05-17 10:27:04 -05:00
Rafael Fonseca
8052fe62fa runtime: do not check for EOF error in console watcher
The documentation of the bufio package explicitly says

"Err returns the first non-EOF error that was encountered by the
Scanner."

When io.EOF happens, `Err()` will return `nil` and `Scan()` will return
`false`.

Fixes #4079

Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
2022-05-17 15:14:33 +02:00
Snir Sheriber
c67b9d2975 qemu: allow using legacy serial device for the console
This allows to get guest early boot logs which are usually
missed when virtconsole is used.
- It utilizes previous work on the govmm side:
https://github.com/kata-containers/govmm/pull/203
- unit test added

Fixes: #4237
Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:06:11 +03:00
Snir Sheriber
44814dce19 qemu: treat console kernel params within appendConsole
as it is tightly coupled with the appended console device
additionally have it tested

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
2022-05-17 12:05:31 +03:00
Fabiano Fidêncio
c39852e83f runtime: Use ${LIBEXEC}/virtiofsd as the default virtiofsd path
As now we build and ship the rust version of virtiofsd, which is not
tied to QEMU, we need to update its default location to match with where
we're installing this binary.

Fixes: #4249

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-16 09:30:24 +02:00
David Gibson
e73b70baff runtime: Don't run unit tests verbose by default
go-test.sh by default adds the -v option to 'go test' meaning that output
will be printed from all the passing tests as well as any failing ones.
This results in a lot of output in which it's often difficult to locate the
failing tests you're interested in.

So, remove -v from the default flags.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:22:31 +10:00
David Gibson
f24a6e761f runtime: Consolidate flags setting in unit tests script
One of the responsibilities of the go-test.sh script is setting up the
default flags for 'go test'.  This is constructed across several different
places in the script using several unneeded intermediate variables though.

Consolidate all the flag construction into one place.

fixes #4190

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:22:29 +10:00
David Gibson
cf465feb02 runtime: Don't change test behaviour based on $CI or $KATA_DEV_MODE
go-test.sh changes behaviour based on both the $CI and $KATA_DEV_MODE
variables, but not in a way that makes a lot of sense.

If either one is set it uses the test_coverage path, instead of the
test_local path.  That collects coverage information, as the name
suggests, but it also means it runs the tests twice as root and
non-root, which is very non-obvious.

It's not clear what use case the test_local path is for at all.
Developer local builds will typically have $KATA_DEV_MODE set and CI
builds will have $CI set.  There's essentially no downside to running
coverage all the time - it has little impact on the test runtime.

In addition, if *both* $CI and $KATA_DEV_MODE are set, the script
refuses to run things as root, considering it "unsafe".  While having
both set might be unwise in a general sense, there's not really any
way running sudo can be any more unsafe than it is with either one
set.

So, simplify everything by just always running the test_coverage path.
This leaves the test_local path unused, so we can remove it entirely.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
34c4ac599c runtime: Remove redundant subcommands from go-test.sh
go-test.sh accepts subcommands, however invoking it in the usual way via
the Makefile doesn't use them.  In fact the only remaining subcommand is
"help" and we already have another way of getting the usage information
(-h or --help).  We don't need a second way, so just drop subcommand
handling.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
0aff5aaa39 runtime: Simplify package listing in go-test.sh
go-test.sh defaults to testing all the packages listed by go list, except
for a number filtered out.  It turns out that none of those filters are
necessary any more:
  * We've long required a Go newer than 1.9 which means the vendor filter
    isn't needed
  * The agent filter doesn't do anything now that we've moved to the Kata
    2.x unified repo
  * The tests filters don't hit anything on the list of modules in
    src/runtime (which is the only user of the script)

But since we don't need to filter anything out any more, we don't even need
to iterate through a list ourselves.  We can simply pass "./..." directly
to go test and it will iterate through all the sub-packages itself.

Interestingly this more than doubles the speed of "make test" for me - I
suspect because go test's internal paralellism works better over a larger
pool of tests.

This also lets us remove handling of non-existent coverage files from
test_go_package(), since with default options we will no longer test packages without tests
by default.  If the user explicitly requests testing of a package with no
tests, then failing makes sense.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
557c4cfd00 runtime: Don't chmod coverage files in Go tests
The go-test.sh script has an explicit chmod command, run as root, to
set the mode of the temporary coverage files to 0644.  AFAICT the
point of this is specifically the 004 bit allowing world read access,
so that we can then merge the temporary coverage file into the main
coverage file.

That's a convoluted way of doing things.  Instead we can just run the tail
command which reads the temporary file as the same user that generated it.

In addition, go-test.sh became root to remove that temporary coverage
file.  This is not necessary, since deleting a regular file just requires
write access to the directory, not the file itself.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
04c8b52e04 runtime: Remove HTML coverage option from go-test.sh
The html-coverage option to this script doesn't really alter behaviour
it just does the same thing as normal coverage, then converts the
report to HTML.  That conversion is a single command, plus a chmod to
make the final output mode 0644.  That overrides any umask the user
has set, which doesn't seem like a policy decision this script should
be making.

Nothing in the kata-containers or tests repository uses this, so it doesn't
really make sense to keep this logic inside this script.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
7f76914422 runtime: Add coverage.txt.tmp to gitignore
In addition to coverage.txt, the go-test.sh script creates
coverage.txt.tmp files while running.  These are temporary and
certainly shouldn't be committed, so add them to the gitignore file.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
David Gibson
13c2577004 runtime: Move go testing script locally
The go unit tests for the runtime are invoked by the helper script
ci/go-test.sh.  Which calls the run_go_test() function in ci/lib.sh.  Which
calls into .ci/go-test.sh from the tests repository.

But.. the runtime is the only user of this script, and generally stuff for
unit tests (rather than functional or integration tests) lives in the main
repository, not the tests repository.

So, just move the actual script into src/runtime.  A change to remove it
from the tests repo will follow.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-05-13 13:14:37 +10:00
Zvonko Kaiser
2a1d394147 runtime: Adding the correct detection of mediated PCIe devices
Fixes #4212

Signed-off-by: Zvonko Kaiser <zkaiser@nvidia.com>
2022-05-09 00:57:06 -07:00
Fabiano Fidêncio
33a8b70558 clh: Rely on Cloud Hypervisor for generating the device ID
We're currently hitting a race condition on the Cloud Hypervisor's
driver code when quickly removing and adding a block device.

This happens because the device removal is an asynchronous operation,
and we currently do *not* monitor events coming from Cloud Hypervisor to
know when the device was actually removed.  Together with this, the
sandbox code doesn't know about that and when a new device is attached
it'll quickly assign what may be the very same ID to the new device,
leading to the Cloud Hypervisor's driver trying to hotplug a device with
the very same ID of the device that was not yet removed.

This is, in a nutshell, why the tests with Cloud Hypervisor and
devmapper have been failing every now and then.

The workaround taken to solve the issue is basically *not* passing down
the device ID to Cloud Hypervisor and simply letting Cloud Hypervisor
itself generate those, as Cloud Hypervisor does it in a manner that
avoids such conflicts.  With this addition we have then to keep a map of
the device ID and the Cloud Hypervisor's generated ID, so we can
properly remove the device.

This workaround will probably stay for a while, at least till someone
has enough cycles to implement a way to watch the device removal event
and then properly act on that.  Spoiler alert, this will be a complex
change that may not even be worth it considering the race can be avoided
with this commit.

Fixes: #4176

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-05-04 09:04:03 +02:00
Jianyong Wu
982c32358a
Merge pull request #4031 from Jaylyn-Ren/kata-spdk
Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
2022-04-29 12:16:38 +08:00
Fabiano Fidêncio
b6467ddd73 clh: Expose disk rate limiter config
With everything implemented, let's now expose the disk rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4139

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:29 +02:00
Fabiano Fidêncio
7580bb5a78 clh: Expose net rate limiter config
With everything implemented, let's now expose the net rate limiter
configuration options in the Cloud Hypervisor configuration file.

Fixes: #4017

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:28:13 +02:00
Fabiano Fidêncio
a88adabaae clh: Cloud Hypervisor has a built-in Rate Limiter
The notion of "built-in rate limiter" was added as part of
bd8658e362, and that commit considered
that only Firecracker had a built-in rate limiter, which I think was the
case when that was introduced (mid 2020).

Nowadays, however, Cloud Hypervisor takes advantage of the very same crate
used by Firecraker to do I/O throttling.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:56 +02:00
Fabiano Fidêncio
63c4da03a9 clh: Implement the Disk RateLimiter logic
Let's take advantage of the newly added DiskRateLimiter* options and
apply those to the network device configuration.

The logic here is identical to the one already present in the Network
part of Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:53 +02:00
Fabiano Fidêncio
511f7f822d config: Add DiskRateLimiter* to Cloud Hypervisor
Let's add the newly added disk rate limiter configurations to the Cloud
Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:15 +02:00
Fabiano Fidêncio
5b18575dfe hypervisor: Add disk bandwidth and operations rate limiters
This is the disk counterpart of the what was introduced for the network
as part of the previous commits in this series.

The newly added fields are:
* DiskRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* DiskRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* DiskRateLimiterOpsMaxRate, the operations per second equivalent of the
  DiskRateLimiterBwMaxRate.
* DiskRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the DiskRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:27:11 +02:00
Fabiano Fidêncio
1cf9469297 clh: Implement the Network RateLimiter logic
Let's take advantage of the newly added NetRateLimiter* options and
apply those to the network device configuration.

The logic here is quite similar to the one already present in the
Firecracker's driver, with the main difference being the single Inbound
/ Outbound MaxRate and the presence of both Bandwidth and Operations
rate limiter.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:26:38 +02:00
Fabiano Fidêncio
00a5b1bda9 utils: Define DefaultRateLimiterRefillTimeMilliSecs
Firecracker's driver doesn't expose the RefillTime option of the rate
limiter to the user.  Instead, it uses a contant value of 1000
miliseconds (1 second).

As we're following Firecracker's driver implementation, let's expose
create a new constant, use it as part of the Firecracker's driver, and
later on re-use it as part of the Cloud Hypervisor's driver.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
be1bb7e39f utils: Move FC's function to revert bytes to utils
Firecracker's revertBytes function, now called "RevertBytes", can be
exposed as part of the virtcontainers' utils file, as this function will
be reused by Cloud Hypervisor, when adding the rate limiter logic there.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
c9f6496d6d config: Add NetRateLimiter* to Cloud Hypervisor
Let's add the newly added network rate limiter configurations to the
Cloud Hypervisor's hypervisor configuration.

Right now those are not used anywhere, and there's absolutely no way the
users can set those up.  That's coming later in this very same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
Fabiano Fidêncio
2d35e6066d hypervisor: Add network bandwidth and operations rate limiters
In a similar way to what's already exposed as RxRateLimiterMaxRate and
TxRateLimiterMaxRate, let's add four new fields to the Hypervisor's
configuration.

The values added are related to bandwidth and operations rate limiters,
which have to be added so we can expose I/O throttling configurations to
users using Cloud Hypervisor as their preferred VMM.

The reason we cannot simply re-use {Rx,Tx}RateLimiterMaxRate is because
Cloud Hypervisor exposes a single MaxRate to be used for both inbound
and outbound queues.

The newly added fields are:
* NetRateLimiterBwMaxRate, defined in bits per second, which is used to
  control the network I/O bandwidth at the VM level.
* NetRateLimiterBwOneTimeBurst, also defined in bits per second, which
  is used to define an *initial* max rate, which doesn't replenish.
* NetRateLimiterOpsMaxRate, the operations per second equivalent of the
  NetRateLimiterBwMaxRate.
* NetRateLimiterOpsOneTimeBurst, the operations per second equivalent of
  the NetRateLimiterBwOneTimeBurst.

For now those extra fields have only been added to the hypervisor's
configuration and they'll be used in the coming patches of this very
same series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-28 10:22:42 +02:00
David Gibson
1b931f4203 runtime: Allock mockfs storage to be placed in any directory
Currently EnableMockTesting() takes no arguments and will always place the
mock storage in the fixed location /tmp/vc/mockfs.  This means that one
test run can interfere with the next one if anything isn't cleaned up
(and there are other bugs which means that happens).  If if those were
fixed this would allow developers testing on the same machine to interfere
with each other.

So, allow the mockfs to be placed at an arbitrary place given as a
parameter to EnableMockTesting().  In TestMain() we place it under our
existing temporary directory, so we don't need any additional cleanup just
for the mockfs.

fixes #4140

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:47:59 +10:00
David Gibson
ef6d54a781 runtime: Let MockFSInit create a mock fs driver at any path
Currently MockFSInit always creates the mockfs at the fixed path
/tmp/vc/mockfs.  This change allows it to be initialized at any path
given as a parameter.  This allows the tests in fs_test.go to be
simplified, because the by using a temporary directory from
t.TempDir(), which is automatically cleaned up, we don't need to
manually trigger initTestDir() (which is misnamed, it's actually a
cleanup function).

For now we still use the fixed path when auto-creating the mockfs in
MockAutoInit(), but we'll change that later.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
5d8438e939 runtime: Move mockfs control global into mockfs.go
virtcontainers/persist/fs/mockfs.go defines a mock filesystem type for
testing.  A global variable in virtcontainers/persist/manager.go is used to
force use of the mock fs rather than a normal one.

This patch moves the global, and the EnableMockTesting() function which
sets it into mockfs.go.  This is slightly cleaner to begin with, and will
allow some further enhancements.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
963d03ea8a runtime: Export StoragePathSuffix
storagePathSuffix defines the file path suffix - "vc" - used for
Kata's persistent storage information, as a private constant.  We
duplicate this information in fc.go which also needs it.

Export it from fs.go instead, so it can be used in fc.go.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
1719a8b491 runtime: Don't abuse MockStorageRootPath() for factory tests
A number of unit tests under virtcontainers/factory use
MockStorageRootPath() as a general purpose temporary directory.  This
doesn't make sense: the mockfs driver isn't even in use here since we only
call EnableMockTesting for the pase virtcontainers package, not the
subpackages.

Instead use t.TempDir() which is for exactly this purpose.  As a bonus it
also handles the cleanup, so we don't need MockStorageDestroy any more.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:23:36 +10:00
David Gibson
bec59f9e39 runtime: Make bind mount tests better clean up after themselves
There are several tests in mount_test.go which perform a sample bind
mount.  These need a corresponding unmount to clean up afterwards or
attempting to delete the temporary files will fail due to the existing
mountpoint.  Most of them had such an unmount, but
TestBindMountInvalidPgtypes was missing one.

In addition, the existing unmounts where done inconsistently - one was
simply inline (so wouldn't be executed if the test fails too early) and one
is a defer.  Change them all to use the t.Cleanup mechanism.

For the dummy mountpoint files, rather than cleaning them up after the
test, the tests were removing them at the beginning of the test.  That
stops the test being messed up by a previous run, but messily.  Since
these are created in a private temporary directory anyway, if there's
something already there, that indicates a problem we shouldn't ignore.
In fact we don't need to explicitly remove these at all - they'll be
removed along with the rest of the private temporary directory.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:20:35 +10:00
David Gibson
f7ba21c86f runtime: Clean up mock hook logs in tests
The tests in hook_test.go run a mock hook binary, which does some debug
logging to /tmp/mock_hook.log.  Currently we don't clean up those logs
when the tests are done.  Use a test cleanup function to do this.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
David Gibson
90b2f5b776 runtime: Make SetupOCIConfigFile clean up after itself
SetupOCIConfigFile creates a temporary directory with os.MkDirTemp().  This
means the callers need to register a deferred function to remove it again.
At least one of them was commented out meaning that a /temp/katatest-
directory was leftover after the unit tests ran.

Change to using t.TempDir() which as well as better matching other parts of
the tests means the testing framework will handle cleaning it up.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
David Gibson
2eeb5dc223 runtime: Don't use fixed /tmp/mountPoint path
Several tests in kata_agent_test.go create /tmp/mountPoint as a dummy
directory to mount.  This is not cleaned up after the test.  Although it
is in /tmp, that's still a little messy and can be confusing to a user.
In addition, because it uses the same name every time, it allows for one
run of the test to interfere with the next.

Use the built in t.TempDir() to use an automatically named and deleted
temporary directory instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-04-22 14:14:52 +10:00
Archana Shinde
33e244f284
Merge pull request #4102 from likebreath/0414/clh_v23.0
Upgrade to Cloud Hypervisor v23.0
2022-04-19 06:01:04 -07:00
Chelsea Mafrica
0af13b469d
Merge pull request #4086 from BbolroC/s390x-fix
test: Fix golangci-lint error for s390x
2022-04-15 21:07:09 -07:00
Bin Liu
b19bfac7cd
Merge pull request #4042 from yibozhuang/direct-assign-fsgroup
fsGroup support for direct-assigned volume
2022-04-16 10:23:15 +08:00
Bin Liu
4ec1967542
Merge pull request #4094 from fgiudici/kata-monitor_readme
kata-monitor: add the README file
2022-04-16 08:27:22 +08:00
Bin Liu
362201605e
Merge pull request #4055 from fgiudici/kata-monitor_pprof
kata-monitor: update the hrefs in the debug/pprof index page
2022-04-16 08:12:18 +08:00
Francesco Giudici
7b2ff02647 kata-monitor: add a README file
Fixes: #3704

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-04-15 18:03:23 +02:00
Bo Chen
29e569aa92 virtcontainers: clh: Re-generate the client code
This patch re-generates the client code for Cloud Hypervisor v23.0.
Note: The client code of cloud-hypervisor's (CLH) OpenAPI is
automatically generated by openapi-generator [1-2].

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Signed-off-by: Bo Chen <chen.bo@intel.com>
2022-04-14 12:56:01 -07:00
Chelsea Mafrica
32f92e75cc
Merge pull request #4021 from fengwang666/direct-volume-bug
runtime: Base64 encode the direct volume mountInfo path
2022-04-13 13:15:38 -07:00
Greg Kurz
4443bb68a4
Merge pull request #4064 from tiezhuoyu/4063/no-need-to-write-error-of-virtiofsd-to-kata-log
runtime: no need to write virtiofsd error to log
2022-04-13 11:59:19 +02:00
Hyounggyu Choi
d136c9c240 test: Fix golangci-lint error for s390x
This is to fix a test failure for the
kata-containers-2.0-ubuntu-20.04-s390x-main-baseline jenkins job

Fixes: #4088

Signed-off-by: Hyounggyu Choi <Hyounggyu.Choi@ibm.com>
2022-04-13 09:20:51 +02:00
Fupan Li
66aa07649b
Merge pull request #4062 from liubin/fix/4061-add-links-for-kata-monitor
kata-monitor: add some links when generating pages for browsers
2022-04-13 11:30:21 +08:00
Francesco Giudici
86977ff780 kata-monitor: update the hrefs in the debug/pprof index page
kata-monitor allows to get data profiles from the kata shim
instances running on the same node by acting as a proxy
(e.g., http://$NODE_ADDRESS:8090/debug/pprof/?sandbox=$MYSANDBOXID).
In order to proxy the requests and the responses to the right shim,
kata-monitor requires to pass the sandbox id via a query string in the
url.

The profiling index page proxied by kata-monitor contains the link to all
the data profiles available. All the links anyway do not contain the
sandbox id included in the request: the links result then broken when
accessed through kata-monitor.
This happens because the profiling index page comes from the kata shim,
which will not include the query string provided in the http request.

Let's add on-the-fly the sandbox id in each href tag returned by the kata
shim index page before providing the proxied page.

Fixes: #4054

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-04-12 15:53:59 +02:00
Zhuoyu Tie
6e79042aa0 runtime: no need to write virtiofsd error to log
The scanner reads nothing from viriofsd stderr pipe, because param
'--syslog' rediercts stderr to syslog. So there is no need to write
scanner.Text() to kata log

Fixes: #4063

Signed-off-by: Zhuoyu Tie <tiezhuoyu@outlook.com>
2022-04-12 15:59:57 +08:00
Yibo Zhuang
532d53977e runtime: fsGroup support for direct-assigned volume
The fsGroup will be specified by the fsGroup key in
the direct-assign mountinfo metadate field.
This will be set when invoking the kata-runtime
binary and providing the key, value pair in the metadata
field. Similarly, the fsGroupChangePolicy will also
be provided in the mountinfo metadate field.

Adding an extra fields FsGroup and FSGroupChangePolicy
in the Mount construct for container mount which will
be populated when creating block devices by parsing
out the mountInfo.json.

And in handleDeviceBlockVolume of the kata-agent client,
it checks if the mount FSGroup is not nil, which
indicates that fsGroup change is required in the guest,
and will provide the FSGroup field in the protobuf to
pass the value to the agent.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
Yibo Zhuang
6a47b82c81 proto: fsGroup support for direct-assigned volume
This change adds two fields to the Storage pb

FSGroup which is a group id that the runtime
specifies to indicate to the agent to perform a
chown of the mounted volume to the specified
group id after mounting is complete in the guest.

FSGroupChangePolicy which is a policy to indicate
whether to always perform the group id ownership
change or only if the root directory group id
does not match with the desired group id.

These two fields will allow CSI plugins to indicate
to Kata that after the block device is mounted in
the guest, group id ownership change should be performed
on that volume.

Fixes #4018

Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
2022-04-11 08:41:13 -07:00
bin
f8cc5d1ad8 kata-monitor: add some links when generating pages for browsers
Add some links to rendered webpages for better user experience,
let users can jump to pages only by clicking links in browsers.

Fixes: #4061

Signed-off-by: bin <bin@hyper.sh>
2022-04-11 09:29:56 +08:00
bin
9d5b03a1b7 runtime: delete debug option in virtiofsd
virtiofsd's debug will be enabled if hypervisor's debug has been
enabled, this will generate too many noisy logs from virtiofsd.

Unbind the relationship of log level between virtiofsd and
hypervisor, if users want to see debug log of virtiofsd,
can set it by:

  virtio_fs_extra_args = ["-o", "log_level=debug"]

Fixes: #3303

Signed-off-by: bin <bin@hyper.sh>
2022-04-07 19:55:22 +08:00
Greg Kurz
d0d3787233
Merge pull request #3696 from shippomx/main
kata-runtime enable hugepage support
2022-04-06 16:47:04 +02:00
Jaylyn Ren
b975f2e8d2 Virtcontainers: Enable hot plugging vhost-user-blk device on ARM
The vhost-user-blk can be hotplugged on the PCI bridge successfully on
X86, but failed on Arm. However, hotplugging it on Root Port as a PCIe
device can work well on ARM.
Open the "pcie_root_port" in configuration.toml is needed.

Fixes: #4019

Signed-off-by: Jaylyn Ren <jaylyn.ren@arm.com>
2022-04-06 17:37:51 +08:00
Fabiano Fidêncio
b39caf43f1
Merge pull request #3923 from Jakob-Naucke/no-initrd-se
runtime: Allow and require no initrd for SE
2022-04-05 09:26:07 +02:00
Feng Wang
354cd3b9b6 runtime: Base64 encode the direct volume mountInfo path
This is to avoid accidentally deleting multiple volumes.

Fixes #4020

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-04-04 19:56:46 -07:00
Archana Shinde
e62bc8e7f3
Merge pull request #3915 from Juneezee/test/t.TempDir
test: use `T.TempDir` to create temporary test directory
2022-04-04 01:34:46 -07:00
Fabiano Fidêncio
98750d792b clh: Expose service offload configuration
This configuration option is valid for all the hypervisor that are going
to be used with the confidential containers effort, thus exposing the
configuration option for Cloud Hypervisor as well.

Fixes: #4022

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-04-01 11:15:55 +02:00
Eng Zer Jun
59c7165ee1
test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

This commit also updates the unit test advice to use `T.TempDir` to
create temporary directory in tests.

Fixes: #3924

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-03-31 09:31:36 +08:00
snir911
18dc578134
Merge pull request #3999 from fgiudici/kata-monitor_fix_help
kata-monitor: fix duplicated output when printing usage
2022-03-30 18:56:59 +03:00
Francesco Giudici
a63bbf9793 kata-monitor: fix duplicated output when printing usage
(default: "/run/containerd/containerd.sock") is duplicated when
printing kata-monitor usage:

[root@kubernetes ~]# kata-monitor --help
Usage of kata-monitor:
  -listen-address string
        The address to listen on for HTTP requests. (default ":8090")
  -log-level string
        Log level of logrus(trace/debug/info/warn/error/fatal/panic). (default "info")
  -runtime-endpoint string
        Endpoint of CRI container runtime service. (default: "/run/containerd/containerd.sock") (default "/run/containerd/containerd.sock")

the golang flag package takes care of adding the defaults when printing
usage. Remove the explicit print of the value so that it would not be
printed on screen twice.

Fixes: #3998

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-03-30 11:58:53 +02:00
bin
5e1c30d484 runtime: add logs around sandbox monitor
For debugging purposes, add some logs.

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:59:12 +08:00
bin
fb8be96194 runtime: stop getting OOM events when ttrpc: closed error
getOOMEvents is a long-waiting call, it will retry when failed.
For cases of agent shutdown, the retry should stop.

When the agent hasn't detected agent has died, we can also check
whether the error is "ttrpc: closed".

Fixes: #3815

Signed-off-by: bin <bin@hyper.sh>
2022-03-29 16:39:01 +08:00
Bin Liu
9495316145
Merge pull request #3962 from yaoyinnan/fix/3750-VirtioMem
runtime: Remove the explicit VirtioMem set and fix the comment
2022-03-29 10:20:05 +08:00
yaoyinnan
66f05c5bcb runtime: Remove the explicit VirtioMem set and fix the comment
Modify the 2Mib in the comment to 4Mib.
VirtioMem is set by configuration file or annotation. And setupVirtioMem is called only when VirtioMem is true.

Fixes: #3750

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-03-28 21:21:38 +08:00
Feng Wang
0928eb9f4e agent: Kill the all the container processes of the same cgroup
Otherwise the container process might leak and cause an unclean exit

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-27 10:06:58 -07:00
Jakob Naucke
ff17c756d2
runtime: Allow and require no initrd for SE
Previously, it was not permitted to have neither an initrd nor an image.
However, this is the exact config to use for Secure Execution, where the
initrd is part of the image to be specified as `-kernel`. Require the
configuration of no initrd for Secure Execution.

Also
- remove redundant code for image/initrd checking -- no need to check in
  `newQemuHypervisorConfig` (calling) when it is also checked in
  `getInitrdAndImage` (called)
- use `QemuCCWVirtio` constant when possible

Fixes: #3922
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-03-25 18:36:12 +01:00
Feng Wang
19f372b5f5 runtime: Add more debug logs for container io stream copy
This can help debugging container lifecycle issues

Fixes: #3913

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-24 21:35:16 -07:00
David Gibson
c77e34de33 runtime: Move mock hook source
src/runtime/virtcontainers/hook/mock contains a simple example hook in Go.
The only thing this is used for is for some tests in
src/runtime/pkg/katautils/hook_test.go.  It doesn't really have anything
to do with the rest of the virtcontainers package.

So, move it next to the test code that uses it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:35 +11:00
David Gibson
86723b51ae virtcontainers: Remove unused install/uninstall targets
We've now removed the need to install the mock hook binary for unit tests.
However, it turns out that managing that was the *only* thing that the
install and uninstall targets in the virtcontainers Makefile handled.

So, remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:37:18 +11:00
David Gibson
0e83c95fac virtcontainers: Run mock hook from build tree rather than system bin dir
Running unit tests should generally have minimal dependencies on
things outside the build tree.  It *definitely* shouldn't modify
system wide things outside the build tree.  Currently the runtime
"make test" target does so, though.

Several of the tests in src/runtime/pkg/katautils/hook_test.go require a
sample hook binary.  They expect this hook in
/usr/bin/virtcontainers/bin/test/hook, so the makefile, as root, installs
the test binary to that location.

Go tests automatically run within the package's directory though, so
there's no need to use a system wide path.  We can use a relative path to
the binary build within the tree just as easily.

fixes #3941

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-23 19:34:50 +11:00
David Gibson
e65db838ff virtcontainers: Remove VC_BIN_DIR
The VC_BIN_DIR variable in the virtcontainers Makefile is almost unused.
It's used to generate TEST_BIN_DIR, and it's created in the install target.
However, we also create TEST_BIN_DIR, which is a subdirectory of VC_BIN_DIR
with mkdir -p, so it will necessarily create VC_BIN_DIR along the way.

So we can drop the unnecessary mkdir and expand the definition of
VC_BIN_DIR in the definition of TEST_BIN_DIR.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:53:59 +11:00
David Gibson
c20ad2836c virtcontainers: Remove unused Makefile defines
The INSTALL_EXEC and UNINSTALL_EXEC definitions from the virtcontainers
Makefile (unlike those from the runtime Makefile in the parent directory)
are entirely unused.  Remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:40:57 +11:00
David Gibson
c776bdf4a8 virtcontainers: Remove unused parameter from go-test.sh
The check-go-test target passes the path to the mock hook test binary to
go-test.sh when it invokes it.  But go-test.sh just calls run_go_test from
ci/lib.sh, which invokes a script from the tests repo *without* any
parameters.

That is, this parameter is ignored anyway, so remove it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2022-03-22 16:39:22 +11:00
James O. D. Hunt
f8fb0d3bb6
Merge pull request #3322 from Kvasscn/kata_dev_block_driver_option
device: using const strings for block-driver option instead of hard coding
2022-03-21 10:56:25 +00:00
Miao Xia
a2f5c1768e runtime/virtcontainers: Pass the hugepages resources to agent
The hugepages resources claimed by containers should be limited
by cgroup in the guest OS.

Fixes: #3695

Signed-off-by: Miao Xia <xia.miao1@zte.com.cn>
2022-03-15 18:46:08 +08:00
Feng Wang
aa5ae6b17c runtime: Properly handle ESRCH error when signaling container
Currently kata shim v2 doesn't translate ESRCH signal, causing container
fail to stop and shim leak.

Fixes: #3874

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-14 11:03:05 -07:00
zhanghj
efa19c41eb device: use const strings for block-driver option instead of hard coding
Currently, the block driver option is specifed by hard coding, maybe it
is better to use const string variables instead of hard coded strings.
Another modification is to remove duplicate consts for virtio driver in
manager.go.

Fixes: #3321

Signed-off-by: Jason Zhang <zhanghj.lc@inspur.com>
2022-03-14 09:20:43 +08:00
Gabriela Cervantes
ffdf961ae9 docs: Update contact link in runtime README
This PR updates the contact link in the runtime README document.

Fixes #3854

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-03-08 16:27:34 +00:00
Bin Liu
deb8ce97a8
Merge pull request #3836 from liubin/fix/minor-fix
Enhancement: fix comments/logs and delete not used function
2022-03-07 17:26:30 +08:00
bin
1b34494b2f runtime: fix invalid comments for pkg/resourcecontrol
Some comments are copied and not adjusted to the
pkg/resourcecontrol package.

Fixes: #3835

Signed-off-by: bin <bin@hyper.sh>
2022-03-05 10:32:31 +08:00
Evan Foster
afc567a9ae storage: make k8s emptyDir creation configurable
This change introduces the `disable_guest_empty_dir` config option,
which allows the user to change whether a Kubernetes emptyDir volume is
created on the guest (the default, for performance reasons), or the host
(necessary if you want to pass data from the host to a guest via an
emptyDir).

Fixes #2053

Signed-off-by: Evan Foster <efoster@adobe.com>
2022-03-04 12:02:42 -08:00
Eric Ernst
1e301482e7
Merge pull request #3406 from fengwang666/direct-blk-assignment
Implement direct-assigned volume
2022-03-04 11:58:37 -08:00
Feng Wang
e76519af83 runtime: small refactor to improve readability
Remove some confusing/duplicate code so it's more readable

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-04 10:00:52 -08:00
Fabiano Fidêncio
7e5f11a52b vendor: Update containerd to 1.6.1
Let's bring in the latest release of Containerd, 1.6.1, released on
March 2nd, 2022.

With this, we take the opportunity to remove containerd/api reference as
we shouldn't need a separate module only for the API.

Here's the list of changes needed in the code due to the bump:
* stop using `grpc.WithInsecure()` as it's been deprecated
  - use `grpc.WithTransportCredentials(insecure.NewCredentials())`
    instead

Fixes: #3820

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-04 10:28:40 +01:00
Fabiano Fidêncio
2af91b23e1
Merge pull request #3281 from jongwu/vcpu_hotplug_arm64
experimentally enable vcpu hotplug and virtio-mem on arm64 in kernel part
2022-03-04 09:14:31 +01:00
Jianyong Wu
42771fa726 runtime: don't set socket and thread for arm/virt
As this is just a initial vcpu hotplug support, thread and socket has
not been supported. So, don't set socket and thread when hotadd cpu for
arm/virt.

Fixes: #3280
Signed-off-by: Jianyong Wu <jianyong.wu@arm.com>
2022-03-04 11:22:18 +08:00
Feng Wang
f905161bbb runtime: mount direct-assigned block device fs only once
Mount the direct-assigned block device fs only once and keep a refcount
in the guest. Also use the ro flag inside the options field to determine
whether the block device and filesystem should be mounted as ro

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
ea51ef1c40 runtime: forward the stat and resize requests from shimv2 to kata agent
Translate the volume path from host-known path to guest-known path
and forward the request to kata agent.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
c39281ad65 runtime: update container creation to work with direct assigned volumes
During the container creation, it will parse the mount info file
of the direct assigned volumes and update the in memory mount object.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
4e00c2377c agent: add grpc interface for stat and resize operations
Add GetVolumeStats and ResizeVolume APIs for the runtime to query stat
and resize fs in the guest.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:57:02 -08:00
Feng Wang
e9b5a25502 runtime: add stat and resize APIs to containerd-shim-v2
To query fs stats and resize fs, the requests need to be passed to
kata agent through containerd-shim-v2. So we're adding to rest APIs
on the shim management endpoint.
Also refactor shim management client to its own go file.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 18:56:53 -08:00
Feng Wang
6e0090abb5 runtime: persist direct volume mount info
In the direct assigned volume scenario, Kata Containers persists
the information required for managing the volume inside the guest
on host filesystem.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 15:32:12 -08:00
Feng Wang
fa326b4e0f runtime: augment kata-runtime CLI to support direct-assigned volume
Add commands to add, remove, resize and get stats of a direct-assigned volume.
These commands are expected to be consumed by CSI.

Fixes: #3454

Signed-off-by: Feng Wang <feng.wang@databricks.com>
2022-03-03 15:32:03 -08:00
Fabiano Fidêncio
a2422cf2a1
Merge pull request #3389 from zhsj/rm-distro-test
katatestutils: remove distro constraints
2022-03-03 23:26:58 +01:00
Fabiano Fidêncio
12af632952
Merge pull request #3814 from fidencio/wip/disable-block-device-use-minor-fixes
Minor fixes for the `disable_block_device_use` comments
2022-03-03 23:26:05 +01:00
Fabiano Fidêncio
af80473496 clh: stop virtofsd if clh fails to boot up the vm
If, for some reason, we're able to launch cloud hypervisor but not able
to boot the VM up, the virtiofsd process would be left behind.

Let's ensure, via defer, that we stop virtiofsd in case of errors.

Fixes: #3819

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 19:10:37 +01:00
Fabiano Fidêncio
c54bc8e657
Merge pull request #3811 from fidencio/wip/clh-tdx-round-2
clh: tdx: Don't use sharedFS with Confidential Guests
2022-03-03 19:03:28 +01:00
Fabiano Fidêncio
97951a2d12 clh: Don't use SharedFS with Confidential Guests
kata-containers/pulls#3771 added TDX support for Cloud Hypervisor, but
two big things got overlooked while doing that.

1. virtio-fs, as of now, cannot be part of the trust boundary, so the
   Confidential Guest will not be using it.

2. virtio-block hotplug should be enabled in order to use virtio-block
   for the rootfs (used with the devmapper plugin).

When trying to use cloud-hypervisor with TDX using virtio-fs, we're
facing the following error on the guest kernel:
```
virtiofs virtio2: device must provide VIRTIO_F_ACCESS_PLATFORM
```

After checking and double-checking with virtiofs and cloud-hypervisor
developers, it happens as confidential containers might put some
limitations on the device, so it can't access all of the guests' memory
and that's where this restriction seems to be coming from. Vivek
mentioned that virtiofsd do not support VIRTIO_F_ACCESS_PLATFORM (aka
VIRTIO_F_IOMMU_PLATFORM) yet, and that for ecrypted guests virtiofs may
not be the best solution at the moment.

@sboeuf put this in a very nice way: "if the virtio-fs driver doesn't
support VIRTIO_F_ACCESS_PLATFORM, then the pages corresponding to the
virtqueues and the buffers won't be marked as SHARED, meaning the VMM
won't have access to it".

Interestingly enough, it works with QEMU, and it may be due to some
change done on the patched QEMU that @devimc is packaging, but we won't
take the path to figure out what was the change and patch
cloud-hypervisor on the same way, because of 1.

Fixes: #3810

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:40 +01:00
Fabiano Fidêncio
c30b3a9ff1 clh: Adding a volume is not supported without SharedFS
As mounting volumes into the guest requires SharedFS setup, let's ensure
we error out if trying to do so in a situation where SharedFS is not
supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:30 +01:00
Fabiano Fidêncio
f889f1f957 clh: introduce supportsSharedFS()
supportsSharedFS() is a new method to be used to ensure that no SharedFS
specifics are called when, for a reason or another, Cloud Hypervisor is
in a mode where SharedFSs are not supported.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:49:28 +01:00
Fabiano Fidêncio
54d27ed721 clh: introduce loadVirtiofsDaemon()
Similarly to the `createVirtiofsDaemon` and `stopVirtiofsDaemon` methos,
let's introduce and use loadVirtiofsDaemon, at it'll also be handy later
in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:38 +01:00
Fabiano Fidêncio
ae2221ea68 clh: introduce stopVirtiofsDaemon()
Similary to the `createVirtiofsDaemon` method, let's introduce and use
its counterpart, as it'll also be handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:26 +01:00
Fabiano Fidêncio
e8bc26f90d clh: introduce setupVirtiofsDaemon()
Similarly to what's been done with the `createVirtiofsDaemon`, let's
create a `setupVirtiofsDaemon` one.

It will also become handy later in this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:14 +01:00
Fabiano Fidêncio
413b3b477a clh: introduce createVirtiofsDaemon()
Let's introduce and use a new `createVirtiofsDaemon` method.  Its name
says it all, and it'll be handy later in this series when, spoiler
alert, SharedFS cannot be used (in such cases as in Confidential
Guests).

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 12:48:02 +01:00
James O. D. Hunt
55cd0c89d8 runtime: Build golang components with extra security options
Enable stack protector and fortify source for golang builds.

Fixes: #3817.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-03 10:41:26 +00:00
Fabiano Fidêncio
76e4f6a2a3 Revert "hypervisors: Confidential Guests do not support Device hotplug"
This reverts commit df8ffecde0, as device
hotplug *is* supported and, more than that, is very much needed when
using virtio-blk instead of virtio-fs.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-03 09:59:55 +01:00
Fabiano Fidêncio
fa8b93927c config: qemu: Fix disable_block_device_use comments
virtio-fs, instead of virtio-9p, is the default shared file system type
in case virtio-blk is not used.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:36 +01:00
Fabiano Fidêncio
9615c8bc9c config: fc: Don't expose disable_block_device_use
Relying on virtio-block is the *only* way to use Firecracker with Kata
Containers, as shared FS (virtio-{fs,fs-nydus,9p}) is not supported by
Firecracker.

As configuration doesn't make sense to be exposed, we hardcode the
`false` value in the Firecracker configuration structure.

Fixes: #3813

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-03-02 20:43:28 +01:00
Bin Liu
2ae8bd696a
Merge pull request #3367 from wfly1998/main
build: always reset ARCH after getting it
2022-03-02 14:42:45 +08:00
Bin Liu
75877f8793
Merge pull request #3187 from Kvasscn/kata_dev_remove_temp_vsock_dir
virtcontainers: remove temp dir created for vsock in test code
2022-03-02 11:05:47 +08:00
Francesco Giudici
7f638dd049
Merge pull request #3764 from Jakob-Naucke/hugepages-test-s390x
virtcontainers: Use available s390x hugepages
2022-03-01 14:33:59 +01:00
Fabiano Fidêncio
4ab35b0899
Merge pull request #3796 from jodh-intel/fix-monitor-listen-address
Fix monitor listen address
2022-03-01 13:51:01 +01:00
Fabiano Fidêncio
97c17085b0
Merge pull request #3770 from Jakob-Naucke/gofmt-vmm-s390x
runtime: Gofmt fixes
2022-03-01 11:34:15 +01:00
James O. D. Hunt
e64c54a2ad monitor: Listen to localhost only by default
Change `kata-monitor` to listen to port `8090` on the local interface
only by default.

> **Note:**
>
> This is a breaking change as previously it listened on all interfaces.

Fixes: #3795.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-01 10:00:43 +00:00
James O. D. Hunt
e6350d3d45 monitor: Fix build options
Removed redundant and duplicated build options to build
`kata-monitor` the same way as the other components:

- `CGO_ENABLED=0` is not necessary.
- `-buildmode=exe` is not necessary since `BUILDFLAGS` already sets the
  build mode.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-03-01 10:00:43 +00:00
GabyCT
ccb063b848
Merge pull request #3788 from fidencio/wip/update-clh-confidential-guest-comments
Update `confidential_guest` comments
2022-02-28 15:11:01 -06:00
GabyCT
bc1733bb0e
Merge pull request #3774 from egernst/delinux-runtime
cleanup runtime pkgs for Darwin build, add basic Darwin build/unit test
2022-02-28 15:08:09 -06:00
Jakob Naucke
eda8ea154a
runtime: Gofmt fixes
- Mostly blank lines after `+build` -- see
  https://pkg.go.dev/go/build@go1.14.15 -- this is, to date, enforced by
  `gofmt`.
- 1.17-style go:build directives are also added.
- Spaces in govmm/vmm_s390x.go

Fixes: #3769
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-28 17:24:47 +01:00
Eric Ernst
e355a71860 container: file is not linux specific
This should not be linux specific -- drop restriction.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
b31876eefb device-manager: move linux-only test to a linux-only file
We can't Mkdev on Darwin - let's make sure the vfio test is in a
linux-only file.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
6a5c634490 resourcecontrol: SystemdCgroup check is not necessarily linux specific
This utility function is also used to check the spec that will run in
the guest - no need for this to be linux specific.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
cc58cf6993 resourcecontrol: convert stats dev_t to unit64types
Their types may differ on various host OSes, but
unix.Major|Minor always takes a uint64

Depends-on: github.com/kata-containers/tests#4516
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Eric Ernst
5be188cc29 utils: Add darwin stub
Add a stub for utils_darwin to facilitate building this package on
Darwin. We can probably drop this empty stub if we have better
abstraction for the various parts of virtcontainers that call it
today...

Fixes:# 3777

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
ad0449195d virtcontainers: Convert stats dev_t to uint64
We need to convert them to uint64 as their types may differ on various
host OSes, but unix.Major|Minor takes a uint64 regardless.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
56751089c0 katautils: Use a syscall wrapper for the hook JSON state
There is no real equivalent of a thread ID on Darwin.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
7d64ae7a41 runtime: Add a syscall wrapper package
It allows to support syscall variations between host OSes.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Samuel Ortiz
abc681ca5f katautils: Add Darwin stub for the netNS API
And move the current implementation into a Linux only file.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-28 08:01:53 -08:00
Fabiano Fidêncio
de57466212 config: Expand confidential_guest comments
Let's clarify that an error will be reported in case confidential_guest
is enabled, but the hardware where Kata Containers is running doesn't
provide the required feature set.

Fixes: #3787

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 11:57:42 +01:00
Fabiano Fidêncio
641d475fa6 config: clh: Use "Intel TDX" instead of just "TDX"
Let's use "Intel TDX" rather than just "TDX", as it can ease the
understanding of the terminology.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:27:21 +01:00
Fabiano Fidêncio
0bafa2def9 config: clh: Mention supported TEEs
Let's mention the supported TEEs to be used with confidential guests.

Right now, Cloud Hyperisor supports only Intel TDX, used together with
TD Shim.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-28 10:24:33 +01:00
bin
81ed269ed2 runtime: use Cmd.StdoutPipe instead of self-created pipe
Nydusd uses a bufio.Scanner to check if nydusd process has
existed, but stderr/stdout passed to Cmd is self-created pipe,
this pipe will not be closed if the process start failing.

Use standard Cmd.StdoutPipe can close the stdout and kata shim
will detect the existence of the nydusd process, then call cmd.Wait to
reap the process' resources.

Fixes: #3783

Signed-off-by: bin <bin@hyper.sh>
2022-02-28 16:52:49 +08:00
Eric Ernst
3997c962c2
Merge pull request #3767 from tanweernoor/02242022-kata-containers-issue-3631
runtime, config: make selinux configurable
2022-02-26 08:44:29 -08:00
Fabiano Fidêncio
a9ba7c132b clh: Fix typo on HotplugRemoveDevice
A copy and paste mistake was made and the error on HotplugRemoveDevice()
should be about removal and not about addition.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 22:35:32 +01:00
Tanweer Noor
082d538cb4 runtime: make selinux configurable
removes --tags selinux handling in the makefile (part of it introduced here: d78ffd6)
and makes selinux configurable via configuration.toml

Fixes: #3631
Signed-off-by: Tanweer Noor <tnoor@apple.com>
2022-02-25 10:33:46 -08:00
Fabiano Fidêncio
ea1876f057
Merge pull request #3771 from fidencio/wip/clh-tdx
clh: Add TDX support
2022-02-25 18:45:31 +01:00
Samuel Ortiz
1103f5a4d4 virtcontainers: Use FilesystemSharer for sharing the containers files
Switching to the generic FilesystemSharer brings 2 majors improvements:

1. Remove container and sandbox specific code from kata_agent.go
2. Allow for non Linux implementations to provide ways to share
   container files and root filesystems with the Kata Linux guest.

Fixes #3622

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
533c1c0e86 virtcontainers: Keep all filesystem sharing prep code to sandbox.go
With the Linux implementation of the FilesystemSharer interface, we can
now remove all host filesystem sharing code from kata_agent and keep it
where it belongs: sandbox.go.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
61590bbddc virtcontainers: Add a Linux implementation for the FilesystemSharer
This gathers the current kata agent and container filesystem sharing
code into a FilesystemSharer implementation.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Samuel Ortiz
03fc1cbd7e virtcontainers: Add a filesystem sharing interface
Filesystem sharing here means the ability to share some parts of the
host filesystem with the guest. It's mostly about sharing files and
container bundle root filesystems.

In order to allow for different file and rootfs sharing implementations,
we define a FilesystemSharer interface.

This interface provides a preparation step, where concrete
implementations will be able to e.g. prepare the host filesysstem.
Then it provides 2 methods, one for sharing any file (regular file or a
directory) and another one for sharing a container root filesystem

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-25 17:22:27 +01:00
Fabiano Fidêncio
72434333aa clh: Add TDX support
Let's enable TDX support for Cloud Hypervisor, using td-shim as its
desired firmware.

Fixes: #3632

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a13b4d5ad8 clh: Add firmware to the config file
"firmware" option was already present for a while, but it's never been
exposed to the configuration file before.

Let's do it now as it can be used, in combination with the newly added
confidential_guest option, to boot a guest VM using the so called
`td-shim`[0] with Cloud Hypervisor.

[0]: https://github.com/confidential-containers/td-shim

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
a8827e0c78 hypervisors: Confidential Guests do not support NVDIMM
NVDIMM is also not supported with Confidential Guests and Virtio Block
devices should be used instead.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
f50ff9f798 hypervisors: Confidential Guests do not support Memory hotplug
Similarly to VCPUs and Device hotplug, Confidential Guests also do not
support Memory hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Memory hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
df8ffecde0 hypervisors: Confidential Guests do not support Device hotplug
Similarly to VCPUs hotplug, Confidential Guests also do not support
Device hotplug.

Let's make it clear in the documentation and guard the code on both QEMU
and Cloud Hypervisor side to ensure we don't advertise Device hotplug as
being supported when running Confidential Guests.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
28c4c044e6 hypervisors: Confidential Guests do not support VCPUs hotplug
As confidential guests do not support VCPUs hotplug, let's set the
"DefaultMaxVCPUs" value to "NumVCPUs".

The reason to do this is to ensure that guests will be started with the
correct amount of VCPUs, without giving to the guest with all the
possible VCPUs the host could provide.

One clear side effect of this limitation is that workloads that would
require more VCPUs on their yaml definition will not run on this
scenario.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
29ee870d20 clh: Add confidential_guest to the config file
ConfidentialGuest is an option already present and exposed for QEMU,
which is used for using Kata Containers together with different sorts of
Guest Protections, such as TDX and SEV for x86_64, PEF for ppc64le, and
SE for s390x.

Right now we error out in case confidential_guest is enabled, as we will
be implementing the needed blocks for this as part of this series.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
9621c59691 clh: refactor image / initrd configuration set
This is a small code refactor removing a deadcode based the checks
already done in the generic hypervisor abstraction.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
dcdc412e25 clh: use common kernel params from the hypervisor code
The hypervisor code already defines 3 common kernel root params for the
following cases:
* NVDIMM
* NVDIMM without DAX support
* Virtio Block

As parameters used for cloud-hypervisor have an overlap with the ones
provided by the NVDIMM case, let's take advantage of that.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:21 +01:00
Fabiano Fidêncio
4c164afbac versions: Update Cloud Hypervisor to 5343e09e7b8db
Let's bump the Cloud Hypervisor version to 5343e09e7b8db, as that brings
a few fixes we're interested in, such as:

* hypervisor, vmm: Handle TDX hypercalls with INVALID_OPERAND
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3723
    - This is needed for the TDX support on the cloud hypervisor driver,
      which is part of this very same series.

* openapi: Update the PciBdf types
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3748
    - This is needed due to a change in a DeviceNode field, which would
      cause a marshalling / demarshalling error when running with a
      version of cloud-hypervisor that includes the TDX fixes mentioned
      above.

* scripts: dev_cli: Don't quote $features_build
* scripts: dev_cli: Add --features option
  - https://github.com/cloud-hypervisor/cloud-hypervisor/pull/3773
    - This is needed due to changes in the scripts used to build Cloud
      Hypervisor, which are used as part of Kata Containers CIs and
      github actions.

      Due to this change, we're also adapting the build scripts as part
      of this very same commit.

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-25 16:49:16 +01:00
Jakob Naucke
bbfe7d6591
Merge pull request #3599 from Jakob-Naucke/no-virtio-rng-ccw
virtcontainers: Do not add a virtio-rng-ccw device
2022-02-25 15:27:02 +01:00
Francesco Giudici
3da6006de4
Merge pull request #3751 from fgiudici/kata-monitor_issue3705
kata-monitor: fix collecting metrics for sandboxes not started through CRI
2022-02-25 14:53:12 +01:00
Jakob Naucke
b2a65f9031
virtcontainers: Use available s390x hugepages
in TestHandleHugepages. On s390x, hugepage sizes must be set at boot, so
test with any that are present (default is 1M).

Depends-on: github.com/kata-containers/kata-containers#3770
Fixes: #3763
Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2022-02-25 13:11:00 +01:00
Amulyam24
cb4230e60e runtime: fix package declaration for ppc64le
Incorrect package name causes build to fail. Fix it
in vm_ppc64le.go

Fixes: #3761

Signed-off-by: Amulyam24 <amulmek1@in.ibm.com>
2022-02-24 15:31:48 +05:30
Eric Ernst
c6cc038364
Merge pull request #3615 from sameo/topic/hypervisor
Make the hypervisor framework not Linux specific
2022-02-23 16:02:00 -08:00
Francesco Giudici
fec26f8e51 kata-monitor: trivial: rename symbols & labels
We introduced collection of sandboxes metadata from the CRI that will be
attached to the sandbox metrics: this will allow to immediately match
sandboxes metrics with CRI workloads.
Rename the symbols from *Kube* to *CRI* as the metadata will be there
every time pods are created through CRI, also if kubernetes is not
installed (e.g., 'crictl runp').

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 18:34:32 +01:00
Samuel Ortiz
9fd4e5514f runtime: Move the resourcecontrol package one layer up
And try to reduce the number of virtcontainers packages, step by step.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
823faee83a virtcontainers: Rename the cgroups package
To resourcecontrol, and make it consistent with the fact that cgroups
are a Linux implementation of the ResourceController interface.

Fixes: #3601

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
0d1a7da682 virtcontainers: Rename and clean the cgroup interface
We call it a ResourceController, and we make it not so Linux specific.
Now the Linux implementations is the cgroups one.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
ad10e201e1 virtcontainers: cgroups: Move non Linux routine to utils.go
Have an OS agnostic file for sharing routines.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Samuel Ortiz
d49d0b6f39 virtcontainers: cgroups: Define a cgroup interface
And move the current, Linux-specific implementation into
cgroups_linux.go

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-23 15:48:40 +01:00
Francesco Giudici
3ac52e8193 kata-monitor: fix updating sandbox cache at startup
We now rely on fs events only to update the sandbox cache. This is not
true anyway for sandboxes already present at kata-monitor startup: we
just retrieve the list and add them in the cache only when we get their
CRI metadata. If CRI metadata is not available we will never add them to
the sandbox cache.
Fix this by immediately adding the sandboxes we find at startup time to
the sandbox cache.

Fixes: #3705

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 11:21:06 +01:00
Francesco Giudici
160bb62138 kata-monitor: bump version to 0.3.0
Since kata-monitor now:
- relies on fs events *only* to update the sandbox cache
- adds CRI meta-data as labels (CRI pod name, namespace and uid)
it deserves a version bump.

Note that while we could let kata-monitor match the runtime version,
kata-monitor will usually work flawlessy with different kata shim
releases: so it makes sense to keep kata-monitor version separated.

Signed-off-by: Francesco Giudici <fgiudici@redhat.com>
2022-02-23 11:17:02 +01:00
Fabiano Fidêncio
6a9e5f90f7
Merge pull request #3670 from sameo/topic/nerdctl
Support nerdctl OCI hooks
2022-02-22 23:03:33 +01:00
Fabiano Fidêncio
4729fd0fc2
Merge pull request #3736 from liubin/fix/3733-log-events-for-crio
shim: log events for CRI-O
2022-02-22 09:19:37 +01:00
bin
f6fc1621f7 shim: log events for CRI-O
CRI-O start shim process without setting TTRPC_ADDRESS,
that the forwarding events goroutine will get errors.

For CRI-O runtime, we can log the events to log file.

Fixes: #3733

Signed-off-by: bin <bin@hyper.sh>
2022-02-22 11:02:50 +08:00
Fabiano Fidêncio
1e9f3c856d
Merge pull request #3553 from fgiudici/kata-monitor_cachefix
kata-monitor: simplify sandbox cache management and attach kubernetes POD metadata to metrics
2022-02-21 13:17:22 +01:00
Peng Tao
031da99914
Merge pull request #3687 from luodw/nydus-clh
nydus: add lazyload support for kata with clh
2022-02-21 19:31:45 +08:00
luodaowen.backend
3175aad5ba virtiofs-nydus: add lazyload support for kata with clh
As kata with qemu has supported lazyload, so this pr aims to
bring lazyload ability to kata with clh.

Fixes #3654

Signed-off-by: luodaowen.backend <luodaowen.backend@bytedance.com>
2022-02-19 21:55:31 +08:00
zhanghj
94b831ebf8 virtcontainers: remove temp dir created for vsock in test code
remove temp dir generated by mock.GenerateKataMockHybridVSock().

Fixes: #3186

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-19 16:59:15 +08:00
Archana Shinde
7db9bef72c
Merge pull request #3718 from Kvasscn/kata_dev_fix_utils_assert_msg
virtcontainers: Remove duplicated assert messages in utils test code
2022-02-18 06:07:16 -08:00
Samuel Ortiz
27de212fe1 runtime: Always add network endpoints from the pod netns
As the container runtime, we're never inspecting, adding or configuring
host networking endpoints.
Make sure we're always do that by wrapping addSingleEndpoint calls into
the pod network namespace.

Fixes #3661

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-18 10:37:07 +01:00
zhanghj
1cee0a9452 virtcontainers: Remove duplicated assert messages in utils test code
Remove duplicated strings in assert.Errorf() and assert.NoErrorf().

Fixes: #3714

Signed-off-by: zhanghj <zhanghj.lc@inspur.com>
2022-02-18 16:45:05 +08:00
Samuel Ortiz
77c29bfd3b container: Remove VFIO lazy attach handling
With the recently added VFIO fixes and support, we should not need that
anymore.

Fixes #3108

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-17 08:39:44 +01:00
GabyCT
ced5e910d5
Merge pull request #3558 from jodh-intel/docs-rework-readme
docs: Improve top-level README
2022-02-16 16:28:14 -06:00
Fabiano Fidêncio
6f9685fbf5
Merge pull request #3624 from mdlayher/mdl-vsock
runtime: use github.com/mdlayher/vsock@v1.1.0
2022-02-16 23:11:47 +01:00
Samuel Ortiz
26b3f0017c virtcontainers: Split hypervisor into Linux and OS agnostic bits
Keep all the OS agnostic bits in the hypervisor.go and
hypervisor_ARCH.go files.

Fixes #3614

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:15:31 +01:00
Samuel Ortiz
fa0e9dc6b1 virtcontainers: Make all Linux VMMs only build on Linux
Some of them (e.g. QEMU) can run on other OSes (e.g. Darwin) but the
current virtcontainers implementation is Linux specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
c91035d0e1 virtcontainers: Move non QEMU specific constants to hypervisor.go
Hotplugging errors and 9pfs size are not particularily QEMU specific.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:34 +01:00
Samuel Ortiz
10ae05914c virtcontainers: Move guest protection definitions to hypervisor.go
They're not QEMU specific, other VMMs may implement support for it.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:07:31 +01:00
Samuel Ortiz
b28d0274ff virtcontainers: Make max vCPU config less QEMU specific
Even though it's still actually defined as the QEMU upper bound,
it's now abstracted away through govmm.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:06:32 +01:00
Samuel Ortiz
a5f6df6a49 govmm: Define the number of supported vCPUs per architecture
Based on qhe QEMU supports on those architectures.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-16 19:06:32 +01:00
Fabiano Fidêncio
be2e90469a
Merge pull request #3669 from fidencio/wip/virtiofsd-use-announce-submounts
virtiofsd: Use "-o announce_submounts"
2022-02-16 16:43:18 +01:00
James O. D. Hunt
9818cf7196 docs: Improve top-level and runtime README
Various improvements to the top-level README file:

- Moved the following sections from the runtime's README to the
  top-level README:
  - License
  - Platform support / Hardware requirements
- Added the following sections to the top-level README:
  - Configuration
  - Hypervisors
- Improved formatting of the Documentation section in the top-level
  README.
- Removed some unused named links from the top-level README.

Also improvements to the runtime README:

- Removed confusing mention of the old 1.x runtime name.
- Clarify the binary name for the 2.x runtime and the utility program.

> **Note:**
>
> We cannot currently link to the AMD website as that site's
> configuration causes the CI static checks to fail. See
> https://github.com/kata-containers/tests/issues/4401

Fixes: #3557.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2022-02-16 09:52:48 +00:00
bin
81a8baa5e5 runtime: add hugepages support
Add hugepages support, port from:
b486387cba

Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:53 +08:00
bin
7df677c01e runtime: Update calculateSandboxMemory to include Hugepages Limit
Support hugepages and port from:
96dbb2e8f0

Fixes: #3342

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Pradipta Banerjee <pradipta.banerjee@gmail.com>
Signed-off-by: bin <bin@hyper.sh>
2022-02-16 15:14:37 +08:00
Samuel Ortiz
4f96e3eae3 katautils: Pass the nerdctl netns annotation to the OCI hooks
We need to let nerdctl know which namespace to use when calling the
selected CNI plugin.
See https://github.com/containerd/nerdctl/issues/787

Fixes: #1935

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 18:11:23 +01:00
Samuel Ortiz
a871a33b65 katautils: Run the createRuntime hooks
The preStart hooks are being deprecated over the createRuntime ones.

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Samuel Ortiz
d9dfce1453 katautils: Run the preStart hook in the host namespace
The OCI spec is very specific about it:

"The prestart hooks MUST be executed in the runtime namespace."

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Samuel Ortiz
6be6d0a3b3 katautils: Pass the OCI annotations back to the called OCI hooks
That allows us to amend those annotations with information that could be
used when running those hooks.

For example nerdctl will use those annotations to resolve the networking
namespace path in where to run the CNI plugin, i.e. the created pod
networking namespace.

Fixes #3629

Signed-off-by: Samuel Ortiz <s.ortiz@apple.com>
2022-02-15 17:31:56 +01:00
Fabiano Fidêncio
4bd945b67b virtiofsd: Use "-o announce_submounts"
German Maglione, one of the current virtio-fs developers, has brought to
our attention that using "announce-submounts" could help us to prevent
inode number collisions.

This feature was introduced a year ago or so by Hanna Reitz as part of
the 08dce386e77eb9ab044cb118e5391dc9ae11c5a8, and as we already mandate
QEMU >= 6.1.0, let's take advantage of that.

Fixes: #3507

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-15 08:52:03 +01:00
Yu Li
37df1678ae build: always reset ARCH after getting it
When building with `ARCH=x86_64`, the previous `Makefile` will use it
without checking and cause:

Makefile:319: *** "ERROR: No hypervisors known for architecture x86_64 (looked for: acrn firecracker qemu cloud-hypervisor)".  Stop.

This commit fix the above issue by checking `ARCH` no matter where it
is assigned.

Fixes: #3444

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Yu Li <liyu.yukiteru@bytedance.com>
2022-02-15 14:26:34 +08:00
Shengjing Zhu
3a641b56f6 katatestutils: remove distro constraints
The distro constraint parses os release files, which may not contain
distro version(VERSION_ID field), for example rolling release distributions
like Debian testing, archlinux.

These distro constraints are not used anyway, so removing them instead
of fixing the complex version detection.

Fixes: #1864

Signed-off-by: Shengjing Zhu <zhsj@debian.org>
2022-02-15 02:11:52 +08:00
Fabiano Fidêncio
90fd625d0c versions: Udpate Cloud Hypervisor to 55479a64d237
Let's update cloud-hypervisor to a version that exposes the TDx support
via the OpenAPI's auto-generated code.

Fixes: #3663

Signed-off-by: Fabiano Fidêncio <fabiano.fidencio@intel.com>
2022-02-14 17:32:30 +01:00
James O. D. Hunt
8f80dffead
Merge pull request #3648 from yaoyinnan/index-in-for
runtime: The index variable is initialized multiple times in for
2022-02-14 12:36:46 +00:00
Bin Liu
cf53ec2c71
Merge pull request #2977 from luodw/support_nydus
feature(nydusd): add nydusd support to introduce lazyload ability
2022-02-14 13:08:50 +08:00
Matt Layher
c1ce67d905
runtime: use github.com/mdlayher/vsock@v1.1.0
Fixes #3625
Signed-off-by: Matt Layher <mdlayher@gmail.com>
2022-02-12 19:57:15 -05:00
yaoyinnan
42a878e6c1 runtime: The index variable is initialized multiple times in for
Change the variables `mountTypeFieldIdx := 8`, `mntDestIdx := 4` and `netNsMountType := "nsfs"` to const.

And unify the variable naming style, modify `mntDestIdx` to `mountDestIdx`.

Fixes: #3646

Signed-off-by: yaoyinnan <yaoyinnan@foxmail.com>
2022-02-12 11:10:10 +08:00