Commit Graph

1241 Commits

Author SHA1 Message Date
Fabiano Fidêncio
29fdfcfebc Merge pull request #1725 from liubin/liubin/1724-not-return-if-get-api-socket-failed
clh: return error if apiSocketPath failed
2021-04-30 18:16:45 +02:00
Fabiano Fidêncio
dc23adcd50 Merge pull request #1743 from alrs/fix-runtime-err
runtime: fix dropped error
2021-04-30 18:15:22 +02:00
Fabiano Fidêncio
bd486f7bf3 Merge pull request #1720 from ManaSugi/update-seccomp-spec
agent: Update seccomp configuration for errnoRet and flags
2021-04-30 10:52:42 +02:00
Bo Chen
1ca6bedf3e versions: Upgrade to cloud-hypervisor v15.0
Quotes from the cloud-hypervisor release v15.0:

This release is the first in a new version numbering scheme to represent that
we believe Cloud Hypervisor is maturing and entering a period of stability.
With this new release we are beginning our new stability guarantees.

Other highlights from the latest release include: 1) Network device rate
limiting; 2) Support for runtime control of `virtio-net` guest offload;
3) `--api-socket` supports file descriptor parameter; 4) Bug fixes on
`virtio-pmem`, PCI BARs alignment, `virtio-net`, etc.; 5) Deprecation of
the "LinuxBoot" protocol for ELF and bzImage in the coming release.

Details can be found: https://github.com/cloud-hypervisor/cloud-hypervisor/releases/tag/v15.0

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the current version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1779

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-04-29 10:56:22 -07:00
Jakob Naucke
3ee61776d6 virtcontainers: Enable virtio-fs on s390x
Allow and configure vhost-user-fs devices (virtio-fs) on s390x. As a
consequence, appendVhostUserDevice now takes a context, which affects
its signature for other architectures.

Fixes: #1753

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:54:08 +02:00
Jakob Naucke
adba4532a4 virtcontainers: Revert "virtcontainers: Allow s390x appendVhostUserDevice"
This reverts commit 7f60911333.
Patch allowed other vhost user devices besides FS not supported on s390x
and failed to attach a CCW device number, which results in the
inavailability to use more devices after vhost-user-fs-ccw.

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-29 09:43:33 +02:00
Eric Ernst
b20dff8027 Merge pull request #1759 from kata-containers/fix_update
Fix the issue that sandbox size is not right after update
2021-04-28 14:48:24 -07:00
Eric Ernst
23a8179184 Merge pull request #1756 from egernst/leave-no-virtiofs-behind
qemu: kill virtiofsd if failure to start VMM
2021-04-27 17:16:33 -07:00
Wainer dos Santos Moschetta
3677640811 runtime/virtcontainers: Fix typo on qmp error msg
"negotiate" was misspelled on qemu's qmp error message.

Fixes #1764
Signed-off-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
2021-04-27 11:52:42 -04:00
Hui Zhu
0787ea8073 cgroupsCreate: not set resources to c.config.Resources
cgroupsCreate will just keep the CPU resources infomation but not the
others.
Set it to c.config.Resources will clean most of resources of the
container.

This commit remove it to handle the issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:30 +08:00
Hui Zhu
831224aa22 Sandbox: Fix ContainerConfig ptr in CreateContainer and createContainers
The pointer that send to newContainer in CreateContainer and
createContainers is not the pointer that point to the address in
s.config.Containers.

This commit fix this issue.

Fixes: #1758

Signed-off-by: Hui Zhu <teawater@antfin.com>
2021-04-27 16:44:22 +08:00
Eric Ernst
a57c8ab1be qemu: kill virtiofsd if failure to start VMM
If the QEMU VMM fails to launch, we currently fail to kill virtiofsd,
resulting in leftover processes running on the host. Let's make sure we
kill these, and explicitly cleanup the virtiofs socket on the
filesystem.

Ideally we'll migrate QEMU to utilize the same virtiofsd interface that
CLH uses, but let's fix this bug as a first step.

Fixes: #1755

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-04-26 21:07:20 -07:00
bin
0d0a520d42 clh: return error if apiSocketPath failed
If apiSocketPath failed, should return the error, but not nil

Fixes: #1724

Signed-off-by: bin <bin@hyper.sh>
2021-04-25 10:25:42 +08:00
Lars Lehtonen
fc6bb01a7f runtime: fix dropped error
Fixes: #212

Signed-off-by: Lars Lehtonen <lars.lehtonen@gmail.com>
2021-04-24 14:18:50 -07:00
Fabiano Fidêncio
fe2311cd4c Merge pull request #1739 from pmores/virtiofsd-extra-args-annotation-handling
add io.katacontainers.config.hypervisor.virtio_fs_extra_args handling
2021-04-23 23:22:01 +02:00
Pavel Mores
30ff6ee88b runtime: handle io.katacontainers.config.hypervisor.virtio_fs_extra_args
Users can specify extra arguments for virtiofsd in a pod spec using the
io.katacontainers.config.hypervisor.virtio_fs_extra_args annontation.
However, this annotation was ignored so far by the runtime.  This commit
fixes the issue by processing the annotation value (if present) and
translating it to the corresponding hypervisor configuration item.

Fixes #1523

Signed-off-by: Pavel Mores <pmores@redhat.com>
2021-04-23 21:09:28 +02:00
Fabiano Fidêncio
5eaf7a9982 Merge pull request #1049 from c3d/feature/1043-entropy-source-annotation
Entropy source annotation
2021-04-23 20:16:11 +02:00
Fabiano Fidêncio
b41d9a99b4 Merge pull request #1703 from lifupan/main_fix
fix the issue of missing set fsGroup for EphemeralStorage
2021-04-22 20:29:36 +02:00
Christophe de Dinechin
dcb9f40394 config: Protect annotation for entropy_source
It would be undesirable to be given an annotation like "/dev/null".
Filter out bad annotation values.

Fixes: #1043

Suggested-by: James O. D. Hunt <james.o.hunt@intel.com>
Signed-off-by: Christophe de Dinechin <dinechin@redhat.com>
2021-04-22 15:26:40 +02:00
fupan.lfp
628d55bf4c kata-agent: fix the issue of fsGroup missing
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus runtime should pass this fsGroup
for EphemeralStorage to guest and set it properly on
the emptyDir volume in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-22 21:08:52 +08:00
Chelsea Mafrica
1c222c75ac Merge pull request #1697 from jodh-intel/improve-agent-shutdown-handling
Improve agent shutdown handling
2021-04-20 21:25:36 -07:00
Manabu Sugimoto
81c5ff1231 agent: Update seccomp configuration for errnoRet and flags
Update:
- Make the type of errnoRet in oci.proto oneof
- Update seccomp_grpc_to_oci that can set errnoRet as EPREM if the
value is empty.
- Update the oci.pb.go based on the above fixes
- Add seccomp errnoRet and flags option to configs in rustjail

Fixes: #1719

Signed-off-by: Manabu Sugimoto <Manabu.Sugimoto@sony.com>
2021-04-21 12:16:58 +09:00
Fabiano Fidêncio
4c177b5c40 Merge pull request #1599 from Jakob-Naucke/virtiofs-s390x
Enable virtio-fs on s390x
2021-04-20 21:07:15 +02:00
Carlos Venegas
cd27308755 Merge pull request #1432 from dgibson/bug1431
block: Generate PCI path for virtio-blk devices on clh
2021-04-20 12:00:09 -05:00
Fabiano Fidêncio
9df86d28a5 Merge pull request #1678 from cmaf/remove-spans-healthcheck
runtime: Disable trace for healthcheck
2021-04-20 18:38:47 +02:00
Jakob Naucke
7f60911333 virtcontainers: Allow s390x appendVhostUserDevice
Remove the prohibition of vhost-user devices on s390x, which are by now
supported (e.g. vhost-user-fs-ccw). As a consequence,
appendVhostUserDevice no longer needs an error in its signature.
This enables virtio-fs support on s390x.

Fixes: #1469

Signed-off-by: Jakob Naucke <jakob.naucke@ibm.com>
2021-04-20 12:20:32 +02:00
James O. D. Hunt
de2631e711 utils: Make WaitLocalProcess safer
Rather than relying on the system clock, use a channel timeout to avoid
problems if the system time changed.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:46:42 +01:00
James O. D. Hunt
9256e590dc shutdown: Don't sever console watcher too early
Fixed logic used to handle static agent tracing.

For a standard (untraced) hypervisor shutdown, the runtime kills the VM
process once the workload has finished. But if static agent tracing is
enabled, the agent running inside the VM is responsible for the
shutdown. The existing code handled this scenario but did not wait for
the hypervisor process to end. The outcome of this being that the
console watcher thread was killed too early.

Although not a problem for an untraced system, if static agent tracing
was enabled, the logs from the hypervisor would be truncated, missing the
crucial final stages of the agents shutdown sequence.

The fix necessitated adding a new parameter to the `stopSandbox()` API,
which if true requests the runtime hypervisor logic simply to wait for
the hypervisor process to exit rather than killing it.

Fixes: #1696.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 15:22:00 +01:00
James O. D. Hunt
51ab870091 utils: Improve WaitLocalProcess
Previously, the hypervisors were sending a signal and then checking to
see if the process had died by sending the magic null signal (`0`). However,
that doesn't work as it was written: the logic was assuming sending the
null signal to a process that was dead would return `ESRCH`, but it
doesn't: you first need to you `wait(2)` for the process before sending
that signal. This means that previously, all affected hypervisors would
appear to take `timeout` seconds to end, even though they had _already_
finished.

Now, the hypervisors true end time will be seen as we wait for the
processes before sending the null signal to ensure the process has
finished.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 14:51:06 +01:00
James O. D. Hunt
507ef6369e utils: Add waitLocalProcess function
Refactored some of the hypervisors to remove the duplicated code used to
trigger a shutdown.

Also added some unit tests.

Signed-off-by: James O. D. Hunt <james.o.hunt@intel.com>
2021-04-15 14:51:03 +01:00
David Gibson
1d5098de70 agent/block: Generate PCI path for virtio-blk devices on clh
Currently runtime and agent special case virtio-blk devices under clh,
ostensibly because the PCI address information is not available in that
case.

In fact, cloud-hypervisor's VmAddDiskPut API does return a PciDeviceInfo,
which includes a PCI address.  That API is broken, because PCI addressing
depends on guest (firmware or OS) actions that the hypervisor won't know
about.  clh only gets away with this because it only uses a single PCI root
and never uses PCI bridges, in which case the guest addresses are
accurately predictable: they always have domain and bus zero.

Until https://github.com/kata-containers/kata-containers/pull/1190, Kata
couldn't handle PCI addressing unless there was exactly one bridge, which
might be why this was actually special-cased for clh.

With #1190 merged, we can handle more general PCI paths, and we can derive
a trivial (one element) PCI path from the information that the clh API
gives us.  We can use that to remove this special case.

fixes #1431

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2021-04-13 13:29:24 +10:00
Chelsea Mafrica
543f9da3ba runtime: Disable trace for healthcheck
With tracing enabled, grpc health check generates a large number of
spans which creates too much data for tasks running longer than a few
minutes. To solve this, remove span creation from kata agent check() and
sendReq() where the majority of the spans come from. Leave contexts in
functions for subsequent calls that create spans.

Fixes #1395

Signed-off-by: Chelsea Mafrica <chelsea.e.mafrica@intel.com>
2021-04-09 15:47:00 -07:00
bin
421439c633 API: remove ProcessListContainer/ListProcesses
This commit will remove ProcessListContainer API from VCSandbox
and ListProcesses from agent.proto.

Fixes: #1668

Signed-off-by: bin <bin@hyper.sh>
2021-04-09 17:34:25 +08:00
bin
d75fe95685 virtcontainers: replace newStore by store in Sandbox struct
The property name make newcomers confused when reading code.
Since in Kata Containers 2.0 there will only be one type of store,
so it's safe to replace it by `store` simply.

Fixes: #1660

Signed-off-by: bin <bin@hyper.sh>
2021-04-08 23:59:16 +08:00
GabyCT
0b87fd436f Merge pull request #1544 from snir911/timeout
runtime: increase dial timeout
2021-04-06 16:10:51 -05:00
Peng Tao
d5600641dd Merge pull request #1603 from lifupan/fix_fsgroup
Fix fsgroup
2021-04-06 11:35:03 +08:00
Snir Sheriber
13653e7b55 runtime: increase dial timeout
On some setups, starting multiple kata pods (qemu) simultaneously on the same node
might cause kata VMs booting time to increase and the pods to fail with:
Failed to check if grpc server is working: rpc error: code = DeadlineExceeded desc = timed
out connecting to vsock 1358662990:1024: unknown

Increasing default dialing timeout to 30s should cover most cases.

Signed-off-by: Snir Sheriber <ssheribe@redhat.com>
Fixes: #1543
2021-04-04 09:37:38 +03:00
Bo Chen
1511d966aa Merge pull request #1616 from egernst/dechat-deruntime
Dechat deruntime
2021-04-01 11:02:27 -07:00
Chelsea Mafrica
4a3282cf1a Merge pull request #1608 from likebreath/0331/go_fmt_clh_clinet_code
runtime: Format auto-generated client code for cloud-hypervisor API
2021-04-01 10:39:02 -07:00
Eric Ernst
a4c125a8b9 trace: move gRPC requests from debug to trace
There are many requests to the agent that happen with relatively
high frequency when a workload is running (checkRequest, as an example).

Let's move from Debug to Trace to avoid bombarding journal.

Signed-off-by: Eric Ernst <eric.g.ernst@gmail.com>
2021-04-01 09:03:26 -07:00
Fupan Li
5524bc806b Merge pull request #1612 from liubin/1610/use-concrete-kata-agent-config-type
runtime: use concrete KataAgentConfig instead of interface type
2021-04-01 21:26:38 +08:00
bin
6fe48329b5 runtime: use concrete KataAgentConfig instead of interface type
Kata Containers 2.0 only have one type of agent, so there is no
need to use interface as config's type

Fixes: #1610

Signed-off-by: bin <bin@hyper.sh>
2021-04-01 13:44:45 +08:00
fupan.lfp
88e58a4f4b agent: fix the issue of missing pass fsGroup
For k8s emptyDir volume, a specific fsGroup would
be set for it, thus runtime should pass this fsGroup
to guest and set it properly on the emptyDir volume
in guest.

Fixes: #1580

Signed-off-by: fupan.lfp <fupan.lfp@antfin.com>
2021-04-01 11:33:18 +08:00
Bo Chen
0c38d9ecc4 runtime: Fix the format of the client code of cloud-hypervisor APIs
Regenerate the client code with the added `go-fmt` step. No functional
changes.

Fixes: #1606

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 14:41:44 -07:00
Bo Chen
52cacf8838 runtime: Format auto-generated client code for cloud-hypervisor API
This patch extends the current process of generating client code for
cloud-hypervisor API with an additional step, `go-fmt`, which will remove
the generated `client/go.mod` file and format all auto-generated code.

Fixes: #1606

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 14:36:24 -07:00
Eric Ernst
c0c7bef2b8 Merge pull request #1592 from likebreath/0330/versions_clh_v0.14.0
versions: Update cloud-hypervisor to release v0.14.1
2021-03-31 12:39:35 -07:00
Bo Chen
84b62dc3b1 versions: Update cloud-hypervisor to release v0.14.1
Highlights for cloud-hypervisor version 0.14.0 include: 1) Structured
event monitoring; 2) MSHV improvements; 3) Improved aarch64 platform; 4)
Updated hotplug documentation; 6) PTY control for serial and
virtio-console; 7) Block device rate limiting; 8) Plan to deprecate the
support of "LinuxBoot" protocol and support PVH protocol only.

Highlights for cloud-hypervisor version 0.13.0 include: 1) Wider VFIO
device support; 2) Improve huge page support; 3) MACvTAP support; 4) VHD
disk image support; 5) Improved Virtio device threading; 6) Clean
shutdown support via synthetic power button.

Details can be found:
https://github.com/cloud-hypervisor/cloud-hypervisor/releases

Note: The client code of cloud-hypervisor's OpenAPI is automatically
generated by `openapi-generator` [1-2]. As the API changes do not
impact usages in Kata, no additional changes in kata's runtime are
needed to work with the latest version of cloud-hypervisor.

[1] https://github.com/OpenAPITools/openapi-generator
[2] https://github.com/kata-containers/kata-containers/blob/main/src/runtime/virtcontainers/pkg/cloud-hypervisor/README.md

Fixes: #1591

Signed-off-by: Bo Chen <chen.bo@intel.com>
2021-03-31 11:09:47 -07:00
Orestis Lagkas Nikolos
6255cc1959 virtcontainers/fc: Upgrade Firecracker to v0.23.1
This patch upgrades Firecracker version from v0.21.1 to v0.23.1

* Generate swagger models for v0.23.1 (from firecracker.yaml)
* Change uint64 types in TokenBucket object according to rate-limiter
implementation (introduced in commit #cfeb966)
* Update Firecracker Logger/Metrics to support the new API
* Update payload in fc.vmRunning to support the new API
* Add Metrics type to fcConfig

Fixes: #1518

Signed-off-by: Orestis Lagkas Nikolos <olagkasn@nubificus.co.uk>
2021-03-31 04:55:40 -05:00
Chelsea Mafrica
e5aa4e7eb4 Merge pull request #1563 from Jakob-Naucke/s390x-missing-contexts
virtcontainers: Fix missing contexts in s390x
2021-03-30 09:38:28 -07:00
Tim Zhang
b58fb25d88 Merge pull request #1555 from liubin/fix/1554-install-hook-before-test
test: install mock hook binary before test
2021-03-30 14:01:56 +08:00