93121 Commits

Author SHA1 Message Date
Benjamin Pineau
fcb3f1f64c Tests fixes for Azure per-VMSS VMs caches
Signed-off-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
2020-07-20 18:35:23 +02:00
Benjamin Pineau
85ecd0e17c Azure: per VMSS, incremental VMSS VMs cache
Azure's cloud provider VMSS VMs API accesses are mediated through
a cache holding and refreshing all VMSS together.

Because of that, we hit the VMSSVM.List API more often than necessary:
an instance's cache miss or expiration should only require re-listing a
single VMSS, while it currently costs O(n) re-lists, n being the number
of attached Scale Sets.

Under heavy pressure (clusters with many attached VMSS that can't all
be listed in one sequence of successive API calls), the controller
manager can get stuck re-listing everything from scratch, aborting the
whole operation, then retrying and re-triggering API rate limits, which
affects the whole Subscription.

This patch replaces the global VMSS VMs cache by per-VMSS VMs caches.
Refreshes (VMSS VMs lists) are scoped to the single relevant VMSS; under
severe throttling the various caches can be incrementally refreshed.

Signed-off-by: Benjamin Pineau <benjamin.pineau@datadoghq.com>
2020-07-20 18:35:23 +02:00
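A minimal sketch of the per-VMSS idea, assuming a simple TTL cache keyed by scale-set name. `listFunc` stands in for a per-VMSS `VMSSVM.List` call and, like every other name here, is hypothetical rather than the Azure provider's actual code. The point it shows: a miss or expiry re-lists only the affected scale set, and a failed list leaves the other entries untouched. For brevity the lock is held during the list call, which serializes refreshes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// listFunc is a hypothetical stand-in for a per-VMSS "VMSSVM.List" call;
// it returns the VM names belonging to the named scale set.
type listFunc func(vmss string) ([]string, error)

// entry holds one scale set's VM list and when it was last refreshed.
type entry struct {
	vms     []string
	fetched time.Time
}

// perVMSSCache keeps one independently expiring entry per scale set.
type perVMSSCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	list    listFunc
	entries map[string]entry
	now     func() time.Time
}

func newPerVMSSCache(ttl time.Duration, list listFunc) *perVMSSCache {
	return &perVMSSCache{ttl: ttl, list: list, entries: map[string]entry{}, now: time.Now}
}

// Get returns the cached VMs for vmss, re-listing only that scale set when
// its entry is missing or older than ttl.
func (c *perVMSSCache) Get(vmss string) ([]string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[vmss]; ok && c.now().Sub(e.fetched) < c.ttl {
		return e.vms, nil
	}
	vms, err := c.list(vmss)
	if err != nil {
		return nil, err // entries for other scale sets stay valid
	}
	c.entries[vmss] = entry{vms: vms, fetched: c.now()}
	return vms, nil
}

func main() {
	calls := map[string]int{}
	c := newPerVMSSCache(time.Minute, func(vmss string) ([]string, error) {
		calls[vmss]++
		return []string{vmss + "-vm0"}, nil
	})
	c.Get("pool-a")
	c.Get("pool-a") // served from cache
	c.Get("pool-b")
	fmt.Println(calls) // map[pool-a:1 pool-b:1]
}
```

Under throttling, each scale set is refreshed on its own schedule, so progress made on one entry is not thrown away when listing another fails.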
Kubernetes Prow Robot
5feab0aa1e
Merge pull request #93207 from hasheddan/nvidia-gpu-installer
Use local daemonset manifest for installing Nvidia drivers
2020-07-20 09:02:51 -07:00
Abdullah Gharaibeh
6f9794d5e9 Rename pod_preemption_metrics to preemption_metrics. Since this metric's type was changed from Gauge to Histogram, renaming it should make it easier for providers to migrate. 2020-07-20 11:44:10 -04:00
Giuseppe Scrivano
ef935bd991
kubelet: clamp cpu shares to max allowed
Clamp cpu.shares to the maximum value allowed by the kernel.

This is not an issue when using cgroupfs, since the kernel already keeps
the value in range by clamping it automatically; systemd, however, has an
additional check that prevents the cgroup from being created.

Closes: https://github.com/kubernetes/kubernetes/issues/92855

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-20 17:18:03 +02:00
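A sketch of the clamping, assuming the usual conversion of 1024 shares per CPU and the cgroup v1 `cpu.shares` bounds of 2 and 262144. The function and constant names are illustrative, not necessarily the kubelet's own identifiers.

```go
package main

import "fmt"

// Bounds and conversion factors assumed for cgroup v1 cpu.shares.
const (
	minShares     = 2
	maxShares     = 262144
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares converts a CPU request in millicores to cpu.shares and
// clamps the result into [minShares, maxShares], so systemd never sees an
// out-of-range value when creating the cgroup.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		// No request: fall back to the minimum weight.
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(1))      // 2 (clamped up)
	fmt.Println(milliCPUToShares(1000))   // 1024
	fmt.Println(milliCPUToShares(300000)) // 262144 (clamped down)
}
```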
Jordan Liggitt
7aacbeac14 Update k8s.io/utils 2020-07-20 11:12:29 -04:00
Kubernetes Prow Robot
c237804533
Merge pull request #92755 from chelseychen/event-e2e-conformance
Promote Event CRUD tests to conformance
2020-07-20 05:50:51 -07:00
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As a result, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error.

This patch fixes the problem by ensuring that reusable CPUs and devices
are considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to cover it.
2020-07-20 11:41:13 +00:00
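A deliberately simplified sketch of why reusable resources matter for hints. It assumes a per-NUMA-node count of free and reusable CPUs and ignores multi-node hint combinations; `candidateNodes` is a hypothetical helper, not the TopologyManager's API. Passing `nil` for the reusable counts reproduces both failure modes listed above.

```go
package main

import "fmt"

// candidateNodes returns the NUMA nodes that can satisfy a request for
// `request` CPUs. free[i] is the number of unallocated CPUs on node i and
// reusable[i] the number of CPUs on node i released by init containers that
// the next container may reuse. Passing nil for reusable mimics the bug.
func candidateNodes(free, reusable []int, request int) []int {
	var nodes []int
	for node, available := range free {
		if node < len(reusable) {
			available += reusable[node]
		}
		if available >= request {
			nodes = append(nodes, node)
		}
	}
	return nodes
}

func main() {
	free := []int{2, 4}     // node 0 has 2 free CPUs, node 1 has 4
	reusable := []int{4, 0} // an init container released 4 CPUs on node 0

	fmt.Println(candidateNodes(free, nil, 4))      // [1]: hint points away from the reusable CPUs
	fmt.Println(candidateNodes(free, reusable, 4)) // [0 1]
	fmt.Println(candidateNodes(free, nil, 6))      // []: looks like there are not enough CPUs
	fmt.Println(candidateNodes(free, reusable, 6)) // [0]
}
```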
Kevin Klues
74fe9364c3 Simplify logic in devicemanager TopologyHint generation 2020-07-20 11:41:13 +00:00
Kevin Klues
9f5f401d60 Add AnySet() to topologymanager bitmask API 2020-07-20 11:41:13 +00:00
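The commit title only names the method, so the following is an assumption about its intent: a query that reports whether any of a given list of bits is set. It uses a hypothetical `bitMask` type over `uint64`, not the topologymanager's actual bitmask implementation.

```go
package main

import "fmt"

// bitMask is a hypothetical 64-bit mask, e.g. one bit per NUMA node.
type bitMask uint64

// newBitMask sets the given bits; bits outside [0, 64) are ignored.
func newBitMask(bits ...int) bitMask {
	var m bitMask
	for _, b := range bits {
		if b >= 0 && b < 64 {
			m |= 1 << uint(b)
		}
	}
	return m
}

// AnySet reports whether at least one of the given bits is set in m.
func (m bitMask) AnySet(bits []int) bool {
	for _, b := range bits {
		if b >= 0 && b < 64 && m&(1<<uint(b)) != 0 {
			return true
		}
	}
	return false
}

func main() {
	m := newBitMask(0, 2)              // e.g. NUMA nodes 0 and 2
	fmt.Println(m.AnySet([]int{1, 2})) // true
	fmt.Println(m.AnySet([]int{1, 3})) // false
}
```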
Nikhita Raghunath
c3b75416a8 publishing: use go 1.14.6 for master and release-1.19
The `default-go-version` field specifies the go version used for the
master branch, and for any release branch that does not explicitly
specify its own go version.

This commit also uses go 1.14.6 for the `release-1.19` branch.
2020-07-20 14:02:30 +05:30
Kubernetes Prow Robot
43fbe17dc6
Merge pull request #93128 from gaurav1086/convertMaptoMapPointer_fix_range_iterator_issue
[staging/azure] azure_utils: fix range iterator issue in convertMaptoMapPointer
2020-07-19 21:02:50 -07:00
Caleb Woodbine
125e839d77 Fix formatting 2020-07-20 13:16:35 +12:00
Caleb Woodbine
05163497bc Fix bazel build 2020-07-20 11:15:57 +12:00
Caleb Woodbine
b38d7f25fe Remove watch tooling 2020-07-20 11:00:37 +12:00
Kubernetes Prow Robot
6ceb6c6845
Merge pull request #93134 from logicalhan/metric-handler
Add reset handler to the instrumentation metric library and expose Reset on the metric registries
2020-07-19 15:48:50 -07:00
Caleb Woodbine
dc30156fb8 Update error handling formatting, handling of type conversion in watch event loop 2020-07-20 10:03:49 +12:00
Caleb Woodbine
6e04fbdde1 Update error statements 2020-07-20 10:03:49 +12:00
Caleb Woodbine
a2c19d7ae0 Add watch checks 2020-07-20 10:03:49 +12:00
Caleb Woodbine
a4e29f2481 Fix formatting 2020-07-20 10:03:49 +12:00
Caleb Woodbine
cb7835bcb0 Add check for unmarshalling onto a Pod object type 2020-07-20 10:03:49 +12:00
Caleb Woodbine
c6a86b5fed Fix test to use values from v1, wording; Update variables to be more templatable 2020-07-20 10:03:49 +12:00
Caleb Woodbine
47cd8dde56 Update to check response data of UpdateStatus instead of listing after updating the status 2020-07-20 10:03:49 +12:00
Caleb Woodbine
19e9368eb8 Create Pod+PodStatus resource lifecycle test 2020-07-20 10:03:49 +12:00
Amim Knabben
1044840f6e Documenting TEST_ARGS on Node E2E helper 2020-07-19 14:37:28 -04:00
Kubernetes Prow Robot
363c3b89f5
Merge pull request #93198 from justaugustus/go1146
Update Golang to v1.14.6
2020-07-19 09:10:50 -07:00
hasheddan
4e4d629af7
Return error instead of panic if container index outside bounds
Adds a check for an out-of-bounds container index so that kubectl exec
returns an error instead of panicking.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
2020-07-19 10:04:53 -05:00
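The pattern behind this fix can be sketched as a bounds check that returns an error before indexing. `containerAt` is a hypothetical helper operating on plain container names; it is not kubectl's actual code.

```go
package main

import "fmt"

// containerAt returns the name of the container at index i, or an error
// (rather than an index-out-of-range panic) when i is outside the list.
func containerAt(containers []string, i int) (string, error) {
	if i < 0 || i >= len(containers) {
		return "", fmt.Errorf("container index %d out of bounds: pod has %d container(s)", i, len(containers))
	}
	return containers[i], nil
}

func main() {
	containers := []string{"app", "sidecar"}
	if name, err := containerAt(containers, 1); err == nil {
		fmt.Println("exec into", name)
	}
	if _, err := containerAt(containers, 5); err != nil {
		fmt.Println("error:", err)
	}
}
```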
Kubernetes Prow Robot
66020d2292
Merge pull request #93110 from dims/adding-dims-as-reviewer-for-test
Adding dims as reviewer for test/
2020-07-19 04:32:50 -07:00
Kubernetes Prow Robot
4804fbe4c1
Merge pull request #93121 from liggitt/resource-quota
kube-up: limit critical pods to kube-system by default
2020-07-19 00:00:50 -07:00
Kubernetes Prow Robot
92e471a0bd
Merge pull request #93216 from liggitt/deflake-preferred-version
Deflake PreferredVersion e2e test
2020-07-18 21:44:50 -07:00
Jordan Liggitt
9718e7906f Deflake PreferredVersion e2e test 2020-07-18 22:51:56 -04:00
Kubernetes Prow Robot
eda07adf6e
Merge pull request #91177 from MikeSpreitzer/more-concurrency-details
Introduce more metrics on concurrency
2020-07-18 19:20:50 -07:00
hasheddan
e990698d5f
Use local daemonset manifest for installing Nvidia drivers
Updates sig-scheduling e2e Nvidia GPU tests to install drivers using a
local manifest by default. Currently the DaemonSet is fetched from the
GoogleCloudPlatform/container-engine-accelerators repo by default.
Using a local manifest allows manually specifying the cos-gpu-installer
image rather than always using latest. A remote manifest can still be
fetched by setting the NVIDIA_DRIVER_INSTALLER_DAEMONSET env var.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
2020-07-18 21:01:00 -05:00
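The selection rule described in this commit (local manifest by default, remote one when `NVIDIA_DRIVER_INSTALLER_DAEMONSET` is set) can be sketched as below. The helper name and the local path are hypothetical; the environment lookup is injected so the choice can be tested without touching process state.

```go
package main

import (
	"fmt"
	"os"
)

// driverInstallerEnv is the environment variable named in the commit message.
const driverInstallerEnv = "NVIDIA_DRIVER_INSTALLER_DAEMONSET"

// installerManifest returns the remote manifest location from the
// environment when it is set, and otherwise falls back to localPath.
func installerManifest(getenv func(string) string, localPath string) string {
	if remote := getenv(driverInstallerEnv); remote != "" {
		return remote
	}
	return localPath
}

func main() {
	// Hypothetical local manifest path, for illustration only.
	fmt.Println(installerManifest(os.Getenv, "manifests/nvidia-driver-installer.yaml"))
}
```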
Kubernetes Prow Robot
a789d56b65
Merge pull request #93119 from dcbw/e2e-ingress-misisng-return
test/e2e/ingress: add missing return to fix panics on !GCE
2020-07-18 13:58:49 -07:00
Jordan Liggitt
9d83ca4b02 Deflake GCEPD namespace deletion test 2020-07-18 15:32:02 -04:00
Jordan Liggitt
5678d40f76 Make CRDList lifecycle consistent with CRD 2020-07-18 13:53:49 -04:00
Kubernetes Prow Robot
1f14cbac54
Merge pull request #93118 from bart0sh/PR0091-update-etcd
go.mod: update etcd to fix e2e tests
2020-07-18 10:24:50 -07:00
Kubernetes Prow Robot
3a0b683c01
Merge pull request #93084 from ii/heyste-get-code-version-test
Promote Check Server Version e2e test to conformance - 1 Endpoint Coverage
2020-07-18 06:14:50 -07:00
Kubernetes Prow Robot
05f6812c2d
Merge pull request #90822 from deads2k/csr-separate-signer-flags-02
allow setting different certificates for kube-controller-managed CSR signers
2020-07-18 03:10:50 -07:00
Kubernetes Prow Robot
242f3d9dce
Merge pull request #80917 from aarnaud/windows-devicemanager
Port deviceManager to windows container manager to enable GPU access
2020-07-17 21:04:50 -07:00
Kubernetes Prow Robot
0a7050f531
Merge pull request #93043 from aramase/vmss-dualstack-ipconfig
fix: determine the correct ip config based on ip family
2020-07-17 15:02:50 -07:00
Dan Winship
e46572ef4b Improve EndpointController's handling of headless services under dual-stack
EndpointController was accidentally requiring all headless services to
be IPv4-only in clusters with IPv6DualStack enabled.

This still leaves "legacy" (i.e., IPFamily-less) headless services as
always IPv4-only because the controller doesn't currently have easy
access to the information that would allow it to fix that.
(EndpointSliceController had the same problem already, and still
does.) This can be fixed, if needed, by manually setting IPFamily,
and the proposed API for 1.20 will handle this situation better.
2020-07-17 15:26:21 -04:00
Dan Winship
9023d19c57 Improve EndpointController dual-stack testing
Rewrite some of the test helpers to better support single-stack IPv4
vs single-stack IPv6 vs dual-stack IPv4 primary vs dual-stack IPv6
primary, and update TestPodToEndpointAddressForService to test some
more cases.
2020-07-17 15:26:21 -04:00
Dan Winship
9fb6e2ef55 Fix Endpoint/EndpointSlice pod change detection
The endpoint controllers responded to Pod changes by trying to figure
out whether the generated endpoint resource would change, rather than
just checking whether the Pod had changed. Since the set of Pod fields
that need to be checked depends on the Service and Node as well, the
code ended up checking only a subset of the changes it should have.

In particular, EndpointSliceController ended up only looking at IPv4
Pod IPs when processing Pod update events, so when a Pod went from
having no IP to having only an IPv6 IP, EndpointSliceController would
think it hadn't changed.
2020-07-17 15:22:59 -04:00
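The IPv6 case described above can be reproduced with a small sketch. Both helpers are hypothetical illustrations of the pitfall, not the controllers' code: filtering to IPv4 before comparing makes a transition from "no IP" to "IPv6 only" look like no change, while comparing every address catches it.

```go
package main

import (
	"fmt"
	"net"
)

// ipsChanged reports whether two Pod IP lists differ, comparing every
// address regardless of IP family.
func ipsChanged(oldIPs, newIPs []string) bool {
	if len(oldIPs) != len(newIPs) {
		return true
	}
	for i := range oldIPs {
		if oldIPs[i] != newIPs[i] {
			return true
		}
	}
	return false
}

// ipv4Only keeps only IPv4 addresses, so IPv6-only changes become
// invisible to any comparison made on its output.
func ipv4Only(ips []string) []string {
	var out []string
	for _, s := range ips {
		if ip := net.ParseIP(s); ip != nil && ip.To4() != nil {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	var before []string           // Pod has no IP yet
	after := []string{"fd00::10"} // Pod now has only an IPv6 IP
	fmt.Println(ipsChanged(ipv4Only(before), ipv4Only(after))) // false: change missed
	fmt.Println(ipsChanged(before, after))                     // true
}
```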
Stephen Augustus
3bbcba9b84 Update Golang to v1.14.6
Signed-off-by: Stephen Augustus <saugustus@vmware.com>
2020-07-17 14:47:21 -04:00
Stephen Augustus
bddd52046d Update repo-infra to v0.0.8 (to support go1.14.6 and go1.13.14)
Includes:
- bazelbuild/bazel-toolchains@3.4.0
- bazelbuild/rules_go@v0.22.8

Signed-off-by: Stephen Augustus <saugustus@vmware.com>
2020-07-17 14:30:02 -04:00
Kubernetes Prow Robot
f9ad7db9a6
Merge pull request #92349 from jingyih/update_etcd_server_3p4p9
Update default etcd server to 3.4.9
2020-07-17 07:53:01 -07:00
Jordan Liggitt
3b323b2ef0 Limit critical pods to kube-system by default 2020-07-17 09:52:19 -04:00
Kubernetes Prow Robot
a3e3b355fa
Merge pull request #92619 from ii/heyste-get-apigroup-list-test
Write checkAPIGroupPreferredVersion Test - +16 Endpoint coverage
2020-07-17 02:51:01 -07:00
Ed Bartosh
016eb06d8b go.mod: update etcd to fix e2e tests
Updated etcd to v3.4.10 to include this fix:
 - change protobuf field type from int to int64

This should fix the increased flakiness in many node e2e tests.
2020-07-17 12:15:43 +03:00