Commit Graph

63139 Commits

Author SHA1 Message Date
Jiaying Zhang
5514a1f4dd Fixes the races around devicemanager Allocate() and endpoint deletion.
There is a race in predicateAdmitHandler Admit() in which getNodeAnyWayFunc()
could return a Node with non-zero device plugin resource allocatable for a
non-existent endpoint. The race can happen when a device plugin fails,
but it is more likely when kubelet restarts because, with the current
registration model, there is a time gap between kubelet restart and device
plugin re-registration. During this time window, even though devicemanager
may have removed the resource initially during the GetCapacity() call, kubelet
may overwrite the device plugin resource capacity/allocatable with the
old value when a node update from the API server comes in later. This
could cause a pod to be started without the proper device runtime config set.

To solve this problem, introduce endpointStopGracePeriod. When a device
plugin fails, don't immediately remove the endpoint; instead, set stopTime in
its endpoint. During kubelet restart, create endpoints with stopTime set
for any checkpointed registered resource. The endpoint is considered to be
in stopGracePeriod if its stopTime is set. This allows us to track which
resources should be handled by devicemanager during the time gap.
When an endpoint's stopGracePeriod expires, we remove the endpoint and
its resource. This allows the resource to be exported through other channels
(e.g., by directly updating node status through the API server) if there is
such a use case. Currently endpointStopGracePeriod is set to 5 minutes.

Given that an endpoint is no longer removed immediately upon disconnection,
mark all its devices unhealthy so that we can signal the resource allocatable
change to the scheduler and avoid scheduling more pods to the node.
When a device plugin endpoint is in stopGracePeriod, pods requesting the
corresponding resource will fail admission.
2018-03-09 17:00:57 -08:00
Kubernetes Submit Queue
36fd62eed8
Merge pull request #60972 from wojtek-t/fix_upgrade_test
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Fix upgrade tests for GKE Regional Clusters
2018-03-09 15:44:46 -08:00
Yongkun Anfernee Gui
eba9528753 Add cache comparison for pods and pdbs 2018-03-09 15:10:26 -08:00
Yongkun Anfernee Gui
fda0d07eb6 Scheduler cache comparer
A debug tool that collects resources from the API server and compares them
with the scheduler cache. It currently only compares the node list, but
it should be easy to extend. The comparison is triggered by sending signal
SIGUSR2, e.g.:

  kill -12 ${SCHED_PID}

The comparison result goes to the scheduler log.

Towards #60860
2018-03-09 15:10:22 -08:00
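The signal-triggered comparison in the commit above can be illustrated with a small Go sketch. It is not the scheduler's code: `compareNodes` and the node lists are hypothetical stand-ins for "list nodes from the API server" and "read nodes from the scheduler cache", and results are printed rather than logged. It is Unix-only because it uses `syscall.SIGUSR2` and `syscall.Kill`. Note that the numeric value 12 for SIGUSR2 holds on Linux x86; `kill -USR2 ${SCHED_PID}` avoids depending on the number, which differs on other platforms such as macOS.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"sort"
	"syscall"
)

// compareNodes returns node names the API server has but the cache lacks
// (missing) and names the cache has but the API server lacks (redundant).
func compareNodes(apiNodes, cachedNodes []string) (missing, redundant []string) {
	inAPI := make(map[string]bool, len(apiNodes))
	for _, n := range apiNodes {
		inAPI[n] = true
	}
	inCache := make(map[string]bool, len(cachedNodes))
	for _, n := range cachedNodes {
		inCache[n] = true
		if !inAPI[n] {
			redundant = append(redundant, n)
		}
	}
	for _, n := range apiNodes {
		if !inCache[n] {
			missing = append(missing, n)
		}
	}
	sort.Strings(missing)
	sort.Strings(redundant)
	return missing, redundant
}

func main() {
	// Hypothetical data in place of real API server and cache lookups.
	apiNodes := []string{"node-a", "node-b", "node-c"}
	cachedNodes := []string{"node-a", "node-c", "node-d"}

	sigCh := make(chan os.Signal, 1)
	signal.Notify(sigCh, syscall.SIGUSR2)

	// For a self-contained demo, signal ourselves instead of waiting for an operator.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR2); err != nil {
		fmt.Println("kill failed:", err)
		return
	}

	<-sigCh // block until SIGUSR2 arrives, then run one comparison
	missing, redundant := compareNodes(apiNodes, cachedNodes)
	fmt.Printf("missing from cache: %v, redundant in cache: %v\n", missing, redundant)
	// prints: missing from cache: [node-b], redundant in cache: [node-d]
}
```

A long-running process would loop on the channel instead of receiving once, so each signal triggers a fresh comparison.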
Kubernetes Submit Queue
df36379670
Merge pull request #60950 from juanvallejo/jvallejo/use-temp-kubeconfig-file-tests
Automatic merge from submit-queue.

use temp kubeconfig for fake factory

**Release note**:
```release-note
NONE
```

Fixes https://github.com/kubernetes/kubernetes/issues/60907

cc @deads2k @ixdy
2018-03-09 15:00:21 -08:00
Joe Betz
e2a25f9b54 Bump to etcd 3.1.12 to pick up critical fix 2018-03-09 14:28:23 -08:00
Kubernetes Submit Queue
a2f0ddcedb
Merge pull request #60932 from juanvallejo/jvallejo/fix-builder-mapping
Automatic merge from submit-queue.

Prefer GroupVersionResource, fallback to GVK

**Release note**:
```release-note
NONE
```

Addresses https://github.com/kubernetes/kubernetes/pull/59353#discussion_r173048411

cc @smarterclayton @deads2k @soltysh
2018-03-09 14:16:35 -08:00
Chao Xu
3ab516035d Make admission webhooks work in custom apiservers.
Created a scheme that only understands admission/v1beta1 and used it to
encode/decode admissionReviews.

Also made the NegotiationSerializer setup static.
2018-03-09 13:54:27 -08:00
Zihong Zheng
9bb962e238 [e2e service] Fix CleanupGCEResources for regional test 2018-03-09 13:29:30 -08:00
juanvallejo
8d35f94d51
use temp kubeconfig for fake factory 2018-03-09 15:53:19 -05:00
juanvallejo
177dcb998f
match KindFor first 2018-03-09 15:43:10 -05:00
Kubernetes Submit Queue
40143fd687
Merge pull request #60989 from shyamjvs/disable-quotas-in-scalability-e2es
Automatic merge from submit-queue.

Revert "Use quotas in default performance tests"

This reverts commit c3c10208bd.

Ref https://github.com/kubernetes/kubernetes/issues/60988

/cc @gmarek 
/kind bug
/sig scalability
/priority critical-urgent

```release-note
NONE
```
2018-03-09 11:18:11 -08:00
Harry Zhang
5cc841a337 Use inline func to fix deadlock 2018-03-09 10:57:03 -08:00
Zihong Zheng
e7c673086f [e2e service] Fix gke failure: move apiserver restart validation logic into util 2018-03-09 10:56:46 -08:00
Lennart Espe
ba1ef7a6c4
Improve PodSecurityPolicy group validate error message on out-of-range group IDs 2018-03-09 18:30:13 +01:00
Shyam Jeedigunta
34e7a7cf06 Revert "Use quotas in default performance tests"
This reverts commit c3c10208bd.
2018-03-09 18:18:18 +01:00
Bryan Moyles
c05504b736 Use grpc to improve the CPU utilization of the logging agent. 2018-03-09 10:09:30 -05:00
Cao Shufeng
c6f72c20d1 [advanced audit] fix comment about throttle burst 2018-03-09 22:31:02 +08:00
Dan Winship
34ce573e99 Fix use of "-w" flag to iptables-restore
iptables accepts "-w5", but iptables-restore requires "-w 5".
2018-03-09 08:52:05 -05:00
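The flag-format difference noted in the commit above can be captured in a small Go helper. This is a sketch, not the kubelet/kube-proxy iptables utility: `waitFlagArgs` is a hypothetical name, and the only rule it encodes is the one stated in the commit (attached value for iptables, separate value for iptables-restore). `--noflush` is a real iptables-restore option, used here only to show where the arguments land; the command is built but never run.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// waitFlagArgs is a hypothetical helper that formats the wait flag the way
// the commit note says each binary expects it.
func waitFlagArgs(binary string, seconds int) []string {
	s := strconv.Itoa(seconds)
	if binary == "iptables-restore" {
		// iptables-restore needs the value as a separate argument.
		return []string{"-w", s}
	}
	// iptables accepts the value attached to the flag.
	return []string{"-w" + s}
}

func main() {
	fmt.Printf("%q\n", waitFlagArgs("iptables", 5))         // ["-w5"]
	fmt.Printf("%q\n", waitFlagArgs("iptables-restore", 5)) // ["-w" "5"]

	// Building (not running) a command shows how the args are passed.
	args := append(waitFlagArgs("iptables-restore", 5), "--noflush")
	cmd := exec.Command("iptables-restore", args...)
	fmt.Println(cmd.Args) // [iptables-restore -w 5 --noflush]
}
```

Returning a slice rather than a single string matters here: "-w 5" passed as one argv element would reach iptables-restore as a single token, not as a flag followed by its value.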
Kubernetes Submit Queue
7c9293e1c3
Merge pull request #60973 from shyamjvs/revert-accidental-load-test-remove
Automatic merge from submit-queue.

Revert "[Test change - don't merge] Skip load test"

This reverts commit ba6bb999f7.

This was accidentally merged as part of #60891.

/cc @wojtek-t 
/sig scalability
/kind bug
/priority important-soon

```release-note
NONE
```
2018-03-09 05:34:18 -08:00
Kubernetes Submit Queue
b13105d43b
Merge pull request #60421 from gmarek/quotas
Automatic merge from submit-queue.

Use quotas in default performance tests

Better to use more features in default tests if possible.

LGTM whenever you think we're ready.

```release-note
NONE
```
2018-03-09 04:39:31 -08:00
Shyam Jeedigunta
62f62fc93a Revert "[Test change - don't merge] Skip load test"
This reverts commit ba6bb999f7.
2018-03-09 13:06:50 +01:00
wojtekt
875c1a7053 Fix upgrade tests for GKE Regional Clusters 2018-03-09 12:23:29 +01:00
Marian Lobur
81c6bb6ec2 Fix broken gke regional logging test. 2018-03-09 09:38:04 +01:00
Kubernetes Submit Queue
17d69c296e
Merge pull request #60959 from feiskyer/external-ip
Automatic merge from submit-queue.

Set node external IP for azure node when disabling UseInstanceMetadata

**What this PR does / why we need it**:

This PR sets the node external IP for Azure nodes when UseInstanceMetadata is disabled.

It also adds a check of whether it is running locally when UseInstanceMetadata is enabled.

**Which issue(s) this PR fixes**:
Fixes #60958

**Release note**:

```release-note
Set node external IP for azure node when disabling UseInstanceMetadata
```
2018-03-08 22:21:37 -08:00
yue9944882
68ad76bf53 move enum into function local 2018-03-09 14:20:58 +08:00
technicianted
659d9df117 added missing error check 2018-03-08 21:39:22 -08:00
Kubernetes Submit Queue
0aad894b9d
Merge pull request #60365 from CaoShuFeng/example_test
Automatic merge from submit-queue.

clean up elasticsearch from unit test

The elasticsearch example has been removed.

**Release note**:
```release-note
NONE
```
2018-03-08 21:06:33 -08:00
Pengfei Ni
3ae114cf08 Get external IP for azure standard nodes 2018-03-09 11:10:44 +08:00
Pengfei Ni
717fe5d0d6 Check whether it is running locally when UseInstanceMetadata 2018-03-09 11:09:33 +08:00
Di Xu
a08cb5b531 include file name in the error when visiting files 2018-03-09 10:19:20 +08:00
hzxuzhonghu
74121c70d6 update bazel 2018-03-09 09:23:33 +08:00
hzxuzhonghu
2b7fd92dce userspace: move udp echo server to proxier_test.go 2018-03-09 09:22:30 +08:00
Kubernetes Submit Queue
8f8201691e
Merge pull request #60450 from verult/repd-beta-integration
Automatic merge from submit-queue.

Change regional PD cloud provider references to use the beta API

**Which issue(s) this PR fixes**:
Fixes #59988

**Special notes for your reviewer**: Depends on a version of the GCP Go beta compute client that is not yet available. Also needs to be rebased onto #60337 once that is merged.

/hold
/cc @abgworrall 
/assign @saad-ali
2018-03-08 16:27:05 -08:00
Kubernetes Submit Queue
71b40cbce5
Merge pull request #60943 from jennybuckley/webhook-https-url
Automatic merge from submit-queue (batch tested with PRs 60906, 60943).

Make admission webhooks honor scheme part of url

**What this PR does / why we need it**:
Bug fix: allow webhooks to use the scheme provided in clientConfig instead of defaulting to http.
(More details in the issue.)

**Which issue(s) this PR fixes**:
Fixes #60942

```release-note
Bug fix, allow webhooks to use the scheme provided in clientConfig, instead of defaulting to http.
```

/kind bug
/sig api-machinery
2018-03-08 15:18:46 -08:00
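The difference between the old and fixed behavior described in the PR above can be shown with Go's `net/url`. This is an illustrative sketch, not the admission webhook client code: `targetURL`, its `honorScheme` switch, and the example host `webhook.example.com:8443` are hypothetical. `honorScheme=false` mimics the described bug (always http); `true` keeps the scheme from the configured URL.

```go
package main

import (
	"fmt"
	"net/url"
)

// targetURL is a hypothetical helper that turns a configured webhook URL into
// the URL a request would be sent to.
func targetURL(configured string, honorScheme bool) (string, error) {
	u, err := url.Parse(configured)
	if err != nil {
		return "", err
	}
	scheme := "http" // the old default the PR describes
	if honorScheme && u.Scheme != "" {
		scheme = u.Scheme
	}
	target := url.URL{Scheme: scheme, Host: u.Host, Path: u.Path, RawQuery: u.RawQuery}
	return target.String(), nil
}

func main() {
	const configured = "https://webhook.example.com:8443/validate"
	for _, honor := range []bool{false, true} {
		got, err := targetURL(configured, honor)
		if err != nil {
			panic(err)
		}
		fmt.Println(honor, got)
	}
	// prints:
	// false http://webhook.example.com:8443/validate
	// true https://webhook.example.com:8443/validate
}
```

With the old behavior, a webhook server that only listens for TLS would receive a plaintext request, which is why the configured scheme has to survive into the outgoing URL.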
Kubernetes Submit Queue
a5a81da4f3
Merge pull request #60906 from MrHohn/e2e-restart-apiserver-refine
Automatic merge from submit-queue.

[e2e service] Refine apiserver restart logic

**What this PR does / why we need it**:
Ref https://github.com/kubernetes/kubernetes/issues/60761#issuecomment-371308569: wait for the apiserver's restart count to increase before proceeding with the test.

**Which issue(s) this PR fixes**:
Fixes (hopefully) #60761

**Special notes for your reviewer**:
/assign @rramkumar1 @bowei 

**Release note**:

```release-note
NONE
```
2018-03-08 14:32:42 -08:00
Kubernetes Submit Queue
bcfdb39824
Merge pull request #60867 from Random-Liu/update-cadvisor
Automatic merge from submit-queue.

Update cadvisor to v0.29.1

Update cadvisor to v0.29.1 to include a bug fix for containerd integration. https://github.com/google/cadvisor/pull/1894

**Release note**:

```release-note
none
```
2018-03-08 13:49:23 -08:00
Kubernetes Submit Queue
9501c525a6
Merge pull request #60935 from shyamjvs/increase-volume-binder-log-verbosity
Automatic merge from submit-queue (batch tested with PRs 60891, 60935).

Increase verbosity of frequently printed logline in scheduler_binder

Fix https://github.com/kubernetes/kubernetes/issues/60933

/cc @wojtek-t @msau42 

```release-note
NONE
```

/milestone 1.10
/sig storage
2018-03-08 12:45:50 -08:00
Kubernetes Submit Queue
56195fd1d3
Merge pull request #60891 from shyamjvs/go-back-to-etcd-3.1.10
Automatic merge from submit-queue (batch tested with PRs 60891, 60935).

Rollback etcd server version to 3.1.11 due to #60589

Ref https://github.com/kubernetes/kubernetes/issues/60589#issuecomment-371171837

The dependencies were a bit complex (many things rely on it), and the version was updated to 3.2.16 on top of the original bump.
So I mostly had to make manual reverting changes on a case-by-case basis, which makes errors likely :)

/cc @wojtek-t @jpbetz 

```release-note
Downgrade default etcd server version to 3.1.11 due to #60589
```

(I'm not sure whether we should instead remove the release notes of the original PRs.)
2018-03-08 12:45:46 -08:00
jennybuckley
7d5696eb6d Make admission webhooks not ignore scheme 2018-03-08 11:35:13 -08:00
Tim Allclair
e004257919 Fix default auditing options.
- Log backend defaults to blocking mode (backwards compatibility)
- Fix webhook validation
- Add options test
2018-03-08 11:03:44 -08:00
Shyam Jeedigunta
8ff1f05f7c Increase verbosity of frequently printed logline in scheduler_binder 2018-03-08 19:25:01 +01:00
Mik Vyatskov
07905d6ee8 Make log audit backend configurable in GCE
Signed-off-by: Mik Vyatskov <vmik@google.com>
2018-03-08 14:09:32 +01:00
linweibin
db7b59dc0d fix TODO: test more SetType 2018-03-08 21:00:13 +08:00
Aleksandra Malinowska
42f756aeb0 Improve debug curl command 2018-03-08 13:56:44 +01:00
Shyam Jeedigunta
ba6bb999f7 [Test change - don't merge] Skip load test 2018-03-08 13:07:21 +01:00
Shyam Jeedigunta
21f5e69f08 Rollback etcd server version to 3.1.11 due to #60589 2018-03-08 13:07:15 +01:00
Da K. Ma
5adb2bad45 Task 2: Schedule DaemonSet Pods by default scheduler.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-03-08 17:36:49 +08:00
Harry Zhang
7a7f9dccd0 [PATCH] Use nodename as key 2018-03-07 22:10:47 -08:00
m1093782566
13a6306bea move openHostPorts and closeHostPorts into a common struct 2018-03-08 11:13:46 +08:00