5054 Commits

Author SHA1 Message Date
Pengfei Ni
95c3782043 Rewrite resolv.conf for dockershim
PR #29378 introduces ClusterFirstWithHostNet, but docker doesn't support
setting dns options together with hostnetwork. This commit rewrites
resolv.conf the same way dockertools does.
2017-03-20 18:45:39 +08:00
Pengfei Ni
079158fa08 CRI: add support for dns cluster first policy
PR #29378 introduces ClusterFirstWithHostNet policy but only dockertools
was updated to support the feature. This PR updates kuberuntime to
support it for all runtimes.

Also fixes #43352.
2017-03-20 17:50:38 +08:00
Pengfei Ni
99ed3202f3 Run hack/update-bazel.sh 2017-03-20 17:48:36 +08:00
Pengfei Ni
53b5f2df48 Add unit test for MakePortsAndBindings 2017-03-20 17:47:38 +08:00
Antonio Murdaca
caa6dd2599 pkg/kubelet/remote: fix typo
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2017-03-20 10:12:03 +01:00
Pengfei Ni
2ddaaec199 dockershim: process protocol correctly for port mapping 2017-03-20 16:52:24 +08:00
Kubernetes Submit Queue
7bc86d84c1 Merge pull request #43116 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 42828, 43116)

Apply taint tolerations for NoExecute for all static pods.

Fixed https://github.com/kubernetes/kubernetes/issues/42753


**Release note**:
```
Apply taint tolerations for NoExecute for all static pods.
```

cc/ @davidopp
2017-03-17 18:14:29 -07:00
Dawn Chen
d419efbe71 Fix unittest reflecting the default taint tolerations change for static
pods.
2017-03-17 14:06:34 -07:00
Dawn Chen
d26e906191 Apply taint tolerations for NoExecute for all static pods. 2017-03-17 09:50:27 -07:00
Julien Balestra
cd7c480f86 Kubelet:rkt Create any missing hostPath Volumes 2017-03-17 10:47:02 +01:00
Yu-Ju Hong
b1e6e7f774 Use the assert/require package in kubelet unit tests
This reduces the lines of code and improves readability.
2017-03-16 10:21:44 -07:00
Piotr Szczesniak
9bd05bdee4 Setup fluentd-ds-ready label in startup script not in kubelet 2017-03-16 13:18:31 +01:00
Kubernetes Submit Queue
ba25afd278 Merge pull request #40964 from tanshanshan/kubelet-unit-test
Automatic merge from submit-queue (batch tested with PRs 40964, 42967, 43091, 43115)

Improve code coverage for pkg/kubelet/status/generate.go

**What this PR does / why we need it**:

Improve code coverage for pkg/kubelet/status/generate.go from #39559

Thanks.

**Special notes for your reviewer**:

**Release note**:

```release-note
```
2017-03-15 16:08:23 -07:00
Kubernetes Submit Queue
222f69cf3c Merge pull request #43030 from yujuhong/rm_corrupted_checkpoint
Automatic merge from submit-queue (batch tested with PRs 42747, 43030)

dockershim: remove corrupted sandbox checkpoints

This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.

This is a workaround for #43021
2017-03-14 22:56:20 -07:00
Yu-Ju Hong
48afc7d4e0 dockershim: call sync() after writing the checkpoint
This ensures the checkpoint files are persisted.
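
A minimal sketch of the idea, not the dockershim code: `writeCheckpoint` and `examplePath` are hypothetical names, and `(*os.File).Sync` stands in for whatever sync call the commit actually makes. The point is only that the write is flushed before the function reports success.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// examplePath returns a scratch location for the demo (hypothetical helper).
func examplePath() string {
	return filepath.Join(os.TempDir(), "sandbox-checkpoint-example")
}

// writeCheckpoint is a hypothetical helper, not the dockershim implementation.
// It writes data to path and flushes it to disk before returning.
func writeCheckpoint(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	// Sync asks the OS to commit the file contents to stable storage.
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	path := examplePath()
	defer os.Remove(path)
	if err := writeCheckpoint(path, []byte(`{"version":"v1"}`)); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("checkpoint persisted")
}
```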
2017-03-14 18:36:51 -07:00
Pengfei Ni
91616f666a kubelet: check and enforce minimum docker api version 2017-03-15 09:28:06 +08:00
Kubernetes Submit Queue
6de28fab7d Merge pull request #42942 from vishh/gpu-cont-fix
Automatic merge from submit-queue (batch tested with PRs 42942, 42935)

[Bug] Handle container restarts and avoid using runtime pod cache while allocating GPUs

Fixes #42412

**Background**
Support for multiple GPUs is an experimental feature in v1.6. 
Container restarts were handled incorrectly, which resulted in stranding of GPUs.
Kubelet is incorrectly using runtime cache to track running pods which can result in race conditions (as it did in other parts of kubelet). This can result in the same GPU being assigned to multiple pods.

**What does this PR do**
This PR tracks assignment of GPUs to containers and returns pre-allocated GPUs instead of (incorrectly) allocating new GPUs.
GPU manager is updated to consume a list of active pods derived from apiserver cache instead of runtime cache.
Node e2e has been extended to validate this failure scenario.
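
The allocation tracking described above could be sketched roughly as follows. This is an illustration under assumptions, not the kubelet GPU manager: `gpuAllocator`, `newGPUAllocator`, `allocate`, and the container-key format are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// gpuAllocator is a hypothetical, simplified tracker.
type gpuAllocator struct {
	free     []string            // GPU device paths not assigned to any container
	assigned map[string][]string // container key -> GPUs already assigned
}

func newGPUAllocator(devices []string) *gpuAllocator {
	return &gpuAllocator{
		free:     append([]string(nil), devices...),
		assigned: map[string][]string{},
	}
}

// allocate returns the GPUs previously assigned to the container, if any,
// so a restarted container reuses them instead of taking new ones.
func (a *gpuAllocator) allocate(containerKey string, n int) ([]string, error) {
	if gpus, ok := a.assigned[containerKey]; ok {
		return gpus, nil
	}
	if n > len(a.free) {
		return nil, errors.New("not enough free GPUs")
	}
	gpus := append([]string(nil), a.free[:n]...)
	a.free = a.free[n:]
	a.assigned[containerKey] = gpus
	return gpus, nil
}

func main() {
	alloc := newGPUAllocator([]string{"/dev/nvidia0", "/dev/nvidia1"})
	first, _ := alloc.allocate("pod-a/trainer", 1)
	// Simulated container restart: the same key asks again.
	second, _ := alloc.allocate("pod-a/trainer", 1)
	fmt.Println(first, second, "free:", len(alloc.free))
}
```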

**Risk**
Minimal/None since support for GPUs is an experimental feature that is turned off by default. The code is also isolated to GPU manager in kubelet.

**Workarounds**
In the absence of this PR, users can mitigate the original issue by setting `RestartPolicyNever` in their pods.
There is no workaround for the race condition caused by using the runtime cache though.
Hence it is worth including this fix in v1.6.0.

cc @jianzhangbjz @seelam @kubernetes/sig-node-pr-reviews 

Replaces #42560
2017-03-14 10:19:17 -07:00
Lou Yihua
63f1b077dc Add Host field to TCPSocketAction
Currently, TCPSocketAction always uses Pod's IP in connection. But when a
pod uses the host network, sometimes firewall rules may prevent kubelet
from connecting through the Pod's IP. This PR introduces the 'Host' field
for TCPSocketAction, and if it is set to non-empty string, the probe will
be performed on the configured host rather than the Pod's IP. This gives
users an opportunity to explicitly specify 'localhost' as the target for
the above situations.
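
The host selection described here can be sketched as a small helper. `probeAddress` is a hypothetical name used only for illustration; the actual kubelet prober code may be structured differently.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// probeAddress is a hypothetical helper: it prefers the configured host and
// falls back to the pod IP when the host is empty.
func probeAddress(host, podIP string, port int) string {
	if host == "" {
		host = podIP
	}
	return net.JoinHostPort(host, strconv.Itoa(port))
}

func main() {
	fmt.Println(probeAddress("", "10.0.0.5", 8080))          // pod IP is used
	fmt.Println(probeAddress("localhost", "10.0.0.5", 8080)) // explicit host wins
}
```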
2017-03-14 23:48:28 +08:00
Kubernetes Submit Queue
f1e9004da9 Merge pull request #42927 from Random-Liu/fix-kubelet-panic
Automatic merge from submit-queue (batch tested with PRs 42802, 42927, 42669, 42988, 43012)

Fix kubelet panic in cgroup manager.

Fixes https://github.com/kubernetes/kubernetes/issues/42920.
Fixes https://github.com/kubernetes/kubernetes/issues/42875
Fixes #42927 
Fixes #43059

Check the error in walk function, so that we don't use info when there is an error.
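
A sketch of the pattern, using `filepath.Walk` from the standard library. `listDirs` is a hypothetical function and not the cgroup manager code; the relevant part is that `err` is checked before `info` is touched, because `info` may be nil when `err` is set.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// listDirs is a hypothetical example that collects directories under root.
func listDirs(root string) ([]string, error) {
	var dirs []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		// When err is non-nil, info may be nil, so return before using it.
		if err != nil {
			return err
		}
		if info.IsDir() {
			dirs = append(dirs, path)
		}
		return nil
	})
	return dirs, err
}

func main() {
	// The root does not exist, so the walk function is called with an error.
	_, err := listDirs("/path/that/does/not/exist")
	fmt.Println("walk error:", err)
}
```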

@yujuhong @dchen1107 @derekwaynecarr @vishh /cc @kubernetes/sig-node-bugs
2017-03-14 07:31:31 -07:00
Yu-Ju Hong
035afab901 dockershim: remove corrupted sandbox checkpoints
This is a workaround to ensure that kubelet doesn't block forever when
the checkpoint is corrupted.
2017-03-13 15:41:01 -07:00
Random-Liu
e6341cc3c7 Fix kubelet panic in cgroup manager. 2017-03-13 12:06:08 -07:00
Vishnu kannan
ad743a922a remove dead code in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu kannan
ff158090b3 use active pods instead of runtime pods in gpu manager
Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
Vishnu Kannan
8ed9bff073 handle container restarts for GPUs
Signed-off-by: Vishnu Kannan <vishnuk@google.com>
2017-03-13 10:58:26 -07:00
tanshanshan
26ab52a3cb fix 2017-03-13 10:00:19 +08:00
Kubernetes Submit Queue
59aa924a9b Merge pull request #42642 from fraenkel/envfrom
Automatic merge from submit-queue

Invalid environment var names are reported and pod starts

When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.
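
One way to picture this behaviour, as a sketch only: `splitEnvFrom` is a hypothetical function, and the C-identifier regular expression is an assumed validity rule that may not match what the kubelet actually checks.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// envNameRE is an assumed validity rule (C identifier), for illustration only.
var envNameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// splitEnvFrom is hypothetical: it returns the usable variables and a sorted
// list of the keys that were skipped.
func splitEnvFrom(data map[string]string) (map[string]string, []string) {
	valid := map[string]string{}
	var invalid []string
	for k, v := range data {
		if envNameRE.MatchString(k) {
			valid[k] = v
		} else {
			invalid = append(invalid, k)
		}
	}
	sort.Strings(invalid)
	return valid, invalid
}

func main() {
	env, invalid := splitEnvFrom(map[string]string{
		"LOG_LEVEL": "debug",
		"1bad":      "x",
		"has-dash":  "y",
	})
	if len(invalid) > 0 {
		// One combined message stands in for the single event.
		fmt.Printf("skipped invalid keys: %s\n", strings.Join(invalid, ", "))
	}
	// The pod still starts with the remaining variables.
	fmt.Println("starting with", len(env), "variable(s)")
}
```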

fixes #42583
2017-03-10 17:37:31 -08:00
timchenxiaoyu
c295514443 accurate hint 2017-03-10 16:41:51 +08:00
Kubernetes Submit Queue
d790851c8f Merge pull request #42694 from dchen1107/master
Automatic merge from submit-queue (batch tested with PRs 42734, 42745, 42758, 42814, 42694)

Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to 1.22

cc/ @Random-Liu @yujuhong 

We have talked about dropping docker 1.9.x support for a while. I just realized that we haven't really done it yet.

```release-note
Dropped support for docker 1.9.x and below.
```
2017-03-09 15:07:00 -08:00
Dawn Chen
69eaea2fcc Merge pull request #42779 from dashpole/fix_status
[Bug Fix] Allow Status Updates for Pods that can be deleted
2017-03-09 13:23:00 -08:00
David Ashpole
e3e0bc6ce0 do not skip pods that can be deleted 2017-03-09 09:35:50 -08:00
Kubernetes Submit Queue
9cfc4f1a10 Merge pull request #42739 from yujuhong/created_time
Automatic merge from submit-queue (batch tested with PRs 42762, 42739, 42425, 42778)

FakeDockerClient: add creation timestamp

This fixes #42736
2017-03-09 02:51:38 -08:00
Kubernetes Submit Queue
4cf553f78e Merge pull request #42767 from Random-Liu/cleanup-infra-container-on-error
Automatic merge from submit-queue (batch tested with PRs 42768, 42760, 42771, 42767)

Stop sandbox container when hitting a network error.

Fixes https://github.com/kubernetes/kubernetes/issues/42698.

This PR stops the sandbox container when hitting a network error.
This PR also adds a unit test for it.

I'm not sure whether we should try to tear down the pod network after a `SetUpPod` failure. We don't do that in dockertools: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L2276.

@yujuhong @freehan
2017-03-09 00:08:01 -08:00
Michael Fraenkel
c4d07466e8 Invalid environment var names are reported and pod starts
When processing EnvFrom items, all invalid keys are collected and
reported as a single event.

The Pod is allowed to start.
2017-03-09 07:21:53 +00:00
Kubernetes Submit Queue
6fac75c80a Merge pull request #42768 from yujuhong/fix_sandbox_listing
Automatic merge from submit-queue

dockershim: Fix the race condition in ListPodSandbox

In ListPodSandbox(), we
 1. List all sandbox docker containers
 2. List all sandbox checkpoints. If the checkpoint does not have a
    corresponding container in (1), we return a partial result based on
    the checkpoint.

The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp would be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.

This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
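
The ordering argument above can be illustrated with a toy model. Everything here is hypothetical (`store`, `create`, `listSandboxes`, and the `between` hook that simulates a concurrent creation); it assumes, as the message implies, that a sandbox's container is created before its checkpoint.

```go
package main

import "fmt"

// store is a hypothetical stand-in for docker plus the checkpoint directory.
type store struct {
	checkpoints []string
	containers  []string
}

// create follows the assumed creation order: container first, then checkpoint.
func (s *store) create(id string) {
	s.containers = append(s.containers, id)
	s.checkpoints = append(s.checkpoints, id)
}

// listSandboxes reads checkpoints before containers. The between hook runs
// after the first read to simulate a sandbox being created mid-listing.
func listSandboxes(s *store, between func()) (full, partial []string) {
	// Step 1: snapshot the checkpoints first.
	checkpoints := append([]string(nil), s.checkpoints...)
	if between != nil {
		between()
	}
	// Step 2: list the containers afterwards.
	hasContainer := map[string]bool{}
	for _, id := range s.containers {
		hasContainer[id] = true
		full = append(full, id)
	}
	// Only checkpoints without any container yield a partial result.
	for _, id := range checkpoints {
		if !hasContainer[id] {
			partial = append(partial, id)
		}
	}
	return full, partial
}

func main() {
	s := &store{}
	s.create("sandbox-a")
	full, partial := listSandboxes(s, func() { s.create("sandbox-b") })
	fmt.Println("full:", full, "partial:", partial)
}
```

With this order, a sandbox created between the two reads can only show up with its container present, so it never appears as a checkpoint-only partial result.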
2017-03-08 21:33:31 -08:00
Kubernetes Submit Queue
ec46846a25 Merge pull request #38691 from xiangpengzhao/fix-empty-logpath
Automatic merge from submit-queue (batch tested with PRs 42211, 38691, 42737, 42757, 42754)

Only create the symlink when container log path exists

When using the `syslog` logging driver instead of `json-file`, there will not be container log files such as `<containerID>-json.log`. We should not create the symlink in this case.
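
A minimal sketch of the check, assuming a hypothetical `linkContainerLog` helper rather than the kubelet's own function: the symlink is created only when the target log file exists.

```go
package main

import (
	"fmt"
	"os"
)

// linkContainerLog is a hypothetical helper. It reports whether a symlink was
// created; a missing log file is not treated as an error.
func linkContainerLog(realPath, linkPath string) (bool, error) {
	if _, err := os.Stat(realPath); err != nil {
		if os.IsNotExist(err) {
			// No log file (for example with a non-file logging driver): skip.
			return false, nil
		}
		return false, err
	}
	if err := os.Symlink(realPath, linkPath); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	created, err := linkContainerLog("/no/such/container-json.log", "/tmp/example-link.log")
	fmt.Println("created:", created, "err:", err)
}
```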
2017-03-08 18:52:26 -08:00
timchenxiaoyu
0bfbd40d4c fix some typos 2017-03-09 09:34:43 +08:00
Random-Liu
2690461cbb Stop sandbox container when hitting a network error. 2017-03-08 17:28:42 -08:00
Eric Paris
df590da6ab Return early from eviction debug helpers if !glog.V(3)
Should keep us from running a bunch of loops needlessly.
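
The early-return pattern looks roughly like this. To stay self-contained the sketch does not import glog: `verbosity` and `v` are hypothetical stand-ins for the `glog.V(3)` check, and `debugLogPods` is an invented helper name.

```go
package main

import "fmt"

// verbosity is a hypothetical stand-in for the logger's configured level.
var verbosity = 0

// v mimics a verbosity check such as glog.V(level).
func v(level int) bool { return verbosity >= level }

// debugLogPods returns early when debug logging is off, so the loop is
// skipped entirely. It returns the number of lines it logged.
func debugLogPods(pods []string) int {
	if !v(3) {
		return 0
	}
	n := 0
	for _, p := range pods {
		fmt.Println("eviction candidate:", p)
		n++
	}
	return n
}

func main() {
	pods := []string{"pod-a", "pod-b"}
	fmt.Println("logged:", debugLogPods(pods)) // verbosity 0: nothing logged
	verbosity = 3
	fmt.Println("logged:", debugLogPods(pods))
}
```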
2017-03-08 20:19:52 -05:00
Yu-Ju Hong
38d8da1215 FakeDockerClient: add creation timestamp
This is necessary for kubemark to work correctly.
2017-03-08 17:11:16 -08:00
timchenxiaoyu
767719ea9c fix across typo 2017-03-09 09:07:21 +08:00
Yu-Ju Hong
8328a66bdf dockershim: Fix the race condition in ListPodSandbox
In ListPodSandbox(), we
 1. List all sandbox docker containers
 2. List all sandbox checkpoints. If the checkpoint does not have a
    corresponding container in (1), we return a partial result based on
    the checkpoint.

The problem is that new PodSandboxes can be created between step (1) and
(2). In those cases, we will see the checkpoints, but not the sandbox
containers. This leads to strange behavior because the partial result
from the checkpoint does not include some critical information. For
example, the creation timestamp would be zero, and that would cause kubelet's
garbage collector to immediately remove the sandbox.

This change fixes that by getting the list of checkpoints before listing
all the containers (since in RunPodSandbox we create them in the reverse
order).
2017-03-08 17:02:34 -08:00
Yu-Ju Hong
1095652cb8 Add more logs to help debugging 2017-03-08 12:27:49 -08:00
xiangpengzhao
7fed242d55 Only create the symlink when container log path exists 2017-03-08 01:36:48 -05:00
Kubernetes Submit Queue
5bc7387b3c Merge pull request #42169 from ncdc/pprof-trace
Automatic merge from submit-queue (batch tested with PRs 42692, 42169, 42173)

Add pprof trace support

Add support for `/debug/pprof/trace`

Can wait for master to reopen for 1.7.

cc @smarterclayton @wojtek-t @gmarek @timothysc @jeremyeder @kubernetes/sig-scalability-pr-reviews
2017-03-07 20:10:26 -08:00
Dawn Chen
ab790b6a3a Dropped docker 1.9.x support. Changed the minimumDockerAPIVersion to 1.22
2017-03-07 17:07:07 -08:00
Kubernetes Submit Queue
1ed3aa6750 Merge pull request #42264 from yujuhong/kubemark_cri
Automatic merge from submit-queue

kubemark: enable CRI for the hollow nodes

This fixes #41488
2017-03-07 13:04:39 -08:00
Matthew Wong
1dabce9815 Print dereferenced pod status fields when logging status update 2017-03-07 15:00:54 -05:00
Yu-Ju Hong
a0f90e1490 Use FakeDockerPuller to bypass auth/keyring logic in tests 2017-03-07 10:11:49 -08:00
Yu-Ju Hong
516848c37d Various fixes for the fake docker client
* Properly return ImageNotFoundError
* Support injecting "Images" or "ImageInspects" and keep both in sync.
* Remove the FakeDockerPuller and let FakeDockerClient subsume its
  functionality. This reduces the overhead of maintaining both objects.
* Various small fixes and refactoring of the testing utils.
2017-03-07 10:11:49 -08:00
Kubernetes Submit Queue
5cc6a4e269 Merge pull request #42609 from intelsdi-x/test-out-of-oir
Automatic merge from submit-queue (batch tested with PRs 41890, 42593, 42633, 42626, 42609)

Pods pending due to insufficient OIR should get scheduled once sufficient OIR becomes available (e2e disabled).

#41870 was reverted because it introduced an e2e test flake. This is the same code with the e2e for OIR disabled again.

We can attempt to enable the e2e test cases one-by-one in follow-up PRs, but it would be preferable to get the main fix merged in time for 1.6 since OIR is broken on master (see #41861).

cc @timothysc
2017-03-07 08:10:46 -08:00