Commit Graph

2882 Commits

Author SHA1 Message Date
k8s-merge-robot
d93f80c86b Merge pull request #26677 from Random-Liu/add-image-pull-timeout
Automatic merge from submit-queue

Add timeout for image pulling

Fix #26300.

With this PR, if image pulling makes no progress for *1 minute*, the operation will be cancelled. Docker reports progress for every 512kB block (See [here](3d13fddd2b/pkg/progress/progressreader.go (L32))), *512kB/min* means the throughput is *<= 8.5kB/s*, which should be kind of abnormal?

It's a little hard to write unit test for this, so I just manually tested it. If I set the `defaultImagePullingStuckTimeout` to 0s, and `defaultImagePullingProgressReportInterval` to 1s, image pulling will be cancelled.
```
E0601 18:48:29.026003   46185 kube_docker_client.go:274] Cancel pulling image "nginx:latest" because of no progress for 0, latest progress: "89732b811e7f: Pulling fs layer "
E0601 18:48:29.026308   46185 manager.go:2110] container start failed: ErrImagePull: net/http: request canceled
```

/cc @kubernetes/sig-node 
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
2016-06-03 04:02:20 -07:00
k8s-merge-robot
75ef1ca270 Merge pull request #26351 from saad-ali/attachDetachControllerKubeletChanges
Automatic merge from submit-queue

Attach/Detach Controller Kubelet Changes

This PR contains changes to enable attach/detach controller proposed in #20262.

Specifically it:
* Introduces a new `enable-controller-attach-detach` kubelet flag to enable control by attach/detach controller. Default enabled.
* Removes all references `SafeToDetach` annotation from controller.
* Adds the new `VolumesInUse` field to the Node Status API object.
* Modifies the controller to use `VolumesInUse` instead of `SafeToDetach` annotation to gate detachment.
* Modifies kubelet to set `VolumesInUse` before Mount and after Unmount.
  * There is a bug in the `node-problem-detector` binary that causes `VolumesInUse` to get reset to nil every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9#issuecomment-221770924 opened to fix that.
  * There is a bug here in the mount/unmount code that prevents resetting `VolumeInUse in some cases, this will be fixed by mount/unmount refactor.
* Have controller process detaches before attaches so that volumes referenced by pods that are rescheduled to a different node are detached first.
* Fix misc bugs in controller.
* Modify GCE attacher to: remove retries, remove mutex, and not fail if volume is already attached or already detached.

Fixes #14642, #19953

```release-note
Kubernetes v1.3 introduces a new Attach/Detach Controller. This controller manages attaching and detaching volumes on-behalf of nodes that have the "volumes.kubernetes.io/controller-managed-attach-detach" annotation.

A kubelet flag, "enable-controller-attach-detach" (default true), controls whether a node sets the "controller-managed-attach-detach" or not.
```
2016-06-02 23:30:32 -07:00
Saad Ali
9dbe943491 Attach/Detach Controller Kubelet Changes
This PR contains Kubelet changes to enable attach/detach controller control.
* It introduces a new "enable-controller-attach-detach" kubelet flag to
  enable control by controller. Default enabled.
* It removes all references "SafeToDetach" annoation from controller.
* It adds the new VolumesInUse field to the Node Status API object.
* It modifies the controller to use VolumesInUse instead of SafeToDetach
  annotation to gate detachment.
* There is a bug in node-problem-detector that causes VolumesInUse to
  get reset every 30 seconds. Issue https://github.com/kubernetes/node-problem-detector/issues/9
  opened to fix that.
2016-06-02 16:47:11 -07:00
k8s-merge-robot
4c316979c8 Merge pull request #25851 from euank/fixJournaldUsage
Automatic merge from submit-queue

rkt: Get logs via syslog identifier

This change works around https://github.com/coreos/rkt/issues/2630
Without this change, logs cannot reliably be collected for containers
with short lifetimes.

With this change, logs cannot be collected on rkt versions v1.6.0 and
before.

I'd like to also bump the required rkt version, but I don't want to do that until there's a released version that can be pointed to (so the next rkt release).

I haven't added tests (which were missing) because this code will be removed if/when logs are retrieved via the API. I have run E2E tests with this merged in and verified the tests which previously failed no longer fail.

cc @yifan-gu
2016-06-02 15:53:39 -07:00
Random-Liu
49c8683c24 Add timeout for image pulling 2016-06-02 10:49:17 -07:00
k8s-merge-robot
a27058156f Merge pull request #24901 from yifan-gu/support_selinux
Automatic merge from submit-queue

rkt: Add pod selinux support.

Currently only pod level selinux context is supported, besides when
running selinux, we will not be able to use the overlay fs, see:
https://github.com/coreos/rkt/issues/1727#issuecomment-173203129.


cc @kubernetes/sig-node  @alban @mjg59 @pmorie
2016-06-02 07:48:02 -07:00
Yifan Gu
0a7537ecbf rkt: Add pod selinux support.
Currently only pod level selinux context is supported, besides when
running selinux, for now we will not be able to use the overlay fs
except for coreos, see:
https://github.com/coreos/rkt/issues/1727#issuecomment-173203129.
2016-06-02 00:55:27 +08:00
k8s-merge-robot
6277eea57b Merge pull request #26200 from yifan-gu/remove_systemd_quotes
Automatic merge from submit-queue

rkt: Remove quotes in the systemd ExecStart command.

cc @euank @dcbw
2016-06-01 03:13:19 -07:00
Euan Kemp
f028a9f410 rkt: Update minimum rkt version to 1.7.0
Also remove the redundant `appcVersion` check, that version should
already be captured in the rkt version
2016-05-31 15:24:51 -07:00
Euan Kemp
d0a31873d7 rkt: Get logs via syslog identifier
This change works around https://github.com/coreos/rkt/issues/2630
Without this change, logs cannot reliably be collected for containers
with short lifetimes.

With this change, logs cannot be collected on rkt versions v1.6.0 and
before.
2016-05-31 15:23:46 -07:00
Yifan Gu
6cb87e8d69 rkt: Remove quotes in the systemd ExecStart command.
With quotes, the service doesn't start for systemd 219 with the error
saying the path of the netns cannot be found.

This PR fixes the bug by removing the quotes surround the netns path.
2016-05-31 22:16:42 +08:00
Yifan Gu
1d40f471b4 rkt: Fix docker auth config save directory to avoid race. 2016-05-30 20:40:31 +08:00
k8s-merge-robot
77de942e08 Merge pull request #26451 from Random-Liu/cache_image_history
Automatic merge from submit-queue

Kubelet: Cache image history to eliminate the performance regression

Fix https://github.com/kubernetes/kubernetes/issues/25057.

The image history operation takes almost 50% of cpu usage in kubelet performance test. We should cache image history instead of getting it from runtime everytime.

This PR cached image history in imageStatsProvider and added unit test.

@yujuhong @vishh 
/cc @kubernetes/sig-node 

Mark v1.3 because this is a relatively significant performance regression.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()
2016-05-29 20:49:01 -07:00
k8s-merge-robot
32da727ca1 Merge pull request #26264 from luxas/remove_flannel_default
Automatic merge from submit-queue

Do not call NewFlannelServer() unless flannel overlay is enabled

Ref: #26093 

This makes so kubelet does not warn the user that iptables isn't in PATH, although the user didn't enable the flannel overlay.

@vishh @freehan @bprashanth
2016-05-29 15:49:00 -07:00
k8s-merge-robot
eed13d702f Merge pull request #26253 from xiangpengzhao/fix_assertnotnil
Automatic merge from submit-queue

Add assert.NotNil for test case

I hardcode the `DefaultInterfaceName` from `eth0` to `eth-k8sdefault` at release 1.2.0,  in order to test my CNI plugins. When running the test, it panics and prints wrongly formatted messages as below.

In the test case `TestBuildSummary`, `containerInfoV2ToNetworkStats` will return `nil` if `DefaultInterfaceName` is not `eth0`. So maybe we should add `assert.NotNil` to the test case.

```
ok      k8s.io/kubernetes/pkg/kubelet/server    0.591s
W0523 03:25:28.257074    2257 summary.go:311] Missing default interface "eth-k8sdefault" for s%!(EXTRA string=node:FooNode)
W0523 03:25:28.257322    2257 summary.go:311] Missing default interface "eth-k8sdefault" for s%!(EXTRA string=pod:test0_pod1)
W0523 03:25:28.257361    2257 summary.go:311] Missing default interface "eth-k8sdefault" for s%!(EXTRA string=pod:test0_pod0)
W0523 03:25:28.257419    2257 summary.go:311] Missing default interface "eth-k8sdefault" for s%!(EXTRA string=pod:test2_pod0)
--- FAIL: TestBuildSummary (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x471817]

goroutine 16 [running]:
testing.func·006()
        /usr/src/go/src/testing/testing.go:441 +0x181
k8s.io/kubernetes/pkg/kubelet/server/stats.checkNetworkStats(0xc20806d3b0, 0x140bbc0, 0x4, 0x0, 0x0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/summary_test.go:296 +0xc07
k8s.io/kubernetes/pkg/kubelet/server/stats.TestBuildSummary(0xc20806d3b0)
        /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/summary_test.go:124 +0x11d2
testing.tRunner(0xc20806d3b0, 0x1e43180)
        /usr/src/go/src/testing/testing.go:447 +0xbf
created by testing.RunTests
        /usr/src/go/src/testing/testing.go:555 +0xa8b
```
2016-05-29 14:13:00 -07:00
k8s-merge-robot
98af443209 Merge pull request #26398 from euank/various-kubenet-fixes
Automatic merge from submit-queue

Various kubenet fixes (panics and bugs and cidrs, oh my)

This PR fixes the following issues:

1. Corrects an inverse error-check that prevented `shaper.Reset` from ever being called with a correct ip address
2. Fix an issue where `parseCIDR` would fail after a kubelet restart due to an IP being stored instead of a CIDR being stored in the cache.
3. Fix an issue where kubenet could panic in TearDownPod if it was called before SetUpPod (e.g. after a kubelet restart).. because of bug number 1, this didn't happen except in rare situations (see 2 for why such a rare situation might happen)

This adds a test, but more would definitely be useful.
The commits are also granular enough I could split this up more if desired.

I'm also not super-familiar with this code, so review and feedback would be welcome.

Testing done:
```
$ cat examples/egress/egress.yml
 apiVersion: v1
kind: Pod
metadata:
  labels:
    name: egress
  name: egress-output
  annotations: {"kubernetes.io/ingress-bandwidth": "300k"}
spec:
  restartPolicy: Never
  containers:
    - name: egress
      image: busybox
      command: ["sh", "-c", "sleep 60"]
$ cat kubelet.log
...
Running: tc filter add dev cbr0 protocol ip parent 1:0 prio 1 u32 match ip dst 10.0.0.5/32 flowid 1:1
# setup
...
Running: tc filter del dev cbr0 parent 1:proto ip prio 1 handle 800::800 u32
# teardown
```

I also did various other bits of manual testing and logging to hunt down the panic and other issues, but don't have anything to paste for that 

cc @dcbw @kubernetes/sig-network
2016-05-29 04:04:22 -07:00
k8s-merge-robot
577cdf937d Merge pull request #26415 from wojtek-t/network_not_ready
Automatic merge from submit-queue

Add a NodeCondition "NetworkUnavaiable" to prevent scheduling onto a node until the routes have been created 

This is new version of #26267 (based on top of that one).

The new workflow is:
- we have an "NetworkNotReady" condition
- Kubelet when it creates a node, it sets it to "true"
- RouteController will set it to "false" when the route is created
- Scheduler is scheduling only on nodes that doesn't have "NetworkNotReady ==true" condition

@gmarek @bgrant0607 @zmerlynn @cjcullen @derekwaynecarr @danwinship @dcbw @lavalamp @vishh
2016-05-29 03:06:59 -07:00
k8s-merge-robot
d00dec7825 Merge pull request #26397 from euank/fixReadOnlyRootfsPanic
Automatic merge from submit-queue

rkt: Fix panic in setting ReadOnlyRootFS

What the title says. I wish this method were broken out in a reasonably unit testable way. fixing this panic is more important for the second though, testing will come in a later commit.

I observed the panic in a `./hack/local-up-cluster.sh` run with rkt as the container runtime.

This is also the panic that's failing our jenkins against master ([recent run](https://console.cloud.google.com/m/cloudstorage/b/rktnetes-jenkins/o/logs/kubernetes-e2e-gce/1946/artifacts/jenkins-e2e-minion-group-qjh3/kubelet.log for the log output of a recent run))

cc @tmrts @yifan-gu
2016-05-29 02:17:09 -07:00
k8s-merge-robot
344f26ae69 Merge pull request #26145 from Random-Liu/image-pulling-progress
Automatic merge from submit-queue

Kubelet: Periodically reporting image pulling progress in log

Addresses https://github.com/kubernetes/kubernetes/issues/26075#issuecomment-221129896 and https://github.com/kubernetes/kubernetes/pull/26122#issuecomment-221128397.

This PR changes kube_docker_client to log pulling progress every *10* seconds. We can't print all progress messages into the log, because there are too many. So I make it report newest progress every 10 seconds to reduce log spam.
If the image pulling is too slow or stuck, we'll see image pulling progress unchanged or changed little overtime.

The following is the result if I set the reporting interval to *1* second.
```
I0524 00:53:26.189086  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "17b6a9e179d7: Pulling fs layer "
I0524 00:53:27.189082  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "202e40f8bb3a: Download complete "
I0524 00:53:28.189160  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Downloading [=>                                                 ] 1.474 MB/48.35 MB"
I0524 00:53:29.189156  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Downloading [====>                                              ] 3.931 MB/48.35 MB"
I0524 00:53:30.189089  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Downloading [=========>                                         ] 8.847 MB/48.35 MB"
I0524 00:53:31.189089  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Downloading [==================>                                ] 18.19 MB/48.35 MB"
I0524 00:53:32.189076  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Downloading [=======================================>           ] 38.34 MB/48.35 MB"
I0524 00:53:33.189106  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Extracting [=============>                                     ] 12.78 MB/48.35 MB"
I0524 00:53:34.189067  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Extracting [==================================>                ] 33.42 MB/48.35 MB"
I0524 00:53:35.189083  145099 kube_docker_client.go:252] Pulling image "ubuntu:latest": "487bffc61de6: Extracting [==================================================>] 48.35 MB/48.35 MB"
I0524 00:53:35.376667  145099 kube_docker_client.go:254] Finish pulling image "ubuntu:latest": "Status: Downloaded newer image for ubuntu:latest"
```

Ref image pulling related issue #19077.

[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/.github/PULL_REQUEST_TEMPLATE.md?pixel)]()

@yujuhong @dchen1107 
/cc @kubernetes/sig-node
2016-05-28 13:34:28 -07:00
k8s-merge-robot
350efaf13d Merge pull request #26096 from euank/set-pod-ip
Automatic merge from submit-queue

rkt: Pass through podIP

This is needed for the /etc/hosts mount and the downward API to work.
Furthermore, this is required for the reported `PodStatus` to be
correct.

The `Status` bit mostly worked prior to #25062, and this restores that
functionality in addition to the new functionality.

In retrospect, the regression in status is large enough the prior PR should have included at least some of this; my bad for not realizing the full implications there.

#25902 is needed for downwards api stuff, but either merge order is fine as neither will break badly by itself.

cc @yifan-gu @dcbw
2016-05-28 12:40:39 -07:00
k8s-merge-robot
03fc51f74f Merge pull request #26046 from timoreimann/stabilize-map-order-in-kubectl-describe
Automatic merge from submit-queue

Stabilize map order in kubectl describe

Refs #25251.

Add `SortedResourceNames()` methods to map type aliases in order to achieve stable output order for `kubectl` descriptors.

This affects QoS classes, resource limits, and resource requests.

A few remarks:

1. I couldn't find map usages for described fields other than the ones mentioned above. Then again, I failed to identify those programmatically/systematically. Pointers given, I'd be happy to cover any gaps within this PR or along additional ones.
1. It's somewhat difficult to deterministically test a function that brings reliable ordering to Go maps due to its randomizing nature. None of the possibilities I came up with (rely a "probabilistic testing" against repeatedly created maps, add complexity through additional interfaces) seemed very appealing to me, so I went with testing my `sort.Interface` implementation and the changed logic in `kubectl.describeContainers()`.
1. It's apparently not possible to implement a single function that sorts any map's keys generically in Go without producing lots of boilerplate: a `map[<key type>]interface{}` is different from any other map type and thus requires explicit iteration on the caller site to convert back and forth. Unfortunately, this makes it hard to completely avoid code/test duplication.

Please let me know what you think.
2016-05-28 10:49:57 -07:00
Wojciech Tyczynski
fcfaf1a3bd Revert "Fix system container detection in kubelet on systemd" 2016-05-28 16:11:53 +02:00
k8s-merge-robot
c730198aad Merge pull request #25982 from derekwaynecarr/fix_stats
Automatic merge from submit-queue

Fix system container detection in kubelet on systemd

```release-note
Fix system container detection in kubelet on systemd.

This fixed environments where CPU and Memory Accounting were not enabled on the unit 
that launched the kubelet or docker from reporting the root cgroup when 
monitoring usage stats for those components.
```

Fixes https://github.com/kubernetes/kubernetes/issues/25909

/cc @kubernetes/sig-node @kubernetes/rh-cluster-infra @vishh @dchen1107
2016-05-28 05:38:15 -07:00
k8s-merge-robot
e7a13ac2ad Merge pull request #25902 from euank/changeVolumeMounts
Automatic merge from submit-queue

rkt: Use volumes from RunContainerOptions

This replaces the previous creation of mounts from the `volumeGetter`
with mounts provided via RunContainerOptions.

This is motivated by the fact that the latter has a more complete set of
mounts (e.g. the `/etc/hosts` one created in kubelet.go in the case an IP is available).

This does not induce further e2e failures as far as I can tell.

cc @yifan-gu
2016-05-28 03:58:14 -07:00
k8s-merge-robot
74b20cccc6 Merge pull request #25813 from rrati/kubelet-pods-per-core
Automatic merge from submit-queue

Added pods-per-core to kubelet. #25762

Added --pods-per-core to kubelet

#25762
2016-05-28 03:08:28 -07:00
k8s-merge-robot
f32b2582df Merge pull request #26391 from timstclair/containerd
Automatic merge from submit-queue

Move containerd process into docker cgroup for versions >= v1.11

Addresses https://github.com/kubernetes/kubernetes/issues/23397#issuecomment-209583923

/cc @vishh @kubernetes/sig-node
2016-05-27 19:42:48 -07:00
Euan Kemp
c83ad19ae9 kubenet: Fix ipv4 validity check
The length of an IP can be 4 or 16, and even if 16 it can be a valid
ipv4 address. This check is the more-correct way to handle this, and it
also provides more granular error messages.
2016-05-27 16:25:14 -07:00
Alex Robinson
91f8c784a0 Merge pull request #21373 from enoodle/read_cadvisor_cloudinfo_in_kubelet
kubelet: reading cloudinfo from cadvisor
2016-05-27 16:14:24 -07:00
Alex Robinson
cddf564f3c Merge pull request #24771 from timstclair/event-store
Disable cAdvisor event storage by default
2016-05-27 15:56:13 -07:00
Random-Liu
52a3d8a19d Add unit test for image history cache 2016-05-27 14:49:48 -07:00
Random-Liu
56bde2df9f Cache image history 2016-05-27 14:49:48 -07:00
Alex Robinson
1cca499e92 Merge pull request #26225 from yujuhong/less_noise
Reduce noise in kubelet.log
2016-05-27 14:28:53 -07:00
Euan Kemp
abbd0321b2 rkt: Use volumes from RunContainerOptions
This replaces the previous creation of mounts from the `volumeGetter`
with mounts provided via RunContainerOptions.

This is motivated by the fact that the latter has a more complete set of
mounts (e.g. the `/etc/hosts` one created in kubelet.go).
2016-05-27 13:11:47 -07:00
Alex Robinson
bd0b94efe2 Merge pull request #26029 from luxas/mkdir_all
kubelet: Use MkdirAll instead of Mkdir
2016-05-27 11:40:01 -07:00
Alex Robinson
789b69758e Merge pull request #25688 from sjpotter/rkt_annotations
kubelet: Move common labels out of dockertools package
2016-05-27 11:26:31 -07:00
Euan Kemp
93487867ac kubenet: Update empty ip check
The previous check was incorrect because the `IP.String` method returns
`<nil>` and other non-empty-strings on error conditions.
2016-05-27 10:47:13 -07:00
Euan Kemp
c4b8959a75 kubenet: Reduce loglevel of spammy message
When no shaping is enabled, that warning would always be printed.
2016-05-27 10:47:12 -07:00
Euan Kemp
7e0b9bfa66 kubenet: Fix panic when teardown run before setup
Teardown can run before Setup when the kubelet is restarted... in that
case, the shaper was nil and thus calling the shaper resulted in a panic

This fixes that by ensuring the shaper is always set... +1 level of
indirection and all that.
2016-05-27 10:47:12 -07:00
Euan Kemp
2f5e738dc1 kubenet: Fix inconsistent cidr usage/parsing
Before this change, the podCIDRs map contained both cidrs and ips
depending on which code path entered a container into it.

Specifically, SetUpPod would enter a CIDR while GetPodNetworkStatus
would enter an IP.

This normalizes both of them to always enter just IP addresses.

This also removes the now-redundant cidr parsing that was used to get
the ip before
2016-05-27 10:47:12 -07:00
Wojciech Tyczynski
be1b57100d Change to NotReadyNetworking and use in scheduler 2016-05-27 19:32:49 +02:00
gmarek
7bdf480340 Node is NotReady until the Route is created 2016-05-27 19:29:51 +02:00
Euan Kemp
766eb6f0f7 kubenet: Fix bug where shaper.Reset wasn't called
The error check was inverse what it should have been, causing
shaper.Reset to only get called with invalid cidrs.
2016-05-27 10:20:43 -07:00
Alex Robinson
07d9dff83c Merge pull request #26208 from freehan/kubenetteardownfix
do not return error if TearDownPod is called twice
2016-05-27 09:59:03 -07:00
Robert Rati
2d487f7c06 Added pods-per-core to kubelet. #25762 2016-05-27 07:10:13 -04:00
Euan Kemp
ecfd8f723f rkt: Fix panic in setting ReadOnlyRootFS 2016-05-26 20:43:26 -07:00
Tim St. Clair
e4d8dea0d7 Move containerd process into docker cgroup for versions >= v1.11 2016-05-26 17:27:00 -07:00
Alex Mohr
aab6c43a33 Merge pull request #25604 from freehan/kubenethostport
Kubenet host-port support through iptables
2016-05-26 15:49:12 -07:00
Alex Mohr
5b1653ec39 Merge pull request #25681 from vishh/lifecycle-probe-logs
Log output of lifecycle hooks on failure
2016-05-26 12:37:02 -07:00
Minhan Xia
0834dc489a do not return error if TearDownPod is called twice 2016-05-26 11:57:22 -07:00
Alex Mohr
4357b8a0a6 Merge pull request #25324 from jfrazelle/add-seccomp
Add Seccomp to Annotations
2016-05-26 10:50:06 -07:00