Commit Graph

6595 Commits

Author SHA1 Message Date
Kubernetes Submit Queue
cef2d325ee Merge pull request #66395 from awly/fix-kubelet-exec-plugin-startup
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Update http.Transport if it already exists in ExecProvider

**What this PR does / why we need it**:
This unbreaks ExecPlugin. Without the change, we hit this error
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/client-go/transport/transport.go#L32

**Release note**:
```release-note
Fix kubelet startup failure when using ExecPlugin in kubeconfig
```
2018-07-26 10:47:05 -07:00
Andrew Lytvynov
3357b5ecf4 Set connrotation dialer via restclient.Config.Dialer
Instead of Transport. This fixes ExecPlugin, which fails if
restclient.Config.Transport is set.
2018-07-25 16:23:57 -07:00
stewart-yu
ffbd7b22b3 remove the unnecessary comments in tryRegisterWithAPIServer for externalID removed in PR#61877 2018-07-25 11:23:56 +08:00
bingshen.wbs
b1bdd043c4 fix kubelet npe on device plugin return zero container
Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>
2018-07-25 10:15:30 +08:00
Pengfei Ni
cfb776dcdd Revert #63905: Setup dns servers and search domains for Windows Pods 2018-07-25 09:58:47 +08:00
Seth Jennings
b1ec6da4c7 kubelet: add image-gc low/high validation check 2018-07-23 13:14:31 -05:00
Da K. Ma
aac9f1cbaa Taint node when initializing node.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-07-23 12:52:05 +08:00
Lee Verberne
7c558fb7bb Remove kubelet-level docker shared pid flag
The --docker-disable-shared-pid flag has been deprecated since 1.10 and
has been superceded by ShareProcessNamespace in the pod API, which is
scheduled for beta in 1.12.
2018-07-22 16:54:44 +02:00
Kubernetes Submit Queue
53ee0c8652 Merge pull request #65660 from mtaufen/incremental-refactor-kubelet-node-status
Automatic merge from submit-queue (batch tested with PRs 66152, 66406, 66218, 66278, 65660). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Refactor kubelet node status setters, add test coverage

This internal refactor moves the node status setters to a new package, explicitly injects dependencies to facilitate unit testing, and adds individual unit tests for the setters.

I gave each setter a distinct commit to facilitate review.

Non-goals:
- I intentionally excluded the class of setters that return a "modified" boolean, as I want to think more carefully about how to cleanly handle the behavior, and this PR is already rather large.
- I would like to clean up the status update control loops as well, but that belongs in a separate PR.

```release-note
NONE
```
2018-07-20 12:12:24 -07:00
Ravi Sankar Penta
0282720e29 Do not set cgroup parent when --cgroups-per-qos is disabled
When --cgroups-per-qos=false (default is true), kubelet sets pod
container management to podContainerManagerNoop implementation and
GetPodContainerName() returns '/' as cgroup parent (default cgroup root).

(1) In case of 'systemd' cgroup driver, '/' is invalid parent as
docker daemon expects '.slice' suffix and throws this error:
'cgroup-parent for systemd cgroup should be a valid slice named as \"xxx.slice\"'
(5fc12449d8/daemon/daemon_unix.go (L618))
'/' corresponds to '-.slice' (root slice) in systemd but I don't think
we want to assign root slice instead of runtime specific default value.
In case of docker runtime, this will be 'system.slice'
(e2593239d9/daemon/oci_linux.go (L698))

(2) In case of 'cgroupfs' cgroup driver, '/' is valid parent but I don't
think we want to assign root instead of runtime specific default value.
In case of docker runtime, this will be '/docker'
(e2593239d9/daemon/oci_linux.go (L695))

Current fix will not set the cgroup parent when --cgroups-per-qos is disabled.
2018-07-20 10:25:50 -07:00
Kubernetes Submit Queue
d2cc34fb07 Merge pull request #65771 from smarterclayton/untyped
Automatic merge from submit-queue (batch tested with PRs 65771, 65849). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Add a new conversion path to replace GenericConversionFunc

reflect.Call is very expensive. We currently use a switch block as part of AddGenericConversionFunc to avoid the bulk of top level a->b conversion for our primary types which is hand-written. Instead of having these be handwritten, we should generate them.

The pattern for generating them looks like:

```
scheme.AddConversionFunc(&v1.Type{}, &internal.Type{}, func(a, b interface{}, scope conversion.Scope) error {
  return Convert_v1_Type_to_internal_Type(a.(*v1.Type), b.(*internal.Type), scope)
})
```

which matches AddDefaultObjectFunc (which proved out the approach last year). The
conversion machinery should then do a simple map lookup based on the incoming types and invoke the function.  Like defaulting, it's up to the caller to match the types to arguments, which we do by generating this code.  This bypasses reflect.Call and in the future allows Golang mid-stack inlining to optimize this code.

As part of this change I strengthened registration of custom functions to be generated instead of hand registered, and also strengthened error checking of the generator when it sees a manual conversion to error out.  Since custom functions are automatically used by the generator, we don't really have a case for not registering the functions.

Once this is fully tested out, we can remove the reflection based path and the old registration methods, and all conversion will work from point to point methods (whether generated or custom).

Much of the need for the reflection path has been removed by changes to generation (to omit fields) and changes to Go (to make assigning equivalent structs easy).

```release-note
NONE
```
2018-07-19 09:29:00 -07:00
Kubernetes Submit Queue
afcc156806 Merge pull request #66350 from aveshagarwal/master-rhbz-1601378
Automatic merge from submit-queue (batch tested with PRs 66175, 66324, 65828, 65901, 66350). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Start cloudResourceSyncsManager before getNodeAnyWay (initializeModules) to avoid kubelet getting stuck in retrieving node addresses from a cloudprovider.

**What this PR does / why we need it**:
This PR starts cloudResourceSyncsManager before getNodeAnyWay (initializeModules) otherwise kubelet gets stuck in setNodeAddress->kl.cloudResourceSyncManager.NodeAddresses() (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_node_status.go#L470) forever retrieving node addresses from a cloud provider, and due to this cloudResourceSyncsManager will not be started at all.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #

**Special notes for your reviewer**:

**Release note**:

```release-note
None
```

@ingvagabund @derekwaynecarr @sjenning @kubernetes/sig-node-bugs
2018-07-18 16:42:22 -07:00
Avesh Agarwal
6c33ca13e9 Start cloudResourceSyncsManager before getNodeAnyWay (initializeModules)
so that kubelet does not get stuck in retriving node addresses from a cloudprovider.
2018-07-18 15:15:03 -04:00
Clayton Coleman
ef561ba8b5 generated: Avoid use of reflect.Call in conversion code paths 2018-07-17 23:02:16 -04:00
Shimin Guo
e8cd28ae57 fix a panic due to assignment to nil map 2018-07-17 12:34:20 -07:00
vikaschoudhary16
a5842503eb Use probe based plugin discovery mechanism in device manager 2018-07-17 04:02:31 -04:00
chentao1596
ce3f5002dd Add unit tests for methods of pod's format 2018-07-17 15:37:13 +08:00
chentao1596
9319be121e Change the method name from PodsWithDeletiontimestamps to PodsWithDeletionTimestamps 2018-07-17 15:34:32 +08:00
Pingan2017
45fac6f469 delete unused events 2018-07-17 14:19:50 +08:00
Michael Taufen
0f8976bf84 eliminate unnecessary helper 2018-07-16 09:09:48 -07:00
Michael Taufen
0170826542 port setVolumeLimits to Setter abstraction, add test 2018-07-16 09:09:48 -07:00
Michael Taufen
c5a5e21639 port setNodeStatusGoRuntime to Setter abstraction 2018-07-16 09:09:48 -07:00
Michael Taufen
8e217f7102 port setNodeStatusImages to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
b7ec333f01 port setNodeStatusDaemonEndpoints to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen
59bb21051e port setNodeStatusVersionInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
596fa89af0 port setNodeStatusMachineInfo to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
aa94a3ba4e lift node-info setters into defaultNodeStatusFuncs
Instead of hiding these behind a helper, we just register them in a
uniform way. We are careful to keep the call-order of the setters the
same, though we can consider re-ordering in a future PR to achieve
fewer appends.
2018-07-16 09:09:47 -07:00
Michael Taufen
2df7e1ad5c port setNodeVolumesInUseStatus to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
3e03e0611e port setNodeReadyCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
e0b6ae219f port setNodePIDPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
b26e4dfa7f port setNodeDiskPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
f057c9a4ae port setNodeMemoryPressureCondition to Setter abstraction, add test 2018-07-16 09:09:47 -07:00
Michael Taufen
c33f321acd port setNodeOODCondition to Setter abstraction 2018-07-16 09:09:47 -07:00
Michael Taufen
15b03b8c0c port setNodeAddress to Setter abstraction, port test
also put cloud_request_manager.go in its own package
2018-07-16 09:09:47 -07:00
Michael Taufen
a3cbbbd931 move call to defaultNodeStatusFuncs to after the rest of the Kubelet is constructed 2018-07-16 09:03:13 -07:00
Michael Taufen
08c94e0616 add nodestatus package with Setter abstraction for composable node constructors 2018-07-16 09:03:13 -07:00
Michael Taufen
d245e72bae remove incorrect comment referencing removed functionality
The cbr0 configuration behavior this comment references was removed in #34906
2018-07-16 09:03:13 -07:00
linyouchong
6ff285bce3 fix nil pointer dereference in node_container_manager#enforceExistingCgroup 2018-07-14 10:42:42 +08:00
Joonyoung Park
e6d02e9410 fix metrics help comment
pod_start_latency_microseconds is not broken down by podname.
2018-07-13 10:26:35 +09:00
Kubernetes Submit Queue
337dfe0a9c Merge pull request #65594 from liggitt/node-csr-addresses-2
Automatic merge from submit-queue (batch tested with PRs 65052, 65594). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Derive kubelet serving certificate CSR template from node status addresses

xref https://github.com/kubernetes/features/issues/267
fixes #55633

Builds on https://github.com/kubernetes/kubernetes/pull/65587

* Makes the cloud provider authoritative when recording node status addresses
* Makes the node status addresses authoritative for the kube-apiserver determining how to speak to a kubelet (stops paying attention to the hostname label when determining how to reach a kubelet, which was only done to support kubelets < 1.5)
* Updates kubelet certificate rotation to be driven from node status
  * Avoids needing to compute node addresses a second time, and differently, in order to request serving certificates.
  * Allows the kubelet to react to changes in its status addresses by updating its serving certificate
  * Allows the kubelet to be driven by external cloud providers recording node addresses on the node status

test procedure:
```sh
# setup
export FEATURE_GATES=RotateKubeletServerCertificate=true
export KUBELET_FLAGS="--rotate-server-certificates=true --cloud-provider=external"

# cleanup from previous runs
sudo rm -fr /var/lib/kubelet/pki/

# startup
hack/local-up-cluster.sh

# wait for a node to register, verify it didn't set addresses
kubectl get nodes 
kubectl get node/127.0.0.1 -o jsonpath={.status.addresses}

# verify the kubelet server isn't available, and that it didn't populate a serving certificate
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
ls -la /var/lib/kubelet/pki

# set an address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
  -H "Content-Type: application/merge-patch+json" \
  --data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"}]}}'

# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...

# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname, but NOT the IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki

# set an hostname and IP address on the node
curl -X PATCH http://localhost:8080/api/v1/nodes/127.0.0.1/status \
  -H "Content-Type: application/merge-patch+json" \
  --data '{"status":{"addresses":[{"type":"Hostname","address":"localhost"},{"type":"InternalIP","address":"127.0.0.1"}]}}'

# verify a csr was submitted with the right SAN, and approve it
kubectl describe csr
kubectl certificate approve csr-...

# verify the kubelet connection uses a cert that is properly signed and valid for the specified hostname AND IP
curl --cacert _output/certs/server-ca.crt -v https://localhost:10250/pods
curl --cacert _output/certs/server-ca.crt -v https://127.0.0.1:10250/pods
ls -la /var/lib/kubelet/pki
```

```release-note
* kubelets that specify `--cloud-provider` now only report addresses in Node status as determined by the cloud provider
* kubelet serving certificate rotation now reacts to changes in reported node addresses, and will request certificates for addresses set by an external cloud provider
```
2018-07-11 22:25:07 -07:00
Seth Jennings
f2a7654978 move feature gate checks inside IsCriticalPod 2018-07-11 16:10:05 -05:00
Kubernetes Submit Queue
0972ce1acc Merge pull request #65649 from rsc/fix-printf
Automatic merge from submit-queue (batch tested with PRs 66076, 65792, 65649). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

kubernetes: fix printf format errors

These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.

```release-note
NONE
```
2018-07-11 14:09:08 -07:00
jiaxuanzhou
6ac4a8588e fix bug for garbage collection 2018-07-11 09:33:08 +08:00
Russ Cox
2bd91dda64 kubernetes: fix printf format errors
These are all flagged by Go 1.11's
more accurate printf checking in go vet,
which runs as part of go test.

Lubomir I. Ivanov <neolit123@gmail.com>
applied ammend for:
  pkg/cloudprovider/provivers/vsphere/nodemanager.go
2018-07-11 00:10:15 +03:00
Kubernetes Submit Queue
421789328f Merge pull request #65997 from tallclair/writer
Automatic merge from submit-queue (batch tested with PRs 66030, 65997). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Remove unused io util writer & volume host GetWriter()

Cleanup unused code.
Fixes https://github.com/kubernetes/kubernetes/issues/16971

**Release note**:
```release-note
NONE
```

/kind cleanup
/sig storage
2018-07-10 12:46:09 -07:00
Jordan Liggitt
7828e5d0f9 Make cloud provider authoritative for node status address reporting 2018-07-10 14:33:48 -04:00
Jordan Liggitt
db9d3c2d10 Derive kubelet serving certificate CSR template from node status addresses 2018-07-10 14:33:48 -04:00
Kubernetes Submit Queue
13f9c26fd7 Merge pull request #65902 from wojtek-t/kube_proxy_less_allocations_2
Automatic merge from submit-queue (batch tested with PRs 65902, 65781). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Avoid unnecessary allocations in kube-proxy
2018-07-09 23:07:01 -07:00
Kubernetes Submit Queue
55620e2be6 Merge pull request #65987 from Random-Liu/fix-pod-worker-deadlock
Automatic merge from submit-queue (batch tested with PRs 65987, 65962). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Fix pod worker deadlock.

Preemption will stuck forever if `killPodNow` timeout once. The sequence is:
* `killPodNow` create the response channel (size 0) and send it to pod worker.
* `killPodNow` timeout and return.
*  Pod worker finishes killing the pod, and tries to send back response via the channel.

However, because the channel size is 0, and the receiver has exited, the pod worker will stuck forever.

In @jingxu97's case, this causes a critical system pod (apiserver) unable to come up, because the csi pod can't be preempted.

I checked the history, and the bug was introduced 2 years ago 6fefb428c1.

I think we should at least cherrypick this to `1.11` since preemption is beta and enabled by default in 1.11.

@kubernetes/sig-node-bugs @derekwaynecarr @dashpole @yujuhong 
Signed-off-by: Lantao Liu <lantaol@google.com>

```release-note
none
```
2018-07-09 16:53:59 -07:00
Tim Allclair
b1012b2543 Remove unused io util writer & volume host GetWriter() 2018-07-09 14:09:48 -07:00