Automatic merge from submit-queue
GCE: Bubble IP reservation error to the user when the address is specified.
This PR improves the debug-ability of internal load balancers when an IP fails to be reserved. I'm mostly worried about the case when the subnetwork URL is wrong or referencing a shared network from another project which isn't yet supported. As you can see from line 160, I had originally planned to surface the reservation error, but printed the wrong error.
**Special notes for your reviewer**:
/assign @yujuhong
Please apply 1.8 milestone.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Pipe in upgrade image target for kube-proxy migration tests
**What this PR does / why we need it**:
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-upgrade-kube-proxy-ds&width=20
and
https://k8s-testgrid.appspot.com/sig-network#gci-gce-latest-downgrade-kube-proxy-ds&width=20
are still failing.
Reproduced it locally and found node image is being default to debian during upgrade (it was gci before upgrade) because we don't pass in `gci` via `--upgrade--target`. And for some reasons (haven't figured out yet), the upgraded node uses debian image with gci startupscripts...
This PR pipes in `--upgrade-target` for kube-proxy migration tests, hopefully in conjunction with https://github.com/kubernetes/test-infra/pull/4447 it will bring the tests back to normal.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #NONE
**Special notes for your reviewer**:
Sorry for bothering again.
/assign @krousey
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Convert deprecated gcloud --regexp flag into --filter
**What this PR does / why we need it**: #49673 missed a flag in hiding:
```console
WARNING: Flag --regexp is deprecated. Use --filter="name~'REGEXP'" instead.
ERROR: gcloud crashed (TypeError): 'NoneType' object is not iterable
If you would like to report this issue, please run the following command:
gcloud feedback
To check gcloud for common problems, please run the following command:
gcloud info --run-diagnostics
```
(Also it's great how gcloud crashes, rather than handling the deprecation gracefully.)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#49673
**Release note**:
```release-note
NONE
```
cc @pci
Rather than just changing the config once to see if dynamic kubelet
config at-least-sort-of-works, this extends the test to check that the
Kubelet reports the expected Node condition and the expected configuration
values after several possible state transitions.
Additionally, this adds a stress test that changes the configuration 100
times. It is possible for resource leaks across Kubelet restarts to
eventually prevent the Kubelet from restarting. For example, this test
revealed that cAdvisor's leaking journalctl processes (see:
https://github.com/google/cadvisor/issues/1725) could break dynamic
kubelet config. This test will help reveal these problems earlier.
This commit also makes better use of const strings and fixes a few bugs
that the new testing turned up.
Related issue: #50217
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Move paused deployment e2e tests to integration
**What this PR does / why we need it**:
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: xref #52113
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 52097, 52054)
Provide field info in storage configuration
Makes debugging how storage was configured difficult
@kubernetes/sig-api-machinery-bugs
Addresses part of issue #51487.
This is a big change for testing; any testjobs that do not
set an explicit KUBE_NODE_OS_DISTRIBUTION will have been running
on CVM, but after this PR will start running COS.
CVM is being deprecated, and falls out of support on 2018/10/01.
In addition, bump the patch version of COS from
cos-stable-60-9592-84-0 to cos-stable-60-9592-90-0.
Automatic merge from submit-queue (batch tested with PRs 51239, 51644, 52076)
code-generator/protobuf: cut-off kubernetes specifics
The package list moved to hack/update-generated-protobuf-dockerized.sh.
This make the protobuf generator usable outside of kube.
Automatic merge from submit-queue (batch tested with PRs 51239, 51644, 52076)
do not update init containers status if terminated
fixes#29972#41580
This fixes an issue where, if a completed init container is removed while the pod or subsequent init containers are still running, the status for that init container will be reset to `Waiting` with `PodInitializing`.
This can manifest in a number of ways.
If the init container is removed why the main pod containers are running, the status will be reset with no functional problem but the status will be reported incorrectly in `kubectl get pod` for example
If the init container is removed why a subsequent init container is running, the init container will be **re-executed** leading to all manner of badness.
@derekwaynecarr @bparees
Automatic merge from submit-queue (batch tested with PRs 51239, 51644, 52076)
Fix swallowed error in registrytest
**What this PR does / why we need it**: Fixes a swallowed error in the registrytest package.
```release-note NONE
```
The commit allow ScaleIO volume plugin to read SDC GUID value as a node label.
If binary drv_cfg is not installed, the plugin will still work properly.
If node label not found, it defaults to drv_cfg if installed.
Automatic merge from submit-queue
Fix proxied request-uri to be valid HTTP requests
Fixes#52022, introduced in 1.7. Stringifying/re-parsing the URL masked that the path was not constructed with a leading `/` in the first place.
This makes upgrade requests proxied to pods/services via the API server proxy subresources be valid HTTP requests
```release-note
Fixes an issue with upgrade requests made via pod/service/node proxy subresources sending a non-absolute HTTP request-uri to backends
```
Automatic merge from submit-queue
StatefulSet: Deflake e2e RunHostCmd.
The initial retry up to 20s was giving up too soon. I'm seeing this test flake because the Node rebooted and it takes ~2min to recover. Now StatefulSet RunHostCmd calls will use the same 5min timeout as with other Pod state checks.
ref #48031
Centralize the key "forget" and "requeue" process in only on method.
Change the signature of the syncJob method in order to return the
information if it is necessary to forget the backoff delay for a given
key.
Automatic merge from submit-queue
kubeadm: Set the new BT auth group on the init token
**What this PR does / why we need it**:
What I forgot to do in https://github.com/kubernetes/kubernetes/pull/51956😅
When we now have the new group, we should also set it on the token, otherwise nodes can't be joined
On the good side, our CI testing broke https://k8s-testgrid.appspot.com/sig-cluster-lifecycle#kubeadm-gce
Great to see that it actually works :)
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
@kubernetes/sig-cluster-lifecycle-pr-reviews
Automatic merge from submit-queue
kubeadm: add `kubeadm phase addons` command
**What this PR does / why we need it**:
Adds the `addons` phase command to `kubeadm`
fixes: https://github.com/kubernetes/kubeadm/issues/418
/cc @luxas
For 1.8 this will be off by default. In 1.9 it will be on by default.
Add tests and rename some fields to use the `chunking` terminology.
Note that the pager may be used for other things besides chunking.
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)
Fix setNodeAddress when a node IP and a cloud provider are set
**What this PR does / why we need it**:
When a node IP is set and a cloud provider returns the same address with
several types, only the first address was accepted. With the changes made
in PR #45201, the vSphere cloud provider returned the ExternalIP first,
which led to a node without any InternalIP.
The behaviour is modified to return all the address types for the
specified node IP.
**Which issue this PR fixes**: fixes#48760
**Special notes for your reviewer**:
* I'm not a golang expert, is it possible to mock `kubelet.validateNodeIP()` to avoid the need of real host interface addresses in the test ?
* It would be great to have it backported for a next 1.6.8 release.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 51728, 49202)
Enable CRI-O stats from cAdvisor
**What this PR does / why we need it**:
cAdvisor may support multiple container runtimes (docker, rkt, cri-o, systemd, etc.)
As long as the kubelet continues to run cAdvisor, runtimes with native cAdvisor support may not want to run multiple monitoring agents to avoid performance regression in production. Pending kubelet running a more light-weight monitoring solution, this PR allows remote runtimes to have their stats pulled from cAdvisor when cAdvisor is registered stats provider by introspection of the runtime endpoint.
See issue https://github.com/kubernetes/kubernetes/issues/51798
**Special notes for your reviewer**:
cAdvisor will be bumped to pick up https://github.com/google/cadvisor/pull/1741
At that time, CRI-O will support fetching stats from cAdvisor.
**Release note**:
```release-note
NONE
```
The initial retry up to 20s was giving up too soon.
I'm seeing this test flake because the Node rebooted and it takes ~2min
to recover.
Now StatefulSet RunHostCmd calls will use the same 5min timeout as with
other Pod state checks.