Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)
[Federation][e2e] Add 2 new testcases to federation service e2e
**What this PR does / why we need it**:
Add 2 new test cases for federation services
- Federation service update should update the clustered service
- Federation service controller should recreate service shard in cluster, if it gets deleted.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #27623, #35827
Handles one of the tasks discussed in #41253
**Special notes for your reviewer**:
**Release note**:
`NONE`
cc @kubernetes/sig-federation-bugs, @nikhiljindal @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)
Non-controversial part of #44523
For easier review of #44523, I extracted the non-controversial part into this PR.
Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)
Update gazel to v17
**What this PR does / why we need it**: gazel v17 has a bugfix for creating the `vendor/BUILD` file from scratch. There should be no other changes.
**Release note**:
```release-note
NONE
```
/assign @mikedanese @spxtr
Automatic merge from submit-queue (batch tested with PRs 44124, 44510)
Add metrics to all major gce operations (latency, errors)
```release-note
Add metrics to all major gce operations {latency, errors}
The new metrics are:
cloudprovider_gce_api_request_duration_seconds{request, region, zone}
cloudprovider_gce_api_request_errors{request, region, zone}
`request` is the specific function that is used.
`region` is the target region (Will be "<n/a>" if not applicable)
`zone` is the target zone (Will be "<n/a>" if not applicable)
Note: this fixes some issues with the previous implementation of
metrics for disks:
- Time duration tracked was of the initial API call, not the entire
operation.
- Metrics label tuple would have resulted in many independent
histograms stored, one for each disk. (Did not aggregate well).
```
Automatic merge from submit-queue (batch tested with PRs 44124, 44510)
Optimize the time taken to create Persistent volumes with VSAN storage capabilities at scale and handle VPXD crashes
Currently, creating persistent volumes with VSAN storage capabilities at scale takes a very long time. We have tested at a scale of 500-600 PVCs, and it takes a long time for all the PVC requests to go from the Pending state to the Bound state.
- In our current design, we use a single system VM, "kubernetes-helper-vm", as the means to create a persistent volume with the VSAN policy configured.
- Since all the operations go through a single system VM, all requests at scale get queued and executed serially on this system VM. Because of this, creating a large number of PVCs takes a very long time.
- Since it is a single system VM, parallel PVC requests tend to pick the same SCSI adapter on the system VM, and also the same unit number on that SCSI adapter, most of the time. Therefore the error rate is high.
In order to overcome these issues and to optimize the time taken to create persistent volumes with VSAN storage capabilities at scale, we have slightly modified the design, as described below:
- In this model, we create a VM on the fly for every persistent volume that is being created. Since the reconfigure operations that create a disk with the VSAN policy configured each run on their own VM, all of these PVC requests execute in parallel, independently of one another.
- With this new design, there will be no errors of this kind at all.
Also, we have overcome the problem of vpxd crashes and other intermediate problems by checking the type of the errors.
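The concurrency model described above can be sketched as follows. This is only an illustration under stated assumptions, not the vSphere cloud provider code: `createVolume` is a hypothetical placeholder for "create a temporary VM, reconfigure it to create the disk with the VSAN policy, then clean up", and the PVC names are made up. The point is that each request gets its own goroutine and its own VM, so no request waits behind another or competes for a shared SCSI adapter.

```go
package main

import (
	"fmt"
	"sync"
)

// createVolume is a hypothetical stand-in for the per-volume workflow:
// create a temporary VM, reconfigure it to create a disk with the VSAN
// policy, then remove the VM. Here it only validates its input.
func createVolume(name string) error {
	if name == "" {
		return fmt.Errorf("volume name must not be empty")
	}
	return nil
}

// createAll runs one createVolume call per request concurrently and collects
// the result of each, instead of serializing them through one shared VM.
func createAll(names []string) map[string]error {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]error, len(names))
	)
	for _, name := range names {
		wg.Add(1)
		go func(n string) {
			defer wg.Done()
			err := createVolume(n)
			mu.Lock()
			results[n] = err
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return results
}

func main() {
	res := createAll([]string{"pvc-1", "pvc-2", "pvc-3"})
	for name, err := range res {
		fmt.Println(name, err)
	}
}
```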
Fixes https://github.com/vmware/kubernetes/issues/122, https://github.com/vmware/kubernetes/issues/124
@kerneltime @tusharnt @divyenpatel @pdhamdhere
**Release note**:
```release-note
None
```
Automatic merge from submit-queue
Fix cockroachdb statefulset test read/write commands
Explicitly specifying `--insecure` is required on insecure clusters,
which started being enforced in a very recent release. In 2 weeks
we'll have a stable image version that we can reliably pin the
relevant statefulset yaml file to in order to avoid stupid failures
like this. I'm really sorry for the flakes!
**What this PR does / why we need it**:
It fixes the currently broken statefulset test suite - https://storage.googleapis.com/k8s-gubernator/triage/index.html?job=gci-gce-statefulset&test=CockroachDB
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*:
N/A
**Special notes for your reviewer**:
N/A
**Release note**:
```release-note
NONE
```
@kow3ns
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
Update kubernetes-e2e charm to use snaps
**What this PR does / why we need it**:
This updates the kubernetes-e2e charm to use snaps instead of Juju resources for payload delivery.
The main advantage of this is that it decouples the charm from the e2e payload, allowing us to support multiple versions of Kubernetes with a single release of the charm.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
Update kubernetes-e2e charm to use snaps
```
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
Improved code coverage for /pkg/kubelet/types
**What this PR does / why we need it**:
The test coverage for /pkg/kubelet/types was increased from 50% to 87.5%
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
**Special notes for your reviewer**:
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
[Federation] Convert Daemonset to use the generic sync controller
To be rebased on master when @perotinus's configmaps PR merges.
Tested integration and e2e.
Automatic merge from submit-queue (batch tested with PRs 42740, 44980, 45039, 41627, 45044)
Cleanup some of the tarball producing code for e2e node tests
This is some e2e node cleanup work I found sitting in a local branch while deleting old local git branches. It looks like it's still useful.
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)
Log error before failing in autoscaling e2e
The gcloud alpha command in e2e fails, but no useful information (error message) is logged.
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)
Make metrics filenames for e2e tests indicate the test better
Currently the JSON files with metrics for e2e tests are named by appending a timestamp of the test to the `SummaryKind`. Because of this, it took me some time to figure out which file corresponds to which e2e test. This change uses the test name instead of the timestamp.
cc @wojtek-t @gmarek
Automatic merge from submit-queue (batch tested with PRs 41106, 44346, 44929, 44979, 45027)
Add PATCH to supported list of proxy subresource verbs
Follow up to #41421 for the proxy subresources
```release-note
The proxy subresource APIs for nodes, services, and pods now support the HTTP PATCH method.
```
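As a rough illustration of what the release note above enables, the sketch below sends a PATCH request through a service proxy subresource path. The helper names, the `my-svc` service, the `items/1` suffix, and the request body are all hypothetical, and an `httptest` server stands in for the API server so the example stays self-contained.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// proxyPath builds the service proxy subresource path for a backend path.
func proxyPath(namespace, service, rest string) string {
	return fmt.Sprintf("/api/v1/namespaces/%s/services/%s/proxy/%s", namespace, service, rest)
}

// patchViaProxy sends a PATCH request to baseURL+path and returns the status code.
func patchViaProxy(baseURL, path, body string) (int, error) {
	req, err := http.NewRequest(http.MethodPatch, baseURL+path, strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Stand-in for an API server that accepts PATCH on proxy paths.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPatch {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	code, err := patchViaProxy(srv.URL, proxyPath("default", "my-svc", "items/1"), `{"done":true}`)
	fmt.Println(code, err)
}
```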
Automatic merge from submit-queue
Don't check in zz_generated.openapi.go.
`zz_generated.openapi.go` is the file that causes the most merge conflicts of all. In #33440, @thockin updated the makefile to support generating these files on demand, but that didn't play well with bazel/gazel.
In this PR, I add a new build macro that will generate this file with a `go_genrule`. I added support for keeping the BUILD file up to date in mikedanese/gazel#34.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)
[Federation][e2e] Fix a failing federation e2e testcase in gce-serial
This is to fix the failing test case in federation [gce-serial](https://k8s-testgrid.appspot.com/cluster-federation#gce-serial) tests. The test case has been failing consistently since we registered the clusters in suite-init instead of doing it in every test case.
Instead of registering and then unregistering, we now unregister and then register the cluster with the federation. This test runs serially and will not affect other test cases.
**Release note**:
```
NONE
```
Automatic merge from submit-queue (batch tested with PRs 44591, 44549)
Update repo-infra bazel dependency and use new gcs_upload rule
This PR provides similar functionality to push-build.sh entirely within Bazel rules (though it relies on gsutil).
It's an alternative to #44306.
Depends on https://github.com/kubernetes/repo-infra/pull/13.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Update cluster-autoscaler logging config
Previously cluster-autoscaler would duplicate all logs,
writing both to /var/log on the master and to /tmp inside the pod.
Automatic merge from submit-queue
Update token controller test to test async retry
Fixes #44819
https://github.com/kubernetes/kubernetes/pull/44625 changed the token controller to queue a retry if the live service account's resourceVersion did not match our cache.
This updates the unit test that was testing that condition to test async queue behavior (which this condition now drives).
Automatic merge from submit-queue
Increase timeout for Stackdriver Logging e2e tests
They're failing in CI because Stackdriver Logging's List method is too slow for this purpose. This is a quick fix that should be removed entirely once reading is implemented properly.
/cc @piosz
Automatic merge from submit-queue
Cleanup storeToClusterConditionLister
ClusterConditionPredicate() has been deleted,
so storeToClusterConditionLister is now unused.
Automatic merge from submit-queue (batch tested with PRs 44970, 43618)
CRI: Fix StopContainer timeout
Fixes https://github.com/kubernetes/kubernetes/issues/44956.
I verified this PR with the example provided in https://github.com/kubernetes/kubernetes/issues/44956, and now pod deletion will respect grace period timeout:
```
NAME READY STATUS RESTARTS AGE
gracefully-terminating-pod 1/1 Terminating 0 6m
```
@dchen1107 @yujuhong @feiskyer /cc @kubernetes/sig-node-bugs
Automatic merge from submit-queue
Allow Partial Success for ImageGC
Fixes #44951. When the eviction manager is under disk pressure, it first attempts to reclaim disk space by deleting images. However, if there are any errors during the image deletion process, the eviction manager treats that as a failed attempt to delete images, even if some were successfully deleted.
This change makes the eviction manager ignore errors during image garbage collection and rely solely on the quantity of resources reclaimed. If image deletion fails completely, this still works, because 0 bytes freed are returned. Partial success is allowed because any resources freed are counted, even if some images fail to be deleted.
This does not require any changes to the image manager, as the current behavior is already to return the disk space freed along with any errors.
```release-note
Fixes a bug where pods were evicted even after images were successfully deleted.
```
cc @dchen1107 @vishh @kubernetes/kubernetes-release-managers
Note to reviewers: these are mostly whitespace changes, so the diff will make more sense in Reviewable.
Automatic merge from submit-queue (batch tested with PRs 44940, 44974, 44935)
apimachinery/pkg/util/wait: Fix potential goroutine leak in pollInternal().
**What this PR does / why we need it**:
Without the change, the wait function wouldn't exit until the timeout
happens, so if the timeout is set to a big value and Poll() is run
inside a loop, the total number of goroutines will increase indefinitely.
This PR fixes the issue by closing the stop channel to tell the wait function
to exit immediately if the condition is true or any error happens.
Automatic merge from submit-queue (batch tested with PRs 44940, 44974, 44935)
Remove import of internal api package in generated external-versioned listers
Follow up of https://github.com/kubernetes/kubernetes/pull/44523
One line change in cmd/libs/go2idl/lister-gen/generators/lister.go, and simple changes in pkg/apis/autoscaling/v2alpha1/register.go, other changes are generated.
The internal api package will be eliminated from client-go, so these imports should be removed. Also, it's more correct to report the versioned resource in the error.