Automatic merge from submit-queue
apiextensions: add Established condition
This introduces an `Established` condition on `CustomResourceDefinition`s. `Established` means that the resource has become active. A resource is established when all names are initially accepted without a conflict. A resource stays established until it is deleted, even during a later NameConflict caused by changed names. Note that not all names can be changed.
This change is necessary to allow deletion of once-active CRDs that might still have instances but now have NameConflicts. Before this PR, the REST endpoint was no longer active in this case, making deletion of the instances impossible.
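The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `crdStatus`, `updateEstablished`, and `servesREST` are hypothetical names, and the real condition lives in the CRD's status conditions.

```go
package main

import "fmt"

// crdStatus is a hypothetical, simplified stand-in for a CRD's status.
type crdStatus struct {
	NamesAccepted bool // false while a NameConflict exists
	Established   bool // true once the resource has become active
}

// updateEstablished applies the rule from the description: a resource becomes
// established the first time all names are accepted, and it stays established
// even if a later name change causes a conflict.
func updateEstablished(s crdStatus) crdStatus {
	if s.NamesAccepted {
		s.Established = true
	}
	return s
}

// servesREST reports whether the REST endpoint should stay active, so that
// instances of a once-active CRD can still be deleted.
func servesREST(s crdStatus) bool {
	return s.Established
}

func main() {
	s := updateEstablished(crdStatus{NamesAccepted: true})
	s.NamesAccepted = false // a later NameConflict
	s = updateEstablished(s)
	fmt.Println(s.Established, servesREST(s))
}
```

The point of the sketch is that `Established` is sticky: a later conflict clears `NamesAccepted` but never clears `Established`.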
Automatic merge from submit-queue (batch tested with PRs 45891, 46147)
Watching ClusterId from within GCE cloud provider
**What this PR does / why we need it**:
Adds the ability for the GCE cloud provider to watch a config map for `clusterId` and `providerId`.
WIP - still needs more testing
cc @MrHohn @csbell @madhusudancs @thockin @bowei @nikhiljindal
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45891, 46147)
fix typo
**Release note**:
```release-note
```
Automatic merge from submit-queue (batch tested with PRs 45514, 45635)
hyperkube_test should not depend on number of spaces.
From #45524.
Apparently adding a long flag to kube-controller-manager breaks the hyperkube unit tests, because they depend on number of spaces :)
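One way to make such a comparison independent of column padding is to collapse whitespace before comparing. The sketch below is an assumption about the general technique, not the actual change to the hyperkube tests; `normalizeSpaces` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSpaces collapses every run of whitespace into a single space and
// trims the ends, so two help strings that differ only in padding compare equal.
func normalizeSpaces(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	narrow := "  --flag-a   help text"
	wide := "  --flag-a              help text" // padding grew after a longer flag was added
	fmt.Println(normalizeSpaces(narrow) == normalizeSpaces(wide))
}
```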
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45514, 45635)
refactor certificate controller to break it into two parts
Break pkg/controller/certificates into:
* pkg/controller/certificates/approver: containing the group approver
* pkg/controller/certificates/signer: containing the local signer
* pkg/controller/certificates: containing shared infrastructure
```release-note
Break the 'certificatesigningrequests' controller into a 'csrapprover' controller and 'csrsigner' controller.
```
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)
Remove unused test properties
Issue: #42676
A separate serial memcg suite was created for the initial stages of re-enabling memcg notifications. Now that all e2e tests have memcg notifications enabled, this suite is no longer needed.
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)
Append X-Forwarded-For in proxy handler
Append the request sender's IP to the `X-Forwarded-For` header chain when proxying requests. This is important for audit logging (https://github.com/kubernetes/features/issues/22) in order to capture the client IP (specifically in the case of federation or kube-aggregator).
/cc @liggitt @deads2k @ericchiang @ihmccreery @soltysh
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)
Remove kubectl's dependence on pkg/api/helper
**What this PR does / why we need it**:
Remove kubectl's dependence on pkg/api/helper, as part of
broader effort to isolate kubectl from the rest of k8s.
In this case, the code becomes private to kubectl; nobody else uses it.
**Which issue this PR fixes**
Part of a series of PRs to address kubernetes/community#598
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42042, 46139, 46126, 46258, 46312)
[Federation] Use service accounts instead of the user's credentials when accessing joined clusters' API servers.
Fixes #41267.
Release notes:
```release-note
Modifies kubefed to create, and the federation controller manager to use, credentials associated with a service account rather than the user's credentials.
```
Automatic merge from submit-queue
Fix some typos in comments in kubelet.go
**What this PR does / why we need it**:
This PR fixes some typos in kubelet.go.
**Release note**:
```release-note
```
Automatic merge from submit-queue
remove init blocks from all admission plugins
**What this PR does / why we need it**:
removes init blocks from all admission plugins
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)
check flag format in file known-flags.txt
All flags in hack/verify-flags/known-flags.txt should contain
the character `-`. This change checks for it to prevent useless flags
from being added to known-flags.txt.
ref #45948
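The check amounts to scanning the file and rejecting any entry without a `-`. The Go sketch below only illustrates that idea; the actual verification script in `hack/verify-flags` may be written differently, and `flagsMissingDash` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// flagsMissingDash returns every non-empty line of contents that does not
// contain the character "-".
func flagsMissingDash(contents string) []string {
	var bad []string
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line != "" && !strings.Contains(line, "-") {
			bad = append(bad, line)
		}
	}
	return bad
}

func main() {
	// Sample contents made up for illustration.
	contents := "api-servers\nverbose\ncloud-config\n"
	fmt.Println(flagsMissingDash(contents))
}
```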
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)
Use storage instead of REST for the CRD finalizer
**What this PR does / why we need it**:
Switch the custom resource definition finalizer controller to use
storage instead of a REST client, because a client could incorrectly try
to delete ThirdPartyResources whose names happen to collide with the
CustomResource instances.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)
Chaosmonkey - Signal stop to tests and wait for done when disruption fails
**What this PR does / why we need it**:
Prevents tests from leaking resources because their Teardown was never called when test disruption fails.
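The idea can be sketched with a stop channel and a wait group. This is an illustration under assumed names (`runWithDisruption`, `demo`), not the chaosmonkey code itself: the key point is that stop is signalled and completion is awaited whether or not the disruption succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// runWithDisruption starts every test, runs the disruption, and then always
// signals stop and waits for all tests to finish, even when the disruption
// returns an error. Without that, a failed disruption would skip teardown.
func runWithDisruption(tests []func(stop <-chan struct{}), disrupt func() error) error {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	for _, test := range tests {
		wg.Add(1)
		go func(t func(stop <-chan struct{})) {
			defer wg.Done()
			t(stop)
		}(test)
	}
	err := disrupt()
	close(stop) // signal stop regardless of err
	wg.Wait()   // wait for every test to report done
	return err
}

// demo runs two fake tests against a failing disruption and returns how many
// teardowns ran, together with the disruption error.
func demo() (int, error) {
	var mu sync.Mutex
	tornDown := 0
	test := func(stop <-chan struct{}) {
		<-stop // block until told to stop
		mu.Lock()
		tornDown++ // stands in for the test's Teardown
		mu.Unlock()
	}
	err := runWithDisruption([]func(stop <-chan struct{}){test, test}, func() error {
		return errors.New("disruption failed")
	})
	return tornDown, err
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```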
**Which issue this PR fixes**
First problem of #45842
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 46149, 45897, 46293, 46296, 46194)
GC: update required verbs for deletable resources, allow list of ignored resources to be customized
The garbage collector controller currently needs to list, watch, get,
patch, update, and delete resources. Update the criteria for
deletable resources to reflect this.
Also allow the list of resources the garbage collector controller should
ignore to be customizable, so downstream integrators can add their own
resources to the list, if necessary.
cc @caesarxuchao @deads2k @smarterclayton @mfojtik @liggitt @sttts @kubernetes/sig-api-machinery-pr-reviews
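A rough sketch of the resulting selection logic: a resource is a candidate only if it supports all six verbs and is not in the caller-supplied ignore list. The `resource` type, the function names, and the sample resources are hypothetical; the real controller works from API discovery data.

```go
package main

import "fmt"

// resource is a hypothetical, simplified view of a discovered API resource.
type resource struct {
	Name  string
	Verbs []string
}

// requiredVerbs lists the verbs named in the description above.
var requiredVerbs = []string{"list", "watch", "get", "patch", "update", "delete"}

// supportsAll reports whether r supports every verb in verbs.
func supportsAll(r resource, verbs []string) bool {
	have := make(map[string]bool, len(r.Verbs))
	for _, v := range r.Verbs {
		have[v] = true
	}
	for _, v := range verbs {
		if !have[v] {
			return false
		}
	}
	return true
}

// deletableResources returns the names of resources that support all required
// verbs and are not in the customizable ignored set.
func deletableResources(all []resource, ignored map[string]bool) []string {
	var out []string
	for _, r := range all {
		if ignored[r.Name] || !supportsAll(r, requiredVerbs) {
			continue
		}
		out = append(out, r.Name)
	}
	return out
}

func main() {
	discovered := []resource{
		{Name: "pods", Verbs: requiredVerbs},
		{Name: "events", Verbs: requiredVerbs},
		{Name: "bindings", Verbs: []string{"create"}},
	}
	ignored := map[string]bool{"events": true} // example entry a downstream integrator might add
	fmt.Println(deletableResources(discovered, ignored))
}
```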
Automatic merge from submit-queue
Allow the /logs handler on the apiserver to be toggled.
Adds a flag to kube-apiserver, and plumbs through an environment variable in configure-helper.sh
Automatic merge from submit-queue
Double `StopContainer` request timeout.
Doubled the `StopContainer` request timeout to leave some time to `SIGKILL` the container.
@yujuhong @feiskyer
Automatic merge from submit-queue
removing generic_scheduler todo after discussion (#46027)
**What this PR does / why we need it**:
**Which issue this PR fixes** #46027
**Special notes for your reviewer**: just a quick cleanup cc @wojtek-t
**Release note**:
```release-note
```
Automatic merge from submit-queue
Enable "kick the tires" support for Nvidia GPUs in COS
This PR provides an installation daemonset that will install Nvidia CUDA drivers on Google Container Optimized OS (COS).
User space libraries and debug utilities from the Nvidia driver installation are made available in special directories on the host:
* `/home/kubernetes/bin/nvidia/lib` for libraries
* `/home/kubernetes/bin/nvidia/bin` for debug utilities
Containers that run CUDA applications on COS are expected to consume the libraries and debug utilities (if necessary) from the host directories using `HostPath` volumes.
Note: This solution requires updating Pod Spec across distros. This is a known issue and will be addressed in the future. Until then CUDA workloads will not be portable.
This PR updates the COS base image version to m59. This is coupled with this PR for the following reasons:
1. Driver installation requires disabling a kernel feature in COS.
2. The kernel API for disabling this interface changed across COS versions.
3. If the COS image update is not handled in this PR, then a subsequent COS image update will break GPU integration and will require an update to the installation scripts in this PR.
4. Instead of posting three PRs (one to add the basic installer, one to update COS to m59, and one to update the installer again), this PR combines all the changes to reduce review overhead and latency, and to avoid the additional noise that would be created when GPU tests break.
**Try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1" cluster/kube-up.sh`
4. `kubectl create -f cluster/gce/gci/nvidia-gpus/cos-installer-daemonset.yaml`
5. Run your CUDA app in a pod.
**Another option is to run an e2e manually to try out this PR**
1. Get Quota for GPUs in any region
2. `export KUBE_GCE_ZONE=<zone-with-gpus> KUBE_NODE_OS_DISTRIBUTION=gci`
3. `NODE_ACCELERATORS="type=nvidia-tesla-k80,count=1"`
4. `go run hack/e2e.go -- --up`
5. `hack/ginkgo-e2e.sh --ginkgo.focus="\[Feature:GPU\]"`
The e2e will install the drivers automatically using the daemonset and then run test workloads to validate driver integration.
TODO:
- [x] Update COS image version to m59 release.
- [x] Remove sleep from the install script and add it to the daemonset
- [x] Add an e2e that will run the daemonset and run a sample CUDA app on COS clusters.
- [x] Setup a test project with necessary quota to run GPU tests against HEAD to start with https://github.com/kubernetes/test-infra/pull/2759
- [x] Update node e2e serial configs to install nvidia drivers on COS by default
Automatic merge from submit-queue (batch tested with PRs 45587, 46286)
fix typo in kubelet
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 45587, 46286)
PDB Max Unavailable Field
Completes https://github.com/kubernetes/features/issues/285
```release-note
Adds a MaxUnavailable field to PodDisruptionBudget
```
Individual commits are self-contained; the last commit can be ignored because it is autogenerated code.
cc @kubernetes/sig-apps-api-reviews @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 45766, 46223)
Scheduler should use a shared informer, and fix broken watch behavior for cached watches
Can be used either from a true shared informer or a local shared
informer created just for the scheduler.
Fixes a bug in the cache watcher where we were returning the "current" object from a watch event, not the historic object. This means that we broke behavior when introducing the watch cache. The fix may have API implications for filtering watch consumers, but on the other hand it correctly prevents filtering clients from seeing objects outside of their watch, which could otherwise lead to other subtle bugs.
```release-note
The behavior of some watch calls to the server when filtering on fields was incorrect. If watching objects with a filter, when an update was made that no longer matched the filter a DELETE event was correctly sent. However, the object that was returned by that delete was not the (correct) version before the update, but instead, the newer version. That meant the new object was not matched by the filter. This was a regression from behavior between cached watches on the server side and uncached watches, and thus broke downstream API clients.
```
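The corrected behavior for filtered watches can be sketched as below. This is an illustration of the rule in the release note, not the watch cache implementation; `object`, `event`, and `filteredEvent` are hypothetical names.

```go
package main

import "fmt"

// object and event are hypothetical, simplified stand-ins for API objects and
// watch events.
type object struct {
	Name    string
	Phase   string
	Version int
}

type event struct {
	Type   string
	Object object
}

// filteredEvent translates an update from prev to cur into the event a
// filtering watcher should see. When the update makes the object stop
// matching, the DELETED event carries prev (the last version that matched),
// not cur.
func filteredEvent(prev, cur object, match func(object) bool) (event, bool) {
	was, is := match(prev), match(cur)
	switch {
	case was && is:
		return event{Type: "MODIFIED", Object: cur}, true
	case !was && is:
		return event{Type: "ADDED", Object: cur}, true
	case was && !is:
		return event{Type: "DELETED", Object: prev}, true
	default:
		return event{}, false // never visible to this watcher
	}
}

func main() {
	running := func(o object) bool { return o.Phase == "Running" }
	prev := object{Name: "a", Phase: "Running", Version: 1}
	cur := object{Name: "a", Phase: "Succeeded", Version: 2}
	ev, ok := filteredEvent(prev, cur, running)
	fmt.Println(ok, ev.Type, ev.Object.Version)
}
```

Returning `prev` on the DELETED event means the object a client receives always satisfies the filter it asked for.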
Automatic merge from submit-queue
tighten and simplify owners in some staging repos
With the move to staging, we can have much cleaner owners across the related packages. This pares down the list of OWNERS to better match code ownership and activity. It should help direct PRs to people who are more active and familiar with the areas, for quicker review.
@kubernetes/sig-api-machinery-misc
@lavalamp @smarterclayton ptal.