Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
print apiserver log location on apiserver error
**What this PR does / why we need it**:
Improve user experience. Attempt to direct user to logs of failing component.
**Special notes for your reviewer**:
In addition to failure, point to logs so that a user can attempt to self remedy and have more information available to debug immediately. A user may not know that the failing component has logs.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
Add [Flaky] tag to persistent volumes tests
**What this PR does / why we need it**:
Persistent Volume tests continue to flake in CI.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 40906, 40924, 40938, 40902, 40911)
Check whether apiversions is empty
What this PR does / why we need it:
#39719 check whether apisversions get from /api is empty
Special notes for your reviewer:
@caesarxuchao
Automatic merge from submit-queue
Add an upgrade test for secrets.
**What this PR does / why we need it**: This PR adds an upgrade test for secrets. It creates a secret and makes sure that pods can consume it before an after an upgrade.
Automatic merge from submit-queue
CRI: Handle cri in-place upgrade
Fixes https://github.com/kubernetes/kubernetes/issues/40051.
## How does this PR restart/remove legacy containers/sandboxes?
With this PR, dockershim will convert and return legacy containers and infra containers as regular containers/sandboxes. Then we can rely on the SyncPod logic to stop the legacy containers/sandboxes, and the garbage collector to remove the legacy containers/sandboxes.
To forcibly trigger restart:
* For infra containers, we manually set `hostNetwork` to opposite value to trigger a restart (See [here](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L389))
* For application containers, they will be restarted with the infra container.
## How does this PR avoid extra overhead when there is no legacy container/sandbox?
For the lack of some labels, listing legacy containers needs extra `docker ps`. We should not introduce constant performance regression for legacy container cleanup. So we added the `legacyCleanupFlag`:
* In `ListContainers` and `ListPodSandbox`, only do extra `ListLegacyContainers` and `ListLegacyPodSandbox` when `legacyCleanupFlag` is `NotDone`.
* When dockershim starts, it will check whether there are legacy containers/sandboxes.
* If there are none, it will mark `legacyCleanupFlag` as `Done`.
* If there are any, it will leave `legacyCleanupFlag` as `NotDone`, and start a goroutine periodically check whether legacy cleanup is done.
This makes sure that there is overhead only when there are legacy containers/sandboxes not cleaned up yet.
## Caveats
* In-place upgrade will cause kubelet to restart all running containers.
* RestartNever container will not be restarted.
* Garbage collector sometimes keep the legacy containers for a long time if there aren't too many containers on the node. In that case, dockershim will keep performing extra `docker ps` which introduces overhead.
* Manually remove all legacy containers will fix this.
* Should we garbage collect legacy containers/sandboxes in dockershim by ourselves? /cc @yujuhong
* Host port will not be reclaimed for the lack of checkpoint for legacy sandboxes. https://github.com/kubernetes/kubernetes/pull/39903 /cc @freehan
/cc @yujuhong @feiskyer @dchen1107 @kubernetes/sig-node-api-reviews
**Release note**:
```release-note
We should mention the caveats of in-place upgrade in release note.
```
Automatic merge from submit-queue
Plumb subresource through subjectaccessreview
plumb all fields for subjectaccessreview into the resulting `authorizer.AttributesRecord`
```release-note
The SubjectAccessReview API passes subresource and resource name information to the authorizer to answer authorization queries.
```
Automatic merge from submit-queue
examples: PV docs clarify Azure storage account restriction
**What this PR does / why we need it**: One line doc fix, clarifies a constraint for using `AzureDisk` volumes.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40276
**Special notes for your reviewer**: None
**Release note**:
```release-note
NONE
```
cc: @rootfs @otaviosoares
Automatic merge from submit-queue
GroupMetaFactoryArgs documentation
**What this PR does / why we need it**:
Documentation for people writing new API-Groups.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: documentation
**Special notes for your reviewer**:
@deads2k @pmorie my thoughts from writing the service-catalog apiserver.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Tidy up the main README.
Removed the coveralls link since it hasn't been updated in a few years. Made some punctuation more consistent.
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue
Optionally avoid evicting critical pods in kubelet
For #40573
```release-note
When feature gate "ExperimentalCriticalPodAnnotation" is set, Kubelet will avoid evicting pods in "kube-system" namespace that contains a special annotation - `scheduler.alpha.kubernetes.io/critical-pod`
This feature should be used in conjunction with the rescheduler to guarantee availability for critical system pods - https://kubernetes.io/docs/admin/rescheduler/
```
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Convert hack/e2e.go to a test-infra/kubetest shim
Replaces `hack/e2e.go` for a shim that passes the args to `k8s.io/test-infra/kubetest`
Adds fejta to `hack/OWNERS`
Adds `e2e_test.go` for unit test coverage of the shim.
`Usage: go run hack/e2e.go [--get=true] [--old=1d] -- KUBETEST_ARGS`
In other words there is are `--get` and `--old` shim flags, which control how we upgrade `kubetest`, and a `--` to separate the shim args from the kubetest args, and the existing kubetest args like `--down` `--up`, etc. If only `KUBETEST_ARGS` are used then you can skip the `--` (although golang will complain about it).
Once this is ready to go I will update the kubekins-e2e image to copy this file from test-infra: https://github.com/kubernetes/test-infra/blob/master/jenkins/e2e-image/Dockerfile#L70
ref https://github.com/kubernetes/test-infra/issues/1475
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Forgiveness library changes
**What this PR does / why we need it**:
Splited from #34825, contains library changes that are needed to implement forgiveness:
1. ~~make taints-tolerations matching respect timestamps, so that one toleration can just tolerate a taint for only a period of time.~~ As TaintManager is caching taints and observing taint changes, time-based checking is now outside the library (in TaintManager). see #40355.
2. make tolerations respect wildcard key.
3. add/refresh some related functions to wrap taints-tolerations operation.
**Which issue this PR fixes**:
Related issue: #1574
Related PR: #34825, #39469
~~Please note that the first 2 commits in this PR come from #39469 .~~
**Special notes for your reviewer**:
~~Since currently we have `pkg/api/helpers.go` and `pkg/api/v1/helpers.go`, there are some duplicated periods of code laying in these two files.~~
~~Ideally we should move taints-tolerations related functions into a separate package (pkg/util/taints), and make it a unified set of implementations. But I'd just suggest to do it in a follow-up PR after Forgiveness ones done, in case of feature Forgiveness getting blocked to long.~~
**Release note**:
```release-note
make tolerations respect wildcard key
```
Automatic merge from submit-queue (batch tested with PRs 40696, 39914, 40374)
Cleanup scheduler server with an external config class
**What this PR does / why we need it**:
Some cleanup in cmd/server so that the parts which setup scheduler configuration are stored and separately tested.
- additionally a simple unit test to check that erroneous configs return a non-nil error is included.
- it also will make sure we avoid nil panics of schedulerConfiguration is misconfigured.
Automatic merge from submit-queue (batch tested with PRs 40862, 40909)
Remove apimachinery from staging client-go/Godeps/Godeps.json
The publishing robot will add the latest version of apimachinery to Godeps.json.
This is part of the effort to allow update staging apimachinery and staging client-go in a same PR.
The robot change is here: https://github.com/kubernetes/test-infra/pull/1784
@deads2k @stts @lavalamp
Automatic merge from submit-queue (batch tested with PRs 40862, 40909)
[Federation][kubefed] Add option to disable persistence storage for etcd
**What this PR does / why we need it**:
This is part of updates to enable deployment of federation on non-cloud environments. This pr enables disabling persistent storage for etcd via kubefed.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#40617
**Special notes for your reviewer**:
**Release note**:
```
[Federation] Add --etcd-persistent-storage flag to kubefed to enable/disable persistent storage for etcd
```
cc: @kubernetes/sig-federation-bugs @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 40795, 40863)
Use caching secret manager in kubelet
I just found that this is in my local branch I'm using for testing, but not in master :)
Automatic merge from submit-queue (batch tested with PRs 40864, 40666, 38382, 40874)
Density Test includes deletion and volumes
Moved the calls to deletePodSync to BEFORE logDensityTimeSeries. This is because the parser considers a line printed in logDensityTimeSeries to be the "end" of the test. This change includes deletion in the "test window", but makes no other changes.
I also added volumes to the test, so that we can make sure that mounting and unmounting volumes are also taken into account for performance profiling.