Automatic merge from submit-queue
Ensure slices are serialized as zero-length, not null
Fixes https://github.com/kubernetes/kubernetes/issues/43203 by no longer serializing slices as null, which prevents NPEs in clients that store these fields and expect to receive non-null JSON values.
Ensures that when we convert to an external slice field that will be serialized even if empty (it has a `json` tag that does not include `omitempty`), we populate it with `[]`, not `nil`.
Other places I considered putting this logic instead:
* When unmarshaling
  * Would have to be done for both protobuf and ugorji
  * Would still have to be done here (or on marshal) to handle cases where we construct objects to return
* When marshaling
  * Would have to switch to use custom json marshaler (currently we use stdlib)
* When defaulting
  * Defaulting isn't run on some fields, notably, pod template in rc/deployment spec
  * Would still have to be done here (or on marshal) to handle cases where we construct objects to return
```release-note
API fields that previously serialized null arrays as `null` and empty arrays as `[]` no longer distinguish between those values and always output `[]` when serializing to JSON.
```
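To illustrate the behavior described above, here is a minimal sketch using only the standard library's `encoding/json`. The `podSpec` type and the `normalize` helper are hypothetical stand-ins, not the actual conversion code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSpec is a hypothetical stand-in for an external API type.
type podSpec struct {
	// No omitempty: the field is always serialized, even when empty.
	Containers []string `json:"containers"`
	// omitempty: nil and empty slices are both dropped from the output.
	Volumes []string `json:"volumes,omitempty"`
}

// marshal returns the JSON encoding of spec as a string.
func marshal(spec podSpec) string {
	out, err := json.Marshal(spec)
	if err != nil {
		panic(err)
	}
	return string(out)
}

// normalize replaces a nil Containers slice with an empty one, so that
// it serializes as [] instead of null. Volumes is left alone because
// omitempty already hides it when empty.
func normalize(spec podSpec) podSpec {
	if spec.Containers == nil {
		spec.Containers = []string{}
	}
	return spec
}

func main() {
	fmt.Println(marshal(podSpec{}))            // {"containers":null}
	fmt.Println(marshal(normalize(podSpec{}))) // {"containers":[]}
}
```

Only the field without `omitempty` needs the fix, since that is the only one that can appear as `null` on the wire.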
Automatic merge from submit-queue
Keep ResourceQuota admission at the end of the chain
Fixes #43426
Moves DefaultTolerationSeconds admission ahead of ResourceQuota so that ResourceQuota stays at the end of the chain.
Automatic merge from submit-queue
Increase memory limit for fluentd-gcp
This PR increases the fluentd memory limit in the fluentd-gcp addon to avoid OOMs. The request is left intact.
Although this should eventually be moved into kubefed itself, monitor kubefed from federation-up.sh and force it to time out if it is unable to initialize.
Automatic merge from submit-queue
e2e test for cluster-autoscaler draining node
**What this PR does / why we need it**:
Adds an e2e test for Cluster-Autoscaler removing a node with a pod running (by rescheduling the pod).
**Special notes for your reviewer**:
@mwielgus can you take a look?
Automatic merge from submit-queue
Install a REJECT rule for nodeport with no backend
Rather than actually accepting the connection, REJECT. This will avoid
CLOSE_WAIT.
Fixes #43212
@justinsb @felipejfc @spiddy
Automatic merge from submit-queue
Unify test timeouts under a common name.
Some timeouts were too aggressive, and since we've slowly been moving every controller to 5 minutes, consolidate them all under `federatedDefaultTestTimeout`. To aid in debugging some service-related issues, if a service cannot be deleted, we run `kubectl describe` on it before failing.
Automatic merge from submit-queue
Use uid in config.go instead of pod full name.
For https://github.com/kubernetes/kubernetes/issues/43397.
In config.go, use pod uid in pod cache.
Previously, when a static pod was updated, file.go generated a new UID, but config.go referenced the pod only by its full name and never updated the pod UID in the internal cache. This caused:
1) If the container spec changed, kubelet restarted the corresponding container because the container hash changed.
2) If the pod spec changed, kubelet did nothing.
With this fix, kubelet always restarts the pod whenever the pod spec (including the container spec) changes.
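The effect of keying the cache by UID can be sketched as follows. This is an illustration only: the `pod` type and `diffByUID` function are hypothetical and much simpler than the kubelet's config.go.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, pared-down pod record.
type pod struct {
	UID      string
	FullName string
	Spec     string
}

// diffByUID compares a cache keyed by pod UID against the incoming pods
// and returns the UIDs that must be added and the UIDs that must be removed.
func diffByUID(cache map[string]pod, incoming []pod) (added, removed []string) {
	seen := make(map[string]bool, len(incoming))
	for _, p := range incoming {
		seen[p.UID] = true
		if _, ok := cache[p.UID]; !ok {
			added = append(added, p.UID)
		}
	}
	for uid := range cache {
		if !seen[uid] {
			removed = append(removed, uid)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(removed)
	return added, removed
}

func main() {
	cache := map[string]pod{
		"uid-1": {UID: "uid-1", FullName: "web_default", Spec: "v1"},
	}
	// The static pod file was edited: same full name, new UID.
	incoming := []pod{{UID: "uid-2", FullName: "web_default", Spec: "v2"}}

	added, removed := diffByUID(cache, incoming)
	fmt.Println(added, removed) // [uid-2] [uid-1]
}
```

With a cache keyed by full name, the same update would look like an in-place change to an existing entry; keyed by UID, it shows up as one removal plus one addition, which corresponds to a pod restart.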
@yujuhong @bowei @dchen1107
/cc @kubernetes/sig-node-bugs
Automatic merge from submit-queue
Use Semantic.DeepEqual to compare DaemonSet template on updates
Switch to `Semantic.DeepEqual` when comparing templates on DaemonSet updates, since we can't distinguish between `null` and `[]` in protobuf. This avoids unnecessary DaemonSet pod restarts.
I didn't touch the `reflect.DeepEqual` used in the controller because we're close to the release date, and the DeepEqual in the controller doesn't cause serious issues (except for maybe causing more enqueues than needed).
Fixes #43218
@liggitt @kargakis @lukaszo @kubernetes/sig-apps-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42452, 43399)
Fix faulty assumptions in summary API testing
**What this PR does / why we need it**:
1. On systemd, launch kubelet in a dedicated part of the cgroup hierarchy.
1. Bump allowable memory usage for busybox containers, as my own local testing often showed values > 1MB which were valid per the memory limit settings we impose.
1. There is a logic flaw today in how we report node.memory.stats that needs to be fixed in a follow-on.

For the last issue, we look at the `/sys/fs/cgroup/memory.stat[rss]` value, which, if you have global accounting enabled on systemd machines (as expected), will report 0 because nothing runs local to the root cgroup. We really want to show the `total_rss` value for non-leaf cgroups so we get the usage of the full hierarchy.
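As a rough sketch of the distinction, the snippet below parses the "key value" lines of a `memory.stat` file and reads both counters. The `parseMemoryStat` helper is hypothetical and the sample values are made up; this is not the code the summary API uses.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMemoryStat parses the "key value" lines of a cgroup memory.stat
// file into a map from counter name to value.
func parseMemoryStat(content string) (map[string]uint64, error) {
	stats := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue // skip blank or malformed lines
		}
		value, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("parsing %q: %v", scanner.Text(), err)
		}
		stats[fields[0]] = value
	}
	return stats, scanner.Err()
}

func main() {
	// Made-up sample for a root cgroup with no local tasks.
	sample := "cache 0\nrss 0\ntotal_cache 4096\ntotal_rss 1048576\n"

	stats, err := parseMemoryStat(sample)
	if err != nil {
		panic(err)
	}
	// rss counts only this cgroup; total_rss includes all descendants.
	fmt.Println(stats["rss"], stats["total_rss"]) // 0 1048576
}
```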
Automatic merge from submit-queue (batch tested with PRs 42452, 43399)
Modify getInstanceByName to avoid calling getInstancesByNames
This PR modifies getInstanceByName to loop through all management zones
directly instead of calling getInstancesByNames. Currently,
getInstancesByNames uses a node name prefix as a filter to list
instances. If the prefix does not match, it returns all instances,
which is very wasteful since getInstanceByName only queries one instance
with a specific name.
Partially fixes issue #42445
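The per-zone lookup can be sketched like this. Everything here is hypothetical: `zoneInstances` is an in-memory stand-in for the cloud provider's per-zone "get instance" call, and the function signature is not the real one.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("instance not found")

// zoneInstances is a hypothetical in-memory stand-in for per-zone lookups:
// zone name -> instance name -> instance ID.
type zoneInstances map[string]map[string]string

// getInstanceByName asks each zone for exactly one instance by name and
// stops at the first hit, instead of listing instances with a prefix filter.
func getInstanceByName(zones []string, data zoneInstances, name string) (string, string, error) {
	for _, zone := range zones {
		// Indexing a missing zone yields a nil map, which is safe to read.
		if id, ok := data[zone][name]; ok {
			return zone, id, nil
		}
	}
	return "", "", errNotFound
}

func main() {
	zones := []string{"zone-a", "zone-b"}
	data := zoneInstances{
		"zone-b": {"node-2": "id-2"},
	}

	zone, id, err := getInstanceByName(zones, data, "node-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(zone, id) // zone-b id-2
}
```

The cost is at most one lookup per zone, independent of how many instances exist.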
Automatic merge from submit-queue (batch tested with PRs 43398, 43368)
CRI: add support for dns cluster first policy
**What this PR does / why we need it**:
PR #29378 introduces ClusterFirstWithHostNet policy but only dockertools was updated to support the feature.
This PR updates kuberuntime to support it for all runtimes.
**Which issue this PR fixes**
fixes #43352
**Special notes for your reviewer**:
Candidate for v1.6.
**Release note**:
```release-note
NONE
```
cc @thockin @luxas @vefimova @Random-Liu
Automatic merge from submit-queue
Deflake TestSyncDeploymentDeletionRace
**What this PR does / why we need it**:
The cache was sometimes catching up while we were testing the case
where the cache is not yet caught up.
Before this fix, I could reproduce the failure with the following
command. After the fix, it passes.
```
go test -count 100000 -run TestSyncDeploymentDeletionRace
```
I checked the other controllers, and none of them start informers for the deletion race test. I also checked that the deletion race tests for other controllers all pass with `-count 100000`.
**Which issue this PR fixes**:
Fixes #43390
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 42659, 43370)
dockershim: process protocol correctly for port mapping
**What this PR does / why we need it**:
dockershim: process protocol correctly for port mapping.
**Which issue this PR fixes**
Fixes #43365.
**Special notes for your reviewer**:
Should be included in v1.6.
**Release note**:
```release-note
NONE
```
/cc @Random-Liu @justinsb @kubernetes/sig-node-pr-reviews
Automatic merge from submit-queue (batch tested with PRs 42659, 43370)
RC/RS: Fixes for ControllerRef.
**What this PR does / why we need it**:
This fixes some issues with RC/RS ControllerRef handling that were brought up in reviews for other controller types, after #41984 was merged. See the individual commit messages for details.
bazel update
added new files to reflect that only one method has changed between arch types.
forgot to add changes to a commit.
changes made and gofmt run.
changed node_problem_detector to node_problem_detector_linux and made it linux only.
updated bazel
Automatic merge from submit-queue
Loosen requirements of cluster logging e2e tests, make them more stable
There should be an e2e test for cloud logging in the main test suite, because this is an important part of the functionality and it can be broken by different components.
However, the existing cluster logging e2e tests were too strict for the current solution, which may lose some log entries and therefore cause flakes. There's no way to fix this problem in 1.6, so this PR makes the basic cluster logging e2e tests less strict.
lsblk reports FSTYPE of devices with partition tables as empty string "",
which is indistinguishable from empty devices. We must look for dependent
devices (i.e. partitions) to see that the device is really empty, and report
an error otherwise.
I checked that LVM, LUKS and MD RAID have their own FSTYPE in lsblk output,
so it should be only a partition table that has empty FSTYPE.
The main point of this patch is to run lsblk without "-d", i.e. print all
dependent devices and check if they're there.
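The check described above can be sketched as a small parser. It assumes the lsblk output has no header and lists one FSTYPE per line, with the device itself first and any dependent devices after it; `diskState` is a hypothetical helper, and the sketch does not invoke lsblk.

```go
package main

import (
	"fmt"
	"strings"
)

// diskState inspects FSTYPE output for a device and its dependents.
// The first line belongs to the device itself; every further line is a
// dependent device such as a partition.
func diskState(output string) (fstype string, hasDependents bool) {
	lines := strings.Split(strings.TrimSuffix(output, "\n"), "\n")
	fstype = strings.TrimSpace(lines[0])
	return fstype, len(lines) > 1
}

func main() {
	samples := map[string]string{
		"blank disk":       "\n",
		"formatted disk":   "ext4\n",
		"partitioned disk": "\next4\next4\n",
	}
	for _, name := range []string{"blank disk", "formatted disk", "partitioned disk"} {
		fstype, deps := diskState(samples[name])
		// Only a device with no FSTYPE and no dependents is really empty.
		empty := fstype == "" && !deps
		fmt.Printf("%s: fstype=%q dependents=%v empty=%v\n", name, fstype, deps, empty)
	}
}
```

A partitioned disk and a blank disk both report an empty FSTYPE on the first line; only the extra lines tell them apart.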
PR #29378 introduces ClusterFirstWithHostNet, but docker doesn't support
setting dns options together with hostnetwork. This commit rewrites
resolv.conf the same way dockertools does.
PR #29378 introduces ClusterFirstWithHostNet policy but only dockertools
was updated to support the feature. This PR updates kuberuntime to
support it for all runtimes.
Also fixes #43352.