Automatic merge from submit-queue
Optimize priorities in scheduler
Ref #28590
It's probably easier to review it commit by commit, since those changes are kind of independent from each other.
@davidopp - FYI
Automatic merge from submit-queue
Fix "scheduler" typo in error message
In "plugin\pkg\scheduler\algorithm\scheduler_interface_test.go", line 49, the message in "st.t.Errorf("Unexpected error %v\nTried to scheduler: %#v", err, pod)" says "scheduler" where it should say "schedule", because the code is trying to schedule the pod.
Automatic merge from submit-queue
Prevent kube-proxy from panicking when sysfs is mounted as read-only.
Fixes https://github.com/kubernetes/kubernetes/issues/25543.
This PR:
* Checks the permission of sysfs before setting conntrack hashsize, and returns an error "readOnlySysFSError" if sysfs is read-only. As far as I know, this is the only place we need write permission to sysfs; correct me if I'm wrong.
* Updates a new node condition 'RuntimeUnhealthy' with a specific reason, message, and hint to the administrator about the remediation.
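The permission check in the first bullet could look roughly like the sketch below. It is not the code from this PR: it assumes the mount table is available as text in `/proc/mounts` format, and the names `checkSysFSWritable` and `errReadOnlySysFS` are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errReadOnlySysFS is an illustrative stand-in for the PR's readOnlySysFSError.
var errReadOnlySysFS = errors.New("readOnlySysFSError")

// checkSysFSWritable scans mount table text in /proc/mounts format
// (device, mount point, fs type, options, ...) and returns
// errReadOnlySysFS if /sys is mounted without the "rw" option.
func checkSysFSWritable(mounts string) error {
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 || fields[1] != "/sys" {
			continue
		}
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == "rw" {
				return nil
			}
		}
		return errReadOnlySysFS
	}
	return errors.New("no sysfs mount found")
}

func main() {
	ro := "sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0"
	rw := "sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0"
	fmt.Println(checkSysFSWritable(ro))
	fmt.Println(checkSysFSWritable(rw))
}
```

A caller would run this check before writing the conntrack hashsize and surface the error through the node condition instead of panicking.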
I think this should be an acceptable fix for now.
Node problem detector is designed to integrate with different problem daemons, but **the main logic is in the problem detection phase**. After a problem is detected, all node problem detector does is update a node condition.
If we let kube-proxy pass the problem to node problem detector and let node problem detector update the node condition, it looks like an unnecessary hop. The logic in kube-proxy wouldn't be different from this PR, but node problem detector would have to open an unsafe door to other pods because of the lack of an authentication mechanism.
It is a bit hard to test this PR, because we don't really have a bad docker in hand. I can only manually test it:
* If I manually change the code to let it return `readOnlySysFSError`, the node condition will be updated:
```
NetworkUnavailable False Mon, 01 Jan 0001 00:00:00 +0000 Fri, 08 Jul 2016 01:36:41 -0700 RouteCreated RouteController created a route
OutOfDisk False Fri, 08 Jul 2016 01:37:36 -0700 Fri, 08 Jul 2016 01:34:49 -0700 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 08 Jul 2016 01:37:36 -0700 Fri, 08 Jul 2016 01:34:49 -0700 KubeletHasSufficientMemory kubelet has sufficient memory available
Ready True Fri, 08 Jul 2016 01:37:36 -0700 Fri, 08 Jul 2016 01:35:26 -0700 KubeletReady kubelet is posting ready status. WARNING: CPU hardcapping unsupported
RuntimeUnhealthy True Fri, 08 Jul 2016 01:35:31 -0700 Fri, 08 Jul 2016 01:35:31 -0700 ReadOnlySysFS Docker unexpectedly mounts sysfs as read-only for privileged container (docker issue #24000). This causes the critical system components of Kubernetes not properly working. To remedy this please restart the docker daemon.
KernelDeadlock False Fri, 08 Jul 2016 01:37:39 -0700 Fri, 08 Jul 2016 01:35:34 -0700 KernelHasNoDeadlock kernel has no deadlock
Addresses: 10.240.0.3,104.155.176.101
```
* If not, the node condition `RuntimeUnhealthy` won't appear.
* If I run the permission checking code in an unprivileged container, it returns `readOnlySysFSError`.
I'm not sure whether we want to mark the node as `Unschedulable` when this happens; that needs only a few lines of change. I can do that if we think we should.
I'll add some unit tests if we think this fix is acceptable.
/cc @bprashanth @dchen1107 @matchstick @thockin @alex-mohr
Mark P1 to match the original issue.
Automatic merge from submit-queue
Optimizing the processing flow of HandlePodAdditions and canAdmitPod …
Optimizing the processing flow of HandlePodAdditions and canAdmitPod methods. If the following loop body in canAdmitPod method is removed, the detection speed can be improved, and the change is very small.
------
otherPods := []*api.Pod{}
for _, p := range pods {
	if p != pod {
		otherPods = append(otherPods, p)
	}
}
------
Automatic merge from submit-queue
Inspect the nodeInfo first for CheckServiceAffinity in predicates.go
Inspect nodeInfo first in CheckServiceAffinity in predicates.go, so that the predicate returns quickly when nodeInfo.Node() is nil.
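As an illustration of the early return, here is a minimal sketch. The `Node` and `NodeInfo` types and the `checkServiceAffinity` function are simplified, hypothetical stand-ins, not the scheduler's real definitions.

```go
package main

import (
	"errors"
	"fmt"
)

// Node and NodeInfo are hypothetical, simplified stand-ins for the scheduler types.
type Node struct{ Labels map[string]string }

type NodeInfo struct{ node *Node }

// Node returns the wrapped node, or nil if none is set.
func (n *NodeInfo) Node() *Node {
	if n == nil {
		return nil
	}
	return n.node
}

// checkServiceAffinity checks the node first and bails out before doing
// any other work if it is missing; only then does it compare labels.
func checkServiceAffinity(info *NodeInfo, required map[string]string) (bool, error) {
	node := info.Node()
	if node == nil {
		return false, errors.New("node not found")
	}
	for k, v := range required {
		if node.Labels[k] != v {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	required := map[string]string{"zone": "a"}
	fmt.Println(checkServiceAffinity(&NodeInfo{}, required))
	fmt.Println(checkServiceAffinity(&NodeInfo{node: &Node{Labels: map[string]string{"zone": "a"}}}, required))
}
```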
Signed-off-by: Kevin Wang <wang.kanghua@zte.com.cn>
change the note for the canAdmitPod method.
Signed-off-by: Kevin Wang <wang.kanghua@zte.com.cn>
gofmt kubelet.go
Signed-off-by: Kevin Wang <wang.kanghua@zte.com.cn>
Automatic merge from submit-queue
[WIP/RFC] Rescheduling in Kubernetes design proposal
Proposal by @bgrant0607 and @davidopp (and inspired by years of discussion and experience from folks who worked on Borg and Omega).
This doc is a proposal for a set of inter-related concepts related to "rescheduling" -- that is, "moving" an already-running pod to a new node in order to improve where it is running. (Specific concepts discussed are priority, preemption, disruption budget, quota, `/evict` subresource, and rescheduler.)
Feedback on the proposal is very welcome. For now, please stick to comments about the design, not spelling, punctuation, grammar, broken links, etc., so we can keep the doc uncluttered enough to make it easy for folks to comment on the more important things.
ref/ #22054 #18724 #19080 #12611 #20699 #17393 #12140 #22212
@HaiyangDING @mqliang @derekwaynecarr @kubernetes/sig-scheduling @kubernetes/huawei @timothysc @mml @dchen1107
Automatic merge from submit-queue
Add meta field to predicate signature to avoid computing the same things multiple times
This PR only uses it to avoid computing QOS of a pod for every node from scratch.
Ref #28590
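A minimal sketch of the idea, under stated assumptions: all types and names here (`Pod`, `NodeInfo`, `predicateMeta`, `computeMeta`, `fittingNodes`) are hypothetical, and the best-effort test is a simplification of real QOS computation. The point is only that the per-pod value is computed once and handed to the predicate for each node.

```go
package main

import "fmt"

// Pod, NodeInfo, and predicateMeta are simplified, hypothetical types.
type Pod struct {
	Name                 string
	CPURequest, CPULimit int64
}

type NodeInfo struct {
	Name           string
	MemoryPressure bool
}

// predicateMeta holds per-pod values that are the same for every node.
type predicateMeta struct {
	bestEffort bool
}

// computeMeta runs once per pod, before any node is evaluated.
func computeMeta(pod *Pod) *predicateMeta {
	return &predicateMeta{bestEffort: pod.CPURequest == 0 && pod.CPULimit == 0}
}

// FitPredicate takes the precomputed meta as an extra argument.
type FitPredicate func(pod *Pod, meta *predicateMeta, node *NodeInfo) bool

// checkMemoryPressure rejects best-effort pods on nodes under memory pressure.
// It falls back to computing meta itself when the caller passes nil.
func checkMemoryPressure(pod *Pod, meta *predicateMeta, node *NodeInfo) bool {
	if meta == nil {
		meta = computeMeta(pod)
	}
	return !(meta.bestEffort && node.MemoryPressure)
}

// fittingNodes computes meta once and reuses it for every node.
func fittingNodes(pod *Pod, nodes []*NodeInfo, pred FitPredicate) []string {
	meta := computeMeta(pod)
	var fits []string
	for _, n := range nodes {
		if pred(pod, meta, n) {
			fits = append(fits, n.Name)
		}
	}
	return fits
}

func main() {
	nodes := []*NodeInfo{{Name: "a"}, {Name: "b", MemoryPressure: true}}
	fmt.Println(fittingNodes(&Pod{Name: "be"}, nodes, checkMemoryPressure))
}
```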
Automatic merge from submit-queue
RemoveContainer in Runtime interface
- Added a DeleteContainer method in Runtime interface
- Implemented DeleteContainer for docker
#28552
Automatic merge from submit-queue
Fixes bad heuristic when calling "tc show" to check interface
`tc` sometimes returns stuff that has more than 12 words in its response. The heuristic is bad, but this at least fixes the case when `tc` is returning too much.
Fixes #28571.
Automatic merge from submit-queue
Add checks in Create and Update Cgroup methods
This PR is connected to upstream issue for adding pod level cgroups in Kubernetes: #27204
Libcontainer currently doesn't support updates to parent devices cgroups. Until we get libcontainer to support skipping the devices cgroup, we will keep that logic on the kubelet side.
This PR includes:
1. Skip the devices cgroup when updating a cgroup. We only update the memory and cpu subsystems.
2. We explicitly pass all the cgroup paths that don't already exist to Apply()
3. Adds an AlreadyExists() method which is a utility function to check if all the subsystems of a cgroup already exist.
On cgroupManager.Update() we only call Set(), and on cgroupManager.Create() we only call Apply().
@vishh PTAL
Automatic merge from submit-queue
Extract kubelet network code into its own file
Continuing the effort to begin modularizing the kubelet, this PR extracts the networking code into its own file.
@kubernetes/sig-node cc @kubernetes/sig-network
Automatic merge from submit-queue
allow lock acquisition injection for quota admission
Allows for custom lock acquisition when composing the quota admission controller.
@derekwaynecarr I'm still experimenting to make sure this satisfies the need downstream, but looking for agreement in principle
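One way to picture the injection point is the sketch below. It is an assumption-laden illustration, not the admission controller's API: `LockFactory`, `quotaEvaluator`, `newQuotaEvaluator`, and `charge` are hypothetical names, and the default falls back to a single in-process mutex.

```go
package main

import (
	"fmt"
	"sync"
)

// LockFactory acquires a lock for key and returns the function that releases it.
type LockFactory func(key string) (release func())

// quotaEvaluator is a hypothetical component whose locking strategy is injected.
type quotaEvaluator struct {
	acquire LockFactory
	used    map[string]int
}

// newQuotaEvaluator uses the injected factory, or a single mutex if none is given.
func newQuotaEvaluator(acquire LockFactory) *quotaEvaluator {
	if acquire == nil {
		var mu sync.Mutex
		acquire = func(string) func() {
			mu.Lock()
			return mu.Unlock
		}
	}
	return &quotaEvaluator{acquire: acquire, used: map[string]int{}}
}

// charge adds n to the namespace's usage while holding the injected lock.
func (q *quotaEvaluator) charge(namespace string, n int) int {
	release := q.acquire(namespace)
	defer release()
	q.used[namespace] += n
	return q.used[namespace]
}

func main() {
	acquired := 0
	var mu sync.Mutex
	counting := func(key string) func() {
		mu.Lock()
		acquired++
		return mu.Unlock
	}
	q := newQuotaEvaluator(counting)
	q.charge("default", 2)
	fmt.Println(q.charge("default", 3), acquired)
}
```

A downstream composer could pass a factory backed by a distributed lock without the evaluator changing.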
Automatic merge from submit-queue
Include petsets in kubectl valid commands
Petsets are already implemented in kubectl, but there were no hints
for that subcommand.
Fixes #25615
Automatic merge from submit-queue
Allow specifying base location for test etcd data
Allows controlling where etcd test data goes. Needed in some environments (like AWS/EBS) to allow putting etcd data on a higher performing volume than /tmp
Automatic merge from submit-queue
Update coreos node e2e image to a version that uses cgroupfs
Temporary fix for #28192. This PR updates coreos node e2e image to a version that uses cgroupfs.
cc @vishh @yifan-gu
Search and replace for references to moved examples
Reverted find and replace paths on auto gen docs
Reverting changes to changelog
Fix bugs in test-cmd.sh
Fixed path in examples README
ran update-all successfully
Updated verify-flags exceptions to include renamed files
Automatic merge from submit-queue
Fixes #28205, Check release tar location for Openstack-Heat provider
This does a basic check to see where the release tars are located.
Allows people to use openstack-heat outside of compiling k8s.
Automatic merge from submit-queue
Move KUBE_GIT_UPSTREAM out of init.sh and into *-munge-docs.sh.
It is only used in those 2 scripts and this way we can set the value dynamically.
Clean up a bit too (80col, formatting)