With this commit kube-proxy accepts current system values (retrieved by sysctl) which are higher than the internally known and expected values.
The code change was mistakenly created as PR in the k3s project (see https://github.com/k3s-io/k3s/pull/3505).
A real life use case is described in Rancher issue https://github.com/rancher/rancher/issues/33360.
When Kubernetes runs on a Node which itself is a container (e.g. LXC), and the value is changed on the (LXC) host, kube-proxy then fails at the next start as it does not recognize the current value and attempts to overwrite the current value with the previously known one. This result in:
```
I0624 07:38:23.053960 54 conntrack.go:103] Set sysctl 'net/netfilter/nf_conntrack_max' to 524288
F0624 07:38:23.053999 54 server.go:495] open /proc/sys/net/netfilter/nf_conntrack_max: permission denied
```
However a sysctl overwrite only makes sense if the current value is lower than the previously known and expected value. If the value was increased on the host, that shouldn't really bother kube-proxy and just go on with it.
Signed-off-by: Claudio Kuenzler ck@claudiokuenzler.com
If device plugin returns device without topology, keep it internaly
as NUMA node -1, it helps at podresources level to not export NUMA
topology, otherwise topology is exported with NUMA node id 0,
which is not accurate.
It's imposible to unveile this bug just by tracing json.Marshal(resp)
in podresource client, because NUMANodes field ID has json property
omitempty, in this case when ID=0 shown as emtpy NUMANode.
To reproduce it, better to iterate on devices and just
trace dev.Topology.Nodes[0].ID.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
* This allows a controller to use cloud provider managed RBAC
when --use-service-account-credentials is set.
* Create ControllerInitFuncConstructor to pass to init funcs to avoid
future function signature growth.
* Add comments for context around legacy naming of node controllers.
* Add example for setting client names from cloud controller manager.
This is a knob added by runc 1.0.2 specifically for kubernetes,
which tells runc/libcontainer/cgroups/systemd v1 manager to not
freeze the cgroup in Set().
We set this knob here because this code is only used for pods
(rather than containers) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.
If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For the complete release notes, see
- https://github.com/opencontainers/runc/releases/tag/v1.0.2
In particular, this fixes the check cgroup v1 systemd manager check
if a container needs to be frozen before Set(), and adds a knob to
skip the check/freeze entirely (to be used by the next commit).
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
As part of https://github.com/kubernetes/kubernetes/pull/100720 we
backported fix on existing releases and in this commit we completely
remove the deprecated metric from master branch.
Signed-off-by: dntosas <ntosas@gmail.com>