This reverts commit 8597b343fa.
I wrote in the Kubernetes documentation:
In practice this means you need at least Linux 6.3, as tmpfs started
supporting idmap mounts in that version. This is usually needed as
several Kubernetes features use tmpfs (the service account token that is
mounted by default uses a tmpfs, Secrets use a tmpfs, etc.)
The check is wrong for several reasons:
* Pods can use userns on kernels older than 6.3; they just need to be
careful not to use a tmpfs (like a serviceaccount token). MOST users
will probably need 6.3, but it is possible to use earlier kernel
versions. 5.19 probably works fine, and with improvements in
the runtime, 5.12 can probably be supported too.
* Several distros backport changes, and the recommended approach is
usually to try the syscall instead of testing kernel versions.
I expect support for simple filesystems like tmpfs to be backported
in several distros, and on those this check would only cause confusion.
* Today a clear error is shown when the pod is created, so it's
unlikely a user will not understand why it fails.
* Returning an error if utilkernel fails to parse the running kernel
version is also too strict (we only log a warning when the version
is not the expected one).
* We are switching to enabled by default, which would log a warning
for every user running a kernel older than 6.3, adding noise to the
logs.
For these reasons, let's just remove the hardcoded kernel version check.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This reverts commit fd06dcd604.
This revert does not make it a hard error again; it is needed to cleanly
revert the commit that added this as an error in the first place.
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
This is just refactoring / renaming.
So far, the SELinux e2e tests only grab node metrics, so mention `Node` in
the function names. kube-controller-manager metrics will follow in a
subsequent commit.
We used to flush and re-add all map/set elements on nftables
setup, but it is faster to read the existing elements and only
transact the diff.
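A minimal sketch of the diff idea, assuming set/map elements can be compared as plain strings (the real nftables element representation differs, and the helper name diffElements is hypothetical). Only elements missing from the kernel are added and only stale ones are deleted, instead of flushing and re-adding everything:

```go
package main

import (
	"fmt"
	"sort"
)

// diffElements compares the elements currently present in an nftables
// set/map (as read back from the kernel) with the desired elements and
// returns only what must be added and what must be deleted.
// Hypothetical helper: elements are modeled as plain strings here.
func diffElements(existing, desired []string) (toAdd, toDelete []string) {
	have := make(map[string]struct{}, len(existing))
	for _, e := range existing {
		have[e] = struct{}{}
	}
	want := make(map[string]struct{}, len(desired))
	for _, e := range desired {
		if _, seen := want[e]; seen {
			continue // ignore duplicates in the desired list
		}
		want[e] = struct{}{}
		if _, ok := have[e]; !ok {
			toAdd = append(toAdd, e) // desired but not in the kernel yet
		}
	}
	for e := range have {
		if _, ok := want[e]; !ok {
			toDelete = append(toDelete, e) // in the kernel but no longer desired
		}
	}
	// Sort for deterministic transactions (map iteration order is random).
	sort.Strings(toAdd)
	sort.Strings(toDelete)
	return toAdd, toDelete
}

func main() {
	existing := []string{"10.0.0.1 . tcp . 80", "10.0.0.2 . tcp . 80"}
	desired := []string{"10.0.0.2 . tcp . 80", "10.0.0.3 . tcp . 80"}
	add, del := diffElements(existing, desired)
	fmt.Println("add:", add)
	fmt.Println("delete:", del)
}
```

The unchanged element ("10.0.0.2 . tcp . 80") is not touched at all, which is where the savings come from when most elements are stable between syncs.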
Signed-off-by: Nadia Pinaeva <npinaeva@redhat.com>
Add an (admittedly pretty crude) CPU allocatable check.
A more thorough refactoring is needed, but we need to unbreak CI
first, so this seems like the minimal, reasonably clean test.
Signed-off-by: Francesco Romani <fromani@redhat.com>
A real SELinuxOptionsToFileLabel function needs access to the host's
/etc/selinux to read the defaults. This is not possible in
kube-controller-manager, which often runs in a container and does not have
access to /etc on the host. Even if it had access, it could run on a
different Linux distro than the worker nodes.
Therefore, implement a custom SELinuxOptionsToFileLabel that does not
default any fields in SELinuxOptions and uses only the fields provided by the Pod.
Since the controller cannot default empty SELinux label components,
treat them as incomparable.
Example: "system_u:system_r:container_t:s0:c1,c2" *does not* conflict with ":::s0:c1,c2",
because the node that will run such a Pod may expand ":::s0:c1,c2" to "system_u:system_r:container_t:s0:c1,c2".
However, "system_u:system_r:container_t:s0:c1,c2" *does* conflict with ":::s0:c98,c99".
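A simplified sketch of the comparison rule described above. The function name seLinuxLabelsConflict is hypothetical and this is not the actual Kubernetes implementation; it assumes labels are plain user:role:type:level strings and compares the level as an opaque string (no MCS category normalization):

```go
package main

import (
	"fmt"
	"strings"
)

// component returns the i-th label component, or "" if it is missing.
func component(parts []string, i int) string {
	if i < len(parts) {
		return parts[i]
	}
	return ""
}

// seLinuxLabelsConflict reports whether two SELinux labels conflict.
// An empty component was not set by the Pod; since the controller cannot
// default it, it is treated as incomparable and never causes a conflict.
func seLinuxLabelsConflict(a, b string) bool {
	// The level (e.g. "s0:c1,c2") itself contains a colon, so split into at
	// most four parts: user, role, type, and everything else as the level.
	pa := strings.SplitN(a, ":", 4)
	pb := strings.SplitN(b, ":", 4)
	for i := 0; i < 4; i++ {
		ca, cb := component(pa, i), component(pb, i)
		if ca == "" || cb == "" {
			continue // incomparable: the node may default this component
		}
		if ca != cb {
			return true
		}
	}
	return false
}

func main() {
	full := "system_u:system_r:container_t:s0:c1,c2"
	fmt.Println(seLinuxLabelsConflict(full, ":::s0:c1,c2"))   // false
	fmt.Println(seLinuxLabelsConflict(full, ":::s0:c98,c99")) // true
}
```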
When proc mount is set to default, it should mask /proc.
The DefaultProcMount test was setting "hostUsers: false", which means
creating a user namespace. This was not causing issues before, because
user namespaces were disabled by default and therefore the field was
completely ignored. Now that userns is enabled by default, the test
fails because the runtime doesn't always have userns support.
One option would be to filter for runtimes that do have userns support.
But we definitely want to test the default case (/proc is masked)
without userns support, as it applies to all pods.
To that end, we add a param "hostUsers bool" to testProcMount that
controls whether a user namespace is used. Then, both test cases that
call this function set it
accordingly: the default case sets it to true (no user namespace), and
the unmasked case with a privileged pod sets it to false (use a user
namespace), to verify the /proc mount is unmasked in this case.
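A minimal sketch of the parameterization, using local stand-in types rather than the real k8s.io/api/core/v1 types (podSpec and buildProcMountPod are hypothetical names). Note that hostUsers=true means the pod does *not* get a user namespace:

```go
package main

import "fmt"

// Stand-in for the API's ProcMountType; the values mirror "Default"/"Unmasked".
type ProcMountType string

const (
	DefaultProcMount  ProcMountType = "Default"
	UnmaskedProcMount ProcMountType = "Unmasked"
)

// podSpec is a hypothetical, trimmed-down view of the fields the test sets.
type podSpec struct {
	HostUsers  *bool // true: no user namespace; false: use a user namespace
	Privileged bool
	ProcMount  ProcMountType
}

// buildProcMountPod mirrors the idea of passing hostUsers into testProcMount
// instead of hardcoding "hostUsers: false".
func buildProcMountPod(procMount ProcMountType, hostUsers bool) podSpec {
	return podSpec{
		HostUsers:  &hostUsers,
		Privileged: procMount == UnmaskedProcMount, // unmasked case uses a privileged pod
		ProcMount:  procMount,
	}
}

func main() {
	cases := []struct {
		name      string
		procMount ProcMountType
		hostUsers bool
	}{
		// Default masking applies to all pods, so test it without userns.
		{"default", DefaultProcMount, true},
		// Unmasked /proc with a privileged pod, inside a user namespace.
		{"unmasked", UnmaskedProcMount, false},
	}
	for _, c := range cases {
		p := buildProcMountPod(c.procMount, c.hostUsers)
		fmt.Printf("%s: procMount=%s hostUsers=%t privileged=%t\n",
			c.name, p.ProcMount, *p.HostUsers, p.Privileged)
	}
}
```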
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
kubeproxy_conntrack_reconciler_deleted_entries_total can be used
to track total entries deleted in conntrack reconciliation.
Signed-off-by: Daman Arora <aroradaman@gmail.com>
kube_proxy_conntrack_reconciler_sync_duration_seconds can be used
to track the latency of conntrack flow reconciliation.
Signed-off-by: Daman Arora <aroradaman@gmail.com>