If the node authorizer is active, RBAC rules are not needed. But if it's
disabled, kubelet needs to get permission through RBAC. In contrast to the
authorizer code which is a bit more flexible and isn't directly tied to the
current kubelet implementation (i.e. it allows list+delete instead of just
deletecollection), the RBAC entry is just for what the current kubelet does
because it's a bit easier to change.
In reality, the kubelet plugin of a DRA driver is meant to be deployed as a
daemonset with a service account that limits its
permissions. https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#additional-metadata-in-pod-bound-tokens
ensures that the node name is bound to the pod, which then can be used
in a validating admission policy (VAP) to ensure that the operations are
limited to the node.
In E2E testing, we emulate that via impersonation. This ensures that the plugin
does not accidentally depend on additional permissions.
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the driver to the API server.
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).
The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.
As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.
While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2e34e187c9 enabled kubelet to do List and Watch
requests with the caveat that kubelet should better use a field selector (which
it does). The same is now also needed for DeleteCollection because kubelet will
use that to clean up in one operation instead of using multiple.
Objects in a sync.Pool are assumed to be fungible. This is not a good assumption for pools
of *bytes.Buffer because a *bytes.Buffer's underlying array grows as needed to accomodate writes. In
Kubernetes, apiservers tend to encode "small" objects very frequently and much larger
objects (especially large lists) only occasionally. Under steady load, pooled buffers tend to be
borrowed frequently enough to prevent them from being released. Over time, each buffer is used to
encode a large object and its capacity increases accordingly. The result is that practically all
buffers in the pool retain much more capacity than needed to encode most objects.
As a basic mitigation for the worst case, buffers with more capacity than the default max request
body size are never returned to the pool.
If the nfacct sub-system is not available in the kernel then:
1. nfacct based metrics won't be registered.
2. proxier will not attempt to ensure the counters
Signed-off-by: Daman Arora <aroradaman@gmail.com>