Fixes the deletion of a ClusterCIDR object when a Node is associated with it
(i.e. has Pod CIDRs allocated from this ClusterCIDR). Currently the
ClusterCIDR finalizer is never cleaned up, as no reconciliation happens after
the associated Node has been deleted. This commit fixes the issue by adding
work items from all events to a worker queue and reconciling until the delete
is successful.
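A minimal sketch of the pattern (assuming a hypothetical syncClusterCIDR
reconcile function, not the controller's actual code): every relevant event
enqueues a key, and a failed delete is retried with backoff until it succeeds.
```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// syncClusterCIDR stands in for the real reconcile logic; it would remove the
// ClusterCIDR finalizer once no Node holds Pod CIDRs allocated from it.
func syncClusterCIDR(key string) error {
	fmt.Println("reconciling", key)
	return nil
}

func main() {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// Event handlers for ClusterCIDR and Node add/update/delete events would
	// all funnel into queue.Add(key), so a Node deletion re-triggers reconciliation.
	queue.Add("example-clustercidr")

	for {
		key, shutdown := queue.Get()
		if shutdown {
			return
		}
		if err := syncClusterCIDR(key.(string)); err != nil {
			// Requeue with backoff until the delete (finalizer removal) succeeds.
			queue.AddRateLimited(key)
		} else {
			queue.Forget(key)
		}
		queue.Done(key)
	}
}
```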
Using a `sync.Pool` to re-use the buffers that have to escape in the
predicate significantly reduces the number of allocations needed to run
the SchemaHas method, as the following benchstat comparison shows (a sketch
of the approach follows the numbers):
```
> benchstat old.bench new.bench
name         old time/op    new time/op    delta
SchemaHas-8    11.0ms ± 0%     2.1ms ± 1%   -80.60%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
SchemaHas-8    37.5MB ± 0%     0.0MB ± 0%  -100.00%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
SchemaHas-8     73.1k ± 0%      0.0k       -100.00%  (p=0.008 n=5+5)
```
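For illustration only (a minimal sketch of the technique, not the actual
SchemaHas code): keep the escaping work buffer in a sync.Pool so that only the
first few calls pay for the heap allocation forced by escape analysis.
```go
package main

import (
	"fmt"
	"sync"
)

type node struct {
	value    int
	children []*node
}

// Pool of work stacks; without it, a fresh slice escapes to the heap on every
// call because the traversal hands pointers to an arbitrary predicate.
var stackPool = sync.Pool{
	New: func() any { return new([]*node) },
}

// has reports whether pred holds for any node in the tree rooted at root.
func has(root *node, pred func(*node) bool) bool {
	stackPtr := stackPool.Get().(*[]*node)
	defer func() {
		*stackPtr = (*stackPtr)[:0] // reset, but keep capacity for the next caller
		stackPool.Put(stackPtr)
	}()
	stack := append(*stackPtr, root)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if pred(n) {
			*stackPtr = stack
			return true
		}
		stack = append(stack, n.children...)
	}
	*stackPtr = stack
	return false
}

func main() {
	tree := &node{value: 1, children: []*node{{value: 2}, {value: 3}}}
	fmt.Println(has(tree, func(n *node) bool { return n.value == 3 }))
}
```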
PVCs using the ReadWriteOncePod access mode can only be referenced by a
single pod. When a pod is scheduled that uses a ReadWriteOncePod PVC,
return "Unschedulable" if the PVC is already in-use in the cluster.
To support preemption, the "VolumeRestrictions" scheduler plugin
computes cycle state during the PreFilter phase. This cycle state
contains the number of references to the ReadWriteOncePod PVCs used by
the pod-to-be-scheduled.
During scheduler simulation (AddPod and RemovePod), we add and remove
reference counts from the cycle state if the added or removed pods use any
of these ReadWriteOncePod PVCs.
In the Filter phase, the scheduler checks if there are any PVC reference
conflicts, and returns "Unschedulable" if there is a conflict.
This is a required feature for the ReadWriteOncePod beta. For more context, see:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2485-read-write-once-pod-pv-access-mode#beta
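A rough sketch of that bookkeeping in plain Go (hypothetical names, not the
plugin's actual framework types): PreFilter records how many other pods
reference each ReadWriteOncePod PVC, AddPod/RemovePod adjust the counts during
preemption simulation, and Filter reports a conflict while any count is
non-zero.
```go
package main

import "fmt"

// preFilterState is a hypothetical stand-in for the plugin's cycle state:
// a reference count per ReadWriteOncePod PVC used by the pod being scheduled.
type preFilterState struct {
	conflictingRefs map[string]int // PVC name -> number of other pods using it
}

// Clone lets the scheduler mutate per-node copies during simulation.
func (s *preFilterState) Clone() *preFilterState {
	out := &preFilterState{conflictingRefs: map[string]int{}}
	for pvc, n := range s.conflictingRefs {
		out.conflictingRefs[pvc] = n
	}
	return out
}

// addPod / removePod mirror the scheduler's AddPod/RemovePod simulation hooks.
func (s *preFilterState) addPod(podPVCs []string) {
	for _, pvc := range podPVCs {
		if _, tracked := s.conflictingRefs[pvc]; tracked {
			s.conflictingRefs[pvc]++
		}
	}
}

func (s *preFilterState) removePod(podPVCs []string) {
	for _, pvc := range podPVCs {
		if n, tracked := s.conflictingRefs[pvc]; tracked && n > 0 {
			s.conflictingRefs[pvc]--
		}
	}
}

// filter returns false ("Unschedulable") if any tracked PVC is still referenced.
func (s *preFilterState) filter() bool {
	for _, n := range s.conflictingRefs {
		if n > 0 {
			return false
		}
	}
	return true
}

func main() {
	state := &preFilterState{conflictingRefs: map[string]int{"data": 1}}
	sim := state.Clone()
	sim.removePod([]string{"data"}) // simulate preempting the current user of "data"
	fmt.Println(sim.filter())       // true: schedulable after preemption
}
```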
SchemaHas needs to escape the apiextensions.JSONSchemaProps pointer
since it's calling an arbitrary predicate function that can keep a copy
of the pointer (it shouldn't, but the compiler can't know that). The
leak results in a fair amount of memory allocation when installing many
CRDs in a row (about 3% of the allocated memory comes from this method).
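A tiny illustration of that escape (hypothetical types; the effect can be
inspected with `go build -gcflags=-m`):
```go
package main

import "fmt"

type schemaNode struct{ title string }

// visit passes &s to an arbitrary predicate. Because pred might retain the
// pointer, escape analysis moves s to the heap (visible with -gcflags=-m),
// which is the per-call allocation the pool avoids.
func visit(pred func(*schemaNode) bool) bool {
	s := schemaNode{title: "leaf"}
	return pred(&s)
}

func main() {
	fmt.Println(visit(func(n *schemaNode) bool { return n.title == "leaf" }))
}
```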
This actually uses pkg/util/node's GetHostname(), but takes the unit
tests from cmd/kubeadm/app/util's private fork of that function, since
they were more extensive. (Of course, the fact that kubeadm had a
private fork of this function is a strong argument for moving it to
component-helpers.)
When a block device, say /dev/sdc, is unexpectedly disconnected from a node,
the corresponding backing file path found at /sys/block/loop*/backing_file gets
a "(deleted)" suffix. This patch trims that suffix out, allowing the Kubelet to
unmount the volume correctly.
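A minimal sketch of the trimming (cleanBackingFilePath is a hypothetical
helper name, not the Kubelet's actual function):
```go
package main

import (
	"fmt"
	"strings"
)

// cleanBackingFilePath strips the suffix the kernel appends when the backing
// device vanishes, e.g. "/var/lib/foo/block.img (deleted)", so the path can
// still be matched against the expected backing file and the loop device detached.
func cleanBackingFilePath(path string) string {
	return strings.TrimSuffix(strings.TrimSpace(path), " (deleted)")
}

func main() {
	raw := "/var/lib/kubelet/plugins/block.img (deleted)\n"
	fmt.Println(cleanBackingFilePath(raw)) // /var/lib/kubelet/plugins/block.img
}
```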
This fixes the following warning (error?) in the apiserver:
E0126 18:10:38.665239 16370 fieldmanager.go:210] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (test/claim-84; resource.k8s.io/v1alpha1, Kind=ResourceClaim) to smd typed: .status.reservedFor: element 0: associative list without keys has an element that's a map type" VersionKind="/, Kind=" namespace="test" name="claim-84"
The root cause is the same as in e50e8a0c91:
nothing in Kubernetes outright complains about a list of items where the item
type is comparable in Go, but not a simple type. This nonetheless isn't
supposed to be done in the API and can cause problems elsewhere.
For the ReservedFor field, everything seems to work okay except for the
warning. However, it's better to follow conventions and use a map. This is
possible in this case because UID is guaranteed to be a unique key.
Validation is now stricter than before, which is a good thing: previously,
two entries with the same UID were allowed as long as some other field was
different, which wasn't a situation that should have been allowed.
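For illustration, a hedged sketch of what the map-typed field declaration
looks like (modeled on the ResourceClaim API, not copied verbatim): the Go
type stays a slice, but the markers tell server-side apply to treat it as an
associative list keyed by uid.
```go
package v1alpha1

import "k8s.io/apimachinery/pkg/types"

// ResourceClaimConsumerReference identifies one consumer of the claim.
// UID is guaranteed to be unique, so it can serve as the list-map key.
type ResourceClaimConsumerReference struct {
	APIGroup string    `json:"apiGroup,omitempty"`
	Resource string    `json:"resource"`
	Name     string    `json:"name"`
	UID      types.UID `json:"uid"`
}

// ResourceClaimStatus sketch: declaring ReservedFor as an associative list
// keyed by uid lets the structured-merge-diff typed conversion (and thus
// managedFields tracking) handle the entries, and validation can reject
// duplicate UIDs.
type ResourceClaimStatus struct {
	// +listType=map
	// +listMapKey=uid
	// +optional
	ReservedFor []ResourceClaimConsumerReference `json:"reservedFor,omitempty"`
}
```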