desiredStateOfWorldPopulator.findAndRemoveDeletedPods() should remove
volumes from DSW when a pod is deleted on the API server and the volume is
uncertain in ASW.
podVolumesExist() should also consider uncertain volumes (where kubelet
does not know whether a volume was fully unmounted) when checking for a
pod's volumes. Added GetPossiblyMountedVolumesForPod for that.
Adding uncertain mounts to GetMountedVolumesForPod would potentially break
other callers (e.g. `verifyVolumesMountedFunc`).
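A minimal sketch of the distinction, using simplified, hypothetical types
(kubelet's real volume manager interfaces in pkg/kubelet/volumemanager have
different signatures); the point is that uncertain volumes count as
"possibly mounted", so podVolumesExist() does not miss them:

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for kubelet's volume manager state;
// the real interfaces live in pkg/kubelet/volumemanager.
type volumeState int

const (
	mounted volumeState = iota
	uncertain // kubelet does not know whether unmounting finished
	unmounted
)

type volumeManager struct {
	// volumes known to the actual state of world, per pod
	volumes map[string]map[string]volumeState
}

// GetMountedVolumesForPod returns only volumes known to be mounted.
func (vm *volumeManager) GetMountedVolumesForPod(pod string) []string {
	var out []string
	for name, state := range vm.volumes[pod] {
		if state == mounted {
			out = append(out, name)
		}
	}
	return out
}

// GetPossiblyMountedVolumesForPod additionally includes uncertain volumes,
// so callers like podVolumesExist() do not miss half-unmounted volumes.
func (vm *volumeManager) GetPossiblyMountedVolumesForPod(pod string) []string {
	var out []string
	for name, state := range vm.volumes[pod] {
		if state == mounted || state == uncertain {
			out = append(out, name)
		}
	}
	return out
}

// podVolumesExist mirrors the check described above: a pod still "has"
// volumes if any of them are mounted or possibly mounted.
func podVolumesExist(vm *volumeManager, pod string) bool {
	return len(vm.GetPossiblyMountedVolumesForPod(pod)) > 0
}

func main() {
	vm := &volumeManager{volumes: map[string]map[string]volumeState{
		"pod-a": {"data": uncertain},
	}}
	fmt.Println(podVolumesExist(vm, "pod-a")) // true: uncertain counts as possibly mounted
}
```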
When UnmountDevice fails, kubelet treats the volume mount as uncertain,
because it does not know at which stage UnmountDevice failed. The device
may already be partially unmounted / destroyed.
As a result, MountDevice will be performed when a new Pod is started on
the node after an UnmountDevice failure.
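A rough illustration of that behaviour, using a toy actual-state-of-world
rather than kubelet's real ActualStateOfWorld API:

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative only: a toy actual-state-of-world that distinguishes
// "unmounted" from "uncertain". Kubelet's real API differs.
type deviceState string

const (
	stateMounted   deviceState = "mounted"
	stateUncertain deviceState = "uncertain"
	stateUnmounted deviceState = "unmounted"
)

type actualState map[string]deviceState

func unmountDevice(asw actualState, device string, unmount func(string) error) {
	if err := unmount(device); err != nil {
		// UnmountDevice may have failed at any stage, so the device may be
		// partially unmounted / destroyed. Mark it uncertain so that
		// MountDevice runs again before a new pod uses the device.
		asw[device] = stateUncertain
		return
	}
	asw[device] = stateUnmounted
}

func needsMountDevice(asw actualState, device string) bool {
	// Both fully unmounted and uncertain devices require MountDevice.
	return asw[device] != stateMounted
}

func main() {
	asw := actualState{"/dev/xvdf": stateMounted}
	unmountDevice(asw, "/dev/xvdf", func(string) error { return errors.New("device busy") })
	fmt.Println(asw["/dev/xvdf"], needsMountDevice(asw, "/dev/xvdf")) // uncertain true
}
```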
This re-removes rsc.io/quote and rsc.io/sampler from the go.mod.
They never made it into the vendor/ tree, but still contribute
to dependency resolution complexity.
These were originally removed in #97337 but slipped back in.
Signed-off-by: Dan Lorenc <dlorenc@google.com>
Some CSI drivers can't clone a volume into another topology segment (e.g.
a cloud availability zone). The scheduler does not know about these
restrictions and schedules pods whose PVCs clone a volume more or less
randomly.
Run all volume cloning tests in the same topology segment, if such a
segment is available and has at least one schedulable node.
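One way this selection could look, sketched with client-go and the standard
zone label as the topology key; the e2e framework's actual helpers and
topology handling differ, and the Unschedulable check is a simplification
of "schedulable":

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// pickTopologySegment returns a zone that has at least one schedulable
// node, so that all cloning test pods (source and clone) can be pinned to
// the same segment.
func pickTopologySegment(ctx context.Context, cs kubernetes.Interface) (string, error) {
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return "", err
	}
	for _, node := range nodes.Items {
		if node.Spec.Unschedulable {
			continue
		}
		if zone, ok := node.Labels[v1.LabelTopologyZone]; ok && zone != "" {
			return zone, nil
		}
	}
	return "", fmt.Errorf("no schedulable node with a zone label found")
}

// pinPodToZone constrains a test pod to the chosen segment so that the
// clone ends up in the same zone as its source volume.
func pinPodToZone(pod *v1.Pod, zone string) {
	if pod.Spec.NodeSelector == nil {
		pod.Spec.NodeSelector = map[string]string{}
	}
	pod.Spec.NodeSelector[v1.LabelTopologyZone] = zone
}

func main() {
	// Constructing a real clientset is omitted; see client-go's kubeconfig
	// or in-cluster examples.
}
```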
The MetricsGrabber itself now knows whether it supports each
component. The checks inside the tests are therefore redundant at best
or, worse, wrong: for example, on a KinD cluster the "has master node
registered" check failed and metrics grabbing from the scheduler and
controller manager was skipped unnecessarily.
The MetricsGrabber checked whether a component supported metrics
grabbing, but tests had no API for using the result of that
check. Because metrics grabbing is an optional debug feature, tests
must skip checks that depend on metrics data or, when the entire
test is about metrics data, skip the test.
This is now supported with a special error that gets wrapped and
returned by the individual Grab functions.
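A sketch of the pattern, with a hypothetical sentinel error name standing
in for the framework's actual error in test/e2e/framework/metrics:

```go
package main

import (
	"errors"
	"fmt"
)

// errGrabbingDisabled stands in for the framework's sentinel error; the
// real name and package may differ.
var errGrabbingDisabled = errors.New("metrics grabbing disabled")

// grabSchedulerMetrics sketches an individual Grab function: when the
// component is not supported, it wraps and returns the sentinel error.
func grabSchedulerMetrics(supported bool) (map[string]string, error) {
	if !supported {
		return nil, fmt.Errorf("scheduler: %w", errGrabbingDisabled)
	}
	return map[string]string{"schedule_attempts_total": "42"}, nil
}

func main() {
	_, err := grabSchedulerMetrics(false)
	if errors.Is(err, errGrabbingDisabled) {
		// A test whose purpose is metrics would skip itself here; a test
		// that merely augments its checks with metrics would skip only
		// those checks.
		fmt.Println("skipping metrics-dependent checks:", err)
	}
}
```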
This can be checked by trying to retrieve log output. As in the case
where no pod is found, a warning is emitted when log retrieval fails and
metrics grabbing is disabled.
Logging is checked instead of actual metrics retrieval because the
latter is more complex and thus more likely to fail for other reasons.
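Roughly, such a probe could look like the sketch below; the helper name and
warning format are illustrative, only the client-go log retrieval call is a
real API:

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// canAccessComponentPod probes whether log retrieval for a control plane
// pod works; failure disables metrics grabbing for that component.
func canAccessComponentPod(ctx context.Context, cs kubernetes.Interface, ns, pod string) bool {
	// Requesting a single log line is enough to prove that the pod exists
	// and is reachable through the API server; it is simpler than a metrics
	// request and therefore less likely to fail for unrelated reasons.
	lines := int64(1)
	_, err := cs.CoreV1().Pods(ns).GetLogs(pod, &v1.PodLogOptions{TailLines: &lines}).DoRaw(ctx)
	if err != nil {
		fmt.Printf("WARNING: cannot retrieve logs for %s/%s, disabling metrics grabbing: %v\n", ns, pod, err)
		return false
	}
	return true
}

func main() {
	// Constructing a clientset is omitted; see client-go's kubeconfig or
	// in-cluster examples.
}
```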
The previous approach of grabbing via an nginx proxy had some
drawbacks:
- it did not work when the pods only listened on localhost (as
  configured by kubeadm) and the proxy got deployed on a different
  node
- starting the proxy raced with starting the pods, causing
  sporadic test failures because the proxy was not set up
  properly unless it saw all pods when e2e.test started
- the proxy was always started, whether it was needed or not
- the proxy was left running after a test, and the next test run
  then triggered potentially confusing messages when it failed to
  create objects for the proxy
The new approach is similar to "kubectl port-forward" + "kubectl get
--raw". It uses the port forwarding feature to establish a TCP
connection via a custom dialer, then lets client-go handle TLS and
credentials.
For some reason, verifying the server certificate did not work. As this
shouldn't be a big concern for E2E testing, certificate checking is
disabled on the client side instead of investigating this further.
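A rough sketch of that flow, using a plain TCP dial as a stand-in for the
framework's pod port-forward dialer; client-go supplies TLS and credentials
through the rest.Config:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"

	"k8s.io/client-go/rest"
)

// grabMetrics sketches the "port-forward + raw GET" approach: a copy of the
// rest.Config gets a custom dialer and relaxed server certificate checking,
// then client-go builds an authenticated transport.
func grabMetrics(ctx context.Context, config *rest.Config, host string) (string, error) {
	cfg := rest.CopyConfig(config)

	// Placeholder dialer: the real implementation forwards the connection
	// into the component pod instead of dialing the address directly.
	cfg.Dial = func(ctx context.Context, network, addr string) (net.Conn, error) {
		var d net.Dialer
		return d.DialContext(ctx, network, addr)
	}

	// Skip server certificate verification; the CA must be cleared for
	// client-go to accept the insecure setting.
	cfg.TLSClientConfig.Insecure = true
	cfg.TLSClientConfig.CAFile = ""
	cfg.TLSClientConfig.CAData = nil

	transport, err := rest.TransportFor(cfg)
	if err != nil {
		return "", err
	}
	client := &http.Client{Transport: transport}

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://"+host+"/metrics", nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	// Building the rest.Config (e.g. from a kubeconfig) is omitted here.
	fmt.Println("see grabMetrics above")
}
```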