The "Failed to update lease optimistically, falling back to slow path"
message was logged at Error level, but this is expected behavior during
normal leader election when the optimistic update encounters a conflict.
The system gracefully falls back to the slow path (Get + Update), so
this is not a real error. Downgrade to V(2) Info to reduce log noise.
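The optimistic-update-with-fallback shape described above can be sketched with a stdlib-only model. The lease/store types below are illustrative stand-ins, not the real coordination API; the point is that a conflict on the fast path is an expected outcome that the slow path absorbs:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for an apierrors.IsConflict failure.
var errConflict = errors.New("conflict: resourceVersion mismatch")

// lease is a minimal stand-in for a coordination/v1 Lease.
type lease struct {
	holder          string
	resourceVersion int
}

// store simulates the API server's optimistic-concurrency check.
type store struct{ current lease }

func (s *store) get() lease { return s.current }

func (s *store) update(l lease) error {
	if l.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	l.resourceVersion++
	s.current = l
	return nil
}

// renew tries the optimistic fast path with a possibly stale cached copy;
// on conflict it falls back to Get + Update (the "slow path"). A conflict
// here is expected behavior, not an error worth logging loudly.
func renew(s *store, cached lease, holder string) (string, error) {
	cached.holder = holder
	if err := s.update(cached); err == nil {
		return "fast path", nil
	}
	fresh := s.get() // slow path: re-read, then update
	fresh.holder = holder
	if err := s.update(fresh); err != nil {
		return "", err
	}
	return "slow path", nil
}

func main() {
	s := &store{current: lease{holder: "a", resourceVersion: 7}}
	stale := lease{holder: "a", resourceVersion: 6} // cached copy is stale
	path, err := renew(s, stale, "b")
	fmt.Println(path, err) // slow path <nil>
}
```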
Kubernetes-commit: 04977a0ea4592bfaa70d5095a4cfe99dd4b847e1
When watch.Broadcaster.Shutdown() is called it drains all queued events
then calls closeAll(), which closes every watcher's result channel.
eventBroadcasterImpl.Shutdown() calls Broadcaster.Shutdown() first,
then calls the cancellation context's cancel() function. Between those
two steps there is a window in which the result channel is closed while
the cancellation context is still live.
Without the two-value channel receive the goroutine in StartEventWatcher
would spin on the already-closed channel: each select iteration
immediately receives the zero-value watch.Event, the type assertion
fails (nil interface, ok == false), and the loop continues burning CPU
until the select scheduler eventually picks the cancelationCtx.Done()
case.
Guard against this by reading the ok boolean from the channel receive:
    case watchEvent, ok := <-watcher.ResultChan():
        if !ok {
            return
        }
This is the correct and idiomatic Go pattern for a channel that may be
closed by its producer. Note that when this return path is taken the
broadcaster has already delivered every queued event (Broadcaster.Shutdown
blocks until the distribute loop exits before closeAll runs), so no
events are silently dropped.
Add a regression test (TestStartEventWatcherExitsOnDirectShutdown) that
creates a broadcaster without an external context so Shutdown() is
fully synchronous, starts a watcher, and verifies the goroutine exits
cleanly via goleak.VerifyNone.
Signed-off-by: Rajneesh180 <rajneeshrehsaan48@gmail.com>
Kubernetes-commit: 95c15b54069922b0a66c198a064577ea0a160694
The error returned from Until() comes solely from context cancellation,
which is expected behavior when the reflector is stopped. Logging it as
an error (or even at V(4)) creates unnecessary noise.
Kubernetes-commit: cc483208aa306b8c4078d4118cf78a10e58481ec
The change introduced in 59cd1d0b3bb378f40a639e21b615f4df1d4a5a14
causes PollUntilContextTimeout to call the condition function even when
the context is canceled already. This is unnecessary in our case and
only leads to one extra loop of processing and extra error messages.
This change makes the renew loop behave like it did before the change.
Kubernetes-commit: aa494f8174d745c3d9842f3d48272a3c2d6dcbcc
Move backoff documentation comments to var block for better discoverability.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Kubernetes-commit: 33ebd41b2c1abdc03beabd9ccff3428a8fd46473
Replace all imports of k8s.io/apimachinery/pkg/util/dump with
k8s.io/utils/dump across the repo. The apimachinery dump package
now contains deprecated wrapper functions that delegate to
k8s.io/utils/dump for backwards compatibility.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Kubernetes-commit: 550cc8645bedcc8b187e0ebeb52ead29d5631a32
Passing a context to StartWithContext enables context-aware reflector
logging. This is the main remaining source of log spam (output to stderr
instead of per-test logger) in controller unit tests.
WaitForCacheSyncWithContext takes advantage of the new cache.WaitFor +
NamedHasSynced functionality to finish "immediately" (= no virtual time
passed) in a synctest bubble. While at it, the return type gets improved so
that a failure is easier to handle.
Kubernetes-commit: 5ff323de791df88880f6e065f5de4b445e5c90ed
The main advantage is that waiting on channels creates a causal relationship
between goroutines which is visible to synctest. When a controller running in
a test's background goroutine inside a synctest bubble does a WaitFor, the
test can use synctest.Wait to wait for completion of the cache sync without
requiring any test-specific "has controller synced" API. Without this, the
test had to poll or otherwise wait for the controller.
The polling in WaitForCacheSync moved the virtual clock forward by a random
amount, depending on how often it had to check in wait.Poll. Now tests can be
written such that all events during a test happen at a predictable time. This
will be demonstrated in a separate commit for the
pkg/controller/devicetainteviction unit test.
The benefit for normal production is immediate continuation when the last
informer is synced (not really a problem, but still...) and, more importantly,
nicer logging thanks to the names associated with the things being waited
for. The caller decides whether logging is enabled or disabled and
describes what is being waited for (typically informer caches, but maybe also
event handlers or even something else entirely as long as it implements the
DoneChecker interface).
Before:
Waiting for caches to sync
Caches are synced
After:
Waiting for="cache and event handler sync"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.Pod"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceClaim"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceSlice"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.DeviceClass"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1alpha3.DeviceTaintRule"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceClaim + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.Pod + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1alpha3.DeviceTaintRule + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
Done waiting for="cache and event handler sync" instance="SharedIndexInformer *v1.ResourceSlice + event handler k8s.io/kubernetes/pkg/controller/devicetainteviction.(*Controller).Run"
The "SharedIndexInformer *v1.Pod" is also how this appears in metrics.
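A hedged sketch of the channel-based waiting described above, with an assumed DoneChecker-like shape. The interface and names here are illustrative, not the actual client-go API:

```go
package main

import (
	"fmt"
	"sync"
)

// doneChecker is an assumed shape for the DoneChecker interface mentioned
// above: a name for logging and a channel that is closed once synced.
type doneChecker interface {
	Name() string
	Done() <-chan struct{}
}

type syncedThing struct {
	name string
	done chan struct{}
}

func (s *syncedThing) Name() string          { return s.name }
func (s *syncedThing) Done() <-chan struct{} { return s.done }

// waitFor blocks on each checker's channel. Unlike polling, the channel
// receive establishes a causal relationship visible to synctest, and the
// caller continues immediately when the last checker finishes.
func waitFor(checkers ...doneChecker) []string {
	var mu sync.Mutex
	var order []string
	var wg sync.WaitGroup
	for _, c := range checkers {
		wg.Add(1)
		go func(c doneChecker) {
			defer wg.Done()
			<-c.Done() // no wait.Poll, so no virtual time passes in a bubble
			mu.Lock()
			order = append(order, c.Name())
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return order
}

func main() {
	pods := &syncedThing{name: "SharedIndexInformer *v1.Pod", done: make(chan struct{})}
	close(pods.done) // already synced
	fmt.Println(waitFor(pods)) // [SharedIndexInformer *v1.Pod]
}
```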
Kubernetes-commit: fdcbb6cba9a04c028b158bf66d505df7431f63fe
This improves logging and enables more informative waiting for cache sync in a
following commit. It addresses one klog.TODO in the Reflector.
The RealFIFOOptions and InformerOptions structs get extended the same way as
DeltaFIFOOptions before: a logger may be set, but it's not required. This is
not an API break.
That the name has to be passed separately is a bit annoying at first glance
because it could also be set directly on the logger through WithName, but
keeping it separate is better:
- name can be set without providing a logger
- name can be defaulted
- less code in the caller when passing through a logger and adding
the name only in the field
- last but not least, extracting the name is not supported in a portable
manner by logr
All in-tree references in production code get updated.
While at it, logging in the fifos gets updated to follow best practices: if
some code encounters an abnormal situation and then continues, it should use
utilruntime.HandleErrorWithLogger instead of normal error logging.
Existing "logger" fields get moved to the top because that is a more common
place for such a read-only field.
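The optional-logger-plus-separate-name pattern might look like the following, with the stdlib slog standing in for klog/logr and illustrative field names (not the real RealFIFOOptions/InformerOptions fields):

```go
package main

import (
	"fmt"
	"log/slog"
	"os"
)

// fifoOptions sketches an options struct where a logger may be set but is
// not required, and the name is a separate, independently defaultable field.
type fifoOptions struct {
	Logger *slog.Logger // optional; nil means "use the default"
	Name   string       // optional; defaulted below
}

func newFIFOLogger(opts fifoOptions) *slog.Logger {
	logger := opts.Logger
	if logger == nil {
		logger = slog.New(slog.NewTextHandler(os.Stdout, nil))
	}
	name := opts.Name
	if name == "" {
		name = "unknown" // name can be defaulted without touching the logger
	}
	// Attaching the name as a field keeps the caller's logger untouched,
	// which is portable, unlike trying to extract a WithName from logr.
	return logger.With("fifo", name)
}

func main() {
	logger := newFIFOLogger(fifoOptions{Name: "pods"})
	logger.Info("queue started")
	fmt.Println("configured")
}
```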
Kubernetes-commit: 45251e5f654e6c052659d110cd721f9fbe185191
While time.Sleep is what the test needs, maybe an arbitrary hook invocation is
more acceptable in the production code because it is more general.
Kubernetes-commit: 2ec0305d728bf5ce8f8df314a18e71aa120a00cf
In the unlikely situation that sharedProcessor.distribute was triggered by a
resync before sharedProcessor.run had a chance to start the listeners, the
sharedProcessor deadlocked: sharedProcessor.distribute held a read lock on
listenersLock while blocked writing to a listener's channel, and the
listeners that would have drained those channels were never started because
sharedProcessor.run was blocked trying to acquire a lock on listenersLock.
This gets fixed by releasing the read/write lock in sharedProcessor.distribute
while waiting for all listeners to be started. Because either all or no
listeners are started, the existing global listenersStarted boolean is
sufficient.
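The fix pattern, never blocking on a listener channel while holding listenersLock, can be sketched with a minimal single-listener model (field names mirror the commit, logic is simplified):

```go
package main

import (
	"fmt"
	"sync"
)

// processor sketches the sharedProcessor fix: distribute must not block on
// a listener channel while holding listenersLock, because run needs that
// lock to start the listeners.
type processor struct {
	listenersLock    sync.RWMutex
	listenersStarted bool
	started          chan struct{} // closed once run has started the listeners
	events           chan string
}

func (p *processor) run() {
	p.listenersLock.Lock()
	p.listenersStarted = true
	close(p.started)
	p.listenersLock.Unlock()
	for ev := range p.events { // the "listener" draining the channel
		fmt.Println("handled:", ev)
	}
}

// distribute releases listenersLock before any blocking operation, so run
// is free to acquire the lock and start the listener. Because either all
// or no listeners are started, one boolean/channel pair is sufficient.
func (p *processor) distribute(ev string) {
	p.listenersLock.RLock()
	started := p.listenersStarted
	p.listenersLock.RUnlock()
	if !started {
		<-p.started // wait for run without holding listenersLock
	}
	p.events <- ev
}

func main() {
	p := &processor{started: make(chan struct{}), events: make(chan string)}
	done := make(chan struct{})
	go func() { p.run(); close(done) }()
	p.distribute("resync: pod1") // may run before run() starts; no deadlock
	close(p.events)
	<-done
}
```

Holding the lock across the `<-p.started` wait or the channel send would reintroduce exactly the deadlock described above.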
The TestListenerResyncPeriods test now runs twice, with and without the
artificial delay. It gets converted to a synctest, so it executes quickly
despite the time.Sleep calls and timing is deterministic. The enhanced log
output confirms that with the delay, the initial sync completes later:
=== RUN TestListenerResyncPeriods
shared_informer_test.go:236: 0s: listener3: handle: pod1
shared_informer_test.go:236: 0s: listener3: handle: pod2
shared_informer_test.go:236: 0s: listener1: handle: pod1
shared_informer_test.go:236: 0s: listener1: handle: pod2
shared_informer_test.go:236: 0s: listener2: handle: pod1
shared_informer_test.go:236: 0s: listener2: handle: pod2
shared_informer_test.go:236: 2s: listener2: handle: pod1
shared_informer_test.go:236: 2s: listener2: handle: pod2
shared_informer_test.go:236: 3s: listener3: handle: pod1
shared_informer_test.go:236: 3s: listener3: handle: pod2
--- PASS: TestListenerResyncPeriods (0.00s)
=== RUN TestListenerResyncPeriodsDelayed
shared_informer_test.go:236: 1s: listener1: handle: pod1
shared_informer_test.go:236: 1s: listener1: handle: pod2
shared_informer_test.go:236: 1s: listener2: handle: pod1
shared_informer_test.go:236: 1s: listener2: handle: pod2
shared_informer_test.go:236: 1s: listener3: handle: pod1
shared_informer_test.go:236: 1s: listener3: handle: pod2
shared_informer_test.go:236: 2s: listener2: handle: pod1
shared_informer_test.go:236: 2s: listener2: handle: pod2
shared_informer_test.go:236: 3s: listener3: handle: pod1
shared_informer_test.go:236: 3s: listener3: handle: pod2
--- PASS: TestListenerResyncPeriodsDelayed (0.00s)
Kubernetes-commit: e6ef79b2f6bb05205652e4fe48ffa523d9e3a1ec
The API for the package already had a context, so all that was missing was to
extract and use the logger from that.
Kubernetes-commit: 1620b2707623036f5133cf8045da89411d2c4345
The only log output is for error messages which should normally not occur. It's
also likely that users expect to see exactly those messages, so it's better to
not touch them.
Kubernetes-commit: 8701f481ff6e24257021f07afcec67df2aef27dc
The API for the package already had a context, so all that was missing was to
extract and use the logger from that.
Kubernetes-commit: 0690af4710a5b322163564d82b95b90417ae3644
The only log output is for error messages which should normally not occur. It's
also likely that users expect to see exactly those messages, so it's better to
not touch them.
Kubernetes-commit: 1ec1248a6555be7ca7a0827061cdd3ef162f6c45
When debugging, it helps to keep output from different connections
separate. This can be done with contextual logging and using different loggers
for each connection.
Cancellation is handled separately for requests. Therefore the new APIs only
add support for passing a logger instance.
Kubernetes-commit: dfdf07bb531aa8a397f3c74e5eec851130325971
I wasn't entirely sure whether this should return a value or a pointer to
satisfy the interface. Both work, so I benchmarked it elsewhere (REST
mapper). Memory allocations are the same (one alloc per call), but returning
a value is 10% slower when calling one method.
What I then benchmarked is whether a pointer vs. value receiver in the
wrapper makes a difference. Converting from a value receiver (what I had
before) to a pointer receiver reduced call overhead by 6%. That's because
with a value receiver, Go auto-generates a variant with a pointer receiver
and calls the value receiver through it.
That can be seen in a debugger (call stack) and when setting breakpoints:
(dlv) b restMapperWrapper.KindForWithContext
Command failed: Location "restMapperWrapper.KindForWithContext" ambiguous: k8s.io/apimachinery/pkg/api/meta.restMapperWrapper.KindForWithContext, k8s.io/apimachinery/pkg/api/meta.(*restMapperWrapper).KindForWithContext…
Conventional wisdom is to define methods with a value receiver because they
can also be called on immutable instances, making the type more flexible.
But for types which will only ever be used via a pointer, I think pointer
receiver is better for the reasons above (small performance difference, easier
to debug).
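The difference in method sets can be demonstrated with a small stand-alone example (the mapper/wrapper names are illustrative, not the actual meta package types):

```go
package main

import "fmt"

// mapper stands in for the RESTMapper-style interface being wrapped.
type mapper interface {
	KindForWithContext(resource string) string
}

// ptrWrapper uses a pointer receiver: only *ptrWrapper has the method, so
// there is a single method symbol and no auto-generated value variant,
// which keeps debugger breakpoints unambiguous.
type ptrWrapper struct{ delegate func(string) string }

func (w *ptrWrapper) KindForWithContext(resource string) string {
	return w.delegate(resource)
}

// valWrapper uses a value receiver: both valWrapper and *valWrapper end up
// with the method, which is why a breakpoint on the bare method name can
// resolve to two locations.
type valWrapper struct{ delegate func(string) string }

func (w valWrapper) KindForWithContext(resource string) string {
	return w.delegate(resource)
}

func main() {
	kind := func(r string) string { return "Kind(" + r + ")" }

	// A type only ever used via a pointer satisfies the interface directly.
	var m mapper = &ptrWrapper{delegate: kind}
	fmt.Println(m.KindForWithContext("pods")) // Kind(pods)

	// The value-receiver variant also satisfies it, via either form.
	m = valWrapper{delegate: kind}
	fmt.Println(m.KindForWithContext("pods")) // Kind(pods)
}
```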
Kubernetes-commit: b21dcbcaa1ccf4995bf486afc37dc0321c5bdf0b