checkWatchListConsistencyIfRequested performs a data consistency check only when
the KUBE_WATCHLIST_INCONSISTENCY_DETECTOR environment variable is set at binary startup.
The consistency check is meant to be enforced only in CI, not in production.
The check ensures that the data retrieved by the watch-list API call
is exactly the same as the data received by the standard list API call.
Note that this function panics when a data inconsistency is detected.
This is intentional, because we want to catch it in CI.
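A minimal, hypothetical sketch of the gating and panic behavior, using plain string items and a stand-in checkConsistency helper; the real function lives in k8s.io/client-go/tools/cache and compares the full object lists:
```
package main

import (
	"fmt"
	"os"
	"reflect"
)

// Read once at startup: toggling the variable later has no effect.
var consistencyCheckEnabled = os.Getenv("KUBE_WATCHLIST_INCONSISTENCY_DETECTOR") != ""

// checkConsistency is a hypothetical stand-in: it compares the items returned
// by the watch-list call with those returned by the standard list call and
// panics on mismatch, so CI fails loudly.
func checkConsistency(watchListItems, listItems []string) {
	if !consistencyCheckEnabled {
		return
	}
	if !reflect.DeepEqual(watchListItems, listItems) {
		panic(fmt.Sprintf("watch-list data %v does not match list data %v", watchListItems, listItems))
	}
}

func main() {
	checkConsistency([]string{"pod-a"}, []string{"pod-a"}) // consistent: no panic
}
```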
Kubernetes-commit: b31e7793d0d873a71c90caf8455556aa905cf88d
Originally we honored only apierrors.IsInvalid,
but we decided to fall back on every error,
because it is better to make progress than to deadlock.
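A minimal sketch of the widened fallback, with watchList and standardList as hypothetical stand-ins for the real calls in Reflector.ListAndWatch:
```
package main

import (
	"errors"
	"fmt"
)

func listAndWatch(watchList, standardList func() error) error {
	if err := watchList(); err != nil {
		// Originally only apierrors.IsInvalid(err) triggered this branch;
		// now any error does, so the reflector keeps making progress
		// instead of deadlocking on an unexpected failure.
		fmt.Printf("watch-list failed, falling back to standard list: %v\n", err)
		return standardList()
	}
	return nil
}

func main() {
	_ = listAndWatch(
		func() error { return errors.New("expired resource version") },
		func() error { return nil },
	)
}
```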
Kubernetes-commit: 4b3915017950a114124a88c5d308bd8bfb9ec48e
`SetWatchErrorHandler` claims it will fail if Run() has already started.
But if the two are called concurrently, it actually triggers a data
race.
With this PR:
```
62702 runs so far, 0 failures (100.00% pass rate). 59.152682ms avg, 189.068387ms max, 26.623785ms min
```
Without this PR:
```
5012 runs so far, 38 failures (99.25% pass rate). 58.675502ms avg, 186.018084ms max, 29.468104ms min
```
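A minimal sketch of the kind of mutex guard that removes such a race, assuming a simplified controller; the actual fix lives in k8s.io/client-go/tools/cache:
```
package main

import (
	"errors"
	"sync"
)

type controller struct {
	mu      sync.Mutex
	started bool
	handler func(error)
}

// SetWatchErrorHandler fails once Run has started; reading `started` under
// the same mutex that Run takes is what prevents the data race.
func (c *controller) SetWatchErrorHandler(h func(error)) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.started {
		return errors.New("informer has already started")
	}
	c.handler = h
	return nil
}

func (c *controller) Run() {
	c.mu.Lock()
	c.started = true
	c.mu.Unlock()
	// ... run loop ...
}

func main() {
	c := &controller{}
	go c.Run()
	_ = c.SetWatchErrorHandler(func(error) {}) // safe even if Run runs concurrently
}
```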
Kubernetes-commit: 35d2431b3a89c5bd693846952e9d27ce4e3a0754
Before, we used two separate backoff managers for the List and Watch
calls; now they share a single backoff manager.
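A toy sketch of the design choice, with a local stand-in for wait.BackoffManager: one shared manager means a failing List and a failing Watch advance the same backoff state instead of each retrying on its own schedule:
```
package main

import (
	"fmt"
	"time"
)

// backoffManager is a local stand-in for wait.BackoffManager.
type backoffManager struct{ next time.Duration }

func (b *backoffManager) Backoff() time.Duration {
	d := b.next
	b.next *= 2
	return d
}

func main() {
	shared := &backoffManager{next: time.Second}
	fmt.Println(shared.Backoff()) // list retry waits 1s
	fmt.Println(shared.Backoff()) // a subsequent watch retry waits 2s, not 1s
}
```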
Kubernetes-commit: 337728b02559dec8a613fdef174f732da9cae310
Without this change, leaked goroutines were sometimes reported for
test/integration/scheduler_perf. The goroutine that caused the cleanup to get
delayed was this one:
```
goleak.go:50: found unexpected goroutines:
[Goroutine 2704 in state chan receive, 2 minutes, with k8s.io/client-go/tools/cache.(*Reflector).watch on top of the stack:
goroutine 2704 [chan receive, 2 minutes]:
k8s.io/client-go/tools/cache.(*Reflector).watch(0xc00453f590, {0x0, 0x0}, 0x1f?, 0xc00a128080?)
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:388 +0x5b3
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc00453f590, 0xc006e94900)
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:324 +0x3bd
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:279 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc007aafee0)
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x49
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc003e18150?, {0x75e37c0, 0xc00389c280}, 0x1, 0xc006e94900)
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xcf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc00453f590, 0xc006e94900)
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:278 +0x257
k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:58 +0x3f
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x74
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0xe5
```
watch() was stuck in an exponential backoff timeout. Logging confirmed that:
```
I0309 21:14:21.756149 1572727 reflector.go:387] k8s.io/client-go/informers/factory.go:150: watch of *v1.PersistentVolumeClaim returned Get "https://127.0.0.1:38269/api/v1/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=1&timeout=7m47s&timeoutSeconds=467&watch=true": dial tcp 127.0.0.1:38269: connect: connection refused - backing off
```
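A minimal sketch of a stop-aware backoff loop under these assumptions: waiting on the stop channel as well as the timer means a stopped reflector exits immediately instead of lingering for the remainder of the backoff:
```
package main

import (
	"errors"
	"fmt"
	"time"
)

func watchWithBackoff(stopCh <-chan struct{}, backoff time.Duration, try func() error) error {
	for {
		if err := try(); err == nil {
			return nil
		}
		select {
		case <-stopCh:
			return errors.New("stopped during backoff")
		case <-time.After(backoff):
			backoff *= 2 // exponential backoff before the next attempt
		}
	}
}

func main() {
	stopCh := make(chan struct{})
	close(stopCh) // already stopped: the first failure returns promptly
	fmt.Println(watchWithBackoff(stopCh, time.Second, func() error {
		return errors.New("connection refused")
	}))
}
```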
Kubernetes-commit: b4751a52d53d4acc6a5ce3e796938c9a12f81fcb
Since the behavior has now changed, and the old behavior leaked objects,
this adds a new comment about how Replace works.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 27f4bcae5c52a3bb88141f940ec23d907a15cde5
This is useful both to reduce code complexity and to ensure clients
get the "newest" version of an object known when it is deleted. This is
all best-effort, but for clients it makes more sense to give them the
newest object they observed rather than an old one.
This is especially useful when an object is recreated. E.g.:
Object A with key K is in the KnownObjects store;
- a DELETE delta for A is queued with key K
- a CREATE delta for B is queued with key K
- a Replace arrives without any object with key K in it.
In this situation it is better to create a DELETE delta with a
DeletedFinalStateUnknown containing B (with this patch) than it is to give
the client a DeletedFinalStateUnknown containing A (without this patch),
as sketched below.
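A toy illustration of that choice, with a local stand-in for cache.DeletedFinalStateUnknown:
```
package main

import "fmt"

// tombstone is a local stand-in for cache.DeletedFinalStateUnknown.
type tombstone struct {
	Key string
	Obj interface{}
}

func main() {
	objA := "A (stale copy from the KnownObjects store)"
	objB := "B (newest object queued under key K)"

	// Without the patch, Replace used the stale copy from KnownObjects:
	fmt.Printf("old: %+v\n", tombstone{Key: "ns/K", Obj: objA})
	// With the patch, it uses the newest object observed for the key:
	fmt.Printf("new: %+v\n", tombstone{Key: "ns/K", Obj: objB})
}
```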
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 7bcc3e00fc28b2548886d04639a2e352ab37fb55
This fixes an issue where a relist could result in a DELETED delta
with an object wrapped in a DeletedFinalStateUnknown object; on the
next relist, that object would be wrapped inside another
DeletedFinalStateUnknown, leaving the user with a "double" layer
of DeletedFinalStateUnknowns.
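A minimal sketch of the guard, again with a local tombstone stand-in: unwrap any existing tombstone before wrapping, so consecutive relists cannot nest them:
```
package main

import "fmt"

type tombstone struct {
	Key string
	Obj interface{}
}

// tombstoneOf wraps obj for a DELETED delta, first unwrapping an existing
// tombstone left by a previous relist.
func tombstoneOf(key string, obj interface{}) tombstone {
	if t, ok := obj.(tombstone); ok {
		obj = t.Obj
	}
	return tombstone{Key: key, Obj: obj}
}

func main() {
	first := tombstoneOf("ns/K", "object") // first relist wraps the object
	second := tombstoneOf("ns/K", first)   // second relist does not double-wrap
	fmt.Printf("%+v\n", second)            // Obj is "object", not another tombstone
}
```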
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 0bf0546d9f75d92c801e81c9f7adf040bba64102
This fixes a race condition when a "short-lived" object
is created and the CREATE event is still present on the queue
when a relist replaces the state. Previously that would lead to the
object being leaked.
The way this could happen is roughly:
1. a new object O is added, and the agent gets a CREATED event for it
2. the watch is terminated, and the agent runs a new list, L
3. the CREATE event for O is still on the queue to be processed
4. the informer replaces the old data in the store with L, and O is not in L
   - since O is neither in the store nor in the list L, no DELETED event
     is queued
5. the CREATE event for O is processed
6. O is leaked; it is present in the cache but not in k8s
With this patch, step 4 above creates a DELETED event,
ensuring that the object will be removed.
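A toy sketch of the extra pass implied by the fix, over simplified maps: keys with queued-but-unprocessed deltas that are absent from the new list also get a DELETED delta, not just keys currently in the store:
```
package main

import "fmt"

func main() {
	// The pending CREATE for O from step 3, and the fresh list L from
	// step 2; O is missing from L.
	queued := map[string]string{"ns/O": "CREATE"}
	newList := map[string]bool{"ns/P": true}

	for key := range queued {
		if !newList[key] {
			// Without this, the queued CREATE would later repopulate the
			// cache with an object that no longer exists in k8s.
			fmt.Println("queue DELETED for", key)
		}
	}
}
```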
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 25d77218acdac2f793071add9ea878b08c7d328b
While refactoring the backoff manager to simplify and unify the code
in wait, a race condition was encountered in
TestSharedInformerWatchDisruption. The new implementation failed
because the fake clock was not propagated to the backoff managers
when the reflector was used in a controller. After ensuring the
managers, reflector, controller, and informer shared the same
clock, the test was updated to avoid the race condition by
advancing the fake clock and adding real sleeps to wait for
asynchronous propagation across the various goroutines in the controller.
Due to the deep structure of informers it is difficult to inject
hooks to avoid having to perform sleeps. At a minimum, the FakeClock
interface should allow a caller to determine the number of waiting
timers (to avoid the first sleep).
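A hedged sketch of the shared-clock requirement using k8s.io/utils/clock/testing (today's FakeClock exposes HasWaiters, a boolean rather than the waiter count wished for above):
```
package main

import (
	"fmt"
	"time"

	testingclock "k8s.io/utils/clock/testing"
)

func main() {
	// The fake clock must be the single instance shared by the backoff
	// managers, reflector, controller, and informer; otherwise their
	// timers run on different timelines and the test races.
	fakeClock := testingclock.NewFakeClock(time.Now())

	timer := fakeClock.NewTimer(time.Second)
	fmt.Println(fakeClock.HasWaiters()) // true: one timer is pending

	fakeClock.Step(time.Second) // advance the shared timeline
	<-timer.C()                 // the timer has fired
}
```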
Kubernetes-commit: 91b3a81fbd916713afe215f7d701950e13a02869
Add a "lazy" type to track when an update is needed. It uses a nested
locking technique to avoid extra evaluation calls.
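A hypothetical single-mutex sketch of such a lazy value (the real type uses a nested-locking scheme to avoid extra evaluation calls, but the idea is the same: consumers re-evaluate only after an update is signaled):
```
package main

import (
	"fmt"
	"sync"
)

// lazy caches the result of eval and recomputes it only after Notify.
type lazy struct {
	mu    sync.Mutex
	dirty bool
	value string
	eval  func() string
}

func (l *lazy) Notify() {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.dirty = true
}

func (l *lazy) Get() string {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.dirty {
		l.value = l.eval() // evaluated only when an update was signaled
		l.dirty = false
	}
	return l.value
}

func main() {
	calls := 0
	v := &lazy{dirty: true, eval: func() string { calls++; return fmt.Sprintf("eval #%d", calls) }}
	fmt.Println(v.Get()) // eval #1
	fmt.Println(v.Get()) // still eval #1: no Notify, no re-evaluation
	v.Notify()
	fmt.Println(v.Get()) // eval #2
}
```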
Kubernetes-commit: 5a1091d88d95bd1dd5c27f2c72cee4ecb4219dda
* Add tracker types and tests
* Modify ResourceEventHandler interface's OnAdd member
* Add additional ResourceEventHandlerDetailedFuncs struct
* Fix SharedInformer to let users track HasSynced for their handlers
* Fix in-tree controllers which weren't computing HasSynced correctly
* Deprecate the cache.Pop function
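A hedged, library-style sketch of how the per-handler HasSynced is consumed, assuming an existing SharedInformer and using the names introduced above:
```
package example

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// waitForHandler registers a handler with the detailed callbacks and waits
// until that specific handler has seen every object from the initial list.
func waitForHandler(inf cache.SharedInformer, stopCh <-chan struct{}) error {
	reg, err := inf.AddEventHandler(cache.ResourceEventHandlerDetailedFuncs{
		// OnAdd now also reports whether the object belongs to the initial list.
		AddFunc: func(obj interface{}, isInInitialList bool) {},
	})
	if err != nil {
		return err
	}
	// reg.HasSynced returns true only once this handler has received the
	// whole initial list, which is what some in-tree controllers were
	// previously computing incorrectly.
	if !cache.WaitForCacheSync(stopCh, reg.HasSynced) {
		return fmt.Errorf("handler did not sync")
	}
	return nil
}
```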
Kubernetes-commit: 8100efc7b3122ad119ee8fa4bbbedef3b90f2e0d
Add an annotation that can be added to the exampleType passed to
NewReflector to indicate the expected type for the Reflector. This is
useful for types such as unstructured.Unstructured, which, when used with
a dynamic informer, do not have their TypeMeta filled in.
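A hedged sketch of the idea for the unstructured case, using standard client-go and apimachinery APIs (the exact wiring inside the Reflector may differ): give the example object its GVK so logs and errors can name the real type instead of *unstructured.Unstructured:
```
package example

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/tools/cache"
)

// newDeploymentReflector builds a reflector whose exampleType carries a GVK.
func newDeploymentReflector(lw cache.ListerWatcher, store cache.Store) *cache.Reflector {
	example := &unstructured.Unstructured{}
	example.SetGroupVersionKind(schema.GroupVersionKind{
		Group: "apps", Version: "v1", Kind: "Deployment",
	})
	return cache.NewReflector(lw, example, store, 0)
}
```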
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Kubernetes-commit: 474fc8c5234000bce666a6b02f7ffbb295ef135f
For correctness; technically this shouldn't be an issue, since restarting a stopped processor is not supported.
Kubernetes-commit: 3a81341cfa6f7e2ca1b9bfc195c567dcdfaa4dea