The structure of the error is changing, and we currently don't guarantee
that reflect.DeepEqual(...) will remain true for ErrWaitTimeout.
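For illustration, a hedged sketch of the kind of check callers should prefer
over reflect.DeepEqual (assuming the sentinel keeps matching via errors.Is;
that is not a guarantee made by this change, and the readiness condition
below is hypothetical):

package waitsketch

import (
    "errors"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

// waitForReady polls a hypothetical readiness condition and detects the
// timeout via the sentinel, not via the error's internal structure.
func waitForReady(ready func() (bool, error)) error {
    err := wait.PollImmediate(time.Second, 30*time.Second, ready)
    if errors.Is(err, wait.ErrWaitTimeout) {
        return errors.New("timed out waiting for readiness")
    }
    return err
}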
Kubernetes-commit: 8d4004bbc77d012642db97e09238f4f65a926bca
The kube-apiserver validation expects the Count of an EventSeries to be
at least 2, otherwise it rejects the Event. There was a discrepancy
between the client and the server since the client was initializing an
EventSeries with a count of 1.
According to the original KEP, the first event emitted should have an
EventSeries set to nil and the second isomorphic event should have an
EventSeries with a count of 2. Thus, we should match the behavior
defined by the KEP and update the client.
Also, in an effort to keep old clients compatible with the servers,
we should allow Events with an EventSeries count of 1 to prevent any
unexpected rejections.
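For illustration, a hedged sketch of the series shape described by the KEP
(the events below are hypothetical and most fields are elided):

package eventsketch

import (
    eventsv1 "k8s.io/api/events/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isomorphicPair returns the shapes described above: the first occurrence
// carries no series, and the second isomorphic occurrence starts the series
// directly at Count 2 (the minimum the apiserver validation accepts).
func isomorphicPair() (first, second *eventsv1.Event) {
    first = &eventsv1.Event{Series: nil} // regarding/reason/note elided
    second = &eventsv1.Event{Series: &eventsv1.EventSeries{
        Count:            2,
        LastObservedTime: metav1.NowMicro(),
    }}
    return first, second
}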
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Kubernetes-commit: d00364902bda05eed4f7f02051ab81f7be55f8a9
Without this change, sometimes leaked goroutines were reported for
test/integration/scheduler_perf. The one that caused the cleanup to get delayed
was this one:
goleak.go:50: found unexpected goroutines:
[Goroutine 2704 in state chan receive, 2 minutes, with k8s.io/client-go/tools/cache.(*Reflector).watch on top of the stack:
goroutine 2704 [chan receive, 2 minutes]:
k8s.io/client-go/tools/cache.(*Reflector).watch(0xc00453f590, {0x0, 0x0}, 0x1f?, 0xc00a128080?)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:388 +0x5b3
k8s.io/client-go/tools/cache.(*Reflector).ListAndWatch(0xc00453f590, 0xc006e94900)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:324 +0x3bd
k8s.io/client-go/tools/cache.(*Reflector).Run.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:279 +0x45
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc007aafee0)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x49
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc003e18150?, {0x75e37c0, 0xc00389c280}, 0x1, 0xc006e94900)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xcf
k8s.io/client-go/tools/cache.(*Reflector).Run(0xc00453f590, 0xc006e94900)
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/client-go/tools/cache/reflector.go:278 +0x257
k8s.io/apimachinery/pkg/util/wait.(*Group).StartWithChannel.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:58 +0x3f
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x74
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/nvme/gopath/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0xe5
watch() was stuck in an exponential backoff timeout. Logging confirmed that:
I0309 21:14:21.756149 1572727 reflector.go:387] k8s.io/client-go/informers/factory.go:150: watch of *v1.PersistentVolumeClaim returned Get "https://127.0.0.1:38269/api/v1/persistentvolumeclaims?allowWatchBookmarks=true&resourceVersion=1&timeout=7m47s&timeoutSeconds=467&watch=true": dial tcp 127.0.0.1:38269: connect: connection refused - backing off
Kubernetes-commit: b4751a52d53d4acc6a5ce3e796938c9a12f81fcb
The comment on ConfigMapsLeasesResourceLock begins with the wrong name: EndpointsLeasesResourceLock.
Kubernetes-commit: a50c9db09ba4d22328088887d2fd00b61b36e6c4
Since the behavior has now changed, and the old behavior leaked objects,
this adds a new comment about how Replace works.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 27f4bcae5c52a3bb88141f940ec23d907a15cde5
This is useful both to reduce the code complexity and to ensure clients
get the "newest" version of an object known when it is deleted. This is
all best-effort, but for clients it makes more sense to give them the
newest object they observed rather than an old one.
This is especially useful when an object is recreated, e.g.:
Object A with key K is in the KnownObjects store;
- DELETE delta for A is queued with key K
- CREATE delta for B is queued with key K
- Replace without any object with key K in it.
In this situation it is better to create a DELETE delta with a
DeletedFinalStateUnknown wrapping B (with this patch) than to give
the client a DeletedFinalStateUnknown wrapping A (without this patch).
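For context, a minimal sketch (not part of this change) of the usual
consumer-side pattern that ends up seeing these objects; the pod-typed
handler is hypothetical:

package handlersketch

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/cache"
)

// onDelete unwraps the tombstone an informer hands out when the delete was
// only observed via a relist; with this patch the tombstone carries the
// newest object known for the key.
func onDelete(obj interface{}) {
    pod, ok := obj.(*v1.Pod)
    if !ok {
        tombstone, ok := obj.(cache.DeletedFinalStateUnknown)
        if !ok {
            fmt.Printf("unexpected object in OnDelete: %T\n", obj)
            return
        }
        pod, ok = tombstone.Obj.(*v1.Pod)
        if !ok {
            fmt.Printf("tombstone contained unexpected object: %T\n", tombstone.Obj)
            return
        }
    }
    fmt.Printf("pod %s/%s deleted\n", pod.Namespace, pod.Name)
}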
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 7bcc3e00fc28b2548886d04639a2e352ab37fb55
This fixes an issue where a relist could result in a DELETED delta
with an object wrapped in a DeletedFinalStateUnknown object; and then on
the next relist, it would wrap that object inside another
DeletedFinalStateUnknown, leaving the user with a "double" layer
of DeletedFinalStateUnknowns.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 0bf0546d9f75d92c801e81c9f7adf040bba64102
This fixes a race condition when a "short lived" object
is created and the create event is still present on the queue
when a relist replaces the state. Previously that would result in the
object being leaked.
The way this could happen is roughly:
1. a new object O is added, and the agent gets a CREATED event for it
2. watch is terminated, and the agent runs a new list, L
3. CREATE event for O is still on the queue to be processed.
4. informer replaces the old data in store with L, and O is not in L
- Since O is not in the store, and not in the list L, no DELETED event
is queued
5. CREATE event for O is still on the queue to be processed.
6. CREATE event for O is processed
7. O is <leaked>; it is present in the cache but not in k8s.
With this patch, step 4 above creates a DELETED event,
ensuring that the object will be removed.
Signed-off-by: Odin Ugedal <ougedal@palantir.com>
Signed-off-by: Odin Ugedal <odin@uged.al>
Kubernetes-commit: 25d77218acdac2f793071add9ea878b08c7d328b
When Shutdown was called, delivery of each pending event would still be retried
12 times with a delay of ~10s between each retry. In apiserver integration
tests that caused the goroutine to linger long after the corresponding
apiserver of the test was shut down.
Kubernetes-commit: 15b01af9c18a0840d71e2bb7dff4d8c29b158aad
While refactoring the backoff manager to simplify and unify the code
in wait, a race condition was encountered in
TestSharedInformerWatchDisruption. The new implementation failed
because the fake clock was not propagated to the backoff managers
when the reflector was used in a controller. After ensuring the
managers, reflector, controller, and informer shared the same
clock, the test was updated to avoid the race condition by
advancing the fake clock and adding real sleeps to wait for
asynchronous propagation across the various goroutines in the controller.
Due to the deep structure of informers it is difficult to inject
hooks to avoid having to perform sleeps. At a minimum the FakeClock
interface should allow a caller to determine the number of waiting
timers (to avoid the first sleep).
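To make the limitation concrete, a small sketch (assuming the waiter is some
goroutine blocked on the fake clock, as in the test) of why real sleeps are
currently unavoidable:

package clocksketch

import (
    "time"

    testingclock "k8s.io/utils/clock/testing"
)

// waitThenStep polls with real sleeps until some goroutine is waiting on the
// fake clock, then advances it. HasWaiters only reports whether any waiter
// exists, not how many, which is the gap described above.
func waitThenStep(fc *testingclock.FakeClock, d time.Duration) {
    for !fc.HasWaiters() {
        time.Sleep(time.Millisecond)
    }
    fc.Step(d)
}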
Kubernetes-commit: 91b3a81fbd916713afe215f7d701950e13a02869
Add a "lazy" type to track when an update is needed. It uses a nested
locking technique to avoid extra evaluation calls.
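As a rough illustration of the idea (a sketch of one way such a type could be
shaped, not the exact type added here):

package lazysketch

import "sync"

// Lazy caches the result of evaluate() and recomputes it only after Notify()
// marks it stale. The read-locked fast path plus the write-locked re-check is
// the nested-locking idea: readers of a fresh value never pay for evaluation.
type Lazy struct {
    mu       sync.RWMutex
    stale    bool
    value    string
    evaluate func() string
}

func NewLazy(evaluate func() string) *Lazy {
    return &Lazy{stale: true, evaluate: evaluate}
}

// Notify marks the cached value as needing re-evaluation.
func (l *Lazy) Notify() {
    l.mu.Lock()
    l.stale = true
    l.mu.Unlock()
}

// Get returns the cached value, re-evaluating it at most once per Notify.
func (l *Lazy) Get() string {
    l.mu.RLock()
    if !l.stale {
        v := l.value
        l.mu.RUnlock()
        return v
    }
    l.mu.RUnlock()

    l.mu.Lock()
    defer l.mu.Unlock()
    if l.stale {
        l.value = l.evaluate()
        l.stale = false
    }
    return l.value
}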
Kubernetes-commit: 5a1091d88d95bd1dd5c27f2c72cee4ecb4219dda
Update the definition of an isomorphic event in the events/v1 client to
match the aggregation logic that was already present in the core/v1
implementation.
The note field was omitted from the comparison, even though the message
field was used in the core API aggregation, because we didn't reach
consensus on including it.
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Kubernetes-commit: 21f2f746abc1a5a1b3193274401f5728e19cc26f
Currently, when the remote connection is unexpectedly closed, forward() prints an error message saying "lost connection to pod" via runtime.HandleError, but then returns nil for the error.
This prevents the caller from being able to handle this error differently.
This commit changes forward() to return the "lost connection to pod" error so that it can be handled by the caller.
Making this change enables kubectl port-forward to exit with code 1, instead of 0, which is the expected behavior for a command that has failed.
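For context, a hedged sketch of the caller-side effect (the dialer and ports
are assumed to be set up elsewhere):

package pfsketch

import (
    "fmt"
    "io"

    "k8s.io/apimachinery/pkg/util/httpstream"
    "k8s.io/client-go/tools/portforward"
)

// run surfaces the forwarding error to its caller; with this change a lost
// connection is returned from ForwardPorts instead of only being logged, so
// a CLI wrapper can exit non-zero.
func run(dialer httpstream.Dialer, ports []string, stopCh <-chan struct{}, out, errOut io.Writer) error {
    readyCh := make(chan struct{})
    pf, err := portforward.New(dialer, ports, stopCh, readyCh, out, errOut)
    if err != nil {
        return err
    }
    if err := pf.ForwardPorts(); err != nil {
        return fmt.Errorf("port forwarding failed: %w", err)
    }
    return nil
}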
Kubernetes-commit: a9f04103854893056237a09250ad3335867b0391
There was a data race in the recordToSink function that caused changes
to the events cache to be overridden if events were emitted
simultaneously via Eventf calls.
The race lies in the fact that when recording an Event, there might be
multiple calls updating the cache simultaneously. The lock period is
optimized so that after updating the cache with the new Event, the lock
is unlocked until the Event is recorded on the apiserver side and then
the cache is locked again to be updated with the new value returned by
the apiserver.
There are a few problems with this approach:
1. If two identical Events are emitted successively, the changes of the
second Event will override the first one. In code, the following
happens:
1. Eventf(ev1)
2. Eventf(ev2)
3. Lock cache
4. Set cache[getKey(ev1)] = &ev1
5. Unlock cache
6. Lock cache
7. Update cache[getKey(ev2)] = &ev1 + Series{Count: 1}
8. Unlock cache
9. Start attempting to record the first event &ev1 on the apiserver side.
This can be mitigated by recording a copy of the Event stored in the
cache instead of reusing the pointer from the cache (see the sketch
after this list).
2. When the Event has been recorded on the apiserver, the cache is
updated again with the value of the Event returned by the server.
This update will override any changes made to the cache entry while
attempting to record the new Event, since the cache was unlocked at
that time. This might lead to some inconsistencies when dealing with
EventSeries, since the count may be overridden or the client might even
try to record the first isomorphic Event multiple times.
This could be mitigated with a lock that has a larger scope, but we
shouldn't reflect the Event returned by the apiserver in the
cache in the first place, since mutation could mess with the
aggregation by either allowing users to manipulate values to update
a different cache entry or by ending up with two cache entries for the
same Event.
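As a hedged sketch of the first mitigation above (deep-copying the cached
Event before it is sent; the helper below is hypothetical):

package recordsketch

import (
    "context"

    eventsv1 "k8s.io/api/events/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// createEventCopy records a deep copy of the cached Event so that a concurrent
// Eventf updating the same cache entry cannot mutate the object in flight.
func createEventCopy(ctx context.Context, c kubernetes.Interface, cached *eventsv1.Event) (*eventsv1.Event, error) {
    ev := cached.DeepCopy()
    return c.EventsV1().Events(ev.Namespace).Create(ctx, ev, metav1.CreateOptions{})
}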
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Kubernetes-commit: 55ec09d377274b4a6107fe0b7a061ad408fe05a7
When attempting to record a new Event and a new Series on the apiserver
at the same time, the patch of the Series might happen before the Event
is actually created. In that case, we handle the error and try to create
the Event. But the Event might be created during that window, and today
that is treated as an error. So in order to handle that scenario, we
need to retry when a Create call for a Series results in an
AlreadyExists error.
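A hedged sketch of the handling described above (the helper and its
caller-driven retry are hypothetical, not the recorder's actual code):

package seriessketch

import (
    "context"

    eventsv1 "k8s.io/api/events/v1"
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// recordOnce reports "retry" when the fallback Create races with another
// writer and the Event already exists, so the next attempt can patch the
// existing series instead of failing hard.
func recordOnce(ctx context.Context, c kubernetes.Interface, ev *eventsv1.Event) (retry bool, err error) {
    _, err = c.EventsV1().Events(ev.Namespace).Create(ctx, ev, metav1.CreateOptions{})
    switch {
    case err == nil:
        return false, nil
    case apierrors.IsAlreadyExists(err):
        return true, nil // the Event was created concurrently; retry and patch it
    default:
        return false, err
    }
}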
Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
Kubernetes-commit: 2f83117bcfe30ad3ada7f1ca66f4b885a1d5df25
Currently, the watch package embeds the context.DeadlineExceeded error
in its own error using `%v`, as can be seen here:
`fmt.Errorf("UntilWithSync: unable to sync caches: %v", ctx.Err())`
However, consumers of this function cannot use
`errors.Is(err, context.DeadlineExceeded)` due to this `%v`.
To let consumers distinguish context.DeadlineExceeded errors,
this PR changes the error wrapping format to `%w`.
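For illustration, what the `%w` wrapping enables for callers (the waiting
helper below is hypothetical):

package watchsketch

import (
    "context"
    "errors"
    "fmt"

    v1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/cache"
    watchtools "k8s.io/client-go/tools/watch"
)

// waitForPod detects a deadline hit inside UntilWithSync with errors.Is,
// which only works once the underlying error is wrapped with %w.
func waitForPod(ctx context.Context, lw cache.ListerWatcher, cond watchtools.ConditionFunc) error {
    _, err := watchtools.UntilWithSync(ctx, lw, &v1.Pod{}, nil, cond)
    if errors.Is(err, context.DeadlineExceeded) {
        return fmt.Errorf("gave up waiting for pod: %w", err)
    }
    return err
}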
Kubernetes-commit: 6b7c365f8f6d50280c2dab171efdd4b93d964f32
* Add tracker types and tests
* Modify ResourceEventHandler interface's OnAdd member
* Add additional ResourceEventHandlerDetailedFuncs struct
* Fix SharedInformer to let users track HasSynced for their handlers (see the sketch after this list)
* Fix in-tree controllers which weren't computing HasSynced correctly
* Deprecate the cache.Pop function
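A hedged sketch of the consumer-facing piece referenced above (the helper is
hypothetical): each AddEventHandler call now returns a registration whose
HasSynced reports when that handler has seen the initial list.

package syncsketch

import (
    "fmt"

    "k8s.io/client-go/tools/cache"
)

// addAndWait registers a handler and waits for that handler's own sync,
// combining the returned registration's HasSynced with WaitForCacheSync.
func addAndWait(informer cache.SharedIndexInformer, stopCh <-chan struct{}) error {
    reg, err := informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) { /* ... */ },
    })
    if err != nil {
        return err
    }
    if !cache.WaitForCacheSync(stopCh, reg.HasSynced) {
        return fmt.Errorf("handler never synced")
    }
    return nil
}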
Kubernetes-commit: 8100efc7b3122ad119ee8fa4bbbedef3b90f2e0d
Add an annotation that can be added to the exampleType passed to
NewReflector to indicate the expected type for the Reflector. This is
useful for types such as unstructured.Unstructured, which, when used with
a dynamic informer, do not have their TypeMeta filled in.
Signed-off-by: Andy Goldstein <andy.goldstein@redhat.com>
Kubernetes-commit: 474fc8c5234000bce666a6b02f7ffbb295ef135f
* Add RedactSecrets function
* Move RedactSecrets method to existing RawBytesData case
* Update TestRedactSecrets to use new pattern of os.CreateTemp()
Kubernetes-commit: e721272d10dd6c4d85ff613182ba0eaddcec9272