Throughout the devicemanager codebase the named argument to represent
resource for logging pupose is `resourceName` as opposed to `resource`.
The latter can only be seen in topology_hints.go files. To ensure consistency
with the rest of the codebase and also because we want to adhere to the
recommendations in the Kubernetes documentation about named arguments:
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#name-arguments
we update the key from `resource` to `resourceName`.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
We have reasonable amount of logs when things go wrong.
While debugging, it can be useful to have logs to indicate that
things have gone as expected.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Ensure that if possible, we provide sufficient metadata
inclusing pod name and UID to allow filtering by pod name or its
UID.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
We have reasonable amount of logs when things go wrong.
While debugging, it can be useful to have logs to indicate that
things have gone as expected especially when it comes to
important events like successful startup of memory manager
and successful allocation of resources.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Ensure that whereever possible, we provide sufficient metadata
inclusing pod name and UID to allow filtering by pod name or its
UID.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
CPU Allocation is skipped in CPU Manager with static policy
in case the pod doesn't belong to Guaranteed QoS or the CPUs
requested are not integral.
We add logs to capture these skips.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
While debugging, it can be useful to have logs to indicate that
things have gone as expected especially when it comes to
important events like successful startup of CPU manager
and successful allocation of resources.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
For unknown reasons, hack/make-rules/test-e2e-node.sh adds -timeout instead of
--timeout. Therefore the fallback code in test/e2e_node/remote/remote.go didn't
find it and added its own --timeout=60m after it. This effectively limits E2E
node test runs to 60 minutes, regardless of what is specified in the job:
W0206 09:53:51.425532 7151 remote.go:158] ginkgo flags are missing explicit --timeout (ginkgo defaults to 60 minutes)
I0206 09:53:51.425565 7151 remote.go:165] updated ginkgo flags: -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m
...
I0206 09:53:57.767096 7151 ssh.go:146] Running the command ssh, with args: ... timeout -k 30s 3600.000000s ./ginkgo -timeout=24h --label-filter="Feature: containsAny DynamicResourceAllocation && Feature: isSubsetOf { Beta, DynamicResourceAllocation } && !Flaky && !Slow" --no-color -v --timeout=60m ...
Note that the timeout for the test was 60m in this case (hence the "timeout -k
30s 3600.000000s") but it could also be something larger.
Replace DefaultComponentGlobalsRegistry with new instance of componentGlobalsRegistry in test api server.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
move kube effective version validation out of component base.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
move DefaultComponentGlobalsRegistry out of component base.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
move ComponentGlobalsRegistry out of featuregate pkg.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
remove usage of DefaultComponentGlobalsRegistry in test files.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
change non-test DefaultKubeEffectiveVersion to use DefaultBuildEffectiveVersion.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Restore useDefaultBuildBinaryVersion in effective version.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
rename DefaultKubeEffectiveVersion to DefaultKubeEffectiveVersionForTest.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
pass options.ComponentGlobalsRegistry into config for controller manager and scheduler.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Pass apiserver effective version to DefaultResourceEncodingConfig.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
change statusz registry to take effective version from the components.
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Address review comments
Signed-off-by: Siyuan Zhang <sizhang@google.com>
update vendor
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Around the 1.31 release, we discovered that a change introduced in 1.27 allowead
clients to open WATCH requests directly to etcd. This had detrimental consequences,
enabling abusive clients to bypass caching and overwhelm etcd.
Unlike the API server, etcd lacks protection against such behavior.
To mitigate this, we redirected all WATCH requests to be served from the cache.
The WatchFromStorageWithoutResourceVersion feature gate was retained as an escape hatch.
However, since we have no plans to allow direct WATCH requests to etcd again,
this flag is now obsolete.
Direct WATCH requests to etcd offer no advantage, as they don't provide stronger
consistency guarantees. WATCH operations are inherently inconsistent; unlike LIST
operations, they do not confirm the resource version with a quorum. While Kubernetes
uses the WithRequireLeader option on WATCH requests to prevent maintaining connections
to isolated etcd members, the API server provides the same level of guarantee through
its health checks, which fail if it cannot connect to etcd member. Therefore,
the WatchFromStorageWithoutResourceVersion feature gate can be deprecated and removed.