126173 Commits

Author SHA1 Message Date
Patrick Ohly
1088f4fb44 DRA resourceslice controller: do DeepCopy for driver resources
The reason for the previous behavior was unnecessary performance overhead that
occurs when the caller already provided a "fresh" copy and doesn't touch it
afterwards.

But this is something that DRA driver developers can easily get wrong, so it's
better to be safe than sorry.
2024-10-30 15:54:32 +01:00
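A minimal sketch of the defensive copy described in 1088f4fb44. The types `Device`, `DriverResources` and `Controller` are simplified stand-ins invented for this illustration, not the real DRA types. It only shows why copying on input protects the controller from a caller that keeps modifying its object.

```go
package main

import "fmt"

// Device and DriverResources are simplified stand-ins for the real DRA
// types; the field names are assumptions made for this sketch.
type Device struct {
	Name       string
	Attributes map[string]string
}

type DriverResources struct {
	Devices []Device
}

// DeepCopy returns a copy that shares no slices or maps with the receiver.
func (in *DriverResources) DeepCopy() *DriverResources {
	if in == nil {
		return nil
	}
	out := &DriverResources{Devices: make([]Device, len(in.Devices))}
	for i, d := range in.Devices {
		attrs := make(map[string]string, len(d.Attributes))
		for k, v := range d.Attributes {
			attrs[k] = v
		}
		out.Devices[i] = Device{Name: d.Name, Attributes: attrs}
	}
	return out
}

// Controller keeps its own private copy of what the driver handed in.
type Controller struct {
	resources *DriverResources
}

// Update stores a deep copy, so later changes by the caller are not seen.
func (c *Controller) Update(r *DriverResources) {
	c.resources = r.DeepCopy()
}

func main() {
	r := &DriverResources{Devices: []Device{
		{Name: "gpu-0", Attributes: map[string]string{"model": "a"}},
	}}
	c := &Controller{}
	c.Update(r)
	r.Devices[0].Attributes["model"] = "b" // caller mutates its object afterwards
	fmt.Println(c.resources.Devices[0].Attributes["model"]) // still "a"
}
```

The cost is one extra copy per update, which is the overhead the earlier behavior tried to avoid.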
Patrick Ohly
67f0428769 DRA resourceslice controller: delay sync
When deleting a bunch of slices, the delete events queue the pool while it is
being synced. It then got synced again immediately, while the deleted slices
were still being removed from the informer cache. The obsolete slice in the
cache caused the controller to delete it again, which fails with a "not
found". That error is ignored, but this still caused extra API calls.

Now syncing gets delayed with a configurable duration (default: 30 seconds) so
the informer cache is more likely to be up-to-date when the pool gets synced
again.
2024-10-30 15:54:32 +01:00
Patrick Ohly
99cf2d8a2e DRA resource slice controller: add E2E test
This test covers creating and deleting 100 large ResourceSlices. It is strict
about using the minimum number of calls.

The test also verifies that creating large slices works.
2024-10-30 15:54:32 +01:00
Patrick Ohly
7473e643fa DRA resource slice controller: use MutationCache to avoid race
This avoids the problem of creating an additional slice when the one from the
previous sync is not in the informer cache yet. It also avoids false
attempts to delete slices which were updated in the previous sync. Such
attempts would fail the ResourceVersion precondition check, but would
still cause work for the apiserver.
2024-10-30 15:54:32 +01:00
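The idea behind 7473e643fa can be illustrated with a cache that layers the controller's own writes over a possibly stale informer cache. This is a simplified stand-in, not the client-go implementation: `overlayCache` and its methods are hypothetical, and resource versions are plain integers here purely so they can be compared.

```go
package main

import "fmt"

// Slice is a simplified stand-in for a ResourceSlice.
type Slice struct {
	Name            string
	ResourceVersion int
}

// overlayCache layers the controller's own recent writes on top of a
// possibly stale informer cache. Hypothetical type for this sketch.
type overlayCache struct {
	informer  map[string]Slice // what the informer has seen so far
	mutations map[string]Slice // what this controller wrote itself
}

func newOverlayCache(informer map[string]Slice) *overlayCache {
	return &overlayCache{informer: informer, mutations: map[string]Slice{}}
}

// Mutation records the object returned by a successful Create or Update.
func (c *overlayCache) Mutation(s Slice) {
	c.mutations[s.Name] = s
}

// Get returns whichever copy has the higher resource version.
func (c *overlayCache) Get(name string) (Slice, bool) {
	fromInformer, inInformer := c.informer[name]
	fromWrite, written := c.mutations[name]
	switch {
	case written && (!inInformer || fromWrite.ResourceVersion > fromInformer.ResourceVersion):
		return fromWrite, true
	case inInformer:
		return fromInformer, true
	default:
		return Slice{}, false
	}
}

func main() {
	informer := map[string]Slice{} // the informer has not caught up yet
	c := newOverlayCache(informer)
	c.Mutation(Slice{Name: "pool-a-0", ResourceVersion: 5}) // result of a Create call
	_, found := c.Get("pool-a-0")
	fmt.Println(found) // true: the next sync sees the slice and does not create it again
}
```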
Patrick Ohly
e88d5c37e6 DRA resource claim controller: add statistics
This is primarily for testing. Proper metrics might be useful, but can still be
added later.
2024-10-30 15:54:32 +01:00
Patrick Ohly
d94752ebc8 DRA resourceslice controller: use preconditions for Delete
It's better to verify UID and ResourceVersion of the ResourceSlice that we want
to delete. If anything changed, the decision to remove it might not apply
anymore and we need to check again.
2024-10-30 15:54:32 +01:00
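A sketch of what the preconditions in d94752ebc8 guard against, using an in-memory store instead of the apiserver. `Store`, `Object` and the error values are hypothetical. In the Kubernetes API the equivalent checks are requested through the `Preconditions` field of `metav1.DeleteOptions`.

```go
package main

import (
	"errors"
	"fmt"
)

// Object is a simplified stand-in for a stored ResourceSlice.
type Object struct {
	UID             string
	ResourceVersion string
}

// Preconditions lists what must still be true for a delete to proceed.
// A nil field means "do not check".
type Preconditions struct {
	UID             *string
	ResourceVersion *string
}

var (
	ErrNotFound = errors.New("not found")
	ErrConflict = errors.New("precondition failed")
)

// Store is a hypothetical in-memory replacement for the apiserver.
type Store struct {
	objects map[string]Object
}

// Delete removes the object only if it still matches the preconditions.
func (s *Store) Delete(name string, p Preconditions) error {
	obj, ok := s.objects[name]
	if !ok {
		return ErrNotFound
	}
	if p.UID != nil && *p.UID != obj.UID {
		return ErrConflict
	}
	if p.ResourceVersion != nil && *p.ResourceVersion != obj.ResourceVersion {
		return ErrConflict
	}
	delete(s.objects, name)
	return nil
}

func main() {
	s := &Store{objects: map[string]Object{"slice-0": {UID: "u1", ResourceVersion: "8"}}}
	uid, rv := "u1", "7" // the version the controller saw when it decided to delete
	err := s.Delete("slice-0", Preconditions{UID: &uid, ResourceVersion: &rv})
	fmt.Println(errors.Is(err, ErrConflict)) // true: the slice changed, so check again
}
```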
Patrick Ohly
a6d180c7d3 DRA: validate set of devices in a pool before using the pool
The ResourceSlice controller (theoretically) might end up creating too many
slices if it syncs again before its informer cache was updated. This could
cause the scheduler to allocate a device from a duplicated slice. They should
be identical, but it's still better to fail and wait until the controller
removes the redundant slice.
2024-10-30 15:54:32 +01:00
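One way to picture the check from a6d180c7d3 is a scan for device names that appear more than once across the slices of a pool. This is a sketch under that assumption. `Slice` and `validatePool` are hypothetical names, and the real validation may look at more than device names.

```go
package main

import "fmt"

// Slice is a simplified stand-in for a ResourceSlice in one pool.
type Slice struct {
	Name    string
	Devices []string
}

// validatePool returns an error if any device name appears more than once
// across the slices of one pool.
func validatePool(slices []Slice) error {
	seen := map[string]string{} // device name -> slice that first listed it
	for _, s := range slices {
		for _, d := range s.Devices {
			if first, dup := seen[d]; dup {
				return fmt.Errorf("device %q published by slices %q and %q", d, first, s.Name)
			}
			seen[d] = s.Name
		}
	}
	return nil
}

func main() {
	pool := []Slice{
		{Name: "slice-0", Devices: []string{"gpu-0", "gpu-1"}},
		{Name: "slice-0-dup", Devices: []string{"gpu-0", "gpu-1"}}, // redundant slice
	}
	if err := validatePool(pool); err != nil {
		fmt.Println("pool not usable yet:", err)
	}
}
```

A scheduler using such a check would skip the pool and retry later, once the controller has removed the redundant slice.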
Patrick Ohly
26650371cc DRA resourceslice controller: support publishing multiple slices
The driver determines what each slice is meant to look like. The controller
then ensures that only those slices exist. It reuses existing slices where the
set of devices, as identified by their names, is the same as in some desired
slice. Such slices get updated to match the desired state.

In other words, attributes and the order of devices can be changed by updating
an existing slice, but adding or removing a device is done by deleting and
re-creating slices.

Co-authored-by: googs1025 <googs1025@gmail.com>

The test update is partly based on
https://github.com/kubernetes/kubernetes/pull/127645.
2024-10-30 15:54:32 +01:00
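The reuse rule from 26650371cc can be sketched as a planning step that matches desired slices to existing ones by their set of device names. The names `plan` and `nameSetKey` are hypothetical, and the real controller also has to handle details that this sketch ignores, such as comparing the remaining slice content.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Slice is a simplified stand-in: only the slice name and device names matter here.
type Slice struct {
	Name    string
	Devices []string
}

// nameSetKey builds an order-independent key from a list of device names.
func nameSetKey(devices []string) string {
	names := append([]string(nil), devices...)
	sort.Strings(names)
	return strings.Join(names, "\x00")
}

// plan returns the names of existing slices to update in place, the indices
// of desired slices that need a new object, and the names of existing
// slices to delete.
func plan(existing, desired []Slice) (update []string, create []int, remove []string) {
	byKey := map[string][]string{}
	for _, s := range existing {
		k := nameSetKey(s.Devices)
		byKey[k] = append(byKey[k], s.Name)
	}
	reused := map[string]bool{}
	for i, d := range desired {
		k := nameSetKey(d.Devices)
		if names := byKey[k]; len(names) > 0 {
			// Same device names: keep the object, update its content.
			update = append(update, names[0])
			reused[names[0]] = true
			byKey[k] = names[1:]
		} else {
			create = append(create, i)
		}
	}
	for _, s := range existing {
		if !reused[s.Name] {
			remove = append(remove, s.Name)
		}
	}
	return update, create, remove
}

func main() {
	existing := []Slice{
		{Name: "a", Devices: []string{"gpu-0", "gpu-1"}},
		{Name: "b", Devices: []string{"gpu-2"}},
	}
	desired := []Slice{
		{Devices: []string{"gpu-1", "gpu-0"}}, // reordered only: reuse "a"
		{Devices: []string{"gpu-2", "gpu-3"}}, // device added: new slice, "b" goes away
	}
	update, create, remove := plan(existing, desired)
	fmt.Println(update, create, remove) // [a] [1] [b]
}
```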
Kubernetes Prow Robot
8b063a6a08
Merge pull request #128331 from ArangoGutierrez/devel/driverresources.deepcopy
DRA: generate deepcopy for DriverResources
2024-10-25 12:42:52 +01:00
Carlos Eduardo Arango Gutierrez
32214631eb
DRA: generate deepcopy for DriverResources
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
2024-10-25 11:43:34 +02:00
Kubernetes Prow Robot
d9b95ea94f
Merge pull request #128259 from dinhxuanvu/conversion-build-tag
Restore build-tag for conversion and defaulter gen
2024-10-25 08:28:53 +01:00
Kubernetes Prow Robot
68f63471a7
Merge pull request #128322 from benluddy/cbor-storage-wiring
KEP-4222: Wire CBOR CR storage behind test-only feature gate.
2024-10-25 03:32:51 +01:00
Ben Luddy
950ed807c3
Wire CBOR CR storage behind test-only feature gate. 2024-10-24 21:29:40 -04:00
Kubernetes Prow Robot
5147eebf22
Merge pull request #128243 from benluddy/cbor-dynamic-integration
KEP-4222: Add CBOR variant of admission webhook integration test.
2024-10-25 01:04:53 +01:00
Kubernetes Prow Robot
66da447e14
Merge pull request #128317 from Jefftree/revert-componentsli-feature
Set ComponentSLIs feature as GA
2024-10-24 22:42:51 +01:00
Kubernetes Prow Robot
b7a85a9db3
Merge pull request #128262 from dom4ha/scheduler-perf
Tune PreemptionAsync and Unschedulable tests threshold and params.
2024-10-24 21:24:52 +01:00
Ben Luddy
77401d7073
Add CBOR variant of admission webhook integration test.
The existing admission webhook integration test provides good coverage of serving built-in resources
and custom resources, including subresources. Serialization concerns, including roundtrippability,
of built-in types have existing test coverage; the CBOR variant of the admission webhook integration
test additionally exercises client and server codec wiring.
2024-10-24 13:27:39 -04:00
Ben Luddy
3e1b6aaf41
Export meta internal version scheme for testing.
Codecs is already exported, but in order for tests to construct an alternate CodecFactory for meta's
internal version types, they either need to be able to reference the scheme or to construct a
parallel scheme, and a parallel scheme construction risks going out of sync with the way the
package-scoped scheme object is initialized.
2024-10-24 13:27:39 -04:00
Ben Luddy
ea13190d8b
Add test-only client feature gates for CBOR.
As with the apiserver feature gate for CBOR as a serving and storage encoding, the client feature
gates for CBOR are being initially added through a test-only feature gate instance that is not wired
to environment variables or to command-line flags and is intended only to be enabled
programmatically from integration tests. The test-only instance will be removed as part of alpha
graduation and replaced by conventional client feature gating.
2024-10-24 13:27:39 -04:00
Ben Luddy
0cad1a89b6
Wire test-only feature gate for CBOR serving.
To mitigate the risk of introducing a new protocol, integration tests for CBOR will be written using
a test-only feature gate instance that is not wired to runtime options. On alpha graduation, the
test-only feature gate instance will be replaced by a normal feature gate in the existing apiserver
feature gate instance.
2024-10-24 13:27:36 -04:00
Kubernetes Prow Robot
7b7a7968d4
Merge pull request #125314 from enj/enj/i/proto_for_core
Use protobuf for core clients
2024-10-24 18:20:54 +01:00
Ben Luddy
d638d64572
Add CBOR serializer option to disable JSON transcoding of raw types. 2024-10-24 12:30:19 -04:00
Ben Luddy
db1239d354
Add WithSerializer option to add serializers to CodecFactory. 2024-10-24 12:30:19 -04:00
Ben Luddy
66a14268c5
Use runtime.SerializerInfo in place of internal "serializerType".
CodecFactory construction uses an unexported struct type named "serializerType" to hold serializer
definitions. There are few differences between it and runtime.SerializerInfo, and they do not appear
to be used anymore. For example, serializerType includes an unused FileExtensions field, and has
distinct ContentType (singular) and AcceptContentTypes (plural) fields instead of
runtime.SerializerInfo's singular MediaType. All remaining uses of serializerType set
AcceptContentTypes to a single-entry slice whose element is equal to its ContentType field.

During construction of a CodecFactory, all serializerType values were already being mechanically
translated into runtime.SerializerInfo values.

Moving to an exported type for serializer definitions makes it easier to expose an option to allow
callers to register their own serializer definitions, which in turn makes it possible to
conditionally include new serializers at runtime (especially behind feature gates).
2024-10-24 12:30:19 -04:00
Kubernetes Prow Robot
fc9330eb65
Merge pull request #128311 from huww98/mount-warn
mount-utils: fix warning message of fs mismatch
2024-10-24 17:15:04 +01:00
Kubernetes Prow Robot
0f549a9286
Merge pull request #128213 from aaron-prindle/fix-127336
chore: remove sig/api-machinery from OWNERS files that sig/etcd owns
2024-10-24 17:14:53 +01:00
Kubernetes Prow Robot
721d66780b
Merge pull request #128305 from adrianmoisey/cidr_release_on_node_delete
Ensure that a node's CIDR isn't released until the node is deleted
2024-10-24 15:21:05 +01:00
Kubernetes Prow Robot
0a62f0fd7b
Merge pull request #128139 from Jefftree/revert-allowservicelb
Revert removal of feature AllowServiceLBStatusOnNonLB and LockToDefault first
2024-10-24 15:20:54 +01:00
Kubernetes Prow Robot
8c7160205d
Merge pull request #127922 from PiotrProkop/topology-manager-policy-options-e2e
add e2e tests for prefer-closest-numa-nodes TopologyManagerPolicyOption
2024-10-24 14:17:03 +01:00
Kubernetes Prow Robot
cadb1508a9
Merge pull request #125258 from serathius/etcd-kubernetes-interface
Etcd kubernetes interface
2024-10-24 14:16:52 +01:00
Jefftree
b8e3ef7fbf update feature yaml 2024-10-24 13:09:04 +00:00
Jefftree
868ec5a637 Move ComponentSLIs to versioned features and mark as GA 2024-10-24 13:08:23 +00:00
Jefftree
a0977f0673 Revert "Remove GA feature gate ComponentSLIs"
This reverts commit f1af84620b.
2024-10-24 13:00:04 +00:00
Adrian Moisey
4d2f3ed8e6
Ensure that a node's CIDR isn't released until the node is deleted
Fixes https://github.com/kubernetes/kubernetes/issues/127792

Fixes a bug where a node's PodCIDR was released as soon as the node was given a
deletion timestamp, even though the node was still around due to a finalizer.
2024-10-24 13:19:34 +02:00
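The behavior described in 4d2f3ed8e6 can be sketched with two event handlers: updates never free a CIDR, even for a terminating node, and only the delete event does. All names here (`allocator`, `OnUpdate`, `OnDelete`, `Terminating`) are hypothetical and do not mirror the actual node IPAM code.

```go
package main

import "fmt"

// Node is a simplified stand-in; Terminating models a set deletion timestamp.
type Node struct {
	Name        string
	PodCIDR     string
	Terminating bool
}

// allocator tracks which CIDRs are in use. Hypothetical type for this sketch.
type allocator struct {
	used map[string]string // CIDR -> node name
}

// OnUpdate keeps the CIDR occupied even if the node is terminating, because
// a finalizer may keep the node (and its pods) around for a while.
func (a *allocator) OnUpdate(n Node) {
	if n.PodCIDR != "" {
		a.used[n.PodCIDR] = n.Name
	}
}

// OnDelete runs once the node object is really gone and frees its CIDR.
func (a *allocator) OnDelete(n Node) {
	if n.PodCIDR != "" && a.used[n.PodCIDR] == n.Name {
		delete(a.used, n.PodCIDR)
	}
}

// InUse reports whether the CIDR is still assigned to some node.
func (a *allocator) InUse(cidr string) bool {
	_, ok := a.used[cidr]
	return ok
}

func main() {
	a := &allocator{used: map[string]string{}}
	n := Node{Name: "node-1", PodCIDR: "10.0.1.0/24"}
	a.OnUpdate(n)

	n.Terminating = true // deletion timestamp set, finalizer still present
	a.OnUpdate(n)
	fmt.Println(a.InUse("10.0.1.0/24")) // true: not released yet

	a.OnDelete(n) // finalizer removed, node object deleted
	fmt.Println(a.InUse("10.0.1.0/24")) // false
}
```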
Kubernetes Prow Robot
a8a086fe0a
Merge pull request #128309 from mimowo/job-rollback-test-promotion
Rollback promotion of Job e2e test for pod failure policy using exit code
2024-10-24 12:00:52 +01:00
PiotrProkop
a6eb3281cc add e2e tests for prefer-closest-numa-nodes TopologyManagerPolicyOption suboptimal allocation
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2024-10-24 11:45:39 +02:00
胡玮文
ed43fc467d mount-utils: fix warning message of fs mismatch 2024-10-24 16:48:51 +08:00
Michal Wozniak
d521e44187 Rollback promotion of Job e2e test for pod failure policy using exit code 2024-10-24 10:30:56 +02:00
Marek Siarkowicz
a16a364324 Migrate GetList to Kubernetes client 2024-10-24 10:23:54 +02:00
Marek Siarkowicz
e192ac31a4 Migrate Count to Kubernetes client 2024-10-24 10:23:54 +02:00
Marek Siarkowicz
2fcd321c42 Migrate Delete and GuaranteedUpdate to Kubernetes client 2024-10-24 10:23:52 +02:00
Marek Siarkowicz
53ca81da29 Migrate Create to Kubernetes client 2024-10-24 10:17:13 +02:00
Marek Siarkowicz
092a6d1e0d Migrate Get to Kubernetes client 2024-10-24 10:15:00 +02:00
Marek Siarkowicz
066c1c05d7 Update recorders to wrap kubernetes.Client 2024-10-24 10:14:11 +02:00
Marek Siarkowicz
249ad2a613 Add etcd kubernetes interface package to vendor 2024-10-24 10:09:26 +02:00
Kubernetes Prow Robot
e526a27118
Merge pull request #116388 from mxpv/shutdown
Clean/refactor node shutdown manager
2024-10-24 08:34:53 +01:00
Kubernetes Prow Robot
aa8f2878a5
Merge pull request #117943 from lowang-bh/lessFunCall
improve: reduce function calling number
2024-10-24 04:52:52 +01:00
Kubernetes Prow Robot
1af81c223d
Merge pull request #128197 from aojea/extract_provider_flags
disable cloud-provider code from kube-controller-manager
2024-10-24 03:34:59 +01:00
Kubernetes Prow Robot
122fa7c188
Merge pull request #128127 from macsko/add_macsko_to_sig_scheduling_reviewers
Add macsko to SIG Scheduling reviewers
2024-10-24 03:34:52 +01:00
Maksym Pavlenko
449f86b0ba Refactor node shutdown manager
Signed-off-by: Maksym Pavlenko <pavlenko.maksym@gmail.com>
2024-10-23 17:36:22 -07:00