kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-10-01 07:30:27 +00:00

Author	SHA1	Message	Date
Dan Winship	2636aa35e3	Require canonicalization of NetworkDeviceData IPs There's no reason to allow non-standard or non-canonical IP values in new APIs.	2025-02-20 12:49:03 -05:00
AxeZhan	3075a9ae96	DRA API: validate node selector labels Previously, ValidateNodeSelector did not check that labels are valid. Now it does for resource.k8s.io, regardless whether an object already was created with invalid labels in an earlier Kubernetes release. Theoretically this is a breaking change and could cause problems during an upgrade, but that is highly unlikely in practice. In contrast to node affinity, DRA does not ignore parse errors (= uses NewNodeSelector, not NewLazyErrorNodeSelector), so invalid labels would have been found instead of being silently ignored. Even if some object has invalid labels, this only affects an alpha -> beta upgrade which isn't guaranteed to work seamlessly.	2024-11-22 09:10:02 +01:00
Lionel Jouin	118356175d	[KEP-4817] Add limits on conditions and IPs + fix documentation Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 22:18:53 +01:00
Lionel Jouin	39f55e1cd0	[KEP-4817] Add data length limit (from #128601 ) Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 10:35:29 +01:00
Lionel Jouin	4b76ba1a87	[KEP-4817] Rename Addresses to IPs Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:59:56 +01:00
Lionel Jouin	43d23b8994	[KEP-4817] Use structured.MakeDeviceID Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:59:56 +01:00
Lionel Jouin	8ab33b8413	[KEP-4817] Improve NetworkData Validation * Add max length for InterfaceName and HardwareAddress * Prevent duplicated Addresses Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:59:56 +01:00
Lionel Jouin	a062f91106	[KEP-4817] Fixes based on review * Rename HWAddress to HardwareAddress * Fix condition validation * Remove feature gate validation * Fix drop field on disabled feature gate Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:59:56 +01:00
Lionel Jouin	cb9ee1d4fe	[KEP-4817] Remove pointer on Data, InterfaceName and HWAddress fields Adapat validation and tests based on these API changes Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:59:51 +01:00
Lionel Jouin	5d7a16b0a5	[KEP-4817] improve testing * Test feature-gate enabled/disabled for validation * Test pkg/registry/resource/resourceclaim * Add Data and NetworkData to integration test Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:54:19 +01:00
Lionel Jouin	3e595db0af	[KEP-4817] API, validation and feature-gate * Add status * Add validation to check if fields are correct (Network field, device has been allocated)) * Add feature-gate * Drop field if feature-gate not set Signed-off-by: Lionel Jouin <lionel.jouin@est.tech>	2024-11-07 09:54:17 +01:00
Patrick Ohly	446f20aa3e	DRA API: add maximum length of opaque parameters This had been left out unintentionally earlier. Because theoretically there might now be existing objects with parameters that are larger than whatever limit gets enforced now, the limit only gets checked when parameters get created or modified. This is similar to the validation of CEL expressions and for consistency, the same 10 Ki limit as for those is chosen. Because the limit is not enforced for stored parameters, it can be increased in the future, with the caveat that users who need larger parameters then depend on the newer Kubernetes release with a higher limit. Lowering the limit is harder because creating deployments that worked in older Kubernetes will not work anymore with newer Kubernetes.	2024-11-06 17:29:51 +01:00
Patrick Ohly	81fd64256c	DRA API: use DeviceCapacity struct instead of plain Quantity This enables a future extension where capacity of a single device gets consumed by different claims. The semantic without any additional fields is the same as before: a capacity cannot be split up and is only an attribute of a device. Because its semantically the same as before, two-way conversion to v1alpha3 is possible.	2024-11-06 13:03:19 +01:00
Patrick Ohly	99acb67c68	DRA API: enhance validation testing The line coverage is now at 98.5% and several more corner cases are covered. The remaining lines are hard or impossible to reach. The actual validation is the same as before, with some small tweaks to the generated errors. When failures are not as expected, it is useful to show what the expected and actual failures look like to a user. Perhaps even better would be to put the expected texts into the test files instead of the error structs. That would be easier to review and shorter.	2024-11-06 13:03:18 +01:00
Tim Hockin	c8eeb486f4	Call-site comments: the "" arg to TooLong is unused	2024-11-05 15:10:24 -08:00
Tim Hockin	8a7af90300	Clarify that value arg to field.TooLong is unused	2024-11-05 15:10:23 -08:00
Tim Hockin	4d0e1c8fd4	Kill TooLongMaxLength() in favor of TooLong()	2024-11-05 15:10:22 -08:00
Patrick Ohly	4419568259	DRA: treat AdminAccess as a new feature gated field Using the "normal" logic for a feature gated field simplifies the implementation of the feature gate. There is one (entirely theoretic!) problem with updating from 1.31: if a claim was allocated in 1.31 with admin access, the status field was not set because it didn't exist yet. If a driver now follows the current definition of "unset = off", then it will not grant admin access even though it should. This is theoretic because drivers are starting to support admin access with 1.32, so there shouldn't be any claim where this problem could occur.	2024-10-29 10:22:31 +01:00
Patrick Ohly	9a7e4ccab2	DRA admin access: add feature gate The new DRAAdminAccess feature gate has the following effects: - If disabled in the apiserver, the spec.devices.requests[*].adminAccess field gets cleared. Same in the status. In both cases the scenario that it was already set and a claim or claim template get updated is special: in those cases, the field is not cleared. Also, allocating a claim with admin access is allowed regardless of the feature gate and the field is not cleared. In practice, the scheduler will not do that. - If disabled in the resource claim controller, creating ResourceClaims with the field set gets rejected. This prevents running workloads which depend on admin access. - If disabled in the scheduler, claims with admin access don't get allocated. The effect is the same. The alternative would have been to ignore the fields in claim controller and scheduler. This is bad because a monitoring workload then runs, blocking resources that probably were meant for production workloads.	2024-10-29 09:50:11 +01:00
Patrick Ohly	f3fef01e79	DRA API: AdminAccess in DeviceRequestAllocationResult Drivers need to know that because admin access may also grant additional permissions. The allocator needs to ignore such results when determining which devices are considered as allocated. In both cases it is conceptually cleaner to not rely on the content of the ClaimSpec.	2024-10-29 09:50:07 +01:00
Kubernetes Prow Robot	3690cb7f9a	Merge pull request #128101 from pohly/dra-api-cel-cost-limit DRA API: implement CEL cost limit	2024-10-26 20:18:52 +01:00
Patrick Ohly	d53cb79cec	DRA cel: enforce runtime limit by default again As pointed out during code review, the CEL cost estimates are not considered perfectly reliable. Therefore it is better to also do runtime checks. Some downstream users might decide to allow CEL expressions to run longer. Therefore the cost limit is now part of an Options struct. kube-scheduler uses the default cost limit defined in the resource.k8s.io API, which is the same cost limit that also the apiserver uses during validation.	2024-10-23 21:24:45 +02:00
Patrick Ohly	f548fc2264	DRA API: implement CEL cost limit The main purpose is to protect against denial-of-service attacks. Scheduling time depends a lot on unpredictable factors and expected scheduling time also varies, so no attempt is made to limit the overall time spent on evaluating CEL expressions per claim.	2024-10-23 21:24:45 +02:00
Patrick Ohly	f84eb5ecf8	DRA: remove "classic DRA" This removes the DRAControlPlaneController feature gate, the fields controlled by it (claim.spec.controller, claim.status.deallocationRequested, claim.status.allocation.controller, class.spec.suitableNodes), the PodSchedulingContext type, and all code related to the feature. The feature gets removed because there is no path towards beta and GA and DRA with "structured parameters" should be able to replace it.	2024-10-16 23:09:50 +02:00
Patrick Ohly	91d7882e86	DRA: new API for 1.31 This is a complete revamp of the original API. Some of the key differences: - refocused on structured parameters and allocating devices - support for constraints across devices - support for allocating "all" or a fixed amount of similar devices in a single request - no class for ResourceClaims, instead individual device requests are associated with a mandatory DeviceClass For the sake of simplicity, optional basic types (ints, strings) where the null value is the default are represented as values in the API types. This makes Go code simpler because it doesn't have to check for nil (consumers) and values can be set directly (producers). The effect is that in protobuf, these fields always get encoded because `opt` only has an effect for pointers. The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new "request" field. This is considered acceptable because the entire `claims` field in the pod spec is still alpha. The implementation is complete enough to bring up the apiserver. Adapting other components follows.	2024-07-22 18:09:34 +02:00
Patrick Ohly	8a629b9f15	DRA: remove "sharable" from claim allocation result Now all claims are shareable up to the limit imposed by the size of the "reserverFor" array. This is one of the agreed simplifications for 1.31.	2024-07-21 17:28:14 +02:00
Patrick Ohly	de5742ae83	DRA: remove immediate allocation As agreed in https://github.com/kubernetes/enhancements/pull/4709, immediate allocation is one of those features which can be removed because it makes no sense for structured parameters and the justification for classic DRA is weak.	2024-07-21 17:28:14 +02:00
carlory	bce0335ea6	DRA: enhance validation for the ResourceClaimParametersReference and ResourceClassParametersReference with the following rules: 1. `apiGroup`: If set, it must be a valid DNS subdomain (e.g. 'example.com'). 2. `kind` and `name`: It must be valid path segment name. It may not be '.' or '..' and it may not contain '/' and '%' characters.	2024-06-07 17:18:10 +08:00
Patrick Ohly	a0add8d2c7	dra api: NodeResourceModel -> ResourceModel When renaming NodeResourceSlice to ResourceSlice, the embedded [Node]ResourceModel also should have been renamed.	2024-03-14 18:07:36 +01:00
Patrick Ohly	0b6a0d686a	dra api: rename NodeResourceSlice -> ResourceSlice While currently those objects only get published by the kubelet for node-local resources, this could change once we also support network-attached resources. Dropping the "Node" prefix enables such a future extension. The NodeName in ResourceSlice and StructuredResourceHandle then becomes optional. The kubelet still needs to provide one and it must match its own node name, otherwise it doesn't have permission to access ResourceSlice objects.	2024-03-07 22:22:55 +01:00
Patrick Ohly	d4d5ade7f5	dra: add "named resources" structured parameter model Like the current device plugin interface, a DRA driver using this model announces a list of resource instances. In contrast to device plugins, this list is made available to the scheduler together with attributes that can be used to select suitable instances when they are not all alike. Because this is the first structured parameter model, some checks that previously were not possible, in particular "is one structured parameter field set", now gets enabled. Adding another structured parameter model will be similar. The applyconfigs code generator assumes that all types in an API are defined in a single package. If it wasn't for that, it would be possible to place the "named resources" types in separate packages, which makes their names in the Go code more natural and provides an indication of their stability level because the package name could include a version.	2024-03-07 22:21:16 +01:00
Patrick Ohly	39bbcedbca	dra api: add structured parameters NodeResourceSlice will be used by kubelet to publish resource information on behalf of DRA drivers on the node. NodeName and DriverName in NodeResourceSlice must be immutable. This simplifies tracking the different objects because what they are for cannot change after creation. The new field in ResourceClass tells scheduler and autoscaler that they are expected to handle allocation. ResourceClaimParameters and ResourceClassParameters are new types for telling in-tree components how to handle claims.	2024-03-07 16:15:31 +01:00
Kubernetes Prow Robot	ae36991498	Merge pull request #116332 from klueska/extend-resourceclaimstatus Update resource.AllocationResult with a slice of ResourceHandlers	2023-03-14 19:26:50 -07:00
Kubernetes Prow Robot	f315a4669a	Merge pull request #116576 from pohly/dra-core-validation api: extend validation of dynamic resource allocation fields in PodSpec	2023-03-14 16:34:48 -07:00
Kevin Klues	da0b75f8f9	Update validation for recent changes to resource.k8s.io/v1alpha2 Signed-off-by: Kevin Klues <kklues@nvidia.com>	2023-03-14 22:34:18 +00:00
Patrick Ohly	e97531b349	api: extend validation of dynamic resource allocation fields in PodSpec The generated ResourceClaim name and the names of the ResourceClaimTemplate and ResourceClaim referenced by a pod must be valid according to the resource API, otherwise the pod cannot start. Checking this was removed from the original implementation out of concerns about validating fields in core against limitations imposed by a separate, alpha API. But as this was pointed out again in https://github.com/kubernetes/kubernetes/pull/116254#discussion_r1134010324 it gets added back. The same strings that worked before still work now. In particular, the constraints for a spec.resourceClaim.name are still the same (DNS label).	2023-03-14 11:58:41 +01:00
Patrick Ohly	fec5233668	api: resource.k8s.io PodScheduling -> PodSchedulingContext The name "PodScheduling" was unusual because in contrast to most other names, it was impossible to put an article in front of it. Now PodSchedulingContext is used instead.	2023-03-14 10:18:08 +01:00
Patrick Ohly	508cd60760	dynamic resource allocation: avoid apiserver complaint about list content This fixes the following warning (error?) in the apiserver: E0126 18:10:38.665239 16370 fieldmanager.go:210] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (test/claim-84; resource.k8s.io/v1alpha1, Kind=ResourceClaim) to smd typed: .status.reservedFor: element 0: associative list without keys has an element that's a map type" VersionKind="/, Kind=" namespace="test" name="claim-84" The root cause is the same as in `e50e8a0c91`: nothing in Kubernetes outright complains about a list of items where the item type is comparable in Go, but not a simple type. This nonetheless isn't supposed to be done in the API and can causes problems elsewhere. For the ReservedFor field, everything seems to work okay except for the warning. However, it's better to follow conventions and use a map. This is possible in this case because UID is guaranteed to be a unique key. Validation is now stricter than before, which is a good thing: previously, two entries with the same UID were allowed as long as some other field was different, which wasn't a situation that should have been allowed.	2023-01-27 11:33:05 +01:00
Patrick Ohly	8018ab7cd9	api: fully validate PotentialNodes and SuitableNodes This is in response to review feedback. Checking for valid node names and the set property catches programming mistakes in the components that have write permission.	2022-11-10 20:23:50 +01:00
Patrick Ohly	5cca60f0b8	api: dynamic resource allocation API This adds a new resource.k8s.io API group with v1alpha1 as version. It contains four new types: resource.ResourceClaim, resource.ResourceClass, resource.ResourceClaimTemplate, and resource.PodScheduling.	2022-11-10 20:08:24 +01:00

40 Commits