From a5de75458e223ef44cf133ca623263fc2b871ddc Mon Sep 17 00:00:00 2001
From: Patrick Ohly <patrick.ohly@intel.com>
Date: Thu, 9 Jan 2025 11:41:44 +0100
Subject: [PATCH] DRA API: bump maximum size of ReservedFor to 256

The original limit of 32 seemed sufficient for a single GPU on a node. But for
shared non-local resources it is too low. For example, a ResourceClaim might be
used to allocate an interconnect channel that connects all pods of a workload
running on several different nodes, in which case the number of pods can be
considerably larger.

256 is high enough for currently planned systems. If we need something even
higher in the future, an alternative approach might be needed to avoid
scalability problems.

Normally, increasing such a limit would have to be done incrementally over two
releases. In this case we decided on
Slack (https://kubernetes.slack.com/archives/CJUQN3E4T/p1734593174791519) to
make an exception and apply this change to current master for 1.33 and backport
it to the next 1.32.x patch release for production usage.

This breaks downgrades to a 1.32 release without this change if there are
ResourceClaims with a number of consumers > 32 in ReservedFor. In practice,
this breakage is very unlikely because there are no workloads yet which need so
many consumers and such downgrades to a previous patch release are also
unlikely. Downgrades to 1.31 already weren't supported when using DRA v1beta1.
---
 api/openapi-spec/swagger.json                               | 4 ++--
 .../v3/apis__resource.k8s.io__v1alpha3_openapi.json         | 2 +-
 .../v3/apis__resource.k8s.io__v1beta1_openapi.json          | 2 +-
 pkg/apis/resource/types.go                                  | 6 +++---
 pkg/generated/openapi/zz_generated.openapi.go               | 4 ++--
 staging/src/k8s.io/api/resource/v1alpha3/generated.proto    | 2 +-
 staging/src/k8s.io/api/resource/v1alpha3/types.go           | 6 +++---
 .../api/resource/v1alpha3/types_swagger_doc_generated.go    | 2 +-
 staging/src/k8s.io/api/resource/v1beta1/generated.proto     | 2 +-
 staging/src/k8s.io/api/resource/v1beta1/types.go            | 6 +++---
 .../api/resource/v1beta1/types_swagger_doc_generated.go     | 2 +-
 11 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/api/openapi-spec/swagger.json b/api/openapi-spec/swagger.json
index 4b1b1c7af1a..405253a05c9 100644
--- a/api/openapi-spec/swagger.json
+++ b/api/openapi-spec/swagger.json
@@ -15123,7 +15123,7 @@
           "x-kubernetes-list-type": "map"
         },
         "reservedFor": {
-          "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+          "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
           "items": {
             "$ref": "#/definitions/io.k8s.api.resource.v1alpha3.ResourceClaimConsumerReference"
           },
@@ -15956,7 +15956,7 @@
           "x-kubernetes-list-type": "map"
         },
         "reservedFor": {
-          "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+          "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
           "items": {
             "$ref": "#/definitions/io.k8s.api.resource.v1beta1.ResourceClaimConsumerReference"
           },
diff --git a/api/openapi-spec/v3/apis__resource.k8s.io__v1alpha3_openapi.json b/api/openapi-spec/v3/apis__resource.k8s.io__v1alpha3_openapi.json
index c9d9c65b425..b13dcc8729d 100644
--- a/api/openapi-spec/v3/apis__resource.k8s.io__v1alpha3_openapi.json
+++ b/api/openapi-spec/v3/apis__resource.k8s.io__v1alpha3_openapi.json
@@ -847,7 +847,7 @@
             "x-kubernetes-list-type": "map"
           },
           "reservedFor": {
-            "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+            "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
             "items": {
               "allOf": [
                 {
diff --git a/api/openapi-spec/v3/apis__resource.k8s.io__v1beta1_openapi.json b/api/openapi-spec/v3/apis__resource.k8s.io__v1beta1_openapi.json
index 776e80bf6fd..c988bb4f139 100644
--- a/api/openapi-spec/v3/apis__resource.k8s.io__v1beta1_openapi.json
+++ b/api/openapi-spec/v3/apis__resource.k8s.io__v1beta1_openapi.json
@@ -869,7 +869,7 @@
             "x-kubernetes-list-type": "map"
           },
           "reservedFor": {
-            "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+            "description": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
             "items": {
               "allOf": [
                 {
diff --git a/pkg/apis/resource/types.go b/pkg/apis/resource/types.go
index 65fb34f9a14..fca9bc6a6b0 100644
--- a/pkg/apis/resource/types.go
+++ b/pkg/apis/resource/types.go
@@ -689,7 +689,7 @@ type ResourceClaimStatus struct {
 	// which issued it knows that it must put the pod back into the queue,
 	// waiting for the ResourceClaim to become usable again.
 	//
-	// There can be at most 32 such reservations. This may get increased in
+	// There can be at most 256 such reservations. This may get increased in
 	// the future, but not reduced.
 	//
 	// +optional
@@ -717,9 +717,9 @@ type ResourceClaimStatus struct {
 	Devices []AllocatedDeviceStatus
 }
 
-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256
 
 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same
diff --git a/pkg/generated/openapi/zz_generated.openapi.go b/pkg/generated/openapi/zz_generated.openapi.go
index 0d2b2d3d0d7..dbffb081c2c 100644
--- a/pkg/generated/openapi/zz_generated.openapi.go
+++ b/pkg/generated/openapi/zz_generated.openapi.go
@@ -47405,7 +47405,7 @@ func schema_k8sio_api_resource_v1alpha3_ResourceClaimStatus(ref common.Reference
 							},
 						},
 						SchemaProps: spec.SchemaProps{
-							Description: "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+							Description: "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
 							Type:        []string{"array"},
 							Items: &spec.SchemaOrArray{
 								Schema: &spec.Schema{
@@ -48902,7 +48902,7 @@ func schema_k8sio_api_resource_v1beta1_ResourceClaimStatus(ref common.ReferenceC
 							},
 						},
 						SchemaProps: spec.SchemaProps{
-							Description: "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+							Description: "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
 							Type:        []string{"array"},
 							Items: &spec.SchemaOrArray{
 								Schema: &spec.Schema{
diff --git a/staging/src/k8s.io/api/resource/v1alpha3/generated.proto b/staging/src/k8s.io/api/resource/v1alpha3/generated.proto
index 13be7cbd8ea..e802a01439f 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/generated.proto
+++ b/staging/src/k8s.io/api/resource/v1alpha3/generated.proto
@@ -675,7 +675,7 @@ message ResourceClaimStatus {
   // which issued it knows that it must put the pod back into the queue,
   // waiting for the ResourceClaim to become usable again.
   //
-  // There can be at most 32 such reservations. This may get increased in
+  // There can be at most 256 such reservations. This may get increased in
   // the future, but not reduced.
   //
   // +optional
diff --git a/staging/src/k8s.io/api/resource/v1alpha3/types.go b/staging/src/k8s.io/api/resource/v1alpha3/types.go
index e3d7fd8945b..fb4d7041dbd 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/types.go
+++ b/staging/src/k8s.io/api/resource/v1alpha3/types.go
@@ -687,7 +687,7 @@ type ResourceClaimStatus struct {
 	// which issued it knows that it must put the pod back into the queue,
 	// waiting for the ResourceClaim to become usable again.
 	//
-	// There can be at most 32 such reservations. This may get increased in
+	// There can be at most 256 such reservations. This may get increased in
 	// the future, but not reduced.
 	//
 	// +optional
@@ -715,9 +715,9 @@ type ResourceClaimStatus struct {
 	Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`
 }
 
-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256
 
 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same
diff --git a/staging/src/k8s.io/api/resource/v1alpha3/types_swagger_doc_generated.go b/staging/src/k8s.io/api/resource/v1alpha3/types_swagger_doc_generated.go
index 1a71d64c10d..b41609d1186 100644
--- a/staging/src/k8s.io/api/resource/v1alpha3/types_swagger_doc_generated.go
+++ b/staging/src/k8s.io/api/resource/v1alpha3/types_swagger_doc_generated.go
@@ -291,7 +291,7 @@ func (ResourceClaimSpec) SwaggerDoc() map[string]string {
 var map_ResourceClaimStatus = map[string]string{
 	"":            "ResourceClaimStatus tracks whether the resource has been allocated and what the result of that was.",
 	"allocation":  "Allocation is set once the claim has been allocated successfully.",
-	"reservedFor": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+	"reservedFor": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
 	"devices":     "Devices contains the status of each device allocated for this claim, as reported by the driver. This can include driver-specific information. Entries are owned by their respective drivers.",
 }
 
diff --git a/staging/src/k8s.io/api/resource/v1beta1/generated.proto b/staging/src/k8s.io/api/resource/v1beta1/generated.proto
index 6d525d5b856..4ea13e03378 100644
--- a/staging/src/k8s.io/api/resource/v1beta1/generated.proto
+++ b/staging/src/k8s.io/api/resource/v1beta1/generated.proto
@@ -683,7 +683,7 @@ message ResourceClaimStatus {
   // which issued it knows that it must put the pod back into the queue,
   // waiting for the ResourceClaim to become usable again.
   //
-  // There can be at most 32 such reservations. This may get increased in
+  // There can be at most 256 such reservations. This may get increased in
   // the future, but not reduced.
   //
   // +optional
diff --git a/staging/src/k8s.io/api/resource/v1beta1/types.go b/staging/src/k8s.io/api/resource/v1beta1/types.go
index a7f1ee7b54f..ca79c5a6640 100644
--- a/staging/src/k8s.io/api/resource/v1beta1/types.go
+++ b/staging/src/k8s.io/api/resource/v1beta1/types.go
@@ -695,7 +695,7 @@ type ResourceClaimStatus struct {
 	// which issued it knows that it must put the pod back into the queue,
 	// waiting for the ResourceClaim to become usable again.
 	//
-	// There can be at most 32 such reservations. This may get increased in
+	// There can be at most 256 such reservations. This may get increased in
 	// the future, but not reduced.
 	//
 	// +optional
@@ -723,9 +723,9 @@ type ResourceClaimStatus struct {
 	Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`
 }
 
-// ReservedForMaxSize is the maximum number of entries in
+// ResourceClaimReservedForMaxSize is the maximum number of entries in
 // claim.status.reservedFor.
-const ResourceClaimReservedForMaxSize = 32
+const ResourceClaimReservedForMaxSize = 256
 
 // ResourceClaimConsumerReference contains enough information to let you
 // locate the consumer of a ResourceClaim. The user must be a resource in the same
diff --git a/staging/src/k8s.io/api/resource/v1beta1/types_swagger_doc_generated.go b/staging/src/k8s.io/api/resource/v1beta1/types_swagger_doc_generated.go
index 1d0176cbcae..4ecc35d08a6 100644
--- a/staging/src/k8s.io/api/resource/v1beta1/types_swagger_doc_generated.go
+++ b/staging/src/k8s.io/api/resource/v1beta1/types_swagger_doc_generated.go
@@ -300,7 +300,7 @@ func (ResourceClaimSpec) SwaggerDoc() map[string]string {
 var map_ResourceClaimStatus = map[string]string{
 	"":            "ResourceClaimStatus tracks whether the resource has been allocated and what the result of that was.",
 	"allocation":  "Allocation is set once the claim has been allocated successfully.",
-	"reservedFor": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 32 such reservations. This may get increased in the future, but not reduced.",
+	"reservedFor": "ReservedFor indicates which entities are currently allowed to use the claim. A Pod which references a ResourceClaim which is not reserved for that Pod will not be started. A claim that is in use or might be in use because it has been reserved must not get deallocated.\n\nIn a cluster with multiple scheduler instances, two pods might get scheduled concurrently by different schedulers. When they reference the same ResourceClaim which already has reached its maximum number of consumers, only one pod can be scheduled.\n\nBoth schedulers try to add their pod to the claim.status.reservedFor field, but only the update that reaches the API server first gets stored. The other one fails with an error and the scheduler which issued it knows that it must put the pod back into the queue, waiting for the ResourceClaim to become usable again.\n\nThere can be at most 256 such reservations. This may get increased in the future, but not reduced.",
 	"devices":     "Devices contains the status of each device allocated for this claim, as reported by the driver. This can include driver-specific information. Entries are owned by their respective drivers.",
 }