Automatic merge from submit-queue (batch tested with PRs 51335, 51364, 51130, 48075, 50920)
[API] Feature/job failure policy
**What this PR does / why we need it**: Implements the Backoff policy and failed pod limit defined in https://github.com/kubernetes/community/pull/583
**Which issue this PR fixes**:
fixes#27997, fixes#30243
**Special notes for your reviewer**:
This is a WIP PR, I updated the api batchv1.JobSpec in order to prepare the backoff policy implementation in the JobController.
**Release note**:
```release-note
Add backoff policy and failed pod limit for a job
```
Automatic merge from submit-queue (batch tested with PRs 45186, 50440)
Retry fed-svc creation on diff NodePort during e2e tests
**What this PR does / why we need it**:
Currently in federated end2end tests, the creation of services are
done with a randomize NodePort selection take is causing e2e test
flakes if the creation of a federated service failed if the port is
not available.
Now the util.CreateService(...) function is retrying to create the
service on different nodePort in case of error. The method retry until
success or all possible NodePorts have been tested and also failed.
**Which issue this PR fixes**
fixes#44018
Automatic merge from submit-queue (batch tested with PRs 50016, 49583, 49930, 46254, 50337)
[Federation] Make the hpa scale time window configurable
This PR is on top of open pr https://github.com/kubernetes/kubernetes/pull/45993.
Please review only the last commit in this PR.
This adds a config param to controller manager, the value of which gets passed to hpa adapter via sync controller.
This is needed to reduce the overall time limit of the hpa scaling window to much lesser (then the default 2 mins) to get e2e tests run faster. Please see the comment on the newly added parameter.
**Special notes for your reviewer**:
@kubernetes/sig-federation-pr-reviews
@quinton-hoole
@marun to please validate the mechanism used to pass a parameter from cmd line to adapter.
**Release note**:
```
federation-controller-manager gets a new flag --hpa-scale-forbidden-window.
This flag is used to configure the duration used by federation hpa controller to determine if it can move max and/or min replicas
around (or not), of a cluster local hpa object, by comparing current time with the last scaled time of that cluster local hpa.
Lower value will result in faster response to scalibility conditions achieved by cluster local hpas on local replicas, but too low
a value can result in thrashing. Higher values will result in slower response to scalibility conditions on local replicas.
```
Currently, in federated end2end tests, the creation of services are
done with a randomize NodePort selection. It causing e2e test
flakes if the creation of a federated service failed if the port is
not available.
Now the util.CreateService(...) function is re trying to create the
service on different nodePort in an error case. The method retries until
success or 10 creation retry with other random NodePorts.
If never the service has not been created properly on one of the
federated cluster, a Service shards cleanup is executed before retrying
again the federated service creation.
fixes#44018
Automatic merge from submit-queue
Federated Job controller implementation
Note that job re-balance is not there yet as it's difficult to honor job deadline
requires #35945 and 35943
fixes#34261
@quinton-hoole @nikhiljindal @deepak-vij
**Release note**:
```release-note
Federated Job feature. It is now possible to create a Federated Job
that is automatically deployed to one or more federated clusters
(as Jobs in those clusters). Job parallelism and completions are
spread across clusters according to cluster selection and weighting
preferences. Federated Job status reflects the aggregate status
across all underlying cluster Jobs.
```
Automatic merge from submit-queue (batch tested with PRs 49538, 49708, 47665, 49750, 49528)
Use the core client with version
**What this PR does / why we need it**:
Replace the **deprecated** `clientSet.Core()` with `clientSet.CoreV1()`.
**Which issue this PR fixes**: fixes#49535
**Special notes for your reviewer**:
**Release note**:
```release-note
NONE
```
Automatic merge from submit-queue (batch tested with PRs 49581, 49652, 49681, 49688, 44655)
Re-enable federated ingress test that was disabled due to a federated service deletion bug.
The details of the bug is described in PR #44626. We believe this bug fixes the flakiness in this test and hence we are re-enabling this test to get some mileage on it. If it turns out to be a problem again we are going to revert this back.
**Release note**:
```release-note
NONE
```
/assign @csbell
cc @kubernetes/sig-federation-pr-reviews
Now that federation of deployments is implemented with the sync
controller, the existing e2e coverage duplicates that provided by the
crudtester integration and e2e testing.
Now that replicaset federation is implemented with the sync
controller, the old replicaset-specific e2e tests duplicate coverage
provided by crudtester integration and e2e testing.
Automatic merge from submit-queue (batch tested with PRs 47851, 47824, 47858, 46099)
Revert "[Federation] Fix federated service reconcilation issue due to addition of External…"
Reverts kubernetes/kubernetes#45798
Reverting the temporary fix as the problem is fixed in #45869.
with that fix federation also can default ExternalTrafficLocalOnly if not set.
Issue: #45812
cc @MrHohn @madhusudancs @kubernetes/sig-federation-bugs
Automatic merge from submit-queue
Federation: create loadbalancer service in tests only if test depends on it
**What this PR does / why we need it**:
Creating LoadBalancer type of service for every test case is kind of expensive and time consuming to provision. So this PR changes the test cases to use LoadBalancer type services only when necessary.
**Which issue this PR fixes** (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes#47068
**Release note**:
```release-note
NONE
```
cc @kubernetes/sig-federation-pr-reviews
/assign @madhusudancs
Automatic merge from submit-queue (batch tested with PRs 46977, 47005, 47018, 47061, 46809)
Directly grab map values instead of using loop-clause variables when setting up federated sync controller tests.
Go's loop-clause variables are allocated once and the items are copied to that variable while iterating through the loop. This means, these variables can't escape the scope since closures are bound to loop-clause variables whose value change during each iteration. Doing so would lead to undesired behavior. For more on this topic see: https://github.com/golang/go/wiki/CommonMistakes
So in order to workaround this problem in sync controller e2e tests, we iterate through the map and copy the map value to a variable inside the loop before using it in closures.
Fixes issue: #47059
**Release note**:
```release-note
NONE
```
/assign @marun @shashidharatd @perotinus
cc @csbell @nikhiljindal
/sig federation
Go's loop-clause variables are allocated once and the items are copied
to that variable while iterating through the loop. This means, these
variables can't escape the scope since closures are bound to loop-clause
variables whose value change during each iteration. Doing so would lead
to undesired behavior. For more on this topic see:
https://github.com/golang/go/wiki/CommonMistakes
So in order to workaround this problem in sync controller e2e tests, we
iterate through the map and copy the map value to a variable inside the
loop before using it in closures.
Fixes issue: #47059
Federation of daemonset and secret types is now implemented by the
sync controller, and e2e testing for each type is provided via crud
lifecycle e2e tests. This renders the legacy e2e tests for these types
redundant, and this commit removes those tests.
Automatic merge from submit-queue (batch tested with PRs 45860, 45119, 44525, 45625, 44403)
[Federation] Move annotations and related parsing code as common code
This PR moves some code, which was duplicate, around as common code.
Changes the names of structures used for annotations to common names.
s/FederatedReplicaSetPreferences/ReplicaAllocationPreferences/
s/ClusterReplicaSetPreferences/PerClusterPreferences/
This can be reused in job controller and hpa controller code.
**Special notes for your reviewer**:
@kubernetes/sig-federation-misc
**Release note**:
```NONE
```
Automatic merge from submit-queue (batch tested with PRs 45052, 44983, 41254)
[Federation][e2e] Add 2 new testcases to federation service e2e
**What this PR does / why we need it**:
Add 2 new test cases for federation services
- Federation service updation should update clustered service
- Federation service controller should recreate service shard in cluster, if it gets deleted.
**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes#27623, #35827
Handles one of the tasks discussed in #41253
**Special notes for your reviewer**:
**Release note**:
`NONE`
cc @kubernetes/sig-federation-bugs, @nikhiljindal @madhusudancs