15m is enough for Cluster Autoscaler to remove empty nodes, so we need
to break them sooner than that. Instead, wait 15m after breaking them to
ensure Cluster Autoscaler will consider them as unready instead of still
starting.
The profile gatherer was removed in
https://github.com/kubernetes/kubernetes/pull/85304, so those options
have been unused since then and can therefore be removed.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The e2e test "should have Endpoints and EndpointSlices pointing to
the API Server Service" was verifying the current endpoints
reconciler implementation on the apiservers. However, users may
disable the endpoint reconciler and create their own.
This e2e test is also a conformance test, so we should test the
behaviour and not the implementation details. The test now verifies
that a kubernetes.default service exists, and that an Endpoints object
and EndpointSlice objects referencing that service exist and are equivalent.
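As a rough illustration of what "equivalent" can mean here, the sketch
below compares the (ip, port) pairs found in an Endpoints object with those
found in a list of EndpointSlice objects. It is not the e2e test's actual
code. The objects are plain dicts shaped like the API JSON, the function
names are made up for this example, and readiness conditions and protocols
are deliberately ignored.

```python
def endpoints_addresses(endpoints):
    """Collect (ip, port) pairs from a v1 Endpoints object given as a dict."""
    pairs = set()
    for subset in endpoints.get("subsets") or []:
        for addr in subset.get("addresses") or []:
            for port in subset.get("ports") or []:
                pairs.add((addr["ip"], port["port"]))
    return pairs


def slice_addresses(slices):
    """Collect (ip, port) pairs from a list of EndpointSlice dicts."""
    pairs = set()
    for endpoint_slice in slices:
        for endpoint in endpoint_slice.get("endpoints") or []:
            for ip in endpoint.get("addresses") or []:
                for port in endpoint_slice.get("ports") or []:
                    pairs.add((ip, port["port"]))
    return pairs


def equivalent(endpoints, slices):
    """True when both views expose exactly the same (ip, port) pairs."""
    return endpoints_addresses(endpoints) == slice_addresses(slices)
```

Comparing sets keeps the check independent of which component wrote the
objects and of the order in which addresses appear.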
The configuration is deprecated and targeted for removal in v1.23. Test
cases have been changed as well.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Prior to 1.22 a user could change NodePort values within a service
during an update, and the apiserver would allocate values for any that
were not specified.
Consider a YAML like:
```
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: NodePort
  ports:
  - name: p
    port: 80
  - name: q
    port: 81
  selector:
    app: foo
```
When this is created, nodeport values will be allocated for each port.
Something like:
```
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  clusterIP: 10.0.149.11
  type: NodePort
  ports:
  - name: p
    nodePort: 30872
    port: 80
    protocol: TCP
    targetPort: 9376
  - name: q
    nodePort: 31310
    port: 81
    protocol: TCP
    targetPort: 81
  selector:
    app: foo
```
If the user PUTs (kubectl replace) the original YAML, we would see that
`.nodePort = 0`, and allocate new ports. This was ugly at best.
In 1.22 we fixed this to not allocate new values if we still had the old
values, but instead re-assign them. Net new ports would still be seen
as `.nodePort = 0` and so new allocations would be made.
This broke a corner case as follows:
Prior to 1.22, the user could PUT this YAML:
```
apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: NodePort
  ports:
  - name: p
    nodePort: 31310 # note this is the `q` value
    port: 80
  - name: q
    # note this nodePort is not specified
    port: 81
  selector:
    app: foo
```
The `p` port would take the `q` port's value. The `q` port would be
seen as `.nodePort = 0` and a new value allocated. In 1.22 this results
in an error (duplicate value in `p` and `q`).
This is VERY minor but it is an API regression, which we try to avoid,
and the fix is not too horrible.
This commit adds more robust testing of this logic.
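The intended behaviour can be modeled roughly as below. This is a sketch
under assumptions, not the apiserver code: ports are matched by name for
simplicity, `assign_node_ports` and `first_free` are hypothetical helpers,
and the allocator just picks the lowest free value in the default
30000-32767 range. Duplicate explicit values are not validated here.

```python
def first_free(used, low=30000, high=32767):
    """Hypothetical allocator: lowest port in [low, high] not in `used`."""
    for candidate in range(low, high + 1):
        if candidate not in used:
            return candidate
    raise RuntimeError("no free node port")


def assign_node_ports(old_ports, new_ports, allocate=first_free):
    """Fill in nodePort for every port of an updated service.

    old_ports / new_ports are lists of dicts with a 'name' and an
    optional 'nodePort' (missing or 0 means "not specified").
    """
    old_by_name = {p["name"]: p.get("nodePort", 0) for p in old_ports}
    result = [dict(p) for p in new_ports]

    # Pass 1: values the user asked for explicitly always win.
    used = {p["nodePort"] for p in result if p.get("nodePort")}

    # Pass 2: unspecified ports keep their old value if it is still free.
    for p in result:
        if p.get("nodePort"):
            continue
        previous = old_by_name.get(p["name"], 0)
        if previous and previous not in used:
            p["nodePort"] = previous
            used.add(previous)

    # Pass 3: anything still unset gets a fresh allocation.
    for p in result:
        if not p.get("nodePort"):
            p["nodePort"] = allocate(used)
            used.add(p["nodePort"])
    return result
```

With the corner case above, `p` keeps the explicitly requested 31310 and
`q` receives a newly allocated value instead of triggering a duplicate
error. A plain replace with no nodePort values keeps both old values.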
These events are currently emitted for a pod using a generic ephemeral volume:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3s    default-scheduler  0/1 nodes are available: 1 persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume" not found.
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
The one about "persistentvolumeclaim not found" is potentially confusing. It
occurs because the scheduler typically checks the pod before the ephemeral
volume controller has had a chance to create the PVC.
This is a bit easier to understand:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  4s    default-scheduler  0/1 nodes are available: 1 waiting for ephemeral volume controller to create the persistentvolumeclaim "my-csi-app-inline-volume-my-csi-volume".
  Warning  FailedScheduling  2s    default-scheduler  0/1 nodes are available: 1 pod has unbound immediate PersistentVolumeClaims.
The Container Images for Windows Server 2022 have been published, and we can
start adding jobs for them.
The ltsc2022-based images have been built and promoted with these image versions.
This PR adds the GA AnnStorageProvisioner annotation to
a PVC if the PVC requires dynamic provisioning. It
also deprecates the beta AnnStorageProvisioner annotation,
which will be removed in a later release.
The Makefile scripts create a variable with all the go files
that are part of the Kubernetes source tree, including staging.
As of today, this variable has a size of < 100kb:
    wc .make/all_go_dirs.mk
    2326 2326 98905 .make/all_go_dirs.mk
This variable is passed as an argument in the Makefiles, where it
is expanded. On Linux, there is a limit on the maximum size of
a single argument, MAX_ARG_STRLEN.
If an argument goes above 128k, you get a nice:
    execvp: /usr/bin/env: Argument list too long
If you, for whatever reason, run go mod vendor inside the
hack/tools folder, these files will be added to the variable
and most probably you'll go above the limit and get that error.
Then, you'll learn a lot about Makefiles, shell expansion, strace,
execve, ARG_MAX and MAX_ARG_STRLEN, until you realize what
the real problem is :).
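For reference, a quick way to reason about the limit. This is a sketch
that assumes 4 KiB pages, so that MAX_ARG_STRLEN (32 pages) is 131072
bytes, and assumes the kernel counts the terminating NUL byte as part of
the string. The helper name is made up for this example.

```python
# Assumption: 4 KiB pages; Linux defines MAX_ARG_STRLEN as 32 pages.
PAGE_SIZE = 4096
MAX_ARG_STRLEN = 32 * PAGE_SIZE  # 131072 bytes


def fits_in_single_argument(value: str) -> bool:
    """Return True if `value` can be passed as one argv/envp string.

    The +1 accounts for the terminating NUL byte, which is assumed to
    count toward the limit.
    """
    return len(value.encode()) + 1 <= MAX_ARG_STRLEN
```

The current 98905 byte variable fits, but there is not much headroom
before a few extra vendored directories push it over the limit.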
The PR https://github.com/kubernetes/kubernetes/pull/104575 introduces
some intermediate types which make the machine with 32GiB of memory kill
the typecheck process. To resolve that issue and make the test more robust,
we now reduce the number of typechecks run in parallel to `2`.
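The general idea of capping parallelism to bound peak memory can be
sketched as follows. This is illustrative only and not the typecheck
tool's code: `run_limited` is a hypothetical helper, and each task stands
in for one typecheck run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, parallel=2):
    """Run zero-argument callables with at most `parallel` in flight.

    Results are returned in the same order as `tasks`.
    """
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

Lowering `parallel` trades wall-clock time for a lower peak memory
footprint, since fewer memory-hungry runs are alive at once.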
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This partially reverts commit 39cfe232325d66bcdbc935af7aaf7022562e7010 and PR #98057.
The original problem was caused by not using {end} at the end of the range.
Prior to this change, the pod was not getting scheduled on the node as
we don't have a running scheduler in e2e_node. PodClient solves this
problem by manually assigning the pod to the node.
The current GPU installer was built in 2017, from source that no longer
exists in Kubernetes ([adding commit][1]). The image was built on 2017-06-13.
Unfortunately, this installer no longer appears to work. When debugging
on the same node type as used by test-infra, it failed to build the
driver as the kernel sha was no longer available.
This led to needing to find a new way to install GPUs. The smallest
logical change was switching to [cos-gpu-installer][2]. There is a newer
version of this available on [googlesource][3] that
I have not yet tested as it's not clear what the state of the project
is, as I couldn't find docs outside of the source itself.
We install things to the same location as previously to avoid needing
extra downstream changes. There are a couple of weird issues here
however, like needing to run the container twice to correctly update the
LD Cache.
[1]: 1e77594958/cluster/gce/gci/nvidia-gpus/Dockerfile
[2]: https://github.com/GoogleCloudPlatform/cos-gpu-installer
[3]: https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/