The initial work on this had been merged before
this PR but was not yet in use. This change simplifies
the implementation and adds some basic type
sanity checking.
Co-authored-by: Jiahui Feng <jhf@google.com>
Make the following changes:
- When dryrunning, if the given kubeconfig does not exist,
create a DryRun object without a real client. This means only
a fake client will be used for all actions.
- Skip the preflight check if manifests exist during dryrun.
Print "would ..." instead.
- Add new reactors that handle objects during upgrade.
- Add unit tests for new reactors.
- Print a message on "upgrade node" that this is not a control-plane node
if the apiserver manifest is missing.
- Add a new function GetNodeName() that uses 3 different methods
for fetching the node name (see the sketch after this list). Solves a
long-standing issue where we only used the cert in kubelet.conf for
determining the node name.
- Various other minor fixes.
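
For illustration, the node-name fallback could look roughly like the sketch
below. The function names, the exact order of the three methods, and the
certificate-parsing details are assumptions made for this sketch, not
necessarily what kubeadm's GetNodeName() does:

    package nodename

    import (
        "crypto/x509"
        "encoding/pem"
        "fmt"
        "os"
        "strings"

        "k8s.io/client-go/tools/clientcmd"
    )

    // getNodeNameFromKubeletConfig extracts the node name from the
    // "system:node:<name>" Common Name of the client certificate referenced
    // by kubelet.conf. This used to be the only method.
    func getNodeNameFromKubeletConfig(kubeconfigPath string) (string, error) {
        config, err := clientcmd.LoadFromFile(kubeconfigPath)
        if err != nil {
            return "", err
        }
        for _, authInfo := range config.AuthInfos {
            certData := authInfo.ClientCertificateData
            if len(certData) == 0 && authInfo.ClientCertificate != "" {
                if certData, err = os.ReadFile(authInfo.ClientCertificate); err != nil {
                    return "", err
                }
            }
            if len(certData) == 0 {
                continue
            }
            block, _ := pem.Decode(certData)
            if block == nil {
                return "", fmt.Errorf("no PEM data in client certificate")
            }
            cert, err := x509.ParseCertificate(block.Bytes)
            if err != nil {
                return "", err
            }
            if strings.HasPrefix(cert.Subject.CommonName, "system:node:") {
                return strings.TrimPrefix(cert.Subject.CommonName, "system:node:"), nil
            }
        }
        return "", fmt.Errorf("could not find a node name in %s", kubeconfigPath)
    }

    // GetNodeName combines several sources instead of relying only on the
    // kubelet.conf client certificate.
    func GetNodeName(kubeletConfPath, hostnameOverride string) (string, error) {
        // 1. An explicit override always wins.
        if hostnameOverride != "" {
            return strings.ToLower(hostnameOverride), nil
        }
        // 2. The client certificate in kubelet.conf, if present and parseable.
        if name, err := getNodeNameFromKubeletConfig(kubeletConfPath); err == nil {
            return name, nil
        }
        // 3. The OS hostname as the last resort.
        hostname, err := os.Hostname()
        if err != nil {
            return "", fmt.Errorf("could not determine node name: %w", err)
        }
        return strings.ToLower(hostname), nil
    }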
The reason for the previous behavior was to avoid the unnecessary performance
overhead of copying when the caller already provides a "fresh" copy and doesn't
touch it afterwards.
But this is something that DRA driver developers can easily get wrong, so it's
better to be safe than sorry.
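
As a minimal, self-contained sketch of the "better safe than sorry" approach,
with hypothetical stand-in types rather than the real DriverResources and
controller types:

    package main

    import "fmt"

    // Spec stands in for the driver-provided resource information
    // (hypothetical type, not the real DriverResources).
    type Spec struct {
        Devices []string
    }

    // DeepCopy returns an independent copy of the spec.
    func (s *Spec) DeepCopy() *Spec {
        return &Spec{Devices: append([]string(nil), s.Devices...)}
    }

    // Controller is a stand-in for the ResourceSlice controller.
    type Controller struct {
        resources *Spec
    }

    // Update stores a defensive copy, so later mutations by the caller
    // (the DRA driver) cannot change the controller's internal state.
    func (c *Controller) Update(resources *Spec) {
        c.resources = resources.DeepCopy()
    }

    func main() {
        c := &Controller{}
        spec := &Spec{Devices: []string{"gpu-0"}}
        c.Update(spec)
        spec.Devices[0] = "mutated"         // caller keeps mutating its own copy
        fmt.Println(c.resources.Devices[0]) // still prints "gpu-0"
    }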
When deleting a bunch of slices, the delete events queue the pool while it is
being synced. It then gets synced again immediately, while the deleted slices
are still being removed from the informer cache. The obsolete slices in the
cache cause the controller to delete them again, which fails with a "not
found" error. That error is ignored, but it still causes extra API calls.
Now syncing gets delayed by a configurable duration (default: 30 seconds) so
the informer cache is more likely to be up-to-date when the pool gets synced
again.
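
The delayed re-queuing can be sketched with a standard client-go workqueue;
the field and function names here are illustrative, not the actual controller
code:

    package example

    import (
        "time"

        "k8s.io/client-go/util/workqueue"
    )

    type controller struct {
        queue     workqueue.TypedRateLimitingInterface[string]
        syncDelay time.Duration // configurable, default 30 * time.Second
    }

    func newController(syncDelay time.Duration) *controller {
        return &controller{
            queue: workqueue.NewTypedRateLimitingQueue(
                workqueue.DefaultTypedControllerRateLimiter[string](),
            ),
            syncDelay: syncDelay,
        }
    }

    // onSliceDelete is called by the informer's delete handler. Instead of
    // re-queuing the pool immediately (while the informer cache may still
    // contain the deleted slice), the next sync is delayed.
    func (c *controller) onSliceDelete(poolName string) {
        c.queue.AddAfter(poolName, c.syncDelay)
    }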
This test covers creating and deleting 100 large ResourceSlices. It is strict
about using the minimum number of API calls.
The test also verifies that creating large slices works.
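
One way to enforce that strictness in a unit test (illustrative only, not the
actual test code; the names and expected numbers are made up) is to count
ResourceSlice API calls with a reactor on a fake clientset:

    package example

    import (
        "testing"

        "k8s.io/apimachinery/pkg/runtime"
        "k8s.io/client-go/kubernetes/fake"
        k8stesting "k8s.io/client-go/testing"
    )

    func TestMinimalAPICalls(t *testing.T) {
        client := fake.NewSimpleClientset()
        numCalls := 0
        client.PrependReactor("*", "resourceslices", func(action k8stesting.Action) (bool, runtime.Object, error) {
            numCalls++
            return false, nil, nil // fall through to the default reactor
        })

        // ... run the controller against the fake client, let it create and
        // then delete 100 slices ...

        if maxCalls := 200; numCalls > maxCalls { // e.g. 100 creates + 100 deletes
            t.Errorf("expected at most %d ResourceSlice API calls, got %d", maxCalls, numCalls)
        }
    }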
This avoids the problem of creating an additional slice when the one from the
previous sync is not in the informer cache yet. It also avoids false
attempts to delete slices which were updated in the previous sync. Such
attempts would fail the ResourceVersion precondition check, but would
still cause work for the apiserver.
It's better to verify UID and ResourceVersion of the ResourceSlice that we want
to delete. If anything changed, the decision to remove it might not apply
anymore and we need to check again.
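
A sketch of such a precondition-protected delete (the resource.k8s.io API
version, v1beta1 here, depends on the Kubernetes release):

    package example

    import (
        "context"

        resourceapi "k8s.io/api/resource/v1beta1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // deleteObsoleteSlice removes a slice only if it still has the exact UID
    // and ResourceVersion that the decision was based on. If the slice changed
    // in the meantime, the apiserver rejects the delete and the pool has to be
    // checked again with fresher data.
    func deleteObsoleteSlice(ctx context.Context, client kubernetes.Interface, slice *resourceapi.ResourceSlice) error {
        return client.ResourceV1beta1().ResourceSlices().Delete(ctx, slice.Name, metav1.DeleteOptions{
            Preconditions: &metav1.Preconditions{
                UID:             &slice.UID,
                ResourceVersion: &slice.ResourceVersion,
            },
        })
    }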
The ResourceSlice controller (theoretically) might end up creating too many
slices if it syncs again before its informer cache has been updated. This could
cause the scheduler to allocate a device from a duplicated slice. The duplicates
should be identical, but it's still better to fail and wait until the controller
removes the redundant slice.
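
How such a check could look on the scheduler side (illustrative sketch, not
the actual allocator code):

    package example

    import (
        "fmt"

        resourceapi "k8s.io/api/resource/v1beta1"
    )

    // checkForDuplicates fails when the same device name shows up in more than
    // one slice of a pool instead of silently allocating from one of the copies.
    func checkForDuplicates(slices []*resourceapi.ResourceSlice) error {
        seen := map[string]string{} // device name -> slice name
        for _, slice := range slices {
            for _, device := range slice.Spec.Devices {
                if otherSlice, ok := seen[device.Name]; ok {
                    return fmt.Errorf("device %q appears in both slice %q and slice %q", device.Name, otherSlice, slice.Name)
                }
                seen[device.Name] = slice.Name
            }
        }
        return nil
    }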
The driver determines what each slice is meant to look like. The controller
then ensures that only those slices exist. It reuses existing slices where the
set of devices, as identified by their names, is the same as in some desired
slice. Such slices get updated to match the desired state.
In other words, attributes and the order of devices can be changed by updating
an existing slice, but adding or removing a device is done by deleting and
re-creating slices.
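
The reuse rule can be sketched as a comparison of device-name sets; the helper
below is illustrative, not the controller's actual matching code:

    package example

    import resourceapi "k8s.io/api/resource/v1beta1"

    // sameDeviceNames reports whether an existing slice can be reused for a
    // desired slice: both must contain exactly the same set of device names.
    // Attributes and device order may differ; those are fixed with an update
    // instead of delete + re-create.
    func sameDeviceNames(existing, desired []resourceapi.Device) bool {
        if len(existing) != len(desired) {
            return false
        }
        names := make(map[string]bool, len(existing))
        for _, device := range existing {
            names[device.Name] = true
        }
        for _, device := range desired {
            if !names[device.Name] {
                return false
            }
        }
        return true
    }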
Co-authored-by: googs1025 <googs1025@gmail.com>
The test update is partly based on
https://github.com/kubernetes/kubernetes/pull/127645.