We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.
Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The same change was already made on the csi-driver-host-path master branch, but
has not been released yet because csi-snapshotter v5.0.0 itself was not ready.
We need this update in k/k because some canary jobs already use the new
snapshotter sidecar, which causes permission issues.
This is an automatic update of the testing manifests that mirrors the v1.7.3
release. All of these changes were created with
`test/e2e/testing-manifests/storage-csi$ ./update-hostpath.sh v1.7.3`
This is a first step towards removing the mock CSI driver completely from
e2e testing in favor of the hostpath plugin. With the recent hostpath plugin
changes (PR #260, #269), it supports all the features supported by the mock
CSI driver.
Using the hostpath plugin for testing also covers CSI persistence feature
use cases.
the test TestHTTP1DoNotReuseRequestAfterTimeout has to wait for a
request to time out in order to assert that subsequent requests do not
reuse the TCP connection.
It seems that the current value of 100ms causes issues in some CI
environments, and bumping the timeout appears to resolve this flakiness.
We can bump the timeout value because it is very low compared to real
scenarios, and the bump still keeps it in the millisecond range.
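To make the scenario concrete, here is a minimal, self-contained sketch (not the actual test) of how one can assert that a timed-out request's connection is not reused, using net/http/httptrace; the server behavior, timeout values, and names are illustrative assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync/atomic"
	"time"
)

func main() {
	// Server that stalls the first request long enough for the client to
	// time out; subsequent requests are answered immediately.
	var first atomic.Bool
	first.Store(true)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if first.CompareAndSwap(true, false) {
			time.Sleep(2 * time.Second)
		}
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	// Illustrative timeout: low, but well above the flaky 100ms value.
	client := &http.Client{Timeout: 500 * time.Millisecond}

	// First request: expected to time out and poison its TCP connection.
	if _, err := client.Get(srv.URL); err == nil {
		panic("expected a timeout error")
	}

	// Second request: use httptrace to observe whether the connection
	// handed to this request was reused from the pool.
	reused := false
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
	}
	ctx := httptrace.WithClientTrace(context.Background(), trace)
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		panic(err)
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("connection reused:", reused) // expected: false
}
```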
Fixed an issue in plugin.go where valid plugin events would be skipped if any plugin had an error. Because the events fired only once, a valid plugin would never be installed if another plugin was in an error state.
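As an illustration of the failure mode, here is a hypothetical sketch (not the actual plugin.go code; the types and helpers are made up): aborting the event loop on the first error drops the one-shot events of every remaining plugin, so the fix is to record the error and keep going.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginEvent and installPlugin are hypothetical stand-ins for the real types.
type pluginEvent struct{ name string }

func installPlugin(ev pluginEvent) error {
	if ev.name == "broken" {
		return fmt.Errorf("plugin %s failed", ev.name)
	}
	fmt.Println("installed", ev.name)
	return nil
}

// handlePluginEvents records errors and continues, so a single broken plugin
// no longer prevents valid plugins from being installed.
func handlePluginEvents(events []pluginEvent) error {
	var errs []error
	for _, ev := range events {
		if err := installPlugin(ev); err != nil {
			// Before the fix: returning here skipped all remaining events,
			// and since the events fire only once, those plugins were lost.
			errs = append(errs, err)
			continue
		}
	}
	return errors.Join(errs...)
}

func main() {
	_ = handlePluginEvents([]pluginEvent{{"broken"}, {"valid"}})
}
```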
Writing to a file first truncates it and only then writes the new contents. Under disk space pressure this may leave the file empty. To mitigate this, we first create a file containing the new version and then move it into place of the old one (making sure disk space is available before the old file is replaced).
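A minimal sketch of this write-then-rename pattern in Go, with illustrative names (the real code may differ): the temporary file is created in the same directory so the final rename stays on one filesystem and is atomic.

```go
package main

import (
	"os"
	"path/filepath"
)

func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename

	// If the disk is full, the write (or sync) fails here and the original
	// file is left untouched instead of being truncated to empty.
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	if err := os.Chmod(tmp.Name(), perm); err != nil {
		return err
	}
	// Rename is atomic on POSIX filesystems: readers see either the old or
	// the new file, never a partially written one.
	return os.Rename(tmp.Name(), path)
}

func main() {
	if err := writeFileAtomic("example.txt", []byte("hello\n"), 0o644); err != nil {
		panic(err)
	}
}
```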
Without this fix, the algorithm could decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs left to allocate. Moreover, it only considered
allocating remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could allocate just 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm ran to completion.
The updated algorithm now omits any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have
been accounted for and allocated. This ensures that we will not hit an
accounting error later on, because we explicitly remove CPUs from the remainder
set until there are none left.
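A simplified sketch of the updated remainder loop (not the actual cpumanager code; the topology and names are illustrative): nodes with 0 free CPUs are dropped first, then every combination size from 1 to len(remainderSet) is considered.

```go
package main

import (
	"fmt"
	"sort"
)

// combinations returns all k-element subsets of nodes.
func combinations(nodes []int, k int) [][]int {
	var out [][]int
	var rec func(start int, cur []int)
	rec = func(start int, cur []int) {
		if len(cur) == k {
			out = append(out, append([]int(nil), cur...))
			return
		}
		for i := start; i < len(nodes); i++ {
			rec(i+1, append(cur, nodes[i]))
		}
	}
	rec(0, nil)
	return out
}

func main() {
	freeCPUs := map[int]int{0: 0, 1: 2, 2: 4} // free CPUs per NUMA node (illustrative)

	// Omit NUMA nodes with 0 CPUs left before building the remainder set.
	var remainderSet []int
	for node, free := range freeCPUs {
		if free > 0 {
			remainderSet = append(remainderSet, node)
		}
	}
	sort.Ints(remainderSet)

	// Consider *all* combination sizes, not just len(remainderSet).
	for k := 1; k <= len(remainderSet); k++ {
		for _, combo := range combinations(remainderSet, k) {
			fmt.Println("candidate remainder set:", combo)
			// ...score each candidate for balance and pick the best one.
		}
	}
}
```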
A follow-on commit adds a set of unit tests that fail before these
changes, but succeed after them.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value, however, could result in thinking you need more NUMA
nodes than you actually do to satisfy a request.
By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but it's better to start
with fewer and move up than to start with too many and miss out on a better
option.
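As a worked illustration (a sketch, assuming a homogeneous machine where every NUMA node has the same total CPU count):

```go
package main

import "fmt"

// minNUMANodesRequired computes the lower bound using *total* capacity per
// node rather than an average over currently available CPUs.
func minNUMANodesRequired(request, cpusPerNode int) int {
	return (request + cpusPerNode - 1) / cpusPerNode // ceil(request / cpusPerNode)
}

func main() {
	// e.g. 4 NUMA nodes with 16 CPUs each: a request for 40 CPUs needs at
	// least ceil(40/16) = 3 nodes, regardless of current availability.
	fmt.Println(minNUMANodesRequired(40, 16)) // 3
}
```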
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).
In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).
In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have an balanced distribution of CPUs across all NUMA nodes. This
points to allocating from Socket 1 if the only other CPU allocated has been
done on Socket 0.
To allow CPU allocations to be packed onto full cores, one can perform them
with the "distributed" algorithm and a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.
Signed-off-by: Kevin Klues <kklues@nvidia.com>