kubernetes

mirror of https://github.com/k3s-io/kubernetes.git synced 2025-07-24 12:15:52 +00:00

Author	SHA1	Message	Date
Kevin Klues	e284c74d93	Add unit test for CPUManager distribute NUMA algorithm verifying fixes Before Change: "test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request" "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345 "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345 "bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3] --- FAIL: TestTakeByTopologyNUMADistributed (0.01s) --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s) cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1] After Change: "test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request" "combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732 "bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3] SUCCESS Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 20:45:37 +00:00
Cheng Xing	4de40e90d4	DelegateFSGroupToCSIDriver e2e: skip tests with chgrp	2021-11-24 11:41:53 -08:00
Kevin Klues	031f11513d	Fix accounting bug in CPUManager distribute NUMA policy Without this fix, the algorithm may decide to allocate "remainder" CPUs from a NUMA node that has no more CPUs to allocate. Moreover, it was only considering allocation of remainder CPUs from NUMA nodes such that each NUMA node in the remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these two issues in play, one could end up with an accounting error where not enough CPUs were allocated by the time the algorithm runs to completion. The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from the set of NUMA nodes considered for allocating remainder CPUs. Additionally, we now consider all combinations of nodes from the remainder set of size 1..len(remainderSet). This allows us to find a better solution if allocating CPUs from a smaller set leads to a more balanced allocation. Finally, we loop through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have been accounted for and allocated. This ensures that we will not hit an accounting error later on because we explicitly remove CPUs from the remainder set until there are none left. A follow-on commit adds a set of unit tests that will fail before these changes, but succeed after them. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 19:18:11 +00:00
Kevin Klues	5317a2e2ac	Fix error handling in CPUManager distribute NUMA tests Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:31 +00:00
Kevin Klues	dc4430b663	Add a sum() helper to the CPUManager cpuassignment logic Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:29 +00:00
Kevin Klues	cfacc22459	Allow the map.Values() function in the CPUManager to take a set of keys Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:28 +00:00
Kevin Klues	a160d9a8cd	Fix CPUManager algo to calculate min NUMA nodes needed for distribution Previously the algorithm was too restrictive because it tried to calculate the minimum based on the number of available NUMA nodes and the number of available CPUs on those NUMA nodes. Since there was no (easy) way to tell how many CPUs an individual NUMA node happened to have, the average across them was used. Using this value, however, could result in thinking you need more NUMA nodes to possibly satisfy a request than you actually do. By using the total number of NUMA nodes and CPUs per NUMA node, we can get the true minimum number of nodes required to satisfy a request. For a given "current" allocation this may not be the true minimum, but it's better to start with fewer and move up than to start with too many and miss out on a better option. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:26 +00:00
Kevin Klues	209cd20548	Fix unit tests following bug fix in CPUManager for map functions (2/2) Now that the algorithm for balancing CPU distributions across NUMA nodes is correct, this test actually behaves differently for the "packed" vs. "distributed" allocation algorithms (as it should). In the "packed" case we need to ensure that CPUs are allocated such that they are packed onto cores. Since one CPU is already allocated from a core on NUMA node 0, we want the next CPU to be its hyperthreaded pair (even though the first available CPU id is on Socket 1). In the "distributed" case, however, we want to ensure CPUs are allocated such that we have a balanced distribution of CPUs across all NUMA nodes. This points to allocating from Socket 1 if the only other CPU allocated has been done on Socket 0. To allow CPU allocations to be packed onto full cores, one can allocate them from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of hyperthreads per core (in this case 2). We added an explicit test case for this, demonstrating that we get the same result as the "packed" algorithm does, even though the "distributed" algorithm is in use. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:24 +00:00
Kevin Klues	67f719cb1d	Fix unit tests following bug fix in CPUManager for map functions (1/2) This fixes two related tests to better test our "balanced" distribution algorithm. The first test originally provided an input with the following number of CPUs available on each NUMA node: Node 0: 16 Node 1: 20 Node 2: 20 Node 3: 20 It then attempted to distribute 48 CPUs across them with an expectation that each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node 0 with no more CPUs in the end). This would have resulted in the following number of CPUs on each node: Node 0: 0 Node 1: 4 Node 2: 4 Node 3: 20 Which results in a standard deviation of 7.6811. However, a more balanced solution would actually be to pull 16 CPUs from NUMA nodes 1, 2, and 3, and leave 0 untouched, i.e.: Node 0: 16 Node 1: 4 Node 2: 4 Node 3: 4 Which results in a standard deviation of 5.1961524227066. To fix this test we changed the original number of available CPUs to start with 4 fewer CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.: Node 0: 18 Node 1: 20 Node 2: 20 Node 3: 16 So that we end up with a result of: Node 0: 2 Node 1: 4 Node 2: 4 Node 3: 16 Which pulls the CPUs from where we want and results in a standard deviation of 5.5452. For the second test, we simply reverse the number of CPUs available for Nodes 0 and 3 as: Node 0: 16 Node 1: 20 Node 2: 20 Node 3: 18 Which forces the allocation to happen just as it did for the first test, except now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:23 +00:00
Kevin Klues	4008ea0b4c	Fix bug in CPUManager map.Keys() and map.Values() implementations Previously these would return lists that were too long because we appended to pre-initialized lists with a specific size. Since the primary place these functions are used is in the mean and standard deviation calculations for the NUMA distribution algorithm, it meant that the results of these calculations were often incorrect. As a result, some of the unit tests we have are actually incorrect (because the results we expect do not actually produce the best balanced distribution of CPUs across all NUMA nodes for the input provided). These tests will be patched up in subsequent commits. Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:21 +00:00
Kevin Klues	446c58e0e7	Ensure we balance across all NUMA nodes in NUMA distribution algo Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:19 +00:00
Kevin Klues	c8559bc43e	Short-circuit CPUManager distribute NUMA algo for unusable cpuGroupSize Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:16 +00:00
Kevin Klues	b28c1392d7	Round the CPUManager mean and stddev calculations to the nearest 1000th Signed-off-by: Kevin Klues <kklues@nvidia.com>	2021-11-24 16:51:13 +00:00
Kubernetes Prow Robot	0d3f2ca371	Merge pull request #106657 from liggitt/openapiv3 Unversion and normalize openapi v3 fixtures	2021-11-24 08:36:20 -08:00
Jordan Liggitt	88ab0d03b7	Revert "update expected ordering" This reverts commit `fbc8ac9c96`.	2021-11-24 11:19:27 -05:00
Jordan Liggitt	ed68909177	Revert sigs.k8s.io/structured-merge-diff/v4 to v4.1.2	2021-11-24 10:32:24 -05:00
Jordan Liggitt	2588ea76ea	Regenerate openapi v3 fixtures	2021-11-24 10:03:45 -05:00
Jordan Liggitt	f30c5738ea	Unversion and normalize openapi v3 fixtures	2021-11-24 10:03:36 -05:00
Patrick Ohly	9d98c69075	api/errors: explicitly allow nil error parameters This was already possible before because the underlying errors.As supports it. But because it wasn't clear, a lot of code unnecessarily checks for nil before calling the Is* functions.	2021-11-24 08:39:58 +01:00
Anago GCB	c8c81cbfbb	CHANGELOG: Update directory for v1.23.0-rc.0 release	2021-11-24 06:19:11 +00:00
haoyun	eb673cec64	fix: klog flag redefined Signed-off-by: haoyun <yun.hao@daocloud.io>	2021-11-24 10:03:40 +08:00
Kubernetes Prow Robot	e53cf07724	Merge pull request #106611 from verult/delegate-fsgroup-disable-onrootmismatch-e2e Delegate FSGroup CSI driver e2e: verify fsgroup is passed to CSI calls	2021-11-23 17:52:20 -08:00
Kubernetes Prow Robot	c3e6b66643	Merge pull request #106533 from haircommander/summary-page-fault-test test: update major page fault values for summary test	2021-11-23 15:09:45 -08:00
Kubernetes Prow Robot	a5622f3f6e	Merge pull request #106616 from mattcary/pvc-race Clean up deep copy needed for UpdateStatefulSet	2021-11-23 09:38:17 -08:00
Matthew Cary	0e2b901762	Clean up deep copy needed for UpdateStatefulSet Change-Id: Id732358183d682d1a945cfee56f83bcaac0d7c31	2021-11-23 06:48:54 -08:00
xuweiwei	9ab5c8a36f	Fix typo depenging -> depending permssion -> permission Signed-off-by: xuweiwei <xuweiwei_yewu@cmss.chinamobile.com>	2021-11-23 16:18:13 +08:00
Kubernetes Prow Robot	e31aafc4fd	Merge pull request #106348 from endocrimes/dani/rm-gpu e2e_node: unify device tests	2021-11-22 19:46:16 -08:00
Andrea Hoffer	f5612f100e	Adding an example for kubectl plugin list	2021-11-22 21:33:06 -05:00
Cheng Xing	bca1b79728	Delegate FSGroup CSI driver e2e: verify fsgroup is passed to CSI calls using mock driver tests	2021-11-22 17:00:39 -08:00
Kubernetes Prow Robot	f572e4d5b4	Merge pull request #106518 from SergeyKanzhelev/tryProbeFix Fix the bug with GRPC probe	2021-11-22 15:38:54 -08:00
Kubernetes Prow Robot	a142f86351	Merge pull request #105764 from jlebon/pr/add-ssh-mode test/e2e_node/remote: support pure SSH mode	2021-11-22 10:53:33 -08:00
Abu Kashem	41cef06f66	add trace step for transformResponseObject	2021-11-22 13:18:02 -05:00
Jonathan Lebon	3ebd93cd02	test-e2e-node: support pure SSH mode Right now, `run_remote.go` only supports GCE instances. But actually running the tests is completely independent of GCE and could work just as well on any SSH-accessible machine. This patch adds a new `--mode` switch, which defaults to `gce` for backwards compatibility, but can be set to `ssh`. In that mode, the GCE API is not used at all, and we simply connect to the hosts given via `--hosts`. This is still better than `run_local.go` because the latter mixes build environment with test environment, which doesn't fit well with container-optimized operating systems. This is part of an effort to set up the e2e node tests on Fedora CoreOS (see https://github.com/coreos/fedora-coreos-tracker/issues/990). Patch best viewed with whitespace ignored.	2021-11-22 10:13:15 -05:00
Jonathan Lebon	e0723c1e64	test-e2e-node: add SSH_OPTIONS This allows overriding the default options.	2021-11-22 10:13:13 -05:00
Jonathan Lebon	591f4cdb77	run_remote.go: factor out prepareGceImages() Mostly a pure code move. Only changed the `klog.Fatalf` to `fmt.Errorf`. Prep for future patch.	2021-11-22 10:12:29 -05:00
Jonathan Lebon	032dbd2063	run_remote.go: move registerGceHostIP() call to testImage() I.e. don't assume that `testHost` is called on a GCE host. Prep for future patch.	2021-11-22 10:12:28 -05:00
Jonathan Lebon	36233b985b	run_remote.go: factor out registerGceHostIP() Prep for future patch.	2021-11-22 10:12:28 -05:00
Kubernetes Prow Robot	806e38aeb7	Merge pull request #106577 from liggitt/field-validation-speedup Speed up field validation tests	2021-11-22 02:07:09 -08:00
Jordan Liggitt	d4d34085e4	Clean up field validation test logs	2021-11-21 21:29:06 -05:00
Jordan Liggitt	8fa1c612fd	Speed up field validation tests	2021-11-21 21:29:06 -05:00
Amim Knabben	8b37bfec8e	Enabling kube-proxy metrics on windows kernel mode	2021-11-21 21:23:55 -03:00
Kubernetes Prow Robot	a8c9dd6274	Merge pull request #106576 from liggitt/bad-request-patch Return BadRequest for invalid large patch	2021-11-21 13:05:00 -08:00
Jordan Liggitt	2d307f47bd	Return BadRequest for invalid large patch	2021-11-21 09:13:37 -05:00
Antonio Ojea	020cf2d7aa	e2e disable node port on loadbalancers	2021-11-20 20:24:37 +01:00
Kubernetes Prow Robot	ed07515ee0	Merge pull request #106431 from Namanl2001/image-config-dir enabling runtime-config to be passed via make file for node-e2e testing purposes	2021-11-20 08:22:59 -08:00
Kubernetes Prow Robot	21d3acc787	Merge pull request #106544 from ehashman/fix-flake-restart Deflake "Kubelet should correctly account for terminated pods after restart"	2021-11-20 00:04:59 -08:00
Kubernetes Prow Robot	9a1d90165d	Merge pull request #106462 from jpbetz/cel-e2e2 Add e2e test for CEL Validation Rules	2021-11-19 22:04:59 -08:00
Kubernetes Prow Robot	823cc3cc36	Merge pull request #106563 from ehashman/more-etcd-validation Validate etcd image versions in test manifests	2021-11-19 18:09:00 -08:00
Kubernetes Prow Robot	084b28f6d5	Merge pull request #106510 from robscott/topology-ready-fix-controller Updating TopologyCache to disregard unready endpoints in calculations	2021-11-19 17:07:11 -08:00
Kubernetes Prow Robot	37ae94f9ed	Merge pull request #106507 from robscott/topology-ready-fix Updating kube-proxy to ignore unready endpoints for Topology Hints	2021-11-19 17:06:59 -08:00


105386 Commits