Merge pull request #12070 from wojtek-t/kubmark_proposal

Proposal for scalability testing infrastructure

Commit: a4b06373fa
docs/proposals/apiserver-watch.md

@@ -20,7 +20,7 @@ refer to the docs that go with that version.

 <strong>
 The latest 1.0.x release of this document can be found
-[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver_watch.md).
+[here](http://releases.k8s.io/release-1.0/docs/proposals/apiserver-watch.md).

 Documentation for other releases can be found at
 [releases.k8s.io](http://releases.k8s.io).
@@ -107,7 +107,7 @@ need to reimplement few relevant functions (probably just Watch and List).

 Moreover, this will not require any changes in other parts of the code.
 This step is about extracting the interface of tools.EtcdHelper.

-2. Create a FIFO cache with a given capacity. In its "rolling history windown"
+2. Create a FIFO cache with a given capacity. In its "rolling history window"
 we will store two things:

 - the resourceVersion of the object (being an etcdIndex)
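To make step 2 concrete, here is a minimal sketch of such a rolling-history cache, written against the proposal's description. The event type, field names, and ring-buffer layout are assumptions for illustration, not the committed design.

```go
package watchcache

import (
	"fmt"
	"sync"
)

// watchCacheEvent pairs an object with the etcd index at which it was
// observed - per the proposal, the resourceVersion of the object is the
// etcdIndex. (Illustrative names; the proposal does not fix exact types.)
type watchCacheEvent struct {
	resourceVersion uint64
	object          interface{}
}

// WatchCache is a FIFO cache with a given capacity: its "rolling history
// window" always holds the most recent events, evicting the oldest.
type WatchCache struct {
	sync.RWMutex
	events []watchCacheEvent // fixed-size ring buffer
	start  int               // position of the oldest cached event
	size   int               // number of events currently cached
}

func NewWatchCache(capacity int) *WatchCache {
	return &WatchCache{events: make([]watchCacheEvent, capacity)}
}

// Add appends an event; once the window is full, the oldest event rolls out.
func (c *WatchCache) Add(resourceVersion uint64, obj interface{}) {
	c.Lock()
	defer c.Unlock()
	pos := (c.start + c.size) % len(c.events)
	c.events[pos] = watchCacheEvent{resourceVersion, obj}
	if c.size < len(c.events) {
		c.size++
	} else {
		c.start = (c.start + 1) % len(c.events)
	}
}

// EventsSince returns all cached events newer than rv, or an error when rv
// precedes the window - the same situation as etcd's "error too old".
func (c *WatchCache) EventsSince(rv uint64) ([]watchCacheEvent, error) {
	c.RLock()
	defer c.RUnlock()
	if c.size > 0 && c.events[c.start].resourceVersion > rv+1 {
		return nil, fmt.Errorf("too old resource version: %d", rv)
	}
	var result []watchCacheEvent
	for i := 0; i < c.size; i++ {
		if e := c.events[(c.start+i)%len(c.events)]; e.resourceVersion > rv {
			result = append(result, e)
		}
	}
	return result, nil
}
```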
@@ -129,28 +129,22 @@ we will store two things:

 We may consider reusing existing structures cache.Store or cache.Indexer
 ("pkg/client/cache") but this is not a hard requirement.

-3. Create a new implementation of the EtcdHelper interface, that will internally
-have a single watch open to etcd and will store data received from etcd in the
-FIFO cache. This includes implementing registration of a new watcher that will
-start a new go-routine responsible for iterating over the cache and sending
-appropriately filtered objects to the watcher.
-
-4. Create the new implementation of the API, that will internally have a
+3. Create the new implementation of the API, that will internally have a
 single watch open to etcd and will store the data received from etcd in
 the FIFO cache - this includes implementing registration of a new watcher
 which will start a new go-routine responsible for iterating over the cache
 and sending all the objects watcher is interested in (by applying filtering
 function) to the watcher.

-5. Add a support for processing "error too old" from etcd, which will require:
+4. Add a support for processing "error too old" from etcd, which will require:
 - disconnect all the watchers
 - clear the internal cache and relist all objects from etcd
 - start accepting watchers again

-6. Enable watch in apiserver for some of the existing resource types - this
+5. Enable watch in apiserver for some of the existing resource types - this
 should require only changes at the initialization level.

-7. The next step will be to incorporate some indexing mechanism, but details
+6. The next step will be to incorporate some indexing mechanism, but details
 of it are TBD.
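Continuing the cache sketch above, watcher registration as described in the rewritten step 3 (and the surfacing of "error too old" from step 4) could look roughly like this; cacheWatcher, the channel sizes, and the filter signature are again assumptions for illustration.

```go
// Same hypothetical package as the cache sketch above.

// FilterFunc decides whether a watcher is interested in an object.
type FilterFunc func(obj interface{}) bool

// cacheWatcher is one registered watcher: a go-routine pumps filtered
// events from the cache into result, which feeds the client connection.
type cacheWatcher struct {
	input  chan watchCacheEvent // fed by the single etcd watch
	result chan watchCacheEvent // read by the API response stream
	stopCh chan struct{}
	filter FilterFunc
}

// Watch registers a new watcher: it first replays the history the watcher
// missed, then streams every new event that passes the filter.
func (c *WatchCache) Watch(resourceVersion uint64, filter FilterFunc) (*cacheWatcher, error) {
	initial, err := c.EventsSince(resourceVersion)
	if err != nil {
		// "error too old": per step 4, the server must disconnect watchers,
		// relist from etcd, and only then accept watchers again.
		return nil, err
	}
	w := &cacheWatcher{
		input:  make(chan watchCacheEvent, 10),
		result: make(chan watchCacheEvent, 10),
		stopCh: make(chan struct{}),
		filter: filter,
	}
	go func() {
		defer close(w.result)
		for _, e := range initial { // replay missed history first
			if w.filter(e.object) {
				w.result <- e
			}
		}
		for {
			select {
			case e := <-w.input: // new event from the single etcd watch
				if w.filter(e.object) {
					w.result <- e
				}
			case <-w.stopCh:
				return
			}
		}
	}()
	return w, nil
}
```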
@@ -180,5 +174,5 @@ the same time, we can introduce an additional etcd event type:

 <!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
-[]()
+[]()
 <!-- END MUNGE: GENERATED_ANALYTICS -->
docs/proposals/scalability-testing.md (new file, 105 lines)

@@ -0,0 +1,105 @@

<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING" width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest 1.0.x release of this document can be found
[here](http://releases.k8s.io/release-1.0/docs/proposals/scalability-testing.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->
## Background

We have a goal to be able to scale to 1000-node clusters by the end of 2015.
As a result, we need to be able to run some kind of regression tests and deliver
a mechanism so that developers can test their changes with respect to performance.

Ideally, we would also like to run performance tests on PRs - although it might
be impossible to run them on every single PR, we may introduce a possibility for
a reviewer to trigger them if the change has a non-obvious impact on performance
(something like "k8s-bot run scalability tests please" should be feasible).

However, running performance tests on 1000-node clusters (or even bigger ones
in the future) is a non-starter. Thus, we need some more sophisticated
infrastructure to simulate big clusters on a relatively small number of
machines and/or cores.

This document describes two approaches to tackling this problem.
Once we have a better understanding of their consequences, we may want to
decide to drop one of them, but we are not yet in that position.

## Proposal 1 - Kubmark

In this proposal we are focusing on scalability testing of master components.
We do NOT focus on node scalability - that issue should be handled separately.

Since we do not focus on node performance, we don't need a real Kubelet or
KubeProxy - in fact we don't even need to start real containers.
All we actually need is some Kubelet-like and KubeProxy-like components that
simulate the load on the apiserver that their real equivalents generate
(e.g. sending NodeStatus updates, watching for pods, watching for endpoints
(KubeProxy), etc.).

What needs to be done:

1. Determine what requests both KubeProxy and Kubelet send to the apiserver.
2. Create a KubeletSim that generates the same load on the apiserver as the
real Kubelet, but does not start any containers. In the initial version we
can assume that pods never die, so it is enough to just react to the state
changes read from the apiserver (see the sketch after this list).
TBD: Maybe we can reuse a real Kubelet for it by just injecting some "fake"
interfaces into it?
3. Similarly, create a KubeProxySim that generates the same load on the
apiserver as a real KubeProxy. Again, since we are not planning to talk to
those containers, it basically doesn't need to do anything apart from that.
TBD: Maybe we can reuse a real KubeProxy for it by just injecting some "fake"
interfaces into it?
4. Refactor the kube-up/kube-down scripts (or create new ones) to allow
starting a cluster with KubeletSim and KubeProxySim instead of the real ones,
and put a bunch of them on a single machine.
5. Create a load generator for it (probably initially it would be enough to
reuse the tests that we use in the gce-scalability suite).
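As referenced in item 2 above, here is a minimal sketch of a KubeletSim main loop. The APIClient interface and every name in it are hypothetical stand-ins for a thin wrapper over the real Kubernetes client; the point is only the shape of the load: periodic NodeStatus updates plus reactions to pod events.

```go
package kubeletsim

import (
	"log"
	"time"
)

// PodEvent is a simplified pod change notification. (Hypothetical type.)
type PodEvent struct {
	PodName string
	Deleted bool
}

// APIClient abstracts the handful of apiserver calls a real Kubelet makes.
// A real implementation would wrap the Kubernetes client; everything here
// is a hypothetical sketch, not an existing interface.
type APIClient interface {
	UpdateNodeStatus(nodeName string) error
	WatchPods(nodeName string) (<-chan PodEvent, error)
	SetPodRunning(podName string) error
}

// Run generates Kubelet-shaped load: periodic NodeStatus updates plus a pod
// watch whose events are acknowledged by marking pods Running. No containers
// are ever started, and (per the initial version) pods never die.
func Run(client APIClient, nodeName string, statusInterval time.Duration) error {
	podEvents, err := client.WatchPods(nodeName)
	if err != nil {
		return err
	}
	ticker := time.NewTicker(statusInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			if err := client.UpdateNodeStatus(nodeName); err != nil {
				log.Printf("node status update failed: %v", err)
			}
		case ev, ok := <-podEvents:
			if !ok {
				return nil // watch closed
			}
			if !ev.Deleted {
				// React to the binding by immediately reporting Running.
				if err := client.SetPodRunning(ev.PodName); err != nil {
					log.Printf("pod status update failed: %v", err)
				}
			}
		}
	}
}
```

A KubeProxySim (item 3) would have the same shape, watching services and endpoints instead of pods.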
## Proposal 2 - Oversubscribing

The other method we are proposing is to oversubscribe resources, or in essence
enable a single node to look like many separate nodes even though they reside
on a single host. This is a well-established pattern in many different cluster
managers (for more details see
http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/doc.prd/index.html).
There are a couple of different ways to accomplish this, but the most viable
method is to run privileged kubelet pods under a host's kubelet process. These
pods then register back with the master via the introspective service using
modified names so as not to collide.

Complications may currently exist around container tracking and ownership in
Docker.
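To make the name-mangling idea concrete, here is a sketch of a launcher that starts several kubelet processes on one host, each registering as a different node. The flag spellings and the master endpoint are illustrative and version-dependent, and the privileged-pod packaging described above is omitted.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	const numVirtualNodes = 10
	for i := 0; i < numVirtualNodes; i++ {
		// Each instance overrides its hostname so registrations don't collide
		// and the master sees many separate nodes on one host.
		name := fmt.Sprintf("virtual-node-%d", i)
		cmd := exec.Command("kubelet",
			"--hostname-override="+name,
			"--api-servers=http://master:8080", // illustrative flag/endpoint
		)
		if err := cmd.Start(); err != nil {
			log.Fatalf("failed to start kubelet %s: %v", name, err)
		}
		log.Printf("started kubelet as %s (pid %d)", name, cmd.Process.Pid)
	}
	select {} // keep the launcher alive while the kubelets run
}
```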
<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[]()
<!-- END MUNGE: GENERATED_ANALYTICS -->