All code must use the context from Ginkgo when doing API calls or polling for a
change; otherwise the code will not return immediately when the test gets
aborted.
Every Ginkgo callback should return immediately when a timeout occurs or the
test run gets aborted manually with CTRL-C. To do that, each callback must take
a ctx parameter and pass it through to all code which might block.
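For illustration, a minimal sketch of such a callback, assuming the usual e2e
framework setup (the spec itself is made up, not part of this change):

package e2e

import (
	"context"

	"github.com/onsi/ginkgo/v2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

var _ = ginkgo.Describe("context plumbing", func() {
	f := framework.NewDefaultFramework("ctx-example")

	ginkgo.It("returns promptly when the test is aborted", func(ctx context.Context) {
		// ctx is cancelled on timeout or CTRL-C, so List returns
		// immediately instead of blocking until its own deadline.
		_, err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).List(ctx, metav1.ListOptions{})
		framework.ExpectNoError(err)
	})
})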
This is a first automated step towards that: the additional parameter was added
with
sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
$(git grep -l -e framework.ConformanceIt -e ginkgo.It )
$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')
log_test.go was left unchanged.
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters
Signed-off-by: Dave Chen <dave.chen@arm.com>
On a multi NUMA node environment, the kernel splits hugepages allocated via the
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages file equally between NUMA nodes.
That makes it harder to predict where several pods will start, because the number
of hugepages on each NUMA node depends on the number of NUMA nodes in the environment.
The memory manager test will therefore allocate hugepages on a specific NUMA node
to make the test more predictable on multi NUMA node environments.
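A minimal sketch of per-node allocation, assuming the standard sysfs layout
(the node index and page count below are illustrative only):

package main

import (
	"fmt"
	"os"
)

// allocateHugepagesOnNode requests `count` 2048kB hugepages directly on a
// single NUMA node, instead of the global nr_hugepages file which the
// kernel splits equally between nodes.
func allocateHugepagesOnNode(node, count int) error {
	path := fmt.Sprintf(
		"/sys/devices/system/node/node%d/hugepages/hugepages-2048kB/nr_hugepages",
		node)
	return os.WriteFile(path, []byte(fmt.Sprintf("%d", count)), 0644)
}

func main() {
	if err := allocateHugepagesOnNode(0, 5); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}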
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Each e2e test knows whether it wants to restart a running kubelet or a
non-running kubelet. The vast majority of the time we want to
restart a running kubelet (e.g. to change the config or to check that
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, perform some actions and only then restart it.
To accommodate both use cases, we just expose the `running` boolean
flag to the e2e tests.
Having `restartKubelet` explicitly restart a running kubelet
helps us troubleshoot e2e failures in which the kubelet
was supposed to be running while it was not; attempting a restart
in such cases only muddied the waters further, making the
troubleshooting and the eventual fix harder.
In the happy path, there is no expected change in behaviour.
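As an illustration (a sketch only; the unit name, service lookup and error
handling are simplified compared to the real helper):

import (
	"os/exec"
	"strings"

	"k8s.io/kubernetes/test/e2e/framework"
)

// restartKubelet restarts the kubelet; `running` states whether the unit
// is expected to be active beforehand, so an unexpectedly dead kubelet
// fails loudly instead of being silently restarted.
func restartKubelet(running bool) {
	out, _ := exec.Command("systemctl", "is-active", "kubelet.service").Output()
	state := strings.TrimSpace(string(out))
	if running && state != "active" {
		framework.Failf("expected kubelet to be running before the restart, state was %q", state)
	}
	err := exec.Command("systemctl", "restart", "kubelet.service").Run()
	framework.ExpectNoError(err, "failed to restart the kubelet")
}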
Signed-off-by: Francesco Romani <fromani@redhat.com>
The current test assumes that the test pod is deleted when the test
namespace is deleted. However, namespace deletion is an asynchronous
operation. The pod may still be running and holding hugepages
when the next test case creates another pod that requests
the same hugepages resources. This can cause kubelet to fail the test
pod with this kind of error:
OutOfhugepages-2Mi: Node didn't have enough resource: hugepages-2Mi
requested: 6291456, used: 6291456, capacity: 10485760
Explicitly deleting the test pod should fix this issue.
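A sketch of such an explicit, synchronous cleanup using plain client-go calls
(the helper name and timeout are illustrative, and it assumes an apimachinery
version that provides wait.PollUntilContextTimeout):

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// deletePodSync deletes the pod and blocks until it is actually gone, so
// its hugepages are released before the next test case creates a pod that
// requests the same resources.
func deletePodSync(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	if err := c.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
		return err
	}
	return wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			_, err := c.CoreV1().Pods(ns).Get(ctx, name, metav1.GetOptions{})
			if apierrors.IsNotFound(err) {
				return true, nil
			}
			return false, err
		})
}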
The server service monitors the kubelet service and restarts it
once the service is down. To avoid the kubelet being restarted twice,
we stop the kubelet service and wait until the kubelet has been
restarted and the node is ready.
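A sketch of the wait, assuming access to the node name and a client (the
polling interval and timeout are illustrative):

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForNodeReady polls the node until its Ready condition is True,
// i.e. until the supervising service has brought the kubelet back up.
func waitForNodeReady(ctx context.Context, c kubernetes.Interface, nodeName string) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			node, err := c.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
			if err != nil {
				// Tolerate transient API errors while the node recovers.
				return false, nil
			}
			for _, cond := range node.Status.Conditions {
				if cond.Type == v1.NodeReady {
					return cond.Status == v1.ConditionTrue, nil
				}
			}
			return false, nil
		})
}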
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
When the kubelet is restarted, it will now remove the resources for huge
page sizes that are no longer supported. This is required when:
- the node disables huge pages
- the default huge page size changes on older versions of Linux
  (which only support the newly set default)
- software updates change what sizes are supported (e.g. by changing
  boot parameters)
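For context, the currently supported sizes can be discovered from sysfs; a
minimal sketch of that discovery (standard Linux directory layout assumed,
parsing simplified):

package main

import (
	"fmt"
	"os"
)

// supportedHugepageSizesKB returns the huge page sizes (in kB) the kernel
// currently supports, based on the directory names under
// /sys/kernel/mm/hugepages (e.g. hugepages-2048kB).
func supportedHugepageSizesKB() ([]int, error) {
	entries, err := os.ReadDir("/sys/kernel/mm/hugepages")
	if err != nil {
		return nil, err
	}
	var sizes []int
	for _, e := range entries {
		var kb int
		if _, err := fmt.Sscanf(e.Name(), "hugepages-%dkB", &kb); err == nil {
			sizes = append(sizes, kb)
		}
	}
	return sizes, nil
}

func main() {
	sizes, err := supportedHugepageSizesKB()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("supported hugepage sizes (kB):", sizes)
}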
If requesting a substantial amount of huge page memory
(as the tests do), it is recommended to do it as early
after boot as possible, or via the kernel command line
(e.g. hugepagesz=2M hugepages=512).
All e2e test images now use multi-arch manifests, so we should stop
looking up and using images that are specific to runtime.GOARCH.
Change-Id: I5f3fd6e9a42b9fb88891c19e28a2dfcf7a14be82
The slice of strings more precisely captures the hierarchical nature of
the cgroup paths we use to represent pods and their groupings.
It also reduces the chances of passing an incorrectly formatted path
to a cgroup driver that requires a different path naming, since
explicit conversions are now always needed.
The new constructor NewCgroupName starts from an existing CgroupName,
which enforces a hierarchy where a root is always needed. It also
validates the component names to ensure the invalid characters
("/" and "_") are not in use.
A RootCgroupName for the top of the cgroup hierarchy tree is introduced.
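A condensed sketch of the resulting API (simplified; the systemd conversion
and its slice-name escaping rules are omitted):

package cm

import (
	"fmt"
	"path"
	"strings"
)

// CgroupName is the abstract, driver-independent name of a cgroup:
// one path component per element, from the root down.
type CgroupName []string

// RootCgroupName represents the top of the cgroup hierarchy.
var RootCgroupName = CgroupName([]string{})

// NewCgroupName composes a name under an existing one and rejects
// characters that would break the driver-specific encodings.
func NewCgroupName(base CgroupName, components ...string) CgroupName {
	for _, component := range components {
		if strings.Contains(component, "/") || strings.Contains(component, "_") {
			panic(fmt.Errorf("invalid character in component [%q] of CgroupName", component))
		}
	}
	return append(append(CgroupName{}, base...), components...)
}

// ToCgroupfs renders the name as a cgroupfs path, e.g. /kubepods/burstable.
// The systemd counterpart performs the slice-name escaping instead.
func (cgroupName CgroupName) ToCgroupfs() string {
	return "/" + path.Join(cgroupName...)
}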
This refactor results in a net reduction of around 30 lines of code,
mainly through the demise of ConvertCgroupNameToSystemd, which had fairly
complicated logic and was doing too many things.
There's a small TODO in a helper, updateSystemdCgroupInfo, that was
introduced to make this commit possible. That logic really belongs in
libcontainer; I'm planning to send a PR to include it there.
(The API already takes a field with that information, but that field is
only processed by the cgroupfs driver and not the systemd one; we should fix that.)
Tested by running the e2e-node tests on both Ubuntu 16.04 (with the cgroupfs
driver) and CentOS 7 (with the systemd driver).