Currently, when running node e2e it's not possible to use the ginkgo `--repeat`
flag to run the test suite multiple times. This is useful when debugging tests
and ensuring they are not flaky by re-running them several times. Currently if
using `--repeat` ginkgo flag, the 2nd run of the test will fail due to kubelet
not starting with message like:
```
Failed to start transient service unit: Unit kubelet-20221020T040841.service already exists.
```
This is because during the test startup, kubelet is started as a transient unit
file via `systemd-run`. The unit is started with the `--remain-after-exit` flag
to ensure that the unit will remain even if the kubelet is restarted. The test
suite currently uses `systemd kill` command to stop kubelet. This works fine for
stopping the kubelet, but on the second run, when `systemd-run` is used to start
systemd unit again it will fail because the unit already exists. This is because
`systemd kill` will not delete the systemd unit, only send SIGTERM signal to it.
To fix this, add `unitName` as a field to the `server` struct. When
kubelet server is constructed, set the unit name. As part of e2e test
termination, in `E2EServices.Stop()``, stop the kubelet systemd unit. By
stopping the kubelet systemd unit, systemd will delete the systemd
transient unit, allowing it to be created and started again in a
subsequent e2e run.
Signed-off-by: David Porter <david@porter.me>
The original intention was to address "frustration of end users running the e2e
suite is that they take a significant amount of time and it is difficult to
gauge progress".
But Ginkgo's output is different now than it was in Kubernetes 1.19. If users
want to see progress, then "ginkgo --progress" might provide enough
information.
Printing to os.Stdout doesn't work as intended anyway when output redirection
is enabled (the default for parallel runs) and causes these JSON snippets to
appear as "show stdout" for each failed test in a Prow job, which is
distracting.
Tests should accept a context from Ginkgo and pass it through to all functions
which may block for a longer period of time. In particular all Kubernetes API
calls through client-go should use that context. Then if a timeout occurs,
the test returns immediately because everything that it could block on will
return.
Cleanup code then needs to run in a separate Ginkgo node, typically
DeferCleanup, which ensures that it gets a separate context which has not timed
out yet.
DOCKER is otherwise used to be the command name (perhaps podman), but we were conflating DOCKER_OPTS in kube::util::ensure_docker_daemon_connectivity.
Split out docker opts.
This fixes shellcheck warning that docker is assigned an array and then a string in some scripts.