To properly implement some e2e tests, we need to know
some basic topology facts about the system running the tests.
The bare minimum we need to know is how many PCI SRIOV devices
are attached to which NUMA node.
This way we know which core we can reserve for kube services,
and which NUMA socket we can take to test full socket reservation.
To let the tests know the PCI device topology, we use annotations
in the SRIOV device plugin ConfigMap we need anyway.
The format is
```yaml
metadata:
  annotations:
    pcidevice_node0: "2"
    pcidevice_node1: "0"
```
with one annotation per NUMA node in the system.
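
As a rough illustration of how a test could consume these annotations,
here is a minimal Go sketch. The helper name `devicesPerNUMANode` and its
signature are hypothetical and are not taken from the test code; the
sketch assumes the annotations have already been read from the ConfigMap
into a plain map and that the number of NUMA nodes is known.

```go
package main

import (
	"fmt"
	"strconv"
)

// devicesPerNUMANode is a hypothetical helper: it reads the
// pcidevice_node<N> annotations and returns the number of SRIOV devices
// attached to each NUMA node, indexed by node ID. It returns an error if
// an annotation is missing or is not a non-negative integer.
func devicesPerNUMANode(annotations map[string]string, numaNodes int) ([]int, error) {
	counts := make([]int, numaNodes)
	for node := 0; node < numaNodes; node++ {
		key := fmt.Sprintf("pcidevice_node%d", node)
		raw, ok := annotations[key]
		if !ok {
			return nil, fmt.Errorf("missing annotation %q", key)
		}
		n, err := strconv.Atoi(raw)
		if err != nil || n < 0 {
			return nil, fmt.Errorf("invalid value %q for annotation %q", raw, key)
		}
		counts[node] = n
	}
	return counts, nil
}

func main() {
	// Same values as the ConfigMap example in this message.
	annotations := map[string]string{
		"pcidevice_node0": "2",
		"pcidevice_node1": "0",
	}
	counts, err := devicesPerNUMANode(annotations, 2)
	if err != nil {
		panic(err)
	}
	// Per-node device counts, indexed by NUMA node ID.
	fmt.Println(counts)
}
```

Failing on a missing annotation, rather than defaulting to zero, keeps a
misconfigured ConfigMap from silently skewing the tests.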
Signed-off-by: Francesco Romani <fromani@redhat.com>
A negative test requests a guaranteed (gu) pod that we know the system
cannot fulfill, hence we expect rejection from the topology manager.
Unfortunately, besides the trivial case of excessive cores (requesting
more cores than a NUMA node provides), we cannot easily test the
devices, because crafting a proper pod requires detailed knowledge
of the hw topology.
Let's consider a hypothetical two-node NUMA system with two PCIe buses,
one per NUMA node, with an SRIOV device on each bus.
A proper negative test would request two SRIOV devices, which the system
can provide, but not on a single NUMA node.
Requiring for example three devices (one more than the system provides)
will lead to a different, legitimate admission error.
For these reasons we bootstrap the testing infra for the negative tests,
but we add just the simplest one.
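
The distinction between the two rejection causes described above can be
sketched in Go as follows. This is a simplified model, not the kubelet
logic: it only looks at device counts per NUMA node, ignores CPUs, and
the function name and the returned labels are hypothetical (they are not
the actual admission error messages).

```go
package main

import "fmt"

// classifyDeviceRequest is a hypothetical, simplified model of the
// single-NUMA-node policy for devices only. devicesPerNode[i] is the
// number of SRIOV devices attached to NUMA node i.
func classifyDeviceRequest(devicesPerNode []int, requested int) string {
	total := 0
	for _, n := range devicesPerNode {
		if n >= requested {
			// At least one NUMA node can satisfy the request alone.
			return "admitted"
		}
		total += n
	}
	if requested > total {
		// The system does not have enough devices at all: this is the
		// "different, legitimate admission error".
		return "rejected: insufficient devices"
	}
	// Enough devices exist, but not on a single NUMA node: this is the
	// case a proper negative test wants to trigger.
	return "rejected: topology affinity"
}

func main() {
	// Two NUMA nodes, one SRIOV device each, as in the example above.
	topology := []int{1, 1}
	for _, req := range []int{1, 2, 3} {
		fmt.Printf("request %d: %s\n", req, classifyDeviceRequest(topology, req))
	}
}
```

On the `{1, 1}` topology only a request for exactly two devices reaches
the topology affinity case, which is why such a test needs to know the
per-node device counts in advance.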
Signed-off-by: Francesco Romani <fromani@redhat.com>
We cannot anticipate all the possible configurations
needed by the SRIOV device plugin: there is too much variety.
Hence, we need to allow the test environment to supply
a host-specific ConfigMap to properly configure the device
plugin and avoid false negatives.
We still provide the default ConfigMap as fallback and reference.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The SRIOV device plugin can create different resources depending
on both the hardware present on the system and the configuration.
As long as we have at least one SRIOV device, the tests don't actually
care about which specific device it is.
Previously, the test hardcoded the most common Intel SRIOV device
identifier. This patch lifts the restriction and lets the test
autodetect and use what's available.
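
A minimal Go sketch of what such autodetection could look like is shown
below. It is not the code from this patch. It assumes the node's
allocatable resources have already been flattened into a plain map of
resource name to quantity, and it assumes (purely for illustration) that
SRIOV resources can be recognized by the substring "sriov" in their
name. The helper name and the resource names in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// findSRIOVResource is a hypothetical helper: it returns the first
// resource (in sorted name order) whose name contains "sriov" and whose
// quantity is greater than zero. The substring match is an assumption
// made for this sketch, not a rule of the SRIOV device plugin.
func findSRIOVResource(allocatable map[string]int64) (string, int64, bool) {
	names := make([]string, 0, len(allocatable))
	for name := range allocatable {
		names = append(names, name)
	}
	// Sort so the choice is deterministic; Go map iteration order is not.
	sort.Strings(names)
	for _, name := range names {
		if strings.Contains(strings.ToLower(name), "sriov") && allocatable[name] > 0 {
			return name, allocatable[name], true
		}
	}
	return "", 0, false
}

func main() {
	// Hypothetical allocatable resources reported by a node.
	allocatable := map[string]int64{
		"cpu":                   16,
		"example.com/sriov_net": 4,
	}
	if name, count, ok := findSRIOVResource(allocatable); ok {
		fmt.Printf("using SRIOV resource %s (%d available)\n", name, count)
	} else {
		fmt.Println("no SRIOV resource found")
	}
}
```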
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch extends and completes the previously-added
empty topology manager test for single-NUMA node policy
by adding reporting in the test pod and checking
the resource alignment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch adds all the testing infra and utilities needed
to run e2e topology manager tests. This includes setting up
a guaranteed pod which needs some devices.
The simplest real devices available for the purpose
are SRIOV devices, hence we use them.
This patch pulls the SRIOV device plugin from
the official, yet external, repository.
We do it as closely as possible to how it is done for the NVIDIA GPU plugin.
This patch also performs minor refactoring for some
test framework utilities, needed to support the new
e2e tests.
Finally, we add an empty e2e topology manager test,
to be completed by the next patch.
Signed-off-by: Francesco Romani <fromani@redhat.com>
pull-kubernetes-e2e-gce-rbe is still failing with the following:
```
INFO: Waited 10 seconds for server process (pid=72) to terminate.
FATAL: Attempted to kill stale server process (pid=72) using SIGKILL, but it did not die in a timely fashion.
make: *** [Makefile:626: bazel-release] Error 36
make: Leaving directory '/home/prow/go/src/k8s.io/kubernetes'
```
We have added a pkill just after the line for bazel shutdown, so let's
continue past a failed shutdown to give the pkill a chance to run.
If we install only the containerd apt packages and not docker,
there is no docker group. In this scenario, we should not add the gid to
config.toml.