The reason for the issue is that the metrics were bumped before the
final job status update. If that update failed, the path was repeated
by the next syncJob, leading to double-counting of the metrics.
The solution is to delay recording metrics and broadcasting events
until after the job status update succeeds.
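A minimal sketch of that ordering, with hypothetical helper names
(`updateJobStatus`, `recordJobFinishedMetrics`) rather than the actual job
controller code:
```go
package main

import "fmt"

// Job is a stand-in for the batch/v1 Job object; the helpers below are
// hypothetical and only illustrate the ordering, not the real controller.
type Job struct{ Name string }

// updateJobStatus pretends to persist the final job status via the API server.
func updateJobStatus(job *Job) error { return nil }

// recordJobFinishedMetrics pretends to bump the job completion metrics.
func recordJobFinishedMetrics(job *Job) { fmt.Println("metrics recorded for", job.Name) }

// syncJob emits metrics (and would broadcast events) only after the status
// update has succeeded, so a retried syncJob cannot double-count them.
func syncJob(job *Job) error {
	if err := updateJobStatus(job); err != nil {
		// The next syncJob retries this path, but nothing was counted yet.
		return err
	}
	recordJobFinishedMetrics(job)
	return nil
}

func main() { _ = syncJob(&Job{Name: "example"}) }
```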
This change updates the API server code to load the encryption
config once at startup instead of multiple times. Previously, the
code would set up the storage transformers and the etcd healthz
checks in separate parse steps. This is problematic for KMS v2 key
ID based staleness checks, which need to be able to assert that the
API server has a single view into the KMS plugin's current key ID.
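A rough sketch of the idea with hypothetical types and helper names, not the
real apiserver encryption config API: the file is parsed exactly once and
both consumers share that single result.
```go
package main

// encryptionConfig is a hypothetical container for everything derived
// from a single parse of the encryption configuration file.
type encryptionConfig struct {
	transformers map[string]func([]byte) ([]byte, error) // resource -> storage transformer (placeholder)
	healthChecks []func() error                          // KMS healthz probes (placeholder)
}

// loadEncryptionConfigOnce parses the file a single time; both the storage
// transformers and the etcd/KMS healthz checks are built from the same
// parsed result, so key ID staleness checks see one consistent view.
func loadEncryptionConfigOnce(path string) (*encryptionConfig, error) {
	// ... read and parse the file exactly once ...
	return &encryptionConfig{}, nil
}

func main() {
	cfg, err := loadEncryptionConfigOnce("/etc/kubernetes/encryption-config.yaml") // example path
	if err != nil {
		panic(err)
	}
	_ = cfg.transformers // wired into storage setup
	_ = cfg.healthChecks // wired into healthz registration
}
```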
Signed-off-by: Monis Khan <mok@microsoft.com>
There is a corner case when blocking Pod termination via a lifecycle
preStop hook, for example by using this StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  selector:
    matchLabels:
      app: ubi
  serviceName: "ubi"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubi
    spec:
      terminationGracePeriodSeconds: 1000
      containers:
        - name: ubi
          image: ubuntu:22.04
          command: ['sh', '-c', 'echo The app is running! && sleep 360000']
          ports:
            - containerPort: 80
              name: web
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - 'echo aaa; trap : TERM INT; sleep infinity & wait'
```
After creating, downscaling, force-deleting, and upscaling the
replica like this:
```
> kubectl apply -f sts.yml
> kubectl scale sts web --replicas=0
> kubectl delete pod web-0 --grace-period=0 --force
> kubectl scale sts web --replicas=1
```
We end up with two pods running in the container runtime, while
the API reports only one:
```
> kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
web-0   1/1     Running   0          92s
```
```
> sudo crictl pods
POD ID          CREATED          STATE   NAME    NAMESPACE   ATTEMPT   RUNTIME
e05bb7dbb7e44   12 minutes ago   Ready   web-0   default     0         (default)
d90088614c73b   12 minutes ago   Ready   web-0   default     0         (default)
```
When we now run `kubectl exec -it web-0 -- ps -ef`, there is a random chance that we hit the wrong
container, which reports the lifecycle command `/bin/sh -c echo aaa; trap : TERM INT; sleep infinity & wait`.
This is caused by the container lookup via its name (and no podUID) at:
02109414e8/pkg/kubelet/kubelet_pods.go (L1905-L1914)
And more specifically by the conversion of the pod result map to a slice in `GetPods`:
02109414e8/pkg/kubelet/kuberuntime/kuberuntime_manager.go (L407-L411)
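As a standalone illustration of that failure mode (not kubelet code):
converting a map of pods to a slice iterates the map in nondeterministic
order, so a name-only lookup may return either `web-0` pod.
```go
package main

import "fmt"

func main() {
	// Two runtime pods share the name "web-0" but have different UIDs.
	podsByUID := map[string]string{
		"uid-old": "web-0 (stale sandbox, preStop hook still running)",
		"uid-new": "web-0 (newly created sandbox)",
	}

	// A GetPods-style conversion: Go randomizes map iteration order, so the
	// slice order, and thus the first name match, changes between runs.
	var pods []string
	for _, pod := range podsByUID {
		pods = append(pods, pod)
	}
	fmt.Println("first match for web-0:", pods[0])
}
```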
We now solve that unexpected behavior by tracking the creation time of
the pod and sorting the result based on it. This ensures that we always
match the most recently created pod.
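A minimal sketch of that approach, assuming a simplified pod type with a
`CreatedAt` field rather than the kubelet's real runtime types:
```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// pod is a simplified stand-in for the kubelet's internal pod type.
type pod struct {
	Name      string
	CreatedAt time.Time
}

// sortNewestFirst orders pods by creation time, newest first, so that a
// lookup by name resolves to the most recently created pod.
func sortNewestFirst(pods []*pod) {
	sort.SliceStable(pods, func(i, j int) bool {
		return pods[i].CreatedAt.After(pods[j].CreatedAt)
	})
}

func main() {
	pods := []*pod{
		{Name: "web-0", CreatedAt: time.Now().Add(-12 * time.Minute)}, // stale sandbox
		{Name: "web-0", CreatedAt: time.Now().Add(-2 * time.Minute)},  // current sandbox
	}
	sortNewestFirst(pods)
	fmt.Println("matched pod created at:", pods[0].CreatedAt)
}
```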
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The change made in https://github.com/kubernetes/kubernetes/pull/112644
resulted in an update to the rejection message. In the memory manager
node e2e test, we still checked against the old expected error message,
giving the impression that the pod ran successfully even though it failed
as expected, mainly because the check wasn't performed correctly.
In this patch, we update the test to check against the correct rejection
message to make sure that the memory manager test is no longer failing.
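A hedged sketch of the kind of check being corrected, using placeholder
strings rather than the real e2e framework code or the exact kubelet error
text:
```go
package main

import (
	"fmt"
	"strings"
)

// podRejectedAsExpected is a hypothetical helper: the pod must have failed
// and its status message must contain the rejection text the kubelet emits
// today; checking against the stale pre-#112644 string silently tests the
// wrong thing.
func podRejectedAsExpected(phase, message, wantMsg string) bool {
	return phase == "Failed" && strings.Contains(message, wantMsg)
}

func main() {
	// Placeholder strings; the patch only swaps the expected message for the
	// one the memory manager currently reports.
	wantMsg := "current memory manager rejection message"
	fmt.Println(podRejectedAsExpected("Failed", "…"+wantMsg+"…", wantMsg))
}
```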
NOTE: This test is supposed to run on multi-NUMA systems. If the
underlying node does not have multiple NUMA nodes, the test is skipped,
which is what happens in the upstream test infrastructure as it is mainly
composed of single-NUMA nodes. Because of this, the test failure wasn't
evident via testgrid.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>