On CNI request failure, multus-cni prints out cmdArgs. In all
cases, except for debug printing, this is done with %s and a
special printing function. However, handleCNIRequest is an
exception for some reason. That leads to unintelligible error
messages in case of CNI request failures (severely abridged):
CmdAdd (shim): CNI request failed with status 400:
'&{ContainerID:<id> Netns:/var/run/netns/<uuid> IfName:eth0
Args:<args> Path: StdinData:[125 121 111 117 114 32 97 100 118
101 114 116 105 115 101 109 101 110 116 32 99 111 117 108 100
32 98 101 32 104 101 114 101 125 ... another 650 numbers ]}
ContainerID:"<id>" Netns:"/var/run/netns/<uuid>" IfName:"eth0"
Args:"<args>" Path:"" ERRORED: error configuring pod ...
printCmdArgs() should be used for this case as well to avoid huge,
hardly readable logs.
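
A minimal sketch of what a printCmdArgs-style formatter looks like
(the real helper lives in multus-cni and its exact output may
differ); the point is to print the fields as quoted strings instead
of dumping StdinData as a raw byte slice:

    package main

    import (
        "fmt"

        "github.com/containernetworking/cni/pkg/skel"
    )

    // printCmdArgs formats the CNI arguments without the StdinData
    // byte slice, keeping the log line short and readable.
    func printCmdArgs(args *skel.CmdArgs) string {
        return fmt.Sprintf("ContainerID:%q Netns:%q IfName:%q Args:%q Path:%q",
            args.ContainerID, args.Netns, args.IfName, args.Args, args.Path)
    }

    func main() {
        fmt.Println(printCmdArgs(&skel.CmdArgs{IfName: "eth0"}))
    }
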
At the same time, the content of cniCmdArgs is always appended to
the error twice, as seen in the example above: the first time by
HandleCNIRequest and a second time by handleCNIRequest. The same
applies to the HandleDelegateRequest path.
This change removes the prefixing from the lower-level handlers
while keeping it in the higher-level ones. The 'ERRORED' part is
moved to the higher-level handler functions to preserve the overall
look of the error.
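
A sketch of the resulting split, reusing the printCmdArgs sketch
above (handler shapes are simplified; the real multus-cni handlers
work on a server-side request type): only the exported, higher-level
handler adds the cmdArgs prefix and the 'ERRORED' marker, exactly
once.

    // cmdAdd stands in for the actual CNI ADD logic.
    var cmdAdd = func(args *skel.CmdArgs) ([]byte, error) {
        return nil, fmt.Errorf("error configuring pod")
    }

    // Lower-level handler: returns the error as-is, without
    // prefixing it with the command arguments.
    func handleCNIRequest(args *skel.CmdArgs) ([]byte, error) {
        return cmdAdd(args)
    }

    // Higher-level handler: prefixes the error with the formatted
    // command arguments and the "ERRORED" marker, once.
    func HandleCNIRequest(args *skel.CmdArgs) ([]byte, error) {
        result, err := handleCNIRequest(args)
        if err != nil {
            return nil, fmt.Errorf("%s ERRORED: %v", printCmdArgs(args), err)
        }
        return result, nil
    }
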
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Because deletes should favor a successful path, the readiness check
should be skipped for pod removals. Otherwise pods pending deletion
can pile up, which might impact the scheduling of a pod that is
needed in order to set the readiness indicator. This change adds a
new method that checks the readiness indicator alone so that a
warning can be logged immediately.
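
A rough sketch of the behaviour with hypothetical names (not the
actual multus-cni functions): DEL only logs a warning when the
readiness indicator file is missing instead of blocking on it.

    package main

    import (
        "fmt"
        "os"
    )

    // readinessIndicatorPresent reports whether the readiness
    // indicator file exists.
    func readinessIndicatorPresent(path string) bool {
        _, err := os.Stat(path)
        return err == nil
    }

    // cmdDel favors the successful path: it warns when the indicator
    // is missing and proceeds with the delete anyway.
    func cmdDel(readinessIndicatorFile string) {
        if !readinessIndicatorPresent(readinessIndicatorFile) {
            fmt.Fprintf(os.Stderr,
                "warning: readiness indicator %q not found, continuing with delete\n",
                readinessIndicatorFile)
        }
        // ... delegate DEL calls would follow here ...
    }

    func main() {
        cmdDel("/run/multus/readiness") // hypothetical path
    }
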
When a user recreates the whole cluster's certs, the multus thick
plugin's previous cert is no longer valid. In such a case, we need
to stop using the cert manager's old certs and restart it from the
bootstrap kubeconfig. This fix reloads the client config from the
bootstrap kubeconfig if the client using the cert manager's cert
fails to load a pod.
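
A minimal sketch of the fallback, assuming client-go's clientcmd
helpers (the actual cert-manager handling in multus-cni is more
involved and the function name here is hypothetical): when a pod
lookup with the current cert fails, the client config is rebuilt
from the bootstrap kubeconfig so the certificate flow can start
over.

    package main

    import (
        "fmt"

        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    // rebuildFromBootstrap reloads the client config from the
    // bootstrap kubeconfig and returns a clientset built from it.
    func rebuildFromBootstrap(bootstrapKubeconfig string) (*kubernetes.Clientset, error) {
        cfg, err := clientcmd.BuildConfigFromFlags("", bootstrapKubeconfig)
        if err != nil {
            return nil, fmt.Errorf("loading bootstrap kubeconfig: %w", err)
        }
        return kubernetes.NewForConfig(cfg)
    }

    func main() {
        // Called only after a pod lookup with the cert manager's
        // cert has failed.
        _, _ = rebuildFromBootstrap("/etc/cni/multus/bootstrap-kubeconfig") // hypothetical path
    }
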
As described in #1171, the watch permission is required in the
ClusterRole for the thick Multus version, otherwise "Failed to
watch *v1.Pod" would be returned.
The ClusterRole was missing the watch permission on pods, which resulted in Multus throwing this error message every few seconds:
Failed to watch *v1.Pod: unknown (get pods)
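
For reference, the pods rule expressed with the rbac/v1 Go types
(the actual manifest is YAML and its full verb list may differ);
the essential part is that "watch" is included in the verbs.

    package main

    import (
        "fmt"

        rbacv1 "k8s.io/api/rbac/v1"
    )

    func main() {
        // "watch" alongside "get" and "list" is what the pod
        // informer needs.
        rule := rbacv1.PolicyRule{
            APIGroups: []string{""},
            Resources: []string{"pods"},
            Verbs:     []string{"get", "list", "watch"},
        }
        fmt.Printf("%+v\n", rule)
    }
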
This change stops updating the status in CNI's DEL command.
There are two reasons:
1. cmd DEL is invoked only at pod deletion, hence k8s does not
   guarantee the pod still exists and it may already be deleted.
   Hence this API may fail.
2. In a StatefulSet's pod recreation case, there may be a race
   condition when updating the status during cmd DEL.
In the StatefulSet case, the same pod name, e.g. stateful-0, is
deleted and then created again. If the old pod's CNI DEL command
has not finished before the new pod is created, the SetStatus
function fails due to a pod UID mismatch.
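
A hedged sketch of the idea (the helper name and signature are
hypothetical, not multus-cni's actual SetStatus code): the status
update is skipped on DEL, and a UID check guards against writing a
stale status to a recreated pod of the same name.

    package netstatus

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/types"
    )

    // setNetworkStatus skips the update on DEL and refuses to write
    // when the pod UID no longer matches the one ADD was performed
    // for.
    func setNetworkStatus(pod *corev1.Pod, expectedUID types.UID, cniCmd string) error {
        if cniCmd == "DEL" {
            // The pod may already be gone, or may be a recreated
            // StatefulSet pod with the same name: skip the update.
            return nil
        }
        if pod.UID != expectedUID {
            return fmt.Errorf("pod UID mismatch: got %s, expected %s", pod.UID, expectedUID)
        }
        // ... update the network-status annotation here ...
        return nil
    }
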
This change introduces certDuration as a parameter to customize the
cert duration. In addition, the environment variable for the node
name is matched to other usages.
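
A small sketch of what this looks like, with hypothetical flag and
environment-variable names (the real ones in multus-cni may
differ).

    package main

    import (
        "flag"
        "fmt"
        "os"
        "time"
    )

    func main() {
        // certDuration becomes a command-line parameter instead of
        // a hard-coded value; the default here is only illustrative.
        certDuration := flag.Duration("cert-duration", 24*time.Hour,
            "requested duration of the per-node certificate")
        flag.Parse()

        // The node name keeps coming from an environment variable,
        // matching how it is read elsewhere.
        nodeName := os.Getenv("MULTUS_NODE_NAME") // hypothetical name
        fmt.Printf("node=%s certDuration=%s\n", nodeName, *certDuration)
    }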