Merge pull request #39949 from Crassirostris/fluentd-liveness-probe-fix

Automatic merge from submit-queue (batch tested with PRs 38592, 39949, 39946, 39882)

Remove fluentd buffers if fluentd is stuck

Fluentd now stores its buffers on disk for resiliency. However, if a buffer is corrupted, fluentd will keep restarting forever.

The following change makes the fluentd liveness probe delete the buffers if fluentd has been stuck for more than X minutes (15 by default).
Kubernetes Submit Queue 2017-01-16 10:37:40 -08:00 committed by GitHub
commit 06c610e276

@@ -57,10 +57,23 @@ spec:
  - '/bin/sh'
  - '-c'
  - >
- LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-600};
+ LIVENESS_THRESHOLD_SECONDS=${LIVENESS_THRESHOLD_SECONDS:-300};
+ STUCK_THRESHOLD_SECONDS=${STUCK_THRESHOLD_SECONDS:-900};
+ if [ ! -e /var/log/fluentd-buffers ];
+ then
+ exit 1;
+ fi;
  LAST_MODIFIED_DATE=`stat /var/log/fluentd-buffers | grep Modify | sed -r "s/Modify: (.*)/\1/"`;
  LAST_MODIFIED_TIMESTAMP=`date -d "$LAST_MODIFIED_DATE" +%s`;
- if [ `date +%s` -gt `expr $LAST_MODIFIED_TIMESTAMP + $LIVENESS_THRESHOLD_SECONDS` ]; then exit 1; fi;
+ if [ `date +%s` -gt `expr $LAST_MODIFIED_TIMESTAMP + $STUCK_THRESHOLD_SECONDS` ];
+ then
+ rm -rf /var/log/fluentd-buffers;
+ exit 1;
+ fi;
+ if [ `date +%s` -gt `expr $LAST_MODIFIED_TIMESTAMP + $LIVENESS_THRESHOLD_SECONDS` ];
+ then
+ exit 1;
+ fi;
  nodeSelector:
  alpha.kubernetes.io/fluentd-ds-ready: "true"
  terminationGracePeriodSeconds: 30
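
The two-threshold check in the probe above can be tried outside a pod with a small standalone function. This is only a sketch under stated assumptions, not the code from the commit: the function name `check_buffers` and its optional "current time" argument are hypothetical, it assumes GNU `stat` (for `-c %Y`), and it reads the stuck threshold from its own `STUCK_THRESHOLD_SECONDS` variable so that the 900-second default can take effect.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit.
# Usage: check_buffers DIR [NOW_EPOCH_SECONDS]
# Returns 0 if DIR was modified within the liveness threshold, 1 otherwise.
# Removes DIR when it has been unmodified for longer than the stuck threshold.
check_buffers() {
  dir=$1
  now=${2:-$(date +%s)}
  liveness=${LIVENESS_THRESHOLD_SECONDS:-300}
  stuck=${STUCK_THRESHOLD_SECONDS:-900}

  # Fail the probe if the buffer directory does not exist.
  [ -e "$dir" ] || return 1

  # Assumption: GNU stat, where %Y is the modification time in epoch seconds.
  modified=$(stat -c %Y "$dir") || return 1
  age=$((now - modified))

  # Stuck for too long: drop the buffers so the next restart starts clean.
  if [ "$age" -gt "$stuck" ]; then
    rm -rf "$dir"
    return 1
  fi

  # Not modified recently: report unhealthy but keep the buffers.
  if [ "$age" -gt "$liveness" ]; then
    return 1
  fi
  return 0
}
```

Calling `check_buffers /var/log/fluentd-buffers` and exiting with its status would mirror the probe. The stuck check runs first because any age above 900 seconds is also above 300 seconds, so the opposite order would never reach the `rm -rf`.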