We are getting random false alerts from a Splunk (6.5.2) search that checks whether a certain string is missing from a logfile within the last 15 minutes.
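For context, the alert search is roughly of this form (the index name and the string below are placeholders, not the real ones); it is scheduled every 15 minutes and triggers when the count is zero:

index=main source="/XXX/systemerr.log" "EXPECTED_STRING" earliest=-15m latest=now
| stats count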
When we investigated and ran the search manually, the string was present for the alert period, so it shouldn't have triggered an alert.
We couldn't find any relevant errors in splunkd.log on the forwarder, but I did notice these two consecutive entries in metrics.log:
01-25-2019 16:55:01.800 +1100 INFO Metrics - group=per_source_thruput, series="/XXX/systemerr.log", kbps=10.196221, eps=0.193555, kb=316.072266, ev=6, avg_age=1389.166667, max_age=1667
01-25-2019 16:22:59.801 +1100 INFO Metrics - group=per_source_thruput, series="/XXX/systemerr.log", kbps=6.268667, eps=0.161285, kb=194.334961, ev=5, avg_age=211.600000, max_age=265
We got the false alert at around 4:54 pm. If I'm reading the gap between those two entries and the "avg_age" value correctly (as I understand it, avg_age is the average difference in seconds between indexing time and the event timestamps, so ~1389 s means the events indexed at 4:55 were roughly 23 minutes old), the alert may have been triggered because that data was only read after 4:55; no new lines from that file were picked up between 4:22 and 4:55.
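As a further check, I was planning to compare event time with index time for that source around the alert window, to see how big the indexing lag actually was. Something like this, where the index name is a placeholder and the time range is just an example:

index=main source="/XXX/systemerr.log" earliest="01/25/2019:16:00:00" latest="01/25/2019:17:00:00"
| eval index_time=strftime(_indextime, "%F %T"), lag_sec=_indextime - _time
| table _time index_time lag_sec
| sort - lag_sec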
So my question is: is my understanding correct? Is the problem caused by a delay in writing the data to the source logfile, or by a processing/indexing delay in Splunk itself?
Appreciate any advice,