I have about six months' worth of data in a summary index: hourly file counts and volume for FTP servers. I am trying to generate an alert when any server's file count or volume deviates from its average (for that day of the week) by more than 25% in either direction.
So far I have the search below, with a custom alert condition that checks whether the current count (or volume) is greater than the high threshold or less than the low one. It works for a single server (there could be hundreds) and always compares today's data against the Mondays in the past 30 days.
index=si-br server=foo earliest=-30d@d latest=-0d@d
| stats count as count sum(filesize) as volume by server,location,_time
| bin _time span=1d
| stats sum(volume) as volume sum(count) as count by _time
| where strftime(_time, "%w") == "1"
| stats avg(count) as avgcount avg(volume) as avgvolume
| appendcols [
search index=br earliest=-0d@d latest=now server=foo
| stats count as count sum(filesize) as volume
]
| eval highcount=avgcount*1.25
| eval lowcount=avgcount*.75
| eval highvol=avgvolume*1.25
| eval lowvol=avgvolume*.75
My questions to you:
1. How do I make the day of the week dynamic? If today is Tuesday I want to compare strftime(_time,"%w") == "2", but if it is Friday I want strftime(_time,"%w") == "5". I suppose I could make seven clones of the alert, change the day of the week in each, and schedule each to run once a week on the correct day. Is there another way?
2. How can I exclude any "abnormal" counts/volumes from the calculation of the average? Say the 2nd Monday of the month had a 30% spike in volume that I do not want considered part of the average.
3. How do I scale this out to potentially hundreds of servers? Do I have to create a different alert for each server?
4. Is it possible to use a lookup table to apply a different threshold to each server (25% for some, 50% for others, ...)?
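For question 1, one alternative to cloning the alert is to compute today's weekday at search time and compare against it — strftime(now(), "%w") returns the current day of the week as a string, so the filter stays correct whichever day the alert runs:

```
| where strftime(_time, "%w") = strftime(now(), "%w")
```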
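For question 2, one possible approach is to filter out days that sit far from the median before averaging, since the median is much less sensitive to a one-off spike than the mean. A sketch of the idea, inserted between the daily rollup and the final avg (the 1.3 cutoff is illustrative, not from the original post):

```
| eventstats median(volume) as medvol, median(count) as medcount
| where volume <= medvol * 1.3 AND count <= medcount * 1.3
| stats avg(count) as avgcount avg(volume) as avgvolume
```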
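For question 3, you should not need one alert per server: drop the server=foo filter and keep server in every by clause. Note that appendcols does not align rows by server, so matching today's numbers to each server's historical average needs something like a join. A sketch under those assumptions, combining the per-server rollup with the dynamic weekday filter:

```
index=si-br earliest=-30d@d latest=@d
| bin _time span=1d
| stats count as count sum(filesize) as volume by server, _time
| where strftime(_time, "%w") = strftime(now(), "%w")
| stats avg(count) as avgcount avg(volume) as avgvolume by server
| join type=inner server
    [ search index=br earliest=@d latest=now
      | stats count as count sum(filesize) as volume by server ]
| eval highcount=avgcount*1.25, lowcount=avgcount*0.75
| where count > highcount OR count < lowcount
```

With the comparison done in the search itself, the alert condition can simply fire when any rows are returned.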
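For question 4, yes — the per-server percentage can come from a lookup. Assuming a hypothetical lookup named server_thresholds with fields server and pct (values like 0.25 or 0.50; the name and fields are illustrative), applied after the averages are computed:

```
| lookup server_thresholds server OUTPUT pct
| eval pct=coalesce(pct, 0.25)
| eval highcount=avgcount*(1+pct), lowcount=avgcount*(1-pct)
| eval highvol=avgvolume*(1+pct), lowvol=avgvolume*(1-pct)
```

The coalesce gives servers missing from the lookup a default 25% threshold.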