My Splunk indexer stopped with lots of ERROR messages ending in "Too many open files". ulimit -n shows 65536 for the splunk user.
I was able to just start Splunk again, and it has been running fine for two days now. Also, when I count the currently open files with

    lsof -u splunk | awk 'BEGIN { total = 0; } $4 ~ /^[0-9]/ { total += 1 } END { print total }'

it shows about 2600 open files for the splunk user, so nothing serious. Obviously just increasing the nofile limit would not help to prevent this in the future. Any ideas on what could be done to analyse this further?
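For what it's worth, I'm aware that the limit the running splunkd process actually gets can differ from what ulimit -n reports in an interactive shell (e.g. if Splunk is started from init/systemd rather than a login shell). This is just a rough way to check that, assuming pgrep -o picks the main splunkd process on this box:

    # show the open-files limit in effect for the oldest running splunkd process
    grep "open files" /proc/$(pgrep -o splunkd)/limits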
I am currently monitoring the number of open files over time. Maybe at the time of the crash (0:03 am) the open file count was different from what I see now in the afternoon.
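In case it is relevant, the monitoring is roughly a loop like the sketch below, which appends a timestamped count of the splunk user's open files to a log file every five minutes (the log path and interval are just placeholders, not anything Splunk-specific):

    #!/bin/bash
    # Log a timestamped open-file count for user splunk every 5 minutes.
    LOG=/var/log/splunk_open_files.log   # placeholder path
    while true; do
        count=$(lsof -u splunk | awk 'BEGIN { total = 0; } $4 ~ /^[0-9]/ { total += 1 } END { print total }')
        echo "$(date '+%Y-%m-%d %H:%M:%S') $count" >> "$LOG"
        sleep 300
    done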