Channel: Questions in topic: "splunk-enterprise"

How to troubleshoot why forwarder-to-receiver session setup/teardown is suddenly slow?

All of a sudden, it's taking a really long time for forwarders to connect to receivers; most of these forwarders send cooked data. This is true for all of our receivers (4 indexers and 3 heavy forwarders). Tearing down forwarder-to-receiver sessions also seems to take a long time, and judging from netstat output, data is staying queued longer than I would think is healthy.

We see this in two ways. First, there is a persistent rash of forwarder-to-receiver connect timeouts from many sources, e.g.:

    10-20-2015 16:54:53.761 -0400 WARN TcpOutputProc - Cooked connection to ip=167.113.155.44:9997 timed out

Second, running netstat on the receivers shows lots of sessions in "SYN_RECV" and "CLOSE_WAIT" state, and receive queues that don't seem to drain quickly enough. Here's the netstat command we ran to pull a quick sample showing the "SYN_RECV" sessions:

    netstat -a | egrep 'Recv-Q|palace-6'

For all the world, it looks as if our network suddenly got really slow, at least from Splunk's point of view. So far we have no idea what precipitated the condition; as I said, it affects our entire Splunk deployment, which spans data centers and regions. SSL isn't a factor, as we do not encrypt. Our receivers also don't appear to be CPU- or memory-constrained.

I'd like to be able to rule Splunk in or out versus "the network", so I'd love some insight into the following:

1. What facilities exist in Splunk to report whether some sort of internal throttling is going on? Is this even relevant? I noticed the "maxSockets" parameter in server.conf, which is and always has been at its default value of 0; I guess that means the receiver is supposed to manage connections on its own, through introspection/magic. How would Splunk let us know if it felt constrained and was throttling connections? Also, would the receiver make forwarders wait to connect?

2. Does anybody know of other diagnostics we could run to pinpoint whether the slowdown is local to the hosts running the receivers?
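To see which receivers the timeouts cluster on, here is a minimal Python sketch that tallies the TcpOutputProc timeout warnings per destination. It assumes the message text matches the log line quoted above and that you feed it lines from the forwarders' splunkd.log (normally under $SPLUNK_HOME/var/log/splunk/). Apart from that quoted line, the sample lines (including the 10.0.0.7 receiver) are made up. If every receiver shows up at a similar rate, that suggests something shared (network path or a deployment-wide change) rather than one unhealthy indexer.

```python
import re
from collections import Counter

# Matches the WARN message quoted in the question; assumes the text is stable.
TIMEOUT_RE = re.compile(
    r"TcpOutputProc - Cooked connection to ip=(?P<ip>[\d.]+):(?P<port>\d+) timed out"
)

# First line is the one from the question; the rest are made-up examples.
SAMPLE_LINES = [
    "10-20-2015 16:54:53.761 -0400 WARN TcpOutputProc - Cooked connection to ip=167.113.155.44:9997 timed out",
    "10-20-2015 16:54:53.761 -0400 WARN TcpOutputProc - Cooked connection to ip=167.113.155.44:9997 timed out",
    "10-20-2015 16:55:10.000 -0400 WARN TcpOutputProc - Cooked connection to ip=10.0.0.7:9997 timed out",
    "10-20-2015 16:55:11.000 -0400 INFO ExampleComponent - unrelated line",
]


def count_timeouts(lines):
    """Return a Counter mapping (ip, port) to the number of cooked-connection timeouts."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(match.group("ip"), int(match.group("port")))] += 1
    return counts


if __name__ == "__main__":
    for (ip, port), n in count_timeouts(SAMPLE_LINES).most_common():
        print(f"{ip}:{port} {n}")
```

On a real forwarder you would pass an open file instead, e.g. `count_timeouts(open(path))` with `path` pointing at splunkd.log.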
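The two TCP states mentioned above point in different directions, which helps with the "Splunk or network" question. SYN_RECV means the receiver's kernel has answered a SYN and is still waiting for the forwarder's final ACK; a pile-up suggests either packets lost or delayed on the path or, on Linux, a full accept backlog because the listening process isn't calling accept() fast enough. CLOSE_WAIT means the forwarder has already closed its side but the receiving process (splunkd) hasn't yet closed its socket, which points at the application on the receiver rather than the wire. A minimal Python sketch for tracking these over time tallies sessions and Recv-Q bytes per state on the receiving port. It assumes the Linux `netstat -ant` column layout and port 9997 (taken from the log line above); numeric output (`-n`) also stops netstat from doing reverse DNS lookups, which `netstat -a` does and which can itself be slow. The sample text is made up.

```python
from collections import defaultdict

# Made-up output in the Linux `netstat -ant` layout.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:9997            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:9997           10.0.0.9:51234          SYN_RECV
tcp        0      0 10.0.0.5:9997           10.0.0.10:51240         SYN_RECV
tcp     4096      0 10.0.0.5:9997           10.0.0.8:50001          CLOSE_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.2:60000          ESTABLISHED
"""


def tally_states(netstat_text, port=9997):
    """Count sessions and sum Recv-Q bytes per TCP state for one local port.

    Assumes columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State.
    """
    counts = defaultdict(int)
    recvq = defaultdict(int)
    suffix = f":{port}"
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and any row that is not TCP or lacks a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if not local.endswith(suffix):
            continue
        counts[state] += 1
        recvq[state] += int(fields[1])
    return dict(counts), dict(recvq)


if __name__ == "__main__":
    counts, recvq = tally_states(SAMPLE)
    for state in sorted(counts):
        print(f"{state:12} sessions={counts[state]} recv_q_bytes={recvq[state]}")
```

Running it against snapshots taken a minute or so apart shows whether CLOSE_WAIT sessions and their Recv-Q bytes are actually draining or just accumulating.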

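On question 1: as far as I know, the main place Splunk reports internal back-pressure is metrics.log ($SPLUNK_HOME/var/log/splunk/metrics.log), where the periodic group=queue lines carry blocked=true when a queue is full. On a receiver, full parsing or indexing queues can back up into the splunktcp input, so forwarders may see slow connections even while CPU looks idle. (Also, if I read the server.conf spec correctly, maxSockets sits under [httpServer] and limits splunkd's HTTP/REST listener, not the port forwarders send to.) Below is a minimal Python sketch that counts blocked=true reports per queue. The line format in the sample is an assumption based on typical metrics.log output, and the sample lines are made up.

```python
import re
from collections import Counter

# key=value pairs as they typically appear in metrics.log (assumed format), e.g.
# "group=queue, name=indexqueue, blocked=true, max_size_kb=500"
PAIR_RE = re.compile(r"(\w+)=([^,\s]+)")

# Made-up sample lines in the assumed metrics.log format.
SAMPLE_METRICS = [
    "10-20-2015 16:54:53.761 -0400 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=499",
    "10-20-2015 16:54:53.761 -0400 INFO  Metrics - group=queue, name=parsingqueue, max_size_kb=512, current_size_kb=3",
    "10-20-2015 16:55:23.902 -0400 INFO  Metrics - group=queue, name=indexqueue, blocked=true, max_size_kb=500, current_size_kb=500",
]


def blocked_queues(lines):
    """Count how many metrics.log samples report each queue as blocked."""
    hits = Counter()
    for line in lines:
        fields = dict(PAIR_RE.findall(line))
        if fields.get("group") == "queue" and fields.get("blocked") == "true":
            hits[fields.get("name", "unknown")] += 1
    return hits


if __name__ == "__main__":
    for name, n in blocked_queues(SAMPLE_METRICS).most_common():
        print(f"{name}: blocked in {n} samples")
```

If no queue on any receiver ever reports blocked=true during the slow periods, that weighs against Splunk-side throttling.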
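On question 2, one way to separate "network" from "splunkd" is to time a bare TCP handshake from a forwarder to the receiving port and, for comparison, to another port on the same receiver that splunkd doesn't own (sshd on 22, say). If both are slow, suspect the path or the host's network stack; if only the receiving port is slow, suspect the listener. Bear in mind the kernel completes the handshake before splunkd calls accept(), so a fast result only shows the accept backlog isn't full, not that splunkd is keeping up. Below is a minimal Python sketch; `connect_time` and `probe` are hypothetical helper names. Pass an IP rather than a hostname so DNS time isn't counted, and expect each probe to show up on the receiver as a brief connection that sends no data.

```python
import socket
import sys
import time


def connect_time(host, port, timeout=5.0):
    """Seconds needed to complete a TCP handshake with host:port.

    socket.create_connection also resolves host if it is a name, so pass an
    IP address to keep DNS out of the measurement. The socket is closed right
    away and no Splunk protocol data is sent.
    """
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        elapsed = time.perf_counter() - start
    return elapsed


def probe(host, port, tries=5, timeout=5.0):
    """Run several handshakes; failures are recorded as None instead of raising."""
    results = []
    for _ in range(tries):
        try:
            results.append(connect_time(host, port, timeout))
        except OSError:  # covers timeouts and refused connections
            results.append(None)
    return results


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 probe.py <receiver-ip> <port>
    for t in probe(sys.argv[1], int(sys.argv[2])):
        print("timeout/error" if t is None else f"{t * 1000:.1f} ms")
```

Running it from several forwarders in different data centers against the same receiver helps show whether the slowness follows the receiver or the network path.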