We have a support ticket open, but I thought I'd also ask the community. Since upgrading to Splunk 8.0.1, this one HF has been spewing "TcpOutputProc - Possible duplication of events" for most channels, along with "TcpOutputProc - Applying quarantine to ip=xx.xx.xx.xx port=9998 _numberOfFailures=2".
We upgraded on the 15th near midnight. This is a count of those errors from that host.
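For reference, a search along these lines should reproduce that daily count (the host name is from the TcpOutputProc search further down; the message strings are from the warnings quoted above):
index=_internal host=ghdsplfwd01lps sourcetype=splunkd component=TcpOutputProc log_level=WARN ("Possible duplication of events" OR "Applying quarantine")
| timechart span=1d count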
2020-02-14 0
2020-02-15 623
2020-02-16 923874
2020-02-17 396920
2020-02-18 678568
2020-02-19 602100
2020-02-20 459284
2020-02-21 1177642
Here is a count from the indexer cluster showing the number of blocked=true events. One would expect these counts to be similar if the indexers were telling the HF to go elsewhere because their queues were full.
index=_internal host=INDEXERNAMES sourcetype=splunkd source=/opt/splunk/var/log/splunk/metrics.log blocked=true component=Metrics
| timechart span=1d count by source
2020-02-14 7
2020-02-15 180
2020-02-16 260
2020-02-17 15
2020-02-18 18
2020-02-19 2415
2020-02-20 1
2020-02-21 2
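The HF's own metrics.log might also show whether its output queue is backing up. A rough check, assuming the standard group=queue events in metrics.log and the usual tcpout* queue naming, would be something like:
index=_internal host=ghdsplfwd01lps source=*metrics.log group=queue name=tcpout* blocked=true
| timechart span=1d count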
Lastly, it's not just one source or channel; it's everything from the host.
index=_internal component=TcpOutputProc host=ghdsplfwd01lps log_level=WARN duplication
| rex field=event_message "channel=source::(?<channel>[^\|]+)"
| stats count by channel
/opt/splunk/var/log/introspection/disk_objects.log 51395
/opt/splunk/var/log/introspection/resource_usage.log 45470
mule-prod-analytics 42192
/opt/splunk/var/log/splunk/metrics.log 28283
web_ping://PROD_CommerceHub 27881
web_ping://V8_PROD_CustomSolr5 27877
web_ping://V8_PROD_WebServer4 27873
web_ping://EnterWorks PRD 27871
web_ping://RTP DEV 27870
web_ping://Ensighten 27869
web_ping://RTP 27867
bandwidth 20570
cpu 19949
iostat 19946
ps 19821
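In case it helps, a search like this should show whether the quarantines are hitting specific indexers (the rex is just based on the quarantine message format quoted at the top):
index=_internal host=ghdsplfwd01lps component=TcpOutputProc "Applying quarantine"
| rex "ip=(?<dest_ip>\S+)\s+port=(?<dest_port>\d+)"
| stats count by dest_ip dest_port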
Any ideas?