Splunk version: 7.1.2
uberAgent version: 5.0.1
We have a Splunk Search Head, a Splunk Indexer, and a Splunk Heavy Forwarder, all running on Windows Server 2012 R2.
We also have the uberAgent app installed on the Search Head and the uberAgent_Indexer app installed on the Indexer. uberAgent appears to be crashing the Splunk service on the Indexer frequently. The issue seems to be related to uberAgent, because the crashes stopped after we disabled the app.
However, we would expect that even a buggy uberAgent app should not crash Splunk completely, because a crash stops everyone from searching anything from the Search Head (even indexes unrelated to uberAgent).
Something is very odd here. The Splunk service on the Indexer seems to recover on its own most of the time: uberAgent crashes the service around 10 times a day without any intervention from us, so something must be restarting the Splunk service. Unfortunately, sometimes the service is not restarted, and then all searches from the Search Head stop working. When that happens we have to restart the Splunk service on the Indexer and re-add the Indexer to the Distributed Search servers to make searches work again (otherwise the Indexer's status is shown as "Sick").
As a side effect of the frequent crashes, dump files are created alongside the crash log files. Each dump file takes about 2 GB of disk space, and since these files are not rotated or cleaned up automatically, they filled the disk and caused Splunk to crash with "not enough disk space". It was the disk space issue that led us to the root cause: uberAgent was crashing Splunk, and the resulting dump files added up to more than 100 GB of data.
10/08/2018 04:40 4,449 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-04-40-23.log
10/08/2018 05:11 8,507 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-11-39.log
10/08/2018 05:40 8,615 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-40-24.log
10/08/2018 05:50 8,508 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-50-30.log
10/08/2018 05:56 4,078 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-05-56-30.log
10/08/2018 06:50 4,546 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-25.log
10/08/2018 07:00 2,435 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-28.log
10/08/2018 06:50 8,611 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-06-50-34.log
10/08/2018 07:06 4,359 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-07-06-30.log
10/08/2018 07:26 6,603 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-07-25-31.log
10/08/2018 10:10 5,211 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-10-10-37.log
10/08/2018 10:35 2,706 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-10-35-17.log
10/08/2018 12:10 8,595 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-12-10-31.log
10/08/2018 12:40 4,372 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-12-40-20.log
10/08/2018 13:35 4,450 E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-13-35-32.log
...
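Until the crashes themselves are fixed, the disk usage can at least be kept in check. The following is a minimal sketch, not an official Splunk tool: it assumes the crash logs and dumps sit in the directory shown in the crash log below and follow the `..._crash-YYYY-MM-DD-HH-MM-SS.log` / `.dmp` naming seen in the listing. The function names and the 7-day retention are made up for illustration, and the delete call is left commented out.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Matches names like E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-04-40-23.log
CRASH_NAME = re.compile(r"_crash-(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\.(log|dmp)$")


def find_crash_files(log_dir):
    """Return (path, crash_time, size_bytes) tuples, oldest crash first."""
    found = []
    for path in Path(log_dir).iterdir():
        match = CRASH_NAME.search(path.name)
        if match and path.is_file():
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H-%M-%S")
            found.append((path, stamp, path.stat().st_size))
    return sorted(found, key=lambda item: item[1])


def older_than(crash_files, keep_days, now=None):
    """Select the entries whose crash timestamp is more than keep_days old."""
    cutoff = (now or datetime.now()) - timedelta(days=keep_days)
    return [item for item in crash_files if item[1] < cutoff]


if __name__ == "__main__":
    # Assumed location, taken from the "Crash dump written to" line.
    log_dir = Path(r"E:\Programs\Splunk\var\log\splunk")
    if log_dir.is_dir():
        files = find_crash_files(log_dir)
        total = sum(size for _, _, size in files)
        print(f"{len(files)} crash files, {total / 1024**3:.1f} GiB in total")
        for path, stamp, size in older_than(files, keep_days=7):
            print(f"would delete {path.name} ({size} bytes)")
            # path.unlink()  # uncomment only after reviewing the list above
```

Run as a scheduled task, this reports how much space the crash artifacts occupy and which files fall outside the retention window. Keeping at least the most recent dumps is probably wise, since support may ask for them.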
Sample crash log:
[build 8f0ead9ec3db] 2018-08-10 04:40:23
Access violation, cannot write at address [0x0000000000000000]
Exception address: [0x00007FF766E0FA53]
Crashing thread: rjreaderthread
MxCsr: [0x0000000000001F80]
SegDs: [0x000000000000002B]
SegEs: [0x000000000000002B]
SegFs: [0x0000000000000053]
SegGs: [0x000000000000002B]
SegSs: [0x000000000000002B]
SegCs: [0x0000000000000033]
EFlags: [0x0000000000010202]
Rsp: [0x0000000E249FB420]
Rip: [0x00007FF766E0FA53] ?
Dr0: [0x0000000000000000]
Dr1: [0x0000000000000000]
Dr2: [0x0000000000000000]
Dr3: [0x0000000000000000]
Dr6: [0x0000000000000000]
Dr7: [0x0000000000000000]
Rax: [0x0000000000000000]
Rcx: [0x0000000E5BDC4AC0]
Rdx: [0x0000000E11BA0AC0]
Rbx: [0x0000000E5BDC4A50]
Rbp: [0x0000000E4FA4EB80]
Rsi: [0x0000000000000000]
Rdi: [0x0000000E249FB558]
R8: [0x00007FFBD216F610]
R9: [0x00007FFBD216F618]
R10: [0x5000BB77A2A6EB15]
R11: [0x0000BB705D19EB74]
R12: [0x0000000E4D015A38]
R13: [0x0000000E506C22C8]
R14: [0x0000000E11BA0B00]
R15: [0x0000000E4F65C228]
DebugControl: [0x0000000E591E4E74]
LastBranchToRip: [0x0000000000000000]
LastBranchFromRip: [0x0000000000000000]
LastExceptionToRip: [0x0000000000000000]
LastExceptionFromRip: [0x0000000000000000]
OS: Windows
Arch: x86-64
Backtrace:
[0x00007FF766E0FA53] ?
Args: [0x0000000E4F65C1F0] [0x0000000E00000002] [0x0000000000000063]
[0x00007FF766CEDA7A] ?
Args: [0x0000000E249FB558] [0x00007FFBD20D419B] [0x0000000E4D809480]
[0x00007FF766ABEA09] ?
Args: [0x0000000E001FBDA0] [0x0000000E00000006] [0x0000000000000063]
[0x00007FF76666C8FA] ?
Args: [0x0000000E4FA4EB80] [0x0000000E4FA4EB80] [0x00000000FFFFFFFF]
[0x00007FF766668ED3] ?
Args: [0x0000000E5AA94830] [0x0000000E5AA94830] [0x0000000E5AA7B940]
[0x00007FF766D4E922] ?
Args: [0x0000000000000000] [0x00007FFBF21416A0] [0x00007FFBF21416A0]
[0x00007FFBD212BE1D] crt_at_quick_exit + 125/784
Args: [0x00007FFBF21416A0] [0x0000000E5AA7B940] [0x0000000000000000]
[0x00007FFBF21416AD] BaseThreadInitThunk + 13/48
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
[0x00007FFBF2AC54F4] RtlUserThreadStart + 52/1008
Args: [0x0000000000000000] [0x0000000000000000] [0x0000000000000000]
Crash dump written to: E:\Programs\Splunk\var\log\splunk\E__Programs_Splunk_bin_splunkd_exe_crash-2018-08-10-04-40-23.dmp
Splunk ran as local administrator
HXP33715 /Windows Server 2012 R2
GetLastError(): 8
Threads running: 15
Executable module base: 0x00007FF7662F0000
Runtime: 65.111172s
argv: [splunkd search --id=remote_hxp33714_scheduler__nobody__uberAgent__RMD5e28e2a5bd72887c9_at_1533872164_93340 --maxbuckets=0 --ttl=60 --maxout=0 --maxtime=0 --lookups=1 --streaming --sidtype=normal --outCsv=true --user=splunk-system-user --pro --roles=admin:db_connect_user:dbx_user:itoa_admin:itoa_analyst:itoa_user:power:splunk-system-role:user]
Thread: "rjreaderthread", did_join=1, ready_to_run=Y, main_thread=N
First 4 bytes of Thread token @0xe5aa94844:
00000000 8c 0d 00 00 |....|
00000004
x86 CPUID registers:
0: 0000000D 756E6547 6C65746E 49656E69
1: 000306F0 04010800 FFFA3203 0FABFBFF
2: 76036301 00F0B5FF 00000000 00C30000
3: 00000000 00000000 00000000 00000000
4: 00000121 01C0003F 0000003F 00000000
5: 00000000 00000000 00000000 00000000
6: 00000077 00000002 00000009 00000000
7: 00000000 000027AB 00000000 00000000
8: 00000000 00000000 00000000 00000000
9: 00000000 00000000 00000000 00000000
A: 07300401 0000007F 00000000 00000000
B: 00000000 00000001 00000100 00000004
C: 00000000 00000000 00000000 00000000
D: 00000007 00000340 00000340 00000000
80000000: 80000008 00000000 00000000 00000000
80000001: 00000000 00000000 00000021 2C100800
80000002: 65746E49 2952286C 6F655820 2952286E
80000003: 55504320 2D354520 30393632 20347620
80000004: 2E322040 48473036 0000007A 00000000
80000005: 00000000 00000000 00000000 00000000
80000006: 00000000 00000000 01006040 00000000
80000007: 00000000 00000000 00000000 00000100
80000008: 0000302A 00000000 00000000 00000000
terminating...
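With this many crash logs, it may help to check whether every crash comes from the same thread and the same scheduled search. The sketch below pulls the crashing thread and the search ID out of a crash log like the one above. The user/app split of the search ID (`scheduler__<user>__<app>__RMD5...`) is an assumption inferred from this one sample, not a documented format, and `summarize_crash_log` is a hypothetical helper name.

```python
import re


def summarize_crash_log(text):
    """Extract the crashing thread, search ID, and (assumed) user/app."""
    summary = {"thread": None, "sid": None, "user": None, "app": None}

    thread = re.search(r"^Crashing thread:\s*(\S+)", text, re.MULTILINE)
    if thread:
        summary["thread"] = thread.group(1)

    # The search ID is passed to splunkd as --id=... on the argv line.
    sid = re.search(r"--id=(\S+)", text)
    if sid:
        summary["sid"] = sid.group(1)
        # Assumed layout: ...scheduler__<user>__<app>__RMD5<hash>_at_<epoch>_<n>
        parts = re.search(r"scheduler__(.+?)__(.+?)__RMD5", sid.group(1))
        if parts:
            summary["user"], summary["app"] = parts.group(1), parts.group(2)

    return summary


sample = """Crashing thread: rjreaderthread
argv: [splunkd search --id=remote_hxp33714_scheduler__nobody__uberAgent__RMD5e28e2a5bd72887c9_at_1533872164_93340 --maxbuckets=0 --ttl=60]
"""
print(summarize_crash_log(sample))
```

Running this over all crash logs in the directory and counting the distinct `thread` and `app` values would show whether the crashes are all triggered by scheduled searches from the uberAgent app, as the sample suggests.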