I was having problems with one of my heavy forwarders (Splunk 6.6.3) running on Windows 2008, so I noted what inputs I had, uninstalled it, and then installed version 7.0.1. After re-adding my configurations and restoring my connections, I started Splunk and got the following:
Checking prerequisites...
Checking http port [8000]: open
Checking mgmt port [8089]: open
Checking appserver port [127.0.0.1:8065]: open
Checking kvstore port [8191]: open
Checking configuration... Done.
Checking critical directories... Done
Checking indexes...
Validated: _audit _internal _introspection _telemetry _thefishbucket
Done
Bypassing local license checks since this instance is configured with a remote license master.
Checking filesystem compatibility... Done
Checking conf files for problems...
Done
Checking default conf files for edits...
Validating installed files against hashes from 'C:\Program Files\Splunk\splunk-7.0.1-2b5b15c4ee89-windows-64-manifest'
All installed files intact.
Done
All preliminary checks passed.
Starting splunk server daemon (splunkd)...
Splunkd: Starting (pid 3540)
Done
Waiting for web server at https://127.0.0.1:8000 to be available................
................................................................................
........................................................................ Done
It took about 30 minutes before I saw "Done", but the web server wasn't running. Checking the log files, I found three messages that seemed to be related:
ERROR UiPythonFallback - Appserver running on port 8065 exited unexpectedly: exited with code 1
ERROR UiHttpListener - An applicaiton server has exited unexpectedly, web UI cannot be used until it is restarted
WARN UiHttpListener - Web UI now stopped
(yes, application is spelled wrong in the logs)
Nothing was listening on port 8065 when I checked with netstat and the Resource Monitor. I tried rebooting, to no avail. I found the following posts:
https://answers.splunk.com/answers/545000/slow-splunkweb-startup-caused-by-splunk-instrument.html
https://answers.splunk.com/answers/563807/why-does-splunkweb-in-662-take-so-long-to-start.html
https://answers.splunk.com/answers/211525/how-to-troubleshoot-why-we-are-getting-appserver-p.html
and tried the changes suggested there, but I still had problems.
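For anyone who wants to repeat the port check without netstat, here is a minimal sketch in Python. It only tests whether something accepts a TCP connection on a given host and port. The function name `port_is_listening` is made up for this example, and 8065 and 8000 are simply the appserver and web ports from the startup output above.

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    Hypothetical helper for illustration; it does not identify which
    process owns the port, only whether something accepts connections.
    """
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from the startup output in this post
    for port in (8065, 8000):
        state = "listening" if port_is_listening("127.0.0.1", port) else "not listening"
        print(f"127.0.0.1:{port} is {state}")
```

A result of "not listening" for 8065 would match what netstat showed me: the appserver never came up, rather than another process holding the port.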
After seeing this post: https://answers.splunk.com/answers/616294/unable-to-start-splunk-1.html
I looked around but didn't see a repair option for splunk.
I turned on debugging (http://docs.splunk.com/Documentation/Splunk/latest/Troubleshooting/Enabledebuglogging)
and found the following:
7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreBulletinBoardManager - Failed to start KV Store process. See mongod.log and splunkd.log for details.
host = splunk-04
message = Failed to start KV Store process. See mongod.log and splunkd.log for details.
3/7/18
7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not start mongo instance. Initialization failed.
host = splunk-04
message = Could not start mongo instance. Initialization failed.
3/7/18
7:00:52.429 PM
03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not get ping from mongod.
host = splunk-04
message = Could not get ping from mongod.
3/7/18
7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store changed status to failed. KVStore process terminated.
host = splunk-04
message = KV Store changed status to failed. KVStore process terminated.
3/7/18
7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR KVStoreBulletinBoardManager - KV Store process terminated abnormally (exit code 100, status exited with code 100). See mongod.log and splunkd.log for details.
host = splunk-04
message = KV Store process terminated abnormally (exit code 100, status exited with code 100). See mongod.log and splunkd.log for details.
3/7/18
7:00:44.067 PM
03-07-2018 19:00:44.067 -0500 ERROR MongodRunner - mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.
host = splunk-04
message = mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.
Which led me to these posts:
http://docs.splunk.com/Documentation/Splunk/6.4.2/Admin/StartSplunk#Start_Splunk_Enterprise_on_Windows_in_legacy_mode
https://answers.splunk.com/answers/514443/after-editing-the-kv-store-for-my-custom-app-why-d.html#answer-521153
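As a side note, the KV Store errors above can be tallied by component with a short script, which makes it easier to see what is failing when the web UI is down. This is a minimal sketch that assumes log lines follow the layout shown in the events above (date, time, UTC offset, level, component, then " - " and the message). The function name `summarize` and the regular expression are my own, not anything Splunk ships.

```python
import re
from collections import Counter

# Assumed layout, based on the lines quoted in this post:
# 03-07-2018 19:00:52.429 -0500 ERROR Component - message text
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<tz>[+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<component>\S+) - "
    r"(?P<message>.*)$"
)


def summarize(lines, levels=("ERROR", "WARN")):
    """Count log lines per (level, component) for the given levels.

    Lines that do not match the assumed layout are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("component"))] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the events in this post
    sample = [
        "03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not start mongo instance. Initialization failed.",
        "03-07-2018 19:00:52.429 -0500 ERROR KVStoreConfigurationProvider - Could not get ping from mongod.",
        "03-07-2018 19:00:44.067 -0500 ERROR MongodRunner - mongod exited abnormally (exit code 100, status: exited with code 100) - look at mongod.log to investigate.",
    ]
    for (level, component), n in summarize(sample).most_common():
        print(level, component, n)
```

To run it against a real file, pass an open file handle for splunkd.log instead of the sample list.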
After I changed the permissions on the whole Splunk folder, I restarted Splunk AGAIN and now have zero ERRORs in my logs, but it STILL takes over 30 minutes to start, and the web interface still won't work.
I then saw these errors:
WARN HttpListener - Socket error from 127.0.0.1 while idling: error:1408F10B:SSL routines:SSL3_GET_RECORD:wrong version number
ERROR UiPythonFallback - Appserver at http://127.0.0.1:8065 never started up!
I searched and found these posts, but they don't seem to be of much help:
https://answers.splunk.com/answers/7899/splunkweb-fails-to-start-timeout-when-binding-to-port.html
https://answers.splunk.com/answers/507379/how-to-resolve-splunk-web-not-starting-after-the-h.html
It IS forwarding logs, so the basic functionality is there. The logs are Windows server logs collected via WMI calls, but they are arriving about 10 minutes behind.
I checked the firewall: for splunkd I have an any/any rule for incoming traffic, but nothing for outgoing. (Hopefully that is the simple answer?)
Does anyone have any other suggestions?
Thanks