Hi all,
I built a pre-production indexer cluster using CloudFormation and it works like a champ. However, following the same procedure in production gives me an unusable cluster. I have searched all over Google and Splunk Answers, but found only *similar* issues, nothing matching this.
Here are the relevant cluster master config files.
**inputs.conf**

```
[default]
host = ip-10-88-172-231
```

**server.conf**

```
[sslConfig]
sslKeysfilePassword = xxxx

[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial

[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder

[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free

[general]
pass4SymmKey = xxxx
serverName = ip-10-88-172-231

[clustering]
mode = master
pass4SymmKey = xxxx
```
Here are the relevant indexer config files.
**inputs.conf**

```
[default]
host = ip-10-88-172-234

[splunktcp-ssl:9997]
compressed = true
connection_host = none

[SSL]
password = xxxx
requireClientCert = true
rootCA = /opt/splunk/etc/auth/certs/ca.pem
serverCert = /opt/splunk/etc/auth/certs/indexer.pem
```

**server.conf**

```
[sslConfig]
sslKeysfilePassword = xxxx

[lmpool:auto_generated_pool_download-trial]
description = auto_generated_pool_download-trial
quota = MAX
slaves = *
stack_id = download-trial

[lmpool:auto_generated_pool_forwarder]
description = auto_generated_pool_forwarder
quota = MAX
slaves = *
stack_id = forwarder

[lmpool:auto_generated_pool_free]
description = auto_generated_pool_free
quota = MAX
slaves = *
stack_id = free

[replication_port://9887]

[general]
pass4SymmKey = xxxx
serverName = ip-10-88-172-234

[clustering]
master_uri = https://10.88.172.231:8089
mode = slave
pass4SymmKey = xxxx
```
I am getting an error in the Web UI stating:

```
One or more replicated indexes may not be fully searchable. Some search results may be incomplete or duplicated as we try to fix up your cluster. For more information, check the cluster manager page on the master - splunkd URI: https://127.0.0.1:8089
```
splunkd.log on the cluster master shows me this for all three indexers:

```
11-05-2015 17:25:25.375 +0000 ERROR HttpClientRequest - HTTP client error: Read Timeout (while accessing https://10.88.172.233:8089/services/server/info)
11-05-2015 17:25:25.376 +0000 ERROR HttpClientRequest - HTTP client error: Connection reset by peer (while accessing http://10.88.172.233:8089/services/server/info)
11-05-2015 17:25:25.376 +0000 WARN DistributedPeerManagerHeartbeat - Unable to get server info from peer: http://10.88.172.233:8089 due to: Connection reset by peer
```
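For reference, that heartbeat sequence can be imitated by hand from the master. This is just a sketch: 10.88.172.233 is one of the peer IPs from the log above, and `--max-time` simply keeps curl from hanging the way splunkd's heartbeat appears to.

```shell
# Sketch only -- run from the cluster master.
# First the HTTPS call that times out in the log, then the plain-HTTP
# call that gets "Connection reset by peer".
curl -k --max-time 10 https://10.88.172.233:8089/services/server/info
curl --max-time 10 http://10.88.172.233:8089/services/server/info
```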
splunkd.log on the indexers shows me this:

```
11-05-2015 17:33:00.641 +0000 WARN HttpListener - Socket error from 10.88.172.231 while idling: error:1407609C:SSL routines:SSL23_GET_CLIENT_HELLO:http request
```
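As I understand it, that OpenSSL error means the listener received a plaintext HTTP request on an SSL socket, which is consistent with the master's plain-HTTP retry in its own log. A quick sketch to confirm whether the management port actually speaks TLS (peer IP assumed from the logs above):

```shell
# Sketch only -- if port 8089 speaks TLS, s_client should complete a
# handshake and print the server certificate chain.
echo | openssl s_client -connect 10.88.172.233:8089 2>/dev/null | head -20
```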
So first of all, it looks like the cluster master falls back to plain HTTP after its HTTPS request to a peer times out. On the working cluster, I can log on to any indexer and get valid results from this curl call:
`curl https://localhost:8089/services/server/info -k`
On the non-working cluster, that call hangs. However, these calls DO work:

```
[root@ip-10-88-172-233 local]# curl https://localhost:8089/services -k
Unauthorized
[root@ip-10-88-172-233 local]# curl https://localhost:8089/services/server -k
Unauthorized
```
But if I add /info, the call hangs for a very long time. I have looked everywhere and cannot find any information on this.
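In case it helps anyone reproduce this, here is a bounded, verbose version of the hanging call (a sketch: `--max-time` stops the hang, and `-v` shows whether the TLS handshake completes before the request stalls):

```shell
# Sketch only -- run on an affected indexer. With -v you can see whether
# the stall happens during the TLS handshake or after the request is sent.
curl -kv --max-time 15 https://localhost:8089/services/server/info
```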