Can someone describe the conditions the cluster master waits for when scheduling restarts of cluster peers after I have run "splunk apply cluster-bundle"?
We have 8 peers in total: 3 in site1, 2 in site2, and 3 in site3.
We have not changed the percent_peers_to_restart value from its default of 10 percent.
When we run "splunk apply cluster-bundle" and the CM kicks off a rolling restart of the 8 cluster peers, we regularly see more than one indexer down at once, and often more than one indexer down in the same site.
As I understand it this should not happen, hence my wanting to understand what the CM waits for before it starts the next peer's restart.
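For reference, this is roughly how we push the bundle and watch its progress from the CM (illustrative only; assumes the commands are run as the splunk user on the cluster master, and --answer-yes just suppresses the confirmation prompt):

# Push the configuration bundle to the peers and trigger the rolling restart
splunk apply cluster-bundle --answer-yes

# Poll validation/restart progress as the CM reports it
splunk show cluster-bundle-status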
I have increased the following setting from its default:
[clustering]
restart_timeout = 300
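Both of those settings live in the [clustering] stanza of server.conf on the CM, so our effective config should look something like this sketch (the 10 percent line is just the documented default spelled out; the mode line is included only for context):

[clustering]
mode = master
percent_peers_to_restart = 10
restart_timeout = 300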
Of our 8 peers, 6 come back up in 5-7 minutes, but 2 take up to 20 minutes.
By "start", I mean they start, and check what buckets they have in place and report them to the cluster master.
It does not look like the CM waits for a peer to complete that activity before kicking off the restart of the next peer, so people running searches during the rolling restart regularly get incomplete-results warnings.
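For what it's worth, this is how we watch the CM's view of the peers while the restart is in flight (again illustrative; run on the CM, and the host and credentials in the curl line are placeholders):

# CLI view of peer status and whether the replication/search factors are currently met
splunk show cluster-status

# Same information via the CM's REST API, if you want to poll it from a script
curl -sk -u admin:changeme "https://localhost:8089/services/cluster/master/peers?output_mode=json"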
Thanks