Hi all,
I’m looking for best-practice guidance on the order of operations for bringing down a distributed Splunk environment on Linux, and then the order in which to bring the servers back up. I am fine with a period of downtime to allow for operating system patching, server reboots, and so on, but I want to avoid corruption or orphaned data caused by bringing Splunk nodes down in the wrong order or without notifying the master servers. I don’t want the servers struggling to maintain the replication and search factors in a way that leads to orphaned data or problems starting services.
In short, I am looking to:
• Understand all the dependencies and the correct order of operations
• Script a graceful shutdown of the Splunk environment (a rough sketch of what I have in mind follows this list)
• Do whatever maintenance is called for, which could include rebooting the servers
• Script a graceful startup of the environment (or, in the case of reboots, determine the correct order in which to start servers with boot-start enabled); a rough startup sketch follows the environment description below
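To make the shutdown question concrete, here is a rough sketch of the kind of script I have in mind. The hostnames, the /opt/splunk install path, and the peer counts are placeholders standing in for the environment described below, and the ordering itself is only my best guess, not something I have confirmed, so corrections are very welcome:

#!/usr/bin/env bash
# Rough shutdown sketch -- hostnames and the /opt/splunk path are placeholders,
# and the ordering is my best guess, not confirmed best practice.
set -euo pipefail

SPLUNK=/opt/splunk/bin/splunk
CM=cm.example.com                                  # index cluster master / DMC
DEPLOYER=deployer.example.com                      # search head cluster deployer
DS_LM=ds.example.com                               # deployment server / license server
SEARCH_HEADS=(sh1.example.com sh2.example.com sh3.example.com)
INDEXERS=(idx1.example.com idx2.example.com idx3.example.com)
HEAVY_FWDS=(hf1.example.com hf2.example.com)

# 1) Stop the heavy forwarders first so no new data is flowing into the indexers.
for h in "${HEAVY_FWDS[@]}"; do ssh "$h" "$SPLUNK stop"; done

# 2) Stop the search heads, then the deployer, so nothing is dispatching searches.
for h in "${SEARCH_HEADS[@]}"; do ssh "$h" "$SPLUNK stop"; done
ssh "$DEPLOYER" "$SPLUNK stop"

# 3) Put the cluster master into maintenance mode so it does not kick off
#    bucket fixup while the peers go down.
ssh "$CM" "$SPLUNK enable maintenance-mode --answer-yes"

# 4) Take each indexer peer offline gracefully; "splunk offline" notifies the
#    master and then stops the peer (it may prompt for admin credentials).
for h in "${INDEXERS[@]}"; do ssh "$h" "$SPLUNK offline"; done

# 5) Stop the cluster master / DMC, then the deployment/license server last.
ssh "$CM" "$SPLUNK stop"
ssh "$DS_LM" "$SPLUNK stop"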
Here is my distributed environment for reference:
• Deployment server / license server
• Search head cluster deployer
• Multiple search heads
• Index cluster master / Distributed management console
• Multiple indexers
• Heavy forwarders
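And here is the rough startup sketch, using the same placeholder hostnames and path; the order is again only my guess (roughly the reverse of the shutdown), and if boot-start is enabled I assume I would simply reboot the boxes in this same order instead of calling splunk start:

#!/usr/bin/env bash
# Rough startup sketch -- same placeholder hosts/path as the shutdown sketch,
# and the ordering is again my best guess, not confirmed best practice.
set -euo pipefail

SPLUNK=/opt/splunk/bin/splunk
CM=cm.example.com; DEPLOYER=deployer.example.com; DS_LM=ds.example.com
SEARCH_HEADS=(sh1.example.com sh2.example.com sh3.example.com)
INDEXERS=(idx1.example.com idx2.example.com idx3.example.com)
HEAVY_FWDS=(hf1.example.com hf2.example.com)

ssh "$DS_LM" "$SPLUNK start"                       # deployment/license server first
ssh "$CM" "$SPLUNK start"                          # cluster master, still in maintenance mode
for h in "${INDEXERS[@]}"; do ssh "$h" "$SPLUNK start"; done

# Once the master reports all peers as Up, take the cluster out of maintenance
# mode so the replication and search factors can be met again.
ssh "$CM" "$SPLUNK show cluster-status"
ssh "$CM" "$SPLUNK disable maintenance-mode"

ssh "$DEPLOYER" "$SPLUNK start"
for h in "${SEARCH_HEADS[@]}"; do ssh "$h" "$SPLUNK start"; done
for h in "${HEAVY_FWDS[@]}"; do ssh "$h" "$SPLUNK start"; done   # forwarders last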
Thanks,
Brian