I have two indexers in a cluster configured with a replication factor of 2 and a search factor of 2. Everything was fine until a couple of weeks ago, when an error crept in. The problem began before I upgraded the cluster to 6.4.0 on Sunday, and I've been trying to sort it out since. The log snippets below keep repeating in splunkd.log on each indexer:
Indexer 1
04-19-2016 12:26:32.815 -0500 INFO S2SFileReceiver - event=onFileAborted bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E remoteError=false
04-19-2016 12:26:32.815 -0500 INFO CMReplicationRegistry - Finished replication: bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E target=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:32.815 -0500 INFO CMSlave - bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E tgt=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 failing=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 queued replication error job
04-19-2016 12:26:32.815 -0500 INFO S2SFileReceiver - event=onFileAborted replicationType=eJournalReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E bucketType=cold remoteError=false status='success'
04-19-2016 12:26:32.847 -0500 INFO CMRepJob - job=CMReplicationErrorJob bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 failingGuid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 srcGuid=E6B3EBCE-6024-4A1E-9CC6-3237336E287E tgtGuid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 succeeded
04-19-2016 12:26:35.428 -0500 INFO CMSlave - event=addBucket bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 status=NonStreamingTarget ss=Unsearchable mask=0 earliest=0 latest=0 standalone=0
04-19-2016 12:26:35.428 -0500 INFO CMSlave - bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 addTargetInProgress=true
04-19-2016 12:26:35.428 -0500 INFO CMReplicationRegistry - Starting replication: bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E target=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:35.428 -0500 INFO S2SFileReceiver - event=onFileOpened replicationType=eJournalReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E bucketType=cold path=/splunkdata/var/lib/splunk/mu-syslog/colddb/441_BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6/rawdata/journal.gz searchable=false
04-19-2016 12:26:35.447 -0500 INFO CMSlave - addTargetDone bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 status=success addTargetInProgress=false
04-19-2016 12:26:36.750 -0500 INFO S2SFileReceiver - event=onDoneReceived replicationType=eJournalReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:36.751 -0500 INFO S2SFileReceiver - event=rename bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 from=/splunkdata/var/lib/splunk/mu-syslog/colddb/441_BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 to=/splunkdata/var/lib/splunk/mu-syslog/colddb/rb_1459818423_1459812147_441_BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:36.751 -0500 INFO CMSlave - bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 Transitioning status from=NonStreamingTarget to=Complete for reason="cold success (target)"
04-19-2016 12:26:36.751 -0500 INFO DatabaseDirectoryManager - addReplicatedBucket bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 dstPath='/splunkdata/var/lib/splunk/mu-syslog/colddb/rb_1459818423_1459812147_441_BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6'
04-19-2016 12:26:36.751 -0500 ERROR S2SFileReceiver - event=onFileClosed replicationType=eJournalReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 state=eComplete src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E bucketType=cold status=failed err="bucket is already registered, registered not as a streaming hot target (SPL-90606)"
04-19-2016 12:26:36.751 -0500 WARN S2SFileReceiver - event=processFileSlice bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 msg='aborting on local error'
Indexer 2
04-19-2016 12:26:30.571 -0500 INFO CMReplicationRegistry - Finished replication: bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E target=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:30.571 -0500 INFO CMSlave - bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E tgt=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 failing=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 queued replication error job
04-19-2016 12:26:30.580 -0500 INFO CMRepJob - job=CMReplicationErrorJob bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 failingGuid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 srcGuid=E6B3EBCE-6024-4A1E-9CC6-3237336E287E tgtGuid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 succeeded
04-19-2016 12:26:31.341 -0500 INFO CMReplicationRegistry - Starting replication: bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 src=E6B3EBCE-6024-4A1E-9CC6-3237336E287E target=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:31.341 -0500 INFO BucketReplicator - event=asyncReplicateBucket bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 to guid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 host=xxx.xxx.xx.xxx s2sport=8090
04-19-2016 12:26:31.341 -0500 INFO BucketReplicator - bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 earliest=1459812147 latest=1459818423 type=3
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - Created asyncReplication task to replicate bucket mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 to guid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 host=xxx.xxx.xx.xxx s2sport=8090 bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - event=startBucketReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - Starting replication of bucket=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 to 128.206.15.196:8090;
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - Replicating warm bucket=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 node=guid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 host=xxx.xxx.xx.xxx s2sport=8090 bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - event=finishBucketReplication bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 [et=1459812147 lt=1459818423 type=3]
04-19-2016 12:26:31.342 -0500 INFO BucketReplicator - event=localReplicationFinished type=cold bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6
04-19-2016 12:26:31.354 -0500 INFO BucketReplicator - Connection for idx=xxx.xxx.xx.xxx:8090:mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 successful
04-19-2016 12:26:32.818 -0500 WARN BucketReplicator - Failed to replicate warm bucket bid=mu-syslog~441~BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 to guid=BBF7C0FC-BC6B-48FE-8E54-DD93348F29F6 host=xxx.xxx.xx.xxx s2sport=8090. Connection closed.
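In case it helps anyone reproduce what I'm looking at, here is a minimal sketch of how the repeating WARN/ERROR lines can be pulled out of splunkd.log and counted per bucket. This is a throwaway helper, not a Splunk tool: the names `parse_line` and `summarize_problems` are hypothetical, and the regexes assume the line layout shown in the excerpts above (timestamp, level, component, then `key=value` pairs).

```python
import re
from collections import Counter

# Assumed layout: "MM-DD-YYYY HH:MM:SS.mmm +ZZZZ LEVEL Component - message"
LINE_RE = re.compile(
    r'^(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4}) '
    r'(?P<level>[A-Z]+) (?P<component>\S+) - (?P<msg>.*)$'
)
# key=value pairs; a value may be double-quoted, single-quoted, or bare
KV_RE = re.compile(r'''(\w+)=("[^"]*"|'[^']*'|\S+)''')


def parse_line(line):
    """Split one splunkd.log line into timestamp, level, component and fields.

    Returns None for lines that do not match the assumed layout.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Strip surrounding quotes from quoted values such as err="..."
    rec['fields'] = {k: v.strip('"\'') for k, v in KV_RE.findall(rec['msg'])}
    return rec


def summarize_problems(lines):
    """Count WARN/ERROR lines grouped by (level, component, bid)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and rec['level'] in ('WARN', 'ERROR'):
            key = (rec['level'], rec['component'], rec['fields'].get('bid'))
            counts[key] += 1
    return counts
```

Feeding it the lines of splunkd.log (for example `summarize_problems(open(path))`, where `path` is wherever your log lives) gives a count per bucket ID, which makes it easier to see whether a single bucket or several are stuck in the loop.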
Suggestions please!
Thanks.