Hi,
At my company we have noticed that for a small share of records (1-2%), the data we see in Splunk does not match the data in the IIS logs. When we looked into the issue, we noticed something very strange.
When we click "Show Source" on one of the bad records, we see what appears to be one of our IIS log records. On closer inspection, however, the record is actually parts of 2 different records concatenated together. Here is how our Splunk data compares with our IIS data.
I've added the field names above each record for clarity.
Record shown in Splunk "Show Source"
Field1 Field2 Field3 Field4 Field5
AAAA BBBB CCCC DDbbb cccc
IIS Record matching first part of Splunk "Show Source" Record
Field1 Field2 Field3 Field4 Field5
AAAA BBBB CCCC DDDD EEEE
IIS Record matching second part of Splunk "Show Source" Record
Field1 Field2 Field3 Field4 Field5
aaaa bbbb cccc dddd eeee
As you can see, the record shown in Splunk's "Show Source" is actually parts of 2 records concatenated together. It takes the first part of one record, up to some arbitrary location, and then appends part of some other record, beginning from some arbitrary location. The split point is not at a consistent character offset and does not respect field boundaries, since it can fall in the middle of a field value. Splunk then indexes this new record, which is throwing our metrics off.
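For anyone who wants to check their own data for the same pattern, here is a minimal Python sketch of how a suspect record could be compared against the raw IIS lines. It assumes the lines are already loaded into a list of strings; the function names (`common_prefix_len`, `find_splice`) are made up for illustration and are not part of Splunk or IIS.

```python
from typing import List, Optional, Tuple


def common_prefix_len(a: str, b: str) -> int:
    """Return how many leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def find_splice(indexed: str, source_lines: List[str]) -> Optional[Tuple[int, int, int]]:
    """Look for two source lines that could explain a spliced record.

    Returns (head_index, tail_index, cut), where the first `cut` characters
    of `indexed` match the start of source_lines[head_index] and the rest
    of `indexed` appears somewhere inside source_lines[tail_index].
    Returns None if the record is an exact source line or no pair is found.
    """
    if indexed in source_lines:
        return None  # record matches the IIS log exactly, nothing to explain
    for i, head in enumerate(source_lines):
        cut = common_prefix_len(indexed, head)
        if cut == 0 or cut == len(indexed):
            continue  # no shared prefix, or the record is only a truncation
        tail = indexed[cut:]
        for j, other in enumerate(source_lines):
            if j != i and tail in other:
                return i, j, cut
    return None


iis_lines = [
    "AAAA BBBB CCCC DDDD EEEE",
    "aaaa bbbb cccc dddd eeee",
]
print(find_splice("AAAA BBBB CCCC DDbbb cccc", iis_lines))
```

This is a brute-force comparison, so on a large log it would make sense to restrict `source_lines` to records near the suspect record's timestamp.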
We first came across this issue when we noticed some cases where multiple login IDs were associated with a single session ID. After drilling down, we determined that this was caused by the concatenation occurring on typically just one record, where the session ID from one record is concatenated with another record that contains a different user ID.
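The check that surfaced the problem can be sketched roughly as follows. This is only an illustration in Python, assuming the session ID and user ID have already been extracted from each record as a pair of strings; the function name `sessions_with_multiple_users` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sessions_with_multiple_users(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Group user IDs by session ID and keep sessions with more than one user."""
    users_by_session: Dict[str, Set[str]] = defaultdict(set)
    for session_id, user_id in pairs:
        users_by_session[session_id].add(user_id)
    return {
        session_id: users
        for session_id, users in users_by_session.items()
        if len(users) > 1
    }


# Hypothetical sample data: session "s1" picks up a second user ID
# from a single bad record.
records = [
    ("s1", "alice"),
    ("s1", "alice"),
    ("s1", "bob"),
    ("s2", "carol"),
]
print(sessions_with_multiple_users(records))
```

Any session this flags is only a candidate; the record behind it still has to be compared against the IIS log to confirm it was spliced.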