I work with a file delivery system that relies on an XML "index" file, which acts as a manifest of the files available for download in a given data set. I need to index these XML files so we can search and report on them in Splunk. Although the files are fairly simple in construction, I am having trouble getting them indexed cleanly.
Here is a sample of an XML file:
20191014T201231 \\UMECHUJX\cbm_navy\E-2D\ HFP/IAVAS 14394144 438b87856704c7c3bb5b3927d6d5959aaf997e8777e1ede40d35ff425eb45116 20190825T195424 windows10.0-kb4500641.exe ENG/ALL 1458278573 b214cf5847fdd3244fae60036c6c9aad056e5114bd8320dd460cd902bd44a456 20190906T143311 E2D_ALE_System_V091_Point_Release_20190822.zip HFP/Files 569985537 dc51058df85791c0786e6f41eba315fd044f0fc911a70763748f3ec9ee95d272 20190919T013754 1.02.02_version_update.zip
Here is my props.conf stanza:
[jkcsindex]
DATETIME_CONFIG = NONE
KV_MODE = xml
category = Structured
description = JKCS Index file
disabled = false
pulldown_type = true
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
LINE_BREAKER = (\s*)(\s*)(\s*)
REPORT-jkcsxml = jkcsxml
TRANSFORMS-nullIndexHeader = nullIndexHeader
From the transforms.conf, here is the nullIndexHeader stanza to remove the header and extra tags:
[nullIndexHeader]
REGEX = (?m)^(<\?xml)|(\)|(\)|(\<\/GenDate\>)|(\<\/Root\>)|(\<\/Heading\>)|(\<\/Record\>)|(\<\/DSIF\>)
DEST_KEY = queue
FORMAT = nullQueue
And here is the transforms.conf stanza to break out the xml tags:
[jkcsxml]
REGEX = <([^>]+)>([^<]*)\1\>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = false
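To illustrate what a tag/value regex of this kind is meant to capture, here is a minimal Python sketch. It assumes the pattern ends with an explicit closing tag (`</\1>`), which is not what the stanza shows as posted. The `Name` and `Size` tags are hypothetical stand-ins; only `Record` comes from the configuration above. Python's `re` module is used purely for illustration and may not behave identically to Splunk's regex engine or its handling of `MV_ADD` and `REPEAT_MATCH`.

```python
import re

# Assumed form of the [jkcsxml] REGEX: an opening tag, its text, and the
# matching closing tag (the closing "</" is an assumption, see lead-in).
TAG_VALUE = re.compile(r"<([^>]+)>([^<]*)</\1>")

# Hypothetical record: "Name" and "Size" are invented tag names.
sample = (
    "<Record>"
    "<Name>1.02.02_version_update.zip</Name>"
    "<Size>14394144</Size>"
    "</Record>"
)

def extract_pairs(event):
    """Collect tag -> list of values, one list entry per match."""
    fields = {}
    for tag, value in TAG_VALUE.findall(event):
        fields.setdefault(tag, []).append(value)
    return fields

fields = extract_pairs(sample)
print(fields)
```

Only leaf elements match here: `<Record>` is skipped because its content starts with another tag rather than plain text.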
So my main problem is that after all of this, when I try to output a simple table, all of the results get doubled, like this:
![alt text][1]
Why is that? Where is this duplication coming from? When I run a simple search to show the raw events, only one of each record is listed. A lot of this is cobbled together from other answers that have been posted here, and some of it I don't entirely understand. I spent all of last week fighting regular expressions just to get the fields extracted (because what works for me at regex101.com doesn't seem to work in the LINE_BREAKER setting in the Add Data GUI), but I can't figure out why the results are doubled in the table.
Help greatly appreciated!
[1]: /storage/temp/275889-capture.jpg