Channel: Questions in topic: "splunk-enterprise"

need help indexing a simple XML file

I work with a file delivery system that relies on an XML "index" file, which acts as a manifest of the files available for download in a given data set. I need to index these XML files so we can search and report on them in Splunk. The files are fairly simple in construction, but I am having trouble getting them indexed cleanly.

Here is a sample of an XML file:

    20191014T201231\\UMECHUJX\cbm_navy\E-2D\HFP/IAVAS14394144438b87856704c7c3bb5b3927d6d5959aaf997e8777e1ede40d35ff425eb4511620190825T195424windows10.0-kb4500641.exeENG/ALL1458278573b214cf5847fdd3244fae60036c6c9aad056e5114bd8320dd460cd902bd44a45620190906T143311E2D_ALE_System_V091_Point_Release_20190822.zipHFP/Files569985537dc51058df85791c0786e6f41eba315fd044f0fc911a70763748f3ec9ee95d27220190919T0137541.02.02_version_update.zip

Here is my props.conf stanza:

    [jkcsindex]
    DATETIME_CONFIG = NONE
    KV_MODE = xml
    category = Structured
    description = JKCS Index file
    disabled = false
    pulldown_type = true
    SHOULD_LINEMERGE = false
    NO_BINARY_CHECK = true
    LINE_BREAKER = (\s*)(\s*)(\s*)
    REPORT-jkcsxml = jkcsxml
    TRANSFORMS-nullIndexHeader = nullIndexHeader

From transforms.conf, here is the nullIndexHeader stanza, which removes the header and extra tags:

    [nullIndexHeader]
    REGEX = (?m)^(<\?xml)|(\)|(\)|(\<\/GenDate\>)|(\<\/Root\>)|(\<\/Heading\>)|(\<\/Record\>)|(\<\/DSIF\>)
    DEST_KEY = queue
    FORMAT = nullQueue

And here is the transforms.conf stanza that breaks out the XML tags:

    [jkcsxml]
    REGEX = <([^>]+)>([^<]*)\1\>
    FORMAT = $1::$2
    MV_ADD = true
    REPEAT_MATCH = false

My main problem is that after all of this, when I output a simple table, every result is doubled, like this:

![alt text][1]

Why is that? Where is the duplication coming from? When I run a simple search to show the raw events, each record is listed only once.

A lot of this is cobbled together from other answers posted here, and some of it I don't entirely understand. I spent all of last week fighting regular expressions just to get the fields extracted (what works for me at regex101.com doesn't seem to work in the LINE_BREAKER setting of the Add Data GUI), but I can't figure out why the results in the table are doubled. Any help is greatly appreciated!

[1]: /storage/temp/275889-capture.jpg
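For anyone who wants to check the tag/value extraction outside Splunk, here is a minimal Python sketch of what the [jkcsxml] regex appears to be aiming for. It rests on several assumptions: the pattern is written with an explicit closing tag (`</\1>`), which is a guess at the intended form and not the REGEX exactly as posted; the tag names and values in `SAMPLE_EVENT` are invented; and `extract_fields` is a hypothetical helper, not anything Splunk provides. Python's `re` module is also not the regex engine Splunk uses, so this only checks the logic of the pattern.

```python
import re

# Assumed intent of the [jkcsxml] REGEX: capture a tag name, capture its text,
# and require a closing tag with the same name via the \1 backreference.
TAG_VALUE = re.compile(r"<([^>]+)>([^<]*)</\1>")

# Hypothetical record: these tag names and values are made up for illustration.
SAMPLE_EVENT = (
    "<FileName>example_update.zip</FileName>"
    "<FileSize>12345</FileSize>"
    "<Checksum>abc123</Checksum>"
)


def extract_fields(event):
    """Return tag -> list of values, keeping every value found for a tag."""
    fields = {}
    for tag, value in TAG_VALUE.findall(event):
        # Append instead of overwrite, so a tag that appears twice keeps both values.
        fields.setdefault(tag, []).append(value)
    return fields


fields = extract_fields(SAMPLE_EVENT)
for name, values in fields.items():
    print(name, values)
```

Because values are appended per tag, running the same extraction over an event twice and merging the results would leave two copies of every value under each field name, which makes this a handy way to see what a doubled field looks like before comparing it with the table output.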
