We have some complex logic for merging two values of a certain datatype together to produce a third, merged value. There are a number of rules governing the merge, which makes the logic fairly involved; more than I want to implement in Splunk search language directly. I want to create some generalized merge logic that can be reused across many searches, reports, summary-indexing (si) jobs, etc. throughout Splunk, so I want it to be reasonably efficient, since it will likely run often against many of our events.
My first instinct is to use a custom Python search command that wraps a Python library we already have, taking two string arguments and producing the merged result string. My first question is: how expensive is it to invoke and repeatedly reuse a Python script this way? Is there a cost for the 'context switch' from regular Splunk processing into the Python script, and can I write the script so that it does not need to read in the entire event, given that all I'm doing is merging two strings to create a third?
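
For reference, here is roughly what I'm picturing, as a minimal sketch using the Splunk Python SDK's searchcommands module (assuming the SDK is installed with the app); my_merge_lib and merge_strings are placeholders for our existing library, and the field names are just examples:

    import sys
    from splunklib.searchcommands import dispatch, StreamingCommand, Configuration, Option

    from my_merge_lib import merge_strings  # hypothetical: our existing merge library

    @Configuration()
    class MergeCommand(StreamingCommand):
        """Usage: ... | mergecmd first_field=fieldA second_field=fieldB output_field=merged"""
        first_field = Option(require=True)
        second_field = Option(require=True)
        output_field = Option(require=False, default='merged')

        def stream(self, records):
            for record in records:
                # Only the two input fields are read; everything else on the
                # record passes through untouched.
                a = record.get(self.first_field, '')
                b = record.get(self.second_field, '')
                record[self.output_field] = merge_strings(a, b)
                yield record

    dispatch(MergeCommand, sys.argv, sys.stdin, sys.stdout, __name__)

My understanding is that a streaming command like this runs as one Python process per invocation rather than per event, but I'd like confirmation on what the per-event overhead actually looks like.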
There is also the possibility of implementing a rather large hash lookup table solution: map every possible argument to an int, and store a second lookup that specifies which int should result from merging values X and Y. Does anyone know how well something like this would scale as the number of possible values for X and Y increases? Do we risk reaching a point where the data is so large that the cost in memory and lookup time becomes greater than the expense of running the calculation rules to build a new string?
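
To make the scaling concern concrete, here is a toy version of what I mean, with placeholder values; the main point is that the pair table grows roughly with the product of the distinct X and Y values:

    import sys

    # Placeholder mappings: every argument string gets an int id, and the
    # merge table maps an (X id, Y id) pair to the id of the merged result.
    value_to_id = {'foo': 0, 'bar': 1}
    id_to_value = {0: 'foo', 1: 'bar', 2: 'foobar'}
    merge_table = {(0, 1): 2}

    def merge_via_lookup(x, y):
        """Return the pre-computed merge of x and y, or None if the pair is unmapped."""
        key = (value_to_id[x], value_to_id[y])
        merged_id = merge_table.get(key)
        return id_to_value.get(merged_id)

    print(merge_via_lookup('foo', 'bar'))   # -> 'foobar'
    # Crude size check of the dict structure itself (not its keys/values);
    # in the worst case the table holds |X| * |Y| entries.
    print(sys.getsizeof(merge_table))

At some size the memory footprint and cache behavior of a table like this presumably stop beating just recomputing the merge, which is the crossover point I'm asking about.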