I was curious to see how Splunk (7.3.1) handles escape sequences in JSON strings, so I created a test file of JSON Lines:
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}
(For the purposes of this question, please overlook the `code` and `time` properties.)
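As a baseline for comparison, here is a minimal sketch of what a standard JSON parser makes of these `test` values. It uses Python's built-in `json` module; the helper name `decode_test_values` is mine, purely for illustration, and none of this says anything about how Splunk parses the data internally.

```python
import json

# The four lines of the test file. Raw strings keep the \uXXXX escapes
# exactly as they appear in the file, so the JSON parser resolves them.
lines = [
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}',
    r'{"code":"variant-characters","time":"2019-10-15T10:00:00+08:00","test":"¬ (not sign): \u00AC"}',
]


def decode_test_values(json_lines):
    """Parse each JSON line and return the decoded `test` values (hypothetical helper)."""
    return [json.loads(line)["test"] for line in json_lines]


if __name__ == "__main__":
    for value in decode_test_values(lines):
        print(value)
```

With a conforming parser, each escape sequence resolves to the same character that appears at the start of its `test` value, including the not sign.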
In particular, I was curious to see whether (and when) Splunk resolves the escape sequences in the `test` property values.
I was happy to see that it does:
![alt text][1]
But wait: where's the not sign?
I looked at the raw events in Splunk Web:
{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}
{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}
{"time":"2019-10-15T10:00:00+08:00","test":"@ (commercial at): \u0040"}
{"time":"2019-10-15T10:00:00+08:00","test":"| (vertical bar): \u007c"}
Note:
* In case you're wondering, I use a transform to remove the `code` property.
* My `props.conf` file specifies `KV_MODE = json`.
**Splunk replaced the not sign in the original incoming JSON Lines with the character sequence `\xAC`!**
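While `AC` is the correct hexadecimal Unicode code point for the not sign (U+00AC), `\x` is not a valid escape sequence in JSON! The only escapes JSON allows in strings are `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t`, and `\uXXXX`.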
By introducing this escape sequence, Splunk has corrupted the JSON.
This looks like a bug to me.
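To illustrate why I call the raw event corrupted, here is a small sketch that feeds two of the raw events, as shown in Splunk Web, to Python's built-in `json` module. The helper name `is_valid_json` is hypothetical, and this only checks the text against a standard parser; it says nothing about Splunk's own JSON handling.

```python
import json

# Raw events as displayed in Splunk Web (raw strings preserve the backslashes).
raw_not_sign = r'{"time":"2019-10-15T10:00:00+08:00","test":"\xAC (not sign): \u00AC"}'
raw_hash = r'{"time":"2019-10-15T10:00:00+08:00","test":"# (number sign, hash): \u0023"}'


def is_valid_json(text):
    """Return True if a standard JSON parser accepts the text (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        # \x is not one of the escapes JSON allows, so the parser rejects it.
        return False


if __name__ == "__main__":
    print(is_valid_json(raw_hash))      # True
    print(is_valid_json(raw_not_sign))  # False
```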
I'm wondering what makes the not sign "special": why does it get this "bogus" (in the context of JSON) escaping when the other characters don't? I note that the other characters are more readily available on a standard US keyboard.
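One difference I can observe is that the other three characters are ASCII, while the not sign is not. The sketch below only illustrates that difference in code points and byte encodings; the helper name `describe` is hypothetical. Whether the encoding of my input file plays any role in Splunk's behavior is an assumption I have not verified.

```python
def describe(ch):
    """Return (code point, UTF-8 bytes as hex, ISO-8859-1 bytes as hex, is ASCII)."""
    return (
        f"U+{ord(ch):04X}",
        ch.encode("utf-8").hex(),
        ch.encode("latin-1").hex(),
        ord(ch) < 0x80,
    )


if __name__ == "__main__":
    # The four characters from the test file; "\u00ac" is the not sign.
    for ch in ["|", "@", "#", "\u00ac"]:
        print(ch, describe(ch))
```

The not sign is the only one of the four that is a single byte `ac` in ISO-8859-1 but two bytes `c2ac` in UTF-8; the other three are the same single byte in both encodings.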
**My question(s)**
* Is this behavior a bug, as I suspect?
* How many other characters are affected by this behavior?
[1]: /storage/temp/275847-screenshot-splunk-search-variant-characters-stats.png