## What I've read
I ask this question after reading the following Splunk Dev articles, among others:
* "[Getting data in](http://dev.splunk.com/view/dev-guide/SP-CAAAE3A)"
* "[Logging best practices](http://dev.splunk.com/view/logging-best-practices/SP-CAAADP6)"
* "[Building in telemetry with high-performance data collection](http://dev.splunk.com/view/dev-guide/SP-CAAAE7B#sendingthedata)"
* "[High volume HTTP Event Collector data collection using distributed deployment](http://dev.splunk.com/view/event-collector/SP-CAAAE73)"
## Why I'm asking this question
Some background:
* One Splunk usage scenario I need to consider involves loading many thousands of events into Splunk periodically (such as daily) or ad hoc, rather than one event or a few events as they occur, in real time.
* The original event records are in hundreds of different proprietary binary formats. However, for the purpose of this question, I am dealing with the event records after they have been converted into a text format of my choice, such as JSON.
* Some of the events contain hundreds of fields (or, if you prefer, properties or key-value pairs), resulting in JSON of a few kilobytes per event. 5 KB per event is common; many events are much smaller (only a few fields), and some are larger (many fields; long field values).
* Some of the original binary event records contain nested structures, such as repeating groups. Depending on whether it's important to preserve the granularity of the original data for analysis in Splunk, I can either flatten such structures when converting (aggregating multiple values into a single average or total) or, when converting to a format such as JSON that inherently supports nesting, preserve them.
* I don't mean to be coy about the platform on which these event records originate, but I'd like an answer that is independent of that platform, given the following: it's a UNIX environment that conforms to the POSIX 1003.2 standard, and it's not one of the operating systems for which Splunk offers a Universal Forwarder.
## Some potential answers
* **Splunk HTTP Event Collector** (HEC; works a treat using Java `HttpURLConnection`, or even just cURL, but not necessarily the most performant; see the sketch after this list)
* **TCP** or **UDP** (Splunk Dev: "In terms of performance, UDP/TCP transfer ranks as the highest performing method")
* Monitor files and directories (for example, FTP from the originating remote system to a file system available to Splunk, then use the **batch input** type; a sample `inputs.conf` stanza appears after this list)
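
To make the HEC option concrete, here is a minimal sketch of batched HEC ingestion using `HttpURLConnection`, assuming a placeholder host, port, and token (none of these values are real). HEC accepts multiple events per POST as concatenated JSON objects, which amortizes the HTTP overhead across the batch; whether that approaches the throughput of a raw TCP input is part of what I'm asking:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class HecBatchSender {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and token -- substitute your own values.
        URL url = new URL("https://splunk.example.com:8088/services/collector/event");
        String token = "YOUR-HEC-TOKEN";

        // HEC accepts multiple events per POST: concatenated JSON objects,
        // one per event, in a single request body.
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            body.append("{\"event\":{\"key1\":\"value").append(i)
                .append("\",\"key2\":").append(i).append("}}");
        }

        // Note: with Splunk's default self-signed certificate, this HTTPS
        // call would fail TLS verification; trust setup is omitted here.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "Splunk " + token);
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.toString().getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HEC responded: " + conn.getResponseCode());
    }
}
```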
Redis also occurs to me as a possibility, although, to my knowledge, it is not an ingestion method directly supported by Splunk; I mention it because I use Redis to forward logs to a different analytics software (not Splunk).
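
For the TCP and batch options in the list above, here is a hedged sample of the corresponding `inputs.conf` stanzas; the port, directory, index, and sourcetype names are placeholders, not recommendations:

```ini
# TCP input: listen on port 9999 for raw events from any sending host.
[tcp://:9999]
sourcetype = converted_events
index = main

# Batch input: index files dropped (e.g., via FTP) into this directory,
# then delete them; move_policy = sinkhole is required for batch inputs.
[batch:///opt/staging/events]
move_policy = sinkhole
sourcetype = _json
index = main
```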
## Data format?
A related sub-question: setting aside the method of data *transport*, what event data *format* does Splunk ingest most efficiently?
For example, assuming that each event consists of a flat list of fields, with no nested structures (no repeating groups), should I use:
* JSON
* "Non-JSON" (syslog-like) key-value pairs (key1=value1, key2=value2, key3=value3 ...)
In particular, if I use HEC, then, even though the body of the HTTP POST request to HEC is JSON, what is a better choice for the *value* of the `event` property:

    { "event": { "key1": "string_value1", "key2": numeric_value2 } }

or

    { "event": "key1=\"string_value1\", key2=numeric_value2" }

If the data *does* contain nested structures, is JSON the most efficient format for ingestion?
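
For illustration, if nesting is preserved, one converted event with a repeating group might be posted to HEC as something like the following; the field names are invented, and `_json` is Splunk's pretrained JSON sourcetype:

```json
{
  "sourcetype": "_json",
  "event": {
    "record_id": "A123",
    "samples": [
      { "t": 1, "value": 10.5 },
      { "t": 2, "value": 11.0 },
      { "t": 3, "value": 10.8 }
    ]
  }
}
```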