Hi,
We have an incoming custom dataset of approximately 700 GB a day that is used for CIM. It is currently in key-value format. There is a proposal to change it to CSV, which would reduce the dataset by approximately 60%, to 280 GB a day. The data savings are quite significant. We know the client is fixed, so the lack of flexibility is NOT an issue.
**Existing format in every line**
service="retail" source_port="514" dest_port="22" destination_ip="1.2.3.4" source_ip="7.2.3.4"
**Proposed format in every line**
"retail",514,22,"1.2.3.4","7.2.3.4"
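To make the size comparison concrete, here is a minimal Python sketch that converts the sample key-value line into the proposed CSV layout and measures how much shorter it is. The field order, the choice of which fields are numeric, and the helper names `kv_to_csv` and `savings` are assumptions taken from the two sample lines, not from the real feed. It also ignores the syslog header, which is the same in both formats and would dilute the savings somewhat.

```python
import shlex

# Assumed column order, taken from the sample lines in this post.
FIELD_ORDER = ["service", "source_port", "dest_port", "destination_ip", "source_ip"]
# Assumed: ports are written unquoted in the CSV form, everything else is quoted.
NUMERIC_FIELDS = {"source_port", "dest_port"}


def kv_to_csv(line):
    """Convert one key="value" line into the proposed CSV line."""
    # shlex.split strips the double quotes: service="retail" -> service=retail
    pairs = dict(token.split("=", 1) for token in shlex.split(line))
    cells = []
    for name in FIELD_ORDER:
        value = pairs[name]
        if name in NUMERIC_FIELDS:
            cells.append(value)
        else:
            # Quote string fields and double any embedded quotes, CSV style.
            cells.append('"' + value.replace('"', '""') + '"')
    return ",".join(cells)


def savings(kv_line, csv_line):
    """Fraction of characters saved by the CSV form of the event body."""
    return 1 - len(csv_line) / len(kv_line)


kv = ('service="retail" source_port="514" dest_port="22" '
      'destination_ip="1.2.3.4" source_ip="7.2.3.4"')
csv_line = kv_to_csv(kv)
print(csv_line)
print(f"saved: {savings(kv, csv_line):.0%}")
```

On the sample line, the saving lands in the same ballpark as the roughly 60% quoted above, which is what you would expect since most of the key-value line is field names.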
The key question is: **from a performance point of view, would there be an impact if we use CIM on the CSV format**? Also, would it have a negative impact on tsidx creation? The data arrives as syslog, and files are rotated at 100 MB (if it matters). I've tried a smaller subset on my test machine, but I couldn't find any change in performance with a small amount of data. I would like to hear from anyone with experience of this.