Hi all,
For reference, I've seen this Splunk Answer post, but it doesn't quite get me where I want: https://answers.splunk.com/answers/297026/extract-value-from-json-array-of-objects.html?utm_source=typeahead&utm_medium=newquestion&utm_campaign=no_votes_sort_relev
I have logs that time how long each client's browser takes to load different resources from our server. These logs are at a page level, so each log has information about upwards of 20 different resources that were loaded for each page. The logs look something like:
{
  Context: {
    Database: DatabaseA
    RequestContext: {
      CallTree: 0
      Path: /heartbeat/boomerang
      RequestId: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    }
    User: Training User
    UserId: 1
  }
  Data: {
    restiming[0][rt_con_end]: 2094.2379999905825
    restiming[0][rt_con_st]: 2094.2379999905825
    restiming[0][rt_dns_end]: 2094.2379999905825
    restiming[0][rt_dns_st]: 2094.2379999905825
    restiming[0][rt_dur]: 70.596999998088
    restiming[0][rt_fet_st]: 2094.2379999905825
    restiming[0][rt_in_type]: script
    restiming[0][rt_name]: https://myWebsite:80/path/to/someJsFile.js
    restiming[0][rt_req_st]: 2158.332999999402
    restiming[0][rt_res_end]: 2164.8349999886705
    restiming[0][rt_res_st]: 2158.7020000006305
    restiming[0][rt_st]: 2094.2379999905825
    restiming[1][rt_con_end]: 2334.4559999968624
    restiming[1][rt_con_st]: 2334.4559999968624
    restiming[1][rt_dns_end]: 2334.4559999968624
    restiming[1][rt_dns_st]: 2334.4559999968624
    restiming[1][rt_dur]: 26.542999999946915
    restiming[1][rt_fet_st]: 2334.4559999968624
    restiming[1][rt_in_type]: css
    restiming[1][rt_name]: https://myWebsite:80/Images/notebook-skin.png
    restiming[1][rt_req_st]: 2336.8760000012117
    restiming[1][rt_res_end]: 2360.9989999968093
    restiming[1][rt_res_st]: 2360.43499999505
    restiming[1][rt_st]: 2334.4559999968624
    ...
  }
}
Unfortunately, the information for each resource is not a proper JSON array; it has been flattened into indexed keys such as restiming[0][rt_dur]. What I want to do is zip all of the information for each resource into a multi-value field, then mvexpand that field so that I have one log per resource. After that, I can aggregate data about how long it takes on average to load different images, JavaScript files, or CSS files.
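To make the target shape concrete, here is a minimal sketch in Python (outside Splunk, purely as an illustration) of the regrouping and aggregation I'm after. The `data` dict, the helper names `regroup` and `average_duration`, and the assumption that every resource carries `rt_name`, `rt_in_type`, and `rt_dur` are mine for the sake of the example; the values are copied from the sample log above.

```python
import re
from collections import defaultdict

# Hypothetical input: a subset of the flattened keys shown under Data above.
data = {
    "restiming[0][rt_dur]": "70.596999998088",
    "restiming[0][rt_in_type]": "script",
    "restiming[0][rt_name]": "https://myWebsite:80/path/to/someJsFile.js",
    "restiming[1][rt_dur]": "26.542999999946915",
    "restiming[1][rt_in_type]": "css",
    "restiming[1][rt_name]": "https://myWebsite:80/Images/notebook-skin.png",
}

# Captures the array index and the field name from a flattened key.
KEY_PATTERN = re.compile(r"^restiming\[(\d+)\]\[(\w+)\]$")


def regroup(flat):
    """Group flattened restiming keys into one dict per array index."""
    by_index = defaultdict(dict)
    for key, value in flat.items():
        match = KEY_PATTERN.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            by_index[index][field] = value
    # Sort by the numeric index so resources keep their array order.
    return [by_index[i] for i in sorted(by_index)]


def average_duration(resources):
    """Average rt_dur for each (rt_in_type, rt_name) pair."""
    durations = defaultdict(list)
    for res in resources:
        durations[(res["rt_in_type"], res["rt_name"])].append(float(res["rt_dur"]))
    return {key: sum(vals) / len(vals) for key, vals in durations.items()}


resources = regroup(data)
averages = average_duration(resources)
```

Because each value is grouped by the index taken from its own key, a resource's name, type, and duration cannot drift out of alignment, which is the property I want from the Splunk search as well.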
I was thinking something like my search below might work, but I'm not sure that it would guarantee that each resource would line up with the appropriate time:
{baseSearch}
| rex max_match=0 "\"restiming\[[0-9]+\]\[rt_name\]\":\"(?<resourceFile>.+?)\""
| rex max_match=0 "\"restiming\[[0-9]+\]\[rt_in_type\]\":\"(?<resourceType>.+?)\""
| rex max_match=0 "\"restiming\[[0-9]+\]\[rt_dur\]\":\"(?<resourceLoadTime>.+?)\""
| eval resourceDetails=mvzip(mvzip(resourceFile, resourceType, ":::"), resourceLoadTime, ":::")
| mvexpand resourceDetails
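| eval resourceDetails=split(resourceDetails, ":::")
| eval resourceFile=mvindex(resourceDetails, 0), resourceType=mvindex(resourceDetails, 1), resourceLoadTime=tonumber(mvindex(resourceDetails, 2))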
| stats avg(resourceLoadTime) by resourceType, resourceFile
My questions about this search are: (1) Am I guaranteed that the three fields I extract will line up for the mvzip? If not, is there a way I can order them by array index? (2) Is there a better way to do this, perhaps using the ```foreach``` command to apply some logic to each array element? (3) Do you have any general tips for turning this flattened JSON array back into a standard JSON array? It's pretty frustrating to have to lean on the ```rex``` command so much.
Hopefully the example search gets across what I'm trying to do, but let me know if you need more information.