Good morning (or afternoon) fellow Splunkers,
We've got an issue that has us quite perplexed. I'll post all information that I find relevant, but feel free to request more. The only similar problem I've found is **["Why is a Splunk embedded dashboard failing with search head clustering?"](https://answers.splunk.com/answers/242577/why-is-a-splunk-embedded-dashboard-failing-with-se.html)**. While the problem sounds identical to ours, we do not have search head clustering, and our version is 7.2.4.
We have a dashboard. The dashboard contains Simple XML, JavaScript, and CSS. However, the issue only relates to one base search referencing a saved search in Simple XML, so the JS and CSS should be irrelevant. There is a saved search that runs every minute, and its results expire after two minutes. Here is the base search in the Simple XML:
```xml
<search id="base">
  <query>| loadjob savedsearch=[user]:[app]:[savedsearch]
| rename xyz as abc
| dedup abc</query>
  <earliest>-90s</earliest>
  <latest>now</latest>
  <refresh>2m</refresh>
  <refreshType>delay</refreshType>
  <sampleRatio>1</sampleRatio>
</search>
```
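For context, the scheduling and expiry described above correspond to something like the following in `savedsearches.conf` (the stanza name and query are placeholders - the real saved search is not shown in the post; `dispatch.ttl = 2p` keeps artifacts for two scheduled periods, i.e. two minutes on a one-minute schedule):

```ini
[my_saved_search]
# Placeholder query - the real search is not shown in the post
search = index=main sourcetype=placeholder | stats count by xyz
enableSched = 1
# Run every minute
cron_schedule = * * * * *
# Keep artifacts for 2 scheduled periods (= 2 minutes here)
dispatch.ttl = 2p
```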
Note that we only go back 90 seconds (the saved search runs every minute), and that the dashboard panel should refresh every 2 minutes. This base search goes directly to 9 single value panels (using `status_indicator_app`), as seen below:

```xml
<viz type="status_indicator_app.status_indicator">
  <title>$tok01$ Title</title>
  <search base="base">
    <query>search abc="$tok01$"
| fields - abc
| fieldformat count=round(count) . "%"</query>
  </search>
</viz>
```
That same code is repeated 9 times (on 3 rows) with tok01 through tok09. Here's where it gets interesting. **Everything** above works fine. Anyone can pull up the dashboard, and all 9 status indicator app visualizations display their proper values. *However*, when the dashboard sits overnight, more often than not, one or more of the panels (random ones - which ones changes every time) hit the error in the title. They remain broken, with no single value or visualization, until either the panel or the dashboard is **manually** refreshed. The error shows up roughly once every 30 minutes in the developer console. It does **not** show on page load - it only appears over time.
```
Failed to load resource: the server responded with a status of 500 (Internal Server Error)
```

It is accompanied by a URL:

```
https://192.168.0.1/en-US/splunkd/__raw/services/search/jobs/[JobSID]/results_preview?output_mode=json&search=[URL encoding of the search in viztok01]=1569356939184
```
Which simply displays this:

```json
{"messages":[{"type":"FATAL","text":"Unknown sid."}]}
```
Which would make me think that it's referencing an expired SID. However, this is not the case. All panels show "<1m ago" as the latest refresh time, and clicking the inspector on each panel references the same (base) search. Clicking into `search.log` shows a normal log with no errors for both good and bad panels.
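One way to confirm which artifacts are actually alive is to list the current jobs over REST - a sketch, assuming the saved search name appears as the job `label` and that you have permission to read other users' jobs:

```
| rest /services/search/jobs
| search label="[savedsearch]"
| table sid label dispatchState ttl updated
```

Comparing the `sid` values here against the `[JobSID]` in the failing URL would show whether the browser is still polling an artifact that has already aged out.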
After all the research I've done this week, I'm going to try this **["Auto refresh a dashboard"](https://answers.splunk.com/answers/508962/auto-refresh-a-dashboard.html)** option (which I am pretty confident will work), however I still cannot fathom how this happens in the first place. If the panels are identical and reference the same base search that works fine, either all of them should break or none of them.
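For anyone landing here later, that option amounts to a full-page refresh via the `refresh` attribute on the root element (value in seconds) - a sketch, assuming Simple XML supports the attribute on 7.2.4:

```xml
<dashboard refresh="300">
  <label>Status Dashboard</label>
  <!-- rows/panels with the base search and 9 single value panels as above -->
</dashboard>
```

The idea is that a full page reload re-dispatches everything, rather than letting a long-lived page keep polling an artifact that may have expired.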
Does anyone have any ideas at all? Am I missing something here?