Hello,
We would like to use the following search as an input for the ML Toolkit Numeric Field Prediction (DecisionTreeRegressor):
`index=mlbso violation sourcetype=*BWP_hanatraces* | timechart span=10s count as mlbso_hana_composite_ooms`
The idea is to count the occurrences of the word "violation" (out of memory errors) in the specific sourcetype, create the field mlbso_hana_composite_ooms out of it, and predict on it. Now, the point is that in order to improve the prediction quality we need to run the above search over a time span reaching long enough into the past, ideally several months. This, however, produces the error:
> The specified span would result in too many (>50000) rows
I understand where the error comes from (maxresultrows in limits.conf), but we do not want to extend the span; we would actually even want to reduce it to 1 second. Nor do we want to make the time window smaller; if anything, we would rather extend it to several months.
When I reduce the selected time window, or set the time span higher in order to stay below 50,000 rows, it works fine but returns only around 3,000 events. So, in short, out of 50,000 samples I get 3,000 events where the word "violation" was found, and with this the quality of the DecisionTreeRegressor is not good enough.
How would I overcome this issue?
Am I overlooking something?
Kind Regards,
Kamil
↧
ML Toolkit - limit of 50000 rows
↧
Cluster configuration, new IP segment
Good morning,
We need to change the entire IP segment of our cluster, since the interfaces will be changed to higher-speed (fiber) ones.
Is there any recommendation for this activity? For example, enabling maintenance mode, changing the IPs of the master, the deployer, etc.?
This configuration is found in the server.conf of each machine. Will it be as easy as editing this file and updating it to the new IPs?
Regards
↧
↧
Failed attempting to parse transport header
Hello,
I have created a custom search command in Splunk as a python script. When I run the command in Splunk SPL I get the following error message:
ERROR ChunkedExternProcessor - Failed attempting to parse transport header:
Does anyone have any suggestions?
This is the content of my commands.conf:
[sankey]
chunked = true
enableheader = false
filename = system_python.path
command.arg.1 = sankey.py
↧
Error while reading messages from queues - JMS modular input
I am consuming messages using the JMS modular input.
For one connection, I need to refresh the queue connection by disabling it and enabling it again to start re-consuming. I need to do this every 5 minutes.
It gives the following error:
message from "python /opt/splunk/etc/apps/jms_ta/bin/jms.py" Exception in thread "Thread-40" java.lang.OutOfMemoryError: Java heap space
May I know how to handle this?
@Damien Dallimore
↧
Smart Exporter Button not working
I've installed the Smart PDF Exporter, but the new button that now shows up doesn't work. PhantomJS isn't installed, and the configuration was skipped since we don't use the scheduled export. Any ideas what we should do?
↧
↧
How can I list missing and new events?
Hello,
Every day, Splunk forwarders collect system events of different types (warnings, errors, informational, critical) from machines. These events are constantly changing: new ones appear, others disappear, and some stay.
What I want to do is count the number of missing/new events for every day and list them in a table by their id and type, roughly as sketched below.
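Something along these lines is roughly what I have in mind, comparing yesterday's set of events with today's (only a rough sketch; the index and the field names event_id and type are placeholders for whatever the forwarders actually send):
index=system_events earliest=-2d@d latest=@d
| eval day=if(_time >= relative_time(now(), "-1d@d"), "today", "yesterday")
| stats values(day) as seen_on by event_id, type
| eval status=case(mvcount(seen_on)==2, "still present", seen_on=="today", "new", seen_on=="yesterday", "missing")
| where status!="still present"
| table event_id, type, status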
Thanks in advance
↧
Routing based on host and sourcetype.
Hi,
I am routing traffic to a 3rd party. I have done some of this based on a host and some based on a sourcetype.
But I now need to route based on a host and a sourcetype combined, and I can't work out how to do it. Roughly, I am picturing something like the sketch below.
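This is keying a transform on the host while attaching it to the sourcetype (just a sketch; the sourcetype, host, group name, and IP are placeholders, and I am not certain about the MetaData:Host syntax):
# props.conf on the forwarder/indexer doing the routing
[my_sourcetype]
TRANSFORMS-route_to_3rdparty = route_host_and_sourcetype
# transforms.conf
[route_host_and_sourcetype]
SOURCE_KEY = MetaData:Host
REGEX = host::myhost01
DEST_KEY = _TCP_ROUTING
FORMAT = third_party_group
# outputs.conf
[tcpout:third_party_group]
server = 10.0.0.50:9997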
Any tips on where to look?
↧
Getting total without join
I am trying to see the number of devices in a fleet by location without a specific setting applied. The data I have coming in shows the location, all devices in that location and the settings applied to each device.
![alt text][1]
OG: the location; MacAddress: devices missing the specific setting; Model: type of device; Profile status: Missing "Specific setting"; Not installed count: count of devices in that location missing the setting; TotalCount: total number of devices at that location regardless of compliance with the setting; Compliance: just the percentage of devices at that location that have the correct setting.
I am joining the same search to get the total number of devices at the location. The reason I am doing it this way is because I am filtering with a where in my search that would impact all values, when I don't want it to impact my TotalCount. Is there any way I can do this differently?
index=nitro_apps source="DATA"
| rex "^[^,\n]*,(?P[^,]+)[^,\n]*,(?P[^,]+)(?:[^ \n]* ){4}(?P[^,]+),(?P[^,]+),\w+,(?P[^,]+),\d+,(?P\d+),(?P[^,]+)[^,\n]*,(?P\w+)"
| search "OG"=* Model="180"
| eval IsProfileInstalled=if(in(Profile, "OID"),"true","false")
| dedup MacAddress
| where 'IsProfileInstalled'=="false"
| eval Profile_Status="Missing "+ "OID"
| stats list(MacAddress) as MacAddress, list(Model) as Model, list(Profile_Status) as Profile_Status, count as Not_Installed_Count by OG
| join
[ search index=nitro_apps source="DATA"
| rex "^[^,\n]*,(?P[^,]+)[^,\n]*,(?P[^,]+)(?:[^ \n]* ){4}(?P[^,]+),(?P[^,]+),\w+,(?P[^,]+),\d+,(?P\d+),(?P[^,]+)[^,\n]*,(?P\w+)"
| search "OG"=*
| where 'Model'="180"
| stats dc(MacAddress) as TotalCount by OG]
| eval Compliance_% = round(((TotalCount-Not_Installed_Count)/TotalCount)*100, 2)
| table OG MacAddress Model Profile_Status Not_Installed_Count TotalCount Compliance_%
[1]: /storage/temp/255021-screen-shot-2018-09-24-at-101116-am.png
I also feel my dedup is ruining some of my data. Overall, the question is: can I be doing this better? I believe it can be optimized significantly.
↧
After installing the Smart PDF Exporter, why is my Smart Exporter button not working?
I've installed the Smart PDF Exporter, but the new button that now shows up doesn't work. PhantomJS isn't installed, and the configuration was skipped since we don't use the scheduled export. Any ideas what we should do?
↧
↧
Why is the Splunk Add-on for Microsoft Windows not reporting disk space correctly?
So, I am creating a dashboard that will report on disk usage over 75%...
I have deployed the Splunk Add-on for Microsoft Windows to a couple of servers to help get data into CIM format... and save time running down the correct inputs for the perfmon Disk Usage.
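The kind of search I had in mind for the panel is roughly this (I am not yet sure I have the right index, sourcetype, and field names for the add-on's perfmon data):
index=perfmon sourcetype="Perfmon:LogicalDisk" counter="% Free Space" instance!="_Total"
| stats latest(Value) as pct_free by host, instance
| eval pct_used = round(100 - pct_free, 2)
| where pct_used > 75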
Anywho... These servers have two logical disks that are 60 GB apiece.
C: Drive 39.4 / 60 GB = 65.6666...
D: Drive 47.5 /60 GB = 79.1666...
Neither is above 85% when looking directly on the box, but Splunk is reporting 17.1919...% for %_Free_Space?
The math doesn't quite add up.
But when I divide 39.4 by 47.5 it returns 0.82947, i.e. 82.947%, and 100% - 82.947% = 17.05%... not spot on, but much closer to the reported percentage?
Any clues where I or the add-on is going wrong?
↧
How do we deal with a new ip segment for our cluster?
Good morning,
We need to change the entire IP segment of our cluster, since the interfaces will be changed to higher-speed (fiber) ones.
Is there any recommendation for this activity? For example, enabling maintenance mode, changing the IPs of the master, the deployer, etc.?
This configuration is found in the server.conf of each machine. Will it be as easy as editing this file and updating it to the new IPs? Some of the entries I expect to have to touch are sketched below.
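For example, these are the kinds of entries I think we would have to update (the IPs and ports below are placeholders, and which stanzas apply depends on each machine's role):
# server.conf on an indexer cluster peer or search head
[clustering]
master_uri = https://10.0.1.10:8089
# server.conf on a search head cluster member
[shclustering]
mgmt_uri = https://10.0.1.20:8089
conf_deploy_fetch_url = https://10.0.1.30:8089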
Regards
↧
Failed attempting to parse transport header ERROR when running custom search command
Hello,
I have created a custom search command in Splunk as a python script. When I run the command in Splunk SPL I get the following error message:
ERROR ChunkedExternProcessor - Failed attempting to parse transport header:
Does anyone have any suggestions?
This is the content of my commands.conf:
[sankey]
chunked = true
enableheader = false
filename = system_python.path
command.arg.1 = sankey.py
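For comparison, the simplest chunked (search command protocol v2) stanza I have seen documented, assuming the script sits directly in the app's bin directory, would be:
[sankey]
chunked = true
filename = sankey.py
I am not sure whether the .path indirection I am using plays well with chunked = true, which is partly why I am including this for comparison.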
↧
How do scheduled searches work in a cluster?
Hi team,
If I have created scheduled searches/jobs on a standalone search head, SH A, and after a couple of months we add two more search heads, B and C, and make it a cluster, how do the scheduled searches work in the cluster?
1. Since the searches were initially created on SH A, will they always run on SH A?
2. If the answer to the above question is yes, then if SH A goes down at the scheduled time for whatever reason, will they run on SH B or SH C?
OR
3. Does the captain of the SH cluster decide where to run the scheduled searches in the cluster?
4. If we have 5 scheduled jobs or searches, do we need to manually create them, a couple on each SH, to disperse the load?
How do they work? Please help me.
Thanks,
SM
↧
↧
2FA (Duo) for specific role/authentication method?
Hi guys,
I was wondering if it's possible to configure Splunk so that only our local admin users (people who have the admin role) need to sign in with 2FA/Duo?
We currently have local Splunk accounts for our admin users, and other roles are assigned using an LDAP configuration that integrates with our Active Directory. We don't want anyone who isn't an admin to need 2FA.
Can this be done?
Cheers!
↧
How do scheduled searches work in a cluster?
Hi team,
If I have created scheduled searches/jobs on one of our standalone search heads (Search Head "A"), and after a couple of months we add two more search heads ("B" and "C") and make it a cluster, how do the scheduled searches work in the cluster?
1. Since the searches were initially created on Search Head "A", will they always run on Search Head "A"?
2. If the answer to the above question is yes, then if SH A goes down at the scheduled time for whatever reason, will they run on SH B or SH C?
OR
3. Does the captain of the search head cluster decide where to run the scheduled searches in the cluster?
4. If we have 5 scheduled jobs or searches, do we need to manually create them, two on each search head, to disperse the load?
How do they work? Please help me.
Thanks,
SM
↧
Artifactory Logs into Splunk
I want to integrate Artifactory with Splunk to see the Artifactory logs. Is there any way to do that?
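For example, would a plain file monitor on a forwarder installed on the Artifactory host be the right approach? Something like this (the log path, index, and sourcetype below are just my guesses):
# inputs.conf on the forwarder on the Artifactory server
[monitor:///opt/jfrog/artifactory/var/log/*.log]
index = artifactory
sourcetype = artifactory:log
disabled = false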
↧
Is it possible to get total without the join command?
I am trying to see the number of devices in a fleet by location without a specific setting applied. The data I have coming in shows the location, all devices in that location and the settings applied to each device.
![alt text][1]
OG: the location; MacAddress: devices missing the specific setting; Model: type of device; Profile status: Missing "Specific setting"; Not installed count: count of devices in that location missing the setting; TotalCount: total number of devices at that location regardless of compliance with the setting; Compliance: just the percentage of devices at that location that have the correct setting.
I am joining the same search to get the total number of devices at the location. The reason I am doing it this way is because I am filtering with a `where` in my search that would impact all values, when I don't want it to impact my TotalCount. Is there any way I can do this differently?
index=nitro_apps source="DATA"
| rex "^[^,\n]*,(?P[^,]+)[^,\n]*,(?P[^,]+)(?:[^ \n]* ){4}(?P[^,]+),(?P[^,]+),\w+,(?P[^,]+),\d+,(?P\d+),(?P[^,]+)[^,\n]*,(?P\w+)"
| search "OG"=* Model="180"
| eval IsProfileInstalled=if(in(Profile, "OID"),"true","false")
| dedup MacAddress
| where 'IsProfileInstalled'=="false"
| eval Profile_Status="Missing "+ "OID"
| stats list(MacAddress) as MacAddress, list(Model) as Model, list(Profile_Status) as Profile_Status, count as Not_Installed_Count by OG
| join
[ search index=nitro_apps source="DATA"
| rex "^[^,\n]*,(?P[^,]+)[^,\n]*,(?P[^,]+)(?:[^ \n]* ){4}(?P[^,]+),(?P[^,]+),\w+,(?P[^,]+),\d+,(?P\d+),(?P[^,]+)[^,\n]*,(?P\w+)"
| search "OG"=*
| where 'Model'="180"
| stats dc(MacAddress) as TotalCount by OG]
| eval Compliance_% = round(((TotalCount-Not_Installed_Count)/TotalCount)*100, 2)
| table OG MacAddress Model Profile_Status Not_Installed_Count TotalCount Compliance_%
[1]: /storage/temp/255021-screen-shot-2018-09-24-at-101116-am.png
I also feel my `dedup` is ruining some of my data. Overall, the question is: can I be doing this better? I believe it can be optimized significantly.
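One idea I had was to compute the total with eventstats before the filtering, instead of the join, roughly like this (an untested sketch that reuses the same rex and fields as above):
index=nitro_apps source="DATA"
| rex "<same rex as above>"
| search "OG"=* Model="180"
| eval IsProfileInstalled=if(in(Profile, "OID"),"true","false")
| dedup MacAddress
| eventstats dc(MacAddress) as TotalCount by OG
| where 'IsProfileInstalled'=="false"
| eval Profile_Status="Missing "+ "OID"
| stats list(MacAddress) as MacAddress, list(Model) as Model, list(Profile_Status) as Profile_Status, count as Not_Installed_Count, max(TotalCount) as TotalCount by OG
| eval "Compliance_%"=round(((TotalCount-Not_Installed_Count)/TotalCount)*100, 2)
| table OG MacAddress Model Profile_Status Not_Installed_Count TotalCount Compliance_%
Would that give the same TotalCount as the subsearch, or am I missing something?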
↧
↧
Only show results that do not contain value
I want to see devices that do not have a specific value. I am organizing my devices by Mac Address, and I am trying to see the ones that do not have a profile named WifiProfile_X.
I keep getting the ones that do contain this profile, but I need to see the ones that don't... any ideas? My current attempt is below.
index=nitro_apps source="DATA" | rex "^[^,\n]*,(?P[^,]+)[^,\n]*,(?P[^,]+)(?:[^ \n]* ){4}(?P[^,]+),(?P[^,]+),\w+,(?P[^,]+),\d+,(?P\d+),(?P[^,]+)[^,\n]*,(?P\w+)" | search OG=7 AND Model=180 AND Profile NOT "WifiProfile_X"
|stats list(MacAddress), list(Model), dc(MacAddress)
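I was also wondering whether I should roll everything up per device first and then filter at the device level, something like this (an untested sketch on top of the same base search and rex):
index=nitro_apps source="DATA"
| rex "<same rex as above>"
| search OG=7 AND Model=180
| stats values(Profile) as Profiles by MacAddress, Model
| where isnull(mvfind(Profiles, "WifiProfile_X"))
| stats list(MacAddress) as MacAddress, list(Model) as Model, dc(MacAddress) as DeviceCount
The thinking is that a device can have many profile events, so excluding individual events that mention WifiProfile_X still leaves the device's other events in the results.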
↧
How do I make a search that compares weekly values and shows the percentage difference between fields?
Hi,
I would like to compare 1 week of tabled data to the previous week and calculate the percentage difference for each value of the field note_label.
Initial search:
search... | stats count by note_label
note_label    count
abc           10
abcd          20
abcde         30
I would like to show the data as:
note_label    count (week1)    count (week2)    %Change
abc           10               20               100%
abcd          20               5                -75%
abcde         40               60               50%
I may be following the wrong route, as I tried the search below but had no luck, and may need to use a different method. This search only gives me the "note_label" field value names, but not the values.
earliest=-1w latest=now my_search | stats earliest(note_label) as e_status_label latest(note_label) as l_note_label | eval 1w=(l_note_label-e_note_label)/e_note_label*100
| appendcols [ search earliest=-2w latest=now my_search | stats earliest(note_label) as e_note_label latest(note_label) as l_note_label | eval 2w=(l_note_label-e_note_label)/e_note_label*100 ]
| fields note_label 1w 2w
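Is something like this closer to the right track? (A rough, untested sketch; the week1/week2 column names just come from the eval.)
my_search earliest=-2w@w latest=@w
| eval week=if(_time >= relative_time(now(), "-1w@w"), "week2", "week1")
| chart count over note_label by week
| eval "%Change"=round((week2 - week1) / week1 * 100, 1)
| rename week1 as "count (week1)", week2 as "count (week2)"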
Thanks
↧
When using a search as an input for the Machine Learning (ML) Toolkit Numeric Field Prediction, how do we overcome the following error?
Hello,
We would like to use the following search as an input for the ML Toolkit Numeric Field Prediction (DecisionTreeRegressor):
index=mlbso violation sourcetype=*BWP_hanatraces* | timechart span=10s count as mlbso_hana_composite_ooms
The idea is to count the occurrences of the word "violation" (out of memory errors) in the specific source type, create the field "mlbso_hana_composite_ooms" out of it, and then predict on it. Now, in order to improve the prediction quality, we need to run the above search over a time span reaching long enough into the past, ideally several months. This, however, produces the following error:
> The specified span would result in too many (>50000) rows
I understand where the error comes from (maxresultrows in limits.conf), but we do not want to extend the span; we would actually even want to reduce it to 1 second. Nor do we want to make the time window smaller; if anything, we would rather extend it to several months.
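For reference, I believe this is the setting behind that limit, shown here with what I understand to be the default value (I assume it could be raised, but I am not sure that is advisable):
# limits.conf
[searchresults]
maxresultrows = 50000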
When I reduce the selected time window, or set the time span higher in order to stay below 50,000 rows, it works fine but returns only around 3,000 events. So, in short, out of 50,000 samples I get 3,000 events where the word "violation" was found, and with this the quality of the DecisionTreeRegressor is not good enough.
How would I overcome this issue?
Am I overlooking something?
Kind Regards,
Kamil
↧