Channel: Questions in topic: "splunk-enterprise"

How can I test if I am overfitting?

Hi, I would like to know whether I am overfitting. Why are my results too good? The algorithm has never seen the JUNE dataset; I trained it on the MAY dataset, yet the predictions are very good. I have also tested with a "dummy" dataset, the one that comes by default with MLTK, and there the results are bad. I suspect my SPL may be wrong, but I am not sure. Thank you!

![alt text][1]

TRAIN: load the **MAY** dataset (company A), normalize `logins`, find outliers with DBSCAN, remove them, then fit a linear regression and save it as a model.

```
| inputlookup fortigate_QC_May2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| fit DBSCAN "SS_logins"
| where NOT isOutlier==-1
| fit LinearRegression SS_logins from * into "authentication_profiling_LinearRegression"
```

TEST: load the **JUNE** dataset (company A), normalize `logins`, apply the saved model, and show actual vs. predicted values.

```
| inputlookup fortigate_QC_June2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| apply "authentication_profiling_LinearRegression"
| table _time, "SS_logins", "predicted(SS_logins)"
```

TESTING WITH DUMMY DATASET: `logins.csv` is the dummy dataset, whose logins come from company B. Normalize it, apply the model trained on company A, and show actual vs. predicted values.

```
| inputlookup logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| apply "authentication_profiling_LinearRegression"
| table _time, "SS_logins", "predicted(SS_logins)"
```

This is the plot with the dummy dataset. The results are not good.

![alt text][2]

[1]: /storage/temp/273287-data.png
[2]: /storage/temp/273288-data1.png
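One way to put a number on "too good" is to compute the same error metric on the training (MAY) and test (JUNE) predictions and compare them: a training error far below the test error is the usual sign of overfitting. The search below is a minimal sketch that reuses the TEST search unchanged and adds a root-mean-squared-error calculation using only the core `eval` and `stats` commands, not an MLTK scoring command. The field names `sq_err`, `mse`, `n`, and `rmse` are hypothetical, and the inline notes assume the default `comment` macro is available in your Splunk version.

```
| inputlookup fortigate_QC_June2019_logins.csv
| fit StandardScaler "logins" with_mean=false with_std=true
| apply "authentication_profiling_LinearRegression"
| eval sq_err = pow('SS_logins' - 'predicted(SS_logins)', 2) `comment("squared residual per event; sq_err is a hypothetical field name")`
| stats avg(sq_err) as mse count as n
| eval rmse = sqrt(mse) `comment("root mean squared error, in standardized (SS_) units")`
```

Appending the last three lines to the TRAIN search, after the `fit LinearRegression` step (which also outputs `predicted(SS_logins)` for the training rows), gives the training RMSE for a side-by-side comparison.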
