Hi Team,
I have onboarded Squid Proxy logs into my Splunk instance, but the logs contain URLs in various formats, so the URLs are not being extracted properly. Below are some of the URL formats present in the logs.
domain.com
domain.co.in
subdomain.domain.com
subdomain.domain.co.in
domain.com:portnumber
subdomain.domain.co.in:portnumber
IP address
IP address with http (http:\\192.168.2.20)
IP address with spaces and http (http 192.168.2.20)
domain names with http (http:\\domain.subdomain.com:portnumber)
domain names with http(s) but with space (http domain.subdomain.co.in:portnumber)
This may not be an exhaustive list of the URL formats I am getting in the logs. I have tried multiple ways to extract the field so that I get only domain.com or domain.co.in (just the main part of the URL), but in vain.
I am unable to write a field extraction that accommodates all of the above formats.
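To make the desired result concrete, here is a rough Python sketch (not Splunk SPL) of the normalization I am after. The function name main_domain and the short TWO_PART_SUFFIXES set are assumptions made up for illustration; a real solution would need a complete public suffix list, and the logs may contain formats this does not cover.

```python
import re

# Hypothetical, deliberately incomplete set of two-label suffixes (assumption).
TWO_PART_SUFFIXES = {"co.in", "co.uk", "com.au"}

# Optional scheme followed by "://", ":\\" or whitespace ("http://", "http:\\", "http ").
SCHEME_RE = re.compile(r"^https?(?::[/\\]{2}|\s+)", re.IGNORECASE)
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def main_domain(raw: str) -> str:
    """Return the main domain (or the bare IP address) from a raw URL value."""
    # Strip the scheme, whichever separator the log used.
    host = SCHEME_RE.sub("", raw.strip())
    # Drop the port, path, query string, or fragment, if any.
    host = re.split(r"[/:?#]", host, maxsplit=1)[0].lower()
    # IP addresses are returned unchanged.
    if IPV4_RE.match(host):
        return host
    labels = host.split(".")
    # Keep three labels for suffixes such as co.in, otherwise two.
    keep = 3 if ".".join(labels[-2:]) in TWO_PART_SUFFIXES else 2
    return ".".join(labels[-keep:])
```

With this sketch, a value such as "subdomain.domain.co.in:8080" would be reduced to "domain.co.in", and "http 192.168.2.20" to "192.168.2.20".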
I am looking for a solution to this problem in either of the two areas below.
1. How can logging be configured in Squid so that the URLs are written in a consistent, properly formatted way?
2. How can the improperly formatted logs that are already in Splunk be used to extract domain names, using some complex regex or search commands?