Channel: Questions in topic: "splunk-enterprise"

Index a specific table (forum) of a web page so I can kick off reports (based on a time frame)

Hello! Here is what I'm trying to do: index one particular section of a web page. That section is a forum that is updated constantly, and the only column I'm interested in is the one titled "Subject". How do I accomplish this without ending up with duplicate entries? Duplicates are what I get with my current approach.

Currently I run the following in PowerShell:

$wc.DownloadString("https://website.com/forum123/") > C:\PS_Output\Output.txt

Then I index Output.txt and use Splunk to extract a named field with a regex that finds occurrences of a particular string (e.g., 4 consecutive capital letters). But each time Output.txt is overwritten (when I run $wc.DownloadString twice, seconds apart), I get a lot of duplicates.

I believe I have 2 problems:

1) I need to clean up Output.txt so it contains only the relevant events, without all the surrounding HTML source. Perhaps I need to apply a regex to the output of $wc.DownloadString?
2) The tricky part is how quickly the web page's table fills with new posts. If I run this every minute, but all 50 posts are replaced by 50 new posts within 30 seconds, I lose about half the content I need.

Has anyone out there tried grabbing content from an external site (without admin access to its server, of course) and keeping historical data? Thanks!
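One possible way to address both problems is a small poller that keeps only the Subject column and remembers what it has already written, so running it every few seconds appends only new posts to a file for Splunk to monitor. The sketch below uses Python's standard library instead of PowerShell and rests on assumptions that are not in the question: that the forum renders posts as an HTML table with a header cell reading exactly "Subject" and explicitly closed cells, and that the subject text is unique enough to serve as a de-duplication key. The names `SubjectColumnParser`, `extract_subjects`, `new_subjects`, `poll_once`, and the state/log file paths are hypothetical.

```python
import json
import os
import time
import urllib.request
from html.parser import HTMLParser


class SubjectColumnParser(HTMLParser):
    """Collects the text of the cell under the "Subject" header in each table row.

    Assumes explicitly closed <td>/<th> cells and no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.subject_index = None  # column position of the "Subject" header, once seen
        self.subjects = []
        self._col = -1             # index of the current cell within its row
        self._in_cell = False
        self._is_header = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._col = -1         # new row: restart column counting
        elif tag in ("td", "th"):
            self._col += 1
            self._in_cell = True
            self._is_header = tag == "th"
            self._text = []

    def handle_data(self, data):
        if self._in_cell:          # text inside <a>, <b>, etc. within a cell is kept too
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag not in ("td", "th") or not self._in_cell:
            return
        self._in_cell = False
        text = " ".join("".join(self._text).split())  # collapse whitespace
        if self._is_header:
            if text == "Subject":
                self.subject_index = self._col
        elif self._col == self.subject_index and text:
            self.subjects.append(text)


def extract_subjects(html):
    """Return the Subject-column texts of a page, in page order."""
    parser = SubjectColumnParser()
    parser.feed(html)
    parser.close()
    return parser.subjects


def new_subjects(html, state_path):
    """Return subjects not recorded in state_path, then record them there."""
    seen = set()
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as f:
            seen = set(json.load(f))
    # dict.fromkeys drops repeats within one page while keeping page order
    fresh = [s for s in dict.fromkeys(extract_subjects(html)) if s not in seen]
    with open(state_path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen | set(fresh)), f)
    return fresh


def poll_once(url, log_path, state_path):
    """Fetch the page once and append only unseen subjects, one event per line."""
    with urllib.request.urlopen(url) as resp:
        page = resp.read().decode("utf-8", errors="replace")
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        for subject in new_subjects(page, state_path):
            log.write(f'{stamp} subject="{subject}"\n')
```

A call such as `poll_once("https://website.com/forum123/", r"C:\PS_Output\subjects.log", r"C:\PS_Output\seen_subjects.json")` could be scheduled every 10 to 15 seconds; because already-seen subjects are skipped, frequent polling no longer produces duplicates, which narrows (but cannot fully close) the gap in problem 2. Splunk can then monitor the log file and apply the existing regex (e.g. `[A-Z]{4}`) to each one-line event. If the forum exposes post IDs or links, keying on those instead of subject text would avoid dropping genuinely repeated subjects, and the state file grows without bound, so old entries may need pruning.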
