Channel: Questions in topic: "splunk-enterprise"

Calculating IOPS using FIO testing

**Installing fio:**

**Linux:**

fio is part of the core CentOS/Red Hat repositories, so you can check for and install it with:

```
yum info fio
yum install fio
```

fio is also part of the core Debian/Ubuntu repositories; check for and install it with:

```
apt-cache search fio
apt-get install fio
```

**Windows:**

Navigate to [https://bluestop.org/fio][1] and download the most recent version of fio (if that build doesn't work, go one version older). The installer runs through a basic installation that makes fio available from Windows PowerShell and the standard Command Prompt. There will be no acknowledgement beyond the completed installation (i.e. no desktop icon or program listing).

----------

**Using fio:**

**OS-agnostic info:**

1. Splunk MUST be down before running this test to get an accurate reading of the disk system's capabilities.
2. The size of each test file is a factor of `(2 x total RAM) / (number of CPU cores reported by Splunk)`. This fully saturates RAM and pushes the CPUs to work through read/write operations for a thorough test.

Splunk doesn't take advantage of hyperthreading/multithreading, so run the test with only the number of physical cores available (or the number of CPUs assigned to the VM). We're looking for CPU cores, not virtual CPUs, as reported in splunkd.log:

`12-12-2018 09:59:17.240 -0500 INFO loader - Detected 8 (virtual) CPUs, 8 CPU cores, and 7822MB RAM`

**IMPORTANT:** Once the test has been run, there will be leftover test files in the directory where it was run. These need to be cleaned up, or they will keep occupying disk space equal to: (size of test file(s) x number of CPU cores).

----------

**Linux:**

Create a file with a ".fio" extension (e.g. fiotest.fio) and edit it to include the following:

```
[random-rw]
rw=randrw
size=<2x total RAM divided by number of CPU cores, in k, m, or g>
blocksize=64k
ioengine=libaio
directory=
numjobs=
iodepth=32
group_reporting
```

`directory` must be where Splunk writes its hot/warm or cold buckets (depending on where the I/O issue appears to be).

Then run the test by calling `fio filename.fio`:

```
[root@host]# fio /root/fio/fiotest.fio
```

This test can take time. It is recommended to run it during a scheduled window or during downtime (seriously, you could be there for hours). Once you've run the test, copy/paste the results into a text file and, if necessary, upload it to your support case. fio prints its results to the command line much like a `cat` of a file. *See "Interpreting fio results" below.*

----------

**Windows:**

Similar to the Linux setup, have the customer create a file with a ".fio" extension (e.g. fiotest.fio) and edit it to include the following:

```
[random-rw]
rw=randrw
size=<2x total RAM divided by number of CPU cores, in k, m, or g>
blocksize=64k
ioengine=windowsaio
numjobs=
group_reporting
iodepth=32
```

If the file is not showing as a ".fio" file, open File Explorer, navigate to where the file was created, enable showing file extensions, and rename it so it ends in ".fio" instead of ".txt", ".docx", etc.

Unlike Linux, you cannot specify a directory nicely on Windows. Instead, navigate in PowerShell or Command Prompt to where the test needs to occur, then run fio from there. **EXAMPLE:**

```
C:\Users\Administrator> cd C:\Program Files\Splunk\var\lib\splunk
C:\Program Files\Splunk\var\lib\splunk> fio C:\Users\Administrator\Desktop\fiotest.fio
```

This test can take time.
It is recommended to run it during a scheduled window or during downtime (seriously, you could be there for hours). Once you've run the test, copy/paste the results into a text file and, if necessary, upload it to the case. fio prints its results to the command line much like a `cat` of a file. *See "Interpreting fio results" below.*

----------

**Interpreting fio results:**

So you've successfully run a fio test; what now? There is a lot of output to parse through, but luckily there is only one field we really need to be concerned with:

***IOPS***

This field is fairly straightforward and is exactly what we're looking for. If either the read or the write IOPS is too low, we can confidently point to the disk system having issues. Remember your recommended requirements:

[http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Referencehardware#Indexer][2]
[https://www.splunk.com/web_assets/pdfs/secure/Splunk_and_VMware_VMs_Tech_Brief.pdf][3]

Something to keep in mind is whether you are running parallel ingestion pipelines: the requirements on the disk system go up by quite a bit for each additional pipeline (300-400 IOPS).

[http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Parallelization#Index_parallelization][4]

Take this chart with a grain of salt; these are approximate values that you should keep in mind while scaling up.

```
# of Pipelines   Extra CPUs   Physical IOPS   VM IOPS
1 (default)      -----        800             1200
2                4-6          1100-1200       1500-1600
3                10-12        1500-1600       1700-1800
4                16-18        1700-1800       2100-2200
```

----------

My test machine has 8 vCPUs and 8 GB of RAM. With this in mind, the test file itself looks like:

```
[random-rw]
rw=randrw
size=2g
blocksize=64k
directory=/opt/splunk/var/lib/splunk
ioengine=libaio
numjobs=8
group_reporting
iodepth=32
```

Below are the results of the test itself. Note: the test was intentionally stopped halfway through to give a general idea of the results you'll have to parse through.
```
random-rw: (g=0): rw=randrw, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=libaio, iodepth=32
...
fio-3.1
Starting 8 processes
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
random-rw: Laying out IO file (1 file / 2048MiB)
bs: 8 (f=8): [m(8)][52.8%][r=17.4MiB/s,w=18.1MiB/s][r=278,w=289 IOPS][eta 03m:11s]
fio: terminating on signal 2
random-rw: (groupid=0, jobs=8): err= 0: pid=19368: Wed Dec 12 11:14:50 2018
  read: IOPS=329, BW=20.6MiB/s (21.6MB/s)(4401MiB/213631msec)
   slat (usec): min=448, max=484089, avg=24177.79, stdev=20589.49
   clat (usec): min=4, max=1592.1k, avg=375811.07, stdev=164051.01
    lat (msec): min=9, max=1752, avg=399.99, stdev=171.01
   clat percentiles (msec):
    |  1.00th=[  155],  5.00th=[  194], 10.00th=[  215], 20.00th=[  247],
    | 30.00th=[  275], 40.00th=[  300], 50.00th=[  330], 60.00th=[  368],
    | 70.00th=[  422], 80.00th=[  498], 90.00th=[  600], 95.00th=[  701],
    | 99.00th=[  911], 99.50th=[  995], 99.90th=[ 1183], 99.95th=[ 1267],
    | 99.99th=[ 1401]
   bw (  KiB/s): min=  128, max= 5604, per=12.54%, avg=2645.05, stdev=1038.64, samples=3414
   iops        : min=    2, max=   87, avg=41.16, stdev=16.15, samples=3414
  write: IOPS=329, BW=20.6MiB/s (21.6MB/s)(4396MiB/213631msec)
   slat (usec): min=31, max=2663, avg=73.72, stdev=43.24
   clat (msec): min=9, max=1509, avg=376.16, stdev=164.08
    lat (msec): min=9, max=1509, avg=376.23, stdev=164.09
   clat percentiles (msec):
    |  1.00th=[  155],  5.00th=[  194], 10.00th=[  215], 20.00th=[  247],
    | 30.00th=[  271], 40.00th=[  300], 50.00th=[  330], 60.00th=[  368],
    | 70.00th=[  422], 80.00th=[  498], 90.00th=[  609], 95.00th=[  701],
    | 99.00th=[  911], 99.50th=[  995], 99.90th=[ 1150], 99.95th=[ 1217],
    | 99.99th=[ 1435]
   bw (  KiB/s): min=  128, max= 6709, per=12.53%, avg=2640.95, stdev=1134.21, samples=3414
   iops        : min=    2, max=  104, avg=41.10, stdev=17.65, samples=3414
  lat (usec)   : 10=0.01%
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.02%, 100=0.06%, 250=21.09%
  lat (msec)   : 500=59.10%, 750=16.27%, 1000=2.97%, 2000=0.48%
  cpu          : usr=0.11%, sys=0.74%, ctx=70440, majf=0, minf=233
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=99.8%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwt: total=70416,70337,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=20.6MiB/s (21.6MB/s), 20.6MiB/s-20.6MiB/s (21.6MB/s-21.6MB/s), io=4401MiB (4615MB), run=213631-213631msec
  WRITE: bw=20.6MiB/s (21.6MB/s), 20.6MiB/s-20.6MiB/s (21.6MB/s-21.6MB/s), io=4396MiB (4610MB), run=213631-213631msec

Disk stats (read/write):
  dm-0: ios=70357/58882, merge=0/0, ticks=1693633/1848407, in_queue=3547619, util=100.00%, aggrios=70416/58881, aggrmerge=0/1, aggrticks=1694829/1836861, aggrin_queue=3531591, aggrutil=100.00%
  sda: ios=70416/58881, merge=0/1, ticks=1694829/1836861, in_queue=3531591, util=100.00%
```

----------

Summary of the testing variables, using my example from above:

- `[random-rw]`: the stanza header.
- `rw=randrw`: a random read/write workload with a 50/50 ratio.
- `size=2g`: the size of the test file each job writes to disk before randomly reading/writing it.
- `blocksize=64k`: the block size needs to be 64k because Splunk writes in 64k blocks.
- `directory=/opt/splunk/var/lib/splunk`: where Splunk writes to disk.
- `ioengine=libaio`: the asynchronous I/O engine for Linux (use `windowsaio` on Windows).
- `numjobs=8`: I have 8 vCPUs on my test box, so I'm running 8 simultaneous fio jobs.
- `group_reporting`: takes no value; it aggregates the results of the 8 simultaneous jobs into a single set of numbers that the system can sustain.
- `iodepth=32`: how many I/O requests each job keeps queued against the disk(s) at once, so the queue stays full if the disks can absorb it.

***Docs:***

[https://media.readthedocs.org/pdf/fio/latest/fio.pdf][5]
[https://www.linux.com/learn/inspecting-disk-io-performance-fio][6]
[https://github.com/axboe/fio][7]

[1]: https://bluestop.org/fio
[2]: http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Referencehardware#Indexer
[3]: https://www.splunk.com/web_assets/pdfs/secure/Splunk_and_VMware_VMs_Tech_Brief.pdf
[4]: http://docs.splunk.com/Documentation/Splunk/latest/Capacity/Parallelization#Index_parallelization
[5]: https://media.readthedocs.org/pdf/fio/latest/fio.pdf
[6]: https://www.linux.com/learn/inspecting-disk-io-performance-fio
[7]: https://github.com/axboe/fio
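As a quick sanity check on the sizing formula above, here is a minimal shell sketch (my own helper, not part of fio) that computes a `size=` value on Linux. It assumes `/proc/meminfo` is available and that `nproc` matches the physical core count; on a hyperthreaded box, substitute the core count reported in splunkd.log.

```shell
# Hypothetical helper: compute fio's "size=" from the
# (2 x total RAM) / (number of CPU cores) formula described above.
ram_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)  # total RAM in KiB
cores=$(nproc)                                       # assumed equal to physical cores
size_mb=$(( ram_kb * 2 / 1024 / cores ))             # per-job test file size in MiB
echo "size=${size_mb}m"
```

On a machine with 8 GB of RAM and 8 cores this lands at roughly 2000 MiB per job, in line with the `size=2g` example above.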
