Monitoring Disk I/O on Linux.

Today I wrote a little piece of code that generates a random array of numbers, stores it into disk and sorts it on disk, using bubble sort. This uses O(1) memory, but it's obviously very slow. I did this for fun mostly. The code is here.

I wanted to know, if I'm running this code, what Linux tools are going to show an increase in disk I/O.

So, I run this:

$ ./disksort 150000

This starts the sorting. It takes a long time to sort 150k numbers on disk, at least with that algorithm.

So the first command that we can run is iostat. Let's see what it is outputting before I start disksort

$ sudo iostat 5 2
Linux 4.4.0-31-generic (petko-VirtualBox)       11/16/2016      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.68    0.19   38.34    0.05    0.00   46.73

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               5.60        83.17        63.96     539970     415252

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.03    0.00    0.00    0.00    0.00   96.97

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.00         0.00         0.00          0          0

The command line invocation is a bit strange. The 5 2 part says "aggregate IO data for five seconds and show me two reports". The idea is that you can get continuous report output every five seconds. But I need only one report. The first one is a default one, which shows aggregated data for I'm not sure what period. But the second one is interesting. Notice kB_read/s and kB_wrtn/s. They're zeros. So it's all quiet. I now run disksort and run the same command.

$ iostat 5 2
Linux 4.4.0-31-generic (petko-VirtualBox)       11/16/2016      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          15.32    0.17   40.33    0.05    0.00   44.14

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               5.04        73.65        59.42     540254     435868

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          26.81    0.00   73.19    0.00    0.00    0.00

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.80         0.00       157.64          0        588

Notice how kB_wrtn/s spiked up. kB_read is zero and I assume that's because of the buffer cache. After all, I'm reading the same file again and again.

Another command that we can use is iotop. iotop is similar to top, but shows I/O stats. We'll run it using iotop -oa. The -o parameter makes it show only processes that do I/O. The -a flag aggregates the data during the time this command is running. So after running it for a few seconds I'm seeing this:

Total DISK READ :       0.00 B/s | Total DISK WRITE :      83.77 K/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:       0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                                       
  545 be/3 root          0.00 B      4.00 K  0.00 %  0.08 % [jbd2/sda1-8]
 7717 be/4 root          0.00 B      0.00 B  0.00 %  0.01 % [kworker/u2:3]
 7865 be/4 root          0.00 B      0.00 B  0.00 %  0.01 % [kworker/u2:1]
 7714 be/4 petko         0.00 B   1144.00 K  0.00 %  0.00 % ./disksort 150000
 7097 be/4 root          0.00 B      0.00 B  0.00 %  0.00 % [kworker/u2:2]
 7758 be/4 root          0.00 B      0.00 B  0.00 %  0.00 % [kworker/u2:0]

Notice how disksort with pid of 7714 is well ahead of everything else shown.

So that's it, happy debugging.

social