fio is a command-line tool for I/O benchmarking that offers many advanced features for simulating different types of workloads. It can read and write test data sequentially or in random order. In the latter case, when verification is enabled, the program adds a header that helps to verify the integrity of each written data block.

Generating data with verification header

fio should be run in random write mode with the verify parameter specified to generate verification headers. Without this parameter, it will write identical blocks of random data. Here is an example invocation:

fio --filename=random-data.bin --rw=randwrite --name=rand --size=256 --blocksize=64 --verify=md5

The block size is chosen to be large enough to hold the verification header plus some random data, yet small enough to keep the output easy to analyze. The output file may look like this:

$ hexdump -C random-data.bin
 00000000  ca ac 02 00 40 00 00 00  3c 54 09 c5 01 cc a4 00  |....@...<T......|
 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000020  01 00 00 00 13 8e 15 8b  d6 6f bb c2 2d 7c 21 d1  |.........o..-|!.|
 00000030  f4 9b 85 9f 8e 3a 46 f4  19 1a 38 5f 88 1c 0d 11  |.....:F...8_....|
 00000040  ca ac 02 00 40 00 00 00  34 94 94 93 9a 33 7a 18  |....@...4....3z.|
 00000050  40 00 00 00 00 00 00 00  69 03 00 00 85 a2 00 00  |@.......i.......|
 00000060  01 00 03 00 c7 36 2c 2d  28 89 0c 31 92 d0 e5 61  |.....6,-(..1...a|
 00000070  f0 33 07 71 28 95 20 c5  90 e8 8e 95 cb 85 24 00  |.3.q(. .......$.|
 00000080  ca ac 02 00 40 00 00 00  2e 56 db f6 8c 0a bb 88  |....@....V......|
 00000090  80 00 00 00 00 00 00 00  69 03 00 00 56 a2 00 00  |........i...V...|
 000000a0  01 00 01 00 07 46 da 3f  82 dd b4 8c 72 d0 8a bb  |.....F.?....r...|
 000000b0  d2 1c 1d a5 73 0a 4a dd  ac f2 42 d7 bb 51 22 18  |....s.J...B..Q..|
 000000c0  ca ac 02 00 40 00 00 00  d0 92 04 ac f3 14 98 41  |....@..........A|
 000000d0  c0 00 00 00 00 00 00 00  69 03 00 00 7e a2 00 00  |........i...~...|
 000000e0  01 00 02 00 1c 92 c8 e1  16 2c 37 98 6b 79 87 46  |.........,7.ky.F|
 000000f0  b4 63 7e a9 87 31 f5 df  14 b2 1b 82 6f 1c 4d 0a  |.c~..1......o.M.|
 00000100

There are four easily distinguishable blocks of data in this file, each starting with 0xacca (stored in little-endian order as the bytes ca ac). This magic number marks the start of the verification header in each data block.
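Locating the blocks by their magic number can be sketched in a few lines of Python. This is only an illustration: the function name find_header_offsets is my own, the sample buffer is a synthetic stand-in for the real file, and the little-endian byte order is an assumption that holds for x86_64.

```python
import struct

FIO_HDR_MAGIC = 0xACCA  # magic number as seen in the hexdump
BLOCK_SIZE = 64         # matches --blocksize=64 in the fio invocation


def find_header_offsets(data, block_size=BLOCK_SIZE):
    """Return the offsets of blocks whose first two bytes equal the magic number."""
    magic_bytes = struct.pack("<H", FIO_HDR_MAGIC)  # b"\xca\xac" in little-endian
    return [
        off
        for off in range(0, len(data), block_size)
        if data[off:off + 2] == magic_bytes
    ]


# Synthetic stand-in for the file: four 64-byte blocks that start with
# magic, verify type 2 and length 64, followed by zero padding.
sample = (struct.pack("<HHI", FIO_HDR_MAGIC, 2, BLOCK_SIZE) + bytes(56)) * 4

print([hex(off) for off in find_header_offsets(sample)])
```

To run it against the real file, the bytes of random-data.bin would be read in binary mode and passed in place of sample.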

Header structure

The interpretation of the verification header depends, at least, on the system’s architecture and the version of fio used to write the data. I am using Linux x86_64 and fio-3.1 for all examples in this post. The verification header is defined in the verify.h header file. It has the following format:

 0x00000000 aaaa bbbb cccc cccc dddd dddd dddd dddd
 0x00000010 eeee eeee eeee eeee ffff ffff gggg gggg
 0x00000020 hhhh iiii jjjj jjjj kkkk kkkk kkkk kkkk

where

  • aaaa is a magic number identifying fio’s header; defined in verify.h
  • bbbb shows the method of verifying file contents; defined in verify.h
  • cccc contains the length of the current block including header and data
  • dddd holds the random seed
  • eeee specifies the offset of the block in the file
  • ffff is the seconds part of the timestamp
  • gggg is the microseconds part of the timestamp
  • hhhh is the number of the thread that wrote this block of data
  • iiii defines the IO unit number
  • jjjj is the checksum of the header, not including the data part of the current block
  • kkkk is verification specific data; can be block checksum or data pattern

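Considering this header structure, the data blocks in the example file shown above were written in the following order: 0x00, 0x80, 0xc0, 0x40. This follows from the IO unit number (iiii), which is 0, 1, 2 and 3 for these blocks respectively.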
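The write order can also be recovered programmatically by sorting the blocks by their IO unit number. The following is a minimal sketch: the bytes are transcribed from the hexdump above, write_order is a name I made up, and the field position 0x22 assumes the little-endian layout described in this post.

```python
import struct

BLOCK_SIZE = 64
NUMBERIO_OFFSET = 0x22  # position of the iiii field within each header

# Contents of the example file, one hexdump row per line.
DUMP = """
caac0200400000003c5409c501cca400
00000000000000000000000000000000
01000000138e158bd66fbbc22d7c21d1
f49b859f8e3a46f4191a385f881c0d11
caac020040000000349494939a337a18
40000000000000006903000085a20000
01000300c7362c2d28890c3192d0e561
f0330771289520c590e88e95cb852400
caac0200400000002e56dbf68c0abb88
80000000000000006903000056a20000
010001000746da3f82ddb48c72d08abb
d21c1da5730a4addacf242d7bb512218
caac020040000000d09204acf3149841
c000000000000000690300007ea20000
010002001c92c8e1162c37986b798746
b4637ea98731f5df14b21b826f1c4d0a
"""
data = bytes.fromhex("".join(DUMP.split()))


def write_order(data, block_size=BLOCK_SIZE):
    """Return block offsets sorted by the IO unit number stored in each header."""
    blocks = []
    for off in range(0, len(data), block_size):
        (numberio,) = struct.unpack_from("<H", data, off + NUMBERIO_OFFSET)
        blocks.append((numberio, off))
    return [off for _, off in sorted(blocks)]


print([hex(off) for off in write_order(data)])
# prints: ['0x0', '0x80', '0xc0', '0x40']
```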