Multithreaded version of jpeginfo
jpeginfo is a utility for checking JPEG files for errors, calculating their MD5 sums and generating other information. The original version of the program can be found here: https://github.com/tjko/jpeginfo
Description
I used jpeginfo to check the integrity of images captured by the cameras I worked with. Processing several thousand or tens of thousands of files at a time usually took a while, even on a powerful PC. jpeginfo checks a file in two stages: it reads the file from disk and then performs the required checks. I decided to split these stages into two separate threads and try to save some processing time that way. It is always a good idea to measure before attempting any performance optimisation, so I modified the sources and measured the time required to load files and to process them. I/O is usually assumed to be the bottleneck on any modern platform, but approximate read and processing times would show how the two actually compare. It turned out that on my test platform (AMD A10-5700 plus HDD in SATA-3 mode) reading and processing times are comparable:
Read time: 0:045605, processing time: 0:036398 (read time is 125.3% of processing time)
Read time: 0:033951, processing time: 0:037120 (read time is 91.5% of processing time)
Read time: 0:031341, processing time: 0:037456 (read time is 83.7% of processing time)
Read time: 0:037911, processing time: 0:037027 (read time is 102.4% of processing time)
Read time: 0:033075, processing time: 0:037580 (read time is 88.0% of processing time)
Read time: 0:036929, processing time: 0:037265 (read time is 99.1% of processing time)
Read time: 0:035042, processing time: 0:036861 (read time is 95.1% of processing time)
Read time: 0:032808, processing time: 0:037667 (read time is 87.1% of processing time)
Read time: 0:032908, processing time: 0:039062 (read time is 84.2% of processing time)
Read time: 0:030245, processing time: 0:038277 (read time is 79.0% of processing time)
So I decided to try to optimise performance of the program.
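The two-stage split described above (one thread reads files, another checks them) can be illustrated with a minimal Python sketch. This is not the code from this repository, which is written in C; the names `reader`, `process` and `check_files` are hypothetical, and `process` only computes an MD5 sum as a stand-in for the real JPEG checks.

```python
import hashlib
import queue
import threading

SENTINEL = None  # hypothetical end-of-input marker placed on the queue


def reader(paths, out_queue):
    """Stage 1: read each file from disk and pass its bytes to stage 2."""
    try:
        for path in paths:
            with open(path, "rb") as f:
                out_queue.put((path, f.read()))
    finally:
        # Always signal the end, even if a read fails, so stage 2 can stop.
        out_queue.put(SENTINEL)


def process(data):
    """Stage 2 stand-in: MD5 only (assumption: the real tool does more)."""
    return hashlib.md5(data).hexdigest()


def check_files(paths, max_buffered=4):
    """Overlap reading and processing with two threads and a bounded queue."""
    buffered = queue.Queue(maxsize=max_buffered)  # caps files held in memory
    results = {}

    def worker():
        while True:
            item = buffered.get()
            if item is SENTINEL:
                break
            path, data = item
            results[path] = process(data)

    threads = [
        threading.Thread(target=reader, args=(paths, buffered)),
        threading.Thread(target=worker),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The bounded queue is the main design choice here: when reading is faster than processing, the reader blocks instead of loading the whole file set into memory.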
Tests
I tested the modified version of the program on a set of 1667 JPEG files, each about 3.1 MB in size, on three different test platforms: AMD A10-5700 + HDD in SATA-3 mode, Core2 Duo T7300 + SSD in SATA-1 mode and Core i7-4770 + HDD in SATA-3 mode. The page cache was dropped after each test so that it did not influence the results. The two Ryzen rows were added later (see the update below). Processing times are shown in the table:
Test platform | Unmodified jpeginfo | Modified jpeginfo |
---|---|---|
A10-5700 + HDD | 1m10s | 57s |
T7300 + SSD | 2m24s | 1m18s |
i7-4770 + HDD | 48s | 34s |
Ryzen 5 1600x + SSD | 47s | 13s |
Ryzen 5 1600x + NVMe | 46s | 6s |
Not bad! It would be interesting to test it on a modern CPU plus SSD, but I do not have such a system at hand right now.
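To make the table easier to compare across platforms, the timings can be turned into speed-up factors (unmodified time divided by modified time). The short Python sketch below does that arithmetic using only the values from the table; the helper name `to_seconds` is hypothetical.

```python
def to_seconds(text):
    """Convert a duration such as '1m10s' or '57s' to whole seconds."""
    minutes, _, rest = text.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s"))


# (unmodified, modified) timings copied from the table above.
timings = {
    "A10-5700 + HDD": ("1m10s", "57s"),
    "T7300 + SSD": ("2m24s", "1m18s"),
    "i7-4770 + HDD": ("48s", "34s"),
    "Ryzen 5 1600x + SSD": ("47s", "13s"),
    "Ryzen 5 1600x + NVMe": ("46s", "6s"),
}

speedups = {
    name: to_seconds(before) / to_seconds(after)
    for name, (before, after) in timings.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x")
```

This gives roughly 1.23x, 1.85x, 1.41x, 3.62x and 7.67x for the five rows, in table order.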
New feature
I also added a new mode of operation: jpeginfo can now read a file from standard input (stdin). Here is an example:
cat file001.jpeg | ./jpeginfo -c -s
This can be useful for checking files received over a network without having to save them to disk first.
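The general idea of checking a stream rather than a file on disk can be sketched in a few lines of Python. This is only an illustration and not how jpeginfo implements the feature: the function names are hypothetical, and the checks are a crude stand-in (MD5 sum plus the presence of the JPEG start-of-image and end-of-image markers) for real decoding.

```python
import hashlib
import sys

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def inspect_stream(stream):
    """Read a whole binary stream and report basic facts about it.

    Assumption: the stream holds a single image small enough for memory.
    """
    data = stream.read()
    return {
        "size": len(data),
        "md5": hashlib.md5(data).hexdigest(),
        "has_soi": data.startswith(SOI),
        # Crude check: a valid file with trailing bytes would fail this.
        "has_eoi": data.endswith(EOI),
    }


def main():
    """Entry point for use in a pipeline; reads raw bytes from stdin."""
    print(inspect_stream(sys.stdin.buffer))
```

Calling `main()` at the end of a script would let it sit at the end of a pipeline in the same way as the `cat` example above. Reading from `sys.stdin.buffer` rather than `sys.stdin` matters because image data is binary, not text.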
Update (23.12.2017): I finally got a chance to test the program on a modern desktop PC with an AMD Ryzen 5 1600x processor and two disks installed: a SATA Kingston A400 and an NVMe Samsung 960 EVO. The results obtained with the NVMe disk are very impressive. I have added the results to the table above.
Repository of the program