dalmem throughput tests
These tests are run for most dalmem functions; for each function, only the best result of its fastest variant is shown here.
Currently (version 1.0.1) a 32KB buffer is used, so the results are for cached data sets (RAM speed should not affect these results).
Each function is timed in blocks of 10 consecutive calls; every block is run 50 times, and the best time is kept. 50x10 calls are usually enough to ensure the CPU has reached its fastest speed.
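The best-of-50 scheme can be sketched as follows. This is a minimal Python illustration, not the dalib test harness: the name `best_block_time_ns`, its parameters, and the choice to report the best block time divided by the number of calls in the block are all assumptions made for this example.

```python
import time


def best_block_time_ns(func, *args, calls_per_block=10, blocks=50):
    """Time `blocks` blocks of `calls_per_block` consecutive calls to func.

    Returns the lowest block time divided by calls_per_block, in
    nanoseconds. Reporting a per-call figure is an assumption of this
    sketch, not something the original text states.
    """
    best = None
    for _ in range(blocks):
        start = time.perf_counter_ns()
        for _ in range(calls_per_block):
            func(*args)
        elapsed = time.perf_counter_ns() - start
        # Keep the fastest block; early blocks may run before the CPU
        # has ramped up to its highest clock speed.
        if best is None or elapsed < best:
            best = elapsed
    return best / calls_per_block


if __name__ == "__main__":
    # Hypothetical workload: scan a 32KB zero-filled buffer for a byte
    # that is absent, so every call walks the whole buffer.
    buffer = bytes(32 * 1024)
    per_call_ns = best_block_time_ns(buffer.find, b"\x01")
    print(f"best per-call time: {per_call_ns / 1000:.2f}us")
```

Taking the minimum rather than the mean discards blocks disturbed by frequency scaling or scheduling noise, which matches the stated goal of measuring the CPU at its fastest speed.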
If you want to reproduce these tests on your own machines, please download the dalib source code (tested only on Unix), then compile it and run the tests yourself (see the README and INSTALL files).
ChkSum_ has a single result per system; it is listed on that system's first row.
System name (arch) | Width | memsearch | memsearch_r | memsearch_ri | bitmapsearch | bitmapsearch_r | ChkSum_ |
SpockBS (x86_64) | 8 | 2.1us (14.5GB/s, f3) | 3.5us (8.71GB/s, f1) | 3.1us (9.84GB/s, f1) | 2.5us (12.2GB/s, f1) | 4us (7.62GB/s, f1) | 1.1us (27.7GB/s, f2) |
SpockBS (x86_64) | 16 | 2.1us (14.5GB/s, f3) | 20.7us (1.47GB/s, f0) | 25us (1.22GB/s, f0) | 2.5us (12.2GB/s, f1) | 24.8us (1.23GB/s, f0) | |
SpockBS (x86_64) | 32 | 2.1us (14.5GB/s, f3) | 14us (2.17GB/s, f0) | 11.3us (2.70GB/s, f0) | 2.5us (12.2GB/s, f1) | 11.3us (2.70GB/s, f0) | |
SpockBS (x86_64) | 64 | 2.7us (11.3GB/s, f0) | 4.3us (7.09GB/s, f0) | 4.2us (7.26GB/s, f0) | 4.2us (7.26GB/s, f0) | 5.6us (5.44GB/s, f0) | |
Nano (x86_64) | 8 | 5us (6.10GB/s, f3) | 7.9us (3.86GB/s, f1) | 7.9us (3.86GB/s, f1) | 6us (5.08GB/s, f1) | 8.9us (3.42GB/s, f1) | 2.1us (14.5GB/s, f2) |
Nano (x86_64) | 16 | 5us (6.10GB/s, f3) | 39.4us (774MB/s, f0) | 39.3us (776MB/s, f0) | 5.1us (5.98GB/s, f1) | 39.4us (774MB/s, f0) | |
Nano (x86_64) | 32 | 5us (6.10GB/s, f3) | 14us (2.17GB/s, f0) | 14us (2.17GB/s, f0) | 5.1us (5.98GB/s, f1) | 14us (2.17GB/s, f0) | |
Nano (x86_64) | 64 | 6us (5.08GB/s, f0) | 10.1us (3.02GB/s, f0) | 10.1us (3.02GB/s, f0) | 6us (5.08GB/s, f0) | 10.2us (2.99GB/s, f0) | |
quark (x86_64) | 8 | 1.7us (17.9GB/s, f3) | 2.4us (12.7GB/s, f1) | 2.3us (13.2GB/s, f1) | 1.8us (16.9GB/s, f1) | 3us (10.1GB/s, f1) | 1.1us (27.7GB/s, f2) |
quark (x86_64) | 16 | 1.7us (17.9GB/s, f3) | 11.1us (2.74GB/s, f0) | 11us (2.77GB/s, f0) | 1.8us (16.9GB/s, f1) | 20.4us (1.49GB/s, f0) | |
quark (x86_64) | 32 | 1.7us (17.9GB/s, f3) | 6us (5.08GB/s, f0) | 6.1us (5.00GB/s, f0) | 1.8us (16.9GB/s, f1) | 6.1us (5.00GB/s, f0) | |
quark (x86_64) | 64 | 2.3us (13.2GB/s, f0) | 3.6us (8.47GB/s, f0) | 3.7us (8.24GB/s, f0) | 3.5us (8.71GB/s, f0) | 3.7us (8.24GB/s, f0) | |
BS2 (x86_64) | 8 | 1.2us (25.4GB/s, f3) | 2us (15.2GB/s, f1) | 1.9us (16.0GB/s, f1) | 1.5us (20.3GB/s, f1) | 2.6us (11.7GB/s, f1) | 0.6us (50.8GB/s, f2) |
BS2 (x86_64) | 16 | 1.3us (23.4GB/s, f3) | 10.2us (2.99GB/s, f0) | 20us (1.52GB/s, f0) | 1.5us (20.3GB/s, f1) | 19.9us (1.53GB/s, f0) | |
BS2 (x86_64) | 32 | 1.2us (25.4GB/s, f3) | 5.7us (5.35GB/s, f0) | 5.7us (5.35GB/s, f0) | 1.5us (20.3GB/s, f1) | 5.8us (5.26GB/s, f0) | |
BS2 (x86_64) | 64 | 2.2us (13.8GB/s, f0) | 3.5us (8.71GB/s, f0) | 3.4us (8.97GB/s, f0) | 2.2us (13.8GB/s, f0) | 3.4us (8.97GB/s, f0) | |
Iris2 (x86_64) | 8 | 1.5us (20.3GB/s, f3) | 2.5us (12.2GB/s, f1) | 2.3us (13.2GB/s, f1) | 1.7us (17.9GB/s, f1) | 2.7us (11.3GB/s, f1) | 3.6us (8.47GB/s, f1) |
Iris2 (x86_64) | 16 | 1.8us (16.9GB/s, f3) | 13.5us (2.26GB/s, f0) | 18us (1.69GB/s, f0) | 1.7us (17.9GB/s, f1) | 12us (2.54GB/s, f0) | |
Iris2 (x86_64) | 32 | 1.5us (20.3GB/s, f3) | 6.8us (4.48GB/s, f0) | 6.8us (4.48GB/s, f0) | 1.7us (17.9GB/s, f1) | 36.9us (827MB/s, f0) | |
Iris2 (x86_64) | 64 | 2.6us (11.7GB/s, f0) | 4.1us (7.44GB/s, f0) | 4.1us (7.44GB/s, f0) | 2.7us (11.3GB/s, f0) | 4.1us (7.44GB/s, f0) | |
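The throughput figures in the table appear to follow from dividing the 32KB buffer by the listed time. The sketch below checks that reading. It assumes that each time covers one pass over the whole buffer and that "GB" means 2**30 bytes; both are inferred from the numbers rather than stated in the text, and the function name `throughput_gib_per_s` is made up for this example.

```python
BUFFER_BYTES = 32 * 1024  # the 32KB buffer stated for version 1.0.1


def throughput_gib_per_s(time_us, buffer_bytes=BUFFER_BYTES):
    """Bytes processed per second, in units of 2**30 bytes.

    Assumes time_us is the time for a single pass over the whole buffer
    (an inference from the table, not a documented fact).
    """
    seconds = time_us * 1e-6
    return buffer_bytes / seconds / 2**30


if __name__ == "__main__":
    # A few times taken from the table; compare the output with the
    # throughput printed next to them there.
    for time_us in (2.1, 1.1, 3.5, 39.4):
        print(f"{time_us}us -> {throughput_gib_per_s(time_us):.3f}")
```

On this reading, 2.1us corresponds to roughly 14.5 and 1.1us to roughly 27.7, in line with the table. Entries below 1GB/s look as if they are the same value expressed in thousandths (39.4us gives about 0.774, listed as 774MB/s), and the listed figures seem to be truncated rather than rounded.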