Intel(R) Optimized LINPACK Benchmark tool (with parallel capability)
Last edited by superuirui on 2013-1-5 22:08
Intel(R) Optimized LINPACK Benchmark
Intel(R) Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark which solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate and tests the results for accuracy. The generalization is in the number of equations (N) we can solve, which is not limited to 1000. We use partial pivoting to assure the accuracy of the results. This benchmark should not be used to report LINPACK 1000 performance as that is a compiled-code only benchmark. This is a shared memory implementation which runs on a single platform and should not be confused with MP LINPACK, which is a distributed memory version of the same benchmark. This benchmark should not be confused with LINPACK, the library, which has been expanded upon by LAPACK, the library.
Intel Optimized LINPACK Benchmark uses these names for the executables to be run:
linpack_xeon32.exe Windows*, Intel(R) Xeon(R) 32-bit
linpack_xeon64.exe Windows*, Intel(R) Xeon(R) 64-bit
xlinpack_xeon32 Linux*, Intel(R) Xeon(R) 32-bit
xlinpack_xeon64 Linux*, Intel(R) Xeon(R) 64-bit
linpack_cd32.app Mac OS*, Intel(R) Core(TM) Duo 32-bit
linpack_cd64.app Mac OS*, Intel(R) Core(TM) microarchitecture 64-bit
Last edited by superuirui on 2013-1-5 21:17
Test results for linpack_xeon32:
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Sat Jan 05 21:05:33 2013
CPU frequency: 3.328 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4
Parameters are set to:
Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000 6000 1000
Leading dimension of array : 15000 14008 13000 12008 11000 10008 8008 6008 1000
Number of trials to run : 1 2 2 2 2 2 2 3 4
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=1569180256, at the size=14000
============= Timing linear equation system solver =================
Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
14000  14008  4      55.620     32.8970  1.899019e-010 3.429426e-002
14000  14008  4      46.324     39.4983  1.899019e-010 3.429426e-002
13000  13000  4      36.939     39.6598  1.586276e-010 3.319163e-002
13000  13000  4      36.914     39.6864  1.586276e-010 3.319163e-002
12000  12008  4      29.307     39.3183  1.304731e-010 3.202444e-002
12000  12008  4      36.856     31.2644  1.304731e-010 3.202444e-002
11000  11000  4      23.425     37.8900  1.099408e-010 3.207430e-002
11000  11000  4      26.336     33.7021  1.099408e-010 3.207430e-002
10000  10008  4      16.866     39.5392  8.823431e-011 3.111231e-002
10000  10008  4      22.345     29.8446  8.823431e-011 3.111231e-002
8000   8008   4      9.081      37.6011  5.420336e-011 2.981655e-002
8000   8008   4      8.803      38.7898  5.420336e-011 2.981655e-002
6000   6008   4      4.169      34.5552  3.530821e-011 3.424139e-002
6000   6008   4      3.801      37.9031  3.530821e-011 3.424139e-002
6000   6008   4      3.799      37.9201  3.530821e-011 3.424139e-002
1000   1000   4      0.027      24.8255  1.172701e-012 3.999211e-002
1000   1000   4      0.022      30.5058  1.172701e-012 3.999211e-002
1000   1000   4      0.023      29.1500  1.172701e-012 3.999211e-002
1000   1000   4      0.022      30.4371  1.172701e-012 3.999211e-002
Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
14000  14008  4       36.1976  39.4983
13000  13000  4       39.6731  39.6864
12000  12008  4       35.2914  39.3183
11000  11000  4       35.7960  37.8900
10000  10008  4       34.6919  39.5392
8000   8008   4       38.1954  38.7898
6000   6008   4       36.7928  37.9201
1000   1000   4       28.7296  30.5058
End of tests
2013/01/05 Sat
21:13
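The Performance Summary columns appear to be the per-size mean and maximum of the GFlops values in the timing table. A minimal Python sketch of that relationship, using a few rows copied from the run above (the function name `summarize` is illustrative and not part of the benchmark):

```python
from collections import defaultdict

# Trials copied from the timing table above: (size, LDA, align, time_s, gflops).
ROWS = [
    (14000, 14008, 4, 55.620, 32.8970),
    (14000, 14008, 4, 46.324, 39.4983),
    (6000, 6008, 4, 4.169, 34.5552),
    (6000, 6008, 4, 3.801, 37.9031),
    (6000, 6008, 4, 3.799, 37.9201),
]

def summarize(rows):
    """Group trials by (size, LDA, align); return {key: (average, maximum)} in GFlops."""
    groups = defaultdict(list)
    for size, lda, align, _time_s, gflops in rows:
        groups[(size, lda, align)].append(gflops)
    return {key: (sum(vals) / len(vals), max(vals)) for key, vals in groups.items()}

SUMMARY = summarize(ROWS)
for (size, lda, align), (average, maximal) in SUMMARY.items():
    print(f"{size:<6} {lda:<6} {align:<3} {average:8.4f} {maximal:8.4f}")
```

For these rows the averages agree with the summary table to within rounding in the last digit (about 36.198 for size 14000 and 36.793 for size 6000).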
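I like playing with benchmark scores; tomorrow I will try it on the workstation ;P
I downloaded the attachment, but the x64 version would not run because of a DLL file problem, so I searched online and found the article below.
I did not test the Hyper-Threading part, because Hyper-Threading is disabled on my machine.
Intel(R) Optimized LINPACK Benchmark performance testing
Published: 2012.02.27
LINPACK is a commonly used CPU performance test program. It measures a CPU's computing capability by solving a system of double-precision linear equations. Intel MKL provides an optimized version, the Intel(R) Optimized LINPACK Benchmark; running it is a convenient way to get a baseline CPU performance figure.
The Intel(R) Optimized LINPACK Benchmark is an optimized program derived from the LINPACK 1000 benchmark. It generates a system of linear equations from user-specified parameters and computes the CPU's floating-point performance from the solve time and the operation count.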
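As a rough illustration of how a solve time becomes a performance rate, the Python sketch below assumes the conventional LINPACK operation count of 2/3·N³ + 2·N² floating-point operations for an N×N system. The article does not state the formula, so this count is an assumption; it does reproduce the roughly 32.9 GFlops reported in the first 32-bit run above (N = 14000, 55.620 s).

```python
def linpack_flops(n):
    """Conventional LINPACK operation count for factoring and solving an n x n system.

    Assumption: the benchmark uses the customary 2/3*n^3 + 2*n^2 count.
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2

def gflops(n, seconds):
    """Performance rate in GFlops (1e9 floating-point operations per second)."""
    return linpack_flops(n) / seconds / 1e9

# N and Time(s) taken from the first row of the linpack_xeon32 timing table.
rate = gflops(14000, 55.620)
print(f"{rate:.2f} GFlops")
```

Because the printed times are rounded to three decimals, very short runs (for example N = 1000 at 0.027 s) only match the reported GFlops approximately.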
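The basic steps for running the program are:
1> Downloading the Intel(R) Optimized LINPACK Benchmark:
The Intel(R) Optimized LINPACK Benchmark is included in the Intel MKL installation package. If you have already downloaded and installed Intel MKL (http://www3.intel.com/cd/software/products/apac/zho/329191.htm), the LINPACK Benchmark is located in the mkl/benchmarks/linpack directory.
The Intel(R) Optimized LINPACK Benchmark is also available as a separate download: http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
2> Running the program:
The benchmarks/linpack directory contains precompiled executables. Run the executable directly to perform the corresponding performance test. Before running it, first read the "LINPACK and MP LINPACK Benchmarks" section of the MKL User's Guide: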
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_win/MKL_UG_linpack/MKL_UG_linpack.htm
On an Intel 64 platform, a simple run command is:
>./xlinpack_xeon64 lininput_xeon64
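After the program runs, it reports performance data similar to the figure below. The GFlops column gives the number of billions of double-precision floating-point operations per second (GFLOPS) achieved while solving the system. For example, on a 4-core Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz we measured 93 GFlops. The theoretical double-precision peak of this CPU is 108.8 GFlops, so the benchmark reaches about 86% of the theoretical peak.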
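The 108.8 GFlops peak quoted above is consistent with frequency × cores × floating-point operations per cycle. The sketch below assumes 8 double-precision operations per cycle per core, which is the value implied by 108.8 / (3.4 × 4); the function name is illustrative.

```python
def peak_gflops(ghz, cores, flops_per_cycle):
    """Theoretical peak in GFlops: clock (GHz) x cores x FLOPs per cycle per core."""
    return ghz * cores * flops_per_cycle

# Assumption: 8 double-precision FLOPs per cycle per core, implied by the quoted peak.
peak = peak_gflops(3.4, 4, 8)
efficiency = 93.0 / peak  # measured GFlops divided by theoretical peak

print(f"peak = {peak:.1f} GFlops, efficiency = {efficiency:.1%}")
```

The ratio works out to roughly 85.5%, which the article rounds to 86%.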
http://software.intel.com/zh-cn/blogs/wordpress/wp-content/uploads/2012/02/Untitled.jpg
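3> Run parameters
Running the program requires an input parameter file, which contains the main parameters of the LINPACK run. Key points when filling in the parameters:
1) When setting the dimension of the test matrix, take the amount of memory into account. While running, the program reports the memory it needs (Maximum memory requested that can be used =...). If the system's actual memory is smaller than this value, reduce the matrix dimension in the input.
2) Leading dimension of array: the leading dimension must be greater than or equal to the matrix size. In practice, test which leading dimension gives the best performance in your environment. Some experience suggests that performance is often higher when the leading dimension is larger than the matrix dimension and divisible by 8.
3) Data alignment values: the value is given in KB. For example, the default input value of 4 means that data is aligned on 4 KB memory address boundaries. Usually the alignment can be set to the system page size.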
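The three parameter notes above can be sketched in Python as follows. The helper names are hypothetical, and the memory figure is only an estimate that assumes the N × LDA matrix of 8-byte doubles dominates; the benchmark itself reports a slightly larger number (1569180256 bytes at N = 14000, LDA = 14008 in the 32-bit runs in this thread).

```python
import mmap

def matrix_bytes(n, lda, bytes_per_element=8):
    """Approximate memory for an n x lda matrix of double-precision values."""
    return n * lda * bytes_per_element

def next_lda(n, multiple=8):
    """Smallest multiple of `multiple` strictly greater than n (rule of thumb from note 2)."""
    return (n // multiple + 1) * multiple

def fits_in_memory(n, lda, available_bytes):
    """True if the estimated matrix size fits in the given amount of memory."""
    return matrix_bytes(n, lda) <= available_bytes

n = 14000
lda = next_lda(n)                      # 14008, as used in the 32-bit runs
estimate = matrix_bytes(n, lda)        # 1568896000 bytes
alignment_kb = mmap.PAGESIZE // 1024   # system page size in KB (note 3)

print(n, lda, estimate, alignment_kb)
print(fits_in_memory(n, lda, 2 * 1024**3))
```

The "strictly greater and divisible by 8" rule is only the heuristic quoted in the article; the runs in this thread also use leading dimensions equal to N, so it is worth measuring both.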
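4> Multithreading settings
This program is optimized for multiple cores. By default it sets the number of threads according to the number of processor cores in the system. Users can change the number of threads with the OMP_NUM_THREADS environment variable. For example, in a bash shell:
>export OMP_NUM_THREADS=<number of threads>
For best performance, set the number of threads to the number of cores in the system. On systems with Hyper-Threading, set it to half the number of hardware threads, and set the following environment variable: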
KMP_AFFINITY=granularity=fine,compact,1,0
For a discussion of this topic, see the article: http://software.intel.com/en-us/articles/setting-thread-affinity-on-smt-or-ht-enabled-systems/
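The thread-count advice above could be wrapped in a small launcher. The following Python sketch only builds the environment; `linpack_env` and its parameters are hypothetical names, and the caller is assumed to know the logical CPU count and whether Hyper-Threading is enabled.

```python
import os

def linpack_env(logical_cpus, hyperthreading, base_env=None):
    """Return an environment dict following the article's thread-count advice.

    logical_cpus:   number of hardware threads visible to the OS
    hyperthreading: True if each core exposes two hardware threads
    """
    env = dict(os.environ if base_env is None else base_env)
    # With Hyper-Threading, use half of the hardware threads (one per core).
    threads = logical_cpus // 2 if hyperthreading else logical_cpus
    env["OMP_NUM_THREADS"] = str(max(1, threads))
    if hyperthreading:
        env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    return env

env = linpack_env(logical_cpus=8, hyperthreading=True, base_env={})
print(env)

# To launch the benchmark with these settings (not executed here):
#   import subprocess
#   subprocess.run(["./xlinpack_xeon64", "lininput_xeon64"], env=env, check=True)
```

Passing `base_env=None` starts from the current process environment, so other variables such as the library search path are preserved.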
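Finally, note that results from this program are not suitable for submission as LINPACK 1000 performance data. That benchmark requires running code produced directly by the compiler, whereas the Intel(R) Optimized LINPACK Benchmark includes code hand-optimized for specific processors.
The test results are as follows:
X5670 32-bit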
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Mon Jan 07 13:07:32 2013
CPU frequency: 2.924 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
Parameters are set to:
Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000 6000 1000
Leading dimension of array : 15000 14008 13000 12008 11000 10008 8008 6008 1000
Number of trials to run : 1 2 2 2 2 2 2 3 4
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=1569180256, at the size=14000
============= Timing linear equation system solver =================
Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
14000  14008  4      31.459     58.1614  2.035009e-010 3.675010e-002
14000  14008  4      31.484     58.1167  2.035009e-010 3.675010e-002
13000  13000  4      25.300     57.9047  1.472652e-010 3.081413e-002
13000  13000  4      25.317     57.8670  1.472652e-010 3.081413e-002
12000  12008  4      20.002     57.6090  1.356225e-010 3.328834e-002
12000  12008  4      20.352     56.6167  1.356225e-010 3.328834e-002
11000  11000  4      15.589     56.9351  1.105979e-010 3.226603e-002
11000  11000  4      15.486     57.3135  1.105979e-010 3.226603e-002
10000  10008  4      11.911     55.9873  9.376647e-011 3.306301e-002
10000  10008  4      11.911     55.9867  9.376647e-011 3.306301e-002
8000   8008   4      6.178      55.2742  5.977169e-011 3.287961e-002
8000   8008   4      6.168      55.3641  5.977169e-011 3.287961e-002
6000   6008   4      2.639      54.5982  4.235125e-011 4.107164e-002
6000   6008   4      2.644      54.4986  4.235125e-011 4.107164e-002
6000   6008   4      2.652      54.3319  4.235125e-011 4.107164e-002
1000   1000   4      0.017      40.0534  1.231404e-012 4.199404e-002
1000   1000   4      0.017      39.7385  1.231404e-012 4.199404e-002
1000   1000   4      0.021      31.9670  1.231404e-012 4.199404e-002
1000   1000   4      0.017      39.1696  1.231404e-012 4.199404e-002
Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
14000  14008  4       58.1391  58.1614
13000  13000  4       57.8859  57.9047
12000  12008  4       57.1128  57.6090
11000  11000  4       57.1243  57.3135
10000  10008  4       55.9870  55.9873
8000   8008   4       55.3192  55.3641
6000   6008   4       54.4762  54.5982
1000   1000   4       37.7321  40.0534
End of tests
2013/01/07 Mon
13:13
X5670 X64
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Mon Jan 07 12:26:17 2013
CPU frequency: 2.924 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6
Parameters are set to:
Number of tests: 12
Number of equations to solve (problem size) : 1000 2000 3000 4000 5000 10000 15000 20000 25000 30000 35000 40000
Leading dimension of array : 1000 2000 3000 4000 5000 10000 15000 20000 25000 30000 35000 40000
Number of trials to run : 4 4 4 4 4 2 2 2 2 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=4210869504, at the size=40000
============= Timing linear equation system solver =================
Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
1000   1000   4      0.018      36.8769  1.290190e-012 4.399880e-002
1000   1000   4      0.018      37.9436  1.290190e-012 4.399880e-002
1000   1000   4      0.018      37.6374  1.290190e-012 4.399880e-002
1000   1000   4      0.029      22.6924  1.290190e-012 4.399880e-002
2000   2000   4      0.879      6.0758   3.492484e-012 3.038033e-002
2000   2000   4      0.117      45.4940  3.492484e-012 3.038033e-002
2000   2000   4      0.102      52.3469  3.492484e-012 3.038033e-002
2000   2000   4      0.107      50.1015  3.492484e-012 3.038033e-002
3000   3000   4      0.337      53.5029  9.005574e-012 3.467831e-002
3000   3000   4      0.354      50.9067  9.005574e-012 3.467831e-002
3000   3000   4      0.595      30.3020  9.005574e-012 3.467831e-002
3000   3000   4      0.369      48.8266  9.005574e-012 3.467831e-002
4000   4000   4      1.417      30.1345  1.677858e-011 3.657050e-002
4000   4000   4      0.744      57.4257  1.677858e-011 3.657050e-002
4000   4000   4      0.779      54.8473  1.677858e-011 3.657050e-002
4000   4000   4      0.752      56.7728  1.677858e-011 3.657050e-002
5000   5000   4      2.103      39.6538  2.713010e-011 3.783074e-002
5000   5000   4      1.426      58.4756  2.713010e-011 3.783074e-002
5000   5000   4      1.414      58.9517  2.713010e-011 3.783074e-002
5000   5000   4      1.425      58.5128  2.713010e-011 3.783074e-002
10000  10000  4      11.681     57.0893  1.027752e-010 3.623959e-002
10000  10000  4      11.242     59.3192  1.027752e-010 3.623959e-002
15000  15000  4      35.636     63.1518  2.027952e-010 3.194059e-002
15000  15000  4      35.633     63.1558  2.027952e-010 3.194059e-002
20000  20000  4      84.053     63.4617  3.288586e-010 2.911119e-002
20000  20000  4      85.963     62.0515  3.288586e-010 2.911119e-002
25000  25000  4      164.268    63.4203  5.767077e-010 3.279530e-002
25000  25000  4      163.132    63.8619  5.767077e-010 3.279530e-002
30000  30000  4      282.801    63.6553  7.722717e-010 3.044303e-002
35000  35000  4      444.380    64.3274  1.158799e-009 3.363819e-002
40000  40000  4      664.366    64.2265  1.413958e-009 3.144691e-002
Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
1000   1000   4       33.7876  37.9436
2000   2000   4       38.5045  52.3469
3000   3000   4       45.8846  53.5029
4000   4000   4       49.7951  57.4257
5000   5000   4       53.8985  58.9517
10000  10000  4       58.2043  59.3192
15000  15000  4       63.1538  63.1558
20000  20000  4       62.7566  63.4617
25000  25000  4       63.6411  63.8619
30000  30000  4       63.6553  63.6553
35000  35000  4       64.3274  64.3274
40000  40000  4       64.2265  64.2265
End of tests
2013/01/07 Mon
13:05
X5672×2 32-bit
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Mon Jan 07 12:20:57 2013
CPU frequency: 3.589 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 8
Parameters are set to:
Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000 6000 1000
Leading dimension of array : 15000 14008 13000 12008 11000 10008 8008 6008 1000
Number of trials to run : 1 2 2 2 2 2 2 3 4
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=1569180256, at the size=14000
============= Timing linear equation system solver =================
Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
14000  14008  4      22.136     82.6588  2.203665e-010 3.979584e-002
14000  14008  4      21.048     86.9296  2.203665e-010 3.979584e-002
13000  13000  4      17.127     85.5353  1.461559e-010 3.058202e-002
13000  13000  4      17.077     85.7877  1.461559e-010 3.058202e-002
12000  12008  4      13.537     85.1212  1.272941e-010 3.124415e-002
12000  12008  4      13.623     84.5821  1.272941e-010 3.124415e-002
11000  11000  4      10.700     82.9481  1.429388e-010 4.170121e-002
11000  11000  4      10.633     83.4716  1.429388e-010 4.170121e-002
10000  10008  4      7.914      84.2669  8.823431e-011 3.111231e-002
10000  10008  4      7.965      83.7259  8.823431e-011 3.111231e-002
8000   8008   4      4.144      82.4063  5.420336e-011 2.981655e-002
8000   8008   4      4.118      82.9200  5.420336e-011 2.981655e-002
6000   6008   4      1.775      81.1661  3.530821e-011 3.424139e-002
6000   6008   4      1.766      81.5683  3.530821e-011 3.424139e-002
6000   6008   4      1.765      81.6124  3.530821e-011 3.424139e-002
1000   1000   4      0.012      54.2722  1.231404e-012 4.199404e-002
1000   1000   4      0.012      53.8819  1.231404e-012 4.199404e-002
1000   1000   4      0.012      54.7763  1.231404e-012 4.199404e-002
1000   1000   4      0.012      54.4514  1.231404e-012 4.199404e-002
Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
14000  14008  4       84.7942  86.9296
13000  13000  4       85.6615  85.7877
12000  12008  4       84.8516  85.1212
11000  11000  4       83.2098  83.4716
10000  10008  4       83.9964  84.2669
8000   8008   4       82.6632  82.9200
6000   6008   4       81.4489  81.6124
1000   1000   4       54.3454  54.7763
End of tests
2013/01/07 Mon
12:25
X5672×2 X64
Intel(R) Optimized LINPACK Benchmark data
Current date/time: Mon Jan 07 11:32:07 2013
CPU frequency: 3.587 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 8
Parameters are set to:
Number of tests: 12
Number of equations to solve (problem size) : 1000 2000 3000 4000 5000 10000 15000 20000 25000 30000 35000 40000
Leading dimension of array : 1000 2000 3000 4000 5000 10000 15000 20000 25000 30000 35000 40000
Number of trials to run : 4 4 4 4 4 2 2 2 2 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 4
Maximum memory requested that can be used=4210869504, at the size=40000
============= Timing linear equation system solver =================
Size   LDA    Align. Time(s)    GFlops   Residual      Residual(norm)
1000   1000   4      0.015      43.9879  1.290190e-012 4.399880e-002
1000   1000   4      0.021      32.5291  1.290190e-012 4.399880e-002
1000   1000   4      0.016      43.1009  1.290190e-012 4.399880e-002
1000   1000   4      0.015      46.0368  1.290190e-012 4.399880e-002
2000   2000   4      0.081      65.8844  3.681139e-012 3.202140e-002
2000   2000   4      0.076      70.5308  3.681139e-012 3.202140e-002
2000   2000   4      0.081      66.1783  3.681139e-012 3.202140e-002
2000   2000   4      0.084      63.7339  3.681139e-012 3.202140e-002
3000   3000   4      0.235      76.8288  8.734729e-012 3.363535e-002
3000   3000   4      0.232      77.7781  8.734729e-012 3.363535e-002
3000   3000   4      0.238      75.7094  8.734729e-012 3.363535e-002
3000   3000   4      0.255      70.7262  8.734729e-012 3.363535e-002
4000   4000   4      0.514      83.0847  1.856720e-011 4.046898e-002
4000   4000   4      0.606      70.5023  1.856720e-011 4.046898e-002
4000   4000   4      0.605      70.5790  1.856720e-011 4.046898e-002
4000   4000   4      0.500      85.4023  1.856720e-011 4.046898e-002
5000   5000   4      2.325      35.8591  2.776018e-011 3.870934e-002
5000   5000   4      0.955      87.3358  2.776018e-011 3.870934e-002
5000   5000   4      0.958      87.0799  2.776018e-011 3.870934e-002
5000   5000   4      0.958      87.0779  2.776018e-011 3.870934e-002
10000  10000  4      8.549      78.0076  9.922822e-011 3.498888e-002
10000  10000  4      9.405      70.9025  9.922822e-011 3.498888e-002
15000  15000  4      24.759     90.8959  1.860329e-010 2.930049e-002
15000  15000  4      23.856     94.3330  1.860329e-010 2.930049e-002
20000  20000  4      56.395     94.5851  3.422355e-010 3.029534e-002
20000  20000  4      56.413     94.5542  3.422355e-010 3.029534e-002
25000  25000  4      109.409    95.2196  6.775732e-010 3.853116e-002
25000  25000  4      109.506    95.1356  6.775732e-010 3.853116e-002
30000  30000  4      189.422    95.0353  1.038589e-009 4.094128e-002
35000  35000  4      299.855    95.3321  1.186054e-009 3.442935e-002
40000  40000  4      446.274    95.6136  1.502736e-009 3.342137e-002
Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
1000   1000   4       41.4137  46.0368
2000   2000   4       66.5818  70.5308
3000   3000   4       75.2606  77.7781
4000   4000   4       77.3921  85.4023
5000   5000   4       74.3382  87.3358
10000  10000  4       74.4551  78.0076
15000  15000  4       92.6144  94.3330
20000  20000  4       94.5697  94.5851
25000  25000  4       95.1776  95.2196
30000  30000  4       95.0353  95.0353
35000  35000  4       95.3321  95.3321
40000  40000  4       95.6136  95.6136
End of tests
2013/01/07 Mon
12:00
X5670, 6 cores, 2.93 GHz, one CPU: 58 GFlops (32-bit), 64 GFlops (x64)
X5672, 4 cores, 3.2 GHz, two CPUs: 85 GFlops (32-bit), 95 GFlops (x64)