Intel(R) Optimized LINPACK Benchmark tool (with parallel support)

superuirui · posted 2013-1-5 20:34:29

Last edited by superuirui on 2013-1-5 22:08

Intel(R) Optimized LINPACK Benchmark
Intel(R) Optimized LINPACK Benchmark is a generalization of the LINPACK 1000 benchmark which solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate and tests the results for accuracy.  The generalization is in the number of equations (N) we can solve, which is not limited to 1000.  We use partial pivoting to assure the accuracy of the results.  This benchmark should not be used to report LINPACK 1000 performance as that is a compiled-code only benchmark.  This is a shared memory implementation which runs on a single platform and should not be confused with MP LINPACK, which is a distributed memory version of the same benchmark.  This benchmark should not be confused with LINPACK, the library, which has been expanded upon by LAPACK the library.
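As a rough illustration of how a solve time is converted into a performance rate, the sketch below assumes the conventional LINPACK operation count of 2/3·N^3 + 2·N^2 floating-point operations. The function name is hypothetical, and the benchmark's own accounting is not spelled out in this post, although this formula lands very close to the GFlops values in the results posted in this thread.

```python
def linpack_gflops(n: int, seconds: float) -> float:
    """Estimate GFlops for solving an n x n dense system in `seconds`.

    Assumes the conventional LINPACK operation count 2/3*n^3 + 2*n^2.
    """
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    # Size and time taken from the first result row posted in this thread
    # (reported there as 32.8970 GFlops).
    print(f"{linpack_gflops(14000, 55.620):.2f} GFlops")
```

Because the printed times are rounded to milliseconds, small differences in the last digits are expected.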

Intel Optimized LINPACK Benchmark uses these names for the executables to be run:

      linpack_xeon32.exe          Windows*, Intel(R) Xeon(R) 32-bit
      linpack_xeon64.exe          Windows*, Intel(R) Xeon(R) 64-bit
      xlinpack_xeon32             Linux*, Intel(R) Xeon(R) 32-bit
      xlinpack_xeon64             Linux*, Intel(R) Xeon(R) 64-bit
      linpack_cd32.app            Mac OS*, Intel(R) Core(TM) Duo 32-bit
      linpack_cd64.app            Mac OS*, Intel(R) Core(TM) microarchitecture 64-bit

superuirui · posted 2013-1-5 21:10:03

Last edited by superuirui on 2013-1-5 21:17

Test results for linpack_xeon32:

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Sat Jan 05 21:05:33 2013

CPU frequency: 3.328 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Parameters are set to:

Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000  6000  1000
Leading dimension of array                         : 15000 14008 13000 12008 11000 10008 8008  6008  1000
Number of trials to run                               : 1       2       2       2       2       2       2       3       4
Data alignment value (in Kbytes)                : 4       4       4       4       4       4       4       4       4

Maximum memory requested that can be used=1569180256, at the size=14000

============= Timing linear equation system solver =================

Size    LDA    Align. Time(s) GFlops    Residual          Residual(norm)
14000  14008  4    55.620    32.8970  1.899019e-010 3.429426e-002
14000  14008  4    46.324    39.4983  1.899019e-010 3.429426e-002
13000  13000  4    36.939    39.6598  1.586276e-010 3.319163e-002
13000  13000  4    36.914    39.6864  1.586276e-010 3.319163e-002
12000  12008  4    29.307    39.3183  1.304731e-010 3.202444e-002
12000  12008  4    36.856    31.2644  1.304731e-010 3.202444e-002
11000  11000  4    23.425    37.8900  1.099408e-010 3.207430e-002
11000  11000  4    26.336    33.7021  1.099408e-010 3.207430e-002
10000  10008  4    16.866    39.5392  8.823431e-011 3.111231e-002
10000  10008  4    22.345    29.8446  8.823431e-011 3.111231e-002
8000 8008 4    9.081    37.6011  5.420336e-011 2.981655e-002
8000 8008 4    8.803    38.7898  5.420336e-011 2.981655e-002
6000 6008 4    4.169    34.5552  3.530821e-011 3.424139e-002
6000 6008 4    3.801    37.9031  3.530821e-011 3.424139e-002
6000 6008 4    3.799    37.9201  3.530821e-011 3.424139e-002
1000 1000 4    0.027    24.8255  1.172701e-012 3.999211e-002
1000 1000 4    0.022    30.5058  1.172701e-012 3.999211e-002
1000 1000 4    0.023    29.1500  1.172701e-012 3.999211e-002
1000 1000 4    0.022    30.4371  1.172701e-012 3.999211e-002

Performance Summary (GFlops)

Size    LDA    Align.  Average Maximal
14000  14008  4    36.1976  39.4983
13000  13000  4    39.6731  39.6864
12000  12008  4    35.2914  39.3183
11000  11000  4    35.7960  37.8900
10000  10008  4    34.6919  39.5392
8000 8008 4    38.1954  38.7898
6000 6008 4    36.7928  37.9201
1000 1000 4    28.7296  30.5058

End of tests

2013/01/05 Sat
21:13

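wwiiww · posted 2013-1-6 00:15:30

I enjoy running benchmarks; I will try this on my workstation tomorrow.

wwiiww · posted 2013-1-7 21:04:06

I downloaded the attachment, but the x64 build would not run because of a DLL problem, so I searched online and found the article below.
I did not test the Hyper-Threading part, because Hyper-Threading is disabled on my machines.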

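Intel(R) Optimized LINPACK Benchmark performance testing (published 2012.02.27)
LINPACK is a widely used CPU performance test. It measures a CPU's computing capability by solving a double-precision system of linear equations. Intel MKL provides an optimized version, the Intel(R) Optimized LINPACK Benchmark, which makes it easy to run a baseline CPU performance test.
Intel(R) Optimized LINPACK Benchmark is an optimized program derived from the LINPACK 1000 benchmark. It generates a system of linear equations from user-specified parameters and computes the CPU's floating-point performance from the solve time and the operation count.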
The basic steps for running the program are:
1> Downloading Intel(R) Optimized LINPACK Benchmark:
The Intel(R) Optimized LINPACK Benchmark is included in the Intel MKL installation package. If you have already downloaded and installed Intel MKL (http://www3.intel.com/cd/software/products/apac/zho/329191.htm), the LINPACK Benchmark is in the mkl/benchmarks/linpack directory.
It is also available as a separate download: http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
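2> Running the program:
The benchmarks/linpack directory contains prebuilt executables; running one directly starts the performance test. Before running it, read the "LINPACK and MP LINPACK Benchmarks" section of the MKL User's Guide:
http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/mklxe/mkl_userguide_win/MKL_UG_linpack/MKL_UG_linpack.htm
On an Intel 64 platform, a simple run command is:
>./xlinpack_xeon64 lininput_xeon64
After the run, the program prints a table of performance data. The GFlops column gives how many billions of double-precision floating-point operations per second (GFLOPS) the system achieved while solving the equations. For example, on a 4-core Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz the test reached 93 GFlops. The CPU's theoretical double-precision peak is 108.8 GFLOPS, so the benchmark reached roughly 85 to 86% of the theoretical peak.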
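The theoretical peak quoted for the i7-2600K can be reproduced with a short calculation. The sketch assumes 8 double-precision floating-point operations per cycle per core (one 4-wide AVX add plus one 4-wide AVX multiply), which is the usual figure for that CPU generation but is not stated in the article; the function names are hypothetical.

```python
def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical double-precision peak in GFLOPS."""
    return cores * ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that was actually measured."""
    return measured_gflops / peak


if __name__ == "__main__":
    # Core i7-2600K: 4 cores at 3.40 GHz; 8 flops/cycle is an assumption (AVX).
    peak = peak_gflops(cores=4, ghz=3.40, flops_per_cycle=8)
    print(f"peak = {peak:.1f} GFLOPS")
    print(f"efficiency = {efficiency(93.0, peak):.1%}")
```

The same helper can be pointed at other CPUs by changing the core count, clock, and per-cycle figure, provided the per-cycle value is checked for that microarchitecture.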

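3> Run parameters
The program takes an input parameter file that contains the main parameters for the LINPACK run. Points to note when filling it in:
1) When choosing the matrix dimensions, take the available memory into account. At run time the program reports the memory it needs (Maximum memory requested that can be used=...); if the system has less physical memory than that, reduce the matrix dimensions in the input.
2) Leading dimension of array: the leading dimension must be greater than or equal to the matrix size. In practice, it is worth testing which leading dimension gives the best performance in your environment. Experience suggests that performance is often best when the leading dimension is larger than the matrix dimension and divisible by 8.
3) Data alignment value: this value is in KB; the default input of 4 means data is aligned on 4 KB address boundaries. The alignment value can usually be set to the system page size.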
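A minimal sketch of the sizing arithmetic behind points 1) and 2), under two assumptions that the article does not confirm: the matrix alone occupies N × LDA × 8 bytes (the benchmark requests somewhat more than this), and a reasonable leading dimension is the smallest value at or above N that is a multiple of 8 but not of 16. The second assumption is simply the pattern that the default input values posted in this thread happen to follow (14000 uses 14008, 13000 uses 13000). The helper names are hypothetical.

```python
def matrix_bytes(n: int, lda: int) -> int:
    """Bytes needed for an n-column matrix of doubles with leading dimension lda."""
    if lda < n:
        raise ValueError("leading dimension must be >= problem size")
    return n * lda * 8  # 8 bytes per double-precision value


def suggest_lda(n: int) -> int:
    """Smallest value >= n that is a multiple of 8 but not of 16 (assumed heuristic)."""
    lda = (n + 7) // 8 * 8  # round up to a multiple of 8
    if lda % 16 == 0:
        lda += 8            # step past multiples of 16
    return lda


if __name__ == "__main__":
    for n in (14000, 13000, 12000):
        lda = suggest_lda(n)
        print(n, lda, f"{matrix_bytes(n, lda) / 2**20:.0f} MiB")
```

Treat the suggested value as a starting point only; the article's advice is to measure a few leading dimensions on the actual machine.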
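4> Multithreading settings
The program is optimized for multi-core systems. By default it sets the number of threads according to the number of processor cores in the system. You can change the thread count with the OMP_NUM_THREADS environment variable. For example, in a bash shell (replace N with the thread count, with no spaces around the equals sign):
   >export OMP_NUM_THREADS=N
For best performance, set the thread count to the number of cores in the system. On systems with Hyper-Threading, set it to half the number of hardware threads and also set the following environment variable:
   KMP_AFFINITY=granularity=fine,compact,1,0
For more discussion, see: http://software.intel.com/en-us/articles/setting-thread-affinity-on-smt-or-ht-enabled-systems/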
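When the benchmark is launched from a script, the thread settings described here could be assembled programmatically. This is only a sketch: the helper name and its parameters are hypothetical, and the caller has to supply the logical CPU count and whether Hyper-Threading is enabled.

```python
import os


def linpack_env(logical_cpus: int, hyperthreading: bool) -> dict:
    """Return a copy of the environment with the thread settings applied."""
    env = dict(os.environ)
    if hyperthreading:
        # One thread per physical core: half the logical CPU count.
        env["OMP_NUM_THREADS"] = str(logical_cpus // 2)
        env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    else:
        env["OMP_NUM_THREADS"] = str(logical_cpus)
    return env


if __name__ == "__main__":
    env = linpack_env(logical_cpus=8, hyperthreading=True)
    print(env["OMP_NUM_THREADS"], env["KMP_AFFINITY"])
    # To launch the benchmark with these settings one could then call, e.g.:
    # subprocess.run(["./xlinpack_xeon64", "lininput_xeon64"], env=env)
```

Returning a copy rather than modifying os.environ keeps the settings local to the benchmark process.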
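Finally, note that results from this program are not suitable for submission as LINPACK 1000 performance data. That benchmark requires running code produced directly by a compiler, whereas the Intel(R) Optimized LINPACK Benchmark contains code hand-tuned for specific processors.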

Test results follow.

X5670, 32-bit

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Mon Jan 07 13:07:32 2013

CPU frequency: 2.924 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6

Parameters are set to:

Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000  6000  1000
Leading dimension of array                : 15000 14008 13000 12008 11000 10008 8008  6008  1000
Number of trials to run                   : 1    2    2    2    2    2    2    3    4
Data alignment value (in Kbytes)          : 4    4    4    4    4    4    4    4    4

Maximum memory requested that can be used=1569180256, at the size=14000

============= Timing linear equation system solver =================

Size LDA Align. Time(s) GFlops Residual    Residual(norm)
14000  14008  4    31.459    58.1614  2.035009e-010 3.675010e-002
14000  14008  4    31.484    58.1167  2.035009e-010 3.675010e-002
13000  13000  4    25.300    57.9047  1.472652e-010 3.081413e-002
13000  13000  4    25.317    57.8670  1.472652e-010 3.081413e-002
12000  12008  4    20.002    57.6090  1.356225e-010 3.328834e-002
12000  12008  4    20.352    56.6167  1.356225e-010 3.328834e-002
11000  11000  4    15.589    56.9351  1.105979e-010 3.226603e-002
11000  11000  4    15.486    57.3135  1.105979e-010 3.226603e-002
10000  10008  4    11.911    55.9873  9.376647e-011 3.306301e-002
10000  10008  4    11.911    55.9867  9.376647e-011 3.306301e-002
8000 8008 4    6.178    55.2742  5.977169e-011 3.287961e-002
8000 8008 4    6.168    55.3641  5.977169e-011 3.287961e-002
6000 6008 4    2.639    54.5982  4.235125e-011 4.107164e-002
6000 6008 4    2.644    54.4986  4.235125e-011 4.107164e-002
6000 6008 4    2.652    54.3319  4.235125e-011 4.107164e-002
1000 1000 4    0.017    40.0534  1.231404e-012 4.199404e-002
1000 1000 4    0.017    39.7385  1.231404e-012 4.199404e-002
1000 1000 4    0.021    31.9670  1.231404e-012 4.199404e-002
1000 1000 4    0.017    39.1696  1.231404e-012 4.199404e-002

Performance Summary (GFlops)

Size LDA Align.  Average  Maximal
14000  14008  4    58.1391  58.1614
13000  13000  4    57.8859  57.9047
12000  12008  4    57.1128  57.6090
11000  11000  4    57.1243  57.3135
10000  10008  4    55.9870  55.9873
8000 8008 4    55.3192  55.3641
6000 6008 4    54.4762  54.5982
1000 1000 4    37.7321  40.0534

End of tests

2013/01/07 Mon
13:13

wwiiww · posted 2013-1-7 21:05:39

X5670, x64

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Mon Jan 07 12:26:17 2013

CPU frequency: 2.924 GHz
Number of CPUs: 1
Number of cores: 6
Number of threads: 6

Parameters are set to:

Number of tests: 12
Number of equations to solve (problem size) : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Leading dimension of array                : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Number of trials to run                   : 4    4    4    4    4    2    2    2    2    1    1    1
Data alignment value (in Kbytes)          : 4    4    4    4    4    4    4    4    4    4    4    4

Maximum memory requested that can be used=4210869504, at the size=40000

============= Timing linear equation system solver =================

Size LDA Align. Time(s) GFlops Residual    Residual(norm)
1000 1000 4    0.018    36.8769  1.290190e-012 4.399880e-002
1000 1000 4    0.018    37.9436  1.290190e-012 4.399880e-002
1000 1000 4    0.018    37.6374  1.290190e-012 4.399880e-002
1000 1000 4    0.029    22.6924  1.290190e-012 4.399880e-002
2000 2000 4    0.879    6.0758 3.492484e-012 3.038033e-002
2000 2000 4    0.117    45.4940  3.492484e-012 3.038033e-002
2000 2000 4    0.102    52.3469  3.492484e-012 3.038033e-002
2000 2000 4    0.107    50.1015  3.492484e-012 3.038033e-002
3000 3000 4    0.337    53.5029  9.005574e-012 3.467831e-002
3000 3000 4    0.354    50.9067  9.005574e-012 3.467831e-002
3000 3000 4    0.595    30.3020  9.005574e-012 3.467831e-002
3000 3000 4    0.369    48.8266  9.005574e-012 3.467831e-002
4000 4000 4    1.417    30.1345  1.677858e-011 3.657050e-002
4000 4000 4    0.744    57.4257  1.677858e-011 3.657050e-002
4000 4000 4    0.779    54.8473  1.677858e-011 3.657050e-002
4000 4000 4    0.752    56.7728  1.677858e-011 3.657050e-002
5000 5000 4    2.103    39.6538  2.713010e-011 3.783074e-002
5000 5000 4    1.426    58.4756  2.713010e-011 3.783074e-002
5000 5000 4    1.414    58.9517  2.713010e-011 3.783074e-002
5000 5000 4    1.425    58.5128  2.713010e-011 3.783074e-002
10000  10000  4    11.681    57.0893  1.027752e-010 3.623959e-002
10000  10000  4    11.242    59.3192  1.027752e-010 3.623959e-002
15000  15000  4    35.636    63.1518  2.027952e-010 3.194059e-002
15000  15000  4    35.633    63.1558  2.027952e-010 3.194059e-002
20000  20000  4    84.053    63.4617  3.288586e-010 2.911119e-002
20000  20000  4    85.963    62.0515  3.288586e-010 2.911119e-002
25000  25000  4    164.268 63.4203  5.767077e-010 3.279530e-002
25000  25000  4    163.132 63.8619  5.767077e-010 3.279530e-002
30000  30000  4    282.801 63.6553  7.722717e-010 3.044303e-002
35000  35000  4    444.380 64.3274  1.158799e-009 3.363819e-002
40000  40000  4    664.366 64.2265  1.413958e-009 3.144691e-002

Performance Summary (GFlops)

Size LDA Align.  Average  Maximal
1000 1000 4    33.7876  37.9436
2000 2000 4    38.5045  52.3469
3000 3000 4    45.8846  53.5029
4000 4000 4    49.7951  57.4257
5000 5000 4    53.8985  58.9517
10000  10000  4    58.2043  59.3192
15000  15000  4    63.1538  63.1558
20000  20000  4    62.7566  63.4617
25000  25000  4    63.6411  63.8619
30000  30000  4    63.6553  63.6553
35000  35000  4    64.3274  64.3274
40000  40000  4    64.2265  64.2265

End of tests

2013/01/07 Mon
13:05

X5672 ×2, 32-bit

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Mon Jan 07 12:20:57 2013

CPU frequency: 3.589 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 8

Parameters are set to:

Number of tests: 9
Number of equations to solve (problem size) : 15000 14000 13000 12000 11000 10000 8000  6000  1000
Leading dimension of array                : 15000 14008 13000 12008 11000 10008 8008  6008  1000
Number of trials to run                   : 1    2    2    2    2    2    2    3    4
Data alignment value (in Kbytes)          : 4    4    4    4    4    4    4    4    4

Maximum memory requested that can be used=1569180256, at the size=14000

============= Timing linear equation system solver =================

Size LDA Align. Time(s) GFlops Residual    Residual(norm)
14000  14008  4    22.136    82.6588  2.203665e-010 3.979584e-002
14000  14008  4    21.048    86.9296  2.203665e-010 3.979584e-002
13000  13000  4    17.127    85.5353  1.461559e-010 3.058202e-002
13000  13000  4    17.077    85.7877  1.461559e-010 3.058202e-002
12000  12008  4    13.537    85.1212  1.272941e-010 3.124415e-002
12000  12008  4    13.623    84.5821  1.272941e-010 3.124415e-002
11000  11000  4    10.700    82.9481  1.429388e-010 4.170121e-002
11000  11000  4    10.633    83.4716  1.429388e-010 4.170121e-002
10000  10008  4    7.914    84.2669  8.823431e-011 3.111231e-002
10000  10008  4    7.965    83.7259  8.823431e-011 3.111231e-002
8000 8008 4    4.144    82.4063  5.420336e-011 2.981655e-002
8000 8008 4    4.118    82.9200  5.420336e-011 2.981655e-002
6000 6008 4    1.775    81.1661  3.530821e-011 3.424139e-002
6000 6008 4    1.766    81.5683  3.530821e-011 3.424139e-002
6000 6008 4    1.765    81.6124  3.530821e-011 3.424139e-002
1000 1000 4    0.012    54.2722  1.231404e-012 4.199404e-002
1000 1000 4    0.012    53.8819  1.231404e-012 4.199404e-002
1000 1000 4    0.012    54.7763  1.231404e-012 4.199404e-002
1000 1000 4    0.012    54.4514  1.231404e-012 4.199404e-002

Performance Summary (GFlops)

Size LDA Align.  Average  Maximal
14000  14008  4    84.7942  86.9296
13000  13000  4    85.6615  85.7877
12000  12008  4    84.8516  85.1212
11000  11000  4    83.2098  83.4716
10000  10008  4    83.9964  84.2669
8000 8008 4    82.6632  82.9200
6000 6008 4    81.4489  81.6124
1000 1000 4    54.3454  54.7763

End of tests

2013/01/07 Mon
12:25

X5672 ×2, x64

Intel(R) Optimized LINPACK Benchmark data

Current date/time: Mon Jan 07 11:32:07 2013

CPU frequency: 3.587 GHz
Number of CPUs: 2
Number of cores: 8
Number of threads: 8

Parameters are set to:

Number of tests: 12
Number of equations to solve (problem size) : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Leading dimension of array                : 1000  2000  3000  4000  5000  10000 15000 20000 25000 30000 35000 40000
Number of trials to run                   : 4    4    4    4    4    2    2    2    2    1    1    1
Data alignment value (in Kbytes)          : 4    4    4    4    4    4    4    4    4    4    4    4

Maximum memory requested that can be used=4210869504, at the size=40000

============= Timing linear equation system solver =================

Size LDA Align. Time(s) GFlops Residual    Residual(norm)
1000 1000 4    0.015    43.9879  1.290190e-012 4.399880e-002
1000 1000 4    0.021    32.5291  1.290190e-012 4.399880e-002
1000 1000 4    0.016    43.1009  1.290190e-012 4.399880e-002
1000 1000 4    0.015    46.0368  1.290190e-012 4.399880e-002
2000 2000 4    0.081    65.8844  3.681139e-012 3.202140e-002
2000 2000 4    0.076    70.5308  3.681139e-012 3.202140e-002
2000 2000 4    0.081    66.1783  3.681139e-012 3.202140e-002
2000 2000 4    0.084    63.7339  3.681139e-012 3.202140e-002
3000 3000 4    0.235    76.8288  8.734729e-012 3.363535e-002
3000 3000 4    0.232    77.7781  8.734729e-012 3.363535e-002
3000 3000 4    0.238    75.7094  8.734729e-012 3.363535e-002
3000 3000 4    0.255    70.7262  8.734729e-012 3.363535e-002
4000 4000 4    0.514    83.0847  1.856720e-011 4.046898e-002
4000 4000 4    0.606    70.5023  1.856720e-011 4.046898e-002
4000 4000 4    0.605    70.5790  1.856720e-011 4.046898e-002
4000 4000 4    0.500    85.4023  1.856720e-011 4.046898e-002
5000 5000 4    2.325    35.8591  2.776018e-011 3.870934e-002
5000 5000 4    0.955    87.3358  2.776018e-011 3.870934e-002
5000 5000 4    0.958    87.0799  2.776018e-011 3.870934e-002
5000 5000 4    0.958    87.0779  2.776018e-011 3.870934e-002
10000  10000  4    8.549    78.0076  9.922822e-011 3.498888e-002
10000  10000  4    9.405    70.9025  9.922822e-011 3.498888e-002
15000  15000  4    24.759    90.8959  1.860329e-010 2.930049e-002
15000  15000  4    23.856    94.3330  1.860329e-010 2.930049e-002
20000  20000  4    56.395    94.5851  3.422355e-010 3.029534e-002
20000  20000  4    56.413    94.5542  3.422355e-010 3.029534e-002
25000  25000  4    109.409 95.2196  6.775732e-010 3.853116e-002
25000  25000  4    109.506 95.1356  6.775732e-010 3.853116e-002
30000  30000  4    189.422 95.0353  1.038589e-009 4.094128e-002
35000  35000  4    299.855 95.3321  1.186054e-009 3.442935e-002
40000  40000  4    446.274 95.6136  1.502736e-009 3.342137e-002

Performance Summary (GFlops)

Size LDA Align.  Average  Maximal
1000 1000 4    41.4137  46.0368
2000 2000 4    66.5818  70.5308
3000 3000 4    75.2606  77.7781
4000 4000 4    77.3921  85.4023
5000 5000 4    74.3382  87.3358
10000  10000  4    74.4551  78.0076
15000  15000  4    92.6144  94.3330
20000  20000  4    94.5697  94.5851
25000  25000  4    95.1776  95.2196
30000  30000  4    95.0353  95.0353
35000  35000  4    95.3321  95.3321
40000  40000  4    95.6136  95.6136

End of tests

2013/01/07 Mon
12:00

wwiiww · posted 2013-1-7 21:09:36

X5670 (6 cores, 2.93 GHz), one CPU: 58 GFlops (32-bit), 64 GFlops (x64)
X5672 (4 cores, 3.2 GHz), two CPUs: 85 GFlops (32-bit), 95 GFlops (x64)
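To put the two summary lines side by side, the sketch below derives per-core throughput and the gain from the 64-bit build, using only the rounded figures in this summary. The names are made up for illustration.

```python
# (label, total cores, 32-bit GFlops, x64 GFlops), rounded figures from the summary
RESULTS = [
    ("X5670 x1", 6, 58.0, 64.0),
    ("X5672 x2", 8, 85.0, 95.0),
]


def per_core(gflops: float, cores: int) -> float:
    """GFlops delivered per physical core."""
    return gflops / cores


def gain_percent(gflops_32: float, gflops_64: float) -> float:
    """Percentage improvement of the x64 build over the 32-bit build."""
    return (gflops_64 / gflops_32 - 1.0) * 100.0


if __name__ == "__main__":
    for label, cores, g32, g64 in RESULTS:
        print(f"{label}: {per_core(g64, cores):.1f} GFlops/core (x64), "
              f"x64 gain {gain_percent(g32, g64):.1f}%")
```

Note that the two runs also used different problem sizes (up to 14000 for 32-bit, up to 40000 for x64), so the gain is not attributable to the 64-bit code alone.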
