海光服务器性能测试最佳实践

海光服务器性能测试最佳实践

配置清单

操作系统:

阿里云OS:
AliOS7U2-4.19-x86-64.1.265

#uname -a
Linux localhost.localdomain 4.19.91-007.ali4000.alios7.x86_64 #1 SMP Wed Apr 8 16:17:43 CST 2020 x86_64 x86_64 x86_64 GNU/Linux

CPU信息:

#lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    2
Core(s) per socket:    32
Socket(s):             2
NUMA node(s):          8
Vendor ID:             HygonGenuine
CPU family:            24
Model:                 2
Model name:            Hygon C86 7380 32-core Processor
Stepping:              2
CPU MHz:               2197.285
CPU max MHz:           2200.0000
CPU min MHz:           1200.0000
BogoMIPS:              4399.97
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7,64-71
NUMA node1 CPU(s):     8-15,72-79
NUMA node2 CPU(s):     16-23,80-87
NUMA node3 CPU(s):     24-31,88-95
NUMA node4 CPU(s):     32-39,96-103
NUMA node5 CPU(s):     40-47,104-111
NUMA node6 CPU(s):     48-55,112-119
NUMA node7 CPU(s):     56-63,120-127
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

测试服务器软件配置清单

软件项 版本 备注
SPEC CPU2017 1.0.5

1、SPEC CPU2017测试

安装SPEC CPU 2017的依赖(如已有则忽略)

yum install libnsl gcc gcc-c++ glibc.x86_64 glibc.i686 glibc-devel.i686 glibc-devel.x86_64 libstdc++-devel.i686 libstdc++-devel.x86_64 libgcc.i686 libgcc.x86_64 libxml2.i686 libxml2.x86_64 libgfortran libhugetlbfs libnsl.i686 libnsl2.i686 libnsl2 tar perl vim numactl bzip2

挂载镜像

mount cpu2017-1.0.5.iso /mnt

安装SPEC 2017,指定安装到/home/spec2017

sh /mnt/install.sh -d /home/cpu2017

上传并解压AOCC优化包

tar -xf hygon-speccpu2017-aocc2.2.0-20230411.tar.gz -C /home/cpu2017

上传并解压优化库

tar -xf libs_opt.tar.gz -C /opt

开始测试

cd /home/cpu2017
# 根据测试需求调整run.sh中的runcpu测试命令。
nohup sh run.sh &

防止fpspeed 628题编译不过

# 修改/home/spec2017/config文件夹中的speed配置,注释编译选项

#-------------------------- fpseed tuning flags ---------------------------------
fpspeed:
CXX                      = clang++ -std=c++98

LDCFLAGS                 = -L$[GLIBC64_DIR] -Wl,-rpath=$[GLIBC64_DIR] -Wl,--dynamic-linker=$[GLIBC64_DIR]/ld-linux-x86-64.so.2
LDCXXFLAGS               = -L$[GLIBC64_2_30_DIR] -Wl,-rpath=$[GLIBC64_2_30_DIR] -Wl,--dynamic-linker=$[GLIBC64_2_30_DIR]/ld-linux-x86-64.so.2 -Wl,-mllvm -Wl,-suppress-fmas
LDFFLAGS                 = -L$[GLIBC64_DIR] -Wl,-rpath=$[GLIBC64_DIR] -Wl,--dynamic-linker=$[GLIBC64_DIR]/ld-linux-x86-64.so.2 -ffast-math \
                           # 防止628编译不过
                           #-Wl,-mllvm -Wl,-inline-recursion=4 \
                           #-Wl,-mllvm -Wl,-lsr-in-nested-loop \
                           #-Wl,-mllvm -Wl,-enable-iv-split

测试数据

芯片/测试项 intrate fprate intspeed fpspeed 备注
7380 (2.2Ghz 单核32核,双路64核,超线程128核,共8个NUMA) 274 (128 copies) 256(128 copies) 6.99(128 threads) 92.5(64 threads) 内存32 x 32G,共1024G
7380 (2.2Ghz 单核32核,双路64核,超线程128核,共8个NUMA) 211 (128 copies) 174 (128 copies) 5.29 (128 threads) 77.9 (64 threads) 内存12 x 32G,共384G
7360 (2.5Ghz 单核24核,双路48核,超线程96核,共8个NUMA) 231 (96 copies) 252 (96 copies) 6.05 (96 threads) 93.9 (48 threads) 内存 16 x 32,共512G
5380 (2.4Ghz 单核16核,双路32核,超线程64核,共4个NUMA) 110 (64 copies) 84.6 (64 copies) 5.32 (64 threads) 41.7 (32 threads) 内存12 x 32G,共384G

2、SPEC CPU2006测试

2.1 实验测试环境

# CPU 海光5380
#lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          4
Vendor ID:             HygonGenuine
CPU family:            24
Model:                 2
Model name:            Hygon C86 5380 16-core Processor
Stepping:              2
CPU MHz:               2496.997
BogoMIPS:              5000.21
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7,32-39
NUMA node1 CPU(s):     8-15,40-47
NUMA node2 CPU(s):     16-23,48-55
NUMA node3 CPU(s):     24-31,56-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca

# 内存384G 12 * 32G DDR4 2933
#free -m
              total        used        free      shared  buff/cache   available
Mem:         386316        1067      383900           1        1348      383380
Swap:          2047           0        2047

2.2 准备测试工具

# 1、SPEC CPU2006的安装镜像
cpu2006-1.2.1.iso

# 2、海光测试工具包
Hygon编译包.rar	// 建议先在windows改成zip再上传

2.3 安装CPU2006测试工具

cpu2006的安装路径:**/cpu2006-1.2 (写死的!一定要安装到这) **

# 1、将 iso 镜像挂载到/mnt路径
mount –o loop cpu2006-1.2.iso /mnt
# 2、进入挂载目录
cd /mnt
# 3、安装: 
./install.sh –d /cpu2006-1.2  

2.4 安装依赖包

Ubuntu 系统:
apt install libc6:i386 libc6-dev-i386:amd64
apt install libc6:amd64 libc6-dev:amd64

Centos 系统:
yum install -y libstdc++.i686 libstdc++.x86_64 libstdc++-devel.i686 libstdc++-devel.x86_64
glibc-devel.i686 glibc-utils.x86_64 numactl

2.5 安装jemalloc库和smartheap库

# 1、将 jemalloc.64-32bit.tar.gz 和 SmartHeap.tar.gz 拷贝到/root 路径下
cp jemalloc.64-32bit.tar.gz SmartHeap.tar.gz /root
# 2、解压 
tar -xvf jemalloc.64-32bit.tar.gz 
tar -xvf SmartHeap.tar.gz 
# 3、创建: smartheap.conf
vim /etc/ld.so.conf.d/smartheap.conf
添加内容: /root/SmartHeap
# 4、创建: jemalloc.conf : 
vim /etc/ld.so.conf.d/jemalloc.conf
添加内容:
/root/jemalloc/lib64/lib
/root/jemalloc/lib32/lib
# 5、执行生效命令: 
ldconfig

2.6 binary 安装 (会把已编译好的执行文件拷贝进去)

将 SPECcpu2006_hygon_Open64_617306_001.tar.gz 拷贝到 Specpu2006 安装路径下
tar –xvf SPECcpu2006_hygon_Open64_617306_001.tar.gz 进行解压

2.7 设置测试迭代次数

解压 binary 包后在 hygon-run-int-fp-rate-20201118cfg.sh 文件内进行配置

# 如需跑3次,并生成正式报告,约测试 35h,按如下配置 
#run benchmarks
runspec --config=Hygon-open64-hygon-binary-20201118.cfg --action=run --tune=all --size=all --reportable int  
runspec --config=Hygon-open64-hygon-binary-20201118.cfg --action=run --tune=all --size=all --reportable fp 

# 如只需跑1次,不生成正式报告,约测试 12h,按如下配置  
runspec --config=Hygon-open64-hygon-binary-20201118.cfg --action=run --tune=all --size=all --iterations=1 --noreportable int  
runspec --config=Hygon-open64-hygon-binary-20201118.cfg --action=run --tune=all --size=all --iterations=1 --noreportable fp 

2.8 选择测试模式(Rate or Speed)

解压 binary 包后进入到 config 文件夹, vim Hygon-open64-hygon-binary-20201118.cfg 修改配置文件
Rate 模式 rate=1, Speed 模式 rate=0

# 第51行
....
  49 teerunout          = yes
  50 post_setup         = sync
  51 rate               = 1
....

2.9 修改配置参数 (生成的报告中记录的信息)

其中内存大小一定要设置好,不然会跑不起来

解压 binary 包后进入到 config 文件夹, vim Hygon-open64-hygon-binary-20201118.cfg 修
改配置文件, 修改说明如下:
%define num_sockets:芯片数量(1 or 2)
%define cores_per_socket 单颗 CPU 物理核数
%define nodes_per_socket 单颗 CPU node 节点数量
%define memory_size 内存总量	# 测试机器是384G内存,改成256
%define memory_freq 内存额定频率
%define memory_freq_actual 内存实际频率

2.10 开始测试

执行命令后台测试

nohup ./hygon-run-int-fp-rate-20201118cfg.sh &

正常运行输出如下

....
run_peak_test_hygon-rate-revB.0057, run_peak_test_hygon-rate-revB.0058, run_peak_test_hygon-rate-revB.0059, run_peak_test_hygon-rate-revB.0060, run_peak_test_hygon-rate-revB.0061, run_peak_test_hygon-rate-revB.0062, run_peak_test_hygon-rate-revB.0063)
sync
Running Benchmarks
  Running 400.perlbench test base hygon-rate-revB default (64 copies)
/cpu2006-1.2/bin/specinvoke -d /cpu2006-1.2/benchspec/CPU2006/400.perlbench/run/run_base_test_hygon-rate-revB.0000 -e speccmds.err -o speccmds.stdout -f speccmds.cmd -C -q
/cpu2006-1.2/bin/specinvoke -E -d /cpu2006-1.2/benchspec/CPU2006/400.perlbench/run/run_base_test_hygon-rate-revB.0000 -c 64 -e compare.err -o compare.stdout -f compare.cmd
.....

海光服务器性能测试最佳实践
http://gsproj.github.io/2023/12/21/02_测试/07_海光服务器性能测试最佳实践/
作者
GongSheng
发布于
2023年12月21日
许可协议