鲲鹏服务器性能测试最佳实践

鲲鹏服务器性能测试最佳实践

配置清单

操作系统:

[root@localhost ceshi]# cat /etc/openEuler-release
openEuler release 22.03 LTS

#uname -a
Linux localhost.localdomain 5.10.0-60.18.0.50.oe2203.aarch64 #1 SMP Wed Mar 30 02:43:08 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux

CPU信息(鲲鹏920 / 128核):

Architecture:           aarch64
  CPU op-mode(s):       64-bit
  Byte Order:           Little Endian
CPU(s):                 128
  On-line CPU(s) list:  0-127
Vendor ID:              HiSilicon
  BIOS Vendor ID:       HiSilicon
  Model name:           Kunpeng-920
    BIOS Model name:    Kunpeng 920-6426
    Model:              0
    Thread(s) per core: 1
    Core(s) per socket: 64
    Socket(s):          2
    Stepping:           0x1
    BogoMIPS:           200.00
    Flags:              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
Caches (sum of all):
  L1d:                  8 MiB (128 instances)
  L1i:                  8 MiB (128 instances)
  L2:                   64 MiB (128 instances)
  L3:                   256 MiB (4 instances)
NUMA:
  NUMA node(s):         4
  NUMA node0 CPU(s):    0-31
  NUMA node1 CPU(s):    32-63
  NUMA node2 CPU(s):    64-95
  NUMA node3 CPU(s):    96-127
Vulnerabilities:
  Itlb multihit:        Not affected
  L1tf:                 Not affected
  Mds:                  Not affected
  Meltdown:             Not affected
  Spec store bypass:    Vulnerable
  Spectre v1:           Mitigation; __user pointer sanitization
  Spectre v2:           Not affected
  Srbds:                Not affected
  Tsx async abort:      Not affected

测试服务器软件配置清单

软件项 版本 备注
SPEC CPU2017 1.0.5

内存512G(16 x 32G):

[root@localhost ceshi]# free -m
               total        used        free      shared  buff/cache   available
Mem:          513391        9983      502466          27         941      501290
Swap:           4095           0        4095

[root@localhost ceshi]# dmidecode  | grep -i mem | grep 2666
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
        Configured Memory Speed: 2666 MT/s
[root@localhost ceshi]# dmidecode  | grep -i mem | grep 2666 | wc -l
16

1、SPEC CPU2017测试

1.1 整型rate测试

准备测试工具

spec镜像文件:cpu2017-1.0.5.iso
openEuler镜像文件:
毕昇编译器:
spec2017配置文件:

安装SPEC CPU 2017的依赖(如已有则忽略)

挂载镜像

mount cpu2017-1.0.5.iso /mnt

安装SPEC 2017,指定安装到/home/spec2017

cd /mnt
./install.sh -d /ceshi/SPEC2017/spectest

安装毕昇编译器(1.3.3版本)

# 下载地址
https://mirrors.huaweicloud.com/kunpeng/archive/compiler/bisheng_compiler/bisheng-compiler-1.3.3-aarch64-linux.tar.gz
# 解压安装包
tar -vxf bisheng-compiler-1.3.3-aarch64-linux.tar.gz
# 查看目录
[root@localhost spectest]# ls /usr/local/BSC133/
bin  include  lib  libexec  share

设置编译器环境变量

export LLVM_DIR=/usr/local/BSC133/

安装鲲鹏数学库

# 下载地址 
https://www.hikunpeng.com/developer/boostkit/library/math
# 安装
[root@localhost iso]# rpm -ivh boostkit-kml-2.2.0-1.aarch64.rpm
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]

安装大页内存编译包

# 下载地址
https://repo.almalinux.org/almalinux/8/BaseOS/aarch64/os/Packages/libhugetlbfs-2.21-17.el8.aarch64.rpm
# 安装
rpm -ivh libhugetlbfs-2.21-17.el8.aarch64.rpm

安装jemalloc

# 解压源码包
tar -xf jemalloc-5.3.0.tar.gz 
# 编译
cd jemalloc-5.3.0 
bash autogen.sh 
mkdir build 
cd build 
../configure  CFLAGS="-march=armv8.2-a+crc+fp+crypto -flto=32" CXXFLAGS="-march=armv8.2-a+crc+fp+crypto -flto=32" 
make -j 
# 安装
make install

安装optimized-routines优化库

# 解压源码包
tar -xf optimized-routines-23.01.tar.gz 
cd optimized-routines-23.01/
# 编译
make ARCH=aarch64 -j 8
# 拷贝库文件
cp -a build/lib/lib* /usr/local/lib

拷贝cpu2017配置文件到config目录中

# rate配置文件
cp cpu2017-kunpeng920-128-rate.cfg /ceshi/SPEC2017/spectest/config 

拷贝flag文件

cp Bisheng-compiler-flags.xml /ceshi/SPEC2017/spectest/config/flag

开始测试

# 测试前优化
source shrc
echo 3 > /proc/sys/vm/drop_caches
ulimit -s unlimited

# rate测试
nohup runcpu -c cpu2017-kunpeng920-128-rate.cfg  --rebuild --copies 128 --reportable --define fastmath=0 --define jemalloc=1 --define hugepages=0 --nopower --runmode rate --tune base --size refrate intrate &

1.2 浮点测试

1.3 测试数据

芯片/测试项 intrate fprate intspeed fpspeed 备注
鲲鹏920(128核) 347(官方数据389) 266

2、SPEC CPU2006测试


鲲鹏服务器性能测试最佳实践
http://gsproj.github.io/2024/01/30/02_测试/08_鲲鹏服务器性能测试最佳实践/
作者
GongSheng
发布于
2024年1月30日
许可协议