1)
Edit config/make.def so that the compile-time and link-time flags are set to -pg:
# Global *compile time* flags for Fortran programs
#---------------------------------------------------------------------------
FFLAGS = -pg
#---------------------------------------------------------------------------
# Global *link time* flags. Flags for increasing maximum executable
# size usually go here.
#---------------------------------------------------------------------------
FLINKFLAGS = -pg
CFLAGS = -pg
#---------------------------------------------------------------------------
# Global *link time* flags. Flags for increasing maximum executable
# size usually go here.
#---------------------------------------------------------------------------
CLINKFLAGS = -pg
In other words, change -O to -pg in each of these four flag variables.
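The same edit can be scripted. The following is a minimal sketch, assuming each of the four variables is defined on a single line of the form NAME = flags and that only a standalone -O token should be swapped for -pg. The helper name enable_gprof is hypothetical and not part of NPB.

```python
import re

# The four flag variables from config/make.def that need the profiling flag.
FLAG_VARS = {"FFLAGS", "FLINKFLAGS", "CFLAGS", "CLINKFLAGS"}


def enable_gprof(text):
    """Return make.def text with -O replaced by -pg in the four flag variables."""
    out = []
    for line in text.splitlines(keepends=True):
        m = re.match(r"\s*(\w+)\s*=", line)
        if m and m.group(1) in FLAG_VARS:
            name_part, value = line.split("=", 1)
            # Replace only a standalone -O token; -O2, -O3 etc. are left alone.
            value = re.sub(r"(?<!\S)-O(?!\S)", "-pg", value)
            line = name_part + "=" + value
        out.append(line)
    return "".join(out)


# Demonstration on an in-memory fragment (comments and other variables are untouched).
sample = "FFLAGS = -O\nFLINKFLAGS = -O\nCFLAGS = -O\nCLINKFLAGS = -O\n"
print(enable_gprof(sample), end="")
```

To apply it for real, read config/make.def into a string, pass it through enable_gprof, and write the result back. Keep a backup copy of the file first.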
2)
make CG CLASS=A
3)
Run ./bin/cg.A.x. The output is:
NAS Parallel Benchmarks (NPB3.3-SER) - CG Benchmark
Size: 14000
Iterations: 15
Initialization time = 2.719 seconds
iteration ||r|| zeta
1 0.26065081214763E-12 19.9997581277040
2 0.25753187736717E-14 17.1140495745506
3 0.25934878907518E-14 17.1296668946143
4 0.25626292684826E-14 17.1302113581193
5 0.25110613524700E-14 17.1302338856353
6 0.25581937582088E-14 17.1302349879482
7 0.25456477041068E-14 17.1302350498916
8 0.24494068328538E-14 17.1302350537510
9 0.24885235903729E-14 17.1302350540101
10 0.24771507610856E-14 17.1302350540284
11 0.24928441017003E-14 17.1302350540298
12 0.24443706061229E-14 17.1302350540299
13 0.24709361922612E-14 17.1302350540299
14 0.24381630450112E-14 17.1302350540299
15 0.24296673223448E-14 17.1302350540299
Benchmark completed
VERIFICATION SUCCESSFUL
Zeta is 0.1713023505403E+02
Error is 0.5122640033228E-13
CG Benchmark Completed.
Class = A
Size = 14000
Iterations = 15
Time in seconds = 8.91
Mop/s total = 167.86
Operation type = floating point
Verification = SUCCESSFUL
Version = 3.3.1
Compile date = 04 Oct 2024
Compile options:
F77 = f77
FLINK = $(F77)
F_LIB = (none)
F_INC = (none)
FFLAGS = -pg
FLINKFLAGS = -pg
RAND = randi8
Please send all errors/feedbacks to:
NPB Development Team
npb@nas.nasa.gov
4)
After the run completes, the profiling data file is written to the current working directory:
gmon.out
5)
gprof ./bin/cg.A.x > cg.A.log
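Steps 3 to 5 can be chained in a small driver. This is a minimal sketch, assuming gprof is on the PATH and that the binary is run from the directory where gmon.out should appear. The function name profile is hypothetical.

```python
import subprocess
from pathlib import Path


def profile(binary, log):
    """Run a -pg instrumented binary, then write gprof's report to `log`."""
    # Step 3: run the benchmark; an instrumented build writes gmon.out on normal exit.
    subprocess.run([binary], check=True)

    # Step 4: make sure the profile data was actually produced.
    gmon = Path("gmon.out")
    if not gmon.exists():
        raise FileNotFoundError(
            "gmon.out not found: was the binary compiled and linked with -pg?"
        )

    # Step 5: equivalent to `gprof <binary> > <log>`; gprof reads gmon.out by default.
    with open(log, "w") as out:
        subprocess.run(["gprof", binary], stdout=out, check=True)


# Example call for this walkthrough (not executed here):
# profile("./bin/cg.A.x", "cg.A.log")
```

The explicit gmon.out check is there because a build that missed -pg at either compile or link time runs normally but leaves no profile data behind.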
6)
The contents of cg.A.log are as follows:
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 80.25      8.51     8.51       16     0.53     0.53  conj_grad_
 19.52     10.59     2.07        1     2.07     2.07  sparse_
  0.28     10.62     0.03        1     0.03    10.62  MAIN__
  0.00     10.62     0.00   360695     0.00     0.00  randlc_
  0.00     10.62     0.00   180347     0.00     0.00  icnvrt_
  0.00     10.62     0.00    14000     0.00     0.00  sprnvc_
  0.00     10.62     0.00    14000     0.00     0.00  vecset_
  0.00     10.62     0.00        4     0.00     0.00  elapsed_time_
  0.00     10.62     0.00        4     0.00     0.00  wtime_
  0.00     10.62     0.00        3     0.00     0.00  timer_clear_
  0.00     10.62     0.00        2     0.00     0.00  timer_read_
  0.00     10.62     0.00        2     0.00     0.00  timer_start_
  0.00     10.62     0.00        2     0.00     0.00  timer_stop_
  0.00     10.62     0.00        1     0.00     2.07  makea_
  0.00     10.62     0.00        1     0.00     0.00  print_results_
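conj_grad_ and sparse_ clearly take the most time: 80.25% (8.51 s) and 19.52% (2.07 s) of the 10.62 s sampled, respectively. Every other function is negligible.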
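To pick out the hot functions automatically, the flat profile rows can be parsed. This is a minimal sketch, assuming each data row starts with three numeric fields (% time, cumulative seconds, self seconds) and ends with a function name that contains no spaces, as in the listing for this run. The helper parse_flat_profile and the 5% threshold are illustrative choices, not part of gprof.

```python
def parse_flat_profile(text):
    """Extract (name, percent, self seconds) from gprof flat-profile rows."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue  # blank lines and short headings such as "Flat profile:"
        try:
            percent, _cumulative, self_seconds = (float(p) for p in parts[:3])
        except ValueError:
            continue  # column headers and explanatory text
        rows.append(
            {"name": parts[-1], "percent": percent, "self_seconds": self_seconds}
        )
    return rows


# First three rows of the profile above, used as sample input.
SAMPLE = """\
 80.25      8.51     8.51       16     0.53     0.53  conj_grad_
 19.52     10.59     2.07        1     2.07     2.07  sparse_
  0.28     10.62     0.03        1     0.03    10.62  MAIN__
"""

rows = parse_flat_profile(SAMPLE)
hot = [r["name"] for r in rows if r["percent"] >= 5.0]
print(hot)  # ['conj_grad_', 'sparse_']
```

The sketch is meant for the flat-profile section only. When feeding it a full log, it may be simpler to generate the input with gprof -p -b, which prints just the flat profile without the explanatory text.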