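SNPE (Snapdragon Neural Processing Engine) is Qualcomm's SDK for running neural-network inference on Snapdragon platforms. It helps developers accelerate AI applications on devices built around Qualcomm chips.
Whether at a chip vendor's launch event or at a phone maker's, AI capability is always a headline topic.
The Snapdragon 888 uses the new-generation Hexagon 780 architecture with its Hexagon Tensor Processor (HTP). The quoted AI performance rises from 15 TOPS on the 865 to 26 TOPS, and the 888+ raises it further to 32 TOPS. Qualcomm has not published how these figures are derived, but judged by the TOPS numbers alone the hardware is clearly powerful.
So, as a developer, can you use the HTP to accelerate inference for your own model? Yes. Below we explore how to run Inception v3 on the 888's HTP.
This article has two parts:
- Running Inception v3 on the phone's CPU and GPU
- Running Inception v3 on the HTP with the SNPE tools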
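Inception v3 inference with SNPE on the CPU and GPU
First we run the model on the phone's CPU and GPU and record the inference time of each. I use a Redmi K40 Pro, which is powered by the Snapdragon 888.
(1) The model and data are the inception_v3.dlc and data directory prepared in the previous article, "Getting started with SNPE: Inception v3 inference" (上手SNPE-推理inceptionV3).
- Push the data and the model to /data/local/tmp/incpv3 on the phone.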
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell mkdir /data/local/tmp/incpv3
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push inception_v3.dlc /data/local/tmp/incpv3
inception_v3.dlc: 1 file pushed. 37.5 MB/s (95639760 bytes in 2.433s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push data/. /data/local/tmp/incpv3
data/./: 15 files pushed. 32.6 MB/s (18266784 bytes in 0.535s)
(2) Prepare the SNPE library libSNPE.so and the test tool snpe-net-run
- Push the required libraries libSNPE.so and libc++_shared.so, together with the snpe-net-run executable, to /data/local/tmp/incpv3/ on the phone.
<host workspace>/inceptionv3# adb shell mkdir -p /data/local/tmp/incpv3/arm64/lib
<host workspace>/inceptionv3# adb shell mkdir -p /data/local/tmp/incpv3/arm64/bin
<host workspace>/inceptionv3# adb shell mkdir -p /data/local/tmp/incpv3/dsp/lib
<host workspace>/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libSNPE.so /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libSNPE.so: 1 file pushed. 33.4 MB/s (7197056 bytes in 0.205s)
<host workspace>/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libc++_shared.so /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libc++_shared.so: 1 file pushed. 23.1 MB/s (1055016 bytes in 0.044s)
<host workspace>/inceptionv3# adb push $SNPE_ROOT/bin/aarch64-android-clang6.0/snpe-net-run /data/local/tmp/incpv3/arm64/bin
/workspace/tutor/snpe-1.52.0.2724/bin/aarch64-android-clang6.0/snpe-net-run: 1 file pushed. 15.6 MB/s (550336 bytes in 0.034s)
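The pushes in step (2) follow a fixed device layout, so they are easy to script. The sketch below only builds and prints the adb command lines; it does not execute anything. The helper name `build_push_commands` and the placeholder SDK path are illustrative assumptions; the directory names and file names mirror the ones used in this walkthrough.

```python
import os

DEVICE_DIR = "/data/local/tmp/incpv3"   # device directory used in this article
TOOLCHAIN = "aarch64-android-clang6.0"  # SDK sub-directory used in this article


def build_push_commands(snpe_root, device_dir=DEVICE_DIR):
    """Return the adb commands of step (2) as argument lists (hypothetical helper)."""
    lib_src = f"{snpe_root}/lib/{TOOLCHAIN}"
    bin_src = f"{snpe_root}/bin/{TOOLCHAIN}"
    cmds = []
    # Create the three target directories on the device.
    for sub in ("arm64/lib", "arm64/bin", "dsp/lib"):
        cmds.append(["adb", "shell", "mkdir", "-p", f"{device_dir}/{sub}"])
    # Push the runtime libraries, then the test executable.
    for lib in ("libSNPE.so", "libc++_shared.so"):
        cmds.append(["adb", "push", f"{lib_src}/{lib}", f"{device_dir}/arm64/lib"])
    cmds.append(["adb", "push", f"{bin_src}/snpe-net-run", f"{device_dir}/arm64/bin"])
    return cmds


# "/path/to/snpe" is a placeholder, used only when SNPE_ROOT is not set.
root = os.environ.get("SNPE_ROOT", "/path/to/snpe")
for cmd in build_push_commands(root):
    print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the printed commands look right.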
(3) Run Inception v3 on the CPU and GPU
- Set the environment variables inside adb shell.
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell
haydn:/ $ cd /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ snpe-net-run --version
SNPE v1.52.0.2724
- Run Inception v3 on the CPU (no runtime flag is passed); the results are written to output_cpu.
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3.dlc --input_list target_raw_list.txt --output_dir output_cpu
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw
haydn:/data/local/tmp/incpv3 $ ls output_cpu/ -al
drwxr-xr-x 3 shell shell 3452 2021-08-31 17:29 Result_0
drwxr-xr-x 3 shell shell 3452 2021-08-31 17:29 Result_1
-rw-rw-rw- 1 shell shell 46536 2021-08-31 17:29 SNPEDiag_0.log
- Run Inception v3 on the GPU by adding --use_gpu to snpe-net-run; the results are written to output_gpu.
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3.dlc --input_list target_raw_list.txt --output_dir output_gpu --use_gpu
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw
haydn:/data/local/tmp/incpv3 $ ls output_gpu/ -al
drwxr-xr-x 3 shell shell 3452 2021-08-31 17:32 Result_0
drwxr-xr-x 3 shell shell 3452 2021-08-31 17:32 Result_1
-rw-rw-rw- 1 shell shell 58064 2021-08-31 17:32 SNPEDiag_0.log
- Parse SNPEDiag_0.log to get the CPU and GPU inference times.
Pull the output directories to the host:
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_cpu .
/data/local/tmp/incpv3/output_cpu/: 5 files pulled. 0.1 MB/s (62552 bytes in 0.414s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_gpu .
/data/local/tmp/incpv3/output_gpu/: 5 files pulled. 0.3 MB/s (74080 bytes in 0.272s)
Use snpe-diagview to parse SNPEDiag_0.log and read the inference time:
# CPU
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_cpu/SNPEDiag_0.log
------------------------------
Total Inference Time: 148400 us
Layer Times:
---------------
0: 1 us : CPU
1: 3823 us : CPU
2: 269 us : CPU
3: 7659 us : CPU
.....
# GPU
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_gpu/SNPEDiag_0.log
--------------------------------------------------
Total Inference Time: 75681 us
Layer Times:
---------------
0: 90 us : GPU
1: 212 us : GPU
2: 130 us : GPU
3: 1589 us : GPU
These results were measured on a single phone with default settings and are not a performance benchmark.
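When comparing several runs, reading the totals by eye gets tedious. The sketch below pulls the "Total Inference Time" value out of snpe-diagview text with a regular expression and compares the CPU and GPU runs. It assumes only the line format shown in the transcripts above; the function name `total_inference_us` is illustrative, and the embedded reports are shortened copies of that output.

```python
import re


def total_inference_us(diagview_text):
    """Return the 'Total Inference Time' in microseconds from snpe-diagview text."""
    match = re.search(r"Total Inference Time:\s*(\d+)\s*us", diagview_text)
    if match is None:
        raise ValueError("no 'Total Inference Time' line found")
    return int(match.group(1))


# Shortened copies of the reports shown above.
cpu_report = """Total Inference Time: 148400 us
Layer Times:
0: 1 us : CPU
"""
gpu_report = """Total Inference Time: 75681 us
Layer Times:
0: 90 us : GPU
"""

cpu_us = total_inference_us(cpu_report)
gpu_us = total_inference_us(gpu_report)
print(f"CPU: {cpu_us / 1000:.1f} ms")
print(f"GPU: {gpu_us / 1000:.1f} ms")
print(f"GPU speedup over CPU: {cpu_us / gpu_us:.2f}x")
```

On the numbers above this gives 148.4 ms for the CPU and 75.7 ms for the GPU, a ratio of about 1.96x.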
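Inception v3 inference with SNPE on the HTP
(1) Quantize the model
- The 888's HTP supports only fixed-point INT8 and INT16 inference, so the model has to be quantized. INT8 is used here.
- SNPE provides a post-training quantization tool, snpe-dlc-quantize.
- Adding the --enable_htp option makes the tool generate cache data for the 888's HTP, which speeds up initialization. The cache depends on the SNPE version: after switching to a different SNPE version, it must be regenerated.
- Post-training quantization normally needs 100 to 200 sample inputs. This is only a functional demo, so fewer samples are used.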
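To make concrete why sample inputs are needed, here is a minimal sketch of generic asymmetric 8-bit quantization: the samples determine the value range, and the range determines the scale and zero point. This illustrates the general idea only. It is not SNPE's actual algorithm, and the function names and sample values are made up for the example.

```python
def quant_params(x_min, x_max, bits=8):
    """Scale and zero point for asymmetric quantization of [x_min, x_max]."""
    # The representable range must contain 0 so that 0.0 maps to an exact code.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    levels = 2 ** bits - 1
    scale = (x_max - x_min) / levels
    if scale == 0.0:  # degenerate all-zero range
        scale = 1.0
    zero_point = round(-x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, bits=8):
    """Map a float to an unsigned integer code, clamped to the valid range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** bits - 1, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to an approximate float."""
    return (q - zero_point) * scale


# Made-up "calibration" values standing in for observed activations.
samples = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = quant_params(min(samples), max(samples))
for x in samples:
    q = quantize(x, scale, zero_point)
    print(f"{x:+.3f} -> {q:3d} -> {dequantize(q, scale, zero_point):+.3f}")
```

With too few or unrepresentative samples the observed range is off, which either clips real values or wastes integer codes. That is the reason for the 100 to 200 sample guideline.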
# Copy the data and the input list into the working directory
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp data/cropped . -r
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp data/target_raw_list.txt .
# Post-training quantization of the model
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-dlc-quantize --input_dlc inception_v3.dlc --input_list target_raw_list.txt --output_dlc inception_v3_htp.dlc --enable_htp
root@c633e07fbd33:/workspace/tutor/inceptionv3# ls -al inception_v3_htp.dlc
-rw-r--r-- 1 root root 48454504 Aug 31 10:05 inception_v3_htp.dlc
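For a real calibration set of 100 to 200 samples, the input list is easier to generate than to write by hand. The sketch below writes one .raw path per line, relative to the list file, which matches the cropped/xxx.raw entries seen in the transcripts. The function name `write_input_list`, the file name calib_raw_list.txt and the demo files are assumptions for illustration. The .raw files themselves are assumed to be preprocessed already, as in the previous article.

```python
import tempfile
from pathlib import Path


def write_input_list(raw_dir, list_path, limit=200):
    """Write up to `limit` .raw paths, one per line, relative to the list file."""
    raw_dir, list_path = Path(raw_dir), Path(list_path)
    files = sorted(raw_dir.glob("*.raw"))[:limit]
    # raw_dir is expected to live under the directory that holds the list file.
    lines = [f.relative_to(list_path.parent).as_posix() for f in files]
    list_path.write_text("".join(line + "\n" for line in lines))
    return lines


# Demo in a temporary directory with empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    cropped = Path(tmp) / "cropped"
    cropped.mkdir()
    for file_name in ("chairs.raw", "plastic_cup.raw", "notes.txt"):
        (cropped / file_name).write_bytes(b"")
    demo_lines = write_input_list(cropped, Path(tmp) / "calib_raw_list.txt")
    demo_text = (Path(tmp) / "calib_raw_list.txt").read_text()

print(demo_lines)
```

The resulting file can be passed to snpe-dlc-quantize through --input_list in place of target_raw_list.txt.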
(2) Prepare the DSP libraries required by the 888's HTP
- libsnpe_dsp_domains_v3.so
- libsnpe_dsp_v68_domains_v3_skel.so
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push $SNPE_ROOT/lib/dsp/libsnpe_dsp_v68_domains_v3_skel.so /data/local/tmp/incpv3/dsp/lib
/workspace/tutor/snpe-1.52.0.2724/lib/dsp/libsnpe_dsp_v68_domains_v3_skel.so: 1 file pushed. 35.5 MB/s (10244328 bytes in 0.276s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libsnpe_dsp_domains_v3.so /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libsnpe_dsp_domains_v3.so: 1 file pushed. 2.9 MB/s (18080 bytes in 0.006s)
(3) Run Inception v3 on the 888's HTP
- Set the environment variables. In addition to LD_LIBRARY_PATH and PATH, ADSP_LIBRARY_PATH must be set so that it includes the directory containing libsnpe_dsp_v68_domains_v3_skel.so.
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell
haydn:/ $ cd /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
- snpe-net-run --use_dsp
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3_htp.dlc --input_list target_raw_list.txt --use_dsp --output_dir output_htp
The selected runtime is not available on this platform. Continue anyway to observe the failure at network creation time.
error_code=500; error_message=Target runtime is not available. error_code=500; error_message=Target runtime is not available. No viable runtimes available.; error_component=Host Runtime; line_no=391; thread_id=478804950272; error_component=Host Runtime; line_no=262; thread_id=481037636856
- The run fails with the error "The selected runtime is not available on this platform".
- The reason is that commercial phones expose only the HTP's unsigned PD to developers (see the Hexagon DSP SDK, which explains unsigned PD in detail), so the following option has to be added:
- snpe-net-run --use_dsp --platform_options unsignedPD:ON
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3_htp.dlc --input_list target_raw_list.txt --use_dsp --output_dir output_htp --platform_options unsignedPD:ON
PlatformOptions (unsignedPD:ON) set successful config option is valid
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/notice_sign.raw
Processing DNN input(s):
cropped/trash_bin.raw
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw
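The three runs in this article differ only in a few flags, which the sketch below collects in one place. It uses only the snpe-net-run options that appear in the transcripts above. The helper names `net_run_command` and `runtime_unavailable` are illustrative, and the commands are built and printed, not executed.

```python
RUNTIME_FLAGS = {"cpu": [], "gpu": ["--use_gpu"], "dsp": ["--use_dsp"]}


def net_run_command(dlc, input_list, runtime, output_dir, unsigned_pd=False):
    """Build a snpe-net-run argument list for one of the runtimes used here."""
    if runtime not in RUNTIME_FLAGS:
        raise ValueError(f"unknown runtime: {runtime}")
    if unsigned_pd and runtime != "dsp":
        raise ValueError("unsignedPD only applies to the DSP/HTP runtime")
    cmd = ["snpe-net-run", "--container", dlc,
           "--input_list", input_list, "--output_dir", output_dir]
    cmd += RUNTIME_FLAGS[runtime]
    if unsigned_pd:
        cmd += ["--platform_options", "unsignedPD:ON"]
    return cmd


def runtime_unavailable(output_text):
    """True if the tool output contains the 'runtime is not available' message."""
    return "runtime is not available" in output_text


print(" ".join(net_run_command(
    "inception_v3.dlc", "target_raw_list.txt", "gpu", "output_gpu")))
print(" ".join(net_run_command(
    "inception_v3_htp.dlc", "target_raw_list.txt", "dsp", "output_htp",
    unsigned_pd=True)))
```

Note that the "not available" message is only a hint. In this walkthrough it was resolved by unsignedPD:ON, but a missing DSP library or a wrong ADSP_LIBRARY_PATH may plausibly produce the same symptom.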
(4) Comparing inference speed
Again, use snpe-diagview to parse SNPEDiag_0.log and read the inference time:
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_htp .
/data/local/tmp/incpv3/output_htp/: 5 files pulled. 0.1 MB/s (48880 bytes in 0.320s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_htp/SNPEDiag_0.log
------------------------------
Total Inference Time: 10673 us
Layer Times:
---------------
0: 0 us : DSP
1: 0 us : DSP
2: 266038 us : DSP
3: 0 us : DSP
...
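Note that the per-layer values here are not really in us, despite the label; they should be cycles.
- The result shows that one Inception v3 inference on the 888's HTP takes about 10.7 ms.
These results were measured on a single phone with default settings and are not a performance benchmark. For real performance testing you need to set the performance mode and some other options; the snpe-throughput-net-run tool that ships with SNPE can be used for that.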
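To put the three totals side by side, the short sketch below converts them to milliseconds and computes each runtime's speedup relative to the CPU run. The numbers are the ones reported by snpe-diagview in this article. The helper name `speedup_table` is illustrative, and the same caveat applies: this is a single-device, default-settings comparison.

```python
# Total inference times reported by snpe-diagview in this article (microseconds).
timings_us = {"CPU": 148400, "GPU": 75681, "HTP": 10673}


def speedup_table(timings, baseline="CPU"):
    """Return (runtime, milliseconds, speedup relative to baseline) rows."""
    base = timings[baseline]
    return [(name, us / 1000.0, base / us) for name, us in timings.items()]


for name, ms, speedup in speedup_table(timings_us):
    print(f"{name:<4}{ms:>8.1f} ms{speedup:>7.1f}x")
```

On these numbers the HTP run is roughly 13.9x faster than the CPU run and about 7.1x faster than the GPU run. Keep in mind that the HTP executed the INT8-quantized DLC while the CPU and GPU executed the original DLC, so the comparison mixes hardware and precision effects.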
With that, we have completed both parts:
- Running Inception v3 on the CPU and GPU of an 888 phone
- Running Inception v3 on the HTP of an 888 phone with the SNPE tools
Reference:
<path_to_snpe_sdk>/snpe-1.52.0.2724/doc/html/index.html
Author: Wenhao