Getting Started with SNPE (2): Running inceptionV3 Inference on a Phone
Published 2022-03-03 18:10:03

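SNPE is a development kit for neural network inference on the Snapdragon platform. It helps developers accelerate AI applications on devices that use Qualcomm chips.

Whether at chip makers' launch events or at phone vendors' product launches, AI capability is always a focus of discussion.

The Snapdragon 888 adopts the Hexagon Tensor Processor (HTP), built on the new-generation Hexagon 780 architecture. Compute rises from 15 TOPS on the 865 to 26 TOPS, and the 888+ pushes it further to 32 TOPS. Qualcomm has not published details of how these figures are derived, but judging by the TOPS numbers alone, it is certainly powerful.

So, as developers, can we use the HTP to accelerate inference of our own AI models? The answer is yes. Below we explore how to run inceptionV3 inference on the 888's HTP.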

This article has two parts.

  1. Running inceptionV3 inference on the phone's CPU and GPU
  2. Running inceptionV3 inference on the HTP with the SNPE tools

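Inference of inception-v3 with SNPE-CPU / SNPE-GPU

First, we run model inference on the phone's CPU and GPU and record their inference speed. I use a Redmi K40 Pro with the Snapdragon 888.

(1) The model and data are the inception_v3.dlc and data prepared in the previous article, "Getting Started with SNPE: Inference of inceptionV3"

  • Push the data and the model to /data/local/tmp/incpv3 on the phone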
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell mkdir /data/local/tmp/incpv3
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push inception_v3.dlc /data/local/tmp/incpv3
inception_v3.dlc: 1 file pushed. 37.5 MB/s (95639760 bytes in 2.433s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push data/. /data/local/tmp/incpv3
data/./: 15 files pushed. 32.6 MB/s (18266784 bytes in 0.535s)

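(2) Prepare the SNPE library libSNPE.so and the test tool snpe-net-run

  • Push the required SNPE libraries libSNPE.so and libc++_shared.so, together with the test application snpe-net-run, to /data/local/tmp/incpv3/ on the phone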
<host workspace>/inceptionv3# adb shell mkdir -p  /data/local/tmp/incpv3/arm64/lib
<host workspace>/inceptionv3# adb shell mkdir -p  /data/local/tmp/incpv3/arm64/bin
<host workspace>/inceptionv3# adb shell mkdir -p  /data/local/tmp/incpv3/dsp/lib
<host workspace>/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libSNPE.so /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libSNPE.so: 1 file pushed. 33.4 MB/s (7197056 bytes in 0.205s)
<host workspace>/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libc++_shared.so /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libc++_shared.so: 1 file pushed. 23.1 MB/s (1055016 bytes in 0.044s)
<host workspace>/inceptionv3# adb push $SNPE_ROOT/bin/aarch64-android-clang6.0/snpe-net-run /data/local/tmp/incpv3/arm64/bin
/workspace/tutor/snpe-1.52.0.2724/bin/aarch64-android-clang6.0/snpe-net-run: 1 file pushed. 15.6 MB/s (550336 bytes in 0.034s)
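
Before pushing, it can help to confirm that the three files actually exist under $SNPE_ROOT, since a wrong SDK path only shows up as a failed adb push. The following is a minimal sketch; missing_snpe_files is a hypothetical helper name, and the relative paths are simply the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SNPE): report which of the files pushed in
# this walkthrough are missing under the given SDK root.
# Prints nothing and returns 0 when all three files exist.
missing_snpe_files() {
    root=$1
    rc=0
    for rel in \
        lib/aarch64-android-clang6.0/libSNPE.so \
        lib/aarch64-android-clang6.0/libc++_shared.so \
        bin/aarch64-android-clang6.0/snpe-net-run
    do
        if [ ! -f "$root/$rel" ]; then
            echo "missing: $rel"
            rc=1
        fi
    done
    return $rc
}

# Usage on the host, before the adb push commands:
#   missing_snpe_files "$SNPE_ROOT" || echo "check SNPE_ROOT first"
```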

(3) Run inceptionV3 inference on the CPU and GPU

  • Set the environment variables inside adb shell
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell
haydn:/ $ cd /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ snpe-net-run --version
SNPE v1.52.0.2724
  • Run inceptionV3 inference on the CPU; the output goes to output_cpu
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3.dlc --input_list target_raw_list.txt --output_dir output_cpu
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw

haydn:/data/local/tmp/incpv3 $ ls output_cpu/ -al
drwxr-xr-x 3 shell shell  3452 2021-08-31 17:29 Result_0
drwxr-xr-x 3 shell shell  3452 2021-08-31 17:29 Result_1
-rw-rw-rw- 1 shell shell 46536 2021-08-31 17:29 SNPEDiag_0.log
  • Run inceptionV3 inference on the GPU by adding --use_gpu to snpe-net-run; the output goes to output_gpu
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3.dlc --input_list target_raw_list.txt --output_dir output_gpu --use_gpu
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw

haydn:/data/local/tmp/incpv3 $ ls output_gpu/ -al
drwxr-xr-x 3 shell shell  3452 2021-08-31 17:32 Result_0
drwxr-xr-x 3 shell shell  3452 2021-08-31 17:32 Result_1
-rw-rw-rw- 1 shell shell 58064 2021-08-31 17:32 SNPEDiag_0.log
  • Parse SNPEDiag.log to get the CPU and GPU inference speed.

Pull the outputs to the host

root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_cpu .
/data/local/tmp/incpv3/output_cpu/: 5 files pulled. 0.1 MB/s (62552 bytes in 0.414s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_gpu .
/data/local/tmp/incpv3/output_gpu/: 5 files pulled. 0.3 MB/s (74080 bytes in 0.272s)

Use snpe-diagview to parse SNPEDiag.log and get the inference time

# CPU
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_cpu/SNPEDiag_0.log
------------------------------
Total Inference Time: 148400 us
Layer Times:
---------------
0: 1 us : CPU
1: 3823 us : CPU
2: 269 us : CPU
3: 7659 us : CPU
.....

# GPU
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_gpu/SNPEDiag_0.log
--------------------------------------------------
Total Inference Time: 75681 us
Layer Times:
---------------
0: 90 us : GPU
1: 212 us : GPU
2: 130 us : GPU
3: 1589 us : GPU

These results were obtained on a single phone with default settings and do not represent any performance benchmark.
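
When comparing several runs, it is convenient to pull only the total out of the snpe-diagview report. The sketch below assumes the report contains a line of the form "Total Inference Time: <n> us", as in the transcripts above; total_ms is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a snpe-diagview report on stdin and print the
# total inference time converted from microseconds to milliseconds.
# Assumes a line shaped like "Total Inference Time: 148400 us".
total_ms() {
    awk '/^Total Inference Time:/ { printf "%.1f\n", $4 / 1000 }'
}

# Usage on the host:
#   snpe-diagview --input_log output_cpu/SNPEDiag_0.log | total_ms
#   snpe-diagview --input_log output_gpu/SNPEDiag_0.log | total_ms
```

For the two logs shown above this gives 148.4 for the CPU and 75.7 for the GPU.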


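Inference of inception-v3 with SNPE-HTP

(1) Quantize the model

  • The 888 HTP only supports fixed-point INT8 and INT16 inference, so the model has to be quantized. INT8 is used here.
  • SNPE provides a post-training quantization tool, snpe-dlc-quantize.
  • Add the --enable_htp option so that cache data is generated for the 888 HTP, which speeds up initialization. This option is version dependent: if you switch to a different SNPE version, the cache has to be regenerated.
  • Post-training quantization typically needs around 100-200 sample inputs. This is only a functional demo, so fewer samples are used.
# Copy the data and the input list into the working directory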
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp data/cropped . -r
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp data/target_raw_list.txt .
# Quantize the model (post-training quantization)
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-dlc-quantize --input_dlc inception_v3.dlc --input_list target_raw_list.txt --output_dlc inception_v3_htp.dlc --enable_htp
root@c633e07fbd33:/workspace/tutor/inceptionv3# ls -al inception_v3_htp.dlc
-rw-r--r-- 1 root root 48454504 Aug 31 10:05 inception_v3_htp.dlc
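
For a real calibration set of 100-200 samples, writing the input list by hand is tedious. The sketch below generates one path per line, matching the cropped/plastic_cup.raw style entries that snpe-net-run echoes above. make_input_list and its arguments are hypothetical, and the default limit of 100 is only taken from the guideline mentioned earlier.

```shell
#!/bin/sh
# Hypothetical helper: write up to LIMIT paths of .raw files found directly
# under DIR into OUT, one path per line.
#   make_input_list DIR OUT [LIMIT]   (LIMIT defaults to 100)
make_input_list() {
    dir=$1; out=$2; limit=${3:-100}
    for f in "$dir"/*.raw; do
        # Skip the unexpanded pattern when the directory has no .raw files.
        if [ -f "$f" ]; then printf '%s\n' "$f"; fi
    done | head -n "$limit" > "$out"
}

# Usage from the working directory, so that entries read cropped/<name>.raw:
#   make_input_list cropped target_raw_list.txt 100
```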

(2) Prepare the DSP libraries required by the 888 HTP

  • libsnpe_dsp_domains_v3.so
  • libsnpe_dsp_v68_domains_v3_skel.so
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push $SNPE_ROOT/lib/dsp/libsnpe_dsp_v68_domains_v3_skel.so  /data/local/tmp/incpv3/dsp/lib
/workspace/tutor/snpe-1.52.0.2724/lib/dsp/libsnpe_dsp_v68_domains_v3_skel.so: 1 file pushed. 35.5 MB/s (10244328 bytes in 0.276s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb push $SNPE_ROOT/lib/aarch64-android-clang6.0/libsnpe_dsp_domains_v3.so  /data/local/tmp/incpv3/arm64/lib
/workspace/tutor/snpe-1.52.0.2724/lib/aarch64-android-clang6.0/libsnpe_dsp_domains_v3.so: 1 file pushed. 2.9 MB/s (18080 bytes in 0.006s)

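(3) Run inceptionV3 inference on the 888 HTP

  • Set the environment variables. In addition to LD_LIBRARY_PATH and PATH, ADSP_LIBRARY_PATH must be set to point to the directory that contains libsnpe_dsp_v68_domains_v3_skel.so.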
root@c633e07fbd33:/workspace/tutor/inceptionv3# adb shell
haydn:/ $ cd  /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
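
Note that the ADSP_LIBRARY_PATH value above separates directories with ';' rather than ':'. As a quick sanity check, the sketch below walks such a list and prints the first directory that contains a given file. find_in_adsp_path is a hypothetical helper, and it assumes tr is available in the shell where it runs.

```shell
#!/bin/sh
# Hypothetical helper: given a ';'-separated directory list and a file name,
# print the first directory in the list that contains that file.
find_in_adsp_path() {
    path_list=$1; name=$2
    printf '%s\n' "$path_list" | tr ';' '\n' | while IFS= read -r d; do
        if [ -f "$d/$name" ]; then
            printf '%s\n' "$d"
            break
        fi
    done
}

# Usage:
#   find_in_adsp_path "$ADSP_LIBRARY_PATH" libsnpe_dsp_v68_domains_v3_skel.so
```

If nothing is printed, the skel library is not in any of the listed directories.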
  • Run snpe-net-run with --use_dsp
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3_htp.dlc --input_list target_raw_list.txt --use_dsp --output_dir output_htp                                                                    

The selected runtime is not available on this platform. Continue anyway to observe the failure at network creation time.
error_code=500; error_message=Target runtime is not available. error_code=500; error_message=Target runtime is not available. No viable runtimes available.; error_component=Host Runtime; line_no=391; thread_id=478804950272; error_component=Host Runtime; line_no=262; thread_id=481037636856
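  1. The run fails with "The selected runtime is not available on this platform".
  2. This happens because commercial phones only expose the HTP's unsigned PD to developers (see the Hexagon DSP SDK, which explains in detail what an unsigned PD is), so the following option has to be added.
  • Run snpe-net-run with --use_dsp --platform_options unsignedPD:ON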
haydn:/data/local/tmp/incpv3 $ snpe-net-run --container inception_v3_htp.dlc --input_list target_raw_list.txt --use_dsp --output_dir output_htp --platform_options unsignedPD:ON
PlatformOptions (unsignedPD:ON) set successful config option is valid
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.52.0.2724
-------------------------------------------------------------------------------
Processing DNN input(s):
cropped/notice_sign.raw
Processing DNN input(s):
cropped/trash_bin.raw
Processing DNN input(s):
cropped/plastic_cup.raw
Processing DNN input(s):
cropped/chairs.raw
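
The three invocations used in this article differ only in the container and a few flags. The sketch below collects them in one place by printing the command line for a chosen runtime. snpe_cmd is a hypothetical wrapper, and it only uses the flags that appear in the transcripts above.

```shell
#!/bin/sh
# Hypothetical helper: print the snpe-net-run command line used in this
# walkthrough for a given runtime (cpu, gpu or htp) and DLC file.
snpe_cmd() {
    runtime=$1; dlc=$2
    base="snpe-net-run --container $dlc --input_list target_raw_list.txt --output_dir output_$runtime"
    case $runtime in
        cpu) echo "$base" ;;
        gpu) echo "$base --use_gpu" ;;
        # On this commercial phone the HTP was only reachable with unsignedPD:ON.
        htp) echo "$base --use_dsp --platform_options unsignedPD:ON" ;;
        *)   echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
}

# Usage: print the command, then run it inside adb shell.
#   snpe_cmd htp inception_v3_htp.dlc
```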

(4) Comparing inference speed

Again, use snpe-diagview to parse SNPEDiag.log and get the inference time

root@c633e07fbd33:/workspace/tutor/inceptionv3# adb pull /data/local/tmp/incpv3/output_htp .
/data/local/tmp/incpv3/output_htp/: 5 files pulled. 0.1 MB/s (48880 bytes in 0.320s)
root@c633e07fbd33:/workspace/tutor/inceptionv3# snpe-diagview --input_log output_htp/SNPEDiag_0.log
------------------------------
Total Inference Time: 10673 us
Layer Times:
---------------
0: 0 us : DSP
1: 0 us : DSP
2: 266038 us : DSP
3: 0 us : DSP
...

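The per-layer values here are not really in us, despite the label; they are most likely cycles. Layer 2 alone reads 266038, which could not fit inside a total of 10673 us.

  • The results show that one inceptionV3 inference on the 888's HTP takes 10.7 ms.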

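These results were obtained on a single phone with default settings and do not represent any performance benchmark. Proper performance testing requires setting the performance mode and several other options; the snpe-throughput-net-run tool shipped with SNPE can be used for that.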
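
To put the three totals side by side, the ratios can be computed directly from the numbers reported above. This is only an arithmetic sketch with a hypothetical speedup helper. Keep in mind that the HTP run used the quantized inception_v3_htp.dlc while the CPU and GPU runs used inception_v3.dlc, and that none of the runs set a performance mode.

```shell
#!/bin/sh
# Hypothetical helper: given two total inference times in us, print how many
# times faster the second run is than the first.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1fx\n", a / b }'
}

speedup 148400 10673   # CPU vs HTP, prints 13.9x
speedup 75681 10673    # GPU vs HTP, prints 7.1x
speedup 148400 75681   # CPU vs GPU, prints 2.0x
```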


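At this point, we have completed

  • Running inceptionV3 inference on the CPU and GPU of an 888 phone
  • Running inceptionV3 inference on the HTP of an 888 phone with the SNPE tools

Reference: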

<path_to_snpe_sdk>/snpe-1.52.0.2724/doc/html/index.html

Author: Wenhao
