The YOLO family of models is widely used in industry. This article gives a quick walkthrough of running YOLOv5 inference on the HTP with SNPE.
This article covers the following parts:
- Exporting the YOLOv5 model to ONNX
- Converting ONNX to DLC and running the model on the phone CPU/GPU
- Quantizing the model and running it on the phone HTP
1. Exporting the YOLOv5 model to ONNX
This article follows the method given in https://github.com/ultralytics/yolov5/issues/251 to obtain the ONNX model.
git clone https://github.com/ultralytics/yolov5 # clone
cd yolov5
pip install -r requirements.txt # install
Taking yolov5s as an example:
/local/mnt/workspace2/tutor/yolo5/frozen_model
╰─ python path/to/export.py --weights yolov5s.pt --include torchscript onnx
This produces yolov5s.onnx:
/local/mnt/workspace2/tutor/yolo5/frozen_model
╰─ ls -al
-rwxr--r-- 1 zwenhao users 29335097 Mar 7 11:08 yolov5s.onnx
-rwxr--r-- 1 zwenhao users 14808437 Mar 7 11:08 yolov5s.pt
-rwxr--r-- 1 zwenhao users 29379661 Mar 7 11:08 yolov5s.torchscript
2. Converting ONNX to DLC
Since SNPE currently does not support 5D tensors, we set the output nodes to the points just before the 5D reshape.
The operators after the 5D reshape, as well as the post-processing, have to be implemented outside of SNPE.
snpe-onnx-to-dlc -i yolov5s.onnx --out_node 384 --out_node 442 --out_node 326
# produces yolov5s.dlc
ls -al
-rw-r--r-- 1 zwenhao users 29024606 Mar 17 14:41 yolov5s.dlc
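Because the output nodes are cut before the 5D reshape, the DLC emits raw convolution outputs, and the decoding YOLOv5 normally does after the reshape must be reimplemented on the host. A minimal numpy sketch of that decode, assuming one head's output has already been rearranged to HxWx255 (an assumed NHWC layout) and using the standard yolov5s stride-8 anchors; treat it as an illustration, not the exact SNPE output format:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_head(conv_out, anchors, stride, num_classes=80):
    """Decode one YOLOv5 detection head taken before the 5D reshape.

    conv_out: (H, W, 255) conv output (assumed layout for illustration).
    anchors:  (3, 2) anchor sizes in pixels for this head's stride.
    Returns (H*W*3, 85) rows of [cx, cy, w, h, obj, 80 class scores] in pixels.
    """
    h, w, _ = conv_out.shape
    na = len(anchors)
    # Split the 255 channels into 3 anchors x (5 + num_classes) values
    y = sigmoid(conv_out.reshape(h, w, na, 5 + num_classes))
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack((gx, gy), axis=-1)[:, :, None, :]       # (H, W, 1, 2)
    xy = (y[..., 0:2] * 2.0 - 0.5 + grid) * stride          # box center in pixels
    wh = (y[..., 2:4] * 2.0) ** 2 * anchors                 # box size in pixels
    return np.concatenate((xy, wh, y[..., 4:]), axis=-1).reshape(-1, 5 + num_classes)

# Example with the standard yolov5s anchors for the stride-8 (80x80) head
anchors_p3 = np.array([[10, 13], [16, 30], [33, 23]], dtype=np.float32)
dets = decode_head(np.random.randn(80, 80, 255).astype(np.float32), anchors_p3, 8)
```

The same function applies to the stride-16 and stride-32 heads with their own anchor sets; confidence filtering and NMS then follow as usual.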
3. SNPE CPU/GPU inference
Push the corresponding SNPE libraries to /data/local/tmp/snpe_net_run/snpe/ on the phone:
adb push /opt/snpe/lib/aarch64-android-clang6.0/. /data/local/tmp/snpe_net_run/snpe/
adb push /opt/snpe/lib/dsp/. /data/local/tmp/snpe_net_run/snpe/
adb push /opt/snpe/bin/aarch64-android-clang6.0/snpe-net-run /data/local/tmp/snpe_net_run/snpe/
Here a random input file INPUT_images_00000001.raw was generated for testing.
# contents of target_raw_list.txt
╰─ cat target_raw_list.txt
#Conv_198 Conv_248 Conv_298
/data/local/tmp/snpe_net_run/yolov5s/Input/INPUT_images_00000001.raw
Push the input, the input list, and yolov5s.dlc to /data/local/tmp/snpe_net_run/yolov5s on the phone:
adb push Input /data/local/tmp/snpe_net_run/yolov5s
adb push target_raw_list.txt /data/local/tmp/snpe_net_run/yolov5s
adb push yolov5s.dlc /data/local/tmp/snpe_net_run/yolov5s
In the adb shell environment:
Set the SNPE library paths on the phone, use snpe-net-run to run the model on the phone CPU, then pull out the cpu_output results.
adb shell
xxx:/ # cd /data/local/tmp/snpe_net_run/yolov5s
xxx:/data/local/tmp/snpe_net_run/yolov5s # export LD_LIBRARY_PATH=/data/local/tmp/snpe_net_run/snpe/:$LD_LIBRARY_PATH && export PATH=$PATH:/data/local/tmp/snpe_net_run/snpe/
(1) SNPE CPU inference
snpe-net-run runs the model; by default it executes on the CPU, and the output is written to cpu_output.
xxx:/data/local/tmp/snpe_net_run/yolov5s # snpe-net-run --input_list target_raw_list.txt --container yolov5s.dlc --output_dir cpu_output --perf_profile high_performance --profiling_level moderate
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.60.0.3313
-------------------------------------------------------------------------------
Processing DNN input(s):
/data/local/tmp/snpe_net_run/yolov5s/Input/INPUT_images_00000001.raw
Pull out the results and inspect the diag log.
Use snpe-diagview to view the inference time; CPU inference took Total Inference Time: 895054 us.
On the host shell:
╭─ /local/mnt/workspace2/tutor/yolo5/frozen_model/net_run_yolov5s
╰─ adb pull /data/local/tmp/snpe_net_run/yolov5s/cpu_output .
/data/local/tmp/snpe_net_run/yolov5s/cpu_output/: 6 files pulled. 29.2 MB/s (8591020 bytes in 0.280s)
View the inference time:
╭─ /local/mnt/workspace2/tutor/yolo5/frozen_model/net_run_yolov5s ✔
╰─ snpe-diagview --input_log cpu_output/SNPEDiag.log
.........
Average SNPE Statistics:
------------------------------
Total Inference Time: 895054 us
Forward Propagate Time: 895021 us
(2) SNPE GPU inference
The output is written to gpu_output:
xxx:/data/local/tmp/snpe_net_run/yolov5s # snpe-net-run --input_list target_raw_list.txt --container yolov5s.dlc --use_gpu --output_dir gpu_output --perf_profile high_performance --profiling_level moderate
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.60.0.3313
-------------------------------------------------------------------------------
Processing DNN input(s):
/data/local/tmp/snpe_net_run/yolov5s/Input/INPUT_images_00000001.raw
Pull out the results and use snpe-diagview to view the inference time; GPU inference took Total Inference Time: 108866 us.
╰─ snpe-diagview --input_log gpu_output/SNPEDiag.log
.......
Average SNPE Statistics:
------------------------------
Total Inference Time: 108866 us
Forward Propagate Time: 108250 us
4. SNPE HTP inference
(1) Quantizing the model
Quantize the model, then push it to /data/local/tmp/snpe_net_run/yolov5s on the phone.
On the host shell:
snpe-dlc-quantize --input_dlc yolov5s.dlc --input_list raw_list.txt --weights_bitwidth=8 --act_bitwidth=8 --bias_bitwidth=32 \
    --output_dlc yolov5s_htp.dlc --enable_htp --htp_socs sm8350
adb push yolov5s_htp.dlc /data/local/tmp/snpe_net_run/yolov5s
(2) HTP inference on the phone
Set the CDSP library path, use snpe-net-run to run the model on the phone HTP, then pull out the htp_output results.
xxx:/data/local/tmp/snpe_net_run/yolov5s # export ADSP_LIBRARY_PATH='/data/local/tmp/snpe_net_run/snpe/;/system/lib/rfsa/adsp;/vendor/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp'
HTP inference needs to run in an unsigned PD (process domain):
xxx:/data/local/tmp/snpe_net_run/yolov5s # snpe-net-run --input_list target_raw_list.txt --container yolov5s_htp.dlc --output_dir htp_output --use_dsp --userbuffer_tf8 --platform_options unsignedPD:ON
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.60.0.3313
-------------------------------------------------------------------------------
Processing DNN input(s):
/data/local/tmp/snpe_net_run/yolov5s/Input/INPUT_images_00000001.raw
xxx:/data/local/tmp/snpe_net_run/yolov5s # snpe-net-run --input_list target_raw_list.txt --container yolov5s_htp.dlc --output_dir htp_output --use_dsp --userbuffer_tf8 --platform_options unsignedPD:ON --perf_profile high_performance --profiling_level moderate
PlatformOptions (unsignedPD:ON) set successful config option is valid
-------------------------------------------------------------------------------
Model String: N/A
SNPE v1.60.0.3313
-------------------------------------------------------------------------------
Processing DNN input(s):
/data/local/tmp/snpe_net_run/yolov5s/Input/INPUT_images_00000001.raw
xxx:/data/local/tmp/snpe_net_run/yolov5s #
Pull out the results and inspect the diag log.
Use snpe-diagview to view the inference time; HTP inference took Total Inference Time: 4621 us.
On the host shell:
╭─ /local/mnt/workspace2/tutor/yolo5/frozen_model/net_run_yolov5s ✔
╰─ adb pull /data/local/tmp/snpe_net_run/yolov5s/htp_output .
/data/local/tmp/snpe_net_run/yolov5s/htp_output/: 7 files pulled. 34.3 MB/s (8594128 bytes in 0.239s)
View the inference time:
╭─ /local/mnt/workspace2/tutor/yolo5/frozen_model/net_run_yolov5s ✔
╰─ snpe-diagview --input_log htp_output/SNPEDiag.log
......
Average SNPE Statistics:
------------------------------
Total Inference Time: 4621 us
Forward Propagate Time: 4592 us
RPC Execute Time: 4291 us
Snpe Accelerator Time: 3025 us
Accelerator Time: 2952 us
Misc Accelerator Time: 0 us
Layer Times:
5. Comparing the CPU and HTP results
(1) Performance
Under high_performance, the HTP is about 23.6x faster than the GPU and about 193.7x faster than the CPU.
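The speedup factors follow directly from the Total Inference Time values in the logs above:

```python
# Total Inference Time values measured above, in microseconds
cpu_us, gpu_us, htp_us = 895054, 108866, 4621

print(f"HTP vs GPU: {gpu_us / htp_us:.1f}x")  # ~23.6x
print(f"HTP vs CPU: {cpu_us / htp_us:.1f}x")  # ~193.7x
```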
(2) Accuracy
The input here was randomly generated and the model was not run end-to-end; the HTP outputs were only compared against the CPU outputs as a rough reference, so this comparison is not conclusive.
Summary
This article first showed how to export the yolov5 model to ONNX, then convert it to DLC and run YOLOv5 with SNPE on the phone CPU/GPU, and finally run YOLOv5 with SNPE-HTP on a Snapdragon 888 phone.
Hope this helps.
Other related content:
Part 2: Getting Started with SNPE (2): Running InceptionV3 on a Phone
Author: Wenhao