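In the previous post, "Running inceptionV3 inference on the 888's HTP", we tested inceptionv3 inference on the CPU, GPU, and HTP of a phone with the 888 chip. The results showed HTP inference to be more than 7 times as fast as GPU inference. Besides speed, the HTP's other big advantage is low power consumption, which matters a great deal for mobile apps.
Below I briefly describe how to use the HTP from an app through the SNPE API, speeding up the model while also reducing power consumption.
This post covers:
- An introduction to the SNPE Native C++ API
- Running the sample code on the HTP of an 888 phone
- An introduction to the debugging APIs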
SNPE API introduction
For details, see the tutorial and the sample code shipped with the SDK:
<path to snpe sdk>/doc/html/cplus_plus_tutorial.html
<path to snpe sdk>\examples\NativeCpp\SampleCode\jni\main.cpp
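In short, SNPE is called in the following steps:
- Check the available runtimes (optional);
- Load the network's DLC model;
- Configure the SNPE options and create the SNPE instance;
- Load the model's input data;
- Run inference and get the output data;
- Release SNPE.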
SNPE API Basic Call Flow
The corresponding code in the sample code is:
// <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\main.cpp
static zdl::DlSystem::Runtime_t runtime = checkRuntime();
std::unique_ptr<zdl::DlContainer::IDlContainer> container = loadContainerFromFile(dlc);
std::unique_ptr<zdl::SNPE::SNPE> snpe = setBuilderOptions(container, runtime, useUserSuppliedBuffers);
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine); // ITensor
snpe->execute(inputTensor.get(), outputTensorMap);// ITensor
snpe.reset();
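1. Check the available runtimes (optional)
A runtime is one of CPU, GPU, DSP, HTA, or HTP.
Different phones provide different sets of runtimes.
[Note]: Developers can only use unsignedPD on the DSP/HTP (see the unsigned PD section of the Hexagon DSP SDK for details), so the runtime check for the HTP must pass the UNSIGNEDPD_CHECK option. As the code below shows, the HTP is requested through Runtime_t::DSP.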
// Check whether the DSP/HTP is available.
zdl::DlSystem::Runtime_t checkRuntime()
{
    static zdl::DlSystem::Version_t Version = zdl::SNPE::SNPEFactory::getLibraryVersion();
    static zdl::DlSystem::Runtime_t Runtime;
    std::cout << "SNPE Version: " << Version.asString().c_str() << std::endl; // Print version number
    // Passing zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK requests the unsignedPD check
    // <path to snpe sdk>/doc/html/group__c__plus__plus__apis.html#ga960452d40eef91090973a17a438eaabd
    if (zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::DSP, zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK)) {
        Runtime = zdl::DlSystem::Runtime_t::DSP; // DSP (not GPU): the HTP is selected through the DSP runtime
    } else {
        Runtime = zdl::DlSystem::Runtime_t::CPU;
    }
    return Runtime;
}
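The check above only distinguishes DSP from CPU. An app may prefer to try several runtimes in order before settling on CPU. The following is a minimal sketch of that idea, not SDK code: the `Runtime` enum, `pickRuntime`, and `toString` are hypothetical stand-ins so the example builds without the SDK. In a real app the predicate would wrap `SNPEFactory::isRuntimeAvailable(...)` with `UNSIGNEDPD_CHECK`, as in the function above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for zdl::DlSystem::Runtime_t (assumption, not the SDK type).
enum class Runtime { CPU, GPU, DSP };

// Returns the first runtime in `preferred` that `isAvailable` accepts.
// Falls back to CPU when none of the preferred runtimes is available.
Runtime pickRuntime(const std::vector<Runtime>& preferred,
                    const std::function<bool(Runtime)>& isAvailable)
{
    assert(static_cast<bool>(isAvailable)); // the predicate must be callable
    for (Runtime r : preferred) {
        if (isAvailable(r)) {
            return r;
        }
    }
    return Runtime::CPU;
}

// Human-readable name, useful for logging which runtime was chosen.
std::string toString(Runtime r)
{
    switch (r) {
        case Runtime::DSP: return "DSP";
        case Runtime::GPU: return "GPU";
        default:           return "CPU";
    }
}
```

Calling `pickRuntime({Runtime::DSP, Runtime::GPU}, predicate)` tries the DSP/HTP first, then the GPU, and only then falls back to CPU.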
2. Load the model
Load the model file in DLC format; it is used to create the SNPE instance.
// containerPath is the path to the DLC file
std::unique_ptr<zdl::DlContainer::IDlContainer> loadContainerFromFile(std::string containerPath)
{
    std::unique_ptr<zdl::DlContainer::IDlContainer> container;
    container = zdl::DlContainer::IDlContainer::open(containerPath);
    return container;
}
3. Create the SNPE instance
(1) Set platformConfig (optional)
[Note] To use the DSP/HTP, developers must select unsignedPD:ON here.
zdl::DlSystem::PlatformConfig platformConfig;
std::string PlatformOptions = "unsignedPD:ON";
// check platform options
if (PlatformOptions.length() > 0) {
    bool setSuccess = platformConfig.setPlatformOptions(PlatformOptions);
    bool isValid = platformConfig.isOptionsValid();
    std::cout << "PlatformOptions (" << PlatformOptions << ") set " << (setSuccess ? "successful" : "failed")
              << " config option is " << (isValid ? "valid" : "invalid") << std::endl;
    if (!setSuccess || !isValid) {
        return EXIT_FAILURE;
    }
}
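A typo in the option string is easy to make, and an app that takes the string from user input may want to reject malformed values with a clearer message before handing them to `setPlatformOptions`. The sketch below is a hypothetical pre-check, not part of the SDK. It assumes only that an option has the `key:value` shape seen in `unsignedPD:ON`, and it does not know which keys or values SNPE actually accepts.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Splits "key:value" at the first ':' into `out`.
// Returns false (leaving `out` untouched) if the colon is missing
// or if either the key or the value is empty.
bool splitOption(const std::string& option,
                 std::pair<std::string, std::string>& out)
{
    const std::string::size_type colon = option.find(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == option.size()) {
        return false;
    }
    out = {option.substr(0, colon), option.substr(colon + 1)};
    return true;
}
```

The authoritative validation is still `isOptionsValid()`, as in the code above; this helper only catches obviously malformed input early.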
(2) Configure the options and create SNPE
std::unique_ptr<zdl::SNPE::SNPE> setBuilderOptions(std::unique_ptr<zdl::DlContainer::IDlContainer> & container,
                                                   zdl::DlSystem::Runtime_t runtime,
                                                   zdl::DlSystem::RuntimeList runtimeList,
                                                   bool useUserSuppliedBuffers,
                                                   zdl::DlSystem::PlatformConfig platformConfig,
                                                   bool useCaching)
{
    std::unique_ptr<zdl::SNPE::SNPE> snpe;
    zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
    if(runtimeList.empty())
    {
        runtimeList.add(runtime);
    }
    snpe = snpeBuilder.setOutputLayers({})
           .setRuntimeProcessorOrder(runtimeList)
           .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
           .setPlatformConfig(platformConfig) // the options set in (1)
           .setInitCacheMode(useCaching) // this option speeds up initialization
           .build();
    return snpe;
}
4. Load the model's input data
There are two ways to load input data: ITensors and User Buffers.
With User Buffers, SNPE maps directly onto a data buffer created by the user, which avoids copying the data into an ITensor and so saves the copy time.
(1) For ITensor usage, see:
- <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\LoadInputTensor.cpp
- <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\SaveOutputTensor.cpp
(2) For User Buffers usage, see:
- <path to snpe sdk>\examples\NativeCpp\SampleCode\jni\CreateUserBuffer.cpp
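Because a user buffer is memory the app allocates itself, the app also has to know how large it must be and how it is laid out. The sketch below shows one common way to compute byte strides and the total size for a densely packed tensor. The helper names `denseStrides` and `bufferBytes` are hypothetical, and the 1x299x299x3 float32 shape in the usage note is assumed as Inception v3's usual input, so check the actual dimensions reported by your DLC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides for a densely packed tensor: the last dimension steps by one
// element, and each earlier dimension steps by the size of everything after it.
std::vector<std::size_t> denseStrides(const std::vector<std::size_t>& dims,
                                      std::size_t elementSize)
{
    assert(elementSize > 0);
    std::vector<std::size_t> strides(dims.size());
    std::size_t stride = elementSize;
    for (std::size_t i = dims.size(); i > 0; --i) {
        strides[i - 1] = stride;
        stride *= dims[i - 1];
    }
    return strides;
}

// Total number of bytes the caller must allocate for the buffer.
std::size_t bufferBytes(const std::vector<std::size_t>& dims,
                        std::size_t elementSize)
{
    std::size_t total = elementSize;
    for (std::size_t d : dims) {
        total *= d;
    }
    return total;
}
```

For dimensions {1, 299, 299, 3} with 4-byte floats this gives strides {1072812, 3588, 12, 4} and a 1072812-byte buffer, which the app would allocate once and reuse across inferences.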
5. Run SNPE inference
// ITensor mode
snpe->execute(inputTensor.get(), outputTensorMap);
// User Buffers mode
snpe->execute(inputMap, outputMap);
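Since the point of moving to the HTP is speed, it is worth measuring the execute call inside your own app. Here is a minimal timing sketch, assuming the inference call is wrapped in a callable that returns true on success; `meanLatencyMs` is a hypothetical helper, not an SNPE API.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `runOnce` `warmup` times untimed, then `iterations` times timed.
// Returns the mean wall-clock latency in milliseconds,
// or -1.0 as soon as any run reports failure.
double meanLatencyMs(const std::function<bool()>& runOnce,
                     int iterations,
                     int warmup = 1)
{
    assert(iterations > 0);
    for (int i = 0; i < warmup; ++i) {
        if (!runOnce()) {
            return -1.0;
        }
    }
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        if (!runOnce()) {
            return -1.0;
        }
    }
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iterations;
}
```

With the objects from this post, a call might look like `meanLatencyMs([&] { return snpe->execute(inputTensor.get(), outputTensorMap); }, 50);`. The warm-up run keeps one-time initialization cost out of the average.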
6. Release SNPE
snpe.reset();
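Running the sample code
We continue with the docker container environment from the previous post. Here we modify and compile the SNPE Native C++ Sample Code and run inceptionV3 inference on the HTP.
1. Install the NDK and set the build environment variables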
root@c633e07fbd33:/opt# wget https://dl.google.com/android/repository/android-ndk-r19b-linux-x86_64.zip
root@c633e07fbd33:/opt# unzip -q android-ndk-r19b-linux-x86_64.zip
root@c633e07fbd33:/opt# rm android-ndk-r19b-linux-x86_64.zip
root@c633e07fbd33:/workspace/tutor/inceptionv3# export ANDROID_NDK_ROOT=/opt/android-ndk-r19b/
2. Compile the Sample Code
(1) Build snpe-sample with ndk-build
// Copy SampleCode from the SNPE SDK into the working directory
root@c633e07fbd33:/workspace/tutor/inceptionv3# cp $SNPE_ROOT/examples/NativeCpp/SampleCode . -r
root@c633e07fbd33:/workspace/tutor/inceptionv3# cd SampleCode/jni/
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# export PATH=$PATH:$ANDROID_NDK_ROOT
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# ndk-build
(2) Run snpe-sample on the phone
Here we reuse the library and model files pushed to the phone in the previous post.
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb push ../obj/local/arm64-v8a/snpe-sample /data/local/tmp/incpv3/arm64/bin
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb shell
haydn:/ $ cd /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
// Test snpe-sample
haydn:/data/local/tmp/incpv3 $ snpe-sample -h
DESCRIPTION:
------------
Example application demonstrating how to load and execute a neural network
using the SNPE C++ API.
REQUIRED ARGUMENTS:
-------------------
-d <FILE> Path to the DL container containing the network.
-i <FILE> Path to a file listing the inputs for the network.
-o <PATH> Path to directory to store output results.
OPTIONAL ARGUMENTS:
-------------------
-b <TYPE> Type of buffers to use [USERBUFFER_FLOAT, USERBUFFER_TF8, ITENSOR, USERBUFFER_TF16] (ITENSOR is default).
-r <RUNTIME> The runtime to be used [gpu, dsp, aip, cpu] (cpu is default).
-u <VAL,VAL> Path to UDO package with registration library for UDOs.
Optionally, user can provide multiple packages as a comma-separated list.
-z <NUMBER> The maximum number that resizable dimensions can grow into.
Used as a hint to create UserBuffers for models with dynamic sized outputs. Should be a positive integer and is not applicable when using ITensor.
-s <TYPE> Source of user buffers to use [GLBUFFER, CPUBUFFER] (CPUBUFFER is default).
-c Enable init caching to accelerate the initialization process of SNPE. Defaults to disable.
-l <VAL,VAL,VAL> Specifies the order of precedence for runtime e.g cpu_float32, dsp_fixed8_tf etc. Valid values are:-
cpu_float32 (Snapdragon CPU) = Data & Math: float 32bit
gpu_float32_16_hybrid (Adreno GPU) = Data: float 16bit Math: float 32bit
dsp_fixed8_tf (Hexagon DSP) = Data & Math: 8bit fixed point Tensorflow style format
gpu_float16 (Adreno GPU) = Data: float 16bit Math: float 16bit
cpu (Snapdragon CPU) = Same as cpu_float32
gpu (Adreno GPU) = Same as gpu_float32_16_hybrid
dsp (Hexagon DSP) = Same as dsp_fixed8_tf
// Run snpe-sample
haydn:/data/local/tmp/incpv3 $ snpe-sample -d inception_v3_htp.dlc -i target_raw_list.txt -r dsp
SNPE Version: 1.52.0.2724
Selected runtime not present. Falling back to CPU.
The output shows "Selected runtime not present. Falling back to CPU." This is because the default SampleCode does not enable unsignedPD.
(3) Enable unsigned PD
Code change
diff --git a/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp b/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
index 82c9c10..8ba67ce 100755
--- a/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
+++ b/examples/NativeCpp/SampleCode/jni/CheckRuntime.cpp
@@ -23,7 +23,7 @@ zdl::DlSystem::Runtime_t checkRuntime(zdl::DlSystem::Runtime_t runtime)
std::cout << "SNPE Version: " << Version.asString().c_str() << std::endl; //Print Version number
- if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(runtime))
+ if (!zdl::SNPE::SNPEFactory::isRuntimeAvailable(runtime, zdl::DlSystem::RuntimeCheckOption_t::UNSIGNEDPD_CHECK))
{
std::cerr << "Selected runtime not present. Falling back to CPU." << std::endl;
runtime = zdl::DlSystem::Runtime_t::CPU;
diff --git a/examples/NativeCpp/SampleCode/jni/main.cpp b/examples/NativeCpp/SampleCode/jni/main.cpp
index 6ec2f95..8ad06bc 100755
--- a/examples/NativeCpp/SampleCode/jni/main.cpp
+++ b/examples/NativeCpp/SampleCode/jni/main.cpp
@@ -324,6 +324,18 @@ int main(int argc, char** argv)
return EXIT_FAILURE;
}
+ std::string PlatformOptions = "unsignedPD:ON";
+
+ // check platform options
+ if (PlatformOptions.length() > 0) {
+ bool setSuccess = platformConfig.setPlatformOptions(PlatformOptions);
+ bool isValid = platformConfig.isOptionsValid();
+ std::cout << "PlatformOptions (" << PlatformOptions << ") set " << (setSuccess ? "successful" : "failed")
+ << " config option is " << (isValid ? "valid" : "invalid") << std::endl;
+ if (!setSuccess || !isValid) {
+ return EXIT_FAILURE;
+ }
+ }
snpe = setBuilderOptions(container, runtime, runtimeList, useUserSuppliedBuffers, platformConfig, usingInitCaching);
if (snpe == nullptr)
{
Compile and run snpe-sample again; this time it runs on the DSP/HTP.
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# ndk-build
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb push ../obj/local/arm64-v8a/snpe-sample /data/local/tmp/incpv3/arm64/bin
root@c633e07fbd33:/workspace/tutor/inceptionv3/SampleCode/jni# adb shell
haydn:/ $ cd /data/local/tmp/incpv3/
haydn:/data/local/tmp/incpv3 $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp/incpv3/arm64/lib
haydn:/data/local/tmp/incpv3 $ export PATH=$PATH:/data/local/tmp/incpv3/arm64/bin
haydn:/data/local/tmp/incpv3 $ export ADSP_LIBRARY_PATH="/data/local/tmp/incpv3/dsp/lib;/system/lib/rfsa/adsp;/system/vendor/lib/rfsa/adsp;/dsp"
haydn:/data/local/tmp/incpv3 $ snpe-sample -d inception_v3_htp.dlc -i target_raw_list.txt -r dsp
SNPE Version: 1.52.0.2724
PlatformOptions (unsignedPD:ON) set successful config option is valid
Batch size for the container is 1
Processing DNN Input: cropped/notice_sign.raw
Processing DNN Input: cropped/trash_bin.raw
Processing DNN Input: cropped/plastic_cup.raw
Processing DNN Input: cropped/chairs.raw
Debugging APIs
1. Output intermediate layers
In debug mode, the result of every layer is output.
- setDebugMode()
- SNPEBuilder& setDebugMode(bool debugMode)
2. Collect per-layer execution time
Setting profilingLevel to DETAILED records the execution time of every layer in diag.log.
- setProfilingLevel()
- SNPEBuilder& setProfilingLevel(zdl::DlSystem::ProfilingLevel_t profilingLevel)
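Summary
So far, we have:
- Introduced the SNPE API call flow
- Modified the SampleCode to run Inceptionv3 inference under unsignedPD on the DSP/HTP
- Introduced the debugging APIs
Other related content:
Part 2: [Getting started with SNPE (2): inceptionV3 inference on a phone]
Author: Wenhao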